Chemical Markup support for Wikimedia Commons
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date. This was a Google Summer of Code/2014 project/proposal.
- Public URL
- https://www.mediawiki.org/wiki/Chemical_Markup_support_for_Wikimedia_Commons
- Issue tracker
- Maniphest (a "Phabricator" application)
- Issue tracker for Extension
- MediaWiki Bugzilla
- Task board
- Project Board (a "Phabricator" application)
- Repo
- Gerrit WM
- Bugzilla report
- bugzilla:16491
- Announcement
- wikitech-l, commons-l
Name and contact information
- Name
- Rainer Rillke
- <lastname>@wikipedia.de
- IRC or IM networks/handle(s)
- rillke
- Web Page / Blog / Microblog / Portfolio
- https://commons.wikimedia.org/wiki/User:Rillke
- Resume (optional)
- http://osrc.dfm.io/rillke
- Location
- Germany
- Typical working hours
- 08:00 - 20:00 UTC; Tue, Wed, Thur: 08:00 - 16:00 UTC
Synopsis
Wikipedia articles covering chemical reactions or chemical compounds are often illustrated with SVG graphics showing chemical equations or compounds. However, SVG is a graphics format, so these files cannot easily be remixed: one has to draw the whole compound again (or pull it from a database). A common scenario: "Quack" started an article about a compound, and "Cheming" wants to contribute how to synthesize that compound. "Cheming" has to redraw the whole compound.
Goals
- Server-side support
Allow uploading and implement rendering for MDL molfiles. The format is specified, human readable and commonly used.
“The molfile is sufficiently common that most, if not all, cheminformatics software systems/applications are able to read the format, though not always to the same degree. It is also supported by some computational software such as Mathematica.” (en:Chemical table file)
Client-side JavaScript molfile editors for web browsers are available, as are molfile-to-image converters for the server side.
- Client side molecule editor
Provide a JavaScript molecule editor so editors do not have to install software and then jump through hoops, choosing the correct format(ting) and uploading the files. File upload can be accomplished by AJAX.
- Possible mentors
- Gilles Dubuc, Brian Wolff, Bryan Davis
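To illustrate what "specified and human readable" means for MDL molfiles, here is a minimal Python sketch (Python only for illustration; the extension itself would be PHP) that reads the three header lines and the fixed-width counts line of a V2000 molfile. The class and function names are hypothetical. The column positions follow the V2000 counts-line layout; V3000 files, whose real counts live in an "M  V30 COUNTS" line, are deliberately not handled.

```python
from dataclasses import dataclass


@dataclass
class MolHeader:
    name: str          # header line 1: molecule name (may be empty)
    program_line: str  # header line 2: user initials, program name, date/time, ...
    comment: str       # header line 3: free-text comment (may be empty)
    atom_count: int
    bond_count: int
    version: str       # "V2000" or "V3000"


def parse_molfile_header(text: str) -> MolHeader:
    """Read the header block and counts line of an MDL molfile (sketch)."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected 3 header lines followed by a counts line")
    counts = lines[3]
    # The counts line is fixed-width: atoms in columns 1-3, bonds in 4-6,
    # version tag in columns 34-39. If the tag is missing, assume V2000.
    # In V3000 files these atom/bond fields are 0 and the real counts sit in
    # an "M  V30 COUNTS" line, which this sketch does not parse.
    return MolHeader(
        name=lines[0].strip(),
        program_line=lines[1].rstrip(),
        comment=lines[2].strip(),
        atom_count=int(counts[0:3]),
        bond_count=int(counts[3:6]),
        version=counts[33:39].strip() or "V2000",
    )
```

For a file whose fourth line is "  3  2  0  0  0  0  0  0  0  0999 V2000", this yields 3 atoms, 2 bonds and version V2000. The three header lines are also the natural candidates when deciding which metadata, if any, should be extracted on upload.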
Deliverables
Please describe the details and the timeline of the work you plan to accomplish on the project you are most interested in (discuss these first with the mentor of the project):
Below is a list of existing client-side and server-side solutions.
Server-side or fully integrated solutions
approach / third party dependency | pros | cons |
---|---|---|
Client-side SVG creation; embedding molfile into generated SVG | fewer security issues anticipated; rapid deployment possible | creation of some kind of personal format spec.; only users with UAs supporting SVG creation would be supported; AGPL as license[dubious]: the only (nice) SVG-creating molfile editor I found is Ketcher by GGA; there is another one, but it is compiled from Java with Google Web Toolkit and I don't even want to look at the output; users could trick the system, violating integrity: the user renders the file, so the molfile could differ from the SVG |
Server-side molfile rendering: indigo-depict | pure PHP, easy to review, rewrite and deploy; public domain | SVG must subsequently be processed by rsvg for thumbnails; PHP is not the fastest approach |
Server-side molfile rendering: indigo by GGA https://github.com/Rillke/indigo | maintenance by a notable company; precompiled binaries available (good for testing); GPL v.3 | requires installing binaries or phpize or something like that; requires C/C++ security review |
Server-side molfile rendering: ChemAzTech | incorporates a lot of features and is already translated to French; GPL v.2 | Python required for converting to images |
Server-side molfile rendering: OpenBabel | a wide variety of formats supported; C/C++, native code: fast processing with almost zero impact on servers expected (since chemical markup is not too commonly used in WMF projects); Ubuntu package for Ubuntu 12.04 available; GPL v.2 | similar to indigo, a large framework |
The client side
molecule editor | pros | cons |
---|---|---|
Ketcher by GGA | draws on SVG that could be sent and stored at server | AGPL[dubious]; advertising must be removed |
ChemDoodle Web Components | GPL v3.0; nice and fast | code without any helpful comment; looks concatenated from multiple files but is still readable, though far from a pleasure; draws on canvas; advertising must be removed |
JSME | draws on SVG that could be sent and stored at server; BSD license (the compatible one) | Windows 3.1 look; SVG produced doesn't look good; JS compiled from GWT/Java, almost impossible to read |
kemia | draws on SVG that could be sent and stored at server (in theory); Apache version 2 license (which should work, as MW is GPL v2+, meaning GPL v3 as an option), but probably not preferred | does not look completely ready yet |
Although I have not discussed this with the mentors yet, I believe the most viable option for reaching a working prototype, or better, advancing into production, is indigo-depict + ChemDoodle Web Components.
Project Schedule
Task | Timeline | Remarks | Status |
---|---|---|---|
Setup environment (vagrant, gerrit, git), /microtasks | 04/03/14 - 28/03/14 | took a look at vagrant, other stuff was installed before | Done |
Create GitHub repository, legal check, etc. | 28/03/14 - 07/04/14 | Code will be hosted at Wikimedia Git. The repo will be named after Extension:MolHandler, thus mediawiki/extensions/MolHandler. But I will run a -dev repository at GitHub, allowing me to push changes quickly, create as many branches as I like, test different options, and still show that I am not idle. | Done |
Aim for a working proof-of-concept | 08/05/14 - 20/06/14 | get the whole pipeline running, even if it only works on a local install, the code isn't clean or tested, etc.; something that shows that all the moving parts can work together from upload to file page with a generated thumbnail | In progress |
Mid Term Evaluation | 20/06/14 - 23/06/14 | ||
Prettify, prepare for production | 23/06/14 - 15/07/14 | making it clean, giving it test coverage, and writing all the things necessary for production deployment (presumably things like puppet scripts to deploy the server-side things, production config changes, etc.). | |
Writing documentation, deploying changes to Labs, letting folks test there, then WMF-side deployment | 15/07/14 - 10/08/14 | ||
Final Report Submission | 20/08/2014 |
Workflow
- Client is on a .mol file page (existing or non-existing file)
- Client loads the molfile editor. The editor allows import/export of molfile, export of SMILES and export of SVG (server-created SVG).
- User edits the file and saves
- FormData is used for file upload
- Molfile is stored; do MDL molfiles contain notable metadata that have to be extracted or converted?
- SVG is created from the .mol file through indigo-depict and stored (file name? where?)
- SVG is thumbnailed through rsvg (building on the existing SVG support/approach), creating PNG thumbs
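The two server-side conversion steps of this workflow can be sketched as external commands. This Python sketch only builds the argument lists (so they can be inspected or logged) and runs them with subprocess. The invocation "indigo-depict <input> <output>", with the output format taken from the file extension, is an assumption about the indigo-depict command line; -w and -o are rsvg-convert's width and output options. The thumbnail name (150px-Benzene.mol.png, modelled on how SVG thumbnails are named) and the output directory are hypothetical, since the file name and location are still open questions.

```python
import subprocess
from pathlib import Path
from typing import List


def render_commands(mol_path: Path, out_dir: Path, thumb_width: int) -> List[List[str]]:
    """Build the molfile -> SVG -> PNG thumbnail commands (hypothetical layout)."""
    svg_path = out_dir / (mol_path.name + ".svg")
    png_path = out_dir / f"{thumb_width}px-{mol_path.name}.png"
    return [
        # Step 1 (assumed CLI): render the molfile to SVG; format chosen by extension.
        ["indigo-depict", str(mol_path), str(svg_path)],
        # Step 2: rasterize the SVG to a PNG of the requested width, as for SVG files.
        ["rsvg-convert", "-w", str(thumb_width), "-o", str(png_path), str(svg_path)],
    ]


def run_pipeline(mol_path: Path, out_dir: Path, thumb_width: int) -> None:
    """Run both steps in order; check=True stops at the first failing command."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for argv in render_commands(mol_path, out_dir, thumb_width):
        subprocess.run(argv, check=True)
```

Passing argument lists instead of a shell string keeps user-supplied file names from being interpreted by a shell, which matters because the input comes from uploads.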
Non-obvious challenges
- Either the molfile editor gets a full security audit (we might even consider prettifying it and adding comments to the source code [creating something maintainable], although that is not nice because it is an upstream library), or it is included through an <iframe> loaded from a different domain
- Internationalization of the molecule editor
- Option for turning atom coloring on/off on a per-site and per-inclusion basis:
[[File:Benzene.mol|150px|atomcolors=off]]
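A media handler would need to recognize the proposed atomcolors option next to the usual width option, and fall back to a per-site default when the option is absent. The Python sketch below shows only that precedence rule. The option name comes from the example above; the function name, the on/off values and passing the site default as an argument are assumptions. In an actual MediaWiki extension this would be done in PHP through the MediaHandler parameter methods (such as getParamMap and validateParam) rather than by splitting the string by hand.

```python
def parse_file_options(options: str, site_default: bool = True) -> dict:
    """Parse '150px|atomcolors=off'-style options; per-inclusion beats per-site."""
    result = {"width": None, "atomcolors": site_default}
    for part in (p.strip() for p in options.split("|")):
        if part.endswith("px") and part[:-2].isdigit():
            result["width"] = int(part[:-2])
        elif part.startswith("atomcolors="):
            value = part.split("=", 1)[1].strip().lower()
            if value not in ("on", "off"):
                raise ValueError(f"atomcolors must be 'on' or 'off', not {value!r}")
            result["atomcolors"] = value == "on"
        # Anything else (thumb, alignment, caption) is left to MediaWiki's own parser.
    return result
```

For example, parse_file_options("150px|atomcolors=off") returns {'width': 150, 'atomcolors': False}, while an inclusion without the option inherits whatever default the site configured.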
sdf2svg
- Aromatic bonds not shown
- Some editors write $RXN into molfiles... sdf2svg should be able to read this
- Padding often too small, cutting off atom labels
- --> We went with indigo-depict.
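Because some editors write reaction data ($RXN) into files with a .mol extension, whichever renderer is used has to inspect the content rather than trust the file name. A minimal Python sketch of such a check, assuming that a leading $RXN or $RDFILE line marks a reaction or RD file, that a line consisting only of $$$$ separates the records of an SD file, and that everything else is treated as a single molfile. The function name is hypothetical.

```python
def classify_ctfile(text: str) -> str:
    """Guess which CTfile flavour an uploaded '.mol' file really contains (sketch)."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty file")
    first = lines[0].strip()
    if first.startswith("$RXN"):
        return "rxnfile"   # reaction file: header, then one $MOL block per component
    if first.startswith("$RDFILE"):
        return "rdfile"
    if any(line.strip() == "$$$$" for line in lines):
        return "sdfile"    # one or more molfiles separated by $$$$
    return "molfile"
```

The result can then decide whether the upload is rendered as a molecule, rendered as a reaction, or rejected with a helpful message.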
Participation
- Style: MediaWiki extension, similar to Extension:TimedMediaHandler or Extension:PagedTiffHandler
- Progress and experiences will be logged at /office desk (including future visions, what's missing, etc.) and, within a narrower frame, at /microtasks (commits, code review).
- Code will be hosted at Wikimedia Git (see Git/New repositories/Requests). The repo will be named after Extension:MolHandler, thus mediawiki/extensions/MolHandler. But I will run a -dev repository at GitHub, allowing me to push changes quickly, create as many branches as I like, test different options, and still show that I am not idle.
- Every time I commit something to the MW repo, it will have to be reviewed, so I will learn how to do it correctly. However, do not expect me to commit something to that repo every day, but at least once per week.
- MediaWiki has great help resources for self-study (this wiki, Doxygen-generated documentation and finally the source code, which also looks sane), but for "best practices" I will certainly need the help of my mentors. Expect me to ask a lot of "What is the best approach for …" questions, especially regarding the PHP part. This is also why I would like two mentors knowledgeable about file handling on the server side. Depending on what turns out to be more efficient, I'll bug them by e-mail or on IRC.
- I'll occasionally post notifications and gather feedback at the chemistry project on Wikimedia Commons, so that it does not end up as vapourware for lack of acceptance.
About you
- Education completed or in progress
- In progress: something closely related to the enhancements the extension will bring. But well, I am German. I am careful when it comes to sharing all kinds of data with the whole world. In other words, I would appreciate it if you did not require me to publish anything specific.
- How did you hear about this program?
I read a post on a mailing list raising the point that there was not enough diversity regarding the origin of the applicants. I intended to change that with my participation.
- Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?
Some of my time will go into the Pronunciation Recording Gadget, but that has a wider schedule, and I'll have plenty of time this spring/summer. Otherwise, there are no specific plans yet for activities like internships or vacation.
- We advise all candidates eligible to Google Summer of Code and FOSS Outreach Program for Women to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?
Outreach Program for Women? Without looking into the details, *I think* this doesn't apply to me.
Past experience
- Please describe your experience with any other FOSS projects as a user and as a contributor
- I could do this by providing links to Gerrit, GitHub and Special:CentralAuth/Rillke and telling you to look through the rights logs and contribs as well as the user pages, but here is a brief summary of my experience at Wikimedia: In 2010, I registered at Wikipedia; in 2011 I became a Wikimedia Commons addict, learned a lot about JavaScript and became an administrator in November 2011 (so I was able to maintain my scripts); I started reporting bugs at Bugzilla in 2012 and using Toolserver and Gerrit in 2013. In 2014, I created some tools at Tool Labs (learning PHP) that are still up and running, most notably the database query services OctoData and sha1lookup for the oldimage table, which is not exposed through the regular MW API. I have created documentation using JSDuck, but the software it is for isn't in use yet...
- Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them (include links)
- So far, I've mainly contributed to Wikimedia projects, for example UploadWizard, but I have also written a bunch of user scripts at Wikimedia Commons (FileAnalyzer, VisualFileChange, GalleryTool, a script using the chunked upload protocol, Title checker) and maintain a lot more (more than I am able, or let's say than is fun, to maintain, given all the recent JavaScript deprecations). Furthermore, I have worked with molfiles and a molfile editor in the past. Proof can be provided upon request, discreetly. Not to forget the daily media-related work at Wikimedia Commons.
- What project(s) are you interested in (these can be in the same or different organizations)?
I prefer projects where I can see the light at the end of the tunnel and where past experience has proven they're successful, hence my late registration at Wikipedia. Thematically, I like projects around chemistry, media files, uploading, involving communities and feedback cycles. I believe that asking the users targeted by the software about their needs, using specific questions and coming up with different suggestions, is a crucial part of software development. Head over to Meta if you want to see these points proven.