User:Mvolz/OPW proposal round 8

Improving URL citations on Wikimedia

Public URL
Improving URL citations on Wikimedia
Cite-from-id
Bugzilla report
57804
Announcement
OPW 8 Project Proposal: Improving References on Wikimedia

Name and contact information

Name
Marielle Volz
Email
marielle.volz@gmail.com
IRC
mvolz
Location
London, England U.K.
Typical working hours
9-5 UTC+1 M-W, 10-6 UTC+1 Sat/Sun

Synopsis


Incomplete and missing citations of web resources using the {{cite web}} template are an endemic problem on Wikipedia and other Wikimedia installations. There are two major problems to be addressed:

  • Adding {{cite web}} citations is difficult and tedious, which is a barrier to editors properly citing works.
  • As a consequence, many existing {{cite web}} citations are incomplete.

I propose to address the first issue, and to improve the process of including citations, by automatically generating a citation from a user-submitted URL on Wikimedia.

Adding references to the transclusion dialog is part of the VisualEditor roadmap, and there are mock-ups demonstrating the process of adding references. However, there is currently no back-end to supply a wiki-formatted citation in response to a user-entered URL.

A MediaWiki extension, CiteURLEngine, could be developed to return wiki mark-up citations in response to a submitted URL. The goal is to eventually produce a fully featured extension that could return citations both to the VisualEditor extension and to the RefToolbar in the WikiEditor extension.

A final version of this extension might:

  1. Take a URL
  2. Detect if the URL points to a resource that has any other identifier such as a DOI or ISBN
  3. Return the appropriate citation if such an identifier exists
  4. If not, return a {{cite web}} citation as the default.
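
The four steps above could be sketched roughly as follows. This is only an illustration in Python, not the proposed extension itself: the function name `cite_from_url`, the regular expressions, and the choice of {{cite journal}} and {{cite book}} as the "appropriate" templates are all assumptions made for the example. It only inspects the URL string; a real implementation would likely also need to look at the page itself.

```python
import re
from urllib.parse import unquote

# Heuristic patterns (assumptions for this sketch, not exhaustive):
# a DOI starts with "10.", a registrant code, a slash, then a suffix.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s?#&]+")
# An ISBN-13 starts with 978 or 979 and may contain hyphens.
ISBN13_RE = re.compile(r"(?<!\d)97[89](?:-?\d){10}(?!\d)")


def valid_isbn13(digits: str) -> bool:
    """Check the ISBN-13 checksum (weights alternate 1, 3)."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def cite_from_url(url: str) -> str:
    """Return wiki mark-up for a URL, preferring a DOI or ISBN if one is found."""
    decoded = unquote(url)  # DOIs are often percent-encoded inside URLs

    doi = DOI_RE.search(decoded)
    if doi:
        return "{{cite journal |doi=%s |url=%s}}" % (doi.group(0), url)

    for match in ISBN13_RE.finditer(decoded):
        digits = match.group(0).replace("-", "")
        if valid_isbn13(digits):
            return "{{cite book |isbn=%s |url=%s}}" % (digits, url)

    # Step 4: no other identifier found, fall back to cite web.
    return "{{cite web |url=%s}}" % url
```

For example, `cite_from_url("https://doi.org/10.1000/182")` takes the DOI branch, while a URL with no recognisable identifier falls through to the {{cite web}} default.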
Possible mentors
James Forrester, Trevor Parscal

Deliverables


Deliverables are as follows:

  1. Get familiar with the MediaWiki code base and commit a Hello World version of the extension.
  2. Simple extension that accepts a URL and scrapes the title to return a {{cite web}} citation.
  3. Work with VisualEditor team/Trevor Parscal to incorporate it into the transclusion dialog and make sure they can interact appropriately.
  4. Improve functionality of extension to provide better citations with more fields populated.
  5. Investigate possibility of saving citations to a database instead of scraping in realtime to improve scalability.
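
As a rough illustration of deliverable 2, the sketch below pulls the `<title>` out of a page's HTML and formats a {{cite web}} citation. It is written in Python using only the standard library and is not the extension's actual code: the names `TitleParser` and `cite_web` are hypothetical, the HTML is passed in as a string so that fetching is left out, and the set of fields filled in (`url`, `title`, `access-date`) is an assumption.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def cite_web(url: str, html: str, access_date: str) -> str:
    """Build a {{cite web}} citation from a URL and that page's HTML."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()

    # Collapse runs of whitespace and escape pipes, which would
    # otherwise be read as template parameter separators.
    title = " ".join("".join(parser.parts).split())
    title = title.replace("|", "&#124;")

    fields = [("url", url), ("title", title), ("access-date", access_date)]
    body = " ".join("|%s=%s" % (key, value) for key, value in fields if value)
    return "{{cite web " + body + "}}"
```

If the page has no title, the `title` field is simply omitted here, which is exactly the kind of incomplete citation that deliverable 4 would need to improve on by populating fields from other sources.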

Project schedule

Month 1: May 19 - June 18
May 19 Commit empty extension
June 1 1st blog post due. Very rough extension produced.
June 18 2nd blog post due. Monthly report. Integrated rough extension into VE project on localhost.
Month 2: June 19 - July 18
July 1 3rd blog post due. Extension should now produce citations.
July 18 4th blog post due. Monthly report. Extension added as a submodule of the VE project.
Month 3: July 19 - Aug 18
August 1 5th blog post due. Extension integrated into transclusion VE dialog.
August 18 Last blog post due. Monthly report. Refinement of extension and VE integration.

Participation


In terms of documenting work, I will probably do most of the documentation on mediawiki.org itself, as well as keeping the README.md and comments in the code up to date.

In terms of commits, at the beginning of the project, when I'll primarily be working independently on the extension, I will probably make several local commits and then upload those to Gerrit on a semi-regular basis. However, once I am (hopefully) adding the extension as a submodule of VisualEditor and making changes to the VE submodule as well, I plan to adhere to community standards with respect to committing via Gerrit, which I hopefully demonstrated with my microtask!

In terms of asking for help, I've already found the #mediawiki-visualeditor channel, where my mentors are denizens, an extremely helpful and responsive place. One challenge is that I'm 8 hours off timezone-wise from the VE team, which means that our work days overlap by only an hour. However, I am happy to chat later in the evening outside of working hours.

About you

Education completed or in progress
  • B.A. Biological Sciences Cornell University, 2008
  • M.S. Ecology and Evolutionary Biology, 2009
How did you hear about this program?

DevChix mailing list

Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?

No, although for scheduling working hours, I do have to work around the availability of my childcare provider and my partner's conference/work schedule. I anticipate that I'll be able to schedule 40 hours of uninterrupted work a week.

We advise all candidates eligible to Google Summer of Code and FOSS Outreach Program for Women to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?

I'm only applying to OPW: Wikimedia.

What makes you want to make this the most awesomest wiki enhancement ever?

Quite simply, the process of citing works on and off of Wikipedia has been driving me bonkers for years. I've used EndNote, BibTeX, Zotero, ReferenceMe, and a dozen other things.

On Wikipedia, despite my passion for citing things, my own articles are littered with C1 Missing Title errors. If a citation-mad but fundamentally lazy[1] person like myself can't properly cite her articles, what hope does the casual user have? Something must be done.

  1. Please note that this is considered a virtue by Larry Wall.

Past experience

Please describe your experience with any other FOSS projects as a user and as a contributor

I've been a registered Wikipedia editor since 2005 and I've used GNU/Linux as my main OS since 2008.

My microtask for my application is bug 51012.

Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them (include links)

A few years back I wrote a web app/CMS for academics to keep track of their publications. What I learned from this is that adding papers to a database by filling out fields in a GUI is considerably worse than using the {{cite journal}} template!

For the last two years I ran a web app, built in Django/MySQL, for a CDC-funded project called Ex-flu. I learned a lot of lessons from that, notably:

  • The two most terrifying things are: Sending out mass e-mails, and running SQL alter table statements on a live database.
  • Test before you commit. Never commit on a live server. Pull your commits before running management commands.

I've also done some contract work in WordPress.

I'm mvolz on GitHub as well.

What project(s) are you interested in (these can be in the same or different organizations)?

My project proposal comes from my interest in the following raw projects:

Any other info


Report on Improving URL citations on Wikimedia

Notes I've made on the process so far

See also


FOSS Outreach Program for Women/Round 8