Reading/Web/PDF Rendering

A more up-to-date page is available at Reading/Web/PDF Functionality. Please read and discuss there instead.

About

The Wikimedia Foundation (WMF) would like to improve the quality of rendered PDFs in a way that meets community needs. The technical solution we propose is to move from OCG to Electron, an underlying service that supports browser-based rendering.

On this page, we lay out the problem and our proposed solution. The plan below is tentative and is meant only to clarify the tasks against the timeline. Please feel free to add your comments and suggestions. This work builds on the initiative led by Wikimedia Deutschland (WMDE) to improve tables in PDF printing.

Current situation

As of early 2017, PDFs of Wikipedia articles are rendered by a service called OCG. When "books" are rendered through the book creator, OCG is used as embedded within the Collection extension. OCG has multiple issues, especially with tables.

OCG is currently not well supported by the WMF, and difficulties with LaTeX have led to table rendering being disabled in PDFs. LaTeX is a fairly brittle framework that is not well suited to our flexible content types. Furthermore, bugs in OCG or the Collection extension have greatly diminished the third use of OCG (creating books). See the earlier OCG discussions here and here.

Currently, when you click "Download as PDF" in the side menu, a screen similar to the one below is displayed, and the PDF becomes available for download shortly afterwards, once the article has been rendered. The result will not include tables.

For example, see the rendered PDF of the article on the list of country codes. The article contains a large table that the current rendering service does not capture at all.

Proposal

Replace OCG with Electron. OCG, the service currently in use, does the following:

  1. Converts wikitext pages to LaTeX-formatted PDF and plain text. In the past, it has also supported ZIM, EPUB and possibly other formats.
  2. When integrated with the Collection extension, it collates articles selected by a user into books and creates a table of contents.

We hope that moving to a browser-based PDF rendering solution such as Electron will both improve PDF output and reduce maintenance. The new service is currently in use on mediawiki.org (try the "Download as PDF" option in the left-hand menu) and a few other wikis. It will be responsible for the underlying PDF conversion, without major changes to the user workflow.

Output examples

Implications

The implications of this change are two-fold.

  1. PDFs will look more like the images above and less like the current 2-column layout.
  2. Books created in the book creator will potentially lose the following features:
    • Paper size selector
    • Table of contents creation
    • Adding custom chapters
    • Plain text rendering
    • Selecting the number of columns (these are discussed on Phabricator here)
    • The ability to print a book, which is not used with any regularity; removing it might simplify things greatly
    • The current PediaPress book print format (see here and click on preview for an example)
    • If we move away from the printing of books, we will no longer legally need to do attribution in the current manner:
      • attribution of images used in the article (the link would be embedded)
      • enumeration of all editors that contributed to the individual wiki pages (link to history would be embedded)

Current status

  • mediawiki.org and a few other wikis have already deployed changes to how PDF printing works, but have not yet made changes to the book creator tool.
  • New PDF styling is available for review above and should be rolling out in March.
  • A proposed workflow is explained in detail here.

Questions & Answers