Topic on Talk:Reading/Web/PDF Rendering

Chricho

Hi Melamrawy! I am wondering how the decision to drop OCG in the future was made. The new PDF export feature was implemented following a feature request in the German Wikipedia, and it was assured here (https://de.wikipedia.org/w/index.php?diff=156634101) that OCG would be kept, especially for articles without tables. Best regards

AKlapper (WMF)

Hmm, I don't see in that provided link where anyone states "that OCG would be kept"?

Chricho

(Translated from German:) "It should be noted that the new version is to be offered in parallel to the LaTeX PDF version. Not all articles have tables, and the LaTeX rendering is visually more elegant. Both alternatives would then have to be labeled accordingly on the download page and briefly explained."

I do not want to paraphrase it; it states just that.

197.218.80.182

They'll be offered in parallel until the ElectronPDF is deemed stable enough to safely kill OCG. See and .

Chricho

I am really disappointed by this tone, in which references to general procedures ("parallel running") are given instead of discussing the issue, and decisions that still have to be made are presented as technical matters of course. The question is why the mere server-side PDF conversion of the well-known print version is considered a replacement for OCG (to quote your Wikipedia article: "a new system is needed to be installed to replace the old one"), contrary to previous consultation with the community. The question is also why the new reading committee nevertheless claims to speak for the needs of the community on this page.

AKlapper (WMF)

What "new reading committee" are you referring to?

Chricho

Reading. It does not seem to be that new, but I was told that it has only recently become responsible for the PDF export.

Jkatz (WMF)

Yep, that is us. We are acting at the behest of the WMF Operations team, which feels that we cannot create yet another service without exploring how we can consolidate further. This is not driven by "reader" needs, but in exploring this, we are looking to see whether we can take steps to improve the service for everyone. This is one reason we are hoping to move away from OCG, which has had serious scalability and reliability issues.

197.218.81.64

Err, the second link points to how it was decided. You mentioned the link indicating that things would run in "parallel", and a link was added showing that running two similar systems in parallel is almost always a temporary measure.

To put it simply, OCG was built upon LaTeX, which is a fragile language, and like every product, there comes a time when it reaches its natural end of life, when the cost of maintaining it is higher than the cost of building a new tool. Here's a list of bug reports related to OCG, if you're interested:

https://phabricator.wikimedia.org/search/query/7gTKLlPDg4qW/#R

If you want a longer explanation, you can see it at:

https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Offline_Content_Generator&oldid=734287226

Ultimately, both OCG and the "mere" server-side PDF are just convenience tools. Browsers can easily generate PDFs without any server-side tool, and all the text in a page can easily be copied and pasted into a word processor or text editor and changed as needed.

Chricho

No, the link does not indicate in any way that it is about running things in parallel temporarily to ensure stability; it clearly states that it is about keeping OCG in order to keep the associated benefits. I faithfully gave an account of that in English. Neither your technocratic language, nor your business slang, nor any vitalist metaphor changes anything about that. Who are you speaking for, by the way?

Melamrawy (WMF)

Hello @Chricho, thanks for your comments. Which exact features are you hoping to maintain by keeping OCG that would be lost once Electron is running? Thanks again.

Chricho

Hi Melamrawy! It is the typographic quality. Thanks

Melamrawy (WMF)

Thanks. Typographic quality in what sense? Can you please elaborate? Also, what do you think of the other considerations mentioned about moving to a new service? Does it sound like a worthwhile change? Thanks.

Chricho

Hi!

The typographic advantages of OCG can be seen in the justification and everything related to it (page breaks …), which TeX handles nicely. Furthermore, embedded formulas look better. Besides tables there are other problems, but they could easily be solved by offering a single-column layout.

Apart from the fact that the Electron PDF service is of no use at all to me (and to many others; PDF export from the browser is no longer as non-standard as it may once have been), because I can simply export the print version to PDF from my browser with the advantage of having custom CSS applied, I appreciate that it provides a quick solution for an issue many users have. However, I think this service does not leave much room for improvement: most things that could be done would be based on CSS, and has there been any progress on Print.css in previous years? Everything else depends on the development of Google Chrome …

With OCG, I think it is the other way round. Regarding an implementation of table support for the LaTeX export: I know about the estimate from the hackathon, and it may well be that there is some specific problem I am not aware of. However, I doubt that it is impossible to implement reasonable support in a reasonable time. Of course it is more challenging than writing all the boilerplate code for connecting to the Electron service: LaTeX has its difficulties, and it would require a thorough examination of which LaTeX table package should be used and in which way (for example, to support multipage tables and variable widths). But that is precisely why I doubt the assessment from the hackathon, where there is not much time for that. That some unexpected usage of CSS would not be supported by the LaTeX export is not an argument against LaTeX, in my opinion.

Best regards

Jkatz (WMF)

@Chricho Thanks for your comments. We are holding this conversation specifically to hear things like this. I am sorry about the miscommunication around the initial Electron launch on de.wikipedia. Essentially, Electron's release forced the WMF to take a long-overdue look at OCG, which hasn't had official support in years.

I don't think it is a done deal that OCG will be turned off. We are looking to see how we can grow the Electron service to provide similar or better options without the difficulty of working with LaTeX, which is quite brittle. The purpose of this consultation is to identify which elements of OCG are most valuable and appreciated by our users so that we can make sure we don't lose them. The reading web team is working over the coming months on improving print.css, and we are definitely interested in knowing specifically what you would like to see. It was one of our primary goals this quarter, but it will likely extend into the April-June time frame. I hear from you now that it is justification, page breaks and formulas. Can I add you to a list of folks we reach out to for more nuanced discussions?

Based on your last paragraph, it sounds like you would like to stick with LaTeX. I can't personally testify to the superiority of other web technologies, but I can say that our engineers seem to think so. I also just spoke to the PediaPress owner: their current layout does not use LaTeX, and it looks pretty good: https://pediapress.com/books/show/2be3644ea1bb2417adc286573d412/ Thoughts? Is there something about LaTeX you feel we could not get elsewhere?

Chricho

@Jkatz (WMF): Thanks for your response! I am not familiar with the latest developments at PediaPress. I would be happy if you added me to the list. Best regards

Ckepper

@Jkatz (WMF) Maybe there was a misunderstanding. The original PediaPress book renderer and the book you linked above are based on LaTeX. For future projects, however, I would recommend using PrinceXML, as it is the only renderer that fully supports the CSS3 Paged Media extension. IMHO this is the best tool available, as it allows developers to work on the print output in an HTML/CSS environment they are already familiar with. The main problem with electron-pdf is that Chrome does not support CSS3 Paged Media. This will become apparent when you try to do any form of "advanced" layout (like a table of contents with correct page numbers).

The problem with PrinceXML is of course that it's proprietary software.

Jkatz (WMF)

@Ckepper Thank you for clarifying, and I apologize for misunderstanding the issue. PediaPress uses LaTeX, just not OCG... right? Assuming we're set on open source (and I think that is a rock-solid assumption), what would you recommend? Using LaTeX? Working with and building on Electron? Something else?

Ckepper

@Jkatz (WMF) Yes, PediaPress uses a custom LaTeX renderer, but I would not recommend this going forward, as our renderer is based on wikimarkup, not HTML. This is a problem because new output rules have to be created for (almost) every MediaWiki extension. As a result, PediaPress supports some WMF projects quite well (e.g. Wikipedia) but completely sucks for others (e.g. Wikisource). As Wikipedia authors usually check their output in a browser, every new effort should be based on the rendered HTML of an article.

Better support for Paged Media in WebKit (or any other major browser) would be a true game changer, but it's probably unlikely to happen (check the related tickets for WebKit, Chromium and Firefox). There are only two active open source projects in this space that I am aware of: WeasyPrint and wkhtmltopdf. They are significantly less powerful than the commercial tools available, but maybe they are good enough or could be improved over time.

Jkatz (WMF)

@Ckepper Great clarification and resources. Thanks!

Reply to "Dropping OCG"