Talk:Reading/Web/PDF Rendering

About this board

Johan (WMF) (talkcontribs)

Hey everyone, we are taking everything that has been written here into account, but so that others can see your discussion, please go to Talk:Reading/Web/PDF Functionality and write there instead. Sorry for the confusion, but enough had changed that we felt it made sense to create a new page.

Reply to "Discussion"
Maps & Infoboxes

Ceever (talkcontribs)

The overview maps and infoboxes (like Visa Regulation) do not work with the current PDF renderer.

Thus:

1.) We need an overview map including the GPS listings of that article, like Lonely Planet does.

2.) Infoboxes, Cautionboxes, Climateboxes, etc. (i.e. mbox templates) should be rendered properly and not get lost along the way.

Malyacko (talkcontribs)

Thanks for the feedback! Could you please provide a link where to see that problem?

109.46.0.148 (talkcontribs)

Hi there.

Just export WikiVoyage:Israel as a PDF and you won't find the Visa regulation box that is in the original article.

Furthermore, if you export WikiVoyage:Eilat, the map embedded in the original article just leaves a weird one-liner.

Cheers, Ceever

Reply to "Maps & Infoboxes"
RTL rendering

Redaktor (talkcontribs)

How does one get the renderer to respect RTL languages? It works for hewiki but not for yiwiki.

Elitre (WMF) (talkcontribs)
Jkatz (WMF) (talkcontribs)
Reply to "RTL rendering"

Book creation vs PDF rendering vs printable version

Afernand74 (talkcontribs)

I am not a tech guy nor a UI specialist, but I don't see why WP should have a PDF export that looks like the printed version of a page and a "printable version" of a page that doesn't look like a printed page. With Chrome you eventually get the same result, but not with Firefox, Brave or other browsers. Shouldn't the printable version of a page automatically generate and offer its PDF version?

Moreover, creating human-readable books obviously requires specific tools that go beyond simply rendering each page as a PDF and then collating them. I would separate these two topics.

Jkatz (WMF) (talkcontribs)

@Afernand74 Thanks for your input. I'm not sure I follow what you're saying here, but I'll interpret it as best I can, so let me know if I got anything wrong.

  1. I think you're saying that our "printable version" link is useless. I agree. Most people know how to print from the browser now, but for those who can't I would like to provide an icon that just starts the browser printing.
  2. The same goes for PDF generation... an icon is the most I think we should have, except for the "old" OCG-created rendering, which looks visually different. Most people use the browser, and the vast majority of people don't want to think about it.

As to breaking out the conversation, they are tied mostly because they currently rely on the same technology. For this reason, we can't have two conversations that end on different conclusions. Any thoughts on how we can untangle this are most welcome. Another challenge I have is ensuring we are hearing from potential big users like wikibooks and wikiversity...we have posted on their pumps, but haven't heard anything. Just throwing my problems at you now ;)

Afernand74 (talkcontribs)

Hi @Jkatz (WMF). Sorry for being cryptic. I think you grasped most of it ;-) I agree the "printable version" link is useless. It is a pity one cannot selectively format what is sent to the printer, but starting the browser printing will be sufficient for most users. For PDF generation, a PDF button will do great.

If you implement these two things, why not keep OCG alive exclusively for book creation until a better solution is developed? Why not just remove the "book creation" link from the "print/export" section and move it somewhere else? Sometimes, less is more.

Jkatz (WMF) (talkcontribs)

Thanks for these thoughts, @Afernand74. Both points in your first option are definitely things we are planning to do, though right now we are trying to hone in on what "better" means. If books now have tables but only one column, is that "better"? Based on other comments on this talk page, it probably is not. As for your second suggestion, that is very interesting and might be a way to keep core users happy and limit the operational support while we figure out a new solution. Others and I really do think that books or reading lists (as most users seem to use the tool) are a great feature space that deserves a better implementation.

Reply to "Book creation vs PDF rendering vs printable version"
Dropping OCG

Chricho (talkcontribs)

Hi Melamrawy! I am wondering how the decision was made to drop OCG in the future. The new PDF export feature was implemented following a feature request on the German Wikipedia, and it was assured here (https://de.wikipedia.org/w/index.php?diff=156634101) that OCG would be kept, especially for articles without tables. Best regards

AKlapper (WMF) (talkcontribs)

Hmm, I don't see in that provided link where anyone states "that OCG would be kept"?

Chricho (talkcontribs)
(Quoted from the German discussion, translated:) "Note that the new version is to be offered in parallel with the Latex PDF version. Not all articles have tables, and the Latex rendering is visually more elegant. Both alternatives would then have to be labeled accordingly on the download page and briefly explained."

It states just that.

197.218.80.182 (talkcontribs)

They'll be offered in parallel until the ElectronPDF is deemed stable enough to safely kill OCG. See and .

Chricho (talkcontribs)

I am really disappointed by this tone, where references to general procedures (“parallel running”) are given instead of discussing the issue, and where decisions that still have to be made are presented as technical matters of course. The question is why the mere server-side PDF conversion of the well-known print version is considered a replacement for OCG (to quote your Wikipedia article: “a new system is needed to be installed to replace the old one”), contrary to previous consultation with the community. The question is why the new reading committee nevertheless claims to speak for the needs of the community on this page.

AKlapper (WMF) (talkcontribs)

What "new reading committee" are you referring to?

Chricho (talkcontribs)

Reading. It seems it is not that new, but I was told that it only recently became responsible for the PDF export.

Jkatz (WMF) (talkcontribs)

Yep, that is us. We are acting at the behest of the WMF Operations team, which feels that we cannot create yet-another-service without exploring how we can consolidate further. This is not driven by "reader" needs, but in exploring this, we are looking to see if we can take steps to improve the service for everyone. This is one reason we are hoping to move from OCG, which had serious scalability and reliability issues.

197.218.81.64 (talkcontribs)

Err, the second link points to how it was decided. You mentioned the link that indicated running things in "parallel", and a link was added showing that running two similar systems in parallel is almost always a temporary measure.

To put it simply, OCG was built upon Latex, which is a fragile language, and like every product, there comes a time when it reaches its natural end of life, when the cost of maintaining it is higher than the cost of building a new tool. Here's a list of OCG-related bug reports, if you're interested:

https://phabricator.wikimedia.org/search/query/7gTKLlPDg4qW/#R

If you want a longer explanation, you can see it here:

https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Offline_Content_Generator&oldid=734287226

Ultimately, both OCG and the "mere" server-side PDF are just convenience tools. Browsers can easily generate PDFs without any server-side tool, and all the text in a page can easily be copied and pasted into a word processor or text editor and changed as needed.

Chricho (talkcontribs)

No, the link does not indicate in any way that this was about temporarily running things in parallel to ensure stability; it clearly states that it was about keeping OCG to retain its associated benefits. I faithfully gave an account of that in English. Neither your technocratic language, nor your business slang, nor any vitalist metaphor changes anything about that. Who are you speaking for, by the way?

Melamrawy (WMF) (talkcontribs)

Hello @Chricho, thanks for your comments. Which exact features do you want to preserve by keeping OCG, and which of them would be lost once Electron is running? Thanks again.

Chricho (talkcontribs)

Hi Melamrawy! It is the typographic quality. Thanks

Melamrawy (WMF) (talkcontribs)

Thanks. Typographic quality in what sense? Can you please elaborate? Also, what do you think of the other considerations mentioned about moving to a new service? Does it sound like a worthwhile change? Thanks.

Chricho (talkcontribs)

Hi!

The typographic advantages of OCG can be seen in the justification and everything that goes with it (page breaks …), which TeX handles nicely. Furthermore, embedded formulas look better. Besides tables, there are other problems, but they could easily be solved by offering a single-column layout.

The Electron PDF service is of no use at all to me (and to many others; exporting a PDF from the browser is no longer as non-standard as it may once have been), because I can simply export the print version to PDF from my browser, with the advantage of having custom CSS applied. Still, I appreciate that it provides a quick solution for an issue many users have. However, I think this service leaves little room for improvement: most things that could be done would be based on CSS, and has there been any progress on Print.css in recent years? Everything else depends on the development of Google Chrome …

With OCG, I think the opposite is true. Regarding an implementation of table support for the Latex export: I know about the estimate from the hackathon, and it might well be that there is some specific problem I am not aware of. However, I doubt that it is impossible to implement reasonable support in reasonable time. Of course it is more challenging than writing all the boilerplate code for connecting to the Electron service. Latex has its difficulties, and it would require a thorough examination of which Latex table package should be used and in which way (for example, to support multipage tables and variable widths). That is precisely why I doubt the assessment from the hackathon, where there is not much time for that. In my opinion, the fact that some unexpected usage of CSS would not be supported by the Latex export is not an argument against Latex.

Best regards
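To make the point about multipage tables and variable widths more concrete, here is a minimal Python sketch of how a Latex export might emit a table. It uses the `longtable` package, which can break a table across pages and repeat the header via `\endhead`, together with `p{…}` columns whose widths are proportional to their longest cell.

This is a hypothetical illustration under stated assumptions, not how OCG or PediaPress actually handled tables:

- The function names are invented.
- The width heuristic is invented.
- Cells are assumed to arrive as plain strings; real table HTML with spans and nested markup would need much more work.

```python
# Hypothetical sketch (not OCG code): emit Latex `longtable` source for a
# table given as a list of rows of plain-text cells.

LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape characters that have special meaning in Latex, one char at a time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def column_widths(rows: list[list[str]], total: float = 0.9) -> list[float]:
    """Split `total` (a fraction of \\linewidth) across the columns in
    proportion to the longest cell in each column (an assumed heuristic)."""
    ncols = max(len(r) for r in rows)
    longest = [max((len(r[i]) if i < len(r) else 0) for r in rows) for i in range(ncols)]
    longest = [max(n, 1) for n in longest]  # avoid zero-width columns
    s = sum(longest)
    return [round(total * n / s, 3) for n in longest]


def rows_to_longtable(rows: list[list[str]]) -> str:
    """Render rows as a longtable; the first row is the header, which
    longtable repeats at the top of every page because of \\endhead."""
    widths = column_widths(rows)
    spec = "".join(f"p{{{w}\\linewidth}}" for w in widths)
    ncols = len(widths)

    def line(row: list[str]) -> str:
        # Pad short rows with empty cells so every line has ncols cells.
        cells = [escape_latex(c) for c in row] + [""] * (ncols - len(row))
        return " & ".join(cells) + r" \\"

    header, *body = rows
    out = [f"\\begin{{longtable}}{{{spec}}}", r"\hline", line(header), r"\hline", r"\endhead"]
    out += [line(r) for r in body]
    out += [r"\hline", r"\end{longtable}"]
    return "\n".join(out)
```

For example, `rows_to_longtable([["Name", "Cost"], ["Hostel & bar", "10%"]])` produces a two-column `longtable` with the `&` and `%` escaped. Proportional `p` columns are only one possible answer to the "variable widths" question. Packages such as `tabularx` or `ltablex` take other approaches, and choosing between them is exactly the kind of examination described above.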

Jkatz (WMF) (talkcontribs)

@Chricho Thanks for your comments. We are holding this conversation specifically to hear things like this. I am sorry about the miscommunication around the initial Electron launch on de.wikipedia. Essentially, Electron's release forced the WMF to take a long-overdue look at OCG, which hasn't had official support in years.

I don't think it is a done deal that OCG will be turned off, we are looking to see how we can grow the electron service to provide similar or better options without the difficulty of working with Latex, which is quite brittle. The purpose of this consultation is to identify what elements of OCG are most valuable and appreciated by our users so that we can make sure we don't lose them. The reading web team is working over the coming months on improving the print.css and we are definitely interested in knowing specifically what you would like to see. It was one of our primary goals this quarter, but will likely extend into the April-June time frame. I hear from you now that it is justification, page breaks and formulas. But, can I add you to a list of folks we reach out to for more nuanced discussions?

Based on your last paragraph, it sounds like you would like to stick with Latex. I can't personally testify to the superiority of other web technologies, but I can say that our engineers seem to think so, and I just spoke to the Pediapress owner and their current layout does not use Latex and it looks pretty good: https://pediapress.com/books/show/2be3644ea1bb2417adc286573d412/ Thoughts? Is there something about latex you feel we could not get elsewhere?

Chricho (talkcontribs)

@Jkatz (WMF): Thanks for your response! I am not into the latest development of Pediapress. I would be happy if you add me to the list. Best regards

Ckepper (talkcontribs)

@Jkatz (WMF) Maybe there was a misunderstanding. The original PediaPress book renderer and the book you linked above are based on Latex. For future projects however, I would recommend using PrinceXML as it is the only renderer that fully supports the CSS3 Paged Media extension. IMHO this is the best tool available as it allows developers to work on the print output in a HTML/CSS environment they are already familiar with. The main problem of electron-pdf is that Chrome does not support CSS3 Paged Media. This will become apparent when you try to do any form of "advanced" layout stuff (like a table of contents with correct page numbers).

The problem with PrinceXML is of course that it's proprietary software.

Jkatz (WMF) (talkcontribs)

@Ckepper thank you for clarifying and I apologize for misunderstanding the issue. Pediapress uses Latex, just not OCG....right? Assuming we're set on open source (and I think that is a rock-solid assumption), what would you recommend? Using Latex? Working with and building off of Electron? Something else?

Ckepper (talkcontribs)

@Jkatz (WMF) Yes, PediaPress uses a custom Latex renderer, but I would not recommend this going forward, as our renderer is based on Wikimarkup, not HTML. This is a problem because new output rules have to be created for (almost) every MediaWiki extension. As a result, PediaPress supports some WMF projects quite well (e.g. Wikipedia) but completely sucks for others (e.g. Wikisource). As Wikipedia authors usually check their output in a browser, every new effort should be based on the rendered HTML of an article.

Better support for Paged Media in Webkit (or any other major browser) would be a true game changer, but it's probably unlikely that this is going to happen (check related tickets for Webkit, Chromium and Firefox). There are only two active Open Source projects in this space that I am aware of: Weasyprint and Wkhtmltopdf. They are significantly less powerful than the commercial tools available but maybe they are good enough or could be improved over time.

Jkatz (WMF) (talkcontribs)

@Ckepper Great clarification and resources. Thanks!

Reply to "Dropping OCG"

Wikivoyage implications

LtPowers (talkcontribs)

As a Wikivoyage editor, I think we would be disappointed to see most of the features listed under "Implications" lost. We maintain printability as a major goal of the site, so features that make the PDF look more like a professionally printed book and less like a printed-out webpage are highly valued. Also, as a travel site, we expect our users who print our guides to want a portable format, meaning something quite a bit smaller than A4.

Speaking personally, I think features like tables of contents and indices are extremely useful and it would be nice to retain the former and add the latter.

LtPowers (talkcontribs)

As an aside, Wikivoyage's predecessor site did in fact have a series of guides printed and sold as part of a separate for-profit business that licensed the name. As a former editor of one of those guides, I know of a number of features of our wiki-to-print conversion that those of us at Wikivoyage would continue to find useful.

  • Custom chapters
    • The book comprised multiple chapters, each of which corresponded to a different travel guide article. The table of contents included second-level headings from the articles.
  • Automatic index generation
    • The program would parse the wikitext for bolded words and phrases and automatically place them into the index, with page numbers. At my request, the program was modified to allow custom index entries using a dummy template inserted into the wikitext.
  • Image formatting
    • Images from the article had a variety of formatting options, which were specified as dummy parameters in the wikitext invocation. Since the books were printed in a two-column format, images could be placed inline in a single column, spanning two columns at the top or bottom of the page, above the chapter title as the lead image, two per page on a dedicated image page, full-page, or spanning two full pages. Images could be rotated if necessary to fit the book format.
  • Web-only or print-only content
    • Content, including images, could be designated web-only (using a special template that just passed the parameter through unchanged) or print-only (by using HTML comments).
  • Page breaks
    • Manual page breaks could be entered for layout purposes. The software also kept headings with their following text rather than orphaning them on a previous page or column.
  • Automatic page headings
    • The chapter title and current section appeared at the outside top corner of each page (except the first page of a chapter).
  • Automatic hyphenation

You can read more about it at voy:Wikivoyage:Wikitravel Press. This blog post is also instructive, though it contains few technical details.
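The automatic index generation described above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the Wikitravel Press software, and it rests on several assumptions:

- The book has already been paginated into a list of per-page wikitext strings.
- Bold is the wikitext `'''…'''` markup. Bold phrases containing apostrophes and bold-italic markup are deliberately ignored to keep the pattern simple.
- Custom entries use an invented `{{index|…}}` dummy template, since the real template's name is not given here.

```python
import re
from collections import defaultdict

# Assumed patterns: '''bold''' (no apostrophes inside) and a hypothetical
# {{index|Term}} dummy template for custom index entries.
BOLD = re.compile(r"'''([^'\n]+)'''")
INDEX_TEMPLATE = re.compile(r"\{\{index\|([^}|]+)\}\}")


def build_index(pages: list[str]) -> dict[str, list[int]]:
    """Map each index term to the sorted, de-duplicated 1-based page numbers
    on which it appears in bold or in an {{index|...}} template."""
    index: dict[str, set[int]] = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        for pattern in (BOLD, INDEX_TEMPLATE):
            for match in pattern.finditer(text):
                term = match.group(1).strip()
                if term:
                    index[term].add(page_no)
    # Sort terms case-insensitively, as a printed index would list them.
    return {t: sorted(p) for t, p in sorted(index.items(), key=lambda kv: kv[0].casefold())}


def format_index(index: dict[str, list[int]]) -> list[str]:
    """Render entries as 'Term, 3, 7' lines."""
    return [f"{term}, {', '.join(map(str, nums))}" for term, nums in index.items()]
```

For example, given the pages `"The '''Old City''' is walled."`, `"Visit the '''Old City''' and {{index|Western Wall}}."` and `"'''Eilat''' is on the Red Sea."`, `format_index(build_index(pages))` yields `["Eilat, 3", "Old City, 1, 2", "Western Wall, 2"]`. The hard part in a real pipeline is knowing which page each term lands on. That requires running the index pass after layout, which is one reason a renderer with proper paged-media support matters.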

Jkatz (WMF) (talkcontribs)

@LtPowers Thanks for this! It is very useful. I will be sure to add them to the list of desired features (in progress). Which of these are live currently?

LtPowers (talkcontribs)

@Jkatz (WMF), I'm not sure what you mean by "live". Wikitravel Press is no longer a going concern and so the software that formatted the Wiki guides for print is no longer running.

Jkatz (WMF) (talkcontribs)

Thanks, that answers my question. By "live" I meant "running". #jargon

LtPowers (talkcontribs)

Oh, I know what "live" means. Perhaps I should have queried what you meant by "these". ;)

Reply to "Wikivoyage implications"

Some feedback/requests

AKlapper (WMF) (talkcontribs)

Comments on https://www.mediawiki.org/w/index.php?title=Reading/Web/PDF_Rendering&oldid=2369563 : Images in "Output examples" seem to have wrong captions? Under "Current status", "Mediawiki has deployed changed to the how printing PDF works" has a typo, and could there please be a link to that "deployed change"? Is that T142226? If "New PDF styling is available for review above", how and where can one review or enable that "new styling"? Could captions in general make clear whether they depict the old PDF rendering, the new PDF rendering, or the MediaWiki rendering?

Melamrawy (WMF) (talkcontribs)

I was copy-pasting the code for the gallery part, and it looks like a Sunday-morning glitch that I didn't change the captions. Thank you so much for reviewing, and for the heads-up :)

Reply to "Some feedback/requests"