Talk:Reading/Web/PDF Functionality

About this board

About giving feedback

Update (15 July 2019): We’ve launched the new PDF renderer. We're reviewing feedback but so far haven't seen any significant issues. We might incorporate some suggestions, but please note that this is not an ongoing project with continuous development. In other words, now that it has been deployed and proven to work, the new renderer is entering maintenance mode. This page won't be abandoned, but it could take a while before anyone responds, simply because everyone has so much else to do.

In terms of books, we've left it in the hands of volunteer developers and PediaPress. We'll be glad to reach out to them with questions, but we're not planning any involvement in terms of the technical implementation.

No program code from foldings

195.43.90.254
TheDJ

This issue is specific to the Russian Wikipedia; please contact its administrators.

195.43.90.254

Where should I contact them? The link for PDF bugs leads here.

TheDJ

Obstacles to fixing this functionality

DavidMCEddy

What are the obstacles to fixing this functionality? It seems to me like there should be sufficient demand for this feature to justify the cost of fixing it. The fact that it has been broken for so long suggests that there must be more obstacles to fixing it than just the technical issues.

I have a friend with a PhD in history who has written several books. I haven't seen them, but I believe they include substantive documentation to the standards of modern historical research, with figures and tables to make them easier to read and understand. One in particular is a history of w:Robert Campbell (frontiersman)#Sublette and Campbell (1836–1845). He thinks the w:Campbell House Museum in St. Louis might like to sell his book as a fund raiser. He says there is another "Campbell House Museum" in Northern Ireland, where Robert Campbell was born.

I've suggested he consider publishing his books on Wikibooks, where other scholars could potentially improve them.

However, one obstacle is how it could be converted into a PDF and printed. A PDF could be distributed via web sites associated with both Campbell House Museums and other interested organizations. A physical book could be sold to raise money for the museums.

I have other projects for which I'd like to be able to create PDF documents potentially from multiple articles with flexible options for font size, headers and footers.

How can I contact and perhaps join the volunteer developers?[1]

Thanks, @DavidMCEddy

  1. I primarily work in w:R (programming language), but I've written code in many other languages and fixed bugs in languages I don't really know. That's feasible with projects that have good test suites. One of my projects is documented in Ecdat: Data Sets for Econometrics. This uses w:GitHub.
Steelpillow

I presume you mean rendering whole books? This software package doesn't do that any more; it renders single articles only (even if not as well as Firefox does natively).

The official community has gone off the idea of book rendering. Neither the WMF, who organise these things, nor PediaPress, who write the rendering software, is willing to give it any kind of priority any more; it has just drifted for years now. The WMF tried to rewrite it and failed miserably, twice (which is why we only do single articles now), so instead they now say nobody used it any more, so why fix it. Well, of course not: it got so antiquated that we couldn't use it any more and it had to be withdrawn. But that logic is lost on them in favour of acid remarks about statistics, and it's not their fault they can't do system design. PediaPress eventually knocked up a dodgy alpha of a replacement service but then left it to rot because they were too busy elsewhere.

Another guy did create an alternative, but in a coding language the WMF are frightened of, so they won't support it on the grounds that he might do a runner. The logic that a service with one active developer is better than no service at all is quite lost on them.

If you are able to contact the WMF's coding community, establish which languages they feel warm about and do some proper system architecting (I can help a little with that, if only with the occasional "I wouldn't do it ''that'' way after what happened last time"), then we will all love you forever. All code is tracked on Phabricator at https://phabricator.wikimedia.org, but I don't know if that is an official point of contact. Steelpillow (talk) 21:33, 26 December 2020 (UTC)

Dirk Hünniger
Bert Niehaus

At least it is available on the German Wikiversity, in the menu as "Multi Format export". In any case, creating a book tailored to the learner's prerequisites on Wikiversity is helpful, especially when learners add content to the generated book and do not want to expose their private content (i.e. their learning results) to the public. Books are standardized, and an aggregated set of articles can cover the learner's individual interests or add further articles that meet his or her needs. All the best, Bert

Bert Niehaus

Cropping of Infobox text in pdf files

Gfigs

I have logged this bug in a Village Pump (Technical) post here. Gfigs (talk) 07:55, 6 January 2021 (UTC)

Gfigs

This is a display problem with the Infobox template; see Task T271288. Gfigs (talk) 11:42, 6 January 2021 (UTC)

Gfigs

PDF download is greyed out

Gemlog

I did read the warning at the top, but the update said it was working and deployed, so I went ahead and made a book.

I can't choose a download format (greyed out), so I can't download a PDF (also greyed out).

And I can't save my work to my user space at User:Gemlog/Books/ or to https://en.wikipedia.org/w/index.php?title=Special:PrefixIndex&prefix=Book:

Both produce an API error.

[Xb@-swpAIC4AALGGnJMAAAAS] 2019-11-04 06:05:39: Fatal exception of type "ApiUsageException"

I can, of course, give money to PediaPress. That link works perfectly and the books look amazing.

It would be a wonderful thing if the PDF worked like the July 2019 note says, though...

Dirk Hünniger

Any functionality for rendering books or collections to a downloadable format has been decommissioned. All funding for developing a replacement or repairing such functionality has been withdrawn. To put it in the German dialect used by the miners in the area where I live: "Et is im Aasch" (roughly, "it's screwed"). I am trying to develop a free alternative in my spare time, without any funding: https://mediawiki2latex-large.wmflabs.org/ Good luck.

Gemlog

Thank you very much for replying!

The note to the right of this page is extremely misleading, to say the least. Well. Now I know not to bother.

However, I may have just learned of a new tool! So there's that :-)

KDE Neon can't find wb2pdf with apt, but I'll find it.

Thanks again!

Steelpillow

The page PDF renderer has been updated and deployed; the book PDF renderer has been decommissioned. On a Book page this can be misleading, as the "Download as PDF" link only downloads the page and not the whole book. On the other hand, it should not be greyed out, and you should also be able to save your new page to your user pages or the Book: namespace as desired.

If your experience differs from this, can you give more precise details?

Another volunteer is writing a new book PDF renderer and says they will release it as open source for us, but we have been waiting a long time.

Gemlog

Hi,

I pasted the errors I received into the first post I made ;-)

Gemlog

Also, I see that the misleading box on the right of this page that I was referring to is now gone, so... yay :-)

Steelpillow

Still, we need more precise information. I cannot find the book "PDF Download" option you say is greyed out. Can you give the URL of the page you see it on? Or is it the "Download as PDF" Print/export option in the left-hand menu (which is for article download, not whole books)? Was it perhaps in the strange misleading dialog that vanished? If you do not tell us accurately where it is, we cannot diagnose it for you!

Again, when you received the error message you pasted, was this in the Book Creator when you tried to save the book? I just created and saved a new book and it all worked fine. Did you add any extra code to your book, such as chapter headings or meta-information? If you post a list of the articles in your book, I can try to see if it will work for me.

Guentheralex

In Book Creator, there is a "PDF Download" option in a box to the lower right that is greyed out and cannot be used. There is really no simpler way to explain it.

Steelpillow

Do you mean the "Download" box, which offers several formats besides PDF? In English, quote marks indicate exact wording. Yes, as I explained above, that is meant to be greyed out.

Otherwise, please post or email me a screenshot to show the option I am not seeing on my PC.

Guentheralex

In English, superfluous pedantry is insulting. Please insert that in your "Download" box. Thank you.

Steelpillow

My apologies, no insult was intended. I suppose my approach to problem diagnosis is highly pedantic, but I get better results that way. May I take it that you have no remaining problem with this software that needs diagnosing?

Markdown?

Steelpillow

This recent edit to the page suggests that there will be an option to export Wiki markdown instead of PDF. Is this correct? Steelpillow (talk) 17:16, 26 February 2018 (UTC)

Bert Niehaus
Steelpillow

So this appears to be about an alternative way for a client to import and convert raw wikitext from individual articles, which is wholly unrelated to the PDF export service and, as far as I can tell, to markdown as well. Steelpillow (talk) 09:43, 28 February 2018 (UTC)

Bert Niehaus

It just shows an option to create the PDF on the client side, given the problems with PDF generation on the server side. Of course, this workaround also enables export to even more formats. If that is not appropriate as a recommendation in this discussion, excuse me for being off track.

This post was hidden by Steelpillow.
Korriskoso-vnt

Thanks! Understood.

2800:4B0:8002:974E:1:2:2DDD:8891

Congratulations

Bert Niehaus
Bert Niehaus

If you want to create a PDF on the client side, you can read the wiki markup and do the conversion in the browser as the runtime environment, using existing libraries like https://github.com/MrRio/jsPDF. This reduces the load on the server, because only the wiki markup and the embedded media must be transferred to the client. A server-side implementation by Dirk Hünniger is available on Wikimedia Labs at http://mediawiki2latex.wmflabs.org/; it generates the PDF on the server and delivers it to the user. There, the wiki markup is converted into LaTeX (a step that could even be done in the browser), and the costly step in terms of performance is compiling the LaTeX into a PDF. So why not let users perform that final step themselves, if they really want a PDF document? That matters when the online wikibook cannot be used because internet access is limited in remote areas, e.g. when humanitarian organisations want to create a tailored WikiBook for capacity building and need to deploy it offline (see tailored WikiBooks for Risk Mitigation).

Best regards, and many thanks for discussing this topic and for allowing offline use of Wikipedia and Wikiversity content under the CC-BY-SA 3.0 license.
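The post above splits the pipeline into a cheap wikitext-to-LaTeX step (which could run on the client) and an expensive LaTeX-to-PDF step. As a rough illustration of the first step only, here is a minimal Python sketch that handles a tiny, assumed subset of wiki markup: level 2 to 4 headings, '''bold''' and ''italic'', plus escaping of LaTeX special characters. The function names are hypothetical; this is not how mediawiki2latex or jsPDF work, and real wikitext (templates, tables, links, references) needs a proper parser.

```python
import re

# LaTeX special characters and their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}

# Wikitext heading level (number of '=' signs) -> LaTeX sectioning command.
# Only levels 2 to 4 are handled in this sketch.
HEADING_COMMANDS = {2: "section", 3: "subsection", 4: "subsubsection"}
HEADING_RE = re.compile(r"(={2,4})\s*(.+?)\s*\1\s*")


def escape_latex(text: str) -> str:
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def convert_inline(text: str) -> str:
    """Escape the text, then turn '''bold''' and ''italic'' into LaTeX."""
    text = escape_latex(text)
    # Handle bold before italics so ''' is not mistaken for ''.
    text = re.sub(r"'''(.+?)'''", r"\\textbf{\1}", text)
    text = re.sub(r"''(.+?)''", r"\\emph{\1}", text)
    return text


def wikitext_to_latex(wikitext: str) -> str:
    """Convert a tiny subset of wikitext (headings, bold, italics) to LaTeX."""
    out = []
    for line in wikitext.splitlines():
        match = HEADING_RE.fullmatch(line)
        if match:
            command = HEADING_COMMANDS[len(match.group(1))]
            out.append(f"\\{command}{{{convert_inline(match.group(2))}}}")
        else:
            # Ordinary lines pass through; blank lines stay paragraph breaks.
            out.append(convert_inline(line))
    return "\n".join(out)


print(wikitext_to_latex("== History ==\nSublette & Campbell, '''1836''' to ''1845''."))
```

For the sample input this prints `\section{History}` followed by `Sublette \& Campbell, \textbf{1836} to \emph{1845}.` The resulting LaTeX would still need a LaTeX engine (or a client-side substitute) for the final, expensive PDF step.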


The benefits to PediaPress of Going Open Source

MJL

@Ckepper asked about the potential benefits of PediaPress going open source with this project in this thread. I wanted to give them some good takeaways to bring back to their company as it relates to this specific project.

Commercialization of a given project is an important reason to want to keep it closed source. However, I believe remaining closed source would impede the long-term success of the renderer. As things stand, I do not think it will be as successful as Extension:Collection because, for starters, it would not be free to install for most. Small wikis would not be able to afford much in licensing. Open source also instills a kind of trust that any large company, nonprofit, or individual can rely on. It shows that you are confident enough in your product to show it to the world in its most basic form: code.

On a different note, as reported on the company's website, "[you] offer consulting, customization, and support for advanced document transformation solutions." This is no small thing, and I am confident in that business model. If, however, you believe otherwise, there are still ways to protect copyright without going closed source. Here I would look to Chromium for guidance on a potential path. Not every one of your ideas needs to be included in an open-source repository, so you can still keep the parts you want secret to yourselves.

The principal rendering service should, however, be available to the public for bug testing and the like. It's a win-win. Consulting and customization are where the real money is anyway. You could also branch into hosting this rendering service for others, similar to how you already offer print-on-demand books to any MediaWiki wiki. Wikis will always need to pay for this if they want the product beyond what is already offered out there.

Finally, it is a strong selling point for a company with such strong ties to the open-source movement! I hope this helps you make the right decision on this matter.

Ckepper

Thank you for your comment. After talking with colleagues and other stakeholders, we have made the decision to release mwlib.html as open source when the project is sufficiently mature. This should help to ensure its long-term viability.

Also, I enabled the new render server so that rendering on https://pediapress.com/collector should work again (and be more stable).

MJL

That's awesome news! Major thanks goes out to your organisation for its willingness to do that. If there is anything you all need from the community (like press releases*, bug testing, etc.) please reach out! I just tried the collector on Simple:Spooky Scary Skeletons, and I think it really looks great!! Very elegant! :D

*I run Wikisource News (en) now, so I can help with publishing and writing it!

Ckepper

I have added Wikibooks (en) and Wikisource (en) to the test renderer. The output is still far from perfect but PediaPress was never able to generate PDFs from those sites before.

Helmoony

Hi @Ckepper, is it possible for you to add Wikipedia (ar), Wikibooks (ar) and Wikisource (ar)? It's going to be a good test for right-to-left issues.

Ckepper

Hi @Helmoony, I have added Wikipedia (ar). A few years ago (for Wikimania Haifa 2011) we created an RTL export with our old PDF renderer, and that was really painful, especially since no one on our team knew Hebrew. You can start playing around with the export, but this is definitely not a priority for us right now.

Helmoony

Thank you, Ckepper. I tested the version and it's not working well. When it doesn't show "Failed to load PDF document.", the main errors are: the text should be aligned from the right; Wikidata-based infoboxes are not showing Wikidata data, including the OSM-based map; and some terms need to be translated (e.g. Image Sources, Licenses and Contributors). But at least we know what we need to do now.

MarkAHershberger

The render worked for me just now for arwiki. There are still some RTL issues, but I didn't see the "Failed to load" issue.

Is the source available so that we can contribute?

Ckepper

Not yet, I'd like to clean it up a little bit before making it available.

MarkAHershberger

I hope you can release it soon. The book functionality is needed! Thank you for your quick reply!

Ckepper

I hear you. Maybe I'll skip the full cleanup so I can publish it sooner.

MarkAHershberger

That would be awesome! Ugly code that works is better than no code.

MarkAHershberger

There are, among those of us who use MediaWiki to run KM systems outside of Wikipedia, some absolutely essential extensions whose code is hideous.

I'm glad you want clean code, but I would hope that you can release the code as soon as possible and then clean up the code later.

Steelpillow

Yes, absolutely. A buggy alpha release v0.01 is better than no release at all. Thank you so much for keeping on with this work.

Charis

@Steelpillow, I agree completely. Is there any progress? As of April 11, 2020, opening the Book Creator shows: "Due to severe issues with our existing system, the Book Creator will no longer support saving a book as a PDF." A collaborative work "will always remain freely distributable and reproducible" only if I can export it into another free file format, such as the most common book formats, PDF or ODT.

Steelpillow

@Charis I have not been following progress lately. There is a test server at https://pediapress.com/collector/ which you can try. Otherwise, Ckepper is the best one to ask, as they have been the voice of PediaPress here.

Peculiar Investor

I also posted this on Extension talk:Collection, but the failure page when trying to Download to PDF on my wiki lands here, so I'm cross-posting.

Our wiki is running MediaWiki 1.31.7 with Collection 1.7.0 (af3a0b8) 14:23, 15 April 2018. Download as PDF constantly fails and directs the user to Reading/Web/PDF Functionality, which doesn't specifically address the reason for the "Book rendering failed" error. Reading through Talk:Reading/Web/PDF Functionality doesn't clear up the situation much either. It does seem to indicate that there is a new render server available at https://pediapress.com/collector, but that doesn't seem to work for non-Wikipedia sites. The existing render server https://tools.pediapress.com/mw-serve/ does still seem to be active.

Is the functionality via this extension dead for low-traffic sites that don't need, or cannot install (e.g. on shared hosting), their own PDF server?

Dirk Hünniger

My mediawiki2latex package (GPL) is now in Ubuntu 20.04. PDF generation seems to work fine. I also run my own rendering server, which also works with non-Wikimedia sites.

https://mediawiki2latex.wmflabs.org/

Dirk Hünniger
Peculiar Investor

I'm still confused, sorry, because that doesn't seem to agree with Extension:Collection which shows

MediaWiki 1.34+

as does Special:Version both here and on Wikipedia, both of which are running on

MediaWiki 1.35.0-wmf.30 (6d5d990)

12:06, 4 May 2020

Reading this discussion alongside Extension:Collection and its associated talk page doesn't clarify the status of the extension or, more importantly, whether there is a render server that low-traffic wikis can use so that the Download to PDF functionality works.

Steelpillow

As ever, there is confusion between the collection extension or Book Creator and the rendering service. The old rendering service, the Offline Content Generator, has been pulled and the promised PediaPress replacement interminably delayed. Development of the collection extension/Book Creator also stopped, but it remains in use. It still generates a trickle of bug reports and issues, so periodically gets looked at to see if anything can be fixed. But this is pure volunteer effort and there seem to be no low-hanging fruit any more. Hope this helps.

Dirk Hünniger

What else is there:

pandoc: also GPL, but might require a Lua or Haskell programmer to make it work for your case.

BlueSpice: from 2900 EUR per year.


mediawiki2latex server now parallel

Dirk Hünniger
Steelpillow

Can you clarify, do you mean that the two servers can run in parallel or that each server can run multiple conversion requests in parallel?

Dirk Hünniger

The large server can run up to two requests in parallel; the normal one can run up to four.
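The thread does not say how the servers enforce these limits. One common way to cap concurrent work is a counting semaphore, sketched below in Python under that assumption. `ConversionServer`, `convert`, `run_jobs` and the `time.sleep` standing in for the LaTeX/PDF work are all hypothetical names for illustration, not mediawiki2latex code. Each server admits at most `max_parallel` conversions at once (2 for a "large" instance, 4 for a "normal" one); further requests simply wait for a free slot.

```python
import threading
import time


class ConversionServer:
    """Hypothetical server that admits at most max_parallel conversions at once."""

    def __init__(self, max_parallel: int) -> None:
        self._slots = threading.BoundedSemaphore(max_parallel)
        self._lock = threading.Lock()
        self.active = 0  # conversions currently running
        self.peak = 0    # highest number of simultaneous conversions seen

    def convert(self, job_id: int) -> str:
        # Blocks until one of the max_parallel slots is free.
        with self._slots:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                time.sleep(0.05)  # stand-in for the real LaTeX/PDF work
                return f"job-{job_id}.pdf"
            finally:
                with self._lock:
                    self.active -= 1


def run_jobs(server: ConversionServer, n_jobs: int) -> list[str]:
    """Submit n_jobs requests concurrently and collect results in job order."""
    results: list[str] = [""] * n_jobs

    def worker(i: int) -> None:
        results[i] = server.convert(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


large = ConversionServer(max_parallel=2)
normal = ConversionServer(max_parallel=4)
run_jobs(large, 6)
run_jobs(normal, 6)
print(large.peak, normal.peak)
```

The design choice here is to queue excess requests rather than reject them; a real service might instead return a "busy" response once all slots are taken.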


Including Capability for right-to-left Languages (Arabic)

Sky xe

Hello team,

I tried to create a book by exporting Arabic wiki pages using the PediaPress preview, as it is currently not possible to download directly as PDF due to a bug. However, I noticed that the whole content is aligned left-to-right, although it comes from a right-to-left language (Arabic). Would it be possible to take that into account and use the "open-right" function as well (if not done yet)?

Many thanks


book generator removed on Wikipedia

Dirk Hünniger
Steelpillow

It has not been removed, but user interface links to it have been and notices with incorrect statements added to many pages.


mediawiki2latex mass production / testing

Dirk Hünniger

Hi,

I am currently running a test on all community-maintained books on the English Wikipedia, approximately 5000 in total. So far I have 283 PDF files. In 20 cases no PDF was produced, and some PDFs are more than 4000 pages long. If anybody can provide web space, I will happily upload them. We could later link to them from the Book namespace in Wikipedia.

Yours Dirk

Dirk Hünniger
Dirk Hünniger
Dirk Hünniger
Dirk Hünniger