Extension talk:PdfHandler


0 × 0 pixel

School4schools (talkcontribs)

I have all the required packages installed (on AlmaLinux v8.9.0), including poppler-utils.

Software:

MediaWiki  1.41.0
PHP        8.1.27 (fpm-fcgi)
ICU        69.1
MariaDB    10.6.17-MariaDB
Lua        5.1.5
Pygments   2.16.1

Running "which gs convert pdfinfo pdftotext" returns:

/bin/gs
/bin/convert
/bin/pdfinfo
/bin/pdftotext

Uploaded PDFs still show "0 x 0 px". However, on the server, running

/pdfinfo "path/filename.pdf"

reports the page size correctly ("612 x 792 pts (letter)").

I have run the suggested maintenance scripts ("refreshImageMetadata.php -f" and "rebuildImages.php").

What am I doing wrong?

School4schools (talkcontribs)

I finally resolved this issue after a lot of frustration. It turns out the problem was that the default paths didn't work:

$wgPdfProcessor (default = "gs")
    path to your ghostscript implementation
$wgPdfPostProcessor (default = "convert")
    path to your imagemagick convert
$wgPdfInfo (default = "pdfinfo")
    path to your pdfinfo
$wgPdftoText (default = "pdftotext")
    path to your pdftotext

I'm on Linux (cPanel) and installed Ghostscript and ImageMagick through the standard distro package system (xpdf is another story; contact me if you're stuck on it). It finally worked when I added the full paths to them in LocalSettings.php:

$wgPdfProcessor = '/usr/bin/gs';
$wgPdfPostProcessor = '/usr/bin/convert';
$wgPdfInfo = '/usr/bin/pdfinfo';
$wgPdftoText = '/usr/bin/pdftotext';

These paths are relative to the server's root directory, which I reached through WHM. That directory is not available on a shared-hosting cPanel account, which only has its own "account" home.
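
As a rough sketch of how the absolute paths for LocalSettings.php could be collected, the shell function below asks the current shell where each of the four tools lives and prints matching assignments. The function name print_pdfhandler_paths is made up for this example, and it assumes the web server user resolves the same PATH as the login shell, which may not hold on every host.

```shell
# Print LocalSettings.php assignments using whatever absolute paths this
# shell's PATH resolves to. Missing tools produce a comment line instead.
print_pdfhandler_paths() {
    for pair in \
        'wgPdfProcessor:gs' \
        'wgPdfPostProcessor:convert' \
        'wgPdfInfo:pdfinfo' \
        'wgPdftoText:pdftotext'
    do
        var=${pair%%:*}     # MediaWiki setting name, e.g. wgPdfInfo
        tool=${pair#*:}     # binary to look up, e.g. pdfinfo
        if path=$(command -v "$tool" 2>/dev/null); then
            printf "\$%s = '%s';\n" "$var" "$path"
        else
            printf "# %s not found on PATH; \$%s left unset\n" "$tool" "$var"
        fi
    done
}
```

Running print_pdfhandler_paths over SSH prints one line per setting; any line that starts with # points at a tool that still needs to be installed or located by hand.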

Please contact me if you're stuck w/ this (or xpdf-utils), as it can be very frustrating trying to figure out the assumed technical knowledge in instructions.

Johnfmarko (talkcontribs)

Yes, I'm totally stuck with a similar problem. Everything looks OK (the binaries are there and Special:Version shows the PdfHandler extension installed). I have tried your explicit-path solution, but PDFs don't display inline. With debug messages enabled, I see a message to the effect that the particular PDF I am trying to display can't be displayed inline. I don't suppose you have a "test.pdf" that is guaranteed to display inline? Or an idea of why some PDFs are ineligible for display? TY!

School4schools (talkcontribs)

Wish I could help you with that. For me the problem was about the paths, which had to be set at server root and not account root. Make sure you're setting it there, and if you don't have access to server root you'll have to get help from your host. Feel free to contact me. You can find me via my user name on the www. Good luck!

Reply to ""0 × 0 pixel""

Windows fix after TomRamm's and Mwgbell's fix

Checkitthrice8 (talkcontribs)

After applying both of their fixes, I was closer, but not all the way there. When uploading a PDF, I could see the image, but still got the 0x0. One thing that I found helpful was to look at the image table in MySQL. If img_width, img_height and img_metadata aren't getting set for your PDFs, you're not going to see thumbnails. SQL for this:

-- "blob" appears to be the poster's database name; substitute your own.
SELECT * FROM blob.image WHERE img_minor_mime = 'pdf';

The problem I had was in PdfImage.php, when it tried to create the metadata. It was faulting, and therefore never got to the point of writing these fields. For me the problem is in the function convertDumpToArray($metaDump, $infoDump). The arguments are the outputs from the command calls to pdfinfo.exe.

It's expecting $metaDump to be XML, but it starts out with some key->value pairs. The last pair is a key value pair of Metadata: and then the start of the xml. Here's an example from one of my pdfs:

Creator:        ARTS Import
Producer:       PDFlib 4.0.1 (Win32)
CreationDate:   Thu Aug  1 11:37:01 2002
ModDate:        Thu Aug  1 11:57:11 2002

...

JavaScript:     no
PDF version:    1.3
Metadata:
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d' bytes='853'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:iX='http://ns.adobe.com/iX/1.0/'>

...

This is obviously not going to make an xml parser happy. It's possible that I'm using a different version of pdfinfo.exe than this code was expecting. To see if my fix will help you, grab a pdf file and run something like this from the command line:

C:\bin\xpdf-tools-win-4.05\pdfinfo.exe -enc UTF-8 -meta file.pdf > meta

If the resulting meta file has the key->value pairs at the top like mine, you'll need this fix.
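
For anyone with a Unix-like shell available (Git Bash or WSL on Windows, for instance), the check described above could be scripted roughly as follows. The function name meta_has_leading_pairs is invented for this sketch; it only looks at the first non-blank line of a saved dump and makes no attempt to validate the XML.

```shell
# Succeeds (exit 0) when the first non-blank line of a saved `pdfinfo -meta`
# dump is a "Key: value" pair rather than the start of the XML packet.
meta_has_leading_pairs() {
    first=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
    case $first in
        '<'*) return 1 ;;   # dump starts directly with XML
        *:*)  return 0 ;;   # dump starts with key/value pairs
        *)    return 1 ;;   # empty or unrecognised dump
    esac
}
```

After writing the dump to a file named meta as shown above, `meta_has_leading_pairs meta && echo "key/value pairs come first"` reports whether the dump has the mixed layout.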

In extensions/PdfHandler/includes/PdfImage.php, there's a line (in my file at line 182):

$lines = explode( "\n", $infoDump );

change to

$lines = explode( "\n", $metaDump );

and comment out lines 216 to 219:

$metaDump = trim( $metaDump );
if ( $metaDump !== '' ) {
	$data['xmp'] = $metaDump;
}

so that they read:

//$metaDump = trim( $metaDump );
//if ( $metaDump !== '' ) {
//	$data['xmp'] = $metaDump;
//}

Oddly, the code was kind of built to accept the data with key/value pairs and xml mixed. I wonder if this got changed to support a slightly different package. This fix ignores the key/value pairs from $infoDump, which seem to be a duplicate of the ones in $metaDump, except for a list of the size of every single page. I don't know why that would be important to include, but if you think that would be nice, you could change the first part of the fix to:

$lines = explode( "\n", $infoDump );
$lines2 = explode("\n", $metaDump );
$lines = [...$lines, ...$lines2];

One last thing: I don't think that TomRamm's fix is thread-safe, but I'm not sure. If multiple calls happen simultaneously, I think they'll end up stomping on each other's meta and page output files. It seemed like that's what was happening when I ran the maintenance scripts to create the thumbs I was missing. No worries, though; after a few runs they were all good.

Reply to "Windows fix after TomRamm's and Mwgbell's fix"

Previous/Next Page functionality

47.186.29.164 (talkcontribs)

On the file upload page for the PDF I see navigation buttons for next/previous pages, but I see no such navigation buttons on the page where the file is displayed. What am I doing wrong?

School4schools (talkcontribs)

Any progress on this?

I would also need to be able to increase the size of the "JPG preview of this PDF file" (it shows on my installation at approx. 450 x 600 px, which is too small to be readable). I have increased $wgPdfHandlerJpegQuality to '150', but that doesn't change anything (tested with new uploads).

Reply to "Previous/Next Page functionality"

/bin/bash: Permission denied

Bcmpinc (talkcontribs)

I'm getting an error during thumbnail generation. This is the error when viewing the file page (translated from Dutch): Error creating thumbnail: sh: 1: /bin/bash: Permission denied

Something goes wrong in doTransform in PdfHandler.php, but I don't understand what. I suspect the command line gets mangled somewhere. I've replaced $err = wfShellExecWithStderr( $cmd, $retval ); in that function with exec($cmd, $err, $retval);. This seems to be a functional workaround.

Reply to "/bin/bash: Permission denied"

Installing xpdf-utils

TelePointHistory (talkcontribs)

I am trying to use PdfHandler to show thumbnails but know I don't have the pre-requisites.

Running the command in SSH: which gs convert pdfinfo pdftotext

shows that I have ghostscript but no packages for xpdf-utils

/bin/gs

/bin/convert

/usr/bin/which: no pdfinfo etc

/usr/bin/which: no pdftotext etc


Looking at this page https://www.xpdfreader.com/download.html I don't know which file to download or where to put it once it's downloaded. I am on Siteground hosting with MW 1.35.3, if that makes a difference. I'm hoping someone can give me a basic rundown of how it's meant to work. Thanks.

School4schools (talkcontribs)

Did you ever get this extension working w Xpdf-utils / XpdfReader installation?

Kghbln (talkcontribs)

Siteground has to install this for you on the server I guess. Best way is to contact their support.

Reply to "Installing xpdf-utils"

No PDF/thumbnail, issue executing pdfinfo/pdftotext, Windows Server 2012 R2, IIS 8.5, MW 1.31

Tommyheyser (talkcontribs)

MW 1.31.1 running on Windows Server 2012 R2 IIS 8.5

I'm getting the following error (from the $wgDebugLogFile output log) for every execution of pdfinfo and pdftotext.

[exec] Error running "pdfinfo" "-enc" "UTF-8" "-meta" "C:/inetpub/wwwroot/w/images/f/f4/Phone_List.pdf": 'pdfinfo" "-enc" "UTF-8" "-meta" "C:' is not recognized as an internal or external command, operable program or batch file.

I'm not sure if this is the result of the new Shell framework introduced in 1.30, Manual:Shell framework, which replaces wfShellExec(). The debug log line before the error is:

[exec] MediaWiki\Shell\Command::execute: "pdfinfo" "-enc" "UTF-8" "-meta" "C:/inetpub/wwwroot/w/images/f/f4/Phone_List.pdf"

Tommyheyser (talkcontribs)

In case someone else is having this issue of not seeing PDFs and is running MW 1.31 on Windows Server 2012 R2, here is what I did:

  1. I added the path to pdfinfo.exe and pdftotext.exe to the system PATH variable (mine was C:\Program Files\xpdf-tools-win-4.00\bin64).
  2. Then I edited the function retrieveMetaData in {mediawiki install path}/extensions/PdfHandler/includes/PdfImage.php:

a. Replacing:

$cmdMeta = [
	$wgPdfInfo,
	'-enc', 'UTF-8', # Report metadata as UTF-8 text...
	'-meta',         # Report XMP metadata
	$this->mFilename,
];

with

$cmdMeta = "pdfinfo.exe -enc UTF-8 -meta " . $this->mFilename;

b. Replacing

$cmdPages = [
	$wgPdfInfo,
	'-enc', 'UTF-8', # Report metadata as UTF-8 text...
	'-l', '9999999', # Report page sizes for all pages
	$this->mFilename,
];

with

$cmdPages = "pdfinfo.exe -enc UTF-8 -l 9999999 " . $this->mFilename;

c. Replacing

$cmd = [ $wgPdftoText,  $this->mFilename, '-' ];

with

$cmd = "pdftotext.exe " . $this->mFilename;


It's a bit of a hack, but it works. This should last until the issue is properly fixed.

TomRamm (talkcontribs)

Since the source code has changed considerably in the meantime, this approach no longer works. I have done the following to make it work for me:

I created a new file in the scripts subfolder:

scripts/retrieveMetaData.cmd

@echo off

if NOT "%PDFHANDLER_INFO%" == "" call:runInfo
if NOT "%PDFHANDLER_TOTEXT%" == "" call:runToText

EXIT /B %ERRORLEVEL%

:runInfo
	call "%PDFHANDLER_INFO%" -enc UTF-8 -meta file.pdf > meta
	call "%PDFHANDLER_INFO%" -enc UTF-8 -l 9999999 file.pdf > pages
EXIT /B 0

:runToText
	call "%PDFHANDLER_TOTEXT%" file.pdf - > text
	echo %ERRORLEVEL% > text_exit_code

EXIT /B 0

In includes/PdfImage.php, in the function retrieveMetaData, I changed the call of the script depending on the operating system. Under Linux the original code is used; under Windows the .cmd script is called instead of the .sh script, and the script is not passed as a parameter to a shell but run directly.

# Windows runs the .cmd script directly; other systems pass the .sh script to the configured shell.
$isWindows = strtoupper( substr( PHP_OS, 0, 3 ) ) === 'WIN';
$script = 'scripts/retrieveMetaData.' . ( $isWindows ? 'cmd' : 'sh' );
$params = $isWindows ? [ $script ] : [ $wgPdfHandlerShell, $script ];
$result = $command
	->params( ...$params )
	->inputFileFromFile( $script, __DIR__ . '/../' . $script )
	->inputFileFromFile( 'file.pdf', $this->mFilename )
	->outputFileToString( 'meta' )
	->outputFileToString( 'pages' )
	->outputFileToString( 'text' )
	->outputFileToString( 'text_exit_code' )
	->environment( [
		'PDFHANDLER_INFO' => $wgPdfInfo,
		'PDFHANDLER_TOTEXT' => $wgPdftoText,
	] )
	->execute();


Mwgbell (talkcontribs)

I had a similar problem using ImageMagick 7.1.0-19 Q16-HDRI with MediaWiki 1.37.1 on Windows 11. To fix it, in extensions\PdfHandler\includes\PdfHandler.php:

Change this line:

$cmd .= " | " . wfEscapeShellArg(
	$wgPdfPostProcessor,
	"-depth",
	"8",
	"-quality",
	$wgPdfHandlerJpegQuality,
	"-resize",
	$width,
	"-",
	$dstPath
);

To this (i.e. move the "-" so that it is the first thing after the $wgPdfPostProcessor, line):

$cmd .= " | " . wfEscapeShellArg(
	$wgPdfPostProcessor,
	"-",
	"-depth",
	"8",
	"-quality",
	$wgPdfHandlerJpegQuality,
	"-resize",
	$width,
	$dstPath
);

Reply to "No PDF/thumbnail, issue executing pdfinfo/pdftotext, Windows Server 2012 R2, IIS 8.5, MW 1.31"

Direct linking to PDF page, When clicking to direct media

Gmillerd (talkcontribs)

Does anyone have a modification of the extension that makes a click on the PDF go to the specified page when a page is given? That is, to change

/mediawiki/index.php?title=File:Filename.pdf&page=25

to the following, so that the browser skips to the specified page?

/images/0/0b/Filename.pdf#page=25

I am able to do it in javascript, but the PHP evades me.

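var page = getUrlParameter("page"); // getUrlParameter() is the poster's own helper, not shown here
if (page) { // only rewrite the link when a page number was requested
    $("#file.fullImageLink").find("a:first").each(function() {
        $(this).attr("href", $(this).attr("href") + "#page=" + page);
    });
}

212.59.13.226 (talkcontribs)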

Use # instead of ...&page=25

No PDF images displayed

Darlig Gitarist (talkcontribs)

PDFHandler extension is supposed to allow viewing of pdf files. However, this does not appear to be working as advertised.

We've gone through the troubleshooting area of MediaWiki for this plugin and double-checked the paths to PDF converters. We re-ran the maint scripts for images and image meta. We checked the logs.

There is no indication of errors other than the images not showing up.

MediaWiki 1.35.5
PHP 7.4.27 (fpm-fcgi)
MySQL 5.7.37-0ubuntu0.18.04.1-log
ICU 60.2
Lua 5.1.5
PDF Handler – (16eda4b) 20:58, 2022 January 23

Any help or suggestions would be appreciated.

Cboltz (talkcontribs)

Wild guess: Some Linux distributions (for example openSUSE) have disabled rendering of PDF files in their default ImageMagick config because it has been a steady source of security issues (for example "ImageTragick"). In openSUSE, you'd need to install the ImageMagick-config-7-upstream package to enable rendering of PDF files.

Note: I don't know if Ubuntu did something similar with the ImageMagick config.

If unsure, test if converting a PDF to an image in the shell works: convert foo.pdf foo.png

Drewsaur (talkcontribs)

I have done this, and still can't get the extension to work. Any other ideas?

Michele.Fella (talkcontribs)

Check /etc/ImageMagick-<your_version>/policy.xml.

If it contains <policy domain="coder" rights="none" pattern="PDF" />, convert is not allowed to do its job.

You might change this to rights="read | write",

but you should be aware of, and take responsibility for, the security risks this might bring (as Cboltz mentioned).
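
A quick way to test for that policy line from the shell might look like the sketch below. The function name pdf_policy_blocked is hypothetical, and the check is deliberately naive: it is line-based, so it would also match a policy line that has been commented out in the XML.

```shell
# Succeeds (exit 0) when the given policy.xml has a line with a coder policy
# that grants no rights for a pattern mentioning PDF.
pdf_policy_blocked() {
    grep -E '<policy[^>]*domain="coder"' "$1" \
        | grep -E 'rights="none"' \
        | grep -E 'pattern="[^"]*PDF' > /dev/null
}
```

For example, `pdf_policy_blocked /etc/ImageMagick-<your_version>/policy.xml && echo "PDF conversion is denied by policy"`, with the path adjusted to the installed version.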

Reply to "No PDF images displayed"

PDFHandler not working. Still displays File Link

199.27.199.51 (talkcontribs)

PdfHandler is confirmed as installed on Special:Version, which returns all the required directories, and no extensions could be interfering with the install. What's the issue?

Drewsaur (talkcontribs)

I am having this issue too. I have changed the settings in ImageMagick so that PDFs are able to be converted; verified this at the command line; verified that all 4 related utilities are working at the command line; run all the maintenance scripts; and...nothing.

Michele.Fella (talkcontribs)

Check /etc/ImageMagick-<your_version>/policy.xml.

If it contains <policy domain="coder" rights="none" pattern="PDF" />, convert is not allowed to do its job.

You might change this to rights="read | write",

but you should be aware of, and take responsibility for, the security risks this might bring (see the post from Cboltz in the "No PDF images displayed" thread).

Reply to "PDFHandler not working. Still displays File Link"

Timeout

87.165.252.36 (talkcontribs)

(Translated from German) Error creating thumbnail: limit.sh: timed out executing command "('/usr/bin/gs' '-sDEVICE=jpeg' '-sOutputFile=-' '-sstdout=%stderr' '-dFirstPage=1' '-dLastPage=1' '-dSAFER' '-r150' '-dBATCH' '-dNOPAUSE' '-q' '/opt/mediawiki/images/7/7e/A.K.2023.pdf' | '/usr/bin/convert' '-depth' '8' '-quality' '95' '-resize' '800' '-' '/tmp/transform_96151e2ec90e.jpg')"

I already changed some settings, but obviously not the right ones. What do I need to do to get it to work on a PDF file with A LOT of pixels in each direction?

Reply to "Timeout"