Manual talk:MIME type detection

Fix for MS Office File Confusion

I have had continuing problems with my system (RHEL 5) not recognizing MS Office files correctly. More precisely, Excel and PowerPoint files get recognized as MIME type application/msword. This seems to be a common problem: the only reliable way to tell these formats apart is to look at a certain offset from the end of the file, whereas the magic file format used on Unix-based systems can only specify an offset from the beginning of the file.

I'm sure there's a more elegant workaround to this, but I was able to solve it by adding the following code to the file includes/MimeMagic.php. It should be inserted immediately before the line

    if (strpos($mime,"text/")===0 || $mime==="application/xml") {

This is on or about line 390:

     # UGLY HACK TO FIX BAD MIME IDENTIFICATION FOR MSOFFICE FILES 11/27/07, DWIGGINS
     # Update 3/5/2008 - AEGA
     # note that this requires the addition of ppt and xls as possible extensions for
     # application/msword files  in the includes/mime.types file.
     if ( $ext === true ) {
           $i = strrpos( $file, '.' );
           $ext = strtolower( $i ? substr( $file, $i + 1 ) : '' );
     }
     if( $mime === 'application/msword') {
          wfDebug( "$fname with extension $ext  has a msword MIME type.\n" );
          if ( $ext === 'ppt' ) {
               $mime = 'application/vnd.ms-powerpoint';
               wfDebug( "$fname reset to Powerpoint MIME type.\n" );
          } elseif ( $ext === 'xls' ) {
               $mime = 'application/vnd.ms-excel';
               wfDebug( "$fname reset to Excel MIME type.\n" );
          }
     }
     # END UGLY MIME HACK

As noted above, you will need to edit the includes/mime.types file and add a line that looks like so:

    application/msword doc ppt xls

If someone has a more elegant way of solving this problem, I'd love to hear about it. In the meantime, I hope this helps someone! --Dmdwiggi 02:55, 28 November 2007 (UTC)

Small changes to make it work properly in v1.11: $ext arrives either as the extension (a string) or as true, in which case the extension has to be extracted from the file name. --Aega 20:12, 5 March 2008 (UTC)

I'm new to PHP and trying to make ppt files uploadable in my MediaWiki. I incorporated the above script in v1.11, immediately before the mentioned line, but it gave the following error:

    Parse error: syntax error, unexpected ')' in .../mediawiki-1.11.0/includes/MimeMagic.php on line 399

Line 399 is <$ext = strtolower( $i ? substr( $file, $i + 1 ) : '' );>. Any help would be appreciated, thanks. --Albert_K 14:13, 8 March 2008 (UTC)

This error is caused by invalid syntax; the line should read: $ext = strtolower( $i ? substr( $file, $i + 1 ) : "" ); Hope this helps: http://www.paramiliar.com


magic file

The 'problem' is that the file type is determined by the Unix `file` command, which looks at the first few bytes of the file for a unique pattern listed in its magic file. But from what I have seen this doesn't work so well here. These are all "msword" documents, just different subtypes. On my system (probably yours too) the file /usr/share/magic.mime contains the patterns, including:
# msword: file(1) magic for MS Word files
#
# Contributor claims:
# Reversed-engineered MS Word magic numbers
#

0       string          \376\067\0\043                  application/msword
0       string          \320\317\021\340\241\261        application/msword
0       string          \333\245-\0\0\0                 application/msword
Now you might think that the first is .doc, the second .ppt, the third .xls, but that's not the case. I looked at .ppt and .xls files I had lying around and they both match the second line, and they match each other much farther along than just these first 6 bytes.
If there is a distinguishable pattern later in the file header which indicates .ppt or .xls, then that could be used to determine a more specific MIME type, rather than just trusting the file extension. The magic file format is more flexible than just specifying a fixed offset from the beginning of the file, but more complicated cases require more complicated rules. However, a quick comparison shows that the first many bytes of both kinds of files are the same, so judged on content alone they are indeed all "msword" files, and that's the best you can do based on content. In that case trusting the file extension may be all you can hope for. --Eric Myers 14:53, 28 November 2007 (UTC)
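As a rough illustration of the point above (not MediaWiki's actual code): the six-byte signature from the second magic line identifies only the container, so the file extension has to break the tie. The function name refineOfficeMime and the extension map are hypothetical, chosen just for this sketch.

```php
<?php
// Hypothetical helper, not part of MediaWiki: the content check can only say
// "this is an old-style MS Office container"; the extension picks the subtype.
function refineOfficeMime( string $header, string $filename ): ?string {
    // First six bytes shared by the .doc, .ppt and .xls files discussed above
    // (\320\317\021\340\241\261 in the magic file, written here in hex).
    $signature = "\xD0\xCF\x11\xE0\xA1\xB1";
    if ( strncmp( $header, $signature, strlen( $signature ) ) !== 0 ) {
        return null; // not an Office container as far as this check can tell
    }
    $ext = strtolower( pathinfo( $filename, PATHINFO_EXTENSION ) );
    $byExtension = [
        'ppt' => 'application/vnd.ms-powerpoint',
        'xls' => 'application/vnd.ms-excel',
    ];
    // Anything else keeps the generic type that the magic file reports.
    return $byExtension[$ext] ?? 'application/msword';
}
```

This trusts the extension exactly as much as the hack in the first section does; it only makes the two-step decision (content first, extension second) explicit.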

Fix for Uploading MS Word 2007 (and greater) Files

The mime detector installed in your environment (PHP FileInfo, MagicMime, or UNIX mime detection) may identify your MS Word 2007 (or greater) file as an application/zip mime type. That is partly correct as MS Office 2007 files (and later) are a kind of zip file. In MediaWiki 1.16.x this will cause a "File extension does not match MIME type" error when you try to upload MS Word 2007 files. This error can arise even for files of extension .doc that have been saved by MS Word 2007.

This issue was meant to be resolved in MediaWiki 1.17, but it was still not resolved in MediaWiki 1.20.

One workaround for MediaWiki 1.16.x is as follows:

  • In mediawiki/includes/mime.types, add the Word document extensions to the application/zip MIME type, e.g.
   application/zip zip jar xpi  sxc stc  sxd std   sxi sti   sxm stm   sxw stw  doc docx
  • In mediawiki/includes/mime.types, delete the Word extensions from the application/msword line, e.g.
   application/msword
  • In mediawiki/LocalSettings.php, add the Word extensions to the permitted upload list. You don't need to add a zip extension here.
   $wgFileExtensions = array('bmp', 'doc', 'docx', 'gif', 'jpeg', 'jpg', 'mpp', 'pdf', 'png', 'ppt', 'pptx', 'ps', 'tiff', 'xls', 'xlsx');
  • From mediawiki/includes/DefaultSettings.php, copy your $wgMimeTypeBlacklist array to LocalSettings.php. In LocalSettings.php, comment out the application/zip item in the $wgMimeTypeBlacklist array.
   $wgMimeTypeBlacklist = array(
     # HTML may contain cookie-stealing JavaScript and web bugs
     'text/html', 'text/javascript', 'text/x-javascript',  'application/x-shellscript',
     # PHP scripts may execute arbitrary code on the server
     'application/x-php', 'text/x-php',
     # Other types that may be interpreted by some servers
     'text/x-python', 'text/x-perl', 'text/x-bash', 'text/x-sh', 'text/x-csh',
     # Client-side hazards on Internet Explorer
     'text/scriptlet', 'application/x-msdownload',
     # Windows metafile, client-side vulnerability on some systems
     'application/x-msmetafile',
     # A ZIP file may be a valid Java archive containing an applet which exploits the
     # same-origin policy to steal cookies
     # Commented out to allow uploading of Office 2007 (and greater) files which are identified as .zip files.
     #'application/zip',
   );

Or put the following in your LocalSettings.php file:

if (($key = array_search('application/zip', $wgMimeTypeBlacklist)) !== false) {
    unset($wgMimeTypeBlacklist[$key]);
}

This too is an ugly hack but it might do the trick until MediaWiki 1.17 is released. --John Bentley 09:06, 10 June 2011 (UTC). Note that by allowing .zip file uploads you expose your site to the security vulnerability mentioned in the comments. Thus, this workaround may only be suitable for private wikis. --John Bentley 09:16, 10 June 2011 (UTC)


Uploading of MS Word files seems to still be broken in 1.19.1, so I've lodged a bug report. Note that adding $wgAllowJavaUploads = true; to LocalSettings.php, in addition to modifying mime.types as above, worked for me. Sam Wilson 01:38, 17 July 2012 (UTC)


On my wikis (MW 1.22 and 1.24), $wgAllowJavaUploads = true; together with $wgVerifyMimeType = false; in LocalSettings.php works fine. Ency (talk) 21:43, 26 January 2015 (UTC)

Simplified in MW 1.26

application/zip no longer appears in $wgMimeTypeBlacklist. All I had to do to avoid the error message was to add "docx" to the list for application/zip in mime.types. Adding it to $wgFileExtensions as well is optional (as long as $wgStrictFileExtensions = false); it only suppresses the warning. Wikimikef (talk) 18:26, 26 July 2016 (UTC)

Another fix for uploading MSWord files, in Mediawiki 1.21.2

First, check on your server for a command line tool used to detect mime types. On my Linux, it is 'file', run as follows: 'file -bi filename'.

Then run that tool against the MS Office files that fail to upload. In my case, those are some xls files, and 'file -bi' returns 'application/vnd.ms-office' as the MIME type.
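For anyone scripting this check: 'file -bi' normally appends the encoding after the type (for example "application/vnd.ms-office; charset=binary"), so the output has to be trimmed before it can be compared with an entry in mime.types. A minimal sketch, with a hypothetical helper name:

```php
<?php
// Hypothetical helper: reduce the output of "file -bi" to the bare MIME type.
// Typical raw output: "application/vnd.ms-office; charset=binary\n".
function mimeFromFileOutput( string $output ): string {
    // Keep only the part before the first ";" and normalise case and whitespace.
    $parts = explode( ';', $output, 2 );
    return strtolower( trim( $parts[0] ) );
}
```

It could be fed with something like mimeFromFileOutput( (string)shell_exec( 'file -bi ' . escapeshellarg( $path ) ) ), assuming shell_exec is enabled on the server and $path is the file to check.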

Next, you'll need to modify your MimeMagic.php. In guessMimeType(), in that file, the problem is that doGuessMimeType() returns "application/zip" for my xls files. So I insert the following right after doGuessMimeType() is called, to correct its return value:

$mime = $this->doGuessMimeType( $file, $ext );

# Begin workaround
# If the internal guess is the generic ZIP type, ask the external detector as well
wfDebug( __METHOD__ . ": Post-doGuessMimeType() check for Microsoft mime types for $mime.\n" );
if ( $mime === "application/zip" ) {
  $mime1 = $this->detectMimeType( $file, $ext );
  wfDebug( __METHOD__ . ": Post-doGuessMimeType() check: mime type detected as $mime1.\n" );
  if ( $mime1 === "application/vnd.ms-office" ) {
    wfDebug( __METHOD__ . ": Post-doGuessMimeType() check: changing mime type to $mime1.\n" );
    $mime = "application/vnd.ms-office";
  }
}
# End workaround

if( !$mime ) {
  wfDebug( __METHOD__ . ": internal type detection failed for $file (.$ext)...\n" );
  $mime = $this->detectMimeType( $file, $ext );
}

To see the log messages, set $wgDebugLogFile in your LocalSettings.php:

$wgDebugLogFile = "path_to_log_file.txt";

When done debugging, don't forget to comment that out. In mime.types, add the file extensions matching the MIME type detected by "file -bi" (in my case this is 'application/vnd.ms-office'):

application/vnd.ms-office doc docx xls xlsx ppt

Finally, set $wgAllowJavaUploads in your LocalSettings.php, to skip another check (for trailing bytes) that would otherwise prevent xls files from uploading. Here is the relevant portion of my LocalSettings.php:

$wgMimeDetectorCommand = "file -bi";
$wgAllowJavaUploads = true;

In sum, this fix is not perfectly secure, but it is perhaps a bit more secure than blindly disabling the MIME upload checks just to be able to upload xls files. I really do hope MediaWiki will fix this in later releases, because too many users are getting blocked by this.

Fix for MS Excel 2003 file saved from MS Office 2007

It turns out that an Excel file saved from MS Office 2007 in the MS Excel 2003 format is not the same as an Excel file saved from MS Excel 2003. MediaWiki incorrectly recognizes it as 'application/zip', and it cannot be uploaded because the extension 'xls' does not match the MIME type 'application/zip'.

A temporary workaround in version 1.20.2:

  • Change the 'includes/MimeMagic.php' file. Around line 820 there is this 'if':
   if ( substr( $header, 512, 4) == "\xEC\xA5\xC1\x00" ) {
       $mime = "application/msword";
   }

Add another 'if' beneath it for 'application/vnd.ms-excel':

   if ( substr( $header, 512, 4) == "\xEC\xA5\xC1\x00" ) {
       $mime = "application/msword";
   }
   if ( substr( $header, 512, 4) == "\xFD\xFF\xFF\xFF" ) {
       $mime = "application/vnd.ms-excel";
   }

Yes, there is already a 'switch' which tests for 'application/vnd.ms-excel', but I think matching 6 bytes is too strict. I have many files which vary in the 5th byte, as shown by the hexdump tool with 'hexdump -s 512 -n 6 -x':

   fffd    ffff    01c1
   fffd    ffff    01c3
   fffd    ffff    01c4
   fffd    ffff    01f3
  • Change 'LocalSettings.php':
   $wgAllowJavaUploads = true;

Otherwise, it will fail with the error:

   UploadBase::verifyExtension: mime type application/vnd.ms-excel matches extension xls, passing file
   UploadBase::detectScript: checking for embedded scripts and HTML stuff
   UploadBase::detectScript: no scripts found
   ZipDirectoryReader: Fatal error: trailing bytes after the end of the file comment
   ContextSource::getContext (UploadForm): called and $context is null. Using RequestContext::getMain() for sanity
   Class PEAR_Error not found; skipped loading
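The 4-byte comparison and the hexdump listing above can be reproduced outside MediaWiki. The sketch below is only an illustration: both function names are hypothetical, and the word order assumes hexdump was run on a little-endian machine, where 'hexdump -x' swaps the two bytes of each 16-bit word (so "fffd ffff 01c1" corresponds to the bytes FD FF FF FF C1 01).

```php
<?php
// Sketch of the 4-byte check described above; the function names are hypothetical.
function looksLikeExcelSubheader( string $header ): bool {
    // Compare only the first four bytes at offset 512, as the workaround does,
    // so files that differ in the fifth byte (C1, C3, C4, F3) still match.
    return substr( $header, 512, 4 ) === "\xFD\xFF\xFF\xFF";
}

// Render bytes as 16-bit words the way "hexdump -x" does on a little-endian
// machine: the two bytes of each word appear swapped.
function hexdumpWords( string $bytes ): string {
    $words = unpack( 'v*', $bytes ); // 'v' = unsigned 16-bit, little-endian
    return implode( ' ', array_map( static function ( int $w ): string {
        return sprintf( '%04x', $w );
    }, $words ) );
}
```

For example, a header whose bytes at offset 512 are FD FF FF FF C3 01 passes looksLikeExcelSubheader(), and hexdumpWords() on those six bytes gives the second row of the listing.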

Settings for OpenDocuments

Since Manual:$wgFileExtensions leads people to this page, we should explain which entries to make for odt and ods files. Thanks, --Flominator 10:22, 20 April 2008 (UTC)

Update: I've found it: [1] --Flominator 13:18, 13 May 2008 (UTC)

% signs in the filename

I have noticed on my wiki that % signs in the filename can also cause MediaWiki to reject the file; eliminating the % signs solves the problem. Tisane 18:42, 15 April 2010 (UTC)

Should add information about how you appended to this mime.

For instance, I'm getting "does not match the detected MIME type" and I have no idea what to do apart from disabling MIME checking, which is the only thing explained here. --EnBruger (talk) 15:57, 4 August 2012 (UTC)

Adding new, application-specific MIME type

I'm using MediaWiki to organize 3D models (.obj, .off, .wrml, .mesh files etc.). Typically our models are ASCII text files and get automatically recognized as text/plain. This causes trouble when I want to make a custom thumbnail generator for these files (offscreen render to image). Are there best practices for adding new MIME types and altering the file --> thumbnail pipeline? Thanks, Mangledorf (talk) 08:33, 18 October 2012 (UTC)

For reference: Manual:MIME type detection#Improve MIME type detection Simon Stier (talk) 02:59, 5 September 2023 (UTC)

MIME doesn't match extension

@Krinkle: What should you do if you find a file whose MIME doesn't match its extension in the title? Jonteemil (talk) 23:26, 10 June 2021 (UTC)

Possibly containing apples?! (Detection: Extra Information)

Under https://www.mediawiki.org/w/index.php?title=Manual:MIME_type_detection&oldid=4667634#Extra_info is the following line:

OFFICE // Office Documents, Spreadsheets (office formats possibly containing apples, scripts, etc)

Perhaps meant to be:

possibly containing applets

over:

possibly containing apples

It's a fun typo if a typo - I wanted to check before editing.

Beet keeper (talk) 10:30, 15 September 2021 (UTC)

Yep, a typo. Fixed, thanks! --Ciencia Al Poder (talk) 11:00, 15 September 2021 (UTC)