Release status: beta
|Using Apache Tika, provides text and metadata extraction for thousands of file types, enabling full-text search of almost any uploaded file
|Matt Marjanovic (CtapMaddog)
|Center for Transparent Analysis and Policy
|Master maintains backward compatibility.
|GNU General Public License 3.0 or later
The TikaAllTheFiles (TATF) extension facilitates full-text search over uploaded files by using the Apache Tika content analysis toolkit, which "detects and extracts metadata and text from over a thousand different file types".
In practical terms: if you already have Extension:CirrusSearch set up and working on your wiki, TATF will allow you to perform full-text searches over the contents of almost any uploaded file, not just PDFs.
TATF's features and capabilities:
- extract embedded digital text from any type of uploaded file so that it can be indexed for full-text search;
- extract and index printed text from bitmap image files and from images embedded in document files, e.g., image-only PDFs (requires Tesseract OCR);
- extract metadata from any type of uploaded file for display on
- index metadata properties along with text, to enable simple searching for properties within full-text search.
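To give a feel for the kind of extraction the features above rely on, here is a minimal Python sketch that asks an Apache Tika Server for the text and metadata of a file. It is an illustration only, not how TATF itself invokes Tika: it assumes a Tika Server is already running on its default port (9998) on the local machine, and the helper names (`build_tika_request`, `extract_text`, `extract_metadata`) are hypothetical.

```python
import json
import urllib.request

# Assumption: a Tika Server instance listening locally on its default port.
TIKA_URL = "http://localhost:9998"


def build_tika_request(endpoint, data, accept):
    """Build (but do not send) a PUT request carrying the raw file bytes."""
    return urllib.request.Request(
        url=f"{TIKA_URL}/{endpoint}",
        data=data,
        method="PUT",
        headers={"Accept": accept},
    )


def extract_text(data):
    """Return the plain text that Tika extracts from the given file bytes."""
    req = build_tika_request("tika", data, "text/plain")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def extract_metadata(data):
    """Return the metadata that Tika reports for the file bytes, as a dict."""
    req = build_tika_request("meta", data, "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With such a server running, passing the bytes of an uploaded file to `extract_text` would yield the text that a search index could ingest, and `extract_metadata` would yield the properties that could be shown or indexed alongside it. Whether printed text in images is recognized depends on Tesseract OCR being available to Tika.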
This extension can be installed using
The complete installation and configuration instructions can be found in.
Configuration parameters
The complete description of configuration parameters can be found in.