- brew install djvulibre
- pip install cython
- pip install python-djvulibre
- brew install homebrew/python/numpy
- needs ImageMagick built with libtiff support, I think.
- storing images in a MySQL BLOB seems like a bad idea.
- bawolff says $image->getHandler()->getPageText( $image, $pageNumber ); where $image is from wfFindFile()
- convert to PHP: the Python script is a good POC, but not extendable for MW's purposes
- write PHP to shell out to a modified Python script
- ideally we could do this without Python, but I don't know the djvu-PHP stuff.
- investigate filtering out useless words, like ones that are just a single number
- figure out Wikisource + i18n? The POC resulted in a lot of Greek text
- how does attribution work?
- Needs to integrate with ConfirmEdit?
- Review interface for Wikisourcians
- Something to support non-WMF wikis
- ???
- Profit.
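
The "filter out useless words" item above could look something like the following. This is only a rough sketch: `is_useful_word` and the `min_length` cutoff are made-up names/values for illustration, not anything from the existing POC script.

```python
def is_useful_word(word: str, min_length: int = 3) -> bool:
    """Guess whether an OCR'd word is worth turning into a captcha.

    Hypothetical helper; min_length=3 is an assumed cutoff, not a tested one.
    """
    text = word.strip()
    # Very short tokens (a lone digit, a stray letter) are not worth asking about.
    if len(text) < min_length:
        return False
    # Reject tokens with no letters at all, e.g. page numbers or punctuation runs.
    # str.isalpha() is Unicode-aware, so non-Latin words (Greek etc.) still pass.
    return any(ch.isalpha() for ch in text)


candidates = ["7", "1923", "--", "Wikisource", "λόγος"]
kept = [w for w in candidates if is_useful_word(w)]
```

Note that this keeps non-Latin words, so it does nothing about the i18n question; that would need a separate per-wiki language check.
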
- Wikisourcian marks a DjVu file for captcha-ification, as either a proofread file or an empty file
- JobQueue starts generating new captcha files
- user hits captcha, store their result for the unknown word
- after a new word has reached a certain number of results, move it into the review queue
- review queue shows image + submitted answers, reviewer picks the best one and the API makes the edit automatically.
- magic.
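
The "store results, then move to review queue" part of the flow above might be sketched like this. It is an in-memory toy under stated assumptions: `CaptchaStore`, `record_answer`, `review_item` and the threshold of 5 are all hypothetical (the notes only say "a certain amount"), and a real version would live in MediaWiki's database and JobQueue rather than in Python objects.

```python
from collections import Counter, defaultdict

REVIEW_THRESHOLD = 5  # assumed value; the notes do not pick a number


class CaptchaStore:
    """Toy in-memory store for answers to unknown captcha words (hypothetical)."""

    def __init__(self, threshold: int = REVIEW_THRESHOLD):
        self.threshold = threshold
        self.answers = defaultdict(list)  # word id -> list of submitted answers
        self.review_queue = []            # word ids awaiting review, oldest first
        self._queued = set()              # word ids already in the review queue

    def record_answer(self, word_id: str, answer: str) -> bool:
        """Store one answer; return True if the word just entered the review queue."""
        submitted = self.answers[word_id]
        submitted.append(answer.strip())
        if len(submitted) >= self.threshold and word_id not in self._queued:
            self._queued.add(word_id)
            self.review_queue.append(word_id)
            return True
        return False

    def review_item(self, word_id: str):
        """Return (answer, count) pairs, most common first, for the reviewer."""
        return Counter(self.answers[word_id]).most_common()


store = CaptchaStore(threshold=3)
for guess in ["cat", "cot", "cat"]:
    store.record_answer("page12-word4", guess)
ranked = store.review_item("page12-word4")
```

Ranking by how often each answer was submitted is just one way to present them; the reviewer still picks the final answer, as in the notes.
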