I exported the PetScan file results and converted them to URLs so they can be opened quickly in new tabs for categorization. I still think a feature to OCR files in a specified category would be very useful. Instead of adding categories directly based on the OCR output, I guess the tool could somehow write the OCR text to the file info, so that one could then create a search query and bulk-categorize the results from Special:Search using Cat-a-lot. For example, the results of something like ocr:", 2016" deepcategory:"Our World in Data maps" (or insource:"|ocr=, 2016") would go into c:Category:2016 maps of the world (except for non-world maps, which can be easily spotted). This was just an example.
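In case it helps anyone, the URL conversion can be done with a few lines of Python. This is only a rough sketch of the idea, not the exact script I used: it assumes the PetScan export has been reduced to one file title per line, and the example titles and the function names `title_to_url` and `titles_to_urls` are made up for illustration.

```python
from urllib.parse import quote

COMMONS_WIKI = "https://commons.wikimedia.org/wiki/"


def title_to_url(title: str) -> str:
    """Turn a file title (with or without the 'File:' prefix) into a Commons URL."""
    title = title.strip().replace(" ", "_")
    if not title.startswith("File:"):
        title = "File:" + title
    # Leave ':' and '/' as they are; percent-encode other special characters.
    return COMMONS_WIKI + quote(title, safe=":/")


def titles_to_urls(lines):
    """Convert an iterable of titles to URLs, skipping blank lines."""
    return [title_to_url(line) for line in lines if line.strip()]


if __name__ == "__main__":
    # Hypothetical titles, standing in for a PetScan export with one title per line.
    example_titles = ["Example map, 2016.png", "File:Another_map.svg"]
    for url in titles_to_urls(example_titles):
        print(url)
```

The printed URLs can then be pasted into any "open multiple URLs" browser extension or opened in batches.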
The two feature requests would be:
- Adding a feature to OCR all files in a category using the incategory search operator
- Adding a feature to write the OCR'd text to the file description
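To make the search step more concrete, here is a small Python sketch that builds a Special:Search URL for the insource variant of the example query. It is only an illustration under assumptions: the `|ocr=` parameter does not exist in any template today, the function name `build_search_url` is hypothetical, and I have not checked how the search engine would tokenize the punctuation inside the quoted phrase.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://commons.wikimedia.org/w/index.php"


def build_search_url(category: str, ocr_fragment: str) -> str:
    """Build a Special:Search URL limited to the File namespace.

    Assumes OCR text is stored in a hypothetical '|ocr=' parameter of the
    file description wikitext, so that insource: can match it.
    """
    query = f'deepcategory:"{category}" insource:"|ocr={ocr_fragment}"'
    params = {
        "title": "Special:Search",
        "search": query,
        "ns6": "1",  # namespace 6 is the File namespace
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("Our World in Data maps", ", 2016"))
```

The resulting results page is where Cat-a-lot could then be used to move the matching files into the target category.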
@Enhancing999 and Glrx: you may be interested in this since you participated in the discussion. Nevertheless, I don't think it's an overly important issue, and having so much OCR'd text on Commons could also cause problems if files show up whenever terms in the ocr field of the Information template(?) are searched for without something like ocr:"search terms". However, since the OCR tool already exists, implementing this might not take much time and could be worth it, or it may be good to track this somewhere else.