MediaWiki Developer Meet-Up 2009/Notes/WikiWord
- WikiWord extracts a thesaurus from Wikipedia.
- Thesaurus supplies relations:
  - term <-> concept (meaning relation)
  - concept <-> concept (related, similar, broader, narrower)
- concepts = wiki articles
- terms = titles, redirects, anchor text, sort keys, etc.
- multilingual
  - concepts from multiple Wikipedias combined
  - terms in multiple languages referring to one concept
- useful for indexing, disambiguation
- plan: multilingual image search for Commons (German blog post)
- ideas for improvement:
  - get magic names and patterns from pywikipediabot config
  - use incremental updates as much as possible
  - look at co-occurrence in paragraphs, look at co-co-occurrence
  - for image search: index by image caption (used images)
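
The relations and the paragraph co-occurrence idea noted above can be sketched roughly as follows. This is only a toy illustration, not WikiWord's actual data model or code: the `Thesaurus` class, its method names, the `paragraph_cooccurrence` helper, and the concept identifiers are all hypothetical, and terms are assumed to be matched case-insensitively per language.

```python
from collections import defaultdict
from itertools import combinations


class Thesaurus:
    """Toy term <-> concept store (hypothetical, not WikiWord's schema)."""

    def __init__(self):
        # (language, case-folded term) -> set of concept ids
        self.meanings = defaultdict(set)
        # concept id -> relation kind -> set of related concept ids
        self.relations = defaultdict(lambda: defaultdict(set))

    def add_term(self, lang, term, concept):
        """Record that `term` (title, redirect, anchor text, ...) means `concept`."""
        self.meanings[(lang, term.casefold())].add(concept)

    def relate(self, kind, source, target):
        """Record a concept <-> concept relation such as 'broader' or 'related'."""
        self.relations[source][kind].add(target)

    def lookup(self, lang, term):
        """Return all concepts a term may refer to (useful for disambiguation)."""
        return sorted(self.meanings.get((lang, term.casefold()), set()))


def paragraph_cooccurrence(paragraphs):
    """Count how often two concepts are mentioned in the same paragraph.

    `paragraphs` is an iterable of lists of concept ids, one list per paragraph.
    """
    counts = defaultdict(int)
    for concepts in paragraphs:
        # Each unordered pair is counted once per paragraph.
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return dict(counts)


if __name__ == "__main__":
    t = Thesaurus()
    t.add_term("en", "Jaguar", "jaguar_animal")
    t.add_term("en", "Jaguar", "jaguar_car")
    t.add_term("de", "Jaguar", "jaguar_animal")  # same concept, other language
    t.relate("broader", "jaguar_animal", "big_cat")
    print(t.lookup("en", "jaguar"))
    print(paragraph_cooccurrence([
        ["jaguar_animal", "big_cat"],
        ["big_cat", "jaguar_animal", "rainforest"],
    ]))
```

Keying terms by language lets one concept collect terms from several Wikipedias, while an ambiguous term maps to several concepts. Second-order co-occurrence could be derived from the pair counts, but is left out of this sketch.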