AKhatun (WMF)

Aisha Khatun

NLP Researcher @ Research Team, Wikimedia Foundation

Learning is my passion. Everything else just falls in place.

About me

I am an ML and NLP enthusiast from Bangladesh. I love working with data and drawing information from it. I completed my Bachelor's in Computer Science and Engineering at Shahjalal University of Science and Technology, Bangladesh, and my Master's in Computer Science at the University of Waterloo, Canada. After graduating, I worked as a Machine Learning Engineer for about a year before joining the Wikimedia Foundation as a Data Analyst and Researcher, performing several roles along the way.

My work

I am currently working with the Research Team to improve link recommendation in all Wikipedia languages. This work includes fixing mwtokenizer so that it can parse all languages, improving the existing language-dependent link recommendation models, and then creating a language-agnostic link recommendation model that will replace the 200+ language-dependent models deployed at present.
I worked with the Research Team as a Research Data Scientist (NLP) to develop Copyediting as a structured task. To raise and maintain the standard of Wikipedia articles, it is important to ensure that articles are free of typos, spelling mistakes, and grammatical errors. While there are ongoing efforts to automatically detect "commonly misspelled" words in English Wikipedia, most other languages are left behind. The intention was to find ways to detect errors in articles in all languages in an automated fashion. I wrote a program to automatically curate lists of commonly misspelled words for 100+ languages using Wiktionary. The coverage of these lists was compared with misspelling lists in 2-3 languages, and the lists were then used to detect misspellings in all possible Wikipedia languages.
Previously, I worked with the Search and Analytics team to find ways to scale the Wikidata Query Service by analyzing the queries being made. The analysis results are available in the User:AKhatun subpages. See also the Phabricator Work Board (WDQS Analysis).
I worked on the Abstract Wikipedia project to identify central Scribunto modules across all the wikis. This work leads toward the creation of a central repository of functions that can be used in a language-independent manner in the future. See our work on Phabricator and GitHub.

Disclaimer: I work for or provide services to the Wikimedia Foundation, and this is the account I try to use for edits or statements I make in that role. However, the Foundation does not vet all my activity, so edits, statements, or other contributions made by this account may not reflect the views of the Foundation.

Contact me

Email: akhatun-ctr@wikimedia.org
IRC: tanny411 on libera.chat
Personal: website/blog
LinkedIn: tanny411
GitHub: tanny411
Personal MediaWiki: Aisha Khatun
Personal Meta: Aisha Khatun
Phabricator: AKhatun_WMF
Wikitech: AKhatun