Content translation/Machine Translation/Apertium/Service

We are introducing our first self-hosted service for CX: Apertium. Apertium provides machine translation (MT), which is a critical component of CX.

Why Apertium as a service?

1. The WMF Language Engineering team (LE) works closely with Apertium developers on reporting bugs and getting Apertium improved. We need to use a recent version of Apertium with its fixes, and we need to keep it up to date. This collaboration and communication loop works only if the LE team has full control over the software running on the Apertium service. The LE team is currently using its own instance on Labs at http://apertium.wmflabs.org

2. The Apertium packages on Ubuntu/Debian are outdated. Updating them in the distributions is extra effort, and there is no guarantee that the distribution packages will stay up to date (even though we work closely with the Debian Science team and have proposed an up-to-date package to them with the help of Apertium upstream).

3. We are going to use Apertium-APY to serve MT requests. APY provides a web service on top of Apertium and also makes Apertium scalable by loading the processing pipelines once for all requests. This is the scalable production setup recommended by Apertium.

4. Using Apertium-APY and the latest Apertium will make sure that users get decent results when requesting MT from Apertium, which is the best Free and Open Source machine translation software available. Work on Google/Bing MT services will take some time (we will handle it via the LCA team first) and would put us in a similar situation, with no control over the service.
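As an illustration of point 3, the following is a minimal sketch of how a client might request a translation from an APY web service over HTTP. It assumes APY's `/translate` endpoint with `langpair` and `q` query parameters and a JSON response carrying `responseData.translatedText`. The base URL `http://localhost:2737` and the function names are assumptions made for this example, not the CX implementation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: an APY instance reachable locally; replace with the real service URL.
APY_BASE_URL = "http://localhost:2737"


def build_translate_url(base_url, source, target, text):
    """Build a /translate request URL; the language pair is passed as 'src|tgt'."""
    query = urlencode({"langpair": f"{source}|{target}", "q": text})
    return f"{base_url.rstrip('/')}/translate?{query}"


def parse_translation(body):
    """Extract the translated text from an APY-style JSON response body."""
    payload = json.loads(body)
    if payload.get("responseStatus") != 200:
        raise RuntimeError(f"translation failed: {payload}")
    return payload["responseData"]["translatedText"]


def translate(text, source, target, base_url=APY_BASE_URL, timeout=10):
    """Send one translation request (needs a reachable APY instance)."""
    url = build_translate_url(base_url, source, target, text)
    with urlopen(url, timeout=timeout) as response:
        return parse_translation(response.read().decode("utf-8"))
```

With a running instance that has the relevant language pair installed, `translate("Hello world", "eng", "spa")` would return the translated string. Because the service keeps its pipelines loaded, the client only issues a plain HTTP request and does not start Apertium processes itself.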

List of packages

The list of currently deployed packages can be found at: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/apertium/+/refs/heads/master/.pipeline/blubber.yaml

Links

  1. CX Technical architecture
  2. Apertium in CX