Content translation/Deployments/How-to/TPA
This is a how-to document for updating the Template Parameter Alignment (TPA) database in cxserver.
Connect to stat100x
ssh -N stat100X -L 8880:127.0.0.1:8880
Open http://localhost:8880/ in a browser.
This opens JupyterHub, which requires your LDAP password to log in.
Starting notebook
Check when your Kerberos authentication expires first. The default timeout is currently set to 48 hours.
klist
Extend it by running kinit:
kinit
Running scripts
- Open a terminal and clone https://gitlab.wikimedia.org/dsaez/templatesAlignment
- Update config.json with the language pairs for which template parameter alignments need to be generated.
- Run all notebooks in order. 00ExtractNamedTempates.ipynb overwrites its existing output files when it runs again, so save the produced JSON files (e.g. templates-articles_xx.json and templates-summary_xx.json) in another directory to avoid losing data. For large languages like en, these files can be reused if the process is run again within a few days, which saves time.
- While running 02alignmentsSpark.ipynb, make sure that the Wikidata partition is up to date.
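Saving the JSON outputs before a rerun can be scripted. The following is a minimal sketch, not part of the templatesAlignment repository: the function name backup_outputs and the timestamped folder layout are assumptions made for illustration, and the filename patterns are taken only from the two example filenames mentioned above.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Patterns derived from the example filenames in the text
# (templates-articles_xx.json, templates-summary_xx.json).
OUTPUT_PATTERNS = ("templates-articles_*.json", "templates-summary_*.json")


def backup_outputs(work_dir, backup_root, patterns=OUTPUT_PATTERNS):
    """Copy matching JSON outputs into a new timestamped folder.

    Hypothetical helper: returns the list of copied paths, or an empty
    list if no file in work_dir matches the patterns.
    """
    work_dir = Path(work_dir)
    files = sorted(p for pattern in patterns for p in work_dir.glob(pattern))
    if not files:
        return []

    # One folder per backup run, so earlier copies are never overwritten.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / stamp
    target.mkdir(parents=True, exist_ok=True)

    # copy2 keeps modification times, which helps judge how fresh a file is.
    return [Path(shutil.copy2(path, target)) for path in files]
```

Calling backup_outputs(".", "../tpa-backup") from the notebook directory before rerunning the first notebook would keep the previous run's files; both paths here are placeholders.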
Updating database
Run scripts/prepare-template-mapping.sh from cxserver, pointing it at all the files generated by the process.
This writes an updated templatemapping.db in the same folder. Use the sqldiff command (available in the sqlite3-tools package on Linux) to see the differences between the old and new databases.
Copy it to config/templatemapping.db and submit a patch for review. The database can be opened with the sqlite command to check the number of template parameters updated.
For example: sqlite> select count(*) from templates where source_lang='en' and target_lang='vec';
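The same count can be taken for every language pair at once and compared between the old and new databases. This is a rough sketch using Python's standard sqlite3 module; it assumes only what the query above shows (a templates table with source_lang and target_lang columns). The function names and any file paths passed in are hypothetical. It compares row counts only, so sqldiff remains the tool for a row-level diff.

```python
import sqlite3


def pair_counts(db_path):
    """Return {(source_lang, target_lang): row count} for the templates table."""
    query = (
        "SELECT source_lang, target_lang, COUNT(*) "
        "FROM templates GROUP BY source_lang, target_lang"
    )
    conn = sqlite3.connect(db_path)
    try:
        return {(src, tgt): n for src, tgt, n in conn.execute(query)}
    finally:
        conn.close()


def changed_pairs(old_db, new_db):
    """Return {(source_lang, target_lang): (old_count, new_count)}.

    Only pairs whose counts differ are included; a pair missing from
    one database is reported with a count of 0 on that side.
    """
    old, new = pair_counts(old_db), pair_counts(new_db)
    return {
        pair: (old.get(pair, 0), new.get(pair, 0))
        for pair in sorted(set(old) | set(new))
        if old.get(pair, 0) != new.get(pair, 0)
    }
```

For instance, changed_pairs("config/templatemapping.db", "templatemapping.db") would list the pairs affected by the regeneration, assuming the new file sits in the current folder.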
Notes
1. The fastText_multilingual module is available at: https://github.com/babylonhealth/fastText_multilingual
2. `03ProduceAlignments.py` requires https://github.com/facebookresearch/fastText/tree/master/python instead of the version provided by pip.
Also see
- Issues related to Kerberos access: https://wikitech.wikimedia.org/wiki/SWAP#Access_and_infrastructure
- Jupyter at Wikitech contains useful information: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Jupyter