Language Testing Plan/Testing Strategy

Goals of this document

This document describes the current and future testing procedures and strategy of the Wikimedia Language Engineering team.

Testing workflow

The current testing workflow is not well defined or well documented. We use two types of testing in our workflow: manual and automated.

For each developer's individual testing methodology, see the spreadsheet.[1]

Design references

Before testing begins, we spend time on feature conception and on writing Given-When-Then scenarios (GWTs), which feed into various tests such as browser tests (an example scenario follows the list below).

  • Feature conception
  • GWTs
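
For illustration, a GWT scenario for a browser test could read like the one below. The feature and the steps are made up for this example and do not describe an actual test in our repositories.

  Given I am on a wiki page as an anonymous user
  When I open the language selector
  Then I should see a list of languages to choose from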

Manual testing

  • Manual testing of patches submitted to Gerrit.
  • Manual testing of MLEB[2] is done once a month (the release is usually near the end of the month). Currently this covers the Universal Language Selector (ULS) extension; the other extensions receive little or no manual testing as part of MLEB.

Manually running unit tests such as PHPUnit and QUnit tests where applicable. See the 'Statistics' section for the number of unit tests in our extensions at the moment.
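
For illustration, a minimal QUnit test of the kind counted in the Statistics section could look like the sketch below. The filterByPrefix() helper is defined inline purely for the example; it is not actual extension code.

  // Minimal QUnit test sketch. filterByPrefix() is an illustrative stand-in,
  // defined inline so that the example is self-contained.
  function filterByPrefix( codes, prefix ) {
      return codes.filter( function ( code ) {
          return code.indexOf( prefix ) === 0;
      } );
  }

  QUnit.module( 'language filtering' );

  QUnit.test( 'filters language codes by prefix', function ( assert ) {
      assert.deepEqual(
          filterByPrefix( [ 'fi', 'fr', 'en' ], 'f' ),
          [ 'fi', 'fr' ],
          'Only codes starting with "f" are returned'
      );
  } );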

Automated testing

  • Automated testing after a patch is submitted to Gerrit. This checks the patch with jslint, PHP CodeSniffer and other syntax checks. This procedure (along with the release process) is documented in the Continuous Delivery Process diagram.[3]
  • Automated browser testing on beta wikis and several instances.
  • Browser tests: coding, review and maintenance (i.e. how they are changed when the original feature changes or bugs are fixed).
  • Pairing with QA:
    • Jenkins configuration fixes (optional)
    • Bug fixes.
    • Beta labs configuration-related fixes.

Minimal testing

I have defined 'Minimal testing' as the minimum amount of testing that every change should get.

 

Manual

  • The change passes the unit tests (QUnit/PHPUnit) written for the code.
  • The change passes the browser tests written for the code.

Automated

  • The code passes the Jenkins validations (which are required to merge the code).
  • The code passes the browser tests for the given components. Failures are reported on Jenkins.

Average testing

An average testing scenario is what we do in the Minimal testing case plus the following test cases.

 

Manual browser testing

  • We run the browser tests ourselves, i.e. we run the tests manually against the patch.
  • We test the change with different browsers.

Extreme testing

Extreme testing includes all scenarios from Minimal and Average testing plus the following cases. The extreme scenario also puts more emphasis on the testing procedure itself:

 
  • A unit test is written for each backend patch or change.
  • A browser test is written for each frontend change or update.
  • We test the change to make sure that several components are taken into account; for example, we do stress testing, try to make tests fail, and try edge cases.
  • We use different browsers to test changes.
  • Several people are involved in testing when needed, i.e. more feedback on critical changes.
  • If the change is big, we set up instance(s) on Labs for more thorough testing.
  • We use a test case management system for manual testing.[4]
  • Browser tests are always green!

Statistics

Browser tests

We run browser tests on several beta wikis and instances. The following are the numbers of scenarios for each extension our team maintains:

  • Universal Language Selector: 45
  • Translate: 35
  • Content Translation: 11
  • Twn Main Page: 23

Unit tests

We have two types of unit tests for our extensions. There is an ongoing discussion about writing node.js unit tests for Content Translation.

QUnit tests

  • Universal Language Selector: 3

ULS also inherits the following QUnit tests from the upstream jquery.* libraries. Note that these figures are the total numbers of assertions in the unit tests.

  • jquery.i18n: 160
  • jquery.ime: 5109
  • jquery.uls: 48
  • jquery.webfonts: 29
  • Translate: 3
  • Content Translation: 19
  • Twn Main Page: 0

PHPUnit tests

Note: these numbers can be misleading, as I counted each file as one 'unit', and in some cases several tests live in a single file. Still, it should give a clear idea of which extension has the most PHPUnit tests as of now.

  • Universal Language Selector: 2
  • Translate: 45
  • Content Translation: 0
  • Twn Main Page: 2

Use Cases

ULS

  • Stage 0:
    • ULS inherits tests from the upstream jquery components (jquery.uls, jquery.ime, jquery.i18n and jquery.webfonts).
    • Unit tests are mostly written for the jquery.* libraries. This is a one-time procedure, with updates to the tests as the code requires.
  • Stage 1:
    • Frontend features are tested with browser testing, which involves coding, review, local testing, various validations and pairing with QA.
    • Once a browser test is merged in Gerrit, it is run regularly by Jenkins twice a day. Failures are reported via email to the developers listed in the Jenkins configuration file.
  • Stage 2:
    • Browser test failures are fixed by developers and/or QA. Pairing sessions are held for this.
  • Manual tests are often performed for ULS's various features, such as manual verification of the Autonym font, IMEs and other components.

Recommendations

  • While browser tests are useful for ULS, they take more time than we expected (for example, debugging failures with Internet Explorer), so we should timebox our browser testing.
  • Manual testing is always helpful for ULS, where we deal with fonts and input methods.
  • MLEB testing has indicated that manual testing reveals more bugs than automated testing in the case of ULS.

WebFonts

Webfonts are part of the main ULS repository.

  • The webfonts code is in the jquery.webfonts GitHub repository.
  • The actual webfonts are in the ULS fontrepo, which contains a webfont testing interface that needs manual testing whenever fonts are updated or changed by a developer. Manual testing of fonts is also received as feedback from various language communities.
  • Some webfont testing is often done as part of Translate.
  • RTL testing and feedback is done by the team and the community as part of automated and manual testing.

Recommendations

  • Webfonts require a mix of unit testing and manual testing.
  • Features like the aesthetics of particular fonts or RTL rendering need manual testing.

MLEB

MLEB test cases are repeated for 4 browsers (Google Chrome, Firefox, Safari and Internet Explorer) and 2 releases (stable and legacy-stable at the time of testing).

  • Total number of test cases of each browser for each release: 153
  • Total test cases (all): 612

Recommendations

  • The TestLink instance is useful and we should continue testing MLEB with it, to save testing time and to keep a record of test cases.

Content Translation

Content Translation testing follows this process:

  • GWTs and the feature conception for frontend testing were written when the ContentTranslation project started.
  • Automated browser testing was set up for frontend testing.
  • Automated browser testing was stopped, as we couldn't keep the browser tests refactored at the same pace as the ContentTranslation backend code changed.
  • Manual testing was adopted for the backend.
  • Manual testing was done for important features like segmentation, which is tested against articles from production Wikipedias.
  • Instances were set up for the ContentTranslation server and client.
  • A server testing playground (cxserver.wmflabs.org) was set up to showcase and test features like segmentation and to check server logs.
  • A node.js testing framework is in progress; we need it for the backend.
  • Unit tests for features like ROT13 and segmentation have been written (a minimal sketch of such a test follows this list).
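
As an illustration of the kind of unit test meant here, the sketch below exercises a ROT13 helper with node.js's built-in assert module. The rot13() implementation is written inline for the example and is not the actual ContentTranslation code.

  // Sketch of a node.js unit test for a ROT13 helper, using the built-in assert module.
  var assert = require( 'assert' );

  // Illustrative ROT13 implementation, not the ContentTranslation one.
  function rot13( text ) {
      return text.replace( /[a-zA-Z]/g, function ( c ) {
          var base = c <= 'Z' ? 65 : 97;
          return String.fromCharCode( ( c.charCodeAt( 0 ) - base + 13 ) % 26 + base );
      } );
  }

  assert.strictEqual( rot13( 'Hello' ), 'Uryyb' );
  // Applying ROT13 twice must give back the original text.
  assert.strictEqual( rot13( rot13( 'Segmentation' ) ), 'Segmentation' );
  console.log( 'rot13 tests passed' );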

Recommendations

  • Instances:
    • The server and client infrastructure is maintained via a cron job, but someone needs to watch for downtime of the Labs instances.
    • Backups etc. are taken care of by the Labs team.
  • Manual testing:
    • Manual testing of features like segmentation should be recorded and moved to automated testing when it becomes repetitive.
    • Define criteria for when to stop the manual testing process and switch to automated testing.
  • Unit testing:
    • Setting up the node.js testing infrastructure is a priority.
  • Performance testing:
    • We should focus on performance testing (and measurement) for Content Translation.

Platforms

We use the following platforms in our testing:

Beta labs

  • Beta labs has an (almost) similar setup to the production wikis.
  • It is used as the test wiki in browser testing.
  • The current list of test wikis used by the Language Engineering team is on the Test Wikis page.[5]

Jenkins

Jenkins is used in automated browser testing jobs.[6]

Recommendations

1. Recommendations for each project are listed in the Use Cases section above.

2. Timebox the testing:

  • We should timebox the testing.
  • MLEB testing for each browser should not exceed 2 hours per person. This can be further reduced with TestLink optimizations.
  • Browser tests should be written carefully, and only when required, to reduce the time spent on refactoring them later.

3. A Labs instance for each project, where applicable.

Example: the instance for Content Translation (CX) at http://language-stage1.wmflabs.org has been set up exclusively for testing. The Content Translation server is set up at http://cxserver.wmflabs.org. This server is also used in the automated browser testing procedure.

4. Automated browser testing instances with a controlled environment.

Example: an instance with webfonts enabled by default in the ULS extension at http://language-browsertests.wmflabs.org. Such a setup eases testing of things that are difficult to test in the default scenarios.

5. Test Case Management System(s).

This will reduce the time spent on manual testing such as MLEB testing. We can easily compare previous results with current results. It will also help us track bugs reported by such tests.

6. Automated unit testing. This is an experimental idea and can be done after discussion with the team.

7. System- and performance-level testing, for example page loading and page slowness testing.

References