Wikimedia Quality and Test Engineering Team/Lessons learned testing MediaWiki and associated software

Encourage error handling that defaults to safe or helpful behaviour

While testing watchlist expiry, I noted that an invalid expiry time would lead to an exception (here). Later, the same exception appeared in production and prevented some users from watching pages.

In hindsight, I could have asked that, instead of raising an exception on an invalid expiry, the software fall back to some safe behaviour (e.g. ignore the invalid expiry and watch permanently) and perhaps raise only a warning. That way, users would not have been completely blocked from doing what they wanted.

Be careful: what counts as "safe" will vary from case to case.

Moreover, be careful not to hide exceptions that might point to genuine bugs or misconfigurations.
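As a minimal sketch of the kind of fallback described above (this is not how MediaWiki implements watchlist expiry; the function name `parse_expiry`, the logger name and the ISO 8601 input format are all assumptions made for illustration), an invalid expiry could be logged as a warning and treated as "watch permanently". Only the specific parsing error is caught, so that unrelated exceptions pointing to genuine bugs still surface.

```python
import logging
from datetime import datetime, timezone
from typing import Optional

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("watchlist")


def parse_expiry(raw: Optional[str]) -> Optional[datetime]:
    """Return the expiry as an aware datetime, or None to watch permanently."""
    if raw is None or raw.strip() == "":
        # No expiry requested: watch permanently.
        return None
    try:
        # Assumes ISO 8601 input such as "2030-01-01T00:00:00".
        expiry = datetime.fromisoformat(raw)
    except ValueError:
        # Safe fallback: warn and watch permanently instead of blocking the user.
        # Only ValueError is caught, so other exceptions are not hidden.
        logger.warning("Ignoring invalid expiry %r; watching permanently", raw)
        return None
    if expiry.tzinfo is None:
        # Treat naive timestamps as UTC (an assumption of this sketch).
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry
```

Whether "watch permanently" is really the safe choice depends on the feature; in other cases the safer fallback might be to do nothing at all.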

When testing calculations, ask for high precision

When testing the accuracy of calculations, rounding in the software's output can hide errors.

The more precision the software can give you, the more likely you are to spot small discrepancies.

For example, I missed a bug here because the software rounded its output.
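To illustrate the effect with made-up numbers (the values and the helper `matches` below are hypothetical, not taken from the bug mentioned above), here is a small Python sketch in which an expected and an actual result agree when rounded to two decimal places but differ at four.

```python
from decimal import Decimal

# Hypothetical values: the "actual" result is slightly wrong.
expected = Decimal("0.1234")
actual = Decimal("0.1249")


def matches(expected: Decimal, actual: Decimal, places: int) -> bool:
    """Compare two values after rounding both to the given number of decimal places."""
    quantum = Decimal(10) ** -places  # e.g. places=2 gives Decimal("0.01")
    return expected.quantize(quantum) == actual.quantize(quantum)


# At two decimal places both values round to 0.12, so the error is hidden.
matches_at_2 = matches(expected, actual, 2)
# At four decimal places the discrepancy becomes visible.
matches_at_4 = matches(expected, actual, 4)
```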

Share your raw test data

Someone else might spot something in your data that you missed.

For example, I did not do so initially here; had I done so, we might have spotted this bug earlier. I learned my lesson and did so here.

Pay attention not just to what has changed, but what has not changed

When doing before-and-after comparisons, it is tempting to concentrate only on what has changed. However, you should also consider what should have changed but did not.

For example, I missed this bug because I did not notice that a database value was not changed when it should have been. I was only looking for rows in the database that had changed.
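One way to make this check explicit is to state up front which rows you expect to change and then compare that expectation against the snapshots. The following is only a sketch under assumptions: the function `compare_snapshots`, the row keys and the values are all invented for illustration, and snapshots are modelled as plain dictionaries rather than real database rows.

```python
def compare_snapshots(before: dict, after: dict, expected_to_change: set) -> dict:
    """Classify rows by whether they changed and whether a change was expected.

    Only keys present in `before` are considered; newly added rows are ignored.
    """
    changed = {key for key in before if before[key] != after.get(key)}
    return {
        "changed_as_expected": sorted(changed & expected_to_change),
        "changed_unexpectedly": sorted(changed - expected_to_change),
        # The easy-to-miss case: rows that should have changed but did not.
        "unchanged_but_should_have": sorted(expected_to_change - changed),
    }


# Hypothetical snapshots keyed by row id.
before = {1: "2022-01-01", 2: "2022-01-01", 3: None}
after = {1: "2022-06-01", 2: "2022-01-01", 3: None}

# Rows 1 and 2 were both expected to be updated; row 2 was silently left alone.
report = compare_snapshots(before, after, expected_to_change={1, 2})
```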

You can use SQL for test data generation

SQL has lots of nice features for finding and generating complicated test data.

For example, the PARTITION BY clause can find rows in a database with different combinations of variables. Here is an example where I use it to find data for permissions testing.
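As a rough sketch of the PARTITION BY technique (not the linked query), the snippet below runs SQL from Python against an in-memory SQLite database. The table `test_user` and its columns are invented for illustration, and it assumes the bundled SQLite is version 3.25 or later, which is needed for window functions. It picks one representative user for each combination of group and blocked status that exists in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and data, for illustration only.
conn.executescript("""
    CREATE TABLE test_user (name TEXT, user_group TEXT, is_blocked INTEGER);
    INSERT INTO test_user VALUES
        ('Alice', 'sysop', 0),
        ('Bob',   'sysop', 0),
        ('Carol', 'sysop', 1),
        ('Dave',  'user',  0),
        ('Erin',  'user',  0);
""")

# Number the rows within each (user_group, is_blocked) partition,
# then keep only the first row of each partition.
rows = conn.execute("""
    SELECT name, user_group, is_blocked
    FROM (
        SELECT name, user_group, is_blocked,
               ROW_NUMBER() OVER (
                   PARTITION BY user_group, is_blocked
                   ORDER BY name
               ) AS row_in_combination
        FROM test_user
    ) AS numbered
    WHERE row_in_combination = 1
    ORDER BY user_group, is_blocked
""").fetchall()
```

With this data, `rows` holds one user per existing combination, which also makes it obvious that no blocked ordinary user exists and one may need to be created.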

Sometimes, you need to generate test data that has complicated logical dependencies. This can be done quite elegantly with SQL. Here is an example of finding all combinations of user preferences.
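A minimal sketch of generating combinations with a dependency between them, again using SQLite from Python. The preference names are hypothetical and are not real MediaWiki preferences. A CROSS JOIN produces every combination, and the CASE expression encodes the assumed rule that the email format only matters when email is enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One small table per hypothetical preference, listing its possible values.
conn.executescript("""
    CREATE TABLE email_enabled (value INTEGER);
    INSERT INTO email_enabled VALUES (0), (1);
    CREATE TABLE email_format (value TEXT);
    INSERT INTO email_format VALUES ('html'), ('plain');
    CREATE TABLE watch_default (value INTEGER);
    INSERT INTO watch_default VALUES (0), (1);
""")

# CROSS JOIN yields all 2 x 2 x 2 combinations. The CASE expression blanks out
# email_format when email is disabled, and DISTINCT removes the duplicates
# that this creates, leaving only the meaningful combinations.
combinations = conn.execute("""
    SELECT DISTINCT
           e.value AS email_enabled,
           CASE WHEN e.value = 1 THEN f.value END AS email_format,
           w.value AS watch_default
    FROM email_enabled AS e
    CROSS JOIN email_format AS f
    CROSS JOIN watch_default AS w
    ORDER BY 1, 2, 3
""").fetchall()
```

Here eight raw combinations reduce to six, because the two email formats are indistinguishable when email is switched off.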

Pay attention to logs while testing

The logs can reveal important exceptions that you would not otherwise notice from the UI. Some of these might turn into train-blockers, so they are worth reporting early.

Instructions for seeing beta logs
Instructions for seeing logs in your local environment (e.g. docker)
- or MediaWiki-Docker/Configuration_recipes/Logging (I have not tried this myself)

You can also sometimes verify certain behaviour by monitoring the logs. For example, you can see whether a hook is being fired, or check whether an SQL query is correct (e.g. phab:T303034). Bear in mind, though, that you won't have access to SQL query logs on beta.

When writing automation, separate data generation, test execution and results analysis

I find it useful to have different scripts performing different functions:

one that generates the test data
one that executes the test based on the test data
one that analyses the results of the test execution

Have each script write out a .csv file (for example) which the next script in the sequence will read.

Advantages:

Easier to develop and test each function in isolation
If one function fails or has a bug, the others do not have to be re-run
They can be in different languages. Different languages have different strengths and weaknesses. (For example, SQL can be good for test data generation but not for test execution)
You may find that for some things you only need the output of one of these scripts (e.g. you might use just the test data for exploratory testing)

Here is an example of three scripts which can be run in sequence, but the test data on its own is handy as well: test data generation, test execution and results analysis.
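Separately from the linked scripts, the overall shape can be sketched in Python as follows. Each function stands in for what would be its own script; they are kept in one file here only so the sketch is self-contained. The file names, CSV columns and the `system_under_test` function (which contains a deliberate bug) are all invented for illustration.

```python
import csv
import tempfile
from pathlib import Path


def generate_test_data(data_path):
    """Stage 1: write the test cases to a CSV file."""
    with open(data_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["case_id", "input", "expected"])
        writer.writerows([[1, "2", "4"], [2, "3", "9"], [3, "-4", "16"]])


def system_under_test(value):
    """Hypothetical stand-in for the software being tested.

    It is meant to square its input but is deliberately wrong for negatives.
    """
    return value * abs(value)


def execute_tests(data_path, results_path):
    """Stage 2: read the test cases, run them, and write the raw results."""
    with open(data_path, newline="") as src, open(results_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["case_id", "expected", "actual"])
        writer.writeheader()
        for row in csv.DictReader(src):
            actual = system_under_test(int(row["input"]))
            writer.writerow(
                {"case_id": row["case_id"], "expected": row["expected"], "actual": actual}
            )


def analyse_results(results_path):
    """Stage 3: read the raw results and return the ids of the failing cases."""
    with open(results_path, newline="") as f:
        return [row["case_id"] for row in csv.DictReader(f) if row["expected"] != row["actual"]]


# Run the three stages in sequence; each only communicates via its CSV file.
work_dir = Path(tempfile.mkdtemp())
data_file = work_dir / "test_data.csv"
results_file = work_dir / "results.csv"

generate_test_data(data_file)
execute_tests(data_file, results_file)
failed_cases = analyse_results(results_file)
```

Because the stages only share files, a bug in the analysis can be fixed and that stage re-run against the existing results file without executing the tests again.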