Wikimedia Technical Conference/2018/Session notes/Improving frontend testing and logging

Questions to answer during this session

(For each question below, the significance explains why the question is important and what is blocked by it remaining unanswered.)

Question: What works, and doesn’t work, about our front-end testing infrastructure? What are the known gaps or failure conditions in our current testing methodology?
Significance: Before deciding what should be changed, it’s important to understand the current state of our front-end testing infrastructure. This will help to enumerate the issues, and the things that do not need to be addressed.

Question: What tools or methods should we be looking at to enhance (or replace) parts of our front-end testing environment?
Significance: Front-end testing has developed significantly over the last few years, including some advances that may not obviously apply to MediaWiki. This will help to identify how these changes apply.

Question: What confidence do you have that a change that passes our existing tests will be safe to deploy to production? What changes or enhancements to our existing test methodology or coverage would make you more confident?
Significance: This will expand on the lived experience of developers who are extending the front end of MediaWiki, which will help to define the priority of various enhancements.

Question: Once front-end code has been deployed, how should we log errors? What are the tradeoffs of building/maintaining a client-side error logging solution, versus integrating with existing solutions?
Significance: Client-side logging is the primary mechanism for identifying bugs that have been introduced to production. There are several ways this type of logging could be implemented, and we need to understand the tradeoffs involved.

Attendees list

  • [Didn't capture list]

Structured notes


There are five sections to the notes:

  1. Questions and answers: Answers the questions of the session
  2. Features and goals: What we should do based on the answers to the questions of this session
  3. Important decisions to make: Decisions which block progress in this area
  4. Action items: Next actions to take from this session
  5. New questions: New questions revealed during this session

Questions and answers


Please write in your original questions. If you came up with additional important questions that you answered, please also write them in. (Do not include “new” questions that you did not answer, instead add them to the new questions section)

Q: What works, and doesn’t work, about our front-end testing infrastructure?  What are the known gaps or failure conditions in our current testing methodology?
A (If you were unable to answer the question, why not?):

Works:

Out-of-environment isolated tests that run in Node, or in a browser outside of MediaWiki, can work great and be very fast to run. However, they may require writing code carefully, for example using dependency injection so that collaborators can be mocked.
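As an illustration of the dependency-injection point (a hypothetical sketch, not code from the session; createTitleFormatter and its collaborators are invented names):

    // titleFormatter.js: a hypothetical module written for testability.
    // Instead of reaching for mw.config or the document global directly,
    // the factory receives its collaborators as arguments, so tests can
    // pass in mocks and the code can run outside MediaWiki.
    function createTitleFormatter( config, doc ) {
        return {
            format: function ( title ) {
                return title + ' - ' + config.get( 'wgSiteName' );
            },
            apply: function ( title ) {
                doc.title = this.format( title );
            }
        };
    }

    module.exports = createTitleFormatter;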

Doesn’t work:

Documentation is poor and best practices are not well known. It is hard to get new folks involved in writing tests if they don’t know how to do so, or don’t know how to write good, maintainable tests.

Browser-based tests are often very slow if they must boot up a MediaWiki page, etc.

When tests are slow or have unpredictable performance, it is hard to rely on their results (high false positive/negative rates).

Q: What tools or methods should we be looking at to enhance (or replace) parts of our front-end testing environment?
A (If you were unable to answer the question, why not?):

More use of isolated tests is good: it improves test speed, confidence, reliability, and round-trip time, and allows tests to run on the CLI as well as in browsers via automation.
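A minimal sketch of an isolated test for the hypothetical module above, assuming the qunit npm package so it can run on the command line (e.g. via npx qunit) without booting MediaWiki or a browser:

    // titleFormatter.test.js: run with the qunit CLI, which provides
    // QUnit as a global; no MediaWiki or browser needed.
    const createTitleFormatter = require( './titleFormatter.js' );

    QUnit.module( 'titleFormatter' );

    QUnit.test( 'format() and apply() use the injected collaborators', function ( assert ) {
        // Mock the collaborators instead of booting MediaWiki.
        const fakeConfig = { get: function () { return 'Wikipedia'; } };
        const fakeDocument = { title: '' };
        const formatter = createTitleFormatter( fakeConfig, fakeDocument );

        assert.strictEqual( formatter.format( 'Main Page' ), 'Main Page - Wikipedia' );

        formatter.apply( 'Main Page' );
        assert.strictEqual( fakeDocument.title, 'Main Page - Wikipedia' );
    } );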

More documentation is good to help people write and maintain good tests.

Emphasizing the need to maintain living tests is key! Crufty tests that don’t work erode confidence and trust in the test suite.

Q: What confidence do you have that a change that passes our existing tests will be safe to deploy to production?  What changes or enhancements to our existing test methodology or coverage would make you more confident?
A (If you were unable to answer the question, why not?):

Not as much confidence as we would like, given crufty tests and incomplete coverage.

Improved code-coverage information would help drive further test writing; it was suggested that QUnit can report this data (to be confirmed).
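One way to obtain such coverage data (an assumption here, not something decided in the session): QUnit itself does not measure coverage, but a Node-based test run can be wrapped with an instrumenter such as nyc/Istanbul, e.g. via hypothetical package.json scripts like:

    {
        "scripts": {
            "test": "qunit tests/",
            "coverage": "nyc --reporter=text --reporter=html qunit tests/"
        }
    }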

Q: Once front-end code has been deployed, how should we log errors?  What are the tradeoffs of building/maintaining a client side error logging solution, versus integrating with existing solutions?
A (If you were unable to answer the question, why not?):

Two main possibilities: look into Sentry, which is a client-error-logging-specific tool… or expand the use of EventLogging as a transport/backend for reporting client errors.

Sentry has the advantage of being purpose-built, already existing, and having support from the outside world. However, we didn’t get into many details; this needs further investigation.
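For scale, the browser-side wiring for Sentry is small; a hypothetical sketch using the @sentry/browser SDK (the DSN, release value, and riskyOperation below are placeholders, and how the SDK would be loaded through ResourceLoader was not discussed):

    // Hypothetical Sentry setup for a front-end bundle.
    import * as Sentry from '@sentry/browser';

    Sentry.init( {
        dsn: 'https://examplePublicKey@sentry.example.org/1', // placeholder DSN
        release: '1.32.0-wmf.26' // lets errors be grouped per deployment
    } );

    // The SDK captures unhandled errors automatically; handled ones
    // can be reported explicitly as well.
    try {
        riskyOperation(); // placeholder for real application code
    } catch ( e ) {
        Sentry.captureException( e );
    }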

EventLogging used to have a major limitation: its performance was not good enough to be confident that large error spikes could be handled well. Folks seem to think this is no longer an issue, given various changes that have been made to store data more efficiently. However, there is an event-size limitation which may make it difficult to send full stack traces, and this needs to be looked into. EventLogging would allow use of Analytics’ existing tools for searching and reporting, broken down by browser, version, OS, environment, etc., but it would probably require custom development for logging-specific features.
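For comparison, a rough sketch of the EventLogging route, assuming the extension’s mw.eventLog.logEvent() API and a hypothetical "ClientError" schema; the stack trace is truncated here because of the event-size limit mentioned above:

    // Hypothetical global error reporter built on EventLogging.
    // 'ClientError' and its fields are an assumed schema, not an existing one.
    var MAX_TRACE_LENGTH = 1000; // events have a size cap, so trim stack traces

    window.addEventListener( 'error', function ( event ) {
        mw.eventLog.logEvent( 'ClientError', {
            message: String( event.message ).slice( 0, 300 ),
            url: event.filename,
            line: event.lineno,
            column: event.colno,
            stack: ( event.error && event.error.stack ) ?
                event.error.stack.slice( 0, MAX_TRACE_LENGTH ) :
                null
        } );
    } );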

The trade-offs need to be identified in more detail, a decision made, and resources committed to pushing forward on this.

Important decisions to make

What are the most important decisions that need to be made regarding this topic?
1. Logging: decide whether to use Sentry or do some custom work on EventLogging.
Why is this important?

EventLogging should handle the necessary scale and the analytics for browser/environment breakdowns, but it might need custom work for logging-specific features. The trade-offs against adopting Sentry need to be considered.

What is it blocking?

Wide-scale deployment of client-side logging.

Who is responsible?

Unknown - QA SIG?

Action items

What action items should be taken next for this topic? For any unanswered questions, be sure to include an action item to move the process forward.
1. Parties interested in improving testing should join the QA SIG, which meets Fridays!
Why is this important?

We need to improve documentation and best practices around client testing; new tools are available, and folks should work together to help developers learn how to properly write, run, and maintain these tests.

What is it blocking?

Getting more people to actively write tests relies on having good docs to point people at so they know which tools to use and how to use them.

Who is responsible?

Individuals interested in improving their teams’ testing. (Who, specifically?)

2. Provide versioning information for deployments (code + config) that can be tracked in error logging.
Why is this important?

Errors that occur during deployments may be transitory version mismatches, or they might be real errors. We need enough granularity in the logging to tell what’s going on during a deploy, and to examine problems post-mortem.
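As an illustration of the kind of granularity meant here, each error report could carry version information for both code and configuration. wgVersion and skin are real config keys; the deploy identifier is hypothetical and would have to be exposed by the deploy tooling:

    // Hypothetical helper: attach version information to every client
    // error report so errors can be correlated with a specific deploy.
    function getVersionInfo() {
        return {
            mwVersion: mw.config.get( 'wgVersion' ), // e.g. '1.32.0-wmf.26'
            skin: mw.config.get( 'skin' ),
            // 'wgDeployId' does not exist today; it stands in for whatever
            // per-deployment identifier scap3 (or similar) could expose.
            deployId: mw.config.get( 'wgDeployId' ) || null
        };
    }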

What is it blocking?

Confidence of release engineers that client-side logging info can be used in realtime during deploys to identify problems.

Who is responsible?

May end up an item for scap3.

3. Once the decision between Sentry and EventLogging is made, someone must keep working on the implementation.
Why is this important? What is it blocking? Who is responsible?

Detailed notes


Place detailed ongoing notes here. The secondary note-taker should focus on filling any [?] gaps the primary scribe misses, and writing the highlights into the structured sections above. This allows the topic-leader/facilitator to check on missing items/answers, and thus steer the discussion.

  • What works, and doesn’t work, about our front-end testing infrastructure?  What are the known gaps or failure conditions in our current testing methodology?
    • Question: Do apps count as “front end”?
      • Answer: yes! But they also have extra special elements as well
    • See photo of board for identified issues → link
  • What tools or methods should we be looking at to enhance (or replace) parts of our front-end testing environment?
    • Sam Smith’s team is currently working on migrating some of the core ResourceLoader modules, using webpack to bundle them together prior[?] to ResourceLoader. A number of useful properties come out of this, especially not having such a set of resource modules, and it allows access to more modern tooling (we require being able to run tests on the command line, which lets us run the tests with or without MediaWiki loaded). This also enables other kinds of testing, like mutation testing (changing the runtime), which gives you more confidence in the software.
    • Joaquim: the first time you run it, it generates and stores something on disk, which is useful for testing (“test snapshots”; a generic sketch of snapshot testing appears after these notes).
    • It is easier to test this way, without relying on HTML.
    • If you do QUnit testing on Node only, you don’t need to boot up MediaWiki, which helps Joaquim’s team: tests run fast, start-up is small, and you can watch them. However, you have to write the code in a more dependency-injection-oriented way.
    • Timo: we should still have browser tests; we currently don’t have good conventions on how to write browser tests, and we should.
    • Do we want to mock the database when we do an integration test, or have it be real-world? We don’t want to mock the whole thing, because then it’s not real-world (e.g. issues with the job queue). Proposed idea: have a fresh database every time we run integration tests.
    • We don’t really have smoke tests at the moment.
    • Q: What browsers does Selenium run on? A: Firefox and Chrome.
    • Issue: there are not a lot of QA people at the Foundation, which is one of the reasons our browser tests are lacking (this was a big point).
    • Raz: think more about what to put in a browser test and what not to, especially things beyond just pass/fail in tests (such as waiting for a view before loading a test). [This was pretty technical and I didn’t get all of it.]
    • We are still tied to Ruby.
    • Establish a meeting (regular or otherwise) to go over practices, such as the QA SIG! This is an underutilized resource.
    • Use Summer of Code to tap student resources.
    • We are falling into the trap again of treating this as a way to get software (tests) written when students write our tests; we need to mentor these students to do it properly, and Raz is concerned we aren’t there yet.
    • Test maintenance AND evaluation: the blocker here is training on how to write proper tests (especially the Node.js[?] and Ruby tests, since they are so new to us).
    • Major takeaways
      • Non-browser-based testing on the command line with Node increases speed, either by bundling things so they run outside MediaWiki, or by being careful that dependency injection doesn’t create unnecessary reliances.
      • Snapshot testing is helpful, giving feedback about what failed and why.
      • Better documentation and best-practices sharing
  • What confidence do you have that a change that passes our existing tests will be safe to deploy to production?  What changes or enhancements to our existing test methodology or coverage would make you more confident?
    • This discussion merged into the previous one so see above
  • Once front-end code has been deployed, how should we log errors?  What are the tradeoffs of building/maintaining a client side error logging solution, versus integrating with existing solutions?
    • Sentry! Can be used for front end and back end [and some other things - ab testing? Didn’t catch it]
    • Using event logging works for us but needs improvements
      • Good for specific scenarios, not general
      • Size limitations
        • Sam: Do we know that EventLogging as it stands can’t take the load? Gergo says probably yes, but →
        • Page Previews was logging 20,000 events a minute at peak and had no issues. There was a message that the load was unexpected, but it didn’t seem to care. It can handle the scale, according to Timo.
    • The use case Timo is interested in is having telemetry during deployment. How do we do this? We have it for back ends via Logstash, but how for the front end?
      • Sage: perhaps Sentry?
      • [technical discussion i didn’t catch, see board for takeaway]
    • We can start doing expectation analysis (Sam: and should) as we collect more and more of this data
    • Turnilo is great, according to Sam, for showing us the errors we have logged and for learning from them.
    • A general friendly reminder that we don’t have to build everything ourselves.
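
To make the “test snapshots” idea above concrete, a generic Jest-style sketch (an assumption for illustration, not necessarily the setup Joaquim’s team uses; renderGreeting is an invented function):

    // Generic Jest snapshot test: the first run writes the rendered output
    // to a __snapshots__ file on disk; later runs fail if the output changes,
    // and the diff shows what changed and why.
    const renderGreeting = ( name ) => '<p class="greeting">Hello, ' + name + '!</p>';

    test( 'greeting markup stays stable', () => {
        expect( renderGreeting( 'reader' ) ).toMatchSnapshot();
    } );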

Whiteboard captures: