Thank you for the time and thought you put into drafting your comments, 197.235.242.46.
I'm going to list the points you shared and then do my best to reply to each one, with a few follow-up questions included along the way…
1. Test participants can switch editors and those presented with VisualEditor by default will be more likely to switch to editing using the wikitext editor.
You're right: contributors can switch editing interfaces if they choose, and this is behavior we will be analyzing. That said, how do you see this impacting the test results, considering the test targets contributors with little to no experience editing Wikipedia who, we assume, are not likely to be familiar with the wikitext editor?
2. Fallback to source editor if the VisualEditor can't load
Have you seen the wikitext editor appear on mobile in instances where the VisualEditor took longer than a certain amount of time to load? This is not something we expect to happen, so if it is, something might be wrong and we'd value getting to the bottom of it!
More generally, you make a good point about load times, specifically how they can affect a contributor's likelihood of completing an edit. In fact, one of the things the team is curious to understand from the test results is this relationship between load times and edit success. More info here: T232175#5545364
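To make that concrete, here is a minimal, hypothetical sketch (in Python, not the team's actual analysis) of how one could bucket editor load times and compare the share of sessions that end in a saved edit; the column names and numbers are invented for illustration.

```python
# Hypothetical sketch: relate editor load time to edit completion.
# The column names ("load_ms", "saved") and values are invented for illustration.
import pandas as pd

sessions = pd.DataFrame({
    "load_ms": [800, 1500, 3200, 600, 4100, 2700, 900, 5200],
    "saved":   [1,   1,    0,    1,   0,    1,    1,   0],
})

# Bucket load times, then look at the completion (save) rate per bucket.
sessions["load_bucket"] = pd.cut(
    sessions["load_ms"],
    bins=[0, 1000, 2000, 4000, float("inf")],
    labels=["<1s", "1-2s", "2-4s", ">4s"],
)
print(sessions.groupby("load_bucket", observed=True)["saved"].mean())
```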
Previous studies and the importance of considering edit type in analyses
First, I hadn't thought to use search like this to filter for "research" about a particular topic...this is wonderful!
Second, can you say a bit more about how you think edit type ought to be considered/evaluated in a test like this? You mention differentiating between edits to existing articles and edits associated with creating articles...are you thinking we should limit our comparison of edit completion rates between the mobile wikitext editor and the mobile VisualEditor to certain types of edits? And if so, why?
LintErrors
In bringing up LintErrors, what are you suggesting in the context of the A/B test? Are you thinking that if we were to compare mobile wikitext and mobile VE by the number of edits completed in each, it would be important to consider how many edits completed in mobile VE were edits to correct these errors?
Using ORES to score edit quality
It sounds like you're suggesting edit quality is something we should use to evaluate whether the mobile wikitext editor or the mobile VisualEditor provides contributors with a "better" editing experience...am I understanding you correctly there?
If so, edit quality is something we will be evaluating as part of our analysis. See: A/B test.
Right now, our measure of quality is whether an edit was reverted or not. We chose this approach over ORES because ORES models are not yet deployed on all the wikis included in the test. You can see the ORES deployments here: https://tools.wmflabs.org/ores-support-checklist/
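For anyone curious what that coverage constraint looks like in practice, here is a rough sketch assuming the public ORES v3 scores endpoint; the wiki ("context") and revision ID below are placeholders, not test data.

```python
# Rough sketch, assuming the public ORES v3 scores API; the wiki ("context")
# and revision ID below are placeholders, not real test data.
import requests

def ores_damaging_probability(context, rev_id):
    """Return P(damaging) for a revision, or None if the model isn't deployed
    on that wiki (which is the coverage problem mentioned above)."""
    url = f"https://ores.wikimedia.org/v3/scores/{context}"
    resp = requests.get(url, params={"models": "damaging", "revids": rev_id}, timeout=10)
    resp.raise_for_status()
    result = resp.json()[context]["scores"][str(rev_id)]["damaging"]
    if "error" in result:  # e.g. model not supported on this wiki
        return None
    return result["score"]["probability"]["true"]

print(ores_damaging_probability("enwiki", 123456789))  # placeholder revision id
```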
Obscure all VisualEditor change tags…
Interesting thought...are you suggesting that some contributors might have a negative bias toward edits made in VE, which could make them more likely to revert those edits, in turn creating a bad experience for the contributors who made them and ultimately driving those contributors away?
Many people just open the editor out of curiosity, or because they inadvertently click a redlink, or due to some random click or just to preview content.
This is a great point and something the team is trying to figure out. More specifically, we're trying to understand how we might be able to measure intent.
Said another way: how can we detect whether a contributor is tapping the edit button with the intention to edit or if they're just – as you said – curious?
An idea we've thought of: Do contributors make any changes to the article before abandoning? This is something we'll soon be able to measure. See: T229079
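As a purely illustrative example of that signal, assuming we could capture the text loaded into the editor and the text present when the session is abandoned (the field names here are made up), the check could be as simple as:

```python
# Purely illustrative: treat an abandoned session as showing "intent to edit"
# only if the contributor actually changed something before leaving.
# The field names ("text_on_open", "text_on_abandon") are invented for this sketch.
def changed_before_abandoning(text_on_open: str, text_on_abandon: str) -> bool:
    return text_on_open != text_on_abandon

sessions = [
    {"text_on_open": "Lorem ipsum.", "text_on_abandon": "Lorem ipsum."},        # curiosity tap
    {"text_on_open": "Lorem ipsum.", "text_on_abandon": "Lorem ipsum dolor."},  # attempted edit
]
for s in sessions:
    print(changed_before_abandoning(s["text_on_open"], s["text_on_abandon"]))
```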
If you have any other ideas of how we might be able to detect edit intent, we'd be keen to hear!
Thanks again for all your thought,