Topic on Talk:VisualEditor on mobile/VE mobile default/Flow

Test scenarios

4 comments • 17:22, 18 October 2019


In our 18 September 2019 update, we shared the next steps we plan to take, depending on the results of the A/B test. Do those actions seem correct to you? Is there something we haven't considered? Please let us know what you think...

00:46, 19 September 2019

197.235.242.46

From a research perspective, there are plenty of confounding factors (nuisance variables) in this study:

Participants can change editors on a whim: regardless of whether a user gets VisualEditor or the source editor, they can still switch from one to the other. This means there will be a bias toward the wikitext editor, primarily because it is the default fallback experience and because VisualEditor is an incomplete editor. It can't handle things like "Undo" or edit conflicts, and it isn't available in every text area or in some namespaces.
Fallback to the source editor: if for whatever reason VisualEditor can't load, there is no alternative besides falling back to the source editor. So for huge pages or browser errors, you'll be dumping editors into the wikitext editor regardless of their choice or the test settings.
Bias toward the wikitext editor: the source editor has a lot of inertia, since it has been used for more than a decade. Even if that weren't the case, all edits made with VisualEditor are marked as such in recent changes, so they may be reviewed more carefully, and some reviewers may be quicker to revert them.

It might be prudent to look into similar previous studies. One issue with those studies is that they don't take into account the different nature of edits. Creating a page is different from making a minor edit; the difficulty and the burden are different.

Some suggestions:

Consider edit type: creating a page vs. editing a page (it is much harder to create a new page, especially on mobile devices).
Consider the number of LintErrors introduced by each editor (wikitext vs. VisualEditor). While many lint errors do not really affect a page, some can break rendering, e.g. (Special:LintErrors/wikilink-in-extlink) and misnested tags ([1], [2], [3], [3b]). Scroll through the page and you'll notice several full paragraphs with strange styling (e.g. strikethrough, bold, or italic text). Note that the error has existed since the page was first created ([4]). Of course, there were recent parser changes and it didn't always render like that, but that doesn't change the fact that an average editor shouldn't have to care whether a <b> HTML tag is closed or not.
Consider how many edits were needed to correct such errors.
Consider the ORES rating for each edit: vandals, random kids, or bored people are more likely to get reverted regardless of whether the edit was completed successfully.
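As a rough illustration of the stratification suggested above, here is a minimal sketch that groups edit attempts by editor and edit type (new page vs. existing page) before computing completion and revert rates. The `Session` record and its field names are hypothetical, assumed only for illustration; they are not the schema the A/B test actually logs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical record of one edit attempt; field names are illustrative only.
    editor: str      # "visualeditor" or "wikitext"
    edit_type: str   # "create" (new page) or "edit" (existing page)
    saved: bool      # did the attempt end in a saved edit?
    reverted: bool   # was the saved edit later reverted?


def rates_by_stratum(sessions):
    """Return {(editor, edit_type): (completion_rate, revert_rate)}.

    The revert rate is computed over saved edits only, so abandoned
    attempts do not dilute it. It is None when a stratum has no saves.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # attempts, saves, reverts
    for s in sessions:
        c = counts[(s.editor, s.edit_type)]
        c[0] += 1
        if s.saved:
            c[1] += 1
            if s.reverted:
                c[2] += 1

    result = {}
    for key, (attempts, saves, reverts) in counts.items():
        completion = saves / attempts
        revert = reverts / saves if saves else None
        result[key] = (completion, revert)
    return result


if __name__ == "__main__":
    demo = [
        Session("visualeditor", "edit", True, False),
        Session("visualeditor", "edit", False, False),
        Session("visualeditor", "create", True, True),
        Session("wikitext", "edit", True, False),
        Session("wikitext", "create", False, False),
    ]
    for key, (completion, revert) in sorted(rates_by_stratum(demo).items()):
        print(key, completion, revert)
```

Keeping page creation and ordinary edits in separate strata means a difference between the two editors can't simply come from one group happening to attempt more new pages.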

There are many more things that should be considered. For instance, in my opinion, any large-scale study of VisualEditor should temporarily hide all VisualEditor change tags ([5]). Otherwise you're just putting a target on those editors, which could make them quit.

Also, it is worth considering that an incomplete or unsaved edit isn't necessarily a bad outcome. Many people open the editor just out of curiosity, because they inadvertently clicked a redlink, because of a random click, or just to preview content.

14:38, 19 September 2019

Whatamidoing (WMF)

I think that "opening the editor out of curiosity" should be equally likely no matter which editing environment is displayed.

16:09, 26 September 2019

PPelberg (WMF)

Thank you for the time and thought you put into drafting your comments, 197.235.242.46.

I'm going to list the points you shared and then do my best to reply to each one, with a few follow-up questions included along the way:

1. Test participants can switch editors and those presented with VisualEditor by default will be more likely to switch to editing using the wikitext editor.

You're right: contributors can switch editing interfaces if they choose, and this is a behavior we will be analyzing. That said, how do you see this affecting the test results, considering the test targets contributors with little to no experience editing Wikipedia who, we assume, are unlikely to be familiar with the wikitext editor?

2. Fallback to source editor if the VisualEditor can't load

Have you seen the wikitext editor appear on mobile in instances where VisualEditor took longer than a certain amount of time to load? This is not something we expect to happen, so if it is happening, something might be wrong and we'd value getting to the bottom of it!

More generally, you make a good point about load times, and specifically about how they can affect a contributor's likelihood of completing an edit. In fact, one of the things the team is curious to understand from the test results is this relationship between load times and edit success. More info here: T232175#5545364

Previous studies and the importance of considering edit type in analyses

First, I hadn't thought to use search like this to filter for "research" about a particular topic. This is wonderful!

Second, can you say a bit more about how you think edit type ought to be considered and evaluated in a test like this? You mention differentiating between edits to existing articles and edits associated with creating articles. Are you thinking we should limit our comparison of the edit completion rates of the mobile wikitext editor and mobile VisualEditor to certain types of edits? And if so, why?

LintErrors

In bringing up LintErrors, what are you suggesting in the context of the A/B test? Are you thinking if we were to compare mobile wikitext and mobile VE by the number of edits completed in each, it would be important to consider how many edits completed in mobile VE were edits to correct these errors?

Using ORES to score edit quality

It sounds like you're suggesting edit quality is something we should use to evaluate whether the mobile wikitext editor or the mobile VisualEditor provides contributors with a "better" editing experience. Am I understanding you correctly there?

If so, edit quality is something we will be evaluating as part of our analysis. See: A/B test.

Right now, our measure of quality is whether or not an edit was reverted. We chose this approach over ORES because ORES models are not yet deployed on all the wikis included in the test. You can see the ORES deployments here: https://tools.wmflabs.org/ores-support-checklist/

Obscure all VisualEditor change tags…

Interesting thought. Are you suggesting that some contributors might have a negative bias toward edits made in VE, which could make them more likely to revert those edits, in turn creating a bad experience for the contributors who made them and ultimately driving those contributors away?

Many people just open the editor out of curiosity, or because they inadvertently click a redlink, or due to some random click or just to preview content.

This is a great point and something the team is trying to figure out. More specifically, we're trying to understand how we might be able to measure intent.

Said another way: how can we detect whether a contributor is tapping the edit button with the intention to edit, or whether they are just, as you said, curious?

One idea we've had: do contributors make any changes to the article before abandoning the edit? This is something we'll soon be able to measure. See: T229079
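To make that idea concrete, here is a minimal sketch of how "made any changes before abandoning" could serve as a rough proxy for intent. It assumes each edit session records whether the content was changed and whether it was saved; the `EditSession` record and its fields are hypothetical, not the actual instrumentation from T229079.

```python
from dataclasses import dataclass


@dataclass
class EditSession:
    # Hypothetical fields; not the actual instrumentation schema.
    made_changes: bool  # did the contributor alter the content at all?
    saved: bool         # did the session end in a saved edit?


def completion_rates(sessions):
    """Return (rate over all sessions, rate over sessions with changes).

    Sessions with no changes are treated as a rough proxy for
    "opened out of curiosity" and excluded from the second rate.
    Either rate is None when its denominator is zero.
    """
    total = len(sessions)
    saved = sum(s.saved for s in sessions)
    with_changes = [s for s in sessions if s.made_changes]
    saved_with_changes = sum(s.saved for s in with_changes)

    overall = saved / total if total else None
    with_intent = saved_with_changes / len(with_changes) if with_changes else None
    return overall, with_intent


if __name__ == "__main__":
    sessions = [
        EditSession(made_changes=False, saved=False),  # opened, looked, left
        EditSession(made_changes=False, saved=False),
        EditSession(made_changes=True, saved=False),   # started, then abandoned
        EditSession(made_changes=True, saved=True),    # completed edit
    ]
    print(completion_rates(sessions))  # (0.25, 0.5)
```

Computing both rates per editor would show whether any gap between the mobile VisualEditor and the mobile wikitext editor persists once no-change sessions are set aside. It is only a proxy, though: someone who types a few characters out of curiosity would still be counted as intending to edit.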

If you have any other ideas about how we might detect edit intent, we'd be keen to hear them!

Thanks again for all your thought,

17:22, 18 October 2019
