Article feedback/Public Policy Pilot Phase 2

What we've learned from Phase 1

Here is a quick summary of what we've learned/are learning from Phase 1. More detail may be found here.

Ratings by Anonymous users outpace ratings by Registered users by 10x.

For many articles, the number of ratings from Registered users is not enough to provide meaningful information about article quality.

Ratings by Anonymous users skew high, with most Anonymous users giving either a 4 or 5 rating across all dimensions. We do not yet know whether this skew will prevent ratings from Anonymous users from being a meaningful measurement of article quality (e.g., if an article is significantly improved, yet ratings from Anonymous users don't change noticeably).
Ratings by Registered users are both lower and show less of a skew compared to ratings by Anonymous users. This could suggest that Registered users are more critical and/or give higher quality ratings, though more data is needed to support this assertion.
In its current form, the tool is not a good on-ramp for editing. The current version does not offer an explicit invitation to edit (e.g., "Did you know you can edit this article?").

Goals for Phase 2

To build on what we've learned in Phase 1, the goals of Phase 2 focus on the strategic objectives of Participation and Quality:

Participation: How can this feature be improved so that it is a better on-ramp for editing? The current interface fails as an on-ramp, but, as stated above, it offers no specific call for a user to edit.
Quality: How useful is this tool to help the community measure the quality of an article? Specifically, how useful is this tool in measuring the quality of an article over time? Are there specific segments of users that offer higher quality ratings than others?

These two goals will be achieved through feature development and the selective targeting of articles.

Scope

The proposed list of features for Phase 2 is here.

Target Articles

In order to better understand how these ratings reflect article quality, we are targeting articles that will undergo substantial revision. Ratings before the substantial revision may then be compared with ratings after the revision to see if there is a noticeable change in ratings based on the revisions. We will deploy this feature on two sets of articles:

Public Policy Articles (Currently deployed): We will continue putting the feature on select articles as part of the Public Policy Project.
Articles that are likely to undergo substantial change (To be deployed): We will put the feature on general Wikipedia articles which by nature are subject to substantial revision in the near future (e.g., upcoming movies, elections, etc.). The current list of pages is here (please contribute!).

Target Users

The target users for Phase 2 of this feature are:

Readers:

Rate article
Edit article

Editors:

Rate article
Edit article (increase editing activity, though this is a second priority for Phase 2)

For Phase 2, we are not prioritizing the use of the Article Feedback tool to collect detailed feedback (i.e., more detailed than the four categories) from readers on what areas of the article need improvement.

Feedback Survey

The draft of the survey that will be used for Phase 2 is available here.

Measurement

We will be able to conduct the following analytics on the feature:

Participation

To measure the effect of Rating an article on participation, we will need the following analytics:

(# of users who edit an article after rating AND seeing the Edit call to action) / (# of users who rated AND saw the Edit call to action)

If there are multiple edit calls to action, we'll need to have this ratio for each separate call to action.
Retention of editors: what % of editors who made their first edit as a result of the call to action become New Wikipedians? What % continue editing after 1, 2, 3, etc. months?

Fallout of rating-to-edit flow:

Click-through rate of call to action: (# of clicks of Edit call to action) / (# of times Edit call to action is displayed)
Edit conversion: After clicking on Edit call to action, (# of Saves) / (# of times Edit page is viewed)
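The ratios above could be computed along the following lines. This is only a minimal sketch: the `CallToActionCounts` record, its field names, and the `funnel_metrics` helper are hypothetical and do not reflect the actual logging schema. Where several calls to action exist, one such record would be kept per call to action.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CallToActionCounts:
    """Hypothetical event counts for a single Edit call to action."""
    raters_who_saw: int     # users who rated AND saw the call to action
    raters_who_edited: int  # of those users, how many went on to edit
    displayed: int          # times the call to action was displayed
    clicked: int            # clicks on the call to action
    edit_page_views: int    # times the Edit page was viewed after a click
    saves: int              # edits saved after a click


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None if the denominator is zero."""
    return numerator / denominator if denominator else None


def funnel_metrics(counts: CallToActionCounts) -> Dict[str, Optional[float]]:
    """Compute the three ratios described in the Participation section."""
    return {
        # (# who edit after rating AND seeing CTA) / (# who rated AND saw CTA)
        "rate_to_edit": ratio(counts.raters_who_edited, counts.raters_who_saw),
        # (# of clicks of CTA) / (# of times CTA is displayed)
        "click_through_rate": ratio(counts.clicked, counts.displayed),
        # (# of Saves) / (# of times Edit page is viewed)
        "edit_conversion": ratio(counts.saves, counts.edit_page_views),
    }
```

Returning `None` for a zero denominator keeps a call to action that has not yet been displayed from being reported as a 0% conversion.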

Quality

As previously mentioned, the main test for Quality we are conducting for Phase 2 is to determine whether substantial changes in an article are reflected in the ratings. We are doing this by applying the rating tool to articles that are likely to undergo substantial revision in the future. Currently, we have the data to construct moving averages per article by Anonymous and Registered users. We will need to be able to easily construct moving averages per article, per Rater segment (based on survey responses). For example:

The above chart shows hypothetical data for Ratings of US Constitution given by users that have identified themselves from the survey as having a degree in the field. The actual categories (e.g., "degree in field") for the survey are currently being developed.
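A moving average per article, per Rater segment could be built roughly as sketched below. The input layout (chronologically ordered `(article, segment, rating)` tuples), the `moving_averages` name, and the window size are all assumptions made for illustration; the real segments will depend on the final survey categories.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

Rating = Tuple[str, str, float]  # (article, rater segment, rating value)


def moving_averages(
    ratings: Iterable[Rating], window: int = 3
) -> Dict[Tuple[str, str], List[float]]:
    """Return a moving-average series for each (article, segment) pair.

    `ratings` is assumed to be in chronological order. Each series holds
    the mean of the most recent `window` ratings after every new rating;
    early points average over fewer ratings until the window fills.
    """
    recent = defaultdict(lambda: deque(maxlen=window))
    series: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for article, segment, rating in ratings:
        key = (article, segment)
        recent[key].append(rating)  # the oldest rating drops out automatically
        series[key].append(sum(recent[key]) / len(recent[key]))
    return dict(series)
```

Keying on the (article, segment) pair means the same function covers the existing Anonymous/Registered split and any survey-based segments without modification.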

UX Research

UX research findings, some of which are relevant to Phase 2, are available here.

Data + Survey Results

Timeline

Here is the proposed timeline (subject to change):

Design: Start Nov 8
Development sprint: Nov 15-29
Feature on prototype: Early December
Testing/bug fixing: Early-mid December (approx 3 weeks)
Launch: Second week of January