Reading/Strategy/Strategy Process/Testing

About

We have narrowed down our strategic possibilities for the Wikimedia Reading audience vertical to the following four main strategies.

Optimize the user experience of Wikipedia for contemporary technologies, making Wikipedia content equally accessible across platforms and offering an enjoyable reading experience on every device and channel.

Allow readers to interact with content and each other, adding a further layer of engagement by offering different types of interaction with content in addition to passive reading.

Create a deep-dive (guided) educational experience, making it easy for knowledge seekers to find the information they are looking for and to understand it. For example, someone who is not an engineer should be able to learn about semiconductors efficiently.

Create a reading experience tailored for users in the Global South.

How can you contribute?

Below are the tests we plan to conduct to determine the viability of these strategies. Please take a moment to read them and consider whether they make sense. Would you like to suggest an alternative test or modify a planned one? Please discuss on the talk page.

Tests marked in bold are tentatively planned for Q2 FY 2015-2016 (October - December, 2015).

Tests

Optimize the user experience of Wikipedia for contemporary technologies

Condition to test	Short term test	Medium test	Large effort test (definitive)
Innovative user experiences and well-designed apps and sites will drive more traffic	Test: Research large internet sites and find out which ones have increased their traffic in the last 2-3 years solely by updating the user experience and user interface of their sites. If/Then Hypothesis: If we find examples of good design alone producing a radical increase in traffic, then improved design is likely to drive traffic. Standard of proof: TBD	Test: Redirect 1% of desktop users to the mobile web. If/Then Hypothesis: If redirected users engage more than the control group, then improvements are likely to have an impact. Standard of proof: Data must support the hypothesis on a daily basis for 7 consecutive days.	Objective: Test the impact of the user interface on beta. Test: Test a cleaner user interface on beta with more users than our current beta population. If/Then Hypothesis: If users prefer the beta user interface, as demonstrated by usage and surveys, then improvements are likely to have an impact. Standard of proof: TBD
Wikipedia can develop modern experiences faster than intermediate providers	Objective: Deliver a new reading feature on both the desktop and mobile web. Test: Successfully deliver "Read More" in Q2 FY 2015-2016 (October-December, 2015). If/Then Hypothesis: If a new reading feature can be delivered in one quarter, then delivery is rapid despite dependencies. Standard of proof: Feature launched on web beta or non-English wikis.	Objective: Deliver a substantive reading feature. Test: Ship a change to the article header on the mobile web within one quarter. If/Then Hypothesis: If a more substantive reading feature can be delivered in one quarter, then environmental factors support introducing such features on a sufficiently rapid timeline. Standard of proof: Feature launched on web beta or non-English wikis.
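The medium test for the first condition splits off 1% of desktop users and requires the hypothesis to hold every day for 7 consecutive days. The following is a minimal Python sketch of how that rule could be evaluated, assuming users are bucketed by hashing a hypothetical anonymous token and that one engagement figure per day is available for the redirected group and for a comparison group. The function names, the use of SHA-256, and the daily-data format are illustrative assumptions, not part of the plan.

```python
import hashlib

# Hypothetical parameters: 1% of desktop users are redirected, and the
# hypothesis must hold on 7 consecutive days.
REDIRECT_PERCENT = 1
REQUIRED_DAYS = 7


def in_redirect_bucket(user_token: str, percent: int = REDIRECT_PERCENT) -> bool:
    """Deterministically assign a user to the redirect group.

    The (hypothetical) anonymous token is hashed into one of 100 buckets,
    so the same user always lands in the same group across visits.
    """
    digest = hashlib.sha256(user_token.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def hypothesis_holds(daily_engagement, required_days: int = REQUIRED_DAYS) -> bool:
    """Return True if redirected users out-engage the comparison group
    on `required_days` consecutive days.

    `daily_engagement` is a sequence of (redirected, comparison) engagement
    figures, one pair per day, in date order.
    """
    streak = 0
    for redirected, comparison in daily_engagement:
        # Any day on which the hypothesis fails resets the streak.
        streak = streak + 1 if redirected > comparison else 0
        if streak >= required_days:
            return True
    return False
```

For example, seven days of `(1.3, 1.1)` pairs satisfy the rule, while a single losing day in the middle of a run restarts the count. Hashing rather than random assignment keeps each user in the same group for the whole test.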

A Community of Readers: Allow readers to interact with content and each other

Condition to test	Short term test	Medium test	Large effort test (definitive)
If our content is interactive enough for readers, then they will visit our sites directly.	Objective: Determine whether interactive content drives traffic in general. Test: Review research in this field. If/Then Hypothesis: If interactive content (comments, highlights) drives traffic in general, then it is likely to drive traffic for Wikipedia as well. Standard of Proof: Research shows that sites that add interactive features see a significant boost in traffic.	Objective: Find out whether readers think Wikipedia has forums, comments, and ways for users to interact with each other. Test: Survey readers, asking whether Wikipedia has discussion pages and how easy they think it is to interact casually with our content. If/Then Hypothesis: If readers already see Wikipedia as interactive but do not want to interact, then there is no point in building more tools. Standard of Proof: Most users do not see Wikipedia as a place for people to discuss content.	Objective: Gauge impact. Test: Measure whether engaging with share-a-fact increases overall engagement with the site. If/Then Hypothesis: If share-a-fact increases time spent by the users who use it, then interactivity drives traffic. Standard of Proof: For users who have not seen share-a-fact, measure the number and length of their sessions. After 2 weeks, show them share-a-fact; their engagement (number of sessions or session length) must go up.
Our reputation as the source for accessing the sum of all knowledge on the web is not threatened by newer platforms	Objective: Understand current perceptions of Wikipedia and find out whether more casual participation models might affect them. Test: Review existing studies of lifelong learners. What affects their perception and use of Wikipedia? If/Then Hypothesis: If existing research shows that learners strongly question content because it is accompanied by discussions, comments, or crowd-sourced data, then we should not move forward. Standard of Proof: If there is evidence that comments, discussions, or the ability to interact with content significantly and negatively change how people judge the quality of the content they accompany, then we should not move forward.	Objective: Determine whether users' perception of Wikipedia would change if they knew that the contribution model was easy and fun. Test: Administer a short survey to a small set of users in the lifelong learner demographic to gauge how various elements of their perception would change if the Wikipedia contribution model were different. If/Then Hypothesis: If only a small number of respondents indicate that their perception would change negatively, then we can feel confident about rolling out new models of engagement and contribution to a broad set of users. Standard of Proof: At least 80% of respondents must indicate that their perception of Wikipedia would not change negatively if they knew about the new model of engagement.	Objective: Determine real-life reactions to user-generated content. Test: Ask users for their perception of Wikipedia, show them a page with comments or Q&A, and then ask again. If/Then Hypothesis: If users' perceptions of Wikipedia's accuracy are not diminished by the comments, then this is okay. Standard of Proof: At least 80% of responses must show that perception of Wikipedia did not change negatively after seeing the page with comments or Q&A.
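Two standards of proof in this table are simple decision rules: in the share-a-fact test, a user's engagement must go up (by number of sessions or session length) after being shown share-a-fact, and in the survey tests, 80% of respondents must report no negative change in perception. A minimal Python sketch of both rules follows, assuming per-user session counts and mean session lengths are collected over two equal 2-week windows; the data structure, function names, and field names are hypothetical.

```python
from dataclasses import dataclass

# Threshold taken from the survey standards of proof.
PERCEPTION_THRESHOLD = 0.80


@dataclass
class WindowStats:
    """Hypothetical per-user engagement over one measurement window (e.g. 2 weeks)."""
    sessions: int
    mean_session_seconds: float


def engagement_went_up(before: WindowStats, after: WindowStats) -> bool:
    """Share-a-fact rule: engagement rises if either the number of
    sessions or the session length increases after exposure."""
    return (after.sessions > before.sessions
            or after.mean_session_seconds > before.mean_session_seconds)


def perception_not_harmed(not_negative: int, total: int,
                          threshold: float = PERCEPTION_THRESHOLD) -> bool:
    """True if at least `threshold` of respondents report no negative
    change in their perception of Wikipedia."""
    if total <= 0:
        return False
    return not_negative / total >= threshold
```

The plan does not say how per-user share-a-fact results should be aggregated across users (for example, whether a majority must improve), so the sketch leaves that decision to the caller.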

Create a deep-dive (guided) educational experience

This refers to a potential strategy in which the Reading team focuses on learning (comprehension and retention) rather than merely presenting information. Ideas in this theme include a suggested order for reading articles to maximize comprehension, simple or practical versions of articles, quizzes, and even games.

Condition to test	Short term tests	Medium test	Large effort test (definitive)
We can differentiate ourselves from other education-tech/learning sites and apps	Test: Catalog and analyze competitors' features against our own. If/Then Hypothesis: If we can identify features that would give us an advantage, then we should be able to differentiate ourselves from the competition. Standard of Proof: A comparison of ourselves against the top 3 performers.	Objective: Determine whether and how we are failing users who are trying to educate themselves. Test: Conduct a survey asking users about gaps in our existing education experience. If/Then Hypothesis: If we better understand our ability to deliver education, then we can combine that with knowledge of our competitors' features to deliver a superior, differentiating experience. Standard of Proof: A large survey sample size and language diversity.	Objective: Determine our ability to execute a unique and compelling education experience. Test: Build a prototype that combines user feedback and industry analysis to deliver a proof of concept for our education product. If/Then Hypothesis: If the prototype can be built and feedback is positive, then we should be able to achieve long-term dominance in this space thanks to a sustainable advantage. Standard of Proof: A prototype with a minimal implementation of the differentiating feature(s), with data showing high engagement and positive user feedback.
The community or AI is able and willing to generate content at all levels of complexity	Objective: Gauge interest or resistance. Test: Email wikitech-l and wikimedia-l describing the simple-moderate-complex content tagging concept (it has been pointed out that an RFC may be more appropriate for this test). If/Then Hypothesis: If feedback is positive, then the community will welcome the concept. Standard of Proof: The ratio of feedback is 10:1 (positive + neutral : negative).	Objective: Determine real-world behavior, and whether summaries actually help users (by focus group or survey?). Test: Ask users to write article summaries for a set of articles (potentially new users on mobile). If/Then Hypothesis: If users attempt to write an article summary for elementary students at least 10% of the time when prompted, then users will generally be comfortable writing article summaries. Standard of Proof: This should hold for the top 5 language Wikipedias.	If/Then Hypothesis: If users attempt to write an article summary for elementary students at least 10% of the time when prompted, then users will generally be comfortable writing article summaries. Standard of Proof: This should hold for the top 5 language Wikipedias.
Readers are interested in simplified versions of articles (and more exposure leads to more creation)		Objective: Determine whether readers and editors will engage with simplified articles if they are properly exposed. Test: Promote Simple English versions of articles and see whether this improves reader satisfaction or comprehension; also see whether greater exposure leads to an increase in editing. If/Then Hypothesis: If users are made aware that a simpler version of an article exists and choose to view or edit it, then it is worth investing more in this direction. Standard of Proof: Users who are shown the Simple English option (when promoted) show greater satisfaction, more comprehension, or deeper sessions than users who do not have this option. When Simple English is promoted, edits increase proportionally.
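The standards of proof for the community test reduce to two checks: mailing-list or RFC feedback must reach a ratio of at least 10:1 ((positive + neutral) : negative), and the summary-attempt rate must hold in each of the top 5 language Wikipedias. A minimal Python sketch follows, assuming feedback can be tallied into three counts and that prompts shown and summary attempts are logged per wiki; the names and data formats are hypothetical.

```python
# Thresholds from the community-test standards of proof.
FEEDBACK_RATIO = 10.0      # (positive + neutral) : negative
MIN_ATTEMPT_RATE = 0.10    # summary attempts per prompt shown


def feedback_ratio_met(positive: int, neutral: int, negative: int,
                       ratio: float = FEEDBACK_RATIO) -> bool:
    """True if (positive + neutral) feedback outweighs negative feedback
    by at least `ratio` to 1."""
    supportive = positive + neutral
    if negative == 0:
        # No negative feedback: the ratio is met as long as there is any feedback.
        return supportive > 0
    return supportive / negative >= ratio


def summary_rate_holds(prompts_by_wiki: dict, min_rate: float = MIN_ATTEMPT_RATE) -> bool:
    """True if every wiki meets the attempt rate.

    `prompts_by_wiki` maps a wiki identifier to (attempts, prompts_shown).
    An empty mapping, or a wiki with no prompts shown, fails the check.
    """
    if not prompts_by_wiki:
        return False
    return all(shown > 0 and attempts / shown >= min_rate
               for attempts, shown in prompts_by_wiki.values())
```

The plan does not name the five wikis, so the caller supplies whichever set is agreed on.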

Create a reading experience tailored for users in the Global South

Important note: Reading is committed to expanding access in the Global South. The strategic tests for the Global South option are mainly intended to help us understand the scale of the challenge.

Condition to test	Small directional test	Medium test	Large effort test (definitive)
Relevant content is available to readers in their local languages.	Objective: Find out whether the assumption that readers prefer or require content in their local languages is correct. Test: Examine existing data and research. If/Then Hypothesis: If the assumption does not hold, then a lack of relevant local-language content will not be a blocking factor. Standard of Proof: TBD	Objective: See whether machine translation (of articles or infoboxes) can generate content that readers want. Test: Show machine-translated articles to readers with the help of design research. If/Then Hypothesis: If the articles are good enough, readers will find and enjoy the content. Standard of Proof: TBD
Users know what we are and consider us trustworthy	Objective: Identify the awareness level. Test: Measure the percentage of people who know Wikipedia. If/Then Hypothesis: If at least 30% of people who have data know of Wikipedia, then Wikipedia is well known. Standard of Proof: The sample of people who have data must be at least 30.	Objective: Identify trustworthiness. Test: Measure the percentage of people who trust Wikipedia. If/Then Hypothesis: If more than half of the people who have data and know Wikipedia trust it, then Wikipedia is generally trusted. Standard of Proof: The sample of people who have data and know Wikipedia must be at least 30.	Objective: Verify whether advertising improves these figures. Test: Media lift. If/Then Hypothesis: If, after advertising, the awareness and trust figures increase to 40% and 70% respectively, then advertising will be successful. Standard of Proof: Sample sizes must be at least 30 in each test, using a randomized sample (no duplicate calls).
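The awareness, trust, and advertising tests each combine a proportion threshold with a minimum sample size of 30. A minimal Python sketch of the three checks follows, assuming survey responses are already tallied into counts and that "people who have data" is the survey population; the function names are hypothetical, "more than half" is treated as strictly greater than 50%, and the 40%/70% advertising targets are read as minimums.

```python
from typing import Optional

MIN_SAMPLE = 30  # minimum sample size in every standard of proof


def proportion(count: int, sample: int) -> Optional[float]:
    """Return count / sample, or None if the sample is below MIN_SAMPLE."""
    if sample < MIN_SAMPLE:
        return None
    return count / sample


def meets(count: int, sample: int, target: float, strict: bool = False) -> bool:
    """Compare a proportion with a target; too-small samples always fail."""
    p = proportion(count, sample)
    if p is None:
        return False
    return p > target if strict else p >= target


def well_known(aware: int, with_data: int) -> bool:
    # Awareness test: at least 30% of people who have data know Wikipedia.
    return meets(aware, with_data, 0.30)


def generally_trusted(trusting: int, aware: int) -> bool:
    # Trust test: more than half of aware people with data trust Wikipedia.
    return meets(trusting, aware, 0.50, strict=True)


def advertising_successful(aware: int, with_data: int, trusting: int) -> bool:
    # Media-lift test (hypothetical reading): after the campaign, awareness
    # reaches 40% and trust among aware respondents reaches 70%.
    return meets(aware, with_data, 0.40) and meets(trusting, aware, 0.70)
```

Note that the trust check applies the 30-person minimum to the aware subgroup, not to everyone surveyed, which matches the standard of proof in the trust test.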