Product Analytics/Data Products/ptwiki intervention impact report
This page is currently a draft.

Executive Summary
This report was requested by product leaders to study how turning off IP editing affected the number of edits on Portuguese Wikipedia. It builds a counterfactual with a forecast model that estimates the edits Portuguese Wikipedia would have received if IP editing had not been turned off, and it estimates the impact by comparing those predicted edits with the actual edits. The findings, based on the model, show no conclusive evidence that turning off IP editing had a negative impact on editing activity.
Introduction
Portuguese Wikipedia turned off IP editing on October 4th, 2020. In the following months, we observed a decrease in non-reverted edits (excluding bot and revert edits). In Q2 of the 2020/21 fiscal year, edits on Portuguese Wikipedia decreased by 0.91% year over year (Figure 1).
Considering that edits on all Wikipedias increased by 13.5% during the same period, we want to examine whether Portuguese Wikipedia would have seen the same increase if IP editing had not been turned off.
To answer this question, we used the Prophet time series forecasting method^{[1]} to estimate the impact of the intervention on edits on Portuguese Wikipedia.
Data Characteristics
To estimate edits without the intervention, we obtained monthly non-reverted edits (excluding bot and revert edits) from the wmf.mediawiki_history table. The data span 68 months, from July 2015 to February 2021. The monthly variables are: edits on Portuguese Wikipedia (ptwiki), edits on all other Wikipedias, and edits on all other Portuguese-language wiki projects.
To explore the pattern of edits on Portuguese Wikipedia, we plotted the historical data (Figure 2).
Edits stayed almost flat over the last five years, with a slight downward trend. The five-year trend is not fully explained by yearly or monthly seasonality, which indicates that other factors are affecting the edits and that a trend-only model cannot account for them. We therefore chose a causal model and selected wikis whose edits are correlated with those of Portuguese Wikipedia as control regressors, so that the model reflects the impact of global events. After exploring 311 Wikipedias and 8 Portuguese-language projects, we selected the following projects as control regressors based on their correlation coefficients:
 Irish Wikipedia (gawiki)
 Russian Wikipedia (ruwiki)
 Sicilian Wikipedia (scnwiki)
 Yiddish Wikipedia (yiwiki)
 Portuguese Wikivoyage (ptwikivoyage)
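The report does not say which correlation coefficient was used or what threshold was applied. As a minimal sketch, assuming the Pearson coefficient on aligned monthly series, candidates could be ranked as follows. The function names and the edit counts are hypothetical and are not the report's data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

def rank_candidates(target, candidates):
    """Return (name, r) pairs sorted by |r| with the target, strongest first."""
    scores = [(name, pearson(target, series)) for name, series in candidates.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Made-up monthly edit counts, for illustration only.
ptwiki = [100, 110, 105, 120, 115, 130]
candidates = {
    "wiki_a": [50, 56, 52, 61, 58, 66],  # moves closely with the target
    "wiki_b": [30, 29, 31, 30, 29, 31],  # only weakly related
}
ranking = rank_candidates(ptwiki, candidates)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

Correlation should be computed on pre-intervention months only, so that a candidate is not selected because it happens to track the post-intervention change.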
We also explored edits by country. 95% of edits on Portuguese Wikipedia come from Brazil and Portugal, and they correlate well with edits on English Wikipedia from Brazil and Portugal. However, the data are only available for a short period, which is not enough to estimate yearly seasonality and trend. With sufficient data, edits by country could be a good candidate control regressor.
Model Selection
After evaluating the data, we constructed several candidate models:
1) model consisting of trend, seasonality, and Wikipedia regressors;
2) model consisting of trend, seasonality, and Portuguese project regressors;
3) model consisting of trend, seasonality, Wikipedia regressors, and the Portuguese Wikivoyage regressor;
4) model consisting of trend, seasonality, Wikipedia regressors, the Portuguese Wikivoyage regressor, and a pageview regressor.
We trained the models on monthly data from July 2015 to September 2020 (the month before the intervention), ran 9-fold cross-validation to estimate the mean absolute percentage error (MAPE), and evaluated the accuracy of the models (Appendix A, Table 1; Appendix B, Table 2). Based on this analysis, we determined that the third model is the most trustworthy.
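To make the evaluation step concrete, the sketch below computes MAPE per forecast horizon with an expanding-window (rolling-origin) cross-validation. It is not the report's code. The one-month spacing between folds, the function names, and the sample series are assumptions, and a naive last-value forecast stands in for the Prophet model.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a percentage."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rolling_origin_folds(n_obs, n_folds, horizon):
    """Yield (train_indices, test_indices) with an expanding training window.

    The last fold's test window ends at the final observation; each earlier
    fold's cutoff is one step earlier (assumed spacing).
    """
    for k in range(n_folds):
        cutoff = n_obs - horizon - (n_folds - 1 - k)
        if cutoff < 1:
            raise ValueError("not enough observations for the requested folds")
        yield range(cutoff), range(cutoff, cutoff + horizon)

def naive_forecast(train, horizon):
    """Placeholder model: repeat the last observed value."""
    return [train[-1]] * horizon

def mape_by_horizon(series, n_folds, horizon, forecast=naive_forecast):
    """Average absolute percentage error at each horizon step across folds."""
    errors = [[] for _ in range(horizon)]
    for train_idx, test_idx in rolling_origin_folds(len(series), n_folds, horizon):
        train = [series[i] for i in train_idx]
        preds = forecast(train, horizon)
        for h, i in enumerate(test_idx):
            errors[h].append(abs((series[i] - preds[h]) / series[i]))
    return [100 * sum(e) / len(e) for e in errors]

# Made-up monthly series, for illustration only.
series = [100, 102, 98, 105, 110, 107, 111, 115, 112, 118, 121, 119, 125, 128]
scores = mape_by_horizon(series, n_folds=9, horizon=5)
for step, score in enumerate(scores, start=1):
    print(f"{step} month(s) ahead: MAPE = {score:.2f}%")
```

Passing a different `forecast` callable lets the same folds score several candidate models on equal terms.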
The model has three components (Figure 3).
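Figure 3 is not reproduced here. Assuming the three components are the trend, the seasonality, and the control regressors listed for the third model, and assuming Prophet's default additive mode, the fitted form can be written as below. The symbols are illustrative and do not come from the report.

```latex
y(t) = g(t) + s(t) + \sum_{j=1}^{5} \beta_j \, x_j(t) + \varepsilon_t
```

Here $y(t)$ is the monthly number of edits on Portuguese Wikipedia, $g(t)$ is the trend, $s(t)$ is the yearly seasonality, $x_j(t)$ is the number of edits on the $j$-th control project (gawiki, ruwiki, scnwiki, yiwiki, ptwikivoyage), $\beta_j$ is its coefficient, and $\varepsilon_t$ is the error term.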
However, this model has room for improvement. We found autocorrelation in the residuals at a lag of one month (Appendix B, Table 3). Because Prophet is a packaged modeling solution, fine-tuning the model (for example, adding an autoregressive component) would require other, more flexible statistical models. This could be a next step.
Forecast
With the trained model above, we forecast edits without the intervention for five months, from October 2020 to February 2021 (Figure 4).
The black dots are the historical data in the pre-intervention period. The blue line and blue area are the estimate and its 95% prediction interval. The red line is the actual edits after the intervention. The actual number of edits is within the 95% prediction interval in every month except January 2021, where it falls slightly below the lower limit (180340 versus 181810 in Appendix A, Table 1). The estimated absolute intervention impact (actual - prediction without intervention) in Appendix A, Table 1 is not consistently below 0, so there is no conclusive evidence that edits decreased specifically because of the intervention.
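The impact columns in Appendix A can be recomputed from the predicted and actual values. The sketch below does this with the figures copied from that table and also checks whether each actual value lies inside the 95% prediction interval. The function and variable names are illustrative, and computing the relative impact as a share of the prediction is an assumption that matches the reported percentages.

```python
# Rows copied from Appendix A:
# (month, predicted, lower 95% PI, upper 95% PI, actual)
rows = [
    ("2020-10", 181569, 154498, 208989, 183585),
    ("2020-11", 182536, 155625, 208771, 172090),
    ("2020-12", 158032, 128804, 182893, 179582),
    ("2021-01", 207637, 181810, 233151, 180340),
    ("2021-02", 185493, 158854, 212808, 165528),
]

def impact(predicted, lower, upper, actual):
    """Return (absolute impact, relative impact in %, actual within PI?)."""
    absolute = actual - predicted            # negative means fewer edits than predicted
    relative = 100 * absolute / predicted    # as a share of the prediction
    within = lower <= actual <= upper
    return absolute, relative, within

results = {month: impact(p, lo, hi, a) for month, p, lo, hi, a in rows}
for month, (absolute, relative, within) in results.items():
    print(f"{month}: {absolute:+d} ({relative:+.2f}%), within 95% PI: {within}")
```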
Conclusion
As mentioned in the Forecast section, we did not find that actual edits in the post-intervention period were significantly lower than the predicted edits without the intervention. This analysis therefore does not support the hypothesis that turning off IP editing negatively impacted editing activity. However, as mentioned in the Model Selection section, the current model has limitations and room for improvement. A more accurate forecasting model might yield results that favor the hypothesis. Furthermore, more interventions, especially in a randomized controlled experiment design, would help us learn more about the relationship between IP editing and editing activity.
Appendix A: Forecast
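Table 1. Predicted edits without the intervention compared with actual edits

| Month | Prediction w/o intervention | Prediction lower limit (95% PI) | Prediction upper limit (95% PI) | Actual w/ intervention | Absolute intervention impact | Relative intervention impact |
|---|---|---|---|---|---|---|
| 2020-10-01 | 181569 | 154498 | 208989 | 183585 | 2016 | 1.11% |
| 2020-11-01 | 182536 | 155625 | 208771 | 172090 | -10446 | -5.72% |
| 2020-12-01 | 158032 | 128804 | 182893 | 179582 | 21550 | 13.64% |
| 2021-01-01 | 207637 | 181810 | 233151 | 180340 | -27297 | -13.15% |
| 2021-02-01 | 185493 | 158854 | 212808 | 165528 | -19965 | -10.76% |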
Appendix B: Model Diagnostics
Table 2. Cross-validation MAPE by forecast horizon

| Horizon | MAPE |
|---|---|
| 1 month | 10.99% |
| 2 months | 7.77% |
| 3 months | 10.59% |
| 4 months | 8.44% |
| 5 months | 11.73% |
Table 3. Residual diagnostics

| Assumption | Diagnostic method | Conclusion |
|---|---|---|
| Normality | Histogram | Histogram is fairly bell-shaped; normality assumption holds. |
| Normality | Kolmogorov-Smirnov test | P value = 0.875 > 0.05; normality assumption holds. |
| Linearity | Residual plot | Residual plot has no obvious pattern. Linearity assumption holds. |
| Constant variance (homoscedasticity) | Residual plot | Residual plot has no obvious pattern. Constant variance assumption holds. |
| Autocorrelation | Durbin-Watson test | Durbin-Watson statistic = 1.008; positive autocorrelation is affecting the model. |
| Autocorrelation | ACF test | ACF lag 1 stands out. There is autocorrelation in residuals. |
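For readers unfamiliar with the autocorrelation diagnostics, the sketch below shows how the Durbin-Watson statistic and the lag-1 autocorrelation are computed from a residual series. The residuals are made up and the function names are illustrative. The statistic is close to 2 when residuals are uncorrelated and is approximately 2(1 - r), where r is the lag-1 autocorrelation, so the reported value of 1.008 corresponds to r of roughly 0.5.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    if den == 0:
        raise ValueError("all residuals are zero")
    return num / den

def lag1_autocorrelation(residuals):
    """Sample autocorrelation of the residuals at lag 1."""
    n = len(residuals)
    mean = sum(residuals) / n
    num = sum((residuals[t] - mean) * (residuals[t - 1] - mean) for t in range(1, n))
    den = sum((e - mean) ** 2 for e in residuals)
    if den == 0:
        raise ValueError("residuals are constant")
    return num / den

# Made-up residuals with runs of the same sign (positive autocorrelation).
residuals = [1, 1, 1, -1, -1, -1]
print(f"Durbin-Watson: {durbin_watson(residuals):.3f}")
print(f"Lag-1 autocorrelation: {lag1_autocorrelation(residuals):.3f}")
```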
References
 [1] "Prophet  Forecasting at scale.  Facebook ...." http://facebook.github.io/prophet/.