Reading/Web/Projects/Performance/Removal of secondary content in production


Hypothesis

Certain content doesn't necessarily need to be shipped to the user upfront and sometimes not at all. A good example is the navbox content (this content also is not optimised for mobile but that's a secondary concern and out of scope for this test). We can remove this HTML from the initial page load and lazy load it if and when needed. Before introducing the necessary APIs for lazy loading such content, we wanted to gauge how impactful the removal was.

Despite previous experiments showing that this made little impact on performance, it is unclear whether this reflects the global audience. Currently our 2G tests run from Dulles (Washington, East Coast USA), which is much closer to our data centers than, for example, a country like Indonesia. It is thus not clear whether the WebPageTest data we are collecting is a good indication for our global traffic. To understand whether reducing HTML size can make any impact on performance, we'd need to view global traffic, specifically the navigation timing reports we collect from real end users.

Prediction

Based on previous experiments, removing navboxes for a quality page such as Barack Obama should:

  • drop the number of bytes we ship to users
  • make little to no difference to the fully loaded time
  • increase the time to first byte (TTFB) from a clear cache due to the time needed for the MobileFormatter to transform the parser output
  • make no difference to first render

There is potential for:

Method

A config change was made to strip navboxes and content not designed for display on mobile (the nomobile class) on Wednesday 16th March around 00:11 PST (week 11 of the year [1]).
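To make the idea of stripping concrete, here is a minimal Python sketch that removes elements carrying the `navbox` or `nomobile` class from an HTML fragment. It is not the MobileFormatter implementation: the function name `strip_secondary_content`, the `SecondaryContentStripper` class and the choice of Python's standard `html.parser` module are assumptions made purely for illustration, and the sketch assumes well-formed nesting.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Class names taken from the text above.
STRIP_CLASSES = {"navbox", "nomobile"}


class SecondaryContentStripper(HTMLParser):
    """Hypothetical helper: re-emits HTML, skipping stripped subtrees."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside an element being removed

    @staticmethod
    def _should_strip(attrs):
        classes = (dict(attrs).get("class") or "").split()
        return bool(STRIP_CLASSES.intersection(classes))

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
        elif self._should_strip(attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if not self.skip_depth and not self._should_strip(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")


def strip_secondary_content(html):
    """Return html without elements whose class list includes a stripped class."""
    parser = SecondaryContentStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Comparing `len(html.encode("utf-8"))` before and after calling `strip_secondary_content` gives a rough per-page estimate of the HTML bytes saved.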

A period of waiting time was left to account for cached pages being updated to respect the new setting and allow data to be collected.

Given the 30 day cache on Wikipedia, it was possible that results would not be visible until at least a month had passed. With that in mind, these results are live and will update as more data becomes available.

Using the webpage test reporter tool and data in Graphite, we were able to quickly get an idea of the impact on fully loaded and first render time for the Barack Obama page, comparing the 13 days prior to the change with the 11 days after the change.

node wptreporter.js "" 16 3 2016 00 11 "" 26 wikitext=yes

We looked at the 95th percentile and upper value of global total page load time before and after the change for anonymous users using the commands:

node wptreporter.js "" 16 3 2016 00 11 "p95" 26 wikitext=yes
node wptreporter.js "" 16 3 2016 00 11 "upper" 26 wikitext=yes
node wptreporter.js "" 16 3 2016 00 11 "p95" 26 wikitext=yes

We didn't look at beta, given that other experiments were running there that would impact results.

To be more confident of the data we were seeing, we also analysed the raw data in the NavigationTiming tables as collected by EventLogging. Raw data was exported from the EventLogging tables for the time period before and after the change using the SQL queries below (amounting to approximately 2GB of data). This data was then analysed using scripts to get a sense of the value of fully loaded time before and after the change. Any instances where the property being measured had a value of NULL, 0 or 1 were ignored. All data analysed was for anonymous users viewing pages in the main namespace on the mobile stable site.

select * from NavigationTiming_15396488 where timestamp > 20160216001100 and event_loadEventEnd is not null and event_mobileMode = 'stable' and event_namespaceId = 0 and event_isAnon = 1

select * from NavigationTiming_15033442 where timestamp > 20160216001100 and event_loadEventEnd is not null and event_mobileMode = 'stable' and event_namespaceId = 0 and event_isAnon = 1
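The analysis scripts themselves are not shown here, so the following is only a sketch of the kind of summary described above: drop values of NULL, 0 or 1, then compute the median and 95th percentile. The function names and the nearest-rank percentile method are assumptions; the original scripts may have used a different percentile definition.

```python
import math
from statistics import median

# Values the text says were ignored (None stands in for SQL NULL).
IGNORED_VALUES = (None, 0, 1)


def clean(values):
    """Drop measurements that were treated as invalid."""
    return [v for v in values if v not in IGNORED_VALUES]


def percentile(values, pct):
    """Nearest-rank percentile (assumed method) of a non-empty list."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values to summarise")
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarise(values):
    """Return sample size, median and p95 for one metric, e.g. event_loadEventEnd."""
    kept = clean(values)
    return {"n": len(kept), "median": median(kept), "p95": percentile(kept, 95)}
```

Running `summarise` separately on the rows before and after the change timestamp would give one before/after pair per metric and wiki.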

The impact on the bytes sent to users was monitored, but given that the graphs contain data from both desktop and mobile, and desktop traffic accounts for 50% of our page views, it was expected that it would be difficult to get a sense of any impact there.

Given other changes may have impacted fully loaded time, we considered fully loaded time before and after the change on desktop for German Wikipedia to provide a baseline.

select * from NavigationTiming_15033442 where timestamp > 20160216001100 and event_loadEventEnd is not null and event_mobileMode is NULL and event_namespaceId = 0 and event_isAnon = 1
select * from NavigationTiming_15396488 where timestamp > 20160216001100 and event_loadEventEnd is not null and event_mobileMode is NULL and event_namespaceId = 0 and event_isAnon = 1

Results

Graph showing the impact on bytes for a sample of pages after stripping of the nomobile and navbox class elements. The HTML for the Barack Obama page dropped from 183.8kb to 152.2kb in size, approximately a 17% reduction.

Using Graphite data

Impact of the change in stable on the Barack Obama article (Edge connection)

| Property | Before (avg) | After (avg) | Delta (avg) | % decrease (avg) | Before (median) | After (median) | Delta (median) | % decrease (median) |
|---|---|---|---|---|---|---|---|---|
| html.bytes | 189351.5 | 155515.9 | 33835.5 | 17.87% | 189998.0 | 155509.0 | 34489.0 | 18.15% |
| TTFB.median | 4914.0 | 5149.1 | -235.1 | -4.78% | 5254.0 | 5258.0 | -4.0 | -0.08% |
| render.median | 11307.5 | 11746.1 | -438.5 | -3.88% | 11790.0 | 12282.0 | -492.0 | -4.17% |
| fullyLoaded.median | 23867.2 | 24008.3 | -141.1 | -0.59% | 23574.5 | 22863.0 | 711.5 | 3.02% |
| image.bytes | 618722.1 | 543479.9 | 75242.1 | 12.16% | 619389.0 | 545705.0 | 73684.0 | 11.90% |

Impact on fully loaded and first paint time for global traffic (upper and 95th percentile)

| Property | Before (avg) | After (avg) | Delta (avg) | % decrease (avg) | Before (median) | After (median) | Delta (median) | % decrease (median) |
|---|---|---|---|---|---|---|---|---|
| p95 fully loaded | 13596.4 | 12838.9 | 757.5 | 5.57% | 13619.9 | 12681.9 | 938.0 | 6.89% |
| upper fully loaded | 56735.9 | 49152.1 | 7583.9 | 13.37% | 57684.5 | 50485.0 | 7199.5 | 12.48% |
| First paint p95 | 5228.0 | 5187.8 | 40.2 | 0.77% | 5160.2 | 4899.3 | 260.9 | 5.06% |

NavigationTiming raw data

Impact on key metrics for global traffic for anonymous users viewing pages in the main namespace on mobile (unless marked as desktop). A positive % change indicates that the metric decreased (an improvement); a negative % change indicates that it increased.

| Property | Wiki | Before change | After change | Sample size (before) | Sample size (after) | % change |
|---|---|---|---|---|---|---|
| event_loadEventEnd (p95) | enwiki | 18149.6 | 17218.85 | 41895 | 34908 | +5.13 |
| event_loadEventEnd (median) | enwiki | 3719.0 | 3621.0 | 41895 | 34908 | +2.64 |
| event_firstPaint (p95) | enwiki | 6847.6 | 6436.6 | 28215 | 24215 | +6.00 |
| event_firstPaint (median) | enwiki | 1725.0 | 1685.0 | 28215 | 24215 | +2.32 |
| event_loadEventEnd (p95) | all wikis | 16751.3 | 16674.9 | 112328 | 95092 | +0.45 |
| event_loadEventEnd (median) | all wikis | 3625.0 | 3601.0 | 112328 | 95092 | +0.66 |
| event_firstPaint (p95) | all wikis | 6321.0 | 6191.15 | 79019 | 67200 | +2.05 |
| event_firstPaint (median) | all wikis | 1683.0 | 1689.0 | 79019 | 67200 | -0.36 |
| event_loadEventEnd (p95) | dewiki | 10882.85 | 12239.0 | 10264 | 9111 | -12.46 |
| event_loadEventEnd (median) | dewiki | 2778.5 | 2823.0 | 10264 | 9111 | -1.60 |
| event_firstPaint (p95) | dewiki | 4410.0 | 4507.0 | 5091 | 4426 | -2.20 |
| event_firstPaint (median) | dewiki | 1218.0 | 1260.0 | 5091 | 4426 | -3.45 |
| event_loadEventEnd (p95) | hewiki | 15848.5 | 19812.4 | 71 | 58 | -25.01 |
| event_loadEventEnd (median) | hewiki | 3318.0 | 3526.0 | 71 | 58 | -6.27 |
| event_loadEventEnd (p95) | jawiki | 13557.0 | 12196.4 | 1196 | 1083 | +10.01 |
| event_loadEventEnd (median) | jawiki | 3464.0 | 3325.0 | 1196 | 1083 | +4.01 |
| event_firstPaint (p95) | jawiki | 8574.5 | 8109.6 | 124 | 90 | +5.42 |
| event_firstPaint (median) | jawiki | 1401.0 | 1629.0 | 124 | 90 | -16.27 |
| event_loadEventEnd (desktop p95) | dewiki | 5043.0 | 4973.6 | 184820 | 153949 | +1.38 |
| event_firstPaint (desktop p95) | dewiki | 2148.0 | 2092.0 | 184820 | 153949 | +2.61 |
| event_loadEventEnd (desktop p95) | hewiki | 6860.25 | 6477.2 | 2236 | 1842 | +5.5 |
| event_firstPaint (desktop p95) | hewiki | 2536.0 | 2324.2 | 2236 | 1842 | +8.35 |
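The percentage column in the table above is consistent with a relative decrease computed against the "before" value, so a positive figure means the metric got smaller (faster). A small sketch of that arithmetic follows; the function name `percent_decrease` is an illustrative assumption rather than something taken from the original scripts.

```python
def percent_decrease(before, after):
    """Relative decrease from before to after, as a percentage.

    Positive means the value went down (an improvement for timing
    metrics); negative means it went up.
    """
    return (before - after) / before * 100


# enwiki event_loadEventEnd (p95): 18149.6 -> 17218.85
enwiki_p95 = round(percent_decrease(18149.6, 17218.85), 2)
# dewiki event_loadEventEnd (p95): 10882.85 -> 12239.0
dewiki_p95 = round(percent_decrease(10882.85, 12239.0), 2)
```

With the values from the table, `enwiki_p95` comes out at 5.13 and `dewiki_p95` at -12.46, matching the reported figures.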

Analysis

As expected, after 26 days of analysis of the Graphite data, a positive impact could be seen on fully loaded time on both the Barack Obama article and global traffic, but it was not substantial. That said, the upper value of fully loaded time dropped considerably, giving an indication that there is traffic on connections far slower than our simulated 2G connection that is hopefully benefiting from this change. When the raw data was consulted, similar, but not identical, patterns were seen. Although performance seemed to improve by a minimal amount globally across all wikis, the greatest impact was seen on enwiki, which also had the highest 95th percentile of all data. It's possible that the majority of our 2G traffic visits this domain and this is where we are likely to see performance gains.

Close analysis suggested that performance on Hebrew Wikipedia and German Wikipedia worsened after the change while performance on English Wikipedia improved. It's worth remembering that our wikis are continuously being edited and these spikes could be caused by any number of things. German Wikipedia does not appear to use the `.navbox` class (although it has a similar `.NavFrame` class), so the improvements here would not necessarily have benefited it. Hebrew Wikipedia does seem to use the `.navbox` class, but at a glance not nearly as much as the Japanese and English Wikipedias. It's impossible to say whether the changes slowed down these wikis. It's possible that users on slower connections are also gaining access to these sites and driving the numbers up, in which case the worsening performance is not necessarily a bad thing, yet given the small values involved this is unlikely. Looking at desktop for these two wikis showed that performance on desktop seemed to improve during this period, so it's hard to determine what has happened here.

It would have been useful to have more data for smaller wikis. For example, when looking at Hebrew Wikipedia we only had 71 NavigationTiming entries to consult before the change and 58 after it. It's hard to draw conclusions from such small data sets. Even German Wikipedia had only about 25% of the entries that English Wikipedia had.
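To illustrate why a 95th percentile is fragile at these sample sizes, the sketch below counts how many observations sit above the p95 value for a given sample size. It assumes a nearest-rank percentile, which may differ from the method the original scripts used, and the function name is hypothetical.

```python
def observations_above_p95(sample_size):
    """Number of samples ranked above the nearest-rank 95th percentile."""
    # Integer ceiling of 0.95 * sample_size, avoiding floating point error.
    rank = -(-95 * sample_size // 100)
    return sample_size - rank


# Sample sizes for event_loadEventEnd taken from the results table.
hewiki_before = observations_above_p95(71)
hewiki_after = observations_above_p95(58)
enwiki_before = observations_above_p95(41895)
```

Under this assumption the hewiki p95 is determined by only the slowest handful of page loads (3 samples above it before the change, 2 after), whereas the enwiki figure has more than two thousand samples above it, so a few unusually slow requests can move the hewiki value substantially.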

The impact on bytes was clear to see on the Barack Obama article but we were unable to get any sense of impact on the global text cache eqiad.

No unusual spikes in page view traffic were observed, which is to be expected given the low impact on fully loaded time at the 95th percentile.

Conclusions

It seems likely that the removal of navboxes had an impact on wikis where they are used frequently. Notable improvements were seen on English and Japanese Wikipedia. That said, it's unclear whether these changes have a negative impact on wikis that do not use them, for example German Wikipedia, and it's not clear whether other factors led to these improvements.

It seems plausible that parsing the page without removing any elements could have a negative impact, but our tests do not show a significant increase in server-side processing time, so it is likely that these negative changes are unrelated. It's also plausible that English and Japanese Wikipedia were affected by the same performance degradation as German and Hebrew Wikipedia, in which case the true improvement on those wikis may have been greater than measured.

Measuring raw global data seems to be an accurate way of validating our performance changes. That said, performance can be impacted by many things, such as improved infrastructure on cellular networks or new traffic that previously wasn't there.

Using Graphite data can give an indication quickly of possible impacts, but should not be relied on given the large differences in values computed.

Next steps

  • We should increase sampling rates for smaller wikis. Right now our performance metrics are geared towards measuring performance on English Wikipedia.
  • Consider handling .NavFrame the same way as .navbox on German Wikipedia
  • We need better ways to gauge impact of bytes savings for our users. Bytes saved equates to money saved in many countries.
  • Investigate the perceived performance degradation on dewiki and hewiki
  • We should come up with a generalised solution for deferring the loading of content.

Notes

HHVM was upgraded to 3.12.1 on 10th March, which could have impacted results by improving server processing time.