Reading/Web/Projects/A frontend powered by Parsoid/Parsoid html size initial report

To inform what a fast initial response could look like for A frontend powered by Parsoid, we ran some initial comparisons between the raw Parsoid HTML and an optimized version.

Set up

For the experiment we're running a middleman server that has two endpoints:

  • /api/raw/html/:title: Serves the raw Parsoid HTML without transformations.
  • /api/slim/html/:title: Serves an optimized version of the HTML.
    • For the purposes of the demo, this optimized version performs the following transformations:
      • Removing data-mw attributes
      • Removing comments
      • Removing references
      • Removing tables
      • Removing images
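
The transformations listed above can be sketched with Python's standard html.parser module. This is only an illustration, not the middleman server's actual code: the `Slimmer` class, the `slim` helper, and the choice of tags and class names used to recognise references, tables and images (`DROP_TAGS`, `DROP_CLASSES`) are assumptions, and the depth tracking assumes the input closes every non-void element explicitly.

```python
from html import escape
from html.parser import HTMLParser

# Void elements have no closing tag, so they never open a subtree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Assumption: which tags and class names mark tables, images and references.
DROP_TAGS = {"table", "img", "figure"}
DROP_CLASSES = {"mw-ref", "mw-references"}


class Slimmer(HTMLParser):
    """Re-emits HTML while dropping comments, data-mw and unwanted subtrees."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped subtree

    def _dropped(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        return tag in DROP_TAGS or bool(classes & DROP_CLASSES)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1  # nested element inside a dropped subtree
            return
        if self._dropped(tag, attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1  # start skipping until the matching close
            return
        kept = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
            if k != "data-mw"  # strip the data-mw attribute
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # e.g. the implicit end of a self-closing <img/>
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_comment(self, data):
        pass  # comments are dropped


def slim(html: str) -> str:
    """Return a slimmed copy of the given HTML string."""
    parser = Slimmer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, calling `slim` on a paragraph that contains a comment, an image and a reference marker returns the paragraph with only its text and remaining attributes.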

The first hit on the API may go to RESTBase and perform the transformations, so before measuring we'll hit the API once and, once the response is cached on the middleman, measure 5 runs for each endpoint.

Each run will be done with browser caching disabled.
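
The warm-up-then-measure procedure can be sketched as follows. This is not how the numbers in this report were collected (those come from the Chrome developer tools); it only illustrates the idea of discarding the first, possibly uncached, request. The `measure` function and the `fetch` callable are hypothetical names.

```python
import time
from typing import Callable, List


def measure(fetch: Callable[[], object], runs: int = 5) -> List[float]:
    """Time `runs` calls to `fetch` in milliseconds after one warm-up call."""
    fetch()  # warm-up: lets the middleman populate its cache, not timed
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms
```

Here `fetch` would wrap a request to one of the two endpoints, issued with caching disabled on the client side.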

For measuring we'll use 2 devices:

  • MacBook Air with OS X 10.10 and Chrome 46
  • Nexus 5 with Android 6 and Chrome mobile 46

Both devices will be connected to the same Wi-Fi network.

For measuring we'll use the Chrome developer tools, connected both to the desktop browser and to the mobile browser.

Since this is initial research, we'll measure one of our heaviest articles, Barack Obama.

Glossary

Aggregated Time (ms): Time from the initial connection to get the HTML until the DOMContentLoaded event.

HTML Size (KB): Size of the HTML document downloaded.

Loading Time (ms): Loading time as aggregated in the Timeline tab of the developer tools over the Aggregated Time period.

What are we trying to find out

Anecdotal experience shows that loading Parsoid HTML on a mobile phone blocks the browser for some time. We want clear insight into what is happening and how to avoid it, since the purpose of the experiment is to serve content quickly to users on bad connections.

Data

See the original Google Docs spreadsheet for the data, which is also replicated below:

Desktop, Chrome 46

Raw (HTML size: 1500 KB)

Run #      Aggregated Time (ms)    Loading Time (ms)
1          2180                    162
2          1980                    171
3          2010                    159
4          1970                    124
5          1880                    140
Average    2,004.00                151.20
StDev      109.68                  18.94

Slim (HTML size: 268 KB)

Run #      Aggregated Time (ms)    Loading Time (ms)
1          727                     44
2          895                     56
3          846                     58
4          901                     55
5          719                     48
Average    817.60                  52.20
StDev      89.00                   5.93

Phone, Chrome mobile 46

Raw (HTML size: 1500 KB)

Run #      Aggregated Time (ms)    Loading Time (ms)
1          9100                    538
2          7090                    620
3          7330                    607
4          12400                   628
5          9650                    710
Average    9,114.00                620.60
StDev      2,142.69                61.35

Slim (HTML size: 268 KB)

Run #      Aggregated Time (ms)    Loading Time (ms)
1          1350                    106
2          1250                    122
3          1310                    103
4          1270                    111
5          1250                    98
Average    1,286.00                108.00
StDev      43.36                   9.14

Comparison

                        Desktop Raw           Desktop Slim        Mobile Raw             Mobile Slim
Metric                  Average     SD        Average    SD       Average     SD         Average     SD
HTML Size (KB)          1,500.00    0.00      268.00     0.00     1,500.00    0.00       268.00      0.00
Aggregated Time (ms)    2,004.00    109.68    817.60     89.00    9,114.00    2,142.69   1,286.00    43.36
Loading Time (ms)       151.20      18.94     52.20      5.93     620.60      61.35      108.00      9.14
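
The Average and StDev figures can be reproduced from the per-run data with Python's statistics module. The reported values match the sample standard deviation (n - 1 in the denominator), which is what `statistics.stdev` computes. The `RUNS` layout and the `summarize` helper are our own naming for this sketch.

```python
from statistics import mean, stdev

# Per-run measurements in milliseconds, copied from the tables in this report.
RUNS = {
    ("desktop", "raw"): {
        "aggregated": [2180, 1980, 2010, 1970, 1880],
        "loading": [162, 171, 159, 124, 140],
    },
    ("desktop", "slim"): {
        "aggregated": [727, 895, 846, 901, 719],
        "loading": [44, 56, 58, 55, 48],
    },
    ("mobile", "raw"): {
        "aggregated": [9100, 7090, 7330, 12400, 9650],
        "loading": [538, 620, 607, 628, 710],
    },
    ("mobile", "slim"): {
        "aggregated": [1350, 1250, 1310, 1270, 1250],
        "loading": [106, 122, 103, 111, 98],
    },
}


def summarize(values):
    """Return (average, sample standard deviation), rounded to 2 decimals."""
    return round(mean(values), 2), round(stdev(values), 2)


if __name__ == "__main__":
    for (device, variant), metrics in RUNS.items():
        for metric, values in metrics.items():
            avg, sd = summarize(values)
            print(f"{device:8} {variant:5} {metric:11} avg={avg:9.2f} sd={sd:8.2f}")
```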

Observations / Conclusions

Desktop vs Phone

There is a huge difference between targeting desktop users and mobile users. Even a modern Nexus 5 running the latest Chrome available at the time is four to five times slower at loading the same raw content over the same connection (9,114 ms vs 2,004 ms average aggregated time).
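
As a worked check of the desktop vs phone gap, the ratios can be derived from the averages in the comparison table. The `AVERAGES_MS` dictionary and the `mobile_slowdown` helper are names made up for this sketch.

```python
# Average timings in milliseconds, copied from the comparison table.
AVERAGES_MS = {
    "desktop": {
        "raw": {"aggregated": 2004.0, "loading": 151.2},
        "slim": {"aggregated": 817.6, "loading": 52.2},
    },
    "mobile": {
        "raw": {"aggregated": 9114.0, "loading": 620.6},
        "slim": {"aggregated": 1286.0, "loading": 108.0},
    },
}


def mobile_slowdown(variant: str, metric: str) -> float:
    """How many times slower the phone is than the desktop for one metric."""
    mobile = AVERAGES_MS["mobile"][variant][metric]
    desktop = AVERAGES_MS["desktop"][variant][metric]
    return mobile / desktop


if __name__ == "__main__":
    for variant in ("raw", "slim"):
        for metric in ("aggregated", "loading"):
            ratio = mobile_slowdown(variant, metric)
            print(f"{variant:4} {metric:10} phone is {ratio:.2f}x slower")
```

With these averages the phone is about 4.5x slower than the desktop on the raw HTML in aggregated time, but only about 1.6x slower on the slim HTML.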

Render times (not represented in the data) were also about an order of magnitude worse on the phone; anecdotally, rendering the raw content on the mobile device took about 4 s. This needs further study.

Payload size

By stripping certain parts of the content, we can make one of the biggest articles on English Wikipedia about 5.6 times smaller (1500 KB to 268 KB). After this, several options arise for surfacing the remaining content, such as loading it immediately afterwards or deferring it until the user acts, but the benefits are clear.

On fast Wi-Fi, stripping improves loading time by roughly 3x on desktop and almost 6x on the phone. Further tests are required to verify the impact of payload size on worse connections, but we can foresee it being the biggest factor on 2G connections and similar.
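
To get a rough feel for why payload size should dominate on slow connections, here is a back-of-the-envelope estimate of pure transfer time. The bandwidth figures in `ASSUMED_KBIT_PER_S` are illustrative assumptions, not measurements, and the model ignores latency, compression and protocol overhead.

```python
# Payload sizes from this report, in kilobytes.
RAW_KB = 1500
SLIM_KB = 268

# Hypothetical link speeds in kilobits per second (assumptions for this sketch).
ASSUMED_KBIT_PER_S = {
    "slow 2G-like link": 50,
    "3G-like link": 750,
}


def reduction_factor(raw_kb: float, slim_kb: float) -> float:
    """How many times smaller the slim payload is than the raw one."""
    return raw_kb / slim_kb


def transfer_seconds(size_kb: float, kbit_per_s: float) -> float:
    """Naive transfer time: payload in kilobits divided by bandwidth."""
    return size_kb * 8 / kbit_per_s


if __name__ == "__main__":
    print(f"slim payload is {reduction_factor(RAW_KB, SLIM_KB):.1f}x smaller")
    for label, speed in ASSUMED_KBIT_PER_S.items():
        raw_s = transfer_seconds(RAW_KB, speed)
        slim_s = transfer_seconds(SLIM_KB, speed)
        print(f"{label}: raw {raw_s:.1f} s, slim {slim_s:.1f} s")
```

Under these assumptions the transfer time scales linearly with payload size, so the 5.6x size reduction would translate directly into a 5.6x shorter download, whatever the link speed.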

Perception

On the mobile phone, the slim version feels instant.

More research

The subject calls for more research:

  • Using a wider range of devices, network conditions, and a bigger sample of articles.
  • Taking rendering times into account, in addition to payload size and loading times.
  • Measuring different variations of the slim version (how much each transformation contributes to the overall improvement).
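
The last point could start with something as simple as measuring how many bytes each transformation saves on its own. The sketch below uses deliberately crude regular expressions as stand-ins for the real transformations; `TRANSFORMS` and `savings_per_transform` are hypothetical names, and regexes are not a robust way to process HTML.

```python
import re
from typing import Callable, Dict

# Toy stand-ins for three of the transformations (assumptions for this sketch).
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "comments": lambda h: re.sub(r"<!--.*?-->", "", h, flags=re.S),
    "data-mw": lambda h: re.sub(r"\sdata-mw=(?:'[^']*'|\"[^\"]*\")", "", h),
    "images": lambda h: re.sub(r"<img\b[^>]*>", "", h),
}


def savings_per_transform(html: str) -> Dict[str, int]:
    """Bytes saved by applying each transformation alone to the same input."""
    base = len(html.encode("utf-8"))
    return {
        name: base - len(transform(html).encode("utf-8"))
        for name, transform in TRANSFORMS.items()
    }
```

Running this over a sample of articles would show which transformation contributes most to the size reduction; timing measurements for each variation would still need a real browser.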
