Reading/Web/Projects/A frontend powered by Parsoid/Parsoid html size initial report
To inform what a fast initial response would look like for a frontend powered by Parsoid, we ran some initial comparisons between the raw Parsoid HTML and an optimized version.
Set up
For the experiment we're running a middle man server that has two endpoints:
/api/raw/html/:title
: Serves the raw Parsoid HTML without transformations.
/api/slim/html/:title
: Serves an optimized version of the HTML. For the purposes of the demo, this optimized version performs the following transformations:
- Removing data-mw attributes
- Removing comments
- Removing references
- Removing tables
- Removing images
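The transformations listed above can be illustrated with a minimal sketch using only the Python standard library. This is not the code of the middle man server (see the Links section for that); the class name `SlimHTML`, the function `slim`, and the rule that references are elements whose `typeof` attribute starts with `mw:Extension/ref` are assumptions made for illustration, and images are handled by simply dropping `<img>` tags.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SlimHTML(HTMLParser):
    """Re-serializes HTML while dropping data-mw, comments, references, tables, images."""

    def __init__(self):
        # Keep entities such as &amp; untouched in text nodes.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_tag = None   # name of the element currently being dropped
        self.skip_depth = 0    # nesting depth of same-named elements inside it

    def _is_dropped(self, tag, attrs):
        if tag in ("table", "img"):
            return True
        # Assumption: Parsoid marks references with typeof="mw:Extension/ref...".
        typeof = dict(attrs).get("typeof") or ""
        return typeof.startswith("mw:Extension/ref")

    def handle_starttag(self, tag, attrs):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        if self._is_dropped(tag, attrs):
            if tag not in VOID_TAGS:
                self.skip_tag, self.skip_depth = tag, 1
            return
        kept = [(name, value) for name, value in attrs if name != "data-mw"]
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in kept
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth -= 1
                if self.skip_depth == 0:
                    self.skip_tag = None
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_tag:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_tag:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_tag:
            self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        if not self.skip_tag:
            self.out.append(f"<!{decl}>")

    def handle_comment(self, data):
        pass  # comments are dropped on purpose


def slim(html: str) -> str:
    """Return a slimmed copy of the given HTML string."""
    parser = SlimHTML()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling `slim(raw_html)` on a document returns the stripped markup as a string. Tracking only same-named nested tags while skipping keeps the sketch tolerant of unclosed `<p>` or `<li>` elements inside a dropped table.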
The first hit on the API may go to RESTBase and perform the transformations, so before measuring we hit the API once; after the response is cached on the middle man server, we measure 5 runs for each endpoint.
Each run will be done with browser caching disabled.
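The warm-up-then-measure procedure can be sketched as a small script. Note that this only times the HTTP download, whereas the figures in this report were read from the Chrome developer tools; the function names `fetch_html` and `measure_endpoint` and any host name you pass in are hypothetical.

```python
import time
import urllib.request
from typing import Callable, List


def fetch_html(url: str) -> bytes:
    """Download a URL; urllib keeps no client-side cache between calls."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def measure_endpoint(
    url: str,
    runs: int = 5,
    fetch: Callable[[str], bytes] = fetch_html,
) -> List[float]:
    """Hit the endpoint once to warm the server cache, then time `runs` downloads.

    Returns the elapsed time of each measured run in milliseconds.
    """
    fetch(url)  # warm-up request, deliberately not measured
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch(url)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms
```

For example, `measure_endpoint(base_url + "/api/slim/html/Barack_Obama")` would return five timings, where `base_url` stands for wherever the middle man server is deployed.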
For measuring we'll use 2 devices:
- MacBook Air with OS X 10.10 and Chrome 46
- Nexus 5 with Android 6 and Chrome mobile 46
Both devices will be connected to the same Wi-Fi network.
For measuring we'll use the Chrome developer tools, connected both to the desktop browser and to the mobile browser.
Since this is initial research, we'll measure only one of our heaviest articles, Barack Obama.
Glossary
Aggregated Time (ms): Time from initial connection to get the HTML to the DOMContentLoaded event.
HTML Size (kb): Size of the HTML document downloaded.
Loading Time (ms): Loading time aggregated on the Timeline tab on the developer tools for the Aggregated Time period.
What are we trying to find out
Anecdotal experience shows that loading Parsoid HTML on a mobile phone blocks the browser for some time. We're trying to get clear insight into what is happening and how to avoid it, since the purpose of the experiment is to quickly serve content to users on bad connections.
Data
See original google docs spreadsheet for data, also replicated below:
Desktop, Chrome 46
Raw (HTML size: 1500 kb)
Run # | Aggregated Time (ms) | Loading Time (ms) |
1 | 2180 | 162 |
2 | 1980 | 171 |
3 | 2010 | 159 |
4 | 1970 | 124 |
5 | 1880 | 140 |
Average | 2,004.00 | 151.20 |
StDev | 109.68 | 18.94 |
Slim (HTML size: 268 kb)
Run # | Aggregated Time (ms) | Loading Time (ms) |
1 | 727 | 44 |
2 | 895 | 56 |
3 | 846 | 58 |
4 | 901 | 55 |
5 | 719 | 48 |
Average | 817.60 | 52.20 |
StDev | 89.00 | 5.93 |
Phone, Chrome mobile 46
Raw (HTML size: 1500 kb)
Run # | Aggregated Time (ms) | Loading Time (ms) |
1 | 9100 | 538 |
2 | 7090 | 620 |
3 | 7330 | 607 |
4 | 12400 | 628 |
5 | 9650 | 710 |
Average | 9,114.00 | 620.60 |
StDev | 2,142.69 | 61.35 |
Slim (HTML size: 268 kb)
Run # | Aggregated Time (ms) | Loading Time (ms) |
1 | 1350 | 106 |
2 | 1250 | 122 |
3 | 1310 | 103 |
4 | 1270 | 111 |
5 | 1250 | 98 |
Average | 1,286.00 | 108.00 |
StDev | 43.36 | 9.14 |
Comparison
Metric | Desktop Raw (Average) | Desktop Raw (SD) | Desktop Slim (Average) | Desktop Slim (SD) | Mobile Raw (Average) | Mobile Raw (SD) | Mobile Slim (Average) | Mobile Slim (SD) |
HTML Size (kb) | 1,500.00 | 0.00 | 268.00 | 0.00 | 1,500.00 | 0.00 | 268.00 | 0.00 |
Aggregated Time (ms) | 2,004.00 | 109.68 | 817.60 | 89.00 | 9,114.00 | 2,142.69 | 1,286.00 | 43.36 |
Loading Time (ms) | 151.20 | 18.94 | 52.20 | 5.93 | 620.60 | 61.35 | 108.00 | 9.14 |
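The Average and StDev figures can be reproduced from the per-run measurements with a short sketch. It assumes the StDev rows are sample standard deviations (n - 1 in the denominator), which is what matches the reported values; the names `RUNS` and `summarize` are made up for this example.

```python
from statistics import mean, stdev

# Per-run measurements in milliseconds, copied from the tables above.
RUNS = {
    ("desktop", "raw"): {
        "aggregated": [2180, 1980, 2010, 1970, 1880],
        "loading": [162, 171, 159, 124, 140],
    },
    ("desktop", "slim"): {
        "aggregated": [727, 895, 846, 901, 719],
        "loading": [44, 56, 58, 55, 48],
    },
    ("phone", "raw"): {
        "aggregated": [9100, 7090, 7330, 12400, 9650],
        "loading": [538, 620, 607, 628, 710],
    },
    ("phone", "slim"): {
        "aggregated": [1350, 1250, 1310, 1270, 1250],
        "loading": [106, 122, 103, 111, 98],
    },
}


def summarize(samples):
    """Return (average, sample standard deviation), rounded to two decimals."""
    return round(mean(samples), 2), round(stdev(samples), 2)


SUMMARY = {
    key: {metric: summarize(values) for metric, values in metrics.items()}
    for key, metrics in RUNS.items()
}

for (device, variant), metrics in SUMMARY.items():
    for metric, (average, deviation) in metrics.items():
        print(f"{device:8} {variant:5} {metric:11} avg={average:.2f} sd={deviation:.2f}")
```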
Observations / Conclusions
Desktop vs Phone
There is a huge difference between targeting desktop users and mobile users. Even a modern Nexus 5 with the latest Chrome at the time of writing is roughly four to five times slower at loading the same raw content over the same connection (9,114 ms vs 2,004 ms average aggregated time).
Render times (not represented in the data) were also about an order of magnitude worse: anecdotally, rendering the raw content took about 4 s on the mobile device. This needs further study.
Payload size
By stripping certain parts of the content, we can make one of the biggest articles on English Wikipedia about 5.6 times smaller (1500 kb to 268 kb). Several options then arise for surfacing the stripped content, such as loading it immediately after the initial response or deferring it until the user acts, but the benefits are clear.
On fast Wi-Fi, stripping improves average loading time by about 3x on desktop (151 ms to 52 ms) and almost 6x on the phone (621 ms to 108 ms). Further tests are required to verify the impact of payload size on worse connections, but we can foresee it being the biggest factor on 2G connections and similar.
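As a quick check of the ratios discussed in these observations, the following sketch recomputes them from the averages in the comparison table. The dictionary layout and the helper name `ratio` are only illustrative.

```python
# Averages copied from the comparison table (sizes in kb, times in ms).
AVERAGES = {
    "html_size": {"raw": 1500.0, "slim": 268.0},
    "desktop_aggregated": {"raw": 2004.0, "slim": 817.6},
    "desktop_loading": {"raw": 151.2, "slim": 52.2},
    "phone_aggregated": {"raw": 9114.0, "slim": 1286.0},
    "phone_loading": {"raw": 620.6, "slim": 108.0},
}


def ratio(numerator: float, denominator: float) -> float:
    """How many times larger the numerator is than the denominator."""
    return numerator / denominator


# Raw vs slim: how much each metric improves when the content is stripped.
IMPROVEMENT = {
    name: ratio(values["raw"], values["slim"]) for name, values in AVERAGES.items()
}

# Phone vs desktop on the same raw content.
PHONE_VS_DESKTOP_RAW = {
    "aggregated": ratio(AVERAGES["phone_aggregated"]["raw"],
                        AVERAGES["desktop_aggregated"]["raw"]),
    "loading": ratio(AVERAGES["phone_loading"]["raw"],
                     AVERAGES["desktop_loading"]["raw"]),
}

for name, value in IMPROVEMENT.items():
    print(f"raw/slim {name}: {value:.1f}x")
for name, value in PHONE_VS_DESKTOP_RAW.items():
    print(f"phone/desktop raw {name}: {value:.1f}x")
```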
Perception
On the mobile phone, the slim version feels instant.
More research
The subject calls for more research:
- Using a wider range of devices, network conditions, and a bigger sample of articles.
- Taking rendering times into account, in addition to payload size and loading time.
- Measuring different variations of the slim version (how much each transformation contributes to the overall improvement).
Links
- Source code for the middle man server at the moment of data gathering: https://github.com/joakin/loot/tree/5d6ee62885cd1cd4324b1f40e99b7d418b66c811
- Deployed version links