Wikimedia Developer Summit/2016/T114542
10am Pacific Time on Monday, January 4, 2016
Discussion of the following areas:
- 20 minutes - introductory presentation: Next Generation Content Loading and Routing - Adam Baso, Jon Robson, Joaquin Hernandez, Gabriel Wicke, Sam Smith
- 60 minutes - open discussion
- copied from Etherpad live notes on 2016-02-15
Next Generation Content Loading and Routing
Session name: Next Generation Content Loading and Routing
Meeting goal: [ENTER GOAL HERE]
Meeting style: [ENTER STYLE HERE] Choose one of:
* Problem-solving: surveying many possible solutions
* Strawman: exploring one specific solution
* Field narrowing: narrowing down choices of solution
* Consensus: coming to agreement on one solution
* Education: teaching people about an agreed solution
Phabricator task link: https://phabricator.wikimedia.org/T114542
(not shared publicly, or even with @wm.org) (that's...disappointing)
- Apologies, the title slide had the link to the Commons file on it, but I should have pointed it out earlier --dr0ptp4kt. Here's the PDF on Commons with all of the information:
- Out-of-date version of the vision. The actual vision statement is https://wikimediafoundation.org/wiki/Vision ("Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment.")
(Outdated but also longer) PDF version of slides: https://commons.wikimedia.org/wiki/File:Paradigm.pdf
- Many networks and devices are slow and will remain slow for the near future (3-5 years)
- Google and others are acting as intermediaries to trim down page size and speed up loading on 2G networks
- Images and HTML size influence load time
- Stripping references, navboxes, and infoboxes are some of the strategies for reducing HTML size
- The Reading Web team built a transformation API server that plugs into RESTBase to experiment with reduced payload sizes for 2G networks
- Node.js app that can run on either the server or the client side, depending on device capability and the local device cache
- Load only the first section of the HTML content initially
- Service workers to cache content on device
- Service worker is basically a proxy that can be installed in the browser to selectively intercept and modify HTTP requests
- Same/similar code can be run on the server side to compose page for clients without service worker support (no-js, first visit)
- Certain composition APIs are faster than others; the Varnish cache is an order of magnitude faster still
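The composition idea in the bullets above can be sketched as a small pure function. This is only an illustration under stated assumptions: the `PageShell` shape, the `composePage` name, and the idea of passing sections as an array of HTML strings are hypothetical and not taken from the Reading Web prototype.

```typescript
// Hypothetical shell: the skin chrome, split around the slot for article content.
interface PageShell {
  header: string; // everything before the article content
  footer: string; // everything after the article content
}

// Compose a full page from a (cacheable) shell and article sections.
// Only the first `initialSections` sections are included up front; the rest
// would be fetched later by client-side code.
function composePage(
  shell: PageShell,
  sections: string[],
  initialSections: number = 1
): string {
  const lead = sections.slice(0, initialSections).join("");
  return shell.header + lead + shell.footer;
}
```

Because the function has no browser dependencies, the same code could in principle be called from a service worker's `fetch` handler or from a server-side Node.js process for clients without service worker support.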
For each approach, what functionality would need to be reimplemented? Who is responsible for making the change?
- Skins would change. For the single-page app, I think skins would have to be implemented on the client side.
For each approach, what functionality would need to be dropped? Which stakeholders care?
Q&A:
ServiceWorker composition vs. single-page app
* Pros
  * Can be introduced as a progressive enhancement, targeting only specific views.
  * Can be compatible with e
Question (cscott): Special:Everything, gadgets..., there's a long tail. It would be nice if we could use the future stuff where possible and fall back to the old PHP code for the long tail of random (but very useful for particular users) crap.
Answer (gwicke): "null skin". When you ask for a page on the "long tail", the service worker can turn around and ask core PHP to render that page with the "null skin", and then take the resulting opaque chunk of HTML and drop it into the existing page.
Alternatively, you can just load that particular page from core PHP, bypassing the service worker, but that involves being careful to synchronize skins/UX so that the reload is not unduly jarring (and slow). (For instance, if you did this in the existing mobile app, the result would be very janky because the UX doesn't align.)
Can Service Workers actually distinguish /wiki/Earth from /wiki/Special:Log?
Remark (Timo): Pros/cons of service workers: SWs work in a different thread in the browser, so they are transparent to JS/Gadgets. Document ready etc. still fires at the same time even if no server round-trip occurred in the background.
Timo: DOM still looks the same even though mechanics are different, so gadgets which run after "page load" should Just Work. ...and in theory the *content* markup can gradually become more semantic, which helps gadget authors more easily pull out semantic information from the page.
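On the question of telling /wiki/Earth apart from /wiki/Special:Log, and the "null skin" fallback for the long tail: a service worker sees the request URL, so a routing decision could be made on the path alone. The sketch below is an assumption-laden illustration, not the proposed implementation; the `classifyRequest` name, the three route labels, and the one-entry namespace list are all hypothetical.

```typescript
type Route = "compose" | "null-skin" | "passthrough";

// Assumption: article paths look like /wiki/<Title>, and a namespace prefix
// ends at the first colon. This list is illustrative, not complete.
const LONG_TAIL_NAMESPACES: string[] = ["Special"];

function classifyRequest(pathname: string): Route {
  const prefix = "/wiki/";
  // Anything outside /wiki/ (for example script or API endpoints) is left alone.
  if (pathname.indexOf(prefix) !== 0) return "passthrough";
  const title = pathname.slice(prefix.length);
  const colon = title.indexOf(":");
  const namespace = colon > 0 ? title.slice(0, colon) : "";
  // Long-tail pages are rendered by core PHP with the "null skin";
  // everything else is composed from cached shell plus content.
  return LONG_TAIL_NAMESPACES.indexOf(namespace) !== -1 ? "null-skin" : "compose";
}
```

In a real service worker the path would come from `new URL(event.request.url).pathname` inside a `fetch` event handler. A production version would also need to handle percent-encoded colons and localized namespace names, which this sketch ignores.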
Will the HTML attributes be changed, e.g. class, id? No, the HTML doesn't need to change. Composition happens at the network level, so when the browser downloads the page and JS starts seeing it, it sees the composed version (streamed).
cscott: That's an orthogonal issue. Not necessarily. We do want to (for example) gradually improve <figure> markup, but that's not really related to this front-end proposal. (The current implementation would have slightly different markup because it's based on Parsoid/RESTBase, but we're trying to unify the markup between Parsoid and PHP.)
Joseph Allemandou: 3rd party MediaWiki, how?
Gabriel Wicke: Doing more on the client relieves the server, so it would make running high-traffic wikis easier. But yes, if (IF) we do this with services, that makes 3rd party harder. There's a separate session about that later today.
Toby Negrin: Similar to Squid/Varnish/ESI on some level, just a wrapper around MW.
Subbu Sastry: Are we trying to figure out which approach is viable? What's the goal of this discussion?
Adam Baso: Open to different approaches; we thought these two were viable. Probably have to borrow pieces of both.
Toby Negrin: One of our (Reading's) goals is to reach more readers in the Global South, so we have to make performance viable.
Subbu Sastry: Network issues with mobile are a motivation; is this solution only for mobile? What happens to desktop?
Joaquin Hernandez: You don't often have a desktop client with a 2G connection :) but these improvements benefit all clients; on desktop it doesn't make as much of a difference because it's already fast.
Adam: Note congested wifi, congested cellular.
Gabriel: Can adapt to network conditions on the fly using client-side code; hard to do with a server-side fragmented cache, so separation is beneficial.
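Gabriel's point about adapting to network conditions in client-side code could look roughly like the sketch below. It assumes the browser exposes an effective connection type (as `navigator.connection.effectiveType` does in browsers that implement the Network Information API); the `LoadPlan` shape, the `planForConnection` name, and the pixel widths are hypothetical values chosen for illustration.

```typescript
// Effective connection types as reported by the Network Information API,
// where the browser supports it.
type EffectiveType = "slow-2g" | "2g" | "3g" | "4g";

interface LoadPlan {
  imageWidth: number;       // requested thumbnail width in pixels (illustrative)
  deferNavboxes: boolean;   // fetch navboxes only on demand
  leadSectionOnly: boolean; // send only the first section up front
}

function planForConnection(type: EffectiveType | undefined): LoadPlan {
  switch (type) {
    case "slow-2g":
    case "2g":
      return { imageWidth: 160, deferNavboxes: true, leadSectionOnly: true };
    case "3g":
      return { imageWidth: 320, deferNavboxes: true, leadSectionOnly: false };
    default:
      // "4g", or unknown because the API is unsupported: serve the full page.
      return { imageWidth: 640, deferNavboxes: false, leadSectionOnly: false };
  }
}
```

Treating an unknown connection as fast keeps the full experience as the default, so any degradation only applies when there is a positive signal that the network is slow.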
Matt Flaschen: 1) How will references and UGC be handled for no-JS users?
Joaquin: For the HTML-only version, replace links with links to a different HTML endpoint that does have the ref content; with client-side, show tooltips.
2) Since we're talking about emerging countries and out-of-date phones, are they actually going to have the service worker tech you'd need?
Joaquin: Chrome updates separately from the OS, so you probably run a very new version of Chrome even on old phones; it degrades gracefully, and the web app approach still works without a service worker.
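The graceful degradation Joaquin describes usually comes down to feature detection. A minimal sketch, assuming a hypothetical `pickStrategy` helper; the `nav` parameter stands in for the browser's `navigator` object so the check can be exercised outside a browser.

```typescript
type Strategy = "service-worker" | "web-app";

// Use service worker composition when the browser supports it,
// otherwise fall back to the plain web app approach.
function pickStrategy(nav: object): Strategy {
  return "serviceWorker" in nav ? "service-worker" : "web-app";
}
```

In a browser this would be called as `pickStrategy(navigator)`, followed by `navigator.serviceWorker.register(...)` with the worker script's URL when the result is `"service-worker"`. Clients without JavaScript never run this check and simply receive the server-rendered HTML.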
Jon Katz: One of the goals of this meeting is to hear from the community; we've mostly heard from staff so far.
Straw poll:
- Don't care, not my area (2)
- Super against (1)
- Sounds good, no questions (~10)
Question about single-page app: Are you saying that on a cold cache, you'll load the shell and the entire page rendered on the server?
Joaquin: Cold cache = regular HTML.
Matt: So if most sessions are one view, is this only a significant benefit for multi-view sessions?
Toby: Where's the data that says most people only view one page? IIRC Nuria said there was data on this.
cscott: Don't forget that the cache *duration* can be a lot higher as well, since we're effectively caching the UX and the content separately. So the cache expiry time for the client-side JS and/or service worker can be much, much longer than content expiry times. So "multi view" sessions could span a month, say. (This seems most convincing for the ServiceWorker implementation.)
Daniel Kinzler: Like the direction, have to make sure to get it right. Agree with Ori that there are easier ways to improve perf, but I like the flexibility we get from these approaches. Seems like the single-page app will be a nightmare for back compat with user scripts and gadgets. One reason people keep using MonoBook is that porting Gadgets is hard (even to Vector, let alone an SPA). People have brought up architecture, but let's talk about maintainability. We kind of already have an API-driven front end: mobile apps/sites. We'd add another one, and also an HTML fallback: spreading ourselves too thin. SW seems better, because you start with doing nothing and, AIUI, just incrementally pull out more things.
Gabriel: Lot of overlap in the API endpoints used by these things.
cscott: Part of the point is that the "HTML fallback" will reuse the client-side code, so we're not actually multiplying codepaths.
Jaime Crespo: Don't have anything against the idea, some things against the implementation. We need more testing, both on actual devices and of the server-side tech. My own tests aren't very promising; will share them with you. Caching user-dependent data is a security concern.
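cscott's point above about caching the UX and the content separately can be illustrated with a freshness check that uses a different lifetime per resource kind. The numbers are assumptions: the session only suggested "a month, say" for the UX layer, and the five-minute content lifetime is purely hypothetical, as are the `isFresh` and `MAX_AGE_SECONDS` names.

```typescript
type ResourceKind = "shell" | "content";

// Illustrative lifetimes only.
const MAX_AGE_SECONDS: { [K in ResourceKind]: number } = {
  shell: 30 * 24 * 60 * 60, // roughly a month for the UX layer
  content: 5 * 60,          // hypothetical short lifetime for article content
};

// True if an entry cached at `cachedAtMs` may still be served at `nowMs`.
function isFresh(kind: ResourceKind, cachedAtMs: number, nowMs: number): boolean {
  const ageSeconds = (nowMs - cachedAtMs) / 1000;
  return ageSeconds <= MAX_AGE_SECONDS[kind];
}
```

With lifetimes like these, a reader returning after a week would still have a usable shell in the local cache and would only need to fetch the article content.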
Jordan: Agree with Daniel and Ori, think SW makes the most sense, but I think the RB component should be seen as a last step, if at all. Pulling out infoboxes and navboxes will make a much larger difference than using RB. The 10%/90% rule applies (10% effort for 90% of the improvement).
Timo: The next session at 11:30 will mostly continue on from this and will focus mostly on the front-end side of things, specifically on how skinning will work in the future.
In-Etherpad comments:
Jie: On the HTML side, we currently use <table class="infobox"> to represent an infobox. Does it mean we'll switch to an <mw-infobox> tag in the future?
cscott: Come to my session at 15:40! We'll talk about this. This is one of the options on the table. We'll also discuss possibly storing infoboxes separately from the main article, say as Wikidata facts. So the "storage" and "rendering" of the infoboxes are separate questions. We could change the storage (or retrieval API) without changing the rendering, or vice versa. https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit_2016/T112987
Scott_WUaS: cscott: will this also be for "T112996, T112984, T113004: Make it easy to fork, branch, and merge pages (or pages)" and related? Thanks.
Trevor: This is what happens when you use HTML as an API, you can't change anything without breaking everything
Tim: Since there's not much chance of getting another turn of the microphone, I'll put my position here. I think any UX degradation, e.g. lower quality images, deferred image loading, deferred navboxes, first section only delivery needs to be carefully considered and supported by high-quality performance data. With current data, I would only support deferring collapsed navboxes until click.
Packet loss is obviously the biggest performance problem on mobile, in both the developed and developing world. We should do performance tests with simulated packet loss, not simulated fanciful >1s RTTs. The obvious way to address packet loss is to reduce retransmission time by reducing RTT. For example, we could have a cache POP in Indonesia.
When the Opera Mini team say "send HTML not a web app", they mean fully functional HTML with <img> tags, not first section HTML with JS which will load the rest of the content. It's especially critical to send all HTML at once, with no deferrals, on high RTT high bandwidth connections such as satellite. Satellite ISPs will even bundle CSS/JS/images along with the HTML in order to reduce round trips.
TO THE BIKESHED:
Topics for discussion (feel free to fill in answers as they are discussed, or after the session):
Which stakeholders will likely benefit from an API-driven UI approach?
Which stakeholders will be negatively impacted by an API-driven UI approach?
What alternatives should we consider for API-driven UI (or generally decreasing perceived page load times)?
- API-driven UI: Single-page application
- API-driven UI: ServiceWorker composition
- Lazy loading images
- Inlining CSS
- (any others?)
* * *
Action items with owners:
* * *
DON’T FORGET: When the meeting is over, copy any relevant notes (especially areas of agreement or disagreement, useful proposals, and action items) into the Phabricator task.
See https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit_2016/Session_checklist for more details.