User:Brooke Vibber/EmbedScript 2019

This is a work in progress. Please feel free to provide feedback on the talk page or directly to me via mail/irc/etc!

This is a re-conception of older ideas which went partially in this direction, with smaller, more tractable subprojects listed as a work plan.

Background

There are two main target areas:

a safe way to embed interactive HTML/JavaScript "widgets" inside wiki articles alongside images and videos
a safer alternative to shared user scripts and Gadgets that splits plugins between trusted and less-trusted code

The common thread is the use of sandboxed <iframe> elements and Content-Security-Policy to create a JavaScript sandbox for less-trusted code. There are further avenues of exploration for truly untrusted code that needs to limit CPU usage and memory allocation.

Terms

host: the main web page of the embedder, such as MediaWiki
sandbox: an isolated JavaScript context which has no direct access to host objects or data
trusted code: code that has access to host data -- can do almost anything the user could do
less-trusted code: code that is restricted from host data, but is able to trigger recoverable problems like hanging the main thread or crashing the browser tab
untrusted code: code that should be further prevented from hanging the main thread, crashing the browser tab, or over-allocating memory
host API: an asynchronous message-passing interface between trusted host code and less-trusted or untrusted widget/plugin code
widget: a bundle of less-trusted HTML, CSS, and JavaScript code that can be embedded in a sandboxed <iframe>, optionally with a trusted host API
plugin: a bundle of less-trusted or untrusted JavaScript code loaded via an <iframe> + Worker combination that communicates with a trusted host API

Widgets

Embedded content "widgets" have a visible area they can populate with HTML, style with CSS, and manipulate with JavaScript. They run in a sandboxed <iframe> element, which causes the browser's same-origin restrictions to prevent any direct access to the host page from guest code.

Further abilities such as fullscreen, camera/microphone access, etc. can be withheld through the <iframe> element's own attributes (allowfullscreen, allow) together with the browser's Content-Security-Policy mechanism.

It's important to ensure that old browsers cannot load the iframe contents in the parent domain if it directly contains guest HTML or JavaScript, as that would turn it into host code! For safety it is best to use a separate domain and/or to have the frame HTML only contain a stub loader that accepts its input from the host over postMessage.

Cross-origin communication channels can be blocked off with Content-Security-Policy.

A message-passing host API may be used in the setup of the sandbox, and for some host<->guest interactions like clickable links, but is not required for guest code to operate within the iframe.
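The stub-loader idea could be sketched roughly as below. This is only the message-checking half of such a loader, and everything named here is an assumption for illustration: the 'load-bundle' message type, the field names, and the HOST_ORIGIN value.

```javascript
// Hypothetical message shape: { type: 'load-bundle', html, css, js }.
// HOST_ORIGIN is an assumed value; a real deployment would configure it.
const HOST_ORIGIN = 'https://wiki.example';

// Returns the validated bundle, or null if the message should be ignored.
function acceptBundle(event, parentWindow, hostOrigin) {
  // Only the embedding host may hand code to the loader.
  if (event.source !== parentWindow || event.origin !== hostOrigin) {
    return null;
  }
  const data = event.data;
  if (!data || data.type !== 'load-bundle') {
    return null;
  }
  // Every component must be a plain string; anything else is rejected.
  for (const key of ['html', 'css', 'js']) {
    if (typeof data[key] !== 'string') {
      return null;
    }
  }
  return { html: data.html, css: data.css, js: data.js };
}

// In the frame, the stub would wire this up roughly like so:
//   window.addEventListener('message', (event) => {
//     const bundle = acceptBundle(event, window.parent, HOST_ORIGIN);
//     if (bundle) { /* inject bundle.css, bundle.html, bundle.js */ }
//   });
```

Because the frame itself ships no guest code, loading the stub URL directly in an old browser exposes nothing interesting.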

Widget components

A widget definition is similar to a jsfiddle.net "fiddle" -- three main components holding HTML, CSS, and JS. They can be separately stored as strings in the host environment (wiki pages, or slots in a single wiki page, for the MediaWiki embedding). Additionally there should be a fourth metadata component with descriptions, localizable string definitions, images and common libraries to load, etc.

These can be presented in a viewer/editor as a 4-up panel (HTML, CSS, JS, and a running example) with metadata in a sidebar.

For small screens / mobile, a tabbed view may be better.

Plugins

"Plugins" are intended for the user-interface side of MediaWiki, as a way for custom code to manipulate the host page's UI through a safe, restricted API.

The code for a plugin is split into two components: a trusted host API module which has full access to the host page, and a less-trusted or untrusted guest module which communicates with the host over a secure, asynchronous message-passing API. Multiple plugin guest modules may hook into the same host API.

For instance, an editor plugin may use a host API that sends in a representation of a text selection, then waits for a response with modified text or requests to update a dialog box state.

Plugins will come with metadata for what host APIs they require and what pages or actions they should be loaded on. Operations requiring trust, like cross-site network access, can be mediated by the host API for opt-in security permissions. Plugins are reliant on the host API to present a UI for them, depending on the hooks being plugged into.

Plugin components

With no HTML or CSS interface, plugins have only two components: the JavaScript code, and a metadata component with descriptions, localizable string definitions, references to the host API hooks and any common libraries to load.
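As a rough illustration of what the metadata component might hold, and how the host could use it to decide on loading without running any plugin code, here is a minimal sketch. All field names (hostApis, loadOn, and so on) and the context object are hypothetical; no format has been settled.

```javascript
// Hypothetical metadata for an editor plugin; every field name is illustrative.
const pluginMeta = {
  name: 'whitespace-cleanup',
  description: 'Tidies whitespace in the current selection',
  hostApis: ['editor.selection'],
  loadOn: { actions: ['edit'], namespaces: [0] },
  libraries: [],
};

// Decide whether a plugin applies to the current page view, without running
// any of the plugin's own code.
function shouldLoad(meta, context) {
  const hasApis = meta.hostApis.every((api) => context.availableApis.includes(api));
  const actionOk = meta.loadOn.actions.includes(context.action);
  const namespaceOk = meta.loadOn.namespaces.includes(context.namespace);
  return hasApis && actionOk && namespaceOk;
}
```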

This can be presented in a viewer/editor as a single JS panel with metadata in a sidebar.

For small screens / mobile, a tabbed view may be better.

Host APIs

Host APIs are based on asynchronous message-passing, using the browser's postMessage facility between parent pages, iframes and (where needed) Worker threads. This allows sending structured messages that carry not only the JSON data types but also ArrayBuffers, Blobs, and some other low-level types, so it should be suitable for sending both text and binary data (but not HTML DOM elements).
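A request/response layer on top of postMessage might look something like the following sketch. The message fields (id, method, params, result, error) are assumptions, not a defined protocol, and `port` stands for anything with postMessage() and an onmessage property, such as one end of a MessageChannel.

```javascript
// Host side: answer guest requests using an allowlist of handlers.
function serveHostApi(port, handlers) {
  port.onmessage = (event) => {
    const { id, method, params } = event.data || {};
    // Only own properties count, so inherited names like 'toString' are refused.
    const handler = Object.prototype.hasOwnProperty.call(handlers, method)
      ? handlers[method]
      : null;
    if (typeof handler !== 'function') {
      port.postMessage({ id, error: 'unknown method' });
      return;
    }
    try {
      port.postMessage({ id, result: handler(params) });
    } catch (err) {
      port.postMessage({ id, error: String(err && err.message) });
    }
  };
}

// Guest side: pair each reply with the request that caused it.
function createGuestClient(port) {
  let nextId = 1;
  const pending = new Map();
  port.onmessage = (event) => {
    const { id, result, error } = event.data || {};
    const callback = pending.get(id);
    if (!callback) return; // unsolicited reply: ignore it
    pending.delete(id);
    callback(error ? new Error(error) : null, result);
  };
  return {
    call(method, params, callback) {
      const id = nextId++;
      pending.set(id, callback);
      port.postMessage({ id, method, params });
    },
  };
}
```

On the guest side a call would read something like client.call('getSelection', {}, (err, selection) => { ... }), where 'getSelection' is whatever the host chose to expose. Only data crosses the boundary; the guest never holds a reference to a host object.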

It is very important that host API implementations never insert guest-supplied HTML strings into the host DOM, as this would allow <script> injection!
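For example, a host-side handler that lets a guest set a button label could treat the value strictly as text. This is a minimal sketch; the handler name and the shape of `params` are hypothetical.

```javascript
// Hypothetical host API handler: the guest asks to set a button label.
// `button` would be a DOM element on the host page.
function setButtonLabel(button, params) {
  if (!params || typeof params.label !== 'string') {
    throw new TypeError('label must be a string');
  }
  // textContent treats the value as text, so markup in it is never parsed.
  // Assigning the same string to innerHTML would parse it as HTML.
  button.textContent = params.label;
}
```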

The guest sandbox can set up a safe message-passing wrapper API on top of postMessage in cases where it's desirable to further restrict the guest API from what web or Worker contexts see.

API components

Host APIs, like plugins, have two components: JavaScript code and metadata. They can be presented similarly as a 2-up panel or tabbed view.

Threat model

"Less-trusted code" running in an <iframe> cannot be prevented from causing some trouble, such as blocking the main thread for a few seconds with long-running code, or allocating a ton of memory which can crash the tab or even slow down the operating system.

Thus it's vital that widgets and plugins be recoverable in a fairly straightforward way: if you activate one and it causes trouble, you must be able to turn it off.

For content widgets, using a "click-to-play" model is safest as this will avoid running any dangerous code when an article is being viewed or edited, and reloading the page will clear things up by resetting state.

For plugins, ideally the code won't be loaded until some interactive action happens, and there should be a way to turn a plugin back off from preferences (or a context menu on the action trigger itself) without running it on that page view. Things like menu and button setup can be done in a declarative way that doesn't require pre-executing JS code.
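Declarative setup could work along these lines: the host builds buttons from plain data and only asks for a plugin's code when its button is activated. The declaration fields, the disabled-plugin list, and the `loadPlugin` callback are all assumptions made for this sketch.

```javascript
// Hypothetical declarative registration: plain data, no plugin code involved.
const declarations = [
  { plugin: 'cleanup', label: 'Clean up formatting', hook: 'editor.toolbar' },
  { plugin: 'wordcount', label: 'Count words', hook: 'editor.toolbar' },
];

// Build inert button descriptions. A plugin's code is only requested when its
// button is activated, and never if the user has disabled it.
function buildButtons(decls, disabledPlugins, loadPlugin) {
  return decls
    .filter((decl) => !disabledPlugins.includes(decl.plugin))
    .map((decl) => ({
      label: decl.label,
      activate: () => loadPlugin(decl.plugin),
    }));
}
```

A misbehaving plugin can then be switched off in preferences and simply never appears, without any of its code having run on that page view.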

Cross-site communication

With suitable CSP sandboxing, exposure of output data to other sites is not possible through "web bugs" (offsite image loads) or other direct techniques. This prevents any data leakage there might be from reaching an attacker. There is some possible danger of side-channel leakage through things that an attached host API permits, however -- for instance if a host API allows fetching a particular on-wiki image, media view counts might go up when the file is loaded by the browser, and this could be checked by the attacker.

Additionally, host API opt-in permissions might be abused by "Trojan horse" malicious code, such as an editor plugin that offers to do formatting cleanup but actually hides data in seemingly invisible adjustments to whitespace and formatting characters. This could then be retrieved by the malicious actor from afar.

As always, open-code and some sort of review system would help.

Requirements and compatibility

At a minimum, modern-ish <iframe> sandboxing, structured clone on postMessage, and CSP support are required. I don't know the exact minimum version requirements for these yet.

Compatibility needs to be checked at runtime before injecting any untrusted HTML, CSS, or JS into an <iframe>. I recommend not instantiating the <iframe> until it's ready to roll; until then, source documents should probably show an <img> with a clickable thumbnail.
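A runtime check might start out like the sketch below. These are heuristics rather than a verified compatibility matrix: in particular, using SecurityPolicyViolationEvent as a stand-in for CSP support and MessageChannel as a stand-in for structured-clone messaging are assumptions that would need testing.

```javascript
// `win` is the browser's window object. The checks are a heuristic sketch, not
// a verified compatibility matrix.
function canHostSandbox(win) {
  if (!win || !win.document) return false;
  const frame = win.document.createElement('iframe');
  const hasSandbox = 'sandbox' in frame;
  const hasPostMessage = typeof win.postMessage === 'function';
  const hasChannels = typeof win.MessageChannel === 'function';
  // SecurityPolicyViolationEvent is used here as a rough proxy for CSP support.
  const hasCsp = typeof win.SecurityPolicyViolationEvent === 'function';
  return hasSandbox && hasPostMessage && hasChannels && hasCsp;
}
```

If the check fails, the page would keep showing the static thumbnail and never create the frame.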

Resource limits

Because a malicious or auto-generated wiki page might include hundreds or thousands of widgets, it's best not to instantiate the <iframe>s until user activation even if there's no less-trusted code in them.

The ability to safely run fully "untrusted code" without blocking the CPU or over-allocating memory would be nice too, but it's difficult/impossible to enforce memory limits on JS. WebAssembly modules running in a Worker thread could be locked down further (long loops can be terminated from the main thread, and memory can be statically limited) but this requires a lot more investigation as well as more tooling to compile easy-to-write modules to Wasm.

Note that long-running JavaScript loops in a Worker could be terminated from the parent if it doesn't respond to pings during a long loop, but high sustained CPU usage in an async/await loop cannot be detected from the main thread.
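The ping-based termination could be tracked with a small state object like this one. It is a sketch under assumed names and timings; the 'ping'/'pong' messages and the 2000 ms timeout are placeholders.

```javascript
// Tracks ping/pong timing for one Worker. Times are in milliseconds.
class Watchdog {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.pingSentAt = null; // null means no ping is outstanding
  }
  // Call when a ping is posted to the Worker.
  pingSent(now) {
    if (this.pingSentAt === null) this.pingSentAt = now;
  }
  // Call when the Worker answers.
  pongReceived() {
    this.pingSentAt = null;
  }
  // True when the oldest unanswered ping has been waiting too long.
  isHung(now) {
    return this.pingSentAt !== null && now - this.pingSentAt > this.timeoutMs;
  }
}

// Wiring in the parent might look like this (`worker` is a Worker):
//   const dog = new Watchdog(2000);
//   worker.onmessage = (e) => { if (e.data === 'pong') dog.pongReceived(); };
//   setInterval(() => {
//     if (dog.isHung(Date.now())) worker.terminate();
//     else { dog.pingSent(Date.now()); worker.postMessage('ping'); }
//   }, 500);
```

This only catches a Worker that stops answering. A guest that yields between chunks of work keeps replying to pings while still burning CPU, which is the undetectable case described above.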

Planned work areas

There are a few areas to work on:

<iframe> loader setup code and host API
- There is existing prior art such as Oasis, which can be used for inspiration
bundling system for taking HTML, CSS, and JS "files" and injecting them into the iframe safely
- Lots of prior art to examine. Would like to support native JS module syntax, but will probably start simpler.
sample content widgets
- update the Mandelbrot fractal generator using <canvas>
- write a "Turtle World" interactive Logo interpreter using SVG graphics
sample UI plugins
- pick something clever that hooks into the editor or page views and implement it
MediaWiki extension storing widget and plugin code as editable wiki pages
- I hope to use multi-content revisions to bundle widget HTML, CSS, and JS together in an "atomic" page.
- Plugins will have both JS code and a JSON module registration definition.
- Host API implementations can be shared by multiple plugins.
- There should be a common permissions-granting UX for host APIs to use.

Additional productization requirements

If this is ever to be used on Wikipedia and Commons, a good code-review system would be strongly, strongly recommended. A simple way to import/export between a git repo checkout and an on-wiki widget or plugin or host API definition would likely help for "serious maintenance" as well.

Centralized definitions that can be pulled on any wiki, and export of widgets over InstantCommons as well as locally, will be required.

Localizable string definitions and a system for translation, or hooking into MediaWiki's existing translation systems, are a requirement for well-maintained tools in our context.

Further areas to examine

There are some more things to do later:

investigate Worker isolation further for avoiding main-thread jank
investigate global JS namespace cleanup for reducing the attack surface further
investigate WebAssembly-based isolation for truly untrusted code
investigate ways to automatically create thumbnail images

I'm really excited about the idea of WebAssembly sandboxing, because you can strictly limit memory usage. If the module runs in a Worker, you can also terminate long-running loops from the parent thread, and restrict the ability to schedule additional execution via timers or async functions.
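The memory cap is easy to see with the standard WebAssembly.Memory API. The sizes below are arbitrary example values, not a proposed limit.

```javascript
// One WebAssembly page is 64 KiB. The 16-page cap is an arbitrary example.
const PAGE_BYTES = 64 * 1024;
const memory = new WebAssembly.Memory({ initial: 1, maximum: 16 });

// Growing up to the maximum succeeds...
memory.grow(15);
const atCap = memory.buffer.byteLength === 16 * PAGE_BYTES;

// ...but any growth beyond it is refused by the engine.
let refused = false;
try {
  memory.grow(1);
} catch (err) {
  refused = err instanceof RangeError;
}
```

A module instantiated with this memory cannot allocate past the cap, whatever the guest code tries to do.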

But it requires a custom API for the message-passing, and tooling for compiling scripting languages to standalone Wasm modules isn't good yet. Implementing a full JS engine in Wasm is an idea I've considered, but it's a big project that isn't tractable at this time.

Memory over-allocation in JS can be limited somewhat if typed arrays can be hidden from view of guest code, depending on whether the browser engine handles large strings and arrays better than it handles typed arrays... but it's a shame to lose typed arrays, which are useful sometimes. Careful initialization of global state could help mitigate this and other issues with considered-unsafe native objects inside a Worker context.

Running code in a headless browser to make thumbnails for widgets should be possible, and mostly requires server-side tooling. It's less clear whether clients could create their own thumbnails via <canvas> etc.

Appendix 1: <iframe> and CSP details

See spec for iframe sandbox attribute.

"sandbox" attribute must include:

allow-scripts (required to run scripts)

Must not include:

allow-same-origin (would be unsafe to allow this!)
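A small helper could enforce both rules whenever the attribute value is built. This is a sketch; the `extraTokens` parameter is a hypothetical way for a widget to request other sandbox tokens.

```javascript
// Tokens that would defeat the isolation described above.
const FORBIDDEN_TOKENS = ['allow-same-origin'];

// Build the value of the iframe `sandbox` attribute. `extraTokens` is a
// hypothetical per-widget list such as ['allow-pointer-lock'].
function sandboxAttribute(extraTokens = []) {
  const tokens = new Set(['allow-scripts']);
  for (const token of extraTokens) {
    if (FORBIDDEN_TOKENS.includes(token)) {
      throw new Error('refusing unsafe sandbox token: ' + token);
    }
    tokens.add(token);
  }
  return Array.from(tokens).join(' ');
}
```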

For the CSP header,

default-src should be 'none'
img-src must include data: or blob: to allow using images sent into the embedding
font-src must include data: or blob: to allow using fonts sent into the embedding
media-src must include data: or blob: to allow using audio or video files sent whole into the embedding
style-src may need 'unsafe-inline'?
script-src: 'unsafe-inline' is required to inject the JS, unless srcdoc or a separate domain with server-side bundling can be relied on
script-src: 'unsafe-eval' is optional (allows eval and Function constructor)
also set the sandbox directive to allow-scripts in the CSP, to enforce protections on compliant browsers

Must not:

allow any offsite anything! (cross-site communication dangers)
must be able to eval scripts

Note that while it might be "safe" in practice to allow img-src to include images from ourselves (upload.wikimedia.org etc), it would complicate reusing things offline significantly. Plugins should be able to be used self-contained if the necessary resources are made available to them from the host.

Need to do more testing to confirm all the CSP settings do what I think they do.
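Put together, the header value might be assembled like this. It is an untested sketch that mirrors the settings listed in this appendix: it allows both data: and blob: for images, fonts and media, assumes style-src does need 'unsafe-inline', and takes a hypothetical `allowEval` option.

```javascript
// Directive values follow the list above; none of this has been verified
// against real browsers.
function buildWidgetCsp(options = {}) {
  const scriptSrc = ["'unsafe-inline'"];
  if (options.allowEval) scriptSrc.push("'unsafe-eval'");
  const directives = {
    'default-src': ["'none'"],
    'img-src': ['data:', 'blob:'],
    'font-src': ['data:', 'blob:'],
    'media-src': ['data:', 'blob:'],
    'style-src': ["'unsafe-inline'"],
    'script-src': scriptSrc,
    'sandbox': ['allow-scripts'],
  };
  return Object.entries(directives)
    .map(([name, values]) => name + ' ' + values.join(' '))
    .join('; ');
}
```

The result is meant for the Content-Security-Policy HTTP response header; the sandbox directive is not honored when a policy is delivered through a <meta> element.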

Appendix 2: de-fanging JavaScript intrinsics

Some past sandboxing projects like Caja have tried to enforce a sandbox world by creating custom prototype chains and rewriting code; this way lies madness and incompleteness, with many potential sandbox escapes.

However within a browser-based sandbox -- where the JS engine itself enforces separation -- some additional things can be cut out using the same techniques.

How JS object prototypes work

JavaScript objects are, roughly, a hash map of property values, some of them private and most of them public. JavaScript-visible state of any object can be modified, unless it's been explicitly locked off from extension and modification, though the internal "slots" (as the spec calls them) cannot be changed or read directly from JS.

When looking up property values, if an object does not have its own property for a given key the engine will check another object called the "prototype", and return the value from the prototype (or its prototype, and so on). A prototype object and a constructor function are usually associated together, so that objects created with a constructor inherit from its associated prototype. By convention (though this is not required), prototypes point back to their constructors too.
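A tiny example of the lookup, using a made-up Turtle constructor:

```javascript
function Turtle(name) {
  this.name = name; // own property, stored on the object itself
}
Turtle.prototype.greet = function () {
  return 'I am ' + this.name;
};

const t = new Turtle('Leonardo');

// `name` is an own property; `greet` is found on the prototype.
const ownName = Object.prototype.hasOwnProperty.call(t, 'name');   // true
const ownGreet = Object.prototype.hasOwnProperty.call(t, 'greet'); // false
const greeting = t.greet();                                        // 'I am Leonardo'

// The constructor and its prototype point at each other by convention.
const linked = Object.getPrototypeOf(t) === Turtle.prototype &&
  Turtle.prototype.constructor === Turtle;                         // true
```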

In theory you can replace any (?) global value and even change prototype chains, even to the point of making your own custom constructors and prototype chains to replace the regular ones -- but it's important to note that native code in the browser engine may not respect your custom objects the way you think!

For instance when a string primitive is coerced to an object, it will have the "intrinsic" %StringPrototype% object as its prototype -- it will not necessarily have the current value of String.prototype as its prototype!

This means roughly that while you can replace global constructors with custom functions, and you can replace the properties and methods of global prototypes with custom ones, you can't necessarily replace the actual prototypes themselves.
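The difference can be seen directly. This sketch temporarily replaces the global String and one prototype method, then restores both; FakeString and its `marker` property are made up for the demonstration.

```javascript
const intrinsicProto = Object.getPrototypeOf('x'); // the intrinsic prototype
const NativeString = globalThis.String;
const nativeTrim = intrinsicProto.trim;

// 1. Replace the global constructor with a custom function.
function FakeString() {}
FakeString.prototype.marker = true;
globalThis.String = FakeString;

// Primitives still coerce to the intrinsic prototype, not FakeString.prototype.
const sawMarker = 'abc'.marker === true;                            // false
const sameProto = Object.getPrototypeOf('abc') === intrinsicProto;  // true

// 2. Replacing a method on the intrinsic prototype does take effect.
intrinsicProto.trim = function () { return 'patched'; };
const patched = '  abc  '.trim();                                   // 'patched'

// Undo the changes so the rest of the program sees the normal globals.
intrinsicProto.trim = nativeTrim;
globalThis.String = NativeString;
```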

As long as you're running in a separate "realm" from the host -- as we already are within the sandboxed <iframe> and/or Worker -- then this should be sufficient for fixing a lot of potential funky behavior.

One possible thing is replacing the ArrayBuffer and TypedArray constructors and accessor methods to track how much memory is being allocated. (Warning: there's no way to track frees, only allocations, because JS has no finalizers!) This could help with the poor behavior of Firefox and Chrome with respect to ArrayBuffer-backed memory allocations -- they don't seem to limit them, and a page can allocate memory until the machine OOMs as far as I can tell.
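For the ArrayBuffer half, a wrapper might look like the sketch below. The `scope` and `limitBytes` parameters are assumptions: `scope` would be the guest's global object and `limitBytes` a budget chosen by the embedder. Typed array constructors such as new Uint8Array(n) allocate their buffer internally without consulting the global ArrayBuffer, so they would need equivalent wrappers, which are not shown.

```javascript
// Wrap `scope.ArrayBuffer` so that every direct allocation is counted.
function installAllocationTracker(scope, limitBytes) {
  const NativeArrayBuffer = scope.ArrayBuffer;
  let allocated = 0;

  class TrackedArrayBuffer extends NativeArrayBuffer {
    constructor(length = 0, ...rest) {
      const size = Number(length) || 0;
      if (allocated + size > limitBytes) {
        throw new RangeError('allocation budget exceeded');
      }
      super(length, ...rest);
      allocated += size; // frees are invisible, so this only ever grows
    }
  }

  scope.ArrayBuffer = TrackedArrayBuffer;
  return { allocatedBytes: () => allocated };
}
```

Since the counter never goes down, the budget is effectively a lifetime total for the sandbox rather than a live-memory limit.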

One could also simply clean up the global namespace to present a more consistent interface between browsers, if desired...

Appendix 3: alternate scripting engines

I did a couple of research spikes on custom-built scripting engines and how that might work, with potential for additional security checks beyond what the browser restrictions do.

In mid-2018 I examined implementation of a JS-like runtime in WebAssembly, confirming that NaN-boxing works for a compact value representation, and that a manually-maintained garbage collection root stack for locals and temporaries can function for automatic memory management within the WebAssembly linear memory region (which itself can be strictly limited). While the basics are sound, it would be a lot of work to implement a mostly spec-compliant JavaScript runtime and then maintain it in production.

In January 2019 I looked at a JavaScript-in-JavaScript implementation, investigating both interpreter and transpiled modes. This allows you to leverage the native JavaScript engine for primitives, objects, strings, arrays, GC etc but makes it nearly impossible to limit memory usage. It also becomes more fraught with danger the closer you get to native-like transpiled code, meaning it can't be relied upon to enforce safety or memory limits.

I think the JS-in-JS case is not worth trying to implement fully, though JS-based interpreters inside the sandbox for custom languages in non-performance-critical applications would be fine -- for instance an interactive Logo interpreter I want to do as a widget demo.

The JS-in-WebAssembly case is more interesting because it can enforce both safety and memory limits; and as long as kept in a Worker thread, execution time limits can be enforced by the host. I plan to do some proof of concept poking on the side, merging ideas from the two projects into the Wasm version, but don't expect to be able to bring it to production quality without a lot more investment.

If it ever appears worth it, we can try to get more dev support; if it's not worth it, it'll stay a research project.