Core Platform Team/Initiatives/Unify Parsers-Phase 1

Initiative Description

Summary

Parsoid is a Node.js codebase. This project aims to (a) port Parsoid to PHP, (b) integrate it with MediaWiki core, and (c) deploy Parsoid/PHP on the Wikimedia cluster and switch all clients over to Parsoid/PHP.

Significance and Motivation

The larger project is to make Parsoid the default parser for MediaWiki, starting with the Wikimedia cluster wikis. The simplest and shortest path to that goal is to port Parsoid to PHP.

The reasoning for this is covered in Parsing/Notes/Moving Parsoid Into Core#Why move Parsoid into Core? and in the Tech Talk about making Parsoid the default MediaWiki parser (see Documentation Links below).

In short, porting lets us (a) fix the architectural complaints about Parsoid as a standalone service, (b) leverage code from the MediaWiki core codebase to bring Parsoid and the legacy PHP parser closer together, (c) provide simpler installation options for non-Wikimedia wikis while offering VisualEditor and Wikitext Linting out of the box, and (d) remove some of the async-related complexity from the codebase.

Outcomes

Reduce complexity in core

This is one step in a larger project; as such, the metric is about completing the porting process so we can move on to the next phase of development. The best measure of completion is that no clients are using the JS version of Parsoid. The next phase will focus on the ultimate goal of moving to a single parser.

Baseline Metrics
  • Percentage of clients using Parsoid/PHP: 0%
Target Metrics
  • Percentage of clients using Parsoid/PHP: 100%
Stakeholders
  • Client teams: Web, VE, CX, Android, Growth (for Flow)
  • Editing community
  • Core Platform
Known Dependencies/Blockers

Build new HTTP API

Epics, User Stories, and Requirements

  • Prototyping: Early experiments with porting to evaluate the feasibility of the port, potential performance issues, anticipated roadblocks, and expected difficulty. (Status: Done)
  • Preparation: Fix up the JS codebase to smooth and simplify the porting process; this might include some significant code refactoring. (Status: Done)
  • Porting: Port the Parsoid codebase to PHP, including building interfaces to integrate Parsoid into MediaWiki core. (Status: In progress)
  • Testing & QA: Rigorous testing and performance tuning to establish the production readiness of Parsoid/PHP. (Status: In progress)
  • Switchover: Switch existing clients to use Parsoid/PHP.
  • Switch off Parsoid/JS

Time and Resource Estimates

Estimated Start Date

Started in FY1819 Q2

Actual Start Date

None given

Estimated Completion Date

None given

Actual Completion Date

None given

Resource Estimates

9-12 months; completion is expected near the end of FY1819 Q4 or early FY1920 Q1

3.5 FTE for the duration

Augmenting with 2+ FTE for 3 months

0.5 FTE engineering and project management for the duration

Collaborators
  • Parsing Team
  • Core Platform
  • Performance
  • SRE


Documentation Links

Phabricator

https://phabricator.wikimedia.org/tag/parsoid-php/

We are not tracking the porting of individual files in Phabricator. The Parsoid-PHP Phabricator board tracks everything other than the mechanical porting of individual files in the codebase.

Plans/RFCs

The Long And Winding Road To Making Parsoid The Default MediaWiki Parser (slides, video)

Other Documents

Parsing/Notes/Moving Parsoid Into Core