Requests for comment/JSON validation

This request for comments discusses JSON validation.

Request for comment (RFC): JSON validation
Component: General
Creation date:
Author(s): Legoktm, James Hare
Document status: in draft
See Phabricator.

Background

As we increasingly store content in JSON, usually via ContentHandler, there is a need for a standardized JSON validation system that meets the requirements of Wikimedia projects. We currently have two different solutions deployed to the Wikimedia cluster: the JsonSchema class bundled in the EventLogging extension, and the json-schema library that is a development dependency of MediaWiki core and included in the mediawiki/vendor repository.

Problem

We are using two validators in production, neither of which fully meets our needs. Each has something the other lacks: the EventLogging schema validator supports localizable error messages but does not support JSON Schema v4 features (test suite), while the json-schema library included in core supports v4 but has no localizable error messages. The problem will grow as more extensions make use of ContentHandler functionality: each one has to trade sophisticated schemas against multilingual support, which leads to inconsistent practice.

Proposal

Based on an agreed-upon set of requirements, we should adopt a schema validator and implement it in MediaWiki core.

Requirements

These are the current proposed requirements:

  • Support for localized error messages (i18n)
  • Compliance with JSON Schema Version 4, maybe version 3 as well if there are compatibility issues
    • Who needs this and why? Is v4 a draft spec? When will it be finalized and are people expected to support it before finalization?
      • For example, the extension.json schema uses features like "anyOf". Yes, it's a draft spec, and there are already implementations that support the draft spec. Legoktm (talk) 05:01, 3 October 2016 (UTC)
  • Direct support for validating PHP objects/arrays, so they don't need to be converted into JSON first
    • Why? Is PHP special?
      • The main purpose is that we only parse the JSON once, when it is converted into a PHP object/array. If it is valid JSON, it gets passed to the validator, which checks the properties/values/etc. We don't want to have to reparse the JSON, so the validator should be able to use the PHP object/array that we already parsed. Legoktm (talk) 05:01, 3 October 2016 (UTC)
      • How about passing in the JSON string and validating that? It would make the validator language-agnostic, wouldn't it? Mobrovac-WMF (talk) 14:53, 3 October 2016 (UTC)
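
The "parse once, then validate the decoded structure" requirement discussed above can be illustrated with a small language-neutral sketch. It is written in Python using only the standard library, and it is not the API of any of the validators under discussion. The `validate` function, the `_TYPES` table, and the example schema are all invented for illustration, and only a handful of keywords (`type`, `required`, `properties`, `anyOf`) are handled.

```python
import json

# Hypothetical mapping from JSON Schema type names to Python types.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "null": type(None),
}


def validate(value, schema, path="$"):
    """Return a list of (path, problem) tuples; an empty list means valid."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, so exclude it explicitly.
        if expected in ("integer", "number") and isinstance(value, bool):
            ok = False
        if not ok:
            return [(path, "expected " + expected)]
    if "anyOf" in schema:
        # The value passes if at least one alternative reports no errors.
        if not any(not validate(value, sub, path) for sub in schema["anyOf"]):
            errors.append((path, "matched none of anyOf"))
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append((path + "." + name, "missing required property"))
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub, path + "." + name))
    return errors


# Invented schema, loosely in the spirit of a configuration file.
schema = {
    "type": "object",
    "required": ["name"],
    "properties": {
        "name": {"type": "string"},
        "license": {"anyOf": [{"type": "string"}, {"type": "array"}]},
    },
}

# The text is parsed exactly once; the validator only sees the decoded value.
decoded = json.loads('{"name": "Example", "license": 42}')
errors = validate(decoded, schema)
```

Because `validate` takes the decoded value rather than a string, a caller that has already parsed the content can reuse the result. A string-based entry point, as suggested in the last comment, could be a thin wrapper that calls the parser and then this function.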

Options

JsonSchema
  • Author: RobLa
  • MediaWiki usage: Bundled inside EventLogging extension
  • Pros:
    • Localized errors
    • Already deployed/passed security review
  • Cons:
    • Not v4 compliant (not sure about v3)
    • Missing functionality like multiple types, max/min
    • Needs to be moved out of EventLogging and published as a composer package (easy)

json-schema
  • Author: Justin Rainbow
  • Link: https://github.com/justinrainbow/json-schema
  • Pros:
    • v3 and v4 schema compliance
    • Already deployed/passed security review
  • Cons:
    • No i18n

jsonguard
  • Author: PHP League
  • Link: https://github.com/thephpleague/json-guard/
  • MediaWiki usage: Not used AFAIK. Wasn't considered during extension.json implementation due to PHP 5.5 requirement
  • Pros:
    • v4 schema compliance
    • Unique error codes, making i18n implementation easier
  • Cons:
    • No i18n
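
The jsonguard entry notes that unique error codes make an i18n implementation easier. A rough sketch of why, in Python with only the standard library: if the validator reports a stable code plus parameters instead of a finished English sentence, a separate layer can look the code up in a per-language catalog. The codes, the `MESSAGES` catalog, and the `localize` function below are all hypothetical and do not come from any of the libraries listed; the `$1`/`$2` placeholders merely imitate the style of MediaWiki message parameters.

```python
# Hypothetical message catalog keyed by language, then by error code.
MESSAGES = {
    "en": {
        "missing-required": "Property $1 is required.",
        "wrong-type": "Property $1 must be of type $2.",
    },
    "de": {
        "missing-required": "Die Eigenschaft $1 ist erforderlich.",
    },
}


def localize(error, lang, fallback="en"):
    """Render a (code, params) error in `lang`, falling back per message."""
    code, params = error
    template = MESSAGES.get(lang, {}).get(code)
    if template is None:
        # No translation for this code: use the fallback language, or the
        # bare code if the fallback catalog does not know it either.
        template = MESSAGES[fallback].get(code, code)
    # Substitute higher-numbered placeholders first so $1 cannot clobber $10.
    for index in range(len(params), 0, -1):
        template = template.replace("$" + str(index), str(params[index - 1]))
    return template


# Errors as a validator with unique codes might report them (assumed shape).
errors = [("missing-required", ["name"]), ("wrong-type", ["version", "string"])]
rendered = [localize(e, "de") for e in errors]
```

With this split, adding a language means adding catalog entries only; the validator itself never needs to know which languages exist.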