Wikimedia services policy
This document deals with principles (for example, an application must be observable and expose appropriate metrics), not with implementation rules.
Practical implementation guidelines will be written to detail how the principles enumerated in this document are to be applied in technical terms (e.g. the application must expose RED metrics from a /metrics endpoint in Prometheus format, with a precise naming convention).
The two parts are split because we don't expect the principles to change much over time, whereas we do expect the implementation guidelines to change more quickly as technology evolves.
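As an illustration of the kind of detail an implementation guideline might pin down, the sketch below renders RED (rate, errors, duration) counters in the Prometheus text exposition format, as a /metrics endpoint would return them. The `RedMetrics` class, the `example_service` prefix and the metric names are hypothetical and do not reflect the actual WMF naming convention. Duration is reduced to a running sum to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class RedMetrics:
    """Hypothetical in-memory RED counters for a single service."""

    requests_total: int = 0
    errors_total: int = 0
    duration_seconds_sum: float = 0.0

    def observe(self, duration_seconds: float, error: bool = False) -> None:
        """Record one handled request."""
        self.requests_total += 1
        if error:
            self.errors_total += 1
        self.duration_seconds_sum += duration_seconds

    def render(self, prefix: str = "example_service") -> str:
        """Render the counters in the Prometheus text exposition format."""
        rows = (
            ("requests_total", "Requests handled.", self.requests_total),
            ("errors_total", "Requests that failed.", self.errors_total),
            (
                "request_duration_seconds_sum",
                "Total time spent handling requests.",
                self.duration_seconds_sum,
            ),
        )
        lines = []
        for name, help_text, value in rows:
            lines.append(f"# HELP {prefix}_{name} {help_text}")
            lines.append(f"# TYPE {prefix}_{name} counter")
            lines.append(f"{prefix}_{name} {value}")
        return "\n".join(lines) + "\n"
```

A real service would more likely rely on an existing Prometheus client library and expose durations as a histogram; the point here is only that the principle (be observable) stays stable while the format and names belong to the guidelines.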
There are various aspects to the development and usage cycle of a new service, and several of them need to be as standardized as possible across the board so that the complexity of our ecosystem does not become unmaintainable. In general, adopting a non-monolithic architecture has its costs, and those costs stay manageable only if standards are maintained regarding how different applications interoperate and how they are developed.
Several aspects of the development of a service must be taken into account:
- Development policies
- Security / privacy constraints
- Production deployment
In the next few sections, we will analyze the criteria that a new service must meet for each of these categories.
Development policies
Everything we develop should be free, open to collaboration, and useful in its own right. A new service must therefore:
- Actually do something
- Be created only if there is no well-crafted, well-maintained, architecturally compatible FLOSS project that provides comparable functionality and that can be adopted and improved/modified if needed.
- Avoid needlessly duplicating features or functionality provided in other services
- Be licensed under an OSI-approved license
- Provide a configuration mechanism that does not involve changing the distributed code
- Use a language and toolset that have been approved by TechCom
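One common way to meet the configuration requirement in the list above is to read settings from the environment, with defaults in the code. The following is a minimal sketch under that assumption; the `ServiceConfig` fields and the `EXAMPLE_*` variable names are hypothetical, and the actual mechanism is left to the implementation guidelines.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServiceConfig:
    """Hypothetical settings for an example service."""

    listen_port: int
    mediawiki_api_url: str
    log_level: str


def load_config(env: Optional[Mapping[str, str]] = None) -> ServiceConfig:
    """Build the configuration from environment variables.

    Nothing in the distributed code has to be edited: operators override
    the defaults by setting EXAMPLE_* variables (names are assumptions).
    """
    env = os.environ if env is None else env
    return ServiceConfig(
        listen_port=int(env.get("EXAMPLE_LISTEN_PORT", "8080")),
        mediawiki_api_url=env.get(
            "EXAMPLE_MEDIAWIKI_API_URL", "http://localhost/w/api.php"
        ),
        log_level=env.get("EXAMPLE_LOG_LEVEL", "INFO"),
    )
```

Passing a mapping explicitly, as in `load_config({"EXAMPLE_LISTEN_PORT": "9000"})`, makes the loader easy to exercise in tests without touching the real environment.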
While some of our services will only be useful in the WMF context, in other cases the standalone service is intended to be distributed for general use. In that case, it must:
- Have a documented installation and uninstallation process that conforms to our implementation guidelines
- Have a documented upgrade process that conforms to our implementation guidelines
- Be versioned using semver
- Indicate versions of MediaWiki with which it's compatible
- Provide a mechanism by which support (community or otherwise) can be requested
- Provide a mechanism by which patches can be proposed
- Provide a mechanism by which public security advisories are issued
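To illustrate what versioning with semver gives to consumers of a distributed service, here is a minimal sketch that parses MAJOR.MINOR.PATCH versions and flags upgrades that cross a major version. It deliberately rejects pre-release and build-metadata suffixes, which full semver allows; the function names are hypothetical.

```python
import re
from typing import NamedTuple

# Core semver triple only: no leading zeros, no pre-release or build suffix.
_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse a MAJOR.MINOR.PATCH string into a comparable tuple."""
    match = _CORE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def is_breaking_upgrade(old: str, new: str) -> bool:
    """Return True when the upgrade crosses a major version boundary."""
    return parse(new).major > parse(old).major
```

Because `Version` is a tuple, versions compare numerically field by field, so `parse("1.10.0") > parse("1.9.3")` holds even though a plain string comparison would say otherwise.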
Security and privacy
Any functionality implemented as a standalone service must:
- Minimize data collection for any type of PII
- Be compliant with the WMF privacy/data retention policies.
- Implement privacy controls that are at least equivalent to those of any calling service. For example, if the privacy controls of the calling service specify that IP addresses will not be stored for more than 90 days, the external service may not store IP addresses for longer than that time.
- Have passed a Security review
- Have resources allocated so that a prompt response to any security incident is possible
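The retention example in the list above (a calling service that keeps IP addresses for no more than 90 days) can be expressed as two small checks. This is a sketch under assumed names; how retention is actually configured and enforced is outside the scope of this document.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def retention_is_compatible(
    service_retention: timedelta, caller_retention: timedelta
) -> bool:
    """A service may keep data at most as long as the service calling it."""
    return service_retention <= caller_retention


def is_expired(
    stored_at: datetime, retention: timedelta, now: Optional[datetime] = None
) -> bool:
    """Return True once a record is older than the retention period."""
    now = datetime.now(timezone.utc) if now is None else now
    return now - stored_at > retention
```

For instance, a service configured with `timedelta(days=120)` would fail `retention_is_compatible` against a caller limited to `timedelta(days=90)`, signalling that its privacy controls are weaker than the caller's.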
Production deployment
If the standalone service is intended to be used in the Wikimedia production environment, it should comply with the guidelines above, and in addition must:
- Be deployable with standard WMF tooling (as specified in the implementation guidelines).
- Have an owner, and a plan for ongoing maintenance. If the owner of a service is missing (because the team has been disbanded or has a different focus), a new owner must be found via the code stewardship process.
- Have logging that conforms to the WMF standards specified in the implementation guidelines.
- Be able to collect and expose operational metrics according to the current WMF standards specified in the implementation guidelines.
- Have a runbook for operational purposes.
- Support a multi-datacenter active-active (or active-passive) deployment.
- Have Service Level Indicators defined for the service; Service Level Objectives should also be agreed upon. Failure to meet those Service Level Objectives SHOULD result in action to bring the service back within the agreed-upon Service Level Objectives. The Service Level Objectives can of course be reevaluated and changed, but preferably through an informed process rather than as a reaction to a violation.
- Have pinned / pinnable dependencies that don't need to be downloaded at runtime and/or from untrusted sources.
- Have backups and a restoration/emergency plan (if the service stores any data).
- Have users, or a plan to acquire users
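As a rough illustration of the Service Level Indicator / Service Level Objective item in the list above, the sketch below treats availability (the share of successful requests) as the indicator and compares it with a target. The choice of indicator, the 99.9% target used in the usage note and the "error budget" framing are assumptions borrowed from common SRE practice, not something this policy prescribes.

```python
def availability(good_events: int, total_events: int) -> float:
    """Service Level Indicator: fraction of events that were successful."""
    if total_events == 0:
        return 1.0  # no traffic means nothing failed
    return good_events / total_events


def meets_slo(good_events: int, total_events: int, slo_target: float) -> bool:
    """Return True when the indicator is at or above the objective."""
    return availability(good_events, total_events) >= slo_target


def error_budget_remaining(
    good_events: int, total_events: int, slo_target: float
) -> float:
    """Fraction of the allowed failures not yet used (negative if overspent)."""
    allowed_failures = (1.0 - slo_target) * total_events
    actual_failures = total_events - good_events
    if allowed_failures == 0:
        return 1.0 if actual_failures == 0 else 0.0
    return 1.0 - actual_failures / allowed_failures
```

With a target of 0.999 and 100,000 requests, 50 failures still satisfy `meets_slo` and leave about half of the budget, while 200 failures violate the objective, which is the situation in which the policy expects corrective action.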
Service-to-service interaction
Services will likely interact with each other; if that is the case, measures must be taken so that the failure of a single component does not bring down the whole system. Observability across the request flow also needs to be improved. Any new service that is to be deployed in production must therefore:
- Gracefully degrade its functionality if it can't access another service. If that's not possible, the new service should perhaps be logically tied to the other one. An exception is explicitly made for the MediaWiki API, given that quite a few services might depend on its availability to be useful.
- Be able to perform requests to a specific hostname/ip provided via configuration.
- Be able to use infrastructure middleware for inter-service communication functionality including, but not limited to, encryption and circuit-breaking. Alternatively, the service SHOULD implement that functionality internally.
- Add the appropriate tracing headers to the request, according to the WMF standards specified in the implementation guidelines.
- Log actions via the production logging facilities.
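Several of the requirements above can be seen together in a small sketch: the dependency's address comes from configuration, a tracing header is propagated on the outgoing request, failures are logged, and the caller degrades instead of failing. The header name `X-Request-Id`, the `/summary/` path and all function names are hypothetical; the real header names are defined by the implementation guidelines, and circuit-breaking is left out here.

```python
import logging
import uuid
from typing import Callable, Dict, Mapping, Optional
from urllib.parse import quote

TRACE_HEADER = "X-Request-Id"  # assumed name, not the documented WMF header
HttpGet = Callable[[str, Dict[str, str]], str]

log = logging.getLogger("example_service")


def outgoing_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Propagate the caller's trace ID, or start a new trace."""
    trace_id = incoming.get(TRACE_HEADER) or str(uuid.uuid4())
    return {TRACE_HEADER: trace_id}


def fetch_summary(
    title: str, base_url: str, incoming: Mapping[str, str], http_get: HttpGet
) -> Optional[str]:
    """Ask the (hypothetical) summary service; return None if it is down."""
    url = f"{base_url}/summary/{quote(title, safe='')}"
    try:
        return http_get(url, outgoing_headers(incoming))
    except OSError as exc:  # covers ConnectionError and TimeoutError
        log.warning("summary service unavailable: %s", exc)
        return None


def render_page(
    title: str, base_url: str, incoming: Mapping[str, str], http_get: HttpGet
) -> str:
    """Render a page, degrading gracefully when the summary is missing."""
    summary = fetch_summary(title, base_url, incoming, http_get)
    if summary is None:
        return f"{title}\n(summary temporarily unavailable)"
    return f"{title}\n{summary}"
```

Here `base_url` stands for the hostname provided via configuration, and `http_get` is injected so that any HTTP client, or infrastructure middleware in front of it, can be plugged in.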
This policy was established in March 2019 by RFC T208524.