Core Platform Team/Initiatives/Enable Multi-DC Session Storage
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up to date. The Core Platform Team and its initiatives no longer exist; since 2023, see the MediaWiki Engineering Group instead.
Enable Multi-DC Session Storage
Initiative Description
- Summary
Develop a multi-master replicated key-value storage service whose semantics permit session access from MediaWiki in an active-active, multi-datacenter configuration. Secondarily, the service decouples MediaWiki from storage, creating additional isolation of sensitive data. (A minimal sketch of the service interface appears at the end of this section.)
- Significance and Motivation
This is a blocker for an active-active data-center configuration: it enables multi-data-center session access and makes the system more fault tolerant and resilient. Secondarily, it isolates session data.
- Outcomes
- Baseline Metrics
- Sessions are accessed from 1 Data Center
- Target Metrics
- Sessions can be accessed from 2 Data Centers
- Stakeholders
- SRE
- Performance
- Known Dependencies/Blockers
- Setup Kubernetes security zone (SRE)
- Security review (Security - 30 day lead time)
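The summary above calls for a thin HTTP service exposing read, write, and delete operations on opaque session keys. As a rough illustration only, the Go sketch below shows what such an interface might look like; the /sessions/v1/{key} path layout, the port, and the in-memory map (standing in for the replicated Cassandra backend) are assumptions made here for brevity, not a description of the actual service.

```go
package main

import (
	"io"
	"log"
	"net/http"
	"strings"
	"sync"
)

// store is an in-memory stand-in for the replicated Cassandra backend.
type store struct {
	mu   sync.RWMutex
	data map[string][]byte
}

func (s *store) handle(w http.ResponseWriter, r *http.Request) {
	// The key is whatever follows the service prefix, e.g. /sessions/v1/{key}.
	key := strings.TrimPrefix(r.URL.Path, "/sessions/v1/")
	if key == "" {
		http.Error(w, "missing key", http.StatusBadRequest)
		return
	}
	switch r.Method {
	case http.MethodGet:
		s.mu.RLock()
		value, ok := s.data[key]
		s.mu.RUnlock()
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Write(value)
	case http.MethodPost:
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "unreadable body", http.StatusBadRequest)
			return
		}
		s.mu.Lock()
		s.data[key] = body
		s.mu.Unlock()
		w.WriteHeader(http.StatusCreated)
	case http.MethodDelete:
		s.mu.Lock()
		delete(s.data, key)
		s.mu.Unlock()
		w.WriteHeader(http.StatusNoContent)
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

func main() {
	s := &store{data: map[string][]byte{}}
	http.HandleFunc("/sessions/v1/", s.handle)
	log.Fatal(http.ListenAndServe(":8081", nil))
}
```

With the sketch running, `curl -X POST localhost:8081/sessions/v1/abc -d 'payload'` stores a value and `curl localhost:8081/sessions/v1/abc` reads it back.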
Epics, User Stories, and Requirements
- Hardware request and setup
- RFC for the session storage API
- Investigate the existing Redis session storage to determine whether extra work is needed
- Design implementation (storage, replication semantics, performance)
- Test and prototype in multiple languages to understand performance/latency/throughput
- Implementation
- Figure out deployment method
- CI for build testing and Docker image creation
- Cassandra cluster configuration
- Beta deployment
- Develop migration plan
- Integrate with MediaWiki
- Determine whether “Set if not exist” functionality is needed (implement if needed)
- Determine whether per-operation TTLs are needed (implement if needed; see the Cassandra sketch after this list)
- Enable functional testing (set up and tear down of Cassandra)
- Security review
- Implement service-checker functionality (endpoint monitoring)
- Figure out the Kubernetes deployment (Helm charts)
- Deploy according to migration plan (test wikis, etc…)
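Two items above (echoed in the Open Questions below) concern “set if not exist” semantics and per-operation TTLs. If Cassandra is the backing store, both map onto existing CQL features: a lightweight transaction (`INSERT … IF NOT EXISTS`) and a per-statement `USING TTL` clause. The Go sketch below is a minimal illustration, assuming a hypothetical `sessions` keyspace with a `storage (key, value)` table and the gocql driver; the keyspace, table, and contact point are illustrative, not the actual cluster configuration.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	// Hypothetical contact point and keyspace; adjust for a real deployment.
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "sessions"
	cluster.Consistency = gocql.LocalQuorum
	cluster.Timeout = 2 * time.Second

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer session.Close()

	key := "example-session-id"
	value := []byte("serialized session data")

	// "Set if not exist" as a lightweight transaction (IF NOT EXISTS),
	// combined with a per-operation TTL (USING TTL) on the same INSERT.
	var existingKey string
	var existingValue []byte
	applied, err := session.Query(
		`INSERT INTO storage (key, value) VALUES (?, ?) IF NOT EXISTS USING TTL ?`,
		key, value, 86400, // TTL in seconds, chosen per write
	).ScanCAS(&existingKey, &existingValue)
	if err != nil {
		log.Fatalf("insert: %v", err)
	}
	if !applied {
		// ScanCAS filled in the existing row; the write was not applied.
		fmt.Printf("key %q already present; existing value left untouched\n", existingKey)
	} else {
		fmt.Println("value stored with per-operation TTL")
	}
}
```

Note that `IF NOT EXISTS` is a Cassandra lightweight transaction and therefore costs extra round trips (Paxos), which is part of why whether it is actually needed remains an open question.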
Time and Resource Estimates
- Estimated Start Date
October 2018
- Actual Start Date
October 2018
- Estimated Completion Date
None given
- Actual Completion Date
None given
- Resource Estimates
- 2 FTE for 6 months (FY1819 Q2-Q3)
- 2 part time engineers for 3 months during deployment (FY1819 Q4)
- Collaborators
- Core Platform
- SRE
- Security
Open Questions
- Should CentralAuth metadata be stored in the same Kask instance or a separate one?
- Is “Set if not exist” functionality needed?
- Are per-operation TTLs needed?
Documentation Links
- Phabricator
https://phabricator.wikimedia.org/T206016 (master ticket)
- Plans/RFCs
Requests for comment/SessionStorageAPI
- Other Documents
Subpages