Technical decision making/Decision records/Authority

Decision made in February of 2021.

What are your constraints?

edit

Many times we have implicit constraints, based on time, resources, performance, security and other aspects. This is a place to make them explicit and share with your team and stakeholders.

General Assumptions and Requirements

Use a new line for each assumption or requirement which you are using to constrain your proposed solutions.

Source

The person or document that this requirement comes from.

Decoupling code from global state is valuable Established best practice
Encapsulation and separation of concerns is valuable Established best practice
Migration to the new system needs to follow the Stable Interface Policy Stable Interface Policy
Disruption of existing code should be minimized, migration should be straight forward PET
Security Requirements

Describe any security limitations or constraints for the proposed solutions.

Permission checks performed via the new system should be functionality equivalent to checks made through the old system. The semantics of individual permissions remains unchanged. PET
The chance of mistakenly reducing rigor of permission checks should be minimized. PET
Privacy Requirements

Describe potential privacy limitations or constraints for the proposed solutions and how they will be mitigated.

n/a

Important Questions

edit

Fill out any questions you should answer or follow up on before making a decision. Record those answers here.

Question Who can answer? Resolution, answer or action
Does the proposed design cover existing and anticipated use cases that stakeholders have? Engineers writing code that performs permission checks in PHP. All use cases covered. See the consultation section for details
Is the proposed design easy to use in the context of stakeholder’s use cases? Or does it add complexity? Engineers writing code that performs permission checks in PHP. No concerns raised. Use cases are expected to benefit from reduced complexity.

Decision

edit

The table captures your decision as well as important information documentation, context, and support materials. Fill this in after you list out options on the following pages.

Selected Option Option 2 (see below)
Rationale Previously considered options did not provide the desired encapsulation:
  • PermissionManager (option 1) used global state to access information about the current request and session.
  • ActinUser (option 3) does not provide a conceptual and type safe separation of a user that is on record of having done something, and an authority that is currently doing something.
  • Moving to Authority (option 2) will cover ground that ActinUser would not: it allows us to remove dependencies to the old User class right away. With ActingUser, this would have to be done in a later step, duplicating effort. It’s possible that introducing ActingUser would make that step harder.
Data
  • A year or so worth of discussions on T231930 leading up to the current design.
  • Patches linked to T261963 and T231930 and relevant subtasks indicate that the approach can be implemented cleanly and reduces complexity in code that uses it.
Informing
  • Consultation and design took place for over a year on T231930, guided by TechCom, before the Tech Forum was in place.
  • PET has been sending emails about this change to tech-all and wikitech-l
  • PET presented the outcome of the spike to a stakeholders (Video)
  • Notification of deprecation and removal of obsolete code will follow the Deprecation Process defined in the Stable Interface Policy.
Who Daniel Kinzler (PET)
Date February 18 2021

Technical Forum Chair review

edit

This is to be filled out by the Technical Forum Chairs after the decision is made. These questions are designed to help the chairs give feedback to the decision team to help them make an informed decision.

Are the options detailed enough to make an informed decision?
Were all parties identified in the RACI consulted?
Does this decision require C-level review? Why or why not?
Product Chair The name of the Product Department Chair reviewing
Technology Chair The name of the Technology Department Chair reviewing
Date When was this presented in the Technical Forum

What are your options?

edit

On the following pages, list at least three options to address your problem. Try to keep the description clear and use language that can be easily understood. The first option should always be the “nothing” option. This helps you properly evaluate the pros/cons/costs of the status quo as you evaluate potential solutions. If you need to dig deeper into options, technology, etc… link to them in the references.

Option 1: Do Nothing
Description Keep using PermissionManager together with User to perform permission checks. Keep rate limiting separate.
Benefits No direct engineering cost.
Risks Global state makes the code hard to test. Use of stateful “monster” objects make the code unpredictable and hard to understand.
Effort none
Costs none
Testing Test coverage of existing code is incomplete.
Performance & Scaling The current solution has been optimized to avoid performance impact on high traffic code paths. It is in production now.
Deployment n/a
Rollback and reversibility n/a
Operations & Monitoring n/a
Additional References n/a
Consultations
Timo In 2019, Timo said that not only should we not rely on global state to access the current web request: information from the web request should not be available in service wiring either (see T218555#5438529). Instead, they should generally be passed from the entry point as a parameter. This led us to dismiss an earlier proposal to keep building on PermissionManager, and further to the proposal of ActingUser (option 3) and later Authority (option 2).
Product Infra In 2018, Gergö identified a number of problems with the existing system in T212639. This greatly influenced the design of Authority (option 2).
AHT When AHT was working on PermissionManager in 2018, David Barratt identified problems regarding the way the block system interacts with the permission system in T212311. This influenced the design of Authority (option 2).
Option 2: Introduce Authority
Description
  • Model Authority as an object that encapsulates the actor’s identity, request state, relevant configuration and the relevant user account.
  • Have the existing User class implement the new interface.
  • Modify methods that currently require a User to be provided for permission checks to accept an Authority object instead.
  • Provide Authority objects in all contexts that currently provide access to the User object representing the User performing the current request.
Benefits
  • Removes the need to access global state to perform permission checks
  • Provides explicit modeling of the idea of an authority beyond a user an the groups they belong to (considering e.g. oauth grants)
  • Provide a clear migration path that allows backwards compatibility to be retained for a deprecation period.
  • Builds on the existing mechanism for providing code with the means to perform permission checks, namely by passing a user.
  • Allows for alternative implementations of a simple interface to be used in special situations such as maintenance scripts and async jobs.
  • E.g. it is easy to implement an Authority that automatically has all permissions, or one that only has a restricted set of permissions.
  • We automatically get rid of dependencies on the heavy weight User class, removing all the cross-dependencies it comes with.
  • This brings us a big step closer to a future in which we are migrating away from the old User objects towards simple value objects.
Risks As with any refactoring, there is a risk of oversights and misunderstandings. We tried to reduce this risk by keeping the new interface as close to the existing interface as possible, allowing it to act as a drop-in replacement under most circumstances.
Effort The cost for putting the new system into place has already been covered in the experimental spike. It turned out to be about four weeks of work for two FTE.

The cost for converting most of core and extensions maintained by WMF to the new approach will likely be roughly that much again.

The cost of converting the long tail of third party extensions is hard to estimate.

It should be mentioned that these conversions are typically trivial changes.

Costs There is no direct cost beyond the engineering effort.
Testing All new code added will have full test coverage.

Test coverage of code affected by this test varies greatly.

The most crucial functionality is covered by end-to-end tests.

Performance & Scaling The refactoring is designed to retain the scaling characteristics of the existing implementation.
Deployment The code changes will be deployed iteratively with the weekly deployment “train”.
Rollback and reversibility As with any refactoring, this is reversible, but the cost of reversing is relative to the number of places in the code that bind to the new interface. For Authority, that number is rather high.

To mitigate the risk of having to roll back, we implemented the new functionality and changed a limited scope of code to use it, to verify the design and gather experience. Converting more parts of the code base will be done in an iterative manner, allowing us to roll back each step individually.  

Operations & Monitoring There are no additional operational requirements.

The monitoring in place for the existing system will cover the new system as well.

Additional References T231930

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/655438

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/654965

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/654967

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/654969

Consultations
Wikidata One concern addressed: MCR slots are not modeled as separate “targets” of actions, targets are always pages; instead, operations on some slots may be modeled as special actions requiring additional permissions.
Performance No concerns raised regarding performance. Timo expressed a preference for this option over the others presented here (PermissionManager and ActingUser)  
Security Conceptually, the Security Team does not have any issues with the proposed Authority system and believes it should make many of the convoluted permission and security checks within MediaWiki more sane and sustainable.  One potential issue that the Security Team sees somewhat often is developers simply not knowing how or when to check various rights, etc. within MediaWiki but that is more of a social/documentation issue unrelated to Authority.
Multimedia Same question about MCR slots as from the Wikidata team.
Tech Engagement Birgit asked whether this would have any impact on people writing user scripts, gadgets, or bots. It does not.

It does have an impact on third party extension developers.

Abstract Wikipedia This change is a good one, and we support it. We have a minor concern of potentially wanting more fine-grained targets for actions in the future (more than at the page or MCR slot level), but this can be a later iteration of the new system.
Option 3: Introduce ActingUser
Description
  • Create a subclass of User that represents the currently acting user
  • Make the acting user expose additional information that may be relevant for permission checks, such as the current request
  • Make PermissionManager check if the user passed to it is an acting user, and make use of the additional context provided.
Benefits
  • Most code can be oblivious of the change. Only PermissionManager and code that constructs User objects need to be aware.
Risks
  • We continue to bind to the heavy weight User class, passing on the opportunity to remove all the cross-dependencies it comes with. This will have to be addressed in a separate refactoring, that would come with all the cost associated with option 2.
  • It is unclear how this approach would function in a future in which we are migrating away from the old User objects towards simple value objects.
  • We bind to the idea that permission checks are bound to the Request, not to some other factor
  • This makes it awkward to implement permission checks for different circumstances.
  • E.g. there is no way to implement an ActingUser that automatically has all permissions, or one that only has a restricted set of permissions.
Effort
  • about six weeks of work for two FTE.
Costs There is no direct cost beyond the engineering effort.
Testing All new code added will have full test coverage.

Test coverage of code affected by this test varies greatly.

The most crucial functionality is covered by end-to-end tests.

Performance & Scaling The refactoring is designed to retain the scaling characteristics of the existing implementation.
Deployment The code changes will be deployed iteratively with the weekly deployment “train”.
Rollback and reversibility Since only a limited number of classes are affected, this change is easy to undo.
Operations & Monitoring There are no additional operational requirements.

The monitoring in place for the existing system will cover the new system as well.

Additional References T231930 has a section titled “Original Proposal (ActingUser)”, and some discussion about this proposal from 2019.
Consultations
Phab discussion Starting with the comment at T231930#5462972 in 2019, several collaborators analyzed and discussed the proposed solution (ActingUser), identified weaknesses, and identified improvements. This discussion led to the proposal of Authority (option 2). The collaborators involved were Brad, Daniel, and Simetrical (PET); David Barratt (AHT), Gergö (Product Infra), and Máté (Fandom).

Resource:

https://www.atlassian.com/blog/inside-atlassian/make-team-decisions-without-killing-momentum