Requests for comment/Data mapper
Data mapper | |
---|---|
Component | General |
Creation date | |
Author(s) | Andrew Green |
Document status | declined (see Phabricator) |
This RFC proposes adding a data mapper facility to core. The aim is to promote domain-driven design and help isolate database access. The proposal seems compatible with the recently approved move to a service-oriented architecture. An implementation with tests and examples is provided.
The implementation has been merged into a WIP branch of the Campaigns extension for the Editor campaigns project.
Problem statement
How to model the things a program is "about" and how to organize code for persistent storage are central problems of software architecture. A possible approach to the first problem is domain-driven design. This approach recommends creating effective domain models that aren't overly determined by the technology they're embedded in.[1] The data mapper pattern is a persistent storage approach that may facilitate domain-driven design.[2]
As the service-oriented architecture RFC notes, a lot of MediaWiki code has wide interfaces and is tightly coupled. Dividing this complex system into narrow, independent services and APIs will improve this situation. But it's not enough. We also need a pattern for persistent storage code and a route to expressive domain models.
Adding a data mapper facility to core will provide a standard way of isolating database access and should facilitate the development of domain models. Using the same patterns and toolkit within multiple service components, as appropriate, will make it easy for developers to move from one component to another, and will help avoid duplicate efforts to solve the same problems.
Rationale (more details)
Domain-driven design, MediaWiki and data mappers
In domain-driven design, you build software around a model of what the software is “about” in the real world. According to Eric Evans:
- “The goal of domain-driven design is to create better software by focusing on a model of the domain rather than the technology.”[3]
- “[...] the software constructs of the domain layer mirror the model concepts. It is not practical to achieve that correspondence when the domain logic is mixed with other concerns of the program. Isolating the domain implementation is a prerequisite for domain-driven design.”[4]
- “The domain objects, free of the responsibility of displaying themselves, storing themselves, managing application tasks, and so forth, can be focused on expressing the domain model. This allows a model to evolve to be rich enough and clear enough to capture essential business knowledge and put it to work.”[5]
- The problem with not following this methodology is that as a system grows, “more and more domain rules become embedded in query code or simply lost.”[6] In such cases “[W]e are no longer thinking about concepts in our domain model. Our code will not be communicating about the business; it will be manipulating the technology of data retrieval.”[7]
As a large system that must support the changing needs of a multifaceted, global social movement with hundreds of thousands of participants, MediaWiki seems likely to benefit from domain-driven design.
The data mapper pattern is a type of object-relational mapping (ORM). It supports domain-driven design by pushing object-relational mapping out of the domain model, into a lower, infrastructure layer. A dedicated facility, the data mapper, "handles all of the loading and storing between the database and the Domain Model and allows both to vary independently".[8] The data mapper pattern contrasts with the active record pattern (in which domain objects have methods for inserting and updating themselves in a database).
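The contrast between the two patterns can be sketched in a few lines. This is a minimal illustration only: `FakeTable`, `ActiveRecordPerson`, `PlainPerson` and `PlainPersonMapper` are hypothetical names invented for this example, and an in-memory array stands in for the database. None of them is part of the proposed implementation.

```php
<?php
// Hypothetical in-memory "table" standing in for a database (illustration only).
class FakeTable {
    public $rows = array();

    public function insert( array $row ) {
        $this->rows[] = $row;
        // The 1-based position of the new row serves as its id.
        return count( $this->rows );
    }
}

// Active record style: the domain object knows how to store itself.
class ActiveRecordPerson {
    public $name;

    public function save( FakeTable $table ) {
        return $table->insert( array( 'person_name' => $this->name ) );
    }
}

// Data mapper style: the domain object has no storage knowledge...
class PlainPerson {
    private $name;

    public function __construct( $name ) {
        $this->name = $name;
    }

    public function getName() {
        return $this->name;
    }
}

// ...and a separate mapper translates between the object and a row.
class PlainPersonMapper {
    private $table;

    public function __construct( FakeTable $table ) {
        $this->table = $table;
    }

    public function insert( PlainPerson $person ) {
        return $this->table->insert( array( 'person_name' => $person->getName() ) );
    }
}

$table = new FakeTable();

$activeRecord = new ActiveRecordPerson();
$activeRecord->name = 'Phil';
$activeRecordId = $activeRecord->save( $table );

$mapper = new PlainPersonMapper( $table );
$mappedId = $mapper->insert( new PlainPerson( 'Jill' ) );
```

In the second style, column names appear only in the mapper, so the storage schema and the domain class can change independently of each other.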
Service orientation and domain-driven design
Service orientation and domain-driven design seem compatible. Service orientation is about building a complex application from smaller, relatively independent units of functionality that expose APIs, often over a network. Depending on how functionality is divided up, it seems there could be one or more domain models, and one or more places to use a data mapper (or some other ORM facility).
Isolating database access
MediaWiki's database classes provide abstraction of low-level database calls. But another, higher level of abstraction and isolation of persistent storage-related code is often justified. Consider, for example, ApiQueryAllUsers::execute(). This method mixes logic for API parameters together with table and field names, an SQL join, and iteration through a complex database query result to build an API result. It is coupled to the details of the API call, data storage, and API result generation. Some form of ORM could be used to separate out code that depends on data storage details; that would be a step towards greater separation of concerns.
Proposed implementation
The proposed implementation is a generic data mapping facility that is configured via a global variable and annotations in entity classes.
Setup
Let's say that you have this database table and unique index:
CREATE TABLE IF NOT EXISTS /*_*/person (
    person_id int unsigned NOT NULL PRIMARY KEY auto_increment,
    person_name varchar(255) NOT NULL,
    person_age int unsigned NOT NULL
) /*$wgDBTableOptions*/;
CREATE UNIQUE INDEX /*i*/person_name_idx ON
    /*_*/person (person_name);
Suppose you also have the following interface for objects that map to rows in that table:
interface IPerson {
    public function getId();
    public function getName();
    public function setName( $name );
    public function getAge();
    public function setAge( $age );
    public function makeNameAndAgeString();
}
Let's also say this is your implementation. (Here we've already added annotations on class variables as required by the data mapper.)
class Person implements IPerson {
    /**
     * @var int
     * @id
     */
    private $id;
    /**
     * @var string
     * @unique
     * @required
     */
    private $name;
    /**
     * @var int
     * @required
     */
    private $age;
    public function getId() {
        return $this->id;
    }
    public function getName() {
        return $this->name;
    }
    public function setName( $name ) {
        $this->name = $name;
    }
    public function getAge() {
        return $this->age;
    }
    public function setAge( $age ) {
        $this->age = $age;
    }
    public function makeNameAndAgeString() {
        return $this->name . ' (' . $this->age . ')';
    }
}
Once you have that, you just define an enum class (using TypesafeEnum) for your entity's fields, and set some values in a global variable:
class PersonField extends TypesafeEnum implements IField {
    static $ID;
    static $NAME;
    static $AGE;
}
PersonField::setUp();
$wgDBPersistence['IPerson'] = array(
    'realization' => 'Person',
    'table' => 'person',
    'column_prefix' => 'person',
    'field_class' => 'PersonField'
);
Then you're good to go!
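To give a feel for how annotations like `@id`, `@unique` and `@required` can drive a mapper, here is a minimal sketch that reads them from property doc comments with PHP's reflection API. It is an assumption about one possible approach, not a description of the proposed implementation: `AnnotatedPerson` and `readFieldAnnotations()` are hypothetical names made up for this example.

```php
<?php
// Stand-in entity carrying the same annotations as Person above.
class AnnotatedPerson {
    /**
     * @var int
     * @id
     */
    private $id;
    /**
     * @var string
     * @unique
     * @required
     */
    private $name;
    /**
     * @var int
     * @required
     */
    private $age;
}

// Hypothetical helper: returns a map of property name => list of the
// mapper-related tags (id, unique, required) found in its doc comment.
function readFieldAnnotations( $className ) {
    $result = array();
    $class = new ReflectionClass( $className );
    foreach ( $class->getProperties() as $property ) {
        $tags = array();
        // getDocComment() returns false when the property has no doc comment.
        $doc = $property->getDocComment();
        if ( $doc !== false ) {
            preg_match_all( '/@(id|unique|required)\b/', $doc, $matches );
            $tags = $matches[1];
        }
        $result[$property->getName()] = $tags;
    }
    return $result;
}

$annotations = readFieldAnnotations( 'AnnotatedPerson' );
```

Here `$annotations['name']` holds `unique` and `required`, which is the kind of information a mapper would need in order to validate an entity before writing it.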
CRUD
Here are some fun things you can do:
// Get or instantiate the persistence manager
$persistence_mgr = new DBPersistenceManager( new DBMapper() );
// Create Phil
$phil = new Person();
$phil->setName( 'Phil' );
$phil->setAge( 25 );
$persistence_mgr->queueSave( $phil );
$persistence_mgr->flush();
// Phil now has an id (see the @id annotation in Person)
$id = $phil->getId();
// Retrieve Phil
$condition = new Condition( PersonField::$NAME, Operator::$EQUALS, 'Phil' );
$retrieved_phil = $persistence_mgr->getOne( 'IPerson', $condition );
// Update Phil
$phil->setAge( 26 );
$persistence_mgr->queueSave( $phil );
$persistence_mgr->flush();
// Tell Phil he's no longer welcome in your persistence store
$persistence_mgr->queueDelete( 'IPerson', $condition );
$persistence_mgr->flush();
Wait, there's more...
$jill = new Person();
$jill->setAge( 27 );
$persistence_mgr->queueSave( $jill );
// Throws a RequiredFieldNotSetException (see the @required annotation in Person
// and the NOT NULL in the database schema)
$persistence_mgr->flush();
// Try creating another Phil
$phil2 = new Person();
$phil2->setName( 'Phil' );
$phil2->setAge( 31 );
$persistence_mgr->queueSave( $phil2 );
// Throws a MWException due to the duplicate value (see the @unique annotation in
// Person and the unique index in the database schema)
$persistence_mgr->flush();
// Hmmm, let's try that again
$persistence_mgr->queueSave( $phil2, function ( $person, $index_name ) {
    print( 'Duplicate value on ' . $index_name . '.' );
} );
// Prints 'Duplicate value on NAME.'
$persistence_mgr->flush();
// Let's say we just want to make sure there's a 32-year-old Phil in our store. We
// don't know if there's currently a Phil there or not. If there's already a Phil,
// we want to set his age, and if there's no Phil, we want to insert him.
$phil_to_ensure = new Person();
$phil_to_ensure->setName( 'Phil' );
$phil_to_ensure->setAge( 32 );
$persistence_mgr->queueUpdateOrCreate( $phil_to_ensure, array( PersonField::$NAME ) );
$persistence_mgr->flush();
// Get an array of all the people in our repository, or as many as possible, ordered
// by name. Note that this method also accepts conditions and a continue key (works
// like MW web API's continue)
$people = $persistence_mgr->get( 'IPerson', PersonField::$NAME, Order::$ASCENDING );
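The queue-then-flush calls above suggest that writes are deferred until `flush()`, and that a duplicate value either raises an exception or is passed to an optional callback. The sketch below illustrates that idea under those assumptions. `SketchStore` is a hypothetical in-memory class written for this example, with a uniqueness check on `name` only; it is not the `DBPersistenceManager` from the proposal.

```php
<?php
// Hypothetical in-memory store with deferred writes (illustration only).
class SketchStore {
    private $pending = array();
    private $rowsByName = array();

    // Queue a row; nothing is written until flush() is called.
    public function queueSave( array $row, $onDuplicate = null ) {
        $this->pending[] = array( $row, $onDuplicate );
    }

    // Perform all queued writes, enforcing uniqueness of 'name'.
    public function flush() {
        $pending = $this->pending;
        $this->pending = array();
        foreach ( $pending as $item ) {
            list( $row, $onDuplicate ) = $item;
            $name = $row['name'];
            if ( isset( $this->rowsByName[$name] ) ) {
                if ( $onDuplicate === null ) {
                    throw new RuntimeException( 'Duplicate value on NAME.' );
                }
                // Report the duplicate to the caller and skip the write.
                call_user_func( $onDuplicate, $row, 'NAME' );
                continue;
            }
            $this->rowsByName[$name] = $row;
        }
    }

    public function count() {
        return count( $this->rowsByName );
    }
}

$store = new SketchStore();
$store->queueSave( array( 'name' => 'Phil', 'age' => 25 ) );
$countBeforeFlush = $store->count();
$store->flush();

// A second Phil, this time with a callback to handle the duplicate.
$messages = array();
$store->queueSave(
    array( 'name' => 'Phil', 'age' => 31 ),
    function ( $row, $indexName ) use ( &$messages ) {
        $messages[] = 'Duplicate value on ' . $indexName . '.';
    }
);
$store->flush();
```

Deferring writes in this way lets the caller group several operations and handle failures in one place, at the point where `flush()` is called.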
Tests and example
Please see the Campaigns extension for unit tests.
Considerations
In the above example, the Person class expresses domain knowledge about people: they must have a name and an age, no two people have the same name, the values of both properties can change, and you can create a string with a person's name and age that looks like this: "Name (age)". The class is not cluttered with framework-specific information or logic.
If you want to encapsulate logic related to the set of entities of a given type, it's easy to create repositories. For example, if we know we'll frequently have to retrieve people older than 70 whose names start with "W", we can create a PersonRepository and put the logic for such queries there. Repositories are an established building block of domain-driven design.[9]
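As a rough sketch of what such a repository could look like: the method name `getOlderThanWithNamePrefix()` and the `SketchPerson` class are hypothetical, and an array stands in for the persistence store. A real repository would presumably build conditions and delegate to the persistence manager rather than filter in PHP.

```php
<?php
// Minimal stand-in for the Person entity above (illustration only).
class SketchPerson {
    private $name;
    private $age;

    public function __construct( $name, $age ) {
        $this->name = $name;
        $this->age = $age;
    }

    public function getName() {
        return $this->name;
    }

    public function getAge() {
        return $this->age;
    }
}

// Hypothetical repository: gives a frequently needed query a name in the
// language of the domain, and hides how the query is carried out.
class PersonRepository {
    private $people;

    public function __construct( array $people ) {
        $this->people = $people;
    }

    // Returns people strictly older than $minAge whose names start with $prefix.
    public function getOlderThanWithNamePrefix( $minAge, $prefix ) {
        $matches = array();
        foreach ( $this->people as $person ) {
            $nameMatches = strpos( $person->getName(), $prefix ) === 0;
            if ( $person->getAge() > $minAge && $nameMatches ) {
                $matches[] = $person;
            }
        }
        return $matches;
    }
}

$repository = new PersonRepository( array(
    new SketchPerson( 'Wanda', 71 ),
    new SketchPerson( 'Walter', 70 ),
    new SketchPerson( 'Phil', 80 ),
) );
$found = $repository->getOlderThanWithNamePrefix( 70, 'W' );
```

Callers ask for the people they need without touching table names, field names, or query syntax.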
This implementation takes several cues (not queues) from Doctrine, a much more complete data mapping library for PHP. It would probably be more fun to just use Doctrine! With the recent acceptance of the Composer-managed libraries RFC, this is definitely something to consider. A possible impediment is that for consistency and legacy support, we may well want low-level database access to continue to go through existing MW classes.
Related efforts
- MediaWiki already contains an ORM facility: ORMTable and related classes. These classes support the active record pattern, rather than the data mapper pattern.
- The Flow extension encapsulates database access using its own object mapper facility.
- A simple base class to handle CRUD, ACLs, and potentially caching (not done yet).
Proposed methodology
This RFC is not about refactoring existing MediaWiki classes to use the data mapper pattern, but about adding a data mapper facility to MediaWiki. Such a facility could be used with new and non-central MW code on the understanding that it is experimental and could change or even disappear at any time. It would be a sort of internal "beta feature". Reviewing how it is used and how it affects code quality would be a central task.
This same approach has been proposed for adding dependency injection to core.
See also
- Discussion and RFC about a fluent SQL interface from Wikia.
- Discussion about using concepts from domain-driven development to improve the separation of concerns in MediaWiki.
- RESTBase storage service with abstract table and bucket interfaces very similar to DynamoDB and DataStore. Examples for the table interface: schema, query. Accessible to PHP through the Requests for comment/PHP Virtual REST Service, but could also be wrapped if necessary. Will expose secondary index and transaction functionality.
References
1. Evans, Eric (2004). Domain-Driven Design: Tackling Complexity in the Heart of Software. Boston: Addison-Wesley, 148.
2. Fowler, Martin, with David Rice, Matthew Foemmel, Edward Hieatt, Robert Mee, and Randy Stafford (2003). Patterns of Enterprise Application Architecture. Boston: Addison-Wesley, 36.
3. Evans, op. cit., 148.
4. Ibid., 75.
5. Ibid., 70–71.
6. Ibid., 149.
7. Ibid., 150.
8. Fowler et al., op. cit., 36.
9. Evans, op. cit., 147–162.