User:ASarabadani (WMF)/Database for developers toolkit/How-to/Data migration

With live databases serving millions of requests per minute, data migration is complex work.

  1. Write the schema change that adds the new place for the data (the new table or columns), making it fully nullable. Get it reviewed and merged.
  2. Request a schema change by creating a ticket on the Blocked-on-schema-change workboard. The Data Persistence team takes it over and gets the change deployed; it usually takes a couple of weeks to finish. Do not do step 4 before this is fully done and you have the green light from the DBAs.
  3. Write the code that writes to the new system. It must be behind a configuration flag; you can use the SCHEMA_COMPAT_* constants for it as well (see Defines.php). The default must be to write to the old system only (at least for production). Get the patch reviewed and merged and wait for deployment. Do not decide which query to run by inspecting the live schema; you might hit a database whose schema differs. (See the first sketch under Examples below.)
  4. Gradually turn on writing: first on the beta cluster, then test wikis, then small wikis, and so on.
    1. Look at the logs and make sure there are no bugs or exceptions and that the writes are not putting too much pressure on the databases.
  5. Now it's time to write a maintenance script that backfills the existing data into the new system. Get it reviewed, merged and deployed. (See the maintenance script sketch under Examples below.)
    • It should be a MediaWiki maintenance script whose write queries each take less than a second. It's better to do the reads on replicas and then build a simple, fast write query for the primary.
    • The writes should be done in batches that change no more than 10,000 rows at a time (at most 1,000 if the rows are big), and each write query must finish in under a second; otherwise running it would make all of Wikipedia go read-only.
    • Between each batch you MUST call waitForReplication().
    • If the script takes more than a day to run on a wiki, it should sleep at least two seconds between batches. This is because the replication state of databases in the passive datacenter and of other databases in the system (analytics replicas, cloud replicas) is not reflected in waitForReplication().
  6. Run the script on small wikis, make sure it works fine, then run it on all wikis.
    • If it takes a long time, you can run it on different database sections in parallel, but not on wikis within the same section.
    • Make sure you turn on "write both" before starting the script.
  7. Once steps 5 and 6 are done, write the code that reads from the new system. It must be behind a config flag (similar to step 3; you can reuse the same config if you used SCHEMA_COMPAT_*). Get it reviewed, merged and deployed.
  8. Gradually roll it out: first the beta cluster, then test wikis, then small wikis. Make sure there are no bugs or issues once it is deployed.
  9. Once it has stabilized, you need to stop writing to the old data. If the old columns have unique or NOT NULL constraints, you will run into issues, so you first have to make a schema change that makes them nullable or drops the unique constraints. Request the schema change as in step 2.
  10. After finishing the previous step, you can move on to dropping the old columns. Create the schema change patch file. Get it reviewed and merged.
  11. Request the final schema change from the DBAs, similar to step 2, but asking for the old columns to be dropped.
  12. Once that is done, start cleaning up your code base, and make sure you take data migration for third-party installations into account (including registering the maintenance script in DatabaseUpdater; see the DatabaseUpdater sketch under Examples below).

Examples
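A minimal sketch of the write and read gating from steps 3, 4 and 7. The SCHEMA_COMPAT_* constants are the real ones from Defines.php; the configuration name (FooBarMigrationStage), the tables (foo, foo_bar) and the columns are hypothetical placeholders, not anything in MediaWiki core.

<?php
// Hypothetical storage code gated behind a migration stage flag.
// The flag (e.g. $wgFooBarMigrationStage) would default to
// SCHEMA_COMPAT_OLD in production and be flipped wiki by wiki.

use MediaWiki\MediaWikiServices;

function insertFooBar( int $id, string $value ): void {
    $services = MediaWikiServices::getInstance();
    $stage = $services->getMainConfig()->get( 'FooBarMigrationStage' );
    $dbw = $services->getDBLoadBalancer()->getConnection( DB_PRIMARY );

    if ( $stage & SCHEMA_COMPAT_WRITE_OLD ) {
        // Legacy storage: hypothetical foo.foo_value column.
        $dbw->insert(
            'foo',
            [ 'foo_id' => $id, 'foo_value' => $value ],
            __METHOD__
        );
    }
    if ( $stage & SCHEMA_COMPAT_WRITE_NEW ) {
        // New storage: hypothetical foo_bar table.
        $dbw->insert(
            'foo_bar',
            [ 'fb_id' => $id, 'fb_value' => $value ],
            __METHOD__,
            [ 'IGNORE' ]
        );
    }
}

function loadFooBar( int $id ): ?string {
    $services = MediaWikiServices::getInstance();
    $stage = $services->getMainConfig()->get( 'FooBarMigrationStage' );
    $dbr = $services->getDBLoadBalancer()->getConnection( DB_REPLICA );

    // Read from whichever system the flag says, never by inspecting
    // the live schema.
    if ( $stage & SCHEMA_COMPAT_READ_NEW ) {
        $value = $dbr->newSelectQueryBuilder()
            ->select( 'fb_value' )
            ->from( 'foo_bar' )
            ->where( [ 'fb_id' => $id ] )
            ->caller( __METHOD__ )
            ->fetchField();
    } else {
        $value = $dbr->newSelectQueryBuilder()
            ->select( 'foo_value' )
            ->from( 'foo' )
            ->where( [ 'foo_id' => $id ] )
            ->caller( __METHOD__ )
            ->fetchField();
    }
    return $value === false ? null : (string)$value;
}
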
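A skeleton of the backfill script from steps 5 and 6: read a batch from a replica, do one fast write on the primary, call waitForReplication() between batches, and optionally sleep. The class, table and column names are the same hypothetical placeholders as above; the Maintenance helpers are the standard ones.

<?php
// Hypothetical backfill script copying rows from the legacy foo table
// into the new foo_bar table in small batches.

require_once __DIR__ . '/Maintenance.php'; // adjust the path for an extension

class BackfillFooBar extends Maintenance {
    public function __construct() {
        parent::__construct();
        $this->addDescription( 'Backfill foo_bar from the legacy foo table' );
        $this->addOption( 'sleep', 'Seconds to sleep between batches', false, true );
        $this->setBatchSize( 1000 );
    }

    public function execute() {
        $dbr = $this->getDB( DB_REPLICA );
        $dbw = $this->getDB( DB_PRIMARY );
        $sleep = (int)$this->getOption( 'sleep', 0 );
        $lastId = 0;

        do {
            // Read the next batch from a replica, keyed on the primary key.
            $res = $dbr->newSelectQueryBuilder()
                ->select( [ 'foo_id', 'foo_value' ] )
                ->from( 'foo' )
                ->where( 'foo_id > ' . (int)$lastId )
                ->orderBy( 'foo_id' )
                ->limit( $this->getBatchSize() )
                ->caller( __METHOD__ )
                ->fetchResultSet();

            $rows = [];
            foreach ( $res as $row ) {
                $rows[] = [ 'fb_id' => $row->foo_id, 'fb_value' => $row->foo_value ];
                $lastId = (int)$row->foo_id;
            }
            if ( !$rows ) {
                break;
            }

            // One simple, fast write query on the primary per batch;
            // IGNORE skips rows already created by the dual-write code.
            $dbw->insert( 'foo_bar', $rows, __METHOD__, [ 'IGNORE' ] );
            $this->output( "Backfilled up to foo_id $lastId\n" );

            // Mandatory between batches.
            $this->waitForReplication();
            if ( $sleep > 0 ) {
                sleep( $sleep );
            }
        } while ( count( $rows ) >= $this->getBatchSize() );
    }
}

$maintClass = BackfillFooBar::class;
require_once RUN_MAINTENANCE_IF_MAIN;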

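For step 12, third-party wikis that upgrade with update.php need both the schema change and the backfill. A hedged sketch of a LoadExtensionSchemaUpdates handler, reusing the hypothetical names from the sketches above (the SQL patch file names are also placeholders):

<?php
// Hypothetical LoadExtensionSchemaUpdates handler (registered in
// extension.json) so third-party installs get the new table, the
// backfill, and eventually the drop of the legacy column via update.php.

class FooBarSchemaHooks {
    /**
     * @param DatabaseUpdater $updater
     */
    public static function onLoadExtensionSchemaUpdates( $updater ) {
        $dir = __DIR__ . '/../sql';

        // New table, generated from the schema change patch.
        $updater->addExtensionTable( 'foo_bar', "$dir/patch-foo_bar-table.sql" );

        // Run the backfill script after all schema updates have been applied.
        $updater->addPostDatabaseUpdateMaintenance( BackfillFooBar::class );

        // In a later release, once reads come from the new table only:
        // $updater->dropExtensionField( 'foo', 'foo_value', "$dir/patch-drop-foo_value.sql" );
    }
}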