Deployment tooling/Notes/What does scap do

This documentation describes scap as it worked before it was ported to Python.

Scap ("sync-common-all-php") is a collection of shell scripts used to publish code and configuration to the WMF production web servers.

scap

scap is the driver script that syncs the MediaWiki versions and configuration files currently staged on tin.eqiad.wmnet to the rest of the MediaWiki servers in the production cluster.

Usage
scap [--versions=<versions>] [<message>]
  1. Acquire lock on /var/lock/scap
  2. Record start timestamp
  3. Ensure that SSH_AUTH_SOCK is available (needed for dsh to remote hosts)
  4. Check for command line flag to limit activities to a particular MW version
  5. Export the MW_VERSIONS_SYNC variable, which tells the sync scripts which MediaWiki versions to push. Its value is either:
    • A specific version given with the --versions command line argument (e.g. 1.23wmf12)
    • The output of mwversionsinuse --home
  6. Lint files in $MW_COMMON_SOURCE/wmf-config and $MW_COMMON_SOURCE/multiversion
  7. Runs sync-common
    • copies files from the rsync common module on tin.eqiad.wmnet (the /a/common staging copy) to tin.eqiad.wmnet:/usr/local/apache/common-local via rsync
  8. Runs mw-update-l10n
  9. Runs dologmsg to announce that scap is starting
  10. Runs scap-1 via dsh on the scap-proxies dsh group
  11. Randomizes list of hosts to update (All hosts listed in /etc/dsh/group/mediawiki-installation)
  12. Runs scap-1 via dsh
  13. Runs scap-rebuild-cdbs via dsh
  14. Runs sync-wikiversions
  15. Compute elapsed runtime
  16. Runs dologmsg to log runtime
  17. Runs deploy2graphite to log scap run completion
  18. Deletes temp files
  19. Releases lock on /var/lock/scap
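The locking, timing and host-shuffling steps above (1, 2, 11, 15 and 19) can be sketched in bash as shown below. This is a minimal sketch and not scap's own code. It assumes flock(1) and shuf(1) are available and that the dsh group file lists one host per line, with optional # comments. The names with_scap_lock and randomized_hosts are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of scap's outer skeleton: non-blocking lock, timing, shuffled hosts.
# Assumptions (not taken from scap): flock(1) and shuf(1) exist, and the dsh
# group file holds one host per line with optional "#" comment lines.

# Run "$@" while holding an exclusive, non-blocking lock on the file $1.
with_scap_lock() {
    local lockfile=$1 start
    shift
    (
        # fd 9 stays open for the subshell's lifetime; the lock is released
        # automatically when the subshell exits, even if a step fails.
        flock -n 9 || { echo "scap is already running ($lockfile)" >&2; exit 1; }
        start=$(date +%s)
        "$@" || exit $?          # the real sync steps would run here
        echo "scap completed in $(( $(date +%s) - start ))s"
    ) 9>"$lockfile"
}

# Print the hosts from a dsh group file in random order.
randomized_hosts() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | shuf
}
```

In this sketch a run would look like with_scap_lock /var/lock/scap run_sync_steps, where run_sync_steps is a hypothetical function that wraps the remaining steps. Running randomized_hosts /etc/dsh/group/mediawiki-installation would then yield the shuffled host list for step 11. Because the lock is taken with -n, a second concurrent run fails immediately instead of queueing behind the first.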

sync-common

sync-common is really just an alias for scap-1 in shell script form.

  1. Runs scap-1

scap-1

scap-1 sets up the local host to receive files via rsync, chooses an rsync server to fetch files from and delegates to scap-2 to actually fetch the files.

  1. Sources /usr/local/lib/mw-deployment-vars.sh
  2. If $MW_COMMON directory is not found:
    • Creates $MW_COMMON via install -d -o mwdeploy -g mwdeploy "${MW_COMMON}"
  3. If /usr/local/apache/uncommon directory is not found:
    • Creates /usr/local/apache/uncommon via install -d -o mwdeploy -g mwdeploy /usr/local/apache/uncommon
  4. Initialize the RSYNC_SERVERS variable to the first command line argument (may be an empty string)
  5. Initialize SERVER as an empty variable
  6. If $RSYNC_SERVERS is not an empty string:
    • Set SERVER to the output of find-nearest-rsync $RSYNC_SERVERS
  7. If $SERVER is still empty:
    • Set SERVER to $MW_RSYNC_HOST
  8. Run scap-2 "$SERVER" as the user mwdeploy
    • MW_VERSIONS_SYNC and MW_SCAP_BETA from the current execution context are forwarded to the environment of the scap-2 invocation
  9. Echo "Done"
  10. Exit 0
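In effect, steps 4 to 7 choose the nearest of the given servers and fall back to the default rsync host when none is chosen. The bash sketch below assumes, in line with the find-nearest-rsync section, that a non-empty list is handed to find-nearest-rsync. The name choose_rsync_server is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of scap-1's server choice. Assumption: a non-empty $1 is a
# space-separated host list that find-nearest-rsync narrows to one host.
MW_RSYNC_HOST=${MW_RSYNC_HOST:-tin.eqiad.wmnet}   # production value

choose_rsync_server() {
    local rsync_servers=$1 server=""
    if [ -n "$rsync_servers" ]; then
        # Unquoted on purpose: each host becomes a separate argument.
        server=$(find-nearest-rsync $rsync_servers 2>/dev/null)
    fi
    # Nothing chosen (no list, or the helper printed nothing): use the default.
    if [ -z "$server" ]; then
        server=$MW_RSYNC_HOST
    fi
    echo "$server"
}
```

The chosen server would then be passed to scap-2 as the mwdeploy user. sudo resets the environment by default, which is presumably why step 8 forwards MW_VERSIONS_SYNC and MW_SCAP_BETA explicitly.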

scap-2

scap-2 copies files from the common module of an rsync server to the MW_COMMON directory on the local host.

Usage
scap-2 [<host>]
  1. Sources /usr/local/lib/mw-deployment-vars.sh
  2. Initialize SERVER as $1
  3. If $SERVER is still empty:
    • Set SERVER to $MW_RSYNC_HOST
  4. Initialize RSYNC_ARGS as an array containing MW_RSYNC_ARGS[@]
  5. If $MW_VERSIONS_SYNC is not an empty string:
    • Add --include=php-$v/ to RSYNC_ARGS for each $v in $MW_VERSIONS_SYNC[@]
    • Add --exclude=php-*/ to RSYNC_ARGS
  6. Echo a message saying that the local host (hostname -s) is copying from $SERVER
  7. Run rsync "${RSYNC_ARGS[@]}" "$SERVER"::common/ "${MW_COMMON}"
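Step 5 limits the sync to the requested versions by adding rsync filter rules. Below is a bash sketch of that argument building. It assumes MW_VERSIONS_SYNC is a whitespace-separated string, since exported variables cannot carry bash arrays. build_rsync_args is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of how scap-2 assembles its rsync arguments. MW_RSYNC_ARGS would
# normally come from mw-deployment-vars.sh; the production value is copied
# here so the sketch runs on its own.
MW_RSYNC_ARGS=('-a' '--delete-delay' '--delay-updates' '--compress' '--delete'
    '--exclude=**/.svn/lock' '--exclude=**/.git/objects'
    '--exclude=**/.git/**/objects' '--exclude=**/cache/l10n/*.cdb' '--no-perms')

# Print one rsync argument per line.
build_rsync_args() {
    local -a args=("${MW_RSYNC_ARGS[@]}")
    local v
    if [ -n "$MW_VERSIONS_SYNC" ]; then
        # Assumes a whitespace-separated list such as "1.23wmf11 1.23wmf12".
        for v in $MW_VERSIONS_SYNC; do
            args+=("--include=php-$v/")
        done
        # rsync applies the first matching rule, so the includes above win
        # and every other php-*/ directory is skipped.
        args+=('--exclude=php-*/')
    fi
    printf '%s\n' "${args[@]}"
}
```

The printed arguments could be collected with mapfile -t RSYNC_ARGS < <(build_rsync_args) and then used exactly as in step 7.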

mw-update-l10n

mw-update-l10n generates l10n CDB files and exports their contents as a series of JSON files, which compress better under rsync for transfer to cluster hosts.

Usage
mw-update-l10n [--verbose]
  1. Sources /usr/local/lib/mw-deployment-vars.sh
  2. Asserts that the local host is running some variant of Linux
  3. Checks for a --verbose command line argument and toggles off the QUIET setting if present
  4. Sets CPUS to the number of cores on the local host (includes hyperthreading cores)
  5. Sets THREADS to CPUS - 2
  6. Sets mwExtVerDbSets to the output of mwversionsinuse --extended --withdb
    • (e.g. 1.23wmf11=aawikibooks 1.23wmf12=mediawikiwiki)
  7. For each version in $mwExtVerDbSets:
    1. Split the version string into mwVerNum (e.g. 1.23wmf11) and mwDbName (e.g. aawikibooks)
    2. If MW_VERSIONS_SYNC is set and mwVerNum isn't a version being synced: continue
    3. Make a new temp file and track as mwTempDest
    4. Run mergeMessageFileList.php for the wiki mwDbName outputting to mwTempDest
    5. Copy mwTempDest to $MW_COMMON_SOURCE/wmf-config/ExtensionMessages-"$mwVerNum".php
    6. Copy $MW_COMMON_SOURCE/wmf-config/ExtensionMessages-"$mwVerNum".php to $MW_COMMON/wmf-config/ unless they are the same location
    7. Run rebuildLocalisationCache.php using THREADS threads
    8. Run refreshCdbJsonFiles using THREADS threads
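The thread count (step 5) and the version/wiki pairing with its MW_VERSIONS_SYNC filter (steps 6 to 7.2) can be sketched in bash as follows. The PHP maintenance scripts are not reproduced. l10n_threads and plan_l10n_updates are hypothetical helpers, and clamping the thread count to at least 1 is this sketch's own addition.

```shell
#!/usr/bin/env bash
# Sketch of mw-update-l10n's bookkeeping: thread count and the
# version=dbname pairs it iterates over.

# THREADS = CPUS - 2. Clamping to at least 1 is an addition of this sketch.
l10n_threads() {
    local cpus=${1:-$(getconf _NPROCESSORS_ONLN)}
    local threads=$((cpus - 2))
    [ "$threads" -ge 1 ] || threads=1
    echo "$threads"
}

# Print "version dbname" for each pair in $1 that should be processed.
plan_l10n_updates() {
    local pair ver db
    for pair in $1; do
        ver=${pair%%=*}   # text before the first "=", e.g. 1.23wmf11
        db=${pair#*=}     # text after it, e.g. aawikibooks
        if [ -n "$MW_VERSIONS_SYNC" ]; then
            case " $MW_VERSIONS_SYNC " in
                *" $ver "*) ;;        # version is being synced: keep it
                *) continue ;;        # otherwise skip it
            esac
        fi
        echo "$ver $db"
    done
}
```

A caller could then loop with while read -r mwVerNum mwDbName; do ...; done < <(plan_l10n_updates "$(mwversionsinuse --extended --withdb)").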

refreshCdbJsonFiles

refreshCdbJsonFiles generates JSON data files and MD5 checksums from CDB databases.

Usage
refreshCdbJsonFiles --directory <DIR> [--threads <N>]
  1. Validate command line arguments
  2. Create list of .cdb files in target directory
  3. Split list in N parts (N == number of parallel threads requested)
  4. For each sublist of CDB files:
    1. Fork a child process
    2. For each file:
      1. Compute md5 checksum of file
      2. If md5(file) === last md5 recorded: continue
      3. Write the key:value pairs found in the CDB file to a temporary JSON file
      4. Write md5(file) to $file.MD5
      5. Move JSON temp file to $file.json
  5. Wait for children to finish
  6. Echo status message if any files were updated
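The bash sketch below illustrates the control flow just described: split the files across N workers, skip any file whose MD5 is unchanged, and regenerate the JSON for the rest. It is not the tool's actual implementation. dump_cdb_as_json is a hypothetical placeholder that only copies bytes, because reading CDB data is out of scope here. The round-robin split is this sketch's own choice, since the steps above do not say how the list is divided.

```shell
#!/usr/bin/env bash
# Sketch of refreshCdbJsonFiles' control flow. dump_cdb_as_json is a
# hypothetical placeholder, NOT a real CDB reader: it just copies bytes.
dump_cdb_as_json() { cat "$1"; }

# Refresh one file; print its name if it was updated.
refresh_one() {
    local file=$1 sum
    sum=$(md5sum < "$file" | cut -d' ' -f1)
    if [ -f "$file.MD5" ] && [ "$(cat "$file.MD5")" = "$sum" ]; then
        return 0                               # unchanged since last run
    fi
    dump_cdb_as_json "$file" > "$file.json.tmp" || return 1
    echo "$sum" > "$file.MD5"
    mv "$file.json.tmp" "$file.json"           # replace the JSON in one step
    echo "$file"
}

# Usage: refresh_cdb_json_files DIR [THREADS]
refresh_cdb_json_files() {
    local dir=$1 threads=${2:-1} i j updated
    local -a files=("$dir"/*.cdb) names
    [ -e "${files[0]}" ] || return 0           # no .cdb files at all
    updated=$(
        for ((i = 0; i < threads; i++)); do
            (   # worker i takes every threads-th file, starting at index i
                for ((j = i; j < ${#files[@]}; j += threads)); do
                    refresh_one "${files[j]}"
                done
            ) &
        done
        wait                                   # wait for all workers
    )
    if [ -n "$updated" ]; then
        mapfile -t names <<< "$updated"
        echo "Updated ${#names[@]} JSON file(s) in $dir"
    fi
}
```

Here refresh_cdb_json_files DIR 4 plays the role of refreshCdbJsonFiles --directory DIR --threads 4. A second run over unchanged files prints nothing, because every file is skipped at the MD5 check.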

scap-rebuild-cdbs

scap-rebuild-cdbs rebuilds the l10n cache CDB files from the JSON files.

  1. Sources /usr/local/lib/mw-deployment-vars.sh
  2. Sets CPUS to the number of cores on the local host (includes hyperthreading cores)
  3. Sets THREADS to CPUS / 2
  4. Sets mwVersions to either MW_VERSIONS_SYNC or the output of mwversionsinuse
  5. For each version in mwVersions:
    1. Run mergeCdbFileUpdates
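Steps 2 to 4 reduce to two small decisions, sketched here in bash. rebuild_threads and versions_to_rebuild are hypothetical names, and clamping the thread count to at least 1 is an addition of this sketch. The per-version directory that would be passed to mergeCdbFileUpdates is not given on this page, so it is left out.

```shell
#!/usr/bin/env bash
# Sketch of scap-rebuild-cdbs' setup. mwversionsinuse must be on PATH for
# the fallback branch.

# THREADS = CPUS / 2 (integer division). Clamping to 1 is this sketch's own.
rebuild_threads() {
    local cpus=${1:-$(getconf _NPROCESSORS_ONLN)}
    local threads=$((cpus / 2))
    [ "$threads" -ge 1 ] || threads=1
    echo "$threads"
}

# Versions to rebuild: the ones being synced, else every active version.
versions_to_rebuild() {
    if [ -n "$MW_VERSIONS_SYNC" ]; then
        echo "$MW_VERSIONS_SYNC"
    else
        mwversionsinuse
    fi
}
```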

mergeCdbFileUpdates

mergeCdbFileUpdates updates l10n CDB files from JSON data files.

Usage
mergeCdbFileUpdates --directory <DIRECTORY> [--threads <N>] [--trustmtime]
  1. Validate command line arguments
  2. Create list of .json files in target directory
  3. Split list in N parts (N == number of parallel threads requested)
  4. For each sublist of JSON files:
    1. Fork a child process
    2. For each file:
      1. Skip the file unless the JSON file is newer than the CDB file or the MD5 checksums don't match
      2. Load JSON data from file
      3. Write the JSON key:value data to a new temporary CDB file
      4. Rename the temporary CDB file over the .cdb file
  5. Wait for children to finish
  6. Echo status message if any files were updated
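Below is a bash sketch of the per-file staleness check and the atomic replacement described in the inner steps above. It assumes the check means "the JSON is newer than the CDB, or the recorded MD5 differs from the current CDB's". It does not model how --trustmtime changes that. build_cdb_from_json and the explicit path arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of mergeCdbFileUpdates' per-file logic. build_cdb_from_json is a
# hypothetical placeholder, NOT a real CDB writer: it just copies bytes.
build_cdb_from_json() { cat "$1"; }

# Succeed (return 0) when the CDB must be rebuilt from the JSON.
needs_rebuild() {
    local json=$1 cdb=$2 md5file=$3
    [ -e "$cdb" ] || return 0                  # no CDB yet
    [ "$json" -nt "$cdb" ] && return 0         # JSON newer than CDB
    [ -f "$md5file" ] || return 0              # no checksum to compare
    # Rebuild if the recorded checksum differs from the current CDB's.
    [ "$(cat "$md5file")" != "$(md5sum < "$cdb" | cut -d' ' -f1)" ]
}

# Rebuild one CDB if needed; print its path if it was replaced.
merge_one() {
    local json=$1 cdb=$2 md5file=$3 tmp
    needs_rebuild "$json" "$cdb" "$md5file" || return 0
    tmp="$cdb.tmp.$$"
    build_cdb_from_json "$json" > "$tmp" || { rm -f "$tmp"; return 1; }
    # mv within one directory is a rename(2), which replaces the old file
    # atomically: readers see the old or the new CDB, never a partial one.
    mv -f "$tmp" "$cdb"
    echo "$cdb"
}
```

Writing to a temporary file and renaming it, rather than rewriting the .cdb in place, means a web request reading the cache never observes a half-written database.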

sync-wikiversions

sync-wikiversions copies wikiversions files to hosts in the mediawiki-installation dsh group.

  1. Sources /usr/local/lib/mw-deployment-vars.sh
  2. Ensure that SSH_AUTH_SOCK is available (needed for dsh to remote hosts)
  3. Run multiversion/refreshWikiversionsCDB
  4. Ensure that dsh is available locally
  5. Run rsync $MW_RSYNC_HOST::common/wikiversions.{dat,cdb} $MW_COMMON via dsh on mediawiki-installation hosts
  6. Runs dologmsg to log completion
  7. Runs deploy2graphite to log sync-wikiversions completion

mw-deployment-vars.sh

mw-deployment-vars.sh is a Puppet-generated shell script that sets several MediaWiki-related environment variables.

The values of these variables change based on the deployment system in use and the realm of the server. For the sake of this analysis we are only concerned with the values configured for the scap deployment system in the production realm.

MW_COMMON
varies by deployment system
scap: /usr/local/apache/common-local
MW_COMMON_SOURCE
varies by deployment system
scap: /a/common
MW_DBLISTS
varies by deployment system
scap: /usr/local/apache/common-local
MW_DBLISTS_SOURCE
varies by deployment system
scap: /a/common
MW_CRON_LOGS
/home/wikipedia/logs/norotate
MW_RSYNC_HOST
varies by realm
production: tin.eqiad.wmnet
MW_DSH_ARGS
('-cM' '-g' 'mediawiki-installation' '-o' '-oSetupTimeout=30' '-F30')
MW_RSYNC_ARGS
('-a' '--delete-delay' '--delay-updates' '--compress' '--delete' '--exclude=**/.svn/lock' '--exclude=**/.git/objects' '--exclude=**/.git/**/objects' '--exclude=**/cache/l10n/*.cdb' '--no-perms')
MW_CARBON_HOST
varies by realm
production: statsd.eqiad.wmnet
MW_CARBON_PORT
2003
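For reference, the scap/production values above can be rendered as a bash fragment. This is an illustrative rendering, not the literal output of the Puppet template. MW_DSH_ARGS and MW_RSYNC_ARGS are bash arrays, and arrays cannot be passed through the environment, which is presumably why each script sources this file rather than inheriting the values. The --exclude=**/cache/l10n/*.cdb rule is consistent with scap-rebuild-cdbs rebuilding the CDB files on each host from the synced JSON.

```shell
# Illustrative scap/production values (not the literal Puppet output).
MW_COMMON=/usr/local/apache/common-local
MW_COMMON_SOURCE=/a/common
MW_DBLISTS=/usr/local/apache/common-local
MW_DBLISTS_SOURCE=/a/common
MW_CRON_LOGS=/home/wikipedia/logs/norotate
MW_RSYNC_HOST=tin.eqiad.wmnet
# Arrays: expand as "${MW_DSH_ARGS[@]}" so each element stays one argument.
MW_DSH_ARGS=('-cM' '-g' 'mediawiki-installation' '-o' '-oSetupTimeout=30' '-F30')
MW_RSYNC_ARGS=('-a' '--delete-delay' '--delay-updates' '--compress' '--delete'
    '--exclude=**/.svn/lock' '--exclude=**/.git/objects'
    '--exclude=**/.git/**/objects' '--exclude=**/cache/l10n/*.cdb' '--no-perms')
MW_CARBON_HOST=statsd.eqiad.wmnet
MW_CARBON_PORT=2003
```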

find-nearest-rsync

find-nearest-rsync is a Perl script that attempts to find the host with the lowest ICMP ping round-trip time (rtt) among a given list of hosts.

Usage
find-nearest-rsync [--verbose] <host> [<host> ...]

The host with the lowest rtt will be printed to stdout.
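Below is a bash sketch of the same idea; the real tool is Perl, and this is not a port of it. The sketch assumes iputils ping and reads the average from its summary line. Averaging three pings is this sketch's own choice. measure_rtt and find_nearest are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of "pick the host with the lowest ping rtt". Assumption: iputils
# ping, whose summary line reads
#   rtt min/avg/max/mdev = 0.035/0.040/0.046/0.005 ms
# Other ping implementations format that line differently.

# Print the average round-trip time to $1 in ms, or nothing if unreachable.
measure_rtt() {
    ping -c 3 -q "$1" 2>/dev/null | awk -F'/' '/^rtt/ { print $5 }'
}

# Print the host with the lowest average rtt; fail if none answered.
find_nearest() {
    local host rtt best="" best_rtt=""
    for host in "$@"; do
        rtt=$(measure_rtt "$host")
        [ -n "$rtt" ] || continue              # no reply: ignore this host
        # awk compares the values numerically (bash [ ] cannot do decimals).
        if [ -z "$best" ] ||
            awk -v a="$rtt" -v b="$best_rtt" 'BEGIN { exit !(a < b) }'; then
            best=$host
            best_rtt=$rtt
        fi
    done
    [ -n "$best" ] && echo "$best"
}
```

Comparing with awk rather than as strings matters: as strings "10.0" sorts before "9.5", which would pick the slower host.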

mwversionsinuse

mwversionsinuse is a shell script that calls the local copy of multiversion/activeMWVersions.

  1. Sources /usr/local/lib/mw-deployment-vars.sh
  2. Runs "${MW_COMMON}/multiversion/activeMWVersions" "$@"

dologmsg

dologmsg appends a message to an IRC buffer.

Usage
dologmsg [MESSAGE]

See also