User:BDavis (WMF)/Notes/Finding Files To Purge
Trying to help with bugzilla:54647.
Some unspecified network error caused HTCP purge messages to be lost for an extended period of time. Once the network issue was resolved, we needed to purge a lot of potentially stale pages from the Varnish cache layer.
Figuring out what to do
Finding the pages
Nik's SQL query came in the form of PHP code from CirrusSearch. I took that and put it into phpsh to get the generated SQL.
$minUpdate = new MWTimestamp( '2013-09-22T00:00:00Z' );
$maxUpdate = new MWTimestamp( '2013-09-26T00:00:00Z' );
$dbr = wfGetDB( DB_SLAVE );
$minId = $dbr->addQuotes( '-1' );
$minUpdate = $dbr->addQuotes( $dbr->timestamp( $minUpdate ) );
$maxUpdate = $dbr->addQuotes( $dbr->timestamp( $maxUpdate ) );
$res = $dbr->selectSQLText(
    array( 'page', 'revision' ),
    array_merge(
        array( 'page_touched', 'page_counter', 'page_restrictions' ),
        Revision::selectPageFields(),
        Revision::selectFields()
    ),
    'page_id = rev_page'
        . ' AND rev_id = page_latest'
        . " AND ( ( $minUpdate = rev_timestamp AND $minId < page_id ) OR $minUpdate < rev_timestamp )"
        . " AND rev_timestamp <= $maxUpdate",
    // Note that redirects are allowed here so we can pick up redirects made during search downtime
    'INTERACTIVE',
    array( 'ORDER BY' => 'rev_timestamp, rev_page' )
);
The resulting SQL was:
SELECT page_touched, page_counter, page_restrictions,
  page_namespace, page_title, page_id, page_latest, page_is_redirect, page_len,
  rev_id, rev_page, rev_text_id, rev_timestamp, rev_comment, rev_user_text,
  rev_user, rev_minor_edit, rev_deleted, rev_len, rev_parent_id, rev_sha1,
  rev_content_format, rev_content_model
FROM `page`, `revision`
WHERE page_id = rev_page
  AND rev_id = page_latest
  AND ( ( '20130922000000' = rev_timestamp AND '-1' < page_id )
    OR '20130922000000' < rev_timestamp )
  AND rev_timestamp <= '20130926000000'
ORDER BY rev_timestamp, rev_page
This was TMI but led me to formulate a simpler version that was "good enough for now". I ran that on tools-lab against the enwiki and dewiki mirrors:
ssh tools-login.wmflabs.org
become bd808-test
mysql --defaults-file="${HOME}/replica.my.cnf" -h enwiki.labsdb enwiki_p -e "SELECT page_title FROM page, revision WHERE page_id = rev_page AND rev_id = page_latest AND page_namespace = 0 AND ( ( '20130922000000' = rev_timestamp AND '-1' < page_id ) OR '20130922000000' < rev_timestamp ) AND rev_timestamp <= '20130926000000' ;" > enwiki-misses.txt
mysql --defaults-file="${HOME}/replica.my.cnf" -h dewiki.labsdb dewiki_p -e "SELECT page_title FROM page, revision WHERE page_id = rev_page AND rev_id = page_latest AND page_namespace = 0 AND ( ( '20130922000000' = rev_timestamp AND '-1' < page_id ) OR '20130922000000' < rev_timestamp ) AND rev_timestamp <= '20130926000000' ;" > dewiki-misses.txt
Massaging the data
Now that I had dumps of page titles, I needed to turn them into full URLs for Reedy to use to do the purges.
First, URL-encode the titles:
perl -pi.raw -e 'chomp();s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg;print "\n"' enwiki-misses.txt dewiki-misses.txt
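To sanity-check the encoding on a single sample title before touching the real dumps (my example, not from the original run), the same substitution can be run over a pipe; note that it percent-encodes every non-alphanumeric byte, multi-byte UTF-8 characters included:
printf '%s' 'Café_(disambiguation)' |
  perl -pe 's/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg'
# expected output: Caf%C3%A9%5F%28disambiguation%29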
Then turn them into full URLs and make sure there aren't any dupes:
perl -pi.enc -e 's@^@http://de.wikipedia.org/@' dewiki-misses.txt
mv dewiki-misses.txt dewiki-misses.txt.url
sort -u dewiki-misses.txt.url > dewiki-misses.txt
perl -pi.enc -e 's@^@http://en.wikipedia.org/@' enwiki-misses.txt
mv enwiki-misses.txt enwiki-misses.txt.url
sort -u enwiki-misses.txt.url > enwiki-misses.txt
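A couple of quick sanity checks on the finished lists before handing them off (generic shell, my addition):
# Count the URLs in each list and eyeball the first few entries
wc -l enwiki-misses.txt dewiki-misses.txt
head -n 3 enwiki-misses.txt dewiki-misses.txt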
Now What?
After a lot of futzing around I scp'd the files to a web-accessible location where Reedy picked them up and applied them.
Here's the start of a script that may help with this the next time it happens.
#!/usr/bin/env bash
# Find all wiki pages changed between a given start and end date.
# TODO:
# - Add command line flags
# - Add namespace support

LANG=${1:-en}
WIKI=${LANG}wiki
START_TS=20130922000000
END_TS=20130926000000

# NOTE: only gets changes to namespace 0 (Main)
mysql --defaults-file="${HOME}/replica.my.cnf" -h ${WIKI}.labsdb \
  ${WIKI}_p --skip-column-names -e "
  SELECT page_title
  FROM page, revision
  WHERE page_id = rev_page
  AND rev_id = page_latest
  AND page_namespace = 0
  AND ( ( '${START_TS}' = rev_timestamp AND '-1' < page_id )
    OR '${START_TS}' < rev_timestamp )
  AND rev_timestamp <= '${END_TS}'
  ;" |
perl -e 'while (<STDIN>) {
  next if (/^\s+$/);
  chomp;
  s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg;
  print $_, "\n";
}' |
perl -p -e "s@^@http://${LANG}.wikipedia.org/wiki/@" |
sort -u > ${WIKI}-changed.txt

# vim:sw=2 ts=2 sts=2 et:
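Assuming the script above is saved as find-changed.sh on the Tools login host (the file name is mine; the notes never give one), a run for German Wikipedia would look like:
bash find-changed.sh de   # writes dewiki-changed.txt in the current directory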
Maintenance Scripts
purgeChangedPages
Tim suggested that I go ahead and make a real maintenance script for this. See gerrit:86883 for progress on the new purgeChangedPages script.
Once the maintenance script is approved, somebody (maybe even me) will use it to send purges to esams for all wikis.
Testing
Tested in beta to see if it really does what it is supposed to do.
ssh deployment-bastion.pmtpa.wmflabs
- Check that the script runs at all:
mwscript purgeChangedPages.php enwiki --starttime 20130922000000 --endtime 20130926000000 -v --dry-run
- manually examined the generated list of changed pages
- Check that PURGE messages are received by Varnish:
mwscript purgeChangedPages.php enwiki --starttime 20130922000000 --endtime 20130926000000 -v
- watched PURGE requests come in on deployment-cache-text1.pmtpa.wmflabs with
varnishncsa -n frontend|grep PURGE
- Check that using the --htcp-dest flag does something useful (a tcpdump alternative to htcpsnoop is sketched after this list):
- started htcpsnoop to listen for HTCP packets at 127.0.0.1:31337 on deployment-bastion.pmtpa.wmflabs
mwscript purgeChangedPages.php enwiki --starttime 20130922000000 --endtime 20130926000000 -v --htcp-dest=127.0.0.1:31337
- watched HTCP arrive on python daemon
- verified that no PURGE requests come in on deployment-cache-text1.pmtpa.wmflabs at same time
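For future runs, if htcpsnoop is not handy, the same UDP traffic can be watched with tcpdump instead (my addition, not part of the test above; assumes tcpdump is installed and sudo is available on the bastion):
# Watch loopback for the UDP packets sent to --htcp-dest=127.0.0.1:31337;
# -A prints the packet payload so the purged URLs inside the HTCP messages are visible.
sudo tcpdump -i lo -A -n udp port 31337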
Production purge
- Done Find out when script will land on terbium: Thursday 2013-10-10
- Done terbium:/usr/local/apache/common/php-1.22wmf21/maintenance/purgeChangedPages.php
- Done terbium:/usr/local/apache/common/php-1.22wmf20/maintenance/purgeChangedPages.php
- Done Get IP address of hooft to target HTCP packets
- 91.198.174.113:4827
- Ori said that he verified that UDP packets can traverse to there from terbium
- Done Schedule deployment window with Greg
- Done Announce intentions to ops-l
- Done Get updated patch with rate limit approved
- Done Get updated patch with rate limit pushed to terbium
- Done Wait... for... deployment... window...
- Done ssh terbium
- Done Purge all the things!
while read wiki; do
  echo "== ${wiki} =="
  LOG="purge-${wiki}-$(date +%Y%m%dT%H%M).log"
  date >${LOG}
  mwscript purgeChangedPages.php $wiki \
    --starttime 20130922000000 \
    --endtime 20130926000000 \
    --htcp-dest 91.198.174.113:4827 \
    --sleep-per-batch 500 \
    --verbose |
    tee -a ${LOG}
done </usr/local/apache/common/all.dblist
Purge post-mortem
- Started at 2013-10-16T20:01Z
- Brandon provided Ops support
- Brandon discovered that traffic was not arriving at esams as expected
- Changed HTCP traffic target to 91.198.174.106 (nescio)
- Still couldn't get packets into esams
- Mark pointed out that terbium has no route to external addresses at about the same time Brandon figured that out
- Mark suggested directing traffic to the eqiad UDP relay host dobson @ 208.80.152.173
- Discussion was had over rate of sending packets and overflow possibility
- Decided to reduce batch size from 100 to 10 and adjust delay to 50ms
- Restarted at 2013-10-16T21:08Z
- Brandon became concerned by IOWAIT on cp3xxx
- Stopped run in the middle of commonswiki
- Brandon investigated historical IOWAIT and decided it was a daily trend
- Decided that we should add an additional delay just in case
- I changed the runner script to make it possible to skip the wikis that we had already purged
- Restarted at 2013-10-16T21:27Z with 100ms delay and skipping wikis before commonswiki
- Process continued to run with no further interruption until completion at 2013-10-17T02:43Z
- Approximately 1.6M purges were sent
Final driver script used:
#!/usr/bin/env bash
set -o nounset
set -o errexit
LOGDIR="./run-$(date +%Y%m%dT%H%M)"
mkdir ${LOGDIR}
#hooft: RELAY=91.198.174.113
# dobson
RELAY=208.80.152.173
START_AT=commonswiki
while read wiki; do
  if [[ ${wiki} < ${START_AT} ]]; then
    echo "> SKIPPING ${wiki}"
    continue
  fi
  echo "== ${wiki} =="
  LOG="${LOGDIR}/${wiki}.log"
  date >${LOG}
  mwscript purgeChangedPages.php $wiki \
    --starttime 20130922000000 \
    --endtime 20130926000000 \
    --htcp-dest ${RELAY}:4827 \
    --batch-size 10 \
    --sleep-per-batch 100 \
    --verbose |
    tee -a ${LOG}
  date >>${LOG}
done </usr/local/apache/common/all.dblist
purgeChangedFiles
Faidon and Aaron volunteered me to create a similar script for Files. After discussion with Aaron, I think the plan is to rename the existing purgeDeletedFiles maintenance script to purgeChangedFiles and add some new command line switches:
- --type <created|modified|deleted|all>
- Types of file changes to send purges for
- --htcp-dest <IP:port>
- HTCP announcement destination (IP:port)
The type flag will control which log messages are examined:
- created
- upload/upload, import/upload, import/interwiki
- modified
- upload/overwrite, upload/revert, move/move, move/move_redir
- deleted
- delete/delete, delete/revision, suppress/delete, suppress/revision
The move/* logs will need special handling to purge not only the original page but also the move destination.
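If the rename lands as planned, a run might look something like the sketch below. This is purely hypothetical: only the --type and --htcp-dest switches come from the plan above, and the remaining flags are assumed to carry over from purgeChangedPages.
# Hypothetical invocation; flag names other than --type and --htcp-dest are assumptions
mwscript purgeChangedFiles.php commonswiki \
  --starttime 20130922000000 \
  --endtime 20130926000000 \
  --type modified \
  --htcp-dest 208.80.152.173:4827 \
  --verbose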