PoolCounter

PoolCounter is a network daemon which provides mutex-like functionality, with a limited wait queue length. If too many servers try to do the same thing at the same time, the wait queue overflows and some configurable action might be taken by subsequent clients, such as displaying an error message or using a stale cache entry.

It was created to avoid wasting large amounts of CPU on parallel parsing when the cache of a popular article is invalidated (the "Michael Jackson problem"), but has since been put to other uses as well, such as limiting thumbnail scaling requests.

MediaWiki uses PoolCounter via an abstract interface (see $wgPoolCounterConf) which allows alternative implementations.

Source

The implementation is located in multiple places:

There is also a Redis-based default implementation in MediaWiki core, and an experimental Python client for the daemon in Thumbor.

Architecture

The server is a single-threaded C program based on libevent. It does not use autoconf; it just has a makefile which is suitable for a normal Linux environment. It currently has no daemonize code, and so is backgrounded by systemd.

In MediaWiki, the client must be a subclass of PoolCounter and the class holding the application-specific logic must be a subclass of PoolCounterWork. See Manual:$wgPoolCounterConf#Usage for details.

Protocol

The network protocol is line-based, with parameters separated by spaces (spaces in parameters are percent-encoded). The client opens a connection, sends a lock acquire command, does the work, sends a lock release command, then closes the connection. The following commands are defined:

ACQ4ANY <key> <active worker limit> <total worker limit> <timeout>
This is used to acquire a lock when the client is capable of using the cache entry generated by another process. If the active pool worker limit is exceeded, the server will give a delayed response to this command. When a client completes its work, all processes which are waiting with ACQ4ANY will immediately be woken so that they can read the new cache entry.
ACQ4ME <key> <active worker limit> <total worker limit> <timeout>
This is used to acquire a lock when cache sharing is not possible or not applicable, for example when an article rendering request involves a non-default stub threshold. When a lock of this kind is released, only one waiting process will be woken, so as to keep the worker population the same.
RELEASE <key>
Releases a lock.
STATS [FULL|UPTIME]
Shows statistics.
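
As a rough illustration of the command format described above, the following Python sketch builds protocol lines. The helper names and the example key are hypothetical, and the code assumes that only spaces need to be escaped (as %20) and that each command ends with a newline; check the daemon source before relying on either assumption.

```python
def encode_param(value) -> str:
    # Assumption: only spaces are percent-encoded, as the protocol
    # description says; other characters are passed through unchanged.
    return str(value).replace(" ", "%20")


def build_command(verb: str, *params) -> bytes:
    # Parameters are separated by single spaces; the line is assumed
    # to be terminated by a newline.
    parts = [verb] + [encode_param(p) for p in params]
    return (" ".join(parts) + "\n").encode("utf-8")


# Hypothetical key and limits, for illustration only:
# 2 active workers, 100 workers in total, 15 second timeout.
acquire = build_command("ACQ4ANY", "example:article:Michael Jackson", 2, 100, 15)
release = build_command("RELEASE", "example:article:Michael Jackson")
print(acquire)
print(release)
```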

The possible responses for ACQ4ANY/ACQ4ME:

LOCKED
successfully acquired a lock. Client is expected to do the work, then send RELEASE.
DONE
sent to wake up a waiting client
QUEUE_FULL
there are more workers than <total worker limit>
TIMEOUT
there are more workers than <active worker limit>; no slot was freed up after waiting for <timeout> seconds
LOCK_HELD
trying to get a lock when one is already held

For RELEASE:

NOT_LOCKED
client asked to release a lock that did not exist
RELEASED
lock successfully released

For any command:

ERROR <message>
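
The acquire, work, release cycle and the responses listed above can be sketched in Python as follows. This is a minimal sketch rather than a reference client: the function names and the three callbacks (do_work, use_cache, fall_back) are hypothetical, and newline-terminated lines with spaces escaped as %20 are assumptions.

```python
import socket


def send_line(sock_file, *params):
    # Assumption: spaces inside parameters are escaped as %20 and each
    # command is terminated by a newline.
    line = " ".join(str(p).replace(" ", "%20") for p in params)
    sock_file.write((line + "\n").encode("utf-8"))
    sock_file.flush()


def read_status(sock_file):
    # Returns (status, detail); detail is the message after "ERROR".
    line = sock_file.readline().decode("utf-8").strip()
    if not line:
        raise ConnectionError("connection closed before a response arrived")
    status, _, detail = line.partition(" ")
    return status, detail


def run_with_lock(sock, key, workers, maxqueue, timeout,
                  do_work, use_cache, fall_back):
    """Acquire with ACQ4ANY on an already connected socket and react to the reply."""
    f = sock.makefile("rwb")
    send_line(f, "ACQ4ANY", key, workers, maxqueue, timeout)
    status, detail = read_status(f)
    if status == "LOCKED":
        try:
            return do_work()
        finally:
            # Always release, even if the work raised an exception.
            send_line(f, "RELEASE", key)
            read_status(f)  # RELEASED or NOT_LOCKED
    if status == "DONE":
        # Another worker finished; its cache entry can be read.
        return use_cache()
    if status in ("QUEUE_FULL", "TIMEOUT"):
        # Overload: for example show an error or serve a stale entry.
        return fall_back(status)
    # LOCK_HELD, ERROR <message>, or anything unexpected.
    raise RuntimeError(f"unexpected response: {status} {detail}".strip())
```

A caller would typically open the connection with socket.create_connection((host, port)), pass the socket to run_with_lock, and close it afterwards. For ACQ4ME the same flow applies, except that a DONE response is not expected to be useful because the result cannot be shared.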

Configuration

The server does not require configuration. Configuration of pool sizes, wait timeouts, etc. is done dynamically by the client.

For MediaWiki-side configuration, see

Testing

$ echo 'STATS FULL' | nc -w1 localhost 7531 
uptime: 633 days, 15209h 42m 26s
total processing time: 85809 days 2059430h 0m 24.000000s
average processing time: 0.957994s
gained time: 1867 days 44820h 50m 24.000000s
waiting time: 390 days 9365h 18m 24.000000s
waiting time for me: 389 days 9343h 3m 28.000000s
waiting time for anyone: 22h 14m 53.898438s
waiting time for good: 520 days 12503h 48m 24.000000s
wasted timeout time: 473 days 11375h 2m 44.000000s
total_acquired: 7739031655
total_releases: 7736374042
hashtable_entries: 119
processing_workers: 119
waiting_workers: 216
connect_errors: 0
failed_sends: 1
full_queues: 10294544
lock_mismatch: 227
release_mismatch: 0
processed_count: 7739031536
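
For monitoring scripts, the same query can be issued from Python and the "name: value" lines turned into a dictionary. This is a minimal sketch: the function names are hypothetical, the host and port are taken from the example above, and the one-second idle timeout mirrors nc -w1 on the assumption that the server leaves the connection open after replying.

```python
import socket


def query_stats(host="localhost", port=7531, idle_timeout=1.0) -> str:
    # Send STATS FULL and collect output until the server goes quiet
    # or closes the connection.
    chunks = []
    with socket.create_connection((host, port), timeout=idle_timeout) as sock:
        sock.sendall(b"STATS FULL\n")
        try:
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        except socket.timeout:
            pass
    return b"".join(chunks).decode("ascii", errors="replace")


def parse_stats(text: str) -> dict:
    # Split each line at the first colon; plain integer counters become
    # ints, duration strings are kept as they are.
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        stats[name.strip()] = int(value) if value.isdigit() else value
    return stats


# Parsing a few lines copied from the sample output above.
sample = """uptime: 633 days, 15209h 42m 26s
total_acquired: 7739031655
waiting_workers: 216
full_queues: 10294544
"""
stats = parse_stats(sample)
print(stats["waiting_workers"], stats["uptime"])
```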