Requests for comment/Structured logging
This is a request for comment about adding Structured logging to MediaWiki. It specifies a data model for MediaWiki log messages and an interface for generating log messages that conform to the model.
Structured logging | |
---|---|
Component | General |
Creation date | |
Author(s) | Bryan Davis, Ori Livneh, Aaron Schulz |
Document status | implemented This RfC was never formally approved, but the gerrit changes to implement PSR-3 support have been merged --BDavis (WMF) (talk) 15:44, 10 April 2015 (UTC) |
By "data model" we simply mean an agreed-upon set of fields containing metadata that describes the context in which the log message was generated. The model specifies the name of each field and the value it can hold. Log messages generated via the interface that we propose below would conform to this model, allowing them to be serialized to a machine-readable format.A standard for machine-readable metadata common to all log messages would make it possible to query, collate, and summarize operational data in ways that are currently very difficult to achieve.
We think that the ability to cross-reference logs and query by context will make troubleshooting bugs easier. We also think that ongoing analysis of aggregated log data would reveal which files, interfaces, and code paths are especially prone to bugs or poor performance, and that this information would help us make MediaWiki more reliable and performant.
Problems with the current interface
editMost operational logging in MediaWiki is done via wfDebugLog
calls. Messages logged via wfDebugLog
specify a topic name (or a log bucket). This name usually identifies the name of the component that is emitting the log message. Some parts of the code which generate different kinds of log messages have compound topic names that describe the type of log message being logged, usually in terms of severity ("memcached-serious", for example). The ad hoc overloading of the log group property to encode severity is a good example of existing usage that is twisting the interface to overcome its limitations. This is a good indication that the current interface is inadequate.
Because there is no established standard for encoding severity, the density of logging calls in MediaWiki code varies greatly. Access to the production logs is limited, and many developers will only ever review logs generated on their development instance and consequently fail to appreciate the cost of excessively verbose logging at scale. The instrumentation of code is often pitched to its initial development rather than its ongoing maintenance.
To keep chatty code from drowning out important log data, the logging setup on the Wikimedia production cluster does not automatically transmit all logging topics to the log processor. A developer must first manually enroll the log bucket by adding it to the $wgDebugLogGroups
configuration var. The problem with this approach is that the absence of log data from a particular component is typically noticed when it is needed the most: that is, when the component is suspected of misbehaving in ways that are difficult to reproduce or reason about. Thus log buckets are usually enabled to help solve a particular bug, and they are commonly left enabled long after the motivation for enabling them has ceased to be relevant. The overall effect is that Wikimedia's logs are curated on the basis of historic interest rather than abiding relevance.
Design principles
editFiltering logs by severity and grouping logs by attributes only works if log messages are uniform in structure and content.
What we would need:
- tools for wider audience
- aggregation
- de-duplication
- cross system correlation
- alerting
- reporting
This is not a wholly new idea. Let's look at what's out there and see if we can find a solution or at least borrow the best bits.
Current logging
edit- wfDebug( $text, $logonly = false )
- Logs developer provided free-form text + optional global prefix string
- Possibly has time-elapsed since start of request and real memory usage inserted between prefix and message
- Delegates to
wfErrorLog()
- wfDebugMem( $exact = false )
- Uses
wfDebug()
to log "Memory usage: N (kilo)?bytes"
- wfDebugLog( $logGroup, $text, $public = true)
- Logs either to a custom log sink defined in
$wgDebugLogGroups
or viawfDebug()
- Default
- Prepends "[$logGroup] " to message
- Custom sink
- May log only a fraction of occurrences via
mt_rand()
sampling - Prepends
wfTimestamp( TS_DB ) wfWikiID() wfHostname():
to message - Delegates to
wfErrorLog()
to actually write to sink
- May log only a fraction of occurrences via
- wfLogDBError( $text )
- Enabled/disabled with
$wgDBerrorLog
sink location - Logs "$date\t$host\t$wiki\$text" via
wfErrorLog()
to$wgDBerrorLog
sink - Date format is
'D M j G:i:s T Y'
with possible custom timezone specified by$wgDBerrorLogTZ
- wfErrorLog( $text, $file )
- Writes
$text
to either a local file or a UDP packet depending on the value of$file
- UDP
- If
$file
ends with a string following the host name/IP it will be used as a prefix to$text
- The final message with optional prefix added will be trimmed to 65507 bytes and a trailing newline may be added
- If
- FILE
$text
will be appended to file unless the resulting file size would be >= 0x7fffffff bytes (~2G)
- wfLogProfilingData()
- Delegates to
wfErrorLog()
using$wgDebugLogFile
sink - Creates a tab delimited log message including timestamp, elapsed request time, requesting IPs, and request URL followed by a newline and the profiler output.
- Date is from
gmdate( 'YmdHis' )
- Recent changes logging
- Transport and serialization format may be specified via
$wgRCFeeds
- Various implementations in
includes/rcfeeds/
, including IRC, UDP & Redis.
Proposal
editSerialization Format
editRather than dive down a rabbit hole of trying to find a universal spec for log file formats let's just keep things simple. PHP loves dictionaries (well they call them arrays but whatever; key=value collections) and has a pretty fast json formatter. So the simplest thing that will work reasonably well would be to keep log events internally as PHP arrays and serialize them as json objects. This will be relatively easy to recreate on other internally developed applications as well with the possible exception of apps written in low level languages such as C that don't have ready made key=value data structures.
Data collected
editHere's a list of the data points that we should definitely have:
- timestamp
- Local system time that event occurred either as UNIX epoch timestamp or ISO 8601 formatted string
date( 'c' )
- host
- FQDN of system where event occurred
php_uname( 'n' )
- source
- Name of application generating events; correlates to APP-NAME of RFC 5424
'Mediawiki'
- pid
- Unix process id, thread id, thread name or other process identifier
getmypid()
- severity
- Message severity (RFC 5424 levels)
'WARN'
- channel
- Log channel. Often the function/module/class creating message (similar to
wgDebugLogGroups
groups) get_class( $this )
- message
- Message body
"Help! I'm trapped in a logger factory!"
Additionally we suggest adding a semi-structured "context" component to logs. This would be a collection of key=value pairs that the developers determine to be useful for debugging. There should be two different methods available to add such data. The first is as an optional argument to the logging method itself and the second is a global collection patterned after the Log4J Mapped Diagnostic Context (MDC).
The local collection is useful for obvious reasons such as attaching class/method state data to the log output and deferring stringification of resources in the event that runtime configuration is ignoring messages of the provided level.
- file
- Source file triggering message
- line
- Source line triggering message
- errcode
- Numeric or string identifier for the error
- exception
- Live exception object to be stringified by the log event emitter
- args
- key=value map of method arguments
The global collection is very useful for attaching global application state data to all log messages that may be emitted. Examples of data that could be included:
- vhost
- Apache vhost processing request
$_SERVER['HTTP_HOST']
- ip
- Requesting ip address
$_SERVER['REMOTE_ADDR']
- user
- Authenticated user identity
- req
- Request ID; UUID or similar token that can be used to correlate all log messages connected to a given request
API
editThe developer facing API is the PSR-3 logging interface standard.
<?php
namespace Psr\Log;
/**
* Describes a logger instance
*
* The message MUST be a string or object implementing __toString().
*
* The message MAY contain placeholders in the form: {foo} where foo
* will be replaced by the context data in key "foo".
*
* The context array can contain arbitrary data, the only assumption that
* can be made by implementors is that if an Exception instance is given
* to produce a stack trace, it MUST be in a key named "exception".
*
* See https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-3-logger-interface.md
* for the full interface specification.
*/
interface LoggerInterface
{
/**
* System is unusable.
*
* @param string $message
* @param array $context
* @return null
*/
public function emergency($message, array $context = array());
/**
* Action must be taken immediately.
*
* Example: Entire website down, database unavailable, etc. This should
* trigger the SMS alerts and wake you up.
*
* @param string $message
* @param array $context
* @return null
*/
public function alert($message, array $context = array());
/**
* Critical conditions.
*
* Example: Application component unavailable, unexpected exception.
*
* @param string $message
* @param array $context
* @return null
*/
public function critical($message, array $context = array());
/**
* Runtime errors that do not require immediate action but should typically
* be logged and monitored.
*
* @param string $message
* @param array $context
* @return null
*/
public function error($message, array $context = array());
/**
* Exceptional occurrences that are not errors.
*
* Example: Use of deprecated APIs, poor use of an API, undesirable things
* that are not necessarily wrong.
*
* @param string $message
* @param array $context
* @return null
*/
public function warning($message, array $context = array());
/**
* Normal but significant events.
*
* @param string $message
* @param array $context
* @return null
*/
public function notice($message, array $context = array());
/**
* Interesting events.
*
* Example: User logs in, SQL logs.
*
* @param string $message
* @param array $context
* @return null
*/
public function info($message, array $context = array());
/**
* Detailed debug information.
*
* @param string $message
* @param array $context
* @return null
*/
public function debug($message, array $context = array());
/**
* Logs with an arbitrary level.
*
* @param mixed $level
* @param string $message
* @param array $context
* @return null
*/
public function log($level, $message, array $context = array());
}
MediaWiki\Logger\LoggerFactory provides static methods for registering PSR-3 providers and getting PSR-3 Logger instances from the current provider:
/**
* Get a named logger instance from the currently configured logger factory.
*
* @param string $channel Logger channel (name)
* @return Psr\Log\LoggerInterface
*/
public static function getInstance( $channel );
/**
* Register a service provider to create new \Psr\Log\LoggerInterface
* instances.
*
* @param \MediaWiki\Logger\Spi $provider Provider to register
*/
public static function registerProvider( Spi $provider );
The MediaWiki\Logger\LoggerFactory::getInstance()
method is the means by which most code would acquire an Psr\Log\LoggerInterface instance. It will in turn delegate the creation of Psr\Log\LoggerInterface to a class implementing the MediaWiki\Logger\Spi
interface:
namespace MediaWiki\Logger;
interface Spi {
/**
* Get a logger instance.
*
* @param string $channel Logging channel
* @return Psr\Log\LoggerInterface Logger instance
*/
public function getLogger( $channel );
}
This service provider interface will allow the backend logging library to implemented in multiple ways. The $wgMWLoggerDefaultSpi
global provides the class name of the default MediaWiki\Logger\Spi
implementation. This can be altered via the normal means. Alternately MediaWiki\Logger\LoggerFactory::registerProvider()
can be invoked early in the application setup to inject an alternate SPI implementation.
Managing third-party libraries
editThe use of PSR-3 and monolog introduces the need to manage third-party code dependencies for MediaWiki core. Although there are some third-party components in includes/libs, to my knowledge this is the first large scale use of external PHP code by MediaWiki core.
In a more perfect world, MediaWiki would already be a system that assembled a collection of libraries using a well defined dependency management system. This has actually been envisioned in at least two RFCs (MediaWiki libraries and Third-party components).
Several possible solutions were examined and ultimately the use of Composer as a dependency management system was chosen. The use of PSR-3 is mandatory so this library was added to the "require" section of composer.json
in the mediawiki/core.git repository. The use of monolog is optional, so it was added to the "suggest" section. Since this composer.json
file is in the root of the MediaWiki core project, it does to some extent conflict with the Extension management with Composer RFC. Additional work is being done to mitigate this problem for users who were already making use of composer.json
(phab:T67188).
The Composer managed libraries for use on WMF cluster RFC was written and subsequently approved to deal with the process of staging and deploying Composer managed content in the WMF production environment. Installations of MediaWiki via git clones are expected to manage libraries themselves by using the mediawiki/vendor.git
repo, using Composer directly, or creating a mediawiki/vendor.git
work-alike for their particular deployment. Releases will be expected to include the required libraries in the official tarball.
Implementation
editAn original proof of concept implementation was submitted as Gerrit change 112699. The approach of using Composer to manage external dependencies was given a +1 by Tim. The monolithic patch was then split into four smaller patches for closer review and approval:
- Gerrit change 119939 Add Composer managed libraries Done
- Require Psr\Log and suggest Monolog libraries to MW-Core via
composer.json
. - Gerrit change 119940 Add a PSR-3 based logging interface Done
- The MWLogger class is actually a thin wrapper around any PSR-3 LoggerInterface implementation. Named MWLogger instances can be obtained from the MWLogger::getInstance() static method. MWLogger expects a class implementing the MWLoggerSpi interface to act as a factory for new MWLogger instances. A concrete MWLoggerSpi implementation using the Monolog library is also provided.
- Gerrit change 119941 Enable MWLogger logging for legacy logging methods Done
- Introduces the
MWLoggerLegacyLogger
logger to mimic legacy logging behavior as a PSR-3 compliant logger component and sets it as the default$wgMWLoggerDefaultSpi
implementation. wfDebug, wfDebugLog and wfLogDBError are changed to useMWLogger
. - Gerrit change 119942 Enable MWLogger logging for wfLogProfilingData Done
- Output structured profiling report data from wfLogProfilingData.
- Gerrit change 184830 Replace MWLogger with MWLoggerFactory
- Design change to replace MWLogger with direct use of PSR-3 interfaces in MediaWiki. Done
- Gerrit change 198674 Move MWLogger classes to MediaWiki\Logger namespace
- Move the MWLogger PSR-3 logging related classes into the MediaWiki\Logger namespace. Create shim classes to ease migration of existing MWLoggerFactory usage to the namespaced classes. Done
See Also
edit- Relevant bugs
- Related RFCs
- RFC Review
- Logging standards
- Additional commentary on logging
-
- User:BDavis_(WMF)/Projects/Structured_logging
- http://gregoryszorc.com/blog/2012/12/06/thoughts-on-logging---part-1---structured-logging/
- https://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
- http://carolina.mff.cuni.cz/~trmac/blog/2011/structured-logging/
- http://dev.splunk.com/view/logging-best-practices/SP-CAAADP6
- https://delicious.com/bd808/logging