The Analytics Query Service (AQS) is a set of public-facing APIs that serve analytics data. The AQS datasets are the product of batch compute jobs, the results of which are persisted to Cassandra to facilitate low-latency access. Over time, additional uses of this pattern were identified, and work began to evolve the infrastructure into a shared platform that leverages economies of scale, lowers the barrier to entry, and is more conducive to experimenting with new datasets. The Data Gateway Service is one component of this nascent shared platform.
The Data Gateway Service is an abstraction that sits between consumers of published datasets and the underlying database (currently Cassandra, though other databases are possible as well). It decouples consumers from the database(s), providing them a contract consisting of an HTTP interface and JSON-encoded results (as opposed to an implementation-specific driver and/or idiomatic client library). This simplifies client access considerably, eliminating a great deal of boilerplate and error-prone database handling (hint: lowered barrier to entry).
Additionally, managing bespoke access to database(s) for an arbitrary number of internal teams rapidly becomes problematic. Even something as mundane as a fleet-wide upgrade of native driver code can be prohibitively expensive when code owners are many and varied in resources and priorities, or worse, when code becomes orphaned entirely. Decoupling also opens the (future) possibility of transparently migrating or redistributing datasets among clusters, or of better managing resource utilization, caching, etc.
The premise is simple: candidate datasets are purpose-built, and their consumers expect results that are verbatim (or nearly so) to what is stored (including attribute naming). The Data Gateway is nothing more than a thin layer wiring HTTP semantics to a database table and returning JSON-serialized results (an array of rows containing one or more JSON objects).
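As a rough illustration of what this contract means for a consumer, the sketch below unpacks a response body, assuming the top-level "rows" array shown in the endpoint examples on this page. The function name extract_rows and the abbreviated sample body are hypothetical and not part of the service.

```python
import json
from typing import Any


def extract_rows(body: str) -> list[dict[str, Any]]:
    """Return the list stored under the top-level "rows" key of a response body."""
    payload = json.loads(body)
    rows = payload.get("rows") if isinstance(payload, dict) else None
    if not isinstance(rows, list):
        raise ValueError('expected a JSON object with a top-level "rows" array')
    return rows


# Abbreviated, hypothetical body modelled on the image suggestions example.
sample = """
{"rows": [{"wiki": "anwiki", "page_id": 3326,
           "image": "14_Agosto_2016_(1).jpg", "confidence": 80.0,
           "found_on": null, "kind": ["istype-commons-category"]}]}
"""

for row in extract_rows(sample):
    # Attribute names match the stored columns, so no mapping layer is needed.
    print(row["wiki"], row["page_id"], row["image"], row["confidence"])
```

Because the attribute names are passed through as stored, a client needs only a JSON parser rather than a database driver.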
Source: https://gitlab.wikimedia.org/repos/sre/data-gateway
URL
/public/image_suggestions/suggestions/{wiki}/{page_id}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{
  "rows": [
    {
      "wiki": "anwiki",
      "page_id": 3326,
      "id": "644c90bc-ba40-11ec-ba4c-f0d4e2e69820",
      "image": "14_Agosto_2016_(1).jpg",
      "confidence": 80.0,
      "found_on": null,
      "kind": ["istype-commons-category"],
      "origin_wiki": "commonswiki",
      "page_qid": "Q123",
      "page_rev": 1797958,
      "section_heading": "section title (null if this is an article-level suggestion)",
      "section_index": 1
    },
    ...
  ]
}
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl http://api.example.org/public/image_suggestions/suggestions/anwiki/3326
Notes
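The error contract above can be handled with a few lines on the client side. The following is a minimal sketch, assuming the client already has the response's Content-Type header and body as strings; the Problem class and parse_problem function are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

PROBLEM_JSON = "application/problem+json"


@dataclass
class Problem:
    """The RFC7807 members that appear in the examples on this page."""
    status: int
    type: str
    title: str
    detail: str


def parse_problem(content_type: str, body: str) -> Optional[Problem]:
    """Return a Problem if the response is a problem document, else None."""
    # Ignore media type parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != PROBLEM_JSON:
        return None
    doc = json.loads(body)
    return Problem(
        status=int(doc.get("status", 0)),
        # RFC7807 treats an absent "type" as "about:blank".
        type=doc.get("type", "about:blank"),
        title=doc.get("title", ""),
        detail=doc.get("detail", ""),
    )


error_body = '{"status": 500, "type": "about:blank", "title": "Cassandra query error", "detail": "..."}'
problem = parse_problem("application/problem+json", error_body)
if problem is not None:
    print(problem.status, problem.title)
```

Checking the content type rather than only the status code lets a client distinguish gateway-generated errors from responses produced by intermediaries, which may not follow this format.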
URL
/private/image_suggestions/feedback/{wiki}/{page_id}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 100
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl http://api.example.org/private/image_suggestions/feedback/enwiki/53848
Notes
This table is a duplication of a relationship that MediaWiki is canonical for. It is maintained here for convenience, with the understanding that it is not trustworthy (it should not be considered a source of truth).
URL
/private/image_suggestions/instanceof_cache/{wiki}/{page_id}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 100
Date: Mon, 11 Apr 2022 22:07:59 GMT
{
  "rows": [
    {
      "wiki": "anwiki",
      "page_id": 3326,
      "instance_of": ["Q112099", "Q3624078", "Q6256"],
      "page_rev": 1797958
    }
  ]
}
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl http://api.example.org/private/image_suggestions/instanceof_cache/anwiki/3326
Notes
This table is a duplication of a relationship that MediaWiki is canonical for. It is maintained here for convenience, with the understanding that it is not trustworthy (it should not be considered a source of truth).
URL
/private/image_suggestions/title_cache/{wiki}/{title}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 100
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl http://api.example.org/private/image_suggestions/title_cache/enwiki/Banana
Notes
Commons Impact Metrics
Source: https://gitlab.wikimedia.org/repos/sre/data-gateway
The start and end parameters must be RFC3339 timestamps.
URL
/public/commons/category_metrics_snapshot/{category}/{start}/{end}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
400
Bad Request
{
"status" : 400 ,
"type" : "about:blank" ,
"title" : "Invalid RFC3339 timestamp" ,
"detail" : "Unable to parse timestamp: ..."
}
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/category_metrics_snapshot/:category/:start/:end
Notes
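Filling in the placeholders of a Commons Impact Metrics URL involves two details: formatting start and end as RFC3339 timestamps, and percent-encoding path segments that may contain spaces or slashes. The sketch below shows one way to do both. The helper names, the sample category, and the dates are hypothetical, and it is an assumption that the gateway decodes percent-encoded path segments.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC3339 UTC timestamp."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def commons_path(dataset: str, *segments: str) -> str:
    """Join path segments under /public/commons/, percent-encoding each one."""
    # Colons are legal inside a path segment, so timestamps stay readable.
    encoded = [quote(segment, safe=":") for segment in segments]
    return "/public/commons/" + dataset + "/" + "/".join(encoded)


# Hypothetical values, chosen only to show the resulting shape.
start = rfc3339(datetime(2024, 1, 1, tzinfo=timezone.utc))
end = rfc3339(datetime(2024, 2, 1, tzinfo=timezone.utc))
path = commons_path("category_metrics_snapshot", "Example category", start, end)
print(path)
```

Encoding each segment separately keeps a slash inside a category or file name from being read as a path separator.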
The start and end parameters must be RFC3339 timestamps.
URL
/public/commons/media_file_metrics_snapshot/{media_file}/{start}/{end}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
400
Bad Request
{
"status" : 400 ,
"type" : "about:blank" ,
"title" : "Invalid RFC3339 timestamp" ,
"detail" : "Unable to parse timestamp: ..."
}
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/media_file_metrics_snapshot/:media_file/:start/:end
Notes
Pageviews Per Category Monthly
The start and end parameters must be RFC3339 timestamps.
URL
/public/commons/pageviews_per_category_monthly/{category}/{category_scope}/{wiki}/{start}/{end}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
400
Bad Request
{
"status" : 400 ,
"type" : "about:blank" ,
"title" : "Invalid RFC3339 timestamp" ,
"detail" : "Unable to parse timestamp: ..."
}
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/pageviews_per_category_monthly/:category/:category_scope/:wiki/:start/:end
Notes
Pageviews Per Media File Monthly
The start and end parameters must be RFC3339 timestamps.
URL
/public/commons/pageviews_per_media_file_monthly/{media_file}/{wiki}/{start}/{end}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
400
Bad Request
{
"status" : 400 ,
"type" : "about:blank" ,
"title" : "Invalid RFC3339 timestamp" ,
"detail" : "Unable to parse timestamp: ..."
}
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/pageviews_per_media_file_monthly/:media_file/:wiki/:start/:end
Notes
Edits Per Category Monthly
The start and end parameters must be RFC3339 timestamps.
URL
/public/commons/edits_per_category_monthly/{category}/{category_scope}/{edit_type}/{start}/{end}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
400
Bad Request
{
"status" : 400 ,
"type" : "about:blank" ,
"title" : "Invalid RFC3339 timestamp" ,
"detail" : "Unable to parse timestamp: ..."
}
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/edits_per_category_monthly/:category/:category_scope/:edit_type/:start/:end
Notes
Edits Per User Monthly
The start and end parameters must be RFC3339 timestamps.
URL
/public/commons/edits_per_user_monthly/{user_name}/{edit_type}/{start}/{end}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
400
Bad Request
{
"status" : 400 ,
"type" : "about:blank" ,
"title" : "Invalid RFC3339 timestamp" ,
"detail" : "Unable to parse timestamp: ..."
}
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/edits_per_user_monthly/:user_name/:edit_type/:start/:end
Notes
Top Pages Per Category Monthly
URL
/public/commons/top_pages_per_category_monthly/{category}/{category_scope}/{wiki}/{year}/{month}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/top_pages_per_category_monthly/:category/:category_scope/:wiki/:year/:month
Notes
Top Wikis Per Category Monthly
URL
/public/commons/top_wikis_per_category_monthly/{category}/{category_scope}/{year}/{month}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/top_wikis_per_category_monthly/:category/:category_scope/:year/:month
Notes
Top Viewed Categories Monthly
URL
/public/commons/top_viewed_categories_monthly/{category}/{category_scope}/{wiki}/{year}/{month}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/top_viewed_categories_monthly/:category/:category_scope/:wiki/:year/:month
Notes
Top Pages Per Media File Monthly
URL
/public/commons/top_pages_per_media_file_monthly/{media_file}/{wiki}/{year}/{month}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/top_pages_per_media_file_monthly/:media_file/:wiki/:year/:month
Notes
URL
/public/commons/top_wikis_per_media_file_monthly/{media_file}/{year}/{month}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/top_wikis_per_media_file_monthly/:media_file/:year/:month
Notes
URL
/public/commons/top_viewed_media_files_monthly/{category}/{category_scope}/{wiki}/{year}/{month}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/top_viewed_media_files_monthly/:category/:category_scope/:wiki/:year/:month
Notes
Top Edited Categories Monthly
URL
/public/commons/top_edited_categories_monthly/{category}/{category_scope}/{edit_type}/{year}/{month}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/top_edited_categories_monthly/:category/:category_scope/:edit_type/:year/:month
Notes
URL
/public/commons/top_editors_monthly/{category}/{category_scope}/{edit_type}/{year}/{month}
Method
GET
Params
None
Data
None
Success
Example:
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 5000
Date: Mon, 11 Apr 2022 22:07:59 GMT
{ "rows": [ ... ] }
Error
Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code
Reason
Example
500
Internal server error
{
"status" : 500 ,
"type" : "about:blank" ,
"title" : "Cassandra query error" ,
"detail" : "An unknown error occurred, contact the administrator(s) ..."
}
Example
$ curl https://api.example.org/public/commons/top_editors_monthly/:category/:category_scope/:edit_type/:year/:month
Notes