Requests for comment/Removing hit counters from MediaWiki core

Done in https://gerrit.wikimedia.org/r/#/c/150699/

Request for comment (RFC)
Removing hit counters from MediaWiki core
Component General
Creation date
Author(s) MZMcBride
Document status implemented

This is a request for comments regarding removing hit counters from MediaWiki core.

Background

Introduced in rev:45 and rev:51!

Problem

Approach is fatally flawed:

  • useless metric
  • crazy-simple (and consequently crazy inflexible) metric
    • no concept of date, just a "stupid" stat: it has no granularity for per-month or per-day
  • doesn't work with any kind of front-end caching
  • misleading/inaccurate
  • disabled on any large site; hurts performance when enabled (it's enabled by default, bizarrely) as it hits the database on every page view
  • unmaintained
  • better served by an extension
    • Extension:Google Analytics Integration or similar
  • hit count data is lost when a page is deleted; when the page is restored, it starts back over at zero.
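
Two of the points above (front-end caching and loss of counts on deletion) can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Wiki` and `CachingProxy` classes are hypothetical stand-ins, not MediaWiki code, and the cache is assumed to serve every repeat view without contacting the backend.

```python
from dataclasses import dataclass, field


@dataclass
class Wiki:
    """Toy backend: one integer per page, bumped only when the backend renders."""
    counters: dict[str, int] = field(default_factory=dict)

    def render(self, title: str) -> str:
        # One counter write per backend render; no date or visitor information.
        self.counters[title] = self.counters.get(title, 0) + 1
        return f"<html>{title}</html>"

    def delete(self, title: str) -> None:
        # The count is stored with the page, so deleting the page discards it.
        self.counters.pop(title, None)


@dataclass
class CachingProxy:
    """Hypothetical front-end cache: repeat views never reach the backend."""
    backend: Wiki
    cache: dict[str, str] = field(default_factory=dict)
    served: int = 0

    def get(self, title: str) -> str:
        self.served += 1
        if title not in self.cache:
            self.cache[title] = self.backend.render(title)
        return self.cache[title]


wiki = Wiki()
proxy = CachingProxy(wiki)
for _ in range(100):
    proxy.get("Main Page")

real_views = proxy.served                   # views actually served to readers
counted_views = wiki.counters["Main Page"]  # views the counter saw

wiki.delete("Main Page")
wiki.render("Main Page")                    # page restored and viewed once
after_restore = wiki.counters["Main Page"]

print(real_views, counted_views, after_restore)
```

In this model the cache serves 100 views while the counter records one, and the count starts over after the page is deleted and restored.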

Proposal

Some think counters should remain in MediaWiki core, some don't.

Trivia: https://en.wikipedia.org/wiki/Talk:December_2002

How to kill:

  • remove global configuration variable
  • remove database column (page.page_counter)
    • update documentation
    • separate ticket for Wikimedia wikis
  • kill Special page (MostPopular or something)
  • kill footer message and any other messages
  • grep code for remnants

Proposal 2

Remove the junky counter, which fails with AJAX, and implement a JavaScript/AJAX counter that can deal with caching, be it nginx, Varnish, or Apache. 666threesixes666 (talk) 17:39, 18 May 2015 (UTC)

Notifications

  • ...

Comments

People like stats; I don't think it should be removed. I have, however, been wanting for some time to add the needed hooks so one can use a different counter implementation. Platonides (talk) 19:46, 27 November 2013 (UTC)

Platonides: I agree that people like stats. However, these stats are inaccurate and have no granularity. Wikis wanting actual stats install Extension:Google Analytics or similar. I'm not sure why core needs its own hit counters when the cost is so much higher than the benefit. We know that all Wikimedia wikis do not and cannot use built-in hit counters. Which wikis are effectively using them?
My organisation's wiki was effectively using them, for hit-by-hit monitoring of visitor activity across sets of related pages. We only get a few thousand visitors per month, and for us, the stats appeared to be 100% accurate and very useful. We also use GoogleAnalytics, but the two sets of stats present differently, and have different uses. GA is great for showing trends and current activity to the back-office, but the onboard stats show long-term popularity, page by page, to visitors and editors as they browse. ErkDemon (talk) 18:28, 6 June 2015 (UTC)
ErkDemon: How was your organization able to measure the number of hits per day or per month or per year? Did you do your own record-keeping? --MZMcBride (talk) 00:14, 26 June 2015 (UTC)
ErkDemon didn't mention needing such a thing. The message was about statistics on "long-term popularity" being useful to them. --Nemo 06:42, 26 June 2015 (UTC)
Regarding hooks and such, I believe most engineers in this field have decided that using client-side JavaScript is the best way to gather analytics. ResourceLoader and site-wide JavaScript pages (such as MediaWiki:Common.js) already cover most of what you'd need, right? --MZMcBride (talk) 19:00, 2 December 2013 (UTC)

The stats are good for smaller wikis that don't have any other stats solution. They have to be in place from the time of installation if they're going to be accurate. They can be switched off later if the system administrator doesn't want them.

In what way are they too inflexible? Don't config settings like Manual:$wgHitcounterUpdateFreq give us some flexibility? I am curious, though: how much of a performance hit do wikis take when they use hitcounters? Leucosticte (talk) 07:06, 28 November 2013 (UTC)

The current implementation is a single integer field in the page table. There's no granularity: there's no additional information stored about unique v. non-unique visitors, browser user agents, geolocation data, etc.
As for performance, if hit counters are enabled (as they are by default) and there's no front-end caching, you're writing to the database on every page view. For small sites, this is negligible for performance. For larger sites, it can be a major performance bottleneck. No large wiki would ever keep this setting enabled, which is why guides such as User:Aaron Schulz/How to make MediaWiki fast and Five minutes of MediaWiki performance tuning recommend disabling counters. Even on sites that keep counters enabled, I fail to see the value if the stats are simply wrong. --MZMcBride (talk) 19:00, 2 December 2013 (UTC)
Wikia has globally turned off all this code as well due to performance reasons. It doesn't scale for wikis with any kind of traffic (thousands of hits per day is probably okay, but not millions). site_stats has a similar lack of time granularity, and we disabled ss_total_views because it's also a useless metric that causes performance problems, but most of the other fields there are reasonable. I can see the argument for having a simple stats mechanism, but I personally feel like that should be an extension, not a part of core. Owyn (talk) 22:04, 2 December 2013 (UTC)
Okay ... but if the feature's being downgraded from a core feature to an extension, you write the replacement extension before removing the core feature. Otherwise you're making the situation worse. Just because the guys running high-traffic sites can't use the onboard stats and aren't interested in them, it doesn't mean that the feature should be taken away from everyone else. ErkDemon (talk) 18:28, 6 June 2015 (UTC)
I'm sure there are people who find this basic counter enough for their needs. There is no need to erase this from existence; core should provide proper hooks anyway, and then we can move this to an extension. Krinkle (talk) 01:13, 8 February 2014 (UTC)
There should be more than enough hook points to handle this. Off the top of my head you could grab one of the viewing-related hooks to do the hitcounter++ increment. If an extension wants to implement rudimentary hitcounting then they're more than welcome to. ^demon[omg plz] 19:07, 18 August 2014 (UTC)
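
The performance concern raised in this thread (one database write per page view) and the idea of updating less often can be sketched roughly as follows. This is an illustration only, not how MediaWiki or any extension implements it: the `BufferedHitCounter` class, the `page` table layout, and the `flush_every` parameter are all assumptions made for the example, and an in-memory SQLite database stands in for the wiki's database.

```python
import sqlite3
from collections import Counter


class BufferedHitCounter:
    """Hypothetical helper: collect hits in memory, write them out in batches."""

    def __init__(self, db: sqlite3.Connection, flush_every: int) -> None:
        self.db = db
        self.flush_every = flush_every  # flush after this many buffered hits
        self.pending: Counter = Counter()
        self.pending_total = 0
        self.writes = 0  # number of counter UPDATEs issued so far

    def hit(self, title: str) -> None:
        # Record the view in memory; touch the database only when the buffer is full.
        self.pending[title] += 1
        self.pending_total += 1
        if self.pending_total >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        # One UPDATE per distinct page in the buffer, instead of one per view.
        for title, n in self.pending.items():
            self.db.execute(
                "INSERT OR IGNORE INTO page (title, counter) VALUES (?, 0)", (title,)
            )
            self.db.execute(
                "UPDATE page SET counter = counter + ? WHERE title = ?", (n, title)
            )
            self.writes += 1
        self.db.commit()
        self.pending.clear()
        self.pending_total = 0


def stored(db: sqlite3.Connection, title: str) -> int:
    """Return the counter value currently stored for a page (0 if absent)."""
    row = db.execute("SELECT counter FROM page WHERE title = ?", (title,)).fetchone()
    return row[0] if row else 0


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page (title TEXT PRIMARY KEY, counter INTEGER NOT NULL)")

counter = BufferedHitCounter(db, flush_every=10)
for _ in range(25):
    counter.hit("Main Page")

stored_before_flush = stored(db, "Main Page")  # last 5 hits are still buffered
counter.flush()
stored_after_flush = stored(db, "Main Page")

print(stored_before_flush, stored_after_flush, counter.writes)
```

The trade-off in such a design is that the stored value lags behind the real count until the next flush, and buffered hits are lost if the process exits before flushing.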

Hit statistics requirements

I have been playing with Piwik for a while for traffic analysis and other statistics. Like Google Analytics, though, Piwik is way more than a hit counter. And the more powerful an analytics system is, the more one wants to know :) So, one issue I came across several times is the question of what information I need in order to get good statistics. Only once we know the requirements can we talk about whether we need a hook or any other mechanism to get this info to the analytics software. Here are some suggestions:

  • Article title
  • Namespace
  • Category
  • User (?)

All of these are readily available client side. In my own example, I also had the need to track only pages that contained a certain tag. A standard way of hooking in this information might be a good idea! --Mglaser (talk) 14:28, 31 December 2013 (UTC)

Indeed, Piwik is no replacement for the hit counter; it's something totally different. Hit counters are all most users need (as demonstrated by the popularity of http://stats.grok.se ). --Nemo 07:38, 9 August 2014 (UTC)

Alternatives?

I can't see why the page_counter was deleted when there were no viable alternatives for small wikis to display hit counters. We're talking about wikis that only get a handful of visits each month, so the counter (which I assume was derived from blogosphere hit counters) is vital to the website for gauging a page's popularity (is it popular? make more of them; is it never visited? do something about it). Instead of removing it, why not fix it, for example by adding two kinds of counter: lifetime and monthly (reset to 0 every month)? There are still many better ideas than just dumping it with no alternatives (Extension:Google Analytics was marked unstable, by the way). Bennylin (talk) 10:06, 29 May 2015 (UTC)

Well, it seems that nowadays, if a feature isn't useful to the Big Users and commercial wiki farms like Wales' wikia.com, it now gets not just disabled by default but deleted from the software. Sure, those guys pay the bills, but actually deleting functions that those guys happen not to use seems to be going a little bit too far. Surely there was an intermediate option between the feature being "on-by-default" until disabled, and the "nuclear option" of deleting the code altogether? ErkDemon (talk) 18:28, 6 June 2015 (UTC)
The hit counters in MediaWiki have always been poorly optimized, which is why they get immediately disabled on larger wikis (too much database churn). Thus the largest voices say the feature is useless and the smaller voices do not really say anything since it does not affect them yet. The new Extension:HitCounters looks to bridge that gap by making the code optimized and perform well. Alexia E. Smith (talk) 19:18, 28 July 2015 (UTC)
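
The lifetime-plus-monthly idea suggested at the top of this thread could look roughly like the sketch below. It is a hypothetical illustration, not code from MediaWiki or Extension:HitCounters: the `PageViews` class and its method names are invented for the example, and it keeps one bucket per calendar month rather than resetting a single monthly counter to 0, so that past months stay available.

```python
from collections import defaultdict
from datetime import date


class PageViews:
    """Hypothetical counter with a lifetime total and per-month buckets."""

    def __init__(self) -> None:
        self.lifetime: defaultdict = defaultdict(int)  # title -> total views
        self.monthly: defaultdict = defaultdict(int)   # (title, "YYYY-MM") -> views

    def record(self, title: str, day: date) -> None:
        # Every view counts toward the lifetime total and its calendar month.
        self.lifetime[title] += 1
        self.monthly[(title, day.strftime("%Y-%m"))] += 1

    def month_total(self, title: str, year: int, month: int) -> int:
        # .get() avoids creating empty buckets for months with no views.
        return self.monthly.get((title, f"{year:04d}-{month:02d}"), 0)


views = PageViews()
for day in (date(2015, 5, 1), date(2015, 5, 20), date(2015, 5, 29)):
    views.record("Main Page", day)
for day in (date(2015, 6, 2), date(2015, 6, 6)):
    views.record("Main Page", day)

print(views.lifetime["Main Page"])              # lifetime total
print(views.month_total("Main Page", 2015, 5))  # views in May 2015
print(views.month_total("Main Page", 2015, 7))  # a month with no views
```

Keeping per-month buckets costs one extra row or key per page per month, which is small for the low-traffic wikis this thread is about.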

Hitcounters Extension

Just a follow-up for those looking to keep or return this functionality: there is an extension that restores the page view functionality.

https://www.mediawiki.org/wiki/Extension:HitCounters

Ckoerner (talk) 02:48, 23 July 2015 (UTC)

As of now, not for all. 1) Not for users that cannot install extensions that are still in beta. 2) Not for users that need to use one of the "incompatible extensions". The latter set includes one extension used by Wikimedia, so one is not even safe when only using these. 91.9.113.155 06:47, 28 July 2015 (UTC)
Fixes are in progress in DPL3 to support HitCounters. Alexia E. Smith (talk) 19:19, 28 July 2015 (UTC)
91.9.113.155, what do you mean by "cannot install extensions that are still in beta"? I use a few that are labeled as beta in my production environments. I'm curious as to what you mean by this. Ckoerner (talk) 22:06, 29 July 2015 (UTC)

Security and Privacy Benefits

This subject does not seem to have been discussed, so it may be a good idea to add it. I realize the request for comments is old. We run a small MediaWiki installation, and we are concerned for our users' security and privacy. We use the HitCounters extension. We don't have the scaling problems of large site operators like wikipedia.org.

Here are our observations with respect to HitCounters security and privacy benefits.

  • HitCounters is self-contained
    • Everything needed resides at the site
    • No unexpected code changes by third parties
  • HitCounters is easy to administer
    • Only two settings: $wgDisableCounters and $wgHitcounterUpdateFreq
    • Easy to use correctly, hard to screw up
  • HitCounters uses server-side code
    • Users are not asked to download code from a third party
    • Users are not asked to execute code via JavaScript
    • No need to audit or trust third-party code
  • HitCounters does not track users
    • The extension merely increments a server-side counter
    • The extension does not track users, log visits, etc
    • The extension does not require handling of Do Not Track browser requests
    • The extension does not require handling Global Privacy Control browser settings
  • HitCounters does not share or export data
    • Data remains within the site's security boundary
  • HitCounters does not impose a policy on users
    • The extension does not make a choice for users, like "a user will consent to tracking"

Contrast this with an extension like GoogleAnalyticsMetrics. GoogleAnalyticsMetrics is non-trivial to set up and maintain. Site operators implicitly make a policy decision for users that subjects them to data and privacy concerns. Site operators using GoogleAnalyticsMetrics inject JavaScript code in the name of the extension, and then more un-audited code is fetched from the third party. GoogleAnalyticsMetrics will track users. GoogleAnalyticsMetrics does not honor Do Not Track or Global Privacy Control. The data then becomes property of Google, and not the site's operator.

The moment site operators engaged a third party like Google, a user's security and privacy expectations were abandoned. When security and privacy are accounted for, the GoogleAnalyticsMetrics cure is worse than the HitCounters disease. The patient was effectively killed by the cure.

I think a third option, fixing HitCounters, would have been a better choice. The third option did not seem to be entertained. It appears some folks were more interested in killing off HitCounters than in trying to improve or fix it.

Noloader (talk) 12:32, 15 June 2021 (UTC)