Manual talk:$wgSpamRegex

  • The "Large Example" shows in the article:
 "\<</span>\s*a\s*href|".  # This blocks all href links entirely, forcing wiki syntax

In the source:

"\<\s*a\s*href|".  # This blocks all href links entirely, forcing wiki syntax

So this is a parser issue? The first will not work because the "/" delimiter ends the regex early; it fails with the error "Unknown modifier 'p'".

--Martin

  • Are there other categories this could/should go into, e.g. security or spam protection?

Sy Ali 17:48, 19 April 2006 (UTC)

  • On my MediaWiki, using the "Large Example," spam is getting through the regex for "overflow" by dropping the closing semicolon. So I deleted the semicolon and that seems to be working (for now). It might be useful to others to remove it since it's not necessary. I can't; I tried (the spam protection used here won't let me save). Latrippi 02:28, 22 July 2006 (UTC)
Blocking many links in a row

Most wikispam I've encountered has taken the form [http://url/ keyword keyword] [http://url2/ keyword2 keyword2] etc. So, what about using this to block many links in a row? I'm thinking something like...

$wgSpamRegex = '/(\[http:\/\/[a-z0-9\.\/\%_-]+(\s+[a-z0-9\.-]+)+\]\s+){10}/i';

or

$wgSpamRegex = '/(\[http:\/\/[^\s]+\s+[^\[]+\s*?){10}/i';

Comments? (Handy PHP regex tester...)

--Alxndr 03:18, 26 November 2006 (UTC)
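
A quick way to sanity-check the second pattern outside MediaWiki (a hedged sketch; the sample strings and the standalone-script setting are mine, not Alxndr's):

$pattern = '/(\[http:\/\/[^\s]+\s+[^\[]+\s*?){10}/i';
// Ten bracketed external links in a row should trip the filter...
var_dump( (bool)preg_match( $pattern, str_repeat( '[http://example.com/ keyword keyword] ', 10 ) ) ); // bool(true)
// ...while a single legitimate link should not.
var_dump( (bool)preg_match( $pattern, 'See [http://example.com/ this page] for details.' ) ); // bool(false)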

How does one stop 'MediaWiki:Spamprotectiontext' from telling the spammer which words were banned, letting them reword their spam to get past it?

I'd love to know.

--Quatermass 20:43, 9 May 2007 (UTC)

You can change that message in Special:Allmessages. Jonathan3 18:09, 8 September 2007 (UTC)
I read that you can delete the "$1" on MediaWiki:Spamprotectionmatch in order to achieve that. w:User:JanCK 10:52, 18 November 2007 (UTC)


Log


Is there a log that shows how often my MediaWiki denies edits? w:User:JanCK 00:56, 18 November 2007 (UTC)

$wgSpamRegex is not working in my wiki


Maybe someone can help me. I have configured the variable $wgSpamRegex like Manual:$wgSpamRegex#A Large Example, but if I try to test the filter with spam words, nothing happens. Is there something else to do? The version of MediaWiki is 1.13.0. Thx! --88.65.198.156 18:24, 5 October 2008 (UTC)

You can try Extension:SpamRegex. iAlex 18:35, 5 October 2008 (UTC)
Thx - now it's working, but the only problem is that I get a PHP warning when the spamregex filter fires. Here is the output:
Warning: preg_match() [function.preg-match]: Delimiter must not be alphanumeric or backslash in /../htdocs/includes/EditPage.php on line 747

What can I do about this output? Thx again!

Is there nobody who has the same problem? --82.113.113.161 15:24, 13 March 2009 (UTC)
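
For later readers: the warning means the string handed to preg_match() has no regex delimiters, so PHP treats its first character as the delimiter and fails. A minimal standalone illustration (plain PHP, not MediaWiki code; 'spamword' is a placeholder):

$text = 'some edit text containing spamword';
// A bare word is not a valid PCRE pattern; this reproduces the warning above.
preg_match( 'spamword', $text );
// Wrapped in "/" delimiters it is valid ("i" makes it case-insensitive).
preg_match( '/spamword/i', $text );

So whatever ends up in $wgSpamRegex must include the surrounding "/.../" delimiters.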
Limiting the number of links doesn't work

I have tried to add a limit for number of links to 15 as mentioned in the article, but am still able to add articles with more than 15 links. This is my regex in its entirety:

$wgSpamRegex = "/".                        # The "/" is the opening wrapper
               "s-e-x|zoofilia|sexyongpin|grusskarte|geburtstagskarten|animalsex|".
               "sex-with|dogsex|adultchat|adultlive|camsex|sexcam|livesex|sexchat|".
               "chatsex|onlinesex|adultporn|adultvideo|adultweb.|hardcoresex|hardcoreporn|".
               "teenporn|xxxporn|lesbiansex|livegirl|livenude|livesex|livevideo|camgirl|".
               "spycam|voyeursex|casino-online|online-casino|kontaktlinsen|cheapest-phone|".
               "laser-eye|eye-laser|fuelcellmarket|lasikclinic|cragrats|parishilton|".
               "paris-hilton|paris-tape|fuel-dispenser|fueling-dispenser|".
               "jinxinghj|telematicsone|telematiksone|a-mortgage|diamondabrasives|".
               "reuterbrook|sex-plugin|sex-zone|lazy-stars|eblja|liuhecai|".
               "buy-viagra|-cialis|-levitra|boy-and-girl-kissing|". # These match spammy words
               "dirare\.com|".           # This matches dirare.com a spammer's domain name
               "overflow\s*:\s*auto|".   # This matches against overflow:auto (regardless of whitespace on either side of the colon)
               "height\s*:\s*[0-4]px|".  # This matches against height:0px (most CSS hidden spam) (regardless of whitespace on either side of the colon)
               "(http:.*){16}|".         # ***** Limit total number of external links allowed per page / to 15   DOESN'T WORK!
               "display\s*:\s*none".     # This matches against display:none (regardless of whitespace on either side of the colon)
               "/i";                     # The "/" ends the regular expression and the "i" switch which follows makes the test case-insensitive

It does block the other expressions, but I can still save articles with more than 15 links! I don't see what I'm doing wrong, Please help...

  • MediaWiki: 1.11.0
  • PHP: 5.2.6 (cgi-fcgi)
  • MySQL: 5.0.45-community-log

Thanks, Nathanael Bar-Aur L. 17:22, 7 October 2008 (UTC)

PHP 5.2.x introduced pcre.backtrack_limit with a default of 100000 (less than 100K). I think that is too low and trips up the regex. See the 13-Sep-2007 comment by stronk7 at moodle dot org at http://us.php.net/manual/en/ref.pcre.php. Try adding the following line to LocalSettings.php:
ini_set( 'pcre.backtrack_limit', '8M' );
I don't know what 'pcre.backtrack_limit' value is appropriate. 8M works for me and is lifted from paragraph 4 of the intro of Wikipedia's Perl Compatible Regular Expressions article. Someone who knows more, please adjust that and comment. --Rogerhc 17:44, 9 November 2010 (UTC)
It works for me only with (.|\n)*? <-- this part crosses line ends (\n) and is ungreedy (*?). Like this it works for me up to {129} on a long page with 200 repetitions of "http://xxxxx " on it. With {130} or higher the server gave this error message: "503 Service Unavailable - The server is temporarily busy, try again later!". Try this:
$wgSpamRegex = "/(http:(.|\n)*?){101}/";
--Rogerhc 05:03, 10 November 2010 (UTC)
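For what it's worth, the usual PCRE idiom for letting "." cross line ends is the /s (dotall) modifier rather than the (.|\n) alternation, which backtracks heavily. A hedged equivalent of the line above:

// "s" makes "." also match "\n"; "*?" stays ungreedy as in the version above.
$wgSpamRegex = "/(http:.*?){101}/s";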

Not working for me


I simply put the following line in my settings:

$wgSpamRegex = "/suyash jain/i";

but it is not working.

Any help?
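
One way to narrow this down (a hedged suggestion, not from the original poster): confirm the pattern matches in plain PHP outside MediaWiki. If both lines print bool(true), the regex is fine and the problem is more likely that LocalSettings.php isn't being read or the line is overwritten later.

// Standalone test script; both lines should print bool(true).
var_dump( (bool)preg_match( "/suyash jain/i", "page text mentioning suyash jain" ) );
var_dump( (bool)preg_match( "/suyash jain/i", "page text mentioning SUYASH JAIN" ) );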

Profanity


Hey, anyone got any regex profanity checks out there?

You can just search Google for "profanity word list". That will give you a number of lists with a few hundred to more than 1000 words. I found, for example, text files with one entry per line. Depending on the list you found, many of the words on it may or may not be problematic, also depending on what you are using the wiki for.
Once you have a list which suits your needs, it is trivial: just replace the line breaks with pipe signs and you have the string for your regular expression. --87.123.6.102 14:44, 12 February 2016 (UTC)
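
A sketch of that conversion done directly in LocalSettings.php (assumptions: a local words.txt with one term per line, and the string-valued $wgSpamRegex of older MediaWiki versions):

// Read the word list, drop empty lines, and escape regex metacharacters.
$words = array_filter( array_map( 'trim', file( 'words.txt' ) ) );
$words = array_map( function ( $w ) { return preg_quote( $w, '/' ); }, $words );
// Join the terms with pipes into one case-insensitive alternation.
$wgSpamRegex = '/' . implode( '|', $words ) . '/i';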

Example blocks legitimate CSS


For example, if I were to type, "overflo:auto; height:" [with "overflow" instead of "overflo", "w" deleted by User:Rogerhc to get this through MediaWiki's current spam filter] I would not be allowed to save this page. Rocket000 08:05, 19 August 2009 (UTC)

True and noted. And I had to change your comment to get it past MediaWiki's current spam filter. However, legitimate wiki edits probably don't need that particular CSS, and disallowing it helps stop spam. So it is useful on most wikis. --Rogerhc 18:08, 9 November 2010 (UTC)

Blocking all external links, working version:

$wgSpamRegex = "/^http:|^\[[^][]*\]$/";

What the article says (the first pattern above) doesn't work on my wiki for some reason (v1.11). Instead, this seems to do the job:

$wgSpamRegex = "/http:\/\//";

I wonder if this can cause any issues I'm not aware of? --Nathanael Bar-Aur L. 22:03, 25 September 2009 (UTC)
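
A possible explanation for the failure of the first pattern (my reading, not confirmed in the thread): the regex is matched against the whole edit text, and without the /m modifier the ^ and $ anchors only match at the very start and end of that text, so a link in the middle of a page never triggers them. Making the anchors per-line may be all that was missing:

// With "m", "^" and "$" match at the start and end of every line.
$wgSpamRegex = "/^http:|^\[[^][]*\]$/m";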

I'm doing similar here:
$SpamRegexArray[] = "http";
$SpamRegexArray[] = "https";
$SpamRegexArray[] = "ftp";

if ( count( $SpamRegexArray ) ) {
    // Join the de-duplicated terms into one alternation pattern.
    $wgSpamRegex = "/" . implode( "|", array_unique( $SpamRegexArray ) ) . "/";
}
//unset( $wgSpamRegex );
Works perfectly; users cannot save any pages containing wiki URLs. This has been working for a few years and I get (nearly) no spam (some spammers do a kind of tagging with random strings to probe wikis for spam filters; those won't get blocked by my simple filter).
If someone needs to save a URL, they might use something like {{Weblink|1=www.example.com/path/to/file?param=value&foo=bar|2=Some Foo Bar}} instead of [http://www.example.com/path/to/file?param=value&foo=bar Some Foo Bar]. Users will get a notice on this when hitting the spam filter; users might understand what to do, bots and spammers won't. It's like a Turing test. --Rabe (talk) 21:26, 17 February 2012 (UTC)

I use Rabe's example, with the difference that I turned $SpamRegexArray[]="ftp"; into $SpamRegexArray[]="www"; and added "The page you wanted to save was blocked by the spam filter." to the system messages "MediaWiki:Spamprotectiontext" and "MediaWiki:Spamprotectionmatch", so no one sees the reason why something was blocked. I hope this will help some wikis get less spam. --Feder (talk) 12:51, 26 April 2012 (UTC)


working on this page


I was working on this page, grammar, spelling etc, and moved this section to the talk page:

==== A message to the spammers ====

Occasionally spammers have openly discussed their behavior with the people who fight spam, and the people who are victims of it. From these discussions it's clear that they really believe they are not doing anything wrong. We should tell them otherwise.

Edit your 'MediaWiki:Spamprotectiontext' page, and write a message to the spammers. It's better if it's your own words. If a spammer visits many different wikis and gets many different messages telling them to quit, who knows - maybe they'll start to think about what they are doing. It's probably better to keep the language reasonably polite. You are attempting to reason with them after all. Also remember your legitimate users might end up getting this message in the case of false positives.

Example:

This website is not here to help you promote your site in search engine rankings. Going around wiki sites like this and adding irrelevant messages with links is called 'wiki spamming.' It is thoroughly anti-social, wrong, and may be illegal.

In many cases it's a waste of time, but it would be nice if just a few of these people put their talents to better uses.

I think the last sentence sums it up: "In many cases it's a waste of time"; most spam is from bots. This section seems more like a wishful polemic than instructions on how to use $wgSpamRegex. Errectstapler 04:03, 17 July 2011 (UTC)

User group exemptions


Is there a way to allow members of the sysop group, for example, to bypass any restrictions? 83.170.106.45 22:55, 6 May 2012 (UTC)

Not with $wgSpamRegex; it even affects users in the sysop and bureaucrat user groups. Use Extension:AbuseFilter to set up rules, which also lets you filter by group! I have added that to the page now. --87.123.6.102 15:00, 12 February 2016 (UTC)

Banned words evading the SpamRegex somehow


I have been suffering vast amounts of Chinese spam despite banning the key words (a series of brand names the spammers are using).

The spam is put in a User page and the banned words appear in a big heading (=Gucci bags=, for example) and not in the text (where they put random text). Just putting it in a heading seems to evade the block. It is odd spam; it contains no links, as those are hindered by using CAPTCHA, so how it helps their SEO I do not know.

When I log in (as a normal user) and try to include a heading in a normal article with a banned word, I am blocked as I should be, but somehow the spammers are breaking through.

Is there an exemption in $wgSpamRegex for User pages? Can I reconfigure it so as to close it?

Hogweard (talk) 11:18, 17 July 2012 (UTC)

Using "Add Topic" tab bypasses this filter, i.e. http://www.mediawiki.org/w/index.php?title=Manual_talk:$wgSpamRegex


If you use the "Add Topic" tab to add a new section to a page, and put an external link as the "Subject/headline", it will bypass whatever filter you have set in $wgSpamRegex. --Chibbie (talk) 13:48, 17 June 2013 (UTC)

RESOLVED: You can set $wgSummarySpamRegex to the same as $wgSpamRegex, and that will filter the "Add Topic" subject as well. --Chibbie (talk) 20:55, 18 June 2013 (UTC)
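
In LocalSettings.php that is a one-liner (assuming $wgSpamRegex has already been assigned above it):

// Apply the same pattern to edit summaries and new-section headlines.
$wgSummarySpamRegex = $wgSpamRegex;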

Help with PHP to wildcard all top level domain


Hello, in the example on the page there is "domainname\.cn|". Can I wildcard the domain name so that I can block entire top-level domains? For example "anydomain\.cn|". --MAHR88 (talk) 21:31, 7 December 2016 (UTC)

Self-solved by experiment: just leave the domain part blank; no special syntax is required: "\.fr|\.xn|\.vn|\.pl|". --MAHR88 (talk) 18:18, 13 December 2016 (UTC)
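
One caveat with the bare form (my addition, untested on MAHR88's wiki): "\.fr" also matches mid-token strings such as "file.frx". A word boundary after the TLD letters keeps the match at the end of a host-like token:

// "\b" requires a non-word character (or end of text) after the TLD.
$wgSpamRegex = "/\.(fr|xn|vn|pl)\b/";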

Examples need to be arrays instead of strings


At some point before 1.39, $wgSpamRegex started requiring its value to be an array instead of a string, rather than merely allowing an array as an option. This should be corrected in the examples, as they can no longer be copy-pasted into a LocalSettings.php file. Edit: I went ahead and made the changes. Sushimustwrite (talk) 02:39, 21 February 2023 (UTC)
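
For reference, the array form now expected looks like this (the patterns are illustrative, borrowed from the page's examples; each element is a complete regex):

// Each entry carries its own delimiters and flags.
$wgSpamRegex = [
    '/overflow\s*:\s*auto/i',
    '/display\s*:\s*none/i',
];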
