Equivset is a library for detecting visually similar Unicode characters.

Equivset is designed to prevent abuse through imitation of words and focusses primarily on letters and punctuation (not emojis or other symbols). It contains mappings of visually identical characters from Unicode Confusables, such as Latin "A" and Greek "Α" (alpha), as well as additional mappings for visually similar characters, such as "S" and "$" (dollar sign).

It is used at Wikimedia in the AntiSpoof and AbuseFilter software to determine if two characters are visually equivalent.

Data


The library provides its dataset of equivalent character sets in a standard JSON format and a plain text format.
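As a rough illustration of how such a dataset might be consumed outside PHP, the following Python sketch parses a small JSON mapping and compares two strings after normalizing them. The sample data, the assumed layout (a flat object mapping each character to one representative of its set), the uppercasing step, and the function names `load_mapping`, `normalize`, and `looks_alike` are all assumptions made for this example. They are not taken from the Equivset files or its API.

```python
import json

# Hypothetical three-entry sample, NOT the real Equivset dataset.
# Assumed layout: a flat JSON object mapping each character to one
# representative of its equivalence set.
SAMPLE_JSON = '{"\\u0391": "A", "$": "S", "0": "O"}'


def load_mapping(text):
    """Parse JSON text into a dict of character -> representative."""
    return json.loads(text)


def normalize(s, mapping):
    """Replace every character with its representative, if it has one.

    Uppercasing first is an assumption of this sketch, so that "a" and "A"
    compare as equal without needing separate entries in the sample.
    """
    return "".join(mapping.get(ch, ch) for ch in s.upper())


def looks_alike(a, b, mapping):
    """Return True when both strings normalize to the same sequence."""
    return normalize(a, mapping) == normalize(b, mapping)


MAPPING = load_mapping(SAMPLE_JSON)

# Greek capital alpha (U+0391) stands in for Latin "A" in the first string.
print(looks_alike("\u0391BC", "abc", MAPPING))
print(looks_alike("$AM", "Sam", MAPPING))
print(looks_alike("Sam", "Tam", MAPPING))
```

Reducing each string to a canonical form before comparing means two names can be checked with a single equality test, instead of testing every pair of characters for equivalence.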

It also provides an access library for PHP.
