API:ImageContentFiltration
Introduction
This tool was developed as part of an Outreachy (December 2020 to March 2021) project proposal that aims to reduce vandalism on Wikimedia and its subsidiaries. The core of the tool is based on computer vision techniques.
The Model
The guiding objective during development was to create a tool that is lightweight and robust without sacrificing much accuracy. During the research phase of this project, multiple recommended deep learning models were tested on the curated dataset, and MobileNet V1 was unanimously selected based on its performance, size requirements, and processing time. The core of this deep-learning-based tool is this pre-trained model combined with a custom secondary architecture.
How to use the API?
Clone this repository and download the fully trained model (https://drive.google.com/drive/folders/12iBeCIruhnmoVyBdOXMlYrn7SMRVc7zN?usp=sharing). Specify the path to the model and your constants in the "constants_api.py" file, then run the following command in the terminal:
curl -X POST -F image=@Test_A001.png "http://127.0.0.1:5000/predict"
Here, Test_A001.png is the image you want to test, and "http://127.0.0.1:5000/predict" is the address where the API's predict method is running.
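The same request can also be sent from Python. The following is a minimal sketch using only the standard library, not code from the repository: the helper names build_multipart and predict are hypothetical, and the sketch assumes the service replies with JSON, since the response format is not described here. It mirrors the curl command by posting the file under the form field "image".

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumed endpoint: the address used in the curl command.
PREDICT_URL = "http://127.0.0.1:5000/predict"


def build_multipart(field_name, filename, payload):
    """Encode one file as a multipart/form-data body, as `curl -F` does.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path, url=PREDICT_URL):
    """POST the image under the form field 'image' and return the parsed reply.

    Assumption: the service answers with JSON. The response schema is not
    documented here, so the parsed object is returned unchanged.
    """
    path = Path(image_path)
    body, content_type = build_multipart("image", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running locally, calling predict("Test_A001.png") sends the same request as the curl command.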
Complete Technical Documentation
To read more about the tool, its background, and other details, visit this GitHub repository.
Preliminary Performance
On the curated dataset, after the training phase, the training accuracy was 98.90%, the training loss was 0.0346, the validation accuracy was 96.43%, and the validation loss was 0.1177.
Acceptable image extensions for input
This tool currently supports the major raster image file formats: JPEG/JPG, PNG, and GIF. Support for PSD, SVG, and other image formats (including vector formats) might come shortly.
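A client can check the file extension before uploading. The sketch below is an illustration only, not the tool's own validation logic: the name is_supported_image is hypothetical, and the check looks only at the file name, not at the file's contents.

```python
from pathlib import Path

# Formats listed as supported; JPEG files may end in .jpg or .jpeg.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def is_supported_image(filename):
    """Return True if the file name ends in a supported raster extension."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so "photo.JPEG" passes, while "logo.svg" and "design.psd" are rejected.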
Future Work
Integration with the abuse filter
This tool is currently in the testing stage. Once it passes this stage, it could be hooked into the Abuse Filter tool at Wikimedia, where it could preemptively flag an image that matches the profile of content that should be filtered.
Image Annotation Tool
Currently, the label annotation tool developed by Wikimedia Labs lets users participate in text annotation, which helps train intelligent wiki tools based on Natural Language Processing. Future work would involve creating a similar tool that allows users to assign label(s) to images, which could then be used to improve the image-recognition-based tools at Wikimedia (for example, this content filtration tool).
Video Content Filtration
Future versions of this tool could also accept videos and assess the percentage of unsafe content in them.
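One way such a percentage could be computed is to score sampled frames individually and count how many are flagged. The sketch below is purely hypothetical, since no video support exists yet: it assumes each frame has already been given an unsafe score between 0 and 1 (higher meaning more likely unsafe), and both the function name unsafe_percentage and the 0.5 threshold are illustrative choices.

```python
def unsafe_percentage(frame_scores, threshold=0.5):
    """Return the percentage of frames whose unsafe score meets the threshold.

    frame_scores: per-frame scores in [0, 1]; this scoring scheme is an
    assumption, not the documented output of the tool.
    """
    if not frame_scores:
        # An empty video has no frames to flag.
        return 0.0
    unsafe = sum(1 for score in frame_scores if score >= threshold)
    return 100.0 * unsafe / len(frame_scores)
```

For example, with scores [0.1, 0.9, 0.7, 0.2] and the default threshold, two of the four frames are flagged, giving 50.0.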
Categorization
This is subject to data availability. Deeper categories could be introduced (for example, to explain why particular content was marked unsafe), or category-based tags could be assigned to each user input (for images that might have been marked unsafe for multiple reasons).