API:ImageContentFiltration

Introduction

This tool was developed as part of an Outreachy (December 2020 to March 2021) project proposal that aims to reduce vandalism on Wikimedia projects. The core of the tool is built on computer vision techniques.

The Model

The guiding objective during development was to create a tool that is lightweight and robust without sacrificing much accuracy. During the research phase of this project, several recommended deep learning models were tested on the curated dataset, and MobileNet V1 was selected for its performance, size requirements, and processing time. The tool combines this pre-trained model with a custom secondary architecture.

The fully trained model is available at https://drive.google.com/drive/folders/12iBeCIruhnmoVyBdOXMlYrn7SMRVc7zN?usp=sharing

How to use the API?

Clone this repository and download the fully trained model (https://drive.google.com/drive/folders/12iBeCIruhnmoVyBdOXMlYrn7SMRVc7zN?usp=sharing). Specify the path to the model and your constants in the "constants_api.py" file, then run the following command in a terminal:

    curl -X POST -F image=@Test_A001.png "http://127.0.0.1:5000/predict"

Here, Test_A001.png is the image you want to test, and "http://127.0.0.1:5000/predict" is the address where the API's predict method is running.
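The same request can also be sent from Python. The sketch below is a minimal illustration that uses only the standard library and mirrors the curl command: it uploads the file in a multipart form field named "image" to the predict endpoint. The helper names build_multipart and predict are hypothetical and are not part of the repository, and the format of the response is not documented here, so the raw response text is returned unchanged.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Endpoint taken from the curl command; adjust it if the API runs elsewhere.
API_URL = "http://127.0.0.1:5000/predict"


def build_multipart(field, filename, payload, boundary=None):
    """Encode one file as a multipart/form-data body, as `curl -F` does.

    Hypothetical helper for illustration; returns (body, content_type).
    """
    boundary = boundary or uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path, url=API_URL):
    """POST the image to the running API and return the raw response text."""
    path = Path(image_path)
    body, content_type = build_multipart("image", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the API running locally, calling predict("Test_A001.png") should be equivalent to the curl command.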

Complete Technical Documentation

To read more about the tool, its background, and other details, see the tool's GitHub repository.

Preliminary Performance

On the curated dataset, the training phase ended with a training accuracy of 98.90% and a training loss of 0.0346; the validation accuracy was 96.43% and the validation loss was 0.1177.

Acceptable image extensions for input

This tool currently supports the major raster image formats: JPEG/JPG, PNG, and GIF. Support for SVG and other vector image formats, as well as PSD, might come shortly.
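A client can check a file name against this list before uploading it. The following is a minimal sketch, assuming the check is done on the file extension alone; the names ALLOWED_EXTENSIONS and is_supported_image are hypothetical, and the API itself may validate input differently.

```python
from pathlib import Path

# Extensions listed as supported on this page (hypothetical constant name).
ALLOWED_EXTENSIONS = {".jpeg", ".jpg", ".png", ".gif"}


def is_supported_image(filename):
    """Return True if the file name ends in a supported extension.

    The comparison is case-insensitive, so "photo.JPG" is accepted.
    Only the name is inspected; the file contents are not verified.
    """
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

For example, is_supported_image("Test_A001.png") is true, while is_supported_image("logo.svg") is false until vector formats are supported.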

Future Work

Integration with the abuse filter

This tool is currently in the testing stage. Once it passes this stage, it could be hooked into the Abuse Filter tool at Wikimedia, where it could preemptively flag an image that matches the profile of content that should be filtered.

Image Annotation Tool

Currently, the label annotation tool developed by Wikimedia Labs lets users take part in text annotation, which helps train intelligent wiki tools based on Natural Language Processing. Future work would involve creating a similar tool that lets users assign one or more labels to images, which could then be used to improve image-recognition tools at Wikimedia (for example, this content filtration tool).

Video Content Filtration

Future versions of this tool could incorporate functionality that also accepts videos and assesses the percentage of unsafe content in them.
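One way such a percentage could be computed is sketched below. It assumes that sampled video frames are classified one by one with the image model and that each result is reduced to a safe/unsafe flag; this is an illustration of the arithmetic only, not a planned design, and the function name unsafe_percentage is hypothetical.

```python
def unsafe_percentage(frame_is_unsafe):
    """Return the share of frames flagged unsafe, as a percentage.

    `frame_is_unsafe` is an iterable of booleans, one per sampled frame
    (assumed to come from per-frame image classification).
    An empty input yields 0.0 rather than raising a division error.
    """
    flags = list(frame_is_unsafe)
    if not flags:
        return 0.0
    return 100.0 * sum(bool(flag) for flag in flags) / len(flags)
```

For instance, a video with four sampled frames, two of them flagged unsafe, would be reported as 50.0 percent unsafe.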

Categorization

This is subject to data availability. Finer-grained categories could be introduced (for example, to indicate why a particular piece of content was marked unsafe), or category-based tags could be assigned to each user input (for images that might have been marked unsafe for multiple reasons).