User:Magnus Manske/File handling

This is a proposal. Feel free to edit!


The current Wikimedia file storage has several issues:

  • NFS (miscellaneous problems)
  • Different storage engines for different extensions
  • Temporary files are never deleted

Keep adding to this list.

Proposed solution:

  • A central MySQL database handling all file storage (temporary and permanent) for all projects
  • Can handle multiple storage types (the same file could even be held in several storage engines while it is being moved to a new one)
  • Scales (add DB slaves as necessary)
  • file_timestamp could record the time of the last request, so that old, unused (temporary) files can be deleted; to keep writes down, it would only actually be updated on about 1 in 100 requests
  • file_key would be the file name for uploaded files, the text inside the math tag, the MD5 sum of the timeline text, etc.
  • file_location would be a file_storage-specific key (NFS path, cloud ID, etc.)
  • file_width and file_height can be stored (for thumbnails) but are optional
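A minimal sketch of the sampled timestamp update and the cleanup of stale temporary files, assuming MySQL and the file table proposed below. The file_id of 42, the 30-day expiry, and the choice of which file types count as temporary are all hypothetical. RAND() < 0.01 lets roughly 1 in 100 calls actually write; the application could just as well make that decision itself and skip the query entirely.

```sql
-- Called on every request for file 42 (hypothetical id).
-- RAND() < 0.01 lets only about 1% of these calls actually write.
UPDATE file
   SET file_timestamp = DATE_FORMAT(NOW(), '%Y%m%d%H%i%s')  -- YYYYMMDDHHMMSS
 WHERE file_id = 42
   AND RAND() < 0.01;

-- Periodic cleanup: drop entries of regenerable types (assumed here to be
-- thumbnails, math and timeline renderings) not requested for 30 days.
-- Fixed-width 14-character timestamps compare correctly as strings.
DELETE FROM file
 WHERE file_type IN ('thumbnail', 'math', 'timeline')
   AND file_timestamp < DATE_FORMAT(NOW() - INTERVAL 30 DAY, '%Y%m%d%H%i%s');
```

Deleting a row only forgets the file; a real cleanup job would first read file_storage and file_location for the affected rows and remove the corresponding objects from each storage backend.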

Proposed DB tables:

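CREATE TABLE file (
 file_id INTEGER PRIMARY KEY AUTO_INCREMENT,
 file_project INTEGER,                            -- refers to project.project_id
 file_type ENUM ('file','thumbnail','math','timeline'),
 file_key MEDIUMBLOB,                             -- file name, math text, timeline MD5, ...
 file_storage ENUM ('nfs','cloud','vapor'),       -- one row per storage engine holding the file
 file_location MEDIUMBLOB,                        -- storage-specific key (NFS path, cloud ID, ...)
 file_width INTEGER DEFAULT NULL,                 -- optional, e.g. for thumbnails
 file_height INTEGER DEFAULT NULL,
 file_timestamp VARCHAR(14)                       -- last request, YYYYMMDDHHMMSS
) engine=InnoDB;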

CREATE TABLE project (
 project_id INTEGER PRIMARY KEY AUTO_INCREMENT,
 project_name VARCHAR(255)
) engine=InnoDB;

Add indices!
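One possible set of indices, sketched under assumptions: the index names, the 255-byte key prefix and the example values (project id 1, the math text) are hypothetical. Because file_key is a MEDIUMBLOB, MySQL only allows indexing a prefix of it. The lookup index is deliberately not UNIQUE, since the same file may have one row per storage engine during a migration.

```sql
-- Main lookup path: find a file by project, type and key.
-- BLOB columns need a prefix length; 255 bytes is an assumption.
CREATE INDEX file_lookup ON file (file_project, file_type, file_key(255));

-- Supports the periodic cleanup of old, unused files by type and age.
CREATE INDEX file_type_timestamp ON file (file_type, file_timestamp);

-- Each project name should appear only once.
CREATE UNIQUE INDEX project_name_idx ON project (project_name);

-- Example lookup: returns one row per storage engine holding the file.
SELECT file_storage, file_location
  FROM file
 WHERE file_project = 1
   AND file_type = 'math'
   AND file_key = 'E=mc^2';
```

Keys longer than the prefix still work, because MySQL rechecks the full column value after using the index, but keys sharing the same first 255 bytes are less selective.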