FileBackend design considerations
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date. This document informed the 2012 initiative to migrate Wikipedia's media store from NFS to Swift. For current details, see wikitech:Media storage.
This project page is for ideas about the FileBackend class rewrite.
The FileBackend project is a refactoring project which will split off backend operations from the FileRepo class hierarchy. The idea is to separate metadata storage from file storage, so that metadata features such as Commons (Foreign*Repo) can be implemented independently from file storage features (Swift etc.), allowing the two to be combined in arbitrary ways.
Note: A similar concept, called FileStore, was proposed in the past. That project was scrapped without ever seeing real use and was removed entirely in 1.16.0.
Backend considerations
Possible backends:
- Filesystem
- Swift
- Azure
- Amazon S3
Current architecture assumes:
- We can use FS path-like names (called storage paths).
- If a backend can't do this, it can always normalize them in some fashion.
- We can list objects/files that have a path starting with a prefix (thumbnail purging uses this)
- Swift can list object names by prefix
- php-cloudfiles has a function for these. Also see "swift.common.client.get_container" at http://swift.openstack.org/misc.html#client.
- Amazon S3 can also list object names by prefix.
- "Keys can be listed by prefix. By choosing a common prefix for the names of related keys and marking these keys with a special character that delimits hierarchy, you can use the list operation to select and browse keys hierarchically. This is similar to how files are stored in directories within a file system." -- http://docs.amazonwebservices.com/AmazonS3/latest/dev/
- Azure supports this too.
- See "List Blobs" API request at http://msdn.microsoft.com/en-us/library/dd135734.aspx
- If a backend can't do this, then it will need DB tracking to get the lists.
- Container support
- Swift and Azure put objects in user-nameable "containers"
- Amazon S3 puts objects in user-nameable "buckets" (basically containers)
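The assumptions above (path-like names inside a container, plus listing by prefix) can be illustrated with a minimal in-memory sketch. It is not FileBackend code: `normalize_storage_path`, `MemoryBackend`, and the rule of rejecting `.` and `..` segments are hypothetical names and choices made only for this example.

```python
def normalize_storage_path(path):
    """Collapse empty segments and reject '.' / '..' (a hypothetical rule)."""
    parts = [p for p in path.split("/") if p != ""]
    if any(p in (".", "..") for p in parts):
        raise ValueError("unsafe storage path: %r" % path)
    return "/".join(parts)


class MemoryBackend:
    """Toy object store: objects live in named containers, keyed by path."""

    def __init__(self):
        self._objects = {}  # (container, name) -> bytes

    def store(self, container, name, data):
        self._objects[(container, normalize_storage_path(name))] = data

    def list_by_prefix(self, container, prefix):
        """Return the sorted names in `container` found under `prefix`/."""
        # The trailing slash keeps "a/Foo.jpg" from matching "a/Foo.jpgx/...".
        wanted = normalize_storage_path(prefix) + "/"
        return sorted(
            name
            for (cont, name) in self._objects
            if cont == container and name.startswith(wanted)
        )
```

Thumbnail purging would then amount to listing everything under the thumbnail prefix of one file and deleting each returned name. A backend without native prefix listing would have to answer `list_by_prefix` from a tracking table instead.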
Consistency with filesystem behavior
- On FS (using FSFileBackend), if one saves "/.../cont/a/b/c.txt", the parent directories are created automatically. Now "/.../cont/a" and "/.../cont/a/b" cannot be created as files, since directories already exist there. "/.../cont/a/b/c.txt/d" also cannot be created, since a file exists where a directory would be needed.
- In Swift/S3, there are not really directories. So one can create files "/.../cont/a/b/c.txt", "/.../cont/a" and "/.../cont/a/b".
- It might be possible to create directory markers or just salt the directories with dummy objects. See http://docs.openstack.org/cactus/openstack-object-storage/developer/content/pseudo-hierarchical-folders-directories.html.
- This greatly increases RTTs (and locking required) in order to create parent directories for files.
- One could also disallow saving an object "PREFIX" if files exist under "PREFIX/" and any of its "parent directories" are objects.
- This still increases RTTs (and locking required) in order to prevent objects from popping up after not finding any (phantom objects).
- Valid files in FS will be valid in Swift. It's not likely for people to migrate from the latter to the former, so we could just ignore this issue. Also, we can make sure the FileRepo code does operations in a way that works on FS.
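To make the filesystem rule concrete, here is a small sketch of the check that a flat object store would need in order to behave like FS. The function name `conflicts_with_hierarchy` is hypothetical, and the set of existing names stands in for what would really be extra listing and HEAD requests (the added RTTs mentioned above).

```python
def conflicts_with_hierarchy(existing, new_name):
    """Return True if saving `new_name` would be impossible on a filesystem.

    `existing` is a set of object names already in the container.
    """
    # Case 1: objects already live under new_name + "/", so on FS a
    # directory would occupy the place where the new file wants to go.
    if any(name.startswith(new_name + "/") for name in existing):
        return True
    # Case 2: some "parent directory" of new_name is itself an object,
    # so on FS a file would occupy the place of a needed directory.
    parts = new_name.split("/")
    for i in range(1, len(parts)):
        if "/".join(parts[:i]) in existing:
            return True
    return False
```

With only "a/b/c.txt" stored, the names "a", "a/b" and "a/b/c.txt/d" are all rejected, while "a/b/d.txt" is accepted. Without locking, another writer could still create a conflicting object between the check and the save, which is the phantom-object problem noted above.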
Locking considerations
There is a LockManager hierarchy for handling locks.
- It includes the DBFileLockManager and LSFileLockManager classes, which we can use.
- We can use DB-based locking or make a subclass to use some distributed lock manager that's already out there. Things like FSRepo, or backend uses that do not involve `image` table rows, really need this, since they currently have no locking at all. LocalRepo tries to do some locking by locking `image` table rows.
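As a rough illustration of what a lock manager has to offer, the sketch below takes locks on a set of storage paths in an all-or-nothing way. It is an in-process stand-in only: `MemoryLockManager` and `locked` are hypothetical names, and a real implementation would sit on a database or a distributed lock service rather than a Python set.

```python
import threading
from contextlib import contextmanager


class MemoryLockManager:
    """Toy lock manager: exclusive locks on storage paths, held in memory."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the set below
        self._held = set()

    def lock(self, paths):
        """Acquire every path or none of them; return True on success."""
        paths = set(paths)
        with self._guard:
            if self._held & paths:
                return False  # someone holds at least one of the paths
            self._held |= paths
            return True

    def unlock(self, paths):
        with self._guard:
            self._held -= set(paths)


@contextmanager
def locked(manager, paths):
    """Hold locks on `paths` for the duration of a with-block."""
    if not manager.lock(paths):
        raise RuntimeError("could not acquire locks")
    try:
        yield
    finally:
        manager.unlock(paths)
```

Acquiring all paths of an operation at once avoids the partial-acquisition state in which two operations each hold one of the paths the other needs.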
Data consistency
Swift/S3 et al. have "eventual consistency" rather than "strong consistency" or "release consistency" (NFS). One must be careful when reading a file and using it for a write (as with reading from a DB slave for a write to the master).
- Swift supports an X-Newest header for getting latest version of an object, which involves checking (via HEAD) each storage node. See https://answers.launchpad.net/swift/+question/181905 and https://blueprints.launchpad.net/swift/+spec/x-newest.
- When a file is created or changed, the Swift proxy requires a majority quorum of nodes to respond with success in order for the operation to be treated as a success. The remaining nodes, if they can't receive the update yet, are updated asynchronously by the Replicator process. Our WMF Swift storage nodes are on different racks/strips and such. This makes strong consistency almost guaranteed when first locking the files (via LockManager) and then using X-Newest to HEAD/GET the object. The failure case requires an unlikely combination: two nodes holding data that has not yet replicated to the third both go down, the third node stays up (it will likely have failed if the other two did), and a read-for-write operation happens on the affected files. In that case, only the third node would respond, and it would have outdated data. See https://answers.launchpad.net/swift/+question/134996 and https://lists.launchpad.net/openstack/msg06861.html.
- Some good Q&A about how swift writes objects and handles failures can be found at https://answers.launchpad.net/swift/+question/184443.
- Given the write quorum for PUT/POST requests and the various possible Ring setups, it's possible that some objects cannot be written to while others can. This can be annoying for batch operations, which are preferably all or nothing (to the extent possible). Currently, batch operations stop at the first failure and log the failed operation and the aborted remaining operations in the batch. This log can be used for recovery.
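The quorum write and newest-copy read described above can be modelled in a few lines. This is a toy simulation, not Swift's implementation: `Replica`, `quorum_put` and `get_newest` are hypothetical names, timestamps are plain integers, and replication of missed updates is left out on purpose so that the stale-read case stays visible.

```python
class Replica:
    """One storage node holding name -> (timestamp, data)."""

    def __init__(self):
        self.up = True
        self.objects = {}


def quorum_put(replicas, name, timestamp, data):
    """Write to every reachable replica; succeed only on a majority."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.objects[name] = (timestamp, data)
            acks += 1
    # Replicas that were down keep their old copy until replication runs.
    return acks > len(replicas) // 2


def get_newest(replicas, name):
    """Ask every reachable replica and return the newest copy (or None)."""
    copies = [r.objects[name] for r in replicas if r.up and name in r.objects]
    if not copies:
        return None
    return max(copies, key=lambda copy: copy[0])[1]
```

With three replicas, a write that reaches two of them succeeds, and a newest-copy read returns it as long as at least one of those two is reachable. If both go down before the third has caught up, the read returns the third node's outdated copy, which is the unlikely scenario described above.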
Security
Restrictions can be placed on the swift user used to handle non-authorized (end-user) requests. See http://programmerthoughts.com/openstack/swift-permissions/.
Performance considerations
On backends without native file/object renaming support, copy+delete has to be used. This is expensive for very large files.
- In Swift, one can store 0-byte objects with X-Object-Manifest headers at the file name that points to the data segments (1 or more files). This is how large objects (>5GB) are stored. See http://swift.openstack.org/overview_large_objects.html. The segment names could use a content hash perhaps (for immutability). Moving would just involve copy+delete of these tiny files.
- Do note that deletes in Swift are fast. They just consist of adding a new 0-byte tombstone file.
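The manifest idea can be sketched as follows: data segments are stored once under a content-hash prefix, and the visible file name is only a tiny pointer to that prefix, so a move never touches the segments. `SketchStore` and its methods are hypothetical, and the manifest is modelled as a plain dictionary entry rather than a real 0-byte object with an X-Object-Manifest header.

```python
import hashlib


class SketchStore:
    """Toy store separating immutable data segments from manifests."""

    def __init__(self):
        self.segments = {}   # segment name -> bytes
        self.manifests = {}  # file name -> segment prefix

    def put_large(self, name, chunks):
        # Name segments after a content hash so they never need renaming.
        digest = hashlib.sha1(b"".join(chunks)).hexdigest()
        for i, chunk in enumerate(chunks):
            self.segments["%s/%08d" % (digest, i)] = chunk
        self.manifests[name] = digest + "/"

    def read(self, name):
        prefix = self.manifests[name]
        keys = sorted(k for k in self.segments if k.startswith(prefix))
        return b"".join(self.segments[k] for k in keys)

    def move(self, src, dst):
        """Copy+delete of the manifest only; segments stay where they are."""
        if src == dst:
            return
        self.manifests[dst] = self.manifests[src]
        del self.manifests[src]
```

The cost of `move` here is independent of the file size, which is the point of the approach. One thing the sketch does not handle is deciding when segments may be deleted once no manifest refers to them.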
FileRepo integration
- FSRepo and LocalRepo assume an FS backend. We need to change that to remove that assumption but also maintain backwards compatibility, which is a bit tricky.
- Generic zone configuration. The old code has ad-hoc configuration for each zone (directory, deletedDir, thumbDir). It would simplify the FileBackend interface if the relevant code was factored out and the configuration became:
- array( 'zones' => array( 'public' => array( 'directory' => ..., 'container' => ... ) ) )
- The old configuration keys would be kept for backwards compatibility.
- The default backend (FileRepo::getBackend()) should be FSFileBackend. Any shared backend interface code should be moved to FileRepo. FSRepo should become an alias for FileRepo.
- FileRepo can be non-abstract, since the abstract backend access methods will be in FileBackend instead.
- File::getPath() callers should be migrated to some more remote-friendly interface
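The generic zone configuration with backwards-compatible keys could be normalized along these lines. The sketch is in Python rather than PHP and is not the FileRepo code: `normalize_repo_config` is a hypothetical name, and the zone names "deleted" and "thumb" used for the old deletedDir and thumbDir keys are assumptions made for illustration (only the "public" zone appears in the example above).

```python
# Old ad-hoc key -> zone name. Only "public" comes from the text above;
# "deleted" and "thumb" are assumed names for this sketch.
LEGACY_KEYS = {
    "directory": "public",
    "deletedDir": "deleted",
    "thumbDir": "thumb",
}


def normalize_repo_config(config):
    """Return a copy of `config` whose 'zones' entry covers legacy keys."""
    zones = {
        name: dict(settings)
        for name, settings in config.get("zones", {}).items()
    }
    for legacy_key, zone in LEGACY_KEYS.items():
        if legacy_key in config:
            # An explicit zone setting wins over the legacy key.
            zones.setdefault(zone, {}).setdefault("directory", config[legacy_key])
    result = dict(config)
    result["zones"] = zones
    return result
```

Code downstream would then read only the `zones` entry, while existing configurations that still use the old keys keep working unchanged.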
Other integration
- Media/thumbnail handler code in /media [done]
- MTO::getPath() is mostly for thumb.php. Possibly replace with a getVirtualUrl() or stream() function.
- Upload code in /upload [done]
- thumb.php [done]
- image_auth.php [done]
- thumb_handler.php [done]
See also
- Meeting notes: NOLA_Hackathon/Sunday
- wikitech:Media server/FileBackend