RESTBase acts as a data store, with either SQLite or Cassandra as the storage backend.
What I can't find anywhere in the documentation is the retention policy for the stored data.
After some months or years of use, the SQLite file grows to the order of gigabytes, which is quite unmanageable. I simply stopped the service, deleted the SQLite file and started it again, and everything seems to work.
Is there a purge or maintenance process I should run periodically to clean up unneeded data? Stopping the service, manually removing the file and starting it again is not particularly clean (although it could be done from a shell script run by cron). But is it safe? As I said, everything appears to work, but I'd like to be sure it's safe, or to know whether there are cleaner ways to remove old data.
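For reference, the stop/delete/start cycle described above could be scripted roughly as follows. This is a minimal Python sketch, not a recommended maintenance procedure. The service name `restbase`, the path `/var/lib/restbase/db.sqlite3` and the use of `systemctl` are hypothetical values for a typical installation. Removing the `-wal`, `-shm` and `-journal` files is an assumption that SQLite may have left them next to the main database file. The command runner is passed in as a parameter so the function can be tried without touching a real service.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# Hypothetical values: adjust to the actual installation.
SERVICE_NAME = "restbase"
SQLITE_PATH = Path("/var/lib/restbase/db.sqlite3")

# Main database file plus the sidecar files SQLite may create next to it.
SIDECAR_SUFFIXES = ("", "-wal", "-shm", "-journal")


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def reset_sqlite_store(
    db_path: Path,
    service: str,
    run: Callable[[Sequence[str]], object] = run_checked,
) -> int:
    """Stop the service, delete the SQLite file and its sidecars, then start it again.

    Returns the number of files removed.
    """
    run(["systemctl", "stop", service])
    removed = 0
    try:
        for suffix in SIDECAR_SUFFIXES:
            candidate = db_path.with_name(db_path.name + suffix)
            if candidate.exists():
                candidate.unlink()
                removed += 1
    finally:
        # Start the service again even if a deletion failed.
        run(["systemctl", "start", service])
    return removed


# Example invocation (e.g. from a cron job running as root):
# reset_sqlite_store(SQLITE_PATH, SERVICE_NAME)
```

The restart sits in a `finally` block, so the service comes back up even if removing one of the files raises an error.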
I want to migrate the storage backend to Cassandra, and I fear I'll run into the same problem: the stored data keeps growing without limit, and the only way to clean it up is to delete the storage entirely and let it be recreated.
WMF must have much larger storage needs across all its wikis. How does WMF handle this? Do they really have terabytes of data in their Cassandra cluster? Is all of that data really needed?