Extension:GWToolset/Technical Design
Abstract
This section of the document answers questions about the project. What is the project? What is its purpose? What are the requirements?
GWToolset (or GLAMWikiToolset) is a Special Page extension. Its main goal is to let GLAMs (galleries, libraries, archives, and museums) mass upload content (pictures, videos, and sounds) to Wikimedia Commons based on the corresponding metadata (XML); the intent is to support a wide variety of XML schemas. The extension presents the user with several steps, each an HTML form, to set up a batch upload process that uploads content and metadata to the wiki and creates an individual mediafile page for each item uploaded.
The project was co-funded by Europeana and a few Wikimedia chapters[1].
Further information can be found on the project page. Your feedback and questions are welcome; feel free to contact us.
Rationale
This section explains the value of the project, why we think it is of value, and how it fits into the bigger picture.
Process
Often cut into multiple sections, this describes how the feature is intended to work.
The current steps within the upload process are:
Metadata detection
- indicate which element within the metadata file represents a mediafile record.
- a mediafile record contains metadata about the digital item, such as author, date created, and a URL to the mediafile.
- select a MediaWiki template that will display the mediafile metadata on the mediafile page.
- optionally select a previously saved metadata mapping that maps the metadata fields within the metadata file to the fields in the MediaWiki template.
- select the metadata file stored on your local hard drive.
- upload the metadata file.
The metadata file will be uploaded to a FileBackend store; a relative reference to the FileBackend store is placed in the subsequent HTML forms so that the extension can retrieve it as necessary.
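The detection of mediafile records described above can be sketched roughly as follows. This is a minimal Python illustration, not the extension's own PHP code; the sample XML, the element names (`record`, `author`, `url`, and so on), and the helper functions `find_records` and `collect_field_names` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample metadata; the element names are illustrative only.
SAMPLE_XML = """
<records>
  <record>
    <author>Jane Doe</author>
    <date>1901</date>
    <url>https://example.org/a.jpg</url>
  </record>
  <record>
    <author>John Roe</author>
    <url>https://example.org/b.jpg</url>
    <subject>Harbour</subject>
  </record>
</records>
"""


def find_records(xml_text, record_element):
    """Return every element whose tag matches the user-chosen record element."""
    root = ET.fromstring(xml_text.strip())
    return list(root.iter(record_element))


def collect_field_names(records):
    """Collect the distinct child element names, in first-seen order."""
    names = []
    for record in records:
        for child in record:
            if child.tag not in names:
                names.append(child.tag)
    return names


records = find_records(SAMPLE_XML, "record")
field_names = collect_field_names(records)
print(len(records), field_names)
```

The collected field names are the kind of values that could populate the drop-down menus in the mapping step.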
Metadata mapping
Summary
- a summary of the information provided in the Metadata detection step.
- a listing of all of the MediaWiki fields in the template selected in the Metadata detection step.
- drop-down menus next to those fields that contain all of the metadata elements found in the metadata file.
- a sample mediafile record with corresponding metadata information about the mediafile record.
Create a mapping
- create a mapping of the MediaWiki template fields to the metadata record elements by selecting the corresponding metadata record element from the drop-down next to the appropriate MediaWiki template field.
- more than one metadata record element can be related to a MediaWiki template field.
- a metadata record element can be related to many MediaWiki template fields.
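The many-to-many relationship between template fields and metadata elements can be illustrated with a small Python sketch. The mapping, the record, the template name `ExampleTemplate`, and the choice to join multiple values with a separator are assumptions for illustration; the source does not say how the extension combines several elements into one field.

```python
# Hypothetical mapping: template field -> list of metadata element names.
mapping = {
    "author": ["creator", "contributor"],  # several elements feed one field
    "title": ["title"],
    "description": ["title", "subject"],   # "title" also feeds a second field
}

# Hypothetical metadata record, already reduced to element name -> text.
record = {
    "creator": "Jane Doe",
    "contributor": "John Roe",
    "title": "Harbour at dawn",
    "subject": "Harbour",
}


def apply_mapping(mapping, record, separator="; "):
    """Build template field values from one record.

    Joining multiple values with a separator is an assumption of this sketch.
    """
    fields = {}
    for template_field, elements in mapping.items():
        values = [record[name] for name in elements if record.get(name)]
        if values:
            fields[template_field] = separator.join(values)
    return fields


def render_template(template_name, fields):
    """Render the fields as MediaWiki template wikitext."""
    lines = ["{{" + template_name]
    lines += [f" | {name} = {value}" for name, value in fields.items()]
    lines.append("}}")
    return "\n".join(lines)


fields = apply_mapping(mapping, record)
print(render_template("ExampleTemplate", fields))
```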
Global categories
- optionally add global categories to the upload
- global categories are applied to all mediafile records in the metadata file
- more than one global category can be applied
Item specific categories
- optionally add item specific categories to the upload
- these are applied to each mediafile record, but are built from item specific information.
- for example, if the drop-down contains a metadata field called author, the value of that field in each individual record will be used.
- the phrase field lets you prefix the metadata field value with text such as “created by”, which could pair with the drop-down field author.
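How global and item specific categories might combine for a single record can be sketched in Python as follows. The function `build_categories`, the sample category names, and the decision to skip an item specific category when the record has no value for the chosen field are assumptions of this sketch, not confirmed behaviour of the extension.

```python
def build_categories(record, global_categories, item_specific):
    """Return the category names for one mediafile record.

    item_specific is a list of (phrase, metadata_field) pairs, e.g.
    ("created by", "author").
    """
    categories = list(global_categories)  # global categories apply to every record
    for phrase, field in item_specific:
        value = record.get(field, "").strip()
        if not value:
            continue  # assumption: no value in the record, no category
        name = f"{phrase} {value}".strip()
        if name not in categories:
            categories.append(name)
    return categories


example = build_categories(
    {"author": "Jane Doe"},
    ["Example GLAM uploads"],
    [("created by", "author"), ("depicting", "subject")],
)
print(example)
```

Here the record has no `subject` value, so only the global category and the author-based category are produced.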
Summary
- optionally provide a summary message that gives an overview of why you are uploading this metadata file and all of its records.
Batch preview
This step uploads the mediafiles and creates the mediafile pages for the first 3 records found in the metadata file.
- you can preview the results of the mapping
- you can go back to the mapping step and make any necessary changes.
Batch job creation
If the Batch preview looks good, go ahead and create the batch job process. This step will create the following background jobs:
UploadMetadataJob
The UploadMetadataJob will cycle through all of the records found in the uploaded metadata file provided in Step 1: Metadata Detection and create several individual UploadMediafileJobs. Depending on various configurations, the UploadMetadataJob will re-create itself in order to process all of the metadata records.
Throttles, Limits, Delays
- a metadata job delay.
  - the intent of this delay is to space out the runs of the UploadMetadataJobs when possible.
  - this delay only works when a job queue that honours delays is used to create UploadMetadataJobs, e.g. JobQueueRedis; the “regular” JobQueue does not currently honour delays.
  - GWToolset\Config::$metadata_job_delay, default is 1 minute.
- the total number of UploadMediafileJobs added to the job queue during an UploadMetadataJob run.
  - the intent of this throttle is to limit the number of mediafile requests against a given mediafile server.
  - this can also be set by the user in Step 1: Metadata Detection.
  - GWToolset\Config::$mediafile_job_throttle_default, default is 10.
  - GWToolset\Config::$mediafile_job_throttle_min, default is 1.
  - GWToolset\Config::$mediafile_job_throttle_max, default is 20.
- the total number of UploadMediafileJobs allowed in the job queue.
  - the intent of this throttle is to make sure the extension does not flood the job queue.
  - GWToolset\Config::$mediafile_job_queue_max, default is 1000.
  - if that limit is reached, the UploadMetadataJob will create another instance of itself and attempt to add the UploadMediafileJobs when that new instance is called.
    - if delayedJobsEnabled() is true, the new instance will also contain a jobReleaseTimestamp determined by GWToolset\Config::$metadata_job_attempt_delay, default is 5 minutes.
  - this process of re-creating the UploadMetadataJob when GWToolset\Config::$mediafile_job_queue_max has been reached will be attempted a limited number of times. This limit is set by GWToolset\Config::$metadata_job_max_attempts, default is 10. If this limit is exceeded, the extension will give up trying to add the UploadMediafileJobs and issue an Exception message.
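The interplay of these throttles and limits can be sketched in Python as follows. The defaults are the ones quoted above; everything else is an assumption for illustration: the `Config` dataclass, the helpers `clamp_throttle` and `plan_run`, clamping (rather than rejecting) an out-of-range user value, and treating the limit as reached when the queue size is greater than or equal to the maximum.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Defaults quoted in the text; the class itself is illustrative.
    mediafile_job_throttle_default: int = 10
    mediafile_job_throttle_min: int = 1
    mediafile_job_throttle_max: int = 20
    mediafile_job_queue_max: int = 1000
    metadata_job_attempt_delay: int = 5 * 60  # seconds
    metadata_job_max_attempts: int = 10


def clamp_throttle(requested, config):
    """Use the default when nothing was requested; otherwise clamp to [min, max]."""
    if requested is None:
        return config.mediafile_job_throttle_default
    return max(
        config.mediafile_job_throttle_min,
        min(requested, config.mediafile_job_throttle_max),
    )


def plan_run(queue_size, attempts, now, delayed_jobs_enabled, config):
    """Decide what one UploadMetadataJob run does, given the current queue size."""
    if queue_size < config.mediafile_job_queue_max:
        return {"action": "add_mediafile_jobs"}
    if attempts >= config.metadata_job_max_attempts:
        # The text says the extension gives up and issues an Exception message.
        raise RuntimeError("job queue still full after the maximum number of attempts")
    retry = {"action": "retry", "attempts": attempts + 1}
    if delayed_jobs_enabled:
        # Only set when the job queue honours delays.
        retry["jobReleaseTimestamp"] = now + config.metadata_job_attempt_delay
    return retry


cfg = Config()
print(clamp_throttle(50, cfg))
print(plan_run(queue_size=1000, attempts=0, now=100, delayed_jobs_enabled=True, config=cfg))
```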
UploadMediafileJob
The UploadMediafileJobs contain all of the information entered in Step 2: Metadata Mapping:
- the MediaWiki template to map to
- the metadata mapping
- specific record information
- any global and item specific categories that may have been added
- a summary message if entered
- whether or not to re-upload the mediafile
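The information carried by each job can be pictured as a simple serializable record. The Python dataclass below is a sketch only; the class name `UploadMediafileJobParams` and all field names are hypothetical and do not reflect the extension's actual job parameters.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class UploadMediafileJobParams:
    """Hypothetical payload for one UploadMediafileJob."""
    template: str                 # the MediaWiki template to map to
    mapping: dict                 # template field -> metadata element names
    record: dict                  # metadata for this specific record
    global_categories: list = field(default_factory=list)
    item_specific_categories: list = field(default_factory=list)
    summary: str = ""             # optional summary message
    reupload_mediafile: bool = False


params = UploadMediafileJobParams(
    template="ExampleTemplate",
    mapping={"author": ["creator"]},
    record={"creator": "Jane Doe", "url": "https://example.org/a.jpg"},
    global_categories=["Example GLAM uploads"],
)

# A job payload has to survive being stored in a queue, so check that it serializes.
payload = json.dumps(asdict(params))
print(payload)
```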
FileBackendCleanupJob
The UploadMetadataJob will continue to create another instance of itself as long as there are more metadata records to process. When it finishes cycling through all of the metadata records and has created the last UploadMediafileJob, it will create a FileBackendCleanupJob that will delete the FileBackend metadata file that was originally uploaded in Step 1: Metadata Detection.
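The chain of jobs described here can be simulated with a short Python sketch. It is an illustration under assumptions: jobs are modelled as plain tuples, the functions `run_metadata_job` and `simulate` are hypothetical, each run handles `throttle` records starting at an offset, and the queue limits and delays covered earlier are ignored.

```python
def run_metadata_job(records, offset, throttle):
    """One UploadMetadataJob run: return the jobs it would add to the queue."""
    if throttle < 1:
        raise ValueError("throttle must be at least 1")
    batch = records[offset:offset + throttle]
    jobs = [("UploadMediafileJob", record) for record in batch]
    next_offset = offset + len(batch)
    if next_offset < len(records):
        # More records remain: re-create the metadata job at the new offset.
        jobs.append(("UploadMetadataJob", next_offset))
    else:
        # Last record handled: schedule removal of the uploaded metadata file.
        jobs.append(("FileBackendCleanupJob", None))
    return jobs


def simulate(records, throttle):
    """Follow the chain of metadata jobs and return the job names in order."""
    log = []
    offset = 0
    while offset is not None:
        jobs = run_metadata_job(records, offset, throttle)
        log.extend(name for name, _ in jobs)
        name, value = jobs[-1]
        offset = value if name == "UploadMetadataJob" else None
    return log


log = simulate(["r1", "r2", "r3", "r4", "r5"], throttle=2)
print(log)
```

With five records and a throttle of 2, the simulation needs three metadata job runs, and the cleanup job is always the last entry.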
Gallery and Assets
These are images that are essential to understand the project. Mockups, screenshots, and icons fall into this category.