Extension:Wikispeech
Wikispeech | Release status: beta |
---|---|
Implementation | Page action, Ajax, API |
Description | Reads page text out loud using text-to-speech |
Author(s) | Sebastian Berlin, André Costa, Karl Wettin and Igor Leturia |
Latest version | 0.1.7 (2020-10-19) |
MediaWiki | >= 1.35 |
Database changes | Yes |
License | GNU General Public License 2.0 or later |
Vagrant role | wikispeech |
The Wikispeech project aims to create an open source text-to-speech tool that makes Wikimedia's projects more accessible to people who, for various reasons, have difficulty reading. Wikispeech will be available as a MediaWiki extension. More information can be found on the project page; this page covers only the Wikispeech extension itself.
Speechoid
The extension uses a service called Speechoid for TTS operations, such as creating audio for utterances. Speechoid consists of a main server, a lexicon server, TTS engines and any additional components that may be required for certain languages.
To prepare an utterance for playing, the extension sends a request to the service. This request contains the utterance as text, which language it is in and which voice to use. The service processes the text using a lexicon and one of the installed TTS engines, depending on what voice is being used. Once the audio has been generated, a response is returned with audio data along with some information that will enable highlighting and skipping. This is then used by the extension to actually play the utterance to the user and the process is repeated for the following utterances as needed.
Main Wikispeech Server
The main server has a web API that includes an endpoint for generating speech. It handles internal communication between the underlying servers, listed below.
Pronlex
Pronlex is a lexicon server with its own API. It holds the lexicon entries and has endpoints for looking them up and manipulating them. When an utterance is processed, its words are looked up in the lexicon, and if there is a matching entry, it is used for the pronunciation.
TTS engines
The server supports having multiple TTS engines. Which one is used for a certain utterance depends on which voice is given in the request.
MaryTTS
Comes with support for Arabic, English and Swedish.
Additional Components
Mishkal
Used to vocalize Arabic text.
Symbolset
Symbolset is a repository for handling phonetic symbol sets and mappers/converters between different symbol sets and languages.
Installation
- Download and place the file(s) in a directory called Wikispeech in your extensions/ folder.
- Run Composer to install PHP dependencies, by issuing composer install in the extension directory. (See T173141 for potential complications.)
- Add the following code at the bottom of your LocalSettings.php: wfLoadExtension( 'Wikispeech' );
- Done - Navigate to Special:Version on your wiki to verify that the extension is successfully installed.
Setting up Speechoid
The Wikispeech extension requires Speechoid to generate audio. Detailed instructions for installing Speechoid can be found on /Installing_Speechoid.
Basic configuration
For the Wikispeech extension to be able to communicate with Speechoid, you need to specify the service's URL. You can do this by adding the following line to LocalSettings.php:
$wgWikispeechSpeechoidUrl = 'URL';
where URL is the URL to your Speechoid instance.
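As a minimal sketch, the extension loading line from the Installation section and the Speechoid URL setting could sit together in LocalSettings.php as shown here. The address https://speechoid.example.org/ is a hypothetical placeholder, not a real Speechoid instance; substitute the URL of your own installation.

```php
// LocalSettings.php (excerpt)

// Load the extension, as described under Installation.
wfLoadExtension( 'Wikispeech' );

// Hypothetical address, for illustration only: replace it with the URL
// at which your own Speechoid instance is reachable.
$wgWikispeechSpeechoidUrl = 'https://speechoid.example.org/';
```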
Complete list of configuration options
Option | Default value | Documentation |
---|---|---|
WikispeechSpeechoidUrl | "" | The URL to use for the Speechoid service. |
WikispeechListenMaximumInputCharacters | 2048 | Maximum number of characters in the input (a segment) sent to the Speechoid service. |
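WikispeechRemoveTags | {"span": "mw-editsection", "table": true, "sup": "reference", "div": ["thumb", "toc"]} | Map of HTML tags that should be removed completely, i.e. including any content. Keys are tag names and the values determine whether a tag should be removed. Going by the defaults, true removes every tag of that name, a string removes tags with that class, and an array of strings removes tags with any of the listed classes. |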
WikispeechSegmentBreakingTags | ["h1", "h2", "h3", "h4", "h5", "h6", "p", "br", "li"] | HTML tags that break text into segments. This ensures that, for example, a header text without trailing punctuation is not merged into the same segment as the text content of a preceding paragraph. |
WikispeechNamespaces | [0] | List of the namespace indices for which Wikispeech is activated. |
WikispeechKeyboardShortcuts | {"playStop": {"key": 32, "modifiers": ["alt", "shift"]}, "skipAheadSentence": {"key": 39, "modifiers": ["alt", "shift"]}, "skipBackSentence": {"key": 37, "modifiers": ["alt", "shift"]}, "skipAheadWord": {"key": 40, "modifiers": ["alt", "shift"]}, "skipBackWord": {"key": 38, "modifiers": ["alt", "shift"]}} | Shortcuts for Wikispeech commands. Each shortcut defines the key pressed (as key code[1]) and any modifier keys (ctrl, alt or shift). |
WikispeechSkipBackRewindsThreshold | 3.0 | If an utterance has played longer than this (in seconds), skipping back will rewind to the start of the current utterance instead of skipping to the previous utterance. |
WikispeechHelpPage | "Help:Wikispeech" | Help page for Wikispeech. If defined, a button that takes the user here is added next to the player buttons. |
WikispeechFeedbackPage | "Wikispeech feedback" | Feedback page for Wikispeech. If defined, a button that takes the user here is added next to the player buttons. |
WikispeechContentSelector | "#mw-content-text" | The selector for the element that contains the text of the page. Used internally, but may change with MediaWiki version. |
WikispeechVoices | {"ar": ["ar-nah-hsmm"], "en": ["dfki-spike-hsmm", "cmu-slt-flite"], "sv": ["stts_sv_nst-hsmm"]} | Registered voices per language. System default voice falls back on the first registered voice for a language if not defined by Speechoid. |
WikispeechUtteranceTimeToLiveDays | 31 | Minimum number of days for an utterance to live before being automatically flushed from the utterance store; in effect, the cache lifetime for synthesized text. Setting this value too low saves disk space but causes frequently requested text segments to be re-synthesized more often, at a CPU cost. Setting this value too high delays the effect of improvements to the voice synthesis. Setting this value to 0 in effect turns off the cache, so that all utterances are flushed as soon as possible. To avoid running the flush job too often, see the MW job documentation: https://www.mediawiki.org/wiki/Manual:Job_queue#Job_execution_on_page_requests |
WikispeechUtteranceFileBackendName | "" | FileBackend group defined in LocalSettings.php used for utterance audio and metadata files. If not defined in LocalSettings.php, an FSBackend will be created that works against a temporary directory. See log warnings for the exact path. |
WikispeechUtteranceFileBackendContainerName | "wikispeech_utterances" | Container name used in FileBackend for utterance audio and metadata files. |
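The options in the table are set in LocalSettings.php by prefixing the option name with $wg, as with $wgWikispeechSpeechoidUrl. The following is only a sketch of a few overrides: every value is an illustrative assumption rather than a recommendation, and the help page title, backend name and storage path are hypothetical. The file backend entry relies on MediaWiki's generic $wgFileBackends setting and has not been checked against a running Wikispeech installation. Array-valued options are left out because how they combine with their defaults is not described on this page.

```php
// LocalSettings.php (excerpt); every value below is an illustrative assumption.

// Send shorter segments to Speechoid than the default of 2048 characters.
$wgWikispeechListenMaximumInputCharacters = 1024;

// Skipping back rewinds to the start of the current utterance once it has
// played for more than 5 seconds (default: 3.0).
$wgWikispeechSkipBackRewindsThreshold = 5.0;

// Keep synthesized utterances for at least 7 days instead of 31.
$wgWikispeechUtteranceTimeToLiveDays = 7;

// Point the help button at a different page (hypothetical page title).
$wgWikispeechHelpPage = 'Help:Listening to pages';

// Store utterance audio and metadata in a dedicated file backend instead of
// the temporary directory fallback. Backend name and path are hypothetical.
$wgFileBackends[] = [
	'name' => 'wikispeech-backend',
	'class' => 'FSFileBackend',
	'lockManager' => 'nullLockManager',
	'basePath' => '/var/lib/mediawiki/wikispeech',
];
$wgWikispeechUtteranceFileBackendName = 'wikispeech-backend';
```

After changing these settings, the log warnings mentioned in the table are a reasonable place to confirm which storage path is actually in use.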
CSS
This is a subset of the CSS rules, covering those that are most interesting for a non-developer.
Selector | Default values | Documentation |
---|---|---|
.ext-wikispeech-highlight-sentence | background-color: rgb( 200, 170, 255 ); | The visual highlighting for the sentence that is currently being recited. |
.ext-wikispeech-highlight-word | background-color: rgb( 255, 200, 140 ); | The visual highlighting for the word that is currently being recited. |