User:Mr.Z-man/cite

Ideas for refurbishing the current ref system:

The edit page

  • The current <ref></ref> system will continue to work, but will be deprecated in favor of a separate "reference manager"
    • The reference manager will be separate from the editbox and will list each reference in the article in its own textarea.
    • Each reference can be assigned a name to use in the article and will have its own unique ID.
  • The current <ref name="Foo"/> system will be deprecated in favor of a simpler {{#ref:Foo}}.
    • The new parser function will also accept optional additional info, for example {{#ref:Foo|page 50}}
  • The reference manager will work best with JavaScript, but will still be functional without it.
  • <references /> will continue to work as now.
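
To make the proposed {{#ref:Name|extra}} syntax concrete, here is a minimal Python sketch of how the calls could be picked out of wikitext. The real thing would be a MediaWiki parser function written in PHP; the function name `find_ref_calls` and the assumption that names contain no "|" or "}" (and extra info no "}") are illustrative only.

```python
import re
from typing import List, Optional, Tuple

# Matches {{#ref:Name}} or {{#ref:Name|extra info}}.
# Assumption (hypothetical): names contain no "|" or "}", extra info contains no "}".
REF_CALL = re.compile(r"\{\{#ref:\s*([^|}]+?)\s*(?:\|\s*([^}]*?)\s*)?\}\}")

def find_ref_calls(wikitext: str) -> List[Tuple[str, Optional[str]]]:
    """Return (name, extra_info) pairs in the order they appear in the text."""
    calls = []
    for m in REF_CALL.finditer(wikitext):
        name, extra = m.group(1), m.group(2)
        # An empty or missing extra part is treated as "no additional info".
        calls.append((name, extra if extra else None))
    return calls
```

For example, `find_ref_calls("Foo.{{#ref:Foo}} Bar.{{#ref:Foo|page 50}}")` gives one plain use of Foo and one use with "page 50".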

Rendered page

  • The rendered output will be essentially the same, with a few minor changes.
  • Rather than lumping all uses of the same reference onto one line, any use that has additional info will get its own indented line. For example:
  1. ^ab Doe, John. The Foo Book
    c page 55
    d pages 75-80
  • Any references included with the page but not used in the text will still be listed.
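
The layout above can be sketched as a small rendering function. This is an illustrative assumption of how backlink letters might be assigned (in order of use, one letter per use), not a specification; `render_references` is a hypothetical name, and the output is plain text rather than the HTML the parser would really emit.

```python
from string import ascii_lowercase
from typing import Dict, List, Optional, Tuple

def render_references(calls: List[Tuple[str, Optional[str]]],
                      texts: Dict[str, str]) -> List[str]:
    """Render a plain-text reference list in the proposed layout."""
    # Group the uses of each reference by name, in order of first use.
    uses: Dict[str, List[Optional[str]]] = {}
    for name, extra in calls:
        uses.setdefault(name, []).append(extra)
    # References defined in the manager but never cited are still listed, after the cited ones.
    for name in texts:
        uses.setdefault(name, [])

    lines: List[str] = []
    for number, (name, extras) in enumerate(uses.items(), start=1):
        # Backlink letters follow the order of use: a, b, c, ...
        # (assumes at most 26 uses of one reference).
        lettered = list(zip(ascii_lowercase, extras))
        plain = "".join(letter for letter, extra in lettered if extra is None)
        backlinks = f"^{plain} " if plain else ""
        lines.append(f"{number}. {backlinks}{texts.get(name, '')}")
        # Uses with additional info each get their own indented line.
        for letter, extra in lettered:
            if extra is not None:
                lines.append(f"    {letter} {extra}")
    return lines
```

Given two plain uses of Foo followed by uses with "page 55" and "pages 75-80", this reproduces the example above, and an unused reference is listed afterwards without backlinks.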

Database

  • One table will store the wikitext of each reference together with a unique ID number for that text.
  • A second table will map rev_id -> ref_id.
  • Using two tables saves space: most edits won't change every reference on the page, so a new revision can point at the existing ref_ids for unchanged references instead of storing their text again.
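
A minimal sketch of the two-table layout, using Python's sqlite3 only so the example runs on its own (the wiki itself would use MySQL). All table and column names are hypothetical, and looking up existing text by a SHA-1 hash is an assumed way of spotting unchanged references.

```python
import hashlib
import sqlite3
from typing import Dict

# Hypothetical table and column names; sqlite3 stands in for MySQL here.
SCHEMA = """
CREATE TABLE ref_text (
    ref_id       INTEGER PRIMARY KEY,
    ref_sha1     TEXT NOT NULL UNIQUE,  -- hash of the wikitext, to find identical text quickly
    ref_wikitext TEXT NOT NULL
);
CREATE TABLE rev_ref (
    rr_rev_id INTEGER NOT NULL,
    rr_ref_id INTEGER NOT NULL REFERENCES ref_text (ref_id),
    rr_name   TEXT NOT NULL,            -- the name used in the article
    PRIMARY KEY (rr_rev_id, rr_name)
);
"""

def store_text(db: sqlite3.Connection, wikitext: str) -> int:
    """Return the ref_id for this wikitext, inserting a row only if the text is new."""
    digest = hashlib.sha1(wikitext.encode("utf-8")).hexdigest()
    row = db.execute("SELECT ref_id FROM ref_text WHERE ref_sha1 = ?", (digest,)).fetchone()
    if row is not None:
        return row[0]
    cur = db.execute(
        "INSERT INTO ref_text (ref_sha1, ref_wikitext) VALUES (?, ?)", (digest, wikitext))
    return cur.lastrowid

def save_revision(db: sqlite3.Connection, rev_id: int, refs: Dict[str, str]) -> None:
    """Record which reference texts (by ID) belong to a revision."""
    for name, wikitext in refs.items():
        db.execute(
            "INSERT INTO rev_ref (rr_rev_id, rr_ref_id, rr_name) VALUES (?, ?, ?)",
            (rev_id, store_text(db, wikitext), name))
```

Saving two revisions in which only one of two references changes adds four small mapping rows but only three text rows, which is the space saving described above.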

Other requirements

  • enwiki citation templates are expensive to parse, so good caching is needed. This should be easier with this system than with the current one.
  • Need to make sure links tables are properly updated - parser should take care of this?
  • Reference changes need to be included in diffs somewhere
  • Need to be able to revert reference changes when reverting an article, either with rollback, undo, or manual reversion
  • Need to be able to retrieve and edit refs with the API.
  • Needs to be integrated with anti-abuse systems - spam checks, abuse filter
  • Reference text needs to be included with database dumps
  • FlaggedRevs - if text is tied to revids, this may not be an issue
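
If reference text is tied to revisions through the rev_id -> ref_id mapping, diffs and reverts could work on IDs alone. The sketch below is a hypothetical in-memory stand-in for that mapping table; it assumes identical text always gets the same ref_id, so a changed ref_id means changed text. Function names are illustrative.

```python
from typing import Dict, List

# Hypothetical stand-in for the mapping table: rev_id -> {reference name: ref_id}
RevRefs = Dict[int, Dict[str, int]]

def diff_refs(mapping: RevRefs, old_rev: int, new_rev: int) -> Dict[str, List[str]]:
    """List reference names added, removed, or changed between two revisions."""
    old, new = mapping[old_rev], mapping[new_rev]
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Same name but a different ref_id means the text changed;
        # no text comparison is needed (assumes identical text shares one ref_id).
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }

def revert_refs(mapping: RevRefs, target_rev: int, new_rev: int) -> None:
    """Make new_rev use exactly the references of target_rev."""
    # The reverting revision points at the same ref_ids, so no text is duplicated.
    mapping[new_rev] = dict(mapping[target_rev])
```

Rollback, undo, and manual reversion would then all reduce to copying an older revision's mapping, and the diff output is what could be shown alongside the normal text diff.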

Possibly

  • Allow importing of refs from other articles
  • Automatic conversion of the current syntax with an option while editing
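
Automatic conversion could look roughly like the following sketch: full <ref> tags are replaced by {{#ref:...}} calls and their text is collected for the reference manager, and <ref name="Foo"/> becomes {{#ref:Foo}}. It assumes double-quoted name attributes and no nested tags, and the "autoN" names given to unnamed refs are a hypothetical scheme (a real version would need to avoid clashing with existing names).

```python
import re
from typing import Dict, Tuple

# <ref name="Foo">text</ref> or <ref>text</ref> (assumes double-quoted names)
FULL_REF = re.compile(r'<ref(?:\s+name\s*=\s*"([^"]*)")?\s*>(.*?)</ref>', re.DOTALL)
# <ref name="Foo"/> or <ref name="Foo" />
SELF_CLOSING = re.compile(r'<ref\s+name\s*=\s*"([^"]*)"\s*/>')

def convert(wikitext: str) -> Tuple[str, Dict[str, str]]:
    """Return the converted wikitext and a {name: reference text} dict."""
    refs: Dict[str, str] = {}
    counter = 0

    def replace_full(m: re.Match) -> str:
        nonlocal counter
        name = m.group(1)
        if not name:
            counter += 1
            name = f"auto{counter}"  # hypothetical naming scheme for unnamed refs
        # If a name is defined twice, the first definition wins.
        refs.setdefault(name, m.group(2).strip())
        return "{{#ref:" + name + "}}"

    text = FULL_REF.sub(replace_full, wikitext)
    text = SELF_CLOSING.sub(lambda m: "{{#ref:" + m.group(1) + "}}", text)
    return text, refs
```

<references /> is left untouched, since it matches neither pattern.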

Searching

  • A plain text search would likely be too slow. InnoDB doesn't support fulltext indexes (InnoDB fulltext support only arrived later, in MySQL 5.6), and MyISAM, which does, locks whole tables on write; concurrent writes are probably more important than the ability to search.
  • The options for searching are:
    1. Instead of a single textarea per reference stored in a blob, use small text fields stored in varchar columns. Fulltext searching still couldn't be used, but searching for exact matches or prefixes on individual fields would be possible, e.g. author_last_name = "Doe" AND author_first_name LIKE "J%"
      • Downsides: puts a lot more limitations on input/output options (templates would have to be defined in system messages), less backward compatible, could make the edit page more cluttered with tons of input fields rather than a few textareas.
      • Benefits: Probably the easiest to use, not too hard to code.
    2. Integrate with the current search engine, or create a new search engine with a MyISAM index table just for this.
      • Downsides: A lot more complex, extra tables/indices would take up a lot of space for a non-critical feature, output might be harder to use.
      • Benefits: Would be fast and versatile, would work with any reference format.
    3. Don't allow searching of citation text, just lookups by page or revid
      • Downsides: Not very useful.
      • Benefits: Easiest to code, performance would not be an issue.
    4. Create a toolserver tool to search.
      • Downsides: As an external tool, it would be harder to integrate into normal editing. It would also be less useful if reference text needs to be put in external storage, since the text would then have to be retrieved from dumps.
      • Benefits: Performance isn't as critical for the toolserver, so a slower query isn't as much of a problem, allows for more site-specific customization.
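
Option 1's field-based lookup could be sketched as below, with sqlite3 used only so it runs standalone. The table, column, and index names are hypothetical. The composite index on (last name, first name) is the kind of index that lets MySQL answer an exact match plus a constant-prefix LIKE without a table scan; user input is escaped so "%" and "_" are matched literally.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical structured-field table for option 1.
SCHEMA = """
CREATE TABLE ref_fields (
    ref_id            INTEGER PRIMARY KEY,
    author_last_name  VARCHAR(255),
    author_first_name VARCHAR(255),
    title             VARCHAR(255)
);
CREATE INDEX ref_author ON ref_fields (author_last_name, author_first_name);
"""

def escape_like(s: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return s.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def find_by_author(db: sqlite3.Connection, last: str, first_prefix: str) -> List[Tuple[int, str]]:
    """Exact match on last name, prefix match on first name."""
    rows = db.execute(
        "SELECT ref_id, title FROM ref_fields "
        "WHERE author_last_name = ? AND author_first_name LIKE ? ESCAPE '\\' "
        "ORDER BY ref_id",
        (last, escape_like(first_prefix) + "%"),
    )
    return rows.fetchall()
```

This is the query from option 1 (author_last_name = "Doe" AND author_first_name LIKE "J%"), expressed as `find_by_author(db, "Doe", "J")`.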