User:DanielRenfro/Biological Wikis

Abstract

  • Biological wikis are here to stay.
  • However, there has yet to be an in-depth investigation of the advantages and disadvantages of the wiki methodology.

Introduction

  • The inherent complexity and ever-increasing production rate of biological data make structured storage difficult. The canonical relational model devised by Edgar Codd in 1970 requires the types of data and their relationships to be understood before any data are entered [1]. This type of model is poorly suited to information that changes constantly. Despite this, relational databases have become the main data storage mechanism in biology; examples include Chado [2], ACeDB [3, 4], Pathway Tools [5], and ARKdb [6]. In the past few years there has been increasing interest in wiki-based biological databases. Numerous so-called "bio-wikis" have appeared on the internet [refs!] and in the literature [7-10], and they have even led to meetings specifically about the topic [11]. Built on the idea of distributed, collaborative effort by the community, the wiki approach has gained popularity mainly in the area of biocuration [12, 13].
  • Some have integrated wikis into their projects with varying success
    • Biocuration, annotation
    • Health Information Management [14-16]
  • Some have called for changes to the existing archival databases; this may be asking too much.
  • There are qualities that wikis offer, along with difficulties to overcome.
  • There has been stiff resistance to opening GenBank to wiki-style updates [17].
  • Wikipedia's accuracy is comparable to that of a traditional reference encyclopedia [18].
  • Wikis are complex systems built upon collaboration among many individuals and have been likened to self-organizing systems [19]. In any emergent system, self-organization relies on multiple independent entities interacting with limited local knowledge. In any wiki, a critical number of users is necessary for the wiki to become self-sustaining via stigmergy...
  • One reason the wiki paradigm does not work well in academia could be that wikis are self-organizing systems. A successful wiki is an emergent entity that is not governed by any overarching blueprint or master engineer. Small wikis, in addition to having to overcome the problem of user-base size, struggle to grow within their defined boundaries (is it appropriate to make a page about ___ on this wiki?). Wikipedia avoids this problem by being all-encompassing. What a wiki becomes cannot be dictated by a single member; it depends on the limited contributions of individual members changing their local environment (the set of pages that interest them). Mark Elliot borrows the term "stigmergy" from the study of social insects to describe Wikipedia's self-organizing nature [20, 21].


Benefits of Wikis

  1. features that do not have to be custom-written
    1. edit history, auditing, undo/rollback
    2. user/group permissions
      1. not at the page level
    3. revision comparison
    4. create pages/content on-the-fly
  2. ease of use
    1. no prior knowledge required
    2. scalable to users of all levels of knowledge (of wiki markup)
    3. more recently, WYSIWYG editors
  3. ability to capture "notebook-level" (narrative) information
  4. experts can add knowledge of their field
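The edit-history features listed above (history, auditing, revision comparison, undo/rollback) are what a wiki engine provides without custom code. The idea behind them can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical `Page` class that stores every revision as a full-text snapshot and uses the standard library's `difflib` for comparison; it is not how MediaWiki itself is implemented.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical wiki page that keeps every revision as a full-text snapshot."""
    title: str
    # (user, text) pairs, oldest first; entries are only ever appended.
    revisions: list = field(default_factory=list)

    def edit(self, user, text):
        """Save a new revision; earlier revisions are never overwritten."""
        self.revisions.append((user, text))

    @property
    def current(self):
        """Text of the latest revision (empty string for a new page)."""
        return self.revisions[-1][1] if self.revisions else ""

    def diff(self, old, new):
        """Line-based unified diff between revision indices `old` and `new`."""
        a = self.revisions[old][1].splitlines()
        b = self.revisions[new][1].splitlines()
        return list(difflib.unified_diff(a, b, f"r{old}", f"r{new}", lineterm=""))

    def rollback(self, index, user):
        """Undo by re-saving an earlier revision as the newest one, so the undo is itself audited."""
        self.edit(user, self.revisions[index][1])


# Usage: a curator's edit, an unhelpful edit, then a rollback.
page = Page("Gene X")
page.edit("alice", "Gene X encodes a kinase.\nExpressed in liver.")
page.edit("anon", "Gene X encodes a kinase.\nGene X is boring.")
page.rollback(0, "bob")

for line in page.diff(0, 1):
    print(line)
```

Rolling back by appending a copy, rather than deleting the bad revision, keeps the full audit trail intact, which is the property that makes wiki history useful for curation.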

Considerations


The popularity of the online encyclopedia Wikipedia has drawn many biologists and bioinformatics researchers into the wiki arena. What makes the wiki model work for Wikipedia may be its largest drawback for biology: the dependence on a large community of users. Most biological wikis appear to suffer chiefly from a lack of community participation.

The Gene Wiki has overcome this issue by teaming up with Wikipedia to leverage its large user base for annotating gene and protein function [12, 22]. The partnership solves problems such as web visibility, but at a price: Wikipedia is selective about the types of articles that can be created. A certain level of notability is required for an article, which excludes the majority of biological entities [23]. In addition, Wikipedia maintains a strict policy against original research [24], so all information in Wikipedia must be previously published and citable.

Specific to Wiki

  1. a wiki's strength is in capturing narrative data, not tabular or ordered data
  2. lack of data-consistency checking
  3. highly manual
  4. custom software still needs to be written
    1. software in academia is often poorly engineered [ref:]
    2. smaller community
  5. MediaWiki (the most popular wiki software) is written in PHP, whereas few biological software initiatives use PHP (compare BioPerl, BioJava, etc.) [ref:]
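To make the data-consistency point concrete, here is a minimal Python sketch using the standard `sqlite3` module. A relational table, whose schema must be declared before any data are entered (the constraint of Codd's model noted in the Introduction), rejects inconsistent rows and has no place for an unanticipated field; a wiki page, modelled here as a plain dictionary of text, accepts any edit. The `gene` schema, column names, and example values are hypothetical and not taken from any real database.

```python
import sqlite3


def make_gene_table():
    """In-memory SQLite database with a hypothetical, fixed gene schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE gene (
               symbol     TEXT PRIMARY KEY,
               chromosome TEXT NOT NULL,
               start_pos  INTEGER NOT NULL CHECK (start_pos >= 0),
               end_pos    INTEGER NOT NULL,
               CHECK (end_pos >= start_pos)
           )"""
    )
    return conn


def try_insert(conn, sql, params):
    """Run one INSERT; return None if accepted, else the sqlite3 error class name."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return None
    except sqlite3.Error as exc:
        return type(exc).__name__


conn = make_gene_table()
insert = "INSERT INTO gene VALUES (?, ?, ?, ?)"
valid = try_insert(conn, insert, ("geneA", "chr1", 100, 500))
# End coordinate before start: violates the table's CHECK constraint.
inverted = try_insert(conn, insert, ("geneB", "chr2", 900, 100))
# A new kind of fact (a free-text note) has no column to go into.
new_field = try_insert(
    conn,
    "INSERT INTO gene (symbol, chromosome, start_pos, end_pos, note) "
    "VALUES (?, ?, ?, ?, ?)",
    ("geneC", "chr3", 1, 2, "expressed in root"),
)
print(valid, inverted, new_field)

# A wiki page is free text: both the inconsistent coordinates and the new
# fact are saved without complaint, so checking is left to human editors.
wiki = {}
wiki["geneB"] = "Located on chr2 from 900 to 100. Expressed in root."
```

The trade-off is the one described above: the relational schema front-loads consistency at the cost of flexibility, while the wiki is flexible but leaves all consistency checking to manual effort.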

In general

  1. software in academia lacks robustness
    1. little to no documentation
    2. written for a specific machine architecture, language, library, etc.
    3. unobtainable source
  2. little investment in collaboration in academia
    1. lack of willingness
      • inherent to the system of publish-or-perish based on competition
      • lack of knowledge of said technology
      • lack of time
      • lack of funding
  3. few standards in web connectivity between databases
    1. combined with #2 above, such connectivity is rarely implemented
  4. collaboration incentives lacking
    1. no immediate rewards for contribution
    2. tenuous long-term rewards
    3. biological data seems to be subcritical in its atomic state (the pieces alone tell you little); only in higher-order derivations does its importance become apparent
    4. established publication system is resistant to recognizing or validating wikis as an accepted source
      • due to problems associated with anonymous editing (and with the association with Wikipedia)

Discussion and Future Directions

  • community participation is the most often cited rate-limiting factor
  • structure of the wiki (standards need to be decided upon)

Resources


References


1. Codd EF: A relational model of data for large shared data banks. 1970. MD Comput 1998, 15(3):162-166.

2. Mungall CJ, Emmert DB: A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics 2007, 23(13):i337-346.

3. Walsh S, Anderson M, Cartinhour SW: ACEDB: a database for genome information. Methods Biochem Anal 1998, 39:299-318.

4. Rogers A, Antoshechkin I, Bieri T, Blasiar D, Bastiani C, Canaran P, Chan J, Chen WJ, Davis P, Fernandes J et al: WormBase 2007. Nucleic Acids Res 2008, 36(Database issue):D612-617.

5. Karp PD, Paley SM, Krummenacker M, Latendresse M, Dale JM, Lee TJ, Kaipa P, Gilham F, Spaulding A, Popescu L et al: Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology. Brief Bioinform 2010, 11(1):40-79.

6. Hu J, Mungall C, Law A, Papworth R, Nelson JP, Brown A, Simpson I, Leckie S, Burt DW, Hillyard AL et al: The ARKdb: genome databases for farmed and other animals. Nucleic Acids Res 2001, 29(1):106-110.

7. Bidartondo MI: Preserving accuracy in GenBank. Science 2008, 319(5870):1616.

8. Hu JC, Aramayo R, Bolser D, Conway T, Elsik CG, Gribskov M, Kelder T, Kihara D, Knight TF, Jr., Pico AR et al: The emerging world of wikis. Science 2008, 320(5881):1289-1290.

9. Salzberg SL: Genome re-annotation: a wiki solution? Genome Biol 2007, 8(1):102.

10. Giles J: Key biology databases go wiki. Nature 2007, 445(7129):691.

11. NETTAB 2010: Biological Wikis.

12. Huss JW, 3rd, Lindenbaum P, Martone M, Roberts D, Pizarro A, Valafar F, Hogenesch JB, Su AI: The Gene Wiki: community intelligence applied to human gene annotation. Nucleic Acids Res 2010, 38(Database issue):D633-639.

13. Howe D, Costanzo M, Fey P, Gojobori T, Hannick L, Hide W, Hill DP, Kania R, Schaeffer M, St Pierre S et al: Big data: The future of biocuration. Nature 2008, 455(7209):47-50.

14. Harris ST, Zeng X: Using wiki in an online record documentation systems course. Perspect Health Inf Manag 2008, 5:1.

15. Cobus L: Using blogs and wikis in a graduate public health course. Med Ref Serv Q 2009, 28(1):22-32.

16. Mader S: Using wiki in education. Providence, R.I.: S. Mader.

17. Pennisi E: DNA data. Proposal to 'Wikify' GenBank meets stiff resistance. Science 2008, 319(5870):1598-1599.

18. Giles J: Internet encyclopaedias go head to head. Nature 2005, 438(7070):900-901.

19. Tapscott D, Williams AD: Wikinomics: how mass collaboration changes everything, Expanded edn. New York: Portfolio; 2008.

20. Elliot M: Stigmergic Collaboration: The Evolution of Group Work. M/C Journal 2006, 9(2).

21. Elliot M: Stigmergic Collaboration: A Theoretical Framework for Mass Collaboration. Melbourne, Australia: The University of Melbourne, Australia; 2007.

22. Gene Wiki Portal.

23. Wikipedia:Notability.

24. Wikipedia:No original research.