User:OrenBochman/Search/Plan

Milestone 1 - Working Prototype

  • Goal: Build & deploy a prototype to Labs

Tasks

  • Project Admin
    •   Write a Test Plan
    •   Finish Risk Assessment
  •   Development (search engine components are listed below)
  •   Prototype deployment to Labs

Search Engine Components

  • Indexing
    •   reading compressed dumps
    •   reading HTML from cache
    •   integrating HTML cleaning (Apache Tika)
  • User Interface
    • JavaScript user interface
  • Admin & Deployment
    •   continuous integration
    •   configuration management
    •   puppetise the environment (Puppet)
  • Testing
    •   unit tests in CI
    •   preliminary benchmarking reports (generated with R) in the docs folder

Milestone 2 - Feature Compatibility with Lucene_Search 2.1

  • Goal: Match most of the 2.1 features
  • Use TermPositionVectors
  • Fast Highlighting
  • Update the analysis chain to work with the current API version
    • HTML Analyzer
    • Processing Wikicode - wikitokenizer
    • Lowercase
    • Hyperlink
    • Aliases
    • Title Shingles
  • Language support
    • Accent Normalization
    • Snowball
    • English
    • CJK filter
    • Serbian
    • Vietnamese
    • Russian
  • WordNet
  • Spelling / "Did you mean"
    • Admin
  • JMX support
  • More Benchmarking Reports
  • Integrate Existing UI

Milestone 3 - Production

  • Deployment to production environment

These should be operational from the prototype stage onward:

  • Sharding
  • Replication
  • Update mechanism (incremental)
    • Get updates from the metadata repository
    • Get data via maintenance/dump.php
    • BitTorrent-based distribution of search index update dumps
  • Configuration management
    • Standalone - LocalSettings.php
    • Multiple - CommonSettings.php
  • Real-time indexing
    • minimize the time between an edit and its appearance in search results
    • update to special page on search

Phase 4: Next-Generation (NG) Features

  • UI improvements
  • More Language support
  • Result Clustering support
  • Result Faceting support
  • Disambiguation support
  • Search Analytics
  • Morphological Search
  • Ontology
  • Semantic Search
  • Entity Extraction
  • Integration of NLP tools
  • Memee