Search/Old/Solr 4

Here are Chad's first impressions of Solr 4.2, after a week of using it.

  • I set up a SolrCloud cluster on labs
    • Used 3 nodes acting as a ZooKeeper ensemble (solr-zk[0-2])
    • Used 4 nodes for solr (solr0-solr3), one collection in two shards
  • The new SolrCloud stuff is (mostly) awesome
    • It's super easy to add new replicas to a collection, so it scales out nicely.
    • Each instance can act as a master (for writes) or a slave (for reads)
      • This removes the SPOF of having a single indexer (lsearchd) or a single master (Solr 3.x and below)
    • Has a GUI which makes it easy to look at the state of the "cloud"
    • ZooKeeper manages config & index state
    • Can't re-shard an existing collection; changing the shard count requires an index rebuild. There are bugs reported for this, with no ETA.
      • Proper initial planning for the larger indices would make this less of a priority.
  • ZooKeeper was easy to set up, works with standard Ubuntu packages & minimal config
    • Not really a SPOF since it requires multiple instances to run.
      • It needs a strict majority ("50% + 1") of its servers to operate. So with 3 servers you need 2/3 operational, with 5 you need 3/5, etc.
      • Even the "leader" isn't a SPOF: if the leader goes away, ZooKeeper elects a new leader.
    • ZooKeeper is already used by analytics, and they like it.
    • Unknown how well it would work cross-DC (conflicting reports)
  • Solr 4.x isn't packaged in Ubuntu yet, not even in raring
    • Installed by hand for the demo, but we'll want to look into real packages.
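
The ZooKeeper quorum rule noted above ("50% + 1") can be sketched as simple arithmetic. This is only an illustration of the majority rule, not ZooKeeper code; the function names quorum_size and tolerated_failures are made up for this example.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    # Integer division implements "50% + 1" for both odd and even sizes.
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Hypothetical ensemble sizes, chosen for illustration only.
    for n in (3, 4, 5):
        print(f"{n} servers: need {quorum_size(n)} up, "
              f"can lose {tolerated_failures(n)}")
```

Under this rule a 4-server ensemble tolerates no more failures than a 3-server one, which is why odd ensemble sizes such as the 3 nodes used here are the usual choice.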