Comparisons

Several large-scale users of big data have published detailed figures for some of their data processing products and clusters. A few are presented here for reference.

Twitter Rainbird (Cassandra, 2011)

Analytics for the Promoted Tweets advertising platform.

  • 100,000s of writes/sec
  • 10,000s of reads/sec
  • 100TB+ storage
  • Extremely low latency: <100ms reads
  • Events are batched for ~60 seconds
  • Parsing and structuring performed by bundlers
  • Clients submitting events are Rainbird-aware

Facebook Insights (HBase, 2011)

Social-plugin analytics for site owners.

  • 20 billion events/day
  • 200,000 events/second
  • <30s average delay before event surfaces in queries
  • 100+ metrics, but stored only as counters
  • Events are batched for ~1.5 seconds
  • Each node handling 10k writes/sec
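
As a rough sanity check, the daily and per-second figures above can be compared with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and it assumes the daily total is spread evenly over 24 hours, which the source does not state.

```python
# Compare the quoted daily event volume with the quoted per-second rate.
# Assumption (not from the source): events arrive evenly across the day.
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 20_000_000_000   # "20 billion events/day"
quoted_events_per_sec = 200_000   # "200,000 events/second"

# Average rate implied by the daily total.
avg_events_per_sec = events_per_day / SECONDS_PER_DAY

print(f"implied average: {avg_events_per_sec:,.0f} events/sec")
print(f"quoted rate:     {quoted_events_per_sec:,} events/sec")
```

The implied average is about 231,000 events/sec, so the two quoted figures agree to within rounding and are best read as approximate.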

Facebook Messages (HBase, 2010)

The Facebook messaging system.

  • 135 billion messages/month (~4.5B/day)
  • 1.5M+ operations/second at peak
  • ~55% reads: 825,000 reads/sec
  • ~45% writes: 675,000 writes/sec
  • 2PB+ (petabytes) data in storage
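
The derived figures in this list follow from simple arithmetic on the quoted totals. A minimal sketch, assuming a 30-day month (an assumption; the source only gives the rounded daily value) and using illustrative variable names:

```python
# Reproduce the derived figures from the quoted totals.
# Assumption (not from the source): a month is taken as 30 days.
DAYS_PER_MONTH = 30

messages_per_month = 135_000_000_000   # "135 billion messages/month"
peak_ops_per_sec = 1_500_000           # "1.5M+ operations/second at peak"

# Integer arithmetic avoids floating-point rounding in the percentages.
messages_per_day = messages_per_month // DAYS_PER_MONTH
reads_per_sec = peak_ops_per_sec * 55 // 100    # ~55% reads
writes_per_sec = peak_ops_per_sec * 45 // 100   # ~45% writes

print(f"messages/day: {messages_per_day:,}")
print(f"reads/sec:    {reads_per_sec:,}")
print(f"writes/sec:   {writes_per_sec:,}")
```

Because the peak figure is given as "1.5M+", the read and write rates are lower bounds at peak rather than exact values.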
