1. HBase/PHOENIX @ Scale: A study of Salesforce’s use of HBase and Phoenix. Lars Hofhansl, Vice President and Principal Architect at Salesforce; Apache HBase and Apache Phoenix committer and PMC member
2.Two Years Ago, I showed you this
5. Zookeeper? HBase?
6. Zookeeper? HBase? HDFS?
7. Zookeeper? HBase? HDFS? Commodity Hardware?
8. Zookeeper? HBase? HDFS? Unstructured Data? Commodity Hardware?
9.HBase/Phoenix are BIG* at Salesforce * Numbers are from some time in the past and do not reflect the current scale
10.Heavy users of relational and semi-structured data
11.Mix of customer* and internal data * Through the Salesforce Platform Offering
12.Typical Use Cases
● Samples of Customer Data:
  ○ Login data to track anomalies in real time
  ○ Archiving, historical data moved from operational, relational storage to HBase
  ○ Denormalized feed views for Salesforce Chatter
  ○ Chatter @mention low-latency relevancy queries
  ○ Storage of user activity on marketing campaigns for reporting and AI/ML
● Samples of Internal Usage:
  ○ Periodic thread dumps from all AppServers
  ○ Machine metrics from all machines
14.> 100 clusters of varying size
15.~4bn write requests / day
16.~80TB written / day
17.That’s about 8 Gbit/s, sustained
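The bandwidth figure follows from the daily write volume. A minimal sketch of that arithmetic, assuming the 80 TB are spread evenly over 24 hours (the slides do not say whether "TB" is decimal or binary, so both readings are shown; the function name is illustrative):

```python
# Back-of-the-envelope check: ~80 TB written per day as a sustained bit rate.
# Assumptions (not from the slides): even spread over 24 hours; TB may mean
# 10**12 bytes (decimal) or 2**40 bytes (binary).
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_gbit_per_s(bytes_per_day: float) -> float:
    """Convert a daily byte volume into an average rate in Gbit/s."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 10**9


decimal_rate = sustained_gbit_per_s(80 * 10**12)
binary_rate = sustained_gbit_per_s(80 * 2**40)

print(f"decimal TB: {decimal_rate:.1f} Gbit/s")
print(f"binary TB:  {binary_rate:.1f} Gbit/s")
```

Either reading lands between 7 and 8.5 Gbit/s, which is consistent with the rounded figure on the slide.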
18.~600m read requests / day
19.~500GB read / day
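The request counts and byte volumes above also imply an average payload per request. A rough sketch, assuming decimal units and taking the approximate slide figures at face value (the averages are derived here, not stated in the talk):

```python
# Average payload per request, derived from the approximate daily totals.
# Assumptions (not from the slides): decimal units (1 TB = 10**12 bytes,
# 1 GB = 10**9 bytes); "bn" = 10**9 and "m" = 10**6.
WRITE_REQUESTS_PER_DAY = 4e9
WRITE_BYTES_PER_DAY = 80e12
READ_REQUESTS_PER_DAY = 600e6
READ_BYTES_PER_DAY = 500e9


def avg_bytes_per_request(total_bytes: float, requests: float) -> float:
    """Mean number of bytes moved per request."""
    return total_bytes / requests


avg_write = avg_bytes_per_request(WRITE_BYTES_PER_DAY, WRITE_REQUESTS_PER_DAY)
avg_read = avg_bytes_per_request(READ_BYTES_PER_DAY, READ_REQUESTS_PER_DAY)
write_to_read_requests = WRITE_REQUESTS_PER_DAY / READ_REQUESTS_PER_DAY

print(f"average write request: {avg_write / 1000:.0f} KB")
print(f"average read request:  {avg_read:.0f} bytes")
print(f"write:read request ratio: {write_to_read_requests:.1f}")
```

Under these assumptions the workload is write-heavy both in request count and in bytes, with writes on the order of 20 KB each and reads under 1 KB each.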
20.Central Metrics Database
22.Collecting data from > 80,000 machines
23. 11.4 trillion metrics stored and growing
24. 2.8 trillion metrics in 6 months and growing
25. 210 billion reads in 6 months
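The six-month totals for the metrics database can be turned into average per-second rates. A hedged sketch, assuming "6 months" means 182.5 days, that the 2.8 trillion figure counts metrics ingested in that window, and using 80,000 machines as a lower bound (all three are assumptions, not statements from the slides):

```python
# Average rates for the central metrics database over a six-month window.
# Assumptions (not from the slides): 6 months = 182.5 days; the 2.8 trillion
# metrics were ingested in that window; the fleet is exactly 80,000 machines
# (the slide only says "more than").
SECONDS_IN_SIX_MONTHS = 182.5 * 24 * 60 * 60

METRICS_IN_SIX_MONTHS = 2.8e12
READS_IN_SIX_MONTHS = 210e9
MACHINES = 80_000

metrics_per_s = METRICS_IN_SIX_MONTHS / SECONDS_IN_SIX_MONTHS
reads_per_s = READS_IN_SIX_MONTHS / SECONDS_IN_SIX_MONTHS
metrics_per_machine_per_s = metrics_per_s / MACHINES

print(f"metrics ingested: {metrics_per_s:,.0f} per second")
print(f"reads served:     {reads_per_s:,.0f} per second")
print(f"per machine:      {metrics_per_machine_per_s:.1f} metrics per second")
```

On these assumptions the database ingests on the order of 180,000 metrics per second and serves roughly 13,000 reads per second, or about two metrics per machine per second.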