Building an Online HBase Cluster at Zhihu Based on Kubernetes
1. Building an Online HBase Cluster at Zhihu Based on Kubernetes
2. Agenda • HBase at Zhihu • Using Kubernetes • HBase Online Platform
3. • HBase at Zhihu • Using Kubernetes • HBase Online Platform
4. HBase at Zhihu • Offline: physical machines, more than 200 nodes, working with Spark/Hadoop • Online: based on Kubernetes, more than 300 containers
5. Our online storage • MySQL: used by most businesses; some need scaling, some need transformation; all-SSD is expensive • Redis: cache and partial storage; no sharding; expensive • HBase / Cassandra / RocksDB, etc.?
6. At the beginning • All businesses on one big cluster • The cluster also runs NodeManager and ImpalaServer • Only basic operations • Physical-node-level monitoring
7. What we want • From the business perspective: environment isolation; SLA definition; business-level monitoring • From the operations perspective: balanced resources (CPU, I/O, RAM); a friendly API; controllable costs
8. In a word: offer HBase as a Service.
9. • HBase at Zhihu • Using Kubernetes • HBase Online Platform
10. Zhihu's Unified Cluster Management Platform
11. Kubernetes • Cluster resource manager and scheduler • Uses containers to isolate resources • Application management • Rich API and an active community
12. Failover Design • Component Level • Cluster Level • Data Replication
13. Component Level • HMaster -> failover via ZooKeeper • RegionServer -> stateless by design • ThriftServer -> behind a proxy • HFile -> ???
14. Component Level - HFile • Shared HDFS cluster • Keeps the whole HBase cluster stateless
15. Cluster Level • What if a cluster goes down? Components are kept alive by a Kubernetes ReplicaSet (see the sketch below) • What if Kubernetes itself goes down? Mixed deployment: a few physical nodes with high CPU and RAM
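A minimal sketch with the official Kubernetes Python client of what "components -> ReplicaSet" means in practice: keep a component such as the RegionServer at a fixed pod count. The image name, labels, namespace, and resource sizes below are hypothetical, and the deck's "ReplicationSet" is taken to mean a ReplicaSet.

    # Keep RegionServer pods at a fixed replica count via a ReplicaSet.
    # Image, labels, namespace and resource sizes are made-up examples.
    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    apps = client.AppsV1Api()

    rs = client.V1ReplicaSet(
        api_version="apps/v1",
        kind="ReplicaSet",
        metadata=client.V1ObjectMeta(name="hbase-regionserver"),
        spec=client.V1ReplicaSetSpec(
            replicas=3,
            selector=client.V1LabelSelector(match_labels={"app": "hbase-rs"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "hbase-rs"}),
                spec=client.V1PodSpec(containers=[
                    client.V1Container(
                        name="regionserver",
                        image="example/hbase-regionserver:latest",  # hypothetical image
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "4", "memory": "16Gi"},
                            limits={"cpu": "4", "memory": "16Gi"},
                        ),
                    )
                ]),
            ),
        ),
    )
    apps.create_namespaced_replica_set(namespace="hbase", body=rs)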
16. Data Replication • Within a cluster: built into HDFS (3 replicas) • Between clusters: snapshot + bulk load, or HBase replication; the offline cluster handles the MR / Spark jobs (see the sketch below)
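A rough sketch of the snapshot-based path between clusters, driving the standard HBase tools from Python; the table name, snapshot name, and destination HDFS path are placeholders, not the production tooling.

    # Snapshot a table on the source cluster and copy it to another cluster.
    import subprocess

    TABLE = "example_table"          # hypothetical table name
    SNAPSHOT = "example_table_snap"  # hypothetical snapshot name

    # 1. Take a snapshot on the source cluster (runs inside the HBase shell).
    subprocess.run(["hbase", "shell"],
                   input=f"snapshot '{TABLE}', '{SNAPSHOT}'\n",
                   text=True, check=True)

    # 2. Copy the snapshot's HFiles to the destination cluster's HDFS.
    subprocess.run([
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", SNAPSHOT,
        "-copy-to", "hdfs://dest-cluster/hbase",  # hypothetical destination
    ], check=True)

    # 3. On the destination cluster, restore or bulk load the copied data,
    #    e.g. `clone_snapshot 'example_table_snap', 'example_table'` in the shell.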
17. • HBase at Zhihu • Using Kubernetes • HBase Online Platform
18. Physical Node Resources • CPU: 2 × 12 cores • Memory: 128 GB • Disk: 4 TB
19. Resource Definition (1) • Minimize per-container resources; businesses scale by the number of containers • Pros: less wasted resource per node; simplified debugging • Cons: the minimum resource is hard for a business to define; hard to tune RAM and GC parameters
20. Resource Definition (2) • Container resources are customized per business; businesses still scale by the number of containers • Pros: flexible RAM configuration and tuning (especially non-heap size) • This is the scheme used in production
21. Container Configuration • Parameters are injected into the container via environment variables • XML config is added to the container • start-env.sh initializes the configuration (see the sketch below) • Parameters may be modified while the cluster is running
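A minimal sketch of what the start-env.sh step amounts to: turn ENV values injected by the platform into properties in hbase-site.xml before the daemon starts. The environment variable and property names here are illustrative, not Zhihu's actual ones.

    # Render hbase-site.xml from environment variables at container startup.
    import os
    import xml.etree.ElementTree as ET

    # Env vars injected by the platform; the names are examples only.
    PARAMS = {
        "hbase.regionserver.handler.count": os.environ.get("RS_HANDLER_COUNT", "100"),
        "hbase.hregion.memstore.flush.size": os.environ.get("MEMSTORE_FLUSH_SIZE", "134217728"),
    }

    root = ET.Element("configuration")
    for name, value in PARAMS.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    ET.ElementTree(root).write("/etc/hbase/conf/hbase-site.xml",
                               xml_declaration=True, encoding="utf-8")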
22. RegionServer G1GC settings (thanks to Xiaomi):
-XX:+UnlockExperimentalVMOptions -XX:MaxGCPauseMillis=50 -XX:G1NewSizePercent=5
-XX:InitiatingHeapOccupancyPercent=45 -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=2
-XX:ParallelGCThreads=8 -XX:MaxTenuringThreshold=15 -XX:G1OldCSetRegionThresholdPercent=10
-XX:G1MixedGCCountTarget=16 -XX:MaxDirectMemorySize=256M
23. Network • Dedicated IP per container • DNS records are registered/deregistered automatically • /etc/hosts is modified inside each pod (see the toy sketch below)
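A toy illustration (not the production mechanism) of the /etc/hosts adjustment done at container startup so the pod's dedicated IP resolves to its hostname:

    # Append this pod's IP/hostname mapping to /etc/hosts (requires root).
    import socket

    hostname = socket.getfqdn()
    ip = socket.gethostbyname(socket.gethostname())

    with open("/etc/hosts", "a") as f:
        f.write(f"{ip}\t{hostname}\n")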
24. Managing the Cluster • The platform controls the clusters • Kubernetes schedules the resources • Shared HDFS and ZooKeeper clusters • Cons: a full scan still impacts the whole cluster; no data locality and no short-circuit reads
25. Client Design • For Java/Scala: the native HBase client; only the ZooKeeper address is exposed to the business • For Python: happybase, a client proxy, and service discovery (see the sketch below)
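A minimal happybase sketch for the Python path; the proxy address and table name are placeholders, and in practice the address would come from service discovery rather than being hard-coded.

    # Talk to HBase through a ThriftServer proxy using happybase.
    import happybase

    conn = happybase.Connection(host="hbase-proxy.example", port=9090)  # hypothetical proxy

    table = conn.table("example_table")
    table.put(b"row1", {b"cf:col": b"value"})
    print(table.row(b"row1"))

    conn.close()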
26. API Server • Bridge between Kubernetes and business users • Encapsulates the components of an HBase cluster • RESTful API (an illustrative call is sketched below) • Friendly interface
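An illustrative call against such an API server; the endpoint path and request fields are hypothetical and only show the shape of "a business asks for a cluster" over REST.

    # Request a new HBase cluster from the platform's API server (endpoint is made up).
    import requests

    payload = {
        "business": "example-feed",      # hypothetical business name
        "regionservers": 4,              # number of RegionServer containers
        "cpu_per_container": 4,
        "memory_per_container": "16Gi",
    }
    resp = requests.post("http://hbase-api.example/v1/clusters", json=payload)
    resp.raise_for_status()
    print(resp.json())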
27. Cluster Monitoring • Physical-node level: node CPU load and usage (via IT tooling) • Cluster level: pod CPU load (via Kubernetes); read/write rate, P95 latency, cache hit ratio (via JMX, see the sketch below) • Table level: client write speed and read latency (via tracing); ThriftServer metrics (via JMX); proxy concurrency (via DNS/HAProxy monitoring)
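Cluster-level JMX metrics can be scraped over the RegionServer's /jmx HTTP servlet; a small sketch, where the host, port, and exact metric keys depend on the HBase version and are assumptions here.

    # Pull a few RegionServer metrics from the /jmx servlet on the web UI port.
    import requests

    resp = requests.get(
        "http://regionserver.example:16030/jmx",  # hypothetical host; 16030 is the default RS UI port
        params={"qry": "Hadoop:service=HBase,name=RegionServer,sub=Server"},
    )
    resp.raise_for_status()

    for bean in resp.json().get("beans", []):
        # Read/write request counts and block cache hit ratio, if present.
        for key in ("readRequestCount", "writeRequestCount", "blockCacheCountHitPercent"):
            if key in bean:
                print(key, bean[key])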
28. Current Status • 10 online businesses on the platform • More than 300 containers • 100% SLA
29. Benefits • Easy • Isolated • Flexible