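The architecture of big data platforms in the new energy vehicle field depends heavily on data storage and processing. Yan Yu (颜禹) of Burnish Technology (博尼施科技) describes the key role HBase plays in it.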

1. The Application of HBase in New Energy Vehicle Monitoring System. 颜禹 (Yan Yu), 博尼施科技 (Burnish Technology Co. Ltd)

2. Contents
   1. Background
   2. Challenges and Decisions
   3. System architecture
   4. Why HBase
   5. Challenges with HBase
   6. Data backup in HBase
   7. Conclusion
   8. Prospects

3. Background
- 100k running vehicles online
  - Each vehicle sends 2 packets per minute.
- Data volume
  - The original packet size is 1 KB.
  - The parsed packet size is about 7 KB.
  - One vehicle produces about 20 MB of data per day.
  - About 2 TB of data is generated per day.
  - 2.9 billion rows need to be written to HBase every day.
- Concurrency
  - 3.3k sustained TPS
  - 100k persistent connections
  - 3.3 MB of original data needs to be parsed per second.
  - 23.1 MB of parsed data needs to be stored in HBase per second.
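The load figures on the Background slide follow from a few multiplications. The sketch below reproduces them, assuming decimal units (1 MB = 1000 KB), which is not stated on the slide. The function name `load_estimate` and the `rows_per_packet` ratio are illustrative additions: the ratio is only what the slide's 2.9 billion rows per day would imply, not something the slide states.

```python
# Back-of-envelope check of the Background slide's numbers.
# Assumption: decimal units (1 MB = 1000 KB, 1 TB = 1_000_000 MB).

def load_estimate(vehicles=100_000, packets_per_minute=2,
                  raw_kb=1, parsed_kb=7, rows_per_day=2.9e9):
    packets_per_second = vehicles * packets_per_minute / 60
    packets_per_vehicle_per_day = packets_per_minute * 60 * 24
    packets_per_day = packets_per_vehicle_per_day * vehicles
    mb_per_vehicle_per_day = packets_per_vehicle_per_day * parsed_kb / 1000
    return {
        "packets_per_second": packets_per_second,
        "raw_mb_per_second": packets_per_second * raw_kb / 1000,
        "parsed_mb_per_second": packets_per_second * parsed_kb / 1000,
        "mb_per_vehicle_per_day": mb_per_vehicle_per_day,
        "tb_per_day": mb_per_vehicle_per_day * vehicles / 1_000_000,
        "packets_per_day": packets_per_day,
        # Inferred ratio only: rows written per parsed packet.
        "rows_per_packet": rows_per_day / packets_per_day,
    }


if __name__ == "__main__":
    for name, value in load_estimate().items():
        print(f"{name}: {value:,.2f}")
```

The slide's 23.1 MB/s comes from rounding the packet rate to 3.3k per second before multiplying by 7 KB; without that rounding the figure is slightly higher.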

5. Challenges
- Small team
- Limited funds and machines
- Short deadline
- System integration
- How to handle the huge amount of vehicle data
- Requirements were very vague.

6. Decisions
- Language
- Message queue
- Database
- Development workflow
- Microservices or a monolithic service
- Deployment and maintenance
  - Cloud
  - Own data center

7. Language
- C/C++
  - High performance
  - Hard to integrate
  - Long development time
- Java
  - High performance
  - Rich third-party packages
  - Easy to integrate with big data systems, e.g. Hadoop, HBase, Spark
- Python
  - Rapid development
  - Rich third-party packages
  - Performance issues with multithreading
- Golang
  - Easy to write multithreaded programs
  - There is no Golang developer on our team

8. Message queue
- Redis
  - High performance
  - High memory requirement
  - Hard to scale
- Celery
  - Better suited to distributed tasks
  - Easy to develop with Python
  - Uses Redis or RabbitMQ as its backend
- Kafka
  - Writes messages to disk first to keep them safe
  - Supports consumer groups
  - Automatic balancing
  - Enough performance for our system
  - Easy to scale
- RabbitMQ
  - Classic message queue
  - Performance

9. Database
- MySQL
  - Relational database
  - Suited to storing static information
  - ORM support
- MongoDB
  - Document-based
  - ORM support
  - Hard to maintain and scale
- HBase
  - Column-oriented database
  - High write performance
  - Easily handles terabytes of data
  - Easy to scale
- OpenTSDB
  - Time series database
  - Based on HBase

10. Monolithic service vs microservices
- Monolithic service
  - Easy to develop when the system is not very complicated
  - Speeds up development
  - Lets us build the basic system despite the vague requirements
- Microservices
  - Easy to scale in a complicated system
  - Rapid iteration
  - Requires more developers

11. Development workflow
- Dependencies on a central server
  - Easy to set up on one server
  - Risk of a single point of failure
  - Conflicts among multiple developers

12. Development workflow
- Dependencies on an individual Docker engine
  - Easy to set up with docker-compose
  - No single point of failure
  - Requires a development machine with a lot of memory (starting from 32 GB)

13. Deployment and maintenance
- Cloud
  - Easy setup
  - Low cost at small scale
  - Fast deployment
  - No dedicated staff needed
- Own data center
  - Hard setup
  - Expensive at small scale
  - Professional staff needed to maintain the data center

14. Deployment and maintenance
- Deploy the system with Kubernetes
  - Easy to manage
  - Rapid scaling
  - Separation of compute and storage
- Deploy basic components as cloud services
  - Fast deployment
  - Worry-free
  - Easy to get a highly available service
  - No dedicated staff needed

15. How a small team supports a high-performance system
- Individual development environments with Docker and docker-compose.
- Deploy the system with Kubernetes to reduce operations cost.
- Develop in pure Python.
- Build only the basic system; other requirements are deferred to the second development phase.

16. The System Architecture

17. The System Architecture

18. System maintenance
- Application scaling
  - Application scaling with Kubernetes
  - Basic components as cloud services
- CI/CD
  - CI with Jenkins
  - CD with Jenkins and Kubernetes
- Data backup
  - MySQL backup
  - HBase backup
  - Message backup

19. Why HBase?
- High write performance
- Quick query response
- Easy to scale
- SQL support with Phoenix
- Aliyun provides HBase as SaaS

20. Connecting to an HBase cluster with Python
- HBase provides a native Java API
- Connect to HBase through Thrift
- HappyBase provides a Pythonic API
- SQL support with Phoenix
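As a rough illustration of the Thrift plus HappyBase route, the sketch below batches parsed packets into an HBase table. It is not the speaker's code: the column family `d`, the helper names `connect`, `encode_cells` and `write_packets`, and the host and table names in the comments are all hypothetical. Only `happybase.Connection`, `Connection.table`, `Table.batch` and `Batch.put` are real HappyBase calls.

```python
def connect(host="localhost", port=9090):
    """Open a Thrift connection to HBase (needs `pip install happybase`)."""
    import happybase  # imported lazily so the helpers below run without it
    return happybase.Connection(host, port=port)


def encode_cells(fields, family="d"):
    """Turn {'speed': 62} into {b'd:speed': b'62'}; HBase stores raw bytes.

    The one-letter family name is an assumption chosen to keep cells small.
    """
    return {
        f"{family}:{name}".encode(): str(value).encode()
        for name, value in fields.items()
    }


def write_packets(table, packets, batch_size=1000):
    """Write (row_key, fields) pairs through a HappyBase batch.

    `table` is a happybase Table, or any object with the same batch() API.
    Returns the number of rows queued for writing.
    """
    count = 0
    with table.batch(batch_size=batch_size) as batch:
        for row_key, fields in packets:
            batch.put(row_key, encode_cells(fields))
            count += 1
    return count


# Usage against a real cluster (host and table names are placeholders):
#   conn = connect("hbase-thrift.example.internal")
#   table = conn.table("vehicle_data")
#   write_packets(table, [(b"row-key-1", {"speed": 62, "soc": 0.81})])
```

Batching matters at the write rates quoted in this talk, because each individual `put` would otherwise cost one Thrift round trip.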

21. Challenges with HBase
- Row key design
  - Hash prefix + timestamp
- Secondary index
  - Add Phoenix support
  - Insert index entries manually
- Table design
  - Short column names
  - Design the table carefully around the requirements (e.g. the mileage of every single vehicle)
- Complex queries are very slow.
  - Create indexes
  - Export some results to HDFS or MySQL (Kylin?)
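The "hash prefix + timestamp" row key can be sketched as follows. The slide gives only that phrase, so the exact layout here is an assumption: a 2-character MD5 prefix of the vehicle ID, the ID itself, and a big-endian 8-byte timestamp. The names `make_row_key` and `scan_range` and the example vehicle ID are hypothetical.

```python
import hashlib
import struct


def make_row_key(vehicle_id: str, ts_seconds: int, salt_len: int = 2) -> bytes:
    """Build a row key of the form <hash prefix>|<vehicle id>|<timestamp>.

    The hash prefix spreads different vehicles across regions, so writes do
    not all land on one region server. The timestamp is packed big-endian so
    that byte order matches time order for a given vehicle.
    """
    vid = vehicle_id.encode()
    salt = hashlib.md5(vid).hexdigest()[:salt_len].encode()
    return salt + b"|" + vid + b"|" + struct.pack(">Q", ts_seconds)


def scan_range(vehicle_id: str, start_ts: int, stop_ts: int):
    """Return (row_start, row_stop) covering one vehicle's time window."""
    return (make_row_key(vehicle_id, start_ts),
            make_row_key(vehicle_id, stop_ts))


if __name__ == "__main__":
    start, stop = scan_range("LTEST000000000001", 1_500_000_000, 1_500_003_600)
    print(start, stop)
```

Because the prefix is derived from the vehicle ID alone, all rows of one vehicle stay contiguous, and the pair returned by `scan_range` can be passed as the start and stop rows of a single scan.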

22. HBase Cluster

23. Data Backup Approach

25. Pain points
- Complex queries with the HBase API are still very slow
- Phoenix needs indexes to speed up queries
- Phoenix queries are still very slow if there is no index in HBase
- Complex queries need large indexes in HBase
- The query server between Python (phoenixdb) and Phoenix is very weak

26. Conclusion
- Introduced the background of the monitoring system
- Our decisions for the system
- Why we chose HBase as our main database
- How we deploy and maintain the system
- Introduced our HBase practice in the system

27. Prospects
- Rewrite high-performance components in Golang.
- Split the monolithic system into microservices when the system becomes complex
- Data analysis
  - Fault diagnosis
  - Predicting vehicle status
- Data compression
  - OpenTSDB
- Combine Elasticsearch and HBase in our application.

28. Thanks

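To give HBase practitioners and enthusiasts a community for freely exchanging HBase-related technology, HBase researchers from Alibaba, Xiaomi, Huawei, NetEase, JD.com, Didi, Zhihu and other companies jointly founded the China HBase Technical Community.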
