When Distributed Database Meets Cloud: Lessons Learned
1.When Distributed Database Meets Cloud Lessons Learned Yanqing Weng iweng@pivotal.io © Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0
2.About me ■ Principal Software Engineer at Pivotal ■ Apache HAWQ Committer ■ Apache HAWQ PMC Member
3.Agenda ■ Introduction to Distributed Database ■ Distributed Database on Cloud ■ Lessons Learned ■ Q&A
4.Distributed Database
5.Apache HAWQ ■ Apache Hadoop Native SQL: Advanced, MPP, Elastic Query Engine ■ Became an Apache Top-Level Project in August 2018
6.Apache HAWQ Architecture
7.Apache HAWQ Query Processing Slice: 1. a portion of the plan that segments can work on independently. 2. a query plan is sliced wherever a motion operation occurs in the plan. Motion: 1. an operation that moves tuples between the segments during query processing. 2. three types: redistribution, broadcast, gather motion. Virtual Segment: 1. a resource unit for QD and Resource Manager 2. an execution unit 3. VSEG number determines the degree of parallelism of a query dynamically. SELECT COUNT(*) FROM lineitem, part WHERE p_partkey=l_partkey AND p_brand = 'Brand#23'
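The slicing rule on this slide (a plan is cut wherever a motion occurs) can be illustrated with a small Python sketch. This is not HAWQ's planner code: the `PlanNode` class, the operator names, and the plan shape chosen for the slide's query are all assumptions made for illustration. Here each motion node is kept in the slice that receives its tuples, and everything below it starts a new slice.

```python
from dataclasses import dataclass, field
from typing import List

# The three motion types named on the slide (operator spellings are assumed).
MOTIONS = {"Redistribute Motion", "Broadcast Motion", "Gather Motion"}


@dataclass
class PlanNode:
    """Hypothetical plan-tree node: an operator name plus child nodes."""
    op: str
    children: List["PlanNode"] = field(default_factory=list)


def slice_operators(root: PlanNode) -> List[List[str]]:
    """Group operators into slices, cutting the plan below every motion node."""
    slices: List[List[str]] = [[]]  # slices[0] is the top slice

    def visit(node: PlanNode, slice_id: int) -> None:
        slices[slice_id].append(node.op)
        if node.op in MOTIONS:
            # Everything under a motion runs in a new slice.
            slices.append([])
            child_slice = len(slices) - 1
        else:
            child_slice = slice_id
        for child in node.children:
            visit(child, child_slice)

    visit(root, 0)
    return slices


# One plausible (assumed) plan for the slide's query:
# SELECT COUNT(*) FROM lineitem, part
# WHERE p_partkey = l_partkey AND p_brand = 'Brand#23'
plan = PlanNode("Final Aggregate", [
    PlanNode("Gather Motion", [
        PlanNode("Partial Aggregate", [
            PlanNode("Hash Join", [
                PlanNode("Seq Scan on lineitem"),
                PlanNode("Hash", [
                    PlanNode("Broadcast Motion", [
                        PlanNode("Seq Scan on part"),
                    ]),
                ]),
            ]),
        ]),
    ]),
])

slices = slice_operators(plan)
for number, operators in enumerate(slices):
    print(f"slice {number}: {operators}")
```

With two motions in the assumed plan, the sketch yields three slices; the scan of `part` forms its own slice because its output is broadcast to the segments that run the join.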
8.Virtual Segment ■ Resource allocation unit ■ Query execution unit ■ Variable virtual segment number ■ Place on any physical segment
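To make "variable virtual segment number" and "place on any physical segment" concrete, here is a minimal Python sketch of one possible placement policy. It is an assumption for illustration, not HAWQ's resource manager: the function name, the notion of "free slots" per host, and the greedy most-free-host-first rule are all hypothetical.

```python
import heapq
from typing import Dict


def place_virtual_segments(free_slots: Dict[str, int], vseg_count: int) -> Dict[str, int]:
    """Greedily place virtual segments on the hosts with the most free slots.

    free_slots maps a physical segment host to the number of virtual
    segments it can still accept (a hypothetical capacity measure).
    Returns how many virtual segments each host received.
    """
    if vseg_count > sum(free_slots.values()):
        raise ValueError("not enough free slots for the requested virtual segments")

    # heapq is a min-heap, so store negated free counts to pop the largest first.
    heap = [(-free, host) for host, free in free_slots.items() if free > 0]
    heapq.heapify(heap)

    placement = {host: 0 for host in free_slots}
    for _ in range(vseg_count):
        neg_free, host = heapq.heappop(heap)
        placement[host] += 1
        if neg_free + 1 < 0:  # host still has capacity left
            heapq.heappush(heap, (neg_free + 1, host))
    return placement


# A small query asks for 2 virtual segments, a larger one for 5.
hosts = {"seg1": 4, "seg2": 2, "seg3": 1}
print(place_virtual_segments(hosts, 2))
print(place_virtual_segments(hosts, 5))
```

The point of the sketch is that the number of virtual segments is chosen per query, while the physical hosts stay the same; a real policy would likely also weigh data locality and memory, which this example ignores.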
9.Summary ■ High Performance ■ Storage-compute separation ■ Fine-grained resource management ■ Elastic query execution engine ■ Stateless segment
10.Cloud Database
11.Requirement ■ Database as a Service ■ Efficient Resource Management ■ Infrastructure Agnostic ■ DBA Free
12.Deployment & Operation ■ Container vs. Virtual Machine ■ Kubernetes ○ Service discovery ○ Load balancing ○ Horizontal and Vertical auto scaling ○ Rolling upgrade ○ Monitor and metrics collection ○ …...
13.Architecture
14.Architecture ● Storage Service ○ Cloud Storage, Amazon S3, Hadoop…... ○ Unified Cache Layer by Alluxio ● Computing Service ○ Shared Segment Pool ○ Global Resource Management Service ● Database Service ○ Master/Standby as Database ○ Get Segments for Query on Demand ● Control Plane ○ Operator/Controllers as DBA
15.Apache HAWQ on Kubernetes
16.Custom Resource
17.Controller ■ HAWQ Operator ■ Resource Pool Controller ■ Resource Pool AutoScaler ■ Resource Recommender ■ Query Controller
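A controller such as a resource pool autoscaler typically follows the Kubernetes reconcile pattern: compare desired state with observed state and act on the difference. The Python sketch below illustrates that pattern only. The `ResourcePool` fields and function names are hypothetical, it does not call any Kubernetes API, and the scaling rule is borrowed from the Kubernetes Horizontal Pod Autoscaler formula (desired = ceil(current × observed / target)) rather than taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    """Hypothetical resource pool spec plus its current size."""
    name: str
    min_segments: int
    max_segments: int
    target_utilization_pct: int  # e.g. 60 means 60%
    segments: int                # current number of segments in the pool


def desired_segments(pool: ResourcePool, observed_utilization_pct: int) -> int:
    """HPA-style rule: ceil(current * observed / target), clamped to [min, max]."""
    scaled = pool.segments * observed_utilization_pct
    # Integer ceiling division avoids floating-point rounding surprises.
    raw = (scaled + pool.target_utilization_pct - 1) // pool.target_utilization_pct
    return max(pool.min_segments, min(pool.max_segments, raw))


def reconcile(pool: ResourcePool, observed_utilization_pct: int) -> int:
    """One reconcile step: move the pool to its desired size, return the change."""
    want = desired_segments(pool, observed_utilization_pct)
    delta = want - pool.segments
    pool.segments = want  # a real controller would create or delete pods here
    return delta


pool = ResourcePool("analytics", min_segments=2, max_segments=8,
                    target_utilization_pct=60, segments=4)
print(reconcile(pool, 90), pool.segments)   # busy pool: scale out
print(reconcile(pool, 20), pool.segments)   # idle pool: scale in
```

Because each step recomputes the target from the declared spec and the latest observation, repeating `reconcile` is safe: once the pool is at its desired size, further calls with the same utilization at the target return zero.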
18.Apache HAWQ on Kubernetes
19.Lessons Learned
20. ■ Service Oriented Architecture ○ Monolithic → Microservice Architecture ■ Resource Centric ○ Abstract Component as Resource ○ Service for Resource Usage ○ Controller for Resource Management
21.Containerization ■ Container != Image ■ Container != VM ■ Container = Fine-grained Resource
22.Resource Management ■ Traditional Database ○ Fixed resources ○ Balance resource usage among queries ■ Cloud Database ○ Dynamic resources ○ Maximize resource sharing ○ Maximize resource utilization for each query
23.Resource Monitoring & Tuning ● Database ○ Variant Query Workload ○ Data Size ○ …… ● Query Similarity and Classification ● Query Resource Monitoring ○ Pod Runtime Metrics ○ Application Logs ○ Kubernetes Events ○ …... ● Intelligent Resource Tuning ○ Resource Pool Definition ○ Horizontal & Vertical
24.Kubernetes Ecosystem ■ Log Collection ○ Fluentd ■ Monitoring and Metrics Collection ○ Prometheus ■ Visualization ○ Grafana ■ …...
25.Others ■ Management Utilities ■ Imperative vs. Declarative ■ Pod Priority ■ …...
26.Thank you! Questions?