- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- <iframe src="https://www.slidestalk.com/dbfun/CEDAR_A_Distributed_Database_for_Scalable_OLTP?embed" frame border="0" width="640" height="360" scrolling="no" allowfullscreen="true">复制
- 微信扫一扫分享
CEDAR: A Distributed Database for Scalable OLTP
CEDAR的设计目标
- Separating read and write operations on different nodes
- Scalable reading on multiple nodes
- Writing to memory in one node only
- No expensive distributed concurrent control or synchronization
- “Deep” optimization for transactions
- To optimize data transmissions, query execution plans, and executions
- High-availability is guaranteed by log synchronization
展开查看详情
1 .CEDAR: A Distributed Database for Scalable OLTP Weining Qian wnqian@dase.ecnu.edu.cn
2 . The Internet economy 2 ¨ Content/traffic => money ¨ O2O: Online => Offline, or Offline => Online ¨ O2O of mission-critical apps: 互联网+ (Internet+) ¨ OLTP is inevitable
3 . Phenomenal applications 3 ¨ Phenomenal - very remarkable; extraordinary. • 180,000 tps in year 2016
4 . Phenomenal is common 4 ¨ 12306 during Spring Festival ¨ Black Friday promotion/second kill, ... ¨ The pressure is on backend (transaction | payment) systems ¨ Be inevitable and day-to-day more common ¨ All pressure will finally go to mission critical systems ¨ Essentially a High-throughput, scalable transaction processing problem
5 . A brief history of DBMS 5 https://maxkanaskar.files.wordpress.com/2014/04/database-platform-history.png
6 . One size fits all => One size fits none! 6
7 . DBMS 7 Data model abstraction File DBMS Systems Data processing abstraction
8 . NoSQL 8 Data model abstraction Distributed NoSQL FS Weak consistency
9 . NewSQL 9 In-memory Relational computing model NewSQL Fast networking Transaction processing
10 . OldSQL vs. NoSQL vs. NewSQL 10 OldSQL NoSQL NewSQL Data model Relational --- Relational Interface SQL Variance SQL Consistency/Concurrency control Strong Weak Strong Fault tolerance Strong Fine Strong Performance Poor Good Very good Scalability Poor Good Fine
11 . How to scale a database system? 11 Scaling-up Sharding/partitioning Replication Caching https://www.cs.cmu.edu/~christos/courses/dbms.S14/slides/29scaling.pdf
12 . The open-source OceanBase (0.4) 12
13 . OceanBase 0.4 is not enough 13 Not enough for mission ¨ Simple transactions only critical apps in banks with strong-consistency, high- ¨ Weak availability support availability, and high- throughput complex transaction processing ¨ Single-point transaction requirements ¨ More optimization needed for query/storage ¨ Interface adaptibility
14 . Features we need 14 ¨ Complex transactions ¨ High-performance: ¤ High-throughput ¤ Low latency ¨ High-availability ¨ Scalability ¨ Elasticity
15 . Overview (CEDAR core) 15 Master T-Node (backup) (backup) Master T-Node Master T-Node (backup) (backup) Hybrid storage S-Node S-Node S-Node S-Node S-Node
16 . Design choices 16 ¨ Separating read and write operations on different nodes ¨ Scalable reading on multiple nodes ¨ Writing to memory in one node only ¤ No expensive distributed concurrent control or synchronization ¨ “Deep” optimization for transactions ¤ To optimize data transmissions, query execution plans, and executions ¨ High-availability is guaranteed by log synchronization
17 . Status = Baseline + Delta 17
18 . Reading 18 • All readings need to access S-Node as well as T-Node
19 . Writings 19 • All writings need to access S-Node as well as T-Node
20 . Transaction management 20 • Transactions are only processed on the T-Node
21 . Pros and cons 21 ¨ Pros ¨ Cons ¤ Massive storage ¤ Expensivedata ¤ Scalable read transmission ¤ Efficient transaction management
22 . Performance is affected by 22 ¨ The lengths of locks affect ¤ Degrees of parallelization 35 30 Latency Time Used per Write (us) ¤ 25 ¨ Capability of S-Nodes 20 ¤ Throughput of readings 15 10 ¨ Capability of the T-Node 5 ¤ Throughput of writings 1 5 10 50 100 500 1000 Transaction Size in # Writes • Most cost is for short/simple transactions • They are easier to be scheduled
23 . S-Node optimizations 23 Static data caching Parallel readings
24 . T-Node 24 ¨ Storage node? • Balancing T-Node’s ¤ All computation resources computation for are for TP queries and transaction ¤ High communication cost processing ¨ Computation node? ¤ Low communication cost ¤ How much work should it do?
25 . Transaction compilation Procedure Order(p_itemType int, p_custId int, p_orderAmount int) declare v_price, v_value double; declare v_itemId, v_stock, v_orderAmount int; 25 select itemId, price, stock into v_itemId, v_price, v_stock from item where itemType = p_itemType order by price desc limit 1; S1 if( v_stock > p_orderAmount ) update item set stock = stock - p_orderAmount where itemId = v_itemId; L1 Execution plan v_orderAmount = p_orderAmount; else update item set stock = stock - v_stock where itemId = v_itemId; v_orderAmount = v_stock; S1 RD(itemType, ...) S2 L2 L3 S3 end v_value = v_price * v_orderAmount; update customter set balance -= v_value L1 if(v_stock>p_orderAmount) where custId = p_custId; T2 L4 S4 T3 S2 RS(itemId, ...) S3 Transactions T2 UP(itemId, ...) T3 T4 L2 v_orderAmount=... L3 control dep. L4 v_value=v_price*... data dep. S4 RS(custId, balance) Dependency graph T4 UP(custId, balance) RD: read, UP: update, RS: read static data
26 . T-Node optimization 26 Re-order T-Node ops Postpone conflict ops T RWDelta(A) S ReadStatic(A) T ReadDelta(A) T RWDelta(B) S ReadStatic(B) T RWDelta(C) S ReadStatic(A) T RWDelta(A) T RWDelta(D)
27 . TPC-C, Smallbank, TATP Benchmarks 27 20.0k NewOrder-8 100.0k PG VD CSD • 500kBetter PG than PG VD CSD NewOrder-24 15.0k 80.0k • 400kBetter than VoltDB under Throughput (tps) Throughput (tps) Throughput (tps) 10.0k 60.0k complex workload 300k 40.0k • 200kWith significant 5.0k 20.0k advantage when data 100k 0.0 cannot 0 be naturally PG VD SD CSD 8 16 24 32 40 48 8 16 24 32 40 48 System Partitions (#) partitioned Partitions (#) Payment-24 NewOrder-24 Payment-8 NewOrder-8 35 3.5 30.0k Payment-8 Payment-24 30 3.0 Norm.CSD (CSD/VD) Nomr.VD (VD/CSD) Throughput (tps) 25 2.5 20.0k 20 2.0 15 1.5 10.0k 10 1.0 5 0.5 0 0.0 0.0 PG VD SD CSD 0 4 8 12 16 0 4 8 12 16 System Cross-Warehouse Transaction Ratio (%) Cross-Warehouse Transaction Ratio (%)
28 . Indexing 28 l Index is organized as a table. l No distributed transactions. l Taking advantage of the load balancing, availability of system.
29 . Indexing overview 29 l Initialization: Preparing for the start of index construction. l Bulk Loading l Local processing: collecting statistical information l Global processing: achieving load balancing based on an equi- depth histogram l Termination: Scheduling the task for replication of the index for high availability.