CEDAR: A Distributed Database for Scalable OLTP

CEDAR Design Goals

  • Separating read and write operations on different nodes
  • Scalable reading on multiple nodes
  • Writing to memory in one node only
  • No expensive distributed concurrency control or synchronization
  • “Deep” optimization for transactions
  • To optimize data transmissions, query execution plans, and executions
  • High-availability is guaranteed by log synchronization

1. CEDAR: A Distributed Database for Scalable OLTP (Weining Qian, wnqian@dase.ecnu.edu.cn)

2. The Internet economy
  • Content/traffic => money
  • O2O: Online => Offline, or Offline => Online
  • O2O of mission-critical apps: 互联网+ (Internet+)
  • OLTP is inevitable

3. Phenomenal applications
  • Phenomenal: very remarkable; extraordinary.
  • 180,000 tps in year 2016

4. Phenomenal is common
  • 12306 during Spring Festival
  • Black Friday promotions, flash sales ("seckill"), ...
  • The pressure is on backend (transaction/payment) systems
  • Inevitable, and day-to-day more common
  • All pressure finally goes to mission-critical systems
  • Essentially a high-throughput, scalable transaction processing problem

5. A brief history of DBMS (https://maxkanaskar.files.wordpress.com/2014/04/database-platform-history.png)

6. One size fits all => One size fits none!

7. DBMS = file systems + data model abstraction + data processing abstraction

8. NoSQL = distributed file systems + data model abstraction + weak consistency

9. NewSQL = in-memory computing + relational model + fast networking + transaction processing

10. OldSQL vs. NoSQL vs. NewSQL

                                     OldSQL      NoSQL    NewSQL
    Data model                       Relational  ---      Relational
    Interface                        SQL         Varies   SQL
    Consistency/Concurrency control  Strong      Weak     Strong
    Fault tolerance                  Strong      Fine     Strong
    Performance                      Poor        Good     Very good
    Scalability                      Poor        Good     Fine

11. How to scale a database system?
  • Scaling up
  • Sharding/partitioning
  • Replication
  • Caching
  (https://www.cs.cmu.edu/~christos/courses/dbms.S14/slides/29scaling.pdf)

12. The open-source OceanBase (0.4)

13. OceanBase 0.4 is not enough
  • Simple transactions only
  • Weak availability support
  • Single-point transaction processing
  • More optimization needed for query/storage
  • Interface adaptability
  Not enough for mission-critical apps in banks, which require strong consistency, high availability, and high-throughput complex transaction processing.

14. Features we need
  • Complex transactions
  • High performance
    ◦ High throughput
    ◦ Low latency
  • High availability
  • Scalability
  • Elasticity

15. Overview (CEDAR core)
  [Architecture diagram: a Master and a T-Node, each with two backups, on top of a hybrid storage layer of multiple S-Nodes]

16. Design choices
  • Separating read and write operations on different nodes
  • Scalable reading on multiple nodes
  • Writing to memory in one node only
    ◦ No expensive distributed concurrency control or synchronization
  • “Deep” optimization for transactions
    ◦ To optimize data transmissions, query execution plans, and executions
  • High availability is guaranteed by log synchronization

17. Status = Baseline + Delta
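The "Status = Baseline + Delta" idea can be sketched in a few lines: S-Nodes hold a static baseline, the single T-Node holds recent changes (the delta) in memory, and the visible state is their combination. This is a minimal illustration with invented names, not CEDAR's actual storage code.

```python
class BaselineDeltaStore:
    """Toy model: state = static baseline (S-Nodes) + in-memory delta (T-Node)."""

    def __init__(self, baseline):
        self.baseline = dict(baseline)  # static data held by S-Nodes (read-only)
        self.delta = {}                 # recent writes, in one node's memory only

    def write(self, key, value):
        # Writes touch only the delta -- no distributed coordination needed.
        self.delta[key] = value

    def read(self, key):
        # A read consults the delta first, then falls back to the baseline.
        if key in self.delta:
            return self.delta[key]
        return self.baseline.get(key)

    def merge(self):
        # Periodic compaction: fold the delta into a new baseline.
        self.baseline.update(self.delta)
        self.delta.clear()


store = BaselineDeltaStore({"a": 1, "b": 2})
store.write("a", 10)
print(store.read("a"))  # 10 (from the delta)
print(store.read("b"))  # 2  (from the baseline)
```

The merge step mirrors the usual LSM-style compaction: after it runs, the delta is empty and subsequent reads are served from the baseline alone.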

18. Reading
  • All reads need to access an S-Node as well as the T-Node

19. Writing
  • All writes need to access an S-Node as well as the T-Node

20. Transaction management
  • Transactions are processed only on the T-Node
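Because every transaction executes on the single T-Node, atomicity can be enforced with purely local means instead of a distributed commit protocol. A hypothetical sketch (names invented), using one process-wide lock over the in-memory delta:

```python
import threading

delta = {}                      # the T-Node's in-memory delta
delta_lock = threading.Lock()   # local mutual exclusion, no 2PC across machines

def run_transaction(writes):
    """Apply a set of writes atomically to the delta (toy single-node commit)."""
    with delta_lock:
        delta.update(writes)

# Example: transfer-like transaction updating two accounts at once.
run_transaction({"acct:1": 90, "acct:2": 110})
print(delta)  # {'acct:1': 90, 'acct:2': 110}
```

A real system would add logging and finer-grained concurrency control; the point here is only that single-node commit needs no cross-node synchronization.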

21. Pros and cons
  • Pros
    ◦ Massive storage
    ◦ Scalable reads
    ◦ Efficient transaction management
  • Cons
    ◦ Expensive data transmission

22. Performance is affected by
  • The lengths of locks, which affect
    ◦ Degrees of parallelization
    ◦ Latency
  • Capability of S-Nodes
    ◦ Throughput of reads
  • Capability of the T-Node
    ◦ Throughput of writes
  [Plot: time used per write (µs) vs. transaction size in # writes]
  • Most cost is for short/simple transactions
  • They are easier to schedule

23. S-Node optimizations
  • Static data caching
  • Parallel readings
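The two S-Node optimizations above can be illustrated together: baseline data is static between merges, so caching it is safe, and independent reads can be issued concurrently. This is a hedged sketch with invented names, not CEDAR's S-Node code.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Stand-in for baseline data served by an S-Node.
S_NODE_DATA = {"k1": "v1", "k2": "v2", "k3": "v3"}

@lru_cache(maxsize=1024)
def read_static(key):
    # Static (baseline) data cannot change until the next merge,
    # so repeated reads can be answered from the cache.
    return S_NODE_DATA.get(key)

def parallel_read(keys):
    # Reads with no dependencies between them can be served concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(read_static, keys))

print(parallel_read(["k1", "k2", "k3"]))  # ['v1', 'v2', 'v3']
```

`ThreadPoolExecutor.map` preserves input order, so results line up with the requested keys even though the reads run in parallel.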

24. T-Node
  • Storage node?
    ◦ All computation resources are for TP
    ◦ High communication cost
  • Computation node?
    ◦ Low communication cost
    ◦ How much work should it do?
  • Balancing the T-Node's computation between queries and transaction processing

25. Transaction compilation

  Procedure Order(p_itemType int, p_custId int, p_orderAmount int)
    declare v_price, v_value double;
    declare v_itemId, v_stock, v_orderAmount int;
    select itemId, price, stock into v_itemId, v_price, v_stock
      from item where itemType = p_itemType
      order by price desc limit 1;                          -- S1
    if (v_stock > p_orderAmount)                            -- L1
      update item set stock = stock - p_orderAmount
        where itemId = v_itemId;                            -- S2/T2
      v_orderAmount = p_orderAmount;                        -- L2
    else
      update item set stock = stock - v_stock
        where itemId = v_itemId;                            -- S3/T3
      v_orderAmount = v_stock;                              -- L3
    end
    v_value = v_price * v_orderAmount;                      -- L4
    update customer set balance -= v_value
      where custId = p_custId;                              -- S4/T4

  Execution plan operations (RD: read, UP: update, RS: read static data):
    S1 RD(itemType, ...)    L1 if (v_stock > p_orderAmount)
    S2 RS(itemId, ...)      T2 UP(itemId, ...)
    L2 v_orderAmount = ...  L3 v_orderAmount = ...
    L4 v_value = v_price * ...
    S4 RS(custId, balance)  T4 UP(custId, balance)
  The dependency graph connects these operations by control dependencies and data dependencies.
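The dependency graph on this slide can be derived mechanically: give each operation a read set and a write set, and add an edge whenever a later operation reads or overwrites something an earlier one produced. The sketch below uses an invented representation of a subset of the Order procedure's operations, purely for illustration.

```python
# (label, reads, writes) for a few of the Order procedure's operations.
OPS = [
    ("S1", {"itemType"}, {"v_itemId", "v_price", "v_stock"}),
    ("L1", {"v_stock", "p_orderAmount"}, set()),   # branch condition
    ("T2", {"v_itemId"}, {"stock"}),
    ("L2", {"p_orderAmount"}, {"v_orderAmount"}),
    ("L4", {"v_price", "v_orderAmount"}, {"v_value"}),
    ("T4", {"custId", "v_value"}, {"balance"}),
]

def build_deps(ops):
    """Add an edge a -> b when b reads or overwrites data written by a."""
    edges = set()
    for i, (a, _, writes_a) in enumerate(ops):
        for b, reads_b, writes_b in ops[i + 1:]:
            if writes_a & (reads_b | writes_b):
                edges.add((a, b))
    return sorted(edges)

for src, dst in build_deps(OPS):
    print(f"{src} -> {dst}")
```

Only data dependencies are modeled here; control dependencies (e.g. L1 guarding the branch bodies) would be added from the procedure's control-flow graph.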

26. T-Node optimization
  • Re-order T-Node ops
  • Postpone conflicting ops
  Example op stream: T RWDelta(A), S ReadStatic(A), T ReadDelta(A), T RWDelta(B), S ReadStatic(B), T RWDelta(C), S ReadStatic(A), T RWDelta(A), T RWDelta(D)
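One simple re-ordering policy consistent with the idea above: static reads touch only S-Node baseline data and cannot conflict with the T-Node's delta, so they can be hoisted ahead, while delta operations keep their relative order so per-key conflicts are preserved. This is a hypothetical illustration, not CEDAR's actual scheduler.

```python
def reorder(ops):
    """ops: list of (node, op, key) tuples, e.g. ("T", "RWDelta", "A").
    Hoist static reads; keep delta ops in their original relative order."""
    static = [o for o in ops if o[1] == "ReadStatic"]
    delta = [o for o in ops if o[1] != "ReadStatic"]
    return static + delta

stream = [
    ("T", "RWDelta", "A"), ("S", "ReadStatic", "A"), ("T", "ReadDelta", "A"),
    ("T", "RWDelta", "B"), ("S", "ReadStatic", "B"), ("T", "RWDelta", "C"),
]
print(reorder(stream))
```

After the reorder, the two S-Node reads can be issued (even in parallel) before the T-Node starts mutating its delta, shortening the window in which delta keys are held.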

27. TPC-C, Smallbank, TATP Benchmarks
  • Better than PG
  • Better than VoltDB under complex workload
  • With significant advantage when data cannot be naturally partitioned
  [Charts: throughput (tps) of PG, VD, SD, and CSD for NewOrder/Payment at 8 and 24 partitions, and normalized throughput (CSD/VD, VD/CSD) vs. cross-warehouse transaction ratio (%)]

28. Indexing
  • The index is organized as a table.
  • No distributed transactions.
  • Takes advantage of the system's load balancing and availability.

29. Indexing overview
  • Initialization: preparing for the start of index construction.
  • Bulk loading
    ◦ Local processing: collecting statistical information
    ◦ Global processing: achieving load balancing based on an equi-depth histogram
  • Termination: scheduling the replication of the index for high availability.
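The global-processing step balances index-building work with an equi-depth histogram: split the sorted key space into ranges that each hold roughly the same number of entries, so every server gets a similar share. A minimal sketch of computing the bucket boundaries (the function name is invented):

```python
def equi_depth_boundaries(keys, buckets):
    """Return bucket-boundary keys so each of `buckets` ranges holds
    roughly len(keys) / buckets entries (equi-depth, not equi-width)."""
    s = sorted(keys)
    n = len(s)
    # A boundary after every n/buckets entries; the last bucket takes the rest.
    return [s[(i * n) // buckets] for i in range(1, buckets)]

keys = list(range(100))                # stand-in for sampled index keys
print(equi_depth_boundaries(keys, 4))  # [25, 50, 75]
```

Unlike an equi-width histogram, this stays balanced even when keys are skewed, which is exactly what load balancing across index-building servers needs.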