New Kylin Streaming in eBay

下载 7

Kyligence

发布于

5446

人观看

#信息技术

2018年10月Apache Kylin meetup@杭州，eBay 大数据团队 Kylin 架构师分享了团队基于 Kylin 开发的实时 OLAP 解决方案。

展开查看详情

1 .New Kylin Streaming in eBay 2018-10-22

2 .Agenda Why new Streaming Overall Architecture Detail Design Segment and Storage HA Checkpoint Performance KYLIN REAL-TIME ANALYTICS LAUNCH 2

3 .Why New Streaming • Milliseconds Data Preparation Delay • Lambda Architecture • Less MR jobs and HBase Tables KYLIN REAL-TIME ANALYTICS LAUNCH 3

4 .New Streaming Architecture KYLIN REAL-TIME ANALYTICS LAUNCH 4

5 .New Streaming We divide the unbounded incoming streaming data into 3 stages, the data come into different stages are all queryable. InMem Stage Unbounded Continuously InMem streaming events Aggregations On Disk Stage Flush to disk, columnar based storage and indexes Full Cubing Stage Full cubing with MR or Spark, save to HBase. KYLIN REAL-TIME ANALYTICS LAUNCH 5

6 .Streaming Components Query Engine Build Engine Management Monitor And Streaming Coordinator Metadata Store Streaming Receiver KYLIN REAL-TIME ANALYTICS LAUNCH 6

7 .How Streaming Cube Engine Works new streaming cube request 1 Steaming Coordinator 6 Build Engine 8 2 5 Streaming 7 Receivers Cluster Streaming Sources ReplicaSet1 ReplicaSet2 4 3 Cube Storage ReplicaSet3 ReplicaSet4 (HBase) ReplicaSet5 KYLIN REAL-TIME ANALYTICS LAUNCH 7

8 .How Streaming Query Engine Works SQL Query SQL Response 1 Query Engine Steaming Coordinator 2 3 Streaming Receivers Cube Cluster Storage (HBase) 8 KYLIN REAL-TIME ANALYTICS LAUNCH 8

9 .Real-time Segment States Seg_3 Seg_4 1 … L In Memory Seg_2 Store 1 … M Unbounded Fragments streaming events 1 … J Seg_1 1 … N Active Segments Immutable Segments Open to Write Close to Process KYLIN REAL-TIME ANALYTICS LAUNCH 9

10 .Segment Store On Disk KYLIN REAL-TIME ANALYTICS LAUNCH 10

11 .Column Based Fragment File Format KYLIN REAL-TIME ANALYTICS LAUNCH 11

12 .Invert Index Format • Use Roaring Bitmap. • Two format for tri-tree encoded values and fix-len encoded values KYLIN REAL-TIME ANALYTICS LAUNCH 12

13 .Compression • Support Run Length Encoding and LZ4 Compression • Use RLE compression for time-related dim and first dim • Use LZ4 for other dimensions by default • Use LZ4 Compression simple-type measure(long, double) • No compression for complex measure(count dinstinct, topn, etc.) KYLIN REAL-TIME ANALYTICS LAUNCH 13

14 .Replica Set • All receivers in the Replica Set replica set share the same assignment. Receiver1 • The lead of the ReplicaSet is responsible to upload Receiver2 Assignment: “cube1”:[1,2] real-time segments to “cube2”:[2,3] HDFS • Use Zookeeper to do leader election Zookeeper KYLIN REAL-TIME ANALYTICS LAUNCH 14

15 . Local Check Point Date Time Partition Offsets SeqID of Active Segments 2016/10/01 Kafka 1,x 2,y 3,z Seg_5, I Seg_4, J Seg_3, K 12:00:00 Seg_5 1 … I Topic_1 Part_1 ... x x+1 … Seg_4 Topic_1 Part_2 ... y x+1 … 1 … J Seg_3 Topic_1 Part_3 ... z z+1 … 1 … K Kafka Active Fragments KYLIN REAL-TIME ANALYTICS LAUNCH 15

16 .Remote Check Point • Checkpoint is saved to Cube Segment metadata after HBase segment build ”segments”:[{…, "stream_source_checkpoint": {"0":8946898241, “1”: 8193859535, ...} }, ] • The checkpoint info is the smallest partition offsets on the streaming receiver when real-time segment is sent to full build. KYLIN REAL-TIME ANALYTICS LAUNCH 16

17 .Performance • Count Query on one hour data which has 36M rows take around 800ms • Consume around 44000 events/s for one receiver(11 dimensions, 1 metrics, no aggregations) • Detail Performance Doc: • https://drive.google.com/file/d/1GSBMpRuVQRmr8Ev2BWvssfMd- Rck9vsH/view?ths=true KYLIN REAL-TIME ANALYTICS LAUNCH 17

18 .Streaming Next Step Star Schema Support Multi-Tenant Enhance Monitoring/Alerting For Streaming Receiver On Kubernetes KYLIN REAL-TIME ANALYTICS LAUNCH 18

19 .Thank you! KYLIN REAL-TIME ANALYTICS LAUNCH 19

9点赞

4收藏

7下载