- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
2019.02.23-eBay-Apache Kylin Real-time Streaming
展开查看详情
1 .Apache Kylin Real-time Streaming 2019-02
2 .Agenda Why Real-time Streaming Overall Architecture Detail Design Segment and Storage HA Checkpoint Performance In eBay KYLIN REAL-TIME ANALYTICS LAUNCH 2
3 .Why Real-time Streaming • Milliseconds Data Preparation Delay • Lambda Architecture • Less Hadoop jobs and HBase Tables KYLIN REAL-TIME ANALYTICS LAUNCH 3
4 .RT Streaming Architecture KYLIN REAL-TIME ANALYTICS LAUNCH 4
5 .RT Streaming We divide the unbounded incoming streaming data into 3 stages, the data come into different stages are all queryable. InMem Stage Unbounded Continuously InMem streaming events Aggregations On Disk Stage Flush to disk, columnar based storage and indexes Full Cubing Stage Full cubing with MR or Spark, save to HBase. KYLIN REAL-TIME ANALYTICS LAUNCH 5
6 .RT Streaming Components Query Engine Build Engine Management Monitor And Streaming Coordinator Metadata Store Streaming Receiver KYLIN REAL-TIME ANALYTICS LAUNCH 6
7 .How Cube Engine Works new streaming cube request 6 1 Steaming Coordinator Build Engine 8 2 5 Streaming 7 Receivers Cluster Streaming Sources ReplicaSet1 ReplicaSet2 4 3 Cube Storage ReplicaSet3 ReplicaSet4 (HBase) ReplicaSet5 KYLIN REAL-TIME ANALYTICS LAUNCH 7
8 .How Query Engine Works SQL Query SQL Response 1 Query Engine Steaming Coordinator 2 3 Streaming Receivers Cube Cluster Storage (HBase) 8 KYLIN REAL-TIME ANALYTICS LAUNCH 8
9 .Real-time Segment States Seg_3 Seg_4 1 … L In Memory Seg_2 Store 1 … M Unbounded Fragments streaming events 1 … J Seg_1 1 … N Active Segments Immutable Segments Open to Write Close to Process KYLIN REAL-TIME ANALYTICS LAUNCH 9
10 .Segment Store On Disk KYLIN REAL-TIME ANALYTICS LAUNCH 10
11 .Column Based Fragment File Format KYLIN REAL-TIME ANALYTICS LAUNCH 11
12 .Invert Index Format • Use Roaring Bitmap. • Two format for tri-tree encoded values and fix-len encoded values KYLIN REAL-TIME ANALYTICS LAUNCH 12
13 .Compression • Support Run Length Encoding and LZ4 Compression • Use RLE compression for time-related dim and first dim • Use LZ4 for other dimensions by default • Use LZ4 Compression simple-type measure(long, double) • No compression for complex measure(count dinstinct, topn, etc.) KYLIN REAL-TIME ANALYTICS LAUNCH 13
14 .Replica Set • All receivers in the Replica Set replica set share the same assignment. Receiver1 • The lead of the ReplicaSet is responsible to upload Receiver2 Assignment: “cube1”:[1,2] real-time segments to “cube2”:[2,3] HDFS • Use Zookeeper to do leader election Zookeeper KYLIN REAL-TIME ANALYTICS LAUNCH 14
15 . Local Check Point Date Time Partition Offsets SeqID of Active Segments 2016/10/01 Kafka 1,x 2,y 3,z Seg_5, I Seg_4, J Seg_3, K 12:00:00 Seg_5 1 … I Topic_1 Part_1 ... x x+1 … Seg_4 Topic_1 Part_2 ... y x+1 … 1 … J Seg_3 Topic_1 Part_3 ... z z+1 … 1 … K Kafka Active Fragments KYLIN REAL-TIME ANALYTICS LAUNCH 15
16 .Remote Check Point • Checkpoint is saved to Cube Segment metadata after HBase segment build ”segments”:[{…, "stream_source_checkpoint": {"0":8946898241, “1”: 8193859535, ...} }, ] • The checkpoint info is the smallest partition offsets on the streaming receiver when real-time segment is sent to full build. KYLIN REAL-TIME ANALYTICS LAUNCH 16
17 .Performance • Count Query on one hour data which has 36M rows take around 800ms • Consume around 44000 events/s for one receiver (11 dimensions, 1 metrics) • Detail Performance Doc: • https://drive.google.com/file/d/1GSBMpRuVQRmr8Ev2BWvssfMd- Rck9vsH/view?ths=true KYLIN REAL-TIME ANALYTICS LAUNCH 17
18 .In eBay • 20 streaming receivers; HW: 86GB RAM, 16 cores vm • Use case: site speed analytics, 16 dim, 50 metrics KYLIN REAL-TIME ANALYTICS LAUNCH 18
19 .Kylin RT Streaming Next Step Star Schema Support Multi-Tenant Enhance Monitoring/Alerting For Streaming Receiver On Kubernetes KYLIN REAL-TIME ANALYTICS LAUNCH 19
20 .Thank you! KYLIN REAL-TIME ANALYTICS LAUNCH 20