Achieve high-performance and high cost-effective OLAP service in the cloud with Alluxio
Agenda
- Apache Kylin
- Kyligence Cloud
- Alluxio + Kyligence Cloud
- Summary
Jinghua Feng
Senior software engineer. Senior developer on Kyligence's cloud products and technical lead of the platform team.
1. Achieve high-performance and high cost-effective OLAP service in the cloud with Alluxio (jinghua.feng@kyligence.io)
2. Agenda
- Apache Kylin
- Kyligence Cloud
- Alluxio + Kyligence Cloud
- Summary
3. Apache Kylin
- Top-Level Project: the only open-source distributed OLAP platform
- Sub-Second Interactive Query: large scale, high concurrency, multi-dimensional, sub-second query latency
- Award Winning: InfoWorld's Bossies 2015 & 2016 (Best of Open Source Software Awards)
- 1,000+ Organizations: adopted by thousands of organizations globally
© Kyligence Inc. 2020, Confidential.
4. Under the hood: Smart Cuboids
- Each model consists of N-dimensional cuboids; each cuboid pre-aggregates the data for one combination of the model's dimensions.
- Apache Spark builds the cuboids ahead of time, so queries read precomputed results instead of scanning raw data.
- When a user sends a query, the model looks up the matching cuboid/segment and returns the result from it.
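
The cuboid idea above can be illustrated with a minimal sketch. It is not Kylin's implementation: the dimension names are hypothetical, and the selection rule (pick the matching cuboid with the fewest dimensions) is a simplifying assumption standing in for whatever cost model the engine actually uses.

```python
from itertools import combinations


def enumerate_cuboids(dimensions):
    """Return every combination of the given dimensions (2**N cuboids),
    including the empty combination (the grand total)."""
    dims = sorted(dimensions)
    return [
        frozenset(combo)
        for size in range(len(dims) + 1)
        for combo in combinations(dims, size)
    ]


def pick_cuboid(query_dims, built_cuboids):
    """Pick a built cuboid that contains every dimension the query uses.

    Assumption: the cuboid with the fewest dimensions is the cheapest
    to scan. Returns None when no built cuboid can answer the query.
    """
    needed = frozenset(query_dims)
    candidates = [c for c in built_cuboids if needed <= c]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (len(c), sorted(c)))


# Hypothetical model with three dimensions.
all_cuboids = enumerate_cuboids(["product", "region", "year"])
print(len(all_cuboids))  # 8 cuboids for N = 3

# Suppose only two cuboids were actually built.
built = [
    frozenset({"region", "year"}),
    frozenset({"product", "region", "year"}),
]
print(sorted(pick_cuboid(["year"], built)))  # ['region', 'year']
```

The count grows as 2^N, which is why a real deployment builds only a subset of the cuboids rather than all of them.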
5. Kyligence Cloud: Simplify Data Analytics over the Cloud
[Architecture diagram. Labels: Azure, AWS, Apache Kylin, AI-Augmented Engine, Semantic Layer, Index, ANSI SQL, MDX, REST, Hadoop, Kafka, Snowflake, Marketing, Finance, Sales]
6. Spark Native: Reduced Hadoop Overhead in the Cloud
- Step 1, Read: …from S3/ADLS/Snowflake…
- Step 2, Build: build the cube and index
- Step 3, Store: persist the data built in step 2 to cloud storage (S3/ADLS)
- Step 4, Serve: execute SQL in parallel against cloud storage
7. Challenges on Cloud
- Cloud API calls are rate-limited
- The throughput limits of object storage make queries slower than in an on-premises deployment with local storage
- Metadata operations are time-consuming
8. Challenges on Cloud
The throughput limits of object storage make queries slower than in an on-premises deployment with local storage.
[Chart: TPC-H100 average query response time for queries 1 to 22; series: Kyligence Enterprise on S3, Kyligence Enterprise on local storage]
- Each query was run 3 times and the average time recorded
- Lower is better
- Response time increases by 48% on S3 compared with local storage
9. Alluxio
10. Alluxio: A Data Orchestration Open Source Implementation
11. Alluxio Reference Architecture
12. Problems Alluxio can solve
- Inconsistent performance on S3: S3 performance for analytic workloads is inconsistent and data egress is expensive.
- Limited compute capacity on-prem: Making object store data accessible to any compute in any cloud is complex.
- Slow on-prem object store: Object storage performance, particularly for metadata operations, is unpredictable.
13. Kyligence Cloud + Alluxio
14. Kyligence Cloud + Alluxio
- Alluxio mounts the S3 bucket or blob store as its underlying file system
- The integration is transparent to the application and needs little code change
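
One way to picture the "little code change" is as a path rewrite: once a bucket is mounted into the Alluxio namespace, the application addresses the same objects through an `alluxio://` URI. The sketch below is an illustration only, not Kyligence Cloud code. The bucket name, mount point `/mnt/s3` and master host are hypothetical; 19998 is Alluxio's default master RPC port.

```python
from urllib.parse import urlparse


def to_alluxio_uri(object_uri, ufs_root, mount_point,
                   master="alluxio-master:19998"):
    """Map an object-store URI under ufs_root to the Alluxio URI that
    would serve it, assuming ufs_root is mounted at mount_point.

    Only the bucket and key prefix are compared, so an application
    using s3a:// still matches a mount declared with s3://.
    """
    src = urlparse(object_uri)
    root = urlparse(ufs_root)
    if src.netloc != root.netloc:
        raise ValueError(f"{object_uri} is not in bucket {root.netloc}")
    root_path = root.path.rstrip("/")
    if src.path != root_path and not src.path.startswith(root_path + "/"):
        raise ValueError(f"{object_uri} is outside the mounted prefix")
    relative = src.path[len(root_path):]
    return f"alluxio://{master}{mount_point.rstrip('/')}{relative}"


# Hypothetical bucket and mount point.
print(to_alluxio_uri(
    "s3a://kc-bucket/warehouse/cube/part-0.parquet",
    "s3://kc-bucket/warehouse",
    "/mnt/s3",
))
# alluxio://alluxio-master:19998/mnt/s3/cube/part-0.parquet
```

In a Spark-based stack the rewrite can often live in configuration (the storage root) rather than in application logic, which is consistent with the "transparent to the application" point above.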
15. TPC-H 22 Queries with Alluxio
[Chart: TPC-H100 average query response time for queries 1 to 22; series: Kyligence Cloud + Alluxio, Kyligence Cloud]
- Each query was run 3 times and the average time recorded
- Lower is better
- Response time decreases by 58% with Alluxio
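
The slides do not say exactly how the percentage was derived. A minimal sketch of one plausible method, assuming each query's three runs are averaged and the per-query averages are then averaged across the suite, is shown below. The timings are made up for illustration and are not the benchmark data.

```python
from statistics import mean


def average_times(runs_per_query):
    """Average the repeated runs of each query."""
    return {query: mean(times) for query, times in runs_per_query.items()}


def percent_change(baseline_runs, candidate_runs):
    """Relative change of the suite-wide mean response time, in percent.

    Negative values mean the candidate is faster than the baseline.
    """
    base = mean(average_times(baseline_runs).values())
    cand = mean(average_times(candidate_runs).values())
    return (cand - base) / base * 100.0


# Hypothetical timings for two queries, three runs each.
without_cache = {1: [10.0, 10.0, 10.0], 2: [20.0, 20.0, 20.0]}
with_cache = {1: [5.0, 5.0, 5.0], 2: [10.0, 10.0, 10.0]}
print(percent_change(without_cache, with_cache))  # -50.0
```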
16. TPC-H 22: S3 API Calls
[Chart: number of S3 API calls by type (Get Bucket, HEAD Object); series: with Alluxio, without Alluxio]
- Each query was run 2000+ times
- Lower is better
- S3 API cost decreases by 73%
17. Kyligence Cloud + Alluxio: problems encountered
- Some gRPC exceptions. Solution: upgrade to Alluxio 2.2.1
- Alluxio does not support ADLS Gen2. Solution: implement a custom underlying file system
- Outages are hard to monitor and usage statistics are hard to collect. Solution: integrate with Prometheus for monitoring
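
To make the monitoring item concrete, here is a small sketch of reading metrics in the Prometheus text exposition format and flagging a possible outage. It is not the integration described in the talk. The metric names and the sample payload are hypothetical, and the parser is deliberately simplified: it assumes one sample per line with no trailing timestamp.

```python
def parse_prometheus_text(text):
    """Parse simplified Prometheus text exposition into {series: value}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    Assumes samples carry no trailing timestamp.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def missing_workers(samples, expected, metric="cluster_workers"):
    """Return how many workers are missing (0 if none).

    `metric` is a hypothetical gauge name, not a real Alluxio metric.
    """
    return max(0, expected - int(samples.get(metric, 0)))


# Hypothetical scrape output.
scrape = """
# HELP cluster_workers Number of live workers (hypothetical metric)
# TYPE cluster_workers gauge
cluster_workers 2
cache_bytes_read_total{tier="MEM"} 1048576
"""
metrics = parse_prometheus_text(scrape)
print(missing_workers(metrics, expected=3))  # 1
```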
18. Summary
19. Summary
- Conclusion: Alluxio can help Kyligence Cloud improve OLAP query speed and reduce the cost of object storage API calls.
- Follow-up: recommended embedded journal; replica redistribution for hot files after horizontal scaling of Kyligence Cloud