Apache Kylin: A Powerful OLAP Tool for Big Data, Part 1

[Breakout Session 5, No. 05, Ni Chunen (倪春恩)] Apache Kylin: A Powerful OLAP Tool for Big Data

1. Apache Kylin: A Powerful OLAP Tool for Big Data
Company: Kyligence
Position: Senior R&D Engineer
Speaker: Ni Chunen (倪春恩)

2. Agenda
• Apache Kylin brief
• Kylin core principles
• Typical use cases
• Kylin development history
• Kylin 3.0 roadmap

3. Apache Kylin: Leading OLAP-on-Hadoop Engine
• Top Level Project: Apache Kylin is the first Apache top-level project (TLP) from China.
• Community: an active user and developer community with diverse users.
• Recognition: InfoWorld Bossie Award for best open source big data tool in two consecutive years, 2015 and 2016.
• Technology: built with leading big data technologies and optimizations such as precomputation, parallel computing, and columnar storage, Kylin supports massive data, high concurrency, and sub-second latency.
• Adoption: more than 1000 deployments worldwide.

4. Kyligence = Kylin + Intelligence
Founded by the original Apache Kylin team:
• More than 50% of the Apache Kylin PMC
• Contributed 90% of the Kylin source code
• Building a global open source community
Enterprise products powered by Kylin:
• Kyligence Enterprise: enterprise-grade OLAP platform for intelligent analytics
• Kyligence Cloud: big data analytics in the cloud
[Figure labels: Apache Kylin; Professional Service; Enterprise Product; Automation & Monitoring; Cloud; Industry Solution]

5. Kylin and Kyligence milestones
• 2014.11: Joined the Apache incubator; Apache Kylin open sourced
• 2015.11: Graduated to Apache Top Level Project
• 2016.3: Kyligence founded; angel investment from Red Point Ventures
• 2016.8: Kyligence Enterprise announced
• 2016.9: Won the InfoWorld Bossie Award for the second time
• 2017.4: Series A financing, 8 million USD, led by Broadband Capital, Shunwei Capital, Red Point
• 2017.5: Kyligence US branch founded
• 2017.12: Kyligence Cloud announced
• 2018.7: Series B financing, 15 million USD, led by Eight Roads
• 2018.9: Apache Kylin v2.5 released

6. What is Apache Kylin
• 3 trillion rows of data, query latency < 1 s @Toutiao, the top news feed app in China
• Cubes with 60+ dimensions @CPIC, a top-3 insurance company
• JDBC / ODBC / REST API
• BI integration: Tableau, Excel, Cognos, Superset, Redash, Qlik
[Figure: architecture stack. BI layer (Visualization, Interactive Reporting, Dashboard); OLAP Engine; Hive / HDFS / Hadoop MapReduce, HBase/Parquet, Kafka]

7. OLAP and OLAP Cube
• Online analytical processing, or OLAP, is an approach to answering multi-dimensional analytical (MDA) queries swiftly in computing. (Wikipedia)
• OLAP Cube is the core: an OLAP cube is a data structure optimized for very quick data analysis.
• Basic operations: Roll-up, Drill-down, Slice and dice, Pivot
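Note: the basic operations listed on this slide can be illustrated on a toy dataset. The following is a minimal Python sketch, not Kylin code. The fact rows, the dimension names (year, region, product), and all helper functions are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales)
FACTS = [
    (2017, "East", "A", 10),
    (2017, "East", "B", 20),
    (2017, "West", "A", 30),
    (2018, "East", "A", 40),
    (2018, "West", "B", 50),
]
DIMS = ("year", "region", "product")


def aggregate(rows, keep):
    """Sum sales grouped by the dimensions named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[-1]
    return dict(out)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose dimensions fall inside the allowed value sets."""
    return [
        r for r in rows
        if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]


def pivot(table):
    """Pivot a two-key aggregate {(a, b): v} into {b: {a: v}}."""
    out = defaultdict(dict)
    for (a, b), v in table.items():
        out[b][a] = v
    return dict(out)


# Drill-down: a finer view, by year and region.
by_year_region = aggregate(FACTS, ("year", "region"))
# Roll-up: drop the region dimension and keep only year.
by_year = aggregate(FACTS, ("year",))
# Slice: only the East region, by year.
east_by_year = aggregate(slice_(FACTS, "region", "East"), ("year",))
# Dice: a sub-cube restricted on two dimensions, by region.
diced = aggregate(dice(FACTS, year={2018}, product={"A", "B"}), ("region",))
# Pivot: swap the row and column axes of the year/region view.
pivoted = pivot(by_year_region)
```

Roll-up and drill-down are the same aggregation run at coarser or finer granularity, which is why a cube that stores results at several granularities can answer both without touching the raw rows.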

8. Cube basics: trading space for time (a balance between space and time)
• Aggregation, classification, and sorting are performed in advance.
[Figure: Multiple Dimensional model and OLAP Cube]
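Note: one way to picture the space-for-time trade is to precompute an aggregate for every combination of dimensions, so that a query becomes a lookup. The sketch below assumes the textbook "full cube" (all 2^n groupings of n dimensions); the slide does not say which groupings Kylin actually materializes. The rows, the `qty` measure, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("returnflag", "orderstatus", "shipdate")

# Hypothetical detail rows with one measure, qty.
ROWS = [
    {"returnflag": "N", "orderstatus": "O", "shipdate": "1998-09-01", "qty": 5},
    {"returnflag": "N", "orderstatus": "O", "shipdate": "1998-09-01", "qty": 7},
    {"returnflag": "R", "orderstatus": "F", "shipdate": "1998-09-02", "qty": 3},
    {"returnflag": "N", "orderstatus": "F", "shipdate": "1998-09-02", "qty": 4},
]


def build_cube(rows, dims):
    """Precompute SUM(qty) for every subset of dims (2**len(dims) groupings)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            agg = defaultdict(int)
            for r in rows:
                agg[tuple(r[d] for d in group)] += r["qty"]
            cube[group] = dict(agg)
    return cube


def lookup(cube, group, key):
    """Answer an aggregate query with dictionary lookups instead of a scan."""
    return cube[group][key]


CUBE = build_cube(ROWS, DIMS)          # paid once, offline (space)
total = lookup(CUBE, (), ())           # grand total
n_flag = lookup(CUBE, ("returnflag",), ("N",))
n_o = lookup(CUBE, ("returnflag", "orderstatus"), ("N", "O"))
```

The number of groupings doubles with each added dimension, which is the "space" side of the balance; the "time" side is that each query above is answered without rescanning `ROWS`.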

9. Cube precomputation is the core technical concept in Kylin

10. Apache Kylin is built with mainstream big data technologies
[Figure: architecture. Data Analyst, BI Tools, Web App… issue SQL. Online calculation: Optimize & Rewrite, Scan & filter. Offline calculation: Extract, Load, Compute.]

11. SQL execution plan without Cube
Sample: analyze how sales break down by return flag ("returnflag") and order status ("orderstatus") within a time range.

    select
        l_returnflag,
        o_orderstatus,
        sum(l_quantity) as sum_qty,
        sum(l_extendedprice) as sum_base_price
    from
        v_lineitem
        inner join v_orders on l_orderkey = o_orderkey
    where
        l_shipdate <= '1998-09-16'
    group by
        l_returnflag,
        o_orderstatus
    order by
        l_returnflag,
        o_orderstatus;

Execution plan: Tables → Join → Filter → Aggr → Sort, O(N).
No cube: everything is computed online at query time, which is CPU and I/O intensive, so latency is high.
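Note: to see this plan run end to end, the sample query can be executed against two tiny in-memory tables. This is a sketch using Python's built-in `sqlite3` module, not Kylin or Hive. The column types and every data row are made up for illustration; only the table names, column names, and the query come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature versions of the two tables used in the sample query.
conn.executescript("""
    CREATE TABLE v_orders (o_orderkey INTEGER, o_orderstatus TEXT);
    CREATE TABLE v_lineitem (
        l_orderkey INTEGER, l_returnflag TEXT,
        l_quantity REAL, l_extendedprice REAL, l_shipdate TEXT
    );
    INSERT INTO v_orders VALUES (1, 'O'), (2, 'F');
    INSERT INTO v_lineitem VALUES
        (1, 'N', 5, 50.0, '1998-09-01'),
        (1, 'N', 7, 70.0, '1998-09-10'),
        (2, 'R', 3, 30.0, '1998-09-15'),
        (2, 'R', 9, 90.0, '1998-10-01');
""")

# The query from the slide: join, filter, aggregate, and sort at query time.
QUERY = """
    select l_returnflag, o_orderstatus,
           sum(l_quantity) as sum_qty,
           sum(l_extendedprice) as sum_base_price
    from v_lineitem
    inner join v_orders on l_orderkey = o_orderkey
    where l_shipdate <= '1998-09-16'
    group by l_returnflag, o_orderstatus
    order by l_returnflag, o_orderstatus
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Every line item has to be read, joined, and filtered before aggregation, so the work grows with the number of rows N.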

12. SQL execution plan with Cube
Without Cube: Tables → Join → Filter → Aggr → Sort, O(N).
With Cube: Cube → Filter → Sort, O(flag x status x days) = O(1), a bound that depends on the number of distinct dimension values and does not grow with the number of source rows N.
• Results come directly from pre-aggregated, indexed data (the cube).
• Much less CPU and I/O; latency is low.
• The table join and aggregation are completed offline, when the cube is built.
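Note: the same idea can be sketched with Python's built-in `sqlite3` module. The join and aggregation are done once into a small table keyed by (returnflag, orderstatus, shipdate), and the query then reads only that table. This is an illustration under assumptions, not Kylin's storage format: the table name `agg_cube`, the index, and all data rows are hypothetical. Because the sample filter spans several days, the sketch still sums across the matching cube cells, but over at most flag x status x days cells.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE v_orders (o_orderkey INTEGER, o_orderstatus TEXT);
    CREATE TABLE v_lineitem (
        l_orderkey INTEGER, l_returnflag TEXT,
        l_quantity REAL, l_extendedprice REAL, l_shipdate TEXT
    );
    INSERT INTO v_orders VALUES (1, 'O'), (2, 'F');
    INSERT INTO v_lineitem VALUES
        (1, 'N', 5, 50.0, '1998-09-01'),
        (1, 'N', 7, 70.0, '1998-09-01'),
        (2, 'R', 3, 30.0, '1998-09-15'),
        (2, 'R', 9, 90.0, '1998-10-01');

    -- Offline step: join and aggregate once, keyed by the query dimensions.
    CREATE TABLE agg_cube AS
        select l_returnflag, o_orderstatus, l_shipdate,
               sum(l_quantity) as sum_qty,
               sum(l_extendedprice) as sum_base_price
        from v_lineitem
        inner join v_orders on l_orderkey = o_orderkey
        group by l_returnflag, o_orderstatus, l_shipdate;
    CREATE INDEX agg_cube_idx
        ON agg_cube (l_shipdate, l_returnflag, o_orderstatus);
""")

# Online step: filter and sort over cube cells only; no join, no raw scan.
from_cube = conn.execute("""
    select l_returnflag, o_orderstatus,
           sum(sum_qty), sum(sum_base_price)
    from agg_cube
    where l_shipdate <= '1998-09-16'
    group by l_returnflag, o_orderstatus
    order by l_returnflag, o_orderstatus
""").fetchall()

# Reference answer computed the slow way, directly from the raw tables.
from_raw = conn.execute("""
    select l_returnflag, o_orderstatus,
           sum(l_quantity), sum(l_extendedprice)
    from v_lineitem
    inner join v_orders on l_orderkey = o_orderkey
    where l_shipdate <= '1998-09-16'
    group by l_returnflag, o_orderstatus
    order by l_returnflag, o_orderstatus
""").fetchall()

cube_cells = conn.execute("select count(*) from agg_cube").fetchone()[0]
```

Both queries return the same rows, but the cube query touches `cube_cells` pre-aggregated rows rather than every line item, and that count is bounded by the dimension cardinalities instead of N.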