Apache Kylin: A Powerful OLAP Tool for Big Data, Part 2

1. Kylin: high performance, high concurrency
Tested on the standard SSB data set, Kylin is 200X faster than Apache Hive.

2. How to calculate the cube

3. How to persist the cube in HBase

4. How to query the cube

SELECT
  test_cal_dt.week_beg_dt,
  test_category.category_name,
  test_category.lvl2_name,
  test_category.lvl3_name,
  test_kylin_fact.lstg_format_name,
  test_sites.site_name,
  SUM(test_kylin_fact.price) AS GMV,
  COUNT(*) AS TRANS_CNT
FROM test_kylin_fact
LEFT JOIN test_cal_dt
  ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt
LEFT JOIN test_category
  ON test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id
  AND test_kylin_fact.lstg_site_id = test_category.site_id
LEFT JOIN test_sites
  ON test_kylin_fact.lstg_site_id = test_sites.site_id
WHERE test_kylin_fact.seller_id = 123456
  OR test_kylin_fact.lstg_format_name = 'New'
GROUP BY
  test_cal_dt.week_beg_dt,
  test_category.category_name,
  test_category.lvl2_name,
  test_category.lvl3_name,
  test_kylin_fact.lstg_format_name,
  test_sites.site_name

OLAPToEnumerableConverter
  OLAPProjectRel(WEEK_BEG_DT=[$0], category_name=[$1], CATEG_LVL2_NAME=[$2], CATEG_LVL3_NAME=[$3], LSTG_FORMAT_NAME=[$4], SITE_NAME=[$5], GMV=[CASE(=($7, 0), null, $6)], TRANS_CNT=[$8])
    OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()])
      OLAPProjectRel(WEEK_BEG_DT=[$13], category_name=[$21], CATEG_LVL2_NAME=[$15], CATEG_LVL3_NAME=[$14], LSTG_FORMAT_NAME=[$5], SITE_NAME=[$23], PRICE=[$0])
        OLAPFilterRel(condition=[OR(=($3, 123456), =($5, 'New'))])
          OLAPJoinRel(condition=[=($2, $25)], joinType=[left])
            OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left])
              OLAPJoinRel(condition=[=($4, $12)], joinType=[left])
                OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]])
                OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0, 1]])
              OLAPTableScan(table=[[DEFAULT, test_category]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
            OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0, 1, 2]])
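In the plan above, GMV is not produced by a single SUM: the aggregate node computes `$SUM0` and `COUNT` over the price column, and the top projection wraps them in `CASE(=(count, 0), null, sum0)`. The following is a minimal Python sketch of that rewrite, intended only to illustrate the semantics. The function names are hypothetical and are not part of Kylin or of its query planner.

```python
from typing import Optional, Sequence


def sum0(values: Sequence[Optional[float]]) -> float:
    """Like $SUM0 in the plan: sum of non-null values, 0 if there are none."""
    return sum(v for v in values if v is not None)


def count_non_null(values: Sequence[Optional[float]]) -> int:
    """Like COUNT($6) in the plan: number of non-null values."""
    return sum(1 for v in values if v is not None)


def sql_sum(values: Sequence[Optional[float]]) -> Optional[float]:
    """SUM as rewritten in the plan: CASE(=(count, 0), null, sum0)."""
    if count_non_null(values) == 0:
        return None  # SQL SUM over no non-null rows is NULL, not 0
    return sum0(values)
```

Splitting SUM this way lets the aggregation step always return a number, while the final projection restores the SQL rule that a group with no non-null prices yields NULL.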

5. How to query the cube
• Translate the cube query into an HBase table scan
  Columns, Group by -> Cuboid ID
  Filters -> Scan Range (Row Key)
  Aggregations -> Measure Columns (Row Values)
• Scan the HBase table and translate the HBase result into the cube result
  HBase Result (key + value) -> Cube Result (dimensions + measures)
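The translation steps on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Kylin's implementation: the dimension order, the bitmask layout, the tuple-shaped row keys and the in-memory `table` dictionary standing in for an HBase table are all hypothetical, and only equality filters are handled.

```python
from collections import defaultdict

# Hypothetical dimension order; in this sketch the first entry is the highest bit.
DIMENSIONS = ["seller_id", "lstg_format_name", "site_name"]


def cuboid_id(columns):
    """Bitmask with one bit set per dimension used by the query."""
    n = len(DIMENSIONS)
    mask = 0
    for col in columns:
        mask |= 1 << (n - 1 - DIMENSIONS.index(col))
    return mask


def query(table, group_by, filters, measure_index=0):
    """table maps row keys (cuboid_id, dim values...) to tuples of measures."""
    # Columns + group by -> cuboid id (filter columns must be in the cuboid too).
    needed = set(group_by) | set(filters)
    dims = [d for d in DIMENSIONS if d in needed]
    cid = cuboid_id(dims)
    result = defaultdict(float)
    for key, measures in table.items():
        # Filters -> scan range: a real scan would seek by row-key range;
        # this loop only mimics which rows would fall inside that range.
        if key[0] != cid:
            continue
        row = dict(zip(dims, key[1:]))
        if any(row[c] != v for c, v in filters.items()):
            continue
        # HBase result (key + value) -> cube result (dimensions + measures).
        group = tuple(row[c] for c in group_by)
        result[group] += measures[measure_index]
    return cid, dict(result)


# Made-up pre-aggregated rows for two cuboids.
table = {
    (5, 123456, "US"): (30.0,),
    (5, 123456, "UK"): (12.5,),
    (5, 999, "US"): (7.0,),
    (7, 123456, "New", "US"): (1.0,),
}
cid, rows = query(table, ["site_name"], {"seller_id": 123456})
```

Here the query groups by `site_name` and filters on `seller_id`, so it is answered from the cuboid that contains exactly those two dimensions, and rows of other cuboids are never touched.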

6. Apache Kylin global users
Sectors: Internet, FSI, Telecom, Manufacturing, Others
Users include eBay, Yahoo! Japan, Meituan, Xiaomi, MachineZone, Glispa, Inovex, AT&T, Lenovo, Adobe, OPPO, VIVO, Expedia, JPMorgan, 360 and UC, among others.

7. Meituan & Dianping
• Top O2O service provider in China
• Apache Kylin is the main OLAP platform, serving all business lines
• As of August 2018: 8.9 trillion rows of data in total, 971 TB of cube storage, 3.8 million SQL queries per day
• 50% of queries < 200 ms; 90% of queries < 1.2 s
• The team has grown 3 Apache Kylin committers and PMC members

8. Xiaomi (mi.com)
• Apache Kylin acts as the engine of Xiaomi's "data factory", serving 18 business lines
• Daily incremental data of 17 billion; 95% of queries < 500 ms

9. Yahoo! Japan
• Leading search engine and portal in Japan
• Uses Kylin to replace Impala, to meet business analysts' low-latency requirements
• Most queries return in less than 1 s
• Kylin supports cross-region deployment: only the cube, not the raw data, is pushed to the data center nearest the analysts
https://techblog.yahoo.co.jp/oss/apache-kylin/

10. Apache Kylin development history
• v2.6 (in progress): Distributed cache; SDK for RDBMS
• v2.5: Hadoop 3/HBase 2; MySQL as metastore
• v2.3: Cube planner; Dashboard
• v2.0: Snowflake
• v1.6: NRT Streaming
• v1.5: Plug-in architecture

11. Apache Kylin Roadmap
• New storage: Parquet, Druid
• Real-time support
• Flexible model
• Containerization

12. Thanks
Apache Kylin, Kyligence