- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
2022-深圳-大数据开源汇
展开查看详情
1 .
2 .活动议程 13:30-14:00 签到 14:00-14:10 开场演讲 陈传杰 / openLooKeng开源社区 陈广胜 / 微众银行资深技术专家、Apache Eventmesh创始人、ALC Shenzhen Lead 14:10-14:50 Apache Hive 3.0 Updates 刘岩 / Cloudera解决方案工程师、Apache Hive/Apache Flink Contributor 14:50-15:30 openLooKeng业务场景实践与案例分享 罗旦 / openLooKeng高级开发工程师 15:30-16:10 openLooKeng&Linkis打开数据互联互通新场景 王和平 / 微众银行大数据平台高级工程师、Apache linkis (Incubating) PPMC 16:10-16:50 Apache RocketMQ 5.0:消息、事件、流融合高可用架构的演进 金融通 / Apache RocketMQ PMC Member、Committer 16:50-17:10 互动&抽奖 17:10-17:20 合影留念
3 .
4 .ALC(Apache Local Community) Shenzhen 是Apache的官 方公益组织,旨在中国本土传播Apache 之道!认真贯彻落实村 落效应,加强深圳以及粤港澳大湾区范围内的开源项目线下活动: 分享、研讨、项目推广、共同体建设。
5 .Apache Hive 3: A new horizon Yan Liu SE @ Cloudera
6 .About Me • Work : Solution Engineer @ Cloudera Since 01/2019 Solution Engineer @ Hortonworks Since 04/2017 • Education: Graduated @ Massey University (New Zealand) Major 1 : Computer Science Major 2 : Mathematics Studying Artificial Intelligence @ Stanford (U.S) • Play: Old school rock music Coding 6 © Cloudera, Inc. All rights reserved.
7 .Hive 1.X and 2.X hive.execution.engine = mr,tez,spark © 2019 Cloudera, Inc. All rights reserved. 7
8 .Goodbye MapReduce Hive 3.0 hive.execution.engine = tez,spark © 2019 Cloudera, Inc. All rights reserved. 8
9 .Goodbye Hive On Spark We only support Hive On Tez hive.execution.engine = tez © 2019 Cloudera, Inc. All rights reserved. 9
10 .Goodbye Hive On Spark Tez is the only supported engine onwards hive.execution.engine = tez hive.llap.execution.mode = none map only all auto With Full SQL 2011 standard Support © 2019 Cloudera, Inc. All rights reserved. 10
11 .Difference between non-LLAP and LLAP execution Without LLAP With LLAP Query arrives to HS2; it is parsed and compiled into “tasks” Query arrives to HS2; it is parsed and compiled into “tasks” Tasks are handed over to Tez AM (query coordinator) Tasks are handed over to Tez AM (query coordinator) Coordinator (AM) asks YARN for containers Coordinator (AM) locates LLAP instances via ZK Coordinator (AM) pushes task attempts into containers Coordinator (AM) pushes task attempts as fragment into LLAP LLAP IO/cache used to read dataor RecordReader used to read RecordReader used to read data data Hive operators are used to process data Hive operators are used to process data* Final tasks write out results into HDFS Final tasks write out results into HDFS HS2 forwards rows to JDBC HS2 forwards rows to JDBC © 2019 Cloudera, Inc. All rights reserved. 11
12 .Hive 2 With LLAP © 2019 Cloudera, Inc. All rights reserved. 12
13 .Hive 2 With LLAP Tez Tez Spark Tez Tez Spark LLAP LLAP LLAP Spark Tez Spark Spark Tez Spark Yarn HDFS © 2019 Cloudera, Inc. All rights reserved. 13
14 . Hive LLAP 480 Vcores 4TB Mem MPP 512 Vcores 4TB Mem MPP 840 Vcores 16TB Mem https://www.slideshare.net/Hadoop_Summit/hive-llap-a-high-performance-costeffective-alternative-to-traditional-mpp-databases © 2019 Cloudera, Inc. All rights reserved. 14
15 .Hive 3.0 Updates
16 .Hive 3.0 - LLAP Resource Management CREATE RESOURCE PLAN daytime; CREATE POOL daytime.bi WITH ALLOC_FRACTION=0.8, QUERY_PARALLELISM=5; CREATE POOL daytime.etl WITH ALLOC_FRACTION=0.2, QUERY_PARALLELISM=20; CREATE RULE downgrade IN daytime WHEN total_runtime > 3000 THEN MOVE etl; © 2019 Cloudera, Inc. All rights reserved. 16
17 . Hive Storage Handler SQL HiveServer2 Tez LLAP Yarn Customize Storage Handler HBASE Elastic HDFS Google MongoDB VoltDB Cassandra JDBC Druid Kafka Search Spreadsheets S3 ADLS WASB GCS © 2019 Cloudera, Inc. All rights reserved. 17
18 .HIVE-1555: JDBC Storage Handler CREATE EXTERNAL TABLE Store ( s_store_sk bigint, s_store_id string, HiveServer2 s_rec_start_date string, SQL s_rec_end_date string, ... ... ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( Tez LLAP ... "hive.sql.table" = "STORE", Yarn ... CREATE EXTERNAL TABLE Store ); ( s_store_sk bigint, s_store_id string, s_rec_start_date string, s_rec_end_date string, ) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' Store Store_Sales TBLPROPERTIES ( Store HDFS ... "hive.sql.query" = "SELECT s_store_sk , s_store_id , s_rec_start_date , s_rec_end_date FROM Store ", ... ); © 2019 Cloudera, Inc. All rights reserved. 18
19 .HIVE-1555: JDBC Storage Handler Predicate Pushdown Sink Select FLOOR(d_timestamp TO DAY), Sort Limit SUM(ss_sales_price) AS sum_sales FROM store_sales,store_returns,date_dim WHERE ss_item_sk=sr_item_sk AND Aggregate ss_ticket_number = sr_ticket_number AND ss_sold_date_sk = d_date_sk AND ss_store_sk = 410 Join Group BY FLOOR(d_timestamp TO DAY) ORDER BY sum_sales DESC LIMIT 100; JDBC Scan Join (Store_Returns) JDBC Scan JDBC Scan (Store_Sales) (Date_dim) © 2019 Cloudera, Inc. All rights reserved. 19
20 .Hive 1010 – Information schema & sysdb SYS database Find which tables have a column with ‘ssn’ as part of the column name? SELECT tbl_name, total_size FROM table_stats_view v, tbls t WHERE t.tbl_id = v.tbl_id ORDER BY cast(v.total_size as int) DESC LIMIT 3; © 2019 Cloudera, Inc. All rights reserved. 20
21 .Hive 1010 – Information schema & sysdb Using JDBC – Storage Handler © 2019 Cloudera, Inc. All rights reserved. 21
22 .surrogate keys row_id CREATE TABLE students_v2 INSERT INTO students_v2 (row_id, name, dorm) SELECT * (`ID` BIGINT DEFAULT SURROGATE_KEY(), FROM students; row_id INT, name VARCHAR(64), dorm INT, PRIMARY KEY (ID) DISABLE NOVALIDATE); +-----------------+---------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | students_v2.id | students_v2.row_id | students_v2.name | students_v2.dorm | +-----------------+---------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ | 1099511627776 | 1 | fred flintstone | 100 | | 1099511627777 | 2 | barney rubble | 200 | +-----------------+---------------------+-------------------+-------------------+-------------------+-------------------+-------------------+ © 2019 Cloudera, Inc. All rights reserved. 22
23 .Hive Transactions Stage I V1: CREATE TABLE hello_acid (load_date date, key int, value int) CLUSTERED BY(key) INTO 3 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true’); V2: CREATE TABLE hello_acid_v2 (load_date date, key int, value int); Hive managed table External tables ACID on by default No ACID, Text by default No SBA, Ranger auth only SBA possible Statistics and other optimizations apply Some optimizations unavailable Spark access via HiveWarehouseConnector Spark direct file access © 2019 Cloudera, Inc. All rights reserved. 23
24 .Hive Merge Statement Slowly Changing Dimension (SCD) all inserts and updates are done in a single pass with full atomicity and isolation to upstream SQL queries, plus automated rollback if failures occur. Guaranteeing all these properties with legacy SQL-on- Hadoop approaches is so difficult that hardly anyone has put them into practice, but Hive’s MERGE makes it trivial. © 2019 Cloudera, Inc. All rights reserved. 24
25 .Hive Merge Statement SCD Type I contacts_stage ID Name Email State 93 Tosha Parisian junie45@price.info CA contacts_target ID Name Email State 1 Dr. Iza Gerhold loragutkowski@yahoo.com NJ 2 Katharyn Goyette DVM sadiewunsche@hotmail.com VI 3 Nikolas Tromp danniekemmer@yahoo.com VI 4 Ms. Shawnna Gerlach DVM reichelson@hotmail.com MP 93 Tosha Parisian junie45@price.info CA © 2019 Cloudera, Inc. All rights reserved. 25
26 .Hive Merge Statement SCD Type II contacts_stage ID Name Email State 93 Tosha Parisian tosha@price.info CA contacts_target ID Name Email State Validat Validat e From e To 1 Dr. Iza Gerhold loragutkowski@yahoo.com NJ 2017- null 01-01 2 Katharyn Goyette DVM sadiewunsche@hotmail.com VI 2017- null 01-01 3 Nikolas Tromp danniekemmer@yahoo.com VI 2017- null 01-01 4 Ms. Shawnna Gerlach DVM reichelson@hotmail.com MP 2017- null 01-01 93 Tosha Parisian junie45@price.info CA 2017- 2017- 01-01 07-24 93 Tosha Parisian tosha@price.info IL 2017- © 2019 Cloudera, Inc. All rights reserved. null 26 07-24
27 .Hive Merge Statement SCD Type III contacts_stage ID Name Email State 93 Tosha Parisian junie45@price.info CA contacts_target ID Name Email State Last State 1 Dr. Iza Gerhold loragutkowski@yahoo.com NJ NJ 2 Katharyn Goyette DVM sadiewunsche@hotmail.com VI VI 3 Nikolas Tromp danniekemmer@yahoo.com VI VI 4 Ms. Shawnna Gerlach DVM reichelson@hotmail.com MP MP 93 Tosha Parisian junie45@price.info IL CA © 2019 Cloudera, Inc. All rights reserved. 27
28 .Hive Materialized View Views – Hive-1143 Select A.a,B.b from A join B on A.c = B.c where A.a = 111 Select A.a,B.b from V where A.a = 111 Select C.a,C.b from SQL View SQL Rewrite (select A.a, B.b from A join B on A.c = B.c) C where C.a = 111 Create View As Select A.a,B.b from A join B on A.c = B.c © 2019 Cloudera, Inc. All rights reserved. 28
29 .Hive Materialized View Materialized View - Select A.a,B.b from A join B on A.c = B.c where A.a = 111 Select A.a,B.b from A join B on A.c = B.c where A.a = 111 Select A.a,B.b from mv SQL Check MV SQL Rewrite where A.a = 111 Create Materialized View mv As Select A.a,B.b from A join B on A.c = B.c © 2019 Cloudera, Inc. All rights reserved. 29