Delta Lake in eBay
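Carmel Spark is a SQL-on-Hadoop engine that eBay's Carmel team built by heavily modifying Apache Spark. The team has done extensive development and optimization work on both functionality and performance, including a brand-new CBO, concurrent scheduling, materialized views, indexes, temporary tables, Extended Adaptive Execution, Range Partition, column-level access control, and various monitoring and management features. This talk covers practical experience with Delta Lake in Carmel Spark: on top of the Delta Lake storage layer, the team implemented Update/Delete syntax with extended semantics that is fully compatible with Teradata.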
1. Delta Lake in eBay
Transforming a legacy Spark SQL engine to support complex CRUD
Lantao Jin, lajin@ebay.com
2. Agenda
● Background
● Challenges
● Implementation
● Optimization
● Values
● Demo
© 2018 eBay. All rights reserved.
3. Background
4. Why does it have to be done?
● CRUD is a fundamental requirement of a data service
● "Databasization" is a trend in analytic datasets
  ○ Google Dremel/BigQuery
● A blocker for TD-Offload
  ○ 5%+ of statements from customers are Update/Delete
● Provides a new approach in many scenarios
  ○ Transactions/ACID, Time Travel, etc.
5. Why now?
● Evolution of big data technology
  ○ Growth in data volume is no longer a problem.
  ○ The variety of data applications is now being taken into account.
● The work other than update/delete is nearly done in TD-Offload
  ○ At least, ready-made solutions exist.
● Many storage projects for analytic datasets have been open-sourced
  ○ Apache Hudi (incubating), Apache Iceberg (incubating)
  ○ Databricks Delta Lake
6. Challenges
7. Apache Spark is not ready
Apache Spark 3.0.0-SNAPSHOT
8. Delta Lake OSS still looks like a toy
9. Teradata: “You get what you pay for”
10. Changing Carmel Spark is hard
● Carmel Spark is advancing
  ○ Dynamic Partition Pruning, skew join in Adaptive Execution
● Carmel Spark is archaic
  ○ DataSourceV1, codegen
● Carmel Spark is complicated
  ○ File index, ACL, materialized view, volcano CBO, runtime filter, range partition, session-level cache, concurrency scheduling, multiple file scan, etc.
11. Implementation
12. Using Delta Lake
14. Implemented in Spark Catalyst
● Pluggable
● SparkSessionExtensions
● Fully benefits from Catalyst
● Bucket join optimization
● Cross-table update/delete
● Fully supports TD syntax
15. Cross-table update/delete
16. “Time Travel” among transaction lineage
SELECT AT, ROLLBACK
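To make the idea of travelling along a transaction lineage concrete, here is a minimal Python sketch. It assumes only what Delta Lake's log model provides in general: each commit is a list of "add" and "remove" file actions, and a snapshot is obtained by replaying commits in order. The names `Action`, `snapshot_at` and `rollback` are hypothetical, and modelling rollback as a new compensating commit is an assumption for illustration, not necessarily how Carmel Spark implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "add" or "remove"
    path: str  # data file the action refers to


def snapshot_at(log, version):
    """Replay commits 0..version and return the set of live data files."""
    if not 0 <= version < len(log):
        raise ValueError(f"unknown version {version}")
    live = set()
    for commit in log[: version + 1]:
        for action in commit:
            if action.kind == "add":
                live.add(action.path)
            elif action.kind == "remove":
                live.discard(action.path)
    return live


def rollback(log, version):
    """Append a commit that restores the file set of `version` (assumed design)."""
    current = snapshot_at(log, len(log) - 1)
    target = snapshot_at(log, version)
    commit = [Action("remove", p) for p in sorted(current - target)]
    commit += [Action("add", p) for p in sorted(target - current)]
    log.append(commit)
    return len(log) - 1  # version number of the new commit


# Version 0 writes one file; version 1 rewrites it (for example after an UPDATE).
log = [
    [Action("add", "part-0.parquet")],
    [Action("remove", "part-0.parquet"), Action("add", "part-1.parquet")],
]
```

Reading an older version never mutates the log, while the rollback here is itself a new version, so the lineage stays append-only.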
17. Self-governance
● Store the metadata of delta tables in a delta table.
● Use a listener to handle metadata asynchronously.
● Different behaviours in different queues.
● Manage transaction chains for rollback/time travel.
18. Code Review for Update
● SqlBase.g4
● SparkSqlParser.scala, visitUpdateTable(): parses the FROM clause, builds the multi-table join, wraps the Assignments and the Condition, and finally generates an UpdateTableStatement node
● Resolution rules are injected through SparkSessionExtension
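19.
● Parse the Assignments and the Condition.
● Assignments foldable && Condition empty?
  ○ yes: fall back to a single-table update.
  ○ no: continue with the join-based update.
● If the source side contains multiple joins, infer all join conditions that reference only the source tables and add a Filter node.
● Push a projection down onto the source table, based on the attributes used by the join condition and the assignments.
● Generate an UpdateWithJoinTable whose attributes are all resolved.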
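The planning decision on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Catalyst code: `Assignment`, `plan_update` and the field names are hypothetical, "foldable" is reduced to a boolean flag, and column references are plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    column: str                           # target column being SET
    foldable: bool                        # True when the value is a constant expression
    source_refs: frozenset = frozenset()  # source columns the value reads


def plan_update(assignments, join_condition_refs):
    """Return ("simple", []) or ("join", source columns to keep)."""
    # Constant assignments with no join condition need no source table at all.
    if all(a.foldable for a in assignments) and not join_condition_refs:
        return "simple", []
    # Otherwise keep only the source columns the join condition or SET clause reads.
    needed = set(join_condition_refs)
    for a in assignments:
        needed |= a.source_refs
    return "join", sorted(needed)
```

Pruning the source side to the referenced columns keeps the later join narrow, which is the point of the projection pushdown step.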
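20.
● Find the TahoeFileIndex and build an UpdateWithJoinCommand.
● Filter out the target-table files that are not touched by the update: use an inner join to find the files that satisfy the join condition, and mark them as filesToRewrite.
● Build a left join plan.
● Bucket table?
  ○ yes: repartition the plan.
● mapPartition over the plan: for matched rows emit the right table's output; for unmatched rows emit the left table's output.
● Multiple rows match?
  ○ yes: throw an exception.
  ○ no: write the result through FileFormatWriter and mark the new files as AddedFiles; mark the filesToRewrite as RemoveFiles.
● Commit to the transaction log.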
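The rewrite flow on this slide can be sketched in plain Python. This is a simplified illustration, not Carmel Spark or Delta Lake code: data files are modelled as in-memory lists of dict rows, the join is a single-column equality join, and `update_with_join`, its parameters and the `.rewritten` file naming are all hypothetical.

```python
def update_with_join(target_files, source_rows, key, assignments):
    """Copy-on-write update of a target table joined with a source table.

    target_files: dict of file name -> list of row dicts.
    source_rows:  list of row dicts.
    assignments:  dict of target column -> function(source_row) -> new value.
    Returns (removed file names, dict of added file name -> rows).
    """
    source_by_key = {}
    for s in source_rows:
        source_by_key.setdefault(s[key], []).append(s)

    # Inner-join step: only files with at least one matching row are rewritten.
    files_to_rewrite = [
        name
        for name, rows in target_files.items()
        if any(r[key] in source_by_key for r in rows)
    ]

    added = {}
    for name in files_to_rewrite:
        new_rows = []
        for row in target_files[name]:
            matches = source_by_key.get(row[key], [])
            if len(matches) > 1:
                # An ambiguous update is rejected rather than applied arbitrarily.
                raise ValueError(f"multiple source rows match key {row[key]!r}")
            if matches:
                updated = dict(row)
                for column, value_of in assignments.items():
                    updated[column] = value_of(matches[0])
                new_rows.append(updated)  # matched: emit the updated row
            else:
                new_rows.append(row)      # unmatched: emit the original row
        added[f"{name}.rewritten"] = new_rows

    # The commit would record files_to_rewrite as removed and `added` as added.
    return files_to_rewrite, added
```

Unmatched rows in a rewritten file are carried over unchanged, which is why the flow uses a left join: every target row in the file must survive into the new file.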
21. Code Review for Insert
INSERT INTO/OVERWRITE ...
● DataSourceStrategy: case InsertIntoTable
● Has a static partition?
  ○ yes: add a projection of the static partition to the query.
● Wrap the resulting query as the child of the logical node InsertIntoDataSource.
● Pass actualQuery into InsertIntoDataSourceCommand.
22.
● SparkStrategy, BasicOperators: case InsertIntoDataSource generates the physical node InsertIntoDataSourceExec.
● EnsureRequirements: ensureDistributionAndOrdering()
● Is it a bucket table?
  ○ yes: add HashClusteredDistribution.
● EnsurePartitionForWriting: add ShuffleExchangeExec.
● InsertIntoDataSourceCommand.run()
23.
● InsertableRelation.insert()
● Insert into a static partition && overwrite?
  ○ yes: fill in replace_where.
● Assemble the predicates and call OptimisticTransaction.write.
● Use snapshot.fileForScan to find the deleteFiles and mark them as RemoveFiles.
● Write through FileFormatWriter and mark the new files as AddedFiles.
● Commit to the transaction log.
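As a rough illustration of how an overwrite of a static partition turns into a commit, the sketch below selects the live files that match the partition predicate for removal and adds the new files. It is an assumption-laden model, not the Delta Lake API: `insert_overwrite` and its arguments are hypothetical, files are represented only by their partition values, and the predicate is limited to equality on partition columns.

```python
def insert_overwrite(live_files, new_files, static_partition=None):
    """Build the (remove, add) file lists for an INSERT OVERWRITE.

    live_files / new_files: dict of file path -> partition values (dict).
    static_partition: e.g. {"dt": "2020-01-01"}; None replaces the whole table.
    """
    predicate = static_partition or {}

    def matches(partition):
        return all(partition.get(k) == v for k, v in predicate.items())

    # New data must fall inside the partition that is being replaced.
    for path, partition in new_files.items():
        if not matches(partition):
            raise ValueError(f"{path} is outside the overwritten partition")

    remove = sorted(p for p, part in live_files.items() if matches(part))
    add = sorted(new_files)
    return remove, add
```

With an empty predicate every live file matches, so the same code path covers a full-table overwrite.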
24. More details
https://mp.weixin.qq.com/s/L64xhtKztwWhlBQrreiDfQ
25. Optimization
26. Functional optimizations
● Downgrade some codegen APIs to be compatible with Spark 2.3
● Invalidate the table cache after any schema change
● Data types in SET should consider casting
● Row-level metrics for update/delete
● Expose insert-into-table metrics to Spark
● Auto vacuum: 'CONVERT TO DELTA VACUUM', 'VACUUM AUTO RUN'
● Support bucketed delta tables
● Insert overwrite with static/dynamic partitions
27. Functional optimizations
● Allow executing CONVERT TO DELTA on an empty parquet table
● Update/Delete supports multi-table joins
● Record the table partition schema in the transaction log on insert
● Rollback should support partitioned tables
● Hive authorization for UPDATE/DELETE
● CREATE TABLE LIKE a delta table should generate an empty parquet table
● Resolve conflicting attributes in update with self-join
● A new approach to getPartition/listPartitions/listPartitionNames for delta tables
28. Performance optimizations
● Fall back to a simple update if all SET statements are foldable and there is no join
● Support bucketing join
● CRUD on a delta table should update table statistics
● Handle the delta meta table asynchronously
● Project only the source columns that appear in SET and WHERE
● Skip schema inference and merging when the table schema can be read from the catalog
● Skip caching delta table plans in the broadcast cache
● CONVERT TO DELTA should work on temporary tables
29. Performance optimizations (indirect)
● File Index
● Multiple files scan
● Adaptive runtime filtering
● AE skew joining
● Optimized bucket join
● Range Partition
● Broadcast/Local cache