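openLooKeng's Technical Roadmap for Scenarios of Dispersed Data and Computing Power
Ken Zhang, Chief Big Data Architecture Expert, Huawei Computing; PMC Chair, openLooKeng Community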
1. https://openlookeng.io
2. openLooKeng and the Technical Trends of Big Data
Ken Zhang, openLooKeng Community PMC Chair; Chief Expert, Huawei Kunpeng Big Data
3. Table of Contents
 1 openLooKeng Introduction
 2 Thoughts on the Development of Big Data
 3 openLooKeng Strategies
4. openLooKeng: Enterprise-Ready Data Virtualization Engine
• Presto 316 was chosen as the baseline for our engine.
• The SQL interface and the connector framework are backward compatible with our enhancements.
• The kernel is replaced for better performance in both complex queries and point queries.
• A native runtime is tentatively planned for the end of 2021.
Architecture components: Enhanced SQL Parser Layer; Virtual Data Mart (VDM); Engine Kernel (scheduler optimization, operator pushdown, ACID, AA high availability, task recovery, dynamic filtering, cache, resource management, horizontal scaling, heuristic index); Native Runtime (OmniCache | OmniVec | OmniJit); Unified Data Source Connection Framework ……; cross-DC intelligent operator pushdown and dynamic filtering.
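The slide lists a heuristic index among the kernel enhancements but does not describe how it works. The Python below is a minimal sketch of the general idea behind index-based split skipping. It keeps a min/max summary per split and prunes splits whose range cannot satisfy a range predicate. It is not openLooKeng's heuristic index implementation or API; `SplitStats`, `build_index`, and `splits_to_scan` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class SplitStats:
    """Hypothetical per-split summary, built once when the index is created."""
    split_id: str
    min_value: int
    max_value: int

def build_index(splits):
    """splits maps a split id to the list of column values stored in that split."""
    # Empty splits hold no rows, so they are left out and never scanned.
    return [SplitStats(sid, min(vals), max(vals)) for sid, vals in splits.items() if vals]

def splits_to_scan(index, lo, hi):
    """Keep only splits whose [min, max] range can overlap the predicate lo <= col <= hi."""
    return [s.split_id for s in index if s.max_value >= lo and s.min_value <= hi]

index = build_index({"s1": [1, 5, 9], "s2": [10, 12], "s3": [20, 30]})
print(splits_to_scan(index, 11, 15))  # ['s2']
```

A min/max summary can only prove that a split contains no matching rows; it never proves that a row matches. The surviving splits must still be scanned and filtered.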
5. Performance: the Ultimate Goal
• openLooKeng 1.4.0 vs Trino 361: 30% (TPC-DS, 10 TB data size)
Test setup: 11 nodes (1 coordinator + 10 workers); memory: 376 GB; CPU: 2 × Intel(R) Xeon(R) Gold 6140 CPU @ 2.60 GHz, 18 cores; OS: CentOS 7.6.
• Spark 3.0 vs openLooKeng 1.4.0
• Impala 3.4.0 vs openLooKeng 1.4.0
• Point query: Presto 347 vs openLooKeng 1.1 (real customer SQL is used)
Figures shown for these three comparisons: 46%, 35%, 99%. Datasets: TPC-DS, 1 TB data size; TPC-DS (96 SQL), 1 TB data size.
6. Building an Active openLooKeng Ecosystem with Partners
Milestones:
• 2020.06: openLooKeng open-sourced
• 2020.09: openLooKeng 1.0.0 released
• 2020.12: openLooKeng 1.1.0 released
• 2021.03: openLooKeng 1.2.0 released
• 2021.06: openLooKeng 1.3.0 released
• 2021.09: openLooKeng 1.4.0 (new)
Features highlighted across these releases: unified entrance; high-performance converged analytics; cross-domain analysis; support for IUD (insert/update/delete) on ORC; support for virtual data marts; cross-DC dynamic filtering enhancement; general operator pushdown framework; DM optimization; enhanced reliability; resource isolation; On Yarn support; incremental index update.
Community development: user quantity 80,000+; downloads 130,000+; visits 350,000+; community members 1,000+; overseas members 100+; contributors 200+; evangelists 12.
Powered by openLooKeng: ISV 8, government 3, finance 2, ICT 2, internet 2, other 5+.
7. Section 2: Thoughts on the Development of Big Data
8. The Evolution of Data Infrastructure
• Data Warehouse: SQL; centralized; schema on write; difficult to scale up.
• Data Lake: SQL/MapReduce; centralized; lack of reuse; complicated software stack.
• Data Mesh: data & analytics as a function; federated governance; composable data & analytics; domain-driven ownership; self-service data infrastructure.
9. Data Infrastructure: From Contractor to Consultant, From Data to Insight
Problems:
1. New laws such as GDPR require tighter data security and privacy.
2. Data will be owned and hosted on disparate systems.
3. New data security, privacy, and distributed data access requirements severely challenge data access performance under these restricted conditions.
Trends:
1. Static data to live insight: Data & Analytics as a Function.
2. Data & Analytics Functions as the base unit for data security and privacy.
3. Cross-cloud data virtualization continues to be the core of simpler data management.
4. Latency-tolerant RPC protocol: a delay-tolerant RPC protocol for Data & Analytics Functions.
5. A scheduler with affinity for security, data locality, computing resources, and cost.
Diagram labels: Data & Analytics as a Function; Data Virtualization; Domain Driven Governance; Federated Governance; Composable Data & Analytics; Data Mesh.
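Trend 5 names the factors an affinity scheduler weighs but not how it combines them. The Python below is a minimal sketch under stated assumptions; it is not a design taken from the slides:
- security is a hard constraint, and so is spare capacity;
- data locality and cost are blended into a weighted score.

`Site`, `pick_site`, the weights, and the normalization are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    allowed_classes: frozenset  # security classes of data this site may process
    local_bytes: int            # bytes of the query's input already stored at the site
    free_cores: int
    cost_per_core_hour: float

def pick_site(sites, data_class, cores_needed, locality_weight=1.0, cost_weight=1.0):
    """Hypothetical affinity scheduler: security and capacity filter, locality and cost score."""
    candidates = [s for s in sites
                  if data_class in s.allowed_classes and s.free_cores >= cores_needed]
    if not candidates:
        return None  # no site may legally and physically run this task
    total_bytes = sum(s.local_bytes for s in candidates) or 1
    max_cost = max(s.cost_per_core_hour for s in candidates) or 1.0

    def score(s):
        locality = s.local_bytes / total_bytes   # share of the input already local
        cost = s.cost_per_core_hour / max_cost   # normalized to [0, 1]
        return locality_weight * locality - cost_weight * cost

    return max(candidates, key=score).name

sites = [
    Site("A", frozenset({"public"}), 900, 16, 1.0),
    Site("B", frozenset({"public", "pii"}), 100, 32, 2.0),
    Site("C", frozenset({"public", "pii"}), 0, 2, 0.5),
]
print(pick_site(sites, "public", 8))  # A
```

Treating security as a filter rather than a score term means no locality or cost advantage can move restricted data to a site that is not cleared for it.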
10. A Modular Data Mesh Architecture
• Natural evolution based on data lake infrastructure
• Incremental construction and smooth transition
• Reuse of existing data lake investments
• Non-invasive modification of big data applications
Diagram components: Data & Analytics as a Function; Exchange; D&A Mgmt; Domain Driven Governance; Cross DC Data Access; TICS (Trusted Intelligent Computing Service); Catalog; Analysis; Composable Data & Analytics; Storage; Data Mesh.
11. Overall Architecture: User Perspective
• A unified, global, virtual data infrastructure presented as local access points for upper-layer business applications.
• Provides traditional SQL and new D&A function interfaces.
• Data and analytics are managed based on organizational structure.
• A virtual data infrastructure is created based on the published data & analytics functions.
• Each organization is responsible for the quality of its data and analytics functions.
Diagram components: D&A Mgmt; Cross DC Data Access; TICS (Trusted Intelligent Computing Service); Catalog.
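The user-perspective model is "organizations publish D&A functions, and consumers see one virtual namespace." It can be sketched as a small registry. The Python below assumes a hypothetical in-memory catalog keyed by `<domain>.<function>`; `DAFunction` and `Catalog` are illustrative names, not an openLooKeng or TICS API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class DAFunction:
    """A published data & analytics function (hypothetical structure)."""
    name: str
    owner_domain: str   # organization accountable for the function's quality
    fn: Callable

@dataclass
class Catalog:
    """Virtual namespace assembled from the functions each domain publishes."""
    functions: Dict[str, DAFunction] = field(default_factory=dict)

    def publish(self, f: DAFunction) -> str:
        key = f"{f.owner_domain}.{f.name}"
        if key in self.functions:
            raise ValueError(f"{key} is already published")
        self.functions[key] = f
        return key

    def call(self, key: str, *args):
        # Consumers only see the function; the owning domain keeps its raw data.
        return self.functions[key].fn(*args)

    def owner(self, key: str) -> str:
        return self.functions[key].owner_domain

catalog = Catalog()
revenue = {"2021-09": 120}
key = catalog.publish(DAFunction("monthly_revenue", "sales", lambda month: revenue.get(month, 0)))
print(key, catalog.call(key, "2021-09"))  # sales.monthly_revenue 120
```

Prefixing each key with the owning domain keeps accountability visible at the point of use, in line with the slide's statement that each organization owns the quality of what it publishes.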
12. Section 3: openLooKeng Strategies
13. DCC enables data access across data centers, domains, and clouds. DC connectors are used to access data in remote data centers. Cross-DC collaborative queries do not need to rely on a data forwarding platform. Operator pushdown and cross-domain dynamic filtering deliver LAN-like performance in WAN deployments.
DCC related technologies:
➢ Distributed task pushdown
➢ Parallel data access
➢ Data compression and transmission
➢ Resumable data transfer
➢ Cross-DC dynamic filtering
The preceding figure shows the data flow of the DC connector:
(1) Obtain the splits. The split value for the current DC is set by configuration.
(2) DC1 schedules the splits.
(3) After receiving a split-processing request, a DC1 worker sends a POST request, which is actually an SQL request, to DC2.
(4) The DC1 worker keeps calling HTTP GET to fetch data from DC2 until all the data has been obtained.
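Steps (3) and (4) describe a "POST once, then GET until done" exchange. The Python below is a minimal in-process sketch of that loop; it also illustrates the compression and resumable-transfer items from the list. `RemoteDC`, its `post`/`get` methods, the integer page token, and the JSON plus zlib page encoding are all hypothetical stand-ins. They are not the DC connector's actual wire protocol.

```python
import json
import zlib

class RemoteDC:
    """In-process stand-in for the remote data center (DC2); the API is hypothetical."""

    def __init__(self, rows, page_size=2):
        self._rows = rows
        self._page_size = page_size
        self._results = {}

    def post(self, sql):
        # Step (3): the POST body is an SQL statement. This stub ignores the SQL
        # and treats the whole in-memory table as the query result.
        query_id = f"q{len(self._results)}"
        self._results[query_id] = list(self._rows)
        return query_id

    def get(self, query_id, token):
        # Step (4): each GET returns one zlib-compressed page and the next token,
        # or None once the result is exhausted.
        rows = self._results[query_id]
        page = rows[token:token + self._page_size]
        next_token = token + self._page_size
        if next_token >= len(rows):
            next_token = None
        return zlib.compress(json.dumps(page).encode()), next_token

def fetch_all(remote, sql, query_id=None, start_token=0):
    """DC1 worker side: POST once, then GET pages until no token remains.

    Passing a saved query_id and start_token resumes an interrupted transfer.
    """
    if query_id is None:
        query_id = remote.post(sql)
    rows, token = [], start_token
    while token is not None:
        blob, token = remote.get(query_id, token)
        rows.extend(json.loads(zlib.decompress(blob)))
    return rows

dc2 = RemoteDC([[1, "a"], [2, "b"], [3, "c"]])
print(fetch_all(dc2, "SELECT * FROM t"))  # [[1, 'a'], [2, 'b'], [3, 'c']]
```

Because the token identifies the next page rather than the connection, a worker that loses its connection can resume. It reissues GET with the last token it saw and does not rerun the query.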
14. DCC Key Technologies: Dynamic Filtering
Example query:
    SELECT s_order_id, s_item_name, d_date
    FROM hive.department.date_dim
    LEFT JOIN dc1.hive.department.store_sales
      ON d_date = s_date;
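For this query, dynamic filtering means remote `store_sales` rows that cannot match any `d_date` never need to cross the WAN. The Python below simulates that effect with in-memory lists. It assumes the filter is derived from the keys of the local `date_dim` table and applied to the remote `store_sales` scan. That direction is valid for this LEFT JOIN because unmatched `store_sales` rows never reach the output. The helper name is hypothetical and the sketch does not reproduce openLooKeng's planner.

```python
def left_join_with_dynamic_filter(date_dim, remote_store_sales):
    """Sketch of dynamic filtering for the query above (hypothetical helper).

    date_dim is the local, preserved side of the LEFT JOIN; remote_store_sales
    stands in for the scan of dc1.hive.department.store_sales.
    """
    # 1. Collect the join keys from the local side.
    d_dates = {d["d_date"] for d in date_dim}
    # 2. Push the key set to the remote scan: store_sales rows whose s_date matches
    #    no d_date can never appear in the LEFT JOIN result, so they are not shipped.
    shipped = [s for s in remote_store_sales if s["s_date"] in d_dates]
    # 3. Hash join locally on the (smaller) set of shipped rows.
    by_date = {}
    for s in shipped:
        by_date.setdefault(s["s_date"], []).append(s)
    result = []
    for d in date_dim:
        matches = by_date.get(d["d_date"], [])
        for s in matches:
            result.append((s["s_order_id"], s["s_item_name"], d["d_date"]))
        if not matches:
            # LEFT JOIN keeps unmatched date_dim rows, with NULLs for store_sales columns.
            result.append((None, None, d["d_date"]))
    return result, len(shipped)

date_dim = [{"d_date": "2021-09-01"}, {"d_date": "2021-09-02"}]
store_sales = [
    {"s_order_id": 1, "s_item_name": "pen", "s_date": "2021-09-01"},
    {"s_order_id": 2, "s_item_name": "ink", "s_date": "2021-08-31"},
]
rows, shipped = left_join_with_dynamic_filter(date_dim, store_sales)
print(rows)     # [(1, 'pen', '2021-09-01'), (None, None, '2021-09-02')]
print(shipped)  # 1
```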
15. Dynamic compilation (JIT) is widely used and has a remarkable effect on SQL applications.
- General programming: Clang-JIT (C++ template programming); EasyJIT (parameter specialization); atJIT (auto-tuning); BOLT, Facebook (binary optimization).
- Internet: NativeJIT, Microsoft (Bing search / PageRank); HHVM JIT, Facebook and Baidu (web).
- Data analysis: Spark (whole-stage code generation); Impala (JIT-compiling function parameters); Presto (Java-based code generation); Flink (Java-based code generation).
- Big data analysis is classified into interactive processing, batch processing, and stream processing.
- SQL is widely used in all three scenarios (60%+):
  - SparkSQL, FlinkSQL
  - HiveQL
  - Presto / Impala / ClickHouse (SQL:2003)
- Filtering, grouping, aggregation, and join are common operators for data analysis.
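The data-analysis entries above (whole-stage code generation, Java-based code generation) rest on one idea. Instead of walking an expression tree for every row, the engine generates code specialized to the query once and then runs that code per row. The Python below is a minimal analogue of that idea for a filter such as `WHERE qty > 10 AND region = 'EU'`. It compiles to Python bytecode rather than machine code, and the tuple-based expression format and function names are hypothetical.

```python
import operator

# Comparison operators supported by this toy expression language.
OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

def interpret(expr, row):
    """Tree-walking evaluation: dispatches on node type again for every row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "lit":
        return expr[1]
    if kind == "and":
        return interpret(expr[1], row) and interpret(expr[2], row)
    return OPS[kind](interpret(expr[1], row), interpret(expr[2], row))

def to_source(expr):
    """Translate the expression tree into a Python source-code expression."""
    kind = expr[0]
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind == "lit":
        return repr(expr[1])
    if kind == "and":
        return f"({to_source(expr[1])} and {to_source(expr[2])})"
    op = "==" if kind == "=" else kind
    return f"({to_source(expr[1])} {op} {to_source(expr[2])})"

def compile_filter(expr):
    """Generate a predicate specialized to this expression and compile it once."""
    src = f"def pred(row):\n    return {to_source(expr)}\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["pred"]

# WHERE qty > 10 AND region = 'EU'
expr = ("and", (">", ("col", "qty"), ("lit", 10)), ("=", ("col", "region"), ("lit", "EU")))
rows = [{"qty": 5, "region": "EU"}, {"qty": 20, "region": "EU"}, {"qty": 30, "region": "US"}]
pred = compile_filter(expr)
print([r for r in rows if pred(r)])  # [{'qty': 20, 'region': 'EU'}]
```

Both paths return the same rows. The generated predicate avoids the per-row dispatch on node types, which is the overhead that query compilation targets.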
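16. Diagram: engines (openLooKeng, OLK Batch, Spark SQL, Hive, Flink, Impala); the Java-based engines each carry their own local optimizations, while Impala is C/C++. They run on a Common Big Data Runtime (OmniRuntime): Performant | Secure. Compute: CPU (Kunpeng, x86) and XPU (Ascend, GPGPU, FPGA, DPU).
- An efficient runtime for big data analytics that builds an ecosystem of big data analytics operators.
- Collaborate with the compiler and the cloud and computing product lines to achieve end-to-end vertical integration of efficiency, hardware, and services.
- Build on OmniRuntime to encourage research institutions to take part in building the ecosystem.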
17. OmniRuntime: the Foundation for High-Performance Data Analytics
Diagram: engines (openLooKeng, Spark SQL, Hive, Flink, Impala) on top of the Common Big Data Runtime (OmniRuntime): Performant | Secure. Compute: CPU (Kunpeng, x86) and XPU (Ascend, GPGPU, FPGA, DPU).
- Dynamic optimization and performance tuning tools that ease the development of data analytics operators.
- A native, fast, columnar in-memory format with memory lifecycle management.
- Support for heterogeneous computing resources.
18. OmniFlex: Data-Driven | Non-Intrusive Dynamic Optimization
Diagram: original application (original caller → original callee) → OmniFlex PreProcessor (annotation processor, compilation pipeline) → jittable application (original caller → callee via JitProxy, with the omnidds.so shared library); at run time, omnidds.so uses the annotation and the DataStatProvider.
• Annotation: select the target function to annotate. Both built-in and out-of-place annotations are supported, preserving code readability and maintainability.
• The OmniFlex PreProcessor is added to the original build process. It injects the dynamic optimization library into the marked functions, and the compilation produces an executable that can be dynamically optimized on its own.
• At run time, the application invokes OmniDDS, which:
  • determines whether to perform dynamic optimization based on the annotation;
  • calls the DataStatProvider to obtain data statistics, determines the data type from them, reshapes the type, and optimizes the pipeline;
  • generates optimized execution code.
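The run-time flow above is annotate, intercept through a proxy, collect data statistics, reshape the type, and dispatch. The Python below is a minimal analogue of that flow:
- a decorator stands in for the annotation and the JitProxy;
- `data_stats` plays the DataStatProvider role;
- the "type reshaping" picks the narrowest integer array type that fits the observed value range.

All names are hypothetical, and no native code is generated. The sketch only caches one variant per chosen type to show where an optimizer would plug in.

```python
import array
from functools import wraps

def data_stats(values):
    """Hypothetical DataStatProvider: cheap statistics about the input column."""
    if not values:
        return {"min": 0, "max": 0}
    return {"min": min(values), "max": max(values)}

def narrowest_typecode(stats):
    """'Reshape' the type: smallest signed integer array typecode that fits the range."""
    for code in ("b", "h", "i", "q"):
        bits = array.array(code).itemsize * 8
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        if lo <= stats["min"] and stats["max"] <= hi:
            return code
    raise OverflowError("values do not fit in a 64-bit signed integer")

def jittable(func):
    """Hypothetical annotation: wraps func in a proxy that specializes per data shape."""
    variants = {}

    @wraps(func)
    def proxy(values):
        code = narrowest_typecode(data_stats(values))
        # A real system would generate and cache optimized code per variant;
        # this sketch only caches the original function under the chosen typecode.
        impl = variants.setdefault(code, func)
        return impl(array.array(code, values))

    proxy.variants = variants
    return proxy

@jittable
def column_sum(col):
    return sum(col)

print(column_sum([1, 2, 3]))  # 6
```

The annotated function body is left untouched. This mirrors the "non-intrusive" goal on the slide, where optimization is injected around existing code rather than into it.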
19. OmniRuntime: Performance Evaluation
Charts (openLooKeng + OmniJIT vs openLooKeng):
- Operator level: Filter and Project, Hash Aggregation, Order By
- End-to-end: TPC-H Q1 and TPC-DS Q7*, at concurrency 1, 4, and 10
Data labels shown: 38.70%, 41.46%, 29%, 31.70%, 31.61%, 25.15%
* Project-related content is not included.
openLooKeng integrated with OmniRuntime.
20. OmniRuntime: Analytics Operator Ecosystem
Engines: openLooKeng, Spark SQL, Hive, and Flink (Java) and Impala (C/C++), on top of OmniRuntime, the Common Big Data Runtime, with Java | Scala | C/C++ | Python interfaces.
• OmniCache: manages the cache of OmniVec with an improved hit ratio; a C/C++ relational cache supporting cross-process data sharing.
• OmniEx
• OmniVec: an in-memory data structure holding column data; a column-based memory format supporting zero-copy, vectorization, and the operator ecosystem.
• OmniOp / OmniFlex: data-driven optimization and OmniOp management; tools and a programming framework; input parameters of transparent data are fixed.
Compute: CPU (Kunpeng, x86) and XPU (Ascend, GPGPU, FPGA, DPU).
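The OmniVec description combines a column-based memory format with zero-copy access. The Python below is a toy illustration of those two properties. Values live in one contiguous typed buffer with a separate validity list for NULLs, and slicing returns a `memoryview` that shares that buffer instead of copying it. `ColumnVector` is a hypothetical class, not the OmniVec API, and vectorized execution is not modeled.

```python
import array

class ColumnVector:
    """Toy columnar vector (hypothetical, not the OmniVec API): one contiguous typed
    buffer for the values plus a separate validity list marking NULLs."""

    def __init__(self, typecode, values):
        # NULL slots are stored as 0 in the data buffer and flagged in _valid.
        self._data = array.array(typecode, [0 if v is None else v for v in values])
        self._valid = [v is not None for v in values]

    def slice(self, start, stop):
        """Zero-copy view: a memoryview onto the same buffer; no values are copied."""
        return memoryview(self._data)[start:stop]

    def sum_non_null(self):
        return sum(v for v, ok in zip(self._data, self._valid) if ok)

vec = ColumnVector("q", [1, None, 3, 4])
view = vec.slice(2, 4)
view[0] = 30                # writes through to the vector's own buffer
print(vec.sum_non_null())   # 35
```

The write through `view` changing the vector's sum shows that the slice shares memory with the vector rather than holding a copy.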
21. openLooKeng: making big data easier
openLooKeng WeChat official account | openLooKeng WeChat assistant | Official website: https://openlookeng.io/