Apache Spark Data Governance Best Practices—Lessons Learned from Centers for Med
1 .WIFI SSID:SparkAISummit | Password: UnifiedAnalytics
2 .Apache Spark Data Governance Best Practices—Lessons Learned from Centers for Medicare and Medicaid Services Donghwa John Kim, NewWave #UnifiedAnalytics #SparkAISummit
3 .Customers
4 .About NewWave
• Mid-Size Business
• 300+ Employees
• 11 Prime Contracts
• Support 7 CMS Centers
• CMMI Level 4 for Services & Development
• ISO 9001:2015
• Prime Contract Vehicles: CMS SPARC – 8(a) & Small; GSA 8(A) STARS II; GSA Schedule 70 & Health IT SIN
• Databricks Gold Level Partner
• Microsoft Gold Cloud Platform
• AWS Advanced Consulting Partner
5 .Technology Vendor Partners
6 .Centers for Medicare & Medicaid Services (CMS)
CMS is the largest healthcare payer in the country, with a budget of $793.7B. NewWave is its trusted partner and leading innovator.
7 .A unique customer that sets the standard for industry & defines the market in healthcare
8 .Data Challenge
• 2 billion data points* annually to store, analyze and disseminate
• Privacy requirements (PHI, PII) without compromising agility
• Central view of available data on multiple systems
* Just on Medicare data
9 .The Objectives
The vision is to provide a simple and reliable technology and data experience for all of CMS IT Portfolio stakeholders.
• Center-wide shared data services
• Robust data governance
• Single cloud-native architecture
10 .The Definition of Genius Is Taking the Complex and Making it Simple – Albert Einstein
11 .Solution from a Bird’s Eye View
12 .Data as a Service
• Data Agility
• Improved Data Quality
• Cost Effectiveness
13 .Agility - Dremio Virtual Datasets
• Built on top of the immutable physical datasets found in sources
• A layered stack of data transformations that have been performed on top of one or more physical datasets
• Each virtual dataset is ultimately described by a SQL query
• Chaining of datasets is possible
• Data Lineage - a history of all the applied transformations is available
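The layering described on this slide can be sketched with ordinary SQL views. This is only an analogy, not Dremio itself: SQLite stands in for the engine, views stand in for virtual datasets, and the table, view names, and rows are made up for illustration.

```python
import sqlite3

# SQLite is a stand-in for Dremio here; views play the role of virtual datasets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Physical dataset: the layers above never modify it.
    CREATE TABLE claims (claim_id INTEGER, state TEXT, amount REAL);
    INSERT INTO claims VALUES (1, 'MD', 120.0), (2, 'MD', 80.0), (3, 'VA', 50.0);

    -- Virtual dataset 1: a transformation described purely by a SQL query.
    CREATE VIEW md_claims AS
        SELECT claim_id, amount FROM claims WHERE state = 'MD';

    -- Virtual dataset 2: chained on top of virtual dataset 1.
    CREATE VIEW md_claims_summary AS
        SELECT COUNT(*) AS n_claims, SUM(amount) AS total_amount FROM md_claims;
""")

# Querying the top layer resolves the whole chain down to the physical table.
summary = conn.execute(
    "SELECT n_claims, total_amount FROM md_claims_summary"
).fetchone()

# Lineage in miniature: the stored SQL text of every layer.
lineage = dict(
    conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'view'")
)
```

Because each layer is just stored SQL, changing a transformation means redefining a view rather than copying data, which is the agility argument the slide makes.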
14 .Agility - Dremio Virtual Dataset Example
15 .Simplicity - SQL for [almost] EVERYTHING
• Ability to join data from multiple data sources including JSON, CSV, Parquet, relational database and NoSQL
• Unified interface for the data
And suddenly ... SQL is sexy again!
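As a rough illustration of one SQL interface over differently formatted sources, the sketch below joins a CSV source with a JSON source. It is an assumption-laden stand-in: Dremio queries sources in place, whereas this example loads both into an in-memory SQLite database first, and all names and values are invented.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources in different formats.
providers_csv = "provider_id,name\n1,Clinic A\n2,Clinic B\n"
visits_json = (
    '[{"provider_id": 1, "visits": 10},'
    ' {"provider_id": 2, "visits": 4},'
    ' {"provider_id": 1, "visits": 5}]'
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE providers (provider_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE visits (provider_id INTEGER, visits INTEGER)")

# Each source is parsed with its own reader, then exposed as a table.
conn.executemany(
    "INSERT INTO providers VALUES (:provider_id, :name)",
    csv.DictReader(io.StringIO(providers_csv)),
)
conn.executemany(
    "INSERT INTO visits VALUES (:provider_id, :visits)",
    json.loads(visits_json),
)

# From here on, the original format no longer matters: it is all SQL.
rows = conn.execute("""
    SELECT p.name, SUM(v.visits) AS total_visits
    FROM providers AS p
    JOIN visits AS v ON v.provider_id = p.provider_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```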
16 .Simplicity - SQL for [almost] EVERYTHING
17 .Privacy - Row Level Masking
Use query_user() and is_member() for selective filtering of rows for different users or groups without having to create multiple datasets.
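The idea of one dataset whose visible rows depend on the caller can be sketched as follows. SQLite has no query_user() or is_member(), so the example registers Python stand-ins under those names; the users, groups, table, and filter rule are hypothetical and are not taken from the talk.

```python
import sqlite3

# Hypothetical group memberships, for illustration only.
GROUPS = {"alice": {"md_analysts"}, "bob": {"va_analysts"}}

# A single query definition shared by every user.
SECURE_QUERY = """
    SELECT claim_id FROM claims
    WHERE (state = 'MD' AND is_member('md_analysts'))
       OR (state = 'VA' AND is_member('va_analysts'))
       OR owner = query_user()
    ORDER BY claim_id
"""

def rows_visible_to(user):
    """Run the shared query as `user` and return the claim ids they may see."""
    conn = sqlite3.connect(":memory:")
    # Stand-ins for the Dremio functions named on the slide.
    conn.create_function("query_user", 0, lambda: user)
    conn.create_function(
        "is_member", 1, lambda group: int(group in GROUPS.get(user, set()))
    )
    conn.executescript("""
        CREATE TABLE claims (claim_id INTEGER, state TEXT, owner TEXT);
        INSERT INTO claims VALUES
            (1, 'MD', 'carol'), (2, 'VA', 'carol'), (3, 'VA', 'alice');
    """)
    return [row[0] for row in conn.execute(SECURE_QUERY)]
```

For example, `rows_visible_to("alice")` and `rows_visible_to("bob")` run the same SQL text but return different rows, so no per-group copy of the dataset is needed.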
18 .Privacy - Column Level Masking
19 .Privacy - Column Level Masking - VDS
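The slide images did not survive extraction, so the following is a guess at the general pattern rather than the speaker's actual query: a CASE expression returns the real column value to members of a privileged group and a masked value to everyone else. As before, is_member() is a Python stand-in registered in SQLite, and the group name, table, and fake identifier are hypothetical.

```python
import sqlite3

# Hypothetical group memberships, for illustration only.
GROUPS = {"alice": {"phi_readers"}}

# One query definition; the masking decision is made per caller at query time.
MASKED_QUERY = """
    SELECT name,
           CASE WHEN is_member('phi_readers') THEN ssn
                ELSE 'XXX-XX-' || substr(ssn, -4)
           END AS ssn
    FROM beneficiaries
    ORDER BY name
"""

def read_beneficiaries(user):
    """Return (name, ssn) rows, with ssn masked unless `user` is a PHI reader."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for the Dremio is_member() function.
    conn.create_function(
        "is_member", 1, lambda group: int(group in GROUPS.get(user, set()))
    )
    conn.executescript("""
        CREATE TABLE beneficiaries (name TEXT, ssn TEXT);
        INSERT INTO beneficiaries VALUES ('Jane Doe', '123-45-6789');
    """)
    return conn.execute(MASKED_QUERY).fetchall()
```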
20 .Centralized View - Data Catalog
• Ability to search for the data
• Collaboration experience using Wiki and content tagging
21 .Data Lineage
22 .Looker’s LookML = “SQL Evolved”
LookML is a language for describing dimensions, aggregates, calculations and data relationships in a SQL database.
23 .Looker’s Explorer
24 .LookML => SQL
25 .Data Modeling with Looker
SQL models generated by Looker from LookML can be exported into Dremio to create virtual datasets.
26 .Accessing Dremio from Databricks
• Add the Dremio JDBC driver JAR to Databricks
27 .Accessing Dremio from Databricks
Use it!*
Callouts on the code screenshot: Driver, Virtual Dataset, Parallelism level
* https://docs.databricks.com/user-guide/secrets/example-secret-workflow.html
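The code on this slide was an image and is not recoverable, so here is a minimal sketch of what a JDBC read covering the three callouts (driver, virtual dataset, parallelism level) might look like. The helper name, host, dataset path, and secret scope are hypothetical; the driver class and URL form shown are the ones commonly used with Dremio's JDBC driver and should be checked against the driver version actually installed.

```python
def dremio_jdbc_options(host, vds, user, password,
                        partition_column, lower, upper, num_partitions,
                        port=31010):
    """Build Spark JDBC reader options for a Dremio virtual dataset (hypothetical helper)."""
    return {
        # Driver: assumed class name and URL form for the Dremio JDBC driver.
        "url": f"jdbc:dremio:direct={host}:{port}",
        "driver": "com.dremio.jdbc.Driver",
        # Virtual dataset: Spark treats it like any other JDBC table.
        "dbtable": vds,
        "user": user,
        "password": password,
        # Parallelism level: Spark splits the read into numPartitions range
        # queries over partitionColumn between lowerBound and upperBound.
        "partitionColumn": partition_column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }

# In a Databricks notebook (where `spark` and `dbutils` are predefined),
# credentials would come from a secret scope instead of being hard-coded:
#
#   opts = dremio_jdbc_options(
#       host="dremio.example.com",                 # hypothetical host
#       vds='"cms"."claims_secure"',               # hypothetical dataset path
#       user=dbutils.secrets.get(scope="dremio", key="user"),
#       password=dbutils.secrets.get(scope="dremio", key="password"),
#       partition_column="claim_id", lower=1, upper=1_000_000,
#       num_partitions=8,
#   )
#   df = spark.read.format("jdbc").options(**opts).load()
```

The partitioning options only help when the partition column is numeric, date, or timestamp typed and reasonably evenly distributed; otherwise a single-partition read is the safer default.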
28 .Demo
29 .Demo