Databricks + Snowflake: Catalyzing Data and AI Initiatives
1 .WIFI SSID:SparkAISummit | Password: UnifiedAnalytics
2 .Databricks + Snowflake: Catalyzing Data and AI | Garren Staubli, Solutions Architect | garren@databricks.com | @gstaubli | #UnifiedAnalytics #SparkAISummit | Slides & Resources: garrens.com/DataSnowCat
3 .Agenda: Introductions, Scenario, Challenges, Solutions, Demo
4 .Introductions - Me (timeline, 2011 to 2019): MySQL; AWS; Python; Hadoop; Scala, Python & Java; Ruby; Pig & Hive; Linux; NoSQL; Apache Spark & ML
5 .Introductions - Databricks: Accelerate innovation by unifying data science and engineering. Diagram labels: Databricks Workspace (Collaborative Notebooks, Production Jobs); Databricks Runtime; Databricks Delta (Transactions, Indexing); ML Frameworks; Data & ML Lifecycle; Data Engineering; Data Science; Cloud
6 .Introductions - Snowflake
7 .Forget Oil. Data is worth more than Gold
8 .Scenario
9 .Scenario - Annotated Data Mining Data Science ML Engineering Production Delivery* DevOps QA * not Digiorno
10 .Scenario - Reality
11 .Challenges. Sources: LAKES, STREAMS, WAREHOUSES, NOSQL. Data-Driven Production: APIs, Apps, BI
12 .Challenges - Reality (Partitions: 20; Rows per second: 10,000; Format: JSON). Diagram labels: Extract, Transform, Load, ML, Analysis, Insights
13 .Challenges & Solutions - ETL (Partitions: 20; Rows per second: 10,000; Format: JSON)
    Sources: Flat, RDBMS, Streams, etc
    Syntax: Unified batch & stream APIs
    Scale: Autoscaling with usage
    Languages: Python, Scala, SQL, R & Java
    Performance: JVM w/ optimization
    Expressiveness: Multilevel APIs (SQL + RDDs)
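The "Unified batch & stream APIs" entry on the ETL slide can be illustrated with a plain-Python analogy. This is not Spark code, and the event fields `user` and `amount` as well as the function name `clean_events` are hypothetical, chosen only for illustration. The point is that one transformation, written once, can consume either a finite batch or records that arrive one at a time. In Spark the corresponding pair is `spark.read` and `spark.readStream`, which both yield DataFrames that accept the same transformations.

```python
import json
from typing import Dict, Iterable, Iterator


def clean_events(raw_lines: Iterable[str]) -> Iterator[Dict]:
    """Parse JSON lines and keep only events with a positive amount.

    The function only iterates over its input, so it does not care
    whether the input is a finite list (batch) or a generator that
    produces lines over time (stream).
    """
    for line in raw_lines:
        event = json.loads(line)
        if event.get("amount", 0) > 0:
            yield {"user": event["user"], "amount": event["amount"]}


# Batch: all lines are available up front.
batch = ['{"user": "a", "amount": 5}', '{"user": "b", "amount": 0}']
batch_result = list(clean_events(batch))


# Stream (simulated): lines are produced one at a time by a generator.
def line_source() -> Iterator[str]:
    yield '{"user": "c", "amount": 7}'
    yield '{"user": "d", "amount": -1}'


stream_result = list(clean_events(line_source()))
```

Because the transformation is defined once, a fix to the filtering rule applies to both paths, which is the practical benefit the slide attributes to a unified API.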
14 .Challenges & Solutions - ETL (Partitions: 20; Rows per second: 10,000; Format: JSON)
    Malformed Records: Ignore/infer + log records
    Errors: Handle + retry w/ checkpoint
    Changing Fields: Schema Evolution
    Writes - Performance: Partitioned + optimized files
    Writes - Semantics: Exactly once
    Writes - Reliability: ACID transactions
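The "Malformed Records: Ignore/infer + log records" entry can be sketched in plain Python as follows. This is a minimal illustration of the idea, not the approach shown in the talk: the function name `parse_json_lines` and the logger name `etl` are hypothetical. In Spark itself, the JSON reader exposes a `mode` option (`PERMISSIVE`, `DROPMALFORMED`, `FAILFAST`) that plays a comparable role.

```python
import json
import logging
from typing import Dict, Iterable, List, Tuple

logger = logging.getLogger("etl")  # hypothetical logger name


def parse_json_lines(lines: Iterable[str]) -> Tuple[List[Dict], List[str]]:
    """Split raw lines into parsed records and rejected raw lines.

    A line is rejected if it is not valid JSON or if it does not
    decode to a JSON object. Rejected lines are logged and returned
    so they can be inspected or replayed later instead of being lost.
    """
    good: List[Dict] = []
    bad: List[str] = []
    for line_number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            logger.warning("line %d is not valid JSON: %s", line_number, exc)
            bad.append(line)
            continue
        if not isinstance(record, dict):
            logger.warning("line %d is not a JSON object", line_number)
            bad.append(line)
            continue
        good.append(record)
    return good, bad


good, bad = parse_json_lines(['{"id": 1}', '{"id": ', '[1, 2]'])
```

Returning the rejected lines alongside the parsed ones keeps the pipeline running on bad input while preserving enough information to diagnose the source of the malformed records.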
15 .Challenges & Solutions - ML (Partitions: 20; Rows per second: 10,000; Format: JSON)
    Data Access: Apache Spark + Delta
    Syntax: Koalas
    Collaboration: Databricks Notebooks
    Models - Iteration: Tracking
    Models - Reproducibility: Projects
    Models - Deployment: Models
16 .Challenges & Solutions - Analysis (Partitions: 20; Rows per second: 10,000; Format: JSON)
    Time to value: Interactive queries
    Intermittent demand: Instant Scaling
    Language: SQL
    Common Tooling: Tableau, PowerBI, etc
    Ease of Use: Optimized DWaaS
    Cost control: Decoupled storage + compute
17 .Final Solution Architecture (Partitions: 20; Rows per second: 10,000; Format: JSON). Diagram labels: Machine Learning, BI, Reporting, Dashboards
18 .Demo
19 .Review: Introductions, Scenario, Challenges, Solutions, Demo
20 .Solution. Diagram labels: Sources, Persistence, Processing, Integration; LAKES, STREAMS, DELTA, WAREHOUSES, NOSQL; ML, BI; APIs, Apps, BI