Databricks + Snowflake: Catalyzing Data and AI Initiatives
1. WIFI SSID: SparkAISummit | Password: UnifiedAnalytics
2. Databricks + Snowflake: Catalyzing Data and AI. Garren Staubli, Solutions Architect, garren@databricks.com | @gstaubli. #UnifiedAnalytics #SparkAISummit | Slides & Resources: garrens.com/DataSnowCat
3. Agenda: Introductions, Scenario, Challenges, Solutions, Demo
4. Introductions - Me. Timeline 2011 to 2019: MySQL; AWS; Python; Hadoop; Scala, Python & Java; Ruby; Pig & Hive; Linux; NoSQL; Apache Spark & ML
5. Introductions - Databricks: Accelerate innovation by unifying data science and engineering. Diagram labels: Databricks Workspace; Collaborative Notebooks; Production Jobs; Data & Runtime; Databricks ML; Databricks Delta; Lifecycle; ML Frameworks; Transactions; Indexing; Data Engineering; Data Science; Cloud
6. Introductions - Snowflake
7. Forget Oil. Data is worth more than Gold
8. Scenario
9. Scenario - Annotated: Data Mining, Data Science, ML Engineering, Production Delivery*, DevOps, QA (* not Digiorno)
10. Scenario - Reality
11. Challenges. Sources: LAKES, STREAMS, WAREHOUSES, NOSQL. Data-Driven Production: APIs, Apps, BI
12. Challenges - Reality. Partitions: 20; Rows per second: 10,000; Format: JSON. Stages: Extract, Transform, Load; ML; Analysis; Insights
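To give a feel for the load on this slide, here is a rough back-of-the-envelope sketch. It assumes that 10,000 rows per second is the aggregate rate across all 20 partitions and that rows are spread evenly; the slide states neither, and the variable names are illustrative only.

```python
# Workload figures taken from the slide.
ROWS_PER_SECOND = 10_000   # assumed to be the total across all partitions
PARTITIONS = 20

# Assumption: rows are distributed evenly across partitions.
rows_per_partition_per_second = ROWS_PER_SECOND / PARTITIONS

# Sustained for a full day (60 s * 60 min * 24 h).
SECONDS_PER_DAY = 60 * 60 * 24
rows_per_day = ROWS_PER_SECOND * SECONDS_PER_DAY

print(f"rows per partition per second: {rows_per_partition_per_second:.0f}")
print(f"rows per day: {rows_per_day:,}")
```

Under those assumptions each partition carries about 500 rows per second, and the pipeline has to absorb 864,000,000 JSON rows per day.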
13. Challenges & Solutions - ETL. Partitions: 20; Rows per second: 10,000; Format: JSON. Sources: Flat, RDBMS, Streams, etc.; Syntax: Unified batch & stream APIs; Scale: Autoscaling with usage; Languages: Python, Scala, SQL, R & Java; Performance: JVM with optimization; Expressiveness: Multilevel APIs (SQL + RDDs)
14. Challenges & Solutions - ETL. Partitions: 20; Rows per second: 10,000; Format: JSON. Malformed Records: Ignore/infer + log records; Errors: Handle + retry with checkpoint; Changing Fields: Schema Evolution; Writes - Performance: Partitioned + optimized files; Writes - Semantics: Exactly once; Writes - Reliability: ACID transactions
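The "Malformed Records: ignore + log records" item above can be illustrated without Spark. What follows is a minimal sketch in plain Python, assuming newline-delimited JSON input. The function name `parse_records`, the logger name, and the sample rows are hypothetical and not from the talk, and this is not how Spark or Delta implement the behavior.

```python
import json
import logging

logger = logging.getLogger("etl_sketch")  # hypothetical logger name


def parse_records(lines):
    """Parse JSON lines, setting malformed ones aside instead of failing the batch."""
    good, bad = [], []
    for line_number, line in enumerate(lines, start=1):
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            # Log and keep going: one bad record should not stop the pipeline.
            logger.warning("skipping malformed record at line %d: %s", line_number, exc)
            bad.append(line)
            continue
        if not isinstance(row, dict):
            # Valid JSON, but not an object, so it cannot be treated as a row.
            logger.warning("skipping non-object record at line %d", line_number)
            bad.append(line)
            continue
        good.append(row)
    return good, bad


# Hypothetical sample: the second line is truncated JSON, and the third
# carries an extra field, as happens when upstream fields change.
sample = [
    '{"id": 1, "event": "click"}',
    '{"id": 2, "event": ',
    '{"id": 3, "event": "view", "country": "US"}',
]

good, bad = parse_records(sample)
print(len(good), "parsed;", len(bad), "malformed")
```

Keeping the rejected lines, rather than silently dropping them, leaves room to inspect or replay them later.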
15. Challenges & Solutions - ML. Partitions: 20; Rows per second: 10,000; Format: JSON. Data Access: Apache Spark + Delta; Syntax: Koalas; Collaboration: Databricks Notebooks; Models: Iteration, Tracking, Reproducibility, Projects, Deployment, Models
16. Challenges & Solutions - Analysis. Partitions: 20; Rows per second: 10,000; Format: JSON. Time to value: Interactive queries; Intermittent demand: Instant scaling; Language: SQL; Common Tooling: Tableau, PowerBI, etc.; Ease of Use: Optimized DWaaS; Cost control: Decoupled storage + compute
17. Final Solution Architecture. Partitions: 20; Rows per second: 10,000; Format: JSON. Diagram labels: Machine Learning, BI Reporting, Dashboards
18. Demo
19. Review: Introductions, Scenario, Challenges, Solutions, Demo
20. Solution. Diagram labels: Sources (LAKES, STREAMS, WAREHOUSES, NOSQL); Persistence (LAKES, DELTA, WAREHOUSES, NOSQL); Processing; Integration; ML, BI; APIs, Apps, BI
21. DON'T FORGET TO RATE AND REVIEW THE SESSIONS. SEARCH SPARK + AI SUMMIT