Databricks, MLeap, and Kubernetes
1 .How We Used Databricks, MLeap, and Kubernetes to Productionize Spark ML Faster. Edward Kent, Spark Summit Europe 2018, #SAISEnt9
2 .Introduction
3 .Introduction: from the transition to digital to a fully digital business. Timeline with the years 1977, 1996, 2007, 2010 and 2013. Milestones shown: launch of the Thames Valley Trader classified magazine (1977); start of online operations with websites for retailers; first website, launch of autotrader.co.uk; digital strategy accelerates; launch of apps; digital transition strategy implemented; digital revenue = print revenue; 100% digital business in UK and Ireland.
4 .• 12th biggest UK website • 450,000 cars listed per day (average) • 2x more influential website for new car buyers than nearest competitor • 4x more auto searches than Google • 80% of UK auto retailers advertise on Auto Trader • 55m monthly cross-platform visits • 9m monthly unique users • 10x more minutes on site than all OEMs combined. Sources cited: Page views, ComScore CY2017; ComScore, Nov 17; Annual Car Buying Report 2016.
5 .Introduction: Services for consumers • New and used car search listings • Valuations • Searching by monthly budget • Price indicator • Dealer reviews and ratings • Private sales • Vehicle check • Car reviews
6 .Introduction: Services for retailers • Classified advertising • Valuations • Finance solutions • Creating a trusted marketplace • Forecourt management tools • Retailer education and insight
7 .Data at Auto Trader
8 .Auto Trader Data Platform (architecture diagram). Components shown: databases, Sqoop, Kafka, Kafka Connect, S3, EMR, notebooks, analytics consumers, external, production workloads, scheduler (Airflow).
9 .Days to Sell
11 .Discovery to production. Phases: discovery phase; production phase (automated model training, automated model deployment). Characteristics listed across the phases: • Slow feedback cycle • Quick iterations • Fully automated • Fast feedback • Unit tests • Offline predictions • Data integrity checks • Deployment • Resilient • Online, low-latency predictions
12 .Discovery phase
13 .The Model: Spark ML Pipeline. A Pipeline chains Transformers (T) and Estimators (E). Calling fit() on the training data produces a PipelineModel; calling transform() on the test data produces predictions.
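The fit()/transform() contract on this slide can be sketched in plain Python. This is only an illustration of the idea, not Spark's API: the class names mirror Spark ML's concepts, but `AddRatio`, `MeanEstimator`, `MeanModel` and the column names are hypothetical, and rows are plain dictionaries rather than DataFrames.

```python
from typing import Dict, List

Row = Dict[str, float]


class Transformer:
    """Maps rows to rows; holds no unfitted state."""

    def transform(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Estimator:
    """Learns from rows and returns a fitted Transformer."""

    def fit(self, rows: List[Row]) -> Transformer:
        raise NotImplementedError


class AddRatio(Transformer):
    """Hypothetical feature step: price relative to valuation."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "price_ratio": r["price"] / r["valuation"]} for r in rows]


class MeanModel(Transformer):
    """Fitted model that predicts the mean label seen during training."""

    def __init__(self, mean: float) -> None:
        self.mean = mean

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "prediction": self.mean} for r in rows]


class MeanEstimator(Estimator):
    """Hypothetical stand-in for a real learning algorithm."""

    def fit(self, rows: List[Row]) -> Transformer:
        return MeanModel(sum(r["label"] for r in rows) / len(rows))


class PipelineModel(Transformer):
    """A fitted pipeline: every stage is now a Transformer."""

    def __init__(self, stages: List[Transformer]) -> None:
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Fits each Estimator stage in order, feeding it the output of earlier stages."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, rows: List[Row]) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)
            fitted.append(stage)
            rows = stage.transform(rows)
        return PipelineModel(fitted)


train = [
    {"price": 9000.0, "valuation": 10000.0, "label": 20.0},
    {"price": 11000.0, "valuation": 10000.0, "label": 40.0},
]
model = Pipeline([AddRatio(), MeanEstimator()]).fit(train)
predictions = model.transform([{"price": 5000.0, "valuation": 5000.0}])
```

The point of the sketch is that a Pipeline is itself an Estimator and a PipelineModel is itself a Transformer, so the whole chain can be fitted once and then reused for predictions.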
14 .The Model: Days to Sell ML Pipeline. Training data (historic advert data, vehicle taxonomy, valuations, geographic data) is passed to fit() on the Pipeline, producing a PipelineModel. Inputs (make/model/…, age, valuation, location) are passed to transform(), producing a days to sell prediction.
15 .Production phase: Automated model training
16 .Automated model training: Converting to a Spark job. The notebook becomes a self-contained JAR run by the scheduler (Airflow). • Automated testing • Integrated release automation • Airflow integration • Extra stage of code review
17 .DAG Scheduling. Inputs: historic advert data, …, taxonomy data. Spark jobs in order: training data (write to S3); integrity checks + tests; train ML pipeline (write to S3).
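A data integrity check of the kind named in this DAG might look like the following minimal sketch. The slides do not say which checks are run, so the field names in `REQUIRED_FIELDS`, the `min_rows` threshold and the rules themselves are assumptions chosen for illustration, and rows are plain dictionaries rather than Spark DataFrames.

```python
from typing import Dict, List, Optional

# Hypothetical column names; the real training schema is not given in the slides.
REQUIRED_FIELDS = ("make", "age_years", "valuation", "days_to_sell")


def integrity_errors(rows: List[Dict[str, Optional[object]]], min_rows: int = 1) -> List[str]:
    """Return a list of human-readable problems; an empty list means the data passed."""
    errors = []
    if len(rows) < min_rows:
        errors.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for field in REQUIRED_FIELDS:
            if row.get(field) is None:
                errors.append(f"row {i}: missing {field}")
        days = row.get("days_to_sell")
        if days is not None and days < 0:
            errors.append(f"row {i}: negative days_to_sell")
    return errors


good = [{"make": "Ford", "age_years": 3, "valuation": 9500.0, "days_to_sell": 21}]
bad = [{"make": "Ford", "age_years": 3, "valuation": None, "days_to_sell": -1}]
good_result = integrity_errors(good)
bad_result = integrity_errors(bad)
```

Returning the full list of problems, rather than failing on the first one, makes the scheduler's log more useful when a training run is blocked.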
18 .Production phase: Automated model deployment
19 .Architectural overview (diagram). Inputs: • Make/model/… • Age • Mileage • Location. Services: vehicle valuations service, taxonomy service, price indicator service, days to sell model service, metric aggregator service, stream processor. Also shown: frontend apps, 3rd party APIs. Metrics: • Valuations • Price Indicator • Days to sell metric
20 .Automated model deployment: Online serving attempt 1. A microservice receives an HTTP request, calls transform() in a local Spark context, and returns an HTTP response.
21 .Automated model deployment: Online serving attempt 2. A microservice receives an HTTP request, calls transform() on the pipeline's transformers (T) in the MLeap runtime, and returns an HTTP response.
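The request flow on this slide can be sketched as a handler that wraps a model loaded once at start-up. MLeap itself is a JVM library, so this Python sketch only shows the shape of the service: `make_handler`, `stub_predict` and the JSON field names are hypothetical, and `stub_predict` stands in for a call to transform() on the loaded model.

```python
import json
from typing import Callable, Dict, Tuple

Features = Dict[str, object]


def make_handler(predict: Callable[[Features], float]) -> Callable[[bytes], Tuple[int, bytes]]:
    """Build a request handler around a model that is loaded once, outside the request path."""

    def handle(body: bytes) -> Tuple[int, bytes]:
        try:
            features = json.loads(body)
        except ValueError:
            return 400, b'{"error": "invalid JSON"}'
        if not isinstance(features, dict):
            return 400, b'{"error": "expected a JSON object"}'
        prediction = predict(features)
        return 200, json.dumps({"days_to_sell": prediction}).encode("utf-8")

    return handle


def stub_predict(features: Features) -> float:
    """Stand-in for the loaded model's transform(); always predicts 30 days."""
    return 30.0


handle = make_handler(stub_predict)
ok_status, ok_body = handle(b'{"make": "Ford", "age_years": 3}')
bad_status, _ = handle(b"not json")
```

Keeping the handler free of any cluster or session set-up is what makes low-latency responses plausible: each request only pays for parsing and one in-process prediction.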
22 .DAG Scheduling. Inputs: historic advert data, …, taxonomy data. Spark jobs in order: training data (write to S3); integrity checks + tests; train ML pipeline (write to S3); convert to MLeap (write to S3).
23 .Automated model deployment. Airflow DAG of Spark jobs, in order: training data (write to S3); integrity checks + tests; train ML pipeline (write to S3); convert to MLeap (write to S3); update model location (Git commit).
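The ordering of the five tasks in this DAG can be sketched with Python's standard `graphlib` module. This is not Airflow code: the task names are taken from the slide, but `run_dag` and the stubbed task functions are hypothetical, and the stop-on-first-failure rule is a simplification that only suits a linear chain like this one.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Each task maps to the set of tasks that must finish before it starts.
DEPENDENCIES: Dict[str, Set[str]] = {
    "training_data": set(),
    "integrity_checks_and_tests": {"training_data"},
    "train_ml_pipeline": {"integrity_checks_and_tests"},
    "convert_to_mleap": {"train_ml_pipeline"},
    "update_model_location": {"convert_to_mleap"},
}


def run_dag(dependencies: Dict[str, Set[str]], tasks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run tasks in dependency order and stop at the first one that reports failure."""
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        if not tasks[name]():
            break
        completed.append(name)
    return completed


# Stub tasks: every task succeeds.
all_pass = {name: (lambda: True) for name in DEPENDENCIES}
full_run = run_dag(DEPENDENCIES, all_pass)

# Same DAG, but the integrity checks fail, so nothing downstream runs.
checks_fail = dict(all_pass, integrity_checks_and_tests=lambda: False)
blocked_run = run_dag(DEPENDENCIES, checks_fail)
```

In the second run only the first task completes, which is the property the DAG is there to give: a model is never trained, converted or published from data that failed its checks.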
24 .Automated model deployment. Airflow DAG of Spark jobs, in order: training data (write to S3); integrity checks + tests; train ML pipeline (write to S3); convert to MLeap (write to S3); update model location (Git commit). Deployment, in order: fetch MLeap model from S3; build application; run automated tests.
25 .Automated model deployment: Testing strategy • Unit tests • Integration tests? Model training generates sample data and writes it to S3; the integration tests load that sample data.
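One way to read this testing strategy is sketched below: the training job records sample inputs together with the trained model's predictions, and the integration tests replay those inputs against the serving code. That reading is an assumption, as are the function names, the JSON-lines layout and the tolerance; a string stands in for the file on S3.

```python
import json
import math
from typing import Callable, Dict, List

Features = Dict[str, float]


def write_samples(predict: Callable[[Features], float], inputs: List[Features]) -> str:
    """Training side: record each input with the trained model's prediction, one JSON object per line."""
    return "\n".join(json.dumps({"features": f, "expected": predict(f)}) for f in inputs)


def mismatches(serving_predict: Callable[[Features], float], samples: str, tol: float = 1e-6) -> List[dict]:
    """Serving side: replay the recorded inputs and collect predictions that differ from training."""
    failures = []
    for line in samples.splitlines():
        sample = json.loads(line)
        actual = serving_predict(sample["features"])
        if not math.isclose(actual, sample["expected"], abs_tol=tol):
            failures.append({"features": sample["features"], "expected": sample["expected"], "actual": actual})
    return failures


def trained_model(features: Features) -> float:
    """Hypothetical stand-in for the model produced by the training job."""
    return 10.0 + 2.0 * features["age_years"]


def broken_serving(features: Features) -> float:
    """Hypothetical serving bug: the feature is ignored."""
    return 10.0


samples = write_samples(trained_model, [{"age_years": 1.0}, {"age_years": 5.0}])
no_failures = mismatches(trained_model, samples)
two_failures = mismatches(broken_serving, samples)
```

Because the expected values come from the same training run as the model artefact, a test like this checks that the converted model and the serving code agree with training, rather than checking model quality.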
26 .Automated model deployment. Airflow DAG of Spark jobs, in order: training data (write to S3); integrity checks + tests; train ML pipeline (write to S3); convert to MLeap (write to S3); update model location (Git commit). Deployment, in order: fetch MLeap model from S3; build application; run automated tests; publish container.
27 .Automated deployment: Brief introduction to Kubernetes. “Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.”
28 .Automated deployment: Brief introduction to Kubernetes. An HTTP request reaches a load balancer, which routes it to one of the pods; each pod runs a container.
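The pods and load balancer in this picture correspond to two Kubernetes objects, a Deployment and a Service. The sketch below builds both as Python dictionaries and prints them as JSON, which Kubernetes accepts as well as YAML. The application name, image reference, replica count and ports are all hypothetical; the slides do not show the actual manifests.

```python
import json
from typing import Dict, Tuple


def build_manifests(name: str, image: str, replicas: int = 2, port: int = 8080) -> Tuple[Dict, Dict]:
    """Return a Deployment (the pods) and a Service of type LoadBalancer (the entry point)."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",
            # Traffic is routed to any pod carrying these labels.
            "selector": labels,
            "ports": [{"port": 80, "targetPort": port}],
        },
    }
    return deployment, service


# Hypothetical name and image reference, for illustration only.
deployment, service = build_manifests("days-to-sell", "registry.example.com/days-to-sell:1a2b3c")
print(json.dumps(deployment, indent=2))
print(json.dumps(service, indent=2))
```

Publishing a new container image and changing the `image` field is then enough for Kubernetes to roll the pods over to the new model.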
29 .Automated deployment. Spark jobs, in order: training data (write to S3); integrity checks + tests; train ML pipeline (write to S3); convert to MLeap (write to S3); update model location (Git commit). Rebuild container, in order: fetch MLeap model from S3; build application; run automated tests; publish to container registry; Kubernetes deployment.