Managing Thousands of Spark Workers in Cloud Environment

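At DataVisor, we use unsupervised machine learning to fight online fraud, abuse, and money laundering, an approach that clusters millions of users. To support these compute-intensive workloads, DataVisor uses Spark as the backbone of its computing infrastructure. The scalability and portability of our Spark infrastructure are critical to expanding our business. In this talk, we show how we manage our Spark infrastructure at scale.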

1. Managing Thousands of Spark Workers in Cloud Environment
Yuhao Zheng & Boduo Li, DataVisor
#HWCSAIS14

2. How To
[Charts]
• Peak Scale (# of Spark executors, thousands): 1 → 4 (4X)
• Cloud Cost (annual cost in USD, millions): 15 → 3 (5X)
• Ops Cost (weekly man-hours): 40 → 10 (4X)

3. • Focus on fraud detection • Founded in Dec 2013 • 100-person team

4. Coordinated Online Attacks
[Diagram labels: Crime Ring, Malicious Accounts, Launch Attacks; Fake Review, Transaction Fraud, Promotion Abuse]
Loss: >50B/Year

5. DataVisor: UML Fraud Detection
Unsupervised Machine Learning
• Early Detection: Catch incubated accounts
• Unknown Attack Detection: Catch unknown suspicious activities
• High Coverage and Accuracy: Detect all bad users in a campaign

6. UML is Expensive
[Diagram: User Events (Register, Login, Transaction, …) with stages Data Clean, Feature Ext., Analysis, Clustering]
Feature Pool: Thousands of Features
• Behavior Pattern: application_freq, application_time, …
• Profile Pattern: work_year_distrib, marriage_distrib, promoter_info, …
• Device Pattern: deviceid_distrib, ip_usage, devicetype_var, …
[Formulas shown: Association probability, Clustering probability]

7. Huge Data Volume
• 3 Billion+ user accounts
• 600 Billion+ events and growing
• 3 Petabytes of data

8. Schedule Dependency
[Diagram: Original Data feeding a dependency graph of Pipeline Modules, ending in a Detection Result]
~20 modules / client

9. Naïve Solution: Single Cluster
Estimated Annual Cost: 15 Million
[Diagram: Applications → FIFO Queue → Spark Cluster (master M, slaves S)]
• Static cluster
• No auto-scale

10. Problems of Single Static Cluster
Application   Executor Memory   # Executors
1             2 GB              2
2             6 GB              80
3             12 GB             48
[Diagram: cluster size vs. app executor size, showing wasted memory; Running: 2GB, 12GB; Queueing: 12GB]
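
A rough sketch of the kind of waste the table points at. It assumes, purely for illustration, that a static cluster reserves a fixed memory slot per executor sized for the largest application; the slide does not state how slots are sized, so `slot_gb` and `wasted_memory_gb` are hypothetical names.

```python
# (application id, executor memory in GB, number of executors), from the slide's table
APPS = [(1, 2, 2), (2, 6, 80), (3, 12, 48)]


def wasted_memory_gb(apps, slot_gb):
    """Memory left unused if every executor occupies a fixed slot of slot_gb.

    Assumption (not from the talk): one executor per slot, and slots are
    sized for the largest executor any application requests.
    """
    return sum((slot_gb - mem_gb) * count for _, mem_gb, count in apps)


slot_gb = max(mem_gb for _, mem_gb, _ in APPS)             # 12 GB slots
requested_gb = sum(mem_gb * count for _, mem_gb, count in APPS)
wasted_gb = wasted_memory_gb(APPS, slot_gb)

print(f"requested: {requested_gb} GB, wasted: {wasted_gb} GB")
```

Under that assumption, the 6 GB application alone leaves half of each slot it occupies unused.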

11. Improvement: Multiple Clusters
Estimated Annual Cost: 12 Million
[Diagram: Small Applications → FIFO Queue → cluster with 2GB Executors; Large Applications → FIFO Queue → cluster with 12GB Executors]

12. Further Reduce Cost
Estimated Annual Cost: 8 Million
Cloud cost
• Spot instances
• Smaller cluster
Operational cost
• Loss of spot
• Job failure

13. Drawbacks of Static Allocation
[Diagram labels: Fixed Size, Always On; Under capacity, Over capacity; Human Cost, Cloud Cost, Maintenance Cost]

14. Can We Go Dynamic? Why Not?

15. More Requirements
• Product features
  – Affect module dependencies
• Job priority
  – SLA assurance
[Diagram labels: Product A, Product B; High priority, Normal priority]

16. DataVisor SparkGen

17. DataVisor SparkGen
Estimated Annual Cost: 3 Million
[Diagram labels: Prod Jobs, Prod Job Scheduler, Spark Resource Manager, Developers, Dev Jobs; one Spark cluster (master M, slaves S) per job]

18. Cost Equations
Cost = Machine Cost + Human Cost
Machine Cost = Machine Up Time × Unit Price
Human Cost = Operation Overhead
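
The cost equations can be written as a small sketch. The function names and the sample figures are made up for illustration and are not taken from the talk.

```python
def machine_cost(up_time_hours: float, unit_price: float) -> float:
    """Machine Cost = Machine Up Time x Unit Price."""
    return up_time_hours * unit_price


def total_cost(up_time_hours: float, unit_price: float, human_cost: float) -> float:
    """Cost = Machine Cost + Human Cost (human cost stands for operation overhead)."""
    return machine_cost(up_time_hours, unit_price) + human_cost


# Hypothetical numbers: same unit price and human cost, less machine up time.
always_on = total_cost(up_time_hours=1000, unit_price=2.0, human_cost=500.0)
on_demand_clusters = total_cost(up_time_hours=400, unit_price=2.0, human_cost=500.0)

print(always_on, on_demand_clusters)
```

Each of the three terms (up time, unit price, operation overhead) is a separate lever, which is how the following slides are organized.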

19. Reduce Machine Up Time
[Chart: utilization (0 to 1) over time with Launch, Job, and Idle periods and under-utilized resource; 60% Saving]
Single Static Cluster
  ⊕ One-time launch
  ⊖ Low utilization
  ⊖ Idle time
  ⊖ Limited concurrency
  ⊖ Inter-job interference
  ⊖ High maintenance overhead
Multiple Static Clusters
  ⊕ One-time launch
  Moderate utilization
  ⊖ Idle time
  ⊖ Limited concurrency
  ⊖ Inter-job interference
  ⊖ High maintenance overhead
One Job Per Cluster
  ⊖ Per-job launch
  ⊕ High utilization
  ⊕ No idle time
  ⊕ Dynamic max concurrency
  ⊕ No inter-job interference
  ⊕ Low maintenance overhead

20. Reduce Launch Time
• Pre-built AMI (Amazon Machine Image)
  – Systems & libs (dockerized)
  – Pre-configured (non-runtime)
• Concurrent master/slave initialization
• Result: 30 min → 3 min
Slave Initialization (2 phases)
• Phase 1
  • Launch instance
  • Upload runtime configuration
• Phase 2
  • Start services (local)
  • Start services (connect to master)
[Diagram: AMI containing Docker containers for Spark, Ganglia, and Libs; launch-time timelines comparing Sequential initialization (slave init requires master ready) with Concurrent initialization (Master Init alongside slave phases 1 and 2)]
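
A minimal sketch of concurrent master/slave initialization using Python threads. The step names and the `launch_cluster` helper are hypothetical; which slave steps block on the master is an assumption here (only the step that connects to the master waits).

```python
import threading


def launch_cluster(num_slaves: int = 2) -> list:
    """Initialize one master and several slaves concurrently; return the step log."""
    log = []
    log_lock = threading.Lock()
    master_ready = threading.Event()

    def record(step: str) -> None:
        with log_lock:
            log.append(step)

    def init_master() -> None:
        record("master: launch instance, upload runtime configuration")
        record("master: start services")
        record("master: ready")
        master_ready.set()  # unblock slaves waiting to connect

    def init_slave(i: int) -> None:
        # Steps that do not need the master run without waiting for it.
        record(f"slave {i}: launch instance, upload runtime configuration")
        record(f"slave {i}: start local services")
        # Assumption: only the connect step requires the master to be ready.
        master_ready.wait()
        record(f"slave {i}: start services that connect to master")

    threads = [threading.Thread(target=init_master)]
    threads += [threading.Thread(target=init_slave, args=(i,)) for i in range(num_slaves)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for step in launch_cluster(3):
        print(step)
```

The order of the independent steps varies from run to run; the only guaranteed ordering is that every connect step appears after "master: ready".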

21. Maximize Job Concurrency
[Diagram: timelines of jobs A to M run sequentially vs. with One Job Per Cluster, where jobs overlap in time]
• 2X lower latency
• Eliminate prioritization issue
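
To illustrate why running each job on its own cluster lowers pipeline latency, the sketch below compares total sequential time with the critical-path time of a dependency graph. The job names, durations, and dependencies are invented for the example and do not reproduce the slide's diagram.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline: job -> list of jobs it depends on.
DEPENDS_ON = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B"],
    "E": ["B", "C"],
    "F": ["D", "E"],
}
# Hypothetical run time of each job, in hours.
DURATION = {job: 1.0 for job in DEPENDS_ON}


def sequential_latency(duration):
    """One job at a time: latency is the sum of all durations."""
    return sum(duration.values())


def concurrent_latency(depends_on, duration):
    """Unlimited concurrency: latency is the longest dependency chain."""
    finish = {}
    for job in TopologicalSorter(depends_on).static_order():
        start = max((finish[dep] for dep in depends_on[job]), default=0.0)
        finish[job] = start + duration[job]
    return max(finish.values())


print(sequential_latency(DURATION))              # all six jobs back to back
print(concurrent_latency(DEPENDS_ON, DURATION))  # longest chain, e.g. A, B, D, F
```

Cluster launch time is ignored here; in practice it adds a per-job overhead to the concurrent case.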

22. Cost Equations
Cost = Machine Cost + Human Cost
Machine Cost = Machine Up Time × Unit Price
Human Cost = Operation Overhead

23. Reduce Unit Price
• Spot Slaves (75% Saving)
• Reserved Masters (40% Saving)
[Chart: r4.2xlarge hourly price in $ (axis 0 to 0.6) for On Demand, Reserved, and Spot]
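
A small worked example of how the two savings combine for one cluster. The 75% and 40% figures come from the slide; the on-demand price of 0.50 per hour and the cluster shape are placeholders, not values read from the chart.

```python
SPOT_SAVING = 0.75      # spot slaves, from the slide
RESERVED_SAVING = 0.40  # reserved masters, from the slide


def cluster_hourly_price(on_demand_price, num_slaves, num_masters=1):
    """Hourly price of a cluster with reserved masters and spot slaves."""
    master_price = on_demand_price * (1 - RESERVED_SAVING)
    slave_price = on_demand_price * (1 - SPOT_SAVING)
    return num_masters * master_price + num_slaves * slave_price


ON_DEMAND_PRICE = 0.50  # hypothetical hourly price in USD
discounted = cluster_hourly_price(ON_DEMAND_PRICE, num_slaves=10)
all_on_demand = ON_DEMAND_PRICE * 11  # 1 master + 10 slaves at full price

print(f"{discounted:.2f} vs {all_on_demand:.2f} per hour")
```

Because slaves dominate the instance count, the blended saving is close to the spot saving.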

24. Cost Equations
Cost = Machine Cost + Human Cost
Machine Cost = Machine Up Time × Unit Price
Human Cost = Operation Overhead

25. Reduce Operation Overhead
• One Job Per Cluster
  – Dynamic scale out
  – No inter-job interference
  – Easy patch/re-launch clusters
  – Spot Fleet
    • Higher availability (diversified)
    • Maintain minimum capacity
[Diagram: Spot Fleet pools: zone a, r4.2xlarge; zone b, r4.8xlarge; zone c, r4.xlarge; zone b, r4.4xlarge; zone b, r4.2xlarge]

26. Why Not YARN?
• Compared to One Job Per Cluster
  – Single point of failure (Master)
  – Slower to scale
  – One more system to configure / maintain

27. Job Scheduler
[Diagram labels: Simple Per-client Spec, Product Features, Auto Generate Dependency, Spark Resource Manager]
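
One way to picture generating module dependencies from a per-client spec is sketched below. The module catalog, feature names, and `build_plan` function are all hypothetical; the talk does not describe the scheduler's actual spec format or algorithm.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical module catalog: each module may require a product feature
# and may depend on other modules.
MODULES = {
    "clean":           {"feature": None,     "deps": []},
    "feature_ext":     {"feature": None,     "deps": ["clean"]},
    "device_analysis": {"feature": "device", "deps": ["feature_ext"]},
    "clustering":      {"feature": None,     "deps": ["feature_ext", "device_analysis"]},
}


def build_plan(enabled_features):
    """Return modules in a valid run order for a client's enabled features."""
    active = {
        name for name, module in MODULES.items()
        if module["feature"] is None or module["feature"] in enabled_features
    }
    # Keep only dependencies on modules that are active for this client.
    graph = {
        name: [dep for dep in MODULES[name]["deps"] if dep in active]
        for name in active
    }
    return list(TopologicalSorter(graph).static_order())


print(build_plan(set()))       # client without the hypothetical "device" feature
print(build_plan({"device"}))  # client with it
```

A plan like this could then be handed to a resource manager that launches one cluster per module, but that hand-off is outside this sketch.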

28. Results
[Charts]
• Peak Scale (# of Spark executors, thousands): 1 → 4 (4X)
• Pipeline Latency (end-to-end hours): 12 → 6 (2X)
• Cloud Cost (annual cost in USD, millions): 15 → 3 (5X)
• Ops Cost (weekly man-hours): 40 → 10 (4X)

29. Q&A
www.datavisor.com