Using Apache Spark to Tune Spark
1. Using Spark to Tune Spark
Adrian Popescu, Shivnath Babu
#AI7SAIS
2. Meet the speakers
Adrian Popescu
• Data engineer at Unravel
• PhD from EPFL, Switzerland
• 8+ years of experience in performance monitoring & modeling of data management systems
• Focusing on tuning and optimization of Big Data apps
Shivnath Babu
• Cofounder and CTO at Unravel, Adjunct Professor at Duke University
• Focusing on ease-of-use and manageability of data-intensive systems
• Recipient of US National Science Foundation CAREER Award, three IBM Faculty Awards, HP Labs Innovation Research Award
#Exp8SAIS
3. Many apps are being built in Spark
4. But, let us face it: Running Spark apps in production is hard
5. My app often fails with Out of Memory…
DATA SCIENTIST
6. My app is too slow…
DATA ENGINEER
7. My app is missing SLA…
DATA PIPELINE OWNER
8. This rogue app is wasting resources and reducing cluster throughput
OPERATIONS TEAMS
9. Many factors affect app performance
10. To add to Spark's complexity
• Many types of Spark apps (simple SQL and programming interface)
– SQL
– Streaming
– AI/ML
– Graph
– Scala/Python/R
• Many app submission methods in Spark
– CLI
– Thrift Server
– Notebooks like Zeppelin, Jupyter, Hue
– ETL tools like Informatica, Pentaho, and Talend
– Schedulers like Airflow, Autosys, Control M, Oozie, Tidal, TWS
• Many infrastructure choices for Spark
– On-premises multi-tenant clusters
– Transient cloud clusters
– Auto-scaling clusters
– Containerized deployments
11. Can we convert this problem into a data problem?
12. First: Bring all monitoring data to a single platform
• Resource Manager API
• History Server API
• Container Metrics
• Data Statistics
• SQL Query Plans
• Logs
• Metadata
• Configuration
One complete correlated view.
13. Then: Apply intelligent algorithms to analyze the data automatically
• Resource Manager API
• History Server API
• Container Metrics
• Data Statistics
• SQL Query Plans
• Logs
• Metadata
• Configuration
One complete correlated view. Built-in intelligence.
15. Why not use Spark itself?
• Resource Manager API
• History Server API
• Container Metrics
• Data Statistics
• SQL Query Plans
• Logs
• Metadata
• Configuration
What application & cluster management tasks can we automate with intelligent algorithms in Spark?
16. Let us take three (hard) tasks
• Failures in Spark
• SLA management for real-time data pipelines
• Application autotuning
17. Manual Root Cause Analysis of Spark Failures
Typical Failure in Spark
• Many levels of correlated stack traces
• Identifying the root cause is hard and time consuming
18. Automated Root Cause Analysis of Spark Failures
• Reduce troubleshooting time from days to seconds
• Improve productivity of data scientists and analysts
19. Automatic Root Cause Analysis
Diagram (training pipeline), components: Container Logs, Error Template Extraction, Feature vectors, Root causes, Learning Algorithm for Predictive Model, Predictive Model
20. We have created a Failure Taxonomy
• Root Node
• Category of failure: Configuration Errors, Data Errors, Resource Errors, Deployment Errors
• Root cause labels: Input Path Not Available, Number Format Exception, SparkSQL JsonProcessing Exception, …
21. Two Ways to get Root-Cause Labels
• Manual diagnosis by a domain expert
• Automatic injection of the root cause
22. Unravel's Large-scale Lab Framework for Automatic Root Cause Analysis
Environment:
– Lab created on demand on cloud or on-premises
– Workloads are run and failures are injected
Spark and multi-tenant Workloads:
– Variety of workloads: Batch, ML, SQL, Streaming, etc.
Failures:
– Large set of root causes learned from customers & partners. Constantly updated
– Continuously inject these root causes to train & test models for root-cause prediction
23. Injecting Failures
Diagram, components: Application Execution, Injected Failure, Application Monitor (FAILED), Input Feature Extraction, Label, Labeled Failures
Injected failure examples:
• Invalid input
• Invalid memory configuration
• OOME: Java heap space
• OOME: GC overhead limit
• Container killed by YARN
• Runtime incompatibility
• No space left on device
• Transformations inside other transformations
• Runtime error
• Arithmetic error
• Invalid configuration settings
24. Extracting Input Features from Logs
java.lang.OutOfMemoryError: Java heap space
at scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:114)
at scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:112)
at …
• Extract stack traces and error messages
• Tokenize by class names and words
• Create a vocabulary of words from all words collected
Tokens example:
java.lang.OutOfMemoryError
Java
heap
space
at
scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:114)
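The tokenization step on this slide can be sketched as follows. This is a minimal, dependency-free illustration, not the speakers' implementation: the regular expression, the function names `tokenize` and `build_vocabulary`, and the rule that a dotted class or method name (with its optional source-location suffix) stays one token are all assumptions chosen to reproduce the tokens example above.

```python
import re

# Assumed token rule (hypothetical, not from the slides):
#  1. a dotted class/method name, optionally followed by "(File.scala:114)",
#     is kept as a single token;
#  2. anything else is split into plain alphabetic words.
TOKEN_RE = re.compile(
    r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$<][\w$<>]*)+(?:\([^)]*\))?"
    r"|[A-Za-z]+"
)

SAMPLE_LOG = (
    "java.lang.OutOfMemoryError: Java heap space\n"
    "  at scala.reflect.ManifestFactory$$anon$9.newArray(Manifest.scala:114)"
)


def tokenize(log_text):
    """Split a log excerpt into class-name tokens and word tokens."""
    return TOKEN_RE.findall(log_text)


def build_vocabulary(documents):
    """Map every distinct token to an integer id, in order of first appearance."""
    vocabulary = {}
    for document in documents:
        for token in tokenize(document):
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


if __name__ == "__main__":
    for token in tokenize(SAMPLE_LOG):
        print(token)
```

Keeping the class name whole preserves the signal that matters most for diagnosis (which exception, which frame), while the free-text message is still broken into words.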
25. Input Feature extraction
• Bag of Words with TF-IDF
– Computes a vocabulary of words
– Uses TF-IDF to reflect importance of words in a document
• Doc2Vec
– Maps words, paragraphs, or documents to multi-dimensional vectors
– Evaluates the placement of words with respect to neighboring words
– Uses a 3-layer neural network
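To make the TF-IDF bullet concrete, here is a small sketch in plain Python. The slides do not state which TF-IDF variant was used; the smoothed form `idf = log((N + 1) / (df + 1))` is an assumption (it is the form documented for Spark MLlib's `IDF`), and the function name `tf_idf` and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {token: weight} dict per document.

    Assumed weighting: raw term count times a smoothed inverse document
    frequency, idf = log((N + 1) / (df + 1)), where N is the number of
    documents and df the number of documents containing the token.
    """
    n_docs = len(tokenized_docs)

    # Document frequency: count each token once per document.
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))

    idf = {
        token: math.log((n_docs + 1) / (df + 1))
        for token, df in doc_freq.items()
    }

    return [
        {token: count * idf[token] for token, count in Counter(tokens).items()}
        for tokens in tokenized_docs
    ]


if __name__ == "__main__":
    # Toy token lists, for illustration only.
    docs = [
        ["java", "heap", "space"],
        ["gc", "overhead", "java"],
    ]
    for vector in tf_idf(docs):
        print(vector)
```

A token that appears in every failure log (here `java`) gets weight zero, so the feature vector is dominated by the tokens that distinguish one root cause from another.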
26. System Architecture
Diagram (training), components: Container Logs, Error Template Extraction, Feature vectors, Root causes, Learning Algorithm for Predictive Model, Predictive Model
Diagram (prediction): New failure → Container Logs → Error Template Extraction → Feature vector → Predictive Model → Root cause of the failure
27. Predictive Models
• Shallow Learning (very easy to implement these in Spark)
– Logistic Regression
– Random forests
• Deep Learning
– Neural networks
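For readers unfamiliar with the first shallow model, the sketch below trains a binary logistic regression by batch gradient descent in plain Python. It is a stand-in for illustration only, not the Spark-based implementation the slide refers to; the function names, the hyperparameters, and the two-dimensional toy feature vectors are all hypothetical, and a real root-cause classifier would be multi-class over TF-IDF or Doc2Vec features.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            # Gradient of the log loss w.r.t. the score is (prediction - label).
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the linear score is non-negative, else class 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0


# Toy, linearly separable feature vectors (illustration only, not real data).
TOY_FEATURES = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]]
TOY_LABELS = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    w, b = train_logistic_regression(TOY_FEATURES, TOY_LABELS)
    print([predict(w, b, x) for x in TOY_FEATURES])
```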
28. Predicting the Root Cause of Failures
• Training and testing with injected failures
• Test to train data set ratio 75% to 25%
• Models: logistic regression, random forests
Chart: Accuracy Score [%] (axis from 80 to 100) for Logistic Regression and Random Forests, with TF-IDF and Doc2Vec features
Work with deep learning is in progress
See our talk at Strata, NY 2017 for more details
29. Let us take three (hard) tasks
• Failures in Spark
• SLA management for real-time data pipelines
• Application autotuning