Official Announcement of Koalas Open Source Project
2 .Jaime Woodfin, FIS Global • Brooke Wenig, Databricks • Special thanks to Amir Issaei, Kevin Mellott, and Aaron Colcord #UnifiedAnalytics #SparkAISummit
3 .Outline • FIS & Databricks Intro • Business Problem & Motivation • Approach v1 – Problems encountered along the way • Approach v2 • Production
4 .Who is FIS Global? • The global leader in financial services technology • Customers: banks and credit unions • Ecosystem of products and services built around core banking • FIS Digital Finance, Digital Data and Analytics
5 .Databricks • VISION: Accelerate innovation by unifying data science, engineering and business • PRODUCT: Unified Analytics Platform powered by Apache Spark • WHO WE ARE – Founded by the original creators of Apache Spark – Contributes 75% of the open source code, 10x more than any other company – Trained 100k+ Spark users on the Databricks platform
6 .Business Problem
7 .Conversational Analytics • Measure in support conversations • Follow up on support conversations
8 .Example Conversation
9 .Conversational Channels Developments • Face-to-face • Human support chat • Support chatbots • Conversational Banking
10 .Goals • Score overall user conversation satisfaction • Question: What contributed to their satisfaction? • And how do we do it at scale?
11 .Approach v1
12 .Approach v1 • Apply open-source NLP libraries to each turn in conversation
13 .Library Comparison • Different Scales – TextBlob: [-1, 1] – NLTK: [-1, 1] – John Snow Labs (sparknlp): Negative or Positive – Stanford CoreNLP: 0, 1, 2, 3, 4
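Because the four libraries report sentiment on different scales, their outputs have to be mapped onto a common range before they can be compared turn by turn. The following is a minimal sketch of one such mapping, not the method used in the talk: the function name `normalize_score`, the library keys, and the choice of [-1, 1] as the shared scale are all assumptions made for illustration.

```python
def normalize_score(library: str, raw) -> float:
    """Map a library-specific sentiment output onto a shared [-1, 1] scale.

    Hypothetical helper: the library keys and the target scale are
    illustrative assumptions, not part of any of the libraries' APIs.
    """
    if library in ("textblob", "nltk"):
        # Already reported on [-1, 1]; clamp defensively.
        return max(-1.0, min(1.0, float(raw)))
    if library == "sparknlp":
        # Binary label with no neutral category: map to the two extremes.
        labels = {"negative": -1.0, "positive": 1.0}
        return labels[str(raw).lower()]
    if library == "corenlp":
        # Five ordinal classes 0..4; the midpoint 2 maps to 0.0.
        return (int(raw) - 2) / 2.0
    raise ValueError(f"unknown library: {library}")
```

Note that a linear mapping like this only aligns the ranges. It does not make the scores calibrated against each other, and the binary output still cannot express a neutral turn.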
14 .Demo
15 .Problems Encountered • Stanford CoreNLP gave predictions per sentence, not per turn • Performed poorly on neutral sentences – John Snow Labs had no “neutral” category • Didn’t do well on banking-domain text. Can we do better?
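One way to work around per-sentence predictions is to aggregate them into a single score per turn. The sketch below uses a word-count-weighted mean of the sentence classes (0 to 4); this aggregation rule and the name `turn_score` are assumptions for illustration, and the talk does not say how, or whether, sentences were combined.

```python
from typing import List, Tuple


def turn_score(sentences: List[Tuple[str, int]]) -> float:
    """Word-count-weighted mean of per-sentence classes (0..4) for one turn.

    `sentences` holds (sentence_text, predicted_class) pairs. The weighting
    scheme is an illustrative assumption, not the approach from the talk.
    """
    total_words = sum(len(text.split()) for text, _ in sentences)
    if total_words == 0:
        raise ValueError("turn contains no words")
    weighted = sum(len(text.split()) * label for text, label in sentences)
    return weighted / total_words


# Hypothetical turn with made-up class labels for each sentence.
example_turn = [
    ("Thanks for the quick reply.", 3),
    ("My card is still blocked though.", 1),
]
print(turn_score(example_turn))
```

Weighting by length keeps a short courtesy sentence from outweighing the longer sentence that carries the actual complaint.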
16 .Approach v2
17 .Approach v2 • No pre-trained sentiment analysis models! • Model: – Build LSTM model on all conversation text to predict sentiment – Augment with additional features (e.g. # of turns, time of day, etc.) – Pass features through end classifier • Positive/Negative X
18 .Transfer Learning • Distributed training of LSTM on open-source sentiment dataset with HorovodRunner • Transfer learning on banking data
19 .LSTM Stats • Basic stats for performance: – Accuracy: 73% – FPR: 11% – FNR: 32% • $\mathrm{FPR} = \frac{FP}{N} = \frac{FP}{FP + TN}$ • $\mathrm{FNR} = \frac{FN}{P} = \frac{FN}{TP + FN}$ (N: actual negatives, P: actual positives)
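The three metrics on this slide follow directly from the confusion-matrix counts. A minimal sketch of the arithmetic is shown below; the function name and the counts in the usage example are hypothetical and are not the counts behind the reported 73% / 11% / 32%.

```python
from typing import Dict


def classification_rates(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, false positive rate, and false negative rate."""
    total = tp + fp + tn + fn
    if total == 0 or (fp + tn) == 0 or (tp + fn) == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return {
        "accuracy": (tp + tn) / total,
        "fpr": fp / (fp + tn),  # FP / N, share of actual negatives flagged positive
        "fnr": fn / (tp + fn),  # FN / P, share of actual positives missed
    }


# Hypothetical counts, chosen only to illustrate the formulas.
print(classification_rates(tp=40, fp=5, tn=45, fn=10))
```

Accuracy alone depends on the class balance, which is why FPR and FNR are reported alongside it.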
20 .Features • LSTM output • Conversational Features – User average turn length – Agent average turn length – # Turns – Duration • Temporal Features – Day of Week – Time of Day • Others
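The conversational and temporal features listed here can be derived from the raw turns of a conversation. The sketch below assumes a hypothetical `Turn` record with a speaker, text, and timestamp, measures turn length in words, and takes the temporal features from the first turn; none of these choices are specified in the talk.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Turn:
    """Hypothetical record for one turn; field names are assumptions."""
    speaker: str  # "user" or "agent"
    text: str
    timestamp: datetime


def conversation_features(turns: List[Turn]) -> Dict[str, float]:
    """Derive conversational and temporal features from a list of turns."""
    if not turns:
        raise ValueError("conversation has no turns")

    def avg_words(speaker: str) -> float:
        lengths = [len(t.text.split()) for t in turns if t.speaker == speaker]
        return sum(lengths) / len(lengths) if lengths else 0.0

    start = min(t.timestamp for t in turns)
    end = max(t.timestamp for t in turns)
    return {
        "user_avg_turn_length": avg_words("user"),
        "agent_avg_turn_length": avg_words("agent"),
        "num_turns": float(len(turns)),
        "duration_seconds": (end - start).total_seconds(),
        "day_of_week": float(start.weekday()),  # Monday = 0
        "hour_of_day": float(start.hour),
    }
```

In the approach described on the slides, features like these would be combined with the LSTM output and passed to the end classifier.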
21 .Classifier • Logistic Regression – Accuracy: 77% – FPR: 24% – FNR: 22% • Random Forests – Accuracy: 80% – FPR: 10% – FNR: 51% • Others
22 .Random Forest • Chose the Random Forest because: – Lowest FP rate & highest accuracy – Good model interpretability – Part of SparkML and can be used with the Pipeline API (easy to switch to Scala)
23 .Production
24 .Production Requirements • Fit into dev pipeline, largely Scala/Java based – But a lot of data science is done in Python • Close the feedback loop: constantly learning • Automated deployments • Streaming instead of batch
25 .Architecture
26 .Architecture
27 .Python & Scala • Train LSTM in Python (Keras) • Save Model • Load model via UDF • Apply using Scala!
28 .Deployment
29 .Recap • Idea • Notebook • Production • Happy