High Performance Transfer Learning for Classifying Intent of Sales Engagement Emails: An Experimental Study
1. WIFI SSID: SparkAISummit | Password: UnifiedAnalytics
2. High Performance Transfer Learning for Classifying Intent of Sales Engagement Emails: An Experimental Study
Yong Liu, Corey Zumar
3. Outline
• Data Science Research Objectives
• Sales Engagement Platform (SEP)
• Use Cases and Technical Challenges
• Experiments and Datasets
• Results
• MLflow Integration and Experiment Tracking
• Summary and Future Work
4. Data Science Research Objectives
• Establish a high-performance transfer learning evaluation framework for email classification
• Three research questions:
  – Which embeddings and pre-trained LMs should be used?
  – Which transfer learning implementation strategy (feature-based vs. fine-tuning) should be used?
  – How many labeled samples are needed?
5. Sales Engagement Platform (SEP)
• A new category of software that sits between sales reps and CRMs: Sales Reps – Sales Engagement Platform (SEP) (e.g., Outreach) – CRMs (e.g., Salesforce, Microsoft Dynamics, SAP)
6. SEP Encodes and Automates Sales Activities into Workflows/Pipelines
Ø Automates the execution and capture of activities (e.g., emails) and records them in a CRM
Ø Schedules and reminds the rep when it is the right time to do manual tasks (e.g., phone calls, custom manual emails)
Ø Enables reps to perform one-on-one personalized outreach to up to 10x more prospects than before
7. Why Email Intent Classification Is Needed
• Email content is critical for driving results during prospecting and other stages of the sales process
• A metric based on the replier's email intent (e.g., positive, objection, unsubscribe) is much more informative than a simple "reply rate"
• A/B testing with a better metric can pick the winning email content/template more confidently
8. Why Email Intent Classification Is Challenging on a SEP
• Different contexts and players: different roles are involved throughout the sales process and across different orgs
• Limited labeled sales engagement domain emails: GDPR and privacy/compliance constraints; labeling emails is time-consuming and, for many orgs on a SEP, not possible at all
9. Why Transfer Learning?
• Using pretrained language models opens the door to high performance transfer learning (HPTL):
  – Fewer training samples
  – Better accuracy
  – Reduced model training time and engineering complexity
• Pretrained language models such as BERT have achieved state-of-the-art scores on the NLP GLUE leaderboard (https://gluebenchmark.com/)
  – However, whether such benchmark success readily translates to practical applications is still unknown
10. A List of Pretrained LMs and Embeddings for the Experiments
• GloVe – count-based, context-free word embeddings released in 2014
• ELMo – context-aware, character-based embeddings built on a recurrent neural network (RNN) architecture, released in 2018
• Flair – contextual string embeddings released in 2018
• BERT – state-of-the-art transformer-based deep bidirectional language model released by Google in late 2018
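Not from the slides: a minimal sketch of how these embeddings could be loaded as feature extractors through the open-source Flair library (class names follow the Flair 0.4.x-era API and may differ in newer releases; the stacked combination shown is illustrative, not the exact setup used in the experiments).

    # Illustrative only: load the compared embeddings for a feature-based classifier.
    from flair.data import Sentence
    from flair.embeddings import (
        WordEmbeddings,          # GloVe and other classic word embeddings
        ELMoEmbeddings,          # ELMo (requires the allennlp package)
        FlairEmbeddings,         # contextual string embeddings
        BertEmbeddings,          # BERT used as a fixed feature extractor
        DocumentPoolEmbeddings,  # pools token embeddings into one document vector
    )

    # Pool GloVe and Flair token embeddings into a single email-level vector
    doc_embeddings = DocumentPoolEmbeddings([
        WordEmbeddings("glove"),
        FlairEmbeddings("news-forward"),
    ])

    sentence = Sentence("Please remove me from your email list.")
    doc_embeddings.embed(sentence)
    features = sentence.get_embedding()  # tensor fed to a downstream classifier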
11. Experimental Email Dataset
12. Example Intents and Emails
• Positive: "Actually, I'd be interested in talking Friday. Do you have some time around 10am?"
• Objection: "Thanks for reaching out. This is not something I am interested in at this time."
• Unsubscribe: "Please remove me from your email list."
• Not-sure: "Mike, in regards to? John"
13. Two Sets of Experiment Runs
• Different pretrained language models (LMs) and embeddings: feature-based vs. fine-tuning
  – Using the full set of training examples
• Different labeled training sizes with the feature-based and fine-tuning approaches
  – Increasingly larger training sizes: 50, 100, 200, 300, 500, 1000, 2000, 3000
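Not from the slides: a minimal sketch of the training-size sweep described above, assuming a generic train_and_eval callable (the helper name and signature are illustrative, not the authors' code).

    # Illustrative only: sweep labeled training-set sizes and record test F1.
    import random

    SIZES = [50, 100, 200, 300, 500, 1000, 2000, 3000]

    def run_size_sweep(train_examples, test_examples, train_and_eval, seed=42):
        """train_examples: list of (email_text, intent_label) pairs.
        train_and_eval: trains a classifier on the subsample and returns
        its micro-averaged F1 score on the held-out test set."""
        rng = random.Random(seed)
        results = {}
        for n in SIZES:
            subsample = rng.sample(train_examples, n)
            results[n] = train_and_eval(subsample, test_examples)
        return results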
14. Result (1): Different Embeddings, Feature-Based vs. Fine-Tuning
Ø BERT fine-tuning has the best F1 score
Ø Among the feature-based approaches, GloVe performs slightly better
Ø Classical ML baselines such as LightGBM + TF-IDF underperform BERT fine-tuning
15. Result (2): Scaling Effect with Different Training Sample Sizes
Ø BERT fine-tuning outperforms all feature-based approaches when the training set size is greater than 300
Ø When the training set is small (< 100), the feature-based BERT+Flair combination performs better
Ø To achieve an F1 score > 0.8, BERT fine-tuning needs at least 500 training examples, while the feature-based approach needs at least 2000
16. Introducing MLflow: an Open Machine Learning Platform
• Works with any ML library & language
• Runs the same way anywhere (e.g., any cloud)
• Designed to be useful for 1 or 1000+ person orgs
• Integrates with Databricks
17. MLflow Components
• Tracking – record and query experiments: code, configs, results, etc.
• Projects – packaging format for reproducible runs on any platform
• Models – general model format that supports diverse deployment tools
18. Key Concepts in Tracking
• Parameters: key-value inputs to your code
• Metrics: numeric values (can update over time)
• Artifacts: arbitrary files, including data and models
• Source: the training code that ran
• Version: the version of the training code
• Tags and Notes: any additional info
19. MLflow Tracking: Example Code

    import mlflow

    with mlflow.start_run():
        mlflow.log_param("layers", layers)
        mlflow.log_param("alpha", alpha)
        # train model
        mlflow.log_metric("mse", model.mse())
        mlflow.log_artifact("plot", model.plot(test_df))
        mlflow.tensorflow.log_model(model)
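Not from the slides: since Tracking also supports querying recorded experiments, here is a minimal sketch of reading runs back in recent MLflow versions (assumes a local tracking store; the experiment id "0" and the mse filter are illustrative).

    # Illustrative only: query runs recorded by the code above.
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    runs = client.search_runs(experiment_ids=["0"], filter_string="metrics.mse < 1.0")
    for run in runs:
        print(run.info.run_id, run.data.params, run.data.metrics)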
20. MLflow Models
• A general model format with multiple "flavors" (Flavor 1, Flavor 2, ...)
• A standard way for ML frameworks to package models and for serving tools to consume them
• Downstream uses include inference code and batch & stream scoring
21. MLflow to Manage Hundreds of Experiments
• PyTorch models for the feature-based approach
  – Using the Flair framework
• TensorFlow for BERT fine-tuning
  – Using the bert-tensorhub framework
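Not from the slides: a minimal sketch of how hundreds of such runs could be organized under MLflow, one run per configuration (parameter names like "embedding" and "train_size" and the train_fn helper are assumptions, not the authors' code).

    # Illustrative only: one MLflow run per (embedding, approach, training size).
    import mlflow

    def track_experiment(embedding, approach, train_size, train_fn):
        """train_fn is assumed to train a model and return (model_path, f1)."""
        with mlflow.start_run():
            mlflow.log_param("embedding", embedding)    # glove, elmo, flair, bert
            mlflow.log_param("approach", approach)      # feature-based or fine-tuning
            mlflow.log_param("train_size", train_size)  # 50 ... 3000
            model_path, f1 = train_fn(embedding, approach, train_size)
            mlflow.log_metric("micro_avg_f1_score", f1)
            mlflow.log_artifact(model_path)             # saved model, plots, etc.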
22. MLflow Tracking All Experiments
23. MLflow Logs Artifacts/Parameters/Metrics/Models

    import numpy as np  # needed for the snippet below
    # test_scores: F1 scores collected across evaluation runs
    mlflow.log_metric("micro_avg_f1_score_avg", np.asarray(test_scores).mean())
24. Images Can Be Logged as Artifacts

    mlflow.log_artifact(tSNE_img, 'run_{0}'.format(run_id))
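Not from the slides: a minimal sketch of how such a t-SNE image could be produced and logged (the embedding matrix X and the label list y are placeholders, not the authors' variables).

    # Illustrative only: project email embeddings with t-SNE, save the figure,
    # and log the image file as an MLflow artifact.
    import matplotlib
    matplotlib.use("Agg")  # headless rendering
    import matplotlib.pyplot as plt
    import mlflow
    from sklearn.manifold import TSNE

    def log_tsne_plot(X, y, run_id, out_path="tsne.png"):
        coords = TSNE(n_components=2, random_state=0).fit_transform(X)
        fig, ax = plt.subplots()
        for label in sorted(set(y)):
            idx = [i for i, lab in enumerate(y) if lab == label]
            ax.scatter(coords[idx, 0], coords[idx, 1], label=label, s=8)
        ax.legend()
        fig.savefig(out_path)
        mlflow.log_artifact(out_path, "run_{0}".format(run_id))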
25. Summary
• Transfer learning by fine-tuning BERT outperforms all feature-based approaches using different embeddings/pretrained LMs when the training set size is greater than 300
• Pretrained language models mitigate the cold-start problem when there is very little training data
  – E.g., with as few as 50 labeled examples, the F1 score reaches 0.67 with BERT+Flair using the feature-based approach
• However, to reach an F1 score > 0.8, a feature-based approach may still need one to two thousand examples, while fine-tuning a pre-trained BERT language model needs about 500
• MLflow proved useful and powerful for tracking all the experiments
26. Future Work
• MLflow: from experimentation to production
  – Pick the best model for deployment
• Extend to cross-org transfer learning
  – Train on data from one or multiple orgs and then apply the model to other orgs
27. Acknowledgements
• Outreach Data Science Team
• Databricks MLflow Team
28. Don't forget to rate and review the sessions: search SPARK + AI SUMMIT