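Burger King Uses RayOnSpark for Fast Food Recommendation Based on Real-Time Context Features
In fast food recommendation, a user's real-time ordering behavior and various context features (such as time, weather, and location) are all important factors for making suitable recommendations. At Burger King, we developed a new Transformer Cross Transformer (TxT) recommendation model that uses multiple Transformer encoders to extract the user's ordering behavior and complex context features, and combines the Transformer outputs through a dot product to generate recommendations. Online A/B test results show that the TxT model not only outperforms other existing recommendation models but can also be applied successfully to other recommendation scenarios.
In addition, using the RayOnSpark feature provided by Analytics Zoo, we built a complete end-to-end recommendation system with Ray, Apache Spark, and Apache MXNet. It integrates data processing (with Spark) and distributed training (with MXNet and Ray) into a unified data analytics and AI pipeline that runs directly on the same big data cluster where the data is stored. We have successfully deployed this recommendation system at Burger King, and it has achieved outstanding results in production.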
1 . AI on Big Data: Accelerating Data Analytics + AI Solutions At Scale ▪ BigDL: Distributed, High-Performance Deep Learning Framework for Apache Spark https://github.com/intel-analytics/bigdl ▪ Analytics Zoo: Unified Analytics + AI Platform; Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray https://github.com/intel-analytics/analytics-zoo
2 .Context-aware Fast Food Recommendation with RayOnSpark at Burger King LUYANG WANG Burger King Corporation KAI HUANG Intel Corporation
3 . Agenda ▪ LUYANG WANG: Food recommendation use case; TxT model in detail ▪ KAI HUANG: AI on big data; Distributed training pipeline with RayOnSpark
4 .Food Recommendation Use Case
5 . Food Recommendation Use Case Guest arrives ODMB Checks Menu Board Cashier enters order Checks Menu Board Guest completes order
7 .Use Case Challenges: Challenges ▪ Lack of user identifiers ▪ Same-session food compatibilities ▪ Other variables in our use case: location, weather, time, etc. ▪ Deployment challenges
8 .Use Case Challenges: Solutions ▪ Session-based recommendation model ▪ Able to take complex context features into consideration ▪ Able to be deployed anywhere, on both edge and cloud
9 .Transformer Cross Transformer (TxT)
10 .TxT Model Overview: Model Components ▪ Sequence Transformer: takes the item order sequence as input ▪ Context Transformer: takes multiple context features as input ▪ Latent Cross Joint Training: element-wise product of both transformer outputs
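The latent cross step listed above can be illustrated with a small, framework-free sketch. It is not the TxT implementation: the vectors stand in for the two transformer outputs, and `score_items` with its `item_weights` matrix is a hypothetical output layer added only so the example produces per-item scores.

```python
from typing import List


def latent_cross(seq_out: List[float], ctx_out: List[float]) -> List[float]:
    """Element-wise (Hadamard) product of two equal-length vectors."""
    if len(seq_out) != len(ctx_out):
        raise ValueError("both outputs must have the same dimension")
    return [s * c for s, c in zip(seq_out, ctx_out)]


def score_items(joint: List[float], item_weights: List[List[float]]) -> List[float]:
    """Hypothetical output layer: one dot product per candidate item."""
    return [sum(j * w for j, w in zip(joint, row)) for row in item_weights]


# Toy stand-ins for the Sequence Transformer and Context Transformer outputs.
seq_out = [0.5, -1.0, 2.0]
ctx_out = [2.0, 0.5, 1.0]

joint = latent_cross(seq_out, ctx_out)

# Two candidate items, each with an assumed weight vector.
item_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
scores = score_items(joint, item_weights)
print(joint, scores)
```

Because the product is element-wise, each context dimension rescales the matching dimension of the sequence representation, so the two outputs must share the same hidden size.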
11 . Model Comparison TxT RNN Latent Cross
12 . Offline Evaluation
[Figure: offline training loss]
Offline Training Result
Model               Top1 Accuracy   Top3 Accuracy
RNN                 29.98%          46.24%
Contextual ItemCF   32.18%          48.37%
RNN Latent Cross    33.10%          49.98%
TxT                 34.52%          52.37%
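For readers unfamiliar with the Top1 and Top3 accuracy columns, the following is a minimal sketch of how a top-k accuracy metric is commonly computed. The score matrix and labels are made-up toy values, not the Burger King evaluation data, and the function name is chosen for illustration.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int) -> float:
    """Fraction of samples whose true item is among the k highest-scored items."""
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy predictions over three candidate items for four sessions.
toy_scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
toy_labels = [1, 1, 0, 2]

print(top_k_accuracy(toy_scores, toy_labels, k=1))
print(top_k_accuracy(toy_scores, toy_labels, k=2))
```

Top-k accuracy can only stay the same or grow as k increases, which is why the Top3 column is higher than the Top1 column for every model.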
13 . Online Performance
Inference Performance
[Figure: bar chart of inference latency (ms) for RNN Latent Cross and TxT; bar labels 20 and 18]
A/B Testing Result
Model                        Conversion Rate Gain   Add-on Sales Gain
RNN Latent Cross (control)   -                      -
TxT                          +7.5%                  +4.7%
14 . Model Training Architecture Previous Current
15 .AI on Big Data
16 . AI on Big Data: Accelerating Data Analytics + AI Solutions At Scale ▪ BigDL: Distributed Deep Learning Framework for Apache Spark https://github.com/intel-analytics/BigDL ▪ Analytics Zoo: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray https://github.com/intel-analytics/analytics-zoo ▪ We develop Project Orca in Analytics Zoo, based on Spark and Ray, to allow users to easily scale out a single-node Python notebook across large clusters by providing: ▪ Data-parallel preprocessing for Python AI (supporting common Python libraries such as Pandas, NumPy, PIL, TensorFlow Dataset, PyTorch DataLoader, etc.) ▪ Sklearn-style APIs for transparently distributed training and inference (supporting TensorFlow, PyTorch, Keras, MXNet, Horovod, etc.) https://github.com/intel-analytics/analytics-zoo/tree/master/pyzoo/zoo/orca
17 . Ray: Ray is a fast and simple framework for building and running distributed applications. ▪ Ray Core provides an easy Python interface for parallelism through remote functions and actors. Ray is packaged with several high-level libraries to accelerate machine learning workloads. ▪ Tune: Scalable Experiment Execution and Hyperparameter Tuning ▪ RLlib: Scalable Reinforcement Learning ▪ RaySGD: Distributed Training Wrappers ▪ https://github.com/ray-project/ray/
18 .Distributed Training Pipeline on Big Data
19 . RayOnSpark: Seamlessly integrate Ray applications into Spark data processing pipelines. ▪ Runtime cluster environment preparation. ▪ Create a SparkContext on the driver node and use Spark to perform data cleaning, ETL, and preprocessing tasks. ▪ RayContext on the Spark driver launches Ray across the cluster. ▪ Similar to RaySGD, we implement a lightweight shim layer around native MXNet modules for easy deployment on a YARN cluster. ▪ Each MXNet worker takes the local data partition of the Spark RDD or DataFrame from the plasma object store used by Ray.
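As a rough, framework-free illustration of the data-parallel pattern described above, the sketch below has each simulated worker compute a gradient on its own data partition and then combines the gradients before updating a shared weight. It does not use the RayOnSpark, Ray, or MXNet APIs; the one-parameter linear model, the partition contents, and the function names are all assumptions made for the example.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (feature x, target y)


def local_gradient(w: float, partition: List[Sample]) -> float:
    """Gradient of the mean squared error of y ~ w * x on one partition."""
    return sum(2.0 * (w * x - y) * x for x, y in partition) / len(partition)


def aggregate(grads: List[float], sizes: List[int]) -> float:
    """Size-weighted average of per-worker gradients (equals the full-data gradient)."""
    return sum(g * n for g, n in zip(grads, sizes)) / sum(sizes)


# Two hypothetical workers, each holding its own local partition of y = 2x data.
partitions: List[List[Sample]] = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
]
sizes = [len(p) for p in partitions]

w = 0.0
learning_rate = 0.05
for _ in range(50):
    # In a real cluster these calls would run in parallel, one per worker.
    grads = [local_gradient(w, p) for p in partitions]
    w -= learning_rate * aggregate(grads, sizes)

print(round(w, 6))
```

Since every worker only reads the partition it already holds, the training data itself never moves between nodes; only the gradients are exchanged.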
20 .End-to-end Distributed Training Pipeline: Project Orca provides a user-friendly interface for the pipeline. ▪ Minimal code changes and learning effort are needed to scale training from a single node to big data clusters. ▪ The entire pipeline runs on a single cluster. No extra data transfer is needed.

    import mxnet as mx
    from mxnet.gluon.loss import SoftmaxCrossEntropyLoss
    from zoo.orca import init_orca_context
    from zoo.orca.learn.mxnet import Estimator

    # init_orca_context unifies SparkContext and RayContext.
    # num_nodes, cores and memory are placeholders for the cluster resources.
    sc = init_orca_context(cluster_mode="yarn", num_nodes=num_nodes, cores=cores, memory=memory)

    # Use sc to load data and do data preprocessing.
    mxnet_estimator = Estimator(train_config, model=txt,
                                loss=SoftmaxCrossEntropyLoss(),
                                metrics=[mx.metric.Accuracy(), mx.metric.TopKAccuracy(3)])
    mxnet_estimator.fit(data=train_rdd, validation_data=val_rdd, epochs=…, batch_size=…)
21 . Conclusion ▪ Context-Aware Fast Food Recommendation at Burger King with RayOnSpark https://arxiv.org/abs/2010.06197 https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d ▪ For more details of RayOnSpark: https://www.slidestalk.com/w/217 ▪ More information for Analytics Zoo at: https://github.com/intel-analytics/analytics-zoo https://analytics-zoo.github.io/
22 . Unified Analytics + AI Platform Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray https://github.com/intel-analytics/analytics-zoo