Context-Aware Fast Food Recommendation with RayOnSpark at Burger King

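In fast food recommendation, users' real-time ordering behavior and context features such as time, weather, and location are all important signals for making suitable recommendations. At Burger King, we developed a new Transformer Cross Transformer (TxT) recommendation model that uses multiple Transformer encoders to capture user order behavior and complex context features, and combines the Transformer outputs by dot product to generate recommendations. Online A/B test results show that the TxT model not only outperforms the other existing recommendation models, but can also be applied successfully to other recommendation scenarios.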

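In addition, using the RayOnSpark feature provided by Analytics Zoo, we built a complete end-to-end recommendation system with Ray, Apache Spark, and Apache MXNet. It integrates data processing (with Spark) and distributed training (with MXNet and Ray) into a unified data analytics and AI pipeline that runs directly on the same big data cluster where the data is stored. We have successfully deployed this recommendation system at Burger King, and it has already achieved outstanding results in production.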


1. AI on Big Data
Accelerating Data Analytics + AI Solutions At Scale
▪ BigDL: Distributed, High-Performance Deep Learning Framework for Apache Spark https://github.com/intel-analytics/bigdl
▪ Analytics Zoo: Unified Analytics + AI Platform; Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray https://github.com/intel-analytics/analytics-zoo

2. Context-aware Fast Food Recommendation with RayOnSpark at Burger King
LUYANG WANG, Burger King Corporation
KAI HUANG, Intel Corporation

3. Agenda
LUYANG WANG
▪ Food recommendation use case
▪ TxT model in detail
KAI HUANG
▪ AI on big data
▪ Distributed training pipeline with RayOnSpark

4. Food Recommendation Use Case

5. Food Recommendation Use Case Guest arrives ODMB Checks Menu Board Cashier enters order Checks Menu Board Guest completes order


7. Use Case Challenges
▪ Lack of user identifiers
▪ Same-session food compatibilities
▪ Other variables in our use case: location, weather, time, etc.
▪ Deployment challenges

8. Use Case Challenges: Solutions
▪ Session-based recommendation model
▪ Able to take complex context features into consideration
▪ Able to be deployed anywhere, on both edge and cloud

9. Transformer Cross Transformer (TxT)

10. TxT Model Overview: Model Components
▪ Sequence Transformer: takes the item order sequence as input
▪ Context Transformer: takes multiple context features as input
▪ Latent Cross Joint Training: element-wise product of both transformer outputs
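The latent cross step on this slide can be illustrated with a minimal NumPy sketch. It is not the TxT implementation: the random vectors stand in for the two Transformer outputs, and the function name, tensor sizes, and the linear-plus-softmax output layer are all assumptions made for illustration. Only the element-wise product comes from the slide.

```python
import numpy as np

def latent_cross_scores(seq_repr, ctx_repr, out_weight, out_bias):
    """Combine two encoder outputs by element-wise product, then score items.

    The output projection and softmax are assumptions for this sketch.
    """
    crossed = seq_repr * ctx_repr                 # element-wise (Hadamard) product
    logits = crossed @ out_weight + out_bias      # one score per menu item
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
batch, hidden, num_items = 2, 8, 5                # hypothetical sizes

seq_repr = rng.normal(size=(batch, hidden))       # stand-in for Sequence Transformer output
ctx_repr = rng.normal(size=(batch, hidden))       # stand-in for Context Transformer output
out_weight = rng.normal(size=(hidden, num_items)) # hypothetical output layer
out_bias = np.zeros(num_items)

probs = latent_cross_scores(seq_repr, ctx_repr, out_weight, out_bias)
top3 = np.argsort(-probs, axis=-1)[:, :3]         # three highest-scoring items per session
```

Because the product is element-wise, both encoder outputs must share the same hidden size; the context representation then acts as a per-dimension gate on the sequence representation.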

11. Model Comparison: TxT vs. RNN Latent Cross

12. Offline Evaluation
[Figure: Offline Training Loss]
Offline Training Result
Model               Top1 Accuracy   Top3 Accuracy
RNN                 29.98%          46.24%
Contextual ItemCF   32.18%          48.37%
RNN Latent Cross    33.10%          49.98%
TxT                 34.52%          52.37%
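For readers unfamiliar with the Top1 and Top3 accuracy metrics, the sketch below shows how they are commonly computed: a prediction counts as a hit when the true item is among the k highest-scoring items. The function name and the toy scores are made up for illustration and are unrelated to the Burger King data.

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Fraction of rows whose true label is among the k highest scores."""
    top_k = np.argsort(-scores, axis=1)[:, :k]       # indices of the k best items
    hits = (top_k == labels[:, None]).any(axis=1)    # is the true label among them?
    return float(hits.mean())

# Toy scores for 4 sessions over 4 items (illustrative values only).
scores = np.array([
    [0.1, 0.6, 0.2, 0.1],   # true item 1 is ranked first
    [0.5, 0.1, 0.3, 0.1],   # true item 2 is ranked second
    [0.4, 0.3, 0.2, 0.1],   # true item 3 is ranked last
    [0.2, 0.2, 0.5, 0.1],   # true item 2 is ranked first
])
labels = np.array([1, 2, 3, 2])

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
```

Top3 accuracy is always at least as high as Top1 accuracy, which matches the pattern in the table above.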

13. Online Performance
Inference Performance
[Figure: bar chart of Inference Latency (ms) for RNN Latent Cross and TxT; bar labels 20 and 18]
A/B Testing Result
Model                        Conversion Rate Gain   Add-on Sales Gain
RNN Latent Cross (control)   -                      -
TxT                          +7.5%                  +4.7%

14. Model Training Architecture: Previous vs. Current

15. AI on Big Data

16. AI on Big Data
Accelerating Data Analytics + AI Solutions At Scale
▪ BigDL: Distributed Deep Learning Framework for Apache Spark https://github.com/intel-analytics/BigDL
▪ Analytics Zoo: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray https://github.com/intel-analytics/analytics-zoo
▪ We develop Project Orca in Analytics Zoo, based on Spark and Ray, to allow users to easily scale out a single-node Python notebook across large clusters, by providing:
  ▪ Data-parallel preprocessing for Python AI (supporting common Python libraries such as Pandas, NumPy, PIL, TensorFlow Dataset, PyTorch DataLoader, etc.)
  ▪ Sklearn-style APIs for transparently distributed training and inference (supporting TensorFlow, PyTorch, Keras, MXNet, Horovod, etc.)
https://github.com/intel-analytics/analytics-zoo/tree/master/pyzoo/zoo/orca

17. Ray
Ray is a fast and simple framework for building and running distributed applications.
▪ Ray Core provides an easy Python interface for parallelism by using remote functions and actors.
Ray is packaged with several high-level libraries to accelerate machine learning workloads.
▪ Tune: Scalable Experiment Execution and Hyperparameter Tuning
▪ RLlib: Scalable Reinforcement Learning
▪ RaySGD: Distributed Training Wrappers
https://github.com/ray-project/ray/

18. Distributed Training Pipeline on Big Data

19. RayOnSpark
Seamlessly integrate Ray applications into Spark data processing pipelines.
▪ Runtime cluster environment preparation.
▪ Create a SparkContext on the driver node and use Spark to perform data cleaning, ETL, and preprocessing tasks.
▪ RayContext on the Spark driver launches Ray across the cluster.
▪ Similar to RaySGD, we implement a lightweight shim layer around native MXNet modules for easy deployment on a YARN cluster.
▪ Each MXNet worker takes the local data partition of the Spark RDD or DataFrame from the plasma object store used by Ray.
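The idea that each worker trains on its own local data partition can be illustrated with a toy data-parallel loop in plain NumPy. This is only a conceptual sketch: it uses neither Spark, Ray, nor MXNet, a sequential loop stands in for the parallel workers, and the linear model, function names, and hyperparameters are all assumptions.

```python
import numpy as np

def split_into_partitions(features, targets, num_workers):
    """Stand-in for data partitions: one (features, targets) chunk per worker."""
    return list(zip(np.array_split(features, num_workers),
                    np.array_split(targets, num_workers)))

def local_gradient(weights, part_x, part_y):
    """Mean-squared-error gradient of a linear model on one partition."""
    residual = part_x @ weights - part_y
    return part_x.T @ residual / len(part_y)

def train(features, targets, num_workers=4, epochs=200, lr=0.1):
    weights = np.zeros(features.shape[1])
    partitions = split_into_partitions(features, targets, num_workers)
    sizes = [len(part_y) for _, part_y in partitions]
    for _ in range(epochs):
        # In a real cluster these gradients would be computed in parallel.
        grads = [local_gradient(weights, x, y) for x, y in partitions]
        # Aggregate: average the gradients, weighted by partition size.
        weights -= lr * np.average(grads, axis=0, weights=sizes)
    return weights

rng = np.random.default_rng(0)
features = rng.normal(size=(400, 3))
true_weights = np.array([1.5, -2.0, 0.5])     # hypothetical ground truth
targets = features @ true_weights

learned = train(features, targets)
```

Weighting the average by partition size makes the aggregated update equal to the full-batch gradient, so the result does not depend on how the data happens to be split.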

20. End-to-end Distributed Training Pipeline
Project Orca provides a user-friendly interface for the pipeline.
▪ Minimal code changes and learning effort are needed to scale training from a single node to big data clusters.
▪ The entire pipeline runs on a single cluster. No extra data transfer is needed.

    import mxnet as mx
    from mxnet.gluon.loss import SoftmaxCrossEntropyLoss
    from zoo.orca import init_orca_context
    from zoo.orca.learn.mxnet import Estimator

    # init_orca_context unifies SparkContext and RayContext
    sc = init_orca_context(cluster_mode="yarn", num_nodes=num_nodes,
                           cores=cores, memory=memory)
    # Use sc to load data and do data preprocessing.
    mxnet_estimator = Estimator(train_config, model=txt,
                                loss=SoftmaxCrossEntropyLoss(),
                                metrics=[mx.metric.Accuracy(), mx.metric.TopKAccuracy(3)])
    mxnet_estimator.fit(data=train_rdd, validation_data=val_rdd,
                        epochs=…, batch_size=…)

21. Conclusion
▪ Context-Aware Fast Food Recommendation at Burger King with RayOnSpark
  https://arxiv.org/abs/2010.06197
  https://medium.com/riselab/context-aware-fast-food-recommendation-at-burger-king-with-rayonspark-2e7a6009dd2d
▪ For more details of RayOnSpark: https://www.slidestalk.com/w/217
▪ More information for Analytics Zoo at: https://github.com/intel-analytics/analytics-zoo https://analytics-zoo.github.io/

22. Unified Analytics + AI Platform Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray https://github.com/intel-analytics/analytics-zoo