Deploying Real-Time Decision Services

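This talk is a technical session on combining Redis with a Spark-based training platform to deliver real-time prediction and decision features as part of a larger system. To set the context, we first introduce the Redis data model and show how Redis features (namespaces, replication) can be used to build prediction engines that are faster at scale, more reliable, richer in features, and easier to manage than custom applications. From there, we look at the model-serving capabilities of Redis-ML and how to integrate them with Spark-based ML pipelines to automate the whole model development process, from training to deployment.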

1. Deploying Real-Time Decision Services Using Redis
Tague Griffith, Redis Labs #MLSAIS12

2. Why Machine Learning

3. Teaching a computer, by example, an algorithm that is too complex to program

4. Machine Learning Problems
• Classification (pick one of a set): spam detection, manufacturing defect detection, handwriting analysis
• Regression (score or rank): recommendations, likelihood of purchase
• Clustering (group similar): find similar items, customer segmentation, cohort detection
• Example algorithms: Decision Trees, K-Means, Naïve Bayes, Linear Regression, K-Nearest Neighbors, Logistic Regression, SVM, Hierarchical Clustering

5. Supervised Learning: Training a Spam Classifier

6. Deploying a Spam Classifier
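Slides 5 and 6 describe training a spam classifier on labeled examples and then deploying it to score new messages. The following is a minimal sketch, not the talk's code: a multinomial Naïve Bayes classifier (one of the algorithms listed on slide 4) in plain Python, trained on a made-up four-message dataset. The names `train`, `predict` and `TRAINING_DATA` are hypothetical.

```python
import math
from collections import Counter

# Toy labeled data, invented for illustration only.
TRAINING_DATA = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

def train(examples):
    """Training step: count words per class and messages per class."""
    word_counts = {}          # class -> Counter of word frequencies
    class_counts = Counter()  # class -> number of training messages
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return {"word_counts": word_counts, "class_counts": class_counts,
            "vocabulary": vocabulary}

def predict(model, text):
    """Serving step: return the class with the highest log-probability."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        # Log prior plus Laplace-smoothed log likelihood of each word.
        score = math.log(doc_count / total_docs)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING_DATA)
print(predict(model, "cheap money"))       # spam
print(predict(model, "meeting tomorrow"))  # ham
```

The `model` dictionary is the "box" the next slide asks about: something has to hold it and answer `predict` calls reliably at production rates.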

7. How Do We Build These Boxes? ¯\_(ツ)_/¯

8. Building high-performance, reliable services is hard. Isn't there something we can deploy?

9. Redis-ML

10. Typical Spark Application Structure
Diagram: Spark Training → File System → Custom Server → Client App. Data is loaded into Spark, the model is saved in files, and the model is loaded into your custom serving app.
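In the pipeline on slide 10, the model travels from training to serving through files. A minimal sketch of that hand-off in plain Python, assuming a JSON file and a linear model whose coefficients are made up to stand in for a Spark training run (Spark's own model persistence is not shown). `save_model` and `CustomServer` are hypothetical names.

```python
import json
import os
import tempfile

def save_model(model, path):
    """Training side: persist the learned parameters to a file."""
    with open(path, "w") as f:
        json.dump(model, f)

class CustomServer:
    """Serving side: a hand-written app that loads the file and scores requests."""

    def __init__(self, path):
        with open(path) as f:
            self.model = json.load(f)

    def predict(self, features):
        # Linear model: bias + sum of weight * feature value.
        score = self.model["bias"]
        for name, weight in self.model["weights"].items():
            score += weight * features.get(name, 0.0)
        return score

# Hypothetical coefficients standing in for the output of a training job.
trained = {"bias": 0.5, "weights": {"age": -0.01, "fare": 0.02}}

path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(trained, path)
server = CustomServer(path)
print(server.predict({"age": 30.0, "fare": 50.0}))
```

Everything after `save_model` (loading, scoring, scaling, failover) is code you write and operate yourself, which is the burden the deck proposes moving into Redis-ML.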

11. Redis-ML: Predictive Model Serving Engine
• Predictive models as native Redis types
• Perform evaluation directly in Redis
• Store training output as a “hot model”
Diagram: Any Training Platform (e.g. Spark Training) → Redis-ML → Client Apps. Data is loaded into Spark, the model is saved in Redis-ML, and Redis-ML serves the client apps directly.

12. REmote DIctionary Server (Redis)
Data types: Strings, Hashes, Lists, Sets, Sorted Sets, Bitmaps, Bitfields, HyperLogLogs, Geospatial indexes

13. A Quick Recap of Redis
Each key maps to a value of one of these types:
• Strings / Bitmaps / BitFields: "I'm a Plain Text String!"
• Hash Tables (objects!): { A: “foo”, B: “bar”, C: “baz” }
• Linked Lists: [A→B→C→D→E]
• Sets: { A, B, C, D, E }
• Sorted Sets: { A: 0.1, B: 0.3, C: 100, D: 1337 }
• Geo Sets: { A: (51.5, 0.12), B: (32.1, 34.7) }
• HyperLogLog: 00110101 11001110 10101010

14. Redis Modules
• Any C/C++ program can now run on Redis
• Use existing data structures or add new ones
• Enjoy simplicity, infinite scalability, and high availability while keeping the native speed of Redis
• Can be created by anyone
New data types, new commands, new capabilities

15. Redis-ML Module
A Redis module providing:
• Tree ensembles
• Linear regression
• Logistic regression
• Matrix + vector operations
• More to come...

16. Random Forest Model
• A collection of decision trees
• Supports classification and regression
• A splitter node can be:
◦ Categorical (e.g. day == “Sunday”)
◦ Numerical (e.g. age < 43)
• The decision is taken by majority vote of the decision trees
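The forest structure above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Redis-ML's internal representation: the tuple node encoding, the rule that a passing test sends evaluation to the left child, and the three toy trees are all hypothetical. Averaging leaf values for regression is a common convention; the slide itself only mentions majority voting.

```python
from collections import Counter

# Hypothetical node encoding for this sketch:
#   ("leaf", prediction)
#   ("numeric", attr, threshold, left, right)   test: features[attr] < threshold
#   ("categoric", attr, value, left, right)     test: features[attr] == value
# Assumption: when the test passes, evaluation continues in the left child.

def eval_tree(node, features):
    """Walk one decision tree from the root down to a leaf."""
    while node[0] != "leaf":
        kind, attr, split_val, left, right = node
        if kind == "numeric":
            passed = features[attr] < split_val
        else:  # "categoric"
            passed = features[attr] == split_val
        node = left if passed else right
    return node[1]

def classify(forest, features):
    """Classification: the most common leaf value across all trees wins."""
    votes = Counter(eval_tree(tree, features) for tree in forest)
    return votes.most_common(1)[0][0]

def regress(forest, features):
    """Regression: average the trees' leaf values."""
    return sum(eval_tree(tree, features) for tree in forest) / len(forest)

# Three toy trees built from the slide's splitter examples.
forest = [
    ("categoric", "day", "Sunday", ("leaf", 1), ("leaf", 0)),
    ("numeric", "age", 43, ("leaf", 0), ("leaf", 1)),
    ("categoric", "day", "Sunday",
        ("numeric", "age", 43, ("leaf", 1), ("leaf", 0)),
        ("leaf", 0)),
]

print(classify(forest, {"day": "Sunday", "age": 30}))  # 1 (two of three trees vote 1)
print(regress(forest, {"day": "Sunday", "age": 30}))
```

Each tree only needs the features its splitter nodes mention, so a feature vector can carry more attributes than any single tree uses.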

17. Classic Tree Problem: Titanic Survival
• Passenger data is encoded as a feature vector
• An ML algorithm learns the tree rules (ID3, CART (RPART), etc.)
• The tree rules are used to infer results
Tree: Sex = Male? NO → Survived. YES → Age < 9.5? NO → Died. YES → SibSp > 2.5? YES → Died; NO → Survived.
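The tree rules on slide 17 can be written as a small nested data structure and evaluated against a passenger's feature vector. A sketch assuming the branch layout as read from the extracted slide; the names `TITANIC_TREE`, `OPS` and `infer`, and the lowercase feature keys, are hypothetical.

```python
import operator

# Comparison operators used by the splitter nodes on the slide.
OPS = {"==": operator.eq, "<": operator.lt, ">": operator.gt}

# Splitter node: (attribute, operator, value, yes_branch, no_branch); leaf: a string.
TITANIC_TREE = (
    "sex", "==", "male",
    ("age", "<", 9.5,
        ("sibsp", ">", 2.5, "Died", "Survived"),
        "Died"),
    "Survived",
)

def infer(tree, passenger):
    """Follow the tree rules for one passenger's feature vector."""
    node = tree
    while isinstance(node, tuple):
        attr, op, value, yes_branch, no_branch = node
        node = yes_branch if OPS[op](passenger[attr], value) else no_branch
    return node

print(infer(TITANIC_TREE, {"sex": "female", "age": 30, "sibsp": 0}))  # Survived
print(infer(TITANIC_TREE, {"sex": "male", "age": 5, "sibsp": 1}))     # Survived
```

Keeping the rules as data rather than hard-coded `if` statements is what makes it possible to load a freshly trained tree into a serving engine without redeploying code.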

18. Titanic Survival: Random Forest
Diagram: three decision trees. Tree #1 splits first on Weight < 80kg?, Tree #2 on Sex = Male?, and Tree #3 on Country = US?. Deeper splitter nodes include Age < 9.5?, State = CA?, I.Q. < 100?, SibSp > 2.5?, Height > 1.60m? and Eye color = blue?, and every leaf predicts Died or Survived.

19. Who Would Survive the Titanic?
• John: male, 34; married with 2 kids (SibSp=3); New York, USA; 1.78m, 78kg; 110 IQ; blue eyes
• Mathew: male, 6; 3 sisters (SibSp=3); New York, USA; 1.06m, 22.7kg; 100 IQ; brown eyes
Let's use our forest to find out.
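John and Mathew can be encoded as feature vectors and run through a tree. Only the single tree from slide 17 is fully recoverable here, so this sketch applies that tree alone; the majority vote of the full forest on slide 18 could give a different answer. The dictionary keys and the function name `titanic_tree` are hypothetical.

```python
# Feature vectors for the two passengers; the key names are made up for this sketch.
john = {"sex": "male", "age": 34, "sibsp": 3, "location": "New York, USA",
        "height_m": 1.78, "weight_kg": 78, "iq": 110, "eyes": "blue"}
mathew = {"sex": "male", "age": 6, "sibsp": 3, "location": "New York, USA",
          "height_m": 1.06, "weight_kg": 22.7, "iq": 100, "eyes": "brown"}

def titanic_tree(p):
    """Rules of the single tree on slide 17: Sex = Male? -> Age < 9.5? -> SibSp > 2.5?"""
    if p["sex"] != "male":
        return "Survived"
    if not p["age"] < 9.5:
        return "Died"
    return "Died" if p["sibsp"] > 2.5 else "Survived"

for name, passenger in [("John", john), ("Mathew", mathew)]:
    print(name, titanic_tree(passenger))
# John Died
# Mathew Died
```

John is stopped at the Age < 9.5? split; Mathew passes it but SibSp > 2.5? sends him to Died. The other trees on slide 18 split on height, weight, eye color and similar attributes, which is why the feature vectors carry them even though this one tree ignores them.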

20. Redis: Forest Data Type
Add nodes to a tree in a forest:
ML.FOREST.ADD <forestId> <treeId> <path> [ [NUMERIC|CATEGORIC] <splitterAttr> <splitterVal> ] | [LEAF] <predVal>
Perform classification/regression of a feature vector:
ML.FOREST.RUN <forestId> <features> [CLASSIFICATION|REGRESSION]
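A trained tree has to be flattened into one ML.FOREST.ADD call per node. The sketch below only builds argument lists following the syntax on the slide; it does not talk to Redis. Several details are assumptions not confirmed by the slide: the path notation (root is `.`, each step appends `l` or `r`), which child a passing test leads to, and the `name:value,name:value` format of the `<features>` argument. `forest_add_commands` and `forest_run_command` are hypothetical helpers.

```python
def forest_add_commands(forest_id, tree_id, tree):
    """Flatten a tree into ML.FOREST.ADD argument lists, one per node.

    Node encoding (hypothetical): ("leaf", value) or
    ("numeric" | "categoric", attr, split_value, left, right).
    Assumed path notation: "." for the root, then "l"/"r" per step down.
    """
    commands = []

    def visit(node, path):
        if node[0] == "leaf":
            commands.append(["ML.FOREST.ADD", forest_id, str(tree_id),
                             path, "LEAF", str(node[1])])
        else:
            kind, attr, value, left, right = node
            commands.append(["ML.FOREST.ADD", forest_id, str(tree_id),
                             path, kind.upper(), attr, str(value)])
            visit(left, path + "l")
            visit(right, path + "r")

    visit(tree, ".")
    return commands

def forest_run_command(forest_id, features, mode="CLASSIFICATION"):
    """Build an ML.FOREST.RUN call; the feature-string format is an assumption."""
    feature_str = ",".join(f"{k}:{v}" for k, v in features.items())
    return ["ML.FOREST.RUN", forest_id, feature_str, mode]

# Toy tree: 1 = Survived, 0 = Died (labels chosen for this sketch).
tree = ("categoric", "sex", "male",
        ("numeric", "age", 9.5, ("leaf", 1), ("leaf", 0)),
        ("leaf", 1))

for cmd in forest_add_commands("titanic", 0, tree):
    print(" ".join(cmd))
print(" ".join(forest_run_command("titanic", {"sex": "male", "age": 6})))
```

With redis-py and the Redis-ML module loaded, each list could be sent with `r.execute_command(*cmd)`; check the module's own documentation for the exact path and feature syntax before relying on this.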

21. Real-World Challenge
• Ad serving company
• Needs to serve 20,000 ads/sec at 50 ms data-center latency
• Runs 1K campaigns → 1K random forests
• Each forest has 15K trees
• On average, each tree has 7 levels (depth)
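A back-of-the-envelope estimate shows why this workload is demanding. It rests on two assumptions the slide does not state: that each ad request evaluates one campaign's forest, and that walking a tree costs roughly one comparison per level. The node count additionally pretends every tree is a full binary tree with 7 levels, so it is only a rough size estimate.

```python
ADS_PER_SEC = 20_000
FORESTS = 1_000
TREES_PER_FOREST = 15_000
AVG_DEPTH = 7

# Assumption: one forest evaluation per ad, about one comparison per tree level.
comparisons_per_forest = TREES_PER_FOREST * AVG_DEPTH       # 105,000
comparisons_per_sec = comparisons_per_forest * ADS_PER_SEC  # 2,100,000,000

# Model size if every tree were a full binary tree with 7 levels.
total_trees = FORESTS * TREES_PER_FOREST                     # 15,000,000
nodes_per_full_tree = 2 ** AVG_DEPTH - 1                     # 127
approx_total_nodes = total_trees * nodes_per_full_tree       # 1,905,000,000

print(f"{comparisons_per_sec:,} comparisons/sec, ~{approx_total_nodes:,} nodes")
```

Roughly two billion tree-node comparisons per second, against a model of the order of a billion nodes, all inside a 50 ms latency budget per request, is the scale the next slide's cost comparison refers to.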

22. Ad Serving Costs: Homegrown vs. Redis
Cut computing infrastructure by 97%:
• Homegrown: 1,247 × c4.8xlarge
• Redis: 35 × c4.8xlarge
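The 97% figure follows directly from the instance counts, assuming the same instance type (c4.8xlarge), and therefore the same per-instance cost, on both sides:

```python
HOMEGROWN_INSTANCES = 1_247
REDIS_INSTANCES = 35

# Fraction of instances (and, at equal per-instance cost, of spend) eliminated.
reduction = 1 - REDIS_INSTANCES / HOMEGROWN_INSTANCES
print(f"{reduction:.1%}")  # 97.2%
```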

23. Summary
• Train with Spark, serve with Redis
• 97% reduction in serving resource cost
• Simplify the ML lifecycle
• Redis Enterprise (Cloud or Pack):
‒ Scaling, HA, performance
‒ PAYG: cost optimized
‒ Ease of use
‒ Supported by the teams who created Spark and Redis
Diagram: Spark Training + Redis-ML → Client Apps. Data is loaded into Spark, the model is saved in Redis-ML, and Redis-ML serves the client apps.

24. Thank you!