Using Real-Time Decision to Deploy Real-Time Decision Services
1 .Deploying Real-Time Decision Services Using Redis
Tague Griffith, Redis Labs
#MLSAIS12
2 .Why Machine Learning?
3 . Teaching a computer, by example, an algorithm that is too complex to program
4 .Machine Learning Problems
• Classification (pick one of a set): spam detection, manufacturing defect detection, handwriting analysis
• Regression (score or rank): recommendations, likelihood of purchase
• Clustering (group similar): find similar items, customer segmentation, cohort detection
• Typical algorithms: Decision Trees, K-Means, Naïve Bayes, Linear Regression, K-Nearest Neighbors, Logistic Regression, SVM, Hierarchical Clustering
5 .Supervised Learning: Training a Spam Classifier
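To make "training by example" concrete, here is a minimal sketch of a multinomial Naïve Bayes spam classifier in plain Python. The talk does not say which algorithm or tooling its spam classifier used; the training messages, the whitespace tokenizer and the `train`/`classify` names are hypothetical choices for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(examples):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    doc_counts = Counter()   # number of training messages per label
    word_counts = {}         # label -> Counter of word occurrences
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return {"doc_counts": doc_counts, "word_counts": word_counts, "vocab": vocab}


def classify(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing the probability.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled examples: the data we "teach" with.
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to friday", "ham"),
    ("lunch with the team on friday", "ham"),
]

model = train(training_data)
print(classify(model, "claim your free prize"))   # → spam
print(classify(model, "team meeting on friday"))  # → ham
```

The output of training here is just a few count tables; that learned state is what later has to be shipped to a serving system.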
6 .Deploying a Spam Classifier
7 .How Do We Build These Boxes? ¯\_(ツ)_/¯
8 .• Building high-performance, reliable services is hard. Isn't there something we can just deploy?
9 .Redis-ML
10 .Typical Spark Application Structure
[Diagram: data is loaded into Spark for training → the model is saved in files on a file system → the model is loaded into your custom app (custom server) → serving client app]
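The file-based hand-off on this slide can be sketched as follows: the training side writes model parameters to a file, and a custom server loads that file and scores requests. The JSON format, the file name, the linear-model weights and the `CustomModelServer` class are all hypothetical; a real Spark job would typically write Spark's own model persistence format, which the custom server would then have to read.

```python
import json
import os
import tempfile

# Training side (stand-in for the Spark job): persist learned parameters
# to the file system. The weights below are hypothetical.
trained_model = {"intercept": -1.0,
                 "weights": {"num_links": 0.8, "has_attachment": 1.5}}

model_path = os.path.join(tempfile.mkdtemp(), "spam_model.json")
with open(model_path, "w") as f:
    json.dump(trained_model, f)


class CustomModelServer:
    """Serving side: load the model file once, then score each request."""

    def __init__(self, path):
        with open(path) as f:
            self.model = json.load(f)

    def score(self, features):
        # Linear score; features missing from the request count as 0.
        total = self.model["intercept"]
        for name, weight in self.model["weights"].items():
            total += weight * features.get(name, 0.0)
        return total


server = CustomModelServer(model_path)
print(server.score({"num_links": 5, "has_attachment": 1}))  # → 4.5
```

Everything around `score` (networking, scaling, reloading new model files, failover) is the part the talk argues is hard to build reliably yourself.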
11 .Redis-ML: Predictive Model Serving Engine
• Predictive models as native Redis types
• Perform evaluation directly in Redis
• Store training output as a “hot model”
[Diagram: any training platform (e.g. Spark): data is loaded into Spark → the model is saved in Redis-ML → serving client apps query Redis-ML]
12 .REmote DIctionary Server (Redis) data types: Strings, Hashes, Lists, Sets, Sorted Sets, Bitmaps, Bitfields, HyperLogLogs, Geospatial
13 .A Quick Recap of Redis
Each key maps to one of these value types:
• Strings / Bitmaps / BitFields: "I'm a Plain Text String!"
• Hash Tables (objects!): { A: “foo”, B: “bar”, C: “baz” }
• Linked Lists: [A→B→C→D→E]
• Sets: { A, B, C, D, E }
• Sorted Sets: { A: 0.1, B: 0.3, C: 100, D: 1337 }
• Geo Sets: { A: (51.5, 0.12), B: (32.1, 34.7) }
• HyperLogLog: 00110101 11001110 10101010
14 .Redis Modules
• Any C/C++ program can now run inside Redis as a module
• Use existing data structures or add new ones
• Enjoy simplicity, scalability and high availability while keeping the native speed of Redis
• Can be created by anyone
Modules can add new data types, new commands and new capabilities.
15 .Redis-ML Module
A Redis module providing:
• Tree ensembles
• Linear regression
• Logistic regression
• Matrix + vector operations
• More to come...
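For the linear and logistic regression models listed here, serving a prediction is a small amount of arithmetic over stored coefficients. The Python below is a local sketch of that arithmetic only; it does not reproduce the Redis-ML command interface, and `linear_predict`/`logistic_predict` are hypothetical names.

```python
import math


def linear_predict(intercept, coefficients, features):
    """Linear regression: y = b0 + sum(b_i * x_i)."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))


def logistic_predict(intercept, coefficients, features):
    """Logistic regression: probability = sigmoid(linear score)."""
    z = linear_predict(intercept, coefficients, features)
    return 1.0 / (1.0 + math.exp(-z))


print(linear_predict(1.0, [2.0, -0.5], [3.0, 4.0]))  # 1 + 6 - 2 → 5.0
print(logistic_predict(0.0, [1.0], [0.0]))           # sigmoid(0) → 0.5
```

Because evaluation is this cheap, keeping the coefficients next to the data (rather than in a separate custom service) is what makes in-database serving attractive.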
16 .Random Forest Model
• A collection of decision trees
• Supports classification & regression
• A splitter node can be:
◦ Categorical (e.g. day == “Sunday”)
◦ Numerical (e.g. age < 43)
• For classification, the decision is taken by majority vote of the decision trees (for regression, the trees' outputs are typically averaged)
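A random forest with categorical and numerical splitters can be sketched in a few lines of Python. The `Leaf`/`Split` classes, the convention that a numerical splitter tests `x < value` and a categorical one tests `x == value`, and the toy three-tree forest are illustrative assumptions, not Redis-ML's internal representation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    prediction: object


@dataclass
class Split:
    attr: str
    kind: str      # "numeric" tests x < value; "categoric" tests x == value
    value: object
    yes: "Node"    # branch taken when the test holds
    no: "Node"     # branch taken otherwise


Node = Union[Leaf, Split]


def predict_tree(node, features):
    # Walk from the root until a leaf is reached.
    while isinstance(node, Split):
        x = features[node.attr]
        passed = x < node.value if node.kind == "numeric" else x == node.value
        node = node.yes if passed else node.no
    return node.prediction


def predict_forest(trees, features, mode="classification"):
    predictions = [predict_tree(t, features) for t in trees]
    if mode == "classification":
        # Majority vote; an odd number of trees avoids binary ties.
        return Counter(predictions).most_common(1)[0][0]
    return sum(predictions) / len(predictions)  # regression: mean of trees


# Hypothetical toy forest using the slide's example splitters.
forest = [
    Split("day", "categoric", "Sunday", Leaf(1), Leaf(0)),
    Split("age", "numeric", 43, Leaf(1), Leaf(0)),
    Split("age", "numeric", 30, Leaf(0), Leaf(1)),
]
features = {"day": "Monday", "age": 35}
print(predict_forest(forest, features))                 # votes 0, 1, 1 → 1
print(predict_forest(forest, features, "regression"))   # mean of 0, 1, 1
```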
17 .Classic Tree Problem: Titanic Survival
• Passenger data is encoded as a feature vector
• An ML algorithm (ID3, CART (RPART), etc.) learns the tree rules
• The tree rules are used to infer results
Sex = Male?
  NO → Survived
  YES → Age < 9.5?
    NO → Died
    YES → SibSp > 2.5?
      YES → Died
      NO → Survived
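Once learned, this tree is just a set of nested rules. A minimal Python rendering, assuming `sex`, `age` and `sibsp` (number of siblings/spouses aboard) as the feature names:

```python
def titanic_tree(sex, age, sibsp):
    """The classic Titanic survival tree from the slide, as plain rules."""
    if sex != "male":
        return "Survived"
    if age >= 9.5:      # "Age < 9.5?" answered NO
        return "Died"
    if sibsp > 2.5:     # "SibSp > 2.5?" answered YES
        return "Died"
    return "Survived"


print(titanic_tree("female", 30, 0))  # → Survived
print(titanic_tree("male", 6, 0))     # → Survived
```

Inference is a handful of comparisons per tree, which is why tree models are cheap to evaluate at serving time.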
18 .Titanic Survival: Random Forest
Tree #1:
Weight < 80kg?
  NO → Survived
  YES → Age < 9.5?
    NO → Died
    YES → SibSp > 2.5?
      YES → Died
      NO → Survived
Tree #2:
Sex = Male?
  NO → Survived
  YES → State = CA?
    NO → Died
    YES → Height > 1.60m?
      YES → Died
      NO → Survived
Tree #3:
Country = US?
  NO → Survived
  YES → I.Q. < 100?
    NO → Died
    YES → Eye color = blue?
      YES → Died
      NO → Survived
19 .Who Would Survive the Titanic?
• John: male, 34; married w/ 2 kids (SibSp=3); New York, USA; 1.78m, 78kg; 110 IQ; blue eyes
• Mathew: male, 6; 3 sisters (SibSp=3); New York, USA; 1.06m, 22.7kg; 100 IQ; brown eyes
Let's use our forest to find out.
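Running both passengers through the three-tree forest can be sketched as below. The feature names, the encoding of "New York" as state `"NY"`, and reading each slide tree as "YES on the left, NO on the right" are assumptions made for this illustration.

```python
from collections import Counter


def tree1(p):
    # Weight < 80kg? -> Age < 9.5? -> SibSp > 2.5?
    if p["weight_kg"] >= 80:
        return "Survived"
    if p["age"] >= 9.5:
        return "Died"
    return "Died" if p["sibsp"] > 2.5 else "Survived"


def tree2(p):
    # Sex = Male? -> State = CA? -> Height > 1.60m?
    if p["sex"] != "male":
        return "Survived"
    if p["state"] != "CA":
        return "Died"
    return "Died" if p["height_m"] > 1.60 else "Survived"


def tree3(p):
    # Country = US? -> I.Q. < 100? -> Eye color = blue?
    if p["country"] != "US":
        return "Survived"
    if p["iq"] >= 100:
        return "Died"
    return "Died" if p["eye_color"] == "blue" else "Survived"


def forest_vote(p):
    """Majority vote over the three trees; also return the individual votes."""
    votes = [tree(p) for tree in (tree1, tree2, tree3)]
    return Counter(votes).most_common(1)[0][0], votes


john = {"sex": "male", "age": 34, "sibsp": 3, "state": "NY", "country": "US",
        "height_m": 1.78, "weight_kg": 78, "iq": 110, "eye_color": "blue"}
mathew = {"sex": "male", "age": 6, "sibsp": 3, "state": "NY", "country": "US",
          "height_m": 1.06, "weight_kg": 22.7, "iq": 100, "eye_color": "brown"}

for name, passenger in (("John", john), ("Mathew", mathew)):
    print(name, forest_vote(passenger))
```

Under these assumptions every tree votes "Died" for both passengers: John is too old for the age split, neither lives in CA, and neither has an I.Q. below 100, so the eye-color and height splits are never reached.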
20 .Redis: Forest Data Type
Add nodes to a tree in a forest:
ML.FOREST.ADD <forestId> <treeId> <path> [ [NUMERIC|CATEGORIC] <splitterAttr> <splitterVal> ] | [LEAF] <predVal>
Perform classification/regression of a feature vector:
ML.FOREST.RUN <forestId> <features> [CLASSIFICATION|REGRESSION]
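The sketch below turns the classic Titanic tree into argument lists that follow the ML.FOREST.ADD syntax above, one node per command, plus a features string for ML.FOREST.RUN. Several details are assumptions, not confirmed by the slide: that paths start at "." and append "l"/"r" per level, that the left child is taken when the splitter test holds, that NUMERIC tests `attr < value`, and that features are comma-separated `attr:value` pairs. The helper names are hypothetical.

```python
def forest_add_commands(forest_id, tree_id, nodes):
    """Build one ML.FOREST.ADD argument list per node (slide syntax).

    `nodes` maps a node path to either ("LEAF", pred_val) or
    (kind, splitter_attr, splitter_val) with kind "NUMERIC" or "CATEGORIC".
    Commands are emitted in dict order, so list parents before children.
    """
    commands = []
    for path, node in nodes.items():
        if node[0] == "LEAF":
            tail = ["LEAF", str(node[1])]
        else:
            kind, attr, value = node
            tail = [kind, attr, str(value)]
        commands.append(["ML.FOREST.ADD", forest_id, str(tree_id), path, *tail])
    return commands


def features_arg(features):
    """Encode a feature vector as comma-separated attr:value pairs (assumed format)."""
    return ",".join(f"{attr}:{value}" for attr, value in features.items())


# Classic Titanic tree, 1 = survived, 0 = died (path/branch conventions assumed).
titanic_tree = {
    ".": ("CATEGORIC", "sex", "male"),
    ".l": ("NUMERIC", "age", 9.5),
    ".ll": ("NUMERIC", "sibsp", 2.5),  # sibsp < 2.5 is the slide's "SibSp > 2.5? NO"
    ".lll": ("LEAF", 1),
    ".llr": ("LEAF", 0),
    ".lr": ("LEAF", 0),
    ".r": ("LEAF", 1),
}

for cmd in forest_add_commands("titanic", 0, titanic_tree):
    print(" ".join(cmd))
print("ML.FOREST.RUN titanic",
      features_arg({"sex": "male", "age": 34, "sibsp": 3}), "CLASSIFICATION")
```

With a Redis server that has the Redis-ML module loaded, each argument list could be sent through a generic client call such as redis-py's `execute_command(*cmd)`; the exact path and feature encodings should be checked against the module's own documentation before relying on them.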
21 .Real-World Challenge
• Ad serving company
• Needs to serve 20,000 ads/sec within a 50 ms data-center latency budget
• Runs 1K campaigns → 1K random forests (one per campaign)
• Each forest has 15K trees
• On average, each tree is 7 levels deep
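A back-of-the-envelope estimate shows the scale of this workload. The per-forest figure follows from the slide; the per-request and per-second figures rest on a hypothetical assumption that every ad request scores all campaigns' forests, which the slide does not state.

```python
# Figures from the slide.
ADS_PER_SEC = 20_000
CAMPAIGNS = 1_000          # one random forest per campaign
TREES_PER_FOREST = 15_000
AVG_DEPTH = 7              # roughly one splitter test per level, root to leaf
LATENCY_BUDGET_MS = 50

# One forest evaluation walks every tree from root to leaf.
tests_per_forest = TREES_PER_FOREST * AVG_DEPTH

# Hypothetical assumption: every ad request scores all campaigns' forests.
tests_per_request = tests_per_forest * CAMPAIGNS
tests_per_second = tests_per_request * ADS_PER_SEC

# Rate needed to finish one request's work serially inside the latency budget.
tests_per_ms_in_budget = tests_per_request // LATENCY_BUDGET_MS

print(f"{tests_per_forest:,} splitter tests per forest")   # 105,000
print(f"{tests_per_request:,} per request")                # 105,000,000
print(f"{tests_per_second:,} per second")                  # 2,100,000,000,000
print(f"{tests_per_ms_in_budget:,} per ms to meet 50 ms")  # 2,100,000
```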
22 .Ad Serving Costs: Homegrown vs. Redis
• Homegrown: 1,247 × c4.8xlarge
• Redis: 35 × c4.8xlarge
Redis cut the computing infrastructure by 97%.
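The 97% figure follows directly from the instance counts. The per-instance throughput lines additionally assume that both deployments served the same 20,000 ads/sec workload from the previous slide, which is an assumption of this sketch.

```python
HOMEGROWN_INSTANCES = 1_247
REDIS_INSTANCES = 35
ADS_PER_SEC = 20_000  # assumed: same workload for both deployments

reduction = 1 - REDIS_INSTANCES / HOMEGROWN_INSTANCES
print(f"reduction: {reduction:.1%}")  # → 97.2%
print(f"ads/sec per instance, homegrown: {ADS_PER_SEC / HOMEGROWN_INSTANCES:.1f}")
print(f"ads/sec per instance, Redis: {ADS_PER_SEC / REDIS_INSTANCES:.1f}")
```

Put differently, the homegrown deployment needed about 35.6 times as many instances of the same type.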
23 .Summary
• Train with Spark, serve with Redis
• 97% lower resource cost for serving
• Simplify the ML lifecycle
Redise (Redis Enterprise) Cloud or Pack:
‒ Scaling, HA, performance
‒ PAYG (pay as you go), cost optimized
‒ Ease of use
‒ Supported by the teams who created Spark and Redis
[Diagram: data is loaded into Spark for training → the model is saved in Redis-ML → serving client apps query Redis-ML]
24 .Thank you!