Explain Yourself: Why You Get the Recommendations You Do
1. WiFi SSID: SparkAISummit | Password: UnifiedAnalytics
2. Explain Yourself: Why You Get the Recommendations You Do
Niels Hanson, KPMG
Kishori Konwar, Broad Institute (Harvard/MIT)
Guohao Xiao, KPMG
#UnifiedAnalytics #SparkAISummit
3. Recommender engines
• Supercharged online retail and have become essential tools for personalizing web experiences
• Recommender engines drive online engagement and sales
• The models themselves are highly influential on the type of content being consumed
• Netflix and YouTube report that recommendations are the primary way customers find content on their platforms
4. Explainable recommendations
• There is a transparency problem: many recommender models are viewed as black boxes
• Customers are only vaguely aware that recommendations are related to their activity and profiles
• Moreover, in the era of GDPR there are calls for increased accountability and transparency of models
• We should strive for models that are both effective and understandable
• This talk: introduce a method for spark.ml to explain recommendations coming from its popular ALS recommender model
5. Generally, there are two styles of recommender systems
Collaborative filtering: use user + item interactions to build a model from past behavior
• Able to recommend products based on general trends in the data
• Often requires a lot of data to identify these trends and get started, leading to the cold-start problem
Content-based filtering: use characteristics of an item to recommend similar items
• Able to recommend right away using any item as a starting point
• Difficult to recommend items far outside this starting point
The two are often combined to form hybrid recommender systems
6. Matrix-factorized collaborative filtering
Probably the most widely used recommender model:
• Tries to predict missing values of the user-item consumption matrix by decomposing it into one or more latent factors
• These factors represent general trends of user-item consumption
• A large component of the model that won the $1M Netflix Prize in 2009 [1]
• Recommender model implemented in the spark.mllib and spark.ml libraries [2]
– Known to return relevant recommendations
– Still a major part of many recommender systems
– The Spark implementation via Alternating Least Squares (ALS) is highly scalable to large data
1. https://www.nytimes.com/2009/09/22/technology/internet/22netflix.html
2. Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative Filtering for Implicit Feedback Datasets. ICDM, 263–272. http://doi.org/10.1109/ICDM.2008.22
7. Some details: matrix-factorized collaborative filtering
The model itself has a number of tunable parameters:
• Rank: number of latent factors to fit
• Lambda (regParam): regularization parameter for ALS
• Alpha: "confidence" parameter that weights user activity
• Model type: was the user interaction actively given (explicit) or inferred from observed activity (implicit)?
Tip: explicit and implicit models are actually quite different; you'll need to know which situation you are in.
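To make the parameters concrete, here is a minimal NumPy sketch (not from the talk; variable names and toy data are illustrative) of the closed-form user-factor update in the implicit-feedback model of Hu et al. [2], showing exactly where rank, lambda, and alpha enter:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, rank = 6, 3          # rank: number of latent factors to fit
lam, alpha = 0.1, 40.0        # lambda (regParam) and alpha ("confidence" weight)

Y = rng.normal(size=(n_items, rank))             # item-factor matrix
r_u = np.array([3.0, 0.0, 1.0, 0.0, 0.0, 5.0])  # one user's raw activity counts

# Implicit model: binarize activity into preferences p,
# and turn raw counts into confidences c = 1 + alpha * r
p_u = (r_u > 0).astype(float)
c_u = 1.0 + alpha * r_u

# Closed-form update: x_u = (Y^T C_u Y + lam*I)^-1 Y^T C_u p_u
A = Y.T @ (c_u[:, None] * Y) + lam * np.eye(rank)
x_u = np.linalg.solve(A, Y.T @ (c_u * p_u))

scores = Y @ x_u  # predicted preference for every item
```

ALS alternates this update between user factors and item factors until convergence; an explicit-model update would fit the raw ratings directly rather than confidence-weighted binary preferences.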
8. User-spaces and item-spaces from implicit factors
An overlooked aspect of CF is that implicit factors capture lots of information about how customers consume products:
• It is straightforward to find users and items with similar consumption patterns
– Similar users will have similar user-factors
– Similar items will have similar item-factors
• Grouping users together can inform advertising campaigns around users' shared interests
• Grouping items together might inform cross-selling initiatives
While these are great use cases in their own right, we can also take a more user-specific perspective of the item-space…
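The "similar items have similar item-factors" point can be sketched with cosine similarity in factor space (a generic illustration with random data, not the talk's code):

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(5, 3))   # item factors from a fitted implicit model

# Items with similar consumption patterns end up with similar factor
# vectors, so cosine similarity in factor space finds "similar items"
Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
item_sim = Yn @ Yn.T

# Most similar item to item 0, excluding itself
nearest = int(np.argsort(-item_sim[0])[1])
```

The same computation on the user-factor matrix groups similar users; clustering either similarity matrix is a common way to drive the advertising and cross-selling use cases above.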
9. A user-specific item-space
We might want an item-space that is customized to the interests of the user. We do this by creating a stretch-and-rotate matrix (Wu) that linearly transforms our item similarities based on the user's interests (Cu): Wu scales and rotates the implicit item factors based on the activities of the user. From Wu we can generate a user-specific item-similarity matrix (Su):
Wu = (Yᵀ Cu Y + λI)⁻¹
Su = Y Wu Yᵀ
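These two formulas can be computed directly (a small NumPy sketch with illustrative data; the diagonal of Cu holds the user's confidences):

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, rank, lam, alpha = 6, 3, 0.1, 40.0
Y = rng.normal(size=(n_items, rank))             # item factors
r_u = np.array([4.0, 0.0, 2.0, 0.0, 0.0, 1.0])  # this user's activity
c_u = 1.0 + alpha * r_u                          # diagonal of C_u

# W_u stretches and rotates the factor space according to the user's activity
W_u = np.linalg.inv(Y.T @ (c_u[:, None] * Y) + lam * np.eye(rank))

# S_u: the user-specific item-item similarity matrix
S_u = Y @ W_u @ Y.T
```

Note that Su is symmetric (Wu is the inverse of a symmetric positive-definite matrix), so it behaves like a proper similarity matrix over items, reweighted by what this particular user has consumed.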
10. Item-spaces that reflect user interests
We can imagine creating user-specific item-spaces for two users with different interests:
• a: a runner who primarily shops for shoes
• b: a musician who shops for music and instruments
We can see that there is likely some structure between running shoes and music products in the original space. But in the user-specific spaces Sa = Y Wa Yᵀ and Sb = Y Wb Yᵀ:
• The runner's space knows about the differences in running shoes
• The musician's space knows about the differences in musical instruments
11. Our runner… (figure: the item-space transformed by the runner's Wa)
12. Our musician… (figure: the item-space transformed by the musician's Wb)
13. User-specific item-spaces allow us to decompose recommendations into their component parts
The predicted recommendation value for an item can be decomposed as a linear combination over the items the user has already consumed: score(u, i) = Σⱼ s_uij · c_uj. In the example on the right, the blue circles represent the c_uj values and distance is the inverse of similarity, so the recommendation score for the new sneaker is:
score₁ = (1/1 × 4) + (1/2 × 5) + (1/2 × 1)
14. The same decomposition for a second user, whose consumed items sit at different distances from the sneaker:
score₂ = (1/8 × 4) + (1/5 × 5) + (1/0.5 × 1)
15. Working both examples out gives absolute and relative contributions:
score₁ = 4.0 + 2.5 + 0.5 = 7.0, i.e. contributions of 57%, 35%, and 7%
score₂ = 0.5 + 1.0 + 2.0 = 3.5, i.e. contributions of 14%, 29%, and 57%
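A minimal NumPy sketch of this decomposition (illustrative data, not the Spark implementation) confirms that summing the per-item contributions s_uij · c_uj over the user's consumed items reproduces the standard ALS prediction yᵢᵀ x_u exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
n_items, rank, lam, alpha = 6, 3, 0.1, 40.0
Y = rng.normal(size=(n_items, rank))
r_u = np.array([4.0, 0.0, 2.0, 0.0, 0.0, 1.0])
p_u = (r_u > 0).astype(float)
c_u = 1.0 + alpha * r_u

W_u = np.linalg.inv(Y.T @ (c_u[:, None] * Y) + lam * np.eye(rank))
S_u = Y @ W_u @ Y.T

i = 1                                   # an item the user has not consumed
contrib = S_u[i] * c_u * p_u            # contribution of each consumed item j
score = contrib.sum()

# The decomposition reproduces the usual ALS prediction y_i^T x_u
x_u = W_u @ (Y.T @ (c_u * p_u))
assert np.isclose(score, Y[i] @ x_u)

pct = 100.0 * contrib / score           # relative (percent) influence
```

The `contrib` vector gives the absolute influences and `pct` the relative ones, exactly the two views used in the worked example above.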
16. Use the decompositions to explain recommendations
Both the relative and absolute decompositions can be helpful:
• Relative numbers explain which items have a strong influence on specific recommendations
• Absolute values let us look at items that drive popular recommendations in general
• They also give us some intuition about how the CF model is working
• They provide a way to add new prior information or perform "man-in-the-middle" corrections
(figure: influence scores for the 7.0 and 3.5 example recommendations)
17. Getting it done in spark.ml
The spark.ml ALS model provides access to the learned item factors. However, there are a few tricks to calculating explanations for a large number of users and products. We implemented these as proposed changes to spark.ml in a separate class, ALSExplain, which takes:
• The item-factor matrix from the fitted ALS model
• The original user-product ratings (user, prod, rating)
• The regularization parameter lambda (regParam)
• The confidence parameter alpha (alpha)
Hopefully our PR [SPARK-27447] will be included in an upcoming Spark release, but in the meantime, please take a look at our GitHub fork: https://github.com/nielshanson/spark
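For intuition only, here is a single-user NumPy sketch of the computation ALSExplain distributes over all users and products; this is not the class's actual API (which lives in the PR above), and the function name, signature, and data are illustrative:

```python
import numpy as np

def explain(Y, r_u, i, lam=0.1, alpha=40.0, top_k=3):
    """Rank the user's consumed items by their influence on item i's score.

    Plain-NumPy sketch of the per-user explanation computation; the real
    ALSExplain class performs this at scale with Spark.
    """
    p_u = (r_u > 0).astype(float)
    c_u = 1.0 + alpha * r_u
    W_u = np.linalg.inv(Y.T @ (c_u[:, None] * Y) + lam * np.eye(Y.shape[1]))
    contrib = (Y[i] @ W_u @ Y.T) * c_u * p_u   # s_uij * c_uj per item j
    order = np.argsort(-contrib)
    consumed = [j for j in order if p_u[j] > 0][:top_k]
    return [(int(j), float(contrib[j])) for j in consumed]

rng = np.random.default_rng(4)
Y = rng.normal(size=(8, 3))                     # toy item-factor matrix
r_u = np.array([5.0, 0.0, 3.0, 0.0, 1.0, 0.0, 0.0, 2.0])
explanation = explain(Y, r_u, i=1)              # why recommend item 1?
```

The returned (item, influence) pairs are exactly the per-recommendation decomposition scores described earlier, restricted to the items this user actually consumed.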
18. Analysis on MovieLens (20M)
• Performed an analysis on the MovieLens (20M) dataset
– 20M user ratings (27,000 movies × 138,000 users)
– Not extremely large, but a classic and big enough to warm up Spark
19. Next steps and further work
• Currently ALSExplain only generates recommendations and their decomposition scores, without explicitly generating the full user-specific item-item similarity matrices
– There could be interesting cases where you would want a user's item-item similarity matrix
• There are a number of further performance improvements to make
– E.g., sparse matrix calculations, matrix-inverse approximations
• Our proposed code design is a first draft at making changes to spark.ml [SPARK-27447]
– We want to work with the community to better integrate the functionality with the existing codebase
• The ALS paper is 10 years old, yet this part of the method wasn't implemented in Spark
– We want to explore generating explainable recommendations from more complex models (e.g., autoencoders, feed-forward "deep" neural nets, etc.)
20. Demo
• We'll now do a quick demo…
21. Questions
22. Don't forget to rate and review the sessions: search "Spark + AI Summit"