Moment-Based Estimation for Hierarchical Models in Apache Spark

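Abstract: At Stitch Fix, hierarchical models are one of the core machine learning frameworks used in our recommendation systems. Hierarchical models allow estimation on clustered data when the classical assumption of identically distributed random variables fails. Traditional likelihood-based methods for fitting hierarchical models often break down at the scale of data found in industry, which has motivated recent research into moment-based parameter estimation procedures.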

1.Moment-Based Estimation for Hierarchical Models in Apache Spark Kyle Schmaus, Stitch Fix #DSSAIS17

2.Overview • About Stitch Fix • Hierarchical (Mixed Effects) Models • Moment Based Estimation • Spark Implementation & Application

3.Stitch Fix

7.Stitch Fix Algorithmic recommendations Machine learning Human curation

8.Stitch Fix • Algorithms Team: 80+ Data Scientists and Data Engineers • Data Integrates into every aspect of the business • Our Blog: multithreaded.stitchfix.com

9.Hierarchical Models Motivation ℙ(sale) = ?

10.Hierarchical Models Motivation Color Cut Size Brand Material …

11.Shared Model $\mathbb{P}(\text{sale}) = g^{-1}(x_1^T \beta)$, $\mathbb{P}(\text{sale}) = g^{-1}(x_2^T \beta)$, where $g(p) = \log\frac{p}{1-p}$
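
A minimal sketch of the shared model on this slide, assuming NumPy: one coefficient vector is shared by every item, and the logit link maps the linear score to a sale probability. The feature vectors and coefficients are made-up illustrative values, not Stitch Fix data.

```python
import numpy as np

def logit(p):
    """Link function g(p) = log(p / (1 - p))."""
    return np.log(p / (1.0 - p))

def inv_logit(z):
    """Inverse link g^{-1}(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients shared by all items, and two hypothetical items.
beta = np.array([0.5, -1.0, 0.25])
x1 = np.array([1.0, 0.2, 3.0])
x2 = np.array([1.0, 1.5, 0.0])

# Both probabilities come from the same beta; only the features differ.
p_sale_1 = inv_logit(x1 @ beta)
p_sale_2 = inv_logit(x2 @ beta)
```

In the individual-model variant, each group would get its own coefficient vector instead of the single shared `beta`.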

12.Individual Model Per Group

13.Individual Model Per Group $\mathbb{P}(\text{sale}) = g^{-1}(x^T \beta_1)$, $\mathbb{P}(\text{sale}) = g^{-1}(x^T \beta_2)$, where $g(p) = \log\frac{p}{1-p}$

14.Hierarchical Models • $\mathbb{E}[y_i] = g^{-1}(X_i \beta + Z_i u_i)$ • $u_i \sim \mathcal{N}(0, \Sigma)$ • $i \in \{1, \dots, M\}$ • $\beta$, $u_i$, and $\Sigma$ are unknown

15.Simulation Results • $i \in \{1, \dots, M = 100\}$ • $y_i = X_i \beta + Z_i u_i + \varepsilon_i$ • $\dim \beta = \dim u_i = 11 \times 1$ • $u_i \sim \mathcal{N}(0, \Sigma)$ • $\varepsilon_i \sim \mathcal{N}(0, I)$ • $\Sigma \sim \mathcal{W}^{-1}(I, 11)$ [Figure: RMSE by N observations per group; x-axis log(n_obs), y-axis rmse, one series per model_type: individual model, mixed model, shared model]

16.Software Implementations lme4, nlme, mbest, … statsmodels MixedModels

18.Likelihood Based Methods Expectation-Maximization, Variational Approximations, or Likelihood Maximization require an $O(Np^2)$ initial cost, then a series of iterations each costing $O(Mp^3)$, where • $N$ is the total number of observations • $p$ is the number of fixed and random effects • $M$ is the number of groups

20.Moment Based Methods Using a moment-based approach laid out in Perry (2015) and implemented in the mbest package, we can achieve a non-iterative fit in $O(Np^2) + O(Mp^3)$ steps. This can be trivially spread across multiple processors. This improvement in computational efficiency is paid for by sacrificing some statistical efficiency.
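
The reason a non-iterative fit spreads easily across processors is that each group contributes a summary that can be computed independently and then added up. The sketch below illustrates only that map-then-sum pattern, using a thread pool and a stand-in summary (a Gram matrix and a cross-product); the function name `group_summary` and the data are hypothetical. On Spark the same shape would be a map over groups followed by a reduce.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def group_summary(group):
    """Per-group work: costs O(n_i * p^2) and needs no other group's data."""
    X_i, y_i = group
    return X_i.T @ X_i, X_i.T @ y_i

# Hypothetical data: 8 groups, 20 observations each, 3 features.
rng = np.random.default_rng(0)
groups = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(8)]

# "Map": each group is summarized independently, possibly on another worker.
with ThreadPoolExecutor(max_workers=4) as pool:
    summaries = list(pool.map(group_summary, groups))

# "Reduce": summaries combine by plain addition, so the order does not matter.
gram_total = sum(s[0] for s in summaries)
xty_total = sum(s[1] for s in summaries)
```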

21.Moment Based Setup $y_i = X_i \beta + Z_i u_i + \varepsilon_i$ for $i \in \{1, \dots, M\}$, $u_i \sim \mathcal{N}(0, \Sigma)$, $\varepsilon_i \sim \mathcal{N}(0, \phi I)$, $u_i$ and $\varepsilon_i$ independent

22.Moment Based Setup • Define $F_i \equiv [X_i \; Z_i]$ and $\eta_i^T \equiv [\beta^T \; u_i^T]$ • $y_i = X_i \beta + Z_i u_i + \varepsilon_i \implies y_i = F_i \eta_i + \varepsilon_i$ • Estimate $\hat{\eta}_i = (F_i^T F_i)^{\dagger} F_i^T y_i$, for each $i \in \{1, \dots, M\}$ • Note when $F_i$ is rank deficient, $\hat{\eta}_i$ is not an unbiased estimator
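
A minimal sketch of the per-group estimate on this slide, assuming NumPy: stack the fixed-effect and random-effect design matrices side by side and apply a pseudo-inverse least-squares fit. The data, the helper name `eta_hat`, and the `rcond` cutoff are illustrative assumptions. The second call uses the same features for both designs to show the rank-deficient case, where the pseudo-inverse splits the signal evenly between the two halves instead of recovering each part.

```python
import numpy as np

rng = np.random.default_rng(1)
n_i, p, q = 30, 2, 2

# Hypothetical data for a single group i.
X_i = rng.normal(size=(n_i, p))
Z_i = rng.normal(size=(n_i, q))
beta = np.array([1.0, -2.0])
u_i = np.array([0.5, 0.3])
y_i = X_i @ beta + Z_i @ u_i + 0.1 * rng.normal(size=n_i)

def eta_hat(X, Z, y, rcond=1e-10):
    """Pseudo-inverse least squares on the stacked design F = [X Z]."""
    F = np.hstack([X, Z])
    return np.linalg.pinv(F.T @ F, rcond=rcond) @ F.T @ y

# Full-rank case: approximately recovers [beta, u_i].
eta_full = eta_hat(X_i, Z_i, y_i)

# Rank-deficient case (Z = X): the minimum-norm solution has two equal
# halves, so it does not estimate the fixed-effect part on its own.
eta_deficient = eta_hat(X_i, X_i, y_i)
```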

23.Moment Based Setup • $F_i = U_i D_i V_i^T$ • $X_i = U_i D_i V_{i1}^T$ • $Z_i = U_i D_i V_{i2}^T$ • $\hat{\phi} = \frac{1}{N - r} \sum_{i=1}^{M} \lVert y_i - F_i \hat{\eta}_i \rVert_2^2$ • $r \equiv \sum_{i=1}^{M} r_i$
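
The decomposition and the dispersion estimate on this slide can be sketched as follows, assuming NumPy. Here the stacked design for each group is factored with a compact SVD, the right singular vectors are split into a fixed-effect block (first p rows) and a random-effect block (remaining rows), and residual sums of squares are pooled across groups. Treating r_i as the numerical rank of the stacked design is an assumption of this sketch, as are the simulated data and the helper name `decompose`.

```python
import numpy as np

def decompose(X_i, Z_i, tol=1e-10):
    """Compact SVD of [X_i Z_i], with V split into fixed and random blocks."""
    p = X_i.shape[1]
    F_i = np.hstack([X_i, Z_i])
    U, d, Vt = np.linalg.svd(F_i, full_matrices=False)
    r_i = int(np.sum(d > tol * d.max()))        # numerical rank (assumed meaning of r_i)
    U, d, V = U[:, :r_i], d[:r_i], Vt[:r_i].T   # V has shape (p + q, r_i)
    return U, d, V[:p], V[p:], r_i

# Hypothetical simulation: M groups with known noise variance phi_true.
rng = np.random.default_rng(2)
M, n_i, p, q = 50, 20, 2, 2
beta, phi_true = np.array([1.0, -2.0]), 0.25

rss, N, r = 0.0, 0, 0
for _ in range(M):
    X_i = rng.normal(size=(n_i, p))
    Z_i = rng.normal(size=(n_i, q))
    u_i = rng.normal(scale=0.5, size=q)
    y_i = X_i @ beta + Z_i @ u_i + rng.normal(scale=np.sqrt(phi_true), size=n_i)
    U, d, V_i1, V_i2, r_i = decompose(X_i, Z_i)
    fitted = U @ (U.T @ y_i)                    # projection of y_i onto the column space of F_i
    rss += np.sum((y_i - fitted) ** 2)
    N += n_i
    r += r_i

# Pooled residual variance with N - r degrees of freedom.
phi_hat = rss / (N - r)
```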

24.Estimate $\beta$ • $\Omega \equiv \sum_{i=1}^{M} V_{i1} W_i V_{i1}^T$ • $\hat{\beta} = \Omega^{-1} \sum_{i=1}^{M} V_{i1} W_i V_i^T \hat{\eta}_i$ is an unbiased estimator for $\beta$.
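
A hedged sketch of the weighted fixed-effect estimate on this slide, assuming NumPy. Each group contributes a weighted projection of its per-group estimate onto the fixed-effect block of the right singular vectors, and the contributions are summed and normalized. The slide does not define the weights, so identity weights are an assumption here, as are the simulated data and the choice of random slopes on the same features as the fixed effects.

```python
import numpy as np

rng = np.random.default_rng(3)
M, n_i, p = 200, 15, 2
beta = np.array([1.0, -2.0])

Omega = np.zeros((p, p))
rhs = np.zeros(p)
for _ in range(M):
    # Hypothetical group data; Z_i = X_i means random slopes on the same features.
    X_i = rng.normal(size=(n_i, p))
    Z_i = X_i
    u_i = rng.normal(scale=0.5, size=p)
    y_i = X_i @ beta + Z_i @ u_i + rng.normal(scale=0.5, size=n_i)

    # Compact SVD of the stacked design, truncated to its numerical rank.
    F_i = np.hstack([X_i, Z_i])
    U, d, Vt = np.linalg.svd(F_i, full_matrices=False)
    r_i = int(np.sum(d > 1e-10 * d.max()))
    V = Vt[:r_i].T
    V_i1 = V[:p]                                  # fixed-effect block of V

    # Per-group pseudo-inverse estimate of [beta, u_i].
    eta_hat_i = np.linalg.pinv(F_i.T @ F_i, rcond=1e-10) @ F_i.T @ y_i

    W_i = np.eye(r_i)                             # assumed weights (identity)
    Omega += V_i1 @ W_i @ V_i1.T
    rhs += V_i1 @ W_i @ (V.T @ eta_hat_i)

beta_hat = np.linalg.solve(Omega, rhs)
```

Both accumulators are plain sums over groups, so they fit the map-then-sum pattern described earlier in the talk.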

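25.Estimate $\Sigma$ Say we knew $\beta$, not just an estimate. Define • $B \equiv \sum_{i=1}^{M} V_{i2} W_i (V_i^T \hat{\eta}_i - V_{i1}^T \beta)(V_i^T \hat{\eta}_i - V_{i1}^T \beta)^T W_i V_{i2}^T$ • $A \equiv \sum_{i=1}^{M} V_{i2} W_i D_i^{-2} W_i V_{i2}^T$ • $\Omega_2 \equiv \sum_{i=1}^{M} (V_{i2} W_i V_{i2}^T) \otimes (V_{i2} W_i V_{i2}^T)$ • $\operatorname{vec}\{\hat{\Sigma}_\beta\} = \Omega_2^{-1} \operatorname{vec}\{B - \hat{\phi} A\}$ • $\hat{\Sigma}_\beta$ is an unbiased estimator for $\Sigma$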
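
A hedged sketch of the covariance estimate on this slide, assuming NumPy. Per group, the residual after removing the fixed-effect part has covariance equal to a projected copy of the random-effect covariance plus a noise term; summing weighted outer products, subtracting the noise contribution, and solving a Kronecker-structured linear system recovers the covariance. The names `B`, `A`, and `Omega2`, the identity weights, the simulated data, and the use of the true fixed effects and true noise variance are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
M, n_i, p = 400, 15, 2
q = p
beta = np.array([1.0, -2.0])                      # treated as known in this sketch
Sigma = np.array([[0.5, 0.2], [0.2, 0.3]])        # true random-effect covariance
phi = 0.25                                        # true noise variance, treated as known

B = np.zeros((q, q))
A = np.zeros((q, q))
Omega2 = np.zeros((q * q, q * q))
for _ in range(M):
    X_i = rng.normal(size=(n_i, p))
    Z_i = X_i                                     # random slopes on the same features
    u_i = rng.multivariate_normal(np.zeros(q), Sigma)
    y_i = X_i @ beta + Z_i @ u_i + rng.normal(scale=np.sqrt(phi), size=n_i)

    F_i = np.hstack([X_i, Z_i])
    U, d, Vt = np.linalg.svd(F_i, full_matrices=False)
    r_i = int(np.sum(d > 1e-10 * d.max()))
    U, d, V = U[:, :r_i], d[:r_i], Vt[:r_i].T
    V_i1, V_i2 = V[:p], V[p:]
    W_i = np.eye(r_i)                             # assumed weights (identity)

    # Projected per-group estimate minus its fixed-effect mean.
    resid = (U.T @ y_i) / d - V_i1.T @ beta
    B += V_i2 @ W_i @ np.outer(resid, resid) @ W_i @ V_i2.T
    A += V_i2 @ W_i @ np.diag(d ** -2.0) @ W_i @ V_i2.T
    G_i = V_i2 @ W_i @ V_i2.T
    Omega2 += np.kron(G_i, G_i)

# Solve the vectorized system with column-major vec on both sides.
vec_sigma = np.linalg.solve(Omega2, (B - phi * A).reshape(-1, order="F"))
Sigma_hat = vec_sigma.reshape(q, q, order="F")
```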

26.Estimate $\Sigma$ • $\hat{\Sigma}_\beta$ is an unbiased estimator for $\Sigma$ if you know $\beta$ a priori. We don’t … • Instead, we use $\hat{\beta}$. This means $\hat{\Sigma}_{\hat{\beta}}$ is not an unbiased estimator in practice. • It can be shown the bias is often negligible. • We can project $\hat{\Sigma}_{\hat{\beta}}$ to be positive semidefinite
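
The slide does not say which projection is used. One common choice, sketched here with NumPy as an assumption, is to symmetrize the estimate and clip its negative eigenvalues to zero, which gives the nearest positive semidefinite matrix in Frobenius norm. The helper name `project_psd` and the example matrix are hypothetical.

```python
import numpy as np

def project_psd(S):
    """Symmetrize S, then clip negative eigenvalues to zero."""
    S_sym = (S + S.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(S_sym)
    clipped = np.clip(eigvals, 0.0, None)
    # Rebuild the matrix from the clipped spectrum.
    return (eigvecs * clipped) @ eigvecs.T

# Hypothetical moment estimate that is not PSD (eigenvalues 2.2 and -0.2).
Sigma_hat = np.array([[1.0, 1.2], [1.2, 1.0]])
Sigma_psd = project_psd(Sigma_hat)
```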

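27.Estimate $u_i \mid \hat{\eta}_i$ • Using a Gaussian approximation for $p(\hat{\eta}_i \mid u_i)$ and $p(u_i)$, compute the posterior distribution – $p(u_i \mid \hat{\eta}_i) \propto p(\hat{\eta}_i \mid u_i)\, p(u_i)$ • There exists a formula for $C_i(\Sigma)$ such that – $\mathbb{E}(u_i \mid \hat{\eta}_i) = C_i V_{i2} (V_i^T \hat{\eta}_i - V_{i1}^T \hat{\beta})$ – $\mathrm{Var}(u_i \mid \hat{\eta}_i) = \hat{\phi}\, C_i$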
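
The posterior step can be illustrated with standard Gaussian conditioning, sketched below with NumPy. This is a generic textbook update under an assumed observation model (projected residual equals the projected random effect plus noise whose variance is the dispersion divided by the squared singular values), not necessarily the exact formula from the talk. The helper name `posterior_u` and all numbers are hypothetical.

```python
import numpy as np

def posterior_u(resid, V_i2, d, Sigma, phi):
    """Gaussian posterior of u_i given resid = V_i2^T u_i + noise,
    with prior u_i ~ N(0, Sigma) and noise ~ N(0, phi * diag(d**-2))."""
    VD2 = V_i2 * d**2                              # scales column k by d_k^2
    precision = np.linalg.inv(Sigma) + VD2 @ V_i2.T / phi
    cov = np.linalg.inv(precision)
    mean = cov @ VD2 @ resid / phi
    return mean, cov

# Hypothetical group: 1 fixed effect, 2 random effects, 10 observations.
rng = np.random.default_rng(5)
n_i, p, q = 10, 1, 2
F_i = rng.normal(size=(n_i, p + q))
U, d, Vt = np.linalg.svd(F_i, full_matrices=False)
V_i2 = Vt.T[p:]                                    # random-effect block of V
Sigma = np.array([[0.5, 0.2], [0.2, 0.3]])
phi = 0.25
resid = rng.normal(size=d.shape[0])                # stand-in for the projected residual

u_mean, u_cov = posterior_u(resid, V_i2, d, Sigma, phi)
```

The posterior mean shrinks the group's own estimate toward zero, more strongly when the group has little data or the prior covariance is small.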

28.That was a lot of math! Read "Fast moment-based estimation for hierarchical models" on arXiv; it’s a great paper.

29.Moment Based Estimation Summary • Estimate $\hat{\eta}_i$