Moment-Based Estimation for Hierarchical Models in Apache Spark
1 .Moment-Based Estimation for Hierarchical Models in Apache Spark Kyle Schmaus, Stitch Fix #DSSAIS17
2 .Overview • About Stitch Fix • Hierarchical (Mixed Effects) Models • Moment Based Estimation • Spark Implementation & Application
3 .Stitch Fix
7 .Stitch Fix • Algorithmic recommendations • Machine learning • Human curation
8 .Stitch Fix • Algorithms Team: 80+ Data Scientists and Data Engineers • Data integrates into every aspect of the business • Our Blog: multithreaded.stitchfix.com
9 .Hierarchical Models Motivation ℙ(sale) = ?
10 .Hierarchical Models Motivation • Color • Cut • Size • Brand • Material • …
11 .Shared Model • $\mathbb{P}(\text{sale}) = g^{-1}(x_1^T \beta)$ • $\mathbb{P}(\text{sale}) = g^{-1}(x_2^T \beta)$ • $g(p) = \log\frac{p}{1-p}$
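The link on this slide maps a probability p to log(p / (1 - p)), which is the logit. A minimal sketch of how a shared coefficient vector turns two feature vectors into sale probabilities follows; the coefficient and feature values are made up for illustration and do not come from the talk.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta: float) -> float:
    """Inverse of the logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def shared_model_prob(x, beta):
    """P(sale) under a single coefficient vector shared by every item."""
    eta = sum(x_j * b_j for x_j, b_j in zip(x, beta))
    return inv_logit(eta)

# Hypothetical values: one shared beta, two different feature vectors.
beta = [0.5, -1.0]
x_1 = [1.0, 0.2]
x_2 = [1.0, 1.5]
p_1 = shared_model_prob(x_1, beta)
p_2 = shared_model_prob(x_2, beta)
```

The per-group variant on the following slides would swap the single `beta` for one coefficient vector per group.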
13 .Individual Model Per Group • $\mathbb{P}(\text{sale}) = g^{-1}(x \beta_1)$ • $\mathbb{P}(\text{sale}) = g^{-1}(x \beta_2)$ • $g(p) = \log\frac{p}{1-p}$
14 .Hierarchical Models • $\mathbb{E}[y_i] = g^{-1}(X_i \beta + Z_i u_i)$ • $u_i \sim \mathcal{N}(0, \Sigma)$ • $i \in \{1, \dots, M\}$ • $\beta$, $u_i$, and $\Sigma$ are unknown
15 .Simulation Results • $i \in \{1, \dots, M\}$ with $M = 100$ • $y_i = X_i \beta + Z_i u_i + \varepsilon_i$ • $\dim \beta = \dim u_i = 11 \times 1$ • $u_i \sim \mathcal{N}(0, \Sigma)$ • $\varepsilon_i \sim \mathcal{N}(0, I)$ • $\Sigma \sim \mathcal{W}^{-1}(I, 11)$ • [Figure: RMSE By N Observations Per Group; x-axis log(n_obs), y-axis rmse; one series per model_type: individual model, mixed model, shared model]
16 .Software Implementations • lme4, nlme, mbest, … • statsmodels • MixedModels
18 .Likelihood Based Methods Expectation-Maximization, Variational Approximations, or Likelihood Maximization require an $O(Np^2)$ initial cost, then a series of iterations costing $O(Mp^3)$, where • $N$ is the total number of observations • $p$ is the number of fixed and random effects • $M$ is the number of groups
20 .Moment Based Methods Using a moment-based approach laid out in Perry (2015) and implemented in the mbest package, we can achieve a non-iterative fit in $O(Np^2) + O(Mp^3)$ steps. This can be trivially spread across multiple processors. This improvement in computational efficiency is paid for by sacrificing some statistical efficiency.
21 .Moment Based Setup • $y_i = X_i \beta + Z_i u_i + \varepsilon_i$ for $i \in \{1, \dots, M\}$ • $u_i \sim \mathcal{N}(0, \Sigma)$ • $\varepsilon_i \sim \mathcal{N}(0, \phi I)$ • $u_i$ and $\varepsilon_i$ independent
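22 .Moment Based Setup • Define $F_i \equiv [X_i \; Z_i]$ and $\eta_i^T \equiv [\beta^T \; u_i^T]$ • $y_i = X_i \beta + Z_i u_i + \varepsilon_i \implies y_i = F_i \eta_i + \varepsilon_i$ • Estimate $\hat{\eta}_i = (F_i^T F_i)^{\dagger} F_i^T y_i$ for each $i \in \{1, \dots, M\}$ • Note that when $F_i$ is rank deficient, $\hat{\eta}_i$ is not an unbiased estimator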
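The per-group step on this slide can be sketched with NumPy. This is an illustration under stated assumptions, not the talk's Spark code: the stacked design is taken to be the fixed-effect and random-effect predictors side by side, the estimate is a pseudo-inverse least-squares fit, and the data, dimensions, and the `rcond` tolerance are all hypothetical. The second case puts an intercept in both blocks to show why a rank-deficient design does not recover the stacked coefficients.

```python
import numpy as np

def group_estimate(X_i, Z_i, y_i):
    """Pseudo-inverse least-squares fit of the stacked coefficients for one group."""
    F_i = np.hstack([X_i, Z_i])
    # (F^T F)^+ F^T equals the Moore-Penrose pseudo-inverse F^+.
    # rcond is an assumed cutoff for treating singular values as zero.
    return np.linalg.pinv(F_i, rcond=1e-10) @ y_i

rng = np.random.default_rng(0)
n_i = 50

# Full-rank case: the estimate is close to [beta, u_i].
X_i = rng.normal(size=(n_i, 3))
Z_i = rng.normal(size=(n_i, 2))
beta = np.array([1.0, -2.0, 0.5])
u_i = np.array([0.3, -0.7])
y_i = X_i @ beta + Z_i @ u_i + 0.1 * rng.normal(size=n_i)
eta_hat_full = group_estimate(X_i, Z_i, y_i)

# Rank-deficient case: an intercept column appears in both X and Z,
# so only the sum of the two intercepts (1.0 + 0.5) is identified.
X_def = np.column_stack([np.ones(n_i), rng.normal(size=n_i)])
Z_def = np.ones((n_i, 1))
y_def = X_def @ np.array([1.0, 2.0]) + Z_def @ np.array([0.5])
eta_hat_def = group_estimate(X_def, Z_def, y_def)
rank_def = np.linalg.matrix_rank(np.hstack([X_def, Z_def]))
```

In the rank-deficient case the pseudo-inverse returns the minimum-norm solution, which splits the identified intercept sum evenly between the two columns instead of returning the values used to generate the data.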
23 .Moment Based Setup • $F_i = U_i D_i V_i^T$ • $X_i = U_i D_i V_{i1}^T$ • $Z_i = U_i D_i V_{i2}^T$ • $\hat{\phi} = \frac{1}{N - r} \sum_{i=1}^{M} \lVert y_i - F_i \hat{\eta}_i \rVert_2^2$ • $r \equiv \sum_{i=1}^{M} r_i$
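A sketch of the decomposition step, assuming the factorization on this slide is a compact singular value decomposition of each group's stacked design, with the right singular vectors split row-wise into a fixed-effect block and a random-effect block. The dispersion estimate below pools residual sums of squares and divides by total observations minus the summed ranks; that denominator is an assumption, as are all simulated values and the helper name `compact_svd`.

```python
import numpy as np

def compact_svd(F, tol=1e-10):
    """SVD of F keeping only singular values above tol times the largest."""
    U, d, Vt = np.linalg.svd(F, full_matrices=False)
    r = int(np.sum(d > tol * d[0]))
    return U[:, :r], d[:r], Vt[:r].T

rng = np.random.default_rng(1)
M, n_i, p, q, phi = 200, 20, 3, 2, 0.25   # hypothetical sizes and noise variance
beta = np.array([1.0, -1.0, 0.5])

rss, n_total, rank_total = 0.0, 0, 0
for _ in range(M):
    X = rng.normal(size=(n_i, p))
    Z = rng.normal(size=(n_i, q))
    u = rng.normal(scale=0.5, size=q)
    y = X @ beta + Z @ u + rng.normal(scale=np.sqrt(phi), size=n_i)

    F = np.hstack([X, Z])
    U, d, V = compact_svd(F)
    V1, V2 = V[:p], V[p:]            # rows for the X block and the Z block
    eta_hat = V @ ((U.T @ y) / d)    # pseudo-inverse solution V D^-1 U^T y

    rss += float(np.sum((y - F @ eta_hat) ** 2))
    n_total += n_i
    rank_total += len(d)

# Pooled residual variance with an assumed degrees-of-freedom correction.
phi_hat = rss / (n_total - rank_total)
```

Because each group is processed independently and only three running totals are kept, the loop body is the kind of work that can be mapped over groups and then summed.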
24 .Estimate $\beta$ • $\Omega \equiv \sum_{i=1}^{M} V_{i1} W_i V_{i1}^T$ • $\hat{\beta} = \Omega^{-1} \sum_{i=1}^{M} V_{i1} W_i V_i^T \hat{\eta}_i$ is an unbiased estimator for $\beta$.
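A sketch of a pooled fixed-effect estimate built from per-group pieces, under several assumptions that are not stated on the slide: each group's stacked design gets a compact SVD, the right singular vectors are split into a fixed-effect block `V1`, the per-group weight matrix is the identity, and random effects act on the first `q` fixed-effect columns (which makes every stacked design rank deficient). All data are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)
M, n_i, p, q = 500, 30, 3, 2              # hypothetical sizes
beta = np.array([1.0, -2.0, 0.5])

Omega = np.zeros((p, p))
rhs = np.zeros(p)
for _ in range(M):
    X = rng.normal(size=(n_i, p))
    Z = X[:, :q]                          # random slopes on the first q columns
    u = rng.normal(scale=0.5, size=q)
    y = X @ beta + Z @ u + rng.normal(scale=0.3, size=n_i)

    F = np.hstack([X, Z])
    U, d, Vt = np.linalg.svd(F, full_matrices=False)
    r = int(np.sum(d > 1e-10 * d[0]))     # numerical rank of the stacked design
    U, d, V = U[:, :r], d[:r], Vt[:r].T
    V1 = V[:p]                            # rows belonging to the fixed effects
    W = np.eye(r)                         # assumed weights (identity)

    eta_hat = V @ ((U.T @ y) / d)         # per-group pseudo-inverse estimate
    Omega += V1 @ W @ V1.T
    rhs += V1 @ W @ (V.T @ eta_hat)

# Solve the pooled normal equations for the fixed effects.
beta_hat = np.linalg.solve(Omega, rhs)
```

Each group contributes one small matrix and one small vector, so the sums can be accumulated in any order; the random effects average out across groups because they have mean zero.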
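25 .Estimate $\Sigma$ Say we knew $\beta$, not just an estimate. Define • $A \equiv \sum_{i=1}^{M} V_{i2} W_i (V_i^T \hat{\eta}_i - V_{i1}^T \beta)(V_i^T \hat{\eta}_i - V_{i1}^T \beta)^T W_i V_{i2}^T$ • $B \equiv \sum_{i=1}^{M} V_{i2} W_i D_i^{-2} W_i V_{i2}^T$ • $\Omega_2 \equiv \sum_{i=1}^{M} V_{i2} W_i V_{i2}^T \otimes V_{i2} W_i V_{i2}^T$ • $\operatorname{vec}\{\hat{\Sigma}\} = \Omega_2^{-1} \operatorname{vec}\{A - \phi B\}$ • $\hat{\Sigma}$ is an unbiased estimator for $\Sigma$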
26 .Estimate $\Sigma$ • $\hat{\Sigma}$ is an unbiased estimator for $\Sigma$ if you know $\beta$ a priori. We don’t … • Instead, we use $\hat{\beta}$. This means $\hat{\Sigma}$ is not an unbiased estimator in practice. • It can be shown that the bias is often negligible. • We can project $\hat{\Sigma}$ to be positive semidefinite.
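The slide does not say which projection is used. One common choice, shown here as an assumption, is to symmetrize the estimate and clip negative eigenvalues to zero, which gives the nearest positive semidefinite matrix in Frobenius norm for a symmetric input. The function name `project_psd` and the example matrix are hypothetical.

```python
import numpy as np

def project_psd(S):
    """Symmetrize S and clip negative eigenvalues to zero."""
    S_sym = (S + S.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(S_sym)
    clipped = np.clip(eigvals, 0.0, None)
    # Rebuild the matrix from the clipped spectrum.
    return (eigvecs * clipped) @ eigvecs.T

# A symmetric but indefinite "covariance estimate" (eigenvalues 3 and -1).
Sigma_hat = np.array([[1.0, 2.0],
                      [2.0, 1.0]])
Sigma_psd = project_psd(Sigma_hat)
```

A moment estimator can produce an indefinite matrix because it subtracts a noise term from a sample second moment, so a step like this is needed before the result is used as a covariance.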
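27 .Estimate $u_i \mid \hat{\eta}_i$ • Using a Gaussian approximation for $p(\hat{\eta}_i \mid u_i)$ and $p(u_i)$, compute the posterior distribution – $p(u_i \mid \hat{\eta}_i) \propto p(\hat{\eta}_i \mid u_i)\, p(u_i)$ • There exists a formula for $S_i(\Sigma)$ such that – $\hat{\mathbb{E}}(u_i \mid \hat{\eta}_i) = S_i V_{i2} (V_i^T \hat{\eta}_i - V_{i1}^T \hat{\beta})$ – $\operatorname{Var}(u_i \mid \hat{\eta}_i) = \hat{\phi} S_i$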
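The closed form on this slide is stated in terms of the decomposed design. As a hedged illustration of the same idea, the sketch below uses the textbook Gaussian conjugate update written directly in terms of the raw predictors; it is not the slide's formula, and it assumes a linear model with known fixed effects, random-effect covariance, and noise variance. All numeric values are made up.

```python
import numpy as np

def posterior_u(X, Z, y, beta, Sigma, phi):
    """Gaussian posterior of one group's random effects given its data.

    Assumes y = X beta + Z u + eps, u ~ N(0, Sigma), eps ~ N(0, phi I).
    """
    resid = y - X @ beta
    precision = np.linalg.inv(Sigma) + Z.T @ Z / phi
    cov = np.linalg.inv(precision)
    mean = cov @ (Z.T @ resid) / phi
    return mean, cov

rng = np.random.default_rng(3)
n_i, p, q, phi = 40, 3, 2, 0.25           # hypothetical sizes and noise variance
X = rng.normal(size=(n_i, p))
Z = rng.normal(size=(n_i, q))
beta = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
u = rng.multivariate_normal(np.zeros(q), Sigma)
y = X @ beta + Z @ u + rng.normal(scale=np.sqrt(phi), size=n_i)

post_mean, post_cov = posterior_u(X, Z, y, beta, Sigma, phi)
```

In practice the plug-in estimates would replace `beta`, `Sigma`, and `phi`. The posterior mean shrinks the group's own fit toward zero, more strongly when the group has few observations.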
28 .That was a lot of math! Read "Fast moment-based estimation for hierarchical models" on arXiv; it’s a great paper.
29 .Moment Based Estimation Summary • Estimate $\hat{\eta}_i$