Efficiently Using Sparsity in Machine Learning



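Sparsity means representing a solution using only a few features. "Sparse" solutions generalize better out of sample because they overfit less, they are easier to interpret because only a few features matter, and they are cheaper to compute. Besides the benefits of sparsity, this chapter briefly introduces the algorithms that achieve it, along with short proof sketches.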


1. Efficiently Implementing Sparsity in Learning
M. Magdon-Ismail, Rensselaer Polytechnic Institute (Joint Work). December 9, 2013.

2. Out-of-Sample is What Counts
• A pattern exists
• We don't know it
• We have data to learn it
• Tested on new cases
[Figure labels: NO, YES, ?]
© Creator: Malik Magdon-Ismail. Learning, Sparsity and Big Data: 2/16

3. Data
n data points (rows), d dimensions (columns).

Data matrix, $X \in \mathbb{R}^{n \times d}$:
name   age     debt      income   ···   hair     weight   sex
John   21yrs   −$10K     $65K     ···   black    175lbs   M
Joe    74yrs   −$100K    $25K     ···   blonde   275lbs   M
Jane   27yrs   −$20K     $85K     ···   blonde   135lbs   F
...
Jen    37yrs   −$400K    $105K    ···   brun     155lbs   F

Response matrix, $Y \in \mathbb{R}^{n \times \omega}$:
name   credit?   limit   risk
John             2K      high
Joe    ×         0       −
Jane             10K     low
...
Jen              15K     high

4. More Beautiful Data
$X \in \mathbb{R}^{231 \times 174}$, $Y \in \mathbb{R}^{231 \times 166}$
[Figure: two panels plotting % reconstruction error (0 to 15) against number of principal components (10 to 60).]

5. Throwing Out Unnecessary Features is Good
Sparsity: represent your solution using only a few features.
• "Sparse" solutions generalize better out of sample: less overfitting.
• Sparse solutions are easier to interpret: few important features.
• Computations are more efficient.
Problem: how to find the few relevant features quickly.

6. PCA, K-means, Linear Regression
[Figure: image panels for k = 20, r = 2k, in three columns (PCA, K-Means, Regression). Panel labels: Exact (all three columns); Approx, fast (relative error); top-k PCA regression; Fast-sparse regression (additive error); Sparse, approx, fast (relative error), twice.]

7. Sparsity
Represent your solution using only a few . . .
Example: linear regression, $Xw = y$.
$y$ is an optimal linear combination of only a few columns in $X$.
(sparse regression; regularization ($\|w\|_0 \le k$); feature subset selection; . . . )
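
As a minimal sketch of the constraint $\|w\|_0 \le k$, the following brute-force search tries every size-$k$ column subset and keeps the one with the smallest least-squares residual. This is an illustration only, not the method from the slides: exhaustive search is exponential in the number of features, and the function name `best_subset_regression` and the synthetic data are assumptions made for the example.

```python
import itertools
import numpy as np

def best_subset_regression(X, y, k):
    """Exhaustive search for min ||Xw - y||_2 subject to ||w||_0 <= k.

    Hypothetical helper for illustration; only practical for small d,
    since it tries every size-k subset of the d columns.
    """
    n, d = X.shape
    best_err, best_w = np.inf, np.zeros(d)
    for cols in itertools.combinations(range(d), k):
        cols = list(cols)
        # Ordinary least squares restricted to the chosen columns.
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        err = np.linalg.norm(X[:, cols] @ coef - y)
        if err < best_err:
            best_err = err
            best_w = np.zeros(d)
            best_w[cols] = coef
    return best_w, best_err

# Synthetic data (assumed): y depends on only two of eight features.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
w_true = np.zeros(8)
w_true[[1, 5]] = [2.0, -3.0]
y = X @ w_true  # noiseless response

w_hat, err = best_subset_regression(X, y, k=2)
support = np.flatnonzero(np.abs(w_hat) > 1e-8)  # indices of the selected features
```

Because the response is noiseless and truly 2-sparse, the search recovers the two generating columns; with noisy data it would return the best-fitting pair, which need not be the true one.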

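8. Singular Value Decomposition (SVD)
$$X = U \Sigma V^t = \begin{bmatrix} U_k & U_{d-k} \end{bmatrix} \begin{bmatrix} \Sigma_k & 0 \\ 0 & \Sigma_{d-k} \end{bmatrix} \begin{bmatrix} V_k^t \\ V_{d-k}^t \end{bmatrix}$$
Here $U$ is $n \times d$, $\Sigma$ is $d \times d$ and $V^t$ is $d \times d$; the computation costs $O(nd^2)$.
$$X_k = U_k \Sigma_k V_k^t = X V_k V_k^t$$
$X_k$ is the best rank-$k$ approximation to $X$: a reconstruction of $X$ using only a few degrees of freedom.
[Figure: $X$ next to its reconstructions $X_{20}$, $X_{40}$, $X_{60}$.]
$V_k$ is an orthonormal basis for the best $k$-dimensional subspace of the row space of $X$.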
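
The identities on this slide can be checked numerically. The sketch below uses NumPy's thin SVD on a random matrix (the sizes and the data are assumptions chosen for the example) and confirms that $U_k \Sigma_k V_k^t$ and $X V_k V_k^t$ give the same rank-$k$ matrix, whose Frobenius error equals the norm of the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 60, 20, 5                      # illustrative sizes (assumed)
X = rng.standard_normal((n, d))

# Thin SVD: U is n x d, s holds the d singular values, Vt is d x d.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

Uk, sk, Vk = U[:, :k], s[:k], Vt[:k].T   # Vk is d x k with orthonormal columns

Xk = Uk @ np.diag(sk) @ Vk.T             # X_k = U_k Sigma_k V_k^t
Xk_proj = X @ Vk @ Vk.T                  # X_k = X V_k V_k^t

err = np.linalg.norm(X - Xk)             # Frobenius norm of the residual
tail = np.sqrt(np.sum(s[k:] ** 2))       # norm of the discarded singular values
```

The second form, `X @ Vk @ Vk.T`, is the one the later slides build on: it shows that $X_k$ is just $X$ with its rows projected onto the span of $V_k$.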

9. Fast Approximate SVD
1: $Z = XR$, with $R \sim N(d \times r)$ and $Z \in \mathbb{R}^{n \times r}$
2: $Q = \text{qr.factorize}(Z)$
3: $\hat{V}_k \leftarrow \text{svd}_k(Q^t X)$
Theorem. Let $r = k(1 + \frac{1}{\epsilon})$ and $E = X - X \hat{V}_k \hat{V}_k^t$. Then
$$\mathbb{E}\left[\|E\|\right] \le (1 + \epsilon)\,\|X - X_k\|.$$
The running time is $O(ndk) = o(\text{svd})$. [BDM, FOCS 2011]
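
The three steps can be sketched in NumPy as follows. Two things are assumptions here: $N(d \times r)$ is read as a $d \times r$ matrix of independent standard Gaussian entries, and $\text{svd}_k$ is read as "the top $k$ right singular vectors". The test matrix (low rank plus small noise) and the name `fast_approx_svd` are also made up for the example, and the Frobenius norm is used because the slide does not say which norm it means.

```python
import numpy as np

def fast_approx_svd(X, k, r, rng):
    """Approximate the top-k right singular vectors of X (returned as d x k)."""
    n, d = X.shape
    R = rng.standard_normal((d, r))   # step 1: random Gaussian matrix (assumed reading of N(d x r))
    Z = X @ R                         # n x r sketch of the column space of X
    Q, _ = np.linalg.qr(Z)            # step 2: orthonormal basis for the sketch, n x r
    # Step 3: exact SVD of the small r x d matrix Q^t X, keep the top k right vectors.
    _, _, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    return Vt[:k].T

rng = np.random.default_rng(0)
n, d, k, eps = 200, 50, 5, 0.5
r = int(np.ceil(k * (1 + 1 / eps)))   # r = k(1 + 1/eps)

# Assumed test data: rank-k signal plus small dense noise.
X = (rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
     + 0.01 * rng.standard_normal((n, d)))

V_hat = fast_approx_svd(X, k, r, rng)
err = np.linalg.norm(X - X @ V_hat @ V_hat.T)   # ||E|| in Frobenius norm
s = np.linalg.svd(X, compute_uv=False)
best = np.sqrt(np.sum(s[k:] ** 2))              # ||X - X_k|| in Frobenius norm
ratio = err / best                               # at least 1 by optimality of X_k
```

The theorem bounds the expectation of this error, so `ratio` for a single random draw is not guaranteed to stay below $1 + \epsilon$; averaging over several draws of `R` is the fairer comparison.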

10. $V_k$ and Sparsity
Important "dimensions" of $V_k^t$ are important for $X$.
[Figure: columns of $V_k^t$ are sampled and scaled ($\times s_1, \ldots, \times s_5$) to give $\hat{V}_k^t \in \mathbb{R}^{k \times r}$.]
The sampled $r$ columns are "good" if $I = V_k^t V_k \approx \hat{V}_k^t \hat{V}_k$.
Sampling schemes:
• Largest norm (Jolliffe, 1972)
• Randomized norm sampling (Rudelson, 1999; Rudelson-Vershynin, 2007)
• Greedy (Batson et al, 2009; BDM, 2011)
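
Of the three schemes, randomized norm sampling is the simplest to sketch. The version below is one common form of it and is an assumption about the details: columns of $V_k^t$ are drawn with replacement with probability proportional to their squared norm, and each drawn column is scaled by $1/\sqrt{r p_i}$ so that $\hat{V}_k^t \hat{V}_k$ equals the identity in expectation. The function name, sizes and data are hypothetical.

```python
import numpy as np

def norm_sample_columns(Vkt, r, rng):
    """Sample r rescaled columns of the k x d matrix Vkt (hypothetical helper)."""
    k, d = Vkt.shape
    sq = np.sum(Vkt ** 2, axis=0)            # squared norm of each column
    p = sq / sq.sum()                        # sampling probabilities
    idx = rng.choice(d, size=r, replace=True, p=p)
    scale = 1.0 / np.sqrt(r * p[idx])        # the rescaling factors s_i
    return idx, Vkt[:, idx] * scale          # k x r matrix, i.e. V_hat_k^t

rng = np.random.default_rng(0)
n, d, k, r = 300, 200, 5, 60                 # illustrative sizes (assumed)
X = rng.standard_normal((n, d))
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Vkt = Vt[:k]                                 # k x d, orthonormal rows

idx, Vhat_t = norm_sample_columns(Vkt, r, rng)
gram = Vhat_t @ Vhat_t.T                     # k x k; should be close to the identity
deviation = np.linalg.norm(gram - np.eye(k), 2)   # spectral distance from I
```

With this scaling the trace of `gram` is exactly $k$ for every draw, while `deviation` measures how far the sample is from being "good" in the sense of the slide and shrinks as $r$ grows.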

11. Sparse PCA – Algorithm
1: Choose a few columns $C$ of $X$; $C \in \mathbb{R}^{n \times r}$.
2: Find the best rank-$k$ approximation of $X$ in the span of $C$, $X_{C,k}$.
3: Compute the $\text{SVD}_k$ of $X_{C,k} = U_{C,k} \Sigma_{C,k} V_{C,k}^t$.
4: $Z = X V_{C,k}$.
Each feature in $Z$ is a mixture of only the few original $r$ feature dimensions in $C$.
$$\|X - X V_{C,k} V_{C,k}^t\| \le \|X - X_{C,k} V_{C,k} V_{C,k}^t\| = \|X - X_{C,k}\| \le \left(1 + O\!\left(\tfrac{2k}{r}\right)\right) \|X - X_k\|.$$
[BDM, FOCS 2011]
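
The four steps can be walked through in NumPy as below. Several choices are assumptions rather than the authors' method: step 1 uses the "largest norm" rule (keep the $r$ columns whose columns of $V_k^t$ have the largest norm) computed from an exact SVD, which is simple but not fast; step 2 uses the Frobenius-norm construction $Q\,(Q^t X)_k$ with $Q$ an orthonormal basis of the span of $C$; and the function name and data are hypothetical.

```python
import numpy as np

def sparse_pca_features(X, k, r):
    """Follow steps 1-4 with an assumed column-selection rule (illustrative only)."""
    # Step 1 (assumed rule): score each column by the norm of its column in V_k^t.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = np.sum(Vt[:k] ** 2, axis=0)
    cols = np.sort(np.argsort(scores)[-r:])
    C = X[:, cols]                                   # n x r
    # Step 2: best rank-k approximation of X (Frobenius norm) within span(C).
    Q, _ = np.linalg.qr(C)                           # orthonormal basis of span(C)
    Ub, sb, Vbt = np.linalg.svd(Q.T @ X, full_matrices=False)
    XCk = Q @ (Ub[:, :k] * sb[:k]) @ Vbt[:k]         # Q (Q^t X)_k
    # Step 3: top-k right singular vectors of X_{C,k}.
    _, _, Vct = np.linalg.svd(XCk, full_matrices=False)
    VCk = Vct[:k].T                                  # d x k
    # Step 4: the k new features.
    Z = X @ VCk
    return cols, XCk, VCk, Z

rng = np.random.default_rng(0)
n, d, k = 100, 30, 3                                 # illustrative sizes (assumed)
r = 2 * k
X = rng.standard_normal((n, d)) * np.linspace(3.0, 0.5, d)   # columns with varying scale

cols, XCk, VCk, Z = sparse_pca_features(X, k, r)
err_features = np.linalg.norm(X - X @ VCk @ VCk.T)   # ||X - X V_{C,k} V_{C,k}^t||_F
err_span = np.linalg.norm(X - XCk)                   # ||X - X_{C,k}||_F
s = np.linalg.svd(X, compute_uv=False)
err_best = np.sqrt(np.sum(s[k:] ** 2))               # ||X - X_k||_F
```

The three errors line up as in the displayed chain, `err_best <= err_features <= err_span`; how close `err_span` gets to `err_best` depends on the column-selection rule, which is where the sparsifier of [BDM, FOCS 2011] comes in.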

12. Sparse PCA
[Figure: reconstructions for k = 20, k = 40, k = 60, comparing Dense PCA with Sparse PCA, r = 2k.]
Theorem. One can construct, in $o(\text{svd})$, $k$ features that are $r$-sparse, $r = O(k)$, that are as good as exact dense top-$k$ PCA-features.

13. Clustering: K-Means
[Figure: "Full, slow" versus "Fast, sparse" clusterings, shown for 3 clusters and 4 clusters.]
Theorem. There is a subset of features of size O(#clusters) which produces nearly the optimal partition (within a constant factor). One can quickly produce features with a log-approximation factor. [BDM, 2013]

14. Fast Regression using Few Important Features
[Figure: "PCA, slow, dense" versus "Sparse, fast", shown for k = 20 and k = 40.]
Theorem. One can find $O(k)$ pure features that perform as well as top-$k$ PCA regression (additive error controlled by $\|X - X_k\|_F / \sigma_k$). [BDM, 2013]

15. The Proofs
All the algorithms use the sparsifier of $V_k^t$ in [BDM, FOCS 2011].
1. Choose columns of $V_k^t$ to preserve its singular values.
2. Ensure that the selected columns preserve the structural properties of the objective with respect to the columns of $X$ that are sampled.
3. Use dual set sparsification algorithms to accomplish (2).

16. THANKS!
• Data compression (PCA): quick and reveals few important features
• Unsupervised clustering: quick and reveals few important features
• Supervised regression: quick and reveals few important features
Few features: easy to interpret; better generalizers; faster computations.
