Large Scale Matrix Analysis and Inference

This is a summary chapter. Starting from the basics of matrices, it introduces covariance, symmetric matrices viewed as ellipses, dyads, and directional variance, then extends to probability theory, covering Bayes' rule and its generalization, the Bayes rule for density matrices, and closes with a concise summary of the idea behind principal component analysis.

1. Large Scale Matrix Analysis and Inference
Wouter M. Koolen, Manfred Warmuth, Reza Bosagh Zadeh, Gunnar Carlsson, Michael Mahoney
Dec 9, NIPS 2013

2. Introductory musing: what is a matrix a_{i,j}?
(1) A vector of n^2 parameters
(2) A covariance
(3) A generalized probability distribution
(4) ...

3–4. View 1: a vector of n^2 parameters
When you regularize with the squared Frobenius norm,
    min_W  ||W||_F^2 + Σ_n loss(tr(W X_n))
is equivalent to
    min_{vec(W)}  ||vec(W)||_2^2 + Σ_n loss(vec(W) · vec(X_n))
No structure: n^2 independent variables.
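
To make the equivalence concrete, here is a minimal numpy check (the 4×4 sizes and the symmetric X are illustrative assumptions, not from the slides):

```python
import numpy as np

# Check that the Frobenius regularizer and the trace inner product are
# exactly the vectorized L2 norm and dot product: no matrix structure.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))
X = rng.standard_normal((4, 4))
X = (X + X.T) / 2                    # symmetric, as for covariance-like data

# ||W||_F^2 == ||vec(W)||_2^2
assert np.isclose(np.linalg.norm(W, "fro") ** 2,
                  np.linalg.norm(W.ravel()) ** 2)

# tr(W X) == vec(W) . vec(X) for symmetric X
assert np.isclose(np.trace(W @ X), W.ravel() @ X.ravel())
```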

5. View 2: a covariance
View the symmetric positive definite matrix C as the covariance matrix of some random feature vector c ∈ R^n, i.e. C = E[(c − E(c))(c − E(c))^T]: n features plus their pairwise interactions.
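
As a sketch of this view (the 3-feature random dataset is an assumption for illustration), the empirical version of C = E[(c − E(c))(c − E(c))^T] can be computed and compared against numpy's built-in estimator:

```python
import numpy as np

# Estimate the covariance C = E[(c - E c)(c - E c)^T] from samples.
rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.5]])
samples = rng.standard_normal((10_000, 3)) @ mixing   # correlated features

centered = samples - samples.mean(axis=0)
C = centered.T @ centered / (len(samples) - 1)        # n x n covariance

# Diagonal: the n feature variances; off-diagonal: pairwise interactions.
assert np.allclose(C, np.cov(samples, rowvar=False))
```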

6–7. Symmetric matrices as ellipses
Ellipse = { Cu : ||u||_2 = 1 }
(Figure: dotted lines connect each point u on the unit ball with the point Cu on the ellipse.)
Eigenvectors form the axes; eigenvalues are the axis lengths.
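
A small numerical check of the eigenvector/eigenvalue picture, with an assumed 2×2 example matrix:

```python
import numpy as np

# The image of the unit circle under a symmetric positive definite C is
# an ellipse: eigenvectors give the axes, eigenvalues the axis lengths.
C = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues 1 and 3 here

for lam, v in zip(eigvals, eigvecs.T):
    # The unit eigenvector v on the ball maps to C v = lam * v on the
    # ellipse: an axis endpoint at distance lam from the origin.
    assert np.allclose(C @ v, lam * v)
    assert np.isclose(np.linalg.norm(C @ v), lam)
```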

8. Dyads
A dyad is uu^T, where u is a unit vector. One eigenvalue equals one, all others are zero: a rank-one projection matrix.
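
The three claims on this slide are easy to verify numerically; a minimal sketch with an assumed random unit vector:

```python
import numpy as np

# A dyad u u^T built from a unit vector is a rank-one projection.
rng = np.random.default_rng(2)
u = rng.standard_normal(5)
u /= np.linalg.norm(u)                 # make u a unit vector
P = np.outer(u, u)                     # the dyad u u^T

eigvals = np.linalg.eigvalsh(P)        # ascending order
assert np.isclose(eigvals[-1], 1.0)    # one eigenvalue equal to one
assert np.allclose(eigvals[:-1], 0.0)  # all others zero
assert np.allclose(P @ P, P)           # idempotent: a projection
assert np.linalg.matrix_rank(P) == 1   # rank one
```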

9. Directional variance
The variance along direction u is V(c · u) = u^T C u = tr(C uu^T) ≥ 0.
(Figure: the outer figure eight is the direction u scaled by the variance u^T C u.)
PCA: find the direction of largest variance.
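
A sketch of both identities (the random 4×4 covariance is an assumption for illustration): the two forms of the directional variance agree, and PCA's direction of largest variance is the leading eigenvector of C:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
C = A @ A.T                            # symmetric positive semidefinite

u = rng.standard_normal(4)
u /= np.linalg.norm(u)
# u^T C u == tr(C u u^T), and it is a nonnegative variance.
assert np.isclose(u @ C @ u, np.trace(C @ np.outer(u, u)))
assert u @ C @ u >= 0

# PCA: the unit direction of largest variance is the top eigenvector.
eigvals, eigvecs = np.linalg.eigh(C)
top = eigvecs[:, -1]
assert np.isclose(top @ C @ top, eigvals[-1])   # the maximal variance
```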

10. 3-dimensional variance plots
tr(C uu^T) is a generalized probability when tr(C) = 1.

11–13. View 3: generalized probability distributions
Probability vector: ω = (.2, .1, .6, .1) = Σ_i ω_i e_i (mixture coefficients times pure events).
Density matrix: W = Σ_i ω_i w_i w_i^T (mixture coefficients times pure density matrices, i.e. dyads).
Matrices as generalized distributions: many mixtures lead to the same density matrix, and there always exists a decomposition into n eigendyads.
A density matrix is a symmetric positive matrix of trace one.
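
A sketch with the slide's mixture weights (the random unit vectors w_i are assumptions): build the density matrix, check the trace-one property, and recover a decomposition into n eigendyads:

```python
import numpy as np

rng = np.random.default_rng(4)
omega = np.array([0.2, 0.1, 0.6, 0.1])           # mixture coefficients
ws = rng.standard_normal((4, 4))
ws /= np.linalg.norm(ws, axis=1, keepdims=True)  # unit vectors w_i

# Density matrix: a mixture of dyads, symmetric positive with trace one.
W = sum(o * np.outer(w, w) for o, w in zip(omega, ws))
assert np.isclose(np.trace(W), 1.0)
assert np.allclose(W, W.T)

# The eigendecomposition gives an (in general different) decomposition
# of the same W into n orthogonal eigendyads.
lam, U = np.linalg.eigh(W)
assert np.allclose(W, sum(l * np.outer(u, u) for l, u in zip(lam, U.T)))
```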

14. It's like a probability!
The total variance along an orthogonal set of directions is 1: in two dimensions, u_1^T W u_1 + u_2^T W u_2 = 1; in the three-dimensional figure, the three directional variances satisfy a + b + c = 1.

15. Uniform density?
Under the uniform density (1/n) I, all dyads have generalized probability 1/n:
    tr((1/n) I uu^T) = (1/n) tr(uu^T) = 1/n
The generalized probabilities of n orthogonal dyads sum to 1.
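
Both properties can be checked in a few lines (the random density matrix and bases are illustrative assumptions):

```python
import numpy as np

n = 4
rng = np.random.default_rng(5)
lam = rng.random(n)
lam /= lam.sum()                                   # eigenvalues summing to one
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthonormal basis
W = Q @ np.diag(lam) @ Q.T                         # density matrix, tr(W) = 1

# Generalized probabilities over any orthonormal basis sum to tr(W) = 1.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
probs = [u @ W @ u for u in U.T]                   # u^T W u = tr(W u u^T)
assert np.isclose(sum(probs), 1.0)

# Under the uniform density I/n, every dyad gets exactly 1/n.
uniform = np.eye(n) / n
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
assert np.isclose(np.trace(uniform @ np.outer(u, u)), 1 / n)
```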

16–19. Conventional Bayes rule
    P(M_i | y) = P(M_i) P(y | M_i) / P(y)
(Four slides show 4 successive updates with the same data likelihood.)
The update maintains uncertainty information about the maximum-likelihood model: a soft max.
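
A toy sketch of the soft-max behaviour (the prior and likelihood numbers are assumptions): repeating the update with the same likelihood concentrates the posterior on the maximum-likelihood model while retaining some uncertainty:

```python
import numpy as np

prior = np.array([0.25, 0.25, 0.25, 0.25])     # P(M_i)
likelihood = np.array([0.2, 0.5, 0.9, 0.4])    # P(y | M_i), same data each time

posterior = prior.copy()
for t in range(1, 5):                          # 4 updates, as on the slides
    posterior = posterior * likelihood
    posterior /= posterior.sum()               # divide by P(y)
    print(f"after update {t}: {np.round(posterior, 3)}")

# Mass piles up on argmax_i P(y | M_i): a soft max.
assert posterior.argmax() == likelihood.argmax()
```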

20–25. Bayes rule for density matrices
    D(M | y) = exp(log D(M) + log D(y | M)) / tr(exp(log D(M) + log D(y | M)))
(Six slides show 1, 2, 3, 4, 10, and 20 successive updates with the same data likelihood matrix D(y | M).)
The update maintains uncertainty information about the maximum eigenvalue: a soft max eigenvalue calculation.
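
A sketch of the density-matrix update, with matrix log/exp implemented via the eigendecomposition (the helper names and the likelihood matrix with spectrum {3, 2, 1} are assumptions for illustration). Starting from the uniform prior, k updates with the same D(y|M) give L^k / tr(L^k), which concentrates on the eigenvector of the maximum eigenvalue:

```python
import numpy as np

def sym_fun(A, f):
    """Apply f to the eigenvalues of the symmetric matrix A."""
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(f(lam)) @ U.T

def matrix_bayes(D_prior, D_lik):
    """One update: exp(log D(M) + log D(y|M)), normalized by its trace."""
    post = sym_fun(sym_fun(D_prior, np.log) + sym_fun(D_lik, np.log), np.exp)
    return post / np.trace(post)

n = 3
rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
L = Q @ np.diag([3.0, 2.0, 1.0]) @ Q.T       # positive definite likelihood matrix
D = np.eye(n) / n                            # uniform prior density

for _ in range(20):                          # 20 updates, as on the slides
    D = matrix_bayes(D, L)

# The posterior concentrates on the top eigenvector of the likelihood
# matrix: a soft max over eigenvalues.
top = np.linalg.eigh(L)[1][:, -1]
assert np.isclose(top @ D @ top, 1.0, atol=1e-3)
```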

26–27. Bayes' rules, vector vs. matrix
Vector: P(M_i | y) = P(M_i) · P(y | M_i) / Σ_j P(M_j) · P(y | M_j); regularizer: entropy.
Matrix: D(M | y) = D(M) ⊙ D(y | M) / tr(D(M) ⊙ D(y | M)), where A ⊙ B := exp(log A + log B); regularizer: quantum entropy.

28–29. Vector case as a special case of the matrix case
Vectors become diagonal matrices; all matrices then share the same eigensystem, and the fancy product ⊙ becomes the ordinary product ·.
The vector case is often the hardest problem, i.e. bounds for the vector case “lift” to the matrix case. This phenomenon has been dubbed the “free matrix lunch”. Size of matrix = size of vector = n; a sketch of the reduction follows.
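
A minimal sketch of the reduction (toy numbers assumed): for diagonal matrices the matrix log and exp act entrywise on the diagonal, so the fancy product ⊙ collapses to the ordinary elementwise product and the matrix Bayes rule reproduces the vector one:

```python
import numpy as np

prior = np.array([0.25, 0.25, 0.25, 0.25])   # P(M_i)
lik = np.array([0.2, 0.5, 0.9, 0.4])         # P(y | M_i)

# Vector Bayes rule.
post_vec = prior * lik / (prior * lik).sum()

# Matrix Bayes rule on the corresponding diagonal density matrices:
# exp(log D(M) + log D(y|M)) acts entrywise on the shared diagonal.
fancy = np.diag(np.exp(np.log(prior) + np.log(lik)))
post_mat = fancy / np.trace(fancy)

assert np.allclose(np.diag(post_mat), post_vec)   # the two rules coincide
```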