Prof. Xiangliang Zhang's algorithm talk at CNCC.

Posted by 献良 on 2018/10/26 09:39

Transcript

1. The Problem of Selecting Representatives and Being Represented in Unsupervised Learning - AP & LLE. 张响亮 Xiangliang Zhang, King Abdullah University of Science and Technology. CNCC, Oct 25, 2018, Hangzhou, China

2. Outline • Affinity Propagation (AP) [Frey and Dueck, Science, 2007] • Locally Linear Embedding (LLE) [Roweis and Saul, Science, 2000]

3. Affinity Propagation [Frey and Dueck, Science, 2007]

4. Affinity Propagation [Frey and Dueck, NIPS 2005]: "We describe a new method that, for the first time to our knowledge, combines the advantages of model-based clustering and affinity-based clustering." [The slide shows a mixture model $p(x) = \sum_k \pi_k \, p(x \mid \theta_k)$, with the mixing coefficients $\pi_k$ and the components $p(x \mid \theta_k)$ labeled.]

5. Clustering: group the similar points together. With centers defined by the cluster mean (K-means): minimize $\sum_{m=1}^{k} \sum_{x_i \in C_m} \| x_i - \mu_m \|^2$, where $\mu_m = \frac{1}{|C_m|} \sum_{x_i \in C_m} x_i$. With centers restricted to data points (K-medoids, K-medians): minimize the same kind of within-cluster distance, where $\mu_m \in \{ x_i \mid x_i \in C_m \}$.
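
To make the contrast concrete, a minimal NumPy sketch of the two center definitions (the function names are illustrative, not from the slides):

```python
import numpy as np

def kmeans_centers(X, labels, k):
    """K-means: mu_m is the mean of cluster m."""
    return np.array([X[labels == m].mean(axis=0) for m in range(k)])

def kmedoids_centers(X, labels, k):
    """K-medoids: mu_m must be an actual member of cluster m; pick the
    point minimizing the sum of squared distances within the cluster."""
    centers = []
    for m in range(k):
        C = X[labels == m]
        d = ((C[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
        centers.append(C[d.sum(axis=1).argmin()])
    return np.array(centers)

def objective(X, labels, centers):
    """The within-cluster sum of squared distances being minimized."""
    return sum(np.sum((X[labels == m] - c) ** 2)
               for m, c in enumerate(centers))
```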

6. Inspired by greedy k-medoids. Given data to cluster, the slide defines: the likelihood of $x_i$ belonging to the cluster with a given center; the Bayesian prior probability that a point is a cluster center; and the responsibility of the k-th component for generating $x_i$. The greedy procedure alternates between assigning each $x_i$ to a center $s_i$ and choosing a new center.
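
The "responsibility" here is the standard mixture-model posterior; a minimal sketch, assuming a Gaussian mixture for illustration (the slide does not fix the component form):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, weights, means, covs):
    """gamma_ik = pi_k N(x_i | mu_k, Sigma_k) / sum_j pi_j N(x_i | mu_j, Sigma_j):
    the posterior probability that component k generated x_i."""
    dens = np.column_stack([
        w * multivariate_normal.pdf(X, mean=m, cov=c)
        for w, m, c in zip(weights, means, covs)
    ])
    return dens / dens.sum(axis=1, keepdims=True)
```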

7. Understanding the process. The message sent from $x_i$ to each center/exemplar expresses its preference to be with that exemplar; this alone forces a hard decision for cluster centers/exemplars. The fix is to introduce "availabilities": messages sent from exemplars back to the other points, providing soft evidence of how available each exemplar is to serve as a center for each point.

8. The method presented in NIPS '05: • Responsibilities are computed using likelihoods and availabilities. • Availabilities are computed using responsibilities, recursively.

9. Interpretation by factor graph: $c_i$ is the index of the exemplar for $x_i$. Constraints: a cluster should not be empty, and it has a single exemplar; an exemplar must select itself as its own exemplar. The objective function is the net similarity $\sum_i s(i, c_i)$, maximized subject to these constraints.

10. Input and output of AP in Science '07: the input is a pairwise similarity matrix $s(i,k)$, with each self-similarity $s(k,k)$ set to the preference (prior) for point $k$ to become an exemplar; the output is a set of exemplars and the assignment of every point to one of them.

11. AP: a message-passing algorithm. Two kinds of messages are exchanged: responsibilities $r(i,k)$, sent from point $i$ to candidate exemplar $k$, and availabilities $a(i,k)$, sent from candidate exemplar $k$ back to point $i$.
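
The update rules from the Science '07 paper, in a minimal NumPy sketch (the vectorization and the damping default are mine; S is a precomputed similarity matrix with the preferences on its diagonal):

```python
import numpy as np

def affinity_propagation(S, max_iter=200, damping=0.5):
    """Sketch of the AP updates [Frey and Dueck, Science 2007]."""
    N = S.shape[0]
    R = np.zeros((N, N))  # responsibilities r(i, k)
    A = np.zeros((N, N))  # availabilities a(i, k)

    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max_{k' != k} [ a(i,k') + s(i,k') ]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first = AS[np.arange(N), idx]
        AS[np.arange(N), idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(N), idx] = S[np.arange(N), idx] - second
        R = damping * R + (1 - damping) * R_new

        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) <- sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())    # keep r(k,k) in the column sums
        A_new = Rp.sum(axis=0)[None, :] - Rp  # drop the i' = i term per row
        dA = A_new.diagonal().copy()          # this diagonal is already a(k,k)
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, dA)
        A = damping * A + (1 - damping) * A_new

    # exemplar of point i: argmax_k [ a(i,k) + r(i,k) ]
    return np.argmax(A + R, axis=1)
```

Setting the preferences S[k,k] to the median of the input similarities, as the paper suggests, yields a moderate number of clusters.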

12-19. Iterations of message passing in AP (the same title repeated across eight animation-frame slides).

20. Summary of AP

21. Extensive study of AP

22. Outline • Affinity Propagation (AP) [Frey and Dueck, Science, 2007] • Locally Linear Embedding (LLE) [Roweis and Saul, Science, 2000]

23. Locally Linear Embedding (LLE) [Roweis and Saul, Science, 2000]. Saul and Roweis. Think globally, fit locally: unsupervised learning of low dimensional manifolds. JMLR, 2003.

24. LLE - motivations

25. Inspired by MDS. Multidimensional Scaling (MDS) finds an embedding of objects in a low-dimensional space that preserves pairwise distances: given pairwise similarities, find the embedding. Can we eliminate the need to estimate pairwise distances between widely separated data points?
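
For reference, a minimal sketch of classical MDS (assuming squared Euclidean distances as input; the helper name is illustrative):

```python
import numpy as np

def classical_mds(D2, d=2):
    """Classical MDS: given squared pairwise distances D2, recover
    coordinates whose pairwise distances approximate them."""
    N = D2.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N   # centering matrix
    B = -0.5 * J @ D2 @ J                 # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)        # ascending eigenvalues
    top = np.argsort(vals)[::-1][:d]      # take the largest d
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))
```

LLE answers the closing question by working only with distances inside each local neighborhood, as the next slide describes.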

26. LLE – general idea — Locally, on a fine enough scale, everything looks linear. — Represent each object as a linear combination of its neighbors. — Assumption: the same linear representation will hold in the low-dimensional space. — Find a low-dimensional embedding which minimizes the reconstruction loss.

27. LLE – matrix representation. 1. Select the K nearest neighbors of each point. 2. Reconstruct $x_i$ from its K nearest neighbors: find $W$ by minimizing $\varepsilon(W) = \sum_i \big\| x_i - \sum_j w_{ij} x_j \big\|^2$.
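
Per point, this least-squares problem has the closed form given on slide 29; a minimal sketch (the regularization term is a common practical addition, not from the slides):

```python
import numpy as np

def reconstruction_weights(x, neighbors, reg=1e-3):
    """Weights reconstructing x from its K neighbors, with sum(w) = 1.
    Closed form: w = C^{-1} 1 / (1^T C^{-1} 1)."""
    Z = neighbors - x                           # shift x to the origin
    C = Z @ Z.T                                 # local Gram matrix C_jk
    C += reg * np.trace(C) * np.eye(len(C))     # regularize (assumed) if K > dim
    w = np.linalg.solve(C, np.ones(len(C)))
    return w / w.sum()
```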

28. LLE – matrix representation. 3. Need to solve the system $Y = WY$. Find the embedding vectors $Y_i$ by minimizing $\varepsilon(Y) = \sum_{i=1}^{N} \big\| Y_i - \sum_j W_{ij} Y_j \big\|^2 = \sum_{i,j} M_{ij} \, (Y_i \cdot Y_j)$, where $M = (I - W)^T (I - W)$, s.t. $\sum_{i=1}^{N} Y_i = 0$ (centered on the origin) and $\frac{1}{N} \sum_{i=1}^{N} Y_i Y_i^T = I$ (with unit covariance).
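
Under these constraints the minimizer is given by the bottom eigenvectors of M; a minimal sketch (helper name illustrative):

```python
import numpy as np

def lle_embedding(W, d=2):
    """Minimize ||Y - WY||^2 subject to centering and unit covariance:
    take the eigenvectors of M = (I - W)^T (I - W) with the smallest
    eigenvalues, discarding the constant eigenvector for eigenvalue 0."""
    N = W.shape[0]
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return vecs[:, 1:d + 1]          # skip the bottom (constant) eigenvector
```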

29. LLE – algorithm summary. 1. Find the K nearest neighbors of each point in X space. 2. Solve for the reconstruction weights W: $w = \frac{C^{-1}\mathbf{1}}{\mathbf{1}^T C^{-1}\mathbf{1}}$, where $C_{jk} = (x - \eta_j)^T (x - \eta_k)$ and $\eta_j$ is one of x's K nearest neighbors. 3. Compute the embedding coordinates Y using the weights W: — Create the sparse matrix $M = (I - W)^T (I - W)$. — Set Y to the eigenvectors corresponding to the bottom d non-zero eigenvalues of M.
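
Tying the steps together, an end-to-end sketch reusing the two helpers above (brute-force neighbor search, for illustration only):

```python
import numpy as np

def lle(X, K=10, d=2):
    """Steps 1-3 of the summary: neighbors, weights W, then embedding Y."""
    N = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    W = np.zeros((N, N))
    for i in range(N):
        nbr = np.argsort(D2[i])[1:K + 1]        # K nearest, skipping i itself
        W[i, nbr] = reconstruction_weights(X[i], X[nbr])
    return lle_embedding(W, d)
```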

30. Continuing, SNE. SNE allows "many-to-one" mappings, in which a single ambiguous object really belongs in several disparate locations in the low-dimensional space, while LLE makes a one-to-one mapping. $p_{j|i}$ is the asymmetric probability that point $i$ would pick $j$ as its neighbor (a Gaussian neighborhood in the original space); $q_{j|i}$ is the induced probability that $i$ picks point $j$ as its neighbor (a Gaussian neighborhood in the low-dimensional space).
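
A minimal sketch of the high-dimensional side (in SNE each bandwidth sigma_i is tuned to a target perplexity; here the bandwidths are taken as given):

```python
import numpy as np

def sne_p_conditional(X, sigmas):
    """p_{j|i} = exp(-||x_i - x_j||^2 / (2 sigma_i^2)), normalized over j != i:
    the Gaussian neighborhood of each point in the original space."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = np.exp(-D2 / (2 * sigmas[:, None] ** 2))
    np.fill_diagonal(P, 0)                       # a point never picks itself
    return P / P.sum(axis=1, keepdims=True)
```

q_{j|i} is computed the same way from the low-dimensional coordinates, with a fixed bandwidth.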

31. Continuing, t-SNE — uses a Student-t distribution (heavier tails) rather than a Gaussian to compute the similarity between two points in the low-dimensional space — uses a symmetrized version of the SNE cost function, with simpler gradients.
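
The low-dimensional similarity then becomes a Student-t kernel with one degree of freedom; a minimal sketch:

```python
import numpy as np

def tsne_q(Y):
    """q_ij proportional to (1 + ||y_i - y_j||^2)^-1, with a single
    normalization over all pairs (the symmetrized formulation)."""
    D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    inv = 1.0 / (1.0 + D2)
    np.fill_diagonal(inv, 0)
    return inv / inv.sum()
```

The heavy tails let moderately dissimilar points sit farther apart in the embedding, which mitigates the crowding that a Gaussian kernel produces.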

32. LLE – follow-up work [Chen and Liu, 2011]: the LLE output strongly depends on the selection of k. Jing Chen and Yang Liu. Locally linear embedding: a survey. Artificial Intelligence Review, 2011.
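
One simple way to see this dependence (an illustration, not the survey's prescription) is to scan k and watch the reconstruction error from step 2:

```python
import numpy as np

# reuses reconstruction_weights from the LLE sketch above
def reconstruction_error(X, K):
    """Total error of reconstructing each point from its K nearest neighbors."""
    N = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    err = 0.0
    for i in range(N):
        nbr = np.argsort(D2[i])[1:K + 1]
        w = reconstruction_weights(X[i], X[nbr])
        err += np.sum((X[i] - w @ X[nbr]) ** 2)
    return err

# errors = {K: reconstruction_error(X, K) for K in range(4, 21)}
```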

33. [Ting and Jordan, 2018]

34. Thank you for your attention! Lab of Machine Intelligence and kNowledge Engineering (MINE): http://mine.kaust.edu.sa/
