Semi-Supervised Learning With Graphs


1. Semi-Supervised Learning With Graphs. Shannon Quinn (with thanks to William Cohen at CMU)

2. Semi-supervised learning. Given a pool of labeled examples L and a (usually larger) pool of unlabeled examples U: can you somehow use U to improve accuracy?

3. Semi-Supervised Bootstrapped Learning / Self-training. Task: extract cities. [Figure: a bipartite graph linking seed noun phrases (Paris, Pittsburgh, Seattle, Cupertino, San Francisco, Austin, Berlin) to the contexts that extract them ("mayor of arg1", "live in arg1", "arg1 is home of", "traits such as arg1"); bootstrapping also pulls in non-cities such as denial, anxiety, and selfishness.]

4. Semi-Supervised Bootstrapped Learning via Label Propagation. [Figure: the same bipartite graph of noun phrases and contexts, now viewed as a graph over which labels propagate outward from the seed cities.]

5. Semi-Supervised Bootstrapped Learning via Label Propagation. Some nodes are "near" the seeds, others are "far from" them; information from other categories (e.g. trait words like arrogance, denial, and selfishness, reached through "traits such as arg1") tells you "how far" is too far, i.e. when to stop propagating. [Figure: the same bipartite graph, with the near and far regions marked.]

6. Semi-Supervised Learning as Label Propagation on a (Bipartite) Graph. Propagate labels to nearby nodes. X is "near" Y if there is a high probability of reaching X from Y with a random walk in which each step either (a) moves to a random neighbor or (b) jumps back to the start node Y whenever the walk is at an NP node. This notion of nearness rewards multiple paths, penalizes long paths, and penalizes high-fanout paths. Propagation method: "personalized PageRank" (a.k.a. damped PageRank, or random walk with reset). [Figure: the same NP/context graph, including contexts such as "I like arg1" linking NPs like beer.]
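As an illustration, here is a minimal sketch of this propagation, assuming a small dict-of-lists adjacency structure with hypothetical numeric node ids; for brevity it restarts from every node, rather than only from NP nodes as the slide describes.

```python
import numpy as np

def personalized_pagerank(adj, seed, damping=0.85, iters=100):
    """Random walk with reset: with probability `damping` move to a
    uniformly random neighbor; otherwise jump back to the seed node.
    Returns the (approximate) stationary score vector."""
    n = len(adj)
    # Column-stochastic transition matrix: P[j, i] = 1/deg(i) for edge i -> j.
    P = np.zeros((n, n))
    for i, nbrs in adj.items():
        for j in nbrs:
            P[j, i] = 1.0 / len(nbrs)
    r = np.zeros(n)
    r[seed] = 1.0                      # restart distribution on the seed
    v = r.copy()
    for _ in range(iters):
        v = damping * P @ v + (1 - damping) * r
    return v

# Toy bipartite graph: NPs {0: Paris, 1: Pittsburgh, 2: anxiety} and
# contexts {3: "mayor of arg1", 4: "traits such as arg1"} (hypothetical ids).
adj = {0: [3], 1: [3], 2: [4], 3: [0, 1], 4: [2]}
print(personalized_pagerank(adj, seed=1))  # Paris scores high; anxiety stays ~0
```

Because every walk restarts at the seed, short, redundant, low-fanout paths from the seed dominate the scores, which is exactly the nearness behavior listed above.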

7. ASONAM-2010 (Advances in Social Networks Analysis and Mining)

8. Network Datasets with Known Classes: UBMCBlog, AGBlog, MSPBlog, Cora, Citeseer

9. RWR: the fixpoint of v = d·(W v) + (1 − d)·v0, where W is the graph's transition matrix, v0 is the restart (seed) distribution, and d is the damping factor. Seed selection: order nodes by PageRank, by degree, or randomly, and go down the list until you have at least k examples per class.
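A sketch of that selection loop; the ranking function, label map, and the exact stopping rule are my assumptions about the slide's intent.

```python
def pick_seeds(nodes, rank_score, true_label, k, num_classes):
    """Order nodes by rank_score (PageRank, degree, or a random key),
    then go down the list, taking each node as a seed, until every
    class has at least k seeds."""
    seeds, counts = [], {}
    for node in sorted(nodes, key=rank_score, reverse=True):
        seeds.append(node)
        counts[true_label[node]] = counts.get(true_label[node], 0) + 1
        if len(counts) == num_classes and min(counts.values()) >= k:
            return seeds
    return seeds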

10. CoEM / HF / wvRN. One definition, the weighted-vote relational neighbor (wvRN) classifier [Macskassy & Provost, JMLR 2007]: …

11. CoEM / HF / wvRN. Another definition appears in [X. Zhu, Z. Ghahramani, and J. Lafferty, ICML 2003]: a harmonic field (HF), in which the score of each node in the graph is the harmonic, i.e. linearly weighted, average of its neighbors' scores.
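A minimal sketch of that update for a single class score, assuming a dense symmetric weight matrix W and NaN-marked unlabeled nodes; a multi-class version would use one score column per class.

```python
import numpy as np

def harmonic_field(W, labels, iters=200):
    """Harmonic-field propagation in the style of Zhu et al. (2003):
    repeatedly replace each node's score with the weighted average of
    its neighbors' scores, keeping labeled nodes clamped.
    labels: length-n array, class score for labeled nodes, NaN otherwise.
    Assumes every node has positive weighted degree."""
    labeled = ~np.isnan(labels)
    f = np.where(labeled, labels, 0.0)
    d = W.sum(axis=1)                   # weighted degrees
    for _ in range(iters):
        f = (W @ f) / d                 # harmonic (weighted-average) update
        f[labeled] = labels[labeled]    # clamp the known labels
    return f
```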

12. CoEM / HF / wvRN. Another justification of the same algorithm: start with co-training with a naïve Bayes learner.

13.–15. [Slides containing only figures/equations; no text was extracted.]

16.–24. How to do this minimization? First, differentiate: the minimum is where the gradient vanishes, which reduces the problem to solving a linear system Ax = b. Jacobi method: to solve Ax = b for x, split A = D + R, where D is the diagonal of A and R holds the off-diagonal entries, then iterate x_{k+1} = D^{-1}(b − R x_k); or, componentwise, x_i ← (b_i − Σ_{j≠i} A_ij x_j) / A_ii.

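A compact sketch of the Jacobi iteration just described, on a hypothetical 2×2 system; convergence requires, e.g., a strictly diagonally dominant A.

```python
import numpy as np

def jacobi(A, b, iters=100):
    """Jacobi method: split A = D + R (D = diagonal of A, R = the rest)
    and iterate x_{k+1} = D^{-1} (b - R x_k)."""
    D = np.diag(A)                    # diagonal entries of A, as a vector
    R = A - np.diag(D)                # off-diagonal part of A
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        x = (b - R @ x) / D           # componentwise divide by A_ii
    return x

A = np.array([[4.0, 1.0], [2.0, 5.0]])   # strictly diagonally dominant
b = np.array([1.0, 2.0])
print(jacobi(A, b))                       # close to np.linalg.solve(A, b)
```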

25. From Semi-Supervised to Unsupervised Learning

26. Spectral Clustering: Graph = Matrix. M·v1 = v2 "propagates weights from neighbors". [Figure: a 10-node graph (A–J) shown as its adjacency matrix M; multiplying M by a vector v1 of per-node values yields v2, in which each node's new value is the sum of its neighbors' values, e.g. A goes from 3 to 5 and B from 2 to 6.]

27. Repeated averaging with neighbors as a clustering method. Pick a vector v0 (maybe at random). Compute v1 = W·v0, i.e., replace v0[x] with the weighted average of v0[y] over the neighbors y of x. Plot v1[x] for each x. Repeat for v2, v3, … Variants of this are widely used for semi-supervised learning, with clamping of the labels for nodes whose labels are known. Without clamping, vt will converge to a constant vector. What are the dynamics of this process?
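A toy run of these dynamics, on a hypothetical 6-node graph made of two triangle clusters joined by a single bridge edge; without clamping, the printed values flatten within each cluster quickly and then drift toward one global constant.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0, 0, 0],      # nodes 0-2: one cluster
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],      # edge 2-3 is the bridge
              [0, 0, 1, 0, 1, 1],      # nodes 3-5: the other cluster
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
W = A / A.sum(axis=1, keepdims=True)   # row-normalize: an averaging operator

v = rng.random(6)                      # v0: random start
for t in range(1, 11):
    v = W @ v                          # v_t[x] = average of v_{t-1} over x's neighbors
    print(t, np.round(v, 3))
```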

28. Repeated averaging with neighbors on a sample problem. Create a graph connecting every point in the 2-D initial space to every other point, weighted by distance. Run power iteration for 10 steps. Plot node id x vs. v10(x), with nodes ordered by their actual cluster number. [Plot: the values v10(x) form three plateaus corresponding to the blue, green, and red clusters.]

29.–31. Repeated averaging with neighbors on a sample problem, continued. [Plots of vt(x) at successive iterations: values within each cluster flatten quickly, while the differences between the blue, green, and red plateaus shrink much more slowly, eventually becoming very small.]

32. PIC: Power Iteration Clustering. Run power iteration (repeated averaging with neighbors) with early stopping. v0: a random start, the "degree matrix" D, or … Easy to implement and efficient; very easily parallelized. Experimentally, it is often better than traditional spectral methods, which is surprising since the embedded space is 1-dimensional!
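A sketch of the PIC embedding step, following my recollection of Lin & Cohen's ICML 2010 description; the degree-vector start, L1 rescaling, and acceleration-based stopping rule below should be treated as assumptions.

```python
import numpy as np

def pic_embed(A, tol=1e-5, max_iters=1000):
    """Power Iteration Clustering: run v <- W v on the row-normalized
    affinity matrix W, with early stopping, and return the intermediate
    vector as a 1-D embedding of the nodes."""
    W = A / A.sum(axis=1, keepdims=True)
    v = A.sum(axis=1) / A.sum()            # start from the degree vector
    delta_old = np.full_like(v, np.inf)
    for _ in range(max_iters):
        v_new = W @ v
        v_new /= np.abs(v_new).sum()       # keep the scale fixed
        delta = np.abs(v_new - v)
        v = v_new
        if np.abs(delta - delta_old).max() < tol:
            break                          # "acceleration" near zero: stop early
        delta_old = delta
    return v
```

The returned 1-D embedding is then clustered, e.g. with scikit-learn: KMeans(n_clusters=k).fit_predict(v.reshape(-1, 1)).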