谱图稀疏化:理论和实用方法概述

图形稀疏化是指通过保留输入的一些关键属性例如图拉普拉斯算子的特征值和特征向量(基于计算拉普拉斯特征向量加速许多聚类和分区算法)来计算一个较小的图形,从而即保证了近似又可以提高计算效率。本章还讲述了其他几种用于图形稀疏化的算法。包括求随机正定矩阵的和、用于求解的增量稀疏化、并行和分布式稀疏化、用于更好聚类的启发式算法等
展开查看详情

1. Spectral Graph Sparsification: overview of theory and practical methods Yiannis Koutis University of Puerto Rico - Rio Piedras

2. Graph Sparsification or ‘Sketching’ Compute a smaller graph that preserves some crucial property of the input Motivation: Computational efficiency with approximation guarantees • BFS: Breadth First Spanning Tree • Spanner: Spanning subgraph that approximately preserves distances

3. Spectral Graph Sparsification Compute a smaller graph that preserves some crucial property of the input We want to approximately preserve the eigenvalues and eigenvectors of the graph Laplacian Motivation: Speed-up many clustering and partitioning algorithms based on computing Laplacian eigenvectors

4. Spectral Graph Sparsification Compute a smaller graph that preserves some crucial property of the input We want to approximately preserve the quadratic form xTLx of the Laplacian L Implies spectral approximations for both the Laplacian and the normalized Laplacian

5. The Graph Laplacian 30 2 1 1 20 15

6. Spectral Sparsification by Picture • H is a reweighted subgraph of G • H is obtained using randomness (sampling)

7. Combinatorial sketching Spectral sketching Linear system solvers Better combinatorial sketching

8. Outline • Sums of random positive matrices • Combinatorial sketching to incremental sparsification • Incremental sparsification for solving • Parallel and distributed sparsification • Deep sparsification by effective resistances • A heuristic for better clustering

9. Matrix Ordering • Whenever for all vectors x we have • We write G¹H • In this notation, a spectral sparsifier H satisfies

10. Sums of random matrices [Tropp ‘12, adapted]: 1. Let S be a nxn PSD matrix and Y1, … , Yk be independent random PSD matrices also of size nxn. 2. Let Y = S+i Yi 3. Let Z = E[Y] S : Combinatorial Sketch 4. Suppose Yi ¹ R ¢ Z Yi : Edges R: should be O(C/log n)

11. Outline • Sums of random positive matrices • Combinatorial sketching to incremental sparsification • Incremental sparsification for solving • Parallel and distributed sparsification • Deep sparsification by effective resistances • A heuristic for better clustering

12. A simple algorithm 1. Compute a spanner S’: 2. Let H:= S’ 3. For every edge e not in H: H:=H + k*e, with probability 1/k H has O(nlog n) + m/k edges n: number of vertices m: number of edges

13. and a simple proof • Suppose the graph G has n vertices and m edges • For simplicity we let G be unweighted (generalization is easy) • By definition of spanner, for every edge e of G, there is a path pe in S ‘ that joins the two endpoints of e and has length logn. • Algebraically, if Ge is the Laplacian of edge e:

14. and a simple proof • Apply Tropp’s Theorem on: • Combinatorial Sketch: • Samples:

15. Incremental Sparsification for Solving H has O(nlog n) + m/k edges [Spielman and Teng] If we can construct H with same guarantees but only n+m/k edges then we can solve linear systems on Laplacians in O(mlog2n) time [K, Miller Peng 10] Use low-stretch tree instead of spanner. It preserves distances on average. Use sampling with skewed probability distribution.

16. Incremental Sparsification for Solving [K, Miller Peng 10,11] If A is a symmetric diagonally dominant matrix then an approximate solution to Ax= b can be computed in O(m log n log log n log (1/²)) time, where ² is the required precision Fact: Approximations to the j first eigenvectors can be computed via solving O(j log n) linear systems

17. Outline • Sums of random positive matrices • Combinatorial sketching to incremental sparsification • Incremental sparsification for solving • Parallel and distributed sparsification • Deep sparsification by effective resistances • A heuristic for better clustering

18. Parallel and Distributed Sparsification 1. Can we do better than incremental sparsification ? 2. Is there a parallel sparsification algorithm? 3. Is there a distributed sparsification algorithm ? • Spanners hold the key to questions #2, #3 • There are very efficient parallel and distributed algorithms for computing spanners. From this we get parallel and distributed incremental sparsification. This doesn’t itself imply parallel solvers. • The main problem is question #1.

19. Parallel and Distributed Sparsification • A better combinatorial sketch: • t-bundle spanner: A collection of graphs S1….St such that Si is a spanner for G – (S1+….+ Si-1) • A t-bundle spanner can be computed with t sequential calls to a spanner computation algorithm. • If t is small then the algorithm remains efficient (polylogarithmic parallel time and distributed rounds)

20. A simple algorithm 1. Compute O(log4 n)-bundle spanner S 2. Let H:= S 3. For every edge e not in H: H:=H + 2*e, with probability 1/2 H has O(nlog5 n) + m/2 edges n: number of vertices m: number of edges

21. A simple algorithm • H has O(nlog5 n) + m/2 edges • Small size reduction (factor of 2) • Very tight spectral approximation • Repeat recursively on H. In O(log n) rounds we get O(nlog6 n) edges and a constant spectral approximation.

22. A parallel solver • This is the best known parallel sparsification routine. • Improves the total work guarantees of a parallel solver recently described by Peng and Spielman. • Parallel solver that works in polylogarithmic time and does O(mlog3 n) work.

23. Outline • Sums of random positive matrices • Combinatorial sketching to incremental sparsification • Incremental sparsification for solving • Parallel and distributed sparsification • Deep sparsification by effective resistances • A heuristic for better clustering

24. Spielman-Srivastava: Deep sparsification • Spielman and Srivastava proved: There is a graph H with edges such that • The algorithm is based on sampling edges with probabilities proportional to the effective resistances of the edges in the graph • Proof uses similarly Tropp’s theorem

25. Combinatorial sketching Spectral sketching Linear system solvers Better combinatorial sketching

26. The Incidence Matrix 30 2 1 1 20 15 W is the diagonal matrix containing the square roots of edge weights

27. Spielman-Srivastava: Deep sparsification • Effective resistances can be approximated closely by solving O(log n) linear Laplacian systems as follows: 1. Let L be the Laplacian of the graph 2. Let B be the incident matrix of the graph 3. Let Q be a random Johnson-Lindenstrauss projection of size m x O(log n) 4. Solve the systems L X = BT Q 5. Effective resistance between vertices i and j is equal to the ||Xi – Xi||2 where Xi is the ith row of X • The solution X is a n x O(log n) matrix. • Each row can be interpreted as an embedding of corresponding vertex to the O(log n)-dimensional Euclidean space. • Let’s call this the effective resistance embedding.

28.• http://ccom.uprrp.edu/~ikoutis/SpectralAlgorithms.htm

29. Heuristic: clustering based on the effective resistance embedding • Small effective resistance for an edge e means that there are a lot of short connections between the two endpoints . • Points that are close in the geometric embedding should be close in this connectivity sense in the graph. • Idea: Produce a k-clustering of the graph by running k-means on the effective resistance embedding • Produced clusterings appear to have better properties than clusterings based on geometric embeddings using the k-first eigenvectors. It is also much faster for most values of k.