## 展开查看详情

1.Data Structures and Algorithms in Parallel Computing Lecture 3

2.Graphs G=(V,E) where V is the set of vertices and E the set of edges Weighted graph – edges have weights w(e) Sparse graph – m=|E| << n=|V| Diameter of a graph D(G) the maximum, over all pairs of vertices (u, v) , of the minimum number of edges that must be traversed to get from u to v

3.Examples Twitter graph Vertices are users Mentions or retweets are edges Fast changing topology 5,700 tweets/second Highest peak: 143,199 tweets/second (August 3, 2013)

4.Examples Romanian road network Vertices are cities Edges are main national roads Slow changing topology

5.Big Data Large volume > 2 B internet users 1 > 1 B a ctive F acebook users 1 > 2.5 M daily active Twitter users 2 High velocity >7.5 K Tweets/second 1 >1.5 K Skype calls/second 1 >2000k emails/second 1 1 http :// www.internetlivestats.com 2 http ://www.statista.com/ Kat Wiki Pam John Han @Pam … @Wiki … RT @Pam… @Wiki … Twitter Interactions

6.Applications Community detection Detect communities in graphs U sers which interact a lot Graph dynamics Spread of viral infections Viral marketing Shortest drive path given real traffic data …

7.Limitations Memory Too large to fit in memory Requires large shared memory architectures or distributed systems Compute Size and speed of change puts a lower limit on the processing speed

8.Parallel graph processing Sequential techniques are often hard to parallelize Easy Minimum spanning tree Biconnected components Hard Single source shortest path Efficiency depends also on the type of graph Sparse, dense, power law

9.Graph representations Edge lists List of edges Each edge is a vertex pair Adjacency lists Array of lists G[ i ] Each element corresponds to a vertex and contains the set of vertex neighbors Convert to edge list: Adjacency matrices Binary matrix where A ij =1 iff ( i,j ) in E Used only for sparse graphs as it requires O(n 2 ) space

10.Graph representations For parallel algorithms Replace linked lists with arrays Makes it easer to process each array in parallel

11.Some examples Breadth first search Connected components Minimum spanning tree

12.Breadth first search Traverse a tree or graph from the root by visiting neigbours first, before moving to the next level Keep a queue of not visited vertices

13.Parallel BFS Graph is given as adjacency array Visit vertices in each level in parallel No need for a queue Keep a set of frontier vertices generated at each step by collecting the neighbors of current frontier set Initial frontier set is the single source vertex (root for trees) Due to parallelism same new frontier vertex may be collected by more vertices Requires an extra step to clean duplicates New frontier vertex linked randomly with only one parent Work is O(m+n) Concurrent write model ensures a total depth of O(D)

14.Example

15.Algorithm

16.Connected components Label 2 vertices with same label iff there is a path between the two Sequentially it can be achieved by depth first or breadth first search

17.Parallel CC Use parallel BFS? Inneficient due to its O(D) depth Good only for small diameter graphs Solution Use graph contraction Start with set of vertices Contract neighborhood to a single vertex recursively O(log n) steps for each component to contract

18.Random mate graph contraction

19.Random mate graph contraction

20.Deterministic graph contraction Create disjoint trees by assigning decremental labels to each child and only linking a vertex if its label is smaller than that of the last child

21.Algorithm No relabeling required

22.Worse case scenario Star graph with root in the middle

23.Spanning trees CC can be extended to finding the spanning of a graph tree minimum spanning tree of a weighted graph

24.Minimum spanning tree All weights are distinct There exists a unique minimum spanning tree Based on random mate CC technique Instead of randomly picking a parent, pick edge with minimum weight and hook to vertex if it is a parent Keep track of the edges used for hooking

25.Minimum spanning tree All weights are distinct There exists a unique minimum spanning tree Based on random mate CC technique Instead of randomly picking a parent, pick edge with minimum weight and hook to vertex if it is a parent Keep track of the edges used for hooking

26.What’s next? BSP model SSSP, connected components, pagerank Vertex centric vs. subgraph centric Load balancing Importance of partitioning and graph type ...