05Data Structures and Algorithms--BSP model

The course is mainly about BSP model,Generally covered BSP,supersteps,Pagerank,Single Source Shortest Path,Connected Components,Pagerank,Single Source Shortest Path,Connected Components,Apache Giraph,Building an application and so on.

1.Data Structures and Algorithms in Parallel Computing Lecture 5

2.BSP Processors + network + synchronization Superstep Concurrent parallel computation Message exchanges between processors Barrier synchronization All processors reaching this point wait for the rest

3.Supersteps A BSP algorithm is a sequence of supersteps Computation superstep Many small steps Example: floating point operations (addition, subtraction, etc.) Communication superstep Communication operations each transmitting a data word Example: transfer a real number between 2 processors In theory we distinguish between the 2 types of supersteps In practice we assume a single superstep

4.Some applications Pagerank Single Source Shortest Path (SSSP) Connected Components

5.Pagerank Analysis algorithm to determine the importance of a document Based on the number of references to it and the importance of the source documents Named after Larry Page


7.Pagerank Source: wikipedia

8.Solving Pagerank System of linear equations Iterative loop till convergence

9.Pagerank in Pregel

10.Experimental results On Apache Giraph Taken from http://muratbuffalo.blogspot.ro/2015/09/one-trillion-edges-graph-processing-at.html

11.SSSP Find shortest path between a single source vertex and every other vertex in the graph Dijsktra’s algorithm for sequential computations

12.Sequential SSSP Source: wikipedia

13.SSSP in Pregel

14.Experimental results Binary trees

15.Connected components (recap) Label 2 vertices with same label iff there is a path between the two Sequentially it can be achieved by depth first or breadth first search

16.CC in Pregel Use graph contraction Algorithm Each vertex starts with a label Each vertex sends its label to all neighbors Each vertex replaces its label with the minimum (maximum) value it receives from neighbors Algorithm stops when convergence is achieved

17.Experimental results

18.Apache Giraph Pregel is proprietary Giraph is an open source Pregel implementation Runs on standard Hadoop Computation is executed in memory Can be a job in a pipeline (MapReduce) Uses Zookeeper for synchronization

19.Building an application Create a custom vertex by extending BasicVertex Create a custom input format Adjacency list where each line looks like vertexID neighborID1 neighborID2 … Extend the TextVertexInputFormat Create a custom output format Extend the TextVertexOutputFormat

20.What’s next? Vertex centric vs. subgraph centric Load balancing Importance of partitioning and graph type ...