# JanusGraph - Distributed Graph Database with HBase

## 展开查看详情

1. hosted by HBaseConAsia2018 JanusGraph — Distributed graph database with HBase XueMin Zhang @ TalkingData

2. hosted by Content 01 About Us 02 Something about Graph 03 Introduction to JanusGraph 04 JanusGraph with HBase

3. hosted by Content 01 About Us 02 Something about Graph 03 Introduction to JanusGraph 04 JanusGraph with HBase

4. hosted by About us About me • Seven years of practical experience in technical research and development(R&D),focusing on distributed storage, distributed computing, real-time computing, etc. • Successively worked in Sina Weibo and TalkingData, and served as the big data Team Leader of Sina r&d center. • Technical speechers on the platforms of China Hadoop, Strata Hadoop/Data Conference and DTCC. About TalkingData • Founded in 2011, TalkingData is China’s leading third-party big data platform. With SmartDP as the core of its data intelligence application ecosystem, TalkingData empowers enterprises and helps them achieve a data-driven digital transformation. • From the beginning, TalkingData’s vision of using “big data for smarter business decisions and a better world” has allowed it to gradually become China’s leading data intelligence solution provider. TalkingData creates value for clients and serves as their “performance partner,” helping modern enterprises achieve data-driven transformation and accelerating the digitization of clients from various industries. Using data-generated insights to change how people see the world and themselves, TalkingData hopes to ultimately improve people’s lives.

5. hosted by Content 01 About Us 02 Something about Graph 03 Introduction to JanusGraph 04 JanusGraph with HBase

6. hosted by Something about Graph What is a Graph Database As name suggests, it is a database. Uses graph structures for semantic queries with nodes, edges and properties to represent and store data. Allow data in the store to be linked together directly. compare with traditional relational databases Hybrid relations. Handy in finding connections between entities.

7. hosted by Something about Graph Graph Structures - Vertices Vertices are the nodes or points in a graph structure Every vertex may contain a unique ID.

8. hosted by Something about Graph Graph Structures - Vertices Vertices are the nodes or points in a graph structure Every vertex may contain a unique ID. Vertices can be associated with a set of properties (key- value pairs)

9. hosted by Something about Graph Graph Structures - Edges Edges are the connections between the vertices in a graph

10. hosted by Something about Graph Graph Structures - Edges Edges are the connections between the vertices in a graph Edges can be nondirectional, directional, or bidirectional

11. hosted by Something about Graph Graph Structures - Edges Edges are the connections between the vertices in a graph Edges can be nondirectional, directional, or bidirectional Edges like vertices can have properties and id

12. hosted by Something about Graph Graph Structures - Graph G = (V, E) The graph is the collection of vertices, edges, and associated properties Vertices and edges can use label classification

13. hosted by Something about Graph Graph Storage Model - Adjacency Matrix G.vertices = 1 2 3 4 5 6 1 2 3 4 50 61 G.edges = 1 1 1 0 0 2 1 0 0 0 0 0 3 1 0 0 1 0 0 4 1 0 1 0 1 0 5 0 0 0 1 0 0 6 0 0 1 0 0 0

14. hosted by Something about Graph Graph Storage Model - Adjacency Lists 1 2 3 4 Λ 2 1 Λ 3 1 4 6 Λ 4 1 3 5 Λ 5 4 Λ 6 3 Λ

15. hosted by Content 01 About Us 02 Something about Graph 03 Introduction to JanusGraph 04 JanusGraph with HBase

16. hosted by Introduction to JanusGraph Scalable graph database distribute on multi-maching clusters with pluggable storage and indexing. Fully compliant with Apache TinkerPop graph computing framework. Optimized for storing/querying billions of vertices and edges. Supports thousands of concurrent users. Can execute local queries (OLTP) or cross-cluster distributed queries (OLAP). Sponsored by the Linux Foundation. Apache License 2.0

17. hosted by Introduction to JanusGraph Architecture

18. hosted by Introduction to JanusGraph Apache Tinkerpop & Gremlin A graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP) Gremlin graph traversal language

19. hosted by Introduction to JanusGraph Schema and Data Modeling Consist of edge labels, property keys, vertex labels ,index Explicit or Implicit Can evolve over time without database downtime

20. hosted by Introduction to JanusGraph Schema - Edge Label Multiplicity MULTI: Multiple edges of the same label between vertices SIMPLE: One edge with that label (unique per label) MANY2ONE: One outgoing edge with that label ONE2MANY: One incoming edge with that label ONE2ONE: One incoming, one outgoing edge with that label

21. hosted by Introduction to JanusGraph Schema - Property Key Data Types

22. hosted by Introduction to JanusGraph Schema - Property Key Cardinality SINGLE: At most one value per element. LIST: Arbitrary number of values per element. Allows duplicates. SET: Multiple values, but no duplicates.

23. hosted by Introduction to JanusGraph Storage Model

24. hosted by Introduction to JanusGraph What is Graph Partitioning? When the JanusGraph cluster consists of multiple storage backend instances, the graph must be partitioned across those machines. Stores graph in an adjacency list , ssignment of vertices to machines determines the partitioning. Different ways to partition a graph Random Graph Partitioning Explicit Graph Partitioning

25. hosted by Introduction to JanusGraph Random Graph Partitioning Pros Very efficient Requires no configuration Results in balanced partitions Cons Less efficient query processing as the cluster grows Requires more cross-instance communication to retrieve the desired

26. hosted by Introduction to JanusGraph Explicit Graph Partitioning Pros Ensures strongly connected subgraphs are stored on the same instance Reduces the communication overhead significantly Easy to setup Cons Only enabled against storage backends that support ordered key Hotspot issue

27. hosted by Introduction to JanusGraph Edge Cut & Vertex Cut Edge Cut Vertices are hosted on separate machines. Optimization aims to reduce the cross communication and thereby improve query execution. Vertex Cut (by label) A vertex label can be defined as partitioned which means that all vertices of that label will be partitiond across the cluster. In other words, Storing a subset of that vertex’s adjacency list on each partition . Address the hotspot issue caused by vertices with a large number of incident edges.

28. hosted by Introduction to JanusGraph What is Graph Index? graph indexes : efficient retrieval of vertices or edges by their properties Composite Index (supported through the primary storage backend) Mixed Index (supported through external indexing backend) vertex-centric indexes : effectively address query performance for large degree vertices

29. hosted by Content 01 About Us 02 Something about Graph 03 Introduction to JanusGraph 04 JanusGraph with HBase