Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apache

Graph data and graph analytics are increasingly important in data science and engineering. Cypher is an open language used for querying and updating graph databases and analytics platforms, which is now available in the Apache Spark environment. Neo4j Morpheus leverages the open source graph language project to integrate data from Neo4j operational graph databases with Hive and JDBC SQL data sources, using new Cypher features like the Property Graph Catalog, named graphs, graph projection, parameterized graph view functions, and graph/table views. Input and output graphs can be loaded and stored as structured collections of DataFrames with strong graph schemas to ensure data consistency and graph query optimization. Property graphs can also be analyzed and transformed using graph algorithms such as those in the GraphFrames project. Besides describing and demonstrating these capabilities, this talk also discusses the Spark Project Improvement Proposal to bring Cypher into Spark 3.0, and outlines current work to unify Cypher with other graph query languages to form a new ISO standard Graph Query Language.

1.WIFI SSID:SparkAISummit | Password: UnifiedAnalytics









10.Node ● Represents an entity within the graph ● Can have labels Relationship ● Connects a start node with an end node ● Has one type Property ● Describes a node/relationship: e.g. name, age, weight etc ● Key-value pair: String key; typed value (string, number, bool, list, ...)


12.Property graph view of data mirrors conceptual view ○ Entities and relationships, with attributes ○ Nodes and relationships, with properties Graph queries are concise and visual (ASCII Art) MATCH (c:Customer)-[:BOUGHT]-(p:Product) RETURN, Network algorithms run over graphs → Graphs enhance data engineering and science

13. Tables Graphs PostgreSQL, Transactional Oracle, Neo4j SQLServer Data Integration & Analytics Spark SQL Morpheus

14.Spark is an immutable data processing engine ○ Spark graphs are compositions of tables (DFs) ○ Spark graphs can be transformed and combined ○ Functions (including queries) over multiple graphs ○ Cypher query plans mapped to Catalyst Neo4j is a native transactional CRUD database ○ Neo4j graphs use a native graph data representation ○ Neo4j has optimized in-process MT graph algos ○ Morpheus helps move data in and out of Neo4j

15.Graphs and tables are both useful data models ○ Finding paths and subgraphs, and transforming graphs ○ Viewing, aggregating and ordering values The Morpheus project parallels Spark SQL ○ PropertyGraph type (composed of DataFrames) ○ Catalog of graph data sources, named graphs, views, ○ Cypher query language A CypherSession adds graphs to a SparkSession

16.● Data integration ○ Integrate (non-)graphy data from multiple, heterogeneous data sources into one or more property graphs ● Distributed Cypher execution ○ OLAP-style graph analytics ● Data science ○ Integration with other Spark libraries ○ Feature extraction using Neo4j Graph Algorithms

17. Pathfinding Centrality / Community & Search Importance Detection Finds optimal paths Determines the Detects group or evaluates route importance of distinct clustering or partition availability and quality nodes in the network options Estimates the likelihood of nodes forming a Evaluates how future relationship alike nodes are Link Prediction Similarity

18. Hive, DF, JDBC TABLES PROPERTY GRAPH SUB- composing Morpheus DataFrames GRAPH SOURCES FS snapshot

19. DataFrame Driving Table Property Property Property Graph Cypher Graph Result Cypher Graph Result QUERY QUERY Cypher DataFrame QUERY Table Result

20.Property Property Graph GRAPH Graph ALGOS ANALYSIS DataFrame DataFrame toolsets

21. SUBGRAPH Property Graph Morpheus STORE FS snapshot


23.Cypher 9 is the latest full version of openCypher ○ Implemented in Neo4j 3.5 ○ Includes date/time types and functions ○ Implemented in whole/part by six other vendors ○ Several other partial and research implementations ○ Cypher for Gremlin is another openCypher project

24.Cypher is a full CRUD language ← OLTP database ○ RETURNs only tabular results: not composable ○ Results can include graph elements (paths, relationships, nodes) or property values Morpheus implements most of read-only Cypher ○ No MERGE or DELETE ○ Spark immutable data + transformations

25.Cypher 10 proposes Multiple Graph features ○ Multiple Graph CIP: Allows for Cypher Query composition ○ Similar to chaining transformations on DataFrames Support Graph Catalog for managing Graphs ○ Analogous to Spark SQL catalog Query support for Graph Construction

26. Input: a property graph Output: a table FROM GRAPH socialNetwork MATCH ({name: 'Dan'})-[:FRIEND*2]->(foaf) RETURN toUpper( AS name ORDER BY name DESC Language features available in Morpheus

27. Input: a property graph Output: a property graph FROM GRAPH socialNetwork MATCH (p:Person)-[:FRIEND*2]->(foaf) WHERE NOT (p)-[:FRIEND]->(foaf) CONSTRUCT CREATE (p)-[:POSSIBLE_FRIEND]->(foaf) RETURN GRAPH Language features available in Morpheus

28. Input: property graphs Output: a property graph FROM GRAPH socialNetwork MATCH (p:Person) FROM GRAPH products MATCH (c:Customer) WHERE = CONSTRUCT ON socialNetwork, products CREATE (p)-[:IS]->(c) RETURN GRAPH Language features available in Morpheus

29. Input: property graphs Output: a property graph CATALOG CREATE VIEW youngFriends($inGraph){ FROM GRAPH $inGraph MATCH (p1:Person)-[r]->(p2:Person) WHERE p1.age < 25 AND p2.age < 25 CONSTRUCT CREATE (p1)-[COPY OF r]->(p2) RETURN GRAPH } Language features available in Morpheus

由Apache Spark PMC & Committers发起。致力于发布与传播Apache Spark + AI技术,生态,最佳实践,前沿信息。