Transforming AI with Graphs - Real World Examples using Spark and Neo4j

Graphs (information about the relationships, connections, and topology of data points) are transforming machine learning. We’ll walk through real-world examples of how to transform your tabular data into a graph and how to get started with graph AI. This talk will provide an overview of how to incorporate graph-based features into traditional machine learning pipelines and how to create graph embeddings that better describe your graph topology, and it will give you a preview of approaches for graph-native learning using graph neural networks. We’ll talk about relevant, real-world case studies in financial crime detection, recommendations, and drug discovery. This talk is intended to introduce the concept of graph-based AI to beginners, as well as help practitioners understand new techniques and applications. Key takeaways: how graph data can improve machine learning, when graphs are relevant to data science applications, what graph-native learning is, and how to get started.

1.WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics

2.Transforming AI with Graphs: Real World Examples with Spark & Neo4j Alicia Frame, PhD Senior Data Scientist, Neo4j #UnifiedDataAnalytics #SparkAISummit

3.

4.Graph Data Science Applications: Financial Services, Drug Discovery, Recommendations, Customer Segmentation, Cybersecurity, Churn Prediction, Search/MDM, Predictive Maintenance

5.Labeled Property Graphs
Nodes
• Can have Labels to classify nodes
• Labels have native indexes
Relationships
• Relate nodes by type and direction
Properties
• Attributes of Nodes & Relationships
• Stored as Name/Value pairs
• Can have indexes and composite indexes
[Figure: two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”), with relationships MARRIED TO, LIVES WITH, DRIVES, and OWNS. Other property labels shown: since: Jan 10, 2011; Latitude: 37.5629900°; Longitude: -122.3255300°]
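To make the labeled property graph model concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j stores data. The class names, the `outgoing` helper, and the ISO date strings are hypothetical, and the figure does not say in this transcript who DRIVES or OWNS the car or which relationship carries `since`, so those assignments are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    node_id: int
    labels: Set[str]  # labels classify nodes, e.g. {"PERSON"}
    properties: Dict[str, Any] = field(default_factory=dict)  # name/value pairs


@dataclass
class Relationship:
    start: int     # id of the source node (relationships have a direction)
    end: int       # id of the target node
    rel_type: str  # relationships have a type, e.g. "MARRIED_TO"
    properties: Dict[str, Any] = field(default_factory=dict)


dan = Node(1, {"PERSON"}, {"name": "Dan", "born": "1970-05-29", "twitter": "@dan"})
ann = Node(2, {"PERSON"}, {"name": "Ann", "born": "1975-12-05"})
car = Node(3, {"CAR"}, {"brand": "Volvo", "model": "V70"})

# Assumption: endpoints and the placement of "since" are illustrative only.
relationships: List[Relationship] = [
    Relationship(1, 2, "MARRIED_TO"),
    Relationship(1, 2, "LIVES_WITH"),
    Relationship(1, 3, "DRIVES", {"since": "2011-01-10"}),
    Relationship(2, 3, "OWNS"),
]


def outgoing(node_id: int, rel_type: str) -> List[Relationship]:
    """Return relationships of one type that start at the given node."""
    return [r for r in relationships if r.start == node_id and r.rel_type == rel_type]


drives = outgoing(dan.node_id, "DRIVES")
```

Here `drives` holds the single DRIVES relationship from Dan to the car, with its own `since` property, which shows that properties can sit on relationships as well as on nodes.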

6.Novel & More Accurate Predictions with the Data You Already Have • Current data science models ignore network structure • Graphs add highly predictive features to existing ML models • Otherwise unattainable predictions based on relationships Machine Learning Pipeline

7.“The idea is that graph networks are bigger than any one machine-learning approach. Graphs bring an ability to generalize about structure that the individual neural nets don't have.” “Where do the graphs come from that graph networks operate over?”

8. Building a Graph ML Model
Data Sources (Parquet, JSON and more…): Aggregate Disparate Data and Cleanse
Native Graph Platform: Unify Graphs and Engineer Features
Machine Learning (MLlib and more…): Build Predictive Models

9. Example: Spark & Neo4j Workflow
Spark Graph: Cypher 9 in Spark 3.0 to create non-persistent graphs
Native Graph Platform (Graph Transactions, Graph Analytics): Native Graph Algorithms, Processing, and Storage
Machine Learning: MLlib to Train Models

10. Explore Graphs
• Massively scalable
• Powerful data pipelining
• Robust ML Libraries
• Non-persistent, non-native graphs
Build Graph Solutions
• Persistent, dynamic graphs
• Graph native query and algorithm performance
• Constantly growing list of graph algorithms and embeddings

11. The Steps of Graph Data Science
Stages: Knowledge Graphs, Graph Feature Engineering, Graph Native Learning
[Chart: Data Science Complexity plotted against Graph Persistence. Steps, in order: Query Based Knowledge Graph, Query Based Feature Engineering, Graph Algorithm Feature Engineering, Graph Embeddings, Graph Neural Networks]

12.Steps Forward in Graph Data Science
[Chart: Data Science Complexity plotted against Enterprise Maturity. Steps, in order: Query Based Knowledge Graph, Query Based Feature Engineering, Graph Algorithm Feature Engineering, Graph Embeddings, Graph Neural Networks]

13.Query based knowledge graphs: Connecting the Dots at NASA “Using Neo4j someone from our Orion project found information from the Apollo project that prevented an issue, saving well over two years of work and one million dollars of taxpayer funds.”

15.Query-Based Feature Engineering: Telecom-churn prediction
Churn prediction research has found that simple hand-engineered features are highly predictive
• How many calls/texts has an account made?
• How many of their contacts have churned?
Telecommunication networks are easily represented as graphs

16.Query-Based Feature Engineering: Telecom-churn prediction (Khan et al, 2015). Add connected features based on graph queries to tabular data
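The two hand-engineered churn features mentioned in the telecom example (calls made per account, and how many of an account's contacts have churned) can be sketched in plain Python as below. The call records, account names, and the `connected_features` function are hypothetical. In the workflow the talk describes, these values would come from graph queries instead of an in-memory loop.

```python
from collections import defaultdict

# Hypothetical call records as (caller, callee) pairs.
calls = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a"), ("a", "b")]
# Hypothetical set of accounts that have already churned.
churned = {"c", "d"}


def connected_features(calls, churned):
    """Return per-account features derived from the call graph."""
    call_count = defaultdict(int)  # outgoing calls per account
    contacts = defaultdict(set)    # undirected neighbours per account
    for caller, callee in calls:
        call_count[caller] += 1
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    return {
        account: {
            "calls_made": call_count[account],
            "churned_contacts": len(contacts[account] & churned),
        }
        for account in sorted(contacts)
    }


features = connected_features(calls, churned)
```

Each entry in `features` can then be joined onto the existing tabular row for that account, which is the "add connected features to tabular data" step.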

17. Knowledge Graphs: Getting Started Example with Spark
Spark Graph
• Merge distributed data into DataFrames
• Reshape your tables into graphs
• Explore cypher queries
Native Graph Platform (Graph Transactions, Graph Analytics)
• Move to Neo4j to build expert queries
• Persist your graph
Machine Learning
• Bring query based graph features to ML pipeline

19.Graph Feature Engineering Feature Engineering is how we combine and process the data to create new, more meaningful features, such as clustering or connectivity metrics. Add More Descriptive Features: - Influence - Relationships - Communities

20.Graph Feature Categories & Algorithms
• Community Detection: Detects group clustering or partition options
• Centrality / Importance: Determines the importance of distinct nodes in the network
• Pathfinding & Search: Finds the optimal paths or evaluates route availability and quality
• Heuristic Link Prediction: Estimates the likelihood of nodes forming a relationship
• Similarity: Evaluates how alike nodes are
• Embeddings: Learned representations of connectivity or topology
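As an illustration of the heuristic link prediction category, the sketch below computes three standard scores for a candidate pair of nodes on an undirected graph: common neighbors, preferential attachment, and Adamic-Adar. The edge list and function names are hypothetical, and these are the textbook definitions, not a particular library's implementation.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])


def preferential_attachment(adj, u, v):
    """Product of the two node degrees."""
    return len(adj[u]) * len(adj[v])


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over shared neighbours; rare neighbours count more."""
    return sum(
        1.0 / math.log(len(adj[w]))
        for w in adj[u] & adj[v]
        if len(adj[w]) > 1  # guard against log(1) == 0
    )


# Hypothetical graph: a and b are not connected but share neighbours c and d.
demo_edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
demo_adj = build_adjacency(demo_edges)
scores = {
    "common_neighbors": common_neighbors(demo_adj, "a", "b"),
    "preferential_attachment": preferential_attachment(demo_adj, "a", "b"),
    "adamic_adar": adamic_adar(demo_adj, "a", "b"),
}
```

A higher score suggests a pair is more likely to form a relationship. Scores like these are typically used as features for a downstream classifier, not as final predictions.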

21.Financial Crime: Detecting Fraud Large financial institutions already have existing pipelines to identify fraud via heuristics and models Graph based features improve accuracy: • Connected components to identify disjointed graphs sharing identifiers • PageRank to measure influence and transaction volumes • Louvain to identify communities that frequently interact • Jaccard to measure account similarity based on relationships
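Two of the fraud features listed above can be sketched in a few lines of Python: connected components (via union-find) to group accounts that share identifiers, and Jaccard similarity between two accounts' identifier sets. The account and identifier values are made up for illustration. A production pipeline would run the platform's native algorithms instead of this in-memory version.

```python
def connected_components(edges):
    """Group nodes into components using union-find with path halving."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical (account, identifier) pairs; a shared identifier links accounts.
shared = [
    ("acct1", "phone:555"),
    ("acct2", "phone:555"),
    ("acct2", "email:x"),
    ("acct3", "email:y"),
]
components = connected_components(shared)

identifiers = {}
for account, identifier in shared:
    identifiers.setdefault(account, set()).add(identifier)
similarity = jaccard(identifiers["acct1"], identifiers["acct2"])
```

In this toy data, `acct1` and `acct2` land in the same component because they share a phone number, while `acct3` is isolated. The component id and the similarity score can both be written back as columns for the existing fraud model.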

22.142,000+ peer-reviewed publications on graph fraud / anomaly detection in the last 10 years

23. Graph Feature Engineering: Getting Started Example with Spark
Spark Graph
• Merge distributed data into DataFrames
• Reshape your tables into graphs
• Explore cypher queries and simple algorithms
Native Graph Platform (Graph Transactions, Graph Analytics)
• Persist your graph
• Create rule based features
• Run native graph algorithms and write to graph or stream
Machine Learning
• Bring graph features to ML pipeline for training

24. Graph Algorithms in Neo4j
Pathfinding & Search
• Parallel Breadth First Search
• Parallel Depth First Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
Similarity
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
neo4j.com/docs/graph-algorithms/current/
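To give a feel for what one of these centrality algorithms computes, here is a small power-iteration sketch of PageRank in plain Python. It follows the textbook formulation with a damping factor and even redistribution of rank from dangling nodes. It is not the Neo4j library's implementation, whose normalization and defaults may differ, and the edge list is hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every node passes its rank in equal shares along its out-links.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical graph: a and b both point to c, and c points back to a.
demo_rank = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
```

Node `c` ends up with the highest score because it receives links from both other nodes, and `b` the lowest because nothing links to it. The per-node score is what would be exported as a feature column.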

27.Graph Embeddings Embedding transforms graphs into a vector, or set of vectors, describing topology, connectivity, or attributes of nodes and edges in the graph • Vertex embeddings: describe connectivity of each node • Path embeddings: traversals across the graph • Graph embeddings: encode an entire graph into a single vector
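The slides do not name a specific embedding method. As one illustration, vertex embedding approaches in the DeepWalk and node2vec family start by sampling random walks over the graph and then train a skip-gram model on those walks as if they were sentences. The sketch below covers only the walk-sampling step, with a hypothetical adjacency map and function name, and leaves the model training out.

```python
import random


def random_walks(adj, walk_length=5, walks_per_node=2, seed=0):
    """Sample fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = sorted(adj[walk[-1]])
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical undirected graph as node -> set of neighbours.
demo_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
walks = random_walks(demo_graph, walk_length=5, walks_per_node=2, seed=0)
```

Each walk is a list of node ids in which nodes that appear close together are nearby in the graph. That co-occurrence signal is what a skip-gram model would turn into one vector per node.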

28.Graph Embeddings - Recommendations Explainable Reasoning over Knowledge Graphs for Recommendation
