GraphX: Graph Analytics in Spark

GraphX is a graph-parallel processing system built on Apache Spark. It provides APIs for representing graphs as property graphs with vertices and edges, and for performing graph-parallel computations and algorithms like PageRank, triangle counting, and connected components. The GraphX API allows users to create graphs from RDDs, transform graphs using operations like mapVertices and subgraph, and run graph algorithms and custom computations using the triplets view.

GraphX

Graph Analytics in Spark

Ankur Dave
Graduate Student, UC Berkeley AMPLab

Joint work with Joseph Gonzalez, Reynold Xin, Daniel Crankshaw, Michael Franklin, and Ion Stoica

Machine Learning Landscape

Model & Dependencies:  Small & Dense | Sparse         | Large & Dense
Architecture:          MapReduce     | Graph-Parallel | Parameter Server

Machine Learning Landscape (next build of the slide)

Model & Dependencies:  Small & Dense | Sparse | Large & Dense
Architecture:          Spark Dataflow Framework (with GraphX) | Parameter Server

Graphs
Social Networks
Web Graphs
User-Item Graphs

Graph Algorithms
PageRank
Triangle Counting
Collaborative Filtering

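Collaborative Filtering

[Figure: the Users x Products ratings matrix is approximated by the product of user factors f(i) and product factors f(j). Drawn as a bipartite graph, users 1 and 2 (factors f(1), f(2)) connect to products 3, 4 and 5 (factors f(3), f(4), f(5)) through the rating edges r13, r14, r24 and r25.]

f[i] = \arg\min_{w \in \mathbb{R}^d} \sum_{j \in \mathrm{Nbrs}(i)} \left( r_{ij} - w^T f[j] \right)^2 + \lambda \|w\|_2^2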
The Graph-Parallel Pattern
Many Graph-Parallel Algorithms

Collaborative Filtering
Alternating Least Squares
Stochastic Gradient Descent
Tensor Factorization

Structured Prediction
Loopy Belief Propagation
Max-Product Linear Programs
Gibbs Sampling

Semi-supervised ML
Graph SSL
CoEM

Community Detection
Triangle-Counting
K-core Decomposition
K-Truss

Graph Analytics
PageRank
Personalized PageRank
Shortest Path
Graph Coloring

Classification
Neural Networks
Modern Analytics

[Figure: a Wikipedia analytics pipeline, shown in three builds. Raw Wikipedia XML is parsed into a Link Table (Title, Link) and an Editor Table (Editor, Title). The Link Table yields the Hyperlinks graph, which feeds PageRank to produce the Top 20 Pages (Title, PR). The Editor Table yields the Editor Graph, which feeds Community Detection to produce the User Community table (User, Com.). The two results are combined (Com., PR) into the Top Communities. The second build labels the table stages "Tables" and the third labels the graph stages "Graphs".]
The GraphX API
Property Graphs

Vertex Property:
User Profile
Current PageRank Value

Edge Property:
Weights
Relationships
Timestamps
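Creating a Graph (Scala)
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Defined by GraphX, shown here for reference:
//   type VertexId = Long
//   class Edge[ED](val srcId: VertexId, val dstId: VertexId, val attr: ED)

// sc is an existing SparkContext.
val vertices: RDD[(VertexId, String)] =
  sc.parallelize(List(
    (1L, "Alice"),
    (2L, "Bob"),
    (3L, "Charlie")))

val edges: RDD[Edge[String]] =
  sc.parallelize(List(
    Edge(1L, 2L, "coworker"),
    Edge(2L, 3L, "friend")))

val graph = Graph(vertices, edges)

[Figure: the resulting graph, 1 Alice -(coworker)-> 2 Bob -(friend)-> 3 Charlie.]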
Graph Operations (Scala)
// Abridged summary of the API as shown on the slides, not a compilable definition.
class Graph[VD, ED] {
  // Table Views -----------------------
  def vertices: RDD[(VertexId, VD)]
  def edges: RDD[Edge[ED]]
  def triplets: RDD[EdgeTriplet[VD, ED]]
  // Transformations -------------------------------------------
  def mapVertices[VD2](f: (VertexId, VD) => VD2): Graph[VD2, ED]
  // mapEdges changes the edge type, so the result is Graph[VD, ED2].
  def mapEdges[ED2](f: Edge[ED] => ED2): Graph[VD, ED2]
  def reverse: Graph[VD, ED]
  def subgraph(epred: EdgeTriplet[VD, ED] => Boolean,
               vpred: (VertexId, VD) => Boolean): Graph[VD, ED]
  // Joins ----------------------------------------
  def outerJoinVertices[U, VD2]
      (tbl: RDD[(VertexId, U)])
      (f: (VertexId, VD, Option[U]) => VD2): Graph[VD2, ED]
  // Computation ----------------------------------
  def mapReduceTriplets[A](
      sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A): RDD[(VertexId, A)]

Built-in Algorithms (Scala)
  // Continued from previous slide
  def pageRank(tol: Double): Graph[Double, Double]
  def triangleCount(): Graph[Int, ED]
  def connectedComponents(): Graph[VertexId, ED]
  // ...and more: org.apache.spark.graphx.lib
}

[Figure: illustrations of PageRank, Triangle Count, and Connected Components.]
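The built-in algorithms listed above can be tried on a toy graph. The following is a minimal sketch, assuming a SparkContext is passed in (in spark-shell, the predefined `sc`); the object name `BuiltInAlgorithmsSketch`, the five-vertex graph, and the tolerance value are illustrative assumptions, not part of the slides.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Hypothetical helper object, for illustration only.
object BuiltInAlgorithmsSketch {
  // Assumed toy graph with two disconnected groups: {1, 2, 3} and {4, 5}.
  def toyGraph(sc: SparkContext): Graph[String, String] = {
    val vertices = sc.parallelize(Seq(
      (1L, "Alice"), (2L, "Bob"), (3L, "Charlie"), (4L, "David"), (5L, "Eve")))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "coworker"), Edge(2L, 3L, "friend"), Edge(4L, 5L, "friend")))
    Graph(vertices, edges)
  }

  // connectedComponents labels each vertex with the smallest vertex id
  // found in its component.
  def components(sc: SparkContext): Map[VertexId, VertexId] =
    toyGraph(sc).connectedComponents().vertices.collect().toMap

  // pageRank(tol) iterates until the ranks change by less than tol.
  def ranks(sc: SparkContext, tol: Double = 0.001): Map[VertexId, Double] =
    toyGraph(sc).pageRank(tol).vertices.collect().toMap
}
```

Both methods return the `vertices` view of the result graph collected to the driver, which is only sensible for small graphs like this one.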
The triplets view
class Graph[VD, ED] {
def triplets: RDD[EdgeTriplet[VD, ED]]
}

class EdgeTriplet[VD, ED](
val srcId: VertexId, val dstId: VertexId, val attr: ED,
val srcAttr: VD, val dstAttr: VD)

[Figure: the graph 1 Alice -(coworker)-> 2 Bob -(friend)-> 3 Charlie and its triplets RDD:]

srcAttr   attr       dstAttr
Alice     coworker   Bob
Bob       friend     Charlie
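To make the triplets view concrete, here is a small sketch that builds the three-person graph from the earlier slide and renders each triplet as a sentence. The object name `TripletsSketch` and the sentence format are assumptions for illustration; the SparkContext is supplied by the caller.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}

// Hypothetical helper object, for illustration only.
object TripletsSketch {
  def describe(sc: SparkContext): Set[String] = {
    val vertices = sc.parallelize(Seq((1L, "Alice"), (2L, "Bob"), (3L, "Charlie")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "coworker"), Edge(2L, 3L, "friend")))
    val graph = Graph(vertices, edges)

    // Each triplet joins an edge with the attributes of both endpoints.
    // Mapping to plain strings before collecting avoids holding on to
    // the triplet objects themselves.
    graph.triplets
      .map(t => s"${t.srcAttr} is the ${t.attr} of ${t.dstAttr}")
      .collect()
      .toSet
  }
}
```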
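The subgraph transformation
class Graph[VD, ED] {
  def subgraph(epred: EdgeTriplet[VD, ED] => Boolean,
               vpred: (VertexId, VD) => Boolean): Graph[VD, ED]
}

// Keep only the edges whose attribute is not "relative"; all vertices are kept.
graph.subgraph(epred = (edge) => edge.attr != "relative")

[Figure: before, the graph has the edges Alice -coworker- Bob, Alice -relative- Charlie, Bob -friend- Charlie, and Charlie -relative- David. After subgraph, only Alice -coworker- Bob and Bob -friend- Charlie remain; David is left without edges.]
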
// Keep only the vertices whose name is not "Bob"; edges touching Bob are dropped with him.
graph.subgraph(vpred = (id, name) => name != "Bob")

[Figure: before, the graph has the edges Alice -coworker- Bob, Alice -relative- Charlie, Bob -friend- Charlie, and Charlie -relative- David. After subgraph, Bob and his two edges are gone, leaving Alice -relative- Charlie and Charlie -relative- David.]
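Both subgraph calls can be reproduced on the four-person graph from the figures. This is a sketch under assumptions: the vertex ids 1 to 4 and the edge directions are chosen here for illustration (the slides draw the edges without ids), and `SubgraphSketch` is a hypothetical name.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Hypothetical helper object, for illustration only.
object SubgraphSketch {
  // Assumed ids: 1 = Alice, 2 = Bob, 3 = Charlie, 4 = David.
  def socialGraph(sc: SparkContext): Graph[String, String] = {
    val vertices = sc.parallelize(Seq(
      (1L, "Alice"), (2L, "Bob"), (3L, "Charlie"), (4L, "David")))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "coworker"), Edge(1L, 3L, "relative"),
      Edge(2L, 3L, "friend"), Edge(3L, 4L, "relative")))
    Graph(vertices, edges)
  }

  // Edge predicate only: drops the "relative" edges, keeps every vertex.
  def withoutRelatives(sc: SparkContext): Set[(VertexId, VertexId)] =
    socialGraph(sc)
      .subgraph(epred = t => t.attr != "relative")
      .edges.map(e => (e.srcId, e.dstId)).collect().toSet

  // Vertex predicate only: drops Bob and, with him, every edge touching him.
  def withoutBob(sc: SparkContext): (Set[String], Set[(VertexId, VertexId)]) = {
    val sub = socialGraph(sc).subgraph(vpred = (id, name) => name != "Bob")
    val names = sub.vertices.map(_._2).collect().toSet
    val pairs = sub.edges.map(e => (e.srcId, e.dstId)).collect().toSet
    (names, pairs)
  }
}
```

Passing only one named predicate works because, in the GraphX API, each predicate defaults to a function that accepts everything.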

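Computation with mapReduceTriplets
// Note on the slide: upgrade to aggregateMessages in Spark 1.2.0.
class Graph[VD, ED] {
  def mapReduceTriplets[A](
      sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
      mergeMsg: (A, A) => A): RDD[(VertexId, A)]
}

// Compute vertex degrees: every edge sends 1 to both of its endpoints,
// and the messages arriving at each vertex are summed.
graph.mapReduceTriplets(
  edge => Iterator(
    (edge.srcId, 1),
    (edge.dstId, 1)),
  _ + _)

[Figure: the graph with the edges Alice -coworker- Bob, Alice -relative- Charlie, Bob -friend- Charlie, and Charlie -relative- David, and the RDD produced by mapReduceTriplets:]

vertex id   degree
Alice       2
Bob         2
Charlie     3
David       1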
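Since the slide recommends aggregateMessages from Spark 1.2.0 onward, the same degree computation can be sketched with that operator. The vertex ids, edge directions, and the name `DegreeSketch` are assumptions made for this example; the SparkContext is supplied by the caller.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, TripletFields, VertexId}

// Hypothetical helper object, for illustration only.
object DegreeSketch {
  def degrees(sc: SparkContext): Map[VertexId, Int] = {
    // Assumed ids: 1 = Alice, 2 = Bob, 3 = Charlie, 4 = David.
    val vertices = sc.parallelize(Seq(
      (1L, "Alice"), (2L, "Bob"), (3L, "Charlie"), (4L, "David")))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "coworker"), Edge(1L, 3L, "relative"),
      Edge(2L, 3L, "friend"), Edge(3L, 4L, "relative")))
    val graph = Graph(vertices, edges)

    graph.aggregateMessages[Int](
      // sendMsg: each edge sends 1 to both of its endpoints.
      ctx => { ctx.sendToSrc(1); ctx.sendToDst(1) },
      // mergeMsg: messages arriving at the same vertex are summed.
      _ + _,
      // No vertex or edge attributes are read, so none need to be shipped.
      TripletFields.None
    ).collect().toMap
  }
}
```

The result matches the table on the slide (Alice 2, Bob 2, Charlie 3, David 1). For this particular computation GraphX also offers `graph.degrees` directly.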
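How GraphX Works
Encoding Property Graphs as RDDs

[Figure: a property graph with vertices A to F is split by a vertex cut into two edge partitions (Part. 1 on Machine 1, Part. 2 on Machine 2) and stored as three RDDs.]

Vertex Table (RDD): one row per vertex, A to F (A, B, C on Machine 1; D, E, F on Machine 2).

Routing Table (RDD): for each vertex, the edge partitions that reference it.
A -> 1, 2
B -> 1
C -> 1
D -> 1, 2
E -> 2
F -> 2

Edge Table (RDD):
Part. 1: (A, B), (A, C), (B, C), (C, D)
Part. 2: (A, E), (A, F), (E, D), (E, F)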
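The vertex-cut layout in the figure can be explored through the public partitioning API. The sketch below is an assumption-laden illustration: the vertices A to F are mapped to the hypothetical ids 1 to 6, the `EdgePartition2D` strategy is picked as one example of a vertex-cut strategy, and the partition ids it produces need not match the figure's Part. 1 and Part. 2.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy, VertexId}

// Hypothetical helper object, for illustration only.
object VertexCutSketch {
  // Edge list from the figure, with A..F mapped to the assumed ids 1..6.
  val edgeList: Seq[(VertexId, VertexId)] = Seq(
    (1L, 2L), (1L, 3L), (2L, 3L), (3L, 4L),
    (1L, 5L), (1L, 6L), (5L, 4L), (5L, 6L))

  // Repartition the edges with a vertex-cut strategy. Edges are assigned
  // to partitions; a vertex whose edges land in several partitions is
  // mirrored to each of them.
  def partitioned(sc: SparkContext, numParts: Int = 2): Graph[Int, Int] = {
    val edges = sc.parallelize(edgeList.map { case (s, d) => Edge(s, d, 1) })
    Graph.fromEdges(edges, 0)
      .partitionBy(PartitionStrategy.EdgePartition2D, numParts)
  }

  // The partition id the strategy assigns to each edge.
  def assignment(numParts: Int = 2): Map[(VertexId, VertexId), Int] =
    edgeList.map { case (s, d) =>
      (s, d) -> PartitionStrategy.EdgePartition2D.getPartition(s, d, numParts)
    }.toMap
}
```

Calling `VertexCutSketch.assignment()` shows which edges share a partition; any vertex appearing in edges from both partitions is one that the routing table would list twice.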
Graph System Optimizations
Specialized Data-Structures
Vertex-Cuts Partitioning
Remote Caching / Mirroring
Message Combiners
Active Set Tracking
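Two of these optimizations surface directly in GraphX's Pregel operator, which the slides do not show: the `mergeMsg` function acts as a message combiner, and vertices that receive no message are skipped in a superstep, which is a form of active set tracking. The following single-source shortest path sketch follows the well-known pattern from the GraphX programming guide; the edge weights, source id, and the name `PregelSketch` are assumptions for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Hypothetical helper object, for illustration only.
object PregelSketch {
  def shortestPaths(sc: SparkContext, sourceId: VertexId = 1L): Map[VertexId, Double] = {
    // Assumed weighted, directed toy graph.
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 5.0), Edge(3L, 4L, 1.0)))

    // Distance 0 at the source, infinity everywhere else.
    val graph = Graph.fromEdges(edges, 0.0)
      .mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)

    graph.pregel(Double.PositiveInfinity)(
      // Vertex program: keep the shorter of the old and the offered distance.
      (id, dist, newDist) => math.min(dist, newDist),
      // Send a message only when it would improve the destination, so
      // vertices with nothing new to learn drop out of the active set.
      t =>
        if (t.srcAttr + t.attr < t.dstAttr) Iterator((t.dstId, t.srcAttr + t.attr))
        else Iterator.empty,
      // Message combiner: of several offers to one vertex, keep the minimum.
      (a, b) => math.min(a, b)
    ).vertices.collect().toMap
  }
}
```

On this toy graph the path 1 -> 2 -> 3 (cost 3.0) beats the direct edge 1 -> 3 (cost 5.0), so vertex 3 ends at distance 3.0 and vertex 4 at 4.0.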

PageRank Benchmark
EC2 cluster of 16 x m2.4xlarge (8 cores) + 1 GigE

[Figure: two bar charts of Runtime (Seconds). Left: Twitter Graph (42M vertices, 1.5B edges), y-axis from 0 to 3500, annotated "7x". Right: UK-Graph (106M vertices, 3.7B edges), y-axis from 0 to 9000, annotated "18x".]

GraphX performs comparably to state-of-the-art graph processing systems.
Future of GraphX
1. Language support
a) Java API: PR #3234
b) Python API: collaborating with Intel, SPARK-3789

2. More algorithms
a) LDA (topic modeling): PR #2388
b) Correlation clustering
c) Your algorithm here?

3. Speculative
a) Streaming/time-varying graphs
b) Graph database-like queries
Thanks!
http://spark.apache.org/graphx

