Spark GraphX
GraphX introduction
• Data is generally stored and processed as a collection of
records or rows. It is represented as a two-dimensional
table with data divided into rows and columns.
• However, collections or tables are not the only way to
represent data. Sometimes, a graph provides a better
representation of data than a collection.
• For example, the Internet is a large graph of
interconnected computers, routers, and switches. The
World Wide Web is a large graph. Web pages connected by
hypertext links form a graph. Social networks on sites such
as Facebook, LinkedIn, and Twitter are graphs.
Transportation networks connecting hubs such as airports, train
terminals, and bus stops can also be represented as graphs.
GraphX introduction
• A graph provides an easy-to-understand and intuitive
model for working with data.
• In addition, specialized graph algorithms are available
for processing graph-oriented data.
• These algorithms provide efficient tools for different
analytics tasks.
• Spark GraphX provides an efficient library for processing
large-scale graph-oriented data.
Spark GraphX
• Spark provides the GraphX API for graphs and graph-parallel
computation.
• It comes with a growing collection of graph algorithms and also
includes graph builders that simplify graph analytics tasks.
• Basically, it extends the Spark RDD with a Resilient Distributed
Property Graph.
• The property graph is a directed multigraph that can have multiple
edges in parallel. Every vertex and edge has user-defined
properties associated with it, and parallel edges allow multiple
relationships between the same pair of vertices.
Spark GraphX Features
• Flexibility: With Spark GraphX we can work with both graphs and
collections in a single system. This covers exploratory analysis, ETL
(Extract, Transform & Load), and iterative graph computation.
• It is possible to view the same data as both a graph and a collection,
and to transform and join graphs with RDDs.
• Using the Pregel API it is also possible to write custom iterative
graph algorithms.
• Speed: GraphX performs comparably to the fastest specialized graph
processing systems, while retaining Spark’s flexibility, fault tolerance,
and ease of use.
Spark GraphX Features
• Growing Algorithm Library: Spark GraphX offers a
growing library of graph algorithms. Popular algorithms
include PageRank, connected components, strongly
connected components, and triangle count.
Property Graph
• A property graph is a directed multigraph in which data is associated
with the vertices and the edges. Each vertex of a property graph has
properties (attributes). Similarly, each edge is associated with a label
and properties.
• A directed multigraph with user-defined objects attached to each
vertex and edge is a property graph.
• It is a graph that can have multiple parallel edges sharing the same
source and destination vertex, so it supports multiple relationships
between the same pair of vertices.
• Each vertex is keyed with a 64-bit long identifier (VertexId).
• Like RDDs, property graphs are immutable, distributed,
and fault-tolerant.
Property Graph Example
An example of a property graph is a graph representing a social
network on Twitter.
Example of Property Graph
• Suppose we want to construct a property graph consisting of the
various collaborators on the GraphX project.
• The vertex property might contain the username and occupation.
• We could annotate edges with a string describing the relationships
between collaborators:
Property Graph Construction
Assume the SparkContext has already been constructed.
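The construction code itself did not survive on this slide; the following is a sketch in the spirit of the GraphX programming guide's collaborator example (the usernames, occupations, and relationship labels are illustrative):
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Vertices: (VertexId, (username, occupation))
val users: RDD[(VertexId, (String, String))] =
  sc.parallelize(Seq(
    (3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
    (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))

// Edges annotated with a string describing the relationship between collaborators
val relationships: RDD[Edge[String]] =
  sc.parallelize(Seq(
    Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
    Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))

// Default vertex property, used when an edge refers to a missing vertex
val defaultUser = ("John Doe", "Missing")

// Build the initial property graph
val graph = Graph(users, relationships, defaultUser)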
GraphX Library
GraphX API
• First, we need to import Spark and GraphX into our
project:
import org.apache.spark._
import org.apache.spark.graphx._
GraphX API
• The GraphX API provides data types for representing graph-
oriented data and operators for graph analytics.
• Just as RDDs have basic operations like map, filter,
and reduceByKey, property graphs also have a collection of basic
operators that take user-defined functions and produce new graphs
with transformed properties and structure.
• It also provides an implementation of Google's Pregel API.
• These operators simplify graph analytics tasks.
• Since GraphX is integrated with Spark, a GraphX user has access to
both the GraphX and Spark APIs, including the RDD and DataFrame
APIs.
• For example, we can compute the in-degree of each vertex (defined
in GraphOps) as follows.
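A minimal sketch, assuming graph is the collaborator graph built above:
// Number of edges pointing to each vertex
val inDegrees: VertexRDD[Int] = graph.inDegrees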
Data Types
• The key data types provided by GraphX for working with property
graphs include VertexRDD, Edge, EdgeRDD, EdgeTriplet, and
Graph.
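As a rough orientation, the Graph class exposes the vertex and edge data through the following members (a simplified view of the API):
class Graph[VD, ED] {
  val vertices: VertexRDD[VD]              // the vertices with their attributes
  val edges: EdgeRDD[ED]                   // the edges with their attributes
  val triplets: RDD[EdgeTriplet[VD, ED]]   // edges joined with both endpoint attributes
}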
Data Types
• Edge: The Edge class abstracts a directed edge in a property graph.
An instance of the Edge class contains the source vertex id, the
destination vertex id, and the edge attribute.
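Conceptually, Edge can be pictured as follows (a simplified sketch; the real class also provides default values and helper methods):
case class Edge[ED](srcId: VertexId, dstId: VertexId, attr: ED)

// Example: a directed edge from vertex 1 to vertex 2 carrying a String attribute
val e = Edge(1L, 2L, "follows")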
Data Types
• EdgeContext: It combines EdgeTriplet with methods to send
messages to source and destination vertices of an edge.
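A simplified view of the EdgeContext members that are typically used inside aggregateMessages:
abstract class EdgeContext[VD, ED, A] {
  def srcId: VertexId          // id of the source vertex
  def dstId: VertexId          // id of the destination vertex
  def srcAttr: VD              // attribute of the source vertex
  def dstAttr: VD              // attribute of the destination vertex
  def attr: ED                 // attribute of the edge itself
  def sendToSrc(msg: A): Unit  // send a message to the source vertex
  def sendToDst(msg: A): Unit  // send a message to the destination vertex
}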
Property Graph Creation
• The cities and the distances between them are given: the cities are
the vertices and the distances between them are the edge attributes.
We have to create a property graph.
Graph Creation
• To get started, launch the Spark shell:
$ /path/to/spark/bin/spark-shell
• Once you are inside the Spark shell, import the GraphX library:
import org.apache.spark.graphx._
• Create an array of vertices with the attributes city name and
population:
val verArray = Array(
  (1L, ("Philadelphia", 1580863)),
  (2L, ("Baltimore", 620961)),
  (3L, ("Harrisburg", 49528)),
  (4L, ("Wilmington", 70851)),
  (5L, ("New York", 8175133)),
  (6L, ("Scranton", 76089)))
Graph Creation
• To create the edges array, type in the Spark shell:
val edgeArray = Array(
  Edge(2L, 3L, 113), Edge(2L, 4L, 106), Edge(3L, 4L, 128),
  Edge(3L, 5L, 248), Edge(3L, 6L, 162), Edge(4L, 1L, 39),
  Edge(1L, 6L, 168), Edge(1L, 5L, 130), Edge(5L, 6L, 159))
• Next, create RDDs from the vertices and edges arrays by using the
sc.parallelize() command.
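The parallelize step is not shown on the slide; it would look roughly like this:
// Distribute the local arrays as RDDs
val verRDD = sc.parallelize(verArray)
val edgeRDD = sc.parallelize(edgeArray)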
Graph Creation
• Finally, build a property graph
val graph = Graph(verRDD, edgeRDD)
Filter operation
• Find the cities with a population of more than 50,000:
graph.vertices.filter { case (id, (city, population)) => population > 50000 }
  .collect.foreach { case (id, (city, population)) =>
    println(s"The population of $city is $population") }
triplets RDD
• There is one triplet for each edge; it contains information
about both the vertices and the edge itself.
• We can find the distances between the connected cities
by using graph.triplets.collect
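For example, one way to print the distances (a sketch; the attribute names follow the city graph built earlier):
// Each triplet exposes srcAttr, dstAttr, and the edge attribute
graph.triplets.collect.foreach { t =>
  println(s"The distance between ${t.srcAttr._1} and ${t.dstAttr._1} is ${t.attr} km")
}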
Filtration by edges
• We want to find the cities between which the distance is
less than 150 kilometers. Type in the Spark shell:
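The code did not survive on the slide; a sketch of the edge filter on the city graph:
// Keep only triplets whose distance attribute is below 150 km and print the city names
graph.triplets.filter(t => t.attr < 150).collect.foreach { t =>
  println(s"${t.srcAttr._1} and ${t.dstAttr._1} are ${t.attr} km apart")
}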
Aggregation
• We can find the total population of the neighboring cities
by using the aggregateMessages operator.
• GraphX deals only with directed graphs, so to take edges in both
directions into account we should add the reverse directions to the
graph.
• Take a union of the reversed edges and the original ones:
val undirectedEdgeRDD = graph.reverse.edges.union(graph.edges)
val graph = Graph(verRDD, undirectedEdgeRDD)
Perform the aggregation:
val neighbors = graph.aggregateMessages[Int](ectx =>
ectx.sendToSrc(ectx.dstAttr._2), _ + _)
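To read the result, the aggregated values can be joined back with the vertex attributes (a sketch):
// neighbors is a VertexRDD[Int] holding the total neighboring population per vertex
neighbors.join(graph.vertices).collect.foreach {
  case (id, (totalPop, (city, population))) =>
    println(s"Total population of the cities neighboring $city: $totalPop")
}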
GraphX Operators
• Basic Operators
• numEdges
• numVertices
• inDegrees
• outDegrees
• degrees
• Property Operators (a sketch follows this list)
  • mapVertices
  • mapEdges
  • mapTriplets
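A quick sketch of the property operators on the city graph from earlier; they return a new graph with transformed attributes while leaving the structure untouched:
// mapVertices: keep only the city name as the vertex attribute
val nameGraph = graph.mapVertices { case (id, (city, population)) => city }

// mapEdges: convert the distance attribute from kilometers to miles (illustrative)
val milesGraph = graph.mapEdges(e => e.attr * 0.621371)

// mapTriplets: label each edge with a readable description of the connection
val labeledGraph = graph.mapTriplets(t => s"${t.srcAttr._1} -> ${t.dstAttr._1}: ${t.attr} km")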
GraphX Operators
• Structural Operators
  • reverse
  • subgraph
  • mask
  • groupEdges
• Join Operators
  • joinVertices
  • outerJoinVertices
Graph Creation
Create an RDD of (id, user) pairs:
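The code was not preserved on the slide; a sketch with a hypothetical User case class and made-up names and ages (vertex 11 is deliberately left without properties, as discussed on the next slide):
// Vertex property type; the name and age fields are illustrative
case class User(name: String, age: Int)

// Hypothetical users for vertex ids 1 through 10
val users = List(
  (1L, User("Alex", 26)),  (2L, User("Bill", 42)),  (3L, User("Carol", 18)),
  (4L, User("Dave", 16)),  (5L, User("Eve", 45)),   (6L, User("Farrell", 30)),
  (7L, User("Gary", 32)),  (8L, User("Harry", 36)), (9L, User("Ivan", 28)),
  (10L, User("Jill", 48)))

val usersRDD = sc.parallelize(users)
usersRDD.collect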
Graph Creation
• Next, let us create an RDD of connections (edges) between users.
An edge attribute can be of any type, including composite types;
however, to keep things simple, we assign a single attribute of type
Int to each edge.
val follows = List(
  Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1), Edge(3L, 4L, 1),
  Edge(3L, 5L, 1), Edge(4L, 5L, 1), Edge(6L, 5L, 1), Edge(7L, 6L, 1),
  Edge(6L, 8L, 1), Edge(7L, 8L, 1), Edge(7L, 9L, 1), Edge(9L, 8L, 1),
  Edge(8L, 10L, 1), Edge(10L, 9L, 1), Edge(1L, 11L, 1))
val followsRDD = sc.parallelize(follows)
followsRDD.collect
Graph Creation
• Note that there is an edge connecting vertex with id 1 to
vertex with id 11 (Edge(1L, 11L, 1)). However, the vertex
with id 11 does not have any property.
• GraphX allows you to handle such cases by creating a
default set of properties. It will assign the default
properties to the vertices that have not been explicitly
assigned any properties:
• val defaultUser = User("NA", 0)
Graph Creation
• Now we have all the components required to
construct a property graph:
val socialGraph = Graph(usersRDD, followsRDD,
defaultUser)
Find Graph Information
• Next, we briefly describe how to find useful
information about a property graph.
• You can find the number of edges in a property graph,
as shown next.
val numEdges = socialGraph.numEdges
• You can find the number of vertices in a property
graph, as shown next.
val numVertices = socialGraph.numVertices
Graph Information
• Now, we show how to find the number of edges
terminating at a vertex:
val inDegrees = socialGraph.inDegrees
inDegrees.collect
• Now, we show how to find the number of edges
originating from a vertex:
val outDegrees = socialGraph.outDegrees
outDegrees.collect
Graph Information
• Next, we find the degree of each vertex:
val degrees = socialGraph.degrees
degrees.collect
• We can obtain collection views of the vertices, edges, and triplets in a property
graph.
• Vertices:
val vertices = socialGraph.vertices
vertices.collect
• Edges:
val edges = socialGraph.edges
edges.collect
• Triplets:
val triplets = socialGraph.triplets
triplets.take(3)
Structural Operators
class Graph[VD, ED] {
  def reverse: Graph[VD, ED]
  // subgraph, mask, and groupEdges are also defined here (see the operator list above)
}
Reverse Operators
• The reverse operator returns a new graph with all the edge directions
reversed.
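On the social graph this simply flips who follows whom (a sketch):
// Every Edge(src, dst, attr) becomes Edge(dst, src, attr); vertex properties are unchanged
val reversedGraph = socialGraph.reverse
reversedGraph.edges.take(3)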
Subgraph
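The slide content was not preserved; as a sketch, subgraph takes vertex and/or edge predicates and returns the graph restricted to the vertices and edges that satisfy them (the age field assumes the hypothetical User class defined earlier):
// Keep only users older than 30 and the follow edges between them
val olderUsers = socialGraph.subgraph(vpred = (id, user) => user.age > 30)
olderUsers.vertices.collect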
Join Operators
class Graph[VD, ED] {
  def joinVertices[U](table: RDD[(VertexId, U)])
      (map: (VertexId, VD, U) => VD): Graph[VD, ED]
  def outerJoinVertices[U, VD2](table: RDD[(VertexId, U)])
      (map: (VertexId, VD, Option[U]) => VD2): Graph[VD2, ED]
}
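A sketch of outerJoinVertices on the social graph, attaching each user's follower count (vertices missing from the joined RDD receive None, which we turn into 0):
// Pair every user with their in-degree; users with no followers get 0
val graphWithFollowers = socialGraph.outerJoinVertices(socialGraph.inDegrees) {
  (id, user, inDegreeOpt) => (user, inDegreeOpt.getOrElse(0))
}
graphWithFollowers.vertices.take(3)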
Connected Components
• The connected components algorithm labels each connected
component of the graph with the ID of its lowest-numbered vertex.
GraphX contains an implementation of the algorithm in the
ConnectedComponents object.
PageRank Algorithm
• PageRank measures the importance of each vertex in a graph,
assuming an edge from u to v represents an endorsement of v’s
importance by u. For example, if a Twitter user is followed by many
others, the user will be ranked highly.
import org.apache.spark.graphx.GraphLoader
// Load the edges as a graph
val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
// Run PageRank
val ranks = graph.pageRank(0.0001).vertices
// Join the ranks with the usernames
val users = sc.textFile("data/graphx/users.txt").map { line =>
  val fields = line.split(",")
  (fields(0).toLong, fields(1))
}
val ranksByUsername = users.join(ranks).map { case (id, (username, rank)) =>
  (username, rank)
}
// Print the result
println(ranksByUsername.collect().mkString("\n"))
Connected component
import org.apache.spark.graphx.GraphLoader
// Load the graph as in the PageRank example
val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
// Find the connected components
val cc = graph.connectedComponents().vertices
// Join the connected components with the usernames
val users = sc.textFile("data/graphx/users.txt").map { line =>
  val fields = line.split(",")
  (fields(0).toLong, fields(1))
}
val ccByUsername = users.join(cc).map { case (id, (username, cc)) =>
  (username, cc)
}
// Print the result
println(ccByUsername.collect().mkString("\n"))
Triangle Counting
• GraphX implements a triangle counting algorithm in the
TriangleCount object that determines the number of triangles
passing through each vertex, providing a measure of clustering.
Triangle Counting
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}
// Load the edges in canonical order and partition the graph for triangle count
val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt", true)
.partitionBy(PartitionStrategy.RandomVertexCut)
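The slide stops here; the example in the GraphX programming guide continues roughly as follows:
// Find the triangle count for each vertex
val triCounts = graph.triangleCount().vertices
// Join the triangle counts with the usernames
val users = sc.textFile("data/graphx/users.txt").map { line =>
  val fields = line.split(",")
  (fields(0).toLong, fields(1))
}
val triCountByUsername = users.join(triCounts).map { case (id, (username, tc)) =>
  (username, tc)
}
// Print the result
println(triCountByUsername.collect().mkString("\n"))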
https://spark.apache.org/graphx/