0% found this document useful (0 votes)

335 views17 pages

GraphX Tutorial

This tutorial discusses using Apache Spark GraphX to analyze flight data. It defines airports as vertices and routes as edges to build a graph. It then analyzes sample flight data to determine the number of airports, routes, and routes over 1000 miles. Real flight data from January 2015 is also analyzed.

Uploaded by

Gustavo Monsalve

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

335 views17 pages

GraphX Tutorial

Uploaded by

Gustavo Monsalve

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 17

This tutorial will help you get started using Apache Spark GraphX with Scala on the MapR

Sandbox.
GraphX is the Apache Spark component for graph-parallel computations, built upon a branch of
mathematics called graph theory. It is a distributed graph processing framework that sits on top of Spark
core.

Overview of some graph concepts

A graph is a mathematical structure used to model relations between objects. A graph is made up of
vertices and edges that connect them. The vertices are the objects and the edges are the relationships
between them.

A directed graph is a graph where the edges have a direction associated with them. An example of a
directed graph is a Twitter follower. User Bob can follow user Carol without implying that user Carol
follows user Bob.

A regular graph is a graph where each vertex has the same number of edges. An example of a regular
graphs is Facebook friends. If Bob is a friend of Carol, then Carol is also a friend of Bob.

Graphx Property Graph

GraphX extends the Spark RDD with a Resilient Distributed Property Graph.
Build a Standalone Spark Application
The property graph is a directed multigraph which can have multiple edges in parallel. Every edge and
vertex has user defined properties associated with it. The parallel edges allow multiple relationships
between the same vertices.

In this activity, you will use GraphX to analyze flight data.

PROPRIETARY AND CONFIDENTIAL INFORMATION

2015 MapR Technologies, Inc. All Rights Reserved.
2
Build a Standalone Spark Application

Scenario
As a starting simple example, we will analyze 3 flights, for each flight we have the following information:

Originating Airport Destination Airport Distance

SFO ORD 1800 miles

ORD DFW 800 miles

DFW SFO 1400 miles

In this scenario, we are going to represent the airports as vertices and routes as edges. For our graph we
will have three vertices, each representing an airport. The Vertices each have an Id and the airport code
as a property:

Vertex Table for Airports

ID Property
1 SFO
2 ORD
3 DFW

The Edges have the Source Id , the Destination Id and the distance as a property.

Edges Table for Routes

SrcId DestId Property

1 2 1800

2 3 800

3 1 1400

PROPRIETARY AND CONFIDENTIAL INFORMATION

2015 MapR Technologies, Inc. All Rights Reserved.
3
Build a Standalone Spark Application
Software

This tutorial will run on the MapR Sandbox, which includes Spark.

You can download the code and data to run these examples from here:
o https://github.com/caroljmcdonald/sparkgraphxexample
The examples in this post can be run in the spark-shell, after launching with the spark-shell
command.
You can also run the code as a standalone application as described in the tutorial on Getting Started
with Spark on MapR Sandbox.

Launch the Spark Interactive Shell

Log into the MapR Sandbox, as explained in Getting Started with Spark on MapR Sandbox, using userid
user01, password mapr. Start the spark shell with:
$ spark-shell

Define Vertices
First we will import the GraphX packages.

(In the code boxes, comments are in Green and output is in Blue)

import org.apache.spark._
import org.apache.spark.rdd.RDD
// import classes required for using GraphX
import org.apache.spark.graphx._

We define airports as vertices. Vertices have an Id and can have properties or attributes associated with
them. Each vertex consists of :

Vertex id Id (Long)

Vertex Property name (String)

PROPRIETARY AND CONFIDENTIAL INFORMATION

2015 MapR Technologies, Inc. All Rights Reserved.
4
Build a Standalone Spark Application
Vertex Table for Airports

ID Property(V)
1 SFO

We define an RDD with the above properties that is then used for the Vertexes .

// create vertices RDD with ID and Name

val vertices=Array((1L, ("SFO")),(2L, ("ORD")),(3L,("DFW")))
val vRDD= sc.parallelize(vertices)
vRDD.take(1)
// Array((1,SFO))

// Defining a default vertex called nowhere

val nowhere = "nowhere"

Define Edges
Edges are the routes between airports. An edge must have a source, a destination, and can have
properties. In our example, an edge consists of :

Edge origin id src (Long)

Edge destination id dest (Long)

Edge Property distance distance (Long)

Edges Table for Routes

srcid destid Property(E)

1 12 1800

We define an RDD with the above properties that is then used for the Edges . The edge RDD has the
form (src id, dest id, distance ).

// create routes RDD with srcid, destid , distance

val edges = Array(Edge(1L,2L,1800),Edge(2L,3L,800),Edge(3L,1L,1400))

PROPRIETARY AND CONFIDENTIAL INFORMATION

2015 MapR Technologies, Inc. All Rights Reserved.
5
Build a Standalone Spark Application
val eRDD= sc.parallelize(edges)

eRDD.take(2)
// Array(Edge(1,2,1800), Edge(2,3,800))

Create Property Graph

To create a graph, you need to have a Vertex RDD, Edge RDD and a Default vertex.

Create a property graph called graph.

// define the graph

val graph = Graph(vRDD,eRDD, nowhere)
// graph vertices
graph.vertices.collect.foreach(println)
// (2,ORD)
// (1,SFO)
// (3,DFW)

// graph edges
graph.edges.collect.foreach(println)

// Edge(1,2,1800)
// Edge(2,3,800)
// Edge(3,1,1400)

1. How many airports are there?

// How many airports?

val numairports = graph.numVertices
// Long = 3

2. How many routes are there?

// How many routes?
val numroutes = graph.numEdges
// Long = 3

3. which routes > 1000 miles distance?

// routes > 1000 miles distance?

PROPRIETARY AND CONFIDENTIAL INFORMATION

2015 MapR Technologies, Inc. All Rights Reserved.
6
Build a Standalone Spark Application
graph.edges.filter { case Edge(src, dst, prop) => prop > 1000 }.collect.foreach(println)
// Edge(1,2,1800)
// Edge(3,1,1400)

4. The EdgeTriplet class extends the Edge class by adding the srcAttr and dstAttr members which
contain the source and destination properties respectively.
// triplets
graph.triplets.take(3).foreach(println)
((1,SFO),(2,ORD),1800)
((2,ORD),(3,DFW),800)
((3,DFW),(1,SFO),1400)

5. Sort and print out the longest distance routes

// print out longest routes

graph.triplets.sortBy(_.attr, ascending=false).map(triplet =>
"Distance " + triplet.attr.toString + " from " + triplet.srcAttr + " to " + triplet.dstAttr +
".").collect.foreach(println)

Distance 1800 from SFO to ORD.

Distance 1400 from DFW to SFO.
Distance 800 from ORD to DFW.

Analyze real Flight data with GraphX

Scenario
Our data is from http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-
Time. We are using flight information for January 2015. For each flight we have the following information:

Field Description Example Value

dOfM(String) Day of month 1
dOfW (String) Day of week 4

PROPRIETARY AND CONFIDENTIAL INFORMATION

2015 MapR Technologies, Inc. All Rights Reserved.
7
Build a Standalone Spark Application
carrier (String) Carrier code AA
tailNum (String) Unique identifier for the plane - tail N787AA
number
flnum(Int) Flight number 21
org_id(String) Origin airport ID 12478
origin(String) Origin Airport Code JFK
dest_id (String) Destination airport ID 12892

dest (String) Destination airport code LAX

crsdeptime(Double) scheduled departure time 900

deptime (Double) actual departure time 855
depdelaymins (Double) departure delay in minutes 0
crsarrtime (Double) scheduled arrival time 1230
arrtime (Double) actual arrival time 1237
arrdelaymins (Double) Arrival delay minutes 7
crselapsedtime Elapsed time 390
(Double)
dist (Int) Distance 2475

In this scenario, we are going to represent the airports as vertices and routes as edges. We are interested
in visualizing airports and routes and would like to see the number of airports that have departures or
arrivals.

You can download the code and data to run these examples from here:

https://github.com/caroljmcdonald/sparkgraphxexample

Log into the MapR Sandbox, as explained in Getting Started with Spark on MapR Sandbox, using userid
user01, password mapr. Copy the sample data files to your sandbox home directory /user/user01 using
scp. Start the spark shell with:
$ spark-shell

Define Vertices
First we will import the GraphX packages.

PROPRIETARY AND CONFIDENTIAL INFORMATION

2015 MapR Technologies, Inc. All Rights Reserved.
8
Build a Standalone Spark Application
(In the code boxes, comments are in Green and output is in Blue)

import org.apache.spark._
import org.apache.spark.rdd.RDD
import org.apache.spark.util.IntParam
// import classes required for using GraphX
import org.apache.spark.graphx._
import org.apache.spark.graphx.util.GraphGenerators

Below we a Scala case classes to define the Flight schema corresponding to the csv data file.

// define the Flight Schema

case class Flight(dofM:String, dofW:String, carrier:String, tailnum:String, flnum:Int, org_id:Long,
origin:String, dest_id:Long, dest:String, crsdeptime:Double, deptime:Double,
depdelaymins:Double, crsarrtime:Double, arrtime:Double,
arrdelay:Double,crselapsedtime:Double,dist:Int)

The function below parses a line from the data file into the Flight class.

// function to parse input into Flight class

def parseFlight(str: String): Flight = {
val line = str.split(",")
Flight(line(0), line(1), line(2), line(3), line(4).toInt, line(5).toLong, line(6), line(7).toLong,
line(8), line(9).toDouble, line(10).toDouble, line(11).toDouble, line(12).toDouble,
line(13).toDouble, line(14).toDouble, line(15).toDouble, line(16).toInt)
}

Below we load the data from the csv file into a Resilient Distributed Dataset (RDD). RDDs can have
transformations and actions, the first() action returns the first element in the RDD.

// load the data into a RDD

val textRDD = sc.textFile("/user/user01/data/rita2014jan.csv")
// MapPartitionsRDD[1] at textFile

// parse the RDD of csv lines into an RDD of flight classes

val flightsRDD = textRDD.map(parseFlight).cache()

We define airports as vertices. Vertices can have properties or attributes associated with them. Each vertex
has the following property:

PROPRIETARY AND CONFIDENTIAL INFORMATION

2015 MapR Technologies, Inc. All Rights Reserved.
9
Build a Standalone Spark Application
Airport name (String)

Vertex Table for Airports

ID Property(V)
10397 ATL

We define an RDD with the above properties that is then used for the Vertexes .

// create airports RDD with ID and Name

val airports = flightsRDD.map(flight => (flight.org_id, flight.origin)).distinct

airports.take(1)
// Array((14057,PDX))

// Defining a default vertex called nowhere

val nowhere = "nowhere"

// Map airport ID to the 3-letter code to use for printlns

val airportMap = airports.map { case ((org_id), name) => (org_id -> name) }.collect.toList.toMap
// Map(13024 -> LMT, 10785 -> BTV,)

Define Edges
Edges are the routes between airports. An edge must have a source, a destination, and can have
properties. In our example, an edge consists of :

Edge origin id src (Long)

Edge destination id dest (Long)

Edge Property distance distance (Long)

Edges Table for Routes

srcid destid Property(E)

14869 14683 1087

PROPRIETARY AND CONFIDENTIAL INFORMATION

2015 MapR Technologies, Inc. All Rights Reserved.
10
Build a Standalone Spark Application
We define an RDD with the above properties that is then used for the Edges . The edge RDD has the
form (src id, dest id, distance ).

// create routes RDD with srcid, destid , distance

val routes = flightsRDD.map(flight => ((flight.org_id, flight.dest_id), flight.dist)).distinctdistinct

routes.take(2)
// Array(((14869,14683),1087), ((14683,14771),1482))

// create edges RDD with srcid, destid , distance

val edges = routes.map {
case ((org_id, dest_id), distance) =>Edge(org_id.toLong, dest_id.toLong, distance) }

edges.take(1)
//Array(Edge(10299,10926,160))

Create Property Graph

To create a graph, you need to have a Vertex RDD, Edge RDD and a Default vertex.

Create a property graph called graph.

// define the graph

val graph = Graph(airports, edges, nowhere)

// graph vertices
graph.vertices.take(2)
Array((10208,AGS), (10268,ALO))

// graph edges
graph.edges.take(2)
Array(Edge(10135,10397,692), Edge(10135,13930,654))

6. How many airports are there?

// How many airports?
val numairports = graph.numVertices
// Long = 301

7. How many routes are there?

PROPRIETARY AND CONFIDENTIAL INFORMATION

8. which routes > 1000 miles distance?

// routes > 1000 miles distance?

graph.edges.filter { case ( Edge(org_id, dest_id,distance))=> distance > 1000}.take(3)
// Array(Edge(10140,10397,1269), Edge(10140,10821,1670), Edge(10140,12264,1628))

9. The EdgeTriplet class extends the Edge class by adding the srcAttr and dstAttr members which
contain the source and destination properties respectively.
// triplets
graph.triplets.take(3).foreach(println)
((10135,ABE),(10397,ATL),692)
((10135,ABE),(13930,ORD),654)
((10140,ABQ),(10397,ATL),1269)

10. Sort and print out the longest distance routes

// Sort and print out the longest distance routes

graph.triplets.sortBy(_.attr, ascending=false).map(triplet =>
"Distance " + triplet.attr.toString + " from " + triplet.srcAttr + " to " + triplet.dstAttr +
".").take(10).foreach(println)

Distance 4983 from JFK to HNL.

Distance 4983 from HNL to JFK.
Distance 4963 from EWR to HNL.
Distance 4963 from HNL to EWR.
Distance 4817 from HNL to IAD.
Distance 4817 from IAD to HNL.
Distance 4502 from ATL to HNL.
Distance 4502 from HNL to ATL.
Distance 4243 from HNL to ORD.
Distance 4243 from ORD to HNL.

PROPRIETARY AND CONFIDENTIAL INFORMATION

2015 MapR Technologies, Inc. All Rights Reserved.
12
Build a Standalone Spark Application
11. compute the highest degree vertex
// Define a reduce operation to compute the highest degree vertex
def max(a: (VertexId, Int), b: (VertexId, Int)): (VertexId, Int) = {
if (a._2 > b._2) a else b
}
val maxInDegree: (VertexId, Int) = graph.inDegrees.reduce(max)
maxInDegree: (org.apache.spark.graphx.VertexId, Int) = (10397,152)

val maxOutDegree: (VertexId, Int) = graph.outDegrees.reduce(max)

maxOutDegree: (org.apache.spark.graphx.VertexId, Int) = (10397,153)

val maxDegrees: (VertexId, Int) = graph.degrees.reduce(max)

airportMap(10397)
res70: String = ATL

12. which airport has the most incoming flights?

// get top 3
val maxIncoming = graph.inDegrees.collect.sortWith(_._2 > _._2).map(x => (airportMap(x._1),
x._2)).take(3)

maxIncoming.foreach(println)
(ATL,152)
(ORD,145)
(DFW,143)

// which airport has the most outgoing flights?

val maxout= graph.outDegrees.join(airports).sortBy(_._2._1, ascending=false).take(3)

maxout.foreach(println)
(10397,(153,ATL))
(13930,(146,ORD))
(11298,(143,DFW))

PROPRIETARY AND CONFIDENTIAL INFORMATION

PageRank
Another GraphX operator is PageRank. which is based on the Google PageRank algorithm.
PageRank measures the importance of each vertex in a graph, by determining which vertexes have the
most edges with other vertexes. In our example we can use PageRank to determine which airports are
the most important, by measuring which airports have the most connections to other airports.
We have to specify the tolerance, which is the measure of convergence.

13. What are the most important airports according to PageRank?

// use pageRank
val ranks = graph.pageRank(0.1).vertices

val impAirports = ranks.join(airports).sortBy(_._2._1, false).map(_._2._2)

impAirports.take(4)
//res6: Array[String] = Array(ATL, ORD, DFW, DEN)

Pregel
Many important graph algorithms are iterative algorithms since properties of vertices depend on
properties of their neighbors, which depend on properties of their neighbors. Pregel is an iterative graph
processing model, developed at Google, which uses a sequence of iterations of message passing
between vertices in a graph. GraphX Implements a Pregel-like bulk-synchronous message-passing API.

With the Pregel implementation in GraphX vertices can only send messages to neighboring vertices.

The Pregel operator is executed in a series of super steps. In each super step:

the vertices receive the sum of their inbound messages from the previous super step,

compute a new value for the vertex property,

send messages to the neighboring vertices in the next super step.

When there are no more messages remaining, the Pregel operator will end the iteration and the final
graph is returned.

PROPRIETARY AND CONFIDENTIAL INFORMATION

The code below computes the cheapest airfare using Pregel with the following formula to compute airfare.
50 + distance / 20

// starting vertex
val sourceId: VertexId = 13024
// a graph with edges containing airfare cost calculation
val gg = graph.mapEdges(e => 50.toDouble + e.attr.toDouble/20 )
// initialize graph, all vertices except source have distance infinity
val initialGraph = gg.mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)
// call pregel on graph
val sssp = initialGraph.pregel(Double.PositiveInfinity)(
// Vertex Program
(id, dist, newDist) => math.min(dist, newDist),
triplet => {
// Send Message

PROPRIETARY AND CONFIDENTIAL INFORMATION

2015 MapR Technologies, Inc. All Rights Reserved.
15
Build a Standalone Spark Application
if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
} else {
Iterator.empty
}
},
// Merge Message
(a,b) => math.min(a,b)
)

// routes , lowest flight cost

println(sssp.edges.take(4).mkString("\n"))
Edge(10135,10397,84.6)
Edge(10135,13930,82.7)
Edge(10140,10397,113.45)
Edge(10140,10821,133.5)

// routes with airport codes , lowest flight cost

ssp.edges.map{ case ( Edge(org_id, dest_id,price))=> ( (airportMap(org_id), airportMap(dest_id),
price)) }.takeOrdered(10)(Ordering.by(_._3))
Array((WRG,PSG,51.55), (PSG,WRG,51.55), (CEC,ACV,52.8), (ACV,CEC,52.8), (ORD,MKE,53.35),
(IMT,RHI,53.35), (MKE,ORD,53.35), (RHI,IMT,53.35), (STT,SJU,53.4), (SJU,STT,53.4))

// airports , lowest flight cost

println(sssp.vertices.take(4).mkString("\n"))

(10208,277.79)
(10268,260.7)
(14828,261.65)
(14698,125.25)

// airport codes , sorted lowest flight cost

sssp.vertices.collect.map(x => (airportMap(x._1), x._2)).sortWith(_._2 < _._2)
res21: Array[(String, Double)] = Array(PDX,62.05), (SFO,65.75), (EUG,117.35)

Want to learn more?

Free Spark Training at http://learn.mapr.com

http://spark.apache.org/docs/latest/graphx-programming-guide.html

PROPRIETARY AND CONFIDENTIAL INFORMATION

JSS2 Maths Second Term Exam
33% (3)
JSS2 Maths Second Term Exam
7 pages
Eldritch Knight
No ratings yet
Eldritch Knight
3 pages
Aph: User Guide
No ratings yet
Aph: User Guide
21 pages
Specification FOR Approval: Title 15.6" HD+ TFT LCD
No ratings yet
Specification FOR Approval: Title 15.6" HD+ TFT LCD
30 pages
Instruction Manual Installation, Operation, Maintenance Screw Pump Series Type A (Bearings Arranged Externally)
No ratings yet
Instruction Manual Installation, Operation, Maintenance Screw Pump Series Type A (Bearings Arranged Externally)
23 pages
An Introduction To Docker
No ratings yet
An Introduction To Docker
15 pages
React and Socket - Io Realtime Chat App
No ratings yet
React and Socket - Io Realtime Chat App
7 pages
Module 3
No ratings yet
Module 3
28 pages
Pycryptodome Master
100% (1)
Pycryptodome Master
82 pages
LTR WheelhousePrison - v1
No ratings yet
LTR WheelhousePrison - v1
23 pages
BDA Experiment 10
No ratings yet
BDA Experiment 10
9 pages
Create The Property Graph From Array of Vertex and Edges
No ratings yet
Create The Property Graph From Array of Vertex and Edges
5 pages
Spark Graphx
No ratings yet
Spark Graphx
43 pages
Practical Apache Spark in GraphX
No ratings yet
Practical Apache Spark in GraphX
8 pages
Case Study: Flight Data Analysis Using Spark Graphx
No ratings yet
Case Study: Flight Data Analysis Using Spark Graphx
23 pages
GraphX - Spark 3.5.0 Documentation
No ratings yet
GraphX - Spark 3.5.0 Documentation
34 pages
Spark-GraphX and Neo4j
No ratings yet
Spark-GraphX and Neo4j
32 pages
GraphX & Graph Analytics
No ratings yet
GraphX & Graph Analytics
61 pages
Da 4
No ratings yet
Da 4
14 pages
Graphx: Graph Analytics in Spark
No ratings yet
Graphx: Graph Analytics in Spark
34 pages
Lec 33
No ratings yet
Lec 33
33 pages
ArangoDB GraphCourse Beginners
No ratings yet
ArangoDB GraphCourse Beginners
64 pages
Apache Spark With Java
No ratings yet
Apache Spark With Java
209 pages
ArangoDB GraphCourse Beginners
No ratings yet
ArangoDB GraphCourse Beginners
64 pages
C5-SPARK Technology
No ratings yet
C5-SPARK Technology
39 pages
Apache Spark Graph Processing - Sample Chapter
No ratings yet
Apache Spark Graph Processing - Sample Chapter
22 pages
Spark Summit East 2015 - Adv Dev Ops - Student Slides
No ratings yet
Spark Summit East 2015 - Adv Dev Ops - Student Slides
219 pages
A Graph Is A Way of Representing Relationships That Exist Between Pairs of Objects
No ratings yet
A Graph Is A Way of Representing Relationships That Exist Between Pairs of Objects
2 pages
Unit 6
No ratings yet
Unit 6
34 pages
Module 5 Data Science
No ratings yet
Module 5 Data Science
8 pages
What Is Spark?: Up To 100× Faster
No ratings yet
What Is Spark?: Up To 100× Faster
56 pages
Apach Spark With Scala Slides
No ratings yet
Apach Spark With Scala Slides
187 pages
Neo4j Graph Data Modeling - Sample Chapter
100% (1)
Neo4j Graph Data Modeling - Sample Chapter
22 pages
Lab Distributed Big Data Analytics: Worksheet-3: Spark Graphx and Spark SQL Operations
No ratings yet
Lab Distributed Big Data Analytics: Worksheet-3: Spark Graphx and Spark SQL Operations
5 pages
Bootcamp Keynote
No ratings yet
Bootcamp Keynote
47 pages
SPARK
No ratings yet
SPARK
27 pages
Devops Slides
No ratings yet
Devops Slides
223 pages
4.1. Spark Basics
No ratings yet
4.1. Spark Basics
28 pages
Lec28 - RDD
No ratings yet
Lec28 - RDD
56 pages
BDT Unit 3
No ratings yet
BDT Unit 3
105 pages
Unit-V Spark
No ratings yet
Unit-V Spark
69 pages
Writing Spark Application
No ratings yet
Writing Spark Application
37 pages
MODULE-Analyzing Co-Occurrence-Networks With GraphX
No ratings yet
MODULE-Analyzing Co-Occurrence-Networks With GraphX
43 pages
BDA Lec7
No ratings yet
BDA Lec7
32 pages
Intro To Apache Spark: Credits To CS 347-Stanford Course, 2015, Reynold Xin, Databricks (Spark Provider)
No ratings yet
Intro To Apache Spark: Credits To CS 347-Stanford Course, 2015, Reynold Xin, Databricks (Spark Provider)
96 pages
Session 3.8
No ratings yet
Session 3.8
17 pages
Pygraphviz
No ratings yet
Pygraphviz
36 pages
Graph Analytics For Python Developers
No ratings yet
Graph Analytics For Python Developers
13 pages
Spark
No ratings yet
Spark
96 pages
ECS765P - W4 - Introduction To Spark
No ratings yet
ECS765P - W4 - Introduction To Spark
39 pages
Fast Data Processing With Spark - Second Edition - Sample Chapter
No ratings yet
Fast Data Processing With Spark - Second Edition - Sample Chapter
18 pages
Introduction To Spark
No ratings yet
Introduction To Spark
54 pages
BDA Unit III
No ratings yet
BDA Unit III
19 pages
Advanced Data Science On Spark: Reza Zadeh
No ratings yet
Advanced Data Science On Spark: Reza Zadeh
47 pages
BDA Lec8
No ratings yet
BDA Lec8
39 pages
Apache Spark Cheatsheet (2014)
No ratings yet
Apache Spark Cheatsheet (2014)
9 pages
Lecture 25
No ratings yet
Lecture 25
59 pages
Lec 32
No ratings yet
Lec 32
25 pages
Lecture 3 - Introduction To Apache Spark - 1691899519972
No ratings yet
Lecture 3 - Introduction To Apache Spark - 1691899519972
67 pages
Public - Crash Course - Apache Spark - Berlin - 2018 PDF
No ratings yet
Public - Crash Course - Apache Spark - Berlin - 2018 PDF
76 pages
Spark
No ratings yet
Spark
160 pages
R23 IDS Unit3
No ratings yet
R23 IDS Unit3
36 pages
Comp9313: Big Data Management: Introduction To Mapreduce and Spark
No ratings yet
Comp9313: Big Data Management: Introduction To Mapreduce and Spark
30 pages
EG2000 Manual PDF
No ratings yet
EG2000 Manual PDF
620 pages
Realism and Education (38-49)
50% (2)
Realism and Education (38-49)
13 pages
Marshmallow Catapult Lesson Plan For March 26 With Reflection
No ratings yet
Marshmallow Catapult Lesson Plan For March 26 With Reflection
6 pages
The Content Validity Index: Are You Sure You Know What's Being Reported? Critique and Recommendations
No ratings yet
The Content Validity Index: Are You Sure You Know What's Being Reported? Critique and Recommendations
10 pages
As & A Level Physics 9702 - 42 Paper 4 A Level Structured Questions Oct - Nov 2023
No ratings yet
As & A Level Physics 9702 - 42 Paper 4 A Level Structured Questions Oct - Nov 2023
2 pages
Ms. Sana Tahir at Giki: Engineering Statistics ES-202
No ratings yet
Ms. Sana Tahir at Giki: Engineering Statistics ES-202
17 pages
Quantification of A Professional Football Team s.26 PDF
No ratings yet
Quantification of A Professional Football Team s.26 PDF
8 pages
Hose Assembly Consist Page - 3844579: Caterpillar: Confidential Yellow
No ratings yet
Hose Assembly Consist Page - 3844579: Caterpillar: Confidential Yellow
2 pages
Detailed Scheduling and Planning (Lesson 6)
88% (8)
Detailed Scheduling and Planning (Lesson 6)
41 pages
Semantic Network Representation
No ratings yet
Semantic Network Representation
13 pages
Flare Radiation
No ratings yet
Flare Radiation
27 pages
Air Exchange Rates in Greenhouses With Different Types of Ventilation Opening in The Western Mediterranean Region of Turkey
No ratings yet
Air Exchange Rates in Greenhouses With Different Types of Ventilation Opening in The Western Mediterranean Region of Turkey
2 pages
The Design of Aircraft Landing Gear: February 2021
No ratings yet
The Design of Aircraft Landing Gear: February 2021
25 pages
2016 Summer Model Answer Paper
No ratings yet
2016 Summer Model Answer Paper
20 pages
Numericals:: 1300 4740 E Ti/ti 4740/1300 3.646
No ratings yet
Numericals:: 1300 4740 E Ti/ti 4740/1300 3.646
36 pages
Chapter 4 (Thermochemistry)
No ratings yet
Chapter 4 (Thermochemistry)
31 pages
ECT Inspection Technique: Theory and General Concepts
No ratings yet
ECT Inspection Technique: Theory and General Concepts
9 pages
Calculation and Selection Result
No ratings yet
Calculation and Selection Result
18 pages
Ucas and University
No ratings yet
Ucas and University
6 pages
CH 2 and 3 MCQS
No ratings yet
CH 2 and 3 MCQS
3 pages
Signage Drawing Final
No ratings yet
Signage Drawing Final
2 pages
Theoretical Distributions - Revision Notes
No ratings yet
Theoretical Distributions - Revision Notes
7 pages
Remodel V22 Geometry Interactive Notebook Segment 1
No ratings yet
Remodel V22 Geometry Interactive Notebook Segment 1
121 pages
2021-AP Price List-Fixed Flames Controllers - Issue 3V12 - 20201211 - Final
No ratings yet
2021-AP Price List-Fixed Flames Controllers - Issue 3V12 - 20201211 - Final
156 pages
Asymptomatic Radiopacity of Mandible Causing Delayed Orthodontic Tooth Movement: A Case Report
No ratings yet
Asymptomatic Radiopacity of Mandible Causing Delayed Orthodontic Tooth Movement: A Case Report
4 pages
Readme Etabs 2013
No ratings yet
Readme Etabs 2013
5 pages
Chapter 1 Review of Vectors and Maxwell's Equations
No ratings yet
Chapter 1 Review of Vectors and Maxwell's Equations
164 pages