Unit 6

Uploaded by

21dce011


UNIT-6

GRAPH ANALYTICS AND DATA VISUALIZATION

Agenda
• What is data visualization?
• Benefits of using data visualization
• Why is it required?
• Apache Spark GraphX: Property Graph
• Graph Operator
• SubGraph, Triplet
• Neo4j: Modeling data with Neo4j
• Cypher
• Query Language: General clauses
• Read and Write clauses.
• Big Data Visualization with Power BI
• Apache Superset
What is data visualization?
• Data visualization is the practice of translating information into a visual
context, such as a map or graph, to make data easier for the human brain
to understand and pull insights from.
• The main goal of data visualization is to make it easier to identify patterns,
trends and outliers in large data sets.
• The term is often used interchangeably with others, including information
graphics, information visualization and statistical graphics.
• Data visualization is one of the steps of the data science process, which states
that after data has been collected, processed and modeled, it must be
visualized for conclusions to be made.
What is data visualization? (cont.)
• Data visualization is important for almost every career.
• It can be used by teachers to display student test results, by computer
scientists exploring advancements in artificial intelligence (AI) or by
executives looking to share information with stakeholders.
• It also plays an important role in big data projects.
• As businesses accumulated massive collections of data during the early years
of the big data trend, they needed a way to quickly and easily get an
overview of their data.
• Visualization tools were a natural fit.
Need of Data Visualization
• When a data scientist is writing advanced predictive analytics or machine
learning (ML) algorithms, it becomes important to visualize the outputs
to monitor results and ensure that models are performing as intended.
• This is because visualizations of complex algorithms are generally easier
to interpret than numerical outputs.
Example
Importance of Data Visualization
• Data visualization provides a quick and effective way to communicate information in a
universal manner using visual information.
• The practice can also help businesses identify which factors affect customer behavior;
pinpoint areas that need to be improved or need more attention; make data more
memorable for stakeholders; understand when and where to place specific products;
and predict sales volumes.
• It improves the ability to absorb information quickly, gain insights and make faster
decisions.
• It provides an increased understanding of the next steps that must be taken to improve
the organization.
• Provides an improved ability to maintain the audience's interest with information they
can understand.
Importance of Data Visualization (cont.)
• Provides an easy distribution of information that increases the
opportunity to share insights with everyone involved.
• It reduces reliance on data scientists for routine questions, since
data is more accessible and understandable.
• Provides an increased ability to act on findings quickly and,
therefore, achieve success with greater speed and fewer mistakes.
Data Visualization for Big data
• Data analysis projects have made visualization more important than ever.
• Companies are increasingly using machine learning to gather massive amounts of data that
can be difficult and slow to sort through, comprehend and explain.
• Visualization offers a means to speed this up and present information to business owners
and stakeholders in ways they can understand.
• Big data visualization often goes beyond the typical techniques used in normal
visualization, such as pie charts, histograms and corporate graphs.
• It instead uses more complex representations, such as heat maps and fever charts.
• Big data visualization requires powerful computer systems to collect raw data, process it
and turn it into graphical representations that humans can use to quickly draw insights.
Needs of Organizations to use Data Visualization
• A visualization specialist is required, who can choose appropriate data sets and
visual styles so that the organization makes the best use of its data.
• IT specialists must be involved, since the organization will need powerful computer
hardware, efficient storage systems and possibly a move to the cloud.
• The data used must be accurate and under the control of a person responsible for governing it.
Example of Various Visualization Styles
In the early days of visualization, the most common technique was to use a
Microsoft Excel spreadsheet to transform the information into a table, bar graph or pie
chart. While these visualization methods are still commonly used, more intricate
techniques are now available, including the following:
• infographics
• bubble clouds
• bullet graphs
• heat maps
• fever charts
Example of Infographics
Example of bubble clouds
Example of Bullet chart
Example of heat map
Fever chart example
Apache Spark GraphX
• GraphX is the graph processing library built into Apache Spark.
• It makes use of the property graph and the Spark RDD (Resilient Distributed
Dataset).
• GraphX is a hybrid technology that combines two kinds of systems:
• Data-parallel systems, such as Hadoop and Spark, which focus on distributing
data across multiple nodes.
• Graph-parallel systems, such as Pregel, GraphLab and Giraph, which efficiently
execute graph algorithms through partitioning and distribution
techniques.
• GraphX unifies the data-parallel and graph-parallel approaches.
Table View v/s Graph view
Data parallel v/s Graph parallel
GraphX
• GraphX introduces a Graph abstraction that extends the Spark
RDD (Resilient Distributed Dataset), which is an
immutable distributed collection of objects.
• Two types of graphs to know here:
• Directed Graph: Each edge has a direction associated with it.
• Regular Graph: Graph where each vertex has the same number of
edges.
GraphX property graph
• It is a directed multigraph, which can have multiple parallel
edges between the same pair of vertices.
• Every edge and vertex has user-defined properties
associated with it.
• The parallel edges allow multiple relationships between
the same vertices.
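The idea of a directed multigraph with properties can be sketched in a few lines. GraphX itself exposes a Scala API; the block below is only a plain-Python model of the concept, not GraphX code, and the names `Edge`, `vertices`, `edges` and `relationships`, as well as the sample people, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: int   # source vertex ID
    dst: int   # destination vertex ID
    attr: str  # user-defined edge property


# Vertex ID -> user-defined vertex property (hypothetical sample data)
vertices = {1: "Alice", 2: "Bob"}

# A list rather than a dict keyed by (src, dst), so parallel edges are allowed
edges = [
    Edge(1, 2, "colleague"),
    Edge(1, 2, "friend"),   # parallel edge: same endpoints, different relationship
    Edge(2, 1, "manager"),  # opposite direction, so this is a distinct edge
]


def relationships(src, dst):
    """Return the property of every edge that goes from src to dst."""
    return [e.attr for e in edges if e.src == src and e.dst == dst]


print(relationships(1, 2))
print(relationships(2, 1))
```

Storing edges in a list is what makes this a multigraph: two entries may share the same source and destination while carrying different properties.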
Example of Property Graph
Example
In this scenario, we analyze three flights; the information for them is given in the tables below:
• Airports act as vertices.
• Routes act as edges.
• Each vertex has an ID and an airport name as its property.

Vertex Table for Airports (ID: Long, Airport Name: String)

ID  Airport Name
1   Ahmedabad
2   Surat
3   Mumbai

Edges Table for Routes (SrcID and DestID: Long; Distance: Double, since the values are fractional)

SrcID  DestID  Distance
1      2       263.3
2      3       279.4
3      1       524.2
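The airport example can be written down as data, together with the triplet view that pairs each edge with the properties of its two endpoints. This is a plain-Python sketch of the idea rather than GraphX code; the names `Edge`, `airports`, `routes` and `triplets` are assumptions, while the IDs, airport names and distances come from the tables above.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    src: int         # SrcID
    dst: int         # DestID
    distance: float  # Distance (edge property)


# Vertex table: ID -> airport name
airports = {1: "Ahmedabad", 2: "Surat", 3: "Mumbai"}

# Edge table: one entry per route
routes = [Edge(1, 2, 263.3), Edge(2, 3, 279.4), Edge(3, 1, 524.2)]


def triplets(vertices, edges):
    """Yield (source property, edge property, destination property) per edge."""
    for e in edges:
        yield vertices[e.src], e.distance, vertices[e.dst]


for src_name, distance, dst_name in triplets(airports, routes):
    print(f"{src_name} -> {dst_name}: {distance}")

# The triplet view makes questions such as "which route is longest?" easy to ask
longest = max(triplets(airports, routes), key=lambda t: t[1])
```

A triplet is useful because a route on its own only knows vertex IDs; joining in the vertex properties gives readable results such as airport names.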
Graph Operator
• Big data comes in different shapes and sizes. It can be batch data that is
processed offline: large sets of records are processed and the results and insights
are generated at a later time.
• Or the data can be real-time streams that need to be processed on the fly to create
insights almost instantaneously.
• Apache Spark can be used for processing batch data (Spark Core) as well as real-time data
(Spark Streaming).
Graph Operator
GraphX makes it easier to run analytics on graph data with the built-in operators and
algorithms.
It also allows us to cache and uncache the graph data to avoid recomputation when we
need to call a graph multiple times.
Basically, there are four types of graph operators:
1. Basic
2. Property
3. Structural
4. Join
Types of Graph Operators
• Basic: numEdges, numVertices, inDegrees, outDegrees, degrees
• Property: mapVertices, mapEdges, mapTriplets
• Structural: reverse, subgraph, mask, groupEdges
• Join: joinVertices, outerJoinVertices
Basic Operators
'degrees': calculates the degree of each vertex in the graph, that is, the number of edges
incident to it. A reduce function can then be applied to the result to find the maximum degree.

'inDegrees': calculates the number of incoming edges for each vertex
in the graph. A reduce function can then be applied to find the maximum in-degree.

'outDegrees': calculates the number of outgoing edges for each vertex
in the graph. A reduce function can then be applied to find the maximum out-degree.
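What the three degree operators compute, and how a reduce step picks out the maximum, can be sketched as follows. This is a plain-Python model and not the GraphX API; the edge list reuses the airport routes plus one extra route (1 to 3) that is an assumption added so the degrees differ, and `max_degree` is a hypothetical helper name.

```python
from collections import Counter
from functools import reduce

# (src, dst) pairs; the last route is a hypothetical addition for illustration
edges = [(1, 2), (2, 3), (3, 1), (1, 3)]

out_degrees = Counter(src for src, _ in edges)  # outgoing edges per vertex
in_degrees = Counter(dst for _, dst in edges)   # incoming edges per vertex
degrees = in_degrees + out_degrees              # all incident edges per vertex


def max_degree(a, b):
    """Reduce step: keep whichever (vertex ID, degree) pair has the larger degree."""
    return a if a[1] > b[1] else b


max_out = reduce(max_degree, out_degrees.items())
max_in = reduce(max_degree, in_degrees.items())
max_total = reduce(max_degree, degrees.items())

print("max out-degree:", max_out)
print("max in-degree:", max_in)
print("max degree:", max_total[1])
```

The reduce step compares pairs two at a time, so it needs no sorted copy of the data, which is why the same pattern suits a distributed collection.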
Property Operators
Property operators in GraphX simplify tasks such as filtering nodes based on specific
criteria, modifying node or edge properties, and aggregating properties across the
graph. These operators are critical in various applications, such as social network analysis,
recommendation systems, and graph-based machine learning.
Property Operators
The three property operators are:
'mapVertices': maps the vertices of the graph to a new type using a mapping function.

'mapEdges': maps the edges of the graph to a new type using a mapping function.

'mapTriplets': maps the triplets of the graph to a new type using a mapping function.
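The behaviour of the three map operators can be modelled in plain Python. This is a sketch of the idea, not GraphX code: the helper names `map_vertices`, `map_edges` and `map_triplets` are assumptions, and the data is the airport example. In each case the graph structure (which vertices and edges exist) stays the same and only the attached properties change.

```python
airports = {1: "Ahmedabad", 2: "Surat", 3: "Mumbai"}
routes = [(1, 2, 263.3), (2, 3, 279.4), (3, 1, 524.2)]  # (src, dst, distance)


def map_vertices(vertices, f):
    """Apply f(vertex ID, property) to every vertex; IDs are unchanged."""
    return {vid: f(vid, attr) for vid, attr in vertices.items()}


def map_edges(edges, f):
    """Apply f(edge property) to every edge; endpoints are unchanged."""
    return [(src, dst, f(attr)) for src, dst, attr in edges]


def map_triplets(vertices, edges, f):
    """Apply f(source property, edge property, destination property) to every edge."""
    return [(src, dst, f(vertices[src], attr, vertices[dst]))
            for src, dst, attr in edges]


upper = map_vertices(airports, lambda vid, name: name.upper())
rounded = map_edges(routes, lambda d: round(d))
labelled = map_triplets(airports, routes, lambda s, d, t: f"{s} to {t}: {d}")

print(upper)
print(rounded)
print(labelled)
```

The difference between the last two is what the mapping function can see: `map_edges` sees only the edge property, while `map_triplets` also sees the properties of both endpoints.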
Structural Operators
Structural operators in GraphX allow tasks such as graph partitioning, subgraph extraction,
and graph join operations. These operators can organize, extract, and combine graph
components, enabling various graph-related applications like community detection, parallel
graph analysis, and graph integration.
Structural Operators
Two commonly used structural operators are:
'reverse': returns a new graph in which the direction of every edge is reversed.
Working on the reversed graph can sometimes offer a more efficient solution to
a given problem, for example when following edges backwards.

'subgraph': creates a subgraph by keeping only the vertices and edges that satisfy
given conditions (predicates).

These methods manipulate the graph and extract only the information necessary for the
problem.
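The two structural operators can be sketched in plain Python as below. This models the behaviour rather than calling GraphX; the helper names `reverse` and `subgraph`, the predicate parameter names `epred` and `vpred`, and the threshold of 300 are assumptions chosen for illustration. The sketch assumes that an edge survives only if both of its endpoints survive the vertex predicate.

```python
airports = {1: "Ahmedabad", 2: "Surat", 3: "Mumbai"}
routes = [(1, 2, 263.3), (2, 3, 279.4), (3, 1, 524.2)]  # (src, dst, distance)


def reverse(edges):
    """Return the edges with source and destination swapped."""
    return [(dst, src, attr) for src, dst, attr in edges]


def subgraph(vertices, edges,
             epred=lambda src, dst, attr: True,
             vpred=lambda vid, attr: True):
    """Keep vertices passing vpred, and edges passing epred whose endpoints were both kept."""
    kept_vertices = {vid: attr for vid, attr in vertices.items() if vpred(vid, attr)}
    kept_edges = [(s, d, a) for s, d, a in edges
                  if s in kept_vertices and d in kept_vertices and epred(s, d, a)]
    return kept_vertices, kept_edges


reversed_routes = reverse(routes)

# Keep only the short routes (hypothetical threshold)
_, short_routes = subgraph(airports, routes, epred=lambda s, d, dist: dist < 300)

# Drop one airport; every route touching it disappears as well
without_mumbai = subgraph(airports, routes, vpred=lambda vid, name: name != "Mumbai")

print(reversed_routes)
print(short_routes)
print(without_mumbai)
```

Filtering on a vertex removes its incident edges too, so the result is always a well-formed graph with no dangling edges.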
Join Operators
Join operators in GraphX enable graph merging, attribute alignment, and graph pattern
matching tasks. These operations are essential in various domains, such as social network
analysis, recommendation systems, or graph-based data integration.
Join Operators
The two join operators are:
'joinVertices': a transformation that joins the vertices of a graph with an RDD (Resilient
Distributed Dataset) keyed by vertex ID. It takes a map function that merges each matching
vertex property with the value from the RDD; vertices with no match keep their original
property.

'outerJoinVertices': works much like 'joinVertices', but performs an outer join. The map
function is applied to every vertex and receives an optional value, which is empty when the
RDD has no entry for that vertex. The result may also change the vertex property type.
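The difference between the two join operators shows up for vertices that have no matching entry in the joined table. The block below is a plain-Python model, not the GraphX API: the helper names `join_vertices` and `outer_join_vertices` and the lookup table `codes` are assumptions, and the table deliberately has no entry for Surat.

```python
airports = {1: "Ahmedabad", 2: "Surat", 3: "Mumbai"}

# Hypothetical lookup table keyed by vertex ID; vertex 2 has no entry on purpose
codes = {1: "AMD", 3: "BOM"}


def join_vertices(vertices, table, f):
    """Inner-style join: apply f only where the table has a match, else keep the property."""
    return {vid: f(vid, attr, table[vid]) if vid in table else attr
            for vid, attr in vertices.items()}


def outer_join_vertices(vertices, table, f):
    """Outer join: apply f to every vertex; the third argument is None when there is no match."""
    return {vid: f(vid, attr, table.get(vid)) for vid, attr in vertices.items()}


joined = join_vertices(airports, codes, lambda vid, name, code: f"{name} ({code})")
outer = outer_join_vertices(airports, codes, lambda vid, name, code: (name, code))

print(joined)
print(outer)
```

With the first form the property type stays a string for every vertex, while the second form lets the map function produce a new type (here a tuple) and forces it to handle the missing case explicitly.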
Thank You.
