Unit 6
GRAPH ANALYTICS AND DATA VISUALIZATION
Agenda
• What is data visualization?
• Benefits of using data visualization
• Why is it required?
• Apache Spark GraphX: Property Graph
• Graph Operator
• SubGraph, Triplet
• Neo4j: Modeling data with Neo4j
• Cypher
• Query Language: General clauses
• Read and Write clauses.
• Big Data Visualization with Power BI
• Apache Superset
What is data visualization?
• Data visualization is the practice of translating information into a visual
context, such as a map or graph, to make data easier for the human brain
to understand and pull insights from.
• The main goal of data visualization is to make it easier to identify patterns,
trends and outliers in large data sets.
• The term is often used interchangeably with others, including information
graphics, information visualization and statistical graphics.
• Data visualization is one of the steps of the data science process, which states
that after data has been collected, processed and modeled, it must be
visualized for conclusions to be made.
What is data visualization? (cont.)
• Data visualization is important for almost every career.
• It can be used by teachers to display student test results, by computer
scientists exploring advancements in artificial intelligence (AI) or by
executives looking to share information with stakeholders.
• It also plays an important role in big data projects.
• As businesses accumulated massive collections of data during the early years
of the big data trend, they needed a way to quickly and easily get an
overview of their data.
• Visualization tools were a natural fit.
Need of Data Visualization
• When a data scientist is writing advanced predictive analytics or machine
learning (ML) algorithms, it becomes important to visualize the outputs
to monitor results and ensure that models are performing as intended.
• This is because visualizations of complex algorithms are generally easier
to interpret than numerical outputs.
Example
Importance of Data Visualization
• Data visualization provides a quick and effective way to communicate information in a
universal manner using visual information.
• The practice can also help businesses identify which factors affect customer behavior;
pinpoint areas that need to be improved or need more attention; make data more
memorable for stakeholders; understand when and where to place specific products;
and predict sales volumes.
• It improves the ability to absorb information quickly, gain insights and make faster
decisions.
• It provides an increased understanding of the next steps that must be taken to improve
the organization.
• Provides an improved ability to maintain the audience's interest with information they
can understand.
Importance of Data Visualization (cont.)
• Makes information easy to distribute, which increases the
opportunity to share insights with everyone involved.
• It reduces reliance on data scientists for routine questions, since
data is more accessible and understandable.
• Provides an increased ability to act on findings quickly and,
therefore, achieve success with greater speed and fewer mistakes.
Data Visualization for Big data
• Data analysis projects have made visualization more important than ever.
• Companies are increasingly using machine learning to gather massive amounts of data that
can be difficult and slow to sort through, comprehend and explain.
• Visualization offers a means to speed this up and present information to business owners
and stakeholders in ways they can understand.
• Big data visualization often goes beyond the typical techniques used in normal
visualization, such as pie charts, histograms and corporate graphs.
• It instead uses more complex representations, such as heat maps and fever charts.
• Big data visualization requires powerful computer systems to collect raw data, process it
and turn it into graphical representations that humans can use to quickly draw insights.
Needs of Organizations to use Data Visualization
• An organization needs a visualization specialist who can choose appropriate data sets and
visual styles, so that the organization makes the best use of its data.
• IT specialists must be involved, because the organization will need powerful computer
hardware, efficient storage systems and possibly a move to the cloud.
• The data used must be accurate and under the control of a person responsible for governing it.
Example of Various Visualization Styles
In the early days of visualization, the most common visualization technique was using a
Microsoft Excel spreadsheet to transform the information into a table, bar graph or pie
chart. While these visualization methods are still commonly used, more intricate
techniques are now available, including the following:
infographics
bubble clouds
bullet graphs
heat maps
fever charts
Example of Infographics
Example of bubble clouds
Example of Bullet chart
Example of heat map
Fever chart example
Apache Spark GraphX
• GraphX is the graph processing library built on Apache Spark.
• It makes use of the property graph and the Spark RDD (Resilient Distributed
Dataset).
• GraphX is a hybrid technology that combines two kinds of systems. Data-parallel
systems, such as Hadoop and Spark, focus on distributing data across
multiple nodes.
• Graph-parallel systems, such as Pregel, GraphLab and Giraph, efficiently
execute graph algorithms through partitioning and distribution
techniques.
• GraphX unifies the data-parallel and graph-parallel approaches.
Table View v/s Graph view
Data parallel v/s Graph parallel
GraphX
• GraphX extends the Spark RDD (Resilient Distributed Dataset), an
immutable distributed collection of objects, with a graph
abstraction.
• Two graph types are relevant here:
• Directed graph: each edge has a direction associated with it.
• Regular graph: a graph in which each vertex has the same number of
edges.
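The regular-graph definition can be checked mechanically. The following is a minimal plain-Python sketch, not part of GraphX; the helper names `degree_counts` and `is_regular` and the two sample edge lists are made up for illustration, and edge direction is ignored when counting.

```python
from collections import Counter

def degree_counts(edges):
    """Count the edges touching each vertex in a list of (src, dst) pairs."""
    counts = Counter()
    for src, dst in edges:
        counts[src] += 1
        counts[dst] += 1
    return counts

def is_regular(vertex_ids, edges):
    """True if every vertex has the same number of edges."""
    counts = degree_counts(edges)
    return len({counts[v] for v in vertex_ids}) <= 1

# Hypothetical sample graphs on vertices 1, 2 and 3.
triangle = [(1, 2), (2, 3), (3, 1)]  # every vertex touches 2 edges
path = [(1, 2), (2, 3)]              # the middle vertex touches more edges

print(is_regular([1, 2, 3], triangle))
print(is_regular([1, 2, 3], path))
```

Each pair in the edge lists is written as (source, destination), so the same lists also describe directed graphs.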
GraphX property graph
• It is a directed multigraph, which means it can have multiple
parallel edges between the same pair of vertices.
• Every edge and vertex has user defined properties
associated with it.
• The parallel edges allow multiple relationships between
the same vertices.
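To make the definition concrete, a property graph can be sketched in plain Python. This only illustrates the data model and is not the GraphX API; the names `vertices`, `edges` and `edges_between`, as well as the people and relationships, are hypothetical.

```python
# Plain-Python sketch of a property graph (not the GraphX API).
# All names and data here are hypothetical.

# Each vertex has an ID and user-defined properties.
vertices = {
    1: {"name": "Alice", "role": "student"},
    2: {"name": "Bob", "role": "professor"},
}

# Each edge is (source ID, destination ID, properties).
# Edges are directed, and storing them in a list (rather than a dict
# keyed by the vertex pair) allows parallel edges.
edges = [
    (1, 2, {"relationship": "advisee of"}),
    (1, 2, {"relationship": "co-author of"}),  # parallel edge
    (2, 1, {"relationship": "advisor of"}),
]

def edges_between(src, dst):
    """Return the properties of every edge going from src to dst."""
    return [props for s, d, props in edges if s == src and d == dst]

# Two distinct relationships share the same source and destination.
for props in edges_between(1, 2):
    print(vertices[1]["name"], props["relationship"], vertices[2]["name"])
```

The list of edges is what makes this a multigraph: nothing prevents two entries from having the same source and destination.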
Example of Property Graph
Example
In this scenario, we will analyze three flights; the information for them is given in the table below:
• Airports act as vertices.
• Routes act as edges.
• Each vertex has an ID and the airport name as a property.
Graph operators by category:
• Basic: numEdges, numVertices, inDegrees, outDegrees, degrees
• Property: mapVertices, mapEdges, mapTriplets
• Structural: reverse, subgraph, mask, groupEdges
• Join: joinVertices, outerJoinVertices
Basic Operators
'degrees': computes the degree of each vertex, that is, the number of edges incident to
that vertex in the graph. A reduce function is then used to find the maximum degree.
'inDegrees': computes the in-degree of each vertex, that is, the number of incoming edges.
A reduce function is then used to find the maximum in-degree.
'outDegrees': computes the out-degree of each vertex, that is, the number of outgoing edges.
A reduce function is then used to find the maximum out-degree.
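The degree calculations and the reduce step can be sketched in plain Python. This mimics what the basic operators compute and is not the GraphX API; the function names and the airport codes in `routes` are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical routes as (source airport, destination airport) pairs.
routes = [("SFO", "ORD"), ("ORD", "DFW"), ("DFW", "SFO"), ("SFO", "DFW")]

def in_degrees(edges):
    """Number of incoming edges per vertex."""
    return Counter(dst for _, dst in edges)

def out_degrees(edges):
    """Number of outgoing edges per vertex."""
    return Counter(src for src, _ in edges)

def degrees(edges):
    """Number of incident edges per vertex (incoming plus outgoing)."""
    return in_degrees(edges) + out_degrees(edges)

def max_degree(degree_map):
    """Reduce (vertex, degree) pairs to one pair with the largest degree."""
    return reduce(lambda a, b: a if a[1] >= b[1] else b, degree_map.items())

print(max_degree(in_degrees(routes)))   # vertex with the most incoming routes
print(max_degree(out_degrees(routes)))  # vertex with the most outgoing routes
print(max_degree(degrees(routes)))      # vertex with the most routes overall
```

The reduce step compares two (vertex, degree) pairs at a time and keeps the larger, which is the same pairwise pattern a distributed reduce relies on.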
Property Operators
Property operators in GraphX simplify tasks such as filtering nodes based on specific
criteria, modifying node or edge properties, and aggregating properties across the
graph. These operators are critical in various applications, such as social network analysis,
recommendation systems, and graph-based machine learning.
Property Operators
We have used three methods in this code:
'mapVertices': maps each vertex property to a new value, possibly of a new type, using the mapping function.
'mapEdges': maps each edge property to a new value, possibly of a new type, using the mapping function.
'mapTriplets': maps each triplet (an edge together with its source and destination vertices) to a new edge property using the mapping function.
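The three property operators can be illustrated with a plain-Python sketch. It shows the idea only and is not the GraphX API; the function names, airport codes and distances are made up. In each case the graph structure stays the same and only the properties change.

```python
# Hypothetical airports and routes; the distances are made-up numbers.
vertices = {1: "SFO", 2: "ORD", 3: "DFW"}
edges = [(1, 2, 1800), (2, 3, 800), (3, 1, 1400)]  # (src, dst, distance)

def map_vertices(vertices, f):
    """New vertex map: each property replaced by f(vertex_id, property)."""
    return {vid: f(vid, prop) for vid, prop in vertices.items()}

def map_edges(edges, f):
    """New edge list: each property replaced by f(src, dst, property)."""
    return [(s, d, f(s, d, prop)) for s, d, prop in edges]

def map_triplets(vertices, edges, f):
    """Like map_edges, but f also sees the source and destination properties."""
    return [(s, d, f(vertices[s], prop, vertices[d])) for s, d, prop in edges]

# String -> string: lower-case every airport code.
lowercase = map_vertices(vertices, lambda vid, name: name.lower())
# Number -> boolean: the edge property changes type.
long_haul = map_edges(edges, lambda s, d, dist: dist > 1000)
# Triplet view: combine both endpoint properties with the edge property.
labels = map_triplets(vertices, edges, lambda a, dist, b: f"{a}->{b}: {dist}")

print(lowercase)
print(long_haul)
print(labels)
```

The triplet version is the one to reach for when the new edge property depends on the vertices at either end, not just on the edge itself.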
Structural Operators
Structural operators in GraphX allow tasks such as graph partitioning, subgraph extraction,
and graph join operations. These operators can organize, extract, and combine graph
components, enabling various graph-related applications like community detection, parallel
graph analysis, and graph integration.
Structural Operators
We have used two methods in this:
'reverse': returns a new graph in which the direction of every edge is reversed. Working on
the reversed graph can sometimes offer a more efficient solution to a given problem.
'subgraph': creates a subgraph by keeping only the vertices and edges that satisfy the given
conditions.
These methods manipulate the graph and extract only the information needed for the
problem.
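The two structural methods can be sketched in plain Python as follows. This is an illustration of the behavior, not the GraphX API; the function names, predicates and sample data are hypothetical. The sketch assumes that an edge survives in the subgraph only if it passes the edge predicate and both of its endpoints pass the vertex predicate.

```python
# Hypothetical airports and routes; the distances are made-up numbers.
vertices = {1: "SFO", 2: "ORD", 3: "DFW"}
edges = [(1, 2, 1800), (2, 3, 800), (3, 1, 1400)]  # (src, dst, distance)

def reverse(edges):
    """New edge list with source and destination swapped on every edge."""
    return [(d, s, prop) for s, d, prop in edges]

def subgraph(vertices, edges,
             edge_pred=lambda s, d, prop: True,
             vertex_pred=lambda vid, prop: True):
    """Keep vertices passing vertex_pred, and edges passing edge_pred
    whose two endpoints were both kept."""
    kept_vertices = {v: p for v, p in vertices.items() if vertex_pred(v, p)}
    kept_edges = [(s, d, p) for s, d, p in edges
                  if s in kept_vertices and d in kept_vertices
                  and edge_pred(s, d, p)]
    return kept_vertices, kept_edges

reversed_edges = reverse(edges)
# Keep only routes longer than 1000 (all airports stay).
_, long_routes = subgraph(vertices, edges,
                          edge_pred=lambda s, d, dist: dist > 1000)
# Drop one airport; every route touching it disappears as well.
no_ord_vertices, no_ord_edges = subgraph(
    vertices, edges, vertex_pred=lambda vid, name: name != "ORD")

print(reversed_edges)
print(long_routes)
print(no_ord_vertices, no_ord_edges)
```

Neither function modifies its inputs; each returns new collections, in keeping with the immutable style of the operators described above.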
Join Operators
Join operators in GraphX enable graph merging, attribute alignment, and graph pattern
matching tasks. These operations are essential in various domains, such as social network
analysis, recommendation systems, or graph-based data integration.
Join Operators
We have used two methods in this:
'joinVertices': a transformation that joins the vertices of a graph with an RDD (Resilient
Distributed Dataset) keyed by vertex ID. It takes a map function that merges each matching
RDD value into the corresponding vertex property; vertices without a match keep their original
property.
'outerJoinVertices': works much like 'joinVertices', but performs an outer join. Its map
function is applied to every vertex, whether or not the RDD has a matching entry, so it can also
change the type of the vertex property.
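The difference between the plain join and the outer join (`outerJoinVertices` in the GraphX API) can be sketched in plain Python. The sketch stands in for the real operators and does not use Spark; a dict plays the role of the keyed RDD, `None` plays the role of a missing match, and all names and data are hypothetical.

```python
# Hypothetical vertex properties and a side table keyed by vertex ID.
vertices = {1: "SFO", 2: "ORD", 3: "DFW"}
city_table = {1: "San Francisco", 2: "Chicago"}  # vertex 3 has no entry

def join_vertices(vertices, table, f):
    """Apply f(vid, old, extra) where the table has a match;
    vertices without a match keep their old property."""
    return {vid: f(vid, old, table[vid]) if vid in table else old
            for vid, old in vertices.items()}

def outer_join_vertices(vertices, table, f):
    """Apply f(vid, old, extra_or_None) to every vertex."""
    return {vid: f(vid, old, table.get(vid))
            for vid, old in vertices.items()}

# Plain join: only matched vertices change, and the type stays a string.
joined = join_vertices(
    vertices, city_table, lambda vid, code, city: f"{code} ({city})")

# Outer join: every vertex is mapped, so the property type may change
# (here from a string to a tuple) and missing matches must be handled.
outer = outer_join_vertices(
    vertices, city_table,
    lambda vid, code, city: (code, city if city is not None else "unknown"))

print(joined)
print(outer)
```

The practical choice between the two comes down to whether unmatched vertices should be left alone or given a default.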
Thank You.