Chapter 4

This document discusses data visualization and various visualization techniques. It begins with an introduction to exploratory data analysis and the benefits of visualizing data. Then it covers topics like data visualization libraries, basic tools like histograms, bar charts, and pie charts, and more specialized tools. It also discusses concepts like visual encoding and how different data types can be encoded visually.

Uploaded by

payalwani73

Data visualization

Chapter 4
Contents
• Introduction to Exploratory Data Analysis
• Data visualization and visual encoding
• Data visualization libraries
• Basic data visualization tools

– Histograms, Bar charts/graphs, Scatter plots, Line charts, Area
plots, Pie charts, Donut charts
• Specialized data visualization tools
– Boxplots, Bubble plots, Heat map, Dendrogram, Venn diagram,
Treemap, 3D scatter plots
• Advanced data visualization tools
– Wordclouds
– Visualization of geospatial data
– Data visualization types
Introduction to Exploratory Analysis
• Data science deliverables are obtained after
performing analytics.
• Deliverables are better understood when presented as
a picture.
• A good analytics project starts (after cleaning and
understanding the data) with exploratory
visualizations that help us develop hypotheses and
get a feel for the data, and it ends with carefully
manicured figures that make the final results
visually obvious.
• Visualizations provide better understanding
and also help provide insight into the data.
• However, there are still situations where we
need a number, for two main reasons:
– Our eyes can trick us, so it’s important to have a
cold hard statistic too.
– Often, you don’t have time to sift through every
possible picture, and you need some way to put a
number on it so that the computer can make
decisions of some sort automatically (even if the
decision is only which pictures are worth your
time to look at).
• Exploratory data analysis is an initial step in data
analysis.
– Helps to provide insight into the data, or a bird’s-eye view of it, so that
an analyst can make some sense of it.
• Exploratory data analysis
– EDA is the process of visualizing and analyzing data to
extract insights from it
– A process of examining the available data set to discover
patterns, spot anomalies, test hypotheses, and check
assumptions using statistical measures.
– A process of summarizing important characteristics of data
in order to gain a better understanding of the dataset.
– A process to detect outliers and anomalies in the dataset.
– A way to identify the most important or influential features in
the data set.
• EDA techniques are graphical in nature
– Plotting raw data (bar plots, histograms etc)
– Univariate/multivariate analysis using pie chart,
box plot, cluster map etc.
Introduction to data visualization
• Visualization means the graphical representation
of data, which makes the data easier to understand.
• Data visualization helps with
– Identifying outliers in data
– Improving response time
– Greater simplicity
– Easier visualization of patterns
– Easier business analysis
– Enhanced collaboration
Visual Encoding
• The visual encoding is the way in which data
is mapped into visual structures, upon which
we build the images on a screen.
• In cognitive psychology, visual encoding also refers to the
process by which we remember visual images.
– For example, if we are presented a list of words,
each shown for one second, we would be able to
remember if there was a word that was written in
all capital letters, or if there was a word written in
italics.
• Choosing an appropriate visualization graph
is a challenging and important task
in data analytics.
• Visual encoding can be broadly classified into
the following two types: planar and retinal.
• Planar variables are known to everybody.
– Example: graphs plotted along the X- and Y-axes.
– Planar variables work for any data type, and they
work especially well for quantitative data.
• Retinal variables include size, color, shape,
orientation etc.
– Humans are sensitive to retinal variables: they
easily differentiate between various colors,
shapes, sizes and other properties.
• Visualization tools are chosen based on what
type of data we have to represent in the
visualization graph
– Nominal data: varying shapes can be used
– Ordinal data: various shades of a particular color
can be used to map data with a particular order or
ranking
– Numerical data: graphs can be used
• Examples of visual encoding
• 1. Visualize the quantity of items as given:

Item type       Quantity
Features        3
Bugs            5
User Stories    6

– Two variables: item type (categorical) and
quantity (numerical/quantitative)
– Possible choices are:

Variable         Possible encodings
Item type        Orientation, Color, Shape, Texture, X (or Y)
Item quantity    Orientation, Size, Value, X (or Y)

• Shape + Value
– Value doesn’t work for quantitative data
• Color + Size
[Figure: three colored bubbles labeled 3, 5, 10]
– Yellow: Feature
– Pink: Bug
– Blue: Story
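The Color + Size choice above can be sketched with Matplotlib's scatter() function. This is only an illustration: the item types and quantities come from the table, while the color names, the x positions and the scale factor for the marker area are assumptions made for the example.

```python
import matplotlib.pyplot as plt

# Data from the example table: item type (categorical), quantity (numerical)
item_types = ["Features", "Bugs", "User Stories"]
quantities = [3, 5, 6]

# Retinal variable 1: color encodes the item type (color names are an assumption)
colors = ["gold", "pink", "royalblue"]
# Retinal variable 2: marker area encodes the quantity (scale factor is arbitrary)
sizes = [q * 300 for q in quantities]

fig, ax = plt.subplots(figsize=(6, 2.5))
# Planar variable: the x position only separates the three items
positions = list(range(len(item_types)))
ax.scatter(positions, [0] * len(item_types), s=sizes, c=colors)
ax.set_xticks(positions)
ax.set_xticklabels(item_types)
ax.set_yticks([])
ax.set_xlim(-0.5, 2.5)
ax.set_title("Color encodes item type, size encodes quantity")
plt.show()
```

Because size is an ordered visual property, the larger quantity is immediately visible, while color only needs to tell the categories apart.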
Python’s visualization tools
• The main visualization library for Python is
matplotlib.
• Matplotlib is a powerful and flexible library.
• It integrates well with other libraries.
• Bokeh and plot.ly, which are browser-based libraries, are
also popular.
Basic data visualization tools
• Pie charts
– Humans are more capable of comprehending things through
visualization than through reading or listening.
– In data science, the scope of visualization increases greatly, because the
complete data cannot be read or understood directly, whereas a visualization
conveys much information about the data.
– In visualization, the use of pie charts is very common.
– It is one of the clearest ways to present data.
– It conveys information in a way that the human brain readily
understands.
– Examples of exploratory questions where pie charts perform best:
• How many of our customers are seniors?
• How many page views came from the UK?
• A pie chart looks like a pie.
• It is a circular structure divided into slices.
• Each slice indicates a numerical
proportion of the whole.
• The arc length of each slice is proportional to
the quantity it represents.
• Data can be either numerical or categorical in
nature.
• A pie chart cannot always be plotted directly from raw variables;
some transformation may be needed first,
depending on the data.
• In the iris data set, we choose the Class variable, as it
has three different species of iris, and for the
purpose of the demonstration the Sepal_Length
variable is used.
• Below is the code for the data transformation and
for plotting the pie chart.
• We do the following transformation first
(df is assumed to be the iris DataFrame, already loaded):
df1 = df.groupby("Class").count()
df2 = df1["Sepal_Length"]
df2
• Output:
Class
Iris-setosa 50
Iris-versicolor 50
Iris-virginica 50
Name: Sepal_Length, dtype: int64
• Plotting code:
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams['font.size'] = 22.0
matplotlib.rcParams['text.color'] = 'g'
matplotlib.rcParams['lines.linewidth'] = 2
labels = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
explode = (0, 0, 0.1)
# df2 holds the number of Sepal_Length records per class
plt.pie(df2, labels=labels, explode=explode,
        autopct='%1.1f%%', radius=1.2, colors=("y", "m", "b"),
        textprops={'fontsize': 22})
plt.title("Distribution of Sepal Length by Iris Class",
          bbox={'facecolor': '0.8', 'pad': 5}, y=1.2, fontsize=22)
plt.show()
• The “explode” option specifies the fraction of the radius by
which to offset each wedge.
• Donut Chart:
• A donut chart is a kind of pie chart with a donut
shape, meaning that the area at the center is cut out.
• Donut charts are considered more space efficient, since
the blank inner space can be used to display a
percentage or any other information related to the data series.
import matplotlib
import matplotlib.pyplot as plt
matplotlib.rcParams['font.size'] = 22.0
matplotlib.rcParams['text.color'] = 'r'
matplotlib.rcParams['lines.linewidth'] = 2
labels = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
explode = (0.1, 0.1, 0.1)
# df2 holds the per-class counts computed for the pie chart
plt.pie(df2, labels=labels, explode=explode, autopct='%1.1f%%',
        radius=1.2, colors=("b", "g", "y"), textprops={'fontsize': 22})
# Draw a white circle over the center to turn the pie into a donut
centre_circle = plt.Circle((0, 0), 0.75, fc='white', linewidth=1.25)
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
plt.title("Distribution of Sepal Length by Iris Class",
          bbox={'facecolor': '0.8', 'pad': 5}, y=1.2, fontsize=22)
plt.show()
• Histograms
– A graphical display of data using bars of different
heights.
– A histogram is used to represent data
grouped into ranges (bins).
– It is an accurate method for the graphical
representation of the distribution of numerical data.
– It is a type of bar plot where the X-axis represents the
bin ranges while the Y-axis gives the
frequency of values in each bin.
from matplotlib import pyplot as plt
import numpy as np
# Creating dataset
a = np.array([22, 87, 5, 43, 56,
73, 55, 54, 11,
20, 51, 5, 79, 31,
27])
# Creating histogram
fig, ax = plt.subplots(figsize =(10, 7))
ax.hist(a, bins = [0, 25, 50, 75, 100])

# Show plot
plt.show()
• Two major problems with histograms
– The first one is the number and size of the bins you use.
• If the bins are too large, then you can obscure fascinating
patterns that occur within a single bucket.
• If they are too small, then many of your buckets will contain no
points, and your bell‐shaped curve will turn into a bunch of
one‐unit high bars.
– The second problem is that sometimes your data can mar
the picture.
• There might be one bucket that contains so many points, for
example, that every other bucket is squashed down to what
looks like noise.
• The other visual problem is outliers, which can squash the
overwhelming majority of the points into the far left of the
graph.
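The effect of bin size can be checked numerically before plotting. The sketch below reuses the sample values from the histogram example above and counts points per bin with NumPy's histogram() function; the two bin layouts are chosen only to illustrate the extremes.

```python
import numpy as np

# Same sample values as the histogram example above
a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])

# Four wide bins: every bin is populated, but detail inside each bin is hidden
wide_counts, wide_edges = np.histogram(a, bins=[0, 25, 50, 75, 100])
print(wide_counts)  # [5 3 5 2]

# One hundred bins of width 1: most bins are empty and the bars are 1 or 2 high
narrow_counts, narrow_edges = np.histogram(a, bins=np.arange(0, 101, 1))
print((narrow_counts == 0).sum(), narrow_counts.max())  # 86 2
```

With bins of width 1, 86 of the 100 bins are empty, which is the "bunch of one-unit high bars" situation described above.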
• Bar Charts
• A bar chart is a graph with rectangular bars.
• The graph usually compares different categories.
• The graphs can be plotted vertically (bars standing up)
or horizontally (bars lying flat from left to right); the
most usual type of bar graph is vertical.
• A bar graph is useful for looking at a set of data and
making comparisons.
• For example, it’s easier to see which items are taking the
largest chunk of your budget by glancing at a bar
chart than by looking at a string of numbers.
• They can also show trends over time, or reveal patterns
in periodic sequences.
• Bar charts can also represent more complex
categories with stacked bar charts or grouped bar charts.
• For example, if you had two houses and needed budgets
for each, you could plot them on the same x-axis with a
grouped bar chart, using different colors to represent
each house.
• Although they look similar, bar charts and histograms
have one important difference: they plot different types
of data.
• Plot discrete data on a bar chart, and plot continuous
data on a histogram.
• A bar chart is used when we have categories of
data: types of movies, music genres, or dog breeds.
• It’s also a good choice when we want to compare things
between different groups.
• We could use a bar graph if we want to track change over
time, as long as the changes are significant (for example,
over decades or centuries).
• If we have continuous data, like people’s weights or IQ
scores, a histogram is best.
Horizontal bars, y-axis categories
Grouped bar graph
Stacked bar chart
• Like the double bar chart, different colors
represent different sub-groups.
• Stacked bar chart is a good choice if we
– Want to show the total size of groups.
– Are interested in showing how the proportions
between groups relate to each other, in addition
to the total of each group.
– Have data that naturally falls into components,
like:
• Sales by district.
• Book sales by type of book.
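The grouped and stacked variants described above can be sketched with Matplotlib's bar() function. The two-house budget figures below are hypothetical values invented for the illustration; the grouping is done by shifting the x positions, and the stacking by passing the first series as the bottom argument.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical monthly budgets for two houses (illustrative values only)
categories = ["Rent", "Food", "Utilities", "Transport"]
house_a = np.array([1200, 400, 150, 100])
house_b = np.array([900, 500, 200, 250])

x = np.arange(len(categories))  # one slot per category on the x-axis
width = 0.35                    # width of each bar

fig, (ax_grouped, ax_stacked) = plt.subplots(1, 2, figsize=(12, 4))

# Grouped: shift each house's bars half a bar width left or right of the slot
ax_grouped.bar(x - width / 2, house_a, width, label="House A")
ax_grouped.bar(x + width / 2, house_b, width, label="House B")
ax_grouped.set_title("Grouped bar chart")

# Stacked: draw House B on top of House A by setting bottom=house_a
ax_stacked.bar(x, house_a, width, label="House A")
ax_stacked.bar(x, house_b, width, bottom=house_a, label="House B")
ax_stacked.set_title("Stacked bar chart")

for ax in (ax_grouped, ax_stacked):
    ax.set_xticks(x)
    ax.set_xticklabels(categories)
    ax.set_ylabel("Budget")
    ax.legend()

plt.show()
```

The grouped version makes it easy to compare the two houses within a category, while the stacked version emphasizes the combined total per category.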
• The matplotlib API in Python provides the
bar() function, which can be used either in MATLAB
style (through pyplot) or through the object-oriented API.
• The syntax of the bar() function is as follows
(the same parameters apply to the bar() method of an axes object):
• plt.bar(x, height, width=0.8, bottom=None, align='center')
• The function draws one rectangular bar per value of x,
positioned and sized according to the given parameters.
• A simple example of the bar plot, which represents the number of students enrolled in different courses of
an institute.

import numpy as np
import matplotlib.pyplot as plt

# creating the dataset


data = {'C':20, 'C++':15, 'Java':30,
'Python':35}
courses = list(data.keys())
values = list(data.values())

fig = plt.figure(figsize = (10, 5))

# creating the bar plot


plt.bar(courses, values, color ='maroon',
width = 0.4)

plt.xlabel("Courses offered")
plt.ylabel("No. of students enrolled")
plt.title("Students enrolled in different courses")
plt.show()
• plt.bar(courses, values, color='maroon', width=0.4) is used
to specify that the bar chart is to be plotted
using the courses as the X-axis and the
values as the Y-axis.
• The color argument is used to set the color of the
bars (maroon in this case), and width sets the bar width.
• plt.xlabel("Courses offered") and
plt.ylabel("No. of students enrolled") are used to label
the corresponding axes.
• plt.title() is used to set the title of the graph.
• plt.show() is used to display the graph built
by the previous commands.
• Scatter plots:
– They are one of the simplest but most powerful
ways to visualize relationships within a dataset.
– Scatter plots are best when we want a
first look at a data set.
df.plot(kind="scatter",
x="sepal length (cm)", y="sepal width (cm)")
plt.title("Length vs Width")
plt.show()
• Scatter plots can carry several other
characteristics, which allow more than just
two dimensions to be packed in.
• Color coding: often, data points that fall into different
categories are given different colors.
• Size: changing the size of data points communicates
another dimension of information. It also has the often-desirable
ability to draw attention disproportionately to
some points instead of others.
• Opacity: in scatterplots and other visualizations, it is
often useful to make things partially transparent in case
they overlap with other parts of the visualization.
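The three extra channels listed above (color, size and opacity) map directly onto arguments of Matplotlib's scatter() function. The following is a minimal sketch on synthetic data; the variable names and the random values are assumptions made for the illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data for illustration: 100 points in two categories
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=0.5, size=n)
category = np.repeat([0, 1], n // 2)      # a categorical variable
weight = rng.uniform(20, 200, size=n)     # a third numeric variable

# Color coding: one color per category
colors = ["tab:blue" if c == 0 else "tab:orange" for c in category]

fig, ax = plt.subplots()
# s sets the marker area (size), alpha sets the opacity
points = ax.scatter(x, y, c=colors, s=weight, alpha=0.5)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatter plot with color, size and opacity")
plt.show()
```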
• Line charts:
– Used to plot continuous data as
points connected along an axis.
– They are created by first plotting the data points
on a Cartesian plane, and then joining those points
with line segments.
– Line plots can be used for both
single-variable analysis and multiple-variable
analysis.
– Generally used for visualizing trends in time-series
problems.
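A line chart of this kind can be sketched with Matplotlib's plot() function. The monthly sales figures below are hypothetical values invented for the illustration; plotting two series on the same axes shows the multiple-variable case.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures for two products (illustrative values)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
product_a = [120, 135, 150, 160, 172, 190]
product_b = [80, 95, 90, 110, 125, 120]

fig, ax = plt.subplots()
# plot() places each point and joins consecutive points with line segments
ax.plot(months, product_a, marker="o", label="Product A")
ax.plot(months, product_b, marker="s", label="Product B")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.set_title("Monthly sales trend")
ax.legend()
plt.show()
```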
Specialized data visualization tools

• Visualization tools meant for specific
purposes.
• They are
– Box plots
– Bubble plot
– Violin plot
– Heat map
– Dendrogram
• Box plots
– Boxplots are a convenient way to summarize a
dataset by showing the median, quartiles, and
min/max values for each of the variables.
– A box plot displays the distribution of data based on the five-number
summary: three quartiles divide the data set into four parts, and
the following values are presented in the plotted
graph.
• Minimum, maximum, median, first quartile (lower),
third quartile (upper)
• Box plots help to have information on the
variability or dispersion of the data.
• A boxplot is a graph that gives a good
indication of how the values in the data are
spread out.
• Box plots take up less space, as compared to
histograms / density plots, which is useful
when comparing distributions between many
groups or datasets.
With outliers shown
• median (Q2/50th Percentile): the middle value of the
dataset.
• first quartile (Q1/25th Percentile): the middle number
between the smallest number (not the “minimum”) and
the median of the dataset.
• third quartile (Q3/75th Percentile): the middle value
between the median and the highest value (not the
“maximum”) of the dataset.
• interquartile range (IQR): 25th to the 75th percentile.
• whiskers (shown in blue)
• outliers (shown as green circles)
• “maximum”: Q3 + 1.5*IQR
• “minimum”: Q1 -1.5*IQR
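The quantities listed above can be computed directly with NumPy. The sample values below are made up for the illustration, and the quartiles use NumPy's default linear interpolation, so other quartile conventions may give slightly different numbers.

```python
import numpy as np

# Small illustrative sample with one unusually large value
data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 30])

# First quartile, median and third quartile
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# The box plot "minimum" and "maximum" limits
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Points outside the limits are drawn as outliers
outliers = data[(data < lower_limit) | (data > upper_limit)]

print(q1, median, q3, iqr)                  # 4.25 6.5 8.75 4.5
print(lower_limit, upper_limit, outliers)   # -2.5 15.5 [30]
```

Here the value 30 lies above Q3 + 1.5*IQR = 15.5, so it would be plotted as an outlier rather than as the end of the upper whisker.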
• Box plots
– It can tell us about the outliers in our data set and
what their values are.
– It can also tell us whether our data is symmetrical, how tightly our
data is grouped, and if and how our data is
skewed.
• The box plot example below is used to
analyze the relationship between a categorical
feature (malignant or benign tumor) and a
continuous feature (area_mean).
import pandas as pd
import matplotlib.pyplot as plt
# The dataset is hosted in the original author's GitHub repo
df = pd.read_csv('https://raw.githubusercontent.com/mGalarnyk/Python_Tutorials/master/Kaggle/BreastCancerWisconsin/data/data.csv')
# Split area_mean by diagnosis: M = malignant, B = benign
malignant = df[df['diagnosis']=='M']['area_mean']
benign = df[df['diagnosis']=='B']['area_mean']
fig = plt.figure()
ax = fig.add_subplot(111)
# Note: in Matplotlib 3.9 and later, labels is named tick_labels
ax.boxplot([malignant, benign], labels=['M', 'B'])
plt.show()
• We observe that there is a greater variability
for malignant tumor area_mean as well as
larger outliers.
• Bubble plots
– A bubble plot is a scatterplot where a third dimension is
added: the value of an additional numeric variable is
represented through the size of the dots.
– We need 3 numerical variables as input: one is represented
by the X axis, one by the Y axis, and one by the dot size.
– Bubble charts are used to determine if at least three
numerical variables are related or share some kind of
pattern.
– Under special circumstances, they could be used to show
trends over time or to compare categorical variables.
– They are considered a natural extension of the scatter plot
where the dots are replaced with bubbles or disks.
• A bubble or disk is drawn for each observation
of a pair of numerical variables (A, B)
positioning, in a Cartesian coordinate system,
the disk horizontally according to the value of
variable A and vertically according to variable
B.
• A third numerical variable C is represented by
means of the area of the bubble.
• A bubble chart (aka bubble plot) is an
extension of the scatter plot used to look at
relationships between three numeric
variables.
• Each dot in a bubble chart corresponds with a
single data point, and the variables’ values for
each point are indicated by horizontal
position, vertical position, and dot size.
• Example data for bubble chart:

Avg. points against    Avg. points for    Wins
26.56                  14.06              3
26.44                  25.88              7
17.94                  24.31              10
23.38                  16.81              6

• A bubble chart is created from a data table with
three columns.
• Two columns will correspond with the horizontal
and vertical positions of each point, while the third
will indicate each point’s size.
• One point will be plotted for each row in the table.
• The example bubble chart above depicts the
points scored per game by teams in the regular
season of the National Football League in 2018.
• Each bubble represents a single team’s
performance.
• A bubble’s horizontal position notes the average
points scored against that team each game, and
the vertical position notes the average points
scored by that team each game.
• Each bubble’s size indicates the number of wins
earned by each team, with larger bubbles
corresponding to higher win rates.
• From the plot, we can see that there is a lot
more variance in points scored by teams than
by their opponents, but there’s no particularly
strong correlation between the two.
• Instead, the main takeaway from the plot
comes from the third variable: as teams score
more points and allow fewer points from their
opponents (towards the upper left), they will
earn more victories, as one might naturally
expect.
• The bubble plot is interpreted based on the
shape that these data points generate as well
as from the differences in the relative sizes of
the bubbles or discs.
• There must be appropriate legends for the
dissimilar categories represented by the colors
and some type of scale that allows us to infer
the numerical value indicated by the size of
the bubble.
The graph above shows a direct (positive) relationship between
variable A and variable B. The disk indicated in the position (60, 225) is
clearly an outlier.
• The following three important features of the
dataset can be found in a bubble chart:
– Outliers: pieces of data that are very different from
all the others in the dataset and do not seem to fit
the same pattern. These anomalous values might
represent valuable information to analyze.
– Gaps: intervals that contain no data. The
visualization of gaps between data justifies an in-depth
analysis that explains their presence.
– Clusters: isolated groups of data bubbles, which
can also merit a particular analysis of the reason
for their presence in the graph.
• The following bubble chart describes
the prevalence of stunting against access to basic
sanitation services across regions of the world.
• The yellow bubbles in the upper left of the chart
indicate that the Sub-Saharan Africa region
makes up a clear cluster of countries where
millions of children are growing up without future
basic life skills.
• The size of the disks corresponds to the region’s
population. The chart also shows the urgent need
to bring safe sanitation to millions of children in
South Asia.
• Bubble charts are appropriate when we want to show
relationships between three or four variables but not
their exact values.
– For example, in business you can make investment
decisions by visualizing in a bubble plot relationships in
dimensions such as cost, value, and risk between different
business alternatives.
• Bubble charts are commonly drawn with transparency
on points, since overlaps occur much more easily
than when all points are small.
• This overlapping also means that there are limits
to the number of data points that can be plotted
while keeping a plot readable.
• Using Matplotlib, we can make a bubble plot in
Python using the scatter() function.
• To make a bubble plot, we need to specify the size
argument “s” for the size of the data points.
import matplotlib.pyplot as plt
# df is assumed to be a DataFrame with columns "X", "Y" and "bubble_size"
# scatter plot with scatter() function
# transparency with "alpha"
# bubble size with "s"
plt.scatter('X', 'Y', s='bubble_size',
            alpha=0.5, data=df)
plt.xlabel("X", size=16)
plt.ylabel("Y", size=16)
plt.title("Bubble Plot with Matplotlib", size=18)
plt.show()
• We have also added transparency to the
bubbles in the bubble plot using alpha=0.5.
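As a runnable variant of the same call, the sketch below plots the four example rows from the bubble chart data table shown earlier. A plain dictionary stands in for the DataFrame (the data argument accepts any object that can be indexed by name), and the key names and the scale factor of 60 for the bubble area are assumptions made for the illustration.

```python
import matplotlib.pyplot as plt

# The four rows from the example data table for the bubble chart
teams = {
    "avg_points_against": [26.56, 26.44, 17.94, 23.38],
    "avg_points_for": [14.06, 25.88, 24.31, 16.81],
    "wins": [3, 7, 10, 6],
}

# "s" is a marker area in points squared; the factor 60 is an arbitrary choice
teams["bubble_size"] = [w * 60 for w in teams["wins"]]

fig, ax = plt.subplots()
# Column names are looked up in the object passed as data
ax.scatter("avg_points_against", "avg_points_for", s="bubble_size",
           alpha=0.5, data=teams)
ax.set_xlabel("Avg. points against", size=16)
ax.set_ylabel("Avg. points for", size=16)
ax.set_title("Bubble Plot with Matplotlib", size=18)
plt.show()
```

The largest bubble (10 wins) sits toward the upper left, where a team scores many points and allows few.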
• Violin plots
– Sometimes the median and mean aren't enough to
understand a dataset.
– Are most of the values clustered around the median? Or are
they clustered around the minimum and the maximum with
nothing in the middle?
– For questions like these, distribution plots come to our help.
– The box plot is an old standby for visualizing basic
distributions. It's convenient for comparing
summary statistics (such as range and quartiles),
but it doesn't show the variations in the data
• Unlike a box plot that can only show summary
statistics, violin plots depict summary statistics
and the density of each variable.
– A violin plot is a method of plotting numeric data
and is used to visualize its distribution.
– It is similar to a box plot, with the addition of a
rotated kernel density plot on each side.
– In other words, a violin plot also shows
the probability density of the data
at different values, usually smoothed by a kernel
density estimator.
• How to read a violin plot
• Violin plots have many of the same summary statistics
as box plots:
– the white dot represents the median
– the thick gray bar in the center represents the interquartile
range
– the thin gray line represents the rest of the distribution,
except for points that are determined to be “outliers” using
a method that is a function of the interquartile range.
– On each side of the gray line is a kernel density estimation
to show the distribution shape of the data.
– Wider sections of the violin plot represent a higher
probability that members of the population will take on the
given value; the skinnier sections represent a lower
probability.
– As an example dataset, we have a dataset that
contains records of 71 six-week-old baby chickens
and includes observations on their particular feed
type, sex, and weight.
• This violin plot shows the relationship of feed
type to chick weight.
• The box plot elements show the median
weight for horsebean-fed chicks is lower than
for other feed types.
• The shape of the distribution (extremely
skinny on each end and wide in the middle)
indicates the weights of sunflower-fed chicks
are highly concentrated around the median.
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(10)
collectn_1 = np.random.normal(100, 10, 200)
collectn_2 = np.random.normal(80, 30, 200)
collectn_3 = np.random.normal(90, 20, 200)
collectn_4 = np.random.normal(70, 25, 200)
## combine these different collections into a list
data_to_plot = [collectn_1, collectn_2, collectn_3, collectn_4]
# Create a figure instance
fig = plt.figure()
# Create an axes instance
ax = fig.add_axes([0,0,1,1])
# Create the violin plot
bp = ax.violinplot(data_to_plot)
plt.show()
• Heat map
– A heat map is a two-dimensional representation
of data in which values are represented by colors.
– A simple heat map provides an immediate visual
summary of information.
– Each data value is represented in a matrix with
different color coding.
– Generally, darker shades of a color represent higher
values than lighter shades.
– Heat maps are used to find correlation between
various data columns in a dataset.
• A heat map can be considered as a data-driven
“paint by numbers” canvas overlaid on top of
an image.
• In short, an image is divided into a grid and,
within each square, the heat map shows the
relative intensity of the values captured there (for example, by an
eye tracker) by assigning each value a color
representation.
• Values that are highest relative
to the other values present are given a
“hot” color, while those that are lower in
value are given a “cold” color.
• As an example of a heat map, consider
understanding the way a web page is browsed
by users:
– If you’re looking at a web page and you want to
know which areas get the most attention, a heat
map shows you in a visual way that’s easy to
assimilate and make decisions from.
– A heat map uses a warm-to-cool color spectrum to
show you which parts of a page receive the most
attention.
This heat map, for example, shows how far down the page visitors have
scrolled:
– Creating a heat map helps you understand visitor
behavior instantly.
– They also help you answer a crucial question:
“Where should the most important content be on
this page?”
• Heat mapping software works by collecting
data from a web page and displaying that data
over the web page itself.
import numpy as np
import matplotlib.pyplot as plt
data = np.random.random(( 12 , 12 ))
plt.imshow( data , cmap = 'autumn' ,
interpolation = 'nearest' )
plt.title( "2-D Heat Map" )
plt.show()
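Since heat maps are often used to show correlation between data columns, the same imshow() call can be applied to a correlation matrix instead of random values. The sketch below uses synthetic columns named a, b, c and d, which are assumptions made for the illustration; b is built to follow a, and d to move against it.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: four columns of 200 values each
rng = np.random.default_rng(42)
a = rng.normal(size=200)
b = a + rng.normal(scale=0.3, size=200)    # strongly follows a
c = rng.normal(size=200)                   # unrelated to a
d = -a + rng.normal(scale=0.5, size=200)   # moves against a
names = ["a", "b", "c", "d"]

# Each input row is one variable, so corr is a symmetric 4 x 4 matrix
corr = np.corrcoef([a, b, c, d])

fig, ax = plt.subplots()
# Fix the color scale to the full correlation range [-1, 1]
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)
fig.colorbar(im, ax=ax, label="Pearson correlation")
ax.set_title("Correlation Heat Map")
plt.show()
```

Warm cells mark pairs of columns that rise together, and cool cells mark pairs that move in opposite directions.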
• Dendrogram:
– In compound analysis, for example, the dendrogram is a visual
representation of the compound correlation data.
– The individual compounds are arranged along the bottom of
the dendrogram and referred to as leaf nodes.
– Compound clusters are formed by joining individual
compounds or existing compound clusters with the join
point referred to as a node.
– A dendrogram is a branching diagram that represents the
relationships of similarity among a group of entities.
– Used to quantitatively estimate the relation between every
sample in the system and study how close each data point is
related to each other in the system
– A dendrogram is a tree-like structure that explains the
relationship between all the data points in the system
• The vertical direction (y-axis) represents
the distance between clusters in some metric.
• As we go down a path, we keep
breaking the clusters into smaller and smaller
units until the granularity reaches the
individual data sample.
• When we traverse upward, at each level
we combine smaller clusters into larger
ones until we reach the entire system.
• As a result, hierarchical clustering, which produces the
dendrogram, is also known as clustering of clusters.
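A dendrogram of this kind can be sketched with SciPy's hierarchical clustering functions, assuming SciPy is installed. The six 2-D points and their labels below are made up for the illustration, and average linkage is just one of several possible merge rules.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Six illustrative 2-D points forming two obvious groups
points = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
                   [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])
labels = ["p0", "p1", "p2", "p3", "p4", "p5"]

# Agglomerative clustering: each row of Z records one merge as
# (cluster i, cluster j, distance between them, size of the new cluster)
Z = linkage(points, method="average")

fig, ax = plt.subplots()
# Leaves are the samples; the height of each join is the merge distance
dendrogram(Z, labels=labels, ax=ax)
ax.set_ylabel("Distance between merged clusters")
ax.set_title("Dendrogram")
plt.show()
```

Reading the tree upward, nearby points merge first at small heights, and the final join at the top combines the two groups into the entire system.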
• Venn diagram
– A Venn diagram is an illustration that uses circles
to show the relationships among things or finite
groups of things.
– Circles that overlap have a commonality while
circles that do not overlap do not share those
traits.
– Venn diagrams help to visually represent the
similarities and differences between two concepts.
• A Venn diagram is an illustration of the
relationships between and among sets, that is, groups
of objects that share something in common.
• Usually, Venn diagrams are used to depict set
intersections (denoted by an upside-down
letter U, ∩).
• This type of diagram is used in scientific and
engineering presentations, in theoretical
mathematics, in computer applications, and in
statistics.
• The drawing is an example of a Venn diagram
that shows the relationship among three
overlapping sets X, Y, and Z.
• Example of a venn diagram
– A study is being done at a school on students who take
the subjects mathematics and economics. There are 12
students who attend both classes and 2 students who do
not take either of the subjects.
import matplotlib.pyplot as plt
# matplotlib_venn is a separate package that must be installed
from matplotlib_venn import venn3
set1 = set(['A', 'B', 'C'])
set2 = set(['A', 'B', 'D'])
set3 = set(['A', 'E', 'F'])
venn3([set1, set2, set3], ('Group1', 'Group2', 'Group3'))
plt.show()
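The regions that the Venn diagram draws correspond to ordinary set operations, which can be checked with Python's built-in set type. The sketch below reuses the three sets from the code above; the variable names for the regions are chosen only for the illustration.

```python
# The same three sets as in the Venn diagram code above
set1 = {"A", "B", "C"}
set2 = {"A", "B", "D"}
set3 = {"A", "E", "F"}

# Region shared by all three circles (intersection)
all_three = set1 & set2 & set3
# Region shared by Group1 and Group2 only
only_1_and_2 = (set1 & set2) - set3
# Region belonging to Group1 alone
only_1 = set1 - set2 - set3
# Everything covered by at least one circle (union)
union = set1 | set2 | set3

print(sorted(all_three))     # ['A']
print(sorted(only_1_and_2))  # ['B']
print(sorted(only_1))        # ['C']
print(len(union))            # 6
```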
• Treemaps:
– Treemaps are primarily used to display data that is
grouped and nested in a hierarchical (or tree-based)
structure.
– The space in the visualization is split up into rectangles that
are sized and ordered by a quantitative variable.
– The levels in the hierarchy of the treemap are visualized as
rectangles containing other rectangles.
– Each set of rectangles on the same level in the hierarchy
represents a column or an expression in a data table.
– Each individual rectangle on a level in the hierarchy
represents a category in a column.
• For example, a rectangle representing a continent may contain
several rectangles representing countries in that continent.
• Each rectangle representing a country may in turn contain
rectangles representing cities in these countries.
• Treemap of sales by city
• Treemap, in hierarchy
import matplotlib.pyplot as plt
# squarify is a separate package that must be installed
import squarify
sizes = [50, 25, 12, 6]
label = ["50", "25", "12", "6"]
color = ['red', 'blue', 'green', 'grey']
squarify.plot(sizes=sizes, label=label,
              color=color, alpha=0.6)
plt.axis('off')
plt.show()
Advanced data visualization tools
• Sophisticated tools used for specialized
purposes.
• Advanced visualization is a sophisticated technique, typically beyond
that of traditional Business Intelligence, that
uses “the autonomous or semi-autonomous
examination of data or content to discover
deeper insights, make predictions, or generate
recommendations.”
• Wordclouds:
– Data visualizations (like charts, graphs, infographics, and
more) give businesses a valuable way to communicate
important information at a glance, but what if raw data is
text-based?
– in order to get a stunning visualization format to highlight
important textual data points, using a word cloud can
make dull data sizzle and immediately convey crucial
information.
– Word clouds or tag clouds are graphical representations of
word frequency that give greater prominence to words
that appear more frequently in a source text.
– Applied to large volumes of textual data, such as social
media posts, user feedback, and comments.
– Useful for understanding textual data and mining
patterns from it.
– A tag cloud is a visual representation of text data, which is
often used to depict keyword metadata on websites, or to
visualize free form text.
– Tags are usually single words, and the importance of each
tag is shown with font size or color
– Helps to get immediate insights into the most
important terms in data.
– Word clouds (also known as text clouds or tag
clouds) work in a simple way:
• the more a specific word appears in a source of textual
data (such as a speech, blog post, or database), the
bigger and bolder it appears in the word cloud.
– A word cloud is a collection, or cluster, of words
depicted in different sizes.
– The bigger and bolder the word appears, the more
often it’s mentioned within a given text and the
more important it is taken to be.
– Also known as tag clouds or text clouds, these are
ideal ways to pull out the most pertinent parts of
textual data, from blog posts to databases.
– They can also help business users compare and
contrast two different pieces of text to find the
wording similarities between the two.
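The frequency-to-size idea described above can be sketched with the Python standard library alone. This is a minimal illustration and not how any particular word cloud tool works internally: the stop-word list, the function names `word_frequencies` and `font_sizes`, the size range, and the sample feedback text are all assumptions made for the example.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list (an assumption; real tools ship larger lists)
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "to", "of", "but", "too", "it", "in"}


def word_frequencies(text):
    """Lower-case the text, split it into words, and count the non-stop-words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def font_sizes(freqs, min_size=10, max_size=48):
    """Scale each count linearly between min_size and max_size points."""
    low, high = min(freqs.values()), max(freqs.values())
    span = (high - low) or 1  # avoid division by zero when all counts are equal
    return {w: min_size + (c - low) / span * (max_size - min_size)
            for w, c in freqs.items()}


# Made-up customer feedback, for illustration only
feedback = (
    "The price is too high. Delivery was slow and the price keeps rising. "
    "Great service, but the price and the delivery need work."
)

freqs = word_frequencies(feedback)
sizes = font_sizes(freqs)
for word, count in freqs.most_common(3):
    print(f"{word}: count={count}, font size={sizes[word]:.0f}")
```

The most frequent word receives the largest font size and words that occur once receive the smallest. A word cloud generator then only has to place the words on a canvas at those sizes.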
• Much of the research an organization conducts will
include at least some form of an open-ended inquiry that
prompts respondents to give a textual answer.
– For instance, you might ask current customers what they like or
don’t like about your new product line. Or, you could ask them
to give suggestions on how your organization could improve.
• There are industry tools that code such open-ended data so
that users can analyze it quantitatively, but they are
costly.
• Word clouds offer a cost-effective, yet powerful,
alternative.
• Here we do not create graphs or charts as with numerical data;
instead, we use a word cloud generator to transform
the most critical information into a word cloud.
• An example from USA Today using U.S.
President Barack Obama’s State of the Union
Speech 2012:
[Figure: word cloud of the 2012 State of the Union speech]
• As you can see, words like “American,” “jobs,”
“energy” and “every” stand out since they
were used more frequently in the original text.
• A few areas where word clouds excel for businesses:
– Finding customer pain points and opportunities to connect.
Analyzing customer feedback can allow us to see what our
customers like most about our business and what they like least.
Pain points (such as “wait time,” “price,” or “convenience”) are
very easy to identify with text clouds.
– Understanding how your employees feel about your
company. Text cloud visualization can turn employee feedback
from a pile of information you’ll read through later into
immediately valuable feedback that positively drives
company culture.
– Identifying new SEO terms to target. In addition to normal
keyword research techniques, using a word cloud may make you
aware of potential keywords to target that your site content
already uses.
• Visualization of Geospatial data
– Geospatial data includes location information and
features or attributes of places
– Geospatial data, or spatial data (as it's sometimes
known), is information that has a geographic aspect to it.
– In other words, the records in this type of information
set have coordinates, an address, city, postal code, or zip
code included with them.
– The most obvious example is a road map. We see the
rendered result, but the features on the map are stored
with this type of information included in them.
– Visualizing geospatial data helps us communicate how
different variables correlate to geographical locations by
layering these variables over maps.
• Tools used for Geospatial data:
• Choropleth map: Choropleth maps represent
data using different colors or shading patterns
for different regions.
– Each color or shading pattern corresponds to a
different value or range of values that a variable
can take.
– For example, in a choropleth visualization of the
countries of the world, there are 8 colors, each
representing a different range of percentages of
women participating in the labour force.
– Contains partitioned geographical regions or areas.
– Displays geographical areas that are coloured,
shaded, or patterned in relation to a data variable.
– Helps to visualize values over a geographical area.
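The step of mapping each region's value to one of a fixed number of colour ranges can be sketched as follows. This is only one possible scheme (equal-width intervals), using 4 classes to keep the output short. The function name `classify`, the region names, the percentages, and the palette labels are all made up for illustration.

```python
def classify(values, n_classes):
    """Assign each region to one of n_classes equal-width value ranges.
    Returns {region: class_index}, where 0 is the lowest range."""
    low, high = min(values.values()), max(values.values())
    width = (high - low) / n_classes or 1  # fall back to 1 if all values are equal
    classes = {}
    for region, value in values.items():
        index = int((value - low) / width)
        classes[region] = min(index, n_classes - 1)  # the maximum falls in the top class
    return classes


# Made-up percentages per region, for illustration only
participation = {"A": 20.0, "B": 35.0, "C": 50.0, "D": 65.0, "E": 80.0}

# A light-to-dark ramp of hypothetical colour labels, one per class
palette = ["lightest", "light", "dark", "darkest"]

classes = classify(participation, n_classes=len(palette))
for region, index in classes.items():
    print(region, participation[region], palette[index])
```

A mapping library would then fill each region's polygon with the colour chosen for its class. Other classification schemes, such as quantiles, assign the ranges differently and can change how the map reads.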
• Bubble map:
– With this data map, circles are displayed over a
designated geographical region with the area of
the circle proportional to its value in the dataset.
– Bubble Maps are good for comparing proportions
over geographic regions without the issues caused
by regional area size, as seen on Choropleth Maps.
– However, a major flaw with Bubble Maps is that
overly large bubbles can overlap other bubbles
and regions on the map, so this needs to be
accounted for.
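Because it is the area of the circle, not its radius, that is proportional to the value, the radius has to grow with the square root of the value. A minimal sketch of that calculation follows. The function name `bubble_radius`, the `max_radius` parameter, and the city names, coordinates, and values are assumptions for illustration.

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Radius chosen so that the circle's AREA is proportional to value."""
    return max_radius * math.sqrt(value / max_value)


# Made-up values with (latitude, longitude), for illustration only
cities = {
    "City A": {"lat": 19.08, "lon": 72.88, "value": 400},
    "City B": {"lat": 28.61, "lon": 77.21, "value": 100},
    "City C": {"lat": 13.08, "lon": 80.27, "value": 25},
}

largest = max(c["value"] for c in cities.values())
radii = {name: bubble_radius(c["value"], largest) for name, c in cities.items()}

# Draw large bubbles first so that smaller ones stay visible on top
draw_order = sorted(radii, key=radii.get, reverse=True)

for name in draw_order:
    r = radii[name]
    print(f"{name}: radius={r:.1f}, area={math.pi * r * r:.0f}")
```

Drawing the bubbles from largest to smallest, usually with some transparency, is one common way to reduce the overlap problem mentioned above.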
• Connection maps:
– A connection map shows the connections
between several positions on a map.
– Connection Maps are used in order to show
network connections laid over geographical data.
– This visualization technique is most commonly
used to show import and export flows, travel
journeys, and typical flight and subway
station connections.
– A connection map is used to display a network
combined with geographical data.
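The data behind a connection map is a set of positions plus a list of links between them. The sketch below builds the line segments that a mapping library could draw and, as an extra, computes each link's great-circle length with the haversine formula. The routes are hypothetical, the coordinates are approximate city centres, and the names `haversine_km`, `positions`, `connections`, and `segments` are assumptions for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Positions (nodes): approximate city-centre coordinates as (lat, lon)
positions = {
    "London": (51.5074, -0.1278),
    "Paris": (48.8566, 2.3522),
    "Berlin": (52.5200, 13.4050),
}

# Connections (edges): hypothetical routes between the positions
connections = [("London", "Paris"), ("Paris", "Berlin")]

# Each connection becomes one line segment laid over the map
segments = []
for start, end in connections:
    (lat1, lon1), (lat2, lon2) = positions[start], positions[end]
    segments.append({
        "from": start,
        "to": end,
        "line": [(lon1, lat1), (lon2, lat2)],  # (x, y) order for plotting
        "km": haversine_km(lat1, lon1, lat2, lon2),
    })

for s in segments:
    print(f"{s['from']} -> {s['to']}: {s['km']:.0f} km")
```

Straight segments are adequate for short links. For long-haul routes, maps often draw curved great-circle arcs instead.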
• Accept name and serial number
<?php
• for (…. $i < 5 ..) { ?>
• <html>
• form ….. input name="marks[]" …… </html>
• <?php } ?>
• $marks = $_POST['marks'];
