UNIT4

This document provides an introduction to data visualization, outlining its principles, tools, and examples of industry projects. It emphasizes the importance of understanding the audience and selecting appropriate visual formats to convey data effectively. Additionally, it discusses popular Python libraries for visualization, such as Matplotlib and Seaborn, and includes practical examples of creating various types of graphs.

Uploaded by

janviiiiiii0046
Copyright
© All Rights Reserved

Introduction to Data Science

(114AT01)

Module 4 :
Data Visualization

Contents:
● Data Visualization
○ Basic Principles
○ Ideas and Tools for Data Visualization
● Examples of inspiring (industry) projects
● Exercise : Create your own visualization for complex data set

Data Visualization :
● Data visualization is the process of translating raw data into graphs and images that explain the numbers and let us gain insight into them.
● It changes how we use what we know: we can build meaning from the data, find new patterns, and identify trends.

Data Visualization Basic Principles :
● Data visualization is a very useful tool in today's data-driven business world.
● Applying effective data visualization principles improves the way data is displayed to our target audience.

● Before planning a visualization, know your audience:

○ Graphs can be used
■ for our own exploratory data analysis,
■ to convey a message to experts, or
■ to help tell a story to a general audience.
○ Make sure that the intended audience understands each element of the plot.

Data Visualization Basic Principles : (...)
● Some of the key aspects of effective data visualization include :
○ Determine the best visual
■ To begin with, it is imperative to understand the volume of data in hand.
■ Then identify the aspects that we wish to visualize along with the information that we
wish to convey.
■ After this, select the best-suited and simplest visual format for your target
audience.

○ Balance the design
■ This means distributing visual elements such as texture, color, shape, and negative space evenly across the plot.
■ Decide whether the layout should be symmetrical, asymmetrical, or radial, and find the balance of elements that works best for visualizing your data.

Data Visualization Basic Principles : (...)
● Some of the key aspects of effective data visualization include : (...)
○ Focus on the key areas : Make sure the key areas are clearly highlighted.
○ Keep it simple :
■ Visuals should be simple and easy to understand.
■ Adding unnecessary information makes a visual confusing, which defeats the purpose of data visualization.
○ Incorporate interactivity : Use data visualization tools that add interactivity to your graphs or charts, without creating confusion.
○ Use patterns : Establish a pattern by using similar chart types, colors, etc.
○ Compare aspects :
■ Place aspects side by side to make the data easier to understand.
■ Align data horizontally or vertically so that it can be compared accurately.

Ideas and Tools for Visualization :
● Importance of Data Visualization
○ The visual representation serves as a compass that guides decision-makers through the complexity of the data and empowers them to extract meaningful insights.
○ It can reveal hidden patterns and anomalies within datasets.
○ Data visualization is not merely a tool for rendering data aesthetically; it is a dynamic enabler of strategic thinking.
○ By presenting data in a visually digestible format, it empowers business users to focus their attention where it matters most.

Ideas and Tools for Visualization : (...)
● Tools : ( Details of Visualization Tools )

Python Libraries for Data Visualization :
● Two of the most popular libraries are Seaborn and Matplotlib.
○ Matplotlib
■ Powerful library that provides a high degree of control over every aspect of a plot.

■ Great for creating basic plots such as line plots, scatter plots, and bar charts.
■ Matplotlib is also highly customizable and can be used to create complex
visualizations.

○ Seaborn
■ Is built on top of Matplotlib and provides a higher-level interface for creating
statistical graphics.

■ Requires less code to create complex visualizations compared to Matplotlib.

Python Libraries for Data Visualization : (...)
● Which is best?
○ Seaborn is designed to work with Pandas dataframes and provides easy-to-use
functions for creating more complex plots such as heatmaps, violin plots, and box plots.
○ If we need complete control over every aspect of a plot and want to create complex visualizations, then Matplotlib may be the better choice.
○ However, if we are working with dataframes and want to quickly create statistical graphics with minimal effort, then Seaborn would be the better choice.

● In summary, both Seaborn and Matplotlib have their strengths and weaknesses.
● Choosing between them ultimately depends on our specific needs and preferences.

Most used functions in Matplotlib :
● Most of the Matplotlib utilities lie under the pyplot submodule, which is usually imported under the plt alias: import matplotlib.pyplot as plt
1. plot(): Draws a line from point to point by default; it can also draw markers at the points.
■ In its basic form it takes two arguments: the first is an array of values for the X-axis and the second is an array of values for the Y-axis.
2. xlabel() and ylabel(): Set the labels of the X-axis and Y-axis
3. scatter(): Draws a scatter plot.
■ The scatter() function plots one dot for each observation.
■ It needs two arrays of the same length, one for the values on the x-axis and one for the values on the y-axis.
4. bar(): Draws a bar graph
5. pie(): Draws a pie chart
Line Plot

Starting with a Graph:
● Defining the plot:
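
A minimal sketch of defining a plot. The sample values are made up for illustration, and the Agg backend is an assumption so that the code also runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

values = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]  # made-up sample data

fig = plt.figure()
# plot() receives the x values first and the y values second
lines = plt.plot(range(1, 11), values)
```

In an interactive session, calling plt.show() afterwards displays the figure.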

Starting with a Graph: (…)
● Drawing multiple lines and plots:
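
A possible sketch of drawing several lines on the same plot: each plot() call adds one line to the current axes. The two data series are made up, and the Agg backend is an assumption for running without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

x = range(1, 11)
values1 = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]   # made-up sample data
values2 = [3, 8, 9, 2, 1, 2, 4, 7, 6, 6]    # made-up sample data

fig = plt.figure()
plt.plot(x, values1)  # first line
plt.plot(x, values2)  # second line, drawn on the same axes
```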

Starting with a Graph: (…)
● Saving your work to disk:
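
Saving can be sketched with savefig(), which writes the current figure to a file. The file name and the temporary directory below are hypothetical choices for illustration:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig = plt.figure()
plt.plot(range(1, 11), [1, 5, 8, 9, 2, 0, 3, 10, 4, 7])  # made-up sample data

# Hypothetical output location; savefig() infers the PNG format from the extension
out_path = os.path.join(tempfile.gettempdir(), "my_sample_plot.png")
plt.savefig(out_path)
```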

Setting the Axis, Ticks, Grids :
● Axes, ticks, and grids give the viewer reference points for reading and comparing the displayed data

● Getting the axes: ax = plt.axes() gives a handle to the axes

● Formatting the axes:
○ ax.set_xlim() and ax.set_ylim() : Change the axis limits (the range of values shown on each axis)
○ ax.set_xticks() and ax.set_yticks() : Change the ticks used to display the data

● Adding grids :
○ ax.grid() : Gridlines make it easier to read the precise value of each element of a graph
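
The calls above can be combined as in the following sketch. The limits, tick positions, and sample values are illustrative assumptions, as is the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

values = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]  # made-up sample data

fig = plt.figure()
ax = plt.axes()                  # handle to the axes

ax.set_xlim(0, 11)               # range of values shown on the x-axis
ax.set_ylim(-1, 11)              # range of values shown on the y-axis
ax.set_xticks(range(1, 11))      # one tick per data point
ax.set_yticks(range(0, 11, 2))   # a tick at every second value
ax.grid()                        # gridlines at the tick positions

ax.plot(range(1, 11), values)
```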

%matplotlib inline :
Renders plots inline, as static images directly below the notebook cell that produces them

Setting the Axis, Ticks, Grids : (…)

%matplotlib auto :
Switches from inline plotting to an automatically selected GUI backend, so plots open in a separate window

Setting the Axis, Ticks, Grids : (…)

%matplotlib notebook :
Interactive plotting of the graph with pan and zoom controls; lets you move between views and save or download the figure
Defining the Line Appearance:
● Working with line styles :
○ Line styles differentiate graphs by drawing the lines in various ways
○ The line style can be passed as a format string in the third positional argument of plot(), or through the linestyle (ls) keyword argument

Character   Line style
'-'         Solid line
'--'        Dashed line
'-.'        Dash-dot line
':'         Dotted line

Defining the Line Appearance: (...)
● Working with line styles : (...)
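
A short sketch of both ways of setting a line style, using made-up data and assuming the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

x = range(1, 11)
values1 = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]   # made-up sample data
values2 = [3, 8, 9, 2, 1, 2, 4, 7, 6, 6]    # made-up sample data

fig = plt.figure()
# Third positional argument: a format string holding the line style
dashed, = plt.plot(x, values1, '--')
# Keyword form of the same setting
dotted, = plt.plot(x, values2, linestyle=':')
```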

Defining the Line Appearance: (...)
● Using colors : Colors differentiate the lines in a graph

Character   Color
'b'         blue
'g'         green
'r'         red
'c'         cyan
'm'         magenta
'y'         yellow
'k'         black
'w'         white
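
As a sketch, a color character can be given in the format string or through the color keyword. The data is made up, and the Agg backend is an assumption:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

x = range(1, 11)
values1 = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]   # made-up sample data
values2 = [3, 8, 9, 2, 1, 2, 4, 7, 6, 6]    # made-up sample data

fig = plt.figure()
red_line, = plt.plot(x, values1, 'r')            # color character in the format string
magenta_line, = plt.plot(x, values2, color='m')  # color keyword argument
```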

Defining the Line Appearance: (...)
● Adding markers :
○ Markers add a special symbol to each data point in a line graph
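
A possible sketch of adding markers. The marker characters 'o' (circle) and 'v' (triangle down) are standard Matplotlib markers; the data is made up and the Agg backend is an assumption:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

x = range(1, 11)
values1 = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]   # made-up sample data
values2 = [3, 8, 9, 2, 1, 2, 4, 7, 6, 6]    # made-up sample data

fig = plt.figure()
# Format string combining a circle marker with a dashed line
circles, = plt.plot(x, values1, 'o--')
# Marker given through the keyword argument
triangles, = plt.plot(x, values2, marker='v')
```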

Defining the Line Appearance: (...)
● Using labels, annotations and legends :
○ Labels :
■ Provides positive identification of a particular data element or grouping
■ Purpose : To make it easy for the viewer to know the name or kind of data illustrated
■ Helps people understand the significance of each axis of any graph.
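
Axis labels can be sketched with xlabel() and ylabel(). The label texts and data below are illustrative assumptions, as is the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

values = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]  # made-up sample data

fig = plt.figure()
plt.plot(range(1, 11), values)
plt.xlabel('Entries')  # names what the x-axis measures
plt.ylabel('Values')   # names what the y-axis measures
```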

Defining the Line Appearance: (...)
● Using labels, annotations and legends : (...)
○ Annotations :
■ Augments the information the viewer can immediately see about the data with notes,
sources, or other useful information.

■ Purpose : To draw special attention to points of interest on a graph
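
One way to sketch an annotation is with annotate(), which places a note and an optional arrow pointing at a data point. The note text, its position, and the data are made up; the Agg backend is an assumption:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

values = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]  # made-up sample data; the maximum 10 is at x = 8

fig = plt.figure()
plt.plot(range(1, 11), values)

# xy is the point being annotated, xytext is where the note is written
note = plt.annotate('Peak value', xy=(8, 10), xytext=(4.5, 9.5),
                    arrowprops=dict(arrowstyle='->'))
```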

Defining the Line Appearance: (...)
● Using labels, annotations and legends : (...)
○ Legend :
■ Presents a listing of the data groups within the graph and often provides cues to make
identification of the data group easier.

■ Purpose : Document the individual elements of a plot
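
A legend can be sketched by giving each line a label and then calling legend(). The label names, legend position, and data are illustrative assumptions, as is the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

x = range(1, 11)
values1 = [1, 5, 8, 9, 2, 0, 3, 10, 4, 7]   # made-up sample data
values2 = [3, 8, 9, 2, 1, 2, 4, 7, 6, 6]    # made-up sample data

fig = plt.figure()
plt.plot(x, values1, label='First')
plt.plot(x, values2, label='Second')

# legend() collects the labels of the plotted lines
legend = plt.legend(loc='upper left')
```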


Matplotlib Subplot
Display Multiple Plots
With the subplot() function you can draw multiple plots in one figure:

import matplotlib.pyplot as plt
import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)

plt.show()
The subplot() Function
The subplot() function takes three arguments that describe the layout of the figure.

The layout is organized in rows and columns, which are represented by the first and
second argument.

The third argument represents the index of the current plot.

plt.subplot(1, 2, 1)
#the figure has 1 row, 2 columns, and this plot is the first plot.

plt.subplot(1, 2, 2)
#the figure has 1 row, 2 columns, and this plot is the second plot.
Matplotlib Subplot
Draw 2 plots on top of each other:
import matplotlib.pyplot as plt
import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(2, 1, 1)
plt.plot(x,y)

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(2, 1, 2)
plt.plot(x,y)

plt.show()
Title in Subplot()
You can add a title to each plot with the title() function:

import matplotlib.pyplot as plt
import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")

plt.show()
Super Title in Subplot()
You can add a title to the entire figure with the suptitle() function:

import matplotlib.pyplot as plt
import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")

plt.suptitle("MY SHOP")
plt.show()
Choosing the Right Graph:
● The kind of graph we choose determines how people view the associated data

● Choosing the right graph from the outset is important

● e.g.
○ Pie chart : To show how various data elements contribute to a whole

○ Bar chart : To show how data elements compare

● Idea : Choose a graph that naturally leads people to the conclusion that we need them to draw from the data shown


Choosing the Right Graph: (...)
● Showing parts of a whole with pie charts:

○ Focus on showing parts of a whole

○ The entire pie is considered as 100%

○ Each slice (wedge) represents a value in %

○ By default, plotting of the first wedge starts from the X axis and moves in the counterclockwise direction


Choosing the Right Graph: (...)
● Showing parts of a whole with pie charts: (...)

○ Most used parameters of pie chart:

■ colors : To choose a custom color for each wedge
■ labels : To identify each wedge
■ explode : Offset of each wedge from the center, as a fraction of the radius. With a value of 0 the wedge stays in place; with any other value it moves away from the center of the pie
■ autopct : Takes a format string used to label each wedge with its percentage
■ counterclock : Determines the direction of the wedges (default True, i.e. anticlockwise)
■ shadow : Decides whether a shadow is drawn beneath the pie
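
The parameters listed above can be combined as in this sketch. The fruit data matches the examples used elsewhere in these slides; the particular colors, explode offsets, and format string are illustrative assumptions, as is the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

y = [35, 25, 25, 15]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

fig = plt.figure()
plt.pie(
    y,
    labels=mylabels,               # one label per wedge
    colors=['r', 'y', 'm', 'c'],   # one color per wedge
    explode=[0.2, 0, 0, 0],        # pull only the first wedge out of the pie
    autopct='%1.1f%%',             # print each share with one decimal place
    counterclock=False,            # lay the wedges out clockwise
    shadow=True,                   # draw a shadow beneath the pie
)

# All label and percentage texts drawn on the axes
pie_texts = [t.get_text() for t in fig.axes[0].texts]
```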
Choosing the Right Graph: (...)
● Labels
○ Add labels to the pie chart with the labels parameter.
○ The labels parameter must be a sequence (for example a list or an array) with one label for each wedge:
import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries",
"Dates"]

plt.pie(y, labels = mylabels)


plt.show()
Choosing the Right Graph: (...)
Start Angle
● As mentioned, the default start angle is at the x-axis, but you can change the start angle by specifying a startangle parameter.
● The startangle parameter is given as an angle in degrees; the default angle is 0:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries",
"Dates"]

plt.pie(y, labels = mylabels, startangle = 90)


plt.show()
Choosing the Right Graph: (...)
Shadow
● Add a shadow to the pie chart by setting the shadow parameter to True:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
myexplode = [0.2, 0, 0, 0]

plt.pie(y, labels = mylabels, explode = myexplode, shadow = True)
plt.show()
Choosing the Right Graph: (...)
Legend
To add a list of explanations, one for each wedge, use the legend() function:

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]

plt.pie(y, labels = mylabels)


plt.legend(loc="upper right")
plt.show()

Choosing the Right Graph: (...)
● Creating comparisons with bar charts:
○ Bar charts present categorical data as rectangular bars with heights or lengths proportional to the values that they represent.

○ The bars can be plotted vertically or horizontally.

○ A vertical bar chart is sometimes called a column chart.

○ They are mainly used with discrete values


Choosing the Right Graph: (...)
Creating Bars
With Pyplot, you can use the bar() function to draw bar graphs:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.bar(x,y)
plt.show()
Choosing the Right Graph: (...)
Horizontal Bars
If you want the bars to be displayed horizontally instead of vertically, use the barh() function:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.barh(x, y)
plt.show()
Choosing the Right Graph: (...)
Bar Width
The bar() function takes the keyword argument width to set the width of the bars:

import matplotlib.pyplot as plt


import numpy as np

x = np.array(["A", "B", "C", "D"])


y = np.array([3, 8, 1, 10])

plt.bar(x, y, width = 0.1)


plt.show()
Choosing the Right Graph: (...)
● Showing distributions using histograms :

○ A histogram is a graph showing a frequency distribution

○ It categorizes data by breaking it into bins, where each bin covers a subset of the data range.

○ It then displays the number of items in each bin, helping us to view:

■ Distribution of data

■ Progression of data from one bin to another

○ Histograms are used with continuous data


Choosing the Right Graph: (...)
● Showing distributions using histograms : (...)

○ Most used parameters of histogram:


■ bins: an integer gives the number of equal-width bins in the range; a sequence gives the bin edges explicitly
■ range: the lower and upper range of the bins. It has no effect if bins is a sequence; if it is not given, the min and max values of the data are used
■ histtype: {'bar', 'barstacked', 'step', 'stepfilled'}, [default value : 'bar']
■ align: {'left', 'mid', 'right'}, [default value : 'mid']
■ color: To assign the color of the bars
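
A small sketch of these parameters in use. The random data, the seed, and the chosen range are illustrative assumptions, as is the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
data = rng.normal(size=1000)     # made-up sample data

fig = plt.figure()
# 10 equal-width bins between -2 and 2; values outside the range are ignored
counts, edges, patches = plt.hist(data, bins=10, range=(-2, 2),
                                  histtype='step', align='mid', color='g')
```

hist() returns the count in each bin and the bin edges, so 10 bins give 11 edges.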
Choosing the Right Graph: (...)
import matplotlib.pyplot as plt
import numpy as np

# Generate random data for the histogram


data = np.random.randn(1000)

plt.hist(data, bins=30, color='skyblue', edgecolor='black')

# Adding labels and title


plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Basic Histogram')

plt.show()
Choosing the Right Graph: (...)
● Depicting groups using boxplots :
○ A boxplot is a means of depicting groups of numbers through their quartiles (the 3 points that divide a group into 4 equal parts)
○ It is a very useful and standardized way of displaying the distribution of data based on a five-number summary (minimum, first quartile, second quartile (median), third quartile, and maximum).
○ A boxplot may also have lines called whiskers, which extend from the box to show the spread of the data outside the upper and lower quartiles
○ It helps in understanding these parameters of the distribution of data and is extremely helpful in detecting outliers.
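
The five-number summary and the outlier detection can be sketched as follows. The data is made up, and the sketch relies on Matplotlib's default whisker rule (points farther than 1.5 times the interquartile range from the box are drawn as outliers); the Agg backend is an assumption:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

data = np.array([2, 4, 4, 5, 7, 8, 9, 10, 12, 30])  # made-up sample data

# Five-number summary computed directly from the data
q1, median, q3 = np.percentile(data, [25, 50, 75])
summary = (data.min(), q1, median, q3, data.max())

fig = plt.figure()
result = plt.boxplot(data)

# Points beyond the whiskers are drawn individually as outliers ("fliers")
outliers = result['fliers'][0].get_ydata()
```

With the default rule the whiskers stop at the most extreme points that are not outliers, so the upper whisker here ends below the true maximum of 30.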
Matplotlib Scatter
● matplotlib.pyplot.scatter() is used to create scatter plots, which are essential for visualizing relationships between numerical variables. Scatter plots help illustrate how changes in one variable relate to changes in another, making them invaluable for data analysis.
● A basic scatter plot can be created using matplotlib.pyplot.scatter() by plotting two sets of data points on the x and y axes:
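
A minimal sketch of such a basic scatter plot, using made-up coordinates and assuming the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]    # made-up sample data
y = [2, 4, 6, 8, 10]   # made-up sample data

fig = plt.figure()
# One dot is drawn for each (x, y) pair
points = plt.scatter(x, y)
```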
Matplotlib Scatter (Multiple Datasets)
Matplotlib Scatter (Apply Legend)
import matplotlib.pyplot as plt

# define coordinates
x = [1,2,3,4,5]
y1 = [2,4,6,8,10]
y2 = [3,6,9,12,15]

# draw both datasets
plt.scatter(x, y1)
plt.scatter(x, y2)

# add a legend; the entries follow the order of the scatter() calls
plt.legend(["x*2" , "x*3"])
plt.show()
Matplotlib Scatter (With lines)
import matplotlib.pyplot as plt

# initialize x and y coordinates
x = [0.1, 0.2, 0.3, 0.4, 0.5]
y = [6.2, 8.4, 8.5, 9.2, 6.3]

# set the title of the plot
plt.title("Connected Scatterplot points with lines")

# draw the scatter points
plt.scatter(x, y)

# connect the points with a line
plt.plot(x, y)

plt.show()
Increase the size of scatter points
Parameters:
● x_axis_data - An array containing x-axis data
● y_axis_data - An array containing y-axis data
● s - marker size (can be a scalar or an array with the same size as x or y)
● c - color or sequence of colors for markers
● marker - marker style
● linewidths - width of marker border
● edgecolors - marker border color
● alpha - blending value, between 0 (transparent) and 1 (opaque)
Matplotlib Scatter (Change of Scatter Points)
import matplotlib.pyplot as plt
import numpy as np

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [8, 7, 6, 4, 5, 6, 7, 8, 9, 10]

plt.xticks(np.arange(11))
plt.yticks(np.arange(11))

plt.scatter(x, y, s=500, c='g')

plt.title("Scatter Plot", fontsize=25)

plt.xlabel('x-axis', fontsize=18)
plt.ylabel('y-axis', fontsize=18)

plt.show()
Matplotlib Scatter (Change of Scatter Points)
import matplotlib.pyplot as plt
import numpy as np

x = [1,2,3,4,5,6,7,8,9,10,11,12]
y = [1,2,3,4,5,6,7,8,9,10,11,12]
points_size = [100,200,300,400,500,600,700,800,900,1000,1100,1200]

plt.xticks(np.arange(13))
plt.yticks(np.arange(13))
plt.scatter(x,y,s=points_size,c='g')
plt.title("Scatter Plot with increase in size of scatter points", fontsize=22)

plt.xlabel('x-axis',fontsize=20)
plt.ylabel('y-axis',fontsize=20)
plt.show()
Choosing the Right Graph: (...)
● Seeing data patterns using scatterplots :

○ Scatterplots show clusters of data rather than trends.
