UNIT4
(114AT01)
Module 4 :
Data Visualization
Contents:
● Data Visualization
○ Basic Principles
○ Ideas and Tools for Data Visualization
● Examples of inspiring (industry) projects
● Exercise : Create your own visualization for a complex data set
Data Visualization :
● Data visualization is the process of translating raw data into graphs and images that
explain the numbers and allow us to gain insight into them.
● It changes the way we use information: it helps us build meaning out of it, find
new patterns, and identify trends.
Data Visualization Basic Principles :
● Data visualization is a very useful tool in today’s data-driven business world
● It is necessary to apply effective data visualization principles to improve the way data
is displayed to our target audience.
Data Visualization Basic Principles : (...)
● Some of the key aspects of effective data visualization include :
○ Determine the best visual
■ To begin with, it is imperative to understand the volume of data at hand.
■ Then identify the aspects that we wish to visualize along with the information that we
wish to convey.
■ After this, select the best-suited and simplest visual format for your target
audience.
○ Use patterns : Establish a pattern by using consistent chart types, colors, etc.
○ Compare aspects :
■ Place aspects side by side to make the data easier to understand.
■ Align data either horizontally or vertically so that values can be compared accurately.
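A minimal sketch of the "use patterns" and "compare aspects" principles, assuming Matplotlib and hypothetical quarterly sales for two regions: both panels use the same chart type and color, and `sharey=True` aligns them on one y-axis scale so the bars can be compared directly.

```python
import matplotlib.pyplot as plt

# Hypothetical quarterly sales for two regions (illustrative data only)
quarters = ["Q1", "Q2", "Q3", "Q4"]
north = [120, 135, 150, 160]
south = [100, 140, 130, 170]

# 1 row x 2 columns; sharey=True gives both panels the same y-axis scale,
# so bar heights can be compared accurately side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, sharey=True, figsize=(8, 3))

# Same chart type and same color in both panels establishes a visual pattern
ax_left.bar(quarters, north, color="tab:blue")
ax_left.set_title("North")
ax_right.bar(quarters, south, color="tab:blue")
ax_right.set_title("South")
ax_left.set_ylabel("Sales (units)")
fig.tight_layout()
# In a script, call plt.show() to display the figure
```

Stacking the panels vertically instead (`plt.subplots(2, 1, sharex=True)`) aligns them on a shared x-axis.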
Ideas and Tools for Visualization :
● Importance of Data Visualization
○ The visual representation serves as a compass, guiding decision-makers through the
complexity of the data and empowering them to extract meaningful insights.
Ideas and Tools for Visualization : (...)
● Tools : ( Details of Visualization Tools )
Python Libraries for Data Visualization :
● Two of the most popular libraries are Seaborn and Matplotlib.
○ Matplotlib
■ Powerful library that provides a high degree of control over every aspect of a plot.
■ Great for creating basic plots such as line plots, scatter plots, and bar charts.
■ Matplotlib is also highly customizable and can be used to create complex
visualizations.
○ Seaborn
■ It is built on top of Matplotlib and provides a higher-level interface for creating
statistical graphics.
Python Libraries for Data Visualization : (...)
● Which is best?
○ Seaborn is designed to work with pandas DataFrames and provides easy-to-use
functions for creating more complex plots such as heatmaps, violin plots, and box plots.
○ If we need complete control over every aspect of our plot and want to create complex
visualizations, then Matplotlib may be the better choice.
○ However, if we are working with DataFrames and want to quickly create statistical
graphics with minimal effort, then Seaborn would be the better choice.
● In summary, both Seaborn and Matplotlib have their strengths and weaknesses.
● Choosing between them ultimately depends on our specific needs and preferences.
Commonly used functions in Matplotlib :
● Most Matplotlib utilities live in the pyplot submodule, which is usually
imported under the plt alias: import matplotlib.pyplot as plt
1. plot(): Draws points (markers), lines from point to point, or both.
■ In its basic form it takes two parameters: an array of values for the X-axis and an
array of values for the Y-axis.
2. xlabel() and ylabel(): For specifying labels along X-axis and Y-axis
3. scatter(): To draw a scatter plot.
■ The scatter() function plots one dot for each observation.
■ It needs two arrays of the same length, one for the values of the x-axis, and one for
values on the y-axis:
4. bar(): To draw bar graph
5. pie(): To draw pie charts
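The five functions above can be tried together in one figure. This is a minimal sketch with made-up values; the 2 x 2 grid (via `subplot()`) and the pie labels are illustrative choices, not part of the original slides.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data
x = np.array([1, 2, 3, 4])
y = np.array([3, 8, 1, 10])

fig = plt.figure(figsize=(8, 6))

ax1 = plt.subplot(2, 2, 1)
plt.plot(x, y)              # line drawn from point to point
plt.xlabel("x-axis")        # label along the X-axis
plt.ylabel("y-axis")        # label along the Y-axis

ax2 = plt.subplot(2, 2, 2)
plt.scatter(x, y)           # one dot per observation; x and y have the same length

ax3 = plt.subplot(2, 2, 3)
plt.bar(x, y)               # one bar per x value

ax4 = plt.subplot(2, 2, 4)
plt.pie(y, labels=["A", "B", "C", "D"])  # each value's share of the total

fig.tight_layout()
# In a script, call plt.show() to display the figure
```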
Line Plot
Starting with a Graph:
● Defining the plot:
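A minimal sketch of defining a plot, assuming the object-oriented style (`plt.subplots()` returning a Figure and an Axes) and a hypothetical list of values. When `plot()` receives a single sequence, it uses it as the y-values and fills in x as 0, 1, 2, ...

```python
import matplotlib.pyplot as plt

values = [1, 5, 3, 9, 7]      # hypothetical data

fig, ax = plt.subplots()      # a Figure containing a single Axes
ax.plot(values)               # with one argument, x defaults to 0 .. len(values) - 1
# In a script, call plt.show() to display the figure
```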
Starting with a Graph: (…)
● Drawing multiple lines and plots:
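A minimal sketch of drawing several lines on the same Axes, using illustrative data: each call to `plot()` adds another line to the current plot.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 6)

fig, ax = plt.subplots()
ax.plot(x, x)          # first line: y = x
ax.plot(x, x ** 2)     # second line on the same Axes: y = x squared
# Equivalent single call with several x/y pairs:
# ax.plot(x, x, x, x ** 2)
```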
Starting with a Graph: (…)
● Saving your work to disk:
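A minimal sketch of saving a figure with `savefig()`. The file name `my_plot.png` and the temporary directory are hypothetical choices for illustration; the output format is inferred from the file extension.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")          # optional: non-GUI backend that renders straight to files
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 1])

# Hypothetical output location
out_path = os.path.join(tempfile.gettempdir(), "my_plot.png")
fig.savefig(out_path, dpi=100, bbox_inches="tight")  # PNG, trimmed to the drawn content
```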
Setting the Axis, Ticks, Grids :
● Axes, ticks, and labels help viewers interpret the data shown visually and compare
the displayed values
● Adding grids :
○ ax.grid() : Gridlines make it easier to read the precise value of each element of a graph
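A minimal sketch of setting the axis range, tick positions, and gridlines on an Axes, using illustrative data; the dotted grid style is an arbitrary choice.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])

ax.set_xlim(0, 6)                  # range shown on the x-axis
ax.set_ylim(0, 30)                 # range shown on the y-axis
ax.set_xticks([0, 2, 4, 6])        # where the x tick marks appear
ax.set_yticks([0, 10, 20, 30])     # where the y tick marks appear
ax.grid(True, linestyle=":")       # dotted gridlines at the tick positions
```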
Setting the Axis, Ticks, Grids : (…)
%matplotlib inline :
Displays plots inline in the notebook, directly below the cell that creates them
Setting the Axis, Ticks, Grids : (…)
%matplotlib auto :
Turns off inline plotting; plots are shown using the default backend (usually a separate window)
Setting the Axis, Ticks, Grids : (…)
%matplotlib notebook :
Interactive plotting of the graph with pan and
zoom controls; you can move between views and
save/download the figure
Defining the Line Appearance:
● Working with line styles :
○ Line styles differentiate graphs by drawing the lines in various ways
○ The line style can be given in the format string, the optional third argument of plot() (e.g. plt.plot(x, y, '--')), or with the linestyle (ls) keyword argument
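A minimal sketch of the ways to set a line style, using illustrative data: the format string passed as the third positional argument, the short `ls` keyword, and the long `linestyle` keyword.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)

fig, ax = plt.subplots()
ax.plot(x, x, "--")                 # dashed, via the format string (3rd positional argument)
ax.plot(x, x + 2, ls=":")           # dotted, via the ls keyword
ax.plot(x, x + 4, linestyle="-.")   # dash-dot, via the long keyword name
ax.plot(x, x + 6)                   # no style given: solid line "-" by default
```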
Defining the Line Appearance: (...)
● Working with line styles : (...)
Defining the Line Appearance: (...)
● Colors : (table of color characters: Character | Color)
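A minimal sketch of Matplotlib's single-character color codes, which can be combined with a line style in a format string such as `"r--"`. The dictionary maps each code to its conventional name; `to_rgb()` shows the RGB triple each code resolves to. Note that a white (`"w"`) line is invisible on the default white background.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb

# Single-character color codes accepted by Matplotlib
color_codes = {"b": "blue", "g": "green", "r": "red", "c": "cyan",
               "m": "magenta", "y": "yellow", "k": "black", "w": "white"}

# RGB triple (values in 0..1) that each code resolves to
rgb = {code: to_rgb(code) for code in color_codes}

fig, ax = plt.subplots()
for offset, code in enumerate(color_codes):
    # format string = color character + line style character, e.g. "r-"
    ax.plot([0, 1], [offset, offset], code + "-",
            label=f"'{code}' = {color_codes[code]}")
ax.legend(fontsize="small")
```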
Defining the Line Appearance: (...)
● Adding markers :
○ Markers add a special symbol to each data point in a line graph
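A minimal sketch of adding markers, using illustrative data: the `marker` keyword, a format string that combines a marker and a line style, and `ms` (markersize) to enlarge the symbols.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

fig, ax = plt.subplots()
ax.plot(x, [1, 3, 2, 5, 4], marker="o")          # circle at every data point
ax.plot(x, [2, 4, 3, 6, 5], "s--")               # square markers with a dashed line
ax.plot(x, [3, 5, 4, 7, 6], marker="^", ms=10)   # larger triangle markers
```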
Defining the Line Appearance: (...)
● Using labels, annotations and legends :
○ Labels :
■ Identifies a particular data element or grouping
■ Purpose : To make it easy for the viewer to know the name or kind of data illustrated
■ Helps people understand the significance of each axis of any graph.
Defining the Line Appearance: (...)
● Using labels, annotations and legends : (...)
○ Annotations :
■ Augments the information the viewer can immediately see about the data with notes,
sources, or other useful information.
Defining the Line Appearance: (...)
● Using labels, annotations and legends : (...)
○ Legend :
■ Presents a listing of the data groups within the graph and often provides cues to make
identification of the data group easier.
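A minimal sketch combining labels, an annotation, and a legend, assuming hypothetical monthly sales and cost figures. The `label=` text of each line is what the legend lists, and `annotate()` places a note with an arrow pointing at a data point.

```python
import matplotlib.pyplot as plt

# Hypothetical data
months = [1, 2, 3, 4, 5, 6]
sales = [10, 12, 9, 15, 18, 17]
costs = [8, 9, 9, 10, 11, 12]

fig, ax = plt.subplots()
ax.plot(months, sales, label="Sales")      # label= is used by the legend
ax.plot(months, costs, label="Costs")

ax.set_xlabel("Month")                     # labels: what each axis means
ax.set_ylabel("Amount (thousands)")

# Annotation: a note with an arrow pointing at the peak (5, 18)
ax.annotate("Peak sales", xy=(5, 18), xytext=(2.5, 17),
            arrowprops=dict(arrowstyle="->"))

ax.legend(loc="upper left")                # listing of the data groups
```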
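Draw 2 plots side by side:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.show()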
The subplot() Function
The subplot() function takes three arguments that describe the layout of the figure.
The layout is organized in rows and columns, which are represented by the first and
second arguments. The third argument is the index of the current plot.
plt.subplot(1, 2, 1)
#the figure has 1 row, 2 columns, and this plot is the first plot.
plt.subplot(1, 2, 2)
#the figure has 1 row, 2 columns, and this plot is the second plot.
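A minimal sketch of how the third argument (the index) is counted in a larger grid: in a 2 x 2 layout, index 1 is top-left, 2 is top-right, 3 is bottom-left, and 4 is bottom-right, i.e. left to right, then top to bottom.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 4))
for index in range(1, 5):
    ax = plt.subplot(2, 2, index)   # 2 rows, 2 columns, plot number `index`
    ax.set_title(f"plot {index}")
fig.tight_layout()
```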
Matplotlib Subplot
Draw 2 plots on top of each other:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 1, 1)
plt.plot(x,y)
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 1, 2)
plt.plot(x,y)
plt.show()
Title in Subplot()
You can add a title to each plot with the title() function:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")
plt.show()
Super Title in Subplot()
You can add a title to the entire figure with the suptitle() function:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")
plt.suptitle("MY SHOP")
plt.show()
Choosing the Right Graph:
● The kind of graph we choose determines how people view the associated data
● e.g.
○ Pie chart : To show how various data elements contribute towards a whole, e.g. in %
● Idea : To choose a graph that naturally leads people to draw the conclusion that
we need them to draw about the data
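A minimal sketch of a pie chart showing each part's contribution to the whole in percent, assuming hypothetical market-share data. The `autopct` format string prints each wedge's percentage.

```python
import matplotlib.pyplot as plt

# Hypothetical market-share data (parts of a whole)
shares = [45, 25, 20, 10]
labels = ["Product A", "Product B", "Product C", "Other"]

fig, ax = plt.subplots()
# autopct="%1.1f%%" writes each wedge's share of the total, e.g. "45.0%"
wedges, texts, autotexts = ax.pie(shares, labels=labels, autopct="%1.1f%%")
ax.set_aspect("equal")   # keep the pie circular
```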
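import matplotlib.pyplot as plt
x, y = ["A", "B", "C", "D"], [3, 8, 1, 10]   # sample data (illustrative)
plt.bar(x,y)
plt.show()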
Choosing the Right Graph: (...)
Horizontal Bars
If you want the bars to be displayed horizontally instead of vertically, use the barh() function:
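import matplotlib.pyplot as plt
x, y = ["A", "B", "C", "D"], [3, 8, 1, 10]   # sample data (illustrative)
plt.barh(x, y)
plt.show()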
Choosing the Right Graph: (...)
Bar Width
The bar() function takes the keyword argument width to set the width of the bars (the default is 0.8):
plt.bar(x, y, width = 0.1)
plt.show()
Choosing the Right Graph: (...)
● Depicting groups using boxplots :
○ A boxplot depicts groups of numbers through their
quartiles (three points that divide a group into four equal parts)
○ A boxplot is a very useful and standardized way of displaying the
distribution of data based on a five-number summary (minimum, first
quartile, second quartile (median), third quartile and maximum).
○ Boxplots may also have lines extending from the box, called whiskers, indicating
variability outside the upper and lower quartiles
○ It helps in understanding these parameters of the distribution of data and is
extremely helpful in detecting outliers.
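A minimal sketch of boxplots for three hypothetical groups, one of which contains a single extreme value. By default Matplotlib's whiskers reach the most extreme points within 1.5 times the interquartile range of the box; points beyond that are drawn as separate "fliers", which is how outliers become visible. The quartiles of a group can be computed with `np.percentile`.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: three groups; group C gets one extreme value (90)
rng = np.random.default_rng(0)
groups = [rng.normal(50, 5, 100),
          rng.normal(60, 10, 100),
          np.append(rng.normal(55, 3, 99), 90)]

# Quartiles of group A (first quartile, median, third quartile)
q1, median, q3 = np.percentile(groups[0], [25, 50, 75])

fig, ax = plt.subplots()
result = ax.boxplot(groups)          # one box per group, at x = 1, 2, 3
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["A", "B", "C"])
# result["fliers"][i] holds the points drawn beyond the whiskers of group i
```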
Choosing the Right Graph: (...)
● Depicting groups using boxplots : (...)
Matplotlib Scatter
● matplotlib.pyplot.scatter() is used to
create scatter plots, which are essential
for visualizing relationships between
numerical variables. Scatter plots help
illustrate how changes in one variable
can influence another, making them
invaluable for data analysis.
● A basic scatter plot can be created
using matplotlib.pyplot.scatter() by
plotting two sets of data points on the
x and y axes:
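A minimal sketch of a basic scatter plot, assuming hypothetical "hours studied vs. exam score" data; each (x, y) pair becomes one dot.

```python
import matplotlib.pyplot as plt

# Hypothetical data
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 64, 70, 72, 79, 85]

fig, ax = plt.subplots()
points = ax.scatter(hours, score)   # both lists must have the same length
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
```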
Matplotlib Scatter (Multiple Datasets)
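A minimal sketch of plotting multiple datasets on the same Axes, assuming two hypothetical randomly generated groups; calling `scatter()` once per dataset with a distinct color and marker keeps the groups distinguishable.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Two hypothetical datasets centered at (2, 2) and (4, 4)
x1, y1 = rng.normal(2, 0.5, 30), rng.normal(2, 0.5, 30)
x2, y2 = rng.normal(4, 0.5, 30), rng.normal(4, 0.5, 30)

fig, ax = plt.subplots()
ax.scatter(x1, y1, color="tab:blue", marker="o", label="Group 1")
ax.scatter(x2, y2, color="tab:orange", marker="^", label="Group 2")
ax.legend()
```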
Matplotlib Scatter (Apply Legend)
import matplotlib.pyplot as plt
# sample coordinates
x = [1,2,3,4,5]
y1 = [2,4,6,8,10]
y2 = [3,6,9,12,15]
# draw both datasets on the same axes
plt.scatter(x, y1)
plt.scatter(x,y2)
# add a legend; labels follow the order of the scatter() calls
plt.legend(["x*2" , "x*3"])
plt.show()
Matplotlib Scatter (With lines)
import numpy as np
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [8, 7, 6, 4, 5, 6, 7, 8, 9, 10]
plt.scatter(x, y)   # draw the points
plt.plot(x, y)      # connect the points with a line
plt.xticks(np.arange(11))
plt.yticks(np.arange(11))
plt.xlabel('x-axis', fontsize=18)
plt.ylabel('y-axis', fontsize=18)
plt.show()
Matplotlib Scatter (Changing the Size of Scatter Points)
import matplotlib.pyplot as plt
import numpy as np
x = [1,2,3,4,5,6,7,8,9,10,11,12]
y = [1,2,3,4,5,6,7,8,9,10,11,12]
# marker area of each point (in points squared), growing from 100 to 1200
points_size = [100,200,300,400,500,600,700,800,900,1000,1100,1200]
plt.xticks(np.arange(13))
plt.yticks(np.arange(13))
plt.scatter(x,y,s=points_size,c='g')
plt.title("Scatter Plot with increase in size of scatter points", fontsize=22)
plt.xlabel('x-axis',fontsize=20)
plt.ylabel('y-axis',fontsize=20)
plt.show()
Choosing the Right Graph: (...)
● Seeing data patterns using scatterplots :