
UNIT-6 (Data Analytics and Visualization With Python)

The document provides an overview of essential Python libraries for data analytics, focusing on Pandas, NumPy, SciPy, and Matplotlib. It explains the functionalities, pros, and cons of each library, highlighting their roles in data manipulation, numerical calculations, and data visualization. Additionally, it details the process of creating histograms using Matplotlib, including customization options.

Uploaded by Raj Shah
Copyright © All Rights Reserved

6.1 ESSENTIAL DATA LIBRARIES FOR DATA ANALYTICS : PANDAS

GQ. Explain Pandas in detail.

Python is a great language for doing data analysis, primarily because of its excellent ecosystem of data-centric Python packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.

All of us can analyze small data sets with pen and paper. Massive datasets, however, require specialized tools and techniques to analyze them and derive meaningful information. Pandas is one such data analysis library: it contains high-level data structures and tools to manipulate data in a simple way. An effortless yet effective way to analyze data requires the ability to index, retrieve, split, join, restructure, and perform various other analyses on both multi-dimensional and single-dimensional data.

What is Pandas?

Pandas is a Python library used for working with data sets. It has functions for analyzing, cleaning, exploring, and manipulating data. The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.

Why Use Pandas?

Pandas allows us to analyze big data and draw conclusions based on statistical theories. Pandas can clean messy data sets and make them readable and relevant. Relevant data is very important in data science.

Key Features of Pandas

The Pandas data analysis library has some unique features that provide these capabilities:
(i) The Series and DataFrame Objects : These two are high-performance array and table structures for representing heterogeneous and homogeneous data sets in Pandas.
(ii) Restructuring of Data Sets : Pandas provides the flexibility to reshape data structures, so that data can be inserted in both the rows and the columns of tabular data.
(iii) Labelling : To allow automatic data alignment and indexing, pandas provides labels on Series and tabular data.
(iv) Multiple Labels for a Data Item : Heterogeneous indexing of data spread across multiple axes helps in creating more than one label on each data item.
(v) Grouping : Pandas can perform split-apply-combine operations on Series as well as on tabular data.
(vi) Identify and Fix Missing Data : Programmers can quickly identify and fix missing values in both floating-point and non-floating-point data using pandas.
(vii) Powerful capabilities to load and save data in various formats such as JSON, CSV, HDF5, etc.
(viii) Conversion from NumPy and Python data structures to pandas objects.
(ix) Slicing and sub-setting of datasets, including merging and joining data sets with SQL-like constructs.

Although pandas provides many statistical methods, it is not enough on its own to do data science in Python. Pandas builds on NumPy and is used together with other Python libraries for data science, such as SciPy, scikit-learn and Matplotlib, to draw conclusions from large data sets. This makes it possible for Pandas applications to take advantage of the robust and extensive Python ecosystem.

Fig. 6.1.1 : Essential Data Libraries for data analytics: Pandas
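To make the feature list above more concrete, here is a small illustrative sketch that is not part of the original text. It touches several of the listed features: building a DataFrame, detecting and filling missing values (vi), split-apply-combine with groupby() (v), an SQL-like merge() (ix), and conversion from a NumPy array (viii). The city names and numbers are made-up sample data.

```python
import numpy as np
import pandas as pd

# A small, made-up table of sales (hypothetical data)
sales = pd.DataFrame({
    "city": ["Pune", "Mumbai", "Pune", "Delhi", "Mumbai"],
    "units": [10, 15, np.nan, 7, 5],
})

# Feature (vi): identify and fix missing data
missing_count = int(sales["units"].isna().sum())   # number of NaN entries
sales["units"] = sales["units"].fillna(0)          # replace NaN with 0

# Feature (v): split-apply-combine (split by city, sum units, combine)
units_per_city = sales.groupby("city")["units"].sum()

# Feature (ix): SQL-like left join with a second (hypothetical) table
regions = pd.DataFrame({
    "city": ["Pune", "Mumbai", "Delhi"],
    "state": ["Maharashtra", "Maharashtra", "Delhi"],
})
merged = sales.merge(regions, on="city", how="left")

# Feature (viii): convert a NumPy array into a pandas DataFrame
arr = np.arange(6).reshape(3, 2)
frame = pd.DataFrame(arr, columns=["a", "b"])

print(units_per_city)
print(merged)
```

The left join keeps every row of sales in its original order and adds the matching state, which mirrors a SQL LEFT JOIN.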
Pros of using Pandas
• Pandas allows you to represent data effortlessly and simply, which improves data analysis and comprehension.
• For data science projects, such a simple data representation helps glean better insights.
• Pandas is highly efficient, because it lets you perform many tasks with only a few lines of code.
• Pandas provides users with a broad range of commands to analyze data quickly.
Cons of using Pandas
• The learning curve for Pandas may appear simple at first, but as you start working with it, you may find it challenging to grasp.
• One of the most evident flaws of Pandas is that it isn't suitable for working with 3D matrices.

6.2 NUMPY

GQ. Explain NumPy?

• NumPy (Numerical Python) is a Python library for numerical calculations and scientific computations. Python enthusiasts and programmers can use its many features to work with high-performing arrays and matrices. NumPy arrays provide vectorization of mathematical operations, which gives them a performance boost over Python's looping constructs.
• Pandas Series and DataFrame objects rely primarily on NumPy arrays for mathematical calculations such as slicing elements and performing vector operations.

What is NumPy?
• NumPy is a Python library used for working with arrays.
• It also has functions for working in the domain of linear algebra, Fourier transform, and matrices.
• NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.
• NumPy stands for Numerical Python.

Why Use NumPy?
• In Python we have lists that serve the purpose of arrays, but they are slow to process.
• NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.
• The array object in NumPy is called ndarray. NumPy provides many supporting functions that make working with ndarray very easy.
• Arrays are very frequently used in data science, where speed and resources are very important.

Key Features of NumPy
Below are some of the features provided by NumPy:
(1) Integration with legacy languages such as C, C++ and Fortran.
(2) Mathematical Operations : It provides all the standard functions required to perform operations on large data sets swiftly and efficiently. Without them, these operations would have to be written as explicit loops.
(3) ndarray : It is a fast and efficient multidimensional array that can perform vector-based arithmetic operations and has powerful broadcasting capabilities.
(4) I/O Operations : It provides various tools which can be used to write and read huge data sets from disk. It also supports I/O operations on memory-mapped files.
(5) Fourier transform capabilities, Linear Algebra, and Random Number Generation.

Pros of using NumPy
(1) NumPy provides efficient and scalable data storage and better data management for mathematical calculations.
(2) The NumPy array comes with a variety of functions, methods, and variables that make computing with matrices simpler.
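The features listed above can be illustrated with a short sketch. It is a minimal example on made-up numbers that shows vectorized arithmetic on an ndarray, broadcasting, a linear-algebra solve, an FFT and seeded random number generation. np.random.default_rng is the newer generator API, and the sketch assumes NumPy 1.17 or later for it.

```python
import numpy as np

# ndarray: vectorized arithmetic replaces an explicit Python loop
prices = np.array([100.0, 250.0, 40.0])
qty = np.array([3, 1, 10])
totals = prices * qty                  # element-wise multiply

# Broadcasting: the scalar 0.9 is stretched to every element
discounted = totals * 0.9

# Broadcasting a 1-D row against a 2-D array (added to each row)
matrix = np.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]
shifted = matrix + np.array([10, 20, 30])

# Linear algebra: solve the system A @ x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Fourier transform of a constant signal: all energy is in the 0-frequency bin
spectrum = np.fft.fft(np.ones(4))

# Random number generation with a seeded generator (reproducible)
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

print(totals, discounted, shifted, x, spectrum, samples, sep="\n")
```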
Cons of using NumPy
(1) "NaN" is an acronym for "not a number", intended to deal with the issue of missing values. Although NumPy supports NaN, it can be challenging to handle consistently. NaN never compares equal to anything, including itself (np.nan == np.nan evaluates to False), so we may run into issues while comparing values within the Python interpreter.
(2) Because data is stored at contiguous memory addresses, insertion and deletion are expensive: the remaining elements have to be shifted, and functions such as np.insert() and np.delete() return a new array rather than modifying the original in place.

Fig. 6.2.1 : NumPy

6.3 SCIPY

GQ. Explain SciPy?

• SciPy (Scientific Python) is an assortment of mathematical functions and algorithms built on NumPy, Python's extension for numerical arrays. SciPy provides various high-level commands and classes for manipulating and visualizing data. SciPy is useful for data-processing and prototyping systems.
• Apart from this, SciPy provides other advantages for building scientific applications and many specialized, sophisticated applications, backed by a robust and fast-growing Python community.

What is SciPy?
• SciPy is a scientific computation library that uses NumPy underneath. SciPy stands for Scientific Python.
• It provides more utility functions for optimization, statistics and signal processing. Like NumPy, SciPy is open source, so we can use it freely. SciPy was created by NumPy's creator, Travis Oliphant.

Why Use SciPy?
If SciPy uses NumPy underneath, why can we not just use NumPy? SciPy adds optimized versions of functions that are frequently needed with NumPy and in Data Science.

Which Language is SciPy Written in?
SciPy is predominantly written in Python, but a few segments are written in C.

Pros of using SciPy
(1) Visualizing and manipulating data with high-level commands and classes.
(2) Python sessions that are both robust and interactive.
(3) For parallel programming, there are classes and web and database procedures.

Con of using SciPy
• SciPy does not provide any plotting function because its focus is on numerical objects and algorithms.

Fig. 6.3.1 : Essential Data Libraries for data analytics
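As a rough illustration of the optimization, integration and statistics utilities mentioned above, the sketch below minimizes a simple function, integrates x² numerically and summarizes a small sample. The functions and sample values are arbitrary examples chosen for this sketch and do not come from the original text.

```python
import numpy as np
from scipy import integrate, optimize, stats

# Optimization: minimize f(x) = (x - 2)^2 + 1, whose minimum is at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)

# Numerical integration: the integral of x^2 from 0 to 3 is exactly 9
area, abs_err = integrate.quad(lambda x: x ** 2, 0.0, 3.0)

# Statistics: describe a small sample and run a one-sample t-test
sample = np.array([5.1, 4.9, 5.3, 5.0, 5.2])
summary = stats.describe(sample)                 # nobs, minmax, mean, variance, ...
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

print("minimum at x =", result.x)
print("integral =", area, "(error estimate", abs_err, ")")
print("sample mean =", summary.mean, "t =", t_stat, "p =", p_value)
```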

6.4 PLOTTING AND VISUALIZATION WITH PYTHON: INTRODUCTION TO MATPLOTLIB

GQ. Explain Matplotlib?

• Data Visualization is the process of presenting data in the form of graphs or charts. It makes large and complex data much easier to understand. It helps decision-makers make decisions efficiently and identify new trends and patterns easily.
• It is also used in high-level data analysis for Machine Learning and Exploratory Data Analysis (EDA). Data visualization can be done with various tools like Tableau, Power BI and Python.

Fig. 6.4.1 : Introduction to Matplotlib

Matplotlib
• Matplotlib is a low-level Python library used for data visualization. It is easy to use and emulates MATLAB-like graphs and visualizations. This library is built on top of NumPy arrays and offers several kinds of plots such as line charts, bar charts, histograms, etc. It provides a lot of flexibility, but at the cost of writing more code.
• Matplotlib is one of the most popular Python packages used for data visualization. It is a cross-platform library for making 2D plots from data in arrays. Matplotlib is written in Python and makes use of NumPy, the numerical mathematics extension of Python.
• It provides an object-oriented API that helps in embedding plots in applications using Python GUI toolkits such as PyQt, wxPython or Tkinter. It can be used in Python and IPython shells, Jupyter notebooks and web application servers.
• Matplotlib has a procedural interface named Pylab, which is designed to resemble MATLAB, a proprietary programming language developed by MathWorks.
• Matplotlib along with NumPy can be considered as the open source equivalent of MATLAB.
• Matplotlib is an open-source drawing library that supports various drawing types.
• You can generate plots, histograms, bar charts, and other types of charts with just a few lines of code.
• It is often used in web application servers, shells, and Python scripts.

Basic Plotting with Matplotlib

Fig. 6.4.2
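In the original, the heading above is followed only by a figure. A basic line plot built with the object-oriented API might look like the minimal sketch below. The sine-wave data is arbitrary, and the non-interactive Agg backend is selected only so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np

# Data: one period of a sine wave
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Object-oriented interface: one Figure containing one Axes
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y, color="tab:blue", label="sin(x)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("sin(x)")
ax.set_title("Basic line plot")
ax.legend()

# In an interactive session, call plt.show() here to display the figure
```

The same plot can be drawn in the MATLAB-like pyplot style with plt.plot(), plt.xlabel() and so on, which is the style most examples in this unit use.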

6.5 CREATE HISTOGRAM

GQ. Explain the process of creating a Histogram using Python?

• A histogram, with which you may be well-acquainted, is a kind of bar plot that gives a discretized display of value frequency. The data points are split into discrete, evenly spaced bins, and the number of data points in each bin is plotted.
• For example, using the tipping data set, we can make a histogram of tip percentages of the total bill using the hist method on a pandas Series.
• A histogram is basically used to represent data grouped into ranges. It is an accurate method for the graphical representation of a numerical data distribution.
• It is a type of bar plot where the X-axis represents the bin ranges while the Y-axis gives information about frequency.
• To create a histogram, the first step is to create the bins: divide the whole range of values into a series of intervals, and then count the values that fall into each interval.
• Bins are clearly identified as consecutive, non-overlapping intervals of a variable.
• The matplotlib.pyplot.hist() function is used to compute and create a histogram of x.

The following table shows the parameters accepted by the matplotlib.pyplot.hist() function:

Attribute    Parameter
x            array or sequence of arrays
bins         optional parameter; an integer, a sequence or a string
density      optional parameter; boolean value
range        optional parameter; the lower and upper range of the bins
histtype     optional parameter; type of histogram [bar, barstacked, step, stepfilled], default is "bar"
align        optional parameter; controls the plotting of the histogram [left, right, mid]
weights      optional parameter; array of weights having the same dimensions as x
bottom       location of the baseline of each bin
rwidth       optional parameter; relative width of the bars with respect to the bin width
color        optional parameter; a color or a sequence of color specs
label        optional parameter; a string or sequence of strings to match multiple datasets
log          optional parameter; sets the histogram axis to a log scale

Let's create a basic histogram of some values. The code below creates a simple histogram from a small array:
Python3
from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
a = np.array([22, 87, 5, 43, 56,
              73, 55, 54, 11,
              20, 51, 5, 79, 31,
              27])

# Creating histogram with four bins of width 25
fig, ax = plt.subplots(figsize =(10, 7))
ax.hist(a, bins = [0, 25, 50, 75, 100])

# Show plot
plt.show()
Output
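To check what ax.hist() actually draws, the bin counts can also be computed directly with np.histogram(). This sketch reuses the dataset and bin edges from the example above. Every bin is half-open [left, right) except the last one, which also includes its right edge (100 here).

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])
edges = [0, 25, 50, 75, 100]

# Count how many values fall into each bin, without plotting
counts, bin_edges = np.histogram(a, bins=edges)

# ax.hist() returns the same counts, the bin edges and the drawn bar patches
fig, ax = plt.subplots(figsize=(10, 7))
n, bins, patches = ax.hist(a, bins=edges, edgecolor="black")

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left}-{right}: {count}")
```

Here 22, 5, 11, 20 and 5 fall in the first bin, 43, 31 and 27 in the second, 56, 73, 55, 54 and 51 in the third, and 87 and 79 in the last.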

Customization of Histogram

• Matplotlib provides a range of different methods to customize a histogram. The matplotlib.pyplot.hist() function itself provides many attributes with which we can modify a histogram. The hist() function also returns a patches object that gives access to the properties of the created bars, so we can modify the plot as we wish.

➢ Example 6.5.1
Python3
import matplotlib.pyplot as plt
import numpy as np

# Creating dataset
np.random.seed(23685752)
N_points = 10000
n_bins = 20

# Creating distribution (standard normal samples)
x = np.random.randn(N_points)

# Creating histogram
fig, axs = plt.subplots(1, 1,
                        figsize =(10, 7),
                        tight_layout = True)

axs.hist(x, bins = n_bins)

# Show plot
plt.show()
Output

➢ Example 6.5.2 : The code below modifies the above histogram for a better view and more accurate readings.
Python3
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors

# Creating dataset
np.random.seed(23685752)
N_points = 10000
n_bins = 20

# Creating distribution
x = np.random.randn(N_points)
legend = ['distribution']

# Creating figure and axes
fig, axs = plt.subplots(1, 1,
                        figsize =(10, 7),
                        tight_layout = True)

# Remove axes spines
for s in ['top', 'bottom', 'left', 'right']:
    axs.spines[s].set_visible(False)

# Remove x, y ticks
axs.xaxis.set_ticks_position('none')
axs.yaxis.set_ticks_position('none')

# Add padding between axes and labels
axs.xaxis.set_tick_params(pad = 5)
axs.yaxis.set_tick_params(pad = 10)

# Add x, y gridlines (visibility is passed positionally because
# newer Matplotlib versions renamed the old 'b' keyword to 'visible')
axs.grid(True, color ='grey',
         linestyle ='-.', linewidth = 0.5,
         alpha = 0.6)

# Add text watermark
fig.text(0.9, 0.15, 'Jeeteshgavande30',
         fontsize = 12,
         color ='red',
         ha ='right',
         va ='bottom',
         alpha = 0.7)

# Creating histogram: N holds the bin counts, patches the drawn bars
N, bins, patches = axs.hist(x, bins = n_bins)

# Setting color: color each bar according to its (scaled) count
fracs = ((N**(1 / 5)) / N.max())
norm = colors.Normalize(fracs.min(), fracs.max())

for thisfrac, thispatch in zip(fracs, patches):
    color = plt.cm.viridis(norm(thisfrac))
    thispatch.set_facecolor(color)

# Adding extra features
plt.xlabel("X-axis")
plt.ylabel("y-axis")
plt.legend(legend)
plt.title('Customized histogram')

# Show plot
plt.show()
Output
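Several parameters from the hist() table (density, histtype, label) are not used in the two examples above. A hedged sketch that combines them on two synthetic normal samples might look like the code below. With density=True the bar heights are rescaled so that the total area under each histogram is 1, and the last line computes that area as a check.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Two synthetic samples (made-up parameters)
rng = np.random.default_rng(seed=1)
group_a = rng.normal(loc=0.0, scale=1.0, size=1000)
group_b = rng.normal(loc=2.0, scale=0.5, size=1000)

fig, ax = plt.subplots(figsize=(10, 7))

# density=True: heights are densities, so the bar areas sum to 1
n_a, bins_a, _ = ax.hist(group_a, bins=30, density=True,
                         histtype="stepfilled", alpha=0.5, label="group A")
# histtype="step" draws only the outline, which keeps overlaps readable
n_b, bins_b, _ = ax.hist(group_b, bins=30, density=True,
                         histtype="step", linewidth=2, label="group B")

ax.set_xlabel("value")
ax.set_ylabel("density")
ax.legend()

# Area under the first histogram: sum of (height x bin width)
area_a = np.sum(n_a * np.diff(bins_a))
```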

6.5.1 Bar Chart

GQ. Explain Bar Chart in detail?

• A bar plot or bar chart is a graph that represents categories of data with rectangular bars whose lengths or heights are proportional to the values they represent. Bar plots can be drawn horizontally or vertically.
• A bar chart describes the comparisons between discrete categories. One axis of the plot represents the specific categories being compared, while the other axis represents the measured values corresponding to those categories.
• The matplotlib API in Python provides the bar() function, which can be used in MATLAB style or as an object-oriented API. The syntax of the bar() function to be used with the axes is as follows:
• plt.bar(x, height, width, bottom, align)
• The function creates a bar plot bounded with a rectangle depending on the given parameters.
• Following is a simple example of the bar plot, which represents the number of students enrolled in different courses of an institute.

Python 3
import matplotlib.pyplot as plt

# creating the dataset
data = {'C':20, 'C++':15, 'Java':30,
        'Python':35}
courses = list(data.keys())
values = list(data.values())

fig = plt.figure(figsize = (10, 5))

# creating the bar plot
plt.bar(courses, values, color ='maroon',
        width = 0.4)

plt.xlabel("Courses offered")
plt.ylabel("No. of students enrolled")
plt.title("Students enrolled in different courses")
plt.show()
Output

• Here plt.bar(courses, values, color='maroon') specifies that the bar chart is to be plotted using the courses as the X-axis and the values as the Y-axis.
• The color attribute is used to set the color of the bars (maroon in this case). plt.xlabel("Courses offered") and plt.ylabel("No. of students enrolled") are used to label the corresponding axes. plt.title() is used to make a title for the graph. plt.show() is used to show the graph as output using the previous commands.

Customizing the bar plot
Python3
import pandas as pd
from matplotlib import pyplot as plt

# Read CSV into pandas
data = pd.read_csv(r"cars.csv")
data.head()
df = pd.DataFrame(data)

name = df['car'].head(12)
price = df['price'].head(12)

# Figure Size
fig = plt.figure(figsize =(10, 7))

# Vertical bar plot of the first 10 cars
plt.bar(name[0:10], price[0:10])

# Show Plot
plt.show()
Output

It can be observed in the above bar graph that the X-axis tick labels overlap each other, so they cannot be read properly. Rotating the X-axis tick labels, or drawing the bars horizontally as in the next example, makes them clearly visible. That is why customization of bar graphs is required.
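The original code does not show the tick rotation mentioned above. Here is a minimal sketch of it. Because cars.csv is not available here, the car names and prices are hypothetical placeholders. ax.tick_params(axis="x", labelrotation=...) rotates all x tick labels.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical long category names and values (stand-ins for cars.csv)
names = ["Model Alpha Coupe", "Model Beta Roadster", "Model Gamma GT",
         "Model Delta Spider", "Model Epsilon Turbo"]
prices = [3.2, 4.5, 2.8, 5.1, 3.9]

fig, ax = plt.subplots(figsize=(10, 7))
bars = ax.bar(names, prices, color="maroon")

# Rotate the x tick labels by 45 degrees so long names no longer overlap
ax.tick_params(axis="x", labelrotation=45)
fig.tight_layout()      # leave room for the rotated labels

# Draw once so the tick label objects exist and can be inspected
fig.canvas.draw()
```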
Python 3
import pandas as pd
from matplotlib import pyplot as plt

# Read CSV into pandas
data = pd.read_csv(r"cars.csv")
data.head()
df = pd.DataFrame(data)

name = df['car'].head(12)
price = df['price'].head(12)

# Figure Size
fig, ax = plt.subplots(figsize =(16, 9))

# Horizontal Bar Plot
ax.barh(name, price)

# Remove axes spines
for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)

# Remove x, y Ticks
ax.xaxis.set_ticks_position('none')
ax.yaxis.set_ticks_position('none')

# Add padding between axes and labels
ax.xaxis.set_tick_params(pad = 5)
ax.yaxis.set_tick_params(pad = 10)

# Add x, y gridlines (visibility passed positionally; newer
# Matplotlib versions renamed the old 'b' keyword to 'visible')
ax.grid(True, color ='grey',
        linestyle ='-.', linewidth = 0.5,
        alpha = 0.2)

# Show top values
ax.invert_yaxis()

# Add annotation to bars
for i in ax.patches:
    plt.text(i.get_width()+0.2, i.get_y()+0.5,
             str(round((i.get_width()), 2)),
             fontsize = 10, fontweight ='bold',
             color ='grey')

# Add Plot Title
ax.set_title('Sports car and their price in crore',
             loc ='left', )

# Add Text watermark
fig.text(0.9, 0.15, 'Jeeteshgavande30', fontsize = 12,
         color ='grey', ha ='right', va ='bottom',
         alpha = 0.7)

# Show Plot
plt.show()
Output

There are many more customizations available for bar plots.
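Newer Matplotlib releases (3.4 and later, assumed here) also offer Axes.bar_label(). It can replace the manual plt.text() loop used above to annotate bars. Here is a small sketch with hypothetical car names and prices.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical car names and prices in crore, for illustration only
cars = ["Car A", "Car B", "Car C"]
price = [3.5, 2.0, 4.25]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(cars, price, color="grey")
ax.invert_yaxis()                  # first item at the top

# Write each bar's value just past its end (default format is "%g")
labels = ax.bar_label(bars, padding=3)
ax.set_title("Prices in crore", loc="left")
```

The default "%g" format drops trailing zeros, so 2.0 is shown as "2". Pass fmt="%.2f" to get a fixed number of decimals.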


6.5.2 Multiple Bar Plots
Multiple bar plots are used to compare data sets while one variable changes. They can easily be converted into a stacked bar chart, where each subgroup is displayed on top of the others. They are plotted by varying the thickness and position of the bars. The following bar plot shows the number of students who passed in each engineering branch:
Python3
import numpy as np
import matplotlib.pyplot as plt

# set width of bar
barWidth = 0.25
fig, ax = plt.subplots(figsize =(12, 8))

# set height of bar
IT = [12, 30, 1, 8, 22]
ECE = [28, 6, 16, 5, 10]
CSE = [29, 3, 24, 25, 17]

# Set position of bar on X axis (each branch shifted by one bar width)
br1 = np.arange(len(IT))
br2 = [x + barWidth for x in br1]
br3 = [x + barWidth for x in br2]

# Make the plot
plt.bar(br1, IT, color ='r', width = barWidth,
        edgecolor ='grey', label ='IT')
plt.bar(br2, ECE, color ='g', width = barWidth,
        edgecolor ='grey', label ='ECE')
plt.bar(br3, CSE, color ='b', width = barWidth,
        edgecolor ='grey', label ='CSE')

# Adding Xticks (the x positions are years; the branches are in the legend)
plt.xlabel('Year', fontweight ='bold', fontsize = 15)
plt.ylabel('Students passed', fontweight ='bold', fontsize = 15)
plt.xticks([r + barWidth for r in range(len(IT))],
           ['2015', '2016', '2017', '2018', '2019'])

plt.legend()
plt.show()
Output

6.5.3 Stacked Bar Plot
Stacked bar plots represent different groups on top of one another. The height of each bar is the combined height of the results of all the groups. Each segment starts at the top of the segment below it instead of starting from zero. The following bar plot represents the contribution of boys and girls in each team.
Python3
import numpy as np
import matplotlib.pyplot as plt

N = 5

boys = (20, 35, 30, 35, 27)
girls = (25, 32, 34, 20, 25)
boyStd = (2, 3, 4, 1, 2)
girlStd = (3, 5, 2, 3, 3)
ind = np.arange(N)
width = 0.35

fig, ax = plt.subplots(figsize =(10, 7))
p1 = plt.bar(ind, boys, width, yerr = boyStd)
# bottom = boys stacks the girls' bars on top of the boys' bars
p2 = plt.bar(ind, girls, width,
             bottom = boys, yerr = girlStd)

plt.ylabel('Contribution')
plt.title('Contribution by the teams')
plt.xticks(ind, ('T1', 'T2', 'T3', 'T4', 'T5'))
plt.yticks(np.arange(0, 81, 10))
plt.legend((p1[0], p2[0]), ('boys', 'girls'))

plt.show()
Output

6.5.4 Pie Chart
GQ. Explain Pie chart in detail?

A Pie Chart is a circular statistical plot that can display only one series of data. The whole area of the chart represents the total of the given data, and the area of each slice represents the percentage of that part of the data. The slices of the pie are called wedges. The area of a wedge is determined by the length of its arc, and it represents the relative percentage of that part with respect to the whole data. Pie charts are commonly used in business presentations on sales, operations, survey results, resources, etc., as they provide a quick summary.

Creating Pie Chart
The Matplotlib API has a pie() function in its pyplot module, which creates a pie chart representing the data in an array.
• Syntax: matplotlib.pyplot.pie(data, explode=None, labels=None, colors=None, autopct=None, shadow=False)
Parameters
• data represents the array of data values to be plotted. The fractional area of each slice is data/sum(data). If sum(data) < 1, the data values give the fractional areas directly, and the resulting pie has an empty wedge of size 1 - sum(data). In recent Matplotlib versions, this partial pie requires passing normalize=False.
• labels is a list or sequence of strings which sets the label of each wedge.
• colors is used to provide colors for the wedges.
• autopct is a string used to label the wedges with their numerical values.
• shadow is used to draw a shadow under the wedges.
Let's create a simple pie chart using the pie() function:
Example
Python3
# Import libraries
from matplotlib import pyplot as plt

# Creating dataset
cars = ['AUDI', 'BMW', 'FORD',
        'TESLA', 'JAGUAR', 'MERCEDES']

data = [23, 17, 35, 29, 12, 41]

# Creating plot
fig = plt.figure(figsize =(10, 7))
plt.pie(data, labels = cars)

# show plot
plt.show()
Output
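The data/sum(data) rule described above can be checked directly. In this sketch, the cars data from the example is plotted with autopct so that each wedge shows its percentage (for example 23/157 is about 14.6%). The second chart is a partial pie, which the sketch assumes requires normalize=False (available in Matplotlib 3.3 and later).

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

cars = ['AUDI', 'BMW', 'FORD', 'TESLA', 'JAGUAR', 'MERCEDES']
data = np.array([23, 17, 35, 29, 12, 41])

# Each wedge's share of the full circle is data / sum(data)
fractions = data / data.sum()

fig, ax = plt.subplots(figsize=(10, 7))
# autopct formats 100 * fraction for every wedge
wedges, texts, autotexts = ax.pie(data, labels=cars, autopct="%1.1f%%")
percent_labels = [t.get_text() for t in autotexts]

# Partial pie: values summing to 0.5 fill only half of the circle
fig2, ax2 = plt.subplots()
partial, _ = ax2.pie([0.2, 0.3], normalize=False)

print(percent_labels)
```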
6.5.5 Customizing Pie Chart

A pie chart can be customized in several ways. The startangle attribute rotates the start of the pie by the specified number of degrees counter-clockwise from the x-axis. The shadow attribute accepts a boolean value; if it is True, a shadow appears below the rim of the pie. The wedges of the pie can be customized using wedgeprops. It takes a Python dictionary of name-value pairs that set wedge properties such as linewidth and edgecolor. Setting frame=True draws an axes frame around the pie chart. autopct controls how the percentages are displayed on the wedges. Let us try to modify the above plot:
Example 1
Python3
# Import libraries
import numpy as np
import matplotlib.pyplot as plt

# Creating dataset
cars = ['AUDI', 'BMW', 'FORD',
        'TESLA', 'JAGUAR', 'MERCEDES']

data = [23, 17, 35, 29, 12, 41]

# Creating explode data (how far each wedge is pulled out)
explode = (0.1, 0.0, 0.2, 0.3, 0.0, 0.0)

# Creating color parameters
colors = ( "orange", "cyan", "brown",
          "grey", "indigo", "beige")

# Wedge properties
wp = { 'linewidth' : 1, 'edgecolor' : "green" }

# Creating autopct arguments: show the percentage and the absolute value
def func(pct, allvalues):
    # round() avoids off-by-one results from float truncation
    absolute = int(round(pct / 100.*np.sum(allvalues)))
    return "{:.1f}%\n({:d} g)".format(pct, absolute)

# Creating plot
fig, ax = plt.subplots(figsize =(10, 7))
wedges, texts, autotexts = ax.pie(data,
                                  autopct = lambda pct: func(pct, data),
                                  explode = explode,
                                  labels = cars,
                                  shadow = True,
                                  colors = colors,
                                  startangle = 90,
                                  wedgeprops = wp,
                                  textprops = dict(color ="magenta"))

# Adding legend
ax.legend(wedges, cars,
          title ="Cars",
          loc ="center left",
          bbox_to_anchor =(1, 0, 0.5, 1))

plt.setp(autotexts, size = 8, weight ="bold")
ax.set_title("Customizing pie chart")

# show plot
plt.show()
Output
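The wedgeprops dictionary described above is applied to every wedge, so it can also change the wedge geometry. For instance, a width smaller than the pie radius (1 by default) cuts a hole in the middle and produces a donut chart. Here is a minimal sketch using the same cars data.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

cars = ['AUDI', 'BMW', 'FORD', 'TESLA', 'JAGUAR', 'MERCEDES']
data = [23, 17, 35, 29, 12, 41]

fig, ax = plt.subplots(figsize=(10, 7))
# width=0.4 keeps only the outer 0.4 of each wedge's radius -> donut
wedges, texts = ax.pie(data, labels=cars, startangle=90,
                       wedgeprops={"width": 0.4, "edgecolor": "white"})
ax.set_title("Donut chart")
```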
➢ Example 6.5.3 : Creating a Nested Pie Chart
Python3
# Import libraries
from matplotlib import pyplot as plt
import numpy as np

# Creating dataset
size = 0.3   # radial thickness of each ring (a value above 1 would give negative radii)
cars = ['AUDI', 'BMW', 'FORD',
        'TESLA', 'JAGUAR', 'MERCEDES']

data = np.array([[23, 16], [17, 23],
                 [35, 11], [29, 33],
                 [12, 27], [41, 42]])

# normalizing data to 2 pi
norm = data / np.sum(data)*2 * np.pi

# obtaining ordinates of bar edges
left = np.cumsum(np.append(0,
                           norm.flatten()[:-1])).reshape(data.shape)

# Creating color scale
cmap = plt.get_cmap("tab20c")
outer_colors = cmap(np.arange(6)*4)
inner_colors = cmap(np.array([1, 2, 5, 6, 9,
                              10, 12, 13, 15,
                              17, 18, 20 ]))

# Creating plot on polar axes
fig, ax = plt.subplots(figsize =(10, 7),
                       subplot_kw = dict(polar = True))

# Outer ring: one bar per car
ax.bar(x = left[:, 0],
       width = norm.sum(axis = 1),
       bottom = 1-size,
       height = size,
       color = outer_colors,
       edgecolor ='w',
       linewidth = 1,
       align ="edge")

# Inner ring: one bar per sub-value
ax.bar(x = left.flatten(),
       width = norm.flatten(),
       bottom = 1-2 * size,
       height = size,
       color = inner_colors,
       edgecolor ='w',
       linewidth = 1,
       align ="edge")

ax.set(title ="Nested pie chart")
ax.set_axis_off()

# show plot
plt.show()
Output

6.5.6 Box Plot
GQ. Explain Box Plot?

A Box Plot, also known as a Whisker plot, displays a summary of a set of data values: the minimum, first quartile, median, third quartile and maximum. In the box plot, a box is drawn from the first quartile to the third quartile, and a line passes through the box at the median. In a vertical box plot, the x-axis denotes the data set(s) being plotted while the y-axis shows the data values.

Creating Box Plot
The matplotlib.pyplot module of the matplotlib library provides the boxplot() function, which we can use to create box plots.

Syntax
matplotlib.pyplot.boxplot(data, notch=None, vert=None, patch_artist=None, widths=None)

Parameters
Attribute      Value
data           array or sequence of arrays to be plotted
notch          optional parameter; accepts boolean values
vert           optional parameter; accepts boolean values False and True for horizontal and vertical plot respectively
bootstrap      optional parameter; accepts an int, the number of bootstrap resamples used for the confidence intervals around the median of notched box plots
usermedians    optional parameter; accepts an array or sequence of arrays with dimensions compatible with data
positions      optional parameter; accepts an array and sets the positions of the boxes
widths         optional parameter; accepts an array and sets the widths of the boxes
patch_artist   optional parameter having boolean values
labels         sequence of strings; sets a label for each dataset
meanline       optional parameter having a boolean value; tries to render the mean line as the full width of the box
zorder         optional parameter; sets the zorder of the box plot

The data values given to the ax.boxplot() method can be a NumPy array, a Python list or a tuple of arrays. Let us create the box plot using numpy.random.normal() to create some random data. This function takes the mean, the standard deviation, and the desired number of values as arguments.
Example
Python3
# Import libraries
import matplotlib.pyplot as plt
import numpy as np

# Creating dataset
np.random.seed(10)
data = np.random.normal(100, 20, 200)

fig = plt.figure(figsize =(10, 7))

# Creating plot
plt.boxplot(data)

# show plot
plt.show()
Output

Customizing Box Plot
The matplotlib.pyplot.boxplot() function provides endless customization possibilities for the box plot. The notch = True attribute gives the boxes a notched format. The patch_artist = True attribute fills the boxes with color, so we can set different colors for different boxes. The vert = 0 attribute creates a horizontal box plot. The labels parameter must have as many entries as there are data sets.

➢ Example 6.5.4 :
Python3
# Import libraries
import matplotlib.pyplot as plt
import numpy as np

# Creating dataset
np.random.seed(10)

data_1 = np.random.normal(100, 10, 200)
data_2 = np.random.normal(90, 20, 200)
data_3 = np.random.normal(80, 30, 200)
data_4 = np.random.normal(70, 40, 200)
data = [data_1, data_2, data_3, data_4]

fig = plt.figure(figsize =(10, 7))

# Creating axes instance
ax = fig.add_axes([0, 0, 1, 1])

# Creating plot
bp = ax.boxplot(data)

# show plot
plt.show()
Output

➢ Example 6.5.5 : Let's try to modify the above plot with some of the customizations:
Python3
# Import libraries
import matplotlib.pyplot as plt
import numpy as np

# Creating dataset
np.random.seed(10)
data_1 = np.random.normal(100, 10, 200)
data_2 = np.random.normal(90, 20, 200)
data_3 = np.random.normal(80, 30, 200)
data_4 = np.random.normal(70, 40, 200)
data = [data_1, data_2, data_3, data_4]

fig = plt.figure(figsize =(10, 7))
ax = fig.add_subplot(111)

# Creating plot: filled, notched, horizontal boxes
bp = ax.boxplot(data, patch_artist = True,
                notch = True, vert = False)

colors = ['#0000FF', '#00FF00',
          '#FFFF00', '#FF00FF']

for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)

# changing color and linewidth of whiskers
for whisker in bp['whiskers']:
    whisker.set(color ='#8B008B',
                linewidth = 1.5,
                linestyle =":")

# changing color and linewidth of caps
for cap in bp['caps']:
    cap.set(color ='#8B008B',
            linewidth = 2)

# changing color and linewidth of medians
for median in bp['medians']:
    median.set(color ='red',
               linewidth = 3)

# changing style of fliers
for flier in bp['fliers']:
    flier.set(marker ='D',
              color ='#e7298a',
              alpha = 0.5)

# y-axis labels (the boxes are horizontal)
ax.set_yticklabels(['data_1', 'data_2',
                    'data_3', 'data_4'])

# Adding title
plt.title("Customized box plot")

# Removing top axes and right axes ticks
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()

# show plot
plt.show()
Output
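To see how the five-number summary described at the start of this section maps onto the drawn box, the sketch below works on a small made-up sample. It computes the quartiles, the 1.5 × IQR fences and the outliers with NumPy, then reads the same values back from the artists returned by ax.boxplot(). It assumes Matplotlib's default whisker rule (whis=1.5) and NumPy's default linear percentile method, which is what boxplot() uses to compute its statistics.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Small illustrative sample with one obvious outlier (30)
data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 30])

# Quartiles and median (linear interpolation): 4.0, 6.0, 8.0
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                                   # 4.0

# Whiskers reach the most extreme points inside the 1.5 * IQR fences
low_fence = q1 - 1.5 * iqr                      # -2.0
high_fence = q3 + 1.5 * iqr                     # 14.0
whisker_low = data[data >= low_fence].min()     # 2
whisker_high = data[data <= high_fence].max()   # 9
outliers = data[(data < low_fence) | (data > high_fence)]   # [30]

fig, ax = plt.subplots()
bp = ax.boxplot(data)   # whis=1.5 by default

print("Q1, median, Q3:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high, "outliers:", outliers)
```

In the returned dictionary, bp['boxes'], bp['medians'], bp['whiskers'] and bp['fliers'] hold exactly these values as line coordinates. This is what the customization loops in Example 6.5.5 operate on.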

 6.6 VIOLIN PLOT USING MATPLOTLIB

GQ. Explain Violin plot using Matplotlib?

Matplotlib is a plotting library for creating static,


animated, and interactive visualizations in Python.
Matplotlib can be used in Python scripts, the Python and
IPython shell, web application servers, and various
graphical user interface toolkits like Tkinter, awxPython,
etc.
What does a violin plot signify?
Violin plots are a combination of box plots and histograms. A violin plot portrays the distribution, the median and the interquartile range (IQR) of the data. The median and the IQR are the statistical information provided by a box plot, whereas the shape of the distribution is provided by the histogram (a kernel density estimate).
Violin Plot
In a conventional violin-plot diagram:
• The white dot refers to the median.
• The end points of the bold line represent the first and third quartiles (Q1 and Q3), so the bold line spans the IQR.
• The end points of the thin line mark the lowest and highest values within 1.5 × IQR of the quartiles, similar to the whiskers of a box plot.
• Values beyond the ends of the thin line (more than 1.5 × IQR from the quartiles) denote the presence of outliers.
(Matplotlib's violinplot() draws the density outline and, by default, lines at the minimum and maximum joined by a central bar; the median is added with showmedians=True.)
• Syntax: violinplot(dataset, positions=None, vert=True, widths=0.5, showmeans=False, showextrema=True, showmedians=False, quantiles=None, points=100, bw_method=None, *, data=None)
Parameters
• dataset: Array or a sequence of vectors.
The input data.
• positions: array-like, default = [1, 2, …, n].
Sets the positions of the violins. The ticks and limits are automatically set to match the positions.
• vert: bool, default = True.
If True, creates a vertical violin plot. Otherwise, creates a horizontal violin plot.
• widths: array-like, default = 0.5
Either a scalar or a vector that sets the maximal width of each violin. The default is 0.5, which uses about half of the available horizontal space.
• showmeans: bool, default = False
If True, renders the means.
• showextrema: bool, default = True
If True, renders the extrema.
• showmedians: bool, default = False
If True, renders the medians.
• quantiles: array-like, default = None
If not None, a list of floats in the interval [0, 1] for each violin, giving the quantiles that will be rendered for that violin.
• points: scalar, default = 100
The number of points at which each Gaussian kernel density estimate is evaluated.
• bw_method: str, scalar or callable, optional
The method used to calculate the estimator bandwidth. This can be 'scott', 'silverman', a scalar constant or a callable. If a scalar, it is used directly as kde.factor. If a callable, it should take a GaussianKDE instance as its only parameter and return a scalar. If None (default), 'scott' is used.
➢ Example 6.6.1
import numpy as np
import matplotlib.pyplot as plt

# creating an array of evenly spaced
# (uniformly distributed) values
uniform = np.arange(-100, 100)

# creating an array of normally
# distributed values
normal = np.random.normal(size=100) * 30

# creating figure and axes to
# plot the image
fig, (ax1, ax2) = plt.subplots(nrows=1,
                               ncols=2,
                               figsize=(9, 4),
                               sharey=True)

# plotting violin plot for
# uniform distribution
ax1.set_title('Uniform Distribution')
ax1.set_ylabel('Observed values')
ax1.violinplot(uniform)

# plotting violin plot for
# normal distribution
ax2.set_title('Normal Distribution')
ax2.violinplot(normal)

# Function to show the plot
plt.show()
Output
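Several of the optional parameters listed above can be combined in a single call. The sketch below uses made-up samples and the Agg backend (both assumptions, chosen so it runs anywhere). It turns on means, medians and per-violin quantiles and then inspects the dictionary of artists that violinplot() returns; keys such as 'cmeans' and 'cquantiles' appear only when the matching option is enabled.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Three made-up samples with increasing spread
samples = [rng.normal(0, s, 200) for s in (1, 2, 3)]

fig, ax = plt.subplots()
parts = ax.violinplot(
    samples,
    positions=[1, 2, 3],
    showmeans=True,
    showmedians=True,
    quantiles=[[0.25, 0.75]] * 3,  # one list of quantiles per violin
    points=200,                    # KDE evaluated at 200 points
    bw_method="silverman",
)

# The return value maps component names to artists
print(sorted(parts.keys()))
for body in parts["bodies"]:
    body.set_alpha(0.4)  # make the density outlines semi-transparent
plt.close(fig)
```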
➢ Example 6.6.2 : Multiple Violin plots
import numpy as np
import matplotlib.pyplot as plt
from random import randint

# Creating 3 empty lists
l1 = []
l2 = []
l3 = []

# Filling each list with 100 random integers from 1 to 100
for i in range(100):
    n = randint(1, 100)
    l1.append(n)

for i in range(100):
    n = randint(1, 100)
    l2.append(n)

for i in range(100):
    n = randint(1, 100)
    l3.append(n)

random_collection = [l1, l2, l3]

# Create a figure instance
fig = plt.figure()

# Create an axes instance
ax = fig.gca()

# Create the violinplot (one violin per list)
violinplot = ax.violinplot(random_collection)
plt.show()
Output

6.7 INTRODUCTION TO SEABORN LIBRARY

GQ. Explain seaborn Library?
Seaborn is a visualization library for statistical graphics plotting in Python. It provides attractive default styles and color palettes that make statistical plots more appealing. It is built on top of the Matplotlib library and is closely integrated with the data structures from pandas. Seaborn aims to make visualization a central part of exploring and understanding data. Its dataset-oriented APIs let us switch between different visual representations of the same variables for a better understanding of the dataset.
What Is Seaborn in Python?
Python's Seaborn library is a widely popular data visualization library that is commonly used for data science and machine learning tasks. It is built on top of the Matplotlib data visualization library and can be used for exploratory analysis: you can create plots that answer questions about your data.

6.7.1 Different Categories of Plot in Seaborn

Plots are used to visualize the relationship between variables. Those variables can be either completely numerical or categories such as a group, class or division. Seaborn divides plots into the categories below:
• Relational plots: used to understand the relation between two variables.
• Categorical plots: deal with categorical variables and how they can be visualized.
• Distribution plots: used for examining univariate and bivariate distributions.
• Regression plots: primarily intended to add a visual guide that helps to emphasize patterns in a dataset during exploratory data analysis.
• Matrix plots: display a matrix of values as a color-encoded grid, for example heatmap() and clustermap().
• Multi-plot grids: draw multiple instances of the same plot on different subsets of the dataset.
Installation
• For a pip environment : pip install seaborn
• For a conda environment : conda install seaborn
Dependencies
• Python 3.6+
• numpy (>= 1.13.3)
• scipy (>= 1.0.1)
• pandas (>= 0.22.0)
• matplotlib (>= 2.1.2)
• statsmodels (>= 0.8.0)
Some basic plots using seaborn
Line plot
The line plot is one of the most basic plots in the seaborn library. It is mainly used to visualize data in the form of a time series, i.e. in a continuous manner.
Python3
import seaborn as sns

sns.set(style="dark")
fmri = sns.load_dataset("fmri")

# Plot the responses for different
# events and regions
sns.lineplot(x="timepoint",
             y="signal",
             hue="region",
             style="event",
             data=fmri)
Output
Dist plot : Seaborn's distplot() is used to plot a histogram, with optional variations such as a kdeplot and a rugplot.
Python3
# Importing libraries
import numpy as np
import seaborn as sns

# Selecting style as white,
# dark, whitegrid, darkgrid
# or ticks
sns.set(style="white")

# Generate a random univariate
# dataset
rs = np.random.RandomState(10)
d = rs.normal(size=100)

# Plot a simple histogram and kde
# with binsize determined automatically
# (distplot() is deprecated since seaborn 0.11;
# sns.histplot(d, kde=True, color="m") replaces it)
sns.distplot(d, kde=True, color="m")
Output

Lmplot

The lmplot is another basic plot. It shows a line representing a linear regression model along with the data points in 2D space; x and y set the variables on the horizontal and vertical axes respectively.
Python3

import seaborn as sns

sns.set(style="ticks")

# Loading the dataset
df = sns.load_dataset("anscombe")

# Show the results of a linear regression
sns.lmplot(x="x", y="y", data=df)
Output

6.7.2 Multiple Plots

GQ. Explain Multiple Plots in detail?
• Multi-plot grids are a useful way to look at multi-dimensional data: they draw multiple instances of the same plot on different subsets of your dataset. This allows a viewer to quickly extract a large amount of information about a complex dataset.
• In Seaborn, we can plot multiple graphs in a single figure in two ways: first with the help of the FacetGrid class, and second implicitly with the help of Matplotlib.
FacetGrid
• FacetGrid is a general way of plotting grids based on a function. It helps in visualizing the distribution of one variable as well as the relationship between multiple variables. Its object takes a DataFrame as input, along with the names of the variables that shape the column, row, or color dimensions of the grid. The syntax is given below:
• Syntax: seaborn.FacetGrid(data, **kwargs)
• data: Tidy DataFrame where each column is a variable and each row is an observation.
• **kwargs: Many further arguments, such as row, col, hue, palette, etc.
Below is the implementation of the above method:
Import all Python libraries needed
Python3

import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

➢ Example 6.7.1 : Initializing the grid like this sets up the Matplotlib figure and axes but does not draw anything on them. We are using the exercise dataset, a well-known dataset available as an inbuilt dataset in seaborn. The basic usage is: first initialize the grid, then pass a plotting function to a map method, which is called on each subplot.
Python3
# loading a DataFrame from seaborn's built-in datasets
exercise = sns.load_dataset("exercise")

# Form a FacetGrid using columns
sea = sns.FacetGrid(exercise, col="time")
Output

➢ Example 6.7.2 : The map() method draws the plots and annotates the axes. To make a relational plot, first initialize the grid, then pass the plotting function to the map method; it will be called on each subplot.
Python3
# Form a facetgrid using columns with a hue
sea = sns.FacetGrid(exercise, col = "time", hue = "kind")

# map a scatter plot onto every facet of the grid
sea.map(sns.scatterplot, "pulse", "time", alpha=.8)

# adding legend
sea.add_legend()
Output

➢ Example 6.7.3 : There are several options for controlling the look of the grid that can be passed to the class constructor. Here margin_titles=True draws the row titles in the right margin of the grid.
Python3
sea = sns.FacetGrid(exercise, row="diet",
                    col="time", margin_titles=True)

sea.map(sns.regplot, "id", "pulse", color=".3",
        fit_reg=False, x_jitter=.1)
Output
➢ Example 6.7.4 : The size of the figure is set by providing the height of each facet, along with the aspect ratio:
Python3

sea = sns.FacetGrid(exercise, col="time",
                    height=4, aspect=.5)

sea.map(sns.barplot, "diet", "pulse",
        order=["no fat", "low fat"])
Output

➢ Example 6.7.5 : The default ordering of the facets is derived from the information in the DataFrame. If the variable used to
define facets has a categorical type, then the order of the categories is used. Otherwise, the facets will be in the order of
appearance of the category levels. It is possible, however, to specify an ordering of any facet dimension with the
appropriate *_order parameter:
Python3
exercise_kind = exercise.kind.value_counts().index
sea = sns.FacetGrid(exercise, row = "kind",
row_order = exercise_kind,
height = 1.7, aspect = 4)
sea.map(sns.kdeplot, "id")
Output

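➢ Example 6.7.6 : PairGrid draws a grid of axes with one row and one column per numeric variable of the DataFrame. map_diag() draws a univariate plot (here a histogram) on the diagonal axes, and map_offdiag() draws a bivariate plot (here a scatter plot) on all the other axes. (A related option belongs to FacetGrid: if you have many levels of one variable, you can plot it along the columns but "wrap" them so that they span multiple rows with col_wrap; when doing this, you cannot use a row variable.)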
Python3
g = sns.PairGrid(exercise)
g.map_diag(sns.histplot)
g.map_offdiag(sns.scatterplot)
Output

➢ Example 6.7.7 : In this example, we plot a multi-plot grid with the help of the pairplot() function. It shows the relationship for each pairwise combination of variables in a DataFrame as a matrix of plots, and the diagonal plots are univariate plots.
Python3

# importing packages
import seaborn
import matplotlib.pyplot as plt

# loading dataset using seaborn
df = seaborn.load_dataset('tips')

# pairplot colored by party size
seaborn.pairplot(df, hue='size')
plt.show()
Output

Method 2 : Implicit with the help of matplotlib.

In this method we will learn how to create subplots using matplotlib and seaborn.
Import all Python libraries needed
Python3

import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Setting seaborn as the default style even
# if only matplotlib is used
sns.set()
➢ Example 6.7.8 : Here we initialize the grid: calling plt.subplots() without arguments returns a Figure and a single Axes, which we can unpack using the syntax below.
Python3
figure, axes = plt.subplots()
figure.suptitle('Geeksforgeeks - one axes with no data')
Output :

➢ Example 6.7.9 : In this example we create a figure with 1 row and 2 columns of axes, still with no data, using the nrows and ncols arguments. When they are given in this order, we don't need to type the argument names, just their values.
figsize sets the total size of the figure (width, height) in inches.
sharex and sharey are used to share the x-axis, the y-axis, or both between the charts.

Python3

figure, axes = plt.subplots(1, 2, sharex=True, figsize=(10, 5))
figure.suptitle('Geeksforgeeks')
axes[0].set_title('first chart with no data')
axes[1].set_title('second chart with no data')
Output
➢ Example 6.7.10 : A larger grid of empty axes, here 3 rows by 4 columns with a shared x-axis, is created the same way.
Python3
figure, axes = plt.subplots(3, 4, sharex=True, figsize=(16,8))
figure.suptitle('Geeksforgeeks - 3 x 4 axes with no data')
Output

➢ Example 6.7.11 : Here we initialize the Matplotlib figure and axes and pass the required data to them, using the iris dataset, a well-known dataset available as an inbuilt dataset in seaborn. With this method you can draw a multi-plot grid of any size, with any style of graph, by indexing the rows and columns of the Matplotlib axes array. We use sns.boxplot here, setting its ax argument to the corresponding element of the axes variable.
Python3
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 3, figsize=(18, 10))

fig.suptitle('Geeksforgeeks - 2 x 3 axes Box plot with data')

iris = sns.load_dataset("iris")

sns.boxplot(ax=axes[0, 0], data=iris, x='species', y='petal_width')
sns.boxplot(ax=axes[0, 1], data=iris, x='species', y='petal_length')
sns.boxplot(ax=axes[0, 2], data=iris, x='species', y='sepal_width')
sns.boxplot(ax=axes[1, 0], data=iris, x='species', y='sepal_length')
sns.boxplot(ax=axes[1, 1], data=iris, x='species', y='petal_width')
sns.boxplot(ax=axes[1, 2], data=iris, x='species', y='petal_length')
Output
➢ Example 6.7.12 : A GridSpec describes a grid of rows and columns with a specified width space (wspace) and height space (hspace) between the cells. The plt.GridSpec object does not create a plot by itself; it is simply a convenient interface that is recognized by the subplot() command, and slicing it lets one subplot span several cells.
Python 3

import matplotlib.pyplot as plt

Grid_plot = plt.GridSpec(2, 3, wspace=0.8,
                         hspace=0.6)

plt.subplot(Grid_plot[0, 0])
plt.subplot(Grid_plot[0, 1:])
plt.subplot(Grid_plot[1, :2])
plt.subplot(Grid_plot[1, 2])
Output
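The same layout can be built in the object-oriented style by passing GridSpec slices to fig.add_subplot() (a substitution for plt.subplot() made here). The sketch below uses the Agg backend and names each panel with a hypothetical label, then prints which grid rows and columns each Axes covers, making the effect of slices such as [0, 1:] explicit:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(8, 5))
gs = GridSpec(2, 3, figure=fig, wspace=0.8, hspace=0.6)

# Each slice of the GridSpec becomes one Axes that may span several cells
layout = {
    "top_left": gs[0, 0],      # 1 cell
    "top_right": gs[0, 1:],    # 2 cells wide
    "bottom_left": gs[1, :2],  # 2 cells wide
    "bottom_right": gs[1, 2],  # 1 cell
}
axes = {name: fig.add_subplot(spec) for name, spec in layout.items()}

for name, ax in axes.items():
    spec = ax.get_subplotspec()
    ax.set_title(name)
    print(name, "rows", list(spec.rowspan), "cols", list(spec.colspan))
```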

➢ Example 6.7.13 : Here we'll create a 3×4 grid of subplots using subplots(), where all axes in the same row share their y-axis scale (sharey='row') and all axes in the same column share their x-axis scale (sharex='col').
Python3
import matplotlib.pyplot as plt

figure, axes = plt.subplots(3, 4, sharex='col', sharey='row',
                            figsize=(15, 10))

figure.suptitle('Geeksforgeeks - 3 x 4 axes grid plot using subplots')


Output

6.7.3 Regression Plot

GQ. Explain Regression plot

The regression plots in seaborn are primarily intended to add a visual guide that helps to emphasize patterns in a dataset during exploratory data analysis. As the name suggests, regression plots create a regression line between two variables and help to visualize their linear relationship. This section covers such plots in seaborn and shows how to change their size, aspect ratio, etc.
Seaborn is not only a visualization library but also a provider of built-in datasets. Here we will work with one of them, named 'tips'. The tips dataset contains information about restaurant bills: the total bill and the tip that was left. It also records the sex of the person paying, whether they smoke, the day, the time and so on.
Let us have a look at the dataset first before we start with the regression plots.
Load the dataset
Python3

# import the library
import seaborn as sns

# load the dataset
dataset = sns.load_dataset('tips')

# the first five entries of the dataset
dataset.head()
Output

Now let us begin with the regression plots in seaborn. Regression plots in seaborn can be easily implemented with the help of the lmplot() function, which creates a linear model plot: a scatter plot with a linear fit on top of it.
Simple linear plot
Python3

sns.set_style('whitegrid')

sns.lmplot(x ='total_bill', y ='tip', data = dataset)


Output

Explanation

The x and y parameters name the columns plotted on the x and y axes. sns.set_style('whitegrid') draws a grid in the background instead of a plain white background. The data parameter specifies the DataFrame that supplies the information for drawing the plots.
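By default the straight line that lmplot() and regplot() draw is an ordinary least-squares fit. A small sketch on made-up data, assuming np.polyfit as an independent reference and ci=None to skip the bootstrapped confidence band, checks that the plotted line matches the least-squares slope and intercept:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up data: y depends linearly on x plus noise
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)

# Ordinary least-squares fit of a straight line (degree-1 polynomial)
slope, intercept = np.polyfit(x, y, deg=1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")

fig, ax = plt.subplots()
# ci=None skips the bootstrapped band so only the fitted line is drawn
sns.regplot(x=x, y=y, ci=None, ax=ax)
line = ax.lines[0]  # the regression line that regplot added
```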
Linear plot with additional parameters
Python3

sns.set_style('whitegrid')
sns.lmplot(x ='total_bill', y ='tip', data = dataset,
hue ='sex', markers =['o', 'v'])
Output
Explanation

To make these plots more useful for analysis, we can specify hue to separate the data by a categorical variable, and use markers taken from the Matplotlib marker symbols. Since we have two separate categories, we need to pass a list of two symbols when specifying the markers.
Setting the size and color of the plot
Python3

sns.set_style('whitegrid')
sns.lmplot(x ='total_bill', y ='tip', data = dataset, hue ='sex',
markers =['o', 'v'], scatter_kws ={'s':100},
palette ='plasma')
Output

Explanation

In this example seaborn passes Matplotlib parameters through indirectly to affect the scatter plots. The scatter_kws parameter is a dictionary of keyword arguments for the scatter points; here {'s': 100} sets the marker size. Note that scatter_kws changes only the scatter plots, not the regression lines, which remain untouched. The palette parameter changes the colors of the plot. Everything else is the same as explained in the first example.
Displaying multiple plots
Python3

sns.lmplot(x ='total_bill', y ='tip', data = dataset,

col ='sex', row ='time', hue ='smoker')


Output

Explanation

In the above code, we draw multiple plots by specifying a separation with the help of the rows and columns. Each
row contains the plots of tips vs the total bill for the different times specified in the dataset. Each column contains the
plots of tips vs the total bill for the different genders. A further separation is done by specifying the hue parameter on
the basis of whether the person smokes.
Size and aspect ratio of the plots

Python3

sns.lmplot(x='total_bill', y='tip', data=dataset, col='sex',
           row='time', hue='smoker', aspect=0.6,
           height=4, palette='coolwarm')
Output

Explanation

When the output has a large number of plots, setting the facet size and aspect ratio makes them easier to read. height (a scalar) sets the height of each facet in inches, and aspect (a scalar) sets the aspect ratio of each facet, so that aspect * height gives the width of each facet in inches.
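lmplot() combines regplot() with a FacetGrid and returns that grid, which is why it accepts row, col, height and aspect while regplot() does not. A sketch on a made-up stand-in for the tips columns (values generated randomly, not the real dataset):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import seaborn as sns

# Made-up stand-in for the tips columns used above
rng = np.random.RandomState(1)
n = 40
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "sex": np.repeat(["Male", "Female"], n // 2),
    "time": np.tile(["Lunch", "Dinner"], n // 2),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 0.5, n)

# lmplot() = regplot() drawn on every facet of a FacetGrid;
# each facet is 3 inches high and 1.2 * 3 = 3.6 inches wide
g = sns.lmplot(data=df, x="total_bill", y="tip", col="sex", row="time",
               height=3, aspect=1.2, ci=None)
print(type(g).__name__, g.axes.shape)
```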

6.7.4 Regplot

GQ. Explain Regplot?


Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Seaborn helps resolve two major problems faced by Matplotlib; the problems are:
• Default Matplotlib parameters
• Working with data frames
As Seaborn complements and extends Matplotlib, the learning curve is quite gradual. If you know Matplotlib, you are already halfway through Seaborn.
• seaborn.regplot() :
This method is used to plot data and a linear regression model fit. There are a number of mutually exclusive options for estimating the regression model, such as order, logistic, lowess, robust and logx.
• Syntax : seaborn.regplot(x, y, data=None, x_estimator=None, x_bins=None, x_ci='ci', scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=False, dropna=True, x_jitter=None, y_jitter=None, label=None, color=None, marker='o', scatter_kws=None, line_kws=None, ax=None)
• Parameters: The description of some main parameters are given below:
• x, y: These are Input variables. If strings, these should correspond with column names in “data”. When pandas
objects are used, axes will be labeled with the series name.
• data: This is dataframe where each column is a variable and each row is an observation.
• lowess: (optional) This parameter takes a boolean value. If True, statsmodels is used to estimate a nonparametric lowess model (locally weighted linear regression).
• color: (optional) Color to apply to all plot elements.
• marker: (optional) Marker to use for the scatterplot glyphs.
• Return: The Axes object containing the plot.
Below is the implementation of above method:
➢ Example 6.7.14
Python3
# importing required packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("mpg")

# draw regplot
sns.regplot(x = "mpg",
y = "acceleration",
data = data)

# show the plot
plt.show()

# This code is contributed
# by Deepanshu Rustagi.
Output
➢ Example 6.7.15
Python3
# importing required packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("titanic")

# draw regplot
sns.regplot(x = "age",
y = "fare",
data = data,
dropna = True)
# show the plot
plt.show()

# This code is contributed
# by Deepanshu Rustagi.
Output
➢ Example 6.7.16

Python3

# importing required packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("exercise")

# draw regplot
sns.regplot(x = "id",
y = "pulse",
data = data)

# show the plot
plt.show()

# This code is contributed
# by Deepanshu Rustagi.
Output

➢ Example 6.7.17
Python3

# importing required packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("attention")
# draw regplot
sns.regplot(x = "solutions",
y = "score",
data = data)

# show the plot
plt.show()

# This code is contributed
# by Deepanshu Rustagi.
Output
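The mutually exclusive estimation options listed in the regplot() syntax change the fitted model. As an illustration, order=2 asks regplot() for a quadratic polynomial fit. The sketch below uses made-up data and np.polyfit as an independent reference (an assumption of this example) and checks that the drawn curve equals the degree-2 least-squares polynomial. The lowess, robust and logistic fits additionally require statsmodels and are not shown.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up data with a curved (quadratic) trend
rng = np.random.RandomState(3)
x = np.linspace(-3, 3, 60)
y = 0.5 * x**2 - x + rng.normal(scale=0.3, size=x.size)

fig, ax = plt.subplots()
# order=2 fits a second-degree polynomial; ci=None skips the bootstrap band
returned = sns.regplot(x=x, y=y, order=2, ci=None, ax=ax)

coeffs = np.polyfit(x, y, deg=2)  # coefficients, highest power first
line = ax.lines[0]                # the fitted curve drawn by regplot
```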
