46 Seaborn

Seaborn is a Python data visualization library that enhances Matplotlib by addressing issues like default parameters and data frame compatibility. It offers features for styling graphics, visualizing data, and fitting regression models, including methods like regplot and lmplot for plotting relationships between variables. The document also provides examples of importing datasets and creating various plots using Seaborn.

Seaborn & Regression Plot

Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Seaborn helps resolve two major problems faced when using Matplotlib:
•Default Matplotlib parameters
•Working with data frames
As Seaborn complements and extends Matplotlib, the learning curve is quite gradual. If you know Matplotlib, you are already half-way through Seaborn.
The seaborn.regplot() method plots data together with a fitted linear regression model. There are a number of mutually exclusive options for estimating the regression model (for example order, logistic, lowess, robust and logx). They are described in the parameter lists below.
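As a rough illustration of switching between these model options, the sketch below fits the same data twice. The first fit uses the default straight line (order=1) and the second uses a quadratic (order=2). The noisy-parabola data is made up for this example and is not from the original text. The Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: a noisy parabola, y roughly equal to x**2
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 80)
df = pd.DataFrame({"x": x, "y": x**2 + rng.normal(0, 0.5, x.size)})

fig, (ax_lin, ax_poly) = plt.subplots(1, 2, figsize=(8, 3))

# Default model (order=1): a straight-line fit, which cannot follow the curve
sns.regplot(x="x", y="y", data=df, ci=None, ax=ax_lin)
ax_lin.set_title("order=1 (default)")

# order=2: a quadratic fit, estimated with numpy.polyfit
sns.regplot(x="x", y="y", data=df, ci=None, order=2, ax=ax_poly)
ax_poly.set_title("order=2")
```

With ci=None no bootstrap is run, so each panel contains just the scatter points and one fitted line.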

Important Features of Seaborn

Seaborn is built on top of Python's core visualization library, Matplotlib. It is meant to serve as a complement, not a replacement. However, Seaborn comes with some very important features. Let us see a few of them here. The features help with:
•Built-in themes for styling Matplotlib graphics
•Visualizing univariate and bivariate data
•Fitting and visualizing linear regression models
•Plotting statistical time series data
•Working well with NumPy and pandas data structures
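The built-in themes mentioned first in this list can be tried with a short sketch. sns.set_style() changes Matplotlib's global style parameters. sns.axes_style() called with no argument reports the parameters currently in effect.

```python
import seaborn as sns

# Switch every subsequent matplotlib figure to seaborn's "whitegrid" theme
sns.set_style("whitegrid")

# axes_style() with no argument returns a snapshot of the style now in effect
current = sns.axes_style()
print(current["axes.facecolor"], current["axes.grid"])

# The "white" theme keeps the white background but drops the grid
sns.set_style("white")
print(sns.axes_style()["axes.grid"])
```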

Importing Datasets
Once the required libraries are imported, the next step is to load a dataset.

Seaborn provides access to a few important example datasets. They are not bundled with the package. Instead, they are downloaded from the online seaborn-data repository the first time they are requested, then cached locally.
You can use any of these datasets for your learning. Load the required dataset with the following function:

load_dataset()
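A minimal sketch of load_dataset() in use follows. It assumes an internet connection is available the first time a dataset is requested. sns.get_dataset_names() also performs an online lookup.

```python
import seaborn as sns

# Fetches tips.csv from the online seaborn-data repository on first use
# (needs an internet connection) and reuses the local cache afterwards.
tips = sns.load_dataset("tips")
print(tips.shape)   # (rows, columns)
print(tips.head())  # first five rows

# Names of the datasets available in that repository (also an online lookup)
print(sns.get_dataset_names())
```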
seaborn.regplot() method:

Syntax : seaborn.regplot(x, y, data=None, x_estimator=None, x_bins=None, x_ci='ci', scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=False, dropna=True, x_jitter=None, y_jitter=None, label=None, color=None, marker='o', scatter_kws=None, line_kws=None, ax=None)
In Seaborn 0.12 and later, data is the only positional argument. x, y and all other arguments must be passed as keywords, for example sns.regplot(x="mpg", y="acceleration", data=data).
Parameters: Some of the main parameters are described below:
•x, y: Input variables. If strings, these should correspond to column names in data. When pandas objects are used, the axes will be labeled with the series names.
•data: A DataFrame where each column is a variable and each row is an observation.
•lowess: (optional) bool. If True, use statsmodels to estimate a nonparametric lowess model (locally weighted linear regression).
•color: (optional) Color to apply to all plot elements.
•marker: (optional) Marker to use for the scatterplot glyphs.
Return: The matplotlib Axes object containing the plot.
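The parameters above can be seen together in a small sketch. x and y are column names in data, color and marker restyle the plot, and the return value is a Matplotlib Axes. The data frame is hypothetical, and Agg is used only to run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.axes
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: a noisy straight line
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0 + rng.normal(0, 1.0, x.size)})

# x and y name columns of data; color and marker restyle every element
ax = sns.regplot(x="x", y="y", data=df, color="teal", marker="D", ci=None)
ax.set_title("regplot returns a matplotlib Axes")
```

Because x and y come from a DataFrame, the axes are labeled with the column names automatically.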

seaborn.lmplot() method:

seaborn.lmplot() draws a scatter plot with a fitted regression model onto a FacetGrid and returns that FacetGrid.

Syntax : seaborn.lmplot(x, y, data, hue=None, col=None, row=None, palette=None, col_wrap=None, height=5, aspect=1, markers='o', sharex=True, sharey=True, hue_order=None, col_order=None, row_order=None, legend=True, legend_out=True, x_estimator=None, x_bins=None, x_ci='ci', scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, seed=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=True, x_jitter=None, y_jitter=None, scatter_kws=None, line_kws=None, size=None)
Parameters: This method accepts the following parameters:
•x, y: (optional) Column names in data.
•data: The DataFrame holding the variables.
•hue, col, row: (optional) Variables that define subsets of the data, which will be drawn on separate facets in the grid. See the *_order parameters to control the order of levels of these variables.
•palette: (optional) Palette name, list, or dict. Colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors.
•col_wrap: (optional) int. "Wrap" the column variable at this width, so that the column facets span multiple rows. Incompatible with a row facet.
•height: (optional) Height (in inches) of each facet.
•aspect: (optional) Aspect ratio of each facet, so that aspect * height gives the width of each facet in inches.
•markers: (optional) Matplotlib marker code or list of marker codes for the scatterplot. If a list, each marker in the list will be used for each level of the hue variable.
•share{x, y}: (optional) bool, 'col', or 'row'. If True, the facets will share y axes across columns and/or x axes across rows.
•{hue, col, row}_order: (optional) lists. Order for the levels of the faceting variables. By default, this will be the order that the levels appear in data or, if the variables are pandas categoricals, the category order.
•legend: (optional) bool. If True and there is a hue variable, add a legend.
•legend_out: (optional) bool. If True, the figure size will be extended, and the legend will be drawn outside the plot on the center right.
•x_estimator: (optional) Callable that maps vector -> scalar. Apply this function to each unique value of x and plot the resulting estimate. This is useful when x is a discrete variable. If x_ci is given, this estimate will be bootstrapped and a confidence interval will be drawn.
•x_bins: (optional) int or vector. Bin the x variable into discrete bins and then estimate the central tendency and a confidence interval. This binning only influences how the scatterplot is drawn; the regression is still fit to the original data. This parameter is interpreted either as the number of evenly-sized (not necessarily evenly spaced) bins or the positions of the bin centers. When this parameter is used, the default x_estimator becomes numpy.mean.
•x_ci: (optional) "ci", "sd", int in [0, 100] or None. Size of the confidence interval used when plotting a central tendency for discrete values of x. If "ci", defer to the value of the ci parameter. If "sd", skip bootstrapping and show the standard deviation of the observations in each bin.
•scatter: (optional) bool. If True, draw a scatterplot with the underlying observations (or the x_estimator values).
•fit_reg: (optional) bool. If True, estimate and plot a regression model relating the x and y variables.
•ci: (optional) int in [0, 100] or None. Size of the confidence interval for the regression estimate. This will be drawn using translucent bands around the regression line. The confidence interval is estimated using a bootstrap; for large datasets, it may be advisable to avoid that computation by setting this parameter to None.
•n_boot: (optional) int. Number of bootstrap resamples used to estimate the ci. The default value attempts to balance time and stability; you may want to increase this value for "final" versions of plots.
•units: (optional) Variable name in data. If the x and y observations are nested within sampling units, those can be specified here. This will be taken into account when computing the confidence intervals by performing a multilevel bootstrap that resamples both units and observations (within unit). This does not otherwise influence how the regression is estimated or drawn.
•seed: (optional) int, numpy.random.Generator, or numpy.random.RandomState. Seed or random number generator for reproducible bootstrapping.
•order: (optional) int. If order is greater than 1, use numpy.polyfit to estimate a polynomial regression.
•logistic: (optional) bool. If True, assume that y is a binary variable and use statsmodels to estimate a logistic regression model. Note that this is substantially more computationally intensive than linear regression, so you may wish to decrease the number of bootstrap resamples (n_boot) or set ci to None.
•lowess: (optional) bool. If True, use statsmodels to estimate a nonparametric lowess model (locally weighted linear regression). Note that confidence intervals cannot currently be drawn for this kind of model.
•robust: (optional) bool. If True, use statsmodels to estimate a robust regression. This will de-weight outliers. Note that this is substantially more computationally intensive than standard linear regression, so you may wish to decrease the number of bootstrap resamples (n_boot) or set ci to None.
•logx: (optional) bool. If True, estimate a linear regression of the form y ~ log(x), but plot the scatterplot and regression model in the input space. Note that x must be positive for this to work.
•{x, y}_partial: (optional) Strings in data or matrices. Confounding variables to regress out of the x or y variables before plotting.
•truncate: (optional) bool. If True, the regression line is bounded by the data limits. If False, it extends to the x axis limits.
•{x, y}_jitter: (optional) float. Add uniform random noise of this size to either the x or y variables. The noise is added to a copy of the data after fitting the regression, and only influences the look of the scatterplot. This can be helpful when plotting variables that take discrete values.
•{scatter, line}_kws: (optional) dictionaries. Additional keyword arguments passed to plt.scatter and plt.plot.
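Several of the parameters above combine in the following sketch. col draws one facet per group, and height and aspect size each facet. x_bins summarizes the scatter into binned means while the line is still fit to the raw points, and ci=None skips the bootstrap. The three-group data frame is hypothetical, made up only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical long-form data: three groups sharing one linear trend
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "x": np.tile(np.arange(1, 21), 3).astype(float),
    "group": np.repeat(["a", "b", "c"], 20),
})
df["y"] = 0.5 * df["x"] + rng.normal(0, 1.0, len(df))

# col= gives one facet per group; each facet is 3 in tall and 3.6 in wide.
# x_bins=5 draws 5 binned means instead of every point, while the regression
# is still fit to the raw observations; ci=None skips bootstrapping.
g = sns.lmplot(x="x", y="y", data=df, col="group",
               height=3, aspect=1.2, x_bins=5, ci=None)
```

Without col_wrap, the grid's axes array has one row and one column per level of group.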

# importing required packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("mpg")
print(data)

# draw regplot
sns.regplot(x="mpg", y="acceleration", data=data)

# show the plot
plt.show()

      mpg  cylinders  displacement  horsepower  weight  acceleration
0    18.0          8         307.0       130.0    3504          12.0
1    15.0          8         350.0       165.0    3693          11.5
2    18.0          8         318.0       150.0    3436          11.0
3    16.0          8         304.0       150.0    3433          12.0
4    17.0          8         302.0       140.0    3449          10.5
..    ...        ...           ...         ...     ...           ...
393  27.0          4         140.0        86.0    2790          15.6
394  44.0          4          97.0        52.0    2130          24.6
395  32.0          4         135.0        84.0    2295          11.6
396  28.0          4         120.0        79.0    2625          18.6
397  31.0          4         119.0        82.0    2720          19.4

     model_year  origin                       name
0            70     usa  chevrolet chevelle malibu
1            70     usa          buick skylark 320
2            70     usa         plymouth satellite
3            70     usa              amc rebel sst
4            70     usa                ford torino
..          ...     ...                        ...
393          82     usa            ford mustang gl
394          82  europe                  vw pickup
395          82     usa              dodge rampage
396          82     usa                ford ranger
397          82     usa                 chevy s-10

[398 rows x 9 columns]

Some of the datasets available through load_dataset(): 'mpg', 'tips', 'titanic', 'iris'. The full list is returned by sns.get_dataset_names().

# import the library
import seaborn as sns

# load the dataset
dataset = sns.load_dataset('tips')

# the first five entries of the dataset
print(dataset.head())

The x and y parameters name the columns plotted on the x and y axes, and the data parameter specifies the DataFrame that supplies them. sns.set_style('whitegrid') draws a grid behind the plot instead of a plain white background.

sns.set_style('whitegrid')
sns.lmplot(x='total_bill', y='tip', data=dataset,
           hue='sex', markers=['o', 'v'])

To make these plots more useful for analysis, we can specify hue to separate the data by a categorical variable, and choose markers from the matplotlib marker symbols. Since sex has two categories, we pass a list of two symbols to markers, one for each hue level.

Setting the size and color of the plot:

sns.set_style('whitegrid')
sns.lmplot(x='total_bill', y='tip', data=dataset, hue='sex',
           markers=['o', 'v'], scatter_kws={'s': 100},
           palette='plasma')

In this example, Seaborn passes matplotlib parameters through to the underlying scatter call. The scatter_kws parameter is a dictionary of keyword arguments forwarded to the scatter plot; here 's': 100 sets the marker size. Note that scatter_kws changes only the scatter points, not the regression lines, which remain untouched. The palette parameter changes the colors used for the hue levels.
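The regression line has its own counterpart, line_kws, which is forwarded to the line plot in the same way. The sketch below restyles the points and the line independently. It uses regplot on a hypothetical data frame to keep the checks simple, but lmplot accepts the same two dictionaries.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data
rng = np.random.default_rng(7)
x = rng.uniform(0, 50, 40)
df = pd.DataFrame({"total": x, "extra": 0.15 * x + rng.normal(0, 1.0, x.size)})

ax = sns.regplot(
    x="total", y="extra", data=df, ci=None,
    scatter_kws={"s": 100, "alpha": 0.6},       # affects only the points
    line_kws={"color": "red", "linewidth": 3},  # affects only the fitted line
)
```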

regplot vs. lmplot:
•regplot accepts the x and y variables in a variety of formats, including simple numpy arrays, pandas Series objects, or references to variables in a pandas DataFrame.
•lmplot has data as a required parameter, and the x and y variables must be specified as strings. This data format is called "long-form" data.
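The difference in accepted inputs can be checked with a sketch on made-up arrays. regplot takes the NumPy arrays directly and returns the Axes it drew on. lmplot needs the same values in a DataFrame, referenced by column name, and returns the FacetGrid it creates.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.axes
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

x = np.arange(10, dtype=float)
y = 3.0 * x + 2.0

# regplot: plain NumPy arrays work; it draws on a single Axes and returns it
ax = sns.regplot(x=x, y=y, ci=None)

# lmplot: needs a DataFrame plus column names, and builds its own FacetGrid
df = pd.DataFrame({"x": x, "y": y})
g = sns.lmplot(x="x", y="y", data=df, ci=None)

plt.close("all")  # release both figures
```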

# importing required packages
import seaborn as sns
import matplotlib.pyplot as plt

# loading dataset
data = sns.load_dataset("mpg")
print(data)

# draw regplot (plots onto the current Axes)
sns.regplot(x="mpg", y="acceleration", data=data)

# show the plot
plt.show()

# draw lmplot (creates its own figure with a FacetGrid)
sns.lmplot(x="mpg", y="acceleration", data=data)

# show the plot
plt.show()
