Scrib 1

The document discusses data visualization in Python using libraries like Matplotlib and Seaborn, highlighting the importance of visual representation for analyzing complex data sets. It outlines the advantages of data visualization, best practices for creating effective visualizations, and the features of both Matplotlib and Seaborn. Additionally, it describes various types of plots, including box plots and scatter plots, and their respective uses in understanding data distribution and relationships.

Uploaded by

Bhuvan Nagaraja
Copyright
© All Rights Reserved

Data Visualisation in Python using Matplotlib and Seaborn
Last Updated : 09 Nov, 2022



It may seem easier to read through a set of data points and build insights from them
directly, but this rarely yields good results, and a lot can be left undiscovered.
Moreover, most real-world data sets are too large to analyze manually. This is where
data visualization steps in.
Data visualization presents data, however complex, in pictorial form so that trends and
relationships among variables are easier to analyze.
Data visualization has the following advantages:
• Easier representation of complex data
• Highlights well-performing and poorly performing areas
• Explores relationships between data points
• Identifies patterns even in large data sets
While building a visualization, it is good practice to keep the following points in mind:
• Use shapes, colors, and sizes appropriately
• Plots and graphs that use a coordinate system convey the data more clearly
• Choosing a plot that suits the data types makes the information clearer
• Labels, titles, legends, and pointers help convey the information to a wider audience
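As a minimal sketch of the last point, the snippet below adds a title, axis labels, and a legend to a simple Matplotlib line chart. The data values and label strings are made up for illustration and do not come from the article.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data, used only for illustration
months = [1, 2, 3, 4, 5, 6]
sales_a = [10, 12, 15, 14, 18, 21]
sales_b = [8, 9, 13, 16, 17, 19]

fig, ax = plt.subplots(figsize=(6, 4))

# The label= argument is what the legend displays for each line
ax.plot(months, sales_a, marker="o", label="Product A")
ax.plot(months, sales_b, marker="s", label="Product B")

# Title and axis labels tell the reader what is being shown
ax.set_title("Monthly sales by product")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.legend()
fig.tight_layout()
```

In an interactive session, calling plt.show() afterwards would display the figure.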

Python Libraries
Many Python libraries can be used to build visualizations, including matplotlib,
vispy, bokeh, seaborn, pygal, folium, plotly, cufflinks, and networkx. Of these,
matplotlib and seaborn are very widely used for basic to intermediate visualizations.

Matplotlib

Matplotlib is a Python visualization library for 2D plots of arrays. It is a multi-platform
data visualization library built on NumPy arrays and designed to work with the
broader SciPy stack. It was introduced by John Hunter in 2002. Some of the benefits
and features of matplotlib are:
• It is fast and efficient because it is based on NumPy, and plots are easy to build
• It has been improved continuously by the open source community since its inception,
so it now offers advanced features as well
• Its well-maintained, high-quality graphical output draws a lot of users to it
• Basic as well as advanced charts can be built easily
• From the user's or developer's point of view, the large community support makes
resolving issues and debugging much easier

Seaborn

Seaborn was originally conceptualized and built at Stanford University and sits on top
of matplotlib. It keeps some of the flavor of matplotlib while adding a higher-level
interface and extra features aimed at better-looking statistical graphics. Its
advantages include:
• Built-in themes that aid better visualization
• Statistical functions that aid better data insights
• Better aesthetics and built-in plots
• Helpful documentation with effective examples
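To illustrate the built-in themes, here is a minimal sketch, assuming seaborn 0.11 or later (where set_theme is available). It inspects one of the named styles and then applies it globally, so that an ordinary Matplotlib call picks it up. The plotted numbers are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Look at what a built-in style would change before applying it
style_params = sns.axes_style("whitegrid")
grid_enabled = style_params["axes.grid"]

# Apply the theme globally; plots created afterwards use it
sns.set_theme(style="whitegrid")

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [2, 4, 3, 5])  # hypothetical data
ax.set_title("Plain Matplotlib call with a seaborn theme")
```

Other built-in style names include "darkgrid", "white", "dark", and "ticks".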

Nature of Visualization
Depending on the number and type of variables being plotted, different types of charts
can be used to understand the relationship. Based on the count of variables, we can have:
• Univariate plot (involves only one variable)
• Bivariate plot (involves two variables)
A univariate plot of a continuous variable shows its spread and distribution, while for
a discrete variable it shows the count of each value. Similarly, a bivariate plot of two
continuous variables can reveal statistics such as correlation, and a plot of a continuous
variable against a discrete one shows how the data is distributed across the levels of a
categorical variable. A bivariate plot between two discrete variables can also be drawn.

Box plot
A box plot, also known as a box-and-whisker plot, is a good visual representation of how
data is distributed; the box and the whiskers are labelled in the image below. It shows
the median, the quartiles, and the outliers. Understanding the data distribution is an
important factor in building better models. If the data has outliers, a box plot is a
recommended way to identify them so that they can be handled appropriately.
Syntax: seaborn.boxplot(x=None, y=None, hue=None, data=None,
order=None, hue_order=None, orient=None, color=None, palette=None,
saturation=0.75, width=0.8, dodge=True, fliersize=5, linewidth=None,
whis=1.5, ax=None, **kwargs)

Parameters:
x, y, hue: Inputs for plotting long-form data.
data: Dataset for plotting. If x and y are absent, this is interpreted as
wide-form.
color: Color for all of the elements.

Returns: It returns the Axes object with the plot drawn onto it.
A box-and-whisker chart shows how data is spread out. For a horizontal box plot, five
pieces of information are generally included in the chart:

1. The minimum is shown at the far left of the chart, at the end of the left ‘whisker’
2. The first quartile, Q1, is the left edge of the box
3. The median is shown as a line inside the box
4. The third quartile, Q3, is the right edge of the box
5. The maximum is shown at the far right of the chart, at the end of the right ‘whisker’
With the default whis=1.5, the whiskers end at the most extreme points that lie within
1.5 times the interquartile range of the box, and points beyond them are drawn
individually as outliers.
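The five values can be sketched with the standard library alone. The helper name box_plot_summary and the sample data are hypothetical, and the "inclusive" quantile method is an assumption made here; plotting libraries may interpolate quartiles slightly differently.

```python
from statistics import quantiles


def box_plot_summary(values, whis=1.5):
    """Return the values a box plot draws, using the 1.5 * IQR whisker rule."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Points outside these fences are treated as outliers
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]

    return {
        "whisker_low": inside[0],    # smallest point within the fences
        "q1": q1,                    # one edge of the box
        "median": median,            # line inside the box
        "q3": q3,                    # other edge of the box
        "whisker_high": inside[-1],  # largest point within the fences
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical data with one obvious outlier
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Here the value 30 falls beyond the upper fence, so the upper whisker stops at 9 and 30 is reported as an outlier rather than as the maximum of the whisker.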
As the representations and charts below show, a box plot can be drawn for one variable
or for several, giving good insight into the data.
Representation of box plot.

Box plot representing multi-variate categorical variables
# import required modules
import matplotlib.pyplot as plt
import seaborn as sns

# `diabetes` is assumed to be a pandas DataFrame loaded beforehand
# Box plot and violin plot for Outcome vs BloodPressure
_, axes = plt.subplots(1, 2, sharey=True, figsize=(10, 4))
# box plot illustration
sns.boxplot(x='Outcome', y='BloodPressure', data=diabetes, ax=axes[0])
# violin plot illustration
sns.violinplot(x='Outcome', y='BloodPressure', data=diabetes, ax=axes[1])

Output for Box Plot and Violin Plot


# Box plot for all the numerical variables
sns.set(rc={'figure.figsize': (16, 5)})
# multiple box plot illustration
sns.boxplot(data=diabetes.select_dtypes(include='number'))

Output: Multiple Box Plot

Scatter Plot
A scatter plot, or scatter graph, is a bivariate plot that resembles a line graph in the
way it is built. A line graph uses a line on X-Y axes to plot a continuous function, while
a scatter plot uses dots to represent individual data points. Scatter plots are very useful
for checking whether two variables are correlated. A scatter plot can be two-dimensional
or three-dimensional.
Syntax: seaborn.scatterplot(x=None, y=None, hue=None, style=None,
size=None, data=None, palette=None, hue_order=None,
hue_norm=None, sizes=None, size_order=None, size_norm=None,
markers=True, style_order=None, x_bins=None, y_bins=None,
units=None, estimator=None, ci=95, n_boot=1000, alpha='auto',
x_jitter=None, y_jitter=None, legend='brief', ax=None, **kwargs)
Parameters:
x, y: Input data variables that should be numeric.
data: Dataframe where each column is a variable and each row is an
observation.
size: Grouping variable that will produce points with different sizes.
style: Grouping variable that will produce points with different markers.
palette: Colors to use for the different levels of the hue variable.
markers: Object determining how to draw the markers for different
levels.
alpha: Proportional opacity of the points.
Returns: This method returns the Axes object with the plot drawn onto it.
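The sketch below shows how several of these parameters combine in one call. The DataFrame, its column names, and the palette colors are hypothetical and used only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical data: each column is a variable, each row an observation
df = pd.DataFrame({
    "bmi": [22.0, 31.5, 24.1, 26.3, 35.2, 29.8, 23.4, 33.0],
    "glucose": [85, 150, 92, 101, 168, 140, 88, 155],
    "age": [25, 47, 31, 38, 52, 44, 29, 50],
    "outcome": ["no", "yes", "no", "no", "yes", "yes", "no", "yes"],
})

ax = sns.scatterplot(
    data=df,
    x="bmi",            # numeric variable on the x axis
    y="glucose",        # numeric variable on the y axis
    hue="outcome",      # color points by group
    style="outcome",    # use a different marker per group
    size="age",         # scale point size by a third variable
    palette={"no": "tab:blue", "yes": "tab:red"},  # colors for the hue levels
    alpha=0.8,          # slight transparency
)
ax.set_title("BMI vs glucose (hypothetical data)")
```

Mapping the same column to both hue and style makes the groups distinguishable even when the figure is printed in grayscale.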

Advantages of a scatter plot

• Displays correlation between variables
• Suitable for large data sets
• Easier to find data clusters
• Better representation of each data point

# import module
import matplotlib.pyplot as plt

# scatter plot illustration (`diabetes` is assumed to be a DataFrame loaded beforehand)
plt.scatter(diabetes['DiabetesPedigreeFunction'], diabetes['BMI'])

Output: 2D Scatter Plot


# import required modules
from mpl_toolkits.mplot3d import Axes3D

# assign axis values
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5, 6, 2, 3, 13, 4, 1, 2, 4, 8]
z = [2, 3, 3, 3, 5, 7, 9, 11, 9, 10]

# adjust size of plot
sns.set(rc={'figure.figsize': (8, 5)})
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, c='r', marker='o')

# assign labels
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')

# display illustration
plt.show()
