Seaborn
Last Updated : 09 Nov, 2022
It may sometimes seem easier to read through a set of data points and build insights from
them directly, but this approach rarely yields good results. Much can be left undiscovered,
and most of the data sets used in real life are too big to analyze manually. This is
essentially where data visualization steps in.
Data visualization is an easier way of presenting the data, however complex it is, to
analyze trends and relationships amongst variables with the help of pictorial
representation.
The following are the advantages of data visualization:
Easier representation of complex data
Highlights good and bad performing areas
Explores relationships between data points
Identifies data patterns even in larger data sets
While building a visualization, it is good practice to keep the following points in mind:
Ensure appropriate usage of shapes, colors, and sizes
Plots/graphs using a coordinate system are more pronounced
Knowing which plot suits which data type brings more clarity to the information
Labels, titles, legends and pointers convey the information clearly to a wider audience
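As a minimal sketch of the last point, the snippet below adds a title, axis labels and a legend to a simple matplotlib chart. The monthly figures and the names `months`, `sales` and `returns` are made up purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt

# Made-up monthly figures, purely for illustration
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [12, 15, 11, 18]
returns = [2, 3, 1, 4]

fig, ax = plt.subplots(figsize=(6, 4))

# The label given to each line is what the legend displays
ax.plot(months, sales, marker="o", label="Sales")
ax.plot(months, returns, marker="s", label="Returns")

# Title and axis labels tell the reader what is being shown and in which units
ax.set_title("Monthly sales and returns")
ax.set_xlabel("Month")
ax.set_ylabel("Units")
ax.legend()
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()`, or call `fig.savefig(...)` to write the chart to a file.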
Python Libraries
There are many Python libraries that can be used to build visualizations, such as matplotlib,
vispy, bokeh, seaborn, pygal, folium, plotly, cufflinks, and networkx. Of these,
matplotlib and seaborn seem to be the most widely used for basic to intermediate
visualizations.
Matplotlib
Seaborn
Conceptualized and built originally at Stanford University, this library sits on top
of matplotlib. It keeps some of the flavor of matplotlib while adding higher-level features
and more polished default visuals. Below are its
advantages
Built-in themes aid better visualization
Statistical functions aiding better data insights
Better aesthetics and built-in plots
Helpful documentation with effective examples
Nature of Visualization
Depending on the number and type of variables being plotted, different types of charts
can be used to understand the relationship. Based on the count of variables, we could have
Univariate plot (involves only one variable)
Bivariate plot (involves two variables)
A univariate plot for a continuous variable shows the spread and distribution of the
variable, while for a discrete variable it shows the count of each value.
Similarly, a bivariate plot for two continuous variables can display an essential statistic
like correlation, while a plot of a continuous versus a discrete variable can lead to
important conclusions, such as how the data is distributed across the levels of a
categorical variable. A bivariate plot between two discrete variables can also be developed.
Box plot
A boxplot, also known as a box and whisker plot, is a very good visual representation of
how data is distributed; the box and the whiskers are clearly displayed in the image below.
It plots the median, the quartiles and the outliers.
Understanding the data distribution is another important factor which leads to better model
building. If the data has outliers, a box plot is a recommended way to identify them so
that necessary action can be taken.
Syntax: seaborn.boxplot(x=None, y=None, hue=None, data=None,
order=None, hue_order=None, orient=None, color=None, palette=None,
saturation=0.75, width=0.8, dodge=True, fliersize=5, linewidth=None,
whis=1.5, ax=None, **kwargs)
Parameters:
x, y, hue: Inputs for plotting long-form data.
data: Dataset for plotting. If x and y are absent, this is interpreted as
wide-form.
color: Color for all of the elements.
Returns: It returns the Axes object with the plot drawn onto it.
The box and whiskers chart shows how data is spread out. Five pieces of information are
generally included in the chart (described here for a horizontal box plot)
1. The minimum (excluding outliers) is shown at the far left of the chart, at the end of the left ‘whisker’
2. First quartile, Q1, is the far left of the box
3. The median is shown as a line inside the box
4. Third quartile, Q3, is the far right of the box
5. The maximum (excluding outliers) is at the far right of the chart, at the end of the right ‘whisker’
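The five values can be computed by hand with NumPy. The sketch below uses a small made-up sample and applies the 1.5 × IQR rule that corresponds to the `whis=1.5` default in the syntax above: whiskers stop at the most extreme points inside the fences, and anything beyond is treated as an outlier. The quartiles use NumPy's default linear interpolation, so other tools may give slightly different values on small samples.

```python
import numpy as np

# Small made-up sample with one obvious outlier
data = np.array([2, 4, 4, 5, 7, 8, 9, 10, 12, 30])

# Quartiles and median (the edges of the box and the line inside it)
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences for the 1.5 * IQR rule
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that lie inside the fences
whisker_low = data[data >= lower_fence].min()
whisker_high = data[data <= upper_fence].max()

# Points outside the fences are drawn individually as outliers
outliers = data[(data < lower_fence) | (data > upper_fence)]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("whiskers:", whisker_low, "to", whisker_high)
print("outliers:", outliers)
```

For this sample the box spans 4.25 to 9.75 with the median at 7.5, the whiskers run from 2 to 12, and 30 is flagged as an outlier.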
As can be seen in the representations and charts below, a box plot can be plotted for
one or more variables, providing very good insight into the data.
Representation of box plot.
Box plot representing multi-variate categorical variables
# import required modules
import matplotlib.pyplot as plt
import seaborn as sns

# Box plot and violin plot for Outcome vs BloodPressure
# (assumes `diabetes` is a DataFrame with these columns, loaded beforehand)
_, axes = plt.subplots(1, 2, sharey=True, figsize=(10, 4))

# box plot illustration
sns.boxplot(x='Outcome', y='BloodPressure', data=diabetes, ax=axes[0])

# violin plot illustration
sns.violinplot(x='Outcome', y='BloodPressure', data=diabetes, ax=axes[1])
Scatter Plot
A scatter plot, or scatter graph, is a bivariate plot that resembles a line graph in the
way it is built. A line graph uses a line on an X-Y axis to plot a continuous
function, while a scatter plot relies on dots to represent individual pieces of data. These
plots are very useful for seeing whether two variables are correlated. A scatter plot can be
2-dimensional or 3-dimensional.
Syntax: seaborn.scatterplot(x=None, y=None, hue=None, style=None,
size=None, data=None, palette=None, hue_order=None,
hue_norm=None, sizes=None, size_order=None, size_norm=None,
markers=True, style_order=None, x_bins=None, y_bins=None,
units=None, estimator=None, ci=95, n_boot=1000, alpha='auto',
x_jitter=None, y_jitter=None, legend='brief', ax=None, **kwargs)
Parameters:
x, y: Input data variables that should be numeric.
data: Dataframe where each column is a variable and each row is an
observation.
size: Grouping variable that will produce points with different sizes.
style: Grouping variable that will produce points with different markers.
palette: Colors to use for the different levels of the hue
variable.
markers: Object determining how to draw the markers for different
levels.
alpha: Proportional opacity of the points.
Returns: This method returns the Axes object with the plot drawn onto it.
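A minimal sketch of several of the parameters above used together, on synthetic data. The column names (`x`, `y`, `category`, `weight`) are hypothetical, and `hue` is used here as the grouping variable that `palette` colors.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: each row is one observation, each column one variable
rng = np.random.default_rng(3)
n = 120
df = pd.DataFrame({
    "x": rng.normal(0, 1, n),
    "category": rng.choice(["low", "high"], n),
    "weight": rng.uniform(1, 10, n),
})
df["y"] = 2 * df["x"] + rng.normal(0, 0.5, n)

fig, ax = plt.subplots(figsize=(6, 5))
result = sns.scatterplot(
    data=df,
    x="x",
    y="y",
    hue="category",    # color by category
    style="category",  # different marker per category
    size="weight",     # point size scales with weight
    palette="deep",    # colors used for the hue levels
    alpha=0.7,         # partial transparency helps with overlapping points
    ax=ax,
)
result.set_title("Scatter plot with hue, style and size")
```

Because the Axes object is returned, titles, labels and limits can be adjusted afterwards with the usual matplotlib methods.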