00. Data Visualization in Python
This document covers data handling and cleaning, sanity checks, and various charts and plots,
all of which can be used to observe trends, uncover relationships, and present final results.
The major advantage of data visualisation is that it reveals underlying patterns in the raw
numbers that are otherwise difficult for the human eye to see. It is therefore important to
visualise the data to observe how different features behave. The following example shows the
distribution of sales corresponding to specific discount rates in four major cities. From the
raw numbers alone, it is difficult to conclude anything because both summary statistics, the
average and the standard deviation, are equal across cities. However, when sales are plotted
against the discount rate, each city is seen to follow its own trend.
Anscombe’s Quartet
Each branch employs a different strategy to calculate its discount rates, and sales numbers
also differ across branches. Therefore, one should use an appropriate visualisation
technique to ‘look’ into the data.
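The point about identical summary statistics can be checked numerically. The sketch below uses the published Anscombe's quartet values; the city labels and the mapping of x to discount rate and y to sales are assumptions made only to mirror the example in the text.

```python
import pandas as pd

# Anscombe's quartet values. City names and column names are hypothetical labels.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "City A": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "City B": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "City C": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "City D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
               [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Flatten into one long table: one row per (city, discount, sales) observation.
rows = [
    {"city": city, "discount": x, "sales": y}
    for city, (xs, ys) in quartet.items()
    for x, y in zip(xs, ys)
]
df = pd.DataFrame(rows)

# Mean and standard deviation of sales come out nearly identical for every city.
summary = df.groupby("city")["sales"].agg(["mean", "std"]).round(2)
print(summary)
```

Plotting `sales` against `discount` for each city separately is what exposes the four different shapes that the table hides.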
Each block is further divided into several blocks representing the runs scored by the
cricketers in each innings.
Once you load a dataset, it is likely to contain irregularities. The most common ones are
missing values and incorrect data types. Following are some common techniques to address
these issues:
For missing values:
Example:
Example:
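A minimal sketch of two widely used ways to treat missing values (dropping rows and filling with a summary value) and of fixing an incorrect data type, assuming pandas. The DataFrame, its column names, and its values are hypothetical.

```python
import pandas as pd

# Hypothetical dataset: one missing city, one missing sale, numbers stored as text.
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", None, "Chennai"],
    "sales": ["120", "95", "130", None],
})

# Sanity check: count missing values per column.
print(df.isnull().sum())

# Option 1: drop rows where an essential column is missing.
clean = df.dropna(subset=["city"]).copy()

# Fix the data type: convert text to numbers (unparseable entries become NaN).
clean["sales"] = pd.to_numeric(clean["sales"], errors="coerce")

# Option 2: fill the remaining gaps with a summary value such as the median.
clean["sales"] = clean["sales"].fillna(clean["sales"].median())

print(clean.dtypes)
```

Whether to drop or fill depends on how much data is missing and how important the column is; the median is used here only as one reasonable choice.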
Outliers are extreme values that deviate from the other observations in the data. They may
indicate variability in measurement, experimental errors, or a novelty. In other words, an
outlier is an observation that diverges from the overall pattern in a sample. This is where
visualisation becomes useful, and the visualisation best suited to spotting outliers is the
box plot.
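A short sketch of a box plot for outlier detection, assuming seaborn and matplotlib. The run values are hypothetical, and the 1.5 × IQR rule shown alongside is the default whisker rule in these libraries, not necessarily the rule used in the original material.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical runs scored; 250 is deliberately far from the rest.
runs = [20, 25, 31, 34, 38, 40, 42, 45, 47, 52, 250]

# Box plot: points drawn beyond the whiskers are candidate outliers.
fig, ax = plt.subplots()
sns.boxplot(x=runs, ax=ax)
ax.set_xlabel("Runs")

# The same rule in numbers: flag values beyond 1.5 * IQR from the quartiles.
q1, q3 = np.percentile(runs, [25, 75])
iqr = q3 - q1
outliers = [r for r in runs if r < q1 - 1.5 * iqr or r > q3 + 1.5 * iqr]
print(outliers)
```

Call `plt.show()` or `fig.savefig(...)` to render the figure.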
Functionalities of Seaborn:
● Dataset-oriented API
● Analysing univariate and bivariate distributions
● Automatic estimation and plotting of linear regression models
● Convenient views for complex datasets
● Concise control over the style
● Colour palettes
Importing Seaborn:
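Seaborn is conventionally imported under the alias `sns`, usually together with matplotlib, on which it is built. The call to `set_theme()` is optional and assumes seaborn 0.11 or later.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Optional: apply seaborn's default theme to all subsequent matplotlib figures.
sns.set_theme()
```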
Following is an example of a distribution plot. Notice that the left axis has the density for
each bin or bucket instead of the frequency.
Distribution Plot
2. Bar Chart: Bar charts are among the most frequently used chart types. As the name
suggests, a bar chart is composed of a series of bars illustrating a variable’s development.
Given that bar charts are such a common chart type, people are generally familiar with
them and can understand them easily. Examples like the following one are
straightforward to read.
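A minimal bar chart sketch using matplotlib; the city names and sales figures are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical total sales per city.
cities = ["Delhi", "Mumbai", "Chennai", "Kolkata"]
sales = [120, 95, 130, 80]

fig, ax = plt.subplots()
bars = ax.bar(cities, sales)  # one bar per category
ax.set_xlabel("City")
ax.set_ylabel("Sales")
ax.set_title("Sales by city")
```

Call `plt.show()` or `fig.savefig(...)` to render the figure.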
Scatter plots are among the most commonly used and powerful visualisations in the field of
machine learning. They are crucial in revealing relationships between variables, and one
can generally deduce trends in the data with the help of a scatter plot.
Code:
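A minimal scatter plot sketch using matplotlib. The data are synthetic: sales are generated as a noisy linear function of the discount rate purely for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: sales rise with discount, plus random noise.
rng = np.random.default_rng(seed=1)
discount = rng.uniform(0, 30, size=100)
sales = 50 + 2.5 * discount + rng.normal(0, 8, size=100)

fig, ax = plt.subplots()
points = ax.scatter(discount, sales, alpha=0.7)  # one marker per observation
ax.set_xlabel("Discount rate (%)")
ax.set_ylabel("Sales")

# A numeric companion to the visual trend: the correlation coefficient.
r = np.corrcoef(discount, sales)[0, 1]
```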
● A joint plot displays the relationship between two variables. Reg plots, on the other hand,
extend joint plots by adding a regression line to the view.
Heat maps use colours and colour intensities to visualise a range of values. In Python, you
can create a heat map whenever you have a rectangular grid or table of numbers relating
any two features.
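One way to obtain such a rectangular table is to pivot a long dataset on two features and pass the result to `sns.heatmap`. This is a sketch with hypothetical columns and values.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sales by city and quarter, in long format.
df = pd.DataFrame({
    "city": ["Delhi", "Delhi", "Mumbai", "Mumbai", "Chennai", "Chennai"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "sales": [120.0, 135.0, 95.0, 110.0, 130.0, 90.0],
})

# Pivot into a grid: one row per city, one column per quarter.
table = df.pivot_table(index="city", columns="quarter", values="sales", aggfunc="sum")

fig, ax = plt.subplots()
# Colour intensity encodes the value; annot=True prints each number in its cell.
sns.heatmap(table, annot=True, fmt=".0f", cmap="Blues", ax=ax)
```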
A line chart or line graph is a type of chart that displays information as a series of
data points, called markers, connected by straight line segments. It is a basic type of chart
commonly used in a number of domains.
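A minimal line chart sketch using matplotlib; the monthly figures are hypothetical. The `marker` argument draws the data points that the line segments connect.

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [100, 110, 105, 125, 140, 138]

fig, ax = plt.subplots()
(line,) = ax.plot(months, sales, marker="o")  # markers joined by straight segments
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
```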
The plotly Python library is an interactive, open-source plotting library that supports over 40
unique chart types covering a wide range of statistical, financial, geographic, scientific, and
three-dimensional use cases.