AIDS C04-Session-22


21CS2213RA

AI for Data Science

Session 22

Contents: Exploratory data analysis

Session Objective
• An ability to understand exploratory data analysis
• An ability to understand different types of plots, such as bar, line,
and scatter plots
Exploratory Data Analysis (EDA)

• Exploratory Data Analysis is the process of examining the data to
understand it and to extract insights or its main characteristics.
EDA is generally classified into two methods: graphical analysis
and non-graphical analysis.
• EDA is essential because it is good practice to first understand
the problem statement and the relationships between the data
features before getting your hands dirty.
Primary motives of EDA
• Identification of variables and data types
• Analyzing the basic metrics
• Non-graphical Univariate analysis
• Graphical Univariate analysis
• Multivariate analysis
• Missing value treatment
• Correlation analysis
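Several of the items above can be sketched in a few lines of pandas. This is only an illustration: the DataFrame, its column names, and the choice of median filling are assumptions made for the example, not part of the session's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "hours_studied": [2.0, 4.5, np.nan, 6.0, 8.0],
    "score": [51.0, 63.0, 58.0, np.nan, 90.0],
    "section": ["A", "B", "A", "B", "A"],
})

# Identify the variables and their data types.
print(df.dtypes)

# Count the missing values in each column.
missing_counts = df.isna().sum()
print(missing_counts)

# One simple missing value treatment (an assumption, not the only option):
# fill gaps in numeric columns with that column's median.
numeric_cols = df.select_dtypes(include="number").columns
df_filled = df.copy()
df_filled[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Correlation analysis: pairwise Pearson correlation of the numeric columns.
corr = df_filled[numeric_cols].corr()
print(corr)
```

Median filling is used here because it is less sensitive to outliers than the mean; dropping rows with df.dropna() is another common choice.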
Types of EDA
• Univariate Non-graphical
• Multivariate Non-graphical
• Univariate graphical
• Multivariate graphical
Univariate Non-graphical
• This is the simplest form of data analysis, since it uses just
one variable to study the data.
• The standard goal of univariate non-graphical EDA is to understand the
underlying sample distribution of the data and make observations about the
population.
• Outlier detection is also part of the analysis. The
characteristics of the population distribution include:
• Central tendency
• Spread
• Skewness and kurtosis
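As a rough sketch, these characteristics can be computed for a single variable with pandas. The score values below are made up for illustration, and the 1.5 × IQR rule is one common outlier convention, assumed here rather than taken from the slides.

```python
import pandas as pd

# Hypothetical scores for one variable (values invented for the example).
scores = pd.Series([52, 61, 58, 70, 65, 59, 98], name="score")

# Central tendency
mean = scores.mean()
median = scores.median()

# Spread
std = scores.std()
q1 = scores.quantile(0.25)
q3 = scores.quantile(0.75)
iqr = q3 - q1

# Shape of the distribution
skewness = scores.skew()
kurtosis = scores.kurt()

# Outlier detection with the 1.5 * IQR rule (an assumed convention).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print(f"mean={mean:.2f}, median={median}, std={std:.2f}, IQR={iqr}")
print(f"skewness={skewness:.2f}, kurtosis={kurtosis:.2f}")
print("outliers:", outliers.tolist())
```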
Multivariate Non-graphical

• Multivariate non-graphical EDA techniques are usually used to show
the relationship between two or more variables in the form of either
cross-tabulation or statistics.
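A minimal sketch of both forms, assuming a small made-up table with hypothetical columns gender, result, and score: pd.crosstab builds the cross-tabulation, and a grouped mean gives a per-group statistic.

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "result": ["pass", "pass", "fail", "pass", "pass", "fail"],
    "score": [72, 65, 38, 80, 91, 35],
})

# Cross-tabulation: counts for each (gender, result) combination.
table = pd.crosstab(df["gender"], df["result"])
print(table)

# Statistics: mean score within each gender group.
group_means = df.groupby("gender")["score"].mean()
print(group_means)
```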
Univariate graphical
• Non-graphical methods are quantitative and objective, but they do
not give the complete picture of the data; therefore graphical methods,
which involve a degree of subjective analysis, are also required.
Common types of univariate graphics are:
• Histogram
• Stem-and-leaf plots
• Boxplots
• Quantile-normal plots
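Two of these graphics, the histogram and the boxplot, can be sketched with Matplotlib as follows. The score values and the choice of five bins are assumptions made for the example.

```python
import matplotlib.pyplot as plt

# Hypothetical scores for a single variable.
scores = [52, 61, 58, 70, 65, 59, 98, 63, 67, 55]

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: how many values fall into each of 5 equal-width bins.
counts, bin_edges, _ = ax_hist.hist(scores, bins=5)
ax_hist.set_title("Histogram of scores")
ax_hist.set_xlabel("Score")
ax_hist.set_ylabel("Count")

# Boxplot: median, quartiles, and points beyond the whiskers.
ax_box.boxplot(scores)
ax_box.set_title("Boxplot of scores")
ax_box.set_ylabel("Score")

fig.tight_layout()
```

Call plt.show() afterwards to display the figure in an interactive session.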
Multivariate graphical
• Multivariate graphical EDA uses graphics to display relationships
between two or more sets of data. The only one used commonly
is a grouped bar plot, with each group representing one level of one
of the variables and each bar within a group representing a level
of the other variable. Other common types of multivariate graphics
are:
• Scatterplot
• Run chart
• Heat map
• Multivariate chart
• Bubble chart
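The grouped bar plot described above might be drawn like this with Matplotlib. The section names and pass/fail counts are invented for illustration; each group is one section and each bar within a group is one result level.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical counts: one group per section, one bar per result level.
sections = ["A", "B", "C"]
pass_counts = [18, 22, 15]
fail_counts = [4, 3, 7]

x = np.arange(len(sections))  # centre position of each group
width = 0.35                  # width of a single bar

fig, ax = plt.subplots()
bars_pass = ax.bar(x - width / 2, pass_counts, width, label="pass")
bars_fail = ax.bar(x + width / 2, fail_counts, width, label="fail")

ax.set_xticks(x)
ax.set_xticklabels(sections)
ax.set_xlabel("Section")
ax.set_ylabel("Number of students")
ax.set_title("Results by section")
ax.legend()
```

Call plt.show() afterwards to display the figure.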
Steps in EDA
• Exploratory Data Analysis, or EDA, is an important step in any data
analysis or data science project. EDA is the process of investigating
the dataset to discover patterns and anomalies (outliers) and to form
hypotheses based on our understanding of the dataset.
• EDA involves generating summary statistics for numerical data in the
dataset and creating various graphical representations to understand
the data better. In this session, we will understand EDA with the help of
an example dataset. We will use Python (the pandas library) for
this purpose.
Importing libraries
• We will start by importing the libraries we will require for performing
EDA. These include NumPy, Pandas, Matplotlib, and Seaborn.
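The imports might look like the following. The aliases np, pd, plt, and sns are the usual community conventions, assumed here rather than taken from the slides.

```python
import numpy as np               # numerical arrays
import pandas as pd              # DataFrames and descriptive statistics
import matplotlib.pyplot as plt  # basic plotting
import seaborn as sns            # statistical plots built on Matplotlib
```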
Reading data
• Read the data from the CSV file into a pandas dataframe.

• Let us look at the first few rows of our dataset using df.head().
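A minimal sketch of this step is shown here. Since the session's CSV file is not available, the example reads a small in-memory CSV instead; its column names and values are assumptions that only loosely mirror a student-scores dataset.

```python
import io
import pandas as pd

# Stand-in for the real file: hypothetical rows and column names.
csv_text = """name,gender,math,physics,chemistry
Asha,F,78,82,91
Ravi,M,65,70,58
Meena,F,88,79,85
Kiran,M,52,61,67
"""

# With a file on disk this would be pd.read_csv("<path to the CSV file>").
df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default.
print(df.head())
```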
Descriptive Statistics
• Our dataset contains data about different students at a school/college
and their scores in 3 subjects. We need descriptive statistics
for the dataset. We will use describe() for this.
• By setting the include parameter to 'all', we make sure that
categorical features are also included in the result.
For numerical features, fields like mean, standard deviation, percentiles, and
maximum are populated. For categorical features, count, unique, top (most
frequent value), and the corresponding frequency are populated. This gives us a
broad idea of our dataset.
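The describe() call can be sketched as follows. The DataFrame is a hypothetical stand-in for the student dataset, with assumed column names.

```python
import pandas as pd

# Hypothetical stand-in for the student dataset.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "math": [78, 65, 88, 52],
    "physics": [82, 70, 79, 61],
    "chemistry": [91, 58, 85, 67],
})

# include="all" adds categorical columns (count, unique, top, freq)
# alongside the numerical summary (mean, std, min, percentiles, max).
summary = df.describe(include="all")
print(summary)
```

Cells that do not apply to a column, such as the mean of gender, are shown as NaN.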
Important Components for Plotting
You must have:
• Python 3.x
• Python libraries
• Pandas
• NumPy
• SciPy
• Matplotlib
• Seaborn
• Bokeh
Types of Plots
• There are many types of visualizations:
• Line plot
• Scatter plot
• Box plot
• Bar chart
• Pie chart
• Heat map
• Violin plot
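As an illustration of one type from this list, here is a rough sketch of a simple bar chart in Matplotlib. The subject names and average scores are made up for the example.

```python
import matplotlib.pyplot as plt

# Hypothetical average scores per subject.
subjects = ["Math", "Physics", "Chemistry"]
average_scores = [70.75, 73.0, 75.25]

fig, ax = plt.subplots()

# One bar per category; bar height encodes the value.
bars = ax.bar(subjects, average_scores)
ax.set_title("Average Score by Subject")
ax.set_xlabel("Subject")
ax.set_ylabel("Average Score")
```

Call plt.show() afterwards to display the chart.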
Simple Line Plots

• It can display information as a series of data points connected by
straight lines.
• The data points are called markers.
• In this type of plot, we need the measurement points to be ordered
(typically by their x-axis values).
• This type of plot is often used to visualize a trend in data over
intervals of time, i.e. a time series.
SAMPLE LINE CHART

import matplotlib.pyplot as plt

# Total population of Bulgaria for each year
years = [1983, 1984, 1985, 1986, 1987]
total_populations = [8939007, 8954518, 8960387, 8956741, 8943721]

# marker='o' draws a circle at each data point
plt.plot(years, total_populations, marker='o')
plt.title("Year vs Population in Bulgaria")
plt.xlabel("Year")
plt.ylabel("Total Population")
plt.show()
Scatterplots
• Scatterplots show many points plotted
in the Cartesian plane. Each point
represents the values of two variables.
One variable is plotted on the
horizontal axis and the other on the
vertical axis.
• A scatter plot can be created using the
DataFrame.plot.scatter() method.
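A minimal sketch of DataFrame.plot.scatter() is given here, using temperature and rainfall readings placed in a DataFrame. The column names temp and rainfall_perhour are assumptions chosen for the example.

```python
import pandas as pd

# Temperature and rainfall readings held in a DataFrame.
df = pd.DataFrame({
    "temp": [30, 32, 33, 28.5, 35, 29, 29],
    "rainfall_perhour": [10, 11, 12, 16, 12, 14, 13],
})

# x and y name the columns to place on the horizontal and vertical axes.
ax = df.plot.scatter(x="temp", y="rainfall_perhour",
                     title="Temperature vs. Rainfall")
```

The call returns a Matplotlib Axes object, so labels can be adjusted afterwards with ax.set_xlabel() and ax.set_ylabel().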
Example of Simple Scatter Plot

import matplotlib.pyplot as plt

temp = [30, 32, 33, 28.5, 35, 29, 29]
rainfall_perhour = [10, 11, 12, 16, 12, 14, 13]

plt.scatter(temp, rainfall_perhour)
plt.title("Temperature vs. Rainfall")
plt.xlabel("Temperature (deg)")
plt.ylabel("Rainfall (mm)")
plt.show()
Bivariate Analysis: Graphical Representation
[Figure: a scatter plot and a line plot]
Thank you
