The document outlines the process of Exploratory Data Analysis (EDA) for classification using Pandas and Matplotlib, detailing key steps such as data collection, cleaning, visualization, and feature engineering. It provides practical code examples for loading a dataset, inspecting data, handling missing values, and visualizing relationships between variables. The document emphasizes the importance of EDA in understanding data before advanced analytics or modeling.


Experiment-12:

Exploratory Data Analysis for Classification using Pandas and Matplotlib

Introduction to EDA

Exploratory Data Analysis (EDA) is a critical step in the data analysis process, which
involves examining and visualizing data to gain insights and uncover patterns, anomalies,
and relationships within the dataset. EDA helps data analysts and scientists understand the
data they are working with before proceeding to more advanced analytics or modeling.
Below is a detailed explanation of the key steps involved in EDA:
1. Data Collection:
Gather the dataset from various sources, such as databases, CSV files, APIs, or web
scraping.
Ensure that the data is structured and organized for analysis.
2. Data Loading:
Import the dataset into your preferred data analysis environment, such as Python using
libraries like Pandas.
3. Initial Data Inspection:
Examine the first few rows of the dataset to get a sense of its structure and content.
Check the data types, column names, and missing values.
4. Data Cleaning:
Handle missing values by either imputing them or removing rows/columns with missing
data.
Correct data inconsistencies and errors, such as typos and outliers.
Ensure that data types are appropriate for each column (e.g., numeric, categorical).
5. Descriptive Statistics:
Calculate basic statistics for numerical variables, including mean, median, standard
deviation, and quartiles.
Understand the central tendencies and spread of the data.
6. Univariate Analysis:
Visualize the distribution of individual variables through histograms, density plots, box
plots, or bar charts.
Identify outliers and anomalies.
7. Bivariate and Multivariate Analysis:
Explore relationships between pairs of variables through scatter plots, heatmaps, or
correlation matrices.
Investigate how variables interact with each other.
Identify potential predictors for the target variable in a classification or regression task.
8. Data Visualization:
Create meaningful visualizations such as line plots, bar charts, pie charts, and box plots to
represent data patterns.
Use color and labels to make visualizations more interpretable.
9. Feature Engineering:
Create new features based on domain knowledge or insights from the EDA.
Transform variables to better suit the modeling algorithms.
10. Outlier Detection:
Identify and handle outliers that may affect the quality of the analysis or model.
Consider whether outliers should be removed or transformed.
11. Categorical Variable Analysis:
Analyze categorical variables using frequency tables, bar plots, or stacked bar charts.
Understand the distribution of categories within each variable.
12. Time Series Analysis (if applicable):
For time series data, examine trends, seasonality, and autocorrelation.
Decompose time series data to better understand its components.
13. Hypothesis Testing (if applicable):
Perform statistical tests to validate or reject hypotheses about the data.
Common tests include t-tests, chi-squared tests, and ANOVA.
14. Summary and Insights:
Summarize the key findings from the EDA process.
Document interesting patterns, relationships, and potential insights.
15. Data Visualization and Reporting:
Create clear and informative data visualizations for reporting and presentation.
Communicate the results and insights effectively to stakeholders.
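Step 4 (data cleaning) can be illustrated before turning to the full script. The sketch below uses a small hypothetical DataFrame (not the Iris data, which has no missing values) and shows the two common imputation choices: a column median for numeric gaps and the mode for categorical gaps.

```python
import pandas as pd
import numpy as np

# Hypothetical toy frame with missing values
df = pd.DataFrame({
    'SepalLengthCm': [5.1, np.nan, 6.3, 5.8],
    'Species': ['setosa', 'setosa', None, 'virginica'],
})

# Numeric gap: impute with the column median (robust to outliers)
df['SepalLengthCm'] = df['SepalLengthCm'].fillna(df['SepalLengthCm'].median())

# Categorical gap: impute with the most frequent category
# (dropping the row with df.dropna() is the usual alternative)
df['Species'] = df['Species'].fillna(df['Species'].mode()[0])

print(df.isnull().sum().sum())  # 0 missing values remain
```

Whether to impute or drop depends on how much data is missing and whether the gaps are random; both options should be recorded as part of the EDA notes.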
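Step 9 (feature engineering) can be as simple as combining existing columns. As a hedged sketch, a hypothetical "petal area" feature derived from the Iris petal measurements might serve as a rough size proxy; the column name `PetalAreaCm2` is an invention for illustration, not part of the dataset.

```python
import pandas as pd

# Hypothetical rows in the Iris column schema
data = pd.DataFrame({
    'PetalLengthCm': [1.4, 4.7, 6.0],
    'PetalWidthCm': [0.2, 1.4, 2.5],
})

# Derived feature: petal area as a domain-motivated size proxy
data['PetalAreaCm2'] = data['PetalLengthCm'] * data['PetalWidthCm']
print(data['PetalAreaCm2'].tolist())
```

Whether such a feature helps should be checked the same way as any other variable, e.g. via its correlation with the target during the bivariate analysis of step 7.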

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset (the Iris dataset here; adjust the path for your environment)
data = pd.read_csv('/content/Iris.csv')

# Display the first few rows of the dataset to get an overview
print(data.head())

# Summary statistics for numeric columns
print(data.describe())

# Missing value analysis
print("\nMissing Values:")
print(data.isnull().sum())

# Explore missing data visually
plt.figure(figsize=(8, 6))
sns.heatmap(data.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Data')
plt.show()

# Class distribution for classification
class_counts = data['Species'].value_counts()
print(class_counts)

# Visualization of class distribution
plt.figure(figsize=(8, 6))
sns.countplot(x='Species', data=data)
plt.title('Class Distribution')
plt.xlabel('Target Class')
plt.ylabel('Count')
plt.show()

# Import the label encoder
from sklearn import preprocessing

# A LabelEncoder maps string class labels to integer codes
label_encoder = preprocessing.LabelEncoder()

# Encode labels in column 'Species'
data['Species'] = label_encoder.fit_transform(data['Species'])

data['Species'].unique()

# Correlation matrix for numeric features
correlation_matrix = data.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

# Pairplot to visualize relationships between numerical features
sns.pairplot(data, hue='Species')
plt.show()

# Box plots for numerical features vs. target variable
plt.figure(figsize=(12, 8))
for i, feature in enumerate(data.columns[:-1]):
    plt.subplot(2, 3, i + 1)
    sns.boxplot(data=data, x='Species', y=feature)
    plt.title(f'{feature} vs. Target')
plt.tight_layout()
plt.show()

# Box plot of a single numeric feature by class
plt.figure(figsize=(12, 8))
sns.boxplot(x='Species', y='SepalLengthCm', data=data)
plt.title('Box Plot of SepalLengthCm by Class')
plt.xlabel('Target Class')
plt.ylabel('SepalLengthCm')
plt.show()
# Histograms for numeric features
data.hist(bins=20, figsize=(12, 8))
plt.suptitle('Histograms of Numeric Features', y=1.02)
plt.show()

# Distribution plots for numerical features
numerical_features = data.select_dtypes(include=['int64', 'float64']).columns
plt.figure(figsize=(12, 8))
for i, feature in enumerate(numerical_features):
    plt.subplot(2, 3, i + 1)
    sns.histplot(data=data, x=feature, kde=True)
    plt.title(f'{feature} Distribution')
plt.tight_layout()
plt.show()

# Scatter plot for feature relationships
plt.figure(figsize=(8, 6))
sns.scatterplot(data=data, x='SepalLengthCm', y='SepalWidthCm', hue='Species')
plt.title('Scatter Plot of SepalLengthCm vs. SepalWidthCm')
plt.show()

# Pairwise feature correlation with the target variable
correlation_with_target = data.corr()['Species'].abs().sort_values(ascending=False)
print("\nFeature Correlation with Target:")
print(correlation_with_target)
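The script above stops at correlation; step 10 (outlier detection) can be added in the same style. Below is a minimal, self-contained sketch of the common IQR rule, shown on a hypothetical series with one obvious outlier rather than a specific Iris column.

```python
import pandas as pd

# Hypothetical measurements with one obvious outlier
s = pd.Series([4.9, 5.0, 5.1, 5.2, 5.3, 9.9])

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
print(outliers.tolist())  # [9.9]
```

Flagged points should then be inspected: a data-entry error can be dropped or corrected, while a genuine extreme value may be kept or transformed (step 10's "removed or transformed" decision).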
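Step 13 mentions ANOVA as a common hypothesis test. As a hedged sketch, a one-way ANOVA via SciPy's `scipy.stats.f_oneway` (assuming SciPy is installed alongside pandas) can test whether a feature's mean differs across the classes; the small per-class samples below are made up for illustration.

```python
from scipy import stats

# Hypothetical per-class samples of one numeric feature
setosa = [5.0, 5.1, 4.9, 5.0]
versicolor = [5.9, 6.0, 6.1, 5.8]
virginica = [6.5, 6.6, 6.4, 6.7]

# One-way ANOVA: do the class means differ significantly?
f_stat, p_value = stats.f_oneway(setosa, versicolor, virginica)
print(p_value < 0.05)
```

A small p-value suggests the feature separates the classes and is worth keeping as a predictor; in the real experiment the lists would be replaced by the per-species slices of a column such as `SepalLengthCm`.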
