Exp 12

Ml lab exp 12

Uploaded by

g.monikadevi

Experiment-12
Exploratory Data Analysis for Classification using Pandas or Matplotlib.

Introduction to EDA

[Figure: overview of EDA. Recoverable labels: Data Preparation, Structure, Data Exploration, Insight, Data Analysis, Reports, Visual Graphs]

Exploratory Data Analysis (EDA) is a critical step in the data analysis process. It involves examining and visualizing data to gain insights and uncover patterns, anomalies, and relationships within the dataset. EDA helps data analysts and scientists understand the data they are working with before proceeding to more advanced analytics or modeling. Below is a detailed explanation of the key steps involved in EDA:

1. Data Collection:
- Gather the dataset from various sources, such as databases, CSV files, APIs, or web scraping.
- Ensure that the data is structured and organized for analysis.

2. Data Loading:
- Import the dataset into your preferred data analysis environment, such as Python using libraries like Pandas.

3. Initial Data Inspection:
- Examine the first few rows of the dataset to get a sense of its structure and content.
- Check the data types, column names, and missing values.

4. Data Cleaning:
- Handle missing values by either imputing them or removing rows/columns with missing data.
- Correct data inconsistencies and errors, such as typos and outliers.
- Ensure that data types are appropriate for each column (e.g., numeric, categorical).

5. Descriptive Statistics:
- Calculate basic statistics for numerical variables, including mean, median, standard deviation, and quartiles.
- Understand the central tendencies and spread of the data.

6. Univariate Analysis:
- Visualize the distribution of individual variables through histograms, density plots, box plots, or bar charts.
- Identify outliers and anomalies.

7. Bivariate and Multivariate Analysis:
- Explore relationships between pairs of variables through scatter plots, heatmaps, or correlation matrices.
- Investigate how variables interact with each other.
- Identify potential predictors for the target variable in a classification or regression task.

8. Data Visualization:
- Create meaningful visualizations such as line plots, bar charts, pie charts, and box plots to represent data patterns.
- Use color and labels to make visualizations more interpretable.

9. Feature Engineering:
- Create new features based on domain knowledge or insights from the EDA.
- Transform variables to better suit the modeling algorithms.

10. Outlier Detection:
- Identify and handle outliers that may affect the quality of the analysis or model.
- Consider whether outliers should be removed or transformed.

11. Categorical Variable Analysis:
- Analyze categorical variables using frequency tables, bar plots, or stacked bar charts.
- Understand the distribution of categories within each variable.

12. Time Series Analysis (if applicable):
- For time series data, examine trends, seasonality, and autocorrelation.
- Decompose time series data to better understand its components.

13. Hypothesis Testing (if applicable):
- Perform statistical tests to validate or reject hypotheses about the data.
- Common tests include t-tests, chi-squared tests, and ANOVA.

14. Summary and Insights:
- Summarize the key findings from the EDA process.
- Document interesting patterns, relationships, and potential insights.

15. Data Visualization and Reporting:
- Create clear and informative data visualizations for reporting and presentation.
- Communicate the results and insights effectively to stakeholders.
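Steps 4 (Data Cleaning) and 10 (Outlier Detection) can be sketched on a tiny table before working with a real file. The example below is only an illustration under stated assumptions: the values and the column names sepal_length and species are made up, median imputation is one of several reasonable choices, and the 1.5 * IQR rule is a common convention rather than the only way to flag outliers.

```python
import numpy as np
import pandas as pd

# Made-up measurements for illustration only (not taken from any real file).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, np.nan, 5.0, 5.4, 12.0, 4.6, 5.0],
    "species": ["setosa", "setosa", "setosa", None,
                "setosa", "setosa", "setosa", "setosa"],
})

# Step 4: handle missing values.
# Numeric column: impute with the median, which is less sensitive to
# extreme values than the mean.
median_length = df["sepal_length"].median()
df["sepal_length"] = df["sepal_length"].fillna(median_length)
# Label column: drop rows that have no class label.
df = df.dropna(subset=["species"]).reset_index(drop=True)

# Step 10: flag outliers with the 1.5 * IQR rule.
q1 = df["sepal_length"].quantile(0.25)
q3 = df["sepal_length"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["is_outlier"] = ~df["sepal_length"].between(lower, upper)

print(df)
print("Outlier rows:", df.index[df["is_outlier"]].tolist())
```

Flagging outliers in a separate column, instead of deleting them immediately, leaves the decision to remove or transform them open, as step 10 suggests.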
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load your dataset (replace '/content/Iris.csv' with your dataset's file path)
data = pd.read_csv('/content/Iris.csv')

# Display the first few rows of the dataset to get an overview
print(data.head())

# Summary statistics for numeric columns
print(data.describe())

# Missing value analysis
print("\nMissing Values:")
print(data.isnull().sum())

# Explore missing data
plt.figure(figsize=(8, 6))
sns.heatmap(data.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Data')
plt.show()

# Class distribution for classification
class_counts = data['Species'].value_counts()
print(class_counts)

# Visualization of class distribution
plt.figure(figsize=(8, 6))
sns.countplot(x='Species', data=data)
plt.title('Class Distribution')
plt.xlabel('Target Class')
plt.ylabel('Count')
plt.show()

# Import label encoder
from sklearn import preprocessing

# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()

# Encode labels in column 'Species'.
data['Species'] = label_encoder.fit_transform(data['Species'])
print(data['Species'].unique())

# Correlation matrix for numeric features
# (every column is numeric once 'Species' has been encoded)
correlation_matrix = data.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

# Pairplot to visualize relationships between numerical features
sns.pairplot(data, hue='Species')
plt.show()

# Box plots for numerical features vs. target variable
# (assumes 'Species' is the last column and at most 6 feature columns)
plt.figure(figsize=(12, 8))
for i, feature in enumerate(data.columns[:-1]):
    plt.subplot(2, 3, i + 1)
    sns.boxplot(data=data, x='Species', y=feature)
    plt.title(f'{feature} vs. Target')
plt.tight_layout()
plt.show()

# Box plot for one numeric feature by class
plt.figure(figsize=(12, 8))
sns.boxplot(x='Species', y='SepalLengthCm', data=data)
plt.title('Box Plot of SepalLengthCm by Class')
plt.xlabel('Target Class')
plt.ylabel('Feature')
plt.show()

# Histograms for numeric features
data.hist(bins=20, figsize=(12, 8))
plt.suptitle('Histograms of Numeric Features', y=1.02)
plt.show()

# Distribution plots for numerical features
numerical_features = data.select_dtypes(include=['int64', 'float64']).columns
plt.figure(figsize=(12, 8))
for i, feature in enumerate(numerical_features):
    plt.subplot(2, 3, i + 1)
    sns.histplot(data=data, x=feature, kde=True)
    plt.title(f'{feature} Distribution')
plt.tight_layout()
plt.show()

# Scatter plot for feature relationships
plt.figure(figsize=(8, 6))
sns.scatterplot(data=data, x='SepalLengthCm', y='SepalWidthCm', hue='Species')
plt.title("Scatter Plot between Feature1 and Feature2")
plt.show()

# Pairwise feature correlation with the target variable
correlation_with_target = data.corr()['Species'].abs().sort_values(ascending=False)
print("\nFeature Correlation with Target:")
print(correlation_with_target)
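The listing above does not exercise step 9 (Feature Engineering) or the frequency-table part of step 11 (Categorical Variable Analysis). A minimal sketch of both follows. It rests on assumptions: the rows are made up, the column names SepalLengthCm, SepalWidthCm, and Species are borrowed from the listing, and the derived names SepalRatio and SepalSize as well as the bin edges are hypothetical choices for illustration.

```python
import pandas as pd

# Made-up rows for illustration; column names follow the listing above.
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# Step 9: derive a ratio feature and a binned (categorical) feature.
df["SepalRatio"] = df["SepalLengthCm"] / df["SepalWidthCm"]
df["SepalSize"] = pd.cut(
    df["SepalLengthCm"],
    bins=[4.0, 5.5, 6.5, 8.0],          # hypothetical cut points
    labels=["short", "medium", "long"],
)

# Step 11: relative frequency of each class, and a cross-tabulation of
# the class label against the binned feature.
class_share = df["Species"].value_counts(normalize=True)
size_by_species = pd.crosstab(df["Species"], df["SepalSize"])

print(class_share)
print(size_by_species)
```

A cross-tabulation like size_by_species gives a quick check of whether an engineered feature separates the classes before it is passed to a model.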
