EDA Python Guide

Uploaded by

Muhammad Faizan

Exploratory Data Analysis in Python

1. Loading Libraries

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from scipy import stats

from sklearn.preprocessing import MinMaxScaler, StandardScaler


2. Loading the Dataset

# Example: Loading a CSV file

df = pd.read_csv('your_dataset.csv')

3. Data Overview

# Display the first few rows of the dataset

print(df.head())

# Display summary statistics

print(df.describe())

# Display information about the dataset

# info() prints its report directly and returns None, so it needs no print()
df.info()
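The overview calls above assume a file named 'your_dataset.csv' exists. As a minimal sketch that runs without any file, the snippet below reads a small inline CSV (the column names id, score, and group are made up for illustration) and builds a one-table overview of dtypes, missing counts, and unique counts.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for 'your_dataset.csv'
csv_text = """id,score,group
1,3.5,a
2,,b
3,4.0,a
"""

sample = pd.read_csv(io.StringIO(csv_text))

# One row per column: data type, number of missing values, number of distinct values
overview = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "n_missing": sample.isnull().sum(),
    "n_unique": sample.nunique(),
})

print(sample.shape)
print(overview)
```

A compact table like this is often easier to scan than the separate head(), describe(), and info() outputs when the dataset has many columns.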

4. Cleaning Data

# Handling missing values

print(df.isnull().sum())

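# Fill numeric columns with their mean; numeric_only=True skips text columns,
# which would otherwise raise a TypeError in pandas 2.x
df.fillna(df.mean(numeric_only=True), inplace=True)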

# Handling duplicates

print(df.duplicated().sum())

df.drop_duplicates(inplace=True)
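Mean imputation only applies to numeric columns and is sensitive to extreme values. The following is a sketch of one possible alternative, not the guide's own method: it fills numeric gaps with the median and categorical gaps with the most frequent value, then drops duplicates. The toy columns age and segment are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one numeric and one categorical column, each with a gap
toy = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 25.0, 31.0],
    "segment": ["A", "B", None, "A", "A"],
})

# Numeric columns: fill gaps with the column median
num_cols = toy.select_dtypes(include="number").columns
toy[num_cols] = toy[num_cols].fillna(toy[num_cols].median())

# Non-numeric columns: fill gaps with the most frequent value (the mode)
cat_cols = toy.select_dtypes(exclude="number").columns
for col in cat_cols:
    toy[col] = toy[col].fillna(toy[col].mode().iloc[0])

# Drop exact duplicate rows and renumber the index
toy = toy.drop_duplicates().reset_index(drop=True)

print(toy)
```

Whether the median, the mean, or dropping rows is appropriate depends on why the values are missing, so treat this as a starting point rather than a default.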

5. Preprocessing Data

# Encoding categorical variables

df = pd.get_dummies(df, columns=['categorical_column'])

# Feature Engineering

df['new_feature'] = df['existing_feature1'] * df['existing_feature2']
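To make the two preprocessing steps concrete, here is a small sketch on made-up data. The column names color, price, qty, and revenue are assumptions for illustration; dtype=int is passed so the indicator columns hold 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical toy data with one categorical and two numeric columns
toy = pd.DataFrame({
    "color": ["red", "green", "red"],
    "price": [10, 20, 30],
    "qty": [1, 2, 3],
})

# One-hot encode 'color'; the original column is replaced by one indicator per category
encoded = pd.get_dummies(toy, columns=["color"], dtype=int)

# Feature engineering: derive a new column from two existing ones
encoded["revenue"] = encoded["price"] * encoded["qty"]

print(encoded)
```

If the encoded columns feed a linear model, passing drop_first=True to get_dummies is one way to avoid perfectly collinear indicators.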


6. Outlier Detection and Treatment

# Using Z-score to identify outliers

z_scores = stats.zscore(df['column_name'])

abs_z_scores = np.abs(z_scores)

filtered_entries = (abs_z_scores < 3)

df = df[filtered_entries]
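In a small sample, a single extreme value inflates the standard deviation, so the |z| < 3 rule can fail to flag it. The sketch below shows this on made-up numbers and applies the interquartile-range (IQR) rule as an alternative. The IQR rule is a swapped-in technique, not part of the original guide, and the 1.5 multiplier is the conventional choice rather than a requirement.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy column with one obviously extreme value
toy = pd.DataFrame({"value": [10, 12, 11, 13, 12, 11, 100]})

# Z-score rule: count rows with |z| >= 3
z = np.abs(stats.zscore(toy["value"]))
z_outliers = int((z >= 3).sum())

# IQR rule: keep rows within [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1 = toy["value"].quantile(0.25)
q3 = toy["value"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
filtered = toy[toy["value"].between(lower, upper)]

print("Rows flagged by z-score rule:", z_outliers)
print("Rows kept by IQR rule:", len(filtered))
```

Because quartiles are barely affected by a single extreme point, the IQR bounds stay tight around the bulk of the data here while the z-score rule flags nothing.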

7. Scaling and Normalization

# Min-Max Scaling

scaler = MinMaxScaler()

df[['column1', 'column2']] = scaler.fit_transform(df[['column1', 'column2']])

# Alternatively, for Standardization

# scaler = StandardScaler()

# df[['column1', 'column2']] = scaler.fit_transform(df[['column1', 'column2']])
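As a check on what the two scalers compute, the sketch below compares them with the formulas written out by hand on made-up data: min-max scaling is (x - min) / (max - min), and standardization is (x - mean) / std, where StandardScaler uses the population standard deviation (ddof=0).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data
toy = pd.DataFrame({
    "column1": [1.0, 2.0, 3.0, 4.0],
    "column2": [10.0, 20.0, 30.0, 50.0],
})

# Min-max scaling: library result vs. the formula applied per column
minmax = MinMaxScaler().fit_transform(toy)
manual_minmax = ((toy - toy.min()) / (toy.max() - toy.min())).to_numpy()

# Standardization: library result vs. the formula with population std (ddof=0)
standard = StandardScaler().fit_transform(toy)
manual_standard = ((toy - toy.mean()) / toy.std(ddof=0)).to_numpy()

print(np.allclose(minmax, manual_minmax))
print(np.allclose(standard, manual_standard))
```

Note that pandas' own std() defaults to ddof=1, so a hand-rolled standardization without ddof=0 will differ slightly from StandardScaler.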


8. Data Visualization (Examples)

# Histogram

plt.figure(figsize=(10, 6))

sns.histplot(df['column_name'], kde=True)

plt.title('Histogram of column_name')

plt.show()

# Boxplot

plt.figure(figsize=(10, 6))

sns.boxplot(x=df['column_name'])

plt.title('Boxplot of column_name')

plt.show()

# Scatter plot

plt.figure(figsize=(10, 6))

sns.scatterplot(x='column1', y='column2', data=df)

plt.title('Scatter plot between column1 and column2')

plt.show()

# Heatmap for correlation

plt.figure(figsize=(12, 8))

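# numeric_only=True restricts the correlation matrix to numeric columns;
# without it, pandas 2.x raises an error when text columns are present
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')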

plt.title('Correlation Heatmap')

plt.show()

9. Summarizing Findings

print("Key Findings:")

print("1. Description of key patterns or anomalies.")

print("2. Potential relationships between features.")

print("3. Insights on missing values and outliers.")
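The print statements above are placeholders. As one possible way to fill them from the data itself, the sketch below defines a hypothetical helper, summarize_findings (not part of pandas or of the original guide), that reports missing values, duplicate rows, and the most strongly correlated pair of numeric columns.

```python
import numpy as np
import pandas as pd


def summarize_findings(df):
    """Return short, data-driven finding strings (hypothetical helper)."""
    findings = []

    # Columns that contain at least one missing value
    missing = df.isnull().sum()
    missing = {col: int(n) for col, n in missing.items() if n > 0}
    findings.append(f"Columns with missing values: {missing}")

    # Number of rows that exactly repeat an earlier row
    findings.append(f"Duplicate rows: {int(df.duplicated().sum())}")

    # Strongest absolute correlation among numeric columns
    corr = df.corr(numeric_only=True).abs()
    vals = corr.to_numpy(copy=True)
    # Mask the diagonal and lower triangle so each pair is considered once
    vals[np.tril_indices_from(vals)] = np.nan
    if vals.size and not np.isnan(vals).all():
        i, j = np.unravel_index(np.nanargmax(vals), vals.shape)
        findings.append(
            f"Strongest correlation: {corr.index[i]} vs {corr.columns[j]} "
            f"(|r| = {vals[i, j]:.2f})"
        )
    return findings


# Usage on hypothetical toy data
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.5],
    "z": [4.0, 1.0, 3.0, np.nan],
})
findings = summarize_findings(toy)
print("Key Findings:")
for number, line in enumerate(findings, start=1):
    print(f"{number}. {line}")
```

A helper like this only surfaces candidates. Interpreting whether a correlation or a gap matters still requires looking at the plots and the domain context.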