EDA Python Guide
1. Loading Libraries
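import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.preprocessing import MinMaxScaler, StandardScaler
# Load the dataset (replace the path with your own file)
df = pd.read_csv('your_dataset.csv')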
Exploratory Data Analysis in Python
3. Data Overview
print(df.head())
print(df.describe())
df.info()  # info() prints its report directly and returns None, so no print() is needed
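The three overview calls can also be condensed into one per-column table. The following is a minimal sketch under stated assumptions: the `toy` DataFrame and the `overview` helper are hypothetical stand-ins for illustration and are not part of the original guide.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for your_dataset.csv
toy = pd.DataFrame({
    "age": [25, 32, np.nan, 41],
    "city": ["Oslo", "Lima", "Lima", None],
    "income": [52000.0, 61000.0, 58000.0, 75000.0],
})

def overview(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, and unique count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isnull().sum(),
        "unique": frame.nunique(),  # nunique() ignores missing values by default
    })

summary = overview(toy)
print(summary)
```

A table like this makes it easier to spot columns that need cleaning before moving on to the next step.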
4. Cleaning Data
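# Handling missing values
print(df.isnull().sum())
# Fill numeric gaps with the column mean; numeric_only=True avoids errors on text columns
df.fillna(df.mean(numeric_only=True), inplace=True)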
# Handling duplicates
print(df.duplicated().sum())
df.drop_duplicates(inplace=True)
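Mean imputation only applies to numeric columns, so text columns keep their gaps. One possible alternative, sketched here on a hypothetical `toy` DataFrame, fills numeric columns with the median and other columns with the most frequent value, then drops duplicate rows. The choice of median and mode is an assumption for illustration, not a recommendation from the original guide.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in both a numeric and a text column
toy = pd.DataFrame({
    "age": [25.0, np.nan, 41.0, 33.0, 33.0],
    "city": ["Oslo", "Lima", None, "Lima", "Lima"],
})

cleaned = toy.copy()
for col in cleaned.columns:
    if pd.api.types.is_numeric_dtype(cleaned[col]):
        # Numeric column: fill with the median, which is less sensitive to outliers than the mean
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())
    else:
        # Non-numeric column: fill with the most frequent value
        cleaned[col] = cleaned[col].fillna(cleaned[col].mode().iloc[0])

# Remove exact duplicate rows and renumber the index
cleaned = cleaned.drop_duplicates().reset_index(drop=True)
print(cleaned)
```

Working on a copy rather than using `inplace=True` keeps the raw data available for comparison.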
5. Preprocessing Data
df = pd.get_dummies(df, columns=['categorical_column'])
# Feature Engineering
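# Remove outliers: keep rows whose absolute z-score is below 3 (a common threshold, assumed here)
z_scores = stats.zscore(df['column_name'])
abs_z_scores = np.abs(z_scores)
filtered_entries = abs_z_scores < 3
df = df[filtered_entries]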
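To see what the z-score filter does, here is a small worked sketch that computes the scores by hand with pandas. The sample values and the threshold of 3 are assumptions chosen for illustration. The population standard deviation (`ddof=0`) is used because that is the default in `scipy.stats.zscore`.

```python
import pandas as pd

# Hypothetical sample: eleven typical values and one extreme value
values = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95], name="column_name")

# z-score: distance from the mean in units of standard deviation
z = (values - values.mean()) / values.std(ddof=0)

# Keep only observations within 3 standard deviations of the mean
mask = z.abs() < 3
filtered = values[mask]

print(z.round(2).tolist())
print(filtered.tolist())
```

Note that with ten or fewer observations no point can reach an absolute z-score of 3 under the population standard deviation, so this filter only makes sense on reasonably sized samples.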
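# Min-Max Scaling (rescales values to the range [0, 1])
scaler = MinMaxScaler()
# Alternative: standardization (zero mean, unit variance)
# scaler = StandardScaler()
# Apply the chosen scaler; scikit-learn expects a 2-D input, hence the double brackets
df[['column_name']] = scaler.fit_transform(df[['column_name']])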
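The two scalers correspond to simple formulas, which can be checked by hand. This is a minimal sketch on a hypothetical four-value column: min-max scaling computes (x - min) / (max - min), and standardization computes (x - mean) / std using the population standard deviation, which matches what `StandardScaler` does.

```python
import pandas as pd

# Hypothetical numeric column
col = pd.Series([2.0, 4.0, 6.0, 10.0], name="column_name")

# Min-max scaling: maps the smallest value to 0 and the largest to 1
min_max = (col - col.min()) / (col.max() - col.min())

# Standardization: zero mean and unit (population) standard deviation
standardized = (col - col.mean()) / col.std(ddof=0)

print(min_max.tolist())
print(standardized.round(3).tolist())
```

Min-max scaling preserves the shape of the distribution but is sensitive to extreme values, since a single outlier sets the range for everything else.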
# Histogram
plt.figure(figsize=(10, 6))
sns.histplot(df['column_name'], kde=True)
plt.title('Histogram of column_name')
plt.show()
# Boxplot
plt.figure(figsize=(10, 6))
sns.boxplot(x=df['column_name'])
plt.title('Boxplot of column_name')
plt.show()
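The boxplot's box and whiskers come from the quartiles. As a rough sketch of the numbers behind the picture, the code below computes the quartiles and the 1.5 × IQR fences that matplotlib and seaborn use by default to flag outlier points. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample with one unusually large value
col = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], name="column_name")

# First and third quartiles mark the edges of the box
q1 = col.quantile(0.25)
q3 = col.quantile(0.75)
iqr = q3 - q1

# Points beyond these fences are drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = col[(col < lower_fence) | (col > upper_fence)]

print(q1, q3, iqr)
print(lower_fence, upper_fence)
print(outliers.tolist())
```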
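# Scatter plot
plt.figure(figsize=(10, 6))
# 'column_x' and 'column_y' are placeholder names; substitute two numeric columns
sns.scatterplot(x='column_x', y='column_y', data=df)
plt.title('Scatter plot of column_x vs column_y')
plt.show()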
# Correlation heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
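A heatmap is easy to scan visually, but a ranked list of the strongest pairs can be more convenient for reporting. The following sketch builds a correlation matrix on synthetic data and lists each column pair once, sorted by absolute correlation. The `toy` DataFrame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data: y depends strongly on x, noise is unrelated
rng = np.random.default_rng(0)
x = rng.normal(size=200)
toy = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.1, size=200),
    "noise": rng.normal(size=200),
})

corr = toy.corr(numeric_only=True)

# Keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flatten to (column, column) pairs and sort by absolute correlation
pairs = upper.stack().dropna().sort_values(key=np.abs, ascending=False)
print(pairs)
```

Sorting by absolute value keeps strong negative correlations near the top alongside strong positive ones.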
9. Summarizing Findings
print("Key Findings:")
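The findings themselves depend on the dataset, but part of the summary can be generated from the data. This is one possible sketch, using a hypothetical `toy` DataFrame; which statistics to report is an assumption and should be adapted to the analysis at hand.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the analyzed dataset
toy = pd.DataFrame({
    "age": [25, 32, np.nan, 41],
    "income": [52000, 61000, 58000, 75000],
})

# Columns that contain at least one missing value
missing_cols = toy.columns[toy.isnull().any()].tolist()

findings = [
    f"{len(toy)} rows and {toy.shape[1]} columns",
    f"Columns with missing values: {missing_cols}",
    f"Mean income: {toy['income'].mean():.0f}",
]

print("Key Findings:")
for item in findings:
    print(f"- {item}")
```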