
ML Complete Notes Hridoy

The document outlines a comprehensive data analysis workflow using Python, including data preprocessing, visualization, and model preparation. It details steps such as handling missing values, normalizing features, and creating various visualizations like heatmaps and scatter plots. Additionally, it describes the implementation of multiple machine learning models, including Linear Regression, Logistic Regression, Decision Trees, and more, along with their evaluation metrics.

Uploaded by

Istiak Utsab
Copyright
© All Rights Reserved

from google.colab import drive

drive.mount('/content/drive/') # Mount Google Drive inside the Colab runtime

import pandas as pd

import numpy as np

# Assuming the file is in the 'My Drive' folder of your Google Drive

file_path = '/content/drive/My Drive/healthcare-dataset/train.csv'

df = pd.read_csv(file_path) # Load the CSV into a DataFrame


Part 1: Data Preprocessing

import pandas as pd # Import pandas for data manipulation

data = pd.read_csv("Data.csv") # Load the dataset

data.head() # View first few rows of the dataset

data.isnull().sum() # Check for missing values

# Choose ONE of the following strategies for missing values:

data.fillna(data.mean(numeric_only=True), inplace=True) # Replace NaN with the column mean (numeric columns only)

data.fillna(data.median(numeric_only=True), inplace=True) # Replace NaN with the column median (numeric columns only)

for column in data.columns:
    data[column] = data[column].fillna(data[column].mode()[0]) # Replace NaN with the most frequent value (mode)

data.fillna(0, inplace=True) # Replace all NaN with 0 (or any chosen constant)

data.dropna(inplace=True) # Remove every row that contains a NaN

data.dropna(axis=1, inplace=True) # Remove every column that contains a NaN
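
The filling strategies above can be combined so that each column type gets a suitable fill. The sketch below is only an illustration on a made-up table: the `toy` DataFrame and its values are hypothetical, and using the median for numeric columns and the mode for text columns is one reasonable choice, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are made up for illustration.
toy = pd.DataFrame({
    "Age": [25.0, np.nan, 35.0, 40.0],
    "Salary": [50000.0, 60000.0, np.nan, 80000.0],
    "Gender": ["Male", "Female", None, "Female"],
})

# Split the columns by type so each group gets its own strategy.
numeric_cols = toy.select_dtypes(include="number").columns
categorical_cols = toy.columns.difference(numeric_cols)

imputed = toy.copy()

# Numeric columns: fill with the column median (less sensitive to outliers than the mean).
imputed[numeric_cols] = imputed[numeric_cols].fillna(imputed[numeric_cols].median())

# Text columns: fill with the most frequent value (mode).
for col in categorical_cols:
    imputed[col] = imputed[col].fillna(imputed[col].mode()[0])

print(imputed)
print(imputed.isnull().sum())  # every count should now be 0
```

Working on a copy (`imputed`) keeps the original table available for comparing strategies.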

from sklearn.preprocessing import LabelEncoder # For converting categorical to numerical

le = LabelEncoder()

data["Gender"] = le.fit_transform(data["Gender"]) # Encode 'Gender' column
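
To see which number the encoder assigned to which category, the fitted encoder can be inspected. This is a small sketch on a hypothetical list of values (`genders` is made up); `LabelEncoder` sorts the class labels, so the codes follow alphabetical order.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical values standing in for the 'Gender' column.
genders = ["Male", "Female", "Female", "Male"]

le = LabelEncoder()
codes = le.fit_transform(genders)  # classes are sorted, so Female -> 0, Male -> 1

# Build a readable label -> code mapping from the fitted encoder.
mapping = {label: int(code) for code, label in enumerate(le.classes_)}

# inverse_transform recovers the original labels from the codes.
decoded = le.inverse_transform(codes)

print(mapping)
print(list(codes))
print(list(decoded))
```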

from sklearn.preprocessing import StandardScaler # For scaling numerical features

scaler = StandardScaler()

data[["Age", "Salary"]] = scaler.fit_transform(data[["Age", "Salary"]]) # Standardize 'Age' and 'Salary' (zero mean, unit variance)
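
The notes scale the whole dataset at once. A common alternative, sketched here under the assumption that the data has already been split, is to fit the scaler on the training rows only and reuse it on the test rows, so no test information leaks into the scaling. The arrays `train_features` and `test_features` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical [Age, Salary] rows for illustration.
train_features = np.array([[20.0, 30000.0], [30.0, 50000.0], [40.0, 70000.0]])
test_features = np.array([[30.0, 50000.0]])

scaler = StandardScaler()

# fit_transform learns the mean and standard deviation from the training rows.
train_scaled = scaler.fit_transform(train_features)

# transform reuses those training statistics on the test rows.
test_scaled = scaler.transform(test_features)

print(train_scaled.mean(axis=0))  # approximately 0 for each column
print(train_scaled.std(axis=0))   # approximately 1 for each column
print(test_scaled)
```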

df2['age'] = df2['age'].fillna(df2['age'].mean()) # Same idea on another DataFrame (df2): replace nulls in 'age' with the column mean

df2['age'] # Display the column to confirm the nulls are gone

Part 2: Data Visualization


✅ 1. Correlation Heatmap

import seaborn as sns

import matplotlib.pyplot as plt

sns.heatmap(data.corr(numeric_only=True), annot=True, cmap="coolwarm") # Shows relationships between numerical features

plt.title("Correlation Heatmap")

plt.show()
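
The heatmap is a colored view of the correlation matrix, so the numbers behind it can also be read directly. A minimal sketch on a hypothetical table (`toy` and its values are made up):

```python
import pandas as pd

# Hypothetical data with the same column names used in these notes.
toy = pd.DataFrame({
    "Age": [20, 30, 40, 50],
    "Salary": [30000, 45000, 65000, 80000],
    "Purchased": [0, 0, 1, 1],
})

# Pearson correlation between every pair of numeric columns.
corr = toy.corr(numeric_only=True)

print(corr.round(3))
print("Age vs Salary:", round(corr.loc["Age", "Salary"], 3))
```

Values near 1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship.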

✅ 2. Pairplot
sns.pairplot(data, hue="Purchased") # Visualize pairwise relationships between features

✅ 3. Boxplot
sns.boxplot(data=data[["Age", "Salary"]]) # Detect outliers and understand value distributions
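
The points a boxplot draws beyond its whiskers follow, by default, the 1.5 × IQR rule. The same rule can be applied in code to list the outliers. This is a sketch on a hypothetical `ages` series, not part of the original notes.

```python
import pandas as pd

# Hypothetical ages; 90 is deliberately far from the rest.
ages = pd.Series([22, 25, 27, 30, 31, 33, 35, 90], name="Age")

q1 = ages.quantile(0.25)  # first quartile
q3 = ages.quantile(0.75)  # third quartile
iqr = q3 - q1             # interquartile range

# Anything outside these fences is flagged as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = ages[(ages < lower) | (ages > upper)]
print(lower, upper)
print(outliers)
```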

✅ 4. Histogram
data["Age"].hist(bins=20)

plt.title("Distribution of Age")

plt.xlabel("Age")

plt.ylabel("Frequency")

plt.show()

✅ 5. Scatter Plot
sns.scatterplot(x="Age", y="Salary", hue="Purchased", data=data)

plt.title("Age vs Salary")

plt.show()

✅ 6. Count Plot
sns.countplot(x="Purchased", data=data) # Count of each class/category

✅ 7. Pie Chart (for categorical distribution)


data['Gender'].value_counts().plot.pie(autopct="%1.1f%%", shadow=True)

plt.title("Gender Distribution")

plt.show()

✅ 8. Bar Plot
sns.barplot(x="Gender", y="Salary", data=data) # Compare average Salary by Gender

# Quick recap of the four core plots (seaborn and matplotlib are already imported above)

sns.heatmap(data.corr(numeric_only=True), annot=True) # Correlation heatmap

plt.title("Correlation Heatmap")

plt.show()

sns.pairplot(data) # Pairwise scatter plots for all numerical features

plt.show()

sns.scatterplot(x="Age", y="Salary", hue="Purchased", data=data) # Scatter plot for Age vs Salary

plt.show()

sns.boxplot(data=data[["Age", "Salary"]]) # Box plot to detect outliers and distributions

plt.show()

Part 3: Model Preparation

from sklearn.model_selection import train_test_split

X = data[["Age", "Salary"]] # Features

y = data["Purchased"] # Target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # Split dataset
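
When the target classes are imbalanced, a plain random split can leave the test set with too few examples of the rarer class. One option, sketched here on hypothetical arrays, is the `stratify` argument of `train_test_split`, which keeps the class proportions the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 rows, 2 features, 15 of class 0 and 5 of class 1.
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0] * 15 + [1] * 5)

# stratify=y_demo preserves the 75/25 class ratio in both train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("Train size:", len(X_tr), "class 1 count:", int(y_tr.sum()))
print("Test size:", len(X_te), "class 1 count:", int(y_te.sum()))
```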

Model 1: Linear Regression

from sklearn.linear_model import LinearRegression

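model = LinearRegression() # Note: linear regression predicts a continuous value; with a 0/1 target such as 'Purchased', read the MSE and R2 below as illustrative only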

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

from sklearn.metrics import mean_squared_error, r2_score

print("MSE:", mean_squared_error(y_test, y_pred))

print("R2 Score:", r2_score(y_test, y_pred))
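
To make the two metrics concrete, both can be computed by hand and compared with scikit-learn. The arrays `y_true` and `y_hat` below are hypothetical numbers chosen so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true values and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 7.5, 9.0])

# MSE: average of the squared errors.
mse_manual = np.mean((y_true - y_hat) ** 2)

# R2: 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print("MSE:", mse_manual, "sklearn:", mean_squared_error(y_true, y_hat))
print("R2:", r2_manual, "sklearn:", r2_score(y_true, y_hat))
```

A lower MSE is better; an R2 close to 1 means the predictions explain most of the variation in the target.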

Model 2: Logistic Regression

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

from sklearn.metrics import accuracy_score, confusion_matrix

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
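
The confusion matrix holds everything needed for accuracy, precision, and recall. The sketch below uses hypothetical labels (`y_true`, `y_hat`) to show how the four cells map to those metrics for a binary target.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical true labels and predictions.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_hat = [0, 0, 0, 1, 1, 1, 0, 0]

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp.
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found

print(cm)
print("Accuracy:", accuracy, "sklearn:", accuracy_score(y_true, y_hat))
print("Precision:", precision, "sklearn:", precision_score(y_true, y_hat))
print("Recall:", recall, "sklearn:", recall_score(y_true, y_hat))
```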


Model 3: Decision Tree

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

Model 4: Random Forest

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

Model 5: Support Vector Machine (SVM)

from sklearn.svm import SVC

model = SVC()

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

Model 6: K-Nearest Neighbors (KNN)

from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=3)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
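
The value `n_neighbors=3` is a starting point rather than a rule. One way to pick it, sketched here on synthetic data from `make_classification` (an assumption, not the dataset used in these notes), is to compare a few candidate values with cross-validation. KNN relies on distances, so the sketch also scales the features inside a pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature dataset used only for illustration.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

candidate_ks = [1, 3, 5, 7, 9]
cv_scores = {}
for k in candidate_ks:
    # Scaling happens inside the pipeline, so each fold is scaled on its own training part.
    pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_scores[k] = cross_val_score(pipeline, X_demo, y_demo, cv=5).mean()

# Pick the k with the highest mean cross-validated accuracy.
best_k = max(cv_scores, key=cv_scores.get)
print(cv_scores)
print("Best k:", best_k)
```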


Model 7: Naive Bayes

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
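
Since the six classifiers above share the same fit/predict/score steps, they can be compared in a single loop. This is a sketch on synthetic data from `make_classification` (an assumption for the sake of a runnable example); the dictionary name `models` and the default hyperparameters are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature dataset used only for illustration.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Naive Bayes": GaussianNB(),
}

results = {}
for name, clf in models.items():
    # The scaler is fitted on the training rows only, then applied to the test rows.
    pipeline = make_pipeline(StandardScaler(), clf)
    pipeline.fit(X_tr, y_tr)
    results[name] = accuracy_score(y_te, pipeline.predict(X_te))

# Print the models from highest to lowest test accuracy.
for name, acc in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Accuracy on one small test split can vary from run to run, so treat the ranking as a rough guide.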
