Machine Learning Assignment

This document analyzes student performance with regression, classification, and clustering models, including Linear Regression, Logistic Regression, Decision Trees, and K-Means Clustering. The models are evaluated with the R² score, confusion matrices and F1 scores, and the silhouette score. The analysis uses the Students Performance in Exams dataset from Kaggle, first predicting the math score and then a pass/fail label derived from it.

Uploaded by

www.shashanksaini1111


In [ ]: # This Python 3 environment comes with many helpful analytics libraries inst

# It is defined by the kaggle/python Docker image: https://github.com/kaggle

# For example, here's several helpful packages to load

import numpy as np # linear algebra

import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory

# For example, running this (by clicking run or pressing Shift+Enter) will l

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that

# You can also write temporary files to /kaggle/temp/, but they won't be sav

Student Performance Analysis Using Machine Learning
This notebook implements various machine learning techniques on the Student
Performance dataset from Kaggle.

Included Techniques:

Simple & Multiple Linear Regression

Polynomial, Lasso & Ridge Regression
Naïve Bayes, Logistic Regression
Decision Tree, SVM, K-NN Classifier
Artificial Neural Network
K-Means and Hierarchical Clustering

Evaluation Metrics:

R² Score (Regression)
Confusion Matrix, F1 Score (Classification)
Silhouette Score (Clustering)

In [1]: import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score, confusion_matrix,
In [4]: df = pd.read_csv('/kaggle/input/students-performance-in-exams/StudentsPerfor
df = pd.get_dummies(df, drop_first=True)

X = df.drop(['math score'], axis=1)

y = df['math score']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, ran

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
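
To make the preprocessing step concrete, here is a minimal sketch on a tiny made-up frame. The values and the choice of columns are illustrative assumptions, not rows from the Kaggle file. It shows that `drop_first=True` keeps one indicator column for a two-level category, and that the scaler's statistics come from the training rows only. The `dtype=float` argument is an addition for this sketch so the encoded frame is purely numeric.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame; the values are illustrative only.
toy = pd.DataFrame({
    "lunch": ["standard", "free/reduced", "standard", "standard"],
    "reading score": [72, 90, 95, 57],
})

# drop_first=True keeps k-1 indicator columns for a k-level category.
encoded = pd.get_dummies(toy, drop_first=True, dtype=float)
print(list(encoded.columns))

# Fit the scaler on the "training" rows only, then reuse its mean and
# standard deviation for the held-out row.
train, test = encoded.iloc[:3], encoded.iloc[3:]
demo_scaler = StandardScaler().fit(train)
train_scaled = demo_scaler.transform(train)
test_scaled = demo_scaler.transform(test)
print(train_scaled.mean(axis=0))  # approximately zero per column
```

Fitting on the training rows and only transforming the test rows keeps test-set statistics out of the model, which is the same pattern the cell above follows.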

📈 Regression Models
1️⃣ Simple Linear Regression
In [5]: from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train[:, [0]], y_train)
y_pred = model.predict(X_test[:, [0]])

print("R² Score:", r2_score(y_test, y_pred))

R² Score: 0.6804469009921283
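
With a single feature, ordinary least squares has a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) - slope * mean(x). The sketch below checks this against `LinearRegression` on synthetic data. The data, seed, and variable names are assumptions made for illustration and are unrelated to the student dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=200)

# Closed-form least-squares estimates for one feature.
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# The same fit through scikit-learn.
demo_model = LinearRegression().fit(x.reshape(-1, 1), y)
print(slope, demo_model.coef_[0])
print(intercept, demo_model.intercept_)

# On the data used for fitting, R² equals the squared correlation.
r2_train = demo_model.score(x.reshape(-1, 1), y)
print(r2_train, np.corrcoef(x, y)[0, 1] ** 2)
```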

2️⃣ Multiple Linear Regression

In [6]: model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("R² Score:", r2_score(y_test, y_pred))

R² Score: 0.8804332983749564

3️⃣ Polynomial Regression

In [7]: poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X_train)

model = LinearRegression()
model.fit(X_poly, y_train)
y_pred = model.predict(poly.transform(X_test))

print("R² Score:", r2_score(y_test, y_pred))

R² Score: 0.8650480765142721
4️⃣ Ridge and Lasso Regression
In [8]: from sklearn.linear_model import Ridge, Lasso

ridge = Ridge(alpha=1.0).fit(X_train, y_train)

lasso = Lasso(alpha=0.1).fit(X_train, y_train)

print("Ridge R²:", r2_score(y_test, ridge.predict(X_test)))

print("Lasso R²:", r2_score(y_test, lasso.predict(X_test)))

Ridge R²: 0.8805453685953484

Lasso R²: 0.8822147639745545

🤖 Classification Models

🔁 Convert Target to Binary (Pass/Fail)

In [9]: df['target'] = ['pass' if s >= 50 else 'fail' for s in df['math score']]

X = df.drop(['math score', 'target'], axis=1)

y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, ran

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
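
The pass/fail classes are unbalanced, so it can help to check the class shares and keep them equal across the split. The sketch below is an illustration on synthetic scores. The data, the `reading score` feature, `random_state=0`, and the use of `stratify` are all assumptions of this sketch, since the original `train_test_split` call is truncated and its remaining arguments are not visible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic scores (assumption), thresholded at 50 like the cell above.
rng = np.random.default_rng(0)
scores = pd.Series(rng.integers(0, 101, size=1000), name="math score")
target = pd.Series(np.where(scores >= 50, "pass", "fail"), name="target")

# A hypothetical feature column, present only to give the split an X.
X_demo = pd.DataFrame({"reading score": rng.integers(0, 101, size=1000)})

# stratify=target keeps the pass/fail ratio the same in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, target, test_size=0.2, random_state=0, stratify=target
)
print(target.value_counts(normalize=True))
print(y_te.value_counts(normalize=True))
```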

5️⃣ Logistic Regression

In [10]: from sklearn.linear_model import LogisticRegression

log_model = LogisticRegression()
log_model.fit(X_train, y_train)
log_preds = log_model.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, log_preds))

print("Classification Report:\n", classification_report(y_test, log_preds))
Confusion Matrix:
 [[ 20  14]
 [  7 159]]
Classification Report:
               precision    recall  f1-score   support

        fail       0.74      0.59      0.66        34
        pass       0.92      0.96      0.94       166

    accuracy                           0.90       200
   macro avg       0.83      0.77      0.80       200
weighted avg       0.89      0.90      0.89       200
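
The figures in the report follow directly from the confusion matrix. As a worked check, the sketch below recomputes precision, recall, F1, and accuracy for the fail class from the logistic regression matrix printed above. It relies on scikit-learn's convention that rows are true labels and columns are predicted labels, with labels sorted alphabetically (fail, then pass).

```python
import numpy as np

# Rows: true fail, true pass. Columns: predicted fail, predicted pass.
cm = np.array([[20, 14],
               [7, 159]])

tp_fail = cm[0, 0]
precision_fail = tp_fail / cm[:, 0].sum()   # 20 / (20 + 7)
recall_fail = tp_fail / cm[0, :].sum()      # 20 / (20 + 14)
f1_fail = 2 * precision_fail * recall_fail / (precision_fail + recall_fail)
accuracy = np.trace(cm) / cm.sum()          # (20 + 159) / 200

print(round(precision_fail, 2), round(recall_fail, 2), round(f1_fail, 2))
print(accuracy)
```

The accuracy of 0.895 is dominated by the 166 pass cases, which is why the fail recall of 0.59 is worth reading alongside it.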

6️⃣ Naïve Bayes Classifier

In [11]: from sklearn.naive_bayes import GaussianNB

nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_preds = nb_model.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, nb_preds))

print("Classification Report:\n", classification_report(y_test, nb_preds))

Confusion Matrix:
 [[ 23  11]
 [ 18 148]]
Classification Report:
               precision    recall  f1-score   support

        fail       0.56      0.68      0.61        34
        pass       0.93      0.89      0.91       166

    accuracy                           0.85       200
   macro avg       0.75      0.78      0.76       200
weighted avg       0.87      0.85      0.86       200

7️⃣ K-Nearest Neighbors (K-NN)

In [12]: from sklearn.neighbors import KNeighborsClassifier

knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)
knn_preds = knn_model.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, knn_preds))

print("Classification Report:\n", classification_report(y_test, knn_preds))
Confusion Matrix:
 [[  9  25]
 [  4 162]]
Classification Report:
               precision    recall  f1-score   support

        fail       0.69      0.26      0.38        34
        pass       0.87      0.98      0.92       166

    accuracy                           0.85       200
   macro avg       0.78      0.62      0.65       200
weighted avg       0.84      0.85      0.83       200

8️⃣ Decision Tree Classification

In [13]: from sklearn.tree import DecisionTreeClassifier

tree_model = DecisionTreeClassifier(random_state=0)
tree_model.fit(X_train, y_train)
tree_preds = tree_model.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, tree_preds))

print("Classification Report:\n", classification_report(y_test, tree_preds))

Confusion Matrix:
 [[ 20  14]
 [ 11 155]]
Classification Report:
               precision    recall  f1-score   support

        fail       0.65      0.59      0.62        34
        pass       0.92      0.93      0.93       166

    accuracy                           0.88       200
   macro avg       0.78      0.76      0.77       200
weighted avg       0.87      0.88      0.87       200

9️⃣ Support Vector Machine (SVM) Classification
In [14]: from sklearn.svm import SVC

svm_model = SVC()
svm_model.fit(X_train, y_train)
svm_preds = svm_model.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, svm_preds))

print("Classification Report:\n", classification_report(y_test, svm_preds))
Confusion Matrix:
 [[ 16  18]
 [  2 164]]
Classification Report:
               precision    recall  f1-score   support

        fail       0.89      0.47      0.62        34
        pass       0.90      0.99      0.94       166

    accuracy                           0.90       200
   macro avg       0.89      0.73      0.78       200
weighted avg       0.90      0.90      0.89       200

🔟 Artificial Neural Network (ANN)

In [15]: from sklearn.neural_network import MLPClassifier

ann_model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=300, random_sta

ann_model.fit(X_train, y_train)
ann_preds = ann_model.predict(X_test)

print("Confusion Matrix:\n", confusion_matrix(y_test, ann_preds))

print("Classification Report:\n", classification_report(y_test, ann_preds))

Confusion Matrix:
 [[ 19  15]
 [  8 158]]
Classification Report:
               precision    recall  f1-score   support

        fail       0.70      0.56      0.62        34
        pass       0.91      0.95      0.93       166

    accuracy                           0.89       200
   macro avg       0.81      0.76      0.78       200
weighted avg       0.88      0.89      0.88       200

/usr/local/lib/python3.11/dist-packages/sklearn/neural_network/_multilayer_perceptron.py:686: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (300) reached and the optimization hasn't converged yet.
  warnings.warn(

📊 Clustering Models

1️⃣1️⃣ K-Means Clustering

In [16]: from sklearn.cluster import KMeans

X = df.drop(['math score', 'target'], axis=1)

X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=2, random_state=0).fit(X_scaled)
print("KMeans Silhouette Score:", silhouette_score(X_scaled, kmeans.labels_)

KMeans Silhouette Score: 0.10611407723279058

/usr/local/lib/python3.11/dist-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
  warnings.warn(
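
The FutureWarning above asks for `n_init` to be set explicitly. A minimal sketch of doing so follows, using two synthetic blobs rather than the student data. The blob layout and seed are assumptions for illustration, and `n_init=10` simply pins the older default named in the warning.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two well-separated synthetic blobs (assumption, for illustration).
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0, 1, size=(100, 2)),
    rng.normal(6, 1, size=(100, 2)),
])

# Passing n_init explicitly avoids the FutureWarning and keeps the
# number of random restarts stable across scikit-learn versions.
demo_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_demo)
demo_score = silhouette_score(X_demo, demo_kmeans.labels_)
print(demo_score)
```

On clearly separated blobs like these the silhouette score is high, which gives a reference point for the 0.106 reported above.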

1️⃣2️⃣ Hierarchical Clustering

In [17]: from sklearn.cluster import AgglomerativeClustering

hclust = AgglomerativeClustering(n_clusters=2).fit(X_scaled)
print("Hierarchical Clustering Silhouette Score:", silhouette_score(X_scaled

Hierarchical Clustering Silhouette Score: 0.15767520836587193
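
Silhouette scores range from -1 to 1, and values near 0, such as those reported here, usually indicate overlapping clusters. One common follow-up, sketched below as a suggestion rather than part of the original analysis, is to compare the score across several cluster counts. The three synthetic blobs, their centers, and the range of k are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Three synthetic blobs (assumption); the student features are not used.
rng = np.random.default_rng(0)
centers = [(0, 0), (8, 0), (0, 8)]
X_demo = np.vstack([rng.normal(c, 1.0, size=(60, 2)) for c in centers])

# Score each candidate cluster count and keep the best one.
k_scores = {}
for k in range(2, 7):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X_demo)
    k_scores[k] = silhouette_score(X_demo, labels)

best_k = max(k_scores, key=k_scores.get)
print(k_scores)
print("best k:", best_k)
```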

📌 Conclusion
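This notebook applies regression, classification, and clustering algorithms
to the Student Performance dataset. Among the regression models, Lasso scored
the highest R² (0.882), narrowly ahead of Ridge and multiple linear regression
(both about 0.880). Logistic Regression and SVM reached the highest test
accuracy (0.90), although every classifier recalled the minority fail class
less reliably than the pass class. Both clustering methods produced low
silhouette scores (0.106 for K-Means, 0.158 for hierarchical clustering),
which suggests the two clusters overlap heavily.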

Thank you for reviewing this analysis!

Submitted by:
Raunak Kumar Singh

University Name:
Atmaram Sanatan Dharma College

Course Name:
B.Sc. (Hons) Computer Science

Date of Submission:
15-04-2025

This notebook was converted with convert.ploomber.io
