ML Assignment 2

This Jupyter Notebook analyzes a university admissions dataset (GRE and TOEFL scores, university rating, SOP, LOR, CGPA, and research experience) to predict admission chances. It uses pandas, numpy, Dask, and scikit-learn to read the data, check for missing values, drop the serial-number column, and convert the admission chance to a binary label at a 0.75 threshold. A decision tree (scikit-learn's DecisionTreeRegressor, evaluated with classification metrics) is then fit on 300 rows and reaches 86% accuracy on the 100-row test split.

Uploaded by

lucifer267302

8/1/24, 12:40 PM TEIT-10(1) - Jupyter Notebook

In [6]: import numpy as np
        import pandas as pd
        import dask.dataframe as dd
        import seaborn as sns
        import matplotlib.pyplot as mtp

In [7]: df = pd.read_csv("Admission_Predict.csv")

In [8]: df

Out[8]:      Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
          0           1        337          118                  4  4.5  4.5  9.65         1             0.92
          1           2        324          107                  4  4.0  4.5  8.87         1             0.76
          2           3        316          104                  3  3.0  3.5  8.00         1             0.72
          3           4        322          110                  3  3.5  2.5  8.67         1             0.80
          4           5        314          103                  2  2.0  3.0  8.21         0             0.65
        ...         ...        ...          ...                ...  ...  ...   ...       ...              ...
        395         396        324          110                  3  3.5  3.5  9.04         1             0.82
        396         397        325          107                  3  3.0  3.5  9.11         1             0.84
        397         398        330          116                  4  5.0  4.5  9.45         1             0.91
        398         399        312          103                  3  3.5  4.0  8.78         0             0.67
        399         400        333          117                  4  5.0  4.0  9.66         1             0.95

        400 rows × 9 columns

In [10]: dfd = dd.read_csv("Admission_Predict.csv")

In [11]: dfd
Out[11]: Dask DataFrame Structure:
                        Serial No.  GRE Score  TOEFL Score  University Rating      SOP      LOR     CGPA  Research  Chance of Admit
         npartitions=1
                             int64      int64        int64              int64  float64  float64  float64     int64          float64
                               ...        ...          ...                ...      ...      ...      ...       ...              ...
         Dask Name: read-csv, 1 graph layer

localhost:8888/notebooks/TEIT-10 (1).ipynb 1/5



In [12]: df.head()

Out[12]:      Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
           0           1        337          118                  4  4.5  4.5  9.65         1             0.92
           1           2        324          107                  4  4.0  4.5  8.87         1             0.76
           2           3        316          104                  3  3.0  3.5  8.00         1             0.72
           3           4        322          110                  3  3.5  2.5  8.67         1             0.80
           4           5        314          103                  2  2.0  3.0  8.21         0             0.65

In [13]: df.isnull()

Out[13]:      Serial No.  GRE Score  TOEFL Score  University Rating    SOP    LOR   CGPA  Research  Chance of Admit
           0       False      False        False              False  False  False  False     False            False
           1       False      False        False              False  False  False  False     False            False
           2       False      False        False              False  False  False  False     False            False
           3       False      False        False              False  False  False  False     False            False
           4       False      False        False              False  False  False  False     False            False
         ...         ...        ...          ...                ...    ...    ...    ...       ...              ...
         395       False      False        False              False  False  False  False     False            False
         396       False      False        False              False  False  False  False     False            False
         397       False      False        False              False  False  False  False     False            False
         398       False      False        False              False  False  False  False     False            False
         399       False      False        False              False  False  False  False     False            False

         400 rows × 9 columns

In [14]: df.isnull().sum()

Out[14]: Serial No.           0
         GRE Score            0
         TOEFL Score          0
         University Rating    0
         SOP                  0
         LOR                  0
         CGPA                 0
         Research             0
         Chance of Admit      0
         dtype: int64
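
The per-column null counts above can be collapsed into a single check. The following is a minimal sketch on a toy frame (the values and the deliberately missing entry are illustrative assumptions, not the notebook's data) showing how `isna().sum()` counts missing values per column and how a second `sum()` gives one total for the whole frame.

```python
import pandas as pd

# Toy frame standing in for the admissions data; the None entry is
# inserted on purpose so the counts are not all zero (illustrative only).
toy = pd.DataFrame({
    "GRE Score": [337, 324, None],
    "CGPA": [9.65, 8.87, 8.00],
})

missing_per_column = toy.isna().sum()           # NaN count for each column
total_missing = int(missing_per_column.sum())   # one number for the whole frame

print(missing_per_column)
print("total missing:", total_missing)
```

On the notebook's data every column reports 0, so the total would also be 0 and no imputation or row dropping is needed.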


In [15]: df.sum()

Out[15]: Serial No.            80200.00
         GRE Score            126723.00
         TOEFL Score           42964.00
         University Rating      1235.00
         SOP                    1360.00
         LOR                    1381.00
         CGPA                   3439.57
         Research                219.00
         Chance of Admit         289.74
         dtype: float64

In [16]: df = df.drop('Serial No.',axis=1)

In [17]: df

Out[17]: GRE Score TOEFL Score University Rating SOP LOR CGPA Research Chance of Admit

0 337 118 4 4.5 4.5 9.65 1 0.92

1 324 107 4 4.0 4.5 8.87 1 0.76

2 316 104 3 3.0 3.5 8.00 1 0.72

3 322 110 3 3.5 2.5 8.67 1 0.80

4 314 103 2 2.0 3.0 8.21 0 0.65

... ... ... ... ... ... ... ... ...

395 324 110 3 3.5 3.5 9.04 1 0.82

396 325 107 3 3.0 3.5 9.11 1 0.84

397 330 116 4 5.0 4.5 9.45 1 0.91

398 312 103 3 3.5 4.0 8.78 0 0.67

399 333 117 4 5.0 4.0 9.66 1 0.95

400 rows × 8 columns

In [18]: df.shape()

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[18], line 1
----> 1 df.shape()

TypeError: 'tuple' object is not callable

In [19]: df.shape

Out[19]: (400, 8)
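
The TypeError in cell 18 arises because `shape` is an attribute holding a tuple, not a method. A small sketch on a toy frame (the frame itself is an illustrative assumption) shows the attribute access and reproduces the same error when the tuple is called.

```python
import pandas as pd

# Toy frame with 3 rows and 2 columns (illustrative only).
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape is a plain tuple attribute, so it is read without parentheses.
n_rows, n_cols = toy.shape
print(n_rows, n_cols)

# Adding parentheses tries to call the tuple, which raises TypeError.
try:
    toy.shape()
except TypeError as exc:
    message = str(exc)
    print(message)
```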


In [20]: df['Chance of Admit '] = [1 if each > 0.75 else 0 for each in df['Chance of Adm

In [21]: df.head()
Out[21]: GRE Score TOEFL Score University Rating SOP LOR CGPA Research Chance of Admit

0 337 118 4 4.5 4.5 9.65 1 1

1 324 107 4 4.0 4.5 8.87 1 1

2 316 104 3 3.0 3.5 8.00 1 0

3 322 110 3 3.5 2.5 8.67 1 1

4 314 103 2 2.0 3.0 8.21 0 0
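
The 0.75 threshold in cell 20 turns the continuous admission chance into a binary label. As a hedged sketch, the same rule can be written as a vectorized comparison. The toy frame below reuses the first five `Chance of Admit` values shown earlier, and the variable names are assumptions for illustration.

```python
import pandas as pd

target = "Chance of Admit "  # the column name carries a trailing space in the notebook
toy = pd.DataFrame({target: [0.92, 0.76, 0.72, 0.80, 0.65]})

# List-comprehension form, following the rule used in the notebook.
list_version = [1 if each > 0.75 else 0 for each in toy[target]]

# Vectorized form: boolean comparison, then cast True/False to 1/0.
vector_version = (toy[target] > 0.75).astype(int)

print(list_version)
print(vector_version.tolist())
```

Both forms give the same labels. The vectorized one avoids a Python-level loop, which matters little at 400 rows but scales better.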

In [22]: x = df[['GRE Score', 'TOEFL Score', 'University Rating', 'SOP', 'LOR ', 'CGPA',
                 'Research']]  # input features
         y = df['Chance of Admit ']  # target: binarized admission label
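
Note that `'LOR '` and `'Chance of Admit '` are selected with a trailing space, which suggests the CSV header contains that whitespace. One optional cleanup, sketched here on an in-memory CSV (the sample text is an assumption, not the real file), is to strip the column names once after loading so later selections do not depend on invisible characters.

```python
import io
import pandas as pd

# Small in-memory CSV mimicking a header with trailing spaces (illustrative only).
csv_text = "GRE Score,LOR ,Chance of Admit \n337,4.5,0.92\n324,4.5,0.76\n"

raw = pd.read_csv(io.StringIO(csv_text))

# rename accepts a function; str.strip removes leading and trailing whitespace.
clean = raw.rename(columns=str.strip)

print(list(raw.columns))
print(list(clean.columns))
```

If the names were stripped this way, the selections in the notebook would use `'LOR'` and `'Chance of Admit'` instead.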

In [23]: from sklearn.model_selection import train_test_split

In [24]: x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.25,random_s

In [25]: print("Size of split data")
         print(f"x_train {x_train.shape}")
         print(f"y_train {y_train.shape}")
         print(f"x_test {x_test.shape}")
         print(f"y_test {y_test.shape}")

         Size of split data
         x_train (300, 7)
         y_train (300,)
         x_test (100, 7)
         y_test (100,)
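
A 75/25 split of 400 rows gives the 300/100 sizes printed above. The sketch below reproduces those shapes on synthetic data and adds `stratify=y`, which keeps the class ratio equal in both parts. The synthetic arrays, the `random_state=0` value, and the use of stratification are all assumptions for illustration and are not taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 400 rows, 7 features, 160 positives (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 7))
y = np.array([0] * 240 + [1] * 160)

# test_size=0.25 matches the notebook; stratify=y preserves the 60/40 class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
print("positives in test:", int(y_test.sum()))
```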

In [26]: from sklearn.tree import DecisionTreeRegressor

In [27]: model_dt = DecisionTreeRegressor(random_state=1)

In [29]: model_dt.fit(x_train,y_train)

Out[29]: DecisionTreeRegressor(random_state=1)

In [30]: y_pred_dt = model_dt.predict(x_test)  # predicted labels for the test set
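
The notebook fits a `DecisionTreeRegressor` on a 0/1 target and then scores it with classification metrics. Since the target is binary, `DecisionTreeClassifier` is the more conventional choice. The sketch below, on synthetic data (an assumption, not the admissions set), fits both and compares the type of their predictions. It is an illustration of the alternative, not a claim about what the notebook's results would be.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Synthetic binary problem (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = DecisionTreeClassifier(random_state=1).fit(X, y)
reg = DecisionTreeRegressor(random_state=1).fit(X, y)

clf_pred = clf.predict(X)   # class labels, same integer type as y
reg_pred = reg.predict(X)   # leaf means, returned as floats

print(clf_pred.dtype, reg_pred.dtype)
print(np.unique(reg_pred))
```

An unpruned regression tree usually ends in pure leaves, so its float predictions coincide with the labels, but a leaf that mixes classes would return a fraction that `accuracy_score` rejects. The classifier always returns a class label.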


In [31]: from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score
         from sklearn.metrics import classification_report

In [33]: ConfusionMatrixDisplay.from_predictions(y_test,y_pred_dt)
mtp.title('Decision Tree')
mtp.show()
print(f" Accuracy is {accuracy_score(y_test,y_pred_dt)}")
print(classification_report(y_test,y_pred_dt))

 Accuracy is 0.86
              precision    recall  f1-score   support

           0       0.86      0.89      0.88        56
           1       0.86      0.82      0.84        44

    accuracy                           0.86       100
   macro avg       0.86      0.86      0.86       100
weighted avg       0.86      0.86      0.86       100
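
The figures in the report follow directly from the four confusion-matrix counts. The plot is not preserved in this text, so the counts below are inferred from the report: they are the only integers consistent with the supports (56 and 44) and the rounded recalls (0.89 and 0.82), and should be read as a reconstruction rather than recorded output.

```python
# Counts inferred from the report's supports and rounded recalls
# (an assumption; the confusion-matrix plot is not in the text).
tn, fp = 50, 6   # true class 0: predicted 0, predicted 1
fn, tp = 8, 36   # true class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)

precision_1 = tp / (tp + fp)   # of predicted admits, fraction correct
recall_1 = tp / (tp + fn)      # of true admits, fraction found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

precision_0 = tn / (tn + fn)   # same definitions with class 0 as the positive class
recall_0 = tn / (tn + fp)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

print(round(accuracy, 2))
print(round(precision_0, 2), round(recall_0, 2), round(f1_0, 2))
print(round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))
```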
