ML PR-3
AIM:
Every year, many students take the GRE exam to seek admission to foreign universities. The dataset contains GRE Score (out of 340), TOEFL Score (out of 120), University Rating (out of 5), Statement of Purpose strength (out of 5), Letter of Recommendation strength (out of 5), Undergraduate GPA (out of 10), Research Experience (0 = no, 1 = yes), and Admitted (0 = no, 1 = yes). Admitted is the target variable. The dataset is available on Kaggle; the last column needs to be converted to 0 or 1.
Data Set: https://www.kaggle.com/mohansacharya/graduate-admissions
The counselor of the firm is supposed to check whether a student will get admission or not based on his/her GRE score and academic record. To help the counselor take appropriate decisions, build a machine learning classifier using a Decision Tree to predict whether a student will get admission or not.
Theory:
The Classification algorithm is a Supervised Learning technique used to identify the category of new observations on the basis of training data. In Classification, a program learns from the given dataset or observations and then classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can also be called targets, labels, or categories.
Unlike regression, the output variable of Classification is a category, not a numeric value, for example "Green or Blue" or "fruit or animal". Since Classification is a Supervised Learning technique, it takes labelled input data, which means each input comes with its corresponding output.
Classification learners can be divided into two types:
1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. Classification is then done on the basis of the most related data stored in the training dataset. Lazy learners take less time in training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning
2. Eager Learners: Eager learners develop a classification model from the training dataset before receiving a test dataset. Opposite to lazy learners, an eager learner takes more time in learning and less time in prediction, as the sketch after this list illustrates.
Example: Decision Trees, Naïve Bayes, ANN
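To make the contrast concrete, here is a minimal sketch (not part of the original manual) that fits a lazy learner (K-NN) and an eager learner (Decision Tree) on a synthetic dataset and times fit versus predict; the exact numbers will vary, but K-NN should be fast to fit and slow to predict, and the tree the other way around.
import time
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
# Synthetic toy data, used only to time the two learner styles
X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
for name, model in [("K-NN (lazy)", KNeighborsClassifier()),
                    ("Decision Tree (eager)", DecisionTreeClassifier(random_state=0))]:
    t0 = time.time()
    model.fit(X, y)       # a lazy learner mostly just stores the data here
    t1 = time.time()
    model.predict(X)      # a lazy learner does the real work here
    t2 = time.time()
    print(f"{name}: fit {t1 - t0:.3f}s, predict {t2 - t1:.3f}s")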
In [2]:
import pandas as pd
import seaborn as sns
In [3]:
df = pd.read_csv('Admission_Predict.csv')
In [4]:
df.head()
In [5]:
df.shape
Out[5]:
(500, 9)
In [6]:
from sklearn.preprocessing import Binarizer
In [7]:
bi = Binarizer(threshold=0.75)   # values > 0.75 become 1, the rest 0
df['Chance of Admit '] = bi.fit_transform(df[['Chance of Admit ']])
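Binarizer maps values above the threshold to 1 and everything else to 0; for intuition only, an equivalent alternative to the cell above would be a plain boolean comparison:
# Sketch: equivalent to the Binarizer step (not part of the original notebook)
df['Chance of Admit '] = (df['Chance of Admit '] > 0.75).astype(int)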
In [8]:
df.head()
In [9]:
x = df.drop('Chance of Admit ', axis=1)   # features (note: Serial No. is kept as a column)
y = df['Chance of Admit ']                # binarized target
In [10]:
x
Out[10]:
     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research
...
[500 rows x 8 columns]
In [11]:
y = y.astype('int')   # Binarizer returns floats; cast the labels to int
In [12]:
y
Out[12]:
0      1
1      1
2      0
3      1
4      0
      ..
495    1
496    1
497    1
498    0
499    1
Name: Chance of Admit , Length: 500, dtype: int32
In [13]:
sns.countplot(x=y)   # class balance of the binarized target
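The train/test split cell (In [14]) is missing from this capture. Given the 375/125 shapes that follow, it was presumably a 75/25 split along these lines (the random_state is a guess):
In [14]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)  # 375 train / 125 test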
In [15]:
x_train.shape
Out[15]:
(375, 8)
In [16]:
x_test.shape
Out[16]:
(125, 8)
In [17]:
y_train.shape
Out[17]:
(375,)
In [18]:
y_test.shape
Out[18]:
(125,)
In [32]:
from sklearn.tree import DecisionTreeClassifier
In [33]:
classifier = DecisionTreeClassifier(random_state=0)   # fixed seed so the tree is reproducible
In [34]:
classifier.fit(x_train, y_train)
Out[34]:
DecisionTreeClassifier(random_state=0)
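Once fitted, the tree's size and the relative importance of each feature can be inspected; this is an illustrative aside, not part of the original notebook:
# Sketch: inspect the fitted tree's depth, leaf count, and feature importances
print(classifier.get_depth(), classifier.get_n_leaves())
print(dict(zip(x.columns, classifier.feature_importances_.round(3))))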
In [35]:
y_pred = classifier.predict(x_test)
In [36]:
result = pd.DataFrame({'actual': y_test, 'predicted': y_pred})
In [37]:
result
Out[37]:
     actual  predicted
90        0          0
254       1          1
283       1          1
445       1          1
461       0          0
...     ...        ...
430       0          0
49        1          0
134       1          1
365       1          1
413       0          0
[125 rows x 2 columns]
In [42]:
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score
In [39]:
from sklearn.metrics import classification_report
In [43]:
accuracy_score(y_test, y_pred)
Out[43]:
0.96
In [50]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred, labels=classifier.classes_)
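As a sanity check (not in the original notebook), accuracy can be recovered from the confusion matrix, since correct predictions lie on its diagonal:
# Sketch: diagonal of cm = correct predictions; should reproduce the 0.96 above
print(cm.trace() / cm.sum())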
In [51]:
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=classifier.classes_)
In [52]:
disp.plot()
Out[52]:
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x1c7f4a76c70>
In [55]:
print(classification_report(y_test, y_pred))   # per-class precision, recall, and F1
In [68]:
new = [[140, 300, 110, 5, 4.5, 4.5, 9.2, 1]]   # [Serial No., GRE, TOEFL, Univ. Rating, SOP, LOR, CGPA, Research]
In [69]:
classifier.predict(new)[0]
Out[69]:
1
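Because the model was fitted on a DataFrame, recent scikit-learn versions warn when predict receives a bare list. A sketch that keeps the column names and makes the feature order explicit:
# Sketch: wrap the sample in a DataFrame so feature names match the training data
new_df = pd.DataFrame(new, columns=x.columns)
classifier.predict(new_df)[0]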
In [70]:
from sklearn.tree import plot_tree
plot_tree(classifier);
In [73]:
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 12))
plot_tree(classifier, fontsize=8, filled=True, rounded=True);
In [74]:
plt.figure(figsize=(12, 12))
plot_tree(classifier, fontsize=8, filled=True, rounded=True, feature_names=x.columns);
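If the rendered tree is hard to read at this size, the learned rules can also be dumped as indented text; a sketch, not part of the original notebook:
# Sketch: print the learned decision rules as plain text
from sklearn.tree import export_text
print(export_text(classifier, feature_names=list(x.columns)))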