ML PR-3

Machine Learning Laboratory TE IT (2019 Pattern) Lab Manual, SPPU Curriculum

AIM:
Every year many students take the GRE exam to seek admission to foreign universities. The
data set contains GRE Score (out of 340), TOEFL Score (out of 120), University Rating
(out of 5), Statement of Purpose strength (out of 5), Letter of Recommendation strength
(out of 5), Undergraduate GPA (out of 10), Research Experience (0 = no, 1 = yes), and
Admitted (0 = no, 1 = yes). Admitted is the target variable. The data set is available on
Kaggle (the last column of the dataset needs to be converted to 0 or 1):
https://www.kaggle.com/mohansacharya/graduate-admissions
The counselor of the firm is supposed to check whether a student will get admission or not
based on his/her GRE score and academic score. To help the counselor make appropriate
decisions, build a machine learning classifier using a Decision Tree to predict whether a
student will get admission or not.

a) Apply data pre-processing (Label Encoding, Data Transformation, …) techniques if
   necessary.
b) Perform data preparation (train-test split).
c) Apply the machine learning algorithm.
d) Evaluate the model.

Theory:

What is the Classification Algorithm?

The Classification algorithm is a Supervised Learning technique that is used to identify the
category of new observations on the basis of training data. In classification, a program
learns from the given dataset or observations and then classifies new observations into one
of a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog,
etc. Classes can be called targets, labels, or categories.

Unlike regression, the output variable of classification is a category, not a continuous
value, such as "Green or Blue" or "fruit or animal". Since the Classification algorithm is a
supervised learning technique, it takes labelled input data, which means it contains inputs
with the corresponding outputs.

In a classification algorithm, a discrete output function y is mapped to the input
variable x:


1. y = f(x), where y is the categorical output.
2. The best example of an ML classification algorithm is an email spam detector.
3. The main goal of the Classification algorithm is to identify the category of a given
   dataset, and these algorithms are mainly used to predict the output for categorical
   data.
4. Classification algorithms can be pictured as a diagram with two classes, class A and
   class B: each class has features that are similar to each other and dissimilar to the
   other class.

The algorithm which implements the classification on a dataset is known as a classifier.


There are two types of classification:
o Binary Classifier: If the classification problem has only two possible outcomes, then
  it is called a binary classifier. Examples: YES or NO, MALE or FEMALE, SPAM or NOT
  SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two outcomes, then
  it is called a multi-class classifier. Examples: classification of types of crops,
  classification of types of music.

Learners in Classification Problems:

In classification problems, there are two types of learners:

1. Lazy Learners: A lazy learner first stores the training dataset and waits until it
   receives the test dataset. In the lazy learner case, classification is done on the
   basis of the most related data stored in the training dataset. It takes less time in
   training but more time for predictions.
   Examples: K-NN algorithm, case-based reasoning.
2. Eager Learners: Eager learners develop a classification model based on a training
   dataset before receiving a test dataset. Opposite to lazy learners, an eager learner
   takes more time in learning and less time in prediction.
   Examples: Decision Trees, Naïve Bayes, ANN.
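As a quick illustration of this trade-off (not part of the original manual), the sketch
below, assuming scikit-learn is installed, fits one learner of each kind on a synthetic
dataset: K-NN's fit() essentially just stores the data, while the decision tree does all
its work up front.

# Lazy vs. eager: K-NN defers computation to predict time; the decision tree
# builds its entire model during fit().
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier().fit(X_train, y_train)     # stores training data
tree = DecisionTreeClassifier().fit(X_train, y_train)  # grows the full tree

print("K-NN test accuracy:", knn.score(X_test, y_test))
print("Tree test accuracy:", tree.score(X_test, y_test))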

In [2]:
import pandas as pd
import seaborn as sns

In [3]:
df = pd.read_csv('Admission_Predict.csv')

In [4]:
df.head()

Out[4]:
   Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
0           1        337          118                  4  4.5  4.5  9.65         1             0.92
1           2        324          107                  4  4.0  4.5  8.87         1             0.76
2           3        316          104                  3  3.0  3.5  8.00         1             0.72
3           4        322          110                  3  3.5  2.5  8.67         1             0.80
4           5        314          103                  2  2.0  3.0  8.21         0             0.65

In [5]:
df.shape

Out[5]:
(500, 9)

In [6]:
from sklearn.preprocessing import Binarizer

In [7]:
bi = Binarizer(threshold=0.75)
df['Chance of Admit '] = bi.fit_transform(df[['Chance of Admit ']])
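Binarizer maps every value strictly greater than the threshold to 1 and everything else
to 0, converting the continuous "Chance of Admit" column into the 0/1 target the problem
statement asks for; the 0.75 cut-off is the manual's choice. An equivalent one-liner
without Binarizer (note the trailing space in the column name, which is present in the
raw CSV) would be:

# Equivalent manual thresholding: values strictly greater than 0.75 become 1.0.
df['Chance of Admit '] = (df['Chance of Admit '] > 0.75).astype(float)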

In [8]:
df.head()

Out[8]:
   Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
0           1        337          118                  4  4.5  4.5  9.65         1              1.0
1           2        324          107                  4  4.0  4.5  8.87         1              1.0
2           3        316          104                  3  3.0  3.5  8.00         1              0.0
3           4        322          110                  3  3.5  2.5  8.67         1              1.0
4           5        314          103                  2  2.0  3.0  8.21         0              0.0

In [9]:
x = df.drop('Chance of Admit ', axis=1)
y = df['Chance of Admit ']

In [10]:
x

Out[10]:
     Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research
0             1        337          118                  4  4.5  4.5  9.65         1
1             2        324          107                  4  4.0  4.5  8.87         1
2             3        316          104                  3  3.0  3.5  8.00         1
3             4        322          110                  3  3.5  2.5  8.67         1
4             5        314          103                  2  2.0  3.0  8.21         0
..          ...        ...          ...                ...  ...  ...   ...       ...
495         496        332          108                  5  4.5  4.0  9.02         1
496         497        337          117                  5  5.0  5.0  9.87         1
497         498        330          120                  5  4.5  5.0  9.56         1
498         499        312          103                  4  4.0  5.0  8.43         0
499         500        327          113                  4  4.5  4.5  9.04         0

500 rows × 8 columns

In [11]:
y = y.astype('int')

In [12]:
y

Out[12]:
0      1
1      1
2      0
3      1
4      0
      ..
495    1
496    1
497    1
498    0
499    1
Name: Chance of Admit , Length: 500, dtype: int32

In [13]:
sns.countplot(x=y)

Out[13]:
<AxesSubplot:xlabel='Chance of Admit ', ylabel='count'>
In [31]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0, test_size=0.25)
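With random_state=0 the split is reproducible, but the class proportions in the two
halves are left to chance. A hedged variant (the stratify argument is a suggested
addition, not part of the original run, so the metrics reported below would change
slightly):

# Optional: stratify=y keeps the 0/1 class ratio identical in the train and
# test sets. The results reported below were produced WITHOUT stratification.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, random_state=0, test_size=0.25, stratify=y)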

In [15]:
x_train.shape

Out[15]:
(375, 8)

In [16]:
x_test.shape

Out[16]:
(125, 8)

In [17]:
y_train.shape

Out[17]:
(375,)

In [18]:
y_test.shape

Out[18]:
(125,)

In [32]:
from sklearn.tree import DecisionTreeClassifier

In [33]:
classifier = DecisionTreeClassifier(random_state=0)

In [34]:
classifier.fit(x_train, y_train)

Out[34]:
DecisionTreeClassifier(random_state=0)
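By default DecisionTreeClassifier grows the tree until every leaf is pure, which often
overfits. A minimal sketch of a pruned variant (the max_depth and min_samples_leaf values
are illustrative assumptions, not tuned for this dataset):

# Illustrative pruning: cap the depth and require at least 10 samples per leaf.
# These values are assumptions; tune them with cross-validation in practice.
pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
pruned.fit(x_train, y_train)
print("pruned test accuracy:", pruned.score(x_test, y_test))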

In [35]:
y_pred = classifier.predict(x_test)

In [36]:
result = pd.DataFrame({'actual': y_test, 'predicted': y_pred})

In [37]:
result

Out[37]:
     actual  predicted
90        0          0
254       1          1
283       1          1
445       1          1
461       0          0
..      ...        ...
430       0          0
49        1          0
134       1          1
365       1          1
413       0          0

125 rows × 2 columns

In [44]:
cm = confusion_matrix(y_test, predictions, labels=classifier.classes_)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
C:\Users\OSLAB~1\AppData\Local\Temp/ipykernel_3204/1472053813.py in <module>
----> 1 cm = confusion_matrix(y_test, predictions, labels=classifier.classes_)

NameError: name 'confusion_matrix' is not defined

The NameError is raised because confusion_matrix was never imported (and the
predictions variable does not exist either; the predicted labels live in y_pred).
Both issues are corrected in the cells below.

In [42]:
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score

In [39]:
from sklearn.metrics import classification_report

In [43]:
accuracy_score(y_test, y_pred)

Out[43]:
0.96

In [50]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred, labels=classifier.classes_)

In [51]:
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=classifier.classes_)

In [52]:
disp.plot()

Out[52]:
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x1c7f4a76c70>

In [54]:
accuracy_score(y_test, y_pred)

Out[54]:
0.96

In [55]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.96      0.98      0.97        81
           1       0.95      0.93      0.94        44

    accuracy                           0.96       125
   macro avg       0.96      0.95      0.96       125
weighted avg       0.96      0.96      0.96       125
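The per-class rows of the report follow directly from the confusion matrix:
precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal sketch that recovers
the class-1 row from the cm computed in In [50]:

# cm[i, j] counts samples with true label i that were predicted as label j,
# so for a 2x2 binary matrix, ravel() yields tn, fp, fn, tp in that order.
tn, fp, fn, tp = cm.ravel()
precision = tp / (tp + fp)   # of all predicted admits, how many were right
recall = tp / (tp + fn)      # of all actual admits, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)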

In [68]:
new = [[140, 300, 110, 5, 4.5, 4.5, 9.2, 1]]

In [69]:
classifier.predict(new)[0]

Out[69]:
1
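The eight values must follow the training column order (Serial No., GRE Score, TOEFL
Score, University Rating, SOP, LOR, CGPA, Research); note that Serial No. was never
dropped, so the tree also sees a row index that carries no predictive meaning. A sketch
that makes the ordering explicit (same assumed values) and avoids sklearn's feature-name
warning:

# Build the query as a one-row DataFrame so feature names and order are explicit.
# Dropping 'Serial No.' from x before training would be cleaner in practice.
query = pd.DataFrame([[140, 300, 110, 5, 4.5, 4.5, 9.2, 1]], columns=x.columns)
print(classifier.predict(query)[0])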

In [70]:
from sklearn.tree import plot_tree
plot_tree(classifier);

In [73]:
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 12))
plot_tree(classifier, fontsize=8, filled=True, rounded=True);

In [74]:
plt.figure(figsize=(12, 12))
plot_tree(classifier, fontsize=8, filled=True, rounded=True, feature_names=x.columns);
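For a fully grown tree the plot is hard to read; a text dump of the top levels is often
easier to inspect. A small sketch using sklearn's export_text (the max_depth shown is an
assumption to keep the output short):

# Print the upper levels of the learned tree as indented if/else rules.
from sklearn.tree import export_text
print(export_text(classifier, feature_names=list(x.columns), max_depth=3))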