Assignment 3 - LP1

The document discusses building a machine learning classifier using decision trees to predict graduate school admissions. It describes preprocessing the dataset, applying classification algorithms like decision trees and logistic regression, and evaluating the model's performance. Code is provided to preprocess and explore the admissions dataset, including plotting various graphs.

Uploaded by

bbad070105


Experiment No. 3

Aim:
Assignment on Classification technique
Every year many students take the GRE exam to gain admission to foreign universities. The data set
contains GRE Scores (out of 340), TOEFL Scores (out of 120), University Rating (out of 5), Statement of
Purpose strength (out of 5), Letter of Recommendation strength (out of 5), Undergraduate GPA (out of
10), Research Experience (0=no, 1=yes), Admitted (0=no, 1=yes). Admitted is the target variable. The
data set is available on Kaggle (the last column of the dataset needs to be changed to 0 or 1).
Data Set: https://www.kaggle.com/mohansacharya/graduate-admissions
The counselor of the firm is supposed to check whether a student will get admission or not based on
his/her GRE score and academic score. To help the counselor take appropriate decisions, build a
machine learning classifier using a Decision Tree to predict whether a student will get admission or not.
A. Apply data pre-processing (Label Encoding, Data Transformation, ...) techniques if necessary.
B. Perform data preparation (Train-Test Split).
C. Apply Machine Learning Algorithm.
D. Evaluate Model.

Theory:
Classification: Classification may be defined as the process of predicting a class or category from
observed values or given data points. The categorized output can take forms such as “Black” or
“White”, or “spam” or “no spam”. Mathematically, classification is the task of approximating a mapping
function (f) from input variables (X) to discrete output variables (Y).

Building a Classifier in Python:

Step1: Importing necessary python package

Step2: Importing dataset

Step3: Organizing data into training & testing sets

Step4: Model evaluation

Step5: Finding accuracy
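The five steps above can be sketched as follows. This is a minimal illustration, not the assignment's solution: it assumes scikit-learn is installed and uses a synthetic data set from make_classification as a stand-in for the admissions data.

```python
# Step 1: import the necessary packages
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Step 2: load a dataset (synthetic stand-in, NOT the admissions data)
X, y = make_classification(n_samples=200, n_features=7, random_state=0)

# Step 3: organize the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 4: fit the model and predict on the held-out test set
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

# Step 5: find the accuracy (fraction of correct test predictions)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
```

The max_depth=3 setting is an arbitrary choice made here to keep the tree small; it is not prescribed by the assignment.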

Classification Algorithms Include:

Naive Bayes, Logistic regression, K-nearest neighbours, (Kernel) SVM, Decision tree

1. Logistic Regression Algorithm: It is a Machine Learning classification algorithm that is used to
predict the probability of a categorical dependent variable. In logistic regression, the dependent
variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.).
The logistic regression model predicts P(Y=1) as a function of X.

Logistic Regression Algorithm Equation:

The Logistic Regression equation can be obtained from the Linear Regression equation. The mathematical
steps to get the Logistic Regression equation are given below:

o We know the equation of the straight line can be written as:

$$ y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n $$

o In Logistic Regression y can be between 0 and 1 only, so let's divide y by (1-y), which gives 0
for y = 0 and infinity for y = 1:

$$ \frac{y}{1-y} $$

o But we need a range between -[infinity] and +[infinity], so taking the logarithm of this ratio it
becomes:

$$ \log\left[\frac{y}{1-y}\right] = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n $$

The above equation is the final equation for Logistic Regression.
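Solving the log-odds relation log(y / (1 - y)) = z for y gives the sigmoid y = 1 / (1 + e^(-z)), which is how a linear score is turned into a probability. A small sketch of that round trip, using only the standard library; the coefficients b0 and b1 are hypothetical values chosen for illustration:

```python
import math


def logit(y):
    """Log-odds of a probability y, with 0 < y < 1."""
    return math.log(y / (1.0 - y))


def sigmoid(z):
    """Inverse of logit: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for a single input variable x
b0, b1 = -4.0, 0.5

for x in (2.0, 8.0, 14.0):
    z = b0 + b1 * x          # linear part: b0 + b1*x
    p = sigmoid(z)           # predicted P(Y=1)
    print(f"x={x:5.1f}  z={z:5.1f}  P(Y=1)={p:.3f}")
```

At x = 8.0 the linear part is exactly 0, so the predicted probability is 0.5; smaller x gives a probability below 0.5 and larger x a probability above it.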

Steps in Logistic Regression: To implement Logistic Regression using Python, we follow the same
steps as for the regression models. Below are the steps:
1. Data pre-processing step
2. Fitting Logistic Regression to the training set
3. Predicting the test result
4. Testing the accuracy of the result (creation of the confusion matrix)
5. Visualizing the test set result
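Steps 1 to 4 can be sketched as below (the visualization step is left out). This assumes scikit-learn is available and uses synthetic data rather than the admissions file; feature scaling is included as one possible pre-processing step, not a required one.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Data pre-processing: split, then scale using training statistics only
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 2. Fit logistic regression to the training set
model = LogisticRegression()
model.fit(X_train, y_train)

# 3. Predict the test result
y_pred = model.predict(X_test)

# 4. Test accuracy via the confusion matrix
#    rows = actual class, columns = predicted class
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(f"Accuracy: {acc:.2f}")
```

The accuracy equals the sum of the diagonal of the confusion matrix divided by the total number of test samples.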

2. Decision Tree Algorithm: Decision trees are constructed by an algorithmic approach that
splits the dataset in different ways based on different conditions. Decision trees are among the most
widely used algorithms in the category of supervised learning.

Decision Tree Algorithm Steps:

Step-1: Begin the tree with the root node, say S, which contains the complete dataset.

Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).

Step-3: Divide S into subsets that contain the possible values of the best attribute.

Step-4: Generate the decision tree node, which contains the best attribute.

Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3.
Continue this process until a stage is reached where the nodes cannot be classified further; such a
final node is called a leaf node.
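As a rough illustration of Steps 1 to 5, the sketch below builds a tiny tree over categorical attributes, using information gain as the selection measure. It is a simplified ID3-style sketch written for this explanation, not the CART procedure that scikit-learn implements, and the four toy rows are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Impurity of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Step 2: pick the attribute with the highest information gain."""
    base = entropy(labels)

    def gain(attr):
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder

    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    """Steps 1, 3, 4, 5: recursively split until nodes are pure."""
    # Leaf node: all labels agree, or no attributes are left to split on
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    node = {"attribute": attr, "children": {}}
    remaining = [a for a in attributes if a != attr]
    for value in sorted(set(r[attr] for r in rows)):
        idx = [i for i, r in enumerate(rows) if r[attr] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return node


def predict(tree, row):
    """Walk from the root to a leaf (fails on attribute values not seen in training)."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attribute"]]]
    return tree


# Made-up toy data: admission depends only on the GRE band
rows = [
    {"gre": "high", "research": "yes"},
    {"gre": "high", "research": "no"},
    {"gre": "low", "research": "yes"},
    {"gre": "low", "research": "no"},
]
labels = [1, 1, 0, 0]

tree = build_tree(rows, labels, ["gre", "research"])
print(tree)
print(predict(tree, {"gre": "high", "research": "no"}))
```

Here the root splits on "gre" because that attribute separates the two classes completely, while "research" gives no information gain.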

The main issue when building a decision tree is how to select the best attribute for the root node
and the sub-nodes. This is solved with a technique called Attribute Selection Measure (ASM). Two
popular ASM techniques are Information Gain (which is built on Entropy) and the Gini Index:

1. Information Gain: Information gain is the measurement of changes in entropy after the
segmentation of a dataset based on an attribute. It calculates how much information a feature
provides us about a class. According to the value of information gain, we split the node and build
the decision tree.

$$ \text{Information Gain} = \text{Entropy}(S) - \left[(\text{Weighted Avg}) \times \text{Entropy}(\text{each feature})\right] $$

2. Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies randomness
in data. Entropy can be calculated as:

$$ \text{Entropy}(S) = -P(\text{yes}) \log_2 P(\text{yes}) - P(\text{no}) \log_2 P(\text{no}) $$

Where, S = total number of samples, P(yes) = probability of yes, P(no) = probability of no
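A short worked example of entropy and information gain for a binary target, using only the standard library. The label lists are made up for illustration: a parent node with 5 "yes" and 5 "no" samples is split into two subsets of 5.

```python
import math


def entropy(p_yes):
    """Entropy of a binary node given P(yes); P(no) = 1 - P(yes)."""
    total = 0.0
    for p in (p_yes, 1.0 - p_yes):
        if p > 0:                      # 0 * log2(0) is treated as 0
            total -= p * math.log2(p)
    return total


def node_entropy(labels):
    """Entropy of a list of 0/1 labels (1 = yes, 0 = no)."""
    return entropy(sum(labels) / len(labels))


def information_gain(parent, subsets):
    """Entropy(parent) minus the size-weighted average entropy of the subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * node_entropy(s) for s in subsets)
    return node_entropy(parent) - weighted


parent = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # 5 yes, 5 no
left = [1, 1, 1, 1, 0]                    # 4 yes, 1 no
right = [1, 0, 0, 0, 0]                   # 1 yes, 4 no

print(f"Entropy(parent)  = {node_entropy(parent):.4f}")
print(f"Entropy(left)    = {node_entropy(left):.4f}")
print(f"Information gain = {information_gain(parent, [left, right]):.4f}")
```

The parent is maximally impure (entropy 1.0), each subset has entropy of about 0.722, so the split yields an information gain of about 0.278.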

3. Gini Index: Gini index is a measure of impurity or purity used while creating a decision tree in the
CART (Classification and Regression Tree) algorithm. An attribute with a low Gini index should
be preferred over one with a high Gini index.

$$ \text{Gini Index} = 1 - \sum_j P_j^2 $$
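The Gini formula can be checked with a few lines of standard-library Python. The label lists below are made up for illustration.

```python
from collections import Counter


def gini_index(labels):
    """1 minus the sum of squared class proportions P_j."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


pure = ["yes"] * 5                      # one class only
balanced = ["yes"] * 5 + ["no"] * 5     # 50/50 mix
skewed = ["yes"] * 4 + ["no"] * 1       # 80/20 mix

for name, labels in (("pure", pure), ("balanced", balanced), ("skewed", skewed)):
    print(f"{name:9s} Gini = {gini_index(labels):.2f}")
```

A pure node has a Gini index of 0, a balanced binary node has the maximum of 0.5, and the 80/20 node falls in between at 0.32.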

3. SVM Algorithm: Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems.

SVM Algorithm Steps:

1. Importing the dataset
2. Splitting the dataset into training and test samples
3. Classifying the predictors and target
4. Initializing Support Vector Machine and fitting the training data
5. Predicting the classes for test set
6. Attaching the predictions to test set for comparing
7. Comparing the actual classes and predictions
8. Calculating the accuracy of the predictions
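The eight steps above might look like this in code. It is a sketch under assumptions: scikit-learn and pandas are installed, the data are synthetic, and the column names f1, f2, f3 and label are placeholders invented for the example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 1. Import the dataset (synthetic stand-in with placeholder column names)
features, target = make_classification(
    n_samples=200, n_features=3, n_informative=2, n_redundant=0, random_state=42
)
data = pd.DataFrame(features, columns=["f1", "f2", "f3"])
data["label"] = target

# 2. Split the dataset into training and test samples
train, test = train_test_split(data, test_size=0.2, random_state=42)

# 3. Separate the predictors and the target
X_train, y_train = train.drop(columns="label"), train["label"]
X_test, y_test = test.drop(columns="label"), test["label"]

# 4. Initialize the Support Vector Machine and fit the training data
svm = SVC(kernel="rbf")
svm.fit(X_train, y_train)

# 5. Predict the classes for the test set
predicted = svm.predict(X_test)

# 6. Attach the predictions to the test set for comparison
comparison = test.copy()
comparison["predicted"] = predicted

# 7. Compare the actual classes and the predictions
comparison["correct"] = comparison["label"] == comparison["predicted"]

# 8. Calculate the accuracy of the predictions
accuracy = accuracy_score(y_test, predicted)
print(comparison.head())
print(f"Accuracy: {accuracy:.2f}")
```

The mean of the "correct" column is the same number that accuracy_score returns, which makes step 7 a convenient sanity check for step 8.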

Applications of Classifications Algorithms:

1. Sentiment Analysis
2. Email Spam Classification
3. Document Classification
4. Image Classification

Code:
# To load the dataset
import pandas as pd
import matplotlib.pyplot as plt
# seaborn: for data visualization and exploratory data analysis
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

# Read data in csv file and store into dataframe
df = pd.read_csv('Admission_Predict.csv')
print(df.head(5))

##########################################################################

# To drop the irrelevant column and check if there are any null values in the dataset
df = df.drop(['Serial No.'], axis=1)
print(df.isnull().sum())

# To see the distribution of the variables of graduate applicants.
# histplot(): plots a histogram of the observations
# (it replaces distplot(), which is deprecated since seaborn 0.11)
# KDE: Kernel Density Estimate, an estimate of the probability density function of a
# continuous random variable; kde=False draws the histogram only

# Show GRE Score
fig = sns.histplot(df['GRE Score'], kde=False)
plt.title("Distribution of GRE Scores")
plt.show()

# Show TOEFL Score
fig = sns.histplot(df['TOEFL Score'], kde=False)
plt.title("Distribution of TOEFL Scores")
plt.show()

# Show University Ratings
fig = sns.histplot(df['University Rating'], kde=False)
plt.title("Distribution of University Rating")
plt.show()

# Show SOP Ratings
fig = sns.histplot(df['SOP'], kde=False)
plt.title("Distribution of SOP Ratings")
plt.show()

# Show CGPA
fig = sns.histplot(df['CGPA'], kde=False)
plt.title("Distribution of CGPA")
plt.show()

# It is clear from the distributions that students with varied merit apply for the university.

# Understanding the relation between different factors responsible for graduate admissions

# GRE Score vs TOEFL Score
# regplot(): Plot data and a linear regression model fit.
fig = sns.regplot(x="GRE Score", y="TOEFL Score", data=df)
plt.title("GRE Score vs TOEFL Score")
plt.show()

# People with higher GRE Scores also have higher TOEFL Scores, which is plausible because both
# TOEFL and GRE have a verbal section; the sections are not identical but they are related.

# GRE Score vs CGPA
fig = sns.regplot(x="GRE Score", y="CGPA", data=df)
plt.title("GRE Score vs CGPA")
plt.show()

# Although there are exceptions, people with higher CGPA usually have higher GRE scores, maybe
# because they are smart or hard working.

# LOR vs CGPA, showing whether Research is 0 or 1
# lmplot(): a 2D scatterplot with an optional overlaid regression line.
# hue: variable that defines subsets of the data, each drawn in a separate colour.
# Note: the column name "LOR " has a trailing space in the CSV file.
fig = sns.lmplot(x="CGPA", y="LOR ", data=df, hue="Research")
plt.title("LOR vs CGPA")
plt.show()

# LORs (Letter of Recommendation strength) are not that related to CGPA, so a person's
# LOR does not appear to depend on that person's academic excellence.
# Having research experience is usually related to a good LOR, which might be explained by the
# fact that supervisors interact personally with students performing research, which usually
# results in good LORs.

# GRE Score vs LOR, showing whether Research is 0 or 1
fig = sns.lmplot(x="GRE Score", y="LOR ", data=df, hue="Research")
plt.title("GRE Score vs LOR")
plt.show()

# GRE scores and LORs are also not that related. People with different kinds of LORs have all
# kinds of GRE scores.

# SOP vs CGPA
fig = sns.regplot(x="CGPA", y="SOP", data=df)
plt.title("SOP vs CGPA")
plt.show()

# CGPA and SOP are not that related, because the Statement of Purpose is not directly tied to
# academic performance. However, people with a good CGPA tend to be more hard working, so they
# have good things to say in their SOP, which might explain the slight trend of higher CGPA
# going along with good SOPs.

# GRE Score vs SOP
fig = sns.regplot(x="GRE Score", y="SOP", data=df)
plt.title("GRE Score vs SOP")
plt.show()

# Similarly, GRE Score and SOP are only slightly related.

# SOP vs TOEFL
fig = sns.regplot(x="TOEFL Score", y="SOP", data=df)
plt.title("SOP vs TOEFL")
plt.show()

# Correlation among variables
import numpy as np

# corr(): find the pairwise correlation of all columns in the dataframe
corr = df.corr()
print(corr)

# plt.subplots: create a figure and a set of subplots
fig, ax = plt.subplots(figsize=(8, 8))

# Make a diverging palette between two HUSL colors.
# as_cmap=True returns it as a matplotlib colour map
colormap = sns.diverging_palette(220, 10, as_cmap=True)

# zeros_like(): returns an array of the same shape as the given array, filled with zeros
# (dtype=bool so the array can be used directly as a mask)
dropSelf = np.zeros_like(corr, dtype=bool)

# np.triu_indices_from(dropSelf): indices of the upper triangle (including the diagonal);
# masking them hides the redundant half of the symmetric matrix
dropSelf[np.triu_indices_from(dropSelf)] = True

sns.heatmap(corr, cmap=colormap, linewidths=.5, annot=True, fmt=".2f", mask=dropSelf)
plt.show()

from sklearn.model_selection import train_test_split

# Drop the 'Chance of Admit ' column from the features (the name has a trailing space in the CSV)
X = df.drop(['Chance of Admit '], axis=1)
y = df['Chance of Admit ']

# Split data for training & testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, shuffle=False)

# DecisionTree, Random Forest, SVR, Linear Regression
# Note: these are regressors, so they predict the chance of admission as a number
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# These methods predict a future applicant's chance of admission.
models = [['DecisionTree :', DecisionTreeRegressor()],
          ['Linear Regression :', LinearRegression()],
          ['SVM :', SVR()]]

print("Results...")

# For loop for generating model results
for name, model in models:
    # Fit the model on the training features and target
    model.fit(X_train, y_train)
    # Predict the target for the test features
    predictions = model.predict(X_test)
    # Root mean squared error between actual and predicted values (lower is better)
    print(name, (np.sqrt(mean_squared_error(y_test, predictions))))

# Random forest (a regressor, despite the variable name) fitted on the full data
classifier = RandomForestRegressor()
classifier.fit(X, y)

# X.columns: features in the dataset
feature_names = X.columns
print(feature_names)

# Initialize importance_frame as an empty two-dimensional table (DataFrame)
importance_frame = pd.DataFrame()

# Column of feature names
importance_frame['Features'] = X.columns

# classifier.feature_importances_: impurity-based importance of each feature in the
# fitted forest (not a correlation value)
importance_frame['Importance'] = classifier.feature_importances_

# Sort ascending so the most important feature is drawn at the top of the bar graph
importance_frame = importance_frame.sort_values(by=['Importance'], ascending=True)

# Visualize the 7 feature importances
# barh: plots horizontal rectangles with constant heights.
plt.barh([1,2,3,4,5,6,7], importance_frame['Importance'], align='center', alpha=0.5)

# yticks: set feature labels on the y axis
plt.yticks([1,2,3,4,5,6,7], importance_frame['Features'])
plt.xlabel('Importance')

# Clearly, CGPA is the most important factor for graduate admissions, followed by GRE Score.
plt.title('Feature Importances')
plt.show()
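The aim asks for the last column to be changed to 0 or 1 and for a Decision Tree classifier, whereas the code above fits regressors on the continuous chance of admission. One possible way to bridge that gap is sketched below. It assumes a cut-off of 0.75 for "admitted", which is an arbitrary choice made for this illustration and not given in the assignment, and it uses a handful of made-up rows instead of Admission_Predict.csv so that it runs on its own.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up rows that mimic three columns of the dataset (NOT real records)
sample = pd.DataFrame({
    "GRE Score": [337, 324, 316, 322, 314, 330, 321, 308],
    "CGPA": [9.65, 8.87, 8.00, 8.67, 8.21, 9.34, 8.20, 7.90],
    "Chance of Admit ": [0.92, 0.76, 0.72, 0.80, 0.65, 0.90, 0.75, 0.68],
})

# Assumed cut-off: treat a chance of 0.75 or more as "admitted"
THRESHOLD = 0.75
sample["Admitted"] = (sample["Chance of Admit "] >= THRESHOLD).astype(int)

# Features the counselor looks at: GRE score and academic score (CGPA)
X_cls = sample[["GRE Score", "CGPA"]]
y_cls = sample["Admitted"]

# Fit a decision tree classifier on the binary target
tree_clf = DecisionTreeClassifier(random_state=0)
tree_clf.fit(X_cls, y_cls)

train_accuracy = tree_clf.score(X_cls, y_cls)
print(sample[["GRE Score", "CGPA", "Admitted"]])
print(f"Training accuracy: {train_accuracy:.2f}")
```

With the real file, the same thresholding could be applied to df['Chance of Admit '] before the train-test split, and accuracy should then be measured on the held-out test set rather than on the training rows as done here.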

Output:
Conclusion: Thus we have studied different classification techniques.
