IEEE
Priyanka Shahane
All content following this page was uploaded by Priyanka Shahane on 25 October 2022.
Abstract — Campus placement is the process of identifying and hiring young talent for internships and entry-level positions. The reputation and yearly admissions of an institute invariably depend on the placements it provides to its students. Therefore, most institutions work assiduously to strengthen their placement departments in order to improve the organization as a whole. Any assistance in this specific area can have a positive impact on an institute's capability to place its students.
In this study, the aim is to analyze the placement data of students from the previous year and use it to determine the probability of campus placement of the present students. For this, we have experimented with four different machine learning algorithms, i.e., Logistic Regression, Decision Tree, K-Nearest Neighbours and Random Forest.

Index Terms — Machine Learning, Campus placement prediction, Logistic Regression, Decision Tree, KNN, Random Forest

I. INTRODUCTION
NOWADAYS, the number of educational institutes is growing day by day. The aim of every higher educational institute is to help its students get well-paid jobs through its placement cell. One of the biggest challenges that higher learning institutes face these days is improving the placement performance of their students.
The goal of this system is to predict whether or not a student will get a campus placement based on various parameters such as gender, SSC percentage, HSC percentage, HSC stream, degree percentage, degree type, work experience and e-test percentage.
This research focuses on machine learning algorithms such as Logistic Regression, Decision Tree, K-Nearest Neighbours and Random Forest in order to produce economical and accurate results for campus placement prediction. The system follows a supervised machine learning approach, as it uses class-labelled data to train the classification algorithms.

II. LITERATURE SURVEY
Sharma et al. [1] developed a placement predictor system (PPS) using a logistic regression model. They considered features such as matriculation score, senior secondary score, scores of the subjects in various semesters, and demographics. The dataset used is from Guru Nanak Dev Engineering College (GNDEC), Ludhiana. This model gave an accuracy of around 83.33%.
Elayidom et al. [2] constructed multi-way decision trees using parameters such as branch, sector, sex and rank. The dataset used was obtained from the National Technical Manpower Information System (NTMIS) via the Nodal center. This model gave an accuracy of 80%.
Nagaria et al. [3] used a Random Forest model with parameters such as degree type, work experience, e-test percentage, specialization and MBA percentage. The dataset used is taken from Kaggle. This model gave the highest accuracy of 85%.
S. Venkatachalam et al. [4] designed a fuzzy inference system using the Naive Bayes algorithm for campus placement prediction. The dataset was prepared from primary and secondary data collection sources. This model gave the highest accuracy of 86.15%.
Manvitha et al. [5] used a Random Forest model with parameters such as credits, backlogs, whether placed or not, and B.Tech percentage. The dataset was collected from the placement department of Sreenidhi Institute of Science and Technology. This model gave the highest accuracy of 86%.

III. METHODOLOGY
The steps involved in this system are as follows:

A. Data Acquisition:
The campus placement dataset was collected from the Kaggle website. The dataset is available at:
https://www.kaggle.com/benroshan/factors-affecting-campus-placement?select=Placement_Data_Full_Class.csv
The dataset consists of attributes such as Serial Number, Gender, SSC percentage, SSC Board - Central/ Others, HSC percentage, HSC Board, HSC Specialization, Degree Percentage, UG Degree Stream, Work Experience, E-test Percentage, Degree Specialization, MBA Percentage, Placement Status and Salary. The dataset is 19.71 KB in size and contains 215 records in total.
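As a minimal sketch of loading the data with only the Python standard library, the snippet below parses CSV text into one dictionary per student record. The full column list (sl_no, ssc_p, hsc_p, hsc_b, hsc_s, degree_p, degree_t, etest_p, specialisation, mba_p, status, salary) is an assumption about the Kaggle file's header; only gender, ssc_b and workex are confirmed by the mapping examples in step 2. The two rows are made-up values, not records from the dataset.

```python
import csv
import io

def load_placement_rows(text):
    """Parse CSV text into a list of dicts, one dict per student record."""
    return list(csv.DictReader(io.StringIO(text)))

# Assumed header of Placement_Data_Full_Class.csv; verify against the real file.
HEADER = ("sl_no,gender,ssc_p,ssc_b,hsc_p,hsc_b,hsc_s,degree_p,degree_t,"
          "workex,etest_p,specialisation,mba_p,status,salary")

# Two made-up rows for illustration only (not taken from the dataset).
# The second student was not placed, so the salary field is left empty.
SAMPLE = "\n".join([
    HEADER,
    "1,M,85.0,Central,82.0,Others,Science,75.0,Sci&Tech,Yes,80.0,Mkt&Fin,70.0,Placed,300000",
    "2,F,55.0,Others,58.0,Others,Commerce,57.0,Comm&Mgmt,No,60.0,Mkt&HR,56.0,Not Placed,",
])

rows = load_placement_rows(SAMPLE)
print(len(rows), "records")                        # 2 here; the full file should give 215
print(rows[1]["status"], repr(rows[1]["salary"]))  # Not Placed ''
```

For the real file, the same reader works on an open file handle, e.g. `csv.DictReader(open("Placement_Data_Full_Class.csv", newline=""))`, and the record count can be compared against the 215 records stated above.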
1) Handling missing values:
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY BOMBAY. Downloaded on May 02,2022 at 10:01:58 UTC from IEEE Xplore. Restrictions apply.
In our dataset, missing values are present only in the salary column, because these entries correspond to students who did not get placed in any placement drive. The missing values in the Salary column are therefore taken to be zero and are replaced by zero using the pandas fillna(0, inplace=True) method in Python.

2) Handling categorical data:
Since the algorithms cannot work with categorical values directly, attributes with categorical values are mapped to numbers.
The Gender attribute has values M (Male) and F (Female); M is replaced by 0 and F by 1. The SSC and HSC Board attributes have values 'Central' and 'Others'; Central is replaced by 1 and Others by 0. The Work Experience attribute has values 'Yes' and 'No'; 'Yes' is replaced by 1 and 'No' by 0. The Degree Specialization attribute has values 'Marketing & Finance' and 'Marketing & HR'; 'Marketing & Finance' is replaced by 1 and 'Marketing & HR' by 0. The Status attribute has values 'Placed' and 'Not Placed'; 'Placed' is replaced by 1 and 'Not Placed' by 0. This is achieved through the pandas map function in Python. For example:
df['gender'] = df['gender'].map({'M': 0, 'F': 1})
df['ssc_b'] = df['ssc_b'].map({'Central': 1, 'Others': 0})
df['workex'] = df['workex'].map({'Yes': 1, 'No': 0})

3) Feature Selection:
Here, various features are visualized to understand their correlation with the target feature.

Fig. 2. M/F ratio

Here, the male-to-female ratio for one batch of students is approximately 2, i.e., about two male candidates appear for placement drives for every female candidate.
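A plain-Python sketch of steps 1) and 2), together with the male-to-female ratio behind Fig. 2, is given below. It mirrors the fillna(0) and map calls above without requiring pandas. The hsc_b, specialisation and status column names, the 'Mkt&Fin'/'Mkt&HR' raw labels and the sample records are assumptions for illustration.

```python
# Encodings described in step 2). The raw labels for "specialisation"
# ("Mkt&Fin", "Mkt&HR") are an assumption about how the CSV stores them.
MAPPINGS = {
    "gender": {"M": 0, "F": 1},
    "ssc_b": {"Central": 1, "Others": 0},
    "hsc_b": {"Central": 1, "Others": 0},
    "workex": {"Yes": 1, "No": 0},
    "specialisation": {"Mkt&Fin": 1, "Mkt&HR": 0},
    "status": {"Placed": 1, "Not Placed": 0},
}

def is_missing(value):
    """Treat None, empty strings and 'NaN' as missing values."""
    return value is None or value.strip() == "" or value.strip().lower() == "nan"

def preprocess(rows):
    """Return copies of rows with salary filled (step 1) and labels encoded (step 2)."""
    cleaned = []
    for row in rows:
        new = dict(row)
        # Step 1): a missing salary belongs to an unplaced student, so use 0.
        new["salary"] = 0.0 if is_missing(new.get("salary")) else float(new["salary"])
        # Step 2): replace each categorical label with its integer code.
        # Unlike pandas .map(), an unknown label raises KeyError instead of giving NaN.
        for column, mapping in MAPPINGS.items():
            if column in new:
                new[column] = mapping[new[column]]
        cleaned.append(new)
    return cleaned

def male_female_ratio(rows):
    """Male candidates per female candidate, using the encoded gender column."""
    males = sum(1 for r in rows if r["gender"] == 0)
    females = sum(1 for r in rows if r["gender"] == 1)
    return males / females

sample = [  # made-up records
    {"gender": "M", "ssc_b": "Central", "workex": "Yes", "status": "Placed", "salary": "250000"},
    {"gender": "M", "ssc_b": "Others", "workex": "No", "status": "Not Placed", "salary": ""},
    {"gender": "F", "ssc_b": "Central", "workex": "No", "status": "Placed", "salary": "220000"},
]
clean = preprocess(sample)
print(clean[1])                  # {'gender': 0, 'ssc_b': 0, 'workex': 0, 'status': 0, 'salary': 0.0}
print(male_female_ratio(clean))  # 2.0
```

Raising on an unexpected label is a deliberate choice in this sketch: a misspelled category is surfaced immediately rather than silently becoming a missing value.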
In the above graph, class 1 represents students with scores between 80-100%, class 2 represents students with scores between 60-80%, and class 3 represents students with scores below 60% in 10th standard.

Fig. 5. Placement count vs. 10th percentage

From the above graph, it is observed that all the students with scores between 80-100% in 10th standard got placed, only a few students with scores between 60-80% were not placed, and most of the students with scores below 60% were not placed.

Similarly, the graph of placement count vs. 12th percentage shows that all the students with scores between 80-100% in 12th standard got placed, only a few students with scores between 60-80% were not placed, and most of the students with scores below 60% were not placed.
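The class-wise counts plotted in these graphs can be sketched as below: each percentage is mapped to class 1, 2 or 3 and students are counted per (class, placement status) pair. The column name ssc_p for the 10th percentage, the handling of boundary scores (exactly 80 goes to class 1, exactly 60 to class 2) and the records themselves are assumptions for illustration.

```python
from collections import Counter

def score_class(percentage):
    """Class 1: 80-100%, class 2: 60-80%, class 3: below 60%.

    Exactly 80 is put in class 1 and exactly 60 in class 2 (an assumption;
    the text does not say how boundary scores are assigned).
    """
    if percentage >= 80:
        return 1
    if percentage >= 60:
        return 2
    return 3

def placement_counts(records, column):
    """Count students per (score class, placement status) for one column."""
    return Counter((score_class(r[column]), r["status"]) for r in records)

# Made-up records; "ssc_p" (10th percentage) is an assumed column name,
# and status is already encoded as 1 = placed, 0 = not placed.
records = [
    {"ssc_p": 85.0, "status": 1},
    {"ssc_p": 72.5, "status": 1},
    {"ssc_p": 65.0, "status": 0},
    {"ssc_p": 55.0, "status": 0},
]
counts = placement_counts(records, "ssc_p")
for (cls, status), n in sorted(counts.items()):
    print(f"class {cls}, {'placed' if status else 'not placed'}: {n}")
# class 1, placed: 1
# class 2, not placed: 1
# class 2, placed: 1
# class 3, not placed: 1
```

The same function applies to the 12th, UG and MBA percentage columns by passing a different column name.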
Fig. 9. Placement count vs. UG percentage

From the above graph, it is observed that most of the students with scores between 80-100% in UG got placed, only a few students with scores between 60-80% were not placed, and most of the students with scores below 60% were not placed.

Fig. 11. Placement count vs. MBA percentage

In the above graph, we can see that more students from class 2 got placed than from class 3.

Hence, it is clear that the placement count of the students depends on various features such as Gender, SSC percentage, SSC Board - Central/ Others, HSC percentage, HSC Board, HSC Specialization, Degree Percentage, UG Degree Stream, Work Experience, E-test Percentage, Degree Specialization and MBA Percentage.
4) Split data:
Here, the data is divided into two parts, training data and testing data: 80% of the data is used to train the machine learning algorithms, and the remaining 20% is used to test how well the trained models perform.
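With 215 records, an 80/20 split leaves 172 records for training and 43 for testing, which matches the total of every row in Table I (for example 16 + 1 + 1 + 25 = 43). A minimal sketch of such a split follows; the helper name, the seed and the plain shuffle are assumptions, since the exact split procedure is not specified (scikit-learn's train_test_split with test_size=0.2 is a common library alternative, and this hypothetical helper is not that function).

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and hold out test_fraction of them for testing.

    Returns (train, test). The seed only makes the shuffle reproducible.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Record indices 0..214 stand in for the 215 student records.
train, test = train_test_split(range(215))
print(len(train), len(test))  # 172 43
```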
5) Machine Learning Algorithms:
a) Logistic Regression:
Logistic regression is a statistical method used to predict the outcome of a binary dependent variable (y) from the values of one or more independent variables (x).
In our problem, the dependent variable is placement status, and the independent variables are the features selected in the previous step.
This algorithm is mostly used for binary classification problems.
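Logistic regression models P(placed | x) = 1 / (1 + e^-(w·x + b)) and predicts "placed" when that probability is at least 0.5. The sketch below fits w and b by plain batch gradient descent on the log-loss using a tiny made-up dataset; the learning rate, epoch count and data are illustrative assumptions, and in practice a library implementation such as scikit-learn's LogisticRegression would normally be used.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one student: P(placed) minus the true label.
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def predict_logistic(w, b, x, threshold=0.5):
    """Predict 1 (placed) when P(placed | x) >= threshold, else 0."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0

# Made-up, already standardized features (e.g. scaled SSC % and scaled e-test %).
X = [[1.0, 0.5], [0.5, 1.0], [1.5, 1.0], [-1.0, -0.5], [-0.5, -1.0], [-1.5, -1.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print([predict_logistic(w, b, x) for x in X])  # [1, 1, 1, 0, 0, 0]
```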
Fig. 10. MBA percentage distribution

After studying the MBA percentage data, it is observed that no student secured more than 80% marks, so class 1 data is not available for MBA percentage.

b) Decision Tree:
A decision tree is a tree-shaped graph in which each internal node selects a feature and asks a question about it, each edge represents an answer to that question, and each leaf represents the final output, i.e., the class label.
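A minimal sketch of growing such a tree is shown below: at each node it picks the question "is feature f <= t?" that most reduces Gini impurity, and it stops at pure nodes or at a depth limit. The Gini criterion, the depth limit of 3 and the made-up two-feature data (a percentage and encoded work experience) are assumptions; the splitting criterion used in this work is not specified here. Library implementations such as scikit-learn's DecisionTreeClassifier follow the same greedy idea with many more options.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1 - p) * (1 - p)

def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    n = len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return 1 if 2 * sum(y) >= len(y) else 0  # majority label, ties -> placed
    f, t = split
    left = [(row, yi) for row, yi in zip(X, y) if row[f] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[f] > t]
    return (
        f, t,
        build_tree([r for r, _ in left], [v for _, v in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [v for _, v in right], depth + 1, max_depth),
    )

def predict_tree(node, x):
    """Follow the answers from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

# Made-up records: [10th %, work experience], 1 = placed, 0 = not placed.
X = [[85, 1], [72, 1], [78, 0], [64, 0], [55, 1], [52, 0]]
y = [1, 1, 1, 0, 0, 0]
tree = build_tree(X, y)
print(tree)                         # (0, 64, 0, 1): "10th % <= 64?" yes -> 0, no -> 1
print(predict_tree(tree, [70, 0]))  # 1
```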
c) KNN:
K-NN stores all the training data together with their class labels and classifies a new data point by finding the K training points most similar (nearest) to it and assigning the majority class among them.
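The sketch below classifies a new student by a majority vote among the k nearest training records under Euclidean distance. The value k = 3, the distance measure and the made-up records (two percentage features) are assumptions for illustration. Because KNN compares raw distances, features on very different scales would normally be standardized first; here both features are percentages, so they are left as they are.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    # Sort (distance, label) pairs by Euclidean distance to the query point.
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Made-up records: [10th %, 12th %] with 1 = placed, 0 = not placed.
train_X = [[85, 80], [78, 75], [90, 88], [55, 60], [50, 52], [58, 57]]
train_y = [1, 1, 1, 0, 0, 0]

print(knn_predict(train_X, train_y, [80, 78]))  # 1
print(knn_predict(train_X, train_y, [56, 58]))  # 0
```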
d) Random Forest:
A Random Forest classifier consists of a number of decision trees, each applied to a different subset of our dataset; the outputs of all the decision trees are averaged to improve the accuracy of the prediction.

6) Evaluate results:
Accuracy is calculated using the following formula:

Accuracy = (TP + TN) / (TP + FP + TN + FN)

Where,
TP: True Positive (the number of cases correctly identified as placed)
TN: True Negative (the number of cases correctly identified as unplaced)
FP: False Positive (the number of cases incorrectly identified as placed)
FN: False Negative (the number of cases incorrectly identified as unplaced)

IV. CONCLUSION
The problem of campus placement prediction can be solved with the help of different machine learning algorithms such as Logistic Regression, Decision Tree, KNN and Random Forest.
Here, the Logistic Regression algorithm gave the highest accuracy for campus placement prediction, 95.34%.
The selected features, i.e., Gender, SSC percentage, SSC Board - Central/ Others, HSC percentage, HSC Board, HSC Specialization, Degree Percentage, UG Degree Stream, Work Experience, E-test Percentage, Degree Specialization and MBA Percentage, lead to higher classification accuracy.

V. FUTURE SCOPE
Accuracy may increase further with more advanced techniques such as deep learning, for example by experimenting with different activation functions of neural networks such as linear, sigmoid, tanh and ReLU.
We can also experiment with different cross-validation settings, such as 3-fold, 5-fold, 10-fold and 15-fold cross-validation, in order to analyze the change in accuracy.
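As a check of the accuracy formula, the sketch below recomputes accuracy from the Table I counts and also shows one simple way to generate the k-fold splits suggested above. Applied to Table I, the formula gives 41/43 ≈ 95.35% for Logistic Regression, 36/43 ≈ 83.72% for Decision Tree and 39/43 ≈ 90.70% for KNN, which agree with Table II to within 0.01 percentage points, whereas the Random Forest counts give 37/43 ≈ 86.05% rather than the 88.67% listed in Table II. The k_fold_indices helper is a hypothetical, unshuffled illustration, not the procedure behind the reported results.

```python
def accuracy(tp, fp, fn, tn):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)

# (TP, FP, FN, TN) counts copied from Table I.
TABLE_I = {
    "Logistic Regression": (16, 1, 1, 25),
    "Decision Tree": (13, 3, 4, 23),
    "KNN": (14, 1, 3, 25),
    "Random Forest": (13, 2, 4, 24),
}

for model, counts in TABLE_I.items():
    print(f"{model}: {100 * accuracy(*counts):.2f} %")
# Logistic Regression: 95.35 %
# Decision Tree: 83.72 %
# KNN: 90.70 %
# Random Forest: 86.05 %

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over n samples.

    The first n % k folds get one extra sample; no shuffling is done here.
    """
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

# 5-fold cross validation over the 215 records: five test folds of 43 each.
print([len(test) for _, test in k_fold_indices(215, 5)])  # [43, 43, 43, 43, 43]
```

Each fold's model would be trained on train_idx and scored with accuracy() on test_idx, and the k scores averaged.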
TABLE I. TP, FP, FN & TN VALUES OF DIFFERENT MODELS
Model                  TP   FP   FN   TN
Logistic Regression    16    1    1   25
Decision Tree          13    3    4   23
KNN                    14    1    3   25
Random Forest          13    2    4   24

TABLE II. CAMPUS PLACEMENT PREDICTION ACCURACY OF DIFFERENT MODELS
Model                  Accuracy
Logistic Regression    95.34 %
Decision Tree          83.72 %
KNN                    90.69 %
Random Forest          88.67 %

REFERENCES
[1] A. S. Sharma, S. Prince, S. Kapoor and K. Kumar, "PPS — Placement prediction system using logistic regression," 2014 IEEE International Conference on MOOC, Innovation and Technology in Education (MITE), 2014, pp. 337-341, doi: 10.1109/MITE.2014.7020299.
[2] S. Elayidom, S. M. Idikkula, J. Alexander and A. Ojha, "Applying Data Mining Techniques for Placement Chance Prediction," 2009 International Conference on Advances in Computing, Control, and Telecommunication Technologies, 2009, pp. 669-671, doi: 10.1109/ACT.2009.169.
[3] J. Nagaria and S. V. S, "Utilizing Exploratory Data Analysis for the Prediction of Campus Placement for Educational Institutions," 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2020, pp. 1-7, doi: 10.1109/ICCCNT49239.2020.9225441.
[4] S. Venkatachalam, "Data Mining Classification and analytical model of prediction for Job Placements using Fuzzy Logic," 2021 IEEE International Conference on Trends in Electronics and Informatics (ICOEI), 2021.
[5] P. Manvitha and N. Swaroopa, "Campus Placement Prediction Using Supervised Machine Learning Techniques," International Journal of Applied Engineering Research, 2019, pp. 2188-2191.