Automated Resume Classification System Using Ensemble Learning
2023 9th International Conference on Advanced Computing and Communication Systems (ICACCS)
979-8-3503-9737-6/23/$31.00 ©2023 IEEE
Authorized licensed use limited to: Dayananda Sagar University. Downloaded on March 13,2025 at 05:19:34 UTC from IEEE Xplore. Restrictions apply.

Abstract—One of the biggest challenges for job recruiters is selecting a suitable resume from a pool of resumes. For a single role, thousands of candidates send their resumes. Manually selecting resumes from such a large number of applicants and assigning them suitable positions is time-consuming and not feasible. An automated system can make this process easy and efficient. This system takes candidates’ resumes in Word, PDF, or other formats and classifies them according to the skill set mentioned in the resume. We propose an ensemble deep-learning model to classify the resumes.

Keywords—resume; classification; job; deep learning; ensemble learning; skill set.

I. INTRODUCTION

Recruitment in the Information Technology field has been growing rapidly. Recruiters must screen resumes properly to hire suitable candidates. The process of checking whether a candidate is suitable for a particular role based on the information in their CV/resume is called resume screening. Recruiters have to screen a large amount of resume data quickly and reliably.

The most important and basic tool in any selection process is the candidate’s resume. Interviewing has become a time-consuming affair, and because the number of applications can run into the millions, sorting through them is time-consuming as well. A machine learning approach is therefore needed that screens resumes more efficiently and fulfills the requirements of the industry.

The fields of Artificial Intelligence and Machine Learning (ML) have grown significantly. In machine learning, a dataset is used to train a model that predicts the intended outcome for new incoming data. The large amounts of data now available have contributed to significant recent growth in the performance of ML models. We can take advantage of this growth to automate work and increase productivity in areas that require manual human labor. We propose a mechanism that allows recruiters to recruit based on the skill set and other information mentioned in candidates’ resumes, using an ensemble deep-learning model that classifies a given resume into one of several categories. This reduces the time spent screening resumes manually.

II. LITERATURE SURVEY

The research paper “A machine learning approach for automation of resume recommendation system” describes an algorithm that analyzes the characteristics extracted from a resume and categorizes them according to the job description. The categorized resumes are mapped to the job, and the candidate most suitable for the position is recommended. The authors built two models. The classification model was built using several algorithms: Random Forest, Multinomial Naïve Bayes, Logistic Regression, and Linear Support Vector Machine (SVM) classifiers. Among these, the SVM performed best. The recommendation model was built on content-based recommendation and K-Nearest Neighbours [1].

The research paper “Automated tool for resume classification using semantic analysis” presents the development of a resume classification application. It uses a voting classifier, which is based on ensemble learning, to categorize a candidate’s profile into an appropriate domain according to the work experience and other details the applicant gives in the profile [2].

The research paper “Resume Classification using various Machine Learning Algorithms” describes a model using Naïve Bayes, Random Forest, and SVM that extracts skills and shows diverse capabilities under appropriate job-profile classes. Random Forest gave the best accuracy of the three [3].

The research paper “Differential Hiring using a Combination of NER and Word Embedding” describes a methodology that uses NLP together with Word2Vec pre-trained word embeddings. Word embedding is a method of representing words as real-valued vectors that encode their meaning, such that words lying close to one another in the vector space are expected to have similar meanings. This helps retrieve the resumes that best match the required skill set [4].

The research paper “Resume Screening Using Machine Learning and NLP: A Proposed System” proposes a machine learning model that takes a student’s resume and, based on the skills and other details mentioned in it, shows suitable job roles and the resume’s relevance to the job description [5].

“Resume Ranking based on Job Description using SpaCy NER model” devised a method that lowers hiring costs and speeds up the process of selecting the best candidate for a job role [6].

“Domain Adaptation for Resume Classification Using Convolutional Neural Networks” trains a classifier on a large number of openly accessible job-description excerpts and then uses it to categorize resume data. Despite having only a small amount of labeled resume data at their disposal, the authors empirically confirmed respectable classification performance for the approach [7].

The research work “A Hybrid Approach to Conceptual Classification and Ranking of Resumes and Their Corresponding Job Posts” presents a hybrid approach that combines a concept-based classification of resumes with a ranking system that ranks candidates against the corresponding job offers. The authors collected 2,000 resumes from online sites and 10,000 different job postings for the experiment, and used job titles and skill sets in the classification process. They reported higher precision [8].

In “Towards an automated system for intelligent screening of candidates for recruitment using ontology mapping EXPERT”, EXPERT, an intelligent ontology-mapping-based candidate screening tool, was used to build an automated system for the intelligent screening of candidates for recruitment, improving the precision with which candidates are matched to the job criteria [9].

III. METHODOLOGY

B. Data Preprocessing

Data preprocessing converts the raw data in the dataset into a form suitable for our task. In this procedure, the information supplied in each curriculum vitae (CV) is cleaned, unnecessary data is removed, and the data is then converted into vectors. The following steps were performed to preprocess the resume data.

1) Data Cleaning: In the cleaning process, numbers, special characters, and single-letter words are removed, which gives the cleaned resumes.

2) Tokenization: Tokenization was performed on the resume data using the tokenizer class of TensorFlow.

3) Removal of stop words: Words such as ‘is’, ‘are’, and ‘was’ are called stop words. They appear in most texts and do not provide any important information for the classification task, so they are removed from the generated tokens. English stop words were imported from the NLTK corpus and used for the stop-word removal process.

4) Label encoding: Label encoding assigns a numerical label to each category; the scikit-learn LabelEncoder was used. Fig. 2 shows how, after label encoding, the categories are given unique numeric values. The number of instances of the different domains in the dataset is shown in Fig. 1.

Fig. 1. Number of instances of different domains in the dataset.

Fig. 2. After label encoding.
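Step 1 (data cleaning) can be sketched with Python regular expressions. This is a minimal illustration, not the authors’ code: it assumes the resume text has already been extracted from the Word or PDF file, and the helper name `clean_resume` and the lower-casing step are our own assumptions.

```python
import re

def clean_resume(text: str) -> str:
    """Hypothetical cleaning helper: drop numbers, special characters and single-letter words."""
    text = text.lower()                        # assumption: case-fold before matching
    text = re.sub(r"[0-9]+", " ", text)        # remove numbers
    text = re.sub(r"[^a-z\s]", " ", text)      # remove special characters (keep letters and whitespace)
    text = re.sub(r"\b[a-z]\b", " ", text)     # remove words made of a single letter
    return re.sub(r"\s+", " ", text).strip()   # collapse the whitespace left behind

print(clean_resume("Skills: Python 3, C++ & SQL; 5 yrs as a Data Analyst."))
# skills python sql yrs as data analyst
```

Read literally, this step also discards skill tokens such as `C++` or `R`, as the example shows; a practical pipeline might protect such terms with a whitelist before cleaning.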
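Step 2 relied on TensorFlow’s tokenizer class. The dependency-free sketch below mimics what that class does as we understand it (`tf.keras.preprocessing.text.Tokenizer` with `fit_on_texts` and `texts_to_sequences`, followed by Keras’s `pad_sequences`): words are numbered by descending frequency starting at 1, index 0 is reserved for padding, unseen words are skipped, and sequences are padded and truncated at the front. The function names mirror the Keras ones for readability but are hypothetical stand-ins, and the padded length used in the paper is not stated.

```python
from collections import Counter

def fit_word_index(texts):
    """Number words by descending frequency (ties keep first-seen order); 0 is reserved for padding."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    ranked = sorted(counts, key=lambda w: -counts[w])  # stable sort keeps first-seen order on ties
    return {word: i + 1 for i, word in enumerate(ranked)}

def texts_to_sequences(texts, word_index):
    """Map each word to its id; words never seen while fitting are dropped."""
    return [[word_index[w] for w in text.split() if w in word_index] for text in texts]

def pad_sequences(seqs, maxlen):
    """Keep the last `maxlen` ids and left-pad with 0 (front padding and truncation); maxlen >= 1."""
    return [[0] * (maxlen - len(s[-maxlen:])) + s[-maxlen:] for s in seqs]

word_index = fit_word_index(["python sql python", "java sql", "python java spark"])
print(word_index)                                             # {'python': 1, 'sql': 2, 'java': 3, 'spark': 4}
print(pad_sequences(texts_to_sequences(["python java cobol"], word_index), 4))  # [[0, 0, 1, 3]]
```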
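Step 3 amounts to a filter over the word tokens. In NLTK the full English list is available through `stopwords.words("english")` from `nltk.corpus` after a one-time `nltk.download("stopwords")`; to keep the sketch self-contained, a small hand-picked subset stands in for that list.

```python
# Hand-picked subset standing in for NLTK's English stop-word list, i.e.
# set(nltk.corpus.stopwords.words("english")) after nltk.download("stopwords").
STOP_WORDS = frozenset({"is", "are", "was", "the", "and", "of", "in", "to",
                        "with", "as", "for", "on", "an", "a"})

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that carry information for classification."""
    return [t for t in tokens if t not in stop_words]

tokens = "experience with python and sql is required".split()
print(remove_stop_words(tokens))  # ['experience', 'python', 'sql', 'required']
```

A `frozenset` keeps each membership test constant-time, which matters when the list is applied to every token of every resume.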
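Step 4 used scikit-learn’s `LabelEncoder`, which sorts the distinct category names and numbers them from 0. The pure-Python equivalent below shows that mapping; the category names are hypothetical examples, not the dataset’s actual domains.

```python
def fit_label_encoder(labels):
    """Mimic LabelEncoder: sort the distinct classes, then number them 0..n-1."""
    classes = sorted(set(labels))
    return classes, {c: i for i, c in enumerate(classes)}

categories = ["Data Science", "HR", "Java Developer", "HR", "Data Science"]  # hypothetical labels
classes, to_id = fit_label_encoder(categories)
encoded = [to_id[c] for c in categories]
print(encoded)     # [0, 1, 2, 1, 0]
print(classes[2])  # inverse mapping from id 2: Java Developer
```

With scikit-learn installed, `LabelEncoder().fit_transform(categories)` should give the same codes, and `inverse_transform` maps predicted ids back to category names.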
C. Model Architecture

We created an ensemble model from a 1D Convolutional Neural Network (CNN) and a Bi-directional Gated Recurrent Unit (GRU), as shown in Fig. 3. The two networks act as the two channels of our model. Each resume text is mapped to a sequence of real-valued vectors using pre-trained word embeddings trained with a skip-gram model on the 3-billion-word Google News corpus [10]. The input sequences are fed to this embedding layer.

In channel 1, the output of the embedding layer is fed into a 1D convolutional layer with 100 filters and a kernel size of 3. The rectified linear unit (ReLU) is used as the activation function; it is cheap to compute and, unlike saturating activations such as sigmoid and tanh, helps to mitigate vanishing gradients. The convolution turns the input feature space into 100 feature maps, which are then down-sampled by a max pooling layer with a pool size of 2. Each dimension of the pooled output can be considered an ‘extracted feature’. The pooled output is then flattened and fed to a dropout layer with a rate of 0.5, which regularizes learning and prevents overfitting. Finally, a softmax layer is added.

In channel 2, the output of the embedding layer feeds into a GRU layer. Its output is fed to a dropout layer with a rate of 0.5, again to regularize learning and prevent overfitting. Finally, a softmax layer is added.

The outputs of both channels are combined to obtain the final output.

Fig. 3. Model architecture.
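The embedding layer is initialized from pre-trained skip-gram vectors. A common way to do this, sketched here under the assumption that the vectors have already been loaded into a plain dictionary, is to build a matrix whose row i holds the vector of the word with id i. The Google News word2vec release uses 300-dimensional vectors; the toy vectors below have 3 dimensions for readability. Leaving the padding row and out-of-vocabulary words at zero is our assumption; random initialization is an equally common choice.

```python
def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the pre-trained vector of the word whose id is i.

    Row 0 (padding) and words missing from `vectors` stay all-zero,
    which is an assumption of this sketch.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)
    return matrix

# Toy 3-d vectors standing in for the 300-d Google News ones (hypothetical values).
pretrained = {"python": [0.1, 0.2, 0.3], "sql": [0.0, -0.1, 0.4]}
word_index = {"python": 1, "sql": 2, "kubernetes": 3}
emb = build_embedding_matrix(word_index, pretrained, dim=3)
print(emb[1], emb[3])  # [0.1, 0.2, 0.3] [0.0, 0.0, 0.0]
```

In Keras, such a matrix is typically passed to `Embedding` through `embeddings_initializer=tf.keras.initializers.Constant(matrix)`; whether the layer was frozen during training is not stated in the paper.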
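The channel 1 pipeline (convolution with ReLU, max pooling, flattening) can be made concrete in pure Python. The sketch assumes “valid” padding (Keras’s default for `Conv1D`) and non-overlapping pooling (the default stride of `MaxPooling1D` equals the pool size); neither setting is stated in the paper. Under those assumptions a padded sequence of length L yields ((L - 3 + 1) // 2) × 100 flattened features, so a hypothetical length of 500 gives 24,900.

```python
def conv1d_relu(seq, kernels, biases):
    """'Valid' 1D convolution over L x d embedding vectors, then ReLU.

    kernels: F kernels of shape k x d; biases: F floats. Returns (L - k + 1) x F.
    """
    k, d = len(kernels[0]), len(seq[0])
    out = []
    for t in range(len(seq) - k + 1):
        row = []
        for w, b in zip(kernels, biases):
            s = b + sum(w[j][c] * seq[t + j][c] for j in range(k) for c in range(d))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out

def max_pool1d(fmap, pool=2):
    """Non-overlapping max pooling along time; an incomplete last window is dropped."""
    return [[max(fmap[t + i][f] for i in range(pool)) for f in range(len(fmap[0]))]
            for t in range(0, len(fmap) - pool + 1, pool)]

def flatten(fmap):
    """Concatenate the time x filters map into a single feature vector."""
    return [v for row in fmap for v in row]

def channel1_feature_size(seq_len, filters=100, kernel=3, pool=2):
    """Length of the flattened vector that reaches dropout and softmax in channel 1."""
    return ((seq_len - kernel + 1) // pool) * filters

seq = [[1], [2], [3], [4], [5]]            # five 1-d "word vectors" (toy values)
kernels = [[[1], [1], [1]], [[-1], [0], [1]]]  # two hypothetical filters of width 3
fmap = conv1d_relu(seq, kernels, biases=[-6.0, 0.0])
print(fmap)                                # [[0.0, 2.0], [3.0, 2.0], [6.0, 2.0]]
print(flatten(max_pool1d(fmap)))           # [3.0, 2.0]
print(channel1_feature_size(500))          # 24900
```

In Keras, the same stack would correspond to `Conv1D(100, 3, activation="relu")`, `MaxPooling1D(2)`, `Flatten()`, `Dropout(0.5)` and, on our reading of “a softmax layer”, `Dense(n_classes, activation="softmax")`. Dropout is left out here because it only acts during training.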
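To show what channel 2’s recurrent layer computes, the sketch below implements the standard GRU cell in pure Python (reset gate applied before the recurrent product, as in the original formulation, and h_t = z ⊙ h_{t-1} + (1 - z) ⊙ h̃_t). It runs the cell forwards and backwards over the sequence and concatenates the two final states, as a bidirectional GRU does. The paper does not give the hidden size or say whether the full output sequence or only the final states are used; the tiny sizes, the toy weights and the use of final states are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def gru_step(x, h, p):
    """One GRU step; p holds input weights W*, recurrent weights U* and biases b*."""
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]  # update gate
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]  # the reset gate scales the previous state
    cand = [math.tanh(a + b + c) for a, b, c in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    # The update gate interpolates between the previous state and the candidate state.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]

def run_gru(seq, p, hidden):
    h = [0.0] * hidden
    for x in seq:
        h = gru_step(x, h, p)
    return h

def bigru_final_state(seq, p_fwd, p_bwd, hidden):
    """Final forward state concatenated with the final state of a pass over the reversed sequence."""
    return run_gru(seq, p_fwd, hidden) + run_gru(seq[::-1], p_bwd, hidden)

def toy_params(hidden, dim, scale):
    """Hypothetical deterministic weights, for illustration only."""
    W = [[scale * (i + j + 1) for j in range(dim)] for i in range(hidden)]
    U = [[scale * (i - j) for j in range(hidden)] for i in range(hidden)]
    b = [0.0] * hidden
    return {"Wz": W, "Uz": U, "bz": b, "Wr": W, "Ur": U, "br": b, "Wh": W, "Uh": U, "bh": b}

seq = [[0.5, -0.2], [0.1, 0.3], [0.5, -0.2]]  # three 2-d "word vectors"
out = bigru_final_state(seq, toy_params(2, 2, 0.1), toy_params(2, 2, -0.1), hidden=2)
print(len(out))  # 4: two forward units followed by two backward units
```

In Keras, the corresponding layer would be `Bidirectional(GRU(units))`. Note that recent Keras versions default to `reset_after=True`, which applies the reset gate after the recurrent product, so their numbers differ slightly from this textbook cell.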
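The paper states only that the outputs of the two channels are combined. Because each channel ends in a softmax, one plausible reading, assumed here, is soft voting: average the two probability vectors and take the arg-max. Concatenating the channel features before a single softmax would be another option. The logits below are made-up values.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combine_channels(p_cnn, p_gru, weight=0.5):
    """Soft voting: weighted average of the two channels' class probabilities (assumed rule)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_cnn, p_gru)]

def predict(p_cnn, p_gru):
    probs = combine_channels(p_cnn, p_gru)
    return max(range(len(probs)), key=probs.__getitem__)

p1 = softmax([2.0, 1.0, 0.1])  # CNN channel alone would pick class 0
p2 = softmax([0.5, 2.5, 0.0])  # GRU channel is confident in class 1
print(predict(p1, p2))         # 1
```

Averaging keeps the result a valid probability distribution, so the predicted id can be mapped back to a category name with the label encoder’s inverse mapping.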
Fig. 5. Web app

REFERENCES
[1] Roy, P. K., Chowdhary, S. S., & Bhatia, R. (2020). A machine learning approach for automation of resume recommendation system. International Conference on Computational Intelligence and Data Science, 167 (Elsevier B.V.), 2318–2327.
[2] Gopalakrishna, S. T., & Varadharajan, V. (2019). Automated tool for resume classification. International Journal of Artificial Intelligence and Applications, 10. Bengaluru.
[3] Pal, R., Shaikh, S., Bhagwat, S., & Satpute, S. (2022). Resume classification using various machine learning algorithms. International Conference on Automation, Computing and Communication, 44. Navi Mumbai.
[4] (2020). Differential hiring using a combination of NER and word embedding. International Journal of Recent Technology and Engineering.
[5] Kinge, B., Mandhare, S., Chavan, P., & Chaware, S. M. (2022). Resume screening using machine learning and NLP: A proposed system. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 8(2), 253–258.
[6] Satheesh, K., Jahnavi, A., Iswarya, L., Ayesha, K., Bhanusekhar, G., & Hanisha, K. (2020). Resume ranking based on job description using SpaCy NER model. International Research Journal of Engineering and Technology, 07(05), 74–77.
[7] Sayfullina, L., Malmi, E., Liao, Y., & Jung, A. (2017). Domain adaptation for resume classification using convolutional neural networks. Springer, Cham.
[8] Zaroor, A., Maree, M., & Sabha, M. (2017). A hybrid approach to conceptual classification and ranking of resumes and their corresponding job posts. doi:10.1007/978-3-319-59421-7_10.
[9] Senthil Kumaran, V., & Sankar, A. (2013). Towards an automated system for intelligent screening of candidates for recruitment using ontology mapping EXPERT. International Journal of Metadata, Semantics and Ontologies, 8, 56–64. doi:10.1504/IJMSO.2013.054184.
[10] Zhang, Z., Robinson, D., & Tepper, J. (2018). Detecting hate speech on Twitter using a convolution-GRU based deep neural network (Vol. 48). Springer, Cham, June 2018. doi:10.1007/978-3-319-93417-4_48.