Resume Screening Using Machine Learning
Article Info
Volume 8, Issue 2
Page Number : 253-258
Publication Issue : March-April-2022
Article History
Accepted: 15 April 2022
Published: 25 April 2022

ABSTRACT
The Indian recruitment market has grown substantially over the last half-decade as the need for cheap labor grows and the number of job openings increases. As the job market grows, so does the recruitment industry, a new way of hiring in which the hiring process itself is outsourced to other companies whose sole purpose is to supply the talent the company requires. This is done because these companies hire in bulk, and doing so in-house would require a lot of company resources, which would hamper productivity. Even for such recruitment companies, manually going through all of the candidates' resumes is very time-consuming and tedious, so these talent acquisition companies use various machine learning models to filter out the top resumes according to the job roles, which reduces the effort required of the Human Resource team.
Keywords: NLP, Resume, CV, KNN, SVM, NER
Copyright: © the author(s), publisher and licensee Technoscience Academy. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Bhushan Kinge et al Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol, March-April -2022, 8 (2) : 253-258
Hence, if companies hire in bulk, there are many applications to go through to find the talent they need, which requires a considerable amount of resources and time. Talent acquisition companies arise as a solution to this problem: they fill the gap and get the job done at a lower resource cost to the company and within an acceptable timeline. Even here the applications can number in the millions, which makes going through them a tedious task, so these companies use various machine learning models to rank the resumes and pick out those that best fit the job role.

II. LITERATURE

A. Machine Learning Approach for Automation of Resume Recommendation System
In the research by Pradeep Kumar Roy [1], a system was created to minimize the cost of hiring new candidates for job positions in a company. The work focused on 3 major problems in this process:
● Picking the right candidates from the applicants
● Making sense of their CVs
● Finding out whether the candidate is fit for the job role
They performed NER, NLP, and text classification using n-grams, and used machine learning to perform the classification with the following algorithms: Random Forest with 38.9% accuracy, Multinomial Naïve Bayes with 44.39%, Logistic Regression with 62.4%, and a Linear Support Vector Machine classifier, which obtained the highest accuracy of 78.53%.

B. Skill Finder: Automated Job-Resume Matching System
In the research conducted by Thimma Reddy Kalva [2], a custom dataset of 3000 jobs and 80 resumes was built from the website Indeed using its web service API [9]. This data is then used to rank the students' resumes by comparing their skills with those required for the job. This is done using Named Entity Recognition (NER) tools such as Apache OpenNLP [10] and the Stanford Named Entity Recognizer [11]. The Skill Finder efficiently matches resumes to the posted job role and successfully sends emails to the desired candidates.

C. ResumeNET : A Learning-based Framework for Automatic Resume Quality Assessment
The research conducted by Yong Luo [3] uses a custom dataset of 10,343 resumes obtained from a private resume management company. Of these, 98.82% (i.e. 10,221 resumes) are unlabeled and the remaining 1.18% (i.e. 122 resumes) are labeled in 2 categories, positive and negative: 33 are labeled as positive and 89 as negative.

D. Web Application for Screening Resume
The goal of Sujit Amin [4] was to develop a web application for resume screening with the help of 220 resumes, of which 200 were used for training and 20 for testing. The web application is divided into 3 parts:
A) Job Applicant Side
B) Server Side
C) Recruiter Side
The applicant side is where the applicant provides his/her resume. The server side processes the resume and is trained using an NLP pipeline built on SpaCy, an NLP framework. On the recruiter's side, a ranked list of the resumes is shown, as determined by a score calculator [13], so the recruiter can select the best-fit candidate for the job.

E. Design and Development of Machine Learning based Resume Ranking System
In the system proposed by Tejaswini K [5], the resume is submitted by the candidate after an
MCQ test that uses a face detection system to detect malpractice. Once the resume is submitted, it is run through NLP techniques to extract the relevant skills, and TF-IDF vectorization [14] is used to convert the words into vectors that the machine can process.
The classifier used here is the KNN algorithm, which identifies the resumes that most closely match the job description (JD) provided by the recruiter. The system has an average parsing accuracy of 85%.

F. Resume Classification and Ranking using KNN and Cosine Similarity
Riza Tanaz Fareed, Rajath V, and Sharadadevi Kaganumath came up with a method to implement resume classification with the addition of cosine similarity [15]. In this process, the candidate provides his/her resume to the system.
The resume is then passed through an NLP pipeline where the words are extracted from the resume. Techniques such as stop-word removal and lemmatization are used to get the correct set of words. A TF-IDF vectorizer [14] is used to vectorize the words so that the KNN model can classify the resume into various categories. To evaluate the resume against the given JD, document similarity detection is necessary, so the cosine similarity algorithm is used to match the JD content with the candidate's resume. The accuracy of this trained model is 98.96%.

G. Automated Tool for Resume Classification using Semantic analysis
This study, conducted by Suhas Tangadle Gopalakrishna and Vijayaraghavan Varadharajan [16], provides a descriptive view of how they use semantic analysis for resume classification. The resumes received by the HR team are parsed through the Natural Language Processing Pipeline (NLPP), where stop-word deletion is used to delete words like "and, or, the, etc.". Other techniques like Part-of-Speech tagging and NER are also used.
A total of 6 types of classification models are used: Naïve Bayes, Multinomial Naïve Bayes, Linear SVM, Bernoulli Naïve Bayes, Logistic Regression and KNN. These classification models are run on a dataset of 30,000 employees' resumes, which was split in a 9:1 ratio, with 27,000 used for training and 3,000 for testing, across various domains including AI, Computer Architecture, CG, Databases, Distributed Computing, CN, Web Technologies and Cloud Computing. Of the 6 classifiers, Multinomial Naïve Bayes achieved the top accuracy of 91%.

H. Differential Hiring using a Combination of NER and Word Embedding
The objective of this study, conducted by Suhas H E and Manjunath A E [17], was to create a model which uses NLP, NER, word embedding, and cosine similarity to suggest resumes for job roles. The resumes and the JD are taken as input by the system.
A data dump of technical skills [18] is used to tag technical skills in each resume document. A tab-separated value (TSV) file is generated, and that file is provided to the Stanford NER model [10] to train the NER model. The output of the NER model (i.e. skills) becomes the input for the word2vec model, which uses a shallow neural network.
The last step of the process is cosine similarity, which determines how closely the given resume matches the given JD. The accuracy obtained was 79.8%.

III. LIMITATIONS
▪ The above systems use models that have no way to improve themselves over time; the models are trained only once.
▪ The above models use machine learning algorithms that tend to plateau in performance when run on a large dataset.
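
The ranking step that Sections F and H describe, vectorizing the text with TF-IDF and scoring each resume against the JD with cosine similarity, can be illustrated with a minimal sketch. It is not the implementation of any of the surveyed systems: the stop-word list, the smoothed IDF formula, the function names, and the sample texts are all assumptions made for illustration, and the NER and KNN classification stages are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"and", "or", "the", "a", "an", "in", "of", "with", "for", "to"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant), so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_resumes(job_description, resumes):
    """Rank resumes (dict of name -> text) by similarity to the JD, best first."""
    vectors = tfidf_vectors([job_description] + list(resumes.values()))
    jd_vector, resume_vectors = vectors[0], vectors[1:]
    scored = [
        (name, cosine_similarity(jd_vector, vec))
        for name, vec in zip(resumes, resume_vectors)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    jd = "Python developer with machine learning and NLP experience"
    resumes = {
        "candidate_a": "Java developer with Spring and SQL experience",
        "candidate_b": "Python machine learning engineer with NLP and spaCy projects",
        "candidate_c": "Graphic designer skilled in illustration and branding",
    }
    for name, score in rank_resumes(jd, resumes):
        print(f"{name}: {score:.3f}")
```

In this toy example the resume that shares the most weighted terms with the JD is ranked first, and a resume with no terms in common scores zero. In practice a library vectorizer and a trained NER model for skill extraction would likely replace the hand-written helpers.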
Table 1 shows accuracy in % when different methods are used for resume recommendation.

V. CONCLUSION
This paper deals with multiple methods to detect, identify and classify resumes using machine learning and neural network models and techniques such as SVM, KNN, Word2Vec, and cosine similarity. The accuracy of the models varies with the dataset used, the complexity of the learning method, and the size of the dataset; the results range from 78% to 98%. We conclude that with a proper dataset and the right algorithm, good accuracy and the desired output can be obtained for a large variety of purposes.

VI. REFERENCES
[1]. Pradeep Kumar Roy, Vellore Institute of Technology, 2019. A Machine learning approach for automation of resume recommendation system, ICCIDS 2019. 10.1016/j.procs.2020.03.284.
[2]. Thimma Reddy Kalva, Utah State University, 2013. Skill-Finder: Automated Job-Resume Matching system.
[3]. Yong Luo, Nanyang Technological University, 2018. A Learning-Based Framework for automatic resume quality assessment, arXiv:1810.02832v1 [cs.IR].
[4]. Sujit Amin, Fr. Conceicao Rodrigues Institute of Technology, 2019. Web Application for Screening resume, IEEE DOI: 10.1109/ICNTE44896.2019.8945869.
[5]. Tejaswini K, Umadevi V, Shashank M Kadiwal, Sanjay Revanna, Design and Development of Machine Learning based Resume Ranking System (2021), DOI: https://doi.org/10.1016/j.gltp.2021.10.002.
[6]. Riza Tanaz Fareed, Rajath V, and Sharadadevi Kaganumath, "Resume Classification and Ranking using KNN and Cosine Similarity", In 2021 International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol. 10.
[7]. Suhas Tangadle Gopalakrishna, Vijayaraghavan Varadharajan, "Automated Tool for Resume Classification Using Semantic Analysis", International Journal of Artificial Intelligence and Applications (IJAIA), Vol. 10, No. 1, January 2019.
[8]. Suhas H E, Manjunath A E, "Differential Hiring using a Combination of NER and Word Embedding", In 2020 International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Vol. 9.
[9]. Centre for Monitoring Indian Economy Pvt Ltd. (CMIE), 2022. The unemployment rate in India.
[10]. Howard, J.L., Ferris, G.R., 1996. The employment interview context: Social and situational influences on interviewer decisions. Journal of Applied Social Psychology 26, 112-136.
[11]. Mudit Kapoor, Business Today, 2021. India's formal job creation numbers beat pandemic blues.
[12]. M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples," Journal of Machine Learning Research, vol. 7, pp. 2399–2434, 2006.
[13]. A. Zaroor, M. Maree, and M. Sabha, "A Hybrid Approach to Conceptual Classification and Ranking of Resumes" In Czarnowski I., Howlett R. (eds) Intelligent Decision Technologies 2017. IDT 2017. Smart Innovation, Systems and Technologies vol 72. Springer.
[14]. Jabri, Siham, Azzeddine Dahbi, Taoufiq Gadi, and Abdelhak Bassir. "Ranking of text documents using TF-IDF weighting and association rules mining." In 2018 4th international conference on optimization and applications (ICOA), pp. 1-6. IEEE, 2018.
[15]. The data source for the skills used in the NER train.
[16]. Jagan Mohan Reddy D, Sirisha Regella, "Recruitment Prediction using Machine Learning", IEEE Xplore, 2020.
[17]. Resnick, P., Varian, H.R., 1997. Recommender Systems. Communications of the ACM 40, 56–59.
[18]. Xavier Schmitt, Sylvain Kubler, Jérémy Robert, Mike Papadakis, Yves Le Traon, University of Luxembourg, Luxembourg. Replicable Comparison Study of NER Software: StanfordNLP, NLTK, OpenNLP, SpaCy, Gate.
[19]. Y. Luo, Y. Wen, T. Liu, and D. Tao, "Transferring knowledge fragments for learning distance metric from a heterogeneous domain," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[20]. Mikheev, Andrei; Moens, Marc; Glover, Claire. 1999. "Named Entity Recognition without Gazetteers." Proceedings of EACL '99. HCRC Language Technology Group, University of Edinburgh. http://acl.ldc.upenn.edu/E/E99/E99-1001.pdf.
[21]. Zhou, GuoDong; Su, Jian. 2002. "Named Entity Recognition using an HMM-based Chunk Tagger." Proceedings of the Association for Computational Linguistics (ACL), Philadelphia, July 2002. Laboratories for Information Technology, Singapore.
[22]. Zhang, L., Fei, W., Wang, L., 2015. P-J matching model of knowledge workers. Procedia Computer Science 60, 1128–1137.
[23]. http://www.indeed.com/isp/apiinfo.jsp
[24]. https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.recognition
[25]. https://nlp.stanford.edu/software/CRF-NER.shtml

Cite this article as :
Bhushan Kinge, Shrinivas Mandhare, Pranali Chavan, S. M. Chaware, "Resume Screening using Machine Learning and NLP: A proposed system", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 8 Issue 2, pp. 253-258, March-April 2022. Available at doi : https://doi.org/10.32628/CSEIT228240
Journal URL : https://ijsrcseit.com/CSEIT228240