International Journal of Scientific Research in Computer Science, Engineering and Information Technology

ISSN : 2456-3307 (www.ijsrcseit.com)


doi : https://doi.org/10.32628/CSEIT228240

Resume Screening Using Machine Learning and NLP: A Proposed System
Bhushan Kinge*1, Shrinivas Mandhare2, Pranali Chavan3, S. M. Chaware4
1-3 UG Student, Information Technology, SPPU, Pune, Maharashtra, India
4 Professor, Information Technology, SPPU, Pune, Maharashtra, India

Article Info
Volume 8, Issue 2
Page Number : 253-258
Publication Issue : March-April-2022
Article History
Accepted: 15 April 2022
Published: 25 April 2022

ABSTRACT

The Indian recruitment market has grown substantially over the last half-decade as the demand for low-cost labor and the number of job openings have increased. As the job market grows, so does the recruitment industry, in which companies outsource the hiring process to firms whose sole purpose is to supply the talent the company requires. Companies do this because they hire in bulk, and handling that volume in-house would consume a lot of company resources and hamper productivity. Even for these Talent Acquisition Companies, manually going through every candidate's resume is time-consuming and tedious, so they use various Machine Learning models to filter out the top resumes according to the job roles, which reduces the effort required of the Human Resource team.

Keywords: NLP, Resume, CV, KNN, SVM, NER

I. INTRODUCTION

Machine learning is a field in which we train a model on a dataset so that it predicts the desired output when given new data. Resume screening is mostly done using Natural Language Processing (NLP). Natural language refers to the way humans communicate with each other, and NLP is concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. NLP combines computational linguistics (rule-based modeling of human language) with statistical, machine learning, and deep learning models. Together, these technologies help computers process human language in the form of text or voice data and ‘understand’ its full meaning.

As the job market grows in India, millions of new job seekers join the workforce every year, as per LinkedIn [7]. Around 1.3 million new jobs were created as per 2021 Employees Provident Fund Organization (EPFO) data [8]. As of this year, the unemployment rate of India is around 7.74% [6], with an urban unemployment rate of 9.06% and a rural rate of 7.13%. The number of job openings available is not enough to cover the staggering number of applications that companies receive.

Copyright: © the author(s), publisher and licensee Technoscience Academy. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Bhushan Kinge et al Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol, March-April -2022, 8 (2) : 253-258

Hence, companies that hire in bulk must go through many applications to find the talent they need, which requires a considerable amount of resources and time. Talent Acquisition Companies have arisen as a solution to this problem: they step in and get the job done at a lower resource cost to the company and within an acceptable timeline. Even here the applications number in the millions, and going through them is tedious, so these companies use various Machine Learning models that rank the top resumes that best fit the job role.

II. LITERATURE

A. A Machine Learning approach for automation of Resume Recommendation System
Pradeep Kumar Roy, in their research [1], created a system to minimize the cost of hiring new candidates for job positions in a company. They focused on 3 major problems in this process:
● Picking the right candidates from the applicants
● Making sense of their CVs
● Finding out if the candidate is fit for the job role
They performed NER, NLP, and text classification using n-grams, and used Machine Learning to perform the classification with the following algorithms: Random Forest with 38.9% accuracy, Multinomial Naïve Bayes with 44.39%, Logistic Regression with 62.4%, and Linear Support Vector Machine Classifier, which obtained the highest accuracy of 78.53%.

B. Skill Finder: Automated Job-Resume Matching System
In research conducted by Thimma Reddy Kalva [2], a custom dataset of 3000 jobs and 80 resumes was developed from the website Indeed using its web service API [9]. This data is then used to rank the students' resumes by comparing their skills with those required in the job. This is done using Named Entity Recognition (NER) tools such as Apache OpenNLP [10] and Stanford Named Entity Recognizer [11]. The Skill Finder efficiently matches resumes to the posted job role and successfully sends emails to the desired candidates.

C. ResumeNET : A Learning-based Framework for Automatic Resume Quality Assessment
In research conducted by Yong Luo [3], a custom dataset of 10,343 resumes was developed, which was acquired by a private resume management company. 98.82% of the data (i.e. 10,221 resumes) is unlabeled and the remaining 1.18% (i.e. 122 resumes) is labeled in 2 categories, positive and negative: 33 resumes are labeled positive and 89 negative.

D. Web Application for Screening Resume
The goal for Sujit Amin [4] was to develop a web application for resume screening with the help of 220 resumes, of which 200 were used for training and 20 for testing. The web application is divided into 3 divisions:
A) Job Applicant side
B) Server-Side
C) Recruiter Side
The applicant side is where the applicant provides his/her resume. The server side processes the resume, and the model is then trained using an NLP pipeline built with SpaCy, an NLP framework. On the recruiter's side, a ranked list of the resumes is shown, determined by a score calculator [13], so the recruiter can select the best-fit candidate for the job.

E. Design and Development of Machine Learning based Resume Ranking System
The system proposed by Tejaswini K [5] is one where the resume is submitted by the candidate after an

Volume 8, Issue 2, March-April-2022 | http://ijsrcseit.com



MCQ test, which has a face detection system to detect malpractice. Once the resume is submitted, it is run through NLP techniques to extract the relevant skills, and TF-IDF vectorization [14] is used to convert the words into vectors that the machine can process.

The classifier used here is the KNN algorithm, which identifies the resume that most closely matches the JD provided by the recruiter. The system has an average parsing accuracy of 85%.

F. Resume Classification and Ranking using KNN and Cosine Similarity
Riza Tanaz Fareed, Rajath V, and Sharadadevi Kaganumath came up with a method to implement resume classification with the addition of cosine similarity [15]. The process begins when the candidate provides his/her resume to the system.

The resume is then passed through an NLP pipeline where the words are extracted from the resume. Techniques like stop-word removal and lemmatization are used to get the correct set of words. A TF-IDF vectorizer [14] is used to vectorize the words so that the KNN model can classify the resume into various categories. To evaluate the resume against the given JD, document similarity detection is necessary, so the Cosine Similarity algorithm is used to match the JD content with the candidate's resume. The accuracy of this trained model is 98.96%.

G. Automated Tool for Resume Classification using Semantic analysis
This study, conducted by Suhas Tangadle Gopalakrishna and Vijayaraghavan Varadharajan [16], provides a descriptive view of how they use semantic analysis for resume classification. The resumes received by the HR team are parsed through the Natural Language Processing Pipeline (NLPP), where stop-word deletion is used to delete words like “and, or, the, etc.”. Other techniques like Parts of Speech tagging and NER are also used.

A total of 6 types of classification models are used: Naïve Bayes, Multinomial Naïve Bayes, Linear SVM, Bernoulli Naïve Bayes, Logistic Regression and KNN. These classification models are run on a dataset of 30,000 employees' resumes, which was split in a 9:1 ratio (27,000 for training and 3000 for testing), across various domains including AI, Computer Architecture, CG, Databases, Distributed Computing, CN, Web Technologies and Cloud Computing. Of the 6 classifiers, Multinomial Naïve Bayes came out on top with an accuracy of 91%.

H. Differential Hiring using a Combination of NER and Word Embedding
The objective of this study, conducted by Suhas H E and Manjunath A E [17], was to create a model which uses NLP, NER, word embedding, and Cosine Similarity to suggest resumes for job roles. Resumes and the JD are taken as input by the system.

A data dump of technical skills [18] is used to tag technical skills in each resume document. A tab-separated value (TSV) file is generated and provided to the Stanford NER model [10] to train the NER model. The output of the NER model (i.e. skills) becomes the input for the word2vec model, which uses a shallow neural network.

The last step of the process is Cosine Similarity, which determines how closely the given resume matches the given JD. The accuracy obtained was 79.8%.

III. LIMITATIONS
▪ The above systems use models that have no way to improve themselves over time; the models are trained only once.
▪ The above models use Machine Learning algorithms whose performance tends to plateau when run over a large dataset.
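As an illustration of the preprocessing used by the systems surveyed in Section II (stop-word deletion in Sections F and G, n-grams in Section A), a minimal sketch is given here. The stop-word list, the helper names `tokenize` and `ngrams`, and the sample sentence are assumptions made for illustration and are not taken from any of the cited papers; a production pipeline would more likely rely on a library such as SpaCy.

```python
import re

# Illustrative stop-word list (an assumption); real systems use much larger lists.
STOP_WORDS = {"and", "or", "the", "a", "an", "in", "of", "with", "to", "for"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    words = re.findall(r"[a-z0-9+#]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical resume sentence used only to exercise the helpers.
resume_text = "Experienced in Python and the design of machine learning models"
tokens = tokenize(resume_text)
bigrams = ngrams(tokens, 2)
```

Here `tokens` holds the resume words with the stop words removed, and `bigrams` holds adjacent word pairs that a classifier could use as features in addition to single words.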

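The TF-IDF vectorization and cosine-similarity matching that several of the surveyed systems apply to resumes and the JD (Section II, E, F and H) can be sketched roughly as follows. This is a minimal pure-Python sketch under stated assumptions: the smoothed IDF variant, the function names `tf_idf_vectors` and `cosine_similarity`, and the toy JD and resumes are all hypothetical and do not reproduce the implementation of any cited paper.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Relative term frequency times a smoothed IDF (one of several common variants).
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, already-tokenized JD and resumes (hypothetical data).
jd = "python machine learning sql".split()
resumes = {
    "resume_a": "python machine learning pandas".split(),
    "resume_b": "java spring sql".split(),
    "resume_c": "graphic design photoshop".split(),
}

names = list(resumes)
vectors = tf_idf_vectors([jd] + [resumes[n] for n in names])
jd_vec, resume_vecs = vectors[0], vectors[1:]
scores = {n: cosine_similarity(jd_vec, v) for n, v in zip(names, resume_vecs)}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy data `ranking` orders the resumes by how much weighted vocabulary they share with the JD, which is the same idea the surveyed systems use to produce a ranked list for the recruiter.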

IV. PROPOSED SYSTEM

A. Problem Statement
Resume screening is a process widely used in Big Tech companies, which receive a massive number of resumes, rank them according to resume strength or relevance to the job description, and filter them accordingly. However, the student who applies for the job role does not know why his resume was rejected or how he can improve it so that it becomes relevant and strong. Currently, there is no such technology available that would help students strengthen their resumes.

B. Solution
The solution to the given problem statement consists of a machine learning model that takes the student's resume as input and extracts details such as skills and certifications from it. For extra details about the student, it also takes GitHub and LinkedIn profile links, from which it can extract the student's contributions in various fields. The student also has to provide the job role he/she is applying for. The model is trained using a job description and skill set dataset, so when the student inputs a resume, the model can tell which job role is suitable for the student or how relevant the resume is to the given job description.

C. System Architecture
The system architecture of our proposed design is given in Fig. 1, which shows the entire working of our model; the parts included in it represent the flow of the system.

Figure 1. System Architecture.

The system consists of a database; it will be an SQL database, as our data is properly arranged in columns and rows rather than being unstructured. The model will be trained on existing data which we collected from the open platform Kaggle. Two models are used in this method. The first, either K-Nearest Neighbour or Support Vector Machine, predicts the kind of job role the resume is best fit for. The second gives recommendations on how the resume can be improved to increase its strength: using cosine similarity, it checks the job role the user wants against what the model predicted, and on that basis the recommendation system gives its suggestions for improvement.

The control flow is as follows: the candidate submits the resume at the front-end; the resume is then passed to the resume parser, a pipeline of NLP techniques that extracts useful information from the resume; and then the system visits the person's LinkedIn and GitHub profiles to scrape useful information, which adds more value to the overall extracted data. The extracted data is used to form vectors, which are provided to the Machine Learning model for prediction.

D. Accuracy Table
Table 1: Accuracy in % for different methods

Title                                                                              Accuracy
A Machine Learning approach for automation of Resume Recommendation System        78.53%
Skill Finder: Automated Job-Resume Matching System                                 87%
ResumeNet: A Learning-based Framework for Automatic Resume Quality Assessment      85%
Web Application for Screening Resume                                               98.96%
Design and Development of Machine Learning based Resume Ranking System            79%
Resume Classification and Ranking using KNN and Cosine Similarity                  79.8%
Automated Tool for Resume Classification using Semantic analysis                   80%
Differential Hiring using a Combination of NER and Word Embedding                  82.67%
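The two-model design described in Section IV-C (a K-Nearest Neighbour role predictor followed by a cosine-similarity based recommendation step) could look roughly like the sketch below. The paper does not specify how the recommendation step works internally, so everything here is an assumption made for illustration: the toy training data, the function names `predict_role` and `recommend`, the choice of k = 3, and the idea of reporting skills that are present in the target role's pooled profile but absent from the resume.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(c * v[t] for t, c in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# Toy labelled training data (hypothetical): skill tokens -> job role.
TRAINING = [
    (["python", "pandas", "sklearn", "statistics"], "Data Scientist"),
    (["python", "tensorflow", "statistics", "sql"], "Data Scientist"),
    (["html", "css", "javascript", "react"], "Web Developer"),
    (["javascript", "nodejs", "html", "sql"], "Web Developer"),
]


def predict_role(skills, k=3):
    """Model 1: K-Nearest Neighbour vote, using cosine similarity as closeness."""
    query = Counter(skills)
    ranked = sorted(TRAINING, key=lambda ex: cosine(query, Counter(ex[0])), reverse=True)
    votes = Counter(role for _, role in ranked[:k])
    return votes.most_common(1)[0][0]


def recommend(skills, target_role):
    """Model 2 (assumed behaviour): score the resume against the pooled skills of
    the target role and list the skills that the resume is missing."""
    profile = Counter(s for ex_skills, role in TRAINING if role == target_role for s in ex_skills)
    score = cosine(Counter(skills), profile)
    missing = sorted(set(profile) - set(skills))
    return score, missing


# Hypothetical candidate: skills extracted by the resume parser, plus the desired role.
resume_skills = ["python", "statistics", "sql"]
predicted = predict_role(resume_skills)
score, missing = recommend(resume_skills, "Web Developer")
```

When the predicted role differs from the role the candidate asked for, as in this toy case, the low similarity score and the `missing` list are the kind of feedback the recommendation system is meant to return.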


Table 1 shows the accuracy in % obtained when different methods are used for resume recommendation.

V. CONCLUSION

This paper deals with multiple methods to detect, identify and classify resumes using machine learning and neural network models such as SVM, KNN, Word2Vec, Cosine similarity, etc. The accuracy of the models varies with the dataset used, the complexity of the learning method and the size of the dataset; the results range from 78% to 98%. We conclude that with a proper dataset and the right algorithm, good accuracy and the desired output can be obtained for a large variety of purposes.

VI. REFERENCES

[1]. Pradeep Kumar Roy, Vellore Institute of Technology, 2019. A Machine learning approach for automation of resume recommendation system, ICCIDS 2019. 10.1016/j.procs.2020.03.284.
[2]. Thimma Reddy Kalva, Utah State University, 2013. Skill-Finder: Automated Job-Resume Matching system. [3] Yong Luo, Nanyang Technological University, 2018. A Learning-Based Framework for automatic resume quality assessment, arXiv:1810.02832v1 [cs.IR].
[3]. Sujit Amin, Fr. Conceicao Rodrigues Institute of Technology, 2019. Web Application for Screening resume, IEEE DOI: 10.1109/ICNTE44896.2019.8945869.
[4]. Tejaswini K, Umadevi V, Shashank M Kadiwal, Sanjay Revanna, Design and Development of Machine Learning based Resume Ranking System (2021), DOI: https://doi.org/10.1016/j.gltp.2021.10.002.
[5]. Riza Tanaz Fareed, Rajath V, and Sharadadevi Kaganumath, “Resume Classification and Ranking using KNN and Cosine Similarity”, In 2021 International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol.10.
[6]. Suhas Tangadle Gopalakrishna, Vijayaraghavan Varadharajan, “Automated Tool for Resume Classification Using Semantic Analysis”, International Journal of Artificial Intelligence and Applications (IJAIA), Vol. 10, No.1, January 2019.
[7]. Suhas H E, Manjunath A E, “Differential Hiring using a Combination of NER and Word Embedding”, In 2020 International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Vol.9.
[8]. Centre for Monitoring Indian Economy Pvt Ltd. (CMIE), 2022. The unemployment rate in India.
[9]. Howard, J.L., Ferris, G.R., 1996. The employment interview context: Social and situational influences on interviewer decisions. Journal of Applied Social Psychology 26, 112-136.
[10]. Mudit Kapoor, Business Today, 2021. India’s formal job creation numbers beat pandemic blues.
[11]. M. Belkin, P. Niyogi, and V. Sindhwani, “Manifold regularization: A geometric framework for learning from labeled and unlabeled examples,” Journal of Machine Learning Research, vol. 7, pp. 2399–2434, 2006.
[12]. A. Zaroor, M. Maree, and M. Sabha, “A Hybrid Approach to Conceptual Classification and Ranking of Resumes” In Czarnowski I., Howlett R. (eds) Intelligent Decision Technologies 2017. IDT 2017. Smart Innovation, Systems and Technologies vol 72. Springer.
[13]. Jabri, Siham, Azzeddine Dahbi, Taoufiq Gadi, and Abdelhak Bassir. "Ranking of text documents using TF-IDF weighting and association rules mining." In 2018 4th international conference on optimization and applications (ICOA), pp. 1-6. IEEE, 2018.


[14]. The data source for the skills used in the NER train.
[15]. Jagan Mohan Reddy D, Sirisha Regella, “Recruitment Prediction using Machine Learning”, IEEE Xplore, 2020.
[16]. Resnick, P., Varian, H.R., 1997. Recommender Systems. Communications of the ACM 40, 56–59.
[17]. Xavier Schmitt, Sylvain Kubler, Jérémy Robert, Mike Papadakis, Yves LeTraon, University of Luxembourg, Luxembourg. Replicable Comparison Study of NER Software: StanfordNLP, NLTK, OpenNLP, SpaCy, Gate.
[18]. Y. Luo, Y. Wen, T. Liu, and D. Tao, “Transferring knowledge fragments for learning distance metric from a heterogeneous domain,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[19]. Mikheev, Andrei; Moens, Marc; Glover, Claire. 1999. “Named Entity Recognition without Gazetteers.” Proceedings of EACL ’99. HCRC Language Technology Group, University of Edinburgh. http://acl.ldc.upenn.edu/E/E99/E99-1001.pdf.
[20]. Zhou, GuoDong; Su, Jian. 2002. “Named Entity Recognition using an HMM-based Chunk Tagger.” Proceedings of the Association for Computational Linguistics (ACL), Philadelphia, July 2002. Laboratories for Information Technology, Singapore.
[21]. Zhang, L., Fei, W., Wang, L., 2015. Pj matching model of knowledge workers. Procedia Computer Science 60, 1128–1137.
[22]. http://www.indeed.com/isp/apiinfo.jsp
[23]. https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.recognition
[24]. https://nlp.stanford.edu/software/CRF-NER.shtml

Cite this article as :

Bhushan Kinge, Shrinivas Mandhare, Pranali Chavan, S. M. Chaware, "Resume Screening using Machine Learning and NLP: A proposed system", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 8 Issue 2, pp. 253-258, March-April 2022. Available at doi : https://doi.org/10.32628/CSEIT228240
Journal URL : https://ijsrcseit.com/CSEIT228240
