My Final Project
BY
180404027
DECEMBER, 2023.
CERTIFICATION
This is to certify that this project work was carried out by ANTHONY
Computer Science, Faculty of Science, Adekunle Ajasin University, Akungba Akoko, Ondo
State.
……………………….. ……………………...
Dr. D. A. Akinwumi Date
Supervisor
……………………….. ..…………………
Dr. F.O. Aranuwa Date
Head of Department
DEDICATION
This project work is dedicated to the Rock of Ages for His love, blessings, unmerited favours
and protection, now and throughout my days on campus as an undergraduate.
ACKNOWLEDGEMENTS
First and foremost, my utmost gratitude goes to the Omnipresent God for being there for me
before and after this research work. May your name be glorified both now and forever more. I
am heartily thankful to my supervisor, Dr. D.A. Akinwumi, whose encouragement, guidance,
fatherly love and support from the initial to the final level enabled me to develop an
understanding of the subject.
Let me seize this wonderful opportunity to appreciate the Head of the Computer Science
Department, in the person of Dr. F.O. Aranuwa, for his moral and academic support. I would like
to express my deepest appreciation to the former Head of Department, Dr. Olusola Ajayi, and
to all lecturers in the Department of Computer Science for their unselfish and unfailing
support in all ways.
It is a pleasure to thank my parents, Mr and Mrs Anthony, for all they have done in helping me
pursue my career; I love you so much. To my able siblings, thank you all for being there for
me.
In any academic endeavor, many people and institutions will be involved in the success story.
I sincerely and humbly express my appreciation and gratitude to Afunibiowo Adejoke and
also to my course mates and everybody who really impacted my life positively during the
course of my study. God bless you all.
ABSTRACT
The development of a Curriculum Vitae (CV) review model based on Natural Language
Processing (NLP) represents an innovative approach to enhance the efficiency and objectivity
of the hiring process. This research focuses on leveraging NLP techniques to analyze and
assess CVs, extracting relevant information and evaluating candidates based on predefined
criteria. The model aims to automate the initial screening phase, reducing human bias and
expediting the selection of qualified candidates. Through the integration of advanced NLP
algorithms, the model interprets and comprehends the content of CVs, considering both
structured and unstructured data. The research explores the feasibility, accuracy, and
effectiveness of this NLP-based CV review model, emphasizing its potential to revolutionize
the recruitment process by improving precision, saving time, and promoting fair evaluations.
The findings contribute to the intersection of natural language processing and human
resources, offering a valuable tool for organizations seeking to optimize their hiring
procedures.
TABLE OF CONTENTS
TITLE PAGE
CERTIFICATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
CHAPTER ONE
INTRODUCTION
1.1 Background of the Study
1.2 Statement of the Problem
1.3 Aim and Objectives
1.3.1 Aim
1.3.2 Objectives
1.4 Methodology
1.5 Scope of the Study
1.6 Expected Contribution to Knowledge
1.7 Definition of Terms
CHAPTER TWO
LITERATURE REVIEW
2.0 Introduction
2.1 Research Concepts
2.1.1 Curriculum Vitae (CV)
2.1.2 How a Curriculum Vitae (CV) Works
2.1.3 Differences between Curriculum Vitae (CV) and Resume
2.1.4 Different Types of CV
2.1.5 Benefits of CV
2.1.6 An Overview of NLP
2.1.7 Using Machine Learning for Analysis and Classification
2.2 Review of Related Works
CHAPTER THREE
METHODOLOGY
3.1 Introduction
3.2 System Architecture
3.3 Resume from Kaggle
3.4 Resume Preprocessing
3.5 NLP and ML Model Development
3.6 Model Evaluation
3.7 System Flowchart
CHAPTER FOUR
4.1 Introduction
4.2 Implementation Requirements
4.2.1 Programming Language
4.2.2 Justification for Choice of Programming Language
4.3 Implementation Phases of the Model
4.3.1 Python Machine Learning Model
4.3.2 Data Cleaning
4.3.3 Building the Model
4.3.4 Result
CHAPTER FIVE
5.1 Summary
5.2 Conclusion
REFERENCES
CHAPTER ONE
INTRODUCTION
1.1 Background of the Study
with the organization, the applicant must provide one of these documents with their
important marketing tools that will give a self-portrait or advertisement of the employee and
will present your relative strengths, skills, and experiences to a potential employer
(Bhushan Kinge et al., 2022). An effective resume or CV will provide an employer with an overview
of who you are as a student or young professional, what you know and can do in relation to
the position of interest, and what relevant skills, traits, and accomplishments you have
achieved at this point in your education or career. Therefore, the objective of your resume or
Every day, a company with a job opening for a particular position may receive thousands of
emails from prospective employees. Selecting the top candidates from such a large pool of
applicants is challenging, and manually going through hundreds of resumes to find them is
exceedingly difficult for recruiters. About 75% of the thousands of resumes sent to an
organization in response to a job posting do not demonstrate the abilities needed
for the position. Reviewing each applicant's CV is also cumbersome; as a result, companies
often adopt the use of external recruiters (Amin et al., 2019).
The recruitment industry is worth $200 billion, and it deals with selecting, from an enormous
pool of applicants, the best candidates with the skills needed for a given job
description. Numerous applicants send their resumes to an organization to apply for any job
openings that may exist, and screening the resumes of all job applicants is a core part of the
recruiting process for any recruiter (Bersin, 2017). There has always been a search for an
automated process in which employers can quickly select eligible candidates and applicants
can show their ingenuity by using a single application format to apply to several
application of an automated system is a tried and tested way to carry
out the process. This study adopts machine learning for that purpose.
In machine learning, a model is trained on a dataset so that it can predict the intended
result for new data. Natural language processing (NLP), which deals with the language humans
use to communicate with one another, is primarily used to screen resumes. The goal of NLP is to
enable computers to comprehend spoken and written language in a manner similar to that of humans.
NLP combines computational linguistics, the rule-based modeling of human language, with
statistical, machine learning and deep learning models. Together, these
technologies help computers process human language in the form of text or
voice data and to ‘understand’ its full meaning. As the job market grows in India,
millions of new job seekers join the workforce every year, according to LinkedIn (Suhua et
al., 2020).
Although the data formats used in CV/Resumes are not entirely unstructured, it is still
difficult to accept them in a standardized format since there is no set of rules for writing a
CV/Resume. With machine learning and NLP, to analyze any written documents such as
resumes, the potential to interpret unstructured data and extract relevant information from it,
1.2 Statement of the Problem
The number of job seats available is not enough to cover the staggering number of
with different experiences apply for jobs. Some persons hold positions in the human
resources division and they will have to review hundreds to thousands of resumes in order to
find the best fit for a job opening. Hence, if the companies hire in bulk there are many
applications to find the talent that they need which will require a considerable number of
each résumé, putting a summary together and adding the data to the database. Executives
condense the résumé, enter the applicant's contact information into their database, and
call applicants for interviews once the resume has been received. With machine
learning, the top resumes that best fit the job role can instead be ranked automatically using NLP
algorithms.
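The ranking idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the model built in this project: the function names (`keywords`, `jaccard`, `rank_resumes`), the small stop-word list and the sample texts are all assumptions made for the example, and simple keyword overlap (Jaccard similarity) stands in for whatever NLP algorithm a real system would use.

```python
import re

# Hypothetical, deliberately small stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "with", "for", "to", "on"}


def keywords(text):
    """Return the set of lowercase word tokens in text, minus stop words."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def jaccard(a, b):
    """Jaccard overlap of two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_resumes(job_description, resumes):
    """Return (name, score) pairs sorted from best to worst match."""
    job_terms = keywords(job_description)
    scored = [(name, jaccard(job_terms, keywords(text)))
              for name, text in resumes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data for demonstration only.
    job = "Python developer with machine learning and SQL experience"
    sample_resumes = {
        "candidate_a": "Experienced Python developer, machine learning and SQL",
        "candidate_b": "Graphic designer skilled in illustration and branding",
    }
    for name, score in rank_resumes(job, sample_resumes):
        print(f"{name}: {score:.2f}")
```

Keyword overlap ignores word order and synonyms, so it should be read as a baseline against which richer NLP features can be compared.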
The model will also ensure the switch from labor-intensive human resume processing to
incredibly quick and affordable software. The following objectives are identified as necessary
1.3.1 Aim
The aim of this project is to develop a machine learning model that automates the extraction of
required information from candidates' resumes, so that submitted resumes need not be
reviewed manually. To achieve this aim, the following objectives are
considered.
1.3.2 Objectives
The objectives of this project are to:
a. develop a model that can accept new CV/Resume for review using NLP (Natural
1.4 Methodology
The methodology for the development of a curriculum vitae review model based on natural
[Figure: stages of the methodology: Information Gathering, Building Model, Testing Model, Deployment]
needed, reviewing of related works and identifying the best and effective
ii. Building the model: The process involves writing the codes to achieve the project
iii. Testing the model: The process involves creating and structuring a format the model
will follow to review the document. The new CV to be reviewed will be supplied
and a similarity score of up to 100% will be produced. The higher the
iv. Deployment: The model can be packed and a manual will be provided on how the
1.5 Scope of the Study
The difficulty of extracting relevant information from a resume in an organized fashion can
be overcome with the aid of a purpose-built system. This study aims to develop a machine
learning model that helps automate the process. This will be achieved by implementing an
NLP algorithm capable of comparing a submitted resume with the format and information
expected by a company. The scope is limited to supplying a reference format for the
resume and a new resume, and measuring the similarity of the new resume to the
company’s format.
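As an illustration of comparing a new resume with a company's expected format, the sketch below computes a similarity score on a 0 to 100 scale. It is a minimal example under stated assumptions: cosine similarity over raw word counts is one common choice and is not confirmed here as the measure used by the model, and the names `term_counts` and `similarity_percent` are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count the lowercase word tokens that appear in text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity_percent(resume_text, format_text):
    """Cosine similarity of the two word-count vectors, scaled to 0-100."""
    a, b = term_counts(resume_text), term_counts(format_text)
    # Dot product over the words the two texts share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    # An empty text has zero length, so report no similarity in that case.
    return 100.0 * dot / norm if norm else 0.0


if __name__ == "__main__":
    # Made-up texts for demonstration only.
    company_format = "education skills work experience python sql references"
    new_resume = "Education: BSc Computer Science. Skills: Python, SQL. Work experience: intern."
    print(f"Similarity: {similarity_percent(new_resume, company_format):.1f}%")
```

A higher percentage means the new resume shares more of its vocabulary with the reference format; weighting schemes such as TF-IDF could replace the raw counts without changing the overall structure.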
computer science, and artificial intelligence concerned with the interactions between
ii. CV: A CV, which stands for curriculum vitae and is otherwise known as a resume, is a
document used when applying for jobs. It allows you to summarise your education,
skills and experience, enabling you to successfully sell your abilities to potential
employers.
iii. Machine learning (ML): A field of inquiry devoted to understanding and building
methods that 'learn', that is, methods that leverage data to improve performance on
CHAPTER TWO
LITERATURE REVIEW
2.0 Introduction
Hiring the right talent is a challenge for all businesses. Manually screening a large number of
resumes/CVs takes at least one day. If a recruiter finds 4-6 appropriate resumes while
going through the initial ones, chances are that the other submitted resumes will not be
considered. This decreases the likelihood of a suitable resume being shortlisted. Going
through each resume is time-consuming, and manually organizing and managing a large
number of resumes is challenging. Some prejudice is also to be expected wherever there is
human involvement (Naik et al., 2022).
This challenge is magnified by the high volume of applicants if the business is labor-
intensive, growing, and facing high attrition rates. An example of such a business is that IT
with a variety of technical skills and business domain expertise are hired and assigned to
projects to resolve customer issues (Barrak et al., 2022). This task of selecting the best talent
among many is known as resume screening. Typically, large companies do not have enough
time to open each CV, so they use machine learning algorithms for the resume screening
task; efficient hiring of this kind can also help reduce unemployment. Machine learning is a
field that involves training a model with data to predict the intended outcome when
new data is submitted. Natural language processing (NLP) is commonly used to screen
resumes. Natural language refers to how humans communicate with one another (Riza et al.,
2021).
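Before a model can be trained on resumes, the raw text usually has to be cleaned. The short sketch below shows one plausible cleaning step; the function name `clean_resume_text`, the stop-word list and the sample sentence are assumptions for illustration and are not taken from this project's implementation.

```python
import re
import string

# Hypothetical, deliberately small stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "to", "for", "with", "on"}

# Translation table that maps every ASCII punctuation mark to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_resume_text(text):
    """Lowercase the text, drop URLs and punctuation, and remove stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = text.translate(_PUNCT_TO_SPACE)      # punctuation becomes spaces
    return [tok for tok in text.split() if tok not in STOP_WORDS]


if __name__ == "__main__":
    sample = "Skilled in Python, SQL and the Web: https://example.com/cv"
    print(clean_resume_text(sample))
```

The resulting token list is the kind of input that a similarity measure or classifier would then consume.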
An NLP system enables us to interpret text, based on the English vocabulary, in much the same
way as humans do. NLP combines computational linguistics, the rule-based modeling of
human language, with statistical, machine learning, and deep learning models. The data
may come in different formats, either as documents or as audio, and the task is to
understand its whole meaning
(Dimopoulos, 2019). The number of applications is in the millions, making it a time-consuming chore to
sort through them. What is needed is a machine learning algorithm that gives a better way of
understanding resumes and fulfils the requirements of the
industry. The proposed system takes a CSV file as input which contains different categories
and resumes based on the category and features of the resume the accuracy and performance
In the study Employers' Expectations: A Probabilistic Text Mining Model (Gao and Eldin, 2018),
more than 20,000 job advertisements from various websites were processed, and text mining
was applied to identify information skills derived from the web pages of the
construction industry sector. The research named Text Analysis for Job Matching Quality
Improvement (Kinoa et al., 2017) works in a context of data analysis that includes travel time, job
location, job type, rates, and candidate skill set. By applying keywords in a
machine learning process using text mining tools, effective keywords are
discovered for a job matching system. The research entitled Natural Language Processing
and Text Mining to Identify Knowledge Profiles for Software Engineering Positions (Almada
et al., 2017) applies NLP and text mining (TM) to analyze the unstructured text of
resumes and job offers, and so identifies the knowledge profiles for software
engineering positions.
The research entitled Data Mining Approach to Monitoring the Requirements of the Job
Market: A Case Study (Karakatsanis et al., 2018) presents an approach based on data mining
to identify the most in-demand occupations in the modern labor market. To achieve this, the
authors use a latent semantic indexing model that matches job announcements extracted from
the web with the occupation descriptions in a database.
A curriculum vitae works in much the same way as a resume, providing information about an
individual's educational and work history. Often called a CV for short, it's much more
comprehensive than the typical resume and can be much longer. There's no limit to how long
a CV can be, but it must be focused on academic and professional experience. A lengthy CV
isn't any better than a short one if it contains fluff or irrelevant data. A job applicant seeking
position, should always use a CV. If you're unsure whether a prospective employer wants a
resume or CV, use the job announcement to guide you. It will usually state which document
A CV begins with your contact information, including your name, address, telephone
number, and email address. You should also indicate your area or areas of academic interest.
Your CV should include a comprehensive account of your academic history, including the
title of your dissertation or thesis. It must also contain details about all publications, research
projects, and presentations to which you have contributed. You should also list any grants,
academic awards, and other related honors you've received. The employment and experience
section of your CV should contain teaching and research positions, both paid and unpaid. In
addition to jobs, include any relevant internships and volunteer experiences here. Following
that section, discuss memberships in scholarly and professional associations and include
offices you have held, if any. Finally, provide a list of references, along with their contact
information, on your curriculum vitae. Doing this is in contrast to a resume, which never
A resume is a summary of your background and experience. Its emphasis is on your work
background. Resumes are typically two pages or less, while CVs can be as long as needed to
convey your academic background and experience. CVs are used for academic positions,
and the format can vary as long as it includes all the information your prospective employer
requires. Resumes are used for most other positions and follow a few standard templates.
CV | RESUME
Comprehensive list of your academic and professional experience | Summary of your relevant work experience
applications
If you're crafting a new CV (or your first CV), you'll need to think about what type of CV
you want to make. This will depend on your experience, circumstances, industry and
i. Chronological CVs
This is the most traditional type of CV, and is what most employers expect to see. A
that your most recent job is at the top of the page. Ideally, a CV should go back around 10-
Although most CVs are chronological, in certain situations you may decide to order them
differently. For example, if you are changing careers, you might prioritise education and
experience that is most relevant to the role you're applying for, moving less relevant
experience further down the page. However, ensure that your CV is as clear as possible for
potential employers.
Creative CVs make heavy use of visual elements such as pictures, graphs and colors to represent
skills and experience. Creative CVs are common in fields such as marketing or design, but
may not be a good idea for more formal industries like banking or law. You can get an idea of
whether a creative CV would impress your potential employer by studying their job advert
and website: if it's written very formally, it's probably best to stick to a traditional CV.
2.1.5 Benefits of CV
the more successful future of living your dream career. The curriculum vitae is the most
important document that you’ll ever write to land a job opportunity. It will be your first
impression, and it should leave people wanting more. It should show the best angle of your
image to your future employers. Here are several benefits of having a well-made CV.
i. Boost self-confidence
Writing down your skills and abilities on your CV is a very constructive thing to do for
your self-confidence. With all of your positive traits on a piece of paper (or a screen) in
front of you, you will feel a strength that you thought would only be fit
for heroes. It is often said that well-earned confidence is half the battle won. Future
employers will also be more likely to hire confident candidates. So, all the more reason to
A good CV lists not only your positive traits but also your certifications, experience, and other
notable achievements. Those are your proof of knowledge and help present you as one
of the self-aware candidates. Employers look for these kinds of credentials when they search
for candidates. They know that someone who has done something before knows how to do it
Since we're already mentioning your college organizational experience, your Curriculum
Vitae would also be a good place to display the teamwork skills you've put to good use. In
all fairness, there are only a handful of jobs out there that don't require you to work
with a team to finish your daily tasks. This is why having your teamwork skills easily
Let's face it: how would you feel if you received a letter without clear information? Would you
try to get to know the person sending the letter? Most of us would just forget that letter and
grab the next letter in the queue. For employers, that clear information is your CV. Let's
make a memorable resume and let your potential employers remember your application and
With a good CV in the hands of your potential employer, you will face less
interrogation. If they already have the information they need for the
job description you are aiming for, the interview process will focus on what kind of person
you are instead of the details in the official documents. Talking about what you can do
then amounts to recounting your own life, backed up by your CV, which is going to be a fun
process. Reciting what you can do is another confidence booster; at least that's what I've
experienced myself.
A great CV makes for a great hiring process, both for you as the job seeker and for your
potential employer. With all your suitable traits written down on the CV that they're now
holding, prospective employers can't help but conclude that you're a person with good
attention to detail, who will go and research things that need to be done: an independent yet
inquisitive worker. Add that to the pleasant interview they had when they had you in their
office, and at the very least, I'd say you'll be on their list of top candidates to hire.
widely used technology for personal assistants that are used in various business fields/areas.
This technology works on the speech provided by the user, breaks it down for proper
understanding and processes it accordingly. This is a recent and effective approach, which
is why it is in high demand in today’s market. Natural Language Processing is an
emerging field in which many advances, such as compatibility with smart devices and
interactive conversations with humans, have already been made possible. Knowledge
representation, logical reasoning, and constraint satisfaction were the emphasis of AI
applications in NLP; these were applied first to semantics and later to grammar
(Anggakusuma et al., 2020).
In the last decade, a significant change in NLP research has resulted in the widespread use of
statistical approaches such as machine learning and data mining on a massive scale. The need
for automation is never-ending, owing to the amount of work required to be done these
days. NLP is a very favorable aspect when it comes to automated applications. The
applications of NLP have made it one of the most sought-after methods of implementing
machine learning. Natural Language Processing (NLP) is a field that combines computer
science, linguistics, and machine learning to study how computers and humans communicate
in natural language. The goal of NLP is for computers to be able to interpret and generate
human language. This not only improves the efficiency of work done by humans but also
helps in interacting with the machine. NLP bridges the gap of interaction between humans
Pros and cons but none of them stands as a perfect solution. Static analysis is one of the
approaches and it can be defined as the analysis of software without executing it. It is clear
that a good analysis tool can help spot and eradicate vulnerabilities; furthermore, it is
becoming a part of the development process. But there is still room for improvement, and all
the research work done in this area can be of utmost relevance for the industry
There are different types and classifications of machine learning models, provided by
i. Decision trees:
Decision trees are a simple, but powerful form of multiple variable analysis. They are
produced by algorithms that identify various ways of splitting data into branch-like segments.
Decision trees partition data into subsets based on categories of input variables, helping you
Regression is one of the most popular methods in statistics. Regression analysis estimates
relationships among variables, finding key patterns in large and diverse data sets, and how
Patterned after the operation of neurons in the human brain, neural networks (also called
artificial neural networks) are a variety of deep learning technologies. They’re typically used
to solve complex pattern recognition problems – and are incredibly useful for analyzing large
data sets. They are great at handling nonlinear relationships in data – and work well when
Time Series Algorithms, Clustering Algorithms, Ensemble Models, Factor Analysis, Naïve
Bayes and Support vector machines. Each classifier approaches data in a different way;
therefore, for organizations to get the results they need, they must choose the right
classifiers and models. Data scientists and IT experts are tasked with
choosing the right predictive models or building their own to meet the organization’s needs.
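To make the idea of a text classifier concrete, the sketch below implements a small multinomial Naïve Bayes classifier, one of the model families listed above, in plain Python. It is an illustrative sketch only: the class name `NaiveBayesTextClassifier`, the toy training texts and the category labels are assumptions made for this example and do not come from the dataset or code used in this project.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)   # log prior of the class
            for word in tokenize(text):
                if word in self.vocabulary:            # skip words never seen in training
                    score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Toy training data, invented for demonstration only.
    train_texts = [
        "python machine learning pandas",
        "sql python data analysis",
        "logo design illustration",
        "branding design photoshop",
    ]
    train_labels = ["data science", "data science", "design", "design"]
    model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
    print(model.predict("python and sql"))
```

Other classifiers named in this section, such as decision trees or support vector machines, would be trained and queried through the same fit-then-predict pattern, which is what makes it practical to compare several models on the same resume data.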
Several researchers have worked on curriculum vitae review model based on natural
Juneja et al. (2016) used Natural Language Processing (NLP) and Machine Learning (ML) to
rank resumes according to given constraints. Their intelligent system ranks resumes of
any format according to the constraints or requirements provided by the
client company. The system takes a bulk of input resumes from the client company, and
the client company also provides the requirements and constraints according to which
the resumes should be ranked. Besides the information provided by the resume,
the system reads the candidates' social profiles (such as LinkedIn and GitHub), which will give
Amin et al. (2019) focus mainly on the design of a web application that
is used to screen resumes (curricula vitae) for a particular job posting. In the
proposed system, a web application encourages job applicants as well as
recruiters to use it for job applications and screening of resumes. Recruiters from various
companies can post the details of the job openings available in their respective companies.
The interactive web application allows job applicants to submit their resumes and apply
for the job postings they are interested in. The resumes submitted by the candidates
are then compared with the job profile requirements posted by the company recruiter, using
techniques such as machine learning and Natural Language Processing (NLP). Scores can then
be given to the resumes, and they can be ranked from highest match to lowest match. This
ranking is made visible only to the company recruiter who is interested to select the best
Rabih et al. (2021) presented a paper on curriculum vitae evaluation using a machine learning
approach. Its main role is to detect the eligibility of people who are applying for job vacancies
or higher education programs. The work aims to build a system that
automates the preselection of eligibility and the assessment of candidates in the higher education
student recruitment process. The system is intended to replace the tedious manual
processing of CVs and to provide accurate and effective evaluation results. To achieve this,
the system is implemented using a machine learning approach with different classification
algorithms. The limitation of this work is that the analysis scope is applied
only to candidates who are applying to pursue a Master’s degree.
Lokesh et al. (2022) discussed the detailed design and related algorithms for a resume screener, to
decide whether a particular candidate is suitable for the applied role or not. Candidates apply
in large numbers for jobs on web portals by uploading their resumes. As a result, filtering
applicants for the appropriate position in an organization becomes a difficult task for
recruiters. Natural Language Processing (NLP) techniques are used to extract the relevant
information from the resume to save time and effort. A Machine Learning (ML) model is also
trained to check whether a candidate's skills, experience, and other attributes are suitable for
that particular role. In addition, the system recommends other available job roles based on
the candidate's skillset. On analyzing the performance of the models, the authors found that
Logistic Regression performs best for this problem. They also found that more data is
required for the model to work more efficiently, and that more attributes could be added to
improve performance. Overall, the system is reported to perform well with the available
resources. As future work, the authors intend to improve the accuracy of the system by
collecting more resumes from organizations and training the model for all the available roles.
They also propose analyzing candidates' information from social networking sites such as
Facebook, Twitter and LinkedIn, in order to decide more accurately and authentically whether
to offer the job or not. Additionally,
algorithms such as Naive Bayes, K-Nearest Neighbor, and C4.5 analysis can be performed,
Naga (2022) observed that selecting applicants for the appropriate job within a company is a
difficult task for recruiters. Key information is extracted from the CV using NLTK and Natural
Language Processing (NLP) techniques to save time and effort. The paper examines a variety
of machine learning models, namely KNN, SVM, logistic regression and MLP, to detect,
identify, and categorize diverse resumes. A web interface was implemented to screen the
resumes and determine the type of job related to each resume, and MLP outperformed the other
approaches (KNN, SVM and logistic regression). Furthermore, the system attempts to measure
the accuracy and performance of the proposed methodology and to
incorporate it in the IT firms and other regulations for the prevention of manual screening and
CHAPTER THREE
METHODOLOGY
3.1 Introduction
This section of the project discusses the methodological steps followed to develop the
automated CV review model using NLP. It also discusses the requirements and architecture
processing
3.3 Resume from Kaggle
This is a methodical process of collecting and analyzing data from a variety of sources to get a
complete and accurate picture of a subject. In this context, the dataset to be used would be
samples and two variables (columns). The first variable is the category, which defines the
category of job the candidate is applying for, while the second variable is the CV, which
The very first step of NLP projects is pre-processing the data. This is essential in preparing
the text data for the model building before using it for analysis or prediction. Some of the
dataset.
iii. Removing stop words: Stop words are the words in a stop list which are removed
before or after processing of natural language data because they carry little meaning.
They can include pronouns and other common words that add no meaning to the dataset.
and sentences into smaller units that can be more easily assigned meaning.
to analyze the meaning behind a word. Stemming uses the stem of the word, while
lemmatization uses the context in which the word is being used (Deepanshi, 2022).
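The pre-processing steps described above (lower-casing, tokenization, stop word removal and stemming) can be sketched in a few lines of Python. This is only an illustrative sketch: the stop list `STOP_WORDS`, the suffix rules in `SUFFIXES` and the function names are hypothetical, and the toy stemmer is a stand-in for the stemmers and lemmatizers that a library such as NLTK provides.

```python
import re

# Hypothetical, hand-written stop list for illustration only; a real project
# would normally load a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to",
              "i", "he", "she", "it", "we", "they", "with"}

# Hypothetical suffix rules for a toy stemmer (not the Porter algorithm).
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop every token that appears in the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    tokens = remove_stop_words(tokenize(text))
    return [naive_stem(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("She is managing the databases and tested APIs"))
```

Stemming of this kind only trims word endings, which is why a lemmatizer that takes context into account is often preferred when the exact dictionary form of a word matters.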
3.5 NLP and ML Model Development
Building the NLP model involves both dependency parsing and part-of-speech tagging.
Dependency parsing is used to find the relationships between all the words in a sentence. To
find the dependencies, a tree is built in which a single word is assigned as the parent word.
The main verb in the sentence functions as the root node, while part-of-speech tagging
labels the verbs, adverbs, nouns, and adjectives that help convey the meaning
implemented using an NLP library called NLTK (Natural Language Toolkit), a suite of
libraries and programs for symbolic and statistical natural language processing for English,
written in the Python programming language. The output of the NLP process will serve as the
input dataset for the classification model to be developed. This
model will classify CVs based on the applicant's skillset to determine if the applicant will be
i. Accuracy
Accuracy is the ratio of correct predictions to the total number of predictions. It is one of the
simplest measures of a model. We must aim for high accuracy for our model. If a model has
high accuracy, we can infer that the model makes correct predictions most of the time.
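\[
\text{Accuracy} = \frac{\text{CorrectPrediction}}{\text{CorrectPrediction} + \text{IncorrectPrediction}}
\]
\[
\text{Accuracy} = \frac{\text{TruePositive} + \text{TrueNegative}}{\text{TruePositive} + \text{FalsePositive} + \text{TrueNegative} + \text{FalseNegative}}
\]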
ii. F1 Score
The F1 score depends on both recall and precision; it is the harmonic mean of the two
values.
\[
\text{F1 Score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}
\]
iii. Precision
CV correctly classified
iv. Recall
Recall is essentially the ratio of true positives to all the positives in the ground truth.
\[
\text{Recall} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}}
\]
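As a worked illustration of the evaluation measures listed in this section, the sketch below computes accuracy, precision, recall and F1 score from confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical, and the counts are assumed to come from treating one job category as the positive class; they are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives and false negatives. Ratios with a zero denominator
    are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        # Harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    scores = classification_metrics(tp=40, fp=10, tn=45, fn=5)
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

With more than two job categories, these measures are usually computed per category and then averaged.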
3.7 System flowchart
CHAPTER FOUR
4.1 Introduction
This chapter describes the implementation stages for the development of the Natural Language
Processing models for reviewing CVs. The modeling is done using a dataset retrieved from
Kaggle. The Python programming language is used for developing the model.
The model development requirements fall into two main parts: software and hardware
requirements. The candidates' credibility datasets form a major component for the model
building.
a. Software Requirements
i. Python 3.9.0
b. Hardware Requirements
languages use punctuation, and it has fewer syntactic constructions than other languages.
need to compile your program before executing it. This is similar to PERL and PHP.
ii. Python is Interactive: You can actually sit at a Python prompt and interact with the
iii. Python is Object-Oriented: Python supports Object-Oriented style or technique of
iv. Python is a Beginner's Language: Python is a great language for the beginner-level
Machine learning is widely regarded as one of the key technologies of the future, and a
number of applications have already been built on it. As a result, many companies and
researchers are taking an interest in it. The main question that arises is which programming
language these machine learning applications can be developed in. Various programming
languages, such as Lisp, Prolog, C++, Java and Python, can be used for developing machine
learning applications. Among them, the Python programming language has gained huge popularity and
Python requires less code and has a simpler syntax than many other programming languages
that can be used for developing machine learning models. Due to this feature, the testing
A major advantage of using Python for machine learning is that it has a wide range of readily
available libraries. Python has libraries for almost all kinds of machine learning projects. For example,
iii. Open source: Python is an open-source programming language. This makes it widely
4.3 Implementation phases of the model
The implementation phase of this study shows the modeling of LR, SVM and KNN to
In order to reduce the complexity of the structure. The full dataset which is loaded using a
In this project, we aim to classify CVs into job categories using machine learning
models. We will be using the sklearn library in Python to
i. numpy: a library for numerical computing in Python that provides support for arrays,
ii. pandas: a library for data manipulation and analysis in Python that provides support
for reading and writing data in various file formats and performing operations such as
iii. scikit-learn: a library for machine learning in Python that provides support for a wide
iv. matplotlib: a library for data visualization in Python that provides support for creating
Figure 4.1: Importing libraries
The dataset has two variables, a category, which is the area the applicant is applying for and
Previewing all the categories in the dataset
Figure 4.4: Categories of job available
This process involves scanning through the dataset to identify and remove errors from the
dataset. It involves removing special symbols, characters and hashtags. The idea is to have only English
Figure 4.6: Cleaning the dataset
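A cleaning step of this kind can be sketched with regular expressions. The function name `clean_resume` and the exact patterns are assumptions made for illustration, not the code shown in the figure. The sketch assumes that only English letters and single spaces should remain, so digits and punctuation are dropped as well.

```python
import re


def clean_resume(text):
    """Remove hashtags, special symbols and extra whitespace from a resume."""
    text = re.sub(r"#\S+", " ", text)         # drop hashtags such as "#hiring"
    text = re.sub(r"[^A-Za-z\s]", " ", text)  # keep only English letters and whitespace
    text = re.sub(r"\s+", " ", text)          # collapse runs of whitespace
    return text.strip()


if __name__ == "__main__":
    print(clean_resume("Skills: Python, SQL #hiring * 5 yrs!"))
```

Applied to a pandas column, the same function could be passed to `Series.apply` to produce a cleaned column alongside the raw text.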
After properly cleaning the dataset, a new column was created to hold the cleaned text.
This process involves breaking down the cleaned resumes into single words and converting
them to lower case.
Figure 4.8: Tokenizing the cleaned dataset
This process involves converting the classification category into integers, instead of
4.3.2 Building the model
The model building process involves splitting the dataset into training and testing sets: 80%
of the entire dataset is used for training and 20% for testing.
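The 80/20 split can be illustrated with a small, dependency-free sketch. The function name `split_dataset`, the fixed seed and the toy data are hypothetical. In practice, scikit-learn's `train_test_split` offers the same behaviour, and the sketch is not necessarily how the split was performed in this study.

```python
import random


def split_dataset(samples, labels, test_fraction=0.2, seed=42):
    """Shuffle the data and split it into training and testing portions.

    Returns (train_samples, test_samples, train_labels, test_labels).
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([samples[i] for i in train_idx], [samples[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


if __name__ == "__main__":
    # Toy data, for illustration only.
    cvs = [f"cv text {i}" for i in range(10)]
    categories = [i % 3 for i in range(10)]
    x_train, x_test, y_train, y_test = split_dataset(cvs, categories)
    print(len(x_train), "training samples,", len(x_test), "testing samples")
```

Shuffling before splitting matters when the dataset is ordered by category, since an unshuffled split could leave whole categories out of the training set.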
After successfully building the model, the validation accuracy is 96%, signifying a
satisfactory result.
4.3.4 Result
CHAPTER FIVE
5.1 Summary
The submission of either a résumé or curriculum vitae (CV) is essential for individuals
organization. These documents serve as crucial marketing tools, portraying the applicant's
and relevant achievements, with the primary goal of capturing the attention of prospective
employers for interview consideration. The high volume of job applications received by
companies daily poses a significant challenge for recruiters in manually identifying top
considerable percentage of submitted resumes often lack the essential qualifications for the
Recognizing the need for an efficient and cost-effective solution, this study adopts machine
learning, specifically natural language processing (NLP). NLP, which mimics human
to enable computers to understand and process spoken and written language, combining rule-
based computational linguistics with statistical, machine learning, and deep learning models.
The recruitment industry, valued at $200 billion, grapples with the task of selecting the best
candidates from a vast pool of applicants possessing the required skills. The study
emphasizes the growing job market in India, leading to millions of new job seekers entering
the workforce annually. While CV/Resume data formats are not entirely unstructured, their
lack of standardized rules for writing makes them challenging to accept in a uniform format.
The integration of machine learning and NLP offers a solution to analyze and interpret
unstructured data from resumes, facilitating information extraction. The proposed model aims
automated system.
The identified objectives for the study include developing a model capable of accepting and
reviewing CVs using NLP technology, followed by testing and evaluating the model's
effectiveness. The anticipated outcomes involve the shift towards a more efficient and
5.2 Conclusion
In conclusion, this study shows the critical role of résumés and curriculum vitae (CVs) as
indispensable marketing tools for individuals seeking employment. The documents serve as
potential employers. The ultimate objective is to capture the attention of recruiters and secure
necessitate a paradigm shift in the recruitment process. The study recognizes the limitations
of traditional methods and advocates for the adoption of machine learning, specifically
screening resumes offers the potential to automate and expedite the review process. By
enabling computers to comprehend and process human language, NLP serves as a bridge
between rule-based computational linguistics and advanced statistical, machine learning, and
deep learning models. The study emphasizes the significance of leveraging technology to
address the challenges posed by the sheer volume of job applications received by companies.
The proposed machine learning model, as outlined in the study, aims to revolutionize the
conventional approach to resume review. The model's objectives include the development of
a system that utilizes NLP technology for efficient CV/Resume review and subsequent testing
and evaluation. This transition from labor-intensive human processing to a swift and cost-
landscape. In essence, the integration of machine learning and NLP offers a promising
solution to the challenges faced by recruiters in handling large volumes of resumes. As the
imperative for enhancing efficiency, reducing costs, and ensuring the identification of the
most qualified candidates. This study contributes to the ongoing dialogue on optimizing the
the required tool to extract the crucial data from a CV and find relevant qualifications from a
5.4 Recommendations
REFERENCES
Amin, S., Jayakar, N., Sunny, S., Babu, P., Kiruthika, M., & Gurjar, A. (2019, January 1). Web Application for
https://doi.org/10.1109/ICNTE44896.2019.8945869
Anggakusuma, J., Mawardi, V. C., & Lauro, M. D. (2020). Resume extraction with conditional
random field method. IOP Conference Series: Materials Science and Engineering, 1007(1).
https://doi.org/10.1088/1757-899X/1007/1/012154
Barrak, A., Adams, B., & Zouaq, A. (2022). Toward a traceable, explainable, and fair JD/Resume
Rabih, H.E. & Mercier, L., (2021). Curriculum Vitae Evaluation using Machine Learning
Bhushan Kinge, Shrinivas Mandhare, Pranali Chavan, & S. M. Chaware. (2022). Resume
Screening using Machine Learning and NLP: A proposed system. International Journal of
258. https://doi.org/10.32628/cseit228240
Resume Screening and Interview Process Outcomes. Empirical Research for Greece.
International Journal of Human Resource Studies, 9(3), 230.
https://doi.org/10.5296/ijhrs.v9i3.15226
Lokesh. S, Balaje. S, M., Prathish. E, & B. Bharathi. (2022). Resume Screening and
www.jespublication.com
Naik, R. S., Dhotre, S. R., & Professor, A. (2022). RESUME RECOMMENDATION USING
Almada, R. V., Elias, O. M., Gómez, C. E., Mendoza, M. D., López, S. G., Natural
Language Processing and Text Mining to Identify Knowledge Profiles for Software
Gao, L., Eldin, N., Employer’s expectations: A probabilistic text mining model, Creative
Karakatsanis, I., AlKhader, W., MacCrory, F., Alibasic, A., Omar, M. A., Aung, Z., Woon,
W. L., Data Mining Approach to Monitoring The Requirements of the Job Market: A
Kino, Y., Kuroki, H., Machida, T., Furuya, N., Takano, K., “Text Analysis for Job
Melo-Acosta, German E., et al. “Fraud Detection in Big Data Using Supervised and Semi-
doi:10.1109/colcomcon.2017.8088206.
Mohammed, Emad, and Behrouz Far. “Supervised Machine Learning Algorithms for Credit
doi.ieeecomputersociety.org/10.1109/IRI.2018.00025.
Juneja, A., Momin, A. (2016) Resume Ranking using NLP and Machine Learning.