My Final Project

DEVELOPMENT OF A CURRICULUM VITAE REVIEW MODEL BASED ON
NATURAL LANGUAGE PROCESSING
BY
ANTHONY OLUWATOBI EMMANUEL
180404027
A PROJECT WORK SUBMITTED TO THE DEPARTMENT OF COMPUTER
SCIENCE, FACULTY OF SCIENCE, ADEKUNLE AJASIN UNIVERSITY
AKUNGBA-AKOKO, ONDO STATE.
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF

BACHELOR OF SCIENCE (B. Sc) IN COMPUTER SCIENCE ADEKUNLE AJASIN
UNIVERSITY AKUNGBA AKOKO (AAUA)
DECEMBER, 2023.
CERTIFICATION
This is to certify that this project work was carried out by ANTHONY
OLUWATOBILOBA EMMANUEL with matric number 180404027 in the Department of
Computer Science, Faculty of Science, Adekunle Ajasin University, Akungba Akoko, Ondo
State.
……………………….. ……………………...
Dr. D. A. Akinwumi Date
Supervisor
……………………….. ..…………………
Dr. F.O. Aranuwa Date
Head of Department
DEDICATION
This project work is dedicated to the Rock of Ages for His love, blessings, unmerited favours
and protection, now and throughout my days on campus as an undergraduate.
ACKNOWLEDGEMENTS
First and foremost, my utmost gratitude goes to the Omnipresent God for being there for me
before and after this research work. May your name be glorified both now and forever more. I
am heartily thankful to my supervisor, Dr. D.A. Akinwumi, whose encouragement, guidance,
fatherly love and support from the initial to the final level enabled me to develop an
understanding of the subject.
Let me seize this wonderful opportunity to appreciate the Head of Computer Science
Department, in person of Dr. F.O. Aranuwa for his moral and academic support. I would like
to express my deepest appreciation to the former head of Department, Dr. Olusola Ajayi and
to all lecturers in the Department of Computer Science for their unselfish and unfailing
support in all ways.
It is a pleasure to thank my parents, Mr and Mrs Anthony, for all they have done in helping me
pursue my career. I love you so much. To my able siblings, thank you all for being there for
me.
In any academic endeavor, many people and institutions will be involved in the success story.
I sincerely and humbly express my appreciation and gratitude to Afunibiowo Adejoke and
also to my course mates and everybody who really impacted my life positively during the
course of my study. God bless you all.
ABSTRACT
The development of a Curriculum Vitae (CV) review model based on Natural Language
Processing (NLP) represents an innovative approach to enhance the efficiency and objectivity
of the hiring process. This research focuses on leveraging NLP techniques to analyze and
assess CVs, extracting relevant information and evaluating candidates based on predefined
criteria. The model aims to automate the initial screening phase, reducing human bias and
expediting the selection of qualified candidates. Through the integration of advanced NLP
algorithms, the model interprets and comprehends the content of CVs, considering both
structured and unstructured data. The research explores the feasibility, accuracy, and
effectiveness of this NLP-based CV review model, emphasizing its potential to revolutionize
the recruitment process by improving precision, saving time, and promoting fair evaluations.
The findings contribute to the intersection of natural language processing and human
resources, offering a valuable tool for organizations seeking to optimize their hiring
procedures.
TABLE OF CONTENTS
TITLE PAGE
CERTIFICATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENT
CHAPTER ONE
INTRODUCTION
1.1 Background of the Study
1.2 Statement of the Problem
1.3 Aim and Objectives
1.3.1 Aim
1.3.2 Objectives
1.4 Methodology
1.5 Scope of the Study
1.6 Expected Contribution to Knowledge
1.7 Definition of terms
CHAPTER TWO
LITERATURE REVIEW
2.0 Introduction
2.1 Research Concepts
2.1.1 Curriculum Vitae (CV)
2.1.2 How a Curriculum Vitae (CV) Works
2.1.3 Differences between Curriculum Vitae (CV) and Resume
2.1.4 Different types of CV
2.1.5 Benefits of CV
2.1.6 An Overview of NLP
2.1.7 Using Machine Learning for analysis and Classification
2.2 Review of related works
CHAPTER THREE
METHODOLOGY
3.1 Introduction
3.2 System Architecture
3.3 Resume from Kaggle
3.4 Resume Preprocessing
3.5 NLP and ML Model Development
3.6 Model Evaluation
3.7 System flowchart
CHAPTER FOUR
4.1 Introduction
4.2 Implementation Requirements
4.2.1 Programming Language
4.2.2 Justification for Choice of Programming Language
4.3 Implementation phases of the model
4.3.1 Python Machine Learning Model
4.3.2 Data Cleaning
4.3.3 Building the model
4.3.4 Result
CHAPTER FIVE
5.1 Summary
5.2 Conclusion
REFERENCES
CHAPTER ONE
INTRODUCTION
1.1 Background of the Study
Whether seeking temporary employment to gain experience or a permanent position with an
organization, an applicant must provide one of these documents with their application
materials: a résumé or a curriculum vitae (CV). These documents serve as important marketing
tools that give a self-portrait or advertisement of the applicant and present their relative
strengths, skills, and experiences to a potential employer (Bhushan Kinge et al., 2022). An
effective resume or CV provides an employer with an overview of who the applicant is as a
student or young professional, what they know and can do in relation to the position of
interest, and what relevant skills, traits, and accomplishments they have achieved at this
point in their education or career. Therefore, the objective of a resume or CV is to catch the
eye of a prospective employer and secure an interview (Anggakusuma et al., 2020).
Every day, a company with a job opening for a particular position may receive thousands of
emails from potential employees. It is exceedingly difficult for recruiters to go manually
through so many resumes to locate the top candidates for the post. About 75% of the
resumes sent to an organization in response to a job posting do not demonstrate the pertinent
abilities needed for the position. As a result, selecting the best individuals from a large pool
of applicants can be quite difficult. The process of reviewing each applicant's CV can also be
so cumbersome that a company resorts to an external recruiter (Amin et al., 2019).
The recruitment industry is worth $200 billion, and it deals with selecting the best candidates
from an enormous pool of applicants who have the necessary skills for a certain job
description. Numerous applicants send their resumes to an organization to apply for any job
openings that may exist, and screening the resumes of all job applicants is a core part of the
recruiting process (Bersin, 2017). There has long been a search for an automated process in
which employers can quickly select eligible candidates and applicants can show their ingenuity
by using a single application format to apply to several organizations. Because employing an
external recruiter can be expensive, applying an automated system is a tried and tested way to
carry out the process. This study adopts machine learning for that purpose.
In machine learning, a model is developed from a dataset to predict the intended result for
new data. Natural language processing (NLP), which deals with the way humans communicate
with one another, is the primary technique used to screen resumes. The goal of NLP is to enable
computers to comprehend spoken and written language in a manner similar to that of humans.
NLP combines computational linguistics (rule-based modeling of human language) with
statistical, machine learning and deep learning models. Together, these technologies help
computers process human language in the form of text or voice data and ‘understand’ its full
meaning. As the job market grows in India, millions of new job seekers join the workforce
every year, according to LinkedIn (Suhua et al., 2020).
Although the data in CVs/resumes are not entirely unstructured, it is still difficult to accept
them in a standardized format since there is no set of rules for writing a CV/resume. With
machine learning and NLP, a computer can be taught to analyze written documents such as
resumes, interpret unstructured data, and extract relevant information from it.
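As a rough illustration of extracting structured fields from free-form CV text, the sketch below uses regular expressions from the Python standard library. The sample CV, the field names and the patterns are assumptions made for illustration only; they are not the extraction rules used in this project.

```python
import re

# Hypothetical CV snippet; the layout and field labels are illustrative assumptions.
SAMPLE_CV = """Jane Doe
Email: jane.doe@example.com
Phone: +234 801 234 5678
Skills: Python, SQL, Machine Learning
"""

# Simple patterns (assumptions): real CVs would need more robust rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")
SKILLS_RE = re.compile(r"^Skills:\s*(.+)$", re.MULTILINE | re.IGNORECASE)


def extract_fields(text):
    """Pull a few structured fields out of free-form CV text."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    skills = SKILLS_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
        "skills": [s.strip() for s in skills.group(1).split(",")] if skills else [],
    }


fields = extract_fields(SAMPLE_CV)
print(fields)
```

Fields that cannot be found are returned as None or an empty list, so a CV with an unexpected layout does not stop the process.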
1.2 Statement of the Problem
The number of job openings available is not enough to cover the staggering number of
applications/resumes companies receive. Each applicant is unique, as different people with
different experiences apply for jobs. Staff in the human resources division have to review
hundreds to thousands of resumes in order to find the best fit for a job opening. Hence, when
companies hire in bulk, finding the talent they need among the many applications requires
considerable time and resources. On average, an HR executive takes about 10 to 15 minutes
to review each résumé, put a summary together and add the data to the database. After
receiving a résumé, executives condense it, enter the applicant's contact information into
their database and call the applicant for an interview. A machine learning model using NLP
algorithms can instead rank the resumes and surface those that best fit the job role.
The model will also enable the switch from labor-intensive human resume processing to
quick and affordable software. The following aim and objectives are identified as necessary in
meeting the set goals.
1.3 Aim and Objectives
1.3.1 Aim
The aim of this project is to develop a machine learning model that automates the extraction
of required information from candidates' resumes, so that submitted resumes need not be
reviewed manually. To achieve the project aim, the following objectives are considered.
1.3.2 Objectives
The objectives of this project are to:
a. develop a model that can accept a new CV/resume for review using Natural Language
Processing (NLP) technology; and
b. test and evaluate the model.
1.4 Methodology
The methodology for the development of a curriculum vitae review model based on natural
language processing is depicted in Figure 1.1 below.
[Flow diagram: Information Gathering, then Building Model, then Testing Model, then Deployment]
Figure 1.1: Architecture of the Proposed Model
To develop the model, the following methodology is considered:
i. Information gathering: This process involves collecting the necessary tools,
reviewing related works and identifying the best and most effective
tools/techniques to achieve the project aim.
ii. Building the model: This process involves writing the code to achieve the project
aim. The programming language chosen is Python, the development environment
is Anaconda and the technique applied is NLP.
iii. Testing the model: This process involves creating and structuring a format the model
will follow to review the document. The new CV to be reviewed is supplied
and a similarity score of up to 100% is produced. The higher the
similarity score, the more relevant the CV is.
iv. Deployment: The model can be packaged, and a manual will be provided on how the
model can be used.
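One possible way to produce the similarity score described in the testing step is cosine similarity over word counts. The sketch below is a minimal illustration under that assumption. The measure itself and the function names tokenize and similarity_score are chosen for illustration, since the text above does not fix a particular formula.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def similarity_score(reference, cv):
    """Return the cosine similarity of two texts as a percentage (0 to 100)."""
    ref_counts = Counter(tokenize(reference))
    cv_counts = Counter(tokenize(cv))
    # Dot product over the words of the reference; missing words count as 0.
    dot = sum(ref_counts[w] * cv_counts[w] for w in ref_counts)
    norm = math.sqrt(sum(c * c for c in ref_counts.values())) * math.sqrt(
        sum(c * c for c in cv_counts.values())
    )
    return 100.0 * dot / norm if norm else 0.0


# Hypothetical company format and two hypothetical CVs.
company_format = "python sql machine learning skills"
strong_cv = "python developer with sql and machine learning skills"
weak_cv = "chef with cooking skills"

print(round(similarity_score(company_format, strong_cv), 1))
print(round(similarity_score(company_format, weak_cv), 1))
```

A CV that shares more vocabulary with the company's format receives a higher score, which matches the interpretation given in step iii.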
1.5 Scope of the Study
The difficulty of extracting relevant information from a resume in an organized fashion can
be overcome with the aid of a purpose-built system. This study aims at developing a machine
learning model that can help automate the process. This will be achieved by implementing an
NLP algorithm capable of comparing a submitted resume with the format and information
expected by a company. The scope of the model is limited to two inputs: a format for the
resume supplied by the company, and a new resume whose similarity to that format is computed.
1.6 Definition of terms
i. NLP: Natural language processing is an interdisciplinary subfield of linguistics,
computer science, and artificial intelligence concerned with the interactions between
computers and human language, in particular how to program computers to process
and analyze large amounts of natural language data.
ii. CV: A CV, which stands for curriculum vitae, otherwise known as a resume, is a
document used when applying for jobs. It allows applicants to summarise their education,
skills and experience, enabling them to successfully sell their abilities to potential
employers.
iii. ML: Machine learning is a field of inquiry devoted to understanding and building
methods that 'learn', that is, methods that leverage data to improve performance on
some set of tasks. It is seen as a part of artificial intelligence.
CHAPTER TWO
LITERATURE REVIEW
2.0 Introduction
Hiring the right talent is a challenge for all businesses. Manually screening a large number of
resumes/CVs takes at least one day. If a recruiter finds 4-6 suitable resumes when going
through the first ones, chances are that the remaining submissions will not be considered.
This decreases the likelihood of a strong resume being shortlisted. Going through each
resume is time-consuming, and manually organizing and managing a large number of resumes
is challenging. Some prejudice is also to be expected wherever there is human involvement
(Naik et al., 2022).
This challenge is magnified by the high volume of applicants if the business is labor-intensive,
growing, and facing high attrition rates. An example is the IT sector, where departments
struggle to find enough staff for growing markets. In a typical service organization,
professionals with a variety of technical skills and business domain expertise are hired and
assigned to projects to resolve customer issues (Barrak et al., 2022). The task of selecting the
best talent among many is known as resume screening. Typically, large companies do not have
enough time to open each CV, so they use machine learning algorithms for the resume
screening task; more efficient hiring may in turn help reduce unemployment. Machine learning
is a field that involves training a model with data to predict the intended outcome when new
data is submitted. Natural language processing (NLP) is commonly used to screen resumes.
Natural language refers to how humans communicate with one another (Riza et al., 2021).
An NLP system enables a computer to interpret text using the vocabulary of a language such
as English, in much the same way as humans do. NLP combines statistical, machine learning,
and deep learning models of human language with rule-based modeling from computational
linguistics. The data may come in different formats, either as documents or as audio, and the
aim is to understand its full meaning (Dimopoulos, 2019). The number of applications is in the
millions, making it a time-consuming chore to sort through them. What is needed is a machine
learning algorithm that offers a better way of understanding resumes and can fulfil industry
requirements. The proposed system takes as input a CSV file containing resumes and their
categories. Based on the category and features of each resume, accuracy and performance
are calculated using different machine learning classifiers.
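To make the CSV-to-accuracy pipeline concrete, the sketch below trains a small multinomial Naïve Bayes classifier written in plain Python and measures its accuracy on two held-out examples. The column names "Category" and "Resume", the toy data and the choice of Naïve Bayes are assumptions for illustration; the classifiers and dataset used in this project are described in later chapters.

```python
import csv
import io
import math
import re
from collections import Counter, defaultdict

# Tiny made-up dataset; the column names "Category" and "Resume" are assumptions.
CSV_DATA = """Category,Resume
Data Science,python machine learning pandas statistics
Data Science,deep learning python numpy models
Web Development,html css javascript react frontend
Web Development,javascript node express backend html
"""


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(rows):
    """Count documents and words per category for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for category, resume in rows:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(resume))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict(model, resume):
    """Return the category with the highest log-probability (Laplace smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for word in tokenize(resume):
            score += math.log(
                (word_counts[category][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best, best_score = category, score
    return best


rows = [(r["Category"], r["Resume"]) for r in csv.DictReader(io.StringIO(CSV_DATA))]
model = train(rows)

# Hypothetical held-out examples used only to illustrate the accuracy calculation.
test_rows = [
    ("Data Science", "python statistics models"),
    ("Web Development", "react html css"),
]
accuracy = sum(predict(model, text) == label for label, text in test_rows) / len(test_rows)
print(accuracy)
```

Accuracy here is simply the fraction of held-out resumes whose predicted category matches the labelled one. With a real dataset, a larger held-out set would be needed for the figure to be meaningful.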
In the study "Employers expectations, a probabilistic text mining model" (Gao and Eldin,
2018), more than 20,000 job advertisements from various websites were processed, and text
mining was applied to identify skill information derived from web pages of the construction
industry sector. In the research named "Text Analysis for Job Matching Quality Improvement"
(Kinoa et al., 2017), keywords were applied in a machine learning process using text mining
tools, in a context of data analysis that includes travel time, job location, job type, rates, and
candidate skill set. As a result, effective keywords were discovered for a job matching system.
In the research entitled "Natural Language Processing and Text Mining to Identify Knowledge
Profiles for Software Engineering Positions" (Almada et al., 2017), NLP and text mining (TM)
were applied to analyze the unstructured text of resumes and job offers in order to identify
the knowledge profiles for software engineering positions.
The research entitled "Data Mining Approach to Monitoring the Requirements of the Job
Market: A Case Study" (Karakatsanis et al., 2018) presents an approach based on data mining
to identify the most demanded occupations in the modern labor market. To achieve this, it uses
a latent semantic indexing model that is able to match job announcements extracted from
the web with the occupation descriptions in a database.
2.1 Research Concepts
This section gives details of the research concepts behind curriculum vitae review.
2.1.1 Curriculum Vitae (CV)
A curriculum vitae works in much the same way as a resume, providing information about an
individual's educational and work history. Often called a CV for short, it's much more
comprehensive than the typical resume and can be much longer. There's no limit to how long
a CV can be, but it must be focused on academic and professional experience. A lengthy CV
isn't any better than a short one if it contains fluff or irrelevant data. A job applicant seeking
an academic position, like a teaching appointment at a college or university or a research
position, should always use a CV. If you're unsure whether a prospective employer wants a
resume or CV, use the job announcement to guide you. It will usually state which document
the institution wants (Karakatsanis et al., 2018).
2.1.2 How a Curriculum Vitae (CV) Works
A CV begins with your contact information, including your name, address, telephone
number, and email address. You should also indicate your area or areas of academic interest.
Your CV should include a comprehensive account of your academic history, including the
title of your dissertation or thesis. It must also contain details about all publications, research
projects, and presentations to which you have contributed. You should also list any grants,
academic awards, and other related honors you've received. The employment and experience
section of your CV should contain teaching and research positions, both paid and unpaid. In
addition to jobs, include any relevant internships and volunteer experiences here. Following
that section, discuss memberships in scholarly and professional associations and include
offices you have held, if any. Finally, provide a list of references, along with their contact
information, on your curriculum vitae. This is in contrast to a resume, which never
contains this information.
2.1.3 Differences between Curriculum Vitae (CV) and Resume
A resume is a summary of your background and experience. Its emphasis is on your work
experience. A CV is much more comprehensive, providing details about your academic
background. Resumes are typically two pages or less, while CVs can be as long as needed to
convey your academic background and experience. CVs are used for academic positions,
and the format can vary as long as it includes all the information your prospective employer
requires. Resumes are used for most other positions and follow a few standard templates.
CV | RESUME
Comprehensive list of your academic and professional experience | Summary of your relevant work experience
Can be multiple pages | Typically two pages or less
Used for academic positions | Used for most employment applications
2.1.4 Different types of CV
If you're crafting a new CV (or your first CV), you'll need to think about what type of CV
you want to make. This will depend on your experience, circumstances, industry and
personal preference. The different options include:
i. Chronological CVs
This is the most traditional type of CV, and is what most employers expect to see. A
chronological CV lays out your professional experience in reverse chronological order so
that your most recent job is at the top of the page. Ideally, a CV should go back around 10-15
years, or cover your last 5-6 positions.
ii. Custom CVs
Although most CVs are chronological, in certain situations you may decide to order yours
differently. For example, if you are changing careers, you might prioritise education and
experience that is most relevant to the role you're applying for, moving less relevant
experience further down the page. However, ensure that your CV is as clear as possible for
potential employers.
iii. Creative CVs
Creative CVs make heavy use of visual elements such as pictures, graphs and colors to
represent skills and experience. Creative CVs are common in fields such as marketing or
design, but may not be a good idea for more formal industries like banking or law. You can get
an idea of whether a creative CV would impress your potential employer by studying their job
advert and website; if these are written very formally, it's probably best to stick to a
traditional CV.
2.1.5 Benefits of CV
A CV is important because it serves as an attention-grabbing bridge between you today and
the more successful future of living your dream career. The curriculum vitae is among the most
important documents that you'll ever write to secure a job opportunity. It will be your first
impression, and it should leave people wanting more. It should show the best angle of your
image to your future employers. Here are several benefits of having a well-made CV.
i. Boost self-confidence
Writing down your skills and abilities on your CV is a very constructive thing to do for
your self-confidence. With all of your positive traits on a piece of paper (or a screen) in
front of you, you will feel a strength that you may not have known you had. Well-earned
confidence is said to be half the battle won. Future employers will also be more likely to hire
confident candidates, which is all the more reason to create your own self-confidence booster.
ii. Prove your knowledge
A good CV lists not only your positive traits but also your certifications, experience, and other
notable achievements. These are proof of your knowledge and help position you as a
self-aware candidate. Employers look for these kinds of credentials when they search
for candidates. They know that someone who has done something before knows how to do it
better than someone who hasn't.
iii. Show teamwork skills
If you have organizational experience from college, your curriculum vitae is also a good
place to display the teamwork skills you've put to good use. In all fairness, only a handful of
jobs do not require you to work with a team to finish your daily tasks. This is why making
your teamwork skills easy to see is a good thing when composing your CV.
iv. Make application stand out and leave a lasting impression
Let's face it, how would you feel if you received a letter without clear information? Would you
try to get to know the person who sent it? Most of us would just forget that letter and
grab the next one in the queue. For employers, that clear information is your CV. A
memorable CV helps potential employers remember your application and want to get to know
you.
v. Make interview process more effective and efficient
With a good CV in the hands of your potential employer, you will face fewer basic
questions. If they already have the information relevant to the job description you are
aiming for, the interview can focus on what kind of person you are instead of the details in
the official documents. Talking about what you can do amounts to recounting your own history,
which your CV backs up, so the process can be an enjoyable one. Reciting what you can do is
also another confidence booster.
vi. Improve employability
A great CV makes for a great hiring process, both for you as the job seeker and for your
potential employer. With all your suitable traits written down on the CV they are holding,
prospective employers can hardly help but conclude that you are a person with good attention
to detail who will research what needs to be done: an independent yet inquisitive worker. Add
to that a pleasant interview in their office, and at the very least you are likely to be on their
list of top candidates to hire (Karakatsanis et al., 2018).
2.1.6 An Overview of NLP
Natural language processing (NLP) is a subfield of Artificial Intelligence (AI). It is a
widely used technology behind personal assistants deployed in various business fields.
The technology takes the speech provided by the user, breaks it down for proper
understanding and processes it accordingly. It is a relatively recent and effective approach,
which explains its high demand in today's market. NLP is a fast-developing field in which
many advances, such as compatibility with smart devices and interactive conversation with
humans, have already been made possible. Knowledge representation, logical reasoning, and
constraint satisfaction were the emphasis of AI applications in NLP, which were applied first
to semantics and later to grammar (Anggakusuma et al., 2020).
In the last decade, a significant change in NLP research has resulted in the widespread use of
statistical approaches such as machine learning and data mining on a massive scale. The need
for automation is never-ending given the amount of work required these days, and NLP is
well suited to automated applications. Its applications have made it one of the most
sought-after methods of implementing machine learning. NLP is a field that combines computer
science, linguistics, and machine learning to study how computers and humans communicate
in natural language. The goal of NLP is for computers to be able to interpret and generate
human language. This not only improves the efficiency of work done by humans but also
helps them interact with machines. NLP bridges the gap of interaction between humans
and electronic devices.
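The idea of breaking language down before processing it can be illustrated with a basic text preprocessing step: lowercasing, tokenization and stop-word removal. The sketch below is a minimal example; the stop-word list is a short assumed sample, and production NLP libraries use far more complete lists and rules.

```python
import re

# A small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to", "with"}


def preprocess(text):
    """Lowercase the text, split it into tokens, and drop stop words."""
    # Keep letters, digits, '+' and '#' so that terms like "c++" or "c#" survive.
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("Experienced in Python and C++, with a degree in Computer Science.")
print(tokens)
```

The remaining tokens are the content-bearing words that later steps, such as counting or classification, would work with.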
2.1.7 Using Machine Learning for analysis and Classification
Each approach to analysis has its pros and cons, but none of them stands as a perfect solution.
Static analysis is one such approach, and it can be defined as the analysis of software without
executing it. A good analysis tool can clearly help spot and eradicate vulnerabilities, and
such tools are becoming a part of the development process. There is still room for
improvement, however, and the research work done in this area can be of utmost relevance
for the industry (Mohammed and Behrouz, 2018).
There are different types and classifications of machine learning models, provided by
different contributors. The most widely used review models are:
i. Decision trees:
Decision trees are a simple but powerful form of multiple variable analysis. They are
produced by algorithms that identify various ways of splitting data into branch-like segments.
Decision trees partition data into subsets based on categories of input variables, making it
possible to trace the path of decisions that leads to an outcome.
ii. Regression (linear and logistic)
Regression is one of the most popular methods in statistics. Regression analysis estimates
relationships among variables, finding key patterns in large and diverse data sets and showing
how the variables relate to each other.
iii. Neural networks:
Patterned after the operation of neurons in the human brain, neural networks (also called
artificial neural networks) are a variety of deep learning technologies. They’re typically used
to solve complex pattern recognition problems – and are incredibly useful for analyzing large
data sets. They are great at handling nonlinear relationships in data – and work well when
certain variables are unknown.
iv. Other classifiers:
Time Series Algorithms, Clustering Algorithms, Ensemble Models, Factor Analysis, Naïve
Bayes and Support Vector Machines. Each classifier approaches data in a different way;
therefore, for organizations to get the results they need, they need to choose the right
classifiers and models. Data scientists and IT experts are tasked with choosing the right
predictive models or building their own to meet the organization’s needs.
2.2 Review of Related Works
Several researchers have worked on curriculum vitae review models based on natural
language processing. Some of the works are documented as follows:
Juneja et al. (2016) used Natural Language Processing (NLP) and Machine Learning (ML) to
rank resumes according to given constraints. Their intelligent system ranks resumes of
any format according to the constraints or requirements provided by the client company.
The system takes the bulk of input resumes from the client company, which also provides
the requirements and constraints according to which the resumes should be ranked. Besides
the information provided by the resume, the system reads the candidates’ social profiles
(such as LinkedIn and GitHub), which give more genuine information about each candidate.
Amin et al. (2019) focused mainly on the design of a web application used to screen
resumes (curricula vitae) for a particular job posting. In the proposed system, the web
application encourages job applicants as well as recruiters to use it for job applications
and screening of resumes. Recruiters from various companies can post the details of the job
openings available in their respective companies. The interactive web application allows
job applicants to submit their resumes and apply for the job postings they are interested
in. The resumes submitted by the candidates are then compared with the job profile
requirements posted by the company recruiter using techniques such as machine learning and
Natural Language Processing (NLP). Scores can then be given to the resumes, and they can be
ranked from highest match to lowest match. This ranking is made visible only to the company
recruiter who is interested in selecting the best candidates from a large pool of candidates.
Rabih et al. (2021) presented a paper on curriculum vitae evaluation using a machine learning
approach. Its main role is to detect the eligibility of people who are applying for job
vacancies or higher education programs. The work aims at elaborating a system that
automates the preselection of eligibility and assessment of candidates in the higher education
students’ recruitment process. This system will replace the tedious tasks of manual
processing of CVs and will provide accurate and effective evaluation results. To achieve this
requirement, the system will be implemented using a machine learning approach with
different classification algorithms.
The limitation of this work is that the analysis scope is applied to the candidates who are
applying to pursue a Master’s degree only.
Lokesh et al. (2022) discussed the detailed design and related algorithms for a resume screener, to
decide whether a particular candidate is suitable for the applied role or not. Candidates apply
in large numbers for jobs on web portals by uploading their resumes. As a result, filtering
applicants for the appropriate position in an organization becomes a difficult task for
recruiters. Natural Language Processing (NLP) techniques are used to extract the relevant
information from the resume to save time and effort. Also, a Machine Learning (ML) model is
trained to check whether a candidate’s skills, experiences, and other aspects are suitable
for that particular role. In addition, the system recommends other available job roles
based on the candidate’s skillset. On analyzing the performance of the models, the authors
found that Logistic Regression performs best for this problem. They also found that a larger
dataset is required to make the model work more efficiently, and that more attributes can be
added to improve performance. Overall, the system performs well with the current resources.
As future work, the authors intended to improve the accuracy of the system by collecting
more resumes from organizations and training the model for all the available roles. In
addition, the candidate’s information from social networking sites like Facebook, Twitter
and LinkedIn could be analyzed to decide more accurately and authentically whether to offer
the job or not. Additionally, algorithms such as Naive Bayes, K-Nearest Neighbor, and C4.5
could be applied to check whether they improve the result.
Naga (2022) noted that selecting applicants for the appropriate job within a company is a
difficult task for recruiters. Key information is extracted from the CV using NLTK and
Natural Language Processing (NLP) techniques to save time and effort. The paper examines a
variety of machine learning models, such as KNN, SVM, logistic regression and MLP, to detect,
identify, and categorize diverse resumes. The author reports better accuracy and implements a
web interface to screen resumes and analyze the type of job related to each resume; MLP
outperforms the other approaches (KNN, SVM and Logistic Regression). Furthermore, the
system attempts to find the accuracy and performance of the proposed methodology and
incorporate it in the IT firms and other regulations for the prevention of manual screening and
establish a safe allocation of resources for the companies.
CHAPTER THREE
METHODOLOGY
3.1 Introduction
This section of the project discusses the methodological steps followed to develop the
automated CV review model using NLP. It also discusses the requirements and architecture
of the proposed model.
3.2 System Architecture
The system architecture of the proposed model is shown in Figure 3.1 below.
Figure 3.1: System Architecture of CV review model based on natural language
processing
3.3 Resume from Kaggle
This is a methodical process of collecting and analyzing data from a variety of sources to get a
complete and accurate picture of a subject. In this context, the dataset to be used would be
downloaded from an online repository,
https://www.kaggle.com/datasets/gauravduttakiit/resume-dataset. The dataset contains 962
samples and two variables (columns): the first variable is the category, which defines the
category of job the candidate is applying for, while the second variable is the CV, which
contains the content of the candidate’s CV.
3.4 Resume Preprocessing
The very first step of NLP projects is pre-processing the data. This is essential in preparing
the text data for the model building before using it for analysis or prediction. Some of the
preprocessing steps are:
i. Removing punctuation such as . , ! $ ( ) * % @ : Removing punctuation marks from the
dataset.
ii. Removing URLs: Removing any links from the dataset.
iii. Removing stop words: Stop words are the words in a stop list which are removed
before or after processing of natural language data because they are unimportant.
They can be a list of pronouns that do not add meaning to the dataset.
iv. Tokenization: Tokenization is used in natural language processing to split paragraphs
and sentences into smaller units that can be more easily assigned meaning.
v. Stemming or Lemmatization: Stemming and lemmatization are methods in NLP used
to analyze the meaning behind a word. Stemming uses the stem of the word, while
lemmatization uses the context in which the word is being used (Deepanshi, 2022).
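The preprocessing steps listed above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the stop-word list, the `strip_suffix` helper and the `preprocess` function are hypothetical names introduced here, and the crude suffix stripping merely stands in for a real stemmer or lemmatizer such as those provided by NLTK.

```python
import re
import string

# Small illustrative stop-word list (an assumption); a real run would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "to", "is", "i", "my", "with", "at", "for", "on"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Map every punctuation character to a space so adjacent words stay separated.
PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def strip_suffix(token):
    """Crude stand-in for stemming: drop a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Apply the listed steps to one resume string and return a list of tokens."""
    text = URL_PATTERN.sub(" ", text)        # ii. remove URLs before punctuation splits them
    text = text.translate(PUNCT_TABLE)       # i. remove punctuation
    tokens = text.lower().split()            # iv. tokenize on whitespace, lower-cased
    tokens = [t for t in tokens if t not in STOP_WORDS]  # iii. remove stop words
    return [strip_suffix(t) for t in tokens]  # v. crude stemming


if __name__ == "__main__":
    print(preprocess("Skilled in Python, testing & SQL! See https://example.com/cv"))
```

URLs are removed before punctuation so that a link is dropped as a whole instead of being broken into stray word fragments.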
3.5 NLP and ML Model Development
To build the NLP model, the process involves both dependency parsing and part-of-speech
tagging. Dependency parsing is used to find the relationships between the words in a
sentence. Finding the dependencies involves building a tree and assigning a single word as
the parent word; the main verb in the sentence functions as the root node. Part-of-speech
tagging labels each word as a verb, adverb, noun or adjective, which helps convey the meaning
of the words in a sentence. This process will be
implemented using an NLP library called NLTK (Natural Language Toolkit), which is a suite
of libraries and programs for symbolic and statistical natural language processing for English
written in the Python programming language. The output of the NLP
process will be the input, in the form of a dataset, to the classification model to be developed.
This model will classify CVs based on the applicant’s skillset to determine if the applicant
will be selected or not. The algorithm to be used is a multi-class KNN algorithm.
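To make the idea of a multi-class KNN classifier over text concrete, the following is a toy sketch and not the model built in this study. It assumes each CV has already been reduced to a list of tokens, represents a CV as a bag-of-words count vector, and uses cosine similarity as the closeness measure; the training examples, the category names and the function names are all made up for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, query_tokens, k=3):
    """Return the majority label among the k training CVs most similar to the query.

    train is a list of (tokens, label) pairs.
    """
    query = Counter(query_tokens)
    scored = sorted(
        ((cosine_similarity(Counter(tokens), query), label) for tokens, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, hand-made training data for illustration only.
TRAIN = [
    (["python", "pandas", "machine", "learning"], "Data Science"),
    (["python", "statistics", "model"], "Data Science"),
    (["java", "spring", "backend"], "Java Developer"),
    (["java", "hibernate", "sql"], "Java Developer"),
    (["recruitment", "payroll", "onboarding"], "HR"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, ["python", "machine", "learning", "sql"]))
```

With k = 3, the query is assigned the category held by most of its three nearest neighbours, which is how KNN handles more than two classes without any change to the algorithm.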
3.6 Model Evaluation
Model Evaluation involves using performance metrics techniques to measure the
performance of a model. Some of the metrics used are highlighted below.
i. Accuracy
Accuracy is the ratio of correct predictions to the total number of predictions. It is one of the
simplest measures of a model. We must aim for high accuracy for our model. If a model has
high accuracy, we can infer that the model makes correct predictions most of the time.
$$\text{Accuracy} = \frac{\text{CorrectPrediction}}{\text{CorrectPrediction} + \text{IncorrectPrediction}}$$
$$\text{Accuracy} = \frac{\text{TruePositive} + \text{TrueNegative}}{\text{TruePositive} + \text{FalsePositive} + \text{TrueNegative} + \text{FalseNegative}}$$
ii. Mean Score (F1 Score)
The F1 score depends on both Recall and Precision; it is the harmonic mean of the two
values.
$$\text{Mean Score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$$
iii. Precision
Precision is the ratio of true positives to the total number of predicted positives. For a given
job category, a true positive is a CV correctly classified into that category.
$$\text{Precision} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}}$$
iv. Recall
Recall is the ratio of true positives to all the actual positives in the ground truth.
$$\text{Recall} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}}$$
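The four metrics above can be computed directly from the true and predicted labels. The sketch below is an illustration under assumptions: the function name `classification_metrics` and the example labels are hypothetical, and because the task is multi-class, one category at a time is treated as the "positive" class (a one-vs-rest view).

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1 for one category treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels for illustration only.
    y_true = ["HR", "HR", "Java", "Java", "HR", "Java"]
    y_pred = ["HR", "Java", "Java", "Java", "HR", "HR"]
    print(classification_metrics(y_true, y_pred, positive="HR"))
```

In this made-up example there are 2 true positives, 1 false positive, 1 false negative and 2 true negatives for the "HR" category, so all four metrics come out as 2/3.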
3.7 System flowchart
The system flowchart of CV review is presented in Figure 3.2
Figure 3.2: System Flowchart of CV’s Review
CHAPTER FOUR
SYSTEM IMPLEMENTATION, RESULTS AND DISCUSSION
4.1 Introduction
This chapter describes the implementation stages for the development of the Natural Language
Processing model for reviewing CVs. The modeling is done using a dataset retrieved from
Kaggle. The Python programming language is used for developing the model.
4.2 Implementation Requirements
The model development requirements are divided into two main parts: software and hardware
requirements. The candidates’ resume dataset forms a major component of the model
building.
a. Software Requirements
i. Python 3.9.0
ii. Jupyter Notebook
b. Hardware Requirements
i. DELL 7400, 8 GHz processor, 16GB RAM, 64-bit OS
4.2.1 Programming Language
Python is a high-level, interpreted, interactive and object-oriented scripting language. Python
is designed to be highly readable. It uses English keywords frequently whereas other
languages use punctuation, and it has fewer syntactic constructions than other languages.
i. Python is Interpreted: Python is processed at runtime by the interpreter. You do not
need to compile your program before executing it. This is similar to PERL and PHP.
ii. Python is Interactive: You can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.
iii. Python is Object-Oriented: Python supports Object-Oriented style or technique of
programming that encapsulates code within objects.
iv. Python is a Beginner's Language: Python is a great language for the beginner-level
programmers and supports the development of a wide range of applications from
simple text processing to WWW browsers to games.
4.2.2 Justification for Choice of Programming Language
Machine learning is considered to be a trending technology of the future. Already there are
a number of applications built on it, and many companies and researchers are taking
interest in it. The main question that arises is which programming language machine
learning applications can be developed in. There are various programming languages, such as
Lisp, Prolog, C++, Java and Python, which can be used for developing machine learning
applications. Among them, the Python programming language has gained huge popularity, and
the reasons are as follows:
i. Simple syntax and less coding
Python involves less coding and simpler syntax than other programming languages
that can be used for developing machine learning models. Due to this feature, testing
can be easier and we can focus more on programming.
ii. Libraries for Machine Learning projects
A major advantage of using Python for machine learning is its rich ecosystem of
libraries. Python has libraries for almost all kinds of machine learning projects. For example,
NumPy and SciPy are some of the important libraries available for Python.
iii. Open source: Python is an open-source programming language. This makes it widely
popular in the community.
4.3 Implementation phases of the model
The implementation phase of this study covers the modeling of the KNN classifier used to
classify CVs by job category, which is carried out using Python.
4.3.1 Python Machine Learning Model
In order to reduce the complexity of the structure, the full dataset is loaded and then
divided using the train_test_split method.
Importing the necessary libraries
In this project, we aim to classify CVs by job category using a KNN model. We will be using
the sklearn library in Python to implement the model and evaluate its performance.
Some other important libraries imported include the following:
i. numpy: a library for numerical computing in Python that provides support for arrays,
matrices, and mathematical functions.
ii. pandas: a library for data manipulation and analysis in Python that provides support
for reading and writing data in various file formats and performing operations such as
filtering, grouping, and merging.
iii. scikit-learn: a library for machine learning in Python that provides support for a wide
range of algorithms, including the KNN algorithm used in this study.
iv. matplotlib: a library for data visualization in Python that provides support for creating
various types of plots, charts, and graphs.
Figure 4.1: Importing libraries
Previewing the dataset
The dataset has two variables, a category, which is the area the applicant is applying for and
the resume of the application.
Figure 4.2: Dataset Preview
Previewing all the categories in the dataset
Figure 4.3: Categories of job available and number of applicants
Figure 4.4: Categories of job available
Figure 4.5: Job category distribution in percentage
4.3.2 Data Cleaning
This process involves scanning through the dataset to identify and remove errors from the
dataset. It involves removing special symbols and characters, such as hashtags. The idea is to
keep only English-like words in the dataset.
Figure 4.6: Cleaning the dataset
Creating a new column for cleaned dataset
After properly cleaning the dataset, a new column was created to have the cleaned dataset.
Figure 4.7: Creating a new dataset
Tokenizing the cleaned dataset
This process involves breaking down the cleaned resumes into single words and converting
them to lower case.
Figure 4.8: Tokenizing the cleaned dataset
Generating a classification class target
This process involves converting the classification category into integers, instead of
addressing it with the string name.
Figure 4.9: Generating a classification class target
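The conversion of category names into integer class targets can be illustrated with a small sketch. The code in Figure 4.9 is not reproduced here; the `encode_labels` function below is a hypothetical stand-in that assigns integers to the sorted unique category names, and the category names are made-up examples.

```python
def encode_labels(categories):
    """Map each category name to an integer code, using sorted unique names."""
    classes = sorted(set(categories))
    mapping = {name: idx for idx, name in enumerate(classes)}
    codes = [mapping[name] for name in categories]
    return codes, mapping


if __name__ == "__main__":
    codes, mapping = encode_labels(["HR", "Data Science", "HR", "Testing"])
    print(mapping)
    print(codes)
```

Keeping the mapping alongside the codes makes it possible to translate a predicted integer back into its job category name.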
4.3.3 Building the model
The model building process involves splitting the dataset into training and testing sets: 20% of
the entire dataset is used for testing and 80% for training.
KNN algorithm is used to build the classification model.
Figure 4.10: Building the model
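The 80/20 split described above can be sketched with the standard library alone. This is a simplified stand-in for a library routine such as scikit-learn's train_test_split, not the code shown in Figure 4.10; the function name `split_dataset` and the seed value are assumptions made for the illustration.

```python
import random


def split_dataset(samples, test_fraction=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out test_fraction of the samples."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(samples) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test


if __name__ == "__main__":
    # 962 placeholder items, matching the number of samples in the dataset.
    train, test = split_dataset(list(range(962)))
    print(len(train), len(test))
```

For the 962 samples in the dataset, a 20% hold-out corresponds to 192 test samples and 770 training samples. A fixed seed keeps the split reproducible between runs.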
After successfully building the model, the validation accuracy is 96%, signifying a
satisfactory result.
4.3.4 Result
CHAPTER FIVE
SUMMARY, CONCLUSION AND RECOMMENDATIONS
5.1 Summary
The submission of either a résumé or curriculum vitae (CV) is essential for individuals
seeking temporary employment for experience or a permanent position within an
organization. These documents serve as crucial marketing tools, portraying the applicant's
strengths, skills, and experiences to potential employers. A well-crafted résumé or CV
provides an overview of the applicant's identity, educational and professional background,
and relevant achievements, with the primary goal of capturing the attention of prospective
employers for interview consideration. The high volume of job applications received by
companies daily poses a significant challenge for recruiters in manually identifying top
candidates. The manual review of hundreds of resumes is time-consuming, and a
considerable percentage of submitted resumes often lack the essential qualifications for the
advertised positions. To address these challenges, some companies resort to employing
external recruiters, adding to the overall recruitment costs.
Recognizing the need for an efficient and cost-effective solution, this study adopts machine
learning, specifically natural language processing (NLP). NLP, which mimics human
language comprehension, is employed to automate the screening of resumes. The objective is
to enable computers to understand and process spoken and written language, combining rule-
based computational linguistics with statistical, machine learning, and deep learning models.
The recruitment industry, valued at $200 billion, grapples with the task of selecting the best
candidates from a vast pool of applicants possessing the required skills. The study
emphasizes the growing job market in India, leading to millions of new job seekers entering
the workforce annually. While CV/Resume data formats are not entirely unstructured, their
lack of standardized rules for writing makes them challenging to accept in a uniform format.
The integration of machine learning and NLP offers a solution to analyze and interpret
unstructured data from resumes, facilitating information extraction. The proposed model aims
to transition from labor-intensive human resume processing to a rapid and cost-effective
automated system.
The identified objectives for the study include developing a model capable of accepting and
reviewing CVs using NLP technology, followed by testing and evaluating the model's
effectiveness. The anticipated outcomes involve the shift towards a more efficient and
affordable software-driven resume review process.
5.2 Conclusion
In conclusion, this study shows the critical role of résumés and curriculum vitae (CVs) as
indispensable marketing tools for individuals seeking employment. The documents serve as
comprehensive self-portraits, presenting an applicant's strengths, skills, and experiences to
potential employers. The ultimate objective is to capture the attention of recruiters and secure
interview opportunities. The challenges faced by recruiters in manually reviewing a vast
number of resumes, particularly in the context of an insufficient number of job openings,
necessitate a paradigm shift in the recruitment process. The study recognizes the limitations
of traditional methods and advocates for the adoption of machine learning, specifically
natural language processing (NLP), as a transformative solution. The application of NLP in
screening resumes offers the potential to automate and expedite the review process. By
enabling computers to comprehend and process human language, NLP serves as a bridge
between rule-based computational linguistics and advanced statistical, machine learning, and
deep learning models. The study emphasizes the significance of leveraging technology to
address the challenges posed by the sheer volume of job applications received by companies.
The proposed machine learning model, as outlined in the study, aims to revolutionize the
conventional approach to resume review. The model's objectives include the development of
a system that utilizes NLP technology for efficient CV/Resume review and subsequent testing
and evaluation. This transition from labor-intensive human processing to a swift and cost-
effective software-driven approach marks a significant advancement in the recruitment
landscape. In essence, the integration of machine learning and NLP offers a promising
solution to the challenges faced by recruiters in handling large volumes of resumes. As the
job market continues to evolve, the adoption of technology-driven models becomes
imperative for enhancing efficiency, reducing costs, and ensuring the identification of the
most qualified candidates. This study contributes to the ongoing dialogue on optimizing the
recruitment process through innovative and intelligent solutions.
5.3 Contribution to Knowledge
The proposed machine learning model is expected to contribute to knowledge by providing
the required tool to extract the crucial data from a CV, find relevant qualifications from a
variety of CVs and learn pertinent information from them.
5.4 Recommendations
REFERENCES
Amin, S., Jayakar, N., Sunny, S., Babu, P., Kiruthika, M., & Gurjar, A. (2019, January 1). Web Application for
Screening Resume. 2019 International Conference on Nascent Technologies in Engineering, ICNTE
2019 - Proceedings. https://doi.org/10.1109/ICNTE44896.2019.8945869
Anggakusuma, J., Mawardi, V. C., & Lauro, M. D. (2020). Resume extraction with conditional
random field method. IOP Conference Series: Materials Science and Engineering, 1007(1).
https://doi.org/10.1088/1757-899X/1007/1/012154
Barrak, A., Adams, B., & Zouaq, A. (2022). Toward a traceable, explainable, and fair JD/Resume
recommendation system. http://arxiv.org/abs/2202.08960
Rabih, H.E. & Mercier, L. (2021). Curriculum Vitae Evaluation using Machine Learning
Approach. Artificial Intelligence for Knowledge Management, IFIP AICT 614, 2021.
Bhushan Kinge, Shrinivas Mandhare, Pranali Chavan, & S. M. Chaware. (2022). Resume
Screening using Machine Learning and NLP: A proposed system. International Journal of
Scientific Research in Computer Science, Engineering and Information Technology, 253–
258. https://doi.org/10.32628/cseit228240
Dimopoulos, A. (2019). Comparative Effect of Candidates’ Physical Attractiveness between
Resume Screening and Interview Process Outcomes. Empirical Research for Greece.
International Journal of Human Resource Studies, 9(3), 230.
https://doi.org/10.5296/ijhrs.v9i3.15226
Lokesh. S, Balaje. S, M., Prathish. E, & B. Bharathi. (2022). Resume Screening and
Recommendation System using Machine Learning Approaches. Computer Science &
Engineering: An International Journal, 12(1), 1–7. https://doi.org/10.5121/cseij.2022.12101
Naga, M. (2022). RESUME SCREENING USING MACHINE LEARNING.
www.jespublication.com
Naik, R. S., Dhotre, S. R., & Professor, A. (2022). RESUME RECOMMENDATION USING
MACHINE LEARNING (Vol. 10, Issue 7). www.ijcrt.org
Almada, R. V., Elias, O. M., Gómez, C. E., Mendoza, M. D., López, S. G., Natural
Language Processing and Text Mining to Identify Knowledge Profiles for Software
Engineering Positions, 5th International Conference in Software Engineering
Research and Innovation (CONISOFT), 2017.
Gao, L., Eldin, N., Employer’s expectations: A probabilistic text mining model, Creative
Construction Conference 2018, CC2014.
Karakatsanis, I., AlKhader, W., MacCrory, F., Alibasic, A., Omar, M. A., Aung, Z., Woon,
W. L., Data Mining Approach to Monitoring The Requirements of the Job Market: A
Case Study. Electrical Engineering and Computer Science, Masdar Institute of
Science and Technology, Abu Dhabi, United Arab Emirates, 2018.
Kinoa, Y., Kurokia, H., Machidab, T., Furuyab, N., Takanob, K., “Text Analysis for Job
Matching Quality Improvement,” International Conference on Knowledge Based and
Intelligent Information and Engineering Systems, 2017.
Melo-Acosta, German E., et al. “Fraud Detection in Big Data Using Supervised and Semi-
Supervised Learning Techniques.” 2017 IEEE Colombian Conference on
Communications and Computing (COLCOM), 2017,
doi:10.1109/colcomcon.2017.8088206.
Mohammed, Emad, and Behrouz Far. “Supervised Machine Learning Algorithms for Credit
Card Fraudulent Transaction Detection: A Comparative Study.” IEEE Annals of the
History of Computing, IEEE, 1 July 2018,
doi.ieeecomputersociety.org/10.1109/IRI.2018.00025.
Riza, F. Rajah V, and Sharadadevi, V. (2021) “Resume Classification and Ranking
using KNN and Cosine Similarity” In 2021 International Journal of Engineering.
Juneja, A., Momin, A. (2016) Resume Ranking using NLP and Machine Learning.