
DEVELOPMENT OF A CURRICULUM VITAE REVIEW MODEL BASED ON

NATURAL LANGUAGE PROCESSING

BY

ANTHONY OLUWATOBI EMMANUEL

180404027

A PROJECT WORK SUBMITTED TO THE DEPARTMENT OF COMPUTER

SCIENCE, FACULTY OF SCIENCE, ADEKUNLE AJASIN UNIVERSITY

AKUNGBA-AKOKO, ONDO STATE.

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF
BACHELOR OF SCIENCE (B.Sc.) IN COMPUTER SCIENCE, ADEKUNLE AJASIN
UNIVERSITY, AKUNGBA-AKOKO (AAUA)

DECEMBER, 2023.

CERTIFICATION
This is to certify that this project work was carried out by ANTHONY OLUWATOBILOBA EMMANUEL, with matric number 180404027, in the Department of Computer Science, Faculty of Science, Adekunle Ajasin University, Akungba-Akoko, Ondo State.

……………………….. ……………………...
Dr. D. A. Akinwumi Date
Supervisor

……………………….. ..…………………
Dr. F.O. Aranuwa Date
Head of Department

DEDICATION

This project work is dedicated to the Rock of Ages for His love, blessings, unmerited favours
and protection, now and throughout my days on campus as an undergraduate.

ACKNOWLEDGEMENTS

First and foremost, my utmost gratitude goes to the Omnipresent God for being there for me
before and after this research work. May Your name be glorified now and forevermore. I
am heartily thankful to my supervisor, Dr. D.A. Akinwumi, whose encouragement, guidance,
fatherly love and support from the initial to the final stage enabled me to develop an
understanding of the subject.

Let me seize this wonderful opportunity to appreciate the Head of the Department of Computer
Science, in the person of Dr. F.O. Aranuwa, for his moral and academic support. I would like
to express my deepest appreciation to the former Head of Department, Dr. Olusola Ajayi, and
to all lecturers in the Department of Computer Science for their unselfish and unfailing
support in every way.

It is a pleasure to thank my parents, Mr and Mrs Anthony, for all they have done to help me
pursue my career; I love you so much. To my able siblings, thank you all for being there for
me.

In any academic endeavor, many people and institutions will be involved in the success story.
I sincerely and humbly express my appreciation and gratitude to Afunibiowo Adejoke and
also to my course mates and everybody who really impacted my life positively during the
course of my study. God bless you all.

ABSTRACT

The development of a Curriculum Vitae (CV) review model based on Natural Language
Processing (NLP) represents an innovative approach to enhance the efficiency and objectivity
of the hiring process. This research focuses on leveraging NLP techniques to analyze and
assess CVs, extracting relevant information and evaluating candidates based on predefined
criteria. The model aims to automate the initial screening phase, reducing human bias and
expediting the selection of qualified candidates. Through the integration of advanced NLP
algorithms, the model interprets and comprehends the content of CVs, considering both
structured and unstructured data. The research explores the feasibility, accuracy, and
effectiveness of this NLP-based CV review model, emphasizing its potential to revolutionize
the recruitment process by improving precision, saving time, and promoting fair evaluations.
The findings contribute to the intersection of natural language processing and human
resources, offering a valuable tool for organizations seeking to optimize their hiring
procedures.

TABLE OF CONTENTS
TITLE PAGE
CERTIFICATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
CHAPTER ONE
INTRODUCTION
1.1 Background of the Study
1.2 Statement of the Problem
1.3 Aim and Objectives
1.3.1 Aim
1.3.2 Objectives
1.4 Methodology
1.5 Scope of the Study
1.6 Expected Contribution to Knowledge
1.7 Definition of terms
CHAPTER TWO
LITERATURE REVIEW
2.0 Introduction
2.1 Research Concepts
2.1.1 Curriculum Vitae (CV)
2.1.2 How a Curriculum Vitae (CV) Works
2.1.3 Differences between Curriculum Vitae (CV) and Resume
2.1.4 Different types of CV
2.1.5 Benefits of CV
2.1.6 An Overview of NLP
2.1.7 Using Machine Learning for Analysis and Classification
2.2 Review of related works
CHAPTER THREE
METHODOLOGY
3.1 Introduction
3.2 System Architecture
3.3 Resume from Kaggle
3.4 Resume Preprocessing
3.5 NLP and ML Model Development
3.6 Model Evaluation
3.7 System flowchart
CHAPTER FOUR
4.1 Introduction
4.2 Implementation Requirements
4.2.1 Programming Language
4.2.2 Justification for Choice of Programming Language
4.3 Implementation phases of the model
4.3.1 Python Machine Learning Model
4.3.2 Data Cleaning
4.3.3 Building the model
4.3.4 Result
CHAPTER FIVE
5.1 Summary
5.2 Conclusion
REFERENCES
CHAPTER ONE

INTRODUCTION

1.1 Background of the Study

Whether seeking temporary employment to gain experience or a permanent position with an organization, an applicant must submit one of two documents with their application: a résumé or a curriculum vitae (CV). These documents serve as important marketing tools: they give a self-portrait, or advertisement, of the applicant and present their relative strengths, skills and experiences to a potential employer (Bhushan Kinge et al., 2022). An effective resume or CV gives an employer an overview of who the applicant is as a student or young professional, what they know and can do in relation to the position of interest, and what relevant skills, traits and accomplishments they have achieved at this point in their education or career. The objective of a resume or CV is therefore to catch the eye of a prospective employer and secure an interview (Anggakusuma et al., 2020).

Every day, companies with job openings receive large numbers of applications by email, and a single position can attract thousands of applicants. Selecting the top candidates from such a large pool is challenging, and it is exceedingly difficult for recruiters to go through hundreds of resumes manually to locate the best candidates for the post. About 75% of the resumes sent to an organization in response to a job posting do not demonstrate the pertinent abilities needed for the position. Reviewing each applicant's CV is therefore cumbersome, and as a result many companies have turned to external recruiters (Amin et al., 2019).

The recruitment industry is worth $200 billion and is concerned with selecting, from an enormous pool of applicants, the candidates who have the skills required for a given job description. Large numbers of applicants send their resumes to an organization in response to any job opening, and screening those resumes is a core part of every recruiter's work (Bersin, 2017). Because employing an external recruiter can be expensive, there has long been a search for an automated process in which employers can quickly select eligible candidates and applicants can present themselves through a single application format that works across several organizations. Automating the screening process is a tried-and-true way to meet this need, and this study adopts machine learning for the task.

In machine learning, a model is trained on a dataset so that it can predict the intended result for new data. Natural language processing (NLP), which deals with the language humans use to communicate with one another, is the main technique used to screen the resumes. The goal of NLP is to enable computers to comprehend spoken and written language in a manner similar to that of humans. NLP combines computational linguistics (rule-based modeling of human language) with statistical, machine learning and deep learning models. Together, these technologies allow computers to process human language, in the form of text or voice data, and to 'understand' its full meaning. As the job market grows in India, millions of new job seekers join the workforce every year, according to LinkedIn (Suhua et al., 2020).

Although the data in a CV or resume is not entirely unstructured, it is still difficult to process in a standardized way because there is no fixed set of rules for writing a CV or resume. Machine learning and NLP make it possible to analyze written documents such as resumes, interpret their unstructured content, extract relevant information from them and teach the computer to perform this task.
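The extraction step described above can be illustrated with a minimal sketch. It assumes the resume's plain text has already been pulled out of the PDF or DOCX file, and it uses simple regular expressions and a fixed skill list rather than a trained NLP model. The patterns, the `SKILLS` set and the function name `extract_fields` are hypothetical and are not taken from the project's code.

```python
import re

# Hypothetical patterns for illustration only; real resumes need more robust rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Optional "+", then at least 10 digits/spaces/hyphens starting and ending with a
# digit; newlines are excluded so numbers on different lines are not joined.
PHONE_RE = re.compile(r"\+?\d[\d -]{8,}\d")

# Hypothetical skill vocabulary; in practice it could come from the job description.
SKILLS = {"python", "sql", "machine learning", "excel", "java"}


def extract_fields(text):
    """Pull a few structured fields out of free-form resume text."""
    lowered = text.lower()
    emails = EMAIL_RE.findall(text)
    phones = PHONE_RE.findall(text)
    # Keep only skills that appear as whole words or phrases.
    skills = sorted(
        s for s in SKILLS if re.search(r"\b" + re.escape(s) + r"\b", lowered)
    )
    return {"emails": emails, "phones": phones, "skills": skills}


if __name__ == "__main__":
    sample = (
        "John Doe\n"
        "Email: john.doe@example.com | Phone: +234 803 123 4567\n"
        "Skills: Python, SQL and Machine Learning"
    )
    print(extract_fields(sample))
```

Hand-written rules like these break easily on unusual layouts, which is one reason learned NLP models are preferred for free-form CVs.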

1.2 Statement of the Problem

The number of available job openings is far smaller than the staggering number of applications and resumes that companies receive. Each applicant is unique, as people with different experiences apply for the same jobs. Staff in the human resources division may have to review hundreds to thousands of resumes to find the best fit for a single job opening. When companies hire in bulk, finding the talent they need among so many applications requires considerable resources and time. On average, an HR executive takes about 10 to 15 minutes to review each résumé, put a summary together and add the data to the database: executives condense the résumé, enter the applicant's contact information into their database and then call the applicant for an interview. With machine learning, NLP algorithms can instead rank the resumes and surface those that best fit the job role.
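The text does not specify how the top resumes are ranked. One common approach, assumed here for illustration, is to represent the job description and each resume as TF-IDF vectors and sort the resumes by cosine similarity to the job description. The names `rank_resumes` and `tfidf_vectors` and the sample data are hypothetical; only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using TF-IDF weighting."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_resumes(job_description, resumes, top_k=3):
    """Return the top_k (name, score) pairs, best match first."""
    names = list(resumes)
    vectors = tfidf_vectors([job_description] + [resumes[n] for n in names])
    job_vec = vectors[0]
    scored = [(name, cosine(job_vec, vec)) for name, vec in zip(names, vectors[1:])]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    job = "python developer with machine learning and sql experience"
    candidates = {
        "ada": "python machine learning engineer, sql",
        "bola": "accountant with excel and auditing",
        "chi": "java developer",
    }
    for name, score in rank_resumes(job, candidates, top_k=2):
        print(f"{name}: {score:.2f}")
```

The idf term `ln((1 + n) / (1 + df)) + 1` is the same smoothing that scikit-learn's `TfidfVectorizer` applies by default. In the sample, "bola" gets a non-zero score only because it shares "with" and "and" with the job description, which illustrates why stop-word removal is usually applied before scoring.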

The model will also support the switch from labor-intensive manual resume processing to fast and affordable software. The following aim and objectives are identified as necessary to meet this goal.

1.3 Aim and Objectives

1.3.1 Aim
The aim of this project is to develop a machine learning model that automates the extraction of required information from candidates' resumes, without manually going through every submitted resume. To achieve this aim, the following objectives are considered.

1.3.2 Objectives
The objectives of this project are to:

a. develop a model that can accept a new CV/resume for review using Natural Language Processing (NLP) technology; and

b. test and evaluate the model.

1.4 Methodology

The methodology for the development of a curriculum vitae review model based on natural

language processing is depicted in Figure 1.1 below.

Figure 1.1: Architecture of the Proposed Model (Information Gathering → Building Model → Testing Model → Deployment)

To develop the model, the following methodology is adopted:

i. Information gathering: This involves collecting the necessary tools, reviewing related works and identifying the most effective tools and techniques for achieving the project aim.

ii. Building the model: This involves writing the code to achieve the project aim. The programming language chosen is Python, the development environment is Anaconda and the approach applied is NLP.

iii. Testing the model: This involves creating and structuring a format that the model follows when reviewing a document. The new CV to be reviewed is supplied, and the model produces a similarity score of up to 100%. The higher the similarity score, the more closely the CV matches the required format.

iv. Deployment: The model can be packaged, and a manual will be provided on how to use it.
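The similarity score described in item iii can be sketched as the cosine similarity between the word-count vectors of the company's reference CV (the "format") and the new CV, multiplied by 100. This is an assumption about how the score is computed, since the text does not name the measure; `similarity_percent` and the sample strings are hypothetical. A widely used scikit-learn equivalent combines `CountVectorizer` with `sklearn.metrics.pairwise.cosine_similarity`.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Bag-of-words term counts for a lower-cased document."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def similarity_percent(reference_cv, new_cv):
    """Cosine similarity of two term-count vectors, scaled to 0-100."""
    a, b = term_counts(reference_cv), term_counts(new_cv)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    # Counts are non-negative, so the cosine lies in [0, 1].
    return round(100 * dot / norm, 2) if norm else 0.0


if __name__ == "__main__":
    # Hypothetical company "format" and a hypothetical incoming CV.
    reference = "education computer science skills python sql experience data analysis"
    candidate = "B.Sc computer science; skills: python, excel; experience: data entry"
    print(f"Match: {similarity_percent(reference, candidate)}%")
```

Because word counts are never negative, the score ranges from 0% (no shared words) to 100% (identical word proportions).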

1.5 Scope of the Study

The difficulty of extracting relevant information from a resume in an organized fashion can be overcome with the aid of a purpose-built system. This study aims to develop a machine learning model that helps automate the process by implementing an NLP algorithm capable of comparing a submitted resume with the format and information required by a company. The model covers only supplying a format for the resume and comparing the similarity of a new resume to the company's format.
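Comparing a resume against a company-supplied format can begin with a simple structural check: does the resume contain the sections the company expects? The sketch below assumes a hypothetical list of section headings and treats a section as present when a line of the resume begins with that heading word. `check_format` and `EXPECTED_SECTIONS` are illustrative names, not part of the original work.

```python
# Hypothetical company format: section headings the company expects to see.
EXPECTED_SECTIONS = ("contact", "education", "experience", "skills", "references")


def check_format(resume_text, expected=EXPECTED_SECTIONS):
    """Report which expected section headings start a line in the resume."""
    lines = [line.strip().lower() for line in resume_text.splitlines()]
    found = [s for s in expected if any(line.startswith(s) for line in lines)]
    missing = [s for s in expected if s not in found]
    coverage = 100 * len(found) / len(expected) if expected else 0.0
    return {"found": found, "missing": missing, "coverage": coverage}


if __name__ == "__main__":
    resume = "Contact\nEducation: B.Sc Computer Science\nSkills\nPython, SQL"
    result = check_format(resume)
    print(result["missing"], f"{result['coverage']:.0f}% of sections present")
```

Matching only at the start of a line misses variants such as "Work Experience", so a fuller system would need heading synonyms or a learned classifier.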

1.6 Definition of terms

i. NLP: Natural language processing is an interdisciplinary subfield of linguistics,

computer science, and artificial intelligence concerned with the interactions between

computers and human language, in particular how to program computers to process

and analyze large amounts of natural language data.

ii. CV: A CV (curriculum vitae), a term often used interchangeably with resume, is a document used when applying for jobs. It allows applicants to summarise their education, skills and experience, enabling them to sell their abilities to potential employers.

iii. Machine learning (ML): Machine learning is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.

CHAPTER TWO

LITERATURE REVIEW

2.0 Introduction

Hiring the right talent is a challenge for all businesses. Manually screening a large number of resumes or CVs takes at least a day. If a recruiter finds 4 to 6 suitable resumes early in the initial pass, chances are that the remaining submissions will not be considered, which reduces the likelihood that a strong resume is shortlisted. Going through each resume is time-consuming, and manually organizing and managing a large number of resumes is challenging. Some prejudice is also normal wherever humans are involved (Naik et al., 2022).

This challenge is magnified by a high volume of applicants when a business is labor-intensive, growing and facing high attrition rates. IT departments in growing markets are an example of such a business. In a typical service organization, professionals with a variety of technical skills and business domain expertise are hired and assigned to projects to resolve customer issues (Barrak et al., 2022). The task of selecting the best talent among many candidates is known as resume screening. Large companies typically do not have enough time to open each CV, so they use machine learning algorithms for the resume screening task, and the resulting efficiency in hiring can also help reduce unemployment. Machine learning is a field that involves training a model with data to anticipate the intended outcome when new data is submitted. Natural language processing (NLP) is commonly used to screen resumes; natural language refers to the way humans communicate with one another (Riza et al., 2021).

With NLP, a system can interpret text against the vocabulary of a language, such as an English dictionary, in much the same way as humans do. NLP combines statistical, machine learning and deep learning models of human language with rule-based modeling from computational linguistics. Such a system must handle data in different formats, whether documents or audio, and understand its whole meaning (Dimopoulos, 2019). The number of applications runs into the millions, which makes sorting through them a time-consuming chore. A machine learning algorithm is therefore needed that can understand the applications and match them against the requirements of the industry. The proposed system takes as input a CSV file containing resumes and their categories; based on the category and the features of each resume, the accuracy and performance of different machine learning classifiers are calculated.
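The CSV-based evaluation described above can be sketched as follows. The sketch assumes a CSV with a `Category` column and a `Resume` column (hypothetical names and rows, held inline so the example runs on its own), trains a simple nearest-centroid classifier on word counts and reports accuracy on held-out rows. Nearest-centroid is chosen here only because it is short; the proposed system compares several classifiers, and any of them could be scored with the same `accuracy` loop.

```python
import csv
import io
import math
import re
from collections import Counter, defaultdict

# Tiny inline stand-in for the CSV file. The column names "Category" and
# "Resume" and all rows are hypothetical examples, not the project's data.
SAMPLE_CSV = """Category,Resume
Data Science,python pandas machine learning statistics
Data Science,machine learning python deep learning models
HR,recruitment payroll employee relations
HR,employee onboarding recruitment interviews
Data Science,statistics python regression analysis
HR,payroll benefits employee training
"""


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def load_rows(csv_text):
    """Read (category, resume_text) pairs from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["Category"], row["Resume"]) for row in reader]


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(rows):
    """Nearest-centroid 'training': sum the term counts of each category."""
    centroids = defaultdict(Counter)
    for category, text in rows:
        centroids[category].update(tokenize(text))
    return dict(centroids)


def predict(centroids, text):
    """Pick the category whose centroid is most similar to the text."""
    vec = Counter(tokenize(text))
    return max(centroids, key=lambda c: cosine(centroids[c], vec))


def accuracy(centroids, rows):
    """Fraction of rows whose predicted category matches the label."""
    correct = sum(predict(centroids, text) == category for category, text in rows)
    return correct / len(rows)


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    train, test = rows[:4], rows[4:]  # simple hold-out split
    model = train_centroids(train)
    print(f"accuracy on held-out rows: {accuracy(model, test):.2f}")
```

Summing the counts instead of averaging them does not change the prediction, because cosine similarity ignores vector length.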

In the study Employers' Expectations: A Probabilistic Text Mining Model (Gao and Eldin, 2018), more than 20,000 job advertisements from various websites were processed, and text mining was applied to identify the information skills sought in web pages from the construction industry sector. In the research titled Text Analysis for Job Matching Quality Improvement (Kinoa et al., 2017), data including travel time, job location, job type, rates and candidate skill sets were analyzed, and applying keywords in a machine learning process with text mining tools led to the discovery of effective keywords for a job matching system. In the research titled Natural Language Processing and Text Mining to Identify Knowledge Profiles for Software Engineering Positions (Almada et al., 2017), NLP and text mining (TM) were applied to the unstructured text of resumes and job offers to identify the knowledge profiles for software engineering positions.

The research titled Data Mining Approach to Monitoring the Requirements of the Job Market: A Case Study (Karakatsanis et al., 2018) presents an approach based on data mining to identify the most in-demand occupations in the modern labor market. To achieve this, the authors use a latent semantic indexing model that matches job announcements extracted from the web with occupation descriptions in a database.

2.1 Research Concepts

This section presents the concepts underlying this research, beginning with the curriculum vitae.

2.1.1 Curriculum Vitae (CV)

A curriculum vitae works in much the same way as a resume, providing information about an individual's educational and work history. Often called a CV for short, it is much more comprehensive than a typical resume and can be much longer. There is no limit to how long a CV can be, but it must be focused on academic and professional experience; a lengthy CV is no better than a short one if it contains fluff or irrelevant data. A job applicant seeking an academic position, such as a teaching appointment at a college or university or a research position, should always use a CV. If you are unsure whether a prospective employer wants a resume or a CV, use the job announcement as a guide; it will usually state which document the institution wants (Karakatsanis et al., 2018).

2.1.2 How a Curriculum Vitae (CV) Works

A CV begins with your contact information, including your name, address, telephone number and email address. You should also indicate your area or areas of academic interest. Your CV should include a comprehensive account of your academic history, including the title of your dissertation or thesis. It must also contain details of all publications, research projects and presentations to which you have contributed. You should also list any grants, academic awards and other related honors you have received. The employment and experience section of your CV should contain teaching and research positions, both paid and unpaid; in addition to jobs, include any relevant internships and volunteer experience here. Following that section, list your memberships in scholarly and professional associations, including any offices you have held. Finally, provide a list of references, along with their contact information, on your curriculum vitae. This is in contrast to a resume, which typically does not contain this information.

2.1.3 Differences between Curriculum Vitae (CV) and Resume

A resume is a summary of your background and experience, with the emphasis on work experience. A CV is much more comprehensive, providing details about your academic background. Resumes are typically two pages or less, while CVs can be as long as needed to convey your academic background and experience. CVs are used for academic positions, and the format can vary as long as it includes all the information your prospective employer requires. Resumes are used for most other positions and follow a few standard templates.

CV | RESUME
Comprehensive list of academic and professional experience | Summary of relevant work experience
Can be multiple pages | Typically two pages or less
Used for academic positions | Used for most employment applications

2.1.4 Different types of CV

If you are crafting a new CV (or your first CV), you will need to think about what type of CV you want to make. This will depend on your experience, circumstances, industry and personal preference. The different options include:
i. Chronological CVs

This is the most traditional type of CV and is what most employers expect to see. A chronological CV lays out your professional experience in reverse chronological order, so that your most recent job is at the top of the page. Ideally, a CV should go back around 10-15 years, or cover your last 5-6 positions.

ii. Custom CVs

Although most CVs are chronological, in certain situations you may decide to order yours differently. For example, if you are changing careers, you might prioritise the education and experience most relevant to the role you are applying for and move less relevant experience further down the page. However, make sure your CV remains as clear as possible for potential employers.

iii. Creative CVs

Creative CVs make heavy use of visual elements such as pictures, graphs and colors to represent skills and experience. They are common in fields such as marketing or design, but may not be a good idea for more formal industries like banking or law. You can get an idea of whether a creative CV would impress your potential employer by studying their job advert and website: if they are written very formally, it is probably best to stick to a traditional CV.

2.1.5 Benefits of CV

A CV is important because it serves as an attention-grabbing bridge between you today and a more successful future in your dream career. The curriculum vitae is one of the most important documents you will ever write to secure a job opportunity. It will be your first impression, and it should leave people wanting more by showing you in the best light to your future employers. Here are several benefits of having a well-made CV.

i. Boost self-confidence

Writing down your skills and abilities in your CV is a very constructive thing to do for your self-confidence. With all of your positive traits on paper (or a screen) in front of you, you will feel imbued with a strength you may have thought was reserved for heroes. Well-earned confidence is said to be half the battle won, and future employers are also more likely to hire confident candidates. That is all the more reason to create your own self-confidence booster.

ii. Prove your knowledge

A good CV lists not only your positive traits but also your certifications, experience and other notable achievements. These are proof of your knowledge and help position you as a self-aware candidate. Employers look for these kinds of credentials when they search for candidates, because someone who has done something before usually knows how to do it better than someone who has not.

iii. Show teamwork skills

Your curriculum vitae is also a good place to show the teamwork skills you have put to good use, for example through college organizations. In all fairness, only a handful of jobs do not require you to work with a team to finish your daily tasks, which is why it helps to make your teamwork skills easy to see as you compose your CV.

iv. Make your application stand out and leave a lasting impression

Let's face it: how would you feel if you received a letter without clear information? Would you try to get to know the person who sent it? Most of us would just forget that letter and grab the next one in the queue. For employers, that clear information is your CV, so make it memorable enough that potential employers remember your application and want to get to know you.

v. Make the interview process more effective and efficient

With a good CV in the hands of your potential employer, you will face less interrogation. If they already have the information they need about your suitability for the job you are aiming for, the interview can focus on what kind of person you are rather than on the details in your official documents. Talking about what you can do, backed up by your CV, then becomes an enjoyable process, and reciting what you can do is another confidence booster, at least in my own experience.

vi. Improve employability

A great CV makes for a great hiring process, both for you as the job seeker and for your potential employer. With all your suitable traits written down on the CV they are holding, prospective employers can hardly help concluding that you are a person with good attention to detail who researches what needs to be done: an independent yet inquisitive worker. Add to that a pleasant interview in their office and, at the very least, you are likely to be on their list of top candidates to hire (Karakatsanis et al., 2018).

2.1.6 An Overview of NLP

Natural language processing (NLP) is a subfield of Artificial Intelligence (AI). It is a widely used technology, for example in the personal assistants found across various business fields. Such a system takes the speech provided by the user, breaks it down for proper understanding and processes it accordingly. Recent advances have made it an effective approach, and it is in high demand in today's market. NLP is an emerging field in which many advances, such as compatibility with smart devices and interactive conversation with humans, have already been made possible. Early AI applications in NLP emphasized knowledge representation, logical reasoning and constraint satisfaction, and they were applied first to semantics and later to grammar (Anggakusuma et al., 2020).

In the last decade, a significant shift in NLP research has led to the widespread use of statistical approaches, such as machine learning and data mining, on a massive scale. The need for automation is never-ending, given the amount of work that must be done today, and NLP is a very favorable option when it comes to automated applications. Its applications have made it one of the most sought-after areas in which machine learning is implemented. NLP is a field that combines computer science, linguistics and machine learning to study how computers and humans communicate in natural language. The goal of NLP is for computers to be able to interpret and generate human language. This not only improves the efficiency of work done by humans but also makes it easier to interact with machines; NLP bridges the gap in interaction between humans and electronic devices.
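For text such as resumes, "breaking the input down" usually begins with normalization and tokenization. The minimal sketch below lower-cases the text, splits it into tokens and removes common stop words. The stop-word list and the function name `preprocess` are illustrative assumptions, not the project's own preprocessing; libraries such as NLTK and spaCy provide fuller stop-word lists.

```python
import re

# A small hand-picked stop-word list for illustration only.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "in", "on", "with", "for", "to",
    "is", "i", "am", "have", "my",
}


def preprocess(text):
    """Lower-case the text, split it into tokens and drop stop words."""
    # Keep letters, digits, '+' and '#' so that terms like "c++" and "c#" survive.
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("I have 3 years of experience in Python and C++."))
```

Keeping "+" and "#" in the token pattern is a design choice for technical resumes, where skill names such as C++ and C# would otherwise be reduced to a bare "c".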

2.1.7 Using Machine Learning for analysis and Classification

Every approach has its pros and cons, but none of them stands as a perfect solution. Static analysis is one such approach, and it can be defined as the analysis of software without executing it. A good analysis tool can clearly help spot and eradicate vulnerabilities, and such tools are becoming part of the development process. There is still room for improvement, however, and research in this area can be of the utmost relevance to industry (Mohammed and Behrouz, 2018).

There are different types and classifications of machine learning models, proposed by different contributors. The most widely used models are:

i. Decision trees:

Decision trees are a simple but powerful form of multivariable analysis. They are produced by algorithms that identify various ways of splitting data into branch-like segments. Decision trees partition data into subsets based on categories of input variables, which makes it easy to follow the path of decisions that leads to a prediction.

ii. Regression (linear and logistic)

Regression is one of the most popular methods in statistics. Regression analysis estimates the relationships among variables, finding key patterns in large and diverse data sets and showing how the variables relate to each other.

iii. Neural networks:

Patterned after the operation of neurons in the human brain, neural networks (also called artificial neural networks) underpin a variety of deep learning technologies. They are typically used to solve complex pattern recognition problems and are very useful for analyzing large data sets. They handle nonlinear relationships in data well and work well when certain variables are unknown.

iv. Other classifiers:

Other classifiers include time series algorithms, clustering algorithms, ensemble models, factor analysis, Naïve Bayes and support vector machines. Each classifier approaches data in a different way, so organizations must choose the right classifiers and models to get the results they need. Data scientists and IT experts are tasked with choosing the right predictive models, or building their own, to meet the organization’s needs.
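The list above describes these models only in words. As an illustration of how a decision tree "splits data into branch-like segments", the sketch below finds the single best split (a decision stump, the first level of a tree) on one numeric feature by minimising weighted Gini impurity. Gini impurity is one common splitting criterion among several, and the feature (a hypothetical count of matching skill keywords per CV) and the shortlisting labels are invented purely for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`.

    Each candidate threshold divides the samples into a left branch (<= threshold)
    and a right branch (> threshold); the split with the lowest weighted impurity wins.
    """
    best = (None, float("inf"))
    n = len(values)
    for threshold in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: matching skill keywords per CV, and whether the CV was shortlisted.
keyword_counts = [1, 2, 2, 3, 6, 7, 8, 9]
shortlisted = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

threshold, impurity = best_split(keyword_counts, shortlisted)
print(threshold, impurity)  # 3 0.0
```

A full decision tree repeats this search recursively inside each branch; library implementations such as scikit-learn's also consider many features and stopping rules.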

2.2 Review of related works

Several researchers have worked on curriculum vitae review models based on natural language processing. Some of these works are documented as follows:

Juneja and Momin (2016) used Natural Language Processing (NLP) and Machine Learning (ML) to rank resumes of any format according to the constraints or requirements provided by a client company. The client company supplies the bulk of input resumes together with the requirements and constraints by which the system should rank them. Besides the information provided in the resume itself, the system reads the candidates’ social profiles (such as LinkedIn and GitHub), which give more genuine information about each candidate.

Amin et al. (2019) focus mainly on the design of a web application used to screen resumes (curricula vitae) for a particular job posting. The proposed web application encourages both job applicants and recruiters to use it for job applications and resume screening. Recruiters from various companies can post details of the job openings available in their companies, and job applicants can submit their resumes and apply for the postings they are interested in. The submitted resumes are then compared with the job profile requirements posted by the recruiter using techniques such as machine learning and NLP. Each resume is given a score, and the resumes are ranked from highest match to lowest match. This ranking is visible only to the company recruiter, who can then select the best candidates from a large pool.
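The summary above does not say how Amin et al. (2019) compute their scores, so the following sketch assumes one common choice: represent the job description and each resume as bag-of-words count vectors, score each resume by cosine similarity to the job description, and sort from highest to lowest match. The simple tokenizer, the resume IDs and all texts are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count alphanumeric tokens (a deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resumes(job_description, resumes):
    """Return (resume_id, score) pairs sorted from highest to lowest match."""
    job_vector = bag_of_words(job_description)
    scores = {rid: cosine_similarity(job_vector, bag_of_words(text)) for rid, text in resumes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical job posting and resumes.
job = "Python developer with machine learning and NLP experience"
resumes = {
    "cv_a": "Experienced Python developer, machine learning, NLP, pandas",
    "cv_b": "Accountant with payroll and auditing experience",
    "cv_c": "Java developer with Spring experience",
}
for resume_id, score in rank_resumes(job, resumes):
    print(f"{resume_id}: {score:.2f}")
```

Note that cv_b still scores above zero partly because it shares words such as "with" and "and" with the posting, which is one reason stop-word removal is part of preprocessing.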

Rabih and Mercier (2021) presented a paper on curriculum vitae evaluation using a machine learning approach. The main role of their system is to detect the eligibility of people applying for job vacancies or higher education programs. The work aims to build a system that automates the preselection and eligibility assessment of candidates in the recruitment of higher education students. The system is intended to replace the tedious manual processing of CVs and to provide accurate and effective evaluation results. To achieve this, it is implemented with a machine learning approach using different classification algorithms.

The limitation of this work is that its analysis scope covers only candidates applying to pursue a Master’s degree.

Lokesh et al. (2022) discussed the detailed design and related algorithms of a resume screener that decides whether a particular candidate is suitable for the applied role. Candidates apply in large numbers for jobs on web portals by uploading their resumes, so filtering applicants for the appropriate position in an organization becomes a difficult task for recruiters. To save time and effort, NLP techniques are used to extract the relevant information from each resume, and a Machine Learning (ML) model is trained to check whether a candidate’s skills, experience, and other attributes are suitable for the role. The system also recommends other available job roles based on the candidate’s skillset. On analyzing the performance of the models, the authors found that Logistic Regression performed best for this problem. They also found that a larger dataset is required to make the model work more efficiently and that adding more attributes could improve performance; overall, the system performed well with the available resources. As future work, they intended to improve the accuracy of the system by collecting more resumes from organizations and training the model for all available roles, and to analyze candidates’ information from social networking sites such as Facebook, Twitter, and LinkedIn in order to decide more accurately and authentically whether to offer the job. Additionally, algorithms such as Naive Bayes, K-Nearest Neighbor, and C4.5 could be evaluated to check whether they improve the result.

Naga (2022) notes that selecting applicants for the appropriate job within a company is a difficult task for recruiters, and extracts the key information from CVs using NLTK and other Natural Language Processing (NLP) techniques to save time and effort. The paper examines a variety of machine learning models, such as KNN, SVM, logistic regression and MLP, to detect, identify, and categorize diverse resumes, and implements a web interface to screen resumes and analyze the type of job each resume relates to. MLP outperforms the other approaches (KNN, SVM and logistic regression). Furthermore, the system attempts to measure the accuracy and performance of the proposed methodology with a view to incorporating it in IT firms and other organizations, avoiding manual screening and establishing a safe allocation of resources for companies.

CHAPTER THREE

METHODOLOGY

3.1 Introduction

This chapter discusses the methodological steps followed to develop the automated CV review model using NLP. It also discusses the requirements and architecture of the proposed model.

3.2 System Architecture

The system architecture of the CV review model is shown in Figure 3.1.

Figure 3.1: System Architecture of CV review model based on natural language

processing

3.3 Resume Dataset from Kaggle

Data collection is the methodical process of collecting and analyzing data from a variety of sources to obtain a complete and accurate picture of a subject. In this project, the dataset was downloaded from an online repository: https://www.kaggle.com/datasets/gauravduttakiit/resume-dataset. The dataset contains 962 samples and two variables (columns). The first variable is the category, which defines the category of job the candidate is applying for, while the second variable is the CV, which contains the content of the candidate’s CV.
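As a minimal sketch, the downloaded file can be read with Python's standard csv module and the number of CVs per category counted. The file name UpdatedResumeDataSet.csv and the column headers Category and Resume are assumptions about the Kaggle download and should be checked against the actual file; the demonstration below uses a small in-memory CSV instead of the real data.

```python
import csv
import io
from collections import Counter


def load_resumes(file_obj):
    """Read (category, resume_text) pairs from a CSV with assumed 'Category' and 'Resume' headers."""
    reader = csv.DictReader(file_obj)
    return [(row["Category"], row["Resume"]) for row in reader]


def category_counts(records):
    """Count how many CVs belong to each job category."""
    return Counter(category for category, _ in records)


# In-memory stand-in for the real file. With the real dataset one might instead use
# open("UpdatedResumeDataSet.csv", encoding="utf-8", newline="")  (file name assumed).
sample_csv = io.StringIO(
    "Category,Resume\n"
    "Data Science,\"Skills: Python, machine learning\"\n"
    "HR,\"Recruitment and payroll management\"\n"
    "Data Science,\"Deep learning, NLP, pandas\"\n"
)
records = load_resumes(sample_csv)
print(len(records), category_counts(records).most_common())
```

Quoted fields keep commas inside a CV intact, which matters because resume text is full of comma-separated skill lists.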

3.4 Resume Preprocessing

The very first step of an NLP project is preprocessing the data. This is essential for preparing the text data for model building before it is used for analysis or prediction. Some of the preprocessing steps are:

i. Removing punctuation: removing punctuation marks such as . , ! $ ( ) * % @ from the dataset.

ii. Removing URLs: Removing any links from the dataset.

iii. Removing stop words: Stop words are the words in a stop list that are removed before or after processing natural language data because they carry little meaning. They include, for example, pronouns and other common words that add no meaning to the dataset.

iv. Tokenization: Tokenization is used in natural language processing to split paragraphs

and sentences into smaller units that can be more easily assigned meaning.

v. Stemming or Lemmatization: Stemming and lemmatization are NLP methods that reduce words to a base form so that related words can be treated alike. Stemming cuts a word down to its stem, while lemmatization uses the context in which the word is being used to find its dictionary form (Deepanshi, 2022).
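The preprocessing steps listed above can be sketched in plain Python as follows. This is only an illustration: the stop-word list is a tiny hypothetical subset, and the suffix-stripping function is a deliberately crude stand-in for a real stemmer such as NLTK's PorterStemmer or a lemmatizer such as WordNetLemmatizer.

```python
import re
import string

# Tiny illustrative stop-word list; real projects use a full list (e.g. NLTK's English stop words).
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "with", "i", "my", "to", "for", "is", "am"}

# Suffixes tried by the toy stemmer below (this is NOT the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def remove_urls(text):
    """Step ii: replace http(s):// and www. links with a space."""
    return re.sub(r"(https?://\S+|www\.\S+)", " ", text)


def remove_punctuation(text):
    """Step i: replace every ASCII punctuation character with a space."""
    return text.translate(str.maketrans(string.punctuation, " " * len(string.punctuation)))


def tokenize(text):
    """Step iv: lower-case the text and split it on whitespace."""
    return text.lower().split()


def crude_stem(word):
    """Step v (simplified): strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Remove URLs, then punctuation, tokenize, drop stop words (step iii) and stem."""
    text = remove_punctuation(remove_urls(text))
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]


print(preprocess("I am skilled in Python, SQL and testing! See https://example.com/cv"))
# ['skill', 'python', 'sql', 'test', 'see']
```

URLs are removed before punctuation on purpose: stripping punctuation first would break a link into fragments such as "https" and "example" that the URL pattern could no longer recognise.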

3.5 NLP and ML Model Development

Building the NLP model involves both dependency parsing and part-of-speech tagging. Dependency parsing finds the relationships between all the words in a sentence: it builds a tree in which each word is attached to a single parent (head) word, and the main verb of the sentence functions as the root node. Part-of-speech tagging labels each word as a verb, adverb, noun, adjective and so on, which helps convey the grammatical role and meaning of the words in a sentence. This process will be implemented using NLTK (Natural Language Toolkit), a suite of libraries and programs for symbolic and statistical natural language processing of English, written in the Python programming language. The output of the NLP process will serve as the input dataset for the classification model to be developed. This model will classify CVs based on the applicant’s skillset to determine whether the applicant will be selected. The algorithm to be used is a multi-class KNN algorithm.
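To show what a multi-class KNN classifier actually does, here is a from-scratch sketch; it is not the project's code and omits the NLTK steps. Under the assumptions made here, each CV is a bag-of-words vector, similarity is cosine similarity, and a new CV receives the majority category among its k most similar training CVs. The training texts, the category names and k = 3 are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words counts over lower-cased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two Counter vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, text, k=3):
    """Predict a category by majority vote among the k most similar training CVs.

    `train` is a list of (category, text) pairs. Vote ties go to the category whose
    first vote came from the more similar neighbour (Counter keeps insertion order).
    """
    query = vectorize(text)
    neighbours = sorted(train, key=lambda item: cosine(query, vectorize(item[1])), reverse=True)[:k]
    votes = Counter(category for category, _ in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical, already-preprocessed training CVs.
train = [
    ("Data Science", "python machine learning pandas statistics"),
    ("Data Science", "deep learning nlp python tensorflow"),
    ("Data Science", "data analysis python sql visualization"),
    ("HR", "recruitment onboarding payroll employee relations"),
    ("HR", "talent acquisition interviews payroll policies"),
    ("HR", "employee engagement training recruitment"),
]
print(knn_predict(train, "python nlp machine learning projects"))
print(knn_predict(train, "payroll and recruitment experience"))
```

KNN has no training phase beyond storing the examples, so the choice of k and of the similarity measure carries most of the modelling decisions.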

3.6 Model Evaluation

Model evaluation uses performance metrics to measure how well a model performs. Some of the metrics used are highlighted below.

i. Accuracy

Accuracy is the ratio of correct predictions to the total number of predictions, and it is one of the simplest measures of a model’s performance. We aim for high accuracy: a model with high accuracy makes correct predictions most of the time.

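$$\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Correct Predictions} + \text{Incorrect Predictions}}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives.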

ii. F1 Score

The F1 score depends on both recall and precision; it is the harmonic mean of the two values:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

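iii. Precision

Precision is the ratio of true positives to all predicted positives. For a given job category, TP (true positives) is the number of CVs correctly classified into that category and FP (false positives) is the number of CVs wrongly classified into it:

$$\text{Precision} = \frac{TP}{TP + FP}$$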

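iv. Recall

Recall is the ratio of true positives to all positives in the ground truth. For a given job category, TP (true positives) is the number of CVs of that category that were correctly classified and FN (false negatives) is the number of CVs of that category that were classified into some other category:

$$\text{Recall} = \frac{TP}{TP + FN}$$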
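These metrics can be computed directly from lists of true and predicted categories. The sketch below treats one category at a time as the positive class (one-vs-rest), which is how precision, recall and F1 are usually reported per class for a multi-class CV classifier; the six labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Correct predictions divided by all predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall and F1 for the category `positive`."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall; defined as 0.0 when both are zero.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical true and predicted job categories for six CVs.
y_true = ["HR", "HR", "Data Science", "Data Science", "Data Science", "Testing"]
y_pred = ["HR", "Data Science", "Data Science", "Data Science", "HR", "Testing"]

print(accuracy(y_true, y_pred))
for category in sorted(set(y_true)):
    print(category, precision_recall_f1(y_true, y_pred, category))
```

Here four of the six predictions are correct, so accuracy is 4/6; for "Data Science", TP = 2, FP = 1 and FN = 1, giving precision, recall and F1 of 2/3 each.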

3.7 System flowchart

The system flowchart of CV review is presented in Figure 3.2

Figure 3.2: System Flowchart of CV’s Review

CHAPTER FOUR

SYSTEM IMPLEMENTATION, RESULTS AND DISCUSSION

4.1 Introduction

This chapter describes the implementation stages in the development of natural language processing models for reviewing CVs. The modeling is done using a dataset retrieved from Kaggle, and the Python programming language is used to develop the model.

4.2 Implementation Requirements

The model development requirements fall into two main parts: software and hardware requirements. The dataset of candidates’ CVs forms a major component of the model building.

a. Software Requirements

i. Python 3.9.0

ii. Jupyter Notebook

b. Hardware Requirements

i. DELL 7400, 8 GHz processor, 16GB RAM, 64-bit OS

4.2.1 Programming Language

Python is a high-level, interpreted, interactive and object-oriented scripting language designed to be highly readable. It frequently uses English keywords where other languages use punctuation, and it has fewer syntactic constructions than other languages.

i. Python is Interpreted: Python is processed at runtime by the interpreter, so you do not need to compile your program before executing it. This is similar to Perl and PHP.

ii. Python is Interactive: You can actually sit at a Python prompt and interact with the

interpreter directly to write your programs.

iii. Python is Object-Oriented: Python supports Object-Oriented style or technique of

programming that encapsulates code within objects.

iv. Python is a Beginner's Language: Python is a great language for the beginner-level

programmers and supports the development of a wide range of applications from

simple text processing to WWW browsers to games.

4.2.2 Justification for Choice of Programming Language

Machine learning is considered one of the leading technologies of the future. A number of applications are already built on it, and as a result many companies and researchers are taking an interest in it. The main question that arises is which programming language should be used to develop machine learning applications. Various languages, such as Lisp, Prolog, C++, Java and Python, can be used, but Python has gained huge popularity for the following reasons:

i. Simple syntax and less code

Python requires less code and has simpler syntax than many other programming languages that can be used to develop machine learning models. As a result, testing is easier and we can focus more on the programming logic.

ii. Extensive libraries for Machine Learning projects

A major advantage of using Python for machine learning is its wide range of available libraries. Python has libraries for almost all kinds of machine learning projects; for example, NumPy and SciPy are important third-party libraries widely used for numerical and scientific computing.

iii. Open source: Python is an open-source programming language. This makes it widely

popular in the community.

4.3 Implementation phases of the model

The implementation phase of this study shows the modeling of a KNN classifier to classify CVs into job categories, which is carried out using Python.

4.3.1 Python Machine Learning Model

To reduce the complexity of the structure, the full dataset is loaded and then divided into training and testing sets using the train_test_split method.

Importing the necessary libraries

In this project, the sklearn (scikit-learn) library in Python is used to implement the CV classification model and evaluate its performance.

Some other important libraries imported include the following:

i. numpy: a library for numerical computing in Python that provides support for arrays,

matrices, and mathematical functions.

ii. pandas: a library for data manipulation and analysis in Python that provides support

for reading and writing data in various file formats and performing operations such as

filtering, grouping, and merging.

iii. scikit-learn: a library for machine learning in Python that provides support for a wide range of algorithms, including the k-nearest neighbors (KNN) algorithm used in this project.

iv. matplotlib: a library for data visualization in Python that provides support for creating

various types of plots, charts, and graphs.

Figure 4.1: Importing libraries

Previewing the dataset

The dataset has two variables: the category, which is the area the applicant is applying for, and the applicant’s resume.

Figure 4.2: Dataset Preview

Previewing all the categories in the dataset

Figure 4.3: Categories of job available and number of applicants

Figure 4.4: Categories of job available

Figure 4.5: Job category distribution in percentage

4.3.2 Data Cleaning

This process involves scanning through the dataset to identify and remove errors, such as special symbols and characters and hashtags. The idea is to keep only English-like words in the dataset.
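A cleaning step of this kind is typically written with regular expressions. The sketch below is an assumption about what such a step might look like, not a transcription of the code in Figure 4.6: it removes URLs, hashtags, @mentions, non-ASCII characters and punctuation, then collapses repeated whitespace.

```python
import re
import string

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_resume(text):
    """Return resume text with URLs, hashtags, mentions, non-ASCII and punctuation removed."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"#\S+", " ", text)                    # hashtags
    text = re.sub(r"@\S+", " ", text)                    # @mentions (also cuts the domain off e-mail addresses)
    text = re.sub(r"[^\x00-\x7f]", " ", text)            # non-ASCII symbols such as bullets
    text = text.translate(PUNCT_TO_SPACE)                # remaining punctuation
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace and line breaks


print(clean_resume("Skills \u2022 Python, SQL #hiring @recruiter\r\nsee www.example.com"))
# Skills Python SQL see
```

The order matters: hashtags and mentions are removed before punctuation, because "#" and "@" are themselves punctuation and would otherwise be stripped first, leaving the tag words behind.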

Figure 4.6: Cleaning the dataset

Creating a new column for cleaned dataset

After properly cleaning the dataset, a new column was created to have the cleaned dataset.

Figure 4.7: Creating a new dataset

Tokenizing the cleaned dataset

This process involves breaking down the cleaned resumes into single words and converting them to lower case.

Figure 4.8: Tokenizing the cleaned dataset

Generating a classification class target

This process involves converting each classification category into an integer instead of referring to it by its string name.
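Converting category names to integers is often done with scikit-learn's LabelEncoder, which assigns codes to the sorted unique labels. The same mapping can be sketched without any library, which makes the idea explicit; the category names below are examples only, and the helper functions are hypothetical.

```python
def fit_label_encoding(categories):
    """Map each distinct category name to an integer, in sorted order (as LabelEncoder does)."""
    return {name: code for code, name in enumerate(sorted(set(categories)))}


def encode(categories, mapping):
    """Replace each category name by its integer code."""
    return [mapping[name] for name in categories]


def decode(codes, mapping):
    """Turn integer codes back into category names, e.g. to report predictions."""
    inverse = {code: name for name, code in mapping.items()}
    return [inverse[code] for code in codes]


labels = ["Java Developer", "HR", "Data Science", "HR"]
mapping = fit_label_encoding(labels)
print(mapping)                  # {'Data Science': 0, 'HR': 1, 'Java Developer': 2}
print(encode(labels, mapping))  # [2, 1, 0, 1]
```

Keeping the mapping (or the fitted encoder) is what allows the integer predictions of the model to be turned back into readable job categories.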

Figure 4.9: Generating a classification class target

4.3.3 Building the model

The model building process involves splitting the dataset into training and testing sets: 80% of the entire dataset is used for training and 20% for testing.

The KNN algorithm is used to build the classification model.
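In scikit-learn this split is usually done with train_test_split(X, y, test_size=0.2, random_state=...). The underlying idea, a reproducible random shuffle followed by an 80/20 cut, is sketched below in plain Python. The seed value is arbitrary, and this sketch does not stratify by category, whereas train_test_split can do so through its stratify parameter.

```python
import math
import random


def split_train_test(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of `samples` reproducibly and return (train, test).

    The test size is rounded up, so with 962 samples and test_fraction=0.2
    the test set has 193 items and the training set 769.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # a private generator keeps the split reproducible
    n_test = math.ceil(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


# Indices stand in for the 962 CVs described in Section 3.3.
train, test = split_train_test(range(962))
print(len(train), len(test))  # 769 193
```

Fixing the seed means the same CVs land in the test set on every run, so accuracy figures can be reproduced and compared between models.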

Figure 4.10: Building the model

After the model was built, its validation accuracy was 96%, a satisfactory result.

4.3.4 Result

CHAPTER FIVE

SUMMARY, CONCLUSION AND RECOMMENDATIONS

5.1 Summary

The submission of either a résumé or curriculum vitae (CV) is essential for individuals

seeking temporary employment for experience or a permanent position within an

organization. These documents serve as crucial marketing tools, portraying the applicant's

strengths, skills, and experiences to potential employers. A well-crafted résumé or CV

provides an overview of the applicant's identity, educational and professional background,

and relevant achievements, with the primary goal of capturing the attention of prospective

employers for interview consideration. The high volume of job applications received by

companies daily poses a significant challenge for recruiters in manually identifying top

candidates. The manual review of hundreds of resumes is time-consuming, and a

considerable percentage of submitted resumes often lack the essential qualifications for the

advertised positions. To address these challenges, some companies resort to employing

external recruiters, adding to the overall recruitment costs.

Recognizing the need for an efficient and cost-effective solution, this study adopts machine

learning, specifically natural language processing (NLP). NLP, which mimics human

language comprehension, is employed to automate the screening of resumes. The objective is

to enable computers to understand and process spoken and written language, combining rule-

based computational linguistics with statistical, machine learning, and deep learning models.

The recruitment industry, valued at $200 billion, grapples with the task of selecting the best candidates from a vast pool of applicants possessing the required skills. The study highlights the growing job market in India, where millions of new job seekers enter the workforce annually. Although CV/resume data are not entirely unstructured, the lack of standardized rules for writing them makes them difficult to process in a uniform format.

The integration of machine learning and NLP offers a solution to analyze and interpret

unstructured data from resumes, facilitating information extraction. The proposed model aims

to transition from labor-intensive human resume processing to a rapid and cost-effective

automated system.

The identified objectives for the study include developing a model capable of accepting and

reviewing CVs using NLP technology, followed by testing and evaluating the model's

effectiveness. The anticipated outcomes involve the shift towards a more efficient and

affordable software-driven resume review process.

5.2 Conclusion

In conclusion, this study shows the critical role of résumés and curriculum vitae (CVs) as

indispensable marketing tools for individuals seeking employment. The documents serve as

comprehensive self-portraits, presenting an applicant's strengths, skills, and experiences to

potential employers. The ultimate objective is to capture the attention of recruiters and secure

interview opportunities. The challenges faced by recruiters in manually reviewing a vast

number of resumes, particularly in the context of an insufficient number of job openings,

necessitate a paradigm shift in the recruitment process. The study recognizes the limitations

of traditional methods and advocates for the adoption of machine learning, specifically

natural language processing (NLP), as a transformative solution. The application of NLP in

screening resumes offers the potential to automate and expedite the review process. By

enabling computers to comprehend and process human language, NLP serves as a bridge

between rule-based computational linguistics and advanced statistical, machine learning, and

deep learning models. The study emphasizes the significance of leveraging technology to

address the challenges posed by the sheer volume of job applications received by companies.

The proposed machine learning model, as outlined in the study, aims to revolutionize the

conventional approach to resume review. The model's objectives include the development of

a system that utilizes NLP technology for efficient CV/Resume review and subsequent testing

and evaluation. This transition from labor-intensive human processing to a swift and cost-

effective software-driven approach marks a significant advancement in the recruitment

landscape. In essence, the integration of machine learning and NLP offers a promising

solution to the challenges faced by recruiters in handling large volumes of resumes. As the

job market continues to evolve, the adoption of technology-driven models becomes

imperative for enhancing efficiency, reducing costs, and ensuring the identification of the

most qualified candidates. This study contributes to the ongoing dialogue on optimizing the

recruitment process through innovative and intelligent solutions.

5.3 Contribution to Knowledge

The proposed machine learning model is expected to contribute to knowledge by providing a tool to extract crucial data from CVs, find relevant qualifications across a variety of CVs, and learn pertinent information from them.

5.4 Recommendations

REFERENCES

Amin, S., Jayakar, N., Sunny, S., Babu, P., Kiruthika, M., & Gurjar, A. (2019). Web Application for Screening Resume. 2019 International Conference on Nascent Technologies in Engineering (ICNTE 2019) - Proceedings. https://doi.org/10.1109/ICNTE44896.2019.8945869

Anggakusuma, J., Mawardi, V. C., & Lauro, M. D. (2020). Resume Extraction with Conditional Random Field Method. IOP Conference Series: Materials Science and Engineering, 1007(1). https://doi.org/10.1088/1757-899X/1007/1/012154

Barrak, A., Adams, B., & Zouaq, A. (2022). Toward a Traceable, Explainable, and Fair JD/Resume Recommendation System. http://arxiv.org/abs/2202.08960

Rabih, H. E., & Mercier, L. (2021). Curriculum Vitae Evaluation using Machine Learning Approach. Artificial Intelligence for Knowledge Management, IFIP AICT 614.

Kinge, B., Mandhare, S., Chavan, P., & Chaware, S. M. (2022). Resume Screening using Machine Learning and NLP: A Proposed System. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 253–258. https://doi.org/10.32628/cseit228240

Dimopoulos, A. (2019). Comparative Effect of Candidates’ Physical Attractiveness between Resume Screening and Interview Process Outcomes. Empirical Research for Greece. International Journal of Human Resource Studies, 9(3), 230. https://doi.org/10.5296/ijhrs.v9i3.15226

Lokesh, S., Balaje, S. M., Prathish, E., & Bharathi, B. (2022). Resume Screening and Recommendation System using Machine Learning Approaches. Computer Science & Engineering: An International Journal, 12(1), 1–7. https://doi.org/10.5121/cseij.2022.12101

Naga, M. (2022). Resume Screening Using Machine Learning. www.jespublication.com

Naik, R. S., & Dhotre, S. R. (2022). Resume Recommendation Using Machine Learning (Vol. 10, Issue 7). www.ijcrt.org

Almada, R. V., Elias, O. M., Gómez, C. E., Mendoza, M. D., & López, S. G. (2017). Natural Language Processing and Text Mining to Identify Knowledge Profiles for Software Engineering Positions. 5th International Conference in Software Engineering Research and Innovation (CONISOFT).

Gao, L., & Eldin, N. Employer’s Expectations: A Probabilistic Text Mining Model. Creative Construction Conference 2018, CC2014.

Karakatsanis, I., AlKhader, W., MacCrory, F., Alibasic, A., Omar, M. A., Aung, Z., & Woon, W. L. (2018). Data Mining Approach to Monitoring the Requirements of the Job Market: A Case Study. Electrical Engineering and Computer Science, Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates.

Kino, Y., Kuroki, H., Machida, T., Furuya, N., & Takano, K. (2017). Text Analysis for Job Matching Quality Improvement. International Conference on Knowledge Based and Intelligent Information and Engineering Systems.

Melo-Acosta, G. E., et al. (2017). Fraud Detection in Big Data Using Supervised and Semi-Supervised Learning Techniques. 2017 IEEE Colombian Conference on Communications and Computing (COLCOM). https://doi.org/10.1109/colcomcon.2017.8088206

Mohammed, E., & Far, B. (2018). Supervised Machine Learning Algorithms for Credit Card Fraudulent Transaction Detection: A Comparative Study. IEEE Annals of the History of Computing, IEEE. doi.ieeecomputersociety.org/10.1109/IRI.2018.00025

Riza, F., Rajah, V., & Sharadadevi, V. (2021). Resume Classification and Ranking using KNN and Cosine Similarity. International Journal of Engineering.

Juneja, A., & Momin, A. (2016). Resume Ranking using NLP and Machine Learning.
