Intelligent Resume Screening and Ranking System Using NLP
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE & ENGINEERING (AI & ML)
Submitted
by
J. Manoj : 2111CS020272
M. Nandeeswar : 2111CS020292
K. Neha : 2111CS020307
S. Nikhitha : 2111CS020316
M. Nikitha : 2111CS020317
2024
COLLEGE CERTIFICATE
This is to certify that this is a bona fide record of the application development entitled
“Intelligent Resume Screening and Ranking System Using NLP” submitted by
J. Manoj (2111CS020272), M. Nandeeswar (2111CS020292), K. Neha (2111CS020307),
S. Nikhitha (2111CS020316), and M. Nikitha (2111CS020317) of B.Tech IV year I
semester, Department of CSE (AI & ML), during the year 2024-25. The results
embodied in the report have not been submitted to any other university or institute
for the award of any degree or diploma.
INTERNAL GUIDE          HEAD OF THE DEPARTMENT
Dr. A. Kiran Kumar      Dr. Sujit Das
DEAN CSE(AI-ML)
Dr. Thayyaba Khatoon
EXTERNAL EXAMINER
ACKNOWLEDGEMENT
We sincerely thank our Dean, Dr. Thayyaba Khatoon, for her constant support and
motivation. A special acknowledgement goes to a friend who encouraged us from
behind the scenes. Last but not least, our sincere appreciation goes to our families, who have
been tolerant and understanding of our moods and have extended timely support.
We would like to express our gratitude to all those who offered their support and
suggestions in developing this application. Special thanks go to our guide, Dr. A. Kiran
Kumar, whose help, stimulating suggestions, and encouragement supported us throughout the
course of project development.
Abstract
Finding the most qualified applicant for a position requires careful consideration of job
applications, which takes place during the resume evaluation stage of the hiring process.
Automated resume screening is now a practical alternative to the manual screening procedure
due to developments in deep learning and natural language processing (NLP). In this paper, we
examine a few contemporary methods for automated resume screening. To increase the precision
and effectiveness of the screening process, these approaches employ a variety of methods
including hybrid deep learning frameworks, transfer learning, genetic algorithms, and
multisource data. Some research also investigates the use of job descriptions to improve
resume screening precision. The experimental findings of these studies show that the proposed
strategies are more effective than conventional ones. The results of this study can help human
resource managers and recruiters automate the hiring process and identify viable applicants
efficiently and impartially.
CONTENTS
3 ANALYSIS:
3.1 Project Planning and Research
3.2 Software Requirement Specification
3.2.1 Software Requirement
3.2.2 Hardware Requirement
3.3 Model Selection and Architecture
4 DESIGN:
4.1 Introduction
4.2 UML Diagram
4.3 Dataset Description
4.4 Data Preprocessing Techniques
4.5 Methods & Algorithms
CHAPTER 1
1. INTRODUCTION
An essential step in the hiring process is the review of resumes, which entails assessing job
applications to find the applicant best suited to a given position. Done manually, this procedure
can take a long time and is prone to human error, which can lead to the loss of qualified
candidates. Automated resume screening has grown in popularity in recent years as a solution to
this problem. It uses several methods to improve accuracy and efficiency, including natural
language processing (NLP), machine learning, and deep learning algorithms. Several studies have
proposed methods for automating the screening of resumes. Li et al. introduced a hybrid deep
learning framework that makes use of long short-term memory (LSTM) networks and convolutional
neural networks (CNNs) [1].
Hiring the right talent is a challenge for all businesses. This challenge is magnified by the high
volume of applicants if the business is labor intensive, growing, and facing high attrition rates.
One example is the IT services sector in growing markets, where skilled staff are in short supply.
In a typical service organization, professionals with a variety of technical skills and business
domain expertise are hired and assigned to projects to resolve customer issues. The task of
selecting the best talent among many applicants is known as resume screening. Typically, large
companies do not have enough time to open each CV, so they use machine learning algorithms for
the resume screening task, which is also reported to reduce unemployment through more efficient
hiring [3][5]. Machine learning is a field in which a model is trained on data so that it can
predict the intended outcome when new data is submitted. Natural language processing (NLP),
which deals with natural language, that is, the way humans communicate with one another, is
commonly used to screen resumes [9].
1.1 Problem Definition
The goal is to automate the resume screening process, improving efficiency and accuracy in
selecting candidates who best fit job descriptions. The challenges we face are:
High Volume of Resumes: Recruiters often receive hundreds of applications, making manual
screening time-consuming.
Variability in Resume Formats: Resumes come in various formats and styles, complicating
extraction and comparison.
Subjectivity in Screening: Traditional methods may introduce bias or overlook qualified
candidates.
Job Description Misalignment: Resumes must be matched accurately against specific job
requirements.
Key Components:
1. Data Collection:
Resume Database: A repository of resumes in various formats (PDF, DOCX, etc.).
Job Descriptions: Structured data defining requirements, responsibilities, and
qualifications.
2. Preprocessing:
Text Extraction: Converting resumes from various formats into plain text.
Normalization: Standardizing text (lowercasing, removing special characters).
Tokenization: Splitting text into words or phrases for analysis.
Stop-word Removal: Eliminating common words that do not add value (e.g., "the," "is").
3. Feature Extraction:
Keyword Identification: Extracting key skills, qualifications, and experiences relevant to
the job.
N-grams and Phrases: Identifying common phrases and multi-word expressions.
Semantic Analysis: Utilizing word embeddings (e.g., Word2Vec, GloVe) to understand
context and meaning.
4. Ranking Algorithm:
Similarity Scoring: Employing algorithms (e.g., cosine similarity, Jaccard index) to
compare resumes against job descriptions.
Machine Learning Models: Using supervised learning to train models based on historical
hiring data to rank resumes.
5. Bias Mitigation:
Fairness Metrics: Implementing techniques to identify and reduce bias in screening.
Diverse Training Data: Ensuring a balanced dataset for training models.
6. User Interface:
Dashboard for Recruiters: A user-friendly interface to view ranked candidates, filter
results, and provide feedback.
Integration with ATS: Compatibility with existing Applicant Tracking Systems to
streamline workflow.
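The preprocessing and feature-extraction components above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names, the stop-word list, and the SKILLS vocabulary are assumptions made for the example.

```python
import re

# Small illustrative lists (assumptions); a real system would use fuller ones.
STOP_WORDS = {"the", "is", "a", "an", "and", "in", "of", "with"}
SKILLS = {"python", "sql", "machine learning", "data structures"}

def tokenize(text):
    """Normalize (lowercase, strip special characters) and split into tokens."""
    return re.sub(r"[^a-z0-9+#]+", " ", text.lower()).split()

def remove_stop_words(tokens):
    """Drop common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    """Return all contiguous n-word phrases."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_skills(text, max_n=2):
    """Match single words and multi-word phrases against the skill vocabulary."""
    tokens = tokenize(text)
    candidates = set()
    for n in range(1, max_n + 1):
        candidates.update(ngrams(tokens, n))
    return candidates & SKILLS

resume_line = "Skilled in Python, SQL and Machine Learning."
print(remove_stop_words(tokenize(resume_line)))
print(sorted(extract_skills(resume_line)))
```

The n-grams are built from the unfiltered tokens so that removing a stop word cannot join two words that were not adjacent in the resume.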
1. Enhance Recruitment Efficiency: Reduce the time and effort required for recruiters to
sift through large volumes of resumes by automating the initial screening process.
6. Integrate with Existing Systems: Ensure seamless integration with existing Applicant
Tracking Systems (ATS) to enhance the overall recruitment workflow without
requiring significant changes to current processes.
Resume Processing: Development of algorithms for extracting and analyzing
information from various resume formats (PDF, DOCX, TXT). Implementation of text
normalization, tokenization, and feature extraction techniques.
Job Description Analysis: Ability to parse and understand job descriptions to identify
key skills, qualifications, and responsibilities. Matching resumes to job descriptions using
semantic analysis.
Bias Mitigation: Implementation of fairness metrics and techniques to reduce bias in the
screening process.
Data Quality and Availability: The effectiveness of the system heavily relies on the
quality and quantity of training data. Incomplete or biased datasets can lead to inaccurate
results.
Variability in Resumes: Resumes can vary significantly in structure and format, making
it challenging to extract information consistently across all documents.
Complexity of Job Descriptions: Some job descriptions may contain vague or overly
complex language, making it difficult for the system to extract clear criteria for ranking.
Dynamic Job Market: The skills and qualifications in demand can change rapidly,
requiring continuous updates to the model and its training data.
Bias in Algorithms: Despite efforts to reduce bias, the underlying algorithms may still
inadvertently reflect biases present in training data, affecting candidate selection.
CHAPTER 2
2. LITERATURE SURVEY
This literature survey for the Intelligent Resume Screening and Ranking System using Natural
Language Processing (NLP) reviews relevant research papers and methodologies, grouped by
theme:
1. Automated Resume Screening Using NLP
Key Papers:
“A Machine Learning Approach for Automation of Resume Recommendation
System”
International Conference on Computational Intelligence and Data Science (ICCIDS),
2019:
This paper presents a machine learning-based approach to automate resume
recommendation. The system utilizes NLP techniques to parse resumes and match them
to job descriptions. Various classifiers like Naive Bayes, Decision Trees, and Support
Vector Machines (SVM) are compared for their performance in resume classification.
“Web Application for Screening Resume Using NLP”
IEEE International Conference on Nascent Technologies in Engineering, 2019:
Focuses on the development of a web application that uses NLP techniques to automate
resume screening. The paper discusses the use of semantic matching, extracting key skills
and experiences, and ranking resumes based on job requirements using keyword
matching and machine learning algorithms.
2. Natural Language Processing Techniques in Resume Screening
Key Papers:
“Resume Classification and Ranking Using KNN and Cosine Similarity”
International Journal of Engineering, 2021:
This paper examines how NLP can be applied to resume screening by using cosine
similarity to measure the relevance of a resume to a given job description. It also
compares different algorithms like K-Nearest Neighbors (KNN) for classifying resumes
based on extracted features like skills, education, and work experience.
“Design and Development of E-Learning Based Resume Ranking”
International Conference on E-Learning and Emerging Technologies, 2020:
Key Papers:
“Differential Hiring Using Combination of Named Entity Recognition (NER) and
Word Embedding”
International Journal of Recent Technology and Engineering (IJRTE), 2020:
This paper explores the use of NER and word embedding techniques to identify and
extract entities like skills, qualifications, and experiences from resumes. The extracted
data is then ranked using machine learning models such as Random Forest and Gradient
Boosting.
“Web Application for Screening Resume Using Machine Learning”
Fr. Conceicao Rodrigues Institute of Technology, 2019:
The study discusses the implementation of a resume screening system that uses machine
learning algorithms to rank resumes based on predefined criteria. The model integrates
NLP techniques like sentence parsing and skill extraction with machine learning for
improved ranking accuracy.
4. Hybrid and Advanced NLP Models in Resume Screening
Key Papers:
“BERT-Based Model for Resume Screening and Ranking”
Proceedings of the International Conference on Artificial Intelligence, 2021:
This paper presents a BERT-based model for resume screening, focusing on extracting
semantic information from resumes and job descriptions. The model was fine-tuned to
improve accuracy in identifying the best resumes based on deep context understanding.
“Deep Learning Approaches for Intelligent Resume Screening”
IEEE Transactions on Artificial Intelligence, 2022:
Explores the use of deep learning models like CNNs and LSTMs for processing and
analyzing resumes. The study compares traditional machine learning approaches with
deep learning techniques to demonstrate the enhanced performance of intelligent resume
screening systems.
CHAPTER 3
3 Budget Estimation
RESEARCH
1. Literature Review
Investigate existing resume screening tools and their methodologies.
Study NLP techniques relevant to resume parsing and analysis.
Explore machine learning models used in similar applications.
2. Technology Stack
Programming Languages: Python (for NLP and data processing), JavaScript (for front-end).
Libraries:
o NLP: SpaCy, NLTK, Hugging Face Transformers.
o Machine Learning: Scikit-learn, TensorFlow, PyTorch.
o Web Frameworks: Flask or Django for backend; React or Angular for frontend.
Database: SQL or NoSQL (e.g., MongoDB) for storing resumes and user data.
3. Data Sources
Explore open datasets for resumes and job descriptions (e.g., Kaggle datasets).
Consider partnerships with recruitment agencies for anonymized data.
4. Ethical Considerations
5. User Research
Implementation Steps
1. Kickoff Meeting
o Discuss project goals, assign roles, and establish communication channels.
2. Data Preparation
o Create a plan for data collection and storage.
o Ensure data privacy and compliance with regulations (e.g., GDPR).
3. Prototyping
o Develop a minimum viable product (MVP) focusing on core functionalities.
o Iterate based on feedback.
5. Documentation
o Maintain clear documentation for code, user guides, and system architecture.
1. Introduction
1.1 Purpose: The purpose of this document is to outline the requirements for the
Intelligent Resume Screening and Ranking System, which automates the process of
screening and ranking resumes based on job descriptions using Natural Language
Processing (NLP).
1.2 Scope: The system will allow recruiters to upload job descriptions and resumes,
automatically parse and analyze them, and provide a ranked list of candidates. It will
include features for keyword extraction, semantic analysis, and a user-friendly interface.
1.3 Definitions, Acronyms, and Abbreviations:
o NLP: Natural Language Processing
o HR: Human Resources
o UI: User Interface
o API: Application Programming Interface
2. Overall Description
2.1 Product Perspective: The system will be a web-based application that integrates
with existing HR tools and platforms, offering seamless user experiences for recruiters.
2.2 Product Functions:
o Resume parsing and analysis
o Keyword extraction from job descriptions
o Semantic similarity scoring
o Candidate ranking
o User dashboard for managing resumes and job postings
2.3 User Classes and Characteristics:
o Recruiters: Primary users who will upload job descriptions and resumes and
review ranked candidates.
o Job Seekers: Indirect users whose resumes will be processed and analyzed.
3. Functional Requirements
3.6 Candidate Ranking:
o The system shall rank candidates based on a scoring system that combines
keyword matching and semantic similarity.
3.7 User Dashboard:
o The system shall provide a dashboard for recruiters to view uploaded job
descriptions and the corresponding ranked candidate list.
3.8 Feedback Mechanism:
o The system shall allow recruiters to provide feedback on candidate rankings,
which will help improve the model.
4. Non-Functional Requirements
4.1 Performance:
o The system shall process and rank resumes within 30 seconds for a typical job
description.
4.2 Usability:
o The user interface shall be intuitive and easy to navigate for users with minimal
technical knowledge.
4.3 Security:
o The system shall ensure data privacy and comply with relevant data protection
regulations (e.g., GDPR).
4.4 Scalability:
o The system shall be scalable to handle thousands of resumes and job descriptions
concurrently.
4.5 Reliability:
o The system shall be operational 99.5% of the time, excluding scheduled
maintenance.
5. System Architecture
5.1 Overview: The system will have a client-server architecture with a web-based
frontend and a backend server handling data processing.
5.2 Components:
o Frontend: User interface developed using HTML, CSS, and JavaScript
frameworks.
o Backend: APIs developed in Python using Flask or Django.
o Database: SQL or NoSQL database for storing resumes and user data.
6. Dependencies
External libraries and frameworks for NLP and machine learning (e.g., SpaCy, Scikit-
learn, TensorFlow).
Cloud services for hosting and data storage.
1. Development Environment
Developer Workstations:
o CPU: Multi-core processor (e.g., Intel i5 or AMD Ryzen 5 or better)
o RAM: Minimum 8 GB (16 GB recommended for NLP tasks)
o Storage: SSD with at least 512 GB (for fast access to libraries and datasets)
o GPU: NVIDIA GPU (e.g., GTX 1660 or RTX 2060) for training machine
learning models, if applicable
o Network: Reliable internet connection for accessing libraries and cloud services
3.3 Model Selection And Architecture
Model Selection
1. NLP Techniques for Resume Parsing and Ranking
Text Preprocessing:
o Tokenization: Use libraries like SpaCy or NLTK to break resumes and job
descriptions into tokens.
o Stop Word Removal: Remove common words that do not add significant
meaning.
o Stemming/Lemmatization: Reduce words to their base forms.
Ranking Models:
o Cosine Similarity: For comparing document similarity based on vector
representations.
o Logistic Regression: For binary classification tasks (e.g., qualified vs. not
qualified).
o Support Vector Machines (SVM): Effective for high-dimensional spaces, good
for text classification.
o Ensemble Methods: Combine multiple models to improve accuracy (e.g.,
Random Forest, Gradient Boosting).
o Deep Learning Models:
Use a fine-tuned BERT or similar transformer-based models for advanced
ranking based on the entire text context.
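To make the "qualified vs. not qualified" classification idea concrete, here is a small sketch of logistic regression trained by batch gradient descent. It is an illustration only: the single input feature (a resume-to-job similarity score), the toy data, and the function names are assumptions, and a real system would more likely rely on a library implementation such as scikit-learn's LogisticRegression.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=1.0, epochs=1000):
    """Fit p(qualified | x) = sigmoid(w * x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for each training example.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Toy data (made up): x is a similarity score, y is 1 for "qualified".
scores = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]
w, b = train_logistic(scores, labels)

for s in (0.85, 0.15):
    label = "qualified" if sigmoid(w * s + b) > 0.5 else "not qualified"
    print(s, label)
```

The predicted probability can also be used directly as a ranking score, which is one way a classifier of this kind could feed the ranking step.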
Figure 1: Architecture for Intelligent Resume Screening and Ranking System Using NLP
The diagram represents a system for automating the process of matching job applicants with job
postings.
1. Applicant Side (Top Left)
o CV Upload: Job applicants upload their CVs, which are usually unstructured text
files.
o Unstructured Resume: The uploaded CV is in an unstructured format, which
means it's not organized in a way that can be easily processed by a computer.
o Section-Based Segmentation: The system processes the unstructured resume and
breaks it down into segments based on sections (e.g., Education, Experience,
Skills).
o Filtration Module:
Insignificant Term Removal: This module removes unnecessary or
irrelevant terms to streamline the information and extract only useful data.
Skill Set Extraction: After filtering, a set of relevant skills is extracted
from the resume.
3. Classification Module
o Skill Knowledge Base: A central database that stores skills and knowledge areas
relevant to various job roles.
o Classified Resume and Job Post: Using the Skill Knowledge Base, both resumes
and job postings are classified and standardized in a similar format for better
comparison.
This system effectively automates the recruitment process by extracting and matching skills
between applicants and job posts, improving the efficiency and accuracy of candidate selection.
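The section-based segmentation and skill classification steps described above could look roughly like the sketch below. The heading list, the knowledge-base entries, and the function names are hypothetical and chosen only for illustration. Real resumes use far more heading variants than this handles.

```python
import re

# Assumed section headings; real resumes use many more variants.
SECTION_HEADINGS = ("education", "experience", "skills", "projects")

# Hypothetical skill knowledge base mapping surface forms to canonical names.
SKILL_KNOWLEDGE_BASE = {
    "js": "JavaScript",
    "javascript": "JavaScript",
    "py": "Python",
    "python": "Python",
}

def segment_resume(text):
    """Split plain resume text into sections keyed by heading name."""
    sections = {"header": []}
    current = "header"
    for line in text.splitlines():
        key = line.strip().lower().rstrip(":")
        if key in SECTION_HEADINGS:
            current = key
            sections.setdefault(current, [])
        elif line.strip():
            sections[current].append(line.strip())
    return {name: "\n".join(lines) for name, lines in sections.items()}

def classify_skills(skills_text):
    """Map raw skill strings to canonical names using the knowledge base."""
    terms = [t.strip().lower() for t in re.split(r"[,;\n]", skills_text)]
    return {SKILL_KNOWLEDGE_BASE[t] for t in terms if t in SKILL_KNOWLEDGE_BASE}

resume = "Jane Doe\nSkills:\nPython, JS\nEducation\nB.Tech CSE"
parts = segment_resume(resume)
print(parts["education"])
print(sorted(classify_skills(parts["skills"])))
```

Applying the same classify_skills step to a job posting would put both sides in the same canonical vocabulary before they are compared.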
CHAPTER 4
4.1 Introduction
In the modern recruitment landscape, recruiters face the challenge of processing thousands of
resumes, leading to time inefficiencies and a risk of human error in selecting the most qualified
candidates. An intelligent resume screening and ranking system using Natural Language
Processing (NLP) addresses these issues by automating and streamlining the initial stages of
candidate selection. This project proposes a solution that leverages NLP to evaluate and rank
resumes based on relevant skills, experience, education, and other job-specific criteria, creating a
more efficient, unbiased, and accurate hiring process.
The system will parse and analyze unstructured data in resumes, transforming it into a structured
format that highlights key competencies. Using advanced NLP techniques, the system will
extract relevant information and rank candidates according to the alignment of their
qualifications with the job description. This approach ensures that only the most suitable
candidates are shortlisted, allowing recruiters to focus on high-potential candidates, reduce hiring
time, and improve recruitment quality.
Figure 2: DFD for Intelligent Resume Screening and Ranking System Using NLP
The dataset for an Intelligent Resume Screening and Ranking System is integral to building
an effective model that can parse, analyze, and rank resumes accurately. This dataset typically
consists of two main components: Resumes and Job Descriptions. Here is a breakdown of the
elements within each:
1. Resumes Dataset
Text Format: Resumes are often in various formats (PDF, DOCX, TXT), so they may
need to be converted into text format for analysis.
Attributes/Features:
o Name: Candidate's full name.
o Contact Information: Email address, phone number, LinkedIn profile, or other
contact details.
o Summary or Objective: A brief section where candidates summarize their
professional goals.
o Skills: A list of technical and soft skills, both explicit (e.g., "Python, Machine
Learning") and implicit (derived through NLP analysis).
o Education: Degree(s) attained, field of study, institution names, graduation dates.
o Experience: Company names, job titles, job responsibilities, and dates of
employment for each role.
o Certifications: Any professional certifications or courses completed.
o Projects: Descriptions of relevant projects, highlighting technical skills and tools
used.
o Achievements: Notable accomplishments in professional or academic settings.
Labels (Optional): If the dataset is labeled (e.g., "Good Fit", "Poor Fit") based on prior
recruitment decisions, it can be used to train the model in a supervised manner.
2. Job Descriptions Dataset
Text Format: Job descriptions can also come in various formats, which should be
standardized for easier parsing.
Attributes/Features:
o Job Title: Title of the job position (e.g., "Data Scientist", "Software Engineer").
o Required Skills: Explicitly listed skills required for the job, which may be used
to match against candidate skills.
o Preferred Skills: Skills that are not mandatory but preferred, which can influence
ranking.
o Experience Level: Minimum experience required (e.g., "3+ years in software
development").
o Education Requirements: Minimum degree or certification requirements.
o Job Responsibilities: Duties and tasks expected to be performed, useful for
matching with candidates’ past job responsibilities.
o Location: Job location, which may be a factor if candidates have location
constraints.
o Job Description Text: Full text describing the role, responsibilities, and
qualifications.
1. Parse and Structure Information: NLP algorithms will be trained to extract structured
information from resumes and job descriptions.
2. Feature Matching: Compare candidate qualifications with job requirements, using both
direct and semantic matching.
3. Ranking Candidates: Assign a compatibility score to each candidate based on the
alignment of resume features with job description attributes.
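As an illustration of feature matching and ranking over the attributes listed above, the following sketch scores each candidate against a job's required skills, preferred skills, and minimum experience. The class names, field names, and the 0.6/0.2/0.2 weights are assumptions for the example, not values taken from the project.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    required_skills: set
    preferred_skills: set = field(default_factory=set)
    min_years: float = 0.0

@dataclass
class Candidate:
    name: str
    skills: set
    years: float = 0.0

def compatibility(c, job):
    """Return a score in [0, 1]; the weights are illustrative assumptions."""
    req = (len(c.skills & job.required_skills) / len(job.required_skills)
           if job.required_skills else 1.0)
    pref = (len(c.skills & job.preferred_skills) / len(job.preferred_skills)
            if job.preferred_skills else 0.0)
    # Full credit once the minimum experience is met, partial credit below it.
    exp = 1.0 if c.years >= job.min_years else c.years / job.min_years
    return 0.6 * req + 0.2 * pref + 0.2 * exp

def rank(candidates, job):
    """Sort candidates from best to worst match."""
    return sorted(candidates, key=lambda c: compatibility(c, job), reverse=True)

job = Job({"python", "sql"}, {"docker"}, min_years=2)
pool = [Candidate("B", {"python"}, 1), Candidate("A", {"python", "sql", "docker"}, 3)]
for c in rank(pool, job):
    print(c.name, round(compatibility(c, job), 2))
```

This covers only direct matching. Semantic matching would replace the set intersections with a similarity measure over text representations.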
4.4 Data Preprocessing Techniques
1. Text Processing Methods
Tokenization: Breaks down text into individual words or phrases (tokens). This is the first
step in analysing textual data.
Stop-word Removal: Eliminates common words (e.g., "and," "the," "is") that do not
contribute meaningful information to the analysis.
Lemmatization/Stemming: Reduces words to their base or root forms (e.g., "running" to
"run"), ensuring that variations of a word are treated as the same term.
Cosine Similarity: Measures the cosine of the angle between two non-zero vectors (representing
resumes and job descriptions) in a multi-dimensional space. In general it ranges from -1 to 1; for
the non-negative term-count or TF-IDF vectors typically used for text, it lies between 0 and 1.
Higher values indicate greater similarity.
Jaccard Similarity: Computes similarity between two sets by dividing the size of the intersection
by the size of the union. It is useful for comparing the presence of keywords in resumes and job
descriptions.
Euclidean Distance: A geometric measure that calculates the straight-line distance between two
points (vectors) in a multi-dimensional space, often used for ranking based on distance metrics.
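The three measures above can be written in a few lines each. The sketch below assumes resumes and job descriptions are represented as simple term-count vectors (collections.Counter) or keyword sets. That representation is an assumption for the example; TF-IDF or embedding vectors would follow the same formulas.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def euclidean_distance(a, b):
    """Straight-line distance between two term-count vectors."""
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))

resume = Counter("python sql python".split())
job = Counter("python sql".split())

print(round(cosine_similarity(resume, job), 4))
print(jaccard_similarity({"python", "sql"}, {"python", "java"}))
print(euclidean_distance(resume, job))
```

Cosine and Jaccard are similarities (higher is better), while Euclidean distance is a distance (lower is better), so a ranking step has to sort in the opposite direction when it uses the latter.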
2. Ranking Algorithms
Machine Learning Models: Supervised learning algorithms such as:
o Random Forest: An ensemble method that builds multiple decision trees and
merges them to improve prediction accuracy.
o Support Vector Machines (SVM): A classification technique that finds the
hyperplane that best separates different classes in the feature space.
o Gradient Boosting: Builds models in a sequential manner, where each new model
corrects errors made by the previous ones.
Neural Networks: Deep learning models can be used for more complex ranking tasks. Recurrent
Neural Networks (RNNs) or Transformers can learn from sequences of text data, improving
understanding of context.
Fairness Algorithms: Implement algorithms to reduce bias in candidate evaluation, ensuring that
the system does not favour or discriminate against any group. Techniques include re-weighting,
adversarial debiasing, and incorporating fairness constraints into the ranking algorithms.
Data Balancing: Ensures that training datasets are representative of diverse candidate
backgrounds, minimizing the risk of perpetuating existing biases in hiring practices.
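One common form of the re-weighting technique mentioned above assigns each training example the weight P(group) × P(label) / P(group, label), so that group membership and outcome look statistically independent in the weighted data. The sketch below shows that calculation on made-up data. The group names and labels are hypothetical, and this is only one of several possible re-weighting schemes.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Per-sample weight w = P(group) * P(label) / P(group, label)."""
    n = len(groups)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

# Made-up data: group of each past applicant and whether they were shortlisted.
groups = ["A", "A", "A", "B"]
labels = [1, 1, 0, 0]
print(reweighing_weights(groups, labels))
```

Combinations that are over-represented relative to independence (here, group A with a positive label) receive weights below 1, and under-represented ones receive weights above 1. The weights would then be passed to a learner that accepts per-sample weights.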
CHAPTER 5
5.1 Introduction
The Intelligent Resume Screening and Ranking System is deployed on a cloud-based
infrastructure, providing a responsive and scalable backend to handle resume parsing, NLP
processing, and candidate ranking in real-time. Through a user-friendly dashboard, recruiters can
view ranked candidate lists, access match scores, and filter candidates by criteria such as skills
and experience, making the hiring process efficient and intuitive.
In terms of results, the system accurately identifies top candidates based on job requirements,
achieving high precision and recall in matching skills, experience, and qualifications. By
automating and enhancing the initial screening phase, this system significantly reduces time-to-
hire and ensures a fair, data-driven candidate selection process.
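One way to measure the precision and recall mentioned above, assuming a recruiter-approved shortlist is available as ground truth, is sketched below. The candidate IDs are made up, and the function name is an assumption for the example.

```python
def precision_recall_f1(predicted, relevant):
    """Compare the system's shortlist with the set of truly suitable candidates."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

system_shortlist = {"c1", "c2", "c3", "c4"}   # candidates the system ranked highest
recruiter_choice = {"c1", "c2", "c5"}          # candidates a recruiter judged suitable

p, r, f1 = precision_recall_f1(system_shortlist, recruiter_choice)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Precision answers how many shortlisted candidates were actually suitable, and recall answers how many suitable candidates the system managed to shortlist. F1 is their harmonic mean.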
import os
import uuid
import aiofiles
import fitz # PyMuPDF for PDF text extraction
from fastapi import FastAPI, File, UploadFile, Form
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
import re
app = FastAPI()
    text = ""
    with fitz.open(file_path) as doc:
        for page in doc:
            text += page.get_text("text")
    return text
        print(f"Failed to process query: {e}")
        return JSONResponse(content={"error": f"Failed to process query: {e}"},
                            status_code=500)
@app.post("/upload-pdf")
async def upload_pdf(pdf_file: UploadFile = File(...)):
    """Endpoint to upload PDF and extract text and details."""
    file_name = f"{uuid.uuid4()}.pdf"
    file_path = f"./temp/{file_name}"
        print(f"Extracted text from PDF: {pdf_text}")  # Log the full extracted text
        pdf_storage[serial_number] = {
            "text": pdf_text,
            "details": parsed_details,
            "phone_number": phone_number
        }  # Store text, details, and phone number
    except Exception as e:
        print(f"Failed to extract text from PDF: {e}")
        return JSONResponse(content={"error": f"Failed to extract text from PDF: {e}"},
                            status_code=500)
@app.post("/ask-query")
async def ask_query(serial_number: str = Form(...), query: str = Form(...)):
    """Endpoint to ask a query about a previously uploaded PDF."""
    if serial_number not in pdf_storage:
        return JSONResponse(content={"error": "Invalid serial number or PDF not found."},
                            status_code=404)
    pdf_text = pdf_storage[serial_number]["text"]
index.html file
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Intelligent Resume Screening and Ranking System Using NLP</title>
<link rel="stylesheet" href="styles.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/xlsx/0.17.0/xlsx.full.min.js"></script>
</head>
<body>
<script>
const results = [];
"C": 7,
"JAVA": 9,
"Data Structures": 8,
"DBMS": 6,
"HTML": 5,
"CSS": 5,
"Communication Skills": 4,
"Ability to Work Under Pressure": 4,
"Adaptability": 4,
"Programming": 6,
"Fast Learner": 3,
"Ability to work in a team": 4,
"Critical thinking and problem solving": 5,
"Team leadership": 4,
"Strong organizational and time-management skills": 5,
"Flexible and easily adaptable": 4,
"Ability to work independently and as part of a team": 5,
"Detail-oriented and a keen observer": 4,
"Good at communicating and understanding others needs and deliver results": 5,
"React": 9,
"JavaScript": 8,
"Cloud Computing": 10,
"Agile Methodologies": 8,
"Project Management": 7,
"Version Control (Git)": 6,
"RESTful APIs": 6,
"Cybersecurity": 7,
"Data Visualization": 9,
"Business Intelligence": 8,
"Software Testing": 7,
"Network Management": 6,
"Artificial Intelligence": 10,
"User Experience (UX) Design": 8,
"Technical Writing": 5,
"Digital Marketing": 6
};
const fileInput = document.getElementById('resumeFile');
const formData = new FormData();
try {
const response = await fetch('http://localhost:8000/upload-pdf', {
method: 'POST',
body: formData
});
if (!response.ok) {
throw new Error('Network response was not ok');
}
function displayResult(data) {
const resultDiv = document.getElementById('result');
const serialNumber = results.length + 1;
if (data.details) {
const nameMatch = data.details.response.match(/(?:Name:\s*\*\*\s*)([^\n]+)/i);
const emailMatch = data.details.response.match(/(?:Email:\s*\*\*\s*)([^\n]+)/i);
const skillsMatch = data.details.response.match(/(?:Skills:\s*\*\*\s*)([\s\S]*?)(?=\n\s*\n|$)/i);
if (relevantSkillsCount >= 5) {
// Ensure score is at least 60 if five or more skills are present
score = Math.max(score, 60);
} else {
score = 0; // Reset score if less than five relevant skills
}
const parsedDetails = {
serialNumber: serialNumber,
name: nameMatch ? nameMatch[1].trim() : 'N/A',
email: emailMatch ? emailMatch[1].trim() : 'N/A',
skills: skillsList.length > 0 ? skillsList : ['N/A'],
score: score,
predictedJob: predictedJobs.length > 0 ? predictedJobs.join(', ') : 'N/A'
};
results.push(parsedDetails);
updateTable();
} else {
resultDiv.innerHTML = 'No details found.';
}
}
function updateTable() {
const resultDiv = document.getElementById('result');
resultDiv.innerHTML = `
<h2>Parsed Details:</h2>
<table>
<tr>
<th>Serial Number</th>
<th>Name</th>
<th>Email</th>
<th>Skills</th>
<th>Score</th>
<th>Predicted Job</th>
</tr>
${results.map(r => `
<tr>
<td>${r.serialNumber}</td>
<td>${r.name}</td>
<td>${r.email}</td>
<td>
<ul>
${r.skills.map(skill => `<li>${skill}</li>`).join('')}
</ul>
</td>
<td>${r.score}</td>
<td>${r.predictedJob}</td>
</tr>
`).join('')}
</table>
`;
document.getElementById('downloadExcel').style.display = 'block';
}
document.getElementById('downloadExcel').addEventListener('click', () => {
const wb = XLSX.utils.book_new();
const ws = XLSX.utils.json_to_sheet(results.map(r => ({
'Serial Number': r.serialNumber,
'Name': r.name,
'Email': r.email,
'Skills': r.skills.join(', '),
'Score': r.score,
'Predicted Job': r.predictedJob
})));
XLSX.writeFile(wb, filePath);
});
</script>
</body>
</html>
Styles.css file
/* Basic Reset */
*{
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
  font-family: 'Arial', sans-serif;
  padding: 20px;
  background: linear-gradient(to right, #6a11cb, #2575fc);
  color: #333;
}
h1 {
  text-align: center;
  color: #fff;
  margin-bottom: 20px;
}
form {
  display: flex;
  flex-direction: column;
  align-items: center;
  margin-bottom: 20px;
  background-color: rgba(255, 255, 255, 0.8);
  padding: 20px;
  border-radius: 10px;
  box-shadow: 0 4px 15px rgba(0, 0, 0, 0.2);
}
input[type="file"] {
  margin: 10px 0;
  border: 2px solid #007bff;
  border-radius: 5px;
  padding: 10px;
  outline: none;
  width: 100%;
}
button {
  padding: 10px 15px;
  background-color: #007bff;
  color: white;
  border: none;
  border-radius: 5px;
  cursor: pointer;
  font-size: 16px;
  transition: background-color 0.3s ease;
}
button:hover {
  background-color: #0056b3;
}
.result {
  margin-top: 20px;
  background-color: rgba(255, 255, 255, 0.9);
  border-radius: 10px;
  padding: 20px;
  box-shadow: 0 2px 5px rgba(0, 0, 0, 0.1);
}
table {
  width: 100%;
  border-collapse: collapse;
  margin-top: 10px;
}
th, td {
  border: 1px solid #ddd;
  padding: 12px;
  text-align: left;
}
th {
  background-color: #007bff;
  color: white;
}
tr:nth-child(even) {
  background-color: #f2f2f2;
}
tr:hover {
  background-color: #e1f5fe;
}
ul {
  padding-left: 20px;
}
#downloadExcel {
  margin-top: 20px;
  padding: 10px 15px;
  background-color: #28a745;
  color: white;
  border: none;
  border-radius: 5px;
  cursor: pointer;
  font-size: 16px;
  display: none; /* Initially hidden */
  transition: background-color 0.3s ease;
}
#downloadExcel:hover {
  background-color: #218838;
}
button, #downloadExcel {
  width: 100%;
}
input[type="file"] {
  width: 100%;
}
Data Preparation: Collect and preprocess data by cleaning, normalizing, and extracting
structured information (e.g., skills, experience) using NER. Generate embeddings (TF-
IDF, Word2Vec, BERT) to capture semantic meaning.
Feature Engineering: Extract key features like skills, education, and experience, and
create embeddings to represent resumes and job descriptions.
Model Selection and Training:
Semantic Matching: Calculate similarity scores using cosine similarity.
Classification and Ranking: Use models like logistic regression or random forests for
candidate fit and ranking.
Hyperparameter Tuning and Evaluation: Optimize and evaluate models using metrics
such as precision, recall, and F1 score.
Deployment and Continuous Learning: Deploy on cloud infrastructure, serving ranked
candidate lists via an API. Incorporate feedback to continuously improve the model.
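The semantic matching step can be illustrated with a minimal sketch, assuming plain TF-IDF vectors rather than Word2Vec or BERT embeddings. The function names (tokenize, tfidfVectors, cosineSimilarity) and the sample texts are hypothetical and are not taken from the project code.

```javascript
// Split text into lowercase tokens; '+' and '#' are kept so "c++" and "c#" survive.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9+#]+/g) || [];
}

// Build one TF-IDF vector (a Map from term to weight) per document.
function tfidfVectors(documents) {
  const tokenized = documents.map(tokenize);
  const docFreq = new Map();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  const n = documents.length;
  return tokenized.map(tokens => {
    const vector = new Map();
    for (const term of tokens) {
      vector.set(term, (vector.get(term) || 0) + 1);
    }
    for (const [term, count] of vector) {
      // Smoothed IDF, so terms shared by every document keep a non-zero weight.
      const idf = Math.log((1 + n) / (1 + docFreq.get(term))) + 1;
      vector.set(term, (count / tokens.length) * idf);
    }
    return vector;
  });
}

// Cosine similarity between two sparse vectors; returns 0 for an empty vector.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [term, weight] of a) {
    dot += weight * (b.get(term) || 0);
  }
  const norm = v => Math.sqrt([...v.values()].reduce((sum, w) => sum + w * w, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// Hypothetical job description and resumes, used only for illustration.
const jobDescription = 'Python developer with machine learning and SQL experience';
const resumes = [
  'Experienced Python developer skilled in machine learning and SQL',
  'Graphic designer with Photoshop and illustration skills',
];
const [jobVector, ...resumeVectors] = tfidfVectors([jobDescription, ...resumes]);
const similarities = resumeVectors.map(v => cosineSimilarity(jobVector, v));
console.log(similarities);
```

The first resume shares most of its terms with the job description and therefore receives the higher similarity. A production system would likely replace the TF-IDF vectors with learned embeddings while keeping the same cosine comparison.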
The model evaluation for the Intelligent Resume Screening and Ranking System uses several
metrics to gauge accuracy and efficiency in parsing and ranking candidate resumes. Key metrics
include:
1. Precision: This metric assesses the proportion of relevant entities correctly identified,
such as accurately extracted names, email addresses, and skills from resumes. High
precision ensures that false positives are minimized, which is crucial for parsing
candidate details with minimal errors.
2. Recall: Recall measures the system's ability to identify all relevant information in a
resume, such as all the skills listed by a candidate. High recall indicates that the system is
comprehensive in extracting information, ensuring important details are not missed.
3. F1 Score: This combines precision and recall, providing a balanced view of accuracy. A
higher F1 score reflects a model that performs well in both precision and recall, essential
for handling the variability in resume formats and content.
4. Accuracy: Indicates the overall correctness of the system in extracting and parsing fields
like name, email, and skills. Accuracy is particularly useful for evaluating the parsing
function’s ability to handle diverse resume formats without misinterpretation.
5. Error Logging and Debugging Information: The system captures detailed logs,
particularly when errors arise during PDF text extraction or the LangChain response
generation. This information assists in fine-tuning the model and improving its robustness
against different file formats or content structures.
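As a minimal sketch of how the first three metrics could be computed for one resume, the example below compares a list of extracted skills with a hand-labelled reference list. The function name extractionMetrics and the sample skill lists are hypothetical; matching is assumed to be case-insensitive and exact.

```javascript
// Compare extracted items with a reference list and report precision, recall, and F1.
function extractionMetrics(predicted, expected) {
  const normalize = items => new Set(items.map(item => item.trim().toLowerCase()));
  const predictedSet = normalize(predicted);
  const expectedSet = normalize(expected);

  // True positives: extracted items that also appear in the reference list.
  let truePositives = 0;
  for (const item of predictedSet) {
    if (expectedSet.has(item)) truePositives += 1;
  }

  const precision = predictedSet.size === 0 ? 0 : truePositives / predictedSet.size;
  const recall = expectedSet.size === 0 ? 0 : truePositives / expectedSet.size;
  const f1 = precision + recall === 0
    ? 0
    : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// Hypothetical example: three extracted skills, four skills in the reference list.
const metrics = extractionMetrics(
  ['Python', 'SQL', 'Excel'],
  ['python', 'sql', 'java', 'docker']
);
console.log(metrics);
```

Here two of the three extracted skills are correct (precision 2/3) and two of the four reference skills are found (recall 1/2), which gives an F1 score of 4/7.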
5.5 Model Deployment: Testing and Validation
The deployment process for the Intelligent Resume Screening and Ranking System involves
setting up a cloud-based backend API using FastAPI. This setup ensures efficient and scalable
interaction for real-time resume parsing and information retrieval:
1. API Design: The system provides two main endpoints:
o /upload-pdf: This endpoint allows uploading PDF resumes. Upon upload, the
system extracts text, applies the LangChain model for parsing fields (e.g., name,
email, and skills), and stores the results in memory for easy retrieval.
2. Integration with NLP Model: The API calls the Google Gemini model through LangChain
to parse and structure resume content. Structured prompt templates direct the model to
extract and format key details from resumes, so that recruiters can analyze candidate
information without manual data entry.
3. Testing: Validation is carried out by uploading multiple resume files, testing various
PDF formats to ensure reliable text extraction and parsing. Errors, such as format
incompatibility or parsing failures, are logged, and debugging messages are printed for
troubleshooting.
4. Real-Time Validation: Through FastAPI and CORS middleware, the system handles
cross-origin requests, making it compatible with various frontend applications, including
recruiter dashboards. Real-time validation provides immediate feedback on parsing
accuracy, aiding in continuous improvement.
5. Excel Export for Analysis: Parsed resume details are stored and presented in a table
format on the front end. Recruiters can download these results in Excel format, enabling
detailed analysis and further use in applicant tracking systems.
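A frontend could call the upload endpoint roughly as sketched below. This is an illustration under stated assumptions: the base URL http://localhost:8000, the form field name "file", and a JSON response body are all assumptions, and the helper names buildUploadRequest and uploadResume are hypothetical. Only the /upload-pdf path comes from the description above.

```javascript
// Hypothetical helper: builds the multipart request for the /upload-pdf endpoint.
// The default base URL and the "file" field name are assumptions.
function buildUploadRequest(pdfBlob, fileName, baseUrl = 'http://localhost:8000') {
  const form = new FormData();
  form.append('file', pdfBlob, fileName);
  return {
    url: `${baseUrl}/upload-pdf`,
    options: { method: 'POST', body: form },
  };
}

// Hypothetical helper: sends the resume and returns the parsed details,
// assuming the API answers with JSON.
async function uploadResume(pdfBlob, fileName, baseUrl) {
  const { url, options } = buildUploadRequest(pdfBlob, fileName, baseUrl);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP status ${response.status}`);
  }
  return response.json();
}
```

In the browser, the selected file from the file input can be passed directly, for example uploadResume(input.files[0], input.files[0].name). The Content-Type header is left unset so that the runtime supplies the multipart boundary itself.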
5.6 Result
This system analyzes and ranks resumes based on criteria such as skills, score, and predicted
job role. Its functionality and output, as shown in Fig 3, are as follows:
1. File Upload Section
There is an option to "Choose File" to upload resumes in PDF format, which are then
processed for analysis.
2. Download Option
A "Download Excel" button exports the analyzed data in Excel format.
3. Parsed Details Table
This table shows detailed information extracted from the resumes, such as:
Serial Number: Unique identifier for each entry.
Name and Email: Basic information of the candidate.
Skills: Key skills extracted from each resume.
Score: A numerical score calculated based on the resume's content and relevance
to job criteria.
Predicted Job: Suggested job roles (like Software Developer, Project Manager,
System Analyst, etc.) based on the skills and other factors.
This system uses NLP (Natural Language Processing) techniques to parse and interpret
information from resumes, providing a summarized view that could help recruiters quickly
identify suitable candidates for different job roles. The score and predicted job role features
allow for efficient ranking and matching of candidates to job requirements.
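The ranking step can be sketched as a sort over the parsed results. The function name rankCandidates and the sample entries are hypothetical; the field names follow the parsed details table, and breaking ties by serial number is an assumption.

```javascript
// Return a new array sorted by score (highest first), with a rank added to each entry.
// Ties keep upload order by falling back to the serial number.
function rankCandidates(results) {
  return [...results]
    .sort((a, b) => b.score - a.score || a.serialNumber - b.serialNumber)
    .map((candidate, index) => ({ rank: index + 1, ...candidate }));
}

// Hypothetical parsed results, used only for illustration.
const sampleResults = [
  { serialNumber: 1, name: 'Candidate A', score: 60, predictedJob: 'System Analyst' },
  { serialNumber: 2, name: 'Candidate B', score: 85, predictedJob: 'Software Developer' },
  { serialNumber: 3, name: 'Candidate C', score: 0, predictedJob: 'N/A' },
  { serialNumber: 4, name: 'Candidate D', score: 85, predictedJob: 'Project Manager' },
];

const ranked = rankCandidates(sampleResults);
console.log(ranked.map(c => `${c.rank}. ${c.name} (${c.score})`));
```

Copying the array before sorting leaves the original results untouched, so the table can still be shown in upload order.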
Fig 3: Intelligent Resume Screening and Ranking System Result
CHAPTER 6
6.1 Project Conclusion
The Intelligent Resume Screening and Ranking System, powered by Gemini AI, leverages
advanced NLP and machine learning capabilities to deliver highly accurate, unbiased, and
efficient candidate selection. Gemini AI’s contextual understanding enables precise parsing and
categorization of complex resume information, including skills, job titles, and experience,
ensuring that only the most relevant details are extracted and matched against job-specific
requirements. Its sophisticated ranking algorithms facilitate effective and fair candidate scoring,
minimizing bias and enhancing accuracy in the initial screening process. This adaptability and
accuracy make the tool suitable for various industries and provide companies of all sizes with a
streamlined recruitment process and a reduced time-to-hire.
6.2 Future Scope
REFERENCES
1. Li, S., & Ma, H. (2020). "An Intelligent Resume Screening System Based on Natural
Language Processing and Machine Learning." International Journal of Advanced
Computer Science and Applications, 11(3), 25-31.
2. Chandrashekar, G., & Sahin, F. (2014). "A Survey on Feature Selection Methods."
Computers & Electrical Engineering, 40(1), 16-28.
3. Sharma, S., & Jha, K. (2019). "Web Application for Screening Resumes Using Natural
Language Processing." IEEE International Conference on Nascent Technologies in
Engineering.
4. Chowdhury, A. S., & Chatterjee, P. (2019). "Automated Resume Screening: A Study
Using Machine Learning Techniques." International Journal of Computer Applications,
178(7), 24-29.
5. Rana, A., & Rahman, M. (2021). "Intelligent Resume Screening System Using Deep
Learning." International Journal of Computer Applications, 175(8), 1-7.
6. Kumar, S., & Kumari, R. (2020). "An Efficient Resume Parsing System for Automatic
Screening of Candidates." International Journal of Information Technology, 12(3), 823-
830.
7. Meena, K., & Kumar, R. (2020). "A Novel Approach for Automated Resume Screening
Using Text Mining." International Journal of Scientific & Technology Research, 9(2),
5834-5839.
8. Jain, S., & Singh, R. (2020). "Intelligent Resume Parsing and Ranking Using Machine
Learning." Journal of King Saud University - Computer and Information Sciences.
9. Alotaibi, H. M., et al. (2021). "Automated Resume Screening Using Machine Learning:
A Review." Journal of Computer Networks and Communications.
10. Ghaffari, A., & Ghasemaghaei, M. (2018). "Leveraging Natural Language Processing
for Resume Screening." Proceedings of the 2018 IEEE International Conference on Big
Data.