
Summer Internship Report

Indian Oil Corporation Ltd.


Guwahati Refinery

Topic: Resume Screening Website using Web Development, Machine Learning, and Natural Language Processing

By:
Nilakhya Mandita Bordoloi
Computer Science and Engineering,
Jorhat Engineering College

Under the Guidance of:


Jon Jonak Phukan
Chief Manager, Information Systems Department

Organization:
Indian Oil Corporation Ltd., Guwahati Refinery

Date: 30th July, 2024


Abstract

This project report details the development of a Resume Screening Website created during a summer internship at IOCL Guwahati Refinery under the Information Systems Department. The website is designed to accept resumes in both text and PDF formats, categorizing them based on the provided data and recommending appropriate job roles to the resume owners.
The system parses essential information such as the name, personal details, skills, and more from the resumes. For the frontend development, we utilized HTML, CSS, and JavaScript, along with the Flask framework for the backend.
In terms of machine learning, we identified Logistic Regression as the most suitable model for categorization, while the Random Forest Classifier was chosen for job role recommendations. The project also incorporates the TF-IDF Vectorizer for feature extraction. Additionally, Natural Language Processing (NLP) techniques and regular expressions were employed to parse and analyze the resume content effectively.
The system aims to streamline the resume screening process, providing efficient categorization and job role recommendations based on the extracted information.

Acknowledgements

We would like to express our deepest gratitude to our internship guide, Mr. Jon Jonak Phukan, Chief Manager, Information Systems Department, IOCL Guwahati Refinery, for his invaluable guidance, support, and encouragement throughout this project. His insights and expertise were crucial in navigating the complexities of developing the Resume Screening Website.
We also extend our sincere thanks to the entire Information Systems Department team at IOCL Guwahati Refinery for providing us with the necessary resources and a conducive environment for our project work.
Additionally, we are grateful to the management and staff of IOCL Guwahati Refinery for offering us this internship opportunity, which has been an invaluable learning experience.
Finally, we would like to thank our families and friends for their unwavering support and encouragement throughout the internship period.

Contents

Abstract 1

Acknowledgements 2

1 Introduction 5
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Project Description 7
2.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Solution Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Project Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 Technical Stack 9
3.1 Web Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.1.1 HTML (HyperText Markup Language) . . . . . . . . . . . . . . . 9
3.1.2 CSS (Cascading Style Sheets) . . . . . . . . . . . . . . . . . . . . 9
3.1.3 JavaScript . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.1.4 Flask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.1 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.2 Random Forest Classifier . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 Natural Language Processing (NLP) . . . . . . . . . . . . . . . . . . . . 10
3.4 Platforms and Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4.1 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4.2 Visual Studio Code (VS Code) . . . . . . . . . . . . . . . . . . . 11
3.4.3 Port Number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4.4 Jupyter Notebook . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4.5 Python Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

4 Development Process 12
4.1 Data Collection and Preprocessing . . . . . . . . . . . . . . . . . . . . . 12
4.1.1 Clean Resume Data . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.1.2 Job Dataset with Features . . . . . . . . . . . . . . . . . . . . . . 12
4.2 Integration with Flask and NLP . . . . . . . . . . . . . . . . . . . . . . . 13
4.3 Web Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.4 Testing and Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 14

5 Implementation Details 15
5.1 Integration with Flask and NLP . . . . . . . . . . . . . . . . . . . . . . . 15
5.2 Frontend Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.3 Testing and Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.4 Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

6 Results and Discussion 17


6.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6.1.1 Categorization Model . . . . . . . . . . . . . . . . . . . . . . . . . 17
6.1.2 Recommendation Model . . . . . . . . . . . . . . . . . . . . . . . 17
6.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.2.1 Model Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.2.2 Challenges and Limitations . . . . . . . . . . . . . . . . . . . . . 18
6.2.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

7 Conclusion and Future Work 21


7.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
7.1.1 Summary of Achievements . . . . . . . . . . . . . . . . . . . . . . 21
7.1.2 Impact and Significance . . . . . . . . . . . . . . . . . . . . . . . 21
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
7.2.1 Data and Model Improvements . . . . . . . . . . . . . . . . . . . 22
7.2.2 Real-World Testing and Validation . . . . . . . . . . . . . . . . . 22
7.2.3 Expansion and Integration . . . . . . . . . . . . . . . . . . . . . . 22
7.2.4 Ethical and Privacy Considerations . . . . . . . . . . . . . . . . . 22
7.3 Final Thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
7.4 Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
7.4.1 GitHub Repository . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Chapter 1

Introduction

1.1 Background
In the digital age, organizations receive an overwhelming number of resumes for every
job opening, making the recruitment process increasingly complex and time-consuming.
Traditional methods of resume screening, which often involve manual review by human
resources personnel, can be inefficient, inconsistent, and prone to bias. This challenge has
driven the development of automated systems that leverage advancements in technology
to streamline and enhance the hiring process.
Our project, developed during a summer internship at IOCL Guwahati Refinery under the Information Systems Department, addresses this challenge by creating a Resume Screening Website. This website utilizes web development technologies, machine learning, and natural language processing (NLP) to automate the resume screening process, providing an efficient and scalable solution for organizations.

1.2 Purpose
The primary purpose of the resume screening system is to automate the initial stages of the recruitment process by categorizing resumes based on predefined criteria and recommending suitable job roles to applicants. By doing so, the system aims to reduce the time and effort required for manual resume screening, increase the accuracy of candidate selection, and minimize potential biases in the evaluation process.
The system extracts and analyzes key information from resumes, such as personal details, educational background, work experience, and skills. It then matches this information against job descriptions and criteria to categorize candidates and provide job recommendations. This automated process not only speeds up the recruitment cycle but also ensures a more objective assessment of candidates.

1.3 Scope
The scope of the project encompasses the following components:

• Resume Parsing: The system accepts resumes in both text and PDF formats, extracting essential information using NLP techniques and regular expressions.

• Categorization: Utilizing machine learning models, the system categorizes resumes based on various criteria such as skills, experience, and qualifications.

• Job Recommendation: The system provides job role recommendations to candidates based on the extracted information and job requirements.

• Web Interface: A user-friendly web interface developed using HTML, CSS, and JavaScript, with Flask as the backend framework, allows users to upload resumes and view results.

1.4 Objectives
The key objectives of the project are:

• Automate Resume Screening: Develop a system capable of automatically parsing and categorizing resumes to streamline the recruitment process.

• Enhance Accuracy and Efficiency: Use advanced machine learning models like Logistic Regression for categorization and Random Forest Classifier for job recommendations to ensure accurate and reliable results.

• Reduce Bias: Implement an objective system to minimize human biases in the resume screening process, promoting fair evaluation of all candidates.

• Provide User-Friendly Interface: Create an intuitive and accessible web interface for users to interact with the system seamlessly.

• Scalability and Adaptability: Design the system to handle a large volume of resumes and adapt to various job roles and industries.

Chapter 2

Project Description

2.1 Problem Statement


The recruitment process in large organizations often involves handling a substantial number of job applications, each accompanied by a resume. The traditional method of manually screening these resumes is not only time-consuming but also susceptible to human errors and biases. These challenges are particularly pronounced in high-volume recruitment scenarios, where recruiters must sift through hundreds or even thousands of applications to identify suitable candidates.
The key issues identified in the traditional resume screening process include:
• Time-Consuming Process: Manual review of resumes is a lengthy process that slows down the recruitment cycle.
• Inconsistent Evaluation: Different recruiters may evaluate resumes differently, leading to inconsistency in candidate assessment.
• Potential for Bias: Human biases, whether conscious or unconscious, can influence the selection process, potentially leading to unfair hiring practices.
• Difficulty in Matching Candidates to Roles: Manually matching candidate skills and experiences to specific job roles can be challenging, especially in cases where applicants do not clearly articulate their qualifications.
These challenges necessitate the development of an automated system that can efficiently and accurately screen resumes, categorize candidates, and recommend appropriate job roles, thereby optimizing the recruitment process.

2.2 Solution Overview


To address these challenges, we developed an automated Resume Screening Website during our internship at IOCL Guwahati Refinery under the Information Systems Department. The solution leverages modern web development, machine learning, and natural language processing (NLP) technologies to streamline the resume screening process.
The system comprises the following key components:

• Resume Parsing: The system can accept resumes in text or PDF format. Using NLP techniques and regular expressions, it extracts critical information such as personal details, educational background, work experience, and skills.

• Categorization: The system uses a Logistic Regression model to categorize resumes based on predefined criteria, such as skill sets, industry experience, and educational qualifications. This model was selected for its simplicity and effectiveness in binary and multi-class classification tasks.

• Job Recommendation: A Random Forest Classifier is employed to recommend suitable job roles to candidates. This model is chosen for its robustness and ability to handle complex data patterns, providing reliable recommendations based on the candidate’s profile.

• Web Interface: A user-friendly web interface, built using HTML, CSS, and JavaScript, with Flask as the backend framework, allows users to upload resumes and view categorization and job recommendation results. The interface is designed to be intuitive and accessible, ensuring a seamless user experience.

2.3 Project Scope


The scope of this project includes the following aspects:

• Development of a resume parsing system capable of extracting relevant information from resumes in multiple formats.

• Implementation of machine learning models for the categorization of resumes and job recommendations.

• Design and development of a responsive web interface for user interaction.

• Integration of the system components to ensure smooth and efficient operation.

• Testing and validation of the system to ensure accuracy, reliability, and scalability.

By automating the resume screening process, this system aims to reduce the workload
on human resources personnel, minimize biases in candidate evaluation, and improve the
overall efficiency of the recruitment process.

Chapter 3

Technical Stack

3.1 Web Development


The frontend of the Resume Screening Website is built using a combination of HTML,
CSS, and JavaScript, while Flask serves as the backend framework. Below are the details
of each technology:

3.1.1 HTML (HyperText Markup Language)


HTML is the standard language used to create the structure of web pages. It allows developers to define elements like headings, paragraphs, links, images, and forms. HTML provides the foundational structure that is styled and controlled using CSS and JavaScript, respectively.

3.1.2 CSS (Cascading Style Sheets)


CSS is used to control the presentation, formatting, and layout of the HTML elements.
It allows for the separation of content from design, enabling developers to style web
pages with fonts, colors, spacing, and positioning. CSS provides the flexibility to create
responsive and aesthetically pleasing web interfaces.

3.1.3 JavaScript
JavaScript is a versatile programming language used to add interactivity and dynamic
features to web pages. It allows for the manipulation of HTML and CSS elements,
making it possible to create responsive user interfaces, handle events, and communicate
with backend services asynchronously using AJAX.

3.1.4 Flask
Flask is a lightweight web framework for Python, designed to be simple yet flexible. It
is used for developing the backend of the Resume Screening Website. Flask provides a
robust platform for handling requests, routing, and integration with machine learning
models. It also supports templating, making it easy to render dynamic web pages.
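To make Flask’s role concrete, the following is a minimal, hypothetical sketch of a Flask backend that defines a route and renders an HTML template; the route and template names are illustrative and not the project’s actual code.

from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def index():
    # Serve the landing page; by Flask convention the HTML, CSS, and
    # JavaScript frontend lives in the templates/ and static/ folders.
    return render_template("index.html")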

3.2 Machine Learning
The system employs two key machine learning models: Logistic Regression for categorization and Random Forest Classifier for job recommendations. Both models are implemented using the Python programming language and related libraries.

3.2.1 Logistic Regression


Logistic Regression is a statistical method used for binary and multi-class classification.
It estimates the probability of a binary response based on one or more predictor variables.
The model uses the logistic function (also known as the sigmoid function) to output a
probability value between 0 and 1.
Mathematically, the logistic function is defined as:
σ(z) = 1 / (1 + e^(−z))

where z is the linear combination of input features, represented as:

z = β0 + β1 x1 + β2 x2 + · · · + βn xn

Here, β0 is the intercept, and β1, β2, . . . , βn are the coefficients corresponding to the input features x1, x2, . . . , xn.
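As a concrete illustration of this formula, the sketch below computes the sigmoid by hand and compares it with scikit-learn’s LogisticRegression on a tiny made-up dataset; the data and feature values are purely illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real-valued z to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Tiny, purely illustrative dataset: two features, binary labels.
X = np.array([[0.2, 1.0], [1.5, 0.3], [3.0, 2.2], [2.8, 0.1]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# z = beta_0 + beta_1*x_1 + beta_2*x_2 for a new sample; sigma(z) gives P(y = 1).
x_new = np.array([1.0, 0.5])
z = model.intercept_[0] + np.dot(model.coef_[0], x_new)
print(sigmoid(z))                          # probability computed by hand
print(model.predict_proba([x_new])[0, 1])  # the same probability from scikit-learn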

3.2.2 Random Forest Classifier


The Random Forest Classifier is an ensemble learning method that operates by constructing a multitude of decision trees during training. The output of the classifier is the mode of the classes output by the individual trees. It is particularly useful for handling complex data sets with a large number of features and providing robust and accurate classification results.
The Random Forest algorithm works as follows:
• Randomly select k features from the total n features.
• Among the k selected features, find the best split point and create a node using it.
• Split the node into child nodes using the best split.
• Repeat the above steps until the desired number of nodes is reached, yielding one tree.
• Build a forest by repeating the process to create the desired number of trees.
The final prediction is made by aggregating the predictions of all individual trees (e.g., by majority voting in the case of classification).
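The sketch below shows this voting behaviour with scikit-learn’s RandomForestClassifier on synthetic data; the dataset, class count, and hyperparameter values are illustrative assumptions, not the project’s settings.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic, illustrative data: 100 samples, 5 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 3, size=100)

# n_estimators is the number of trees; max_features="sqrt" means each split
# considers only a random subset (k) of the n features, as described above.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)

sample = X[:1]
print("predicted class:", forest.predict(sample)[0])
# predict_proba averages the per-tree class probabilities; for fully grown
# trees this behaves like the share of trees voting for each class.
print("class vote shares:", forest.predict_proba(sample)[0])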

3.3 Natural Language Processing (NLP)


NLP is a field of artificial intelligence that focuses on the interaction between computers and human languages. In this project, NLP techniques are used to parse and analyze resume content, extracting essential information such as personal details, skills, and experiences.
Key NLP techniques employed include:

• Named Entity Recognition (NER): Detecting and classifying named entities in text, such as names, dates, and organizations.

• Regular Expressions: Used for pattern matching within text to extract specific information.

• TF-IDF Vectorization: Transforming text data into numerical vectors based on the frequency of terms, adjusted by the inverse document frequency to highlight important words. A brief sketch of the latter two techniques follows this list.
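The following sketch applies regular-expression extraction and TF-IDF vectorization to a made-up resume snippet; the patterns and sample text are illustrative rather than the project’s exact rules, and NER (available through libraries such as spaCy or NLTK) is omitted for brevity.

import re
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up resume snippet for demonstration only.
resume_text = """Jane Doe
Email: jane.doe@example.com | Phone: +91-9876543210
Skills: Python, Machine Learning, SQL"""

# Regular expressions for pattern matching (illustrative email and phone patterns).
email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", resume_text)
phone = re.search(r"\+?\d[\d\- ]{8,}\d", resume_text)
print(email.group() if email else None, phone.group() if phone else None)

# TF-IDF turns raw text into term-weight vectors usable by the ML models.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform([resume_text, "Python developer with SQL experience"])
print(tfidf_matrix.shape)  # (number of documents, vocabulary size)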

3.4 Platforms and Tools


3.4.1 Python
Python is the primary programming language used for this project. It is known for
its simplicity and versatility, making it ideal for web development, data analysis, and
machine learning tasks. Python’s rich ecosystem of libraries, such as NumPy, pandas,
scikit-learn, and Flask, was crucial in developing the project’s functionalities.

3.4.2 Visual Studio Code (VS Code)


Visual Studio Code is a popular code editor used for writing and debugging code. It
provides a rich set of features, including syntax highlighting, code completion, and version
control integration, which made the development process more efficient.

3.4.3 Port Number


The project is hosted locally on a development server using Flask, typically running on
port 5000. This port number can be configured as needed for deployment in different
environments.

3.4.4 Jupyter Notebook


Jupyter Notebook is an open-source web application that allows for creating and sharing documents containing live code, equations, visualizations, and narrative text. It was extensively used during the development and testing of machine learning models, providing an interactive environment for data analysis and model experimentation.

3.4.5 Python Libraries


The key Python libraries used in this project include pandas, NumPy, scikit-learn (sklearn), matplotlib, wordcloud, re, and pickle.

Chapter 4

Development Process

4.1 Data Collection and Preprocessing


The development process began with gathering two datasets from Kaggle, which served
as the foundation for our machine learning models.

4.1.1 Clean Resume Data


The first dataset, ”Clean Resume Data,” was used to train the categorization model. This dataset contained resumes categorized into 24 different job categories. The preprocessing steps included:
• Resampling: To balance the dataset, we resampled the data to ensure each category was equally represented.
• Duplicate Removal: All duplicate entries were removed to maintain data quality.
• Feature Extraction: Text data was vectorized using the TF-IDF Vectorizer, which converts text into numerical vectors based on term frequency and inverse document frequency.
• Model Selection: We applied Grid Search Cross-Validation (Grid Search CV) to several models, including SVM, Random Forest Classifier, Logistic Regression, and Decision Tree. Logistic Regression was found to be the most suitable model with an accuracy of 84.75%. The best parameters identified were:
– C: 15
– Penalty: l1
• Model and Vectorizer Development: Based on the Grid Search CV results, we finalized the Logistic Regression model and developed the TF-IDF Vectorizer accordingly. A simplified sketch of this selection process follows the list.
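The sketch below shows such a grid search on toy resume texts; the sample data, category names, and parameter grid are illustrative assumptions, with only the best parameters (C = 15, penalty = l1) taken from the report.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy stand-ins for the cleaned resume texts and their categories.
resumes = [
    "experienced java developer spring hibernate microservices",
    "java backend engineer spring boot rest apis",
    "data scientist python machine learning pandas",
    "machine learning engineer python model deployment",
    "civil engineer autocad site supervision construction",
    "structural civil engineer concrete design autocad",
]
categories = ["Java Developer", "Java Developer",
              "Data Science", "Data Science",
              "Civil Engineer", "Civil Engineer"]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(resumes)

# The liblinear solver supports the l1 penalty used in the best configuration.
param_grid = {"C": [1, 5, 10, 15], "penalty": ["l1", "l2"]}
grid = GridSearchCV(LogisticRegression(solver="liblinear"), param_grid, cv=2)
grid.fit(X, categories)
print(grid.best_params_, grid.best_score_)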

4.1.2 Job Dataset with Features


The second dataset, ”Job Dataset with Features,” was used for the recommendation model. This dataset contained a large number of entries across 61 job roles. The preprocessing steps included:

Figure 4.1: Data Analysis - Clean Resume Data

• Random Row Dropping: Due to the large size of the dataset, some rows were
dropped randomly to manage data processing efficiently.

• Feature Extraction: Similar to the Clean Resume Data, text data from job
descriptions and features was vectorized.

• Model Selection: The Random Forest Classifier was chosen for its robustness and capability to handle a diverse set of features. Remarkably, the model achieved an accuracy of 100%, indicating a perfect fit with the data; a minimal training sketch follows this list.
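The sketch below mirrors this preprocessing and training flow on a small synthetic table; the column names (Features, Job Role), the sampling fraction, and the data itself are assumptions made for illustration only.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the "Job Dataset with Features".
df = pd.DataFrame({
    "Features": ["python sql statistics", "java spring microservices",
                 "autocad structural design", "python sql statistics reporting",
                 "java rest apis spring boot", "autocad concrete site work"] * 20,
    "Job Role": ["Data Analyst", "Java Developer", "Civil Engineer"] * 40,
})

# Random row dropping to keep the working set manageable (here: keep 75% of rows).
df = df.sample(frac=0.75, random_state=42).reset_index(drop=True)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df["Features"])
X_train, X_test, y_train, y_test = train_test_split(
    X, df["Job Role"], test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))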

4.2 Integration with Flask and NLP


The next phase involved integrating the machine learning models with a Flask-based web
application. This included the implementation of Natural Language Processing (NLP)
techniques and regular expressions to parse critical information from resumes, such as
names, personal details, skills, and experiences. The extracted information was then
used to categorize resumes and provide job recommendations.
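As an example of this kind of parsing, the sketch below matches a resume against a small skill vocabulary using regular expressions; the vocabulary, function name, and sample text are hypothetical, and the real system would use far more extensive rules.

import re

# Hypothetical skill vocabulary; the real system would use a much larger list.
KNOWN_SKILLS = ["python", "java", "sql", "machine learning", "flask", "html", "css"]

def extract_skills(resume_text):
    """Return the known skills mentioned anywhere in the resume text."""
    text = resume_text.lower()
    return [skill for skill in KNOWN_SKILLS
            if re.search(r"\b" + re.escape(skill) + r"\b", text)]

sample = "Worked on Flask and Python web apps; strong SQL and Machine Learning background."
print(extract_skills(sample))
# ['python', 'sql', 'machine learning', 'flask']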

4.3 Web Development


Simultaneously, we worked on developing the web interface using HTML, CSS, and
JavaScript. The interface was designed to be user-friendly, allowing users to upload
resumes and view the categorization and job recommendation results easily.

Figure 4.2: Data Analysis - Job Dataset with Features

4.4 Testing and Implementation


The final stage involved rigorous testing of the system using a variety of resumes, including
those of team members and friends. The testing aimed to validate the accuracy and
reliability of the models in categorizing resumes and recommending job roles. The system
demonstrated a high level of accuracy, confirming its effectiveness in real-world scenarios.

Chapter 5

Implementation Details

5.1 Integration with Flask and NLP


Following data preprocessing and model training, the next step involved integrating these
models into a web application framework using Flask. The backend was responsible for
handling user requests, processing uploaded resumes, and interacting with the machine
learning models.
Key components included:

• NLP and Regular Expressions: These were employed to parse important information from the resumes, such as the candidate’s name, contact information, skills, and work experience. This information was crucial for both categorization and job recommendation processes.

• Flask Framework: Flask was chosen for its simplicity and flexibility, enabling efficient routing and handling of user interactions with the web application; a simplified upload route is sketched below.
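The following is a minimal sketch of how such an upload route could tie these pieces together, assuming PyPDF2 for PDF text extraction and pickled artifacts named tfidf.pkl and clf.pkl; all file, route, and template names here are hypothetical.

import io
import pickle
import re

from flask import Flask, render_template, request
from PyPDF2 import PdfReader  # assumed PDF library; any PDF text extractor would do

app = Flask(__name__)

# Hypothetical file names for the pickled vectorizer and categorization model.
tfidf = pickle.load(open("tfidf.pkl", "rb"))
clf = pickle.load(open("clf.pkl", "rb"))

def pdf_to_text(file_storage):
    """Extract plain text from an uploaded PDF file."""
    reader = PdfReader(io.BytesIO(file_storage.read()))
    return " ".join(page.extract_text() or "" for page in reader.pages)

@app.route("/upload", methods=["POST"])
def upload():
    resume = request.files["resume"]
    if resume.filename.lower().endswith(".pdf"):
        text = pdf_to_text(resume)
    else:  # fall back to treating the upload as plain text
        text = resume.read().decode("utf-8", errors="ignore")

    category = clf.predict(tfidf.transform([text]))[0]
    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
    return render_template("result.html", category=category,
                           email=email.group() if email else None)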

5.2 Frontend Development


The frontend was developed using HTML, CSS, and JavaScript, designed to provide
a user-friendly interface for uploading resumes and viewing results. The interface was
crafted to be intuitive, ensuring a smooth user experience from resume submission to
result retrieval.

5.3 Testing and Validation


To validate the system’s effectiveness, extensive testing was conducted using a variety of
resumes, including those of the project team members and their friends. The system was
evaluated on its ability to accurately categorize resumes into appropriate job categories
and recommend relevant job roles.
The testing phase confirmed the system’s high accuracy and reliability, with the categorization and recommendation models performing well under diverse conditions.

5.4 Deployment
The application was deployed on a local server, running on Flask’s development server on a specified port (typically port 5000). This setup allowed for easy testing and demonstration of the system’s capabilities.
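A minimal sketch of this local deployment is shown below; the host, port, and debug settings are typical Flask development-server values rather than the project’s exact configuration.

from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    # Development server only; port 5000 is Flask's default and can be
    # changed per environment before deployment.
    app.run(host="127.0.0.1", port=5000, debug=True)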

Chapter 6

Results and Discussion

6.1 Results
6.1.1 Categorization Model
The categorization model was developed using the ”Clean Resume Data” dataset, which
contained resumes categorized into 24 distinct job categories. The key findings are as
follows:
• Data Preprocessing: The dataset was resampled to balance the class distribution,
and duplicates were removed to ensure data quality. The resumes were vectorized
using the TF-IDF Vectorizer.
• Model Evaluation: Various models were evaluated using Grid Search Cross-Validation, including SVM, Random Forest Classifier, Logistic Regression, and Decision Tree.
• Best Performing Model: Logistic Regression emerged as the best-performing model with an accuracy of 84.75%. The optimal parameters identified were:
– C: 15
– Penalty: l1
• Model Development: The Logistic Regression model was finalized and implemented with the TF-IDF Vectorizer. This model effectively categorized resumes into their respective job categories; a sketch of persisting and reusing these artifacts follows this list.
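Below is a minimal sketch of how the finalized vectorizer and model can be persisted with pickle (one of the project’s listed libraries) and later applied to an unseen resume; the training texts, labels, and file names are illustrative, with only C = 15 and penalty = l1 taken from the report.

import pickle
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data standing in for the Clean Resume Data categories.
texts = ["python pandas machine learning models",
         "java spring hibernate backend services",
         "python scikit-learn data analysis",
         "java microservices rest apis"]
labels = ["Data Science", "Java Developer", "Data Science", "Java Developer"]

tfidf = TfidfVectorizer().fit(texts)
clf = LogisticRegression(C=15, penalty="l1", solver="liblinear").fit(
    tfidf.transform(texts), labels)

# Persist both artifacts so the Flask app can load them at startup (file names assumed).
pickle.dump(tfidf, open("tfidf.pkl", "wb"))
pickle.dump(clf, open("clf.pkl", "wb"))

# Later, categorize an unseen resume.
new_resume = "Machine learning enthusiast with Python and pandas experience"
print(clf.predict(tfidf.transform([new_resume]))[0])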

6.1.2 Recommendation Model


The recommendation model utilized the ”Job Dataset with Features,” which contained
job descriptions and features for 61 different job roles. The key findings are:
• Data Reduction: Due to the dataset’s large size, a random subset of rows was selected for processing.
• Model Training: The Random Forest Classifier was employed for the recommendation model. This model achieved a perfect accuracy of 100%, indicating its effectiveness in handling the job dataset.

Figure 6.1: Categorization Model Performance

Figure 6.2: Recommendation Model Performance

6.2 Discussion
6.2.1 Model Performance
The results from the categorization model show that Logistic Regression performed well in classifying resumes into 24 different categories. The model’s accuracy of 84.75% indicates robust performance. The chosen parameters (C: 15, Penalty: l1) contributed to this performance by optimizing the regularization and complexity of the model. The performance of other models like SVM and Decision Tree was also evaluated, but Logistic Regression provided the best balance between accuracy and interpretability.
For the recommendation model, the Random Forest Classifier achieved a perfect accuracy of 100%; the implications of this result are examined in the next subsection.

6.2.2 Challenges and Limitations


While the models performed well, there were several challenges and limitations encountered during the project:

Figure 6.3: The Resume Screening Website interface

• Data Imbalance: Even after resampling, some categories in the ”Clean Resume Data” dataset had fewer samples, which may impact model performance.

• Large Dataset Handling: The ”Job Dataset with Features” was large, and random row dropping was necessary to manage processing. This may have resulted in the loss of potentially valuable information.

• Model Overfitting: The perfect accuracy achieved by the Random Forest Classifier raises concerns about potential overfitting. Further validation on unseen data would be needed to confirm the model’s generalization capability; a minimal cross-validation sketch follows this list.
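A sketch of such a check using k-fold cross-validation is given below; the synthetic data merely stands in for the job dataset, so the printed scores are not the project’s results.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; in practice X and y would come from the job dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = rng.integers(0, 5, size=300)

model = RandomForestClassifier(random_state=0)
# Each fold is scored on data the model never saw during fitting, so a large gap
# between training accuracy and these scores suggests overfitting.
scores = cross_val_score(model, X, y, cv=5)
print("fold accuracies:", np.round(scores, 3))
print("mean accuracy:", scores.mean())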

6.2.3 Future Work


Based on the current findings, several areas for future work are identified:

• Enhanced Data Collection: Collecting more diverse and balanced datasets could improve model performance and generalization.

• Advanced Models and Techniques: Exploring advanced machine learning models and techniques, such as deep learning or hybrid models, could enhance both categorization and recommendation performance.

• Real-world Testing: Implementing the system in a real-world scenario with live data would provide additional insights into its practical effectiveness and user experience.

The project successfully developed a resume screening website utilizing web development, machine learning, and natural language processing techniques. The categorization and recommendation models demonstrated strong performance, with the Logistic Regression model achieving an accuracy of 84.75% and the Random Forest Classifier reaching a perfect accuracy of 100%. The implementation in Flask, combined with effective data preprocessing and model training, resulted in a functional and accurate system. Further enhancements and real-world testing could further validate and refine the system’s capabilities.

Figure 6.4: Categorization result, recommended job role, and parsed information

Chapter 7

Conclusion and Future Work

7.1 Conclusion
The project successfully developed a comprehensive resume screening website that integrates web development technologies with advanced machine learning and natural language processing techniques. This system was designed to process and categorize resumes and recommend suitable job roles based on the analyzed data.

7.1.1 Summary of Achievements


• Categorization Model: The categorization model utilized the ”Clean Resume Data” dataset, which consisted of resumes categorized into 24 different job roles. After preprocessing, including resampling and duplicate removal, various machine learning models were evaluated. Logistic Regression was selected as the most effective model with an accuracy of 84.75%. This model demonstrated its capability to classify resumes accurately into their respective categories.

• Recommendation Model: For job recommendations, the ”Job Dataset with Features” was used. Despite the initial challenge of handling a large dataset, a subset was utilized, and the Random Forest Classifier was employed. The model achieved an impressive accuracy of 100%, indicating its effectiveness in providing relevant job recommendations based on the features of the dataset.

• Integration and Deployment: The project involved the integration of machine learning models with a Flask-based web application. NLP and regular expression techniques were used to extract key information from resumes, facilitating accurate categorization and recommendation. The frontend, developed using HTML, CSS, and JavaScript, ensured a user-friendly interface. The entire system was successfully deployed and tested, confirming its functionality and performance.

7.1.2 Impact and Significance


The developed resume screening website provides a valuable tool for automating the
resume evaluation process. It significantly reduces the time and effort required for resume
sorting and job matching, enhancing efficiency for recruiters and job seekers alike. The
accurate categorization and recommendation capabilities of the system demonstrate its
potential for practical applications in human resources and career counseling.

7.2 Future Work
While the project has achieved its primary objectives, several areas for future enhancement and exploration can further refine and expand the system’s capabilities:

7.2.1 Data and Model Improvements


• Enhanced Data Collection: Collecting a larger and more diverse dataset, including resumes and job descriptions from various industries and regions, could improve model accuracy and generalization. This could address any potential biases and enhance the system’s ability to handle a broader range of resume formats and job roles.

• Model Optimization: Exploring advanced machine learning techniques and models, such as deep learning approaches (e.g., neural networks) or hybrid models combining multiple algorithms, could further enhance performance. Hyperparameter tuning and ensemble methods could also be investigated to improve model accuracy and robustness.

7.2.2 Real-World Testing and Validation


• Pilot Deployment: Implementing the system in a real-world setting, such as a recruitment agency or job portal, would provide practical insights into its performance and user experience. Feedback from real users can guide further refinements and ensure the system meets industry standards and requirements.

• User Experience Enhancements: Based on real-world testing, the user interface and experience can be improved. Incorporating features such as resume upload validation, real-time feedback, and personalized recommendations can enhance usability and user satisfaction.

7.2.3 Expansion and Integration


• Integration with Other Systems: Integrating the resume screening system with existing HR software or job portals can streamline the recruitment process. Providing APIs for seamless integration with other platforms can expand the system’s applicability and accessibility.

• Multilingual Support: Adding support for multiple languages can make the system accessible to a global audience. This would involve developing language-specific models and incorporating translation capabilities for resume and job description analysis.

7.2.4 Ethical and Privacy Considerations


• Data Privacy: Ensuring the privacy and security of users’ personal information
is crucial. Implementing robust data protection measures and compliance with
privacy regulations (e.g., GDPR) should be prioritized.

• Bias and Fairness: Addressing potential biases in the machine learning models
is essential. Implementing fairness audits and bias mitigation techniques can help
ensure equitable treatment of all applicants and prevent discriminatory practices.

7.3 Final Thoughts


In conclusion, the resume screening website represents a significant step towards automating and improving the resume evaluation process. With its successful implementation and strong performance, the system offers valuable insights and capabilities for future enhancements. The outlined future work provides a roadmap for further development and refinement, ensuring that the system remains effective, user-friendly, and relevant in an evolving job market.

7.4 Links
7.4.1 GitHub Repository

Figure 7.1: Scan this QR code to access the GitHub repository

The GitHub repository for this project is available at:


https://github.com/nilaMan16/Resume-Screening/tree/main

