
Revolutionizing Talent Acquisition: A Comparative Study of Large Language Models in Resume Classification

2024 5th International Conference on Innovative Trends in Information Technology (ICITIIT) | 979-8-3503-8681-3/24/$31.00 ©2024 IEEE | DOI: 10.1109/ICITIIT61487.2024.10580109

1st Venkatakrishnan R, Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India
2nd Rithani M, Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India
3rd Bharathi Mohan G, Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India
4th V Sulochana, Course Director, Anna Administrative Staff College, Chennai, India
5th Prasanna Kumar R, Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India

Abstract—The core aim of this study is to explore the capabilities of Large Language Models (LLMs) in resume classification, a pivotal task in talent acquisition and human resources management. This research seeks to surpass the constraints of traditional methods, which typically depend on elementary Natural Language Processing (NLP) strategies, by leveraging the advanced potential of LLMs. To achieve this, the study conducted a detailed empirical evaluation, testing a range of LLMs, including various versions of Text Davinci, GPT models, LLaMA, and FLAN T5. The methodology was thorough, integrating data preprocessing and text normalization to bolster the reliability and validity of the findings. The investigation provided a comparative analysis of the selected LLMs, concentrating on key performance indicators such as accuracy, precision, recall, and F1-Score. The findings reveal that LLMs significantly improve upon conventional methods in resume classification tasks. This research not only offers deep insights into the utility and efficiency of LLMs in this field but also lays the groundwork for future explorations. Furthermore, it highlights the revolutionary impact of LLMs on talent acquisition and human resource management practices.

Index Terms—Large Language Models, Resume Classification, Talent Acquisition, Text Davinci, GPT Models, Flan T5, LLaMA

I. INTRODUCTION

The emergence of Large Language Models (LLMs) has marked a significant milestone in the field of Natural Language Processing (NLP) and machine learning [1]. These advanced models have shown exceptional performance across a wide array of applications, from text generation to intricate classification tasks [2]. However, the application of LLMs in specialized domains, such as resume classification, still presents a set of unique challenges and untapped opportunities. Precise resume classification is paramount, particularly in the sectors of talent acquisition and human resource management [3]. Conventional methods, which often depend on keyword matching and rudimentary NLP techniques, have proven inadequate in terms of both accuracy and efficiency [4]. This research aims to address this gap by harnessing the capabilities of LLMs, specifically focusing on different versions of Text Davinci and GPT models, to improve the precision and recall metrics in resume classification [5].

This study augments the existing scholarly discourse in multiple dimensions. First, it furnishes an empirical assessment of a range of Large Language Models, elucidating key performance indicators such as accuracy, precision, recall, and F1-Score, thereby extending prior research in deep learning-based job classification. Second, the investigation delves into the often-neglected realm of prompt engineering, a crucial factor for the efficacious deployment of LLMs.

The remainder of this manuscript is organized as follows: Section 2 provides a review of extant literature; Section 3 outlines the employed research methodology; Section 4 presents the empirical results and their broader implications. The manuscript concludes with a synthesis of pivotal findings and prospective avenues for scholarly exploration.

Authorized licensed use limited to: PES University Bengaluru. Downloaded on January 12,2025 at 13:10:49 UTC from IEEE Xplore. Restrictions apply.
II. RELATED WORKS

A. Introduction to Large Language Models

The rise of Large Language Models (LLMs) such as GPT-3, GPT-4, and Google's Gemini has redefined the capabilities within Natural Language Processing (NLP), delivering unparalleled proficiency in text generation, summarization, and classification. These models, each trained on massive datasets, showcase exceptional skill in identifying intricate language patterns and nuances. However, the substantial size of these models introduces significant challenges in terms of computational demands and energy consumption [6].

LLMs extend their utility beyond mere text generation to tackle complex tasks such as text classification. For instance, [7] demonstrated that LLMs, including the newer entrant Gemini, act as potent near cold-start recommenders for language- and item-based preferences. While this research provides a detailed evaluation, it stops short of examining the models' performance in a broad range of real-world scenarios, highlighting a gap that future investigations need to address.

B. LLMs and Chain of Thoughts

In the rapidly evolving domain of Large Language Models (LLMs), the "Chain of Thought" concept has become a crucial tool for boosting the reasoning abilities of these models. The foundational work by Sun et al. [8] provided key insights into how LLMs achieve consistency in prolonged conversations. Expanding on this, Masikisiki et al. [9] delved into task-specific reasoning, with a particular focus on personalized recommendation systems. Further innovation was introduced by Yao et al. [10], who incorporated user behavior analysis into LLMs, enabling the models to adjust their reasoning processes based on user interactions. Despite these significant strides, a noticeable gap remains in understanding the limitations of LLMs' reasoning depth, highlighting an area ripe for further investigation.

C. Prompt Engineering in LLMs

The realm of prompt engineering within Large Language Models (LLMs) has emerged as a focal point of interest in contemporary research. Investigations in this area have spanned a variety of aspects, including ad-hoc task adaptation [11, 17], job type classification [12, 22], and the pursuit of human-level prompt engineering [13]. Such research has not only underscored the effectiveness of meticulously crafted prompts in boosting model output but has also paved the way for novel practical applications, encompassing automated software engineering tasks [14, 19, 21] and K-8 educational initiatives. The prevailing body of work posits prompt engineering as an indispensable mechanism for leveraging LLMs' capabilities across a broad spectrum of tasks. Nevertheless, there exists a pressing need for more empirical studies to dissect the constraints and scalability of these methodologies, particularly in sectors requiring specialized knowledge, such as healthcare [15, 23] and the legal domain.

D. LLMs in Text Classification

The rise of Large Language Models (LLMs) has fundamentally transformed text classification, showcasing extraordinary abilities in mimicking human-like understanding and text generation. These models, trained on vast datasets, excel in a range of classification tasks, from discerning sentiment to identifying topics. Yet, their capabilities are not flawless. Research [16, 18, 19, 20] highlights a significant challenge: despite their proficiency in content comprehension, LLMs often fall short in accurately predicting user ratings. This discrepancy underscores a crucial area for further investigation into the subtle dynamics of user behavior.

E. LLMs in Job Classification

The application of Large Language Models (LLMs) in the realm of job classification and recommendation has garnered significant attention in recent years. Du et al. (2023) pioneered this domain with their work "Enhancing Job Recommendation through LLM-based Generative Adversarial Networks," which leveraged the generative capabilities of LLMs to create more personalized and relevant job recommendations. The methodology employed Generative Adversarial Networks (GANs) in tandem with LLMs, thereby achieving a synergistic effect that significantly improved the quality of job recommendations.

III. METHODOLOGY

A. Dataset

The dataset is a collection of 358 resumes distributed across various technical roles in the IT sector. Each resume is categorized into one of seven distinct job roles: Blockchain, DevOps Engineer, Hadoop, HR, Java Developer, Python Developer, and Web Designing. The dataset aims to provide a comprehensive resource for automating job role classification and resume screening in the IT industry, as shown in Figure 1.

Fig. 1. Dataset Description
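The dataset itself is not distributed with the paper; assuming it is available as (resume text, job role) pairs, the seven-way labeling described above can be sketched as follows (the helper names are hypothetical):

```python
# Hypothetical sketch: the 358-resume dataset is not published with the paper,
# so we assume it is loaded as a list of (resume_text, job_role) pairs.
JOB_ROLES = [
    "Blockchain", "DevOps Engineer", "Hadoop", "HR",
    "Java Developer", "Python Developer", "Web Designing",
]

# Map each of the seven job roles to an integer label for classification.
ROLE_TO_ID = {role: idx for idx, role in enumerate(JOB_ROLES)}

def encode_labels(samples):
    """Convert (text, role) pairs into (text, label_id) pairs."""
    return [(text, ROLE_TO_ID[role]) for text, role in samples]

sample = [("5 years of Spark and HDFS experience...", "Hadoop")]
print(encode_labels(sample))  # [('5 years of Spark and HDFS experience...', 2)]
```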

B. Model Architecture and Configuration

1) Davinci: The foundation of both the Text Davinci 001 and 003 models is the GPT (Generative Pre-trained Transformer) architecture. This design enables the models to grasp context and produce text that is coherent and relevant to the given context. The models are distinguished by their size and the volume of data they have been trained on. With each subsequent version, such as moving from 001 to 003, the models typically increase in size and are trained on a larger corpus of data. This expansion results in enhanced performance, including better understanding and more nuanced text generation capabilities.

2) FLAN T5: The FLAN T5 model (see Fig. 2), a variant of the T5 model fine-tuned with a focus on instruction-based tasks, was utilized for the classification of resumes. This model's architecture, which is built on the encoder-decoder framework of the original T5, is particularly suited for tasks that require an understanding of complex language structures and the ability to generate text classifications based on the input prompt.

For resume classification, the FLAN T5 model was fine-tuned using Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that allows for the adaptation of the model with minimal computational resources. This approach updates only a subset of the model's parameters, specifically targeting the weight matrices in the transformer layers with low-rank matrices that approximate the changes.

The fine-tuning process was facilitated by the Seq2Seq trainer function from the Transformers library, ensuring that the model could efficiently learn to categorize resumes into predefined categories based on their content, such as work experience, skills, and educational background.

Fig. 2. Architecture of FLAN T5

3) LLaMA 2: LLaMA 2, a state-of-the-art large language model developed by Meta AI, was also adapted for the task of resume classification. This model, known for its exceptional performance across various benchmarks, was fine-tuned using QLoRA, an extension of LoRA that incorporates quantization to further enhance parameter efficiency. QLoRA applies quantization techniques to reduce the precision of floating-point numbers, significantly lowering the memory footprint and computational cost of fine-tuning large language models. The architecture of LLaMA 2 is shown in Fig. 3.

For the classification of resumes, this approach allowed LLaMA 2 to be fine-tuned in a way that balances efficiency and accuracy, ensuring that the model could effectively categorize resumes without a substantial degradation in performance. The fine-tuning process utilized the Supervised Fine-tuning trainer function from the Transformers reinforcement learning library, tailored to optimize the model for the nuanced task of parsing and classifying resumes.

By employing QLoRA, LLaMA 2 was made adept at understanding the intricate details within resumes, such as parsing the nuanced information contained within job descriptions and educational achievements, and classifying them accurately.
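The low-rank update at the heart of LoRA (and, with 4-bit weights, QLoRA) can be illustrated with a toy calculation. This is an illustrative sketch, not the authors' code; the dimensions and hyperparameters are arbitrary assumptions.

```python
import numpy as np

# Toy illustration of LoRA: the pretrained weight W is frozen, and only a
# low-rank update B @ A is trained, so far fewer parameters are updated.
d, r = 512, 8                      # hidden size and low rank (r << d), assumed values
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))        # frozen pretrained weight
A = rng.normal(size=(r, d))        # trainable low-rank factor
B = np.zeros((d, r))               # B starts at zero, so the initial update is zero
alpha = 16                         # LoRA scaling hyperparameter

# Effective weight used at inference; QLoRA additionally stores W in
# quantized (e.g. 4-bit) precision to cut memory further.
W_adapted = W + (alpha / r) * (B @ A)

full = d * d                       # parameters in a full fine-tune of W
lora = r * (d + d)                 # parameters trained by LoRA
print(full, lora, round(full / lora, 1))  # 262144 8192 32.0
```

With these assumed dimensions, LoRA trains 32x fewer parameters for this matrix than a full fine-tune, which is the efficiency the paper relies on.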

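The classification itself is prompt-driven. Since the paper does not publish its exact prompts, the following hypothetical sketch shows one way a resume could be wrapped in an instruction for an instruction-tuned model such as FLAN T5, and how the model's free-text answer could be mapped back to one of the seven roles; the wording and helper names are assumptions.

```python
# Hypothetical prompt construction; the paper's actual prompt styles are not given.
JOB_ROLES = [
    "Blockchain", "DevOps Engineer", "Hadoop", "HR",
    "Java Developer", "Python Developer", "Web Designing",
]

def build_prompt(resume_text: str) -> str:
    """Wrap a resume in an instruction that constrains the output labels."""
    options = ", ".join(JOB_ROLES)
    return (
        "Classify the following resume into exactly one of these job roles: "
        f"{options}.\n\nResume:\n{resume_text}\n\nJob role:"
    )

def parse_label(model_output: str) -> str:
    """Map the model's free-text answer back onto a known role (first match wins)."""
    lowered = model_output.lower()
    for role in JOB_ROLES:
        if role.lower() in lowered:
            return role
    return "Unknown"

prompt = build_prompt("Built CI/CD pipelines with Jenkins and Kubernetes.")
print(parse_label("devops engineer"))  # DevOps Engineer
```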
Fig. 3. Architecture of LLaMA 2

C. Model Evaluation

The evaluation of Large Language Models (LLMs) across varied prompt styles necessitates a meticulous approach, employing a suite of performance metrics (Accuracy, Precision, Recall, and F1-Score) to ascertain the efficacy of each model. Accuracy, a measure of the model's overall correctness, is complemented by Precision and Recall, which respectively gauge the model's ability to generate relevant responses and its sensitivity in identifying pertinent instances. The F1-Score harmonizes these aspects, offering a single metric that encapsulates the model's balanced performance in precision and recall. This multifaceted evaluation framework is pivotal, not only in benchmarking the current capabilities of LLMs but also in illuminating pathways for future enhancements, thereby propelling the advancement of natural language processing technologies and their application in diverse domains.

IV. RESULTS AND DISCUSSION

Table I presents a comparative analysis of the performance metrics, including Accuracy, Precision, Recall, and F1-Score, for four different LLMs: Text Davinci 001, Text Davinci 003, Flan T5, and LLaMA. From the data, it is evident that LLaMA outperforms the other models across all metrics, achieving the highest Accuracy (85.0%), Precision (86.40%), Recall (86.09%), and F1-Score (87.00%). Following LLaMA, Flan T5 shows commendable performance with an Accuracy of 80.06%, Precision of 81.10%, Recall of 81.06%, and an F1-Score of 81.50%. Text Davinci 003 demonstrates moderate performance improvements over Text Davinci 001, particularly in Accuracy (75.5% vs. 67.88%) and Recall (72.91% vs. 67.88%). However, its Precision (66.55%) is slightly lower than that of Text Davinci 001 (67.09%), while its F1-Score (69.50%) shows an improvement over Text Davinci 001 (68.00%).

TABLE I
PERFORMANCE METRICS OF VARIOUS LLMS

Model Name       | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
Text Davinci 001 | 67.88        | 67.09         | 67.88      | 68.00
Text Davinci 003 | 75.50        | 66.55         | 72.91      | 69.50
Flan T5          | 80.06        | 81.10         | 81.06      | 81.50
LLaMA            | 85.00        | 86.40         | 86.09      | 87.00

The performance metrics in Table I indicate a clear trend of improvement in model capabilities from Text Davinci 001 to LLaMA. The advancements in model architecture, training methodologies, and fine-tuning approaches contribute to these improvements. LLaMA's superior performance can be attributed to its state-of-the-art architecture and the effective use of parameter-efficient fine-tuning techniques, such as QLoRA, which enhance its ability to understand and generate more accurate responses to prompts.
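The four indicators used throughout can be reproduced with a short sketch. This is an illustrative macro-averaged implementation, since the paper does not state its averaging scheme; the example labels are invented.

```python
from collections import Counter

def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged Precision, Recall, and F1-Score,
    the four indicators reported in Table I (averaging scheme assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed an instance of t
    precisions, recalls, f1s = [], [], []
    for lbl in labels:
        prec = tp[lbl] / (tp[lbl] + fp[lbl]) if tp[lbl] + fp[lbl] else 0.0
        rec = tp[lbl] / (tp[lbl] + fn[lbl]) if tp[lbl] + fn[lbl] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec); recalls.append(rec); f1s.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

acc, prec, rec, f1 = macro_metrics(
    ["HR", "HR", "Java Developer", "Java Developer"],
    ["HR", "Java Developer", "Java Developer", "Java Developer"],
)
print(round(acc, 2), round(prec, 2), round(rec, 2), round(f1, 2))  # 0.75 0.83 0.75 0.73
```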

Flan T5's performance, while not reaching the heights of LLaMA, still represents a significant advancement over the Text Davinci models. Its architecture, which is optimized for instruction-based tasks through instruction tuning, likely contributes to its higher precision and recall, making it more adept at generating responses that closely align with the expected outcomes.

The incremental improvement from Text Davinci 001 to Text Davinci 003 highlights the ongoing enhancements in model design and training. Despite Text Davinci 003's lower precision compared to Text Davinci 001, its higher accuracy, recall, and F1-Score suggest a better overall balance in performance, indicating a model that is more reliable across a variety of tasks. The differences in performance metrics also underscore the importance of choosing the right model for specific applications. While LLaMA demonstrates superior overall performance, the specific requirements of a task, such as the need for higher precision over recall or vice versa, may make other models more suitable for certain applications.

V. CONCLUSION

This study embarked on an exploration of the performance of various state-of-the-art LLMs, namely Text Davinci 001, Text Davinci 003, Flan T5, and LLaMA, across different prompt styles. Through meticulous evaluation based on accuracy, precision, recall, and F1-Score, we have identified clear distinctions in the capabilities of each model, with LLaMA emerging as the frontrunner. This superior performance of LLaMA is attributed to its advanced architecture and the innovative application of parameter-efficient fine-tuning techniques, such as QLoRA, which significantly enhance its efficiency and effectiveness in processing and generating language.

REFERENCES

[1] Brown, T. B., Mann, B. F., Ryder, N. C., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J. C., Winter, C., ... Amodei, D. (2020). Language Models are Few-Shot Learners. arXiv. https://doi.org/10.48550/arxiv.2005.14165
[2] Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S. R. (2018). GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv. https://doi.org/10.48550/arxiv.1804.07461
[3] Nakamura, T., Goto, R. (2018). Outfit generation and style extraction via bidirectional LSTM and autoencoder. arXiv. https://doi.org/10.48550/arxiv.1807.03133
[4] Johnson, R., Zhang, T. (2017). Deep Pyramid Convolutional Neural Networks for Text Categorization. pp. 562-570. https://doi.org/10.18653/v1/P17-1052
[5] Zhang, X., Zhao, J., LeCun, Y. (2015). Character-level convolutional networks for text classification. arXiv. https://doi.org/10.48550/arxiv.1509.01626
[6] Strubell, E., Ganesh, A., McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. arXiv. https://doi.org/10.48550/arxiv.1906.02243
[7] Sanner, S., Balog, K., Radlinski, F., Wedin, B., Dixon, L. (2023). Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences. arXiv. https://doi.org/10.48550/arxiv.2307.14225
[8] Sun, J., Luo, Y., Gong, Y., Chen, L., Shen, Y., Guo, J., Duan, N. (2023). Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2304.11657
[9] Masikisiki, B., Marivate, V., Hlope, Y. (2023). Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting. arXiv. https://doi.org/10.48550/arxiv.2310.00272
[10] Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Chen, Y., Narasimhan, K. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2305.10601
[11] Strobelt, H., Webson, A., Sanh, V., Hoover, B., Beyer, J., Pfister, H., Rush, A. M. (2022). Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models. arXiv. https://doi.org/10.48550/arxiv.2208.07852
[12] Clavié, B., Ciceu, A., Naylor, F., Soulié, G., Brightwell, T. (2023). Large Language Models in the workplace: A case study on prompt engineering for job type classification. arXiv. https://doi.org/10.48550/arxiv.2303.07142
[13] Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., Ba, J. (2022). Large language models are human-level prompt engineers. arXiv. https://doi.org/10.48550/arxiv.2211.01910
[14] Shin, J., Tang, C., Mohati, T., Nayebi, M., Wang, S., Hemmati, H. (2023). Prompt engineering or fine tuning: An empirical assessment of large language models in automated software engineering tasks. arXiv. https://doi.org/10.48550/arxiv.2310.10508
[15] Sivarajkumar, S., Kelley, M. R., Samolyk-Mazzanti, A., Visweswaran, S., Wang, Y. (2023). An empirical evaluation of prompting strategies for large language models in zero-shot clinical natural language processing. arXiv. https://doi.org/10.48550/arxiv.2309.08008
[16] Gurusamy, Bharathi Mohan; Kumar, R. Parathasarathy; Srinivasan; Aravind, S.; Hanish, K.; Pavithria, G. (2023). Text Summarization for Big Data Analytics: A Comprehensive Review of GPT 2 and BERT Approaches. https://doi.org/10.1007/978-3-031-33808-3_14
[17] Gurusamy, B. M., Rangarajan, P. K., Srinivasan, P. (2023). A hybrid approach for text summarization using semantic latent Dirichlet allocation and sentence concept mapping with transformer. International Journal of Electrical and Computer Engineering (IJECE), 13(6), 6663-6672. https://doi.org/10.11591/ijece.v13i6.pp6663-6672
[18] Rithani, M., Kumar, R., Doss, S. (2023). A review on big data based on deep neural network approaches. Artificial Intelligence Review, 56, 1-37. https://doi.org/10.1007/s10462-023-10512-5
[19] G. B. Mohan, R. P. Kumar and T. Ravi, "Coalescing Clustering and Classification," IET Chennai 3rd International Conference on Sustainable Energy and Intelligent Systems (SEISCON 2012), Tiruchengode, 2012, pp. 1-5. doi: 10.1049/cp.2012.2254
[20] Siva Jyothi Natha Reddy, B., Yadav, S., Venkatakrishnan, R., Oviya, I. R. (2023). Comparison of Deep Learning Approaches for DNA-Binding Protein Classification Using CNN and Hybrid Models. In: Tripathi, A. K., Anand, D., Nagar, A. K. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications. WWCA 1997. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-99-5881-
[21] P. K. R, B. M. G, P. Srinivasan and V. R, "Transformer-Based Models for Named Entity Recognition: A Comparative Study," 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 2023, pp. 1-5. doi: 10.1109/ICCCNT56998.2023.10308039
[22] A. R et al., "Automating Machine Learning Model Development: An Operational ML Approach with PyCARET and Streamlit," 2023 Innovations in Power and Advanced Computing Technologies (i-PACT), Kuala Lumpur, Malaysia, 2023, pp. 1-6. doi: 10.1109/i-PACT58649.2023.10434389
[23] B. M. G et al., "Transformer-based Models for Language Identification: A Comparative Study," 2023 International Conference on System, Computation, Automation and Networking (ICSCAN), Puducherry, India, 2023, pp. 1-6. doi: 10.1109/ICSCAN58655.2023.10394757

