Revolutionizing Talent Acquisition: A Comparative Study of Large Language Models in Resume Classification

1st Venkatakrishnan R, Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India
2nd Rithani M, Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India
3rd Bharathi Mohan G, Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India
4th V Sulochana, Course Director, Anna Administrative Staff College, Chennai, India
5th Prasanna Kumar R, Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India
II. RELATED WORKS

A. Introduction to Large Language Models

The rise of Large Language Models (LLMs) such as GPT-3, GPT-4, and Google's Gemini has redefined the capabilities within Natural Language Processing (NLP), delivering unparalleled proficiency in text generation, summarization, and classification. These models, each trained on massive datasets, showcase exceptional skill in identifying intricate language patterns and nuances. However, the substantial size of these models introduces significant challenges in terms of computational demands and energy consumption [6].

LLMs extend their utility beyond mere text generation to tackle complex tasks such as text classification. For instance, [7] demonstrated that LLMs, including the newer entrant Gemini, act as potent near cold-start recommenders for language- and item-based preferences. While this research provides a detailed evaluation, it stops short of examining the models' performance in a broad range of real-world scenarios, highlighting a gap that future investigations need to address.

B. LLMs and Chain of Thoughts

In the rapidly evolving domain of Large Language Models (LLMs), the "Chain of Thought" concept has become a crucial tool for boosting the reasoning abilities of these models. The foundational work by Sanner et al. [8] provided key insights into how LLMs achieve consistency in prolonged conversations. Expanding on this, Ji et al. [9] delved into task-specific reasoning, with a particular focus on personalized recommendation systems. Further innovation was introduced by Wang et al. [10], who incorporated user behavior analysis into LLMs, enabling the models to adjust their reasoning processes based on user interactions. Despite these significant strides, a noticeable gap remains in understanding the limitations of LLMs' reasoning depth, highlighting an area ripe for further investigation.

C. Prompt Engineering in LLMs

The realm of Prompt Engineering within Large Language Models (LLMs) has emerged as a focal point of interest in contemporary research. Investigations in this area have spanned a variety of aspects, including ad-hoc task adaptation [11, 17], job type classification [12, 22], and the exploration of achieving human-level prompt engineering [13]. Such research has not only underscored the effectiveness of meticulously crafted prompts in boosting model output but also paved the way for novel practical applications, encompassing automated software engineering tasks [14, 19, 21] and K-8 educational initiatives. The prevailing body of work posits prompt engineering as an indispensable mechanism for leveraging LLMs' capabilities across a broad spectrum of tasks. Nevertheless, there exists a pressing need for more empirical studies to dissect the constraints and scalability of these methodologies, particularly in sectors requiring specialized knowledge, like healthcare [15, 23] and the legal framework.

D. LLMs in Text Classification

The rise of Large Language Models (LLMs) has fundamentally transformed text classification, showcasing extraordinary abilities in mimicking human-like understanding and text generation. These models, trained on vast datasets, excel in a range of classification tasks, from discerning sentiment to identifying topics. Yet, their capabilities are not flawless. Research identified by [16, 18, 20, 19] highlights a significant challenge: despite their proficiency in content comprehension, LLMs often fall short in accurately predicting user ratings. This discrepancy underscores a crucial area for further investigation into the subtle dynamics of user behavior.

E. LLMs in Job Classification

The application of Large Language Models (LLMs) in the realm of job classification and recommendation has garnered significant attention in recent years. Du et al. (2023) pioneered this domain with their work "Enhancing Job Recommendation through LLM-based Generative Adversarial Networks," which leveraged the generative capabilities of LLMs to create more personalized and relevant job recommendations. The methodology employed Generative Adversarial Networks (GANs) in tandem with LLMs, thereby achieving a synergistic effect that significantly improved the quality of job recommendations.

III. METHODOLOGY

A. Dataset

The dataset is a collection of 358 resumes, distributed across various technical roles in the IT sector. Each resume is categorized into one of seven distinct job roles: Blockchain, DevOps Engineer, Hadoop, HR, Java Developer, Python Developer, and Web Designing. The dataset aims to provide a comprehensive resource for automating the task of job role classification and resume screening in the IT industry, as shown in Fig. 1.
Fig. 1. Dataset Description
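To ground the classification task, the sketch below shows one way a resume could be assigned to one of the seven roles with a prompt-based LLM call; the query_llm callable and the prompt wording are illustrative assumptions, not the exact prompts or models evaluated in this study.

```python
# Minimal sketch of prompt-based resume classification over the seven job roles
# in the dataset. `query_llm` is a hypothetical callable standing in for any
# LLM completion API (e.g., a wrapper around Text Davinci, Flan T5, or LLaMA).

JOB_ROLES = [
    "Blockchain", "DevOps Engineer", "Hadoop", "HR",
    "Java Developer", "Python Developer", "Web Designing",
]

def build_prompt(resume_text: str) -> str:
    """Construct a zero-shot classification prompt for a single resume."""
    roles = ", ".join(JOB_ROLES)
    return (
        f"Classify the following resume into exactly one of these job roles: {roles}.\n"
        f"Answer with the role name only.\n\nResume:\n{resume_text}\n\nRole:"
    )

def classify_resume(resume_text: str, query_llm) -> str | None:
    """Return the predicted role, or None if the model's reply is not a known label."""
    reply = query_llm(build_prompt(resume_text)).strip()
    return reply if reply in JOB_ROLES else None
```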
Four metrics were used to assess the efficacy of each model. Accuracy, a measure of the model's overall correctness, is complemented by Precision and Recall, which respectively gauge the model's ability to generate relevant responses and its sensitivity in identifying pertinent instances. The F1-Score harmonizes these aspects, offering a single metric that encapsulates the model's balanced performance across precision and recall. This multifaceted evaluation framework is pivotal, not only in benchmarking the current capabilities of LLMs but also in illuminating pathways for future enhancements, thereby propelling the advancement of natural language processing technologies and their application in diverse domains.
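To make these metrics concrete for the seven-way resume-classification task, the sketch below computes them with scikit-learn; the label lists are placeholder values for illustration, not the results reported in Table I.

```python
# Illustrative computation of Accuracy, Precision, Recall, and F1-Score for a
# multi-class resume-classification run. The labels below are placeholders.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = ["Java Developer", "HR", "DevOps Engineer", "Python Developer", "HR"]
y_pred = ["Java Developer", "HR", "Hadoop", "Python Developer", "HR"]

# Weighted averaging accounts for class imbalance across the seven job roles.
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, average="weighted", zero_division=0)
recall = recall_score(y_true, y_pred, average="weighted", zero_division=0)
f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)

print(f"Accuracy: {accuracy:.3f}  Precision: {precision:.3f}  "
      f"Recall: {recall:.3f}  F1-Score: {f1:.3f}")
```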
TABLE I
PERFORMANCE METRICS OF VARIOUS LLMS
Flan T5's performance, while not reaching the heights of LLaMA, still represents a significant advancement over the Text Davinci models. Its architecture, which is optimized for instruction-based tasks through instruction tuning, likely contributes to its higher precision and recall, making it more adept at generating responses that closely align with the expected outcomes.

The incremental improvement from Text Davinci 001 to Text Davinci 003 highlights the ongoing enhancements in model design and training. Despite Text Davinci 003's lower precision compared to Text Davinci 001, its higher accuracy, recall, and F1-Score suggest a better overall balance in performance, indicating a model that is more reliable across a variety of tasks. The differences in performance metrics also underscore the importance of choosing the right model for specific applications. While LLaMA demonstrates superior overall performance, the specific requirements of a task, such as the need for higher precision over recall or vice versa, may make other models more suitable for certain applications.

IV. CONCLUSION

This study embarked on an exploration of the performance of various state-of-the-art LLMs, namely Text Davinci 001, Text Davinci 003, Flan T5, and LLaMA, across different prompt styles. Through meticulous evaluation based on accuracy, precision, recall, and F1-score, we have identified clear distinctions in the capabilities of each model, with LLaMA emerging as the frontrunner. This superior performance of LLaMA is attributed to its advanced architecture and the innovative application of parameter-efficient fine-tuning techniques, such as QLoRA, which significantly enhance its efficiency and effectiveness in processing and generating language.
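The conclusion attributes part of LLaMA's efficiency to QLoRA-style parameter-efficient fine-tuning; the sketch below shows what such a setup could look like with the Hugging Face transformers and peft libraries. The checkpoint name, rank, and target modules are illustrative assumptions, not the configuration used in this study.

```python
# Minimal QLoRA-style sketch: load a causal LM in 4-bit precision and attach
# low-rank adapters so that only a small fraction of parameters is trained.
# Checkpoint and hyperparameters are illustrative, not the study's settings.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",            # any LLaMA-family checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()    # only the adapter weights require gradients
```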
REFERENCES

[1] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., ... Amodei, D. (2020). Language Models are Few-Shot Learners. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2005.14165

[2] Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S. R. (2018). GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1804.07461

[3] Nakamura, T., and Goto, R. (2018). Outfit generation and style extraction via bidirectional LSTM and autoencoder. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1807.03133

[4] Johnson, R., and Zhang, T. (2017). Deep Pyramid Convolutional Neural Networks for Text Categorization. pp. 562-570. https://doi.org/10.18653/v1/P17-1052

[5] Zhang, X., Zhao, J., LeCun, Y. (2015). Character-level convolutional networks for text classification. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1509.01626

[6] Strubell, E., Ganesh, A., McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1906.02243

[7] Sanner, S., Balog, K., Radlinski, F., Wedin, B., Dixon, L. (2023). Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2307.14225

[8] Sun, J., Luo, Y., Gong, Y., Chen, L., Shen, Y., Guo, J., Duan, N. (2023). Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2304.11657

[9] Masikisiki, B., Marivate, V., Hlope, Y. (2023). Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2310.00272

[10] Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Chen, Y., Narasimhan, K. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2305.10601

[11] Strobelt, H., Webson, A., Sanh, V., Hoover, B., Beyer, J., Pfister, H., Rush, A. M. (2022). Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2208.07852

[12] Clavié, B., Ciceu, A., Naylor, F., Soulié, G., Brightwell, T. (2023). Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2303.07142

[13] Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., Ba, J. (2022). Large Language Models are Human-Level Prompt Engineers. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2211.01910

[14] Shin, J., Tang, C., Mohati, T., Nayebi, M., Wang, S., Hemmati, H. (2023). Prompt Engineering or Fine Tuning: An Empirical Assessment of Large Language Models in Automated Software Engineering Tasks. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2310.10508

[15] Sivarajkumar, S., Kelley, M. R., Samolyk-Mazzanti, A., Visweswaran, S., Wang, Y. (2023). An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2309.08008

[16] Gurusamy, Bharathi Mohan, Kumar, R. Parathasarathy, Srinivasan, Aravind S., Hanish, K., Pavithria, G. (2023). Text Summarization for Big Data Analytics: A Comprehensive Review of GPT 2 and BERT Approaches. https://doi.org/10.1007/978-3-031-33808-3_14

[17] Gurusamy, B. M., Rangarajan, P. K., & Srinivasan, P. (2023). A hybrid approach for text summarization using semantic latent Dirichlet allocation and sentence concept mapping with transformer. International Journal of Electrical and Computer Engineering (IJECE), 13(6), 6663-6672. https://doi.org/10.11591/ijece.v13i6.pp6663-6672

[18] Rithani, M., Kumar, R., Doss, S. (2023). A review on big data based on deep neural network approaches. Artificial Intelligence Review, 56, 1-37. https://doi.org/10.1007/s10462-023-10512-5

[19] G. B. Mohan, R. P. Kumar and T. Ravi, "Coalescing Clustering and Classification," IET Chennai 3rd International Conference on Sustainable Energy and Intelligent Systems (SEISCON 2012), Tiruchengode, 2012, pp. 1-5, doi: 10.1049/cp.2012.2254.

[20] Siva Jyothi Natha Reddy, B., Yadav, S., Venkatakrishnan, R., Oviya, I. R. (2023). Comparison of Deep Learning Approaches for DNA-Binding Protein Classification Using CNN and Hybrid Models. In: Tripathi, A. K., Anand, D., Nagar, A. K. (eds) Proceedings of World Conference on Artificial Intelligence: Advances and Applications. WWCA 1997. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-99-5881-

[21] P. K. R, B. M. G, P. Srinivasan and V. R, "Transformer-Based Models for Named Entity Recognition: A Comparative Study," 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 2023, pp. 1-5, doi: 10.1109/ICCCNT56998.2023.10308039.

[22] A. R et al., "Automating Machine Learning Model Development: An OperationalML Approach with PyCARET and Streamlit," 2023 Innovations in Power and Advanced Computing Technologies (i-PACT), Kuala Lumpur, Malaysia, 2023, pp. 1-6, doi: 10.1109/i-PACT58649.2023.10434389.