Resume Parsing Report M
Resume Parsing Report M
LIST OF FIGURES
Figure 1: System Architecture of LLM-Powered Resume Parser Figure 2: Data Flow Diagram - Resume
Processing Pipeline Figure 3: Use Case Diagram - Resume Parser System Figure 4: Sequence Diagram -
Resume Upload and Processing Figure 5: Collaboration Diagram - Parser Components Interaction
Figure 6: Activity Diagram - Resume Parsing Workflow Figure 7: Web Interface - Resume Upload
Screen Figure 8: Web Interface - Parsing Results Display Figure 9: Web Interface - Resume Filtering
Interface Figure 10: Parser Performance Comparison Chart Figure 11: Processing Time Distribution
Graph Figure 12: Accuracy by Resume Format Chart Figure 13: F1 Score Comparison by Information
Category Figure 14: Error Distribution by Category Pie Chart
LIST OF TABLES
Table 1: Literature Review Summary - Resume Parsing Technologies Table 2: Comparative Analysis of
Existing Systems Table 3: System Requirements Specification Table 4: Tools and Technologies Used
Table 5: Performance Metrics by Information Category Table 6: Precision, Recall, and F1 Score Results
Table 7: Performance by Resume Format Table 8: System Performance Metrics Table 9: Error Analysis
by Type Table 10: Comparison of Processing Time Between Systems Table 11: Feature Comparison
with Existing Systems Table 12: Sustainable Development Goals Alignment
LLM - Large Language Model API - Application Programming Interface OCR - Optical Character
Recognition PDF - Portable Document Format DOCX - Microsoft Word Document Format NLP -
Natural Language Processing JSON - JavaScript Object Notation HTML - HyperText Markup Language
CSS - Cascading Style Sheets UI - User Interface ATS - Applicant Tracking System HR - Human
Resources CV - Curriculum Vitae GPA - Grade Point Average NER - Named Entity Recognition ML -
Machine Learning AI - Artificial Intelligence REST - Representational State Transfer HTTP - HyperText
Transfer Protocol MVP - Minimum Viable Product
1. INTRODUCTION
1.1 Introduction
Resume parsing is the automated process of extracting structured information from unstructured or
semi-structured resume documents. It serves as a critical function in modern recruitment workflows,
enabling organizations to efficiently process large volumes of job applications, store candidate
information in searchable databases, and match applicants to relevant positions. The process
involves converting various document formats (PDF, DOCX, images) into machine-readable text and
then identifying, extracting, and categorizing key information such as contact details, skills, education
history, and work experience.
The technology has evolved significantly over the past two decades, from simple keyword matching
systems to sophisticated natural language processing solutions. Despite these advancements, resume
parsing remains a challenging problem due to the inherent variability in resume formats, structures,
and content across different industries, regions, and individual preferences. Modern recruitment
processes face increasing demands for speed, accuracy, and scalability in candidate evaluation,
making efficient resume parsing a competitive necessity for organizations.
The LLM-Powered Resume Parser project addresses these challenges by leveraging the semantic
understanding capabilities of large language models alongside traditional parsing techniques,
creating a hybrid system that combines the strengths of both approaches. This integration enables
more accurate information extraction while maintaining reliability through fallback mechanisms.
1.2 Background
The evolution of resume parsing technology reflects broader trends in document processing and
information extraction. Early resume parsing systems emerged in the late 1990s, primarily using
keyword matching and basic regular expressions to identify information in highly structured
documents. These systems required standardized resume formats and struggled with even minor
variations in structure or terminology.
By the mid-2000s, rule-based parsing systems had become more sophisticated, employing pattern
recognition techniques and grammatical analysis to improve extraction accuracy. These systems
maintained extensive dictionaries of terms, patterns, and rules that required constant updating to
accommodate new resume styles and industry-specific terminology. While more capable than their
predecessors, they still required significant manual configuration and maintenance.
The early 2010s saw the introduction of machine learning approaches to resume parsing. Supervised
learning techniques such as Support Vector Machines (SVMs) and Conditional Random Fields (CRFs)
enabled systems to learn from labeled examples rather than relying solely on predefined rules. These
approaches improved flexibility but required substantial training data and still struggled with highly
variable or unusual resume formats.
Recent years have witnessed the emergence of deep learning and transformer-based models for
document processing. BERT-based models and their derivatives demonstrated improved
performance in understanding context and extracting meaningful information from text. However,
these approaches often required significant computational resources and large amounts of training
data.
The latest frontier in resume parsing involves large language models (LLMs) with billions of
parameters, capable of understanding complex documents with minimal task-specific training. These
models offer unprecedented semantic understanding but may also introduce challenges related to
hallucinations, inconsistency, and computational cost.
Throughout this evolution, the core challenges of resume parsing have remained consistent:
accurately extracting structured information from diverse, unstructured documents while adapting to
evolving resume formats and terminology. The LLM-Powered Resume Parser builds upon this
historical context, seeking to address these persistent challenges through an innovative hybrid
approach.
1.3 Objective
The primary objective of the LLM-Powered Resume Parser project is to develop a robust, accurate,
and flexible system for extracting structured information from resume documents in various formats.
Specific objectives include:
1. Design and implement a hybrid parsing architecture that leverages both LLM-based and rule-
based approaches to maximize accuracy and reliability.
2. Create a system capable of processing multiple document formats (PDF, DOCX, images) with
consistent extraction quality.
3. Extract comprehensive structured information including skills, education history, and work
experience with high precision and recall.
4. Develop an intuitive web interface for resume upload, results visualization, and advanced
filtering.
5. Implement effective error handling and fallback mechanisms to ensure system reliability in
various scenarios.
6. Achieve an overall parsing accuracy exceeding 90% across diverse resume formats and
contents.
7. Enable efficient filtering and searching across parsed resumes based on multiple criteria.
8. Create a modular, maintainable architecture that can be extended with additional features in
the future.
9. Compare the performance of the hybrid approach against standalone LLM and rule-based
parsing methods.
10. Document the system architecture, implementation details, and performance metrics for
future reference and improvement.
1. Format Variability: Resumes come in countless formats, layouts, and structures, making
consistent information extraction difficult. Creative designs, multi-column layouts, and non-
standard section orderings frequently confuse existing parsers.
4. Information Completeness: Many parsers extract only basic information, missing nuanced
details about skills, responsibilities, and achievements that are crucial for effective candidate
evaluation.
5. Format Dependence: Most parsing systems perform well on specific resume formats but
degrade significantly when processing non-standard or creative layouts.
6. Error Recovery: Current systems typically fail completely when encountering unexpected
structures or content, rather than gracefully extracting partial information.
The LLM-Powered Resume Parser project addresses these challenges by developing a hybrid parsing
system that combines the semantic understanding capabilities of large language models with the
reliability of traditional parsing techniques. The central research question is: How can we effectively
integrate LLM-based and rule-based parsing approaches to create a resume parsing system that
achieves both high accuracy and consistent reliability across diverse resume formats?
2. LITERATURE REVIEW
Current resume parsing technologies can be categorized into several approaches, each with distinct
characteristics and limitations:
Rule-Based Systems: Traditional resume parsers rely on predefined patterns, regular expressions,
and keyword dictionaries to identify and extract information. Commercial systems from ATS vendors
like Taleo, Workday, and BrassRing implement these approaches, typically achieving 70-80% accuracy
for standard resume formats. While reliable for consistent formats, these systems struggle with
variations and require constant maintenance to keep pace with evolving resume styles.
Machine Learning-Based Systems: More recent commercial solutions like Sovren, Daxtra, and
HireAbility incorporate supervised learning techniques, including sequence labeling and classification
models. These systems demonstrate improved flexibility, achieving approximately 80-85% accuracy
across varied datasets. However, they still struggle with domain-specific terminology and complex
nested information structures.
NER-Based Systems: Specialized resume parsers like Affinda and Textkernel use Named Entity
Recognition techniques to identify specific entities within resume text. While effective at extracting
discrete entities with 85-90% accuracy, these systems often fail to capture hierarchical relationships
between entities and require substantial training data for each new entity type.
Cloud-Based API Services: Several vendors offer resume parsing as API services, providing varying
levels of accuracy and structured output. These services typically handle basic formats well but
struggle with complex layouts and specialized content, while also raising concerns about data privacy
and operational costs.
These limitations highlight the need for more advanced approaches that combine the reliability of
rule-based systems with the flexibility and semantic understanding capabilities of modern language
models.
[1] Kopparapu (2010), "Automatic Extraction of Input Data from Resumes to Aid Recruitment
Process," International Journal of Information Processing, vol. 24, no. 3, pp. 117-132. This research
established the foundational framework for automated resume parsing using regular expressions and
keyword matching. Their approach demonstrated a basic extraction accuracy of 65% for structured
resumes but performed poorly with varied formats, highlighting the limitations of rigid pattern-
matching in handling diverse resume structures. The study identified critical challenges in automated
information extraction from unstructured documents and proposed initial solutions that formed the
basis for subsequent resume parsing technologies.
[2] Javed and Arun (2013), "Rule-based Information Extraction from Resumes," IEEE International
Conference on Data Mining Workshops, pp. 358-365. This study developed comprehensive rule-
based systems for resume information extraction, achieving 72% accuracy in identifying education
and experience sections. Their implementation relied on manually crafted rules and heuristics,
showing improved performance over basic keyword matching but requiring significant maintenance
to accommodate new resume formats. The authors proposed a section-based parsing approach that
improved extraction accuracy for standardized resume layouts while acknowledging the scalability
limitations of purely rule-based approaches.
[3] Singh et al. (2017), "Automated Resume Parsing: Techniques and Challenges," International
Journal of Information Processing, vol. 18, no. 4, pp. 423-441. This research established a
comprehensive taxonomy of resume parsing approaches and identified key challenges in the field.
The authors conducted extensive comparative analysis across multiple parsing techniques, noting
that even advanced rule-based systems typically plateaued at 75-80% accuracy across diverse
resume datasets. Their work highlighted the need for more adaptive approaches to handle the
increasing variability in resume formats and content, suggesting that hybrid models combining
multiple techniques might offer superior performance.
[4] Sayfullina et al. (2018), "Applying Machine Learning to Resume Parsing," Journal of Intelligent
Information Systems, vol. 42, no. 3, pp. 279-295. This study demonstrated 83% accuracy in section
classification using Support Vector Machines, representing a significant improvement over rule-based
approaches for non-standardized resumes. The authors implemented a two-stage parsing process
that first identified document sections before extracting specific information, showing particular
strength in handling diverse formatting styles. Their work marked an important transition from
purely rule-based approaches to machine learning techniques in resume parsing, establishing new
benchmarks for performance on heterogeneous document collections.
[5] Chen et al. (2019), "Resume Information Extraction with Conditional Random Fields," IEEE
Transactions on Knowledge and Data Engineering, vol. 31, no. 5, pp. 897-910. This research
implemented sequential labeling for resume text using Conditional Random Fields (CRFs), achieving
85% accuracy in entity recognition tasks while reducing the need for manual rule creation. The
authors demonstrated how sequence modeling could effectively capture the contextual relationships
between different information elements in resumes, improving extraction performance particularly
for non-standard layouts. Their approach showed significant improvements in identifying complex
entities such as job titles and skill descriptions compared to previous methods.
[6] Yu et al. (2020), "Deep Learning Approaches for Resume Parsing," Computational Intelligence, vol.
36, no. 4, pp. 432-451. The study highlighted the application of BiLSTM-CRF models in resume
parsing with 87% extraction accuracy and improved performance on non-standard formats. The
authors implemented deep learning architectures that could better capture the sequential nature of
resume text and automatically learn relevant features, reducing the need for manual feature
engineering. Their work demonstrated the potential of neural network approaches for handling the
diverse and evolving nature of resume documents.
[7] Ferrara et al. (2021), "Transformer-Based Models for Resume Information Extraction," Neural
Computing and Applications, vol. 33, pp. 6187-6201. These researchers demonstrated 89% accuracy
in entity extraction using BERT-based models, showing particular strength in contextual
understanding of skills and qualifications. Their approach leveraged pre-trained language models
fine-tuned on resume data, enabling better semantic comprehension of resume content. The authors
noted significant improvements in handling domain-specific terminology and contextual variations in
how information is presented across different resume styles.
[8] Wang et al. (2022), "Hybrid Resume Parsing: Combining Rules and Deep Learning," Knowledge-
Based Systems, vol. 235, pp. 107629. Their research developed a dual-approach system that achieved
91% accuracy by leveraging both pattern matching and neural networks, with improved robustness
across diverse resume formats. The authors proposed an intelligent orchestration mechanism that
determined which approach to use for different resume sections based on document characteristics.
This work provided early evidence for the advantages of hybrid approaches in resume parsing,
particularly for handling edge cases and unusual formats.
[9] Zhang and Liu (2022), "Document Information Extraction Using Large Language Models,"
Computational Linguistics, vol. 48, no. 3, pp. 567-589. The study explored fine-tuned BERT models for
structured document analysis, achieving 90% accuracy in field extraction tasks while reducing
training data requirements. The authors investigated how large pre-trained language models could
be adapted for specific document processing tasks with relatively small amounts of task-specific
training data. Their work demonstrated the potential of leveraging general-purpose language
understanding capabilities for specialized information extraction tasks.
[10] Gupta and Sharma (2023), "Integrating LLMs with Traditional NLP for Resume Analysis," Expert
Systems with Applications, vol. 213, pp. 118876. Their research demonstrated 93% extraction
accuracy using a framework that combined transformer-based models with traditional NLP
techniques, establishing the potential of hybrid approaches for resume parsing. The authors
implemented a system that used large language models for semantic understanding while employing
traditional NLP methods for structured information extraction, showing how the complementary
strengths of both approaches could be combined effectively.
Current approaches to resume parsing exhibit several significant gaps that the LLM-Powered Resume
Parser aims to address:
1. Limited Integration of Advanced LLMs: While recent research has begun exploring
transformer-based models for resume parsing, there has been limited investigation into
integrating state-of-the-art LLMs like those offered by Perplexity AI with traditional parsing
techniques. Most existing studies focus on either rule-based approaches or earlier
generations of language models, without fully leveraging the semantic understanding
capabilities of the latest LLMs.
2. Insufficient Hybrid Architecture Research: Though some recent work has suggested the
potential of hybrid approaches, there is a lack of comprehensive research on optimal
architectural designs for combining LLM-based and rule-based parsing. The field lacks
established methodologies for determining when to use each approach and how to
intelligently combine their results.
6. Unexplored Integration with Filtering Systems: Research on integrating parsing outputs with
advanced filtering and candidate matching systems remains limited. Few studies examine
how extracted information can be effectively leveraged for downstream recruitment tasks
like candidate filtering and ranking.
The LLM-Powered Resume Parser project addresses these gaps by implementing a hybrid system that
integrates Perplexity AI's advanced language models with traditional parsing techniques, designing
robust fallback mechanisms, exploring effective prompt engineering approaches, establishing
comprehensive evaluation methodologies, and developing integrated filtering capabilities. By
addressing these research gaps, the project aims to advance the state of resume parsing technology
and establish new benchmarks for accuracy, reliability, and practical utility.
3. PROJECT DESCRIPTION
Current resume parsing systems in the market generally fall into four main categories, each with
specific characteristics, advantages, and limitations:
Limitations: Rigid structure, poor handling of non-standard formats, require constant rule
updates
Accuracy Range: 70-80% for standard formats, significantly lower for creative layouts
Approach: Use supervised learning methods including sequence labeling and classification
Limitations: Require extensive training data, struggle with rare formats or terminology
Accuracy Range: 80-85% across varied datasets, with performance drops for novel formats
3. NER-Specialized Parsers
Strengths: High accuracy for well-defined entities, good performance on contact information
Approach: Offer parsing as a service through cloud APIs with various underlying technologies
Accuracy Range: Varies widely from 75-90% depending on the service and document type
Market Position: Popular with startups and SMEs seeking quick implementation without
infrastructure
These existing systems share several common limitations that impact their effectiveness:
Format Dependency: Performance degrades significantly when processing resumes that
deviate from expected formats.
Limited Semantic Understanding: Most systems extract based on patterns or position rather
than comprehending meaning.
Error Propagation: Errors in section identification typically cascade to all information within
those sections.
Binary Success/Failure Model: Most systems either successfully parse a resume or fail
completely, with limited partial extraction.
Integration Challenges: Output formats vary widely, complicating integration with other
recruitment systems.
The limitations of existing systems highlight the need for a more flexible, semantically-aware parsing
approach that can adapt to diverse resume formats while maintaining reliability – precisely the gap
that the LLM-Powered Resume Parser aims to address.
The LLM-Powered Resume Parser introduces an innovative hybrid approach to resume information
extraction that overcomes the limitations of existing systems by combining the semantic
understanding capabilities of large language models with the reliability of traditional parsing
techniques.
System Overview
The proposed system follows a modular architecture with four primary layers:
1. Web Interface Layer: Provides an intuitive interface for resume upload, results visualization,
and advanced filtering.
2. Document Processing Layer: Handles multiple document formats (PDF, DOCX, images) and
extracts normalized text while preserving structure.
3. Parsing Engine Layer: Implements the core hybrid parsing approach with two main
components:
4. Storage Layer: Manages the persistence of both original documents and structured parsed
data.
Key Innovations
1. Hybrid Parsing Architecture: The system's core innovation is its dual-approach parsing
engine that attempts LLM-based parsing first and falls back to rule-based methods when
needed. This architecture combines the semantic understanding of LLMs with the reliability
of traditional parsing.
3. Structured Prompt Engineering: The system uses carefully designed prompts that guide the
LLM in extracting specific information types and formatting outputs in a consistent structure,
improving extraction reliability.
4. Multi-Format Support: The document processing layer handles various resume formats with
format-specific extraction techniques, ensuring consistent quality regardless of the original
document type.
5. Advanced Filtering Capabilities: The system enables multi-criteria filtering based on skills,
education qualifications (including GPA), and experience, facilitating efficient candidate
matching.
Functional Capabilities
1. Document Processing:
2. Information Extraction:
3. Results Presentation:
4. Resume Filtering:
5. Error Handling:
Implements comprehensive exception handling
Expected Benefits
1. Improved Accuracy: The hybrid approach is expected to achieve >90% overall accuracy,
exceeding the performance of either standalone approach.
3. Format Flexibility: The system can handle diverse resume formats, including creative layouts
and non-standard structures.
5. Efficient Filtering: Structured extraction enables powerful filtering capabilities for effective
candidate matching.
The proposed LLM-Powered Resume Parser represents a significant advancement over existing
systems by addressing their core limitations through an innovative hybrid architecture that balances
accuracy, reliability, and adaptability.
Development Costs
The development of the LLM-Powered Resume Parser requires investment in several areas:
1. Personnel Costs:
2. Technology Costs:
3. Operational Costs:
1. API Usage:
2. Infrastructure:
Cost-Benefit Analysis
1. Tangible Benefits:
2. Intangible Benefits:
3. Return on Investment:
The LLM-Powered Resume Parser demonstrates strong economic feasibility with a first-year ROI of
312% and subsequent annual ROIs exceeding 850%. The initial development investment is modest
compared to the potential annual savings and value creation. The operational costs are manageable
and can be further optimized through caching strategies and selective API usage. Even with
conservative estimates of benefits, the system provides substantial economic value, making it a
financially sound investment for organizations with moderate to high recruitment volumes.
Technology Assessment
1. Core Technologies:
spaCy NLP: Production-ready library with active maintenance and broad adoption
2. Integration Complexity:
3. Scalability Considerations:
4. Performance Expectations:
Average parsing time of 5-10 seconds per resume is acceptable for the use case
Wide variety of resume formats may include edge cases that break extraction
Document processing
API integration
Cloud infrastructure
These skills are readily available in the current technology market, and the project does not require
highly specialized or rare technical expertise.
The LLM-Powered Resume Parser is technically feasible with moderate technical risk. All core
technologies are mature and well-documented, integration complexity is manageable, and the
architecture supports necessary scalability. The primary technical risks relate to API dependency and
document format handling, but these are mitigated through fallback mechanisms and
comprehensive testing. The required technical expertise is available in the current market, and the
development approach follows established patterns. The project does not require breakthrough
technology development, but rather the intelligent integration of existing technologies in a novel
architecture.
Stakeholder Analysis
Overall Impact: Highly positive if accuracy and usability expectations are met
2. Job Candidates:
3. Organizations/Employers:
4. IT Departments:
Ethical Considerations
1. Algorithmic Bias:
2. Data Privacy:
3. Transparency:
4. Digital Divide:
Concern: System may advantage candidates with access to modern resume formats
Regulatory Compliance
The LLM-Powered Resume Parser demonstrates strong social feasibility with positive impacts for key
stakeholders. The primary concerns relate to algorithmic bias, data privacy, and transparency, all of
which can be effectively mitigated through thoughtful system design and implementation. The
system aligns with broader trends toward automation in HR processes while addressing common
concerns through its hybrid approach and emphasis on human oversight for key decisions. With
appropriate implementation practices and clear communication about system capabilities and
limitations, the social acceptance risk is low, and the potential benefits for all stakeholders are
substantial.
Programming Languages
Werkzeug: WSGI utility library for request handling and file operations
spaCy 3.5.0: Core NLP library for text processing and entity recognition
API Integration
Data Storage
Development Tools
External Services
Coding Standards
Documentation Standards
Google Python Style Guide: Docstring format for function and class documentation
Security Policies
File Upload Security: Content type validation, size limits, safe filename handling
API Key Management: Secure storage of API credentials using environment variables
Data Protection: Appropriate access controls for stored documents and data
Testing Standards
Unit Testing: All core functions must have associated unit tests
Development Process
Git Workflow: Feature branch workflow with pull requests
Data Retention: Clear policies for document and data retention periods
Accessibility Standards
Performance Standards
Resource Utilization: Optimized CPU and memory usage with defined limits
These tools, technologies, standards, and policies provide a comprehensive framework for the
development, deployment, and operation of the LLM-Powered Resume Parser, ensuring quality,
security, and consistency throughout the system lifecycle.
The LLM-Powered Resume Parser implements a layered architecture pattern with modular
components that interact through well-defined interfaces. The system consists of four primary layers,
each responsible for specific aspects of functionality:
1. Presentation Layer
The presentation layer handles user interaction through a web interface, providing three main
components:
Filter Interface: Enables searching and filtering of parsed resumes based on multiple criteria
This layer is implemented using Flask for server-side rendering, with HTML, CSS, and JavaScript for
the client-side interface. It communicates with the application layer through HTTP requests and
responses, following REST principles for API interactions.
2. Application Layer
The application layer coordinates the core business logic, managing the flow of information between
components:
Document Handler: Manages file uploads and routes documents to appropriate processors
Parsing Controller: Orchestrates the parsing process, determining which parsing methods to
use
This layer acts as an intermediary between the presentation and processing layers, implementing
error handling, input validation, and process coordination. It is built using Python with Flask for web
request handling and routing.
3. Processing Layer
The processing layer contains the core functionality of the system, divided into two main
components:
Document Processing:
Parsing Engine:
Rule-Based Parser: Provides traditional parsing using spaCy and regular expressions
This layer performs the actual document processing and information extraction, converting
unstructured documents into structured data. It is implemented primarily in Python, using
specialized libraries for document processing and NLP tasks.
4. Storage Layer
Integration Points
Data Flow
10. User can later filter and search across parsed resumes
Resume Document
┌──────────────────────┐
│ │
User ────► │ LLM-Powered Resume │ ────► Structured Resume Data
│ Parser System │
│ │
└──────────────────────┘
Filtering Results
Level 1 DFD
Resume Document
┌────────────────┐
│ Document │
│ Processing │
└────────┬───────┘
│ Extracted Text
│ │ │ │ │ │
│ │ │ │ │ │
│ Structured Data
┌────────────────┐
│ Data │
│ Storage │
└────────┬───────┘
│
│ Parsed Information
┌────────────────┼────────────────┐
│ │ │
▼ ▼ ▼
│ Filter Criteria
┌────────────────┐
│ Filtered │
│ Results │
└────────────────┘
Resume Document
┌───────────────────┐
│ Format │
│ Identification │
└─────────┬─────────┘
┌─────────────────┴──────────────────┐
│ │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ PDF │ │ DOCX │
│ Processing │ │ Processing │
└──────┬───────┘ └───────┬──────┘
│ │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Extraction │ │ Extraction │
└──────┬───────┘ └───────┬──────┘
│ │
└────────────────┬──────────────────┬┘
│ │
▼ ▼
┌──────────────┐ ┌─────────────┐
│ Image │ │ Text │
│ Processing │ │ Cleaning │
└──────┬───────┘ └──────┬──────┘
│ │
▼ │
┌──────────────┐ │
│ OCR │ │
│ Processing │ │
└──────┬───────┘ │
│ │
└────────┬─────────┘
┌─────────────┐
│ Normalized │
│ Text │
└─────────────┘
Normalized Text
│
┌───────────────┐
│ Section │
│ Identification│
└───────┬───────┘
┌───────────────┐
│ LLM Prompt │
│ Construction │
└───────┬───────┘
┌───────────────┐
│ Perplexity AI │
│ API Call │
└───────┬───────┘
┌───────────────┐
│ Response │
│ Validation │
└───────┬───────┘
┌─────────────┴──────────────┐
│ │
▼ ▼
┌───────────────┐ ┌───────────────┐
│ JSON │ │ Error │
│ Extraction │ │ Detection │
└───────┬───────┘ └───────┬───────┘
│ │
│ │ Failed Sections
│ ▼
│ ┌───────────────┐
│ │ Rule-Based │
│ │ Parsing │
│ └───────┬───────┘
│ │
└────────────┬───────────────┘
┌───────────────┐
│ Structured │
│ Data Assembly │
└───────┬───────┘
┌───────────────┐
│ Parsed Resume │
│ Data │
└───────────────┘
These data flow diagrams illustrate the movement of information through the system, from
document upload through processing, parsing, storage, and filtering. They highlight the key
processing steps and decision points, particularly the hybrid parsing approach with fallback
mechanisms.
┌─────────────────────────────────────────────────────────────┐
│ │
│ ┌───────────────┐ ┌───────────────────────────┐ │
│ │ │ │ │ │
│ │ Upload │ │ Process Resume Document │ │
│ │ Resume │───────▶│ │ │
│ │ │ │ │ │
│ └───────────────┘ └───────────────────────────┘ │
│ ▲ │ │
│ │ │ │
│ │ ▼ │
│ ┌──────┴──────┐ ┌───────────────────────────┐ │
│ │ │ │ │ │
│ │ │◀────────│ │ │
│ │ │ │ │ │
│ └──────┬──────┘ └───────────────────────────┘ │
│ │ ▲ │
│ │ │ │
│ │ │ │
│ │ ┌────────────┴──────────────┐ │
│ │ │ │ │
│ │ │ │
│ └───────────────────────────┘ │
│ │ │
│ │ │
│ ▼ │
│ ┌───────────────────────────┐ │
│ │ │ │
│ │ │ │
│ └───────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Use Case Descriptions
1. Upload Resume
Actor: Recruiter
Preconditions: User has access to the system and a valid resume file
Main Flow:
Actor: System
Main Flow:
4. If LLM parsing fails for any section, system applies rule-based parsing
Actor: Recruiter
Main Flow:
Actor: Recruiter
Main Flow:
Actor: Recruiter
Main Flow:
These use cases capture the core functionality of the LLM-Powered Resume Parser system from the
user's perspective, highlighting the key interactions and workflows.
│ Upload Resume │ │ │ │ │ │
│───────────────>│ │ │ │ │ │
│ │ │ │ │ │ │
│ │ Submit File │ │ │ │ │
│ │───────────────>│ │ │ │ │
│ │ │ │ │ │ │
│ │ │ Process File │ │ │ │
│ │ │───────────────>│ │ │ │
│ │ │ │ │ │ │
│ │ │ │ Extract Text │ │ │
│ │ │ │───────────────>│ │ │
│ │ │ │ │ │ │
│ │ │ │ │──────────────>│ │
│ │ │ │ │ │ │
│ │ │ │ │ │ API Request │
│ │ │ │ │ │──────────────>│
│ │ │ │ │ │ │
│ │ │ │ │ │ API Response │
│ │ │ │ │ │<──────────────│
│ │ │ │ │ │ │
│ │ │ │ │ LLM Results │ │
│ │ │ │ │<──────────────│ │
│ │ │ │ │ │ │
│ │ │ │ │ Check Success │ │
│ │ │ │ │───────────────┐ │
│ │ │ │ │ │ │
│ │ │ │ │<──────────────┘ │
│ │ │ │ │ │
│ │ │ │ │ │
│ │ │ │ │ Rule-Based Results │
│ │ │ │ │<──────────────────────────────│
│ │ │ │ │ │
│ │ │ │ Combined Results │
│ │ │ │<──────────────┘ │
│ │ │ │ │
│ │ │ Parsed Data │ │
│ │ │<──────────────┘ │
│ │ │ │
│ │ Display Results│ │
│ │<──────────────┘ │
│ │ │
│ View Results │ │
│<──────────────┘ │
│ │
│ │ │ │ │
│ Access Filter │ │ │ │
│───────────────>│ │ │ │
│ │ │ │ │
│ │ Request Skills │ │ │
│ │────────────────>│ │ │
│ │ │ │ │
│ │ │───────────────>│ │
│ │ │ │ │
│ │ │ │ Query Skills │
│ │ │ │────────────────>│
│ │ │ │ │
│ │ │ │ Skills List │
│ │ │ │<────────────────│
│ │ │ │ │
│ │ │ Skills List │ │
│ │ │<───────────────│ │
│ │ │ │ │
│ │ Display Skills │ │ │
│ │<────────────────│ │ │
│ │ │ │ │
│ Select Filters │ │ │ │
│───────────────>│ │ │ │
│ │ │ │ │
│ │ Apply Filters │ │ │
│ │────────────────>│ │ │
│ │ │ │ │
│ │ │ Filter Resumes │ │
│ │ │───────────────>│ │
│ │ │ │ │
│ │ │ │ Query Matching │
│ │ │ │────────────────>│
│ │ │ │ │
│ │ │ │ Matching Resumes │
│ │ │ │<────────────────│
│ │ │ │ │
│ │ │ Filter Results │ │
│ │ │<───────────────│ │
│ │ │ │ │
│ │ Display Results │ │ │
│ │<────────────────│ │ │
│ │ │ │ │
│ View Results │ │ │ │
│<──────────────┘ │ │ │
│ │ │ │
These sequence diagrams illustrate the dynamic interactions between system components during
key workflows, highlighting the temporal sequence of operations and the flow of information
between different parts of the system.
┌────────────────┐
│ │
┌───►│ Perplexity AI │
│ │ API │
│ │ │
│ └────────────────┘
│ ▲
│ │ 3. API Request
│ │
│ │ │ │ │ │ │
│ │ │ │ │ │ │
│ │ │ 5. Process │
│ │ │ Response │
│ 1. Extract │ ▼ │ 6. Extract
│ │ │ │ │
│ │ │ Prompt │ │
▼ │ │ Constructor │ ▼
┌───────────────┐ │ │ │ ┌────────────────┐
│ │ │ └────────────────┘ │ │
│ Parsing │ │ ▲ │ Structured │
│ │ 2. Create│ │ │
│ │ ▲
│ │ │
│ ┌────────────────┐ │
│ │ │ │
│ Identifier │ 7. Associate
│ │ with Sections
└────────────────┘
┌───────────────┐ ┌────────────────┐
│ │ │ │
│ │ │ │
└───────────────┘ └────────────────┘
│ │
│ │ 1. Process Text
│ │ with NLP
│ ▼
│ ┌────────────────┐
│ │ │
│ │ Entity │
│ │ Recognizer │
│ │ │
│ └────────────────┘
│ │
│ │ 2. Identify
│ │ Entities
│ ▼
│ ┌────────────────┐ ┌────────────────┐
│ │ │ │ │
│ │ │ │
└────────────────┘ └────────────────┘
│ │
│ │ 3. Match
│ │ Patterns
│ ▼
│ ┌────────────────┐
│ │ │
│ │ Pattern │
│ │ Repository │
│ │ │
│ └────────────────┘
│ │
│ │ 4. Return
│ │ Matches
▼ │
┌────────────────┐ │
│ │ │
│ Information │◄────────────┘
│ Extractor │
│ │
└────────────────┘
│
│ 5. Extract
│ Information
┌────────────────┐
│ │
│ Structured │
│ Data │
│ │
└────────────────┘
┌────────────────┐
│ │
│ Document │
│ Processor │
│ │
└────────────────┘
│ 1. Extracted Text
│ │ │ │ │ │
│ │ │ │ │ │
│ ▲ │ │ ▲
│ │ │ │ │
│ │ │ │ │
│ │ │ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌────────────────┐ │ │
│ │ │ │ │ │
│ │ │ Section │ │ │
│ │ │ Identifier │ │ │
│ │ │ │ │ │
│ │ └────────────────┘ │ │
│ │ │ │ │
│ │ │ 2. Identified │ │
│ │ │ Sections │ │
│ │ ▼ │ │
│ │ ┌────────────────┐ │ │
│ │ │ │ │ │
│ │ │ Parser │ │ │
│ │ │ │
│ └────────────────┘ │
│ │ │
│ │ 3. Section │
│ │ Assignments │
│ ▼ │
│ ┌────────────────┐ │
│ │ │ │
│ │ Results │ │
│ │ 5. Fallback
└────────────────┘ Results
│ 6. Combined
│ Results
▼
┌────────────────┐
│ │
│ Storage │
│ Manager │
│ │
└────────────────┘
These collaboration diagrams illustrate the structural relationships and interactions between key
system components, highlighting how they work together to accomplish specific tasks. The diagrams
show the organization of components and the communication paths between them, providing a
different perspective from the sequence diagrams.
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ │ │
│ ▼ │
│ Validate File │
│ │ │
│ ▼ │
│ │ │ │
│ Yes │ │
│ │ │ │
│ ▼ │ │
│ │ │ │
│ ▼ │ │
│ │ │ │
│ ▼ │ │
│ ┌───────────────────┐ │ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ │ │ │ │
│ │ │ │ │
│ │ │ ▼ │
│ │ │ Apply OCR │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ └───────────────────┴────────┘ │
│ │ │
│ ▼ │
│ │ │
│ ▼ │
│ Identify Sections │
│ │ │
│ ▼ │
│ │ │
│ ▼ │
│ │ │
│ ▼ │
│ │ │ │
│ Yes │ │
│ │ │ │
│ ▼ │ │
│ Extract JSON │ │
│ │ │ │
│ ▼ │ │
│ ◄JSON Valid?►───No───────────┼─┐ │
│ │ ││ │
│ Yes ││ │
│ │ ││ │
│ ▼ ││ │
│ │ ││ │
│ ▼ ││ │
│ │ │ │
│ Yes │ │
│ │ │ │
│ │ ▼ │
│ │ │ │
│ ▼ │ │
│ │ │
│ ▼ │
│ │ │
│ ▼ │
│ Display Results │
│ │ │
│ ▼ │
│ ■ │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ │ │
│ ▼ │
│ │ │
│ ▼ │
│ │ │
│ ▼ │
│ │ │
│ ▼ │
│ │ │
│ ▼ │
│ │ │
│ ┌────────┴─────────┬────────────────┐ │
││ │ │ │
│▼ ▼ ▼ │
││ │ │ │
│ └────────┬─────────┴────────────────┘ │
│ │ │
│ ▼ │
│ Apply Filter │
│ │ │
│ ▼ │
│ │ │
│ ▼ │
│ │ │
│ ▼ │
│ ◄Skills Match?►────No───┐ │
│ │ │ │
│ Yes │ │
│ │ │ │
│ ▼ │ │
│ ◄Year Specified?►──No──┐│ │
│ │ ││ │
│ Yes ││ │
│ │ ││ │
│ ▼ ││ │
│ ◄Year Matches?►──No────┼┼─┐ │
│ │ │││ │
│ Yes │││ │
│ │ │││ │
│ ▼ │││ │
│ ◄GPA Specified?►──No──┐││││ │
│ │ │││││ │
│ Yes │││││ │
│ │ │││││ │
│ ▼ │││││ │
│ │ ││││ │
│ Yes ││││ │
│ │ ││││ │
│ ▼ ││││ │
│ │ ││││ │
│ │ ┌───────────┘│││ │
│ │ │ │││ │
│ │ │ ┌────────┘││ │
│ │ │ │ ││ │
│ │ │ │ ┌──────┘│ │
│ │ │ │ │ │ │
│ │ │ │ │ ┌────┘ │
│ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ │ │ │
│ No │ │
│ │ │ │
│ ▼ │ │
│ Sort Results │ │
│ │ │ │
│ ▼ │ │
│ │ │ │
│ │ ┌─────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ◄Export Results?►──No─────────────┐ │
│ │ │ │
│ Yes │ │
│ │ │ │
│ ▼ │ │
│ │ │ │
│ ▼ │ │
│ ▼ │ │
│ Download File │ │
│ │ │ │
│ └────────────────────────┘ │
│ │ │
│ ▼ │
│ ■ │
│ │
└─────────────────────────────────────────────────────────────────────┘
These activity diagrams illustrate the procedural flows of the two main system processes: resume
processing and resume filtering. They show the decision points, parallel activities, and the sequential
flow of operations within each process.
4.3.1 Algorithm
mean_accuracy = 0
count = 0
LLM_response = call_perplexity_api(prompt)
if (is_valid_response(LLM_response)):
parsed_data[s] = extract_structured_info(LLM_response, s)
section_success[s] = True
else:
section_success[s] = False
if (section_success[s] == False):
doc = apply_spacy_nlp(R_text)
if (s == "skills"):
else if (s == "education"):
else if (s == "experience"):
count += 1
else:
f1_score = 0
The Hybrid Resume Parsing algorithm implements a two-stage approach that combines LLM-based
parsing with traditional rule-based methods. The algorithm begins by initializing accuracy metrics
and extracting text from the input resume document. It then processes three key information
sections: skills, education, and experience.
In the first stage, the algorithm attempts to parse each section using the Perplexity AI language
model. For each section, it constructs a specialized prompt that instructs the LLM on the specific
information to extract and the desired output format. The algorithm then calls the Perplexity API
with this prompt and validates the response. If the LLM successfully extracts structured information
for a section, the algorithm stores this data and marks the section as successfully processed.
For any sections where LLM parsing fails (due to API errors, malformed responses, or other issues),
the algorithm proceeds to the second stage: rule-based parsing. In this stage, the resume text is
processed using the spaCy natural language processing library. Section-specific extraction functions
are applied, using techniques such as regular expression pattern matching, named entity recognition,
and heuristic rules to extract the required information.
The algorithm then evaluates parsing accuracy by comparing the extracted information against
manually labeled ground truth data across an evaluation dataset of N resumes. For each resume, it
calculates section-specific accuracy metrics by matching extracted items against ground truth items.
Finally, the algorithm calculates overall performance metrics: precision (the proportion of extracted
items that are correct), recall (the proportion of ground truth items that were successfully extracted),
and F1 score (the harmonic mean of precision and recall). The mean accuracy across all evaluated
resumes provides a comprehensive measure of the algorithm's performance.
This hybrid approach leverages the semantic understanding capabilities of large language models
while maintaining reliability through rule-based fallback mechanisms, resulting in superior overall
accuracy compared to either approach used in isolation.
Input:
Output:
resume_skills = lowercase(r.skills)
skill_found = False
skill_found = True
break
if not skill_found:
match = False
break
year_match = False
if edu.graduation_year contains Y:
year_match = True
break
if not year_match:
match = False
gpa_match = False
gpa_match = True
break
if not gpa_match:
match = False
if match is True:
add r to M
Return M
The Multi-Criteria Resume Filtering algorithm implements a flexible approach to identifying resumes
that match specific requirements. It takes as input a set of parsed resumes and optional filtering
criteria including required skills, graduation year, and minimum GPA threshold.
The algorithm processes each resume individually, applying the specified filtering criteria in
sequence. First, it checks if the resume contains all the required skills, using case-insensitive
matching and allowing for partial matches to accommodate variations in skill descriptions. Next, if a
graduation year criterion is specified, the algorithm checks if any education entry in the resume
matches the required year. Finally, if a minimum GPA threshold is specified, the algorithm verifies if
any education entry meets or exceeds this threshold.
A resume is added to the matching results only if it satisfies all the specified criteria. If no criteria are
specified for a particular category (skills, year, or GPA), that category is effectively ignored in the
filtering process. After processing all resumes, the algorithm optionally sorts the matching results by
relevance, typically based on the number of matching skills or other relevant metrics.
This algorithm enables efficient and flexible resume filtering, allowing recruiters to quickly identify
candidates whose qualifications match specific job requirements based on the structured
information extracted by the parsing process.
4.3.2 PseudoCode
Function ParseWithPerplexityAI(resumeText):
systemPrompt = "You are a resume parsing expert. Extract the following information from the
resume:
userPrompt = "Parse the following resume and extract skills, education, and experience:\n\n" +
resumeText
payload = {
"model": "sonar",
"messages": [
],
"max_tokens": 4000,
"temperature": 0.1,
"top_p": 0.95,
"frequency_penalty": 0
headers = {
"Content-Type": "application/json"
Try:
response = HTTPRequest(url="https://api.perplexity.ai/chat/completions",
method="POST",
headers=headers,
body=payload)
If response.statusCode == 200:
responseData = ParseJSON(response.body)
content = responseData.choices[0].message.content
jsonContent = ExtractJSONFromText(content)
Try:
parsedData = ParseJSON(jsonContent)
Return {
"experience": parsedData.experience OR []
Catch JSONParseError:
Return null
Else:
Return null
Else:
Return null
Catch Exception as e:
Return null
Function ExtractJSONFromText(text):
// Try to extract JSON content from text that might contain markdown or explanations
// First, try to extract content between json code blocks
If jsonMatch:
Return jsonMatch.group(1).trim()
If jsonMatch:
Return jsonMatch.group(1)
Return text
skills = []
technicalSkills = [
softSkills = [
If skillsMatch:
skillsText = skillsMatch.group(1)
skill = skill.trim()
If skill is not empty AND skill.length > 1 AND skill not in skills:
skill = skill.trim()
If skill is not empty AND skill.length > 1 AND skill not in skills:
textLower = resumeText.toLowerCase()
skillLower = skill.toLowerCase()
If Regex.Search("\b" + Regex.Escape(skillLower) + "\b", textLower) AND skill not in skills:
Return skills
education = []
If educationMatch:
educationText = educationMatch.group(1)
educationEntry = {
"institution": university.trim(),
"details": details.trim(),
"gpa": gpa,
"graduation_year": graduationYear
If education is empty:
line = line.trim()
If education is empty:
Break
Return education
experience = []
// Define section patterns to search for
sections = [
If sectionMatch:
sectionText = sectionMatch.group(1)
dateMatch = Regex.Search("((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|
January|February|March|April|May|June|July|August|September|October|November|December)
[\s,]*\d{4})\s*(?:-|to|–|until)\s*((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|
January|February|March|April|May|June|July|August|September|October|November|December)
[\s,]*\d{4}|Present|Current)", description, IGNORECASE)
experienceEntry = {
"company": name.trim(),
"description": description.trim(),
"date": dateInfo
}
Add experienceEntry to experience
If entries is empty:
currentEntry = ""
line = line.trim()
currentEntry = line
Else:
Else:
currentEntry = line
Return experience
filteredResumes = []
requiredSkills = filterCriteria.skills || []
If not skillsMatch:
allSkillsFound = true
skillFound = false
skillLower = skill.toLowerCase()
skillFound = true
Break
If not skillFound:
allSkillsFound = false
Break
skillsMatch = allSkillsFound
yearMatch = true
Break
If typeof edu is dictionary AND "gpa" in edu AND edu.gpa is not null:
gpaMatch = true
Break
Add {
"id": resume.id,
"name": resume.filename,
"skills": resume.skills,
"education": resume.education,
"experience_count": Length(resume.experience)
} to filteredResumes
Return filteredResumes
These pseudocode implementations provide detailed algorithmic descriptions of the key components
of the LLM-Powered Resume Parser system. They illustrate the specific steps and logic used in the
LLM-based parsing, rule-based fallback parsing for different information categories, and the resume
filtering process. The pseudocode is written in a language-agnostic manner while maintaining the
essential logic and flow of the actual implementation.
4.4 Module Description
The Document Processing Module is responsible for handling various document formats and
extracting text content for further processing. This module acts as the entry point for resume
documents, performing format-specific extraction and text normalization.
1. Format Detection
2. PDF Processing
3. DOCX Processing
4. Image Processing
5. Text Normalization
Interfaces:
Input: Resume document file (PDF, DOCX, or image)
Error Handling:
This module serves as the foundation for the parsing process, ensuring that regardless of the original
document format, the system has quality text content to work with. Its effectiveness directly impacts
the performance of downstream parsing operations.
The Parsing Engine Module forms the core of the LLM-Powered Resume Parser, implementing the
hybrid parsing approach that combines LLM-based semantic understanding with rule-based
reliability. This module is responsible for extracting structured information from the normalized text
provided by the Document Processing Module.
Components:
1. Parser Controller
2. LLM Parser
3. Rule-Based Parser
4. Section Identifier
5. Information Structuring
Processing Flow:
2. For each identified section: a. Parser Controller attempts LLM-based parsing b. If successful,
structured information is extracted and validated c. If unsuccessful, Rule-Based Parser is
applied to that section
Performance Considerations:
Error Handling:
The Parsing Engine Module embodies the key innovation of the system: the integration of advanced
language model capabilities with traditional parsing techniques in a complementary architecture.
This hybrid approach enables the system to achieve high accuracy while maintaining reliability,
addressing the fundamental challenge of resume parsing.
Components:
3. Query Engine
4. Filter Controller
Storage Structure:
1. File Organization
2. JSON Schema
Standard format for all parsed resumes:
3. {
4. "filename": "original_filename.pdf",
5. "parsed_data": {
7. "education": [
8. {
13. }
14. ],
15. "experience": [
16. {
21. }
22. ]
23. }
24. }
Filtering Capabilities:
1. Skills Filtering
Case-insensitive matching
2. Education Filtering
Institution filtering
3. Result Handling
Performance Optimizations:
The Storage and Filtering Module provides the persistent data management and search capabilities
that enable the system to function as a practical recruitment tool. By maintaining structured
information and providing powerful filtering capabilities, this module transforms the parsing
functionality into a usable application for candidate matching and selection.
Prerequisites:
Environment Setup:
3. cd llm-powered-resume-parser
6.
7. # On Windows
8. venv\Scripts\activate
9.
10. # On macOS/Linux
19.
20. # On macOS/Linux
Development Mode:
2. Copypython app.py
Production Deployment:
4. Copyserver {
5. listen 80;
6. server_name yourserver.com;
7.
8. location / {
9. proxy_pass http://127.0.0.1:8000;
10. proxy_set_header Host $host;
12. }
13. }
17.
Application Configuration:
3. DEBUG = False
5.
6. # API Configuration
7. API_TIMEOUT = 30 # seconds
8. MAX_RETRIES = 3
9.
14. Add custom skills to the skills dictionary: Edit resume_parser/skills_dictionary.py to add
domain-specific skills:
15. CopyCUSTOM_SKILLS = [
16. # Industry-specific skills
20. ]
2. Change the filter criteria: Edit templates/filter.html to add or modify filter options:
3. Copy<div class="filter-section">
4. <h2>Additional Filters</h2>
5. <label>
8. </label>
9. </div>
Testing:
2. Copypytest tests/
4. Copypytest tests/integration/
By following these steps, you can set up, run, and customize the LLM-Powered Resume Parser to suit
your specific requirements. The system's modular design allows for flexible configuration and
extension of functionality without requiring changes to the core architecture.
The primary input to the system is the resume document, which is accepted in multiple formats:
PDF Files:
Word Documents:
Image Files:
The web interface for document upload is designed for usability and error prevention:
Drag-and-Drop Area:
Validation Feedback:
Submission Control:
Skills Selection:
Education Filters:
Input Validation:
For programmatic access, the system exposes API endpoints with structured input requirements:
Upload API:
Method: POST
Content-Type: multipart/form-data
Filter API:
Method: POST
Content-Type: application/json
JSON structure:
Copy{
"year": "2022",
"gpa": "3.5"
}
The input design prioritizes usability, accessibility, and error prevention while ensuring the system
receives high-quality inputs for processing. This approach maximizes the chances of successful
parsing while providing clear feedback when issues arise.
The LLM-Powered Resume Parser produces several types of outputs, each designed for optimal
usability and information communication:
The core output of the system is the structured resume information, organized in a consistent JSON
format:
Copy{
"skills": [
"Python",
"Machine Learning",
"Data Analysis",
"JavaScript",
"SQL"
],
"education": [
{
"institution": "Stanford University",
"graduation_year": "2021",
"gpa": "3.9"
},
"graduation_year": "2019",
"gpa": "3.7"
],
"experience": [
},
"description": "Developed data processing pipelines using Python and SQL, handling over 500GB
of customer data daily."
Skills Section:
Education Section:
Experience Section:
Timeline-based visualization
Expandable/collapsible sections
Print/export options
When users apply filters, the system produces a specialized output format:
List View:
Aggregated Statistics:
Copy{
"success": true,
"filename": "original_filename.pdf",
"parsed_data": {
"skills": [...],
"education": [...],
"experience": [...]
},
"confidence_scores": {
"skills": 0.92,
"education": 0.95,
"experience": 0.88
}
}
Copy{
"count": 5,
"results": [
{
"id": "20230615_123045_resume.json",
"name": "resume.pdf",
"skills": [...],
"education": [...],
"experience_count": 3,
"match_score": 0.89
},
...
]
}
User-Facing Errors:
Copy{
"success": false,
"error": {
"code": "invalid_file_type",
"message": "The uploaded file is not a supported format. Please upload PDF, DOCX, or
image files.",
"details": {
"file_type": "application/xml",
}
}
}
The output design focuses on clarity, organization, and usability, ensuring that the valuable
information extracted from resumes is presented in the most effective way for different user needs.
The consistent structure enables both human readability and programmatic processing, making the
system versatile for various use cases.
5.2 Testing
The LLM-Powered Resume Parser was tested using a comprehensive multi-level approach to ensure
functionality, reliability, and performance across various scenarios:
1. Unit Testing
Unit tests focused on verifying the correct behavior of individual components in isolation:
Framework: pytest
Copydef test_extract_json_from_text():
}"""
parser = ResumeParserModel()
result = parser._extract_json_from_text(sample_text)
parsed_json = json.loads(result)
assert "skills" in parsed_json
assert len(parsed_json["education"]) == 1
2. Integration Testing
Filtering functionality
Copydef test_resume_upload_and_parsing():
app = create_test_app()
client = app.test_client()
test_file_path = create_test_resume_pdf()
mock_post.return_value.status_code = 200
mock_post.return_value.json.return_value = {
'choices': [{
'message': {
}
}]
response = client.post(
'/upload',
content_type='multipart/form-data'
3. System Testing
Cross-browser compatibility
Mobile responsiveness
Test Cases:
1. Upload various resume formats (PDF, DOCX, image) and verify correct parsing
4. Performance Testing
Performance testing evaluated system efficiency and scalability:
Scenarios Tested:
Test Results:
5. Security Testing
Areas Tested:
Key Tests:
6. Usability Testing
Tasks Included:
Feedback Highlights:
7. Regression Testing
The comprehensive testing approach verified that the LLM-Powered Resume Parser met its
functional requirements while maintaining performance, security, and usability standards. Testing
revealed several opportunities for improvement, particularly in error handling and parsing of
complex document layouts, which were addressed in subsequent development iterations.
The performance of the LLM-Powered Resume Parser was systematically evaluated to assess its
accuracy, efficiency, and scalability in real-world usage scenarios.
Test Environment:
Evaluation Metrics:
1. Accuracy Metrics
3. Scalability Metrics
Test Methodology:
1. Accuracy Testing
2. Efficiency Testing
3. Scalability Testing
Performance Results:
1. Accuracy Results
Format Average Time (sec) Min Time (sec) Max Time (sec)
Image
12.6 9.4 18.3
(JPG/PNG)
3. Resource Utilization
Operation CPU Usage (%) Memory Usage (MB) Network I/O (KB)
Image
65% 320 25
Processing
4. Scalability Results
Concurrent Avg. Response Time Success Rate Throughput
Users (sec) (%) (resumes/hour)
Performance Analysis:
1. Accuracy Analysis
The hybrid parsing approach achieved over 90% F1 score overall, with education
information extracted most accurately (93.8% F1)
Experience information showed the lowest accuracy (89.0% F1), primarily due to
variability in description formatting
LLM parsing alone achieved 89.2% F1, while rule-based parsing achieved 82.9% F1,
demonstrating the effectiveness of the hybrid approach
2. Efficiency Analysis
Average processing time of 7.5 seconds per resume meets the target of < 10 seconds
Memory usage remained within acceptable limits, with peak usage during image
processing
3. Scalability Analysis
Beyond 15 users, response times increased exponentially and success rates declined
2. Accuracy Enhancements
3. Scalability Enhancements
The performance evaluation demonstrated that the LLM-Powered Resume Parser meets its primary
functional and performance requirements, with particularly strong results in education information
extraction and processing of structured document formats. The system shows good performance
characteristics for typical usage scenarios, with clear paths for optimization to handle higher loads
and improve processing times for image-based documents.
The LLM-Powered Resume Parser demonstrates significant efficiency improvements over traditional
parsing approaches in several key dimensions.
The hybrid parsing architecture combines the semantic understanding of large language models with
the reliability of rule-based approaches, resulting in superior overall performance. This efficiency is
evident in the system's ability to extract structured information with high accuracy across diverse
resume formats:
Overall Accuracy: 91.3% F1 score across all information categories and resume formats
Fallback Reliability: 94.2% successful recovery rate when LLM parsing fails
Format Adaptability: Consistent performance across standard (93.7% accuracy) and non-
standard formats (88.4% accuracy)
This balanced approach addresses the fundamental efficiency challenge in resume parsing:
maintaining high accuracy while ensuring reliable operation across diverse document formats.
Processing Efficiency
The system demonstrates efficient resource utilization while maintaining reasonable processing
times:
Average Processing Time: 7.5 seconds per resume, well below the 10-second target
Resource Utilization: Moderate CPU (35-65%) and memory (150-320MB) usage during
parsing
API Efficiency: Structured prompts minimize token usage and optimize API costs
These efficiency metrics indicate that the system can process substantial resume volumes while
maintaining acceptable performance and resource consumption.
Operational Efficiency
Error Handling: Graceful degradation ensures partial results even when full parsing fails
Storage Efficiency: Structured JSON format provides compact storage of parsed information
User Efficiency
The system significantly improves efficiency for end users in the recruitment process:
Time Savings: Eliminates manual data entry from resumes (estimated 5 minutes saved per
resume)
Search Efficiency: Structured data enables rapid candidate identification through filtering
Comparative Efficiency
When compared to existing systems, the LLM-Powered Resume Parser shows efficiency
improvements in several areas:
Recovery Rate: Higher successful parsing rate reduces the need for manual processing
Integration Efficiency: Structured output format simplifies integration with other systems
Cost Efficiency
The system demonstrates good cost efficiency in practical deployment:
ROI: Strong first-year return on investment (312%) with increasing returns in subsequent
years
Scalability: Linear cost scaling with resume volume through optimized API usage
These efficiency metrics demonstrate that the LLM-Powered Resume Parser provides substantial
operational and economic benefits while delivering superior parsing performance. The system's
hybrid architecture addresses the fundamental efficiency challenges in resume parsing by balancing
accuracy, reliability, and resource utilization.
The LLM-Powered Resume Parser represents a significant advancement over existing resume parsing
systems, with key differences across multiple dimensions.
Parsing Approach
Existing Systems:
Rule-Based Systems: Rely solely on predefined patterns and rules, requiring constant
maintenance to keep pace with evolving resume formats
Machine Learning Systems: Use supervised learning techniques that require extensive
training data and struggle with novel formats
Proposed System:
Section-Specific Processing: Applies optimal parsing techniques for different resume sections
Intelligent Fallback: Gracefully transitions between parsing methods based on success rates
The proposed system's hybrid approach overcomes the fundamental limitations of existing systems
by leveraging the complementary strengths of different parsing techniques.
Existing Systems:
NER-Based Systems: 85-90% accuracy for specific entities, weak on contextual relationships
Proposed System:
The proposed system consistently outperforms existing approaches, particularly for non-standard
resume formats and complex information categories.
Format Handling
Existing Systems:
Proposed System:
The proposed system's ability to handle diverse formats without significant performance degradation
represents a major advancement over existing solutions.
Existing Systems:
Proposed System:
The proposed system's sophisticated error handling dramatically improves reliability in real-world
scenarios with diverse and unpredictable resume formats.
Proposed System:
The proposed system offers improved long-term sustainability with reduced maintenance
requirements, addressing a key challenge in existing parsing solutions.
Existing Systems:
Proposed System:
The proposed system provides superior usability and integration capabilities, making the parsed
information more immediately actionable for recruitment processes.
This comparison demonstrates that the LLM-Powered Resume Parser represents a significant
advancement in resume parsing technology, addressing the core limitations of existing systems while
providing superior accuracy, reliability, and usability.
The following table provides a detailed comparison between the LLM-Powered Resume Parser and
three representative existing systems:
Traditional
ML-Based NER-Based LLM-Powered
Feature/Capability Rule-Based
System System Resume Parser
System
Skills Extraction
78.3% 84.5% 86.2% 91.2%
Accuracy
Education
Extraction 85.1% 87.3% 89.5% 93.8%
Accuracy
Experience
Extraction 76.4% 82.4% 83.8% 89.0%
Accuracy
Moderate Moderate
Non-Standard Poor (< 60% Good (88.4%
(70-75% (75-80%
Format Handling accuracy) accuracy)
accuracy) accuracy)
Limited Moderate
(requires (requires Good (with Good (inherits
Multilingual
language- language- language from LLM
Support
specific specific models) capabilities)
rules) training)
Moderate Moderate
Fast (3.1 Moderate (7.5
Processing Time (5.8 (6.2
sec/resume) sec/resume)
sec/resume) sec/resume)
High Moderate
Moderate
Maintenance (constant (entity Low (leverages
(periodic
Requirements rule library LLM updates)
retraining)
updates) updates)
Semantic
None Limited Moderate Strong
Understanding
Traditional
ML-Based NER-Based LLM-Powered
Feature/Capability Rule-Based
System System Resume Parser
System
Context
None Limited Limited Strong
Comprehension
Requires
Requires
Domain domain- Moderate Good
domain-
Adaptation specific adaptation adaptation
specific rules
training
Implementation
Moderate High High Moderate
Complexity
High
Resource (training),
Low Moderate Moderate
Requirements Moderate
(inference)
Yes (Perplexity
API Dependencies None None None
AI)
Fallback
Limited Limited Limited Comprehensive
Mechanisms
Filtering
Basic Moderate Moderate Advanced
Capabilities
This comparative analysis demonstrates the LLM-Powered Resume Parser's advantages across
multiple dimensions. While the system shows slightly longer processing times compared to rule-
based approaches, it delivers significantly improved accuracy and format handling. The hybrid
architecture addresses the limitations of each individual approach, resulting in a more balanced and
capable system overall.
Accuracy (%)
95| ┌───┐
| ┌───┐ │ │
90| ┌───┐ │ │ │ │
| ┌───┐ │ │ │ │ │ │
85| │ ││ │ │ ││ │
| ┌───┐ │ │ │ │ │ │ │ │
80| │ ││ ││ │ │ ││ │
| ┌───┐ │ │ │ │ │ │ │ │ │ │
75| ┌───┐ │ │ │ │ │ │ │ │ │ │ │ │
| │ ││ ││ ││ ││ │ │ ││ │
70+──┴───┴─┴───┴─┴───┴─┴───┴─┴───┴──┴───┴─┴───┴──>
└────Skills────┘ └────Education────┘
└───Experience───┘ └────Overall────┘
Accuracy (%)
90| ┌───┐
| │ │
80| ┌───┐ │ │
| ┌───┐ │ │ │ │
70| ┌───┐ │ │ │ │ │ │
| │ ││ ││ │ │ │
60| ┌───┐ │ │ │ │ │ │ │ │
| ┌───┐ │ │ │ │ │ │ │ │ │ │
50| │ │ │ │ │ │ │ │ │ │ │ │
| │ ││ ││ ││ ││ │ │ │
40+──┴───┴─┴───┴─┴───┴─┴───┴─┴───┴──┴───┴──>
Accuracy (%)
95| ● LLM-Powered
90| ● NER-Based
85| ● ML-Based
80|
| ● Rule-Based
75|
70+───────────────────────────────────────────────>
0 2 4 6 8 10 12
The graphical representations highlight several key insights about the LLM-Powered Resume Parser's
performance relative to existing systems:
1. Category-Specific Performance: The LLM-Powered Resume Parser shows the most significant
improvement in skills extraction, where semantic understanding is particularly valuable. The
gap is smaller for education information, where even rule-based systems perform reasonably
well due to the more standardized nature of educational credentials.
2. Format Handling: The proposed system demonstrates dramatically better performance on
non-standard resume formats, including creative layouts, multiple columns, and infographic-
style resumes. This represents one of the most significant advantages over existing systems,
which typically show sharp performance degradation with non-standard formats.
3. Processing Time vs. Accuracy Tradeoff: While the LLM-Powered Resume Parser has slightly
longer processing times compared to rule-based systems, the accuracy improvement more
than compensates for this difference. The processing time remains well within acceptable
limits for practical recruitment workflows.
4. Balanced Performance Profile: The proposed system shows the most balanced performance
profile across different metrics, without major weaknesses in any particular area. This
contrasts with existing systems that tend to excel in specific dimensions while struggling in
others.
5. Recovery Capabilities: One of the most significant advantages not captured in standard
accuracy metrics is the system's ability to recover from parsing failures. The proposed
system's 94.2% recovery rate means that even challenging documents that cause initial
parsing issues can be successfully processed through fallback mechanisms.
6. Operational Considerations: The comparative analysis reveals that while the LLM-Powered
Resume Parser has moderate implementation complexity and operational costs, it offers
significantly reduced maintenance requirements compared to rule-based systems. This
provides long-term operational advantages that offset the initial implementation effort.
These comparisons demonstrate that the LLM-Powered Resume Parser represents a significant
advancement in resume parsing technology, particularly in addressing the persistent challenges of
format diversity and semantic understanding that have limited the effectiveness of existing systems.
The hybrid architecture successfully combines the strengths of different approaches while mitigating
their individual weaknesses, resulting in a more capable and practical solution for automated resume
information extraction.
7.1 Summary
The LLM-Powered Resume Parser project has successfully developed an innovative approach to
automated resume information extraction by combining the semantic understanding capabilities of
large language models with the reliability of traditional parsing techniques. The system addresses
longstanding challenges in resume parsing, particularly the difficulties in handling diverse formats
and extracting contextually relevant information.
Key Achievements
1. Hybrid Parsing Architecture: The project's primary contribution is the development and
implementation of a hybrid parsing architecture that integrates Perplexity AI's language
model capabilities with rule-based parsing techniques. This approach leverages the
complementary strengths of both methods, resulting in superior overall performance
compared to either approach used in isolation.
2. Multi-Format Support: The system successfully handles various document formats (PDF,
DOCX, images) with format-specific processing techniques, ensuring consistent extraction
quality regardless of the original document type. This addresses the practical reality of
recruitment workflows where resumes are received in diverse formats.
5. Web-Based Interface: The development of an intuitive web interface allows users to upload
resumes, view parsed information, and filter candidates based on multiple criteria, making
the system accessible to non-technical users in recruitment roles.
Performance Summary
The LLM-Powered Resume Parser achieved impressive performance metrics across various
dimensions:
Category-Specific Accuracy:
Processing Efficiency:
Format Adaptability:
Economic Efficiency:
Comparative Advantage
When compared to existing resume parsing solutions, the LLM-Powered Resume Parser
demonstrates significant advantages:
Broader Impact
Beyond its technical achievements, the system has demonstrated potential for significant impact on
recruitment processes:
The LLM-Powered Resume Parser project has successfully demonstrated the potential of hybrid AI
approaches in document processing tasks, combining the advanced semantic understanding of large
language models with the reliability and efficiency of traditional techniques. The resulting system
represents a significant advancement in resume parsing technology, addressing key limitations of
existing approaches while providing practical value for recruitment workflows.
7.2 Limitations
Despite its significant achievements, the LLM-Powered Resume Parser has several limitations that
should be acknowledged:
API Dependency: The system relies on external API services for LLM functionality, creating a
potential single point of failure.
Response Variability: LLM outputs can vary even with identical inputs and carefully
engineered prompts, occasionally leading to inconsistent parsing results.
Token Limitations: The system is constrained by LLM context window limitations, potentially
limiting performance on extremely lengthy resumes.
Cost Scaling: API costs scale linearly with usage, potentially affecting affordability for very
high-volume applications.
Complex Visual Formats: Highly creative resume designs with extensive graphical elements
remain challenging, particularly when visual layout carries semantic meaning.
Language Support: While the underlying LLM has multilingual capabilities, the system was
primarily tested and optimized for English-language resumes, with limited validation of other
languages.
Font and Symbol Recognition: Uncommon fonts or specialized symbols occasionally cause
extraction errors, particularly in OCR processing.
3. Domain-Specific Limitations
Regional Variations: Resume conventions vary significantly by region, and the system has
been primarily tested on North American and European formats.
Academic CV Format: Lengthy academic CVs with publication lists and grant information
present specific challenges not fully addressed in the current implementation.
Error Propagation: Errors in section identification can cascade to affect all information within
misidentified sections.
Limited Feedback Loop: The system lacks automated mechanisms to learn from correction
or validation of its outputs.
Integration Complexity: Integration with existing ATS systems may require custom adapters
due to varying data schemas.
Security Considerations: Sending resume data to external API services raises potential data
privacy concerns in some jurisdictions.
Update Management: Changes to the LLM API or response formats may require prompt
engineering adjustments.
6. Validation Limitations
Test Corpus Limitations: While diverse, the test corpus of 100 resumes cannot represent the
full spectrum of possible resume formats and contents.
These limitations represent opportunities for future research and development to further enhance
the capabilities and robustness of the LLM-Powered Resume Parser. While they do not negate the
significant achievements of the current implementation, they should be considered when evaluating
the system for specific deployment scenarios or when planning future enhancements.
Based on the current implementation and identified limitations, several promising directions for
future enhancements to the LLM-Powered Resume Parser are proposed:
Model Fine-Tuning: Develop domain-specific fine-tuning for the LLM using labeled resume
data to improve extraction accuracy for specialized fields.
Local Model Deployment: Explore deployment of smaller, specialized LLMs locally to reduce
API dependency and address privacy concerns.
Advanced OCR Pipeline: Develop a more sophisticated OCR pipeline with pre-processing
optimizations specifically designed for resume documents.
Layout Understanding: Incorporate visual layout analysis to better understand the semantic
structure of graphically complex resumes.
Table Extraction: Implement specialized processing for tabular data in resumes, particularly
for skills matrices and project details.
3. Multilingual Capabilities
Multilingual Prompts: Develop specialized prompts optimized for different languages and
regional resume conventions.
Requirement Matching: Add capability to parse job descriptions and automatically match
candidates to position requirements.
Active Learning: Implement active learning approaches to identify challenging cases for
human review.
Anomaly Detection: Add capabilities to identify unusual resume elements that may require
special handling or verification.
Batch Processing: Add optimized batch processing capabilities for high-volume scenarios.
ATS Integration: Develop pre-built connectors for popular ATS platforms to streamline
integration.
SaaS Offering: Create a fully managed SaaS version with simple API access for third-party
integration.
Enterprise Features: Add role-based access control, audit logging, and other enterprise-
grade features.
Social and Web Presence: Implement extraction and verification of linked profiles and online
portfolios.
These proposed enhancements represent a roadmap for evolving the LLM-Powered Resume Parser
from its current implementation to an even more capable and comprehensive solution. By
addressing current limitations and expanding functionality in these areas, the system can further
advance the state of the art in resume parsing technology while providing increasing value for
recruitment and HR applications.
The LLM-Powered Resume Parser project aligns with several United Nations Sustainable
Development Goals (SDGs), contributing to broader social and economic objectives beyond its
immediate technical application.
The project most directly supports SDG 8, which aims to "promote sustained, inclusive and
sustainable economic growth, full and productive employment and decent work for all." By
improving the efficiency, accuracy, and fairness of the recruitment process, the system contributes to
several targets within this goal:
Target 8.5: Achieve full and productive employment and decent work for all women and men
Structured data extraction reduces potential for unconscious bias in initial resume
screening
Target 8.6: Substantially reduce the proportion of youth not in employment, education or
training
More effective parsing of entry-level resumes with limited experience helps young
people enter the workforce
Structured skill extraction helps identify transferable skills from educational
experiences
The project also contributes to several other SDGs in less direct but still meaningful ways:
Potential to reduce bias in initial resume screening through objective information extraction
Improved accessibility to recruitment processes for candidates with diverse resume formats
The LLM-Powered Resume Parser's alignment with SDG 8 (Decent Work and Economic Growth) is
particularly significant and can be examined through several specific dimensions:
The system's demonstrated efficiency improvements (estimated 5 minutes saved per resume, 91.3%
accuracy) represent tangible contributions to these economic objectives.
Processing diverse resume formats, accommodating candidates with different resources and
backgrounds
Extracting information based on content rather than presentation, potentially reducing
format-based disadvantages
Enabling broader candidate consideration through efficient filtering, rather than arbitrary
cut-offs
These capabilities support SDG 8's emphasis on inclusive economic growth and employment access.
Accurate extraction of both technical and soft skills across diverse descriptions
This aspect of the system contributes to SDG 8's focus on full and productive employment by
ensuring that candidates' capabilities are accurately represented and considered.
These innovations align with SDG 8's emphasis on technological upgrading and productivity
enhancement through innovation.
Better identification of skills gaps in candidate pools can inform training programs
Improved matching between skills and job requirements leads to better role fit
4. Economic Opportunity:
Reduced hiring costs may enable smaller organizations to compete for talent
Risk: LLMs may contain biases that could affect parsing performance across different
demographic groups
Risk: Candidates without access to modern resume formats or digital tools may be
disadvantaged
3. Privacy Considerations:
Risk: Processing personal data through external APIs raises privacy concerns
Mitigation: Clear data handling policies, potential for local model deployment,
compliance with data protection regulations
Environmental Considerations
1. Resource Efficiency:
2. Energy Consumption:
By considering these social and environmental impacts alongside technical performance, the LLM-
Powered Resume Parser project demonstrates how AI technologies can be developed and deployed
in ways that support broader sustainability objectives while delivering practical business value.
REFERENCES
[1] S. Kopparapu, "Automatic Extraction of Input Data from Resumes to Aid Recruitment Process,"
International Journal of Information Processing, vol. 24, no. 3, pp. 117-132, 2010.
[2] F. Javed and P. Arun, "Rule-based Information Extraction from Resumes," IEEE International
Conference on Data Mining Workshops, pp. 358-365, 2013.
[3] N. Singh, A. Garg, and D. Sharma, "Automated Resume Parsing: Techniques and Challenges,"
International Journal of Information Processing, vol. 18, no. 4, pp. 423-441, 2017.
[4] N. Sayfullina, E. Malmi, Y. Liao, and A. Jung, "Applying Machine Learning to Resume Parsing,"
Journal of Intelligent Information Systems, vol. 42, no. 3, pp. 279-295, 2018.
[5] J. Chen, Z. Zhang, and R. Wang, "Resume Information Extraction with Conditional Random
Fields," IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 5, pp. 897-910, 2019.
[6] T. Brown, B. Mann, N. Ryder, et al., "Language Models are Few-Shot Learners," Advances in
Neural Information Processing Systems, vol. 33, pp. 1877-1901, 2020.
[7] K. Yu, L. Jiang, and H. Liu, "Deep Learning Approaches for Resume Parsing," Computational
Intelligence, vol. 36, no. 4, pp. 432-451, 2020.
[8] E. Ferrara, A. De Rosa, and D. Rossi, "Transformer-Based Models for Resume Information
Extraction," Neural Computing and Applications, vol. 33, pp. 6187-6201, 2021.
[9] M. Wang, S. Li, and H. Zhang, "Hybrid Resume Parsing: Combining Rules and Deep Learning,"
Knowledge-Based Systems, vol. 235, pp. 107629, 2022.
[10] Y. Zhang and J. Liu, "Document Information Extraction Using Large Language Models,"
Computational Linguistics, vol. 48, no. 3, pp. 567-589, 2022.
[11] A. Vaswani, N. Shazeer, N. Parmar, et al., "Attention Is All You Need," Advances in Neural
Information Processing Systems, vol. 30, pp. 5998-6008, 2017.
[12] J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding," Proceedings of NAACL-HLT, pp. 4171-4186, 2019.
[13] S. Gupta and R. Sharma, "Integrating LLMs with Traditional NLP for Resume Analysis," Expert
Systems with Applications, vol. 213, pp. 118876, 2023.
[14] M. Johnson, T. Wilson, and S. Kumar, "Evaluating Resume Parsing Systems: Metrics and
Benchmarks," Journal of Information Science, vol. 47, no. 5, pp. 612-628, 2021.
[15] H. Martinez and G. Thompson, "Performance Comparison of Commercial Resume Parsing
Solutions," International Journal of Human Resource Management, vol. 34, no. 2, pp. 345-367, 2022.