Automated Literature Review Using NLP Techniques and LLM-Based Retrieval-Augmented Generation
Abstract—This research presents and compares multiple approaches to automate the generation of literature reviews using several Natural Language Processing (NLP) techniques and retrieval-augmented generation (RAG) with a Large Language Model (LLM). The ever-increasing number of research articles poses a major challenge for manual literature review and has created a growing demand for automation. The primary objective of this research work is to develop a system capable of automatically generating literature reviews from only PDF files as input. To meet this objective, the effectiveness of several NLP strategies is evaluated: a frequency-based method (spaCy), a transformer model (Simple T5), and retrieval-augmented generation with a Large Language Model (GPT-3.5-turbo). The SciTLDR dataset is chosen for the experiments, and the three techniques are used to implement three different systems for auto-generating literature reviews. ROUGE scores are used to evaluate all three systems. Based on the evaluation, the Large Language Model GPT-3.5-turbo achieved the highest ROUGE-1 score, 0.364; the transformer model comes second and spaCy last. Finally, a graphical user interface is created for the best system, based on the large language model.

Index Terms—T5, SpaCy, Large Language Model, GPT, ROUGE, Literature Review, Natural Language Processing, Retrieval-augmented generation.

I. INTRODUCTION

Literature reviews have gained considerable importance for scholars. They provide researchers with a comprehensive overview of previous findings in a specific field, assist in identifying gaps in past understanding, help direct future research, and inform researchers of areas where they can provide significant input. However, conducting literature reviews can be incredibly cumbersome because of the sheer volume of reading involved. Due to the vast number of research articles being released, reviewing all related studies and extracting relevant information can be a time-consuming, tedious, and error-prone task. Because of these difficulties, there has been increasing interest in automating the literature review process [1]. Automated systems can use natural language processing techniques and machine learning algorithms to analyze extensive amounts of text, extract relevant details, and create structured summaries [2].

The primary objective of this research is to develop a system that can automatically generate the literature review segment of a research paper using only the PDF files of the related papers as input. Several NLP techniques, namely a frequency-based approach, a transformer-based approach, and a Large Language Model-based approach, are implemented and compared to find the best procedure. The SciTLDR dataset [3] is selected for this research work. The first procedure uses the frequency-based approach, implemented with the spaCy library [4]. The second procedure uses a transformer-based model, Simple T5. The last procedure is based on a Large Language Model, GPT-3.5-TURBO-0125. The evaluation and comparison are performed using ROUGE scores [5]. The best approach is then identified and a Graphical User Interface-based tool is created.

Automating aspects of the literature review process allows academicians to save time and concentrate on the articles most pertinent to their research. It can also reduce the chance of errors or bias in the review process. The highlights of this article are:

• All three considered NLP approaches (spaCy, T5, and the GPT-3.5-TURBO-0125 model) can produce satisfactory results in automating literature review generation.
• The LLM-based model outperforms T5 and spaCy in generating literature reviews.
II. LITERATURE REVIEW

A framework was proposed by Silva et al. [6] for automatically producing systematic literature reviews. They focused on four technical steps: Searching, Screening, Mapping, and Synthesizing. In response to a specific inquiry, extensive searches are conducted to find as much relevant research as feasible, involving looking through reference lists, scouring internet databases, and reviewing published materials. Screening reduces the search scope by limiting the collection to only the papers pertinent to a particular review, aiming to highlight important findings and facts that could influence policy. Mapping is used to comprehend research activity in a particular area, involve stakeholders, and define priorities concerning the review emphasis. Synthesizing integrates data from numerous sources and provides an overview of the outcomes. The formulation of research questions, the reporting phase, and peer review are further steps discussed for the composition of systematic literature reviews.

Peer-reviewed publications are growing exponentially with the rapid development of science. Yuan et al. [7] have therefore explored the use of machine learning techniques, natural language generation, multi-document summarization, and multi-objective optimization for automating scientific reviewing. They discussed the generation of comprehensive reviews and noted the limitations of the generated feedback compared to human-written reviews. The models used in their research are not yet fully capable of automating literature reviews and still require human reviewers.

A comprehensive analysis of existing tools for systematic literature reviews was done by Karakan et al. [8]. They explored the potential for automation in various phases of the review process, highlighting the need for a holistic tool design to address researchers' challenges effectively. They used two methodologies to accomplish their research: Rapid Review and Semi-Structured Interviews. The Rapid Review emphasizes decision-making procedures for resolving issues, difficulties, and challenges that software engineers encounter in their daily work. The semi-structured interviews are used to explore researchers' experiences, challenges, strategies, the strengths and weaknesses of Systematic Literature Review tools, and requirements for effective support in software engineering.

Jaspers et al. [9] focused on the use of machine learning techniques for the automation of literature reviews and systematic reviews. They outlined the pros and cons of different machine-learning techniques and elaborately discussed the process of automating the literature review. The paper lacks practical validation across diverse domains and detailed insights.

A concise overview of automated literature reviews was presented by Tauchert et al. [10]. They emphasized the potential for automation in various stages of the systematic review process. The paper discusses the importance of integrating computational techniques to streamline tasks such as searching, screening, extraction, and synthesis. It also acknowledges the need for further research to address challenges and enhance the effectiveness of automated approaches.

A brief overview of automatic literature review tools was given by Tsai et al. [11]. They discussed the existing research in the field, the challenges faced in conducting literature reviews manually, and the potential benefits of automating the process. The main focus of their contribution is the evaluation of the Mistral LLM's effectiveness in the field of academic research.

The gaps at the intersection of systematic literature reviews (SLRs) and LLMs are discussed by Susnjak et al. [12]. They emphasized the need to address challenges in the synthesis phase of research and highlighted the potential of fine-tuning LLMs with datasets to enhance knowledge synthesis accuracy. Their study aims to bridge this gap by proposing a systematic literature review automation framework.

Most of the related works discussed above focus mainly on the potential and challenges of using NLP techniques and LLMs to automate the literature review process. None of them proposes a complete system pipeline where users can directly generate the literature review using only the PDF and DOI. In contrast, this article proposes and implements three unique end-to-end pipelines and procedures for a literature review automation system. This research has also resulted in the implementation of a UI tool where users can directly upload PDFs and get a literature review segment generated automatically without any additional effort. Moreover, this paper includes a comparative analysis of the different approaches (frequency-based, transformer-based, and RAG-based) using ROUGE scores, which contributes towards establishing the effectiveness of these approaches for this task.

III. SYSTEM DESIGN

The research is carried out in four stages: 1. Defining research objectives. 2. Proposing multiple procedures for automated literature review generation. 3. Evaluating the procedures to find the best approach. 4. Final system development.

A. Dataset Selection

The SciTLDR dataset from Hugging Face is selected for this research work [13]. It contains summarizations of scientific documents: 5,400 TLDRs derived from over 3,200 papers, including both author-written and expert-derived TLDRs. The curated abstract, introduction, and conclusion (AIC) of each research article, or the full text of the paper, is given as "source", and the summaries of the corresponding articles are given as "target". Only these two attributes are utilized in all three proposed procedures. There is no training for the spaCy approach, but the dataset is utilized for testing purposes. For the transformer-based approach, the T5 model is trained using the SciTLDR dataset and later evaluated on the test split. For the LLM-based approach, the dataset is used as the knowledge base for the model.
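Since all three procedures are evaluated with ROUGE scores, the unigram variant (ROUGE-1) used for the headline result can be sketched in plain Python. This is a simplified illustration: production evaluation would normally use an established ROUGE implementation with stemming and proper tokenization, which the whitespace split below omits.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference.

    Whitespace tokenization is a simplification; real ROUGE
    implementations apply stemming and more careful tokenization.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # each unigram counted at most min(cand, ref) times
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

An identical candidate and reference score 1.0, fully disjoint texts score 0.0, and partial overlap falls in between, which matches the relative ordering reported for the three systems.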
B. The Procedure Utilizing the Frequency-Based Approach Using spaCy

The first procedure utilizes the frequency-based approach using spaCy. The first task is to build the model pipeline. The pipeline takes text as input and converts it into NLP tokens using the spaCy library. A preprocessing step then removes stop words and punctuation. Afterward, the frequency of each word is calculated, which is later used to compute individual sentence weights; a sentence's weight represents its importance. The top 10 percent of sentences are then selected as the final output. The model is later evaluated using ROUGE scores to get an overview of its performance. The overview of the spaCy model is given in Figure 1.

Figure 1: Building spaCy Model

The next step is to implement a system pipeline that uses the spaCy model to generate a literature review segment automatically. The system takes the DOIs and PDF files of multiple papers as input. It uses the Requests library to collect the paper titles and first author names from the DOIs. It then uses the PyPDF2 and Regular Expression (re) libraries to collect only the conclusion of each PDF, and the previously implemented spaCy model to get a summary of each paper. Finally, it performs post-processing and merges all summaries to produce a coherent literature review segment. The system pipeline overview of the spaCy model is given in Figure 2.

C. The Procedure Utilizing the Transformer-Based T5 Model

The second approach utilizes the transformer-based Simple T5 model. The first task is to train the model and prepare it for the final pipeline. The SciTLDR dataset is collected and prepared as the training data for the selected model. A task-specific prefix is added to summarize individual papers. The model is then fine-tuned as per the requirements, trained with the training data, and its predictions, the summaries of individual papers, are produced. The evaluation is performed using ROUGE scores, and the model is saved for further utilization later in the system pipeline. The training overview of the transformer model is given in Figure 3.

The next step is to implement a system pipeline that uses the transformer-based model to generate a literature review segment automatically. The system takes the DOIs and PDFs of multiple papers as input. It uses the Requests library to collect the paper titles and first author names from the DOIs. It then uses the PyPDF2 and Regular Expression (re) libraries to collect each PDF's abstract, introduction, and conclusion, and merges these three sections to form the final model input. The previously trained and saved T5 model is then used to get a summary of each paper. In the next step, the system performs post-processing and merges all summaries to produce a coherent literature review segment. The system pipeline overview of the transformer model is given in Figure 4.

Figure 4: Pipeline using Transformer Model
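The frequency-based scoring of the first procedure can be sketched in plain Python. spaCy's tokenizer and stop-word list are replaced here by a regex tokenizer and a toy stop-word set, so this is an illustration of the weighting logic rather than the exact implementation:

```python
import math
import re

# Toy stop-word set standing in for spaCy's full list.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}

def summarize(text: str, keep_ratio: float = 0.10) -> str:
    """Extractive summary: score sentences by normalized word frequency
    and keep the top `keep_ratio` fraction, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over non-stop-word tokens.
    freq = {}
    for sent in sentences:
        for tok in re.findall(r"[a-z']+", sent.lower()):
            if tok not in STOP_WORDS:
                freq[tok] = freq.get(tok, 0) + 1
    if not freq:
        return ""
    max_f = max(freq.values())
    # Sentence weight = sum of normalized frequencies of its words.
    weights = []
    for i, sent in enumerate(sentences):
        w = sum(freq.get(tok, 0) / max_f
                for tok in re.findall(r"[a-z']+", sent.lower()))
        weights.append((w, i))
    n_keep = max(1, math.ceil(len(sentences) * keep_ratio))
    top = sorted(sorted(weights, reverse=True)[:n_keep], key=lambda t: t[1])
    return " ".join(sentences[i] for _, i in top)
```

With the default 10 percent ratio, a short input keeps only its single highest-weighted sentence, mirroring the "top 10 percent of sentences" rule described above.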
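Both system pipelines slice each PDF's plain text into sections with regular expressions. A simplified sketch of that step follows; the `extract_sections` helper and its heading patterns are illustrative assumptions, since real papers vary widely in heading formats and PDF extraction artifacts:

```python
import re

# Section headings of interest; "references" marks where the conclusion ends.
HEADINGS = ["abstract", "introduction", "conclusion", "references"]

def extract_sections(text: str) -> dict:
    """Slice a paper's plain text into sections keyed by heading.

    Headings are matched case-insensitively at line starts, optionally
    preceded by a roman or arabic section number. This is a simplification:
    production extraction must also cope with column breaks and headings
    that appear mid-line after PDF text extraction.
    """
    spans = []
    for h in HEADINGS:
        m = re.search(rf"(?im)^\s*(?:[ivx\d]+[.\s]*)?{h}\b", text)
        if m:
            spans.append((m.start(), m.end(), h))
    spans.sort()
    sections = {}
    for i, (start, end, h) in enumerate(spans):
        nxt = spans[i + 1][0] if i + 1 < len(spans) else len(text)
        sections[h] = text[end:nxt].strip()
    return sections
```

For the spaCy pipeline only the "conclusion" entry would be used, while the T5 pipeline would concatenate "abstract", "introduction", and "conclusion" to build the model input.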
D. The Procedure Utilizing the Large Language Model: GPT-3.5-TURBO-0125

The third procedure utilizes the RAG-based approach using the Large Language Model GPT-3.5-TURBO-0125. The first task is to create a custom OpenAI Assistant. First, the SciTLDR dataset is collected, and the GPT-3.5-TURBO-0125 model is selected for the OpenAI Assistant. Retrieval is turned on and the dataset is added as the knowledge base of the LLM. Some prompt engineering is then performed to produce the required output, and the LLM results are evaluated using ROUGE scores. The overview of the creation of the OpenAI Assistant is given in Figure 5.

The used prompt: "The user will give you a pdf file as input, similar to the "input" field of the given "data.json" file in your knowledge base. You have to produce a summarized "output" for the given pdf based on the file given to your knowledge. The output will be of max 80 words. Note: You must write in a way that can be considered a literature review of a new research paper. The user in the future might add more PDFs so try to make the literature review coherent and as per IEEE standards. Please mention the first author's name and paper title. Don't write like this "Literature Review of..."."

The system pipeline then submits the thread to the Assistant with the extracted text as a query. The response from the Assistant is retrieved, and the outputs for all papers are merged into the final literature review segment. The system pipeline overview of the LLM is given in Figure 6.

E. The Final System Tool

The final system is implemented using the Large Language Model GPT-3.5-TURBO-0125 as the backend. An aesthetic and simple user interface is created where the user can easily upload multiple research articles as PDF files. The user presses the "Browse files" button and selects the files to upload. The system then loads the research papers and, within a few seconds, produces the literature review segment automatically, processing each paper individually and producing its output. The loading screen and processed-file counter indicate the progress and the number of processed papers. At the end of the literature review, the UI shows a "Done" message to indicate the completion of the task. The user interface of the system is given in Figure 7.

Figure 7: The Preview of the System UI
REFERENCES
[1] Felizardo KR, Carver JC. Automating systematic literature review. In: Contemporary Empirical Methods in Software Engineering. 2020. p. 327-55.
[2] Adhikari S. NLP based machine learning approaches for text summarization. In: 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC); 2020 Mar 11. p. 535-538. IEEE.
[3] Cachola I, Lo K, Cohan A, Weld DS. TLDR: Extreme summarization of scientific documents. arXiv preprint arXiv:2004.15011. 2020 Apr 30.
[4] Jugran S, Kumar A, Tyagi BS, Anand V. Extractive automatic text summarization using SpaCy in Python & NLP. In: 2021 International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE); 2021 Mar 4. p. 582-585. IEEE.
[5] Ali NF, Tanvin JU, Islam MR, Ahmed J, Akhtaruzzaman M. ROUGE Score Analysis and Performance Evaluation Between Google T5 and SpaCy for YouTube News Video Summarization. In: 2023 26th International Conference on Computer and Information Technology (ICCIT); 2023 Dec 13. p. 1-6. IEEE.