Automation of Text Summarization Using Hugging Face NLP
Abstract—Within the expansive domain of "Natural Language Processing" (NLP), the task of "text summarization" emerges as a foundational element, playing a pivotal role in distilling relevant information from extensive textual corpora. In the digital age, the importance of efficient summarization becomes increasingly critical, given the overwhelming volume of textual information. This comprehensive study delves into the intricacies of both extractive and abstractive summarization techniques, placing a specific focus on transformer-based models like BERT and GPT. These models, celebrated for their remarkable capabilities in context comprehension and coherent summarization, are rigorously evaluated alongside established methods such as TF-IDF, TextRank, Sumy, fine-tuned transformers, T5, LSTM, greedy search, and beam search. The practical implications of text summarization extend across diverse fields, encompassing news stories, academic papers, and social media content, underscoring its broad utility in various domains. This study not only incorporates cutting-edge models but also explores a gamut of evaluation methods to discern the quality of summarization. By intertwining theory and application, this research positions itself at the forefront of evolving summarization approaches, shedding light on the transformative impact on information consumption patterns. The dynamic landscape of summarization methods underscores the need for continuous research and innovation, as technological advancements continue to reshape how individuals access and comprehend information.

Index Terms—text summarization, extractive summarization, abstractive summarization, news summarization

I. INTRODUCTION

In the rapidly evolving landscape of Natural Language Processing (NLP), the task of text summarization emerges as a cornerstone, playing a pivotal role in distilling relevant information from extensive textual corpora. The profound impact of NLP technologies on information retrieval, content curation, and user experience underscores the importance of advancing text summarization techniques. This comprehensive study embarks on a nuanced exploration of various state-of-the-art summarization models, with a specific emphasis on their performance within the intricacies of the CNN/Daily Mail dataset. This dataset, renowned for its diversity and complexity, serves as an ideal testing ground for evaluating the robustness and adaptability of text summarization models under real-world conditions.

Delving into the heart of the matter, the study meticulously dissects the methodologies employed by different summarization models, unraveling the intricacies of context comprehension, theme extraction, and abstraction. The significance of these models lies not only in their ability to generate concise summaries but also in their aptitude for grasping the nuanced layers of meaning embedded within textual data. As information proliferates across digital platforms, the demand for sophisticated summarization tools becomes increasingly imperative. The study aims to contribute comprehensive insights into the evolving landscape of text summarization, delineating the varied approaches adopted by different models to address the challenges posed by the CNN/Daily Mail dataset.

Among the array of models under scrutiny, the Hugging Face model "ml6team/mbart-large-cc25-cnn-dailymail-nl-finetune" takes center stage. Trained on this dataset, the model exhibits promising results in terms of accuracy and efficiency. The study meticulously unravels the intricacies of the model's training process, shedding light on the fine-tuning mechanisms that optimize its performance for the specific characteristics of the dataset. Model evaluation and visualization techniques are employed to provide a granular understanding of the Hugging Face model's output, presenting a comprehensive view of its strengths and potential avenues for refinement.

In the pursuit of comprehensive analysis, the study extends its scrutiny to various summarization models, creating a comprehensive benchmark for performance evaluation. The objective is not only to identify the superior model but also to discern the unique strengths and limitations inherent in each approach. As the study unfolds, the Hugging Face model stands out, showcasing superior accuracy and adaptability within the challenging landscape of the dataset. However, this analysis is not merely a proclamation of success; it is a recognition of the iterative nature of model development. The study underscores the importance of continuous refinement and adaptation to meet the evolving intricacies of textual data.

In conclusion, this study provides a holistic understanding of text summarization models, their training processes, and their performance on the dataset. As the Hugging Face model emerges as a frontrunner, the study contributes valuable insights to the ongoing discourse in NLP research. It illuminates the path forward, emphasizing the need for robust, adaptive models that can navigate the complexities of real-world textual data.
of real-world applications of ATS, including book, story/novel, and email summarization. The paper also covers the challenges involved in evaluating the quality of generated summaries and the different evaluation methods used. Overall, the paper provides a comprehensive overview of the current state of ATS research and its potential applications.

III. DATASET

The CNN/DailyMail Dataset is a comprehensive English-language collection comprising over 300,000 unique news articles from CNN and the Daily Mail. Initially designed for machine reading and comprehension, versions 2.0.0 and 3.0.0 transformed the dataset to support abstractive and extractive summarization. With a focus on model evaluation through ROUGE scores, the dataset consists of train (287,113), validation (13,368), and test (11,490) splits. Each instance includes an article, highlights, and a unique identifier. The mean token counts for articles and highlights are 781 and 56, respectively. The dataset, spanning April 2007 to April 2015, aims to facilitate the development of models adept at summarizing extensive text into concise sentences. Notably, concerns about biases, gender bias measurements, and potential limitations in article structure and co-reference errors are discussed, highlighting the dataset's nuances for future research.
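For reference, the splits described above can be loaded directly from the Hugging Face Hub with the datasets library. The following is an illustrative sketch, not the paper's own code, and it assumes dataset version 3.0.0:

# Sketch: loading CNN/DailyMail 3.0.0 and inspecting the splits described above.
from datasets import load_dataset

dataset = load_dataset("cnn_dailymail", "3.0.0")

# Reported split sizes: train 287,113 / validation 13,368 / test 11,490
for split in ("train", "validation", "test"):
    print(split, len(dataset[split]))

# Each record holds the full article, the reference highlights, and a unique id
sample = dataset["train"][0]
print(sample["article"][:300])
print(sample["highlights"])
print(sample["id"])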
A. Data Preprocessing

The data collected from CNN/Daily Mail has been preprocessed by the previous authors; this preprocessing includes the removal of stop words, the cleaning of unnecessary punctuation, and the removal of non-concise statements.
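As an illustration of the kind of preprocessing described above, the sketch below removes stop words and unnecessary punctuation with NLTK and regular expressions; it is a hypothetical example rather than the previous authors' pipeline:

# Illustrative preprocessing sketch (stop-word removal and punctuation cleanup);
# not the original authors' pipeline.
import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

def clean_article(text):
    # Lowercase, drop punctuation, and collapse whitespace
    text = re.sub(r"[^\w\s]", " ", text.lower())
    text = re.sub(r"\s+", " ", text).strip()
    # Remove stop words
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)

print(clean_article("The quick, brown fox jumps over the lazy dog!"))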
... sentence lengths. Table I gives us the information about the sentence length analysis.

Figure 1: IQR analysis of sentence lengths

Table I: Sentence length analysis

Statistic   label      length
count       1000       1000
mean        4.73600    16.54400
std         0.78161    10.567106
min         1.00000    3.000000
25%         5.00000    9.000000
50%         5.00000    14.000000
75%         5.00000    20.000000
max         5.00000    64.000000

C. Data Generation

Figure 2: Data generation
LearningRateScheduler. The model underwent 2 epochs, with a learning rate of 0.001. Notably, the acknowledgment of overfitting and the need for further experimentation underscored a thoughtful and adaptive training approach.
2) Beam and Greedy Search: Beam search keeps a predetermined number of the most likely candidate sequences, known as the beam width, and chooses the sequence with the highest joint probability, as opposed to the greedy search algorithm, which chooses the highest-probability token at each time step [15-16]. Until the end-of-sequence token is generated, the algorithm keeps producing candidates and updating the beam; the candidate with the highest joint probability is then chosen as the output sequence.

In a greedy search, the decoder chooses the token with the highest likelihood as the next token in the output sequence at each decoding step [14]. This operation is repeated until an end-of-sequence token is issued, indicating that the output sequence is finished.
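To make the difference concrete, the following sketch contrasts greedy decoding with beam search through the generate method of the transformers library; the checkpoint name and generation settings are illustrative assumptions, not the paper's exact configuration:

# Sketch: greedy decoding vs. beam search with a Hugging Face seq2seq model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "sshleifer/distilbart-cnn-12-6"  # example summarization checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

article_text = "The city council approved the new transit plan on Monday after months of debate."
inputs = tokenizer(article_text, return_tensors="pt", truncation=True, max_length=1024)

# Greedy search: pick the highest-probability token at every decoding step.
greedy_ids = model.generate(**inputs, max_new_tokens=60, num_beams=1, do_sample=False)

# Beam search: keep the num_beams most likely partial sequences and return
# the candidate with the highest joint probability.
beam_ids = model.generate(**inputs, max_new_tokens=60, num_beams=4, early_stopping=True)

print(tokenizer.decode(greedy_ids[0], skip_special_tokens=True))
print(tokenizer.decode(beam_ids[0], skip_special_tokens=True))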
3) Pegasus Model: The Pegasus model emerges as a groundbreaking innovation in natural language generation, renowned for its exceptional capabilities in producing coherent and contextually relevant long-form text summaries. Leveraging advanced pre-training objectives that prioritize document-level understanding, Pegasus excels in distilling crucial information from documents, articles, or web pages into succinct and comprehensible summaries. Its finely tuned attention mechanisms ensure the effective capture and synthesis of salient details, striking a balance between conciseness and informativeness. What sets Pegasus apart is its impressive transfer learning prowess, effortlessly adapting to diverse domains and tasks with minimal fine-tuning required. Furthermore, its scalability enables efficient processing of large datasets, empowering researchers and practitioners to extract insights from extensive textual data efficiently. The amalgamation of these features positions Pegasus at the forefront of natural language processing, promising transformative advancements in information synthesis and knowledge extraction.

The transformers library is employed to configure the Pegasus model for conditional generation. Both the Pegasus tokenizer and model are loaded seamlessly, showcasing a reliance on pre-trained models for text summarization tasks [17]. This strategic use of state-of-the-art transformer models aligns with best practices in natural language processing [18]. The code proceeds to load and explore data from the CNN/DailyMail dataset, providing insights into the structure of the dataset. Comprising articles and corresponding highlights, the dataset forms the foundation for subsequent text summarization tasks [19]. Text summarization is achieved through a loop that iterates over a subset of articles. Summaries are generated using a function named 'text summarization', and the resulting summaries are stored in a dictionary for further analysis. This section exemplifies the practical application of the Pegasus model for real-world summarization tasks.
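A minimal sketch of this workflow is shown below; the checkpoint name, the generation settings, and the text_summarization helper are assumptions made for illustration rather than the paper's exact code:

# Sketch: Pegasus summarization loop over a small subset of CNN/DailyMail articles.
from datasets import load_dataset
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

checkpoint = "google/pegasus-cnn_dailymail"  # assumed pre-trained checkpoint
tokenizer = PegasusTokenizer.from_pretrained(checkpoint)
model = PegasusForConditionalGeneration.from_pretrained(checkpoint)

def text_summarization(article):
    # Tokenize one article and generate an abstractive summary
    batch = tokenizer(article, truncation=True, padding="longest", return_tensors="pt")
    summary_ids = model.generate(**batch, num_beams=4, max_new_tokens=128)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

data = load_dataset("cnn_dailymail", "3.0.0", split="test[:5]")
summaries = {row["id"]: text_summarization(row["article"]) for row in data}
print(summaries)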
4) Seq2Seq: The Seq2Seq model undergoes training using a bidirectional LSTM architecture with an embedding layer, considering 85,000 data points. Tokenization is applied with a vocabulary size determined by words occurring at least 50 times [20]. The two-epoch training employs essential callbacks, including ModelCheckpoint and EarlyStopping. Model parameters total 146,575, and the learning rate is set at 0.001. A visual representation of the training history aids in understanding the model's learning dynamics.
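A hedged sketch of such an encoder-decoder is given below; the layer sizes and sequence lengths are illustrative placeholders and do not reproduce the 146,575-parameter configuration reported above:

# Sketch: bidirectional-LSTM encoder with an embedding layer, plus the
# ModelCheckpoint and EarlyStopping callbacks mentioned above. Dimensions are illustrative.
from tensorflow.keras import layers, Model
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.optimizers import Adam

VOCAB_SIZE, EMB_DIM, LATENT_DIM = 20000, 64, 128
MAX_TEXT_LEN, MAX_SUMMARY_LEN = 400, 40

# Encoder: embedding + bidirectional LSTM
enc_inputs = layers.Input(shape=(MAX_TEXT_LEN,))
enc_emb = layers.Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(enc_inputs)
enc_out, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(LATENT_DIM, return_state=True))(enc_emb)
state_h = layers.Concatenate()([fh, bh])
state_c = layers.Concatenate()([fc, bc])

# Decoder: embedding + LSTM initialised with the encoder states
dec_inputs = layers.Input(shape=(MAX_SUMMARY_LEN,))
dec_emb = layers.Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(dec_inputs)
dec_out, _, _ = layers.LSTM(2 * LATENT_DIM, return_sequences=True,
                            return_state=True)(dec_emb, initial_state=[state_h, state_c])
outputs = layers.TimeDistributed(layers.Dense(VOCAB_SIZE, activation="softmax"))(dec_out)

model = Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer=Adam(learning_rate=0.001), loss="sparse_categorical_crossentropy")

callbacks = [ModelCheckpoint("seq2seq_best.keras", save_best_only=True),
             EarlyStopping(monitor="val_loss", patience=1, restore_best_weights=True)]
# model.fit([x_train, y_train_in], y_train_out, epochs=2,
#           validation_split=0.1, callbacks=callbacks)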
5) Hugging Face Model: The Hugging Face model, based on the "ml6team/mbart-large-cc25-cnn-dailymail-nl-finetune" architecture, is trained on a large-scale dataset of diverse news articles. The training process involves fine-tuning the pre-existing model on the specific summarization task, considering a maximum sequence length of 1024 tokens [21]. The training corpus includes a mix of news articles, allowing the model to grasp the nuances of diverse writing styles and content structures. Key hyperparameters, such as learning rate, batch size, and training epochs, are optimized for effective convergence. The Hugging Face model, renowned for its innovation in natural language processing (NLP), boasts impressive technical specifications. With a foundation in transformer architecture, it leverages attention mechanisms to process input sequences efficiently. The model's multi-layered structure enables it to capture intricate linguistic patterns and nuances, facilitating tasks such as text generation, translation, and sentiment analysis with remarkable accuracy. Additionally, its parameter-efficient design allows for faster inference without compromising performance, making it a preferred choice for various NLP applications. The Hugging Face model's versatility, speed, and state-of-the-art capabilities continue to redefine the landscape of language understanding and generation in the realm of artificial intelligence.
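For reference, the fine-tuned checkpoint named above can be loaded and run for inference as in the sketch below; the generation parameters are illustrative assumptions, while the 1024-token truncation follows the maximum sequence length stated in the text:

# Sketch: summarizing an article with the fine-tuned checkpoint named above.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "ml6team/mbart-large-cc25-cnn-dailymail-nl-finetune"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

article = "Researchers released a new study on urban air quality, reporting a steady decline in emissions."
# Truncate inputs to the 1024-token maximum sequence length mentioned above
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, max_new_tokens=142)  # illustrative settings
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))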
E. MODEL EVALUATION

1) RNN model and Pegasus model: Post-training evaluation showcased the RNN model's journey, with loss values of 2.5572 and 2.5004 for the two epochs. The visual representation of the model's loss over epochs provided a clear narrative of the learning process, guiding further refinement strategies. In conclusion, the human-centric approach to data exploration, preprocessing, and model development incorporated essential numerical considerations. The dataset's initial size, vocabulary dimensions, and key statistics on sentence lengths and model parameters provided a quantitative foundation for effective language modeling. Refer to Figure 3 and Figure 4.

ROUGE scores are calculated to quantitatively assess the quality of the generated summaries. The metrics include precision, recall, and F-score for ROUGE-1, ROUGE-2, and ROUGE-L. The results are presented in tabulated form, offering a comprehensive overview of the summarization performance. Specifically, the calculated ROUGE-1 precision, recall, and F-score are approximately 0.335041, 0.339785, and 0.335659, respectively. For ROUGE-2, the corresponding values are approximately 0.156672, 0.174546, and 0.16399. Lastly, ROUGE-L scores are approximately 0.319168, 0.323091, and 0.319436. The ROUGE scores are further visualized using a bar chart, providing a clear comparative analysis of precision, recall, and F-score across the ROUGE-1, ROUGE-2, and ROUGE-L metrics.
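The ROUGE computation described above can be reproduced in outline with the rouge-score package; the reference and generated summaries below are placeholder strings, not outputs from the paper's models:

# Sketch: ROUGE-1/ROUGE-2/ROUGE-L precision, recall, and F-score with rouge-score.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "the cat was found under the bed"      # placeholder reference summary
generated = "the cat was hiding under the bed"     # placeholder generated summary

scores = scorer.score(reference, generated)
for metric, result in scores.items():
    print(f"{metric}: precision={result.precision:.4f} "
          f"recall={result.recall:.4f} f-score={result.fmeasure:.4f}")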
Figure 3: Comparative plot of precision, recall, and F-score metrics for the RNN model and the Pegasus model

Figure 4: Loss function graph for the RNN model and the Pegasus model

3) Seq2Seq: Post-training, the Seq2Seq model is rigorously evaluated using ROUGE metrics, resulting in precision, recall, and F-score values. ROUGE-1 precision, recall, and F-score are approximately 0.2049, 0.3461, and 0.2494. For ROUGE-2, the values are about 0.0770, 0.1600, and 0.1035, and ROUGE-L scores are approximately 0.1912, 0.3179, and 0.2312. A bar chart visually represents these metrics, enhancing accessibility for a comprehensive analysis. The Seq2Seq model, driven by a bidirectional LSTM architecture, showcases robust training and evaluation. Careful consideration of hyperparameters, integration of essential callbacks, and use of ROUGE metrics collectively contribute to an effective text summarization model. The model's comprehensive evaluation, both numerical and visual, attests to its ability to generate coherent and contextually relevant text summaries.
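The bar chart mentioned above can be produced along the following lines; this is a plotting sketch using the ROUGE values reported for the Seq2Seq model, not the paper's original figure code:

# Sketch: grouped bar chart of precision, recall, and F-score across ROUGE metrics.
import numpy as np
import matplotlib.pyplot as plt

metrics = ["ROUGE-1", "ROUGE-2", "ROUGE-L"]
precision = [0.2049, 0.0770, 0.1912]
recall = [0.3461, 0.1600, 0.3179]
fscore = [0.2494, 0.1035, 0.2312]

x = np.arange(len(metrics))
width = 0.25
plt.bar(x - width, precision, width, label="Precision")
plt.bar(x, recall, width, label="Recall")
plt.bar(x + width, fscore, width, label="F-score")
plt.xticks(x, metrics)
plt.ylabel("Score")
plt.title("Seq2Seq ROUGE scores")
plt.legend()
plt.show()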
Figure 6: Training and validation loss for the Seq2Seq model
and contextually relevant summaries. The model's success in handling diverse news articles attests to its adaptability and effectiveness in real-world summarization scenarios.

Figure 8: Comparative plot of precision, recall, and F-score metrics for the Hugging Face model

based on metrics such as ROUGE scores, where this model consistently demonstrated superior performance, achieving the highest precision, recall, and F-score among the compared models. The accuracy of the Hugging Face model can be attributed to its effective fine-tuning on the CNN/Daily Mail dataset, ensuring a better understanding and generation of concise summaries. To further enhance summarization models, future improvements can be made in handling longer sequences, as indicated by the challenges faced during model evaluation with certain articles. Additionally, incorporating more diverse datasets and exploring advanced pre-training strategies may contribute to creating even more robust and effective summarization models; extractive and abstractive summarization, along with transformer models such as T5 and GPT, can also be explored. In comparison to other models, the Hugging Face model showcased a notable edge in producing high-quality and coherent summaries, making it a preferred choice for applications demanding accurate and informative content condensation.
[13] Kryściński, Wojciech, Bryan McCann, Caiming Xiong, and Richard
Socher. ”Evaluating the factual consistency of abstractive text summa-
rization.” arXiv preprint arXiv:1910.12840 (2019).
[14] Xu, Jiacheng, Zhe Gan, Yu Cheng, and Jingjing Liu. ”Discourse-aware
neural extractive text summarization.” arXiv preprint arXiv:1910.14142
(2019).
[15] Mohamed, Muhidin, and Mourad Oussalah. ”SRL-ESA-TextSum: A text
summarization approach based on semantic role labeling and explicit
semantic analysis.” Information Processing and Management 56, no. 4
(2019): 1356-1372.
[16] A. M and S. M. Rajgopal, ”Exploring Unique Techniques to Preserve
Confidentiality and Authentication,” 2024 2nd International Confer-
ence on Intelligent Data Communication Technologies and Internet of
Things (IDCIoT), Bengaluru, India, 2024, pp. 440-447, doi: 10.1109/ID-
CIoT59759.2024.10467248.
[17] Nair, A.R., Singh, R.P., Gupta, D. and Kumar, P., 2024. Evaluating the
Impact of Text Data Augmentation on Text Classification Tasks using
DistilBERT. Procedia Computer Science, 235, pp.102-111.
[18] Paul, Pretty, and Rimjhim Padam Singh. ”Sentiment Rating Predic-
tion using Neural Collaborative Filtering.” In 2022 IEEE 7th Interna-
tional Conference on Recent Advances and Innovations in Engineering
(ICRAIE), vol. 7, pp. 148-153. IEEE, 2022.
[19] Kavitha C. R., Rajarajan S. J., R. Jothilakshmi, Kamal Alaskar, Mohammad Ishrat, and V. Chithra. "Study of Natural Language Processing for Sentiment Analysis." In 2023 3rd International Conference on Pervasive Computing and Social Networking (ICPCSN), 19-20 June 2023.
[20] Vidya Kumari K. R., Kavitha C. R., Data Mining for the Social
Awareness of the Social Networks, 3rd International Conference on
Computational System and Information Technology Sustainable Solu-
tions (CSITSS 2018), December 2018, pp: 7-17.
[21] Mridula A., Kavitha C. R., Opinion Mining and Sentiment Study of
Tweets Polarity Using Machine Learning, Proceedings of 2nd Inter-
national Conference on Inventive Communication and Computational
Technologies (ICICCT 2018), April 2018, pp: 621-626.
[22] Venkataramani, Eknath, and Deepa Gupta. "English-Hindi Automatic Word Alignment with Scarce Resources." 2010, pp. 253-256, doi: 10.1109/IALP.2010.5.