Volume 8, Issue 4, April – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Meeting Insights Summarisation Using Speech Recognition

Sakshil Verma1
Computer Science and Engineering Student,
SRM Institute of Science and Technology
Chennai, Tamil Nadu, India

Saksham Thareja2
Computer Science and Engineering Student,
SRM Institute of Science and Technology
Chennai, Tamil Nadu, India

Dr. P. Supraja3
Associate Professor, Department of Networking and Communications, SRM Institute of Science and Technology,
Chennai, Tamil Nadu, India

Abstract:- Speech is the strongest mode of discourse, through which people express their emotions and ideas in numerous languages. Speech-based authorization has varied applications because it offers a hassle-free procedure that, unlike fingerprint authorization, requires no physical contact. Speech summarisation methods take human speech as input and produce a condensed version as spoken or written language. Speech synthesis has applications ranging from computer technology to medical care, including improving language libraries and reducing the clinical paperwork load. Every dialect has its own collection of spoken features. Even among speakers of the same language, speed and dialect differ from individual to individual, which can make the conveyed message difficult for some people to comprehend. Meetings are an important part of every organisation's operation, whether they take place online or in person. Transcribing and summarising meetings, on the other hand, are typically unwelcome tasks because they require time-consuming manual work. This project aims to extract information from meetings, such as how many times each person spoke (as a measure of their level of input), and to summarise the insights of a meeting for all attending employees, identifying each participant's insights from the words they spoke.

Keywords:- Speech Recognition, Speech Summarization, Speech Pre-Processing, spaCy, Gensim.

I. INTRODUCTION

Speech is a highly powerful mode of communication through which humans express their thoughts and feelings in numerous languages. Each language has its own set of linguistic qualities. Even among speakers of the same language, speed and accent vary from person to person, which makes it difficult for some people to comprehend the conveyed message. Long speeches can be difficult to follow owing to factors such as differing pronunciation and pace. Speech recognition, a cross-disciplinary topic in computational linguistics, contributes to the advancement of technology that allows for the recognition and translation of speech into text. Text summarization pulls the most significant information from a text-based source and offers an effective summary of it.

Speech summarization is the process of condensing human speech into a more concise and manageable form. It aims to produce a summary that is suitable for a specific task. The summary should be more coherent than a direct transcription of speech, as it eliminates common irregularities, breaks, repairs, and repetitions. The recent interest in speech summarization is driven by improvements in the precision of speech recognition systems, better audio capture quality, and the rising use of natural language as a computer interface.

The process of speech summarization involves several technological components, such as automated speech recognition (ASR), which translates voice into written form, and summarization modules, which condense the key parts of the transcription. Users can use web-based voice APIs to capture audio and submit it to a speech recognition web service for processing.

Speech summarization has a range of real-world applications, such as summarising broadcast news, podcasts, clinical conversations, and meetings. It presents a challenge in speech understanding research and can be achieved through extractive or abstractive summarization techniques. Extractive summarization preserves the original format and is typically more fluent, while abstractive summarization is more concise and flexible. In either case, the summary of speech ought to be more intelligible than a straight transcript.

Meetings are a common and important part of business operations. They provide opportunities for team members to collaborate, exchange ideas, and make decisions. However, meetings can also be time-consuming and distracting, making it difficult for attendees to retain key information and insights. To address this challenge, the use of speech recognition and summarization technology has gained attention as a way to process meeting content efficiently and effectively.
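The project aim stated in the abstract, counting how often each participant spoke, can be illustrated with a minimal sketch. It assumes the transcript is already available as text lines of the form `Speaker: utterance`; that format, the function name `speaker_stats`, and the sample names are hypothetical and not taken from the authors' system.

```python
from collections import Counter


def speaker_stats(transcript_lines):
    """Count turns and words per speaker from 'Speaker: utterance' lines."""
    turns = Counter()
    words = Counter()
    for line in transcript_lines:
        if ":" not in line:
            continue  # skip lines that carry no speaker label
        speaker, utterance = line.split(":", 1)
        speaker = speaker.strip()
        if not speaker:
            continue
        turns[speaker] += 1
        words[speaker] += len(utterance.split())
    return turns, words


# Hypothetical meeting transcript used only for illustration.
transcript = [
    "Asha: Let us review the release plan.",
    "Ravi: The build is ready for testing.",
    "Asha: Good, testing starts on Monday.",
    "Meena: I will prepare the test cases.",
    "Asha: Thanks everyone.",
]

turns, words = speaker_stats(transcript)
most_active, turn_count = turns.most_common(1)[0]
print(most_active, turn_count, words[most_active])
```

Turn counts and word counts are reported separately because a participant may speak often but briefly; which of the two better reflects "level of input" is a design choice left open here.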

IJISRT23APR2036 www.ijisrt.com 1747


Voice recognition is a technique that converts spoken words into text. It is also known as automated speech recognition (ASR) or speech-to-text. It is an interdisciplinary field that involves speech signal processing, acoustic modelling, language modelling, and machine learning. The goal of speech recognition is to accurately transcribe human speech into written text, allowing for easier processing, storage, and retrieval of spoken information.

Speech recognition has a wide variety of use cases, including voice-activated assistants, dictation software, accessibility solutions for people with disabilities, and hands-free control of devices.

Speech recognition is a challenging task due to the complexity of human speech and the variability of spoken language. Some of the major challenges include:

• Speaker Variability
Different speakers have unique speech patterns, including pronunciation, speaking rate, and intonation. This variability can make it difficult for speech recognition systems to accurately transcribe speech from different speakers.

• Background Noise
Ambient noise can dramatically decrease the quality of the speech signal and make it more difficult for the system to accurately transcribe the speech.

• Vocabulary Size
The size of the vocabulary that a speech recognition system needs to support can have a significant impact on its accuracy. Larger vocabularies require more complex language models, which can be more difficult to train and can result in lower recognition accuracy.

Speech pre-processing consists of reducing background noise, adjusting loudness, and transforming the speech input into a digital representation. Feature extraction then translates the speech signal into a collection of distinguishing features that can be used to identify the words uttered.

Speech summarization approaches are often based on a mix of natural language processing (NLP), speech recognition, and machine learning, all of which are branches of artificial intelligence (AI). The accuracy of the speech recognition engine, the quality of the NLP algorithms, and the efficacy of the machine learning models used all influence the quality of the summary output. With the continuous progress in speech recognition accuracy and the rising appeal of natural language as a computer interface, there has recently been a spike of interest in speech summarization approaches.

To summarise, speech summarization is a difficult process that requires a mix of speech recognition, NLP, and machine learning approaches. Its purpose is to give a shortened and more intelligible version of the speech that is appropriate for a certain activity. The two primary types of speech summarization approach are extractive summarization and abstractive summarization, each with its own set of advantages and disadvantages.

II. LITERATURE SURVEY

Converting speech to text is beneficial in a variety of scenarios. Jose et al. developed an effective technique for attaining English fluency that improves the user's speaking style through proper pronunciation using English phonetics. Sivakumar et al. carried out a comparative study of the advantages and disadvantages of voice recognition systems with different vocabulary sizes. That research highlighted the significance of language models in improving the precision of speech-to-text conversion under various noise and broken-word conditions. Yogita and co-workers developed a bilingual language conversion technology using MFCC feature extraction and audio classification algorithms such as the Least Length Encoder and the Support Vector Machine (SVM). Sphinx 4, a freely available platform, was recommended for converting authentic Bengali text into English. On the dataset under examination, the researchers report a precision of 71.7%. Wan proposes summarising English text using association semantic criteria. According to the author, the novel extraction approach shows improved extraction convergence and precision. LDA is the most extensively used topic-based text categorization algorithm.

A novel approach to similarity calculation is reported to improve results. Saiyed and Sajja gave a succinct summary of the various categories of summarization methodologies, emphasising their advantages and disadvantages. Their work offers researchers advice on selecting particular methods in accordance with their requirements. Choosing the right term is a multi-objective optimization problem, to which the writers applied a human-centred training optimisation technique. According to another group of authors, feature extraction using neural networks is more effective than online extractive techniques. Vythelingum et al. proposed a method for detecting errors in grapheme-to-phoneme conversion in speech-to-text generation. The authors stated that their method achieved a higher rate of error correction and would therefore help a real-life annotator. As the literature reviewed for this study indicates, converting speech to written form and summarising it are both essential. A cross-dimensional text summarization technique based on dimensional selection and filtering was proposed by Zenkert et al. and evaluated using findings from the Multidimensional knowledge representation database. Devasena and Hemalatha's content processor was used to identify the structure of the entered content. [1] Transcribing spoken word materials including

speeches, presentations, lectures, and news broadcasts is one of the main uses for automatic voice recognition [2].

Although speech is the most efficient and natural form of human communication, merely recording speech as an audio signal makes it difficult to quickly examine, retrieve, and reuse speech documents. Speech transcription is therefore anticipated to be a key capability in the upcoming IT era. Although very high recognition accuracy can readily be achieved for speech read from a written text, such as news anchors' broadcast utterances, the ability of the technology to recognise spontaneous speech remains limited [3]. Only one survey evaluating various output summaries, features, methodologies, and assessment criteria has been published on automatic speech summarization [4]. That survey focused solely on a two-phase summarization approach comprising important-sentence extraction and sentence compression, and it mainly reviewed publications released around the year 2006. In 2008, the same researchers published another study of spontaneous speech recordings that addressed issues such as speech corpora, speech recognition, acoustic modelling, language modelling, extraction, and voice synthesis [5].

The bulk of the early research on single-document summarization focused on scientific papers. The most widely cited paper on summarization is probably the first (Luhn, 1958), which discusses studies undertaken at IBM in the 1950s. According to Luhn's study, the frequency of a particular word in a piece of writing is a fair measure of its significance.

Several ideas advanced in that research gained prominence in subsequent work on summarization. Words were first reduced to their root forms (stemmed), and stop words were deleted. Luhn then compiled a list of content words sorted by decreasing frequency, with the rank giving an indication of each word's significance. At the sentence level, a significance factor was derived that reflects the number of occurrences of significant words within a sentence as well as the linear distance between them caused by intervening non-significant words. All sentences are ranked by their significance factor, and the highest-scoring sentences are selected to form the auto-abstract. A related study (Baxendale, 1958), also conducted at IBM and published in the same journal, gives an early glimpse of a feature that helps in spotting the salient parts of documents, namely sentence position. The author examined 200 paragraphs and found that the topic sentence was the first sentence in 85% of the paragraphs and the last sentence in 7% of them. As a result, choosing one of these two is a basic but fairly accurate way of identifying the topic sentence. This positional feature is now used in a number of complex machine learning systems. [6]

A summary, according to Radev et al., is "a text that is formed from one or more texts, that conveys important information in the original text(s), and that is no longer than half of the original text(s) and usually, significantly less than that". Text summarization is the process of identifying the key and most notable details within a piece of writing or a set of related writings, and subsequently distilling them into a shorter form that maintains the basic idea. Automatic text summarization is the act of creating a short and fluent synopsis that preserves the essential information and the overall meaning. [7]

In 2015, Nallapati et al. used deep learning methods for abstractive text summarization for the first time, and the proposed methodology centred on the encoder-decoder framework.

Encoder-decoder models were designed to solve sequence-to-sequence (Seq2Seq) problems. Seq2Seq algorithms map the input sequence of the neural network to a corresponding output sequence of characters, words, or sentences. This approach is employed in many NLP applications, such as machine translation and text summarization. In text summarization, the input sequence is the document to be summarised, and the output sequence is the summary that is produced. [8]

The approach proposed by X. Wan et al. is as follows: 1. A backward decoder first generates a summary from right to left, similar to the Seq2Seq-Attn model. 2. Attention is applied to both the encoder and the backward decoder so that the forward decoder can construct a summary from left to right. Both the forward and the backward decoders use a pointer mechanism. [9]

III. SPEECH RECOGNITION

Voice recognition, sometimes known as automated speech recognition (ASR), is a technique used to convert spoken words into written or transcribed text. The technology has made significant advancements in recent years, driven by improvements in machine learning algorithms, speech recognition accuracy, and audio capture quality. The Web Speech API is one of the more recent developments in speech recognition technology; it enables people to capture microphone audio and submit it to a speech recognition web service for analysis. The API gives developers the ability to add speech-to-text functionality to their applications.

Voice recognition is used in numerous applications, particularly voice-operated systems, personal assistants, hands-free dictation systems, and call centre automation. The accuracy of speech recognition systems has improved considerably over the past decade as a result of breakthroughs in machine learning and deep neural networks.
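Returning briefly to the literature survey in Section II, Luhn's frequency-based sentence scoring can be made concrete with a small sketch. It is an illustration under stated assumptions, not Luhn's original procedure: stemming is omitted, the stop-word list is a tiny hypothetical one, and the cluster score (significant words squared, divided by cluster length, with a gap limit of four words) follows the commonly cited formulation of the significance factor.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "was",
              "on", "for", "it", "that", "this", "with", "as", "by", "be", "we"}


def tokenize(text):
    """Lower-case the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def significant_words(sentences, top_n=5):
    """Return the top_n most frequent non-stop words in the document."""
    counts = Counter(w for s in sentences for w in tokenize(s)
                     if w not in STOP_WORDS)
    return {w for w, _ in counts.most_common(top_n)}


def significance_factor(sentence, significant, max_gap=4):
    """Score the best cluster of significant words in one sentence."""
    tokens = tokenize(sentence)
    positions = [i for i, w in enumerate(tokens) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:   # still inside the same cluster
            count += 1
        else:                           # close the cluster, start a new one
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))


def luhn_summary(sentences, k=1, top_n=5):
    """Pick the k highest-scoring sentences, kept in document order."""
    sig = significant_words(sentences, top_n)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: significance_factor(sentences[i], sig),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


sentences = [
    "The budget review covers the marketing budget.",
    "Lunch was good.",
    "Marketing will present the budget next week.",
]
print(luhn_summary(sentences, k=1, top_n=2))
```

Baxendale's positional feature could be combined with this score, for example by adding a bonus to the first and last sentence of each paragraph; how to weight the two signals is not specified in the source.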

The field of speech recognition has advanced significantly in the past few years, and it is now possible to transcribe speech with high accuracy, even in noisy or reverberant environments. This makes it possible to use speech recognition technology to facilitate the summarization of office meetings, which can save time and effort and allow for the quick and easy dissemination of key information from the meeting.

A. Components of Speech Recognition Systems:
Speech recognition systems are composed of several components that work together to transcribe speech into text. Speech-to-text (STT), often known as voice recognition, is a method of converting spoken words into written text. The purpose of STT is to translate spoken words into a machine-readable format as accurately and quickly as possible.

Speech pre-processing, feature extraction, acoustic modelling, language modelling, and decoding are all components of the STT process:

➢ Speech Pre-Processing:
The raw audio signal is processed beforehand to eliminate undesirable noise and distortions and to improve its quality for better speech recognition performance.

➢ Feature Extraction:
This component processes the speech signal to extract the information that is used to identify the words spoken. This includes normalising the signal and extracting features such as spectral coefficients, prosodic features, and pitch.

➢ Acoustic Modelling:
Acoustic modelling uses the features extracted from the speech signal to model the sound patterns of spoken words. This typically involves training machine learning algorithms, such as Hidden Markov Models (HMMs) or Deep Neural Networks (DNNs), on large amounts of speech data to learn the relationships between the acoustic features and the sounds that make up speech. The resulting acoustic model maps the features to a set of possible phonemes or sub-word units and is then used to transcribe new speech.

➢ Language Modelling:
A language model relates the output of the acoustic model to the words of a language and is used to predict the most likely word sequences. Language modelling considers the context of the words being spoken to increase the precision of the STT system. For instance, the STT system may be able to recognise that "bank" more likely refers to a financial institution than to a riverbank, based on the words that have been spoken before. This component uses statistical techniques to model the structure of a language, including the probabilities of word sequences, grammar, and pronunciation. This information is used to disambiguate between words with similar pronunciations and to choose the most likely transcription given the speech input.

➢ Decoding:
Decoding uses the acoustic and language models to transcribe the speech signal into written text. The two models are combined to generate the final recognised text: the decoder outputs, as its hypothesis, the most likely word sequence given the acoustic and language models.

B. Speech Recognition System Types:
Speech recognition systems are classified into two distinct categories:

➢ Isolated Word Recognition:
This type of system is designed to recognise a limited vocabulary of isolated words, such as "yes" or "no". It is often used in applications such as speech-activated controls, where the user is required to speak a limited set of predefined words.

➢ Continuous Speech Recognition:
This type of system is designed to transcribe speech in real time, without requiring the user to pause between words. It is used in applications such as dictation software and speech-activated virtual assistants, where the user is expected to speak naturally and continuously.

STT technology has advanced significantly in recent years and continues to do so. It is used in many applications, such as voice-activated virtual assistants, voice-activated TV remotes, voice-controlled devices, call centres, and speech-enabled accessibility technologies.

However, despite these advances, STT systems can still be inaccurate, especially when dealing with different accents, noisy environments, or fast speech. The size of the vocabulary that a speech recognition system needs to support can also have a significant impact on its accuracy, because larger vocabularies require more complex language models, which can be more difficult to train and can result in lower recognition accuracy. Ongoing research and development in this field aims to improve the accuracy and speed of STT systems, making speech recognition an increasingly important technology for the future. [10]

IV. SPEECH SUMMARISATION

➢ Overview:
Speech summarization is the process of reducing the length of a speech while retaining its most important content. Summarization techniques are methods used to condense text into a more manageable form. The goal of speech summarization is to provide a condensed and more understandable version of the speech that is suitable for a specific task.
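The language-modelling and decoding steps of Section III.A can be illustrated with a toy sketch in which a decoder picks, among candidate transcriptions, the one with the best combined acoustic and language-model score. Everything here is an assumption made for illustration: the bigram model with add-one smoothing, the tiny training corpus, the hand-written acoustic log-scores, and the names `train_bigram` and `decode`. Real decoders search over lattices of phoneme and word hypotheses rather than a short candidate list.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Train an add-one-smoothed bigram model; return a log-probability function."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.lower().split()
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            # Add-one smoothing keeps unseen bigrams from scoring -infinity.
            total += math.log((bigrams[(prev, word)] + 1)
                              / (unigrams[prev] + vocab_size))
        return total

    return log_prob


def decode(candidates, lm_log_prob, lm_weight=1.0):
    """Return the candidate text with the best acoustic + weighted LM score."""
    return max(candidates,
               key=lambda c: c[1] + lm_weight * lm_log_prob(c[0]))[0]


# Hypothetical training text and acoustic log-scores.
corpus = [
    "recognise speech with a microphone",
    "the system can recognise speech",
    "we walked along a nice beach",
]
lm = train_bigram(corpus)
candidates = [("recognise speech", -4.1), ("wreck a nice beach", -4.0)]

print(decode(candidates, lm))                 # language model breaks the tie
print(decode(candidates, lm, lm_weight=0.0))  # acoustic score alone
```

The two candidates sound alike, so their acoustic scores are nearly equal; the language model is what separates them. The `lm_weight` parameter controls how strongly word-sequence context is trusted relative to the acoustic evidence.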

➢ Speech Summarising Approaches are Classified Into two Types:

▪ Extractive Summarization:
This approach selects the most significant phrases and sentences from the original text and combines them to create a summary. Extractive summarization preserves the format of the original speech and is usually more fluent, but it can result in a summary that is less concise. The summary formed by combining the chosen sentences retains the main points and key information from the speech. Extractive summarization is mainly used for tasks where preserving the original wording is important, such as legal documentation and news articles.

▪ Abstractive Summarization:
This approach creates new sentences that summarise the main points of the speech. The new sentences are generated using a combination of machine learning and natural language processing (NLP) approaches. Abstractive summarization is more concise and flexible, but it is also more complex and harder to implement than extractive summarization. It is mainly used for tasks where summarising the speech more concisely is important, such as generating executive summaries or summarising long conversations.

➢ Meeting Insights Summarization:
The proposed solution for meeting insights summarization involves the use of speech recognition and summarization techniques. First, speech is recorded and transcribed into text using ASR. Next, summarization techniques are applied to the transcribed text to condense the information into a more manageable form. The goal of this approach is to provide attendees with a summary of the meeting's key information and insights, allowing them to retain and recall the content of the meeting more effectively.

This technology can be used to facilitate the summarization of office meetings by automatically transcribing the speech into text, which can then be processed by a summarization algorithm.

➢ The Process of Office Meeting Summarization Using Speech Recognition can be Broken Down into the Procedures that follow:

▪ Speech Recognition:
The initial step is to transcribe the speech from the office meeting into text. This can be done using speech recognition software that converts the audio of the speech into a text representation.

▪ Text Processing:
The transcribed text is then processed to remove any redundant or irrelevant information, such as filler words, repetitions, or irrelevant comments.

▪ Keyword Extraction:
The processed text is then analysed to extract the most important keywords and phrases that capture the essence of the speech.

▪ Summarization:
The extracted keywords and phrases are then used to generate a concise and coherent summary of the office meeting. This summary can take the form of a written document, a presentation, or a summary report.

▪ Review and Refinement:
Finally, the generated summary is reviewed and refined to ensure that it accurately reflects the content of the office meeting and that it is clear and concise.

➢ Gensim:
Gensim is a freely available natural language processing and topic modelling library. One of its core functionalities is text summarization. Gensim's summarization module provides an implementation of the TextRank algorithm, which is a graph-based approach to extractive text summarization. [11]

The TextRank algorithm starts by splitting the input text into sentences and constructing a graph in which the vertices represent sentences and the edges represent the similarity between them. The similarity of sentences is often calculated using word overlap, co-occurrence, or cosine similarity. Once the graph has been built, TextRank applies PageRank, a well-known algorithm for finding the importance of nodes in a graph, to the vertices (sentences). The result is a ranking of the sentences, with the most important sentences having the highest scores.

Finally, the Gensim summarization module selects the top-k sentences with the highest scores, where k is a user-defined parameter, to form a summary. The resulting summary gathers the most important information from the supplied text while omitting redundant or irrelevant information.

In addition to the TextRank algorithm, Gensim provides topic modelling techniques such as Non-negative Matrix Factorization, Latent Dirichlet Allocation, and Latent Semantic Analysis. These techniques can be used to generate summaries based on the underlying topics and latent structures in the text.

In conclusion, Gensim summarization is a powerful tool for generating concise and meaningful summaries of large amounts of text. Its TextRank implementation offers a simple yet efficient way of extracting the most significant information from the input text.

➢ spaCy:
spaCy is a freely available Python toolkit for advanced natural language processing. It is intended to be quick, efficient, and simple to use. Part-of-speech tagging, tokenization, dependency parsing, named entity

identification, text categorization, and other text analysis and manipulation functions are available in spaCy. [12]

Speed is one of spaCy's primary assets. It is designed for large-scale data processing and can handle large quantities of text quickly and efficiently. That makes it well suited to applications requiring speed and scalability, for instance in production environments or when handling huge datasets.

spaCy's primary characteristics include the following:

Language support: spaCy handles a number of languages, including Spanish, German, English, Dutch, French, Italian, and others.

Pre-trained models: spaCy offers pre-trained models for many languages, which can be loaded with only a few lines of code. These models can be used in a variety of NLP tasks, including dependency parsing, part-of-speech tagging, named entity recognition, and others.

Tokenization: spaCy uses advanced tokenization techniques to split text into individual words and punctuation marks. It can handle a range of languages and can also split compound words and contractions.

Part-of-speech tagging: spaCy can automatically assign a part of speech, such as noun, verb, adjective, or adverb, to every token in a sentence. This is helpful for a variety of uses, including sentiment analysis and text categorization.

Named entity recognition: spaCy is able to recognise and categorise named entities in written content, such as people, organisations, places, and events. This is important for activities like information extraction and entity linking.

Dependency parsing: spaCy can analyse the grammatical structure of a sentence and find the relationships among its words via dependency parsing. This is important for activities like sentiment analysis and question answering.

Text classification: spaCy includes a trainable text classification component. It can be trained on custom datasets, for tasks such as sentiment analysis or topic labelling, to create more accurate models for specific use cases.

Customization: spaCy provides a range of tools for customising and training models on specific tasks or domains. This allows developers to create more accurate models for specific use cases and can help improve performance on specific datasets.

Overall, spaCy is a powerful and flexible library for natural language processing in Python. Its speed and flexibility make it suitable for use in commercial settings, and its variety of functions and customization options make it an appealing option for academics and engineers who are developing a wide variety of NLP applications.
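The TextRank procedure described in the Gensim subsection of this section (sentence splitting, similarity graph, PageRank ranking, top-k selection) can be sketched in plain Python. This is a minimal illustration under stated assumptions and is not Gensim's or the authors' implementation: it uses only the standard library (the `gensim.summarization` module is available only in Gensim releases before 4.0), a regex sentence splitter, the length-normalised word-overlap similarity from the original TextRank formulation, and a fixed number of PageRank iterations. All function names are hypothetical.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word overlap normalised by the log of each sentence's vocabulary size."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph."""
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each new score is computed from the previous iteration's scores.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    """Return the k top-ranked sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Hypothetical meeting transcript used only for illustration.
text = ("The team reviewed the project budget. "
        "The budget for the project was approved by the team. "
        "Lunch arrives at noon. "
        "The project team will track the budget weekly.")
summary = summarize(text, k=2)
print(summary)
```

Selected sentences are returned in document order rather than score order so that the summary reads naturally. In a meeting pipeline, the text-processing step (removing filler words and repetitions) would run on the transcript before this function is called.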

Fig 1 Input Given by the User


Fig 2 Processed Output in the form of Summary

➢ Block Diagram:

Fig 3 Block Diagram of a Speech Recognition System

Figure 3 depicts the block diagram of a speech recognition system and Figure 4 depicts the activity diagram of a speech recognition system.

➢ Activity Diagram:

Fig 4 Activity Diagram of a Speech Recognition System

V. CONCLUSION

In conclusion, meeting insights summarization using speech recognition and summarization techniques presents a promising solution for improving the efficiency and effectiveness of meetings. The use of ASR and summarization technology can provide attendees with a concise and manageable summary of meeting content, allowing them to retain and recall key information and insights more effectively. This paper provides an overview of the current state of speech recognition and summarization technology and demonstrates how these technologies can be applied to meeting insights summarization. In our attempt to build a speech summarization system for meetings, we used the spaCy and Gensim libraries to implement the system. Figure 1 shows the speech spoken by the user, which is processed by the system we created, and Figure 2 shows the processed output in the form of a summary of that speech. More study is required to investigate the possible advantages and disadvantages of this strategy, in addition to developing more advanced summarization algorithms.

Speech recognition is a fast-expanding technology with the potential to transform how we communicate with machines and other devices. Notwithstanding ongoing obstacles, developments in machine learning and signal processing are enabling the creation of increasingly precise and reliable voice recognition systems, which have the potential to change a broad spectrum of industry sectors and applications.

REFERENCES

[1]. Newell, A., Yang, K., & Deng, J. (2016, October). Stacked hourglass networks for human pose estimation. In European Conference on Computer Vision (pp. 483-499). Springer, Cham.
[2]. Furui, S., Iwano, K., Hori, C., Shinozaki, T., Saito, Y., & Tamura, S. (2001, May). Ubiquitous speech processing. In 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 01CH37221) (Vol. 1, pp. 13-16). IEEE.
[3]. Furui, S. (2003). Recent advances in spontaneous speech recognition and understanding. In ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition.
[4]. Hori, C., & Furui, S. (2001). Advances in automatic speech summarization. RDM, 80, 100.
[5]. Furui, S., & Kawahara, T. (2008). Transcription and distillation of spontaneous speech. Springer Handbook of Speech Processing, 627-652.
[6]. Bhalla, S., Verma, R., & Madaan, K. (2017). Comparative Analysis of Text Summarisation Techniques. International Journal of Engineering Research & Technology (IJERT), ICCCS – 2017 (Volume 5 – Issue 10).
[7]. Radev, D., Hovy, E., & McKeown, K. (2002). Introduction to the special issue on summarization. Computational Linguistics, 28(4), 399-408.
[8]. Nallapati, R., Zhou, B., Gulcehre, C., & Xiang, B. (2016). Abstractive text summarization using sequence-to-sequence RNNs and beyond. arXiv preprint arXiv:1602.06023.
[9]. Wan, X., Li, C., Wang, R., Xiao, D., & Shi, C. (2018). Abstractive document summarization via bidirectional decoder. In Advanced Data Mining and Applications: 14th International Conference, ADMA 2018, Nanjing, China, November 16–18, 2018, Proceedings 14 (pp. 364-377). Springer International Publishing.
[10]. https://www.sciencedirect.com/topics/engineering/speech-recognition
[11]. https://pypi.org/project/gensim/
[12]. https://spacy.io/
