Volume 8, Issue 4, April – 2023 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Meeting Insights Summarisation Using Speech Recognition

Sakshil Verma1, Saksham Thareja2
Computer Science and Engineering Students, SRM Institute of Science and Technology, Chennai, Tamil Nadu, India

Dr. P. Supraja3
Associate Professor, Department of Networking and Communications, SRM Institute of Science and Technology, Chennai, Tamil Nadu, India

Abstract:- Speech is the strongest mode of discourse through which people express their emotions and ideas in numerous languages. Speech recognition authorization has varied applications, as it provides a hassle-free procedure that does not require physical contact, unlike fingerprint authorization. Speech summarisation methods take human speech as input and produce a condensed form as spoken or written language. Speech synthesis offers a variety of applications ranging from computer technology to medical care, including improving language libraries and reducing clinical paperwork load. Every dialect has its own set of speaking features. Even among speakers of the same language, speed and accent differ from individual to individual, which can make the conveyed message difficult for some people to comprehend. Meetings are an important part of every organisation's operation, whether they take place online or in person. Meeting transcription and summarization, on the other hand, are typically unwelcome tasks because they are time-consuming and labour-intensive. This project aims to identify measures such as how many times each person spoke in a meeting, as an indicator of their level of input, and to summarise the insights of the meeting for all participating employees based on the words they spoke.

Keywords:- Speech Recognition, Speech Summarization, Speech Pre-Processing, Spacy, Gensim.

I. INTRODUCTION

Speech is a highly powerful mode of communication through which humans express their thoughts and feelings in numerous languages. Each language has its own set of linguistic qualities. Even among speakers of the same language, speed and accent vary from person to person, which makes it difficult for some people to comprehend the conveyed message. Long speeches can also be hard to follow owing to differences in pronunciation, pace, and other factors. Speech recognition, a cross-disciplinary area of computational linguistics, contributes to technology that recognises speech and translates it into text. Text summarization pulls the most significant information from a text-based source and offers an effective summary of it.

Speech summarization is the process of condensing human speech into a more concise and manageable form. It aims to produce a summary that is suitable for a specific task. The summary should be more coherent than a direct transcription of speech, as it eliminates common disfluencies, pauses, repairs, and repetitions. The recent interest in speech summarization is driven by improvements in the accuracy of speech recognition systems, the quality of audio capture, and the rising use of natural language as a computer interface.

The process of speech summarization involves several technological components, such as automatic speech recognition (ASR), which converts voice into written form, and summary modules, which summarise the key parts of the transcription. Users can use web speech APIs to capture audio and submit it to a speech recognition web service for processing.

Speech summarization has a range of real-world applications, such as summarising broadcast news, podcasts, clinical conversations, and meetings. It presents a challenge in speech understanding research and can be achieved through extractive or abstractive summarization techniques. Extractive summarization preserves the original wording and is typically more fluent, while abstractive summarization is more concise and flexible. The summary of speech ought to be more intelligible than a straight transcript.

Meetings are a common and important part of business operations. They provide opportunities for team members to collaborate, exchange ideas, and make decisions. However, meetings can also be time-consuming and distracting, making it difficult for attendees to retain key information and insights. To address this challenge, the use of speech recognition and summarization technology has gained attention as a way to efficiently and effectively process meeting content.
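The participation measure named in the abstract (how often each person spoke) can be illustrated with a minimal sketch. It assumes the transcript has already been split into (speaker, utterance) pairs; the paper does not describe how speakers are separated, so this input format, the function name `participation`, and the sample meeting are all hypothetical.

```python
from collections import Counter


def participation(turns):
    """Count speaking turns and spoken words per speaker.

    `turns` is a list of (speaker, utterance) pairs. This diarised
    format is an assumption made for illustration only.
    """
    turn_counts = Counter()
    word_counts = Counter()
    for speaker, utterance in turns:
        turn_counts[speaker] += 1
        word_counts[speaker] += len(utterance.split())
    return turn_counts, word_counts


# Hypothetical meeting transcript.
meeting = [
    ("Asha", "Let us review the release plan"),
    ("Ravi", "The build is ready"),
    ("Asha", "Good, then we ship on Friday"),
]

turn_counts, word_counts = participation(meeting)
most_active, n_turns = turn_counts.most_common(1)[0]
print(most_active, n_turns, word_counts[most_active])
```

Counting words as well as turns is a design choice of this sketch: a participant who speaks rarely but at length would otherwise look less involved than they were.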

IJISRT23APR2036 www.ijisrt.com 1747

Voice recognition is a technique that converts spoken words into text. It is also known as automatic speech recognition (ASR) or speech-to-text. It is an interdisciplinary field that involves speech signal processing, acoustic modelling, language modelling, and machine learning. The goal of speech recognition is to accurately transcribe human speech into written text, allowing for easier processing, storage, and retrieval of spoken information.

Speech recognition has a wide variety of use cases, including voice-activated artificial intelligence, dictation software, accessibility solutions for people with disabilities, and hands-free control of devices.

Speech recognition is a challenging task due to the complexity of human speech and the variability of spoken language. Some of the major challenges include:

• Speaker Variability
Different speakers have unique speech patterns, including pronunciation, speaking rate, and intonation. This variability can make it difficult for speech recognition systems to accurately transcribe speech from different speakers.

• Background Noise
Ambient noise can dramatically degrade the quality of the speech signal and make it more difficult for the system to accurately transcribe the speech.

• Vocabulary Size
The size of the vocabulary that a speech recognition system needs to support can have a significant impact on its accuracy. Larger vocabularies require more complex language models, which can be more difficult to train and can result in lower recognition accuracy.

Speech pre-processing consists of reducing background noise, adjusting loudness, and converting the speech input to a digital representation. Feature extraction translates the speech signal into a set of distinguishing features that can be used to identify the words uttered.

Speech summarization approaches are often based on a mix of natural language processing (NLP), speech recognition, and machine learning, all of which are branches of artificial intelligence (AI). The accuracy of the speech recognition engine, the quality of the NLP algorithms, and the efficacy of the machine learning models all influence the quality of the summary output. With the continuous progress in speech recognition accuracy and the rising appeal of natural language as a computer interface, there has recently been a surge of interest in speech summarization approaches.

To summarise, speech summarization is a difficult process that requires a mix of speech recognition, NLP, and machine learning approaches. Its purpose is to give a shortened and more intelligible version of the speech that is appropriate for a given task. The two primary approaches are extractive summarization and abstractive summarization, each with its own set of advantages and disadvantages. The quality of the summary output depends on all three components: the speech recognition engine, the NLP algorithms, and the machine learning models.

II. LITERATURE SURVEY

Converting speech to text is beneficial in a variety of scenarios. Jose et al. developed a technique for acquiring English fluency that improves the user's speaking style through proper pronunciation based on English phonetics. Sivakumar et al. compared the advantages and disadvantages of voice recognition systems with different vocabulary sizes. Their study highlighted the significance of language models in improving the precision of speech-to-text conversion under various noise and broken-word conditions. Yogita and co-workers developed a bilingual language conversion technology using MFCC feature extraction and audio classification algorithms such as the Least Length Encoder and Support Vector Machine (SVM). Sphinx 4, a freely available platform, was recommended for converting authentic Bengali text into English. On the data set under examination, the researchers report a precision of 71.7%. Wan proposes summarising English text using association semantic criteria. According to the author, the novel extraction approach shows improved extraction convergence and precision. LDA is the most extensively used topic-based text categorization algorithm.

A novel approach to similarity calculation suggests an improvement. Saiyed and Sajja gave a succinct summary of the various categories of summarization methodologies, emphasising their advantages and disadvantages. This work offers researchers advice on selecting particular methods in accordance with their requirements. Choosing the right term is a multi-objective optimization problem, to which the authors applied a human-centred training optimisation technique. According to the authors of another study, feature extraction using neural networks is more effective than online extractive techniques. Vythelingum et al. proposed a method for detecting errors in grapheme-to-phoneme conversion in speech-to-text generation. The authors stated that their method had a higher error correction rate and would therefore help the human annotator. As the literature reviewed for this study shows, the conversion of voice to written form and its summarization are essential. A cross-dimensional text summarization technique based on dimensional selection and filtering was proposed by Zenkert et al. and evaluated using results from the Multidimensional Knowledge Representation database. Devasena and Hemalatha's content processor was used to identify the structure of the input text. [1] Transcribing spoken word materials including
speeches, presentations, lectures, and news broadcasts is one of the main uses of automatic speech recognition [2].

Although speech is the most efficient and natural form of human communication, merely recording speech as an audio signal makes it difficult to quickly examine, retrieve, and reuse speech documents. Speech transcription is therefore expected to be a key capability in the coming IT era. Although very high recognition accuracy is readily achievable for speech read from a written text, such as news anchors' broadcast utterances, the ability of the technology to recognise spontaneous speech remains limited [3]. Only one survey evaluating various output summaries, features, methodologies, and assessment criteria has been published on automatic speech summarization [4]. That survey focused solely on a two-phase summarization approach consisting of important-sentence extraction and sentence compression, and it mainly covered publications released around 2006. In 2008, the same researchers published another survey of spontaneous speech recordings that addressed issues such as speech corpora, pronunciation recognition, acoustic modelling, language modelling, extraction, and speech synthesis [5].

The bulk of the early research on single-document summarization focused on scientific papers. The most widely cited paper on summarization is likely the first (Luhn, 1958), which describes research undertaken at IBM in the early 1950s. According to Luhn, the frequency of a particular word in a document is a fair measure of its significance.

Several key ideas put forward in that paper have gained prominence in subsequent work on summarization. Words were first stemmed to their root forms and stop words were removed. Luhn then compiled a list of content words sorted by decreasing frequency, with the rank providing an indication of each word's significance. At the sentence level, a significance factor was derived that reflects the number of occurrences of significant words within a sentence and the distance between them due to intervening non-significant words. All sentences are ranked by their significance factor, and the top-ranking sentences are selected to form the auto-abstract. A related study (Baxendale, 1958), also conducted at IBM and published in the same journal, gives an early glimpse of a feature that is useful for finding the salient parts of documents: sentence position. The author examined 200 paragraphs and found that the topic sentence was the first sentence in 85% of the paragraphs and the last sentence in 7%. Selecting one of these two is therefore a simple but fairly accurate way of identifying the topic sentence. This positional feature is now used in a number of complex machine learning systems. [6]

A summary, according to Radev et al., is "a text that is formed from one or more texts, that conveys important information in the original text(s), and that is no longer than half of the original text(s) and usually significantly less than that". Text summarization is the process of identifying the key and most notable details within a document or set of related documents and distilling them into a shorter form that preserves the basic idea. Automatic text summarization is the task of producing a short and fluent summary that preserves the essential information and overall meaning. [7]

In 2015, Nallapati et al. applied deep learning to abstractive text summarization for the first time, with a methodology centred on the encoder-decoder framework.

Encoder-decoder models were designed to solve sequence-to-sequence (Seq2Seq) problems. A Seq2Seq model maps the input sequence of a neural network to a corresponding output sequence of characters, words, or sentences. This approach is employed in many NLP applications, such as machine translation and text summarization. In text summarization, the input sequence is the text to be summarised and the output sequence is the generated summary. [8]

The model proposed by X. Wan et al. works as follows: 1. The backward decoder generates a summary from right to left, similar to the Seq2Seq-Attn model. 2. Both the encoder and the backward decoder employ the attention mechanism so that the forward decoder can generate a summary from left to right. Both the forward and backward decoders use a pointer mechanism. [9]

III. SPEECH RECOGNITION

Voice recognition, also known as automatic speech recognition (ASR), is a technique used to convert spoken words into written text. The technology has advanced significantly in recent years, driven by improvements in machine learning algorithms, recognition accuracy, and audio capture quality. The Web Speech API is one of the more recent developments in speech recognition technology; it enables applications to capture microphone audio and submit it to a speech recognition web service for analysis. The API gives developers the ability to add speech-to-text functionality to their applications.

Voice recognition is used in numerous applications, particularly voice-operated systems, personal assistants, hands-free dictation systems, and call centre automation. The accuracy of speech recognition systems has improved considerably over the past decade as a result of breakthroughs in machine learning and deep neural networks.

The field of speech recognition has advanced significantly in the past few years, and it is now possible to transcribe speech with high accuracy, even in noisy or reverberant environments. This makes it possible to use speech recognition technology to facilitate the summarization of office meetings, which can save time and effort and allow key information from the meeting to be disseminated quickly and easily.

A. Components of Speech Recognition Systems:
Speech recognition systems are composed of several components that work together to transcribe speech into text. Speech-to-text (STT), often known as voice recognition, is a method of converting spoken words into written text. The purpose of STT is to convert spoken words as accurately and quickly as possible into a machine-readable format.

Speech pre-processing, feature extraction, acoustic modelling, language modelling, and decoding are all components of the STT process:

• Speech Pre-Processing:
The raw audio signal is first processed to remove unwanted noise and distortions and to improve its quality for better speech recognition performance.

• Feature Extraction:
This component processes the speech signal to extract the information that is used to identify the words spoken. This includes normalising the signal and extracting features such as spectral coefficients, prosodic features, and pitch.

• Acoustic Modelling:
Acoustic modelling uses the features extracted from the speech signal to model the sound patterns of spoken words. This typically involves training machine learning algorithms, such as Hidden Markov Models (HMMs) or Deep Neural Networks (DNNs), on large amounts of speech data to learn the relationships between the acoustic features and the sounds that make up speech. The resulting acoustic model maps the features to a set of possible phonemes or sub-word units and is then used to transcribe new speech.

• Language Modelling:
A language model uses statistical techniques to model the structure of a language, including the probabilities of word sequences and grammar. It is used to predict the most likely word sequence given the output of the acoustic model, taking the context of the words being spoken into account to increase the accuracy of the STT system. For instance, the STT system may recognise that "bank" is more likely to mean a financial institution than a riverbank, based on the words that have been spoken before. This information is used to disambiguate between words with similar pronunciations and to choose the most likely transcription given the speech input.

• Decoding:
Decoding uses the acoustic and language models together to transcribe the speech signal into written text. The decoder outputs the most likely word sequence given both models; this hypothesis is the final recognised text.

B. Speech Recognition System Types:
Speech recognition systems are classified into two distinct categories:

• Isolated Word Recognition:
This type of system is designed to recognise a limited vocabulary of isolated words, such as "yes" or "no". It is often used in applications such as speech-activated controls, where the user is required to speak a limited set of predefined words.

• Continuous Speech Recognition:
This type of system is designed to transcribe speech in real time, without requiring the user to pause between words. It is used in applications such as dictation software and speech-activated virtual assistants, where the user is expected to speak naturally and continuously.

STT technology has advanced significantly in recent years and continues to do so. It is used in many applications, such as voice-activated virtual assistants, voice-activated TV remotes, voice-controlled devices, call centres, and speech-enabled accessibility technologies.

However, despite these advances, STT systems can still be inaccurate, especially when dealing with different accents, noisy environments, or fast speech. The size of the vocabulary that a speech recognition system needs to support can also have a significant impact on its accuracy: larger vocabularies require more complex language models, which can be more difficult to train and can result in lower recognition accuracy. Ongoing research and development in this field aims to improve the accuracy and speed of STT systems, making speech recognition an increasingly important technology for the future. [10]

IV. SPEECH SUMMARISATION

• Overview:
Speech summarization is the process of reducing the length of a speech while retaining its most important content. Summarization techniques are methods used to condense text into a more manageable form. The goal of speech summarization is to provide a condensed and more understandable version of the speech that is suitable for a specific task.
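As a small illustration of condensing a transcript, the sketch below implements a simplified version of the frequency-based sentence scoring attributed to Luhn in the literature survey. It is not the authors' system. The stop-word list, the regex sentence splitter, the `min_freq` threshold, and the use of a single span per sentence (first to last significant word) are all simplifying assumptions, and stemming is left out.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much larger).
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "and", "in",
              "on", "we", "it", "that", "this", "for", "will", "be"}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def luhn_summary(text, top_k=2, min_freq=2):
    sentences = split_sentences(text)
    # Content words that occur at least `min_freq` times count as significant.
    content = [w for s in sentences for w in tokenize(s) if w not in STOP_WORDS]
    significant = {w for w, c in Counter(content).items() if c >= min_freq}

    def score(sentence):
        words = tokenize(sentence)
        hits = [i for i, w in enumerate(words) if w in significant]
        if not hits:
            return 0.0
        # Significance factor: (significant words)^2 / length of the span
        # running from the first to the last significant word.
        span = hits[-1] - hits[0] + 1
        return len(hits) ** 2 / span

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Keep the top-k sentences, restored to their original order.
    return [sentences[i] for i in sorted(ranked[:top_k])]


transcript = ("The budget review is scheduled for Monday. "
              "Lunch was late today. "
              "The budget needs approval before the review. "
              "Someone left the lights on.")
print(luhn_summary(transcript, top_k=2))
```

Sentences in which the significant words sit close together score higher than sentences in which they are spread out, which is the intuition behind the distance term in the scoring described above.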

• Speech Summarization Approaches are Classified into Two Types:

• Extractive Summarization:
Extractive summarization selects some of the most significant sentences or phrases from the original text and combines them into a summary that retains the main points and key information of the speech. It preserves the wording of the original speech and is usually more fluent, but it can result in a summary that is less concise. Extractive summarization is mainly used for tasks where preserving the original wording is important, such as legal documentation and news articles.

• Abstractive Summarization:
This approach generates new sentences that summarise the main points of the speech. The new sentences are created using a combination of machine learning and natural language processing (NLP) techniques. Abstractive summarization is more concise and flexible, but it is also more complex and harder to implement than extractive summarization. It is mainly used for tasks where a more concise summary is important, such as generating executive summaries or summarising long conversations.

• Meeting Insights Summarization:
The proposed solution for meeting insights summarization involves the use of speech recognition and summarization techniques. First, speech is recorded and transcribed into text using ASR. Next, summarization techniques are applied to the transcribed text to condense the information into a more manageable form. The goal of this approach is to provide attendees with a summary of the meeting's key information and insights, allowing them to more effectively retain and recall the content of the meeting.

This technology can be used to facilitate the summarization of office meetings by automatically transcribing the speech into text, which can then be processed by a summarization algorithm.

• The Process of Office Meeting Summarization Using Speech Recognition can be Broken Down into the Following Steps:

• Speech Recognition:
The first step is to transcribe the speech from the office meeting into text. This can be done using speech recognition software that converts the audio of the speech into a text representation.

• Text Processing:
The transcribed text is then processed to remove any redundant or irrelevant information, such as filler words, repetitions, or irrelevant comments.

• Keyword Extraction:
The processed text is then analysed to extract the most important keywords and phrases that capture the essence of the speech.

• Summarization:
The extracted keywords and phrases are then used to generate a concise and coherent summary of the office meeting. This summary can take the form of a written document, a presentation, or a summary report.

• Review and Refinement:
Finally, the generated summary is reviewed and refined to ensure that it accurately reflects the content of the office meeting and that it is clear and concise.

• Gensim:
Gensim is a freely available natural language processing and topic modelling library. One of its core functionalities is text summarization. Gensim's summarization module (part of Gensim 3.x; it was removed in Gensim 4.0) provides an implementation of the TextRank algorithm, which is a graph-based approach to extractive text summarization. [11]

The TextRank algorithm starts by splitting the input text into sentences and constructing a graph in which the vertices represent sentences and the edges represent the similarity between them. Sentence similarity is often calculated using word overlap, co-occurrence, or cosine similarity. Once the graph has been built, PageRank, a well-known algorithm for finding the importance of nodes in a graph, is applied to the vertices (sentences). The result is a ranking of the sentences, with the most important sentences having the highest scores.

Finally, the Gensim summarization module selects the top-k sentences with the highest scores to form a summary, where k follows from a user-defined parameter (Gensim 3.x exposed this as a ratio or a word count). The resulting summary gathers the most important information from the supplied text while omitting redundant or irrelevant information.

In addition to the TextRank algorithm, Gensim supports topic modelling approaches such as Non-negative Matrix Factorization, Latent Dirichlet Allocation, and Latent Semantic Analysis. These techniques can be used to generate summaries based on the underlying topics and latent structures in the text.

In conclusion, Gensim summarization is a powerful tool for generating concise and meaningful summaries of large amounts of text. Its TextRank implementation offers a simple yet efficient way of extracting the most significant information from the input text.

• Spacy:
Spacy is a freely available Python library for advanced natural language processing. It is designed to be fast, efficient, and simple to use. Part-of-speech tagging, tokenization, dependency parsing, named entity
recognition, text classification, and other text analysis and processing functions are available in Spacy. [12]

Spacy's speed is one of its primary strengths. It is designed for large-scale data processing and can handle large volumes of text quickly and efficiently. That makes it well suited to applications requiring speed and scalability, for instance in production settings or when handling huge datasets.

Spacy's primary features include the following:

Language support: Spacy supports a number of languages, including Spanish, German, English, Dutch, French, Italian, and others.

Pre-trained models: Spacy offers pre-trained models for many languages, which can be loaded with only a few lines of code. These models can be used in a variety of NLP tasks, including dependency parsing, part-of-speech tagging, named entity recognition, and others.

Tokenization: Spacy uses advanced tokenization techniques to split text into individual words and punctuation marks. It can handle a range of languages and can also split compound words and contractions.

Part-of-speech tagging: Spacy can automatically assign parts of speech, such as noun, verb, adjective, or adverb, to every token in a sentence. This is useful for a variety of tasks, including sentiment analysis and text classification.

Named entity recognition: Spacy can recognise and categorise named entities in text, such as people, organisations, places, and events. This is important for tasks like information extraction and entity linking.

Dependency parsing: Spacy can analyse the grammatical structure of a sentence and find the relationships among its words. This is important for tasks like sentiment analysis and question answering.

Text classification: Spacy provides a trainable text classification component that can be trained on custom datasets for tasks such as sentiment analysis or topic classification, yielding more accurate models for specific use cases.

Customization: Spacy provides a range of tools for customising and training models on specific tasks or domains. This allows developers to create more accurate models for specific use cases and can help improve performance on specific datasets.

Overall, Spacy is a powerful and flexible library for natural language processing in Python. Its speed and flexibility make it suitable for use in production settings, and its range of functions and customisation options make it an appealing choice for researchers and engineers developing a wide variety of NLP applications.
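The TextRank procedure described in the Gensim subsection can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Gensim's implementation: sentences are split with a naive regex, similarity is word overlap normalised by the logarithms of the sentence lengths, and PageRank is run for a fixed number of iterations with a damping factor of 0.85. All function names are hypothetical.

```python
import math
import re


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(sentence):
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    # Word overlap normalised by log sentence lengths.
    wa, wb = word_set(a), word_set(b)
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    # Weighted, undirected sentence graph stored as an adjacency matrix.
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: each sentence collects score from its
        # neighbours in proportion to the edge weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, top_k=2):
    sentences = split_sentences(text)
    scores = textrank(sentences)
    best = sorted(range(len(sentences)),
                  key=scores.__getitem__, reverse=True)[:top_k]
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(best)]


transcript = ("The team agreed to ship the release on Friday. "
              "The release needs one more round of testing before Friday. "
              "Testing of the release will be done by the team. "
              "Coffee arrived late.")
print(summarize(transcript, top_k=2))
```

A sentence that shares no words with the rest of the transcript, such as the last one here, receives only the baseline score of 1 - damping and is the first to be dropped from the summary.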

Fig 1 Input Given by the User


Fig 2 Processed Output in the form of Summary

• Block Diagram:

Fig 3 Block Diagram of a Speech Recognition System

Figure 3 depicts the block diagram of a speech recognition system and Figure 4 depicts the activity diagram of a speech recognition system.

• Activity Diagram:

V. CONCLUSION

In conclusion, meeting insights summarization using speech recognition and summarization techniques presents a promising solution for improving the efficiency and effectiveness of meetings. The use of ASR and summarization technology can provide attendees with a concise and manageable summary of meeting content, allowing them to more effectively retain and recall key information and insights. This paper provides a comprehensive overview of the current state of speech recognition and summarization technology and demonstrates how these technologies can be applied to meeting insights summarization. In our attempt to build a speech summarization system for meetings, we used the Spacy and Gensim libraries. Figure 1 shows the speech spoken by the user, which is processed by our system, and Figure 2 shows the processed output in the form of a summary of that speech. More study is required to investigate the advantages and disadvantages of this approach and to develop more advanced summarization algorithms.

Speech recognition is a fast-growing technology with the potential to transform how we communicate with machines and other devices. Despite ongoing obstacles, developments in machine learning and signal processing are enabling increasingly accurate and reliable speech recognition systems, which have the potential to change a broad range of industries and applications.
REFERENCES

[1]. Newell, A., Yang, K., & Deng, J. (2016, October).

Stacked hourglass networks for human pose
estimation. In the European conference on computer
vision (pp. 483-499). Springer, Cham.
[2]. Furui, S., Iwano, K., Hori, C., Shinozaki, T., Saito,
Y., & Tamura, S. (2001, May). Ubiquitous speech
processing. In 2001 IEEE International Conference
on Acoustics, Speech, and Signal Processing.
Proceedings (Cat. No. 01CH37221) (Vol. 1, pp. 13-
16). IEEE.
[3]. Furui, S. (2003). Recent advances in spontaneous
speech recognition and understanding. In ISCA &
Fig 4 Activity Diagram of a Speech Recognition System

IJISRT23APR2036 www.ijisrt.com 1753

Volume 8, Issue 4, April – 2023 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
[4]. IEEE workshop on spontaneous speech processing
and recognition.
[5]. Hori, C., & Furui, S. (2001). Advances in automatic
speech summarization. RDM, 80, 100.
[6]. Furui, S., & Kawahara, T. (2008). Transcription and
distillation of spontaneous speech. Springer
Handbook of Speech Processing, 627-652.
[7]. Sakshi Bhalla, Roma Verma, Kusum Madaan, 2017,
Comparative Analysis of Text Summarisation
Techniques, INTERNATIONAL JOURNAL OF
ENGINEERING RESEARCH & TECHNOLOGY
(IJERT) ICCCS – 2017 (Volume 5 – Issue 10),
[8]. Radev, D., Hovy, E., & McKeown, K. (2002).
Introduction to the special issue on summarization.
Computational linguistics, 28(4), 399-408.
[9]. Nallapati, R., Zhou, B., Gulcehre, C., & Xiang, B.
(2016). Abstractive text summarization using
sequence-to-sequence rnns and beyond. arXiv
preprint arXiv:1602.06023.
[10]. Wan, X., Li, C., Wang, R., Xiao, D., & Shi, C.
(2018). Abstractive document summarization via
bidirectional decoder. In Advanced Data Mining and
Applications: 14th International Conference, ADMA
2018, Nanjing, China, November 16–18, 2018,
Proceedings 14 (pp. 364-377). Springer International
Publishing.
[11]. https://www.sciencedirect.com/topics/engineering/spe
ech-recognition
[12]. https://pypi.org/project/gensim/
[13]. https://spacy.io/

IJISRT23APR2036 www.ijisrt.com 1754
