Scientific Programming
Volume 2022, Article ID 7994191, 9 pages
https://doi.org/10.1155/2022/7994191
Research Article
Audio Segmentation Techniques and Applications Based on
Deep Learning
Correspondence should be addressed to Shruti Aggarwal; [email protected] and Geleta Negasa Binegde;
[email protected]
Received 22 May 2022; Revised 12 July 2022; Accepted 18 July 2022; Published 19 August 2022
Copyright © 2022 Shruti Aggarwal et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Audio processing has become an inseparable part of modern applications in domains ranging from health care to speech-controlled devices, and deep learning plays a vital role in automated audio segmentation. In this article, we discuss audio segmentation based on deep learning. Audio segmentation divides a digital audio signal into a sequence of segments or frames and then classifies these into classes such as speech, music, or noise; it therefore plays an important role in audio signal processing. The most important requirement when training a deep learning network is to secure a large amount of high-quality data. In this study, application areas, citation records, year-wise publication counts, and source-wise analyses are computed using the Scopus and Web of Science (WoS) databases. The analysis presented in this paper supports and establishes the significance of deep learning techniques in audio segmentation.
[Figure: Audio segmentation pipeline. The audio signal undergoes signal-level preprocessing (filtering, noise cancellation, signal enhancement), data conversion (audio-to-tensor formation, FFT, code division), and segment optimization with RNN/BiLSTM networks, followed by firm boundary detection that yields streams of equal-length segments/frames.]
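As a concrete reading of this pipeline, the sketch below frames a signal, converts the frames to FFT-magnitude tensors, scores each frame with a BiLSTM, and marks a boundary wherever the predicted class changes. It is a minimal illustration only, assuming PyTorch and NumPy; the frame length, class set, and network sizes are arbitrary choices, not the configuration of any surveyed system.

```python
# A minimal sketch of the pipeline in the figure (PyTorch/NumPy assumed;
# shapes and the three-class labelling are illustrative, not the authors').
import numpy as np
import torch
import torch.nn as nn

FRAME = 512  # samples per frame (assumed)

def frames_to_tensor(signal: np.ndarray) -> torch.Tensor:
    """Data conversion: split the signal into frames and take FFT magnitudes."""
    n = len(signal) // FRAME
    frames = signal[: n * FRAME].reshape(n, FRAME)
    spec = np.abs(np.fft.rfft(frames, axis=1))       # (n_frames, FRAME//2 + 1)
    return torch.tensor(spec, dtype=torch.float32)

class SegmenterBiLSTM(nn.Module):
    """Segment optimization: a BiLSTM scoring each frame (e.g., speech/music/noise)."""
    def __init__(self, n_feats=FRAME // 2 + 1, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, 64, bidirectional=True, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                            # x: (batch, n_frames, n_feats)
        h, _ = self.lstm(x)
        return self.head(h)                          # per-frame class logits

def boundaries(logits: torch.Tensor):
    """Firm boundary detection: a boundary wherever the predicted class changes."""
    labels = logits.argmax(dim=-1).squeeze(0)
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

signal = np.random.randn(16000)                      # stand-in for a denoised signal
logits = SegmenterBiLSTM()(frames_to_tensor(signal).unsqueeze(0))
print(boundaries(logits))
```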
Features are calculated on an audio segment, a frame, or a set of samples that is a subset of the audio segment.

In recent years, audio segmentation and deep learning have received widespread research attention. Researchers in several countries have successfully applied audio segmentation techniques with different deep learning algorithms in fields such as speech recognition, music analysis, and noise removal [7]. For this study, a literature review was carried out by analyzing articles and conference papers published from 2005 to 2021 using the VOSviewer software. One hundred seventy documents were downloaded in .CSV file format from the Scopus database using the two keywords "Audio Segmentation" and "Deep Learning".
1.1. Audio Data Analysis. A sound is represented as an audio signal characterized by parameters such as frequency, bandwidth, and decibel level. A typical audio signal can be represented as a function of amplitude and time [8]. Several digital devices help in recording audio and then representing these sounds in a computer-readable way. Some instances of these formats are:
(i) mp3 (MPEG-1 Audio Layer 3) format
(ii) wav (Waveform Audio File) format
(iii) WMA (Windows Media Audio) format
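Whichever container format is used, the decoded signal is the amplitude-versus-time representation described above. A minimal sketch, assuming the librosa library (one common choice, not one prescribed by the text) and a hypothetical input file:

```python
# A minimal sketch (librosa assumed) of reading an audio file into the
# amplitude-versus-time representation described above.
import librosa
import numpy as np

signal, sr = librosa.load("recording.wav", sr=16000, mono=True)  # hypothetical file
t = np.arange(len(signal)) / sr  # time axis in seconds
print(f"duration: {t[-1]:.2f} s, amplitude range: {signal.min():.3f}..{signal.max():.3f}")
```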
A typical audio data processing procedure involves the extraction of acoustic features relevant to the task at hand, followed by decision-making techniques, including detection and classification. As a result, audio data analysis is used to analyze and comprehend audio signals captured by digital equipment, with various applications in healthcare, production, and enterprise [9]. Among these applications are customer intelligence analysis from user service calls, social-media content analysis, medical aids, patient-care systems, and public safety.
1.2. Related Work. In the task of audio segmentation, several authors have devised segmentation approaches based on neural-network classification systems. One example of a feed-forward approach is a multilayer perceptron trained with genetic algorithms to achieve multiclass audio segmentation. Large amounts of data are needed to train deep neural networks for reliable predictions [10]. Some studies have used data augmentation approaches to expand the quantity of data and overcome this problem. To tackle the data shortage problem, Raza used two approaches to enlarge the dataset. The authors suggested that noise injection effectively compensates for the data shortage: the audio data are augmented to prevent overfitting by deliberately adding random noise to the audio signal and by applying transformations that slightly deform the pitch and tempo [11]. When data augmentation is performed, the quality of the source data has a vital influence; high-quality data means a clear signal without extraneous noise. However, noise is unavoidable during recording, and every sound recording has a different length [12]. For effective analysis, noise removal is therefore very important, and normalizing and generalizing the raw dataset are also required.
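The noise-injection and pitch/tempo augmentation described above [11] can be sketched as below. The librosa effects functions are assumed stand-ins for whatever transforms the cited work used, and the noise level and shift amounts are illustrative values only:

```python
# A sketch of the augmentations described above: random noise injection plus
# slight pitch and tempo deformation (librosa assumed; parameters illustrative).
import numpy as np
import librosa

def augment(signal: np.ndarray, sr: int, noise_level: float = 0.005) -> np.ndarray:
    noisy = signal + noise_level * np.random.randn(len(signal))       # noise injection
    shifted = librosa.effects.pitch_shift(noisy, sr=sr, n_steps=0.5)  # slight pitch change
    return librosa.effects.time_stretch(shifted, rate=1.05)           # slight tempo change

# e.g., augmented = augment(signal, sr=16000)
```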
One study showed an improvement in performance by conducting denoising in the preprocessing step [13], and its authors also noted the importance and effect of data generalization [14]. To classify the data into classes, it is important to extract appropriate features for each label. There are several methods for extracting such features, for example, MFCCs, spectrograms, or a deep learning network.
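Of the feature-extraction methods just listed, MFCCs and spectrograms are the most common. A minimal sketch, again assuming librosa and a hypothetical input file:

```python
# A minimal sketch of MFCC and spectrogram feature extraction (librosa assumed).
import librosa
import numpy as np

signal, sr = librosa.load("recording.wav", sr=16000)     # hypothetical file
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)  # (13, n_frames)
spec = np.abs(librosa.stft(signal, n_fft=512))           # magnitude spectrogram
print(mfcc.shape, spec.shape)
```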
2. Research Trends in Web of Science Database for Audio Segmentation Based on Deep Learning

In Figure 2, we present the source-wise analysis of audio segmentation and deep learning research trends using the Web of Science database [15]. The analysis was conducted on data collected from the Web of Science database using the two keywords "Audio Segmentation" and "Deep Learning" over 1999–2021. Seventy-five publications selected from the Web of Science Core Collection are shown in Figure 2, which represents the sources or fields where audio segmentation is used with deep learning. Engineering Electrical Electronic has 36 publication records, the maximum in the Web of Science database. The second highest count is 22 documents in Computer Science Artificial Intelligence; Acoustics has 16 documents, and Computer Science Information Systems has 14 [16].

[Figure 2: Source-wise analysis of Web of Science records: Engineering Electrical Electronic (36), Computer Science Artificial Intelligence (22), Acoustics (16), Computer Science Information Systems (14), Computer Science Theory Methods (13), Medical Informatics (9), Computer Science Interdisciplinary Applications (7), Imaging Science Photographic Technology (5), Robotics (4), Computer Science Cybernetics (4).]

In Figure 3, we present the research work related to audio segmentation, which started in earnest in 2005. For about a decade, growth in this type of research was very slow, but post-2016 there was a sharp rise [17]. An exponential increase in audio segmentation-related research can be seen since 2017.

[Figure 3: Trend analysis of audio segmentation (publications and citations per year, 2005–2021).]
This reflects that audio segmentation and deep learning are progressively becoming an attractive research area, with citation numbers growing steadily over the previous five years [18]. The largest number of publications over the last fifteen years (2005 to 2021) in this area of research was reached in 2019. The current year, 2021, has witnessed considerable confidence among researchers regarding applications in this domain, which is why most publications related to this field have appeared recently [19]. As per the trend analysis, there is very high potential for research in this domain, as shown by the cumulatively rising pattern of research on audio segmentation methods [20].

Related keywords form a co-occurrence network, which can be created using the VOSviewer software. In this network visualization, each cluster has a different colour [21]. The analysis considers keywords that appeared in at least three of the collected documents; from 1310 keywords, only 119 met this threshold and are represented in the co-occurrence network visualization, composing the critical areas of audio segmentation and deep learning, as shown in Figure 4.

The colours red, blue, and green in Figure 4 represent co-occurrence within the related keyword clusters, while shades such as purple and orange show co-occurrence that combines two or more domains.
[Figure 4: VOSviewer keyword co-occurrence network visualization, with an overlay timescale coloured by publication year from 2006 to 2022.]
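The co-occurrence analysis itself can be reproduced outside VOSviewer. The sketch below counts keyword pairs across documents from a Scopus CSV export, applying the minimum-occurrence threshold of three used in this study; the file name and the "Author Keywords" column name follow Scopus's usual export format and are assumptions here:

```python
# A sketch of the keyword co-occurrence analysis described above, computed
# from a Scopus CSV export (pandas assumed; the file name and the
# "Author Keywords" column name are assumptions).
from collections import Counter
from itertools import combinations
import pandas as pd

df = pd.read_csv("scopus_export.csv")                 # hypothetical export file
docs = [set(k.strip().lower() for k in kw.split(";"))
        for kw in df["Author Keywords"].dropna()]

occur = Counter(k for doc in docs for k in doc)
kept = {k for k, n in occur.items() if n >= 3}        # occurrence threshold, as in the text
pairs = Counter(p for doc in docs
                for p in combinations(sorted(doc & kept), 2))
print(pairs.most_common(10))                          # strongest co-occurring keyword pairs
```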
Table 1: Publications and citations (country-wise) and global occurrence analysis.

S. no.  Country         Documents  Citations
1       United States   30         220
2       China           28         247
3       India           11         26
4       Canada          7          58
5       United Kingdom  7          19
6       France          6          74
7       Germany         6          12
8       Japan           6          87
9       South Korea     5          24
10      Spain           4          6
11      Switzerland     4          11
12      Australia       3          15
13      Taiwan          3          29
14      Austria         2          42
15      Bangladesh      2          12
16      Brazil          2          0
17      Greece          2          4
18      Iran            2          15
19      Italy           2          35
20      Netherlands     2          7

Table 2: Author-wise citations for audio segmentation research trends along with year and references.

S. no.  Document                Citations  Reference
1       Zhang S. (2018)         94         [17]
2       Huang H. (2020)         63         [32]
3       Wang Z. (2018)          43         [19]
4       Messner E. (2018)       42         [20]
5       Leglaive S. (2018)      34         [18]
6       Baraldi L. (2017)       33         [13]
7       Gwardys G. (2014)       25         [11]
8       Akbari M. (2019)        24         [24]
9       Lim M. (2018)           21         [21]
10      Leglaive S. (2019)      20         [25]
11      Lu W.-T. (2018)         20         [22]
12      Deng J. (2016)          16         [12]
13      Wu Y. (2019)            15         [26]
14      Rahmani M.H. (2017)     15         [14]
15      Min X. (2020)           14         [33]
16      Laporte C. (2018)       14         [23]
17      Valliappan C.A. (2019)  13         [28]
18      Jati A. (2017)          13         [15]
19      Guo J. (2019)           12         [27]
20      Hossain S. (2019)       12         [29]
21      Baby A. (2017)          11         [16]
22      Leglaive S. (2020)      10         [34]
23      Hesamian M.H. (2019)    10         [30]
24      Li H. (2019)            10         [31]

Countries were included by applying a filter of a minimum of three documents [30]. The United States, China, and India are the top three countries where research on audio segmentation is highest, and their document and citation counts are correspondingly high; this clearly shows how the related research is interconnected [31]. In this study, India has 11 documents and 26 citations, indicating that Indian authors are actively involved in research in the audio segmentation field [32]. Most researchers are thus from the United States, China, and India, and considerable research potential lies in countries like Canada and the United Kingdom [33].

2.4. Prominent Researchers for Audio Segmentation. The publications retrieved from the Scopus database using the two
search keywords "Audio Segmentation" and "Deep Learning" have been cited several times, as described in Table 2. By applying a filter of a minimum of 10 citations per document, we obtained 24 publications [34]; Table 2 lists the citations for these 24 publications identified using the VOSviewer software package.

Figure 7 shows the author-wise analysis of audio segmentation research content based on the Scopus database. The highest author citation count in this research area is ninety-four, so from Figure 7 we can say that Zhang S. (2018) is cited most. Huang H. (2020) is second by number of citations, with sixty-three.

3. Application Areas of Audio Segmentation

Audio segmentation is often utilized in various applications, such as Automatic Speech Recognition [35], Automatic Language Identification, and Automatic Emotion Recognition systems [36]. The audio signal is segmented into a sequence of frames, which are classified into several classes such as music [37], speech [38], and noise [39]. In this approach, the noise is filtered out of the sound signal, because audio recordings vary significantly in characteristics such as signal-to-noise ratio [40], audio encoding [41], bandwidth [42], language [43], speaking style [44], gender [45], and sound pitch [46], which are the main challenges.
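The frame-and-classify scheme just described can be illustrated as below: fixed-length frames are labelled, contiguous frames with the same label are merged into segments, and noise segments are filtered out. The classifier here is a deliberately simple energy-threshold stub standing in for a trained deep model, and the frame length is an arbitrary choice:

```python
# A sketch of the frame-and-classify scheme described above. Fixed-length
# frames are labelled, contiguous same-class frames are merged into segments,
# and noise segments are filtered out. The classifier is a stand-in stub.
import numpy as np

FRAME = 8000  # 0.5 s at 16 kHz (illustrative choice)

def classify(frame: np.ndarray) -> str:
    """Stand-in for a trained model; here, low-energy frames count as noise."""
    return "noise" if np.mean(frame ** 2) < 1e-4 else "speech"

def segment(signal: np.ndarray):
    labels = [classify(signal[i:i + FRAME])
              for i in range(0, len(signal) - FRAME + 1, FRAME)]
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        # close a segment at the end or wherever the label changes
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start * FRAME, i * FRAME, labels[start]))
            start = i
    return [s for s in segments if s[2] != "noise"]   # filter noise segments out

print(segment(np.random.randn(16000 * 4) * 0.1))
```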
Segmentation provides the most effective method for splitting multimedia data by extracting diverse aspects of the multimedia data [47]. This segmentation yields useful information such as speaker signal and identity division, as well as automatic indexing and data retrieval of all instances of a certain speaker [48]. We can adapt automatic online speech recognition acoustic models to improve overall system performance by collecting all segments produced by the same speaker [49]. Typically, a certain set of
[16] A. Jati and P. G. Georgiou, "Speaker2Vec: unsupervised learning and adaptation of a speaker manifold using deep neural networks with an evaluation on speaker segmentation," in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3567–3571, Stockholm, Sweden, August 2017.
[17] A. Baby, J. J. Prakash, S. R. Vignesh, and H. A. Murthy, "Deep learning techniques in tandem with signal processing cues for phonetic segmentation for text to speech synthesis in Indian languages," in Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3817–3821, August 2017.
[18] E. Messner, M. Zöhrer, and F. Pernkopf, "Heart sound segmentation--an event detection approach using deep recurrent neural networks," IEEE Transactions on Biomedical Engineering, vol. 65, no. 9, pp. 1964–1974, 2018.
[19] S. Zhang, S. Zhang, T. Huang, W. Gao, and Q. Tian, "Learning affective features with a hybrid deep model for audio-visual emotion recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 10, pp. 3030–3043, 2018.
[20] Z. Wang and S. Ji, "Smoothed dilated convolutions for improved dense prediction," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1–27, 2018.
[21] S. Leglaive, L. Girin, and R. Horaud, "A variance modeling framework based on variational autoencoders for speech enhancement," in Proceedings of the 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), IEEE, Aalborg, Denmark, September 2018.
[22] M. Lim, D. Lee, H. Park et al., "Convolutional neural network based audio event classification," KSII Transactions on Internet and Information Systems (TIIS), vol. 12, no. 6, pp. 2748–2760, 2018.
[23] W. T. Lu and L. Su, "Vocal melody extraction with semantic segmentation and audio-symbolic domain transfer learning," in Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), pp. 521–528, September 2018.
[24] C. Laporte and L. Ménard, "Multi-hypothesis tracking of the tongue surface in ultrasound video recordings of normal and impaired speech," Medical Image Analysis, vol. 44, pp. 98–114, 2018.
[25] M. Akbari, J. Liang, and J. Han, "DSSLIC: deep semantic segmentation-based layered image compression," in Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2042–2046, IEEE, Brighton, UK, May 2019.
[26] S. Leglaive, U. Şimşekli, A. Liutkus, L. Girin, and R. Horaud, "Speech enhancement with variational autoencoders and alpha-stable distributions," in Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 541–545, IEEE, Brighton, UK, May 2019.
[27] Y. Wu and W. Li, "Automatic audio chord recognition with MIDI-trained deep feature and BLSTM-CRF sequence decoding model," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 2, pp. 355–366, 2019.
[28] J. Guo, B. Song, P. Zhang, M. Ma, W. Luo, and J. Lv, "Affective video content analysis based on multimodal data fusion in heterogeneous networks," Information Fusion, vol. 51, pp. 224–232, 2019.
[29] C. A. Valliappan, A. Kumar, R. Mannem, G. R. Karthik, and P. K. Ghosh, "An improved air tissue boundary segmentation technique for real-time magnetic resonance imaging video using SegNet," in Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, Brighton, UK, May 2019.
[30] S. Hossain, S. Najeeb, A. Shahriyar, Z. R. Abdullah, and M. A. Haque, "A pipeline for lung tumor detection and segmentation from CT scans using dilated convolutional neural networks," in Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, Brighton, UK, May 2019.
[31] M. H. Hesamian, W. Jia, X. He, and P. J. Kennedy, "Atrous convolution for binary semantic segmentation of lung nodule," in Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1015–1019, IEEE, Brighton, UK, May 2019.
[32] H. Li, D. Chen, W. H. Nailon, M. E. Davies, and D. Laurenson, "A deep dual-path network for improved mammogram image processing," in Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1224–1228, IEEE, Brighton, UK, May 2019.
[33] H. Huang, L. Lin, R. Tong et al., "UNet 3+: a full-scale connected UNet for medical image segmentation," in Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1055–1059, IEEE, Barcelona, Spain, May 2020.
[34] X. Min, G. Zhai, J. Zhou, X. P. Zhang, X. Yang, and X. Guan, "A multimodal saliency model for videos with high audio-visual correspondence," IEEE Transactions on Image Processing, vol. 29, pp. 3805–3819, 2020.
[35] S. Leglaive, X. Alameda-Pineda, L. Girin, and R. Horaud, "A recurrent variational autoencoder for speech enhancement," in Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 371–375, IEEE, Barcelona, Spain, May 2020.
[36] G. Tzanetakis and P. Cook, "Multi-feature audio segmentation for browsing and annotation," in Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA'99), pp. 103–106, IEEE, New Paltz, NY, USA, October 1999.
[37] S. Venkatesh, D. Moffat, and E. R. Miranda, "Investigating the effects of training set synthesis for audio segmentation of radio broadcast," Electronics, vol. 10, no. 7, p. 827, 2021.
[38] S. A. Deevi, C. P. Kaniraja, V. D. Mani, D. Mishra, S. Ummar, and C. Satheesh, "HeartNetEC: a deep representation learning approach for ECG beat classification," Biomedical Engineering Letters, vol. 11, no. 1, pp. 69–84, 2021.
[39] S. Suyanto, K. N. Ramadhani, S. Mandala, and A. Kurniawan, "Automatic segmented-syllable and deep learning-based Indonesian audiovisual speech recognition," in Proceedings of the 6th International Conference on Interactive Digital Media (ICIDM), pp. 1–4, IEEE, Bandung, Indonesia, December 2020.
[40] F. Barata, P. Tinschert, F. Rassouli et al., "Automatic recognition, segmentation, and sex assignment of nocturnal asthmatic coughs and cough epochs in smartphone audio recordings: observational field study," Journal of Medical Internet Research, vol. 22, no. 7, Article ID e18082, 2020.
[41] O. Stephen and M. Sain, "Deep learning-based scene image detection and segmentation with speech synthesis in real-time," in Smart Healthcare Analytics in IoT Enabled Environment, pp. 163–171, Springer, Cham, 2020.
[42] C. Park, D. Kim, and H. Ko, "Dilated convolution and gated linear unit based sound event detection and tagging algorithm using weak label," The Journal of the Acoustical Society of Korea, vol. 39, no. 5, pp. 414–423, 2020.
[43] M. F. M. Esa, N. H. Mustaffa, N. H. M. Radzi, and R. Sallehuddin, "Audio deformation based data augmentation for convolution neural network in vibration analysis," IOP Conference Series: Materials Science and Engineering, vol. 551, no. 1, Article ID 012066, 2019.
[44] A. Sendrayaperumal, S. Mahapatra, S. S. Parida et al., "Energy auditing for efficient planning and implementation in commercial and residential buildings," Advances in Civil Engineering, vol. 2021, pp. 1–10, 2021.
[45] L. P. Natrayan, S. S. Sundaram, and J. Elumalai, "Analyzing the uterine physiological with MMG signals using SVM," International Journal of Pharmaceutical Research, vol. 11, no. 2, pp. 165–170, 2019.
[46] K. Seeniappan, B. Venkatesan, N. N. Krishnan et al., "A comparative assessment of performance and emission characteristics of a DI diesel engine fuelled with ternary blends of two higher alcohols with lemongrass oil biodiesel and diesel fuel," Energy & Environment, vol. 13, Article ID 0958305X2110513, 2021.
[47] K. R. Vaishali, S. R. Rammohan, L. Natrayan, D. Usha, and V. R. Niveditha, "Guided container selection for data streaming through neural learning in cloud," International Journal of System Assurance Engineering and Management, vol. 16, pp. 1–7, 2021.
[48] G. Kanimozhi, L. Natrayan, S. Angalaeswari, and P. Paramasivam, "An effective charger for plug-in hybrid electric vehicles (PHEV) with an enhanced PFC rectifier and ZVS-ZCS DC/DC high-frequency converter," Journal of Advanced Transportation, vol. 2022, Article ID 7840102, 2022.
[49] S. Kaliappan, M. D. Raj Kamal, S. Mohanamurugan, and P. K. Nagarajan, "Analysis of an innovative connecting rod by using finite element method," Taga Journal of Graphic Technology, vol. 14, pp. 1147–1152, 2018.
[50] D. K. Jain, S. K. S. Tyagi, S. Neelakandan, M. Prakash, and L. Natrayan, "Metaheuristic optimization-based resource allocation technique for cybertwin-driven 6G on IoE environment," IEEE Transactions on Industrial Informatics, vol. 18, no. 7, pp. 4884–4892, 2022.
[51] P. Asha, L. Natrayan, B. T. Geetha et al., "IoT enabled environmental toxicology for air pollution monitoring using AI techniques," Environmental Research, vol. 205, Article ID 112574, 2022.
[52] A. S. Kaliappan, S. Mohanamurugan, and P. K. Nagarajan, "Numerical investigation of sinusoidal and trapezoidal piston profiles for an IC engine," Journal of Applied Fluid Mechanics, vol. 13, no. 1, pp. 287–298, 2020.
[53] S. S. Sundaram, N. Hari Basker, and L. Natrayan, "Smart clothes with bio-sensors for ECG monitoring," International Journal of Innovative Technology and Exploring Engineering, vol. 8, no. 4, pp. 298–301, 2019.
[54] K. Nagarajan, A. Rajagopalan, S. Angalaeswari, L. Natrayan, and W. D. Mammo, "Combined economic emission dispatch of microgrid with the incorporation of renewable energy sources using improved mayfly optimization algorithm," Computational Intelligence and Neuroscience, vol. 2022, pp. 1–22, 2022.
[55] S. Magesh, V. R. Niveditha, P. S. Rajakumar, S. Radha Ram Mohan, and L. Natrayan, "Pervasive computing in the context of COVID-19 prediction with AI-based algorithms," International Journal of Pervasive Computing and Communications, vol. 16, no. 5, pp. 477–487, 2020.
[56] C. S. S. Anupama, L. Natrayan, E. Laxmi Lydia et al., "Deep learning with backtracking search optimization-based skin lesion diagnosis model," Computers, Materials & Continua, vol. 70, no. 1, pp. 1297–1313, 2021.
[57] S. Raja and A. J. Rajan, "A decision-making model for selection of the suitable FDM machine using fuzzy TOPSIS," Mathematical Problems in Engineering, vol. 2022, Article ID 7653292, 2022.
[58] S. Aggarwal, M. Suchithra, N. Chandramouli et al., "Rice disease detection using artificial and machine learning techniques to improvise agro-business," Scientific Programming, vol. 2022, Article ID 1757888, 2022.