Search Results (38)

Search Parameters:
Keywords = speaking style

12 pages, 324 KiB  
Article
Psychometric Properties of the Preference for Intuition and Deliberation in Eating Decision-Making Scale among Brazilian Adult Women
by Thainá Richelli Oliveira Resende, Edilene Márcia de Sousa, Marle dos Santos Alvarenga, Mariana Cristina Palermo Ferreira, Larissa Stefhanne Damasceno de Amorim Póvoa, Leandro Henrique Pereira Galvane, Cleidiel Aparecido Araujo Lemos, António Raposo, Ariana Saraiva, Conrado Carrascosa, Hmidan A. Alturki and Pedro Henrique Berbert de Carvalho
Nutrients 2024, 16(19), 3252; https://doi.org/10.3390/nu16193252 - 26 Sep 2024
Viewed by 644
Abstract
The Preference for Intuition and Deliberation in Food Decision-Making Scale (E-PID) was developed to evaluate both intuitive and deliberative food decision-making within a single instrument. However, its psychometric properties have only been assessed among German-speaking participants. The main aim of the present study was to evaluate evidence of validity and reliability of the E-PID among 604 Brazilian adult women. Exploratory (n = 289) and confirmatory factor analyses (n = 315) were conducted to evaluate the factor structure of the E-PID. Convergent validity was assessed by correlating the E-PID with measures of eating behaviors (Three-Factor Eating Questionnaire-18), intuitive eating (Intuitive Eating Scale-2), and a measure of beliefs and attitudes towards food (Food-Life Questionnaire-SF). McDonald’s Omega coefficient (ω) was used to test the internal consistency of the E-PID. Results from the exploratory and confirmatory factor analyses supported a two-factor structure with seven items. We found good internal consistency (McDonald’s ω = 0.77–0.81). Furthermore, the E-PID demonstrated adequate convergent validity with measures of intuitive, restrictive, emotional and uncontrolled eating, and beliefs and attitudes towards food. Results support the use of the E-PID as a measure of intuition and deliberation in food decision-making among Brazilian adult women, expanding the literature on eating decision-making styles. Full article
(This article belongs to the Special Issue Eating Behavior and Women's Health)
15 pages, 600 KiB  
Article
French Validation of the New Sexual Satisfaction Scale Short Form (NSSS-SF Fr)
by Brice Gouvernet
Sexes 2024, 5(1), 31-45; https://doi.org/10.3390/sexes5010003 - 28 Feb 2024
Viewed by 1433
Abstract
This study addresses the critical need for French-language tools in assessing sexual satisfaction, an important aspect of global health, sexual health, and mental health. Its main aim is to validate the French version of the NSSS-SF scale (NSSS-SF Fr, Fr for French). The research was conducted in two phases. The first study involved 253 participants, predominantly female (77.75%), with a focus on examining the tool’s psychometric properties (factorial structure, internal consistency, convergent validity). The second study included 855 participants, with a similar gender distribution, aimed at further validation and analysis, studying links between NSSS-SF Fr and anxiety and depressive symptoms (assessed with GAD7 and MDI), and attachment style (ECR-RS). The NSSS-SF Fr demonstrated robust psychometric properties. Key findings included its strong correlation with sexual health indicators, anxiety, depression, and attachment styles confirming its effectiveness as a reliable tool for evaluating sexual satisfaction in French-speaking populations. Comparisons with international studies highlighted its universal applicability and cultural sensitivity. The NSSS-SF French version stands as a critical tool for future research and clinical practice, bridging a vital gap in the assessment of sexual satisfaction among French-speaking individuals. Full article
15 pages, 1224 KiB  
Article
The Body, the Spirit, and the Other: Yantras as Embodied Cultural Integration
by Maja Tabea Jerrentrup
Soc. Sci. 2024, 13(1), 34; https://doi.org/10.3390/socsci13010034 - 3 Jan 2024
Viewed by 2589
Abstract
This article looks at the Sak Yant tattoo style, which is becoming increasingly popular among so-called “Westerners”. It explores the questions of whether Sak Yant tattoos among “Westerners” will typically fall under copyright issues and cultural appropriation, and what makes Sak Yants relevant to clients. Underlying this research, with a marketing analysis of Sak Yants on Instagram, is the assumption that marketing is also guided by (anticipated) customer desires and can thus tell us something about their perspective. Two interrelated aspects become apparent: Sak Yants integrate aesthetics and spirituality as well as the body and mind, entities that are often considered separately in the “West”, which may be appealing to the “Western” customer and which sets Sak Yants apart from other tattoo styles. The meanings that Sak Yants have usually go deeper than just to the surface, as is not only illustrated by the process and permanence of tattooing but also by the importance of the ritual. People from the respective cultural contexts usually benefit and take part in the process. Therefore, instead of cultural appropriation or appreciation, one could perhaps speak of cultural participation or integration. Full article
16 pages, 4319 KiB  
Article
“Once the Fire Starts Then There Is No Stopping It”: The Revitalization of Chinookan Art in the 21st Century, Conversations with Greg A. Robinson
by Jon D. Daehnke and Greg A. Robinson
Arts 2023, 12(5), 185; https://doi.org/10.3390/arts12050185 - 31 Aug 2023
Viewed by 2891
Abstract
Chinookan art centered on the Lower Columbia River and was created by Chinookan-speaking people living along the river and its tributaries. The style is unique, focusing on geometric forms, numerical patterns, and anatomical representation. It is embedded in Chinookan mythology and differs considerably from the more widely recognized Formline of Indigenous artists from the northern Pacific Northwest. It also receives less attention, both publicly and scholarly. Due to high rates of death along the Columbia from introduced diseases during colonial invasion, and high levels of looting that followed, Chinookan art nearly disappeared from the landscape. In the 21st century Chinookan art has had a resurgence, led by Chinookan practitioners. The resurgence occurs not only within individual households but also in public settings. This resurgence also includes an emphasis on teaching the style to youth, who learn that this is not just about making art but is integrally attached to culture more broadly, including connection to language, stories, protocols, and Indigenous identity itself. It is ultimately a source of pride, resilience, and resistance. As a result, where there were once generations who never saw a landscape with Chinookan art, there are now generations who will never know a landscape without it. Full article
(This article belongs to the Special Issue Arts of the Northwest Coast)
11 pages, 678 KiB  
Article
Measuring Vulnerability in Grief: The Psychometric Properties of the Italian Adult Attitude to Grief Scale
by Alessio Gori, Eleonora Topino, Pierluigi Imperatore, Alessandro Musetti, Julius Sim and Linda Machin
Eur. J. Investig. Health Psychol. Educ. 2023, 13(6), 975-985; https://doi.org/10.3390/ejihpe13060074 - 4 Jun 2023
Viewed by 1965
Abstract
Although experiences of loss and the consequent grief are natural in human life, some individuals may have difficulty managing these events, to the point of developing significant impairment in their functioning in important life areas. Given this, the present research aimed to explore the psychometric properties of the Italian version of the Adult Attitude to Grief scale (AAG) to facilitate research on adult vulnerability to grief among Italian-speaking populations. A sample of 367 participants (M_age = 30.44, SD = 11.21; 78% females) participated in this research. A back-translation procedure was implemented to develop the Italian AAG. Then, participants completed the Italian AAG alongside a battery of other self-report psychometric scales in order to assess aspects of the construct validity of the AAG: the Forty-Item Defense Style Questionnaire, the Impact of Event Scale—Revised, and the Beck Depression Inventory–II. A bifactor structure was found to have the best fit to the data, supporting the possibility of using both the general factor (i.e., vulnerability) and three dimensions (i.e., overwhelmed, controlled, and resilient). Unlike the original version, the control dimension emerged as a “protective” factor in the Italian population, together with the resilient factor. Furthermore, results provided satisfactory indications of internal consistency and construct validity. In conclusion, the Italian AAG was shown to be a valid, reliable, quick, and easy-to-use scale that can be used both for research and clinical practice in the Italian context. Full article
17 pages, 717 KiB  
Article
A Personalized Multi-Turn Generation-Based Chatbot with Various-Persona-Distribution Data
by Shihao Zhu, Tinghuai Ma, Huan Rong and Najla Al-Nabhan
Appl. Sci. 2023, 13(5), 3122; https://doi.org/10.3390/app13053122 - 28 Feb 2023
Cited by 2 | Viewed by 3106
Abstract
Existing persona-based dialogue generation models focus on the semantic consistency between personas and responses. However, various influential factors can cause persona inconsistency, such as the speaking style in the context. Existing models perform inflexibly in speaking styles on various-persona-distribution datasets, resulting in persona style inconsistency. In this work, we propose a dialogue generation model with a persona selection classifier to solve the complex inconsistency problem. The model generates responses in two steps: original response generation and rewriting responses. For training, we employ two auxiliary tasks: (1) a persona selection task to fuse the adapted persona into the original responses; (2) consistency inference to remove inconsistent persona information in the final responses. In our model, the adapted personas are predicted by an NLI-based classifier. We evaluate our model on the persona dialogue dataset with different persona distributions, i.e., the persona-dense PersonaChat dataset and the persona-sparse PersonalDialog dataset. The experimental results show that our model outperforms strong models in response quality, persona consistency, and persona distribution consistency. Full article
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)
21 pages, 756 KiB  
Review
Reconsidering Read and Spontaneous Speech: Causal Perspectives on the Generation of Training Data for Automatic Speech Recognition
by Philipp Gabler, Bernhard C. Geiger, Barbara Schuppler and Roman Kern
Information 2023, 14(2), 137; https://doi.org/10.3390/info14020137 - 19 Feb 2023
Cited by 4 | Viewed by 3702
Abstract
Superficially, read and spontaneous speech—the two main kinds of training data for automatic speech recognition—appear as complementary, but are equal: pairs of texts and acoustic signals. Yet, spontaneous speech is typically harder for recognition. This is usually explained by different kinds of variation and noise, but there is a more fundamental deviation at play: for read speech, the audio signal is produced by recitation of the given text, whereas in spontaneous speech, the text is transcribed from a given signal. In this review, we embrace this difference by presenting a first introduction of causal reasoning into automatic speech recognition, and describing causality as a tool to study speaking styles and training data. After breaking down the data generation processes of read and spontaneous speech and analysing the domain from a causal perspective, we highlight how data generation by annotation must affect the interpretation of inference and performance. Our work discusses how various results from the causality literature regarding the impact of the direction of data generation mechanisms on learning and prediction apply to speech data. Finally, we argue how a causal perspective can support the understanding of models in speech processing regarding their behaviour, capabilities, and limitations. Full article
7 pages, 222 KiB  
Essay
Physical Philosophy: Martial Arts as Embodied Wisdom
by Jason Holt
Philosophies 2023, 8(1), 14; https://doi.org/10.3390/philosophies8010014 - 14 Feb 2023
Cited by 6 | Viewed by 5151
Abstract
While defining martial arts is not prerequisite to philosophizing about them, such a definition is desirable, helping us resolve disputes about the status of hard cases. At one extreme, Martínková and Parry argue that martial arts are distinguished from both close combat (as unsystematic) and combat sports (as competitive), and from warrior arts (as lethal) and martial paths (as spiritual). At the other extreme, mixed martial arts pundits and Bruce Lee speak of combat sports generally as martial arts. I argue that the fine-grained taxonomy proposed by Martínková and Parry can be usefully supplemented by a broader definition, specifically the following: martial arts are systematic fighting styles and practices as ways of embodying wisdom. A possible difficulty here is that such views face the charge of overemphasizing the “philosophical” aspect of martial arts. My definition can, however, avoid this apparent problem. If martial arts essentially aim to embody wisdom, this applies no less to the (strategic) practical wisdom of The Art of War than to the (ethical) practical wisdom of the Tao Te Ching. In an extended sense, then, any systematic fighting style, including combat sports, may count as a martial art insofar as it embodies wisdom by improving practical fighting skills. Full article
(This article belongs to the Special Issue The Philosophy and Science of Martial Arts)
15 pages, 1605 KiB  
Article
Residual Information in Deep Speaker Embedding Architectures
by Adriana Stan
Mathematics 2022, 10(21), 3927; https://doi.org/10.3390/math10213927 - 23 Oct 2022
Cited by 1 | Viewed by 2677
Abstract
Speaker embeddings represent a means to extract representative vectorial representations from a speech signal such that the representation pertains to the speaker identity alone. The embeddings are commonly used to classify and discriminate between different speakers. However, there is no objective measure to evaluate the ability of a speaker embedding to disentangle the speaker identity from the other speech characteristics. This means that the embeddings are far from ideal, highly dependent on the training corpus and still include a degree of residual information pertaining to factors such as linguistic content, recording conditions or speaking style of the utterance. This paper introduces an analysis over six sets of speaker embeddings extracted with some of the most recent and high-performing deep neural network (DNN) architectures, and in particular, the degree to which they are able to truly disentangle the speaker identity from the speech signal. To correctly evaluate the architectures, a large multi-speaker parallel speech dataset is used. The dataset includes 46 speakers uttering the same set of prompts, recorded in either a professional studio or their home environments. The analysis looks into the intra- and inter-speaker similarity measures computed over the different embedding sets, as well as whether simple classification and regression methods are able to extract several residual information factors from the speaker embeddings. The results show that the discriminative power of the analyzed embeddings is very high, yet across all the analyzed architectures, residual information is still present in the representations in the form of a high correlation to the recording conditions, linguistic contents and utterance duration. However, we show that this correlation, although not ideal, could still be useful in downstream tasks. The low-dimensional projections of the speaker embeddings show similar behavior patterns across the embedding sets with respect to intra-speaker data clustering and utterance outlier detection. Full article
15 pages, 529 KiB  
Article
Audio Augmentation for Non-Native Children’s Speech Recognition through Discriminative Learning
by Kodali Radha and Mohan Bansal
Entropy 2022, 24(10), 1490; https://doi.org/10.3390/e24101490 - 19 Oct 2022
Cited by 15 | Viewed by 2445
Abstract
Automatic speech recognition (ASR) in children is a rapidly evolving field, as children become more accustomed to interacting with virtual assistants, such as Amazon Echo, Cortana, and other smart speakers, and it has advanced the human–computer interaction in recent generations. Furthermore, non-native children are observed to exhibit a diverse range of reading errors during second language (L2) acquisition, such as lexical disfluency, hesitations, intra-word switching, and word repetitions, which are not yet addressed, resulting in ASR’s struggle to recognize non-native children’s speech. The main objective of this study is to develop a non-native children’s speech recognition system on top of feature-space discriminative models, such as feature-space maximum mutual information (fMMI) and boosted feature-space maximum mutual information (fbMMI). Harnessing the collaborative power of speed perturbation-based data augmentation on the original children’s speech corpora yields an effective performance. The corpus focuses on different speaking styles of children, together with read speech and spontaneous speech, in order to investigate the impact of non-native children’s L2 speaking proficiency on speech recognition systems. The experiments revealed that feature-space MMI models with steadily increasing speed perturbation factors outperform traditional ASR baseline models. Full article
(This article belongs to the Special Issue Information-Theoretic Approaches in Speech Processing and Recognition)
7 pages, 247 KiB  
Article
Social Virtual Reality: Neurodivergence and Inclusivity in the Metaverse
by James Hutson
Societies 2022, 12(4), 102; https://doi.org/10.3390/soc12040102 - 7 Jul 2022
Cited by 38 | Viewed by 8817
Abstract
Whereas traditional teaching environments encourage lively and engaged interaction and reward extrovert qualities, introverts, and others with symptoms that make social engagement difficult, such as autism spectrum disorder (ASD), are often disadvantaged. This population is often more engaged in quieter, low-key learning environments and often does not speak up and answer questions in traditional lecture-style classes. These individuals are often passed over in school and later in their careers for not speaking up and are assumed to not be as competent as their gregarious and outgoing colleagues. With the rise of the metaverse and democratization of virtual reality (VR) technology, post-secondary education is especially poised to capitalize on the immersive learning environments social VR provides and prepare students for the future of work, where virtual collaboration will be key. This study seeks to reconsider the role of VR and the metaverse for introverts and those with ASD. The metaverse has the potential to continue the social and workplace changes already accelerated by the pandemic and open new avenues for communication and collaboration for a more inclusive audience and tomorrow. Full article
17 pages, 582 KiB  
Article
Ghosting, Breadcrumbing, Catfishing: A Corpus Analysis of English Borrowings in the Spanish Speaking World
by Irene Rull García and Kathryn P. Bove
Languages 2022, 7(2), 119; https://doi.org/10.3390/languages7020119 - 11 May 2022
Cited by 1 | Viewed by 3719
Abstract
The study aims to contribute to our understanding of the situation of languages in contact and the phenomenon of linguistic borrowings in the modern online world. The current study investigates the use of English terms borrowed to describe romantic relationships in Spanish. We use a list of terms presented in GQ Spain, a men’s culture, fashion and style magazine, as popular terms in 2020 to describe (a lack of) love in romantic relationships. In order to analyze the actual use of these borrowings in Spanish, we collected data from the Corpus del Español NOW (2012–2019), focusing on the number of occurrences of each English borrowing, level of morphological adaptation, co-occurrence of translations or explanations, date of first use and location of use. Overall, 11 of the 20 terms, such as ghosting, gaslighting or benching, appeared in the corpus. We note the presence of quotation marks, parentheses or uppercase letters in some cases, but it was observed that most examples keep their English form. However, many terms appeared with an explanation or translation, reflecting the novelty of the borrowing. Data regarding dates and countries were collected in order to set the year they were integrated with the new meaning (2013–2019). The country with the highest number of cases was Argentina, and there were a substantial number of cases in other Spanish-speaking countries. Overall, these findings show an increase in the incorporation of these borrowings over the years in the Spanish lexicon. Full article
12 pages, 348 KiB  
Article
Examining English- and Spanish-Speaking Therapist Behaviors in Parent–Child Interaction Therapy
by Yessica Green Rosas, Kristen M. McCabe, Argero Zerr, May Yeh, Kristine Gese and Miya L. Barnett
Int. J. Environ. Res. Public Health 2022, 19(8), 4474; https://doi.org/10.3390/ijerph19084474 - 8 Apr 2022
Cited by 3 | Viewed by 1941
Abstract
Parent–child interaction therapy (PCIT) is a best-practice treatment for behavior problems in young children. In PCIT, therapists coach parents during in-vivo interactions to strengthen the parent–child relationship and teach parents effective ways of managing difficult child behaviors. Past research has found that different therapist coaching styles may be associated with faster skill acquisition and improved parent engagement. However, most research examining therapist behaviors has been conducted with English-speaking families, and there is limited research examining therapist behaviors when working with Spanish-speaking clients. In this study, English- and Spanish-speaking therapists’ coaching behaviors (e.g., directive versus responsive) were examined, as well as their association with client outcomes, including speed of parental skill acquisition and treatment completion. Results suggested that coaching styles varied significantly between sessions conducted in Spanish versus English. In Spanish sessions, therapists had more total verbalizations than in English sessions and demonstrated higher rates of both total directive and responsive coaching. Responsive coaching was found to predict treatment completion across groups, while directive coaching was not. Directive and responsive coaching were not found to predict the rate of parental skill acquisition. Implications regarding the training of therapists and emphasizing cultural considerations are discussed. Full article
(This article belongs to the Special Issue Parent-Child Interaction Therapy: Advances toward Health Equity)
11 pages, 1984 KiB  
Article
Perceived Anger in Clear and Conversational Speech: Contributions of Age and Hearing Loss
by Shae D. Morgan, Sarah Hargus Ferguson, Ashton D. Crain and Skyler G. Jennings
Brain Sci. 2022, 12(2), 210; https://doi.org/10.3390/brainsci12020210 - 2 Feb 2022
Cited by 1 | Viewed by 1917
Abstract
A previous investigation demonstrated differences between younger adult normal-hearing listeners and older adult hearing-impaired listeners in the perceived emotion of clear and conversational speech. Specifically, clear speech sounded angry more often than conversational speech for both groups, but the effect was smaller for the older listeners. These listener groups differed by two confounding factors, age (younger vs. older adults) and hearing status (normal vs. impaired). The objective of the present study was to evaluate the contributions of aging and hearing loss to the reduced perception of anger in older adults with hearing loss. We investigated perceived anger in clear and conversational speech in younger adults with and without a simulated age-related hearing loss, and in older adults with normal hearing. Younger adults with simulated hearing loss performed similarly to normal-hearing peers, while normal-hearing older adults performed similarly to hearing-impaired peers, suggesting that aging was the primary contributor to the decreased anger perception seen in previous work. These findings confirm reduced anger perception for older adults compared to younger adults, though the significant speaking style effect—regardless of age and hearing status—highlights the need to identify methods of producing clear speech that is emotionally neutral or positive. Full article
(This article belongs to the Special Issue Auditory and Phonetic Processes in Speech Perception)
12 pages, 812 KiB  
Article
A Comparison of Hybrid and End-to-End ASR Systems for the IberSpeech-RTVE 2020 Speech-to-Text Transcription Challenge
by Juan M. Perero-Codosero, Fernando M. Espinoza-Cuadros and Luis A. Hernández-Gómez
Appl. Sci. 2022, 12(2), 903; https://doi.org/10.3390/app12020903 - 17 Jan 2022
Cited by 12 | Viewed by 3612
Abstract
This paper describes a comparison between hybrid and end-to-end Automatic Speech Recognition (ASR) systems, which were evaluated on the IberSpeech-RTVE 2020 Speech-to-Text Transcription Challenge. Deep Neural Networks (DNNs) are becoming the most promising technology for ASR at present. In the last few years, traditional hybrid models have been evaluated and compared to other end-to-end ASR systems in terms of accuracy and efficiency. We contribute two different approaches: a hybrid ASR system based on a DNN-HMM and two state-of-the-art end-to-end ASR systems, based on Lattice-Free Maximum Mutual Information (LF-MMI). To address the high difficulty in the speech-to-text transcription of recordings with different speaking styles and acoustic conditions, from TV studios to live recordings, data augmentation and Domain Adversarial Training (DAT) techniques were studied. Multi-condition data augmentation applied to our hybrid DNN-HMM demonstrated WER improvements in noisy scenarios (about 10% relative). In contrast, the results obtained using an end-to-end PyChain-based ASR system were far from our expectations. Nevertheless, we found that when including DAT techniques, a relative WER improvement of 2.87% was obtained as compared to the PyChain-based system. Full article