INTERSPEECH 2004: Jeju Island, Korea
- 8th International Conference on Spoken Language Processing, INTERSPEECH-ICSLP 2004, Jeju Island, Korea, October 4-8, 2004. ISCA 2004
Plenary Talks
- Chin-Hui Lee:
From decoding-driven to detection-based paradigms for automatic speech recognition. - Hyun-Bok Lee:
In search of a universal phonetic alphabet - theory and application of an organic visible speech. - Jacqueline Vaissière:
From X-ray or MRI data to sounds through articulatory synthesis: towards an integrated view of the speech communication process.
Speech Recognition - Adaptation
- Sreeram Balakrishnan, Karthik Visweswariah, Vaibhava Goel:
Stochastic gradient adaptation of front-end parameters. 1-4 - Antoine Raux, Rita Singh:
Maximum-likelihood adaptation of semi-continuous HMMs by latent variable decomposition of state distributions. 5-8 - Chao Huang, Tao Chen, Eric Chang:
Transformation and combination of hidden Markov models for speaker selection training. 9-12 - Brian Kan-Wing Mak, Roger Wend-Huu Hsiao:
Improving eigenspace-based MLLR adaptation by kernel PCA. 13-16 - Nikos Chatzichrisafis, Vassilios Digalakis, Vassilios Diakoloukas, Costas Harizakis:
Rapid acoustic model development using Gaussian mixture clustering and language adaptation. 17-20 - Karthik Visweswariah, Ramesh A. Gopinath:
Adaptation of front end parameters in a speech recognizer. 21-24 - Diego Giuliani, Matteo Gerosa, Fabio Brugnara:
Speaker normalization through constrained MLLR based transforms. 2893-2896 - Xiangyu Mu, Shuwu Zhang, Bo Xu:
Multi-layer structure MLLR adaptation algorithm with subspace regression classes and tying. 2897-2900 - Georg Stemmer, Stefan Steidl, Christian Hacker, Elmar Nöth:
Adaptation in the pronunciation space for non-native speech recognition. 2901-2904 - Xuechuan Wang, Douglas D. O'Shaughnessy:
Robust ASR model adaptation by feature-based statistical data mapping. 2905-2908 - Zhaobing Han, Shuwu Zhang, Bo Xu:
A novel target-driven generalized JMAP adaptation algorithm. 2909-2912 - Brian Mak, Simon Ka-Lung Ho, James T. Kwok:
Speedup of kernel eigenvoice speaker adaptation by embedded kernel PCA. 2913-2916 - Hyung Bae Jeon, Dong Kook Kim:
Maximum a posteriori eigenvoice speaker adaptation for Korean connected digit recognition. 2917-2920 - Wei Wang, Stephen A. Zahorian:
Vocal tract normalization based on spectral warping. 2921-2924 - Koji Tanaka, Fuji Ren, Shingo Kuroiwa, Satoru Tsuge:
Acoustic model adaptation for coded speech using synthetic speech. 2925-2928 - Motoyuki Suzuki, Hirokazu Ogasawara, Akinori Ito, Yuichi Ohkawa, Shozo Makino:
Speaker adaptation method for CALL system using bilingual speakers' utterances. 2929-2932 - Shinji Watanabe:
Acoustic model adaptation based on coarse/fine training of transfer vectors and its application to a speaker adaptation task. 2933-2936 - Wei-Ho Tsai, Shih-Sian Cheng, Hsin-Min Wang:
Speaker clustering of speech utterances using a voice characteristic reference space. 2937-2940 - Young Kuk Kim, Hwa Jeon Song, Hyung Soon Kim:
Performance improvement of connected digit recognition using unsupervised fast speaker adaptation. 2941-2944 - Hyung Soon Kim, Hwa Jeon Song:
Simultaneous estimation of weights of eigenvoices and bias compensation vector for rapid speaker adaptation. 2945-2948 - Matthias Wölfel:
Speaker dependent model order selection of spectral envelopes. 2949-2952 - Enrico Bocchieri, Michael Riley, Murat Saraclar:
Methods for task adaptation of acoustic models with limited transcribed in-domain data. 2953-2956 - Atsushi Fujii, Tetsuya Ishikawa, Katsunobu Itou, Tomoyosi Akiba:
Unsupervised topic adaptation for lecture speech retrieval. 2957-2960 - Haibin Liu, Zhenyang Wu:
Mean and covariance adaptation based on minimum classification error linear regression for continuous density HMMs. 2961-2964 - Goshu Nagino, Makoto Shozakai:
Design of ready-made acoustic model library by two-dimensional visualization of acoustic space. 2965-2968
Spoken Language Identification, Translation and Retrieval I
- Jean-Luc Gauvain, Abdelkhalek Messaoudi, Holger Schwenk:
Language recognition using phone lattices. 25-28 - Mark A. Huckvale:
ACCDIST: a metric for comparing speakers' accents. 29-32 - Michael Levit, Allen L. Gorin, Patrick Haffner, Hiyan Alshawi, Elmar Nöth:
Aspects of named entity processing. 33-36 - Josep Maria Crego, José B. Mariño, Adrià de Gispert:
Finite-state-based and phrase-based statistical machine translation. 37-40 - Tanja Schultz, Szu-Chen Stan Jou, Stephan Vogel, Shirin Saleem:
Using word lattice information for a tighter coupling in speech translation systems. 41-44 - Teruhisa Misu, Tatsuya Kawahara, Kazunori Komatani:
Confirmation strategy for document retrieval systems with spoken dialog interface. 45-48 - Shi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh:
Multilayer subword units for open-vocabulary spoken document retrieval. 1553-1556 - Yoshiaki Itoh, Kazuyo Tanaka, Shi-wook Lee:
An efficient partial matching algorithm toward speech retrieval by speech. 1557-1560 - Celestin Sedogbo, Sébastien Herry, Bruno Gas, Jean-Luc Zarader:
Language detection by neural discrimination. 1561-1564 - Ricardo de Córdoba, Javier Ferreiros, Valentín Sama, Javier Macías Guarasa, Luis Fernando D'Haro, Fernando Fernández Martínez:
Language identification techniques based on full recognition in an air traffic control task. 1565-1568 - John H. L. Hansen, Umit H. Yapanel, Rongqing Huang, Ayako Ikeno:
Dialect analysis and modeling for automatic classification. 1569-1572 - Emmanuel Ferragne, François Pellegrino:
Rhythm in read British English: interdialect variability. 1573-1576 - Pascale Fung, Yi Liu, Yongsheng Yang, Yihai Shen, Dekai Wu:
A grammar-based Chinese to English speech translation system for portable devices. 1577-1580 - Gökhan Tür:
Cost-sensitive call classification. 1581-1584 - Mikko Kurimo, Ville T. Turunen, Inger Ekman:
An evaluation of a spoken document retrieval baseline system in Finnish. 1585-1588 - Hui Jiang, Pengfei Liu, Imed Zitouni:
Discriminative training of naive Bayes classifiers for natural language call routing. 1589-1592 - Nicolas Moreau, Hyoung-Gook Kim, Thomas Sikora:
Phonetic confusion based document expansion for spoken document retrieval. 1593-1596 - Euisok Chung, Soojong Lim, Yi-Gyu Hwang, Myung-Gil Jang:
Hybrid named entity recognition for question-answering system. 1597-1600 - Jitendra Ajmera, Iain McCowan, Hervé Bourlard:
An online audio indexing system. 1601-1604 - Eric Sanders, Febe de Wet:
Histogram normalisation and the recognition of names and ontology words in the MUMIS project. 1605-1608 - Rui Amaral, Isabel Trancoso:
Improving the topic indexation and segmentation modules of a media watch system. 1609-1612 - Melissa Barkat-Defradas, Rym Hamdi, Emmanuel Ferragne, François Pellegrino:
Speech timing and rhythmic structure in Arabic dialects: a comparison of two approaches. 1613-1616 - Hsin-Min Wang, Shih-Sian Cheng:
METRIC-SEQDAC: a hybrid approach for audio segmentation. 1617-1620 - Jen-Wei Kuo, Yao-Min Huang, Berlin Chen, Hsin-Min Wang:
Statistical Chinese spoken document retrieval using latent topical information. 1621-1624 - Masahiko Matsushita, Hiromitsu Nishizaki, Seiichi Nakagawa, Takehito Utsuro:
Keyword recognition and extraction by multiple-LVCSRs with 60,000 words in speech-driven WEB retrieval task. 1625-1628 - Ruiqiang Zhang, Gen-ichiro Kikui, Hirofumi Yamamoto, Frank K. Soong, Taro Watanabe, Eiichiro Sumita, Wai Kit Lo:
Improved spoken language translation using n-best speech recognition hypotheses. 1629-1632 - Kakeung Wong, Man-Hung Siu:
Automatic language identification using discrete hidden Markov model. 1633-1636 - Bowen Zhou, Daniel Déchelotte, Yuqing Gao:
Two-way speech-to-speech translation on handheld devices. 1637-1640 - Hervé Blanchon:
HLT modules scalability within the NESPOLE! project. 1641-1644
Linguistics, Phonology, and Phonetics
- Midam Kim:
Correlation between VOT and F0 in the perception of Korean stops and affricates. 49-52 - Aude Noiray, Lucie Ménard, Marie-Agnès Cathiard, Christian Abry, Christophe Savariaux:
The development of anticipatory labial coarticulation in French: a pioneering study. 53-56 - Melvyn John Hunt:
Speech recognition, syllabification and statistical phonetics. 57-60 - Jilei Tian:
Data-driven approaches for automatic detection of syllable boundaries. 61-64 - Anne Cutler, Dennis Norris, Núria Sebastián-Gallés:
Phonemic repertoire and similarity within the vocabulary. 65-68 - Sameer Maskey, Alan W. Black, Laura Tomokiya:
Bootstrapping phonetic lexicons for new languages. 69-72 - Mirjam Broersma, K. Marieke Kolkman:
Lexical representation of non-native phonemes. 1241-1244 - Jong-Pyo Lee, Tae-Yeoub Jang:
A comparative study on the production of inter-stress intervals of English speech by English native speakers and Korean speakers. 1245-1248 - Emi Zuiki Murano, Mihoko Teshigawara:
Articulatory correlates of voice qualities of good guys and bad guys in Japanese anime: an MRI study. 1249-1252 - Sorin Dusan:
Effects of phonetic contexts on the duration of phonetic segments in fluent read speech. 1253-1256 - Qiang Fang:
A study on nasal coda loss in continuous speech. 1257-1260 - Hua-Li Jian:
An improved pair-wise variability index for comparing the timing characteristics of speech. 1261-1264 - Hua-Li Jian:
An acoustic study of speech rhythm in Taiwan English. 1265-1268 - Sung-A. Kim:
Language specific phonetic rules: evidence from domain-initial strengthening. 1269-1272 - Hansang Park:
Spectral characteristics of the release bursts in Korean alveolar stops. 1273-1276 - Rob van Son, Olga Bolotova, Louis C. W. Pols, Mietta Lennes:
Frequency effects on vowel reduction in three typologically different languages (Dutch, Finnish, Russian). 1277-1280 - Julia Abresch, Stefan Breuer:
Assessment of non-native phones in anglicisms by German listeners. 1281-1284 - Sunhee Kim:
Phonology of exceptions for Korean grapheme-to-phoneme conversion. 1285-1288 - Shigeyoshi Kitazawa, Shinya Kiriyama:
Acoustic and prosodic analysis of Japanese vowel-vowel hiatus with laryngeal effect. 1289-1292 - Kimiko Tsukada:
A cross-linguistic acoustic comparison of unreleased word-final stops: Korean and Thai. 1293-1296 - Taehong Cho, Elizabeth K. Johnson:
Acoustic correlates of phrase-internal lexical boundaries in Dutch. 1297-1300 - Taehong Cho, James M. McQueen:
Phonotactics vs. phonetic cues in native and non-native listening: Dutch and Korean listeners' perception of Dutch and English. 1301-1304 - Svetlana Kaminskaia, François Poiré:
Comparing intonation of two varieties of French using normalized F0 values. 1305-1308 - Mira Oh, Kee-Ho Kim:
Phonetic realization of the suffix-suppressed accentual phrase in Korean. 1309-1312 - H. Timothy Bunnell, James B. Polikoff, Jane McNicholas:
Spectral moment vs. bark cepstral analysis of children's word-initial voiceless stops. 1313-1316 - Nobuaki Minematsu:
Pronunciation assessment based upon the compatibility between a learner's pronunciation structure and the target language's lexical structure. 1317-1320 - Kenji Yoshida:
Spread of high tone in Akita Japanese. 1321-1324
Biomedical Applications of Speech Analysis
- Juan Ignacio Godino-Llorente, María Victoria Rodellar Biarge, Pedro Gómez Vilda, Francisco Díaz Pérez, Agustín Álvarez Marquina, Rafael Martínez-Olalla:
Biomechanical parameter fingerprint in the mucosal wave power spectral density. 73-76 - Cheolwoo Jo, Soo-Geon Wang, Byung-Gon Yang, Hyung-Soon Kim, Tao Li:
Classification of pathological voice including severely noisy cases. 77-80 - Qiang Fu, Peter Murphy:
A robust glottal source model estimation technique. 81-84 - Hiroki Mori, Yasunori Kobayashi, Hideki Kasuya, Hajime Hirose, Noriko Kobayashi:
F0 and formant frequency distribution of dysarthric speech - a comparative study. 85-88 - Hideki Kawahara, Yumi Hirachi, Masanori Morise, Hideki Banno:
Procedure "senza vibrato": a key component for morphing singing. 89-92 - Claudia Manfredi, Giorgio Peretti, Laura Magnoni, Fabrizio Dori, Ernesto Iadanza:
Thyroplastic medialisation in unilateral vocal fold paralysis: assessing voice quality recovering. 93-96 - Gernot Kubin, Martin Hagmüller:
Voice enhancement of male speakers with laryngeal neoplasm. 541-544 - Jong Min Choi, Myung-Whun Sung, Kwang Suk Park, Jeong-Hun Hah:
A comparison of the perturbation analysis between PRAAT and Computerized Speech Lab. 545-548
Robust Speech Recognition on AURORA
- Ji Ming, Baochun Hou:
Evaluation of universal compensation on Aurora 2 and 3 and beyond. 97-100 - Hugo Van hamme:
PROSPECT features and their application to missing data techniques for robust speech recognition. 101-104 - Hugo Van hamme, Patrick Wambacq, Veronique Stouten:
Accounting for the uncertainty of speech estimates in the context of model-based feature enhancement. 105-108 - Hans-Günter Hirsch, Harald Finster:
Applying the Aurora feature extraction schemes to a phoneme based recognition task. 109-112 - Zhipeng Zhang, Tomoyuki Ohya, Sadaoki Furui:
Evaluation of tree-structured piecewise linear transformation-based noise adaptation on AURORA2 database. 113-116 - Tor André Myrvoll, Satoshi Nakamura:
Online minimum mean square error filtering of noisy cepstral coefficients using a sequential EM algorithm. 117-120 - Akira Sasou, Kazuyo Tanaka, Satoshi Nakamura, Futoshi Asano:
HMM-based feature compensation method: an evaluation using the AURORA2. 121-124 - Xuechuan Wang, Douglas D. O'Shaughnessy:
Noise adaptation for robust AURORA 2 noisy digit recognition using statistical data mapping. 125-128 - Benjamin J. Shannon, Kuldip K. Paliwal:
MFCC computation from magnitude spectrum of higher lag autocorrelation coefficients for robust speech recognition. 129-132 - Muhammad Ghulam, Takashi Fukuda, Junsei Horikawa, Tsuneo Nitta:
A noise-robust feature extraction method based on pitch-synchronous ZCPA for ASR. 133-136 - José C. Segura, Ángel de la Torre, Javier Ramírez, Antonio J. Rubio, M. Carmen Benítez:
Including uncertainty of speech observations in robust speech recognition. 137-140 - Takeshi Yamada, Jiro Okada, Nobuhiko Kitawaki:
Integration of n-best recognition results obtained by multiple noise reduction algorithms. 141-144 - Panji Setiawan, Sorel Stan, Tim Fingscheidt:
Revisiting some model-based and data-driven denoising algorithms in Aurora 2 context. 145-148 - Guo-Hong Ding, Bo Xu:
Exploring high-performance speech recognition in noisy environments using high-order Taylor series expansion. 149-152 - Wing-Hei Au, Man-Hung Siu:
A robust training algorithm based on neighborhood information. 153-156 - Siu Wa Lee, Pak-Chung Ching:
In-phase feature induction: an effective compensation technique for robust speech recognition. 157-160 - Jeff Siu-Kei Au-Yeung, Man-Hung Siu:
Improved performance of Aurora 4 using HTK and unsupervised MLLR adaptation. 161-164 - Shang-nien Tsai, Lin-Shan Lee:
A new feature extraction front-end for robust speech recognition using progressive histogram equalization and multi-eigenvector temporal filtering. 165-168
Spoken / Multimodal Dialogue System
- Christian Fügen, Hartwig Holzapfel, Alex Waibel:
Tight coupling of speech recognition and dialog management - dialog-context dependent grammar weighting for speech recognition. 169-172 - Akinobu Lee, Keisuke Nakamura, Ryuichi Nisimura, Hiroshi Saruwatari, Kiyohiro Shikano:
Noise robust real world spoken dialogue system using GMM based rejection of unintended inputs. 173-176 - Hironori Oshikawa, Norihide Kitaoka, Seiichi Nakagawa:
Speech interface for name input based on combination of recognition methods using syllable-based n-gram and word dictionary. 177-180 - Imed Zitouni, Minkyu Lee, Hui Jiang:
Constrained minimization technique for topic identification using discriminative training and support vector machines. 181-184 - Jason D. Williams, Steve J. Young:
Characterizing task-oriented dialog using a simulated ASR channel. 185-188 - Takashi Konashi, Motoyuki Suzuki, Akinori Ito, Shozo Makino:
A spoken dialog system based on automatic grammar generation and template-based weighting for autonomous mobile robots. 189-192 - Akinori Ito, Takanobu Oba, Takashi Konashi, Motoyuki Suzuki, Shozo Makino:
Noise adaptive spoken dialog system based on selection of multiple dialog strategies. 193-196 - Mikko Hartikainen, Markku Turunen, Jaakko Hakulinen, Esa-Pekka Salonen, J. Adam Funk:
Flexible dialogue management using distributed and dynamic dialogue control. 197-200 - Keith Houck:
Contextual revision in information seeking conversation systems. 201-204 - Ian M. O'Neill, Philip Hanna, Xingkun Liu, Michael F. McTear:
Cross domain dialogue modelling: an object-based approach. 205-208 - Hirohiko Sagawa, Teruko Mitamura, Eric Nyberg:
A comparison of confirmation styles for error handling in a speech dialog system. 209-212 - Fan Yang, Peter A. Heeman:
Using computer simulation to compare two models of mixed-initiative. 213-216 - Fan Yang, Peter A. Heeman, Kristy Hollingshead:
Towards understanding mixed-initiative in task-oriented dialogues. 217-220 - Peter Wolf, Joseph Woelfel, Jan C. van Gemert, Bhiksha Raj, David Wong:
SpokenQuery: an alternate approach to choosing items with speech. 221-224 - Shona Douglas, Deepak Agarwal, Tirso Alonso, Robert M. Bell, Mazin G. Rahim, Deborah F. Swayne, Chris Volinsky:
Mining customer care dialogs for "daily news". 225-228 - Jens Edlund, Gabriel Skantze, Rolf Carlson:
Higgins - a spoken dialogue system for investigating error handling techniques. 229-232 - Fuliang Weng, Lawrence Cavedon, Badri Raghunathan, Danilo Mirkovic, Hua Cheng, Hauke Schmidt, Harry Bratt, Rohit Mishra, Stanley Peters, Sandra Upson, Elizabeth Shriberg, Carsten Bergmann, Lin Zhao:
A conversational dialogue system for cognitively overloaded users. 233-236 - Gerhard Hanrieder, Stefan W. Hamerich:
Modeling generic dialog applications for embedded systems. 237-240 - Matthew N. Stuttle, Jason D. Williams, Steve J. Young:
A framework for dialogue data collection with a simulated ASR channel. 241-244 - Shimei Pan:
A multi-layer conversation management approach for information seeking applications. 245-248 - Thomas K. Harris, Roni Rosenfeld:
A universal speech interface for appliances. 249-252 - Keita Hayashi, Yuki Irie, Yukiko Yamaguchi, Shigeki Matsubara, Nobuo Kawaguchi:
Speech understanding, dialogue management and response generation in corpus-based spoken dialogue system. 253-256 - Fernando Fernández Martínez, Valentín Sama, Luis Fernando D'Haro, Rubén San Segundo, Ricardo de Córdoba, Juan Manuel Montero:
Implementation of dialog applications in an open-source VoiceXML platform. 257-260 - Chun Wai Lau, Bin Ma, Helen Mei-Ling Meng, Yiu Sang Moon, Yeung Yam:
Fuzzy logic decision fusion in a multimodal biometric system. 261-264 - Peter Poller, Norbert Reithinger:
A state model for the realization of visual perceptive feedback in SmartKom. 265-268 - Akemi Iida, Yoshito Ueno, Ryohei Matsuura, Kiyoaki Aikawa:
A vector-based method for efficiently representing multivariate environmental information. 269-272 - Ioannis Toptsis, Shuyin Li, Britta Wrede, Gernot A. Fink:
A multi-modal dialog system for a mobile robot. 273-276 - Niels Ole Bernsen, Laila Dybkjær:
Structured interview-based evaluation of spoken multimodal conversation with H. C. Andersen. 277-280
Speech Recognition - Search
- Miroslav Novak, Vladimír Bergl:
Memory efficient decoding graph compilation with wide cross-word acoustic context. 281-284 - Dongbin Zhang, Limin Du:
Dynamic beam pruning strategy using adaptive control. 285-288 - Takaaki Hori, Chiori Hori, Yasuhiro Minami:
Fast on-the-fly composition for weighted finite-state transducers in 1.8 million-word vocabulary continuous speech recognition. 289-292 - Peng Yu, Frank Torsten Bernd Seide:
A hybrid word / phoneme-based approach for improved vocabulary-independent search in spontaneous speech. 293-296 - Lubos Smídl, Ludek Müller:
Keyword spotting for highly inflectional languages. 297-300 - Frédéric Tendeau:
Optimizing an engine network that allows dynamic masking. 301-304
Spoken Dialogue and Systems
- Katsutoshi Ohtsuki, Nobuaki Hiroshima, Yoshihiko Hayashi, Katsuji Bessho, Shoichi Matsunaga:
Topic structure extraction for meeting indexing. 305-308 - Sophie Rosset, Lori Lamel:
Automatic detection of dialog acts based on multilevel information. 309-312 - Gina-Anne Levow:
Identifying local corrections in human-computer dialogue. 313-316 - Peter Reichl, Florian Hammer:
Hot discussion or frosty dialogue? towards a temperature metric for conversational interactivity. 317-320 - Stephanie Seneff, Chao Wang, I. Lee Hetherington, Grace Chung:
A dynamic vocabulary spoken dialogue interface. 321-324 - Matthias Denecke, Kohji Dohsaka, Mikio Nakano:
Learning dialogue policies using state aggregation in reinforcement learning. 325-328
Speech Perception
- Keren B. Shatzman:
Segmenting ambiguous phrases using phoneme duration. 329-332 - Shuichi Sakamoto, Yôiti Suzuki, Shigeaki Amano, Tadahisa Kondo, Naoki Iwaoka:
A compensation method for word-familiarity difference with SNR control in intelligibility test. 333-336 - Takashi Otake, Yoko Sakamoto, Yasuyuki Konomi:
Phoneme-based word activation in spoken-word recognition: evidence from Japanese school children. 337-340 - Belynda Brahimi, Philippe Boula de Mareüil, Cédric Gendrot:
Role of segmental and suprasegmental cues in the perception of Maghrebian-accented French. 341-344 - Hiroaki Kato, Yoshinori Sagisaka, Minoru Tsuzaki, Makiko Muto:
Effect of speaking rate on the acceptability of change in segment duration. 345-348 - Kiyoko Yoneyama:
A cross-linguistic study of diphthongs in spoken word processing in Japanese and English. 349-352
Multi-Lingual Speech-to-Speech Translation
- Alex Waibel:
Speech translation: past, present and future. 353-356 - Gen-ichiro Kikui, Toshiyuki Takezawa, Seiichi Yamamoto:
Multilingual corpora for speech-to-speech translation research. 357-360 - Hermann Ney:
Statistical machine translation and its challenges. 361-364 - John Lee, Stephanie Seneff:
Translingual grammar induction. 365-368 - Youngjik Lee, Jun Park, Seung-Shin Oh:
Usability considerations of speech-to-speech translation system. 369-372 - Gianni Lazzari, Alex Waibel, Chengqing Zong:
Worldwide ongoing activities on multilingual speech to speech translation. 373-376
Speech Recognition - Large Vocabulary
- Dominique Fohr, Odile Mella, Christophe Cerisara, Irina Illina:
The automatic news transcription system: ANTS, some real time experiments. 377-380 - Bhuvana Ramabhadran, Olivier Siohan, Geoffrey Zweig:
Use of metadata to improve recognition of spontaneous speech and named entities. 381-384 - Janne Pylkkönen, Mikko Kurimo:
Duration modeling techniques for continuous speech recognition. 385-388 - Tanel Alumäe:
Large vocabulary continuous speech recognition for Estonian using morpheme classes. 389-392 - Zhaobing Han, Shuwu Zhang, Bo Xu:
Combining agglomerative and tree-based state clustering for high accuracy acoustic modeling. 393-396 - William S.-Y. Wang, Gang Peng:
Parallel tone score association method for tone language speech recognition. 397-400 - Jing Zheng, Horacio Franco, Andreas Stolcke:
Effective acoustic modeling for rate-of-speech variation in large vocabulary conversational speech recognition. 401-404 - L. Sarada Ghadiyaram, Hemalatha Nagarajan, Nagarajan Thangavelu, Hema A. Murthy:
Automatic transcription of continuous speech using unsupervised and incremental training. 405-408 - Jan Nouza, Dana Nejedlová, Jindrich Zdánský, Jan Kolorenc:
Very large vocabulary speech recognition system for automatic transcription of Czech broadcast programs. 409-412 - Olivier Siohan, Bhuvana Ramabhadran, Geoffrey Zweig:
Speech recognition error analysis on the English MALACH corpus. 413-416 - Rong Zhang, Alexander I. Rudnicky:
A frame level boosting training scheme for acoustic modeling. 417-420 - Rong Zhang, Alexander I. Rudnicky:
Optimizing boosting with discriminative criteria. 421-424 - Xianghua Xu, Qiang Guo, Jie Zhu:
Restructuring HMM states for speaker adaptation in Mandarin speech recognition. 425-428 - Mike Matton, Mathias De Wachter, Dirk Van Compernolle, Ronald Cools:
A discriminative locally weighted distance measure for speaker independent template based speech recognition. 429-432 - Yohei Itaya, Heiga Zen, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura:
Deterministic annealing EM algorithm in parameter estimation for acoustic model. 433-436 - Frantisek Grézl, Martin Karafiát, Jan Cernocký:
TRAP based features for LVCSR of meeting data. 437-440 - Frank K. Soong, Wai Kit Lo, Satoshi Nakamura:
Optimal acoustic and language model weights for minimizing word verification errors. 441-444 - Atsushi Sako, Yasuo Ariki:
Structuring of baseball live games based on speech recognition using task-dependent knowledge. 445-448 - Zhengyu Zhou, Helen M. Meng:
A two-level schema for detecting recognition errors. 449-452 - In-Jeong Choi, Nam-Hoon Kim, Su Youn Yoon:
Large vocabulary continuous speech recognition based on cross-morpheme phonetic information. 453-456 - Changxue Ma:
Automatic phonetic base form generation based on maximum context tree. 457-460 - Gustavo Hernández Ábrego, Lex Olorenshaw, Raquel Tato, Thomas Schaaf:
Dictionary refinements based on phonetic consensus and non-uniform pronunciation reduction. 1697-1700 - Abdelkhalek Messaoudi, Lori Lamel, Jean-Luc Gauvain:
Transcription of Arabic broadcast news. 1701-1704 - Takahiro Shinozaki, Sadaoki Furui:
Spontaneous speech recognition using a massively parallel decoder. 1705-1708 - Tanja Schultz, Qin Jin, Kornel Laskowski, Yue Pan, Florian Metze, Christian Fügen:
Issues in meeting transcription - the ISL meeting transcription system. 1709-1712 - Katsutoshi Ohtsuki, Nobuaki Hiroshima, Shoichi Matsunaga, Yoshihiko Hayashi:
Multi-pass ASR using vocabulary expansion. 1713-1716 - Vlasios Doumpiotis, William Byrne:
Pinched lattice minimum Bayes risk discriminative training for large vocabulary continuous speech recognition. 1717-1720 - Izhak Shafran, William Byrne:
Task-specific minimum Bayes-risk decoding using learned edit distance. 1945-1948 - Rong Zhang, Alexander I. Rudnicky:
Apply n-best list re-ranking to acoustic model combinations of boosting training. 1949-1952 - Do Yeong Kim, Srinivasan Umesh, Mark J. F. Gales, Thomas Hain, Philip C. Woodland:
Using VTLN for broadcast news transcription. 1953-1956 - Andreas Stolcke, Chuck Wooters, Ivan Bulyko, Martin Graciarena, Scott Otterson, Barbara Peskin, Mari Ostendorf, David Gelbart, Nikki Mirghafori, Tuomo W. Pirinen:
From switchboard to meetings: development of the 2004 ICSI-SRI-UW meeting recognition system. 1957-1960 - Anand Venkataraman, Andreas Stolcke, Wen Wang, Dimitra Vergyri, Jing Zheng, Venkata Ramana Rao Gadde:
An efficient repair procedure for quick transcriptions. 1961-1964 - Yao Qian, Tan Lee, Frank K. Soong:
Tone information as a confidence measure for improving Cantonese LVCSR. 1965-1968
Speech Science
- Danielle Duez:
Temporal variables in parkinsonian speech. 461-464 - Olov Engwall:
Speaker adaptation of a three-dimensional tongue model. 465-468 - Nicole Cooper, Anne Cutler:
Perception of non-native phonemes in noise. 469-472 - Hideki Kawahara, Hideki Banno, Toshio Irino, Jiang Jin:
Intelligibility of degraded speech from smeared STRAIGHT spectrum. 473-476 - Young-Ik Kim, Rhee Man Kil:
Sound source localization based on zero-crossing peak-amplitude coding. 477-480 - Sachiyo Kajikawa, Laurel Fais, Shigeaki Amano, Janet F. Werker:
Adult and infant sensitivity to phonotactic features in spoken Japanese. 481-484 - Phil D. Green, James Carmichael:
Revisiting dysarthria assessment intelligibility metrics. 485-488 - Valter Ciocca, Tara L. Whitehill, Joan K.-Y. Ma:
The effect of intonation on perception of Cantonese lexical tones. 489-492 - Toshiko Isei-Jaakkola:
Maximum short quantity in Japanese and Finnish in two perception tests with F0 and dB variants. 493-496 - Paavo Alku, Matti Airas, Brad H. Story:
Evaluation of an inverse filtering technique using physical modeling of voice production. 497-500 - Hui-ju Hsu, Janice Fon:
Positional and phonotactic effects on the realization of Taiwan Mandarin tone 2. 501-504 - Karl Schnell, Arild Lacroix:
Speech production based on lossy tube models: unit concatenation and sound transitions. 505-508 - Qin Yan, Saeed Vaseghi, Dimitrios Rentzos, Ching-Hsiang Ho:
Modelling and ranking of differences across formants of British, Australian and American accents. 509-512 - Tatsuya Kitamura, Satoru Fujita, Kiyoshi Honda, Hironori Nishimoto:
An experimental method for measuring transfer functions of acoustic tubes. 513-516 - Takuya Tsuji, Tokihiko Kaburagi, Kohei Wakamiya, Jiji Kim:
Estimation of the vocal tract spectrum from articulatory movements using phoneme-dependent neural networks. 517-520 - Kunitoshi Motoki, Hiroki Matsuzaki:
Computation of the acoustic characteristics of vocal-tract models with geometrical perturbation. 521-524 - P. Vijayalakshmi, M. Ramasubba Reddy:
Analysis of hypernasality by synthesis. 525-528 - Abdellah Kacha, Francis Grenez, Frédéric Bettens, Jean Schoentgen:
Adaptive long-term predictive analysis of disordered speech. 529-532 - Slobodan Jovicic, Sandra Antesevic, Zoran Saric:
Phoneme restoration in degraded speech communication. 533-536 - Maria Marinaki, Constantine Kotropoulos, Ioannis Pitas, Nikolaos Maglaveras:
Automatic detection of vocal fold paralysis and edema. 537-540
Novel Features in ASR
- Yasuhiro Minami, Erik McDermott, Atsushi Nakamura, Shigeru Katagiri:
A theoretical analysis of speech recognition based on feature trajectory models. 549-552 - Zhijian Ou, Zuoying Wang:
Discriminative combination of multiple linear predictions for speech recognition. 553-556 - Davood Gharavian, Seyed Mohammad Ahadi:
Use of formants in stressed and unstressed continuous speech recognition. 557-560 - Konstantin Markov, Satoshi Nakamura, Jianwu Dang:
Integration of articulatory dynamic parameters in HMM/BN based speech recognition system. 561-564 - Leigh David Alsteris, Kuldip K. Paliwal:
ASR on speech reconstructed from short-time Fourier phase spectra. 565-568
Spoken and Natural Language Understanding
- Robert Lieb, Tibor Fábián, Günther Ruske, Matthias Thomae:
Estimation of semantic confidences on lattice hierarchies. 569-572 - Fumiyo Fukumoto, Yoshimi Suzuki:
Learning subject drift for topic tracking. 573-576 - Elizabeth Shriberg, Andreas Stolcke, Dustin Hillard, Mari Ostendorf, Barbara Peskin, Mary P. Harper, Yang Liu:
The ICSI-SRI-UW metadata extraction system. 577-580 - Mark Hasegawa-Johnson, Stephen E. Levinson, Tong Zhang:
Automatic detection of contrast for speech understanding. 581-584 - Nick Jui-Chang Wang, Jia-Lin Shen, Ching-Ho Tsai:
Integrating layer concept information into n-gram modeling for spoken language understanding. 585-588 - Junyan Chen, Ji Wu, Zuoying Wang:
A robust understanding model for spoken dialogues. 589-592 - Chai Wutiwiwatchai, Sadaoki Furui:
Belief-based nonlinear rescoring in Thai speech understanding. 2129-2132 - Toshihiko Itoh, Atsuhiko Kai, Yukihiro Itoh, Tatsuhiro Konishi:
An understanding strategy based on plausibility score in recognition history using CSR confidence measure. 2133-2136 - Sangkeun Jung, Minwoo Jeong, Gary Geunbae Lee:
Speech recognition error correction using maximum entropy language model. 2137-2140 - Xiang Li, Juan M. Huerta:
Discriminative training of compound-word based multinomial classifiers for speech routing. 2141-2144 - Jihyun Eun, Changki Lee, Gary Geunbae Lee:
An information extraction approach for spoken language understanding. 2145-2148 - David Horowitz, Partha Lal, Pierce Gerard Buckley:
A maximum entropy shallow functional parser for spoken language understanding. 2149-2152 - Qiang Huang, Stephen J. Cox:
Mixture language models for call routing. 2153-2156 - Chung-Hsien Wu, Jui-Feng Yeh, Ming-Jun Chen:
Speech act identification using an ontology-based partial pattern tree. 2157-2160 - Ye-Yi Wang, Yun-Cheng Ju:
Creating speech recognition grammars from regular expressions for alphanumeric concepts. 2161-2164 - Isabel Trancoso, Paulo Araújo, Céu Viana, Nuno J. Mamede:
Poetry assistant. 2165-2168 - Tasuku Kitade, Tatsuya Kawahara, Hiroaki Nanjo:
Automatic extraction of key sentences from oral presentations using statistical measure based on discourse markers. 2169-2172 - Tomohiro Ohno, Shigeki Matsubara, Nobuo Kawaguchi, Yasuyoshi Inagaki:
Robust dependency parsing of spontaneous Japanese speech and its evaluation. 2173-2176 - Wolfgang Minker, Dirk Bühler, Christiane Beuschel:
Strategies for optimizing a stochastic spoken natural language parser. 2177-2180 - Tzu-Lun Lee, Ya-Fang He, Yun-Ju Huang, Shu-Chuan Tseng, Robert Eklund:
Prolongation in spontaneous Mandarin. 2181-2184 - Yuki Irie, Shigeki Matsubara, Nobuo Kawaguchi, Yukiko Yamaguchi, Yasuyoshi Inagaki:
Speech intention understanding based on decision tree learning. 2185-2188 - Satanjeev Banerjee, Alexander I. Rudnicky:
Using simple speech-based features to detect the state of a meeting and the roles of the meeting participants. 2189-2192 - Serdar Yildirim, Murtaza Bulut, Chul Min Lee, Abe Kazemzadeh, Zhigang Deng, Sungbok Lee, Shrikanth S. Narayanan, Carlos Busso:
An acoustic study of emotions expressed in speech. 2193-2196 - Tatsuya Kawahara, Ian Richard Lane, Tomoko Matsui, Satoshi Nakamura:
Topic classification and verification modeling for out-of-domain utterance detection. 2197-2200 - So-Young Park, Yong-Jae Kwak, Joon-Ho Lim, Hae-Chang Rim, Soo-Hong Kim:
Partially lexicalized parsing model utilizing rich features. 2201-2204 - Yoshimi Suzuki, Fumiyo Fukumoto, Yoshihiro Sekiguchi:
Clustering similar nouns for selecting related news articles. 2205-2208 - Leonardo Badino:
Chinese text word-segmentation considering semantic links among sentences. 2209-2212 - Do-Gil Lee, Hae-Chang Rim:
Syllable-based probabilistic morphological analysis model of Korean. 2213-2216
Speaker Segmentation and Clustering
- Fabio Valente, Christian Wellekens:
Scoring unknown speaker clustering: VB vs. BIC. 593-596 - Qin Jin, Tanja Schultz:
Speaker segmentation and clustering in meetings. 597-600 - Lori Lamel, Jean-Luc Gauvain, Leonardo Canseco-Rodriguez:
Speaker diarization from speech transcripts. 601-604 - Xavier Anguera Miró, Javier Hernando Pericas:
Evolutive speaker segmentation using a repository system. 605-608 - Hagai Aronowitz, David Burshtein, Amihood Amir:
Speaker indexing in audio archives using test utterance Gaussian mixture modeling. 609-612 - Antoine Raux:
Automated lexical adaptation and speaker clustering based on pronunciation habits for non-native speech recognition. 613-616
Speech Processing in a Packet Network Environment
- Kuldip K. Paliwal, Stephen So:
Scalable distributed speech recognition using multi-frame GMM-based block quantization. 617-620 - Naveen Srinivasamurthy, Kyu Jeong Han, Shrikanth S. Narayanan:
Robust speech recognition over packet networks: an overview. 621-624 - Thomas Eriksson, Samuel Kim, Hong-Goo Kang, Chungyong Lee:
Theory for speaker recognition over IP. 625-628 - Wu Chou, Feng Liu:
Voice portal services in packet network and VoIP environment. 629-632 - Peter Kabal, Colm Elliott:
Synchronization of speaker selection for centralized tandem free VoIP conferencing. 633-636 - Akitoshi Kataoka, Yusuke Hiwasaki, Toru Morinaga, Jotaro Ikedo:
Measuring the perceived importance of time- and frequency-divided speech blocks for transmitting over packet networks. 637-640 - Moo Young Kim, W. Bastiaan Kleijn:
Comparison of transmitter-based packet-loss recovery techniques for voice transmission. 641-644
Acoustic Modeling
- Denis Jouvet, Ronaldo O. Messina:
Context dependent "long units" for speech recognition. 645-648 - Shinichi Yoshizawa, Kiyohiro Shikano:
Rapid EM training based on model-integration. 649-652 - Dominique Fohr, Odile Mella, Irina Illina, Christophe Cerisara:
Experiments on the accuracy of phone models and liaison processing in a French broadcast news transcription system. 653-656 - Jorge F. Silva, Shrikanth S. Narayanan:
A statistical discrimination measure for hidden Markov models based on divergence. 657-660 - Jan Stadermann, Gerhard Rigoll:
A hybrid SVM/HMM acoustic modeling approach to automatic speech recognition. 661-664 - Dirk Knoblauch:
Data driven number-of-states selection in HMM topologies. 665-668 - Youngkyu Cho, Sung-a Kim, Dongsuk Yook:
Hybrid model using subspace distribution clustering hidden Markov models and semi-continuous hidden Markov models for embedded speech recognizers. 669-672 - Peder A. Olsen, Karthik Visweswariah:
Fast clustering of Gaussians and the virtue of representing Gaussians in exponential model format. 673-676 - Karen Livescu, James R. Glass:
Feature-based pronunciation modeling with trainable asynchrony probabilities. 677-680 - Hong-Kwang Jeff Kuo, Yuqing Gao:
Maximum entropy direct model as a unified model for acoustic modeling in speech recognition. 681-684 - Yu Zhu, Tan Lee:
Explicit duration modeling for Cantonese connected-digit recognition. 685-688 - Arthur Chan, Mosur Ravishankar, Alexander I. Rudnicky, Jahanzeb Sherwani:
Four-layer categorization scheme of fast GMM computation techniques in large vocabulary continuous speech recognition systems. 689-692 - Junho Park, Hanseok Ko:
Compact acoustic model for embedded implementation. 693-696 - Takatoshi Jitsuhiro, Satoshi Nakamura:
Increasing the mixture components of non-uniform HMM structures based on a variational Bayesian approach. 697-700 - Panu Somervuo:
Comparison of ML, MAP, and VB based acoustic models in large vocabulary speech recognition. 701-704 - Wolfgang Macherey, Ralf Schlüter, Hermann Ney:
Discriminative training with tied covariance matrices. 705-708 - Frank Diehl, Asunción Moreno:
Acoustic phonetic modeling using local codebook features. 709-712 - Gue Jun Jung, Su-Hyun Kim, Yung-Hwan Oh:
An efficient codebook design in SDCHMM for mobile communication environments. 713-716 - Makoto Shozakai, Goshu Nagino:
Analysis of speaking styles by two-dimensional visualization of aggregate of acoustic models. 717-720 - Myoung-Wan Koo, Ho-Hyun Jeon, Sang-Hong Lee:
Context dependent phoneme duration modeling with tree-based state tying. 721-724 - John Scott Bridle:
Towards better understanding of the model implied by the use of dynamic features in HMMs. 725-728
Prosody Modeling and Generation
- Jianfeng Li, Guoping Hu, Ren-Hua Wang:
Chinese prosody phrase break prediction based on maximum entropy model. 729-732 - Krothapalli Sreenivasa Rao, Bayya Yegnanarayana:
Intonation modeling for Indian languages. 733-736 - Yu Zheng, Gary Geunbae Lee, Byeongchang Kim:
Using multiple linguistic features for Mandarin phrase break prediction in maximum-entropy classification framework. 737-740 - Ian Read, Stephen Cox:
Using part-of-speech for predicting phrase breaks. 741-744 - David Escudero Mancebo, Valentín Cardeñoso-Payo:
A proposal to quantitatively select the right intonation unit in data-driven intonation modeling. 745-748 - Jinfu Ni, Hisashi Kawai, Keikichi Hirose:
Formulating contextual tonal variations in Mandarin. 749-752 - Salma Mouline, Olivier Boëffard, Paul C. Bagshaw:
Automatic adaptation of the Momel F0 stylisation algorithm to new corpora. 753-756 - Pablo Daniel Agüero, Klaus Wimmer, Antonio Bonafonte:
Joint extraction and prediction of Fujisaki's intonation model parameters. 757-760 - Panagiotis Zervas, Nikos Fakotakis, George K. Kokkinakis, Georgios Kouroupetroglou, Gerasimos Xydas:
Evaluation of corpus based tone prediction in mismatched environments for Greek TTS synthesis. 761-764 - Ziyu Xiong, Juanwen Chen:
The duration of pitch transition phase and its relative factors. 765-768 - Yu Hu, Ren-Hua Wang, Lu Sun:
Polynomial regression model for duration prediction in Mandarin. 769-772 - Michelle Tooher, John G. McKenna:
Prediction of the glottal LF parameters using regression trees. 773-776 - Volker Dellwo, Bianca Aschenberner, Petra Wagner, Jana Dancovicova, Ingmar Steiner:
Bonntempo-corpus and bonntempo-tools: a database for the study of speech rhythm and rate. 777-780 - Wentao Gu, Keikichi Hirose, Hiroya Fujisaki:
Analysis of F0 contours of Cantonese utterances based on the command-response model. 781-784 - Marion Dohen, Hélène Loevenbruck:
Pre-focal rephrasing, focal enhancement and postfocal deaccentuation in French. 785-788 - Sridhar Krishna Nemala, Partha Pratim Talukdar, Kalika Bali, A. G. Ramakrishnan:
Duration modeling for Hindi text-to-speech synthesis system. 789-792 - Nemala Sridhar Krishna, Hema A. Murthy:
A new prosodic phrasing model for Indian language Telugu. 793-796 - Oliver Jokisch, Michael Hofmann:
Evolutionary optimization of an adaptive prosody model. 797-800 - Gerasimos Xydas, Georgios Kouroupetroglou:
An intonation model for embedded devices based on natural F0 samples. 801-804 - Katerina Vesela, Nino Peterek, Eva Hajicová:
Prosodic characteristics of Czech contrastive topic. 805-808
Multi-Sensor ASR
- Martin Graciarena, Federico Cesari, Horacio Franco, Gregory K. Myers, Cregg Cowan, Victor Abrash:
Combination of standard and throat microphones for robust speech recognition in highly noisy environments. 809-812 - Cenk Demiroglu, David V. Anderson:
Noise robust digit recognition using a glottal radar sensor for voicing detection. 813-816 - Dominik Raub, John W. McDonough, Matthias Wölfel:
A cepstral domain maximum likelihood beamformer for speech recognition. 817-820 - Naoya Mochiki, Tetsunori Kobayashi, Toshiyuki Sekiya, Tetsuji Ogawa:
Recognition of three simultaneous utterance of speech by four-line directivity microphone mounted on head of robot. 821-824 - Shigeki Sagayama, Okajima Takashi, Yutaka Kamamoto, Takuya Nishimoto:
Complex spectrum circle centroid for microphone-array-based noisy speech recognition. 825-828 - Larry P. Heck, Mark Z. Mao:
Automatic speech recognition of co-channel speech: integrated speaker and speech recognition approach. 829-832
Multi-Lingual Speech Processing
- José B. Mariño, Asunción Moreno, Albino Nogueiras:
A first experience on multilingual acoustic modeling of the languages spoken in Morocco. 833-836 - Mónica Caballero, Asunción Moreno, Albino Nogueiras:
Data driven multidialectal phone set for Spanish dialects. 837-840 - Daniela Oria, Akos Vetek:
Multilingual e-mail text processing for speech synthesis. 841-844 - Harald Romsdorfer, Beat Pfister:
Multi-context rules for phonological processing in polyglot TTS synthesis. 845-848 - Leonardo Badino, Claudia Barolo, Silvia Quazza:
A general approach to TTS reading of mixed-language texts. 849-852 - Panayiotis G. Georgiou, Shrikanth S. Narayanan, Hooman Shirani Mehr:
Context dependent statistical augmentation of Persian transcripts. 853-856
Speech Enhancement
- Cenk Demiroglu, David V. Anderson:
A soft decision MMSE amplitude estimator as a noise preprocessor to speech coders using a glottal sensor. 857-860 - Rongqiang Hu, David V. Anderson:
Single acoustic-channel speech enhancement based on glottal correlation using non-acoustic sensor. 861-864 - Xianxian Zhang, John H. L. Hansen, Kathryn Hoberg Arehart, Jessica Rossi-Katz:
In-vehicle based speech processing for hearing impaired subjects. 865-868 - Sriram Srinivasan, W. Bastiaan Kleijn:
Speech enhancement using adaptive time-domain segmentation. 869-872 - Tomohiro Nakatani, Keisuke Kinoshita, Masato Miyoshi, Parham Zolfaghari:
Harmonicity based monaural speech dereverberation with time warping and F0 adaptive window. 873-876 - Marc Delcroix, Takafumi Hikichi, Masato Miyoshi:
Dereverberation of speech signals based on linear prediction. 877-880
Speech and Affect
- Nick Campbell:
Perception of affect in speech - towards an automatic processing of paralinguistic information in spoken conversation. 881-884 - Noël Chateau, Valérie Maffiolo, Christophe Blouin:
Analysis of emotional speech in voice mail messages: the influence of speakers' gender. 885-888 - Chul Min Lee, Serdar Yildirim, Murtaza Bulut, Abe Kazemzadeh, Carlos Busso, Zhigang Deng, Sungbok Lee, Shrikanth S. Narayanan:
Emotion recognition based on phoneme classes. 889-892 - Peter Robinson, Tal Sobol Shikler:
Visualizing dynamic features of expressions in speech. 893-896 - Aijun Li, Haibo Wang:
Friendly speech analysis and perception in standard Chinese. 897-900 - Ailbhe Ní Chasaide, Christer Gobl:
Decomposing linguistic and affective components of phonatory quality. 901-904 - Dan-Ning Jiang, Lian-Hong Cai:
Classifying emotion in Chinese speech by decomposing prosodic features. 1325-1328 - Chen Yu, Paul M. Aoki, Allison Woodruff:
Detecting user engagement in everyday conversations. 1329-1332 - Takashi X. Fujisawa, Norman D. Cook:
Identifying emotion in speech prosody using acoustical cues of harmony. 1333-1336 - Jianhua Tao:
Context based emotion detection from text input. 1337-1340 - Atsushi Iwai, Yoshikazu Yano, Shigeru Okuma:
Complex emotion recognition system for a specific user using SOM based on prosodic features. 1341-1344 - Hoon-Young Cho, Kaisheng Yao, Te-Won Lee:
Emotion verification for emotion detection and unknown emotion rejection. 1345-1348 - Keikichi Hirose:
Improvement in corpus-based generation of F0 contours using generation process model for emotional speech synthesis. 1349-1352
Speech Features
- Rajesh Mahanand Hegde, Hema A. Murthy, Venkata Ramana Rao Gadde:
Continuous speech recognition using joint features derived from the modified group delay function and MFCC. 905-908 - Hua Yu:
Phase-space representation of speech. 909-912 - Hema A. Murthy, Rajesh Mahanand Hegde, Venkata Ramana Rao Gadde:
The modified group delay feature: a new spectral representation of speech. 913-916 - Oh-Wook Kwon, Te-Won Lee:
ICA-based feature extraction for phoneme recognition. 917-920 - Qifeng Zhu, Barry Y. Chen, Nelson Morgan, Andreas Stolcke:
On using MLP features in LVCSR. 921-924 - Barry Y. Chen, Qifeng Zhu, Nelson Morgan:
Learning long-term temporal features in LVCSR using neural networks. 925-928 - T. V. Sreenivas, G. V. Kiran, A. G. Krishna:
Neural "spike rate spectrum" as a noise robust, speaker invariant feature for automatic speech recognition. 929-932 - Yoshihisa Nakatoh, Makoto Nishizaki, Shinichi Yoshizawa, Maki Yamada:
An adaptive MEL-LPC analysis for speech recognition. 933-936 - Kentaro Ishizuka, Noboru Miyazaki, Tomohiro Nakatani, Yasuhiro Minami:
Improvement in robustness of speech feature extraction method using sub-band based periodicity and aperiodicity decomposition. 937-940 - Carlos Toshinori Ishi:
A new acoustic measure for aspiration noise detection. 941-944 - Kris Demuynck, Oscar Garcia, Dirk Van Compernolle:
Synthesizing speech from speech recognition parameters. 945-948 - Marios Athineos, Hynek Hermansky, Daniel P. W. Ellis:
LP-TRAP: linear predictive temporal patterns. 949-952 - Xiang Li, Richard M. Stern:
Parallel feature generation based on maximizing normalized acoustic likelihood. 953-956 - Kun-Ching Wang:
An adaptive band-partitioning spectral entropy based speech detection in realistic noisy environments. 957-960 - Javier Ramírez, José C. Segura, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio:
Improved voice activity detection combining noise reduction and subband divergence measures. 961-964 - Kiyoung Park, Changkyu Choi, Jeongsu Kim:
Voice activity detection using global soft decision with mixture of Gaussian model. 965-968 - Thomas Kemp, Climent Nadeu, Yin Hay Lam, Josep Maria Sola i Caros:
Environmental robust features for speech detection. 969-972 - Kornel Laskowski, Qin Jin, Tanja Schultz:
Crosscorrelation-based multispeaker speech activity detection. 973-976 - Shang-nien Tsai:
Improved robustness of time-frequency principal components (TFPC) by synergy of methods in different domains. 977-980 - Li Deng, Dong Yu, Alex Acero:
A quantitative model for formant dynamics and contextually assimilated reduction in fluent speech. 981-984 - Gernot Kubin, Tuan Van Pham:
DWT-based classification of acoustic-phonetic classes and phonetic units. 985-988 - Yong-Choon Cho, Seungjin Choi:
Learning nonnegative features of spectro-temporal sounds for classification. 989-992
Language Modeling, Multimodal & Multilingual Speech Processing
- Sungyup Chung, Keikichi Hirose, Nobuaki Minematsu:
N-gram language modeling of Japanese using bunsetsu boundaries. 993-996 - Langzhou Chen, Lori Lamel, Jean-Luc Gauvain, Gilles Adda:
Dynamic language modeling for broadcast news. 997-1000 - Ren-Yuan Lyu, Dau-Cheng Lyu, Min-Siong Liang, Min-Hong Wang, Yuang-Chin Chiang, Chun-Nan Hsu:
A unified framework for large vocabulary speech recognition of mutually unintelligible Chinese "regionalects". 1001-1004 - Ielka van der Sluis, Emiel Krahmer:
The influence of target size and distance on the production of speech and gesture in multimodal referring expressions. 1005-1008 - Anurag Kumar Gupta, Tasos Anastasakos:
Dynamic time windows for multimodal input fusion. 1009-1012 - Raymond H. Lee, Anurag Kumar Gupta:
MICot: a tool for multimodal input data collection. 1013-1016 - Chakib Tadj, Hicham Djenidi, Madjid Haouani, Amar Ramdane-Cherif, Nicole Lévy:
Simulating multimodal applications. 1017-1020 - Jakob Schou Pedersen, Paul Dalsgaard, Børge Lindberg:
A multimodal communication aid for global aphasia patients. 1021-1024 - Hirofumi Yamamoto, Gen-ichiro Kikui, Yoshinori Sagisaka:
Mis-recognized utterance detection using hierarchical language model. 1025-1028 - Marko Moberg, Kimmo Pärssinen, Juha Iso-Sipilä:
Cross-lingual phoneme mapping for multilingual synthesis systems. 1029-1032 - Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno, Tsuyoshi Tasaki, Takeshi Yamaguchi:
Robot motion control using listener's back-channels and head gesture information. 1033-1036 - Sakriani Sakti, Arry Akhmad Arman, Satoshi Nakamura, Paulus Hutagaol:
Indonesian speech recognition for hearing and speaking impaired people. 1037-1040 - Mohsen A. Rashwan:
A two phase Arabic language model for speech recognition and other language applications. 1041-1044 - Yuya Akita, Tatsuya Kawahara:
Language model adaptation based on PLSA of topics and speakers. 1045-1048 - Hans J. G. A. Dolfing, Pierce Gerard Buckley, David Horowitz:
Unified language modeling using finite-state transducers with first applications. 1049-1052 - Katsunobu Itou, Atsushi Fujii, Tomoyosi Akiba:
Effects of language modeling on speech-driven question answering. 1053-1056 - Abhinav Sethy, Shrikanth S. Narayanan, Bhuvana Ramabhadran:
Measuring convergence in language model estimation using relative entropy. 1057-1060
Detection and Classification in ASR
- Rongqing Huang, John H. L. Hansen:
High-level feature weighted GMM network for audio stream classification. 1061-1064 - Jindrich Zdánský, Petr David, Jan Nouza:
An improved preprocessor for the automatic transcription of broadcast news audio stream. 1065-1068 - Yih-Ru Wang, Chi-Han Huang:
Speaker-and-environment change detection in broadcast news using the common component GMM-based divergence measure. 1069-1072 - Tommi Lahti:
Beginning of utterance detection algorithm for low complexity ASR engines. 1073-1076 - Somsak Sukittanon, Arun C. Surendran, John C. Platt, Christopher J. C. Burges:
Convolutional networks for speech detection. 1077-1080 - Suryakanth V. Gangashetty, Chellu Chandra Sekhar, B. Yegnanarayana:
Detection of vowel onset points in continuous speech using autoassociative neural network models. 1081-1084
Speech Analysis
- Toshiki Tamiya, Tetsuya Shimamura:
Reconstruction filter design for bone-conducted speech. 1085-1088 - Pedro J. Quintana-Morales, Juan L. Navarro-Mesa:
Frequency warped ARMA analysis of the closed and the open phase of voiced speech. 1089-1092 - Boris Doval, Baris Bozkurt, Christophe d'Alessandro, Thierry Dutoit:
Zeros of z-transform (ZZT) decomposition of speech for source-tract separation. 1093-1096 - Li Deng, Roberto Togneri:
Use of neural network mapping and extended Kalman filter to recover vocal tract resonances from the MFCC parameters of speech. 1097-1100 - Xiao Li, Jonathan Malkin, Jeff A. Bilmes:
Graphical model approach to pitch tracking. 1101-1104 - Bo Xu, Jianhua Tao, Yongguo Kang:
A new multicomponent AM-FM demodulation with predicting frequency boundaries and its application to formant estimation. 1105-1108 - Yves Laprie:
A concurrent curve strategy for formant tracking. 2405-2408 - Qin Yan, Esfandiar Zavarehei, Saeed Vaseghi, Dimitrios Rentzos:
A formant tracking LP model for speech processing. 2409-2412 - Hong You:
Application of long-term filtering to formant estimation. 2413-2416 - Baris Bozkurt, Thierry Dutoit, Boris Doval, Christophe d'Alessandro:
A method for glottal formant frequency estimation. 2417-2420 - Baris Bozkurt, Thierry Dutoit, Boris Doval, Christophe d'Alessandro:
Improved differential phase spectrum processing for formant tracking. 2421-2424 - Xu Shao, Ben P. Milner:
MAP prediction of pitch from MFCC vectors for speech reconstruction. 2425-2428 - An-Tze Yu, Hsiao-Chuan Wang:
New harmonicity measures for pitch estimation and voice activity detection. 2429-2432 - Takuya Nishimoto, Shigeki Sagayama, Hirokazu Kameoka:
Multi-pitch trajectory estimation of concurrent speech based on harmonic GMM and nonlinear Kalman filtering. 2433-2436 - Attila Ferencz, Jeongsu Kim, Yong-Beom Lee, Jae-Won Lee:
Automatic pitch marking and reconstruction of glottal closure instants from noisy and deformed electro-glotto-graph signals. 2437-2440 - Federico Flego, Luca Armani, Maurizio Omologo:
On the use of a weighted autocorrelation based fundamental frequency estimation for a multidimensional speech input. 2441-2444 - Aarthi M. Reddy, Bhiksha Raj:
A minimum mean squared error estimator for single channel speaker separation. 2445-2448 - Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu:
Audio source separation from the mixture using empirical mode decomposition with independent subspace analysis. 2449-2452 - In-Jung Oh, Hyun-Yeol Chung, Jae-Won Cho, Ho-Youl Jung, Rémy Prost:
Audio watermarking in sub-band signals using multiple echo kernels. 2453-2456 - Jie Zhang, Zhenyang Wu:
A piecewise interpolation method based on log-least square error criterion for HRTF. 2457-2460 - R. Muralishankar, A. G. Ramakrishnan, Lakshmish N. Kaushik:
Time-scaling of speech using independent subspace analysis. 2465-2468 - Laurent Girin, Mohammad Firouzmand, Sylvain Marchand:
Long term modeling of phase trajectories within the speech sinusoidal model framework. 2469-2472 - Tina Soltani, Dave Hermann, Etienne Cornu, Hamid Sheikhzadeh, Robert L. Brennan:
An acoustic shock limiting algorithm using time and frequency domain speech features. 2473-2476 - Jong Won Shin, Joon-Hyuk Chang, Nam Soo Kim:
Speech probability distribution based on generalized gamma distribution. 2477-2480 - Yanli Zheng, Mark Hasegawa-Johnson, Sarah Borys:
Stop consonant classification by dynamic formant trajectory. 2481-2484 - Yoshinori Shiga, Simon King:
Estimating detailed spectral envelopes using articulatory clustering. 2485-2488
Speech Production
- Olov Engwall:
From real-time MRI to 3D tongue movements. 1109-1112 - Mitsuhiro Nakamura:
Coarticulatory variability and directionality in [s, ..]: an EPG study. 1113-1116 - Yosuke Tanabe, Tokihiko Kaburagi:
Flow representation through the glottis having a polygonal boundary shape. 1117-1120 - Hannu Pulakka, Paavo Alku, Svante Granqvist, Stellan Hertegard, Hans Larsson, Anne-Maria Laukkanen, Per-Ake Lindestad, Erkki Vilkman:
Analysis of the voice source in different phonation types: simultaneous high-speed imaging of the vocal fold vibration and glottal inverse filtering. 1121-1124 - Peter Birkholz, Dietmar Jackèl:
Influence of temporal discretization schemes on formant frequencies and bandwidths in time domain simulations of the vocal tract system. 1125-1128 - Tomoki Toda, Alan W. Black, Keiichi Tokuda:
Acoustic-to-articulatory inversion mapping with Gaussian mixture model. 1129-1132
Audio-Visual Speech Processing
- Jinyoung Kim, Jeesun Kim, Chris Davis:
Audio-visual spoken language processing. 1133-1136 - Kaoru Sekiyama, Denis Burnham:
Issues in the development of auditory-visual speech perception: adults, infants, and children. 1137-1140 - Emiel Krahmer, Marc Swerts:
Signaling and detecting uncertainty in audiovisual speech by children and adults. 1141-1144 - Valérie Hazan, Anke Sennema, Andrew Faulkner:
Effect of intensive audiovisual perceptual training on the perception and production of the /l/-/r/ contrast for Japanese learners of English. 1145-1148 - Jean Vroomen, Sabine van Linden, Béatrice de Gelder, Paul Bertelson:
Visual recalibration of auditory speech versus selective speech adaptation: different build-up courses. 1149-1152 - Chris Davis, Jeesun Kim:
Of the top of the head: audio-visual speech perception from the nose up. 1153-1156 - J. Bruce Millar, Michael Wagner, Roland Goecke:
Aspects of speaking-face data corpus design methodology. 1157-1160 - Jean-Luc Schwartz, Marie-Agnès Cathiard:
Modeling audio-visual speech perception: back on fusion architectures and fusion control. 2017-2020 - Mikko Sams, Ville Ojanen, Jyrki Tuomainen, Vasily Klucharev:
Neurocognition of speech-specific audiovisual perception. 2021-2024 - Adriano Vilela Barbosa, Eric Vatikiotis-Bateson, Andreas Daffertshofer:
Target practice on talking faces. 2025-2028 - Matthias Odisio, Gérard Bailly:
Audiovisual perceptual evaluation of resynthesised speech movements. 2029-2032 - Sascha Fagel:
Video-realistic synthetic speech with a parametric visual speech synthesizer. 2033-2036 - Patricia Scanlon, Gerasimos Potamianos, Vit Libal, Stephen M. Chu:
Mutual information based visual feature selection for lipreading. 2037-2040 - Bowon Lee, Mark Hasegawa-Johnson, Camille Goudeseune, Suketu Kamdar, Sarah Borys, Ming Liu, Thomas S. Huang:
AVICAR: audio-visual speech corpus in a car environment. 2489-2492 - Engin Erzin, Yucel Yemez, A. Murat Tekalp:
Adaptive classifier cascade for multimodal speaker identification. 2493-2496 - Midori Iba, Anke Sennema, Valérie Hazan, Andrew Faulkner:
Use of visual cues in the perception of a labial/labiodental contrast by Spanish-L1 and Japanese-L1 learners of English. 2497-2500 - Xianxian Zhang, Kazuya Takeda, John H. L. Hansen, Toshiki Maeno:
Audio-visual speaker localization for car navigation systems. 2501-2504 - Josef Chaloupka:
Automatic lips reading for audio-visual speech processing and recognition. 2505-2508 - Michael Wagner, Girija Chetty:
"liveness" verification in audio-video authentication. 2509-2512 - Maria José Sanchez Martinez, Juan Pablo de la Cruz Gutiérrez:
Speech recognition using motion based lipreading. 2513-2516 - Frédéric Berthommier:
Comparative study of linear and non-linear models for viseme inversion: modeling of a cortical associative function. 2517-2520 - Petr Císar, Zdenek Krnoul, Milos Zelezný:
3D lip-tracking for audio-visual speech recognition in real applications. 2521-2524 - J. Bruce Millar, Roland Goecke:
The audio-video Australian English speech data corpus AVOZES. 2525-2528 - Ki-Hyung Hong, Yong-Ju Lee, Jae-Young Suh, Kyong-Nim Lee:
Correcting Korean vowel speech recognition errors with limited lip features. 2529-2532 - Kuniko Y. Nielsen:
Segmental differences in the visual contribution to speech intelligibility. 2533-2536
Spoken Language Generation and Synthesis III
- Hui Ye, Steve J. Young:
Voice conversion for unknown speakers. 1161-1164 - Volker Fischer, Jaime Botella Ordinas, Siegfried Kunzmann:
Domain adaptation methods in the IBM trainable text-to-speech system. 1165-1168 - Yi Zhou, Yiqing Zu, Zhenli Yu, Dongjian Yue, Guilin Chen:
Applying pitch connection control in Mandarin speech synthesis. 1169-1172 - Hermann Ney, David Sündermann, Antonio Bonafonte, Harald Höge:
A first step towards text-independent voice conversion. 1173-1176 - Zhenli Yu, Kaizhi Wang, Yiqing Zu, Dongjian Yue, Guilin Chen:
Data pruning approach to unit selection for inventory generation of concatenative embeddable Chinese TTS systems. 1177-1180 - Jithendra Vepa, Simon King:
Subjective evaluation of join cost functions used in unit selection speech synthesis. 1181-1184 - Heiga Zen, Tadashi Kitamura, Murtaza Bulut, Shrikanth S. Narayanan, Ryosuke Tsuzuki, Keiichi Tokuda:
Constructing emotional speech synthesizers with limited speech database. 1185-1188 - Cheng-Yuan Lin, Jyh-Shing Roger Jang:
A two-phase pitch marking method for TD-PSOLA synthesis. 1189-1192 - Antonio Bonafonte, Alexander Kain, Jan P. H. van Santen, Helenca Duxans:
Including dynamic and phonetic information in voice conversion systems. 1193-1196 - Zixiang Wang, Ren-Hua Wang, Zhiwei Shuang, Zhen-Hua Ling:
A novel voice conversion system based on codebook mapping with phoneme-tied weighting. 1197-1200 - Zhen-Hua Ling, Yu Hu, Zhiwei Shuang, Ren-Hua Wang:
Compression of speech database by feature separation and pattern clustering using STRAIGHT. 1201-1204 - Shunsuke Kataoka, Nobuaki Mizutani, Keiichi Tokuda, Tadashi Kitamura:
Decision-tree backing-off in HMM-based speech synthesis. 1205-1208 - Nobuyuki Nishizawa, Hisashi Kawai:
Using a depth-restricted search to reduce delays in unit selection. 1209-1212 - Junichi Yamagishi, Takashi Masuko, Takao Kobayashi:
MLLR adaptation for hidden semi-Markov model based speech synthesis. 1213-1216 - Stefan Breuer, Julia Abresch:
Phoxsy: multi-phone segments for unit selection speech synthesis. 1217-1220 - Francesc Alías, Xavier Llorà, Ignasi Iriondo Sanz, Joan Claudi Socoró, Xavier Sevillano, Lluís Formiga:
Perception-guided and phonetic clustering weight tuning based on diphone pairs for unit selection TTS. 1221-1224 - Taoufik En-Najjary, Olivier Rosec, Thierry Chonavel:
A voice conversion method based on joint pitch and spectral envelope transformation. 1225-1228 - Taoufik En-Najjary, Olivier Rosec, Thierry Chonavel:
Fast GMM-based voice conversion for text-to-speech synthesis systems. 1229-1232 - Rohit Kumar:
A genetic algorithm for unit selection based speech synthesis. 1233-1236 - Jun Huang, Lex Olorenshaw, Gustavo Hernández Ábrego, Lei Duan:
A memory efficient grapheme-to-phoneme conversion system for speech processing. 1237-1240 - Rohit Kumar, S. Prahallad Kishore:
Automatic pruning of unit selection speech databases for synthesis without loss of naturalness. 1377-1380 - Tanya Lambert, Andrew P. Breen:
A database design for a TTS synthesis system using lexical diphones. 1381-1384 - John Kominek, Alan W. Black:
A family-of-models approach to HMM-based segmentation for unit selection speech synthesis. 1385-1388 - Wei Zhang, Ling Jin, Xijun Ma:
Mutual-information based segment pre-selection in concatenative text-to-speech. 1389-1392 - Heiga Zen, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura:
Hidden semi-Markov model based speech synthesis. 1393-1396 - Hartmut R. Pfitzinger:
DFW-based spectral smoothing for concatenative speech synthesis. 1397-1400 - Kyung-Joong Min, Un-Cheon Lim:
Korean prosody generation and artificial neural networks. 1869-1872 - Kyuchul Yoon:
A prosodic phrasing model for a Korean text-to-speech synthesis system. 1873-1876 - Qin Shi, Volker Fischer:
A comparison of statistical methods and features for the prediction of prosodic structures. 1877-1880 - Gui-Lin Chen, Ke-Song Han:
Letter-to-sound for small-footprint multilingual TTS engine. 1881-1884 - Jun Xu, Guohong Fu, Haizhou Li:
Grapheme-to-phoneme conversion for Chinese text-to-speech. 1885-1888 - Marc Schröder, Stefan Breuer:
XML representation languages as a way of interconnecting TTS modules. 1889-1892 - Wenjie Cao, Chengqing Zong, Bo Xu:
Approach to interchange-format based Chinese generation. 1893-1896 - Enrico Zovato, Stefano Sandri, Silvia Quazza, Leonardo Badino:
Prosodic analysis of a multi-style corpus in the perspective of emotional speech synthesis. 1897-1900 - Kyung-Joong Min, Chan-Goo Kang, Un-Cheon Lim:
Number of output nodes of artificial neural networks for Korean prosody generation. 1901-1904 - Sunhee Kim, Ju-Eun Ahn, Soon-Hyob Kim, Yang-Hee Lee:
A Korean grapheme-to-phoneme conversion system using selection procedure for exceptions. 1905-1908 - Thanate Khaorapapong, Montri Karnjanadecha, Keerati Inthavisas:
Synthesis of vowels and tones in Thai language by articulatory modeling. 1909-1912 - Yoshinori Shiga, Simon King:
Source-filter separation for articulation-to-speech synthesis. 1913-1916 - Hisako Asano, Hideharu Nakajima, Hideyuki Mizuno, Masahiro Oku:
Long vowel detection for letter-to-sound conversion for Japanese sourced words transliterated into the alphabet. 1917-1920 - Frantz Clermont, Thomas John Millhouse:
Inexactness and robustness in cepstral-to-formant transformation of spoken and sung vowels. 1921-1924 - Takeshi Saitou, Naoya Tsuji, Masashi Unoki, Masato Akagi:
Analysis of acoustic features affecting "singing-ness" and its application to singing-voice synthesis from speaking-voice. 1925-1928 - Vincent Pollet, Geert Coorman:
Statistical corpus-based speech segmentation. 1929-1932 - Jindrich Matousek, Jan Romportl, Daniel Tihelka, Zbynek Tychtl:
Recent improvements on ARTIC: Czech text-to-speech system. 1933-1936 - Youngim Jung, Donghun Lee, HyeonSook Nam, Ae-sun Yoon, Hyuk-Chul Kwon:
Learning for transliteration of arabic-numeral expressions using decision tree for Korean TTS. 1937-1940 - Nicole Beringer:
How to integrate phonetic and linguistic knowledge in a text-to-phoneme conversion task: a syllabic TPC tool for French. 1941-1944 - Wael Hamza, Ellen Eide, Raimo Bakis:
Reconciling pronunciation differences between the front-end and the back-end in the IBM speech synthesis system. 2561-2564 - Juhong Ha, Yu Zheng, Gary Geunbae Lee, Yoon-Suk Seong, Byeongchang Kim:
High quality text-to-pinyin conversion using two-phase unknown word prediction. 2565-2568 - Yeon-Jun Kim, Ann K. Syrdal, Alistair Conkie:
Pronunciation lexicon adaptation for TTS voice building. 2569-2572 - Gabriel Webster:
Improving letter-to-pronunciation accuracy with automatic morphologically-based stress prediction. 2573-2576 - Wael Hamza, Ellen Eide, Raimo Bakis, Michael Picheny, John F. Pitrelli:
The IBM expressive speech synthesis system. 2577-2580 - Markus Schnell, Rüdiger Hoffmann:
What concept-to-speech can gain for prosody. 2581-2584
Speech Recognition - Language Model
- Tatsuya Kawahara, Kiyotaka Uchimoto, Hitoshi Isahara, Kazuya Shitaoka:
Dependency structure analysis and sentence boundary detection in spontaneous Japanese. 1353-1356 - Salma Jamoussi, David Langlois, Jean Paul Haton, Kamel Smaïli:
Statistical feature language model. 1357-1360 - Brigitte Bigi, Yan Huang, Renato de Mori:
Vocabulary and language model adaptation using information retrieval. 1361-1364 - Shinsuke Mori, Daisuke Takuma:
Word n-gram probability estimation from a Japanese raw corpus. 1365-1368 - Jen-Tzung Chien, Hung-Ying Chen:
Mining of association patterns for language modeling. 1369-1372 - Jen-Tzung Chien, Meng-Sung Wu, Hua-Jui Peng:
On latent semantic language modeling and smoothing. 1373-1376 - Vaibhava Goel:
Conditional maximum likelihood estimation for improving annotation performance of n-gram models incorporating stochastic finite state grammars. 2237-2240 - Edward James Schofield:
Fast parameter estimation for joint maximum entropy language models. 2241-2244 - Dimitra Vergyri, Katrin Kirchhoff, Kevin Duh, Andreas Stolcke:
Morphology-based language modeling for arabic speech recognition. 2245-2248 - A. Nayeemulla Khan, B. Yegnanarayana:
Speech enhanced multi-span language model. 2249-2252 - Holger Schwenk, Jean-Luc Gauvain:
Neural network language models for conversational speech recognition. 2253-2256 - David Mrva, Philip C. Woodland:
A PLSA-based language model for conversational telephone speech. 2257-2260
Speaker Recognition
- Jérôme Louradour, Régine André-Obrecht, Khalid Daoudi:
Segmentation and relevance measure for speaker verification. 1401-1404 - Mohamed Chetouani, Bruno Gas, Jean-Luc Zarader, Marcos Faúndez-Zanuy:
A new nonlinear feature extraction algorithm for speaker verification. 1405-1408 - Elizabeth Shriberg, Luciana Ferrer, Anand Venkataraman, Sachin S. Kajarekar:
SVM modeling of "SNERF-grams" for speaker recognition. 1409-1412 - Purdy Ho, Pedro J. Moreno:
SVM kernel adaptation in speaker classification and verification. 1413-1416 - Koji Iwano, Taichi Asami, Sadaoki Furui:
Noise-robust speaker verification using F0 features. 1417-1420 - Zi-He Chen, Yuan-Fu Liao, Yau-Tarng Juang:
Eigen-prosody analysis for robust speaker recognition under mismatch handset environment. 1421-1424 - Aaron D. Lawson, Mark C. Huggins:
Triphone-based confidence system for speaker identification. 1745-1748 - Kenichi Yoshida, Kazuyuki Takagi, Kazuhiko Ozeki:
Improved model training and automatic weight adjustment for multi-SNR multi-band speaker identification system. 1749-1752 - Man-Wai Mak, Kwok-Kwong Yiu, Ming-Cheung Cheung, Sun-Yuan Kung:
A new approach to channel robust speaker verification via constrained stochastic feature transformation. 1753-1756 - Chakib Tadj, Christian S. Gargour, Nabil Badri:
Best speaker-based structure tree for speaker verification. 1757-1760 - David Chow, Waleed H. Abdulla:
Robust speaker identification based on perceptual log area ratio and Gaussian mixture models. 1761-1764 - Stanley J. Wenndt, Richard M. Floyd:
Channel frequency response correction for speaker recognition. 1765-1768 - Yh-Her Yang, Yuan-Fu Liao:
Unseen handset mismatch compensation based on a priori knowledge interpolation for robust speaker recognition. 1769-1772 - Michael T. Padilla, Thomas F. Quatieri:
A comparison of soft and hard spectral subtraction for speaker verification. 1773-1776 - Vlasta Radová, Ales Padrta:
Comparison of several speaker verification procedures based on GMM. 1777-1780 - Yong Guan, Wenju Liu, Hongwei Qi, Jue Wang:
Improving performance of text-independent speaker identification by utilizing contextual principal curves filtering. 1781-1784 - Jen-Tzung Chien, Chuan-Wei Ting:
Speaker identification using probabilistic PCA model selection. 1785-1788 - Hagai Aronowitz, David Burshtein, Amihood Amir:
Text independent speaker recognition using speaker dependent word spotting. 1789-1792 - Hsiao-Chuan Wang, Jyh-Min Cheng:
A study on model-based equal error rate estimation for automatic speaker verification. 1793-1796 - Tomoko Matsui, Kunio Tanabe:
Probabilistic speaker identification with dual penalized logistic regression machine. 1797-1800 - Javier R. Saeta, Javier Hernando:
Model quality evaluation during enrolment for speaker verification. 1801-1804 - Pasi Fränti, Evgeny Karpov, Tomi Kinnunen:
Real-time speaker identification. 1805-1808 - Mohamed Fathy Abu-ElYazeed, Nemat S. Abdel Kader, Mohammed El-Henawy:
Multi-codebook vector quantization algorithm for speaker identification. 1809-1812 - Ming-Cheung Cheung, Kwok-Kwong Yiu, Man-Wai Mak, Sun-Yuan Kung:
Multi-sample fusion with constrained feature transformation for robust speaker verification. 1813-1816 - Michael Betser, Frédéric Bimbot, Mathieu Ben, Guillaume Gravier:
Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs. 2329-2332 - Nengheng Zheng, P. C. Ching, Tan Lee:
Time-frequency analysis of vocal source signal for speaker recognition. 2333-2336 - Rashmi Gangadharaiah, Balakrishnan Narayanaswamy, Narayanaswamy Balakrishnan:
A novel method for two-speaker segmentation. 2337-2340 - Bayya Yegnanarayana, A. Shahina, M. R. Kesheorey:
Throat microphone signal for speaker recognition. 2341-2344 - Mohamed Faouzi BenZeghiba, Hervé Bourlard:
Posteriori probabilities and likelihoods combination for speech and speaker recognition. 2345-2348 - Mohamed Mihoubi, Douglas D. O'Shaughnessy, Pierre Dumouchel:
The use of typical sequences for robust speaker identification. 2349-2352 - KyungHwa Kim:
A forensic phonetic investigation into the duration and speech rate. 2353-2356 - T. V. Sreenivas, Sameer Badaskar:
Mixture Gaussian model training against impostor model parameters: an application to speaker identification. 2357-2360 - Jan Anguita, Javier Hernando, Alberto Abad:
Jacobian adaptation with improved noise reference for speaker verification. 2361-2364 - Mihalis Siafarikas, Todor Ganchev, Nikos Fakotakis:
Objective wavelet packet features for speaker verification. 2365-2368 - Upendra V. Chaudhari, Ganesh N. Ramaswamy:
Policy analysis framework for conversational biometrics. 2369-2372 - Woo-Yong Choi, Jung Gon Kim, Hyung Soon Kim, Sung Bum Pan:
A new score normalization method for speaker verification with virtual impostor model. 2373-2376 - Samuel Kim, Thomas Eriksson, Hong-Goo Kang:
On the time variability of vocal tract for speaker recognition. 2377-2380 - Veena Desai, Hema A. Murthy:
Distributed speaker recognition. 2381-2384 - Pongtep Angkititrakul, Sepideh Baghaii, John H. L. Hansen:
Cluster-dependent modeling and confidence measure processing for in-set/out-of-set speaker identification. 2385-2388 - Yoshiyuki Umeda, Shingo Kuroiwa, Satoru Tsuge, Fuji Ren:
Distributed speaker recognition using earth mover's distance. 2389-2392 - Michael Barlow, Mehrdad Khodai-Joopari, Frantz Clermont:
A forensically-motivated tool for selecting cepstrally-consistent steady-states from non-contemporaneous vowel utterances. 2393-2396 - Anil Alexander, Andrzej Drygajlo:
Scoring and direct methods for the interpretation of evidence in forensic speaker recognition. 2397-2400 - Tomi Kinnunen, Evgeny Karpov, Pasi Fränti:
Efficient online cohort selection method for speaker verification. 2401-2404 - A. Nayeemulla Khan, Bayya Yegnanarayana:
Latent semantic analysis for speaker recognition. 2589-2592 - Yang Shao, DeLiang Wang:
Model-based sequential organization for cochannel speaker identification. 2593-2596 - Ka-Yee Leung, Man-Wai Mak, Sun-Yuan Kung:
Articulatory feature-based conditional pronunciation modeling for speaker verification. 2597-2600 - Alex Park, Timothy J. Hazen:
A comparison of normalization and training approaches for ASR-dependent speaker identification. 2601-2604 - Dat Tran:
New background modeling for speaker verification. 2605-2608
Processing of Prosody by Humans and Machines
- Gérard Bailly, Bleicke Holm, Véronique Aubergé:
A trainable prosodic model: learning the contours implementing communicative functions within a superpositional model of intonation. 1425-1428 - Dung Tien Nguyen, Chi Mai Luong, Bang Kim Vu, Hansjörg Mixdorff, Huy Hoang Ngo:
Fujisaki model based F0 contours in Vietnamese TTS. 1429-1432 - Kazuyuki Ashimura, Hideki Kashioka, Nick Campbell:
Estimating speaking rate in spontaneous speech from z-scores of pattern durations. 1433-1436 - Takashi Masuko, Takao Kobayashi, Keisuke Miyanaga:
A style control technique for HMM-based speech synthesis. 1437-1440 - Mark Hasegawa-Johnson, Stephen E. Levinson, Tong Zhang:
Children's emotion recognition in an intelligent tutoring scenario. 1441-1444 - Keikichi Hirose, Nobuaki Minematsu:
Use of prosodic features for speech recognition. 1445-1448
Contemporary Issues in ASR
- Jochen Peters, Christina Drexel:
Transformation-based error correction for speech-to-text systems. 1449-1452 - Alexander Gutkin, Simon King:
Phone classification in pseudo-Euclidean vector spaces. 1453-1456 - Grace Chung, Chao Wang, Stephanie Seneff, Edward Filisko, Min Tang:
Combining linguistic knowledge and acoustic information in automatic pronunciation lexicon generation. 1457-1460 - Ken Chen, Mark Hasegawa-Johnson:
Modeling pronunciation variation using artificial neural networks for English spontaneous speech. 1461-1464 - Stefanie Aalburg, Harald Höge:
Foreign-accented speaker-independent speech recognition. 1465-1468 - Panikos Heracleous, Yoshitaka Nakajima, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Non-audible murmur (NAM) speech recognition using a stethoscopic NAM microphone. 1469-1472 - Martin J. Russell, Shona D'Arcy, Lit Ping Wong:
Recognition of read and spontaneous children's speech using two new corpora. 1473-1476 - Joe Frankel, Mirjam Wester, Simon King:
Articulatory feature recognition using dynamic Bayesian networks. 1477-1480 - Gies Bouwman, Bert Cranen, Lou Boves:
Predicting word correct rate from acoustic and linguistic confusability. 1481-1484 - Kazushi Ishihara, Yuya Hattori, Tomohiro Nakatani, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Disambiguation in determining phonemes of sound-imitation words for environmental sound recognition. 1485-1488 - Jan Anguita, Stéphane Peillon, Javier Hernando, Alexandre Bramoulle:
Word confusability prediction in automatic speech recognition. 1489-1492 - Szu-Chen Stan Jou, Tanja Schultz, Alex Waibel:
Adaptation for soft whisper recognition using a throat microphone. 1493-1496 - Rainer Gruhn, Konstantin Markov, Satoshi Nakamura:
A statistical lexicon for non-native speech recognition. 1497-1500 - Mathew Magimai-Doss, Shajith Ikbal, Todd A. Stephenson, Hervé Bourlard:
Modeling auxiliary features in tandem systems. 1501-1504 - Louis ten Bosch, Lou Boves:
Survey of spontaneous speech phenomena in a multimodal dialogue system and some implications for ASR. 1505-1508 - Tobias Cincarek, Rainer Gruhn, Satoshi Nakamura:
Speech recognition for multiple non-native accent groups with speaker-group-dependent acoustic models. 1509-1512 - Frederik Stouten, Jean-Pierre Martens:
Coping with disfluencies in spontaneous speech recognition. 1513-1516 - Soonil Kwon, Shrikanth S. Narayanan:
Speaker model quantization for unsupervised speaker indexing. 1517-1520 - Matteo Gerosa, Diego Giuliani:
Investigating automatic recognition of non-native children's speech. 1521-1524 - Yang Liu, Elizabeth Shriberg, Andreas Stolcke, Mary P. Harper:
Using machine learning to cope with imbalanced classes in natural speech: evidence from sentence boundary and disfluency detection. 1525-1528 - Minho Jin, Gyucheol Jang, Sungrack Yun, Chang Dong Yoo:
Hybrid utterance verification based on n-best models and model derived from Kullback-Leibler divergence. 1529-1532 - Masataka Goto, Koji Kitayama, Katsunobu Itou, Tetsunori Kobayashi:
Speech spotter: on-demand speech recognition in human-human conversation on the telephone or in face-to-face situations. 1533-1536 - Kyong-Nim Lee, Minhwa Chung:
Pronunciation lexicon modeling and design for Korean large vocabulary continuous speech recognition. 1537-1540 - Sebastian Möller, Jan Felix Krebber, Alexander Raake:
Performance of speech recognition and synthesis in packet-based networks. 1541-1544 - Alastair Bruce James, Ben P. Milner, Angel Manuel Gomez:
A comparison of packet loss compensation methods and interleaving for speech recognition in burst-like packet loss. 1545-1548 - Ben P. Milner, Alastair Bruce James:
An analysis of packet loss models for distributed speech recognition. 1549-1552
Second Language Learning and Spoken Language Processing
- Nobuaki Minematsu:
Pronunciation assessment based upon the phonological distortions observed in language learners' utterances. 1669-1672 - Yasuo Suzuki, Yoshinori Sagisaka, Katsuhiko Shirai, Makiko Muto:
Analysis of the phone level contributions to objective evaluation of English speech by non-natives. 1673-1676 - Chao Wang, Mitchell Peabody, Stephanie Seneff, Jong-mi Kim:
An interactive English pronunciation dictionary for Korean learners. 1677-1680 - Seok-Chae Rhee, Jeon G. Park:
Development of the knowledge-based spoken English evaluation system and its application. 1681-1684 - Jared Bernstein, Isabella Barbier, Elizabeth Rosenfeld, John H. A. L. de Jong:
Theory and data in spoken language assessment. 1685-1688 - Tatsuya Kawahara, Masatake Dantsuji, Yasushi Tsubota:
Practical use of English pronunciation system for Japanese students in the CALL classroom. 1689-1692 - Jonas Beskow, Olov Engwall, Björn Granström, Preben Wik:
Design strategies for a virtual language tutor. 1693-1696
Emerging Research: Human Factors in Speech and Communication Systems
- Ellen Campana, Michael K. Tanenhaus, James F. Allen, Roger W. Remington:
Evaluating cognitive load in spoken language interfaces using a dual-task paradigm. 1721-1724 - Lesley-Ann Black, Norman D. Black, Roy Harper, Michelle Lemon, Michael F. McTear:
The voice-logbook: integrating human factors for a chronic care system. 1725-1728 - Kristiina Jokinen:
Communicative competence and adaptation in a spoken dialogue system. 1729-1732 - Zhan Fu, Lay Ling Pow, Fang Chen:
Evaluation of the difference between the driving behavior of a speech based and a speech-visual based task of an in-car computer. 1733-1736 - Sebastian Möller, Jan Felix Krebber, Paula M. T. Smeele:
Evaluating system metaphors via the speech output of a smart home system. 1737-1740 - Florian Hammer, Peter Reichl, Alexander Raake:
Elements of interactivity in telephone conversations. 1741-1744
Interdisciplinary Topics in Spoken Language Processing
- Rubén San Segundo, Juan Manuel Montero, Javier Macías Guarasa, Ricardo de Córdoba, Javier Ferreiros, José Manuel Pardo:
Generating gestures from speech. 1817-1820 - Noboru Kanedera, Asuka Sumida, Takao Ikehata, Tetsuo Funada:
Subtopic segmentation in the lecture speech. 1821-1824 - Donna Erickson, Caroline Menezes, Akinori Fujino:
Some articulatory measurements of real sadness. 1825-1828 - Chen-Long Lee, Wen-Whei Chang, Yuan-Chuan Chiang:
Application of voice conversion to hearing-impaired Mandarin speech enhancement. 1829-1832 - Oh Pyo Kweon, Akinori Ito, Motoyuki Suzuki, Shozo Makino:
A Japanese dialogue-based CALL system with mispronunciation and grammar error detection. 1833-1836 - Cheolwoo Jo, Ilsuh Bak:
Statistics-based direction finding for training vowels. 1837-1840 - Simona Montanari, Serdar Yildirim, Elaine Andersen, Shrikanth S. Narayanan:
Reference marking in children's computer-directed speech: an integrated analysis of discourse and gestures. 1841-1844 - Jong-mi Kim, Suzanne Flynn:
What makes a non-native accent?: a study of Korean English. 1845-1848 - Sang-Jin Kim, Kwang-Ki Kim, Minsoo Hahn:
Study on emotional speech features in Korean with its application to voice color conversion. 1849-1852 - Shigeaki Amano, Tomohiro Nakatani, Tadahisa Kondo:
Developmental changes in voiced-segment ratio for Japanese infants and parents. 1853-1856 - Kisun You, Hoyoun Kim, Wonyong Sung:
Implementation of an intonational quality assessment system for a handheld device. 1857-1860 - Denis Beautemps, Thomas Burger, Laurent Girin:
Characterizing and classifying cued speech vowels from labial parameters. 1861-1864 - Shinya Takahashi, Tsuyoshi Morimoto, Sakashi Maeda, Naoyuki Tsuruta:
Cough detection in spoken dialogue system for home health care. 1865-1868
Towards Adaptive Machines: Active and Unsupervised Learning
- Dong Yu, Mei-Yuh Hwang, Peter Mau, Alex Acero, Li Deng:
Unsupervised learning from users' error correction in speech dictation. 1969-1972 - Gerard G. L. Meyer, Teresa M. Kamm:
Robustness aspects of active learning for acoustic modeling. 1973-1976 - Karthik Visweswariah, Ramesh A. Gopinath, Vaibhava Goel:
Task adaptation of acoustic and language models based on large quantities of data. 1977-1980 - Luc Lussier, Edward W. D. Whittaker, Sadaoki Furui:
Unsupervised language model adaptation methods for spontaneous speech. 1981-1984 - Masafumi Nishida, Yoshitaka Mamiya, Yasuo Horiuchi, Akira Ichikawa:
On-line incremental adaptation based on reinforcement learning for robust speech recognition. 1985-1988 - Tomohiro Watanabe, Hiromitsu Nishizaki, Takehito Utsuro, Seiichi Nakagawa:
Unsupervised speaker adaptation using high confidence portion recognition results by multiple recognition systems. 1989-1992
Speech Coding
- Sorin Dusan, James L. Flanagan, Amod Karve, Mridul Balaraman:
Speech coding using trajectory compression and multiple sensors. 1993-1996 - Christian Feldbauer, Gernot Kubin:
How sparse can we make the auditory representation of speech? 1997-2000 - David Malah, Slava Shechtman:
Efficient sub-optimal temporal decomposition with dynamic weighting of speech signals for coding applications. 2001-2004 - Teddy Surya Gunawan, Eliathamby Ambikairajah, Julien Epps:
Perceptual wavelet packet audio coder. 2005-2008 - Sung-Kyo Jung, Hong-Goo Kang, Dae Hee Youn, Chang-Heon Lee:
Performance analysis of transcoding algorithms in packet-loss environments. 2009-2012 - Tiago H. Falk, Wai-Yip Chan, Peter Kabal:
Speech quality estimation using Gaussian mixture models. 2013-2016
Robust ASR
- Hong Kook Kim, Mazin G. Rahim:
Why speech recognizers make errors? A robustness view. 1645-1648 - Seyed Mohammad Ahadi, Hamid Sheikhzadeh, Robert L. Brennan, George H. Freeman:
An energy normalization scheme for improved robustness in speech recognition. 1649-1652 - Juan M. Huerta, Etienne Marcheret, Sreeram Balakrishnan:
Rapid on-line environment compensation for server-based speech recognition in noisy mobile environments. 1653-1656 - Leila Ansary, Seyyed Ali Seyyed Salehi:
Modeling phones coarticulation effects in a neural network based speech recognition system. 1657-1660 - Daniel Willett:
Error-weighted discriminative training for HMM parameter estimation. 1661-1664 - Wai Kit Lo, Frank K. Soong, Satoshi Nakamura:
Robust verification of recognized words in noise. 1665-1668 - Zili Li, Hesham Tolba, Douglas D. O'Shaughnessy:
Robust automatic speech recognition using an optimal spectral amplitude estimator algorithm in low-SNR car environments. 2041-2044 - Junhui Zhao, Jingming Kuang, Xiang Xie:
Robust speech recognition using data-driven temporal filters based on independent component analysis. 2045-2048 - Norihide Kitaoka, Longbiao Wang, Seiichi Nakagawa:
Robust distant speech recognition based on position dependent CMN. 2049-2052 - Sumitaka Sakauchi, Yoshikazu Yamaguchi, Satoshi Takahashi, Satoshi Kobashikawa:
Robust speech recognition based on HMM composition and modified Wiener filter. 2053-2056 - Ivan Brito, Néstor Becerra Yoma, Carlos Molina:
Feature-dependent compensation in speech recognition. 2057-2060 - Stephen Cox:
Using context to correct phone recognition errors. 2061-2064 - Yasunari Obuchi:
Improved histogram-based feature compensation for robust speech recognition and unsupervised speaker adaptation. 2065-2068 - Zhenyu Xiong, Thomas Fang Zheng, Wenhu Wu:
Weighting observation vectors for robust speech recognition in noisy environments. 2069-2072 - Masanori Tsujikawa, Ken-ichi Iso:
Hands-free speech recognition using blind source separation post-processed by two-stage spectral subtraction. 2073-2076 - Randy Gomez, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Robust speech recognition with spectral subtraction in low SNR. 2077-2080 - Bert Cranen, Johan de Veth:
Active perception: using a priori knowledge from clean speech models to ignore non-target features. 2081-2084 - Haitian Xu, Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg:
Spectral subtraction with full-wave rectification and likelihood controlled instantaneous noise estimation for robust speech recognition. 2085-2088 - Filip Korkmazsky, Dominique Fohr, Irina Illina:
Using linear interpolation to improve histogram equalization for speech recognition. 2089-2092 - Mark Hasegawa-Johnson, Ameya N. Deoras:
A factorial HMM approach to robust isolated digit recognition in background music. 2093-2096 - Yoonjae Lee, Hanseok Ko:
Multi-eigenspace normalization for robust speech recognition in noisy environments. 2097-2100 - Christophe Cerisara, Dominique Fohr, Odile Mella, Irina Illina:
Exploiting models intrinsic robustness for noisy speech recognition. 2101-2104 - Pere Pujol, Jaume Padrell, Climent Nadeu, Dusan Macho:
Speech recognition experiments with the SPEECON database using several robust front-ends. 2105-2108 - Shajith Ikbal, Mathew Magimai-Doss, Hemant Misra, Hervé Bourlard:
Spectro-temporal activity pattern (STAP) features for noise robust ASR. 2109-2112 - Byoung-Don Kim, Jin Young Kim, Seung Ho Choi, Young-Bum Lee, Kyoung-Rok Lee:
Improvement of confidence measure performance using background model set algorithm. 2113-2116 - Guillermo Aradilla, John Dines, Sunil Sivadas:
Using RASTA in task independent TANDEM feature extraction. 2117-2120 - Kyu Jeong Han, Shrikanth S. Narayanan, Naveen Srinivasamurthy:
A distributed speech recognition system in multi-user environments. 2121-2124 - Reinhold Haeb-Umbach, Valentin Ion:
Soft features for improved distributed speech recognition over wireless networks. 2125-2128
Emerging Research
- Rinzou Ebukuro:
Analysis on disappearing and thriving of speech applications for ergonomic design guidelines and recommendations. 2217-2220 - Paula M. T. Smeele, Sebastian Möller, Jan Felix Krebber:
Evaluation of the speech output of a smart-home system in a car environment. 2221-2224 - Ellen C. Haas:
How does the integration of speech recognition controls and spatialized auditory displays affect user workload? 2225-2228 - Fang Chen:
Speech interaction system - how to increase its usability? 2229-2232 - Nicole Beringer:
Human language acquisition methods in a machine learning task. 2233-2236
Spoken Language Resources and Technology Evaluation I
- Laila Dybkjær, Niels Ole Bernsen, Wolfgang Minker:
New challenges in usability evaluation - beyond task-oriented spoken dialogue systems. 2261-2264 - Owen Kimball, Chia-Lin Kao, Rukmini Iyer, Teodoro Arvizo, John Makhoul:
Using quick transcriptions to improve conversational speech models. 2265-2268 - Rohit Mishra, Elizabeth Shriberg, Sandra Upson, Joyce Chen, Fuliang Weng, Stanley Peters, Lawrence Cavedon, John Niekrasz, Hua Cheng, Harry Bratt:
A wizard of oz framework for collecting spoken human-computer dialogs. 2269-2272 - Mikko Hartikainen, Esa-Pekka Salonen, Markku Turunen:
Subjective evaluation of spoken dialogue systems using SERVQUAL method. 2273-2276 - Ioana Vasilescu, Laurence Devillers, Chloé Clavel, Thibaut Ehrette:
Fiction database for emotion detection in abnormal situations. 2277-2280 - Ruhi Sarikaya, Yuqing Gao, Paola Virga:
Fast semi-automatic semantic annotation for spoken dialog systems. 2281-2284 - Yi-Jian Wu, Hisashi Kawai, Jinfu Ni, Ren-Hua Wang:
A study on automatic detection of Japanese vowel devoicing for speech synthesis. 2721-2724 - Tolga Çiloglu, Dinc Acar, Ahmet Tokatli:
Orientel-Turkish: telephone speech database description and notes on the experience. 2725-2728 - Taejin Yoon, Sandra Chavarria, Jennifer Cole, Mark Hasegawa-Johnson:
Intertranscriber reliability of prosodic labeling on telephone conversation using ToBI. 2729-2732 - Jilei Tian:
Efficient compression method for pronunciation dictionaries. 2733-2736 - Min-Siong Liang, Dau-Cheng Lyu, Yuang-Chin Chiang, Ren-Yuan Lyu:
Construct a multi-lingual speech corpus in Taiwan with extracting phonetically balanced articles. 2737-2740 - Per Olav Heggtveit, Jon Emil Natvig:
Automatic prosody labeling of read Norwegian. 2741-2744 - Eric Sanders, Andrea Diersen, Willy Jongenburger, Helmer Strik:
Towards automatic word segmentation of dialect speech. 2745-2748 - Petr Fousek, Frantisek Grézl, Hynek Hermansky, Petr Svojanovsky:
New nonsense syllables database - analyses and preliminary ASR experiments. 2749-2752 - Jan Felix Krebber, Sebastian Möller, Alexander Raake:
Speech input and output module assessment for remote access to a smart-home spoken dialog system. 2753-2756 - Dong-Hyun Kim, Yong-Wan Roh, Kwang-Seok Hong:
An implement of speech DB gathering system using VoiceXML. 2757-2760 - Farshad Almasganj:
Precise phone boundary detection using wavelet packet and recurrent neural networks. 2761-2764 - Andrew Cameron Morris, Viktoria Maier, Phil D. Green:
From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition. 2765-2768 - Seok-Chae Rhee, Sook-Hyang Lee, Young-Ju Lee, Seok-Keun Kang:
Design and construction of Korean-spoken English corpus. 2769-2772 - Folkert de Vriend, Giulio Maltese:
Exploring XML-based technologies and procedures for quality evaluation from a real-life case perspective. 2773-2776 - Kuansan Wang:
Spoken language interface in ECMA/ISO telecommunication standards. 2777-2780 - Marelie H. Davel, Etienne Barnard:
The efficient generation of pronunciation dictionaries: machine learning factors during bootstrapping. 2781-2784 - Anja Geumann:
Towards a new level of annotation detail of multilingual speech corpora. 2785-2788 - Nobuo Kawaguchi, Shigeki Matsubara, Yukiko Yamaguchi, Kazuya Takeda, Fumitada Itakura:
CIAIR in-car speech database. 2789-2792 - Christophe Van Bael, Henk van den Heuvel, Helmer Strik:
Investigating speech style specific pronunciation variation in large spoken language corpora. 2793-2796 - Marelie H. Davel, Etienne Barnard:
The efficient generation of pronunciation dictionaries: human factors during bootstrapping. 2797-2800
Multi-Modal / Multi-Media Processing
- Roger K. Moore:
Modeling data entry rates for ASR and alternative input methods. 2285-2288 - Hiromitsu Ban, Chiyomi Miyajima, Katsunobu Itou, Fumitada Itakura, Kazuya Takeda:
Speech recognition using synchronization between speech and finger tapping. 2289-2292 - Anurag Kumar Gupta, Tasos Anastasakos:
Integration patterns during multimodal interaction. 2293-2296 - Etienne Marcheret, Stephen M. Chu, Vaibhava Goel, Gerasimos Potamianos:
Efficient likelihood computation in multi-stream HMM based audio-visual speech recognition. 2297-2300 - Changkyu Choi, Donggeon Kong, Hyoung-Ki Lee, Sang Min Yoon:
Separation of multiple concurrent speeches using audio-visual speaker localization and minimum variance beam-forming. 2301-2304 - Tokitomo Ariyoshi, Kazuhiro Nakadai, Hiroshi Tsujino:
Multimodal expression for humanoid robots by integration of human speech mimicking and facial color. 2305-2308
Automatic Speech Recognition in the Context of Mobile Communications
- Miroslav Novak:
Towards large vocabulary ASR on embedded platforms. 2309-2312 - Hiroshi Fujimura, Katsunobu Itou, Kazuya Takeda, Fumitada Itakura:
Analysis of in-car speech recognition experiments using a large-scale multi-mode dialogue corpus. 2313-2316 - Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg:
On the integration of speech recognition into personal networks. 2317-2320 - Richard C. Rose, Hong Kook Kim:
Robust speech recognition in client-server scenarios. 2321-2324 - Sangbae Jeong, Icksang Han, Eugene Jon, Jeongsu Kim:
Memory and computation reduction for embedded ASR systems. 2325-2328
Robust Features for ASR
- Takashi Fukuda, Tsuneo Nitta:
Canonicalization of feature parameters for automatic speech recognition. 2537-2540 - Soundararajan Srinivasan, Nicoleta Roman, DeLiang Wang:
On binary and ratio time-frequency masks for robust speech recognition. 2541-2544 - Alberto Sanchís, Alfons Juan, Enrique Vidal:
New features based on multiple word graphs for utterance verification. 2545-2548 - Lukás Burget:
Combination of speech features using smoothed heteroscedastic linear discriminant analysis. 2549-2552 - Shajith Ikbal, Hemant Misra, Sunil Sivadas, Hynek Hermansky, Hervé Bourlard:
Entropy based combination of tandem representations for noise robust ASR. 2553-2556 - Dongsuk Yook, Donghyun Kim:
Fast speech adaptation in linear spectral domain for additive and convolutional noise. 2557-2560
Towards Rapid Speech and Natural Language Application Development: Tooling, Architectures, Components and Standards
- I. Lee Hetherington:
The MIT finite-state transducer toolkit for speech and language processing. 2609-2612 - Junlan Feng, Srinivas Bangalore, Mazin G. Rahim:
Question-answering in webtalk: an evaluation study. 2613-2616 - Juan M. Huerta, Chaitanya Ekanadham:
Automatic network optimization of voice applications. 2617-2620 - Miguel Angel Rodriguez-Moreno, Heriberto Cuayáhuitl, Juventino Montiel-Hernández:
Voicebuilder: a framework for automatic speech application development. 2621-2624 - Andrea Facco, Daniele Falavigna, Roberto Gretter, Marcello Viganò:
On the development of telephone applications: some practical issues and evaluation. 2625-2628 - Stefan W. Hamerich, Volker Schless, Basilis Kladis, Volker Schubert, Otilia Kocsis, Stefan Igel, Ricardo de Córdoba, Luis Fernando D'Haro, José Manuel Pardo:
The GEMINI platform: semi-automatic generation of dialogue applications. 2629-2632
Speech Coding and Enhancement
- Kazuhiro Kondo, Kiyoshi Nakagawa:
A packet loss concealment method using recursive linear prediction. 2633-2636 - Minkyu Lee, Imed Zitouni, Qiru Zhou:
On a n-gram model approach for packet loss concealment. 2637-2640 - Stephen So, Kuldip K. Paliwal:
Efficient vector quantisation of line spectral frequencies using the switched split vector quantiser. 2641-2644 - M. Chaitanya, S. R. Mahadeva Prasanna, B. Yegnanarayana:
Enhancement of reverberant speech using excitation source information. 2645-2648 - Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi:
Improving automatic speech recognition performance and speech intelligibility with harmonicity based dereverberation. 2649-2652 - Seung Yeol Lee, Nam Soo Kim, Joon-Hyuk Chang:
Inner product based multiband vector quantization for wideband speech coding at 16 kbps. 2653-2656 - Alberto Abad, Javier Hernando:
Speech enhancement and recognition by integrating adaptive beamforming and Wiener filtering. 2657-2660 - Kyung-Tae Kim, Sung-Kyo Jung, MiSuk Lee, Hong-Goo Kang, Dae Hee Youn:
Temporal normalization techniques for transform-type speech coding and application to split-band wideband coders. 2661-2664 - Tatsunori Asai, Shigeki Miyabe, Hiroshi Saruwatari, Kiyohiro Shikano:
Interface for barge-in free spoken dialogue system using adaptive sound field control. 2665-2668 - Jong-Hark Kim, Jae-Hyun Shin, InSung Lee:
Multi-mode harmonic transform excitation LPC coding for speech and music. 2669-2672 - Mital Gandhi, Mark Hasegawa-Johnson:
Source separation using particle filters. 2673-2676 - Anssi Rämö, Jani Nurminen, Sakari Himanen, Ari Heikkinen:
Segmental speech coding model for storage applications. 2677-2680 - Gwo-hwa Ju, Lin-Shan Lee:
Improved speech enhancement by applying time-shift property of DFT on Hankel matrices for signal subspace decomposition. 2681-2684 - Jari Juhani Turunen, Juha T. Tanttu, Frank Cameron:
Minimum phase compensation in speech coding using Hammerstein model. 2685-2688 - Weifeng Li, Fumitada Itakura, Kazuya Takeda:
Optimizing regression for in-car speech recognition using multiple distributed microphones. 2689-2692 - Weifeng Li, Kazuya Takeda, Fumitada Itakura, Tran Huy Dat:
Speech enhancement based on magnitude estimation using the gamma prior. 2693-2696 - Andrew Errity, John McKenna, Stephen Isard:
Unscented Kalman filtering of line spectral frequencies. 2697-2700 - Hyoung-Gook Kim, Thomas Sikora:
Speech enhancement based on smoothing of spectral noise floor. 2701-2704 - Junfeng Li, Masato Akagi:
Noise reduction using hybrid noise estimation technique and post-filtering. 2705-2708 - Marcel Gabrea:
An adaptive Kalman filter for the enhancement of speech signals. 2709-2712 - T. V. Sreenivas, K. Sharath Rao, A. Sreenivasa Murthy:
Improved iterative Wiener filtering for non-stationary noise speech enhancement. 2713-2716 - Yasheng Qian, Peter Kabal:
Highband spectrum envelope estimation of telephone speech using hard/soft-classification. 2717-2720
Acoustic Modeling for Robust ASR
- Filip Korkmazsky, Murat Deviren, Dominique Fohr, Irina Illina:
Hidden factor dynamic Bayesian networks for speech recognition. 2801-2804 - Mark Z. Mao, Vincent Vanhoucke:
Design of compact acoustic models through clustering of tied-covariance Gaussians. 2805-2808 - Chandra Kant Raut, Takuya Nishimoto, Shigeki Sagayama:
Model composition by Lagrange polynomial approximation for robust speech recognition in noisy environment. 2809-2812 - Jian Wu, Donglai Zhu, Qiang Huo:
A study of minimum classification error training for segmental switching linear Gaussian hidden Markov models. 2813-2816 - Shigeki Matsuda, Takatoshi Jitsuhiro, Konstantin Markov, Satoshi Nakamura:
Speech recognition system robust to noise and speaking styles. 2817-2820 - Néstor Becerra Yoma, Ivan Brito, Carlos Molina:
The stochastic weighted Viterbi algorithm: a framework to compensate additive noise and low-bit rate coding distortion. 2821-2824
Spoken Dialogue Technology and Systems
- Stefanie Tomko, Roni Rosenfeld:
Shaping spoken input in user-initiative systems. 2825-2828 - Christopher J. Pavlovski, Jennifer C. Lai, Stella Mitchell:
Etiology of user experience with natural language speech. 2829-2832 - Manny Rayner, Beth Ann Hockey:
Side effect free dialogue management in a voice enabled procedure browser. 2833-2836 - Ian Richard Lane, Tatsuya Kawahara, Shinichi Ueno:
Example-based training of dialogue planning incorporating user and situation models. 2837-2840 - Shinya Fujie, Tetsunori Kobayashi, Daizo Yagi, Hideaki Kikuchi:
Prosody based attitude recognition with feature selection and its application to spoken dialog system as para-linguistic information. 2841-2844 - David Ollason, Yun-Cheng Ju, Siddharth Bhatia, Daniel Herron, Jackie Liu:
MS connect: a fully featured auto-attendant: system design, implementation and performance. 2845-2848
Multi-Channel Speech Processing
- Reinhold Haeb-Umbach, Sven Peschke, Ernst Warsitz:
Adaptive beamforming combined with particle filtering for acoustic source localization. 2849-2852 - Hong-Seok Kwon, Siho Kim, Keun-Sung Bae:
Time delay estimation using weighted CPSP function. 2853-2856 - Ilyas Potamitis, Panagiotis Zervas, Nikos Fakotakis:
DOA estimation of speech signals using semi-blind source separation techniques. 2857-2860 - Sang-Gyun Kim, Chang D. Yoo:
Blind separation of speech and sub-Gaussian signals in underdetermined case. 2861-2864 - Gil-Jin Jang, Changkyu Choi, Yongbeom Lee, Yung-Hwan Oh:
Adaptive cross-channel interference cancellation on blind signal separation outputs using source absence/presence detection and spectral subtraction. 2865-2868 - Erik M. Visser, Kwokleung Chan, Stanley Kim, Te-Won Lee:
A comparison of simultaneous 3-channel blind source separation to selective separation on channel pairs using 2-channel BSS. 2869-2872
Intersection of Spoken Language Processing and Written Language Processing
- Hyun-Bok Lee:
Towards a harmonious coexistence of spoken and written language. 2873-2876 - Miyoko Sugito:
Towards a grammar of spoken language - prosody of ill-formed utterances and listener's understanding in discourse -. 2877-2880 - Tatsuya Kawahara, Kazuya Shitaoka, Hiroaki Nanjo:
Automatic transformation of lecture transcription into document style using statistical framework. 2881-2884 - Karunesh Arora, Sunita Arora, Kapil Verma, Shyam Sunder Agrawal:
Automatic extraction of phonetically rich sentences from large text corpus of Indian languages. 2885-2888 - Nicoletta Calzolari:
European initiatives to promote cooperation between speech and text communities. 2889-2892
Prosodic Recognition and Analysis
- Keiichi Takamaru:
Evaluation of a threshold for detecting local slower phrases in Japanese spontaneous conversational speech. 2969-2972 - Nazrul Effendy, Ekkarit Maneenoi, Patavee Charnvivit, Somchai Jitapunkul:
Intonation recognition for Indonesian speech based on Fujisaki model. 2973-2976 - Jinsong Zhang, Satoshi Nakamura, Keikichi Hirose:
Efficient tone classification of speaker independent continuous Chinese speech using anchoring based discriminating features. 2977-2980 - Michiko Watanabe, Yasuharu Den, Keikichi Hirose, Nobuaki Minematsu:
Clause types and filled pauses in Japanese spontaneous monologues. 2981-2984 - Yohei Yabuta, Yasuhiro Katagiri, Noriko Suzuki, Yugo Takeuchi:
Effect of voice prosody on the decision making process in human-computer interaction. 2985-2988 - Noriko Suzuki, Yasuhiro Katagiri:
Alignment of human prosodic patterns for spoken dialogue systems. 2989-2992 - Shinya Kiriyama, Shigeyoshi Kitazawa:
Evaluation of a prosodic labeling system utilizing linguistic information. 2993-2996 - Allison Blodgett:
Functions of intonation boundaries during spoken language comprehension in English. 2997-3000 - Marco Kühne, Matthias Wolff, Matthias Eichner, Rüdiger Hoffmann:
Voice activation using prosodic features. 3001-3004 - Sahyang Kim:
The role of prosodic cues in word segmentation of Korean. 3005-3008 - Sun-Ah Jun:
Default phrasing and attachment preference in Korean. 3009-3012 - Sarah Borys, Aaron Cohen, Mark Hasegawa-Johnson, Jennifer Cole:
Modeling and recognition of phonetic and prosodic factors for improvements to acoustic speech recognition models. 3013-3016 - Eunjong Kong:
The role of pitch range variation in the discourse structure and intonation structure of Korean. 3017-3020 - Kazuyuki Takagi, Kazuhiko Ozeki:
Dependency analysis of read Japanese sentences using pause and F0 information: a speaker independent case. 3021-3024 - Shari R. Speer, Soyoung Kang:
Effects of prosodic boundaries on ambiguous syntactic clause boundaries in Japanese. 3025-3028 - Yasuko Nagasaki, Takanori Komatsu:
The superior effectiveness of the F0 range for identifying the context from sounds without phonemes. 3029-3032 - Tan Li, Montri Karnjanadecha, Thanate Khaorapapong:
A study of tone classification for continuous Thai speech recognition. 3033-3036 - Key-Seop Kim, Un Lim, Dong-Il Shin:
An acoustic-analytic role for the deviation between the scansion and reading of poems. 3037-3040 - Tomoko Ohsuga, Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa:
Estimating syntactic structure from prosodic features in Japanese speech. 3041-3044 - Masahiko Komatsu, Tsutomu Sugawara, Takayuki Arai:
Perceptual discrimination of prosodic types and their preliminary acoustic analysis. 3045-3048
Towards Rapid Speech and Natural Language Application Development
- Johann L'Hour, Olivier Boëffard, Jacques Siroux, Laurent Miclet, Francis Charpentier, Thierry Moudenc:
DORIS, a multiagent/IP platform for multimodal dialogue applications. 3049-3052 - Yu Chen:
EVITA-RAD: an extensible enterprise voice porTAl - rapid application development tool. 3053-3056 - Luis Fernando D'Haro, Ricardo de Córdoba, Rubén San Segundo, Juan Manuel Montero, Javier Macías Guarasa, José Manuel Pardo:
Strategies to reduce design time in multimodal/multilingual dialog applications. 3057-3060 - Gregory Aist:
Three-way system-user-expert interactions help you expand the capabilities of an existing spoken dialogue system. 3061-3064 - Giuseppe Di Fabbrizio, Charles Lewis:
Florence: a dialogue manager framework for spoken dialogue systems. 3065-3068 - Tatsuya Kawahara, Akinobu Lee, Kazuya Takeda, Katsunobu Itou, Kiyohiro Shikano:
Recent progress of open-source LVCSR engine Julius and Japanese model repository. 3069-3072 - Hiroya Murao, Nobuo Kawaguchi, Shigeki Matsubara, Yukiko Yamaguchi, Kazuya Takeda, Yasuyoshi Inagaki:
Example-based spoken dialogue system with online example augmentation. 3073-3076 - Dirk Bhler:
Enhancing existing form-based dialogue managers with reasoning capabilities. 3077-3080 - Markku Turunen, Esa-Pekka Salonen, Mikko Hartikainen, Jaakko Hakulinen:
Robust and adaptive architecture for multilingual spoken dialogue systems. 3081-3084 - Porfírio P. Filipe, Nuno J. Mamede:
Towards ubiquitous task management. 3085-3088