Hiroshi Saruwatari
2020 – today
- 2024
- [j98]Detai Xin, Junfeng Jiang, Shinnosuke Takamichi, Yuki Saito, Akiko Aizawa, Hiroshi Saruwatari:
JVNV: A Corpus of Japanese Emotional Speech With Verbal Content and Nonverbal Expressions. IEEE Access 12: 19752-19764 (2024) - [j97]Juliano G. C. Ribeiro, Shoichi Koyama, Hiroshi Saruwatari:
Physics-constrained adaptive kernel interpolation for region-to-region acoustic transfer function: a Bayesian approach. EURASIP J. Audio Speech Music. Process. 2024(1): 43 (2024) - [j96]Detai Xin, Shinnosuke Takamichi, Hiroshi Saruwatari:
JNV corpus: A corpus of Japanese nonverbal vocalizations with diverse phrases and emotions. Speech Commun. 156: 103004 (2024) - [j95]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis. IEEE ACM Trans. Audio Speech Lang. Process. 32: 1829-1844 (2024) - [j94]Juliano G. C. Ribeiro, Shoichi Koyama, Ryosuke Horiuchi, Hiroshi Saruwatari:
Sound Field Estimation Based on Physics-Constrained Kernel Interpolation Adapted to Environment. IEEE ACM Trans. Audio Speech Lang. Process. 32: 4369-4383 (2024) - [c318]Yoshihide Tomita, Shoichi Koyama, Hiroshi Saruwatari:
Localizing Acoustic Energy in Sound Field Synthesis by Directionally Weighted Exterior Radiation Suppression. ICASSP 2024: 321-325 - [c317]Yuto Ishikawa, Kohei Konaka, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari:
Real-Time Speech Extraction Using Spatially Regularized Independent Low-Rank Matrix Analysis and Rank-Constrained Spatial Covariance Matrix Estimation. ICASSP Workshops 2024: 730-734 - [c316]Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari:
Diversity-Based Core-Set Selection for Text-to-Speech with Linguistic and Acoustic Features. ICASSP 2024: 12351-12355 - [c315]Shinnosuke Takamichi, Hiroki Maeda, Joonyong Park, Daisuke Saito, Hiroshi Saruwatari:
Do Learned Speech Symbols Follow Zipf's Law? ICASSP 2024: 12526-12530 - [i91]Yoshihide Tomita, Shoichi Koyama, Hiroshi Saruwatari:
Localizing Acoustic Energy in Sound Field Synthesis by Directionally Weighted Exterior Radiation Suppression. CoRR abs/2401.05809 (2024) - [i90]Takaaki Saeki, Soumi Maiti, Shinnosuke Takamichi, Shinji Watanabe, Hiroshi Saruwatari:
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics. CoRR abs/2401.16812 (2024) - [i89]Yuto Ishikawa, Kohei Konaka, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari:
Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation. CoRR abs/2403.12477 (2024) - [i88]Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari:
Building speech corpus with diverse voice characteristics for its prompt-based representation. CoRR abs/2403.13353 (2024) - [i87]Detai Xin, Xu Tan, Kai Shen, Zeqian Ju, Dongchao Yang, Yuancheng Wang, Shinnosuke Takamichi, Hiroshi Saruwatari, Shujie Liu, Jinyu Li, Sheng Zhao:
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis. CoRR abs/2404.03204 (2024) - [i86]Yuki Saito, Takuto Igarashi, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, Hiroshi Saruwatari:
SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark. CoRR abs/2406.07254 (2024) - [i85]Takuto Igarashi, Yuki Saito, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, Hiroshi Saruwatari:
Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment. CoRR abs/2406.07280 (2024) - [i84]Kentaro Seki, Shinnosuke Takamichi, Norihiro Takamune, Yuki Saito, Kanami Imamura, Hiroshi Saruwatari:
Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals. CoRR abs/2406.17722 (2024) - [i83]Wataru Nakata, Kentaro Seki, Hitomi Yanaka, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling. CoRR abs/2407.15828 (2024) - [i82]Detai Xin, Xu Tan, Shinnosuke Takamichi, Hiroshi Saruwatari:
BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec. CoRR abs/2409.05377 (2024) - [i81]Kazuki Yamauchi, Yuki Saito, Hiroshi Saruwatari:
Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT. CoRR abs/2409.07265 (2024) - [i80]Kaito Baba, Wataru Nakata, Yuki Saito, Hiroshi Saruwatari:
The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech. CoRR abs/2409.09305 (2024) - [i79]Hiroaki Hyodo, Shinnosuke Takamichi, Tomohiko Nakamura, Junya Koguchi, Hiroshi Saruwatari:
DNN-based ensemble singing voice synthesis with interactions between singers. CoRR abs/2409.09988 (2024)
- 2023
- [j93]Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, Hiroshi Saruwatari:
SelfRemaster: Self-Supervised Speech Restoration for Historical Audio Resources. IEEE Access 11: 144831-144843 (2023) - [j92]Takumi Abe, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari:
Amplitude Matching for Multizone Sound Field Control. IEEE ACM Trans. Audio Speech Lang. Process. 31: 656-669 (2023) - [j91]Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo:
PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation. IEEE ACM Trans. Audio Speech Lang. Process. 31: 2680-2694 (2023) - [c314]Sota Misawa, Norihiro Takamune, Kohei Yatabe, Daichi Kitamura, Hiroshi Saruwatari:
Blind Source Separation Using Independent Low-Rank Matrix Analysis with Spectrogram-Consistency Regularization. APSIPA ASC 2023: 1050-1057 - [c313]Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari:
COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control. ASRU 2023: 1-8 - [c312]Takaaki Kojima, Kazuyuki Arikawa, Shoichi Koyama, Hiroshi Saruwatari:
Multichannel Active Noise Control with Exterior Radiation Suppression Based on Riemannian Optimization. EUSIPCO 2023: 96-100 - [c311]Kanami Imamura, Tomohiko Nakamura, Norihiro Takamune, Kohei Yatabe, Hiroshi Saruwatari:
Algorithms of Sampling-Frequency-Independent Layers for Non-integer Strides. EUSIPCO 2023: 326-330 - [c310]Koki Nishida, Norihiro Takamune, Rintaro Ikeshita, Daichi Kitamura, Hiroshi Saruwatari, Tomohiro Nakatani:
NoisyILRMA: Diffuse-Noise-Aware Independent Low-Rank Matrix Analysis for Fast Blind Source Extraction. EUSIPCO 2023: 925-929 - [c309]Kazuyuki Arikawa, Shoichi Koyama, Hiroshi Saruwatari:
Spatial Active Noise Control Method Based on Sound Field Interpolation from Reference Microphone Signals. ICASSP 2023: 1-5 - [c308]Tomohiko Nakamura, Shinnosuke Takamichi, Naoko Tanji, Satoru Fukayama, Hiroshi Saruwatari:
jaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus. ICASSP 2023: 1-5 - [c307]Hien Ohnaka, Shinnosuke Takamichi, Keisuke Imoto, Yuki Okamoto, Kazuki Fujii, Hiroshi Saruwatari:
Visual Onoma-to-Wave: Environmental Sound Synthesis from Visual Onomatopoeias and Sound-Source Images. ICASSP 2023: 1-5 - [c306]Juliano G. C. Ribeiro, Shoichi Koyama, Hiroshi Saruwatari:
Kernel Interpolation of Acoustic Transfer Functions with Adaptive Kernel for Directed and Residual Reverberations. ICASSP 2023: 1-5 - [c305]Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, Hiroshi Saruwatari:
MID-Attribute Speaker Generation Using Optimal-Transport-Based Interpolation of Gaussian Mixture Models. ICASSP 2023: 1-5 - [c304]Detai Xin, Sharath Adavanne, Federico Ang, Ashish Kulkarni, Shinnosuke Takamichi, Hiroshi Saruwatari:
Improving Speech Prosody of Audiobook Text-To-Speech Synthesis with Acoustic and Textual Contexts. ICASSP 2023: 1-5 - [c303]Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari:
Duration-Aware Pause Insertion Using Pre-Trained Language Model for Multi-Speaker Text-To-Speech. ICASSP 2023: 1-5 - [c302]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. IJCAI 2023: 5179-5187 - [c301]Detai Xin, Shinnosuke Takamichi, Ai Morimatsu, Hiroshi Saruwatari:
Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus. INTERSPEECH 2023: 17-21 - [c300]Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura, Kentaro Seki, Detai Xin, Hiroshi Saruwatari:
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics. INTERSPEECH 2023: 1085-1089 - [c299]Yuki Saito, Shinnosuke Takamichi, Eiji Iimori, Kentaro Tachibana, Hiroshi Saruwatari:
ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings. INTERSPEECH 2023: 3048-3052 - [c298]Yota Ueda, Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Hiroshi Saruwatari:
HumanDiffusion: diffusion model using perceptual gradients. INTERSPEECH 2023: 4264-4268 - [c297]Yuki Saito, Eiji Iimori, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari:
CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center. INTERSPEECH 2023: 5561-5565 - [c296]Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari:
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion. SSW 2023: 62-68 - [c295]Ryunosuke Hirai, Yuki Saito, Hiroshi Saruwatari:
Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion. SSW 2023: 94-99 - [c294]Keisuke Kimura, Shoichi Koyama, Hiroshi Saruwatari:
Perceptual Quality Enhancement of Sound Field Synthesis Based on Combination of Pressure and Amplitude Matching. WASPAA 2023: 1-5 - [c293]Shoichi Koyama, Masaki Nakada, Juliano G. C. Ribeiro, Hiroshi Saruwatari:
Kernel Interpolation of Incident Sound Field in Region Including Scattering Objects. WASPAA 2023: 1-5 - [d1]Detai Xin, Junfeng Jiang, Shinnosuke Takamichi, Yuki Saito, Akiko Aizawa, Hiroshi Saruwatari:
JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions. IEEE DataPort, 2023 - [i78]Takaaki Saeki, Soumi Maiti, Xinjian Li, Shinji Watanabe, Shinnosuke Takamichi, Hiroshi Saruwatari:
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining. CoRR abs/2301.12596 (2023) - [i77]Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari:
Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech. CoRR abs/2302.13652 (2023) - [i76]Juliano G. C. Ribeiro, Shoichi Koyama, Hiroshi Saruwatari:
Kernel interpolation of acoustic transfer functions with adaptive kernel for directed and residual reverberations. CoRR abs/2303.03869 (2023) - [i75]Kazuyuki Arikawa, Shoichi Koyama, Hiroshi Saruwatari:
Spatial Active Noise Control Method Based On Sound Field Interpolation From Reference Microphone Signals. CoRR abs/2303.16021 (2023) - [i74]Detai Xin, Shinnosuke Takamichi, Ai Morimatsu, Hiroshi Saruwatari:
Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus. CoRR abs/2305.12442 (2023) - [i73]Detai Xin, Shinnosuke Takamichi, Hiroshi Saruwatari:
JNV Corpus: A Corpus of Japanese Nonverbal Vocalizations with Diverse Phrases and Emotions. CoRR abs/2305.12445 (2023) - [i72]Yuki Saito, Eiji Iimori, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari:
CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center. CoRR abs/2305.13713 (2023) - [i71]Yuki Saito, Shinnosuke Takamichi, Eiji Iimori, Kentaro Tachibana, Hiroshi Saruwatari:
ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings. CoRR abs/2305.13724 (2023) - [i70]Joonyong Park, Shinnosuke Takamichi, Tomohiko Nakamura, Kentaro Seki, Detai Xin, Hiroshi Saruwatari:
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics. CoRR abs/2306.00697 (2023) - [i69]Takaaki Kojima, Kazuyuki Arikawa, Shoichi Koyama, Hiroshi Saruwatari:
Multichannel Active Noise Control with Exterior Radiation Suppression Based on Riemannian Optimization. CoRR abs/2306.08855 (2023) - [i68]Kanami Imamura, Tomohiko Nakamura, Norihiro Takamune, Kohei Yatabe, Hiroshi Saruwatari:
Algorithms of Sampling-Frequency-Independent Layers for Non-integer Strides. CoRR abs/2306.10718 (2023) - [i67]Yota Ueda, Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Hiroshi Saruwatari:
HumanDiffusion: diffusion model using perceptual gradients. CoRR abs/2306.12169 (2023) - [i66]Koki Nishida, Norihiro Takamune, Rintaro Ikeshita, Daichi Kitamura, Hiroshi Saruwatari, Tomohiro Nakatani:
NoisyILRMA: Diffuse-Noise-Aware Independent Low-Rank Matrix Analysis for Fast Blind Source Extraction. CoRR abs/2306.12820 (2023) - [i65]Keisuke Kimura, Shoichi Koyama, Hiroshi Saruwatari:
Perceptual Quality Enhancement of Sound Field Synthesis Based on Combination of Pressure and Amplitude Matching. CoRR abs/2307.13941 (2023) - [i64]Shoichi Koyama, Masaki Nakada, Juliano G. C. Ribeiro, Hiroshi Saruwatari:
Kernel Interpolation of Incident Sound Field in Region Including Scattering Objects. CoRR abs/2309.05634 (2023) - [i63]Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari:
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features. CoRR abs/2309.08127 (2023) - [i62]Shinnosuke Takamichi, Hiroki Maeda, Joonyong Park, Daisuke Saito, Hiroshi Saruwatari:
Do learned speech symbols follow Zipf's law? CoRR abs/2309.09690 (2023) - [i61]Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari:
Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control. CoRR abs/2309.13509 (2023) - [i60]Detai Xin, Junfeng Jiang, Shinnosuke Takamichi, Yuki Saito, Akiko Aizawa, Hiroshi Saruwatari:
JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions. CoRR abs/2310.06072 (2023)
- 2022
- [j90]Yuto Kondo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Deficient-basis-complementary rank-constrained spatial covariance matrix estimation based on multivariate generalized Gaussian distribution for blind speech extraction. EURASIP J. Adv. Signal Process. 2022(1): 88 (2022) - [j89]Koichi Saito, Tomohiko Nakamura, Kohei Yatabe, Hiroshi Saruwatari:
Sampling-Frequency-Independent Convolutional Layer and its Application to Audio Source Separation. IEEE ACM Trans. Audio Speech Lang. Process. 30: 2928-2943 (2022) - [j88]Juliano G. C. Ribeiro, Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Region-to-Region Kernel Interpolation of Acoustic Transfer Functions Constrained by Physical Properties. IEEE ACM Trans. Audio Speech Lang. Process. 30: 2944-2954 (2022) - [j87]Tomoya Nishida, Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Region-Restricted Sensor Placement Based on Gaussian Process for Sound Field Estimation. IEEE Trans. Signal Process. 70: 1718-1733 (2022) - [c292]Juliano G. C. Ribeiro, Shoichi Koyama, Hiroshi Saruwatari:
Region-to-Region Kernel Interpolation of Acoustic Transfer Function with Directional Weighting. ICASSP 2022: 576-580 - [c291]Masaya Kawamura, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds. ICASSP 2022: 941-945 - [c290]Kazuyuki Arikawa, Shoichi Koyama, Hiroshi Saruwatari:
Spatial Active Noise Control Based on Individual Kernel Interpolation of Primary and Secondary Sound Fields. ICASSP 2022: 1056-1060 - [c289]Shinnosuke Takamichi, Wataru Nakata, Naoko Tanji, Hiroshi Saruwatari:
J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis. INTERSPEECH 2022: 2358-2362 - [c288]Kenta Udagawa, Yuki Saito, Hiroshi Saruwatari:
Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS. INTERSPEECH 2022: 2968-2972 - [c287]Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari:
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History. INTERSPEECH 2022: 3373-3377 - [c286]Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, Hiroshi Saruwatari:
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling. INTERSPEECH 2022: 4406-4410 - [c285]Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari:
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022. INTERSPEECH 2022: 4521-4525 - [c284]Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari:
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis. INTERSPEECH 2022: 4551-4555 - [c283]Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari:
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent. INTERSPEECH 2022: 5155-5159 - [c282]Yuki Ito, Tomohiko Nakamura, Shoichi Koyama, Hiroshi Saruwatari:
Head-Related Transfer Function Interpolation From Spatially Sparse Measurements Using Autoencoder With Source Position Conditioning. IWAENC 2022: 1-5 - [c281]Kazuhide Shigemi, Shoichi Koyama, Tomohiko Nakamura, Hiroshi Saruwatari:
Physics-Informed Convolutional Neural Network with Bicubic Spline Interpolation for Sound Field Estimation. IWAENC 2022: 1-5 - [c280]Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari:
Personalized Filled-pause Generation with Group-wise Prediction Models. LREC 2022: 385-392 - [c279]Yoshifumi Nakano, Takaaki Saeki, Shinnosuke Takamichi, Katsuhito Sudoh, Hiroshi Saruwatari:
VTTS: Visual-Text To Speech. SLT 2022: 936-942 - [i59]Shinnosuke Takamichi, Wataru Nakata, Naoko Tanji, Hiroshi Saruwatari:
J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis. CoRR abs/2201.10896 (2022) - [i58]Masaya Kawamura, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds. CoRR abs/2202.00200 (2022) - [i57]Kazuyuki Arikawa, Shoichi Koyama, Hiroshi Saruwatari:
Spatial active noise control based on individual kernel interpolation of primary and secondary sound fields. CoRR abs/2202.04807 (2022) - [i56]Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari:
Personalized filled-pause generation with group-wise prediction models. CoRR abs/2203.09961 (2022) - [i55]Takaaki Saeki, Shinnosuke Takamichi, Tomohiko Nakamura, Naoko Tanji, Hiroshi Saruwatari:
SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling. CoRR abs/2203.12937 (2022) - [i54]Yoshifumi Nakano, Takaaki Saeki, Shinnosuke Takamichi, Katsuhito Sudoh, Hiroshi Saruwatari:
vTTS: visual-text to speech. CoRR abs/2203.14725 (2022) - [i53]Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari:
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent. CoRR abs/2203.14757 (2022) - [i52]Takaaki Saeki, Detai Xin, Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari:
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022. CoRR abs/2204.02152 (2022) - [i51]Detai Xin, Shinnosuke Takamichi, Takuma Okamoto, Hisashi Kawai, Hiroshi Saruwatari:
Speaking-Rate-Controllable HiFi-GAN Using Feature Interpolation. CoRR abs/2204.10561 (2022) - [i50]Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari:
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History. CoRR abs/2206.08039 (2022) - [i49]Kenta Udagawa, Yuki Saito, Hiroshi Saruwatari:
Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS. CoRR abs/2206.10256 (2022) - [i48]Detai Xin, Shinnosuke Takamichi, Hiroshi Saruwatari:
Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations. CoRR abs/2206.10695 (2022) - [i47]Kazuhide Shigemi, Shoichi Koyama, Tomohiko Nakamura, Hiroshi Saruwatari:
Physics-informed convolutional neural network with bicubic spline interpolation for sound field estimation. CoRR abs/2207.10937 (2022) - [i46]Yuki Ito, Tomohiko Nakamura, Shoichi Koyama, Hiroshi Saruwatari:
Head-Related Transfer Function Interpolation from Spatially Sparse Measurements Using Autoencoder with Source Position Conditioning. CoRR abs/2207.10967 (2022) - [i45]Yusuke Nakai, Yuki Saito, Kenta Udagawa, Hiroshi Saruwatari:
Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech. CoRR abs/2209.12549 (2022) - [i44]Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari:
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis. CoRR abs/2210.07559 (2022) - [i43]Hien Ohnaka, Shinnosuke Takamichi, Keisuke Imoto, Yuki Okamoto, Kazuki Fujii, Hiroshi Saruwatari:
Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images. CoRR abs/2210.09173 (2022) - [i42]Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari:
Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses. CoRR abs/2210.09815 (2022) - [i41]Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, Hiroshi Saruwatari:
Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models. CoRR abs/2210.09916 (2022) - [i40]Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari:
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection. CoRR abs/2210.14850 (2022) - [i39]Detai Xin, Sharath Adavanne, Federico Ang, Ashish Kulkarni, Shinnosuke Takamichi, Hiroshi Saruwatari:
Improving Speech Prosody of Audiobook Text-to-Speech Synthesis with Acoustic and Textual Contexts. CoRR abs/2211.02336 (2022) - [i38]Tomohiko Nakamura, Shinnosuke Takamichi, Naoko Tanji, Satoru Fukayama, Hiroshi Saruwatari:
jaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus. CoRR abs/2211.16028 (2022)
- 2021
- [j86]Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Convex and Differentiable Formulation for Inverse Problems in Hilbert Spaces with Nonlinear Clipping Effects. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 104-A(9): 1293-1303 (2021) - [j85]Akihito Aiba, Minoru Yoshida, Daichi Kitamura, Shinnosuke Takamichi, Hiroshi Saruwatari:
Noise Robust Acoustic Anomaly Detection System with Nonnegative Matrix Factorization Based on Generalized Gaussian Distribution. IEICE Trans. Inf. Syst. 104-D(3): 441-449 (2021) - [j84]Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials. IEICE Trans. Inf. Syst. 104-D(7): 1002-1016 (2021) - [j83]Satoshi Mizoguchi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching. IEICE Trans. Inf. Syst. 104-D(11): 1971-1980 (2021) - [j82]Naoki Makishima, Yoshiki Mitsui, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Independent deeply learned matrix analysis with automatic selection of stable microphone-wise update and fast sourcewise update of demixing matrix. Signal Process. 178: 107753 (2021) - [j81]Keigo Kamo, Yoshiki Mitsui, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Joint-diagonalizability-constrained multichannel nonnegative matrix factorization based on time-variant multivariate complex sub-Gaussian distribution. Signal Process. 188: 108183 (2021) - [j80]Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari:
Deep Gaussian process based multi-speaker speech synthesis with latent speaker representation. Speech Commun. 132: 132-145 (2021) - [j79]Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari:
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead With Large Pretrained Language Model. IEEE Signal Process. Lett. 28: 857-861 (2021) - [j78]Yuki Mitsufuji, Norihiro Takamune, Shoichi Koyama, Hiroshi Saruwatari:
Multichannel Blind Source Separation Based on Evanescent-Region-Aware Non-Negative Tensor Factorization in Spherical Harmonic Domain. IEEE ACM Trans. Audio Speech Lang. Process. 29: 607-617 (2021) - [j77]Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling. IEEE ACM Trans. Audio Speech Lang. Process. 29: 1033-1048 (2021) - [j76]Tomohiko Nakamura, Shihori Kozuka, Hiroshi Saruwatari:
Time-Domain Audio Source Separation With Neural Networks Based on Multiresolution Analysis. IEEE ACM Trans. Audio Speech Lang. Process. 29: 1687-1701 (2021) - [j75]Shoichi Koyama, Jesper Brunnström, Hayato Ito, Natsuki Ueno, Hiroshi Saruwatari:
Spatial Active Noise Control Based on Kernel Interpolation of Sound Field. IEEE ACM Trans. Audio Speech Lang. Process. 29: 3052-3063 (2021) - [j74]Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Directionally Weighted Wave Field Estimation Exploiting Prior Information on Source Direction. IEEE Trans. Signal Process. 69: 2383-2395 (2021) - [c278]Sota Misawa, Norihiro Takamune, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Masakazu Une, Shoji Makino:
Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis. APSIPA ASC 2021: 578-584 - [c277]Yusaku Mizobuchi, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Prior Distribution Design for Music Bleeding-Sound Reduction Based on Nonnegative Matrix Factorization. APSIPA ASC 2021: 651-658 - [c276]Xuan Luo, Shinnosuke Takamichi, Tomoki Koriyama, Yuki Saito, Hiroshi Saruwatari:
Emotion-Controllable Speech Synthesis Using Emotion Soft Labels and Fine-Grained Prosody Factors. APSIPA ASC 2021: 794-799 - [c275]Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo:
Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models. APSIPA ASC 2021: 1226-1233 - [c274]Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari:
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network. ASRU 2021: 749-756 - [c273]Koichi Saito, Tomohiko Nakamura, Kohei Yatabe, Yuma Koizumi, Hiroshi Saruwatari:
Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method. EUSIPCO 2021: 321-325 - [c272]Naoki Narisawa, Rintaro Ikeshita, Norihiro Takamune, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Tomohiro Nakatani:
Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation. EUSIPCO 2021: 326-330 - [c271]Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo:
Empirical Bayesian Independent Deeply Learned Matrix Analysis For Multichannel Audio Source Separation. EUSIPCO 2021: 331-335 - [c270]Shoichi Koyama, Takashi Amakasu, Natsuki Ueno, Hiroshi Saruwatari:
Amplitude Matching: Majorization-Minimization Algorithm for Sound Field Control Only with Amplitude Constraint. ICASSP 2021: 411-415 - [c269]Yuto Kondo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Deficient Basis Estimation of Noise Spatial Covariance Matrix for Rank-Constrained Spatial Covariance Matrix Estimation Method in Blind Speech Extraction. ICASSP 2021: 806-810 - [c268]Yota Ueda, Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari:
Humanacgan: Conditional Generative Adversarial Network with Human-Based Auxiliary Classifier and its Evaluation in Phoneme Perception. ICASSP 2021: 6468-6472 - [c267]Detai Xin, Tatsuya Komatsu, Shinnosuke Takamichi, Hiroshi Saruwatari:
Disentangled Speaker and Language Representations Using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS. ICASSP 2021: 6608-6612 - [c266]Taiki Nakamura, Tomoki Koriyama, Hiroshi Saruwatari:
Sequence-to-Sequence Learning for Deep Gaussian Process Based Speech Synthesis Using Self-Attention GP Layer. Interspeech 2021: 121-125 - [c265]Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari:
Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis. Interspeech 2021: 1614-1618 - [c264]Kazuki Mizuta, Tomoki Koriyama, Hiroshi Saruwatari:
Harmonic WaveGAN: GAN-Based Speech Waveform Generation Model with Harmonic Structure Discriminator. Interspeech 2021: 2192-2196 - [c263]Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari:
Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder. SSW 2021: 189-194 - [c262]Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Naoko Tanji, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari:
Audiobook Speech Synthesis Conditioned by Cross-Sentence Context-Aware Word Embeddings. SSW 2021: 211-215 - [c261]Ryosuke Horiuchi, Shoichi Koyama, Juliano G. C. Ribeiro, Natsuki Ueno, Hiroshi Saruwatari:
Kernel Learning for Sound Field Estimation with L1 and L2 Regularizations. WASPAA 2021: 261-265 - [c260]Keisuke Kimura, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari:
Mean-Square-Error-Based Secondary Source Placement in Sound Field Synthesis with Prior Information on Desired Field. WASPAA 2021: 281-285 - [i37]Yota Ueda, Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari:
HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception. CoRR abs/2102.04051 (2021) - [i36]Yuto Kondo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Deficient Basis Estimation of Noise Spatial Covariance Matrix for Rank-Constrained Spatial Covariance Matrix Estimation Method in Blind Speech Extraction. CoRR abs/2105.02491 (2021) - [i35]Koichi Saito, Tomohiko Nakamura, Kohei Yatabe, Yuma Koizumi, Hiroshi Saruwatari:
Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method. CoRR abs/2105.04079 (2021) - [i34]Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo:
Empirical Bayesian Independent Deeply Learned Matrix Analysis For Multichannel Audio Source Separation. CoRR abs/2106.03492 (2021) - [i33]Naoki Narisawa, Rintaro Ikeshita, Norihiro Takamune, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Tomohiro Nakatani:
Independent Deeply Learned Tensor Analysis for Determined Audio Source Separation. CoRR abs/2106.05529 (2021) - [i32]Yusaku Mizobuchi, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Prior Distribution Design for Music Bleeding-Sound Reduction Based on Nonnegative Matrix Factorization. CoRR abs/2109.00237 (2021) - [i31]Takuya Hasumi, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo:
Multichannel Audio Source Separation with Independent Deeply Learned Matrix Analysis Using Product of Source Models. CoRR abs/2109.00704 (2021) - [i30]Sota Misawa, Norihiro Takamune, Tomohiko Nakamura, Daichi Kitamura, Hiroshi Saruwatari, Masakazu Une, Shoji Makino:
Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis. CoRR abs/2109.04658 (2021) - [i29]Naoto Iijima, Shoichi Koyama, Hiroshi Saruwatari:
Binaural rendering from microphone array signals of arbitrary geometry. CoRR abs/2109.07274 (2021) - [i28]Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari:
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network. CoRR abs/2109.10724 (2021) - [i27]Ryosuke Horiuchi, Shoichi Koyama, Juliano G. C. Ribeiro, Natsuki Ueno, Hiroshi Saruwatari:
Kernel Learning For Sound Field Estimation With L1 and L2 Regularizations. CoRR abs/2110.04972 (2021) - [i26]Keisuke Kimura, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari:
Mean-square-error-based secondary source placement in sound field synthesis with prior information on desired field. CoRR abs/2112.06774 (2021)
- 2020
- [j73]Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari:
Generative Moment Matching Network-Based Neural Double-Tracking for Synthesized and Natural Singing Voices. IEICE Trans. Inf. Syst. 103-D(3): 639-647 (2020) - [j72]Junya Koguchi, Shinnosuke Takamichi, Masanori Morise, Hiroshi Saruwatari, Shigeki Sagayama:
DNN-Based Full-Band Speech Synthesis Using GMM Approximation of Spectral Envelope. IEICE Trans. Inf. Syst. 103-D(12): 2673-2681 (2020) - [j71]Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Phase reconstruction from amplitude spectrograms based on directional-statistics deep neural networks. Signal Process. 169: 107368 (2020) - [j70]Yuhta Takida, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari:
Reciprocity gap functional in spherical harmonic domain for gridless sound field decomposition. Signal Process. 169: 107383 (2020) - [j69]Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, Hiroshi Saruwatari:
Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis. Speech Commun. 125: 53-60 (2020) - [j68]Yuki Mitsufuji, Stefan Uhlich, Norihiro Takamune, Daichi Kitamura, Shoichi Koyama, Hiroshi Saruwatari:
Multichannel Non-Negative Matrix Factorization Using Banded Spatial Covariance Matrices in Wavenumber Domain. IEEE ACM Trans. Audio Speech Lang. Process. 28: 49-60 (2020) - [j67]Shinichi Mogami, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo, Nobutaka Ono:
Independent Low-Rank Matrix Analysis Based on Time-Variant Sub-Gaussian Source Model for Determined Blind Source Separation. IEEE ACM Trans. Audio Speech Lang. Process. 28: 503-518 (2020) - [j66]Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Blind Speech Extraction Based on Rank-Constrained Spatial Covariance Matrix Estimation With Multivariate Generalized Gaussian Distribution. IEEE ACM Trans. Audio Speech Lang. Process. 28: 1948-1963 (2020) - [c259]Keigo Kamo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Joint-Diagonalizability-Constrained Multichannel Nonnegative Matrix Factorization Based on Multivariate Complex Student's t-distribution. APSIPA 2020: 869-874 - [c258]Rui Watanabe, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
DNN-Based Frequency Component Prediction for Frequency-Domain Audio Source Separation. EUSIPCO 2020: 805-809 - [c257]Keigo Kamo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Joint-Diagonalizability-Constrained Multichannel Nonnegative Matrix Factorization Based on Multivariate Complex Sub-Gaussian Distribution. EUSIPCO 2020: 890-894 - [c256]Tomoya Nishida, Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Sensor placement in arbitrarily restricted region for field estimation based on Gaussian process. EUSIPCO 2020: 2289-2293 - [c255]Kentaro Ariga, Tomoya Nishida, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari:
Mutual-Information-Based Sensor Placement for Spatial Sound Field Recording. ICASSP 2020: 166-170 - [c254]Tomohiko Nakamura, Hiroshi Saruwatari:
Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform. ICASSP 2020: 386-390 - [c253]Keigo Kamo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Regularized Fast Multichannel Nonnegative Matrix Factorization with ILRMA-Based Prior Distribution of Joint-Diagonalization Process. ICASSP 2020: 606-610 - [c252]Tatsuki Kondo, Kanta Fukushige, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Rintaro Ikeshita, Tomohiro Nakatani:
Convergence-Guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's T Distribution. ICASSP 2020: 681-685 - [c251]Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari:
Humangan: Generative Adversarial Network With Human-Based Discriminator And Its Evaluation In Speech Perception Modeling. ICASSP 2020: 6239-6243 - [c250]Tomoki Koriyama, Hiroshi Saruwatari:
Utterance-Level Sequential Modeling for Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit. ICASSP 2020: 7249-7253 - [c249]Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Lifter Training and Sub-Band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials. ICASSP 2020: 7784-7788 - [c248]Hayato Ito, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari:
Spatial Active Noise Control Based on Kernel Interpolation with Directional Weighting. ICASSP 2020: 8404-8408 - [c247]Juliano G. C. Ribeiro, Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Kernel interpolation of acoustic transfer function between regions considering reciprocity. SAM 2020: 1-5 - [c246]Hirotoshi Takeuchi, Kunio Kashino, Yasunori Ohishi, Hiroshi Saruwatari:
Harmonic Lowering for Accelerating Harmonic Convolution for Audio Signals. INTERSPEECH 2020: 185-189 - [c245]Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU. INTERSPEECH 2020: 1021-1022 - [c244]Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari:
Multi-Speaker Text-to-Speech Synthesis Using Deep Gaussian Processes. INTERSPEECH 2020: 2032-2036 - [c243]Detai Xin, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari:
Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space. INTERSPEECH 2020: 2947-2951 - [c242]Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari:
Investigating Effective Additional Contextual Factors in DNN-Based Spontaneous Speech Synthesis. INTERSPEECH 2020: 3201-3205 - [c241]Masashi Aso, Shinnosuke Takamichi, Hiroshi Saruwatari:
End-to-End Text-to-Speech Synthesis with Unaligned Multiple Language Units Based on Attention. INTERSPEECH 2020: 4009-4013 - [c240]Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari:
DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus. LREC 2020: 6438-6443 - [c239]Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
SMASH Corpus: A Spontaneous Speech Corpus Recording Third-person Audio Commentaries on Gameplay. LREC 2020: 6571-6577 - [c238]Naoto Iijima, Shoichi Koyama, Hiroshi Saruwatari:
Binaural Rendering From Distributed Microphone Signals Considering Loudspeaker Distance in Measurements. MMSP 2020: 1-6 - [i25]Hiroki Tamaru, Shinnosuke Takamichi, Naoko Tanji, Hiroshi Saruwatari:
JVS-MuSiC: Japanese multispeaker singing-voice corpus. CoRR abs/2001.07044 (2020) - [i24]Tomohiko Nakamura, Hiroshi Saruwatari:
Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform. CoRR abs/2001.10190 (2020) - [i23]Keigo Kamo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Regularized Fast Multichannel Nonnegative Matrix Factorization with ILRMA-based Prior Distribution of Joint-Diagonalization Process. CoRR abs/2002.00579 (2020) - [i22]Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Lifter Training and Sub-band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials. CoRR abs/2002.06778 (2020) - [i21]Tatsuki Kondo, Kanta Fukushige, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Rintaro Ikeshita, Tomohiro Nakatani:
Convergence-guaranteed Independent Positive Semidefinite Tensor Analysis Based on Student's t Distribution. CoRR abs/2002.08582 (2020) - [i20]Tomoki Koriyama, Hiroshi Saruwatari:
Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit. CoRR abs/2004.10823 (2020) - [i19]Keigo Kamo, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Joint-Diagonalizability-Constrained Multichannel Nonnegative Matrix Factorization Based on Multivariate Complex Sub-Gaussian Distribution. CoRR abs/2007.00416 (2020) - [i18]Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari:
Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes. CoRR abs/2008.02950 (2020) - [i17]Shinnosuke Takamichi, Mamoru Komachi, Naoko Tanji, Hiroshi Saruwatari:
JSSS: free Japanese speech corpus for summarization and simplification. CoRR abs/2010.01793 (2020) - [i16]Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari:
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model. CoRR abs/2012.12612 (2020)
2010 – 2019
- 2019
- [j65]Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra. Comput. Speech Lang. 58: 347-363 (2019) - [j64]Shinichi Mogami, Yoshiki Mitsui, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo, Hiroaki Nakajima, Hirokazu Kameoka:
Independent Low-Rank Matrix Analysis Based on Generalized Kullback-Leibler Divergence. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 102-A(2): 458-463 (2019) - [j63]Daiki Sekizawa, Shinnosuke Takamichi, Hiroshi Saruwatari:
Prosody Correction Preserving Speaker Individuality for Chinese-Accented Japanese HMM-Based Text-to-Speech Synthesis. IEICE Trans. Inf. Syst. 102-D(6): 1218-1221 (2019) - [j62]Hiroaki Nakajima, Daichi Kitamura, Norihiro Takamune, Hiroshi Saruwatari, Nobutaka Ono:
Bilevel Optimization Using Stationary Point of Lower-Level Objective Function for Discriminative Basis Learning in Nonnegative Matrix Factorization. IEEE Signal Process. Lett. 26(6): 818-822 (2019) - [j61]Naoki Makishima, Shinichi Mogami, Norihiro Takamune, Daichi Kitamura, Hayato Sumino, Shinnosuke Takamichi, Hiroshi Saruwatari, Nobutaka Ono:
Independent Deeply Learned Matrix Analysis for Determined Audio Source Separation. IEEE ACM Trans. Audio Speech Lang. Process. 27(10): 1601-1615 (2019) - [j60]Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Three-Dimensional Sound Field Reproduction Based on Weighted Mode-Matching Method. IEEE ACM Trans. Audio Speech Lang. Process. 27(12): 1852-1867 (2019) - [c237]Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Acceleration of rank-constrained spatial covariance matrix estimation for blind speech extraction. APSIPA 2019: 332-338 - [c236]Naoki Makishima, Norihiro Takamune, Hiroshi Saruwatari, Daichi Kitamura, Yu Takahashi, Kazunobu Kondo:
Robust Demixing Filter Update Algorithm Based on Microphone-wise Coordinate Descent for Independent Deeply Learned Matrix Analysis. APSIPA 2019: 1868-1873 - [c235]Masakazu Une, Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Shoji Makino:
Evaluation of Multichannel Hearing Aid System by Rank-Constrained Spatial Covariance Matrix Estimation. APSIPA 2019: 1874-1879 - [c234]Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Efficient Full-Rank Spatial Covariance Estimation Using Independent Low-Rank Matrix Analysis for Blind Source Separation. EUSIPCO 2019: 1-5 - [c233]Hayato Ito, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari:
Feedforward Spatial Active Noise Control Based on Kernel Interpolation of Sound Field. ICASSP 2019: 511-515 - [c232]Yuhta Takida, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari:
Robust Gridless Sound Field Decomposition Based on Structured Reciprocity Gap Functional in Spherical Harmonic Domain. ICASSP 2019: 581-585 - [c231]Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari:
Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking. ICASSP 2019: 7070-7074 - [c230]Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis. SSW 2019: 51-56 - [c229]Riku Arakawa, Shinnosuke Takamichi, Hiroshi Saruwatari:
Implementation of DNN-based real-time voice conversion and its improvements by audio data augmentation and mask-shaped device. SSW 2019: 93-98 - [c228]Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Hiroshi Saruwatari:
V2S attack: building DNN-based voice conversion from automatic speaker verification. SSW 2019: 161-165 - [c227]Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, Hiroshi Saruwatari:
Subword tokenization based on DNN-based acoustic model for end-to-end prosody generation. SSW 2019: 234-238 - [c226]Riku Arakawa, Shinnosuke Takamichi, Hiroshi Saruwatari:
TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication. UIST (Adjunct Volume) 2019: 33-35 - [c225]Masahiro Nakanishi, Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Two-Dimensional Sound Field Recording With Multiple Circular Microphone Arrays Considering Multiple Scattering. WASPAA 2019: 368-372 - [i15]Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari:
Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking. CoRR abs/1902.03389 (2019) - [i14]Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Efficient Full-Rank Spatial Covariance Estimation Using Independent Low-Rank Matrix Analysis for Blind Source Separation. CoRR abs/1906.02482 (2019) - [i13]Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis. CoRR abs/1907.08294 (2019) - [i12]Taiki Nakamura, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Hiroshi Saruwatari:
V2S attack: building DNN-based voice conversion from automatic speaker verification. CoRR abs/1908.01454 (2019) - [i11]Yuki Kubo, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Acceleration of rank-constrained spatial covariance matrix estimation for blind speech extraction. CoRR abs/1908.01964 (2019) - [i10]Shinnosuke Takamichi, Kentaro Mitsui, Yuki Saito, Tomoki Koriyama, Naoko Tanji, Hiroshi Saruwatari:
JVS corpus: free Japanese multi-speaker voice corpus. CoRR abs/1908.06248 (2019) - [i9]Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari:
HumanGAN: generative adversarial network with human-based discriminator and its evaluation in speech perception modeling. CoRR abs/1909.11391 (2019)
- 2018
- [j59]Daichi Kitamura, Shinichi Mogami, Yoshiki Mitsui, Norihiro Takamune, Hiroshi Saruwatari, Nobutaka Ono, Yu Takahashi, Kazunobu Kondo:
Generalized independent low-rank matrix analysis using heavy-tailed distributions for blind source separation. EURASIP J. Adv. Signal Process. 2018: 28 (2018) - [j58]Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Sound Field Recording Using Distributed Microphones Based on Harmonic Analysis of Infinite Order. IEEE Signal Process. Lett. 25(1): 135-139 (2018) - [j57]Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks. IEEE ACM Trans. Audio Speech Lang. Process. 26(1): 84-96 (2018) - [j56]Naoki Murata, Shoichi Koyama, Norihiro Takamune, Hiroshi Saruwatari:
Sparse Representation Using Multidimensional Mixed-Norm Penalty With Application to Sound Field Decomposition. IEEE Trans. Signal Process. 66(12): 3327-3338 (2018) - [c224]Masakazu Une, Yuki Saito, Shinnosuke Takamichi, Daichi Kitamura, Ryoichi Miyazaki, Hiroshi Saruwatari:
Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech. APSIPA 2018: 340-344 - [c223]Takanori Akiyama, Shinnosuke Takamichi, Hiroshi Saruwatari:
Prosody-aware subword embedding considering Japanese intonation systems and its application to DNN-based multi-dialect speech synthesis. APSIPA 2018: 659-664 - [c222]Shinichi Mogami, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo, Hiroaki Nakajima, Nobutaka Ono:
Independent Low-Rank Matrix Analysis Based on Time-Variant Sub-Gaussian Source Model. APSIPA 2018: 1684-1691 - [c221]Shinichi Mogami, Hayato Sumino, Daichi Kitamura, Norihiro Takamune, Shinnosuke Takamichi, Hiroshi Saruwatari, Nobutaka Ono:
Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation. EUSIPCO 2018: 1557-1561 - [c220]Yuhta Takida, Shoichi Koyama, Hiroshi Saruwatari:
Exterior and Interior Sound Field Separation Using Convex Optimization: Comparison of Signal Models. EUSIPCO 2018: 2549-2553 - [c219]Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Sound Field Reproduction with Exterior Cancellation Using Analytical Weighting of Harmonic Coefficients. ICASSP 2018: 466-470 - [c218]Yoshiki Mitsui, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Vectorwise Coordinate Descent Algorithm for Spatially Regularized Independent Low-Rank Matrix Analysis. ICASSP 2018: 746-750 - [c217]Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Text-to-Speech Synthesis Using STFT Spectra Based on Low-/Multi-Resolution Generative Adversarial Networks. ICASSP 2018: 5299-5303 - [c216]Yuhta Takida, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari:
Gridless Sound Field Decomposition Based on Reciprocity Gap Functional in Spherical Harmonic Domain. SAM 2018: 627-631 - [c215]Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Kernel Ridge Regression with Constraint of Helmholtz Equation for Sound Field Interpolation. IWAENC 2018: 1-440 - [c214]Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Phase Reconstruction from Amplitude Spectrograms Based on Von-Mises-Distribution Deep Neural Network. IWAENC 2018: 286-290 - [c213]Shinnosuke Takamichi, Hiroshi Saruwatari:
CPJD Corpus: Crowdsourced Parallel Speech Corpus of Japanese Dialects. LREC 2018 - [i8]Shinichi Mogami, Hayato Sumino, Daichi Kitamura, Norihiro Takamune, Shinnosuke Takamichi, Hiroshi Saruwatari, Nobutaka Ono:
Independent Deeply Learned Matrix Analysis for Multichannel Audio Source Separation. CoRR abs/1806.10307 (2018) - [i7]Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Daichi Kitamura, Hiroshi Saruwatari:
Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network. CoRR abs/1807.03474 (2018)
- 2017
- [j55]Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Voice Conversion Using Input-to-Output Highway Networks. IEICE Trans. Inf. Syst. 100-D(8): 1925-1928 (2017) - [j54]Yoshiaki Bando, Hiroshi Saruwatari, Nobutaka Ono, Shoji Makino, Katsutoshi Itoyama, Daichi Kitamura, Masaru Ishimura, Moe Takakusaki, Narumi Mae, Kouei Yamaoka, Yutaro Matsui, Yuichi Ambe, Masashi Konyo, Satoshi Tadokoro, Kazuyoshi Yoshii, Hiroshi G. Okuno:
Low Latency and High Quality Two-Stage Human-Voice-Enhancement System for a Hose-Shaped Rescue Robot. J. Robotics Mechatronics 29(1): 198-212 (2017) - [c212]Narumi Mae, Yoshiki Mitsui, Shoji Makino, Daichi Kitamura, Nobutaka Ono, Takeshi Yamada, Hiroshi Saruwatari:
Sound source localization using binaural difference for hose-shaped rescue robot. APSIPA 2017: 1621-1627 - [c211]Shinnosuke Takamichi, Daisuke Saito, Hiroshi Saruwatari, Nobuaki Minematsu:
The UTokyo speech synthesis system for Blizzard Challenge 2017. Blizzard Challenge 2017 - [c210]Yoshiki Mitsui, Daichi Kitamura, Norihiro Takamune, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Independent low-rank matrix analysis based on parametric majorization-equalization algorithm. CAMSAP 2017: 1-5 - [c209]Daichi Kitamura, Nobutaka Ono, Hiroshi Saruwatari:
Experimental analysis of optimal window length for independent low-rank matrix analysis. EUSIPCO 2017: 1170-1174 - [c208]Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Listening-area-informed sound field reproduction with Gaussian prior based on circular harmonic expansion. HSCMA 2017: 196-200 - [c207]Narumi Mae, Masaru Ishimura, Shoji Makino, Daichi Kitamura, Nobutaka Ono, Takeshi Yamada, Hiroshi Saruwatari:
Ego Noise Reduction for Hose-Shaped Rescue Robot Combining Independent Low-Rank Matrix Analysis and Multichannel Noise Cancellation. LVA/ICA 2017: 141-151 - [c206]Yoshiki Mitsui, Daichi Kitamura, Shinnosuke Takamichi, Nobutaka Ono, Hiroshi Saruwatari:
Blind source separation based on independent low-rank matrix analysis with sparse regularization for time-series activity. ICASSP 2017: 21-25 - [c205]Natsuki Ueno, Shoichi Koyama, Hiroshi Saruwatari:
Listening-area-informed sound field reproduction based on circular harmonic expansion. ICASSP 2017: 111-115 - [c204]Naoki Murata, Shoichi Koyama, Norihiro Takamune, Hiroshi Saruwatari:
Spatio-temporal sparse sound field decomposition considering acoustic source signal characteristics. ICASSP 2017: 441-445 - [c203]Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Training algorithm to deceive Anti-Spoofing Verification for DNN-based speech synthesis. ICASSP 2017: 4900-4904 - [c202]Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities. INTERSPEECH 2017: 1268-1272 - [c201]Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari:
Sampling-Based Speech Parameter Generation Using Moment-Matching Networks. INTERSPEECH 2017: 3961-3965 - [c200]Shinichi Mogami, Daichi Kitamura, Yoshiki Mitsui, Norihiro Takamune, Hiroshi Saruwatari, Nobutaka Ono:
Independent low-rank matrix analysis based on complex student's t-distribution for blind audio source separation. MLSP 2017: 1-6 - [i6]Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities. CoRR abs/1704.02360 (2017) - [i5]Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari:
Sampling-based speech parameter generation using moment-matching networks. CoRR abs/1704.03626 (2017) - [i4]Shinichi Mogami, Daichi Kitamura, Yoshiki Mitsui, Norihiro Takamune, Hiroshi Saruwatari, Nobutaka Ono:
Independent Low-Rank Matrix Analysis Based on Complex Student's t-Distribution for Blind Audio Source Separation. CoRR abs/1708.04795 (2017) - [i3]Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks. CoRR abs/1709.08041 (2017) - [i2]Yoshiki Mitsui, Daichi Kitamura, Norihiro Takamune, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Independent Low-Rank Matrix Analysis Based on Parametric Majorization-Equalization Algorithm. CoRR abs/1710.01589 (2017) - [i1]Ryosuke Sonobe, Shinnosuke Takamichi, Hiroshi Saruwatari:
JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis. CoRR abs/1711.00354 (2017)
- 2016
- [j53]Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, Hiroshi Saruwatari:
Determined Blind Source Separation Unifying Independent Vector Analysis and Nonnegative Matrix Factorization. IEEE ACM Trans. Audio Speech Lang. Process. 24(9): 1626-1641 (2016) - [c199]Hiroaki Nakajima, Daichi Kitamura, Norihiro Takamune, Shoichi Koyama, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Audio signal separation using supervised NMF with time-variant all-pole-model-based basis deformation. APSIPA 2016: 1-7 - [c198]Hiroaki Nakajima, Daichi Kitamura, Norihiro Takamune, Shoichi Koyama, Hiroshi Saruwatari, Nobutaka Ono, Yu Takahashi, Kazunobu Kondo:
Music signal separation using supervised NMF with all-pole-model-based discriminative basis deformation. EUSIPCO 2016: 1143-1147 - [c197]Naoki Murata, Hirokazu Kameoka, Keisuke Kinoshita, Shoko Araki, Tomohiro Nakatani, Shoichi Koyama, Hiroshi Saruwatari:
Reverberation-robust underdetermined source separation with non-negative tensor double deconvolution. EUSIPCO 2016: 1648-1652 - [c196]Yuki Mitsufuji, Shoichi Koyama, Hiroshi Saruwatari:
Multichannel blind source separation based on non-negative tensor factorization in wavenumber domain. ICASSP 2016: 56-60 - [c195]Naoki Murata, Shoichi Koyama, Hirokazu Kameoka, Norihiro Takamune, Hiroshi Saruwatari:
Sparse sound field decomposition with multichannel extension of complex NMF. ICASSP 2016: 345-349 - [c194]Shoichi Koyama, Hiroshi Saruwatari:
Sound field decomposition in reverberant environment using sparse and low-rank signal models. ICASSP 2016: 395-399 - [c193]Li Li, Hirokazu Kameoka, Takuya Higuchi, Hiroshi Saruwatari:
Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech. INTERSPEECH 2016: 3753-3757 - [c192]Masaru Ishimura, Shoji Makino, Takeshi Yamada, Nobutaka Ono, Hiroshi Saruwatari:
Noise reduction using independent vector analysis and noise cancellation for a hose-shaped rescue robot. IWAENC 2016: 1-5 - [c191]Daichi Kitamura, Nobutaka Ono, Hiroshi Saruwatari, Yu Takahashi, Kazunobu Kondo:
Discriminative and reconstructive basis training for audio source separation with semi-supervised nonnegative matrix factorization. IWAENC 2016: 1-5 - [c190]Moe Takakusaki, Daichi Kitamura, Nobutaka Ono, Takeshi Yamada, Shoji Makino, Hiroshi Saruwatari:
Ego-noise reduction for a hose-shaped rescue robot using determined rank-1 multichannel nonnegative matrix factorization. IWAENC 2016: 1-4
- 2015
- [j52]Daichi Kitamura, Hiroshi Saruwatari, Hirokazu Kameoka, Yu Takahashi, Kazunobu Kondo, Satoshi Nakamura:
Multichannel Signal Separation Combining Directional Clustering and Nonnegative Matrix Factorization with Spectrogram Restoration. IEEE ACM Trans. Audio Speech Lang. Process. 23(4): 654-669 (2015) - [c189]Shoichi Koyama, Atsushi Matsubayashi, Naoki Murata, Hiroshi Saruwatari:
Sparse sound field decomposition using group sparse Bayesian learning. APSIPA 2015: 850-855 - [c188]Naoki Murata, Shoichi Koyama, Norihiro Takamune, Hiroshi Saruwatari:
Sparse sound field decomposition with parametric dictionary learning for super-resolution recording and reproduction. CAMSAP 2015: 69-72 - [c187]Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, Hiroshi Saruwatari:
Relaxation of rank-1 spatial constraint in overdetermined blind source separation. EUSIPCO 2015: 1261-1265 - [c186]Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, Hiroshi Saruwatari:
Efficient multichannel nonnegative matrix factorization exploiting rank-1 spatial model. ICASSP 2015: 276-280 - [c185]Yuki Murota, Daichi Kitamura, Shoichi Koyama, Hiroshi Saruwatari, Satoshi Nakamura:
Statistical modeling of binaural signal and its application to binaural source separation. ICASSP 2015: 494-498 - [c184]Shoichi Koyama, Naoki Murata, Hiroshi Saruwatari:
Structured sparse signal models and decomposition algorithm for super-resolution in sound field recording and reproduction. ICASSP 2015: 619-623 - [c183]Hiroshi Saruwatari:
Statistical-model-based speech enhancement with musical-noise-free properties. DSP 2015: 1201-1205 - [c182]Shoichi Koyama, Koichiro Ito, Hiroshi Saruwatari:
Source-location-informed sound field recording and reproduction with spherical arrays. WASPAA 2015: 1-5
- 2014
- [j51]Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano, Yu Takahashi, Kazunobu Kondo:
Music Signal Separation Based on Supervised Nonnegative Matrix Factorization with Orthogonality and Maximum-Divergence Penalties. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 97-A(5): 1113-1118 (2014) - [j50]Ryoichi Miyazaki, Hiroshi Saruwatari, Satoshi Nakamura, Kiyohiro Shikano, Kazunobu Kondo, Jonathan Blanchette, Martin Bouchard:
Musical-noise-free blind speech extraction integrating microphone array and iterative spectral subtraction. Signal Process. 102: 226-239 (2014) - [j49]Hironori Doi, Tomoki Toda, Keigo Nakamura, Hiroshi Saruwatari, Kiyohiro Shikano:
Alaryngeal Speech Enhancement Based on One-to-Many Eigenvoice Conversion. IEEE ACM Trans. Audio Speech Lang. Process. 22(1): 172-183 (2014) - [c181]Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka:
Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration. APSIPA 2014: 1-10 - [c180]Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka:
Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation. HSCMA 2014: 92-96 - [c179]Shunsuke Nakai, Hiroshi Saruwatari, Ryoichi Miyazaki, Satoshi Nakamura, Kazunobu Kondo:
Theoretical analysis of biased MMSE short-time spectral amplitude estimator and its extension to musical-noise-free speech enhancement. HSCMA 2014: 122-126 - [c178]Fine Dwinita Aprilyanti, Hiroshi Saruwatari, Satoshi Nakamura, Tomoya Takatani:
Optimized joint noise suppression and dereverberation based on blind signal extraction for hands-free speech recognition system. HSCMA 2014: 182-186 - [c177]Yuki Murota, Daichi Kitamura, Shunsuke Nakai, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo:
Music signal separation based on Bayesian spectral amplitude estimator with automatic target prior adaptation. ICASSP 2014: 7490-7494
- 2013
- [j48]Rafael Torres, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano:
Comparison of Methods for Topic Classification of Spoken Inquiries. Inf. Media Technol. 8(2): 438-448 (2013) - [j47]Frédéric Mustière, Martin Bouchard, Hossein Najaf-Zadeh, Ramin Pichevar, Louis Thibault, Hiroshi Saruwatari:
Design of multichannel frequency domain statistical-based enhancement systems preserving spatial cues via spectral distances minimization. Signal Process. 93(1): 321-325 (2013) - [c176]Fine Dwinita Aprilyanti, Hiroshi Saruwatari, Kiyohiro Shikano, Satoshi Nakamura, Tomoya Takatani:
Semi-blind algorithm for joint noise suppression and dereverberation based on higher-order statistics and acoustic model likelihood. APSIPA 2013: 1-6 - [c175]Ryoichi Miyazaki, Hiroshi Saruwatari, Satoshi Nakamura, Kiyohiro Shikano, Kazunobu Kondo, Jonathan Blanchette, Martin Bouchard:
Toward musical-noise-free blind speech extraction: Concept and its applications. APSIPA 2013: 1-10 - [c174]Daichi Kitamura, Hiroshi Saruwatari, Yusuke Iwao, Kiyohiro Shikano, Kazunobu Kondo, Yu Takahashi:
Superresolution-based stereo signal separation via supervised nonnegative matrix factorization. DSP 2013: 1-6 - [c173]Daichi Kitamura, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo, Yu Takahashi:
Music signal separation by supervised nonnegative matrix factorization with basis deformation. DSP 2013: 1-6 - [c172]Hiroshi Saruwatari, Suzumi Kanehara, Ryoichi Miyazaki, Kiyohiro Shikano, Kazunobu Kondo:
Musical noise analysis for Bayesian minimum mean-square error speech amplitude estimators based on higher-order statistics. INTERSPEECH 2013: 441-445 - [c171]Hiroshi Saruwatari, Ryoichi Miyazaki:
Information-geometric optimization for nonlinear noise reduction systems. ISPACS 2013: 192-197 - [c170]Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano, Yu Takahashi, Kazunobu Kondo:
Robust music signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing. ISSPIT 2013: 392-397 - 2012
- [j46]Ryoichi Miyazaki, Hiroshi Saruwatari, Kiyohiro Shikano:
Theoretical Analysis of Amounts of Musical Noise and Speech Distortion in Structure-Generalized Parametric Blind Spatial Subtraction Array. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 95-A(2): 586-590 (2012) - [j45]Ryo Wakisaka, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani:
Speech Prior Estimation for Generalized Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 95-A(2): 591-595 (2012) - [j44]Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech. Speech Commun. 54(1): 134-146 (2012) - [j43]Ryoichi Miyazaki, Hiroshi Saruwatari, Takayuki Inoue, Yu Takahashi, Kiyohiro Shikano, Kazunobu Kondo:
Musical-Noise-Free Speech Enhancement Based on Optimized Iterative Spectral Subtraction. IEEE Trans. Speech Audio Process. 20(7): 2080-2094 (2012) - [c169]Fine Dwinita Aprilyanti, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani:
Optimization scheme of joint noise suppression and dereverberation based on higher-order statistics. APSIPA 2012: 1-6 - [c168]Suzumi Kanehara, Hiroshi Saruwatari, Ryoichi Miyazaki, Kiyohiro Shikano, Kazunobu Kondo:
Comparative study on various noise reduction methods with decision-directed a priori SNR estimator via higher-order statistics. APSIPA 2012: 1-6 - [c167]Kazuma Nishimura, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Response generation based on statistical machine translation for speech-oriented guidance system. APSIPA 2012: 1-4 - [c166]Yuji Onuma, Noriyoshi Kamado, Hiroshi Saruwatari, Kiyohiro Shikano:
Real-time semi-blind speech extraction with speaker direction tracking on Kinect. APSIPA 2012: 1-6 - [c165]Yu Takahashi, Ryoichi Miyazaki, Hiroshi Saruwatari, Kazunobu Kondo:
Theoretical analysis of musical noise in nonlinear noise reduction based on higher-order statistics. APSIPA 2012: 1-10 - [c164]Hiroshi Saruwatari, Ryo Wakisaka, Kiyohiro Shikano, Frédéric Mustière, Louis Thibault, Hossein Najaf-Zadeh, Martin Bouchard:
Sound-localization-preserved binaural MMSE STSA estimator with explicit and implicit binaural cues. EUSIPCO 2012: 310-314 - [c163]Noriyoshi Kamado, Masayuki Hirata, Hiroshi Saruwatari, Kiyohiro Shikano:
Object-based stereo up-mixer for wave field synthesis based on spatial information clustering. EUSIPCO 2012: 594-598 - [c162]Ryo Wakisaka, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani:
Speech kurtosis estimation from observed noisy signal based on generalized Gaussian distribution prior and additivity of cumulants. ICASSP 2012: 4049-4052 - [c161]Kenzo Yamamoto, Tomoki Toda, Hironori Doi, Hiroshi Saruwatari, Kiyohiro Shikano:
Statistical approach to voice quality control in esophageal speech enhancement. ICASSP 2012: 4497-4500 - [c160]Ryoichi Miyazaki, Hiroshi Saruwatari, Takayuki Inoue, Kiyohiro Shikano, Kazunobu Kondo:
Musical-noise-free speech enhancement: Theory and evaluation. ICASSP 2012: 4565-4568 - [c159]Haruka Majima, Rafael Torres, Yoko Fujita, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano:
Spoken Inquiry Discrimination Using Bag-of-Words for Speech-Oriented Guidance System. INTERSPEECH 2012: 2097-2100 - [c158]Keigo Kubo, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Evaluation of Many-to-Many Alignment Algorithm by Automatic Pronunciation Annotation Using Web Text Mining. INTERSPEECH 2012: 2318-2321 - [c157]Ryoichi Miyazaki, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo:
Musical-noise-free blind speech extraction using ICA-based noise estimation and iterative spectral subtraction. ISSPA 2012: 286-291 - [c156]Miyuki Itoi, Ryoichi Miyazaki, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Blind speech extraction for Non-Audible Murmur speech with speaker's movement noise. ISSPIT 2012: 320-325 - [c155]Suzumi Kanehara, Hiroshi Saruwatari, Ryoichi Miyazaki, Kiyohiro Shikano, Kazunobu Kondo:
Theoretical Analysis of Musical Noise Generation in Noise Reduction Methods with Decision-Directed a Priori SNR Estimator. IWAENC 2012 - [c154]Ryoichi Miyazaki, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo:
Musical-Noise-Free Blind Speech Extraction Using ICA-Based Noise Estimation with Channel Selection. IWAENC 2012 - [c153]Sunao Hara, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Development of a Toolkit Handling Multiple Speech-Oriented Guidance Agents for Mobile Applications. IWSDS 2012: 79-85 - [c152]Rafael Torres, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano:
Topic Classification of Spoken Inquiries Using Transductive Support Vector Machine. IWSDS 2012: 261-267 - [c151]Haruka Majima, Rafael Torres, Hiromichi Kawanami, Sunao Hara, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano:
Evaluation of Invalid Input Discrimination Using Bag-of-Words for Speech-Oriented Guidance System. IWSDS 2012: 389-397 - 2011
- [j42]Noriyoshi Kamado, Haruhide Hokari, Shoji Shimada, Hiroshi Saruwatari, Kiyohiro Shikano:
Sound Field Reproduction by Wavefront Synthesis Using Directly Aligned Multi Point Control. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 94-A(3): 907-920 (2011) - [j41]Hiroshi Saruwatari, Yohei Ishikawa, Yu Takahashi, Takayuki Inoue, Kiyohiro Shikano, Kazunobu Kondo:
Musical Noise Controllable Algorithm of Channelwise Spectral Subtraction and Adaptive Beamforming Based on Higher Order Statistics. IEEE Trans. Speech Audio Process. 19(6): 1457-1466 (2011) - [j40]Takayuki Inoue, Hiroshi Saruwatari, Yu Takahashi, Kiyohiro Shikano, Kazunobu Kondo:
Theoretical Analysis of Musical Noise in Generalized Spectral Subtraction Based on Higher Order Statistics. IEEE Trans. Speech Audio Process. 19(6): 1770-1779 (2011) - [c150]Shunta Ishii, Tomoki Toda, Hiroshi Saruwatari, Sakriani Sakti, Satoshi Nakamura:
Blind noise suppression for Non-Audible Murmur recognition with stereo signal processing. ASRU 2011: 494-499 - [c149]Kazunobu Kondo, Yu Takahashi, Seiichi Hashimoto, Hiroshi Saruwatari, Takanori Nishino, Kazuya Takeda:
Efficient blind speech separation suitable for embedded devices. EUSIPCO 2011: 2319-2323 - [c148]Hiroyuki Nawata, Noriyoshi Kamado, Hiroshi Saruwatari, Kiyohiro Shikano:
Automatic musical thumbnailing based on audio object localization and its evaluation. ICASSP 2011: 41-44 - [c147]Noriyoshi Kamado, Hiroshi Saruwatari, Kiyohiro Shikano:
Robust sound field reproduction integrating multi-point sound field control and wave field synthesis. ICASSP 2011: 441-444 - [c146]Takayuki Inoue, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo:
Theoretical analysis of musical noise in Wiener filtering family via higher-order statistics. ICASSP 2011: 5076-5079 - [c145]Hironori Doi, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques. ICASSP 2011: 5136-5139 - [c144]Denis Babani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Acoustic model training for non-audible murmur recognition using transformed normal speech data. ICASSP 2011: 5224-5227 - [c143]Ryoichi Miyazaki, Hiroshi Saruwatari, Kiyohiro Shikano:
Theoretical Analysis of Musical Noise and Speech Distortion in Structure-Generalized Parametric Blind Spatial Subtraction Array. INTERSPEECH 2011: 341-344 - [c142]Ryo Wakisaka, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani:
Blind Speech Prior Estimation for Generalized Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator. INTERSPEECH 2011: 361-364 - [c141]Nobuhiko Hattori, Tomoki Toda, Hisashi Kawai, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaker-Adaptive Speech Synthesis Based on Eigenvoice Conversion and Language-Dependent Prosodic Conversion in Speech-to-Speech Translation. INTERSPEECH 2011: 2769-2772 - [c140]Hiroshi Saruwatari, Nobuhisa Hirata, Toshiyuki Hatta, Ryo Wakisaka, Kiyohiro Shikano, Tomoya Takatani:
Semi-blind speech extraction for robot using visual information and noise statistics. ISSPIT 2011: 264-269 - 2010
- [j39]Yu Takahashi, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo:
Musical-Noise Analysis in Methods of Integrating Microphone Array and Spectral Subtraction Based on Higher-Order Statistics. EURASIP J. Adv. Signal Process. 2010 (2010) - [j38]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Adaptive Training for Voice Conversion Based on Eigenvoices. IEICE Trans. Inf. Syst. 93-D(6): 1589-1598 (2010) - [j37]Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Evaluation of Extremely Small Sound Source Signals Used in Speaking-Aid System with Statistical Voice Conversion. IEICE Trans. Inf. Syst. 93-D(7): 1909-1917 (2010) - [j36]Hironori Doi, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models. IEICE Trans. Inf. Syst. 93-D(9): 2472-2482 (2010) - [j35]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Improvements of the One-to-Many Eigenvoice Conversion System. IEICE Trans. Inf. Syst. 93-D(9): 2491-2499 (2010) - [c139]Yohei Ishikawa, Hiroshi Saruwatari, Yu Takahashi, Kiyohiro Shikano, Kazunobu Kondo:
Musical noise controllable algorithm of channelwise spectral subtraction and beamforming based on higher-order statistics criterion. CIP 2010: 81-86 - [c138]Takayuki Inoue, Yu Takahashi, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo:
Theoretical analysis of musical noise in generalized spectral subtraction: Why should not use power/amplitude subtraction? EUSIPCO 2010: 994-998 - [c137]Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani:
Blind signal extraction based joint suppression of diffuse background noise and late reverberation. EUSIPCO 2010: 1534-1538 - [c136]Hiroshi Saruwatari, Ryoi Okamoto, Yu Takahashi, Kiyohiro Shikano:
Blind Speech Extraction Combining Generalized MMSE STSA Estimator and ICA-Based Noise and Speech Probability Density Function Estimations. LVA/ICA 2010: 49-56 - [c135]Yu Takahashi, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo:
Theoretical musical-noise analysis and its generalization for methods of integrating beamforming and spectral subtraction based on higher-order statistics. ICASSP 2010: 93-96 - [c134]Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani:
Complex Newton algorithm for blind signal extraction of speech in diffuse noise. ICASSP 2010: 213-216 - [c133]Hironori Doi, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Statistical approach to enhancing esophageal speech based on Gaussian mixture models. ICASSP 2010: 4250-4253 - [c132]Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani:
Speech enhancement in presence of diffuse background noise: Why using blind signal extraction? ICASSP 2010: 4770-4773 - [c131]Ryoi Okamoto, Yu Takahashi, Hiroshi Saruwatari, Kiyohiro Shikano:
MMSE STSA estimator with nonstationary noise estimation based on ICA for high-quality speech enhancement. ICASSP 2010: 4778-4781 - [c130]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Non-parallel training for many-to-many eigenvoice conversion. ICASSP 2010: 4822-4825 - [c129]Jani Even, Carlos Toshinori Ishi, Hiroshi Saruwatari, Norihiro Hagita:
Close speaker cancellation for suppression of non-stationary background noise for hands-free speech interface. INTERSPEECH 2010: 977-980 - [c128]Rafael Torres, Shota Takeuchi, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano:
Comparison of methods for topic classification in a speech-oriented guidance system. INTERSPEECH 2010: 1261-1264 - [c127]Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
The use of air-pressure sensor in electrolaryngeal speech enhancement based on statistical voice conversion. INTERSPEECH 2010: 1628-1631 - [c126]Kumi Ohta, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano:
Adaptive voice-quality control based on one-to-many eigenvoice conversion. INTERSPEECH 2010: 2158-2161 - [c125]Hiroshi Sawada, Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani:
Improvement of speech recognition performance for spoken-oriented robot dialog system using end-fire array. IROS 2010: 970-975 - [c124]Chie Hayashida, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano:
Linear transformation approaches to many-to-one voice conversion. SSW 2010: 74-79
2000 – 2009
- 2009
- [j34]Rajkishore Prasad, Hiroshi Saruwatari, Kiyohiro Shikano:
Enhancement of speech signals separated from their convolutive mixture by FDICA algorithm. Digit. Signal Process. 19(1): 127-133 (2009) - [j33]Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Techniques in rapid unsupervised speaker adaptation based on HMM-Sufficient Statistics. Speech Commun. 51(1): 42-57 (2009) - [j32]Yu Takahashi, Tomoya Takatani, Keiichi Osako, Hiroshi Saruwatari, Kiyohiro Shikano:
Blind Spatial Subtraction Array for Speech Enhancement in Noisy Environment. IEEE Trans. Speech Audio Process. 17(4): 650-664 (2009) - [c123]Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano:
Enhanced Wiener post-processing based on partial projection back of the blind signal separation noise estimate. EUSIPCO 2009: 1442-1446 - [c122]Takashi Hiekata, Takashi Morita, Youhei Ikeda, Hiroshi Hashimoto, Ruoyu Zhang, Yu Takahashi, Hiroshi Saruwatari, Kiyohiro Shikano:
Multiple ICA-based real-time blind source extraction applied to handy size microphone. ICASSP 2009: 121-124 - [c121]Yu Takahashi, Yoshihisa Uemura, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo:
Musical noise analysis based on higher order statistics for microphone array and nonlinear signal processing. ICASSP 2009: 229-232 - [c120]Shigeki Miyabe, Biing-Hwang Juang, Hiroshi Saruwatari, Kiyohiro Shikano:
Kernel-based nonlinear independent component analysis for underdetermined blind source separation. ICASSP 2009: 1641-1644 - [c119]Yu Takahashi, Hiroshi Saruwatari, Yuki Fujihara, Kentaro Tachibana, Yoshimitsu Mori, Shigeki Miyabe, Kiyohiro Shikano, Akira Tanaka:
Source adaptive blind signal extraction using closed-form ICA for hands-free robot spoken dialogue system. ICASSP 2009: 3681-3684 - [c118]Hiroshi Saruwatari, Hiromichi Kawanami, Shota Takeuchi, Yu Takahashi, Tobias Cincarek, Kiyohiro Shikano:
Hands-free speech recognition challenge for real-world speech dialogue systems. ICASSP 2009: 3729-3732 - [c117]Daisuke Miyamoto, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Acoustic compensation methods for body transmitted speech conversion. ICASSP 2009: 3901-3904 - [c116]Yoshihisa Uemura, Yu Takahashi, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo:
Musical noise generation analysis for noise reduction methods based on spectral subtraction and MMSE STSA estimation. ICASSP 2009: 4433-4436 - [c115]Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano:
Target Speech Enhancement in Presence of Jammer and Diffuse Background Noise. ICA 2009: 565-572 - [c114]Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Electrolaryngeal speech enhancement based on statistical voice conversion. INTERSPEECH 2009: 1431-1434 - [c113]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Many-to-many eigenvoice conversion with reference voice. INTERSPEECH 2009: 1623-1626 - [c112]Jani Even, Hiroshi Sawada, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani:
Semi-blind suppression of internal noise for hands-free robot spoken dialog system. IROS 2009: 658-663 - [c111]Shigeki Miyabe, Keisuke Masatoki, Hiroshi Saruwatari, Kiyohiro Shikano, Toshiyuki Nomura:
Temporal quantization of spatial information using directional clustering for multichannel audio coding. WASPAA 2009: 261-264 - 2008
- [j31]Tobias Cincarek, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Cost Reduction of Acoustic Modeling for Real-Environment Applications Using Unsupervised and Selective Training. IEICE Trans. Inf. Syst. 91-D(3): 499-507 (2008) - [j30]Tobias Cincarek, Hiromichi Kawanami, Ryuichi Nisimura, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System. IEICE Trans. Inf. Syst. 91-D(3): 576-587 (2008) - [j29]Goshu Nagino, Makoto Shozakai, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method. IEICE Trans. Inf. Syst. 91-D(3): 607-614 (2008) - [j28]Yuki Yai, Shigeki Miyabe, Hiroshi Saruwatari, Kiyohiro Shikano, Yosuke Tatekura:
Rapid Compensation of Temperature Fluctuation Effect for Multichannel Sound Field Reproduction System. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 91-A(6): 1329-1336 (2008) - [j27]Keiichi Osako, Yoshimitsu Mori, Yu Takahashi, Hiroshi Saruwatari, Kiyohiro Shikano:
Fast Convergence Blind Source Separation Using Frequency Subband Interpolation by Null Beamforming. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 91-A(6): 1357-1361 (2008) - [c110]Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano:
Extension of score function difference for frequency domain blind source separation. EUSIPCO 2008: 1-5 - [c109]Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano:
Frequency domain semi-blind signal separation: application to the rejection of internal noises. ICASSP 2008: 157-160 - [c108]Yuuki Haraguchi, Shigeki Miyabe, Hiroshi Saruwatari, Kiyohiro Shikano, Toshiyuki Nomura:
Source-oriented localization control of stereo audio signals based on blind source separation. ICASSP 2008: 177-180 - [c107]Yuuta Yuyama, Shigeki Miyabe, Hiroshi Saruwatari, Kiyohiro Shikano:
Hybrid structure of inverse filtering and DOA-parameterized wavefront synthesis. ICASSP 2008: 401-404 - [c106]Randy Gomez, Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano:
Distant talking robust speech recognition using late reflection components of room impulse response. ICASSP 2008: 4581-4584 - [c105]Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Question and answer database optimization using speech recognition results. INTERSPEECH 2008: 451-454 - [c104]Hiroshi Saruwatari, Yu Takahashi, Hiroyuki Sakai, Shota Takeuchi, Tobias Cincarek, Hiromichi Kawanami, Kiyohiro Shikano:
Development and evaluation of hands-free spoken dialogue system for railway station guidance. INTERSPEECH 2008: 455-458 - [c103]Takashi Muramatsu, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory. INTERSPEECH 2008: 1076-1079 - [c102]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
An improved one-to-many eigenvoice conversion system. INTERSPEECH 2008: 1080-1083 - [c101]Hideki Okamoto, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaker verification with non-audible murmur segments by combining global alignment kernel and penalized logistic regression machine. INTERSPEECH 2008: 1369-1372 - [c100]Daisuke Tani, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano:
Maximum a posteriori adaptation for many-to-one eigenvoice conversion. INTERSPEECH 2008: 1461-1463 - [c99]Keigo Nakamura, Tomoki Toda, Yoshitaka Nakajima, Hiroshi Saruwatari, Kiyohiro Shikano:
Evaluation of speaking-aid system with voice conversion for laryngectomees toward its use in practical environments. INTERSPEECH 2008: 2209-2212 - [c98]Yu Takahashi, Hiroshi Saruwatari, Kiyohiro Shikano:
Real-time implementation of blind spatial subtraction array for hands-free robot spoken dialogue system. IROS 2008: 1687-1692 - [c97]Jani Even, Hiroshi Saruwatari, Kiyohiro Shikano:
An improved permutation solver for blind signal separation based front-ends in robot audition. IROS 2008: 2172-2177 - [c96]Jumpei Miyake, Shota Takeuchi, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Language model for the web search task in a spoken dialogue system for children. WOCCI 2008: 10 - 2007
- [j26]Panikos Heracleous, Tomomi Kaino, Hiroshi Saruwatari, Kiyohiro Shikano:
Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor. EURASIP J. Adv. Signal Process. 2007 (2007) - [j25]Shigeki Miyabe, Yoichi Hinamoto, Hiroshi Saruwatari, Kiyohiro Shikano, Yosuke Tatekura:
Interface for Barge-in Free Spoken Dialogue System Based on Sound Field Reproduction and Microphone Array. EURASIP J. Adv. Signal Process. 2007 (2007) - [j24]Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Reducing Computation Time of the Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics. IEICE Trans. Inf. Syst. 90-D(2): 554-561 (2007) - [c95]Tobias Cincarek, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Development and portability of ASR and Q&A modules for real-environment speech-oriented guidance systems. ASRU 2007: 520-525 - [c94]Shigeki Miyabe, Tomoya Takatani, Hiroshi Saruwatari, Kiyohiro Shikano, Yosuke Tatekura:
Barge-in- and noise-free spoken dialogue interface based on sound field control and semi-blind source separation. EUSIPCO 2007: 232-236 - [c93]Kentaro Tachibana, Hiroshi Saruwatari, Yoshimitsu Mori, Shigeki Miyabe, Kiyohiro Shikano, Akira Tanaka:
Efficient Blind Source Separation Combining Closed-Form Second-Order ICA and Nonclosed-Form Higher-Order ICA. ICASSP (1) 2007: 45-48 - [c92]Yu Takahashi, Tomoya Takatani, Hiroshi Saruwatari, Kiyohiro Shikano:
Permutation-Robust Structure for ICA-Based Blind Source Extraction. ICASSP (1) 2007: 149-152 - [c91]Yoshimitsu Mori, Tomoya Takatani, Hiroshi Saruwatari, Kiyohiro Shikano, Takashi Hiekata, Takashi Morita:
High-Presence Hearing-Aid System using DSP-Based Real-Time Blind Source Separation Module. ICASSP (4) 2007: 609-612 - [c90]Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection. INTERSPEECH 2007: 262-265 - [c89]Tobias Cincarek, Izumi Shindo, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Development of preschool children subsystem for ASR and Q&A in a real-environment speech-oriented guidance task. INTERSPEECH 2007: 1469-1472 - [c88]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaker adaptive training for one-to-many eigenvoice conversion based on Gaussian mixture model. INTERSPEECH 2007: 1981-1984 - [c87]Hideki Okamoto, Mariko Kojima, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Study on speaker verification with non-audible murmur segments. INTERSPEECH 2007: 2017-2020 - [c86]Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Impact of various small sound source signals on voice conversion accuracy in speech communication aid for laryngectomees. INTERSPEECH 2007: 2517-2520 - [c85]Yoshimitsu Mori, Tomoya Takatani, Hiroshi Saruwatari, Kiyohiro Shikano, Takashi Hiekata, Takashi Morita:
Noise-robust hands-free speech recognition using SIMO-model-based blind source separation. ISSPA 2007: 1-4 - [c84]Yu Takahashi, Tomoya Takatani, Hiroshi Saruwatari, Kiyohiro Shikano:
Robust spatial subtraction array with independent component analysis for speech enhancement. ISSPA 2007: 1-4 - [c83]Hiroyuki Sakai, Tobias Cincarek, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano, Akinobu Lee:
Voice activity detection applied to hands-free spoken dialogue robot based on decoding using acoustic and language model. ROBOCOMM 2007: 16 - [c82]Kumi Ohta, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Regression approaches to voice quality control based on one-to-many eigenvoice conversion. SSW 2007: 101-106 - [c81]Daisuke Tani, Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
An evaluation of many-to-one voice conversion algorithms with pre-stored speaker data sets. SSW 2007: 107-112 - [p1]Hiroshi Saruwatari, Tomoya Takatani, Kiyohiro Shikano:
SIMO-Model-Based Blind Source Separation - Principle and its Applications. Blind Speech Separation 2007: 149-168 - 2006
- [j23]Yoshimitsu Mori, Hiroshi Saruwatari, Tomoya Takatani, Satoshi Ukai, Kiyohiro Shikano, Takashi Hiekata, Youhei Ikeda, Hiroshi Hashimoto, Takashi Morita:
Blind Separation of Acoustic Signals Combining SIMO-Model-Based Independent Component Analysis and Binary Masking. EURASIP J. Adv. Signal Process. 2006 (2006) - [j22]Shigeki Miyabe, Hiroshi Saruwatari, Kiyohiro Shikano, Yosuke Tatekura:
Interface for Barge-in Free Spoken Dialogue System Using Nullspace Based Sound Field Control and Beamforming. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 89-A(3): 716-726 (2006) - [j21]Tobias Cincarek, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Utterance-Based Selective Training for the Automatic Creation of Task-Dependent Acoustic Models. IEICE Trans. Inf. Syst. 89-D(3): 962-969 (2006) - [j20]Randy Gomez, Akinobu Lee, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Improving Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics in Noisy Environments Using Multi-Template Models. IEICE Trans. Inf. Syst. 89-D(3): 998-1005 (2006) - [j19]Hiroshi Saruwatari, Toshiya Kawamura, Tsuyoki Nishikawa, Akinobu Lee, Kiyohiro Shikano:
Blind source separation based on a fast-convergence algorithm combining ICA and beamforming. IEEE Trans. Speech Audio Process. 14(2): 666-678 (2006) - [c80]Yoshimitsu Mori, Tomoya Takatani, Hiroshi Saruwatari, Kiyohiro Shikano, Takashi Hiekata, Takashi Morita:
Two-stage blind separation of moving sound sources with pocket-size real-time DSP module. EUSIPCO 2006: 1-5 - [c79]Yoshimitsu Mori, Hiroshi Saruwatari, Tomoya Takatani, Kiyohiro Shikano, Takashi Hiekata, Takashi Morita:
ICA and Binary-Mask-Based Blind Source Separation with Small Directional Microphones. ICA 2006: 649-657 - [c78]Yoshimitsu Mori, Tomoya Takatani, Hiroshi Saruwatari, Takashi Hiekata, Takashi Morita:
Blind Source Separation Combining SIMO-ICA and SIMO-Model-Based Binary Masking. ICASSP (5) 2006: 81-84 - [c77]Shigeki Miyabe, Tomoya Takatani, Yoshimitsu Mori, Hiroshi Saruwatari, Kiyohiro Shikano, Yosuke Tatekura:
Double-Talk Free Spoken Dialogue Interface Combining Sound Field Control With Semi-Blind Source Separation. ICASSP (1) 2006: 809-812 - [c76]Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Improving Rapid Unsupervised Speaker Adaptation Based on HMM Sufficient Statistics. ICASSP (1) 2006: 1001-1004 - [c75]Tobias Cincarek, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Acoustic modeling for spoken dialogue systems based on unsupervised utterance-based selective training. INTERSPEECH 2006 - [c74]Mariko Kojima, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaker verification with non-audible murmur segments. INTERSPEECH 2006 - [c73]Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech. INTERSPEECH 2006 - [c72]Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation. INTERSPEECH 2006 - [c71]Tomoyuki Kato, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Transcription Cost Reduction for Constructing Acoustic Models Using Acoustic Likelihood Selection Criteria. LREC 2006: 789-792 - 2005
- [j18]Kazuki Adachi, Tomoki Toda, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Designing Target Cost Function Based on Prosody of Speech Database. IEICE Trans. Inf. Syst. 88-D(3): 519-524 (2005) - [j17]Satoshi Ukai, Tomoya Takatani, Hiroshi Saruwatari, Kiyohiro Shikano, Ryo Mukai, Hiroshi Sawada:
Multistage SIMO-Model-Based Blind Source Separation Combining Frequency-Domain ICA and Time-Domain ICA. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 88-A(3): 642-650 (2005) - [j16]Tatsunori Asai, Hiroshi Saruwatari, Kiyohiro Shikano:
Interface for Barge-in Free Spoken Dialogue System Combining Adaptive Sound Field Control and Microphone Array. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 88-A(6): 1613-1618 (2005) - [j15]Tomoya Takatani, Satoshi Ukai, Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano:
A Self-Generator Method for Initial Filters of SIMO-ICA Applied to Blind Separation of Binaural Sound Mixtures. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 88-A(7): 1673-1682 (2005) - [j14]Rajkishore Prasad, Hiroshi Saruwatari, Kiyohiro Shikano:
Blind Separation of Speech by Fixed-Point ICA with Source Adaptive Negentropy Approximation. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 88-A(7): 1683-1692 (2005) - [j13]Yosuke Tatekura, Shigefumi Urata, Hiroshi Saruwatari, Kiyohiro Shikano:
On-Line Relaxation Algorithm Applicable to Acoustic Fluctuation for Inverse Filter in Multichannel Sound Reproduction System. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 88-A(7): 1747-1756 (2005) - [j12]Hiroshi Saruwatari, Hiroaki Yamajo, Tomoya Takatani, Tsuyoki Nishikawa, Kiyohiro Shikano:
Blind Separation and Deconvolution for Convolutive Mixture of Speech Combining SIMO-Model-Based ICA and Multichannel Inverse Filtering. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 88-A(9): 2387-2400 (2005) - [j11]Shoko Araki, Shoji Makino, Robert Aichner, Tsuyoki Nishikawa, Hiroshi Saruwatari:
Subband-Based Blind Separation for Convolutive Mixtures of Speech. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 88-A(12): 3593-3603 (2005) - [j10]Rajkishore Prasad, Hiroshi Saruwatari, Kiyohiro Shikano:
Estimation of Shape Parameter of GGD Function by Negentropy Matching. Neural Process. Lett. 22(3): 377-389 (2005) - [c70]Panikos Heracleous, Yoshitaka Nakajima, Hiroshi Saruwatari, Kiyohiro Shikano:
A tissue-conductive acoustic sensor applied in speech recognition for privacy. sOc-EUSAI 2005: 93-97 - [c69]Shigeki Miyabe, Hiroshi Saruwatari, Kiyohiro Shikano, Yosuke Tatekura:
Barge-in free spoken dialogue interface using nullspace-based sound field control and beamforming. EUSIPCO 2005: 1-4 - [c68]Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano:
Blind separation of more than two sources based on high-convergence algorithm combining ICA and beamforming. EUSIPCO 2005: 1-4 - [c67]Hiroshi Saruwatari, Satoshi Ukai, Tomoya Takatani, Tsuyoki Nishikawa, Kiyohiro Shikano:
Two-stage blind source separation combining SIMO-model-based ICA and adaptive beamforming. EUSIPCO 2005: 1-4 - [c66]Tomoya Takatani, Satoshi Ukai, Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano:
Blind separation of binaural sound mixtures using SIMO-ICA with self-generator for initial filter. EUSIPCO 2005: 1-4 - [c65]Satoshi Ukai, Tomoya Takatani, Tsuyoki Nishikawa, Hiroshi Saruwatari:
Blind source separation combining SIMO-model-based ICA and adaptive beamforming. ICASSP (3) 2005: 85-88 - [c64]Hiroshi Saruwatari, Katsuyuki Sawai, Tsuyoki Nishikawa, Akinobu Lee, Kiyohiro Shikano, Atsunobu Kaminuma, Masao Sakata, Daisuke Saitoh:
Speech Enhancement Based on Blind Source Separation in Car Environments. ICDE Workshops 2005: 1205 - [c63]Randy Gomez, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Rapid unsupervised speaker adaptation based on multi-template HMM sufficient statistics in noisy environments. INTERSPEECH 2005: 293-296 - [c62]Daisuke Saitoh, Atsunobu Kaminuma, Hiroshi Saruwatari, Tsuyoki Nishikawa, Akinobu Lee:
Speech extraction in a car interior using frequency-domain ICA with rapid filter adaptations. INTERSPEECH 2005: 2301-2304 - [c61]Panikos Heracleous, Tomomi Kaino, Hiroshi Saruwatari, Kiyohiro Shikano:
Investigating the role of the Lombard reflex in non-audible murmur (NAM) recognition. INTERSPEECH 2005: 2649-2652 - [c60]Panikos Heracleous, Tomomi Kaino, Hiroshi Saruwatari, Kiyohiro Shikano:
Applications of NAM microphones in speech recognition for privacy in human-machine communication. INTERSPEECH 2005: 3041-3044 - [c59]Tomoya Takatani, Satoshi Ukai, Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano:
Blind sound scene decomposition for robot audition using SIMO-model-based ICA. IROS 2005: 2247-2252 - [c58]Hiroshi Saruwatari, Yoshimitsu Mori, Tomoya Takatani, Satoshi Ukai, Kiyohiro Shikano, Takashi Hiekata, Takashi Morita:
Two-stage blind source separation based on ICA and binary masking for real-time robot audition system. IROS 2005: 2303-2308 - [c57]Yasuaki Ohashi, Tsuyoki Nishikawa, Hiroshi Saruwatari, Akinobu Lee, Kiyohiro Shikano:
Noise-robust hands-free speech recognition based on spatial subtraction array and known noise superimposition. IROS 2005: 2328-2332
- 2004
- [j9]Rajkishore Prasad, Hiroshi Saruwatari, Kiyohiro Shikano:
Robots that can hear, understand and talk. Adv. Robotics 18(5): 533-564 (2004) - [j8]Rajkishore Prasad, Hiroshi Saruwatari, Kiyohiro Shikano:
Negentropy based voice-activity detection for noise estimation in very low SNR condition. IEICE Electron. Express 1(16): 495-500 (2004) - [c56]Panikos Heracleous, Yoshitaka Nakajima, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Audible (normal) speech and inaudible murmur recognition using NAM microphone. EUSIPCO 2004: 329-332 - [c55]Hiroaki Yamajo, Hiroshi Saruwatari, Tomoya Takatani, Tsuyoki Nishikawa, Kiyohiro Shikano:
Evaluation of blind separation and deconvolution for binaural-sound mixtures using SIMO-model-based ICA. EUSIPCO 2004: 1709-1712 - [c54]Yosuke Tatekura, Shigefumi Urata, Hiroshi Saruwatari, Kiyohiro Shikano:
On-line adaptive algorithm to acoustic fluctuation for inverse filter relaxation in sound reproduction system. EUSIPCO 2004: 1765-1768 - [c53]Satoshi Ukai, Hiroshi Saruwatari, Tomoya Takatani, Kiyohiro Shikano, Ryo Mukai, Hiroshi Sawada:
Evaluation of Multistage SIMO-Model-Based Blind Source Separation Combining Frequency-Domain ICA and Time-Domain ICA. ICA 2004: 626-633 - [c52]Rajkishore Prasad, Hiroshi Saruwatari, Kiyohiro Shikano:
Single Channel Speech Enhancement: MAP Estimation Using GGD Prior Under Blind Setup. ICA 2004: 873-880 - [c51]Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano, Atsunobu Kaminuma:
Stable and Low-Distortion Algorithm Based on Overdetermined Blind Separation for Convolutive Mixtures of Speech. ICA 2004: 881-888 - [c50]Satoshi Ukai, Hiroshi Saruwatari, Tomoya Takatani, Ryo Mukai, Hiroshi Sawada:
Multistage SIMO-model-based blind source separation combining frequency-domain ICA and time-domain ICA. ICASSP (4) 2004: 109-112 - [c49]Tomoya Takatani, Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano:
Blind separation of binaural sound mixtures using SIMO-model-based independent component analysis. ICASSP (4) 2004: 113-116 - [c48]Tsuyoki Nishikawa, Hiroshi Abe, Hiroshi Saruwatari, Kiyohiro Shikano:
Overdetermined blind separation for convolutive mixtures of speech based on multistage ICA using subarray processing. ICASSP (1) 2004: 225-228 - [c47]Ryuichi Nisimura, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Public speech-oriented guidance system with adult and child discrimination capability. ICASSP (1) 2004: 433-436 - [c46]Rajkishore Prasad, Hiroshi Saruwatari, Kiyohiro Shikano:
MAP estimation of speech spectral component under GGD a priori. SAPA@INTERSPEECH 2004: 115 - [c45]Akinobu Lee, Keisuke Nakamura, Ryuichi Nisimura, Hiroshi Saruwatari, Kiyohiro Shikano:
Noise robust real world spoken dialogue system using GMM based rejection of unintended inputs. INTERSPEECH 2004: 173-176 - [c44]Panikos Heracleous, Yoshitaka Nakajima, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Non-audible murmur (NAM) speech recognition using a stethoscopic NAM microphone. INTERSPEECH 2004: 1469-1472 - [c43]Randy Gomez, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Robust speech recognition with spectral subtraction in low SNR. INTERSPEECH 2004: 2077-2080 - [c42]Tatsunori Asai, Shigeki Miyabe, Hiroshi Saruwatari, Kiyohiro Shikano:
Interface for barge-in free spoken dialogue system using adaptive sound field control. INTERSPEECH 2004: 2665-2668 - [c41]Kazuki Adachi, Tomoki Toda, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Perceptual Evaluation of Quality Deterioration Owing to Prosody Modification. LREC 2004
- 2003
- [j7]Hiroshi Saruwatari, Satoshi Kurita, Kazuya Takeda, Fumitada Itakura, Tsuyoki Nishikawa, Kiyohiro Shikano:
Blind Source Separation Combining Independent Component Analysis and Beamforming. EURASIP J. Adv. Signal Process. 2003(11): 1135-1146 (2003) - [j6]Shoko Araki, Shoji Makino, Yoichi Hinamoto, Ryo Mukai, Tsuyoki Nishikawa, Hiroshi Saruwatari:
Equivalence between Frequency-Domain Blind Source Separation and Frequency-Domain Adaptive Beamforming for Convolutive Mixtures. EURASIP J. Adv. Signal Process. 2003(11): 1157-1166 (2003) - [j5]Hiroshi Saruwatari, Toshiya Kawamura, Tsuyoki Nishikawa, Kiyohiro Shikano:
Fast-Convergence Algorithm for Blind Source Separation Based on Array Signal Processing. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 86-A(3): 634-639 (2003) - [j4]Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano:
Blind Source Separation of Acoustic Signals Based on Multistage ICA Combining Frequency-Domain ICA and Time-Domain ICA. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 86-A(4): 846-858 (2003) - [j3]Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano:
Stable Learning Algorithm for Blind Separation of Temporally Correlated Acoustic Signals Combining Multistage ICA and Linear Prediction. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 86-A(8): 2028-2036 (2003) - [j2]Shoko Araki, Ryo Mukai, Shoji Makino, Tsuyoki Nishikawa, Hiroshi Saruwatari:
The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech. IEEE Trans. Speech Audio Process. 11(2): 109-116 (2003) - [c40]Tomoya Takatani, Tsuyoki Nishikawa, Hiroshi Saruwatari:
Blind source separation based on binaural ICA. ICASSP (5) 2003: 321-324 - [c39]Yoichi Hinamoto, Kouichi Mino, Hiroshi Saruwatari, Kiyohiro Shikano:
Interface for barge-in free spoken dialogue system based on sound field control and microphone array. ICASSP (5) 2003: 505-508 - [c38]Shoko Araki, Shoji Makino, Robert Aichner, Tsuyoki Nishikawa, Hiroshi Saruwatari:
Subband based blind source separation for convolutive mixtures of speech. ICASSP (5) 2003: 509-512 - [c37]Hiroaki Yamajo, Hiroshi Saruwatari, Tomoya Takatani, Tsuyoki Nishikawa, Kiyohiro Shikano:
Blind separation and deconvolution for convolutive mixture of speech using SIMO-model-based ICA and multichannel inverse filtering. INTERSPEECH 2003: 537-540 - [c36]Shingo Yamade, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Unsupervised speaker adaptation based on HMM sufficient statistics in various noisy environments. INTERSPEECH 2003: 1493-1496 - [c35]Tatsuya Shiraishi, Tomoki Toda, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano:
Simple designing methods of corpus-based visual speech synthesis. INTERSPEECH 2003: 2241-2244 - [c34]Hiromichi Kawanami, Yohei Iwami, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
GMM-based voice conversion applied to emotional speech synthesis. INTERSPEECH 2003: 2401-2404 - [c33]Tomoya Takatani, Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano:
High-fidelity blind separation for convolutive mixture of acoustic signals using SIMO-model-based independent component analysis. ISSPA (2) 2003: 77-80 - [c32]Hiroshi Saruwatari, Hiroaki Yamajo, Tomoya Takatani, Tsuyoki Nishikawa, Kiyohiro Shikano:
Blind separation and deconvolution of MIMO system driven by colored inputs using SIMO-model-based ICA with information-geometric learning. NNSP 2003: 379-388 - [c31]Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano:
Stable learning algorithm for low-distortion blind separation of real speech mixture combining multistage ICA and linear prediction. NOLISP 2003: 8
- 2002
- [j1]Yosuke Tatekura, Hiroshi Saruwatari, Kiyohiro Shikano:
Sound Reproduction System Including Adaptive Compensation of Temperature Fluctuation Effect for Broad-Band Sound Control. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 85-A(8): 1851-1860 (2002) - [c30]Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano:
Comparison of time-domain ICA, frequency-domain ICA and multistage ICA for blind source separation. EUSIPCO 2002: 1-4 - [c29]Hiroshi Saruwatari, Toshiya Kawamura, Katsuyuki Sawai, Kiyohiro Shikano, Atsunobu Kaminuma, Masao Sakata:
Evaluation of fast-convergence algorithm for ICA-based blind source separation of real convolutive mixture. EUSIPCO 2002: 1-4 - [c28]Yosuke Tatekura, Hiroshi Saruwatari, Kiyohiro Shikano:
Adaptive compensation of temperature fluctuation effect in sound reproduction system. EUSIPCO 2002: 1-4 - [c27]Tsuyoki Nishikawa, Hiroshi Saruwatari, Kiyohiro Shikano:
Blind source separation based on Multi-Stage ICA combining frequency-domain ICA and time-domain ICA. ICASSP 2002: 917-920 - [c26]Hiroshi Saruwatari, Toshiya Kawamura, Katsuyuki Sawai, Atsunobu Kaminuma, Masao Sakata:
Blind source separation based on fast-convergence algorithm using ICA and beamforming for real convolutive mixture. ICASSP 2002: 921-924 - [c25]Shoko Araki, Yoichi Hinamoto, Shoji Makino, Tsuyoki Nishikawa, Ryo Mukai, Hiroshi Saruwatari:
Equivalence between frequency domain blind source separation and frequency domain adaptive beamforming. ICASSP 2002: 1785-1788 - [c24]Yosuke Tatekura, Hiroshi Saruwatari, Kiyohiro Shikano:
Sound reproduction system with adaptive compensation of temperature fluctuation effect. DSP 2002: 989-992 - [c23]Satoshi Nakamura, Kazuo Hiyane, Futoshi Asano, Yutaka Kaneda, Takeshi Yamada, Takanobu Nishiura, Tetsunori Kobayashi, Shiro Ise, Hiroshi Saruwatari:
Design and collection of acoustic sound data for hands-free speech recognition and sound scene understanding. ICME (2) 2002: 161-164 - [c22]Shingo Yamade, Kanako Matsunami, Akira Baba, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Spectral subtraction in noisy environments applied to speaker adaptation based on HMM sufficient statistics. INTERSPEECH 2002: 1045-1048 - [c21]Hiroshi Saruwatari, Katsuyuki Sawai, Akinobu Lee, Kiyohiro Shikano, Atsunobu Kaminuma, Masao Sakata:
Speech enhancement in car environment using blind source separation. INTERSPEECH 2002: 1781-1784 - [c20]Akinobu Lee, Yuichiro Mera, Hiroshi Saruwatari, Kiyohiro Shikano:
Selective multi-path acoustic model based on database likelihoods. INTERSPEECH 2002: 2661-2664 - [c19]Ryuichi Nisimura, Takashi Uchida, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano, Yoshio Matsumoto:
ASKA: receptionist robot with speech dialogue system. IROS 2002: 1314-1319 - [c18]Robert Aichner, Shoko Araki, Shoji Makino, Tsuyoki Nishikawa, Hiroshi Saruwatari:
Time domain blind source separation of non-stationary convolved signals by utilizing geometric beamforming. NNSP 2002: 445-454
- 2001
- [c17]Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum. ICASSP 2001: 841-844 - [c16]Hiroshi Saruwatari, Satoshi Kurita, Kazuya Takeda:
Blind source separation combining frequency-domain ICA and beamforming. ICASSP 2001: 2733-2736 - [c15]Shoko Araki, Shoji Makino, Tsuyoki Nishikawa, Hiroshi Saruwatari:
Fundamental limitation of frequency domain blind source separation for convolutive mixture of speech. ICASSP 2001: 2737-2740 - [c14]Hidekazu Kamiyanagida, Hiroshi Saruwatari, Kazuya Takeda, Fumitada Itakura:
Direction of arrival estimation based on nonlinear microphone array. ICASSP 2001: 3033-3036 - [c13]Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano:
High quality voice conversion based on Gaussian mixture model with dynamic frequency warping. INTERSPEECH 2001: 349-352 - [c12]Miichi Yamada, Akira Baba, Shinichi Yoshizawa, Yuichiro Mera, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Unsupervised noisy environment adaptation algorithm using MLLR and speaker selection. INTERSPEECH 2001: 869-872 - [c11]Ryuichi Nisimura, Kumiko Komatsu, Yuka Kuroda, Kentaro Nagatomo, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano:
Automatic n-gram language model creation from web resources. INTERSPEECH 2001: 2127-2130 - [c10]Shoko Araki, Shoji Makino, Ryo Mukai, Hiroshi Saruwatari:
Equivalence between frequency domain blind source separation and frequency domain adaptive null beamformers. INTERSPEECH 2001: 2595-2598 - [c9]Hiroshi Saruwatari, Toshiya Kawamura, Kiyohiro Shikano:
Blind source separation for speech based on fast-convergence algorithm with ICA and beamforming. INTERSPEECH 2001: 2603-2606
- 2000
- [c8]Hiroshi Saruwatari, Shoji Kajita, Kazuya Takeda, Fumitada Itakura:
Speech enhancement based on noise adaptive nonlinear microphone array. EUSIPCO 2000: 1-4 - [c7]Hiroshi Saruwatari, Shoji Kajita, Kazuya Takeda, Fumitada Itakura:
Speech enhancement using nonlinear microphone array with noise adaptive complementary beamforming. ICASSP 2000: 1049-1052 - [c6]Satoshi Kurita, Hiroshi Saruwatari, Shoji Kajita, Kazuya Takeda, Fumitada Itakura:
Evaluation of blind signal separation method using directivity pattern under reverberant conditions. ICASSP 2000: 3140-3143 - [c5]Hiroshi Saruwatari, Satoshi Kurita, Kazuya Takeda, Fumitada Itakura, Kiyohiro Shikano:
Blind source separation based on subband ICA and beamforming. INTERSPEECH 2000: 94-97 - [c4]Tomoki Toda, Jinlin Lu, Hiroshi Saruwatari, Kiyohiro Shikano:
STRAIGHT-based voice conversion algorithm based on Gaussian mixture model. INTERSPEECH 2000: 279-282
1990 – 1999
- 1999
- [c3]Hiroshi Saruwatari, Shoji Kajita, Kazuya Takeda, Fumitada Itakura:
Speech enhancement using nonlinear microphone array with complementary beamforming. ICASSP 1999: 69-72 - [c2]Michiaki Omura, Motohiko Yada, Hiroshi Saruwatari, Shoji Kajita, Kazuya Takeda, Fumitada Itakura:
Compensating of room acoustic transfer functions affected by change of room temperature. ICASSP 1999: 941-944 - [c1]Hiroshi Saruwatari, Shoji Kajita, Kazuya Takeda, Fumitada Itakura:
Speech enhancement using nonlinear microphone array under nonstationary noise conditions. EUROSPEECH 1999: 2567-2570
last updated on 2024-10-23 20:36 CEST by the dblp team
all metadata released as open data under CC0 1.0 license