


MMM 2025, Nara, Japan - Part V
- Ichiro Ide, Ioannis Kompatsiaris, Changsheng Xu, Keiji Yanai, Wei-Ta Chu, Naoko Nitta, Michael Riegler, Toshihiko Yamasaki:
MultiMedia Modeling - 31st International Conference on Multimedia Modeling, MMM 2025, Nara, Japan, January 8-10, 2025, Proceedings, Part V. Lecture Notes in Computer Science 15524, Springer 2025, ISBN 978-981-96-2073-9
Special Session on Multimedia Research in Robotics
- Jia Yap Lim, John See, Christian Dondrup:
Multimodal Engagement Prediction in Human-Robot Interaction Using Transformer Neural Networks. 3-17
- Daichi Yoshihara, Akishige Yuguchi, Seiya Kawano, Takamasa Iio, Koichiro Yoshino:
What Should Autonomous Robots Verbalize and What Should They Not? 18-29
Special Session on Spatial Intelligence in Multimedia Analytics
- Narges Ghasemi, Seon Ho Kim, Abdullah Alfarrarjeh, Cyrus Shahabi:
Counting Unique Objects in Geo-Tagged Street Images: A Case Study of Homeless Encampments in Los Angeles. 33-46
Special Session on Simulating Edge Computing and Multimodal AI: A Benchmark for Real-World Applications
- Duy-Dong Le, Duy-Thanh Huynh, Pham The Bao:
Correlation-Based Weighted Federated Learning with Multimodal Sensing and Knowledge Distillation: An Application on a Real-World Benchmark Dataset. 49-60
- Dang Vu, Tien Dang, Quoc-Trung Nguyen, Tan Pham:
Leveraging Pruning, Quantization and Multi-objective Optimization for an Efficient Deployment of Multi-modal Models. 61-73
Demo Papers
- Onanong Kongmeesub, Cathal Gurrin, Dongyun Nie:
A User Identification and Reading Style Detection System Based on Eye Movement Patterns While Reading. 77-83
- Ibrahim Serouis, Florence Sèdes:
AMDA: Advancing Multimedia Data Annotation for Human-Centric Situations. 84-90
- Tetsuro Kitahara, Takuya Tsutsumi, Takaaki Nagoshi, Taizan Suzuki:
An Implementation of Networked JamSketch. 91-97
- Duen-Chian Jheng, Bill Louis Harchan, Berenika Nawoja Kostka de Sztemberg, Jen-Hao Hsu, Min-Chun Tien:
Badminton Footwork Practice via an Immersive Virtual Reality System. 98-104
- Hanna Borgli, Håkon Kvale Stensland, Pål Halvorsen:
Better Image Segmentation with Classification: Guiding Zero-Shot Models Using Class Activation Maps. 105-111
- Yung-Chu Chiang, Zi-Xian Tang, Yi-Ching Luo, Jason S. Chang:
CleverFox: Integrating Visual Mnemonics with AI for Enhanced Language Learning. 112-118
- Ioannis Kontostathis, Evlampios Apostolidis, Konstantinos Apostolidis, Vasileios Mezaris:
Enhancing User Control in AI-Based Video Summarization for Social Media. 119-126
- Hung-Yao Peng, Zi-Heng Zhong, Cheng-Chih Tsai, Ching-Yeh Chiang, Tse-Yu Pan:
FencBuddy: Action-Aware Depth Perception Training for Fencing Attacks. 127-133
- Nami Iino, Akinaru Iino:
Fingering Prediction for Classical Guitar: Dataset Creation and Model Development. 134-141
- Honghui Yuan, Keiji Yanai:
KuzushijiFontDiff: Diffusion Model for Japanese Kuzushiji Font Generation. 142-149
- Bohan Li, Xingyi Li, Yangwen Liang, Shuangquan Wang, Kee-Bong Song:
Leveraging Latent Diffusion in 3D Gaussian Splatting for Novel View Synthesis. 150-157
- Wei-Lun Huang, Shintami Chusnul Hidayati, Tse-Yu Pan:
Movie Retrieval Systems Using Genre-Guided Multimodal Learning Techniques. 158-164
- Omar Shahbaz Khan, Aaron Duane, Hariz Hasnan, Noé Le Blavec, Pierre Ouvrard, Johan Verdon, Laurent d'Orazio, Constance Thierry, Björn Þór Jónsson:
Multi-Dimensional Exploration of Media Collection Metadata. 165-172
- Kelley Lynch, Kyeongmin Rim, Owen King, James Pustejovsky:
Multimodal Interoperability with the CLAMS Platform. 173-179
- Masatoshi Hamanaka:
Real-Time Visualizer for Turntablist Performance. 180-186
- Yasutomo Kawanishi, Yutaka Nakamura, Taiken Shintani, Carlos Toshinori Ishi, Seiya Kawano, Koichiro Yoshino, Takashi Minato, Michihiko Minoh:
RoboDJ: Live Commentary Robots System Driven by Physical- and Cyber-World Observations. 187-193
- Honghui Yuan, Keiji Yanai:
SceneTextStyler: Editing Text with Style Transformation. 194-201
- Jobin Idiculla Wattaseril, Jürgen Döllner:
SelectSum: Topic-Based Selective Summarization of Speech-Based Videos. 202-209
- Wenbin Gan, Minh-Son Dao, Koji Zettsu:
Smart Driving Assistance with Real-Time Risk Assessment and Personalized Driving Coaching to Enhance Road Safety. 210-217
- Jaime B. Fernandez, Muhammad Intizar Ali:
System Demo of Modeling Smart University Campus Virtual Environments. 218-224
- Martin Korb, Werner Bailer:
Training a Segmentation-Based Visual Anonymization Service for Street Scenes. 225-232
- Christian Limberg, Zhe Zhang, Marc A. Kastner:
Transformer-Based Audio Generation Conditioned by 2D Latent Maps: A Demonstration. 233-239
- Angel F. Garcia Contreras, Wen-Yu Chang, Seiya Kawano, Yun-Nung Chen, Koichiro Yoshino:
Using Language Models to Generate and Forget the Narrative Memories of an Assistive Robot. 240-247
- Kota Izumi, Keiji Yanai:
WaveFontStyler: Font Style Transfer Based on Sound. 248-254
Video Browser Showdown
- Mario Leopold, Klaus Schoeffmann:
DiveXplore at the Video Browser Showdown 2025. 257-263
- Ujjwal Sharma, Omar Shahbaz Khan, Stevan Rudinac, Björn Þór Jónsson:
Exquisitor at the Video Browser Showdown 2025: Unifying Conversational Search and User Relevance Feedback. 264-271
- Luca Rossetto, Ralph Gasser:
Feature-Driven Video Segmentation and Advanced Querying with vitrivr-Engine. 272-277
- Huy M. Le, Dat Nguyen Tien, Khang Le Duy, Tuan Nguyen Dang Quang, Nguyen Khanh Toan, Tuyen Nguyen, Binh T. Nguyen:
Fusionista: Fusion of 3-D Information of Video in Retrieval System. 278-285
- Tai Nguyen, Vo Ngoc Minh Anh, Duc Dat Pham, Tran Quang Vinh, Nhu Duong Thi Quynh, Le Anh Tien, Tan Duy Le, Binh T. Nguyen:
HORUS: Multimodal Large Language Models Framework for Video Retrieval at VBS 2025. 286-293
- Duc-Tuan Luu, Khanh-An C. Quan, Duy-Ngoc Nguyen, Khanh-Linh Bui-Le, Nhat-Sang Doan, Minh-Duc Le-Ngo, Vinh-Tiep Nguyen, Minh-Triet Tran:
IMSearch 2.0: Toward User-Centric and Efficient Interactive Multimedia Retrieval System. 294-301
- Yu-Tong Cheng, Jiaxin Wu, Zhixin Ma, Jiangshan He, Xiao-Yong Wei, Chong-Wah Ngo:
Interactive Video Search with Multi-modal LLM Video Captioning. 302-309
- Rahel Arnold, Rahel Kempf, Raphael Waltenspül, Heiko Schuldt:
MediaMix: Multimedia Retrieval in Mixed Reality. 310-317
- Bao Tran Gia, Tuong Bui Cong Khanh, Tam Le Thi Thanh, Thuyen Tran Doan, Khiem Le, Tien Do, Tien-Dung Mai, Thanh Duc Ngo, Duy-Dinh Le, Shin'ichi Satoh:
NII-UIT at VBS2025: Multimodal Video Retrieval with LLM Integration and Dynamic Temporal Search. 318-325
- Michael Stroh, Vojtech Kloda, Benjamin Verner, Zuzana Vopálková, Raphael Buchmüller, Bastian Jäckl, Jakub Hajko, Jakub Lokoc:
PraK Tool V3: Enhancing Video Item Search Using Localized Text and Texture Queries. 326-333
- Florian Spiess, Luca Rossetto, Heiko Schuldt:
Simplified Video Retrieval in Virtual Reality with vitrivr-VR. 334-338
- Minh-Quan Ho-Le, Duy-Khang Ho, Huy-Hoang Do-Huu, Nhut-Thanh Le-Hinh, Hoa-Vien Vo-Hoang, Van-Tu Ninh, Cathal Gurrin, Minh-Triet Tran:
SnapSeek 2.0 at Video Browser Showdown 2025. 339-346
- Thang-Long Nguyen-Ho, Viet-Tham Huynh, Onanong Kongmeesub, Minh-Triet Tran, Dongyun Nie, Graham Healy, Cathal Gurrin:
VEAGLE: Eye Gaze-Assisted Guidance for Video Browser Showdown. 347-354
- Nick Pantelidis, Dimitris Georgalis, Maria Pegia, Damianos Galanopoulos, Konstantinos Apostolidis, Klearchos Stavrothanasopoulos, Anastasia Moumtzidou, Konstantinos Gkountakos, Ilias Gialampoukidis, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Kompatsiaris:
VERGE in VBS 2025. 355-362
- Quang-Linh Tran, Binh T. Nguyen, Gareth J. F. Jones, Cathal Gurrin:
VideoEase at VBS2025: An Interactive Video Retrieval System. 363-370
- Gia-Huy Vuong, Van-Son Ho, Tien-Thanh Nguyen-Dang, Xuan-Dang Thai, Minh-Quan Ho-Le, Tu-Khiem Le, Minh-Khoi Pham, Van-Tu Ninh, Cathal Gurrin, Minh-Triet Tran:
ViewsInsight2.0: Enhancing Video Retrieval for VBS 2025 with an Automatic Query Generator Powered by Large Language Models. 371-377
- Khanh-An C. Quan, Qui Ngoc Nguyen, Minh-Triet Tran:
ViFi: A Video Finding System at Video Browser Showdown 2025. 378-384
