Knowledge boosting during low-latency inference

Srinivas, Vidya; Itani, Malek; Chen, Tuochao; Eskimez, Sefik Emre; Yoshioka, Takuya; Gollakota, Shyamnath

arXiv:2407.11055 (cs)

[Submitted on 9 Jul 2024 (v1), last revised 25 Jul 2024 (this version, v3)]

Abstract: Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running on-device. However, this incurs a communication delay that breaks real-time requirements and does not guarantee that both models will operate on the same data at the same time. We propose knowledge boosting, a novel technique that allows a large model to operate on time-delayed input during inference, while still boosting small model performance. Using a streaming neural network that processes 8 ms chunks, we evaluate different speech separation and enhancement tasks with communication delays of up to six chunks or 48 ms. Our results show larger gains where the performance gap between the small and large models is wide, demonstrating a promising method for large-small model collaboration for low-latency applications. Code, dataset, and audio samples are available at this https URL.
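The timing relationship described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `large_model`, `small_model`, and `stream` are hypothetical stand-ins invented here, and the "hint" is just a chunk mean. The only figures taken from the abstract are the 8 ms chunk size and the six-chunk (48 ms) maximum delay. The sketch shows only that the small model processes the current chunk while receiving a hint that the large model computed on an earlier chunk.

```python
from collections import deque

CHUNK_MS = 8  # chunk duration stated in the abstract


def delay_ms(delay_chunks):
    """Convert a communication delay in chunks to milliseconds."""
    return delay_chunks * CHUNK_MS


def large_model(chunk):
    # Hypothetical stand-in for the remote model: the "hint" is the chunk mean.
    return sum(chunk) / len(chunk)


def small_model(chunk, hint):
    # Hypothetical stand-in for the on-device model: subtract the hint
    # from every sample, or use 0.0 while no hint has arrived yet.
    h = 0.0 if hint is None else hint
    return [x - h for x in chunk]


def stream(chunks, delay_chunks):
    """Process chunks in order; the hint used for chunk t was computed
    by the large model on chunk t - delay_chunks.

    Returns a list of (small_model_output, hint_source_index) pairs,
    where hint_source_index is None until the first hint arrives.
    """
    in_flight = deque()  # hints that are still "in transit"
    outputs = []
    for t, chunk in enumerate(chunks):
        in_flight.append((t, large_model(chunk)))
        hint, src = None, None
        if len(in_flight) > delay_chunks:
            src, hint = in_flight.popleft()
        outputs.append((small_model(chunk, hint), src))
    return outputs


if __name__ == "__main__":
    chunks = [[float(i)] * 4 for i in range(8)]
    for t, (_, src) in enumerate(stream(chunks, delay_chunks=6)):
        print(f"chunk {t}: hint computed on chunk {src}")
    print(f"delay = {delay_ms(6)} ms")
```

With a delay of six chunks, the small model runs without any hint for the first six chunks, and from chunk 6 onward it always pairs the current input with a hint that is 48 ms old.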

Comments:	Accepted by Interspeech 2024
Subjects:	Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2407.11055 [cs.LG]
	(or arXiv:2407.11055v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.11055

Submission history

From: Vidya Srinivas
[v1] Tue, 9 Jul 2024 22:04:23 UTC (1,429 KB)
[v2] Wed, 17 Jul 2024 10:43:12 UTC (1,429 KB)
[v3] Thu, 25 Jul 2024 08:26:35 UTC (1,429 KB)
