Attention-Guided Audio Compression for Multimodal LLMs - Livestream

Description: Audio compression is often proposed to improve the efficiency of multimodal large language models, but its impact on downstream task performance remains underexplored. This talk examines how semantic neural audio codecs behave under token reduction constraints, using cross-modal attention as a signal to discard frames with low semantic content. On audio question-answering benchmarks, attention-guided frame selection removes 10–30% of frames while matching baseline accuracy and answer consistency, and identifies a critical compression threshold (keep ratio ~0.7) below which performance degrades sharply. The talk also discusses an "answer consistency paradox," in which models remain highly self-consistent (>98%) even as accuracy degrades, and what this decoupling of consistency from correctness means for evaluating compressed multimodal systems in low-resource deployments.
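For readers unfamiliar with the idea, attention-guided frame selection at a given keep ratio can be sketched roughly as below. This is an illustrative sketch only, not the speaker's implementation: the function name `select_frames`, the per-frame score list, and the top-k ranking rule are all assumptions made for the example, and the scores are made-up numbers rather than results from the talk.

```python
def select_frames(attention_scores, keep_ratio=0.7):
    """Return the indices of the frames to keep, in temporal order.

    attention_scores: one score per audio frame (hypothetical input, e.g.
        cross-modal attention averaged over heads and text tokens).
    keep_ratio: fraction of frames to retain, in (0, 1].
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_frames = len(attention_scores)
    # Keep at least one frame whenever there is any input at all.
    n_keep = max(1, round(keep_ratio * n_frames)) if n_frames else 0
    # Rank frames by score, highest first; ties favor the earlier frame.
    ranked = sorted(range(n_frames), key=lambda i: (-attention_scores[i], i))
    # Restore temporal order so the compressed sequence stays chronological.
    return sorted(ranked[:n_keep])


# Made-up scores for ten frames; with keep_ratio=0.7 the three
# lowest-scoring frames (indices 1, 3 and 9) are dropped.
scores = [0.30, 0.02, 0.25, 0.01, 0.10, 0.05, 0.08, 0.04, 0.12, 0.03]
print(select_frames(scores, keep_ratio=0.7))  # [0, 2, 4, 5, 6, 7, 8]
```

Lowering `keep_ratio` in this sketch discards progressively higher-scoring frames, which is the regime the description says degrades sharply below roughly 0.7.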

Speaker: Prerana Rane, IEEE Signal Processing Society Santa Clara Valley Chapter

Register at weblink

Friday, 06/26/26

Cost: Free

IEEE Signal Processing Society, Santa Clara Valley Chapter, CA