UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching

Neta Glazer, Aviv Navon, Yael Segal-Feldman, Aviv Shamsian, Hilit Segev, Asaf Buchnick, Menachem Pirchi, Gil Hetz, Joseph Keshet

aiOla Research

Workshop on Machine Learning for Audio, ICML 2025

Abstract

Recent advances in Text-to-Speech (TTS) have enabled highly natural speech synthesis, yet integrating speech with complex background environments remains challenging. We introduce UmbraTTS, a flow-matching-based TTS model that jointly generates speech and environmental audio, conditioned on text and acoustic context. Our model allows fine-grained control over background volume and produces diverse, coherent, and context-aware audio scenes. A key challenge is the lack of data in which speech and background audio are aligned in a natural context. To overcome this, we propose a self-supervised framework that extracts speech, background audio, and transcripts from unannotated recordings. Extensive evaluations demonstrate that UmbraTTS significantly outperforms existing baselines, producing natural, high-quality, environmentally aware audio.

Architecture

Figure: Model architecture

TTS With Environmental Conditioning

Background Condition | SER = 0 | SER = 0.5 | SER = 1
1. I can't believe how fast this year is flying by.
2. Let me know if you need anything, I'm happy to help.
3. Excuse me, you dropped something!
4. I need to find an ATM, do you know where one is?
5. Do you want to meet up for lunch tomorrow?
6. I just finished a great book. Do you like to read?
7. Oh no, my phone battery is about to die!
8. I think I've seen you around here before.
9. Are you free this weekend? We should do something fun!
10. Wow, it's really crowded today!
11. Wow, it's really crowded today!
12. I love going for a walk in the evening, it's so relaxing.
13. Do you know what time the next train arrives?
14. I'm trying to learn a new language. It's pretty challenging!
15. Are you free this weekend? We should do something fun!
16. Ugh, I forgot my umbrella and now it's raining!
17. I think I got lost. Can you help me find this address?
18. Watch out! That bike almost hit you!
19. The traffic is terrible today.
20. That smells amazing! What are you cooking?
21. Hey, how's it going?