🚀 aiOla Research

At aiOla Research, we’re shaping the future of speech and voice AI, making advanced language technologies accessible and impactful for enterprise workflows.

Our work is rooted in deep research and driven by real-world needs, combining academic rigor with a focus on practical outcomes.


📝 Publications

UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching. Workshop on Machine Learning for Audio, ICML 2025.


FlowTSE: Target Speaker Extraction with Flow Matching. InterSpeech 2025.


Whisper in Medusa’s Ear: Multi-head Efficient Decoding for Transformer-based ASR. ICASSP 2025.


Keyword-guided adaptation of automatic speech recognition. InterSpeech 2024.

Open-vocabulary keyword-spotting with adaptive instance normalizations. ICASSP 2024.


See full publication list. More publications will be added soon. Stay tuned.

🎯 Our Mission

To develop robust, efficient, and adaptable AI systems that understand, generate, and interact with human speech in complex, real-world settings.


🔬 Research Focus Areas

Our research spans multiple areas at the intersection of voice and enterprise:

  • Automatic Speech Recognition (ASR)
    Developing models like Jargonic that deliver 95%+ accuracy across accents, noise conditions, and domain-specific jargon.

  • Text-to-Speech (TTS)
    Generating lifelike, multilingual, and customizable voices for expressive and context-aware speech synthesis.

  • Speech-to-Workflow Automation
    Turning voice into structured actions and data to support compliance, speed, and insight in industries such as automotive, aviation, food & CPG, and pharma.


  • Jargonic ASR
    An enterprise-grade speech recognition model tailored for real-world jargon, noisy environments, and multilingual support.

  • Multi-head Efficient Decoding for Transformer-based ASR
    A multi-head decoding approach for transformer-based ASR that increases speed and robustness.

  • Keyword-guided Adaptation of Automatic Speech Recognition

  • Open Vocabulary Keyword Spotting

  • WhisperNER
    A unified framework that combines speech recognition with real-time named entity recognition for enriched voice understanding.


🌐 Explore More

🔗 Visit our research homepage
🔗 Learn more about aiOla’s products
🔗 Contact us via our website


Advancing speech intelligence from research to real-world impact.