Published on August 26, 2025
In AI News

Microsoft Unveils VibeVoice, an Open-Source Text-to-Speech AI Model

VibeVoice can produce up to 90 minutes of synthetic dialogue with as many as four distinct speakers.

By Ankush Das

The Microsoft Research team has released VibeVoice, a frontier open-source text-to-speech (TTS) model designed for generating expressive, multi-speaker conversational audio. The system, aimed at research use, promises advances in scalability, consistency and natural turn-taking, areas where traditional TTS models often struggle.

VibeVoice can produce up to 90 minutes of synthetic dialogue with as many as four distinct speakers, extending beyond the one- or two-speaker limits common in earlier systems.

The model introduces continuous speech tokenisers, operating at a low frame rate of 7.5 Hz, to preserve audio quality while reducing computational load. It also employs a diffusion-based framework, combining a transformer large language model with a dedicated “diffusion head” to refine acoustic details.

For this release, the system integrates Qwen2.5-1.5B as its large language model, alongside acoustic and semantic tokenisers. The acoustic tokeniser downsamples 24 kHz input audio by a factor of 3,200, which works out to the 7.5 Hz frame rate. The diffusion head, with around 123 million parameters, applies denoising diffusion probabilistic models to generate natural-sounding speech conditioned on dialogue context.
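The figures above fit together arithmetically, and a short Python sketch makes the relationship explicit. It uses only the numbers quoted in this article (24 kHz input, 3,200x downsampling, 90 minutes of audio); the function names are illustrative and are not part of the VibeVoice codebase.

```python
# Back-of-the-envelope check of the tokeniser figures quoted in the article.
# Function and constant names are hypothetical, chosen for illustration only.

SAMPLE_RATE_HZ = 24_000   # input audio sample rate (24 kHz)
DOWNSAMPLING = 3_200      # acoustic tokeniser downsampling factor
MAX_MINUTES = 90          # maximum dialogue length reported


def frame_rate_hz(sample_rate_hz: int, downsampling: int) -> float:
    """Latent frames produced per second of audio."""
    return sample_rate_hz / downsampling


def frames_for_duration(minutes: float, rate_hz: float) -> int:
    """Total latent frames needed to cover a given duration."""
    return int(minutes * 60 * rate_hz)


rate = frame_rate_hz(SAMPLE_RATE_HZ, DOWNSAMPLING)
frames = frames_for_duration(MAX_MINUTES, rate)
print(f"{rate} Hz, {frames} frames")  # prints "7.5 Hz, 40500 frames"
```

At 7.5 frames per second, a 90-minute session comes to roughly 40,500 acoustic frames, which gives a sense of why a low frame rate matters for long-form generation.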

The team stresses that VibeVoice is intended strictly for research. The system embeds both audible disclaimers and imperceptible watermarks into every generated file to discourage misuse. Out-of-scope applications include voice impersonation, disinformation, live deepfake conversions and use in unsupported languages.

The model is trained only on English and Chinese data and is designed exclusively for speech, not background sounds or music.

Microsoft highlights risks of bias, unexpected errors and potential misuse, noting the importance of disclosure when AI-generated audio is shared. To track and mitigate abuse, inference requests are logged in hashed form, and usage statistics will be published quarterly.
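The announcement does not describe how requests are hashed. As a rough illustration of the general idea, the following Python sketch logs a digest of a request rather than its contents. The choice of SHA-256, the JSON payload layout and the function name are all assumptions made for this example, not details of Microsoft's system.

```python
import hashlib
import json


def hash_request(text: str, speakers: list[str]) -> str:
    """Return a SHA-256 hex digest of a request (hypothetical scheme).

    The payload layout here is an assumption for illustration; the
    actual logging format used for VibeVoice is not documented in
    this article.
    """
    # sort_keys gives a stable serialisation, so equal requests hash equally
    payload = json.dumps({"text": text, "speakers": speakers}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


digest = hash_request("Speaker 1: Hello there.", ["Speaker 1"])
print(len(digest))  # prints 64 (hex characters in a SHA-256 digest)
```

A digest of this kind lets identical requests be counted or matched without the log storing the original script in readable form.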

The project, including the technical report and code, is now publicly available on GitHub under the MIT License.

