AI that communicates just like we do.

Ultravox is an open-source Speech Language Model (SLM) trained to understand speech naturally, just like humans do. Say goodbye to awkward pauses, slow response times, and robotic speech — Ultravox delivers smooth, real-time communication.

TRY IT OUT

ver 0.4

Learn more about Ultravox by talking to it.

or call 1 844-741-5700

The future of AI speech is here.

Experience the cutting edge of AI speech with Ultravox, where technology meets human interaction to create fluid, natural conversations on every medium.

Beyond Speech Recognition

Ultravox is an advanced LLM that processes speech directly, without conversion to text. This enables much more natural and fluid conversations.

Web or VoIP Ready

Seamlessly integrate Ultravox into your web, native app, or phone-based products with minimal effort. It comes with SDKs for all major languages and built-in Twilio support.

Multi-lingual by default

Ultravox is fluent in all major languages and can be easily adapted to support new languages or accents, ensuring smooth communication across diverse audiences.

BYOM (Bring Your Own Model)

Ultravox gives you the flexibility to work with any open-source model, even your own fine-tuned models.

Fast, accurate, smart. Pick three.

Unlike other voice-based systems, Ultravox understands speech directly, without first transforming it into text. This makes Ultravox faster, more reliable, and more natural.

Ultravox

Understanding speech directly means there are fewer moving parts. The result is much faster and much more consistent response times than legacy component systems.

Legacy Component Systems

The current industry standard is a cascaded pipeline of services strung together to give the illusion of a seamless experience. This means it's slower, more brittle, and unable to capture the nuances of human speech.

BENCHMARKS

CoVoST2 Translation

Our primary method of evaluation is zero-shot speech translation, measured by BLEU, as a proxy for general instruction-following capability (higher is better).

Language pair    BLEU
En - De          25.47
En - Ca          27.46
En - Ar          28.07
Ru - En          38.96
Es - En          37.11
Zh - En          10.08
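
For readers unfamiliar with BLEU, here is a minimal sketch of how such a score is computed for a single translation. It is an illustration only, not the evaluation code behind the numbers above: the function names `ngrams` and `simple_bleu` are hypothetical, tokenization is plain whitespace splitting, and no smoothing is applied, so its scores will not match standard BLEU tools.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU on a 0-100 scale.

    Assumptions (not from the Ultravox page): whitespace tokenization,
    no smoothing, one reference translation.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it appears in the reference.
        matches = sum(min(count, ref_counts[gram])
                      for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            # Any zero precision makes the geometric mean zero.
            return 0.0
        log_precision_sum += math.log(matches / total)

    # Brevity penalty discourages translations shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return 100.0 * brevity_penalty * math.exp(log_precision_sum / max_n)
```

A translation identical to the reference scores 100, one that shares no words with it scores 0, and partial overlaps fall in between. Published benchmark figures are normally corpus-level scores from a standard tool, which is why this sketch should be read as an explanation of the metric rather than a way to reproduce the table.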

Customize it, then run it anywhere.

(even on-prem)

Whether it's adding support for additional languages, fine-tuning on your own datasets, or creating unique and custom voices — Ultravox can be fully customized to your needs.

Ultravox can also be deployed directly in your own cloud.

All the basics, plus some.

We know some of these are expected, but we want you to know we cover the basics:

Function Calling

Fine-tunable

Interruptions

Custom Voices & Voice Cloning Support

RAG Support

Works with existing text-based prompts

Multi-lingual

High Quality Speech

People are noticing.

They can't stop saying nice things about us *blushes*

Joe Heitzeberg

@jheitzeb

Wow! Ultravox is an *open source* speech to speech model — understands non-textual speech elements — paralinguistic information. @juberti just showed how it can pick up on tone, pauses, and more! @AITinkerers Seattle @FixieAI

bharat

@that_anokha_boy

ultravox is prolly most underrated project yall should checkout. i checked sarvam's shuka's code that is also inspired by ultravox.

Simon Willison

@simonw

I just spent some time with the voice demo of Ultravox at https://ai.town/ultravox and it really impressed me - openly licensed multi-modal audio model (like GPT-4o) based on Llama 3, and you can talk to it in your browser

Get in touch.

We'd love to learn more about your use case and how we can help.

Prefer direct email? We're here:

hello@fixie.ai