AI that communicates just like we do
Ultravox is an open-source Speech Language Model (SLM) trained to understand speech naturally, just like humans do. Say goodbye to awkward pauses, slow response times, and robotic speech — Ultravox delivers smooth, real-time communication.
TRY IT OUT
ver 0.4
Learn more about Ultravox by talking to it.
or call 1 844-741-5700
The Future of AI Speech is Here
Experience the cutting edge of AI speech with Ultravox, where technology meets human interaction to create fluid, natural conversations on every medium.
Beyond Speech Recognition
Ultravox is an advanced LLM that processes speech directly, without conversion to text. This enables much more natural and fluid conversations.
Web or VoIP Ready
Seamlessly integrate Ultravox into your web, native app, or phone-based products with minimal effort. It comes with SDKs for all major languages and built-in Twilio support.
Multi-lingual by default
Ultravox is fluent in all major languages and is easily adapted to support new languages or accents, ensuring smooth communication across diverse audiences.
BYOM (Bring Your Own Model)
Ultravox gives you the flexibility to work with any open-source model, even your own fine-tuned models.
Fast, Accurate, Smart. Pick three.
Unlike other voice-based systems, Ultravox integrates speech recognition directly, without relying on transforming speech into text. This makes Ultravox faster, more reliable, and more natural.
Ultravox
Understanding speech directly means there are fewer moving parts, which gives much faster and more consistent response times than legacy component systems.
Legacy Component Systems
The current industry standard is a cascaded pipeline of services strung together to give the illusion of a seamless experience. This means it's slower, more brittle, and unable to capture the nuances of human speech.
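To make the difference concrete, here is a toy sketch of why a text hand-off loses nuance. Everything in it is hypothetical and for illustration only: `Utterance`, `transcribe`, `cascaded_input`, and `direct_input` are made-up names, not part of Ultravox or any SDK, and real audio is of course not a pair of strings.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Toy stand-in for an audio clip: the words plus how they were said."""
    words: str
    tone: str  # e.g. "sarcastic" or "neutral" (illustrative labels only)


def transcribe(utterance: Utterance) -> str:
    """Hypothetical ASR stage: emits text only, so the tone is discarded."""
    return utterance.words


def cascaded_input(utterance: Utterance) -> dict:
    """What a text-only LLM receives in a cascaded pipeline: the transcript."""
    return {"words": transcribe(utterance)}


def direct_input(utterance: Utterance) -> dict:
    """What a speech-native model can draw on: the words and the delivery."""
    return {"words": utterance.words, "tone": utterance.tone}


if __name__ == "__main__":
    clip = Utterance(words="great, another meeting", tone="sarcastic")
    print(cascaded_input(clip))  # the tone never reaches the language model
    print(direct_input(clip))    # the tone is still available
```

In the cascaded case the language model only ever sees the transcript, so whatever the transcription step drops is gone for every later stage.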
BENCHMARKS
CoVoST2 Translation
Our primary method of evaluation is zero-shot speech translation, measured by BLEU, as a proxy for general instruction-following capability (higher is better).
Language pair   BLEU
En - De         25.47
En - Ca         27.46
En - Ar         28.07
Ru - En         38.96
Es - En         37.11
Zh - En         10.08
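For readers new to BLEU, here is a minimal sketch of how the score is computed. It is a simplified version (whitespace tokenization, one reference per sentence, no smoothing) and is not the evaluation script behind the numbers above, so its output will not match a standard BLEU tool exactly. The function names are our own for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumes one reference per hypothesis, whitespace tokenization,
    and no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Counter intersection clips each n-gram count by the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
    print(f"BLEU: {score:.2f}")
```

A perfect match scores 100 and a translation sharing no words with the reference scores 0, which is why higher numbers in the table mean closer translations.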
Customize it, then run it anywhere (even on-prem)
Whether it's adding support for additional languages, fine-tuning on your own datasets, or creating unique and custom voices — Ultravox can be fully customized to your needs.
Ultravox can also be deployed directly in your own cloud.
All the basics, plus some
We know some of these are expected, but we want you to know we cover the basics:
Function Calling
Fine-tunable
Interruptions
Custom Voices & Voice Cloning Support
RAG Support
Works with existing text-based prompts
Multi-lingual
High Quality Speech
People are noticing
They can't stop saying nice things about us *blushes*
Joe Heitzeberg
@jheitzeb
Wow! Ultravox is an *open source* speech to speech model — understands non-textual speech elements — paralinguistic information. @juberti just showed how it can pick up on tone, pauses, and more! @AITinkerers Seattle @FixieAI
bharat
@that_anokha_boy
ultravox is prolly most underrated project yall should checkout. i checked sarvam's shuka's code that is also inspired by ultravox.
Simon Willison
@simonw
I just spent some time with the voice demo of Ultravox at https://ai.town/ultravox and it really impressed me - openly licensed multi-modal audio model (like GPT-4o) based on Llama 3, and you can talk to it in your browser
Get In Touch
We'd love to learn more about your use case and how we can help
Prefer direct email? We're here:
hello@fixie.ai
© 2024 Fixie