Built on Open-Source Foundations

We built OpenVox with developers in mind, so we’ve made our base model openly available to anyone who wants to use it.

Model components:

Text Tokenizer: ~300M parameters, output dimension (OD) 768
Audio Encoder: ~20M parameters, OD 4096
Text Embedder: ~300M parameters, OD 768
Audio Projector: OD 4096
Llama 3.1 (output)
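The page does not describe how the Audio Projector works internally. A common design in audio-language models is a linear (or small MLP) layer that maps each audio-encoder frame into the vector space the language model consumes. The sketch below assumes a single linear layer and uses the 4096 output sizes listed above for the Audio Encoder and Audio Projector; `make_linear`, `project_frames`, and `linear_param_count` are hypothetical helpers for illustration, not part of OpenVox. (4096 is also the hidden size of Llama 3.1 8B, though the page does not say which Llama 3.1 variant is used.)

```python
import random

# Sizes taken from the component list; the single-linear-layer projector
# is an ASSUMPTION for illustration, not a documented OpenVox design.
AUDIO_ENCODER_DIM = 4096  # Audio Encoder, OD 4096
PROJECTOR_OUT_DIM = 4096  # Audio Projector, OD 4096


def make_linear(in_dim, out_dim, seed=0):
    """Hypothetical linear layer: weight matrix of shape [out_dim][in_dim] plus a bias vector."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project_frames(frames, weight, bias):
    """Map each audio-encoder frame (length in_dim) to a vector of length out_dim."""
    in_dim = len(weight[0])
    projected = []
    for frame in frames:
        if len(frame) != in_dim:
            raise ValueError(f"expected frame of length {in_dim}, got {len(frame)}")
        # out[i] = sum_j W[i][j] * x[j] + b[i]
        projected.append([sum(w * x for w, x in zip(row, frame)) + b
                          for row, b in zip(weight, bias)])
    return projected


def linear_param_count(in_dim, out_dim):
    """Parameters in one linear layer: in_dim * out_dim weights plus out_dim biases."""
    return in_dim * out_dim + out_dim


if __name__ == "__main__":
    # Toy sizes keep the pure-Python matrix math fast; real dims would use a tensor library.
    w, b = make_linear(8, 4)
    out = project_frames([[1.0] * 8 for _ in range(3)], w, b)
    print(len(out), len(out[0]))  # 3 4
    print(linear_param_count(AUDIO_ENCODER_DIM, PROJECTOR_OUT_DIM))  # 16781312
```

Under this assumption, a 4096-to-4096 linear projector would add 4096 × 4096 + 4096 = 16,781,312 parameters; an MLP projector would add more.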