Built on Open-Source Foundations
We built OpenVox with developers in mind, so we’ve made our base model available to anyone who wants to use it.
Text Tokenizer (~300 M params, OD: 768) → Text Embedder (~300 M params, OD: 768) → Llama 3.1 Output
Audio Encoder (~20 M params, OD: 4096) → Audio Projector (OD: 4096) → Llama 3.1 Output
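For readers who want the diagram in a form they can inspect programmatically, here is a minimal sketch that records the stages exactly as labeled above. It is not OpenVox code: the `Stage` class, the `describe` and `final_od` helpers, and the grouping of boxes into a text branch and an audio branch are assumptions made for illustration. The "OD" values and parameter counts are copied from the diagram without interpretation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One box from the diagram (hypothetical helper type, not an OpenVox API)."""
    name: str
    params: Optional[str]  # parameter count as printed in the diagram, None if not given
    od: Optional[int]      # the "OD" value as printed in the diagram, None if not given


# Assumption: the boxes pair up into two branches that both feed Llama 3.1.
TEXT_BRANCH = [
    Stage("Text Tokenizer", "~300 M", 768),
    Stage("Text Embedder", "~300 M", 768),
]
AUDIO_BRANCH = [
    Stage("Audio Encoder", "~20 M", 4096),
    Stage("Audio Projector", None, 4096),
]
SINK = Stage("Llama 3.1 Output", None, None)


def describe(branch: List[Stage]) -> str:
    """Return a one-line 'A -> B -> sink' summary of a branch."""
    return " -> ".join(stage.name for stage in branch + [SINK])


def final_od(branch: List[Stage]) -> Optional[int]:
    """Return the OD value printed on the last stage of a branch."""
    return branch[-1].od
```

Calling `describe(AUDIO_BRANCH)` returns `"Audio Encoder -> Audio Projector -> Llama 3.1 Output"`, and `final_od(TEXT_BRANCH)` returns `768`, matching the labels in the diagram.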