Hertz-dev, the first open-source base model for conversational audio
For the last few months, we at Standard Intelligence have focused on fundamental research at the frontier of audio-only speech generation. We're excited to announce that we're open-sourcing current checkpoints of our full-duplex, audio-only transformer base model, hertz-dev, with a total of 8.5 billion parameters.
The model takes sampled latent representations as input and predicts the next encoded audio frame as a mixture of Gaussians, with 15 bits of quantized information from the next token acting as semantic scaffolding to steer the generation in a streamable manner. Hertz-dev is a glimpse at the future of real-time voice interaction, and is the easiest conversational audio model in the world for researchers to fine-tune and build on top of.
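The announcement describes the output head only at a high level. As a rough illustration of what "predicting the next frame as a mixture of Gaussians" can mean, here is a minimal sketch, assuming a diagonal-covariance mixture over a continuous latent vector. The latent width, the number of mixture components, and all function names below are hypothetical and are not taken from hertz-dev's code. For scale, 15 bits of quantized information correspond to 2^15 = 32,768 distinct values.

```python
import math
import random
from typing import List, Sequence

# Hypothetical sizes for illustration only; the announcement does not
# give the latent width or the number of mixture components.
LATENT_DIM = 4
NUM_COMPONENTS = 3

# 15 bits of quantized information correspond to 2**15 = 32768 values.
SCAFFOLD_VALUES = 2 ** 15


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over the mixture logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def gmm_log_prob(x: Sequence[float],
                 logits: Sequence[float],
                 means: Sequence[Sequence[float]],
                 log_stds: Sequence[Sequence[float]]) -> float:
    """log p(x) under a mixture of diagonal Gaussians."""
    log_w = log_softmax(logits)
    per_component = []
    for k, lw in enumerate(log_w):
        lp = lw
        for d, xd in enumerate(x):
            z = (xd - means[k][d]) / math.exp(log_stds[k][d])
            lp += -0.5 * z * z - log_stds[k][d] - 0.5 * math.log(2 * math.pi)
        per_component.append(lp)
    # Combine components with log-sum-exp to avoid underflow.
    m = max(per_component)
    return m + math.log(sum(math.exp(p - m) for p in per_component))


def gmm_sample(logits: Sequence[float],
               means: Sequence[Sequence[float]],
               log_stds: Sequence[Sequence[float]],
               rng: random.Random) -> List[float]:
    """Draw one latent frame: pick a component, then sample each dimension."""
    weights = [math.exp(lw) for lw in log_softmax(logits)]
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return [rng.gauss(mu, math.exp(ls)) for mu, ls in zip(means[k], log_stds[k])]


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for the parameters a model head might emit for the next frame.
    logits = [rng.uniform(-1, 1) for _ in range(NUM_COMPONENTS)]
    means = [[rng.uniform(-1, 1) for _ in range(LATENT_DIM)]
             for _ in range(NUM_COMPONENTS)]
    log_stds = [[-1.0] * LATENT_DIM for _ in range(NUM_COMPONENTS)]
    frame = gmm_sample(logits, means, log_stds, rng)
    print("sampled frame:", [round(v, 3) for v in frame])
    print("log p(frame):", round(gmm_log_prob(frame, logits, means, log_stds), 3))
```

In this sketch, sampling first picks a component according to its mixture weight and then draws each latent dimension independently; the negative of `gmm_log_prob` is the kind of per-frame loss such a head would typically be trained with.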