Hertz-dev, the first open-source base model for conversational audio


For the last few months, we at Standard Intelligence have focused on fundamental research on the frontier of audio-only speech generation. We're excited to announce that we're open-sourcing current checkpoints of our full-duplex, audio-only transformer base model, hertz-dev, with a total of 8.5 billion parameters.

The model takes in sampled latent representations and predicts the next encoded audio frame as a mixture of Gaussians, with 15 bits of quantized information from the next token acting as semantic scaffolding to steer the generation in a streamable manner. Hertz-dev is a glimpse at the future of real-time voice interaction, and is the easiest conversational audio model in the world for researchers to fine-tune and build on top of.
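
To make the "mixture of Gaussians" idea concrete, here is a minimal toy sketch of how a next latent frame could be drawn from such a distribution. It is not hertz-dev's code: the function name `sample_gmm_frame`, the two-component mixture, the 4-dimensional latent, and the diagonal covariance are all assumptions chosen for illustration. The only figure taken from the announcement is the 15 bits of quantized information, which corresponds to 2^15 possible values.

```python
import random

# 15 bits of quantized information allow 2**15 distinct values.
NUM_CODES = 2 ** 15

def sample_gmm_frame(weights, means, stds, rng):
    """Draw one latent frame from a diagonal-covariance Gaussian mixture.

    weights: mixture weights, one per component (hypothetical model output)
    means, stds: per-component lists of per-dimension means and std devs
    """
    # Pick a component in proportion to its mixture weight.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Sample each latent dimension independently from that component.
    return [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]

# Toy parameters standing in for what a model head might predict.
rng = random.Random(0)
weights = [0.7, 0.3]
means = [[0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 0.5, 2.0]]
stds = [[0.1] * 4, [0.2] * 4]

frame = sample_gmm_frame(weights, means, stds, rng)
print(len(frame), NUM_CODES)
```

In a streaming setting, a loop would call a sampler like this once per frame and feed the result back in as input. How the 15-bit quantized token conditions the mixture parameters is not described in the announcement, so the sketch leaves that step out.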
