Show HN: Open-source, native audio turn detection model
Source: the pipecat-ai/smart-turn repository on GitHub.
We have experimented with a number of different architectures and approaches to training data, and are releasing this version of the model now because we are confident that performance can be rapidly improved.

Goals for the model:
- Support for a wide range of languages
- Inference time <50ms on GPU and <500ms on CPU
- Much wider range of speech nuances captured in training data
- A completely synthetic training data pipeline
- Text conditioning of the model, to support "modes" like credit card, telephone number, and address entry

Our goal for an initial version of this model was to overfit on a non-trivial amount of data, plus exceed a non-quantitative vibes threshold when experimenting interactively.