Get the latest tech news
OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
Three, all new proprietary voice models called gpt-4o-transcribe, gpt-4o-mini-transcribe and gpt-4o-mini-tts.
Today, the ChatGPT maker has unveiled three, all new proprietary voice models called gpt-4o-transcribe, gpt-4o-mini-transcribe and gpt-4o-mini-tts, available initially in its application programming interface (API) for third-party software developers to build their own apps atop, as well as on a custom demo site, OpenAI.fm, that individual users can access for limited testing and fun. It is meant to supersede OpenAI’s two-year-old Whisper open source text-to-speech model, offering lower word error rates across industry benchmarks and improved performance in noisy environments, with diverse accents, and at varying speech speeds — across 100+ languages. Another startup, Hume AI offers a new model Octave TTS with sentence-level and even word-level customization of pronunciation and emotional inflection — based entirely on the user’s instructions, not any pre-set voices.
Or read this on Venture Beat