Get the latest tech news
Qwen3-Omni: Native Omni AI model for text, image and video
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time. ...
Speech Input: English, Chinese, Korean, Japanese, German, Russian, Italian, French, Spanish, Portuguese, Malay, Dutch, Indonesian, Turkish, Vietnamese, Cantonese, Arabic, Urdu. Since our code is currently in the pull request stage, and audio output inference support for the Instruct model will be released in the near future, you can follow the commands below to install vLLM from source. Additionally, the text generated by the thinker will be more readable, with a natural, conversational tone and without complex formatting that is difficult to vocalize, leading to more stable and fluent audio output from the talker.
Or read this on Hacker News