
Qwen3-Omni: Native Omni AI model for text, image and video


Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time. ...

Speech Input: English, Chinese, Korean, Japanese, German, Russian, Italian, French, Spanish, Portuguese, Malay, Dutch, Indonesian, Turkish, Vietnamese, Cantonese, Arabic, and Urdu.

Since the code is currently at the pull-request stage, and audio-output inference support for the Instruct model will be released in the near future, vLLM currently has to be installed from source (the original announcement lists the commands). Additionally, the text generated by the thinker is more readable, with a natural, conversational tone and without complex formatting that is difficult to vocalize, which leads to more stable and fluent audio output from the talker.
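For orientation, here is a minimal Python sketch of what querying the Instruct model through vLLM's existing offline LLM API might look like once that support merges. The model identifier and the llm.chat usage are assumptions based on Qwen's usual Hugging Face naming and vLLM's documented API, not commands from the announcement; check the model card for the real identifier and prompt format.

```python
# Minimal sketch (not from the announcement): text-in, text-out inference
# against a Qwen3-Omni Instruct checkpoint via vLLM's offline LLM API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-Omni-30B-A3B-Instruct",  # hypothetical model ID
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.7, max_tokens=256)

# llm.chat applies the model's chat template, which suits instruct-tuned
# checkpoints better than passing a raw prompt string to llm.generate.
messages = [
    {"role": "user", "content": "In one sentence, what can an omni-modal model do?"}
]
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```

Audio, image, and video inputs would additionally go through vLLM's multi_modal_data mechanism once the pull request adds support for those modalities.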

