Meta introduces Spirit LM, an open-source model that combines text and speech inputs and outputs


Spirit LM Expressive incorporates emotional cues into its speech generation and can detect and reflect anger, surprise, or joy.

• Spirit LM Expressive: Includes additional tokens for pitch and tone, allowing the model to capture more nuanced emotional states, such as excitement or sadness, and reflect them in the generated speech.

In line with Meta’s commitment to open science, the company has made Spirit LM fully open source, providing researchers and developers with the model weights, code, and supporting documentation to build upon. By offering a more natural and expressive approach to AI-generated speech and releasing the model openly, Meta is enabling the broader research community to explore new possibilities for multimodal AI applications.

Or read this on Venture Beat
