Meta Introduces Spirit LM, an Open-Source Model That Combines Text and Speech Inputs and Outputs
Spirit LM Expressive incorporates emotional cues into its speech generation and can detect and reflect anger, surprise, or joy.
• Spirit LM Expressive: Includes additional tokens for pitch and tone, allowing the model to capture more nuanced emotional states, such as excitement or sadness, and reflect them in the generated speech.

In line with Meta’s commitment to open science, the company has made Spirit LM fully open-source, providing researchers and developers with the model weights, code, and supporting documentation to build upon. By offering a more natural and expressive approach to AI-generated speech, Meta is enabling the broader research community to explore new possibilities for multimodal AI applications.
Read the full article on VentureBeat.