Get the latest tech news

Open-source framework for real-time AI voice


Open-source framework for developing real-time multimodal conversational AI agents. - videosdk-live/agents

This SDK serves as a real-time bridge between AI models (like OpenAI and Gemini) and your users, facilitating seamless voice and media interactions. #FeatureDescription1 🎤 Real-time Communication (Audio/Video) Agents can listen, speak, and interact live in meetings.2 📞 SIP & Telephony Integration Seamlessly connect agents to phone systems via SIP for call handling, routing, and PSTN access.3 🧍 Virtual Avatars Add lifelike avatars to enhance interaction and presence using Simli.4 🤖 Multi-Model Support Integrate with OpenAI, Gemini, AWS NovaSonic, and more.5 🧩 Cascading Pipeline Integrates with different providers of STT, LLM, and TTS seamlessly.6 🧠 Conversational Flow Manages turn detection and VAD for smooth interactions.7 🛠️ Function Tools Extend agent capabilities with event scheduling, expense tracking, and more.8 🌐 MCP Integration Connect agents to external data sources and tools using Model Context Protocol.9 🔗 A2A Protocol Enable agent-to-agent interactions for complex workflows. For detailed guides, tutorials, and API references, check out our official VideoSDK AI Agents Documentation.

Get the Android app

Or read this on Hacker News

Read more on:

Photo of source framework

source framework

Photo of time AI voice

time AI voice

Related news:

News photo

Cogentcore: Open-source framework for building multi-platform apps with Go

News photo

The RAG reality check: New open-source framework lets enterprises scientifically measure AI performance

News photo

OctoTools: Stanford’s open-source framework optimizes LLM reasoning through modular tool orchestration