Apple’s newest AI study unlocks street navigation for blind users
SceneScout combines Apple Maps with a multimodal LLM to provide interactive, AI-generated descriptions of street view images.
To try to close this gap, the researchers present SceneScout, a project that combines Apple Maps APIs with a multimodal large language model to provide interactive, AI-generated descriptions of street view images. SceneScout obviously isn't a shipping product; it explores the collaboration between a multimodal large language model and the Apple Maps API, rather than real-time, computer vision-based navigation of the real world.

One participant in the study (P4) suggested a new form of interaction, in which users "could point the device in a certain direction" to receive on-demand descriptions, rather than having to physically align their phone camera to capture the surroundings.
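For a sense of how such a pipeline might be wired together, here is a minimal sketch that pairs MapKit's Look Around APIs (iOS 16+) with a vision-language model. This is not SceneScout's actual code, and the `describeImage(_:)` helper is a hypothetical stand-in for the call to a multimodal LLM.

```swift
import MapKit
import UIKit

// Hypothetical stand-in for a multimodal LLM call; a real implementation would send
// the snapshot image to a vision-language model and return its generated description.
func describeImage(_ image: UIImage) async throws -> String {
    "(model-generated description of the street view image would go here)"
}

// Fetch street-level imagery for a coordinate with MapKit's Look Around (iOS 16+),
// render it to a still image, and hand that image to the language model.
func describeStreetView(at coordinate: CLLocationCoordinate2D) async throws -> String {
    let request = MKLookAroundSceneRequest(coordinate: coordinate)
    guard let scene = try await request.scene else {
        return "No street-level imagery is available at this location."
    }

    let snapshotter = MKLookAroundSnapshotter(scene: scene, options: .init())
    let snapshot = try await snapshotter.snapshot
    return try await describeImage(snapshot.image)
}
```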