Iván Hernández Dalas: Context is king: How Avride uses cloud VLMs as a safety net for delivery robots
Avride has integrated vision-language models into its delivery robots. Source: Avride

Avride Inc. has built its delivery robots for a high level of autonomy. Every single day, hundreds of them navigate busy city streets entirely on their own, processing complex sensor data locally on their onboard compute units. Our sidewalk robots run with minimal human involvement, reliably handling standard urban maneuvers, pedestrians, and traffic lights.

However, efficiently managing the mechanics of navigation – even in challenging conditions like narrow pathways or bad weather – is only one part of the equation. Ensuring a robot behaves appropriately in unusual, sensitive, or high-stakes real-world environments requires a different kind of intelligence.

To add a proactive layer of environmental awareness, we have integrated heavy, cloud-based vision-language models (VLMs) into our system as an automated “VLM-watcher.”

From object detection to holistic scene understandi...