Iván Hernández Dalas: Why robots still struggle to see the real world

Orbbec offers a range of cameras for robot perception, picking, and navigation. Source: Orbbec
The robot on the trade show floor looks effortless. It glides toward a bin, identifies the object, reaches in, and places the item exactly where it needs to go. The crowd nods. Investors take notes. Engineers celebrate. Then the robot ships to its destination, and the world stops behaving like the demo.
This demo-to-deployment gap remains one of the most persistent challenges in robotics. Machines that perform beautifully under controlled conditions often struggle with shifting light, reflective surfaces, transparent materials, moving people, and forklift traffic.
Robots don’t need to see like humans. Robotic perception should be reliable, task-specific, and measurable under real operating conditions.
The controlled environment problem
Lab conditions often favor the perception stack. Lighting, object position, and backgrounds are controlled, and the robot is given every advantage. Real-world environments grant none of these favors. Warehouse floors, hospital corridors, and manufacturing lines introduce shifting light, reflective surfaces, moving people, vibration, and material variation.
Each of these variables can expose a weakness that never appeared in the demo. What looks like a planning or manipulation failure may begin with sensing, calibration, or poor confidence estimation. A robot cannot reliably plan around a depth map that is confident but wrong.
Traditional 2D cameras remain useful for recognition, inspection, and tracking. But a 2D image does not measure depth. Depth can be inferred from motion, learned priors, or multi-view geometry, but those estimates often break when lighting, texture, occlusion, or materials change.
This is why 3D vision systems, depth cameras, and sensor fusion have become central to robotics deployment. Robots need spatial measurements from the physical world, not smarter guesses from flat images.
Depth sensing is not a single technology
Robotic vision has moved through several generations of sensing technology, each solving some problems while introducing others.
Early robotic vision systems relied heavily on 2D cameras paired with highly structured environments. Assembly-line robots worked with fixed part positions, orientations, and lighting. In many cases, the intelligence was in the fixture, not the sensor.
Structured light systems project a known pattern onto a scene and estimate depth by reading how that pattern deforms. This approach can work well for indoor inspection and measurement. However, it can be sensitive to ambient light, motion, reflective or transparent surfaces, and interference from other active emitters.
Stereo vision uses two offset cameras to estimate depth. By matching corresponding points between the two images, the system estimates disparity and converts it into depth. Passive stereo depends on texture and light; active stereo adds infrared projection for low-texture scenes. Stereo systems can scale well for robotics, but low texture, repetitive patterns, motion blur, occlusion, reflective materials, and range trade-offs all matter.
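The disparity-to-depth step described above can be sketched for an idealized, rectified stereo pair. This is a minimal illustration, not any vendor's implementation: the function names, the 600-pixel focal length, the 50 mm baseline, and the quarter-pixel matching error are all assumptions chosen for the example.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in meters for a rectified pinhole stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity means no valid match (or a point at infinity).
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m: float, focal_px: float, baseline_m: float,
                disparity_error_px: float) -> float:
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    # Hypothetical camera parameters, for illustration only.
    focal_px, baseline_m, match_error_px = 600.0, 0.05, 0.25
    for disparity_px in (60.0, 15.0, 6.0):
        z = depth_from_disparity(disparity_px, focal_px, baseline_m)
        dz = depth_error(z, focal_px, baseline_m, match_error_px)
        print(f"disparity {disparity_px:5.1f} px -> depth {z:.2f} m, +/- {dz:.3f} m")
```

Because the error term grows with the square of depth, the same matching error that costs a few millimeters at half a meter costs tens of centimeters at five meters. That is one concrete form of the range trade-off mentioned above.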
Time-of-flight (ToF) technology estimates distance by measuring how long emitted infrared light takes to return from the scene, either directly or through the phase shift of a modulated signal. ToF cameras can be compact, fast, and useful for dense depth, but ambient infrared, multipath reflections, reflective surfaces, and range ambiguity can all distort results.
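Range ambiguity is easiest to see with a worked example. The sketch below assumes a continuous-wave (phase-based) ToF sensor with a single modulation frequency, which is only one of several ToF designs. The function names and the 100 MHz and 20 MHz frequencies are illustrative assumptions, not the specifications of any product.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def unambiguous_range_m(mod_freq_hz: float) -> float:
    """Largest distance before the measured phase wraps: c / (2 * f)."""
    return SPEED_OF_LIGHT_M_S / (2.0 * mod_freq_hz)


def distance_from_phase(phase_rad: float, mod_freq_hz: float) -> float:
    """Distance implied by a phase shift in [0, 2*pi): d = c * phi / (4 * pi * f)."""
    return SPEED_OF_LIGHT_M_S * phase_rad / (4.0 * math.pi * mod_freq_hz)


def reported_distance_m(true_distance_m: float, mod_freq_hz: float) -> float:
    """What a single-frequency sensor would report: the true distance, wrapped."""
    return true_distance_m % unambiguous_range_m(mod_freq_hz)


if __name__ == "__main__":
    for freq_hz in (100e6, 20e6):  # hypothetical modulation frequencies
        limit = unambiguous_range_m(freq_hz)
        seen = reported_distance_m(2.0, freq_hz)
        print(f"{freq_hz / 1e6:.0f} MHz: wraps at {limit:.2f} m, "
              f"a target at 2.00 m reads as {seen:.2f} m")
```

Under these assumptions, a higher modulation frequency shortens the distance at which readings wrap around, so a target beyond that limit is reported as confidently closer than it is. Using more than one modulation frequency is a common way to resolve the wrap.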
The practical conclusion is simple: No sensor category is universally best. Structured light, stereo, ToF, lidar, RGB cameras, and inertial measurement units (IMUs) all have useful roles. The right choice depends on task, range, lighting, materials, motion, compute, safety needs, and failure tolerance.

Effective 3D robotic perception depends on a range of sensing technologies. Source: Orbbec
Better AI helps, but it is not a substitute for reliable measurement
It’s tempting to assume that AI can compensate for sensor limits. AI can substantially improve robotic perception. It can denoise depth maps, fill gaps, fuse RGB and depth, estimate pose, and track motion.
AI still depends on reliable physical data. A robot needs depth estimates that are correct enough to act on, not merely plausible. That difference matters most near people, expensive goods, or machinery.
For deployment, perception needs measurement, uncertainty, validation, and graceful degradation. If a sensor saturates, loses texture, sees through glass, receives multipath reflections, or drifts out of calibration, the system should recognize reduced confidence rather than silently passing bad geometry downstream.
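One way to avoid passing bad geometry downstream is to gate each depth frame before planning uses it. The sketch below is a hedged illustration: it assumes the sensor exposes a per-pixel confidence value between 0 and 1, which not every camera does, and the names `gate_depth` and `GatedDepth` and all thresholds are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class GatedDepth:
    depth_m: List[Optional[float]]  # rejected pixels become None
    valid_fraction: float           # share of pixels that passed every check
    usable: bool                    # False tells the planner to slow down or stop


def gate_depth(depth_m: Sequence[Optional[float]], confidence: Sequence[float],
               min_conf: float = 0.6, min_range_m: float = 0.2,
               max_range_m: float = 5.0, min_valid_fraction: float = 0.7) -> GatedDepth:
    """Mask untrustworthy pixels and flag frames with too little valid data."""
    if len(depth_m) != len(confidence) or len(depth_m) == 0:
        raise ValueError("depth and confidence must be non-empty and equal in length")
    masked: List[Optional[float]] = []
    for z, c in zip(depth_m, confidence):
        ok = (z is not None and math.isfinite(z)
              and min_range_m <= z <= max_range_m and c >= min_conf)
        masked.append(z if ok else None)
    valid_fraction = sum(z is not None for z in masked) / len(masked)
    return GatedDepth(masked, valid_fraction, valid_fraction >= min_valid_fraction)


if __name__ == "__main__":
    # Made-up frame: includes a dropout (0.0), a NaN, a low-confidence pixel,
    # and an out-of-range reading.
    depth = [1.0, 1.1, 0.0, float("nan"), 1.2, 9.0, 1.3, 1.25]
    conf = [0.9, 0.8, 0.9, 0.9, 0.3, 0.9, 0.95, 0.7]
    result = gate_depth(depth, conf)
    print(result.valid_fraction, result.usable)
```

The design choice here is that the function returns an explicit `usable` flag rather than silently returning the masked frame, so the planner has to decide what reduced confidence means for the task.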
In robotics, a perception failure that looks confident is often more dangerous than one that fails visibly.

Robotic perception systems must have enough real-world data for certainty. Source: Orbbec
What real-world deployment actually demands
Deployment is where the difficult problems tend to appear. A robot may perform well in integration, then fail on edge cases the lab never covered: black rubber, glossy packaging, transparent film, sunlit doorways, vibration, dust, or overlapping active depth cameras.
Deployment teams should evaluate perception systems against the full operating envelope. The real question is whether the perception stack can produce reliable spatial information under the conditions that matter for the task.
Evaluation should cover depth accuracy, latency, calibration drift, compute load, mechanical fit, and resilience to dust, vibration, and interference. It should also test glossy, dark, transparent, metallic, and low-texture surfaces.
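A per-material depth check can be as simple as pointing the camera at a flat target at a known distance and summarizing the readings. The sketch below assumes that setup. The function name `depth_metrics` is hypothetical, and the sample readings are invented purely to show the calculation; they are not measurements from any sensor.

```python
import math
from typing import Dict, Optional, Sequence


def depth_metrics(samples_m: Sequence[Optional[float]],
                  ground_truth_m: float) -> Dict[str, Optional[float]]:
    """Fill rate, mean bias, and RMSE against a known target distance."""
    if len(samples_m) == 0:
        raise ValueError("need at least one sample")
    # Treat None, NaN, infinities, and non-positive values as dropouts.
    valid = [z for z in samples_m if z is not None and math.isfinite(z) and z > 0]
    fill_rate = len(valid) / len(samples_m)
    if not valid:
        return {"fill_rate": 0.0, "bias_m": None, "rmse_m": None}
    errors = [z - ground_truth_m for z in valid]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"fill_rate": fill_rate, "bias_m": bias, "rmse_m": rmse}


if __name__ == "__main__":
    # Invented readings at a 1.0 m target, one list per surface type.
    readings = {
        "matte": [1.00, 1.01, 0.99, 1.00],
        "glossy": [1.00, None, 1.10, 0.0],
    }
    for material, samples in readings.items():
        print(material, depth_metrics(samples, ground_truth_m=1.0))
```

Reporting fill rate alongside error matters because a surface that returns few pixels can look accurate on the pixels that survive while still leaving the robot with too little geometry to act on.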
Lighting must be treated as a variable, not a background assumption. A system that works under controlled indoor lighting may behave differently under direct sunlight, mixed LED sources, flicker, shadows, or near-infrared interference. Multi-camera operation should also be validated, especially when active illumination is involved.
Deployment readiness comes from repeatable performance across the real distribution of operating conditions, including the inconvenient cases that rarely appear in a polished demo video.

Cameras and other sensors must be optimized for real-world materials and environments. Source: Orbbec
What’s next for machine perception?
The robotics industry is not short on ambition. Humanoid robots, autonomous warehouses, hospital logistics, and factory automation all depend on machines that can perceive the physical world reliably enough to act in it.
The future of robotic perception will come from better depth sensing, sensor fusion, online calibration, and validation. Stereo systems will continue to improve through stronger matching algorithms and neural processing. ToF systems will benefit from better modulation schemes, multipath mitigation, wider dynamic range, and sensor fusion.
Structured light will remain valuable in controlled close-range measurement and inspection. RGB, depth, lidar, IMU, tactile sensing, and semantic models will increasingly work together rather than compete as isolated technologies.
The most important progress may be less glamorous than a new algorithm: perception systems that know when they are uncertain, degrade gracefully, and expose useful confidence information to planning and control. Robotic perception needs enough accuracy, speed, and uncertainty awareness to support the task.
Making deployment look more like the demo starts with building perception for the world robots actually face, not the world we wish they operated in.
About the author
David Chen holds a Ph.D. in engineering mechanics, specializing in optical measurement systems. He has been developing RGB+Depth cameras since 2009 and, since joining Orbbec Inc. in 2013, has contributed to the successful global launch of more than 10 products.
Orbbec offers products spanning structured light, stereo vision, ToF, and lidar technologies. The company said its sensors power robots as well as manufacturing, logistics, retail, 3D scanning, healthcare, and fitness systems.
The post Why robots still struggle to see the real world appeared first on The Robot Report.
