Iván Hernández Dalas: AGIBOT WORLD 2026 dataset is open-source to accelerate embodied AI development

AGIBOT’s robot has a dexterous design to collect and use data. Source: AGIBOT

As robotics research moves beyond controlled lab settings into real-world environments, the demand for large-scale, high-quality data has become increasingly critical, according to AGIBOT. The company today released AGIBOT WORLD 2026, an open-source heterogeneous dataset it said is designed to systematically support five key research pathways in embodied intelligence.

“The dataset features structured, high-quality, and precisely annotated real-world robot data, providing developers and researchers with a robust foundation for training next-generation embodied AI systems,” said AGIBOT.

Editor’s note: At the 2026 Robotics Summit & Expo on May 27 and 28 in Boston, there will be sessions on embodied and physical AI, as well as on humanoid robot development. Registration is now open.


AGIBOT WORLD follows free-form data-collection strategy

AGIBOT WORLD 2026 spans a wide range of real-world environments, including commercial spaces, homes, and everyday scenarios. AGIBOT said this captures the complexity, variability, and unpredictability that robots must handle in practice.

Unlike conventional datasets built on repetitive and scripted demonstrations, the Shanghai-based company has taken a free-form data-collection approach, in which teleoperators dynamically perform tasks based on real-time conditions.

It claimed that this strategy can significantly enhance diversity within each episode and improve generalization across multiple dimensions, including object categories, initial configurations, and task execution sequences. AGIBOT said its robot uses a flexible wheeled base, articulated head and waist movements, and lift-pitch capabilities for efficient, natural, and highly transferable data collection.

In parallel, AGIBOT constructs 1:1 digital twin environments in simulation, with all corresponding simulation data released alongside the real-world dataset.

AGIBOT says free-form data collection ensures comprehensive generalization. Source: AGIBOT

Innovations bridge the gap between data, real robot behavior

“A fundamental question in embodied AI remains: Does the data truly reflect how a robot operates as an integrated system?” noted AGIBOT. To address this, the company has introduced several features:

  • Whole-body control (WBC): This enables coordinated control of arms, waist, and hands, allowing robots to perform tasks more fluidly as a unified system rather than through isolated motions.
  • First-person beyond-visual-range teleoperation: The robot’s perception is aligned with that of the operator, enabling more intuitive, continuous, and transferable control.
  • Force-controlled data collection: AGIBOT said it has incorporated contact dynamics and force feedback, capturing not only motion trajectories, but also real physical interactions.

“Together, these capabilities ensure that the dataset more accurately represents real-world robot behavior,” asserted the company.

Whole-body control allows for smooth movement of the entire G2 robot. Source: AGIBOT

Industrial-grade hardware feeds the data pipeline

AGIBOT explained that the dataset is collected on its G2 hardware platform, which integrates high-performance joint actuators, multi-modal sensors, and a high-performance domain controller to support precise force control and scalable development.

Equipped with Zhixing 90D grippers and the dexterous OmniHand, the G2 captures synchronized multi-modal data—including RGB(D), tactile signals, lidar point clouds, IMU data, and full-body joint states—within a unified pipeline.
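AGIBOT has not described how the G2 pipeline synchronizes its sensor streams. As a rough illustration of what "synchronized multi-modal data" can mean in practice, the Python sketch below aligns several streams to a reference camera stream by nearest timestamp. The `Sample` type, the `nearest` and `align` functions, the stream names, and the 20 ms tolerance are all hypothetical choices made for this example, not part of the dataset or its tooling.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Sample:
    """One timestamped reading from a sensor stream (hypothetical type)."""
    t: float    # timestamp in seconds
    value: Any  # payload, e.g. an image, an IMU reading, or joint states


def nearest(samples: List[Sample], t: float, tol: float) -> Optional[Sample]:
    """Return the sample closest in time to t, or None if none is within tol.

    Assumes samples are sorted by timestamp.
    """
    if not samples:
        return None
    times = [s.t for s in samples]
    i = bisect_left(times, t)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = samples[max(i - 1, 0): i + 1]
    best = min(candidates, key=lambda s: abs(s.t - t))
    return best if abs(best.t - t) <= tol else None


def align(
    reference: List[Sample],
    streams: Dict[str, List[Sample]],
    tol: float = 0.02,  # assumed tolerance of 20 ms
) -> List[Dict[str, Any]]:
    """Build one frame per reference sample, pulling the nearest reading
    from every other stream; a missing reading is recorded as None."""
    frames = []
    for ref in reference:
        frame: Dict[str, Any] = {"t": ref.t, "reference": ref.value}
        for name, samples in streams.items():
            match = nearest(samples, ref.t, tol)
            frame[name] = match.value if match else None
        frames.append(frame)
    return frames


# Made-up readings, for illustration only.
rgb = [Sample(0.0, "rgb0"), Sample(0.1, "rgb1")]
others = {
    "imu": [Sample(0.001, "imu_a"), Sample(0.05, "imu_b"), Sample(0.099, "imu_c")],
    "tactile": [Sample(0.5, "touch_x")],
}
frames = align(rgb, others)
```

In this toy run, each camera frame picks up the IMU reading taken within a millisecond of it, while the tactile slot stays empty because the only tactile reading falls outside the tolerance.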

AGIBOT added that each data episode undergoes rigorous cleaning and validation through its “industrial-grade” data-processing system, ensuring readiness for large-scale model training and research applications.

The AGIBOT WORLD data process. Industrial-grade quality data is essential to developing embodied AI, says AGIBOT. Source: AGIBOT

Phase 1 release: Imitation learning

The company said it plans to release AGIBOT WORLD 2026 in five phases, each aligned with a core research direction in embodied intelligence.

The first release focuses on imitation learning, a key paradigm that enables robots to acquire complex physical skills from expert demonstrations. This phase includes hundreds of hours of real-world data collected primarily in commercial and service environments. The dataset combines:

  • Task-level descriptions (segment-level instructions)
  • Action sequences (step-by-step execution)
  • Atomic skill labels (e.g., pull, place)
  • Object annotations (2D bounding boxes and attributes such as name and color)

“Importantly, error-recovery trajectories are also retained and annotated,” said AGIBOT. “This hierarchical annotation framework—spanning from high-level tasks to low-level actions—provides the fidelity and corrective priors needed to train more robust and adaptive embodied agents.”
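To make the annotation hierarchy described above more concrete, here is a minimal Python sketch of how one annotated segment might be represented. It is not AGIBOT's published schema: the class names, field names, frame indices, and bounding-box convention are assumptions made for illustration, and the sample task is invented. Only the four annotation layers and the retention of error-recovery trajectories come from the company's description.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ObjectAnnotation:
    """A 2D bounding box plus attributes (hypothetical layout)."""
    name: str
    color: str
    # Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
    bbox_2d: Tuple[float, float, float, float]


@dataclass
class ActionStep:
    """One step in the action sequence, tagged with an atomic skill."""
    description: str
    atomic_skill: str  # e.g. "pull" or "place"
    start_frame: int
    end_frame: int
    is_error_recovery: bool = False  # marks a retained recovery trajectory
    objects: List[ObjectAnnotation] = field(default_factory=list)


@dataclass
class TaskSegment:
    """Top of the hierarchy: a segment-level instruction and its steps."""
    instruction: str
    steps: List[ActionStep] = field(default_factory=list)

    def skills(self) -> List[str]:
        """Atomic skill labels in execution order."""
        return [step.atomic_skill for step in self.steps]

    def recovery_steps(self) -> List[ActionStep]:
        """Steps annotated as error recovery."""
        return [step for step in self.steps if step.is_error_recovery]


# Invented example data, for illustration only.
cup = ObjectAnnotation(name="cup", color="red", bbox_2d=(120.0, 80.0, 180.0, 150.0))
segment = TaskSegment(
    instruction="Put the cup in the drawer",
    steps=[
        ActionStep("Pull the drawer open", "pull", 0, 90),
        ActionStep("Place the cup in the drawer", "place", 91, 200, objects=[cup]),
        ActionStep("Re-place the cup after it tips over", "place", 201, 260,
                   is_error_recovery=True, objects=[cup]),
    ],
)
```

With a layout like this, a training pipeline could filter on `recovery_steps()` to study corrective behavior separately, or flatten `skills()` into labels for skill-conditioned policies.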

AGIBOT WORLD 2026 uses a hierarchical annotation framework. Source: AGIBOT

AGIBOT WORLD part of a long-term commitment to embodied AI ecosystem

AGIBOT said it is among a small group of startups taking a long-term, infrastructure-driven approach to embodied intelligence.

Recognizing early that high-quality data is foundational to unlocking the next generation of robotic capabilities, the company has consistently open-sourced million-scale real-world and simulation datasets.

AGIBOT said this effort reflects its broader goal in embodied intelligence: to democratize access to high-quality robot data. Through the continued evolution of the AGIBOT WORLD ecosystem, the company aims to contribute to the global robotics community and accelerate the transition of embodied AI from research labs into real-world applications.

AGIBOT WORLD 2026 trains on real-world scenarios, such as this bimanual manipulation of flyers. Source: AGIBOT

The post AGIBOT WORLD 2026 dataset is open-source to accelerate embodied AI development appeared first on The Robot Report.


