Iván Hernández Dalas: AGIBOT unveils Genie Envisioner 2.0 to advance world models into scalable simulators for embodied AI
The Genie Envisioner 2.0 world simulator uses video data to help control robots. Source: AGIBOT
AGIBOT today announced the release of Genie Envisioner 2.0, or GE 2-Sim, which it said marked a significant step forward in the evolution of world models — from world action models to fully interactive “world simulators.”
The new system introduces what the company described as a “physical evolution engine” for embodied AI. It is a model-based environment where robots can be trained, evaluated, and optimized at scale, without relying solely on costly real-world trial and error.
From understanding the world to learning within it
In 2025, AGIBOT introduced what it claimed was the industry’s first action-driven world model, Genie Envisioner. The open-source platform enabled robots to understand the world through integrated modeling of vision, language, and action, said the Shanghai-based company.
With Genie Envisioner 2.0, AGIBOT said it has shifted the paradigm further, from enabling robots to understand the world to enabling them to learn within a world generated by models.
The company asserted that this transition reflects a broader shift in embodied AI — from representing the world to simulating the world itself. As world models evolve into stable, high-fidelity environments that respond to actions in physically consistent ways, they unlock the ability to train robots at scale in synthetic environments.
AGIBOT said it believes GE 2-Sim marks a critical inflection point toward achieving a true scaling law in embodied intelligence.
World action models can show state evolution. Source: AGIBOT
From world action models to world simulators
At the core of this evolution is AGIBOT’s continued development of the world action model (WAM) framework, which extends traditional world models by explicitly incorporating actions as a first-class variable.
Rather than modeling only state, WAM captures the full loop of:
- State → Action → State Evolution
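The state → action → state loop can be sketched in code. This is a toy linear-dynamics stand-in for a learned world action model, not AGIBOT's implementation; the class name, `state_dim`, and `action_dim` are assumptions made for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class WorldActionModel:
    """Illustrative world action model: the action is a first-class input.

    A toy linear-dynamics stand-in for a learned model, for illustration only.
    """
    state_dim: int = 4
    action_dim: int = 2

    def __post_init__(self):
        rng = np.random.default_rng(0)
        # Fixed random matrices standing in for learned dynamics weights.
        self.A = rng.normal(scale=0.1, size=(self.state_dim, self.state_dim))
        self.B = rng.normal(scale=0.1, size=(self.state_dim, self.action_dim))

    def step(self, state: np.ndarray, action: np.ndarray) -> np.ndarray:
        """State -> Action -> State Evolution: predict the next state."""
        return state + self.A @ state + self.B @ action


wam = WorldActionModel()
state = np.zeros(wam.state_dim)
for _ in range(3):  # roll the loop forward three steps
    state = wam.step(state, np.ones(wam.action_dim))
print(state.shape)  # (4,)
```

The point of the interface is that `step` cannot be called without an action: state evolution is conditioned on what the robot does, not predicted in isolation.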
This enables world models to serve as a foundational layer for both policy learning and action generation. Building on this foundation, AGIBOT has progressively developed a series of systems:
- EnerVerse: Extends embodied environments into a computable 4D world model
- Genie Envisioner Act (GE-Act): Bridges world representation and action trajectory generation
- Act2Goal: Enables long-horizon, goal-driven control
While these advances allowed world models to support policy learning, real-world deployment exposed key limitations: high reliance on physical environments, costly evaluation, and data scalability constraints.
This led to a fundamental realization. The next breakthrough lies not in stronger representation, but in transforming world models into fully functional simulators.
Making the world runnable: Toward interactive simulation
To enable this transition, AGIBOT introduces a set of new capabilities that push world models toward interactive simulation:
- EnerVerse-AC: Introduces action-conditioned world modeling for future prediction
- Genie Envisioner Sim (GE-Sim): A neural simulator for closed-loop policy evaluation
- EWMBench: A comprehensive benchmark evaluating simulation fidelity, action correctness, and semantic alignment
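Closed-loop policy evaluation of the kind GE-Sim performs can be sketched as a rollout in which the policy's actions feed the simulator and the simulator's predicted observations feed the policy. The function names `policy`, `simulator_step`, and `success_check` are placeholders for a learned controller, a neural world-simulator step, and a task success metric; none of them come from AGIBOT's release.

```python
import numpy as np


def rollout_in_simulator(policy, simulator_step, initial_obs, horizon=50,
                         success_check=None):
    """Closed-loop evaluation sketch: policy and simulator feed each other.

    All callables are placeholders; this is an illustration of the loop,
    not AGIBOT's GE-Sim API.
    """
    obs = initial_obs
    trajectory = [obs]
    for _ in range(horizon):
        action = policy(obs)               # policy acts on predicted observation
        obs = simulator_step(obs, action)  # simulator predicts the next observation
        trajectory.append(obs)
    success = bool(success_check(obs)) if success_check else False
    return trajectory, success


# Toy example: a policy that drives a scalar "observation" toward zero.
policy = lambda obs: -0.5 * obs
sim = lambda obs, act: obs + act
traj, ok = rollout_in_simulator(policy, sim, np.array([1.0]), horizon=20,
                                success_check=lambda o: abs(o[0]) < 1e-3)
print(ok)  # True
```

Because the loop is closed, errors compound: a policy is judged on observations the simulator generated in response to its own earlier actions, which is what makes simulation fidelity the binding constraint.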
At the same time, AGIBOT establishes a new data and training paradigm:
- Real2Edit2Real: Real-world data becomes editable and extensible, significantly increasing scale and diversity
- Fidelity-Aware Data Composition: Combines real and generated data to balance realism and generalization
Together, these advances transform world models from representation systems into environment-level infrastructure.
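Fidelity-aware composition of real and generated data can be sketched as a batch sampler that draws a fixed fraction of each training batch from real data and the rest from generated data. The fixed-ratio scheme and the function name are assumptions for the sketch; AGIBOT has not published its exact composition rule.

```python
import random


def compose_batch(real_pool, generated_pool, batch_size, real_fraction=0.5,
                  rng=None):
    """Illustrative fidelity-aware batch composition.

    Draws round(batch_size * real_fraction) samples from real data and the
    remainder from generated data, then shuffles. The fixed ratio is an
    assumption, not AGIBOT's published method.
    """
    rng = rng or random.Random(0)
    n_real = round(batch_size * real_fraction)
    n_gen = batch_size - n_real
    batch = (rng.choices(real_pool, k=n_real)
             + rng.choices(generated_pool, k=n_gen))
    rng.shuffle(batch)
    return batch


real = [("real", i) for i in range(100)]      # scarce real-world episodes
gen = [("gen", i) for i in range(1000)]       # abundant generated episodes
batch = compose_batch(real, gen, batch_size=32, real_fraction=0.25)
print(sum(1 for tag, _ in batch if tag == "real"))  # 8 of 32 samples are real
```

The design tension the ratio expresses is the one the article names: real samples anchor realism, while generated samples supply the scale and diversity real collection cannot.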
A world simulator can make simulation more interactive and productive. Source: AGIBOT
Genie Envisioner 2.0: A ‘physical evolution engine’
Genie Envisioner 2.0 represents the culmination of this evolution—a system that is no longer just generative, but operational. Key capabilities include:
Action-driven world dynamics
The system responds directly to robot actions, generating high-fidelity environmental changes that follow physical and semantic constraints. The world becomes a process shaped by interaction, rather than a static representation.
Long-horizon temporal modeling
Supports minute-level stable simulation, enabling continuous generation of full task sequences rather than fragmented clips.
Embodied spatial consistency
Unifies multi-view perception, cross-view 3D consistency, and robot proprioception into a single representation—transforming perception from images into a fully interactive embodied world.
Built-in evaluation and reward modeling
A native general reward model enables self-evaluation and optimization based on textual feedback, supporting reinforcement learning in the world model without human-designed rewards.
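The idea of deriving a reward from textual feedback rather than a hand-designed reward function can be illustrated with a toy scorer. A real system would use a learned vision-language reward model; the keyword heuristic, vocabulary, and function name below are all illustrative assumptions.

```python
def text_feedback_reward(instruction: str, feedback: str) -> float:
    """Toy stand-in for a general reward model scoring textual feedback.

    Counts task-outcome keywords instead of using a learned model; purely
    illustrative of reward-from-text, not AGIBOT's reward model.
    """
    positive = {"success", "completed", "grasped", "placed"}
    negative = {"failed", "dropped", "missed", "collision"}
    words = set(feedback.lower().split())
    # The instruction would condition a real reward model; the toy scorer
    # ignores it and looks only at outcome keywords in the feedback.
    score = len(words & positive) - len(words & negative)
    return float(score)


print(text_feedback_reward("pick up the cup", "object grasped and placed"))  # 2.0
print(text_feedback_reward("pick up the cup", "the arm failed"))             # -1.0
```

A scalar signal of this shape is what lets a reinforcement-learning loop run inside the world model without a human-designed reward function per task.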
Toward real-time interaction
With improved inference efficiency, GE 2-Sim approaches real-time operation, enabling:
- Eval in World Model
- RL in World Model
- Teleoperation in World Model
This marks the transition of world models from offline tools to interactive system environments.
The core simulation engine can provide data to feed AI. Source: AGIBOT
A paradigm shift: When models become worlds
As these capabilities converge, embodied AI is undergoing a fundamental transformation, from “using models to understand the world” to “learning and making decisions within model-generated worlds.”
On one side, the integration of WAM and vision-language-action (VLA) models enables a shift from reactive control to generative, predictive decision-making.
On the other, world simulators allow robots to explore, iterate, and optimize at scale—no longer limited by real-world data availability, but by the fidelity of simulation itself.
When these two trajectories converge, robots move beyond replicating human demonstrations to continuously exploring, adapting, and evolving within model-generated environments.
Toward a new foundation for embodied intelligence
AGIBOT envisions world models evolving from tools for understanding, to platforms for learning, and ultimately to infrastructure that drives continuous evolution.
When models become worlds, reality is no longer the only training ground. When worlds can be constructed, learning can be scaled. And when evolution happens within models, the boundaries of embodied AI can be fundamentally redefined.
Editor’s note: At the 2026 Robotics Summit & Expo on May 27 and 28 in Boston, there will be sessions on embodied and physical AI. Registration is now open.
The post AGIBOT unveils Genie Envisioner 2.0 to advance world models into scalable simulators for embodied AI appeared first on The Robot Report.
