Iván Hernández Dalas: AGIBOT unveils Genie Envisioner 2.0 to advance world models into scalable simulators for embodied AI
The Genie Envisioner 2.0 world simulator uses video data to help control robots. Source: AGIBOT
AGIBOT today announced the release of Genie Envisioner 2.0, or GE 2-Sim, which it said marked a significant step forward in the evolution of world models — from world action models to fully interactive “world simulators.”
The new system introduces what the company described as a “physical evolution engine” for embodied AI. It is a model-based environment where robots can be trained, evaluated, and optimized at scale, without relying solely on costly real-world trial and error.
From understanding the world to learning within it
In 2025, AGIBOT introduced what it claimed was the industry’s first action-driven world model, Genie Envisioner. The open-source platform enabled robots to understand the world through integrated modeling of vision, language, and action, said the Shanghai-based company.
With Genie Envisioner 2.0, AGIBOT said it has shifted the paradigm further, from enabling robots to understand the world to enabling them to learn within a world generated by models.
The company asserted that this transition reflects a broader shift in embodied AI — from representing the world to simulating the world itself. As world models evolve into stable, high-fidelity environments that respond to actions in physically consistent ways, they unlock the ability to train robots at scale in synthetic environments.
AGIBOT said it believes GE 2-Sim marks a critical inflection point toward achieving a true scaling law in embodied intelligence.
World action models can show state evolution. Source: AGIBOT
From world action models to world simulators
At the core of this evolution is AGIBOT’s continued development of the world action model (WAM) framework, which extends traditional world models by explicitly incorporating actions as a first-class variable.
Rather than modeling only state, WAM captures the full loop of:
- State → Action → State Evolution
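The state → action → state loop can be sketched in code. This is a toy linear-dynamics stand-in for a learned world action model, not AGIBOT's implementation; the class name, `state_dim`, and `action_dim` are assumptions made for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class WorldActionModel:
    """Illustrative world action model: the action is a first-class input.

    A toy linear-dynamics stand-in for a learned model, for illustration only.
    """
    state_dim: int = 4
    action_dim: int = 2

    def __post_init__(self):
        rng = np.random.default_rng(0)
        # Fixed random matrices standing in for learned dynamics weights.
        self.A = rng.normal(scale=0.1, size=(self.state_dim, self.state_dim))
        self.B = rng.normal(scale=0.1, size=(self.state_dim, self.action_dim))

    def step(self, state: np.ndarray, action: np.ndarray) -> np.ndarray:
        """State -> Action -> State Evolution: predict the next state."""
        return state + self.A @ state + self.B @ action


wam = WorldActionModel()
state = np.zeros(wam.state_dim)
for _ in range(3):  # roll the loop forward three steps
    state = wam.step(state, np.ones(wam.action_dim))
print(state.shape)  # (4,)
```

The point of the interface is that `step` cannot be called without an action: state evolution is conditioned on what the robot does, not predicted in isolation.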
This enables world models to serve as a foundational layer for both policy learning and action generation. Building on this foundation, AGIBOT has progressively developed a series of systems:
- EnerVerse: Extends embodied environments into a computable 4D world model
- Genie Envisioner Act (GE-Act): Bridges world representation and action trajectory generation
- Act2Goal: Enables long-horizon, goal-driven control
While these advances allowed world models to support policy learning, real-world deployment exposed key limitations: high reliance on physical environments, costly evaluation, and data scalability constraints.
This led to a fundamental realization. The next breakthrough lies not in stronger representation, but in transforming world models into fully functional simulators.
Making the world runnable: Toward interactive simulation
To enable this transition, AGIBOT introduces a set of new capabilities that push world models toward interactive simulation:
- EnerVerse-AC: Introduces action-conditioned world modeling for future prediction
- Genie Envisioner Sim (GE-Sim): A neural simulator for closed-loop policy evaluation
- EWMBench: A comprehensive benchmark evaluating simulation fidelity, action correctness, and semantic alignment
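Closed-loop policy evaluation of the kind GE-Sim performs can be sketched as a rollout in which the policy's actions feed the simulator and the simulator's predicted observations feed the policy. The function names `policy`, `simulator_step`, and `success_check` are placeholders for a learned controller, a neural world-simulator step, and a task success metric; none of them come from AGIBOT's release.

```python
import numpy as np


def rollout_in_simulator(policy, simulator_step, initial_obs, horizon=50,
                         success_check=None):
    """Closed-loop evaluation sketch: policy and simulator feed each other.

    All callables are placeholders; this is an illustration of the loop,
    not AGIBOT's GE-Sim API.
    """
    obs = initial_obs
    trajectory = [obs]
    for _ in range(horizon):
        action = policy(obs)               # policy acts on predicted observation
        obs = simulator_step(obs, action)  # simulator predicts the next observation
        trajectory.append(obs)
    success = bool(success_check(obs)) if success_check else False
    return trajectory, success


# Toy example: a policy that drives a scalar "observation" toward zero.
policy = lambda obs: -0.5 * obs
sim = lambda obs, act: obs + act
traj, ok = rollout_in_simulator(policy, sim, np.array([1.0]), horizon=20,
                                success_check=lambda o: abs(o[0]) < 1e-3)
print(ok)  # True
```

Because the loop is closed, errors compound: a policy is judged on observations the simulator generated in response to its own earlier actions, which is what makes simulation fidelity the binding constraint.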
At the same time, AGIBOT establishes a new data and training paradigm:
- Real2Edit2Real: Real-world data becomes editable and extensible, significantly increasing scale and diversity
- Fidelity-Aware Data Composition: Combines real and generated data to balance realism and generalization
Together, these advances transform world models from representation systems into environment-level infrastructure.
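Fidelity-aware composition of real and generated data can be sketched as a batch sampler that draws a fixed fraction of each training batch from real data and the rest from generated data. The fixed-ratio scheme and the function name are assumptions for the sketch; AGIBOT has not published its exact composition rule.

```python
import random


def compose_batch(real_pool, generated_pool, batch_size, real_fraction=0.5,
                  rng=None):
    """Illustrative fidelity-aware batch composition.

    Draws round(batch_size * real_fraction) samples from real data and the
    remainder from generated data, then shuffles. The fixed ratio is an
    assumption, not AGIBOT's published method.
    """
    rng = rng or random.Random(0)
    n_real = round(batch_size * real_fraction)
    n_gen = batch_size - n_real
    batch = (rng.choices(real_pool, k=n_real)
             + rng.choices(generated_pool, k=n_gen))
    rng.shuffle(batch)
    return batch


real = [("real", i) for i in range(100)]      # scarce real-world episodes
gen = [("gen", i) for i in range(1000)]       # abundant generated episodes
batch = compose_batch(real, gen, batch_size=32, real_fraction=0.25)
print(sum(1 for tag, _ in batch if tag == "real"))  # 8 of 32 samples are real
```

The design tension the ratio expresses is the one the article names: real samples anchor realism, while generated samples supply the scale and diversity real collection cannot.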
A world simulator can make simulation more interactive and productive. Source: AGIBOT
Genie Envisioner 2.0: A ‘physical evolution engine’
Genie Envisioner 2.0 represents the culmination of this evolution—a system that is no longer just generative, but operational. Key capabilities include:
Action-driven world dynamics
The system responds directly to robot actions, generating high-fidelity environmental changes that follow physical and semantic constraints. The world becomes a process shaped by interaction, rather than a static representation.
Long-horizon temporal modeling
Supports minute-level stable simulation, enabling continuous generation of full task sequences rather than fragmented clips.
Embodied spatial consistency
Unifies multi-view perception, cross-view 3D consistency, and robot proprioception into a single representation—transforming perception from images into a fully interactive embodied world.
Built-in evaluation and reward modeling
A native general reward model enables self-evaluation and optimization based on textual feedback, supporting reinforcement learning in the world model without human-designed rewards.
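The idea of deriving a reward from textual feedback rather than a hand-designed reward function can be illustrated with a toy scorer. A real system would use a learned vision-language reward model; the keyword heuristic, vocabulary, and function name below are all illustrative assumptions.

```python
def text_feedback_reward(instruction: str, feedback: str) -> float:
    """Toy stand-in for a general reward model scoring textual feedback.

    Counts task-outcome keywords instead of using a learned model; purely
    illustrative of reward-from-text, not AGIBOT's reward model.
    """
    positive = {"success", "completed", "grasped", "placed"}
    negative = {"failed", "dropped", "missed", "collision"}
    words = set(feedback.lower().split())
    # The instruction would condition a real reward model; the toy scorer
    # ignores it and looks only at outcome keywords in the feedback.
    score = len(words & positive) - len(words & negative)
    return float(score)


print(text_feedback_reward("pick up the cup", "object grasped and placed"))  # 2.0
print(text_feedback_reward("pick up the cup", "the arm failed"))             # -1.0
```

A scalar signal of this shape is what lets a reinforcement-learning loop run inside the world model without a human-designed reward function per task.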
Toward real-time interaction
With improved inference efficiency, GE 2-Sim approaches real-time operation, enabling:
- Eval in World Model
- RL in World Model
- Teleoperation in World Model
This marks the transition of world models from offline tools to interactive system environments.
The core simulation engine can provide data to feed AI. Source: AGIBOT
A paradigm shift: When models become worlds
As these capabilities converge, embodied AI is undergoing a fundamental transformation, from “using models to understand the world” to “learning and making decisions within model-generated worlds.”
On one side, the integration of WAM and vision-language-action (VLA) models enables a shift from reactive control to generative, predictive decision-making.
On the other, world simulators allow robots to explore, iterate, and optimize at scale—no longer limited by real-world data availability, but by the fidelity of simulation itself.
When these two trajectories converge, robots move beyond replicating human demonstrations to continuously exploring, adapting, and evolving within model-generated environments.
Toward a new foundation for embodied intelligence
AGIBOT envisions world models evolving from tools for understanding, to platforms for learning, and ultimately to infrastructure that drives continuous evolution.
When models become worlds, reality is no longer the only training ground. When worlds can be constructed, learning can be scaled. And when evolution happens within models, the boundaries of embodied AI can be fundamentally redefined.
Editor’s note: At the 2026 Robotics Summit & Expo on May 27 and 28 in Boston, there will be sessions on embodied and physical AI. Registration is now open.
The post AGIBOT unveils Genie Envisioner 2.0 to advance world models into scalable simulators for embodied AI appeared first on The Robot Report.
