⚡️ EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities

Travis Davies1, Yiqi Huang1, Alexi Gladstone2, Yunxin Liu3, Xiang Chen4, Heng Ji2, Huxian Liu1, Luhui Hu1
1ZhiCheng AI    2UIUC    3Tsinghua University    4Peking University

TL;DR

Implicit policies parameterized by generative models, such as Diffusion Policy, have become the standard for policy learning and Vision–Language–Action (VLA) models in robotics. However, these approaches often suffer from high computational cost, exposure bias, and unstable inference dynamics, which lead to divergence under distribution shifts.

We introduce EBT-Policy, a new energy-based architecture that addresses these core issues in both simulated and real-world robotic settings. EBT-Policy consistently outperforms diffusion-based policies across simulated and real-world tasks while requiring significantly less training and inference computation.


  • ⚡️ 50× faster inference — converges in just 2 steps vs. Diffusion Policy's 100
  • ⚡️ 66% fewer training epochs — faster convergence during training
  • ⚡️ Emergent zero-shot retry behavior — recovers from failures using only behavior cloning data, without explicit retry training
  • ⚡️ Uncertainty-aware inference — leverages scalar energy for dynamic compute allocation
Figure: Energy landscape minimization and uncertainty modeling diagram.

Explaining Uncertainty Modeling. Twelve frames are grouped into three phases: (1) Tool Insertion, (2) Hook Hanging Attempt, and (3) Recovery & Successful Retry. The color bar beneath each frame encodes the per-frame energy predicted by the model; in EBT-Policy, lower energy indicates higher certainty. Red (Step 7) marks the failure that triggers a retry, while green (Step 11) marks the successful correction. Together, these steps highlight EBT-Policy's interpretability and physical reasoning: it uses energy-based uncertainty to decide whether to continue or retry, and how to adjust its actions.
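
For intuition, here is a minimal sketch of how a scalar energy could drive the continue-or-retry decision described above. The plan(obs) callable (returning an action trajectory and its energy), the env interface with observe()/execute(), the energy threshold, and the retry budget are all hypothetical placeholders, not the released EBT-Policy implementation.

def run_with_retry(plan, env, energy_threshold=0.1, max_retries=3):
    # plan(obs) -> (actions, energy); lower energy = higher certainty.
    for attempt in range(max_retries + 1):
        obs = env.observe()
        actions, _ = plan(obs)
        env.execute(actions)
        # Re-evaluate after acting: a high post-execution energy (the red
        # Step 7 above) signals a likely failure, so re-plan and retry.
        _, post_energy = plan(env.observe())
        if post_energy <= energy_threshold:   # confident success (green Step 11)
            return True
    return False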

Explaining Energy Minimization. EBT-Policy receives inputs (RGB frames, robot proprioception, and language instructions) and assigns an energy to candidate action trajectories. Starting from a noisy initialization, the trajectory is iteratively refined by gradient descent on this energy, from its noisy starting state to a final executable plan. Optimization terminates when the energy converges to a minimum, as illustrated by the energy-landscape sketch.
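
As a concrete illustration of this loop, here is a minimal PyTorch sketch. The energy_model(obs, actions) callable standing in for the EBT-Policy energy network, as well as the step count, step size, and convergence tolerance, are illustrative assumptions rather than the paper's exact settings.

import torch

def infer_actions(energy_model, obs, horizon, action_dim,
                  num_steps=2, step_size=0.1, tol=1e-4):
    # Start from a noisy trajectory initialization.
    actions = torch.randn(horizon, action_dim, requires_grad=True)
    energy = energy_model(obs, actions)                 # scalar energy of the plan
    for _ in range(num_steps):
        (grad,) = torch.autograd.grad(energy, actions)  # dE / d(actions)
        with torch.no_grad():
            actions -= step_size * grad                 # descend the energy landscape
        new_energy = energy_model(obs, actions)
        converged = abs(energy.item() - new_energy.item()) < tol
        energy = new_energy
        if converged:                                   # energy stopped decreasing
            break
    return actions.detach(), energy.item()

Because the optimized energy is a scalar, the same value can be read out as an uncertainty signal, which is what enables the uncertainty-aware retry behavior illustrated above.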

Demonstrations

RoboMimic Benchmark

Can
Robotic can manipulation task demonstrating robust performance.
Lift
Object lifting task showcasing energy-based policy learning.
Square
Square nut assembly task with precise manipulation.

Emergent Retry Behavior

Tool Hang
Demonstrates emergent zero-shot retry behavior without explicit retry training.

Real-World Tasks

Fold Towel (3 inference steps)
Real-world robotic towel folding with robust generalization and consistent performance across variations.
Pick and Place (3 inference steps)
Demonstrates emergent behavior with robust manipulation and uncertainty-aware inference.
Place Two Plates (3 inference steps)
Precise placement of two plates, showcasing energy-based reasoning and energy-landscape optimization.

Results

Success Rates During Training

Success Rates During Training. EBT-Policy improves rapidly, reaching 100% success by epoch 30 while using just 2 inference iterations to predict actions. Diffusion Policy (DP), by contrast, only reaches 100% success after 90 epochs and uses 50 times more inference steps than EBT-Policy, demonstrating that EBT-Policy is more efficient than DP during both training and inference.

Property           EBT-Policy-S   EBT-Policy-R
Size               ~30M           ~100M
Task               Simulation     Real World
Language Encoder   N/A            T5-S
Vision Encoder     ResNet-18      DINOv3-S

Comparison of EBT-Policy variants. EBT-Policy-S is a compact Transformer used for controlled simulation studies, while EBT-Policy-R is a larger multimodal variant designed for real-world, language-conditioned, and multitask policy learning.

BibTeX

@misc{davies2025ebtpolicyenergyunlocksemergent,
      title={EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities}, 
      author={Travis Davies and Yiqi Huang and Alexi Gladstone and Yunxin Liu and Xiang Chen and Heng Ji and Huxian Liu and Luhui Hu},
      year={2025},
      eprint={2510.27545},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2510.27545}, 
}