BadWorld: Adversarial Attack on World Models

¹The Hong Kong Polytechnic University

TL;DR

We introduce BadWorld, a label-free adversarial attack on visual world models (VWMs).

Starting from a single perturbed context image, BadWorld reliably causes future rollouts to break down across unseen user controls, revealing severe robustness risks in current VWMs.

Method

1 Label-Free Velocity Attack Objectives

BadWorld attacks the world model without requiring any ground-truth future video or predefined correct rollout.

We introduce four self-supervised objectives: (1) Velocity-Max, amplifying denoising updates; (2) Velocity-Min, suppressing denoising updates; (3) Drift-Max, encouraging semantic drift; and (4) Drift-Min, inducing motion collapse. For all objectives, we approximate the history using repeated clean contexts and constrain the attack to early denoising timesteps.
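To make the label-free idea concrete, here is a minimal sketch of one plausible reading of the Velocity-Min objective: a bounded perturbation of the context is optimized to shrink the predicted denoising velocity, using only the model's own outputs. Everything in it is an assumption made for illustration, not the authors' implementation: the linear `velocity` function stands in for a real world model, the gradient is written analytically rather than obtained by automatic differentiation, and the names `pgd_velocity_min`, `eps`, and `step` are hypothetical.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def velocity(A, noisy_latent, context):
    """Hypothetical toy velocity head: v = A @ context - noisy_latent."""
    return [p - z for p, z in zip(matvec(A, context), noisy_latent)]


def velocity_energy(A, latents, context):
    """Mean squared velocity over the sampled noisy latents (no future video needed)."""
    return sum(
        sum(v * v for v in velocity(A, z, context)) for z in latents
    ) / len(latents)


def energy_grad(A, latents, context):
    """Analytic gradient of velocity_energy w.r.t. the context: mean of 2 * A^T v."""
    grad = [0.0] * len(context)
    for z in latents:
        v = velocity(A, z, context)
        for j in range(len(context)):
            grad[j] += 2.0 * sum(A[i][j] * v[i] for i in range(len(v))) / len(latents)
    return grad


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def pgd_velocity_min(A, latents, clean, eps=0.05, step=0.01, iters=20):
    """Signed gradient descent on the velocity energy inside an L-infinity ball."""
    delta = [0.0] * len(clean)
    for _ in range(iters):
        adv = [c + d for c, d in zip(clean, delta)]
        grad = energy_grad(A, latents, adv)
        # Step against the gradient (minimize), then clip back into [-eps, eps].
        delta = [max(-eps, min(eps, d - step * sign(g))) for d, g in zip(delta, grad)]
    return delta


# Toy setup: every value below is an illustrative assumption.
A = [[1.0, 0.5], [0.2, 1.0]]
clean_context = [0.8, 0.6]
early_latents = [[0.1, -0.1], [-0.1, 0.1]]  # stand-ins for early-timestep noisy latents

delta = pgd_velocity_min(A, early_latents, clean_context)
adv_context = [c + d for c, d in zip(clean_context, delta)]
print("clean energy:", velocity_energy(A, early_latents, clean_context))
print("adv energy:  ", velocity_energy(A, early_latents, adv_context))
```

In this sketch, Velocity-Max would flip the sign of the update, and the small set of noisy latents plays the role of restricting the attack to early denoising timesteps. With a real model, the gradient would come from backpropagating through the network instead of the closed form used here.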

Among them, we choose Velocity-Min as the final objective, since it consistently achieves strong attack effectiveness on both Matrix-Game-2.0 and Astra.

[Video comparison, Matrix-Game-2.0: Clean | Velocity-Max | Velocity-Min ✓ | Drift-Max | Drift-Min]

[Video comparison, Astra: Clean | Velocity-Max | Velocity-Min ✓ | Drift-Max | Drift-Min]

2 Trajectory-Adaptive Optimization

BadWorld further strengthens the perturbation by accounting for the fact that future user controls are unknown and may vary.

It actively searches for hard trajectories where the current perturbation is less effective, then updates the perturbation against these challenging controls. This makes the adversarial image more robust across different camera paths or action sequences, leading to stronger degradation and better generalization.
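The alternation described above can be sketched as a small min-max loop: an inner step picks the control under which the current perturbation is least effective, and an outer step updates the perturbation against that control. This is a toy illustration under stated assumptions, not the paper's algorithm: each "trajectory" is reduced to a single control vector, the model is a hypothetical linear map `v = A @ context + control`, "least effective" is taken to mean the largest remaining velocity energy, and names such as `trajectory_adaptive_attack` are invented for this example.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def velocity(A, context, control):
    """Hypothetical toy velocity conditioned on a control: v = A @ context + control."""
    return [p + u for p, u in zip(matvec(A, context), control)]


def energy(A, context, control):
    """Squared velocity norm; a Velocity-Min style attack wants this to be small."""
    return sum(v * v for v in velocity(A, context, control))


def grad_wrt_context(A, context, control):
    """Analytic gradient of energy w.r.t. the context: 2 * A^T v."""
    v = velocity(A, context, control)
    return [2.0 * sum(A[i][j] * v[i] for i in range(len(v))) for j in range(len(context))]


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def worst_case_energy(A, context, controls):
    """Energy under the control where the attack works least well."""
    return max(energy(A, context, u) for u in controls)


def trajectory_adaptive_attack(A, clean, controls, eps=0.05, step=0.01, iters=20):
    """Alternate between finding the hardest control and attacking it."""
    delta = [0.0] * len(clean)
    for _ in range(iters):
        adv = [c + d for c, d in zip(clean, delta)]
        # Inner step: the hardest control leaves the most velocity energy intact.
        hardest = max(controls, key=lambda u: energy(A, adv, u))
        # Outer step: signed descent against that control, clipped to [-eps, eps].
        grad = grad_wrt_context(A, adv, hardest)
        delta = [max(-eps, min(eps, d - step * sign(g))) for d, g in zip(delta, grad)]
    return delta


# Toy setup: every value below is an illustrative assumption.
A = [[1.0, 0.5], [0.2, 1.0]]
clean_context = [0.8, 0.6]
candidate_controls = [[0.0, 0.0], [0.3, -0.2], [-0.4, 0.1]]

delta = trajectory_adaptive_attack(A, clean_context, candidate_controls)
adv_context = [c + d for c, d in zip(clean_context, delta)]
print("worst-case energy, clean:", worst_case_energy(A, clean_context, candidate_controls))
print("worst-case energy, adv:  ", worst_case_energy(A, adv_context, candidate_controls))
```

Optimizing against the worst case rather than a fixed control is what lets the perturbation transfer to controls it was not tuned on. In a realistic setting the inner search would range over sampled camera paths or action sequences rather than a fixed list of three vectors.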

[Video comparison: Clean | Velocity-Min | + Bi-Level Attack]

Attack performance under different camera trajectories.

Key Takeaway

Visual world models are becoming interactive simulators, but their fragile dynamics suggest they have not truly learned stable physical knowledge.

BadWorld shows that tiny, imperceptible changes to a single input image can severely corrupt future rollouts. This exposes a robustness gap that matters for safety-critical deployment, and it also suggests a practical path to privacy protection.

Citation

@misc{shen2026badworldadversarialattacksworld,
      title={BadWorld: Adversarial Attacks on World Models}, 
      author={Linghui Shen and Mingyue Cui and Xingyi Yang},
      year={2026},
      eprint={2606.16519},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2606.16519}, 
}
