Existing video generation models excel at producing photo-realistic videos from text or images, but often lack physical plausibility and 3D controllability. To overcome these limitations, we introduce PhysCtrl, a novel framework for physics-grounded image-to-video generation with control over physical parameters and applied forces. At its core is a generative physics network that learns the distribution of physical dynamics across four materials (elastic, sand, plasticine, and rigid) via a diffusion model conditioned on physics parameters and applied forces. We represent physical dynamics as 3D point trajectories and train on a large-scale synthetic dataset of 550K animations generated by physics simulators. We enhance the diffusion model with a novel spatiotemporal attention block that emulates particle interactions and incorporate physics-based constraints during training to enforce physical plausibility. Experiments show that PhysCtrl generates realistic, physics-grounded motion trajectories which, when used to drive image-to-video models, yield high-fidelity, controllable videos that outperform those produced by existing methods in both visual quality and physical plausibility.
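
A minimal sketch of the kind of spatiotemporal attention block described above, operating on 3D point-trajectory features: the class name, the (batch, time, points, channels) tensor layout, and the alternation of per-frame spatial attention with per-point temporal attention are assumptions for illustration, not details taken from the paper.

# Illustrative sketch only (not the authors' code): attention across points
# within each frame emulates particle interactions; attention along each
# point's trajectory mixes information over time.
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                 nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, num_points, channels)
        b, t, n, c = x.shape
        # Spatial attention: points attend to each other within a frame.
        xs = self.norm1(x).reshape(b * t, n, c)
        x = x + self.spatial_attn(xs, xs, xs, need_weights=False)[0].reshape(b, t, n, c)
        # Temporal attention: each point attends along its own trajectory.
        xt = self.norm2(x).permute(0, 2, 1, 3).reshape(b * n, t, c)
        x = x + self.temporal_attn(xt, xt, xt, need_weights=False)[0] \
            .reshape(b, n, t, c).permute(0, 2, 1, 3)
        return x + self.mlp(x)

# Example: 2 trajectories, 24 frames, 512 points, 128-dim features.
block = SpatioTemporalBlock(dim=128)
out = block(torch.randn(2, 24, 512, 128))  # -> (2, 24, 512, 128)

Factoring attention into spatial and temporal passes is a common design choice for trajectory-shaped data: each block costs roughly O(T·N² + N·T²) rather than O((T·N)²) for joint attention over all frame-point tokens.
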
@article{physctrl2025,
  author  = {Chen Wang* and Chuhao Chen* and Yiming Huang and Zhiyang Dou and Yuan Liu and Jiatao Gu and Lingjie Liu},
  title   = {PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation},
  journal = {arXiv preprint},
  year    = {2025},
}