Publications

arXiv 2026
Action Images: End-to-End Policy Learning via Multiview Video Generation

Haoyu Zhen, Zixian Gao, Qiao Sun, Yilin Zhao, Yuncong Yang, Yilun Du, Pengsheng Guo, Tsun-Hsuan Wang, Yi-Ling Qiao, Chuang Gan

arXiv | Project | Code

arXiv 2026
3D-Layout-R1: Structured Reasoning for Language-Instructed Spatial Editing

Haoyu Zhen, Xiaolong Li, Yilin Zhao, Han Zhang, Sifei Liu, Kaichun Mo, Chuang Gan, Subhashree Radhakrishnan

arXiv

arXiv 2026
UniCanvas: A Diffusion-based Unified Model for Text-in-Image Joint Generation

Zeyuan Yang, Hao-Wei Chen, Xueyang Yu, Yuncong Yang, Haoyu Zhen, Ziqiao Ma, Maohao Shen, Chuang Gan

arXiv

Best Paper @ A2A-MML Workshop, CVPR 2026

arXiv 2026
Fast Spatial Memory with Elastic Test-Time Training

Ziqiao Ma, Xueyang Yu, Haoyu Zhen, Yuncong Yang, Joyce Chai, Chuang Gan

arXiv | Project | Code | Blog

RSS 2026
GHOST: Hierarchical Sub-Goal Policies for Generalizing Robot Manipulation

Sriram Krishna, Ben Eisner, Haotian Zhan, Ying Yuan, Haoyu Zhen, Chuang Gan, Shubham Tulsiani, David Held

PDF | Project | Code

Robotics: Science and Systems (RSS) 2026

ICCV 2025

TesserAct: Learning 4D Embodied World Models

Haoyu Zhen, Qiao Sun, Hongxin Zhang, Junyan Li, Siyuan Zhou, Yilun Du, Chuang Gan

arXiv | Project | Code | Twitter

ICCV 2025
RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan

arXiv | Project | Code

ICML 2024

3D-VLA: 3D Vision-Language-Action Generative World Model

Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan

arXiv | Project | Code | Twitter

NeurIPS 2023

3D-LLM: Injecting the 3D World into Large Language Models

Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan

arXiv | Project | Code | Twitter

NeurIPS 2023 (Spotlight)

NeurIPS 2023
Relative Entropic Optimal Transport: a (Prior-aware) Matching Perspective to (Unbalanced) Classification

Liangliang Shi, Haoyu Zhen, Gu Zhang, Junchi Yan

3DV 2024

Color-NeuS: Reconstructing Neural Implicit Surfaces with Color

Licheng Zhong, Lixin Yang, Kailin Li, Haoyu Zhen, Mei Han, Cewu Lu

arXiv | Project | Code | Data

ICCV 2023
CHORD: Category-level in-Hand Object Reconstruction via Shape Deformation

Kailin Li, Lixin Yang, Haoyu Zhen, Zenan Lin, Xinyu Zhan, Licheng Zhong, Jian Xu, Kejian Wu, Cewu Lu

arXiv | Project

ICML 2023
Understanding and Generalizing Contrastive Learning from the Inverse Optimal Transport Perspective

Liangliang Shi, Gu Zhang, Haoyu Zhen, Jintao Fan, Junchi Yan

OpenReview | Slides
