Project Page Draft

Advancing Omnimodal Embodied Agents from Isolated Skills to Everyday Physical Autonomy

OmniAct bridges omnimodal perception with a unified cyber-physical action space for everyday physical autonomy.

Junhao Shi*, Zezheng Huai*, Siyin Wang, Jia Chen, Yubang Wang, Zhaoye Fei, Hechang Chen, Jingjing Gong, Xipeng Qiu, Yu-Gang Jiang

Fudan University · Shanghai Innovation Institute · Jilin University

Demo Video

This page uses a compressed copy of the local demo video for GitHub Pages deployment.

Framework Overview

Figure: OmniAct framework overview, rendered from method_omniact.pdf.

Abstract

Building persistent embodied agents in unstructured environments demands two capabilities: unified orchestration of heterogeneous tools across cyber domains (APIs, IoT) and physical domains (manipulation, navigation), and autonomous recovery from the physical failures that arise over extended operation. OmniAct integrates three components: a multimodal semantic planner that routes commands across a unified action space, an adaptive hierarchical memory with event-boundary-driven compression, and an asynchronous visual preemption engine that closes the semantic loop during physical execution. Across 40 real-world long-horizon tasks on two robotic platforms coordinating household IoT devices, OmniAct improves end-to-end success, keeps context growth nearly flat over long interactions, and raises mid-scale open-weight models toward proprietary-level performance.

Framework

Unified Skill Routing

Multimodal inputs are mapped into structured commands over one cyber-physical action space spanning APIs, IoT devices, manipulators, and mobile robots.
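
As a rough illustration of what a single cyber-physical action space could look like, the sketch below registers skills from different domains behind one dispatch interface. It is not OmniAct's implementation: the names Command, SkillRouter, set_light, and navigate_to are hypothetical, and the handlers are stand-ins that return strings instead of calling real devices or robots.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class Command:
    """Structured command a planner might emit (hypothetical schema)."""
    skill: str
    args: Dict[str, Any] = field(default_factory=dict)


class SkillRouter:
    """One registry for cyber and physical skills (illustrative only)."""

    def __init__(self) -> None:
        # skill name -> (domain label, handler)
        self._skills: Dict[str, Tuple[str, Callable[..., Any]]] = {}

    def register(self, name: str, domain: str, handler: Callable[..., Any]) -> None:
        self._skills[name] = (domain, handler)

    def dispatch(self, command: Command) -> Tuple[str, Any]:
        # Unknown skills fail loudly so a planner could replan.
        if command.skill not in self._skills:
            raise KeyError(f"unknown skill: {command.skill}")
        domain, handler = self._skills[command.skill]
        return domain, handler(**command.args)


# Stand-in handlers; real ones would call an IoT API or a robot controller.
router = SkillRouter()
router.register(
    "set_light", "iot",
    lambda room, on: f"light in {room} -> {'on' if on else 'off'}",
)
router.register("navigate_to", "navigation", lambda target: f"moving to {target}")

result = router.dispatch(Command("set_light", {"room": "kitchen", "on": True}))
```

Because every skill shares the same Command shape, the planner in this sketch does not need to know whether a command ends at an IoT device or at a mobile base; only the registered handler differs.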

Adaptive Hierarchical Memory

Event-boundary-driven compression condenses long interaction histories into timestamped semantic cues while preserving temporal coherence.
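
The idea of compressing at event boundaries can be sketched as follows. This is a minimal toy under stated assumptions, not the paper's memory module: each step is assumed to already carry an event label, a boundary is assumed to be a change in that label, and summarize is a hypothetical placeholder for whatever model-based summarizer the real system uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    t: float     # timestamp in seconds
    event: str   # event label (assumed to be given; hypothetical)
    text: str    # raw observation or action log


def summarize(segment: List[Step]) -> str:
    # Placeholder summarizer: keeps the time span, label, and step count.
    start, end = segment[0].t, segment[-1].t
    return f"[{start:.0f}s-{end:.0f}s] {segment[0].event}: {len(segment)} steps"


def compress(steps: List[Step]) -> List[str]:
    """Collapse each finished event into one timestamped cue.

    The most recent (still open) event is kept verbatim, so the agent
    retains full detail about what it is currently doing.
    """
    cues: List[str] = []
    segment: List[Step] = []
    for step in steps:
        # A change in event label is treated as an event boundary.
        if segment and step.event != segment[-1].event:
            cues.append(summarize(segment))
            segment = []
        segment.append(step)
    cues.extend(f"[{s.t:.0f}s] {s.text}" for s in segment)
    return cues


history = [
    Step(0.0, "fetch cup", "locate cup on shelf"),
    Step(5.0, "fetch cup", "grasp cup"),
    Step(12.0, "brew coffee", "press brew button"),
]
memory = compress(history)
```

In this toy version the two "fetch cup" steps collapse into a single cue while chronological order is preserved, which is one simple way to keep context size tied to the number of events and not the number of raw steps.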

Visual Preemption

Periodic visual verification runs asynchronously alongside physical execution, detects deviations, and triggers immediate replanning when anomalies appear.
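
One way to picture verification that runs alongside execution is the asyncio sketch below. It is an assumption-laden illustration and not the OmniAct engine: execute_skill simulates a physical action with a sleep, check stands in for a visual verifier that returns False when it sees an anomaly, and the period value is arbitrary.

```python
import asyncio
from typing import Callable


async def execute_skill(duration: float) -> str:
    # Stand-in for a long-running physical action.
    await asyncio.sleep(duration)
    return "done"


async def run_with_preemption(
    duration: float,
    check: Callable[[], bool],
    period: float = 0.01,
) -> str:
    """Run a skill while polling a verifier every `period` seconds.

    Returns "done" if the skill finishes, or "replan" if the verifier
    reports an anomaly first, in which case the skill is cancelled.
    """
    task = asyncio.create_task(execute_skill(duration))
    while not task.done():
        # Wake up either when the skill finishes or when the period elapses.
        await asyncio.wait({task}, timeout=period)
        if not task.done() and not check():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
            return "replan"
    return task.result()


def fail_on_call(n: int) -> Callable[[], bool]:
    """Build a fake verifier that reports an anomaly on its n-th call."""
    calls = {"count": 0}

    def check() -> bool:
        calls["count"] += 1
        return calls["count"] < n

    return check


ok = asyncio.run(run_with_preemption(0.03, lambda: True))
preempted = asyncio.run(run_with_preemption(1.0, fail_on_call(3)))
```

The point of the design is that the skill is not asked to verify itself: the check runs on its own schedule and can interrupt mid-execution, so recovery does not have to wait for the action to finish or time out.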

Citation

@article{shi2026omniact,
  title   = {Advancing Omnimodal Embodied Agents from Isolated Skills to Everyday Physical Autonomy},
  author  = {Shi, Junhao and Huai, Zezheng and Wang, Siyin and Chen, Jia and Wang, Yubang and Fei, Zhaoye and Chen, Hechang and Gong, Jingjing and Qiu, Xipeng and Jiang, Yu-Gang},
  year    = {2026},
  note    = {Preprint}
}