MetaWorld-X

Hierarchical World Modeling via VLM-Orchestrated Experts for Humanoid Loco-Manipulation

First Author^*, Second Author^*, Third Author

Institution Name
Conference name and year
^*Indicates Equal Contribution

This video showcases our visualization.

Abstract

Learning natural, stable, and compositionally generalizable whole-body control policies for humanoid robots performing simultaneous locomotion and manipulation (loco-manipulation) remains a key challenge in robotics. Existing reinforcement learning approaches often rely on a single monolithic policy, which leads to cross-skill interference and motion conflicts in high-degree-of-freedom systems. To address these issues, we propose MetaWorld-X, a hierarchical framework that decomposes complex control problems into Specialized Expert Policies (SEP). Each expert is trained under human motion priors through imitation-constrained reinforcement learning, which keeps its motions natural and physically plausible. We further develop an Intelligent Routing Mechanism (IRM) supervised by a Vision-Language Model (VLM), enabling semantic-driven expert composition. Extensive experiments on Humanoid-bench demonstrate that MetaWorld-X significantly outperforms baselines in motion quality, training efficiency, and task success rates, validating the effectiveness of semantic-driven expert orchestration.
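To make the idea of expert composition concrete, the following is a minimal sketch, not the MetaWorld-X implementation. The abstract does not specify how the routing mechanism combines experts, so the softmax-weighted blending of expert actions shown here is an assumption. The names `compose_action`, `walk_expert`, and `reach_expert` are hypothetical, and the routing scores are fixed numbers standing in for whatever a VLM-supervised router would produce.

```python
import math
from typing import Callable, Dict, List, Sequence

Action = List[float]
Expert = Callable[[Sequence[float]], Action]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw routing scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose_action(
    observation: Sequence[float],
    experts: Dict[str, Expert],
    scores: Dict[str, float],
) -> Action:
    """Blend expert actions with softmax routing weights.

    Assumption: experts share one action space and are combined by a
    convex combination; the actual composition rule may differ.
    """
    names = list(experts)
    weights = softmax([scores[name] for name in names])
    actions = [experts[name](observation) for name in names]
    dim = len(actions[0])
    return [
        sum(w * action[i] for w, action in zip(weights, actions))
        for i in range(dim)
    ]


# Hypothetical stand-in experts that ignore the observation.
def walk_expert(obs: Sequence[float]) -> Action:
    return [1.0, 0.0]


def reach_expert(obs: Sequence[float]) -> Action:
    return [0.0, 1.0]


experts = {"walk": walk_expert, "reach": reach_expert}
# Placeholder scores; equal scores give equal routing weights.
scores = {"walk": 0.0, "reach": 0.0}
blended = compose_action([0.0], experts, scores)
```

With equal scores, each expert contributes half of the blended action. Raising one score shifts the weight toward that expert, which is one simple way a router could favor a locomotion or a manipulation skill depending on the task.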

The framework of MetaWorld-X.

Motion retargeting:

The specific implementation framework of the SEP module.

AMASS Dataset

H2O Replay

Unitree H1 Replay

Imitation Learning
