Abstract
Learning natural, stable, and compositionally generalizable whole-body control policies for humanoid robots performing simultaneous locomotion and manipulation (loco-manipulation) remains a key challenge in robotics. Existing reinforcement learning approaches often rely on a single monolithic policy, which leads to cross-skill interference and motion conflicts in high-degree-of-freedom systems. To address these issues, we propose MetaWorld-X, a hierarchical framework that decomposes complex whole-body control into a set of specialized expert policies (SEP). Each expert is trained with human motion priors through imitation-constrained reinforcement learning, ensuring natural and physically plausible motion. We further develop an Intelligent Routing Mechanism (IRM) supervised by a Vision-Language Model (VLM), which enables semantically driven composition of experts. Extensive experiments on Humanoid-bench demonstrate that MetaWorld-X significantly outperforms baselines in motion quality, training efficiency, and task success rate, validating the effectiveness of semantically driven expert orchestration.
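To make the expert-composition idea concrete, the following is a minimal Python sketch under stated assumptions, not the MetaWorld-X implementation. It assumes each expert is a function from an observation to an action vector, and that the router emits one score per expert (in MetaWorld-X these would come from the VLM-supervised IRM; here they are simply passed in). It turns the scores into softmax weights and blends the expert actions. The imitation constraint is approximated by a DeepMimic-style exponential pose-tracking bonus added to the task reward. The names `compose_action`, `imitation_constrained_reward`, `weight`, and `scale` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical expert type: maps an observation vector to an action vector.
Expert = Callable[[Sequence[float]], List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose_action(
    obs: Sequence[float],
    experts: Dict[str, Expert],
    router_scores: Dict[str, float],
) -> List[float]:
    """Blend expert actions with softmax weights over (assumed) router scores."""
    names = list(experts)
    weights = softmax([router_scores[n] for n in names])
    actions = [experts[n](obs) for n in names]
    dim = len(actions[0])
    # Weighted sum of the expert actions, one action dimension at a time.
    return [sum(w * a[i] for w, a in zip(weights, actions)) for i in range(dim)]


def imitation_constrained_reward(
    task_reward: float,
    joint_pos: Sequence[float],
    ref_joint_pos: Sequence[float],
    weight: float = 0.5,
    scale: float = 2.0,
) -> float:
    """Task reward plus an assumed exponential bonus for tracking a reference pose."""
    sq_err = sum((q - r) ** 2 for q, r in zip(joint_pos, ref_joint_pos))
    imitation = math.exp(-scale * sq_err)  # 1.0 at perfect tracking, decays with error
    return task_reward + weight * imitation


if __name__ == "__main__":
    # Two toy experts with 2-D actions (illustrative only).
    experts = {"walk": lambda o: [1.0, 0.0], "reach": lambda o: [0.0, 1.0]}
    print(compose_action([0.0], experts, {"walk": 0.0, "reach": 0.0}))
    print(imitation_constrained_reward(1.0, [0.0, 0.0], [0.0, 0.0]))
```

With equal router scores the two toy experts receive equal weight, so the blended action is `[0.5, 0.5]`; a perfectly tracked reference pose adds the full `weight` to the task reward. A soft blend is only one possible design: a router could equally select a single expert (hard routing), and the sketch does not claim which one MetaWorld-X uses.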
Overview of the MetaWorld-X framework.
Implementation of the SEP module.
AMASS Dataset
H2O Replay
Unitree H1 Replay
Imitation Learning