COMET: Controllable Long-term Motion Generation with Extended Joint Targets

Eunjong Lee, Eunhee Kim, Sanghoon Hong, Eunho Jung, Jihoon Kim
Cinamon Inc.
Teaser Image

We propose COMET, a novel, unified framework that generates long-term, stable human motion with fine-grained joint-level control for real-time applications.

Abstract

Generating stable and controllable character motion in real time is a key challenge in computer animation. Existing methods often fail to provide fine-grained control or suffer from motion degradation over long sequences, limiting their use in interactive applications.

We propose COMET, an autoregressive framework that runs in real time, enabling versatile character control and robust long-horizon synthesis. Our efficient Transformer-based conditional VAE allows for precise, interactive control over arbitrary user-specified joints for tasks like goal-reaching and in-betweening from a single model. To ensure long-term temporal stability, we introduce a novel reference-guided feedback mechanism that prevents error accumulation. This mechanism also serves as a plug-and-play stylization module, enabling real-time style transfer.

Extensive evaluations demonstrate that COMET robustly generates high-quality motion at real-time speeds, significantly outperforming state-of-the-art approaches in complex sequential goal-reaching tasks and confirming its readiness for demanding interactive applications.

Arbitrary Joint Control

COMET can be used for a variety of goal-reaching tasks: the user freely specifies any combination of the learned joints (the pelvis and the five end-effectors) as targets.
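To make "any combination of learned joints" concrete, the following is a minimal sketch of how a user request could be packed into a fixed-size condition with a per-joint mask. It is an illustration under assumptions, not COMET's actual interface: the joint names `effector_1` to `effector_5`, the function `build_joint_condition`, and the position-plus-mask layout are all hypothetical, since the page only names "pelvis and the five end-effectors".

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

# Placeholder names (assumption): the page only says
# "pelvis and the five end-effectors".
CONTROLLABLE_JOINTS: List[str] = [
    "pelvis",
    "effector_1",
    "effector_2",
    "effector_3",
    "effector_4",
    "effector_5",
]


def build_joint_condition(targets: Dict[str, Vec3]) -> Tuple[List[float], List[int]]:
    """Pack user targets into a flat position vector plus a per-joint mask.

    Hypothetical layout: 3 values per joint, mask 1 = controlled, 0 = free.
    """
    unknown = set(targets) - set(CONTROLLABLE_JOINTS)
    if unknown:
        raise ValueError(f"not a controllable joint: {sorted(unknown)}")

    positions: List[float] = []
    mask: List[int] = []
    for name in CONTROLLABLE_JOINTS:
        if name in targets:
            positions.extend(targets[name])
            mask.append(1)
        else:
            # Filler values; a consumer would ignore them because mask is 0.
            positions.extend((0.0, 0.0, 0.0))
            mask.append(0)
    return positions, mask


# Control only the pelvis and one end-effector; the rest stay unconstrained.
positions, mask = build_joint_condition(
    {"pelvis": (0.0, 0.9, 2.0), "effector_3": (0.3, 1.4, 2.1)}
)
```

With a mask of this kind, a single model can in principle serve every subset of joints, because the input shape never changes and only the mask tells it which targets matter.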

Applications

Motion In-betweening

Because COMET supports multi-joint control, the multi-joint goal-reaching task extends naturally to motion in-betweening.
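One way to picture this extension is to treat each future keyframe as a goal-reaching request on every joint it specifies. The sketch below shows that reading only; the names `keyframe_to_goal` and `schedule_goals`, the joint names in the usage example, and the `frames_remaining` timing field are assumptions made for illustration and are not taken from COMET.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def keyframe_to_goal(
    keyframe: Dict[str, Vec3], keyframe_index: int, current_index: int
) -> dict:
    """Express one in-betweening keyframe as a multi-joint goal (hypothetical format)."""
    if keyframe_index <= current_index:
        raise ValueError("keyframe must lie in the future")
    return {
        # Every joint given in the keyframe becomes a target.
        "targets": dict(keyframe),
        # Assumed timing signal: frames left until the keyframe is due.
        "frames_remaining": keyframe_index - current_index,
    }


def schedule_goals(
    keyframes: Dict[int, Dict[str, Vec3]], current_index: int
) -> List[dict]:
    """Return goals for all future keyframes, ordered by frame index."""
    return [
        keyframe_to_goal(keyframes[i], i, current_index)
        for i in sorted(keyframes)
        if i > current_index
    ]


# Two keyframes given out of order; joint names are placeholders.
goals = schedule_goals(
    {
        60: {"pelvis": (0.0, 0.9, 3.0), "effector_1": (0.2, 1.5, 3.1)},
        30: {"pelvis": (0.0, 0.9, 1.5)},
    },
    current_index=0,
)
```

Under this view, in-betweening needs no separate model: sparse keyframes simply become a sequence of goals consumed one after another.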


Motion Stylization

COMET stylizes motion by generating a reference for a specific style and then applying reference-guided feedback.

Comparison

Ablation Study on Reference-guided Feedback

With reference-guided feedback (RGF) enabled, COMET maintains coherent trajectories while accurately reaching all sequential targets.

W/O Reference-guided Feedback

W/ Reference-guided Feedback
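This page does not spell out how the feedback is computed, so the following is only a toy illustration of the general idea behind the comparison: in an autoregressive rollout, a small per-step error compounds, whereas a correction toward a reference keeps it bounded. The one-dimensional state, the constant `drift`, and the proportional `gain` are assumptions chosen for clarity and are not COMET's mechanism.

```python
from typing import List


def rollout(steps: int, drift: float, gain: float, reference: float = 0.0) -> List[float]:
    """Toy 1-D autoregressive rollout with an optional pull toward a reference."""
    state = reference
    history: List[float] = []
    for _ in range(steps):
        # Stand-in for the small error a learned model adds at every step.
        state = state + drift
        # Proportional feedback toward the reference; gain=0 disables it.
        state = state + gain * (reference - state)
        history.append(state)
    return history


without_feedback = rollout(steps=200, drift=0.01, gain=0.0)
with_feedback = rollout(steps=200, drift=0.01, gain=0.2)

# Without feedback the error grows linearly with the number of steps;
# with feedback it settles at drift * (1 - gain) / gain.
print(f"final error without feedback: {without_feedback[-1]:.3f}")
print(f"final error with feedback:    {with_feedback[-1]:.3f}")
```

In this toy setting the uncorrected error keeps growing for as long as the rollout runs, while the corrected error converges to a small constant, which mirrors the qualitative difference between the two videos.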


Other Methods

Motion In-betweening

The CondMDI and DNO models were reproduced and trained on the AMASS and CIRCLE datasets. None of the three models includes a post-processing stage.

CondMDI

DNO

COMET

Left: CondMDI, Middle: DNO, Right: COMET

Motion Stylization

The SMooDi model was reproduced and trained on the AMASS and CIRCLE datasets.

SMooDi - LeanBack Style

COMET - LeanBack Style

Video