If you find our code or paper helpful, please consider starring our repository and citing:
```bibtex
@inproceedings{javed2025intermask,
  title={InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling},
  author={Muhammad Gohar Javed and Chuan Guo and Li Cheng and Xingyu Li},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=ZAyuwJYN8N}
}
```
```sh
conda env create -f environment.yml
conda activate intermask
```
The code was tested on Python 3.7.7 and PyTorch 1.13.1.
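Optionally, a one-line sanity check that PyTorch installed correctly:

```sh
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```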
```sh
python prepare/download_models.py
```
Needed for evaluation only; obtained from the InterGen GitHub repo.
```sh
bash prepare/download_evaluator.sh
```
The download scripts use the gdown package. If you run into problems, run the following command and then try again. The solution comes from this GitHub issue.
```sh
rm -f ~/.cache/gdown/cookies.json
```
Follow the instructions in the InterGen GitHub repo to download the InterHuman dataset, place it in the `./data/InterHuman/` folder, and unzip the `motions_processed.zip` archive so that the directory structure looks like:
```
./data
├── InterHuman
│   ├── annots
│   ├── LICENSE.md
│   ├── motions
│   ├── motions_processed
│   └── split
```
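As an optional sketch (not part of the repo), you can verify the layout with a few lines of Python:

```python
import os

# Expected subfolders after unzipping motions_processed.zip (see tree above).
root = "./data/InterHuman"
for entry in ["annots", "motions", "motions_processed", "split"]:
    path = os.path.join(root, entry)
    print(f"{path}: {'found' if os.path.isdir(path) else 'MISSING'}")
```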
```sh
python infer.py --gpu_id 0 --dataset_name interhuman --name trans_default
```
The inference script reads text prompts from the file `./prompts.txt`, with one text prompt per line. By default the script generates motions 3 seconds in length; in our work, motion is at 30 fps.
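For reference, a `./prompts.txt` with two prompts could look like this (the prompt texts are illustrative placeholders, not from the dataset):

```
two people approach each other and shake hands.
one person throws a punch while the other dodges backwards.
```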
The output files are stored under the folder `./checkpoints/<dataset_name>/<name>/animation_infer/`, which in this case would be `./checkpoints/interhuman/trans_default/animation_infer/`. The output files are organized as follows:
- `keypoint_npy`: generated motions with shape `(nframe, 22, 9)` for each interacting individual (see the loading sketch below).
- `keypoint_mp4`: stick figure animations in mp4 format with two viewpoints.
We also apply a naive foot IK to the generated motions; see the files with the prefix `ik_`. It sometimes works well, but it can also fail.
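As a minimal sketch of inspecting a generated motion, assuming a hypothetical filename under `keypoint_npy` (actual filenames may differ):

```python
import numpy as np

# Hypothetical output file; substitute an actual .npy produced by infer.py.
motion = np.load("./checkpoints/interhuman/trans_default/animation_infer/keypoint_npy/example_p1.npy")

nframe, njoints, feat = motion.shape  # expected layout: (nframe, 22, 9)
print(f"{nframe} frames (~{nframe / 30:.1f} s at 30 fps), {njoints} joints, {feat} features per joint")
```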
Note: you have to train the VQ-VAE BEFORE training the Inter-M Transformer. They CANNOT be trained simultaneously.
```sh
python train_vq.py --gpu_id 0 --dataset_name interhuman --name vq_test
python train_transformer.py --gpu_id 0 --dataset_name interhuman --name trans_test --vq_name vq_test
```
Selected arguments:
- `--gpu_id`: GPU id.
- `--dataset_name`: interaction dataset; `interhuman` for InterHuman and `interx` for Inter-X.
- `--name`: name of your experiment. This will create a saving directory at `./checkpoints/<dataset_name>/<name>`.
- `--vq_name`: when training the Inter-M Transformer, specify the name of the previously trained VQ-VAE model to use for tokenization.
- `--batch_size`: we use `256` for VQ-VAE training and `52` for the Inter-M Transformer.
- `--do_eval`: perform evaluations during training. Note: make sure you have downloaded the evaluation models.
- `--max_epoch`: number of total epochs to run; `50` for the VQ-VAE and `500` for the Inter-M Transformer.

All trained model checkpoints, logs, and intermediate evaluations will be saved at `./checkpoints/<dataset_name>/<name>`.
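For example, a full training pair with the batch sizes and epoch counts above made explicit (a sketch; add `--do_eval` if you have downloaded the evaluation models):

```sh
python train_vq.py --gpu_id 0 --dataset_name interhuman --name vq_test --batch_size 256 --max_epoch 50
python train_transformer.py --gpu_id 0 --dataset_name interhuman --name trans_test --vq_name vq_test --batch_size 52 --max_epoch 500
```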
VQ-VAE:
```sh
python eval.py --gpu_id 0 --use_trans False --dataset_name interhuman --name vq_default
```
Inter-M Transformer:
```sh
python eval.py --gpu_id 0 --dataset_name interhuman --name trans_default
```
Selected arguments:
- `--gpu_id`: GPU id.
- `--use_trans`: whether to use the transformer. Default: `True`. Set to `False` to perform inference with only the VQ-VAE.
- `--dataset_name`: interaction dataset; `interhuman` for InterHuman and `interx` for Inter-X.
- `--name`: name of your trained model experiment.
- `--which_epoch`: checkpoint of the model to load: [`all`, `best_fid`, `best_top1`, `latest`, `finest`].
- `--save_vis`: whether to save visualization results. Default: `True`.
- `--time_steps`: number of iterations for transformer inference. Default: `20`.
- `--cond_scales`: scale of classifier-free guidance. Default: `2`.
- `--topkr`: percentile of low-score tokens to ignore during inference. Default: `0.9`.
The final evaluation results will be saved in `./checkpoints/<dataset_name>/<name>/eval/evaluation_<which_epoch>_ts<time_steps>_cs<cond_scales>_topkr<topkr>.log`.
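For example, an evaluation run with the defaults spelled out (a sketch; values mirror the defaults above):

```sh
python eval.py --gpu_id 0 --dataset_name interhuman --name trans_default --which_epoch best_fid --time_steps 20 --cond_scales 2 --topkr 0.9
```

would write its results to `./checkpoints/interhuman/trans_default/eval/evaluation_best_fid_ts20_cs2_topkr0.9.log` (assuming the argument values are substituted verbatim into the filename).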
Components in this code are derived from the following open-source efforts: