
InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling


If you find our code or paper helpful, please consider starring our repository and citing:

@inproceedings{javed2025intermask,
  title={InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling},
  author={Muhammad Gohar Javed and Chuan Guo and Li Cheng and Xingyu Li},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=ZAyuwJYN8N}
}

📍 Preparation

1. Setup Environment

conda env create -f environment.yml
conda activate intermask

The code was tested with Python 3.7.7 and PyTorch 1.13.1.
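
A quick way to confirm the environment resolved correctly is to check the PyTorch install and GPU visibility. A minimal sketch (the expected version follows the note above; your build may differ):

import torch

# Report the installed PyTorch version and whether a CUDA device is visible.
print(torch.__version__)          # expected around 1.13.1
print(torch.cuda.is_available())  # should print True for GPU training/inference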

2. Models and Dependencies

Download Pre-trained Models

python prepare/download_models.py

Download Evaluation Models

Required for evaluation only; obtained from the InterGen GitHub repo.

bash prepare/download_evaluator.sh

The download scripts use the gdown package. If you run into problems, run the following command and try again (the solution is from this GitHub issue).

rm -f ~/.cache/gdown/cookies.json
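
If clearing the cookie cache does not help, upgrading gdown itself is another common workaround (an extra suggestion beyond the original note):

pip install -U gdown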

3. Get Data

InterHuman

Follow the instructions in the InterGen GitHub repo to download the InterHuman dataset, place it in the ./data/InterHuman/ folder, and unzip the motions_processed.zip archive so that the directory structure looks like:

./data
├── InterHuman
    ├── annots
    ├── LICENSE.md
    ├── motions
    ├── motions_processed
    └── split
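
Before training, a small sketch like the following can verify that the expected sub-folders are in place (paths taken from the tree above):

import os

# Check that every expected InterHuman sub-folder exists under ./data.
root = "./data/InterHuman"
for sub in ["annots", "motions", "motions_processed", "split"]:
    path = os.path.join(root, sub)
    print(f"{path}: {'ok' if os.path.isdir(path) else 'MISSING'}")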

🚀 Demo

python infer.py --gpu_id 0 --dataset_name interhuman --name trans_default

The inference script reads text prompts from the file ./prompts.txt, one prompt per line. By default the script generates motions 3 seconds in length; in our work, motion is at 30 fps.
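
For illustration, a prompts.txt with two prompts could look like the following (example prompts written for this note, not shipped with the repository):

two people approach each other and shake hands.
one person pushes the other, who stumbles backwards.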

The output files are stored under ./checkpoints/<dataset_name>/<name>/animation_infer/, which in this case would be ./checkpoints/interhuman/trans_default/animation_infer/. The output files are organized as follows:

  • keypoint_npy: generated motions with shape (nframe, 22, 9) for each interacting individual (see the loading sketch below).
  • keypoint_mp4: stick figure animation in mp4 format with two viewpoints.

We also apply naive foot IK to the generated motions; see the files with prefix ik_. It sometimes works well, but can also fail.
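
As a quick sanity check on the outputs, the following sketch loads one generated file and prints its shape (the file name is hypothetical; substitute any .npy from the keypoint_npy folder):

import numpy as np

# Hypothetical file name; pick an actual file from keypoint_npy.
motion = np.load("./checkpoints/interhuman/trans_default/animation_infer/keypoint_npy/sample0_p1.npy")
print(motion.shape)  # expected (nframe, 22, 9) per interacting individual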

👾 Train Your Own Models

Note: You have to train the VQ-VAE BEFORE training the Inter-M Transformer. They CANNOT be trained simultaneously.

Train VQ-VAE

python train_vq.py --gpu_id 0 --dataset_name interhuman  --name vq_test 

Train Inter-M Transformer

python train_transformer.py --gpu_id 0 --dataset_name interhuman --name trans_test --vq_name vq_test 

Selected arguments (see the example invocations after this list):

  • --gpu_id: GPU id.
  • --dataset_name: interaction dataset, interhuman for InterHuman and interx for Inter-X.
  • --name: name your experiment. This will create a saving directory at ./checkpoints/<dataset_name>/<name>.
  • --vq_name: when training Inter-M Transformer, you need to specify the name of previously trained vq-vae model for tokenization.
  • --batch_size: we use 256 for VQ-VAE training and 52 for the Inter-M Transformer.
  • --do_eval: to perform evaluations during training. Note: Make sure you have downloaded the evaluation models.
  • --max_epoch: number of total epochs to run. 50 for VQ-VAE and 500 for Inter-M Transformer. All the trained model checkpoints, logs and intermediate evaluations will be saved at ./checkpoints/<dataset_name>/<name>.
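
Putting the selected arguments together, fuller invocations could look like the following (a sketch: flag spellings follow the list above, --do_eval is shown as a bare flag, and unlisted options keep their defaults):

python train_vq.py --gpu_id 0 --dataset_name interhuman --name vq_test --batch_size 256 --max_epoch 50
python train_transformer.py --gpu_id 0 --dataset_name interhuman --name trans_test --vq_name vq_test --batch_size 52 --max_epoch 500 --do_eval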

📖 Evaluation

Evaluate VQ-VAE Reconstruction:

InterHuman:

python eval.py --gpu_id 0 --use_trans False --dataset_name interhuman --name vq_default

Evaluate Text to Interaction Generation:

InterHuman:

python eval.py --gpu_id 0 --dataset_name interhuman --name trans_default

Selected arguments (combined in the example after this list):

  • --gpu_id: GPU id.
  • --use_trans: whether to use the transformer. default: True. Set to False to evaluate only the VQ-VAE reconstruction.
  • --dataset_name: interaction dataset, interhuman for InterHuman and interx for Inter-X.
  • --name: name of your trained model experiment.
  • --which_epoch: checkpoint name of the model: [all, best_fid, best_top1, latest, finest]
  • --save_vis: whether to save visualization results. default = True.
  • --time_steps: number of iterations for transformer inference. default: 20.
  • --cond_scales: scale of classifier-free guidance. default: 2.
  • --topkr: percentile of low-scoring tokens to ignore during inference. default: 0.9.
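
Combining the arguments above, a full evaluation call could look like the following (a sketch using the documented defaults):

python eval.py --gpu_id 0 --dataset_name interhuman --name trans_default --which_epoch best_fid --time_steps 20 --cond_scales 2 --topkr 0.9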

The final evaluation results will be saved in ./checkpoints/<dataset_name>/<name>/eval/evaluation_<which_epoch>_ts<time_steps>_cs<cond_scales>_topkr<topkr>.log

Acknowledgements

Components in this code are derived from the following open-source efforts:

momask-codes, InterGen, Inter-X
