- Python: 3.9+
- Dependencies: See pyproject.toml.
Create a virtual environment and install dependencies.

```shell
$ uv venv -p /path/to/python
$ uv sync
```

Log in to wandb.

```shell
$ wandb login
```
You can train and test a model with the following command:

```shell
# For training and evaluating MARC-ja
uv run python scripts/train.py -cn marc_ja devices=[0,1] max_batches_per_device=16
```
Here are commonly used options:
- `-cn`: Task name. Choose from `marc_ja`, `jcola`, `jsts`, `jnli`, `jsquad`, and `jcqa`.
- `devices`: GPUs to use.
- `max_batches_per_device`: Maximum number of batches to process per device (default: `4`).
- `compile`: JIT-compile the model with `torch.compile` for faster training (default: `false`).
- `model`: Pre-trained model name. See the YAML config files under `configs/model`.
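The `devices` and `max_batches_per_device` options together bound how many examples fit into one optimizer step; a common way to hit a larger target batch size is gradient accumulation. The helper below is a minimal sketch of that arithmetic, assuming the usual formula (it is an illustration, not code from this repository):

```python
import math


def accumulate_grad_batches(effective_batch_size: int, num_devices: int,
                            max_batches_per_device: int) -> int:
    """Hypothetical helper: gradient-accumulation steps needed so that
    num_devices * max_batches_per_device * steps >= effective_batch_size."""
    per_step = num_devices * max_batches_per_device
    return math.ceil(effective_batch_size / per_step)


# A target batch size of 32 on 2 GPUs with at most 16 batches per device
print(accumulate_grad_batches(32, 2, 16))  # -> 1
# Doubling the target batch size requires accumulating over 2 steps
print(accumulate_grad_batches(64, 2, 16))  # -> 2
```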
To evaluate on the out-of-domain split of the JCoLA dataset, specify `datamodule/valid=jcola_ood` (or `datamodule/valid=jcola_ood_annotated`).
For more options, see the YAML config files under `configs`.
To debug quickly, use the corresponding `.debug` config:

```shell
uv run python scripts/train.py -cn marc_ja.debug
```
You can specify `trainer=cpu.debug` to use the CPU.

```shell
uv run python scripts/train.py -cn marc_ja.debug trainer=cpu.debug
```
If you are on a machine with GPUs, you can specify the GPUs to use with the `devices` option.

```shell
uv run python scripts/train.py -cn marc_ja.debug devices=[0]
```
To tune hyperparameters, create a wandb sweep and run a sweep agent:

```shell
$ wandb sweep <(sed 's/MODEL_NAME/deberta_base/' sweeps/jcola.yaml)
wandb: Creating sweep from: /dev/fd/xx
wandb: Created sweep with ID: xxxxxxxx
wandb: View sweep at: https://wandb.ai/<wandb-user>/JGLUE-evaluation-scripts/sweeps/xxxxxxxx
wandb: Run sweep agent with: wandb agent <wandb-user>/JGLUE-evaluation-scripts/xxxxxxxx
$ DEVICES=0,1 MAX_BATCHES_PER_DEVICE=16 COMPILE=true wandb agent <wandb-user>/JGLUE-evaluation-scripts/xxxxxxxx
```
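The `DEVICES`, `MAX_BATCHES_PER_DEVICE`, and `COMPILE` environment variables passed to the sweep agent above could be translated into Hydra-style command-line overrides with logic along these lines. This is a sketch under assumed names; the repository's actual sweep wiring may differ:

```python
def env_overrides(env: dict[str, str]) -> list[str]:
    """Hypothetical helper: map sweep-agent environment variables
    to Hydra-style CLI overrides like devices=[0,1]."""
    overrides = []
    if "DEVICES" in env:
        overrides.append(f"devices=[{env['DEVICES']}]")
    if "MAX_BATCHES_PER_DEVICE" in env:
        overrides.append(f"max_batches_per_device={env['MAX_BATCHES_PER_DEVICE']}")
    if "COMPILE" in env:
        overrides.append(f"compile={env['COMPILE']}")
    return overrides


print(env_overrides({"DEVICES": "0,1", "MAX_BATCHES_PER_DEVICE": "16", "COMPILE": "true"}))
# -> ['devices=[0,1]', 'max_batches_per_device=16', 'compile=true']
```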
We fine-tuned the following models and evaluated them on the dev set of JGLUE. Following the JGLUE paper, we tuned the learning rate and the number of training epochs for each model and task.
Model | MARC-ja/acc | JCoLA/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
---|---|---|---|---|---|---|---|---|
Waseda RoBERTa base | 0.965 | 0.867 | 0.913 | 0.876 | 0.905 | 0.853 | 0.916 | 0.853 |
Waseda RoBERTa large (seq512) | 0.969 | 0.849 | 0.925 | 0.890 | 0.928 | 0.910 | 0.955 | 0.900 |
LUKE Japanese base* | 0.965 | - | 0.916 | 0.877 | 0.912 | - | - | 0.842 |
LUKE Japanese large* | 0.965 | - | 0.932 | 0.902 | 0.927 | - | - | 0.893 |
DeBERTaV2 base | 0.970 | 0.879 | 0.922 | 0.886 | 0.922 | 0.899 | 0.951 | 0.873 |
DeBERTaV2 large | 0.968 | 0.882 | 0.925 | 0.892 | 0.924 | 0.912 | 0.959 | 0.890 |
DeBERTaV3 base | 0.960 | 0.878 | 0.927 | 0.891 | 0.927 | 0.896 | 0.947 | 0.875 |
*The scores of LUKE are from the official repository.
- Learning rate: {2e-05, 3e-05, 5e-05}
Model | MARC-ja/acc | JCoLA/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
---|---|---|---|---|---|---|---|---|
Waseda RoBERTa base | 3e-05 | 3e-05 | 2e-05 | 2e-05 | 3e-05 | 3e-05 | 3e-05 | 5e-05 |
Waseda RoBERTa large (seq512) | 2e-05 | 2e-05 | 3e-05 | 3e-05 | 2e-05 | 2e-05 | 2e-05 | 3e-05 |
DeBERTaV2 base | 2e-05 | 3e-05 | 5e-05 | 5e-05 | 3e-05 | 2e-05 | 2e-05 | 5e-05 |
DeBERTaV2 large | 5e-05 | 2e-05 | 5e-05 | 5e-05 | 2e-05 | 2e-05 | 2e-05 | 3e-05 |
DeBERTaV3 base | 5e-05 | 2e-05 | 3e-05 | 3e-05 | 2e-05 | 5e-05 | 5e-05 | 2e-05 |
- Training epochs: {3, 4}
Model | MARC-ja/acc | JCoLA/acc | JSTS/pearson | JSTS/spearman | JNLI/acc | JSQuAD/EM | JSQuAD/F1 | JComQA/acc |
---|---|---|---|---|---|---|---|---|
Waseda RoBERTa base | 4 | 3 | 4 | 4 | 3 | 4 | 4 | 3 |
Waseda RoBERTa large (seq512) | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 |
DeBERTaV2 base | 3 | 4 | 3 | 3 | 3 | 4 | 4 | 4 |
DeBERTaV2 large | 3 | 3 | 4 | 4 | 3 | 4 | 4 | 3 |
DeBERTaV3 base | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
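Taken together, the search space above is the cross product of three learning rates and two epoch counts, i.e. six runs per model-task pair. A quick enumeration (illustrative only):

```python
from itertools import product

learning_rates = [2e-05, 3e-05, 5e-05]
epochs = [3, 4]

# Every (learning rate, epochs) combination tried for one model-task pair
grid = list(product(learning_rates, epochs))
print(len(grid))  # -> 6
for lr, n_epochs in grid:
    print(f"lr={lr}, epochs={n_epochs}")
```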
- Waseda RoBERTa base: `nlp-waseda/roberta-base-japanese`
- Waseda RoBERTa large (seq512): `nlp-waseda/roberta-large-japanese-seq512`
- LUKE Japanese base: `studio-ousia/luke-base-japanese`
- LUKE Japanese large: `studio-ousia/luke-large-japanese`
- DeBERTaV2 base: `ku-nlp/deberta-v2-base-japanese`
- DeBERTaV2 large: `ku-nlp/deberta-v2-large-japanese`
- DeBERTaV3 base: `ku-nlp/deberta-v3-base-japanese`
Nobuhiro Ueda (ueda at nlp.ist.i.kyoto-u.ac.jp)
- yahoojapan/JGLUE: JGLUE: Japanese General Language Understanding Evaluation
- JGLUE: Japanese General Language Understanding Evaluation (Kurihara et al., LREC 2022)
- Kentaro Kurihara, Daisuke Kawahara, and Tomohide Shibata. JGLUE: A Japanese Language Understanding Benchmark. Journal of Natural Language Processing, 2023, Vol. 30, No. 1, pp. 63-87, published 2023/03/15, Online ISSN 2185-8314, Print ISSN 1340-7619, https://doi.org/10.5715/jnlp.30.63, https://www.jstage.jst.go.jp/article/jnlp/30/1/30_63/_article/-char/ja