Reproducibility at ICLR 2019
Reproducibility as a vehicle for engineering best practices (I think)
But I'm not going to talk about notebooks
Reproducibility and Software Engineering Best Practices
I take a somewhat expansive view of why reproducibility matters
Reproducibility Helps With Correctness
Reproducibility Protects Against Bad Actors
Reproducibility Helps Ensure Robustness
Reproducibility Makes It Easier to Try New Datasets
Reproducibility Makes It Easier to Create New Datasets
Reproducibility Makes It Easier to Try New Tasks
Reproducibility Enables Strong Baselines
[figure: self-attention vs. attention]
Fundamental Premise:
code review · unit tests · Docker · ... · some researchers
If you are a researcher
If you are simply someone who cares about researchers
Your ML Experiments Are Software Engineering!
Many Software Engineering Best Practices are Intended to Facilitate Collaboration
Collaboration is a Forcing Function for Reproducibility
If nothing else, you will almost certainly need to collaborate with future-you
"But it runs on my computer!"
Use Source Control
Using GitHub (or similar) is pretty much the de facto way to collaborate
It also serves as a "time capsule" for your code
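One way to use that time capsule: tag the exact commit that produced the results in a paper, so you (or anyone else) can get back to it later. The tag name below is just an example:

git tag -a paper-camera-ready -m "exact code used for the camera-ready experiments"
git push origin paper-camera-ready
# later, from any machine, recover exactly that version of the code
git checkout paper-camera-ready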
Separate "Library" Code and "Experiment"
Code
Separate "Library" Code and "Experiment"
Code
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

# tokenize the input and mask the token we want the model to predict
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])

# load the pretrained masked-LM head and predict the masked token
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()
with torch.no_grad():
    predictions = model(tokens_tensor, segments_tensors)
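A short follow-up, based on the pytorch_pretrained_bert README, decodes the model's guess for the masked position:

predicted_index = torch.argmax(predictions[0, masked_index]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
print(predicted_token)  # 'henson'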
python run_classifier.py \
--task_name=MRPC \
--do_train=true \
--do_eval=true \
--data_dir=$GLUE_DIR/MRPC \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=32 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=/tmp/mrpc_output/
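One way this split can look in your own project (module and file names here are hypothetical, not from the talk):

# mylib/metrics.py  ("library" code: general-purpose, reusable, worth unit-testing)
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the gold labels."""
    correct = sum(int(p == l) for p, l in zip(predictions, labels))
    return correct / len(labels)

# experiments/mrpc_baseline.py  ("experiment" code: one specific run, mostly configuration)
from mylib.metrics import accuracy

config = {"learning_rate": 2e-5, "train_batch_size": 32, "num_train_epochs": 3}
predictions, labels = [1, 0, 1, 1], [1, 0, 0, 1]   # stand-ins for real model output
print(config, "accuracy:", accuracy(predictions, labels))   # accuracy: 0.75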
Be Explicit About Your Dependencies
#### ESSENTIAL LIBRARIES FOR MAIN FUNCTIONALITY ####
# This installs Pytorch for CUDA 8 only. If you are using a newer version,
# please visit http://pytorch.org/ and install the relevant version.
# For now AllenNLP works with both PyTorch 1.0 and 0.4.1. Expect that in
# the future only >=1.0 will be supported.
torch>=0.4.1
# Adds an @overrides decorator for better documentation and error checking when using subclasses.
overrides
# Used by some old code. We moved away from it because it's too slow, but some old code still
# imports this.
nltk
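If you need bit-for-bit reproducibility you can go further and pin exact versions, for instance by snapshotting the environment. The version numbers below are only illustrative:

# snapshot the exact versions installed in the current environment
pip freeze > requirements.txt
# which yields fully pinned entries like:
# torch==1.0.1
# overrides==1.9
# nltk==3.4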
input: {"sentence": "Did Uriah honestly think he could beat the game in under three
hours?"}
prediction: {"verbs": [{"verb": "Did", "description": "[V: Did] Uriah honestly think he
could beat the game in under three hours ?", "tags": ["B-V", "O", "O", "O", "O", "O", "O",
"O", "O", "O", "O", "O", "O", "O"]}, {"verb": "think", "description": "Did [ARG0: Uriah]
[ARGM-MNR: honestly] [V: think] [ARG1: he could beat the game in under three hours] ?",
"tags": ["O", "B-ARG0", "B-ARGM-MNR", "B-V", "B-ARG1", "I-ARG1", "I-ARG1", "I-ARG1", "I-
ARG1", "I-ARG1", "I-ARG1", "I-ARG1", "I-ARG1", "O"]}, {"verb": "could", "description": "Did
Uriah honestly think he [V: could] beat the game in under three hours ?", "tags": ["O",
"O", "O", "O", "O", "B-V", "O", "O", "O", "O", "O", "O", "O", "O"]}, {"verb": "beat",
"description": "Did Uriah honestly think [ARG0: he] [ARGM-MOD: could] [V: beat] [ARG1: the
game] [ARGM-TMP: in under three hours] ?", "tags": ["O", "O", "O", "O", "B-ARG0", "B-ARGM-
MOD", "B-V", "B-ARG1", "I-ARG1", "B-ARGM-TMP", "I-ARGM-TMP", "I-ARGM-TMP", "I-ARGM-TMP",
"O"]}], "words": ["Did", "Uriah", "honestly", "think", "he", "could", "beat", "the",
"game", "in", "under", "three", "hours", "?"]}
Provide Instructions
Reproducibility requires you to design dynamically
less good: "I did some science, now here is an artifact capturing what I did"
good: "I did some science, now you do some more science on top of it"
Case Study: Reproducibility and AllenNLP
What is AllenNLP?
an opinionated NLP research library, built on PyTorch
Programming to Higher-Level Abstractions
# models/crf_tagger.py
class CrfTagger(Model):
    """
    The ``CrfTagger`` encodes a sequence of text with a ``Seq2SeqEncoder``,
    then uses a Conditional Random Field model to predict a tag for each token
    in the sequence.
    """
    def __init__(self, vocab: Vocabulary,
                 text_field_embedder: TextFieldEmbedder,
                 encoder: Seq2SeqEncoder,
                 label_namespace: str = "labels",
                 feedforward: Optional[FeedForward] = None,
                 label_encoding: Optional[str] = None,
                 include_start_end_transitions: bool = True,
                 constrain_crf_decoding: bool = None,
                 calculate_span_f1: bool = None,
                 dropout: Optional[float] = None,
                 verbose_metrics: bool = False,
                 initializer: InitializerApplicator = InitializerApplicator(),
                 regularizer: Optional[RegularizerApplicator] = None) -> None:
        super().__init__(vocab, regularizer)
"token_characters": {
parameters · logs · docker image · metrics · datasets · cost
Organize Experiments Into Groups
ideally you'd give them more descriptive names, though
Beaker and Reproducibility
● old code + new data => upload the dataset, reuse the blueprint
● new code + old data => create the blueprint, point at existing dataset
● want to see previous results?
○ inputs + logs + outputs stored "forever"
○ record of every experiment run + results
○ share with a link
To Sum Up
● Reproducibility is important for more than the obvious reasons
● Your choices of tools and processes make reproducibility easier or harder
● Search out tools that make reproducibility easier
● Adopt processes that make reproducibility easier
● If nothing else, be kind to future-you
● But also be kind to everyone else who might build on your research
A Few Related Presentations
● I Don't Like Notebooks
AI2: allenai.org
AllenNLP: allennlp.org