
4 - Addressing Data Mismatch

To address data mismatch between training, development, and test sets: manually analyze errors to understand differences; make training data similar to development and test sets by collecting more representative data or using data synthesis, though the latter risks overfitting to a small subset of the problem space.

Addressing data mismatch

Here are some general guidelines for addressing data mismatch:

• Carry out manual error analysis to understand how the training set differs from the
development/test sets. Do this analysis on the development set, never on the test set, to avoid
overfitting to the test set.

• Make the training data more similar to the development and test sets, or collect more data
that resembles them. One way to make the training data more similar to your development set is
artificial data synthesis. However, if you do this, you might accidentally be simulating data from
only a tiny subset of the space of all possible examples, and your model may overfit to that subset.
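A minimal Python sketch of the tallying step in manual error analysis. It assumes you have already inspected a sample of misclassified examples by hand and tagged each one with free-form categories. The function names, the tags, and the idea of comparing errors on data from the training distribution (for example, a held-out slice of training data) against dev-set errors are illustrative assumptions, not part of the original guideline. Categories that are much more common among dev-set errors hint at how the dev data differs from the training data.

```python
from collections import Counter


def category_fractions(tagged_errors):
    """Fraction of hand-inspected errors that carry each tag.

    tagged_errors: a list with one set of tags per misclassified example
    (an example may carry several tags, or none).
    """
    if not tagged_errors:
        return {}
    counts = Counter(tag for tags in tagged_errors for tag in tags)
    return {tag: count / len(tagged_errors) for tag, count in counts.items()}


def mismatch_report(train_dist_errors, dev_errors):
    """Compare error categories on training-distribution data vs. the dev set.

    Returns (tag, train_fraction, dev_fraction) tuples, sorted so that tags
    much more frequent among dev-set errors come first.
    """
    train_frac = category_fractions(train_dist_errors)
    dev_frac = category_fractions(dev_errors)
    tags = set(train_frac) | set(dev_frac)
    rows = [(t, train_frac.get(t, 0.0), dev_frac.get(t, 0.0)) for t in tags]
    rows.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rows


# Hypothetical hand-assigned tags for a speech recognizer's errors.
train_dist_errors = [{"mumbled"}, {"accent"}, {"mumbled"}, set()]
dev_errors = [{"car noise"}, {"car noise", "accent"}, {"car noise"}, {"mumbled"}]
for tag, train_f, dev_f in mismatch_report(train_dist_errors, dev_errors):
    print(f"{tag:<10} training-distribution {train_f:.0%}  dev {dev_f:.0%}")
```

In this made-up example, "car noise" tops the report: it appears in 75% of the inspected dev errors and in none of the training-distribution errors, which would suggest collecting or synthesizing noisier training audio.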
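The following Python sketch illustrates artificial data synthesis for a hypothetical speech task: a clean clip is overlaid with background noise at a chosen signal-to-noise ratio (SNR). The audio representation (plain lists of samples), the function names, and all numbers are assumptions for illustration. The wrap-around read of the noise also shows how the "tiny subset" problem can arise: if the noise recording is short, every synthesized example reuses the same few seconds of noise, and a model can overfit to it.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a signal given as a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def synthesize(clean, noise, snr_db, rng):
    """Overlay `noise` on `clean` at a target signal-to-noise ratio in dB.

    The noise is read from a random offset and wraps around, so a noise clip
    shorter than `clean` is simply repeated.
    """
    start = rng.randrange(len(noise))
    segment = [noise[(start + i) % len(noise)] for i in range(len(clean))]
    noise_power = power(segment)
    if noise_power == 0:
        raise ValueError("noise segment is silent")
    # Choose scale so that power(clean) / power(scale * segment) == 10 ** (snr_db / 10).
    scale = math.sqrt(power(clean) / (noise_power * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, segment)]


def reuse_factor(total_synth_seconds, noise_seconds):
    """Average number of times each second of noise is reused across the synthesized set."""
    return total_synth_seconds / noise_seconds


rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]  # 1 s tone at 16 kHz
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]  # only 0.25 s of noise
noisy = synthesize(clean, noise, snr_db=10, rng=rng)
print(len(noisy))  # 16000
# Hypothetical scale: 10,000 hours of speech mixed with 1 hour of recorded noise.
print(reuse_factor(10_000 * 3600, 3600))  # 10000.0
```

With the hypothetical numbers above, each second of noise is reused about 10,000 times. The clean speech may be varied, but the synthesized set still covers only a tiny slice of the noise a real system would encounter.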
