# tfds.testing.mock_data
Mock tfds to generate random data.
```python
@contextlib.contextmanager
tfds.testing.mock_data(
    num_examples: int = 1,
    num_sub_examples: int = 1,
    max_value: Optional[int] = None,
    *,
    policy: MockPolicy = tfds.testing.MockPolicy.AUTO,
    as_dataset_fn: Optional[Callable[..., tf.data.Dataset]] = None,
    data_dir: Optional[str] = None,
    mock_array_record_data_source: Optional[PickableDataSourceMock] = None
) -> Iterator[None]
```
### Usage

Usage (automated):

```python
with tfds.testing.mock_data(num_examples=5):
  ds = tfds.load('some_dataset', split='train')

  for ex in ds:  # ds will yield randomly generated examples.
    ex
```
All calls to `tfds.load`/`tfds.data_source` within the context manager then
return deterministic mocked data.
Usage (manual):

For more control over the generated examples, you can
manually overwrite the `DatasetBuilder._as_dataset` method:

```python
def as_dataset(self, *args, **kwargs):
  return tf.data.Dataset.from_generator(
      lambda: ({
          'image': np.ones(shape=(28, 28, 1), dtype=np.uint8),
          'label': i % 10,
      } for i in range(num_examples)),
      output_types=self.info.features.dtype,
      output_shapes=self.info.features.shape,
  )
```
```python
with mock_data(as_dataset_fn=as_dataset):
  ds = tfds.load('some_dataset', split='train')

  for ex in ds:  # ds will yield the fake examples from 'as_dataset'.
    ex
```
### Policy

For improved results, you can copy the true metadata files
(`dataset_info.json`, `label.txt`, vocabulary files) into
`data_dir/dataset_name/version`. This allows the mocked dataset to use
the true metadata computed during generation (split names, ...).

If the metadata files are not found, info from the original class is used
instead, but the features computed during generation won't be available (e.g.
split names are unknown, so any split name is accepted).
### Miscellaneous

- The examples are deterministically generated. The train and test splits will
  yield the same examples.
- The actual examples are randomly generated using
  `builder.info.features.get_tensor_info()`.
- The download and prepare step is always a no-op.
- Warning: `info.splits['train'].num_examples` won't match
  `len(list(ds_train))`.

Some of these points could be improved. If you have suggestions or issues with
this function, please open a new issue on our GitHub.
### Args

| Argument | Description |
|---|---|
| `num_examples` | Number of fake examples to generate. |
| `num_sub_examples` | Number of examples to generate in nested `Dataset` features. |
| `max_value` | The maximum value present in generated tensors; if `max_value` is None or 0, random numbers are generated in the range 0 to 255. |
| `policy` | Strategy used to generate the fake examples. See `tfds.testing.MockPolicy`. |
| `as_dataset_fn` | If provided, replaces the default random example generator. This function mocks `FileAdapterBuilder._as_dataset`. |
| `data_dir` | Folder containing the metadata files (searched in `data_dir/dataset_name/version`). Overrides the `data_dir` kwarg of `tfds.load`. Used in `MockPolicy.USE_FILES` mode. |
| `mock_array_record_data_source` | Overrides the mock for the underlying ArrayRecord data source, if one is used. |

### Yields

None.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2024-04-26 UTC.