The 'tf.data' API helps in customizing the model-building input pipeline, for example by shuffling the dataset so that every type of data is distributed evenly across it (where possible).
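As a small illustration of this shuffling behaviour, the sketch below uses a toy dataset of integers rather than the flowers data; the buffer_size and seed values are only example choices.

import tensorflow as tf

# Toy dataset of ten integers to illustrate shuffling.
ds = tf.data.Dataset.range(10)

# A buffer covering the whole dataset lets every element move anywhere;
# reshuffle_each_iteration=False keeps the same order on every epoch.
shuffled = ds.shuffle(buffer_size=10, seed=42, reshuffle_each_iteration=False)

print([int(x) for x in shuffled.as_numpy_iterator()])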
We will be using the flowers dataset, which contains several thousand images of flowers. It is organized into 5 sub-directories, one for each class.
We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, lets you run Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.
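The code below refers to data_dir and image_count, which are set up earlier in the linked TensorFlow tutorial. A minimal sketch of that setup, following that tutorial (the download URL is the one it uses), would look roughly like this:

import pathlib
import numpy as np
import tensorflow as tf

# Download and extract the flowers dataset.
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)

# Count the images so the dataset can be shuffled and split later.
image_count = len(list(data_dir.glob('*/*.jpg')))
print(image_count)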
print("Defining customized input pipeline") list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*'), shuffle=False) list_ds = list_ds.shuffle(image_count, reshuffle_each_iteration=False) for f in list_ds.take(5): print(f.numpy()) class_names = np.array(sorted([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])) print(class_names) print("The dataset is split into training and validation set") val_size = int(image_count * 0.2) train_ds = list_ds.skip(val_size) val_ds = list_ds.take(val_size) print("Length of each subset is displayed below") print(tf.data.experimental.cardinality(train_ds).numpy()) print(tf.data.experimental.cardinality(val_ds).numpy())
Code credit: https://www.tensorflow.org/tutorials/load_data/images
Output
Defining customized input pipeline
b'/root/.keras/datasets/flower_photos/dandelion/14306875733_61d71c64c0_n.jpg'
b'/root/.keras/datasets/flower_photos/dandelion/8935477500_89f22cca03_n.jpg'
b'/root/.keras/datasets/flower_photos/sunflowers/3001531316_efae24d37d_n.jpg'
b'/root/.keras/datasets/flower_photos/daisy/7133935763_82b17c8e1b_n.jpg'
b'/root/.keras/datasets/flower_photos/tulips/17844723633_da85357fe3.jpg'
['daisy' 'dandelion' 'roses' 'sunflowers' 'tulips']
The dataset is split into training and validation set
Length of each subset is displayed below
2936
734
Explanation
- The keras.preprocessing utilities are a convenient way to create a 'tf.data.Dataset' from a directory of images (see the sketch after this list).
- To gain more control over this, a customized input pipeline can be written using 'tf.data'.
- The directory tree structure of the files can be used to compile the 'class_names' list.
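For comparison, the higher-level Keras utility mentioned above can build an equivalent dataset in a couple of lines. A minimal sketch is shown below; the image_size, batch_size, and seed values are just example choices.

import tensorflow as tf

# High-level alternative: infer classes from the sub-directory names
# and split off a validation subset in a single call.
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(180, 180),
    batch_size=32)

print(train_ds.class_names)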