The 'tf.data' API helps in customizing the model-building input pipeline, for example by shuffling the dataset so that every type of data is distributed evenly across it (where possible).
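As a small illustration of this shuffling behaviour, the sketch below uses a toy dataset of integers rather than the flowers data; the buffer_size and seed values are only example choices.

import tensorflow as tf

# Toy dataset of ten integers to illustrate shuffling.
ds = tf.data.Dataset.range(10)

# A buffer covering the whole dataset lets every element move anywhere;
# reshuffle_each_iteration=False keeps the same order on every epoch.
shuffled = ds.shuffle(buffer_size=10, seed=42, reshuffle_each_iteration=False)

print([int(x) for x in shuffled.as_numpy_iterator()])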
We will be using the flowers dataset, which contains several thousand images of flowers. It is organized into 5 sub-directories, one for each class.
We are using Google Colaboratory to run the code below. Google Colab, or Colaboratory, lets you run Python code in the browser, requires zero configuration, and provides free access to GPUs (Graphics Processing Units). Colaboratory is built on top of Jupyter Notebook.
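The code below refers to data_dir and image_count, which are set up earlier in the linked TensorFlow tutorial. A minimal sketch of that setup, following that tutorial (the download URL is the one it uses), would look roughly like this:

import pathlib
import numpy as np
import tensorflow as tf

# Download and extract the flowers dataset.
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)

# Count the images so the dataset can be shuffled and split later.
image_count = len(list(data_dir.glob('*/*.jpg')))
print(image_count)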
print("Defining customized input pipeline") list_ds = tf.data.Dataset.list_files(str(data_dir/'*/*'), shuffle=False) list_ds = list_ds.shuffle(image_count, reshuffle_each_iteration=False) for f in list_ds.take(5): print(f.numpy()) class_names = np.array(sorted([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])) print(class_names) print("The dataset is split into training and validation set") val_size = int(image_count * 0.2) train_ds = list_ds.skip(val_size) val_ds = list_ds.take(val_size) print("Length of each subset is displayed below") print(tf.data.experimental.cardinality(train_ds).numpy()) print(tf.data.experimental.cardinality(val_ds).numpy())
Code credit: https://www.tensorflow.org/tutorials/load_data/images
Output
Defining customized input pipeline
b'/root/.keras/datasets/flower_photos/dandelion/14306875733_61d71c64c0_n.jpg'
b'/root/.keras/datasets/flower_photos/dandelion/8935477500_89f22cca03_n.jpg'
b'/root/.keras/datasets/flower_photos/sunflowers/3001531316_efae24d37d_n.jpg'
b'/root/.keras/datasets/flower_photos/daisy/7133935763_82b17c8e1b_n.jpg'
b'/root/.keras/datasets/flower_photos/tulips/17844723633_da85357fe3.jpg'
['daisy' 'dandelion' 'roses' 'sunflowers' 'tulips']
The dataset is split into training and validation set
Length of each subset is displayed below
2936
734
Explanation
- The keras.preprocessing utilities are a convenient way to create a 'tf.data.Dataset' from a directory of images (see the sketch after this list).
- To gain more control over this, a customized input pipeline can be written using 'tf.data'.
- The directory tree structure of the files can be used to compile the 'class_names' list.
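For comparison, the higher-level Keras utility mentioned above can build an equivalent dataset in a couple of lines. A minimal sketch is shown below; the image_size, batch_size, and seed values are just example choices.

import tensorflow as tf

# High-level alternative: infer classes from the sub-directory names
# and split off a validation subset in a single call.
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(180, 180),
    batch_size=32)

print(train_ds.class_names)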