Object Detection in Scarla
Approaches overview
1. Mobilenet v2 SSD
Approaches overview
A. Configuration stage
B. Dataset sanity check stage
C. Tfrecord generation stage
D. Tfrecord sanity check stage
E. Training stage
F. Inference stage
A. Configuration stage
labels = [
{'name': 'Vehicle', 'id': 1},
{'name': 'Bike', 'id': 2},
{'name': 'Motorbike', 'id': 3},
{'name': 'Traffic Light', 'id': 4},
{'name': 'Traffic Sign', 'id': 5},
]
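The training config later points at a label map file (`labels.pbtxt`). A minimal sketch of how the `labels` list above could be rendered into that text format is shown here; the helper name `labels_to_pbtxt` is hypothetical and not part of the notebook.

```python
# Hypothetical helper: renders the labels list as label-map text
# in the "item { id: ... name: ... }" layout.
labels = [
    {'name': 'Vehicle', 'id': 1},
    {'name': 'Bike', 'id': 2},
    {'name': 'Motorbike', 'id': 3},
    {'name': 'Traffic Light', 'id': 4},
    {'name': 'Traffic Sign', 'id': 5},
]


def labels_to_pbtxt(labels):
    """Return label-map text with one item block per label."""
    blocks = []
    for label in labels:
        blocks.append(
            "item {\n"
            f"  id: {label['id']}\n"
            f"  name: '{label['name']}'\n"
            "}\n"
        )
    return "\n".join(blocks)


pbtxt = labels_to_pbtxt(labels)
print(pbtxt)
# The text could then be written to the label map path used in the config,
# e.g. with open(path, 'w') as f: f.write(pbtxt)
```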
In [3]: import os
os.chdir('/kaggle/working/models/research')
Cloning cocoapi
In [6]: os.chdir('cocoapi/PythonAPI')
In [7]: !make
In [9]: os.chdir('/kaggle/working/models/research')
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
cuml 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
dask-cudf 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
beatrix-jupyterlab 2023.814.150030 requires jupyter-server~=1.16, but you have jupyter-server 2.10.0 which is incompatible.
beatrix-jupyterlab 2023.814.150030 requires jupyterlab~=3.4, but you have jupyterlab 4.0.5 which is incompatible.
cudf 23.8.0 requires pandas<1.6.0dev0,>=1.3, but you have pandas 2.0.3 which is incompatible.
cudf 23.8.0 requires protobuf<5,>=4.21, but you have protobuf 3.20.3 which is incompatible.
cudf 23.8.0 requires pyarrow==11.*, but you have pyarrow 9.0.0 which is incompatible.
cuml 23.8.0 requires dask==2023.7.1, but you have dask 2023.11.0 which is incompatible.
cuml 23.8.0 requires distributed==2023.7.1, but you have distributed 2023.11.0 which is incompatible.
dask-cudf 23.8.0 requires dask==2023.7.1, but you have dask 2023.11.0 which is incompatible.
dask-cudf 23.8.0 requires distributed==2023.7.1, but you have distributed 2023.11.0 which is incompatible.
dask-cudf 23.8.0 requires pandas<1.6.0dev0,>=1.3, but you have pandas 2.0.3 which is incompatible.
multiprocess 0.70.15 requires dill>=0.3.7, but you have dill 0.3.1.1 which is incompatible.
pathos 0.3.1 requires dill>=0.3.7, but you have dill 0.3.1.1 which is incompatible.
pymc3 3.11.5 requires numpy<1.22.2,>=1.15.0, but you have numpy 1.24.3 which is incompatible.
pymc3 3.11.5 requires scipy<1.8.0,>=1.7.3, but you have scipy 1.11.3 which is incompatible.
pytoolconfig 1.2.6 requires packaging>=22.0, but you have packaging 21.3 which is incompatible.
tensorflow-decision-forests 1.5.0 requires tensorflow~=2.13.0, but you have tensorflow 2.15.0.post1 which is incompatible.
tensorflowjs 4.13.0 requires packaging~=23.1, but you have packaging 21.3 which is incompatible.
tensorstore 0.1.48 requires ml-dtypes>=0.3.1, but you have ml-dtypes 0.2.0 which is incompatible.
ydata-profiling 4.5.1 requires numpy<1.24,>=1.16.0, but you have numpy 1.24.3 which is incompatible.
OK (skipped=1)
Creating the config file for MobileNet v2 SSD and downloading the pretrained model
train_config: {
fine_tune_checkpoint_version: V2
fine_tune_checkpoint: "/kaggle/working/temp1/ssd_mobilenet_v2_fpnlite_640x640_coco17_t
fine_tune_checkpoint_type: "detection"
batch_size: 1
sync_replicas: true
startup_delay_steps: 0
replicas_to_aggregate: 8
num_steps: 50000
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
random_crop_image {
min_object_covered: 0.0
min_aspect_ratio: 0.75
max_aspect_ratio: 3.0
min_area: 0.75
max_area: 1.0
overlap_thresh: 0.0
}
}
optimizer {
momentum_optimizer: {
learning_rate: {
cosine_decay_learning_rate {
learning_rate_base: .08
total_steps: 50000
warmup_learning_rate: .026666
warmup_steps: 1000
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
max_number_of_boxes: 100
unpad_groundtruth_tensors: false
}
train_input_reader: {
label_map_path: "/kaggle/working/labels.pbtxt"
tf_record_input_reader {
input_path: "/kaggle/working/train.record"
}
}
eval_config: {
metrics_set: "coco_detection_metrics"
use_moving_averages: false
}
eval_input_reader: {
label_map_path: "/kaggle/working/labels.pbtxt"
shuffle: false
num_epochs: 1
tf_record_input_reader {
input_path: "/kaggle/working/test.record"
}
}
"""
config_path
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint/
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint/ckpt-0.data-00000-of-00001
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint/checkpoint
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/checkpoint/ckpt-0.index
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/pipeline.config
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/saved_model/
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/saved_model/saved_model.pb
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/saved_model/variables/
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/saved_model/variables/variables.data-00000-of-00001
ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/saved_model/variables/variables.index
Out[14]: '/kaggle/working/ssd_mobilenet_v2_fpnlite_640_config_updated.config'
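The `cosine_decay_learning_rate` block in the config combines a linear warmup with a cosine decay. The sketch below evaluates that schedule in plain Python using the values from the config. Two things are assumptions rather than statements about the library: the warmup is taken to be linear from `warmup_learning_rate` to `learning_rate_base`, and the rate is taken to be zero past `total_steps`. If the second assumption holds, steps beyond 50000 (the training command later passes `--num_train_steps=100000`) would contribute little.

```python
import math


def cosine_decay_with_warmup(step,
                             learning_rate_base=0.08,
                             total_steps=50000,
                             warmup_learning_rate=0.026666,
                             warmup_steps=1000):
    """Learning rate at a given step (sketch of the configured schedule)."""
    if step > total_steps:
        # Assumption: the schedule yields zero once total_steps is exceeded.
        return 0.0
    if step < warmup_steps:
        # Assumption: linear ramp from the warmup rate up to the base rate.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    # Cosine decay from the base rate down to zero over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * learning_rate_base * (1 + math.cos(math.pi * progress))


for step in (0, 500, 1000, 25500, 50000):
    print(step, cosine_decay_with_warmup(step))
```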
We use YOLO annotations to parse the bounding boxes; the PASCAL VOC format can be used too.
In [15]: import os
import cv2
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import random
boxes = []
for line in lines:
class_id, x_center, y_center, width, height = map(float, line.strip().split())
return boxes
# Plot image
ax.imshow(img)
# Show plot
plt.show()
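The parsing step above reads one YOLO row per object (`class x_center y_center width height`, all normalized to the image size). A self-contained sketch of turning those rows into pixel-space corner boxes for plotting follows; the function name `parse_yolo_lines` and the sample row are hypothetical.

```python
def parse_yolo_lines(lines, img_width, img_height):
    """Convert YOLO rows into (class_id, x_min, y_min, x_max, y_max) in pixels."""
    boxes = []
    for line in lines:
        parts = line.strip().split()
        if len(parts) != 5:
            continue  # skip blank or malformed rows
        class_id, x_center, y_center, width, height = map(float, parts)
        # Center/size (normalized) -> corner coordinates (pixels).
        x_min = (x_center - width / 2) * img_width
        y_min = (y_center - height / 2) * img_height
        x_max = (x_center + width / 2) * img_width
        y_max = (y_center + height / 2) * img_height
        boxes.append((int(class_id), x_min, y_min, x_max, y_max))
    return boxes


# Hypothetical annotation content: one valid row and one blank line.
sample = ["0 0.5 0.5 0.25 0.5", ""]
boxes = parse_yolo_lines(sample, img_width=640, img_height=480)
print(boxes)
```

Each tuple can be passed to `patches.Rectangle((x_min, y_min), x_max - x_min, y_max - y_min, ...)` when drawing on the image.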
Importing modules
In [16]: import os
import io
import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util
from tqdm import tqdm
import matplotlib.pyplot as plt
import numpy as np
image = Image.open(image_path)
width, height = image.size
xmins = []
xmaxs = []
ymins = []
ymaxs = []
classes = []
classes_text = []
xmins.append(x_min)
xmaxs.append(x_max)
ymins.append(y_min)
ymaxs.append(y_max)
classes.append(int(class_id))
classes_text.append(str.encode(str(int(class_id))))
tf_example = tf.train.Example(features=tf.train.Features(feature={
'image/height': dataset_util.int64_feature(height),
'image/width': dataset_util.int64_feature(width),
'image/filename': dataset_util.bytes_feature(str.encode(os.path.basename(image_p
'image/source_id': dataset_util.bytes_feature(str.encode(os.path.basename(image_
'image/encoded': dataset_util.bytes_feature(encoded_image_data),
'image/format': dataset_util.bytes_feature(b'png'),
'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example
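The box lists fed into the `tf.train.Example` hold normalized corner coordinates plus class ids and class text. The sketch below shows one way to build those lists from YOLO rows. It rests on assumptions that the source does not confirm: `id_offset` is a hypothetical parameter for the case where the annotation class ids are zero-based while the label map starts at 1, and the class text is taken from the label names rather than the numeric id.

```python
def yolo_to_tf_fields(rows, id_to_name, id_offset=0):
    """Build per-image box/class lists from (class, xc, yc, w, h) rows."""
    def clamp(value):
        # Keep coordinates inside the normalized [0, 1] range.
        return min(max(value, 0.0), 1.0)

    fields = {'xmins': [], 'xmaxs': [], 'ymins': [], 'ymaxs': [],
              'classes': [], 'classes_text': []}
    for class_id, x_center, y_center, width, height in rows:
        # Assumption: shift ids so they line up with the label map.
        label_id = int(class_id) + id_offset
        fields['xmins'].append(clamp(x_center - width / 2))
        fields['xmaxs'].append(clamp(x_center + width / 2))
        fields['ymins'].append(clamp(y_center - height / 2))
        fields['ymaxs'].append(clamp(y_center + height / 2))
        fields['classes'].append(label_id)
        fields['classes_text'].append(id_to_name[label_id].encode('utf8'))
    return fields


id_to_name = {1: 'Vehicle', 2: 'Bike', 3: 'Motorbike',
              4: 'Traffic Light', 5: 'Traffic Sign'}
fields = yolo_to_tf_fields([(0, 0.5, 0.5, 0.25, 0.5)], id_to_name, id_offset=1)
print(fields)
```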
def create_tf_record(output_file, image_folder, labels_folder):
writer = tf.io.TFRecordWriter(output_file)
image_files = os.listdir(image_folder)
writer.close()
# Paths
output_train_record = '/kaggle/working/train.record'
output_test_record = '/kaggle/working/test.record'
# Decode image
image = tf.image.decode_image(example['image/encoded'])
image = tf.image.convert_image_dtype(image, tf.uint8)
for i in range(num_boxes):
xmin, xmax, ymin, ymax = xmins[i].numpy(), xmaxs[i].numpy(), ymins[i].numpy(), ym
# Convert normalized coordinates to image coordinates
xmin, xmax, ymin, ymax = int(xmin * image.shape[1]), int(xmax * image.shape[1]),
rect = plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
linewidth=1, edgecolor='r', facecolor='none')
plt.gca().add_patch(rect)
plt.show()
# TFRecord files
train_record_path = '/kaggle/working/train.record'
test_record_path = '/kaggle/working/test.record'
# Creating datasets
train_dataset = tf.data.TFRecordDataset(train_record_path)
test_dataset = tf.data.TFRecordDataset(test_record_path)
print("Train Dataset")
Train Dataset
*********************************************
Test Dataset
*********************************************
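Besides plotting, the decoded boxes can be checked numerically. The following sketch flags boxes whose normalized coordinates are out of range or inverted; the function name `find_invalid_boxes` is hypothetical, and the inputs are plain Python lists such as the values obtained by calling `.numpy()` on the parsed features.

```python
def find_invalid_boxes(xmins, xmaxs, ymins, ymaxs):
    """Return (index, axis) pairs for boxes that fail 0 <= min < max <= 1."""
    problems = []
    for i, (x0, x1, y0, y1) in enumerate(zip(xmins, xmaxs, ymins, ymaxs)):
        if not (0.0 <= x0 < x1 <= 1.0):
            problems.append((i, 'x'))
        if not (0.0 <= y0 < y1 <= 1.0):
            problems.append((i, 'y'))
    return problems


# Hypothetical values: the first box is valid, the second is broken on both axes.
problems = find_invalid_boxes(
    xmins=[0.1, 0.6], xmaxs=[0.4, 0.5],
    ymins=[0.2, 0.0], ymaxs=[0.3, 1.2],
)
print(problems)
```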
E. Training stage
print(tf.__version__)
In [ ]: os.chdir('/kaggle/working/models/research')
!python object_detection/model_main_tf2.py --num_train_steps=100000 --model_dir=/kaggle
F. Inference stage
Importing modules
# Perform inference
infer = model.signatures["serving_default"]
output_dict = infer(input_tensor)
num_boxes = output_dict['detection_boxes'][0].shape[0]
if num_boxes > 0:
print(f"Number of bounding boxes detected in {image_file}: {num_boxes}")
else:
print(f"No bounding boxes detected in {image_file}.")
boxes = output_dict['detection_boxes'][0].numpy()
classes = output_dict['detection_classes'][0].numpy().astype(np.int32)
scores = output_dict['detection_scores'][0].numpy()
for i in range(len(boxes)):
box = boxes[i]
class_id = classes[i]
score = scores[i]
# Drawing rectangle
color = (0, 255, 0) # Green color for the rectangle
thickness = 2
cv2.rectangle(image_np, (xmin, ymin), (xmax, ymax), color, thickness)
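Exported detection models typically return a fixed-size, padded set of detections, so counting rows in `detection_boxes` does not by itself say how many objects were found. A sketch of filtering by score and converting to pixel coordinates before drawing is given below. It assumes the boxes are normalized and ordered `[ymin, xmin, ymax, xmax]`; the function name and the `min_score` default are hypothetical.

```python
def filter_detections(boxes, classes, scores, img_width, img_height,
                      min_score=0.5):
    """Keep detections at or above min_score and convert boxes to pixels."""
    kept = []
    for box, class_id, score in zip(boxes, classes, scores):
        if score < min_score:
            continue  # drop low-confidence (often padded) detections
        # Assumption: normalized [ymin, xmin, ymax, xmax] ordering.
        ymin, xmin, ymax, xmax = box
        kept.append({
            'class_id': int(class_id),
            'score': float(score),
            'xmin': int(xmin * img_width),
            'ymin': int(ymin * img_height),
            'xmax': int(xmax * img_width),
            'ymax': int(ymax * img_height),
        })
    return kept


# Hypothetical model output for a 640x480 image.
detections = filter_detections(
    boxes=[[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 1.0, 1.0]],
    classes=[1, 2],
    scores=[0.9, 0.1],
    img_width=640, img_height=480,
)
print(detections)
```

Each kept entry supplies the `(xmin, ymin)` and `(xmax, ymax)` corners that `cv2.rectangle` expects.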
Conclusion
1. Training for more steps is desirable, as we train for only 100000 steps.
2. Only 779 training images and 249 testing images from CARLA, covering 5 classes, are used. Feel free to add more.
3. Getting good inference results is harder here because the ground-truth bounding boxes are small and the dataset is synthetic, a domain on which the pretrained model has never been trained, especially the distilled MobileNet versions.
4. Non-maximum suppression needs to be added to the inference code to filter out bounding-box predictions using IoU and score thresholds.
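A minimal sketch of the non-maximum suppression mentioned in point 4, written in plain Python. It is a greedy, class-agnostic variant, not the notebook's code; the function names and both threshold defaults are hypothetical, and boxes are assumed to be `(xmin, ymin, xmax, ymax)`.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5, score_threshold=0.3):
    """Return indices of kept boxes, highest score first."""
    # Drop low-score boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical boxes: the second overlaps the first, the fourth scores too low.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
keep = non_max_suppression(boxes, scores)
print(keep)
```

Running the suppression separately per class would avoid removing overlapping objects of different classes, at the cost of one pass per class.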
2. RetinaNet from Meta AI