Task 9 Implementation of Object Detection and Localization
! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
Saving kaggle.json to kaggle.json
Archive: /content/object-detection-using-open-cv.zip
inflating: Data/coco_names.txt
inflating: Data/frozen_inference_graph.pb
inflating: Data/horses.jpg
inflating: Data/image1.jpg
inflating: Data/image3.jpg
inflating: Data/ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt
#First Image
[4]: import cv2
image = cv2.imread('Data/image1.jpg')
image = cv2.resize(image, (640, 480))
h = image.shape[0]
w = image.shape[1]
weights = "Data/frozen_inference_graph.pb"
model = "Data/ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt"
# load the MobileNet SSD model trained on the COCO dataset
net = cv2.dnn.readNetFromTensorflow(weights, model)
First, we load our image from disk, resize it to 640×480, and grab the image height and width.
Then we load our model using OpenCV's dnn module, passing the paths to the frozen TensorFlow
weights file (.pb) and the model configuration file (.pbtxt).
The sample files are provided in the folder which contains the source code, along with some images.
Make sure to download these files from this link.
Next, let’s load our class labels:
[5]: # load the class labels the model was trained on
class_names = []
with open("Data/coco_names.txt", "r") as f:
class_names = f.read().strip().split("\n")
We first create the list class_names that will hold the names of the classes. Then we open the file
using a context manager, strip trailing whitespace, and split the contents on newlines, so that each
line, which corresponds to one class name, becomes an element of the list.
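As a miniature illustration of the same read-strip-split pattern, here is the equivalent with an in-memory file (the three class names are made up; the real Data/coco_names.txt holds the COCO labels):

```python
from io import StringIO

# a stand-in for Data/coco_names.txt with three made-up lines
fake_file = StringIO("person\nbicycle\ncar\n")

# strip() removes the trailing newline so split() yields no empty entry
class_names = fake_file.read().strip().split("\n")
print(class_names)  # ['person', 'bicycle', 'car']
```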
Let’s now apply object detection:
[6]: # create a blob from the image
blob = cv2.dnn.blobFromImage(
image, 1.0/127.5, (320, 320), [127.5, 127.5, 127.5])
# pass the blob through our network and get the output predictions
net.setInput(blob)
output = net.forward() # shape: (1, 1, 100, 7)
We create a blob using the cv2.dnn.blobFromImage function. This function preprocesses the image
and prepares it for the network: it resizes the image, subtracts a mean value from each channel,
and scales the result.
The first argument is the input image. The second is the scale factor applied after mean subtraction.
The third argument is the spatial size the network expects (320×320 for this model) and the last
argument is the per-channel mean to subtract.
With a mean of 127.5 and a scale factor of 1/127.5, pixel values in [0, 255] are mapped to [-1, 1],
which matches how this network was trained, so you can take these values from the documentation as-is.
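To make the preprocessing concrete, here is a rough NumPy sketch of what those parameters do (a simplified sketch: it skips the resize step and any channel reordering the real function can also perform):

```python
import numpy as np

# a toy 320x320 3-channel "image" where every pixel value is 200
image = np.full((320, 320, 3), 200, dtype=np.float32)

# mean subtraction followed by scaling, as blobFromImage does
mean = 127.5
scale = 1.0 / 127.5
blob = (image - mean) * scale

# the result is laid out as a 4-D batch: (1, channels, height, width)
blob = blob.transpose(2, 0, 1)[np.newaxis, ...]
print(blob.shape)                         # (1, 3, 320, 320)
print(round(float(blob[0, 0, 0, 0]), 4))  # (200 - 127.5) / 127.5 = 0.5686
```

Every value now lies in [-1, 1], which is the input range this network expects.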
Next, we set the blob as input for the network and detect the objects on the image using our Single
Shot Detector.
The result is stored in the output variable; printing its shape gives (1, 1, 100, 7): one image,
with up to 100 candidate detections, each described by 7 values.
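Each of those 7 values is, in order: the batch index, the class id, the confidence score, and the box corners (x1, y1, x2, y2) normalized to [0, 1]. A quick sketch with a made-up detection row (the numbers are invented, not real model output):

```python
# hypothetical detection row: [batch_id, class_id, confidence, x1, y1, x2, y2]
detection = [0.0, 1.0, 0.87, 0.10, 0.20, 0.55, 0.90]
w, h = 640, 480  # image width and height from earlier

confidence = detection[2]
class_id = int(detection[1])
# scale the normalized corners to pixel coordinates
x1, y1 = int(detection[3] * w), int(detection[4] * h)
x2, y2 = int(detection[5] * w), int(detection[6] * h)
print((x1, y1, x2, y2))  # (64, 96, 352, 432)
```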
So we can loop over the detections to filter the results and get the bounding boxes of the detected
objects:
[7]: from google.colab.patches import cv2_imshow
# loop over the detected objects; output[0, 0, :, :] has a shape of (100, 7)
for detection in output[0, 0, :, :]:
    probability = detection[2]  # confidence score
    if probability < 0.5:       # filter out weak detections
        continue
    # box coordinates are normalized to [0, 1]; scale them to the image size
    x1, y1 = int(detection[3] * w), int(detection[4] * h)
    x2, y2 = int(detection[5] * w), int(detection[6] * h)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2_imshow(image)
cv2.waitKey()
[7]: -1
#Second Image
[8]: image1 = cv2.imread('Data/image3.jpg')
image1 = cv2.resize(image1, (640, 480))
h = image1.shape[0]
w = image1.shape[1]
[9]: # run the same detection pipeline on the second image
blob = cv2.dnn.blobFromImage(image1, 1.0/127.5, (320, 320), [127.5, 127.5, 127.5])
net.setInput(blob)
output = net.forward()
for detection in output[0, 0, :, :]:
    probability = detection[2]
    if probability < 0.5:
        continue
    x1, y1 = int(detection[3] * w), int(detection[4] * h)
    x2, y2 = int(detection[5] * w), int(detection[6] * h)
    cv2.rectangle(image1, (x1, y1), (x2, y2), (0, 255, 0), 2)
[10]: cv2_imshow(image1)
cv2.waitKey()
[10]: -1
#Third Image
[11]: image2 = cv2.imread('Data/horses.jpg')
image2 = cv2.resize(image2, (640, 480))
h = image2.shape[0]
w = image2.shape[1]
[12]: # detect objects on the third image with the same pipeline
blob = cv2.dnn.blobFromImage(image2, 1.0/127.5, (320, 320), [127.5, 127.5, 127.5])
net.setInput(blob)
output = net.forward()
for detection in output[0, 0, :, :]:
    probability = detection[2]
    if probability < 0.5:
        continue
    x1, y1 = int(detection[3] * w), int(detection[4] * h)
    x2, y2 = int(detection[5] * w), int(detection[6] * h)
    cv2.rectangle(image2, (x1, y1), (x2, y2), (0, 255, 0), 2)
[13]: cv2_imshow(image2)
cv2.waitKey()
[13]: -1