B_DL_Assign

The document outlines the process for setting up an object localization task using Python libraries such as Albumentations, OpenCV, and PyTorch. It includes steps for installing necessary packages, downloading a dataset, creating a custom dataset class, and defining a model architecture using EfficientNet. Additionally, it describes data augmentation techniques and the training and evaluation functions for the model.

Intro Task, Object Localization

# install libraries/packages/modules

!pip install -U git+https://github.com/albumentations-team/albumentations
!pip install timm
!pip install --upgrade opencv-contrib-python

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/albumentations-team/albumentations
  Cloning https://github.com/albumentations-team/albumentations to /tmp/pip-req-build-lzcx8h8f
  Running command git clone -q https://github.com/albumentations-team/albumentations /tmp/pip-req-build-lzcx8h8f
Requirement already satisfied: numpy>=1.11.1 in /usr/local/lib/python3.7/dist-packages (from albumentations==1.2.1) (1.21.6)
Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from albumentations==1.2.1) (1.7.3)
Requirement already satisfied: scikit-image>=0.16.1 in /usr/local/lib/python3.7/dist-packages (from albumentations==1.2.1) (0.18.3)
Requirement already satisfied: qudida>=0.0.4 in /usr/local/lib/python3.7/dist-packages (from albumentations==1.2.1) (0.0.4)
Requirement already satisfied: opencv-python>=4.1.1 in /usr/local/lib/python3.7/dist-packages (from albumentations==1.2.1) (4.6.0.66)
...
Requirement already satisfied: timm in /usr/local/lib/python3.7/dist-packages (0.6.7)
Requirement already satisfied: torchvision in /usr/local/lib/python3.7/dist-packages (from timm) (0.13.1+cu113)
Requirement already satisfied: torch>=1.4 in /usr/local/lib/python3.7/dist-packages (from timm) (1.12.1+cu113)
...
Requirement already satisfied: opencv-contrib-python in /usr/local/lib/python3.7/dist-packages (4.6.0.66)
Requirement already satisfied: numpy>=1.14.5 in /usr/local/lib/python3.7/dist-packages (from opencv-contrib-python) (1.21.6)

# Download Dataset

!git clone https://github.com/parth1620/object-localization-dataset.git

fatal: destination path 'object-localization-dataset' already exists and is not an empty directory.

import pandas as pd
import numpy as np
import cv2
import matplotlib.pyplot as plt
import torch
from tqdm.notebook import tqdm

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

import sys
sys.path.append('/content/object-localization-dataset')

Configurations
CSV_FILE = '/content/object-localization-dataset/train.csv'
DATA_DIR = '/content/object-localization-dataset/'

DEVICE = 'cuda'

BATCH_SIZE = 16
IMG_SIZE = 140

LR = 0.001
EPOCHS = 40
MODEL_NAME = 'efficientnet_b0'

NUM_COR = 4  # number of bounding-box coordinates: xmin, ymin, xmax, ymax

df = pd.read_csv(CSV_FILE)
df.head(10)
df

                         img_path  xmin  ymin  xmax  ymax  width  height     label
0    train_images/mushroom_51.jpg    24    23   202   183    227     227  mushroom
1    train_images/eggplant_37.jpg    34    34    88   201    227     227  eggplant
2    train_images/mushroom_20.jpg    49    86   183   185    227     227  mushroom
3    train_images/eggplant_51.jpg    51    59   191   164    227     227  eggplant
4    train_images/eggplant_26.jpg    40    70   179   168    227     227  eggplant
..                            ...   ...   ...   ...   ...    ...     ...       ...
181  train_images/eggplant_62.jpg    67    22   177   215    227     227  eggplant
182  train_images/cucumber_45.jpg    11    31   217   208    227     227  cucumber
183  train_images/mushroom_37.jpg    93    13   158   193    227     227  mushroom
184  train_images/eggplant_44.jpg    21    59   192   171    227     227  eggplant
185  train_images/mushroom_16.jpg    43    20   197   182    227     227  mushroom

[186 rows x 8 columns]

Understand the dataset

row = df.iloc[182]

img = cv2.imread(DATA_DIR + row.img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

pt1 = (row.xmin, row.ymin)
pt2 = (row.xmax, row.ymax)
bnd_box_img = cv2.rectangle(img, pt1, pt2, (255, 0, 0), 2)
plt.imshow(bnd_box_img)

<matplotlib.image.AxesImage at 0x7efbe0e13b50>
train_df, valid_df = train_test_split(df, test_size=0.20, random_state=42)

Augmentations

import albumentations as A

train_augs = A.Compose([
    A.Resize(IMG_SIZE, IMG_SIZE),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.Rotate()
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['class_labels']))

valid_augs = A.Compose([
    A.Resize(IMG_SIZE, IMG_SIZE),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['class_labels']))
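
The 'pascal_voc' format tells Albumentations that each box is [xmin, ymin, xmax, ymax] in absolute pixels, and bbox_params makes every geometric transform move the box together with the image. A quick sanity check, assuming a made-up image and box (not part of the dataset):

# Hedged sketch: verify a bbox follows the image through the transform pipeline.
dummy_img = np.zeros((300, 300, 3), dtype=np.uint8)
dummy_box = [[50, 60, 200, 220]]  # pascal_voc: [xmin, ymin, xmax, ymax] in pixels

out = train_augs(image=dummy_img, bboxes=dummy_box, class_labels=['veg'])
print(out['image'].shape)  # (140, 140, 3) after A.Resize(IMG_SIZE, IMG_SIZE)
print(out['bboxes'][0])    # box coordinates transformed consistently with the image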

Create Custom Dataset

class ObjectLocDataset(torch.utils.data.Dataset):
    def __init__(self, df, augmentations):
        self.df = df
        self.augmentations = augmentations

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]

        xmin = row.xmin
        ymin = row.ymin
        xmax = row.xmax
        ymax = row.ymax

        bbox = [[xmin, ymin, xmax, ymax]]

        img_path = DATA_DIR + row.img_path
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        if self.augmentations:
            data = self.augmentations(image=img, bboxes=bbox, class_labels=[None])
            img = data['image']
            bbox = data['bboxes'][0]

        img = torch.from_numpy(img).permute(2, 0, 1) / 255.0
        bbox = torch.Tensor(bbox)
        return img, bbox

trainset = ObjectLocDataset(train_df, train_augs)
validset = ObjectLocDataset(valid_df, valid_augs)

print(f"Total Examples in trainset : {len(trainset)}")
print(f"Total Examples in validset : {len(validset)}")

Total Examples in trainset : 148
Total Examples in validset : 38

img, bbox = trainset[10]

xmin, ymin, xmax, ymax = bbox

pt1 = (int(xmin), int(ymin))
pt2 = (int(xmax), int(ymax))

bnd_img = cv2.rectangle(img.permute(1, 2, 0).numpy(), pt1, pt2, (255, 0, 0), 2)
plt.imshow(bnd_img)

WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

<matplotlib.image.AxesImage at 0x7efbcd28ddd0>

Load dataset into batches

trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE, shuffle=True)
validloader = torch.utils.data.DataLoader(validset, batch_size=BATCH_SIZE, shuffle=False)

print("Total no. batches in trainloader : {}".format(len(trainloader)))
print("Total no. batches in validloader : {}".format(len(validloader)))

Total no. batches in trainloader : 10
Total no. batches in validloader : 3

for images, bboxes in trainloader:
    break

print("Shape of one batch images : {}".format(images.shape))
print("Shape of one batch bboxes : {}".format(bboxes.shape))

Shape of one batch images : torch.Size([16, 3, 140, 140])
Shape of one batch bboxes : torch.Size([16, 4])

Create Model

from torch import nn
import timm

class ObjLocModel(nn.Module):

    def __init__(self):
        super(ObjLocModel, self).__init__()
        # EfficientNet backbone; the 4-unit "classification" head regresses
        # the box coordinates (xmin, ymin, xmax, ymax), i.e. NUM_COR outputs.
        self.backbone = timm.create_model(MODEL_NAME, pretrained=True, num_classes=4)

    def forward(self, images, gt_bboxes=None):
        bboxes = self.backbone(images)

        if gt_bboxes is not None:
            loss = nn.MSELoss()(bboxes, gt_bboxes)
            return bboxes, loss

        return bboxes

model = ObjLocModel()
model.to(DEVICE);

random_img = torch.rand(1, 3, 140, 140).to(DEVICE)
model(random_img).shape

torch.Size([1, 4])

Create Train and Eval Function

def train_fn(model, dataloader, optimizer):
    total_loss = 0.0
    model.train()

    for data in tqdm(dataloader):
        images, gt_bboxes = data
        images, gt_bboxes = images.to(DEVICE), gt_bboxes.to(DEVICE)

        bboxes, loss = model(images, gt_bboxes)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    return total_loss / len(dataloader)

def eval_fn(model, dataloader):
    total_loss = 0.0
    model.eval()

    with torch.no_grad():
        for data in tqdm(dataloader):
            images, gt_bboxes = data
            images, gt_bboxes = images.to(DEVICE), gt_bboxes.to(DEVICE)

            bboxes, loss = model(images, gt_bboxes)
            total_loss += loss.item()

    return total_loss / len(dataloader)

# Training Loop

optimizer = torch.optim.Adam(model.parameters(), lr=LR)

best_valid_loss = np.Inf

for i in range(EPOCHS):
    train_loss = train_fn(model, trainloader, optimizer)
    valid_loss = eval_fn(model, validloader)

    if valid_loss < best_valid_loss:
        torch.save(model.state_dict(), 'best_model.pt')
        print("Weights are saved")
        best_valid_loss = valid_loss

    print(f"Epoch : {i+1} train loss : {train_loss} valid loss : {valid_loss}")

{"model_id":"93cb1c8c695f4c18bdf5bc1245dfa8f4","version_major":2,"vers
ion_minor":0}

{"model_id":"e3c13bf0c4c6458a9b090da69575f19b","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 1 train loss : 5142.847241210938 valid loss :
3667.647216796875
{"model_id":"d16c4f1790a84a3cb299f7ecd63a596d","version_major":2,"vers
ion_minor":0}

{"model_id":"34e4beed8cd7416e813d432cdd138eb4","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 2 train loss : 2169.490625 valid loss : 799.9299723307291

{"model_id":"96c7d5bcd2cd42dcb819f0b28811fe00","version_major":2,"vers
ion_minor":0}

{"model_id":"9df95f743ffe44f9aaf71ae37f4c05d2","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 3 train loss : 1232.0718994140625 valid loss :
217.66906229654947

{"model_id":"98d507882dee42839cf2802134baaccf","version_major":2,"vers
ion_minor":0}

{"model_id":"29aa091d59b74a81bdabd7d066671fef","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 4 train loss : 819.05029296875 valid loss : 209.311279296875

{"model_id":"540ac4419c0a457088ad93762eb0de47","version_major":2,"vers
ion_minor":0}

{"model_id":"83896e35d0dc4cad8641a2dd5581fde8","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 5 train loss : 683.5140625 valid loss : 203.95267232259116

{"model_id":"42ea7a894e72434db16e46e73bc6368c","version_major":2,"vers
ion_minor":0}

{"model_id":"be57c3d8959041bf9c238478eb55b123","version_major":2,"vers
ion_minor":0}

Epoch : 6 train loss : 491.652587890625 valid loss :


205.97164916992188

{"model_id":"7f5de4361ecf483c9801e6899fdb10b1","version_major":2,"vers
ion_minor":0}

{"model_id":"b53951f697d04deab673dc0f2b841969","version_major":2,"vers
ion_minor":0}
Epoch : 7 train loss : 360.6284713745117 valid loss :
305.06504313151044

{"model_id":"547b6d6dc51a4a198e4f6d204cb4cd63","version_major":2,"vers
ion_minor":0}

{"model_id":"bbdd270ea6f749bab25cc974219a02f9","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 8 train loss : 250.20942230224608 valid loss :
160.16223907470703

{"model_id":"991180bcffee4d7493eed2206968aa7d","version_major":2,"vers
ion_minor":0}

{"model_id":"d28c948835694c01a84cb7a8a4535305","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 9 train loss : 250.328173828125 valid loss :
142.54071553548178

{"model_id":"956cd408b47940d68cfbee0916786177","version_major":2,"vers
ion_minor":0}

{"model_id":"83658e8f6f35467bb1f36c5a3bc87e64","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 10 train loss : 188.7417350769043 valid loss :
126.22472890218098

{"model_id":"561d2e29151c4171bd83e141810e0894","version_major":2,"vers
ion_minor":0}

{"model_id":"554f4f3769384df8b57b38428afb0051","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 11 train loss : 149.76507568359375 valid loss :
123.25281778971355

{"model_id":"df4bf081538a417aa6b992d7d85f4910","version_major":2,"vers
ion_minor":0}

{"model_id":"18fad7b07fd34450afc9d38fec6a6078","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 12 train loss : 121.95893249511718 valid loss :
97.7426249186198
{"model_id":"28b771f6fe2e4e8db41cc25146df9e71","version_major":2,"vers
ion_minor":0}

{"model_id":"443205415dd24133a2bd68da63048d34","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 13 train loss : 118.0179946899414 valid loss :
81.26379140218098

{"model_id":"2f87224d401441518e91e47565aff433","version_major":2,"vers
ion_minor":0}

{"model_id":"f0cb06267c124920b06f951357a9390f","version_major":2,"vers
ion_minor":0}

Epoch : 14 train loss : 103.23489303588867 valid loss :


136.75027974446616

{"model_id":"9ba34b9bb4be411e813c261ba1ab1d5b","version_major":2,"vers
ion_minor":0}

{"model_id":"522f4ccb9eac4a34a04c1af5d8f5a0b6","version_major":2,"vers
ion_minor":0}

Epoch : 15 train loss : 92.04374809265137 valid loss :


119.39169565836589

{"model_id":"5a72382a0b2b4f5ba08f62d45a5047f4","version_major":2,"vers
ion_minor":0}

{"model_id":"c6c8e9b55ff748f3898ab2c582004de0","version_major":2,"vers
ion_minor":0}

Epoch : 16 train loss : 83.40096855163574 valid loss :


90.8295669555664

{"model_id":"3834e81bd6ea4e76a52be80c9e74ce34","version_major":2,"vers
ion_minor":0}

{"model_id":"e8f7d62439b5430fb2ad3850c21ea739","version_major":2,"vers
ion_minor":0}

Epoch : 17 train loss : 83.5185432434082 valid loss :


94.33363342285156

{"model_id":"58aed79a5d7a484dabf04ce2c1a75c32","version_major":2,"vers
ion_minor":0}

{"model_id":"791d53504d24473ab9992aa091112948","version_major":2,"vers
ion_minor":0}
Weights are saved
Epoch : 18 train loss : 73.33877410888672 valid loss :
73.88326644897461

{"model_id":"06248e87e2a74398bf3e198f379963bd","version_major":2,"vers
ion_minor":0}

{"model_id":"1ee74e47124a4451aaec9c5f9e96d7a9","version_major":2,"vers
ion_minor":0}

Epoch : 19 train loss : 79.07369270324708 valid loss :


85.07627868652344

{"model_id":"f5c4f7d9fb13428ca463a26be46cfe38","version_major":2,"vers
ion_minor":0}

{"model_id":"8c8265bfc28e47c9a8068c43b865a3b9","version_major":2,"vers
ion_minor":0}

Epoch : 20 train loss : 55.44272117614746 valid loss :


106.5606180826823

{"model_id":"6de4f48957a2491eac7f4d136898631d","version_major":2,"vers
ion_minor":0}

{"model_id":"f543253e199f4346b6a6110c8592250d","version_major":2,"vers
ion_minor":0}

Epoch : 21 train loss : 61.454484939575195 valid loss :


83.29894765218098

{"model_id":"319422ae246f4dcc9544aeb6fe57e45d","version_major":2,"vers
ion_minor":0}

{"model_id":"cb44617bf22a459d92e937aec7107963","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 22 train loss : 52.500980377197266 valid loss :
66.48759206136067

{"model_id":"3c3aaf56d6004d5b9cd28f694ff4cd72","version_major":2,"vers
ion_minor":0}

{"model_id":"9949625ef78c4ee5a48d416dedf76cb0","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 23 train loss : 46.83328800201416 valid loss :
54.53442891438802

{"model_id":"102f9ce88a554c95ba2f8dd864993bf2","version_major":2,"vers
ion_minor":0}
{"model_id":"6b55ebce554347e29f42aae61f3b0d4d","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 24 train loss : 51.07123432159424 valid loss :
51.29500071207682

{"model_id":"65ac24c9c4694596abe79bfddf82c84a","version_major":2,"vers
ion_minor":0}

{"model_id":"75a13c0529a6425f9bf20ce764416607","version_major":2,"vers
ion_minor":0}

Epoch : 25 train loss : 43.201372146606445 valid loss :


81.85339609781902

{"model_id":"37caf0a047544b27875b1d9b50099ef7","version_major":2,"vers
ion_minor":0}

{"model_id":"4bb87ebdfd264473b200e7f188514161","version_major":2,"vers
ion_minor":0}

Epoch : 26 train loss : 42.57255249023437 valid loss :


55.61776351928711

{"model_id":"2d20c03c32ba4ff18b07f897a3f2ce7b","version_major":2,"vers
ion_minor":0}

{"model_id":"894a33e0aad54a1ab9d4d24c926b1a0d","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 27 train loss : 47.65526351928711 valid loss :
43.85478973388672

{"model_id":"cee7b51e5cf245a58e4f83af53f97df9","version_major":2,"vers
ion_minor":0}

{"model_id":"5f54a767c6da404a8fd0912ade63267a","version_major":2,"vers
ion_minor":0}

Epoch : 28 train loss : 72.92973976135254 valid loss :


67.98496882120769

{"model_id":"951bf93ddb49421baa7c7ec673e2b1b1","version_major":2,"vers
ion_minor":0}

{"model_id":"c5e45a8c551c4f63a5f6ab4b913b9777","version_major":2,"vers
ion_minor":0}

Weights are saved


Epoch : 29 train loss : 51.90148239135742 valid loss :
43.722574869791664
{"model_id":"1a4ec2a724274961bee75d15a8fc1c43","version_major":2,"vers
ion_minor":0}

{"model_id":"4d11953559f7452a8782a181a1f1e430","version_major":2,"vers
ion_minor":0}

Epoch : 30 train loss : 66.98183631896973 valid loss :


78.21378962198894

{"model_id":"fe7f078c40334c7192f0c516a06f360f","version_major":2,"vers
ion_minor":0}

{"model_id":"286ed6c99a08433da0ba6b766469ede8","version_major":2,"vers
ion_minor":0}

Epoch : 31 train loss : 45.93609046936035 valid loss :


60.52271016438802

{"model_id":"30a8ebc04962498a91ecca99560d3202","version_major":2,"vers
ion_minor":0}

{"model_id":"0cd8a0ce1455449aae9b15ab67b11a48","version_major":2,"vers
ion_minor":0}

Epoch : 32 train loss : 46.778146743774414 valid loss :


83.5928560892741

{"model_id":"8d9a107d164743ae9c8694d72b3e50cb","version_major":2,"vers
ion_minor":0}

{"model_id":"4f0fae6c18fc445e9504ba5d3ce987d4","version_major":2,"vers
ion_minor":0}

Epoch : 33 train loss : 56.635987281799316 valid loss :


85.60576502482097

{"model_id":"f8dfe8c6847b49829509c574678c0021","version_major":2,"vers
ion_minor":0}

{"model_id":"2b733eb023174ff4b5d4d87e6a665a7a","version_major":2,"vers
ion_minor":0}

Epoch : 34 train loss : 46.19539108276367 valid loss :


75.08321634928386

{"model_id":"1f3c2ceee9a744a4b8cd76a726217c97","version_major":2,"vers
ion_minor":0}

{"model_id":"b0c3cb8fd7b840bbb763a984e54fa835","version_major":2,"vers
ion_minor":0}

Epoch : 35 train loss : 41.255849838256836 valid loss :


67.58123588562012
{"model_id":"ee52f41a62d549b1a856528b25d86808","version_major":2,"vers
ion_minor":0}

{"model_id":"9be311c1ff1c44cab6e4a0629c38a6b0","version_major":2,"vers
ion_minor":0}

Epoch : 36 train loss : 40.00660438537598 valid loss :


60.169359842936196

{"model_id":"b2e158d3d65e4b32907d1954d197c68e","version_major":2,"vers
ion_minor":0}

{"model_id":"12b580fc3dcd442a97fb6a97af53a4f5","version_major":2,"vers
ion_minor":0}

Epoch : 37 train loss : 34.86181306838989 valid loss :


63.82146962483724

{"model_id":"12364ad74b574a69933f78d2a061deb3","version_major":2,"vers
ion_minor":0}

{"model_id":"278ea876608345f9bffc6752edd616a2","version_major":2,"vers
ion_minor":0}

Epoch : 38 train loss : 45.070559692382815 valid loss :


79.29641215006511

{"model_id":"be961bb77f144408bf8e661ff7dc491e","version_major":2,"vers
ion_minor":0}

{"model_id":"b6ccd588718c4d42810ba9edb67c59db","version_major":2,"vers
ion_minor":0}

Epoch : 39 train loss : 36.28313503265381 valid loss :


48.7588259379069

{"model_id":"f468b49a8252418e8ab8c0d285cb7e1d","version_major":2,"vers
ion_minor":0}

{"model_id":"9de0e44ee07d4f26b5d8eff6832f0cbc","version_major":2,"vers
ion_minor":0}

Epoch : 40 train loss : 44.24296646118164 valid loss :


55.153724670410156

# Inference

import utils

model.load_state_dict(torch.load('best_model.pt'))
model.eval()
with torch.no_grad():
    image, gt_bbox = validset[18]
    image = image.unsqueeze(0).to(DEVICE)
    out_bbox = model(image)

    utils.compare_plots(image, gt_bbox, out_bbox)

WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
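
utils.compare_plots comes from the cloned object-localization-dataset repo, so its source is not shown above. A hypothetical sketch of what such a helper might do, assuming it simply draws the ground-truth and predicted boxes on the image (this is not the repo's actual implementation):

def compare_plots_sketch(image, gt_bbox, out_bbox):
    # image: (1, 3, H, W) tensor on DEVICE; boxes: [xmin, ymin, xmax, ymax]
    img = image.squeeze(0).permute(1, 2, 0).cpu().numpy().copy()
    gt = [int(v) for v in gt_bbox]
    pred = [int(v) for v in out_bbox.squeeze(0).cpu()]
    cv2.rectangle(img, (gt[0], gt[1]), (gt[2], gt[3]), (0, 255, 0), 2)          # ground truth
    cv2.rectangle(img, (pred[0], pred[1]), (pred[2], pred[3]), (255, 0, 0), 2)  # prediction
    plt.imshow(img)
    plt.show()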

Linkedin : www.linkedin.com/in/KaziTanvir Github : https://github.com/KaziTanvir


Tweet Emotion Recognition: Natural Language Processing with TensorFlow

Dataset: Tweet Emotion Dataset

This is a starter notebook for the guided project Tweet Emotion Recognition with TensorFlow.

A complete version of this notebook is available in the course resources.

Task 1: Introduction

Task 2: Setup and Imports

1. Installing Hugging Face's nlp package (the data itself is later loaded with the newer datasets package in Task 3)
2. Importing libraries

!pip install nlp

Requirement already satisfied: nlp in /usr/local/lib/python3.11/dist-packages (0.4.0)
Requirement already satisfied: numpy in /usr/local/lib/python3.11/dist-packages (from nlp) (1.26.4)
Requirement already satisfied: pyarrow>=0.16.0 in /usr/local/lib/python3.11/dist-packages (from nlp) (17.0.0)
Requirement already satisfied: dill in /usr/local/lib/python3.11/dist-packages (from nlp) (0.3.8)
Requirement already satisfied: pandas in /usr/local/lib/python3.11/dist-packages (from nlp) (2.2.2)
Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.11/dist-packages (from nlp) (2.32.3)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.11/dist-packages (from nlp) (4.67.1)
Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from nlp) (3.16.1)
Requirement already satisfied: xxhash in /usr/local/lib/python3.11/dist-packages (from nlp) (3.5.0)
...

%matplotlib inline

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import nlp
import random

def show_history(h):
    epochs_trained = len(h.history['loss'])
    plt.figure(figsize=(16, 6))

    plt.subplot(1, 2, 1)
    plt.plot(range(0, epochs_trained), h.history.get('accuracy'), label='Training')
    plt.plot(range(0, epochs_trained), h.history.get('val_accuracy'), label='Validation')
    plt.ylim([0., 1.])
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()

    plt.subplot(1, 2, 2)
    plt.plot(range(0, epochs_trained), h.history.get('loss'), label='Training')
    plt.plot(range(0, epochs_trained), h.history.get('val_loss'), label='Validation')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()

def show_confusion_matrix(y_true, y_pred, classes):
    from sklearn.metrics import confusion_matrix
    cm = confusion_matrix(y_true, y_pred, normalize='true')

    plt.figure(figsize=(8, 8))
    sp = plt.subplot(1, 1, 1)
    ctx = sp.matshow(cm)
    plt.xticks(list(range(0, 6)), labels=classes)
    plt.yticks(list(range(0, 6)), labels=classes)
    plt.colorbar(ctx)
    plt.show()

print('Using TensorFlow version', tf.__version__)

Using TensorFlow version 2.17.1

Using TensorFlow version 2.17.1

Task 3: Importing Data

1. Importing the Tweet Emotion dataset
2. Creating train, validation and test sets
3. Extracting tweets and labels from the examples

!pip install datasets

Requirement already satisfied: datasets in /usr/local/lib/python3.11/dist-packages (3.2.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from datasets) (3.16.1)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.11/dist-packages (from datasets) (1.26.4)
Requirement already satisfied: pyarrow>=15.0.0 in /usr/local/lib/python3.11/dist-packages (from datasets) (17.0.0)
Requirement already satisfied: pandas in /usr/local/lib/python3.11/dist-packages (from datasets) (2.2.2)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.11/dist-packages (from datasets) (3.11.11)
Requirement already satisfied: huggingface-hub>=0.23.0 in /usr/local/lib/python3.11/dist-packages (from datasets) (0.27.1)
...

from datasets import load_dataset

ds = load_dataset("dair-ai/emotion", "split")

/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(

{"model_id":"48e6c423f89b49e8924d36611a5aeb13","version_major":2,"vers
ion_minor":0}

{"model_id":"685cae32b9df4be2a5bb8534b83da466","version_major":2,"vers
ion_minor":0}

{"model_id":"d3a263d898eb4d818da3982702ed86b4","version_major":2,"vers
ion_minor":0}

{"model_id":"5ccd40ac413a44628562db72c199eed9","version_major":2,"vers
ion_minor":0}

{"model_id":"49dd9fc602fa42c59ca1a9caaff88d7f","version_major":2,"vers
ion_minor":0}

{"model_id":"bf52282fb99f4c55a1fd15874062e68f","version_major":2,"vers
ion_minor":0}

{"model_id":"9153e850a4f44ff595b4f44a6a222307","version_major":2,"vers
ion_minor":0}

ds

DatasetDict({
train: Dataset({
features: ['text', 'label'],
num_rows: 16000
})
validation: Dataset({
features: ['text', 'label'],
num_rows: 2000
})
test: Dataset({
features: ['text', 'label'],
num_rows: 2000
})
})

train = ds['train']
val = ds['validation']
test = ds['test']

train

Dataset({
features: ['text', 'label'],
num_rows: 16000
})

def get_tweets(data):
    tweets = [example['text'] for example in data]
    labels = [example['label'] for example in data]
    return tweets, labels

tweets, labels = get_tweets(train)

tweets[0]

{"type":"string"}

labels[0]

tweets[1], labels[1]

('i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake',
 0)

import numpy as np
np.unique(labels)

array([0, 1, 2, 3, 4, 5])

labels -> 0: sadness, 1: joy, 2: love, 3: anger, 4: fear, 5: surprise
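
These names can be kept in a small lookup table for later use, e.g. as axis labels on a confusion matrix. A minimal sketch (the dictionaries below are a convenience added here, not a cell from the original notebook):

index_to_class = {0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}
class_to_index = {name: idx for idx, name in index_to_class.items()}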

Task 4: Tokenizer

1. Tokenizing the tweets

from tensorflow.keras.preprocessing.text import Tokenizer

tokenised = Tokenizer(num_words=10000, oov_token='<UNK>')
# num_words -> total number of words to be kept in the vocabulary
# oov_token -> out-of-vocabulary token, i.e. any word not in the vocabulary is replaced with <UNK> (unknown)

tokenised.fit_on_texts(tweets)

To check what the tokenisation actually did:

tweets[0]

{"type":"string"}

tokenised.texts_to_sequences([tweets[0]])

[[2, 139, 3, 679]]
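
The id-to-word mapping can be inverted through the tokenizer's index_word dictionary to confirm what those ids stand for (a quick sanity check, not a cell from the original notebook):

print([tokenised.index_word[i] for i in tokenised.texts_to_sequences([tweets[0]])[0]])
# expected to print the original words, e.g. ['i', 'didnt', 'feel', 'humiliated']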

Task 5: Padding and Truncating Sequences

1. Checking length of the tweets
2. Creating padded sequences

lengths = [len(t.split(' ')) for t in tweets]
plt.hist(lengths, bins=len(set(lengths)))
plt.show()

maxlen = 50

from tensorflow.keras.preprocessing.sequence import pad_sequences

padded = pad_sequences(tokenised.texts_to_sequences(tweets), maxlen=maxlen, padding='post', truncating='post')

padded

array([[   2,  139,    3, ...,    0,    0,    0],
       [   2,   40,  101, ...,    0,    0,    0],
       [  17, 3060,    7, ...,    0,    0,    0],
       ...,
       [   2,    3,  327, ...,    0,    0,    0],
       [   2,    3,   14, ...,    0,    0,    0],
       [   2,   47,    7, ...,    0,    0,    0]], dtype=int32)

len(padded[0])

50

Task 6: Preparing the Labels

1. Creating class-to-index and index-to-class dictionaries (the index_to_class / class_to_index mapping above covers this)
2. Converting text labels to numeric labels (already done: this dataset ships with numeric labels)

classes = set(labels)
print(classes)

{0, 1, 2, 3, 4, 5}

plt.hist(labels)
plt.show()
print(labels[0])

0 -> sadness (the dataset already stores labels in numerical format)

Task 7: Creating the Model

1. Creating the model
2. Compiling the model

model = tf.keras.models.Sequential([
    tf.keras.layers.Embedding(10000, 16, input_length=50),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(20, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(20)),
    tf.keras.layers.Dense(6, activation='softmax')
])

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

/usr/local/lib/python3.11/dist-packages/keras/src/layers/core/embedding.py:90: UserWarning: Argument `input_length` is deprecated. Just remove it.
  warnings.warn(
model.summary()

Model: "sequential"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape       ┃     Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ embedding (Embedding)           │ ?                  │ 0 (unbuilt) │
├─────────────────────────────────┼────────────────────┼─────────────┤
│ bidirectional (Bidirectional)   │ ?                  │ 0 (unbuilt) │
├─────────────────────────────────┼────────────────────┼─────────────┤
│ bidirectional_1 (Bidirectional) │ ?                  │ 0 (unbuilt) │
├─────────────────────────────────┼────────────────────┼─────────────┤
│ dense (Dense)                   │ ?                  │ 0 (unbuilt) │
└─────────────────────────────────┴────────────────────┴─────────────┘

Total params: 0 (0.00 B)

Trainable params: 0 (0.00 B)

Non-trainable params: 0 (0.00 B)

model.build(input_shape=(None, maxlen))

model.summary()

Model: "sequential"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape       ┃     Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ embedding (Embedding)           │ (None, 50, 16)     │     160,000 │
├─────────────────────────────────┼────────────────────┼─────────────┤
│ bidirectional (Bidirectional)   │ (None, 50, 40)     │       5,920 │
├─────────────────────────────────┼────────────────────┼─────────────┤
│ bidirectional_1 (Bidirectional) │ (None, 40)         │       9,760 │
├─────────────────────────────────┼────────────────────┼─────────────┤
│ dense (Dense)                   │ (None, 6)          │         246 │
└─────────────────────────────────┴────────────────────┴─────────────┘

Total params: 175,926 (687.21 KB)

Trainable params: 175,926 (687.21 KB)

Non-trainable params: 0 (0.00 B)
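
These counts can be verified by hand: the embedding stores 10000 × 16 = 160,000 weights; an LSTM layer has 4 × ((input_dim + units) × units + units) parameters, so the first bidirectional layer contributes 2 × 4 × ((16 + 20) × 20 + 20) = 5,920 and the second 2 × 4 × ((40 + 20) × 20 + 20) = 9,760; the dense head adds 40 × 6 + 6 = 246, giving 175,926 in total.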

Task 8: Training the Model

1. Preparing a validation set
2. Training the model

val_tweets, val_labels = get_tweets(val)

val_sequences = tokenised.texts_to_sequences(val_tweets)
val_padded = pad_sequences(val_sequences, maxlen=maxlen, padding='post', truncating='post')

val_tweets[0], val_labels[0]

('im feeling quite sad and sorry for myself but ill snap out of it soon', 0)

h = model.fit(
    padded, np.array(labels),
    validation_data=(val_padded, np.array(val_labels)),
    epochs=20,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=2)
    ]
)

Epoch 1/20
500/500 ━━━━━━━━━━━━━━━━━━━━ 19s 12ms/step - accuracy: 0.3874 - loss: 1.5271 - val_accuracy: 0.6530 - val_loss: 0.9299
Epoch 2/20
500/500 ━━━━━━━━━━━━━━━━━━━━ 15s 11ms/step - accuracy: 0.6875 - loss: 0.8073 - val_accuracy: 0.7270 - val_loss: 0.7383
Epoch 3/20
500/500 ━━━━━━━━━━━━━━━━━━━━ 10s 11ms/step - accuracy: 0.8069 - loss: 0.4949 - val_accuracy: 0.8410 - val_loss: 0.4937
Epoch 4/20
500/500 ━━━━━━━━━━━━━━━━━━━━ 6s 12ms/step - accuracy: 0.9061 - loss: 0.2860 - val_accuracy: 0.8650 - val_loss: 0.4383
Epoch 5/20
500/500 ━━━━━━━━━━━━━━━━━━━━ 5s 11ms/step - accuracy: 0.9375 - loss: 0.2080 - val_accuracy: 0.8765 - val_loss: 0.4161
Epoch 6/20
500/500 ━━━━━━━━━━━━━━━━━━━━ 10s 11ms/step - accuracy: 0.9599 - loss: 0.1389 - val_accuracy: 0.8845 - val_loss: 0.3720
Epoch 7/20
500/500 ━━━━━━━━━━━━━━━━━━━━ 6s 12ms/step - accuracy: 0.9689 - loss: 0.1004 - val_accuracy: 0.8975 - val_loss: 0.3425
Epoch 8/20
500/500 ━━━━━━━━━━━━━━━━━━━━ 10s 11ms/step - accuracy: 0.9708 - loss: 0.0948 - val_accuracy: 0.8975 - val_loss: 0.3518
Epoch 9/20
500/500 ━━━━━━━━━━━━━━━━━━━━ 6s 13ms/step - accuracy: 0.9775 - loss: 0.0735 - val_accuracy: 0.8880 - val_loss: 0.3817

Task 9: Evaluating the Model

1. Visualizing training history
2. Preparing a test set
3. A look at individual predictions on the test set
4. A look at all predictions on the test set (see the confusion-matrix sketch after the individual prediction below)

show_history(h)

test_tweets, test_labels = get_tweets(test)
test_sequences = tokenised.texts_to_sequences(test_tweets)
test_padded = pad_sequences(test_sequences, maxlen=maxlen, padding='post', truncating='post')

_ = model.evaluate(test_padded, np.array(test_labels))

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9133 - loss: 0.2773

i = random.randint(0, len(test_tweets) - 1)

print('Sentence: ', test_tweets[i])
print('emotion linked with sentence: ', test_labels[i])

p = np.expand_dims(test_sequences[i], axis=0)
predictions = model.predict(p)
pred = np.argmax(predictions, axis=1)[0]
print('Predicted emotion: ', pred)

Sentence:  i was feeling brave i would try to pick up running again
emotion linked with sentence:  1
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 371ms/step
Predicted emotion:  1
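
For item 4 of Task 9, predictions over the whole test set can be collected and passed to the show_confusion_matrix helper defined in Task 2. A short sketch (the explicit class-name list is assumed from the dataset's label mapping; this cell is not part of the original notebook):

all_preds = np.argmax(model.predict(test_padded), axis=1)
show_confusion_matrix(np.array(test_labels), all_preds,
                      ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise'])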
