Faster R-CNN
Backbone Network:
A convolutional network extracts feature maps from the input image (e.g., ResNet or VGG16).
Anchors:
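Classification of Anchors:
The Region Proposal Network (RPN) labels each anchor as foreground (object) or background (no object).
● Positive (foreground) anchors: Boxes with high IoU overlap with a ground-truth box.
For positive anchors, the RPN also predicts bounding-box regression offsets (adjustments).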
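The anchor labelling rule above can be sketched in plain Python. This is a minimal illustration, not the exact procedure of any library: the function names are hypothetical, and the 0.7 / 0.3 IoU thresholds are an assumption taken from the original Faster R-CNN paper rather than from this text. The paper's extra rule (the anchor with the highest IoU for each ground-truth box is also positive) is left out for brevity.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (foreground), 0 (background), or -1 (ignored during training)."""
    # Assumed thresholds: 0.7 and 0.3, as in the original paper
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best >= pos_thresh:
        return 1
    if best < neg_thresh:
        return 0
    return -1
```

Anchors whose best IoU falls between the two thresholds are neither positive nor negative, so they do not contribute to the RPN loss in this sketch.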
Loss functions:
● Classification loss: scores each anchor as background or foreground.
Region Proposals:
ROI Pooling:
Object Classification:
Each pooled region is passed to the detection head, which identifies the object inside it.
● Cross-entropy loss is used to classify the objects into categories.
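As a rough illustration of the cross-entropy loss mentioned above, the sketch below computes it for a single region from raw class scores (logits). The function names are hypothetical and this is plain Python for clarity, not the implementation used inside any detection library.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    probs = softmax(logits)
    return -math.log(probs[target_index])
```

The loss is small when the correct class receives most of the probability mass and grows as the prediction drifts toward a wrong class.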
Multi-task Learning:
ones.
Training:
Two ways to train:
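import torch
import torchvision
import torchvision.transforms as T
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a Faster R-CNN model pre-trained on COCO. You can modify this for your own dataset.
# Note: newer torchvision releases prefer weights="DEFAULT" over pretrained=True.
model = fasterrcnn_resnet50_fpn(pretrained=True)
# Assumption: one object class plus background; set this for your dataset
num_classes = 2
in_features = model.roi_heads.box_predictor.cls_score.in_features
# Replace the head of the model with a new one (for the number of classes in your dataset)
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)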
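Each dataset item should return an image and a target dictionary containing bounding boxes (boxes) and class labels (labels).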
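from PIL import Image

transform = T.Compose([
    T.ToTensor(),
])

class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, transforms=None):
        self.transforms = transforms
        self.data = []  # Assumption: (image_path, boxes, labels) records; fill in for your dataset

    def __getitem__(self, idx):
        # Load image
        path, boxes, labels = self.data[idx]
        img = Image.open(path).convert("RGB")
        target = {"boxes": torch.as_tensor(boxes, dtype=torch.float32),
                  "labels": torch.as_tensor(labels, dtype=torch.int64)}
        # Apply transforms
        if self.transforms is not None:
            img = self.transforms(img)
        return img, target

    def __len__(self):
        return len(self.data)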
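# Load dataset
dataset = CustomDataset(transforms=transform)
indices = torch.randperm(len(dataset)).tolist()
# Assumption: 80/20 train/validation split and batch_size=2 are illustrative values
split = int(0.8 * len(indices))
train_dataset = torch.utils.data.Subset(dataset, indices[:split])
val_dataset = torch.utils.data.Subset(dataset, indices[split:])
# Images and targets differ in size, so each batch is regrouped into tuples instead of stacked
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=2, shuffle=True,
                                           collate_fn=lambda x: tuple(zip(*x)))
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=2, shuffle=False,
                                         collate_fn=lambda x: tuple(zip(*x)))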
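To see what the collate function tuple(zip(*x)) does, here is a small stand-alone sketch. Strings stand in for image tensors and the box values are made-up placeholders; the point is only the regrouping of (image, target) pairs.

```python
def collate_fn(batch):
    # batch is a list of (image, target) pairs; regroup it into (images, targets)
    return tuple(zip(*batch))


# Placeholder batch: two "images" with a different number of boxes each
batch = [
    ("img0", {"boxes": [[0, 0, 5, 5]]}),
    ("img1", {"boxes": [[1, 1, 4, 4], [2, 2, 9, 9]]}),
]
images, targets = collate_fn(batch)
```

Because nothing is stacked, images of different sizes and targets with different numbers of boxes can share a batch, which the default DataLoader collation would not allow.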
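Now set up the optimizer and training loop. For Faster R-CNN, SGD with momentum is a common choice.
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
params = [p for p in model.parameters() if p.requires_grad]
# Assumption: lr=0.005 and momentum=0.9 are typical starting values; tune them for your data
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9,
                            weight_decay=0.0005)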
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)
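The effect of StepLR with step_size=3 and gamma=0.1 can be worked out by hand: the learning rate is multiplied by gamma once every step_size epochs. The sketch below reproduces that arithmetic in plain Python; the helper name step_lr is hypothetical and the base learning rate of 0.005 is an assumed value.

```python
def step_lr(base_lr, epoch, step_size=3, gamma=0.1):
    """Learning rate in effect during a given epoch (0-indexed) under a step schedule."""
    return base_lr * gamma ** (epoch // step_size)


# Assumed base learning rate of 0.005 over 10 epochs
schedule = [step_lr(0.005, epoch) for epoch in range(10)]
```

With these settings the rate stays at the base value for epochs 0 to 2, drops tenfold for epochs 3 to 5, and drops tenfold again at epochs 6 and 9.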
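num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0
    # Training loop
    for images, targets in train_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        optimizer.zero_grad()
        # Forward pass: in training mode the model returns a dictionary of losses
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        # Backward pass
        losses.backward()
        optimizer.step()
        train_loss += losses.item()
    lr_scheduler.step()
    print(f'Epoch: {epoch + 1}, Loss: {train_loss / len(train_loader)}')
print("Training complete!")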
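In training mode, torchvision's Faster R-CNN returns a dictionary with one loss per task (two for the RPN, two for the detection head), and the value that is backpropagated is their sum. The sketch below shows that summation with plain floats; the numbers are made-up placeholders, not results from any run.

```python
# Placeholder values for illustration only; real values are tensors produced by the model
loss_dict = {
    "loss_classifier": 0.42,    # detection head: object category
    "loss_box_reg": 0.31,       # detection head: box refinement
    "loss_objectness": 0.08,    # RPN: foreground vs. background
    "loss_rpn_box_reg": 0.05,   # RPN: anchor adjustments
}
total_loss = sum(loss_dict.values())
```

Summing the terms with equal weight is the simplest form of multi-task learning; weighting individual terms differently is possible but is not shown here.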
After training, you can evaluate the model on the validation set:
model.eval()
with torch.no_grad():
    for images, targets in val_loader:
        images = [img.to(device) for img in images]
        predictions = model(images)
        # Example: print the bounding boxes and labels for the first image
        print(predictions[0]['boxes'])
        print(predictions[0]['labels'])
Step 8: Inference
To run inference on a new image:
from PIL import Image
# Load image
img = Image.open("path/to/your/image.jpg").convert("RGB")
img = transform(img)
img = img.to(device)  # the model takes a list of 3D (C, H, W) tensors, so no batch dimension is added
# Model prediction
model.eval()
with torch.no_grad():
    prediction = model([img])
print(prediction[0]['boxes'])
print(prediction[0]['labels'])
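Each prediction from torchvision's detection models also carries a 'scores' entry, and low-confidence boxes are usually discarded before use. The sketch below shows one way to do that on plain Python lists; the helper name filter_predictions and the 0.5 threshold are assumptions, and the sample values are placeholders.

```python
def filter_predictions(prediction, score_thresh=0.5):
    """Keep only the detections whose score is at least score_thresh."""
    keep = [i for i, s in enumerate(prediction["scores"]) if s >= score_thresh]
    return {key: [values[i] for i in keep] for key, values in prediction.items()}


# Placeholder prediction with plain lists standing in for tensors
sample = {
    "boxes": [[10, 10, 50, 50], [20, 30, 60, 90], [0, 0, 5, 5]],
    "labels": [1, 1, 1],
    "scores": [0.92, 0.40, 0.75],
}
confident = filter_predictions(sample)
```

With real model output the same idea is typically written as a boolean mask, for example keep = prediction[0]['scores'] >= 0.5 followed by indexing 'boxes' and 'labels' with keep.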