Visual Processing
December 2024
• Section 1: Visual Processing: Biology and Technology
• Section 2: EfficientNet
Section 1
Biology and Technology
Resolution (Retina) / Input Resolution
• Biology: High spatial resolution in the fovea; low resolution in the periphery for context and movement.
• Technology: Size of the input image (e.g., 224×224 pixels), determining the granularity of details processed.

Width (Feature Diversity) / Width (Filters)
• Biology: Parallel neurons in V1 respond to various features, such as edges, orientation, motion, and color.
• Technology: Filters in convolutional layers capture diverse spatial patterns, e.g., edges, textures, and shapes. Wider networks detect more diverse features.

Depth (Hierarchy) / Depth (Layer Stacking)
• Biology: Hierarchical layers process increasingly complex information. V1 (basic patterns like lines/edges) → V2/V4 (textures and shapes) → IT (complex objects like faces).
• Technology: Layers process increasingly abstract features, e.g., detecting edges → combining edges into textures → forming shapes and objects.

Combining Features / 1x1 Convolution
• Biology: Later areas (e.g., IT) combine patterns like "red" + "round" = "red ball."
• Technology: Combines features across channels, e.g., combining "red," "round," and "smooth" features into "red ball."

Receptive Fields
• Biology: Small receptive fields in early layers (e.g., V1 detects small edges). Later layers have larger receptive fields to understand global context (e.g., object recognition).
• Technology: Early layers focus on local patterns (small receptive fields). Later layers expand receptive fields for global understanding (e.g., object detection).

Convolution
• Biology: The biological visual system does not use "filters" explicitly but responds to specific patterns in the visual field. Neurons in the retina and visual cortex (e.g., V1) act like localized processors, responding to specific spatial patterns, such as edges or orientations, within their receptive fields. These neurons act as pattern detectors, similar to how convolution processes small areas of an image.
• Technology: Filters (small weight matrices) are applied to input data to detect patterns through:
  - Mathematics: Each filter slides over the input, performing element-wise multiplications and summing them up (convolution operation).
  - Feature Maps: The output is a feature map that highlights locations where the filter detects patterns (e.g., edges, curves).
  Just like biological vision, early layers detect basic patterns, while deeper layers combine them to identify complex shapes and objects.

Non-Linearity
• Biology: Visual neurons have thresholds; firing happens only after sufficient stimulus activation.
• Technology: Activation functions (e.g., ReLU) introduce thresholds in CNNs to capture complex patterns.
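The sliding-filter arithmetic and the ReLU threshold described above can be sketched in plain Python. This is a minimal illustration, not how a framework implements it: the function names, the toy image, and the vertical-edge filter are all made up for the example, and the filter is applied without flipping (cross-correlation), which is what deep learning libraries conventionally call convolution.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the filter with the patch under it, then sum
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Threshold at zero: negative responses are suppressed."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy 4x6 image: a bright vertical stripe on a dark background
image = [[0, 0, 1, 1, 0, 0] for _ in range(4)]

# Illustrative filter that responds to dark-to-bright transitions (left to right)
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, vertical_edge)  # rows of [3.0, 3.0, -3.0, -3.0]
activated = relu(feature_map)                     # rows of [3.0, 3.0, 0.0, 0.0]
print(feature_map)
print(activated)
```

The positive responses mark the stripe's left edge; the opposite-polarity right edge produces negative values that ReLU zeroes out, so a second filter would be needed to detect it.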
Resolution: Detail vs. Efficiency
Biology-Inspired Design:
• CNN architectures mimic the human strategy: focus
computational resources where detail matters most.
• Use layered hierarchies to process critical information
progressively.
Width: Detecting Diverse Features
Depth: Building Complexity
Scaling in Biological and Computational Vision Systems
Section 2
EfficientNet
Introduction to EfficientNet
The Challenge of Scaling in CNNs
Compound Scaling
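As a rough numerical illustration of compound scaling: a single coefficient φ scales depth, width, and input resolution together, as α^φ, β^φ, and γ^φ. The sketch below assumes the constants reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15, chosen so that α·β²·γ² ≈ 2). The function name and the 224-pixel base are illustrative, and the resolutions it prints come from the raw formula, so they need not match the exact sizes published for each model variant.

```python
# Constants assumed from the EfficientNet paper's grid search on the B0 baseline
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, base_resolution=224):
    """Return (depth multiplier, width multiplier, input resolution) for coefficient phi."""
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    return depth_mult, width_mult, resolution


# FLOPs grow roughly with depth * width^2 * resolution^2, so each unit of phi
# multiplies the cost by about this factor (close to 2 by design)
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r}px")
print(f"cost factor per unit of phi: {flops_factor:.2f}")
```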
EfficientNet-B0 Architecture
Design Assessment
Section 3
Installing and Loading EfficientNet-B0
Preparing the Dataset
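import torch
from torchvision import datasets, transforms

# Define transformations: resize to the 224×224 input size, convert to a tensor,
# and normalize with the ImageNet channel means and standard deviations
data_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Load dataset (ImageFolder expects one subdirectory per class)
train_dataset = datasets.ImageFolder('path/to/train', transform=data_transforms)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)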
Modifying the Classifier
import torch.nn as nn
Setting Up Fine-Tuning
Training the Model
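A generic PyTorch training loop might look like the sketch below. It is not taken from the slides: the function name is hypothetical, and a tiny linear model with random tensors stands in for EfficientNet-B0 and the image loader so the example runs in a fraction of a second.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer, device="cpu"):
    """Run one pass over `loader` and return the mean training loss."""
    model.train()
    running_loss, seen = 0.0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()                    # clear gradients from the last step
        loss = criterion(model(inputs), labels)  # forward pass
        loss.backward()                          # backpropagate
        optimizer.step()                         # update trainable weights
        running_loss += loss.item() * inputs.size(0)
        seen += inputs.size(0)
    return running_loss / seen


# Stand-in model and data (assumptions for illustration only)
torch.manual_seed(0)
toy_model = nn.Linear(8, 3)
toy_data = TensorDataset(torch.randn(16, 8), torch.randint(0, 3, (16,)))
toy_loader = DataLoader(toy_data, batch_size=4)
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

losses = [
    train_one_epoch(toy_model, toy_loader, nn.CrossEntropyLoss(), toy_optimizer)
    for _ in range(5)
]
print(losses)
```

With the real model, the same function would be called with the fine-tuned network, `train_loader`, and a device such as "cuda" when one is available.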
Evaluating the Model
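Evaluation typically switches the model to inference mode, disables gradient tracking, and measures accuracy on held-out data. The sketch below assumes top-1 accuracy as the metric; the function name is hypothetical, and an identity "model" fed one-hot rows stands in for the network so the result is easy to check by hand.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    """Return top-1 accuracy of `model` over `loader`."""
    model.eval()  # use inference behavior for dropout and batch norm
    correct, total = 0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        preds = model(inputs).argmax(dim=1)  # predicted class per sample
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total


# Stand-in check: the identity "model" predicts the index of the 1 in each row,
# and the last label is deliberately wrong, so 3 of 4 predictions match
inputs = torch.eye(4)
labels = torch.tensor([0, 1, 2, 0])
accuracy = evaluate(nn.Identity(), DataLoader(TensorDataset(inputs, labels), batch_size=2))
print(accuracy)  # 0.75
```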