Visual Processing

Uploaded by

inci.ahmet3814
Visual Processing

December 2024
• Section 1: Visual Processing: Biology and Technology

• Section 2: EfficientNet

• Section 3: Fine-Tuning EfficientNet-B0 in PyTorch

Section 1

Visual Processing: Biology and Technology


Biology and Technology


Human Visual System

• Resolution (Retina): High spatial resolution in the fovea; low resolution in the periphery for context and movement.
• Width (Feature Diversity): Parallel neurons in V1 respond to various features, such as edges, orientation, motion, and color.
• Depth (Hierarchy): Hierarchical layers process increasingly complex information. V1 (basic patterns like lines/edges) → V2/V4 (textures and shapes) → IT (complex objects like faces).
• Combining Features: Later areas (e.g., IT) combine patterns like "red" + "round" = "red ball."
• Receptive Fields: Small receptive fields in early layers (e.g., V1 neurons detect small edges). Later layers have larger receptive fields to understand global context (e.g., object recognition).
• Convolution: The biological visual system does not use "filters" explicitly but responds to specific patterns in the visual field. Neurons in the retina and visual cortex (e.g., V1) act like localized processors, responding to specific spatial patterns, such as edges or orientations, within their receptive fields. These neurons act as pattern detectors, similar to how convolution processes small areas of an image.
• Non-Linearity: Visual neurons have thresholds; firing happens only after sufficient stimulus activation.

Convolutional Neural Network (CNN)

• Input Resolution: Size of the input image (e.g., 224×224 pixels), determining the granularity of details processed.
• Width (Filters): Filters in convolutional layers capture diverse spatial patterns, e.g., edges, textures, and shapes. Wider networks detect more diverse features.
• Depth (Layer Stacking): Layers process increasingly abstract features, e.g., detecting edges → combining edges into textures → forming shapes and objects.
• 1x1 Convolution: Combines features across channels, e.g., combining "red," "round," and "smooth" features into "red ball."
• Receptive Fields: Early layers focus on local patterns (small receptive fields). Later layers expand receptive fields for global understanding (e.g., object detection).
• Convolution: Filters (small weight matrices) are applied to input data to detect patterns through:
  - Mathematics: Each filter slides over the input, performing element-wise multiplications and summing them up (convolution operation).
  - Feature Maps: The output is a feature map that highlights locations where the filter detects patterns (e.g., edges, curves).
  Just like biological vision, early layers detect basic patterns, while deeper layers combine them to identify complex shapes and objects.
• Non-Linearity: Activation functions (e.g., ReLU) introduce thresholds in CNNs to capture complex patterns.
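
The sliding-filter and thresholding steps described above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the 4×4 image, the 2×2 edge kernel, and the function names `conv2d` and `relu` are assumptions chosen for the example. As in most CNN libraries, the filter is applied without flipping (strictly speaking, cross-correlation).

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the window with the kernel, then sum.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Threshold: keep positive responses, set negative ones to zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical 4x4 image: dark left half (0), bright right half (1).
image = [[0, 0, 1, 1] for _ in range(4)]
# Hypothetical kernel that responds to a dark-to-bright step from left to right.
kernel = [[-1, 1],
          [-1, 1]]

edges = relu(conv2d(image, kernel))

# The mirrored image has a bright-to-dark step, so the raw response is negative
# and ReLU suppresses it.
mirrored = [[1, 1, 0, 0] for _ in range(4)]
suppressed = relu(conv2d(mirrored, kernel))
```

In this example the feature map is non-zero only in the column where the dark-to-bright edge sits, which is the "highlights locations where the filter detects patterns" behaviour from the list above.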

Resolution: Detail vs. Efficiency

Human Visual System

• Fovea vs. Periphery: High resolution in the fovea for detailed central vision (e.g., reading or recognizing a face). Low resolution in peripheral vision, optimized for detecting movement and broader spatial awareness.
• Dynamic Focus: Eyes focus dynamically, adjusting resolution for context (e.g., tracking a fast-moving object vs. identifying fine details).
• Neural Efficiency: Neural circuits optimize processing by discarding unnecessary details and focusing on relevant patterns.

Convolutional Neural Networks

• Input Resolution: The size of input images (e.g., 224×224 pixels) determines the network's capacity to capture fine details. Higher resolution inputs retain more information but demand higher computational power.
• Resolution Trade-offs: Lower resolutions reduce computational load but may sacrifice finer details. Efficient architectures balance resolution with feature extraction (e.g., EfficientNet scales input resolution proportionally to width and depth).

Biology-Inspired Design:
• CNN architectures mimic the human strategy: focus
computational resources where detail matters most.
• Use layered hierarchies to process critical information
progressively.
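
To make the resolution trade-off concrete, the sketch below counts the multiply-accumulate operations of a single convolutional layer at two input sizes. The layer shape (3 input channels, 32 output channels, 3×3 kernel, padding 1) and the helper names are assumptions chosen for illustration, not values taken from a specific network.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size of a convolution's output along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_macs(in_size, in_channels, out_channels, kernel=3, stride=1, padding=1):
    """Multiply-accumulate count for one conv layer on a square input."""
    out_size = conv_output_size(in_size, kernel, stride, padding)
    # One kernel application per output position, per input/output channel pair.
    return out_size * out_size * in_channels * out_channels * kernel * kernel


# Hypothetical first layer: RGB input, 32 filters of size 3x3.
macs_224 = conv_macs(224, in_channels=3, out_channels=32)
macs_448 = conv_macs(448, in_channels=3, out_channels=32)

ratio = macs_448 / macs_224  # doubling the side length quadruples the cost
```

Under these assumptions, doubling the input side length from 224 to 448 multiplies the layer's cost by four, which is why resolution cannot be increased freely.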

Width: Detecting Diverse Features

Depth: Building Complexity


Convolution and subsampling work in tandem: convolution extracts patterns,
and subsampling reduces the resolution of the resulting feature maps, which
keeps computation efficient and makes feature recognition more robust. This
hierarchical process is similar to the human visual pathway, where early
stages detect local patterns (like edges) and later stages combine these
patterns to form a global understanding.
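
As a minimal sketch of the subsampling step, the example below uses 2×2 max pooling, which is one common form of subsampling in CNNs (the choice of max pooling, the function name `max_pool2d`, and the sample feature map are assumptions for illustration).

```python
def max_pool2d(feature_map, size=2):
    """Keep the strongest response in each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size)
                      for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map produced by an earlier convolution.
feature_map = [[1, 3, 0, 0],
               [2, 4, 0, 1],
               [0, 0, 5, 6],
               [1, 0, 7, 8]]

pooled = max_pool2d(feature_map)  # 2x2 result: [[4, 1], [1, 8]]
```

The output has a quarter of the values of the input, and a strong response still survives if it shifts by one pixel inside its window, which is the robustness the paragraph above refers to.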

Scaling in Biological and Computational Vision Systems

• Fovea Centralis: Located at the center of the retina, the fovea contains a high density of cone photoreceptors, enabling sharp central vision and fine detail discrimination.
• Retina: As one moves away from the fovea, the density of cone cells decreases, and rod photoreceptors become more prevalent. This arrangement supports peripheral vision, which is more sensitive to motion and functions better in low-light conditions but offers lower resolution.

• Baseline Network: A simple model with initial depth, width, and input resolution.
• Width Scaling: Increases the number of channels in each layer to capture more features.
• Depth Scaling: Adds more layers to capture more complex patterns.
• Resolution Scaling: Uses higher resolution images to retain more fine-grained information.
• Compound Scaling: Combines all three scaling methods in a balanced way to optimize performance and efficiency.
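
The balance in compound scaling can be sketched numerically. In the EfficientNet paper, a single coefficient φ scales depth by α^φ, width by β^φ, and resolution by γ^φ, with α = 1.2, β = 1.1, γ = 1.15 chosen so that α·β²·γ² ≈ 2. The constants come from the paper rather than from these slides, and the function names below are assumptions for illustration.

```python
# Constants reported in the EfficientNet paper for the B0 baseline (not from the slides).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Multipliers for depth, width, and resolution at compound coefficient phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


def flops_multiplier(phi):
    """Approximate cost growth: linear in depth, quadratic in width and resolution."""
    scale = compound_scale(phi)
    return scale["depth"] * scale["width"] ** 2 * scale["resolution"] ** 2


baseline = compound_scale(0)    # all multipliers are 1.0
one_step = flops_multiplier(1)  # close to 2, so each step of phi roughly doubles the cost
```

The point of the constraint is that each unit increase in φ roughly doubles the compute budget while keeping the three dimensions growing together.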
Bottlenecks and Efficiency

Section 2

EfficientNet

Introduction to EfficientNet

The Challenge of Scaling in CNNs

Compound Scaling

EfficientNet-B0 Architecture

Design Assessment

Section 3

Fine-Tuning EfficientNet-B0 in PyTorch


What is Fine-Tuning?

Installing and Loading EfficientNet-B0

# Install the necessary libraries (the leading "!" is Jupyter/Colab notebook syntax)
!pip install torch torchvision efficientnet_pytorch

# Import EfficientNet from the efficientnet_pytorch package
from efficientnet_pytorch import EfficientNet

# Load EfficientNet-B0 with pre-trained ImageNet weights
model = EfficientNet.from_pretrained('efficientnet-b0')

Preparing the Dataset

import torch
from torchvision import datasets, transforms

# Define transformations: resize to the 224x224 input size, convert to a tensor,
# and normalize with the ImageNet channel means and standard deviations
data_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Load dataset (ImageFolder expects one subdirectory per class)
train_dataset = datasets.ImageFolder('path/to/train', transform=data_transforms)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)

Modifying the Classifier

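import torch.nn as nn

# Replace the final fully connected layer (exposed as `_fc` in efficientnet_pytorch)
# with a new layer sized for our classes
num_classes = 2  # For binary classification
model._fc = nn.Linear(model._fc.in_features, num_classes)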

Setting Up Fine-Tuning
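
One common way to set up fine-tuning is to freeze the pre-trained layers and train only the new classifier head. The sketch below assumes that approach; the helper name `prepare_for_fine_tuning`, the Adam optimizer, the learning rate, and the `TinyNet` stand-in model are all illustrative assumptions. The stand-in only mimics the `_fc` attribute used above so the example runs without downloading weights.

```python
import torch
import torch.nn as nn


def prepare_for_fine_tuning(model, head_name="_fc", lr=1e-3):
    """Freeze all parameters except the head and build a loss and optimizer."""
    for param in model.parameters():
        param.requires_grad = False
    head = getattr(model, head_name)
    for param in head.parameters():
        param.requires_grad = True
    criterion = nn.CrossEntropyLoss()
    # Only the head's parameters are handed to the optimizer.
    optimizer = torch.optim.Adam(head.parameters(), lr=lr)
    return criterion, optimizer


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pre-trained network with an `_fc` head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 4)
        self._fc = nn.Linear(4, 2)

    def forward(self, x):
        return self._fc(torch.relu(self.backbone(x)))


demo_model = TinyNet()
criterion, optimizer = prepare_for_fine_tuning(demo_model)
```

With the EfficientNet model loaded earlier, the same call would be `prepare_for_fine_tuning(model)`. Unfreezing more layers with a smaller learning rate is another option once the head has converged.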

Training the Model
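
A training loop for this setting might look like the sketch below. It is a generic PyTorch loop under stated assumptions: the function name `train_one_epoch`, the SGD optimizer, and the random demo data are illustrative, and the small linear model stands in for the fine-tuned network so the example runs on its own.

```python
import torch
import torch.nn as nn


def train_one_epoch(model, loader, criterion, optimizer, device="cpu"):
    """Run one pass over `loader` and return the average loss per sample."""
    model.train()
    total_loss, total_samples = 0.0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(inputs)            # forward pass
        loss = criterion(outputs, labels)
        loss.backward()                    # backpropagate
        optimizer.step()                   # update trainable weights
        total_loss += loss.item() * inputs.size(0)
        total_samples += inputs.size(0)
    return total_loss / total_samples


# Hypothetical demo data: three batches of four random 8-dimensional samples.
torch.manual_seed(0)
demo_model = nn.Linear(8, 2)
demo_loader = [(torch.randn(4, 8), torch.randint(0, 2, (4,))) for _ in range(3)]
demo_criterion = nn.CrossEntropyLoss()
demo_optimizer = torch.optim.SGD(demo_model.parameters(), lr=0.1)

losses = [
    train_one_epoch(demo_model, demo_loader, demo_criterion, demo_optimizer)
    for _ in range(5)
]
```

In the fine-tuning workflow, `model` and `train_loader` from the earlier sections would take the place of the demo objects.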

Evaluating the Model
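
Evaluation typically means running the model on held-out data without gradient tracking and measuring accuracy. The sketch below assumes classification accuracy as the metric; the function name `evaluate` and the demo tensors are illustrative, and `nn.Identity` is used as a stand-in model so the predicted class is simply the index of the largest input value.

```python
import torch
import torch.nn as nn


def evaluate(model, loader, device="cpu"):
    """Return the fraction of samples in `loader` that the model classifies correctly."""
    model.eval()                           # switch off dropout, use running batch-norm stats
    correct, total = 0, 0
    with torch.no_grad():                  # no gradients needed for evaluation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            predictions = model(inputs).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Hypothetical demo: the "logits" are the inputs themselves.
demo_model = nn.Identity()
demo_inputs = torch.tensor([[2.0, 1.0], [0.0, 3.0], [5.0, 4.0], [1.0, 2.0]])
demo_labels = torch.tensor([0, 1, 1, 1])

demo_accuracy = evaluate(demo_model, [(demo_inputs, demo_labels)])  # 3 of 4 correct
```

For the fine-tuned network, a validation `DataLoader` built the same way as `train_loader` (with `shuffle=False`) would be passed in place of the demo batch.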
