
4. Structured outputs, Data types

The document discusses structured outputs in convolutional neural networks (CNNs), highlighting their ability to produce high-dimensional tensors for tasks like pixel-level classification and image segmentation. It explains how CNNs can handle inputs of varying spatial extent and different data types, including 1-D, 2-D, and 3-D representations. The document emphasizes the advantages of CNNs over traditional neural networks for data with complex relationships.

Uploaded by devanand272003
Copyright © All Rights Reserved

Structured outputs, Data types

Mr. Sivadasan E T
Associate Professor
Vidya Academy of Science and Technology, Thrissur
Structured outputs
• A "structured object" in the context of
convolutional neural networks (CNNs) refers to
outputs that go beyond simple classification or
regression values.

• These outputs have complex, meaningful
relationships between their components and
typically represent high-dimensional data with
intricate patterns or structures.

Convolutional networks can be used to output a
high-dimensional, structured object, rather than just
predicting a class label for a classification task or a
real value for a regression task.
High-Dimensional Tensor Output:

CNNs often emit a tensor as output.

A tensor can be seen as a multi-dimensional grid of
numbers representing probabilities, pixel intensities, or
other information.
Example - Pixel-Level Classification:
Suppose a CNN produces a tensor $S$ where
$S_{i,j,k}$ represents the probability that pixel $(j, k)$
belongs to class $i$ (like "car" or "person").

This enables pixel-wise classification rather than
predicting just a single class for the entire image.
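A minimal plain-Python sketch of how such a tensor becomes a per-pixel decision: at each pixel (j, k), pick the class i with the highest S[i][j][k]. The nested-list layout and the toy class names are illustrative assumptions; real frameworks store S as an array.

```python
from typing import List

# S[i][j][k]: probability that pixel (j, k) belongs to class i
Tensor3 = List[List[List[float]]]

def pixelwise_labels(S: Tensor3) -> List[List[int]]:
    """Return, for every pixel (j, k), the class i with the highest S[i][j][k]."""
    num_classes = len(S)
    height, width = len(S[0]), len(S[0][0])
    labels = [[0] * width for _ in range(height)]
    for j in range(height):
        for k in range(width):
            # argmax over the class axis i for this one pixel
            labels[j][k] = max(range(num_classes), key=lambda i: S[i][j][k])
    return labels

# Hypothetical 2-class, 2x3 example: class 0 = "road", class 1 = "car"
S = [
    [[0.9, 0.8, 0.2], [0.7, 0.1, 0.3]],  # i = 0
    [[0.1, 0.2, 0.8], [0.3, 0.9, 0.7]],  # i = 1
]
print(pixelwise_labels(S))  # [[0, 0, 1], [0, 1, 1]]
```

The result is a label map with one class per pixel instead of a single label for the whole image.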
Image Segmentation:

By assigning a class to each pixel, CNNs can create
precise masks that outline individual objects in an
image.

Use Case: Identifying and isolating cars, roads, and
pedestrians in autonomous driving images.
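A short sketch, assuming a per-pixel label map has already been produced, of how it splits into one binary mask per class (here hypothetical "road", "car", and "pedestrian" classes), along with the area each mask covers.

```python
from typing import Dict, List

Mask = List[List[bool]]

def masks_from_labels(labels: List[List[int]], class_names: List[str]) -> Dict[str, Mask]:
    """Split a per-pixel label map into one binary mask per class."""
    return {
        name: [[label == i for label in row] for row in labels]
        for i, name in enumerate(class_names)
    }

def mask_area(mask: Mask) -> int:
    """Number of pixels covered by a mask (True counts as 1)."""
    return sum(sum(row) for row in mask)

# Hypothetical 3x4 label map: 0 = road, 1 = car, 2 = pedestrian
labels = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [0, 2, 0, 0],
]
masks = masks_from_labels(labels, ["road", "car", "pedestrian"])
for name, mask in masks.items():
    print(name, mask_area(mask))  # road 6, car 4, pedestrian 2
```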

• Once a prediction for each pixel is made,
various methods can be used to further process
these predictions in order to obtain a
segmentation of the image into regions.
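One simple post-processing method (chosen here for illustration; the slides do not name a specific one) is connected-component labeling: pixels that touch and share a predicted class are grouped into one region. The sketch assumes 4-connectivity and uses a breadth-first flood fill.

```python
from collections import deque
from typing import List, Tuple

def label_regions(labels: List[List[int]]) -> Tuple[List[List[int]], int]:
    """Group 4-connected pixels that share a class label into regions.

    Returns (region_ids, num_regions), where region_ids[j][k] is the
    region index of pixel (j, k).
    """
    height, width = len(labels), len(labels[0])
    region = [[-1] * width for _ in range(height)]
    count = 0
    for j0 in range(height):
        for k0 in range(width):
            if region[j0][k0] != -1:
                continue  # already assigned to a region
            # Breadth-first flood fill from this unvisited pixel
            cls = labels[j0][k0]
            region[j0][k0] = count
            queue = deque([(j0, k0)])
            while queue:
                j, k = queue.popleft()
                for dj, dk in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nj, nk = j + dj, k + dk
                    if (0 <= nj < height and 0 <= nk < width
                            and region[nj][nk] == -1 and labels[nj][nk] == cls):
                        region[nj][nk] = count
                        queue.append((nj, nk))
            count += 1
    return region, count

labels = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
regions, n = label_regions(labels)
print(n)        # 3
print(regions)  # [[0, 0, 1, 1], [0, 1, 1, 2], [1, 1, 2, 2]]
```

Note that the two separate patches of class 1 become two distinct regions, which is what allows separate objects of the same class to be told apart.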

• The general idea is to assume that large groups
of contiguous pixels tend to be associated with
the same label.

• Graphical models can describe the probabilistic
relationships between neighboring pixels.
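As one concrete, assumed choice of graphical model, the sketch below uses a Potts-style Markov random field on the 4-neighbor pixel grid: each pixel pays -log S[i][j][k] for its label plus a penalty beta for every neighbor with a different label. Inference here is iterated conditional modes (ICM), a simple greedy method; the weight beta and the neighborhood are illustrative assumptions, and practical systems may use other models and inference algorithms.

```python
import math
from typing import List

def icm_smooth(S, beta: float = 1.0, max_iters: int = 10) -> List[List[int]]:
    """Smooth per-pixel predictions with a Potts-model MRF using ICM.

    S[i][j][k] is the CNN's probability that pixel (j, k) has class i.
    Per-pixel energy: -log S[i][j][k] + beta * (# 4-neighbors with another label).
    """
    num_classes, height, width = len(S), len(S[0]), len(S[0][0])
    eps = 1e-12  # avoid log(0)
    # Start from the independent per-pixel argmax
    labels = [[max(range(num_classes), key=lambda i: S[i][j][k])
               for k in range(width)] for j in range(height)]
    for _ in range(max_iters):
        changed = False
        for j in range(height):
            for k in range(width):
                neighbors = [labels[nj][nk]
                             for nj, nk in ((j - 1, k), (j + 1, k), (j, k - 1), (j, k + 1))
                             if 0 <= nj < height and 0 <= nk < width]

                def energy(i):
                    unary = -math.log(S[i][j][k] + eps)
                    pairwise = beta * sum(1 for n in neighbors if n != i)
                    return unary + pairwise

                # Greedily pick the label with the lowest local energy
                best = min(range(num_classes), key=energy)
                if best != labels[j][k]:
                    labels[j][k] = best
                    changed = True
        if not changed:
            break
    return labels

# Hypothetical 3x3 prediction: class 0 everywhere except a noisy center pixel
S = [
    [[0.9, 0.9, 0.9], [0.9, 0.4, 0.9], [0.9, 0.9, 0.9]],  # class 0
    [[0.1, 0.1, 0.1], [0.1, 0.6, 0.1], [0.1, 0.1, 0.1]],  # class 1
]
print(icm_smooth(S, beta=0.0)[1][1])  # 1: no pairwise term, isolated pixel kept
print(icm_smooth(S, beta=1.0)[1][1])  # 0: neighbors pull it to their label
```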
Data Types

The data used with a convolutional network usually
consists of several channels.

Each channel is the observation of a different
quantity at some point in space or time.
• One advantage of convolutional networks is that
they can also process inputs with varying spatial
extents.
• These kinds of input simply cannot be represented
by traditional, matrix multiplication-based neural
networks.
• This provides a compelling reason to use
convolutional networks even when computational
cost and overfitting are not significant issues.

• For example, consider a collection of images,
where each image has a different width and
height.

• It is unclear how to model such inputs with a
weight matrix of fixed size.

• Convolution is straightforward to apply; the
kernel is simply applied a different number of
times depending on the size of the input, and the
output of the convolution operation scales
accordingly.
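A minimal plain-Python sketch of this point: one fixed kernel is applied to two images of different sizes, and the output size follows the input size. It assumes "valid" convolution (stride 1, no padding), which the text does not specify; padding and stride would change the exact output shape. Like most deep learning libraries, it computes cross-correlation (the kernel is not flipped).

```python
from typing import List

Image = List[List[float]]

def conv2d_valid(image: Image, kernel: Image) -> Image:
    """'Valid' 2-D convolution (cross-correlation, as CNN libraries compute it).

    An H x W input and a kh x kw kernel give an (H-kh+1) x (W-kw+1) output:
    the kernel is applied once at every position where it fits.
    """
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(kernel[a][b] * image[r + a][c + b] for a in range(kh) for b in range(kw))
            for c in range(W - kw + 1)
        ]
        for r in range(H - kh + 1)
    ]

kernel = [[1.0, -1.0], [1.0, -1.0]]  # one fixed set of weights (toy vertical-edge detector)
small = [[float(r * 5 + c) for c in range(5)] for r in range(4)]  # 4 x 5 image
large = [[float(r * 9 + c) for c in range(9)] for r in range(7)]  # 7 x 9 image
for img in (small, large):
    out = conv2d_valid(img, kernel)
    print(len(img), "x", len(img[0]), "->", len(out), "x", len(out[0]))
# 4 x 5 -> 3 x 4
# 7 x 9 -> 6 x 8
```

No weight in the kernel depends on the image size, which is why the same parameters serve both inputs; a fixed-size weight matrix would need a specific input length.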

1-D Single Channel

• Audio waveform: The axis we convolve over
corresponds to time.

• We discretize time and measure the amplitude
of the waveform once per time step.
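A sketch of single-channel 1-D data, assuming a synthetic 440 Hz sine sampled at 8000 Hz stands in for recorded audio: time is discretized into samples, and a kernel (here a hypothetical 3-tap moving average) slides along the time axis.

```python
import math
from typing import List

def sample_waveform(freq_hz: float, sample_rate: int, num_samples: int) -> List[float]:
    """Discretize time: measure a sine wave's amplitude once per time step."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(num_samples)]

def conv1d_valid(signal: List[float], kernel: List[float]) -> List[float]:
    """Slide the kernel along the time axis (stride 1, no padding)."""
    m = len(kernel)
    return [sum(kernel[i] * signal[t + i] for i in range(m))
            for t in range(len(signal) - m + 1)]

audio = sample_waveform(freq_hz=440.0, sample_rate=8000, num_samples=80)  # 10 ms of audio
smoothed = conv1d_valid(audio, [1 / 3, 1 / 3, 1 / 3])  # moving-average kernel
print(len(audio), len(smoothed))  # 80 78
```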
1-D Multi-Channel

• Skeleton animation data: 3-D characters are
animated by changing their joint angles over time.
• Each frame records the angles of the character's
joints, which together describe its pose.
• In convolutional models, each data channel
represents the angle of one joint about one
specific axis.
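A minimal sketch of multi-channel 1-D convolution on such data, assuming a hypothetical clip with two channels (an elbow angle and a knee angle). The kernel spans all channels at once and slides only along time; the per-channel responses are summed into one output channel.

```python
from typing import List

def conv1d_multichannel(x: List[List[float]], w: List[List[float]]) -> List[float]:
    """x[c][t]: angle of joint-axis c at frame t. w[c][i]: kernel weights.

    Produces one output channel: at each time step the kernel covers every
    input channel and the per-channel responses are summed.
    """
    channels, frames = len(x), len(x[0])
    width = len(w[0])
    return [
        sum(w[c][i] * x[c][t + i] for c in range(channels) for i in range(width))
        for t in range(frames - width + 1)
    ]

# Hypothetical clip: 2 channels (joint angles in degrees) over 5 frames
angles = [
    [10.0, 20.0, 30.0, 40.0, 50.0],  # elbow: steady motion
    [90.0, 90.0, 90.0, 90.0, 90.0],  # knee: held still
]
# Kernel measuring frame-to-frame change, summed over both joints
kernel = [[-1.0, 1.0], [-1.0, 1.0]]
print(conv1d_multichannel(angles, kernel))  # [10.0, 10.0, 10.0, 10.0]
```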
2-D Single Channel:

• Audio data that has been preprocessed with a
Fourier transform:

• We can transform the audio waveform into a 2-D
tensor with different rows corresponding to different
frequencies and different columns corresponding to
different points in time.
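A sketch of this preprocessing, assuming short overlapping frames (length 64, hop 32, both illustrative choices) and a naive discrete Fourier transform in place of the FFT and window function a practical pipeline would use. Rows of the result are frequency bins and columns are time frames.

```python
import cmath
import math
from typing import List

def spectrogram(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Magnitude spectrogram via a naive DFT of each frame (no window applied).

    Returns a 2-D grid: rows = frequency bins 0..frame_len//2, columns = frames.
    """
    columns = []
    for s in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[s:s + frame_len]
        column = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * f * n / frame_len)
                          for n in range(frame_len)))
                  for f in range(frame_len // 2 + 1)]
        columns.append(column)
    # Transpose so rows index frequency and columns index time
    return [list(row) for row in zip(*columns)]

sample_rate, frame_len = 8000, 64
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]  # 1 kHz test tone
spec = spectrogram(tone, frame_len=frame_len, hop=32)
print(len(spec), len(spec[0]))  # 33 7  (frequency rows, time columns)
peak_bin = max(range(len(spec)), key=lambda f: spec[f][0])
print(peak_bin * sample_rate / frame_len)  # 1000.0
```

Once in this form, the spectrogram can be convolved like a single-channel image, with one axis for frequency and one for time.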
2-D Multi-Channel:

Color image data:


• One channel contains the red pixels, one the green
pixels, and one the blue pixels.

• The convolution kernel moves over both the
horizontal and vertical axes of the image, conferring
translation equivariance in both directions.
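A minimal sketch of both points, assuming a toy 6x6 image with a single bright spot in each of the R, G, and B channels and an illustrative 2x2 kernel: the kernel covers all three channels and slides over both spatial axes, and shifting the input shifts the output by the same amount. The check is exact here only because the spot stays away from the borders; near the edges, zero-filling breaks exact equivariance.

```python
from typing import List

Image = List[List[float]]

def conv2d_rgb(image: List[Image], kernel: List[Image]) -> Image:
    """Valid convolution of a 3-channel (R, G, B) image with a 3-channel kernel.

    The kernel slides over both spatial axes; at each position the responses
    of the red, green, and blue channels are summed into one output value.
    """
    C, H, W = len(image), len(image[0]), len(image[0][0])
    kh, kw = len(kernel[0]), len(kernel[0][0])
    return [[sum(kernel[c][a][b] * image[c][r + a][q + b]
                 for c in range(C) for a in range(kh) for b in range(kw))
             for q in range(W - kw + 1)]
            for r in range(H - kh + 1)]

def shift(channel: Image, dy: int, dx: int) -> Image:
    """Move content down by dy and right by dx, filling vacated pixels with 0."""
    H, W = len(channel), len(channel[0])
    return [[channel[r - dy][c - dx] if 0 <= r - dy < H and 0 <= c - dx < W else 0.0
             for c in range(W)] for r in range(H)]

# 6x6 image with one bright spot in all three channels
spot = [[0.0] * 6 for _ in range(6)]
spot[1][1] = 1.0
image = [spot, spot, spot]
kernel = [[[1.0, 2.0], [3.0, 4.0]]] * 3  # same 2x2 weights for R, G, B (illustrative)

out = conv2d_rgb(image, kernel)
out_shifted_input = conv2d_rgb([shift(ch, 2, 1) for ch in image], kernel)
# Translation equivariance: shifting the input shifts the output the same way
print(out_shifted_input == shift(out, 2, 1))  # True
```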

3-D Single Channel:

Volumetric data: A common source of this kind of data
is medical imaging technology, such as CT scans.
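A sketch of single-channel 3-D convolution, assuming a hypothetical CT-like volume of 4 slices of 5x5 voxels and a 3x3x3 averaging kernel. The kernel slides along depth, height, and width (stride 1, no padding assumed).

```python
from typing import List

Volume = List[List[List[float]]]

def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Valid 3-D convolution: the kernel slides along depth, height, and width."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    return [[[sum(kernel[a][b][c] * volume[z + a][y + b][x + c]
                  for a in range(kd) for b in range(kh) for c in range(kw))
              for x in range(W - kw + 1)]
             for y in range(H - kh + 1)]
            for z in range(D - kd + 1)]

# Hypothetical volume: 4 slices of 5x5 voxels, voxel intensity = slice index
ct = [[[float(z)] * 5 for _ in range(5)] for z in range(4)]
mean_kernel = [[[1 / 27] * 3 for _ in range(3)] for _ in range(3)]  # 3x3x3 average
out = conv3d_valid(ct, mean_kernel)
print(len(out), len(out[0]), len(out[0][0]))  # 2 3 3
```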

3-D Multi-Channel:

Color video data: One axis corresponds to time,
one to the height of the video frame, and one
to the width of the video frame.
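A minimal sketch, assuming a hypothetical clip stored as video[c][t][y][x] with three color channels: the kernel slides along the three axes named above (time, height, width), while the color channels are summed at each position rather than slid over. The example kernel is an illustrative temporal-difference filter.

```python
def conv3d_multichannel(video, kernel):
    """video[c][t][y][x], kernel[c][dt][dy][dx] -> out[t][y][x] (one output channel).

    The kernel slides along time, height, and width (stride 1, no padding);
    the color channels are summed at each position.
    """
    C, T, H, W = len(video), len(video[0]), len(video[0][0]), len(video[0][0][0])
    kt, kh, kw = len(kernel[0]), len(kernel[0][0]), len(kernel[0][0][0])
    return [[[sum(kernel[c][a][b][d] * video[c][t + a][y + b][x + d]
                  for c in range(C) for a in range(kt)
                  for b in range(kh) for d in range(kw))
              for x in range(W - kw + 1)]
             for y in range(H - kh + 1)]
            for t in range(T - kt + 1)]

# Hypothetical clip: 3 color channels, 4 frames of 3x3 pixels, brightness +1 per frame
video = [[[[float(t)] * 3 for _ in range(3)] for t in range(4)] for _ in range(3)]
# Temporal-difference kernel: (next frame - current frame), 1x1 spatially, every channel
kernel = [[[[-1.0]], [[1.0]]] for _ in range(3)]
out = conv3d_multichannel(video, kernel)
print(len(out), len(out[0]), len(out[0][0]))  # 3 3 3
print(out[0][0][0])  # 3.0: a change of 1 per frame in each of the 3 channels
```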
Thank You!
