4. Structured outputs - Data types

The document discusses structured outputs in convolutional neural networks (CNNs), highlighting their ability to produce high-dimensional tensors for tasks like pixel-level classification and image segmentation. It explains how CNNs can handle varying spatial extents and different data types, including 1-D, 2-D, and 3-D representations. The document emphasizes the advantages of using CNNs for complex data relationships and processing capabilities over traditional neural networks.

Uploaded by

devanand272003

Structured outputs, Data types

Mr. Sivadasan E T
Associate Professor
Vidya Academy of Science and Technology, Thrissur
Structured outputs
• A "structured object" in the context of
convolutional neural networks (CNNs) refers to
outputs that go beyond simple classification or
regression values.

• These outputs have complex, meaningful
relationships between their components and
typically represent high-dimensional data with
intricate patterns or structures.
Structured outputs

Convolutional networks can be used to output a
high-dimensional, structured object, rather than just
predicting a class label for a classification task or a
real value for a regression task.
High-Dimensional Tensor Output:

CNNs often emit a tensor as output.

A tensor can be seen as a multi-dimensional grid of
numbers representing probabilities, pixel intensities, or
other information.
Structured outputs
Example - Pixel-Level Classification:
Suppose a CNN produces a tensor S where:

$S_{i,j,k}$ represents the probability that pixel (j, k)
belongs to class i (like "car" or "person").

This enables pixel-wise classification rather than
predicting just a single class for the entire image.
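
A minimal NumPy sketch of the pixel-level tensor described above. The sizes, the random scores, and the names `scores`, `S`, and `labels` are assumptions made for illustration; a real network would produce the scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw scores from a CNN: 3 classes, 4x5 image.
num_classes, height, width = 3, 4, 5
scores = rng.normal(size=(num_classes, height, width))

# Softmax over the class axis (axis 0) so that S[:, j, k] sums to 1.
shifted = scores - scores.max(axis=0, keepdims=True)  # numerical stability
S = np.exp(shifted) / np.exp(shifted).sum(axis=0, keepdims=True)

# Most probable class for every pixel: a (height, width) label map.
labels = S.argmax(axis=0)
```

Here `S[i, j, k]` plays the role of the probability that pixel (j, k) belongs to class i, and `labels` holds one class index per pixel.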
Structured outputs
Image Segmentation:

By assigning a class to each pixel, CNNs can create
precise masks that outline individual objects in an
image.

Use Case: Identifying and isolating cars, roads, and
pedestrians in autonomous driving images.
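
As a rough illustration of how per-pixel classes turn into masks, the sketch below uses a small hand-written label map. The class ids and the `CLASS_NAMES` mapping are hypothetical and not taken from any dataset.

```python
import numpy as np

# Hypothetical class ids and a small hand-written label map (one id per pixel).
CLASS_NAMES = {0: "road", 1: "car", 2: "pedestrian"}
labels = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 0, 0],
])

# One boolean mask per class: True where the pixel was assigned that class.
masks = {name: labels == class_id for class_id, name in CLASS_NAMES.items()}

car_pixels = int(masks["car"].sum())  # number of pixels labelled "car"
```

Each entry of `masks` has the same height and width as the image, so it can be used directly to isolate one kind of object.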
Structured outputs

• Once a prediction for each pixel is made,
various methods can be used to further process
these predictions in order to obtain a
segmentation of the image into regions.
Structured outputs

• The general idea is to assume that large groups
of contiguous pixels tend to be associated with
the same label.

• Graphical models can describe the probabilistic
relationships between neighboring pixels.
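
The contiguity assumption can be illustrated with something much simpler than a graphical model. The sketch below is a plain 3x3 majority vote, used here only as a stand-in; the function name `majority_smooth` and the toy label map are assumptions for the example.

```python
import numpy as np

def majority_smooth(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Replace each pixel's label with the most common label in its 3x3 neighbourhood."""
    height, width = labels.shape
    padded = np.pad(labels, 1, mode="edge")  # repeat border pixels
    votes = np.zeros((num_classes, height, width), dtype=int)
    for dj in range(3):
        for dk in range(3):
            window = padded[dj:dj + height, dk:dk + width]
            for c in range(num_classes):
                votes[c] += window == c  # count neighbours voting for class c
    return votes.argmax(axis=0)

noisy = np.array([
    [0, 0, 0, 0],
    [0, 1, 0, 0],   # isolated "1" surrounded by "0"
    [0, 0, 0, 0],
])
smoothed = majority_smooth(noisy, num_classes=2)
```

The isolated pixel is outvoted by its neighbours, so the smoothed map forms one contiguous region. A graphical model would express the same preference probabilistically rather than by a hard vote.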
Data Types

The data used with a convolutional network usually
consists of several channels.

Each channel is the observation of a different
quantity at some point in space or time.
Data Types
• One advantage to convolutional networks is that
they can also process inputs with varying spatial
extents.
• These kinds of input simply cannot be represented
by traditional, matrix multiplication-based neural
networks.
• This provides a compelling reason to use
convolutional networks even when computational
cost and overfitting are not significant issues.
Data Types

• For example, consider a collection of images,
where each image has a different width and
height.

• It is unclear how to model such inputs with a
weight matrix of fixed size.
Data Types

• Convolution is straightforward to apply; the
kernel is simply applied a different number of
times depending on the size of the input, and the
output of the convolution operation scales
accordingly.
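
The point about input size can be sketched with a hand-written "valid" convolution in NumPy. The helper name `conv2d_valid` and the image sizes are assumptions for illustration. As in most deep learning libraries, the kernel is not flipped, so this is strictly a cross-correlation.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the kernel over every position where it fits entirely (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for j in range(out_h):
        for k in range(out_w):
            out[j, k] = np.sum(image[j:j + kh, k:k + kw] * kernel)
    return out

kernel = np.ones((3, 3)) / 9.0                    # one fixed 3x3 averaging kernel
small = conv2d_valid(np.ones((5, 7)), kernel)     # output shape (3, 5)
large = conv2d_valid(np.ones((10, 12)), kernel)   # output shape (8, 10)
```

The same nine weights handle both images; only the number of kernel applications, and therefore the output size, changes.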
Data Types

1-D Single Channel:

• Audio waveform: The axis we convolve over
corresponds to time.

• We discretize time and measure the amplitude
of the waveform once per time step.
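
A small sketch of a discretized waveform and a 1-D convolution over its time axis. The sample rate, tone frequency, and moving-average kernel are assumed values chosen only to keep the example short.

```python
import numpy as np

sample_rate = 8000                      # samples per second (assumed)
num_samples = 80                        # 10 ms of audio at this rate
t = np.arange(num_samples) / sample_rate          # discrete time steps
waveform = np.sin(2 * np.pi * 440.0 * t)          # one amplitude per time step

kernel = np.ones(5) / 5.0                         # 5-tap moving average
smoothed = np.convolve(waveform, kernel, mode="valid")
```

The input is a single channel of shape `(80,)`, and the "valid" output has 80 - 5 + 1 = 76 entries.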
Data Types
1-D Multi-Channel:

• Skeleton animation data: 3-D characters are
animated by changing their joint angles over time.
• Each frame records the angles of different
joints, describing the character's pose.
• In convolutional models, each data channel
represents the angle of one joint around a
specific axis.
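
The layout described above might look as follows in NumPy. The skeleton (2 joints with 3 rotation axes each), the frame count, and the random angles are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

num_channels = 6    # e.g. 2 joints x 3 rotation axes (hypothetical skeleton)
num_frames = 20
angles = rng.uniform(-np.pi, np.pi, size=(num_channels, num_frames))

kernel_width = 3
kernel = rng.normal(size=(num_channels, kernel_width))  # one kernel spanning all channels

# Slide over time only; at each step sum over channels and kernel taps.
out_len = num_frames - kernel_width + 1
feature = np.array([
    np.sum(angles[:, t:t + kernel_width] * kernel) for t in range(out_len)
])
```

The data has several channels but only one axis to convolve over, which is what makes it 1-D multi-channel.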
Data Types
2-D Single Channel:

• Audio data that has been preprocessed with a
Fourier transform:

• We can transform the audio waveform into a 2-D
tensor with different rows corresponding to different
frequencies and different columns corresponding to
different points in time.
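
One common way to obtain such a 2-D tensor is a short-time Fourier transform. The sketch below is a minimal NumPy version; the frame length, hop size, Hann window, and test tone are assumptions rather than choices stated in the slides.

```python
import numpy as np

sample_rate = 8000
num_samples = 1024
t = np.arange(num_samples) / sample_rate
waveform = np.sin(2 * np.pi * 1000.0 * t)   # 1 kHz test tone

frame_len, hop = 256, 128
starts = range(0, num_samples - frame_len + 1, hop)
window = np.hanning(frame_len)
frames = np.stack([waveform[s:s + frame_len] * window for s in starts])  # (time, frame_len)

# Magnitude of the real FFT of each frame; transpose so rows = frequency, columns = time.
spectrogram = np.abs(np.fft.rfft(frames, axis=1)).T
peak_bin = int(spectrogram[:, 0].argmax())
peak_hz = peak_bin * sample_rate / frame_len
```

The result has 129 frequency rows and 7 time columns, and the strongest row in the first column sits at the frequency of the test tone.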
Data Types
2-D Multi-Channel:

Color image data:

• One channel contains the red pixels, one the green
pixels, and one the blue pixels.

• The convolution kernel moves over both the
horizontal and vertical axes of the image, conferring
translation equivariance in both directions.
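
Translation equivariance can be checked numerically on a toy image. The helper `conv2d_multichannel`, the random kernel, and the 8x8 image are assumptions for the sketch; the patch is kept away from the borders so the shift does not wrap around.

```python
import numpy as np

def conv2d_multichannel(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid cross-correlation of a (C, H, W) image with a (C, kh, kw) kernel."""
    _, kh, kw = kernel.shape
    out_h = image.shape[1] - kh + 1
    out_w = image.shape[2] - kw + 1
    out = np.empty((out_h, out_w))
    for j in range(out_h):
        for k in range(out_w):
            out[j, k] = np.sum(image[:, j:j + kh, k:k + kw] * kernel)
    return out

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3, 3))              # spans the R, G, B channels

image = np.zeros((3, 8, 8))
image[:, 2:4, 2:4] = 1.0                         # small bright patch
shifted = np.roll(image, shift=(1, 2), axis=(1, 2))  # patch moved 1 down, 2 right

out = conv2d_multichannel(image, kernel)
out_shifted = conv2d_multichannel(shifted, kernel)

# Shifting the input shifts the output by the same amount.
equivariant = np.allclose(out_shifted, np.roll(out, shift=(1, 2), axis=(0, 1)))
```

Moving the patch in the input moves the response in the output by the same offset along both axes, which is the property the slide refers to.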
Data Types

3-D Single Channel:

Volumetric data: A common source of this kind of data
is medical imaging technology, such as CT scans.
Data Types

3-D Multi-Channel:

Color video data: One axis corresponds to time, one
to the height of the video frame, and one to the
width of the video frame.
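
A sketch of how the three axes of video data might be convolved together. The tiny video, the random kernel, and the axis order (channels, time, height, width) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny video: 3 colour channels, 6 frames, 8x10 pixels.
video = rng.random(size=(3, 6, 8, 10))      # (channels, time, height, width)
kernel = rng.normal(size=(3, 2, 3, 3))      # all channels; 2 frames x 3 x 3 pixels

_, kt, kh, kw = kernel.shape
out_shape = (video.shape[1] - kt + 1,
             video.shape[2] - kh + 1,
             video.shape[3] - kw + 1)
out = np.empty(out_shape)

# Slide over time, height, and width; sum over channels at each position.
for t in range(out_shape[0]):
    for j in range(out_shape[1]):
        for k in range(out_shape[2]):
            out[t, j, k] = np.sum(video[:, t:t + kt, j:j + kh, k:k + kw] * kernel)
```

The kernel moves along three axes while the colour channels are summed out, giving an output of shape `(5, 6, 8)`.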
Thank You!
