Dubey ML Write-Up
In this article, we delve into TransUNet, a hybrid architecture that combines CNNs and Transformers.
U-Net Architecture
U-Net, a convolutional neural network (CNN) architecture, was introduced in 2015 by Olaf
Ronneberger et al. [1] and was initially designed for biomedical image segmentation. Its
name derives from its “U” shape, created by its encoder-decoder structure and skip
connections [2]. U-Net has proven to be flexible and effective, and its applications have
expanded into various fields beyond its original purpose.
Contracting Path (Encoder)
This path operates much like a typical convolutional neural network (CNN). It progressively downsamples the input image, extracting and encoding essential features at each level. This process can be visualized as a funnel, gradually reducing the spatial dimensions of the image while increasing the depth of the feature maps.
Convolution: Applies filters to the image to detect specific patterns and features.
ReLU Activation: Introduces non-linearity to enable the network to learn complex
relationships.
Max Pooling: Downsamples the feature maps to reduce their size and increase the
receptive field of subsequent layers.
As the image traverses this path, the network captures increasingly abstract and high-level
features, crucial for understanding the overall context of the image.
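To make this concrete, here is a minimal PyTorch sketch of one contracting-path stage; the module name, channel counts, and kernel sizes are illustrative assumptions rather than the exact U-Net configuration.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One contracting-path stage: two 3x3 convolutions with ReLU, then 2x2 max pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)   # halves the spatial resolution

    def forward(self, x):
        features = self.conv(x)       # kept for the skip connection
        downsampled = self.pool(features)
        return downsampled, features

# Example: a 1-channel 256x256 image -> 64-channel 128x128 feature map plus a skip tensor.
x = torch.randn(1, 1, 256, 256)
down, skip = EncoderBlock(1, 64)(x)
print(down.shape, skip.shape)  # torch.Size([1, 64, 128, 128]) torch.Size([1, 64, 256, 256])
```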
Expansive Path (Decoder)
This path reverses the contracting path, gradually upsampling the feature maps to reconstruct a segmentation mask of the same size as the original input image. It acts as a reverse funnel, expanding the spatial dimensions while decreasing the depth of the feature maps.
Upsampling: Increases the size of the feature maps, often using transposed
convolutions.
Concatenation: Combines the upsampled feature maps with the corresponding high-
resolution ones from the contracting path. This crucial step, facilitated by skip
connections, helps to recover spatial details lost during downsampling.
Convolution: Refines the combined feature maps to generate a precise segmentation
mask.
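A matching sketch of one expansive-path stage (again, channel counts and names are assumed for illustration), showing transposed-convolution upsampling, concatenation with the skip feature map, and refining convolutions.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One expansive-path stage: upsample, concatenate the skip connection, refine."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)  # 2x upsampling
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch * 2, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # restore spatial resolution
        x = torch.cat([x, skip], dim=1)  # recover detail from the contracting path
        return self.conv(x)

# Example: 128-channel 64x64 map + 64-channel 128x128 skip -> 64-channel 128x128 map.
x = torch.randn(1, 128, 64, 64)
skip = torch.randn(1, 64, 128, 128)
print(DecoderBlock(128, 64)(x, skip).shape)  # torch.Size([1, 64, 128, 128])
```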
Given an image x ∈ ℝ^{H × W × C} with a spatial resolution of H×W and C channels, the goal
is to predict a pixel-wise label map of the same resolution, H×W. Traditional methods often
rely on Convolutional Neural Networks (CNNs), such as U-Net, to encode the images into
high-level feature representations and then decode them back to the full spatial resolution.
In contrast, this approach introduces self-attention mechanisms into the encoder design by
leveraging Transformers [3]. This enhancement enables the model to capture richer feature
representations and better global context.
Implementation of TransUNet
The goal is to predict a pixel-wise label map (segmentation map) for an image, where
each pixel is assigned a class label. The output map has the same spatial resolution
as the input image, H×W, and the prediction is produced by a neural network.
1. Image Sequentialization
The image is split into small patches of size P×P (e.g., 16×16 pixels).
Each patch is flattened into a vector and forms a “token.” The total number of tokens is
N = H·W / P²,
where H and W are the image height and width.
Example: If the image is 256×256 and the patch size P = 16, then N = (256 · 256) / 16² = 256 tokens.
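A minimal NumPy sketch of this step, with assumed example shapes, that splits an image into non-overlapping P×P patches and confirms the token count.

```python
import numpy as np

H, W, C, P = 256, 256, 3, 16        # image size and patch size (example values)
image = np.random.rand(H, W, C)      # stand-in for a real input image

# Split the image into non-overlapping P x P patches and flatten each one into a token.
patches = image.reshape(H // P, P, W // P, P, C)   # (16, 16, 16, 16, 3)
patches = patches.transpose(0, 2, 1, 3, 4)         # group the patch-grid dimensions first
tokens = patches.reshape(-1, P * P * C)            # (N, P*P*C)

N = tokens.shape[0]
print(N, (H * W) // (P * P))  # both print 256, matching the formula above
```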
2. Patch Embedding
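The write-up does not detail this step; in the standard ViT-style formulation used by TransUNet [3], each flattened patch is mapped by a trainable linear projection to a D-dimensional embedding, and learnable position embeddings are added so the sequence retains spatial order. A minimal sketch with assumed dimensions:

```python
import torch
import torch.nn as nn

N, patch_dim, D = 256, 16 * 16 * 3, 768          # tokens, flattened patch size, embedding dim (assumed)

proj = nn.Linear(patch_dim, D)                    # trainable linear projection of each flattened patch
pos_embed = nn.Parameter(torch.zeros(1, N, D))    # learnable position embeddings

tokens = torch.randn(1, N, patch_dim)             # flattened patches from the previous step
embedded = proj(tokens) + pos_embed               # (1, N, D) sequence fed to the Transformer
print(embedded.shape)                             # torch.Size([1, 256, 768])
```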
3. Transformer Encoder
The sequence of embedded tokens is passed through multiple stacked layers that constitute the
Transformer. Each layer consists of a multi-head self-attention (MSA) block and an MLP block, with layer normalization and residual connections.
(Fig. 5 shows the structure of the Transformer encoder.)
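A compact PyTorch sketch of one such layer; the hidden size, number of heads, and MLP width are assumed example values.

```python
import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    """One encoder layer: LayerNorm -> multi-head self-attention, LayerNorm -> MLP, with residuals."""
    def __init__(self, dim=768, heads=12, mlp_dim=3072):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_dim), nn.GELU(), nn.Linear(mlp_dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # self-attention with residual
        x = x + self.mlp(self.norm2(x))                    # MLP with residual
        return x

tokens = torch.randn(1, 256, 768)          # (batch, N tokens, embedding dim)
print(TransformerLayer()(tokens).shape)    # torch.Size([1, 256, 768])
```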
4. TransUNet Architecture
TransUNet improves upon a pure-Transformer encoder by combining CNNs and Transformers (see the sketch after this list):
Hybrid Encoder:
A CNN is first applied to the input image to extract intermediate feature maps at
various resolutions.
These feature maps are tokenized and processed by the Transformer.
Using CNN features ensures local (low-level) details like edges and boundaries are
preserved.
Cascaded Upsampling Decoder:
The Transformer output is reshaped back into a 2D feature map and progressively upsampled; each decoding stage applies upsampling, convolution, and ReLU activation.
Skip connections between the encoder and decoder help recover fine-grained details.
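Putting the pieces together, a schematic sketch of a TransUNet-style forward pass. The backbone, feature resolutions, and channel counts here are simplified assumptions for illustration; the paper itself uses a ResNet-50 + ViT hybrid encoder [3].

```python
import torch
import torch.nn as nn

class TransUNetSketch(nn.Module):
    """Schematic hybrid encoder-decoder: CNN features -> Transformer -> cascaded upsampling with skips."""
    def __init__(self, in_ch=3, dim=256, n_classes=9):  # n_classes=9 assumed (e.g. 8 organs + background)
        super().__init__()
        # CNN stem: two downsampling stages whose outputs also serve as skip connections.
        self.stage1 = nn.Sequential(nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU())
        # Transformer over the tokenized low-resolution feature map.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        # Cascaded upsampler: each stage upsamples, concatenates a skip, convolves with ReLU.
        self.up1 = nn.ConvTranspose2d(dim, 64, 2, stride=2)
        self.dec1 = nn.Sequential(nn.Conv2d(64 + 64, 64, 3, padding=1), nn.ReLU())
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.head = nn.Conv2d(32, n_classes, 1)           # per-pixel class logits

    def forward(self, x):
        s1 = self.stage1(x)                               # (B, 64, H/2, W/2) skip
        s2 = self.stage2(s1)                              # (B, dim, H/4, W/4)
        B, C, h, w = s2.shape
        tokens = s2.flatten(2).transpose(1, 2)            # (B, h*w, dim) token sequence
        tokens = self.transformer(tokens)
        feat = tokens.transpose(1, 2).reshape(B, C, h, w) # back to a 2D feature map
        d1 = self.dec1(torch.cat([self.up1(feat), s1], dim=1))  # upsample + skip at H/2
        return self.head(self.up2(d1))                    # (B, n_classes, H, W)

print(TransUNetSketch()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 9, 224, 224])
```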
5. Experiments
The dataset includes 30 abdominal CT scans with ground-truth labels for 8 organs. The main evaluation metrics were the Dice similarity coefficient (DSC) and the Hausdorff distance (HD) [3].
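For reference, a minimal sketch of the Dice similarity coefficient for a binary mask; the smoothing term eps is an assumption added to avoid division by zero.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-6):
    """Dice similarity coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example: two overlapping 4x4 masks (4 shared pixels out of 8 + 8).
a = np.zeros((4, 4)); a[:2, :] = 1
b = np.zeros((4, 4)); b[1:3, :] = 1
print(dice_coefficient(a, b))  # ~0.5
```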
Key Observations:
1. The hybrid CNN-Transformer encoder leverages low-level spatial details and high-level
global context, leading to better results.
2. The cascaded upsampling strategy recovers fine-grained features, improving
segmentation boundaries.
3. Superior results were observed for all organs, especially challenging ones like the
pancreas and gallbladder.