SegDiff - Image Segmentation With Diffusion Probabilistic Models
• We obtained state-of-the-art results on multiple benchmarks. The margin is especially large for small datasets.

2. Related work

Image segmentation is the problem of assigning each pixel a label that identifies whether it belongs to a specific class or not. This problem is widely investigated using different architectures. These include fully convolutional networks [31], encoder-decoder architectures with skip-connections, such as U-Net [38], transformer-based architectures, such as the Segformer [50], and even architectures that combine hypernetworks, such as [36].

Diffusion Probabilistic Models (DPM) [43] are a class of generative models based on a Markov chain, which can transform a simple distribution (e.g., Gaussian) into data sampled from a complex distribution. Diffusion models are capable of generating high-quality images that can compete with and even outperform the latest GAN methods [10, 18, 35, 43]. A variational framework for the likelihood estimation of diffusion models was introduced by Huang et al. [21]. Subsequently, Kingma et al. [23] proposed a Variational Diffusion Model that produces state-of-the-art results in likelihood estimation for image density. Diffusion models were also applied to language modeling [2, 20], where a novel diffusion model for categorical data was used.

Conditional Diffusion Probabilistic Models In our work, we use diffusion models to solve the image segmentation problem as conditional generation, given the image. Conditional generation with diffusion models includes methods for class-conditioned generation, which is obtained by adding a class embedding to the timestamp embedding [35]. In [8], a method for guiding the generative process in DDPM is presented. This method allows the generation of images based on a given reference image without any additional learning.

In the domain of super resolution, the lower-resolution image is upsampled and then concatenated, channelwise, to the generated image at each iteration [19, 41]. A similar approach passes the low-resolution images through a convolutional block [27] prior to the concatenation. Concurrently with our work, diffusion models were applied to image-to-image translation tasks [40]. These tasks include uncropping, inpainting, and colorization. The results obtained outperform strong GAN baselines.

Conditional diffusion models have also been used for voice generation. The mel-spectrogram is processed with a convolutional network and used as an additional input to the DPM denoising network [6, 24, 30]. Furthermore, in [37] a text-to-speech diffusion model is introduced, which uses text as a condition to the diffusion model.

In our work, we take a different approach to conditioning, adding (not concatenating) the input image, after it passes through a convolutional encoder, to the current estimate of the segmentation image. In other words, we learn the DPM of a residual model.

3. Background

We briefly introduce the formulation of diffusion models presented in [18]. Diffusion models are generative models parametrized by a Markov chain and composed of forward and backward processes. The forward process q is described by the formulation

q(x1:T | x0) = ∏_{t=1}^{T} q(xt | xt−1),   (1)

where T is the number of steps in the diffusion model, x1, ..., xT are latent variables, and x0 is a sample from the data. At each iteration of the forward process, Gaussian noise is added according to

q(xt | xt−1) = N(xt; √(1 − βt) xt−1, βt In×n),   (2)

where βt is a constant that defines the schedule of added noise, and In×n is the identity matrix of size n. As described in [18],

αt = 1 − βt,   ᾱt = ∏_{s=0}^{t} αs.   (3)

The forward process supports sampling at an arbitrary timestamp t, with the formula

q(xt | x0) = N(xt; √(ᾱt) x0, (1 − ᾱt) In×n),   (4)

which can be reparametrized to

xt = √(ᾱt) x0 + √(1 − ᾱt) ε,   ε ∼ N(0, In×n).   (5)

The reverse process is parametrized by θ and defined by

pθ(x0:T−1 | xT) = ∏_{t=1}^{T} pθ(xt−1 | xt).   (6)

Starting from pθ(xT) = N(xT; 0, In×n), the reverse process transforms the latent variable distribution pθ(xT) into the data distribution pθ(x0). The reverse process steps are performed by taking small Gaussian steps described by

pθ(xt−1 | xt) = N(xt−1; µθ(xt, t), Σθ(xt, t)).   (7)

Calculating q(xt−1 | xt, x0) using Bayes' theorem, one obtains

q(xt−1 | xt, x0) = N(xt−1; µ̃(xt, x0), β̃t In×n),   (8)
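To make the forward process concrete, here is a minimal sketch of sampling xt directly from x0 via the reparametrization of Eq. 5. It assumes PyTorch, a (B, C, H, W) tensor layout, and an illustrative linear βt schedule; none of these specifics are prescribed by the text above.

```python
import torch

def forward_diffusion_sample(x0, t, alpha_bar):
    """Sample x_t ~ q(x_t | x_0) via Eq. 5: x_t = sqrt(a_bar_t) x_0 + sqrt(1 - a_bar_t) eps."""
    eps = torch.randn_like(x0)                    # eps ~ N(0, I)
    a_bar_t = alpha_bar[t].view(-1, 1, 1, 1)      # broadcast over (B, C, H, W)
    xt = torch.sqrt(a_bar_t) * x0 + torch.sqrt(1.0 - a_bar_t) * eps
    return xt, eps

# Illustrative linear schedule (Eq. 3): alpha_t = 1 - beta_t, alpha_bar_t = prod_s alpha_s.
T = 100
betas = torch.linspace(1e-4, 2e-2, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

x0 = torch.rand(4, 1, 128, 128) * 2 - 1           # e.g., a batch of segmentation maps in [-1, 1]
t = torch.randint(0, T, (4,))                     # a random step index per sample
xt, eps = forward_diffusion_sample(x0, t, alpha_bar)
```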
Figure 1. Our proposed diffusion method for image segmentation encodes the input signal, xt , with F . The extracted features are summed
with the feature map of the conditioned image I generated by network G. Networks E and D are a U-net encoder and decoder [35, 38],
respectively, that refine the estimated segmentation map, obtaining xt−1 .
where

µ̃t(xt, x0) = (√(ᾱt−1) βt / (1 − ᾱt)) x0 + (√(αt) (1 − ᾱt−1) / (1 − ᾱt)) xt,   (9)

β̃t = ((1 − ᾱt−1) / (1 − ᾱt)) βt.   (10)

The neural network µθ predicts the noise ε, which is

Algorithm 1 Inference Algorithm
Input: total diffusion steps T, image I
xT ∼ N(0, In×n)
for t = T, T − 1, ..., 1 do
    z ∼ N(0, In×n)
    βt = (10^{−4}(T − t) + 2·10^{−2}(t − 1)) / (T − 1)
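A minimal sketch of an inference loop in the spirit of Algorithm 1, using the linear βt schedule listed above and the standard DDPM update implied by Eq. 7; the model(xt, image, t) interface for the image-conditioned network εθ and the choice σt² = βt are assumptions of the sketch, not the paper's exact procedure.

```python
import torch

@torch.no_grad()
def segdiff_inference(model, image, T=100, shape=(1, 1, 128, 128)):
    """Sketch of Algorithm 1: start from x_T ~ N(0, I) and iterate t = T, ..., 1."""
    # Linear schedule from Algorithm 1: beta_t = (1e-4 (T - t) + 2e-2 (t - 1)) / (T - 1), t = 1..T.
    t_all = torch.arange(1, T + 1, dtype=torch.float32)
    betas = (1e-4 * (T - t_all) + 2e-2 * (t_all - 1)) / (T - 1)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    xt = torch.randn(shape)                                 # x_T ~ N(0, I)
    for t in range(T, 0, -1):
        z = torch.randn(shape) if t > 1 else torch.zeros(shape)
        beta_t, alpha_t, alpha_bar_t = betas[t - 1], alphas[t - 1], alpha_bars[t - 1]
        eps = model(xt, image, t)                           # assumed eps_theta(x_t, I, t) interface
        mean = (xt - beta_t / torch.sqrt(1.0 - alpha_bar_t) * eps) / torch.sqrt(alpha_t)
        xt = mean + torch.sqrt(beta_t) * z                  # DDPM step with sigma_t^2 = beta_t (assumption)
    return xt
```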
combines information derived from both the current estimate xt and the input image I.

In diffusion models, εθ is typically a U-Net [38]. In our work, εθ can be expressed in the following form:

εθ(xt, I, t) = D(E(F(xt) + G(I), t), t).   (17)

In this architecture, the U-Net's decoder D is conventional, and its encoder is broken down into three networks: E, F, and G. The last encodes the input image, while F encodes the segmentation map of the current step xt. The two processed inputs have the same spatial dimensionality and number of channels. Based on the success of residual connections [17], we sum these signals, F(xt) + G(I). This sum then passes to the rest of the U-Net encoder E.

The current step index t is passed to two different networks, D and E. In each of these, it is embedded using a shared learned look-up table.

The output of εθ from Eq. 17, which is conditioned on I, is plugged into Eq. 16, replacing the unconditioned εθ network. The resulting inference-time procedure is illustrated in Fig. 1 and detailed in Alg. 1.
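As a minimal illustration of Eq. 17, the sketch below wires together the two input branches F and G, a shared step-index embedding, and stand-ins for the U-Net encoder E and decoder D. The module definitions (plain convolutions rather than the RRDB-based G and the attention-augmented U-Net of Sec. 4.3) and the class name ConditionedDenoiser are ours, chosen only to show the F(xt) + G(I) summation.

```python
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """eps_theta(x_t, I, t) = D(E(F(x_t) + G(I), t), t), cf. Eq. 17 (simplified stand-in modules)."""

    def __init__(self, channels=128):
        super().__init__()
        self.F = nn.Conv2d(1, channels, 3, padding=1)           # encodes the current estimate x_t
        self.G = nn.Sequential(                                  # encodes the conditioning image I
            nn.Conv2d(3, channels, 3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.time_embed = nn.Embedding(1000, channels)           # shared step-index look-up table
        self.E = nn.Conv2d(channels, channels, 3, padding=1)     # placeholder for the U-Net encoder
        self.D = nn.Conv2d(channels, 1, 3, padding=1)            # placeholder for the U-Net decoder

    def forward(self, xt, image, t):
        h = self.F(xt) + self.G(image)                           # sum, not concatenation
        emb = self.time_embed(t).unsqueeze(-1).unsqueeze(-1)     # (B, C, 1, 1), added as a simple stand-in
        h = self.E(h + emb)
        return self.D(h + emb)

model = ConditionedDenoiser()
eps = model(torch.randn(2, 1, 128, 128), torch.randn(2, 3, 128, 128), torch.tensor([10, 10]))
```

The key design point illustrated here is that the two encodings share spatial size and channel count, so they can be fused by addition before entering the remaining encoder.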
4.1. Employing multiple generations

Since calculating xt−1 during inference includes the addition of σθ(xt, t)z, where z is drawn from a standard distribution, there is significant variability between different runs of the inference method on the same inputs; see Fig. 2(b). In order to exploit this phenomenon, we run the inference algorithm multiple times and then average the results. This way, we stabilize the segmentation results and improve performance, as demonstrated in Fig. 2(c). We use thirty generated instances in all experiments, except for the experiments in the ablation study, which quantifies the gain of this averaging procedure.
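A sketch of this averaging procedure, assuming the segdiff_inference helper from the earlier sketch; the 0.5 binarization threshold is illustrative.

```python
import torch

@torch.no_grad()
def averaged_segmentation(model, image, n_runs=30, threshold=0.5):
    """Run the stochastic inference several times and average the generated maps."""
    runs = [segdiff_inference(model, image) for _ in range(n_runs)]
    mean_map = torch.stack(runs, dim=0).mean(dim=0)    # per-pixel average over independent runs
    return (mean_map > threshold).float(), mean_map    # hard mask and soft (calibration-friendly) map
```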
4.2. Training

The training procedure is depicted in Alg. 2. The total number of diffusion steps T is set by the user. For each iteration, a random sample (Ii, Mi) is obtained (an image and the associated ground-truth binary segmentation map). The iteration number 1 ≤ t ≤ T is sampled from a uniform distribution, and ε from a standard distribution. We then sample xt according to Eq. 5, compute F(xt) + G(Ii), and apply networks E and D to obtain εθ(xt, Ii, t). The loss being minimized is a modified version of Eq. 15, namely

E_{x0, ε, t} [‖ε − εθ(√(ᾱt) x0 + √(1 − ᾱt) ε, Ii, t)‖²].   (18)

At training time, the ground-truth segmentation of the input image Ii is known, and the loss is computed by setting x0 = Mi.
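The following sketch shows one optimization step for the loss of Eq. 18, reusing the forward_diffusion_sample helper from the Background sketch; the batching details and the model interface are assumptions, and the 0-based step indexing is a convenience of the sketch.

```python
import torch
import torch.nn.functional as nnf

def training_step(model, optimizer, image, mask, alpha_bar, T=100):
    """One step in the spirit of Alg. 2: sample t and eps, form x_t from x_0 = M_i, regress eps (Eq. 18)."""
    t = torch.randint(0, T, (image.shape[0],))                  # uniform step index per sample (0-based)
    xt, eps = forward_diffusion_sample(mask, t, alpha_bar)      # x_0 = M_i, the ground-truth map
    eps_pred = model(xt, image, t)                              # eps_theta(x_t, I_i, t)
    loss = nnf.mse_loss(eps_pred, eps)                          # || eps - eps_theta(...) ||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```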
4.3. Architecture

The input image encoder G is built from Residual in Residual Dense Blocks [47] (RRDBs), which combine multi-level residual connections without batch normalization layers. G has an input 2D-convolutional layer, an RRDB with a residual connection around it, followed by another 2D-convolutional layer, a leaky ReLU activation, and a final 2D-convolutional output layer. F is a 2D-convolutional layer with a single-channel input and an output of C channels.

The encoder-decoder part of εθ, i.e., D and E, is based on U-Net, similarly to [35]. Each level is composed of residual blocks, and at the 16×16 and 8×8 resolutions each residual block is followed by an attention layer. The bottleneck contains two residual blocks with an attention layer in between. Each attention layer contains multiple attention heads.

The residual block is composed of two convolutional blocks, where each convolutional block contains GroupNorm, a SiLU activation, and a 2D-convolutional layer. The residual block receives the time embedding through a linear layer, a SiLU activation, and another linear layer. The result is then added to the output of the first convolutional block. Additionally, the residual block has a residual connection that passes all its content.

On the encoder side (network E), there is a downsample block after the residual blocks of the same depth, which is a 2D-convolutional layer with a stride of two. On the decoder side (network D), there is an upsample block after the residual blocks of the same depth, which is composed of nearest-neighbor interpolation that doubles the spatial size, followed by a 2D-convolutional layer. Each layer in the encoder has a skip connection to the decoder side.

5. Experiments

We present segmentation results for three datasets, as well as an ablation study.

Datasets The Cityscapes dataset [9] is an instance segmentation dataset containing 5,000 annotated images divided into 2,975 images for training, 500 for validation, and 1,525 for testing.

The experimental setting used is sometimes referred to as interactive segmentation and is motivated by the need to accelerate object annotation [1]. Under this setting, there are eight object categories, and the goal is to recover the objects' per-pixel masks, given a cropped patch that contains the bounding box around each object.

Our per-object training and validation sets are created by taking crops from images in the original Cityscapes sets using the locations of the ground-truth classes (we do not have access to the ground-truth labels of the original Cityscapes test set).
Figure 2. Obtaining multiple segmentation results for Cityscapes, Vaihingen, and MoNuSeg. (a) input image, (b) a subset of the obtained
results for multiple runs on the same input, visualized by the jet color scale between 0 in blue and 1 in red, (c) average result, and (d)
ground truth.
We compared our method on the Cityscapes dataset with PSP-DeepLab [5], Polygon-RNN++ [1], Curve-GCN [29], Deep active contours [14], Segformer-B5 [50], and Stdc1 [11]. For most baselines, we report the results obtained from previous publications. For Segformer and Stdc, we train from scratch.

We did not perform a comparison with PolyTransform [28], since it uses a different protocol. Specifically, this method, which improves upon Mask R-CNN [16], utilizes the entire image (and not just the segmentation patch) as part of its inputs, and does not work on standard patches in a way that would enable a direct comparison.
The Vaihingen dataset [39] contains 168 aerial images of Vaihingen, in Germany, divided into 100 images for training and 68 for testing. The task is to segment the central building in each image. For this dataset, the leading baselines are DSAC [34], DarNet [7], TDAC [15], Deep active contours [14], FCN-UNET [38], FCN-ResNet-34, FCN-HarDNet-85 [4], Segformer-B5 [50], and Stdc1 [11].

The MoNuSeg dataset [25, 26] contains a training set with 30 microscopic images from seven organs, with annotations of 21,623 individual nuclei. The test set contains 14 similar images. We resized the images to a resolution of 512 × 512, following [45]. The relevant baseline methods are FCN [3], UNET [38], UNET++ [53], Res-Unet [48], Axial attention (A.A) U-Net [46], and Medical transformer [45].

Evaluation The Cityscapes dataset is evaluated using the common metric of mean Intersection-over-Union (mIoU) per class:

mIoU(y, ŷ) = (1/N) ∑_{i=1}^{N} TP(yi, ŷi) / (TP(yi, ŷi) + FN(yi, ŷi) + FP(yi, ŷi)),   (19)

where N is the number of classes in the dataset, TP is the true positive count between the ground truth y and the output mask ŷ, FN is a false negative, and FP is a false positive.

The Vaihingen dataset is evaluated using several metrics: mIoU, F1-score, Weighted Coverage (WCov), and Boundary F-score (BoundF), as described in [7]. Briefly, a prediction is correct if it is within a certain distance threshold from the ground truth. The benchmarks use five thresholds, from 1px to 5px, for evaluating performance.

Following previous work, evaluation on the MoNuSeg dataset is performed using mIoU and the F1-score.
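A small sketch of the IoU computation behind Eq. 19 for binary masks, computing a per-sample score and averaging over a batch; the epsilon guard for empty masks is an implementation choice of the sketch, not part of the paper's definition.

```python
import torch

def mean_iou(pred, target, eps=1e-7):
    """Mean IoU over a batch of binary masks: TP / (TP + FN + FP), averaged (cf. Eq. 19)."""
    pred, target = pred.bool(), target.bool()
    tp = (pred & target).flatten(1).sum(dim=1).float()
    fp = (pred & ~target).flatten(1).sum(dim=1).float()
    fn = (~pred & target).flatten(1).sum(dim=1).float()
    return ((tp + eps) / (tp + fn + fp + eps)).mean()
```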
Training details The number of diffusion steps in previous works was 1,000 [18] and even 4,000 [35]. The literature suggests that more is better [42]. In our main experiments, we employ 100 diffusion steps to reduce inference time. An additional set of experiments investigated the influence of the number of diffusion steps on the performance and runtime of the method.

The AdamW [32] optimizer is used in all our experiments. Based on the intuition that more RRDB blocks lead to better results, we used as many blocks as we could fit on the GPU without overly reducing the batch size. The U-Net used for datasets with a resolution of 256 × 256 has one additional layer with respect to the datasets with half that resolution, in order to account for the spatial dimensions.

On the Cityscapes dataset, the input resolution of our model is 128 × 128. The test metrics are computed at the original resolution; therefore, we resized the predictions to the original image size. Training took place with a batch size of 30 images. The network had 15 RRDB blocks and a depth of six. The number of channels was set to [C, C, 2C, 2C, 4C, 4C] with C = 128. We followed the same augmentation scheme as in [14], including random scaling in the range of [0.75, 1.25], up to 22 degrees of rotation in each direction, and a horizontal flip with a probability of 0.5.

For the Vaihingen dataset, both the input image size and the test image resolution were 256 × 256. The experiments were performed with a batch size of eight images, six RRDB blocks, and a depth of seven. The number of channels was set to [C, C, C, 2C, 2C, 4C, 4C] with C = 128. The same augmentations are used as in [7]: random scaling by a factor sampled uniformly in the range [0.75, 1.5], a rotation sampled uniformly between zero and 360 degrees, independent horizontal and vertical flips, each applied with a probability of 0.5, and a random color jitter, with maximum values of 0.6 brightness, 0.5 contrast, 0.4 saturation, and 0.025 hue.

For MoNuSeg, the input image resolution was 256 × 256, but the test resolution was 512 × 512. To address this, we applied a sliding window of 256 × 256 with a stride of 256, i.e., we tested each quadrant of the image separately. The experiments were carried out with a batch size of eight images and 12 RRDB blocks. The network depth was seven, and the number of channels at each depth was [C, C, C, 2C, 2C, 4C, 4C], with C = 128. We used the same augmentation scheme as in [45], with random cropping of 256 × 256 to adjust for GPU memory.
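As a small sketch of the MoNuSeg test-time procedure, the helper below splits a 512 × 512 image into non-overlapping 256 × 256 windows, segments each separately, and stitches the predictions back together; the predict_patch callable stands in for the full (averaged) inference pipeline and is an assumption of the sketch.

```python
import torch

def sliding_window_predict(predict_patch, image, window=256):
    """Segment each window x window quadrant separately and reassemble the full-size mask."""
    _, _, h, w = image.shape
    out = torch.zeros(image.shape[0], 1, h, w)
    for top in range(0, h, window):
        for left in range(0, w, window):
            patch = image[:, :, top:top + window, left:left + window]
            out[:, :, top:top + window, left:left + window] = predict_patch(patch)
    return out
```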
It is worth noting that all baseline methods except Segformer and Stdc rely on pre-trained weights obtained on the ImageNet, PASCAL, or COCO datasets. Our networks are initialized with random weights.

Results Following previous work, Cityscapes is evaluated in one of two settings. Tight: in this setting, the samples (image and associated segmentation map) are extracted by a tight crop around the object mask. Expansion: samples are extracted by a crop around the object mask that is 15% larger than the tight crop. The inputs of the model are crops 10%-20% larger than the tight one. This setting is slightly more challenging, since there is less information on the location of the target object.

The results for the Cityscapes dataset are reported in Tab. 1. As can be seen, our method outperforms all baseline methods, across all categories and in both settings. The gap is apparent even for the most recent baseline methods and, as can be seen in Fig. 3, the gap in performance is especially sizable for datasets with fewer training images.

The results for the Vaihingen dataset are presented in Tab. 2. As can be seen, our method outperforms the results reported in previous work for all four scores.

The results for the MoNuSeg dataset are presented in Tab. 3. In both segmentation metrics, our method outperforms all previous works, including very recent variants of U-Net and transformers that were developed specifically for this segmentation task.
Method              Bicycle  Bus    Person  Train  Truck  M.cycle  Car    Rider  Mean

Expansion
Polygon-RNN++ [1]   63.06    81.38  72.41   64.28  78.90  62.01    79.08  69.95  71.38
PSP-DeepLab [5]     67.18    83.81  72.62   68.76  80.48  65.94    80.45  70.00  73.66
Polygon-GCN [29]    66.55    85.01  72.94   60.99  79.78  63.87    81.09  71.00  72.66
Spline-GCN [29]     67.36    85.43  73.72   64.40  80.22  64.86    81.88  71.73  73.70
SegDiff (ours)      69.80    85.97  76.09   75.95  80.68  67.06    83.40  72.57  76.44

Tight
Deep contour [14]   68.08    83.02  75.04   74.53  79.55  66.53    81.92  72.03  75.09
Segformer-B5 [50]   68.02    78.78  73.53   68.46  74.54  64.06    83.20  69.12  72.46
Stdc1 [11]          67.86    80.67  74.20   69.73  77.02  64.52    83.53  69.58  73.39
Stdc2 [11]          68.67    81.29  74.41   71.36  75.71  63.69    83.51  69.90  73.57
SegDiff (ours)      69.62    84.64  75.18   74.89  80.34  67.75    83.63  73.49  76.19

Table 1. Cityscapes segmentation results for two protocols: the top part refers to segmentation results with 15% expansion around the bounding box; the bottom part refers to segmentation results with a tight bounding box.
MoNuSeg segmentation results (Tab. 3):

Method            F1     mIoU
FCN [3]           28.84  28.71
U-Net [38]        79.43  65.99
U-Net++ [53]      79.49  66.04
Res-UNet [48]     79.49  66.07
A.A U-Net [46]    76.83  62.49
MedT [45]         79.55  66.17
Ours              81.59  69.00

Ablation results on the Vaihingen dataset:

Variant one       90.52  84.15  90.37  62.66
Variant three     93.77  88.67  91.69  80.15
Variant four      94.77  90.27  93.82  82.64
Variant five      93.16  87.76  91.08  79.89
Variant six       91.97  85.57  89.83  71.04
Full method       94.95  90.64  94.00  84.37
Figure 4. mIoU (mean and variance) across the test images as a function of the number of diffusion steps. (a) Results for the Cityscapes
classes, with 128 × 128 image resolution. (b) Results for the Vaihingen and MoNuSeg datasets, with 256 × 256 image resolution.
Figure 5. mIoU per number of generated inferences. (a) Results for the Cityscapes classes, with 128 × 128 image resolution. (b) Results
for the Vaihingen and MoNuSeg datasets, with 256 × 256 image resolution.
We also examined the effect of the number of generated instances on performance. The results can be seen in Fig. 5. In general, increasing the number of generated instances tends to increase the mIoU score. However, the number of runs required to reach optimal performance varies between classes. For example, for the "Bus" and "Train" classes of Cityscapes, the best score is achieved when using 10 and 3 generated instances, respectively. MoNuSeg requires considerably more runs (25) for maximal performance. On the other hand, when the number of generated instances is increased, inference time also increases linearly, resulting in a slower method compared to architectures such as Segformer and Stdc.

Another aspect of achieving improvement by employing multiple generations is calibration. The calibration score is measured as the difference between the prediction probability and the true probability of the event. For example, a perfectly calibrated model is defined by P(Ŷ = Y | P̂ = p) = p, which means that the prediction probability equals the true probability of the event. We estimate the calibration score by splitting the [0, 1] range into ten uniform bins, then averaging the squared difference between each bin's mean prediction probability and its percentage of positive samples.
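A sketch of this binned calibration score: predictions are split into ten uniform probability bins, and the squared gap between each bin's mean predicted probability and its fraction of positive pixels is averaged over the occupied bins; variable names and the handling of empty bins are ours.

```python
import torch

def calibration_score(probs, labels, n_bins=10):
    """Average squared gap between per-bin mean confidence and per-bin positive rate."""
    probs, labels = probs.flatten(), labels.flatten().float()
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    gaps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (probs >= lo) & (probs < hi) if hi < 1.0 else (probs >= lo) & (probs <= hi)
        if in_bin.any():
            gaps.append((probs[in_bin].mean() - labels[in_bin].mean()) ** 2)
    return torch.stack(gaps).mean()
```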
Figure 6. Mean calibration score (lower is better) per number of generated inferences. The error bars depict the standard error. (a) Results
for the Cityscapes classes, with an image resolution of 128 × 128. (b) Results for the Vaihingen and MoNuSeg datasets, with the 256 × 256
image resolution.
Figure 7. Results of the ablation study. (a) the input image, (b-e) results for variants one–four of our method, respectively, (f) the result of
our method, and (g) ground truth. Panels (b-f) employ the jet color scale between 0 in blue and 1 in red.
The results of examining the calibration scores are presented in Fig. 6. For most datasets, increasing the number of generated instances improves the calibration score, especially when the increase is from a single instance. In addition, for the larger classes in Cityscapes, Rider and Bicycle, and for the MoNuSeg and Vaihingen datasets, the improvement continues to increase even more compared to the other datasets. The "Train" class in Cityscapes is an exception; here, the single-instance calibration score is better than in other experiments with a larger number of generated instances. This phenomenon may be a result of the highly varied size and the small number of test images.

Ablation Study We evaluate various alternatives to our method. The first variant concatenates [F(xt), G(I)] along the channel dimension. The second variant employs FC-HarDNet-70 V2 [4] instead of RRDBs.
Figure 8. mIoU per number of RRDB blocks. (a) Results on Vaihingen, (b) Results on Cityscapes “Bus”.
Figure 9. Generation time in seconds and mIoU per number of diffusion steps for Vaihingen and Cityscapes "Bus". (a) mIoU per diffusion step, (b) Time per diffusion step.
The third variant, following [19, 41], concatenates I channelwise to xt, without using an encoder. The last alternative is to propagate F(xt) through the U-Net module and add it to G(I) after the first, third, and fifth downsample blocks (variants four–six), instead of computing F(xt) + G(I). In this variant, G(I) is downsampled to match the required number of channels by propagating it through a 2D-convolutional layer with a stride of two.
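For concreteness, the fragment below contrasts how the input to the U-Net encoder E is formed under the full method (summation, Eq. 17), variant one (channelwise concatenation of the two encodings), and variant three (concatenating I directly to xt without an image encoder); the function and argument names are illustrative.

```python
import torch

def fuse_inputs(variant, xt, image, F_enc, G_enc):
    """Return the tensor fed to the U-Net encoder E under different conditioning variants."""
    if variant == "full":        # full method: F(x_t) + G(I), as in Eq. 17
        return F_enc(xt) + G_enc(image)
    if variant == "one":         # variant one: channelwise concatenation of the two encodings
        return torch.cat([F_enc(xt), G_enc(image)], dim=1)
    if variant == "three":       # variant three: concatenate I to x_t directly, no image encoder
        return torch.cat([xt, image], dim=1)
    raise ValueError(f"unknown variant: {variant}")
```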
These variant experiments were tested by averaging nine generated instances on the Vaihingen dataset and on Cityscapes "Bus" (the performance reported for our method is therefore slightly different from that reported in Tab. 2).

The summation we introduce as a conditioning approach outperforms concatenation (variant one) on Vaihingen by a large margin, while on Cityscapes "Bus", the difference is small. The RRDB blocks are preferable to the FC-HarDNet architecture in both datasets (variant two). Removing the encoder affects the metrics significantly (variant three), slightly more so on Vaihingen. The change in the signal's integration position of variant four leads to a negligible difference on Vaihingen and even outperforms our full method on Cityscapes "Bus". Variants five and six lead to a decrease in performance as the distance from the first layer increases. Fig. 7 depicts sample results for variants one–four on the Vaihingen dataset.
Parameter sensitivity For testing the stability of our proposed method, we experimented with the two hyperparameters that can affect performance the most: the number of diffusion steps and the number of RRDB blocks. To study the effect of these parameters, we varied the number of diffusion steps in the range of [25, 50, 75, 100, 150, 200], and the number of RRDB blocks in the range of [1, 3, 5, 10] for Vaihingen and [5, 10, 15, 20, 25] for Cityscapes "Bus". We started from a baseline configuration (100 diffusion steps, 3 RRDB blocks for Vaihingen, and 10 RRDB blocks for Cityscapes "Bus") and experimented with different values around it.

The effect of the number of RRDB blocks In this part we set the number of diffusion steps to 100. As can be seen in Fig. 8, with our configuration, the optimal number of RRDB blocks is 3 for Vaihingen and 10 for Cityscapes "Bus". However, evidently, the number of blocks has a limited impact in the case of both Cityscapes and Vaihingen. The gap between the best and worst performance points is less than 1 mIoU for Vaihingen and less than 2 mIoU for Cityscapes "Bus". Therefore, we conclude that this hyperparameter has a small effect on performance.

Varying the number of diffusion steps T In this part, we set the number of RRDB blocks to 3 for Vaihingen and 10 for Cityscapes "Bus". We explore the possible accuracy/runtime tradeoff with regard to the number T of diffusion steps. Results are shown in Fig. 9.

When the number of diffusion steps is increased - as can be seen in Fig. 9(a) - the fluctuation of the graph is less than 1 mIoU for Vaihingen and less than 2 mIoU for Cityscapes "Bus".

Surprisingly, when the number of diffusion steps is reduced, even to just 25, which is a very low number compared to the literature [18, 35], the segmentation results remain stable in both datasets, with a degradation of only up to 2 mIoU for Vaihingen and 1 mIoU for Cityscapes "Bus". This reduction can speed up inference by a factor of four and provides a reasonable accuracy-to-runtime tradeoff.

The generation time of one sample, in seconds, is presented in Fig. 9(b). As can be observed, both graphs are linear, with different slopes. The main reason for this is the difference in image size (256 × 256 for Vaihingen and 128 × 128 for Cityscapes "Bus"). Another, minor reason is the difference in the number of RRDB blocks in this experiment.

6. Conclusions

A wealth of methods has been applied to image segmentation, including active contours and their deep variants, encoder-decoder architectures, and U-Nets, which - together with more recent, transformer-based methods - represent a leading approach. In this work, we propose utilizing the state-of-the-art image generation technique of diffusion models. Our diffusion model employs a U-Net architecture, which is used to incrementally improve the obtained generation, similarly to other recent diffusion models.

In order to condition on the input image, we generate another encoding path, which is similar to U-Net's encoder-decoder use in conventional image segmentation methods. The two encoder pathways are merged by summing the activations early in the U-Net's encoder.

Using our approach, we obtain state-of-the-art segmentation results on a diverse set of benchmarks, including street-view images, aerial images, and microscopy.

7. Acknowledgments

This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant ERC CoG 725974).

References

[1] David Acuna, Huan Ling, Amlan Kar, and Sanja Fidler. Efficient interactive annotation of segmentation datasets with polygon-rnn++. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 859–868, 2018.
[2] Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. Structured denoising diffusion models in discrete state-spaces. Advances in Neural Information Processing Systems, 34:17981–17993, 2021.
[3] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12):2481–2495, 2017.
[4] Ping Chao, Chao-Yang Kao, Yu-Shan Ruan, Chien-Hsiang Huang, and Youn-Long Lin. Hardnet: A low memory traffic network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3552–3561, 2019.
[5] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, 2017.
[6] Nanxin Chen, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, and William Chan. Wavegrad: Estimating gradients for waveform generation. arXiv preprint arXiv:2009.00713, 2020.
[7] Dominic Cheng, Renjie Liao, Sanja Fidler, and Raquel Urtasun. Darnet: Deep active ray network for building segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7431–7439, 2019.
[8] Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. Ilvr: Conditioning method for denoising diffusion probabilistic models. arXiv preprint arXiv:2108.02938, 2021.
[9] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3223, 2016.
[10] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems, 34, 2021.
[11] Mingyuan Fan, Shenqi Lai, Junshi Huang, Xiaoming Wei, Zhenhua Chai, Junfeng Luo, and Xiaolin Wei. Rethinking bisenet for real-time semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9716–9725, 2021.
[12] Volker Fischer, Mummadi Chaithanya Kumar, Jan Hendrik Metzen, and Thomas Brox. Adversarial examples for semantic image segmentation. arXiv preprint arXiv:1703.01101, 2017.
[13] Jun Fu, Jing Liu, Jie Jiang, Yong Li, Yongjun Bao, and Hanqing Lu. Scene segmentation with dual relation-aware attention network. IEEE Transactions on Neural Networks and Learning Systems, 32(6):2547–2560, 2020.
[14] Shir Gur, Tal Shaharabany, and Lior Wolf. End to end trainable active contours via differentiable rendering. arXiv preprint arXiv:1912.00367, 2019.
[15] Ali Hatamizadeh, Debleena Sengupta, and Demetri Terzopoulos. End-to-end trainable deep active contour models for automated image segmentation: Delineating buildings in aerial imagery. In European Conference on Computer Vision, pages 730–746. Springer, 2020.
[16] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.
[17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[18] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
[19] Jonathan Ho, Chitwan Saharia, William Chan, David J Fleet, Mohammad Norouzi, and Tim Salimans. Cascaded diffusion models for high fidelity image generation. Journal of Machine Learning Research, 23(47):1–33, 2022.
[20] Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Towards non-autoregressive language models. arXiv e-prints, pages arXiv–2102, 2021.
[21] Chin-Wei Huang, Jae Hyun Lim, and Aaron C Courville. A variational perspective on diffusion-based generative models and score matching. Advances in Neural Information Processing Systems, 34, 2021.
[22] Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, and Wenyu Liu. Ccnet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 603–612, 2019.
[23] Diederik P Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. Variational diffusion models. arXiv preprint arXiv:2107.00630, 2021.
[24] Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. Diffwave: A versatile diffusion model for audio synthesis. arXiv preprint arXiv:2009.09761, 2020.
[25] Neeraj Kumar, Ruchika Verma, Deepak Anand, Yanning Zhou, Omer Fahri Onder, Efstratios Tsougenis, Hao Chen, Pheng-Ann Heng, Jiahui Li, Zhiqiang Hu, et al. A multi-organ nucleus segmentation challenge. IEEE Transactions on Medical Imaging, 39(5):1380–1391, 2019.
[26] Neeraj Kumar, Ruchika Verma, Sanuj Sharma, Surabhi Bhargava, Abhishek Vahadane, and Amit Sethi. A dataset and a technique for generalized nuclear segmentation for computational pathology. IEEE Transactions on Medical Imaging, 36(7):1550–1560, 2017.
[27] Haoying Li, Yifan Yang, Meng Chang, Shiqi Chen, Huajun Feng, Zhihai Xu, Qi Li, and Yueting Chen. Srdiff: Single image super-resolution with diffusion probabilistic models. Neurocomputing, 2022.
[28] Justin Liang, Namdar Homayounfar, Wei-Chiu Ma, Yuwen Xiong, Rui Hu, and Raquel Urtasun. Polytransform: Deep polygon transformer for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9131–9140, 2020.
[29] Huan Ling, Jun Gao, Amlan Kar, Wenzheng Chen, and Sanja Fidler. Fast interactive object annotation with curve-gcn. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5257–5266, 2019.
[30] Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Peng Liu, and Zhou Zhao. Diffsinger: Singing voice synthesis via shallow diffusion mechanism. arXiv preprint arXiv:2105.02446, 2021.
[31] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
[32] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
[33] Pauline Luc, Camille Couprie, Soumith Chintala, and Jakob Verbeek. Semantic segmentation using adversarial networks. arXiv preprint arXiv:1611.08408, 2016.
[34] Diego Marcos, Devis Tuia, Benjamin Kellenberger, Lisa Zhang, Min Bai, Renjie Liao, and Raquel Urtasun. Learning deep structured active contours end-to-end. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8877–8885, 2018.
[35] Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.
[36] Yuval Nirkin, Lior Wolf, and Tal Hassner. Hyperseg: Patch-wise hypernetwork for real-time semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4061–4070, 2021.
[37] Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, and Mikhail Kudinov. Grad-tts: A diffusion probabilistic model for text-to-speech. In International Conference on Machine Learning, pages 8599–8608. PMLR, 2021.
[38] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
[39] Franz Rottensteiner, Gunho Sohn, Markus Gerke, and Jan D Wegner. ISPRS semantic labeling contest. ISPRS: Leopoldshöhe, Germany, 2014.
[40] Chitwan Saharia, William Chan, Huiwen Chang, Chris A Lee, Jonathan Ho, Tim Salimans, David J Fleet, and Mohammad Norouzi. Palette: Image-to-image diffusion models. arXiv preprint arXiv:2111.05826, 2021.
[41] Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J Fleet, and Mohammad Norouzi. Image super-resolution via iterative refinement. arXiv preprint arXiv:2104.07636, 2021.
[42] Robin San-Roman, Eliya Nachmani, and Lior Wolf. Noise estimation for generative diffusion models. arXiv preprint arXiv:2104.02600, 2021.
[43] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.
[44] Robin Strudel, Ricardo Garcia, Ivan Laptev, and Cordelia Schmid. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7262–7272, 2021.
[45] Jeya Maria Jose Valanarasu, Poojan Oza, Ilker Hacihaliloglu, and Vishal M Patel. Medical transformer: Gated axial-attention for medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 36–46. Springer, 2021.
[46] Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, and Liang-Chieh Chen. Axial-deeplab: Stand-alone axial-attention for panoptic segmentation. In European Conference on Computer Vision, pages 108–126. Springer, 2020.
[47] Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy. Esrgan: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pages 0–0, 2018.
[48] Xiao Xiao, Shen Lian, Zhiming Luo, and Shaozi Li. Weighted res-unet for high-quality retina vessel segmentation. In 2018 9th International Conference on Information Technology in Medicine and Education (ITME), pages 327–331. IEEE, 2018.
[49] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. Adversarial examples for semantic segmentation and object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 1369–1378, 2017.
[50] Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34, 2021.
[51] Yuan Xue, Tao Xu, Han Zhang, L Rodney Long, and Xiaolei Huang. Segan: adversarial network with multi-scale l1 loss for medical image segmentation. Neuroinformatics, 16(3):383–392, 2018.
[52] Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.
[53] Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pages 3–11. Springer, 2018.