

CrossMatch: Source-Free Domain Adaptive Semantic Segmentation via Cross-Modal Consistency Training

Yifang Yin1, Wenmiao Hu2,4, Zhenguang Liu3*, Guanfeng Wang4*, Shili Xiang1, Roger Zimmermann2
1 Institute for Infocomm Research, A*STAR   2 National University of Singapore
3 Zhejiang Gongshang University   4 Grabtaxi Holdings Pte. Ltd.
{yin yifang, sxiang}@i2r.a-star.edu.sg, [email protected], [email protected], [email protected], [email protected]

* The corresponding authors.

Abstract

Source-free domain adaptive semantic segmentation has gained increasing attention recently. It eases the requirement of full access to the source domain by transferring knowledge only from a well-trained source model. However, reducing the uncertainty of the target pseudo labels becomes inevitably more challenging without the supervision of the labeled source data. In this work, we propose a novel asymmetric two-stream architecture that learns more robustly from noisy pseudo labels. Our approach simultaneously conducts dual-head pseudo label denoising and cross-modal consistency regularization. Towards the former, we introduce a multimodal auxiliary network during training (and discard it during inference), which effectively enhances the pseudo labels' correctness by leveraging the guidance from the depth information. Towards the latter, we enforce a new cross-modal pixel-wise consistency between the predictions of the two streams, encouraging our model to behave smoothly for both modality variance and image perturbations. It serves as an effective regularization to further reduce the impact of the inaccurate pseudo labels in source-free unsupervised domain adaptation. Experiments on GTA5 → Cityscapes and SYNTHIA → Cityscapes benchmarks demonstrate the superiority of our proposed method, obtaining the new state-of-the-art mIoU of 57.7% and 57.5%, respectively.

Figure 1. Comparison of our proposed framework with existing depth-aware semantic segmentation models. (a) Prior art mostly adopts a multitask learning framework by adding depth estimation as an auxiliary task. (b) We introduce a multimodal auxiliary network that takes depth modality as an additional input for effective pseudo label denoising and consistency regularization.

1. Introduction

Semantic segmentation predicts pixel-level category labels for given scenes. Although deep neural networks have been widely adopted, attaining state-of-the-art performance relies mainly on the assumption that the training and testing data follow the same distribution [62, 32, 33]. This assumption is impractical as target scenarios often exhibit a distribution shift, e.g., street scenes collected under a cross-city [11] or cross-weather [44] environment. Unsupervised domain adaptation (UDA) techniques have been proposed to address the domain shift problem, which aim at transferring the knowledge learned from a labeled source domain to an unlabeled target domain [48, 50, 69, 67]. However, one major limitation of such UDA approaches lies in the requirement for full access to the source dataset. In practice, the source data may be restricted from being shared due to proprietary, privacy, or profit related concerns [26].

To cope with data sharing restrictions, recent efforts have investigated source-free domain adaptation, which transfers knowledge from a well-trained source model (rather than from the source data itself) to an unlabeled target domain [39, 31]. Early solutions introduce a generator to estimate the source domain based on the pre-trained source model [31], which can be used to generate fake source samples for supervision as in typical UDA.
However, due to the lack of supervision from the real source domain, advanced techniques designed for typical UDA, such as depth-aware semantic segmentation and pseudo label denoising methods, may work less satisfactorily in a source-free setting.

With the above insights, we propose a novel two-stream segmentation network for source-free UDA. As shown in Figure 1 (a), existing depth-aware semantic segmentation for typical UDA mainly adopts a multitask learning framework where depth estimation is modeled as an auxiliary task [51, 53]. However, we observe through experiments that the regularization induced by the auxiliary task is quite limited for source-free UDA due to the lack of ground-truth semantic labels. It cannot effectively prevent the main segmentation network from overfitting to the incorrect over-confident pseudo labels of the target images. To solve this problem, we alternatively propose a multimodal auxiliary network, as shown in Figure 1 (b), which takes the depth information and the intermediate representations generated by the main stream image encoder as the input. We train both the main and the auxiliary streams on the segmentation task via self-training, and formulate an explicit cross-modal consistency loss between the output of the two streams for effective regularization. The benefits of our proposed segmentation network are threefold:

First, our inference-stage model consists of the main stream only, which is a unimodal model that infers from RGB images the same way as existing models. Second, the asymmetric design of our neural network introduces modality variance in addition to the typical input perturbations produced by data augmentation, dropouts, etc. On one hand, the auxiliary network better rectifies the pseudo labels with multimodal knowledge expansion [61]. On the other hand, the cross-modal consistency effectively transfers the knowledge learned from the multimodal auxiliary network to the unimodal main network. Third, our proposed framework has better feasibility compared to existing depth-aware UDA as ours only requires the depth information in the target domain. Without annotation cost, the depth information can be easily learned from video sequences or stereo images based on self-supervised depth estimation models [17, 71, 53]. Here we summarize our contributions as follows:

• We propose a novel source-free UDA framework by introducing a multimodal auxiliary network. It models the correlations between depth and semantics, and can be discarded completely at inference time.

• We enforce a cross-modal consistency between the predictions of the main and auxiliary streams with dual-head pseudo label denoising, to reduce the impact of inaccurate pseudo labels in source-free UDA.

• Our proposed method outperforms the prior art by a significant margin, obtaining an mIoU of 57.7% and 57.5% on the Cityscapes dataset when adapting from the GTA5 and SYNTHIA benchmarks, respectively.

2. Related Work

Unsupervised domain adaptation. Unsupervised domain adaptation (UDA) aims to improve a model's performance on an unlabeled target domain by leveraging the features extracted from a labeled source domain [62]. Early works adopted adversarial training [18] to reduce the distribution mismatch between different domains [36, 15, 48, 50]. Efforts have been made on aligning the distributions at either the image level [21, 57], the intermediate feature level [11, 10] or the output level [48, 50]. Some recent attempts align the distributions in a class-wise manner in order to obtain a fine-grained feature alignment [36, 15]. However, these methods rely on cumbersome adversarial training that requires access to the source data.

UDA via self-training. Pseudo label refinement under a self-training framework has achieved competitive results in the field of UDA for semantic segmentation [30, 68, 70, 23]. Early methods selected highly confident predictions as pseudo labels based on a confidence threshold [73, 72]. To improve the robustness of the pseudo labels, efforts have been made on prediction ensembling [6, 63], pseudo label denoising [37, 28, 45, 67], training sample re-weighting [69], augmentation consistency [1, 38], leveraging high-resolution images [24], and pixel-level contrastive learning [58]. However, these approaches also rely on the source-target co-existence to retain task-specific source knowledge with self-training.

Source-free UDA. Kundu et al. [26] focused on source model generalization and developed a multi-head framework trained by extending the source data with diverse data augmentations. Teja and Fleuret [39] focused on target domain adaptation and proposed to reduce the prediction uncertainty by feature corruption with entropy regularization. Liu et al. [31] leveraged a generator to estimate the source data distribution, based on which fake samples were synthesized for training. Qiu et al. [40] proposed to generate per-class prototypes based on a source prototype generator, which is used to align the pseudo-labeled target data based on contrastive learning. To the best of our knowledge, the prior approaches [64, 66] all focused on unimodal models. Inspired by existing work on cross-modal modeling between image features and acoustic clues [65], edge maps [34], or LiDAR points [25] in different applications, we develop a new cross-modal pseudo label denoising network for depth-aware source-free UDA.

Depth-aware UDA. Motivated by multitask learning, depth estimation has been adopted as an auxiliary task to improve UDA for semantic segmentation [49, 9, 43, 3, 22].
The labels for depth estimation are mostly derived by self-supervised models using stereo pairs [16, 17] or video sequences [71]. The correlations between depth and semantics are next modeled by attention-based feature fusion [51, 53]. The depth distribution in different categories can be utilized to further reduce the domain gap [56]. However, these methods rely on the access to the source domain and assume the source and target images are available in stereo pairs or video sequences.

Figure 2. Illustration of our proposed two-stream segmentation network for source-free UDA.

3. Problem Formulation

Efforts on source-free domain adaptive semantic segmentation can be divided into 1) vendor-side domain generalization, and 2) client-side domain adaptation [26]. The vendor and the client have access to the labeled source and the unlabeled target datasets, respectively. The goal of the vendor is to train a source model with good generalization ability to unseen domains [27]. This trained source model is next passed to the client to be adapted to the unlabeled target domain via self-training [39, 31].

In this work, we propose to improve client-side domain adaptation by leveraging depth information as the auxiliary modality. Let X = \{(x_i, d_i)\}_{i=1}^{n} denote the target dataset, where (x_i, d_i) represent the RGB and the depth modality of the i-th sample, respectively. Our goal is to adapt a unimodal source model h_s(x) to a unimodal target model h_t(x) more robustly via a multimodal auxiliary network.

To achieve this goal, we present a novel two-stream neural network with a main stream and an auxiliary stream that perform semantic segmentation based on RGB and RGB-D modalities, respectively. Facilitated by the depth modality, pseudo labels obtained from the source model can be better rectified, leading to improved source-free UDA performance. Moreover, the auxiliary stream is only required during training, and will be discarded at inference time. Thus, our inference-stage model shares the same network architecture (e.g., DeepLabv2 [4]) but obtains improved segmentation results compared to the prior art.

4. Approach

We follow the pseudo-label based self-training strategies to train our source-free UDA model [26]. Target samples are passed through the source model to generate a set of pseudo labels that are used to supervise the network. One main challenge in a self-training framework is reducing the uncertainty of the pseudo labels for the target images. To tackle this challenge, we propose to denoise the offline target pseudo labels with online cross-modal consistency training. Next, we introduce the technical details of our proposed framework.

4.1. Two-stream Segmentation Network

The overall architecture of our proposed asymmetric two-stream segmentation network is shown in Figure 2. The main stream is unimodal, which takes RGB images as the only input, and can be implemented by any of the existing segmentation models such as DeepLabv2. The auxiliary stream is multimodal, which ingests depth and the intermediate features generated by the main stream image encoder to exploit the correlations between the depth and semantic information. To achieve this, we build upon the Separation-and-Aggregation Gate (SA-Gate) [8] and present a single-sided SA-Gate, termed SSA-Gate, which is placed after each of the encoder blocks. Formally, let F^{in}_{img} and F^{in}_{aux} denote the input features of the SSA-Gate from the main and auxiliary streams, respectively. SSA-Gate first recalibrates the input features with the help from the other modality by

F^{rec}_{img} = F^{in}_{img} + \text{Attn}_a(F^{in}_{img} \| F^{in}_{aux}) \circledast F^{in}_{aux},
F^{rec}_{aux} = F^{in}_{aux} + \text{Attn}_i(F^{in}_{img} \| F^{in}_{aux}) \circledast F^{in}_{img},   (1)

where F^{in}_{img} \| F^{in}_{aux} is the concatenation of the input features along the channel dimension. \text{Attn}_a and \text{Attn}_i compute the channel-wise attention for F^{in}_{aux} and F^{in}_{img}, respectively, and \circledast denotes the channel-wise multiplication.
Next, SSA-Gate merges the features from the two streams based on the spatial-wise gates proposed in [8]. Let F_{mrg} denote the merged feature; SSA-Gate updates the feature of the auxiliary stream as F^{out}_{aux} = 0.5 \cdot (F^{in}_{aux} + F_{mrg}) and keeps the feature in the main stream unchanged. With known camera parameters, we follow prior work [5, 8] and extract the HHA representation, which encodes the depth image with three channels of horizontal disparity, height above ground, and the angle of the pixel's local surface normal, as the input of our target network [19]. According to previous studies [5, 8], the HHA representation is more effective for semantic segmentation tasks. Alternatively, the 1-channel disparity maps can be directly used as the input to our framework if the camera parameters are not available.
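As a concrete illustration of the recalibration in Eq. 1 and the gated merge described above, the following is a minimal PyTorch sketch of an SSA-Gate-style block. The module and variable names are ours, and the channel-attention and spatial-gate sub-networks are simplified stand-ins for the corresponding SA-Gate components of [8], not the exact layers used in the paper.

```python
import torch
import torch.nn as nn


class SSAGateSketch(nn.Module):
    """Minimal sketch of a single-sided SA-Gate-style block (names are illustrative).

    It recalibrates each stream with channel attention computed from the concatenated
    features (Eq. 1), merges the two streams with a spatial gate, and updates only the
    auxiliary stream, leaving the main stream unchanged.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Channel-wise attention for each direction, computed from the concatenation.
        self.attn_a = nn.Sequential(  # attends to the auxiliary (depth) features
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        self.attn_i = nn.Sequential(  # attends to the image features
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())
        # Spatial gate producing per-pixel mixing weights for the merge step.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1), nn.Softmax(dim=1))

    def forward(self, f_img: torch.Tensor, f_aux: torch.Tensor):
        cat = torch.cat([f_img, f_aux], dim=1)
        # Eq. 1: recalibrate each stream with channel attention over the other modality.
        f_img_rec = f_img + self.attn_a(cat) * f_aux
        f_aux_rec = f_aux + self.attn_i(cat) * f_img
        # Spatial-wise merge of the two recalibrated streams.
        gate = self.spatial_gate(torch.cat([f_img_rec, f_aux_rec], dim=1))
        f_mrg = gate[:, 0:1] * f_img_rec + gate[:, 1:2] * f_aux_rec
        # Only the auxiliary stream is updated; the main stream passes through unchanged.
        f_aux_out = 0.5 * (f_aux + f_mrg)
        return f_img, f_aux_out
```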
4.2. Dual-head Pseudo Label Denoising with Cross-modal Consistency Regularization

Given a target sample (x, d), we use f_{img}(x) and f_{aux}(x, d) to denote the features extracted by the main and auxiliary streams as shown in Figure 2. The extracted features are next passed to the respective classifiers g_{img} and g_{aux} to obtain the predictions p_{img} and p_{aux}. A mean-teacher model [47] is maintained whose parameters are updated as the exponential moving average of the parameters of the target network. This is used to generate more reliable online pseudo labels, denoted as \tilde{p}_{img} and \tilde{p}_{aux}. Offline pseudo labels are generated using the source model based on RGB images only, i.e., p_s = h_s(x). Next, we will introduce how to formulate the objectives to optimize our proposed framework.
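The mean-teacher update itself is a simple exponential moving average of the target network's parameters [47]. Below is a minimal sketch; the momentum value is an illustrative assumption rather than a setting reported in the paper.

```python
import copy
import torch


@torch.no_grad()
def update_mean_teacher(student: torch.nn.Module, teacher: torch.nn.Module,
                        momentum: float = 0.999):
    """EMA update: teacher <- momentum * teacher + (1 - momentum) * student."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s.detach(), alpha=1.0 - momentum)
    for b_t, b_s in zip(teacher.buffers(), student.buffers()):
        b_t.copy_(b_s)


# Typical setup: the teacher starts as a copy of the student and is never trained directly.
# teacher = copy.deepcopy(student)
# for p in teacher.parameters():
#     p.requires_grad_(False)
```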

4.2.1 Cross-modal Consistency Training

Consistency regularization is a popular and essential technique in semi-supervised learning [60, 46]. Based on the model smoothness assumption, model predictions should be constrained to be invariant to small perturbations of either inputs or model hidden states [38], which can be introduced by data augmentation, dropouts, etc. To prevent the target model from overfitting to the noisy pseudo labels, we present a new cross-modal consistency regularization loss that works effectively with pseudo labeling in source-free UDA. The predictions for pixels with low-confidence pseudo labels tend to be more sensitive to input perturbations [69]. Thus, the impact of the noise in pseudo labels can be significantly reduced by enforcing a consistency regularization between the predictions of the two streams.

Given an unlabeled target image x, we pass it through the source model to generate the soft pseudo labels p_s^{(i,k)}. The hard pseudo labels \hat{y}^{(i,k)} are computed as

\hat{y}^{(i,k)} = \begin{cases} 1, & \text{if } k = \arg\max_{k'} p_s^{(i,k')} \\ 0, & \text{otherwise} \end{cases}   (2)

where p_s^{(i,k)} represents the softmax probability of pixel x^{(i)} belonging to the k-th class. Thereafter, the classification loss can be computed based on \hat{y}^{(i,k)} as

\ell_{cla} = \ell_{ce}(\hat{y}, p_{img}) + \ell_{ce}(\hat{y}, p_{aux})   (3)

where \ell_{ce}(\hat{y}, p) = -\sum_{i=1}^{H \times W} \sum_{k=1}^{K} \hat{y}^{(i,k)} \log p^{(i,k)} is the cross-entropy loss. p_{img} and p_{aux} are the predicted outputs of the main and auxiliary streams, respectively. In addition to the pseudo labeling, we introduce a cross-modal consistency loss to regularize the output between the two streams. The goal is to reduce the impact of inaccurate pseudo labels, and this consistency loss is formulated as

\ell_{reg} = D_{kl}(\tilde{p}_{aux} \| p_{img}) + D_{kl}(\tilde{p}_{img} \| p_{aux})   (4)

where \tilde{p}_{img} and \tilde{p}_{aux} are the predicted outputs of the mean-teacher model, and D_{kl}(\tilde{p}_{aux} \| p_{img}) = -\sum_{i=1}^{H \times W} \tilde{p}_{aux}^{(i)} \log (p_{img}^{(i)} / \tilde{p}_{aux}^{(i)}) is the Kullback-Leibler (KL) divergence. We perturb the input based on strong and weak augmentations, and feed them to the target network and its mean-teacher model, respectively. Since \tilde{p}_{img} and \tilde{p}_{aux} are generated based on weak augmented views, they are more reliable. They thus can be used as online soft pseudo labels to regularize the predictions p_{img} and p_{aux} inferred over the strong augmented views.

In addition to data augmentations, recall that p_{img} = g_{img}(f_{img}(x)) and p_{aux} = g_{aux}(f_{aux}(x, d)) also predict based on different input modalities. Therefore, our proposed regularization loss enforces that the target network gives consistent predictions not only for small perturbations but also over cross-modal views.
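A compact sketch of how the offline hard pseudo labels (Eq. 2), the dual-head classification loss (Eq. 3), and the cross-modal consistency term (Eq. 4) could be written in PyTorch is given below. The tensor layouts, the reduction scheme, and the use of raw logits for the student heads are our assumptions, not details fixed by the paper.

```python
import torch
import torch.nn.functional as F


def hard_pseudo_labels(p_s: torch.Tensor) -> torch.Tensor:
    """Eq. 2: hard labels from the source model's soft predictions p_s of shape (B, K, H, W)."""
    return p_s.argmax(dim=1)  # (B, H, W) class indices, the index form of the one-hot y_hat


def classification_loss(y_hat, img_logits, aux_logits):
    """Eq. 3: cross-entropy of both heads against the same offline pseudo labels."""
    return F.cross_entropy(img_logits, y_hat) + F.cross_entropy(aux_logits, y_hat)


def cross_modal_consistency(img_logits, aux_logits, p_img_teacher, p_aux_teacher):
    """Eq. 4: D_kl(p_tilde_aux || p_img) + D_kl(p_tilde_img || p_aux).

    The teacher probabilities come from weakly augmented views of (x, d); the student
    logits come from strongly augmented views of the same sample.
    """
    kl_1 = F.kl_div(F.log_softmax(img_logits, dim=1), p_aux_teacher.detach(),
                    reduction="batchmean")
    kl_2 = F.kl_div(F.log_softmax(aux_logits, dim=1), p_img_teacher.detach(),
                    reduction="batchmean")
    return kl_1 + kl_2
```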
4.2.2 Dual-head Pseudo Label Denoising

Though the pseudo labels p_s generated by the source model can be directly used to train the target network, rectifying p_s from a parallel aspect to consistency training will gain additional benefits. To this end, we adapt a recent state-of-the-art prototypical pseudo label denoising method [67] to our framework. This approach fixes p_s and rectifies p_s based on class-wise dynamic weights \omega as

\hat{p}_s^{(i,k)} = \frac{\exp(\omega^{(i,k)} \cdot p_s^{(i,k)})}{\sum_{k'=1}^{K} \exp(\omega^{(i,k')} \cdot p_s^{(i,k')})}   (5)

where p_s^{(i,k)} and \hat{p}_s^{(i,k)} represent the softmax probability of pixel x^{(i)} belonging to the k-th class before and after denoising. We perform prototypical pseudo label denoising for the main and the auxiliary streams separately. Take the main stream as an example, let f_{img}(x)^{(i)} represent the feature at pixel i. The weights \omega_{img} are updated in each training epoch based on the feature distance to the class prototypes by

\omega_{img}^{(i,k)} = \frac{\exp(-\| \tilde{f}_{img}(x)^{(i)} - \eta^{(k)}_{img} \| / \tau)}{\sum_{k'=1}^{K} \exp(-\| \tilde{f}_{img}(x)^{(i)} - \eta^{(k')}_{img} \| / \tau)}   (6)

where \eta^{(k)}_{img} is the prototype (i.e., the feature centroid) of class k in the main stream. We use \tilde{f}_{img} (i.e., the image encoder in the mean-teacher model) instead of f_{img}, as we desire a more reliable feature estimation for the input sample. \tau is the softmax temperature empirically set to 1. Similarly, we maintain class prototypes \eta^{(k)}_{aux} for the auxiliary stream, compute \omega_{aux} based on \tilde{f}_{aux}(x, d) and \eta^{(k)}_{aux}, and correct p_s based on \omega_{aux} using Eq. 5. The classification loss can then be computed based on the rectified pseudo labels \hat{y}_{img} and \hat{y}_{aux}, which are more accurate than \hat{y}.
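The prototypical rectification of Eqs. 5 and 6 can be sketched as follows. The flattened (pixels × channels) layout is our assumption, and the prototype bookkeeping follows [67] rather than this simplified illustration.

```python
import torch
import torch.nn.functional as F


def denoising_weights(feats: torch.Tensor, prototypes: torch.Tensor, tau: float = 1.0):
    """Eq. 6: per-pixel, per-class weights from the distance to the class prototypes.

    feats:      (N, C) mean-teacher features, one row per pixel (f_tilde)
    prototypes: (K, C) feature centroid eta of each class
    returns:    (N, K) softmax over negative distances with temperature tau
    """
    dist = torch.cdist(feats, prototypes)      # (N, K) Euclidean distances
    return F.softmax(-dist / tau, dim=1)


def rectify_pseudo_labels(p_s: torch.Tensor, omega: torch.Tensor):
    """Eq. 5: re-weight the fixed source probabilities p_s (N, K) and renormalise."""
    return F.softmax(omega * p_s, dim=1)


# The same two steps are run twice per image: once with the main-stream features and
# prototypes (giving omega_img) and once with the auxiliary-stream ones (omega_aux),
# producing the rectified label maps y_hat_img and y_hat_aux used by the classification loss.
```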
4.2.3 Optimization

We perform two rounds of self-training to optimize our proposed two-stream segmentation network. In both stages, we formulate the overall loss as a linear combination of the classification loss and the regularization loss

\ell^{stg} = \ell^{stg}_{cla} + \gamma \ell_{reg}   (7)

where the superscript stg ∈ {1, 2} distinguishes the loss computed in stage 1 or stage 2. \gamma is a balancing coefficient that controls the weight of the regularization loss. We empirically set \gamma = 1 in our experiments. We train the same two-stream segmentation model with the same cross-modal consistency loss as the regularization for self-training. The only difference between the two stages is how we compute the hard pseudo labels and the classification loss.

Stage one. The source model extracts the pseudo labels for the target images in the first stage. As the source model was trained on the labeled source data, the uncertainty in the pseudo labels for target images is high. Thus, applying pseudo label denoising techniques is beneficial, based on which a more robust classification loss can be computed. In our implementation, we compute the symmetric cross-entropy (SCE) [54] based on \hat{y}_{img} and \hat{y}_{aux} as

\ell^{1}_{cla} = \ell_{sce}(\hat{y}_{img}, p_{img}) + \ell_{sce}(\hat{y}_{aux}, p_{aux})   (8)

where p_{img} and p_{aux} are the predicted outputs of the main and auxiliary streams, \hat{y}_{img} and \hat{y}_{aux} are the hard pseudo labels denoised by \omega_{img} and \omega_{aux}, and \ell_{sce}(\hat{y}, p) = \alpha \ell_{ce}(p, \hat{y}) + \beta \ell_{ce}(\hat{y}, p). Following previous work [67], we set the balancing coefficients \alpha and \beta to 0.1 and 1.

Stage two. The pseudo labels for the target images are extracted by our learned target model in the first stage, which are derived from the fusion of the two streams: \hat{y} = \frac{1}{2}(p_{img} + p_{aux}). No advanced denoising methods are required in this stage as the quality of the pseudo labels is already relatively high. We compute the classification loss using Eq. 3 as \ell^{2}_{cla} = \ell_{ce}(\hat{y}, p_{img}) + \ell_{ce}(\hat{y}, p_{aux}). This stage is usually referred to as self-distillation, which has been successfully applied to typical UDA to boost a model's performance [67, 26]. Here we show that with our proposed cross-modal consistency training, one or more rounds of self-distillation can also bring substantial performance gain to source-free UDA.

4.3. Test-time Inference

Considering that the depth information may not always be available during test-time inference, we discard the multimodal auxiliary network and keep only the main stream as our inference-stage model. The reasons behind this are twofold. First, it improves the feasibility of our model as the main stream takes the RGB image as the only input. Second, we observe that the multimodal auxiliary stream only marginally outperforms the main stream after the model converges. Therefore, the accuracy loss as a trade-off for model feasibility is relatively slim. Formally, given a test image x, we compute its pixel-level semantic labels as p_{img} = g_{img}(f_{img}(x)).
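The two-stage objective of Eq. 7, with the symmetric cross-entropy of Eq. 8 in stage one and the fused pseudo labels in stage two, could be assembled as in the sketch below. The one-hot encoding, the clamping constant, and the hard form of the fused stage-two labels are our assumptions.

```python
import torch
import torch.nn.functional as F


def symmetric_cross_entropy(logits, target_onehot, alpha=0.1, beta=1.0, eps=1e-4):
    """Eq. 8 building block: l_sce(y_hat, p) = alpha * l_ce(p, y_hat) + beta * l_ce(y_hat, p).

    logits:        (B, K, H, W) raw scores of one segmentation head
    target_onehot: (B, K, H, W) one-hot rectified pseudo labels (y_hat_img or y_hat_aux)
    """
    log_p = F.log_softmax(logits, dim=1)
    ce = -(target_onehot * log_p).sum(dim=1).mean()                                # l_ce(y_hat, p)
    rce = -(log_p.exp() * target_onehot.clamp(min=eps).log()).sum(dim=1).mean()    # l_ce(p, y_hat)
    return alpha * rce + beta * ce


def stage_loss(stage, img_logits, aux_logits, reg, y_img=None, y_aux=None,
               y_fused=None, gamma=1.0):
    """Eq. 7: l_stg = l_cla^stg + gamma * l_reg, with l_cla depending on the stage."""
    if stage == 1:
        # Stage one: SCE against the pseudo labels denoised per stream (Eq. 8).
        cla = (symmetric_cross_entropy(img_logits, y_img)
               + symmetric_cross_entropy(aux_logits, y_aux))
    else:
        # Stage two: plain CE against labels fused from the stage-one model, here taken
        # as the argmax of 0.5 * (p_img + p_aux) (an assumption on the exact form).
        cla = F.cross_entropy(img_logits, y_fused) + F.cross_entropy(aux_logits, y_fused)
    return cla + gamma * reg
```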
5. Experiments

5.1. Experimental Settings

Dataset. We evaluate our proposed method by adapting from the game scenes GTA5 [41] and SYNTHIA [42] to the real scenes Cityscapes [12]. The Cityscapes dataset contains 2,975 training and 500 validation images with a resolution of 2048 × 1024. For depth, we use the disparity maps provided by the official Cityscapes dataset by default. In the ablation study, we also evaluate our method with self-supervised stereoscopic depth [44, 53] and monocular depth [55], which were trained on the stereo images and video sequences in the Cityscapes training set, respectively.

Evaluation metric. We report the Intersection over Union (IoU) on the 19 common categories shared by GTA5 and Cityscapes and the 16 common categories shared by SYNTHIA and Cityscapes. Following previous studies, we also report the results on 13 of the 16 common categories shared by the SYNTHIA and Cityscapes datasets.

Implementation details. For the source-only model, we adopt the pre-trained models on GTA5 and SYNTHIA provided by Kundu et al. [26]. Both the source model and our target model use DeepLabv2 [4] for segmentation with ResNet-101 [20] as the backbone. We insert four SSA-Gates, one after each of the four encoder blocks in ResNet-101. We train our model using the SGD solver with a momentum of 0.9 and weight decay of 2 × 10^{-4}. We use a mini-batch size of 4 and an initial learning rate of 6 × 10^{-4}. Following [67], we set the parameters for the prototypical pseudo label denoising \alpha, \beta, and \tau to 0.1, 1, and 1, respectively. We conduct an ablation study on the balancing coefficient \gamma in Eq. 7 and set \gamma = 1 in the rest of the experiments. For consistency regularization, we employ random crop as the weak augmentation and apply RandAugment [13] and Cutout [14] in addition to random crop as the strong augmentation. As the class prototypes are required for pseudo label denoising, we first train our target model on the pseudo labels generated by the source model before denoising as a warm-up. Next, we initialize the class prototypes with the learned warm-up model and continue optimizing it based on Eq. 7 for 60 epochs. In the warm-up stage, we choose the top 33% of the most confident predictions per class over the entire training set to select balanced and reliable hard pseudo labels [30, 26].

Table 1. Per-class IoU (%) and mIoU (%) comparison of GTA5 → Cityscapes adaptation. The best score for each column is highlighted.
Method SF road sidewalk building wall fence pole light sign vege. terrain sky person rider car truck bus train motor bike mIoU
FADA [52] ✗ 91.0 50.6 86.0 43.4 29.8 36.8 43.4 25.0 86.8 38.3 87.4 64.0 38.0 85.2 31.6 46.1 6.5 25.4 37.1 50.1
CAG-UDA [68] ✗ 90.4 51.6 83.8 34.2 27.8 38.4 25.3 48.4 85.4 38.2 78.1 58.6 34.6 84.7 21.9 42.7 41.1 29.3 37.2 50.2
Seg-Uncertainty [69] ✗ 90.4 31.2 85.1 36.9 25.6 37.5 48.8 48.5 85.3 34.8 81.1 64.4 36.8 86.3 34.9 52.2 1.7 29.0 44.6 50.3
IAST [37] ✗ 94.1 58.8 85.4 39.7 29.2 25.1 43.1 34.2 84.8 34.6 88.7 62.7 30.3 87.6 42.3 50.3 24.7 35.2 40.2 52.2
CorDA [53] ✗ 94.7 63.1 87.6 30.7 40.6 40.2 47.8 51.6 87.6 47.0 89.7 66.7 35.9 90.2 48.9 57.5 0.0 39.8 56.0 56.6
ProDA [67] ✗ 87.8 56.0 79.7 46.3 44.8 45.6 53.5 53.5 88.6 45.2 82.1 70.7 39.2 88.8 45.5 59.4 1.0 48.9 56.4 57.5
EHTDI [29] ✗ 95.4 68.8 88.1 37.1 41.4 42.5 45.7 60.4 87.3 42.6 86.8 67.4 38.6 90.5 66.7 61.4 0.3 39.4 56.1 58.8
BiSMAP [35] ✗ 89.2 54.9 84.4 44.1 39.3 41.6 53.9 53.5 88.4 45.1 82.3 69.4 41.8 90.4 56.4 68.8 51.2 47.8 60.4 61.2
SFDA [31] ✓ 84.2 39.2 82.7 27.5 22.1 25.9 31.1 21.9 82.4 30.5 85.3 58.7 22.1 80.0 33.1 31.5 3.6 27.8 30.6 43.2
URMA [39] ✓ 92.3 55.2 81.6 30.8 18.8 37.1 17.7 12.1 84.2 35.9 83.8 57.7 24.1 81.7 27.5 44.3 6.9 24.1 40.4 45.1
LD [66] ✓ 91.6 53.2 80.6 36.6 14.2 26.4 31.6 22.7 83.1 42.1 79.3 57.3 26.6 82.1 41.0 50.1 0.3 25.9 19.5 45.5
SRDA [2] ✓ 90.5 47.1 82.8 32.8 28.0 29.9 35.9 34.8 83.3 39.7 76.1 57.3 23.6 79.5 30.7 40.2 0.0 26.6 30.9 45.8
SFUDA [64] ✓ 95.2 40.6 85.2 30.6 26.1 35.8 34.7 32.8 85.3 41.7 79.5 61.0 28.2 86.5 41.2 45.3 15.6 33.1 40.0 49.4
GtA w/o cPAE [26] ✓ 90.9 48.6 85.5 35.3 31.7 36.9 34.7 34.8 86.2 47.8 88.5 61.7 32.6 85.9 46.9 50.4 0.0 38.9 52.4 51.6
GtA w/ cPAE [26] ✓ 91.7 53.4 86.1 37.6 32.1 37.4 38.2 35.6 86.7 48.5 89.9 62.6 34.3 87.2 51.0 50.8 4.2 42.7 53.9 53.4
Ours ✓ 93.0 60.4 87.2 46.4 41.4 38.0 45.1 51.5 87.5 48.6 83.7 63.2 31.8 88.6 49.5 60.3 0.0 47.1 47.8 56.4
Ours w/ distillation ✓ 94.5 65.5 87.4 45.7 42.6 42.3 46.7 54.5 88.3 48.0 84.7 66.0 33.4 89.9 53.5 56.8 0.0 46.9 49.4 57.7
Ours (mono) ✓ 95.0 67.0 87.4 44.0 42.2 40.7 47.5 50.8 87.1 51.0 77.5 67.7 29.9 88.5 42.0 57.4 0.0 45.3 42.5 56.0
Ours (stereo) ✓ 95.1 67.8 87.7 51.3 41.5 36.3 47.4 51.3 87.8 47.8 87.3 67.0 34.2 87.5 41.0 51.8 0.0 42.6 46.4 56.4
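Table 1 above (and Table 2 below) report per-class IoU and its mean over the shared categories, following the evaluation protocol in Sec. 5.1. For reference, a small self-contained sketch of this metric is given here; the ignore index of 255 follows common Cityscapes practice and is an assumption rather than a detail stated in the paper.

```python
import numpy as np


def per_class_iou(pred, gt, num_classes=19, ignore_index=255):
    """Per-class IoU and mIoU from integer label maps of identical shape.

    Pixels whose ground truth equals ignore_index are excluded; classes that never
    appear in either map come out as NaN and are skipped by the mean.
    """
    mask = gt != ignore_index
    hist = np.bincount(num_classes * gt[mask].astype(int) + pred[mask].astype(int),
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(hist)
    union = hist.sum(axis=0) + hist.sum(axis=1) - inter
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = inter / union
    return iou, float(np.nanmean(iou))
```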

5.2. Comparisons with State-of-the-Art Methods

We compare our proposed method with the prior art in Tables 1 and 2. The column SF indicates if the comparison method is source-free or not. As shown, our method outperforms the existing source-free methods by a large margin, achieving a state-of-the-art mIoU of 57.7% (56.4% without self-distillation) on GTA5 → Cityscapes and 57.5% (55.6% without self-distillation) on SYNTHIA → Cityscapes. We achieve the best score on 15 out of 19 common categories shared by GTA5 and Cityscapes, and on 12 out of 16 common categories shared by SYNTHIA and Cityscapes. The experimental results indicate the effectiveness of our proposed pseudo label denoising with cross-modal consistency training. As we are exploring a new direction that has not been studied in previous source-free methods, our solution is orthogonal to existing techniques such as source domain estimation [31] and the conditional Prior-enforcing AutoEncoder (cPAE) [26]. Such techniques can be combined with our proposed method for further performance gains.

Next, we compare our method to the non-source-free prior art. Starting with a well-trained source model (44.0% or 41.0% mIoU on GTA5 or SYNTHIA → Cityscapes), our method obtains competitive or even better results compared to most of the existing non-source-free UDA methods. It is worth noting that our method can be easily integrated with non-source-free UDA methods. A naive implementation is to start with an adapted model instead of the source model to generate pseudo labels for target images in stage one self-training.

5.3. Ablation Study and Discussion

Impact of the source for depth information. Our proposed method is agnostic to the acquisition of the depth information. To evaluate, we replace the depth information provided by the official Cityscapes dataset¹ by 1) the self-supervised stereoscopic depth [44] used in CorDA [53], and 2) the self-supervised monocular depth learned by the ManyDepth model [55], denoted as Ours (stereo) and Ours (mono), respectively. For the monocular depth, we directly use the 1-channel disparity map as the input, while for the stereo depth, we use the 3-channel HHA representation derived from the depth information with camera parameters as the input (see Figure 3 for visualized examples). Generally speaking, stereo depth is more accurate but its acquisition requires more expensive stereo cameras. Monocular depth can be estimated based on video sequences recorded by regular cameras. However, it is less accurate and it requires significantly more storage to manage the video sequences. We show that our proposed method is effective with different sources of depth information. In real-world scenarios, users should choose based on their own requirements and available devices.

¹ The depth provided in the official Cityscapes dataset is not the ground truth but also estimated based on stereo images.
Table 2. Per-class IoU (%) and mIoU (%) comparison of SYNTHIA → Cityscapes adaptation. The best score for each column is highlighted. mIoU and mIoU* denote the averaged scores across 16 and 13 categories, respectively.
Method SF road sidewalk building wall* fence* pole* light sign vege. sky person rider car bus motor bike mIoU mIoU*
CAG-UDA [68] ✗ 84.7 40.8 81.7 7.8 0.0 35.1 13.3 22.7 84.5 77.6 64.2 27.8 80.9 19.7 22.7 48.3 44.5 51.5
FADA [52] ✗ 84.5 40.1 83.1 4.8 0.0 34.3 20.1 27.2 84.8 84.0 53.5 22.6 85.4 43.7 26.8 27.8 45.2 52.5
Seg-Uncertainty [69] ✗ 87.6 41.9 83.1 14.7 1.7 36.2 31.3 19.9 81.6 80.6 63.0 21.8 86.2 40.7 23.6 53.1 47.9 54.9
IAST [37] ✗ 81.9 41.5 83.3 17.7 4.6 32.3 30.9 28.8 83.4 85.0 65.5 30.8 86.5 38.2 33.1 52.7 49.8 57.0
CorDA [53] ✗ 93.3 61.6 85.3 19.6 5.1 37.8 36.6 42.8 84.9 90.4 69.7 41.8 85.6 38.4 32.6 53.9 55.0 62.8
ProDA [67] ✗ 87.8 45.7 84.6 37.1 0.6 44.0 54.6 37.0 88.1 84.4 74.2 24.3 88.2 51.1 40.5 45.6 55.5 62.0
EHTDI [29] ✗ 93.0 69.8 84.0 36.6 9.1 39.7 42.2 43.8 88.2 88.1 68.3 29.0 85.5 54.1 37.1 56.3 57.8 64.6
BiSMAP [35] ✗ 81.9 39.8 84.2 - - - 41.7 46.1 83.4 88.7 69.2 39.3 80.7 51.0 51.2 58.8 - 62.8
SFDA [31] ✓ 81.9 44.9 81.7 4.0 0.5 26.2 3.3 10.7 86.3 89.4 37.9 13.4 80.6 25.6 9.6 31.3 39.2 45.9
URMA [39] ✓ 59.3 24.6 77.0 14.0 1.8 31.5 18.3 32.0 83.1 80.4 46.3 17.8 76.7 17.0 18.5 34.6 39.6 45.0
LD [66] ✓ 77.1 33.4 79.4 5.8 0.5 23.7 5.2 13.0 81.8 78.3 56.1 21.6 80.3 49.6 28.0 48.1 42.6 50.1
SFUDA [64] ✓ 90.9 45.5 80.8 3.6 0.5 28.6 8.5 26.1 83.4 83.6 55.2 25.0 79.5 32.8 20.2 43.9 44.2 51.9
GtA w/o cPAE [26] ✓ 89.0 44.6 80.1 7.8 0.7 34.4 22.0 22.9 82.0 86.5 65.4 33.2 84.8 45.8 38.4 31.7 48.1 55.5
GtA w/ cPAE [26] ✓ 90.5 50.0 81.6 13.3 2.8 34.7 25.7 33.1 83.8 89.2 66.0 34.9 85.3 53.4 46.1 46.6 52.0 60.1
Ours ✓ 91.5 55.5 85.4 34.4 8.3 40.8 40.0 44.4 86.6 84.3 62.4 22.0 88.3 60.0 40.6 45.6 55.6 62.1
Ours w/ distillation ✓ 91.5 56.3 85.9 37.9 9.2 42.1 42.6 47.6 87.2 86.1 64.5 23.3 89.3 64.5 45.0 47.7 57.5 64.0
Ours (mono) ✓ 91.2 56.6 85.0 36.5 6.8 41.6 45.5 18.8 86.5 86.2 66.4 26.7 88.7 58.2 44.3 48.0 55.4 61.7
Ours (stereo) ✓ 91.6 56.4 85.7 29.3 7.8 41.2 42.0 37.6 86.8 85.9 65.2 27.3 88.4 59.5 44.4 47.8 56.0 63.0

Figure 3. Visualization of the depth and the HHA representation obtained by different methods. (Panels: Mono depth, Stereo depth, Official depth; Original image, Stereo HHA, Official HHA.)

Table 3. Comparison of different utilization strategies of the depth information for source-free UDA on GTA5 → Cityscapes. * indicates we made minimum modifications to make the method compatible with source-free settings.
Method BG MC RIV RIG DS mIoU gain
Source only [26] 55.3 19.4 28.7 62.9 53.7 44.0 -
DADA* [51] 61.5 26.9 36.1 72.1 55.8 50.1 +6.1
CorDA* [53] 60.5 27.3 39.0 73.8 55.6 50.5 +6.5
MKE* [61] 62.2 27.8 40.4 70.9 57.8 51.5 +7.5
Ours 65.8 31.7 44.9 76.7 65.4 56.4 +12.4
² Background (BG) - building, wall, fence, vegetation, terrain, sky; Minority Class (MC) - rider, train, motorcycle, bicycle; Road Infrastructure Vertical (RIV) - pole, traffic light, traffic sign; Road Infrastructure Ground (RIG) - road, sidewalk; and Dynamic Stuff (DS) - person, car, truck, bus.

Table 4. Model justification of our proposed framework on GTA5 → Cityscapes. The auxiliary modality column indicates if depth modality is used during training or not.
Stage 1 components (auxiliary modality / self-training / consistency regularization / pseudo label denoising), mIoU, gain:
source model: 44.0 (-)
self-training: 50.5 (+6.5)
self-training + consistency regularization: 51.2 (+7.2)
self-training + pseudo label denoising: 52.7 (+8.7)
self-training + consistency regularization + pseudo label denoising: 55.1 (+11.1)
auxiliary modality + self-training: 50.9 (+6.9)
auxiliary modality + self-training + consistency regularization: 51.6 (+7.6)
auxiliary modality + self-training + pseudo label denoising: 54.2 (+10.2)
auxiliary modality + self-training + consistency regularization + pseudo label denoising: 56.4 (+12.4)
Stage 2 components (auxiliary modality / self-distillation / initialization), mIoU, gain:
auxiliary modality + self-distillation + stage 1 initialization: 57.6 (+13.6)
auxiliary modality + self-distillation + self-supervised initialization: 57.7 (+13.7)

Utilization strategies on the depth information. Existing depth-aware domain adaptive semantic segmentation methods mostly follow a multitask learning framework where depth estimation is modeled as the auxiliary task [51, 53]. We modified two depth-aware UDA methods to make them applicable in a source-free setting by calculating the classification loss based on the pseudo labeled target images only. The results are reported in Table 3². As shown, without the supervision of the labeled source data, the regularization induced by the auxiliary task is quite limited. Moreover, we compare our approach to a Multimodal Knowledge Expansion (MKE) method [61] that transfers knowledge from a unimodal teacher network to a multimodal student network. However, as this method did not address the domain shift issue between the source model and the target images, it performs less effectively than our proposed approach. Furthermore, the inference-stage model in MKE is multimodal, while ours is unimodal with better feasibility.
Figure 4. Qualitative results of source-free semantic segmentation on the Cityscapes dataset. From left to right: input, output of the source model, output of the GtA model with cPAE [26], output of our proposed model without self-distillation, ground-truth segmentation mask.

Effectiveness of cross-modal pseudo label denoising. Our proposed framework consists of two major components, namely the multimodal auxiliary network and cross-modal consistency training. As shown in Table 4, we start with a source model that obtains an mIoU of 44.0% on GTA5 → Cityscapes. By training the network without our proposed consistency regularization, it achieves an mIoU of 50.9% and 54.2%, respectively, based on the supervision of the classification loss only before and after the pseudo label denoising. By combining our proposed consistency regularization with pseudo label denoising, we obtain a new state-of-the-art mIoU of 56.4%, outperforming the source model significantly by 12.4%. To evaluate the benefits introduced by the depth modality, we replaced our multimodal auxiliary network with a unimodal network with the same architecture as the main stream. The mIoU decreases in all cases by using RGB as the only input. Next, we evaluate our cross-modal consistency training in self-distillation. We initialize our model either with the weights of the learned model in stage one (i.e., stage 1 initialization) or with SimCLRv2 [7] pretrained weights (i.e., self-supervised initialization). In both cases, we observe a performance gain of around 1.3% over the stage one model. The qualitative evaluation of our method is illustrated in Figure 4.

Table 5. Impact of the source model on GTA5 → Cityscapes.
source model | source training | target model | target adaptation | mIoU
DeepLabv2 | data aug. | - | - | 38.6
DeepLabv2 [26] | multi-head | - | - | 44.0
DeepLabv2 | multi-head | SegFormer | self-training | 51.3
DeepLabv2 | multi-head | DeepLabv2 | self-training | 50.5
DeepLabv2 | multi-head | DeepLabv2* | our proposed | 56.4
SegFormer [59] | data aug. | - | - | 43.2
SegFormer | data aug. | SegFormer | self-training | 50.5
SegFormer | data aug. | DeepLabv2 | self-training | 49.4
SegFormer | data aug. | DeepLabv2* | our proposed | 55.5
GtA w/ cPAE | SF adapted | - | - | 53.4
GtA w/ cPAE | SF adapted | DeepLabv2* | our proposed | 57.3
ProDA [67] | non-SF adapted | - | - | 57.5
ProDA | non-SF adapted | DeepLabv2* | our proposed | 59.5

Impact of the source model. The majority of the source-free UDA methods are built upon DeepLab models. Here we evaluate a Transformer-based model, namely SegFormer [59], as the source and target models in a source-free UDA setting. As Table 5 shows, SegFormer has better generalization ability than DeepLabv2. With data augmentation only, a source SegFormer model obtains an mIoU of 43.2%, outperforming a source DeepLabv2 model by 4.6%. Moreover, when being adopted as the target model, SegFormer achieves an mIoU of 51.3% and 50.5%, respectively. It outperforms the corresponding DeepLabv2 by 0.8% and 1.1% when being adapted from the same source model. To verify that our method is orthogonal to previous work, we also start with a source-free model (i.e., GtA w/ cPAE [26]) and a non-source-free model (i.e., ProDA [67]), and apply our method on top of them. As can be seen, the mIoU has been further improved by 3.9% and 2%, respectively.

Table 6. The effect of the balancing coefficient γ.
γ | 0.5 | 1 | 2 | 5 | 10
mIoU | 56.0 | 56.4 | 56.2 | 56.7 | 55.7

Table 7. The mIoU obtained by the multimodal auxiliary network with varying numbers of SSA-Gates.
SSA-Gate no. | 1 | 2 | 3 | 4
mIoU | 43.4 | 49.2 | 53.1 | 56.6

Parameter sensitivity analysis. Finally, we study the impact of the balancing coefficient γ in Eq. 7 on the self-training in stage one. We set γ to different values, conduct experiments on GTA5 → Cityscapes, and report the results in Table 6. The experimental results show that our proposed method is not sensitive to the balancing factor γ. In our previous experiments, we empirically set γ = 1; the mIoU can be slightly improved by setting γ = 5. We obtain a state-of-the-art mIoU of 55.7% ∼ 56.7% for γ ∈ [0.5, 10], which underscores the robustness of our proposed cross-modal consistency training technique. Table 7 shows the mIoU obtained by the multimodal auxiliary network with varying numbers of SSA-Gates. The mIoU decreases significantly to 43.4% with only one SSA-Gate, which indicates that predicting the semantic labels from depth alone is challenging without sufficient information exchange with RGB images.

6. Conclusions

We propose to enhance source-free domain adaptive semantic segmentation via cross-modal consistency training. To achieve this goal, we introduce a multimodal auxiliary network to leverage the guidance from the depth modality during training. A cross-modal consistency loss is formulated between the output of the main and the auxiliary networks, which serves as an effective regularization for source-free UDA. Our proposed approach not only outperforms the source-free prior art by a large margin, but also reduces the gap between source-free and non-source-free UDA methods in semantic segmentation.
References

[1] Nikita Araslanov and Stefan Roth. Self-supervised augmentation consistency for adapting semantic segmentation. In CVPR, pages 15384–15394, 2021.
[2] Mathilde Bateson, Hoel Kervadec, Jose Dolz, Hervé Lombaert, and Ismail Ben Ayed. Source-relaxed domain adaptation for image segmentation. In MICCAI, pages 490–499, 2020.
[3] Adriano Cardace, Luca De Luigi, Pierluigi Zama Ramirez, Samuele Salti, and Luigi Di Stefano. Plugging self-supervised monocular depth into unsupervised domain adaptation for semantic segmentation. In WACV, pages 1129–1139, 2022.
[4] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, 2017.
[5] Lin-Zhuo Chen, Zheng Lin, Ziqin Wang, Yong-Liang Yang, and Ming-Ming Cheng. Spatial information guided convolution for real-time RGBD semantic segmentation. IEEE Transactions on Image Processing, 30:2313–2324, 2021.
[6] Minghao Chen, Hongyang Xue, and Deng Cai. Domain adaptation for semantic segmentation with maximum squares loss. In ICCV, pages 2090–2099, 2019.
[7] Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey E Hinton. Big self-supervised models are strong semi-supervised learners. NeurIPS, 33:22243–22255, 2020.
[8] Xiaokang Chen, Kwan-Yee Lin, Jingbo Wang, Wayne Wu, Chen Qian, Hongsheng Li, and Gang Zeng. Bi-directional cross-modality feature propagation with separation-and-aggregation gate for RGB-D semantic segmentation. In ECCV, pages 561–577, 2020.
[9] Yuhua Chen, Wen Li, Xiaoran Chen, and Luc Van Gool. Learning semantic segmentation from synthetic data: A geometrically guided input-output adaptation approach. In CVPR, pages 1841–1850, 2019.
[10] Yuhua Chen, Wen Li, and Luc Van Gool. Road: Reality oriented adaptation for semantic segmentation of urban scenes. In CVPR, pages 7892–7901, 2018.
[11] Yi-Hsin Chen, Wei-Yu Chen, Yu-Ting Chen, Bo-Cheng Tsai, Yu-Chiang Frank Wang, and Min Sun. No more discrimination: Cross city adaptation of road scene segmenters. In ICCV, pages 1992–2001, 2017.
[12] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In CVPR, pages 3213–3223, 2016.
[13] Ekin D Cubuk, Barret Zoph, Jonathon Shlens, and Quoc V Le. Randaugment: Practical automated data augmentation with a reduced search space. In CVPR Workshops, pages 702–703, 2020.
[14] Terrance DeVries and Graham W Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.
[15] Liang Du, Jingang Tan, Hongye Yang, Jianfeng Feng, Xiangyang Xue, Qibao Zheng, Xiaoqing Ye, and Xiaolin Zhang. SSF-DAN: Separated semantic feature based domain adaptation network for semantic segmentation. In ICCV, pages 982–991, 2019.
[16] Ravi Garg, Vijay Kumar Bg, Gustavo Carneiro, and Ian Reid. Unsupervised cnn for single view depth estimation: Geometry to the rescue. In ECCV, pages 740–756. Springer, 2016.
[17] Clément Godard, Oisin Mac Aodha, and Gabriel J Brostow. Unsupervised monocular depth estimation with left-right consistency. In CVPR, pages 270–279, 2017.
[18] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020.
[19] Saurabh Gupta, Ross Girshick, Pablo Arbeláez, and Jitendra Malik. Learning rich features from RGB-D images for object detection and segmentation. In ECCV, pages 345–360, 2014.
[20] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pages 770–778, 2016.
[21] Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate Saenko, Alexei Efros, and Trevor Darrell. Cycada: Cycle-consistent adversarial domain adaptation. In ICML, pages 1989–1998, 2018.
[22] Lukas Hoyer, Dengxin Dai, Yuhua Chen, Adrian Koring, Suman Saha, and Luc Van Gool. Three ways to improve semantic segmentation with self-supervised depth estimation. In CVPR, pages 11130–11140, 2021.
[23] Lukas Hoyer, Dengxin Dai, and Luc Van Gool. Daformer: Improving network architectures and training strategies for domain-adaptive semantic segmentation. In CVPR, pages 9924–9935, 2022.
[24] Lukas Hoyer, Dengxin Dai, and Luc Van Gool. HRDA: Context-aware high-resolution domain-adaptive semantic segmentation. In ECCV, 2022.
[25] Maximilian Jaritz, Tuan-Hung Vu, Raoul de Charette, Emilie Wirbel, and Patrick Pérez. xMUDA: Cross-modal unsupervised domain adaptation for 3d semantic segmentation. In CVPR, pages 12605–12614, 2020.
[26] Jogendra Nath Kundu, Akshay Kulkarni, Amit Singh, Varun Jampani, and R Venkatesh Babu. Generalize then adapt: Source-free domain adaptive semantic segmentation. In ICCV, pages 7046–7056, 2021.
[27] Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M Hospedales. Deeper, broader and artier domain generalization. In ICCV, pages 5542–5550, 2017.
[28] Guangrui Li, Guoliang Kang, Wu Liu, Yunchao Wei, and Yi Yang. Content-consistent matching for domain adaptive semantic segmentation. In ECCV, pages 440–456, 2020.
[29] Junjie Li, Zilei Wang, Yuan Gao, and Xiaoming Hu. Exploring high-quality target domain information for unsupervised domain adaptive semantic segmentation. In ACM Multimedia, pages 5237–5245, 2022.
[30] Yunsheng Li, Lu Yuan, and Nuno Vasconcelos. Bidirectional learning for domain adaptation of semantic segmentation. In CVPR, pages 6936–6945, 2019.
[31] Yuang Liu, Wei Zhang, and Jun Wang. Source-free domain adaptation for semantic segmentation. In CVPR, pages 1215–1224, 2021.
[32] Zhenguang Liu, Haoming Chen, Runyang Feng, Shuang Wu, Shouling Ji, Bailin Yang, and Xun Wang. Deep dual consecutive network for human pose estimation. In CVPR, pages 525–534, 2021.
[33] Zhenguang Liu, Shuang Wu, Shuyuan Jin, Qi Liu, Shijian Lu, Roger Zimmermann, and Li Cheng. Towards natural and accurate future motion prediction of humans and animals. In CVPR, pages 10004–10012, 2019.
[34] Adrian Lopez-Rodriguez and Krystian Mikolajczyk. Desc: Domain adaptation for depth estimation via semantic consistency. International Journal of Computer Vision, 131(3):752–771, 2023.
[35] Yulei Lu, Yawei Luo, Li Zhang, Zheyang Li, Yi Yang, and Jun Xiao. Bidirectional self-training with multiple anisotropic prototypes for domain adaptive semantic segmentation. In ACM Multimedia, pages 1405–1415, 2022.
[36] Yawei Luo, Liang Zheng, Tao Guan, Junqing Yu, and Yi Yang. Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation. In ICCV, pages 2507–2516, 2019.
[37] Ke Mei, Chuang Zhu, Jiaqi Zou, and Shanghang Zhang. Instance adaptive self-training for unsupervised domain adaptation. In ECCV, 2020.
[38] Luke Melas-Kyriazi and Arjun K Manrai. Pixmatch: Unsupervised domain adaptation via pixelwise consistency training. In CVPR, pages 12435–12445, 2021.
[39] S Prabhu Teja and François Fleuret. Uncertainty reduction for model adaptation in semantic segmentation. In CVPR, pages 9613–9623, 2021.
[40] Zhen Qiu, Yifan Zhang, Hongbin Lin, Shuaicheng Niu, Yanxia Liu, Qing Du, and Mingkui Tan. Source-free domain adaptation via avatar prototype generation and adaptation. In IJCAI, 2021.
[41] Stephan R Richter, Vibhav Vineet, Stefan Roth, and Vladlen Koltun. Playing for data: Ground truth from computer games. In ECCV, pages 102–118, 2016.
[42] German Ros, Laura Sellart, Joanna Materzynska, David Vazquez, and Antonio M Lopez. The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In CVPR, pages 3234–3243, 2016.
[43] Suman Saha, Anton Obukhov, Danda Pani Paudel, Menelaos Kanakis, Yuhua Chen, Stamatios Georgoulis, and Luc Van Gool. Learning to relate depth and semantics for unsupervised domain adaptation. In CVPR, pages 8197–8207, 2021.
[44] Christos Sakaridis, Dengxin Dai, Simon Hecker, and Luc Van Gool. Model adaptation with synthetic and real data for semantic dense foggy scene understanding. In ECCV, pages 687–704, 2018.
[45] Inkyu Shin, Sanghyun Woo, Fei Pan, and In So Kweon. Two-phase pseudo label densification for self-training based domain adaptation. In ECCV, pages 532–548, 2020.
[46] Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin A Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. NeurIPS, 33:596–608, 2020.
[47] Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS, 30, 2017.
[48] Yi-Hsuan Tsai, Wei-Chih Hung, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang, and Manmohan Chandraker. Learning to adapt structured output space for semantic segmentation. In CVPR, pages 7472–7481, 2018.
[49] Simon Vandenhende, Stamatios Georgoulis, Wouter Van Gansbeke, Marc Proesmans, Dengxin Dai, and Luc Van Gool. Multi-task learning for dense prediction tasks: A survey. PAMI, 2021.
[50] Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, and Patrick Pérez. Advent: Adversarial entropy minimization for domain adaptation in semantic segmentation. In CVPR, pages 2517–2526, 2019.
[51] Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, and Patrick Pérez. Dada: Depth-aware domain adaptation in semantic segmentation. In ICCV, pages 7364–7373, 2019.
[52] Haoran Wang, Tong Shen, Wei Zhang, Ling-Yu Duan, and Tao Mei. Classes matter: A fine-grained adversarial approach to cross-domain semantic segmentation. In ECCV, pages 642–659, 2020.
[53] Qin Wang, Dengxin Dai, Lukas Hoyer, Luc Van Gool, and Olga Fink. Domain adaptive semantic segmentation with self-supervised depth estimation. In ICCV, pages 8515–8525, 2021.
[54] Yisen Wang, Xingjun Ma, Zaiyi Chen, Yuan Luo, Jinfeng Yi, and James Bailey. Symmetric cross entropy for robust learning with noisy labels. In ICCV, pages 322–330, 2019.
[55] Jamie Watson, Oisin Mac Aodha, Victor Prisacariu, Gabriel Brostow, and Michael Firman. The temporal opportunist: Self-supervised multi-frame monocular depth. In CVPR, 2021.
[56] Quanliang Wu and Huajun Liu. Unsupervised domain adaptation for semantic segmentation using depth distribution. In Advances in Neural Information Processing Systems.
[57] Zuxuan Wu, Xin Wang, Joseph E Gonzalez, Tom Goldstein, and Larry S Davis. ACE: Adapting to changing environments for semantic segmentation. In ICCV, pages 2121–2130, 2019.
[58] Binhui Xie, Shuang Li, Mingjia Li, Chi Harold Liu, Gao Huang, and Guoren Wang. SePiCo: Semantic-guided pixel contrast for domain adaptive semantic segmentation. arXiv preprint arXiv:2204.08808, 2022.
[59] Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. SegFormer: Simple and efficient design for semantic segmentation with transformers. NeurIPS, 34:12077–12090, 2021.
[60] Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, and Quoc Le. Unsupervised data augmentation for consistency training. NeurIPS, 33:6256–6268, 2020.
[61] Zihui Xue, Sucheng Ren, Zhengqi Gao, and Hang Zhao. Multimodal knowledge expansion. In ICCV, pages 854–863, 2021.
[62] Jihan Yang, Ruijia Xu, Ruiyu Li, Xiaojuan Qi, Xiaoyong Shen, Guanbin Li, and Liang Lin. An adversarial perturbation oriented domain adaptation approach for semantic segmentation. In AAAI, pages 12613–12620, 2020.
[63] Yanchao Yang and Stefano Soatto. FDA: Fourier domain adaptation for semantic segmentation. In CVPR, pages 4085–4095, 2020.
[64] Mucong Ye, Jing Zhang, Jinpeng Ouyang, and Ding Yuan. Source data-free unsupervised domain adaptation for semantic segmentation. In ACM Multimedia, pages 2233–2242, 2021.
[65] Yifang Yin, Harsh Shrivastava, Ying Zhang, Zhenguang Liu, Rajiv Ratn Shah, and Roger Zimmermann. Enhanced audio tagging via multi- to single-modal teacher-student mutual learning. In AAAI, volume 35, pages 10709–10717, 2021.
[66] Fuming You, Jingjing Li, Lei Zhu, Zhi Chen, and Zi Huang. Domain adaptive semantic segmentation without source data. In ACM Multimedia, pages 3293–3302, 2021.
[67] Pan Zhang, Bo Zhang, Ting Zhang, Dong Chen, Yong Wang, and Fang Wen. Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation. In CVPR, pages 12414–12424, 2021.
[68] Qiming Zhang, Jing Zhang, Wei Liu, and Dacheng Tao. Category anchor-guided unsupervised domain adaptation for semantic segmentation. NeurIPS, 32, 2019.
[69] Zhedong Zheng and Yi Yang. Rectifying pseudo label learning via uncertainty estimation for domain adaptive semantic segmentation. International Journal of Computer Vision, pages 1106–1120, 2021.
[70] Qianyu Zhou, Zhengyang Feng, Qiqi Gu, Jiangmiao Pang, Guangliang Cheng, Xuequan Lu, Jianping Shi, and Lizhuang Ma. Context-aware mixup for domain adaptive semantic segmentation. IEEE Transactions on Circuits and Systems for Video Technology, 2022.
[71] Tinghui Zhou, Matthew Brown, Noah Snavely, and David G Lowe. Unsupervised learning of depth and ego-motion from video. In CVPR, pages 1851–1858, 2017.
[72] Yang Zou, Zhiding Yu, BVK Kumar, and Jinsong Wang. Unsupervised domain adaptation for semantic segmentation via class-balanced self-training. In ECCV, pages 289–305, 2018.
[73] Yang Zou, Zhiding Yu, Xiaofeng Liu, BVK Kumar, and Jinsong Wang. Confidence regularized self-training. In ICCV, pages 5982–5991, 2019.
