
Cluster-guided Asymmetric Contrastive Learning for Unsupervised Person Re-Identification

Mingkun Li, Chun-Guang Li, Senior Member, IEEE, and Jun Guo

M. Li, C.-G. Li and J. Guo are with the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, P.R. China. E-mail: {mingkun.li, lichunguang, guojun}@bupt.edu.cn. Chun-Guang Li is the corresponding author. Manuscript received xx, 2021; revised xx, xxxx.

Abstract—Unsupervised person re-identification (Re-ID) aims to match pedestrian images from different camera views in an unsupervised setting. Existing methods for unsupervised person Re-ID are usually built upon the pseudo labels from clustering. However, the result of clustering depends heavily on the quality of the learned features, which are overwhelmingly dominated by colors in images. In this paper, we attempt to suppress the negative dominating influence of colors to learn more effective features for unsupervised person Re-ID. Specifically, we propose a Cluster-guided Asymmetric Contrastive Learning (CACL) approach for unsupervised person Re-ID, in which the clustering result is leveraged to guide the feature learning in a properly designed asymmetric contrastive learning framework. In CACL, both instance-level and cluster-level contrastive learning are employed to help the siamese network learn discriminant features with respect to the clustering result, within and between different data augmentation views, respectively. In addition, we also present a cluster refinement method, and validate that the cluster refinement step helps CACL significantly. Extensive experiments conducted on three benchmark datasets demonstrate the superior performance of our proposal.

Index Terms—Unsupervised Person Re-Identification, Asymmetric Contrastive Learning, Cluster Refinement.

Fig. 1. Illustration of the basic idea of our proposal. We attempt to leverage the clustering information into contrastive learning to find more effective features by exploring the invariance between color images and gray-scale images.
I. INTRODUCTION

UNSUPERVISED person Re-identification (Re-ID) aims to match pedestrian images from different camera views in an unsupervised setting without demanding massive labelled data, and it has attracted increasing attention in the computer vision and pattern recognition community in recent years [1]. The great challenge in unsupervised person Re-ID is to tackle heavy variations from different viewpoints, varying illuminations, changing weather conditions, cluttered backgrounds, etc., without supervision labels.

Recently, existing methods for unsupervised person Re-ID are usually built on exploiting weak supervision information (e.g., pseudo labels) from clustering. For example, MMT [2] uses the DBSCAN [3] algorithm to generate pseudo labels and exploits the pseudo labels to train two networks; HCT [4] uses a hierarchical clustering algorithm to gradually assign pseudo labels to the training samples during the training stage; SSG [5] uses k-means on training samples with multiple views. However, the performance of these methods heavily relies on the quality of the pseudo labels, which directly depends on the feature representation of the input images.

More recently, contrastive learning has been applied to perform feature learning in the unsupervised setting, e.g., [6], [7], [8], [9], [10]. The primary idea in these methods is to learn some invariance in the feature representation with a self-supervised mechanism based on data augmentation. In SimCLR [8], each sample and its multiple augmentations are treated as positive pairs, the rest of the samples in the same batch are treated as negative pairs, and a contrastive loss is used to distinguish the positive and negative samples to prevent the model from falling into a trivial solution. We note that SimCLR requires a large batch size, e.g., 256 to 4096, to contain enough negative samples for effectively training the networks. In BYOL [10] and SimSiam [9], a predictor layer is used to prevent feature collapse without using negative samples. In InterCLR [6] and SwAV [7], clustering is used to prevent feature collapse. In particular, in SwAV [7], a scalable online clustering loss is proposed to train the siamese network with multi-crop data augmentation; whereas in InterCLR [6], a MarginNCE loss is proposed to enhance the discriminant power. While promising performance has been reported on ImageNet [11], these contrastive learning methods are not suitable for unsupervised person Re-ID due to serious feature collapse.

In this paper, we attempt to leverage cluster information into contrastive learning to develop an effective framework for unsupervised person Re-ID. We notice that the performance of person Re-ID depends heavily on the effectiveness of the learned features. However, the learned features are
overwhelmingly dominated by the colors in pedestrian images (such as the clothing color and background color), especially in the unsupervised setting. For example, pedestrian images with similar clothing colors often have smaller distances in the feature space, which may result in mistakes in clustering, and the mistakes in clustering may further bring wrong guidance to the pseudo labels for training the network. Although color is an important feature for matching pedestrian images in person Re-ID, it may also become an obstacle to learning more subtle and effective texture features, which are important fine-level cues for person Re-ID. Thus it is desirable to learn more robust and discriminating features that can resist dominant colors for the person Re-ID task.

Unfortunately, it is quite challenging to properly suppress the negative impact of colors for learning more effective fine-grained features without loss of discriminant information. For example, directly using random color changing (i.e., color-jitter [12]) for data augmentation in contrastive training may damage the consistency of the color distribution, which is not that helpful for gaining generalization ability on unseen samples. To this end, in this paper, we propose a novel and effective framework for unsupervised person Re-ID, termed Cluster-guided Asymmetric Contrastive Learning (CACL), in which clustering information is properly incorporated into contrastive learning to learn robust and discriminant features while suppressing dominant colors, as illustrated in Fig. 1. To be specific, we explore supervision information from the perspective of suppressing colors in the framework of cluster-guided contrastive learning, in which the samples in asymmetric views of specifically designed data augmentations (e.g., color images vs. gray-scale images), as shown in Fig. 2, are exploited to provide strong supervision to impose invariance in feature learning. By integrating the clustering results into contrastive learning, the proposed framework is able to avoid feature collapse. By suppressing dominant colors, the proposed framework is able to effectively learn robust and discriminating features other than colors. In addition, we also present a simple but effective cluster refinement method to improve the clustering result and thus further enhance the contrastive learning. We conduct extensive experiments on three benchmark datasets, and the experimental results validate the effectiveness of our proposal.

Fig. 2. Illustration of the raw images and the augmented images: (a) Raw, (b) T, (c) T', (d) G ∘ T'. The first column shows the raw images. The middle two columns show the images generated with the transforms T(·) and T'(·). The last column shows the corresponding gray-scale images, which are generated with both the transform T'(·) and the color-to-grayscale transform G(·), i.e., G ∘ T'(·).

Paper Contributions. The contributions of the paper are highlighted as follows.

1) We propose an effective unsupervised framework that leverages clustering information into contrastive learning while suppressing the dominant colors in images to learn fine-grained features.
2) We propose a novel cluster-level loss function to perform inter-view and intra-view contrastive learning that can effectively exploit the cluster-level hidden information from different data augmentation views.
3) We also present a cluster refinement method and verify that the refined clustering information helps the contrastive learning framework significantly.

The remainder of this paper is organized as follows. Section II describes the relevant work. Section III presents our proposal. Section IV shows experiments and Section V gives the conclusions.

II. RELATED WORK

A. Unsupervised Person Re-identification

Person Re-ID aims to find specific pedestrians in videos or images according to given targets. Owing to the increasing demand in real life and to avoid the high cost of labeling datasets, unsupervised person Re-ID has become popular in recent years [1]. The existing unsupervised person Re-ID methods can be divided into two categories: a) unsupervised domain adaptation methods, which need a labeled source dataset and an unlabeled target dataset [13], [14], [15], [16]; and b) purely unsupervised methods, which need only an unlabeled dataset [17], [18], [19], [20].

The unsupervised domain adaptation methods train the network with the help of labeled datasets, and transfer the network to unlabeled datasets by reducing the gap between the two datasets. For example, [21] proposed to align the second-order statistics of the distributions in the two domains through linear transformations to reduce the domain shift; [17] proposed a combined loss function to co-train with samples from the source and target domains and a merging memory bank; [22] proposed to maximize the inter-domain classification loss and minimize the intra-domain classification loss to learn domain-robust features. However, unsupervised domain adaptation methods are limited by the requirement that the target dataset have a distribution close to that of the source dataset.

Most purely unsupervised person Re-ID methods rely on pseudo labels to train the network. For example, HCT [4] uses hierarchical clustering to generate pseudo labels and trains a convolutional neural network for feature learning; [23] assigns multiple labels to samples and proposes a new loss function for multi-label training. Note that the quality of the pseudo labels relies on the feature representation of the input images. However, in the early stage, the feature representation is not good enough to generate high-quality pseudo labels, and thus
the low-quality pseudo labels will contaminate the network training. Therefore, it is necessary to design a cluster refinement method to improve the clustering quality before feeding the pseudo labels to train the network.

Fig. 3. Illustration of our proposed Cluster-guided Asymmetric Contrastive Learning (CACL) framework. After training, we keep only the ResNet F(·|Θ) in the first branch for inference and use the feature x_i for testing.

B. Contrastive Learning

In recent years, with the development and application of the siamese network, contrastive learning began to emerge in the field of unsupervised learning. Contrastive learning aims at learning good image representations. It learns invariance in features by manipulating a set of positive samples and negative samples with data augmentation.

The existing methods for contrastive learning can be categorized into: a) instance-level methods [8], [9], [24], [25], [10] and b) cluster-level methods [7], [6], [26]. Instance-level methods regard each image as an individual class, consider two augmented views of the same image as positive pairs, and treat the others in the same batch (or memory bank) as negative pairs. For example, SimCLR [8] regards samples in the current batch as the negative samples; MoCo [27] uses a dictionary to implement contrastive learning, converting one branch of the contrastive learning into a momentum encoder; SimSiam [9] proposed a stop-gradient method that can train the siamese network without negative samples. Cluster-level methods regard the samples in the same cluster as positive samples and the other samples as negative samples. For example, in [6] the InfoNCE loss is combined with a MarginNCE loss to attract positive samples and repel negative samples; in [7] multi-crop data augmentation is used to enhance the robustness of the network and a scalable online clustering method is proposed to explore the inter-invariance of clusters; in [26] weight-sharing deep neural networks are used to extract features from sample pairs with different data augmentations, and contrastive clustering is performed with respect to both the features in the row and column spaces.

However, in the unsupervised setting, the instance-level contrastive learning methods simply make each sample independently repel the others, which undoubtedly ignores the cluster information. In contrast, cluster-level contrastive learning can effectively mine cluster information, but it relies heavily on the clustering result. Unfortunately, in the early training stage, the features are not good enough to yield a good clustering result. Thus, an effective way to train the network by combining both lines of contrastive learning methods is needed.

In this paper, we attempt to bridge the two lines of contrastive learning methods into a unified framework to form effective mutual learning and joint training: a) the instance-level contrastive learning helps train the network to perform feature learning, especially in the early training stage; meanwhile b) the cluster-level contrastive learning helps train the network, especially once the quality of the clustering has been improved. In this way, the self-supervision information imposed by data augmentation and the weak supervision information obtained from clustering can be fully exploited without the need for negative sample pairs.

III. OUR PROPOSAL: CLUSTER-GUIDED ASYMMETRIC CONTRASTIVE LEARNING (CACL)

This section presents our proposal, the Cluster-guided Asymmetric Contrastive Learning (CACL) approach for unsupervised person Re-ID.

For clarity, we show the architecture of our proposed CACL in Fig. 3. Overall, our CACL is a siamese network, which consists of two branches of backbone networks F(·|Θ) and F'(·|Θ') without sharing parameters, where Θ and Θ' are the parameters of the two networks, respectively, and a predictor layer G(·|Ψ) is added after the first branch, where Ψ denotes the parameters of the predictor layer. The backbone networks F(·|Θ) and F'(·|Θ') are implemented¹ via ResNet-50 [28] for feature learning.

1 It also works if backbone networks other than ResNet-50 are used.
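To make this architecture concrete, below is a minimal PyTorch sketch of the asymmetric siamese structure: two ResNet-50 encoders F(·|Θ) and F'(·|Θ') that do not share parameters, with a predictor G(·|Ψ) attached to the first branch only. The module names and the torchvision weight loading are our illustrative assumptions, not the authors' released implementation.

```python
import torch.nn as nn
from torchvision import models

class CACLNet(nn.Module):
    """Sketch of the asymmetric siamese backbone: F(.|Theta), F'(.|Theta'), G(.|Psi)."""

    def __init__(self, feat_dim=2048):
        super().__init__()
        # Two ResNet-50 branches pre-trained on ImageNet; parameters are NOT shared.
        self.f = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.f_prime = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        # Drop the classification heads; keep the 2048-d pooled features.
        self.f.fc = nn.Identity()
        self.f_prime.fc = nn.Identity()
        # The predictor G is attached to the FIRST branch only (asymmetry in structure).
        self.g = nn.Linear(feat_dim, feat_dim)

    def forward(self, img_hat, img_tilde):
        x = self.f(img_hat)                 # x_i: feature of the color view
        x_tilde = self.f_prime(img_tilde)   # x~_i: feature of the gray-scale view
        z = self.g(x)                       # z_i: predictor output of the first branch
        return x, x_tilde, z
```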

Given an unlabeled image dataset I = {I_i}_{i=1}^N consisting of N samples, for an input image I_i ∈ I we generate two samples Î_i and Ĩ_i via different data augmentation strategies as the inputs of the two branches, respectively, in which Î_i = T(I_i) and Ĩ_i = G(T'(I_i)), where T(·) and T'(·) denote two different transforms and G(·) denotes the operation that transforms a color image into a gray-scale image. For simplicity, we denote the output features of the first and the second network branches as x_i and x̃_i, and denote the output of the predictor layer in the first branch as z_i, where x_i, x̃_i, z_i ∈ R^D.

The clustering result of the output features X := {x_1, ..., x_N} from the first network branch is used to generate the pseudo labels Y := {y_1, ..., y_N}. We exploit the pseudo labels to leverage the cluster information in the contrastive learning. Specifically, in the training stage, the two network branches F(·|Θ) and F'(·|Θ') are trained with the augmented samples without sharing parameters, and the pseudo labels Y are used to guide the training of both network branches.

In CACL, we use instance memory banks M = {v_i}_{i=1}^N and M̃ = {ṽ_i}_{i=1}^N, where v_i, ṽ_i ∈ R^D, to store the outputs of the two branches, respectively. Both instance memory banks M and M̃ are initialized with X := {x_1, ..., x_N} and X̃ := {x̃_1, ..., x̃_N}, which are the outputs of the network branches F(·|Θ) and F'(·|Θ') pre-trained on ImageNet, respectively.
A. Cluster-guided Contrastive Learning

At the beginning, we pre-train the two network branches F(·|Θ) and F'(·|Θ') on ImageNet [11], and use the features from the first network branch F(·|Θ) to yield m clusters, denoted as C := {C^(1), C^(2), ..., C^(m)}. The clustering result is used to form pseudo labels to train the cluster-guided contrastive learning module.

To exploit the label invariance between the two augmented views and leverage the cluster structure, we employ two types of contrastive losses: a) an instance-level contrastive loss, denoted L_I, and b) a cluster-level contrastive loss, denoted L_C.

Instance-Level Contrastive Loss. To match the feature outputs z_i and x̃_i of the two network branches at the instance level, similar to [8], [10], we introduce the negative cosine similarity of the prediction output z_i in the first branch and the feature output x̃_i of the second branch to define an instance-level contrastive loss L_I as follows:

\mathcal{L}_I := -\frac{z_i^\top \tilde{x}_i}{\|z_i\|_2 \, \|\tilde{x}_i\|_2},  (1)

where \|\cdot\|_2 is the \ell_2-norm.
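A minimal PyTorch sketch of Eq. (1) is given below; the stop-gradient on the second-branch feature follows the training description in Section III-C, and the function name is hypothetical.

```python
import torch.nn.functional as F_t

def instance_level_loss(z, x_tilde):
    """Eq. (1): negative cosine similarity between the predictor output z_i of the
    first branch and the (gradient-stopped) feature x~_i of the second branch."""
    x_tilde = x_tilde.detach()  # stop-gradient on the second branch (Section III-C)
    return -F_t.cosine_similarity(z, x_tilde, dim=1).mean()
```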
Cluster-Level Contrastive Loss. To leverage the cluster structure to further explore the hidden information from different views, we propose a cluster-level contrastive loss L_C, which is further divided into an inter-views cluster-level contrastive loss and an intra-views cluster-level contrastive loss.

• Inter-views cluster-level contrastive loss, denoted L_C^(inter), which is defined as:

\mathcal{L}_C^{(inter)} := -\frac{z_i^\top \tilde{u}_{\omega(I_i)}}{\|z_i\|_2 \, \|\tilde{u}_{\omega(I_i)}\|_2},  (2)

where ω(I_i) finds the cluster index ℓ for z_i, and ũ_ℓ is the center vector of the ℓ-th cluster, in which Ũ := {ũ_1, ..., ũ_{m'}} and the cluster center ũ_ℓ is defined as

\tilde{u}_\ell = \frac{1}{|C^{(\ell)}|} \sum_{I_i \in C^{(\ell)}} \tilde{v}_i,  (3)

where ṽ_i is the instance feature of image Ĩ_i in the instance memory bank M̃ and C^(ℓ) is the ℓ-th cluster. The inter-views cluster-level contrastive loss L_C^(inter) defined in Eq. (2) is used to reduce the discrepancy between the projection output z_i of the first network branch and the cluster center ũ_ℓ of the feature output of the second branch with the gray-scale view.

• Intra-views cluster-level contrastive loss, denoted L_C^(intra), which is defined as:

\mathcal{L}_C^{(intra)} = -(1 - q_i)^2 \ln(q_i) - (1 - \tilde{q}_i)^2 \ln(\tilde{q}_i),  (4)

where q_i and q̃_i are the softmax of the inner product of the network outputs and the corresponding instance memory bank, defined as

q_i = \frac{\exp(u_{\omega(I_i)}^\top x_i / \tau)}{\sum_{\ell=1}^{m'} \exp(u_\ell^\top x_i / \tau)},  (5)

\tilde{q}_i = \frac{\exp(\tilde{u}_{\omega(I_i)}^\top \tilde{x}_i / \tau)}{\sum_{\ell=1}^{m'} \exp(\tilde{u}_\ell^\top \tilde{x}_i / \tau)},  (6)

where u_ℓ and ũ_ℓ are the center vectors of the ℓ-th cluster for the first branch and the second branch, respectively, in which ũ_ℓ is defined in Eq. (3) and u_ℓ is defined as

u_\ell = \frac{1}{|C^{(\ell)}|} \sum_{I_i \in C^{(\ell)}} v_i,  (7)

where v_i is the instance feature of image Î_i in the instance memory bank M. Note that both x_i and x̃_i share the same pseudo label ω(I_i) from clustering. The intra-views cluster-level contrastive loss L_C^(intra) in Eq. (4) is used to encourage the siamese network to learn features with respect to the corresponding cluster center for the two branches, respectively.

Putting the loss functions in Eqs. (2) and (4) together, we have the cluster-level contrastive loss L_C:

\mathcal{L}_C := \mathcal{L}_C^{(inter)} + \mathcal{L}_C^{(intra)}.  (8)

Remark 1. The cluster-level contrastive loss L_C in Eq. (8) aims to leverage the clustering information to minimize the difference between samples of the same cluster from different augmentation views via L_C^(inter), and within the same augmentation view via L_C^(intra). This helps the siamese network mine the hidden information brought by the basic augmented view in the first branch and the gray-scale augmented view in the second branch, to prevent feature collapse to a trivial solution and to impose supervision information for learning features other than colors.
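The cluster-level loss of Eq. (8) can be sketched as follows, assuming the batch features, the pseudo labels ω(I_i), and the cluster centers of Eqs. (3) and (7) are available as tensors; the names, shapes, and the small numerical epsilon are illustrative assumptions.

```python
import torch
import torch.nn.functional as F_t

def cluster_level_loss(z, x, x_tilde, labels, centers, centers_tilde, tau=0.05):
    """Sketch of Eq. (8) = Eq. (2) + Eq. (4). `labels` holds omega(I_i) for the
    batch; `centers`/`centers_tilde` are the (m' x D) cluster centers u_l and
    u~_l computed from the two memory banks via Eqs. (7) and (3)."""
    # Inter-views loss, Eq. (2): cosine between z_i and u~_{omega(I_i)}.
    u_tilde_pos = centers_tilde[labels]                       # (B, D)
    loss_inter = -F_t.cosine_similarity(z, u_tilde_pos, dim=1).mean()

    # Intra-views loss, Eq. (4): focal-style cross entropy over the softmax of
    # Eqs. (5)/(6), computed for each branch against its own centers.
    def focal_term(feat, ctrs):
        logits = feat @ ctrs.t() / tau                        # (B, m')
        q = F_t.softmax(logits, dim=1).gather(1, labels.unsqueeze(1)).squeeze(1)
        return -((1.0 - q) ** 2 * torch.log(q + 1e-12)).mean()

    loss_intra = focal_term(x, centers) + focal_term(x_tilde, centers_tilde)
    return loss_inter + loss_intra
```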

B. Clustering and Cluster Refinement

Note that the cluster-level contrastive loss is greatly affected by the quality of the clustering result. When the clusters are noisy, they will cause negative effects on the training. To improve the quality of the clustering result, we propose a cluster refinement method which removes a proportion of noisy samples from larger clusters, helping the model to better learn the information at the cluster level.

For a cluster, we want to keep the samples with higher similarity and remove the samples with lower similarity. Given a set of raw clusters, denoted as {C^(1), C^(2), ..., C^(m)}, without loss of generality, we pick C^(i) to perform cluster refinement. At first, we obtain an over-segmentation of C^(i), i.e., C^(i) is further divided into {C_1^(i), C_2^(i), ..., C_{n_i}^(i)}. Then we perform cluster refinement according to the following criterion:

\text{if } D(C_j^{(i)} \mid C^{(i)}) < D(C^{(i)}), \text{ then } C_j^{(i)} \text{ is kept};  (9)

otherwise C_j^(i) is removed, where D(C_j^(i) | C^(i)) is the average inter-distance from all samples in the sub-cluster C_j^(i) to the other samples in cluster C^(i), and D(C^(i)) is the average intra-distance among the samples in cluster C^(i).

After such a post-processing step, the clusters of larger size are improved and, at the same time, more singletons or tiny clusters are produced. We denote the refined clusters as C' = {C^(1), C^(2), ..., C^(m')}, where m' ≥ m. Compared to tiny clusters and singletons, the larger clusters are more informative for providing pseudo supervision to guide the contrastive learning.

Remark 2. In implementation, we use the DBSCAN algorithm [3] to generate both the raw clusters and the over-segmentation of the clusters. DBSCAN [3] is a density-based clustering algorithm: it regards a data point as density-reachable if the data point lies within a small distance threshold d of other samples, where the parameter d is the distance threshold used to find neighboring points. Specifically, to generate the raw clusters, we employ DBSCAN with a slightly larger distance threshold d (e.g., d = 0.6); whereas to generate the over-segmentation, we use a slightly smaller distance threshold d', where d' := d − δ (e.g., δ = 0.02). We will show the influence of the parameters δ and d in the experiments. A minimal sketch of this refinement step is given below.
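The sketch below implements the refinement under stated assumptions: scikit-learn's DBSCAN, cosine distance on L2-normalized features, and an arbitrary min_samples value. The paper does not restate these details, so they are illustrative, not the authors' exact settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def refine_clusters(features, d=0.6, delta=0.02):
    """Sketch of Section III-B / Remark 2: cluster with DBSCAN at threshold d,
    over-segment at d' = d - delta, and keep a sub-cluster only if its average
    distance to the rest of its raw cluster is below the average intra-cluster
    distance, as in Eq. (9)."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    raw = DBSCAN(eps=d, min_samples=4, metric="cosine").fit_predict(feats)
    fine = DBSCAN(eps=d - delta, min_samples=4, metric="cosine").fit_predict(feats)

    labels = np.full(len(feats), -1)
    for c in np.unique(raw[raw >= 0]):
        idx = np.where(raw == c)[0]
        cf = feats[idx]
        dist = 1.0 - cf @ cf.T                                # pairwise cosine distances
        d_intra = dist.sum() / max(len(idx) * (len(idx) - 1), 1)   # D(C^(i))
        for s in np.unique(fine[idx]):
            sub = idx[fine[idx] == s]
            rest = np.setdiff1d(idx, sub)
            if len(rest) == 0:
                labels[sub] = c                               # nothing to compare against
                continue
            d_sub = (1.0 - feats[sub] @ feats[rest].T).mean()  # D(C_j | C^(i))
            if d_sub < d_intra:                               # Eq. (9): keep the sub-cluster
                labels[sub] = c
    return labels  # refined pseudo labels; -1 marks removed/outlier samples
```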
C. Training Procedure for Our CACL Approach

In CACL, the two branches of the siamese network are implemented with ResNet-50 [28] and they do not share parameters. We pre-train the two network branches on ImageNet at first and use the learned features to initialize the two memory banks M and M̃, respectively.

In the training stage, we train both network branches at the same time with the total loss:

\mathcal{L} := \mathcal{L}_I + \mathcal{L}_C.  (10)

We update the two instance memory banks M and M̃, respectively, as follows:

v_i^{(t)} \leftarrow \alpha v_i^{(t-1)} + (1-\alpha)\, x_i,  (11)

\tilde{v}_i^{(t)} \leftarrow \alpha \tilde{v}_i^{(t-1)} + (1-\alpha)\, \tilde{x}_i,  (12)

where α is set to 0.2 by default (we will discuss the influence of α in the experiments).

In order to save computation cost², we also use a stop-gradient operation, as mentioned in SimSiam [9]. Note that we apply the stop-gradient operation [9] to the second network branch F'(·|Θ') when using the instance-level loss L_I in Eq. (1) to perform back-propagation. Thus, the parameters Θ' in the second network branch are updated only with the intra-views cluster-level contrastive loss L_C^(intra) in Eq. (4).

2 Note that it is not necessary to use the stop-gradient operation in our CACL, because the clustering result provides enough guidance under the asymmetric structure to prevent collapse. Although this is similar to the method in SimSiam [9], the purpose is different and it is not necessary in our proposal.

Remark 3. For clarity, we summarize the details of the training procedure in Algorithm 1. We note that the "asymmetry" in the proposed framework for cluster-guided contrastive learning lies in the following three aspects: a) asymmetry in network structure, i.e., a predictor layer is only added after the first branch³; b) asymmetry in data augmentation, i.e., the augmented samples provided to the second branch are further transformed into gray-scale; and c) asymmetry in pseudo label generation, i.e., the output features of the first branch are used to generate the pseudo labels, which are shared with the second branch. Because of the asymmetry in these three aspects, we term the proposed framework Cluster-guided Asymmetric Contrastive Learning (CACL).

3 It is also feasible to add another predictor layer after the second branch to obtain a symmetric network structure. Nevertheless, our experimental results show that merely marginal performance improvement is yielded after adding an extra predictor layer. Thus, we prefer the asymmetric network architecture for the contrastive learning framework.

Remark 4. Many unsupervised Re-ID methods [17], [13], [29], [12] have used contrastive learning to learn discriminant features. Most of them [13], [29], [12] are Generative Adversarial Network (GAN)-based methods and need additional supervised information to assist the training. For example, ATNet [13] trains multiple GANs by utilizing illumination and camera information, GCL [12] introduces pose information in training, and AD-Cluster [29] generates cross-camera samples to assist the training. Unlike these methods, our proposed CACL uses an asymmetric siamese network to effectively learn fine-grained features by suppressing color with simple data augmentation operations during training, rather than using expensive sample generation via GANs. Compared to GAN-based methods, our CACL is simple, efficient and effective.

D. Inference Procedure for CACL

After training, we keep only the ResNet F(·|Θ) in the first branch for inference in testing.

To be specific, in the inference procedure, we use the output features X of the first branch F(·|Θ) to calculate the similarity between images. Given the gallery image dataset I^g = {I_i^g}_{i=1}^{N^g} and the query image dataset I^q = {I_i^q}_{i=1}^{N^q}, where N^g and N^q are the sizes of the two datasets, respectively, for each image I_i^q in the query set we compute the distances between the query image and the images in the gallery I^g via the features obtained from the output of the first branch. Then, we sort the distances in ascending order to find the matched images.
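A short sketch of this inference procedure might look as follows; the distance choice (Euclidean on L2-normalized features, which is equivalent to cosine ranking) and the function and loader names are assumptions.

```python
import torch

@torch.no_grad()
def rank_gallery(model_f, query_loader, gallery_loader):
    """Sketch of Section III-D: extract features with the first branch F(.|Theta)
    only, then sort the gallery by distance to each query (ascending)."""
    def extract(loader):
        feats = torch.cat([model_f(imgs) for imgs, _ in loader], dim=0)
        return torch.nn.functional.normalize(feats, dim=1)

    q, g = extract(query_loader), extract(gallery_loader)
    dist = torch.cdist(q, g)          # (N_q, N_g) pairwise distances
    return dist.argsort(dim=1)        # per-query ranking: best match first
```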

Algorithm 1 Training Procedure for CACL

Input: A dataset I = {I_i}_{i=1}^N.
Output: The trained model F(·|Θ).
1: Pre-train the two network branches on ImageNet.
2: Initialize the two instance memory banks M and M̃ and set P = P_best = 0.
3: while epoch ≤ total_epoch do
4:   Generate Î_i and Ĩ_i via the data augmentations T(·) and G(T'(·));
5:   Perform feature extraction to get x_i and x̃_i;
6:   Perform clustering and cluster refinement via Eq. (9) to yield the pseudo labels Y = {y_1, ..., y_N};
7:   Update the two cluster centers U and Ũ via Eqs. (7) and (3);
8:   Train the siamese network, i.e., update Θ, Ψ and Θ' via the total loss in Eq. (10);
9:   Update the instance memory banks M and M̃ via Eq. (11) and Eq. (12);
10:  Evaluate the model performance P with F(·|Θ);
11:  if P > P_best then
12:    Output the best model F(·|Θ) and set P_best ← P;
13:  end if
14: end while
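For reference, one training iteration of Algorithm 1 (steps 4 to 9) could be condensed as the sketch below, reusing the loss sketches from Section III-A. All names are illustrative assumptions, and the clustering/refinement step is assumed to have already produced the pseudo labels and cluster centers for the current epoch.

```python
import torch

def train_step(net, opt, batch, labels, mem, mem_tilde,
               centers, centers_tilde, alpha=0.2):
    """One iteration following Algorithm 1: forward both views, apply the total
    loss of Eq. (10), then update the memory banks via Eqs. (11)-(12)."""
    img_hat, img_tilde, idx = batch            # two augmented views + sample ids
    x, x_tilde, z = net(img_hat, img_tilde)

    loss = instance_level_loss(z, x_tilde) + cluster_level_loss(
        z, x, x_tilde, labels[idx], centers, centers_tilde)    # Eq. (10)

    opt.zero_grad()
    loss.backward()
    opt.step()

    with torch.no_grad():                      # Eqs. (11) and (12): EMA update
        mem[idx] = alpha * mem[idx] + (1 - alpha) * x.detach()
        mem_tilde[idx] = alpha * mem_tilde[idx] + (1 - alpha) * x_tilde.detach()
    return loss.item()
```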
IV. EXPERIMENTS

In this section, we first describe the benchmark datasets and the detailed parameter settings used in the experiments, and then provide extensive experiments on these datasets, including a set of detailed ablation studies and a set of evaluation experiments to show the effect of each component. Finally, we give a set of data visualization experiments.⁴

4 The code can be downloaded from https://github.com/MingkunLishigure/CACL.

A. Dataset Description

To evaluate the effectiveness of our proposal, we use the following three benchmark datasets: Market-1501 [41], DukeMTMC-ReID [45] and MSMT17 [46].

Market-1501 has 32,668 photos of 1,501 people from six different camera views. The training set contains 12,936 images of 751 identities. The testing set contains 19,732 images of 750 identities.

DukeMTMC-ReID consists of images sampled from the DukeMTMC video dataset, 120 frames per video, with a total of 36,411 images of 1,404 identities. The training set contains 16,522 images of 702 identities, and the testing set contains 2,228 query images of 702 identities and 17,661 gallery images. These images are taken from eight cameras.

MSMT17 has a total of 126,441 images under 15 camera views. The training set contains 32,621 images of 1,041 identities, and the testing set contains 93,820 images of 3,060 identities. MSMT17 is larger than Market-1501 and DukeMTMC-ReID.

B. Implementation Details

Settings for Training. In our CACL approach, we use ResNet-50 [28] pre-trained on ImageNet [11] for both network branches.⁵ The feature outputs x_i ∈ R^D and x̃_i ∈ R^D of the two networks F(·|Θ) and F'(·|Θ') are D-dimensional vectors, where D = 2048. We use the feature output x_i of the first branch F(·|Θ) to perform clustering, where x_i = F(Î_i|Θ) ∈ R^D. The prediction layer G(·) is a D × D fully connected layer. We initialize the two memory banks with the feature outputs of the corresponding network branches F(·|Θ) and F'(·|Θ'), respectively. We optimize the network with the Adam optimizer [47] with a weight decay of 0.0005 and train the network for 80 epochs in total. The learning rate is initially set to 0.00035 and decreased to one-tenth every 20 epochs. The batch size is set to 64. The temperature coefficient τ in Eq. (6) is set to 0.05, and the update factor α in Eqs. (11) and (12) is set to 0.2.

5 In Section IV-C, we also provide the performance evaluation with other backbone networks for the two branches.

Settings for Data Augmentation. In our experiments, we use the same data augmentation operations as other methods [17], [2], including random horizontal flip, random erasing and random crop, to define the data augmentations T(·) and T'(·). Besides, we add a gray-scale transform to the input of the second branch.

Metrics for Performance Evaluation. In evaluation, we use the mean average precision (mAP) and the cumulative matching characteristic (CMC) at Rank-1, 5 and 10 to evaluate the performance.

C. Comparison to the State-of-the-art Methods

We compare our proposed CACL to the state-of-the-art unsupervised domain adaptation methods and purely unsupervised methods for person Re-ID. The purely unsupervised methods include: CAMEL [40], PUL [19], SSL [20], LOMO [42], BOW [41], BUC [18], HCT [4], SpCL [17], and CAP [43]. The unsupervised domain adaptation methods include: PTGAN [30], ADTC [36], HHL [35], SSG [5], MMCL [23], AD-Cluster [29], MEB [38], NRMT [39], SPGAN [32], TJ-AIDL [16], JVTC [37], PGPPM [34], and MMT [2].

The comparison results of the state-of-the-art unsupervised domain adaptation methods and purely unsupervised methods are shown in Table I. We can find that our proposed CACL achieves 80.9%/92.7% at mAP/Rank-1 on Market-1501 and 69.6%/82.6% at mAP/Rank-1 on DukeMTMC-ReID, respectively. CACL not only performs better than all purely unsupervised methods but also outperforms the unsupervised domain adaptation methods.

Moreover, we also conduct experiments on the much larger dataset MSMT17 and report the experimental results in Table II. Again, we can observe that our proposed CACL achieves a leading performance, i.e., 23.0%/48.4% at mAP/Rank-1. It is worth noting that our CACL yields superior performance to some UDA methods on this challenging dataset. These results confirm the effectiveness of our proposal.
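For reproducibility, the two augmentation streams described in the "Settings for Data Augmentation" paragraph of Section IV-B can be sketched with torchvision as follows. The image size (256×128, a common choice in person Re-ID) and the flip/erase probabilities are assumptions, since the text does not specify them.

```python
from torchvision import transforms

# T(.) / T'(.): random flip, random crop, random erasing (Section IV-B).
T = transforms.Compose([
    transforms.Resize((256, 128)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop((256, 128), padding=10),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),
])

# G(T'(.)): the same pipeline plus a color-to-grayscale transform, replicated
# to 3 channels so the ResNet input shape is unchanged (second branch input).
G_T_prime = transforms.Compose([
    transforms.Resize((256, 128)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop((256, 128), padding=10),
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),
])
```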

TABLE I: Comparison to other state-of-the-art methods. "UDA" refers to the unsupervised domain adaptation methods and "US" refers to the purely unsupervised learning methods. "*" means that the used backbone is pre-trained on ImageNet. For each dataset, the cells report mAP / Rank-1 / Rank-5 / Rank-10.

| Method | Type | Reference | Backbone | Market-1501 | DukeMTMC-ReID |
|---|---|---|---|---|---|
| PTGAN [30] | UDA | CVPR'18 | GoogleNet [31] | 15.7 / 38.6 / 57.3 / - | 13.5 / 27.4 / 43.6 / - |
| SPGAN [32] | UDA | CVPR'18 | ResNet50* [28] | 26.7 / 58.1 / 76.0 / 82.7 | 26.4 / 46.9 / 62.6 / 68.5 |
| TJ-AIDL [16] | UDA | CVPR'18 | MobileNet* [33] | 26.5 / 58.2 / 74.8 / - | 23.0 / 44.3 / 59.6 / - |
| PGPPM [34] | UDA | CVPR'18 | ResNet50* [28] | 33.9 / 63.9 / 81.1 / 86.4 | 17.9 / 36.3 / 54.0 / 61.6 |
| HHL [35] | UDA | ECCV'18 | ResNet50* [28] | 31.4 / 62.2 / 78.0 / 84.0 | 27.2 / 46.9 / 61.0 / 66.7 |
| SSG [5] | UDA | ICCV'19 | ResNet50* [28] | 58.3 / 80.0 / 90.0 / 92.4 | 53.4 / 73.0 / 80.6 / 83.2 |
| AD-Cluster [29] | UDA | CVPR'20 | ResNet50* [28] | 68.3 / 86.7 / 94.4 / 96.5 | 54.1 / 72.6 / 82.5 / 85.5 |
| ADTC [36] | UDA | ECCV'20 | ResNet50* [28] | 59.7 / 79.3 / 90.8 / 94.1 | 52.5 / 71.9 / 84.1 / 87.5 |
| MMCL [23] | UDA | CVPR'20 | ResNet50* [28] | 60.4 / 84.4 / 92.8 / 95.0 | 51.4 / 72.4 / 82.9 / 85.0 |
| MMT [2] | UDA | ICLR'20 | ResNet50* [28] | 73.8 / 89.5 / 96.0 / 97.6 | 62.3 / 76.3 / 87.7 / 91.2 |
| JVTC [37] | UDA | ECCV'20 | ResNet50* [28] | 67.2 / 86.8 / 95.2 / 97.1 | 66.5 / 80.4 / 89.9 / 93.7 |
| MEB [38] | UDA | ECCV'20 | ResNet50* [28] | 76.0 / 89.9 / 95.2 / 96.9 | 65.3 / 81.2 / 90.9 / 92.2 |
| NRMT [39] | UDA | ECCV'20 | ResNet50* [28] | 71.7 / 87.8 / 94.6 / 96.5 | 62.2 / 77.8 / 86.9 / 89.5 |
| SpCL [17] | UDA | NIPS'20 | ResNet50* [28] | 76.7 / 90.3 / 96.2 / 97.7 | 68.8 / 82.9 / 90.1 / 92.5 |
| CAMEL [40] | US | ICCV'17 | ResNet50* [28] | 26.3 / 54.4 / 73.1 / 79.6 | 19.8 / 40.2 / 57.5 / 64.9 |
| BOW [41] | US | ICCV'15 | - | 14.8 / 35.8 / 52.4 / 60.3 | 8.5 / 17.1 / 28.8 / 34.9 |
| PUL [19] | US | TOMM'18 | ResNet50* [28] | 22.8 / 51.5 / 70.1 / 76.8 | 22.3 / 41.1 / 46.6 / 63.0 |
| LOMO [42] | US | CVPR'15 | - | 8.0 / 27.2 / 41.6 / 49.1 | 4.8 / 12.3 / 21.3 / 26.6 |
| BUC [18] | US | AAAI'19 | ResNet50* [28] | 30.6 / 61.0 / 71.6 / 76.4 | 21.9 / 40.2 / 52.7 / 57.4 |
| HCT [4] | US | CVPR'20 | ResNet50* [28] | 56.4 / 80.0 / 91.6 / 95.2 | 50.1 / 69.6 / 83.4 / 87.4 |
| SSL [20] | US | CVPR'20 | ResNet50* [28] | 37.8 / 71.7 / 83.8 / 87.4 | 28.6 / 52.5 / 63.5 / 68.9 |
| SpCL [17] | US | NIPS'20 | ResNet50* [28] | 73.1 / 88.1 / 96.3 / 97.7 | 65.3 / 81.2 / 90.3 / 92.2 |
| CAP [43] | US | AAAI'20 | ResNet50* [28] | 79.2 / 91.4 / 96.3 / 97.7 | 67.3 / 81.1 / 89.3 / 91.8 |
| CACL | US | This paper | ResNet50* [28] | 80.9 / 92.7 / 97.4 / 98.5 | 69.6 / 82.6 / 91.2 / 93.8 |
| CACL | US | This paper | IBN-ResNet* [44] | 83.6 / 93.3 / 97.7 / 98.3 | 72.5 / 85.5 / 92.9 / 94.9 |

TABLE II: Experimental results on MSMT17.

| Method | Type | Reference | mAP | Rank-1 | Rank-5 | Rank-10 |
|---|---|---|---|---|---|---|
| PTGAN [30] | UDA | CVPR'18 | 3.3 | 11.8 | - | 27.4 |
| ECN [48] | UDA | CVPR'19 | 10.2 | 30.2 | 41.5 | 46.8 |
| SSG [5] | UDA | ICCV'19 | 13.3 | 32.2 | - | 51.2 |
| MMCL [23] | UDA | CVPR'20 | 16.2 | 43.6 | 54.3 | 58.9 |
| JVTC+ [37] | US | ECCV'20 | 17.3 | 43.1 | 53.8 | 59.4 |
| SpCL [17] | US | NIPS'20 | 19.1 | 42.3 | 55.6 | 61.2 |
| MMT [2] | UDA | ICLR'20 | 24.0 | 50.1 | 63.5 | 69.3 |
| SpCL [17] | UDA | NIPS'20 | 26.8 | 53.7 | 79.3 | 83.1 |
| CACL | US | This paper | 23.0 | 48.9 | 61.2 | 66.4 |
| CACL w/ IBN-ResNet | US | This paper | 29.9 | 57.1 | 68.4 | 73.1 |

Note that Instance-Batch Normalization (IBN) [44] has been used in object recognition and has proved very effective. Here, we evaluate our CACL with the backbone implemented with an Instance-Batch Normalization ResNet (IBN-ResNet). Similar to CACL with ResNet [28], we introduce an Instance-Batch Normalization (IBN) layer to replace the BN layer, and call the result an IBN-ResNet. As shown in Table I, the performance of our CACL can be further improved when combined with IBN-ResNet.

D. Ablation Study

To evaluate the effectiveness of each component (L_I, L_C^(inter), L_C^(intra), and clustering with refinement) in our CACL approach, we conduct a set of ablation experiments on Market-1501 and DukeMTMC-ReID.

In the baseline method, we train both branches with the data augmentations T'(·) and T'(·) using the Non-Parametric Softmax loss [49], which is defined as

\mathcal{L}(x_i) = -\ln\!\left(\frac{\exp(u_{\omega(I_i)}^\top x_i/\tau)}{\sum_{\ell=1}^{m'} \exp(u_\ell^\top x_i/\tau)}\right),  (13)

and both the training process and the memory updating strategy in the baseline method are kept the same as in our CACL method.

To comprehensively evaluate the contribution of each component, we conduct a set of ablation experiments testing each component in our CACL framework individually, i.e., the cluster refinement, the instance-level contrastive loss L_I, and the cluster-level contrastive loss L_C. To further evaluate the sub-parts of the cluster-level contrastive loss, we also conduct experiments to evaluate the influence of using L_C^(inter) or L_C^(intra) separately.

In the ablation experiments, to test the model with the contrastive loss L_C or L_I, we train both branches with the data augmentations T'(·) and G(T'(·)), respectively. To test the model performance with the cluster-level contrastive loss L_C and the sub-part L_C^(intra), compared to the baseline method, we need to replace the Non-Parametric Softmax loss in Eq. (13) by the loss in Eq. (4) for both branches. The results of the ablation study are reported in Table III.

TABLE III: Ablation study on Market-1501 and DukeMTMC-ReID. For each dataset, the cells report mAP / Rank-1 / Rank-5 / Rank-10.

| Components | Cluster Refine | L_I | L_C^(intra) | L_C^(inter) | Market-1501 | DukeMTMC-ReID |
|---|---|---|---|---|---|---|
| Baseline | | | | | 68.1 / 85.2 / 94.0 / 96.0 | 62.5 / 78.5 / 88.5 / 90.3 |
| + L_C | | | X | X | 70.8 / 87.5 / 94.4 / 96.0 | 62.5 / 79.5 / 88.4 / 90.8 |
| + L_I | | X | | | 74.7 / 88.7 / 95.0 / 96.6 | 64.2 / 80.7 / 89.0 / 91.6 |
| + L_I + L_C | | X | X | X | 74.4 / 89.3 / 95.9 / 96.7 | 63.8 / 79.2 / 89.2 / 91.7 |
| + Cluster Refine | X | | | | 73.0 / 87.8 / 95.7 / 97.2 | 65.7 / 81.1 / 90.6 / 93.2 |
| + Cluster Refine + L_I | X | X | | | 78.2 / 91.2 / 97.0 / 98.1 | 67.6 / 81.8 / 90.2 / 93.0 |
| + Cluster Refine + L_I + L_C^(inter) | X | X | | X | 78.7 / 91.2 / 97.0 / 97.9 | 68.5 / 81.9 / 91.2 / 93.8 |
| + Cluster Refine + L_I + L_C^(intra) | X | X | X | | 79.2 / 91.9 / 96.7 / 98.0 | 68.3 / 82.1 / 90.3 / 93.2 |
| + Cluster Refine + L_C | X | | X | X | 80.4 / 92.2 / 97.1 / 98.2 | 68.8 / 82.2 / 91.3 / 93.8 |
| Our CACL | X | X | X | X | 80.9 / 92.7 / 97.4 / 98.5 | 69.6 / 82.6 / 91.2 / 93.8 |

As can be read from Table III, the performance improves when each component is used individually. This validates that each component contributes to the performance improvements. Using both L_C and L_I is not significantly better than just using L_I, and using L_C alone yields only a slight improvement over the baseline. This is because the clustering result is not of high quality, and using L_C makes the training pay more attention to the noisy cluster information; therefore, it might bring misleading information to the network training. In the experiments using both L_C and cluster refinement, we observe a significant performance improvement over using the cluster refinement alone. This also validates that the cluster refinement improves the clustering result, and the refined clustering information can further enhance the effectiveness of using L_C to train the network.

E. More Evaluation and Analysis

Evaluation on the Importance of Cluster Guidance. We use an instance-level contrastive loss in our method to mine the invariance between different augmented views, based on SimSiam [9]. To verify whether the clustering guidance is vital in the contrastive learning framework, we train our CACL framework using only the instance-level contrastive loss in Eq. (1), without the clustering guidance. The experimental results are shown in Table IV. As can be read from Table IV, surprisingly, the contrastive learning framework without clustering guidance does not work at all.

TABLE IV: Ablation study on Market-1501.

| Components | mAP | Rank-1 | Rank-5 | Rank-10 |
|---|---|---|---|---|
| CACL w/o clustering | 0.3 | 0.5 | 1.2 | 2.3 |
| CACL w/o stopGrad | 80.2 | 92.0 | 97.0 | 97.6 |
| CACL | 80.9 | 92.7 | 97.4 | 98.5 |

Improvements Brought by Suppressing Colors. To suppress the influence of colors, CACL applies a gray-scale transform G(·) on top of the data augmentation T'(·) for the second network branch. To validate the effectiveness of suppressing colors, we conduct a set of experiments under different settings: a) simply using the data augmentation T'(·) with raw color; b) using another data augmentation approach, "color-jitter", denoted as J(·), to replace G(·), whose output is still a color image; and c) using the gray-scale transform G(·) after T'(·). It should be emphasized that, in the implementation, the "color-jitter" operation applies random-amplitude changes to the image. We display image samples processed with the different data augmentation methods in Fig. 4. As can be observed, "color-jitter" did change the images, but the color information still dominates.

Fig. 4. Illustration of the raw images and the augmented images. First row: raw images. Second row: "color-jitter". Bottom row: "gray-scale".

The experimental results are provided in Table V. We can read that using "color-jitter" J(·) yields some performance improvement, but using "gray-scale" G(·) yields the best performance improvement. When combined with the cluster refinement step, we observe a similar result: using "gray-scale" G(·) yields a better performance improvement than using "color-jitter" J(·). These results validate that suppressing colors is effective for gaining performance improvement. Compared to using "gray-scale", using "color-jitter" does not truly eliminate the influence brought by colors; that is to say, after using color-jitter, the color information still dominates.

TABLE V: Performance comparison on using color data augmentations and the gray-scale transform for the second network branch (Market-1501).

| Components | Cluster Refine | mAP | Rank-1 | Rank-5 | Rank-10 |
|---|---|---|---|---|---|
| T'(·) | | 70.3 | 87.4 | 94.6 | 96.5 |
| J(T'(·)) | | 72.5 | 87.8 | 95.3 | 96.9 |
| G(T'(·)) | | 74.4 | 89.3 | 95.9 | 96.7 |
| T'(·) | X | 79.0 | 90.6 | 96.3 | 97.1 |
| J(T'(·)) | X | 79.1 | 90.8 | 96.7 | 97.8 |
| G(T'(·)) | X | 80.9 | 92.7 | 97.4 | 98.5 |

To further reveal the mechanism of why using "gray-scale" works better than using "color-jitter" in the proposed framework, we show the statistical histograms of the color distributions
of the raw images, the color-jittered images, and the gray-scale images, respectively. Specifically, we compute the statistical histograms of the intensity values in the RGB channels of the raw color images and of the images after applying "color-jitter" and "gray-scale", over 500 images sampled at random from the training data of Market-1501. The statistical results are shown in Fig. 5.

Fig. 5. Comparison of the intensity histograms in the RGB channels under different data augmentation operations: (a) Raw Images, (b) Color-Jitter, (c) Gray-Scale.
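A sketch of how such per-channel histograms can be computed is given below; the comparison over 500 randomly sampled images follows the text, while the concrete jitter magnitudes and the function names are illustrative assumptions.

```python
import numpy as np
from PIL import Image
from torchvision import transforms

def channel_histograms(paths, transform=None, bins=256):
    """Accumulate per-channel intensity histograms over a set of images,
    optionally after an augmentation such as color-jitter or grayscale
    (the statistic behind Fig. 5)."""
    hist = np.zeros((3, bins))
    for p in paths:
        img = Image.open(p).convert("RGB")
        if transform is not None:
            img = transform(img)
        arr = np.asarray(img)
        for c in range(3):
            hist[c] += np.histogram(arr[..., c], bins=bins, range=(0, 256))[0]
    return hist / hist.sum(axis=1, keepdims=True)   # normalize per channel

# Hypothetical usage mirroring the three panels of Fig. 5:
# h_raw  = channel_histograms(paths)
# h_jit  = channel_histograms(paths, transforms.ColorJitter(0.5, 0.5, 0.5, 0.1))
# h_gray = channel_histograms(paths, transforms.Grayscale(num_output_channels=3))
```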

We can observe that using "gray-scale" yields a roughly consistent distribution in the histogram compared to the raw images, whereas the histogram of the images after using "color-jitter" has some notable deviations from that of the raw images. In the histogram of using "gray-scale", the proportion of pixels at the two extreme values (i.e., 0 and 255) is significantly reduced; whereas in the histogram of using "color-jitter", the proportion of pixels at the two extreme values, especially at 0, is significantly magnified. This phenomenon might damage the content consistency with the raw image. The difference in the consistency of the histograms reveals the essential advantage of using "gray-scale", rather than "color-jitter", to suppress the influence of colors.

Evaluation on the Parameters in DBSCAN. We conduct experiments to evaluate the parameter d used to find the neighbors. In cluster refinement, we use DBSCAN with a smaller parameter d', where d' := d − δ, to find the over-segmentation. We conduct experiments on Market-1501 to evaluate the effects of changing the two parameters. The experiments are recorded in Table VI. We can find that, while the change of d affects the baseline performance, our CACL still improves the model performance significantly. Note that even though the baseline performance drops sharply when using d = 0.7, our method still achieves a good performance, which is also higher than the other unsupervised methods in Table I.

TABLE VI: Performance comparison of different cluster parameter d (the maximum distance between neighboring points) on CACL and the baseline method. The cells report mAP / Rank-1.

| d | Market-1501 Baseline | Market-1501 CACL | DukeMTMC-ReID Baseline | DukeMTMC-ReID CACL |
|---|---|---|---|---|
| 0.4 | 68.6 / 85.9 | 75.2 / 91.4 | 60.1 / 77.5 | 62.0 / 77.7 |
| 0.5 | 71.2 / 86.5 | 81.6 / 93.0 | 63.4 / 80.3 | 67.5 / 81.8 |
| 0.6 | 68.1 / 85.2 | 80.9 / 92.7 | 62.5 / 78.5 | 69.6 / 82.6 |
| 0.7 | 43.8 / 71.5 | 75.8 / 90.1 | 4.1 / 10.3 | 66.7 / 80.6 |

The cluster refinement is an important component in our proposed CACL, and δ is an important parameter for finding the over-segmentation of the raw clusters. Thus, we further conduct experiments to evaluate the performance of using different values of δ. The experimental results are shown in Table VII. We can find that the performance is not too sensitive to δ. When using δ = 0.02, the performance achieves the best, i.e., 80.9%/92.7% at mAP/Rank-1 on Market-1501 and 69.6%/82.6% at mAP/Rank-1 on DukeMTMC-ReID.

TABLE VII: Model performance with different δ on Market-1501. The cells report mAP / Rank-1.

| δ | d = 0.4 | d = 0.5 | d = 0.6 | d = 0.7 |
|---|---|---|---|---|
| 0.02 | 75.2 / 91.4 | 81.6 / 93.0 | 80.9 / 92.7 | 75.8 / 90.1 |
| 0.04 | 70.8 / 89.5 | 80.4 / 92.6 | 80.3 / 92.3 | 68.7 / 86.2 |
| 0.06 | 65.8 / 87.2 | 77.7 / 91.7 | 79.0 / 91.4 | 8.2 / 20.3 |
| 0.08 | 64.3 / 86.2 | 76.6 / 91.2 | 78.5 / 91.3 | 6.1 / 15.6 |

Moreover, we also test the stop-gradient operation under different structures. As can be read from Table IV, the performance of the framework with the asymmetric structure drops only slightly (i.e., only 0.7% lower than that of using the stop-gradient operation) when the stop-gradient operation is not used. This hints that the framework with the asymmetric structure in CACL does not highly depend on the stop-gradient operation.

Evaluation on the Performance of the Two Branches. To further reveal the performance of the trained networks, we record the performance of using the output features of each of the two network branches F(·|Θ) and F'(·|Θ'), separately, for person Re-ID in Table VIII. We can read that using the output features of the second branch F'(·|Θ') yields significantly lower performance than using the output features of the first branch F(·|Θ), and the result of using F'(·|Θ') is similar to the result of the experiments without using L_C^(intra). This is because the second network branch pays attention to learning features from gray-scale images, and thus lacks the ability to capture the richer information in color images.

TABLE VIII: Performance comparison on F(·|Θ) and F'(·|Θ') (Market-1501).

| Branch | mAP | Rank-1 | Rank-5 | Rank-10 |
|---|---|---|---|---|
| F(·|Θ) (Color) | 80.9 | 92.7 | 97.4 | 98.5 |
| F'(·|Θ') (Gray-Scale) | 43.8 | 71.5 | 83.9 | 87.1 |

Evaluation on the Memory Update Parameter α. We conduct experiments to evaluate the effects of the memory update parameter α and show the results in Table IX. We can find that our CACL is not sensitive to changes of the memory update parameter α, except for α = 1. When using α = 1, the model performance drops significantly because the memory bank is never updated in this case. When using α = 0.2, the model achieves the best performance on Market-1501, i.e., 80.9%/92.7% at mAP/Rank-1.

TABLE IX: Performance comparison on different α (Market-1501).

| α | mAP | Rank-1 | Rank-5 | Rank-10 |
|---|---|---|---|---|
| 0.0 | 75.1 | 89.8 | 96.3 | 97.3 |
| 0.2 | 80.9 | 92.7 | 97.4 | 98.5 |
| 0.4 | 80.8 | 92.5 | 97.1 | 98.2 |
| 0.6 | 80.2 | 92.4 | 97.2 | 98.3 |
| 0.8 | 77.3 | 90.9 | 96.6 | 98.0 |
| 1.0 | 4.3 | 10.9 | 19.9 | 24.9 |

Evaluation on Performance with Ground-truth Labels. We compare our CACL to the baseline method trained with the ground-truth labels (i.e., in the supervised setting). The results are shown in Table X. We can find that CACL achieves good performance under the unsupervised setting, which is merely 3%/1.1% lower at mAP/Rank-1 than the baseline method trained with the ground-truth labels on Market-1501. Moreover, if we provide ground-truth labels to train our CACL (i.e., CACL + labels), notable improvements in performance over the supervised baseline method can be observed.

TABLE X: Performance comparison to the baseline method in the supervised setting. "Baseline + labels" means that we use the ground-truth labels to train the baseline method, whereas "CACL + labels" means that we use the ground-truth labels to train our CACL.

| Method | Market-1501 mAP | Rank-1 | DukeMTMC-ReID mAP | Rank-1 |
|---|---|---|---|---|
| CACL | 80.9 | 92.7 | 69.6 | 82.6 |
| Baseline + labels | 83.9 | 93.6 | 73.3 | 86.6 |
| CACL + labels | 85.7 | 94.2 | 74.9 | 87.2 |

F. Data Visualization

To gain some intuitive understanding of the performance of our proposed CACL, we conduct a set of data visualization experiments on Market-1501 to visualize the clustering results of the learned features under two different training strategies: a) without using the contrastive losses L_C + L_I; and b) using the contrastive losses L_C + L_I.

The experimental results are shown in Fig. 6. We can observe that the contrastive loss L_C + L_I did help the model distinguish similar images while maintaining the cluster compactness, and also separate overlapping individual samples from each other. This confirms the effectiveness of our proposed approach, and it also shows that our approach can attenuate the influence of clothing color.

Fig. 6. Data visualization via t-SNE of the learned features and clusters under two different training strategies: training without L_C and L_I (left), as mentioned in Table III, and our full CACL (right). The data points come from the Market-1501 training set (1,000 images of 60 identities). Points with the same color correspond to images of the same identity. To demonstrate the difference between the two distributions in detail, we further zoom in on the circled clusters and show the corresponding images. The images in the boxes are similar to each other, and the corresponding data points are very close to each other or even overlapping in the feature space if the model is trained without using L_C and L_I, as shown in the left box; whereas using the contrastive losses L_C and L_I effectively distinguishes these data points while maintaining the cluster compactness, as shown in the right box.

At the same time, we also select some query samples with the top-10 best-matching images in the gallery set and show them in Fig. 7. Compared to the baseline model, our approach returns more accurate results. We can find that most of the wrong samples matched by the baseline model are dressed in the same color as the query sample. These results suggest that our approach can effectively ignore the interference caused by samples with similar colors and thus find more accurate matches.

Fig. 7. Visualization of the top-10 best-matched images. We show the top-10 best-matching samples in the gallery set for each query sample with the baseline method and our proposed CACL. The images with frames in green and in red are the correctly matched images and the mismatched images, respectively.

V. CONCLUSION

We have proposed a Cluster-guided Asymmetric Contrastive Learning (CACL) approach for unsupervised person Re-ID, in which cluster information is leveraged to guide the feature learning in a properly designed contrastive learning framework. Specifically, in our proposed CACL, instance-level contrastive learning is conducted with respect to the asymmetric data augmentation, and cluster-level contrastive learning is conducted with respect to the refined clustering result. By leveraging the refined cluster result in contrastive learning, CACL is able to effectively exploit the invariance within and between different data augmentation views for learning more effective features beyond the dominating colors. In addition, we confirmed that the refined clustering result helps our CACL approach mine invariant information more effectively at the cluster level. We have conducted extensive experiments on three benchmark datasets and demonstrated the superior performance of our proposal.

As future work, it is interesting and promising to incorporate attention mechanisms (e.g., [50], [51]), clustering ensembles and hybrid contrastive learning strategies (e.g., [52]), or side information in the dataset (e.g., [12]) to further enrich the representation capacity, improve the stability, and enhance the overall performance of the proposed framework. Moreover, in other related fields, such as face recognition or vehicle re-identification (e.g., [53], [54]), whether suppressing the dominating color can also bring a positive influence is a very interesting direction worth exploring.
R EFERENCES
REFERENCES

[1] L. Zheng, Y. Yang, and A. G. Hauptmann, "Person re-identification: Past, present and future," arXiv preprint arXiv:1610.02984, 2016.
[2] Y. Ge, D. Chen, and H. Li, "Mutual mean-teaching: Pseudo label refinery for unsupervised domain adaptation on person re-identification," in International Conference on Learning Representations, 2020.
[3] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Second International Conference on Knowledge Discovery and Data Mining, 1996, pp. 226–231.
[4] K. Zeng, M. Ning, Y. Wang, and Y. Guo, "Hierarchical clustering with hard-batch triplet loss for person re-identification," in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 13657–13665.

[5] Y. Fu, Y. Wei, G. Wang, Y. Zhou, H. Shi, and T. S. Huang, "Self-similarity grouping: A simple unsupervised cross domain adaptation approach for person re-identification," in IEEE International Conference on Computer Vision, 2019, pp. 6112–6121.
[6] J. Xie, X. Zhan, Z. Liu, Y. S. Ong, and C. C. Loy, "Delving into inter-image invariance for unsupervised visual representations," in Conference and Workshop on Neural Information Processing Systems, 2020.
[7] M. Caron, I. Misra, J. Mairal, P. Goyal, P. Bojanowski, and A. Joulin, "Unsupervised learning of visual features by contrasting cluster assignments," in Advances in Neural Information Processing Systems, 2020, pp. 9912–9924.
[8] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," in International Conference on Machine Learning, 2020, pp. 1597–1607.
[9] X. Chen and K. He, "Exploring simple siamese representation learning," in IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 15750–15758.
[10] J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. Richemond, E. Buchatskaya, C. Doersch, B. Avila Pires, Z. Guo, M. Gheshlaghi Azar, B. Piot, K. Kavukcuoglu, R. Munos, and M. Valko, "Bootstrap your own latent - a new approach to self-supervised learning," in Advances in Neural Information Processing Systems, 2020, pp. 21271–21284.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Conference and Workshop on Neural Information Processing Systems, 2012, pp. 1097–1105.
[12] H. Chen, Y. Wang, B. Lagadec, A. Dantcheva, and F. Bremond, "Joint generative and contrastive learning for unsupervised person re-identification," in IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 2004–2013.
[13] J. Liu, Z.-J. Zha, D. Chen, R. Hong, and M. Wang, "Adaptive transfer network for cross-domain person re-identification," in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 7202–7211.
[14] S. Bak, P. Carr, and J.-F. Lalonde, "Domain adaptation through synthesis for unsupervised person re-identification," in European Conference on Computer Vision, 2018, pp. 189–205.
[15] P. Peng, T. Xiang, Y. Wang, M. Pontil, S. Gong, T. Huang, and Y. Tian, "Unsupervised cross-dataset transfer learning for person re-identification," in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1306–1315.
[16] J. Wang, X. Zhu, S. Gong, and W. Li, "Transferable joint attribute-identity deep learning for unsupervised person re-identification," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2275–2284.
[17] Y. Ge, F. Zhu, D. Chen, R. Zhao, and H. Li, "Self-paced contrastive learning with hybrid memory for domain adaptive object re-id," in Advances in Neural Information Processing Systems, 2020, pp. 11309–11321.
[18] Y. Lin, X. Dong, L. Zheng, Y. Yan, and Y. Yang, "A bottom-up clustering approach to unsupervised person re-identification," in AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 8738–8745.
[19] H. Fan, L. Zheng, C. Yan, and Y. Yang, "Unsupervised person re-identification: Clustering and fine-tuning," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 14, no. 4, p. 83, 2018.
[20] Y. Lin, L. Xie, Y. Wu, C. Yan, and Q. Tian, "Unsupervised person re-identification via softened similarity learning," in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 3390–3399.
[21] B. Sun, J. Feng, and K. Saenko, "Return of frustratingly easy domain adaptation," in AAAI Conference on Artificial Intelligence, vol. 30, 2016.
[22] Y. Ganin and V. Lempitsky, "Unsupervised domain adaptation by backpropagation," in International Conference on Machine Learning, 2015, pp. 1180–1189.
[23] D. Wang and S. Zhang, "Unsupervised person re-identification via multi-label classification," in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 10981–10990.
[24] P. Bojanowski and A. Joulin, "Unsupervised learning by predicting noise," in International Conference on Machine Learning, 2017, pp. 517–526.
[25] A. Dosovitskiy, P. Fischer, J. T. Springenberg, M. Riedmiller, and T. Brox, "Discriminative unsupervised feature learning with exemplar convolutional neural networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 9, pp. 1734–1747, 2015.
[26] Y. Li, P. Hu, Z. Liu, D. Peng, J. T. Zhou, and X. Peng, "Contrastive clustering," in AAAI Conference on Artificial Intelligence, 2021.
[27] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, "Momentum contrast for unsupervised visual representation learning," in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 9729–9738.
[28] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[29] Y. Zhai, S. Lu, Q. Ye, X. Shan, J. Chen, R. Ji, and Y. Tian, "Ad-cluster: Augmented discriminative clustering for domain adaptive person re-identification," in IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 9021–9030.
[30] L. Wei, S. Zhang, W. Gao, and Q. Tian, "Person transfer GAN to bridge domain gap for person re-identification," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 79–88.
[31] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[32] W. Deng, L. Zheng, Q. Ye, G. Kang, Y. Yang, and J. Jiao, "Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 994–1003.
[33] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "Mobilenets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[34] F. Yang, Z. Zhong, Z. Luo, S. Lian, and S. Li, "Leveraging virtual and real person for unsupervised person re-identification," IEEE Transactions on Multimedia, vol. 22, no. 9, pp. 2444–2453, 2019.
[35] Z. Zhong, L. Zheng, S. Li, and Y. Yang, "Generalizing a person retrieval model hetero- and homogeneously," in European Conference on Computer Vision, 2018, pp. 172–188.
[36] Z. Ji, X. Zou, X. Lin, X. Liu, T. Huang, and S. Wu, "An attention-driven two-stage clustering method for unsupervised person re-identification," in European Conference on Computer Vision, 2020, pp. 20–36.
[37] J. Li and S. Zhang, "Joint visual and temporal consistency for unsupervised domain adaptive person re-identification," in European Conference on Computer Vision, 2020.
[38] Y. Zhai, Q. Ye, S. Lu, M. Jia, R. Ji, and Y. Tian, "Multiple expert brainstorming for domain adaptive person re-identification," in European Conference on Computer Vision, 2020, pp. 594–611.
[39] F. Zhao, S. Liao, G.-S. Xie, J. Zhao, K. Zhang, and L. Shao, "Unsupervised domain adaptation with noise resistible mutual-training for person re-identification," in European Conference on Computer Vision, 2020, pp. 526–544.
[40] H.-X. Yu, A. Wu, and W.-S. Zheng, "Cross-view asymmetric metric learning for unsupervised person re-identification," in IEEE International Conference on Computer Vision, 2017, pp. 994–1002.
[41] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, and Q. Tian, "Scalable person re-identification: A benchmark," in IEEE International Conference on Computer Vision, 2015, pp. 1116–1124.
[42] S. Liao, Y. Hu, X. Zhu, and S. Z. Li, "Person re-identification by local maximal occurrence representation and metric learning," in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2197–2206.
[43] M. Wang, B. Lai, J. Huang, X. Gong, and X.-S. Hua, "Camera-aware proxies for unsupervised person re-identification," in AAAI Conference on Artificial Intelligence, vol. 2, 2021, p. 4.
[44] X. Pan, P. Luo, J. Shi, and X. Tang, "Two at once: Enhancing learning and generalization capacities via IBN-Net," in European Conference on Computer Vision, 2018, pp. 464–479.
[45] E. Ristani, F. Solera, R. Zou, R. Cucchiara, and C. Tomasi, "Performance measures and a data set for multi-target, multi-camera tracking," in European Conference on Computer Vision, 2016, pp. 17–35.
[46] L. Wei, S. Zhang, W. Gao, and Q. Tian, "Person transfer GAN to bridge domain gap for person re-identification," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 79–88.
[47] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in International Conference on Learning Representations, 2015.
[48] Z. Zhong, L. Zheng, Z. Luo, S. Li, and Y. Yang, "Invariance matters: Exemplar memory for domain adaptive person re-identification," in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 598–607.

[49] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin, "Unsupervised feature learning via non-parametric instance discrimination," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3733–3742.
[50] J. Si, H. Zhang, C.-G. Li, J. Kuen, X. Kong, A. C. Kot, and G. Wang, "Dual attention matching network for context-aware feature sequence based person re-identification," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5363–5372.
[51] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16x16 words: Transformers for image recognition at scale," in International Conference on Learning Representations, 2021.
[52] H. Sun, M. Li, and C.-G. Li, "Hybrid contrastive learning with cluster ensemble for unsupervised person re-identification," arXiv preprint arXiv:2201.11995, 2022.
[53] X. Liu, W. Liu, H. Ma, and H. Fu, "Large-scale vehicle re-identification in urban surveillance videos," in IEEE International Conference on Multimedia and Expo (ICME), 2016, pp. 1–6.
[54] X. Liu, W. Liu, T. Mei, and H. Ma, "Provid: Progressive and multi-modal vehicle reidentification for large-scale urban surveillance," IEEE Transactions on Multimedia, vol. 20, no. 3, pp. 645–658, 2017.
