In this paper, we pick up the general idea of using contrastive learning to enhance the model's ability on long-tailed data. A new framework is developed called FEND: Future ENhanced Distribution-aware contrastive trajectory prediction, which is a pattern-based contrastive feature learning framework enhanced by future trajectory information. An offline trajectory clustering process and prototypical contrastive learning are introduced for recognizing and separating different trajectory patterns to boost the modeling of tail samples. To deal with the aforementioned problem, the features of trajectories within the same pattern cluster are pulled together, while the features from different pattern clusters are pushed apart. Moreover, a more flexible network structure of the decoder is introduced to exploit the shaped feature embedding space with different pattern clusters. Our contributions can be summarized as follows:
• We propose a future enhanced contrastive feature learning framework for long-tailed trajectory prediction, which can better distinguish tail patterns from head patterns; the different patterns are represented by different cluster prototypes to enhance the modeling of the tailed data.
• We propose a distribution-aware hyper predictor, aiming at providing separated decoder parameters for trajectory inputs with different patterns.
• Experimental results show that our proposed framework can outperform state-of-the-art methods.

Code is available at https://github.com/ynw2021/FEND.

2. Related Work

2.1. Trajectory Prediction

Deep learning has become a mainstream trajectory prediction approach because of its powerful representational ability. Some studies [1, 32, 43, 46, 48] focus on better modeling subtle relationships such as social interactions to make their predictions more precise, and some works [29, 33, 35, 36, 50] aim to produce more diverse trajectory proposals. Strong baselines [30, 38, 40, 47] have also been put forward. Although trajectory prediction methods have become increasingly accurate, the long-tail issue in the task of trajectory prediction has rarely been discussed.

Trajectory prediction approaches based on clustering. Existing methods [5, 42, 44] have used trajectory clustering for trajectory prediction. MultiPath [5] performs Kmeans with the squared distances between trajectories to get anchor trajectory sets. PCCS-Net [42] decouples multimodal trajectory prediction into three steps: feature clustering, cluster selecting, and synthesizing. Memo-Net [44] clusters trajectories in the original coordinates and uses an attention network for better cluster selecting. All existing methods that use trajectory clustering aim at selecting future modalities for trajectory decoders and producing more diverse trajectories, which is different from our goal to distinguish tail patterns from head patterns and optimize the feature embedding space.

Trajectory prediction approaches based on contrastive learning. Contrastive learning [34] is a self-supervised method to improve the representation ability of the network given the similarities between sample pairs, and it has many variants [4, 8, 19, 20] with different ways of selecting positive and negative samples and calculating the contrastive loss. Prototypical Contrastive Learning (PCL) [23] is a variant of contrastive learning that can preserve local smoothness and therefore induce a semantically hierarchical clustered feature space [23]. Contrastive learning has also been incorporated into trajectory prediction. DisDis [7] uses contrastive learning in a CVAE framework to discriminate the latent variable distributions and make the predictions more diverse. ABC+ [12] uses action labels from its datasets and contrasts according to them. Social-NCE [26] uses contrastive learning to push the predictions away from simulated collision cases. None of the above-mentioned methods discusses long-tail prediction. The most relevant work is from Makansi et al. [28], which also tries to solve the long-tail prediction problem with contrastive learning and uses Kalman prediction errors to select positive and negative samples. Makansi et al. [28] push all the tailed samples together in their method. In this work, we not only separate the tails from the heads as the study [28] did, but also recognize the patterns of the tailed samples, since tailed samples can be tailed in different ways, e.g. turning or accelerating, as shown in Fig. 1 and Fig. 3, which further improves the model capabilities.

2.2. Long-tailed Learning

Long-tailed learning aims to improve the performance on tailed samples when faced with unbalanced data. Most existing methods focus on classification tasks. Typical methods apply data resampling [6, 13, 41] or loss reweighting [9, 15, 25] to improve the capability of the network on tailed samples. Recent advances [3, 31] seek a theoretical balance of head-tail performance by adjusting the classification boundaries, whereas these methods cannot be directly used in regression tasks. Very recently, Yang et al. [45] investigated imbalanced regression tasks and proposed a feature distribution smoothing and label distribution smoothing method. But the methodology in [45] needs labels with structured relationships, which is incongruent with trajectory data. In our method, we find structured relationships between trajectories by forming pattern clusters and optimize the feature space accordingly. Besides, we use a hypernetwork [11] as the trajectory decoder to deal with tail samples, utilizing its distribution-aware modeling ability, which, to the best of our knowledge, has not been discussed in long-tail regression.
Figure 2. Illustration of our overall future enhanced distribution-aware contrastive learning framework. Top: Offline Kmeans clustering for
pseudo cluster labels. Bottom: The baseline prediction network with FEND plugged in for prediction. The FEND module contains a PCL
optimization procedure and a hyper decoder.
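Since the clustering step itself (Sec. 3.1.1) is not reproduced in this excerpt, the following is only a rough sketch of what the top row of Figure 2 describes: offline multi-level Kmeans over full (observed plus future) trajectories, using the hierarchy sizes {20, 50, 100} reported in Sec. 4.4. The flattened-coordinate representation and all names are assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def offline_pseudo_labels(past, future, n_clusters_list=(20, 50, 100)):
    """Offline multi-level Kmeans over full trajectories, one label set per hierarchy.

    past:   (N, T_obs, 2) observed coordinates
    future: (N, T_pred, 2) ground-truth future coordinates (available at training time only)
    """
    # Concatenating past and future makes the clusters reflect the full motion pattern,
    # so tail behaviours such as turns or accelerations fall into their own clusters.
    full = np.concatenate([past, future], axis=1).reshape(len(past), -1)
    labels = []
    for k in n_clusters_list:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(full)
        labels.append(km.labels_)   # pseudo cluster label of every trajectory
    return labels                   # later consumed by the PCL loss of Sec. 3.1.2
```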
3.1.2 Prototypical Contrastive Learning

In our method, we have already obtained the cluster labels after the trajectory clustering step. Therefore we use the cluster assignments as pseudo labels for computing prototypes and densities. The original PCL [23] is a self-supervised method with EM steps, and therefore it needs to perform clustering before every training epoch. Our method uses the pseudo labels to reduce the clustering steps and therefore requires less computation than the original PCL. Given pseudo cluster labels, PCL can pull the features of instances belonging to the same cluster together and push the features of instances in different clusters apart, as vanilla contrastive learning does with positive and negative samples.

Implementing the PCL loss. We apply PCL to the features at the bottleneck of the encoder-decoder trajectory prediction network, i.e., after the encoder. Similar to Makansi et al. [28], we add a fully-connected (FC) layer after the encoder and add the PCL loss to its output features. The features before the FC layer are given to the trajectory decoder. We perform a multi-level clustering with M hierarchies when calculating the PCL loss. The PCL loss is as follows:

L_{ProtoNCE} = L_{ins} + L_{proto} ,        (1)

where the first term is an instance-wise contrastive term and the second term is an instance-prototype contrastive term.

Instance-wise term. The first term in Eq. (1) is an instance-wise contrastive term considering the pseudo cluster labels, which can be written as follows:

L_{ins} = - \sum_{i=1}^{r} \frac{1}{N_{poi}} \sum_{i^{+}=1}^{N_{poi}} \log \frac{\exp(v_i \cdot v_{i^{+}} / \tau)}{\sum_{j=1}^{r} \exp(v_i \cdot v_j / \tau)} .        (2)

The instance-wise term helps the instances gather together faster and the algorithm converge faster. v_i and v_{i^{+}} are the feature embeddings of trajectory instance i and positive sample i^{+} after the encoder, respectively, with i^{+} \neq i. N_{poi} is the number of positive samples of i in a batch, and \tau is the contrastive temperature of the instance-wise contrastive term. In Eq. (2), the positive samples i^{+} are the instances from the same cluster as instance i, and the remaining instances in the batch, i.e. those belonging to other clusters, are regarded as negative samples. j denotes an arbitrary sample in the current batch, and r denotes the batch size.

Instance-prototype term. The second term in Eq. (1) is an instance-prototype contrastive term, which can be written as follows:

L_{proto} = - \frac{1}{M} \sum_{i=1}^{r} \sum_{m=1}^{M} \log \frac{\exp(v_i \cdot c_s^m / \phi_s^m)}{\sum_{j=1}^{N_m} \exp(v_i \cdot c_j^m / \phi_j^m)} .        (3)

The prototypes help preserve local smoothness and the formation of clusters with different patterns. In Eq. (3), M is the number of Kmeans clustering hierarchies, c_s^m is the prototype of the cluster to which i belongs, and c_j^m is the prototype of an arbitrary cluster j. A prototype is calculated by averaging all the features in a cluster. N_m denotes the number of clusters in hierarchy m. \phi_j^m denotes the density of a cluster j, which is calculated as below:

\phi = \frac{\sum_{z=1}^{Z} \| v'_z - c \|_2}{Z \log(Z + \alpha)} ,        (4)

where Z is the number of instances in the cluster, and \alpha is a smoothing factor that ensures small clusters do not have an overly large \phi. We set \alpha = 10, the same as [23]. v'_z is the momentum-updated feature of instance z, used to ensure stability.
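For concreteness, the sketch below condenses Eqs. (1)-(4) into PyTorch-style code. It is an illustration only, under simplifying assumptions (prototypes and densities are computed from the current batch rather than over the whole dataset, features are l2-normalized, and positives are taken from the finest hierarchy); the function and variable names are ours, not from the released code.

```python
import math
import torch
import torch.nn.functional as F

def protonce_loss(v, labels_per_level, v_momentum, tau=0.1, alpha=10.0):
    """Sketch of Eqs. (1)-(4).

    v:                (r, d) bottleneck features of the current batch (after the FC head).
    labels_per_level: list of (r,) integer pseudo cluster labels, one per Kmeans hierarchy.
    v_momentum:       (r, d) momentum-updated features v' used for the density of Eq. (4).
    """
    v = F.normalize(v, dim=1)
    r = v.size(0)

    # Instance-wise term, Eq. (2): positives share the finest-level pseudo label.
    lab0 = labels_per_level[0]
    sim = v @ v.t() / tau
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (lab0[:, None] == lab0[None, :]) & ~torch.eye(r, dtype=torch.bool, device=v.device)
    l_ins = -((log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)).sum()

    # Instance-prototype term, Eq. (3), with per-cluster densities from Eq. (4).
    l_proto = 0.0
    for lab in labels_per_level:
        clusters = lab.unique()
        protos, phis = [], []
        for c in clusters:
            idx = lab == c
            proto = F.normalize(v[idx].mean(0), dim=0)
            z = int(idx.sum())
            phi = (v_momentum[idx] - proto).norm(dim=1).sum() / (z * math.log(z + alpha))
            protos.append(proto)
            phis.append(phi)
        protos = torch.stack(protos)               # (N_m, d) cluster prototypes
        phis = torch.stack(phis).clamp(min=1e-3)   # (N_m,) cluster densities
        logits = (v @ protos.t()) / phis[None, :]
        target = torch.searchsorted(clusters, lab)  # index of each sample's own cluster
        l_proto = l_proto + F.cross_entropy(logits, target, reduction="sum")
    l_proto = l_proto / len(labels_per_level)

    return l_ins + l_proto                          # Eq. (1)
```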
3.2. Distribution-Aware Hyper Predictor

Distribution-aware hypernetwork. Intuitively, the head clusters and the tail clusters should be assigned different decoders to reduce their influence on each other. However, there is an insufficient amount of data for the tail samples, and separately training decoders for them would cause severe overfitting. Therefore, we want to transfer common knowledge across the whole dataset while keeping the modeling flexibility of separate decoders. HyperNetworks [11] is an approach that uses a small network, known as a hypernetwork, to generate the weights of the main network, and it naturally suits our demands. The hypernetwork contains the knowledge of all samples, which prevents overfitting. Also, there are separate decoder parameters for head and tail clusters, which make the decoder aware of the distribution of the clustered feature space, so the hyper decoder can predict the tailed clusters differently.

LSTM trajectory decoder. As an example of a hyper predictor, we employ an LSTM as the trajectory decoder, which is commonly used in recent studies [28, 38, 49]. The original formulation of an LSTM is as follows:

i_t = W_{hi} h_{t-1} + W_{xi} x_t + b_i ,
g_t = W_{hg} h_{t-1} + W_{xg} x_t + b_g ,
f_t = W_{hf} h_{t-1} + W_{xf} x_t + b_f ,
o_t = W_{ho} h_{t-1} + W_{xo} x_t + b_o ,        (5)
m_t = \sigma(f_t) \odot m_{t-1} + \sigma(i_t) \odot \psi(g_t) ,
h_t = \sigma(o_t) \odot \psi(m_t) ,

where i, g, f, o are the input gate, update gate, forget gate, and output gate, respectively. W_h \in \mathbb{R}^{N_h \times N_h}, W_x \in \mathbb{R}^{N_h \times N_x}, and b \in \mathbb{R}^{N_h}, where N_h and N_x are the dimensions of the hidden and input states. h_t and m_t are the hidden state and the cell state. \sigma is the sigmoid operator, and \psi is the tanh operator.
The initial x and h are produced by the feature embedding v of the observed trajectory:

x_1 = W_{xv} v + b_x^v ,
h_0 = W_{hv} v + b_h^v ,        (6)

where W_{hv} \in \mathbb{R}^{N_h \times N_v}, W_{xv} \in \mathbb{R}^{N_x \times N_v}, b_h^v \in \mathbb{R}^{N_h}, and b_x^v \in \mathbb{R}^{N_x}.

HyperLSTM. In our implementation, the formulation of an LSTM with a small hypernetwork is as follows:

y_t = LN( d_h^y \odot W_{hy} h_{t-1} + d_x^y \odot W_{xy} x_t + b^y(z_b^y) ) ,        (7)

where

d_h^y(z_h) = W_{hz}^y z_h ,
d_x^y(z_x) = W_{xz}^y z_x ,        (8)
b^y(z_b^y) = W_{bz}^y z_b^y + b_0^y .

In Eq. (7), y stands for one of the four gates {i, g, f, o} of the original LSTM formulation Eq. (5), for brevity. \odot denotes the element-wise product operation and LN(\cdot) denotes layer normalization. The d's and b^y are the weight and bias adjusting vectors produced by the hypernetwork to change the weights and biases of the original LSTM. They are generated from the hypernetwork outputs z as in Eq. (8), where the W's and b_0^y are the weights and biases of linear fully-connected layers. For instance i with input feature v_i, z can be written as:

z_i = f_H(v_i) ,        (9)

where f_H is the hypernetwork mapping function, which should be a shallow network to reduce computation and prevent overfitting.
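To make this concrete, below is a minimal PyTorch sketch of a hyper-modulated LSTM cell in the spirit of Eqs. (6)-(9). The four gates are stacked into one linear map, LayerNorm is applied jointly for brevity, and all layer sizes except the hidden size 128 of the hypernetwork MLP (Sec. 4.4) are placeholder assumptions; this is not the released implementation.

```python
import torch
import torch.nn as nn

class HyperLSTMCell(nn.Module):
    """One decoding step of an LSTM whose gate pre-activations are modulated by a small hypernetwork."""

    def __init__(self, n_x, n_h, n_v, n_z=32):
        super().__init__()
        self.w_h = nn.Linear(n_h, 4 * n_h, bias=False)   # W_hy for y in {i, g, f, o}
        self.w_x = nn.Linear(n_x, 4 * n_h, bias=False)   # W_xy
        # f_H: a shallow MLP with hidden size 128 producing z = (z_h, z_x, z_b), Eq. (9)
        self.hyper = nn.Sequential(nn.Linear(n_v, 128), nn.ReLU(), nn.Linear(128, 3 * n_z))
        self.d_h = nn.Linear(n_z, 4 * n_h, bias=False)   # W_hz^y, Eq. (8)
        self.d_x = nn.Linear(n_z, 4 * n_h, bias=False)   # W_xz^y
        self.b = nn.Linear(n_z, 4 * n_h)                  # W_bz^y z_b + b_0^y
        self.ln = nn.LayerNorm(4 * n_h)                   # LN of Eq. (7), applied jointly here
        self.init_x = nn.Linear(n_v, n_x)                 # Eq. (6)
        self.init_h = nn.Linear(n_v, n_h)

    def init_state(self, v):
        h0 = self.init_h(v)
        return self.init_x(v), h0, torch.zeros_like(h0)   # x_1, h_0, m_0

    def forward(self, v, x, h, m):
        z_h, z_x, z_b = self.hyper(v).chunk(3, dim=-1)
        pre = self.ln(self.d_h(z_h) * self.w_h(h)         # d_h^y * (W_hy h_{t-1})
                      + self.d_x(z_x) * self.w_x(x)       # d_x^y * (W_xy x_t)
                      + self.b(z_b))                       # b^y(z_b^y)
        i, g, f, o = pre.chunk(4, dim=-1)
        m = torch.sigmoid(f) * m + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(m)
        return h, m
```

Because z_i = f_H(v_i) is computed from the same feature v_i that the PCL loss organizes into pattern clusters, samples from different clusters effectively receive different decoder weights while the hypernetwork parameters remain shared across the whole dataset.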
3.3. Loss Reweighting

Our final network loss is as follows:

L = L_{pred} + \lambda L_{ProtoNCE} ,        (10)

where L_{pred} is the loss of the baseline prediction network and \lambda is a coefficient on the PCL loss term. For easy samples that the network has already fitted well, the PCL loss would hardly bring any further benefit to the network optimization. Thus, we make \lambda vary across samples, so that it acts as a gate that shuts off the PCL loss on easy samples. We use the prediction loss L_{pred} of the network after a warm-up training stage to indicate the hardness of a sample, denoted as L'_{pred}. \lambda is determined according to L'_{pred}:

\lambda = a ,   if L'_{pred} > \theta ,
\lambda = 0 ,   if L'_{pred} < \theta ,        (11)

where a is a constant hyperparameter and \theta is the threshold to filter out head samples.
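Read as a training-loop rule, the gate can be sketched as follows (assuming per-sample loss values are available; the threshold \theta = 0.2 and the initial a = 50, which later decays, follow Sec. 4.4 and are not shown decaying here):

```python
import torch

def fend_total_loss(l_pred, l_pred_warmup, l_protonce, a=50.0, theta=0.2):
    """Eqs. (10)-(11): per-sample gate on the PCL loss.

    l_pred:        per-sample prediction loss of the baseline network, shape (r,)
    l_pred_warmup: per-sample prediction loss L'_pred recorded after the warm-up stage, shape (r,)
    l_protonce:    per-sample ProtoNCE loss, shape (r,)
    """
    lam = torch.where(l_pred_warmup > theta,
                      torch.full_like(l_pred_warmup, a),
                      torch.zeros_like(l_pred_warmup))
    return (l_pred + lam * l_protonce).mean()
```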
4. Experiments

4.1. Datasets

We evaluate our proposed method on several widely used public pedestrian datasets, including ETH-UCY, Nuscenes and SDD. ETH-UCY is a pedestrian dataset with rich social interactions. Nuscenes is a large-scale trajectory dataset with both vehicles and pedestrians; in this work, we mainly evaluate the performance of our model on the vehicle category, same as [28]. SDD is another large-scale bird's-eye-view trajectory dataset. We use ETH-UCY and Nuscenes in the same way as our backbone Traj++ EWTA [28], and SDD in the same way as our backbone Y-Net [30].

4.2. Evaluation Metrics

Performance metrics. We use the common metrics for evaluating multimodal trajectory prediction performance: Average Displacement Error (ADE) and Final Displacement Error (FDE), which are commonly used in studies [1, 5, 48]. ADE is the averaged L2 distance between the predicted future and the ground-truth trajectory, while FDE is the L2 distance between the final predicted destination and the ground-truth destination. For evaluating multi-modality, we calculate the minimum ADE and FDE among all the output guesses, denoted as minADE and minFDE, and average them across the dataset.
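As a reference, the best-of-K metrics reduce to a few lines (array shapes and names are assumptions):

```python
import numpy as np

def min_ade_fde(pred, gt):
    """pred: (K, T, 2) multimodal guesses; gt: (T, 2) ground-truth future.
    Returns (minADE, minFDE) for one agent; dataset scores are averages over agents."""
    err = np.linalg.norm(pred - gt[None], axis=-1)   # (K, T) per-step L2 distances
    return err.mean(axis=1).min(), err[:, -1].min()
```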
Tailed test sample selection. In order to evaluate our model on long-tailed data, we need to separate the hard samples from the easy ones for evaluation. Specifically, we use the testing FDEs of the baseline method as the threshold to divide the datasets into seven kinds of samples: the top 1%-5% challenging samples with the largest errors, the rest of the easier samples, as well as all samples in the datasets. In [28], the Kalman predictor's prediction error is utilized for dataset division. Compared with the FDEs of a simple Kalman predictor, the performance of an advanced baseline predictor can better reflect how difficult a sample is for the data-driven network to model, which better reveals the ability of long-tail prediction methods to deal with the hard tailed samples. The Kalman-based divisions are discussed in the supplementary material.
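Concretely, the split can be obtained by ranking the test samples by the baseline's FDE (a sketch; the names are ours):

```python
import numpy as np

def split_by_baseline_fde(baseline_fde, percents=(1, 2, 3, 4, 5)):
    """baseline_fde: (N,) per-sample test FDEs of the baseline predictor.
    Returns index arrays for each difficulty bucket (hardest samples first)."""
    order = np.argsort(-baseline_fde)                 # sort by descending error
    n = len(baseline_fde)
    buckets = {f"top {p}%": order[: max(1, round(n * p / 100))] for p in percents}
    buckets["rest"] = order[round(n * max(percents) / 100):]
    buckets["all"] = order
    return buckets
```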
4.3. Baseline

We use Trajectron++ EWTA (Traj++ EWTA) [28] as the baseline for our framework on ETH-UCY and Nuscenes; it has achieved state-of-the-art results according to [28]. Traj++ EWTA augments Trajectron++ [38] by removing the conditional variational autoencoder parts and using a multi-head decoder trained with the evolving winner-take-all (EWTA) strategy. Another strong baseline we experiment on is Y-Net [30], which uses a U-Net backbone and achieves state-of-the-art results on SDD.
Method Top 1% Top 2% Top 3% Top 4% Top 5% Rest All
Traj++ EWTA [28] 0.98/2.54 0.79/2.07 0.71/1.81 0.65/1.63 0.60/1.50 0.14/0.26 0.17/0.32
Traj++ EWTA+resample [41] 0.90/2.17 0.77/1.90 0.73/1.78 0.66/1.60 0.64/1.52 0.20/0.41 0.23/0.47
Traj++ EWTA+reweighting [9] 0.97/2.47 0.78/2.03 0.68/1.73 0.62/1.55 0.56/1.40 0.14/0.26 0.16/0.32
Traj++ EWTA+LDAM [3] 0.92/2.35 0.76/1.96 0.68/1.71 0.62/1.53 0.57/1.37 0.15/0.27 0.17/0.32
Traj++ EWTA+contrastive [28] 0.92/2.33 0.74/1.91 0.67/1.71 0.60/1.48 0.55/1.32 0.15/0.27 0.17/0.32
Traj++ EWTA+FEND (ours) 0.84/2.13 0.68/1.68 0.61/1.46 0.56/1.30 0.52/1.19 0.15/0.27 0.17/0.32
Table 1. Prediction errors in the format of (minADE/minFDE) in meters on seven kinds of testing samples on the ETH-UCY dataset.
Table 2. Prediction errors in the format of (minADE/minFDE) in meters on seven kinds of testing samples on Nuscenes dataset.
4.4. Implementation Details

We follow the training schedule of Traj++ EWTA and train the network with a batch size of 256 for 100 epochs on ETH-UCY and 5 epochs on Nuscenes in each EWTA stage. The learning rate is initially set to 0.01 and decays exponentially at a rate of 0.001. We use a warm-up of 300 epochs in our final model for ETH-UCY. We set a = 50 as the initial loss factor, same as [28], and a decays to 0.2 after 100 epochs so as not to harm the prediction training process, according to the drop of the EWTA loss. The head sample filter threshold \theta is set to 0.2. For the feature extractor, we use a 1D CNN with 16 output channels and a kernel size of 3, followed by an LSTM with a hidden size of 128. For Kmeans clustering, we use {20, 50, 100} as the cluster numbers to obtain hierarchical clusters. We use a fully-connected multilayer perceptron with a hidden size of 128 as the hypernetwork. To train Y-Net, we follow [22] and average-pool the encoded feature of shape (C, H, W) over the spatial dimensions to get a C-dimensional vector, and perform PCL on it. We set a = 1 and use no warm-up.
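For reference, the feature extractor described above could look roughly as follows; this is a sketch consistent with the stated sizes, and the exact layer arrangement, the projection size, and the names are our assumptions.

```python
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """1D CNN (16 output channels, kernel size 3) followed by an LSTM with hidden size 128,
    plus the FC head whose output receives the PCL loss (Sec. 3.1.2)."""

    def __init__(self, in_dim=2, hidden=128, proj_dim=64):
        super().__init__()
        self.cnn = nn.Conv1d(in_dim, 16, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, proj_dim)

    def forward(self, traj):
        # traj: (B, T_obs, 2) observed trajectory coordinates
        x = self.cnn(traj.transpose(1, 2)).transpose(1, 2)   # (B, T_obs, 16)
        _, (h, _) = self.lstm(x)
        feat = h[-1]                      # (B, 128): fed to the hyper decoder
        return feat, self.fc(feat)        # (decoder feature, PCL embedding)
```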
4.5. Comparisons with Others

Quantitative comparisons on Traj++ EWTA on ETH-UCY. To show the effectiveness of our method, we select the state-of-the-art method for long-tail trajectory prediction [28], classical data resampling [41] and loss reweighting [9], and a head-tail performance balancing method [3] for comparison. For the long-tailed classification methods [3, 9, 41], we construct a classification head after the encoder of Traj++ EWTA and use it to classify the trajectories according to the discretization of Kalman filter errors, same as Makansi et al. [28]; the classification loss is trained along with the prediction loss. Table 1 summarizes our experimental results on ETH-UCY using a best-of-20 evaluation [10]. We can see that our method stably outperforms all compared methods on all the top 1%-5% long-tail samples. Specifically, our framework outperforms the second best method, Traj++ EWTA+contrastive [28], by 9.5% on ADE and 8.5% on FDE on the top 1% hardest samples, while keeping the average ADE and FDE nearly stable. Traj++ EWTA+reweighting [9] performs best on the average ADE/FDE, but its performance gains on tailed samples are relatively small. Traj++ EWTA+resampling [41] achieves larger gains on the most tailed samples, but its average ADE/FDE becomes much worse. Unlike simply doing resampling or loss reweighting, the hypernetwork can decouple head samples and tail samples in the parameter space of the decoder and therefore achieves better performance.

Quantitative comparisons on Traj++ EWTA on Nuscenes. Comparison results with the previous best long-tail prediction method [28] on Nuscenes are in Table 2. We find that the resampling operation in the original Traj++ EWTA does not work well with FEND, probably because it causes the hypernetwork to overfit. Despite this, as shown in Table 2, the baseline without resampling achieves both superior long-tail and overall performance with FEND. The performances of Traj++ EWTA and Traj++ EWTA+contrastive on both ETH-UCY and Nuscenes are tested with the pre-trained models provided by [28].

Quantitative comparisons on Y-Net on SDD. We also plug our module into Y-Net; the results are shown in Table 3. We reproduced the results of Y-Net using the officially released code of [30] with 42 as the random seed, since the original method does not fix a seed. The results show that our method achieves performance gains on both the tail samples and the whole dataset.

Qualitative comparison. Figure 3 shows some long-tailed hard-case studies of our method on ETH-UCY. Those cases contain some rare social interactions, and all the future trajectories in them are non-trivial to predict. In all those samples, our method (blue) outperforms the original baseline.
Method Top 1% Top 2% Top 3% Top 4% Top 5% Rest All
Y-Net* [30] 65.82/134.01 51.84/104.37 43.74/88.21 38.68/76.08 34.72/67.46 6.54/8.96 7.93/11.88
Y-Net*+FEND 57.58/108.51 46.33/86.93 39.22/75.02 35.05/66.24 31.27/57.98 6.64/9.24 7.87/11.68
Table 3. Prediction errors in the format of (minADE/minFDE) on seven kinds of testing samples on SDD dataset. * means the results are
reproduced using the official released code of [30].
Components Performance(ADE/FDE)
PCL F H Top 1% Top 2% Top 3% Top 4% Top 5% Rest All
0.98/2.54 0.79/2.07 0.71/1.81 0.65/1.63 0.60/1.50 0.14/0.26 0.17/0.32
✓ 0.96/2.41 0.79/2.03 0.70/1.77 0.62/1.56 0.57/1.41 0.15/0.27 0.17/0.32
✓ ✓ 0.89/2.23 0.72/1.84 0.66/1.61 0.60/1.44 0.55/1.30 0.15/0.27 0.17/0.32
✓ ✓ 0.90/2.28 0.72/1.87 0.65/1.61 0.58/1.43 0.54/1.30 0.15/0.27 0.17/0.32
✓ ✓ ✓ 0.84/2.13 0.68/1.68 0.61/1.46 0.56/1.30 0.52/1.19 0.15/0.27 0.17/0.32
Table 4. Ablation study of different modules in FEND. F means future enhanced clusters, H means the hypernetwork.
Figure 5. TSNE results of (a)Traj++EWTA (b)Traj++ EWTA+contrastive (c)Traj++ EWTA+FEND on the univ scene. The red stars, the
green stars, and the yellow stars represent clusters of three kinds of hard tailed patterns, while the magenta and cyan dots represent clusters
of two kinds of easy head patterns. We can see from the figures that our method forms a more separately clustered feature space.
Figure 6. CDF curve and CDF bars of testing FDEs on ETH-UCY. It can be seen that our method has a shorter tail region.

Parameter sensitivity study. Table 5 shows the parameter sensitivity study of the PCL loss weight a. We can see that setting a = 50 initially is the best choice. Other parameter sensitivity studies are provided in the supplementary material.

Shaped feature embedding space. Figure 5 shows the TSNE results of the feature space of our method and two comparing methods, with two head patterns and three tailed patterns. We can see from the figure that our future enhanced PCL method can decently separate the tail patterns and the head patterns, while there is still some overlap between the heads and the tails in the feature space of Traj++ EWTA and Traj++ EWTA+contrastive. Also, we can see from Fig. 5 that our method can form different clusters for different tailed patterns, while in the feature space of Traj++ EWTA+contrastive, all the samples of the three tail patterns are pushed together, as discussed in Sec. 2.

FDE distribution bars. To illustrate the distribution of the prediction errors across the dataset more clearly, we plot the cumulative distribution function (CDF) curve of FDEs in Figure 6, compared with [28]. The CDF is averaged across the five scenes.

Limitations. The performance on the head samples drops slightly, which can be seen in Figure 6 and Tables 1, 2 and 3. We leave this as future work. In most experiments we use minADE/FDE as the prediction evaluation protocols. There are many better metrics such as the Negative Log-Likelihood (NLL) [2, 16, 17] or those which take scene compliance or socially acceptable prediction into account [18, 21]. The results of another evaluation protocol, NLL, are in the supplementary material.

Discussion about single agent clustering. We use single-agent full trajectory features for clustering, similar to other works using single trajectories to cluster or retrieve [42, 51]. In our experiments we find that the information in single-agent trajectories can already lead to good performance. We believe that including social features into the clustering process is a promising future direction.

5. Conclusion

In this paper, we propose a future enhanced contrastive feature space shaping method and a distribution-aware hyper decoder for long-tailed trajectory prediction. Quantitative and qualitative experimental results show that our method can outperform state-of-the-art long-tail prediction methods on the challenging tailed samples, while maintaining the averaged performance on the whole datasets. Our method can be generally plugged into many strong prediction networks.

Acknowledgement

This work was supported by NSFC Projects (No. 62036008) and STI 2030—Major Projects (No. 2021ZD0201300).
References

[1] Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, and Silvio Savarese. Social lstm: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 961–971, 2016. 1, 2, 5
[2] Apratim Bhattacharyya, Bernt Schiele, and Mario Fritz. Accurate and diverse sampling of sequences based on a “best of many” sample objective. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8485–8493, 2018. 8
[3] Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, and Tengyu Ma. Learning imbalanced datasets with label-distribution-aware margin loss. Advances in neural information processing systems, 32, 2019. 2, 6
[4] Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised learning of visual features by contrasting cluster assignments. Advances in Neural Information Processing Systems, 33:9912–9924, 2020. 2
[5] Yuning Chai, Benjamin Sapp, Mayank Bansal, and Dragomir Anguelov. Multipath: Multiple probabilistic anchor trajectory hypotheses for behavior prediction. arXiv preprint arXiv:1910.05449, 2019. 2, 5
[6] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002. 2
[7] Guangyi Chen, Junlong Li, Nuoxing Zhou, Liangliang Ren, and Jiwen Lu. Personalized trajectory prediction via distribution discrimination. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15580–15589, 2021. 2
[8] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020. 2
[9] Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, and Serge Belongie. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9268–9277, 2019. 2, 6
[10] Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, and Alexandre Alahi. Social gan: Socially acceptable trajectories with generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2255–2264, 2018. 1, 6
[11] David Ha, Andrew Dai, and Quoc V Le. Hypernetworks. arXiv preprint arXiv:1609.09106, 2016. 2, 4
[12] Marah Halawa, Olaf Hellwich, and Pia Bideau. Action-based contrastive learning for trajectory prediction. In European Conference on Computer Vision, pages 143–159. Springer, 2022. 2
[13] Hui Han, Wen-Yuan Wang, and Bing-Huan Mao. Borderline-smote: a new over-sampling method in imbalanced data sets learning. In International conference on intelligent computing, pages 878–887. Springer, 2005. 2
[14] John A Hartigan and Manchek A Wong. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100–108, 1979. 3
[15] Haibo He and Edwardo A Garcia. Learning from imbalanced data. IEEE Transactions on knowledge and data engineering, 21(9):1263–1284, 2009. 2
[16] Ronny Hug, Wolfgang Hübner, and Michael Arens. Introducing probabilistic bézier curves for n-step sequence prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 10162–10169, 2020. 8
[17] Boris Ivanovic and Marco Pavone. The trajectron: Probabilistic multi-agent trajectory modeling with dynamic spatiotemporal graphs. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2375–2384, 2019. 8
[18] Boris Ivanovic and Marco Pavone. Injecting planning-awareness into prediction and detection evaluation. In 2022 IEEE Intelligent Vehicles Symposium (IV), pages 821–828. IEEE, 2022. 8
[19] Yannis Kalantidis, Mert Bulent Sariyildiz, Noe Pion, Philippe Weinzaepfel, and Diane Larlus. Hard negative mixing for contrastive learning. Advances in Neural Information Processing Systems, 33:21798–21809, 2020. 2
[20] Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. Advances in Neural Information Processing Systems, 33:18661–18673, 2020. 2
[21] Parth Kothari, Sven Kreiss, and Alexandre Alahi. Human trajectory forecasting in crowds: A deep learning perspective. IEEE Transactions on Intelligent Transportation Systems, 23(7):7386–7400, 2021. 8
[22] Mihee Lee, Samuel S Sohn, Seonghyeon Moon, Sejong Yoon, Mubbasir Kapadia, and Vladimir Pavlovic. Muse-vae: multi-scale vae for environment-aware long term trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2221–2230, 2022. 6
[23] Junnan Li, Pan Zhou, Caiming Xiong, and Steven CH Hoi. Prototypical contrastive learning of unsupervised representations. arXiv preprint arXiv:2005.04966, 2020. 2, 3, 4
[24] Tianhong Li, Peng Cao, Yuan Yuan, Lijie Fan, Yuzhe Yang, Rogerio S Feris, Piotr Indyk, and Dina Katabi. Targeted supervised contrastive learning for long-tailed recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6918–6928, 2022. 1
[25] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, pages 2980–2988, 2017. 2
[26] Yuejiang Liu, Qi Yan, and Alexandre Alahi. Social nce: Contrastive learning of socially-aware motion representations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15118–15129, 2021. 2
[27] Yuanfu Luo, Panpan Cai, Aniket Bera, David Hsu, Wee Sun Lee, and Dinesh Manocha. Porca: Modeling and planning for autonomous driving among many pedestrians. IEEE Robotics and Automation Letters, 3(4):3418–3425, 2018. 1
[28] Osama Makansi, Özgün Çiçek, Yassine Marrakchi, and Thomas Brox. On exposing the challenging long tail in future prediction of traffic actors. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13147–13157, 2021. 1, 2, 4, 5, 6, 8
[29] Osama Makansi, Eddy Ilg, Ozgun Cicek, and Thomas Brox. Overcoming limitations of mixture density networks: A sampling and fitting framework for multimodal future prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7144–7153, 2019. 2
[30] Karttikeya Mangalam, Yang An, Harshayu Girase, and Jitendra Malik. From goals, waypoints & paths to long term human trajectory forecasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15233–15242, 2021. 2, 5, 6, 7
[31] Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, and Sanjiv Kumar. Long-tail learning via logit adjustment. arXiv preprint arXiv:2007.07314, 2020. 2
[32] Abduallah Mohamed, Kun Qian, Mohamed Elhoseiny, and Christian Claudel. Social-stgcnn: A social spatio-temporal graph convolutional neural network for human trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14424–14432, 2020. 2
[33] Sriram Narayanan, Ramin Moslemi, Francesco Pittaluga, Buyu Liu, and Manmohan Chandraker. Divide-and-conquer for lane-aware diverse trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15799–15808, 2021. 2
[34] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018. 2
[35] Bo Pang, Tianyang Zhao, Xu Xie, and Ying Nian Wu. Trajectory prediction with latent belief energy-based model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11814–11824, 2021. 2
[36] Tung Phan-Minh, Elena Corina Grigore, Freddy A Boulton, Oscar Beijbom, and Eric M Wolff. Covernet: Multimodal behavior prediction using trajectory sets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14074–14083, 2020. 2
[37] Amir Sadeghian, Vineet Kosaraju, Ali Sadeghian, Noriaki Hirose, Hamid Rezatofighi, and Silvio Savarese. Sophie: An attentive gan for predicting paths compliant to social and physical constraints. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1349–1358, 2019. 1
[38] Tim Salzmann, Boris Ivanovic, Punarjay Chakravarty, and Marco Pavone. Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data. In European Conference on Computer Vision, pages 683–700. Springer, 2020. 1, 2, 4, 5
[39] Dvir Samuel and Gal Chechik. Distributional robustness loss for long-tail learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9495–9504, 2021. 1
[40] Nasim Shafiee, Taskin Padir, and Ehsan Elhamifar. Introvert: Human trajectory prediction via conditional 3d attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16815–16825, 2021. 2
[41] Li Shen, Zhouchen Lin, and Qingming Huang. Relay backpropagation for effective learning of deep convolutional neural networks. In European conference on computer vision, pages 467–482. Springer, 2016. 2, 6
[42] Jianhua Sun, Yuxuan Li, Hao-Shu Fang, and Cewu Lu. Three steps to multimodal trajectory prediction: Modality clustering, classification and synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13250–13259, 2021. 2, 8
[43] Chenxin Xu, Maosen Li, Zhenyang Ni, Ya Zhang, and Siheng Chen. Groupnet: Multiscale hypergraph neural networks for trajectory prediction with relational reasoning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6498–6507, 2022. 2
[44] Chenxin Xu, Weibo Mao, Wenjun Zhang, and Siheng Chen. Remember intentions: Retrospective-memory-based trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6488–6497, 2022. 2
[45] Yuzhe Yang, Kaiwen Zha, Yingcong Chen, Hao Wang, and Dina Katabi. Delving into deep imbalanced regression. In International Conference on Machine Learning, pages 11842–11851. PMLR, 2021. 2
[46] Cunjun Yu, Xiao Ma, Jiawei Ren, Haiyu Zhao, and Shuai Yi. Spatio-temporal graph transformer networks for pedestrian trajectory prediction. In European Conference on Computer Vision, pages 507–523. Springer, 2020. 2
[47] Ye Yuan, Xinshuo Weng, Yanglan Ou, and Kris M Kitani. Agentformer: Agent-aware transformers for socio-temporal multi-agent forecasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9813–9823, 2021. 2
[48] Pu Zhang, Wanli Ouyang, Pengfei Zhang, Jianru Xue, and Nanning Zheng. Sr-lstm: State refinement for lstm towards pedestrian trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12085–12094, 2019. 1, 2, 5
[49] Pu Zhang, Jianru Xue, Pengfei Zhang, Nanning Zheng, and Wanli Ouyang. Social-aware pedestrian trajectory prediction via states refinement lstm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020. 1, 4
[50] Hang Zhao, Jiyang Gao, Tian Lan, Chen Sun, Ben Sapp, Balakrishnan Varadarajan, Yue Shen, Yi Shen, Yuning Chai, Cordelia Schmid, et al. Tnt: Target-driven trajectory prediction. In Conference on Robot Learning, pages 895–904. PMLR, 2021. 2
[51] He Zhao and Richard P Wildes. Where are you heading? dynamic trajectory prediction with expert goal examples. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7629–7638, 2021. 8