
An Effective Graph Learning based Approach for Temporal Link Prediction:
The First Place of WSDM Cup 2022

Qian Zhao (Ant Group, [email protected])
Shuo Yang (Ant Group, [email protected])
Binbin Hu* (Ant Group, [email protected])
Zhiqiang Zhang (Ant Group, [email protected])
Yakun Wang (Ant Group, [email protected])
Yusong Chen (Ant Group, [email protected])
Jun Zhou (Ant Group, [email protected])
Chuan Shi (Beijing University of Posts and Telecommunications, [email protected])

ABSTRACT
Temporal link prediction, one of the most crucial tasks on temporal graphs, has attracted a lot of attention from the research community. The WSDM Cup 2022 calls for solutions that predict the existence probabilities of edges within given time spans over a temporal graph. This paper introduces the solution of AntGraph, which won 1st place in the competition. We first analyze the theoretical upper bound of the performance when temporal information is removed, which implies that structural and attribute information on the graph alone can already achieve great performance. Based on this hypothesis, we then introduce several well-designed features. Finally, experiments conducted on the competition datasets show the superiority of our proposal, which achieved an AUC score of 0.666 on dataset A and 0.902 on dataset B; the ablation studies also prove the effectiveness of each feature.

KEYWORDS
Link Prediction, Gradient Boosting Decision Trees, Graph Learning, WSDM Cup 2022

Table 1: The statistics of the two datasets. Note that "# Nodes" of Dataset B is obtained as the maximum value of the node ids, and "Inter." is short for "Intermediate".

                   Dataset A      Dataset B
# Train            27,045,268     8,278,431
# Initial Test     8,197          3,863
# Inter. Test      49,903         49,940
# Final Test       200,000        200,000
# Nodes            19,942         1,304,045
# Edges            27,045,268     8,278,431
# Node feat.       8              N.A.
# Edge feat.       N.A.           768
# Edge type        248            14

1 INTRODUCTION
As graphs ubiquitously exist in a wide range of real-world applications, many problems can be formulated as specific tasks over graphs. Link prediction [4], one of the most important tasks on graph-structured data, is widely applied in biology [10], recommendation [3, 14] and finance [12]. Meanwhile, real-world data usually evolves over time, and a line of recent literature [11, 13] devises temporal graph learning models to capture temporal information. However, predicting links on a temporal graph is even more non-trivial. WSDM Cup 2022 calls for solutions that predict the probability of a link appearing within a period of time. In this paper, we introduce the solution of the AntGraph team, which ranked first in the competition (achieving an AUC score of 0.666 on dataset A and 0.902 on dataset B). This technical report is organized as follows:
• First, we give some statistics on the datasets, perform some exploratory analyses and introduce the motivation of our method. According to the data analyses, we surprisingly find that removing the time span information in prediction can still achieve satisfactory performance.
• Subsequently, we introduce the data processing flow and enumerate several feature engineering methods, ranging from network embedding to heuristic graph structure.
• Finally, we conduct comprehensive experiments on the competition datasets, which show the effectiveness of our proposal; exhaustive ablation studies also show the improvement brought by each kind of feature.

Our source code is publicly available on GitHub1.

* Corresponding author.
1 https://github.com/im0qianqian/WSDM2022TGP-AntGraph

2 DATASETS
In this section, we focus on the exploration of the datasets provided by the competition. An in-depth analysis is presented, followed by a detailed introduction of the evaluation metrics.

Table 2: The analysis of the existence of the same edges in the initial test set.

Description                       Total   Exist in graph   Exist, label = 1   Exist, label = 0   Not exist, label = 1   Not exist, label = 0
Dataset A   w.o. edge type        8197    7354 (89.72%)    3333 (40.66%)      4021 (49.05%)      183 (2.23%)            660 (8.05%)
Dataset A   w.i. edge type        8197    5886 (71.81%)    2755 (33.61%)      3131 (38.20%)      761 (9.28%)            1550 (18.91%)
Dataset B   w.o. edge type        3863    3195 (82.71%)    2123 (54.96%)      1072 (27.75%)      128 (3.31%)            540 (13.98%)
Dataset B   w.i. edge type        3863    2612 (67.62%)    1685 (43.62%)      927 (24.00%)       566 (14.65%)           685 (17.73%)

2.1 A Brief Description
The competition expects participants to adopt a single model (hyperparameters can vary) that works well on two kinds of data simultaneously, and correspondingly provides two representative large-scale temporal graph datasets.
• Dataset A characterizes a dynamic event graph with entities as nodes and different types of events as edges. Each node may be associated with rich features if available, and except for the edge types, no other information is available for edges.
• Dataset B characterizes a user-item graph with users and items as nodes and different types of interactions as edges. Each edge is associated with rich features if available, and no feature information is available for nodes. Note that the sponsor treats the user-item graph as a bipartite graph. For convenience, we convert this graph to an undirected multi-relation graph by shifting item ids. In particular, we add the sum of 1 and the maximum value of the user ids (denoted as Offset_u) to each original item id, as in Eq. (1) below (a minimal code sketch of this remapping follows the equation):

    node_id = \begin{cases} node_id & \text{if the node is a user} \\ node_id + Offset_u & \text{if the node is an item} \end{cases}    (1)
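The following is a minimal sketch, in Python with pandas, of the id-shifting step of Eq. (1). The column names src_id and dst_id and the helper name are our own assumptions for illustration, not the competition's actual file schema.

```python
import pandas as pd

def shift_item_ids(edges: pd.DataFrame, max_user_id: int) -> pd.DataFrame:
    """Map the bipartite user-item graph onto a single id space (Eq. 1).

    Assumes `edges` has columns `src_id` (user) and `dst_id` (item);
    the real column names in the competition files may differ.
    """
    offset_u = max_user_id + 1                     # Offset_u = 1 + max user id
    edges = edges.copy()
    edges["dst_id"] = edges["dst_id"] + offset_u   # items no longer collide with user ids
    return edges

# Usage: edges_b = shift_item_ids(edges_b, max_user_id=edges_b["src_id"].max())
```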
Since the competition asks participants to predict whether an edge will exist between two nodes within a given time span, instead of at a single timestamp as in the graph, a start and an end timestamp are given for each query at test stage. Also, for each dataset, the sponsor provides a train set, an initial test set, an intermediate test set and a final test set; the labels of the intermediate and final test sets are still not available at present. It is worthwhile to note that only the performance of the model on the final test set determines the ranking of the competition. In summary, we detail all necessary statistics of the two datasets in Table 1.

Table 3: The performance w.r.t. AUC of our naive strategy compared to the baseline model provided by the sponsor on both the initial and intermediate (Inter.) test sets.

            Method                             Initial test   Inter. test
Dataset A   Baseline model                     0.5110         0.5026
            Naive strategy (w.o. edge type)    0.5428         0.5432
            Naive strategy (w.i. edge type)    0.5597         0.5687
Dataset B   Baseline model                     0.5100         0.5026
            Naive strategy (w.o. edge type)    0.6391         0.8655
            Naive strategy (w.i. edge type)    0.5867         0.8059

2.2 Data Analysis
Generally, an inspiring data analysis can shed some light on the model design, which plays a vital role in various data mining tasks. Based on the originally provided data (i.e., the train set and the initial test set), we perform a series of detailed data analyses as follows:
• The existence of the same edges in the test set. We first analyze whether the edges of the initial test set already exist in the original graph. In this analysis, timestamps are not taken into consideration. As shown in Table 2, we observe that the original graph contains most of the edges in the initial test set for both datasets, especially when the edge type is ignored. Surprisingly, we also find that approximately half of the edges existing in the graph keep the same labels for Dataset A (i.e., 40.66% v.s. 49.05% without edge type and 33.61% v.s. 38.20% with edge type), while about three quarters of them keep the same labels for Dataset B (i.e., 54.96% v.s. 27.75% without edge type and 43.62% v.s. 24.00% with edge type).
  Following the aforementioned observations, we are curious about the performance of the most naive strategy that simply predicts the existence of each edge by its existence in the original graph (a sketch of this check is given after the list below). We present the corresponding results in Table 3, and find that the naive strategy achieves a more competitive performance than the baseline model provided by the sponsor. This indicates the crucial importance of first-order relationships for the task.
• Optimal performance without consideration of timestamps. Secondly, we also explore the theoretical upper bound on performance without temporal information. We select the records sharing the same node pair in the initial test set, and then take the mode or the mean value of all their labels as the prediction for these records. Experiments show that such a model can still achieve a good performance, as shown in Table 4.
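As a rough illustration of the naive strategy from the first bullet, the sketch below scores a query edge with 1 whenever its (src, dst) pair, optionally together with its edge type, appears in the training graph. The DataFrame column names are hypothetical and only meant to convey the idea.

```python
import pandas as pd

def naive_existence_score(train_edges: pd.DataFrame,
                          queries: pd.DataFrame,
                          use_edge_type: bool = False) -> pd.Series:
    """Score a query edge by whether it already exists in the training graph."""
    keys = ["src_id", "dst_id"] + (["edge_type"] if use_edge_type else [])
    seen = set(map(tuple, train_edges[keys].itertuples(index=False)))
    # 1.0 if the queried edge was observed in the original graph, else 0.0
    return queries[keys].apply(lambda row: float(tuple(row) in seen), axis=1)
```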

Table 4: Exploring the maximum AUC achievable without temporal information.

            Description                     Initial test (mode)   Initial test (mean)
Dataset A   node pair (w.o. edge type)      0.9040                0.9776
            node pair (w.i. edge type)      0.9900                0.9997
Dataset B   node pair (w.o. edge type)      0.8946                0.9795
            node pair (w.i. edge type)      0.9147                0.9875

2.3 Evaluation Metrics
This competition uses the Area Under the ROC curve (AUC) as the evaluation metric for both tasks. Intuitively, the two tasks have different difficulties, and sacrificing one task to do well on the other is not desirable. Therefore, the competition further adopts the average of T-scores as the ranking basis, encouraging models to perform well on both tasks. The formal definition is as follows:

    Tscore = \frac{AUC - \mathrm{mean}(AUC)}{\mathrm{std}(AUC)} \times 0.1 + 0.5    (2)

    AverageOfTscore = \frac{TScore_A + TScore_B}{2}    (3)

where mean(AUC) and std(AUC) represent the mean and standard deviation of the AUC scores of all participants. Clearly, a larger average of T-scores means a better performance.
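To make the ranking metric concrete, here is a small sketch that computes the average T-score of one team from the per-dataset AUC scores of all participants, following Eqs. (2) and (3); the function and variable names are illustrative only.

```python
import numpy as np

def t_score(auc: float, all_aucs: np.ndarray) -> float:
    """T-score of a single submission on one dataset (Eq. 2)."""
    return (auc - all_aucs.mean()) / all_aucs.std() * 0.1 + 0.5

def average_t_score(auc_a: float, aucs_a: np.ndarray,
                    auc_b: float, aucs_b: np.ndarray) -> float:
    """Final ranking basis: mean of the per-dataset T-scores (Eq. 3)."""
    return (t_score(auc_a, aucs_a) + t_score(auc_b, aucs_b)) / 2
```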
3 METHODOLOGY
In this section, we introduce our complete solution for the large-scale temporal graph link prediction task, which consists of a train data construction component, a feature engineering component and a downstream model training component. In the following, we zoom into each well-designed component.

3.1 Train Data Construction
As mentioned above, the goal of this competition is to predict whether an edge will exist between two nodes within a given time span, whereas each edge in the provided graphs is only associated with a single timestamp. Hence, this inconsistency between training and testing severely threatens the generalization of models. In addition, the previous data analysis has concluded that this task may not benefit from involving timestamps; therefore, we construct the train data without timestamps as follows.

3.1.1 Negative sampling. For efficient training, we adopt a shuffling-based sampling strategy that samples negative instances within a batch rather than from the whole node set. Moreover, timestamps are ignored in our negative sampling process. In particular, our negative sampling process is as follows: i) We denote the edges in the original graphs as the positive instance set, consisting of source nodes, target nodes and relations. ii) We keep the source nodes unchanged, and randomly shuffle the target nodes and relations to generate the negative instance set. iii) We combine the above positive and negative instance sets, and uniformly sample a certain number of instances to construct the final train set.
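A minimal sketch of this shuffling-based negative sampling is given below, assuming the positive edges are held in a pandas DataFrame with src, dst and rel columns (our own naming, not the official schema).

```python
import numpy as np
import pandas as pd

def build_train_set(pos: pd.DataFrame, n_samples: int, seed: int = 0) -> pd.DataFrame:
    """Shuffling-based negative sampling without timestamps (Section 3.1.1)."""
    rng = np.random.default_rng(seed)
    neg = pos.copy()
    # Keep source nodes fixed; permute targets and relations independently.
    neg["dst"] = rng.permutation(neg["dst"].to_numpy())
    neg["rel"] = rng.permutation(neg["rel"].to_numpy())
    pos, neg = pos.assign(label=1), neg.assign(label=0)
    # Combine positives and negatives, then uniformly sample the final train set.
    full = pd.concat([pos, neg], ignore_index=True)
    return full.sample(n=min(n_samples, len(full)), random_state=seed)
```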
3.1.2 Removing redundant features. Firstly, we remove all time-related features, including the timestamp, start time and end time. Moreover, we remove the edge features for Dataset B, since these features are not available for most edges, i.e., the non-empty ratio is only 6.67%.

3.2 Feature Engineering
3.2.1 LINE embedding. As concluded in the previous data analysis, the first-order relation is of crucial importance in our link prediction task. In order to capture such deep correlations between nodes in a more fine-grained manner, we introduce the LINE embedding [8], an effective and efficient graph learning framework for arbitrary graphs (undirected, directed, and/or weighted). In particular, LINE is carefully designed to preserve both the first-order and second-order proximities, which suits our scenario of capturing co-occurrence relations. On the other hand, several heterogeneous [2] and knowledge [1, 7, 9] graph representation based methods are also promising ways to learn powerful representations, whereas LINE experimentally achieves the best performance, as shown in the experiment section.
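For intuition only, the toy sketch below optimizes LINE's first-order proximity objective (log-sigmoid on observed edges plus random negative samples) with plain SGD in NumPy. It is a simplification of the idea, not the team's implementation, and it ignores edge weights and the alias-sampling tricks used by real LINE code.

```python
import numpy as np

def line_first_order(edges, n_nodes, dim=64, epochs=5, lr=0.025, neg=5, seed=0):
    """Toy LINE (first-order proximity) trainer over (u, v) node-id pairs."""
    rng = np.random.default_rng(seed)
    emb = rng.normal(scale=0.1, size=(n_nodes, dim))
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    for _ in range(epochs):
        for u, v in edges:
            # One positive target plus `neg` uniformly sampled negative targets.
            targets = [(v, 1.0)] + [(int(rng.integers(n_nodes)), 0.0) for _ in range(neg)]
            for tgt, label in targets:
                g = (label - sigmoid(emb[u] @ emb[tgt])) * lr   # log-likelihood gradient scale
                emb[u], emb[tgt] = emb[u] + g * emb[tgt], emb[tgt] + g * emb[u]
    return emb
```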
3.2.2 Node crossing features. After obtaining a representation for each node in the graphs, we construct crossing features to further reveal the correlation within each node pair. Specifically, we calculate the similarity of node pairs w.r.t. their LINE embeddings as the node crossing features. Given a node pair (u, v) with corresponding embeddings e_u and e_v, the similarity is calculated through the cosine operation (i.e., e_u · e_v / (||e_u|| × ||e_v||)) and the dot product (i.e., e_u · e_v).
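A minimal sketch of these two crossing features, computed in a vectorized way over row-aligned arrays of pre-trained embeddings (the function name is ours):

```python
import numpy as np

def crossing_features(e_u: np.ndarray, e_v: np.ndarray) -> np.ndarray:
    """Return [dot, cosine] features for each row-aligned pair of embeddings."""
    dot = np.sum(e_u * e_v, axis=1)
    norms = np.linalg.norm(e_u, axis=1) * np.linalg.norm(e_v, axis=1)
    cosine = dot / np.clip(norms, 1e-12, None)   # guard against zero-norm vectors
    return np.stack([dot, cosine], axis=1)
```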
3.2.3 Subgraph features. In addition, we also add the following statistical features based on the graph structure to help the downstream model capture high-order information (a sketch of how some of them can be counted follows this list):
• Unary features w.r.t. individual nodes: i) the degree of the node; ii) the number of distinct nodes adjacent to the node; iii) the number of distinct edge types adjacent to the node.
• Binary features w.r.t. node pairs: i) the number of one-hop paths between the two nodes; ii) the number of two-hop paths between the two nodes; iii) the number of distinct edge types between the two nodes.
• Ternary features w.r.t. node pairs and edge types: the number of occurrences of the (source node, edge type, target node) triplet.
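The following sketch illustrates, with SciPy sparse adjacency matrices, how the one-hop and two-hop path counts among the binary features could be obtained; it is a simplification that ignores edge types, and the function name is ours.

```python
import numpy as np
from scipy.sparse import csr_matrix

def path_count_features(src, dst, n_nodes, query_pairs):
    """One-hop and two-hop path counts between query node pairs (edge types ignored)."""
    # Duplicate (src, dst) entries are summed, so parallel edges are counted.
    adj = csr_matrix((np.ones(len(src)), (src, dst)), shape=(n_nodes, n_nodes))
    adj2 = adj @ adj                          # entry (i, j) = number of two-hop paths i -> j
    one_hop = np.asarray([adj[i, j] for i, j in query_pairs])
    two_hop = np.asarray([adj2[i, j] for i, j in query_pairs])
    return np.stack([one_hop, two_hop], axis=1)
```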
3.3 Catboost Model
The link prediction task can easily be formulated as a binary classification problem based on the features extracted for each (source node, relation, target node) triple. On the other hand, gradient boosting has proven its capability in various classification applications. Recently, CatBoost [6] has gained increasing popularity and attention due to its fast processing speed and high prediction performance. We feed the train data (see Section 3.1) together with the abundant features (see Section 3.2) into a CatBoost model, and then use the produced scores as the final predictions.
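A minimal sketch of the downstream training step with the catboost package; the feature matrix is assumed to concatenate the crossing and subgraph features described above, and the hyperparameters shown here are placeholders rather than the team's tuned settings.

```python
from catboost import CatBoostClassifier

def train_and_score(X_train, y_train, X_test):
    """Fit a CatBoost classifier on the engineered features and score test edges."""
    model = CatBoostClassifier(
        iterations=1000,        # placeholder hyperparameters
        learning_rate=0.05,
        eval_metric="AUC",
        verbose=100,
    )
    model.fit(X_train, y_train)
    return model.predict_proba(X_test)[:, 1]   # probability that the edge exists
```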

4 EXPERIMENTS
4.1 Overall Performance
Performance on the leaderboard. We present the results of the top five teams from the leaderboard in Table 6. We observe that our solution achieves the best performance on Dataset A and competitive performance on Dataset B. The best result w.r.t. the final ranking metric further indicates that our solution works well on both kinds of data simultaneously.

Table 5: Overall experimental results of different methods on the two datasets.

                                              Dataset A                          Dataset B
Model                                         Initial AUC       Inter. AUC       Initial AUC       Inter. AUC
baseline                                      0.5110            0.5026           0.5100            0.5026
DeepWalk [5]                                  0.5352            0.5707           0.5246            0.4985
TransE [1]                                    0.5182            0.5614           0.6389            0.8903
RotatE [7]                                    0.5315            0.5736           0.6323            0.8981
ComplEx [9]                                   0.5514            0.5821           0.6359            0.9014
LINE [8]                                      0.6072            0.6320           0.6425            0.8905
catboost (raw input data)                     0.6045            0.6222           0.5545            0.5869
+ LINE embedding                              0.6377 (+5.49%)   0.6540 (+5.11%)  0.6399 (+15.40%)  0.9013 (+53.57%)
+ Subgraph features                           0.6611 (+9.36%)   0.6673 (+7.25%)  0.5861 (+5.70%)   0.7561 (+28.83%)
+ LINE embedding + Subgraph features          0.6619 (+9.50%)   0.6659 (+7.02%)  0.6368 (+14.84%)  0.8978 (+52.97%)
+ LINE embedding + Node crossing features     0.6573 (+8.73%)   0.6673 (+7.25%)  0.6504 (+17.29%)  0.9001 (+53.37%)
+ All (submitted version)                     0.6657 (+10.12%)  0.6671 (+7.22%)  0.6459 (+16.48%)  0.9028 (+53.83%)

Table 6: Top five results on the final leaderboard.

Rank   Team name         Dataset A Final AUC   Dataset B Final AUC   Average of T/100
1      AntGraph (Ours)   0.666001              0.901961              0.630737
2      nothing here      0.662482              0.906923              0.628942
3      NodeInGraph       0.627821              0.865567              0.585137
4      We can [mask]!    0.603621              0.898232              0.572372
5      IDEAS Lab UT      0.605264              0.873949              0.566849
Comparison to baselines. We compare our method with six other methods, including the official baseline2 and several classic network embedding methods, i.e., DeepWalk [5], TransE [1], RotatE [7], LINE [8], and ComplEx [9]. The experimental results in Table 5 show that our solution outperforms all baselines by a considerable margin.

Overall, both observations verify the effectiveness of our proposal.

2 https://github.com/dglai/WSDM2022-Challenge

4.2 Ablation Studies
In this section, we perform a series of ablation studies to analyze the impact of the kinds of features proposed in Section 3.2, including the LINE embedding, the node crossing features and the subgraph features. We summarize the comparison results in Table 5 and make the following observations: i) All extracted features help the base model achieve better performance, and in most cases the best performance is yielded by incorporating all features. ii) Involving LINE features brings a greater improvement than involving subgraph features on Dataset B, while the opposite trend is observed on Dataset A. An intuitive explanation is that Dataset B is much sparser than Dataset A, and thus subgraph structure can hardly be exploited on Dataset B. Overall, compared to the base model using only the raw features, our final submitted model achieves a relative improvement of 7.22% on dataset A and 53.83% on dataset B, respectively.

5 CONCLUSION
This paper describes our solution for the WSDM 2022 Challenge - Temporal Link Prediction. For this task, we design a novel negative sampling strategy, combined with data analysis to remove redundant information. We introduce the LINE embedding to provide local and global features of the graph. At the same time, we design node crossing features and subgraph features. In the end, our team AntGraph ranked 1st place on the final leaderboard.

REFERENCES
[1] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In NIPS.
[2] Binbin Hu, Yuan Fang, and Chuan Shi. 2019. Adversarial learning on heterogeneous information networks. In SIGKDD. 120–129.
[3] Binbin Hu, Chuan Shi, Wayne Xin Zhao, and Philip S Yu. 2018. Leveraging meta-path based context for top-n recommendation with a neural co-attention model. In SIGKDD. 1531–1540.
[4] Ajay Kumar, Shashank Sheshar Singh, Kuldeep Singh, and Bhaskar Biswas. 2020. Link prediction techniques, applications, and performance: A survey. Physica A: Statistical Mechanics and its Applications 553 (2020), 124289.
[5] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In SIGKDD. 701–710.
[6] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. 2018. CatBoost: unbiased boosting with categorical features. In NIPS.
[7] Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2019. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In ICLR.
[8] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. LINE: Large-scale Information Network Embedding. In WWW.
[9] Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In ICML. 2071–2080.
[10] Turki Turki and Zhi Wei. 2017. A link prediction approach to cancer drug sensitivity prediction. BMC Systems Biology 11, 5 (2017), 1–14.
[11] Xuhong Wang, Ding Lyu, Mengjian Li, Yang Xia, Qi Yang, et al. 2021. APAN: Asynchronous Propagation Attention Network for Real-time Temporal Graph Embedding. In SIGMOD. 2628–2638.
[12] Shuo Yang, Binbin Hu, Zhiqiang Zhang, et al. 2021. Inductive Link Prediction with Interactive Structure Learning on Attributed Graph. In ECML-PKDD. Springer, 383–398.
[13] Shuo Yang, Zhiqiang Zhang, Jun Zhou, et al. 2020. Financial Risk Analysis for SMEs with Graph-based Supply Chain Mining. In IJCAI. 4661–4667.
[14] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In SIGKDD. 974–983.
