3.1. Data
Our model is developed using data provided by an international online fashion retailer. The dataset spans two years of order-level transactions and return records from the platform, totaling 2,627,927 valid records. It was sampled from a data warehouse that consolidates data from various channels, including fundamental customer and product information, order-level transactions, and return records. Each record captures the ordering and returning of a product; specifically, it contains details such as the product ID, customer ID, order ID, order specifics (e.g., quantity, price, coupons used, payment method), product attributes (e.g., color, size, product group), and the final quantity of products returned. It is worth noting that a single product ID may refer to a product available in varying colors and sizes.
Our data preprocessing comprises data cleaning, transformation, feature engineering, normalization, and data splitting. During data cleaning, we identified and removed abnormal return records in which the quantity of returned products exceeded the quantity ordered, and we imputed missing values in the product price column using forward and backward filling. In the transformation stage, we captured seasonal effects by converting the order date into two seasonal features: the day of the week and the month of the year. In the feature engineering stage, we introduced additional features such as the number of products in the same order, the order value, and the average order value, designed to capture the impact of order characteristics on return behavior. Lastly, we partitioned the data into training, validation, and test sets and standardized the numerical features using a scaler fitted on the training set only, which prevents information from the validation and test sets from leaking into training.
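A minimal sketch of this pipeline in pandas/scikit-learn is given below. The column names (order_date, quantity, returned_quantity, price, order_id, customer_id), the file name, and the 70/10/20 split are illustrative assumptions, not the retailer's actual schema.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("orders.csv", parse_dates=["order_date"])

# 1. Cleaning: drop abnormal records where more items were returned than ordered.
df = df[df["returned_quantity"] <= df["quantity"]]

# 2. Imputation: forward- then backward-fill missing product prices.
df["price"] = df["price"].ffill().bfill()

# 3. Transformation: encode seasonality as day-of-week and month-of-year.
df["day_of_week"] = df["order_date"].dt.dayofweek
df["month"] = df["order_date"].dt.month

# 4. Feature engineering: order-level aggregates.
df["line_value"] = df["price"] * df["quantity"]
df["items_in_order"] = df.groupby("order_id")["quantity"].transform("sum")
df["order_value"] = df.groupby("order_id")["line_value"].transform("sum")
df["avg_order_value"] = df.groupby("customer_id")["order_value"].transform("mean")

# 5. Splitting (70/10/20) and standardization fitted on the training set only.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
train_df, val_df = train_test_split(train_df, test_size=0.125, random_state=42)
num_cols = ["price", "items_in_order", "order_value", "avg_order_value"]
scaler = StandardScaler().fit(train_df[num_cols])
for split in (train_df, val_df, test_df):
    split.loc[:, num_cols] = scaler.transform(split[num_cols])
```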
Our primary objective is to estimate the probability of a product's return given its ordering information. Details regarding the data type and range of all features used for predicting returns, as well as the target label, are reported in Table 1.
3.2. Customer–Order–Product Heterogeneous Graph Neural Networks
There are, in general, three categories of factors associated with product returns: customer, product, and order. Customers exhibit significant individual differences in their purchasing and returning behavior. Some take advantage of lenient return policies and prefer to select multiple products in an order, refining their selection after receiving it. Others prioritize social responsibility and environmental awareness and exercise greater caution in selecting products. These differences among customers are, however, difficult to describe using explicit features, particularly for new users who have yet to make a purchase on the platform. Due to privacy protection concerns, customer information such as gender, age, and occupation is often incomplete or inaccurate. Moreover, the purchaser of an order may not be the user of the product, making it challenging to describe purchasing preferences based on demographic characteristics alone. Therefore, we can only infer customer preferences from their historical purchasing and returning records.
Similarly, products possess attributes that contribute to return rates, such as color, size, category, manufacturer, and brand. Other factors, such as quality, material, style, and packaging, could also influence the likelihood of a product being returned, but these are unobservable. Additionally, order characteristics may relate to product returns: variables such as the number of different products, the total value, and the coupon usage within an order can affect the probability of a product being returned.
We utilize a heterogeneous graph to represent the connections between customers, orders, and products, as depicted in Figure 1. Within the graph, customers, orders, and products constitute distinct node types. Customers and orders are linked by edges that signify a 'one-to-many' purchasing relationship: a customer can have multiple orders, while each order corresponds to a single customer. Orders and products share a 'many-to-many' connection: an order can include multiple products, and a product may appear in multiple orders. Edge attributes, including color, size, quantity, unit price, and discount, exist between orders and products. Node and edge attributes are categorized in Table 2.
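As an illustration, a graph with this structure could be assembled with PyTorch Geometric's HeteroData container, as sketched below. The paper does not specify the implementation framework; the toy node counts, feature dimensions, and edge lists are placeholders.

```python
import torch
from torch_geometric.data import HeteroData  # assumes PyTorch Geometric

# Toy sizes for illustration only.
num_customers, num_orders, num_products = 4, 6, 5

data = HeteroData()
# Node feature placeholders (dimensions are illustrative).
data["customer"].x = torch.randn(num_customers, 8)
data["order"].x = torch.randn(num_orders, 8)
data["product"].x = torch.randn(num_products, 8)

# 'One-to-many' customer -> order edges: every order maps back to one customer.
data["customer", "places", "order"].edge_index = torch.tensor(
    [[0, 0, 1, 2, 3, 3],      # customer indices
     [0, 1, 2, 3, 4, 5]])     # order indices (each order appears once)

# 'Many-to-many' order -> product edges with attributes
# (color, size, quantity, unit price, discount -> 5 illustrative slots).
data["order", "contains", "product"].edge_index = torch.tensor(
    [[0, 0, 1, 2, 3, 4, 5],   # order indices (order 0 holds two products)
     [0, 1, 1, 2, 3, 4, 0]])  # product indices (product 0 is in two orders)
data["order", "contains", "product"].edge_attr = torch.randn(7, 5)
```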
Our target is to predict whether a return will occur on a given order–product edge. The challenge stems from the dynamic nature of the Customer–Order–Product heterogeneous graph, which evolves as new orders are generated over time. Our task involves predicting returns for new orders, rather than for orders already present in the graph during the training phase. Thus, our prediction approach is inductive rather than transductive.
Heterogeneous Graph Neural Networks (HGNNs) have emerged as a powerful tool for analyzing complex systems with diverse entities and relationships [25,26,27]. Unlike traditional Graph Neural Networks (GNNs) that operate on homogeneous graphs with uniform node and edge types, HGNNs excel in handling graphs with varying node and edge types. This flexibility enables them to model intricate relationships in real-world scenarios [28], such as social networks [29], text classification [30,31], recommendation systems [32,33], and biological networks [34], where entities can exhibit diverse characteristics and connections.
Our Customer–Order–Product Heterogeneous Graph Neural Networks (COP-HGNNs) for return prediction leverage three types of nodes and edges to capture nuanced information, enabling a deeper understanding of the underlying structures. By incorporating diverse node and edge types, our model is better equipped to discern intricate patterns. Our solution comprises two subnetworks: an inductive embedding network and a prediction network, both trained jointly in a supervised manner.
3.2.1. Inductive Embedding Network
We leverage node and edge features to learn an inductive embedding function that generalizes to unseen nodes and edges. Instead of training distinct embedding vectors for each node and edge, we train a set of embedding and aggregator functions that learn to aggregate feature information from neighboring nodes and edges. During inference, we apply these learned functions to generate embeddings for entirely unseen nodes, enabling predictions on new data.
Initially, we transform categorical variables such as customer ID, product ID, product group, color code, and size code into continuous representations through a feature embedding layer. Specifically, we use a fully connected linear layer so that each input index has an associated weight vector:

$$\mathbf{e}_X = \mathbf{W}_X \mathbf{x}_X \tag{1}$$

where $\mathbf{x}_X \in \{0,1\}^{C}$ is the one-hot vector representation of the nominal variable $X$. During training, the weight matrix $\mathbf{W}_X \in \mathbb{R}^{E \times C}$ is updated to learn the optimal embedding representations ($C$ is the size of the dictionary of $X$; $E$ is the size of each embedding vector).
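In practice, Equation (1) need not be computed as an explicit matrix product: a linear layer applied to a one-hot vector simply selects one row (or column) of the weight matrix, which is exactly what an embedding lookup does. The PyTorch sketch below demonstrates the equivalence; the sizes C and E are illustrative.

```python
import torch
import torch.nn as nn

C, E = 1000, 32                   # dictionary size, embedding size
embed = nn.Embedding(C, E)        # stores W_X (here with shape [C, E])

ids = torch.tensor([3, 17, 256])  # e.g., encoded product-group codes
e_X = embed(ids)                  # embedding lookup, shape [3, E]

# The same result via an explicit one-hot product, matching Equation (1):
one_hot = nn.functional.one_hot(ids, C).float()
assert torch.allclose(e_X, one_hot @ embed.weight)
```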
Since the product node contains both numerical attributes (such as the recommended retail price listed in Table 2) and nominal attributes (such as Product Group), we utilize a linear layer to transform the numerical attributes into dimension $E$. The product node embedding is then a composite representation of the product ID embedding, the Product Group (a categorical feature of the product) embedding, and the embedding of the numerical attributes:

$$\mathbf{h}_P = \left[\mathbf{e}_{\mathrm{ID}} \,\middle\|\, \mathbf{e}_{\mathrm{PG}} \,\middle\|\, \mathbf{W}_P \mathbf{x}_P\right] \tag{2}$$

where $\mathbf{x}_P$ is the vector of numerical product attributes, $\mathbf{W}_P$ is a weight matrix, and $[\cdot\,\|\,\cdot]$ denotes concatenation.
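The sketch below illustrates one plausible reading of Equation (2) in PyTorch, composing the three parts by concatenation; the vocabulary sizes, the embedding width, and the concatenation choice are assumptions for illustration.

```python
import torch
import torch.nn as nn

E, num_products, num_groups, n_numeric = 32, 5000, 40, 3

id_embed = nn.Embedding(num_products, E)   # product ID embedding
group_embed = nn.Embedding(num_groups, E)  # Product Group embedding
num_proj = nn.Linear(n_numeric, E)         # projects numeric attributes to E dims

product_id = torch.tensor([42])
group_id = torch.tensor([7])
numeric = torch.randn(1, n_numeric)        # e.g., recommended retail price

# Composite product-node embedding h_P of shape [1, 3 * E].
h_P = torch.cat(
    [id_embed(product_id), group_embed(group_id), num_proj(numeric)], dim=-1)
```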
The edge embedding representation of the 'order–product' linkage is obtained analogously by combining the color and size embeddings with the numeric edge attributes:

$$\mathbf{h}_{OP} = \left[\mathbf{e}_{\mathrm{color}} \,\middle\|\, \mathbf{e}_{\mathrm{size}} \,\middle\|\, \mathbf{W}_{OP} \mathbf{x}_{OP}\right] \tag{3}$$

where $\mathbf{x}_{OP}$ is the vector of numeric attributes, including 'quantity' and 'discount'.
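A matching sketch of the edge embedding of Equation (3); the color- and size-code vocabulary sizes are illustrative.

```python
import torch
import torch.nn as nn

E = 32
color_embed = nn.Embedding(200, E)  # illustrative color-code vocabulary
size_embed = nn.Embedding(50, E)    # illustrative size-code vocabulary
edge_num_proj = nn.Linear(2, E)     # projects ('quantity', 'discount') to E dims

color, size = torch.tensor([12]), torch.tensor([3])
x_OP = torch.tensor([[2.0, 0.15]])  # quantity = 2, discount = 15%

# Edge embedding h_OP of shape [1, 3 * E].
h_OP = torch.cat(
    [color_embed(color), size_embed(size), edge_num_proj(x_OP)], dim=-1)
```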
Generating the embedded representation of the order node involves capturing various facets: the customer placing the order, the products contained within the order, and the attributes of the order itself. We utilize the SAGE graph convolution with a mean aggregator [35] to aggregate the embedded representations of the products within the order into the order embedding:

$$\mathbf{h}_O = \phi\!\left(\mathbf{W}_1 \mathbf{e}_{c(O)} + \mathbf{W}_2 \cdot \underset{P \in \mathcal{N}(O)}{\mathrm{mean}}\, \mathbf{h}_P\right) \tag{4}$$

where $\mathbf{W}_1$ and $\mathbf{W}_2$ are network weight matrices, $\phi(\cdot)$ is a nonlinear activation function, $\mathcal{N}(O)$ is the set of products in order $O$, and $c(O)$ is an indexing function that maps the order to the customer who placed it, utilizing the edge between the customer and the order.
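A plain-PyTorch sketch of the aggregation in Equation (4): the mean of the product embeddings in an order is combined with a transformed customer embedding. ReLU stands in for the unspecified activation $\phi$, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

E = 32
W1 = nn.Linear(E, E, bias=False)      # transforms the customer embedding e_c(O)
W2 = nn.Linear(3 * E, E, bias=False)  # transforms the aggregated product embeddings

e_customer = torch.randn(1, E)        # embedding of the customer who placed the order
h_products = torch.randn(4, 3 * E)    # embeddings h_P of the 4 products in the order

# Mean-aggregate neighbors, combine with the customer signal, apply a nonlinearity.
h_O = torch.relu(W1(e_customer) + W2(h_products.mean(dim=0, keepdim=True)))
```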
3.2.2. Return Prediction Network
After obtaining the node and edge embedding representations, we use them as inputs to an MLP (Multilayer Perceptron) to generate the return prediction:

$$\hat{y} = \sigma\!\left(\mathbf{W}_4\, \mathrm{ReLU}\!\left(\mathbf{W}_3 \left[\mathbf{h}_O \,\middle\|\, \mathbf{h}_P \,\middle\|\, \mathbf{h}_{OP}\right]\right)\right) \tag{5}$$

where $\mathbf{W}_3$ and $\mathbf{W}_4$ are weight matrices, $[\cdot\,\|\,\cdot]$ represents the concatenation operator, and the sigmoid function $\sigma(x) = 1/(1+e^{-x})$ converts $x$ into the return probability.
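A sketch of the prediction head of Equation (5); the hidden width of 64 and the single hidden layer are assumptions, as the paper does not state the MLP depth.

```python
import torch
import torch.nn as nn

E = 32
# h_O has width E; h_P and h_OP have width 3 * E each (see the sketches above).
mlp = nn.Sequential(
    nn.Linear(7 * E, 64),  # consumes the concatenation [h_O || h_P || h_OP]
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),          # converts the logit into a return probability
)

h_O, h_P, h_OP = torch.randn(1, E), torch.randn(1, 3 * E), torch.randn(1, 3 * E)
y_hat = mlp(torch.cat([h_O, h_P, h_OP], dim=-1))  # shape [1, 1], in (0, 1)
```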
3.2.3. Loss Function
The training of the COP-HGNN utilizes the Binary Cross-Entropy (BCE) loss function to evaluate prediction outcomes, as denoted in Equation (6):

$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N} \sum_{i=1}^{N} \left[\, y_i \log \hat{y}_i + (1 - y_i) \log\!\left(1 - \hat{y}_i\right) \right] \tag{6}$$

where $y_i$ is either 1 or 0, indicating whether the product was returned or not, respectively, and $N$ signifies the overall number of training samples. We utilize the Adam optimization algorithm [36] to update the network parameters via gradient back-propagation of the loss, minimizing the discrepancy between the predictions and the actual labels.
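One training step of this procedure might look as follows. The placeholder linear model stands in for the full COP-HGNN, and BCEWithLogitsLoss is used as a numerically stable fusion of sigmoid and BCE (Equation (6)); the batch size and learning rate are illustrative.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # placeholder standing in for the full COP-HGNN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()  # sigmoid + BCE, fused for stability

features = torch.randn(64, 10)                 # a mini-batch of inputs
labels = torch.randint(0, 2, (64, 1)).float()  # 1 = returned, 0 = kept

optimizer.zero_grad()
loss = criterion(model(features), labels)  # Equation (6)
loss.backward()                            # gradient back-propagation
optimizer.step()                           # Adam parameter update
```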
3.2.4. Online Prediction for New Orders
In the prediction stage, given the network weights obtained during training, we assume that when a customer places a new order, the prediction system can query the trained network to obtain the customer's embedding representation via their Customer ID. We can similarly obtain the embedding vectors of each product in the order. The edge embedding can be calculated using Equation (3), and the order's embedding representation can be calculated using Equation (4). Finally, the return probability for each ordered product is calculated by applying Equation (5). In the event of new customers or new products (i.e., the 'cold start' problem), the system assigns them a new ID code. During the network training phase, we reserved ample space in the embedding tables for these new customers and products.
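A sketch of how such reserved capacity can be implemented: allocate embedding tables larger than the number of IDs seen in training, so a cold-start customer or product receives a fresh index at inference time without rebuilding the network. The table size and the 10,000-row reserve are illustrative assumptions.

```python
import torch.nn as nn

# Rows 0..n_known_customers-1 are trained; the remainder are reserved slots.
n_known_customers, E, reserve = 500_000, 32, 10_000
customer_embed = nn.Embedding(n_known_customers + reserve, E)

next_free_id = n_known_customers  # first reserved slot for a new customer
```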
Pseudocode for training both the embedding network and the return prediction network is outlined in Algorithm 1. Algorithm 2 encapsulates the prediction process for new orders.
Algorithm 1: Training the COP-HGNN embedding and return prediction networks
Input: nodes, node features, edges, and edge features.
Output: HGNN network with optimized weights.
1  initialize network weights $\mathbf{W}_X, \mathbf{W}_P, \mathbf{W}_{OP}, \mathbf{W}_1, \mathbf{W}_2, \mathbf{W}_3, \mathbf{W}_4$
2  for m in 1…MAX-STEPS do
3    for batch in 1…MAX-NUM-BATCHES do
4      for f in [customer ID, product ID, color, size, product group] do
         # embed the categorical feature f
5        $\mathbf{e}_f = \mathbf{W}_f \mathbf{x}_f$ (Equation (1));
6      end
       # embed the product node
7      $\mathbf{h}_P = [\mathbf{e}_{\mathrm{ID}} \,\|\, \mathbf{e}_{\mathrm{PG}} \,\|\, \mathbf{W}_P \mathbf{x}_P]$ (Equation (2));
       # embed the order–product edge
8      $\mathbf{h}_{OP} = [\mathbf{e}_{\mathrm{color}} \,\|\, \mathbf{e}_{\mathrm{size}} \,\|\, \mathbf{W}_{OP} \mathbf{x}_{OP}]$ (Equation (3));
       # embed the order with graph convolution
9      $\mathbf{h}_O = \phi(\mathbf{W}_1 \mathbf{e}_{c(O)} + \mathbf{W}_2 \cdot \mathrm{mean}_{P \in \mathcal{N}(O)} \mathbf{h}_P)$ (Equation (4));
       # generate the return prediction
10     $\hat{y} = \sigma(\mathbf{W}_4\, \mathrm{ReLU}(\mathbf{W}_3 [\mathbf{h}_O \,\|\, \mathbf{h}_P \,\|\, \mathbf{h}_{OP}]))$ (Equation (5));
       # compute the loss
11     $\mathcal{L} = \mathrm{BCE}(\hat{y}, y)$ (Equation (6));
     end
12   backpropagate $\mathcal{L}$ and update the network weights with Adam
13 end
Algorithm 2: Return prediction for products in a new order
Input: embedding network, prediction network, Customer ID, IDs of products in the order, order features, order–product features.
Output: the return probability for each product in the order.
1  calculate $\mathbf{e}_{c(O)}$, $\mathbf{h}_P$, and $\mathbf{h}_{OP}$ using the trained COP-HGNN weights (Equations (1)–(3));
2  calculate $\mathbf{h}_O$ using graph convolution (Equation (4));
3  for product in order do
4    generate the return prediction $\hat{y}$ using the prediction network (Equation (5));
5  end