
Garment Returns Prediction for AI-Based Processing and Waste Reduction in E-Commerce

Marie Niederlaender (a), Aena Nuzhat Lodi (b), Soeren Gry (c), Rajarshi Biswas (d) and Dirk Werth
August-Wilhelm Scheer Institut, Uni Campus D 5 1, Saarbrücken, Germany
{firstname.lastname}@aws-institut.de

Keywords: Returns Prediction, Machine Learning, Recommendation System, Sustainable Return Management,
E-Commerce, Fashion, Apparel, Artificial Intelligence.

Abstract: Product returns are an increasing burden for manufacturers and online retailers across the globe, both economically and ecologically. In the textile and fashion industry especially, on average more than half of the ordered products are returned. The first step towards reducing returns, and towards processing unavoidable returns effectively, is the reliable prediction of upcoming returns at the time of order. This allows retailers to estimate inventory risk and to plan the next steps to resell garments and avoid their destruction. This study explores the potential of 5 different machine learning algorithms combined with regularised target encoding for categorical features to predict returns of a German online retailer exclusively selling festive dresses and garments for special occasions. A balanced accuracy of up to 0.86 can be reached even for newly introduced products, provided historical data on customer behavior is available. This work is intended to be extended towards an AI-based recommendation system that finds the ecologically and economically best processing strategy for garment returns, reducing waste and the financial burden on retailers.

1 INTRODUCTION

Global fashion e-commerce is estimated to have reached a global size of US $871.2 billion in 2023 and is therefore the largest B2C e-commerce market segment, expecting further growth at a rate of 11.5% per year (Statista, 2023). In 2022, the vast majority of returned packages in Europe were associated with the fashion sector; in Germany, as much as 91% of returned goods were fashion items (Forschungsgruppe Retourenmanagement, 2022). The ever-increasing number of returns results not only in high economic costs for e-commerce retailers, but also in an increasing burden for the environment: due to the additional (financial) effort needed to resell returned items, sending returned items to landfill is one solution a lot of businesses opt for. It is estimated that in Germany in 2021 alone, about 17 million returned items were disposed of, and that the disposal rate for returns in other European countries is even higher (Forschungsgruppe Retourenmanagement, 2022). Returns also play a big role when it comes to CO2 emissions, contributing to the 5% of global emissions created by the fashion industry. This makes the fashion industry one of the three most polluting sectors in the world (Vogue/BCG, 2021). The average CO2 equivalent caused by a single returned package is valued at 1.5 kg (Forschungsgruppe Retourenmanagement, 2022). In order to reduce the environmental and economic impact of product returns, the best way is to reduce returns in total. There are preventative strategies, but also reactive strategies with regards to this issue (Deges, 2021), because some returns are inevitable, for example when customers order one item in different sizes or colours with the intention to keep only one or few of them, a custom referred to as bracketing which is prevalent with fashion products (Bimschleger et al., 2019). Even when only one item is ordered, there are several possibilities why a garment is returned. It can be due to a wrong size, bad fit, personal preference, unmet expectations due to a discrepancy between how the product is displayed online versus its appearance in real life, or even because of insufficient quality or damage. No matter if a preventive or reactive strategy is chosen to tackle the issue, the first step to be able to act is to be prepared, so this study investigates different methods to predict fashion product returns utilizing several machine learn-

a https://orcid.org/0009-0008-1935-821X
b https://orcid.org/0009-0001-4739-4743
c https://orcid.org/0000-0002-4441-0517
d https://orcid.org/0000-0003-2115-6955

156
Niederlaender, M., Lodi, A., Gry, S., Biswas, R. and Werth, D.
Garment Returns Prediction for AI-Based Processing and Waste Reduction in E-Commerce.
DOI: 10.5220/0012321300003636
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 16th International Conference on Agents and Artificial Intelligence (ICAART 2024) - Volume 2, pages 156-164
ISBN: 978-989-758-680-4; ISSN: 2184-433X
Proceedings Copyright © 2024 by SCITEPRESS – Science and Technology Publications, Lda.

ing algorithms. This paper is part of a wider scope of research that aims at using return predictions to create an AI-based recommendation system for the more (cost-)effective and eco-friendly handling of unavoidable returns. Section 2 of this paper states different studies that have been performed in the area of product returns prediction and gives an overview of the different methods used and the circumstances that had most impact on increased or decreased return probabilities. After describing the data utilised in this study in section 3, we describe the steps undertaken and machine learning methods used to make reliable return predictions in section 4. In section 5, we discuss the results using the performance measures Balanced Accuracy, Area under the ROC-Curve (AUC), Precision and Recall to get the full picture on the model's strengths and shortcomings. The results are compared for the introduction of new products with unknown return history, for future orders and for a selection of random orders, respectively. The final section gives a summary of the findings and an outlook on possibilities for future research based on research gaps and shortcomings identified in this paper.


2 RELATED WORK

The causes of returns can be many and varied. In order to capture the possible drivers of returns in the fashion and apparel sector, research in recent years has used a variety of techniques related to machine learning algorithms (Gry et al., 2023). A selection of current approaches is presented below.

Feature Selection, ML Models and Analysis Methods: In fashion e-commerce, retailers typically work with large data sets, some of which contain little usable information. It is often an aggregation of a large number of data points, only a few of which contribute to the quality of the ML models. However, in order to make accurate predictions of returns, it is important that the ML models contain informative features. To assist in the selection of these features, Urbanke et al. (2015) developed Mahalanobis feature extraction in their research to help reduce the dimensionality of large sparse datasets. During development, the authors were able to draw on returns data from a large German fashion retailer. Mahalanobis feature extraction was able to reduce the required storage capacity by more than 99%, outperforming the other feature extraction methods investigated in the study.
Tüylü and Eroğlu (2019), for example, have been involved in testing and comparing different ML models in the context of predicting returns. They tested functional, rule-based, lazy and decision tree algorithms. The best performer was the M5P decision tree algorithm, which combines elements of decision trees and multiple linear regression. In the rule-based segment, M5Rules and Decision Table performed similarly well. Support Vector Regression and Linear Regression also performed well among the functional algorithms.
Asdecker and Karl (2018) compared simple data mining methods with complex data analysis methods to assess their suitability for predicting customer returns. They were able to use data on delivery and returns information. Positive correlations with the likelihood of returns were found for the number of items in the parcel, the total value of the items in the parcel and the age of the customer account. Delivery time was negatively correlated. When comparing analysis methods, even simple data mining methods such as binary logistic regression and linear discriminant analysis did not perform much worse than more complex methods such as ensembles (Asdecker and Karl, 2018).
In another study, Asdecker et al. (2017) used linear and logistic regression to examine data sets from a German online shop specialising in women's clothing. Variables used included coupons, payment method, order and return history, and basket contents. The highest information content for predicting the likelihood of returns was found when using historical returns information for each item and customer. The impact of adding a free gift to the order was also examined. Among other things, the study found that ordering the same garment in different colours reduced the likelihood of returns. The addition of a free gift also reduced the likelihood of returns in the study. On the other hand, the likelihood of returns increased when paying on account, when using a voucher, and as the average price of the order increased.

Customer Reviews, Prices, Promotions and Payment Methods: Sahoo et al. (2018) used a two-stage probit model, a type of binary regression model (Heckman, 1979), to investigate how product reviews affect purchases and returns. They found that products with fewer product reviews led to more bracketing. Bracketing refers to the consumer behaviour of ordering a selection of items with the aim of keeping only a fraction of them after trying them on (Bimschleger et al., 2019). On the other hand, items with a large number of reviews were less likely to be returned. The influence of item price on the likelihood of return was also examined. Higher prices showed a lower likelihood of returns than lower prices, which is attributed to the mental effort consumers put into deciding to buy expensive items.
Free shipping is also considered to be a significant fac-


tor influencing the likelihood of returns. Shehu et al. (2020) used a Type II Tobit model (Van Heerde et al., 2005) in their study. They found that free shipping promotions increase the willingness to buy items that are more difficult to evaluate from the customer's point of view, and thus also increase the likelihood of returns. General free shipping offers outside of promotions also show an increased likelihood of returns (Lepthien and Clement, 2019).
Yan and Cao (2017) examined the effect of payment method and product variety on the likelihood of returns. The payment method proved to be a good indicator of the likelihood of returns. When customers paid in cash, they were less impulsive and made fewer non-essential purchases than when they paid by credit card, and were therefore less likely to return. They also found that the likelihood of returns decreased with the variety of items, such as shoes, clothing and accessories. In contrast to bracketing, this does not involve the selection of multiple items to try on.


3 DATASET

The data used in this work consists of sales and returns data logged via the retailer's ERP system. We have been provided a subset of this data, containing all the sales and returns made via an online marketplace for fashion, starting from April 1st 2022 until March 31st 2023. To exclude any effects of the Covid-19 pandemic and data at the end of the period where returns were yet to come in, only data from September 1st 2022 to February 26th 2023 was used for the predictions. The data consists of two tabular datasets, namely sold articles and returned articles, where each instance represents a single product that has been sold or returned. The entries can be clustered into orders or returns of multiple products using a unique order-ID and a soldarticle-ID, which represents a product in a specific size and colour. The same method allows linking the tables to form one table containing sales, customer and product information and the boolean target column stating if the sale has been returned or not. The dataset contains information on the price and properties of items such as their colour and material, but also on the city of the customer, the order date and a customer ID to identify if a customer ordered multiple times. The overall return probability in this dataset is P(r) = 0.73, which may be higher than other average return rates due to the specialisation on festive dresses and garments, which gives rise to other fitting standards and different consumer behavior compared to everyday wear. Based on a random sample for a given customer, estimates of the conditional return probabilities have been extracted. For the group of customers where the first sample was a return, the return probability for the remaining instances is P(r|y = 1) = 0.85. For the other group of customers, namely where the first sample was not a return, the return probability decreases to P(r|y = 0) = 0.56 for the remaining instances, which indicates that for customers who returned once, the probability that they return increases for the remainder of their orders.


4 EXPERIMENT SETUP

In the scope of this paper, we investigate 5 different ML algorithms using different settings for training and optimisation. The following paragraphs describe the steps that were undertaken for imputation, automated feature selection, feature engineering, encoding and hyperparameter tuning, which were the same for each of the five algorithms. Additionally, 3 settings were set for model training and hyperparameter optimisation to further investigate which aspects affect performance in which way.

Imputation, Automated Feature Selection and Feature Engineering: As the first step of preprocessing, columns and rows with small or no informational use were dropped. Some feature columns were removed manually beforehand when there was no possible dependency between the feature and the target. Remaining missing values were filled with -1 for numerical and with a blank string for categorical variables. Decision factors for automated removal of features were if the percentage of missing values was over a threshold of 50% or if there was only one feature value. Furthermore, for each feature pair, redundant features were dropped if the Pearson correlation coefficient exceeded 0.95. Features with no correlation to the target variable were also dropped. To feed the models information on different materials and material combinations of garments, different fabric types were extracted from the product description and added as binary features. New features were also created to reflect properties concerning each order as a whole and to make bracketing behavior by customers more apparent. Features added were the number of items in a given order and the number of same items in the same colour, same size or clothing category (features 1-3 in Table 1). However, the creation of the remaining features mentioned in Table 1 was necessary to exceed a balanced accuracy of 0.61 for any of the ML algorithms employed, which indicates that historical customer behavior as well as order-related observations give important insights into potential returns.
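The automated feature-selection rules above (the 50% missing-value threshold, single-valued columns, and the 0.95 Pearson cutoff for redundant pairs) can be sketched in a few lines of pandas. This is a minimal illustration under our own naming, not the authors' code, and it omits the target-correlation filter:

```python
import pandas as pd

def prune_features(df: pd.DataFrame, target: str,
                   missing_thresh: float = 0.5,
                   corr_thresh: float = 0.95) -> pd.DataFrame:
    """Drop columns with too many missing values, a single distinct
    value, or near-duplicate numeric information (|Pearson r| above
    corr_thresh), mirroring the pruning rules described in the text."""
    X = df.drop(columns=[target])
    # Rule 1: percentage of missing values over the threshold
    keep = [c for c in X.columns if X[c].isna().mean() <= missing_thresh]
    # Rule 2: only one feature value present
    keep = [c for c in keep if X[c].nunique(dropna=False) > 1]
    # Rule 3: for each numeric pair, drop the second feature when the
    # absolute Pearson correlation exceeds the cutoff
    corr = X[keep].select_dtypes("number").corr().abs()
    dropped = set()
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a not in dropped and b not in dropped and corr.loc[a, b] > corr_thresh:
                dropped.add(b)
    keep = [c for c in keep if c not in dropped]
    return df[keep + [target]]
```

Remaining missing values would then be filled with -1 for numerical and a blank string for categorical columns, as described above.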


Table 1: Features that were created to target different aspects of consumer behavior, such as general return behavior, bracketing, ordering for other people or impulse purchases, and literature referring to this consumer behavior or investigating said features.
Nr | Feature | Explanation | Literature
1 | number of same items in the same size for a given order ID | multiple items were ordered in the same size but possibly in another colour; potential bracketing behavior | Makkonen et al. (2021), Asdecker et al. (2017), Yan and Cao (2017), Bimschleger et al. (2019)
2 | number of same items in the same colour for a given order ID | multiple items were ordered in the same colour but possibly in a different size; potential bracketing behavior | see 1
3 | number of items in the same category (e.g. dress, pants) for a given order ID | multiple items from the same category were ordered; potential bracketing behavior, lack of diversity in order | see 1
4 | number of same items in an order for a given order ID | the same item was ordered multiple times, possibly in different colours and sizes; potential bracketing behavior | see 1
5 | number of items in one order | correlation of a larger number with a larger return probability; potential bracketing behavior | Asdecker and Karl (2018)
6 | number of days since the last order | see if and how recently a customer last ordered something; for first-time customers the value is set to > 400 days | Yan and Cao (2017)
7 | number of days since last ordering the same item | see if and how recently a customer ordered the same item; ordering the same item again may indicate a stronger intention to keep it, or ordering the correct size when ordering again; for first-time ordering of an item the value is set to > 500 days |
8 | historical return probability of customer | if less than 4 entries, use P(r|y = 1) if the majority is true, else use P(r|y = 0); if there is no majority, use P(r|y = 1) | Cui et al. (2020), Asdecker et al. (2017)
9 | size varies by more than 1 value within a given order for a given clothing category | potential bracketing behavior; indicator that part of the order is for other people | Makkonen et al. (2021)
10 | size deviates from the usual for a given clothing category | bool variable indicating if a customer orders their historical size or not | Makkonen et al. (2021)
11, 12 | relative and absolute discount on an item | indicator for impulse purchase; unclear if the relative or absolute value has more effect | Asdecker et al. (2017)
13, 14 | relative and absolute discount on an order | to observe the effect of discounts on order level | Asdecker et al. (2017)
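Feature 8 combines a customer's own history with the conditional estimates P(r|y = 1) = 0.85 and P(r|y = 0) = 0.56 reported in Section 3. The sketch below is one possible reading of the rule in Table 1, assuming the empirical return rate is used once at least 4 entries are available; the exact fallback logic is our interpretation, not the authors' code:

```python
# Conditional return probabilities reported in Section 3
P_R_GIVEN_RETURN = 0.85  # P(r|y = 1)
P_R_GIVEN_KEEP = 0.56    # P(r|y = 0)

def historical_return_probability(outcomes):
    """Feature 8: outcomes is a list of booleans (True = returned).
    With at least 4 entries, use the customer's empirical return rate;
    otherwise fall back to the conditional estimates, choosing
    P(r|y = 1) when returns are the majority or there is no majority."""
    if len(outcomes) >= 4:
        return sum(outcomes) / len(outcomes)
    returns = sum(outcomes)
    keeps = len(outcomes) - returns
    if keeps > returns:
        return P_R_GIVEN_KEEP
    return P_R_GIVEN_RETURN  # majority returned, or tie / no history
```

With this reading, a customer whose only previous order was kept gets 0.56, while a single returned order yields 0.85.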

One possible explanation for this observation is that the majority of return reasons do not depend on the specific item and its properties, but on the context in which the order has been placed, like the customer ordering a selection of items with the intention to only keep one or a few, and some customers being more prone to returning more frequently, as the difference between the probabilities P(r|y = 0) and P(r|y = 1) suggests. Further, the average return probability on order level increases from P_order(n_items = 1) = 0.73 for orders containing a single item to P_order(n_items > 1) = 0.94 for orders containing at least two items, which underlines the effect that bracketing behavior has on return volume. For the 14 newly engineered features, no elimination techniques were used and all of them were incorporated into the final ML models. This procedure resulted in a total of 48 features, including the 14 engineered features (Table 1) for reflecting customer behavior. These features were created based on indicators for frequently returning customers, customers ordering for other people, and customers ordering a selection of items, for example in different sizes, with the intention


to return most of them. The remaining features are a set of boolean features for different materials, a colour feature, customer-ID, article-ID (not unique regarding size or colour), soldarticle-ID (unique regarding size and colour), day, month and year of the order, the price of the item, the total price of the order, the weight of the garment, the product line, the style and fit, the country it has been manufactured in and the clothing category (e.g. dress, pants, bolero, skirt).
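The boolean material features in this list come from matching fabric names in the free-text product descriptions (see the preprocessing paragraph above). A minimal sketch of such an extraction; the fabric vocabulary and column names here are hypothetical, not taken from the paper:

```python
import pandas as pd

# Hypothetical fabric vocabulary; the paper does not list the
# actual fabric types it extracts from the descriptions.
FABRICS = ["cotton", "polyester", "silk", "viscose", "elastane"]

def add_fabric_flags(df: pd.DataFrame, text_col: str = "description") -> pd.DataFrame:
    """Add one binary column per fabric; material blends simply set
    several flags at once, reflecting fabric combinations."""
    out = df.copy()
    desc = out[text_col].fillna("").str.lower()
    for fabric in FABRICS:
        out[f"fabric_{fabric}"] = desc.str.contains(fabric, regex=False).astype(int)
    return out
```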

Figure 1: Scheme for the preparation of training and test sets from the original dataset, followed by 5-fold regularised target encoding of the training set X_tr, and subsequent application of the resultant encoding onto the respective test folds.

Encoding and Scaling: Numerical features were scaled to have unit variance and a mean of zero. The dataset contains many high cardinality categorical features, which can be a problem when it comes to choosing an encoding technique. Since, in a recent benchmark study by Pargent et al. (2022), regularised target encoding led to consistently improved results in supervised machine learning with high cardinality features compared to other state-of-the-art encoding techniques, regularised target encoding is the method of choice in this study. This type of encoding can be interpreted as a generalised linear mixed model (Micci-Barreca, 2001; Kuhn and Johnson, 2019), where a linear target predictor for each feature value is combined with a
random effect. To prevent overfitting to the training data, this encoding method is combined with 5-fold stratified cross-validation (CV), where each left-out fold in the training set X_tr is encoded based on the target encoding fit result for the remaining folds, resulting in five training mappings. The test sets are also divided into 5 stratified folds and then mapped to the training mappings. A scheme of how the data was split into training and test sets and then encoded using this procedure is shown in Figure 1. To implement regularised target encoding, we use a generalised linear mixed model (glmm) encoder, where infrequent values create outcomes near the grand mean, resulting in reduced sensitivity to outliers. An exception to this method is the encoding of materials, which are one-hot encoded to reflect different combinations of materials for one product. A few experiments were performed where categorical columns were encoded with generalised linear mixed models without cross-validation. However, we found these models to be very prone to overfitting and proceeded with 5-fold CV glmm encoding, which is in accordance with Pargent et al. (2022). Most orders were placed by unique customers who did not order more than once in the observed time scope, but to represent different personas of return behavior, customer IDs were encoded using target encoding with a smoothing parameter of α = 20, resulting in 8 different numerical values. New customer IDs in the test sets were encoded as the grand mean.

Hyperparameter Tuning and Model Training: 5 different models were used for training, including K Nearest Neighbours (KNN), Gaussian Naive Bayes (NB), Support Vector Machines (SVM), Bagged Decision Trees (BDT) and XGBoost (XGB), a regularising gradient boosting algorithm based on decision trees. The following paragraphs show the reasoning behind choosing this set of algorithms, including their possible advantages and limitations.
KNN is a supervised learning algorithm that predicts the target class based on a class vote of its adjacent neighbors. Due to its straightforward approach it is easy to interpret, and local patterns in feature space can be captured, which might be suitable for the imposed prediction problem. However, one major drawback is its lack of efficiency as a lazy algorithm. Another aspect to keep in mind is its proneness to bias in the case of class imbalance, due to the existence of more neighbours with the majority class (Murphy, 2018).
Gaussian Naive Bayes is a probabilistic algorithm that assumes conditional independence of features. The numerical features are assumed to have a normal distribution. This algorithm is known for its simplicity and computational efficiency. It might be a suitable fit for a probabilistic setting such as estimating the return probability of items, and also in situations where the data is limited, such as for newly introduced products. However, if the conditional independence of


features is not fulfilled because of strong correlations, it may not deliver adequate performance (Bishop and Nasrabadi, 2006).
Support Vector Machines are a supervised learning method used to define a hyperplane that separates the two target classes. This separation is determined by support vectors, which are crucial instances in the dataset that influence the positioning of the hyperplane. The primary objective is to maximize the margin between the hyperplane and the instances of each class. The use of a nonlinear kernel allows for the creation of nonlinear SVMs, which can be an advantage, but makes the outcome very sensitive to the chosen kernel function. Due to the maximisation of the margins, the models can become fairly robust to outliers, and high dimensional data can be handled effectively. However, model complexity increases exponentially with the amount of training examples, which can be a major drawback (Murphy, 2018).
Bootstrap Aggregating (Bagging) Decision Trees emerge as a suitable option for predicting garment returns in data characterized by high cardinality categorical features, owing to the discriminative nature of the split criteria employed during the construction of the decision tree. Combining this advantage with bagging enhances performance, can improve model stability and reduces the risk of overfitting, if the base classifiers are not too complex. Drawbacks can be the lack of interpretability of the prediction results due to the nature of ensembles, and bias regarding the training set (Murphy, 2018).
Lastly, XGBoost (Extreme Gradient Boosting) is also an ensemble learning method based on decision trees, which is widely used in state-of-the-art literature and machine learning challenges, and known for its scalability (Chen and Guestrin, 2016). It has been applied successfully to a wide range of applications, such as store sales prediction and customer behavior prediction (Chen and Guestrin, 2016), which indicates that it can be a suitable solution for garment returns prediction in this specific setting. The ability to get feature importances for this method can also be beneficial. A possible limitation is the proneness to overfitting due to the sensitivity of boosting methods to outliers.
Hyperparameters were tuned using 4 to 7-fold CV on the training set, testing combinations using a randomised grid. As the imbalanced distribution of target values can lead to a bias towards the positive class, random oversampling (labeled O = 1 in Figure 2) of the minority class and random undersampling (labeled U = 1 in Figure 2) were used as experiment settings besides keeping the training sample as is, which contained 33,777 instances. For most models, random oversampling was the method of choice, except for SVMs, where exponentially increasing model complexity with the number of instances gave rise to selecting random undersampling.
For testing the results, three test sets were created, as shown in Figure 1. First, all instances related to 10 random products were removed from the original dataset by their article ID to form a test set X_te,product, which consisted of 2,851 instances, with the aim to mimic the introduction of a new product line. Second, from the remainder of the data, the last 15% were used as a second test set X_te,future, consisting of 6,000 instances, testing the scenario of new incoming orders. Last, after removing these instances from the dataset, 5% were sampled randomly to form a third test set X_te,rand with 1,700 instances. For some of the models, a random 10% portion of X_te,rand was used for hyperparameter optimisation, using only the remaining instances of X_te,rand as a test set. This setting is labeled R = 1 in Figure 2.

Performance Evaluation: In order to fully assess the performance of the tested models on the three test scenarios, the balanced accuracy (BA) was chosen as the most suitable indicator of model performance, due to the imbalanced class ratio of roughly 70 to 30. To give true positives and true negatives equal weight in the evaluation, the balanced accuracy is given by the average of the true positive rate (TPR), also referred to as sensitivity or recall, and the true negative rate (TNR), also referred to as specificity:

BA = (TPR + TNR) / 2    (1)

To gain the full picture of how many of the returns could be predicted as such, we also investigate the recall or TPR. Further, an estimation of how many false positives go along with the correct prediction of the positive class is given by the precision, which is the ratio of true positives to all test instances classified as positive. The Area under the ROC-Curve (AUC) is shown as an additional metric, indicating the relationship of true positives and false positives for varied return probability thresholds, which can help assess the suitability of the models to be used as an output for return probability estimates.


5 DISCUSSION OF RESULTS

The results are summarised in Figure 2 and show the four performance metrics used to evaluate the models across the three test sets for randomly selected data,


Figure 2: Performance scores for the trained models on the three test sets X_te,rand, X_te,future and X_te,product. Different marker shapes indicate varied experiment settings, namely whether hyperparameters were optimised on a portion of X_te,rand with testing on the remainder (labeled R=1, else labeled R=0 if CV on X_tr was used instead), and whether random oversampling (O=1) or random undersampling (U=1) was used to counter class imbalance.

future order data and new product data. The influences of the settings R, O and U will be discussed in the following.

The Role of Optimisation Sets (R = 1 or R = 0): One observation is that for SVMs, Bagged Decision Trees and XGBoost, solely optimising on the training set led to the worst performances across all metrics and test sets, indicating that hyperparameter optimisation on the training data might not be optimal for these algorithms, whereas for KNN and Naive Bayes no significant difference between optimising on a portion of X_te,rand and optimising on the training set can be found across the metrics and test sets, except for improved precision and balanced accuracy at the cost of a lower recall rate for the NB models. For SVM, BDT and XGB the performances line up next to NB and KNN if 10% of X_te,rand are used for optimisation; the biggest overall improvement can be seen for SVMs, where the balanced accuracy changed from below 0.6 to between 0.84 and 0.87 across the test sets, which can be explained by the significant improvement in precision of up to 0.15.

The Role of Random Over- and Undersampling (O = 1, O = 0 and U = 1, U = 0): In contrast to what one might expect for the imbalanced data used in this work, no significant improvement in the performances, especially in balanced accuracy, can be found when using random oversampling of the minority class or random undersampling of the majority class. Only for NB do the over- or undersampled versions with R = 0 seem favorable for an improved balanced accuracy, which might be explained by the fact that Naive Bayes has a generally high bias, so this can be counteracted by random over- or undersampling. For the other models, choosing O = 0

162
Garment Returns Prediction for AI-Based Processing and Waste Reduction in E-Commerce

and U = 0 seems favorable, as the balanced accuracies the mapping obtained by the training set. When taking
rank among the best with simultaneously high recall this into account, SVMs are among the best performers
rates. for the given data. Naive Bayes and K Nearest Neigh-
bours have shown to be very robust to the different
training settings. Balanced accuracies reach a max-
Performances for Different Test Sets: The over- imum of 0.86 for Xte,future , 0.87 for Xte,rand and 0.86
all similar performances across all three test sets indi- for Xte,product . With newly added features based on the
cate that there is not too much variance across test sets. historical customer behavior and potential bracketing
We can also infer from this that future orders on this behavior, a high recall rate of up to 0.99 can be reached
dataset can be classified correctly with a high likeli- across test sets. This implies that precise prediction
hood by observing historical data over the time scope can become a challenge when the available amount
of six months. The importance of data on the historical of historical data on customer behavior is limited or
return behavior of customers is in accordance with find- if the majority of customers are first-time customers.
ings by Asdecker et al. (2017). Also, the introduction In this study, a balanced accuracy of 0.61 could not
of new products with possibly very different properties be exceeded without utilising historical customer data.
like style, fit and colour, that have not been part of the For this situation, it can be a reasonable approach to
training set rank only slightly lower in balanced accu- apply clustering methods in order to be able to clas-
racy. A larger difference can be seen in AUC, where sify the return behavior of new customers based on
the 0.9 mark is not surpassed for Xte,product . This in- similarities with existing customers. Adding historical
dicates that for new products, the models’ abilities to return rates for different Article IDs and other cate-
make trade-offs between the sensitivity and specificity gories should also be investigated. This result should
is not as effective. However, when comparing the over- be seen in the context of the clothing category, namely
all best-ranking models (i.e. ignoring SVM BDT and festive dresses and garments. Therefore, further explo-
XGB for R = 0), a slightly improved precision can ration with data from retailers which include a variety
be reached compared to Xte,future and Xte,rand . Never- of other, non-festive clothing categories is indispens-
theless, slight changes might manifest differently on able. Additional research is needed to explore the
different test sets and other validation sets, when other potential of predicting return rates for products that
random products are chosen or future orders from other have not yet been manufactured, which can make an
times are selected. Another important point when in- enormous contribution towards waste reduction and
terpreting the performance on Xte,product is to keep in CO2 reduction in the fashion industry. Return predic-
mind that customer behavior played a significant role tions lay the foundation for future research focusing on
in correctly classifying these instances, but upon the the most sustainable processing of returned garments
introduction of a new product line one might not yet and optimisation of reverse logistics processes based
have exact order information. It is also desirable to be on return probabilities on order and item level. We
able to make predictions before new products are even recommend the investigation of return reasons as key
manufactured to get a first estimate on the return rate information for return processing and research on as-
to be expected. signing most probable return reasons to orders with
large return probability. The problem of high return
rates is of large relevance from an economic but also
from an environmental perspective, but there is great
6 CONCLUSION AND OUTLOOK potential for improvement by employing AI and ML
applications. This research provides the basis to work
This work explores the application of five classical towards an AI-Based recommendation system that can
Machine Learning algorithms for the prediction of e- be integrated in to a system used to manage orders and
commerce returns using up-to-date data from a manu- returns (e.g. Enterprice Resource Planning (ERP) or
facturer of festive garments. Categorical features with Product Data Management (PDM) systems), where
high cardinality were encoded using regularised target return probabilities on order and item level shall give
encoding using 5-fold CV generalised target encoding the necessary insights to provide recommendations for
(Pargent et al., 2022), which is a novel approach in the fast and sustainable processing of returns.
context of returns prediction. Three settings for hyper-
parameter optimisation and model training were ex-
plored. The results indicate that for tree-based models
and SVMs, it is favorable to optimise hyperparameters
with an additional set that is not the originally target
encoded training set, but that has been encoded using
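As background, the encoding scheme summarised above can be sketched as follows: out-of-fold target encoding for the training set, plus encoding of a held-out optimisation set with the mapping fitted on the full training set. This is a simplified sketch under our reading of the setup; column names are hypothetical, and the smoothing of category means toward the global mean used in the regularised variant of Pargent et al. (2022) is omitted for brevity.

```python
import numpy as np
import pandas as pd

def oof_target_encode(train, cat_col, target_col, n_splits=5, seed=0):
    """Out-of-fold target encoding: each row's category is replaced by the
    target mean computed on the other folds, so no row sees its own label."""
    rng = np.random.default_rng(seed)
    fold_of = rng.integers(0, n_splits, size=len(train))  # random fold assignment
    global_mean = train[target_col].mean()
    encoded = np.full(len(train), np.nan)
    for k in range(n_splits):
        held_out = fold_of == k  # rows encoded in this round
        fold_means = train.loc[~held_out].groupby(cat_col)[target_col].mean()
        encoded[held_out] = (
            train.loc[held_out, cat_col].map(fold_means).fillna(global_mean).values
        )
    return pd.Series(encoded, index=train.index)

def encode_with_train_mapping(train, other, cat_col, target_col):
    """Encode a validation/optimisation set with the category-to-mean mapping
    fitted on the full training set; unseen categories get the global mean."""
    mapping = train.groupby(cat_col)[target_col].mean()
    return other[cat_col].map(mapping).fillna(train[target_col].mean())
```

The first function would be applied once to the training data before model fitting, the second to any optimisation or test split, mirroring the distinction between the originally target-encoded training set and a set encoded with the training-set mapping.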

ICAART 2024 - 16th International Conference on Agents and Artificial Intelligence

ACKNOWLEDGEMENTS

This research was funded in part by the German Federal Ministry of Education and Research (BMBF) under the project OptiRetouren (grant number 01IS22046B). It is a joint project of the August-Wilhelm Scheer Institut, INTEX, HAIX and h+p. The August-Wilhelm Scheer Institut is mainly entrusted with conducting research in AI for forecasting returns volume and for recommendations based on AI.

REFERENCES

Asdecker, B. and Karl, D. (2018). Big data analytics in returns management - are complex techniques necessary to forecast consumer returns properly? In 2nd International Conference on Advanced Research Methods and Analytics. Proceedings, pages 39–46.

Asdecker, B., Karl, D., and Sucky, E. (2017). Examining drivers of consumer returns in e-tailing with real shop data. In Hawaii International Conference on System Sciences, pages 4192–4201.

Bimschleger, C., Patel, K., and Leddy, M. (2019). Bringing it back: Retailers need a synchronized reverse logistics strategy. Technical report, Deloitte Development LLC.

Bishop, C. M. and Nasrabadi, N. M. (2006). Pattern recognition and machine learning, volume 4. Springer.

Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794.

Cui, H., Rajagopalan, S., and Ward, A. R. (2020). Predicting product return volume using machine learning methods. European Journal of Operational Research, 281(3):612–627.

Deges, F. (2021). Retourencontrolling im Online-Handel. Controlling – Zeitschrift für erfolgsorientierte Unternehmenssteuerung, 2/2021:61–68.

Forschungsgruppe Retourenmanagement (2022). Ergebnisse des europäischen Retourentachos veröffentlicht. https://www.retourenforschung.de/info-ergebnisse-des-europaeischen-retourentachos-veroeffentlicht.html. Online; accessed 2023-01-26.

Gry, S., Niederlaender, M., Lodi, A., Mutz, M., and Werth, D. (2023). Advances in AI-based garment returns prediction and processing: A conceptual approach for an AI-based recommender system. In Proceedings of the 20th International Conference on Smart Business Technologies - ICSBT, pages 15–25. INSTICC, SciTePress.

Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica: Journal of the Econometric Society, pages 153–161.

Kuhn, M. and Johnson, K. (2019). Feature engineering and selection: A practical approach for predictive models. Taylor & Francis Group.

Lepthien, A. and Clement, M. (2019). Shipping fee schedules and return behavior. Marketing Letters, 30(2):151–165.

Makkonen, M., Frank, L., and Kemppainen, T. (2021). The effects of consumer demographics and payment method preference on product return frequency and reasons in online shopping. In Bled eConference, pages 567–580. University of Maribor.

Micci-Barreca, D. (2001). A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems. ACM SIGKDD Explorations Newsletter, 3(1):27–32.

Murphy, K. P. (2018). Machine learning: A probabilistic perspective (Adaptive Computation and Machine Learning series). The MIT Press: London, UK.

Pargent, F., Pfisterer, F., Thomas, J., and Bischl, B. (2022). Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics, 37(5):2671–2692.

Sahoo, N., Dellarocas, C., and Srinivasan, S. (2018). The impact of online product reviews on product returns. Information Systems Research, 29(3):723–738.

Shehu, E., Papies, D., and Neslin, S. A. (2020). Free shipping promotions and product returns. Journal of Marketing Research, 57(4):640–658.

Statista (2023). Fashion eCommerce report 2023. https://www.statista.com/study/38340/ecommerce-report-fashion/. Online; accessed 2023-08-09.

Tüylü, A. N. A. and Eroğlu, E. (2019). Using machine learning algorithms for forecasting rate of return product in reverse logistics process. Alphanumeric Journal, 7(1):143–156.

Urbanke, P., Kranz, J., and Kolbe, L. M. (2015). Predicting product returns in e-commerce: The contribution of Mahalanobis feature extraction. In International Conference on Interaction Sciences, pages 1–19.

Van Heerde, H. J., Gijsbrechts, E., and Pauwels, K. (2005). Price war: What is it good for? Store incidence and basket size response to the price war in Dutch grocery retailing. Tilburg University, LE Tilburg, The Netherlands.

Vogue/BCG (2021). Consumers' adaption to sustainability in fashion. https://web-assets.bcg.com/27/f3/794284e7437d99a71d625caf589f/consumers-adaptation-to-sustainability-in-fashion.pdf. Online; accessed 2023-08-09.

Yan, R. and Cao, Z. (2017). Product returns, asymmetric information, and firm performance. International Journal of Production Economics, 185:211–222.

