
AI & SOCIETY

https://doi.org/10.1007/s00146-023-01654-9

NETWORK RESEARCH

Using AI to detect panic buying and improve products distribution amid pandemic
Yossiri Adulyasak1 · Omar Benomar2 · Ahmed Chaouachi2 · Maxime C. Cohen2 · Warut Khern-am-nuai3

Received: 11 January 2023 / Accepted: 29 March 2023


© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023

Abstract
The COVID-19 pandemic has triggered panic-buying behavior around the globe. As a result, many essential supplies were
consistently out of stock at common point-of-sale locations. Even though most retailers were aware of this problem, they
were caught off guard and still lack the technical capabilities to address this issue. The primary objective of this paper
is to develop a framework that can systematically alleviate this issue by leveraging AI models and techniques. We exploit
both internal and external data sources and show that using external data enhances the predictability and interpretability
of our model. Our data-driven framework can help retailers detect demand anomalies as they occur, allowing them to react
strategically. We collaborate with a large retailer and apply our models to three categories of products using a dataset with
more than 15 million observations. We first show that our proposed anomaly detection model can successfully detect anoma-
lies related to panic buying. We then present a prescriptive analytics simulation tool that can help retailers improve essential
product distribution in uncertain times. Using data from the March 2020 panic-buying wave, we show that our prescriptive
tool can help retailers increase access to essential products by 56.74%.

Keywords Anomaly detection · Retail operations · Panic buying · COVID-19

* Warut Khern-am-nuai
[email protected]

1 HEC Montreal, 3000, Chemin de La Cote-Sainte-Catherine, Montreal, QC H3T 2A7, Canada
2 IVADO Labs, 6795 Rue Marconi #200, Montreal, QC H2S 3J9, Canada
3 Desautels Faculty of Management, McGill University, 1001 Rue Sherbrooke O., Montreal, QC H3A 1G5, Canada

1 Introduction

The COVID-19 pandemic triggered a wave of panic-buying behavior worldwide at the end of Q1 2020 (e.g., Furutani 2020; Lufkin 2020; Settanni 2020). Consumers rushed to grocery stores and stockpiled large amounts of essential supplies, such as face masks, hand sanitizers, canned food, and toilet paper. Retailers were naturally not prepared for such behavior; consequently, many essential products quickly stocked out at most point-of-sale locations, such as grocery stores and pharmacies (Edmiston 2020). By the time retailers realized the extent of the situation, it was unfortunately too late to react.

Panic buying in most product categories causes short-term shocks and false demand signals that may result in supply chain disruptions and adversely affect long-term profitability. This is due to the fact that customers stockpile these products without intending to consume larger quantities than usual. In the wake of the consequences of the panic-buying wave at the end of Q1 2020, retailers are eager to find a solution. The primary objective of this paper is to develop a framework that can systematically alleviate this issue by leveraging artificial intelligence (AI) models and methods. Since such panic-buying behavior cannot be easily predicted because it is a rare event, our goal is to deploy AI tools to promptly detect panic-buying behavior and strategically react. Specifically, our proposed framework can help retailers identify situations in which sales patterns become abnormal, constituting demand anomalies in real time. Furthermore, for each detected anomaly, the model provides a concrete, tangible interpretation that can help retailers understand the reasons behind the anomaly. Once the demand anomaly is detected, retailers can strategically react using the following levers: (i) implementing a rationing policy that imposes a limit on the number of items each customer can purchase and (ii) adapting the replenishment
strategy to cope with the demand surge. Understanding when to activate a rationing policy, for which categories of products, and for how long, can have a significant impact on both retailers and consumers (especially given that we will likely experience several future waves of panic buying). In particular, when implemented properly, such rationing policies can broaden the distribution of essential products to a large number of customers (without incurring any additional costs to the retailer). Interestingly, many retailers implemented such rationing policies in 2020 and 2021. However, the specifics of these policies were often based on intuition and emotional reactions, as opposed to being data driven. In our conversations, several retailers acknowledged the lack of tools at their disposal to guide the deployment of rationing policies. In addition, these rationing policies were often implemented too late, as stated in several anecdotal quotes, such as the following: “What we learned was we didn’t impose product restrictions early enough, and that created a run on the system and created some difficulties for people.”1

In this paper, we collaborate with a large retail grocery chain in North America to develop a real-time anomaly detection system (which leverages unsupervised and supervised machine learning models) that can identify and flag pertinent anomalies related to panic buying. Specifically, we propose a two-stage machine learning approach that allows us to incorporate various internal data sources (e.g., promotions, sales, inventory, store traffic, and online transactions) as well as external data (e.g., social media, Google Trends) to detect anomalies. Such external data can help signal potential anomalies as they occur, which will ultimately provide enough time for retailers to react. We obtain strong validation that our approach successfully detects anomalies for the right period and that virtually no anomalies are detected during periods where we do not expect to have them. Our anomaly detection model differs from traditional methods in that we adapt the definition of anomalies based on specific business requirements. More precisely, instead of identifying anomalous patterns on a purely statistical basis, we search for instances that have concrete undesired consequences (e.g., triggering a stockout in the near future). To do so, we propose a two-stage AI model in which the first stage identifies anomalies and the second stage classifies those anomalies as pertinent or non-pertinent. The second stage has the flexibility to handle different types of anomalies, depending on the managerial context under consideration. The novelty of this paper lies in its application to business contexts, especially regarding how the detected anomalies are classified based on business requirements. Our two-stage model offers flexibility to retailers to detect several types of anomalies based on their business requirements and preferences. We also present an end-to-end real-world application of the deployment of our data-driven model in collaboration with one of the largest retail grocery chains in North America. We apply our model to three categories of products—toilet paper, canned soup, and household cleaners—during the COVID-19 pandemic (January–May 2020). Our data comprise more than 15 million observations. We first showcase the accuracy of our anomaly detection algorithm compared to several benchmarks. We then present a scenario analysis of the impact of imposing different versions of a rationing policy. Our results suggest that by implementing the right rationing policy at the right time, stockouts could be entirely avoided (or, at least, significantly mitigated), and access to essential products could be granted to 56.74% additional customers. Finally, we leverage our model (estimated using data from the first COVID-19 panic-buying wave at the end of Q1 2020) to simulate the efficacy of our prescriptive tool in a future panic-buying event (either triggered by the pandemic or by other worrisome events). Via extensive simulations, we showcase how our AI-based framework can help retailers improve essential product distribution.

2 Relevant literature and theoretical background

In this section, we first survey prior studies that are closely related to this paper. We then discuss the theoretical background that underpins our study, specifically on the drivers of panic-buying behavior and the resulting consequences.

2.1 Related literature

In this subsection, we survey prior relevant literature. We first review the extensive studies in data-driven retail operations. We then review several existing papers on anomaly detection models and how these models are used in the context of supply chain management. Following that, we discuss several papers that leverage external data, such as social media content, to augment predictive tasks. Lastly, we survey prior works that study the social impact of artificial intelligence.

2.1.1 Data-driven retail operations

In recent years, retail managers have increasingly integrated big data analytics into operations (Choi et al. 2018; Fisher and Raman 2018). In this context, data-driven supply chain management has become a practice that is highly sought after by many firms (Sanders and Ganeshan 2018). This trend is supported by the previous literature, which has consistently demonstrated the value of data in improving firms'

1 https://nypost-com.cdn.ampproject.org/c/s/nypost.com/2020/11/17/covid-19-panic-buying-toilet-paper-essentials-fly-off-shelves/amp/
operations, including supply chain configuration (Wang et al. 2016), safety stock allocation (Hamister et al. 2018), risk management (Choi et al. 2016), and promotion planning (Cohen et al. 2017). With the advances in artificial intelligence in recent years, techniques such as deep learning have been increasingly adopted to improve several aspects related to retail operations, such as procurement (Cui et al. 2022), demand forecasting (Birim et al. 2022), inventory replenishment (Qi et al. 2020), and risk management (Wu and Chien 2021). This paper extends the literature in this research stream in the wake of the COVID-19 pandemic, which imposes significant burdens on supply chains and product distribution (e.g., Armani et al. 2020; Ivanov and Dolgui 2020). Specifically, we propose a data-driven framework that can systematically identify demand anomalies for grocery products. We then classify these anomalies based on managerial requirements (e.g., whether these anomalies are pertinent according to a specific managerial definition). Using the identified anomalies, we then develop a prescriptive tool to showcase the implications of our method in terms of essential product distribution. Our data-driven model can help retailers strategically decide when to activate a rationing policy, for which stores and categories of products to do so, and the appropriate limit value (e.g., one versus two items per customer).

Our paper is also related to the nascent literature on retail operations amid the COVID-19 pandemic (Tsyganov 2021). Several papers have focused on supply chain resiliency (e.g., Ivanov and Dolgui 2020; Sodhi and Tang 2020). Han et al. (2020) consider the impact of the pandemic on e-commerce sales. While most previous studies take a descriptive or predictive approach, our paper adopts a data-driven prescriptive approach to enhance future decisions, namely, how retailers can detect panic-buying events and strategically react.

2.1.2 Anomaly detection

Anomaly detection has been extensively studied in the statistics, computer science, and machine learning communities (see, e.g., Mehrotra et al. 2017, and the references therein). Recent anomaly detection models have taken advantage of advances in machine learning and deep learning methods to identify patterns of anomalies based on multivariate inputs. For a recent comprehensive literature review on anomaly detection models that rely on deep learning methods, see Chalapathy and Chawla (2019). Applications of these anomaly detection models include the identification and detection of fraud (e.g., Paula et al. 2016), cyber-intrusion (e.g., Hong et al. 2014), and medical anomalies (e.g., Salem et al. 2013; Sabic et al. 2021). In the focal context of this study, several prior papers have used anomaly detection algorithms for retail demand anomaly detection and empirically demonstrated that existing algorithms perform reasonably well in detecting anomalous demand patterns (e.g., Liu et al. 2012, 2008).

Although the concept of demand anomaly detection is far from new, virtually all prior studies have detected demand anomalies from a purely statistical perspective (e.g., identifying demand observations that are significantly different relative to “normal” patterns). They do not account for business requirements (i.e., how anomalies are defined from a business standpoint) in the detection process. This discrepancy has important implications because the implementation of cutting-edge technologies without taking business requirements into account can generate severe consequences for businesses (Yeoh and Koronios 2010). In this paper, we present an anomaly detection model that explicitly accounts for managerial requirements. While our model was motivated by the COVID-19 pandemic disruptions, it remains flexible enough to ensure that it can be generalized to non-pandemic scenarios. In summary, the novelty of this paper lies in its application to business contexts, especially in how the detected anomalies are classified based on business requirements. As such, we have developed a two-stage model that includes a meaningful interpretation and offers flexibility to retailers to detect several types of anomalies based on their business requirements and preferences.

2.1.3 Leveraging external data

This paper is also related to the stream of literature that utilizes external data sources, such as social media and news articles, for identification, exploration, and prediction tasks. Conceptually, several prior studies have demonstrated that social media tends to capture public opinion (Anstead and O'Loughlin 2015). Indeed, the content shared on social media tends to be generated from diverse information sources (Bakshy et al. 2015). As a result, social media data are widely used for prediction purposes in a wide range of applications. For example, Chen et al. (2014) demonstrate that investors' opinions transmitted via a social media platform are significantly useful for predicting future stock returns and earnings surprises. Similarly, Tanlamai et al. (2022) develop a machine learning model to identify arbitrage opportunities in a retail market. The authors show that the predictive performance of the model significantly increases when external data, such as online user reviews and online questions and answers, are included as model features. Using social media data to help predict sales of specific products is also common in the literature (e.g., Gaikar and Marakarkandy 2015).

In the context of this paper, whose focus is the COVID-19 pandemic, several prior studies demonstrate the potential usefulness of external data in predictive tasks. For instance, Samaras et al. (2020) empirically studied the role of Google Trends and Twitter
in predicting the weekly number of cases of influenza in Greece. The authors show that predictive models that are primarily built based on these data perform well, with a mean absolute percentage error of 18.74% when Twitter is the main source and 22.61% when Google Trends is the main source. In the same vein, Qin et al. (2020) use social media search indexes for COVID-19 symptoms, such as dry cough, fever, and chest distress, in early 2020 to predict the number of COVID-19 cases. The authors employed five different predictive models to empirically show that social media search indexes are an effective early predictor of the number of infections.

The prior studies referred to above have shown that external data, such as social media, can be useful in capturing public opinion, which can ultimately be used to predict public movements. Building upon this empirical evidence, we incorporate external data to complement the use of internal data for anomaly detection in retail demand. More precisely, we leverage external data sources to classify demand anomalies amid the COVID-19 pandemic into pertinent anomalies that have adverse retail consequences (e.g., triggering stockouts in the near future).

2.1.4 Social impact of artificial intelligence

AI has been widely used in several areas of society over the past decade. The use of AI in business and management tends to have direct implications for firms' profits and consumer welfare (e.g., Cohen et al. 2022b; Khern-am-nuai et al. 2022). Meanwhile, the public sector also utilizes AI in several functions, such as in the legal system (e.g., Cohen et al. 2022a, 2023) and in policy making (van Noordt and Misuraca 2022). This raises several concerns over the social impact of AI, especially regarding how AI interacts with people (Shin 2023), how people perceive the credibility of AI (Shin 2022), and user awareness of privacy in AI-mediated environments (Shin et al. 2022b). Similarly, consumers, businesses, and policymakers have all expressed concerns over the fairness of decisions made based on AI predictions (Shimao et al. 2022; Shin et al. 2022a). The current paper connects to this stream of research by proposing an AI model that can help retailers improve the distribution of essential products during periods when demand anomalies occur (e.g., during the panic-buying periods at the beginning of the COVID-19 pandemic). The improved product distribution brought about by our AI model has positive implications for consumer surplus and social welfare; thus, it represents an application of AI that is potentially of great benefit to society.

2.2 Theoretical background

This research is informed by prior studies that have proposed theories on the drivers of panic-buying behavior and the resulting reactions of retailers. In behavioral operations, issues surrounding panic buying have attracted significant attention from researchers (e.g., Croson et al. 2014; Sterman and Dogan 2015). Panic buying, which is often defined as consumers' attempt to accumulate more goods than they actually need, is triggered by various factors. For example, Li et al. (2021) empirically demonstrate the relationship between individuals' risk aversion and tendency to panic buy. Other drivers include sociological factors such as herding (Naeem and Ozuem 2021) and psychological factors such as anxiety (Lins et al. 2021). In the context of the COVID-19 pandemic, the primary factor discovered in the literature is the perceived uncertainty of available future supply, which can have implications for personal well-being (see Arafat et al. 2020 for a detailed discussion on the psychological underpinnings of panic buying during the COVID-19 pandemic). The implications of such panic-buying behavior can range from mild inconvenience to destabilization of the entire supply chain (Perera et al. 2020).

In the retail sector, retailers have been generally aware of the possibility of panic-buying events and their potentially catastrophic consequences for almost a century (Mitchell 1924). However, most retailers respond to panic-buying behavior retroactively and heuristically (Prentice et al. 2020). For instance, retailers usually delegate the responsibility for detecting and mitigating panic-buying waves to store managers (Arumita 2020) without relying on rigorous data-driven capabilities. As a result, there exists a large discrepancy between the ability to detect panic-buying events and the policies deployed to discourage or mitigate such events (Chakraborti and Roberts 2020). From a pandemic management perspective, this discrepancy can have troublesome implications. First, inconsistent policies can confuse consumers during an already stressful and uncertain time. Second, these inconsistencies may encourage consumers to further mobilize and explore (e.g., to find stores with a more lenient policy), which could contribute to exacerbating the spread of the virus. These implications call for a systematic, data-driven approach that can identify and mitigate panic-buying behavior.

Motivated by the above shortcomings regarding how panic-buying events are handled in practice, this work considers the following research question: Can we develop a practical AI framework to systematically identify and mitigate panic-buying behavior? Recent literature has shown that panic buying tends to generate unexpected and extreme spikes in observable demand (Zheng et al. 2021). We treat such events as anomalies. We then identify and adapt existing machine learning models that can effectively and
Fig. 1  Grocery sales in Canada (source: Statistics Canada)

consistently detect these anomalies. We collaborate with a retail partner to move beyond the standard statistical definitions of these anomalies and to incorporate business implications into the definition. Lastly, we discuss systematic and semi-automatic processes that can be triggered following the detected anomalies to mitigate the potential impact of panic-buying behavior.

3 Data and empirical context

In this research, we collaborate with one of the largest grocery retail chains in North America. The company manages multiple brands of grocery stores and records billions of dollars in annual revenue. Through this collaboration, we were able to access comprehensive point-of-sale transaction data from 42 stores located in a large metropolitan city. Our dataset spans January 1, 2018 to May 1, 2020. Using this dataset, we first examine the effect of the COVID-19 pandemic on several product categories. Recall that the first wave of COVID-19 in North America in March 2020 had a strong impact on grocery store sales. For example, Statistics Canada reports that the average increase in grocery sales relative to the previous year was approximately 45% among Canadian grocery stores (see Fig. 1). In this context, in Q1 of 2020, there was an intense panic-buying wave in several countries around the world. Customers rushed to grocery stores to purchase large quantities of essential products, such as face masks, hand sanitizers, canned food, and toilet paper.

We use our data to plot the total number of sales per product category on a weekly basis. We then identify three product categories that were significantly affected by the COVID-19 pandemic: toilet paper (398 products), canned soup (255 products), and household cleaners (554 products).2 We plot the (normalized) weekly transaction volume for toilet paper in Fig. 2. Similar plots for canned soup and household cleaners can be found in Appendix A.1.

For all three product categories, we observe a striking spike in the volume of sales around mid-March (i.e., weeks 10 to 12 on the x-axis). Notably, the number of weekly transactions in week 11 doubles—or even triples—relative to weeks 1–8 for all three categories. This increase is significantly higher than the reported average increase in grocery sales due to COVID-19. This led us to select these three product categories as our focal categories in this paper. In total, we have 15,005,425 in-store transactions related to the sales of these 3 product categories in the 42 stores.

In addition to point-of-sale data, we also independently collect data from external sources related to the COVID-19 pandemic. Our objective is to utilize these data sources to augment the prediction accuracy of our models. Specifically, we collect data from the following three external sources3:

2 In addition to the three main categories reported in the paper, four other categories, namely Baking Ingredients, Water, Cheese, and Soap, were initially chosen by the retailer for this project. However, after preliminary analyses, the numbers of detected anomalies in these categories were smaller (less than 0.78%). As a result, the retailer decided to exclude these four categories from the study and focus on the three categories we presented.

3 There are of course additional sources of external data available. However, it is important to note that in the retail industry, it is often
Fig. 2  Weekly toilet paper sales from January to May 2020 across 42 stores (the first week on the x-axis corresponds to the week of January 5). The y-axis is normalized for anonymity

1. First, we collect the data from Google Trends searches. In particular, we record the number of searches for the term "COVID-19." In addition, we record the number of searches that are related to the product categories under consideration (i.e., we collect the daily search trend for the terms "toilet paper," "household cleaners," and "canned soup"). We limit the collection scope to searches within the metropolitan area of the city in our dataset between January 1 and May 1, 2020.
2. Second, we collect data from the social networking platform Twitter. Specifically, we collect tweets that contain one of the common COVID-19 hashtags defined by Lamsal (2020), as well as tweets that are related to the product categories under our consideration. In other words, we collect tweets that contain one of the common COVID-19 hashtags, mention toilet paper, household cleaners, or canned soup, or both. Similar to our Google Trends data collection, we limit the scope to tweets within the metropolitan area of the city in our dataset between January 1 and May 1, 2020. Based on this collected data, we calculate the volume of the tweets as well as their sentiment.
3. Third, we collect news article data from three local news outlets that operate in the same metropolitan area as the city in our dataset. These three news outlets are the top three local news outlets based on their Google search ranks. We then collect all the news articles that mention the term "COVID-19," one of the product categories under consideration, or both between January 1 and May 1, 2020. Based on this dataset, we calculate the volume of the news articles and their sentiment.

We collect these external data using an automatic script that runs on a daily basis. The sentiments of the tweets and news articles are analyzed using the VADER (Valence Aware Dictionary and sEntiment Reasoner) package in Python (Gilbert and Hutto 2014), which is widely used in the literature to analyze the sentiment of textual content in media and social media (e.g., Ilk et al. 2020). The package analyzes the input content and produces four output sentiment scores: positive, neutral, negative, and compound. The first three scores range between 0 and 1 and represent the proportions of positive, neutral, and negative content, where the sum of the three scores is 1. Meanwhile, the compound score, which is the score that we use in our study, normalizes the sentiment of the text into a single score ranging from -1 (extremely negative) to +1 (extremely positive). We have a total of 337 tweets and 484 news articles that fit our selection criteria.4

To illustrate the trend of the external data we collected, we plot the volume of the search terms on Google Trends that are related to toilet paper in Fig. 3. Interestingly, we observe that the peak of toilet paper sales driven by the COVID-19 pandemic on March 13, 2020 (see Fig. 2) coincides with the peak in Fig. 3. In addition, Fig. 4 plots the volume of tweets and news articles that contain the terms "COVID-19" and "toilet paper." Once again, a similar trend can be observed. Overall, these two figures provide strong evidence that external data can be useful in the context of the predictive tasks studied in this paper.

Footnote 3 (continued): illegal to collect granular data from competitors and/or manufacturers and use these data in either predictive or prescriptive models. Meanwhile, it is possible to obtain aggregate-level data (e.g., at the geography, quarter, or industry levels) by purchasing such datasets from third-party providers. Nevertheless, our retail partner felt that the cost of acquiring, cleaning, and ingesting these data was not a priority given that such external datasets may not be granular enough to be meaningfully used in our models.

4 Several variables in our collected external data are naturally correlated. As shown in Appendix B, however, none of the correlation coefficients are extreme. Since we are focusing on a predictive task, we measure the usefulness of these variables by comparing our model performance before and after including these variables in the model, as shown in Tables 5 and 6.
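For reference, VADER derives its compound score by summing the lexicon valences of the words in a text and squashing that sum into (-1, 1). Below is a stdlib-only sketch of that normalization step, with the package's default damping constant; in practice the score is obtained directly via `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from the `vaderSentiment` package, and the example input values are illustrative:

```python
import math

def vader_normalize(valence_sum: float, alpha: float = 15.0) -> float:
    """VADER-style normalization: map the summed word-level valence
    of a text into (-1, 1), which is how the package derives its
    'compound' score (alpha dampens the effect of text length)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)

# A strongly negative summed valence maps close to -1, a neutral
# one to 0, and a strongly positive one close to +1.
for v in (-8.0, 0.0, 8.0):
    print(round(vader_normalize(v), 3))
```

A score near the extremes thus requires many strongly charged words, which is why short neutral headlines cluster near zero.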
AI & SOCIETY

Fig. 3  Google Trends data for toilet paper search terms

Fig. 4  Daily volume data for


tweets and news articles related
to the toilet paper category

canned soup and household cleaners are available in Appendix A.2.

4 AI model for demand anomaly detection

The objective of our demand anomaly detection model is twofold. First, we aim to develop a flexible framework that is applicable to the COVID-19 pandemic context as well as to other scenarios where demand anomalies may occur (e.g., natural catastrophes such as hurricanes and severe storms). Second, given the emergency and criticality of the impact that the COVID-19 pandemic poses for the distribution of essential products, we also ensure that our model can be tailored to the specific business requirements of the COVID-19 context. We resolve this dilemma by developing a two-stage model with a flexible framework. The first stage (labeled anomaly detection) applies an unsupervised anomaly detection model to identify anomalies in sales transactions. The second stage (labeled anomaly labeling or pertinent classification) is tailored to the COVID-19 situation by taking managerial requirements into account to classify the detected anomalies from the first stage into pertinent anomalies from a business perspective. As discussed, this stage also incorporates external data sources to improve the classification performance. The second stage is modular in the sense that it can be adapted to various business contexts, depending on the desired definition of an anomaly. We summarize the design and function of our two-stage model in Fig. 5. The first stage relies on unsupervised machine learning (ML) methods since data labels are not available in our training dataset, whereas the second stage is based on supervised methods. The overall ML pipeline architecture is presented in Appendix C, whereas the specifics of the ML methodologies we used are relegated to Appendix D.

4.1 First-stage model: anomaly detection

Our first-stage model aims to identify sales observations that are anomalous (e.g., large sharp increases). We construct our dataset using a sliding window mechanism (Sejnowski and Rosenberg 1987) to create a time-series dataset. Specifically, we use a window of 3 h that comprises the aggregated observations of 36 data points, each with a 5-min interval. Figure 6 illustrates the construction process of our data. The detailed step-by-step feature preparation process is available in Appendix E.

Each observation represents the count of articles sold in the focal category in a given store during the corresponding

AI & SOCIETY

Fig. 5  Design of our two-stage model

Fig. 6  Structure of the data (for illustrative purposes, we assume the data starts at 9 AM)
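The sliding-window construction illustrated in Fig. 6 (36 five-minute counts per 3-h window) can be sketched as follows. This is a minimal sketch: the function name and the toy Poisson data are our illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def make_windows(counts, window=36):
    """Slide a fixed-length window over a series of 5-min sales counts.

    Each row of the result is one observation: `window` consecutive
    5-min counts (36 points = 3 h), advancing one step at a time.
    """
    counts = np.asarray(counts, dtype=float)
    n = len(counts) - window + 1
    return np.stack([counts[i:i + window] for i in range(n)])

# Toy example: 5 hours of 5-min counts (60 points in total).
rng = np.random.default_rng(0)
series = rng.poisson(lam=5, size=60)
X = make_windows(series)  # shape: (60 - 36 + 1, 36) = (25, 36)
```

Each successive row shares 35 of its 36 points with the previous one, which is what allows the detector to score every 5-min interval while still seeing a full 3-h context.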

interval. We aggregate the products at the category level to ensure sufficient data variation. However, we highlight that the model performance is independent of the aggregation level (i.e., our model can be applied at the product or sub-category level as long as the underlying data represents a large enough sample with sufficient variation). Since the dataset does not have a label (i.e., there is no label that indicates whether an observation is anomalous), we adopt ML methods designed to detect anomalous events in an unsupervised fashion.

To ensure that our results do not overly depend on a single approach, we consider several anomaly detection methods for the first stage. We ultimately select two methods that detect anomalies using different approaches. The first method is a time-series-based anomaly detection method called Donut (Xu et al. 2018b). It uses state-of-the-art deep learning variational autoencoders to suppress time-series components (trend, seasonality, and noise) and detect anomalies in an unsupervised manner. The second method is a tree-based model-free anomaly detection algorithm called the extended isolation forest (EIF) (Hariri et al. 2021). The core operation of the EIF is to identify observations that differ from the rest of the data. For more details on these two approaches, we refer the reader to the aforementioned papers.5

We train the Donut model using the sales data from 2018 and 2019. Meanwhile, the EIF model is trained on the 2019 data.6 We then apply both models to the sales data in the first 4 months of 2020 (recall that panic-buying behavior in North America occurred mainly in March and April 2020). The output of both models is an anomaly score for each observation. Observations with an anomaly score higher than a certain threshold in at least one model (since Donut and EIF are complementary approaches) are considered anomalies, which become inputs for the second-stage model. We treat the threshold in both models as hyperparameters (i.e., parameters whose values are chosen before training the algorithm and used to control the learning process) that are optimally selected via a cross-validation procedure to maximize the performance of our second-stage model.

5 We also considered several alternative approaches, including a statistical-based method and Prophet (Taylor and Letham 2018). Both these approaches yielded lower performance relative to our two primary methods.
6 We use a different training set for the EIF model for the sake of efficiency. However, the results remain qualitatively the same when we use the same training set for both models.
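The union rule combining the two first-stage detectors can be sketched as follows. Donut and EIF require dedicated implementations, so the two scoring functions below are simple stand-ins (a seasonal-residual score in the spirit of a time-series detector, and a robust distance score in the spirit of a model-free detector); only the "flag if either score exceeds its threshold" logic mirrors the text, and the fixed thresholds stand in for values that would be tuned by cross-validation.

```python
import numpy as np

def residual_scores(x, period=12):
    """Time-series-style score (stand-in for Donut): absolute deviation
    from the seasonal median profile, in robust z-score units."""
    x = np.asarray(x, dtype=float)
    seasonal = np.array([np.median(x[i::period]) for i in range(period)])
    resid = x - seasonal[np.arange(len(x)) % period]
    mad = np.median(np.abs(resid - np.median(resid))) + 1e-9
    return np.abs(resid) / mad

def distance_scores(x):
    """Model-free score (stand-in for EIF): distance of each point from
    the bulk of the data, in robust z-score units."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med)) + 1e-9
    return np.abs(x - med) / mad

def flag_anomalies(x, thr_a=6.0, thr_b=6.0):
    """Flag points whose score exceeds the threshold in AT LEAST one
    detector -- the union rule applied to the two first-stage models."""
    return (residual_scores(x) > thr_a) | (distance_scores(x) > thr_b)

# Toy periodic series with one sudden demand spike.
x = np.tile([5.0, 6, 7, 8, 7, 6, 5, 6, 7, 8, 7, 6], 10)
x[70] = 80.0  # panic-buying-style spike
flags = flag_anomalies(x)
```

The union (rather than intersection) of the two flag sets reflects the complementarity of the approaches: a point missed by one detector can still be surfaced by the other.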


Fig. 7  Anomalies detected in the toilet paper category in one store

4.2 Second-stage model: pertinent anomaly classification

The output of the first stage consists of a list of anomalies detected from a purely statistical perspective. However, even though these anomalies are significantly different from “normal” patterns, not all of them lead to adverse consequences from a business perspective. In our second stage, we label these anomalies as pertinent and non-pertinent based on managerial expertise. Subsequently, we use a supervised learning model trained on input features associated with the labeled anomalies to detect pertinent future anomalies.

In collaboration with our retail partner, we defined three types of consequences associated with pertinent anomalies in the context of panic buying: (i) anomalies that trigger a stockout of at least one product in the category within the next 3 days, (ii) anomalies that lead to subsequent anomalies in the same category and store (i.e., at least 10% of observations in the next 3 h in the focal store are identified as anomalies), and (iii) anomalies that lead to a spread of anomalies of the same category in other stores (i.e., at least ten other stores experience more than one anomaly in the following 3 h). We also vary the time length of the different definitions to provide an additional layer of flexibility. An anomaly is formally defined as an observation with a rare pattern that deviates significantly from the majority of the data. In the context of panic buying, an anomaly translates into a very high, sudden spike in demand. As discussed, we further classify the anomalies into three pertinence types, depending on their consequences: triggering a stockout, leading to subsequent anomalies in the same store, and leading to subsequent anomalies in other stores. Finally, these business requirements used to define the pertinence of anomalies are fully flexible so that practitioners or researchers who adopt our model can rely on our framework and label pertinent anomalies differently based on their specific requirements.

Similar to the first-stage process, we consider several classification models. We finally opt for the gradient-boosting decision tree (GBDT) (Friedman 2001) as our main method for this stage based on its superior performance. Specifically, we adopt an efficient implementation framework of the GBDT called LightGBM (Ke et al. 2017). Four groups of input features associated with basket statistics, prices and promotions, time components, and anomalies are considered in this model, with a total of twenty-nine input variables. Since anomalies can be detected concurrently across multiple stores, the second-stage model is trained using input features obtained from all stores in the same region. To this end, we applied a z-score normalization to the sales data (i.e., quantity and number of transactions) of each store prior to computing the input features.7 The complete list of predictors used in our second-stage model is reported in Table 15 in Appendix F.

We then split the input data into a 70% training set and a 30% test set. The GBDT model is trained using the training set, and its performance is evaluated on the test set. All variables related to external data are lagged by one time period to address a potential reverse causality issue. Ultimately, this model predicts for each product category and each 3-h window whether an anomaly is pertinent (and provides a likelihood score).

4.3 Anomaly detection results

4.3.1 Detected anomalies

We first present a visualization of the anomalies detected by our first-stage model in Fig. 7. Each dot represents the (normalized) aggregate sales over the category during a 3-h interval. The colors of the dots are only for the sake of visualization and are not used in any of our analyses: gray corresponds to no anomaly, yellow to a low score, orange to a medium score, and red to a high score. For conciseness,

7 A z-score normalization refers to normalizing every value in a dataset such that the mean of all of the values is 0 and the standard deviation is 1.


Fig. 8  Anomalies in the toilet paper category classified as pertinent (colored) and non-pertinent (black)
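The three pertinence definitions from Sect. 4.2 can be expressed directly as labeling rules. The helper names and toy timestamps below are illustrative assumptions, not the retailer's code; each function checks one of the three business consequences against a stream of timestamped anomalies.

```python
from datetime import datetime, timedelta

def is_triggering(t, stockout_times, horizon=timedelta(days=3)):
    """Rule (i): a stockout of at least one product in the category
    occurs within `horizon` after the anomaly at time t."""
    return any(t < s <= t + horizon for s in stockout_times)

def is_spreading_same_store(t, obs, window=timedelta(hours=3), share=0.10):
    """Rule (ii): at least `share` of the observations in the next
    `window` in the focal store are themselves anomalies.
    `obs` is a list of (timestamp, is_anomaly) pairs for that store."""
    future = [a for (ts, a) in obs if t < ts <= t + window]
    return bool(future) and sum(future) / len(future) >= share

def is_spreading_other_stores(t, anomalies_by_store,
                              window=timedelta(hours=3), min_stores=10):
    """Rule (iii): at least `min_stores` other stores see more than one
    anomaly in the following `window`."""
    n = 0
    for times in anomalies_by_store.values():
        if sum(1 for ts in times if t < ts <= t + window) > 1:
            n += 1
    return n >= min_stores

# Toy check of rule (i): anomaly at noon, stockout the next morning.
t0 = datetime(2020, 3, 12, 12, 0)
trig = is_triggering(t0, [datetime(2020, 3, 13, 9, 0)])
```

The horizon, share, and store-count parameters correspond to the flexible knobs mentioned in the text, so practitioners can tighten or relax the definition of pertinence without changing the pipeline.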

Table 1  Total number of detected and pertinent anomalies

                      Detected anomalies                     Pertinent anomalies
                      Donut    EIF     Total detected       Donut    EIF     Total pertinent
Toilet paper          7,790    2,719   8,250                677      1,756   2,200
Canned soup           10,895   2,021   11,405               1,291    372     1,365
Household cleaners    6,267    1,349   6,350                356      412     701

Table 2  Total percentage and number of pertinent anomalies

                      Percentage of pertinent anomalies
                      Triggering anomalies   Spreading anomalies   Stockout anomalies
Toilet paper          73% (1,606)            56% (1,232)           90% (1,980)
Canned soup           73% (997)              60% (822)             15% (206)
Household cleaners    82% (574)              0% (0)                42% (294)

we focus on the toilet paper category from the largest store in our dataset (the same plots for the two other product categories can be found in Appendix A.3). Recall that each observation in our data is an aggregated sales volume of all the products in a category during a 3-h time window. The x-axis represents the time, and the y-axis represents the (normalized) volume of sales. Intuitively, observations with sales higher than a certain threshold are identified as anomalies, and the higher the sales, the higher the anomaly score. To allow easy visualization, we assign four colors to the anomalies, depending on their severity.

Our second-stage model classifies the anomalies detected in the first stage as pertinent or non-pertinent. In Fig. 8, we plot the non-pertinent or false alarms (in black) and the pertinent alarms (in other colors) for the toilet paper category. The non-pertinent observations are detected as anomalies from a statistical perspective but did not yield business consequences as per our definitions. As we can see, our second-stage model can successfully classify anomalies as pertinent, allowing us to remove the non-pertinent anomalies, thus making the detection process more suitable from a managerial perspective. As mentioned before, our method offers the flexibility to define the concept of pertinence, depending on the context under consideration.

To summarize, Table 1 reports the total number of anomalies detected by each algorithm (Donut and EIF) in the first stage, as well as the total (i.e., the union of both sets). It also reports how many of these anomalies are eventually classified as pertinent in the second stage. Interestingly, Donut detects a significantly larger number of anomalies than EIF. At the same time, EIF is more effective than Donut since the proportion of pertinent anomalies in the second stage is much higher. This detail illustrates the complementarity of the two approaches. Note that the total number of pertinent anomalies reported in the last column is not the sum of the total number of pertinent anomalies reported based on Donut and EIF in the two preceding columns because the same pertinent anomaly could be detected by both Donut and EIF in the first stage.

In addition, recall that anomalies detected in the first stage are considered pertinent if they fit any of the three definitions of pertinence presented in Sect. 4.2. Table 2 reports the percentage and number of anomalies that are labeled as pertinent according to each of our business definitions in


Table 3  Summary of model performance

                      Triggering anomalies          Spreading anomalies           Stockout anomalies
                      Precision  Recall  F1 score   Precision  Recall  F1 score   Precision  Recall  F1 score
Toilet paper          81%        81%     81%        93%        93%     93%        77%        74%     75%
Canned soup           83%        53%     68%        89%        80%     84%        71%        52%     60%
Household cleaners    86%        68%     76%        N/A (no pertinent anomalies)  70%        58%     63%
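The metrics reported in Table 3 can be computed from classification counts as follows; this is a minimal sketch with hypothetical counts, shown mainly to make the harmonic-mean definition of the F1 score concrete.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 (the harmonic mean of the first two)
    from true-positive, false-positive, and false-negative counts.
    Accuracy is deliberately omitted: with anomalies occurring in less
    than 1% of observations, a model that never flags anything would
    still score above 99% accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 93 correctly flagged, 7 false alarms, 7 missed.
p, r, f1 = precision_recall_f1(tp=93, fp=7, fn=7)
```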

the second stage. Note that household cleaners exhibit no anomalies that signal a spread to other stores but include several anomalies that trigger subsequent anomalies in the same store, as well as anomalies that lead to future stockouts.

As is common in the ML literature (e.g., see Larose 2015), we use three complementary metrics: precision (i.e., the percentage of results that are relevant), recall (i.e., the percentage of total relevant results correctly classified by the algorithm), and F1 score (i.e., the measure of a model's accuracy defined as the harmonic mean of precision and recall) to measure the performance of the classification algorithms. Note that the accuracy measure would not be suitable in this case given the skewness of the problem (anomalies are very rare events with less than 1% occurrence over historical data) since all the models would achieve more than 99% accuracy based on this measure. We report the overall performance of our second-stage model in Table 3. This table includes the precision, recall, and F1 score of the anomaly detection outcomes for the three product categories (toilet paper, canned soup, and household cleaners) with respect to the different definitions of pertinence. Overall, our model performs well for toilet paper across all three types of pertinent anomalies. It is worth noting that we do not report the classification of spreading anomalies for household cleaners since there is no pertinent anomaly based on this definition for this product category. Finally, we report the days in which we identify at least one pertinent anomaly in Table 4. As expected, the pertinent anomalies are mainly concentrated in March 2020. This finding is a strong validation that our approach successfully detects anomalies for the right period, whereas virtually no anomalies are detected during periods where we do not expect to see them (since no panic buying was observed). These results support the validity and correctness of our approach and provide us with a strong sanity check.

Table 4  Number of days in which pertinent anomalies are detected in 2020

                      January   February   March   April
Toilet paper          0         0          6       1
Canned soup           0         0          22      1
Household cleaners    0         0          25      7

4.3.2 The role of external data

We next investigate the impact of incorporating external data sources (Google Trends, Twitter, and news data) into our second-stage model. In other words, we investigate the improvement in classification performance for models that incorporate these external data sources relative to a baseline model that uses only internal data. Here, we report our model performance in terms of classifying pertinent anomalies for different combinations: using only internal data, using internal data with each type of external data, and using internal data with all types of external data. For brevity, we label an observation as a pertinent anomaly if it satisfies any of our three definitions. Consistent with the prior literature, we find that adding external data helps enhance the performance of our classification model. The results are reported in Tables 5 and 6. Observe that the F1 score for the model that incorporates all external data sources in the second-stage classification is consistently above 90%.

To clearly establish the influence of each external data source in the second-stage classification task, we compute the Shapley value of each predictor used in the classification model. The notion of the Shapley value was originally developed as a solution concept in the cooperative game theory literature. The value represents the average expected

Table 5  Model performance with respect to external data (part 1)

                      Internal only                 Internal + Google Trends      Internal + News articles
                      Precision  Recall  F1 score   Precision  Recall  F1 score   Precision  Recall  F1 score
Toilet paper          91%        85%     88%        95%        95%     95%        95%        95%     95%
Canned soup           65%        55%     59%        73%        55%     63%        85%        55%     67%
Household cleaners    84%        65%     73%        89%        69%     78%        84%        65%     73%


Table 6  Model performance with respect to external data (part 2)

                      Internal + Twitter            Internal + All
                      Precision  Recall  F1 score   Precision  Recall  F1 score
Toilet paper          95%        95%     95%        97%        95%     96%
Canned soup           85%        55%     67%        94%        91%     93%
Household cleaners    83%        69%     75%        96%        89%     92%
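Two preprocessing details from Sect. 4.2 — per-store z-score normalization (so stores of different sizes can be pooled regionally) and lagging external variables by one period (to mitigate reverse causality) — can be sketched as follows. The numbers are toy values, not the retailer's data.

```python
import numpy as np

def zscore_per_store(sales):
    """Normalize one store's sales series to mean 0 and standard
    deviation 1 before computing regional input features."""
    sales = np.asarray(sales, dtype=float)
    return (sales - sales.mean()) / sales.std()

def lag_one_period(external):
    """Shift an external signal (e.g., a search-interest index) back by
    one period so only past values predict the current 3-h window."""
    external = np.asarray(external, dtype=float)
    lagged = np.empty_like(external)
    lagged[0] = np.nan  # no earlier value is available for the first period
    lagged[1:] = external[:-1]
    return lagged

z = zscore_per_store([120, 150, 130, 600])  # one store's quantities sold
g = lag_one_period([10, 40, 90, 30])        # hypothetical external index
```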

Fig. 9  Feature importance (Shapley values) of top ten predictors in the second-stage classification for the toilet paper category
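For intuition, exact Shapley values can be computed by brute-force enumeration for a handful of features; libraries such as SHAP scale the same idea to real models. The subset-value function below is a made-up toy standing in for a classifier's output over feature subsets, and the feature names are illustrative only.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions: each player's
    weighted average marginal contribution to `value(S)`. Feasible only
    for a small number of players/features."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):
            for S in combinations(others, r):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (value(set(S) | {p}) - value(set(S)))
        phi[p] = total
    return phi

def v(S):
    """Toy subset value: a base score, additive feature effects, and an
    interaction between two hypothetical external features."""
    out = 0.1
    out += 0.3 * ('trends' in S) + 0.2 * ('news' in S) + 0.1 * ('hour' in S)
    out += 0.2 * ('trends' in S and 'news' in S)
    return out

phi = shapley_values(['trends', 'news', 'hour'], v)
```

A useful sanity property (efficiency) is that the attributions sum exactly to the difference between the full model's output and the empty baseline, which is what makes mean absolute Shapley values a defensible global importance measure.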

marginal contribution of each player in the game to the payoff function (Shapley 1953). In recent years, it has been widely used to explain the influence of the different predictors in ML models (e.g., Husain et al. 2016; Ma and Tourani 2020; Pamuru et al. 2022). The choice of this model-interpretability metric (rather than using the pre-built function in LightGBM for variable importance) is due to the requirement of the retail partner to make the interpretability function model agnostic in case they change the predictive model in the ML pipeline in the future. In the context of this paper, the Shapley value captures the average marginal contribution of each predictor to the eventual outcome (i.e., whether the anomaly is pertinent). To this end, we use the SHAP library (Lundberg and Lee 2017), which relies on scalable additive feature attribution methods, to compute the Shapley values. The resulting mean absolute Shapley values can then be used as a measure of global feature importance based on the magnitude of feature attributions.

Figure 9 shows the feature importance derived from the Shapley values of the top ten predictors (in terms of influence on the classification task) for the toilet paper category (a higher value indicates a higher influence on the classification). As we can see, five out of the top ten features come from external data sources. This finding explains the superior performance of the model that uses external data sources relative to the model that uses only internal data, as reported in Tables 5 and 6. In addition, temporal features (hour of the day and day of the week) also seem to play an important role. The feature importance plots based on the Shapley values of the top ten predictors for the canned soup and household cleaners categories are available in Appendix A.4.

4.3.3 Alternative specifications

We next consider several alternatives to the specification of our two-stage model and report several comparison results.

4.3.3.1 Varying the definition of triggering anomalies  Recall that we defined an anomaly as pertinent if it triggers stockouts within the next 3 days. In Fig. 10, we vary the length of this definition. Particularly, we consider 1 day and 7 days as alternative values. As expected, when the length increases, the percentage of detected pertinent anomalies also increases, in a concave fashion. The fact that our model offers the flexibility to work under various lengths allows us to adapt the definition of anomaly depending on the business requirements.

4.3.3.2 Using single-stage anomaly detection models  Recall that our anomaly detection model relies on two stages: detecting anomalies (first stage) and classifying them as pertinent (second stage). The first stage relies on the time series of the sales, whereas the second stage uses several


Fig. 10  Analysis of pertinent anomalies related to stockouts in the same store for different time lengths: 1 day, 3 days, and 7 days

Table 7  Prediction comparison of different benchmarks for 42 stores (each cell reports SADPA/SAD)

                                                        Jan 2020   Feb 2020   Mar 2020    Apr 2020
Toilet paper         First-stage vanilla model          0/619      0/495      33/419      4/419
                     First-stage EIF multivariate       0/88       0/15       112/144     16/75
                     First-stage Donut multivariate     0/35       0/34       155/439     26/84
                     Our two-stage model                0/0        0/0        252/271     42/46
Canned soup          First-stage vanilla model          0/420      0/138      448/873     29/331
                     First-stage EIF multivariate       0/214      0/312      240/1040    21/202
                     First-stage Donut multivariate     0/38       0/210      733/880     38/389
                     Our two-stage model                0/0        0/55       918/1073    38/42
Household cleaners   First-stage vanilla model          0/168      0/520      1063/1211   280/822
                     First-stage EIF multivariate       0/14       0/670      870/870     268/675
                     First-stage Donut multivariate     0/20       0/401      912/984     294/560
                     Our two-stage model                0/0        0/0        1004/1004   294/360
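The SADPA/SAD entries in Table 7 are precisions at the store-day level; a minimal sketch using two of the March 2020 toilet-paper cells:

```python
def store_day_precision(sadpa, sad):
    """Precision at the store-day level: the share of flagged store-days
    (SAD) that contained an actual pertinent anomaly (SADPA). Returns
    NaN when nothing was flagged, matching the 0/0 cells."""
    return sadpa / sad if sad else float('nan')

# March 2020, toilet paper (values taken from Table 7):
two_stage = store_day_precision(252, 271)  # high precision
vanilla = store_day_precision(33, 419)     # mostly false alarms
```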

features (e.g., promotions and external data) to perform the classification task. An alternative approach would be to consider a single-stage anomaly detection model that utilizes all the features at once. We next compare the performance of our two-stage model to several single-stage anomaly detection methods (vanilla model, EIF multivariate, and Donut multivariate). The vanilla model refers to a simple statistical method whereby anomalies are defined as observations with a value higher (or lower) than two standard deviations from the mean of the sales in the training dataset. The EIF and Donut multivariate models are extensions of the (unsupervised learning) anomaly detection models considered in Sect. 4.1, where all the features are included in the anomaly detection phase (i.e., in the first stage).

The results for all three product categories are reported in Table 7. Since our retail partner is particularly interested in how effective the methods are in detecting pertinent anomalies in each specific store-day pair, we evaluate the methods using two key measures that are reported in each cell of Table 7. The second number (on the right) corresponds to the number of store anomalous days (SAD) that are detected by each method. The first number (on the left) corresponds to the number of store anomalous days with actual pertinent anomalies (SADPA) among the SAD. The ratio SADPA/SAD is then equivalent to precision at the store-day level. This number is important because the retailer needs to review and react based on the detected anomalies. Thus, incorrectly detected anomalies (i.e., false positives) can significantly affect the trust of the users of this tool. As we can observe in Table 7, our two-stage model can effectively rule out the majority of anomalies detected by the first stage that are likely to be non-pertinent. Table 7 showcases three important benefits of our approach: (i) our model detects a much larger number of actual pertinent anomalies relative to the three single-stage benchmarks; (ii) our model has a much lower number of false alarms (e.g., during the peak period in March and April 2020, the false positive rate of the two-stage model over the three categories is approximately


8.8% versus 54.4% when using the vanilla model); and (iii) our model detects anomalies when it should (March and April) but not when none exist (January and February). These results clearly convey the need to consider a two-stage approach in which the various features are leveraged for classification purposes in the second stage, especially in the empirical context of this study. More importantly, without the second stage, the retailer would need to deal with an enormous number of detected anomalies that often turn out to be false alarms. Meanwhile, although using traditional single-stage models may be appealing since they are easier to employ and maintain, such an approach can lead to detecting a large number of non-pertinent anomalies that can be costly to the company's operations. At the same time, these models tend to fail to detect many pertinent anomalies.

So far, our focus has been on the predictive side by developing a model that can successfully detect retail demand anomalies according to different definitions of pertinence. The next question is how retailers can leverage the insights generated by our model to improve their operational decisions. We investigate this question by developing a prescriptive simulation tool in the next section.

5 Prescriptive retail operations amid pandemics

Once pertinent anomalies are detected, the retailer needs to react quickly and strategically. In particular, the demand planner can analyze the impact of panic buying on the store inventory to prescribe the necessary actions. More specifically, our model will send a signal indicating potentially pertinent anomalies that may affect the operations of the retailer; then the retailer can strategically react by deploying two potential countermeasures. First, the retailer may impose a quantity limit per customer for a group of similar products, typically at the category or sub-category level. This type of rationing policy was widely adopted by several retailers in the first quarter of 2020 and in 2021.8 Ultimately, such a policy aims to ensure that a higher number of customers can purchase essential products that are currently in high demand. The simulation model presented in this section can be used to decide when to activate such a rationing policy, for which categories of products, and which limit value to set. To our knowledge, no such data-driven tool has yet been developed. Instead, retailers have heretofore decided when and how to use rationing policies based on intuition and emotional reactions. Consequently, there is no clear notion of current practice against which to benchmark our results since there has been no systematic approach applied to solving this problem until now with our proposed approach. The second measure is the arrangement of an urgent shipment to fulfill additional inventory. If necessary, the demand planner can make a special order request if it is foreseen that certain products will be out-of-stock prior to the next scheduled delivery. Such an intervention, however, is costly and mainly used in exceptional situations when a substantial stockout is anticipated. In this section, we focus on the first measure (rationing policy), but one can easily combine our simulation tool with the second measure (urgent shipment to fulfill additional inventory), as discussed in Sect. 5.3. We next elaborate on our prescriptive simulation tool and apply it to a simulated future panic-buying event to showcase its practical impact.

5.1 Prescriptive tool for rationing policies

We conduct a scenario analysis at the product-group level in which a limit c is imposed on the total number of products purchased per customer. For example, the store manager may decide to allow each customer to purchase no more than c = 2 packs of toilet paper of any brand. The product groups on which such a limit can be imposed are predetermined by the retailer and typically represent a set of products that are relatively homogeneous and substitutable for each other. Our prescriptive tool can be used to decide (i) when to activate such a rationing policy, (ii) for which group of products, and (iii) the best limit value (i.e., the value of c). Taken together, we develop a simulation tool that allows retailers to proactively test “what-if” scenarios in response to the detected demand anomalies due to panic buying.

The total demand for each group of products g is estimated by a decomposition model comprising two modeling features: (i) the estimated arrival rate of customers for product group g, and (ii) the distribution of the number of units (basket distribution) from this product group purchased by a customer. The first input can be estimated from the observed sales during the periods with pertinent anomalies, which are given by the classification model from Sect. 4.2, adjusted by intra-day and intra-week seasonality factors (we omit the details for conciseness). The second input is the empirical distribution of the number of purchased units per customer in group g obtained from the sales transaction data during the same period. The expected total demand for products in group g under limit c during the time interval (t, t'), where t' > t is any subsequent time period, can then be calculated as follows:

E_P[\tilde{D}_{g,c}(t, t')] = \sum_{s=t}^{t+L} \lambda_g(s) \sum_{k=1}^{\infty} \min\{k, c\} P(\tilde{k}_g = k).

Here, \lambda_g(s) is the estimated arrival rate of customers for products in group g at time s, \tilde{k}_g is a random variable representing the number of units of products in group g purchased

8 https://www.cnn.com/2020/03/06/business/coronavirus-global-panic-buying-toilet-paper/index.html


will be greatly shifted. In our simulation, we consider


( )
by a customer, and ̃
kg = k is the probability mass function
the following basket distribution: (0.1, 0.1, 0.15, 0.15,
of k . The demand planner can then use this simulation analy- 0.2, 0.3), where the i -th number corresponds to the pro-
sis tool to estimate the total group demand over time based portion of customers who purchase i units ( i = 1,…, 6).
on different limits to select the appropriate value of c for In this distribution, 30% of the customers will buy six
each product group and time period. products.
We next showcase the impact of our simulation tool 3. Duration of the panic-buying wave: Another important
applied to the panic-buying event that occurred in March parameter is the duration of the panic-buying wave. In
2020. Specifically, we use the historical data from the toi- our simulation, we consider a 7-day event and a 14-day
let paper category prior to March 12, 2020 to calibrate our event.
model and then simulate the results for the subsequent week.
Our findings suggest that the substantial stockout in the toi- In addition, we need to consider the levers available to the
let paper category for March 12–19, 2020 (i.e., the crux of retailer. First, the retailer needs to decide on the value of the
the panic-buying period in North America) could have been rationing limit c . Motivated by practical considerations, we
avoided by imposing a limit of one item per customer start- consider c = 1, 2, 3, as well as the scenario with no ration-
ing from March 12. In addition, imposing a limit of one item ing limit. A second important parameter is the amount of
per customer in the toilet paper category for March 12–19 initial inventory at the beginning of the panic-buying wave.
in a representative store would have increased the average In our simulation, we consider two values (simple and dou-
number of unique customers per day from 675 to 1,058— ble). Specifically, we use the same value as in the March
corresponding to a 56.74% increase in access to essential 2020 panic-buying event in the toilet paper category. We
products.9 This and similar policies, if performed properly, also consider a scenario in which the amount of initial inven-
can democratize access to a wider population without incur- tory is twice as high.
ring any cost to the retailer. To determine the projected inventory level of the category
We note that the above simulation exercise is based on under different rationing limits, the (random) inventory posi-
historical data and may not accurately reflect the actual tion ̃Ii,c (t ) of product
( � )i at time t′ under quantity limit c can

impact. To address this shortcoming, we next present a sim- be computed as Ii,c t = I i (t) − D
̃ �
̃ i,c (t, t� ), where I � i (t) is the
ulation test based on a future panic-buying event.

5.2 Simulation of future panic-buying waves

We simulate several relevant scenarios to capture a typical wave of panic buying. An event of panic buying is characterized by the following three features:

1. Panic-buying wave strength: In a panic-buying event, the demand for a specific category of products (e.g., toilet paper) will substantially increase. We thus consider three strength levels: half, full, and double. We use data from the panic-buying wave triggered by the COVID-19 pandemic in March 2020 to calibrate the full strength. We then consider a milder version (half strength) and a stronger version (double strength).
2. Basket distribution: Using historical data, retailers can calculate the distribution of the number of units for each product group. This distribution can be time dependent (e.g., different for each day of the week and month of the year). In a panic-buying event, the basket distribution

known inventory position of product i at the current time t, and D̃_{i,c}(t, t′) denotes the total random demand for product i during the time interval (t, t′) under limit c. Practically, we can approximate the expected inventory level of product i under limit c at time t′ by

E_P[Ĩ_{i,c}(t′)] ≈ Ĩ_i(t) − E_P[D̃_{i,c}(t, t′)],

where E_P[D̃_{i,c}(t, t′)] denotes the estimated demand for product i during the time interval (t, t′) under limit c. The results of our simulations are reported in Tables 8 and 9 for durations of 7 and 14 days, respectively. As discussed, we consider three different levels of panic-buying wave strength and two levels of initial inventory. We first report the number of "inventory coverage days," that is, the number of days before reaching the situation where all the items in the category are out-of-stock. The higher this number, the better ("7+" means that the inventory lasts for at least a week, which is the duration of the panic-buying wave). We also report the number of baskets served under each scenario. As expected, imposing a stricter rationing limit (i.e., a smaller value of c) increases both the inventory coverage days and the number of baskets served. Overall, our tool allows retailers to test "what-if" scenarios and better understand the various trade-offs at play. For example, our tool allows the retailer to identify situations in which imposing a value of c = 2 is sufficient

9 Our simulation shows that when our model is used in conjunction with the rationing policy, we would be able to conduct 56.74% more transactions. Since the rationing policy limits the purchase of these in-demand products for each household, we can infer that our model would be able to distribute these products to 56.74% additional households without accessing consumer-level data.
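The inventory approximation and rationing limits above lend themselves to a small Monte Carlo routine. The sketch below is an illustration only, not the authors' implementation: the initial stock, basket arrival rate, and basket-size probabilities are made-up placeholders, and a basket is served as long as any stock remains.

```python
import random

def simulate_rationing(initial_inventory, baskets_per_day, basket_dist,
                       limit, horizon_days, n_runs=200, seed=0):
    """Estimate expected inventory coverage days and baskets served under a
    per-basket limit c. basket_dist maps basket size k -> P(k_g = k); a basket
    asking for k units is served min(k, c) units while stock remains."""
    rng = random.Random(seed)
    sizes, weights = zip(*basket_dist.items())
    cap = limit if limit is not None else max(sizes)  # None means no rationing
    total_days, total_served = 0.0, 0.0
    for _ in range(n_runs):
        stock, served, stockout_day = initial_inventory, 0, horizon_days
        for day in range(horizon_days):
            for _ in range(baskets_per_day):
                if stock <= 0:
                    break
                take = min(rng.choices(sizes, weights)[0], cap, stock)
                stock -= take
                served += 1
            if stock <= 0 and stockout_day == horizon_days:
                stockout_day = day + 1  # day on which the category ran out
        total_days += stockout_day      # horizon_days here plays the role of "7+"
        total_served += served
    return total_days / n_runs, total_served / n_runs

# Hypothetical inputs: 500 units on hand, 100 baskets/day, toy basket-size distribution.
dist = {1: 0.5, 2: 0.3, 3: 0.15, 4: 0.05}
days_c1, served_c1 = simulate_rationing(500, 100, dist, limit=1, horizon_days=7)
days_free, served_free = simulate_rationing(500, 100, dist, limit=None, horizon_days=7)
```

With these toy numbers, the 1-item limit stretches the 500 units over five full days and serves every arriving basket, whereas the no-limit run stocks out after roughly three days and serves far fewer baskets, mirroring the pattern in Tables 8 and 9.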
AI & SOCIETY

Table 8  Stochastic simulation results for a future panic-buying event of 7 days

All entries are expected values.

Panic-buying wave strength | Limit per customer | # of inventory coverage days (simple initial inventory) | # of baskets served (simple initial inventory) | # of inventory coverage days (double initial inventory) | # of baskets served (double initial inventory)
Half strength (× 0.5) | 1-item limit | 7+ | 1,766.70 (+ 47%) | 7+ | 1,769.44
Half strength (× 0.5) | 2-item limit | 7+ | 1,770.19 (+ 47%) | 7+ | 1,766.36
Half strength (× 0.5) | 3-item limit | 6.76 | 1,713.39 (+ 43%) | 7+ | 1,766.25
Half strength (× 0.5) | No limit | 4.03 | 1,200.88 | 7+ | 1,766.15
Full strength (× 1) | 1-item limit | 7+ | 3,527.49 (+ 194%) | 7+ | 3,530.65 (+ 47%)
Full strength (× 1) | 2-item limit | 4.61 | 2,622.66 (+ 118%) | 7+ | 3,533.30 (+ 47%)
Full strength (× 1) | 3-item limit | 2.80 | 1,845.60 (+ 54%) | 6.75 | 3,428.76 (+ 43%)
Full strength (× 1) | No limit | 1.63 | 1,200.97 | 4.02 | 2,401.36
Double strength (× 2) | 1-item limit | 4.26 | 4,983 (+ 315%) | 7+ | 7,035.47 (+ 193%)
Double strength (× 2) | 2-item limit | 1.81 | 2,622.72 (+ 118%) | 4.61 | 5,245.12 (+ 118%)
Double strength (× 2) | 3-item limit | 1.18 | 1,845.50 (+ 54%) | 2.80 | 3,691.06 (+ 54%)
Double strength (× 2) | No limit | 0.74 | 1,200.86 | 1.62 | 2,401.39

Table 9  Stochastic simulation results for a future panic-buying event of 14 days

All entries are expected values.

Panic-buying wave strength | Limit per customer | # of inventory coverage days (simple initial inventory) | # of baskets served (simple initial inventory) | # of inventory coverage days (double initial inventory) | # of baskets served (double initial inventory)
Half strength (× 0.5) | 1-item limit | 14+ | 2,722.41 (+ 127%) | 14+ | 2,718.36 (+ 14%)
Half strength (× 0.5) | 2-item limit | 12.67 | 2,528.60 (+ 111%) | 14+ | 2,715.85 (+ 14%)
Half strength (× 0.5) | 3-item limit | 7.78 | 1,845.47 (+ 54%) | 14+ | 2,720.54 (+ 14%)
Half strength (× 0.5) | No limit | 4.04 | 1,200.89 | 11.53 | 2,378.90
Full strength (× 1) | 1-item limit | 12.03 | 4,890.10 (+ 307%) | 14+ | 5,430.42 (+ 126%)
Full strength (× 1) | 2-item limit | 4.61 | 2,622.61 (+ 118%) | 12.65 | 5,063.12 (+ 111%)
Full strength (× 1) | 3-item limit | 2.78 | 1,845.59 (+ 54%) | 7.79 | 3,691.03 (+ 54%)
Full strength (× 1) | No limit | 1.63 | 1,200.98 | 4.04 | 2,401.72
Double strength (× 2) | 1-item limit | 4.25 | 4,983 (+ 315%) | 12.06 | 9,779.28 (+ 307%)
Double strength (× 2) | 2-item limit | 1.81 | 2,622.79 (+ 118%) | 4.60 | 5,245.27 (+ 118%)
Double strength (× 2) | 3-item limit | 1.18 | 1,845.60 (+ 54%) | 2.81 | 3,691.10 (+ 54%)
Double strength (× 2) | No limit | 0.74 | 1,200.67 | 1.63 | 2,401.51

and where imposing a lower value will have no additional benefit. Ultimately, the tool allows retailers to make informed decisions regarding rationing policies in a data-driven fashion.

In our simulation, we assume that the consumer demand would not change should the retailer implement a rationing policy. Indeed, in the context of this study, it is highly unlikely that consumer demand would be elastic to the retailer's rationing policy. In other words, since the products in our study are subject to panic buying, it is unlikely that the demand would be lower (or higher) as a result of the rationing policy implemented by the retailer. Consequently, we can safely assume that the demand values are independent of the model parameters. Our prescriptive simulation took advantage of this distinctive feature.

We simulate panic-buying behavior using actual inventory and basket distributions during the panic buying of March 2020. For each simulation scenario, we generate basket distributions according to a predefined probability distribution of the number of items in a basket for each category. The number of baskets generated represents the strength of the panic-buying wave; that is, the stronger the panic-buying wave, the larger the number of baskets (demand). We consider three wave-strength configurations: half strength, full strength, and double strength. The full-strength configuration means that the simulation mimics the actual demand recorded during the panic buying of March 2020. We simulate the half-strength and double-strength waves by halving (i.e., 0.5 × March 2020 demand) and doubling (i.e., 2 × March 2020 demand) the full-strength wave, respectively.
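The scenario-generation logic just described can be sketched in a few lines. This is an illustration only, not the authors' code: the baseline of 300 baskets per day is a made-up placeholder, while the ±20% sampling ranges follow the description in the text.

```python
import random

# Each scenario draws a multiplier uniformly within +/-20% of the nominal wave
# strength: half (0.5x), full (1x), or double (2x) the March 2020 baseline.
STRENGTH_RANGES = {"half": (0.4, 0.6), "full": (0.8, 1.2), "double": (1.6, 2.4)}

def sample_wave_strengths(baseline_baskets, strength, n_samples, seed=0):
    """Draw scenario-level demand intensities (baskets per day) for one configuration."""
    lo, hi = STRENGTH_RANGES[strength]
    rng = random.Random(seed)
    return [baseline_baskets * rng.uniform(lo, hi) for _ in range(n_samples)]

# Hypothetical baseline: 300 baskets/day recorded during the March 2020 wave.
samples = sample_wave_strengths(300, "double", n_samples=10_000)
```

Averaging the simulated outcomes over the 10,000 sampled scenarios yields the expected values reported in Tables 8 and 9.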


For each configuration, we generate 10,000 samples and compute the expected values, which are reported in Tables 8 and 9. More specifically, the data points are uniformly distributed between 0.4 and 0.6 for the half-strength simulation, between 0.8 and 1.2 for the full-strength simulation, and between 1.6 and 2.4 for the double-strength simulation (each simulation includes a ± 20% range around the nominal wave strength). The initial inventory value considered for the simulation is the actual inventory value as of March 12, 2020, and we also consider a scenario in which the initial inventory is doubled. Finally, the different scenarios represent the rationing policies implemented, where the number of items per basket is capped at 1, 2, or 3, or no limit is set. We report the number of baskets served during a period of 7 or 14 days, as well as the number of days before running out-of-stock. As observed, by imposing a rationing policy informed by our proposed model, the retailer can extend the number of inventory coverage days and increase the number of baskets served, both of which have direct implications for consumer surplus and social welfare since the product will more likely be available on the shelf.

5.3 Extension: urgent shipment to fulfill additional inventory

One can extend our simulation tool to prescribe urgent inventory shipments. To do so, the output of the scenario analysis is used to produce an inventory estimate for the near future under a specific quantity limit for each item in the category to determine if an urgent shipment is necessary. This item-level expected demand estimate can be expressed as

E_P[D̃_{i,c}(t, t′)] = Σ_{s=t}^{t′} λ_g(s) [Σ_{k=1}^{∞} min{k, c} P(k̃_g = k)] φ_i(s),

where φ_i(s) is the estimated proportion of the demand of product i in group g in period s. For example, we can determine φ_i(s) using the commonly employed multinomial logit function based on the sales in the same group of products. More specifically, we denote by m_i the total sales of product i recently observed during periods with pertinent anomalies and by A_{g,c}(t) the list of available products in group g at time t under limit c. We then have φ_i(t) = m_i / Σ_{j∈A_{g,c}(t)} m_j. For instance, if only products A and B remain available, with recent sales m_A = 120 and m_B = 60, then φ_A = 2/3 and φ_B = 1/3. Note that this reallocation of category demand based on the multinomial logit function essentially captures the fact that customers are willing to substitute an available product for another available product, which can be justified in a panic-buying event for certain categories of products.

We consider the inventory level of a product to be critical when the expected inventory level of the product E_P[Ĩ_{i,c}(t′)] reaches a predefined critical level α_i chosen by the retailer. In practice, this critical inventory level can be directly derived from the approximate total demand during the replenishment lead time. If the approximate on-hand inventory is projected to reach this critical level prior to the original replenishment date, then the demand planner can consider placing a special request through the fulfillment system to replenish the product. The recommended minimum replenishment quantity Q_{i,c} of product i under limit c must be sufficiently high to ensure that the inventory can satisfy the estimated demand until the next replenishment period R, that is, Q_{i,c} = argmin{q : E_P[Ĩ_{i,c}(R)] + q ≥ 0}. For example, if the expected inventory at R is −35 units, the planner should request at least 35 units.

The above extension to making urgent inventory decisions shows how our proposed anomaly detection model can be leveraged to strategically adapt future inventory decisions to mitigate the occurrence of stockouts and, ultimately, serve a larger number of customers.

In summary, our tool's first mission was to successfully detect pertinent anomalies in the context of panic buying. Such anomalies may have adverse future consequences for the retailer. It is then the retailer's decision how to respond: either by imposing an informed rationing policy or by placing an urgent inventory order (or both). Our simulation tool can guide retailers in making such critical decisions.

6 Implications and conclusion

In this section, we first present the practical implications of our AI-based framework. We then discuss how both the predictive and prescriptive tools developed in this paper can be adapted and used by a wide range of retailers. We close by reporting our conclusions.

6.1 Practical implications

As discussed, this work was directly motivated by practical considerations. In particular, the panic-buying behavior amid the COVID-19 pandemic caught most retailers off guard as they were not ready for such unprecedented panic-buying waves. In this paper, we propose an AI-based framework to detect early signals of panic-buying events, which are cast as anomalies. We then propose a prescriptive tool that can react to mitigate the adverse impact of panic-buying events. Overall, our framework includes predictive (detecting different types of anomalies by leveraging data) and prescriptive (setting data-driven rationing policies) components.

As discussed in Sect. 5, our framework was implemented across 42 stores run by our retail partner. This retail partner has made the decision to run our tool at specific stores with certain product categories to proactively detect future demand anomalies. In the event that an anomaly is detected, the tool will automatically send an alert to the appropriate store manager. The alert will include the specifics of the
detected anomaly (which store, which category of products, the type of anomaly, and its severity score, along with interpretability measures). In addition, at the end of each day, a formal report will be sent to the store managers regarding the specifics of the detected anomalies. It will then be up to the store managers to decide whether to take any prescriptive action to preempt the impact of the detected anomalies (e.g., a rationing policy or increasing inventory orders). Interestingly, each store manager will have the flexibility to define the types of anomalies they would like to flag and detect. Finally, in the event of severe panic buying, our prescriptive tool will assist the retailer in setting the right rationing policy for the right set of products at the right time. Ultimately, our framework can be seen as preventive and can significantly help retailers during challenging and unexpected times. Our method can increase access to essential products and mitigate prolonged stockouts, which are detrimental to a store's reputation.

6.2 Generalizability

Despite the first COVID-19 panic-buying wave being behind us, retailers still often experience unusual demand spikes due to shifts in customer behavior, which often lead to stockouts (Gopal 2021). To mitigate this issue, the predictive and prescriptive analytics approaches presented in this paper can be readily adapted by retailers in different verticals. Specifically, any convenience store or supermarket, regardless of its size and geographical location, can leverage transaction and basket data, in conjunction with labels indicating pertinent anomalies, to train our anomaly detection and classification models. The anomalies can be labeled using historical sales and inventory data by focusing on different business requirements, such as service-level failures (e.g., stockouts) and abnormalities in sales (e.g., sales spikes across multiple locations). The trained model can then be deployed either at the store or regional level to detect and classify anomalies. Store managers can then be promptly notified once pertinent anomalies have been detected, perform prescriptive "what-if" analyses, and make informed decisions on rationing policies and fulfillment actions.

Other types of retailers, such as pharmacies, have also suffered from panic buying, leading to shortages of supplies of many medication products, including cold and flu tablets, painkillers, and various prescription drugs. Rationing policies have also been put in place to control such situations (Tillett 2020). Similarly, electronics stores have experienced significant demand surges in multiple product categories since 2020 due to changes in customer behavior (Leswing 2021). To address these issues, pharmacies and electronics retailers can apply our two-step anomaly detection method to proactively detect subtle demand anomalies for any group of products. The models can be readily generalized and enhanced through the use of data and additional features in such contexts. More specifically, pharmacies can further incorporate drug-consumption coverage in the anomaly classification model to enhance the prediction of pertinent anomalies. Electronics retailers can leverage product return data in the ML-based classification model in a similar fashion. The outputs from the anomaly detection model can then be used to analyze the impact of imposing different rationing policies, as presented in Sect. 5.1.

Finally, the framework proposed in this paper can also be applied to e-commerce. The unsupervised autoencoder model we used has been effectively applied to a large-scale web application to detect and analyze anomalies in web traffic every minute (Xu et al. 2018a). In addition to the main features presented in Appendix F, online retailers can further leverage customer traffic and clickstream data to enhance the performance of the classification model. Our tool can then be deployed in an automated manner to analyze online retail sales and provide recommendations on quantity rationing, product offerings, and inventory fulfillment in real time.

6.3 Conclusion

In this paper, we leveraged AI tools and methods to help retailers detect panic-buying events and improve essential product distribution in uncertain times. More specifically, we proposed an anomaly detection model that can identify pertinent anomalies in real time. Detecting anomalies early allows retailers to be proactive and systematically react to anomalies before it is too late. In particular, retailers can activate a rationing policy (i.e., limiting the number of purchased products per customer) or decide to place an urgent inventory order. These actions would enhance the distribution of essential products, thereby benefiting both retailers and their customers.

We proposed a two-stage AI model in which the first stage detects anomalies from a statistical perspective, and the second stage classifies these anomalies according to their managerial pertinence. Our framework is flexible and can easily be adapted to various business settings. Since we defined anomalies from a business perspective, our model provides a clear interpretation of the detected anomalies. By applying our method to three product categories (with more than 15 million observations), we first established that our model yields high performance and can be scaled to a large number of categories and stores. In actual implementation, our anomaly detection model can run in the background and send scheduled emails (e.g., hourly, daily, weekly) to the demand planner on the detected anomalies for each store and category. These emails can include the type of anomaly, its severity level, and an interpretation. We then conducted a simulation analysis to develop a prescriptive analytics tool. Ultimately, our tool allows retailers to test "what-if"
scenarios to strategically react and properly decide when and how to activate rationing policies. Finally, we simulated a future panic-buying wave to showcase the practical impact of our tool.

Nevertheless, our research has limitations that offer highly promising avenues for future research. First, our model was developed to detect anomalies based on sales data and not on actual customer demand realizations. Given that observed sales are censored demand, it would be interesting to extend our method to uncensored sales data to truly detect demand anomalies. Second, our study was only concerned with potential strategic reactions to panic buying by retailers; reactions from manufacturers are left as a future research direction. Lastly, while the counterfactual simulation results from our prescriptive analytics show that our proposed model positively impacts consumer surplus and social welfare by improving product availability, future research could aim to quantify this impact more formally.

In addition to its research impact, we believe that our method can yield a substantial societal impact. Specifically, it can help increase access to essential products during panic-buying waves. We have shown that a simple proactive strategy based on our results could have mitigated the overwhelming retail stockouts observed in March 2020 in North America and significantly increased access to essential products. This work demonstrates that by effectively leveraging suitable AI methods on large amounts of data, the retail world can be better for both firms and consumers.

Appendix A: Additional plots

Weekly sales for canned soup and household cleaners categories

See Appendix Figs. 11, 12.

Fig. 11  Canned soup weekly sales from January to May 2020 across 42 stores (the first week in the x-axis corresponds to the week of January 5). The y-axis is normalized for anonymity

Fig. 12  Household cleaners weekly sales from January to May 2020 across 42 stores (the first week in the x-axis corresponds to the week of January 5). The y-axis is normalized for anonymity


Tweets and news article plots for canned soup and household cleaners categories

See Appendix Figs. 13, 14.

Fig. 13  Daily volume data of tweets and news articles related to the canned soup category

Fig. 14  Daily volume data of tweets and news articles related to the household cleaners category

Anomalies in canned soup and household cleaners categories

See Appendix Figs. 15, 16, 17, 18.

Fig. 15  Anomalies detected in the canned soup category in one store


Fig. 16  Anomalies in the canned soup category classified as pertinent (colored) and non-pertinent (black)

Fig. 17  Anomalies detected in the household cleaners category in one store

Fig. 18  Anomalies in the household cleaners category classified as pertinent (colored) and non-pertinent (black)


Shapley values for canned soup and household cleaners categories

See Appendix Figs. 19, 20.

Fig. 19  Feature importance (Shapley values) of top ten predictors in the second-stage classification for the canned soup category

Fig. 20  Feature importance (Shapley values) of top ten predictors in the second-stage classification for the household cleaners category

Appendix B: Correlation table for external data

In Sect. 3 of the paper, we described the external data we have collected to supplement our model.

1. First, we collect the data from Google Trends searches. In particular, we record the amount of searches for the term "COVID-19." In addition, we record the amount of searches that are related to the product categories under consideration (i.e., we collect the daily search


Table 10  Correlation table for the toilet paper category

 | Trend | Trend COVID | News sentiment | News frequency | News COVID sentiment | News COVID frequency | Twitter sentiment | Twitter frequency | Twitter COVID sentiment | Twitter COVID frequency
Trend | 1.0000 | 0.4798 | 0.0826 | 0.0275 | 0.0560 | 0.2984 | 0.3766 | 0.5719 | 0.0173 | 0.4251
Trend COVID | 0.4798 | 1.0000 | 0.1390 | 0.1521 | 0.4220 | 0.8278 | 0.1919 | 0.3988 | 0.3852 | 0.8400
News sentiment | 0.0826 | 0.1390 | 1.0000 | 0.8496 | 0.1765 | 0.2773 | -0.1728 | -0.0775 | 0.2056 | 0.1253
News frequency | 0.0275 | 0.1521 | 0.8496 | 1.0000 | 0.1644 | 0.2682 | -0.1203 | -0.0719 | 0.2551 | 0.1250
News COVID sentiment | 0.0560 | 0.4220 | 0.1765 | 0.1644 | 1.0000 | 0.4509 | 0.0053 | -0.0190 | 0.3819 | 0.3270
News COVID frequency | 0.2984 | 0.8278 | 0.2773 | 0.2682 | 0.4509 | 1.0000 | 0.0900 | 0.1970 | 0.4040 | 0.7371
Twitter sentiment | 0.3766 | 0.1919 | -0.1728 | -0.1203 | 0.0053 | 0.0900 | 1.0000 | 0.0642 | 0.0000 | 0.0573
Twitter frequency | 0.5719 | 0.3988 | -0.0775 | -0.0719 | -0.0190 | 0.1970 | 0.0642 | 1.0000 | 0.0042 | 0.6532
Twitter COVID sentiment | 0.0173 | 0.3852 | 0.2056 | 0.2551 | 0.3819 | 0.4040 | 0.0000 | 0.0042 | 1.0000 | 0.3344
Twitter COVID frequency | 0.4251 | 0.8400 | 0.1253 | 0.1250 | 0.3270 | 0.7371 | 0.0573 | 0.6532 | 0.3344 | 1.0000
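For reference, coefficients like those in Table 10 can be reproduced with a few lines of standard Pearson-correlation code. The sketch below uses made-up daily series rather than the study's data:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical daily volumes: Google Trends searches vs. tweet frequency
trend = [10, 12, 18, 40, 35, 22]
tweets = [5, 7, 9, 30, 28, 15]
r = pearson(trend, tweets)
```

Computing this for every pair of external-data series yields a symmetric matrix with a unit diagonal, as in Table 10.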


trend of the terms "toilet paper," "household cleaners," and "canned soup"). We limit the collection scope to searches within the metropolitan city in our dataset between January 1 and May 1, 2020.
2. Second, we collect data from the social networking platform Twitter. Specifically, we collect the tweets that contain one of the common COVID-19 hashtags as defined in Lamsal (2020), and tweets that are related to the product categories under our consideration. In other words, we collect the tweets that either contain one of the common COVID-19 hashtags or mention toilet paper, household cleaners, or canned soup in the content, or both. Similar to our Google Trends data collection, we limit the scope to tweets within the metropolitan city in our dataset between January 1 and May 1, 2020. Based on this collected data, we calculate the volume of the tweets as well as their sentiment.
3. Third, we collect news article data from three local news outlets that operate in the same metropolitan city as our dataset. These three news outlets are the top three local news outlets based on the Google search rank. We then collect all the news articles that mention the term COVID-19, or mention one of the product categories under consideration, or both, between January 1 and May 1, 2020. Based on this dataset, we calculate the volume of the news articles and their sentiment.

We report the Pearson correlation coefficients of all the variables for the toilet paper category in Table 10. Similar patterns were also observed for other product categories.

Appendix C: Overview of two-stage AI pipeline architecture

The architecture of the AI pipeline implemented on the cloud platform is shown in Fig. 21. The main data used in this pipeline comprise (i) transaction data, (ii) inventory and stockout records, and (iii) external data collected from external sources (Google Trends, Twitter, and news). Historical data are processed and stored in the database, whereas live data are connected to the company's data streams.

Fig. 21  Illustration of the AI pipeline architecture
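The end-to-end flow of this pipeline, two prediction stages feeding alert generation, can be outlined in a short sketch. This is a schematic under assumed function signatures, not the deployed system; the detect, classify, and notify callables below are toy stand-ins for the trained models and the alerting hook.

```python
def run_pipeline(windows, detect, classify, notify):
    """Stage 1 flags statistical anomalies; stage 2 keeps only pertinent ones,
    which are then pushed to store managers as alerts."""
    pertinent = []
    for features in windows:
        if detect(features) and classify(features):
            pertinent.append(features)
            notify(features)
    return pertinent

# Toy stand-ins for the two trained models and the alerting hook:
windows = [{"sales_spike": 6.0}, {"sales_spike": 0.2}, {"sales_spike": 3.5}]
alerts = run_pipeline(
    windows,
    detect=lambda f: f["sales_spike"] > 1.0,    # EIF/Donut stand-in
    classify=lambda f: f["sales_spike"] > 3.0,  # pertinence-classifier stand-in
    notify=lambda f: None,                      # e-mail/report stub
)
```

In the real pipeline, the two stages are the trained EIF/Donut detectors and the LightGBM pertinence classifier described in Appendix D, and the notification step feeds the scheduled anomaly reports.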


As described in Sect. 4 of the paper, the predictive framework consists of the first-stage and second-stage models. The labeling process is implemented and connected to the MLflow platform.10 This platform is commonly used to manage the lifecycle of ML processes, in particular, the records of experiments including model parameters, code versions, metrics, and output files. We implemented the first-stage anomaly detection models using the EIF11 and Donut12 Python libraries, whereas the second-stage classification model was implemented using the LightGBM library.13 The trained ML models are stored in the MLflow pipeline and later used in the prediction on live data. Detected pertinent anomalies from this two-stage ML prediction will be saved, and pertinent anomaly reports will be generated accordingly. The user can also run various what-if analyses (as described in Sect. 5) to evaluate and determine appropriate rationing policies and fulfillment plans if necessary.

Appendix D: AI-based methodologies for anomaly detection and classification

See Appendix Fig. 21.

Extended isolation forest

The extended isolation forest (EIF) algorithm (Hariri et al. 2021) is an extension of the isolation forest algorithm (Liu et al. 2008), which is an unsupervised learning algorithm developed for anomaly detection. This algorithm is different from traditional statistical anomaly detection algorithms since it does not directly rely on a probabilistic distribution of normal data points. Instead, this tree-based algorithm creates a set of trees (called a forest) by partitioning the input data based on one random feature at a time using a similarity score (i.e., a distance measure) until all individual data points are isolated at the leaf nodes. With this trained model, one can compute the anomaly score of a new data point by running it through all the trees and measuring the average anomaly score based on the depth from each tree (i.e., a lower depth implies that the data point can be easily isolated due to its deviation from the other data points). The scalability and effectiveness of this approach are demonstrated in Hariri et al. (2021).

A deep-learning-based anomaly detection algorithm: Donut

Donut (Xu et al. 2018a) is a multivariate anomaly detection algorithm based on a deep learning architecture called the variational autoencoder (VAE) (Kingma and Welling 2013), which can be used to train generative models to construct the distribution of the input data (training set). This autoencoder-based method is trained to compress (i.e., encode) the data input into a latent vector of lower dimensional space and then reconstruct (i.e., decode) the data input from the latent vector using an end-to-end deep-learning neural network model. The deep-learning-based Donut algorithm is particularly suitable when dealing with large-scale datasets (such as the transaction data in our case) with missing values. Since this algorithm makes use of kernel density estimation (KDE) in the reconstruction process, one can also obtain probabilistic insights and interpretability information (for more details, see Xu et al. (2018a)).

LightGBM

LightGBM is a decision-tree-based algorithm proposed by Ke et al. (2017) that has been widely adopted in various ML applications due to its scalability and performance (e.g., see Hancock and Khoshgoftaar 2021; Makridakis et al. 2021). This type of tree-based model is highly suitable for capturing complex non-linear relationships among different features. Unlike traditional tree-based ML algorithms, LightGBM creates a prediction tree by leaf-wise node splitting, allowing low memory usage and faster training (more details can be found in the library documentation). Thus, LightGBM is capable of dealing with large-scale and complex data. In addition, tree-based models are useful for practitioners due to their interpretability dimension (Ke et al. 2017). More particularly, one can directly obtain feature importance and plot prediction trees using the LightGBM library.

Appendix E: Step-by-step description of input feature preparation

See Appendix Tables 11, 12, 13, 14.

10 https://mlflow.org/
11 https://github.com/sahandha/eif
12 https://github.com/NetManAIOps/donut
13 https://lightgbm.readthedocs.io
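The isolation principle behind EIF can be illustrated with a toy, pure-Python sketch. Note the simplifications: this uses standard axis-aligned isolation splits (as in the original isolation forest), not the extended hyperplane splits of EIF, and it is not the eif library; the 2-D "sales feature" data are made up.

```python
import random

def isolation_depth(point, data, rng, max_depth=10):
    """Depth at which `point` is separated from the rest by random axis-aligned splits."""
    depth = 0
    while len(data) > 1 and depth < max_depth:
        dim = rng.randrange(len(point))
        lo = min(row[dim] for row in data)
        hi = max(row[dim] for row in data)
        if lo == hi:
            break
        split = rng.uniform(lo, hi)
        side = point[dim] < split
        data = [row for row in data if (row[dim] < split) == side]
        depth += 1
    return depth

def anomaly_path_length(point, data, n_trees=200, seed=0):
    """Average isolation depth over many random trees; shorter means more anomalous."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees

# Hypothetical 2-D sales features: a tight cluster plus one clear outlier.
gen = random.Random(1)
cluster = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
outlier = (10.0, 10.0)
data = cluster + [outlier]
avg_outlier = anomaly_path_length(outlier, data)
avg_inlier = anomaly_path_length(cluster[0], data)
```

The outlier is isolated after far fewer splits than a point inside the cluster, which is exactly the depth-based score the EIF stage uses to flag unusual sales windows.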


Table 11  Example of aggregate sales data of each 5-min interval

Time interval | 11:05 | 11:10 | 11:15 | 11:20 | … | 13:50 | 13:55 | 14:00 | 14:05 | 14:10 | 14:15 | … | 16:45 | 16:50 | 16:55 | 17:00 | 17:05 | …
Sales quantity | 2 | 1 | 3 | 2 | … | 3 | 2 | 4 | 0 | 2 | 5 | … | 3 | 2 | 4 | 0 | 2 | …

Table 12  Input features derived from the aggregate sales data

First-stage features | 1 | 2 | 3 | 4 | … | 34 | 35 | 36
Observation 1 (11:00–14:00) | 2 | 1 | 3 | 2 | … | 3 | 2 | 4
Observation 2 (11:05–14:05) | 1 | 3 | 2 | … | 3 | 2 | 4 | 0
Observation 3 (11:10–14:10) | 3 | 2 | … | 3 | 2 | 4 | 0 | 2
…
Observation 35 (13:55–16:55) | 4 | 0 | 2 | 5 | … | 3 | 2 | 4
Observation 36 (14:00–17:00) | 0 | 2 | 5 | … | 3 | 2 | 4 | 0
Observation 37 (14:05–17:05) | 2 | 5 | … | 3 | 2 | 4 | 0 | 2

Table 13  Output of the first-stage anomaly detection model (Y is for Yes and N for No)

First-stage features | 1 | 2 | 3 | 4 | … | 34 | 35 | 36 | Anomaly flag
Observation 1 (11:00–14:00) | 2 | 1 | 3 | 2 | … | 3 | 2 | 4 | Y
Observation 2 (11:05–14:05) | 1 | 3 | 2 | … | 3 | 2 | 4 | 0 | N
Observation 3 (11:10–14:10) | 3 | 2 | … | 3 | 2 | 4 | 0 | 2 | Y
…
Observation 35 (13:55–16:55) | 4 | 0 | 2 | 5 | … | 3 | 2 | 4 | Y
Observation 36 (14:00–17:00) | 0 | 2 | 5 | … | 3 | 2 | 4 | 0 | N
Observation 37 (14:05–17:05) | 2 | 5 | … | 3 | 2 | 4 | 0 | 2 | N
…

Table 14  Input features of the second-stage model (pertinent anomaly classification)

First-stage features | 1 | 2 | 3 | 4 | … | 34 | 35 | 36 | Anomaly flag
Observation 1 (11:00–14:00) | 2 | 1 | 3 | 2 | … | 3 | 2 | 4 | Y
Observation 2 (11:05–14:05) | 1 | 3 | 2 | … | 3 | 2 | 4 | 0 | N
Observation 3 (11:10–14:10) | 3 | 2 | … | 3 | 2 | 4 | 0 | 2 | Y
…
Observation 35 (13:55–16:55) | 4 | 0 | 2 | 5 | … | 3 | 2 | 4 | Y
Observation 36 (14:00–17:00) | 0 | 2 | 5 | … | 3 | 2 | 4 | 0 | N
Observation 37 (14:05–17:05) | 2 | 5 | … | 3 | 2 | 4 | 0 | 2 | N
…
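The sliding-window and normalization steps described in this appendix can be sketched in a few lines. The snippet is an illustration only: it uses a window width of 4 so the output is easy to read (the paper uses 36 five-minute intervals, i.e., 3 h), and the sales series extends the toy numbers of Table 11.

```python
def sliding_windows(series, width=36):
    """Turn a list of 5-min sales totals into overlapping windows of `width`
    intervals, advancing one interval at a time (36 intervals = 3 h)."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]

def zscore(values):
    """Normalize one store's sales so stores of different sizes are comparable."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values] if std else [0.0] * n

# Toy 5-min totals (the first values match the example in Table 11).
sales = [2, 1, 3, 2, 4, 0, 2, 5]
windows = sliding_windows(sales, width=4)  # width 4 for illustration only
normalized = zscore(sales)
```

Each window becomes one observation fed to the first-stage detectors, and the z-score transform is the normalization applied in Step 4 before pooling stores in the same region.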

1. Aggregate sales data: We first aggregate the transaction sales data by category into the total sales quantity of each 5-min interval. An example of the aggregate sales data is provided in Table 11.
2. Input features for anomaly detection: The feature inputs of the ML-based anomaly detection models are created using the sliding window method (Sejnowski and Rosenberg 1987). To create input features, we use a window of 3 h that comprises aggregated observations of 36 data points, each composed of a 5-min interval at the store level. More specifically, the first observation based on the above example comprises the sales quantities of each 5-min interval that has occurred between 11:00 and 14:00, the second observation corresponds to the sales quantities of each 5-min interval that has occurred between 11:05 and 14:05, and so on. We also conduct a small computational experiment to vary the time window and find no meaningful implications.
3. Outputs of first-stage model: The feature inputs generated in Step 2 are used in the first-stage model, whose outputs indicate whether or not each observation is considered an anomaly, as shown in the last column of Table 13. Since our framework leverages two ML-based anomaly detection models (EIF and Donut), an observation is considered anomalous if it is flagged as an anomaly by at least one of the anomaly detection models.
4. Input features for pertinent anomaly classification: In this step, we only retain the anomalies flagged in the first-stage model (i.e., labeled as Y in Table 13). For each detected anomaly, we obtain the input features (i.e., predictors) for the second-stage model to classify pertinent anomalies using both internal and external data sources. As shown in Table 14, we create 29 input variables (as listed in Table 15) derived from internal data and 5 input variables derived from Google Trends, Twitter, and news data as described in Sect. 3. Since we use the inputs from multiple stores in the same region in the pertinent anomaly classification model, the sales data (i.e., quantity and number of baskets) of each store are transformed using a z-score normalization. In other words, features 1–15 in Table 15 are computed using the normalized sales data from each store. In the last column, we create the label indicating whether the anomaly is pertinent by checking if one of the consequences described in Sect. 4.2 has occurred. This column is used as the label in the supervised ML model.

Appendix F: List of features

See Appendix Table 15.

Table 15  List of features from internal data sources

Feature name               Description
basket size sum            Number of articles in the window
basket assortment sum      Number of article assortments in the window
basket size avg            Average number of articles in the window
basket assortment avg      Average number of article assortments in the window
basket size cat sum        Number of category articles in the window
basket size cat avg        Average number of category articles in the window
basket assortment cat avg  Average number of category article assortments in the window
market share               Category sales over all articles in the window
agg sales                  Number of category articles in the window
weighted price             Weighted price of the category articles (weighted by quantity)
assortment size            Number of category assortments in the window
promo assortment           Number of category promotion assortments in the window
promo sales                Number of category promotions in the window
store n lanes              Number of lanes opened in the window
basket count               Number of baskets in the window (frequency/traffic)
day                        Day of the week
month                      Month of the year
week                       Week of the year
hour                       Hour of the window
donut cluster              Donut cluster result
donut score                Donut test score
donut score scaled         Donut test score scaled to [0, 1]
eif cluster                EIF cluster result
eif score                  EIF cluster score
nb past sales spike        Number of past sales-spike flags in other stores
nb past stockouts          Number of past stockout flags
holidays flag              Holiday indicator
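As an illustration of how a few of the Table 15 window features could be derived, the sketch below aggregates a toy transaction log into one-hour windows with pandas. The schema (`ts`, `basket`, `article`, `qty`, `price`), the window length, and the column names are assumptions for the example, not the retailer's actual data model.

```python
import pandas as pd

# Toy transaction log: one row per basket line (assumed schema).
tx = pd.DataFrame({
    "ts": pd.to_datetime([
        "2020-03-01 10:05", "2020-03-01 10:20", "2020-03-01 10:40",
        "2020-03-01 11:10", "2020-03-01 11:15",
    ]),
    "basket": [1, 1, 2, 3, 3],
    "article": ["A", "B", "A", "C", "A"],
    "qty": [2, 1, 4, 1, 3],
    "price": [1.50, 2.00, 1.50, 3.00, 1.50],
})

txi = tx.set_index("ts")
features = pd.DataFrame({
    # basket size sum: number of articles sold in the window
    "basket_size_sum": txi["qty"].resample("h").sum(),
    # assortment size: number of distinct articles in the window
    "assortment_size": txi["article"].resample("h").nunique(),
    # basket count: traffic, i.e., distinct baskets in the window
    "basket_count": txi["basket"].resample("h").nunique(),
})
# weighted price: window revenue divided by window quantity
features["weighted_price"] = (
    (txi["price"] * txi["qty"]).resample("h").sum() / features["basket_size_sum"]
)
print(features)
```

Each row of `features` then corresponds to one store-hour window, which is the granularity at which the anomaly detectors and the pertinence classifier operate.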

AI & SOCIETY

Acknowledgements The authors are listed in alphabetical order. The third author is the main contributor. The authors would like to thank the retail partner, IVADO Labs, and SCALE AI that made this work possible. We also thank Gregg Gilbert, Michael Krause, and Mehdi Ait Younes for their insightful comments that helped improve this paper.

Author contributions Conceptualization: YA, MC, WK-a-n. Methodology: OB, AC. Investigation: YA, OB, AC, MC, WK-a-n. Visualization: YA, OB, AC, MC, WK-a-n. Writing: YA, MC, WK-a-n.

Funding This work is financially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) grant number RGPIN-2021-02657. The first and fourth authors are part-time advisors to IVADO Labs, and the fifth author was a part-time advisor to the same organization when this research was completed. There are no competing interests to declare that are relevant to the content of this article.

Data availability The data that support the findings of this study are supplied by our retail partner, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the retail partner.

Declarations

Conflict of interest The authors declare no competing interests.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
