Using AI To Detect Panic Buying
Using AI To Detect Panic Buying
https://doi.org/10.1007/s00146-023-01654-9
Abstract
The COVID-19 pandemic has triggered panic-buying behavior around the globe. As a result, many essential supplies were
consistently out-of-stock at common point-of-sale locations. Even though most retailers were aware of this problem, they
were caught off guard and are still lacking the technical capabilities to address this issue. The primary objective of this paper
is to develop a framework that can systematically alleviate this issue by leveraging AI models and techniques. We exploit
both internal and external data sources and show that using external data enhances the predictability and interpretability
of our model. Our data-driven framework can help retailers detect demand anomalies as they occur, allowing them to react
strategically. We collaborate with a large retailer and apply our models to three categories of products using a dataset with
more than 15 million observations. We first show that our proposed anomaly detection model can successfully detect anoma-
lies related to panic buying. We then present a prescriptive analytics simulation tool that can help retailers improve essential
product distribution in uncertain times. Using data from the March 2020 panic-buying wave, we show that our prescriptive
tool can help retailers increase access to essential products by 56.74%.
13
Vol.:(0123456789)
AI & SOCIETY
strategy to cope with the demand surge. Understanding when business requirements and preferences. We also present an
to activate a rationing policy, for which categories of prod- end-to-end real-world application of the deployment of our
ucts, and for how long it can have a significant impact on data-driven model in collaboration with one of the largest
both retailers and consumers (especially given that we will retail grocery chains in North America. We apply our model
likely experience several future waves of panic buying). In to three categories of products—toilet paper, canned soup,
particular, when implemented properly, such rationing poli- and household cleaners—during the COVID-19 pandemic
cies can broaden the distribution of essential products to a (January–May 2020). Our data comprise more than 15 mil-
large number of customers (without incurring any additional lion observations. We first showcase the accuracy of our
costs to the retailer). Interestingly, many retailers imple- anomaly detection algorithm compared to several bench-
mented such rationing policies in 2020 and 2021. However, marks. We then present a scenario analysis of the impact
the specifics of these policies were often based on intuition of imposing different versions of a rationing policy. Our
and emotional reactions, as opposed to being data driven. In results suggest that by implementing the right rationing
our conversations, several retailers acknowledged the lack policy at the right time, stockouts could be entirely avoided
of tools at their disposal to guide the deployment of ration- (or, at least, significantly mitigated), and access to essential
ing policies. In addition, these rationing policies were often products could be granted to 56.74% additional customers.
implemented too late, as stated in several anecdotal quotes, Finally, we leverage our model (estimated using data from
such as the following: “What we learned was we didn’t the first COVID-19 panic-buying wave at the end of Q1
impose product restrictions early enough, and that created a 2020) to simulate the efficacy of our prescriptive tool in a
run on the system and created some difficulties for people.”1 future panic-buying event (either triggered by the pandemic
In this paper, we collaborate with a large retail grocery or by other worrisome events). Via extensive simulations,
chain in North America to develop a real-time anomaly we showcase how our AI-based framework can help retailers
detection system (which leverages unsupervised and super- improve essential product distribution.
vised machine learning models) that can identify and
flag pertinent anomalies related to panic buying. Specifi-
cally, we propose a two-stage machine learning approach 2 Relevant literature and theoretical
that allows us to incorporate various internal data sources background
(e.g., promotions, sales, inventory, store traffic, and online
transactions) as well as external data (e.g., social media, In this section, we first survey prior studies that are closely
Google Trends) to detect anomalies. Such external data related to this paper. We then discuss the theoretical back-
can help signal potential anomalies as they occur, which ground that underpins our study, specifically on the drivers
will ultimately provide enough time for retailers to react. of panic-buying behavior and the resulting consequences.
We obtain a strong validation that our approach success-
fully detects anomalies for the right period and that virtually 2.1 Related literature
no anomalies are detected during periods where we do not
expect to have them. Our anomaly detection model differs In this subsection, we survey prior relevant literature. We
from traditional methods in that we adapt the definition of first review the extensive studies in data-driven retail opera-
anomalies based on specific business requirements. More tions. We then review several existing papers on anomaly
precisely, instead of identifying anomalous patterns on a detection models and how these models are used in the
purely statistical basis, we search for instances that have context of supply chain management. Following that, we
concrete undesired consequences (e.g., triggering a stockout discuss several papers that leverage external data, such as
in the near future). To do so, we propose to use a two-stage social media content, to augment predictive tasks. Lastly, we
AI model in which the first stage identifies anomalies and survey prior works that study the social impact of artificial
the second stage classifies those anomalies as pertinent or intelligence.
non-pertinent. The second stage has the flexibility to handle
different types of anomalies, depending on the managerial 2.1.1 Data‑driven retail operations
context under consideration. The novelty of this paper lies
in its application to business contexts, especially regarding In recent years, retail managers have increasingly integrated
how the detected anomalies are classified based on busi- big data analytics into operations (Choi et al. 2018; Fisher
ness requirements. Our two-stage model offers flexibility to and Raman 2018). In this context, data-driven supply chain
retailers to detect several types of anomalies based on their management has become a practice that is highly sought
after by many firms (Sanders and Ganeshan 2018). This
1
https://nypost-com.cdn.ampproject.org/c/s/nypost.com/2020/11/17/ trend is supported by the previous literature, which has con-
covid-19-panic-buying-toilet-paper-essentials-fly-off-shelves/amp/ sistently demonstrated the value of data in improving firms’
13
AI & SOCIETY
operations, including supply chain configuration (Wang detecting anomalous demand patterns (e.g., Liu et al. 2012,
et al. 2016), safety stock allocation (Hamister et al. 2018), 2008).
risk management (Choi et al. 2016), and promotion planning Although the concept of demand anomaly detection is far
(Cohen et al. 2017). With the advances in artificial intel- from new, virtually all prior studies have detected demand
ligence in recent years, techniques such as deep learning anomalies from a purely statistical perspective (e.g., iden-
have been increasingly adopted to improve several aspects tifying demand observations that are significantly different
related to retail operations, such as procurement (Cui et al. relative to “normal” patterns). They do not account for busi-
2022), demand forecasting (Birim et al. 2022), inventory ness requirements (i.e., how anomalies are defined from a
replenishment (Qi et al. 2020), and risk management (Wu business standpoint) in the detection process. This discrep-
and Chien 2021). This paper extends the literature in this ancy has important implications because the implementa-
research stream in the wake of the COVID-19 pandemic, tion of cutting-edge technologies without taking business
which imposes significant burdens on supply chains and requirements into account can generate severe consequences
product distribution (e.g., Armani et al. 2020; Ivanov and for businesses (Yeoh and Koronios 2010). In this paper, we
Dolgui 2020). Specifically, we propose a data-driven frame- present an anomaly detection model that explicitly accounts
work that can systematically identify demand anomalies for for managerial requirements. While our model was moti-
grocery products. We then classify these anomalies based vated by the COVID-19 pandemic disruptions, it remains
on managerial requirements (e.g., whether these anomalies flexible enough to ensure that it can be generalized to non-
are pertinent according to a specific managerial definition). pandemic scenarios. In summary, the novelty of this paper
Using the identified anomalies, we then develop a prescrip- lies in its application to business contexts, especially in
tive tool to showcase the implications of our method in terms how the detected anomalies are classified based on busi-
of essential product distribution. Our data-driven model can ness requirements. As such, we have developed a two-stage
help retailers strategically decide when to activate a ration- model that includes a meaningful interpretation and offers
ing policy, for which stores and categories of products to flexibility to retailers to detect several types of anomalies
do so, and the appropriate limit value (e.g., one versus two based on their business requirements and preferences.
items per customer).
Our paper is also related to the nascent literature on retail 2.1.3 Leveraging external data
operations amid the COVID-19 pandemic (Tsyganov 2021).
Several papers have focused on supply chain resiliency (e.g., This paper is also related to the stream of literature that
Ivanov and Dolgui 2020; Sodhi and Tang 2020). Han et al. utilizes external data sources, such as social media and
(2020) consider the impact of the pandemic on e-commerce news articles, for identification, exploration, and predic-
sales. While most previous studies take a descriptive or pre- tion tasks. Conceptually, several prior studies have dem-
dictive approach, our paper adopts a data-driven prescriptive onstrated that social media tends to capture public opinion
approach to enhance future decisions, namely, how retailers (Anstead and O’Loughlin 2015). Indeed, the content shared
can detect panic-buying events and strategically react. on social media tends to be generated from diverse infor-
mation sources (Bakshy et al. 2015). As a result, social
2.1.2 Anomaly detection media data are widely used for prediction purposes in a
wide range of applications. For example, Chen et al. (2014)
Anomaly detection has been extensively studied in the sta- demonstrate that investors’ opinions transmitted via a social
tistics, computer science, and machine learning communities media platform are significantly useful for predicting future
(see, e.g., Mehrotra et al. 2017, and the references therein). stock returns and earnings surprises. Similarly, Tanlamai
Recent anomaly detection models have taken advantage of et al. (2022) develop a machine learning model to identify
advances in machine learning and deep learning methods to arbitrage opportunities in a retail market. The authors show
identify patterns of anomalies based on multivariate inputs. that the predictive performance of the model significantly
For a recent comprehensive literature review on anomaly increases when external data, such as online user reviews
detection models that rely on deep learning methods, see and online questions and answers, are included as model
Chalapathy and Chawla (2019). Applications of these anom- features. Using social media data to help predict sales of spe-
aly detection models include the identification and detection cific products is also common in the literature (e.g., Gaikar
of fraud (e.g., Paula et al. 2016), cyber-intrusion (e.g., Hong and Marakarkandy 2015).
et al. 2014), and medical anomalies (e.g., Salem et al. 2013; In the context of this paper, of which the focus is the
Sabic et al. 2021). In the focal context of this study, several COVID-19 pandemic, there exist several prior studies
prior papers have used anomaly detection algorithms for that demonstrate the potential usefulness of external data
retail demand anomaly detection and empirically demon- in predictive tasks. For instance, Samaras et al. (2020)
strated that existing algorithms perform reasonably well in empirically studied the role of Google Trends and Twitter
13
AI & SOCIETY
13
AI & SOCIETY
consistently detect these anomalies. We collaborate with a product categories that were significantly affected by the
retail partner to move beyond the standard statistical defini- COVID-19 pandemic: toilet paper (398 products), canned
tions of these anomalies and to incorporate business implica- soup (255 products), and household cleaners (554 prod-
tions into the definition. Lastly, we discuss systematic and ucts).2 We plot the (normalized) weekly transaction volume
semi-automatic processes that can be triggered following for toilet paper in Fig. 2. Similar plots for canned soup and
the detected anomalies to mitigate the potential impact of household cleaners can be found in Appendix A.1.
panic-buying behavior. For all three product categories, we observe a striking
spike in the volume of sales around mid-March (i.e., weeks
10 to 12 on the x-axis). Notably, the number of weekly
3 Data and empirical context transactions in week 11 doubles—or even triples—relative
to weeks 1–8 for all three categories. This increase is sig-
In this research, we collaborate with one of the largest gro- nificantly higher relative to the reported average increase in
cery retail chains in North America. The company manages grocery sales due to COVID-19. This led us to select these
multiple brands of grocery stores and records billions of three product categories as our focal categories in this paper.
dollars in annual revenue. Through this collaboration, we In total, we have 15,005,425 in-store transactions related to
were able to access comprehensive point-of-sale transac- the sales of these 3 product categories in the 42 stores.
tion data from 42 stores located in a large metropolitan city. In addition to point-of-sale data, we also independently
Our dataset spans January 1, 2018 to May 1, 2020. Using collect data from external sources related to the COVID-19
this dataset, we first examine the effect of the COVID-19 pandemic. Our objective is to utilize these data sources to
pandemic on several product categories. Recall that the first augment the prediction accuracy of our models. Specifically,
wave of COVID-19 in North America in March 2020 had a we collect data from the following three external sources3:
strong impact on grocery store sales. For example, Statistics
Canada reports that the average increase in grocery sales
relative to the previous year was approximately 45% among 2
In addition to the three main categories reported in the paper,
Canadian grocery stores (see Fig. 1). In this context, in Q1 four other categories, namely Baking Ingredients, Water, Cheese,
of 2020, there was an intense panic-buying wave in several and Soap, were initially chosen by the retailer for this project. How-
countries around the world. Customers rushed to grocery ever, after preliminary analyses, the numbers of detected anomalies
stores to purchase large quantities of essential products, such in these categories were smaller (less than 0.78%). As a result, the
retailer decided to exclude these four categories from the study and
as face masks, hand sanitizers, canned food, and toilet paper. focus on the three categories we presented.
We use our data to plot the total number of sales per 3
There are of course additional sources of external data available.
product category on a weekly basis. We then identify three However, it is important to note that in the retail industry, it is often
13
AI & SOCIETY
Fig. 2 Weekly toilet paper sales from January to May 2020 across 42 stores (the first week on the x-axis corresponds to the week of January 5).
The y-axis is normalized for anonymity
1. First, we collect the data from Google Trends searches. 1 and May 1, 2020. Based on this dataset, we calculate
In particular, we record the number of searches for the the volume of the news articles and their sentiment.
term “COVID-19.” In addition, we record the number
of searches that are related to the product categories We collect these external data using an automatic script
under consideration (i.e., we collect the daily search that runs on a daily basis. The sentiments of the tweets
trend for the terms “toilet paper,” “household cleaners,” and news articles are analyzed using the VADER (Valence
and “canned soup”). We limit the collection scope to Aware Dictionary and sEntiment Reasoner) package in
searches within the metropolitan area of the city in our Python (Gilbert and Hutto 2014), which is widely used in
dataset between January 1 and May 1, 2020. the literature to analyze the sentiment of textual content in
2. Second, we collect data from the social networking plat- media and social media (e.g., Ilk et al. 2020). The package
form Twitter. Specifically, we collect tweets that con- analyzes the input content and produces four output senti-
tain one of the common COVID-19 hashtags defined ment scores: positive, neutral, negative, and compound. The
by Lamsal (2020), as well as tweets that are related to first three scores range between 0 and 1 and represent the
the product categories under our consideration. In other proportion of positive content, neutral content, and negative
words, we collect tweets that either contain one of the content, where the sum of the three scores is 1. Meanwhile,
common COVID-19 hashtags or mention toilet paper, the compound score, which is the score that we use in our
household cleaners, or canned soup in the content or study, normalizes the three scores into a single score. The
both. Similar to our Google Trends data collection, we compound score ranges from 0 (extremely negative) to 1
limit the scope to tweets within the metropolitan area (extremely positive). We have a total of 337 tweets and 484
of the city in our dataset between January 1 and May news articles that fit our selection criteria.4
1, 2020. Based on this collected data, we calculate the To illustrate the trend of the external data we collected,
volume of the tweets as well as their sentiment. we plot the volume of the search terms on Google Trends
3. Third, we collect news article data from three local news that are related to toilet paper in Fig. 3. Interestingly, we
outlets that operate in the same metropolitan area as the observe that the peak of toilet paper sales driven by the
city in our dataset. These three news outlets are the top COVID-19 pandemic on March 13, 2020 (see Fig. 2) coin-
three local news outlets based on their Google search cides with the peak in Fig. 3. In addition, Fig. 4 plots the
ranks. We then collect all the news articles that men- volume of tweets and news articles that contain the terms
tion either the term “COVID-19” or one of the product “COVID-19” and “toilet paper.” Once again, a similar trend
categories under consideration or both between January can be observed. Overall, these two figures provide strong
evidence that external data can be useful in the context of
Footnote 3 (Continued) the predictive tasks studied in this paper. Similar figures for
illegal to collect granular data from competitors and/or manufacturers
and use these data in either predictive or prescriptive models. Mean-
4
while, it is possible to obtain aggregate-level data (e.g., at the geog- Several variables in our collected external data are naturally cor-
raphy, quarter, or industry levels) by purchasing such datasets from related. As shown in Appendix B, however, none of the correlation
third-party providers. Nevertheless, our retail partner felt that the cost coefficients are extreme. Since we are focusing on a predictive task,
of acquiring, cleaning, and ingesting these data was not a priority we measure the usefulness of these variables by comparing our model
given that such external datasets may not be granular enough to be performance before and after including these variables in the model,
meaningfully used in our models. as shown in Tables 5 and 6.
13
AI & SOCIETY
canned soup and household cleaners are available in Appen- performance. The second stage is modular in the sense that
dix A.2. it can be adapted to various business contexts, depending
on the desired definition of an anomaly. We summarize the
design and function of our two-stage model in Fig. 5. The
4 AI model for demand anomaly detection first stage relies on unsupervised machine learning (ML)
methods since data labels are not available in our training
The objective of our demand anomaly detection model is dataset, whereas the second stage is based on supervised
twofold. First, we aim to develop a flexible framework that methods. The overall ML pipeline architecture is presented
is applicable to the COVID-19 pandemic context as well as in Appendix C, whereas the specifics of the ML methodolo-
to other scenarios where demand anomalies may occur (e.g., gies we used are relegated to Appendix D.
natural catastrophes such as hurricanes and severe storms).
Second, given the emergency and criticality of the impact
that the COVID-19 pandemic poses for the distribution of 4.1 First‑stage model: anomaly detection
essential products, we also ensure that our model can be tai-
lored to the specific business requirements of the COVID-19 Our first-stage model aims to identify sales observations that
context. We resolve this dilemma by developing a two-stage are anomalous (e.g., large sharp increases). We construct
model with a flexible framework. The first stage (labeled our dataset using a sliding window mechanism (Sejnowski
anomaly detection) applies an unsupervised anomaly detec- and Rosenberg 1987) to create a time-series dataset. Specifi-
tion model to identify anomalies in sales transactions. The cally, we use a window of 3 h that comprises the aggregated
second stage (labeled anomaly labeling or pertinent classifi- observations of 36 data points, each with a 5-min interval.
cation) is tailored to the COVID-19 situation by taking man- Figure 6 illustrates the construction process of our data. The
agerial requirements into account to classify the detected detailed step-by-step feature preparation process is available
anomalies from the first stage into pertinent anomalies from in Appendix E.
a business perspective. As discussed, this stage also incor- Each observation represents the count of articles sold in
porates external data sources to improve the classification the focal category in a given store during the corresponding
13
AI & SOCIETY
Fig. 6 Structure of the data (for illustrative purposes, we assume the data starts at 9 AM)
interval. We aggregate the products at the category level two approaches, we refer the reader to the aforementioned
to ensure sufficient data variation. However, we highlight papers.5
that the model performance is independent of the aggrega- We train the Donut model using the sales data from 2018
tion level (i.e., our model can be applied at the product or and 2019. Meanwhile, the EIF model is trained on the 2019
sub-category level as long as the underlying data represents data.6 We then apply both models to the sales data in the
a large enough sample with sufficient variation). Since the first 4 months of 2020 (recall that panic-buying behavior in
dataset does not have a label (i.e., there is no label that indi- North America occurred mainly in March and April 2020).
cates whether an observation is anomalous), we adopt ML The output of both models is an anomaly score for each
methods designed to detect anomalous events in an unsu- observation. Observations with an anomaly score higher
pervised fashion. than a certain threshold in at least one model (since Donut
To ensure that our results do not overly depend on a sin- and EIF are complementary approaches) are considered
gle approach, we consider several anomaly detection meth- anomalies, which become inputs for the second-stage model.
ods for the first stage. We ultimately select two methods We treat the threshold in both models as hyperparameters
that detect anomalies using different approaches. The first (i.e., parameters whose values are chosen before training
method is a time-series-based anomaly detection method the algorithm and used to control the learning process) that
called Donut (Xu et al. 2018b). It uses state-of-the-art deep are optimally selected via a cross-validation procedure to
learning variational autoencoders to suppress time-series maximize the performance of our second-stage model.
components (trend, seasonality, and noise) and detect anom-
alies in an unsupervised manner. The second method is a
tree-based model-free anomaly detection algorithm called 5
We also considered several alternative approaches, including a
the extended isolation forest (EIF) (Hariri et al. 2021). The statistical-based method and Prophet (Taylor and Letham 2018). Both
core operation of the EIF is to identify observations that these approaches yielded lower performance relative to our two pri-
mary methods.
differ from the rest of the data. For more details on these 6
We use a different training set for the EIF model for the sake of effi-
ciency. However, the results remain qualitatively the same when we
use the same training set for both models.
13
AI & SOCIETY
4.2 Second‑stage model: pertinent anomaly decision tree (GBDT) (Friedman 2001) as our main method
classification for this stage based on its superior performance. Specifi-
cally, we adopt an efficient implementation framework of
The output of the first stage consists of a list of anomalies the GBDT called LightGBM (Ke et al. 2017). Four groups
detected from a purely statistical perspective. However, even of input features associated with basket statistics, prices and
though these anomalies are significantly different from “nor- promotions, time components, and anomalies are considered
mal” patterns, not all of them lead to adverse consequences in this model, with a total of twenty-nine input variables.
from a business perspective. In our second stage, we label Since anomalies can be detected concurrently across mul-
these anomalies as pertinent and non-pertinent based on tiple stores, the second-stage model is trained using input
managerial expertise. Subsequently, we use a supervised features obtained from all stores in the same region. To this
learning model trained on input features associated with the end, we applied a z-score normalization to the sales data
labeled anomalies to detect pertinent future anomalies. (i.e., quantity and number of transactions) of each store prior
In collaboration with our retail partner, we defined three to computing the input features.7 The complete list of predic-
types of consequences associated with pertinent anomalies tors used in our second-stage model is reported in Table 15
in the context of panic buying: (i) anomalies that trigger a in Appendix F.
stockout of at least one product in the category within the We then split the input data into a 70% training set and a
next 3 days, (ii) anomalies that lead to subsequent anom- 30% test set. The GBDT model is trained using the training
alies in the same category and store (i.e., at least 10% of set, and its performance is evaluated on the test set. All vari-
observations in the next 3 hh in the focal store are identified ables related to external data are lagged by one time period
as anomalies), and (iii) anomalies that lead to a spread of to address a potential reverse causality issue. Ultimately,
anomalies of the same category in other stores (i.e., at least this model predicts for each product category and each 3-h
ten other stores experience more than one anomaly in the window whether an anomaly is pertinent (and provides a
following 3 h). We also vary the time length of the different likelihood score).
definitions to provide an additional layer of flexibility. An
anomaly is formally defined as an observation with a rare 4.3 Anomaly detection results
pattern that deviates significantly from the majority of the
data. In the context of panic buying, an anomaly translates 4.3.1 Detected anomalies
into a very high, sudden spike in demand. As discussed, we
further classify the anomalies into three pertinence types, We first present a visualization of the anomalies detected
depending on their consequences: triggering a stockout, by our first-stage model in Fig. 7. Each dot represents the
leading to subsequent anomalies in the same store, and (normalized) aggregate sales over the category during a
leading to subsequent anomalies in other stores. Finally, 3-h interval. The colors of the dots are only for the sake of
these business requirements used to define the pertinence of visualization and are not used in any of our analyses: gray
anomalies are fully flexible so that practitioners or research- corresponds to no anomaly, yellow to a low score, orange to
ers who adopt our model can rely on our framework and a medium score, and red to a high score. For conciseness,
label pertinent anomalies differently based on their specific
requirements. 7
A z-score normalization refers to normalizing every value in a data-
Similar to the first-stage process, we consider several clas- set such that the mean of all of the values is 0 and the standard devia-
sification models. We finally opt for the gradient-boosting tion is 1.
13
AI & SOCIETY
we focus on the toilet paper category from the largest store flexibility to define the concept of pertinence, depending on
in our dataset (the same plots for the two other product cat- the context under consideration.
egories can be found in Appendix A.3). Recall that each To summarize, Table 1 reports the total number of anom-
observation in our data is an aggregated sales volume of alies detected by each algorithm (Donut and EIF) in the first
all the products in a category during a 3-h time window. stage, as well as the total (i.e., the union of both sets). It also
The x-axis represents the time, and the y-axis represents reports how many of these anomalies are eventually clas-
the (normalized) volume of sales. Intuitively, observations sified as pertinent in the second stage. Interestingly, Donut
with sales higher than a certain threshold are identified as detects a significantly larger number of anomalies than EIF.
anomalies, and the higher the sales, the higher the anomaly At the same time, EIF is more effective than Donut since
score. To allow easy visualization, we assign four colors to the proportion of pertinent anomalies in the second stage is
the anomalies, depending on their severity. much higher. This detail illustrates the complementarity of
Our second-stage model classifies the anomalies detected the two approaches. Note that the total number of pertinent
in the first stage as pertinent or non-pertinent. In Fig. 8, we anomalies reported in the last column is not the sum of the
plot the non-pertinent or false alarms (in black) and the per- total number of pertinent anomalies reported based on Donut
tinent alarms (in other colors) for the toilet paper category. and EIF in the two preceding columns because the same
The non-pertinent observations are detected as anomalies pertinent anomaly could be detected by both Donut and EIF
from a statistical perspective but did not yield business con- in the first stage.
sequences as per our definitions. As we can see, our second- In addition, recall that anomalies detected in the first
stage model can successfully classify anomalies as pertinent, stage are considered pertinent if they fit any of the three defi-
allowing us to remove the non-pertinent anomalies, thus nitions of pertinence presented in Sect. 4.2. Table 2 reports
making the detection process more suitable from a manage- the percentage and number of anomalies that are labeled as
rial perspective. As mentioned before, our method offers the pertinent according to each of our business definitions in
13
AI & SOCIETY
Toilet paper 81% 81% 81% 93% 93% 93% 77% 74% 75%
Canned soup 83% 53% 68% 89% 80% 84% 71% 52% 60%
Household cleaners 86% 68% 76% N/A (no pertinent anomalies) 70% 58% 63%
Table 4 Number of days in which pertinent anomalies are detected in in Table 4. As expected, the pertinent anomalies are mainly
2020 concentrated in March 2020. This finding is a strong valida-
January February March April tion that our approach successfully detects anomalies for the
right period, whereas virtually no anomalies are detected
Toilet paper 0 0 6 1 during periods where we do not expect to see them (since
Canned soup 0 0 22 1 no panic buying was observed). These results support the
Household cleaners 0 0 25 7 validity and correctness of our approach and provide us with
a strong sanity check.
the second stage. Note that household cleaners exhibit no
anomalies that signal a spread to other stores but include
several anomalies that trigger subsequent anomalies in the 4.3.2 The role of external data
same store, as well as anomalies that lead to future stockouts.
As is common in the ML literature (e.g., see Larose We next investigate the impact of incorporating external
2015), we use three complementary metrics: precision (i.e., data sources (Google Trends, Twitter, and news data) into
the percentage of results that are relevant), recall (i.e., the our second-stage model. In other words, we investigate the
percentage of total relevant results correctly classified by improvement in classification performance for models that
the algorithm), and F1 score (i.e., the measure of a model’s incorporate these external data sources relative to a base-
accuracy defined as the harmonic mean of precision and line model that uses only internal data. Here, we report our
recall) to measure the performance of the classification algo- model performance in terms of classifying pertinent anom-
rithms. Note that the accuracy measure would not be suit- alies for different combinations: using only internal data,
able in this case given the skewness of the problem (anoma- using internal data with each type of external data, and using
lies are very rare events with less than 1% occurrence over internal data with all types of external data. For brevity, we
historical data) since all the models would achieve more label an observation as a pertinent anomaly if it satisfies any
than 99% accuracy based on this measure. We report the of our three definitions. Consistent with the prior literature,
overall performance of our second-stage model in Table 3. we find that adding external data helps enhance the perfor-
This table includes the precision, recall, and F1 score of the mance of our classification model. The results are reported
anomaly detection outcomes for the three product catego- in Tables 5 and 6. Observe that the F1 score for the model
ries (toilet paper, canned soup, and household cleaners) with that incorporates all external data sources in the second-
respect to the different definitions of pertinence. Overall, our stage classification is consistently above 90%.
model performs well for toilet paper across all three types To clearly establish the influence of each external data
of pertinent anomalies. It is worth noting that we do not source in the second-stage classification task, we compute
report the classification of spreading anomalies for house- the Shapley value of each predictor used in the classifica-
hold cleaners since there is no pertinent anomaly based on tion model. The notion of the Shapley value was originally
this definition for this product category. Finally, we report developed as a solution concept in the cooperative game
the days in which we identify at least one pertinent anomaly theory literature. The value represents the average expected
Toilet paper 91% 85% 88% 95% 95% 95% 95% 95% 95%
Canned soup 65% 55% 59% 73% 55% 63% 85% 55% 67%
Household cleaners 84% 65% 73% 89% 69% 78% 84% 65% 73%
13
AI & SOCIETY
Fig. 9 Feature importance (Shapley vales) of top ten predictors in the second-stage classification for the toilet paper category
marginal contribution of each player in the game to the (hour of the day and day of the week) also seem to play an
payoff function (Shapley 1953). In recent years, it has been important role. The feature importance plots based on the
widely used to explain the influence of the different predic- Shapley values of the top ten predictors for the canned soup
tors in ML models (e.g., Husain et al. 2016; Ma and Tourani and household cleaners categories are available in Appendix
2020; Pamuru et al. 2022). The choice of this model-inter- A.4.
pretability metric (rather than using the pre-built function
in LightGBM for variable importance) is due to the require- 4.3.3 Alternative specifications
ment of the retail partner to make the interpretability func-
tion model agnostic in case they change the predictive model We next consider several alternatives to the specification of
in the ML pipeline in the future. In the context of this paper, our two-stage model and report several comparison results.
the Shapley value captures the average marginal contribu-
tion of each predictor to the eventual outcome (i.e., whether 4.3.3.1 Varying the definition of triggering anoma‑
the anomaly is pertinent). To this end, we use the SHAP lies Recall that we defined an anomaly as pertinent if it trig-
library (Lundberg and Lee 2017), which relies on scalable gers stockouts within the next 3 days. In Fig. 10, we vary the
additive feature attribution methods, to compute the Shapley length of this definition. Particularly, we consider 1 day and
values. The resulting mean absolute Shapley values can then 7 days as alternative values. As expected, when the length
be used as a measure of global feature importance based on increases, the percentage of detected pertinent anomalies
the magnitude of feature attributions. also increases—in a concave fashion. The fact that our
Figure 9 shows the feature importance derived from the model offers the flexibility to work under various lengths
Shapley values of the top ten predictors (in terms of influ- allows us to adapt the definition of anomaly depending on
ence on the classification task) for the toilet paper category the business requirements.
(a higher value indicates a higher influence on the clas-
sification). As we can see, five out of the top ten features 4.3.3.2 Using single‑stage anomaly detection mod‑
come from external data sources. This finding explains the els Recall that our anomaly detection model relies on two
superior performance of the model that uses external data stages: detecting anomalies (first stage) and classifying them
sources relative to the model that uses only internal data, as as pertinent (second stage). The first stage relies on the time
reported in Tables 5 and 6. In addition, temporal features series of the sales, whereas the second stage uses several
13
AI & SOCIETY
features (e.g., promotions and external data) to perform the Table 7. The second number (on the right) corresponds to
classification task. An alternative approach would be to con- the number of store anomalous days (SAD) that are detected
sider a single-stage anomaly detection model that utilizes all by each method. The first number (on the left) corresponds
the features at once. We next compare the performance of to the number of store anomalous days with actual pertinent
our two-stage model to several single-stage anomaly detec- anomalies (SADPA) among the SAD. The ratio SADPA/
tion methods (vanilla model, EIF multivariate, and Donut SAD is then equivalent to precision at the store-day level.
multivariate). The vanilla model refers to a simple statisti- This number is important because the retailer needs to
cal method whereby anomalies are defined as observations review and react based on the detected anomalies. Thus,
with a value higher (or lower) than two standard deviations incorrectly detected anomalies (i.e., false positives) can
from the mean of the sales in the training dataset. The EIF significantly affect the trust of the users of this tool. As we
and Donut multivariate models are extensions of the (unsu- can observe in Table 7, our two-stage model can effectively
pervised learning) anomaly detection models considered in rule out the majority of anomalies detected by the first stage
Sect. 4.1, where all the features are included in the anomaly that are likely to be non-pertinent. Table 7 showcases three
detection phase (i.e., in the first stage). important benefits of our approach: (i) our model detects a
The results for all three product categories are reported in much larger number of actual pertinent anomalies relative
Table 7. Since our retail partner is particularly interested in to the three single-stage benchmarks; (ii) our model has a
how effective the methods are in detecting pertinent anoma- much lower number of false alarms (e.g., during the peak
lies in each specific store-day pair, we evaluate the methods period in March and April 2020, the false positive rate of the
using two key measures that are reported in each cell of two-stage model over the three categories is approximately
13
AI & SOCIETY
8.8% versus 54.4% when using the vanilla model); and (iii) solving this problem until now with our proposed approach.
our model detects anomalies when it should (March and The second measure is the arrangement of an urgent ship-
April) but not when none exist (January and February). ment to fulfill additional inventory. If necessary, the demand
These results clearly convey the need to consider a two- planner can make a special order request if it is foreseen
stage approach in which the various features are leveraged that certain products will be out-of-stock prior to the next
for classification purposes in the second stage, especially in scheduled delivery. Such an intervention, however, is costly
the empirical context of this study. More importantly, with- and mainly used in exceptional situations when a substantial
out the second stage, the retailer would need to deal with an stockout is anticipated. In this section, we focus on the first
enormous number of detected anomalies that often turn out measure (rationing policy), but one can easily combine our
to be false alarms. Meanwhile, although using traditional simulation tool with the second measure (urgent shipment
single-stage models may be appealing since they are easier to fulfill additional inventory), as discussed in Sect. 5.3. We
to employ and maintain, such an approach can lead to detect- next elaborate on our prescriptive simulation tool and apply
ing a large number of non-pertinent anomalies that can be it to a simulated future panic-buying event to showcase its
costly to the company’s operations. However, these models practical impact.
tend to fail to detect many pertinent anomalies.
So far, our focus has been on the predictive side by devel- 5.1 Prescriptive tool for rationing policies
oping a model that can successfully detect retail demand
anomalies according to different definitions of pertinence. We conduct a scenario analysis at the product-group level in
The next question is how retailers can leverage the insights which a limit c is imposed on the total number of products
generated by our model to improve their operational deci- purchased per customer. For example, the store manager may
sions. We investigate this question by developing a prescrip- decide to allow each customer to purchase no more than
tive simulation tool in the next section. c = 2 packs of toilet paper of any brand. The product groups
on which such a limit can be imposed are predetermined by
the retailer and typically represent a set of products that are
5 Prescriptive retail operations relatively homogeneous and substitutable for each other. Our
amid pandemics prescriptive tool can be used to decide (i) when to activate
such a rationing policy, (ii) for which group of products, and
Once pertinent anomalies are detected, the retailer needs (iii) the best limit value (i.e., the value of c). Taken together,
to react quickly and strategically. In particular, the demand we develop a simulation tool that allows retailers to proac-
planner can analyze the impact of panic buying on the store tively test “what-if” scenarios in response to the detected
inventory to prescribe the necessary actions. More specifi- demand anomalies due to panic buying.
cally, our model will send a signal indicating potentially The total demand for each group of products g is esti-
pertinent anomalies that may affect the operations of the mated by a decomposition model comprising two modeling
retailer; then the retailer can strategically react by deploy- features: (i) the estimated arrival rate of customers for prod-
ing two potential countermeasures. First, the retailer may uct group g, and (ii) the distribution of the number of units
impose a quantity limit per customer for a group of similar (basket distribution) from this product group purchased by a
products, typically at the category or sub-category level. customer. The first input can be estimated from the observed
This type of rationing policy was widely adopted by several sales during the periods with pertinent anomalies, which are
retailers in the first quarter of 2020 and in 2021.8 Ultimately, given by the classification model from Sect. 4.2, adjusted by
such a policy aims to ensure that a higher number of custom- intra-day and intra-week seasonality factors (we omit the
ers can purchase essential products that are currently in high details for conciseness). The second input is the empirical
demand. The simulation model presented in this section can distribution of the number of purchased units per customer
be used to decide when to activate such a rationing policy, in group g obtained from the sales transaction data during
for which categories of products, and which limit value to the same period. The expected total demand for products in
set. To our knowledge, no such data-driven tool has yet been group g under limit c during the time interval (t, t ), where
�
developed. Instead, retailers have heretofore decided when t > t is any subsequent time period, can then be calculated
′
13
AI & SOCIETY
impact. To address this shortcoming, we next present a sim- be computed as Ii,c t = I i (t) − D
̃ �
̃ i,c (t, t� ), where I � i (t) is the
ulation test based on a future panic-buying event. known inventory position of product i at the current time t ,
and D̃ i,c (t, t� ) denotes the total random demand for product i
5.2 Simulation of future panic‑buying waves during the time interval (t, t� ) under limit c . Practically, we
can approximate the expected inventory level of product i
We simulate several relevant scenarios to capture a typical under limit c at time t′ by
wave of panic buying. An event of panic buying is character-
ized by the following three features:
[ ( )] [ ( )]
EP ̃Ii,c t� ≈ I � i (t) − EP D ̃ i,c t, t� ,
13
AI & SOCIETY
and where imposing a lower value will have no additional We simulate panic-buying behavior using actual inven-
benefit. Ultimately, the tool allows retailers to make tory and basket distributions during the panic buying of
informed decisions regarding rationing policies in a data- March 2020. For each simulation scenario, we generate bas-
driven fashion. ket distributions according to a predefined probability distri-
In our simulation, we assume that the consumer demand bution of the number of items in a basket for each category.
would not change should the retailer implement a ration- The number of baskets generated represents the strength of
ing policy. Indeed, in the context of this study, it is highly the panic-buying wave; that is, the stronger the panic-buying
unlikely that consumer demand would be elastic to the retail- wave, the larger the number of baskets (demand). We con-
er’s rationing policy. In other words, since the products in sider three wave-strength configurations: half strength, full
our study are subject to panic buying, it is unlikely that the strength, and double strength. The full-strength configura-
demand would be lower (or higher) as a result of the ration- tion means that the simulation mimics the actual demand
ing policy implemented by the retailer. Consequently, we recorded during the panic buying of March 2020. We simu-
can safely assume that the demand values are independent late the half-strength and double-strength waves by halv-
of the model parameters. Our prescriptive simulation took ing (i.e., 0.5 × March 2020 demand) and doubling (i.e.,
advantage of this distinctive feature. 2 × March 2020 demand) the full-strength wave, respectively.
13
AI & SOCIETY
For each configuration, we generate 10,000 samples and derived from the approximate total demand during the
compute the expected values, which are reported in Tables 8 replenishment lead time. If the approximate on-hand inven-
and 9. More specifically, the data points are uniformly dis- tory is projected to reach this critical level prior to the origi-
tributed between 0.4 and 0.6 for the half-strength simula- nal replenishment date, then the demand planner can con-
tion, between 0.8 and 1.2 for the full-strength simulation, sider placing a special request through the fulfillment system
and between 1.6 and 2.4 for the double-strength simulation to replenish the product. The recommended minimum
(each simulation includes a ± 20% range around the nominal replenishment quantity Qi,c of product i under limit c must
wave strength). The initial inventory value considered for be sufficiently high to ensure that the inventory can satisfy
the simulation is the actual inventory value as of March 12, the estimated demand[until ( the next
) replenishment
] period R,
2020, and we also consider a scenario in which the initial that is, Qi,c = argmin EP ̃Ii,c (R) + q ≥ 0 .
inventory is doubled. Finally, the different scenarios repre-
q
The above extension to making urgent inventory deci-
sent the rationing policies implemented where the number
sions shows how our proposed anomaly detection model can
of items per basket is capped at 1, 2, or 3, or no limit is set.
be leveraged to strategically adapt future inventory decisions
We report the number of baskets served during a period of
to mitigate the occurrence of stockouts and, ultimately, serve
7 or 14 days, as well as the number of days before running
a larger number of customers.
out-of-stock. As observed, by imposing a rationing policy
In summary, our tool’s first mission was to successfully
informed by our proposed model, the retailer can extend the
detect pertinent anomalies in the context of panic buying.
number of inventory coverage days and increase the number
Such anomalies may have adverse future consequences for
of baskets served, both of which have direct implications for
the retailer. It is then the retailer’s decision how to respond:
consumer surplus and social welfare since the product will
either by imposing an informed rationing policy or by plac-
more likely be available on the shelf.
ing an urgent inventory order (or both). Our simulation tool
can guide retailers in making such critical decisions.
5.3 Extension: urgent shipment to fulfill additional
inventory 6 Implications and conclusion
One can extend our simulation tool to prescribing urgent
In this section, we first present the practical implications
inventory shipments. To do so, the output of the scenario
of our AI-based framework. We then discuss how both the
analysis is used to produce an inventory estimate for the near
predictive and prescriptive tools developed in this paper can
future under a specific quantity limit for each item in the
be adapted and used by a wide range of retailers. We close
category to determine if an urgent shipment is necessary.
by reporting our conclusions.
This item-level expected demand estimate can be expressed
a � s
� �� � � � 6.1 Practical implications
EP D̃ i,c t, t� = ∑t λg (s)∑∞ min{̃
s=t k=1
kg , c}P ̃
k g = k 𝜙 i (s) ,
where 𝜙i (s) is the estimated proportion of the demand of As discussed, this work was directly motivated by practical
product i in group g in period s. For example, we can deter- considerations. In particular, the panic-buying behavior amid
mine 𝜙i (s) using the commonly employed multinomial logit the COVID-19 pandemic caught most retailers off guard as
function based on the sales in the same group of products. they were not ready for such unprecedented panic-buying
More specifically, we denote by mi the total sales of product waves. In this paper, we propose an AI-based framework to
i recently observed during periods with pertinent anomalies detect early signals of panic-buying events, which are cast
and by Ag,c (t), the list of available products in group g at time as anomalies. We then propose a prescriptive tool that can
t under limit c. We then have 𝜙i (t) = ∑ i m . Note that this
m
react to mitigate the adverse impact of panic-buying events.
Overall, our framework includes predictive (detecting differ-
j∈Ag,c (t) j
13
AI & SOCIETY
detected anomaly (which store, which category of products, enhanced through the use of data and additional features
the type of anomaly, and its severity score, along with inter- in such contexts. More specifically, pharmacies can further
pretability measures). In addition, at the end of each day, a incorporate drug-consumption coverage in the anomaly
formal report will be sent to the store managers regarding classification model to enhance the prediction of pertinent
the specifics of the detected anomalies. It will then be up to anomalies. Electronics retailers can leverage product return
the store managers to decide whether to take any prescriptive data in the ML-based classification model in a similar fash-
action to preempt the impact of the detected anomalies (e.g., ion. The outputs from the anomaly detection model can then
a rationing policy or increasing inventory orders). Interest- be used to analyze the impact of imposing different rationing
ingly, each store manager will have the flexibility to define policies, as presented in Sect. 5.1.
the types of anomalies they would like to flag and detect. Finally, the framework proposed in this paper can also
Finally, in the event of severe panic buying, our prescrip- be applied to e-commerce. The unsupervised autoencoder
tive tool will assist the retailer in setting the right rationing model we used has been effectively applied to a large-scale
policy for the right set of products at the right time. Ulti- web application to detect and analyze anomalies in web traf-
mately, our framework can be seen as preventive and can sig- fic every minute (Xu et al. 2018a). In addition to the main
nificantly help retailers during challenging and unexpected features presented in Appendix F, online retailers can further
times. Our method can increase access to essential products leverage customer traffic and clickstream data to enhance the
and mitigate prolonged stockouts, which are detrimental to performance of the classification model. Our tool can then
a store’s reputation. be deployed in an automated manner to analyze online retail
sales and provide recommendations on quantity rationing,
6.2 Generalizability product offerings, and inventory fulfillment in real time.
13
AI & SOCIETY
scenarios to strategically react and properly decide when and how to activate rationing policies. Finally, we simulated a future panic-buying wave to showcase the practical impact of our tool.

Nevertheless, our research has limitations that offer highly promising avenues for future research. First, our model was developed to detect anomalies based on sales data and not on actual customer demand realizations. Given that observed sales are censored demand, it would be interesting to extend our method to uncensored demand data to truly detect demand anomalies. Second, our study was only concerned with potential strategic reactions to panic buying by retailers, whereas reactions from manufacturers are left as a future research direction. Lastly, while the counterfactual simulation results from our prescriptive analytics show that our proposed model positively impacts consumer surplus and social welfare by improving product availability, future research could aim to quantify this impact more formally.

In addition to its research impact, we believe that our method can yield a substantial societal impact. Specifically, it can help increase access to essential products during panic-buying waves. We have shown that a simple proactive strategy based on our results could have mitigated the overwhelming retail stockouts observed in March 2020 in North America and significantly increased access to essential products. This work demonstrates that by effectively leveraging suitable AI methods on large amounts of data, the retail world can be better for both firms and consumers.

Appendix A: Additional plots

Weekly sales for canned soup and household cleaners categories

See Appendix Figs. 11, 12.
Fig. 11 Canned soup weekly sales from January to May 2020 across 42 stores (the first week in the x-axis corresponds to the week of January
5). The y-axis is normalized for anonymity
Fig. 12 Household cleaners weekly sales from January to May 2020 across 42 stores (the first week in the x-axis corresponds to the week of
January 5). The y-axis is normalized for anonymity
Fig. 19 Feature importance (Shapley values) of top ten predictors in the second-stage classification for the canned soup category
Fig. 20 Feature importance (Shapley values) of top ten predictors in the second-stage classification for the household cleaners category
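Shapley-value feature importances such as those in Figs. 19 and 20 are commonly obtained by pairing a gradient-boosted classifier with the shap library (Lundberg and Lee 2017). The sketch below is only illustrative: the features, labels, and model settings are synthetic placeholders, not the paper's second-stage classifier.

```python
# Hedged sketch: mean |SHAP| feature importance for a LightGBM classifier
# trained on synthetic data (placeholder for the second-stage classifier).
import numpy as np
import pandas as pd
import lightgbm as lgb
import shap

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 6)),
                 columns=[f"feature_{k}" for k in range(6)])  # placeholder feature names
y = (X["feature_0"] + 0.5 * X["feature_3"]
     + rng.normal(scale=0.5, size=500) > 0).astype(int)       # synthetic binary label

clf = lgb.LGBMClassifier(n_estimators=200, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X)
if isinstance(shap_values, list):       # older shap: one array per class
    shap_values = shap_values[1]
shap_values = np.asarray(shap_values)
if shap_values.ndim == 3:               # newer shap: (n_samples, n_features, n_classes)
    shap_values = shap_values[:, :, 1]

# Rank features by mean absolute Shapley value (the quantity behind such plots).
importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False).head(10))
```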
Table 10 Pearson correlation coefficients of all the variables for the toilet paper category (columns (1)–(10) follow the same order as the rows)

(1) Trend                      1.0000   0.4798   0.0826   0.0275   0.0560   0.2984   0.3766   0.5719   0.0173   0.4251
(2) Trend COVID                0.4798   1.0000   0.1390   0.1521   0.4220   0.8278   0.1919   0.3988   0.3852   0.8400
(3) News sentiment             0.0826   0.1390   1.0000   0.8496   0.1765   0.2773  -0.1728  -0.0775   0.2056   0.1253
(4) News frequency             0.0275   0.1521   0.8496   1.0000   0.1644   0.2682  -0.1203  -0.0719   0.2551   0.1250
(5) News COVID sentiment       0.0560   0.4220   0.1765   0.1644   1.0000   0.4509   0.0053  -0.0190   0.3819   0.3270
(6) News COVID frequency       0.2984   0.8278   0.2773   0.2682   0.4509   1.0000   0.0900   0.1970   0.4040   0.7371
(7) Twitter sentiment          0.3766   0.1919  -0.1728  -0.1203   0.0053   0.0900   1.0000   0.0642   0.0000   0.0573
(8) Twitter frequency          0.5719   0.3988  -0.0775  -0.0719  -0.0190   0.1970   0.0642   1.0000   0.0042   0.6532
(9) Twitter COVID sentiment    0.0173   0.3852   0.2056   0.2551   0.3819   0.4040   0.0000   0.0042   1.0000   0.3344
(10) Twitter COVID frequency   0.4251   0.8400   0.1253   0.1250   0.3270   0.7371   0.0573   0.6532   0.3344   1.0000
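A correlation matrix of this kind can be computed directly from the daily external-signal series. The sketch below uses pandas on synthetic columns that merely stand in for the actual Google Trends, news, and Twitter variables.

```python
# Hedged sketch: Pearson correlations between daily external signals
# (synthetic series standing in for the Google Trends, news, and Twitter data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
days = pd.date_range("2020-01-01", "2020-05-01", freq="D")
signals = pd.DataFrame({
    "trend": rng.random(len(days)),
    "trend_covid": rng.random(len(days)),
    "news_sentiment": rng.normal(size=len(days)),
    "twitter_frequency": rng.random(len(days)),
}, index=days)

corr_matrix = signals.corr(method="pearson").round(4)  # Pearson is the default method
print(corr_matrix)
```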
trend of the terms "toilet paper," "household cleaners," and "canned soup"). We limit the collection scope to searches within the metropolitan city in our dataset between January 1 and May 1, 2020.
2. Second, we collect data from the social networking platform Twitter. Specifically, we collect the tweets that contain one of the common COVID-19 hashtags as defined in Lamsal (2020), and tweets that are related to the product categories under our consideration. In other words, we collect the tweets that either contain one of the common COVID-19 hashtags or mention toilet paper, household cleaners, or canned soup in the content, or both. Similar to our Google Trends data collection, we limit the scope to tweets within the metropolitan city in our dataset between January 1 and May 1, 2020. Based on this collected data, we calculate the volume of the tweets as well as their sentiment.
3. Third, we collect news article data from three local news outlets that operate in the same metropolitan city as our dataset. These three news outlets are the top three local news outlets based on the Google search rank. We then collect all the news articles that mention the term COVID-19, or mention one of the product categories under consideration, or both, between January 1 and May 1, 2020. Based on this dataset, we calculate the volume of the news articles and their sentiment.

We report the Pearson correlation coefficients of all the variables for the toilet paper category in Table 10. Similar patterns were also observed for other product categories.

Appendix C: Overview of two-stage AI pipeline architecture

The architecture of the AI pipeline implemented on the cloud platform is shown in Fig. 21. The main data used in this pipeline comprise (i) transaction data, (ii) inventory and stockout records, and (iii) external data collected from external sources (Google Trends, Twitter, and news). Historical data are processed and stored in the database, whereas live data are connected to the company's data streams.
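To illustrate the kind of computation behind the tweet volume and sentiment variables described above, here is a minimal, non-authoritative sketch using the VADER analyzer cited in the references (Gilbert and Hutto 2014); the sample tweets, column names, and aggregation choices are placeholders rather than the actual processing code.

```python
# Hedged sketch: daily tweet volume and average VADER compound sentiment
# (toy records standing in for the collected tweets).
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

tweets = pd.DataFrame({
    "created_at": pd.to_datetime(["2020-03-12", "2020-03-12", "2020-03-13"]),
    "text": [
        "No toilet paper left at my local store #covid19",
        "Stocking up on canned soup, shelves are emptying fast",
        "Grocery run was calm today, plenty of household cleaners",
    ],
})

analyzer = SentimentIntensityAnalyzer()
tweets["sentiment"] = tweets["text"].apply(
    lambda t: analyzer.polarity_scores(t)["compound"]  # compound score in [-1, 1]
)

daily = tweets.groupby(tweets["created_at"].dt.date).agg(
    twitter_frequency=("text", "size"),
    twitter_sentiment=("sentiment", "mean"),
)
print(daily)
```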
Footnotes:
10 https://mlflow.org/
11 https://github.com/sahandha/eif
12 https://github.com/NetManAIOps/donut
13 https://lightgbm.readthedocs.io
1. Aggregate sales data: We first aggregate the transaction sales data by category into the total sales quantity of each 5-min interval. An example of the aggregate sales data is provided below.

   Sales quantity  2  1  3  2  …  3  2  4  0  2  5  …  3  2  4  0  2  …

2. Input features for anomaly detection: The feature inputs of the ML-based anomaly detection models are created using the sliding window method (Sejnowski and Rosenberg 1987). To create input features, we use a window of 3 h that comprises aggregated observations of 36 data points, each composed of a 5-min interval at the store level. More specifically, the first observation based on the above example comprises the sales quantities of each 5-min interval that has occurred between 11:00 and 14:00, the second observation corresponds to the sales quantities of each 5-min interval that has occurred between 11:05 and 14:05, and so on. We also
conduct a small computational experiment to vary the time window and find no meaningful implications.
3. Outputs of first-stage model: The feature inputs generated in Step 2 are used in the first-stage model, which indicates whether or not each observation in the input is considered as an anomaly, as shown in the last column of Table 13. Since our framework leverages two ML-based anomaly detection models (EIF and Donut), an observation is considered anomalous if it is flagged as an anomaly by at least one of the anomaly detection models (a minimal code sketch of Steps 2 and 3 is provided below).
4. Input features for pertinent anomaly classification: In this step, we only retain the anomalies flagged in the first-stage model (i.e., labeled as Y in Table 13). For each detected anomaly, we obtain the input features (i.e., predictors) for the second-stage model to classify pertinent anomalies using both internal and external data sources. As shown in Table 14, we create 29 input variables (as listed in Table 15) derived from internal data and 5 input variables derived from Google Trends, Twitter, and news data as described in Sect. 3. Since we use the inputs from multiple stores in the same region in the pertinent anomaly classification model, the sales data (i.e., quantity and number of baskets) of each store are transformed using a z-score normalization. In other words, features 1–15 in Table 15 are computed using the normalized sales data from each store. In the last column, we create the label indicating whether the anomaly is pertinent by checking if one of the consequences described in Sect. 4.2 has occurred. This column is used as the label in the supervised ML model.

Appendix F: List of features

See Appendix Table 15.
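As referenced in Step 3 above, the following is a rough sketch of the windowing and flag-union logic of Steps 2 and 3. It is only a sketch: scikit-learn's IsolationForest is used here as a placeholder for the EIF and Donut models actually used in the framework, and the 5-min sales series is synthetic.

```python
# Hedged sketch of Steps 2-3: sliding 3-hour windows over 5-min sales
# aggregates, then a union of the flags produced by two anomaly detectors.
# NOTE: IsolationForest stands in for the EIF and Donut models of the paper.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
sales_5min = rng.poisson(lam=3, size=500).astype(float)  # synthetic 5-min sales quantities

WINDOW = 36  # 36 x 5 min = 3 h
windows = np.lib.stride_tricks.sliding_window_view(sales_5min, WINDOW)  # one row per observation

detector_a = IsolationForest(random_state=0).fit(windows)
detector_b = IsolationForest(n_estimators=200, random_state=1).fit(windows)

# predict() returns -1 for anomalies and 1 for normal observations.
flags_a = detector_a.predict(windows) == -1
flags_b = detector_b.predict(windows) == -1

is_anomaly = flags_a | flags_b  # Step 3: flagged by at least one detector
print(f"{is_anomaly.sum()} of {len(windows)} windows flagged as anomalies")
```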
Acknowledgements The authors are listed in alphabetical order. The third author is the main contributor. The authors would like to thank the retail partner, IVADO Labs, and SCALE AI that made this work possible. We also thank Gregg Gilbert, Michael Krause, and Mehdi Ait Younes for their insightful comments that helped improve this paper.

Author contributions Conceptualization: YA, MC, WK-a-n. Methodology: OB, AC. Investigation: YA, OB, AC, MC, WK-a-n. Visualization: YA, OB, AC, MC, WK-a-n. Writing: YA, MC, WK-a-n.

Funding This work is financially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) grant number RGPIN-2021-02657. The first and fourth authors are part-time advisors to IVADO Labs, and the fifth author was a part-time advisor to the same organization when this research was completed. There are no competing interests to declare that are relevant to the content of this article.

Data availability The data that support the findings of this study are supplied by our retail partner, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the retail partner.

Declarations

Conflict of interest The authors declare no competing interests.

References

Anstead N, O'Loughlin B (2015) Social media analysis and public opinion: the 2010 UK general election. J Comput-Mediat Commun 20(2):204–220
Arafat SY, Kar SK, Marthoenis M, Sharma P, Apu EH, Kabir R (2020) Psychological underpinning of panic buying during pandemic (COVID-19). Psychiatry Res 289:113061
Armani AM, Hurt DE, Hwang D, McCarthy MC, Scholtz A (2020) Low-tech solutions for the COVID-19 supply chain crisis. Nat Rev Mater 5:1–4
Arumita A (2020) Changes in the structure and system of the shopping center area due to COVID-19. Available at SSRN 3590973
Bakshy E, Messing S, Adamic LA (2015) Exposure to ideologically diverse news and opinion on Facebook. Science 348(6239):1130–1132
Birim S, Kazancoglu I, Mangla SK, Kahraman A, Kazancoglu Y (2022) The derived demand for advertising expenses and implications on sustainability: a comparative study using deep learning and traditional machine learning methods. Ann Oper Res (Forthcoming)
Chakraborti R, Roberts G (2020) Learning to hoard: the effects of preexisting and surprise price-gouging regulation during the COVID-19 pandemic. Available at SSRN: https://ssrn.com/abstract=3672300
Chalapathy R, Chawla S (2019) Deep learning for anomaly detection: a survey. arXiv preprint arXiv:1901.03407
Chen H, De P, Hu YJ, Hwang BH (2014) Wisdom of crowds: the value of stock opinions transmitted through social media. Rev Financ Stud 27(5):1367–1403
Choi TM, Chan HK, Yue X (2016) Recent development in big data analytics for business operations and risk management. IEEE Trans Cybern 47(1):81–92
Choi TM, Wallace SW, Wang Y (2018) Big data analytics in operations management. Prod Oper Manag 27(10):1868–1883
Cohen MC, Leung NHZ, Panchamgam K, Perakis G, Smith A (2017) The impact of linear optimization on promotion planning. Oper Res 65(2):446–468
Cohen MC, Dahan S, Rule C (2022a) Conflict analytics: when data science meets dispute resolution. Manag Bus Rev 2(2):86–93
Cohen MC, Perakis G, Thraves C (2022b) Consumer surplus under demand uncertainty. Prod Oper Manag 31(2):478–494
Cohen MC, Dahan S, Khern-am-nuai W, Shimao H, Touboul J (2023) The use of AI in legal systems: determining independent contractor vs. employee status. Artif Intell Law (Forthcoming)
Croson R, Donohue K, Katok E, Sterman J (2014) Order stability in supply chains: coordination risk and the role of coordination stock. Prod Oper Manag 23(2):176–196
Cui R, Li M, Zhang S (2022) AI and procurement. Manuf Serv Oper Manag 24(2):691–706
Edmiston J (2020) 'It's madness': panic buying leaves long lines and empty shelves at grocers across country. https://financialpost.com/news/retail-marketing/its-madness-panic-buying-leaves-long-lines-and-empty-shelves-at-grocers-across-country. Last accessed 29-August-2020
Fisher M, Raman A (2018) Using data and big data in retailing. Prod Oper Manag 27(9):1665–1669
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
Furutani K (2020) People in Japan are panic-buying toilet paper due to COVID-19 coronavirus. https://www.timeout.com/tokyo/news/people-in-japan-are-panic-buying-toilet-paper-due-to-covid-19-coronavirus-030220. Last accessed 29-August-2020
Gaikar D, Marakarkandy B (2015) Product sales prediction based on sentiment analysis using Twitter data. Int J Comput Sci Inf Technol (IJCSIT) 6(3):2303–2313
Gilbert C, Hutto E (2014) VADER: a parsimonious rule-based model for sentiment analysis of social media text. In: Eighth International Conference on Weblogs and Social Media (ICWSM-14), vol 81, p 82
Gopal VG (2021) How changes in consumer preferences and buying behaviour have caused more stock outs in 2021. https://startupsmagazine.co.uk/article-how-changes-consumer-preferences-and-buying-behaviour-have-caused-more-stock-outs-2021. Last accessed 21-March-2021
Hamister JW, Magazine MJ, Polak GG (2018) Integrating analytics through the big data information chain: a case from supply chain management. J Bus Logist 39(3):220–230
Han BR, Sun T, Chu LY, Wu L (2020) COVID-19 and e-commerce operations: evidence from Alibaba. Available at SSRN: https://ssrn.com/abstract=3654859
Hancock J, Khoshgoftaar TM (2021) Leveraging LightGBM for categorical big data. In: 2021 IEEE Seventh International Conference on Big Data Computing Service and Applications (BigDataService), pp 149–154
Hariri S, Kind MC, Brunner RJ (2021) Extended isolation forest. IEEE Trans Knowl Data Eng 33(4):1479–1489
Hong J, Liu CC, Govindarasu M (2014) Integrated anomaly detection for cyber security of the substations. IEEE Trans Smart Grid 5(4):1643–1653
Husain W, Xin LK, Jothi N et al (2016) Predicting generalized anxiety disorder among women using random forest approach. In: 2016 3rd International Conference on Computer and Information Sciences (ICCOINS), pp 37–42
Ilk N, Shang G, Goes P (2020) Improving customer routing in contact centers: an automated triage design based on text analytics. J Oper Manag 66(5):553–577
Ivanov D, Dolgui A (2020) Viability of intertwined supply networks: extending the supply chain resilience angles towards survivability. A position paper motivated by COVID-19 outbreak. Int J Prod Res 58(10):2904–2915
Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu TY (2017) LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 3146–3154
Khern-am-nuai W, So H, Cohen MC, Adulyasak Y (2022) Selecting cover images for restaurant reviews: AI vs. wisdom of the crowd. Available at SSRN: https://ssrn.com/abstract=3808667
Kingma DP, Welling M (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114
Lamsal R (2020) Coronavirus (COVID-19) tweets dataset. https://doi.org/10.21227/781w-ef42
Larose DT (2015) Data mining and predictive analytics. John Wiley & Sons
Leswing K (2021) Why there's a chip shortage that's hurting everything from the PlayStation 5 to the Chevy Malibu. https://www.cnbc.com/2021/02/10/whats-causing-the-chip-shortage-affecting-ps5-cars-and-more.html. Last accessed 21-March-2021
Li S, Zhang Z, Liu Y, Ng S (2021) The closer I am, the safer I feel: the "distance proximity effect" of COVID-19 pandemic on individuals' risk assessment and irrational consumption. Psychol Mark 38(11):2006–2018
Lins S, Koch R, Aquino S, de Freitas MC, Costa IM (2021) Anxiety, depression, and stress: can mental health variables predict panic buying? J Psychiatr Res 144:434–440
Liu FT, Ting KM, Zhou ZH (2008) Isolation forest. In: Eighth IEEE International Conference on Data Mining, pp 413–422
Liu FT, Ting KM, Zhou ZH (2012) Isolation-based anomaly detection. ACM Trans Knowl Discov Data (TKDD) 6(1):1–39
Lufkin B (2020) Coronavirus: the psychology of panic buying. https://www.bbc.com/worklife/article/20200304-coronavirus-covid-19-update-why-people-are-stockpiling. Last accessed 29-August-2020
Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. Adv Neural Inf Process Syst (NeurIPS) 30:4765–4774
Ma S, Tourani R (2020) Predictive and causal implications of using Shapley value for model interpretation. In: Proceedings of the 2020 KDD Workshop on Causal Discovery, pp 23–38
Makridakis S, Spiliotis E, Assimakopoulos V, Chen Z, Gaba A, Tsetlin I, Winkler RL (2021) The M5 uncertainty competition: results, findings and conclusions. Int J Forecast
Mehrotra KG, Mohan CK, Huang H (2017) Anomaly detection principles and algorithms. Springer
Mitchell TW (1924) Competitive illusion as a cause of business cycles. Q J Econ 38(4):631–652
Naeem M, Ozuem W (2021) Customers' social interactions and panic buying behavior: insights from social media practices. J Consum Behav 20:1191–1203
Pamuru V, Kar W, Khern-am-nuai W (2022) Status downgrade: the impact of losing status on a user generated content platform. Available at SSRN: https://ssrn.com/abstract=3963415
Paula EL, Ladeira M, Carvalho RN, Marzagao T (2016) Deep learning anomaly detection as support fraud investigation in Brazilian exports and anti-money laundering. In: 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), pp 954–960
Perera HN, Fahimnia B, Tokar T (2020) Inventory and ordering decisions: a systematic review on research driven through behavioral experiments. Int J Oper Prod Manag 40(7/8):997–1039
Prentice C, Chen J, Stantic B (2020) Timed intervention in COVID-19 and panic buying. J Retail Consum Serv 57:102203
Qi M, Shi Y, Qi Y, Ma C, Yuan R, Wu D, Shen ZJM (2020) A practical end-to-end inventory management model with deep learning. Available at SSRN: https://ssrn.com/abstract=3737780
Qin L, Sun Q, Wang Y, Wu KF, Chen M, Shia BC, Wu SY (2020) Prediction of number of cases of 2019 novel coronavirus (COVID-19) using social media search index. Int J Environ Res Public Health 17(7):2365
Sabic E, Keeley D, Henderson B, Nannemann S (2021) Healthcare and anomaly detection: using machine learning to predict anomalies in heart rate data. AI Soc 36(1):149–158
Salem O, Guerassimov A, Mehaoua A, Marcus A, Furht B (2013) Sensor fault and patient anomaly detection and classification in medical wireless sensor networks. In: IEEE International Conference on Communications (ICC), pp 4373–4378
Samaras L, Garcia-Barriocanal E, Sicilia MA (2020) Comparing social media and Google to detect and predict severe epidemics. Sci Rep 10(1):1–11
Sanders NR, Ganeshan R (2018) Big data in supply chain management. Prod Oper Manag 27(10):1745–1748
Sejnowski TJ, Rosenberg CR (1987) Parallel networks that learn to pronounce English text. Complex Syst 1(1):145–168
Settanni E (2020) Those who do not move, do not notice their (supply) chains—inconvenient lessons from disruptions related to COVID-19. AI Soc 35(4):1065–1071
Shapley LS (1953) A value for n-person games. Contrib Theory Games 2(28):307–317
Shimao H, Khern-am-nuai W, Kannan K, Cohen MC (2022) Strategic best response fairness in fair machine learning. In: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, pp 664–664
Shin D (2022) How do people judge the credibility of algorithmic sources? AI Soc 37:81–96
Shin D (2023) Algorithms, humans, and interactions: how do algorithms interact with people? Designing meaningful AI experiences. Taylor & Francis
Shin D, Kee KF, Shin EY (2022a) Algorithm awareness: why user awareness is critical for personal privacy in the adoption of algorithmic platforms? Int J Inf Manage 65:102494
Shin D, Lim JS, Ahmad N, Ibahrine M (2022b) Understanding user sensemaking in fairness and transparency in algorithms: algorithmic sensemaking in over-the-top platform. AI Soc 1–14
Sodhi M, Tang C (2020) Supply chain management for extreme conditions: research opportunities. J Supply Chain Manag
Sterman JD, Dogan G (2015) "I'm not hoarding, I'm just stocking up before the hoarders get here": behavioral causes of phantom ordering in supply chains. J Oper Manag 39:6–22
Tanlamai J, Khern-am-nuai W, Adulyasak Y (2022) Arbitrage opportunities predictions in retail markets and the role of user-generated content. Available at SSRN: https://ssrn.com/abstract=3764048
Taylor SJ, Letham B (2018) Forecasting at scale. Am Stat 72(1):37–45
Tillett A (2020) Medicines rationed to stop panic buying. https://www.afr.com/politics/federal/medicines-rationed-to-stop-panic-buying-20200319-p54bsl. Last accessed 21-March-2021
Tsyganov V (2021) Artificial intelligence, public control, and supply of a vital commodity like COVID-19 vaccine. AI Soc 1–10
van Noordt C, Misuraca G (2022) Artificial intelligence for the public sector: results of landscaping the use of AI in government across the European Union. Gov Inf Q 39(3):101714
Wang G, Gunasekaran A, Ngai EW, Papadopoulos T (2016) Big data analytics in logistics and supply chain management: certain investigations for research and applications. Int J Prod Econ 176:98–110
Wu PJ, Chien CL (2021) AI-based quality risk management in omnichannel operations: O2O food dissimilarity. Comput Ind Eng 160:107556
Xu H, Chen W, Zhao N, Li Z, Bu J, Li Z, Liu Y, Zhao Y, Pei D, Feng Y, Chen J, Wang Z, Qiao H (2018a) Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications. arXiv preprint arXiv:1802.03903
Xu H, Chen W, Zhao N, Li Z, Bu J, Li Z, Liu Y, Zhao Y, Pei D, Feng Y et al (2018b) Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications. In: Proceedings of the 2018 World Wide Web Conference, pp 187–196
Yeoh W, Koronios A (2010) Critical success factors for business intelligence systems. J Comput Inf Syst 50(3):23–32
Zheng R, Shou B, Yang J (2021) Supply disruption management under consumer panic buying and social learning effects. Omega 101:102238

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.