Big Data Analytics in Operation Management
Accepted Manuscript
Title: Big Data Analytics in Operations Management
DOI: https://doi.org/10.1111/poms.12838
Reference: POMS 12838
To appear in: Production and Operations Management
Please cite this article as: Choi, Tsan-Ming, et al., Big Data
Analytics in Operations Management. Production and Operations Management (2017),
https://doi.org/10.1111/poms.12838
This article has been accepted for publication and undergone full peer review but has not
been through the copyediting, typesetting, pagination and proofreading process, which may
lead to differences between this version and the Version of Record. Please cite this article as
doi: 10.1111/poms.12838
Article Type: Original Article
Tsan-Ming Choi (corresponding author)
Business Division, Institute of Textiles and Clothing, Faculty of Applied Science and Textiles,
The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong.
Stein W. Wallace
Yulan Wang
Faculty of Business, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Abstract: Big data analytics is critical in modern operations management (OM). In this paper, we first
examine existing big-data-related analytics techniques and identify their strengths, weaknesses,
and major functionalities. We then discuss various big data analytics strategies for overcoming
the associated computational and data challenges. After that, we examine the literature and discuss
how different types of big data methods (techniques, strategies, and architectures) can be applied to
different OM topical areas, namely forecasting, inventory management, revenue management and
marketing, transportation management, supply chain management, and risk analysis. We also
investigate real-world applications of big data analytics in top branded enterprises. Finally, we
conclude the paper with a discussion of a future research agenda.
Key Words: Big data analytics, big data methods, operations management, data-driven optimization,
applications.
History: Received: 8 December 2017; accepted: 20 December 2017 by Kalyan Singhal after one
revision.
1. Introduction
1.1. Background
We are now in the big data era. The Internet of Things (IoT), cloud computing (Passacantando et al.
2016), wireless sensor networks (Takaishi et al. 2014; Ding et al. 2016), and social media are all
commonly used terms related to big data in our everyday lives. Big data here refers to the
situation in which a dataset exhibits several characteristics, such as high volume, high variety, and
high velocity[2].
In the big data era, new challenges emerge regarding the computing requirements of, and the
strategies for, conducting OM analysis. In particular, more and more companies and
organizations are employing big-data-related technologies such as information and communication
technology (ICT), enterprise resource planning (ERP) systems, cloud computing, IoT, and social
media in their operations. All these sensor and computing systems store and manipulate massive
amounts of data that are highly heterogeneous (including both structured and unstructured data
points) and diversified (Drosou et al. 2017), and that require very fast processing. This requirement
has driven the rapid development of big data analytics, which motivates this paper.
Specifically, in this paper we first search the OM literature and review existing big data
analytics techniques and strategies. We then provide a concise review of the literature in several
important OM topical areas. After that, we discuss how big data methods (techniques, strategies, and
architectures) can be applied in these topical areas, namely forecasting, inventory management,
revenue management and marketing, transportation management, supply chain management, and
risk analysis. We also examine some real-world applications of big data analytics in top enterprises.
Finally, we conclude the paper with a discussion of a future research agenda.
1.2. Methodology
We do not intend to provide an exhaustive review of the topic. Instead, we focus on papers
published in SCI/SSCI journals in the operations research and management science category,
searched via the Web of Science portal. We supplement this with Google Scholar
searches using primary keywords such as “big data”, “data driven”, and “data analytics”, supplemented by
[2] There are other “V”s related to big data; see Choi, Gao, et al. (2017).
[3] In this paper, OM includes management science and operations research, with an emphasis on employing
analytical and scientific methods.
[4] In this paper, “big data analytics” is treated as a singular term.
Most recently, Guha and Kumar (2017) discuss the emergence of big data research and
examine the topic from three perspectives: information systems, operations and
supply chain management, and healthcare. Feng and Shanthikumar (2017) propose analytical
models for potential future research in manufacturing and demand management in the big data era.
Fisher and Raman (2017) explore the use of big data in retail operations. Although there are a number of
review papers on big data analytics, to the best of our knowledge none of them explicitly highlights the
OM studies that use big data analytics methods (techniques, strategies, and architectures) or discusses
how big data analytics maps onto OM and the respective applications. This paper bridges this
important gap and positions itself as the pioneering review on the topic.
[5] We thank Professor Kalyan Singhal for recommending a few important related papers to us.
Big data analytics involves processing data from different sources in different formats. For
example, data can come from the web, social media, ERP systems, and cloud platforms, and they can
be given in text, graphic, audio, and video formats. This has given rise to terms such as web
analytics, social analytics, network analytics, text analytics, and multi-media analytics (see Chen et al.
(2012) and Hu et al. (2014)). In addition, data processing schemes can be split into three types,
namely batch processing, real-time (or near-real-time) stream processing, and interactive processing
(with human interactive inputs and outputs). Each type of processing is supported by dedicated
technological platforms (Chen and Zhang 2014). For instance, Apache Hadoop[6] is probably the best-known
batch processing software platform; it implements Map/Reduce, a computational paradigm that follows the
divide-and-conquer strategy. Dryad and Pentaho Business Analytics are other
examples of batch processing platforms. For real-time stream processing, SAP HANA is one software
platform, and Storm and S4 are well-established real-time streaming systems that support big data
analytics. For interactive processing, Google's Dremel and Apache Drill are examples. As
the technical details behind these schemes are beyond the scope of this review, we refer interested
readers to Chen and Zhang (2014). In the following, we examine several commonly
used techniques for big data analytics. Note that these techniques are not mutually exclusive and
naturally overlap to some extent.
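To make the Map/Reduce paradigm mentioned above more concrete, the following is a minimal single-machine sketch in Python of its map, shuffle, and reduce phases for a word count. It only mimics the divide-and-conquer structure that a platform such as Hadoop distributes across a cluster; it does not use Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: combine the values of each key, here by summing them."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits: List[str]) -> Dict[str, int]:
    # On a cluster each split would be mapped by a different node in parallel;
    # this sketch simply loops over the splits on one machine.
    mapped = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    splits = ["Big data in operations", "operations management and big data"]
    print(word_count(splits))
```

The design point is that the map step needs only its own split and the reduce step needs only the values for one key, which is what allows both phases to be spread over many machines.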
[6] Hadoop has been improved, e.g., through integration with R to enable parallel processing, and through other
extensions such as Hadoop-ML (Wu et al. 2014).
Data mining: Data mining is the process of extracting insights from a given data set. It is the
cornerstone of business intelligence and big data analytics (Choi et al. 2017). It can be used in areas
such as market segmentation, collaborative processes (Fan et al. 2017), classification, clustering
(Fahad et al. 2014), and regression. Data mining today is highly specialized, with many different
functional areas and approaches, such as sequential and temporal mining, spatial
mining, process mining, privacy-preserving mining, network mining, and web mining, all of which
are associated with big data analytics. Data mining models are usually developed from machine
learning and statistics. For some challenges of data mining with big data, including
multi-source data mining mechanisms and dynamic data mining methods, see Wu et al. (2014).
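Clustering, one of the data mining tasks listed above, is often used for market segmentation. As an illustration, here is a minimal k-means (Lloyd's algorithm) sketch in plain Python for points with two numeric features. The deterministic farthest-point seeding, the hypothetical customer features, and the choice of k are assumptions made for a reproducible example; large-scale work would normally rely on a library implementation such as scikit-learn's.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def farthest_point_init(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # centroids no longer move, so the algorithm has converged
        centroids = updated
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical customers: (annual spend in thousands, store visits per month).
    customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (10.0, 10.0), (10.5, 9.5), (9.8, 10.2)]
    print(kmeans(customers, k=2))
```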
Optimization: Optimization is a standard analytical approach to finding optimal (or near-optimal)
solutions to quantitative decision-making problems. In business applications, methods like
genetic algorithms (Kershenbaum 1997), simulated annealing, particle filters, and various
evolutionary algorithms (Potvin 2009) are well-developed ways to find good solutions in a reasonably
short time. In big data analytics, computational optimization methods face challenges concerning memory
and computation time, convergence and the identification of globally optimal solutions, and the need
for real-time optimization (Huang and Chaovalitwongse 2015).
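Simulated annealing, one of the methods named above, can be sketched in a few lines. In this minimal Python version the one-dimensional objective, the uniform neighbourhood move, and the geometric cooling schedule are hypothetical choices for illustration. The essential element is the Metropolis acceptance rule, which occasionally accepts a worse move so that the search can escape local optima.

```python
import math
import random
from typing import Callable


def simulated_annealing(
    objective: Callable[[float], float],
    x0: float,
    step: float = 0.5,
    t0: float = 10.0,
    cooling: float = 0.995,
    iterations: int = 5000,
    seed: int = 42,
) -> float:
    """Minimise `objective` and return the best solution found."""
    rng = random.Random(seed)
    x, fx = x0, objective(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(iterations):
        candidate = x + rng.uniform(-step, step)  # random neighbour of x
        fc = objective(candidate)
        # Metropolis acceptance: always accept improvements; accept a worse
        # move with probability exp(-(fc - fx) / t).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling  # geometric cooling schedule
    return best_x


if __name__ == "__main__":
    # A multimodal test function with several local minima.
    f = lambda x: x * x + 10.0 * math.sin(x)
    print(simulated_annealing(f, x0=5.0))
```

Early on, the high temperature lets the search wander; as the temperature falls, the procedure behaves more and more like a greedy local search around the best region found.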
Others: In addition to the four major big data analytics techniques above,
other techniques such as social network analysis (Banerjee et al. 2016), clustering algorithm analysis
(Fahad et al. 2014), data envelopment analysis (Zhu et al. 2017), and visualization analysis (Strehl
and Ghosh 2003) are known to be useful for big data analytics.
Table 2.1 summarizes the strengths and weaknesses of the four major techniques and the
corresponding development areas for coping with big data challenges. Table 2.2 further shows the
major functionalities of the four major techniques in big data analytics.
Technique | Strengths | Weaknesses | Development areas
... | ... | ... | 3. Hybrid methods.
Machine Learning | Versatile and flexible in making use of data to capture complex behaviors | Time-consuming training | 1. Deep machine learning. 2. Scaling up machine learning. 3. Fast learning algorithms. 5. Parallel processing.
Data Mining | Combines statistical and machine learning models, which makes it versatile in dealing with different types of data | Suffers from the weaknesses of the underlying models | 1. Clustering techniques. 2. Distributed and parallel processing. 3. Multi-media processing.
As Table 2.1 shows, different big data analytics techniques have their own
strengths and weaknesses. Recent research therefore explores how to better utilize them for different
kinds of applications.
Technique | Major functionalities
Statistics | 1. Quickly determine correlations and data patterns, and identify data relationships (e.g., by regression).
 | 2. Show whether a sample can be used to represent the population, which helps to reduce data requirements and computational time.
Machine Learning | 1. Use machine intelligence to learn, evolve, and capture the behaviors of the systems under study.
 | 2. Capture complex relationships in the systems, but require substantial training time and memory.
 | 3. Flexibly support the processing of different data types through image processing, pattern recognition, text recognition, etc.
Data Mining | 1. Extract useful information from data by employing statistical and machine learning models.
 | 2. Perform clustering analysis, segmentation analysis, and dynamic data mining with huge multi-source datasets.
Optimization | ...
 | 2. Require extensions to deal with real-time processing, parallel processing, etc., in large-scale optimization.
Big data analytics faces various challenges that distinguish it from typical data analytics.
On the data side, these challenges include a massive number of data points (high
volume, high dimension), complex data (a high variety of data of different classes
and types), and high uncertainty. On the computing side, many existing methods
are neither flexible nor scalable enough to meet the requirements of big data. They also suffer
Distributed and Parallel Processing (DPP): When facing a big dataset, one may also process the data
with multiple parallel and distributed computing systems. This concept is consistent with the
divide-and-conquer strategy. However, distributed and parallel processing emphasizes parallelism:
the big dataset is analyzed simultaneously by multiple distributed processors.
It offers a high degree of flexibility. Recent research also highlights the
importance of distributed machine learning (see, e.g., Xing et al. 2015).
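A minimal sketch of the DPP idea, assuming the data still fit in one machine's memory: the dataset is partitioned, each partition is summarized concurrently, and the partial summaries are merged. Here the standard-library `concurrent.futures.ThreadPoolExecutor` stands in for a cluster of processors. This is an illustrative assumption: for CPU-bound work in CPython, a `ProcessPoolExecutor` or a distributed framework would be needed for genuine parallel speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partial_summary(chunk: Sequence[float]) -> Tuple[float, int]:
    """Work done by one worker: the sum and the count of its partition."""
    return sum(chunk), len(chunk)


def parallel_mean(data: Sequence[float], workers: int = 4) -> float:
    if not data:
        raise ValueError("data must not be empty")
    # Divide: split the data into at most `workers` contiguous partitions.
    size = -(-len(data) // workers)  # ceiling division
    chunks: List[Sequence[float]] = [data[i:i + size] for i in range(0, len(data), size)]
    # Process the partitions concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(partial_summary, chunks))
    # Merge: combine partial sums and counts. Averaging partial means instead
    # would give a wrong answer when partitions differ in size.
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


if __name__ == "__main__":
    print(parallel_mean([float(i) for i in range(1, 1_000_001)], workers=8))
```

The merge step is the design choice that matters: each worker returns a small, combinable summary rather than a final answer.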
Incremental Learning using New Cases (ILNC): In machine learning, training takes time, and
with many data points the training time becomes even more substantial. This is a hurdle for
big data analytics, which requires quick and ideally real-time processing. The ILNC approach aims to
incrementally improve machine learning algorithms by using new cases, i.e., new data blocks. This
approach requires sufficient computing memory so that the knowledge discovered from the already
trained data sets is well stored.
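As an illustration of learning incrementally from new data blocks, here is a minimal sketch under the assumption of a simple linear model y = a + b·x. The stored "knowledge" is the model's sufficient statistics (a count and four running sums), so each new block is folded in without revisiting earlier blocks. This particular model is an illustrative choice and is not a method prescribed by the sources cited above.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class IncrementalLinearRegression:
    """Simple linear regression y = a + b*x learned block by block."""
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def update(self, block: Iterable[Tuple[float, float]]) -> None:
        # Fold a new block of (x, y) cases into the stored statistics.
        for x, y in block:
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y

    def coefficients(self) -> Tuple[float, float]:
        # Closed-form least-squares intercept a and slope b from the sums.
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        b = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        a = (self.sum_y - b * self.sum_x) / self.n
        return a, b


if __name__ == "__main__":
    model = IncrementalLinearRegression()
    model.update([(0.0, 1.0), (1.0, 3.0)])  # first data block
    model.update([(2.0, 5.0), (3.0, 7.0)])  # a new block arrives later
    print(model.coefficients())
```

Raw running sums are easy to follow but can lose precision on very large or badly scaled data; numerically stabler (Welford-style) updates are preferable in practice.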
[7] The big data analytics field is developing rapidly. These strategies are not meant to be exhaustive, but they
represent the commonly seen “mainstream” methods for dealing with big data challenges; see, e.g., Chen et al.
(2014), Hu et al. (2014), and Wang and He (2016).
Scalability: Computing systems (e.g., analytical models or optimization methods) that are
scalable are better able to cope with the needs of big data analytics. It is therefore
important to develop computing systems that are flexible and scalable with respect to
computational power so that they can fulfill the requirements of big data analytics.
Heuristics: In the standard OM literature, for many problems that are difficult to solve in
reasonable time (e.g., NP-hard problems), we develop heuristics that find near-optimal solutions
numerically and then identify bounds on the optimal value. This approach remains applicable in big data
analytics to address the computational time issue.
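The heuristics-plus-bounds approach can be illustrated on the 0/1 knapsack problem, which is NP-hard. In this hypothetical sketch, a greedy value-density heuristic yields a feasible solution, so its value is a lower bound on the optimum. The linear programming relaxation (the fractional knapsack) yields an upper bound. The gap between the two bounds limits how far the heuristic can be from optimal.

```python
from typing import List, Tuple


def knapsack_bounds(values: List[float], weights: List[float], capacity: float) -> Tuple[float, float, List[int]]:
    """Return (heuristic value, LP upper bound, chosen item indices).

    Assumes all weights are strictly positive.
    """
    # Sort items by value per unit weight, best first.
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)

    # Greedy heuristic: add each item that still fits (a feasible solution).
    chosen: List[int] = []
    load, heuristic_value = 0.0, 0.0
    for i in order:
        if load + weights[i] <= capacity:
            chosen.append(i)
            load += weights[i]
            heuristic_value += values[i]

    # LP relaxation: fill by density and take a fraction of the first item
    # that does not fit; this is an upper bound on the 0/1 optimum.
    remaining, upper_bound = capacity, 0.0
    for i in order:
        if weights[i] <= remaining:
            remaining -= weights[i]
            upper_bound += values[i]
        else:
            upper_bound += values[i] * remaining / weights[i]
            break
    return heuristic_value, upper_bound, sorted(chosen)


if __name__ == "__main__":
    print(knapsack_bounds([60, 100, 120], [10, 20, 30], capacity=50))
```

For the example instance, the heuristic value is 160 and the upper bound is 240. The true optimum (items 1 and 2, value 220) lies between them, as the bounds guarantee.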
Strategy | Core features
Divide and Conquer | Break down the big data into multiple pieces small enough to be solved one by one, including the granular computing method.
DPP | Process data with multiple parallel and distributed computing systems on multiple processors.
ILNC | Improve machine learning algorithms incrementally by using new cases.
Statistical Inference | Learn about the relationship between samples and the population, and save computational effort by processing samples instead of the population.
Feature Selection | Select a subset of the big dataset that represents its core features.
Scalability | Ensure the computing system is scalable enough to deal with big data challenges, as “no size fits all”.
Heuristics | Determine near-optimal solutions and identify bounds within time and memory constraints.
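The statistical inference strategy in the table above can be sketched as follows. Instead of scanning the full dataset, we estimate a population mean from a simple random sample and report a normal-approximation 95% confidence interval. The synthetic population and the sample size are hypothetical. The interval assumes a random, reasonably large sample and ignores the finite-population correction.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def estimate_mean(population: Sequence[float], sample_size: int, seed: int = 1) -> Tuple[float, float, float]:
    """Estimate the population mean from a simple random sample.

    Returns (point estimate, lower 95% bound, upper 95% bound) using the
    normal approximation mean +/- 1.96 * s / sqrt(n).
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # sampling without replacement
    mean = statistics.fmean(sample)
    half_width = 1.96 * statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    gen = random.Random(0)
    population = [gen.gauss(100.0, 15.0) for _ in range(200_000)]
    print(estimate_mean(population, sample_size=5_000))
```

Only the sample is processed, so the cost depends on the sample size rather than on the size of the full dataset; the price is a quantified sampling error, expressed by the interval.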
In this section, we review OM studies related to big data, organized by OM topical area
based on the papers collected.
Forecasting: Among all OM topical areas supported by big data analytics, forecasting is probably
the most intuitive and direct place to start. Traditionally,
forecasting relies heavily on historical data, expert advice, and market information. In the big data
era, more and more sources of information are available, which can potentially improve
forecasting performance[8]. In the literature, Baughman et al. (2016) report the IBM Global
Technology Services (GTS) team’s research on forecasting web traffic patterns. At that
time, IBM’s practice for allocating cloud platform resources required human
participation to meet demand. The GTS team aimed to automate this process by
[8] It is still debated whether big data significantly improves forecasting, as big data computation and
processing also carry a high cost. See Nikolopoulos and Petropoulos (2017) for
a recent study.
Inventory Management: Inventory control is a critical topic in OM. In the literature, Huang and
Van Mieghem (2014) adopt a statistical approach to explore clickstream data in inventory control
problems. The authors study a setting in which retailers feature products online but
take orders in stores (i.e., offline). By analyzing empirical click and order data, the authors
develop a decision support model via dynamic programming. They also show empirically that
clickstream data are statistically significant predictors of the timing and amount of offline orders.
Their computational study shows that the proposed decision support model can yield a 3% reduction
in inventory holding cost and a 5% reduction in backordering cost. Van Jaarsveld and Scheller-Wolf
(2015) develop a stochastic-programming-based algorithm for inventory management in an
industrial-scale assemble-to-order system. Owing to the problem’s large scale, it is a big-data-related
optimization problem in inventory control. The authors consider a continuous-time model
and derive optimal base-stock policies. They show that the first-come first-served policy in
component allocation performs reasonably well, and further demonstrate that no-holdback
allocation policies outperform the first-come first-served policy. Recently, Bertsimas et al. (2016)
employ a data-driven optimization technique called conditional stochastic optimization to explore
inventory control with big data. The authors use four years of point-of-sale and inventory data
across a retail company’s network of stores in multiple locations. They use the Google Geocoding API
to obtain the coordinates of store locations, and Google search query volume to gauge the market
attention paid to different items. By combining all these sources of information, they determine the
optimal inventory management scheme for the retail network.
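To illustrate how auxiliary data (such as search query volume) can inform an inventory decision, here is a minimal sketch of a data-driven newsvendor with k-nearest-neighbour weights. Past periods whose auxiliary features most resemble the current period serve as demand scenarios, and the order quantity is the critical-fractile quantile of their demands. The features, costs, and k are hypothetical. This is a simplified illustration in the spirit of conditional stochastic optimization, not Bertsimas et al.'s exact method.

```python
import math
from typing import Sequence


def knn_newsvendor(
    history_features: Sequence[Sequence[float]],
    history_demand: Sequence[float],
    current_features: Sequence[float],
    k: int,
    underage_cost: float,
    overage_cost: float,
) -> float:
    """Order quantity = critical-fractile quantile of the k most similar past demands."""
    # Rank past periods by Euclidean distance between their features and today's.
    distances = [math.dist(f, current_features) for f in history_features]
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    demands = sorted(history_demand[i] for i in nearest)
    # Newsvendor critical ratio cu / (cu + co).
    ratio = underage_cost / (underage_cost + overage_cost)
    # Smallest demand whose empirical CDF (equal weights 1/k) reaches the ratio.
    index = max(0, math.ceil(ratio * len(demands)) - 1)
    return demands[index]


if __name__ == "__main__":
    # Hypothetical history: one feature per period (e.g., a search-volume index).
    features = [[10.0], [11.0], [12.0], [50.0], [52.0], [55.0]]
    demand = [100.0, 110.0, 105.0, 300.0, 320.0, 310.0]
    print(knn_newsvendor(features, demand, [11.5], k=3, underage_cost=3.0, overage_cost=1.0))
```

The key design choice is that the auxiliary data change which historical demands are treated as relevant, instead of using a single unconditional demand distribution for every period.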
Supply Chain Management: Big data has a huge influence on supply chain and logistics
management. It was predicted early on that big data analytics would revolutionize supply chain design
(Waller and Fawcett 2013) and might change product lifecycle management in the supply chain (Li et
al. 2015). Big data analytics also affects the optimization of service parts in after-sales operations
management (Boone et al. 2016). In the literature, Wang, Gunasekaran and Ngai (2016) study a
distribution network design optimization problem using big data. The authors consider a
situation in which the supply chain planner can use big data to determine the optimal number of
distribution centers and assign customers to them. They employ a mixed-integer programming
approach and conduct simulation studies to illustrate the performance of the optimization model.
Kaur and Singh (2017) propose a mixed-integer nonlinear programming model to address
environmentally sustainable procurement and logistics operations in supply chains. Owing to the
problem’s complexity and the need to handle big data and real-time analysis, the authors develop
a heuristic. Tests on randomly generated data instances show that the
heuristic performs well. Papadopoulos et al. (2017) make use of unstructured big data coming from
Risk Analysis: Risk analysis includes activities such as risk assessment, risk monitoring, and risk
control. Risk analysis for both business operations (Choi, Chan and Yue 2017) and
non-profit organizations (such as governments) can clearly benefit from the proper use of big data. In the
literature, Allodi and Massacci (2017) study the cyber-crime problem using big data. The authors
develop a quantitative scheme for assessing cyber security risk with data from security centers.
Their proposed scheme gives quantitative probability estimates that help the organization fight
untargeted cyber-attacks. Using real data from a financial institution, the authors show that their
proposed big data risk assessment scheme is effective. Biffis
and Chavez (2017) use a data mining approach to show how satellite big data can be mined to yield
weather indices. These indices are critical for weather risk management and affect the
agricultural food industry. The authors develop a data-driven risk transfer scheme and conduct a
real case study of Mozambique’s maize production, illustrating how weather
data on rainfall and temperature can be used to create risk profiles. They argue that their
proposed framework can lead to a 30% cost saving (from insurance). Lopez-Cuevas et al. (2017)
propose a new analytical framework that uses “mood” as a proxy for behavior, and reveal how
disruptive events may affect different populations in the presence of risk. In the proposed
framework, the authors first illustrate the mechanism to employ big data from different social media
Others: Big data analytics is also employed in various other domains such as healthcare and
retailing. Interested readers can refer to Guha and Kumar (2017) and Fisher and Raman (2017) for
more discussions.
In the above sections, we have examined various big data techniques, strategies, and studies in
the literature. In this section, we combine these results to explore how different analytical
techniques and big data architectures map onto the examined OM topical areas.
Good big data analytics and applications involve more than just the proper
deployment of techniques and strategies. In particular, the design of the complete big data architecture is
critical (see Chen and Zhang 2014). From the papers reviewed above, we have identified four
generic big data architectures[9] (denoted BDA 1, BDA 2, BDA 3, and BDA 4), which we present
in the Appendix. Specifically, BDA 1 represents the architecture for batch processing.
Under BDA 1, data from the data sources are collected by software agents in the workstation. Strategies Z with
batch processing are adopted and linked with the corporate database. Analytic techniques Y are
employed to generate the output and also update the corporate database. BDA 2 is similar to
BDA 1, except that it focuses on real-time processing, so Strategies Z must support real-time
stream processing. This also calls for real-time deployment of the analytic techniques Y to generate
[9] There are some subtle differences in, e.g., the specific platforms adopted, and some
companies have multiple databases. In the four proposed generic big data architectures, we focus on the
operational perspective and highlight the specific data sources, techniques, and strategies adopted in each
architecture.
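The contrast between BDA 1 (batch processing) and BDA 2 (real-time stream processing) can be sketched in Python. In this hypothetical example, a list of stored records stands in for the corporate database, and a per-key average stands in for the analytic techniques Y. The batch job recomputes the result from all stored records in one run, whereas the streaming job updates its result as each record arrives. None of the names corresponds to a specific platform.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]  # (key, value), e.g. (store id, sales amount)


def batch_job(records: List[Record]) -> Dict[str, float]:
    """BDA 1 style: process the complete stored dataset in one run."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for key, value in records:
        totals[key] += value
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def streaming_job(stream: Iterable[Record]) -> Iterator[Dict[str, float]]:
    """BDA 2 style: update running averages as each record arrives."""
    means: Dict[str, float] = {}
    counts: Dict[str, int] = defaultdict(int)
    for key, value in stream:
        counts[key] += 1
        previous = means.get(key, 0.0)
        means[key] = previous + (value - previous) / counts[key]  # incremental mean
        yield dict(means)  # an up-to-date snapshot after every record


if __name__ == "__main__":
    records = [("A", 10.0), ("B", 4.0), ("A", 20.0)]
    print(batch_job(records))
    for snapshot in streaming_job(records):
        print(snapshot)
```

Both jobs end with the same answer; the difference is that the streaming version has a usable result after every record, which is what real-time applications require.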
Areas | Papers | Big Data Techniques Y | Big Data Strategies Z | Big Data Architectures (if specified) | Data Sources X Involved | Real Cases (if any)
Forecasting | Baughman et al. (2016) | Discrete event simulation, statistics, feature selection algorithms, optimization | Feature selection, ILNC, distributed and parallel processing, statistics | BDA 4 | Web analytics, social analytics (real data from social media and web pages) | IBM
 | Liu et al. (2016) | Machine learning, data mining | Combining multiple techniques | BDA 4 | Social analytics, text analytics, web analytics (real data from social media and web pages) | -
 | See-To and Ngai (2016) | Statistics | Statistical inference | - | Common data analytics[10] | Real data from fashion companies on a major Chinese e-commerce platform
 | Ferreira et al. (2016) | Machine learning, optimization | Statistical inference | BDA 4 | Web analytics, ERP system | An online retailer (Rue La La)
 | Chong et al. (2017) | Machine learning | Distributed and parallel processing | BDA 4 | Web analytics (using real data obtained by web crawling of Amazon.com) | -
 | Cui et al. (2017) | Machine learning | Combining multiple techniques | - | Social analytics | An online apparel retailer
 | Sagaert et al. | Statistics | Statistical inference | BDA 1 | Common data | A major supplier
... | Bertsimas et al. (2016) | Optimization | Statistical inference | BDA 1 | Web analytics | Sales data from a retail company
Revenue Management and Marketing | Morales and Wang (2010) | Data mining | Statistical inference | - | Common data analytics | Real reservation record dataset from a hotel chain in the UK
 | Mukherjee and Sinha (2017) | Machine learning, optimization | Statistical inference | BDA 1 | Unstructured big data | The medical device industry
 | Shang et al. (2017) | Statistics | Statistical inference | - | Common data analytics | Real air cargo data from a leading forwarder
 | Chung et al. (2017) | Machine learning, optimization | Statistical inference | BDA 1 | Common data analytics | A leading Hong Kong airline
 | Xie et al. (2017) | Statistics | Statistical inference | BDA 1 | Common data analytics (from multiple sources) | Manhattan city
 | Jamshidi et al. (2017) | Machine learning | Heuristics | BDA 4 | Common data analytics | Dutch railway network
Supply Chain Management | Wang et al. (2016) | Optimization | Statistical inference | BDA 1 | Common data analytics (from multiple sources) | -
Risk Analysis | Allodi and Massacci (2017) | Statistics | Statistical inference | BDA 2 | Common data analytics | A financial institution
 | Biffis and Chavez (2017) | Data mining | Statistical inference | BDA 2 | Common data analytics | Maize production in Mozambique
 | Lorca et al. (2017) | Statistics | Statistical inference | BDA 4 | Web analytics (including online map data) | Hurricane Andrew
[10] The term “common data analytics” refers to the case in which the data are structured and given as numerical values.
Table 5.1 shows how the four big data architectures, with the respective big data techniques
Y, big data strategies Z, and data sources X, fit into the big data analytics models of the
examined papers. From Table 5.1, we make the following observations:
1. Big Data Techniques: Data mining and machine learning techniques are widely used in OM
studies on forecasting, revenue management and marketing, and transportation management.
They are also used in risk analysis. This suggests that these OM topical areas
involve complex data patterns, which require more versatile techniques such as machine
learning and data mining. Optimization is the standard technique for inventory
management and is also commonly used in supply chain management. This is expected, because
analytical optimization models are well established in inventory management (e.g., the base-stock
policy) and supply chain management; even in the presence of big data, researchers are
likely to apply optimization techniques to these problems. Statistics,
as the most fundamental technique for data analysis, is present in almost all
examined OM topical areas.
2. Big Data Strategies and Using the Multi-Methodological Approach: Most studies adopt
statistical inference as the strategy for dealing with the big data problem. This suggests
that these studies actually explore relatively simple big data
problems. For some other studies, heuristics and scalability are two important strategies to deal
3. Big Data Architectures: In OM studies, batch processing is still popular and common:
most studies in Table 5.1 are based on batch processing. However, as real-time processing is
critical for risk analysis, we do see more real-time processing applications in that area.
Moreover, the use of BDA 4 (combining multiple big data architectures) and BDA 3 (combining
real-time stream processing and batch processing) is quite common.
4. Data Sources: Social media data and web data are very commonly used in studies in
the big data era owing to their “public-data” nature. Accordingly, web analytics and social analytics
are widely observed in the reviewed OM studies. In addition, most reviewed OM studies
still use the common data analytics method, i.e., analysis based on
structured datasets with numerical data points. This makes the analysis easier, but it does not
fully realize the true big data nature of having a large variety of data formats and data
sets.
5. Real Cases Based Studies: It is encouraging that many reviewed papers report real case
studies. This is an important feature of big data based studies, because experiments and analyses
must use relevant real-world data. We expect this trend to continue and,
hopefully, more real-case-based OM studies on big data analytics will appear in the future.
To explore real-world applications of big data analytics in operations, we conduct a case
study in this section. We focus on the world’s top enterprises because they
have the resources needed to develop and deploy big data analytics, and we also have relatively
Table 6.1. Most valuable branded company in each industrial category (from Forbes.com 2017)
Category | Company | Rank
Technology | Apple | 1
Beverages | Coca-Cola | 5
Leisure | Disney | 7
Automotive | Toyota | 8
Restaurants | McDonald’s | 9
Apparel | Nike | 16
Alcohol | Budweiser | 22
Retail | Walmart | 24
[11] The categorization follows the one shown on Forbes’ webpage
[https://www.forbes.com/powerful-brands/list/ (accessed 18 September 2017)]. We exclude
brands categorized as “diversified” (e.g., GE, Siemens, BASF, and Philips), brands with very limited
information, and some big-data-related service providers.
Company | Big data techniques and tools | Applications
Apple | Mobile analytics; Hadoop | Revenue management and marketing: new product design; new service-bundle product development
Coca-Cola | Mobile analytics; social analytics; AI; image recognition; augmented reality | Revenue management and marketing: new product design (e.g., tastes); new customization service; bottle packaging
Disney | Machine learning | Revenue management and marketing: customer experience, customization, park operations
Toyota | AI (robotics) | Revenue management and marketing: new product design and pricing; new service development
McDonald’s | Mobile analytics; mobile computing (iBeacon) | Revenue management and marketing: marketing campaigns and promotion; membership scheme
Nike | Machine learning; mobile computing | Demand forecasting; manufacturing; revenue management and marketing: new product design and pricing; new service development (e.g., speedy customization)
Louis Vuitton | Social media analytics | Revenue management and marketing: real-time fashion show; product pricing
Budweiser | Virtual reality; AI | Revenue management and marketing: new product design and customer experience
American Express | Machine learning | Revenue management and marketing: customer experience, new service development; risk management
Walmart | Mobile computing; AI; facial recognition | Inventory management: auto-replenishment; revenue management and marketing: pricing, customer services, visual merchandising; store operations: auto-checkout
Caterpillar | Machine learning; mobile computing | Operations: optimization of resource allocation; revenue management and marketing: use of power and fuel
Table 6.2 shows that all these large enterprises have used big data analytics for
revenue management and marketing activities. This is intuitive, as big data from the market,
including from consumers, provide a valuable source of information for these enterprises to
improve their business operations and marketing activities such as product offering, new product
development, market segmentation, and pricing. In addition, big data analytics and applications are
commonly seen in many timely business models, such as customized services and individualized
product offerings. Other important activities in which big data analytics plays a critical role in practice
include demand forecasting and inventory replenishment and management.
There is no doubt that we are now in the big data era. Big data analytics is critical in all kinds of
organizations and enterprises. OM, as a field that focuses on the optimal use of resources to
improve the efficiency and effectiveness of operations, should take the opportunity to develop itself
to work well with big data.
In this paper, we have reviewed various existing big data related analytics techniques. Specifically,
we have highlighted the importance of statistics, machine learning, data mining and optimization
models in supporting big data analytics, examined and compared their strengths and weaknesses, and
studied their major functionalities. We have then introduced and discussed various big data analytics
strategies, such as divide and conquer, distributed and parallel processing, incremental learning and
statistical inference, and concisely examined their core features. After that, we have reviewed the
related literature and reported how big data analytics has been applied in topical areas such as
forecasting, inventory management, revenue management and marketing, transportation management,
supply chain management and risk analysis. We have also proposed different kinds of big data
architectures and revealed how different types of big data techniques, strategies and architectures
can be applied to these OM topical areas. Finally, by exploring publicly available information on how
large-scale enterprises use big data analytics, we have uncovered further insights into its real world
applications. We believe that these findings are valuable to both practitioners and academics who are
interested in how big data analytics can be used in OM.
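Two of the strategies listed above, divide and conquer and incremental learning, can be illustrated with a small example. The Python sketch below is a hypothetical illustration, not a method taken from this paper: it splits a demand series into chunks, summarizes each chunk in a single incremental pass (Welford's update), and then merges the partial summaries with the standard pairwise combination formula for means and variances, recovering the overall mean and sample variance without revisiting the raw data. The names `RunningStats`, `summarize_chunk` and `divide_and_conquer`, and the demand figures in the usage note, are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RunningStats:
    """Count, mean and sum of squared deviations (m2) of a data stream."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        # Incremental learning: absorb one observation in O(1) (Welford's update)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two partial summaries without revisiting the raw data
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Sample variance; NaN when fewer than two observations are available
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def summarize_chunk(chunk: Iterable[float]) -> RunningStats:
    # "Conquer": one pass over a single chunk (e.g., on one worker node)
    stats = RunningStats()
    for x in chunk:
        stats.update(x)
    return stats


def divide_and_conquer(data: List[float], n_chunks: int) -> RunningStats:
    # "Divide": split the data into at most n_chunks contiguous chunks
    size = max(1, math.ceil(len(data) / n_chunks))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Computed sequentially here; a real system would process chunks in parallel
    partials = [summarize_chunk(chunk) for chunk in chunks]
    # "Combine": merge the small partial summaries into one overall summary
    result = RunningStats()
    for partial in partials:
        result = result.merge(partial)
    return result
```

For example, `divide_and_conquer([12.0, 15.0, 9.0, 20.0, 18.0, 11.0, 14.0], n_chunks=3)` returns a summary whose `mean` and `variance` match a single pass over the full list up to floating-point rounding. In a distributed setting, each chunk summary would be computed on a different node, and only the three small summaries would need to be transmitted and merged.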
From our exploration, we have identified a few promising areas that can be studied in the
future:
1. Optimal choices of big data analytics techniques and strategies: As the above analysis shows, both
real world companies and academic studies have used many different kinds of big data techniques
and strategies in operations. However, are they using the best techniques and strategies? How can
the best techniques and strategies be determined? These fundamental questions have not been well
answered and hence deserve deeper exploration and further study.
2. Big data architectures: In this paper, we have proposed four different categories of BDA
architectures (see the Appendix). Although they try to capture the most essential real world
elements while simplifying the picture, these architectures are far from perfect and can be
further refined in future research.
3. Application areas: From the review and the examination of real practice, “revenue management and
marketing” is a popular area in which big data analytics and the related tools have been applied
extensively. However, relatively few published papers and real world enterprises focus on supply
chain management with big data applications, so supply chain management is definitely an
under-explored area. One reason is that big data analytics for supply chain management is
challenging: it requires multiple supply chain members to work closely together in using big data.
Thus, it will be promising, and challenging, for future research to investigate how big data
analytics can be applied to critical issues such as strategic partnership and channel coordination in
supply chain systems.
4. Real world issues: In this paper, we have studied many real world applications of big data
analytics, especially in large-scale enterprises. On the one hand, these studies are introductory and
limited in depth; more in-depth case studies can be conducted in the future to reveal further insights
into their applications of big data analytics. On the other hand, the use of big data analytics is
associated with many social issues, such as data privacy and threats to human and social welfare
(e.g., from the emergence of artificial intelligence). These issues should also be studied in the
future so that proper rules can be established to ensure that big data analytics is used in an
ethically sound way and contributes positively to society.
Acknowledgements:
We are grateful to the Editor-in-Chief, Professor Kalyan Singhal, for his great support and important
advice on this paper. Tsan-Ming Choi’s research is partially supported by The Hong Kong Polytechnic
University (Grant Number: G-YBGR). Yulan Wang’s research is partially supported by The Hong Kong
Polytechnic University (Grant Number: G-YBQR).
References:
Agarwal, R., V. Dhar. 2014. Editorial – big data, data science, and analytics: The opportunity and
challenge for IS research. Information Systems Research 25(3) 443-448.
Ahmed, E., I. Yaqoob, I. Hashem, I. Khan, A. Ahmed, M. Imran, A.V. Vasilakos. 2017. The role of big
data analytics in internet of things. Computer Networks 129(2) 459-471.
Ale, B. 2016. Risk analysis and big data. Safety and Reliability 36(3) 153-165.
Allodi, L., F. Massacci. 2017. Security events and vulnerability data for cyber security risk. Risk
Analysis 37(8) 1607-1627.
Aloysius, J.A., H. Hoehle, S. Goodarzi, V. Venkatesh. 2016. Big data initiatives in retail environments:
Linking service process perceptions to shopping outcomes. Annals of Operations Research.
Forthcoming.
Aral, S., D. Walker. 2011. Creating social contagion through viral product design: A randomized trial
of peer influence in networks. Management Science 57(9) 1623-1639.
Aral, S., D. Walker. 2012. Identifying influential and susceptible members of social networks. Science
337(6092) 337-341.
Aral, S., D. Walker. 2014. Tie strength, embeddedness, and social influence: A large-scale networked
experiment. Management Science 60(6) 1352-1370.
Arunachalam, D., N. Kumar, J.P. Kawalek. 2017. Understanding big data analytics capabilities in
supply chain management: Unravelling the issues, challenges and implications for practice.
Transportation Research Part E. Forthcoming.
Badiezadeh, T., R.F. Saen, T. Samavati. 2017. Assessing sustainability of supply chains by double
frontier network DEA: A big data approach. Computers and Operations Research. Forthcoming.
Banerjee, S., S. Sanghavi, S. Shakkottai. 2016. Online collaborative filtering on graphs. Operations
Research 64(3) 756-769.
Baughman, A.K., R. Bogdany, B. Harrison, B. O'Connell, H. Pearthree, B. Frankel, C. McAvoy, S. Sun, C.
Upton. 2016. IBM predicts cloud computing demand for sports tournaments. Interfaces 46(1)
33-48.
Bertsimas, D., N. Kallus, A. Hussain. 2016. Inventory management in the era of big data. Production
and Operations Management 25(12) 2002-2013.
Biffis, E., E. Chavez. 2017. Satellite data and machine learning for weather risk management and
food security. Risk Analysis 37(8) 1508-1520.
Boone, C.A., B.T. Hazen, B. Skipper, R.E. Overstreet. 2016. A framework for investigating
optimization of service parts performance with big data. Annals of Operations Research.
Forthcoming.
Data Sources X: “X” can be web, social media, sensors, corporate database, etc.
Analytic Techniques Y: “Y” can be machine learning methods (e.g., neural networks), optimization
models, statistical models, data mining approaches, etc.
Strategies Z: “Z” can be distributed and parallel processing, feature selection, statistical inference,
etc.
Multiple Architectures M: “M” can include big data architectures 1, 2, 3 or a mix of them.
Figure 5.2. Big data architecture 2 (BDA 2) with real time processing.
Figure 5.3. Big data architecture 3 (BDA 3) with real time processing and batch processing together.
[Diagram labels: Corporate database; Data Sources X; Analytic techniques Y; Output]
Figure 5.4. Big data architecture 4 (BDA 4) with multiple data sources (from X and multiple
architectures M).
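Architecture BDA 3 (Figure 5.3) combines real time processing with batch processing. The Python sketch below illustrates that combination under stated assumptions: it follows the general pattern of the widely used lambda architecture rather than any design specified in this paper, and the class and function names (`BatchLayer`, `SpeedLayer`, `serve`) and the per-product sales-count example are hypothetical. The batch layer periodically recomputes a complete view from the full history (for instance, records held in a corporate database), the speed layer updates a small view as each new event arrives, and a query merges the two.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical event format: (product_id, units sold)
Event = Tuple[str, int]


class BatchLayer:
    """Recomputes a complete view from all historical events."""

    def __init__(self) -> None:
        self.history: List[Event] = []
        self.view: Counter = Counter()

    def append(self, events: Iterable[Event]) -> None:
        # Store events permanently (e.g., in a corporate database)
        self.history.extend(events)

    def recompute(self) -> None:
        # Batch processing: a full, slower pass over the whole history
        view: Counter = Counter()
        for product, units in self.history:
            view[product] += units
        self.view = view


class SpeedLayer:
    """Incrementally tracks events not yet covered by the batch view."""

    def __init__(self) -> None:
        self.view: Counter = Counter()
        self.pending: List[Event] = []

    def ingest(self, event: Event) -> None:
        # Real time processing: constant-time update per incoming event
        product, units = event
        self.view[product] += units
        self.pending.append(event)

    def flush(self) -> List[Event]:
        # Hand pending events to the batch layer and reset the real time view
        events, self.pending = self.pending, []
        self.view = Counter()
        return events


def serve(batch: BatchLayer, speed: SpeedLayer, product: str) -> int:
    # Query time: merge the complete (but stale) batch view with the fresh speed view
    return batch.view[product] + speed.view[product]
```

In this sketch, after the batch layer has processed 5 units of product "A" and the speed layer has ingested 3 more, `serve(batch, speed, "A")` returns 8. After `batch.append(speed.flush())` and `batch.recompute()`, the same query still returns 8, now answered entirely from the batch view. Keeping the two paths separate lets the real time path stay cheap while the batch path can correct or enrich results with heavier analytic techniques.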