Data Science, Predictive Analytics, and Big Data in Supply Chain
doi: 10.1111/jbl.12082
While data science, predictive analytics, and big data have been frequently used buzzwords, rigorous academic investigations into these areas are just emerging. In this forward-thinking article, we discuss the results of a recent large-scale survey on these topics among supply chain management (SCM) professionals, complemented with our experiences in developing, implementing, and administering one of the first master's degree programs in predictive analytics. As such, we effectively provide an assessment of the current state of the field via a large-scale survey, and offer insight into its future potential via the discussion of how a research university is training next-generation data scientists. Specifically, we report on the current use of predictive analytics in SCM and the underlying motivations, as well as perceived benefits and barriers. In addition, we highlight skills desired for successful data scientists, and provide illustrations of how predictive analytics can be implemented in the curriculum. Relying on one of the largest data sets of predictive analytics users in SCM collected to date and our experiences with one of the first master's degree programs in predictive analytics, it is our intent to provide a timely assessment of the field, illustrate its future potential, and motivate additional research and pedagogical advancements in this domain.
Keywords: data science; predictive analytics; big data; data scientist; supply chain management; education; curriculum development
INTRODUCTION
A topic that is on the minds of many supply chain management
(SCM) professionals is how to deal with massive amounts of
data, and how to leverage and apply predictive analytics. This
challenge is a direct result of the ease with which data have been
able to be collected via modern information technology, generating unprecedented volume, variety, and velocity of data.
Heralded to revolutionize how SCM is conducted (Waller and
Fawcett 2013b), predictive analytics has the potential for significant above-average returns (McAfee et al. 2012). Within the context of SCM, predictive analytics can be defined as using both
quantitative and qualitative methods to improve supply chain
design and competitiveness (Waller and Fawcett 2013b, 80).
Predictive analytics is positioned within the overall domain of
data science, which refers to the application of quantitative and
qualitative methods from a variety of disciplines in combination
with SCM theory to solve relevant SCM problems and predict
outcomes, taking into account data quality and availability
issues (Waller and Fawcett 2013b, 79).
Despite the importance and relevance of SCM predictive analytics, there is a dearth of literature on the topic and many questions (Waller and Fawcett 2013b, 77). While articles in
practitioner outlets and consultancy reports are becoming more
prevalent, their content is mostly repetitive, and rigorous scientific investigations into the topic have been absent. In addition,
while companies are experimenting with big data analytics, for
the majority, its application has been elusive or has only
scratched the surface of what may be possible. The need for
well-trained data scientists is thus imminent (Dwoskin 2014).
Further research into predictive analytics is also needed, both
                                          No current use,                           No current use,  Not familiar
                                          but plans for    To some     To a great   no plans for     with data
Function                                  the future       extent      extent       the future       analytics     Total
Logistics/transportation (count)          19               66          23           55               77            240
  % Within function                       7.9              27.5        9.6          22.9             32.1          100.0
  % Within usage group                    41.3             44.9        32.9         47.0             51.0          45.2
Operations management (count)             10               27          9            32               33            111
  % Within function                       9.0              24.3        8.1          28.8             29.7          100.0
  % Within usage group                    21.7             18.4        12.9         27.4             21.9          20.9
Supply chain management (count)           8                29          26           13               19            95
  % Within function                       8.4              30.5        27.4         13.7             20.0          100.0
  % Within usage group                    17.4             19.7        37.1         11.1             12.6          17.9
Purchasing/procurement/sourcing (count)   6                9           6            2                10            33
  % Within function                       18.2             27.3        18.2         6.1              30.3          100.0
  % Within usage group                    13.0             6.1         8.6          1.7              6.6           6.2
Engineering (count)                       2                5           3            10               7             27
  % Within function                       7.4              18.5        11.1         37.0             25.9          100.0
  % Within usage group                    4.3              3.4         4.3          8.5              4.6           5.1
Research & development (count)            1                11          3            5                5             25
  % Within function                       4.0              44.0        12.0         20.0             20.0          100.0
  % Within usage group                    2.2              7.5         4.3          4.3              3.3           4.7
Total (count)                             46               147         70           117              151           531
  % of total                              8.7              27.7        13.2         22.0             28.4          100.0
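As a reading aid for the cross-tabulation above, the following minimal Python sketch recomputes the two percentage rows from the reported counts. The counts are taken from the table; the assignment of the five count columns to usage groups follows the order of the column headings, and the function name crosstab_percentages is an illustrative choice rather than anything used in the study.

```python
# Counts per function across the five usage groups, in the column order of the table
# (assumed to follow the order in which the column headings are listed).
counts = {
    "Logistics/transportation": [19, 66, 23, 55, 77],
    "Operations management": [10, 27, 9, 32, 33],
    "Supply chain management": [8, 29, 26, 13, 19],
    "Purchasing/procurement/sourcing": [6, 9, 6, 2, 10],
    "Engineering": [2, 5, 3, 10, 7],
    "Research & development": [1, 11, 3, 5, 5],
}


def crosstab_percentages(table):
    """Return ({function: [(pct_within_function, pct_within_group), ...]}, column_totals)."""
    n_cols = len(next(iter(table.values())))
    # Column totals are the usage-group sizes (denominator for "% within usage group").
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    result = {}
    for name, row in table.items():
        row_total = sum(row)  # denominator for "% within function"
        result[name] = [
            (round(100 * c / row_total, 1), round(100 * c / col_totals[j], 1))
            for j, c in enumerate(row)
        ]
    return result, col_totals


percentages, group_sizes = crosstab_percentages(counts)
for function, cells in percentages.items():
    print(function, cells)
print("Usage group sizes:", group_sizes, "Total:", sum(group_sizes))
```

Each cell is divided once by its row total (share of a function's respondents in a usage group) and once by its column total (share of a usage group's respondents coming from that function), which is how the two percentage rows differ.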
Motivation
My conviction
Internal colleagues
Customers
Competitors
Encouragement by
senior leadership
Suppliers
Press coverage
Overall
mean
4.73
4.47
4.11
4.25
4.78
(1.36)
(1.48)
(1.73)
(1.77)
(1.57)
No current use,
but plans for
the future (1)
4.17(3)
3.94(3)
4.11
4.19
4.42
4.05 (1.70)
3.06 (1.65)
To some
extent (2)
4.55(3)
4.44
3.84(3)
4.03(3)
4.72
(1.36)
(1.55)
(1.69)
(1.58)
(1.52)
4.03 (1.50)
3.25 (1.56)
ANOVA
F-value
and sign.
To a great
extent (3)
5.40(1,2)
4.83(1)
4.62(2)
4.72(2)
5.12
(1.25)
(1.39)
(1.70)
(1.80)
(1.58)
3.97 (1.75)
3.00 (1.67)
(1.33)
(1.54)
(1.72)
(1.75)
(1.54)
12.62**
4.20*
4.06*
3.11*
2.47*
4.22 (1.73)
3.07 (1.67)
0.43
0.32
Notes: This table shows means and standard errors (in parentheses). The numbers in the superscripted parentheses indicate the group score number from
which the group score is significantly different. Due to unequal sample sizes, post hoc pairwise comparison tests were conducted utilizing the Hochberg
test statistic. **p < .001, *p < .05.
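The F-values reported in the table come from one-way analyses of variance across the three usage groups. A minimal sketch of how such an F statistic is computed is given below. The response values are hypothetical seven-point ratings invented purely for illustration (they are not the survey data), the function name one_way_anova_f is an assumption, and the Hochberg post hoc comparisons mentioned in the notes are not implemented here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Works with unequal group sizes; each group is a list of numeric responses.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group size times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared distance of each response from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical ratings for one motivation item (NOT the survey data).
plans_for_future = [4, 5, 3, 4, 5]
to_some_extent = [5, 4, 5, 4, 6, 5]
to_a_great_extent = [6, 5, 6, 7, 5]

f_value, df_b, df_w = one_way_anova_f([plans_for_future, to_some_extent, to_a_great_extent])
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

A large F indicates that the group means differ by more than the within-group spread would suggest; which specific pairs of groups differ is what the post hoc tests in the table notes address.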
groups existed on the individual's conviction serving as a motivator. The valuable implication derived from this finding is that
an SCM professional's conviction about the value of SCM predictive analytics is one of the primary drivers for early adoption.
The result may also be indicative of users realizing the value of
the approaches once they are actively being utilized, further
increasing the individual's conviction. To a lesser degree, internal colleagues, customers, and competitors serve as distinguishing characteristics related to more frequent use of
Benefits
Overall
mean
No current use,
but plans for
the future (1)
To some
extent (2)
To a great
extent (3)
ANOVA
F-value
and sign.
5.36(3)
5.14
4.97
5.19
4.50
(1.44)
(1.29)
(1.21)
(1.28)
(1.63)
5.40(3)
5.01(3)
4.94(3)
5.03(3)
4.38(3)
(1.18) 6.03(1,2)
(1.42) 5.68(2)
(1.38) 5.58(2)
(1.51) 5.73(2)
(1.60) 5.15(2)
(1.09)
(1.17)
(1.21)
(1.26)
(1.46)
6.21**
5.17**
5.11**
4.91**
4.90**
5.28
5.06
4.69
4.31
(1.23)
(1.33)
(1.26)
(1.58)
5.12(3)
5.17
4.88
4.41
(1.34)
(1.29)
(1.31)
(1.60)
5.72(2)
5.63
5.32
4.98
(1.28)
(1.21)
(1.37)
(1.52)
4.15*
3.29*
3.14*
3.12*
4.86 (1.48)
5.03 (1.27)
5.43 (1.18)
2.82*
5.06 (1.29)
4.72 (1.52)
4.75 (1.59)
4.97 (1.34)
4.58 (1.54)
4.72 (1.58)
5.45 (1.23)
5.03 (1.45)
5.10 (1.46)
2.80*
1.80
1.28
Notes: This table shows means and standard errors (in parentheses). The numbers in the superscripted parentheses indicate the group score number from
which the group score is significantly different. Due to unequal sample sizes, post hoc pairwise comparison tests were conducted utilizing the Hochberg
test statistic. **p < .01, *p < .05.
Barrier
Lack of data
Inability to identify most suitable data
Security concerns
Lack of upper management support
Unclear business case or value
Privacy/confidentiality issues
Lack of policies and governance structure
Inability to make sense of available data
No need/not necessary/no benefit
Overwhelming, difficult to manage
Cost of currently available solutions
Lack of integration with current systems
Employees are inexperienced (need to train)
Change management issues (resistance to
change)
Lack of appropriate solutions for SCM
Current applications unable to meet business
needs
Time constraints
Overall
mean
3.83
3.99
3.84
3.83
3.83
3.80
3.91
3.95
3.30
4.16
4.48
4.61
4.92
4.44
(1.50)
(1.48)
(1.71)
(1.74)
(1.44)
(1.72)
(1.57)
(1.54)
(1.64)
(1.46)
(1.48)
(1.51)
(1.47)
(1.64)
No current use,
but plans for
the future (1)
4.42(3)
4.50(3)
4.36
4.39
4.19
4.28
4.14
4.39
3.67
4.56
4.86
4.92
4.92
4.56
(1.32)
(1.56)
(1.69)
(1.40)
(1.04)
(1.60)
(1.31)
(1.48)
(1.45)
(1.46)
(1.44)
(1.50)
(1.61)
(1.61)
To some
extent (2)
3.79
4.00
3.86
3.83
3.89
3.84
4.03
3.96
3.34
4.15
4.41
4.62
4.83
4.32
(1.35)
(1.41)
(1.69)
(1.79)
(1.41)
(1.70)
(1.59)
(1.44)
(1.60)
(1.33)
(1.38)
(1.45)
(1.47)
(1.67)
To a great
extent (3)
3.55(1)
3.67(1)
3.50
3.52
3.52
3.45
3.53
3.67
2.98
3.95
4.38
4.40
5.12
4.60
(1.79)
(1.50)
(1.71)
(1.78)
(1.65)
(1.77)
(1.65)
(1.71)
(1.80)
(1.66)
(1.68)
(1.60)
(1.38)
(1.59)
ANOVA
F-value
and sign.
3.93*
3.66*
2.91*
2.87*
2.77*
2.71*
2.54*
2.51*
2.07
1.97
1.44
1.33
0.77
0.69
4.33 (1.41)
3.96 (1.49)
4.56 (1.46)
4.14 (1.40)
4.30 (1.33)
3.87 (1.44)
4.25 (1.54)
4.03 (1.64)
0.58
0.54
4.63 (1.37)
4.56 (1.56)
4.71 (1.25)
4.52 (1.49)
0.44
Notes: This table shows means and standard errors (in parentheses). The numbers in the superscripted parentheses indicate the group score number from
which the group score is significantly different. Due to unequal sample sizes, post hoc pairwise comparison tests were conducted utilizing the Hochberg
test statistic. *p < .05.
ate data. Most likely, some type of data exist that may be suitable for predictive analytics, due to the ease with which data can
be collected today; it just needs to be identified, made palatable,
and worked with. It may also just need an illustration of how existing data can be readily used for predictive analytics: the data
may be there, but individuals may not realize its suitability. For
the remaining barriers, a similar pattern is evident, although the
differences are not statistically significant.
Mean
SD
5.38
1.216
5.19
5.18
1.117
1.113
5.17
5.08
1.224
1.137
5.04
1.196
4.92
1.248
4.88
4.74
4.67
1.269
1.275
1.221
Company
Steelcase
Kellogg's
PriceWaterhouseCoopers
IBM
Source: https://accounting.broad.msu.edu/welcome/ms-business-analytics/events/
analysis therefore not only make valuable contributions to academic research, but also to the practitioner press.
[Figure 1: Number of MS analytics programs by year, 2007 to 2015; series shown: MSDS, MSBA, MSA; vertical axis from 0 to 80.]
[Figure 2: Data scientists and the three skill areas: enterprise business processes and decision making; data management; analytical and modeling tools.]
An MS in predictive analytics
Waller and Fawcett (2013a) encouraged the provision of a curriculum to train next-generation graduates to successfully tackle the
challenges of data science. We directly follow their call by providing a description of how a large U.S. university trains its
next-generation data scientists. In this section, we first provide
University
Arizona State
University
Carnegie Mellon
University*
DePaul University
Drexel University*
Fordham
University*
Louisiana State
University
Michigan State
University
New York
University
North Carolina
State University
Northwestern
University*
Northwestern
University
Rensselaer
Polytechnic
Institute*
Rutgers
University*
Southern
Methodist
University*
University of
Cincinnati*
University of
Tennessee*
University of
Connecticut*
University of
Maryland
University of
San Francisco
University
of Texas*
Credits; Time commitment; Year established; Program length; Business processes and decision making (%); Data management (%); Analytical and modeling tools (%); Integration (%)
30
FT/O
2013
9–16 months
18
27
46
FT
2013
12–21 months
36
23
25
16
52
45
30
O
PT/O
FT
2010
2012
2012
24 months
20 months
12 months
22
29
43
34
42
43
33
29
14
11
39
FT
2011
12 months
30
52
13
30
FT
2013
12 months
13
20
47
20
36
FT/PT
2013
15–21 months
33
20
40
30
FT
2007
10 months
15
25
45
15
2011
20 months
10
10
70
10
FT
2012
15 months
20
27
33
20
30
FT
2013
12 months
16
34
34
16
43
FT/PT
2012
12–21 months
30
20
40
10
33
FT
2013
18–24 months
46
46
25
FT/PT
2011
12–20 months
76
16
39
FT
2010
17 months
22
10
58
10
33
FT/PT
2012
12+ months
44
28
28
30
FT
2013
9 months
48
10
36
35
FT
2012
11 months
15
20
54
11
36
FT
2013
12 months
13
25
50
12
Notes: *The percentage mix reflects required courses; elective course selections would change the actual percentage allocations. FT, Full time, on campus program; PT, Part time, on campus program; O, Online program.
data science capabilities. At one end of the spectrum this curriculum involves certificate and executive education programs to expose participants to core concepts in data science. At the other end of the spectrum are well-developed and robust curricula leading to master's or undergraduate degrees in data science.1 As illustrated in Figure 1 (North Carolina State University 2014), there has been tremendous growth in the number of MS Analytics programs, with a significant uptick in the number of programs in 2014 and 2015.
One of the most challenging curricular aspects of developing sufficient applicable skills in data science is the breadth and depth of diverse skill sets needed to be a highly capable professional. Furthermore, individuals who have an interest in one dimension of data science are unlikely to have developed skills in, or even been exposed to, the conceptual foundations of another area. Thus, a data scientist must develop deep conceptual understanding and applicable skills in three areas: enterprise business processes and decision making, data management, and analytical and modeling tools (Figure 2). Given the disparate nature of these areas, it is critical that the development process includes experiential learning to provide the necessary practice of integrating these areas and understanding how to master the skills in specific domains.
We conducted an analysis of MS in Analytics programs that were established in 2013 or prior, and which have received recognition as a top program (BISoftwareInsight 2014; MastersInDataScience 2014). We excluded programs that focused strictly on concentrations in existing MBA programs to examine differences among the specific MS curricula. As can be seen in the summary provided in Table 7, there is tremendous variation in these programs: credit hours vary substantially, but a majority of the programs identified require between 30 and 39 credits, with some programs requiring considerably more. Consistent with the number of credit hours, a majority of these programs support degree completion in one year or less. Those requiring substantially more time are often designed as online or part-time programs, and typically target working students who cannot attend a full-time program.
Perhaps the most interesting variation can be seen across the three knowledge/skill domains previously described (enterprise business processes and decision making, analytical and modeling tools, data management), which is indicative of the different programmatic foci of the offerings.2 Stressing our conviction about the value of practical application of course content, we added an
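[Figure: chart with segments labeled Analytics Problem Framing, Data, Methodology/Approach Selection, Model Building, and Deployment; percentage labels 15%, 16%, 17%, 15%, and 22% (label-to-segment mapping not recoverable).]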
the three principal areas identified above: enterprise business processes and decision making; data management; and analytical
and modeling tools.
Experiential learning, embedded in each of the three semesters
the students are enrolled, involves an integration of the three
foundational areas by performing a corporate analytics project in
partnership with an organization. Participating companies have
included Fortune 500 and medium-sized businesses in the insurance, automotive, financial services, energy, and manufacturing
industries as well as governmental entities. While many of the
projects involve customer-facing applications creating predictive
models regarding customer churn and customer lifetime value,
there is increasing interest in financial, cost management, supply
chain (e.g., predicting logistics and quality failures), and human
resource (e.g., employee churn) applications. The analytical tools
applied have ranged from traditional statistics to sophisticated
predictive modeling applications (e.g., SPSS Modeler) and programming in R to develop customized statistical and visual
analyses.
The colors correspond to the prior figure highlighting business processes and decision making (green), data
management (tan), and analytical and modeling tools (blue). The yellow highlights integrative content across
all three areas. The numbers in parentheses represent the number of credit hours.
executives and business process owners in firms, using appropriate language (e.g., presenting impact findings using visual
display to executives and highly technical models to business
process owners).
More on curriculum
In addition to the launch of the MS in Predictive Analytics program, the university is committed to developing analytics awareness and understanding across our very large undergraduate
population. Given the number of credits associated with our business degree (across all majors, including SCM), there are insufficient credits available for a student to develop the depth of
knowledge needed in data management and analytics, and modeling tools. However, we are using the business core courses to
develop awareness and strategic understanding of the importance
of using predictive analytics when making many business decisions, and we are providing hands-on touch points for every student using integrative experiences, as outlined in the following.
The university has initiated the process of integrating a common thematic module, cognitive computing, within each course
making up the core business requirement that every business
major must take (SCM, marketing, finance, human resources,
information systems, statistics, and a strategy capstone). Speci-
Notes: Blue arrows represent changes across the first three years of the program. Shaded areas represent course modifications (yellow) and new course developments (green).
CONCLUSION
This article brought insight into the rapidly evolving domain of
SCM predictive analytics and represents, to our knowledge, one
of the first academic, large-scale surveys on the topic. With the
data collected, we were able to provide a current assessment of
the extent to which SCM predictive analytics are used in industry, together with the underlying motivations. We also identified
primary benefits and major obstacles to SCM predictive analytics. In doing so, we offered additional insight into various usage
groups and their characteristics, explicating the current state of
data science, predictive analytics, and big data in SCM. Further,
we provided recommendations on how to train our next-generation data scientists. Insight in this latter part was generated by
the analysis of our survey data and expert interviews, combined
with our experiences in developing and implementing one of the
first MS degree programs in predictive analytics, offering insight
into the future potential of data science, predictive analytics, and
big data in SCM.
Overall, it was our intent to provide a timely assessment of
the field and motivate additional research and pedagogical
developments in this domain. As was illustrated, the field of
SCM predictive analytics provides a promising avenue for transforming the management of supply chains, and offers an exciting array of research opportunities. For this purpose, and based
on our insight derived from the survey, expert interviews, and
the development of the MS program, we offer the following
possible avenues for future investigation. From a strategic perspective, there is a need for understanding the specific types of
supply chain questions that firms are addressing with analytics
and the measured value of the insights derived from analytics
activities (e.g., return on investment). Similarly, more formal
investigation into the barriers impeding the adoption and infusion of predictive analytics into the organization would be valuable. In addition, linked to the adoption of business analytics is
the organizational structure that is implemented to promote and
support enterprise analytics activities. Associated questions in
need of further investigation include the following: What are
the differing impacts of having a centralized, distributed, or
hybrid structure for successfully promoting analytics use within
the enterprise, and how might that structure change over time?
What corporate governance structures need to be in place to
enable and facilitate SCM predictive analytics? In addition, legal
and ethical issues in the use of predictive analytics, especially
as it pertains to consumer data, need to be investigated. Promising avenues for predictive analytics exist also in its application
to real-time risk management and dynamic resource optimizations. It is our hope that this article provides motivation and a
starting point to stimulate further research in SCM predictive
analytics, and to further infuse curricula with predictive analytics components.
REFERENCES
Akkermans, H.A., and Van Wassenhove, L.N. 2013. Searching
for the Grey Swans: The Next 50 Years of Production
Research. International Journal of Production Research 51
(23–24):6746–55.
22/broad-undergraduates-move-cognitive-computing-future-ibms-watson/
Nestler, S., Levis, J., and Klimack, B. 2012. Certified Analytics
Professional. Analytics, Sept/Oct, 26–29.
North Carolina State University. 2014. Degree Programs in
Analytics and Data Science. http://analytics.ncsu.edu/?page_id=4184
Russom, P. 2011. Big Data Analytics. TDWI Best Practices
Report, Fourth Quarter. http://www.sas.com/content/dam/SAS/en_us/doc/research2/big-data-analytics-105425.pdf
Waller, M.A., and Fawcett, S.E. 2013a. Click Here for a Data
Scientist: Big Data, Predictive Analytics, and Theory
Development in the Era of a Maker Movement Supply
Chain. Journal of Business Logistics 34(4):249–52.
Waller, M.A., and Fawcett, S.E. 2013b. Data Science,
Predictive Analytics, and Big Data: A Revolution That Will
Transform Supply Chain Design and Management. Journal
of Business Logistics 34(2):77–84.
SHORT BIOGRAPHIES
Tobias Schoenherr (PhD Indiana University) is an Associate
Professor in the Broad College of Business at Michigan State
University. His research focuses on strategic supply management,
including strategic sourcing, the use of technology in procure-
Copyright of Journal of Business Logistics is the property of Wiley-Blackwell and its content
may not be copied or emailed to multiple sites or posted to a listserv without the copyright
holder's express written permission. However, users may print, download, or email articles for
individual use.