
Invited Article

Data mining fool's gold

Gary Smith

Journal of Information Technology, 2020, Vol. 35(3) 182–194
© Association for Information Technology Trust 2020
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0268396220915600
journals.sagepub.com/jinf

Abstract
The scientific method is based on the rigorous testing of falsifiable conjectures. Data mining, in contrast, puts data before
theory by searching for statistical patterns without being constrained by prespecified hypotheses. Artificial intelligence
and machine learning systems, for example, often rely on data-mining algorithms to construct models with little or no
human guidance. However, a plethora of patterns are inevitable in large data sets, and computer algorithms have no
effective way of assessing whether the patterns they unearth are truly useful or meaningless coincidences. While data
mining sometimes discovers useful relationships, the data deluge has caused the number of possible patterns that can
be discovered relative to the number that are genuinely useful to grow exponentially—which makes it increasingly likely
that what data mining unearths is fool's gold.

Keywords
Data mining, data exploration, knowledge discovery, artificial intelligence, machine learning, HARKing

Introduction

The scientific revolution was fueled by what has come to be known as the scientific method: specify a falsifiable conjecture and then collect data, ideally through a controlled experiment, to test this hypothesis. The modern availability of powerful computers and vast amounts of data makes it tempting to reverse the process by using data "to reveal hidden patterns and secret correlations" (Sagiroglu and Sinanc, 2013). When a pattern is found, a theory can be conceived after the fact to explain the pattern, or it might be argued that theories are unnecessary (Begoli and Horsey, 2012; Cios et al., 2007; Fayyad et al., 1996). Some assert that using a priori knowledge before looking at the data is not only unnecessary, but limiting (Piatetsky-Shapiro, 1991).

This reversal of the scientific method goes by many names, including data mining, data exploration, knowledge discovery, and information harvesting. What they have in common is the belief that data come before theory. This is known as HARKing: Hypothesizing After the Results are Known. The harsh sound of the word reflects the dangers of HARKing: it is tempting to believe that patterns are unusual and their discovery meaningful; in large data sets, patterns are inevitable and generally meaningless.

Calude and Longo (2017) prove that large amounts of data necessarily contain a large number of patterns and correlations waiting to be discovered:

    the more data, the more arbitrary, meaningless and useless (for future action) correlations will be found in them. Thus, paradoxically, the more information we have, the more difficult it is to extract meaning from it. Too much information tends to behave like very little information.

If there is a fixed set of true statistical relationships that are useful for making predictions, the data deluge necessarily increases the ratio of meaningless statistical relationships to true relationships.

From a Bayesian perspective, suppose that 1 out of every 1000 patterns that might be discovered is useful and the other 999 are useless, and that we use a reliable statistical test that will correctly identify a real pattern as truly useful and a coincidental pattern as truly useless 95% of the time. Our prior probability that we find a useful pattern by searching randomly is 1 in 1000. After we have found a pattern and determined that it is statistically significant at the 5% level, the posterior probability that it is useful is less than 1 in 50. This is higher than 1 in 1000, but it is hardly persuasive. We are far more likely than not to have discovered a pattern that is genuinely useless.

Table 1 shows the posterior probabilities for other values of the prior probability.

Pomona College, USA

Corresponding author:
Gary Smith, Department of Economics, Pomona College, 425 N. College Avenue, Claremont, CA 91711, USA.
Email: [email protected]
Table 1. Probability that a discovered pattern is useful.

Prior probability    Posterior probability
0.001                0.018664
0.0001               0.001897
0.00001              0.000190
0.000001             0.000019

We do not know precisely how many useless patterns are out there waiting to be discovered, but we do know that with big data and powerful computers, it is a very large number that is getting larger every day, which means that the probability that a randomly discovered pattern is useful is getting ever closer to 0.

Data mining

Decades ago, being accused of data mining, fishing expeditions, or data dredging was an insult comparable to being accused of plagiarism. James Tobin (1972), a Nobel laureate in economics, wryly observed that when researchers did calculations by hand, they thought hard before calculating. With terabytes of data and lightning-fast computers, it is too easy to calculate first and think later.

Ronald Coase (1988), another economics Nobel laureate, famously remarked, "If you torture the data long enough, it will confess." Since those who ransack data looking for statistical patterns will surely find some, their discoveries demonstrate nothing more than that the data were ransacked.

Today, the combination of powerful computers and the data explosion has made data mining irresistible for some. Anderson (2008), at the time the editor-in-chief of Wired, wrote an article with the provocative title, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete." Anderson argued:

    With enough data, the numbers speak for themselves. . . . Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

In the opening lines of a foreword for a book on using data mining for knowledge discovery, a computer scientist (Kecman, 2007) cited Coase while arguing that state-of-the-art data-torturing tools may sometimes be needed to reveal nature's secrets:

    "If you torture the data long enough, [it] will confess," said 1991 Nobel-winning economist Ronald Coase. The statement is still true. However, achieving this lofty goal is not easy. First, "long enough" may, in practice, be "too long" in many applications and thus unacceptable. Second, to get "confession" from large data sets one needs to use state-of-the-art "torturing" tools. Third, Nature is very stubborn—not yielding easily or unwilling to reveal its secrets at all.

Coase did not intend his comment to be a lofty goal worth seeking, but a succinct criticism of the practice of pillaging data in search of statistical significance (Tullock, 2001).

Research published in highly respected Information Systems (IS) journals has traditionally followed the scientific method, with accepted theories suggesting hypotheses that are tested with high-quality data. Grover and Lyytinen (2015) make an intentionally provocative argument for less reliance on established theories and more openness to data-driven research, and such papers are now being published in top-tier IS journals (e.g. Brynjolfsson et al., 2016; Guo et al., 2017; Martens et al., 2016; Shi et al., 2016). Jafar et al. (2017) report that undergraduate and graduate IS curricula increasingly include data-mining courses offered under a variety of names.

Data mining sometimes discovers useful models that are later confirmed by the scientific method. What is problematic is the widespread uncritical acceptance of data-mined results as equivalent to the scientific method. If an explanation is desired, it is easy for creative humans to think up fanciful theories after the fact. If the data mining is hidden inside a black-box algorithm, no explanation is possible. If the researcher believes that correlation supersedes causation, no explanation is needed.

Prediction

Some data-mining enthusiasts argue that the goal is prediction, not the confirmation of causal effects (Mullainathan and Spiess, 2017). If there is a correlation between the number of Google searches for the word Scorpio and the price of avocados in San Francisco, we do not need to know why these are correlated. It is enough to know that they are correlated, since one predicts the other. Smith and Cordes (2019) quote a business executive who repeatedly embraced data mining with the pithy comment, "Up is up."

Athey (2018: 10) argued that prediction does not require causation:

    Imagine first that a hotel chain wishes to form an estimate of the occupancy rates of competitors, based on publicly available prices. This is a prediction problem . . . [H]igher posted prices are predictive of higher occupancy rates, since hotels tend to raise their prices as they fill up (using yield management software). In contrast, imagine that a hotel chain wishes to estimate how occupancy would change if the hotel raised prices across the board . . . This is a question of causal inference. Clearly, even though prices and occupancy are positively correlated in a typical dataset, we would not conclude that raising prices would increase occupancy.

However, for a statistical relationship to be useful for making predictions, there must be a reason for the relationship. For example, the ancient Egyptians noticed that the annual flooding of the Nile was regularly preceded by
seeing Sirius—the brightest star visible from earth—appear to rise in the eastern horizon just before the sun rose. Sirius did not cause the flooding, but it was a useful predictor because there was an underlying reason: Sirius rose before dawn every year in mid-July, and heavy rains that began in May in the Ethiopian Highlands caused the flooding of the Nile in late July.

In the hotel example, the statistical correlation between prices and occupancy rates is not a fluke; it reflects a real underlying structural relationship. In contrast, a discovered statistical relationship between hotel occupancy rates in Denver and the price of tea in China would be useless for predicting either.

Out-of-sample data

The perils of data mining are often exposed when a pattern that has been discovered by rummaging through data disappears when it is applied to fresh data. So, it would seem that an effective way of determining whether a statistical pattern is meaningful or meaningless is to divide the original data into two parts: in-sample data that can be used to discover models, and out-of-sample data that can be used to test the models that were discovered with the in-sample data (Athey, 2018; Egami et al., 2018). This procedure is sensible but, unfortunately, provides no guarantees.

Suppose that we are trying to figure out a way to predict Liverpool's margin of victory in the English Premier League, and we divide the 2018 season into the first half (19 in-sample games) and the second half (19 out-of-sample games). If a data-mining algorithm looks at temperature data in hundreds of California cities on the day before Liverpool matches, it might discover that the difference between the high and low temperatures in Claremont, California, is a good predictor of the Liverpool score.

If this statistical pattern is purely coincidental (as it surely is), then testing the relationship on the out-of-sample data is likely to show that it is useless for predicting Liverpool scores. If that happens, however, the data-mining algorithm can keep looking for other weather patterns (there are lots of cities in California, and other places, if needed) until it finds one that makes successful predictions with both the in-sample data and the out-of-sample data—and it is certain to succeed if a sufficiently large number of cities are considered. Just as spurious correlations can be discovered for the first 19 games of the Premier League season, so spurious correlations can be discovered for all 38 games.

A pattern is generally considered to be statistically significant if there is less than a 5% chance that it would occur by luck alone. This means that if we are so misguided as to only consider correlations between pairs of independently generated random numbers, we can expect 1 out of every 20 spurious correlations to pass the in-sample test, and 1 out of 400 to pass both the in-sample and out-of-sample tests. A determined head-in-the-sand researcher who analyzes 10,000 pairs of unrelated data can expect to find 25 correlations that are statistically significant both in-sample and out-of-sample. In the age of the data deluge, there are a lot more than 10,000 pairs that can be analyzed and a lot more than 25 spurious correlations that will survive in-sample and out-of-sample tests.

Out-of-sample tests are surely valuable; however, data mining with out-of-sample data is still data mining and still subject to the same pitfalls.

Crowding out

There is a more subtle problem with wholesale data mining tempered by out-of-sample tests. Suppose that a data-mining algorithm is used to select several predictor variables from a data set that includes a relatively small number of "true" variables that are causally related to the variable being predicted and a large number of "nuisance" variables that are independent of the variable being predicted. One problem, as we have seen, is that some nuisance variables are likely to be coincidentally successful both in-sample and out-of-sample, but then flop when the model goes live with new data.

An additional problem is that a data-mining algorithm may select nuisance variables in place of true variables that would be useful for making reliable predictions. Testing and retesting a data-mined model may eventually expose the nuisance variables as useless, but it cannot bring back the true variables that were crowded out by the nuisance variables. The more nuisance variables that are initially considered, the more likely it is that some true variables will disappear without a trace.

Not enough good data?

In fields where laboratory experiments are possible, theories can be tested endlessly and coincidental patterns will eventually be exposed as such. However, in many fields, there are not enough data for a large number of out-of-sample tests (Arnott et al., 2018).

An extreme example: shortly after the 2016 U.S. presidential election, it was widely reported that a history professor had correctly predicted Trump's victory, based on a model with 13 key variables that predicts the popular vote (Stevenson, 2016). Overfitting was an obvious concern. As it turned out, Trump lost the popular vote, contrary to the model's prediction, but the bigger point is that one observation every 4 years does not allow for much out-of-sample testing. Other cases are less extreme, but still limiting.

Another problem with observational data collected outside of controlled experiments is self-selection bias. People who make different choices may experience different outcomes not because of their choices but because of the types of people who make such choices.
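The earlier arithmetic (1 in 20 spurious correlations passing an in-sample test; 1 in 400 passing both the in-sample and out-of-sample tests) is easy to check by simulation. A minimal sketch, in which the sample size, number of pairs, and seed are my own choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, pairs = 50, 20_000            # 50 observations per half, 20,000 unrelated pairs
t_crit = 2.0106                  # two-sided 5% critical t value, df = n - 2
r_crit = t_crit / np.sqrt(n - 2 + t_crit**2)   # equivalent cutoff for |r|

def significant(x, y):
    """True if the correlation clears the 5% two-sided threshold."""
    return abs(np.corrcoef(x, y)[0, 1]) > r_crit

passed_in = passed_both = 0
for _ in range(pairs):
    x, y = rng.standard_normal((2, 2 * n))     # pure, unrelated noise
    if significant(x[:n], y[:n]):              # "discovered" in-sample
        passed_in += 1
        if significant(x[n:], y[n:]):          # survives out-of-sample too
            passed_both += 1

print(f"passed in-sample:  {passed_in / pairs:.4f}")   # close to 0.05   (1 in 20)
print(f"passed both tests: {passed_both / pairs:.4f}") # close to 0.0025 (1 in 400)
```

At the text's scale of 10,000 unrelated pairs, the roughly 0.25% double-survival rate is exactly what produces the 25 spurious correlations that pass both tests.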
Data-mining algorithms are ill-equipped to recognize such biases because they do not consider the nature of the data being mined.

Social media data are currently fashionable because they provide vast amounts of data, but their usefulness is questionable. Are the people who use social media representative of the population? Are the messages they send representative of their feelings? Are they even people? A 2018 Pew Research Center study (Wojcik et al., 2018) estimated that two-thirds of the tweeted links to popular web sites were made by suspected bots. Again, data-mining algorithms are hard-pressed to consider the relevance and quality of the data they mine.

Monte Carlo simulations

Monte Carlo simulations, named after the gambling mecca, were first used in the Manhattan Project in the 1940s (Metropolis, 1987). Today, they are widely employed in many probabilistic situations where an exact numerical solution cannot be derived mathematically.

For example, a financial planner might specify a person's initial wealth, age, spending, and investment strategies, and the model's parameters. In a Monte Carlo simulation of annual outcomes, a computer random number generator determines whether the person lives another year, how much she spends, and the return on her investments. The simulations might be run one million times in order to provide an estimate of the probability that she outlives her wealth and the probability distribution of her bequest. After the simulations are run with a variety of behavioral assumptions, the planner and the client can choose a strategy.

Here, I used Monte Carlo simulations to explore the perils of data mining. A total of 200 observations for each of m candidate explanatory variables were determined by computer-generated random draws from a normal distribution with mean 0 and standard deviation σx:

    X_ij = ε_ij,   ε ~ N[0, σx]   (1)

The independence of the explanatory variables ensures that there are no structural relationships among the explanatory variables that might cause some variables to be proxies for others.

Five randomly selected explanatory variables (the true variables) were used to determine the value of a dependent variable Y:

    Y_j = Σ_{i=1}^{5} β_i X_ij + υ_j,   υ ~ N[0, σy]   (2)

where the value of each β coefficient was randomly determined from a uniform distribution ranging from −4 to 4, with the range −2 to 2 excluded so that the true variables would have substantial effects on the dependent variable. The other candidate variables are nuisance variables that have no effect on Y, but might be coincidentally correlated with Y.

The base case was σx = 5, σy = 20, but I also considered all combinations of σx = 5, 10, or 20, and σy = 10, 20, or 30. For the range of values considered here, the results were robust with respect to the values of σx and σy, so I only report results for the base case. One hundred thousand simulations were done for each parameterization of the model.

The central question is how effective the estimated model is at making reliable predictions with fresh data. So, in each simulation, 100 observations were used to estimate the model's coefficients, and the remaining 100 observations were used to test the model's reliability.

The in-sample data were centered on the in-sample means and the out-of-sample data were centered on the out-of-sample means so that the out-of-sample predictions would not be inflated if the in-sample and out-of-sample means differed.

A stepwise regression procedure was used to select the explanatory variables one by one, at each step choosing the variable with the lowest two-sided p value if that p value was less than 0.05. The results were not due to the use of stepwise regression, but rather to data mining. Stepwise regression is simply a practical data-mining tool for identifying explanatory variables that are statistically correlated with the variable being predicted when there are a large number of candidate explanatory variables (Bruce and Bruce, 2017; Cios et al., 2007; Hastie et al., 2016; Varian, 2014).

In one set of simulations, all of the candidate explanatory variables were nuisance variables. In the second set of simulations, five true variables were included among the m candidate variables. The first set of simulations, with entirely spurious variables, considers the extent to which coincidental correlations with the dependent variable can create an illusion of a successful prediction model. The second set of simulations considers how well data mining is able to distinguish between meaningful and meaningless variables.

The predictive success of the model was gauged by the correlation between the actual values of the dependent variable and the model's predicted values. The square of the in-sample correlation is the coefficient of multiple determination, R², for the estimated model. The out-of-sample correlation is the corresponding statistic using the out-of-sample data with the in-sample coefficient estimates.

Table 2 reports the results of simulations in which all of the candidate explanatory variables were nuisance variables. Every variable selected by the data-mining algorithm as being useful was actually useless, yet data mining consistently discovered a substantial number of variables that were highly correlated with the target variable. For example, with 100 candidate variables, the data-mining algorithm picked out, on average, 6.63 useless variables for making predictions.
Table 2. Simulations with no true variables.

Number of candidate   Average number of    Average in-sample   Average out-of-sample
variables             variables selected   correlation         correlation
  5                    1.11                 0.244               0.000
 10                    1.27                 0.258               0.000
 50                    3.05                 0.385               0.000
100                    6.63                 0.549               0.000
500                   97.79                 1.000               0.000

Table 3. Simulations with five true variables.

Number of candidate   Average number of    Average in-sample   Average out-of-sample
variables             variables selected   correlation         correlation
  5                    4.50                 0.657               0.606
 10                    4.74                 0.663               0.600
 50                    6.99                 0.714               0.543
100                   10.71                 0.780               0.478
500                   97.84                 1.000               0.266
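A single run of this simulation design can be sketched as follows. This is my own simplified implementation: a |t| > 1.98 cutoff stands in for the p < 0.05 rule, the centering step is omitted (correlations are unaffected by intercepts), and one run with one seed will only loosely resemble the averages in the tables:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, n_true = 100, 100, 5                  # obs per half, candidates, true variables

X = rng.normal(0, 5, (2 * n, m))            # equation (1): independent candidates
beta = np.zeros(m)
beta[:n_true] = rng.uniform(2, 4, n_true) * rng.choice([-1, 1], n_true)
y = X @ beta + rng.normal(0, 20, 2 * n)     # equation (2): five true variables plus noise

def t_values(cols, ys):
    """OLS t-statistics for each regressor (intercept included, then dropped)."""
    A = np.column_stack([np.ones(len(ys)), cols])
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
    resid = ys - A @ coef
    s2 = resid @ resid / (len(ys) - A.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    return coef[1:] / se[1:]

# Forward stepwise selection: repeatedly add the candidate with the largest
# |t|, stopping when no remaining candidate clears |t| > 1.98 (roughly p < 0.05).
selected, remaining = [], list(range(m))
while True:
    best_j, best_t = None, 1.98
    for j in remaining:
        t = abs(t_values(X[:n, selected + [j]], y[:n])[-1])
        if t > best_t:
            best_j, best_t = j, t
    if best_j is None:
        break
    selected.append(best_j)
    remaining.remove(best_j)

def pred_corr(cols_fit, ys_fit, cols_new, ys_new):
    """Correlation between actual values and the fitted model's predictions."""
    A = np.column_stack([np.ones(len(ys_fit)), cols_fit])
    coef, *_ = np.linalg.lstsq(A, ys_fit, rcond=None)
    pred = np.column_stack([np.ones(len(ys_new)), cols_new]) @ coef
    return np.corrcoef(pred, ys_new)[0, 1]

r_in = pred_corr(X[:n, selected], y[:n], X[:n, selected], y[:n])
r_out = pred_corr(X[:n, selected], y[:n], X[n:, selected], y[n:])
nuisance = sum(j >= n_true for j in selected)
print(f"{len(selected)} variables selected ({nuisance} nuisance)")
print(f"in-sample r = {r_in:.2f}, out-of-sample r = {r_out:.2f}")
```

Averaging such runs over many seeds, and varying m, reproduces the qualitative pattern of Tables 2 and 3: more candidates mean more nuisance variables selected, higher in-sample correlations, and weaker out-of-sample performance.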

Table 2 also shows that, as the number of candidate explanatory variables increases, so does the average number of nuisance variables selected. Regardless of how highly correlated these variables are with the target variable, they are, on average, completely useless for future predictions: the out-of-sample correlations average zero. However, some models, by luck alone, survived out-of-sample tests. In every case, approximately 5% of the models that were statistically significant in-sample were also statistically significant out-of-sample.

Table 3 shows the simulation results when the five true variables were among the candidate variables considered by the data-mining algorithm. The inclusion of five true variables did not eliminate the selection of nuisance variables; instead, it increased the number of variables selected. The larger the number of candidate variables, the more nuisance variables are included and the worse are the out-of-sample predictions. This is empirical confirmation of what might be called the paradox of big data:

    It would seem that having data for a large number of variables will help us find more reliable patterns; however, the more variables we consider, the less likely it is that what we find will be useful.

Notice, too, that when fewer nuisance variables are considered, fewer nuisance variables are selected. Instead of unleashing a data-mining algorithm on hundreds or thousands or hundreds of thousands of unfiltered variables, it would be better to use human expertise to exclude as many nuisance variables as possible. This is a corollary of the paradox of big data:

    The larger the number of possible explanatory variables, the more important is human expertise.

These simulations also document how a plethora of nuisance variables can crowd out true variables. With 100 candidate variables, for example, one or more true variables were crowded out 50% of the time, and two or more true variables were crowded out 16% of the time. There were even occasions when all five true variables were crowded out.

The bottom line is straightforward. Variables discovered through data mining can appear to be useful even when they're irrelevant, and can cause true variables to be overlooked and discarded. Both flaws undermine the usefulness of data mining.

Trump's tweets

A simple data-mining example is an exploration of Donald Trump's tweets (Trump Twitter Archive, 2019) during the 3-year period beginning on 9 November 2016, the day after his election as President of the United States.

Trump has 66 million Twitter followers and averaged 10.64 tweets a day during this 3-year period. He holds the most powerful office in the world, so perhaps his tweets have real consequences. I restricted my analysis to words that Trump used at least 100 times and that appeared in his tweets on at least 50 different days, and I ignored simple filler words like a, an, it, and to. I calculated the mean and standard deviation of his daily usage of each word and, from these, calculated the daily Z-value for each word.

I then used a 10-fold cross-validation data-mining algorithm to identify words whose daily Z-values could be used to predict various variables 1 to 5 days later. It turned out that the S&P 500 index of stock prices is predicted to be 97 points higher 2 days after a one-standard-deviation increase in Trump's use of the word president.
Table 4. Data-mining Donald Trump's tweets.

Dependent variable    Explanatory word   (Test MSE)/(Training MSE)   Reduction in MSE (%)   Full-sample correlation
S&P 500 (+2)          President          1.0017                      22.69                   0.43
Moscow Low (+4)       Ever               1.0006                       3.97                   0.20
Pyongyang High (+5)   Wall               1.0021                       5.04                  −0.22
Urban Tea (+4)        With               1.0000                      13.64                  −0.32
RV (+5)               Democrat           1.0043                      28.87                   0.47

MSE: mean square error; RV: random variable.

Table 4 shows that the ratio of the average test-period mean square error (MSE) to the training-period MSE was barely above 1 and that the use of the model reduced the MSE by 22.69% relative to simply using the average value of the S&P 500 as a predictor. The correlation between the daily Z-values for president and the S&P 500 2 days later was a remarkable 0.43. The two-sided p value was essentially zero, though statistical tests with data-mined models are problematic.

We can surely concoct a plausible explanation for why Trump tweeting the word president affects the stock market a few days later, so I predicted a few other variables. Trump seems to admire Russian President Vladimir Putin (Somerlan, 2019; Yeung et al., 2019) and North Korean Chairman Kim Jong-un (Bierman and Stokols, 2018; Haltiwanger, 2019). Perhaps his tweets reverberate in these countries.

Using Weather Underground (2019) data, my 10-fold cross-validation data-mining algorithm discovered that the low temperature in Moscow is predicted to be 3.30°F higher 4 days after a one-standard-deviation increase in Trump's use of the word ever, and that the low temperature in Pyongyang is predicted to be 4.65°F lower 5 days after a one-standard-deviation increase in the use of the word wall.

We might concoct an explanation for these correlations too, perhaps related to the fact that temperatures in Moscow, Pyongyang, and the Eastern United States are related and that Trump's choice of words is influenced by the weather.

Going farther afield, I considered the proverbial price of tea in China. I could not find daily data on tea prices in China, so I used the daily stock prices of Urban Tea, a tea product distributor headquartered in Changsha City, Hunan Province, China, with retail stores in Changsha and Shaoyang that sell tea and tea-based beverages. The data-mining algorithm found that Urban Tea's stock price is predicted to fall 4 days after Trump used the word with more frequently.

This data-mined correlation between with and Urban Tea's stock price might inspire a creative explanation, so I generated something even more difficult to explain after the fact: a random-walk variable with the daily change in its value determined by random draws from a normal distribution with mean zero. The data-mining algorithm found that a one-standard-deviation increase in Trump's use of the word democrat had a strong positive correlation with the value of this random variable 5 days later.

The intended lessons are how easy it is for data-mining algorithms to find transitory patterns and how tempting it is to think up explanations after the fact. Here, I considered thousands of tweeted words, 19 dependent variables (the S&P 500 and the Dow Jones Industrial Average, Moscow daily high and low temperatures, Pyongyang daily high and low temperatures, Urban Tea stock returns and Jay Shree Tea stock returns, the number of runs scored by the Washington Nationals baseball team, and 10 random-walk variables), and lags of 1 to 5 days, and I only reported the most striking relationships. That is the nature of the beast we call data mining: seek and ye shall find.

Data-mined investment strategies

My data mining of Trump's tweets is not far-fetched. A Bank of America study (Franck, 2019) reported that the stock market does better on days when Trump tweets less. A JP Morgan study (Alloway, 2019) concluded that Trump tweets containing the words China, billion, products, Democrats, or great have statistically significant effects on interest rates.

Bollen et al. (2011) reported that a data-mining analysis of nearly 10 million Twitter tweets during the period February to December 2008 found that an upswing in "calm" words was often followed by an increase in the Dow Jones average 6 days later. They looked at seven different Dow predictors: an assessment of positive versus negative moods and six mood states (calm, alert, sure, vital, kind, and happy). There is, no doubt, considerable noise in assigning mood states to various tweets. Is nice a calm, kind, or happy word? Is yes! an alert, sure, or vital word? The researchers also considered several different days into the future for correlating with the Dow. Finally, why did they use data from February to December 2008? What happened to January? With so much flexibility, data mining was bound to discover some coincidental patterns.

Preis et al. (2013) reported that they had found a novel way to time the stock market by using Google search data. They considered weekly data on the frequency with which users searched for 98 different keywords:
    We included terms related to the concept of stock markets, with some terms suggested by the Google Sets service, a tool which identifies semantically related keywords. The set of terms used was therefore not arbitrarily chosen, as we intentionally introduced some financial bias.

The use of the pejorative word bias is unfortunate since it suggests that there is something wrong with using search terms that are related to the stock market. The belief that correlation supersedes causation assumes that the way to discover new insights is to look for patterns unfettered by expert opinion—here, to discover ways to beat the stock market by looking for "unbiased" words that have nothing to do with stocks. The fatal flaw in such a blind strategy is that coincidental patterns will almost certainly be found, and data alone cannot distinguish between meaningful and meaningless patterns. If we have wisdom about something, it is generally better to use it—to introduce some financial expertise.

The researchers considered moving averages of 1 to 6 weeks for each of their 98 keywords and reported that the most successful stock trading strategy was based on the keyword debt, using a 3-week moving average and this decision rule:

    Buy the Dow if the momentum indicator is negative.
    Sell the Dow if the momentum indicator is positive.

Using data for the 7-year period 1 January 2004 through 22 February 2011, they reported that this strategy had an astounding 23.0% annual return, compared with 2.2% for a buy-and-hold strategy. Their conclusion:

    Our results suggest that these warning signs in search volume data could have been exploited in the construction of profitable trading strategies.

They offer no reasons:

    Future work will be needed to provide a thorough explanation of the underlying psychological mechanisms which lead people to search for terms like debt before selling stocks at a lower price.

The researchers considered 98 different keywords and 6 different moving averages (a total of 588 strategies). If they considered two trading rules (buying when the momentum indicator was positive or selling when the momentum indicator was positive), then 1,176 strategies were explored. With so many possibilities, some chance patterns would surely be discovered—which undermines the credibility of the results. Applied to subsequent data, the strategy had an annual return of 2.81%, compared with 8.60% for buy-and-hold.

Figure 1. Clumsily staggering in and out of the market.

Equbot

In 2017, a company named Equbot launched AIEQ, which claimed to be the first exchange-traded fund (ETF) run by artificial intelligence. Equbot boasted that AIEQ removes "human error and bias from the process" by using IBM's Watson and genetic algorithms, fuzzy logic, and adaptive tuning (Equbot, 2019). How well did it perform? Figure 2 shows that AIEQ seems to be a "closet indexer," tracking the S&P 500 while underperforming it. From inception through 1 November 2019, AIEQ had a cumulative return of 18%, compared with 23% for the S&P 500.

Figure 3 compares the volume of trading in AIEQ to the volume of trading in the S&P 500, both scaled to equal 1 when AIEQ was launched. Once the disappointing results became apparent, customers lost interest.

The Voleon group

In 2008, Michael Kharitonov and Jon McAuliffe, with PhDs in computer science and statistics, respectively, started Voleon, an investment management firm that picked stocks based on a data-mining algorithm:

    McAuliffe and Kharitonov say that they don't even know what their bots are looking for or how they reach their conclusions. "What we say is 'Here's a bunch of data. Extract the signal from the noise,'" Kharitonov says. "We don't know what that
of those that were reported. signal is going to be like.” (Salmon and Stokes, 2010)
I tested their debt strategy for predicting the Dow over
the next 7 years, from 22 February 2011, through 31 Voleon’s algorithms reportedly sifted through massive
December 2018. Figure 1 shows the results. Their debt amounts of data, including satellite images, credit card
receipts, and social media language, looking for patterns related to stock prices.

Ten years after its launch, the Wall Street Journal reported that Voleon's annual return had been slightly worse than the S&P 500, with Kharitonov admitting, "Most of the things we've tried have failed" (Hope and Chung, 2017).

The Journal attributed Voleon's struggles to the fact that financial markets are "continually being affected by new events, the relationships among which are frequently shifting." This is a common excuse when data-mined patterns vanish—the world has changed. Perhaps, but an alternative explanation is that the statistical patterns that data-mining algorithms discover were fleeting because they were fortuitous. If an algorithm finds a correlation between stock prices and Google searches for the word debt, and the pattern disappears when it is used to buy and sell stocks, it is not because the world has changed, but because there never was a real relationship—just a transitory statistical correlation.

Figure 2. Underwhelming performance.

Figure 3. Overwhelming disinterest.

Bitcoin

Bitcoin is the most well-known cryptocurrency, a digital medium of exchange that operates independently of the central banking system. As an investment, bitcoins are pure speculation. Investors who buy bonds receive interest. Investors who buy stocks receive dividends. Investors who buy apartment buildings receive rent. The only way people who invest in bitcoins can make a profit is if they sell their bitcoins for more than they paid for them—and there is little reason to think that bitcoin prices are truly related to anything other than what Keynes called "animal spirits."

Nonetheless, using daily data for 1 January 2011 through 31 May 2018, Liu and Tsyvinski (2018) estimated 810 associations between bitcoin returns and various variables, such as the effect of bitcoin returns on stock returns in the beer, book, and automobile industries. (They also estimated hundreds of additional equations for two other cryptocurrencies, Ethereum and Ripple, and for various sub-periods of their data set.) The most sensible relationship of the many they consider is the effect of bitcoin returns on the number of Google searches for bitcoin. This relationship is not particularly useful, but at least there is a plausible explanation.

Perhaps the most unusual thing about this study is that the authors reported thousands of estimated relationships, not just those that were statistically significant. Overall, for the full-sample bitcoin data, they found 63 of the 810 estimated relationships (7.8%) to be statistically significant at the 10% level, somewhat fewer than would be expected if they had just correlated bitcoin returns with random numbers. They did not attempt to explain the correlations they found: "We don't give explanations, we just document this behavior."

Patterns without explanations are treacherous. A search for patterns in large databases will almost certainly discover some, and the coincidental patterns that are discovered are likely to vanish when the results are used to make predictions. What is the point of documenting temporary patterns that are likely to vanish?

Rosebeck and Smith (2019) used out-of-sample data from 1 June 2018 through 31 July 2019 to try to replicate 59 of the 63 statistically significant relationships that Liu and Tsyvinski reported (there were no out-of-sample data for 4 of the relationships) and found that 13 continued to be statistically significant out of sample. Five of these 13 persistently significant coefficients were the coefficients in five related equations that used bitcoin weekly returns to predict bitcoin searches in the same week, which is one of the few models that has a logical explanation. These five coefficients had the same signs in-sample and out-of-sample.
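The multiple-comparisons arithmetic behind these counts is easy to verify by simulation. The sketch below (hypothetical Python, not the authors' code; every series is pure noise) correlates a fake return series with 810 unrelated candidate variables and counts how many clear the two-sided 10% significance bar by luck alone; the expected count is about 0.10 × 810 = 81:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_tests = 500, 810   # observations per series, candidate predictors

# Pure noise: a fake "return" series and 810 unrelated candidate variables
y = rng.standard_normal(n_obs)
X = rng.standard_normal((n_tests, n_obs))

# Pearson correlation of y with every candidate at once
yc = y - y.mean()
r = (X @ yc) / (n_obs * X.std(axis=1) * y.std())

# Convert to t-statistics; |t| > 1.645 approximates the two-sided 10% cutoff
t = r * np.sqrt((n_obs - 2) / (1 - r ** 2))
lucky = int((np.abs(t) > 1.645).sum())

print(f"{lucky} of {n_tests} pure-noise correlations are 'significant' at the 10% level")
```

The count fluctuates around the binomial expectation of 81 across seeds, even though none of the 810 "discoveries" reflects a real relationship.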
For the other eight persistently significant coefficients, two had the same sign in both periods and six had opposite signs. Should we conclude that, because bitcoin returns happened to have had a statistically significant negative effect on stock returns in the paperboard-containers-and-boxes industry that was confirmed with out-of-sample data, a useful, meaningful relationship has been discovered?

There are three lessons here: First, energetic data mining is certain to discover coincidental patterns. Second, relationships that have a logical foundation are more likely to be confirmed out-of-sample. Third, with a sufficient amount of data mining, some coincidental patterns will, by luck alone, persist out-of-sample.

Google flu trends

Google researchers created a data-mining program called Google Flu Trends that analyzed 50 million search queries and identified 45 key words that "can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day" (Ginsberg et al., 2009). An MIT professor praised the model:

This seems like a really clever way of using data that is created unintentionally by the users of Google to see patterns in the world that would otherwise be invisible. I think we are just scratching the surface of what's possible with collective intelligence. (Helft, 2008)

However, after issuing its report, Google Flu Trends over-estimated the number of flu cases for 100 of the next 108 weeks, by an average of nearly 100% (Lazer et al., 2014). Google Flu Trends no longer makes flu predictions.

Messing with our lives

Many data-mining algorithms for screening job applicants, pricing car insurance, approving loan applications, and determining prison sentences have significant errors and biases that are not due to programmer mistakes and biases, but to data mining.

Gild developed data-mining software for evaluating applicants for software engineering jobs by monitoring their online activities (Peck, 2013). The chief scientist acknowledged that some of the factors chosen by its data-mining software do not make sense. For example, the software found that several good programmers in its database visited a particular Japanese manga site frequently, so it decided that people who visit this site are likely to be good programmers. The chief scientist said, "Obviously, it's not a causal relationship," but argued that it was still useful because Gild has 6 million software engineers in its database and there was a strong statistical correlation.

The chief scientist said that the company's algorithm selects dozens of variables and constantly changes the variables selected as correlations come and go. She believes that the ever-changing list of variables demonstrates the model's power and flexibility. A more compelling interpretation is that the data-mining algorithm captures transitory coincidental correlations that are of little value. If these were causal relationships, they would not come and go. They would persist and be useful.

Was this firm's software successful in identifying good job candidates? A person who worked for the company for 3 years wrote that (Anonymous, 2016):

Customers really, really hate the product. There are almost no customers that have an overall positive experience. This has been true for years, and management is not able to reimagine the company in a way that would let them fix that core problem.

Evaluating job applicants based on whether they visit certain web sites is potentially discriminatory. Similarly, an Amazon algorithm for evaluating job applicants discriminated against women who had gone to women's colleges or belonged to women's organizations because there were few women in the algorithm's database of current employees (Dastin, 2018; Reuters, 2018).

In 2016, Admiral Insurance, Britain's largest car insurance company, planned to launch firstcarquote, which would base its car insurance rates on a data-mined analysis of an applicant's Facebook posts (Ruddick, 2016; Rudgard, 2016). One example the company cited was whether a person liked Michael Jordan or Leonard Cohen—which humans would recognize as ripe with errors and biases. The Admiral advisor who designed the algorithm said:

Our analysis is not based on any one specific model, but rather on thousands of different combinations of likes, words and phrases and is constantly changing with new evidence that we obtain from the data. As such our calculations reflect how drivers generally behave on social media, and how predictive that is, as opposed to fixed assumptions about what a safe driver may look like.

This claim was intended to show that the algorithm is flexible and innovative. What it actually reveals is that their data-mining algorithm identifies historical patterns, not useful predictors. The algorithm changes constantly because it has no logical basis and is continuously buffeted by short-lived correlations.

We never had the opportunity to see how this algorithm would have fared because, a few hours before the scheduled launch, Facebook announced that it would not allow Admiral to access Facebook data, citing its policy that "prohibits the use of data obtained from Facebook to make decisions about eligibility, including whether to approve or reject an application or how much interest to charge on a loan" (Cohn, 2016).
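If the selected variables reflected causal structure, the same variables would keep predicting. A toy simulation (hypothetical Python, unrelated to Gild's or Admiral's actual systems; every series is random noise) shows why a noise-mining selector naturally churns: the best-looking variable in one period reverts to chance in the next:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 1000   # observations per period, candidate variables

# Period 1: a random "outcome" and 1,000 equally random candidate predictors
y1 = rng.standard_normal(n)
X1 = rng.standard_normal((k, n))

# Select the candidate most correlated with the outcome in period 1
r1 = (X1 @ (y1 - y1.mean())) / (n * X1.std(axis=1) * y1.std())
best = int(np.argmax(np.abs(r1)))
in_sample = float(abs(r1[best]))          # the best of 1,000 tries looks impressive

# Period 2: fresh, independent draws of the outcome and of the chosen variable
y2 = rng.standard_normal(n)
x2 = rng.standard_normal(n)               # the "same" variable observed later
out_of_sample = float(abs(np.corrcoef(x2, y2)[0, 1]))   # back to ordinary noise

print(f"selected |r| in sample: {in_sample:.2f}; next period: {out_of_sample:.2f}")
```

The selector would now discard this variable and promote a different coincidence, which is exactly the "variables come and go" behavior described above.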
In 2017, the founder and CEO of a Chinese tech company reported that they had developed a data-mining algorithm that evaluates loan applications based on an analysis of the usage of hundreds of millions of smartphones in China (Yuan, 2017):

We don't need human beings to tell us who's a good customer and who's bad. Technology is our risk control.

Among the data that show up as evidence of a person being a good credit risk: using an Android phone instead of an iPhone; not answering incoming calls; having outgoing calls not answered; and not keeping the phone fully charged.

We could invent plausible theories to explain the discovered statistical patterns. Or, if these patterns had been reversed, indicators of being a bad credit risk, we could invent reasonable explanations for that too. That's the thing about making up theories after the fact—we are clever enough to invent plausible stories for whatever statistical patterns are found, even if the statistical patterns are random noise discovered by data-mining software. Finding patterns proves nothing. Making up stories to fit the patterns proves nothing. (If you were wondering, the data-mining software found these particular patterns to be correlated with being a bad credit risk.)

Richard Berk has appointments in the Department of Criminology and the Department of Statistics at the University of Pennsylvania. One of his specialties is algorithmic criminology, which is becoming increasingly common in pre-trial bail determination, post-trial sentencing, and post-conviction parole decisions. Berk (2013) writes, "The approach is 'black box,' for which no apologies are made." In an article in The Atlantic, Berk is more explicit: "If I could use sun spots or shoe size or the size of the wristband on their wrist, I would. If I give the algorithm enough predictors to get it started, it finds things that you wouldn't anticipate" (Labi, 2012).

Most "things that you wouldn't anticipate" are things that do not make sense, like sun spots, shoe sizes, and wristband sizes. They reflect temporary coincidences that are useless predictors of criminal behavior. Angwin et al. (2016) analyzed one of the most popular risk-assessment algorithms and found that only 20% of the people predicted to commit violent crimes within 2 years actually did so—and that the predictions discriminate against black defendants.

Berk no doubt has good intentions, but it is unsettling that he thinks people should be paroled or remain incarcerated based on sunspots, shoes, and wristbands. That's what happens when you trust data-mining algorithms too much.

If bail, sentencing, and parole decisions are based on data mining, it is just a short step to using data-mined models to decide who should be arrested and imprisoned. Sure enough, Wu and Zhang (2016, 2017) reported that they could predict with 89.5% accuracy whether a person is a criminal by applying their AI algorithm to scanned facial photos. They argued:

Unlike a human examiner/judge, a computer vision algorithm or classifier has absolutely no subjective baggages, having no emotions, no biases whatsoever due to past experience, race, religion, political doctrine, gender, age, etc., no mental fatigue, no preconditioning of a bad sleep or meal.

They scanned 1856 male ID photos—730 criminals and 1126 non-criminals—and their data-mining program found "some discriminating structural features for predicting criminality, such as lip curvature, eye inner corner distance, and the so-called nose-mouth angle."

An article in the MIT Technology Review (Emerging Technology from the arXiv, 2016) was optimistic: "All this heralds a new era of anthropometry, criminal or otherwise. . . . And there is room for more research as machines become more capable." Vorhees (2016) wrote, "the study has been conducted with rigor. The results are what they are." Spoken like a true data miner. Who needs theories? If a data-mining algorithm finds statistical patterns, that's proof enough. The results are what they are. Up is up.

To their great credit, data scientists, as a whole, dismissed this study as perilous pseudoscience—unreliable and misleading, with potentially dangerous consequences if taken seriously. However, a blogger (cageymaru, 2016) argued:

What if they just placed the people that look like criminals into an internment camp? What harm would that do? They would just have to stay there until they went through an extensive rehabilitation program. Even if some went that were innocent; how could this adversely affect them in the long run?

If such blind faith in data mining becomes the norm, governments may well start imprisoning people based on data-mined facial analyses.

Discussion

Artificial intelligence (AI) systems often rely on data-mining algorithms to specify and parameterize models. Such algorithms have no effective way of assessing the plausibility of what they discover because computers are not intelligent in any meaningful sense of the word (Smith, 2018). Consider, for example, the challenges identified by Terry Winograd that have come to be known as Winograd schemas (Davis, 2018). What does the word "it" refer to in this sentence?

I can't cut that tree down with that axe; it is too [thick/small].

Humans know that if the bracketed word is thick, then it refers to the tree and, if the bracketed word is small, then it
refers to the axe. Winograd schemas are very difficult for computers because they do not understand what words mean. They do not know what tree, axe, cut down, thick, or small mean, or how they might be related.

There is a Winograd Schema Challenge with a $25,000 prize for a computer program that is 90% accurate in interpreting Winograd schemas (Levesque et al., 2012). In the 2016 competition, the expected value of the score for guessing was 44% correct (some schemas had more than two possible answers). The highest computer score was 58% correct and the lowest 32%, a variation that may have been due more to luck than to differences in the competing programs' abilities. Computers are like New-Zealand-born Nigel Richards, who has won the French-language Scrabble World Championship twice without knowing the meaning of the words he spells.

How could a data-mining algorithm interpret a correlation between Trump tweeting the word "with" and the price of Urban Tea stock 4 days later when computer algorithms do not know what any of the words mean and have no understanding of what might cause stock prices to go up or down?

Computer image-recognition software is similarly brittle because it identifies and matches pixel patterns without any understanding of the image formed by the pixels. Putting graffiti on a photograph of a stop sign or even changing a few pixels in a picture of a stop sign—alterations that would not be noticed by humans—can cause state-of-the-art deep neural networks to fail miserably (Evtimov et al., 2017; Su et al., 2017). Data-mining pixels is not the same as knowing what a stop sign is.

Nguyen et al. (2015) demonstrated something even more surprising. In addition to making nothing out of something (like a computer not recognizing a stop sign), computers can make something out of nothing by misinterpreting meaningless images as real objects. For example, a powerful image-recognition program was 99% certain that a horizontal sequence of black and yellow lines was a school bus, completely ignoring the fact that there were no wheels, door, or windows in the picture.

Sharif et al. (2016) reported that the state-of-the-art deep neural network programs used in facial biometric systems can be fooled by people wearing colorful eyeglass frames. One of the authors, a white male, was misidentified as Milla Jovovich, a white female, 88% of the time, and another author, a 24-year-old Middle Eastern male, was misidentified as Carson Daly, a 43-year-old white male, 100% of the time—all because the eyeglass frame colors led the computer program astray. Humans do not make such mistakes because we know what eyeglass frames are, and we know that we should look past the frames to identify the person we see. Computers know none of this; they just match pixels as best they can.

The reproducibility crisis (Baker, 2017; Ioannidis, 2005; Pashler and Wagenmakers, 2012), in which attempts to replicate research findings often fail, may be partly due to the uncritical acceptance of data-mining discoveries as real phenomena. The crisis might be partly abated by recognizing the fact that data-mined coincidences are inevitably temporary.

Conclusion

Data-mining algorithms—often operating under the label artificial intelligence—are now widely used to discover statistical patterns. However, in large data sets, streaks, clusters, correlations, and other patterns are the norm, not the exception. While data mining might discover a useful relationship, the number of possible patterns that can be spotted relative to the number that are genuinely useful has grown exponentially—which means that the chances that a discovered pattern is useful are rapidly approaching zero. This is the paradox of big data:

It would seem that having data for a large number of variables will help us find more reliable patterns; however, the more variables we consider, the less likely it is that what we find will be useful.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Gary Smith https://orcid.org/0000-0002-5173-2741

References

Alloway T (2019) JPMorgan creates "Volfefe" index to track Trump tweet impact. Bloomberg.com. Available at: https://www.bloomberg.com/news/articles/2019-09-09/jpmorgan-creates-volfefe-index-to-track-trump-tweet-impact (accessed 1 December 2019).
Anderson C (2008) The end of theory: The data deluge makes the scientific method obsolete? Wired, 23 June. Available at: https://www.wired.com/2008/06/pb-theory/ (accessed 21 April 2020).
Angwin J, Larson J, Mattu S, et al. (2016) Machine bias. ProPublica, 23 May. Available at: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing (accessed 21 April 2020).
Anonymous (2016) Gild review. Glassdoor, 20 March. Available at: https://www.glassdoor.com/Reviews/Gild-Reviews-E459358.htm (accessed 21 April 2020).
Arnott RD, Harvey CR and Markowitz H (2018) A backtesting protocol in the era of machine learning, 21 November, p. 18.
Athey S (2018) The impact of machine learning on economics. In: Agrawal A, Gans J and Goldfarb A (eds) The Economics of Artificial Intelligence: An Agenda. Chicago, IL: University of Chicago Press, pp. 507–547.
Baker M (2017) 1,500 scientists lift the lid on reproducibility. Nature 533(7604): 452–454.
Begoli E and Horsey J (2012) Design principles for effective knowledge discovery from big data. In: 2012 Joint Working IEEE/IFIP Conference on Software Architecture and European Conference on Software Architecture, Helsinki, Finland, 20–24 August 2012. New York: IEEE, pp. 215–218.
Berk R (2013) Algorithmic criminology. Security Informatics 2: 5.
Bierman N and Stokols E (2018) Trump voices admiration and envy of Kim Jong Un, underscoring his respect for autocrats. Los Angeles Times, 15 June. Available at: https://www.latimes.com/politics/la-na-pol-trump-kim-values-20180615-story.html
Bolen J, Mao H and Zeng X (2011) Twitter mood predicts the stock market. Journal of Computational Science 2(1): 1–8.
Bruce P and Bruce A (2017) Practical Statistics for Data Scientists: 50 Essential Concepts. Newton, MA: O'Reilly Media, p. 250.
Brynjolfsson E, Geva T and Reichman S (2016) Crowd-squared: Amplifying the predictive power of search trend data. MIS Quarterly 40(4): 941–961.
cageymaru (2016) post, HardForum Tech News, 21 November. Available at: https://hardforum.com/threads/new-program-judges-if-youre-a-criminal-from-your-facial-features.1917912/ (accessed 21 April 2020).
Calude CS and Longo G (2017) The deluge of spurious correlations in big data. Foundations of Science 22(3): 595–612.
Cios KJ, Pedrycz W, Swiniarski RW, et al. (2007) Data Mining: A Knowledge Discovery Approach. New York: Springer.
Coase R (1988) How should economists choose? In: Ideas, Their Origins and Their Consequences: Lectures to Commemorate the Life and Work of G. Warren Nutter. Thomas Jefferson Center Foundation. Washington, DC: American Enterprise Institute for Public Policy Research, pp. 63–79.
Cohn C (2016) Facebook stymies Admiral's plans to use social media data to price insurance premiums. Reuters, 2 November. Available at: https://www.reuters.com/article/us-insurance-admiral-facebook/facebook-stymies-admirals-plans-to-use-social-media-data-to-price-insurance-premiums-idUSKBN12X1WP (accessed 21 April 2020).
Dastin J (2018) Amazon scraps secret AI recruiting tool that showed bias against women. Reuters, 9 October. Available at: https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G (accessed 21 April 2020).
Davis E (2018) Collection of Winograd schemas. Available at: https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WSCollection.html (accessed 21 April 2020).
Egami N, Fong CJ, Grimmers J, et al. (2018) How to make causal inferences using text. 15 October. Available at: https://arxiv.org/pdf/1802.02163.pdf (accessed 21 April 2020).
Emerging Technology from the arXiv (2016) Neural network learns to identify criminals by their faces. MIT Technology Review, 22 November. Available at: https://www.technologyreview.com/s/602955/neural-network-learns-to-identify-criminals-by-their-faces/ (accessed 21 April 2020).
Equbot (2019) Available at: https://equbot.com (accessed 21 April 2020).
Evtimov I, Eykholt K, Fernandes E, et al. (2017) Robust physical-world attacks on deep learning models. Available at: https://arxiv.org/abs/1707.08945 (accessed 21 April 2020).
Fayyad U, Piatetsky-Shapiro G and Smyth P (1996) From data mining to knowledge discovery in databases. AI Magazine 17(3): 37–54.
Franck T (2019) On days when President Trump tweets a lot, the stock market falls, investment bank finds. CNBC. Available at: https://www.cnbc.com/2019/09/03/on-days-when-president-trump-tweets-a-lot-the-stock-market-falls-investment-bank-finds.html (accessed 21 April 2020).
Ginsberg J, Mohebbi MH, Patel RS, et al. (2009) Detecting influenza epidemics using search engine query data. Nature 457: 1012–1014.
Grover V and Lyytinen K (2015) New state of play in information systems research: The push to the edges. MIS Quarterly 39(2): 271–296.
Guo X, Wei Q, Chen G, et al. (2017) Extracting representative information on intra-organizational blogging platforms. MIS Quarterly 41(4): 1105–1127.
Haltiwanger J (2019) Trump calls North Korea's Kim Jong Un, who's threatened the US with nuclear war, a "great leader." Business Insider, 27 February. Available at: https://www.businessinsider.my/page/5769?m&jwsource=cl (accessed 21 April 2020).
Hastie T, Tibshirani R and Friedman J (2016) The Elements of Statistical Learning (2nd edn). New York: Springer.
Helft M (2008) Google uses searches to track flu's spread. The New York Times, 11 November. Available at: https://www.nytimes.com/2008/11/12/technology/internet/12flu.html (accessed 21 April 2020).
Hope B and Chung J (2017) The future is bumpy: High-tech hedge fund hits limits of robot stock picking. Wall Street Journal, 17 December. Available at: https://www.wsj.com/articles/the-future-is-bumpy-high-tech-hedge-fund-hits-limits-of-robot-stock-picking-1513007557 (accessed 21 April 2020).
Ioannidis JA (2005) Contradicted and initially stronger effects in highly cited clinical research. Journal of the American Medical Association 294(2): 218–228.
Jafar M, Babb J and Abdullat A (2017) Emergence of data analytics in the information systems curriculum. Information Systems Education Journal 15: 22–36.
Kecman V (2007) Forward. In: Cios KJ, Pedrycz W, Swiniarski RW, et al. (eds) Data Mining: A Knowledge Discovery Approach. New York: Springer, p. xi.
Labi N (2012) Misfortune teller. The Atlantic, January–February. Available at: https://www.theatlantic.com/magazine/archive/2012/01/misfortune-teller/308846/ (accessed 21 April 2020).
Lazer D, Kennedy R, King G, et al. (2014) The parable of Google flu: Traps in big data analysis. Science 343(6176): 1203–1205.
Levesque HJ, Davis E and Morgenstern L (2012) The Winograd schema challenge. In: KR 2012: 13th International Conference on the Principles of Knowledge Representation and Reasoning, Rome, 10–14 June 2012, pp. 552–561. Ultimo, NSW, Australia: UT Sydney.
Liu Y and Tsyvinski A (2018) Risks and returns of cryptocurrency. NBER working paper no. 24877, 13 August. Available at: https://ssrn.com/abstract=3226952 (accessed 21 April 2020).
Martens D, Provost F, Clark J, et al. (2016) Mining massive fine-grained behavior data to improve predictive analytics. MIS Quarterly 40(4): 869–888.
Metropolis N (1987) The beginning of the Monte Carlo method. Los Alamos Science 15: 125–130.
Mullainathan S and Spiess J (2017) Machine learning: An applied econometric approach. Journal of Economic Perspectives 31(2): 87–106.
Nguyen A, Yosinski J and Clune J (2015) Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 8–10 June 2015.
Pashler H and Wagenmakers EJ (2012) Editors' introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science 7(6): 528–530.
Peck D (2013) They're watching you at work. The Atlantic, December. Available at: https://www.theatlantic.com/magazine/archive/2013/12/theyre-watching-you-at-work/354681/
Piatetsky-Shapiro G (1991) Knowledge discovery in real databases: A report on the IJCAI-89 workshop. AI Magazine 11(5): 68–70.
Preis T, Moat HS and Stanley HE (2013) Quantifying trading behavior in financial markets using Google trends. Scientific Reports 3: 1684.
Reuters (2018) Amazon ditched AI recruiting tool that favored men for technical jobs. The Guardian, 10 October. Available at: https://www.theguardian.com/technology/2018/oct/10/amazon-hiring-ai-gender-bias-recruiting-engine (accessed 21 April 2020).
Rosebeck O and Smith G (2019) The reproducibility crisis: A case study. Working paper, Pomona College, Claremont, CA, July.
Ruddick G (2016) Admiral to price car insurance based on Facebook posts. The Guardian, 1 November. Available at: https://www.theguardian.com/technology/2016/nov/02/admiral-to-price-car-insurance-based-on-facebook-posts (accessed 21 April 2020).
Rudgard O (2016) Admiral to use Facebook profile to determine insurance premium. The Telegraph, 2 November. Available at: https://www.telegraph.co.uk/insurance/car/insurer-trawls-your-facebook-profile-to-see-how-well-you-drive/ (accessed 21 April 2020).
Sagiroglu S and Sinanc D (2013) Big data: A review. In: Proceedings of 2013 International Conference on Collaboration Technologies and Systems (CTS).
Somerlan J (2019) Donald Trump's gushing praise of Vladimir Putin under fresh scrutiny after Michael Cohen allegations. Independent, 18 January. Available at: https://www.independent.co.uk/news/world/americas/us-politics/trump-cohen-putin-russia-investigation-mueller-congress-fbi-a8734231.html (accessed 21 April 2020).
Stevenson PW (2016) Professor who predicted 30 years of presidential elections correctly called a Trump win in September. The Washington Post, 8 November. Available at: https://www.washingtonpost.com/news/the-fix/wp/2016/10/28/professor-whos-predicted-30-years-of-presidential-elections-correctly-is-doubling-down-on-a-trump-win/ (accessed 21 April 2020).
Su J, Vargas DV and Kouichi S (2017) One pixel attack for fooling deep neural networks. November. Available at: https://arxiv.org/abs/1710.08864 (accessed 21 April 2020).
Tobin J (1972) Personal communication.
Trump Twitter Archive (2019) Available at: http://www.trumptwitterarchive.com (accessed 21 April 2020).
Tullock G (2001) A comment on Daniel Klein's "A plea to economists who favor liberty." Eastern Economic Journal 27(2): 203–207.
Varian HR (2014) Big data: New tricks for econometrics. The Journal of Economic Perspectives 28(2): 3–27.
Vorhees W (2016) Has AI gone too far? Automated inference of criminality using face images. Data Science Central, 29 November. Available at: https://www.datasciencecentral.com/profiles/blogs/has-ai-gone-too-far-automated-inference-of-criminality-using-face (accessed 21 April 2020).
Weather Underground (2019) Available at: https://www.wunderground.com (accessed 21 April 2020).
Wojcik S, Messing S, Smith A, et al. (2018) Bots in the Twittersphere. Pew Research Center, 18 April. Available at: https://www.pewresearch.org/internet/2018/04/09/bots-in-the-twittersphere/ (accessed 21 April 2020).
Wu X and Zhang X (2016) Automated inference on criminality using face images. Shanghai Jiao Tong University, 21 November. Available at: https://arxiv.org/abs/1611.04135v1 (accessed 21 April 2020).
Wu X and Zhang X (2017) Responses to critiques on machine learning of criminality perceptions. Shanghai Jiao Tong University, 26 May. Available at: https://arxiv.org/abs/1611.04135v3 (accessed 21 April 2020).
ration technologies and systems (CTS), San Diego, CA, Yeung J, Westcott B, Liptak K, et al. (2019) G20 summit 2019:
20–24 May 2013. Trump meets leaders in Osaka. CNN, 29 June. Available
Salmon F and Stokes J (2010) Algorithms take control of wall online: https://www.cnn.com/politics/live-news/g20-june-
street. Wired, 27 December. Available at: https://www.wired. 2019-intl-hnk/index.html (accessed 21 April 2020).
com/2010/12/ff-ai-flashtrading/ (accessed 21 April 2020). Yuan L (2017) Want a loan in China? Keep your phone charged.
Sharif M, Bhagavatula S, Bauer L, et al. (2016) Accessorize to a The Wall Street Journal, 6 April. Available online: https://
crime: Real and stealthy attacks on state-of-the-art face recog- www.wsj.com/articles/want-a-loan-in-china-keep-your-
nition. In: Proceedings of the 2016 ACM SIGSAC Conference phone-charged-1491474250 (accessed 21 April 2020).
on Computer and Communications Security, Vienna, 24–28
October 2016, pp. 1528–1540. New York: ACM. Author biography
Shi Z, Lee GM and Whinston AB (2016) Toward a better measure
Gary Smith is the author of more than 90 papers and 15 books,
of business proximity: Topic modeling for industry intelli-
gence. MIS Quarterly 4(4): 1035–1056. most recently The AI Delusion (Oxford 2018), The 9 Pitfalls of
Smith G (2018) The AI Delusion. Oxford: Oxford University Data Science (Oxford 2019, co-authored with Jay Cordes and
Press. winner of the PROSE award for Excellence in Popular Science &
Smith G and Cordes J (2019) The 9 Pitfalls of Data Science. Popular Mathematics), and The Phantom Pattern Problem: The
Oxford: Oxford University Press. Mirage of Big Data.
