Why So Many Data Science Projects Fail to Deliver
More and more companies are embracing data science as a function and a capability. But many of them have not been able to consistently derive business value from their investments in big data, artificial intelligence, and machine learning.1 Moreover, evidence suggests that the gap is widening between organizations successfully gaining value from data science and those struggling to do so.2
To better understand the mistakes that companies make when trying to implement profitable data science projects, and to discover how to avoid them, we conducted in-depth studies of the data science activities in three of India’s top 10 private-sector banks, all of which have well-established analytics departments. We identified five common mistakes, exemplified by the cases that follow, and we suggest corresponding solutions to address them.
Mistake 1: The Hammer in Search of a Nail
Hiren, a recently hired data scientist in one of the banks we studied, is the kind of analytics wizard that organizations covet.3 He is especially taken with the k-nearest neighbors algorithm, which classifies a data point according to the labeled examples nearest to it. “I have applied k-nearest neighbors to several simulated data sets during my studies,” he told us, “and I can’t wait to apply it to the real data soon.”
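For readers who have not met the algorithm, the sketch below shows the kind of analysis Hiren favored: a k-nearest neighbors classifier labels each new observation by majority vote among its k closest labeled examples. The data, feature names, and labels here are hypothetical, and the scikit-learn usage is purely illustrative; this is not the bank’s actual code.

```python
# Illustrative k-nearest neighbors sketch (hypothetical data, not the bank's code).
# The classifier labels a new account by majority vote among its k nearest
# labeled neighbors in feature space.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical features per account: [average balance, monthly transactions].
X = np.array([[52_000, 110], [48_000, 95], [3_000, 12],
              [2_500, 9], [55_000, 120], [4_000, 15]])
y = np.array(["profitable", "profitable", "unprofitable",
              "unprofitable", "profitable", "unprofitable"])

# k-NN is distance-based, so features on different scales should be standardized.
scaler = StandardScaler().fit(X)
model = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X), y)

new_account = np.array([[50_000, 100]])
print(model.predict(scaler.transform(new_account)))  # -> ['profitable']
```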
Hiren did exactly that a few months later, when he used the k-nearest neighbors algorithm to identify especially profitable industry segments within the bank’s portfolio of business checking accounts. His recommendation to the business checking accounts team: Target two of the portfolio’s 33 industry segments.

This conclusion underwhelmed the business team members. They already knew about these segments and were able to ascertain segment profitability with simple back-of-the-envelope calculations. Using the k-nearest neighbors algorithm for this task was like using a guided missile when a pellet gun would have sufficed.

In this case and some others we examined in all three banks, the failure to achieve business value resulted from an infatuation with data science solutions. This failure can play out in several ways. In Hiren’s case, the problem did not require such an elaborate solution. In other situations, we saw the successful use of a data science solution in one arena become the justification for its use in another arena in which it wasn’t as appropriate or effective. In short, this mistake does not arise from the technical execution of the analytical technique; it arises from its misapplication.

After Hiren developed a deeper understanding of the business, he returned to the team with a new recommendation: Again, he proposed using the k-nearest neighbors algorithm, but this time at the customer level instead of the industry level. This proved to be a much better fit, and it resulted in new insights that allowed the team to target as-yet untapped customer segments. The same algorithm in a more appropriate context offered a much greater potential for realizing business value.

It’s not exactly rocket science to observe that analytical solutions are likely to work best when they are developed and applied in a way that is sensitive to the business context. But we found that data science does seem like rocket science to many managers. Dazzled by the high-tech aura of analytics, they can lose sight of context. This was more likely, we discovered, when managers saw a solution work well elsewhere, or when the solution was accompanied by an intriguing label, such as “AI” or “machine learning.” Data scientists, who were typically focused on the analytical methods, often could not or, at any rate, did not provide a more holistic perspective.

To combat this problem, senior managers at the banks in our study often turned to training. At one bank, data science recruits were required to take product training courses taught by domain experts alongside product relationship manager trainees. This bank also offered data science training tailored for business managers at all levels and taught by the head of the data science unit. The curriculum included basic analytics concepts, with an emphasis on questions to ask about specific solution techniques and where the techniques should or should not be used. In general, the training interventions designed to address this problem aimed to facilitate the cross-fertilization of knowledge among data scientists, business managers, and domain experts and help them develop a better understanding of one another’s jobs.

In related fieldwork, we have also seen process-based fixes for avoiding the mistake of jumping too quickly to a favored solution. One large U.S.-based aerospace company uses an approach it calls the Seven Ways, which requires that teams identify and compare at least seven possible solution approaches and then explicitly justify their final selection.

Mistake 2: Unrecognized Sources of Bias

Pranav, a data scientist with expertise in statistical modeling, was developing an algorithm aimed at producing a recommendation for the underwriters responsible for approving secured loans to small and medium-sized enterprises. Using the credit approval memos (CAMs) for all loan applications processed over the previous 10 years, he compared the borrowers’ financial health at the time of their application with their current financial status. Within a couple of months, Pranav had a software tool built around a highly accurate model, which the underwriting team implemented.

Unfortunately, after six months, it became clear that the delinquency rates on the loans were higher after the tool was implemented than before. Perplexed, senior managers assigned an experienced underwriter to work with Pranav to figure out what had gone wrong.

The epiphany came when the underwriter discovered that the input data came from CAMs. What the underwriter knew, but Pranav hadn’t, was that CAMs were prepared only for loans that had already been approved.
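Pranav’s trap is easy to reproduce. The hypothetical sketch below shows how an innocent-looking filter, building the training set only from records that exist (here, CAMs for approved loans), silently removes every rejected application, so the model never observes the risk profiles the underwriters screened out. All column names and figures are invented for illustration.

```python
# Hypothetical illustration of the sampling bias in Pranav's training data.
# CAMs exist only for approved loans, so a model fit on them never observes
# the applications that underwriters rejected.
import pandas as pd

# Invented example data standing in for 10 years of loan applications.
applications = pd.DataFrame({
    "applicant_id": [1, 2, 3, 4, 5, 6],
    "debt_to_income": [0.2, 0.9, 0.3, 1.1, 0.25, 0.95],
    "approved": [True, False, True, False, True, False],
    "delinquent": [False, None, False, None, True, None],  # unknown for rejected loans
})

# What Pranav effectively did: build the training set from CAMs, which
# cover approved loans only.
training_set = applications[applications["approved"]]

# The high-risk profiles (debt_to_income near 1.0) vanish from the training
# data, so the model sees a flattering, truncated picture of applicant risk.
print(training_set["debt_to_income"].describe())
print(applications["debt_to_income"].describe())
```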
[…] senior leaders, regardless of whether they align with current priorities.6 So, there is a line to be walked here. If an insight arises that does not fit current priorities and systems but nonetheless could deliver significant value to the company, it is incumbent upon data scientists to communicate this to management.

We found that to facilitate exploratory work, bank executives sometimes assigned additional data scientists to project teams. These data scientists did not colocate and were instructed not to concern themselves with team priorities. On the contrary, they were tasked with building alternative solutions related to the project. If these solutions turned out to be viable, the head of the data science unit pitched them to senior management. This dual approach recognizes the epistemic interdependence between data science and business professionals — a scenario in which data science seeks to address today’s business needs as well as detect opportunities to innovate and transform current business practices.7 Both roles are important if data science is to realize as much business value as possible.

Mistake 4: Right Tool, Wrong User

Sophia, a business analyst, worked with her team to develop a recommendation engine capable of offering accurately targeted new products and services to the bank’s customers. With assistance from the marketing team, the recommender was added to the bank’s mobile wallet app, internet banking site, and emails. But the anticipated new business never materialized: Customer uptake of the product suggestions was much lower than expected.

To discover why, the bank’s telemarketers surveyed a sample of customers who did not purchase the new products. The mystery was quickly solved: Many customers doubted the credibility of recommendations delivered through apps, websites, and emails.

Still looking for answers, Sophia visited several of the bank’s branches, where she was surprised to discover the high degree of trust customers appeared to place in the advice of relationship managers (RMs). A few informal experiments convinced her that customers would be much more likely to accept the recommendation engine’s suggestions when the recommendations were presented in the branch by an RM. Realizing that the problem wasn’t the recommender’s model but the delivery mode of the recommendations, Sophia met with the senior leaders in branch banking and proposed relaunching the recommendation engine as a tool to support product sales through the RMs. The redesigned initiative was a huge success.

The difficulties Sophia encountered highlight the need to pay attention to how the outputs of analytical tools are communicated and used. To generate full value for customers and the business, user experience analysis should be included in the data science design process. At the very least, user testing should be an explicit part of the data science project life cycle. Better yet, a data science practice could be positioned within a human-centered design frame. In addition to user testing, such a frame could mandate user research on the front end of the data science process.

While we did not see instances of data science embedded within design thinking or other human-centered design practices in this study, we did find that the shadowing procedures described above sometimes operated as a kind of user experience analysis. As data scientists shadowed other employees to understand the sources of data, they also gained an understanding of users and channels through which solutions could be delivered. In short, the use of shadowing in data science projects contributes to a better understanding of the processes that generate data, and of solution users and delivery channels.

Mistake 5: The Rocky Last Mile

The bank’s “win-back” initiative, which was aimed at recovering lost customers, had made no progress for months. And that day’s meeting between the data scientists and the product managers, which was supposed to get the initiative back on track, was not going well either.

Data scientists Dhara and Viral were focused on how to identify which lost customers were most likely to return to the bank, but product managers Anish and Jalpa wanted to discuss the details of the campaign to come and were pushing the data