
DATA ANALYTICS

Why So Many Data Science Projects Fail to Deliver
Organizations can gain more business value from advanced analytics
by recognizing and overcoming five common obstacles.
BY MAYUR P. JOSHI, NING SU, ROBERT D. AUSTIN, AND ANAND K. SUNDARAM

More and more companies are embracing data science as a function and a capability. But many of them have not been able to consistently derive business value from their investments in big data, artificial intelligence, and machine learning.1 Moreover, evidence suggests that the gap is widening between organizations successfully gaining value from data science and those struggling to do so.2

To better understand the mistakes that companies make when implementing data science projects, and to discover how to avoid them, we conducted in-depth studies of the data science activities in three of India’s top 10 private-sector banks with well-established analytics departments. We identified five common mistakes, as exemplified by the following cases we encountered, and below we suggest corresponding solutions to address them.

Mistake 1: The Hammer in Search of a Nail

Hiren, a recently hired data scientist in one of the banks we studied, is the kind of analytics wizard that organizations covet.3 He is especially taken with the k-nearest neighbors algorithm, which is useful for identifying and classifying clusters of data. “I have applied k-nearest neighbors to several simulated data sets during my studies,” he told us, “and I can’t wait to apply it to the real data soon.”
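For readers unfamiliar with the technique: k-nearest neighbors classifies a new data point by majority vote among the k labeled points closest to it. The following minimal sketch uses invented customer data (the feature names, labels, and numbers are our illustration, not the bank’s actual model or data):

```python
from collections import Counter
import math

def knn_classify(point, examples, k=3):
    """Classify `point` by majority vote among its k nearest labeled examples.

    `examples` is a list of ((feature1, feature2), label) pairs.
    Distances are Euclidean. A bare-bones illustration, not production code.
    """
    # Sort labeled examples by distance to the query point, keep the k closest.
    neighbors = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Invented data: two clusters of customers described by
# (average balance, monthly transactions), labeled by segment.
examples = [
    ((1.0, 1.2), "low-activity"),
    ((0.8, 1.0), "low-activity"),
    ((1.1, 0.9), "low-activity"),
    ((5.0, 6.1), "high-activity"),
    ((5.5, 5.8), "high-activity"),
    ((4.9, 6.3), "high-activity"),
]

print(knn_classify((1.0, 1.1), examples))  # → low-activity
print(knn_classify((5.2, 6.0), examples))  # → high-activity
```

The point of the sketch is how little the algorithm itself knows about banking: everything of business value hangs on the choice of features, labels, and level of analysis, which is exactly where Hiren initially went wrong.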

JEAN FRANCOIS PODEVIN/THEISPOT.COM SPRING 2021 MIT SLOAN MANAGEMENT REVIEW 85



Hiren did exactly that a few months later, when he used the k-nearest neighbors algorithm to identify especially profitable industry segments within the bank’s portfolio of business checking accounts. His recommendation to the business checking accounts team: Target two of the portfolio’s 33 industry segments.

This conclusion underwhelmed the business team members. They already knew about these segments and were able to ascertain segment profitability with simple back-of-the-envelope calculations. Using the k-nearest neighbors algorithm for this task was like using a guided missile when a pellet gun would have sufficed.

In this case and some others we examined in all three banks, the failure to achieve business value resulted from an infatuation with data science solutions. This failure can play out in several ways. In Hiren’s case, the problem did not require such an elaborate solution. In other situations, we saw the successful use of a data science solution in one arena become the justification for its use in another arena in which it wasn’t as appropriate or effective. In short, this mistake does not arise from the technical execution of the analytical technique; it arises from its misapplication.

After Hiren developed a deeper understanding of the business, he returned to the team with a new recommendation: Again, he proposed using the k-nearest neighbors algorithm, but this time at the customer level instead of the industry level. This proved to be a much better fit, and it resulted in new insights that allowed the team to target as-yet untapped customer segments. The same algorithm in a more appropriate context offered a much greater potential for realizing business value.

It’s not exactly rocket science to observe that analytical solutions are likely to work best when they are developed and applied in a way that is sensitive to the business context. But we found that data science does seem like rocket science to many managers. Dazzled by the high-tech aura of analytics, they can lose sight of context. This was more likely, we discovered, when managers saw a solution work well elsewhere, or when the solution was accompanied by an intriguing label, such as “AI” or “machine learning.” Data scientists, who were typically focused on the analytical methods, often could not or, at any rate, did not provide a more holistic perspective.

To combat this problem, senior managers at the banks in our study often turned to training. At one bank, data science recruits were required to take product training courses taught by domain experts alongside product relationship manager trainees. This bank also offered data science training tailored for business managers at all levels and taught by the head of the data science unit. The curriculum included basic analytics concepts, with an emphasis on questions to ask about specific solution techniques and where the techniques should or should not be used. In general, the training interventions designed to address this problem aimed to facilitate the cross-fertilization of knowledge among data scientists, business managers, and domain experts and help them develop a better understanding of one another’s jobs.

In related fieldwork, we have also seen process-based fixes for avoiding the mistake of jumping too quickly to a favored solution. One large U.S.-based aerospace company uses an approach it calls the Seven Ways, which requires that teams identify and compare at least seven possible solution approaches and then explicitly justify their final selection.

Mistake 2: Unrecognized Sources of Bias

Pranav, a data scientist with expertise in statistical modeling, was developing an algorithm aimed at producing a recommendation for the underwriters responsible for approving secured loans to small and medium-sized enterprises. Using the credit approval memos (CAMs) for all loan applications processed over the previous 10 years, he compared the borrowers’ financial health at the time of their application with their current financial status. Within a couple of months, Pranav had a software tool built around a highly accurate model, which the underwriting team implemented.

Unfortunately, after six months, it became clear that the delinquency rates on the loans were higher after the tool was implemented than before. Perplexed, senior managers assigned an experienced underwriter to work with Pranav to figure out what had gone wrong.

The epiphany came when the underwriter discovered that the input data came from CAMs. What the underwriter knew, but Pranav hadn’t, was that CAMs were prepared only for loans that had already been



prescreened by experienced relationship managers and were very likely to be approved. Data from loan applications rejected at the prescreening stage was not used in the development of the model, which produced a huge selection bias. This bias led Pranav to miss the import of a critical decision parameter: bounced checks. Unsurprisingly, there were very few instances of bounced checks among the borrowers whom relationship managers had prescreened.

The technical fix in this case was easy: Pranav added data on loan applications rejected in prescreening, and the “bounced checks” parameter became an important element in his model. The tool began to work as intended.

The bigger problem for companies seeking to achieve business value from data science is how to discern such sources of bias upfront and ensure that they do not creep into models in the first place. This is challenging because laypeople — and sometimes analytics experts themselves — can’t easily tell how the “black box” of analytics generates output. And analytics experts who do understand the black box often do not recognize the biases embedded in the raw data they use.

The banks in our study avoid unrecognized bias by requiring that data scientists become more familiar with the sources of the data they use in their models. For instance, we saw one data scientist spend a month in a branch shadowing a relationship manager to identify the data needed to ensure that a model produced accurate results.

We also saw a project team composed of data scientists and business professionals use a formal bias-avoidance process, in which they identified potential predictor variables and their data sources and then scrutinized each for potential biases. The objective of this process was to question assumptions and otherwise “deodorize” the data — thus avoiding problems that can arise from the circumstances in which the data was created or gathered.4

THE RESEARCH
This article is based on an in-depth study of the data science efforts in three large, private-sector Indian banks with collective assets exceeding $200 million. The study included onsite observations; semistructured interviews with 57 executives, managers, and data scientists; and the examination of archival records. The five obstacles and the solutions for overcoming them emerged from an inductive analytical process based on the qualitative data.

Mistake 3: Right Solution, Wrong Time

Kartik, a data scientist with expertise in machine learning, spent a month developing a sophisticated model for analyzing savings account attrition, and he then spent three more months fine-tuning it to improve its accuracy. When he shared the final product with the savings account product team, they were impressed, but they could not sponsor its implementation because their annual budget had already been expended.

Eager to avoid the same result the following year, Kartik presented his model to the product team before the budgeting cycle began. But now the team’s mandate from senior management had shifted from account retention to account acquisition. Again, the team was unable to sponsor a project based on Kartik’s model.

In his third year of trying, Kartik finally got approval for the project, but he had little to celebrate. “Now they want to implement it,” he told us, with evident dismay, “but the model has decayed and I will need to build it again!”

The mistake that blocks banks from achieving value in cases like this is a lack of synchronization between data science and the priorities and processes of the business. To avoid it, better links between data science and the strategies and systems of the business are needed.

Senior executives can ensure the alignment of data science activities with the organizational strategies and systems by more tightly integrating data science practices and data scientists with the business in physical, structural, and process terms. For example, one bank embedded data scientists in business teams on a project basis. In this way, the data scientists rubbed elbows with the business team day to day, becoming more aware of its priorities and deadlines — and in some cases actually anticipating unarticulated business needs. We have also seen data science teams colocated with business teams, as well as the use of process mandates, such as requiring that project activities be conducted at the business team’s location or that data scientists be included in business team meetings and activities.

Generally speaking, data scientists ought to be concentrating their efforts on the problems deemed most important by business leaders.5 But there is a caveat: Sometimes data science produces unexpected insights that should be brought to the attention of


senior leaders, regardless of whether they align with current priorities.6 So, there is a line to be walked here. If an insight arises that does not fit current priorities and systems but nonetheless could deliver significant value to the company, it is incumbent upon data scientists to communicate this to management.

We found that to facilitate exploratory work, bank executives sometimes assigned additional data scientists to project teams. These data scientists did not colocate and were instructed not to concern themselves with team priorities. On the contrary, they were tasked with building alternative solutions related to the project. If these solutions turned out to be viable, the head of the data science unit pitched them to senior management. This dual approach recognizes the epistemic interdependence between the data science and business professionals — a scenario in which data science seeks to address today’s business needs as well as detect opportunities to innovate and transform current business practices.7 Both roles are important, if data science is to realize as much business value as possible.

Mistake 4: Right Tool, Wrong User

Sophia, a business analyst, worked with her team to develop a recommendation engine capable of offering accurately targeted new products and services to the bank’s customers. With assistance from the marketing team, the recommender was added to the bank’s mobile wallet app, internet banking site, and emails. But the anticipated new business never materialized: Customer uptake of the product suggestions was much lower than anticipated.

To discover why, the bank’s telemarketers surveyed a sample of customers who did not purchase the new products. The mystery was quickly solved: Many customers doubted the credibility of recommendations delivered through apps, websites, and emails.

Still looking for answers, Sophia visited several of the bank’s branches, where she was surprised to discover the high degree of trust customers appeared to place in the advice of relationship managers (RMs). A few informal experiments convinced her that customers would be much more likely to accept the recommendation engine’s suggestions when presented in the branch by an RM. Realizing that the problem wasn’t the recommender’s model but the delivery mode of the recommendations, Sophia met with the senior leaders in branch banking and proposed relaunching the recommendation engine as a tool to support product sales through the RMs. The redesigned initiative was a huge success.

The difficulties Sophia encountered highlight the need to pay attention to how the outputs of analytical tools are communicated and used. To generate full value for customers and the business, user experience analysis should be included in the data science design process. At the very least, user testing should be an explicit part of the data science project life cycle. Better yet, a data science practice could be positioned within a human-centered design frame. In addition to user testing, such a frame could mandate user research on the front end of the data science process.

While we did not see instances of data science embedded within design thinking or other human-centered design practices in this study, we did find that the shadowing procedures described above sometimes operated as a kind of user experience analysis. As data scientists shadowed other employees to understand the sources of data, they also gained an understanding of users and channels through which solutions could be delivered. In short, the use of shadowing in data science projects contributes to a better understanding of the processes that generate data, and of solution users and delivery channels.

Mistake 5: The Rocky Last Mile

The bank’s “win-back” initiative, which was aimed at recovering lost customers, had made no progress for months. And that day’s meeting between the data scientists and the product managers, which was supposed to get the initiative back on track, was not going well either.

Data scientists Dhara and Viral were focused on how to identify which lost customers were most likely to return to the bank, but product managers Anish and Jalpa wanted to discuss the details of the campaign to come and were pushing the data


scientists to take responsibility for its implementation immediately. After the meeting adjourned without a breakthrough, Viral vented his frustration to Dhara: “If data scientists and analysts do everything, why does the bank need product managers? Our job is to develop an analytical solution; it’s their job to execute.”

By the next meeting, though, Viral seemed to have changed his mind. He made a determined effort to understand why the product managers kept insisting that the data scientists take responsibility for implementation. He discovered that on multiple occasions in the past, the information systems department had given the bank’s product managers lists of customers to target for win-back that had not resulted in a successful campaign. It turned out that using the lists had been extremely challenging, partly due to an inability to track customer contacts — so the product managers felt that being given another list of target customers was simply setting them up for another failure.

With this newfound understanding of the problem from the point of view of the product managers, Viral and Dhara added to their project plan the development of a front-end software application for the bank’s telemarketers, email management teams, branch banking staff, and assets teams. This provided them with a tool where they could feed information from their interactions with customers and make better use of the lists provided by the data science team. Finally, the project moved ahead.

Viral and Dhara’s actions required an unusual degree of empathy and initiative. They stepped out of their roles as data scientists and acted more like project leaders. But companies probably should not depend on data scientists in this way, and they may not want to — after all, the technical expertise of data scientists is a scarce and expensive resource. Instead, companies can involve data scientists in the implementation of solutions. One bank in our study achieved this by adding estimates of the business value delivered by data scientists’ solutions to their performance evaluations. This motivated data scientists to ensure the successful implementation of their solutions. The bank’s executives acknowledged that this sometimes caused data scientists to operate too far outside their assigned responsibilities. However, they believed that ensuring value delivery justified the diversion of data science resources, and that it could be corrected on a case-by-case basis, if the negative impact on the core responsibilities of data scientists became excessive.

THE MISTAKES WE IDENTIFIED invariably occurred at the interfaces between the data science function and the business at large. This suggests that leaders should be adopting and promoting a broader conception of the role of data science within their companies — one that includes a higher degree of coordination between data scientists and employees responsible for problem diagnostics, process administration, and solution implementation. This tighter linkage can be achieved through a variety of means, including training, shadowing, colocating, and offering formal incentives. Its payoff will be fewer solution failures, shorter project cycle times, and, ultimately, the attainment of greater business value.

Mayur P. Joshi (@mayur_p_joshi) is an assistant professor in FinTech at Alliance Manchester Business School at the University of Manchester. Ning Su (@ningsu) is an associate professor of general management, strategy, and information systems at Ivey Business School at Western University. Robert D. Austin (@morl8tr) is a professor of information systems at Ivey Business School. Anand K. Sundaram (@iyeranandkiyer) is head of retail analytics at IDFC First Bank. Comment on this article at https://sloanreview.mit.edu/x/62317.

REFERENCES

1. R. Bean and T.H. Davenport, “Companies Are Failing in Their Efforts to Become Data-Driven,” Harvard Business Review, Feb. 5, 2019, https://hbr.org.
2. T.H. Davenport, N. Mittal, and I. Saif, “What Separates Analytical Leaders From Laggards?” MIT Sloan Management Review, Feb. 3, 2020, https://sloanreview.mit.edu.
3. The names of people and organizations are pseudonyms, in keeping with our agreements with the companies.
4. S. Ransbotham, “Deodorizing Your Data,” MIT Sloan Management Review, Aug. 24, 2015, https://sloanreview.mit.edu.
5. T. O’Toole, “What’s the Best Approach to Data Analytics?” Harvard Business Review, March 2, 2020, https://hbr.org.
6. S. Ransbotham, “Avoiding Analytical Myopia,” MIT Sloan Management Review, Jan. 25, 2016, https://sloanreview.mit.edu.
7. P. Puranam, M. Raveendran, and T. Knudsen, “Organization Design: The Epistemic Interdependence Perspective,” Academy of Management Review 37, no. 3 (July 2012): 419-440.

Reprint 62317.
Copyright © Massachusetts Institute of Technology, 2021. All rights reserved.
