Big Data and the Future of R&D Management

Research-Technology Management

Michael Blackburn, Jeffrey Alexander, J. David Legan & Diego Klabjan

OVERVIEW: This study explores the concept of big data and whether, and to what extent, it might affect R&D management
in the future. Through extensive discussions to dissect the nature of big data and to achieve a common understanding of
what it represents, a research framework was constructed to analyze the impact of big data based on its potential to inform,
enable, and transform or disrupt R&D management across four dimensions: strategy, people, technology, and process
integration. A literature review, interviews with experts, and case studies of organizations using big data demonstrate that
this phenomenon will have significant implications for R&D and innovation management, although the nature and extent
of that impact are somewhat uneven across industry sectors.

KEYWORDS: Big data, Digitalization, IRI research

DOI: 10.1080/08956308.2017.1348135

Big data is increasingly pervasive, changing how we understand the world. Researchers have used big data,
gathered from social media streams, sensors embedded in consumer products, and elsewhere, to identify problems
with newly launched products before they escalate and to develop ideas for enhancements to existing products
based on their observed performance. As technology has enabled more organizations to access and analyze big
data, it has become more common. In fact, market research firm Gartner Group recently removed big data from its
annual “hype cycle” chart of emerging technologies, arguing that big data is so pervasive and well-established
that it is no longer “emerging” (Sharwood 2015). Clearly, all R&D organizations will need to deal with big data
sooner or later. Organizations are no longer asking whether or not they should exploit big data for competitive
advantage; they are determining what to do with the big data that is already part of the operating environment.

But although it offers intriguing possibilities for new approaches to R&D, new business models, even new
markets, for many, as Stephen Hoover has noted, “Big data isn’t a solution; it’s a problem” (Hoover 2015). The many
are those who have not begun the journey to understand what big data is and how it can be used to change the
way we do R&D or generate value.

R&D organizations are increasingly moving into the realm of big data, some driven by advancements in
technology that have increased the amount of data gathered in a single experiment to a level that requires
special handling. However, few have considered what the changes in this data landscape will mean for R&D and
R&D management. The goal of this study is to understand how big data will affect R&D management and R&D
activities in the future.

IRI Research Profile
Big Data and the Future of R&D Management
Exploring big data and its implications for R&D
Goal: To develop a common understanding of what big data is and explore how it will impact R&D management
and activities in the next decade.
Co-Chairs: Jeffrey Alexander (RTI International), Michael Blackburn (Cargill), David Legan
Subject Matter Expert: Diego Klabjan (Northwestern University)
For more information, contact Mike Blackburn at [email protected].

Analytics: Answering Big Data Questions

Big data is a term that is widely used but has no commonly accepted definition. It is most commonly defined in
terms of five Vs: volume, variety, velocity, value, and veracity. In other words, truly big data is large in
volume, varied in type and source, and accessible quickly once it is generated (increasingly, these days, in
real time); it may vary in composition and meaning over time, and it may or may not be trustworthy. The original
definition, coined by Laney (2001), included just the first three characteristics; two additional terms, value
and veracity, were added as it became evident that the potential of big data is in the value of the information
and the need to ensure its integrity (Marr 2014). There are also a number of technical terms that are part of
the conversation around big data, including open data (data that is typically in the public domain and readily
available, such as government data), found data (data originally generated for a specific purpose that can be
analyzed for a different purpose; for example, analyzing credit card transactions to discern consumer purchasing
patterns), and datafication (the trend toward capturing more aspects of social and physical phenomena as
digital data), as well as many others (Alexander, Blackburn, and Legan 2015).

Large quantities of published information are available on technical developments that support the creation,
acquisition, storage, and analysis of big data. Another set of literature looks at its operational impact on
companies, primarily focusing on customer-facing functions such as marketing and customer service. Publications
on the impact of big data on research, to the extent that they exist, generally come from government research
institutions and academia. Very little has been published about private-sector applications of big data to R&D,
although this does not mean the private sector is not engaging with big data.

One problem with the term “big data” is that it does not describe a particular technology or approach. As Bill
Pike of the Pacific Northwest National Laboratory told the research group, “Big data is data of sufficient size
and complexity to challenge contemporary analytical techniques.” For an organization, big data is any data of
sufficient size and complexity to challenge the analytical techniques and technologies available to it. Thus,
big data will mean different things to different organizations. For organizations accustomed to working with
massive data sets, big data implies a scale beyond state-of-the-art data management technologies. For other
organizations, big data may be any data set that cannot be handled by Microsoft Excel. The more useful approach
to defining big data is to look at the characteristics of what we call big data (the five Vs) and how those
characteristics relate to the ways organizations are accustomed to accessing, processing, and using data.

In this context, big data is the problem to the extent that it challenges organizations’ ability to absorb and
harvest value from their data streams. The solution to that problem is advanced analytics: techniques such as
machine learning, unstructured textual analysis, and other tools that can glean insights from large, complex
data sets. Advanced analytics identify latent relationships between variables, uncovering patterns that are not
discernible by humans alone. This interaction between data, models, and analysis is the core of the promise of
big data for applications in R&D. For instance, artificial intelligence and machine learning systems are likely
to play growing roles in both project and portfolio management, helping R&D leaders make smarter decisions and
improving both the execution and value proposition of R&D (Farrington and Crews 2013).

Analytics may take a variety of forms, but they all have the same goal: to glean insights from raw data. Those
insights can then be used to create predictive models that organizations can apply to business processes and
other elements, helping them to achieve business objectives. Robinson, Levis, and Bennet (2010) describe three
types of analytics:

• Descriptive analytics use data to find out what has happened in the past.
• Predictive analytics use data to find out what might happen in the future (forecasting and estimation).
• Prescriptive analytics use data to identify the courses of action that are likely to produce the best outcomes
  under given conditions.
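
The three types of analytics listed above can be illustrated with a small sketch. The Python below is not taken
from the study or from Robinson, Levis, and Bennet; the complaint counts, the cost figures, and the function
names (describe, predict, prescribe) are hypothetical, and a straight-line trend stands in for the much richer
statistical and machine learning models used in practice.

```python
from statistics import mean

# Hypothetical monthly complaint counts for a launched product (illustrative only).
history = [12.0, 15.0, 14.0, 18.0, 21.0, 20.0]


def describe(values):
    """Descriptive: summarize what has already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_trend(values):
    """Fit a least-squares line through (0, v0), (1, v1), ...; return (slope, intercept)."""
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(values, steps_ahead=1):
    """Predictive: extrapolate the fitted trend to estimate a future value."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


def prescribe(values, actions):
    """Prescriptive: choose the action with the lowest expected cost next period.

    `actions` maps a name to (fixed_cost, share_of_complaints_avoided); each
    remaining complaint is assumed to cost one unit.
    """
    forecast = predict(values)

    def expected_cost(name):
        fixed_cost, avoided = actions[name]
        return fixed_cost + forecast * (1.0 - avoided)

    return min(actions, key=expected_cost)


# Hypothetical options and their assumed effects.
options = {
    "do nothing": (0.0, 0.0),
    "fix packaging": (5.0, 0.5),
    "recall product": (30.0, 1.0),
}

print(describe(history))
print(round(predict(history), 1))
print(prescribe(history, options))
```

The same data feeds all three functions: the first reports on the past, the second projects it forward, and the
third weighs assumed options against that projection.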

Descriptive analytics is comparable to traditional business intelligence: the compilation of statistics and
major findings about past activities and conditions in a given time period (Delen 2014). Predictive analytics
applies new techniques to traditional forecasting to create sophisticated models; modern approaches to
predictive analytics use advanced statistical methods and machine learning algorithms to isolate and examine
thousands of variables simultaneously in the context of a predictive model. This kind of analysis allows the
interactions of many variables to be observed and the ones driving a potential result to be identified. The work
of quantitative hedge funds in modeling the stock market is one example of predictive analytics at work.

Prescriptive analytics applies techniques such as optimization, simulation, and heuristics-based decision making
to map the potential consequences of alternative strategies or courses of action. This type of analysis provides
an understanding of the trade-offs between different options; it may improve the quality of decisions by
integrating more factors and more complex interactions than humans are capable of processing unaided. One area
where prescriptive analytics is making inroads is in computer-assisted diagnosis. In these systems, a physician
enters observations about a patient into an engine that then scans the medical literature to identify diseases
or disorders that might be generating those symptoms. The system can then analyze all of those inputs to
identify the procedures or treatments most likely to be effective in that particular case (Haftner 2012).

Big data analytics will inevitably have an impact on management. As big data gains a foothold, management
decisions based purely on intuition or experience are increasingly being regarded as suspect (Economist
Intelligence Unit 2012). LaValle and colleagues (2010) report that, in their study, “top-performing
organizations use analytics five times more than lower performers.” The challenge to management is that
decisions about strategy and operations become more complex as the complexity of the data that must be
considered increases (Zhao, Fan, and Hu 2014). The question is whether R&D management is prepared to cope with
this changing environment.

The rise of big data, and big data tools, then, will present challenges and opportunities across the full range
of R&D management responsibilities and activities. Going forward, it will increasingly inform innovation and the
process a company uses to execute innovation, enable new approaches to R&D, and transform the practice of R&D.
Some of these changes, particularly those with regard to how big data informs or enables innovation, may be
largely incremental, driving toward accelerating R&D while driving down cost and risks. Larger challenges lie in
the potential for big data and analytics to disrupt or transform current business models, for instance, as
nontraditional players find ways to use big data to dislodge established market leaders, or established leaders
radically reshape their structures and processes to make better use of big data and potentially remake their
businesses, rendering competitors’ models obsolete.

The Study

The design of this project was based on broad input from IRI members at workshops created to elicit areas of
interest and early questions. The primary input came from 22 attendees at a workshop held during IRI’s 2015
Winter ROR Meeting. That workshop uncovered a wide range of questions, reflecting a broad range of understanding
of and involvement with big data in participants’ operations. Some members reported explorations of big data
involving large research programs and dedicated groups; other organizations were still trying to determine the
meaning of big data and why they should be concerned with it. With this diversity of understanding and
utilization of big data, the first challenge was to develop a common understanding of what big data is and
identify an approach to frame the discussion of its likely impacts on R&D.

That approach emerged from early literature analysis and the IRI’s 2015 Winter ROR Meeting workshop as we worked
to define the scope of the Digitalization project. We identified three kinds of impacts (inform, enable, and
transform/disrupt) across four key elements of R&D operations: strategy, people, technology, and process
(Table 1). The impacts align with innovation frameworks, which typically define projects in terms of
incremental, adjacent, and transformational outcomes. The elements of people, process, and technology have long
been used in change management (Maltaverne 2015); strategy was added because of its importance to R&D.

To begin to answer the questions raised by this way of thinking, we looked to interviews with thought leaders,
further review of the literature, and requests for examples from attendees at IRI meetings. Ultimately, we hoped
to gather a set of examples and cases that would help demonstrate how big data is being used now and illuminate
how it is likely to change R&D practices going forward.

We began the study by interviewing eight thought leaders who are recognized by IRI members as leaders in
understanding and using big data, to help gain an understanding of what big data applications look like and how
big data is likely to develop in the future. The hour-long interviews were recorded and summarized, then
reviewed by the group to extract key points. These interviews provided insights that helped refine the framework
and structure our questions for the case studies.

We then began to gather examples of the application of big data in R&D that we could align to the framework.

TABLE 1. Mapping big data’s likely impacts on R&D

How will big data inform R&D/innovation?
  Strategy: How could R&D management improve through the use of big data?
  People: Who will use big data to inform R&D management, and what will they need to know?
  Technology: What big data technologies and systems will R&D leaders use to improve decision making?
  Process integration: How will R&D management practices and processes change as big data becomes [...]

How will big data enable new approaches to R&D/innovation?
  Strategy: What new capabilities and approaches to innovation will big data make possible?
  People: How will research teams’ skills and knowledge need to change to make use of big data?
  Technology: What big data technology and systems will become part of the R&D process?
  Process integration: How will R&D activities change as big data becomes pervasive?

How will big data transform/disrupt existing approaches to R&D/innovation?
  Strategy: How can big data create/identify opportunities to disrupt markets and industries? How might
    competitors use big data against incumbents?
  People: Who will use big data as a tool for disruption?
  Technology: What big data technologies on the horizon will enable future disruptive opportunities?
  Process integration: What should companies do to predict and exploit disruptive opportunities presented by
    big data?

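
The grid in Table 1 lends itself to a simple data structure. The Python below is a minimal sketch, assuming the
three impacts and four elements named in the table as keys; the function names (file_example, band) and the
sample entries are hypothetical and are not study data. The band function borrows the cutoffs the study reports
for classifying industry segments (4+ examples, 2–3, and 0–1) and is applied here to grid cells purely for
illustration.

```python
IMPACTS = ("inform", "enable", "transform/disrupt")
ELEMENTS = ("strategy", "people", "technology", "process integration")


def file_example(grid, name, cells):
    """Record one example under every (impact, element) cell it fits."""
    for impact, element in cells:
        if impact not in IMPACTS or element not in ELEMENTS:
            raise ValueError(f"not a framework cell: {impact!r}, {element!r}")
        grid.setdefault((impact, element), []).append(name)


def band(count):
    """Map a count of examples to the bands the study reports for Table 2."""
    if count < 0:
        raise ValueError("count cannot be negative")
    if count >= 4:
        return "many"
    if count >= 2:
        return "some"
    return "one or none"


# Hypothetical examples; the names and placements are placeholders, not study data.
grid = {}
file_example(grid, "example A", [("inform", "strategy")])
file_example(grid, "example B", [("inform", "strategy"), ("enable", "technology")])

for cell, names in sorted(grid.items()):
    print(cell, len(names), band(len(names)))
```

Allowing a list of cells per example mirrors the study's note that some examples were placed in more than one
area of the framework.
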
A literature survey identified nine case studies, and we interviewed five IRI members who had offered
information about applications of big data they had implemented in their organizations. Additional examples were
collected from 71 participants in a workshop held at the 2015 IRI Member Summit, 109 participants in a workshop
at the 2016 IRI Annual Meeting, and 150 participants in three workshops held at the 2016 IRI Member Summit. In
each of these workshops, we talked through the framework then asked participants to work in groups to describe
examples that fit the framework. In the first two workshops, we discussed the examples brought up by
participants and put them into the framework in a full-group discussion. The IRI case studies were identified
from these examples.

In the 2016 IRI Member Summit workshop, we took a different approach. In this iteration, we asked participants
to talk about both how they saw R&D being affected and where they had seen these impacts. This final session,
which included 150 participants, generated a total of 237 ideas, demonstrating participants’ recognition of the
likely impact of big data for R&D.

After the workshops were complete, we reviewed each example, including both literature cases and examples
collected in workshops, and placed it in the framework where it best fit; some examples were placed in more than
one area of the framework. The final evaluation was organized by industry segments because we noted a broad
distribution of knowledge and understanding of big data within the IRI members and we recognized an uneven
impact of big data across industries, with some segments clearly ahead of others. We organized these data into
eight categories: Industrial Manufacturing, Consumer Goods, Food & Beverage, High Tech, Energy, Chemicals,
Health Care & Pharmaceuticals, and Government. We then classified each of these industry segments based on
whether we identified many examples (4+), some examples (2–3), or one or no examples (0–1) of big data
applications for R&D in that segment (Table 2).

TABLE 2. Big data applications by industry
[Table: rows are the three research questions (how big data will inform, enable, and disrupt R&D/innovation);
columns are Strategy, People, Technology, and Process integration.]

Observations

The thought leader interviews were completed early in our efforts and helped to refine our framework. These
interviews revealed several key themes. One was that every organization has a large amount of data that is being
underutilized; big data analytics can pull value from this data. Another point that several interviewees
highlighted was that the human capital required to deploy big data solutions effectively is different than it
was in the past; today’s data scientists require both scientific and data analysis skills. These interviews also
pointed to the reality that some segments of industry are more affected than others. Finally, all of our
early-stage interviewees expressed the belief that big data analytics will continue to develop and will become
an accepted cost of doing business in the future.

The examples observed in the literature were quite diverse, including analysis of large data sets,
cheminformatics (Bunger 2015), advanced analytics using approaches such as machine learning (Li 2011) and
artificial intelligence (Wigley et al. 2016), pattern recognition, image analysis, text analytics (Markham,
Kowolenko, and Michaelis 2015), virtual experimentation and simulation, forecasting (Huang et al. 2015), and
bioinformatics and genomics (Stevens 2013). Many of these examples described applications that are quite
complex.

In addition to providing examples for analysis, the workshops offered a sense of participants’ understanding of
and feelings about big data. These discussions made very apparent the diversity of understanding of big data.
Often, statements or questions would be about the technology enabling big data rather than the application and
impact of big data on R&D. In the small-group discussions, participants found it relatively easy to identify
examples of big data informing R&D, more difficult to identify examples of big data enabling new approaches to
R&D, and challenging to identify examples of big data disrupting or transforming R&D.

The examples we collected clearly demonstrate that all industry segments are being informed by big data and its
uses in strategy, people, technology, and process
integration. However, within a given segment, there can be a broad range of understanding and execution. While
there are many examples of big data enabling new R&D approaches across industry segments, government and the
more technology-focused industry segments are ahead in this regard. This same group leads in the use of big data
to transform or disrupt R&D.

Inform

Insights provided by big data can inform both the kinds of innovations an organization pursues and the process
it uses to produce new products and services. Big data can contribute to opportunity assessment, project
selection, and even identification of potentially fruitful incremental product improvements.

For instance, Eastman Chemical Company engaged in a collaboration with North Carolina State University to apply
big data to gain insight into 3D printing technology and the market environment. The project collected consumer
responses to and attitudes toward relevant Eastman products and competitors’ products from social media and used
unstructured text analytics to identify consumer concerns and needs. Ultimately, the analysis highlighted the
environmental impact of the products as a key consumer concern. Thus, big data delivered rapid opportunity
assessment and identified a lucrative, underserved market space that could be addressed by Eastman’s
capabilities.

Eastman’s use of big data spanned the boundary between marketing and R&D; consumer goods companies have also
invested in big data capabilities in sales and marketing to provide leads and insights through, for example,
monitoring social media feeds. These data, combined with data from consumer calls and external databases such as
patents and scientific literature, generate insights for R&D in the form of recommendations for product
improvements.

For Eastman and consumer goods companies, big data provides both marketing and innovation insight. In industrial
manufacturing, big data also plays more than one role, both providing a customer service and feeding back into
companies’ innovation plans. The increasingly common incorporation of sensors and the Internet of Things (IoT)
into industrial products provides performance data that companies use to support the provision of services to
existing customers. That performance data can also help identify opportunities to improve product performance,
heighten efficiency, or fill new customer needs. In other words, even as data supports an existing product, it
gives R&D information about what the next generation of projects in the portfolio should look like.

If big data is to inform R&D, new skills and competencies will be needed. The thought leaders we interviewed
noted that analysts need to be familiar both with data analytics and with the underlying business or research
question being addressed, calling to mind the pi-shaped skill set described by Alexander, Blackburn, and Legan
(2015). These observations, our interviewees said, point to a need for changes in both hiring and training in R&D
organizations and highlight the importance of organizational willingness to invest in the infrastructure and
human capital needed to support the use of big data.

Enable

Big data can enable more efficient and effective innovation by increasing the ability of researchers to obtain
needed information, allowing faster iteration on designs and supporting virtual design and experimentation.
Increasingly, software is a key to lab management, linking experimental results with search and analytics tools,
and automating experimental design can save thousands of dollars in time and materials (Bunger 2015). Big data
can also enable R&D to respond to unexpected events before they become crises by providing early warning of an
emerging issue.

The most common application in this domain is in literature search. There are numerous big data tools that
search across all literature and relevant databases and even internal document repositories to identify
information that is pertinent to a researcher-defined query. For example, [...] offers a product that helps
researchers learn from the vast amount of research being produced around the world each day.

One basic use of big data tools to enable R&D is to manage the information needed to support innovation. For
instance, consulting firm Decernis maintains a very large database of worldwide regulations pertaining to food,
cosmetics, over-the-counter pharmaceuticals, medical devices, packaging, and more, with all the raw materials
used in these areas catalogued. The data are translated into 40 languages. This database permits the R&D
organizations that use the firm’s service to formulate products with greater confidence that the products will
be accepted by regulatory agencies in the target market; it also provides insight into potential regulatory
issues before they cause operational difficulties.

Another challenge in R&D management is managing unexpected events that impact a company’s products. One consumer
goods company we spoke to monitors social media streams when a new product is launched, using analytics to gauge
consumer reaction and monitor for unexpected issues. In one case, a packaging issue was identified by comments
in social media; the company’s R&D organization was able to create a correction and put it in place before any
complaints were received in the company’s call center. IBM’s Watson has also been used to monitor social media
streams for product “scares” or potential recalls.

For big data to enable R&D, we observe from those who have embraced big data, R&D management must be looking for
new, more efficient ways to carry out research and be willing to make investments both in time to learn and in
funds to purchase these tools. R&D management must also understand the change management required when these new
tools and techniques are introduced to the R&D organization. Learning organizations will have an advantage, as
they are quicker to absorb new knowledge and embrace new ways of working.

Transform/Disrupt

As big data increasingly enables new approaches to R&D, and new business models that change the market itself,
R&D leaders must consider how big data developments will impact the future of their organizations. By driving
down the transaction costs involved in innovation, big data can lessen many of the traditional advantages
enjoyed by large R&D organizations. Eventually, organizations will have to embrace these tools (and transform
themselves in the process) in order to keep up, or face the risk of being disrupted by competitors who deploy
big data to drive new product development, streamline R&D to get to market faster, and move into new markets.

Some industries are already seeing these forces in play, forcing organizations to look not only at what work is
being done but also at how that work is structured. One interviewee told us that the US intelligence community
has responded by moving from a hierarchical organizational model to a network model. By aggregating large
central data repositories, accessible through a secure network using a standardized suite of tools, intelligence
agencies can perform multiple analyses simultaneously and collaborate organically, instead of routing all
analytical work to one organization. This approach allows the community to process more data and exploit its
insights more efficiently.

Big data is also changing the way organizations pursue open innovation. In an approach pioneered by Procter &
Gamble in the early 2000s, companies form open networks of individual and organizational collaborators that
share information, gleaning insights from widely distributed resources that can then be applied to R&D (Kastelle
2012; Ozkan 2015). This approach allowed P&G to integrate external resources into its innovation process;
ultimately, the company was able to streamline its innovation infrastructure and reduce R&D spending. However,
to accomplish this, P&G had to develop ways to manage and screen the flow of ideas and knowledge. This is an
example of a transformation and disruption of R&D through the collection of ideas from a large group of external
contributors, supported by a big data analytics [...]

Other organizations are deploying big data to support approaches to R&D that would not have been feasible
without big data and analytics. DARPA’s Big Mechanism program, for instance, seeks to accelerate cancer research
by leveraging the entire research literature (Cohen 2015). The project is using big data tools to “read” every
scientific article related to cancer, extract all instances suggesting
a causal pathway, assemble those instances in context to create large-scale causal models (signaling networks)
and derive new hypotheses about cancer mechanisms, and test those hypotheses in virtual experiments. Although
the program is focused on cancer biology, the overarching goal is to develop technologies to support a new kind
of science, one in which research is integrated more or less immediately (automatically or semi-automatically)
into causal, explanatory models of unprecedented completeness and consistency.

Case Study: Big Data in the Pharmaceutical Industry

The pharmaceutical industry is one of the sectors where big data is having the most impact on R&D; our data set
included multiple articles and case study examples from pharma. In part, this is driven by the nature of
pharmaceutical R&D. The industry relies on clinical trials involving thousands of patients; often, those trials
document inherently variable responses to a given compound because of genetic and physiological diversity across
the patient population. These variations can obscure the true outcome of a trial and make it difficult to map a
new drug’s actual effects. To deal with this challenge, pharma companies have developed very sophisticated data
analysis capabilities.

However, the industry has not definitively solved the big data problem. Traditional pharmaceutical research has
become increasingly less productive over the past two decades, at least in part because of a “lack of data or
lack of appropriate analysis of the available data” (Tormay 2015, 88). At the same time, certain kinds of data
have rapidly gone from being unavailable or difficult to access to being overabundant, largely driven by
advances in genome sequencing technology and a concomitant reduction in the costs associated with gene
sequencing. The Human Genome Project announced its first draft sequence in 2000 and its first finished genome in
2003; this first effort took more than 10 years and cost approximately $2.7 billion with 20 different
institutions collaborating (National Human Genome Research Institute 2003). Today, a human genome can be
sequenced in a matter of hours for around $1,000; one high-throughput sequencer can deliver 400 billion base
pairs per day and up to 12 human genome sequences and 1.5 terabytes of data per 3.5-day run (Illumina 2015),
providing access to larger amounts of genomic data faster than ever before, and increasing the computing power
needed to gather insight from that data.

These capabilities are rapidly informing, enabling, and transforming pharmaceutical R&D in a number of ways:

• Informing. The power to rapidly generate genetic data has been harnessed in the UK’s 100,000 Genomes Project,
  which is sequencing entire genomes from a diverse set of subjects, including patients with rare diseases. The
  resulting knowledge and insight should help clinicians to improve diagnosis and outcomes (Genomics England
  n.d.). This is but one project that uses such approaches. The resulting torrent of data, when linked with
  information on medical conditions and disease states, will provide insights to identify new potential drug
  targets for a host of once-intractable conditions.
• Enabling. In the typical drug development process, once new targets for treatment have been identified, a
  researcher looks for a compound that can interact with the target in a desirable way, historically through wet
  chemistry and biological screening. More and more, however, this process is moving to virtual screening, in
  which computer models examine millions of compounds for potential interaction with targets and identify the
  most promising ones. Only a small subset undergoes traditional biological screening (Storrs 2015). This
  approach allows more potential molecules to be identified more quickly and at lower cost, bringing drugs to
  clinical trial and eventually to market more quickly.
• Transforming and disrupting. Extrapolating only a little, it is not hard to see how the revolution in the
  availability of genomic data and data analysis tools could change the nature of pharmaceutical discovery by
  permitting the identification of compounds with high efficacy in targeted genetic groups, even if that
  efficacy cannot be distinguished from a placebo when compared across the general population. Bernie Meyerson,
  Chief Innovation Officer at IBM and one of the preeminent thinkers on big data, suggested in his interview
  with us that health care is where big data will have the greatest societal impact (at least in the United
  States) and that the opportunities presented by the application of big data in pharmaceuticals will ultimately
  be a contributor to that outcome.

Big data will profoundly affect R&D, changing both what innovation looks like and how it is managed. We are
already seeing this impact. Although R&D has not generally been at the forefront of big data applications,
companies are starting to exploit these capabilities. GE’s heavy investment in data analytics for its aircraft
engine unit and other businesses (Winig 2016) is one manifestation of that trend. Looking at the evolution in
companies like GE, which are early adopters of big data approaches, can give some sense of how the future might
unfold for R&D in all industries.

We believe that the framework we developed for this study can be used as a guide in considering the impact big
data is likely to have in a given industry. Further development of this framework could lead to a maturity model to

