Data Governance in Pharmaceuticals
Master of Science in
Health Management & Data Intelligence
Table of Contents
1 Acknowledgments
2 Abstract
3 Introduction
3.1 Overview of Data Governance and its importance in the pharmaceutical industry
4 Literature Review
4.2 Review the existing data governance models and their applicability in the
4.3 Critical success factors (CSFs) for Data Governance in general
5.2 Research context and philosophy: The criteria for selecting participants
6.1.3 Data Governance best practices for Policies Compliance in Pharmaceutical
Industry
1 Acknowledgments
I would like to begin by expressing my sincere gratitude to the individuals who graciously
participated in the survey, providing invaluable insights into the realm of Data Governance. To
respect their wish for confidentiality, I will refrain from revealing their
names here. Your willingness to share your expertise and experiences was instrumental in
shaping this thesis, and I am deeply appreciative of your contributions. Thank you for your
support and commitment to advancing our understanding of Data Governance.
This Master’s thesis would not have been possible without the invaluable contributions,
guidance, and expertise of the data professionals in the field. I am filled with gratitude for the
individuals who have played a pivotal role in shaping this work.
First and foremost, I extend my heartfelt thanks to Arnaud Jaoul, my thesis supervisor and
professor in the Big Data and Artificial Intelligence course at the École des Mines de
Saint-Étienne engineering school. His unwavering patience and profound knowledge of Data Management
not only provided me with a clear direction for developing my arguments but also helped me
enhance the quality of my thesis.
I am also grateful to Mathieu Verriere and Bruno Versaevel, both outstanding professors at
emlyon Business School, who facilitated valuable connections with data experts in Lyon,
enabling me to conduct interviews and formulate the survey questions.
My deepest appreciation goes out to all those who generously spared their time to participate
in interviews, thus offering profound insights into the challenges of Data Governance, which
ultimately inspired the formulation of specific questions for the pharmaceutical sector. I would
like to express my heartfelt thanks to:
• Ole Olesen-Bagneux, author of "The Enterprise Data Catalog," for his extensive
knowledge and constructive guidance.
• Laura Madsen, author of "Disrupting Data Governance" and "Data Driven Healthcare:
How Analytics and BI are Transforming the Industry," for sharing her invaluable
perspectives on the meaningful aspects of the data governance role, even though she
humorously likes to say, "I hate Data Governance!"
• Thomas de Charentenay, Data & Business Transformation at Sanofi Pasteur, for his
thoughtful review of my survey questions and advice on how to improve them.
• Charlotte Ledoux, Data Governance Manager at Pernod Ricard, for her invaluable
insights into the critical role of C-level sponsorship in the success of data governance
initiatives within change management.
• Phil Black, the founder of Data QG, whose data governance education platform
provided me with a deeper understanding of the world of data governance.
• Juan Felipe Arias Aguirre, Data Scientist at Cityscoot, for offering his valuable insights
into the challenges of data governance.
I would like to say a big "thank you" to each of these people for their constant help and the
important roles they played in making this thesis happen.
2 Abstract
The field of information systems recognizes the growing importance of data governance in
efficiently handling the ever-increasing volume of data in organizations, and highlights the need
for effective strategies to manage and use data resources (Alhassan et al., 2016). This holds
particularly true in the pharmaceutical sector, where data governance plays a key role in
addressing the challenges posed by various data sources that can make it difficult to share and
use data effectively (Truong et al., 2017). Data governance is not only vital for managing large
datasets but also for complying with regulations and maintaining
pharmaceutical sector, combining insights drawn from a thorough literature review and a
survey. The survey targeted professionals spanning roles in Data, R&D, Marketing,
Production, Sales, Clinical, and Finance within the pharmaceutical industry. Despite the
limitations of a modest sample size (19 respondents) and constraints that precluded in-depth
interviews, the survey responses provided valuable insights. Quantitative analysis of the survey
responses facilitated
successful implementation.
The literature underscores that integrating data governance into the quality management
system is a vital strategy for meeting regulatory expectations (Khin et al., 2020).
Additionally, the growing influence of artificial intelligence (AI) on decision-making and on the
quest for better medicines cannot be overlooked (Marcelo Corrales Compagnucci et al.,
2022). Within this context, challenges in sustaining data integrity span inadequate
quality culture, organizational and individual behavior, leadership, processes, and technology
(Charoo et al., 2023). Notably, regulatory authorities endorse the adoption of the F.A.I.R. and
ALCOA+ principles as guiding tenets for upholding data integrity (Hodgson et al., 2017).
The primary findings have been analyzed by comparing them to the literature review, leading
to the identification of seven Critical Success Factors (CSFs) essential for effective data
governance within the pharmaceutical industry. These factors include securing leadership
Departments, and business units, and fostering a data-driven culture. Additionally, transparency
in data processes guided by principles such as F.A.I.R., ALCOA+, and cGMP is considered an
important factor.
Future research could mitigate the limitations posed by the small sample size and glean
deeper insights. Additionally, the consideration of
these details, this investigation highlights how important data governance is in influencing
healthcare and pharmaceutical progress, emphasizing the balanced relationship between theory
3 Introduction
In today's digital age, data has become the lifeblood of organizations across various industries,
and the pharmaceutical sector is no exception (Truong et al., 2017). The rapidly growing
volume of data generated within pharmaceutical companies presents both opportunities and
challenges (Alhassan et al., 2016). In the ever-evolving landscape of data infrastructure and
digital solutions, the pharmaceutical industry is witnessing a paradigm shift in how data is
generated, managed, and utilized (Alhassan et al., 2016). With the proliferation of digital
including clinical trial results, patient records, real-world evidence, and supply chain
information (Marcelo Corrales Compagnucci et al., 2022). While this data abundance presents
remarkable opportunities, it also poses significant challenges that need to be addressed to fully
Data infrastructure in the pharmaceutical sector has become increasingly complex, comprising
diverse sources and formats (Marcelo Corrales Compagnucci et al., 2022). This evolution has
given rise to the critical need for robust data governance (Alhassan et al., 2016). Data
designed to ensure data quality, security, integrity, and compliance (Ladley, 2019). It addresses
the challenges inherent in managing large and sensitive datasets, safeguarding patient privacy,
pharmaceutical industry and understand how it can play a pivotal role in overcoming existing
challenges. The key problem at hand is the effective management and utilization of data to
make informed decisions while maintaining compliance and data privacy (Hodgson et al.,
2017). Data governance provides a solution to this problem by establishing guidelines for data
access, usage, and accountability throughout the organization's data lifecycle (Weiss, 2022).
The importance of data governance in the pharmaceutical context extends beyond operational
efficiency (Neumeyer, 2020). It is essential for maintaining patient safety and drug efficacy
throughout the entire drug development and post-marketing phases (Neumeyer, 2020). By
adhering to stringent data governance practices, pharmaceutical companies can ensure ethical
and secure handling of patient data, minimizing risks associated with adverse events and
Supply chain optimization is another area where data governance proves invaluable (Koh et al.,
2011). It enables enhanced supply chain visibility, reducing the likelihood of drug shortages
and counterfeit risks, thereby benefitting patients and healthcare providers alike (Bagozzi &
Lindmeier, 2017).
Finally, data governance plays a significant role in driving commercial effectiveness in the
pharmaceutical industry (EMEA, 2010). By leveraging well-governed data, companies can gain
valuable market insights, improve sales force effectiveness, and enhance customer relationship
3.1 Overview of Data Governance and its importance in the pharmaceutical industry
Data governance is often misconstrued as being synonymous with data management. However,
according to DAMA International (2009), “data governance is the exercise of authority and
control (planning, monitoring, and enforcement) over the management of data assets”. It should
be noted that data governance complements data management but does not replace it (Wende,
2007). Therefore, it is crucial to make a clear distinction between data management and data
governance. The characteristics described below reveal the fundamental contrast between the
operational handling of data and the strategic oversight of data-related processes and policies:
The DAMA Guide to the Data Management Body of Knowledge, published by DAMA
International in 2009, is widely regarded as the standard reference for data
management. Often called the "bible" of data management because it is referenced in most
articles about data governance, this authoritative guide offers comprehensive insights and best
practices for professionals seeking to navigate the complexities of data management. According
managing and utilizing data and information assets within an organization. It aims to meet the
Figure 1 above illustrates the ten major constituent functions of data management as outlined
1) Data Governance: This function involves the strategic planning, supervision, and
2) Data Architecture Management: This function defines the blueprint for managing data
assets. It involves determining the data requirements of the organization and designing
with application system solutions and projects that implement the enterprise
architecture.
data needs of the organization. It focuses on data-related tasks within the system
development lifecycle (SDLC) and includes data modeling, requirements analysis, and
4) Data Operations Management: This function involves the planning, control, and support
for structured data assets throughout their lifecycle, from creation and acquisition to
executing security policies and procedures to ensure data privacy, confidentiality, and
improve, and ensure the fitness of data for its intended use.
planning, implementation, and control processes that facilitate the provision of decision
support data and support for knowledge workers engaged in reporting, querying, and
analysis activities.
implementation, and control activities to store, protect, and provide access to data stored
in electronic files and physical records, including text, graphics, images, audio, and
video.
10) Metadata Management: This function involves planning, implementation, and control
provides essential information about the data assets and their characteristics.
These ten functions collectively form the comprehensive scope of the data management
function as defined by DAMA International (2009). They provide a structured framework for
organizations to effectively manage their data assets and ensure their strategic utilization
As illustrated in Figure 1 above, data governance holds a central position in the
management of data assets, sitting at the hub of the circular representation of the ten data
management functions. The following section explains the data governance concept in more
detail.
Data governance focuses on determining who holds decision-making authority over data assets
within an organization (Khatri & Brown, 2010), with the aim of ensuring data quality,
Data can only bring value to business decision-making when its quality and integrity are
ensured (Otto, 2015). As depicted in Figure 2 above, Huff et al. (2019) emphasize five data
dimensions for ensuring data integrity in which data governance plays a crucial role: patient
care, treatment plans, record retention, key performance indicators (KPIs), and scorecards.
As depicted in Figure 2, data integrity and quality emerge as the central facets critical for
of which fall under the purview of data governance endeavors (Huff et al., 2019).
Data integrity is a critical aspect of data that contributes to its trustworthiness and reliability
(Huff et al., 2019). According to Huff et al. (2019), data integrity refers to the systems and
To ensure data integrity, several attributes must be upheld, as defined by ALCOA+ (Figure 3)
3) Enduring: Data should be retained for the required duration as specified by regulatory
4) Available: Data should be accessible and retrievable when needed, ensuring its
disruptions.
5) Attributable: Data should be traceable and linked to its source, identifying who collected and
6) Legible: Data should be clear and easily readable, ensuring its understandability and
accessibility.
8) Original or True Copy: Data should be in its original form or an authorized and accurate
9) Accurate: Data should be precise and free from errors, reflecting the true values and
characteristics it represents.
These characteristics must be maintained throughout the entire data lifecycle which
retrieval, transmission, and disposal after the designated retention period (Hodgson et al., 2017).
By adhering to the principles of ALCOA+ and ensuring data integrity, organizations can
enhance the reliability, credibility, and usability of their data assets (Hodgson et al., 2017).
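Some of the ALCOA+ attributes listed above can be partially checked by software at the point of data capture. The minimal Python sketch below illustrates the idea for a single record under stated assumptions: the record fields, the `check_alcoa` function, and the 15-minute tolerance used for "contemporaneous" are hypothetical choices for illustration, not values taken from ALCOA+, the cited sources, or any regulation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical tolerance for "contemporaneous" recording: an illustrative
# assumption, not a regulatory requirement.
MAX_RECORDING_DELAY = timedelta(minutes=15)


@dataclass
class Record:
    record_id: str
    recorded_by: Optional[str]   # who captured the data (Attributable)
    event_time: datetime         # when the observation was made
    recorded_time: datetime      # when it was written down (Contemporaneous)
    value: Optional[float]       # the observed value (Complete)
    original_or_true_copy: bool = True  # original or certified true copy (Original)


def check_alcoa(record: Record) -> List[str]:
    """Return ALCOA+ findings for one record; an empty list means no issue was detected."""
    findings = []
    if not record.recorded_by:
        findings.append("Attributable: no user recorded")
    delay = record.recorded_time - record.event_time
    if delay < timedelta(0):
        findings.append("Contemporaneous: recorded before the event occurred")
    elif delay > MAX_RECORDING_DELAY:
        findings.append("Contemporaneous: recorded too long after the event")
    if record.value is None:
        findings.append("Complete: value missing")
    if not record.original_or_true_copy:
        findings.append("Original: neither the original nor a certified true copy")
    return findings


t0 = datetime(2023, 5, 2, 9, 0)
ok = Record("R1", "analyst_01", t0, t0 + timedelta(minutes=5), 7.2)
late = Record("R2", None, t0, t0 + timedelta(hours=2), None)
print(check_alcoa(ok))    # []
print(check_alcoa(late))  # three findings: Attributable, Contemporaneous, Complete
```

Attributes such as Legible or Accurate generally depend on procedural controls, validated instruments, and human review, so they are deliberately not modeled in this sketch.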
Data integrity serves as a fundamental principle and commitment within the pharmaceutical
industry to ensure the production of safe, effective, and high-quality drugs (Charoo et al., 2023).
By upholding data integrity, the industry aims to comply with established standards and
regulations, reinforcing its dedication to patient safety and public health (Charoo et al., 2023).
In the context of drug manufacturing processes, data integrity plays a vital role in enabling
regulatory authorities, such as the US Food and Drug Administration (FDA) and the United
monitor and assess the integrity of data generated throughout the entire lifecycle of a drug
(Charoo et al., 2023). This is crucial for ensuring the reliability and accuracy of data submitted
Recognizing the significance of data integrity, the FDA CDER and MHRA-UK conducted a
joint Good Clinical Practice (GCP) workshop in October 2018 (Khin et al., 2020). The
workshop focused on discussing data integrity and clinical data management within the
pharmaceutical industry (Khin et al., 2020). The outcome of the workshop emphasized that
reliability can have a profound impact on the acceptability of the data submitted for regulatory
Moreover, violations of data integrity can pose serious safety risks to human volunteers
participating in clinical trials and undermine the regulatory efforts to protect human subjects
(Khin et al., 2020). Therefore, it is imperative for organizations to establish and maintain robust
procedures for preserving the study blind during clinical trials, ensuring the integrity and
confidentiality of data collected throughout the trial process (Khin et al., 2020).
upholding rigorous standards, safeguarding patient safety, and promoting public trust in the
Data governance plays a paramount role in the biopharmaceutical industry by acting as the
guardian of data integrity across every stage of the data lifecycle (Weiss, 2022). Data lifecycle
archiving, retrieval, transmission, and disposal of data after the designated retention period
(Hodgson et al., 2017). From the inception of data collection through its processing, analysis,
storage, and eventual utilization, data governance ensures that data remains accurate, consistent,
Ensuring data integrity is increasingly crucial in this modern landscape, where the industry
digital technologies (Marcelo Corrales Compagnucci et al., 2022; Self, 2014; Truong et al.,
2017). Digital transformation initiatives leveraging big data, cloud computing, artificial
intelligence (AI), and the Internet of Things (IoT) are being pursued by many pharmaceutical
companies to improve efficiency and gain a competitive edge (10 Internet of Things (IoT)
Healthcare Examples, 2023; Marcelo Corrales Compagnucci et al., 2022; Selvaraj &
governance to ensure compliance with strict regulatory requirements and align with the Industry
While digital transformation initiatives are often prioritized by the C-suite, data governance,
data integrity, and regulatory compliance may not always receive sufficient attention (Weiss,
2022). This difference in attention could arise from the perception that data governance lacks
the immediate appeal of subjects such as data analytics or machine learning (Weiss, 2022). Data
procedural complexities, rather than yielding the rapid and tangible outcomes associated with
analytics or machine learning (Levy, 2021). Nevertheless, experts underscore the significance
of data governance through the adage "bad input, bad output" (Levy, 2021; Marcelo Corrales
Compagnucci et al., 2022; Vamathevan et al., 2019). Essentially, data governance acts as a
cornerstone that ensures data quality and integrity (DAMA International, 2009; Wende, 2007).
This, in turn, safeguards the credibility of results derived from data analytics and machine
learning, thereby delivering genuine value to the organization
legacy processes and replace manual methods with integrated systems that can enhance overall
In recent years, data integrity has become an increasingly prominent issue in FDA warning letters. Data
integrity issues accounted for 47% of all warning letters issued by the FDA in 2019 and reached
65% by the end of 2021 (Eglovitch, 2022). This trend has prompted pharmaceutical
organizations to reassess their infrastructure and digitalize their business and operational
The adoption of tools such as electronic laboratory notebooks (ELN), laboratory information
management systems (LIMS), and manufacturing execution systems (MES) has helped the
biopharmaceutical industry improve data integrity and manage operational data (Weiss, 2022).
Large organizations are also investing in centralized data repositories like data lakes where data
is shared and used among business units to support their digitization efforts and break down
data silos (Weiss, 2022). However, integrating these systems while maintaining data integrity
and regulatory compliance remains a significant challenge that necessitates a strong foundation
the biopharmaceutical industry (Alosert et al., 2022; Weiss, 2022; Wise et al., 2018). It ensures
the consistency, coherence, and regulatory compliance of data throughout its lifecycle, enabling
(Alosert et al., 2022; Rattan, 2018). By prioritizing data governance, the industry can navigate
the challenges of Industry 4.0 and establish a solid framework for future advancements (Alosert
et al., 2022).
improve operational efficiency, reduce costs, and gain a competitive advantage (Leesakul et al.,
2022). However, amidst these transformative efforts, maintaining data integrity and ensuring
regulatory compliance remain critical challenges (Alosert et al., 2022; Buytaert-Hoefen, 2019;
Khin et al., 2020). The effective implementation of digital technologies must be underpinned
by a strong foundation of data governance, which encompasses the strategic planning, control,
Previous studies have highlighted the importance of data governance and data integrity in
various industries, including the biopharmaceutical sector (Alosert et al., 2022; Khin et al.,
2020; Neumeyer, 2020; Truong et al., 2017). However, there is a limited understanding of the
specific challenges and best practices for implementing data governance in the context of digital
transformation initiatives in the pharmaceutical industry. While some research has explored the
adoption of specific tools and systems to improve data integrity (Hodgson et al., 2017;
Neumeyer, 2020; Rattan, 2018), there is a lack of comprehensive studies that examine data
The purpose of this thesis is to explore and analyze the current rate of adoption and awareness
biopharmaceutical industry. The thesis aims to investigate the current state of data governance
practices, identify challenges and gaps in implementing data governance, and propose strategies
Additionally, the study will involve conducting surveys with key stakeholders in leading
biopharmaceutical companies to gain insights into their data governance models, challenges,
By addressing this research gap and incorporating insights from industry, this study aims to
provide practical guidance for industry practitioners, helping them understand the
critical role of data governance in digital transformation. The findings and recommendations
will help raise awareness about this subject and equip organizations in the biopharmaceutical
sector with the necessary tools and strategies to ensure regulatory compliance, data integrity,
1. Literature Review
practices, common issues with data integrity, the data lifecycle in the BioPharma sector,
• Reviews existing data governance models (centralized, federated, hybrid) and assesses
2. Research Methodology
• Describes the research design and methodology, including the approach to data
5. Conclusion
• Summarizes the findings from the literature review and empirical research.
• Discusses the implications of the study for practitioners and suggests future research
directions.
4 Literature Review
This section delves deeply into the data integrity issues discussed in the pharmaceutical context
processes currently lack established standards (Weiss, 2022). Although some standards are
emerging, they are not widely adopted by hardware and software vendors due to the absence of
industry consensus or the immaturity of these standards (Weiss, 2022). This issue is
data silos and hindered system-to-system integration (Alosert et al., 2022; Weiss, 2022).
An often overlooked yet crucial aspect of data integrity is data contextualization (Buytaert-
Hoefen, 2019; Huff et al., 2019; Rattan, 2018). Simply extracting data from a specific system,
Maintaining this contextual information, also known as the "chain of custody," is essential not
only for interpreting experimental findings but also for attaining the necessary business
Data from multiple systems are frequently combined manually by operators,
often relying on intermediary tools like spreadsheets (Weiss, 2022). This manual process is
time-consuming and prone to errors, with the risk of mistakes increasing with each additional
extensive quality checks to ensure data integrity and compliance with regulatory requirements,
The ability to generate and access high-quality contextualized data poses a significant
Corrales Compagnucci et al., 2022; Vamathevan et al., 2019; Weiss, 2022). Highly qualified data
scientists can spend an excessive amount of time searching for, combining,
and cleaning data to generate datasets for training and validating models (Weiss, 2022).
automating data integration processes are essential for enhancing data integrity and accessibility
in biopharmaceutical processes, enabling more efficient and accurate analysis for decision-
The integration of digital topography with custom solutions is challenging for many
organizations due to the lack of IT/software development skills, resources, and time, as well as
varying support from hardware/software vendors (Weiss, 2022). Developing custom software
code for system integration often creates a maintenance and resource overhead (technical debt) that is difficult to sustain
(Weiss, 2022). Therefore, when procuring new informatics systems or instruments, it is crucial
to consider the vendor's API and documentation quality, support services, and examples of
products to meet the growing demand for integration support (Weiss, 2022).
software for automation may be beyond their technical and budgetary capabilities (Levy, 2021).
To address this, some companies have developed software solutions that simplify the
integration of laboratory hardware and software platforms (Levy, 2021). These solutions offer
The task of correcting, contextualizing, and aligning data from different sources can be
daunting: it involves combining partial data sets, harmonizing terms and identifiers,
and ensuring data alignment (Buytaert-Hoefen, 2019). Manual data processing by human
operators introduces significant data integrity risks and is prone to errors (Buytaert-Hoefen,
integrate and map terms from different systems (Shafiei et al., 2015). Open-source initiatives
knowledge graph tools have expanded the availability of such solutions (Shafiei et al., 2015).
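The harmonization of terms and identifiers described above is often done by hand in spreadsheets. The minimal Python sketch below shows one way it could be automated, assuming two hypothetical sources (a LIMS export and a legacy spreadsheet) that label the same test differently and format the same batch identifier inconsistently. The synonym table `TERM_MAP`, the field names, and the batch-ID normalization rule are illustrative assumptions; in practice, mappings would more likely come from a governed controlled vocabulary or ontology than from a hard-coded dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table: each source-system label maps to one controlled term.
TERM_MAP = {
    "diss_%": "dissolution_percent",
    "Dissolution (%)": "dissolution_percent",
    "assay_pct": "assay_percent",
    "Assay %": "assay_percent",
}


def normalize_batch_id(raw: str) -> str:
    """Assumed rule: trim whitespace, drop hyphens, upper-case ('b-123 ' -> 'B123')."""
    return raw.strip().replace("-", "").upper()


def harmonize(rows: List[dict], system: str) -> Dict[str, dict]:
    """Map each row's label to the controlled term and key results by normalized batch ID."""
    out: Dict[str, dict] = {}
    for row in rows:
        term = TERM_MAP.get(row["test"])
        if term is None:
            # Surface unmapped labels instead of guessing a mapping.
            raise ValueError(f"{system}: unmapped term {row['test']!r}")
        out.setdefault(normalize_batch_id(row["batch"]), {})[term] = row["value"]
    return out


def merge_sources(a: Dict[str, dict], b: Dict[str, dict]) -> Tuple[Dict[str, dict], List[str]]:
    """Merge harmonized results; report conflicting values instead of overwriting them."""
    merged = {batch: dict(results) for batch, results in a.items()}
    conflicts: List[str] = []
    for batch, results in b.items():
        target = merged.setdefault(batch, {})
        for term, value in results.items():
            if term in target and target[term] != value:
                conflicts.append(f"{batch}/{term}: {target[term]} vs {value}")
            else:
                target[term] = value
    return merged, conflicts


lims_rows = [{"batch": "b-123", "test": "diss_%", "value": 92.0}]
sheet_rows = [
    {"batch": "B123 ", "test": "Dissolution (%)", "value": 92.0},
    {"batch": "B123", "test": "Assay %", "value": 99.1},
]
merged, conflicts = merge_sources(harmonize(lims_rows, "LIMS"), harmonize(sheet_rows, "spreadsheet"))
print(merged)     # {'B123': {'dissolution_percent': 92.0, 'assay_percent': 99.1}}
print(conflicts)  # []
```

Raising an error on unmapped terms and returning conflicts, rather than silently picking one value, keeps discrepancies visible for review, which is precisely the kind of error that manual spreadsheet merging tends to hide.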
While the primary focus of systems integration and automated data contextualization is on
hardware and operational tools, there are challenges in capturing all observations made during dynamic wet-lab
work (Weiss, 2022). Researchers often document data manually in handwritten notes,
which may be crucial for experiment qualification but may not be included in the electronic
information chain of custody (Weiss, 2022). This introduces risks of data loss, transcription
To address this issue, some companies have developed scientifically intelligent digital voice
assistants that enhance the lab workflow and data capture process (Weiss, 2022). These
assistants provide voice instructions to guide users through complex protocols, import data
directly from lab equipment, and capture important observations and notes by transcribing voice
dictation (Weiss, 2022). These digital tools can operate independently or integrate with other
Achieving the goals of automation and analytics in the context of Industry 4.0 necessitates the
improvement of data governance practices and system integration (Alosert et al., 2022). The
FDA has adopted the ALCOA+ framework to establish its expectations regarding data integrity,
which helps industry professionals comply with 21 CFR Part 11 (Weiss, 2022). According to
ALCOA, data must be attributable, legible, contemporaneous, original, and accurate (Chubb,
2021). Building on these principles, ALCOA+ further specifies that data must also be complete, consistent, enduring, and available.
In the realm of scientific data management and stewardship, the F.A.I.R Principles, established
in 2014, advocate for similar concepts to address broader challenges related to data integration
and system automation (Chubb, 2021). The F.A.I.R Principles emphasize the findability,
accessibility, interoperability, and reusability of data (IDBS, n.d.). They promote the adoption
contextualizable (Chubb, 2021). While more of a design principle than a standard, the F.A.I.R
Principles have gained popularity as a valuable tool for data management and stewardship
sources available, the pharmaceutical industry can leverage a wide range of data to drive
innovation. This access to diverse data sources can contribute to the discovery of new
insights and the development of novel approaches (Holub et al., 2017; Roe, 2021).
2. Reduced time frames in drug discovery: The ready accessibility of data enabled by
FAIR implementation can expedite the drug discovery process. Researchers can easily
access and utilize relevant data, leading to more efficient and timely decision-making
3. Elimination of data silos: FAIR implementation promotes the elimination of data silos
within organizations. This fosters internal and external collaboration, as data becomes
more discoverable, accessible, and reusable across different teams and stakeholders.
Breaking down data silos encourages collaboration and knowledge sharing, leading to
enhanced productivity and innovation (Van Vlijmen et al., 2020; Wise et al., 2018).
intelligence (AI) and other cutting-edge technologies. The availability of FAIR data
allows for more effective utilization of these sophisticated analytical methods, enabling
researchers to gain deeper insights and make more informed decisions (Fleming, 2018;
F.A.I.R. and ALCOA+ principles complement each other by focusing on different aspects of
data management. F.A.I.R. emphasizes the importance of metadata in enhancing the reliability
of electronic data capture, while ALCOA+ addresses data integrity challenges to improve the
trustworthiness of the data output (Weiss, 2022). By embracing both sets of principles,
organizations can work towards creating a more comprehensive and reliable data management
It should be highlighted that the alignment of research and development (R&D) datasets with
FAIR principles does require a significant investment of time, effort, and resources (Alharbi et
al., 2021). It is a complex process that involves various steps, including data management,
According to Alharbi et al. (2023), to align R&D datasets with FAIR principles, collaboration
researchers, data scientists, IT professionals, data managers, and other relevant stakeholders
within the organization (Alharbi et al., 2023). It may also extend to external collaborations with
academic institutions, regulatory bodies, and other industry partners (Alharbi et al., 2023).
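Part of this alignment effort is making sure that each dataset carries enough metadata to support the four FAIR facets. As a minimal sketch, the Python code below checks a dataset's metadata record against a checklist. The mapping of facets to fields (`FAIR_REQUIREMENTS`), the `fair_gaps` function, and the example record are hypothetical assumptions for illustration; the FAIR Principles define goals rather than a fixed list of metadata fields.

```python
from typing import Dict, List

# Hypothetical checklist: metadata fields assumed to support each FAIR facet.
FAIR_REQUIREMENTS: Dict[str, List[str]] = {
    "Findable": ["identifier", "title", "keywords"],
    "Accessible": ["access_url", "access_conditions"],
    "Interoperable": ["file_format", "vocabulary"],
    "Reusable": ["license", "provenance"],
}


def fair_gaps(metadata: dict) -> Dict[str, List[str]]:
    """Return, per FAIR facet, the supporting fields that are missing or empty."""
    gaps: Dict[str, List[str]] = {}
    for facet, fields in FAIR_REQUIREMENTS.items():
        missing = [field for field in fields if not metadata.get(field)]
        if missing:
            gaps[facet] = missing
    return gaps


assay_dataset = {
    "identifier": "DS-0001",  # hypothetical internal identifier
    "title": "Dissolution results, batch B123",
    "keywords": ["dissolution", "quality control"],
    "access_url": "https://data.example.com/datasets/DS-0001",  # placeholder URL
    "access_conditions": "internal, role-based access",
    "file_format": "CSV",
    "vocabulary": "",  # no controlled vocabulary declared yet
    "provenance": "exported from LIMS",
}
print(fair_gaps(assay_dataset))  # {'Interoperable': ['vocabulary'], 'Reusable': ['license']}
```

A checklist like this only verifies that metadata fields are present; whether the identifier is persistent or the vocabulary is appropriate still requires review by data stewards.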
The pharmaceutical industry is currently facing significant concerns regarding data integrity,
as evidenced by recent FDA observations and warning letters (Eglovitch, 2022). To address the
Under section 501(a)(2)(B) of the FD&C Act, a drug is deemed adulterated if current good
manufacturing practice (cGMP) standards are not followed during its production,
strength, quality, and purity (Code of Federal Regulations, 2022). Studies indicate that
while 62% of drug shortages in the United States between 2013 and 2017 were attributed to
with data integrity requirements can result in unvalidated results, leading to potential risks such
as post-marketing issues and product recalls (Charoo et al., 2023). In this context, the
commitment to manufacturing drugs that are safe, effective, and compliant with quality
standards (Alosert et al., 2022; Rattan, 2018). Data integrity also serves as a critical tool for
Violations of data integrity in relation to current Good Manufacturing Practice (cGMP) have
prompted regulatory actions such as warning letters, import warnings, and consent decrees
(Charoo et al., 2023). The Code of Federal Regulations (CFR) outlines the essential criteria of
cGMP data integrity in 21 CFR 211 and 212 (Figure 4), concerning records, data storage,
standards of safety, identity, strength, quality, and purity (Code of Federal Regulations, 2022).
Figure 4 - Data integrity requirements in 21 CFR 211 and 212. (Charoo et al., 2023)
Manufacturing Practices), it is crucial to analyze common issues related to data integrity. The
In the highly regulated pharmaceutical industry, companies are required to operate with great
care and rigor to meet the stringent standards set by regulatory authorities (Neumeyer, 2020).
However, some practices can have significant compliance implications (Neumeyer, 2020). The
nine common issues highlighted below are intended to raise awareness within the pharmaceutical
sector of the importance of avoiding these pitfalls to ensure
Data retention in the pharmaceutical industry refers to the practice of retaining and preserving
various types of data and records related to drug manufacturing, testing, quality control, and
(Alosert et al., 2022). The specific duration for which data must be retained can vary depending
on the type of data and the applicable regulations, such as those set forth by the U.S. Food and
Drug Administration (FDA) in the United States and similar agencies in other countries
(Neumeyer, 2020).
The deliberate destruction of production, control, distribution, and quality records of the
and Cox (2019), is a serious violation of data integrity and regulatory requirements. Such
actions undermine the accuracy and reliability of data, raising concerns about the integrity of
manufacturing processes and the quality of pharmaceutical products (Cox, 2019). This not only
exposes companies to regulatory sanctions but also poses potential harm to patients relying on
To prevent breaches of data integrity and ensure the quality and safety of pharmaceutical
products, proper data governance, management oversight, and adherence to current Good
(2023). Between 2005 and 2017, 23% of the warning letters issued by regulatory bodies cited the
deletion or destruction of original cGMP records, which are crucial for demonstrating
compliance (Charoo et al., 2023).
Section 211.180 of the Code of Federal Regulations (21 CFR 211.180) specifies that production,
control, or distribution records required for cGMP compliance must be retained for a specified
period (Code of Federal Regulations, 2022). Specifically, these records should be kept for at least
one year after the expiration date of the batch or, for certain over-the-counter (OTC) drug
products that lack an expiration date, three years after distribution of the batch (Code of
Federal Regulations, 2022). Critical records, such
as batch documents, marketing authorization application data and traceability data, may require
long-term retention with appropriate archival systems to ensure data integrity during extended
storage (MHRA, 2015). It is essential to retain data in its original form or true copies, such as
photocopies, microfilm, or microfiche, or any other form that can accurately replicate the
original records to ensure data security (MHRA, 2015). Once the retention period has expired,
the documents should be disposed of following the prescribed procedure, and a record or
retired records in compliance with Good Manufacturing Practice (GMP) requirements (PIC,
2021).
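As a rough illustration of the retention rule in 21 CFR 211.180 described above, the Python sketch below computes the earliest date on which a batch's records could be considered for disposal. It is a simplified, hypothetical helper (the function names are illustrative and do not come from any system cited here); it ignores the other retention provisions of 211.180 and any longer periods set by company policy or other regulators.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February falls back to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def minimum_retention_end(expiration_date: Optional[date],
                          distribution_date: Optional[date],
                          otc_without_expiry: bool = False) -> date:
    """Earliest date on which a batch's cGMP records could be considered for disposal.

    Simplified reading of 21 CFR 211.180(a): one year after the batch expiration
    date or, for certain OTC products without expiration dating, three years after
    distribution of the batch. Longer company or regional periods may still apply.
    """
    if otc_without_expiry:
        if distribution_date is None:
            raise ValueError("distribution date required for OTC products without expiry")
        return add_years(distribution_date, 3)
    if expiration_date is None:
        raise ValueError("expiration date required")
    return add_years(expiration_date, 1)


# Usage with illustrative dates
print(minimum_retention_end(date(2025, 6, 30), None))  # 2026-06-30
print(minimum_retention_end(None, date(2024, 2, 29), otc_without_expiry=True))  # 2027-02-28
```

The leap-day fallback is a design choice of this sketch; a site procedure might instead round forward to 1 March.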
audit trail for GMP-relevant changes and deletions (EudraLex, 2011). GMP-relevant data
should not be altered or deleted without proper justification, and audit trails should be easily
changes may include adjustments to sample weight, sequence aborts, batch number changes for
procedures, disabled peak detection, inadequate investigation of unknown peaks and Out-Of-
Specification (OOS) results is a serious concern in terms of data integrity and regulatory
compliance (Charoo et al., 2023). These actions undermine the reliability and accuracy of
laboratory data, which is critical for ensuring the quality and safety of pharmaceutical products
In response to such issues, the FDA has released a draft guidance titled "Submission of Quality
Metrics Data" to develop compliance and inspection policies and practices (FDA, 2016). This
guidance aims to utilize quality metrics, including the Invalidated Out-of-Specification Rate
(IOOSR), for purposes such as scheduling drug manufacturer inspections based
on risk assessment, forecasting and mitigating drug shortages, and promoting the
(FDA, 2016).
The IOOSR is a measure defined by the FDA as the ratio of invalidated out-of-specification
(OOS) test results for lot release and long-term stability testing, caused by measurement process
aberration, to the total number of lot release and long-term stability OOS test results within a
specified reporting timeframe (FDA, 2016). This metric provides valuable insights into the
reliability and accuracy of testing processes and can help identify potential issues with data
By implementing quality metrics like the IOOSR, regulatory authorities can gain better
visibility into the integrity of laboratory data, identify areas of concern, and take appropriate
regulatory actions to ensure compliance and the production of safe and effective pharmaceutical
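The IOOSR definition above reduces to a simple ratio: invalidated OOS results for lot release and long-term stability testing, divided by all OOS results for those tests within the reporting period. The following Python sketch computes it. The `OOSResult` record and its field names are hypothetical, and deciding whether an OOS result is truly attributable to a measurement-process aberration remains the outcome of an investigation, not something the code determines.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Test categories covered by the IOOSR definition (FDA, 2016).
IOOSR_TEST_TYPES = {"lot_release", "long_term_stability"}


@dataclass(frozen=True)
class OOSResult:
    lot_id: str
    test_type: str                   # e.g. "lot_release", "long_term_stability", "in_process"
    invalidated_for_lab_error: bool  # True if traced to a measurement-process aberration


def ioosr(results: Iterable[OOSResult]) -> Optional[float]:
    """Invalidated OOS rate for one reporting period; None if no in-scope OOS results exist."""
    in_scope = [r for r in results if r.test_type in IOOSR_TEST_TYPES]
    if not in_scope:
        return None
    invalidated = sum(r.invalidated_for_lab_error for r in in_scope)
    return invalidated / len(in_scope)


results = [
    OOSResult("LOT-A", "lot_release", True),
    OOSResult("LOT-B", "lot_release", False),
    OOSResult("LOT-C", "long_term_stability", False),
    OOSResult("LOT-D", "long_term_stability", False),
    OOSResult("LOT-E", "in_process", True),  # outside the IOOSR scope, ignored
]
print(ioosr(results))  # 0.25
```

A persistently high value would suggest that laboratory errors are being used to invalidate OOS results, which is the kind of signal regulators can use for risk-based inspection scheduling.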
Metadata refers to structured data that provides information about the context and management
of the data (DAMA International, 2009). An audit trail, by contrast, is a secure, electronically
generated record that contains a time-stamped sequence of events related to the creation,
According to Charoo et al. (2023), both metadata and audit trails play crucial roles in
For example, in 2015, the FDA issued a warning letter to Zhejiang Hisun Pharmaceutical
Company citing significant deviations from current Good Manufacturing Practice (cGMP)
regulations, including the deletion of failing data and the repetition of tests until passing
results were obtained. Moreover, the letter indicated that supporting raw data was deleted,
metadata was not archived, and audit trails were unavailable (FDA warning letter, 2015).
To accurately reconstruct cGMP activities, both the data and its associated metadata must be
retained throughout the specified retention period, preserving their relationships in a secure and
traceable manner (Charoo et al., 2023). Examples of metadata components for a specific set of
data may include the date/time stamp, user ID of the person conducting the test or analysis,
instrument ID used for data acquisition, material status data, material identification number,
On the other hand, audit trails enable the reconstruction of activities performed on electronic
records (Charoo et al., 2023). For instance, an audit trail for a High-Performance Liquid
Chromatography (HPLC) run may include information such as the username, date/time of the
run, integration parameters used, details of any reprocessing performed, and documentation
justifying the need for reprocessing (Charoo et al., 2023). Audit trails are essential for
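To make the relationship between a result, its metadata, and its audit trail more concrete, the sketch below models an HPLC run using the fields mentioned above (user ID, date/time, instrument ID, material identification, integration parameters, and reprocessing justification). It is a hypothetical in-memory illustration, not a validated chromatography data system: audit entries can only be appended, and a reprocessing event without a documented justification is rejected. All IDs in the usage example are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class AuditEntry:
    timestamp: datetime
    user_id: str
    action: str         # e.g. "integrate", "reprocess"
    details: str        # e.g. integration parameters used
    justification: str  # reason for the change; mandatory for reprocessing


@dataclass
class ChromatographyRun:
    # Metadata kept with the result so that the run can be reconstructed later.
    run_id: str
    instrument_id: str
    material_id: str
    acquired_by: str
    acquired_at: datetime
    _trail: List[AuditEntry] = field(default_factory=list, repr=False)

    def log(self, user_id: str, action: str, details: str, justification: str = "") -> None:
        """Append an event to the audit trail; existing entries are never edited or removed."""
        if action == "reprocess" and not justification.strip():
            raise ValueError("reprocessing requires a documented justification")
        self._trail.append(AuditEntry(datetime.now(timezone.utc), user_id, action,
                                      details, justification))

    @property
    def audit_trail(self) -> Tuple[AuditEntry, ...]:
        # Read-only copy, so callers cannot alter or delete past entries.
        return tuple(self._trail)


run = ChromatographyRun("RUN-001", "HPLC-07", "MAT-123", "analyst1",
                        datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc))
run.log("analyst1", "integrate", "peak width=0.2 min; threshold=50")
run.log("analyst2", "reprocess", "threshold=40",
        justification="baseline noise, see deviation DEV-0042")
for entry in run.audit_trail:
    print(entry.timestamp.isoformat(), entry.user_id, entry.action, entry.details)
```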
In line with FDA recommendations (2018), a backup, that is, a true copy of the original data
including its metadata, should be securely maintained throughout the entire records retention
period. Metadata is treated as an integral part of the backup data (FDA, 2018). Electronic data generated
to meet cGMP standards should also include relevant metadata, which must be evaluated as
part of the batch release criteria (Charoo et al., 2023). This ensures that both the original data
and its accompanying metadata are preserved and assessed for compliance with cGMP
To mitigate the risk of inadvertent data loss or deletion, companies can gather data on a
server, archive it overnight, and restrict the access rights of laboratory staff (Charoo et al.,
2023). Cloud-based storage applications
that are commercially available and compliant with regulatory requirements can be utilized for
long-term data retention in a cost-effective manner (Charoo et al., 2023). These cloud-based
storage solutions typically have secure protocols in place to control data entering and leaving
the cloud, ensuring data security and integrity (Charoo et al., 2023). However, it is essential to
ensure that the chosen cloud-based storage solution meets all relevant regulatory requirements
for data retention, security, and privacy (Charoo et al., 2023). Appropriate measures should be
taken to safeguard against unauthorized access or data breaches (Charoo et al., 2023). Regular
reviews and audits of the cloud-based storage system's security measures should also be
conducted to ensure ongoing compliance with regulatory requirements (Charoo et al., 2023).
In addition, the European Health Data Space constitutes a foundational element of a strong
European Health Union and is the first common EU data space dedicated to a specific domain
within the broader European data strategy (European Health Data Space, 2023). It extends the
regulatory framework established by the General Data Protection Regulation (GDPR), the
proposed Data Governance Act, the draft Data Act, and the Network and Information Systems
(NIS) Directive.
The French Digital Healthcare Agency (ANS) sets a rigorous framework for practices related
Certification, n.d.). After initial approval in 2016, OVHcloud received HDS certification in
2019, so that all of its healthcare sector customers could benefit from this guarantee (HDS
Certification, n.d.).
Performing pre-injections before the official sample sequence, a practice also known as "testing
into compliance," is not supported by credible scientific evidence and is considered a violation
According to the FDA (2018), companies should use scientific principles to determine the
number of retests that can be performed, and this number should be predetermined in the
Standard Operating Procedures (SOP) and not changed based on the results obtained during
analysis. The SOP should also specify the time frame within which repeat testing should not be
performed, and any deviations from this procedure should be recorded and justified in
Furthermore, the FDA (2018) recommends that if additional testing is deemed necessary, a
protocol should be created and approved by the facility's quality unit. This protocol should
outline the additional analytical testing to be conducted and provide details on the scientific
and/or technical handling of the data (FDA, 2018). It is crucial to follow proper procedures and
document any deviations from established protocols or procedures to ensure compliance with
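The FDA expectation that the number of retests be fixed in the SOP in advance can be expressed as a simple gate. In the hypothetical sketch below, the limit lives in an immutable policy object so it cannot be changed after results are seen; retests within the limit are allowed, and anything beyond it requires the identifier of a protocol approved by the quality unit. The SOP ID, the limit of two retests, and the protocol ID are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the limit cannot be altered once testing has started
class RetestPolicy:
    sop_id: str
    max_retests: int


def retest_allowed(policy: RetestPolicy, retests_done: int,
                   approved_protocol_id: str = "") -> bool:
    """True if another retest may be run for the current sample.

    Within the SOP limit a retest is allowed; beyond it, additional testing
    requires a protocol approved by the quality unit (its ID must be supplied).
    """
    if retests_done < policy.max_retests:
        return True
    return bool(approved_protocol_id.strip())


policy = RetestPolicy(sop_id="SOP-QC-012", max_retests=2)  # hypothetical SOP and limit
print(retest_allowed(policy, 1))                  # True
print(retest_allowed(policy, 2))                  # False: record and justify a deviation
print(retest_allowed(policy, 2, "PROT-2024-07"))  # True: approved additional-testing protocol
```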
4.1.4.5 Contemporaneous
The contemporaneous requirement states that data should be recorded as it occurs (EMEA,
2010). To meet this requirement, analysts should be trained to document each step at the time
it is completed, and procedures should be in place to ensure adherence to this practice (Charoo
et al., 2023).
Neumeyer's study (2020) revealed that logbooks and batch records were signed by employees
much later than their execution date, which directly conflicts with the "contemporaneous"
software platforms that can record activity details in real-time and in a secure audit trail that
Another violation of the contemporaneous requirement was observed during the tablet
manufacturing process, where tablet samples taken at different time intervals were tested in an
area outside the tableting room, but no checks were made to ensure the accuracy and traceability
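of the results (Neumeyer, 2020). This violates section 4.8 of Chapter 4 of the EU GMP Guide,
which states that records should be made or completed at the time each action is taken, and in
such a way that all significant activities concerning the manufacture of medicinal products are
traceable (Neumeyer, 2020).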
The use of electronic laboratory notebook (ELN) systems can contribute to improving
contemporaneous documentation practices (Charoo et al., 2023). ELN systems allow for real-
time recording of data as activities are conducted, ensuring compliance with the
contemporaneous requirement (Charoo et al., 2023). Furthermore, any changes or edits made
to the original data are recorded with a date, time, and signature stamp, enhancing transparency
and accountability in the documentation process (Charoo et al., 2023). By utilizing ELN
systems, discrepancies can be minimized, and data can be accurately recorded and maintained
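Where records are still signed on paper or in hybrid systems, a periodic review can look for the pattern Neumeyer (2020) describes, in which entries are signed well after the activity was performed. The sketch below flags such entries. The `LogEntry` structure and the 15-minute default tolerance are assumptions for illustration; acceptable recording delays would be defined in each site's own procedures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class LogEntry:
    activity: str
    performed_at: datetime  # when the step was executed
    recorded_at: datetime   # when the entry was written or signed


def late_entries(entries: List[LogEntry],
                 tolerance: timedelta = timedelta(minutes=15)) -> List[LogEntry]:
    """Entries recorded more than `tolerance` after the activity they describe."""
    return [e for e in entries if e.recorded_at - e.performed_at > tolerance]


t0 = datetime(2024, 3, 1, 9, 0)
log = [
    LogEntry("weigh API", t0, t0 + timedelta(minutes=5)),
    LogEntry("sign batch record step 12", t0, t0 + timedelta(days=2)),
]
for e in late_entries(log):
    print(e.activity, "recorded", e.recorded_at - e.performed_at, "after execution")
```

An ELN that timestamps entries at the moment of recording makes this check almost trivial, which is one reason such systems support the contemporaneous principle.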
Aborted chromatography sample set runs with deleted data are not in compliance with cGMP
requirements (FDA, 2018). While occasional aborted runs may occur, it is important that the
data from these runs is not deleted, and investigations should be conducted to determine the
cause of the abort (Charoo et al., 2023). All data generated, including aborted runs, should be
Processes should be designed to ensure that data cannot be modified without proper record-
keeping of the modification, including aborted injections (FDA, 2018). This prevents runs from
ensuring data integrity and compliance with regulatory requirements (FDA, 2018). By retaining
all data, including aborted runs, and conducting appropriate investigations, companies can
demonstrate their commitment to data integrity and regulatory compliance (FDA, 2018).
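The principle that aborted runs are retained and investigated rather than deleted can be sketched as an append-only repository. In this hypothetical example, deletion is refused outright, an existing run ID cannot be overwritten, and every aborted run is listed for investigation. In practice such controls would be enforced by the validated data system itself; the class below only illustrates the logic, and its names are invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SampleSetRun:
    run_id: str
    status: str  # "completed" or "aborted"


class RunRepository:
    """Append-only store: runs can be added and read, never deleted or overwritten."""

    def __init__(self) -> None:
        self._runs: Dict[str, SampleSetRun] = {}

    def add(self, run: SampleSetRun) -> None:
        if run.run_id in self._runs:
            raise ValueError(f"run {run.run_id} already recorded; changes need a new audited entry")
        self._runs[run.run_id] = run

    def delete(self, run_id: str) -> None:
        # Deletion is never permitted, whatever the run's status.
        raise PermissionError("cGMP data, including aborted runs, must be retained")

    def runs_needing_investigation(self) -> List[str]:
        # Every aborted run stays on record and is queued for a documented investigation.
        return [r.run_id for r in self._runs.values() if r.status == "aborted"]


repo = RunRepository()
repo.add(SampleSetRun("SS-101", "completed"))
repo.add(SampleSetRun("SS-102", "aborted"))
print(repo.runs_needing_investigation())  # ['SS-102']
```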
The company's inability to locate the raw data for a standard curve and its use of loose sheets
of paper to record analytical assay data are clear violations of cGMP standards (FDA, 2018).
To ensure good document control, it is recommended to use bound paginated notebooks that
are stamped for official use by a document control group (Neumeyer, 2020). These notebooks
allow for easy detection of unofficial notebooks and help identify any gaps in notebook pages,
If blank forms are used for data recording, they should be controlled by the Quality Assurance
(QA) department and reconciled upon completion to ensure that no pages are missing or
tampered with (Code of Federal Regulations, 2022). Recording data on paper and later
transcribing it into a permanent laboratory notebook violates cGMP standards, as data should
be recorded at the time of performance and saved in a way that accurately replicates the original
The use of unofficial and uncontrolled electronic spreadsheets for storing initial data is also a
violation of cGMP standards, as stated in the FDA warning letter issued to FACTA Farmaceutici
S.p.A. (FDA warning letter, 2017). It is crucial for data to be properly recorded, stored, and accessible to
analysts and operators for retrieval and review during inspections or important quality decisions
Proper training of analysts and operators is essential to ensure compliance with cGMP
requirements and enable them to retrieve all data related to cGMP processes when needed (Code
of Federal Regulations, 2022). This includes training on data recording practices, document
control procedures, and the importance of accurate and accessible data for regulatory
The FDA found a lack of appropriate controls over computer systems and access
privileges at BBC Group Limited (FDA warning letter, 2021). Failure to establish proper
controls and access rules for computer systems is a violation of regulatory requirements,
To address these issues and ensure compliance with cGMP guidelines, companies should
implement robust controls and access rules for their computer systems (Charoo et al., 2023).
The FDA (2018) recommends assigning the role of system administrator to personnel
who are not responsible for the content of the records. This separation of duties helps prevent
unauthorized modifications to records and ensures the integrity and security of the data (FDA,
2018).
Maintaining a record of authorized personnel with access privileges to each cGMP computer
system is crucial (Charoo et al., 2023). This can be accomplished through a list of authorized
individuals, clearly documenting who has access to the system (Code of Federal Regulations,
controlled records, including cGMP records and inputting laboratory data into computerized
Standard Operating Procedures (SOPs) should define the roles and responsibilities of the system
administrator, as well as the access privileges for each cGMP computer system in use (Charoo
et al., 2023). It is important to assign the system administrator role to personnel who are
independent from those responsible for the record content, such as laboratory personnel
(Charoo et al., 2023). SOPs should outline the specific access privileges and responsibilities of
the system administrator and authorized personnel to ensure appropriate controls are in place
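The separation of duties recommended by the FDA (2018) can be checked mechanically against the documented list of authorized users for a cGMP system. The sketch below assumes hypothetical role names and account IDs, and flags any account that combines the system administrator role with a role that creates, reviews, or approves record content.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SystemUser:
    user_id: str
    roles: FrozenSet[str]


# Hypothetical role names: roles that create, review, or approve record content.
CONTENT_ROLES = frozenset({"analyst", "reviewer", "approver"})
ADMIN_ROLE = "system_admin"


def segregation_violations(authorized_users: List[SystemUser]) -> List[str]:
    """IDs of users who hold the administrator role together with a record-content role."""
    return [u.user_id for u in authorized_users
            if ADMIN_ROLE in u.roles and u.roles & CONTENT_ROLES]


users = [
    SystemUser("it_admin01", frozenset({"system_admin"})),
    SystemUser("analyst07", frozenset({"analyst"})),
    SystemUser("lab_lead02", frozenset({"system_admin", "approver"})),
]
print(segregation_violations(users))  # ['lab_lead02']
```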
4.1.4.9 Data integrity challenges with contract development and manufacturing organizations
In the pharmaceutical industry, legacy supply agreements often lack the necessary transparency
and data integrity requirements for regulatory compliance and adherence to Good
Manufacturing Practices (GMP) (Charoo et al., 2023). Even well-prepared agreements may not
fully address the challenges arising from the absence of a digital data management system in
many Contract Development and Manufacturing Organizations (CDMOs) (Charoo et al., 2023).
The reliance on spreadsheets for data sharing is common but susceptible to errors and
manipulation (Schell, 2019). Moreover, the lack of a digital data management system can lead
to delays in critical activities such as technology transfer, product scale-up, data submissions,
To ensure compliance with GMP guidelines, contract manufacturing facilities must establish
robust controls to guarantee the accuracy and integrity of data and test results (Friedman, 2012).
Manufacturers or sponsors should thoroughly review the data generated by the contracted
facility as part of their quality assurance process before releasing the product (Friedman, 2012).
Outsourcing companies should ensure that their contractors have comparable data governance
systems and include data integrity requirements in their contractor qualification program
(Friedman, 2012). Manufacturers should conduct audits of the internal practices of CDMOs to
ensure compliance with data integrity requirements (Friedman, 2012). The contractor's data
governance system should be periodically assessed, and the agreement should include
provisions for process control and data integrity measures such as computer system validation,
In addition to the issues discussed, there are supplementary strategies available to ensure data
To ensure data security and reliability, companies should implement data management policies,
and integrate them into their quality systems (Pérez, 2017). Organizational controls include
instructions for record completion, retention of records, staff training, authorization for data
generation/approval, and design of data governance systems, while technical controls include
Implementing both organizational and technical controls facilitates effective data management
To address data integrity issues, the following measures can be adopted (Shafiei et al., 2015):
To ensure data integrity, minimize the risk of data loss, and maintain compliance with
regulatory requirements, the following measures should be taken into consideration based on a
1. Validate Electronic Data Storage: All electronic data storage locations, including
integrity and reliability. This validation process should be based on a risk assessment
that considers the criticality of the data and the potential impact on product quality.
2. Define Roles and Responsibilities: Clearly define the roles and responsibilities for
responsible for conducting the validation, establishing procedures for data storage
validation, and defining criteria for assessing the integrity and reliability of the data
storage locations.
3. Enhance Storage Solutions: Implement measures to enhance the storage solutions and
protect data from unauthorized access, loss, or modification. This may involve using
secure storage systems, encryption techniques, access controls, and backup and
governance, considering the criticality of the data and the potential risks associated with
its modification or deletion. This involves identifying and assessing the risks to data
integrity, prioritizing resources and controls based on the level of risk, and
5. Regular Data Validation and Revalidation: Establish a process for regular data
revalidation activities when changes or upgrades are made to the storage systems.
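The risk-based validation and revalidation steps listed above could be organized with a simple scoring scheme. The following sketch is only one possible approach and is not taken from Shafiei et al. (2015): it assumes a hypothetical 1 to 3 scale for data criticality and product-quality impact, ranks storage locations by the product of the two scores, and flags a location for revalidation when its installed system version differs from the validated one. Location names and versions are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StorageLocation:
    name: str
    data_criticality: int   # 1 = low ... 3 = high (hypothetical scale)
    quality_impact: int     # 1 = low ... 3 = high (hypothetical scale)
    validated_version: str  # system version covered by the last validation
    current_version: str    # system version currently installed

    def risk_score(self) -> int:
        for value in (self.data_criticality, self.quality_impact):
            if value not in (1, 2, 3):
                raise ValueError("scores must be 1, 2 or 3")
        return self.data_criticality * self.quality_impact

    def needs_revalidation(self) -> bool:
        # A change or upgrade since the last validation triggers revalidation.
        return self.validated_version != self.current_version


def validation_plan(locations: List[StorageLocation]) -> List[str]:
    """Order storage locations so that the highest-risk ones are (re)validated first."""
    ranked = sorted(locations, key=lambda loc: loc.risk_score(), reverse=True)
    return [loc.name for loc in ranked]


locations = [
    StorageLocation("training-share", 1, 1, "3.0", "3.0"),
    StorageLocation("hplc-data-server", 3, 2, "5.0", "5.1"),
    StorageLocation("lims-database", 3, 3, "2.1", "2.1"),
]
print(validation_plan(locations))  # ['lims-database', 'hplc-data-server', 'training-share']
print([loc.name for loc in locations if loc.needs_revalidation()])  # ['hplc-data-server']
```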
To ensure strict control over data access and enhance data security, identify vulnerable points,
and mitigate the risks associated with unauthorized data access, the following measures should
1. Authorization and Role-Based Access Control: Define clear roles and permissions for
(RBAC) system where access to data is granted based on the specific tasks and
personnel have access to data and limits access to sensitive information on a need-to-
know basis.
outlines the criteria and procedures for granting and revoking access privileges. Access
should be granted based on a need-to-know basis, considering the specific job functions
access permissions are aligned with the current roles and responsibilities of personnel.
authentication mechanisms to verify the identity of individuals accessing the data. This
can include the use of strong passwords, multi-factor authentication, and periodic
password updates. Enforce password policies that require the use of complex passwords
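As a minimal sketch of the role-based, need-to-know access control described in the first item, the code below maps hypothetical roles to permissions and grants access only when one of a user's roles carries the requested permission. It also supports the periodic access review in the second item by listing roles that no longer match a person's current job. Role, permission, and user names are assumptions; authentication (passwords, multi-factor authentication) is outside the scope of this sketch.

```python
from typing import Dict, FrozenSet

# Hypothetical role-to-permission mapping; a real one would come from the access SOP.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "analyst": frozenset({"read:lab_data", "create:lab_data"}),
    "qa_reviewer": frozenset({"read:lab_data", "read:audit_trail", "approve:lab_data"}),
    "system_admin": frozenset({"manage:accounts", "manage:configuration"}),
}


def is_allowed(user_roles: FrozenSet[str], permission: str) -> bool:
    """Grant access only if one of the user's roles includes the requested permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, frozenset()) for role in user_roles)


def excess_roles(granted: Dict[str, FrozenSet[str]],
                 current_job_roles: Dict[str, FrozenSet[str]]) -> Dict[str, FrozenSet[str]]:
    """Periodic review: roles still granted that no longer match the person's current job."""
    review: Dict[str, FrozenSet[str]] = {}
    for user, roles in granted.items():
        extra = roles - current_job_roles.get(user, frozenset())
        if extra:
            review[user] = extra
    return review


print(is_allowed(frozenset({"analyst"}), "approve:lab_data"))      # False
print(is_allowed(frozenset({"qa_reviewer"}), "approve:lab_data"))  # True
print(is_allowed(frozenset({"system_admin"}), "read:lab_data"))    # False

granted = {"u1": frozenset({"analyst", "qa_reviewer"}), "u2": frozenset({"analyst"})}
current = {"u1": frozenset({"qa_reviewer"}), "u2": frozenset({"analyst"})}
print(excess_roles(granted, current))  # {'u1': frozenset({'analyst'})}
```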
Data backup is a critical aspect of data integrity and plays a crucial role in ensuring data safety
and reliability. Here are some important considerations related to data backups and archives to
1. Comprehensive Backup Strategy: The backup files should include all data from the
original record, along with its associated metadata. It is essential to ensure that the
backup process captures accurate and complete data to maintain data integrity. The
backups should be performed on a regular basis, ideally daily, to minimize the risk of
data loss.
2. Data Protection: Backup files must be protected from loss, erasure, or alteration.
and access controls, helps prevent unauthorized access and maintains the integrity of
the backup data. Regular validation of the backup process is essential to ensure the
the ability to retrieve backup information is crucial. It facilitates the verification of data
accuracy, completeness, and consistency. The backup data serves as a reference point
should be validated, secured, and controlled throughout the entire data lifecycle. This
ensures that the archived data remains intact and accessible over time. Adequate
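One way to check that a backup is a complete and accurate copy, as the first two items require, is to record a cryptographic checksum of each raw data file together with its metadata at backup time and to compare it again during a restore test. The sketch below does this with Python's standard `hashlib`, `shutil`, and `json` modules; the file layout and the manifest format are assumptions, not a prescribed approach.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 digest of a file, read in chunks so large raw data files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def back_up(data_file: Path, metadata: dict, backup_dir: Path) -> Path:
    """Copy a raw data file, verify the copy, and store its checksum with the metadata."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    original_hash = sha256_of(data_file)
    copied = backup_dir / data_file.name
    shutil.copy2(data_file, copied)  # copy2 also preserves file timestamps
    if sha256_of(copied) != original_hash:
        raise OSError("backup copy does not match the original")
    manifest = {"file": data_file.name, "sha256": original_hash, "metadata": metadata}
    manifest_path = backup_dir / (data_file.name + ".manifest.json")
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest_path


def verify_backup(manifest_path: Path) -> bool:
    """Restore check: the backed-up file must still match the checksum taken at backup time."""
    manifest = json.loads(manifest_path.read_text())
    return sha256_of(manifest_path.parent / manifest["file"]) == manifest["sha256"]
```

For example, backing up a hypothetical `run001.cdf` with `{"instrument_id": "HPLC-07"}` as metadata writes the copy plus a manifest, and a later `verify_backup` call on that manifest returns `False` if the backed-up file has since been altered.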
Training and educating staff members about data security policies and data integrity guidelines
are crucial for ensuring the successful implementation of data security processes. Here are some
1. Staff Training: Staff members should receive comprehensive training on data integrity
guidelines, data management practices, and the importance of data consistency and
reliability. This training should cover topics such as data security measures, proper data
handling procedures, access controls, and the consequences of data integrity lapses. By
equipping employees with the necessary knowledge and skills, organizations can
2. Accountability: Individuals responsible for data integrity lapses should be identified and
appropriately addressed. This may involve removing them from positions where they
could influence current Good Manufacturing Practice (cGMP) or drug application data
for their actions reinforces the importance of data integrity and sends a clear message
Leadership should promote and prioritize data integrity as a core value, setting an
Incentives, such as recognition or rewards, can be provided for periods of time without
any data integrity issues or for implementing effective data security measures. These
incentives can motivate employees and organizations to maintain high standards of data
The FDA (2018) emphasizes the importance of engaging a third-party auditor to conduct an
assessment enables the identification of any existing gaps or areas for improvement,
Furthermore, the implementation of mechanisms like anonymous reporting systems can foster
an environment where employees feel comfortable reporting suspected data integrity breaches
without fear of retaliation (FDA, 2018). These reporting systems encourage transparency and
facilitate the identification and resolution of data integrity concerns within the organization
Data governance plays a crucial role in overseeing data integrity practices and ensuring
adherence to established policies and procedures (FDA, 2018; Shafiei et al., 2015). By
establishing robust data governance frameworks, organizations can effectively monitor and
regulate data integrity throughout its lifecycle (FDA, 2018; Shafiei et al., 2015).
Improving quality oversight is another essential aspect of addressing data integrity issues. This
can involve implementing enhanced documentation practices, increasing staff training on data
integrity guidelines and best practices, and refining existing processes to better align with data
The integration of technology and automation in existing systems has proven to be instrumental
in enhancing compliance with data integrity regulations, as highlighted by Shafiei et al. (2015).
Automating data collection processes has been shown to minimize incorrect entries, ensure the
completion and verification of all fields in patient information, and achieve a comprehensive
In 2003, the FDA introduced the concepts of quality by design (QbD) and process analytical
technology (PAT) with the aim of incorporating quality into the product manufacturing process
from its inception (FDA, 2004). PAT tools enable real-time measurement of critical process
parameters, facilitating timely product release for commercial distribution (EMEA, 2010).
Apart from providing competitive advantages such as low rejection rates, cost-effective
analysis, and high yield, these tools also possess robust data integrity features (Spivey, 2022).
A recent proposal by Floryanzia et al. (2022) introduces the Computer Vision for Disintegration
(CVD) system, which can be employed alongside traditional tablet disintegration testers to
monitor tablet pieces and differentiate them from the surrounding liquid. By utilizing machine
learning models to analyze and interpret data captured by cameras, the CVD system offers high
efficiency and improves our understanding of the disintegration mechanism (Floryanzia et al.,
To enhance data security and integrity, it is essential to upgrade computer systems and software,
minimizing potential security risks (Shafiei et al., 2015). Regular software updates should be
ensured, and robust access controls and permissions should be implemented (Shafiei et al.,
2015). Utilizing advanced technologies like data encryption, firewalls, and intrusion detection
systems can provide an additional layer of protection against data breaches (Shafiei et al., 2015).
As the landscape of drug development evolves with the introduction of newer modalities,
questions arise regarding the suitability of legacy informatics and hardware systems for small-
The adoption of these newer technologies is driven by their ability to embrace a holistic
integrated approach to BioPharma Lifecycle Management (Weiss, 2022). This approach entails
integrations, contextualized data stores built on FAIR principles, and integrated analytics to
drive business intelligence through a unified digital platform (Weiss, 2022). The successful
adoption of these technologies will depend on their ease of implementation, immediate business
4.2 Review the existing data governance models and their applicability in the
pharmaceutical context.
Data governance plays a pivotal role in the pharmaceutical industry, which generates vast
amounts of data throughout the drug development lifecycle, clinical trials, and post-market
surveillance (Khin et al., 2020; Roe, 2021; Truong et al., 2017; Weiss, 2022). Effective data
governance models are essential for managing this data, ensuring its quality, and adhering to
regulatory requirements (FDA, 2016). There are currently three prominent models: centralized,
federated, and hybrid (Abu-Elkheir et al., 2013; Al-Ruithe et al., 2019; Dehghani, 2022; Ladley,
The centralized data governance model establishes a single, centralized authority responsible
for governing and managing data across the entire pharmaceutical organization (Weber et al.,
2009). Under this model, a central team defines data standards, policies, and procedures,
ensuring consistency and uniformity throughout the organization (Ladley, 2019). This approach
enables better control over data quality, integrity, and security (Ladley, 2019). However, it may
Based on the description above, it could be concluded that the centralized model suits
and centralized decision-making. It is particularly useful for managing sensitive data, such as
patient records, intellectual property, and clinical trial data. Moreover, this model aids in
The federated data governance model distributes data governance responsibilities across
(Abu-Elkheir et al., 2013; Nadal et al., 2023). This model promotes localized decision-making,
allowing each unit to tailor data governance practices according to their specific needs (Abu-
Elkheir et al., 2013; Dehghani, 2022; Nadal et al., 2023). The federated approach encourages
al., 2013; Nadal et al., 2023). However, ensuring consistency and coordination across diverse
entities can be challenging, potentially leading to data silos and inconsistent data practices
Based on the description above, it could be concluded that the federated model is well-suited
for large pharmaceutical organizations with diverse business units or subsidiaries operating in
and fostering collaboration among stakeholders. This model supports agility and responsiveness
to local market dynamics, ensuring compliance while catering to varying data needs across
different entities.
The hybrid data governance model combines elements of both centralized and federated models
(Al-Ruithe et al., 2019). It seeks to strike a balance between centralized control and local
autonomy (Al-Ruithe et al., 2019). In this model, core data governance principles and standards
are established centrally, ensuring consistency and adherence to global policies (Al-Ruithe et
al., 2019). Simultaneously, local units have the flexibility to adapt and extend these standards
to cater to their unique requirements (Al-Ruithe et al., 2019). The hybrid model allows for
efficient data management while accommodating local nuances (Al-Ruithe et al., 2019).
Based on the description above, it could be concluded that the hybrid model is beneficial for
pharmaceutical organizations that seek a balance between centralized control and local
flexibility. It enables the establishment of core data governance principles while empowering
local units to adapt to their specific requirements. The hybrid model ensures consistency in
critical areas while accommodating customization and innovation at the local level.
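The practical difference between the three models can be illustrated by how a local unit's policies combine with central ones. In the hypothetical sketch below, the centralized model accepts only central policies, the federated model lets local policies take precedence (treating central ones as defaults, which is an assumption), and the hybrid model lets local units add policies but rejects any that contradict a core standard. Policy names and values are illustrative only.

```python
from typing import Dict

# Hypothetical core standards defined by a central data governance team.
CORE_POLICIES: Dict[str, str] = {
    "audit_trail": "always_on",
    "retention_rule": "21 CFR 211.180 minimum",
    "data_classification": "global_scheme_v1",
}


def effective_policies(model: str, core: Dict[str, str], local: Dict[str, str]) -> Dict[str, str]:
    """Combine central and local policies according to the chosen governance model."""
    if model == "centralized":
        if local:
            raise ValueError("centralized model: policies are set only by the central team")
        return dict(core)
    if model == "federated":
        # Local units decide; central policies act only as defaults (an assumption).
        return {**core, **local}
    if model == "hybrid":
        conflicts = sorted(k for k in local if k in core and local[k] != core[k])
        if conflicts:
            raise ValueError(f"local policies may not override core standards: {conflicts}")
        return {**core, **local}  # local units may only extend the core
    raise ValueError(f"unknown governance model: {model!r}")


site_rules = {"audit_trail": "optional", "file_naming": "site_A_convention"}
print(effective_policies("federated", CORE_POLICIES, site_rules)["audit_trail"])  # optional
try:
    effective_policies("hybrid", CORE_POLICIES, site_rules)
except ValueError as err:
    print(err)  # local policies may not override core standards: ['audit_trail']
```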
Beyond the centralized, federated, and hybrid data governance models, it is also vital to
consider the challenges of data governance in cloud computing environments. The
following section is based entirely on an article from Harvard Business School (Iansiti et
al., 2021) that explores the case of Moderna. This article highlights the utilization of cloud
computing and digital technologies by Moderna to enhance its operations and sheds light on
Iansiti et al. (2021) report that, according to Moderna's Chief Digital Officer, Damiani, operating
in the cloud is a pivotal decision that brings numerous benefits, including increased
security, cost-efficiency, agility, resilience, and disaster recovery capabilities. Damiani states,
"Operating in the cloud rather than building our own infrastructure was foundational to
everything else we did. It was the first decision we made" (Iansiti et al., 2021).
Since 2013, Moderna has been utilizing Amazon Web Services (AWS) and has continued to
expand its partnership with the platform over time. This strategic collaboration with AWS has
played a vital role in Moderna's digital transformation journey, providing the company with the
necessary infrastructure and tools to bolster its operations (Iansiti et al., 2021).
Figure 5, presented below, illustrates Moderna's digitization building blocks, showcasing the
various components and elements that contribute to the company's digital strategy and
computing and digital technologies to optimize their operations (Iansiti et al., 2021).
Iansiti et al. (2021) also highlight that another fundamental principle embraced by Moderna is
the concept of data integration. Damiani and other executives recognize the detrimental effects
of siloed data on efficiency and productivity within the organization. To address this challenge,
Moderna strives to harmonize data across systems, aiming to enter it once and enable its
seamless flow to relevant teams. Furthermore, the company extends data integration to connect
laboratory instruments through the Internet of Things (IoT), thereby enhancing overall data
Automation and robotics play a significant role in Moderna's digital transformation journey.
extensive automation initiatives. Rather than automating everything at once, Moderna adopts
islands are then integrated into a cohesive whole, ensuring a cautious and measured approach
Analytics and artificial intelligence (AI) are integral components of Moderna's digitization
strategy. Damiani views AI as the "holy grail" and recognizes the value of digitization in
generating structured data, which is crucial for developing algorithms that support the creation
of next-generation medications. In Damiani's words, "We relied on digitization early
on, not for the sake of digitization but for generating data. Today, we have a lot of structured
data, for instance in research and pre-clinical production. When we run experiments, we collect
even more data. This allows us to build better algorithms, which helps build the next generation
Figure 6, presented above, illustrates the digital integration efforts at Moderna, showcasing the
interconnectedness of various digital components within the organization (Iansiti et al., 2021).
Iansiti et al. (2021) report that Moderna adopts a mixed approach to sourcing
digital solutions. For undifferentiated processes like finance and HR, the company leverages
off-the-shelf Software as a Service (SaaS) tools. It is estimated that approximately 85% of the
tools used for such processes are existing SaaS solutions. However, for company-specific
processes and innovation, such as research and technical development, Moderna relies on
Different functions within Moderna exhibit varying levels of adoption when it comes to digital
and AI technologies, with pre-clinical activities leading the way. Despite this variation,
Damiani, Moderna's Chief Digital Officer, expresses confidence in the organization's progress
and estimates that they have achieved approximately 60% to 70% of his vision for digital
transformation. Automation has played a crucial role in reducing cycle times, enabling Moderna
to operate faster than other pharmaceutical and biotech companies (Iansiti et al., 2021).
According to Dave Johnson, the head of Informatics, Data Science, and AI at Moderna, artificial
intelligence (AI) provides the company with a competitive advantage by supporting decision-
making processes and enabling predictions that would be unattainable for humans within a
reasonable timeframe. This scalability empowers Moderna to accelerate its operations and gain
Although Moderna's success in the cloud is inspiring, the challenges of cloud data governance
remain significant and complex.
Al-Ruithe & Benkhelifa (2017) observe that the transition to cloud computing environments
presents numerous challenges for many organizations. As a result, data governance strategies
need to adapt in terms of structure, human resources, technology, procedures, roles, and
responsibilities (Al-Ruithe & Benkhelifa, 2017). Although cloud computing offers various
benefits such as cost efficiency, unlimited storage, backup and recovery capabilities, automatic
software integration, easy data access, quick deployment, scalability, and the availability of
new services, its widespread use is hindered by several concerns (Ko et al., 2011).
One of the most significant concerns is related to security and privacy issues, with 41% of these
concerns attributed to governance and legal issues (Khanghahi & Ravanmehr, 2013). Therefore,
data governance plays a critical role in successful cloud governance. The barriers to
organizational, legal, policy, financial, and knowledge-related factors (Al-Ruithe & Benkhelifa,
2017). Another constraint is the business value of data governance, which necessitates the
development of a charter for a data governance program, including its mission and vision (Al-
Ruithe & Benkhelifa, 2017). Poor communication between staff and other stakeholders is
identified as a reason for failure in data governance programs within organizations (Ladley,
2019), emphasizing the need for a well-defined communication plan for data governance to
In the cloud landscape, the absence of cloud regulations and the lack of cloud data governance
requirements are two environmental factors that affect the implementation of data governance
(Al-Ruithe & Benkhelifa, 2017). While some organizations attempt to establish their own cloud
data governance, barriers such as limited knowledge and financial resources hinder their
progress (Khanghahi & Ravanmehr, 2013; Mary et al., 2011; Self, 2014). Additionally,
organizational challenges like general culture and mindset, leading to resistance to change, pose
further obstacles (Al-Ruithe & Benkhelifa, 2017). The complexity of implementing cloud data
governance arises from the involvement of external stakeholders and the need for innovative
al., 2014). Insufficient and inadequate technologies can have a negative impact on the feasibility
as the modern data stack. “The modern data stack is a combination of various software tools
used to collect, process, and store data on a well-integrated cloud-based data platform. It is
known to have benefits in handling data due to its robustness, speed, and scalability” (Chia,
2023). How easily pharmaceutical companies navigate the modern data stack depends on several
factors, including their existing infrastructure, data complexity, regulatory requirements, and
the level of expertise within the organization (Chia, 2023). However, the complexity and variety
of available tools and technologies can make it difficult to determine which combination best
fits their specific needs, as illustrated in Figure 7 below.
Horner (2023) states that while the modern data stack represents a substantial advancement
compared to the conventional practice of manually coding data pipelines using outdated tools,
it has also encountered criticism for falling short of its stated benefits in numerous aspects such
as tool sprawl, procurement and billing issues, high cost of ownership, lengthy setup / integration
To handle the complexity of the cloud environment, Al-Ruithe et al. (2016) propose critical
success factors for cloud data governance, which are examined in detail below:
Cloud data governance success can be attributed to two key aspects: organizational and
technological factors (Al-Ruithe & Benkhelifa, 2017; Mary et al., 2011; Self, 2014).
and IT, executive sponsorship, and the establishment of a data governance center of excellence
(Al-Ruithe & Benkhelifa, 2017; Murray, 2023). On the other hand, technological factors
revolve around automating the data integration lifecycle to support data governance objectives
A study identified ten data governance success factors, including strategic accountability,
standards, addressing managerial blind spots, recognizing the complexities of data, cross-
divisional collaboration, data quality metrics, partnership with other companies, strategic points
of control, training and awareness for data stakeholders, and compliance monitoring (Cheong
Another article suggests the development of data governance guidelines, principles, policies,
organizational structures, and procedures to support the data governance framework (Rifaie et
al., 2009). The authors highlight three essential elements: structure, process, and
outlining the roles and responsibilities of each member (Rifaie et al., 2009). Process
encompasses decision-making processes for data assets, reviewing and approving data-related
monitoring and measurement of these processes and decisions, as well as the mechanisms for
Successful cloud data governance requires the collaboration of diverse expertise from various
processes, accountability, scope and focus maintenance, KPI measurement, compliance and
Addressing security and compliance in a cloud environment is crucial for the development of a
robust cloud data governance program (Rebollo et al., 2015). Cloud computing's disruptive
nature necessitates the implementation of comprehensive data governance strategies that may
vary depending on the deployment or delivery model (Al-Ruithe et al., 2016). Involving cloud
actors as integral stakeholders is vital for successful cloud data governance (Felici et al., 2013).
However, legal contracts between cloud actors are often too complex for ordinary cloud
consumers to understand (Dogo et al., 2013). Therefore, aligning the data governance strategy
with cloud computing regulations is essential so that all relevant requirements are incorporated
into the service level agreement when adopting cloud computing services (Badger et al., 2012).
framework for critical success factors in cloud data governance. The framework comprises four
cloud data governance strategy that considers the unique challenges and requirements
of the organization.
2. Cloud data governance CSFs: This dimension identifies the critical success factors
crucial for successfully implementing the cloud data governance strategy. These factors
considerations.
3. Cloud data governance CSFs evaluation: This dimension involves assessing and
evaluating the identified CSFs to determine their importance and potential impact on
implementing the cloud data governance strategy. This evaluation helps prioritize and
executing and operationalizing the cloud data governance strategy based on the
By following this framework, organizations can enhance their understanding of critical factors
contributing to successful cloud data governance and make informed decisions throughout the
Implementing any data governance model in the pharmaceutical context poses a number of
challenges. These include ensuring data privacy and security, aligning with evolving regulatory
landscapes, managing data across diverse systems and stakeholders, and fostering a culture of
data governance throughout the organization (Khin et al., 2020; Panian, 2010; Truong et al.,
2016; Cheong & Chang, 2007; Rifaie et al., 2009; Weber et al., 2009). Additionally,
organizations must consider the scalability of their chosen model to accommodate future
growth, the integration of emerging technologies such as artificial intelligence and machine
learning, and the ability to adapt to evolving data governance best practices (Levy, 2021;
organizations recognize the value of data (Khatri & Brown, 2010). With the growing volume
of data used within organizations, the effective management of data has become critical for
successful business operations (Tallon et al., 2013). Data plays a crucial role in both operational
and strategic decision-making processes (Tallon et al., 2013). Establishing trust in data is of
paramount importance, as a lack of trust can lead to as much as 50% of time being wasted
“hunting for data” (Redman, 2013). Conversely, when data is trusted, it is more likely to be shared,
Alhassan et al. (2019) identified seven critical success factors (CSFs) for data governance and
ranked them by perceived importance (Table 1). The
The competencies of employees directly influence their ability to define, implement, and
monitor data processes, procedures, policies, and requirements, as well as their capability to
handle data governance activities (Alhassan et al., 2019). These competencies are crucial for
top managers in establishing an overall data governance strategy and treating data as a strategic
asset (Alhassan et al., 2019). It is also essential for employees to possess the necessary
capabilities and awareness to handle data entry and access, ensuring the integrity and security
of the organization's data (Alhassan et al., 2019). To ensure appropriate employee data
processes (Alhassan et al., 2019). This training should be conducted both internally and
externally to increase employees' awareness of the importance of data accuracy and the secure
Data processes and procedures are a major focus in data governance, ensuring the reliable and
effective flow and use of data (DAMA International, 2009). While it is essential to define,
implement, and monitor data processes and procedures across all aspects of data management,
specific emphasis is often placed on areas such as data quality, data access, and data recording
and storage (Alhassan et al., 2019). The absence of clear and well-defined data processes and
procedures can raise doubts about the reliability of data, which can be attributed to factors such
as undefined procedures or missing components like data testing (Alhassan et al., 2019).
To achieve effective data processes and procedures, it is recommended to embed them into the
system itself (Koh et al., 2011). This can be done through the inclusion of mandatory fields,
validation methods, and data flow requirements that guide users in adhering to established
processes (Alhassan et al., 2019). Regular checks and updates of existing processes and
procedures are also essential to ensure their ongoing relevance and effectiveness (Alhassan et
al., 2019).
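The recommendation to embed processes into the system through mandatory fields and validation methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of Alhassan et al. (2019) or Koh et al. (2011). The batch-record fields, the "B plus six digits" batch-ID format, and the names `BATCH_RECORD_SPEC` and `validate_record` are all hypothetical. The point is that the requirement is declared once as data and enforced before a record is accepted. Users therefore cannot skip a mandatory field or enter an implausible value.

```python
import re
from datetime import date

# Hypothetical data requirement for a batch record:
# field name -> (is the field mandatory?, optional validation function)
BATCH_RECORD_SPEC = {
    "batch_id": (True, lambda v: isinstance(v, str) and re.fullmatch(r"B\d{6}", v) is not None),
    "product_code": (True, None),
    "manufacture_date": (True, lambda v: isinstance(v, date)),
    "expiry_date": (True, lambda v: isinstance(v, date)),
    "comment": (False, None),  # optional free-text field
}


def validate_record(record: dict, spec: dict = BATCH_RECORD_SPEC) -> list:
    """Return a list of human-readable errors; an empty list means the record passes."""
    errors = []
    for field, (mandatory, check) in spec.items():
        value = record.get(field)
        if value is None or value == "":
            if mandatory:
                errors.append(f"missing mandatory field: {field}")
            continue
        if check is not None and not check(value):
            errors.append(f"invalid value for {field}: {value!r}")
    # Cross-field rule: a batch cannot expire before it is manufactured.
    mfg, exp = record.get("manufacture_date"), record.get("expiry_date")
    if isinstance(mfg, date) and isinstance(exp, date) and exp <= mfg:
        errors.append("expiry_date must be after manufacture_date")
    return errors


if __name__ == "__main__":
    draft = {"batch_id": "123", "manufacture_date": date(2023, 1, 10), "expiry_date": date(2022, 1, 10)}
    for problem in validate_record(draft):
        print(problem)
```

In a real system the same specification would typically drive the entry form and the database constraints, so that the rule is defined in one place.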
Flexible data tools and technologies have a noteworthy impact on various critical success
factors (CSFs) in data governance, including the establishment of standardized and easily
understandable data policies, clear data processes and procedures, and the implementation of
inclusive data requirements (Alhassan et al., 2019). By leveraging flexible data tools and
technologies, organizations can embed these policies and processes into suitable systems,
ensuring proper formatting and enforcement, such as making specific fields mandatory or
employing automated validation methods (Alhassan et al., 2019). The availability of robust data
tools and technologies further enables the successful implementation of other CSFs (Alhassan
et al., 2019).
To effectively address flexible data tools and technologies, it is essential to have the appropriate
IT infrastructure and integrated data environment (Alhassan et al., 2019; Mary et al., 2011;
Tallon et al., 2013). This may involve the adoption of advanced technologies for data
integration, allowing for automated data validation and seamless data flow (Alhassan et al.,
reliability and flexibility of systems, with consideration given to accommodating future changes
(Alhassan et al., 2019). Data privacy and availability should also be considered when
integrating internal and external systems, ensuring compliance with relevant regulations and
The establishment of standardized and easily understandable data policies is crucial for
providing high-level guidelines and rules for handling data within an organization (Alhassan et
al., 2019). When specific data lacks well-defined policies, it can create uncertainty among
employees and hinder decision-making processes due to a lack of understanding on how the
Furthermore, accessing unnecessary data that compromises privacy can have a negative impact
on business performance (Alhassan et al., 2019). The study suggests that data policy documents
should follow a specific template and be kept simple and up to date to ensure employees
comprehend and value the importance of adhering to the guidelines (Alhassan et al., 2019).
However, merely having defined data policies is not sufficient for effective data governance. It
is highly recommended to implement these policies by embedding them into systems with
mandatory fields and validation methods (Alhassan et al., 2019). This integration ensures that
data is handled in accordance with the established policies and facilitates compliance (Alhassan
et al., 2019). Additionally, regular monitoring and periodic updates of data policies through
audits are essential to ensure their continued relevance and effectiveness (Alhassan et al., 2019).
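The periodic review of data policies through audits can be supported by a simple policy register that flags overdue reviews. The following Python sketch assumes a hypothetical register in which each policy records an owner and the date of its last review. It also assumes a twelve-month review interval. Neither the structure nor the interval comes from Alhassan et al. (2019).

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DataPolicy:
    name: str
    owner: str            # person or team accountable for the policy
    last_reviewed: date   # date of the most recent audit or review


def overdue_policies(policies, today, review_interval=timedelta(days=365)):
    """Return (name, owner, days overdue) for each overdue policy, most overdue first."""
    overdue = []
    for policy in policies:
        age = today - policy.last_reviewed
        if age > review_interval:
            overdue.append((policy.name, policy.owner, (age - review_interval).days))
    return sorted(overdue, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    register = [
        DataPolicy("Data retention", "Quality Assurance", date(2022, 1, 15)),
        DataPolicy("Data access", "IT", date(2023, 6, 1)),
    ]
    print(overdue_policies(register, today=date(2023, 9, 1)))
```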
Clearly defined data roles and responsibilities are essential in data governance to identify
individuals accountable for various data-related activities within the organization (Khatri &
Brown, 2010). These roles include responsibilities such as defining data policies and processes,
ensuring data quality and integrity, managing data access and security, and overseeing data
Without well-defined roles and responsibilities, even with good processes in place, there is a
risk of errors and confusion in data management (Alhassan et al., 2019). Assigning specific
responsibilities to individuals or teams helps establish accountability and ensures that the
necessary actions are taken to maintain data accuracy, consistency, and compliance with
policies and regulations (Alhassan et al., 2019; Khatri & Brown, 2010).
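One way to check that responsibilities are actually assigned is to record them in a responsibility matrix and test it automatically. The sketch below uses the RACI convention (Responsible, Accountable, Consulted, Informed). RACI is a widely used responsibility-assignment technique swapped in here for illustration; the cited authors do not prescribe it. The data domains, teams, and the rule "exactly one accountable party and at least one responsible party per domain" are hypothetical. For simplicity, each team holds only one letter per domain.

```python
# Hypothetical RACI-style matrix: data domain -> {person or team: role letter}
# R = Responsible, A = Accountable, C = Consulted, I = Informed.
ASSIGNMENTS = {
    "Clinical trial data": {"Clinical Data Manager": "A", "Data Steward": "R", "IT": "C"},
    "Batch records": {"QA Lead": "A", "Production": "R", "QC Lead": "A"},
    "Customer master data": {"Data Steward": "R", "Sales Operations": "I"},
}


def accountability_gaps(assignments):
    """Return {domain: [problems]} for domains lacking exactly one accountable
    party or lacking any responsible party."""
    gaps = {}
    for domain, roles in assignments.items():
        accountable = [who for who, letter in roles.items() if letter == "A"]
        responsible = [who for who, letter in roles.items() if letter == "R"]
        problems = []
        if len(accountable) != 1:
            problems.append(f"{len(accountable)} accountable parties (expected 1)")
        if not responsible:
            problems.append("no responsible party")
        if problems:
            gaps[domain] = problems
    return gaps


if __name__ == "__main__":
    for domain, problems in accountability_gaps(ASSIGNMENTS).items():
        print(f"{domain}: {'; '.join(problems)}")
```

Requiring a single accountable party per domain reflects the idea above that unclear ownership leads to errors and confusion. Ambiguity in either direction (no owner, or two competing owners) is flagged.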
By clearly delineating data roles and responsibilities, organizations can improve coordination
and collaboration among stakeholders, enhance data governance practices, and mitigate the risk
of data-related issues (Khatri & Brown, 2010). It is important to regularly review and update
these roles and responsibilities to adapt to evolving business needs and technological
Clear and inclusive data requirements play a crucial role in the successful implementation of
data governance practices (Alhassan et al., 2019). These requirements define various aspects of
data implementation, including data flows, integration, mandatory fields, and validation
requirements effectively to ensure a clear understanding of data needs by the IT team (Alhassan
et al., 2019). This requires a strong understanding of the business processes and objectives that
rely on data, as well as the ability to communicate those requirements in a formal and detailed
Effective communication between data owners and implementers is essential to ensure that data
requirements are properly translated into IT systems and processes (Alhassan et al., 2019). This
includes specifying the data flow, identifying mandatory fields that need to be captured, and
defining validation methods to ensure data accuracy and consistency (Alhassan et al., 2019).
Employees with strong data competencies can contribute significantly to understanding and
defining the right data requirements for the organization (Alhassan et al., 2019).
Developing focused and tangible data strategies that align with organizational goals is essential
for achieving success in data governance (Alhassan et al., 2019). These data strategies should
encompass the decision domain of data principles and provide a framework for guiding data-
To ensure the effectiveness of data strategies, it is essential to consider both short-term and
long-term objectives (Alhassan et al., 2019). Short-term objectives help address immediate data
governance needs, while long-term objectives provide a roadmap for future data management
al., 2019). Organizations should acknowledge the value of data in driving business outcomes
and treat it as a critical resource (Alhassan et al., 2019). Assigning a top management committee
for data governance can provide the necessary leadership and oversight to support the
development and implementation of data strategies (Alhassan et al., 2019). This committee
should have a clear mandate to define data governance goals, establish policies, allocate
resources, and monitor the progress of data initiatives (Alhassan et al., 2019).
5 Research Methodology
The research methodology applied in this study adopts a systematic approach to investigate and
understand the data governance models prevalent in the pharmaceutical sector. The conceptual
model guiding this thesis is based on a quantitative research design, utilizing surveys conducted
with key pharmaceutical companies. These surveys will provide quantitative insights into the
within the industry. Additionally, the research will assess the experiences of these organizations
encountered. By analyzing the quantitative survey responses, the study seeks to pinpoint
Critical Success Factors (CSFs) in data governance that contribute to effective implementation
within the pharmaceutical context. The collected quantitative data will enable a robust analysis that
sheds light on the most effective approaches and practices in pharmaceutical data governance,
thereby offering valuable insights for industry practitioners and researchers alike.
5.2 Research context and philosophy: The criteria for selecting participants.
The philosophy underpinning this research is rooted in a pragmatic approach, seeking to derive
practical insights from real-world practices. The selection criteria for pharmaceutical
networks, invitations to participate in the survey were extended to individuals with diverse roles
across various departments, such as Data, Marketing, Sales, Production, and R&D, ensuring
a multifaceted perspective.
Recognizing the sensitivity and confidentiality of strategic data governance information held
foster candid and truthful responses, as respondents can freely share their experiences,
specific company names or individual identities, the study maintains a high level of privacy and
confidentiality. This approach enables the research to extract unbiased and authentic insights,
5.3 Instruments/measures
The central tool employed in this study is a structured questionnaire. This instrument has been
governance models, change management undertakings, and the critical success factors specific
and industry best practices, the questionnaire explores various dimensions. It delves into the
realm of data governance models, particularly examining the adoption and alignment of
The survey is partitioned into distinct themes for an organized exploration. Topics encompass
the implementation of data governance practices within the organizational framework, the
address the intricate nuances of data governance pertinent to the pharmaceutical sector,
scrutinizing compliance with FDA regulations. This entails aspects like data retention policies,
metadata and audit trail management, contemporaneous practices, and regulated access to
computer systems, thus providing a comprehensive view of how companies align themselves
Moreover, the questionnaire inquires about change management experiences, eliciting insights
into challenges, successes, and lessons learned during the implementation of data governance
initiatives. Critical Success Factors (CSFs) are assessed through a set of structured questions
governance practices.
With 19 participants from many countries, spanning departments including Data, R&D,
Marketing, Production, Sales, Clinical, and Finance, a comprehensive approach has been
adopted (Figure 9). Acknowledging the importance of unbiased and holistic
comprehension of data governance across the entire organization, the data is treated as a
cohesive entity rather than being analyzed departmentally. This approach ensures that insights
are not skewed by department-specific perspectives, aligning with the broader strategy of clear
The amalgamated data from all participants is subjected to a comprehensive analysis aimed at
understanding the various facets of data governance. Each question within the survey is
different responses, conclusions are drawn about distinct aspects of data governance. This
approach allows for a comprehensive portrayal of how data governance is comprehended and
having executed data governance programs: 36% have adopted a comprehensive
implementation, while 27% have chosen a partial approach. The remaining 27% acknowledge
a lack of awareness regarding any data governance initiatives, while 9% of them confirm a data
These results suggest that the establishment of a data governance model often evolves through
a pragmatic and progressive approach. This approach is frequently adopted to gradually involve
teams and departments, relying on compelling internal results and success stories. The decision
to follow a "partial" approach is often a deliberate choice to build step by step, measure
According to Figure 11, among the organizations implementing data governance programs,
approximately 30% selected “over the past 3 years” and approximately 30% selected “less than
3 years”. Notably, around 40% of these projects are developed within a timeline of “5 to 10
years”. This pattern can be interpreted as a signal of the increasing embrace of the partial or
could account for the significance of such a program in the pharmaceuticals industry.
The findings presented in Figures 12 and 13 reveal that only 50% of organizations assign
responsibility for directing and implementing data governance programs to Data Governance
or Data Managers who dedicate their efforts entirely to the Data Governance program.
Conversely, the remaining organizations apportion these duties among individuals with other
primary responsibilities, including the Chief Information Officer (CIO), the entire IT
Department, and the Data Department, each accounting for 10% of cases. Additionally, 20% of
organizations delegate these responsibilities to individuals within the Data team, such as Data
This outcome highlights the relatively immature state of personnel organization within Data
Governance. One explanation is that organizations in the early stages of implementing such a
program often lack a dedicated position to oversee it and therefore rely
A positive aspect is that all organizations acknowledge the widespread use of data across their
various departments. However, as indicated in Figure 14, the pharmaceutical industry continues
to rely heavily on a centralized data team, accounting for 70% of cases. This team is responsible
for tasks like data collection, transformation, and provisioning to all departments. Only 20% of
organizations have taken the step of assigning at least one data specialist to each department,
and a mere 10% have designated a data specialist for just some departments.
approach to data sharing and utilization with the centralized model. The centralized data
governance model establishes a single authority responsible for governing and managing data
across a pharmaceutical organization (Weber et al., 2009). It ensures consistency and control
over data quality and security but may encounter challenges in scalability, flexibility, and
meeting diverse stakeholder needs (Dehghani, 2022; Machado et al., 2022; Weber et al., 2009).
The promising sign is that a subset of organizations is starting to recognize the significance of
embedding data experts within business units to harness the value of data more efficiently.
Figure 15 illustrates that 75% of organizations maintain a Center of Excellence staffed with
Data experts. This Center is responsible for overseeing, governing, and providing technical
assistance to all the data members in each department (Al-Ruithe & Benkhelifa, 2017; Murray,
2023). Conversely, the remaining 25% of organizations lack a centralized data team, and
consequently, there is no data sharing among departments at all. The survey results reveal a
significant shift in data management practices within the pharmaceutical sector. Notably, a
substantial percentage of organizations are adopting hybrid data governance models to balance
decentralization and centralized control. These findings highlight the industry's drive to ensure
Savings, garnering 100% support (Figure 16). Following closely, Scalability and Flexibility,
alongside Enhance Data Security and Privacy, share the second spot with 85% agreement each.
Improved Collaboration and Accessibility, combined with the potential for Disaster Recovery
and Business Continuity, secure the third position with a robust result of 71%. This outcome
highlights cloud computing's critical role in pharmaceutical organizations, vital for managing
the escalating volume of data (big data) and ensuring their operational efficiency.
The result supports the conclusion of Ko et al. (2011) that, although cloud computing presents
a range of advantages, including cost efficiency, boundless storage capacity, robust backup and
recovery capabilities, automated software integration, seamless data accessibility, swift
deployment, scalability, and the provision of novel services, its broad
As depicted in Figure 17 below, while data practitioners grasp the crucial significance of data
security and privacy in the pharmaceutical realm, only 17% of organizations have a well-
defined Cloud Data Governance program. Conversely, a significant 50% have not yet initiated
Figure 18 provides insight into the challenges faced, where 100% of respondents encounter
issues integrating cloud data with on-premises systems. Furthermore, 83% struggle with
ensuring data privacy and meeting regulatory requirements, as well as managing data access
and user permissions. Additionally, 67% grapple with complications related to data residency
and jurisdiction concerns. These findings underline the complexity of establishing effective
The findings confirm that security and privacy are among the most prominent concerns and
that they are rooted in governance and legal factors, as identified by Khanghahi & Ravanmehr
(2013). Considering this, the role of data
environments. The hurdles that obstruct the seamless execution of a cloud data governance
Survey question: What challenges does your organization face with Data Governance in a cloud environment?
In the realm of pharmaceuticals, addressing the prevalent challenges associated with data
integrity and complying with FDA and cGMP regulations requires a proactive approach (FDA,
2018). Neumeyer (2020) outlines key strategies to mitigate these issues, highlighting the critical
aspects of Data Retention, Metadata and Audit Trails, Data Quality Validation,
To prevent breaches of data integrity and uphold pharmaceutical standards, meticulous adherence
to current Good Manufacturing Practice (cGMP) regulations is pivotal. Notably, section 211.180
of Title 21 of the Code of Federal Regulations (21 CFR 211.180) establishes a critical guideline,
stipulating that records pertinent to production, control, or distribution that are necessary for
cGMP compliance must
In this context, the survey findings presented in Figure 19 above show that a mere 38% of
organizations possess a defined document outlining the duration of data retention along with
suitable archival systems for critical records. These encompass essential elements such as batch
documents, marketing authorization application data, and traceability data for human-derived
such a document, while an additional 12% remain unaware of its existence. These results
highlight the pressing need for heightened attention to establishing clear guidelines for data
retention and robust archival mechanisms within organizations to uphold data integrity and
regulatory compliance.
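A documented retention rule becomes easier to apply consistently once it is expressed as a calculation that an archival system can run. The following Python sketch computes the earliest date on which a batch record may be considered for disposal. It assumes the minimum commonly read from 21 CFR 211.180(a): batch-specific records are kept for at least one year after the batch's expiration date. The regulation's separate provision for certain OTC products without expiration dating is ignored, and readers should confirm the rule against the regulation text. The `internal_years` parameter is a hypothetical hook for a longer company-specific policy measured from the same expiry date. All function names are illustrative.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February maps to 1 March in a non-leap
    target year (the later, more conservative choice for retention)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return date(d.year + years, 3, 1)


def retain_until(batch_expiry: date, regulatory_years: int = 1, internal_years: int = 0) -> date:
    """End of the retention period for a batch-specific record.

    Assumption: regulatory minimum of 1 year after batch expiry, extended
    if the organization's own policy (internal_years) is longer.
    """
    return add_years(batch_expiry, max(regulatory_years, internal_years))


def may_dispose(batch_expiry: date, today: date, **policy) -> bool:
    """True only once the retention period has fully elapsed."""
    return today > retain_until(batch_expiry, **policy)


if __name__ == "__main__":
    expiry = date(2025, 6, 30)
    print(retain_until(expiry), may_dispose(expiry, date(2026, 7, 1)))
```

Taking the maximum of the regulatory and internal periods means that a stricter company policy can extend retention but never shorten it below the regulatory floor.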
In addition, maintaining the security of data requires that it be kept in its original form
or accurate duplicates, like photocopies, microfilm, or microfiche, or any other format that
precisely replicates the initial records (MHRA, 2015). A record or log should also be
records, in line with Good Manufacturing Practice (GMP) standards (PIC, 2021).
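For electronic records, one common way to show that a copy precisely replicates the original is to compare cryptographic hashes. The verification result can then be recorded as a log entry. The Python sketch below illustrates this with SHA-256. It only demonstrates bit-for-bit identity of two electronic files. Photocopies, microfilm, and scans of paper originals cannot be verified this way and would need a documented, human-reviewed comparison. The field names of the log entry and the `verified_by` value are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large exports need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_true_copy(original: Path, copy: Path, verified_by: str) -> dict:
    """Compare two electronic files bit for bit and return a log entry for the check."""
    original_hash, copy_hash = sha256_of(original), sha256_of(copy)
    return {
        "original": str(original),
        "copy": str(copy),
        "original_sha256": original_hash,
        "copy_sha256": copy_hash,
        "identical": original_hash == copy_hash,
        "verified_by": verified_by,
        "verified_at": datetime.now(timezone.utc).isoformat(),
    }
```

Storing both hashes in the log, rather than only the yes/no result, allows an auditor to re-verify the copy later without trusting the original check.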
The survey outcomes indicate that only 25% of respondents confirm that photocopies, microfilm,
or microfiche are retained as true copies (Figure 20). A significant 50% are uncertain, which might
be because they are not directly responsible for these files. Still, an alarming 25% confirm that these records
As outlined by Charoo et al. (2023), both metadata and audit trails assume pivotal roles in
Practice (cGMP) standards. These elements are vital in ensuring the integrity of data,
facilitating seamless data retrieval, and enabling effective data utilization (Charoo et al., 2023).
Specifically, metadata encompasses structured data that provides contextual insights and data
Figure 21 reveals that the integration of metadata, data catalogs, and audit trails within the
70% of organizations indicate the presence of metadata, while a slightly lower yet notable 60%
affirm the existence of a data catalog and 50% the existence of audit trails within their operations.
It is worth noting that a discernible 10% to 20% of respondents remain uncertain about the presence of
these resources, hinting at a potential gap in awareness or communication among data users.
This uncertainty could mean that these valuable assets aren't being used or shared effectively.
In addition, the metadata elements of a particular dataset span a wide range of
components. These include the date and time stamp of the activity, the unique user identification
linked to the individual executing the test or analysis, the identification code of the instrument
employed for data capture, information about the material's status, a distinctive material
identification number, as well as comprehensive audit trails that provide a documented history
of interactions and changes (Charoo et al., 2023). Figure 22 shows that every organization
surveyed (100%) has incorporated these elements into its metadata practices. Of these, 80%
also maintain audit trails, while 20% indicate that it depends.
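The metadata elements listed by Charoo et al. (2023) map naturally onto a structured record. The sketch below models them as an immutable Python dataclass with a simple completeness check; the field names, the example material status values, and the rule that timestamps must carry a timezone are hypothetical illustrations, not a schema from the cited work. The audit trail is left out of the record on the assumption that it is better kept as a separate append-only log.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class ActivityMetadata:
    timestamp: datetime      # date and time stamp of the activity
    user_id: str             # unique ID of the person performing the test/analysis
    instrument_id: str       # ID of the instrument used for data capture
    material_status: str     # e.g. "quarantine", "released" (hypothetical values)
    material_id: str         # unique material identification number


def missing_fields(meta: ActivityMetadata) -> list[str]:
    """Return names of string fields that are empty, plus a note if the
    timestamp has no timezone (which would make it ambiguous)."""
    missing = []
    for f in fields(meta):
        value = getattr(meta, f.name)
        if isinstance(value, str) and not value.strip():
            missing.append(f.name)
    if meta.timestamp.tzinfo is None:
        missing.append("timestamp (no timezone)")
    return missing
```

A record such as `ActivityMetadata(datetime(2023, 5, 1, 9, 30, tzinfo=timezone.utc), "u.smith", "HPLC-07", "quarantine", "MAT-0042")` passes the check, whereas one with an empty `user_id` would be reported as not attributable.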
To enhance data governance, Shafiei et al. (2015) recommended establishing mechanisms that
automatically generate audit trails for various data activities, including creation, modification,
and deletion. These trails should encompass crucial information such as timestamps, user IDs,
and event specifics, ensuring a comprehensive and accurate record of data events (Shafiei et al.,
2015).
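The mechanism recommended by Shafiei et al. (2015) can be sketched as a store in which every creation, modification, and deletion goes through a method that also writes the audit entry, so logging cannot be skipped. This is a minimal in-memory sketch with hypothetical names; chaining each entry's hash to the previous one is an addition not described in the source, included to show how after-the-fact edits to the trail can be made detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class AuditedStore:
    """Hypothetical record store whose every change is logged automatically.
    Values must be JSON-serializable so that entries can be hashed."""
    records: dict = field(default_factory=dict)
    trail: list = field(default_factory=list)

    def _log(self, action: str, user_id: str, record_id: str, detail: dict) -> None:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user_id": user_id,
            "action": action,
            "record_id": record_id,
            "detail": detail,
            "prev_hash": self.trail[-1]["hash"] if self.trail else GENESIS,
        }
        payload = json.dumps(entry, sort_keys=True).encode()
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self.trail.append(entry)

    def create(self, user_id: str, record_id: str, value) -> None:
        self.records[record_id] = value
        self._log("create", user_id, record_id, {"new": value})

    def modify(self, user_id: str, record_id: str, value, reason: str) -> None:
        old = self.records[record_id]
        self.records[record_id] = value
        self._log("modify", user_id, record_id, {"old": old, "new": value, "reason": reason})

    def delete(self, user_id: str, record_id: str, reason: str) -> None:
        old = self.records.pop(record_id)
        self._log("delete", user_id, record_id, {"old": old, "reason": reason})

    def verify(self) -> bool:
        """Recompute the hash chain; False if an entry was altered, reordered,
        or removed from the middle of the trail."""
        prev = GENESIS
        for entry in self.trail:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev:
                return False
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if digest != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Each entry carries the timestamp, user ID, and event specifics named by Shafiei et al. (2015); in a production system the trail would live in write-once storage rather than a Python list.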
Figure 23 demonstrates a positive outcome: every organization surveyed has integrated these
audit trail elements into its metadata practices, a 100% adoption rate.
Data governance places significant emphasis on data processes and procedures, which are
pivotal for ensuring the reliable and efficient flow and utilization of data (DAMA International,
2009). Although data governance encompasses all aspects of data management, specific attention is
often directed toward elements such as data quality, data access, and data recording and storage
(Alhassan et al., 2019). The absence of well-defined data processes and procedures can cast doubt
on data reliability, stemming from undefined protocols or incomplete components such as
data testing (Alhassan et al., 2019). To establish robust data processes and procedures,
integrating them within the system itself is recommended (Alhassan et al., 2019). This can be
achieved by incorporating mandatory fields, validation methods, and data flow prerequisites.
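Building mandatory fields and validation methods into the system itself, as Alhassan et al. (2019) recommend, can be sketched as a list of rule functions that every record must pass before it is accepted. The field names, the assay plausibility range, and the rule set below are hypothetical examples chosen for illustration only.

```python
from typing import Callable, List, Optional

# Each rule inspects a record and returns an error message, or None if it passes.
Rule = Callable[[dict], Optional[str]]

REQUIRED_FIELDS = ("batch_id", "analyst_id", "assay_percent")  # hypothetical


def check_required(record: dict) -> Optional[str]:
    """Mandatory-field rule: every required field must be present and non-empty."""
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    return f"missing required fields: {missing}" if missing else None


def check_assay_range(record: dict) -> Optional[str]:
    """Validation rule: hypothetical plausibility range for an assay result (%)."""
    value = record.get("assay_percent")
    if isinstance(value, (int, float)) and not 0 <= value <= 110:
        return f"assay_percent out of range: {value}"
    return None


RULES: List[Rule] = [check_required, check_assay_range]


def validate(record: dict) -> List[str]:
    """Apply every rule and collect all error messages (empty list means valid)."""
    errors = []
    for rule in RULES:
        message = rule(record)
        if message is not None:
            errors.append(message)
    return errors
```

Collecting all errors, rather than stopping at the first, lets the entry screen show the user everything that must be corrected before the record can be saved.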
Figure 24 highlights that only 70% of organizations have initiated processes for defining data
quality requirements, establishing data quality rules, and implementing validation and
cleansing procedures. This underscores the need for greater attention and systematic efforts
in building and strengthening data processes to bolster
6.1.3.4 Contemporaneous
Neumeyer's research (2020) found that employees were signing logbooks and batch
records much later than when the actual tasks were performed, contradicting the Contemporaneous
principle of ALCOA (Attributable, Legible, Contemporaneous, Original,
and Accurate) (EMEA, 2010). This principle emphasizes recording data as events happen. To
tackle this issue, it is crucial to train analysts to document their work promptly after completion
and to establish procedures that enforce this practice (Charoo et al., 2023). Fortunately,
software platforms are available that can capture activity details in real time, creating a secure
audit trail that cannot be modified (Charoo et al., 2023). These platforms offer a solution to this problem.
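A simple control for contemporaneous recording is to compare when an activity was performed with when it was entered, and flag entries that exceed a tolerance. The sketch assumes each entry carries both timestamps (for example, the activity time captured by an instrument or MES and the system entry time); the field names and the 15-minute tolerance are hypothetical placeholders, not regulatory values.

```python
from datetime import datetime, timedelta

# Hypothetical tolerance; an organization would set this in its own SOPs.
MAX_RECORDING_DELAY = timedelta(minutes=15)


def late_entries(entries: list[dict], max_delay: timedelta = MAX_RECORDING_DELAY) -> list[str]:
    """Return IDs of entries recorded more than max_delay after the activity.

    Each entry is assumed to have 'entry_id', 'performed_at' and 'recorded_at'.
    Entries recorded *before* the activity are also flagged as implausible.
    """
    flagged = []
    for e in entries:
        delay = e["recorded_at"] - e["performed_at"]
        if delay > max_delay or delay < timedelta(0):
            flagged.append(e["entry_id"])
    return flagged
```

Run periodically over logbook or batch-record data, such a check would surface the kind of delayed signatures Neumeyer (2020) observed, so they can be investigated rather than discovered during an inspection.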
The survey findings shown in Figure 25 indicate that only 50% of organizations use tools such as
logbooks and batch records (which permanently record, in real time, who carried out the work,
when, and why), LIMS (laboratory information management systems), and MES (manufacturing
execution systems) to manage operational data, while 30% to 40% are unaware of these tools.
The relatively low adoption of logbooks, LIMS, and MES to manage operational data suggests a
potential area for improvement in ensuring
To ensure that computer systems comply with current Good Manufacturing Practice
(cGMP), Standard Operating Procedures (SOPs) are vital (Charoo et al., 2023). These
procedures should define the roles of system administrators and specify who has access to
different cGMP computer systems (Charoo et al., 2023). System administrators should be
individuals other than those responsible for the records, such as laboratory staff
(Charoo et al., 2023). This segregation of duties preserves independence and reduces the risk of
unauthorized actions (Charoo et al., 2023). Through clear SOPs, the specific rights and duties of both
system administrators and authorized staff can be defined, creating strong controls that prevent
unauthorized changes.
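Once role assignments are available in machine-readable form, the segregation-of-duties rule described by Charoo et al. (2023) and recommended by the FDA (2018) can be checked automatically. The sketch assumes a simple mapping from each user to a set of role names; the role labels `system_admin` and `record_owner` are hypothetical.

```python
# Hypothetical role names for the two duties that must not be combined.
CONFLICTING_ROLES = {"system_admin", "record_owner"}


def segregation_violations(assignments: dict[str, set[str]]) -> list[str]:
    """Return users (sorted) who hold every role in CONFLICTING_ROLES, i.e. who
    both administer the system and are responsible for record content."""
    return sorted(
        user for user, roles in assignments.items() if CONFLICTING_ROLES <= roles
    )
```

For instance, with `{"alice": {"system_admin"}, "carol": {"system_admin", "record_owner"}}` only `carol` is reported. Running such a check at each access review gives a concrete measure of the gap shown in the survey.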
However, the survey results show a gap, with only 75% of organizations assigning the role of
system administrator (with the power to change files and settings) to people who aren't
responsible for the record content (Figure 26). Fixing this gap is crucial for keeping data
Similarly, Figure 27 illustrates that only 60% of organizations have implemented processes
encompassing access controls, encryption, and data masking, which are crucial components for
maintaining data quality and security standards in the pharmaceutical context. Moreover, a notable 10% to 20% of
respondents express uncertainty regarding the existence of these processes. This ambiguity
could potentially stem from the diverse backgrounds of respondents, including those from Sales
and Marketing, who might not possess comprehensive knowledge about these technical
processes, highlighting the need for improved communication and awareness across various
departments.
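Data masking, one of the processes covered in Figure 27, can be illustrated with a sketch that pseudonymizes a direct identifier using a keyed hash (HMAC-SHA-256) and partially masks an email address before data leave a controlled environment. The field names and the key are hypothetical; in practice the key would come from a managed secret store, and whether keyed hashing counts as sufficient de-identification depends on the applicable privacy rules.

```python
import hashlib
import hmac

# Hypothetical key; a real key must never be hard-coded in source code.
SECRET_KEY = b"replace-with-managed-secret"

DIRECT_IDENTIFIERS = ("patient_id",)   # replaced by a keyed-hash token
PARTIAL_MASK_FIELDS = ("email",)       # only partially revealed


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic keyed hash: the same input always maps to the same token."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def mask_email(email: str) -> str:
    """Keep the first character and the domain, e.g. 'j***@example.org'."""
    local, _, domain = email.partition("@")
    return (local[:1] + "***@" + domain) if domain else "***"


def mask_record(record: dict) -> dict:
    """Return a masked copy of the record; the original is left unchanged."""
    masked = dict(record)
    for name in DIRECT_IDENTIFIERS:
        if name in masked:
            masked[name] = pseudonymize(str(masked[name]))
    for name in PARTIAL_MASK_FIELDS:
        if name in masked:
            masked[name] = mask_email(str(masked[name]))
    return masked
```

Because the keyed hash is deterministic, masked datasets can still be joined on `patient_id` for analysis without exposing the raw identifier to users who lack access to the key.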
Figure 28 demonstrates that only 50% of employees in organizations have unique individual
accounts and passwords for system access. Alarmingly, a substantial 38% of them might be
compelled to share their accounts. This situation significantly complicates the task of tracing
the identities of those who log in. Neglecting to institute effective controls and access
Both Figures 29 and 30 affirm a significant trend: the importance of data governance is not
being adequately emphasized by leaders or the C-suite. Only 50% of organizations provide
training to their employees on the significance and best practices of data governance.
Surprisingly, 55% of leaders are deemed to have only partial awareness of the impact of
effective data governance, with only 27% considered fully aware and 18% having some
level of awareness.
These results align with the findings of Weiss (2022), highlighting that while digital
transformation initiatives often take precedence, aspects like data governance, integrity, and
regulatory compliance might not receive the requisite attention from the C-suite.
Due to the limited awareness evident among employees and leaders alike, the implementation
According to Figure 31, a mere 12% of organizations have currently instituted such incentive
schemes. This lack of emphasis on incentives mirrors the prevailing deficiency in awareness
levels, underscoring the urgent need for more comprehensive education and awareness
campaigns to cultivate a robust culture of data governance within organizations.
As mentioned by Yang et al. (2010), regarding the facilitation of compliance, the application
motivated to conform to data integrity regulations. These incentives might reward periods
free of data integrity-related concerns or the successful implementation
of robust data security measures. Through this approach, employees and organizations could be
As reviewed extensively in the literature review sections, a wealth of studies has examined the many
challenges and obstacles surrounding the implementation of data governance within the
advancements, and the complexities of day-to-day operations (Cheong & Chang, 2007;
Redman, 2013; Rifaie et al., 2009; Spivey, 2022). As these studies show, successfully
navigating these challenges is not merely a strategic imperative for organizational success;
it also bears far-reaching consequences for public health and the industry's overall credibility
(Eglovitch, 2022; FDA, 2016; Holub et al., 2017; Truong et al., 2017).
The survey results, as shown in Figure 32, provide valuable insights into the primary challenges
faced in implementing data governance within the pharmaceutical industry. The findings
highlight the multifaceted nature of these challenges, which encompass various aspects of
Figure 32: What are the most important challenges in Data Governance within your organization?
The transition to a fully digital environment is identified as the most noteworthy challenge, with
90% of respondents acknowledging its complexity. This transition entails not only the technical
aspects of digitization but also the need to adapt existing workflows and practices to the digital
landscape.
Organizing access and facilitating the sharing of information and knowledge emerge as
significant hurdles, with a notable 82% of respondents highlighting this challenge. This reflects
the complexity of ensuring seamless and secure access to critical data while fostering
Equally prominent is the challenge of mastering the risks associated with document and
information management, such as the potential for loss or unauthorized modification. This
challenge also garners an 82% response rate, underscoring the industry's recognition of the
Defining and implementing rules and processes for document management, including
versioning, workflows, and document naming, emerges as the third challenge, with 54% of
respondents indicating its significance. This highlights the need for standardized and efficient
respondents each. These challenges underline the broader implications of data governance,
including the need to ensure the preservation of critical information over time, effectively assess
the value of various data assets, and manage costs associated with document management and
information systems.
As shown in Figure 33, foremost among the identified obstacles is the budgetary constraint
substantial 100%. This underscores the financial considerations that organizations must grapple
frameworks.
A closely related obstacle is the lack of willingness among decision-makers and managers,
cited by 64% of respondents. This finding aligns with the previous observations, indicating a disparity in
prioritization between data governance and other strategic imperatives. This lack of enthusiasm
from leadership underscores the necessity for advocating the value and significance of data
The survey further reveals challenges stemming from knowledge gaps. Specifically, a lack of
knowledge about methodologies is cited by a significant 82%, reflecting the complexity of data
governance methodologies and frameworks. Concurrently, the lack of awareness regarding the
risks entailed by inadequate data governance—such as the loss of critical information and the
exposure of personal data—constitutes a substantial obstacle at 73%. The result accentuates the
need for educational initiatives that empower stakeholders with the requisite understanding of
data governance's intricacies and its implications for organizational success and compliance.
Interestingly, the lack of knowledge about obligations and standards is reported as a less
regulated nature of the pharmaceutical industry, where compliance with obligations and
standards is inherently ingrained in operations. However, it's worth noting that even within a
regulated environment, maintaining awareness and alignment with evolving standards remains
their entire existence. Following closely behind are Electronic Document Management (EDM)
and handling sensitive data, both at 73%. These figures emphasize the critical need for robust
office files, disparity of solutions within the organization, and document resource management
all pose challenges at a rate of 36%. Managing paper archives, including the associated costs
and space requirements, presents a challenge at 27%. Intranet content management, enterprise
social media, and digitalizing stocks encounter hurdles at 18%, while email management and
This figure illustrates the diverse and multifaceted landscape of Data Governance, necessitating
comprehensive approaches to ensure regulatory compliance and data integrity. These findings
underscore the intricate nature of Data Governance and its broad scope within a
company. The objective is to raise awareness among leaders of the importance of
establishing a well-structured Data Governance plan that covers all the aspects mentioned
above. This ensures that the organization effectively addresses these challenges and maintains
The survey findings presented above offer both a reflection of, and a deeper insight into,
the data governance practices within pharmaceutical companies. These insights, combined with
the critical success factors (CSFs) for data governance identified and the other findings
of the literature review, are used to formulate the following critical success
in Pharmaceutical Industry
The Critical Success Factors (CSFs) proposed below draw on a comprehensive
understanding of industry-specific needs and best practices gained from the survey and the
literature review. Specifically, they combine the seven critical success factors (CSFs) for data
governance identified by Alhassan et al. (2019) (section 4.3), the six measures to ensure data
integrity proposed by Shafiei et al. (2015) (section 4.1.5), and the CSFs for data governance
in cloud computing identified by Al-Ruithe & Benkhelifa (2017) (section 4.2.4.3).
culture, integrating data from diverse sources, and ensuring data integrity and security.
data for innovation. To effectively address these challenges and capitalize on the available
leaders is significant for the successful implementation of a robust data governance framework
(Al-Ruithe & Benkhelifa, 2017; Murray, 2023). This sponsorship serves as a cornerstone,
both drawing attention to data governance and expediting the allocation of critical
resources, including human capital and financial provisions (Al-Ruithe & Benkhelifa,
2017; Murray, 2023). By securing such sponsorship, pharmaceutical enterprises
can ensure that their data governance efforts receive the prominence and backing
needed for effective execution (Al-Ruithe & Benkhelifa, 2017; Murray, 2023).
Given the industry's landscape characterized by rigorous regulations, intricate data dynamics,
and the imperative of data-driven decision-making, the backing and dedication of leaders
underscore the strategic significance of data governance (Al-Ruithe & Benkhelifa, 2017;
Murray, 2023). This affirmation reverberates across the entire organizational fabric,
establishing a guiding precedent for all departments to prioritize and proactively engage with
Furthermore, this sponsorship can serve as a catalyst for introducing incentives, a potent
strategy to induce compliance with data integrity regulations among both employees
and pharmaceutical companies (Yang et al., 2010). These incentives, taking the form of
integrity issues or for the successful implementation of robust data security measures (Yang et
al., 2010). Such incentives motivate employees and organizations to uphold
high standards of data integrity (Yang et al., 2010).
Within the dynamic landscape of the pharmaceutical industry, the integration of comprehensive
employee data competency training holds pivotal importance. This specialized training
initiative aims to enhance the data literacy and proficiency of the workforce, all while fostering
a deep understanding of regulatory compliance directives (Alhassan et al., 2019; Shafiei et al.,
mitigate the risks linked with mishandling data, ensuring the precision, credibility, and security
The complex nature of the pharmaceutical industry amplifies the importance of ensuring
that each staff member has strong data competency and a clear understanding of
regulatory protocols (Alhassan et al., 2019; MHRA, 2015; PIC, 2021; Shafiei et al., 2015).
Through these meticulously crafted training endeavors, a culture rooted in conscientious data
making, regulatory adherence, and the optimal utilization of data assets (Alhassan et al., 2019;
Shafiei et al., 2015). Within the pharmaceutical context, characterized by its unwavering
commitment to precision, safety, and compliance, furnishing employees with the aptitude to
navigate the intricate data terrain not only enhances operational efficiency but also reinforces
the industry's steadfast dedication to achieving excellence driven by data (Truong et al., 2017).
Adequate training for analysts and operators is an essential prerequisite to ensure
adherence to current Good Manufacturing Practice (cGMP) requisites and empower them to
access all data pertinent to cGMP processes when required (Charoo et al., 2023). This
and an understanding of the crucial role of accurate and accessible data in upholding regulatory
An essential facet is providing employees with comprehensive training on the
requirements of 21 CFR Part 11, which pertains to electronic records and signatures
(Charoo et al., 2023). A cardinal directive mandates the contemporaneous recording of data,
that is, recording it as events occur (Charoo et al., 2023). To address this requirement,
analysts should be trained to document their activities as soon as they are completed, and
mechanisms must be firmly in place to ensure adherence to this procedural norm (Charoo et al., 2023).
Within the pharmaceutical realm, where precision, transparency, and unwavering regulatory
compliance form the bedrock, establishing well-defined roles and open channels of
communication is a powerful mechanism (Alhassan et al., 2019; Al-Ruithe et al., 2016;
Khatri & Brown, 2010; Murray, 2023). In this context, everyone within the organization is
empowered to uphold data integrity, aligning with the overarching mission of advancing
healthcare through meticulous data management (Pérez, 2017; Spivey, 2022; Unger, 2017).
clinical trials, regulatory adherence, and patient safety, a methodical approach to data handling
becomes imperative (Khin et al., 2020). By meticulously outlining roles and responsibilities,
their pivotal contributions to data governance and regulatory alignment (Alhassan et al., 2019;
Al-Ruithe et al., 2016; Khatri & Brown, 2010; Murray, 2023). This cultivation of awareness
security, and ethical utilization (Rebollo et al., 2015; Tountopoulos et al., 2014).
where data governance principles naturally interweave with day-to-day operational practices
(Alhassan et al., 2019). This concerted endeavor ensures that precision, transparency, and
regulatory adherence remain steadfast at every level, fortifying the bedrock of conscientious
data management and the pursuit of enhanced healthcare outcomes (Charoo et al., 2023; FDA,
In line with this, the FDA (2018) recommends the assignment of the system administrator role
to individuals not responsible for record content. This segregation of duties serves to thwart
unauthorized alterations to records, thereby upholding data integrity and security (FDA, 2018).
and secure storage solutions (Al-Ruithe et al., 2016; Iansiti et al., 2021). The pharmaceutical
amounts of information are generated and scrutinized for various purposes such as research,
development, clinical trials, regulatory adherence, and patient safety (Khin et al., 2020). In this
intricate terrain, tapping into the potential of advanced technologies, notably AI, not only
accelerates the processing and analysis of data but also augments the precision and accuracy of
the insights gleaned (Al-Ruithe et al., 2016; Iansiti et al., 2021). AI-driven analytics can uncover
patterns, correlations, and trends within massive datasets that might be difficult to discern
In essence, the infusion of cutting-edge technology into the pharmaceutical context transcends
mere efficiency gains. It serves as the bedrock for data-driven decision-making, enriches
research and development undertakings, and reinforces the industry's dedication to patient
safety and regulatory conformity (Alosert et al., 2022; Charoo et al., 2023; Khin et al., 2020).
aligns with the dynamic landscape of the sector, equipping pharmaceutical companies to
Furthermore, secure storage solutions become paramount due to the sensitive nature of
pharmaceutical data, which includes proprietary research, patient records, and regulatory
documentation (Shafiei et al., 2015). By employing advanced encryption techniques and secure
storage systems, pharmaceutical companies can safeguard critical data from unauthorized
access, breaches, and potential cyber threats (Iansiti et al., 2021; Khin et al., 2020; Shafiei et
al., 2015). This fortified data infrastructure not only ensures compliance with stringent
regulatory requirements but also fosters trust among stakeholders, including regulatory
agencies, partners, and patients (Shafiei et al., 2015). The integration of cutting-edge
technological solutions, including advanced tools like electronic laboratory notebooks (ELN),
(MES), has emerged as a cornerstone in augmenting data management practices within the
biopharmaceutical sector (Weiss, 2022). In this regard, Electronic Laboratory Notebook (ELN)
systems enable real-time data recording during activities, ensuring compliance with the
software platforms offer the capability to record activity details in a secure audit trail that
7.5 Collaboration
and various business units emerges as a fundamental catalyst, facilitating the seamless
integration of diverse data sources and fostering cohesive decision-making across the
organization (Khatri & Brown, 2010). The pharmaceutical industry operates within a complex
landscape where departments such as research, clinical trials, regulatory affairs, manufacturing,
and data management each generate and manage a wealth of data (Khatri & Brown, 2010).
Promoting collaboration between these units, Data Departments, and IT ensures that data flows
This collaboration takes on added significance in the pharmaceutical context due to the
industry's intricate demands and stringent regulations that must be adhered to, all while
maintaining a patient-centric focus (Khatri & Brown, 2010). The amalgamation of data sharing
empowers departments to access precise and current information, enhancing the efficiency of
treatments and therapies, the symbiotic relationship between IT, Data Departments, and
business units nurtures an environment where data-driven insights can guide and expedite
substantively to the advancement of healthcare outcomes (Cheong & Chang, 2007; Redman,
By championing the value of data stewardship across all ranks, a shared sense of collective
ownership and commitment to the data governance framework is fostered (Rosenbaum, 2010;
Shafiei et al., 2015). This cultural shift goes beyond the mere adoption of technologies; it
encapsulates a mindset where data is recognized as a strategic asset, and its responsible
management is ingrained in the fabric of the organization's ethos (Al-Ruithe et al., 2016; Charoo
The pharmaceutical industry operates within a framework that requires meticulous attention to
detail, adherence to regulatory requirements, and the pursuit of scientific breakthroughs (Iansiti
et al., 2021; Rosenbaum, 2010). A data-driven culture equips employees with the tools and
mindset to make informed decisions, grounded in accurate and reliable data (Shafiei et al.,
processes, enhance research and development endeavors, and optimize patient care (Charoo et
In an industry where strict adherence to regulations takes precedence, aligning data governance
practices with compliance requirements becomes imperative (FDA, 2018; Seddon & Currie,
the integrity of data is upheld while potential risks are mitigated (Charoo et al., 2023; Shafiei
et al., 2015).
Establishing clear data processes and procedures forms a cornerstone in effective data
governance within the pharmaceutical industry (Charoo et al., 2023; Shafiei et al., 2015). By
delineating step-by-step guidelines for data collection, storage, usage, sharing, and disposal,
pharmaceutical companies ensure consistency, accuracy, and compliance throughout their data
lifecycle (Weiss, 2022). These standardized processes streamline operations, mitigate errors,
and enhance data integrity, allowing the organization to make informed decisions based on
The importance of clear data processes and procedures is heightened in the pharmaceutical realm
due to its intricate regulatory environment and the paramount significance of data in domains
like drug development, clinical trials, and patient safety (Alosert et al., 2022; Charoo et al.,
2023; Khin et al., 2020). Regulatory bodies mandate transparent documentation of data
handling practices to validate research outcomes and safeguard patient well-being (FDA, 2018).
By adhering rigorously to meticulously defined data processes, pharmaceutical entities not only
harmonize with regulatory mandates but also reduce the risk of non-compliance, data breaches,
and erroneous conclusions (Alosert et al., 2022; Charoo et al., 2023; Khin et al., 2020).
In crafting these processes and procedures, pharmaceutical companies can draw inspiration
Consistent, Enduring). These frameworks ensure the quality, traceability, and authenticity of
data (Alharbi et al., 2021; Chubb, 2021; Fox, 2019; Roe, 2021). Additionally, integrating
principles from current Good Manufacturing Practice (cGMP) standards further enhances the
reliability and integrity of data (Spivey, 2022). cGMP, requiring meticulous documentation
and standardized processes, plays a pivotal role in sustaining product quality and patient safety
(Spivey, 2022).
After defining clear procedures, communication emerges as a critical aspect, encompassing the
(Ladley, 2019). This highlights the urgency for well-structured communication strategies
(Morabito, 2015).
While this study has provided valuable insights into data governance within the pharmaceutical
industry, certain limitations warrant acknowledgment and suggestions for future research. One
limitation is the sample size, as the survey garnered responses from only 19 participants,
potentially limiting the breadth of coverage across the industry. Although the findings are
informative, a larger and more diverse sample could provide a more comprehensive
conduct one-on-one interviews due to confidentiality constraints restricted the depth of insights.
interviews or utilize alternative methods that ensure anonymity, thereby facilitating more
Furthermore, the global nature of the pharmaceutical industry introduces a geographical aspect
regulatory frameworks and industry practices could impact the results. To address this, future
studies could focus on specific geographic zones, delving into the intricacies of data governance
issues within that region. By narrowing the scope, researchers can gather more localized, in-
depth insights that align with the regulatory and operational nuances of that specific zone. This
approach would yield more targeted and actionable recommendations for companies operating
In retrospect, mitigating the limitations of sample size and participant interaction, while
adopting a more localized research focus, would enhance the comprehensiveness and
9 Conclusion
This thesis has undertaken a thorough investigation into data governance within the
pharmaceutical industry, yielding valuable insights into the core factors that contribute to
effective data management. Through a careful blend of literature review and survey analysis,
The literature review has shed light on various aspects, highlighting the importance of data
management, data governance, the challenges related to data integrity in pharmaceuticals, data
governance in cloud computing environments, and the data lifecycle in BioPharma. While
earlier studies have emphasized the significance of data governance and integrity across
understanding the specific challenges and best practices for implementing data governance
within the context of digital transformation in this industry. While some research has explored
the adoption of specific tools to improve data integrity, comprehensive studies examining data
governance frameworks and their impact on digital transformation outcomes are lacking.
As the landscape of drug development evolves with the introduction of new approaches,
questions arise regarding the suitability of existing informatics and hardware systems for small-
biotechnology companies with considerable purchasing power but limited legacy infrastructure
presents an opportunity for a fresh and innovative approach to traditional informatics (Weiss,
2022).
The adoption of these new technologies is driven by their potential to offer a holistic approach
to BioPharma Lifecycle Management (Weiss, 2022). This approach involves utilizing out-of-
data repositories based on F.A.I.R principles, and integrated analytics to support business
intelligence through a unified digital platform (Alharbi et al., 2021, 2023; Fox, 2019; Wise et
al., 2019). The success of these technologies hinges on their ease of implementation, immediate
Building upon this context, this thesis developed a survey to delve deeply into the real
challenges faced by the pharmaceutical industry and identified seven critical success factors to
address these issues. The significance of obtaining sponsorship from organizational leaders,
fostering a culture that values data, and defining clear roles and responsibilities has been
emphasized. These elements collectively provide a strong foundation for the implementation of
promoting collaboration between IT, Data Departments, and business units have been identified
as essential components that contribute to the smooth integration of data sources and enhance
decision-making processes. Furthermore, the adoption of clear data processes and procedures,
guided by principles such as F.A.I.R., ALCOA+, and cGMP, ensures data integrity, regulatory
Drawing from these insights, tailored recommendations can be developed for different types of
pharmaceutical companies. While centralized data governance may suit larger organizations
requiring a unified approach, federated models might benefit companies with distinct business
units seeking autonomy. Hybrid approaches, blending elements from both models, could
provide flexibility and adaptability for companies with diverse data needs. The implications of
this research extend widely. Effective data governance empowers pharmaceutical companies to
optimize decision-making, innovate with confidence, and align with regulatory requirements.
Future research could explore the measurable impact of different data governance models on
governance practices could provide valuable insights into evolving industry trends.
10 References
healthcare-examples/
Abu-Elkheir, M., Hayajneh, M., & Ali, N. A. (2013). Data Management for the Internet of
https://doi.org/10.3390/s131115582
Alharbi, E., Skeva, R., Juty, N., Jay, C., & Goble, C. (2021). Exploring the Current Practices,
https://doi.org/10.1162/dint_a_00109
Alharbi, E., Skeva, R., Juty, N., Jay, C., & Goble, C. (2023). A FAIR-Decide framework for
Alhassan, I., Sammon, D., & Daly, M. (2016). Data governance activities: An analysis of the
https://doi.org/10.1080/12460125.2016.1187397
Alhassan, I., Sammon, D., & Daly, M. (2019). Critical Success Factors for Data Governance:
https://doi.org/10.1080/10580530.2019.1589670
Alosert, H., Savery, J., Rheaume, J., Cheeks, M., Turner, R., Spencer, C., Farid, S. S., &
Goldrick, S. (2022). Data integrity within the biopharmaceutical sector in the era of
https://doi.org/10.1002/biot.202100609
Al-Ruithe, M., & Benkhelifa, E. (2017). Analysis and Classification of Barriers and Critical
Al-Ruithe, M., Benkhelifa, E., & Hameed, K. (2016). Key Dimensions for Cloud Data
Governance. 2016 IEEE 4th International Conference on Future Internet of Things and
Al-Ruithe, M., Benkhelifa, E., & Hameed, K. (2019). A systematic literature review of data
governance and cloud data governance. Personal and Ubiquitous Computing, 23(5),
839–859. https://doi.org/10.1007/s00779-017-1104-3
Badger, M. L., Grance, T., Patt-Corner, R., & Voas, J. (2012). Cloud computing synopsis and
https://www.who.int/news/item/28-11-2017-1-in-10-medical-products-in-developing-
countries-is-substandard-or-falsified
International. https://bioprocessintl.com/manufacturing/information-technology/a-
harmonized-approach-to-data-integrity/
ProQuest.
https://www.proquest.com/openview/7f3db1944e62aa8e5476538565c86ca2/1?pq-
origsite=gscholar&cbl=18750
Charoo, N. A., Khan, M. A., & Rahman, Z. (2023). Data integrity issues in pharmaceutical
Cheong, L. K., & Chang, V. (2007). The Need for Data Governance: A Case Study.
Chia, J. (2023). The Modern Data Stack Explained: What The Future Holds. Alation.
https://www.alation.com/blog/modern-data-stack-explained/
Chubb, P. (2021, December 7). Setting the Standard: FAIR & ALCOA+ in research during the
Code of Federal Regulations. (2022). 21 CFR Part 211—Current Good Manufacturing Practice
https://www.ecfr.gov/current/title-21/chapter-I/subchapter-C/part-211
Cox. (2019, July 16). US FDA Warning Letter Hits Strides On Data Integrity. Pharma
Intelligence. https://pink.pharmaintelligence.informa.com/PS140515/US-FDA-
Warning-Letter-Hits-Strides-On-Data-Integrity
DAMA International. (2009). The DAMA Guide to The Data Management Body of Knowledge
https://doi.org/10.5555/3165209
Dogo, E. M., Salami, A. F., & Salman, S. (2013). Feasibility Analysis of Critical Factors
https://uilspace.unilorin.edu.ng/handle/20.500.12484/2832
Eglovitch, J. (2022). Experts Say FDA Enforcement Focus Unchanged, Use Of Alternative
http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2010/03/WC500075028.pdf
EudraLex. (2011). The Rules Governing Medicinal Products in the European Union: Vol. 4.
Good Manufacturing Practice Medicinal Products for Human and Veterinary
https://health.ec.europa.eu/system/files/2016-11/annex11_01-2011_en_0.pdf
health-and-care/european-health-data-space_en
FDA. (2004). Guidance for Industry PAT - A Framework for Innovative Pharmaceutical
https://www.fda.gov/media/71012/download
FDA. (2016). Submission of Quality Metrics Data Guidance for Industry. U.S. Department of
https://www.fda.gov/media/93012/download
FDA. (2018). Data Integrity and Compliance With Drug CGMP Questions and Answers
Guidance for Industry. U.S. Department of Health and Human Services Food and Drug
Administration. https://www.fda.gov/media/119267/download
FDA. (2003). Part 11, Electronic Records; Electronic Signatures—Scope and Application. U.S.
information/search-fda-guidance-documents/part-11-electronic-records-electronic-
signatures-scope-and-application
FDA warning letter. (2015). FDA warning letter Zhejiang Hisun Pharmaceutical Co., Ltd.
https://www.fdanews.com/ext/resources/files/2016/01/01-12-16-
ZhejiangHisunPharma.pdf?1520830779
FDA warning letter. (2017). FDA warning letter FACTA Farmaceutici S.p.A. MARCS-CMS
495986. https://www.fda.gov/inspections-compliance-enforcement-and-criminal-
investigations/warning-letters/facta-farmaceutici-spa-495986-01132017
FDA warning letter. (2021, October 8). BBC Group Limited - 614659 - 08/04/2021. FDA.
https://www.fda.gov/inspections-compliance-enforcement-and-criminal-
investigations/warning-letters/bbc-group-limited-614659-08042021
Felici, M., Koulouris, T., & Pearson, S. (2013). Accountability for Data Governance in Cloud
Fleming, N. (2018). How artificial intelligence is changing drug discovery. Nature, 557(7707),
S55–S57. https://doi.org/10.1038/d41586-018-05267-x
Floryanzia, S., Ramesh, P., Mills, M., Kulkarni, S., Chen, G., Shah, P., & Lavrich, D. (2022).
https://pharmaphorum.com/views-analysis-digital/leveraging-the-fair-principles-of-
data-in-pharma
https://www.fda.gov/media/84744/download
HDS certification: Healthcare data hosting | OVHcloud. (n.d.). Retrieved September 18, 2023,
from https://www.ovhcloud.com/en-gb/enterprise/certification-conformity/hds/
Henstock, P. V. (2019). Artificial Intelligence for Pharma: Time for Internal Investment.
https://doi.org/10.1016/j.tips.2019.05.003
Hodgson, D., Maini, F., Greenrose, W., Christiani, S., Chan, S., & Hargitai, B. (2017). Under
https://www2.deloitte.com/content/dam/Deloitte/uk/Documents/life-sciences-health-
care/deloitte-uk-data-integrity-report.pdf
Holub, P., Kohlmayer, F., Prasser, F., Mayrhofer, M. T., Schlunder, I., Martin, G. M., & Litton,
https://www.timextender.com/blog/data-empowered-leadership/the-modern-data-
stack-is-broken
Huff, N. S. (2019). Maintaining data integrity.
https://compliancecosmos.org/maintaining-data-integrity
Iansiti, M., Lakhani, K. R., Mayer, H., & Herman, K. (2021). Moderna (A). Harvard Business
School Publishing.
IDBS. (n.d.). IDBS - The FAIR principles: A quick introduction. IDBS. Retrieved May 3, 2023,
from https://www.idbs.com/the-fair-principles-a-quick-introduction/
Khanghahi, N., & Ravanmehr, R. (2013). Cloud Computing Performance Evaluation: Issues
Khatri, V., & Brown, C. V. (2010). Designing data governance. Communications of the ACM,
Khin, N. A., Francis, G., Mulinde, J., Grandinetti, C., Skeete, R., Yu, B., Ayalew, K., Cho, S.-
J., Fisher, A., Kleppinger, C., Ayala, R., Bonapace, C., Dasgupta, A., Kronstein, P. D.,
& Vinter, S. (2020). Data Integrity in Global Clinical Trials: Discussions From Joint
Ko, R. K. L., Jagadpramana, P., Mowbray, M., Pearson, S., Kirchberg, M., Liang, Q., & Lee,
https://doi.org/10.1109/SERVICES.2011.91
Koh, S. C. L., Gunasekaran, A., & Goodman, T. (2011). Drivers, barriers and critical success
factors for ERPII implementation in supply chains: A critical analysis. The Journal of
https://doi.org/10.1016/j.jsis.2011.07.001
Ladley, J. (2019). Data Governance: How to Design, Deploy, and Sustain an Effective Data
https://books.google.fr/books?id=AkW9DwAAQBAJ&lpg=PP1&ots=OO7TJLvFwG
&dq=Ladley%20J.%20Data%20Governance%20Program&lr&hl=fr&pg=PR12#v=on
epage&q&f=false
Leesakul, N., Oostveen, A.-M., Eimontaite, I., Wilson, M. L., & Hyde, R. (2022). Workplace
https://doi.org/10.3390/su14063311
challenges-to-machine-learning-adoption-and-implementation-in-the-lab-26870
Machado, I. A., Costa, C., & Santos, M. Y. (2022). Data Mesh: Concepts and Principles of a
https://doi.org/10.1016/j.procs.2021.12.013
Marcelo Corrales Compagnucci, Michael Lowery Wilson, Mark Fenwick, Nikolaus Forgó, &
https://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=3341835&site=e
host-live&scope=site
Marshall, I. J., & Wallace, B. C. (2019). Toward systematic review automation: A practical
guide to using machine learning tools in research synthesis. Systematic Reviews, 8(1),
Mary, B., Mccarthy, P., & Hill, S. (2011). Cloud adoption points to IT risk and data governance
McDowall, R. (2020). Nightmare on lab street: Are you haunted by hybrid systems? Agilent
https://www.agilent.com/cs/library/articlereprints/public/Agilent-LCGC-ebook-data-
integrity-tips-for-regulated-laboratories-part-1.pdf
MHRA. (2015). MHRA GMP Data Integrity Definitions and Guidance for Industry March
content/uploads/2015/04/Data_integrity_definitions_and_guidance_v2.pdf
Morabito, V. (2015). Big Data Governance. In V. Morabito (Ed.), Big Data and Analytics:
https://doi.org/10.1007/978-3-319-10665-6_5
Murray, S. (2023). Organizing Talent: Return Of The Data Center Of Excellence. Monte Carlo
Data. https://www.montecarlodata.com/blog-data-center-of-excellence/
Nadal, S., Abelló, A., Romero, O., Vansummeren, S., & Vassiliadis, P. (2023). Graph-Driven
Neumeyer, M. (2020, June 23). Data Integrity: 2020 FDA Data Integrity Observations in
Review. http://www.americanpharmaceuticalreview.com/Featured-Articles/565600-
Data-Integrity-2020-FDA-Data-Integrity-Observations-in-Review/
Otto, B. (2015). Quality and Value of the Data Resource in Large Enterprises. Information
https://doi.org/10.1080/10580530.2015.1044344
Pérez, J. R. (2017). Maintaining Data Integrity Avoiding regulator scrutiny in the medical
content/uploads/2017/10/Article-Data-Integrity.pdf
PIC/S. (2021). Good Practices for Data Management and Integrity in Regulated GMP/GDP
Rattan, A. K. (2018). Data Integrity: History, Issues, and Remediation of Issues. PDA Journal
https://doi.org/10.5731/pdajpst.2017.007765
Rebollo, O., Mellado, D., Fernández-Medina, E., & Mouratidis, H. (2015). Empirical
https://doi.org/10.1016/j.infsof.2014.10.003
Redman, T. C. (2013). Data’s Credibility Problem. Harvard Business Review, 91(12), 84–88.
Rifaie, M., Alhajj, R., & Ridley, M. (2009). Data governance strategy: A key issue in building
https://doi.org/10.1145/1806338.1806449
Roe, R. (2021). Lack of FAIR data reduces life sciences innovation in laboratory informatics.
30–31.
Entities and Advancing Data Access. Health Services Research, 45(5p2), 1442–1455.
https://doi.org/10.1111/j.1475-6773.2010.01140.x
https://www.lifescienceleader.com/doc/how-to-avoid-data-integrity-woes-in-pharma-
0001
Seddon, J. J. M., & Currie, W. L. (2013). Cloud computing and trans-border health data:
Unpacking U.S. and EU healthcare regulation and compliance. Health Policy and
Self, R. J. (2014). Governance Strategies for the Cloud, Big Data, and Other Technologies in
Selvaraj, S., & Sundaravaradhan, S. (2019). Challenges and opportunities in IoT healthcare
https://doi.org/10.1007/s42452-019-1925-y
Shafiei, N., Montardy, R. D., & Rivera-Martinez, E. (2015). Data Integrity—A Study of Current
Spivey, C. (2022). Ensuring CGMP Standards for Data Integrity. In the Lab eNewsletter, 17(6).
https://www.pharmtech.com/view/ensuring-cgmp-standards-for-data-integrity
Tallon, P. P., Ramirez, R. V., & Short, J. E. (2013). The Information Artifact in IT Governance:
Tountopoulos, V., Felici, M., Pannetrat, A., Catteddu, D., & Pearson, S. (2014). Interoperability
Analysis of Accountable Data Governance in the Cloud. In F. Cleary & M. Felici (Eds.),
https://doi.org/10.1007/978-3-319-12574-9_7
Truong, T., George, R., & Davidson, J. (2017). Establishing an Effective Data Governance
Unger, B. (2017). Best Practices For Data Integrity Oversight At Your Contract Manufacturer.
data-integrity-oversight-at-your-contract-manufacturer-0001
Vamathevan, J., Clark, D., Czodrowski, P., Dunham, I., Ferran, E., Lee, G., Li, B., Madabhushi,
A., Shah, P., Spitzer, M., & Zhao, S. (2019). Applications of machine learning in drug
https://doi.org/10.1038/s41573-019-0024-5
Van Vlijmen, H., Mons, A., Waalkens, A., Franke, W., Baak, A., Ruiter, G., Kirkpatrick, C.,
da Silva Santos, L. O. B., Meerman, B., Jellema, R., Arts, D., Kersloot, M.,
Knijnenburg, S., Lusher, S., Verbeeck, R., & Neefs, J.-M. (2020). The Need of Industry
Weber, K., Otto, B., & Österle, H. (2009). One Size Does Not Fit All—A Contingency
Approach to Data Governance. Journal of Data and Information Quality, 1(1), 4:1-4:27.
https://doi.org/10.1145/1515693.1515696
Wende, K. (2007). A Model for Data Governance – Organising Accountabilities for Data
Wise, J., De Barron, A. G., Splendiani, A., Balali-Mood, B., Vasant, D., Little, E., Mellino, G.,
Harrow, I., Smith, I., Taubert, J., Van Bochove, K., Romacker, M., Walgemoed, P.,
Jimenez, R. C., Winnenburg, R., Plasterer, T., Gupta, V., & Hedley, V. (2019).
Wise, J., Möller, A., Christie, D., Kalra, D., Brodsky, E., Georgieva, E., Jones, G., Smith, I.,
Greiffenberg, L., McCarthy, M., Arend, M., Luttringer, O., Kloss, S., & Arlington, S.
(2018). The positive impacts of Real-World Data on the challenges facing the evolution
https://doi.org/10.1016/j.drudis.2018.01.034
Yang, L., Sun, G., & Eppler, M. J. (2010). Making Strategy Work: A Literature Review on the
https://doi.org/10.4337/9781849807289.00015