Data Curation and Management

This document outlines a framework for data curation and management. It discusses key components such as data governance, storage, quality, integration, preservation, sharing, ownership, security, and challenges. Real-world examples of data curation frameworks from NOAA, TCGA, HCA, and EOSC are provided. The significance of a data curation framework includes improved data quality, increased efficiency, cost reduction, better decision making, and compliance with regulations.


University of Gondar

College of Informatics
Department of Information Science
Data Science and Analytics Postgraduate Program

Course name: Advanced Data Curation and Management

Title: Data curation and management framework

Prepared By:

1. Mohammed Seid

Submitted to: Assefa Chekole (Asst. Prof.)


Date of submission: 30 May 2015 E.C.
Data Curation And management 2015
E.C

Table of Contents
Introduction
Significance
Key components of data curation and management
Data governance:
Benefits of Data Governance
Data storage
Data quality
Data integration
Data preservation
Data Preservation Planning:
Data sharing and reuse
Data ownership and intellectual property
Data curation:
Data security
Some common challenges in data curation and management include:
Tools and technologies
Real-world examples for data curation and management framework
The National Oceanic and Atmospheric Administration (NOAA)
The Cancer Genome Atlas (TCGA)
The Human Cell Atlas (HCA)
The European Open Science Cloud (EOSC)
Related work Review Summary
Conclusion
Reference

Introduction
A data curation and management framework combines two concepts: data curation and data
management. The first is defined as follows: “Data curation is the active and on-going
management of data through its lifecycle of interests and usefulness to scholarship, science, and
education. Data curation activities enable data discovery and retrieval, maintain its quality, add
value, and provide re-use over time, and this new field includes authentication, archiving,
management, preservation, retrieval, and representation” [1]. Data curation is the process by
which data are put into a state to be managed so that they can be understood and used by parties
across disciplines and organizations. The passage of time should not prohibit the use of the data.
This requires that appropriate measures be undertaken to ensure data infrastructure, searchability,
availability, and preservation [2].

On the other hand, BusinessDictionary.com defines data management as the “administrative
process by which the required data is acquired, validated, stored, protected, and processed, and
by which its accessibility, reliability, and timeliness is ensured to satisfy the needs of data users”
(BusinessDictionary.com, 2012). Data must also be continuously extended, updated, and made
secure for reuse through consultation (Otlet, 1903; Bush, 1945; Rayward, 1998) and “discursive
formation” (Foucault, 1972) to remain useful to science [1].

A data curation and management framework is a structured approach that guides the
organization, storage, preservation, and sharing of data throughout its lifecycle. The framework
helps ensure that data is accurate, reliable, and accessible to users while protecting its integrity
and confidentiality. It helps to make data discoverable, accessible, and intelligible, and supports
data reuse. It involves policies, procedures, and technologies that enable organizations to control
data and ensure its availability, security, and usability, so that they can manage and use their
data assets effectively. This, in turn, can help organizations to make better decisions, improve
their operations, and achieve their goals [3].

Significance

The significance and benefits of a data curation and management framework are numerous and
can have a significant impact on an organization's operations and success.

1. Improved Data Quality: A data curation and management framework can help to
improve the quality of data by ensuring that it is accurate, complete, and consistent. This
can help to ensure that decisions are made based on reliable data, leading to better
outcomes and improved organizational performance.
2. Increased Efficiency: A data curation and management framework can help to
increase organizational efficiency by making data more easily accessible and reducing
the time and effort required to find and analyze data.
3. Cost Reduction: A data curation and management framework can help to reduce costs
by streamlining data management processes, reducing data redundancy, and
improving data quality. This can lead to cost savings in areas such as storage, data
management, and data analysis.
4. Improved Decision-Making: A data curation and management framework can help to
improve decision-making by providing accurate and relevant data in a timely manner.
This can help organizations to make informed decisions that are based on data-driven
insights.
5. Enhanced Data Security: A data curation and management framework can help to
enhance data security by implementing appropriate data security measures such as access
controls, encryption, and data backup and recovery. This can help to prevent data
breaches and other security incidents that can lead to significant financial, legal,
and reputational damage.
6. Compliance with Regulations: A data curation and management framework can help
organizations to comply with relevant regulations such as data privacy laws, data security
standards, and industry-specific regulations. This can help to avoid legal and regulatory
penalties and ensure that the organization is operating in a responsible and ethical
manner.

Key components of data curation and management

Data governance:
Data governance is the process of establishing policies, procedures, and standards to guide data
management activities. Data governance specifies a cross-functional framework for managing
data as a strategic enterprise asset. In doing so, data governance specifies decision rights and
accountabilities for an organization’s decision making about its data. Furthermore, data
governance formalizes data policies, standards, and procedures and monitors compliance.

Figure 1: Conceptual framework for data governance, adopted from [3]

Benefits of Data Governance

An effective data governance strategy provides many benefits to an organization, including:

• A common understanding of data: Data governance provides a consistent view of, and
common terminology for, data, while individual business units retain appropriate
flexibility.
• Improved quality of data: Data governance creates a plan that ensures data accuracy,
completeness, and consistency.
• Data map: Data governance provides an advanced ability to understand the location of
all data related to key entities. Like a GPS that can represent a
physical landscape and help people find their way in unknown landscapes, data
governance makes data assets usable and easier to connect with business outcomes.
• A 360-degree view of each customer and other business entities: Data governance
establishes a framework so an organization can agree on “a single version of the truth”
for critical business entities and create an appropriate level of consistency across entities
and business activities.

• Consistent compliance: Data governance provides a platform for meeting the
demands of government regulations, such as the EU General Data Protection Regulation
(GDPR), the US HIPAA (Health Insurance Portability and Accountability Act), and
industry requirements such as PCI DSS (Payment Card Industry Data Security
Standards).
• Improved data management: Data governance brings the human dimension into a
highly automated, data-driven world. It establishes codes of conduct and best practices in
data management, making certain that the concerns and needs beyond traditional data and
technology areas (including areas such as legal, security, and compliance) are
addressed consistently.

Data storage
Data storage means implementing appropriate storage solutions to ensure data availability,
security, and scalability. It is a key component of any data curation and management
framework. Some of the main considerations for data storage include:

1. Storage medium: The main options for storage media are:

• Hard drives: Conventional spinning hard drives are cheap but slower. Solid state
drives are faster but more expensive. For large data storage, hard drive arrays (RAID)
are often used.
• Tape: Tape storage is very cheap but slower to access. It is good for archival
storage of large amounts of data that is not accessed frequently.
• Cloud storage: Cloud storage services like Amazon S3 and Google Cloud Storage are
inexpensive, scalable, and redundant. However, data stored in the cloud may raise
security and privacy concerns.

2. Storage architecture: The storage infrastructure can be designed for high performance,
scalability, redundancy, etc. Some options include:

• Direct attached storage: Storage directly attached to a server. Simple but limited
scalability.

• Storage area network: A separate storage network with storage arrays and storage
devices. Provides more flexibility and scalability.
• Object storage: A distributed storage architecture optimized for storing and
managing large amounts of unstructured data. Scalable and redundant. Used by
cloud storage services.

3. Storage redundancy: To ensure high availability and prevent data loss, storage
redundancy is important. This can include:

• RAID: Uses multiple hard drives to duplicate and spread data across the drives.
Protects against drive failures.
• Replication: Duplicating data across multiple storage systems in different
locations. Protects against system failures.
• Erasure coding: A method of breaking up data into fragments, expanding and
encoding the data with redundant fragments, and storing across different
locations. Also provides redundancy.
• Snapshots: Used to create point-in-time copies of data that can be restored if the
current data gets corrupted or lost.

4. Storage management: Software tools are needed to manage and monitor storage
systems and capacity. These include solutions for backup, archiving, hierarchical
storage management, and storage optimization.
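
The redundancy idea behind parity-based RAID levels and erasure coding can be sketched with a single XOR parity block. This is only a minimal illustration, assuming three equally sized data blocks and the loss of exactly one of them; the function names are hypothetical, and real systems use stronger codes that tolerate more failures.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must be non-empty in number and equal in length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Compute one redundant block from the data blocks."""
    return xor_blocks(data_blocks)


def recover_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single lost block from the survivors plus the parity block."""
    return xor_blocks(surviving_blocks + [parity])


# Three data blocks as they might be spread across three drives (illustrative values).
blocks = [b"DATA", b"CURE", b"2015"]
parity = make_parity(blocks)

# Simulate losing the second drive and rebuilding its block.
recovered = recover_missing([blocks[0], blocks[2]], parity)
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields the lost block again, at the cost of storing one extra block rather than a full copy.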

There are several common data storage strategies used for data curation, and the choice of
strategy depends on the type of data, the volume of data, and the intended use of the data. Some
of the most common data storage strategies for data curation include:

• Relational Database Management Systems (RDBMS): An RDBMS is a traditional data
storage strategy that stores data in a structured format using tables with predefined
relationships between them. An RDBMS can handle large volumes of structured data and
provides efficient querying and data retrieval capabilities.
• NoSQL Databases: NoSQL databases are designed to handle unstructured and
semi-structured data that does not fit well into the rigid structure of traditional relational
databases. NoSQL databases are highly scalable and can handle large volumes of data,
making them an excellent choice for big data applications.
• Data Warehouses: Data warehouses are specialized databases that store large volumes
of structured data for analytical purposes. They are optimized for complex
queries and data analysis, and they typically integrate data from multiple sources to
provide a comprehensive view of an organization's data.
• Data Lakes: Data lakes are a newer data storage strategy that stores data in its raw,
often unstructured form. They provide a flexible and scalable data storage solution that can
handle large volumes of data from disparate sources. Data lakes are ideal for big data
applications where data is collected for later analysis.
• Cloud-Based Storage: Cloud-based storage is a popular option for data curation due to
its scalability, flexibility, and cost-effectiveness. Cloud-based storage solutions allow
organizations to store and manage large volumes of data without the need for expensive
hardware or infrastructure. Additionally, cloud-based storage solutions offer high
availability, data redundancy, and disaster recovery capabilities.

Data quality
Data quality means ensuring the accuracy, consistency, and reliability of data through
validation, cleaning, and enrichment processes. It is a critical aspect of data curation and management.
Inaccurate or incomplete data can lead to incorrect analysis and inappropriate decision-making.
A data curation and management framework should include strategies to ensure data
quality throughout the data lifecycle, from data acquisition to data dissemination. Some common
strategies for ensuring data quality in data curation and management include:

1. Data Profiling: Data profiling involves analyzing the data to identify patterns,
inconsistencies, and errors. This process can help to identify data quality issues and allow
for corrective action to be taken.

2. Data Cleaning: Data cleaning involves correcting or removing errors and inconsistencies
in the data. Data cleaning can be done manually or through automated tools, such as data
quality software.

3. Data Standardization: Data standardization involves ensuring that data is consistent and
conforms to specific standards. Standardizing data can help to improve data accuracy and
reduce errors.

4. Data Validation: Data validation involves checking the data against predefined rules or
constraints to ensure that it meets certain criteria. Data validation can help to identify and
correct errors early in the data lifecycle.
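
The four strategies above can be sketched on a small in-memory dataset. This is a minimal illustration using only the Python standard library; the sample records, the `COUNTRY_CODES` vocabulary, and the age rule are assumptions made for the example, not part of any particular data quality tool.

```python
from collections import Counter

# Illustrative records with typical problems: padding, a duplicate, a bad value.
records = [
    {"id": "1", "name": " Abebe ", "age": "34", "country": "ethiopia"},
    {"id": "2", "name": "Sara", "age": "", "country": "ET"},
    {"id": "2", "name": "Sara", "age": "", "country": "ET"},
    {"id": "3", "name": "Liya", "age": "-5", "country": "Ethiopia"},
]

COUNTRY_CODES = {"ethiopia": "ET", "et": "ET"}  # hypothetical standard vocabulary


def profile(rows):
    """Profiling: count missing values per field and list duplicated ids."""
    missing = Counter(k for r in rows for k, v in r.items() if v.strip() == "")
    id_counts = Counter(r["id"] for r in rows)
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    return {"missing": dict(missing), "duplicate_ids": duplicates}


def clean(rows):
    """Cleaning: trim whitespace and drop exact duplicate records."""
    seen, out = set(), []
    for r in rows:
        trimmed = {k: v.strip() for k, v in r.items()}
        key = tuple(sorted(trimmed.items()))
        if key not in seen:
            seen.add(key)
            out.append(trimmed)
    return out


def standardize(rows):
    """Standardization: map country spellings onto one code list."""
    return [
        {**r, "country": COUNTRY_CODES.get(r["country"].lower(), r["country"])}
        for r in rows
    ]


def validate(rows):
    """Validation: return (id, message) for each rule violation."""
    errors = []
    for r in rows:
        if r["age"] == "":
            errors.append((r["id"], "age is missing"))
        elif not r["age"].isdecimal() or not 0 <= int(r["age"]) <= 120:
            errors.append((r["id"], "age must be an integer between 0 and 120"))
    return errors


report = profile(records)
curated = standardize(clean(records))
problems = validate(curated)
```

Validation here only reports problems instead of silently fixing them, so that a curator can decide whether a missing or implausible value should be corrected, flagged, or removed.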

Data integration
Data integration is a critical component of data curation and management frameworks. It
involves the process of combining data from multiple sources and formats to create a unified
dataset. The goal of data integration is to provide users with a complete and coherent view of the
data relevant to their needs.

In a data curation and management framework, data integration typically involves several steps.
These may include:

1. Data discovery: This involves identifying the sources of data that need to be integrated.
This can include data from internal databases, external sources, or third-party providers.

2. Data mapping: This involves creating a mapping between the data elements in the
different data sources to identify common elements that can be used to integrate the data.

3. Data transformation: This involves converting data from one format to another to
ensure that it can be integrated with other data sources.

4. Data quality assurance: This involves ensuring that the data is accurate, complete, and
consistent across all integrated sources.

5. Data synchronization: This involves ensuring that the integrated data is kept up to date
and synchronized with the original data sources.
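
The mapping, transformation, and quality assurance steps above can be sketched for two small sources. This is a minimal illustration; the source schemas, field names, and sample values are hypothetical, and the conflict check only compares one field.

```python
from datetime import datetime

# Two hypothetical sources describing the same entities with different schemas.
clinic_rows = [
    {"patient_id": "P1", "dob": "1990-04-12", "weight_kg": 70.0},
    {"patient_id": "P2", "dob": "1985-11-03", "weight_kg": 82.5},
]
lab_rows = [
    {"PID": "P1", "BirthDate": "12/04/1990", "WeightLb": 154.3},
    {"PID": "P3", "BirthDate": "20/01/2001", "WeightLb": 132.3},
]

# Data mapping: which lab field corresponds to which field of the target schema.
LAB_FIELD_MAP = {"PID": "patient_id", "BirthDate": "dob", "WeightLb": "weight_kg"}
LB_PER_KG = 2.20462


def transform_lab(row):
    """Data transformation: rename fields, convert the date format and the unit."""
    out = {LAB_FIELD_MAP[k]: v for k, v in row.items()}
    out["dob"] = datetime.strptime(out["dob"], "%d/%m/%Y").date().isoformat()
    out["weight_kg"] = round(out["weight_kg"] / LB_PER_KG, 1)
    return out


def integrate(primary, secondary):
    """Merge on patient_id; the primary source wins, disagreements are reported."""
    merged = {r["patient_id"]: dict(r) for r in primary}
    conflicts = []
    for r in secondary:
        existing = merged.setdefault(r["patient_id"], dict(r))
        if existing["dob"] != r["dob"]:
            conflicts.append(r["patient_id"])  # quality assurance: flag for review
    return sorted(merged.values(), key=lambda r: r["patient_id"]), conflicts


unified, conflicts = integrate(clinic_rows, [transform_lab(r) for r in lab_rows])
```

Keeping the field map as data rather than code makes it easier to review the mapping and to add a further source later.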

Data preservation
Data preservation means safeguarding data against loss, degradation, or corruption and ensuring its
long-term accessibility. It is an essential component of a data curation and management
framework. It involves ensuring that data is stored and maintained over a long period of time and remains
accessible, usable, and understandable. Data preservation is necessary to ensure that data can be used for
future research, analysis, and decision-making.

The following are some of the key strategies for data preservation in a data curation and
management framework:

1. Data Documentation: Data documentation involves creating metadata and other
documentation that provides context about the data, including its origin, purpose, and
structure. This documentation helps to ensure that the data remains understandable and
usable over time.

2. Data Storage: Data storage involves choosing appropriate storage technologies that can
accommodate the volume and type of data being preserved. It is essential to choose
storage technologies that are scalable, reliable, and secure.

3. Data Backup: Data backup involves creating copies of the data and storing them in
multiple locations to ensure that data is not lost due to hardware failures, natural
disasters, or other unforeseen events. It is essential to have a backup and recovery plan in
place to ensure that data can be restored in the event of a disaster.

4. Data Migration: Data migration involves transferring data from one storage technology
to another as technology evolves or becomes obsolete. It is crucial to ensure that data
remains accessible and usable during the migration process.

5. Data Access: Data access involves providing access to the data to authorized users over a
long period of time. It is essential to have appropriate access controls in place to ensure
that the data is not accessed or used inappropriately.

Data Preservation Planning:


Data preservation planning involves creating a plan for preserving data over the long term. The
plan should include strategies for data documentation, storage, backup, migration, and access.
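
One way to support the backup strategy in such a plan is to record a checksum for every file at backup time and re-check it later. The sketch below is an assumption-laden illustration: checksums are not prescribed by the text above, the function names are hypothetical, and the example runs entirely inside a temporary directory.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_manifest(source: Path, backup_dir: Path) -> dict[str, str]:
    """Copy every file in source to backup_dir and record its checksum."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for item in sorted(source.iterdir()):
        if item.is_file():
            shutil.copy2(item, backup_dir / item.name)
            manifest[item.name] = sha256_of(item)
    return manifest


def verify_backup(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return names of files that are missing or whose checksum changed."""
    damaged = []
    for name, expected in manifest.items():
        copy = backup_dir / name
        if not copy.is_file() or sha256_of(copy) != expected:
            damaged.append(name)
    return damaged


with tempfile.TemporaryDirectory() as tmp:
    source, backup = Path(tmp) / "data", Path(tmp) / "backup"
    source.mkdir()
    (source / "survey.csv").write_text("id,age\n1,34\n")

    manifest = backup_with_manifest(source, backup)
    intact = verify_backup(backup, manifest)

    # Simulate silent corruption of the backup copy, then verify again.
    (backup / "survey.csv").write_text("id,age\n1,43\n")
    damaged = verify_backup(backup, manifest)
```

The same manifest can be reused after a migration to a new storage technology to confirm that the transferred files are unchanged.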

Data sharing and reuse


Data sharing and reuse means facilitating the sharing of data with external parties and promoting
the reuse of existing data for new purposes. Both are important aspects of the data curation and
management framework. Sharing data allows others to access and use the data for research,
analysis, and decision-making, which can lead to new insights and discoveries. Reusing data can
also help to reduce the cost and time required to collect new data.

The following are some key strategies for data sharing and reuse in a data curation and
management framework:

1. Data Documentation: Data documentation is essential for sharing and reusing
data. Documentation should include information about the data's origin, purpose,
structure, and quality. It should also include information about any restrictions on data
use or sharing.

2. Data Access: Data access should be provided to authorized users to ensure that the data
can be shared and reused. Access controls should be put in place to ensure that data is not
accessed or used inappropriately.

3. Data Standardization: Standardizing data can help to ensure that data is consistent and
can be easily shared and reused across different platforms and
applications. Standardization can include using common file formats, data structures,
and metadata standards.

4. Data Discovery: Data discovery involves making data easy to find and access by others.
This can involve publishing data in repositories or data portals, providing search tools,
and using appropriate metadata standards.

5. Data Licensing: Data licensing involves specifying the terms and conditions under
which data can be shared and reused. Licensing can include specifying the allowable uses
of the data, any restrictions on data sharing, and any attribution requirements.

6. Data Citation: Data citation involves providing a way to acknowledge and credit the
original data creators when data is shared and reused. Data citation can help to ensure that
data creators receive appropriate credit for their work.
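
Documentation, licensing, and citation can be tied together in a single machine-readable record that travels with the dataset. The sketch below is illustrative only: the field names, the sample values, and the citation layout are assumptions for the example and do not follow any specific metadata standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record for a shared dataset."""
    title: str
    creators: list[str]
    year: int
    publisher: str
    identifier: str   # a persistent identifier, for example a DOI
    license: str      # terms under which the data may be reused
    description: str = ""

    def citation(self) -> str:
        """Build a human-readable citation that credits the data creators."""
        authors = "; ".join(self.creators)
        return (
            f"{authors} ({self.year}). {self.title} [Data set]. "
            f"{self.publisher}. {self.identifier}"
        )

    def to_json(self) -> str:
        """Serialize the record so it can be published next to the data files."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values, not a real dataset.
record = DatasetRecord(
    title="Example household survey",
    creators=["Doe, J.", "Roe, R."],
    year=2023,
    publisher="Example Repository",
    identifier="doi:10.0000/example",
    license="CC-BY-4.0",
)
```

Calling `record.citation()` gives a string that reusers can paste into a reference list, while `record.to_json()` gives the form a repository or data portal could index for discovery.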

Data ownership and intellectual property


This component defines ownership rights, responsibilities, and intellectual property protections for data.
Data ownership and intellectual property are important considerations in the data curation and management
framework. Ownership and intellectual property rights determine who has the right to control, access, and
use the data. It is essential to establish clear ownership and intellectual property rights to prevent legal
disputes and ensure that data is used appropriately.

The following are some key strategies for addressing data ownership and intellectual property in
a data curation and management framework:

1. Data Ownership: Data ownership refers to the legal right to control, access, and use the
data. Data ownership may reside with individuals, organizations, or governments,
depending on the circumstances of data creation and collection. It is essential to establish
clear ownership of data to ensure that it is used appropriately.

2. Intellectual Property: Intellectual property refers to the legal rights, such as
patents, trademarks, and copyrights, associated with creative works. Intellectual property
rights can apply to data, particularly if the data contains original works or inventions. It is
essential to establish clear intellectual property rights to ensure that data is not used
inappropriately or without permission.

Data curation:
Data curation means managing and maintaining data, including metadata, to enhance its
discoverability, accessibility, and usefulness for users. It is a key component of data curation and
management frameworks. It involves selecting, organizing, and maintaining data to ensure its accuracy,
completeness, and usability. The goal of data curation is to ensure that data is reliable and relevant for its
intended use, and that it is easily accessible to those who need it.

In a data curation and management framework, data curation typically involves several steps.
These may include:

1. Data selection: This involves identifying the data that is most relevant to the project or
research question at hand.

2. Data acquisition: This involves obtaining the data from various sources, such as
databases, data repositories, or external sources.

3. Data cleaning: This involves removing any errors, duplicates, or inconsistencies in the
data to ensure its accuracy.

4. Data integration: This involves combining data from multiple sources and formats to
create a unified dataset.

5. Data documentation: This involves creating metadata and other documentation to
describe the data and its context.

Data security
Data security is an essential aspect of the data curation and management framework. Data
security involves protecting data from unauthorized access, modification, or destruction. It is
essential to implement appropriate data security measures to ensure that data is not accessed or
used inappropriately.

The following are some key strategies for data security in a data curation and management
framework:

1. Access Controls: Access controls ensure that only authorized users can access the data.
Access controls can include authentication, authorization, and encryption. It is essential
to implement appropriate access controls to prevent unauthorized access to the data.

2. Data Encryption: Data encryption involves converting data into a coded format that can
only be read by authorized users with the correct decryption key. Data encryption can
help to ensure that data is not accessed or used inappropriately.

3. Data Classification: Data classification involves categorizing data into different levels of
sensitivity or confidentiality. Different levels of data classification can have different
access controls and encryption requirements based on their sensitivity.

4. Data Backup and Recovery: Data backup and recovery involves creating copies of the
data and storing them in multiple locations to ensure that data is not lost due to hardware
failures, natural disasters, or other unforeseen events. It is essential to have a backup
and recovery plan in place to ensure that data can be restored in the event of a disaster.

5. Data Auditing: Data auditing involves monitoring data access and use to detect
unauthorized access or use. Data auditing can help to identify and mitigate potential
security threats to the data.

6. Data Retention and Destruction: Data retention and destruction involves establishing
policies for how long data is kept and when it should be destroyed. Data retention and
destruction policies can help to ensure that data is not retained longer than necessary and
is properly disposed of when no longer needed.
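
Classification, access control, auditing, and retention can be combined in one small policy check. The sketch below is a simplified illustration: the sensitivity levels, role names, and retention periods are hypothetical policy values, and a real system would add authentication and encryption on top.

```python
from datetime import date, timedelta
from enum import IntEnum


class Sensitivity(IntEnum):
    """Data classification levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Hypothetical policy: highest level each role may read, and retention per level.
ROLE_CLEARANCE = {
    "guest": Sensitivity.PUBLIC,
    "analyst": Sensitivity.INTERNAL,
    "steward": Sensitivity.CONFIDENTIAL,
}
RETENTION_DAYS = {
    Sensitivity.PUBLIC: 3650,
    Sensitivity.INTERNAL: 1825,
    Sensitivity.CONFIDENTIAL: 365,
}

audit_log = []


def can_read(role, level, dataset):
    """Access control: allow only roles cleared for the level; audit every attempt."""
    allowed = ROLE_CLEARANCE.get(role, Sensitivity.PUBLIC) >= level
    audit_log.append({"role": role, "dataset": dataset, "allowed": allowed})
    return allowed


def is_due_for_destruction(level, created, today):
    """Retention: data older than the period for its level should be destroyed."""
    return today - created > timedelta(days=RETENTION_DAYS[level])


ok = can_read("steward", Sensitivity.CONFIDENTIAL, "patients")
denied = can_read("analyst", Sensitivity.CONFIDENTIAL, "patients")
expired = is_due_for_destruction(
    Sensitivity.CONFIDENTIAL, date(2022, 1, 1), date(2023, 6, 1)
)
```

Logging denied attempts as well as granted ones is what makes the audit trail useful for detecting unauthorized access.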

Some common challenges in data curation and management include:

1. Lack of time and resources: Data curation requires time and effort, but researchers often
lack sufficient time or funding to devote to curation activities. This can lead to data that is
poorly organized, documented, and preserved.
2. Lack of expertise: Researchers do not always have the expertise required for good data
curation. Activities like metadata creation, digital preservation, and data
documentation require specific knowledge and skills.
3. Data heterogeneity: Research data comes in many different forms, formats, and
structures. Curation solutions that work for one type of data may not work for another.
This heterogeneity makes automation and standardization difficult.

4. Lack of incentives: Researchers are often incentivized to publish papers, but not
necessarily to share or curate their data. This can lead to a lack of motivation to invest in
good data practices.
5. Evolving technology: Technology used in research is constantly changing. This means
data, formats, storage solutions, software, etc. are also changing. Keeping data usable and
accessible over time requires ongoing curation efforts to address technological changes.
6. Privacy and ethical concerns: Some data cannot be shared or reused due to privacy,
confidentiality, or other ethical concerns. Determining how to handle sensitive data in an
ethical way can be challenging.
7. Lack of tools and infrastructure: Good tools and infrastructure for activities like metadata
generation, format migration, storage, access provision, and digital preservation do not
always exist, especially for highly specialized or heterogeneous data types. This makes
curation difficult.
8. Unsustainable practices: Short-term, ad hoc solutions are common in research, but these
approaches do not support long-term data access and reuse. Developing sustainable
practices and infrastructure requires a significant shift in culture and mindset.
9. Lack of standards: The lack of community standards and best practices for data in some
domains makes establishing a reasonable and consistent approach to curation
challenging. Standards help enable sharing, interoperability and reuse.

Tools and technologies


Data curation and management frameworks are essential for organizing, storing, and maintaining
data, ensuring its quality, and making it accessible for analysis and decision-making. There are
several tools and technologies available to facilitate data curation and management. Some
popular options include:

1. Database Management Systems (DBMS): A DBMS is software designed to manage
databases, allowing users to define, create, maintain, and control access to the database.

o Relational DBMS: A relational DBMS is a type of database management system
that organizes data into tables, which consist of rows and columns. Each table
represents a particular entity or concept, and the rows represent individual
instances of that entity or concept, while the columns represent the attributes or
properties of the entity. Examples include MySQL, PostgreSQL, Oracle, and
Microsoft SQL Server.

o NoSQL DBMS: A NoSQL DBMS (Not Only SQL Database Management System) is a type
of database management system that does not use the traditional relational model of
data storage. Instead of using tables with rows and columns, NoSQL databases use
different data models, such as document-oriented, key-value, graph, or column-family
models, to store and manage data. NoSQL DBMSs are designed to handle large volumes
of unstructured or semi-structured data, which may not fit well into the rigid structure
of a relational database. They are also designed to be highly scalable, able to handle
large amounts of data and high levels of traffic, and can be distributed across multiple
servers to improve performance and availability. Examples include MongoDB,
Cassandra, Couchbase, and Amazon DynamoDB.

2. Data Warehouses: Data warehouses store large amounts of structured and semi-
structured data from various sources, supporting the efficient querying and analysis of
data. Examples include:
o Amazon Redshift
o Google BigQuery
o Snowflake
o Microsoft Azure Synapse Analytics

3. Data Integration Tools: Data integration tools help to extract, transform, and load (ETL)
data from multiple sources and formats into a unified data store. Examples include:
o Apache NiFi
o Talend
o Microsoft SQL Server Integration Services (SSIS)
o Informatica PowerCenter
4. Data Catalogs: Data catalogs help to organize and discover data by providing metadata
management and data lineage tracking. Examples include:
o Alation
o Collibra
o AWS Glue Data Catalog
o Google Cloud Data Catalog
5. Data Quality Tools: These tools ensure data accuracy, consistency, and reliability by
identifying and resolving data issues such as duplicates, missing values, and incorrect
formats. Examples include:
o IBM InfoSphere Information Analyzer
o Informatica Data Quality
o Talend Data Quality
o Trifacta
6. Data Governance Platforms: Data governance platforms provide a holistic approach to
managing data policies, standards, and processes to ensure the availability, usability,
integrity, and security of data. Examples include:
o Collibra Data Governance Center
o Informatica Axon Data Governance
o IBM Watson Knowledge Catalog
o SAP Data Intelligence
7. Data Visualization and Reporting Tools: These tools enable users to visually explore,
analyze, and share data insights. Examples include:
o Tableau
o Microsoft Power BI
o QlikView
o D3.js
8. Big Data Processing Frameworks: Big data processing frameworks enable the
handling, processing, and analysis of large and complex datasets. Examples include:
o Apache Hadoop
o Apache Spark
o Apache Flink
o Google Cloud Dataflow
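
To make the roles of these tool categories more concrete, the following is a minimal sketch of how a data quality check (item 5) might sit in front of a load into a relational table (items 1 and 3). It is not taken from any of the tools listed above. The records, the field names, the `student` table, and the helper functions `check_quality` and `load` are all hypothetical and chosen only for illustration, and the checks cover only the three issues named in the text: duplicates, missing values, and incorrect formats.

```python
import re
import sqlite3

# Hypothetical raw records, as they might arrive from different sources.
RAW_RECORDS = [
    {"id": "1", "name": "Alice", "email": "alice@example.org", "enrolled": "2021-09-01"},
    {"id": "2", "name": "Bob", "email": "", "enrolled": "2021-09-01"},  # missing value
    {"id": "3", "name": "Chala", "email": "chala@example.org", "enrolled": "01/09/2021"},  # wrong date format
    {"id": "1", "name": "Alice", "email": "alice@example.org", "enrolled": "2021-09-01"},  # duplicate
]

# Assumed target format for dates: YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_quality(records):
    """Split records into clean rows and a list of (record id, issue) pairs."""
    seen_ids = set()
    clean, issues = [], []
    for rec in records:
        # Duplicate check: the same id has already been seen.
        if rec["id"] in seen_ids:
            issues.append((rec["id"], "duplicate"))
            continue
        seen_ids.add(rec["id"])

        problems = []
        if not rec["email"]:
            problems.append("missing email")
        if not DATE_PATTERN.match(rec["enrolled"]):
            problems.append("bad date format")

        if problems:
            issues.extend((rec["id"], problem) for problem in problems)
        else:
            clean.append(rec)
    return clean, issues


def load(records):
    """Load clean records into an in-memory relational table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "email TEXT NOT NULL, enrolled TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO student (id, name, email, enrolled) VALUES (?, ?, ?, ?)",
        [(int(r["id"]), r["name"], r["email"], r["enrolled"]) for r in records],
    )
    conn.commit()
    return conn


clean_rows, quality_issues = check_quality(RAW_RECORDS)
connection = load(clean_rows)
loaded_count = connection.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print("loaded rows:", loaded_count)
for record_id, issue in quality_issues:
    print(f"record {record_id}: {issue}")
```

In this sketch, rejected records are reported rather than silently dropped, so that a curator can correct them at the source. Dedicated data quality and integration tools offer far richer rules and connectors than this example.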

Comparison of Previous Works in Data Curation and Management Framework

Framework     | Model            | Architecture  | Strengths                                              | Drawbacks
Collibra      | Data Governance  | Cloud-based   | Comprehensive data governance framework                | Expensive
Talend        | Data Integration | Open-source   | Wide range of connectors and data integration features | Steep learning curve
Tableau       | Data Analytics   | Desktop-based | Powerful data visualization and analytics capabilities | Limited data preparation features
IBM Guardium  | Data Security    | On-premise    | Comprehensive data security and compliance features    | Expensive
Informatica   | Data Quality     | Cloud-based   | Comprehensive data quality framework                   | Expensive
Apache Hadoop | Data Storage     | Open-source   | Scalable and cost-effective data storage               | Requires significant technical expertise
Alation       | Data Governance  | Cloud-based   | Powerful data discovery and governance capabilities    | Expensive

Real-world examples for data curation and management framework

Here are some real-world examples of organizations using data curation and management
frameworks:

The National Oceanic and Atmospheric Administration (NOAA)

NOAA is a US government agency that
collects and manages a vast amount of environmental data, including data on weather, oceans,
and fisheries. To improve the management of this data, NOAA implemented a data curation and
management framework based on the Data Management Maturity Model (DMM).

The DMM provided a comprehensive framework for assessing and improving NOAA's data
management practices across multiple domains, including data governance, metadata,
preservation, and sharing. NOAA used the DMM to evaluate its existing data management
practices, identify gaps and opportunities for improvement, and develop a roadmap for
implementing best practices.

As part of this effort, NOAA established a Data Management Integration Team (DMIT) to lead
the implementation of the DMM and coordinate data management activities across the agency.
The DMIT worked with NOAA's data stakeholders to develop and implement policies and
procedures for data management, establish data quality standards, and improve data sharing and
interoperability.

The Cancer Genome Atlas (TCGA)

TCGA is a research program funded by the National Cancer Institute (NCI) that aims to improve
our understanding of cancer biology and develop new treatments for cancer. To manage the vast
amount of genomic data generated by the program, TCGA developed a data curation and
management framework based on the FAIR Data Principles. The framework includes policies
and procedures for data sharing, metadata management, and data quality control, and has enabled
researchers to access and analyze TCGA data more easily and efficiently.

The Human Cell Atlas (HCA)

The HCA is a global research initiative that aims to create a comprehensive map of all human
cells to understand how they interact and contribute to health and disease. To manage the large
amount of data generated by the project, the HCA developed a data curation and management
framework based on the Data Management Maturity Model (DMM). The framework includes
policies and procedures for data governance, metadata management, and data sharing, and has
enabled researchers to access and analyze HCA data more effectively.

The European Open Science Cloud (EOSC)

The EOSC is a pan-European initiative that aims to provide researchers with seamless access to
research data and services. To achieve this, the EOSC developed a data curation and
management framework based on the Research Data Alliance (RDA) guidelines and
recommendations. The framework includes policies and procedures for data sharing,
interoperability, and reuse, and has enabled researchers to access and share research data across
different disciplines and domains.

Related work Review Summary


Article [5], "Data Management and Curation Practices: The Case of Using DSpace and
Implications", explores the practices and implications of using DSpace for data management and
curation. DSpace is an open-source digital repository platform widely used in academic and
research institutions. The article examines the challenges faced in managing and curating
research data, including issues related to data organization, metadata creation, preservation, and
sharing. It discusses how DSpace can be used as a tool for data management and curation,
highlighting its features and functionalities. It also discusses the implications of using
DSpace, including the need for training and support for researchers, as well as the potential
benefits of improved data accessibility, preservation, and collaboration. Overall, the article sheds
light on the practical considerations involved in using DSpace for effective data management
and curation.

Article [6], "A Lightweight Framework for Research Data Management", presents a concise
approach to managing research data efficiently. The framework emphasizes simplicity and ease
of use, providing researchers with practical guidelines for organizing, documenting, and sharing
their data. By adopting this lightweight framework, researchers can enhance data reproducibility,
collaboration, and long-term preservation without overwhelming administrative burdens.

Article [7], "Provenance-Driven Data Curation Workflow Analysis", focuses on the analysis of
data curation workflows using provenance information. The study explores how provenance,
which records the history and origin of data, can be leveraged to improve data curation
processes. By analyzing the provenance data, researchers can gain insights into workflow
patterns, identify bottlenecks, and optimize the curation process. The article highlights the
significance of incorporating provenance-driven approaches in data curation to enhance
efficiency, quality, and reliability in managing and preserving research data.

Article [8], "Data Curation with a Focus on Reuse", explores the importance of data curation and
its role in enabling data reuse. The article emphasizes that effective data curation involves not
only preserving and organizing data but also ensuring its accessibility and usability for future
research. It highlights the challenges faced in data curation, including the need for standardized
metadata, data documentation, and long-term preservation strategies. The article also discusses
the benefits of data reuse, such as promoting scientific advancements, enabling interdisciplinary
research, and facilitating reproducibility. It concludes by emphasizing the need for continued
investment in data curation efforts to maximize the value and impact of research data.

Article [9], "Medical Data Quality Assessment: On the Development of an Automated Framework
for Medical Data Curation", focuses on the development of an automated framework for assessing
the quality of medical data. The article highlights the challenges faced in ensuring the accuracy,
completeness, and consistency of medical data, which is crucial for reliable clinical research and
decision-making. The proposed framework aims to automate the process of data curation by
leveraging machine learning algorithms and data mining techniques. The article discusses the
various components of the framework, including data preprocessing, feature extraction, and
quality assessment algorithms. It concludes by highlighting the potential benefits of the
automated framework, such as improved efficiency, consistency, and reliability in medical data
curation processes.
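
The idea summarized for article [7], using provenance records to find bottlenecks in a curation workflow, can be illustrated with a small sketch. This is not the method of article [7]. It only assumes that each curation step leaves a provenance entry with a start and end time. The log entries, the step names, and the helper `mean_minutes_per_step` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical provenance log: one entry per curation step applied to a dataset.
PROVENANCE_LOG = [
    {"dataset": "ds-01", "step": "ingest", "start": "2023-01-10T09:00:00", "end": "2023-01-10T09:05:00"},
    {"dataset": "ds-01", "step": "metadata", "start": "2023-01-10T09:05:00", "end": "2023-01-10T10:35:00"},
    {"dataset": "ds-01", "step": "validate", "start": "2023-01-10T10:35:00", "end": "2023-01-10T10:45:00"},
    {"dataset": "ds-02", "step": "ingest", "start": "2023-01-11T14:00:00", "end": "2023-01-11T14:03:00"},
    {"dataset": "ds-02", "step": "metadata", "start": "2023-01-11T14:03:00", "end": "2023-01-11T15:13:00"},
    {"dataset": "ds-02", "step": "validate", "start": "2023-01-11T15:13:00", "end": "2023-01-11T15:21:00"},
]


def mean_minutes_per_step(log):
    """Return the average duration in minutes of each workflow step."""
    durations = defaultdict(list)
    for entry in log:
        start = datetime.fromisoformat(entry["start"])
        end = datetime.fromisoformat(entry["end"])
        durations[entry["step"]].append((end - start).total_seconds() / 60)
    return {step: sum(values) / len(values) for step, values in durations.items()}


step_means = mean_minutes_per_step(PROVENANCE_LOG)
# The step with the highest average duration is the candidate bottleneck.
bottleneck = max(step_means, key=step_means.get)

for step, minutes in step_means.items():
    print(f"{step}: {minutes:.1f} min on average")
print("candidate bottleneck:", bottleneck)
```

With this made-up log, metadata creation dominates the elapsed time, which is the kind of observation that would prompt a closer look at that step.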

Conclusion

In conclusion, implementing a data curation and management framework is crucial for
educational institutions to ensure the accuracy, completeness, and consistency of data. The
framework provides a systematic approach to data management, covering aspects such as data
collection, processing, analysis, and storage. An effective data curation and management
framework can improve the efficiency and reliability of data management processes, leading to
better decision-making and outcomes. With the increasing use of technology in education and the
growing volume of data, the need for an automated and standardized data curation and
management framework is becoming more critical for educational institutions. Therefore,
investing in the development of a robust framework can help educational institutions leverage
data as a strategic asset, improve student performance, and enhance the overall learning
experience.

References
[1] Plato L. Smith II, Exploring Data Curation and Management Programs, Projects, and
Services through Metatriangulation, Communication and Information,

[2] Vasily Bunakov, Brian Matthews, Data Curation Framework for Facilities Science, In
Proceedings of the 2nd International Conference on Data Technologies and Applications, 2013,
DOI: 10.5220/0004593302110216

[3] Whyte, A. Emerging infrastructure and services for research data management and curation
in the UK and Europe. In G. Pryor (Ed.). Managing Research Data. (2012).

[4] Andrey Kosinov, Adilbek Erkimbaev, Georgy Kobzev, and Vladimir Zitserman, Data
Curation Approach to Management of Research Data, Joint Institute for High Temperatures,
Russian Academy of Sciences, Russia, 2019

[5] Yin Zhang, Chen, Data Management and Curation Practices: The Case of Using DSpace and
Implications, 2015

[6] Dimitar Nikolov, Esen Tuna, A Lightweight Framework for Research Data Management, 2019,
https://doi.org/10.1145/3332186.3333157

[7] Tianhong Song, Provenance-Driven Data Curation Workflow Analysis, ACM, 2015,
http://dx.doi.org/10.1145/2744680.2744691

[8] Maria Esteva, Robert McLay, Weijia Xu, Sivakumar Kulasekaran, Data Curation with a
Focus on Reuse, 2016, DOI: http://dx.doi.org/10.1145/2910896.2910906

[9] Vasileios C. Pezoulas, Konstantina D. Kourou, Fanis Kalatzis, Themis P. Exarchos, Medical
data quality assessment: On the development of an automated framework for medical data
curation, 2018, https://doi.org/10.1016/j.compbiomed.2019.03.001
