Asset-V1 e-SHE+EX101+Q1+type@asset+block@Chapter2 Session 3 PDF
Chapter 2: Session 3
Data Value Chain
Learning outcome
After successfully completing this session, you will be able to:
• Describe the data value chain in the big data era.
Data Value Chain
Data Acquisition
• It is the process of gathering, filtering, and cleaning data before it is
put in a data warehouse or any other storage solution on which data
analysis can be carried out.
• Data acquisition is one of the major big data challenges in terms of
infrastructure requirements.
• The infrastructure required to support the acquisition of big data must
deliver low, predictable latency in both capturing data and in
executing queries; be able to handle very high transaction volumes,
often in a distributed environment; and support flexible and dynamic
data structures.
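The gathering, filtering, and cleaning steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sensor records, the field names (`sensor`, `temp`), and the function names are hypothetical and are not part of the session material. A real acquisition layer would also have to meet the latency and volume requirements listed above.

```python
from typing import Iterable, Iterator

# Hypothetical raw sensor readings, as they might arrive from a source.
RAW_RECORDS = [
    {"sensor": " A1 ", "temp": "21.5"},
    {"sensor": "A2", "temp": "n/a"},   # unparseable reading
    {"sensor": "", "temp": "19.0"},    # missing sensor id
    {"sensor": "a3", "temp": " 22.1"},
]


def gather(source: Iterable[dict]) -> Iterator[dict]:
    """Gather: yield records one at a time from the source."""
    yield from source


def is_usable(record: dict) -> bool:
    """Filter: keep only records with a sensor id and a numeric reading."""
    if not record.get("sensor", "").strip():
        return False
    try:
        float(record.get("temp", ""))
    except ValueError:
        return False
    return True


def clean(record: dict) -> dict:
    """Clean: normalise the sensor id and convert the reading to a float."""
    return {
        "sensor": record["sensor"].strip().upper(),
        "temp": float(record["temp"]),
    }


def acquire(source: Iterable[dict]) -> list[dict]:
    """Run gather, filter, and clean, returning rows ready for storage."""
    return [clean(r) for r in gather(source) if is_usable(r)]


if __name__ == "__main__":
    for row in acquire(RAW_RECORDS):
        print(row)
```

Only the first and last sample records survive the filter; the other two are dropped before they reach storage.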
Data Analysis
• It is concerned with making the raw data acquired amenable to use in
decision-making as well as domain-specific usage.
• Data analysis involves exploring, transforming, and modeling data
with the goal of highlighting relevant data, synthesizing and extracting
useful hidden information with high potential from a business point of
view.
• Related areas include data mining, business intelligence, and
machine learning.
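As a rough illustration of exploring, transforming, and modeling data, the sketch below summarises a small data set, rescales one variable, and fits a straight line by ordinary least squares. The figures (advertising spend and sales) and all function names are invented for the example; least squares is just one simple modeling choice, not a method prescribed by this session.

```python
from statistics import mean

# Hypothetical observations: weekly advertising spend (in currency units)
# and the sales recorded in the same week (in thousands of units).
RAW_SPEND = [1000, 2000, 3000, 4000, 5000]
SALES = [3.0, 5.0, 7.0, 9.0, 11.0]


def explore(values):
    """Explore: summarise a column with a few descriptive statistics."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def transform(spend):
    """Transform: rescale spend to thousands so both columns are comparable."""
    return [amount / 1000 for amount in spend]


def fit_line(xs, ys):
    """Model: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    print(explore(RAW_SPEND))
    slope, intercept = fit_line(transform(RAW_SPEND), SALES)
    # Use the fitted line to estimate sales at a spend of 6 (thousand).
    print(slope * 6.0 + intercept)
```

The fitted slope is the kind of "hidden information" a decision-maker could act on: in this invented data set it estimates how much sales change per additional thousand spent.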
Data Curation
• It is the active management of data over its life cycle to ensure it
meets the necessary data quality requirements for its effective usage.
• Data curation processes can be categorized into different activities
such as content creation, selection, classification, transformation,
validation, and preservation.
• Data curation is performed by expert curators who are responsible for
improving the accessibility and quality of data.
• Data curators (also known as scientific curators or data annotators)
hold the responsibility of ensuring that data are trustworthy,
discoverable, accessible, reusable and fit their purpose.
• A key trend in the curation of big data is the use of community and
crowdsourcing approaches.
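Of the curation activities listed above, validation is the easiest to show in code. The sketch below checks records against a few quality rules and separates accepted records from rejected ones, keeping the reasons for rejection so a curator can follow up. The required fields, the year range, and the sample records are assumptions made up for this example.

```python
from dataclasses import dataclass, field

# Hypothetical quality rules: every record needs these fields.
REQUIRED_FIELDS = ("id", "title", "year")


@dataclass
class CurationReport:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, problems) pairs


def validate(record: dict) -> list[str]:
    """Return a list of quality problems; an empty list means the record passes."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    year = record.get("year")
    if isinstance(year, int) and not 1900 <= year <= 2100:
        problems.append("year out of range")
    return problems


def curate(records: list[dict]) -> CurationReport:
    """Split records into accepted and rejected according to the rules."""
    report = CurationReport()
    for record in records:
        problems = validate(record)
        if problems:
            report.rejected.append((record, problems))
        else:
            report.accepted.append(record)
    return report


SAMPLE = [
    {"id": 1, "title": "Rainfall 2019", "year": 2019},
    {"id": 2, "title": "", "year": 2020},             # empty title
    {"id": 3, "title": "Soil survey", "year": 1492},  # implausible year
]

if __name__ == "__main__":
    report = curate(SAMPLE)
    print(len(report.accepted), "accepted")
    for record, problems in report.rejected:
        print(record["id"], problems)
```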
Data Storage
• It is the persistence and management of data in a scalable way that
satisfies the needs of applications that require fast access to the data.
• Relational Database Management Systems (RDBMS) have been the
main, and almost the only, solution to the storage paradigm for
nearly 40 years.
• However, the ACID (Atomicity, Consistency, Isolation, and Durability)
properties that guarantee database transactions come at the cost of
flexibility with regard to schema changes, and of performance and
fault tolerance when data volumes and complexity grow, making
relational systems unsuitable for big data scenarios.
• NoSQL technologies have been designed with the scalability goal in
mind and present a wide range of solutions based on alternative data
models.
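The contrast between a fixed relational schema with transactional guarantees and a more flexible data model can be sketched with Python's standard library. The first function uses an in-memory SQLite database to show atomicity (a failed transaction leaves no partial data) and schema rigidity (an unknown column is rejected). The second uses a plain dictionary of JSON strings as a stand-in for a document store; it is not a real NoSQL system, and the table, field names, and sample values are all hypothetical.

```python
import json
import sqlite3


def relational_demo() -> tuple[int, bool]:
    """Show atomic rollback and schema rigidity in a relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

    # Atomicity: both inserts commit together or not at all.
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
            conn.execute("INSERT INTO users (id, name) VALUES (1, 'Lin')")  # duplicate key
    except sqlite3.IntegrityError:
        pass
    row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    # Fixed schema: a column that was never declared is rejected.
    schema_rejected = False
    try:
        conn.execute("INSERT INTO users (id, name, email) VALUES (2, 'Lin', 'lin@example.com')")
    except sqlite3.OperationalError:
        schema_rejected = True

    conn.close()
    return row_count, schema_rejected


def document_demo() -> list[dict]:
    """Stand-in for a document store: each value is a self-describing JSON document."""
    store = {}
    store["u1"] = json.dumps({"name": "Ada"})
    # A later document can carry extra fields without any schema migration.
    store["u2"] = json.dumps({"name": "Lin", "email": "lin@example.com", "tags": ["admin"]})
    return [json.loads(doc) for doc in store.values()]


if __name__ == "__main__":
    print(relational_demo())
    print(document_demo())
```

The relational table ends up empty because the failed transaction is rolled back as a whole, while the document stand-in accepts records of different shapes. Real NoSQL systems add distribution and replication on top of such flexible models, which this sketch does not attempt to show.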
Data Usage
• It covers the data-driven business activities that need access to data,
its analysis, and the tools needed to integrate the data analysis within
the business activity.
• Data usage in business decision-making can enhance
competitiveness through the reduction of costs, increased added
value, or any other parameter that can be measured against existing
performance criteria.
Summary