Ch2 Emerging
Data Analysis
Data Curation is the active management of data over its life cycle to ensure
it meets the necessary data quality requirements for its effective usage.
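One curation step can be sketched as checking incoming records against data quality rules before they are accepted for use. This is a minimal sketch: the field names (`id`, `age`) and the two rules are hypothetical examples, not requirements from any real system. Records that fail are kept together with the names of the rules they broke, so they can be reviewed instead of silently dropped.

```python
# Minimal sketch of a data-curation quality check.
# The fields and rules below are hypothetical examples.
from typing import Callable

Record = dict[str, object]

# Each rule maps a human-readable name to a predicate over one record.
QUALITY_RULES: dict[str, Callable[[Record], bool]] = {
    "id is present": lambda r: r.get("id") is not None,
    "age is within 0-120": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}


def curate(records: list[Record]) -> tuple[list[Record], list[tuple[Record, list[str]]]]:
    """Split records into those passing every rule and those failing,
    keeping the names of the failed rules for later review."""
    accepted: list[Record] = []
    rejected: list[tuple[Record, list[str]]] = []
    for record in records:
        failures = [name for name, rule in QUALITY_RULES.items() if not rule(record)]
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected


if __name__ == "__main__":
    raw = [
        {"id": 1, "age": 34},
        {"id": None, "age": 29},
        {"id": 3, "age": 250},
    ]
    good, bad = curate(raw)
    print(len(good), "accepted")
    for record, failures in bad:
        print(record, "->", failures)
```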
Clustered Computing
Because big data has high storage and computational needs, computer
clusters are a better fit than a single computer.
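The core idea behind clustered computing can be sketched as: split a large job into partitions, let several workers process them in parallel, then combine the partial results. In this minimal sketch, threads stand in for cluster nodes and the per-node work (summing squares) is an arbitrary example. A real cluster spreads partitions across separate machines, and because of Python's global interpreter lock, threads here show the structure rather than a real speed-up.

```python
# Minimal sketch of split / process in parallel / combine.
# Threads stand in for cluster nodes; a real cluster uses separate machines.
from concurrent.futures import ThreadPoolExecutor


def partition(data: list[int], num_nodes: int) -> list[list[int]]:
    """Divide the data into num_nodes roughly equal chunks."""
    size, extra = divmod(len(data), num_nodes)
    chunks, start = [], 0
    for i in range(num_nodes):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def node_task(chunk: list[int]) -> int:
    """Work done on one node: here, summing the squares of its chunk."""
    return sum(x * x for x in chunk)


def cluster_sum_of_squares(data: list[int], num_nodes: int = 4) -> int:
    chunks = partition(data, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_results = list(pool.map(node_task, chunks))
    # Combine the partial results from every node into the final answer.
    return sum(partial_results)


if __name__ == "__main__":
    print(cluster_sum_of_squares(list(range(1, 101))))
```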
Benefits:-
Hadoop has an ecosystem that has evolved from its four core
components: data management, access, processing, and
storage.
Big Data Life Cycle with Hadoop
Ingesting data into the system: the first stage of Big Data processing is
Ingest. Data is ingested, or transferred, into Hadoop from various sources
such as relational databases, systems, or local files.
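The Ingest stage can be sketched as pulling records from different kinds of sources into one landing area. In this minimal sketch, an in-memory SQLite database stands in for a relational source, an in-memory CSV stands in for a local file, and a Python list stands in for the landing area; Hadoop's own ingestion tools and HDFS are not used. The `orders` table and its columns are hypothetical.

```python
# Minimal sketch of ingesting from a relational database and a local file.
# SQLite, an in-memory CSV, and a Python list stand in for real sources
# and for HDFS; the 'orders' table is a hypothetical example.
import csv
import io
import sqlite3


def ingest_from_database(conn: sqlite3.Connection) -> list[dict]:
    """Read every row of the hypothetical 'orders' table as a dict."""
    cursor = conn.execute("SELECT id, amount FROM orders ORDER BY id")
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row), source="database") for row in cursor]


def ingest_from_file(file_obj) -> list[dict]:
    """Read CSV rows from a file-like object, converting the field types."""
    return [
        {"id": int(row["id"]), "amount": float(row["amount"]), "source": "file"}
        for row in csv.DictReader(file_obj)
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    local_file = io.StringIO("id,amount\n3,4.25\n")
    # Both sources end up in one landing area with a common record shape.
    landing_zone = ingest_from_database(conn) + ingest_from_file(local_file)
    for record in landing_zone:
        print(record)
```

Tagging each record with its `source` keeps the origin visible after the records from different systems are mixed together.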
Computing and analyzing data: the third stage is Analyze. Here, the
data is analyzed by processing frameworks such as Pig, Hive, and Impala.
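The kind of grouping-and-counting that analysts express in Pig, Hive, or Impala can be sketched in plain Python as separate map, shuffle, and reduce phases. None of those frameworks is used here. The log lines are made-up sample data, and the word-count task is only an illustration.

```python
# Minimal MapReduce-style word count in plain Python.
# Illustrates a grouped aggregation; Pig, Hive, and Impala are not used.
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group the emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    logs = ["error disk full", "warning cpu hot", "error disk full"]
    counts = reduce_phase(shuffle_phase(map_phase(logs)))
    print(sorted(counts.items()))
```

In Hive the same result would be written declaratively, roughly as `SELECT word, COUNT(*) FROM words GROUP BY word`, where `words` is a hypothetical table with one word per row.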
Thank You!!
Q & A