Data Analytics Lifecycle
Phase 1: Discovery –
The data science team learns about and investigates the problem.
The team develops context and understanding.
The team identifies the data sources needed and available for
the project.
The team formulates initial hypotheses that can later be tested
with data.
Phase 2: Data Preparation –
This phase covers the steps to explore, preprocess, and condition data
prior to modeling and analysis.
It requires an analytic sandbox. The team performs extract, load,
and transform (ELT) to get data into the sandbox.
Data preparation tasks are likely to be performed multiple times and
not in a predefined order.
Tools commonly used in this phase include Hadoop, Alpine Miner,
and OpenRefine.
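
As a rough illustration of the load and transform steps in Phase 2, here is a minimal Python sketch. It assumes an in-memory SQLite database standing in for the analytic sandbox. The table names (raw_orders, clean_orders), the columns, and the sample rows are all hypothetical and are not part of the lifecycle itself. The raw data is loaded unchanged first and is conditioned only once it is inside the sandbox.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ROWS = [
    ("1", " Alice ", "34.50"),
    ("2", "Bob", ""),           # missing amount
    ("3", "  carol", "12.00"),
]


def load_raw(conn, rows):
    """Load step: copy the raw rows into the sandbox unchanged."""
    conn.execute("CREATE TABLE raw_orders (id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transform step: condition the data inside the sandbox."""
    conn.execute(
        """
        CREATE TABLE clean_orders AS
        SELECT CAST(id AS INTEGER)  AS id,
               TRIM(customer)       AS customer,
               CAST(amount AS REAL) AS amount
        FROM raw_orders
        WHERE amount <> ''          -- drop rows with a missing amount
        """
    )


def run_elt(rows):
    """Run load then transform against a fresh in-memory sandbox."""
    conn = sqlite3.connect(":memory:")
    load_raw(conn, rows)
    transform(conn)
    return conn


conn = run_elt(RAW_ROWS)
clean = conn.execute(
    "SELECT id, customer, amount FROM clean_orders ORDER BY id"
).fetchall()
for row in clean:
    print(row)
```

Because the raw table stays in the sandbox, the transform step can be revised and rerun as many times as needed, which matches the iterative nature of data preparation.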