Module_1B
“Make it so”
Overview of Data Analytics Lifecycle
Phase 1: Discovery
• Activities to consider
– Assess the structure of the data – this dictates the tools and
analytic techniques for the next phase
– Ensure the analytic techniques enable the team to meet the
business objectives and accept or reject the working
hypotheses
– Determine if the situation warrants a single model or a series
of techniques as part of a larger analytic workflow
– Research and understand how other analysts have
approached this or a similar kind of problem
Phase 3: Model Planning
Model Planning in Industry Verticals
• Define success
• Explore data
• Condition data
• Select variables
• Balance data
• Build models
• Validate
• Deploy
• Maintain
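The "Balance data" step can be illustrated with a minimal sketch. It assumes a binary label and uses random undersampling, which is only one of several balancing techniques; the `undersample` helper and the churn data are hypothetical and not part of the original slides.

```python
import random
from collections import Counter, defaultdict


def undersample(rows, label_key, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    # Every class is cut down to the size of the rarest class
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical churn data: 90 non-churners and 10 churners
rows = [{"id": i, "churned": i % 10 == 0} for i in range(100)]
balanced = undersample(rows, "churned")
counts = Counter(row["churned"] for row in balanced)
```

Undersampling discards data, so on small datasets oversampling the rare class or weighting classes in the model may be a better fit.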
Common Tools for the Model Building Phase
• Commercial Tools
– SAS Enterprise Miner – built for enterprise-level computing &
analytics
– SPSS Modeler (IBM) – provides enterprise-level computing and
analytics
– MATLAB – high-level language for data analytics, algorithms, and data
exploration
– Alpine Miner – provides GUI frontend for backend analytics tools
• Free or Open Source Tools
– R and PL/R – PL/R is a procedural language for PostgreSQL with R
– Octave – language for computational modeling
– WEKA – data mining software package with analytic workbench
– Python – language providing toolkits for machine learning and
analysis
Phase 5: Communicate Results
• After executing the model, the team needs to
compare the outcomes of the modeling to the
criteria established for success and
failure.
• The team considers how best to articulate
findings and outcomes to the various team
members and stakeholders, taking into
account caveats and assumptions.
• The team should identify key findings,
quantify the business value, and develop
a narrative to summarize and convey
the findings to stakeholders.
• Determine if the team succeeded or failed in its
objectives
• Assess if the results are statistically significant and
valid
– If so, identify aspects of the results that present salient
findings
– Identify surprising results and those in line with the
hypotheses
• Communicate and document the key findings and
major insights derived from the analysis
– This is the most visible portion of the process to the outside
stakeholders and sponsors
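As one way to assess statistical significance, the sketch below runs a two-sided permutation test on the difference in means between two groups. The slides do not prescribe a particular test; the choice of test, the `permutation_p_value` helper, and the sample values are assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical outcome metric for a control group and a treated group
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
treated = [13.0, 12.9, 13.4, 12.8, 13.1, 13.3, 12.7, 13.2]
p_value = permutation_p_value(control, treated)
significant = p_value < 0.05
```

A small p-value only says the difference is unlikely under random labeling; whether the result is valid still depends on how the data was collected and conditioned.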
Phase 6: Operationalize
• In this last phase, the team communicates the benefits of the project
more broadly and sets up a pilot project to deploy the work in a
controlled way
• Risk is managed effectively by undertaking a small-scope pilot
deployment before a wide-scale rollout
• During the pilot project, the team may need to execute the algorithm
more efficiently in the database rather than with in-memory tools
like R, especially with larger datasets
• To test the model in a live setting, consider running the model in a
production environment for a discrete set of products or a single line
of business
• Monitor model accuracy and retrain the model if necessary
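Monitoring accuracy and deciding when to retrain could be sketched as a rolling-window check. The `AccuracyMonitor` class, the window size, and the 0.8 threshold are hypothetical choices for illustration; a real deployment would set them from the success criteria agreed on earlier in the project.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag degradation."""

    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # True for each correct prediction
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Wait for a full window so a few early misses do not trigger a retrain
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.threshold


monitor = AccuracyMonitor(window=10, threshold=0.8)
for _ in range(10):
    monitor.record(predicted=1, actual=1)  # model is correct
healthy = monitor.needs_retraining()

for _ in range(5):
    monitor.record(predicted=1, actual=0)  # model starts to miss
degraded = monitor.needs_retraining()
```

In the pilot, `record` would be called whenever a true outcome becomes available for an earlier prediction, and a `True` result from `needs_retraining` would prompt the team to review and retrain the model.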
Key outputs from a successful analytics project
• Business user – tries to determine business benefits and
implications
• Project sponsor – wants business impact, risks, ROI
• Project manager – needs to determine whether the project was
completed on time and within budget, and whether its goals were met
• Business intelligence analyst – needs to know if reports
and dashboards will be impacted and need to change
• Data engineer and DBA – must share code and documentation
• Data scientist – must share code and explain model to
peers, managers, stakeholders
Phase 6: Operationalize
Four main deliverables