Module I - 1
Data Analytics and Visualization
Course Code- CSC601
Module I - Introduction to Data Analytics and Life Cycle (5 Hr, CO1)
Data Analytics Lifecycle Overview: Key Roles for a Successful Analytics Project, Background and Overview
of Data Analytics Lifecycle
Phase 1: Discovery: Learning the Business Domain, Resources, Framing the Problem, Identifying
Key Stakeholders, Interviewing the Analytics Sponsor, Developing Initial Hypotheses, Identifying
Potential Data Sources
Phase 2: Data Preparation: Preparing the Analytic Sandbox, Performing ETLT, Learning About the
Data, Data Conditioning, Survey and Visualize, Common Tools for the Data Preparation Phase
Phase 3: Model Planning: Data Exploration and Variable Selection, Model Selection, Common
Tools for the Model Planning Phase
Phase 4: Model Building: Common Tools for the Model Building Phase
Phase 5: Communicate Results
Phase 6: Operationalize
Introduction to Data analytics-
Analytics is the discovery and communication of meaningful patterns in data. Especially valuable
in areas rich with recorded information, analytics relies on the simultaneous application of
statistics, computer programming, and operations research to quantify performance. Analytics often
favors data visualization to communicate insight.
Firms commonly apply analytics to business data to describe, predict, and improve business
performance. Specific areas within analytics include predictive analytics, enterprise decision
management, etc. Since analytics can require extensive computation (because of big data), the
algorithms and software used for analytics harness the most current methods in computer science.
In a nutshell, analytics is the scientific process of transforming data into insight for making better
decisions. The goal of Data Analytics is to get actionable insights resulting in smarter decisions
and better business outcomes.
It is critical to design and build a data warehouse or Business Intelligence (BI) architecture that
provides a flexible, multi-faceted analytical ecosystem, optimized for efficient ingestion and
analysis of large and diverse data sets.
There are four types of data analytics:
1. Predictive (forecasting)
2. Descriptive (business intelligence and data mining)
3. Prescriptive (optimization and simulation)
4. Diagnostic analytics
Predictive Analytics
Predictive analytics turns data into valuable, actionable information. Predictive
analytics uses data to determine the probable outcome of an event or the likelihood
of a situation occurring.
Predictive analytics encompasses a variety of statistical techniques from modeling,
machine learning, data mining, and game theory that analyze current and
historical facts to make predictions about future events. Techniques that are used
for predictive analytics are:
● Linear Regression
● Time series analysis and forecasting
● Data Mining
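The first technique above, linear regression, can be illustrated with a minimal Python sketch. It assumes a single predictor (month number) and hypothetical monthly sales figures, fits y = a + b·x by ordinary least squares, and extrapolates one month ahead, which is also the most basic form of trend forecasting. The function names are illustrative, not from any particular library.

```python
# Minimal sketch: ordinary least squares with one predictor, used to forecast.
# The sales figures below are hypothetical illustration values, not real data.

def fit_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted outcome for a new input x."""
    return intercept + slope * x


# Hypothetical monthly sales: month number -> units sold.
months = [1, 2, 3, 4, 5, 6]
sales = [110, 125, 138, 152, 166, 181]

a, b = fit_linear_regression(months, sales)
forecast = predict(a, b, 7)  # extrapolate one month past the observed data
print(f"trend: {b:.2f} units/month, forecast for month 7: {forecast:.1f}")
```

In practice a statistics or machine learning library would be used, but the closed-form slope and intercept above are what such a library computes for the one-variable case.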
Descriptive Analytics
Descriptive analytics looks at data and analyzes past events for insight into how to approach future
events. It looks at past performance and understands that performance by mining historical data to
find the causes of success or failure in the past. Almost all management reporting, such as sales,
marketing, operations, and finance, uses this type of analysis.
The descriptive model quantifies relationships in data in a way that is often used to classify customers or
prospects into groups. Unlike a predictive model, which focuses on predicting the behavior of a single
customer, descriptive analytics identifies many different relationships between customers and products.
Common examples of descriptive analytics are company reports that provide historic reviews,
such as:
● Data Queries
● Reports
● Descriptive Statistics
● Data dashboard
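The reports and descriptive statistics listed above boil down to grouping historical records and summarizing them. A minimal Python sketch, assuming hypothetical order records held in memory (a real report would query a data warehouse), using only the standard `statistics` module:

```python
# Minimal sketch of descriptive analytics: summarize past sales by region.
# The records are hypothetical; a real report would query a data warehouse.
from collections import defaultdict
from statistics import mean, median, pstdev

past_orders = [
    {"region": "North", "amount": 120.0},
    {"region": "North", "amount": 80.0},
    {"region": "North", "amount": 100.0},
    {"region": "South", "amount": 200.0},
    {"region": "South", "amount": 150.0},
]


def summarize_by(records, key, value):
    """Group records by `key`; return count, total, mean, median, population stdev of `value`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    summary = {}
    for group, values in groups.items():
        summary[group] = {
            "count": len(values),
            "total": sum(values),
            "mean": mean(values),
            "median": median(values),
            "stdev": pstdev(values),
        }
    return summary


report = summarize_by(past_orders, "region", "amount")
for region, stats in sorted(report.items()):
    print(region, stats)
```

Each group's summary describes what already happened; nothing in it predicts the future, which is the line between descriptive and predictive analytics.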
Prescriptive Analytics
Prescriptive analytics goes beyond predicting future outcomes by also suggesting actions that
benefit from the predictions and showing the decision maker the implication of each decision
option. Prescriptive analytics not only anticipates what will happen and when it will happen but
also why it will happen. Further, prescriptive analytics can suggest decision options on how
to take advantage of a future opportunity or mitigate a future risk, and illustrate the implication
of each decision option.
For example, prescriptive analytics can benefit healthcare strategic planning by using
analytics to leverage operational and usage data combined with data on external factors such
as economic data, population demographics, etc.
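The idea of "suggesting decision options and illustrating the implication of each" can be sketched as: take a predicted demand distribution (the output of a predictive model), compute the expected profit of each candidate action, and recommend the best one. A minimal Python sketch; the demand scenarios, prices, costs, and stock options are all hypothetical, and exhaustive evaluation stands in for a real optimization or simulation engine.

```python
# Minimal sketch of prescriptive analytics: given a predicted demand distribution
# (the output of a predictive model), evaluate each decision option and recommend one.
# All numbers (prices, costs, probabilities) are hypothetical.

# Predicted demand scenarios: (units demanded, probability).
demand_forecast = [(80, 0.3), (100, 0.5), (120, 0.2)]

UNIT_COST = 6.0    # cost to stock one unit (hypothetical)
UNIT_PRICE = 10.0  # revenue per unit sold (hypothetical)


def expected_profit(stock_level, forecast):
    """Expected profit of stocking `stock_level` units under the forecast scenarios."""
    total = 0.0
    for demand, prob in forecast:
        sold = min(stock_level, demand)  # cannot sell more than is stocked
        profit = sold * UNIT_PRICE - stock_level * UNIT_COST
        total += prob * profit
    return total


def recommend(options, forecast):
    """Return the best option plus the implication (expected profit) of every option."""
    implications = {q: expected_profit(q, forecast) for q in options}
    best = max(implications, key=implications.get)
    return best, implications


best, implications = recommend([80, 100, 120], demand_forecast)
for q, p in implications.items():
    print(f"stock {q}: expected profit {p:.1f}")
print("recommended stock level:", best)
```

Returning the full `implications` table, not just the winner, mirrors the point above that prescriptive analytics shows the decision maker the consequence of every option.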
Diagnostic Analytics
In this analysis, we generally use historical data rather than other data to answer a
question or solve a problem. We try to find dependencies and patterns
in the historical data related to the particular problem.
For example, companies go for this analysis because it gives great insight into a
problem, and they already keep detailed information at their disposal; otherwise, data
would have to be collected individually for every problem, which would be very time-consuming.
Common techniques used for Diagnostic Analytics are:
● Data discovery
● Data mining
● Correlations
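Correlation is the simplest of these techniques to show concretely: compute how strongly each historical factor moves together with the outcome being diagnosed. A minimal Python sketch of the Pearson correlation coefficient, assuming hypothetical weekly sales and two hypothetical candidate factors. A strong correlation only flags a dependency worth investigating; it does not by itself prove the cause.

```python
# Minimal sketch of diagnostic analytics: check which historical factor moves
# together with a drop in sales, using the Pearson correlation coefficient.
# The weekly figures and factor names are hypothetical.
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r in [-1, 1] between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


weekly_sales = [200, 195, 170, 160, 150, 140]
history = {
    "competitor_discount_pct": [0, 0, 10, 15, 20, 25],
    "ad_spend": [50, 52, 49, 51, 50, 48],
}

# Rank candidate explanations by strength of association (|r|), strongest first.
ranked = sorted(
    ((name, pearson(values, weekly_sales)) for name, values in history.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:+.2f}")
```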
Key Roles for a Successful Analytics Project-
1. Business User
2. Project Sponsor
3. Project Manager
4. Business Intelligence Analyst
5. Database Administrator (DBA)
6. Data Engineer
7. Data Scientist
1. Business User
● The business user is the one who understands the domain area of the project and
typically benefits most from the results.
● This user advises and consults the team working on the project about the value
of the results obtained and how the outputs will be operationalized.
● A business manager, line manager, or deep subject matter expert in the project
domain usually fulfills this role.
2. Project Sponsor
● The project sponsor is responsible for initiating the project. The project
sponsor provides the actual requirements for the project and presents the core
business problem.
● The sponsor generally provides the funding and gauges the degree of value from the final
output of the team working on the project.
● This person sets the priorities of the project and clarifies the desired outputs.
3. Project Manager:
● This person ensures that the key milestones and objectives of the project are met on
time and at the expected quality.
7. Data Scientist:
● The data scientist provides subject matter expertise in analytical techniques and
data modeling, and applies valid analytical techniques to the given business problems.
● He ensures that the overall analytical objectives are met.
● Data scientists design and apply analytical methods to the data
available for the concerned project.
Data Analytics Lifecycle
Phase 1- Discovery
● The data science team learns about and investigates the problem.
● Develop context and understanding.
● Come to know about the data sources needed and available for the project.
● The team formulates initial hypotheses that can later be tested with data.
In Phase 1, the team learns the business domain, including relevant history such as whether
the organization or business unit has attempted similar projects in the past from which they
can learn. The team assesses the resources available to support the project in terms of
people, technology, time, and data. Important activities in this phase include framing the
business problem as an analytics challenge that can be addressed in subsequent phases and
formulating initial hypotheses (IHs) to test and begin learning the data.
Phase 2- Data Preparation
● Steps to explore, preprocess, and condition data prior to modeling and analysis.
● It requires the presence of an analytic sandbox; the team executes extract, load, and transform
to get data into the sandbox.
● Data preparation tasks are likely to be performed multiple times and not in a predefined
order.
● Several tools commonly used for this phase are Hadoop, Alpine Miner, OpenRefine,
etc.
Phase 2 requires the presence of an analytic sandbox, in which the team can work with data
and perform analytics for the duration of the project. The team needs to execute extract,
load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox.
ELT and ETL are sometimes abbreviated together as ETLT. Data should be transformed in the
ETLT process so the team can work with it and analyze it. In this phase, the team also needs
to familiarize itself with the data thoroughly and take steps to condition the data.
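The extract, transform, and load steps described above can be sketched end to end with the Python standard library. This is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for the analytic sandbox, and the CSV content, column names, and table name `customers` are hypothetical. The transform step does simple conditioning: casting types, dropping malformed rows, and removing exact duplicates.

```python
# Minimal ETL sketch: extract raw records, transform (condition) them, and load
# them into an analytic sandbox. An in-memory SQLite database stands in for the
# sandbox; the CSV content and table name are hypothetical.
import csv
import io
import sqlite3

RAW_CSV = """customer_id,signup_date,monthly_spend
101,2023-01-15,49.90
102,2023-02-01,
103,2023-02-20,19.50
103,2023-02-20,19.50
abc,2023-03-05,25.00
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform/condition: cast types, drop malformed rows and exact duplicates."""
    clean, seen = [], set()
    for row in rows:
        try:
            record = (
                int(row["customer_id"]),
                row["signup_date"],
                float(row["monthly_spend"]) if row["monthly_spend"] else None,
            )
        except ValueError:
            continue  # e.g. a non-numeric customer_id
        if record in seen:
            continue  # exact duplicate of a row already kept
        seen.add(record)
        clean.append(record)
    return clean


def load(conn, records):
    """Load: write conditioned records into a sandbox table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER, signup_date TEXT, monthly_spend REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


sandbox = sqlite3.connect(":memory:")
load(sandbox, transform(extract(RAW_CSV)))
# Missing spend values are stored as NULL, which AVG() ignores.
print(sandbox.execute("SELECT COUNT(*), AVG(monthly_spend) FROM customers").fetchone())
```

This sketch transforms before loading (ETL). In an ELT variant, the raw rows would be loaded into the sandbox unchanged and conditioned there afterwards, for example with SQL, which keeps the original data available for later reprocessing.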
Phase 6- Operationalize
● The team communicates the benefits of the project more broadly and sets up a pilot project to
deploy the work in a controlled way before broadening it to the full enterprise of users.
● This approach enables the team to learn about the performance and related constraints of
the model in a production environment on a small scale, and make adjustments before
full deployment.
● The team delivers final reports, briefings, and code.
● Free or open source tools: Octave, WEKA, SQL, MADlib.
In Phase 6, the team delivers final reports, briefings, code, and technical documents. In
addition, the team may run a pilot project to implement the models in a production
environment.