Draw The Data Analytics Life Cycle and Explain Each Phase With Examples

The document outlines the Data Analytics Life Cycle, detailing six phases: Discovery, Data Preparation, Model Planning, Model Building, Communicate Results, and Operationalize, each with specific activities and examples. It also differentiates between Business Intelligence and Data Science, discusses the importance of model selection, and highlights tools used in various phases of analytics. Additionally, it addresses the causes of data deluge and its implications with a real-life example of a social media platform.



1. Draw the Data Analytics Life Cycle and explain each phase with
examples.

Phases:

[Diagram: Data Analytics Life Cycle showing the six phases, Discovery through Operationalize (figure in the accompanying slides)]

1. Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which it can learn. In short: understand business goals and identify key resources.
2. Data Preparation: Phase 2 requires an analytic sandbox in which the team can work with data and perform analytics for the duration of the project. The team executes extract, load, and transform (ELT) or extract, transform, and load (ETL), together abbreviated ETLT, to get data into the sandbox. In short: ETLT data into an analytics sandbox.
3. Model Planning: In Phase 3 the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase. In short: select appropriate modeling techniques.
4. Model Building: In Phase 4, the team develops datasets for testing, training, and production purposes. In addition, the team builds and executes models based on the work done in the model planning phase. In short: develop datasets and build models.
5. Communicate Results: In Phase 5, the team, in collaboration with major stakeholders, determines whether the results of the project are a success or a failure based on the criteria developed in Phase 1. In short: visualize findings and present insights to stakeholders.
6. Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical documents. In short: deploy the model into production for decision-making.

2. What is the Data Preparation phase? Explain ETLT process and the
role of the Analytics Sandbox.
Data Preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project.
The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox. ELT and ETL are sometimes combined and abbreviated as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it.

Activities:
● Prepare the analytic sandbox (commonly referred to as a workspace)
● Perform ETLT; ETL software is used to manage all aspects of data preparation
● Learn about the data: explore, pre-process, and condition it
● Survey and visualize the data
● Ask: do I have enough good-quality data to start building the model?
● Data preparation typically consumes about 50% of the project's time
● Tools: Hadoop, Alpine Miner, OpenRefine, Data Wrangler

ETLT steps (used to manage all aspects of data preparation):
1. Extract data from sources.
2. Transform it to a suitable format.
3. Load it into the sandbox.
4. Transform again as needed for analysis.

Analytics Sandbox: an isolated environment where analysts can safely manipulate and model data without affecting live systems.
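A minimal sketch of the ETLT flow using pandas, with a local SQLite file standing in for the analytics sandbox; the file name, table name, and column names are hypothetical:

```python
import sqlite3

import pandas as pd

# Extract: pull raw data from a source system (hypothetical CSV export)
raw = pd.read_csv("sales_export.csv")

# Transform (first pass): light cleaning so the data is loadable
raw["order_date"] = pd.to_datetime(raw["order_date"], errors="coerce")
raw = raw.dropna(subset=["order_date", "amount"])

# Load: land the cleaned data in the sandbox (SQLite used as a stand-in)
sandbox = sqlite3.connect("analytics_sandbox.db")
raw.to_sql("sales_raw", sandbox, if_exists="replace", index=False)

# Transform (second pass): reshape inside the sandbox for a specific analysis
monthly = pd.read_sql(
    "SELECT strftime('%Y-%m', order_date) AS month, SUM(amount) AS revenue "
    "FROM sales_raw GROUP BY month",
    sandbox,
)
print(monthly)
```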

3. What is the Discovery phase in the Data Analytics Life Cycle? What are the activities carried out in this phase?

In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which it can learn.
The team assesses the resources available to support the project in terms of people, technology, time, and data.
Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.

Activities:
● Learning the business domain
● Identifying resources
● Framing the problem
● Identifying key stakeholders
● Interviewing the analytics sponsor
● Developing initial hypotheses
● Identifying potential data sources

4. Explain the key roles involved in a successful analytics project. What are their responsibilities and expectations?

Business Analyst: understands business needs
Data Scientist: builds and validates models
Data Engineer: prepares and pipelines data
Project Manager: ensures timely delivery
Stakeholders: define expectations and act on insights

Expectations: actionable insights, improved decision-making, ROI

5. Differentiate between Business Intelligence (BI) and Data Science.

Focus: BI focuses on reporting and dashboards; Data Science focuses on predictive modeling and ML.
Data: BI works with structured, historical data; Data Science works with both structured and unstructured data.
Tools: BI uses Tableau and Power BI; Data Science uses Python, R, and TensorFlow.
Outcome: BI answers "what happened"; Data Science answers "what will happen" and "why it happened".
6. What is Model Planning and Model Building? List activities and
common tools used.

Model Planning: Phase 3 is model planning, where the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase.
The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.
Activities:
● Data exploration and variable selection
● Model selection
● Ask: do I have a good idea about the type of model to try? Can I refine the analytic plan?
Tools: R, SAS, Excel

Model Building: In Phase 4, the team develops datasets for testing, training, and production purposes.
In addition, in this phase the team builds and executes models based on the work done in the model planning phase.
Activities:
● Develop the analytical model and train it
● Build the model on the training data, fit it to the training data, and evaluate it with the test data
● Ask: is the model robust? Have we failed for sure?
Tools: Python, R, Scikit-learn, TensorFlow
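A minimal model building sketch in Python with scikit-learn, using a synthetic dataset so it runs on its own; in a real project the training and test sets would come from the prepared sandbox data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Develop separate datasets for training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Build the model on the training data, then evaluate it on the held-out test data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```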

7. Explain Model Selection in Data Analytics. How do you choose the right model for your data?

Factors to consider:
Type of problem (classification, regression)
Data size and structure
Performance metrics (accuracy, RMSE)
Interpretability
Tools support

Example: Use logistic regression for binary classification, decision trees for
interpretable models.
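One common way to compare candidate models on the same data is cross-validation. This sketch (synthetic data, scikit-learn) contrasts the logistic regression and decision tree mentioned in the example above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

# Score each candidate with 5-fold cross-validation on the chosen metric (accuracy here)
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

The candidate with the better cross-validated score on the metric that matters for the problem, and acceptable interpretability, is then carried forward into model building.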

8. What are the main sources of Big Data? Give three examples and explain each.

Digital devices and sensors (IoT): smartphones, smartwatches, industrial sensors, and connected vehicles continuously generate streams of machine data.
Social media and online activity: posts, comments, likes, shares, searches, and streaming on platforms such as Instagram or YouTube produce enormous volumes of user-generated data.
Enterprise and transactional systems: e-commerce orders, banking transactions, and customer interaction logs are collected and stored by organizations for analysis.

9. What are the 3 V’s of Big Data? Discuss main considerations when
processing Big Data.
The 3 V's of Big Data are Volume (the sheer scale of data generated), Velocity (the speed at which data is created and must be processed), and Variety (the mix of structured, semi-structured, and unstructured formats).

When processing Big Data, important considerations include:

Scalability and Elasticity: Your infrastructure and processing frameworks should be able to scale up or down dynamically based on the data volume and processing demands. Cloud platforms often provide this elasticity.
Fault Tolerance and Reliability: Given the distributed nature of Big Data
processing, systems should be designed to handle failures gracefully
without losing data or interrupting processing.
Security and Privacy: Protecting sensitive data is paramount.
Implementing appropriate security measures, access controls, and
adhering to privacy regulations are critical.
Cost-Effectiveness: Processing large volumes of data can be expensive.
Optimizing your infrastructure, choosing cost-efficient technologies,
and managing resource utilization are important considerations.
Data Governance and Management: Establishing clear policies and
procedures for data acquisition, storage, processing, and retention is
essential for maintaining data quality and compliance.
Skills and Expertise: Processing Big Data requires specialized skills in
areas like data engineering, data science, and distributed computing.
Having the right team with the necessary expertise is crucial for
success.
Choice of Tools and Technologies: A wide range of Big Data tools and
frameworks are available (e.g., Hadoop, Spark, Kafka, NoSQL databases,
cloud-based services). Selecting the right tools for your specific needs
and use cases is a critical decision.
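As a small illustration of the scalability and tooling points above, a PySpark sketch that aggregates a large event log in a distributed, fault-tolerant way; the bucket path and column names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Spark distributes the work across a cluster and retries failed tasks automatically
spark = SparkSession.builder.appName("clickstream-aggregation").getOrCreate()

# Hypothetical clickstream export, read in parallel across partitions
events = spark.read.csv("s3://example-bucket/clickstream/*.csv", header=True, inferSchema=True)

# Aggregate events per user per day without collecting the data onto one machine
daily_counts = (
    events.groupBy("user_id", F.to_date("event_time").alias("day"))
          .agg(F.count("*").alias("events"))
)

daily_counts.write.mode("overwrite").parquet("s3://example-bucket/aggregates/daily_counts")
spark.stop()
```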

10. Write a short note on Big Data Analytics Architecture with a neat
diagram.
11. What is Linear Regression? Difference between Simple and
Multiple Linear Regression. How is performance evaluated?


✅ What is Linear Regression?


Linear Regression is a statistical and machine learning technique used
to model the relationship between a dependent variable (target) and
one or more independent variables (features) by fitting a linear
equation to the observed data.

General form of the equation:

Y=β0+β1X1+β2X2+...+βnXn+ϵ

Where:
Y = predicted value (dependent variable)
Xn = input features (independent variables)
βn = coefficients
ϵ = error term

✅ Difference Between Simple and Multiple Linear Regression


Number of independent variables
  Simple Linear Regression: 1
  Multiple Linear Regression: 2 or more
Equation
  Simple: Y = β0 + β1X + ϵ
  Multiple: Y = β0 + β1X1 + β2X2 + ... + βnXn + ϵ
Use case
  Simple: predicting salary based on years of experience
  Multiple: predicting house price using area, location, and age
Visualization
  Simple: a straight line on a 2D graph
  Multiple: a multidimensional plane (not easily visualizable)
Complexity
  Simple: low
  Multiple: higher

✅ How is Performance Evaluated?


Common metrics used to evaluate a linear regression model:

1. R² (R-squared)
Measures the proportion of variance in the dependent variable
explained by the model.
Value ranges from 0 to 1 (closer to 1 is better).
2. Adjusted R²
Modified R² that adjusts for the number of predictors in the
model.
Useful in multiple regression.
3. Mean Absolute Error (MAE)
Average of absolute differences between predicted and actual
values.
Easy to interpret.
4. Mean Squared Error (MSE)
Average of squared differences. Penalizes larger errors more.
5. Root Mean Squared Error (RMSE)
Square root of MSE. Same units as the output variable.
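A minimal scikit-learn sketch that fits a multiple linear regression on synthetic data and reports the metrics listed above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y depends linearly on two features plus noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mse = mean_squared_error(y_test, pred)
print("R^2 :", r2_score(y_test, pred))
print("MAE :", mean_absolute_error(y_test, pred))
print("MSE :", mse)
print("RMSE:", mse ** 0.5)
```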

12. Define Descriptive, Diagnostic, and Predictive Analytics with examples.

Descriptive Analytics
● Purpose: summarize and describe historical data
● Question answered: What happened?
● Data used: historical, aggregated data
● Techniques/tools: data aggregation, reporting, dashboards, statistics
● Complexity: low (simple summaries)
● Outcome: a clear view of past performance
● Examples: total monthly sales, customer count trends
● Tools: Excel, Tableau, Power BI
● Output format: reports, bar charts, line graphs
● Decision support: basic insight for historical tracking

Diagnostic Analytics
● Purpose: understand the causes of events or patterns
● Question answered: Why did it happen?
● Data used: historical data with segmentation and comparison
● Techniques/tools: drill-down, data mining, correlation analysis, cause-effect analysis
● Complexity: moderate (requires deeper analysis)
● Outcome: identifies factors influencing past outcomes
● Examples: sales dropped due to a price increase or market competition
● Tools: SQL with analytics, R, Python with visualization tools
● Output format: root cause diagrams, comparison dashboards
● Decision support: informs corrective action

Predictive Analytics
● Purpose: predict future outcomes based on historical data
● Question answered: What is likely to happen?
● Data used: historical data used to train predictive models
● Techniques/tools: machine learning, regression, time series, classification models
● Complexity: high (involves modeling and algorithm selection)
● Outcome: forecasts and probabilities of future trends
● Examples: predicting next quarter's sales or customer churn
● Tools: Python (scikit-learn), R, SAS, IBM SPSS, ML platforms
● Output format: prediction scores, risk levels, probability-based decisions
● Decision support: enables proactive strategies and resource planning
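For instance, a descriptive summary like the ones in the table above can be produced with a simple pandas aggregation; the records here are made up purely for illustration:

```python
import pandas as pd

# Hypothetical transaction records
sales = pd.DataFrame({
    "month":  ["Jan", "Jan", "Feb", "Feb", "Mar"],
    "region": ["North", "South", "North", "South", "North"],
    "amount": [1200, 950, 1100, 1300, 900],
})

# Descriptive analytics: summarize what happened (total and average sales per month)
summary = sales.groupby("month")["amount"].agg(total="sum", average="mean")
print(summary)
```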

13. Explain common tools used for Model Building in analytics. Mention both open-source and commercial tools.

Open source: Python (Scikit-learn, TensorFlow), R
Commercial: SAS, IBM Watson, RapidMiner

🔧 1. Tools for Data Preparation

These tools help clean, format, and organize raw data before analysis:
● Python (Pandas, NumPy): data wrangling, handling missing values, merging datasets, and reshaping data
● R (tidyverse): data cleaning using dplyr, tidyr, etc.
● Talend / Alteryx: visual ETL tools that connect to sources, then clean and transform data without coding
● Apache NiFi: real-time data ingestion and transformation pipelines for big data
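A short pandas sketch of the preparation steps listed above: handling missing values, merging datasets, and reshaping; the tables and column names are hypothetical:

```python
import pandas as pd

# Hypothetical raw extracts from two source systems
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 11, None],
    "amount": [250.0, None, 99.0, 400.0],
})
customers = pd.DataFrame({"customer_id": [10, 11], "segment": ["retail", "wholesale"]})

# Handle missing values: drop rows without a customer, impute missing amounts
orders = orders.dropna(subset=["customer_id"])
orders["customer_id"] = orders["customer_id"].astype(int)
orders["amount"] = orders["amount"].fillna(orders["amount"].median())

# Merge the datasets and reshape into a per-segment summary
merged = orders.merge(customers, on="customer_id", how="left")
print(merged.pivot_table(index="segment", values="amount", aggfunc="sum"))
```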

🧠 2. Tools for Model Planning

Used to explore data and determine the best modeling techniques:
● R (ggplot2, stats): data visualization, correlation analysis, summary statistics
● Python (Matplotlib, Seaborn, Scikit-learn): plotting distributions, feature importance, and choosing models
● SAS: offers statistical modeling and planning tools
● IBM SPSS: GUI-based planning of statistical models
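A small sketch of the exploration typically done during model planning with the Python tools above: summary statistics, a correlation check, and a quick plot on synthetic data:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic dataset standing in for the prepared sandbox data
rng = np.random.default_rng(7)
df = pd.DataFrame({"ad_spend": rng.normal(100, 20, 200)})
df["revenue"] = 5 * df["ad_spend"] + rng.normal(0, 40, 200)

# Summary statistics and correlations help select variables and candidate models
print(df.describe())
print(df.corr())

# A scatter plot suggests whether a linear model is a reasonable first choice
df.plot.scatter(x="ad_spend", y="revenue")
plt.savefig("ad_spend_vs_revenue.png")
```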

⚙️ 3. Tools for Model Building

Used to build, train, and evaluate predictive models:
● Python (Scikit-learn, TensorFlow, XGBoost): most widely used for regression, classification, and deep learning models
● R (caret, mlr, randomForest): easy to build and tune predictive models with good visualization
● RapidMiner: drag-and-drop environment for model building and evaluation
● SAS Enterprise Miner: visual environment for predictive modeling
● H2O.ai: open-source platform supporting large-scale ML (works with R, Python, and Spark)

14. What is causing the data deluge? Explain with a real-life example.

The "data deluge" refers to the exponentially increasing volume of data being created and stored worldwide. This surge is driven by a confluence of factors:

1. Proliferation of Digital Devices and Sensors: We are surrounded by devices that constantly generate data. Smartphones, laptops, tablets, smartwatches, and the rapidly growing Internet of Things (IoT) devices (smart home appliances, industrial sensors, connected vehicles) all contribute massive streams of information.
2. Increased Internet Usage and Online Activities: Our daily online
activities leave digital footprints. Social media interactions (posts,
comments, likes, shares), online shopping, streaming videos and music,
browsing websites, and sending emails all generate vast amounts of
data.
3. Growth of Multimedia Content: Images, videos, and audio files are
inherently large in size. The increasing creation and sharing of such
content on social media platforms and other online services
significantly contribute to the data deluge. High-resolution videos and
the popularity of platforms like YouTube and TikTok amplify this effect.
4. Rise of Big Data Technologies and Data Collection: Organizations are
increasingly recognizing the value of data and are implementing
sophisticated technologies to collect and store information from
various sources. This includes customer interactions, business
processes, sensor data, and publicly available information. The ability
to store and process this data more efficiently further encourages its
collection.
5. Scientific Research and Simulations: Fields like genomics, astronomy,
climate science, and high-energy physics generate enormous datasets
through experiments and simulations. Advances in these areas lead to
increasingly complex and data-intensive research.
6. Regulatory and Compliance Requirements: Many industries are
subject to regulations that mandate the long-term storage of various
types of data, contributing to the overall volume.

Real-life Example: Social Media Platform

Consider a popular social media platform like Instagram. Millions of users worldwide are constantly:
Uploading photos and videos: Each high-resolution image or video adds
significantly to the platform's storage needs.
Posting text updates and comments: While smaller in size individually,
the sheer volume of these interactions across millions of users
generates a massive amount of textual data.
Sending direct messages: Private conversations contribute to the
overall data volume.
Interacting with content: Likes, shares, saves, and story views are all
recorded as data points.
Generating usage data: Information about how users navigate the app,
their preferences, and their activity patterns is also collected.

Impact:
Over a single day, Instagram (and similar platforms) generates
terabytes, if not petabytes, of new data. This constant influx requires
massive and scalable infrastructure for storage, processing, and
analysis. The platform needs to efficiently manage this "data deluge" to:
Provide core services: Ensure users can upload, view, and interact with
content smoothly.
Personalize user experience: Recommend relevant content, suggest
connections, and tailor advertisements based on user data.
Detect and prevent abuse: Identify and remove harmful content or
malicious accounts by analyzing patterns in the data.
Gain business insights: Understand user behavior, trends, and
preferences to improve the platform and its offerings.
Without the ability to handle this massive and continuous flow of data,
the social media platform would become slow, unreliable, and unable
to deliver a relevant experience to its users. This example illustrates
how the combination of user activity, multimedia content, and the
platform's need to understand and manage this information leads to a
significant data deluge.

15. List and explain a few applications of Big Data Analytics in industries such as healthcare, retail, or finance.

Healthcare: disease prediction, patient monitoring
Retail: customer segmentation, personalized marketing
Finance: fraud detection, credit scoring
Manufacturing: predictive maintenance, supply chain optimization
