Data Analytics Question

The document outlines key concepts in data analytics, including the differences between descriptive, predictive, and prescriptive analytics, as well as correlation vs. causation. It discusses various statistical methods such as hypothesis testing, regression analysis, and the importance of data cleaning and visualization. Additionally, it covers machine learning concepts like overfitting, model evaluation, and the use of tools like Power BI and DAX.

Uploaded by

shahipiyush55

What is the difference between descriptive, predictive, and prescriptive analytics?

• Descriptive Analytics: Focuses on summarizing past data to understand what happened. For
example, summarizing sales data for the past quarter.

• Predictive Analytics: Uses historical data and statistical models to forecast future trends. For
instance, predicting next quarter’s sales based on past trends.

• Prescriptive Analytics: Suggests actions based on the analysis to achieve desired outcomes,
such as recommending the best pricing strategy to maximize profit.

Explain the concept of correlation vs. causation with examples.

• Correlation indicates a statistical relationship between two variables, but one does not
necessarily cause the other. For example, ice cream sales may be correlated with the number
of people going to the beach, yet neither causes the other; both tend to rise in hot weather.

• Causation implies that a change in one variable directly produces a change in another. For
example, a business increasing its advertising budget may cause a rise in sales.
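
As a rough illustration of the ice cream and beach example, the sketch below computes the Pearson correlation coefficient by hand. All figures are made up, and treating temperature as the hidden driver of both series is an assumption added here for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up daily figures. Temperature is the assumed common cause.
temperature = [20, 25, 30, 35]
ice_cream_sales = [100, 150, 200, 250]
beach_visitors = [40, 60, 80, 100]

# Strong correlation between the two outcomes, although neither causes the other.
r_sales_visitors = pearson_r(ice_cream_sales, beach_visitors)
r_temp_sales = pearson_r(temperature, ice_cream_sales)
print(round(r_sales_visitors, 3), round(r_temp_sales, 3))
```

A high coefficient here says only that the two series move together; it cannot tell which variable, if any, is the cause.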

What is hypothesis testing, and why is it important?

• Hypothesis testing is a statistical method used to test whether a claim about a population
is supported by sample data. It helps make data-driven decisions, like testing whether
a new marketing strategy significantly increases sales. It's important because it
provides a structured way to draw conclusions from data, rather than relying on
assumptions.
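
One simple way to run such a test without any statistics library is a permutation test, sketched below. The sales figures, the function name `permutation_test`, and the 0.05 threshold are illustrative assumptions, not part of the original notes.

```python
import random

def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the share of random relabelings whose absolute mean difference
    is at least as large as the observed one (an approximate p-value).
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        first, second = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(first) / len(first) - sum(second) / len(second))
        if diff >= observed:
            extreme += 1
    return extreme / n_resamples

# Made-up weekly sales under the old and the new marketing strategy.
old_strategy = [100, 102, 98, 101, 99, 100, 103, 97]
new_strategy = [110, 108, 112, 109, 111, 107, 113, 110]

p_value = permutation_test(old_strategy, new_strategy)
print("reject null hypothesis of no difference:", p_value < 0.05)
```

The null hypothesis is that the strategy makes no difference, so the group labels are interchangeable. A small p-value means the observed gap would rarely arise from relabeling alone.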

What are the key differences between mean, median, and mode?

• Mean is the average of a data set and can be affected by extreme values (outliers).
• Median is the middle value in a data set when sorted in order and is far less affected by
outliers than the mean.
• Mode is the value that appears most frequently in the dataset and can be useful for
categorical data.
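
A quick check with Python's standard `statistics` module shows how a single outlier pulls the mean but not the median or mode. The numbers are made up for illustration.

```python
from statistics import mean, median, mode

# Made-up values; 200 is a deliberate outlier.
salaries = [30, 32, 32, 35, 200]

avg = mean(salaries)       # pulled upwards by the outlier
middle = median(salaries)  # middle value of the sorted data
common = mode(salaries)    # most frequent value

print(avg, middle, common)
```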

Explain regression analysis. What are its types?

Regression analysis is used to understand relationships between variables. It helps predict the
value of one variable based on the values of others.

• Linear Regression: Predicts a continuous outcome as a linear function of one or more
independent variables.
• Multiple Regression: Uses two or more independent variables to predict a dependent
variable.
• Logistic Regression: Used for predicting binary outcomes (yes/no, true/false).
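
For the simplest case, one independent variable, a least-squares line can be fitted in a few lines. This is a minimal sketch; the helper name `fit_line` and the advertising data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that follows y = 2x + 1 exactly.
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 6 + intercept  # prediction for an unseen ad spend of 6
print(slope, intercept, predicted_sales)
```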

What are the key principles of effective data visualization?

Effective data visualization should:

• Simplify complex data: Visualizations should make it easy to understand the data at
a glance.
• Focus on the story: The visualization should highlight the key insights, not just show
numbers.
• Use appropriate chart types: Choose the right type of chart (bar, line, pie) for the
data and audience.
• Ensure clarity: Use clear labels, legends, and colours to avoid confusion.

What is data cleaning, and why is it important in analytics?

• Data cleaning is the process of identifying and correcting errors or inconsistencies in a
dataset. It’s essential because accurate analysis depends on clean data. Inconsistent or
incomplete data can lead to incorrect conclusions and business decisions.
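
A small sketch of typical cleaning steps follows: trimming whitespace, normalising case, dropping incomplete rows, and removing duplicates. The record layout (name, city, amount) and the function name `clean_records` are assumptions made for this example.

```python
def clean_records(records):
    """Trim whitespace, normalise case, drop incomplete rows and duplicates."""
    cleaned = []
    seen = set()
    for name, city, amount in records:
        if name is None or city is None or amount is None:
            continue  # drop rows with missing values
        row = (name.strip().title(), city.strip().title(), float(amount))
        if row in seen:
            continue  # drop exact duplicates after normalisation
        seen.add(row)
        cleaned.append(row)
    return cleaned

# Made-up raw data with inconsistent formatting, a duplicate and a missing city.
raw = [
    (" alice ", "delhi", "120.5"),
    ("Alice", "Delhi ", 120.5),
    ("bob", None, "80"),
    ("Carol", "MUMBAI", "95"),
]

cleaned = clean_records(raw)
print(cleaned)
```

Dropping incomplete rows is only one option; depending on the analysis, filling missing values may be more appropriate.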

How can Business Analytics add value to an organization?

• "Business analytics provides insights that drive data-informed decisions. It helps
optimize operations, improve customer satisfaction, and increase profitability. For
example, using analytics to understand customer preferences enables businesses to
personalize marketing strategies, leading to higher conversion rates and improved
ROI."

What industries do you think benefit most from analytics? Why?

• "Industries like retail, finance, healthcare, and e-commerce benefit greatly from
analytics. Retail and e-commerce use analytics to optimize inventory, predict demand,
and personalize customer experiences. Healthcare uses analytics to improve patient
care, while finance relies on it for risk assessment and fraud detection."

What is the difference between classification and regression in analytics?

• Classification is a type of predictive modeling where the output variable is
categorical (e.g., yes/no, 0/1). For instance, predicting whether a customer will churn
(yes/no).
• Regression is used when the output variable is continuous (e.g., predicting sales or
temperature). For example, predicting the total sales for the next quarter.

What is overfitting in a machine learning model, and how can it be avoided?

• Overfitting occurs when a model is too complex and captures noise or random
fluctuations in the data, rather than the actual pattern. It leads to high accuracy on
training data but poor performance on new, unseen data.
• To avoid overfitting:
   o Use simpler models or regularization techniques (e.g., L1/L2 regularization).
   o Split the data into training and validation sets to test the model on unseen data.
   o Use cross-validation to ensure the model generalizes well.
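
The gap between training and validation error can be seen in a toy comparison. The sketch below pits a straight line (simple model) against a "memorizer" that returns the target of the nearest training point (an overly flexible model). The data, the noise pattern, and both helper names are made up for illustration.

```python
def mse(model, data):
    """Mean squared error of `model` (a function of x) on (x, y) pairs."""
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

def fit_line(data):
    """Least-squares straight line: the simple model."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(data):
    """Return the y of the nearest training x: a model that memorises noise."""
    return lambda x: min(data, key=lambda pair: abs(pair[0] - x))[1]

# Made-up data: the true pattern is y = 2x; the training targets carry +/-1 noise.
train = [(0, 1.0), (2, 3.0), (4, 9.0), (6, 11.0), (8, 17.0)]
validation = [(1, 2.0), (3, 6.0), (5, 10.0), (7, 14.0), (9, 18.0)]

line = fit_line(train)
memorizer = fit_memorizer(train)

for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name, round(mse(model, train), 2), round(mse(model, validation), 2))
```

The memorizer reproduces the training set perfectly, yet its validation error is far larger than the line's, which is the signature of overfitting.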

Can you explain what a confusion matrix is?

A confusion matrix is a tool used to evaluate the performance of classification models. It
shows the actual vs. predicted classifications and includes:

• True Positives (TP): Positive cases correctly predicted as positive.
• True Negatives (TN): Negative cases correctly predicted as negative.
• False Positives (FP): Negative cases incorrectly predicted as positive.
• False Negatives (FN): Positive cases incorrectly predicted as negative.
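
The four counts can be tallied directly from paired lists of actual and predicted labels. A minimal sketch, assuming binary labels where 1 marks the positive class; the labels and the function name `confusion_counts` are illustrative.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["TP"] += 1
        elif p == positive:
            counts["FP"] += 1  # predicted positive, actually negative
        elif a == positive:
            counts["FN"] += 1  # predicted negative, actually positive
        else:
            counts["TN"] += 1
    return counts

# Made-up labels: 1 = customer churned, 0 = customer stayed.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)
```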

What is the difference between supervised and unsupervised learning?

• Supervised learning involves training a model on labeled data, where the target
(output) variable is known. The model is then used to predict the target for new data.
Example: predicting house prices based on features like size, location, etc.
• Unsupervised learning involves working with data that does not have labeled
outputs. The goal is to find hidden patterns or relationships in the data. Example:
clustering customers based on purchasing behavior.

Explain what bias-variance tradeoff is.

• Bias refers to the error introduced by approximating a real-world problem with a
simplified model.
• Variance refers to the model's sensitivity to small fluctuations in the training data.
• Tradeoff: As you reduce bias (by making the model more complex), variance
increases, leading to overfitting. Similarly, reducing variance (by simplifying the
model) may increase bias, leading to underfitting. The goal is to find a balance
between the two.
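
For squared-error loss the tradeoff can be written out explicitly. The setup here is an assumption not stated in the notes: the data follow y = f(x) + ε with zero-mean noise of variance σ², f̂ is the fitted model, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model, so lowering one at the cost of raising the other is where the balance has to be found.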

What is normalization, and why is it important in database design?

• Normalization is the process of organizing data to reduce redundancy and improve
data integrity. It involves splitting large tables into smaller, related tables and using
foreign keys to maintain the relationships between them. This makes the database easier
to maintain and ensures that updates are consistent across it.
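
A small sketch with Python's built-in `sqlite3` module shows the idea. The table and column names (`customers`, `orders`, and so on) and the data are hypothetical. Customer details live in one table, and each order points to its customer through a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Asha', 'Pune')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(101, 1, 250.0), (102, 1, 90.0)],
)

# The city is stored once, so a single UPDATE keeps every order consistent.
conn.execute("UPDATE customers SET city = 'Mumbai' WHERE customer_id = 1")

rows = conn.execute("""
    SELECT o.order_id, c.name, c.city, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

Had the city been repeated on every order row, the same change would have required updating each row and risked leaving some of them stale.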

What are some common mistakes to avoid when creating visualizations?

• Overcomplicating the visuals: Too many data points or complex charts can confuse
the audience. Use simplicity to highlight key insights.
• Poor choice of chart type: For example, using pie charts for too many categories can
make it hard to read. Use bar charts or line charts where appropriate.
• Lack of context: Ensure that your visualizations have clear labels, legends, and
context to avoid misinterpretation.
• Ignoring color usage: Colors should be used consistently and in a way that is easy to
distinguish. Avoid using too many similar colors.

How do you choose between different visualization tools like Tableau, Power BI, and Excel?

• Tableau: Best for creating interactive dashboards and advanced visualizations with
large datasets. It’s powerful for real-time analysis.
• Power BI: Ideal for integrating with Microsoft tools and for businesses already using
the Microsoft ecosystem. Great for straightforward visualizations and dashboarding.
• Excel: Suitable for smaller datasets and quick analyses, especially when working with
financial data. It’s more flexible for custom calculations.

Explain the importance of "data storytelling" when presenting analytics results.

• "Data storytelling" involves presenting your findings in a narrative form that is
engaging and easy to understand. It's important because data alone can be
overwhelming. When paired with a story, data becomes more relatable and actionable
for decision-makers. For instance, instead of just showing a chart of sales data, I could
highlight the key factors that led to sales growth, using the data to guide the audience
through the insights.

What are the types of clustering algorithms?

• K-means clustering: Partitions data into k clusters by assigning each point to the
nearest cluster centre (centroid).
• Hierarchical clustering: Builds a tree-like structure to show the nested groups.
• DBSCAN: Density-based clustering that groups together points lying in dense regions
and treats points in sparse regions as noise.
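
The k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched for one-dimensional data. The starting centroids are supplied by the caller here to keep the example deterministic; the data and the function name `kmeans_1d` are made up.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data with caller-supplied starting centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up monthly spend of six customers, forming two obvious groups.
spend = [1, 2, 3, 10, 11, 12]
centroids, clusters = kmeans_1d(spend, centroids=[1, 12])
print(centroids, clusters)
```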

What is predictive modelling?

Answer: Predictive modelling is a statistical technique that uses historical data to create a model
that can predict future outcomes. It involves identifying patterns in data and applying algorithms to
forecast future events based on those patterns.

What are the common types of predictive modeling techniques?

Answer: Common techniques include:

1. Linear Regression: Used for predicting a continuous outcome based on one or more predictor
variables.

2. Logistic Regression: Used for binary classification problems.

3. Decision Trees: A flowchart-like structure that splits data into branches to make predictions.

4. Random Forest: An ensemble method that uses multiple decision trees to improve accuracy.

5. Support Vector Machines (SVM): A classification technique that finds the hyperplane that best
separates classes.

6. Neural Networks: A set of algorithms loosely inspired by the human brain, used for complex
pattern recognition.

What is overfitting, and how can it be prevented?

Answer: Overfitting occurs when a model learns the noise in the training data rather than the
underlying pattern, resulting in poor performance on unseen data. It can be prevented by:

1. Using simpler models.

2. Applying regularization techniques (e.g., L1 or L2 regularization).

3. Using cross-validation to ensure the model generalizes well.

4. Pruning decision trees.

What is cross-validation, and why is it important?

Answer: Cross-validation is a technique used to assess how a predictive model will generalize to an
independent dataset. It involves partitioning the data into subsets, training the model on some
subsets, and validating it on others. It is important because it helps to detect overfitting and
provides a more reliable estimate of model performance than a single train/test split.
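
The partitioning step can be sketched as a k-fold split over row indices. This is a minimal version that uses contiguous folds and no shuffling; in practice the rows are usually shuffled first. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop

folds = list(k_fold_indices(6, 3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

Each row lands in the validation set exactly once, and the model's score is typically averaged over the k rounds.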

How do you evaluate the performance of a predictive model?

Answer: Model performance can be evaluated using various metrics, depending on the type of
problem:

For regression: Mean Absolute Error (MAE), Mean Squared Error (MSE), R-squared.

For classification: Accuracy, Precision, Recall, F1 Score, Area Under the Receiver Operating
Characteristic Curve (AUC-ROC).
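
Most of these metrics are short formulas. The sketch below computes MAE, MSE and R-squared from paired values, and accuracy, precision, recall and F1 from confusion-matrix counts. AUC-ROC is left out because it needs predicted scores rather than counts. The inputs are made up, and the functions assume non-zero denominators.

```python
def regression_metrics(actual, predicted):
    """Return (MAE, MSE, R-squared) for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum(e * e for e in errors)
    r_squared = 1 - ss_residual / ss_total
    return mae, mse, r_squared

def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

mae, mse, r_squared = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
accuracy, precision, recall, f1 = classification_metrics(tp=3, tn=3, fp=1, fn=1)
print(mae, mse, r_squared)
print(accuracy, precision, recall, f1)
```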

What is DAX, and how is it used in Power BI?

Answer: DAX (Data Analysis Expressions) is a formula language used in Power BI for data modeling
and analysis. It allows users to create calculated columns, measures, and custom calculations to
enhance data analysis.

What is the difference between Power Query and Power Pivot?

Answer: Power Query: A data connection technology that enables users to discover, connect,
combine, and refine data from various sources.

Power Pivot: A data modelling tool that allows users to create data models, define relationships, and
perform complex calculations using DAX.

What is the purpose of using slicers in Power BI?

Answer: Slicers are visual filters that allow users to segment data in reports. They provide an
interactive way to filter data based on specific criteria, enhancing user experience and data
exploration.
