
Data Analytics Question

What is the difference between descriptive, predictive, and prescriptive analytics?

 Descriptive Analytics: Focuses on summarizing past data to understand what happened. For
example, summarizing sales data for the past quarter.

 Predictive Analytics: Uses historical data and statistical models to forecast future trends. For
instance, predicting next quarter’s sales based on past trends.

 Prescriptive Analytics: Suggests actions based on the analysis to achieve desired outcomes,
such as recommending the best pricing strategy to maximize profit.

Explain the concept of correlation vs. causation with examples.

 Correlation indicates a relationship between two variables, but one does not necessarily
cause the other. For example, there may be a correlation between ice cream sales and the
number of people going to the beach, but ice cream sales are not caused by the number of
beach-goers.

 Causation implies that one variable directly affects another. For example, a business
increasing its advertising budget may cause a rise in sales.
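
To make the distinction concrete, here is a minimal Python sketch that measures correlation with NumPy. The ice-cream and beach figures are invented for illustration; a high coefficient alone says nothing about which variable, if either, drives the other.

```python
import numpy as np

# Invented daily figures for the beach example above.
ice_cream_sales = np.array([120, 150, 200, 260, 310])  # units sold per day
beach_visitors = np.array([300, 380, 500, 640, 760])   # people per day

# Pearson correlation coefficient: a value near 1 indicates a strong
# linear relationship, but it does not establish causation -- both
# series are probably driven by a third variable such as temperature.
r = np.corrcoef(ice_cream_sales, beach_visitors)[0, 1]
print(f"correlation: {r:.2f}")
```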

What is hypothesis testing, and why is it important?

 Hypothesis testing is a statistical method used to test whether a claim about a population is true, based on sample data. It helps make data-driven decisions, like testing whether a new marketing strategy significantly increases sales. It's important because it provides a structured way to draw conclusions from data, rather than relying on assumptions.
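
As a rough illustration, here is a minimal two-sample t-test with SciPy, using made-up weekly sales figures from before and after a hypothetical new marketing strategy:

```python
from scipy import stats

# Invented weekly sales before and after the strategy change.
sales_before = [42, 39, 45, 41, 38, 44, 40]
sales_after = [47, 49, 44, 50, 46, 48, 45]

# Null hypothesis: the two means are equal. A small p-value is
# evidence that the observed increase is not just random noise.
t_stat, p_value = stats.ttest_ind(sales_after, sales_before)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: sales increased significantly.")
```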

What are the key differences between mean, median, and mode?

 Mean is the average of a data set and can be affected by extreme values (outliers).
 Median is the middle value in a data set when sorted in order and is not affected by
outliers.
 Mode is the value that appears most frequently in the dataset and can be useful for
categorical data.
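
A quick demonstration with Python's standard statistics module shows how a single outlier pulls the mean but leaves the median untouched (the numbers are invented):

```python
import statistics

values = [10, 12, 12, 14, 15, 950]  # 950 is an extreme outlier

print(statistics.mean(values))    # about 168.8 -- dragged up by the outlier
print(statistics.median(values))  # 13.0 -- unaffected by the outlier
print(statistics.mode(values))    # 12 -- the most frequent value
```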

Explain regression analysis. What are its types?

Regression analysis is used to understand relationships between variables. It helps predict the
value of one variable based on the values of others.

 Linear Regression: Predicts a continuous outcome from a single independent variable (simple linear regression).
 Multiple Regression: Uses several independent variables to predict a dependent variable.
 Logistic Regression: Used for predicting binary outcomes (yes/no, true/false).
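
The following minimal scikit-learn sketch contrasts linear regression (continuous target) with logistic regression (binary target); the tiny datasets are invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: predict sales (continuous) from advertising spend.
ad_spend = np.array([[1.0], [2.0], [3.0], [4.0]])
sales = np.array([10.5, 14.8, 19.6, 24.1])
lin = LinearRegression().fit(ad_spend, sales)
print(lin.predict([[5.0]]))  # forecast for a new spend level

# Logistic regression: predict churn (0/1) from monthly usage hours.
usage = np.array([[30.0], [5.0], [25.0], [2.0], [40.0], [3.0]])
churn = np.array([0, 1, 0, 1, 0, 1])
log = LogisticRegression().fit(usage, churn)
print(log.predict([[4.0]]))  # low usage, so churn (1) is the likely label
```
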
What are the key principles of effective data visualization?

Effective data visualization should:

 Simplify complex data: Visualizations should make it easy to understand the data at
a glance.
 Focus on the story: The visualization should highlight the key insights, not just show
numbers.
 Use appropriate chart types: Choose the right type of chart (bar, line, pie) for the
data and audience.
 Ensure clarity: Use clear labels, legends, and colours to avoid confusion.

What is data cleaning, and why is it important in analytics?

 Data cleaning is the process of identifying and correcting errors or inconsistencies in a dataset. It’s essential because accurate analysis depends on clean data. Inconsistent or incomplete data can lead to incorrect conclusions and business decisions.
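
A minimal pandas sketch of typical cleaning steps, on an invented DataFrame that has exactly the kinds of problems described above (inconsistent casing, a duplicate row, and missing values):

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["Ann", "Bob", "Bob", "Cara"],
    "region": ["north", "North", "North", None],
    "sales": [120.0, 80.0, 80.0, None],
})

df["region"] = df["region"].str.title()               # fix inconsistent casing
df = df.drop_duplicates()                             # drop the repeated Bob row
df["region"] = df["region"].fillna("Unknown")         # flag the missing category
df["sales"] = df["sales"].fillna(df["sales"].mean())  # impute the missing value
print(df)
```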

How can Business Analytics add value to an organization?

 "Business analytics provides insights that drive data-informed decisions. It helps


optimize operations, improve customer satisfaction, and increase profitability. For
example, using analytics to understand customer preferences enables businesses to
personalize marketing strategies, leading to higher conversion rates and improved
ROI."

What industries do you think benefit most from analytics? Why?

 "Industries like retail, finance, healthcare, and e-commerce benefit greatly from
analytics. Retail and e-commerce use analytics to optimize inventory, predict demand,
and personalize customer experiences. Healthcare uses analytics to improve patient
care, while finance relies on it for risk assessment and fraud detection."

What is the difference between classification and regression in analytics?

 Classification is a type of predictive modeling where the output variable is categorical (e.g., yes/no, 0/1). For instance, predicting whether a customer will churn (yes/no).
 Regression is used when the output variable is continuous (e.g., predicting sales or temperature). For example, predicting the total sales for the next quarter.

What is overfitting in a machine learning model, and how can it be avoided?

 Overfitting occurs when a model is too complex and captures noise or random
fluctuations in the data, rather than the actual pattern. It leads to high accuracy on
training data but poor performance on new, unseen data.
 To avoid overfitting:
o Use simpler models or regularization techniques (e.g., L1/L2 regularization).
o Split the data into training and validation sets to test the model on unseen data.
o Use cross-validation to ensure the model generalizes well.
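
Here is a minimal scikit-learn sketch of two of these remedies combined, a train/validation split plus L2 (Ridge) regularization, on synthetic data where only one of ten features carries real signal:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)  # one signal, nine noise features

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# alpha sets the strength of the L2 penalty: larger values shrink the
# coefficients, trading a little bias for lower variance.
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("train R^2:", model.score(X_train, y_train))
print("val   R^2:", model.score(X_val, y_val))  # the score that actually matters
```
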
Can you explain what a confusion matrix is?

A confusion matrix is a tool used to evaluate the performance of classification models. It shows the actual vs. predicted classifications and includes:

 True Positives (TP): Correctly predicted positive cases.
 True Negatives (TN): Correctly predicted negative cases.
 False Positives (FP): Incorrectly predicted positive cases.
 False Negatives (FN): Incorrectly predicted negative cases.
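
A minimal sketch with scikit-learn, computing the matrix from invented actual and predicted labels:

```python
from sklearn.metrics import confusion_matrix

actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# With labels 0 and 1, rows are actual classes and columns are
# predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(actual, predicted))  # [[3 1], [1 3]] for this data
```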

What is the difference between supervised and unsupervised learning?

 Supervised learning involves training a model on labeled data, where the target
(output) variable is known. The model is then used to predict the target for new data.
Example: predicting house prices based on features like size, location, etc.
 Unsupervised learning involves working with data that does not have labeled
outputs. The goal is to find hidden patterns or relationships in the data. Example:
clustering customers based on purchasing behavior.

Explain what bias-variance tradeoff is.

 Bias refers to the error introduced by approximating a real-world problem with a simplified model.
 Variance refers to the model's sensitivity to small fluctuations in the training data.
 Tradeoff: As you reduce bias (by making the model more complex), variance increases, leading to overfitting. Similarly, reducing variance (by simplifying the model) may increase bias, leading to underfitting. The goal is to find a balance between the two.
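
One way to see the tradeoff is to fit polynomials of increasing degree to noisy synthetic data and compare validation error, as in this minimal scikit-learn sketch:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(0, 3, size=(80, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=80)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=1)

# Degree 1 tends to underfit (high bias); degree 12 tends to overfit
# (high variance); a moderate degree usually balances the two.
for degree in (1, 3, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree {degree:2d}: validation MSE = {mse:.3f}")
```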

What is normalization, and why is it important in database design?

 Normalization is the process of organizing data to reduce redundancy and improve data integrity. It involves breaking a database into smaller tables and using foreign keys to maintain relationships. This makes the database easier to maintain and ensures that updates are consistent across it.
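
Normalization properly belongs to relational database design, but the idea can be sketched with pandas: in the invented flat table below, the customer's city is repeated on every order, so a single address change would have to be made in many rows; splitting it into two tables linked by a foreign key removes that redundancy.

```python
import pandas as pd

orders_flat = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [101, 101, 102],
    "city": ["Pune", "Pune", "Delhi"],  # redundant: repeated per order
    "amount": [250, 400, 150],
})

# Normalized form: each customer attribute is stored exactly once.
customers = orders_flat[["customer_id", "city"]].drop_duplicates()
orders = orders_flat[["order_id", "customer_id", "amount"]]

# The relationship is recovered by joining on the foreign key.
print(orders.merge(customers, on="customer_id"))
```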

What are some common mistakes to avoid when creating visualizations?

 Overcomplicating the visuals: Too many data points or complex charts can confuse
the audience. Use simplicity to highlight key insights.
 Poor choice of chart type: For example, using pie charts for too many categories can
make it hard to read. Use bar charts or line charts where appropriate.
 Lack of context: Ensure that your visualizations have clear labels, legends, and
context to avoid misinterpretation.
 Ignoring color usage: Colors should be used consistently and in a way that is easy to
distinguish. Avoid using too many similar colors.
How do you choose between different visualization tools like Tableau, Power BI, and
Excel?

 Tableau: Best for creating interactive dashboards and advanced visualizations with
large datasets. It’s powerful for real-time analysis.
 Power BI: Ideal for integrating with Microsoft tools and for businesses already using
the Microsoft ecosystem. Great for straightforward visualizations and dashboarding.
 Excel: Suitable for smaller datasets and quick analyses, especially when working with
financial data. It’s more flexible for custom calculations.

Explain the importance of "data storytelling" when presenting analytics results.

 "Data storytelling" involves presenting your findings in a narrative form that is


engaging and easy to understand. It's important because data alone can be
overwhelming. When paired with a story, data becomes more relatable and actionable
for decision-makers. For instance, instead of just showing a chart of sales data, I could
highlight the key factors that led to sales growth, using the data to guide the audience
through the insights.

What are the types of clustering algorithms?

 K-means clustering: Partitions data into k clusters based on similarity.
 Hierarchical clustering: Builds a tree-like structure to show the nested groups.
 DBSCAN: Density-based clustering that groups together points that lie close to each other and marks points in low-density regions as noise.
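
As a minimal illustration of the first algorithm on the list, here is k-means with scikit-learn on invented two-dimensional customer data (annual spend and visits per month):

```python
import numpy as np
from sklearn.cluster import KMeans

customers = np.array([
    [500, 2], [520, 3], [480, 2],        # low spend, infrequent visits
    [2500, 12], [2600, 14], [2400, 11],  # high spend, frequent visits
], dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(km.labels_)           # cluster assignment for each customer
print(km.cluster_centers_)  # centroid of each cluster
```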

What is predictive modelling?

Answer: Predictive modelling is a statistical technique that uses historical data to create a model
that can predict future outcomes. It involves identifying patterns in data and applying algorithms to
forecast future events based on those patterns.

What are the common types of predictive modeling techniques?

Answer: Common techniques include:

1. Linear Regression: Used for predicting a continuous outcome based on one or more predictor variables.

2. Logistic Regression: Used for binary classification problems.

3. Decision Trees: A flowchart-like structure that splits data into branches to make predictions.

4. Random Forest: An ensemble method that uses multiple decision trees to improve accuracy.

5. Support Vector Machines (SVM): A classification technique that finds the hyperplane that best separates classes.

6. Neural Networks: A set of algorithms modelled after the human brain, used for complex pattern recognition.
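
As a minimal sketch of one of these techniques, here is a random forest classifier trained on scikit-learn's built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 100 decision trees vote on each prediction; averaging many trees
# reduces the variance of any single tree.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```
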
What is overfitting, and how can it be prevented?

Answer: Overfitting occurs when a model learns the noise in the training data rather than the underlying pattern, resulting in poor performance on unseen data. It can be prevented by:

1. Using simpler models.

2. Applying regularization techniques (e.g., L1 or L2 regularization).

3. Using cross-validation to ensure the model generalizes well.

4. Pruning decision trees.

What is cross-validation, and why is it important?

Answer: Cross-validation is a technique used to assess how a predictive model will generalize to an
independent dataset. It involves partitioning the data into subsets, training the model on some
subsets, and validating it on others. It is important because it helps to prevent overfitting and
provides a more reliable estimate of model performance.
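
A minimal sketch of 5-fold cross-validation with scikit-learn, reusing the built-in iris dataset and a logistic regression model:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Each of the 5 folds trains on 4/5 of the data and validates on the
# held-out 1/5, so every observation is used for validation exactly once.
scores = cross_val_score(model, X, y, cv=5)
print(scores)         # five per-fold accuracies
print(scores.mean())  # a more reliable estimate than a single split
```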

How do you evaluate the performance of a predictive model?

Answer: Model performance can be evaluated using various metrics, depending on the type of
problem:

For regression: Mean Absolute Error (MAE), Mean Squared Error (MSE), R-squared.

For classification: Accuracy, Precision, Recall, F1 Score, Area Under the Receiver Operating
Characteristic Curve (AUC-ROC).
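
The sketch below computes most of these metrics from invented predictions (AUC-ROC is omitted because it needs predicted probabilities rather than hard labels):

```python
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, precision_score,
                             r2_score, recall_score)

# Regression metrics.
y_true_reg = [3.0, 5.0, 7.5, 10.0]
y_pred_reg = [2.8, 5.4, 7.0, 10.5]
print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))
print("R^2:", r2_score(y_true_reg, y_pred_reg))

# Classification metrics.
y_true_cls = [1, 0, 1, 1, 0, 1]
y_pred_cls = [1, 0, 0, 1, 0, 1]
print("accuracy:", accuracy_score(y_true_cls, y_pred_cls))
print("precision:", precision_score(y_true_cls, y_pred_cls))
print("recall:", recall_score(y_true_cls, y_pred_cls))
print("F1:", f1_score(y_true_cls, y_pred_cls))
```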

What is DAX, and how is it used in Power BI?

Answer: DAX (Data Analysis Expressions) is a formula language used in Power BI for data modeling
and analysis. It allows users to create calculated columns, measures, and custom calculations to
enhance data analysis.

What is the difference between Power Query and Power Pivot?

Answer:

Power Query: A data connection technology that enables users to discover, connect, combine, and refine data from various sources.

Power Pivot: A data modelling tool that allows users to create data models, define relationships, and perform complex calculations using DAX.

What is the purpose of using slicers in Power BI?

Answer: Slicers are visual filters that allow users to segment data in reports. They provide an
interactive way to filter data based on specific criteria, enhancing user experience and data
exploration.
