Data Analytics Question
Descriptive Analytics: Focuses on summarizing past data to understand what happened. For
example, summarizing sales data for the past quarter.
Predictive Analytics: Uses historical data and statistical models to forecast future trends. For
instance, predicting next quarter’s sales based on past trends.
Prescriptive Analytics: Suggests actions based on the analysis to achieve desired outcomes,
such as recommending the best pricing strategy to maximize profit.
Correlation indicates a statistical relationship between two variables, but it does not mean that one
causes the other. For example, ice cream sales and the number of people going to the beach may be
correlated because both rise in hot weather, not because one causes the other.
Causation implies that one variable directly affects another. For example, a business
increasing its advertising budget may cause a rise in sales.
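Correlation is usually measured with the Pearson coefficient r, which ranges from -1 to 1. The sketch below computes it from scratch; the beach and ice cream figures are hypothetical, and a high r on them says nothing about which variable (if either) causes the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily figures; a strong r here is correlation only, not causation.
beach_visitors = [120, 150, 200, 260, 310]
ice_cream_sales = [30, 38, 52, 61, 80]
print(round(pearson_r(beach_visitors, ice_cream_sales), 3))
```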
What are the key differences between mean, median, and mode?
Mean is the arithmetic average (the sum of the values divided by their count) and is pulled toward
extreme values (outliers).
Median is the middle value of the sorted data set (the average of the two middle values when the
count is even) and is largely unaffected by outliers.
Mode is the value that appears most frequently in the data set and is the only one of the three that
can be used for categorical data.
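A quick way to see the difference is Python's standard `statistics` module. The salary figures below are hypothetical, chosen so that a single outlier pulls the mean well above the median.

```python
from statistics import mean, median, mode

# Hypothetical salaries (in thousands) with one extreme outlier.
salaries = [42, 45, 45, 48, 50, 51, 400]

print(mean(salaries))    # pulled upward by the 400 outlier
print(median(salaries))  # middle value of the sorted list: 48
print(mode(salaries))    # most frequent value: 45

# The mode also works on categorical data, where mean and median do not apply.
colours = ["red", "blue", "blue", "green"]
print(mode(colours))     # 'blue'
```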
Regression analysis is used to model the relationship between a dependent (target) variable and one
or more independent (predictor) variables, so that the target can be predicted from the predictors.
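For the simplest case, one predictor and ordinary least squares, the fitted line has a closed form: slope = covariance(x, y) / variance(x), and the intercept makes the line pass through the two means. The sketch below assumes hypothetical advertising and sales figures; real work would normally use a statistics or machine learning library.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend vs. sales.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_linear_regression(ad_spend, sales)
predicted = b0 + b1 * 6.0  # predict sales for an unseen spend of 6.0
print(b0, b1, predicted)
```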
Simplify complex data: Visualizations should make it easy to understand the data at
a glance.
Focus on the story: The visualization should highlight the key insights, not just show
numbers.
Use appropriate chart types: Choose the right type of chart (bar, line, pie) for the
data and audience.
Ensure clarity: Use clear labels, legends, and colours to avoid confusion.
"Industries like retail, finance, healthcare, and e-commerce benefit greatly from
analytics. Retail and e-commerce use analytics to optimize inventory, predict demand,
and personalize customer experiences. Healthcare uses analytics to improve patient
care, while finance relies on it for risk assessment and fraud detection."
Overfitting occurs when a model is too complex and captures noise or random
fluctuations in the data, rather than the actual pattern. It leads to high accuracy on
training data but poor performance on new, unseen data.
To avoid overfitting:
o Use simpler models or regularization techniques (e.g., L1/L2 regularization).
o Split the data into training and validation sets to test the model on unseen data.
o Use cross-validation to ensure the model generalizes well.
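To illustrate the L2 regularization mentioned above, here is a minimal sketch of ridge regression with a single predictor, assuming the intercept is left unpenalized. The penalty `lam` (a hypothetical name) adds lam * slope² to the squared error, which shrinks the slope toward zero as it grows.

```python
def fit_ridge_1d(xs, ys, lam):
    """Ridge (L2-regularized) fit of y = b0 + b1*x; the intercept is not penalized.

    Minimizes sum((y - b0 - b1*x)**2) + lam * b1**2, whose closed form for a
    single predictor is b1 = Sxy / (Sxx + lam), b0 = mean_y - b1 * mean_x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
for lam in (0.0, 5.0, 50.0):
    # A larger lam shrinks the slope, trading training fit for a simpler model.
    print(lam, fit_ridge_1d(xs, ys, lam))
```

With lam = 0 this reduces to ordinary least squares; in practice lam is chosen by checking performance on validation data.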
Can you explain what a confusion matrix is?
Supervised learning involves training a model on labeled data, where the target
(output) variable is known. The model is then used to predict the target for new data.
Example: predicting house prices based on features like size, location, etc.
Unsupervised learning involves working with data that does not have labeled
outputs. The goal is to find hidden patterns or relationships in the data. Example:
clustering customers based on purchasing behavior.
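The customer-clustering example can be sketched with k-means, a common unsupervised algorithm: no labels are supplied, and groups emerge from the data alone. This is a minimal one-feature version of Lloyd's algorithm; the spend figures and starting centroids are hypothetical.

```python
def kmeans_1d(values, initial_centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) on one numeric feature."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical annual spend per customer; no labels are given.
annual_spend = [120, 135, 150, 900, 950, 1010, 4000, 4200]
centroids, clusters = kmeans_1d(annual_spend, initial_centroids=[100, 1000, 5000])
print(centroids)
print(clusters)
```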
Overcomplicating the visuals: Too many data points or overly complex charts can confuse
the audience. Keep charts simple so that the key insights stand out.
Poor choice of chart type: For example, a pie chart with too many categories is hard to
read. Use a bar chart or line chart where appropriate.
Lack of context: Ensure that your visualizations have clear labels, legends, and
context to avoid misinterpretation.
Ignoring color usage: Use colors consistently and choose ones that are easy to
distinguish. Avoid using too many similar colors.
How do you choose between different visualization tools like Tableau, Power BI, and
Excel?
Tableau: Best for creating interactive dashboards and advanced visualizations with
large datasets. It’s powerful for real-time analysis.
Power BI: Ideal for integrating with Microsoft tools and for businesses already using
the Microsoft ecosystem. Great for straightforward visualizations and dashboarding.
Excel: Suitable for smaller datasets and quick analyses, especially when working with
financial data. It’s more flexible for custom calculations.
Answer: Predictive modelling is a statistical technique that uses historical data to create a model
that can predict future outcomes. It involves identifying patterns in data and applying algorithms to
forecast future events based on those patterns.
1. Linear Regression: Used for predicting a continuous outcome based on one or more predictor
variables.
2. Decision Trees: A flowchart-like structure that splits data into branches to make predictions.
3. Random Forest: An ensemble method that combines multiple decision trees to improve accuracy.
4. Support Vector Machines (SVM): A classification technique that finds the hyperplane that separates
the classes with the largest margin.
5. Neural Networks: A set of algorithms loosely inspired by the structure of the human brain, used for
complex pattern recognition.
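The core step behind the Decision Trees and Random Forest items above is choosing a split. The sketch below finds the single threshold on one feature that minimizes weighted Gini impurity (a "decision stump"); a full tree repeats this recursively on each branch. The churn data is hypothetical, and Gini is one common criterion among several (entropy is another).

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, labels):
    """Return (threshold, weighted Gini) for the best 'x <= threshold' split."""
    best = (None, float("inf"))
    n = len(xs)
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, labels) if x <= threshold]
        right = [y for x, y in zip(xs, labels) if x > threshold]
        if not left or not right:
            continue  # a split must send data to both branches
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: months as a customer vs. whether they churned (1) or stayed (0).
months = [1, 2, 3, 10, 12, 24]
churned = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_split(months, churned)
print(threshold, impurity)
```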
What is overfitting, and how can it be prevented?
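Answer: Overfitting occurs when a model learns the noise in the training data rather than the
underlying pattern, resulting in poor performance on unseen data. It can be prevented by using simpler or
regularized models, testing on a held-out validation set, and applying cross-validation.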
Answer: Cross-validation is a technique used to assess how a predictive model will generalize to an
independent dataset. It involves partitioning the data into subsets, training the model on some
subsets, and validating it on others. It is important because it helps to prevent overfitting and
provides a more reliable estimate of model performance.
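A minimal sketch of k-fold cross-validation follows: the data is split into k folds, and each fold is held out once while the model trains on the rest. The `fit`/`predict` callables and the mean-only baseline model are hypothetical placeholders; shuffling before splitting is omitted here to keep the example deterministic, though it is usually advisable.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(xs, ys, k, fit, predict):
    """Return the mean squared error on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        errors = [(ys[i] - predict(model, xs[i])) ** 2 for i in held_out]
        scores.append(sum(errors) / len(errors))
    return scores

# Hypothetical baseline model: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)

def predict_mean(model, x):
    return model

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]
scores = cross_validate(xs, ys, 3, fit_mean, predict_mean)
print(scores, sum(scores) / len(scores))
```

The spread of the per-fold scores is itself informative: a large spread suggests the performance estimate depends heavily on which data the model happened to see.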
Answer: Model performance can be evaluated using various metrics, depending on the type of
problem:
For regression: Mean Absolute Error (MAE), Mean Squared Error (MSE), R-squared.
For classification: Accuracy, Precision, Recall, F1 Score, Area Under the Receiver Operating
Characteristic Curve (AUC-ROC).
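Most of these metrics are short formulas. The sketch below computes MAE, MSE and R-squared for regression, and accuracy, precision, recall and F1 for binary classification from the four cells of a confusion matrix (true/false positives and negatives). AUC-ROC is omitted because it requires predicted scores rather than hard labels. The function names are hypothetical, and the classification helper assumes binary labels.

```python
def regression_metrics(actual, predicted):
    """MAE, MSE and R-squared for paired numeric sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    mean_a = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)               # residual sum of squares
    ss_tot = sum((a - mean_a) ** 2 for a in actual)    # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, r2

def classification_metrics(actual, predicted, positive=1):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

print(regression_metrics([3, 5, 7], [2, 5, 9]))
print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```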
Answer: DAX (Data Analysis Expressions) is a formula language used in Power BI for data modeling
and analysis. It allows users to create calculated columns, measures, and custom calculations to
enhance data analysis.
Answer: Power Query: A data connection technology that enables users to discover, connect,
combine, and refine data from various sources.
Power Pivot: A data modelling tool that allows users to create data models, define relationships, and
perform complex calculations using DAX.
Answer: Slicers are visual filters that allow users to segment data in reports. They provide an
interactive way to filter data based on specific criteria, enhancing user experience and data
exploration.