Business Analytics Notes
Mean
The mean is a fundamental measure of central tendency that provides a single value
representing the average of a dataset. It is calculated by summing all values and dividing by
the number of observations. The formula for the mean is:
$$\bar{x} = \frac{\sum x}{n}$$
where $\sum x$ is the sum of all data points, and $n$ is the number of data points. The
mean gives an idea of the "typical" value in a dataset and is widely used in business analytics
to understand key metrics like average sales, customer spending, or production output.
However, it is sensitive to extreme values (outliers), which can distort its accuracy. For
example, if a company’s average employee salary is calculated including a very high CEO
salary, the result may not represent the typical employee’s salary.
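The salary example can be sketched in a few lines of Python. The figures below are hypothetical (they are not from these notes), and the median is shown only as a point of comparison because it is less affected by a single extreme value.

```python
from statistics import mean, median

# Hypothetical annual salaries in thousands; the last value stands in for the CEO.
salaries = [40, 42, 45, 47, 50, 52, 55, 900]

mean_with_ceo = mean(salaries)           # sum of values / number of values
mean_without_ceo = mean(salaries[:-1])   # same calculation with the CEO excluded
median_with_ceo = median(salaries)       # middle value, less affected by the outlier

print(f"Mean including CEO:   {mean_with_ceo:.2f}")
print(f"Mean excluding CEO:   {mean_without_ceo:.2f}")
print(f"Median including CEO: {median_with_ceo:.2f}")
```

With these made-up numbers, one salary moves the mean from roughly 47 to roughly 154, while the median stays close to what a typical employee earns.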
Standard deviation is critical in business to assess risk, consistency and performance. For
example, in stock market analysis, a high standard deviation means the stock is volatile and
risky, whereas a low standard deviation means it is more stable.
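As a rough illustration of the stock example, the sketch below compares two hypothetical series of monthly returns that share the same mean. It assumes the sample standard deviation (dividing by n - 1), which is what Python's `statistics.stdev` computes; the return figures are invented for the example.

```python
from statistics import mean, stdev

# Hypothetical monthly returns (%) for two stocks with the same average return.
stable_stock = [1.0, 1.2, 0.8, 1.1, 0.9]
volatile_stock = [6.0, -4.0, 8.0, -5.0, 0.0]

for name, returns in [("stable", stable_stock), ("volatile", volatile_stock)]:
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    print(f"{name}: mean = {mean(returns):.2f}%, std dev = {stdev(returns):.2f}%")
```

Both series average 1% per month, so the mean alone cannot separate them; the standard deviation is what shows the difference in risk.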
Skewness
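Skewness measures the asymmetry of the distribution of data. If the data is perfectly
symmetrical, the distribution has zero skewness. The normal distribution is one example,
although zero skewness alone does not make a distribution normal. If the
tail is longer on the right side, the distribution is positively skewed, and if the tail is longer
on the left, it is negatively skewed. One common form of the sample skewness formula is:
$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$
where $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, and $n$ is the number of data points.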
Skewness helps in identifying whether the mean is a reliable measure of central tendency. For
example, in customer income data, if a few customers earn significantly more than the rest,
the data will show positive skewness. This impacts the choice of statistical techniques and
summarization methods.
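A minimal sketch of a skewness calculation for the customer income example follows. It assumes the adjusted sample formula n / ((n - 1)(n - 2)) multiplied by the sum of cubed standardized deviations, which is one common definition and may not be the exact one intended in these notes. The function name `sample_skewness` and the income figures are hypothetical.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Adjusted sample skewness; assumes at least 3 values that are not all equal."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

# Hypothetical customer incomes (thousands); one customer earns far more than the rest.
incomes = [30, 32, 35, 36, 38, 40, 41, 250]
symmetric = [1, 2, 3, 4, 5]

print(f"Income skewness:    {sample_skewness(incomes):.2f}")    # positive: long right tail
print(f"Symmetric skewness: {sample_skewness(symmetric):.2f}")  # zero for symmetric data
```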
What is Normality?
Normality refers to a condition where the dataset follows a normal distribution, also known
as the Gaussian distribution. A normal distribution is symmetric, bell-shaped, and centered
around the mean. In this distribution, the mean, median, and mode are all equal. It is widely
used in statistics due to its natural occurrence in many real-life phenomena such as employee
performance, product weight, or exam scores.
The normal distribution is defined by two parameters: mean (μ) and standard deviation (σ).
The mean sets the center of the curve and the standard deviation sets its spread.
Normality can be checked visually using:
Histograms
Box plots
Q-Q (quantile-quantile) plots
Additionally, statistical tests like the Shapiro-Wilk Test and Kolmogorov-Smirnov Test are
used to test normality.
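The idea behind a Q-Q plot can be sketched numerically without any plotting library. The code below pairs the sorted data with standard normal quantiles and reports how close the pairs are to a straight line. This is a simplified stand-in for a Q-Q plot, not the Shapiro-Wilk or Kolmogorov-Smirnov test; the function name `qq_correlation`, the plotting positions (i + 0.5) / n, and both datasets are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def qq_correlation(values):
    """Correlation between sorted data and matching standard normal quantiles.

    Values near 1 mean the Q-Q points lie close to a straight line.
    """
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, my = mean(theoretical), mean(ordered)
    cov = sum((x - mx) * (y - my) for x, y in zip(theoretical, ordered))
    var_x = sum((x - mx) ** 2 for x in theoretical)
    var_y = sum((y - my) ** 2 for y in ordered)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: product weights (roughly bell-shaped) and a right-skewed series.
weights = [498, 499, 499, 500, 500, 500, 501, 501, 502]
skewed = [1, 1, 1, 2, 2, 3, 5, 9, 30]

print(f"Q-Q correlation, weights: {qq_correlation(weights):.3f}")
print(f"Q-Q correlation, skewed:  {qq_correlation(skewed):.3f}")
```

The roughly bell-shaped weights give a correlation much closer to 1 than the skewed series. For a formal decision, a statistical test from a statistics library would normally be used instead.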
Non-Normal Distributions
If data is not normally distributed, it could be skewed, bimodal, or uniform. In such cases,
non-parametric tests such as the Mann-Whitney U test or Kruskal-Wallis test are more
appropriate. For example, income data often follows a positively skewed distribution, and
using a non-parametric approach would yield more reliable results.
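To make the non-parametric idea concrete, here is a small sketch of the Mann-Whitney U statistic computed by direct pair counting. It produces only the statistic, not a p-value, and the function name `mann_whitney_u` and the income figures are hypothetical.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b): U_a counts pairs where a > b, with ties counted as 0.5."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Hypothetical incomes (thousands) in two regions; region A has one extreme value.
region_a = [28, 31, 35, 40, 120]
region_b = [22, 25, 27, 30, 33]

u_a, u_b = mann_whitney_u(region_a, region_b)
print(f"U for region A: {u_a}, U for region B: {u_b}")
```

Because the statistic depends only on which value in each pair is larger, replacing 120 with 1,200 would leave it unchanged. That is why rank-based tests cope well with skewed data.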
6. T-Test
The t-test is a parametric test used to compare the means of two groups and determine whether
the difference between them is statistically significant. It is especially useful when the sample size is
small and the population standard deviation is unknown.
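The comparison of two group means can be sketched as follows. This is a minimal example of the independent two-sample t statistic, assuming both groups have similar variances (the pooled-variance form). The function name `pooled_t_statistic` and the sales figures are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for two independent samples, equal variances assumed."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: average of the two sample variances, weighted by degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df

# Hypothetical daily sales (units) at two stores.
store_a = [20, 22, 19, 24, 25]
store_b = [28, 30, 27, 31, 29]

t_stat, df = pooled_t_statistic(store_a, store_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {df}")
```

The resulting t value is compared with a critical value from the t distribution for the given degrees of freedom. For 8 degrees of freedom and a two-tailed 5% significance level, that critical value is about 2.306, so a larger absolute t would be judged significant.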
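Types of T-Tests
1. One-Sample T-Test:
o Compares the mean of a single sample with a known or hypothesized value.
2. Independent Two-Sample T-Test:
o Compares the means of two unrelated groups.
3. Paired T-Test:
o Compares the means of the same group measured at two points in time or under two conditions.
Assumptions
o The variable being compared is continuous.
o The data in each group is approximately normally distributed.
o Observations are independent of each other.
o For the standard two-sample test, the two groups have similar variances.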
7. Chi-Square Test
Overview
The Chi-Square Test (χ²) is a non-parametric test used to examine the association
between categorical variables. It is used when data is in the form of frequencies or counts,
not continuous variables.
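Assumptions
o Both variables are categorical, with mutually exclusive categories.
o Observations are independent of each other.
o Expected frequencies are large enough (a common rule of thumb is at least 5 in each cell).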
The Chi-square test is often used in business to assess customer preferences, employee
satisfaction by department, or the relationship between product category and return rate.
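The product category and return rate example can be sketched as a chi-square test of independence on a table of counts. The function name `chi_square_statistic` and the counts are hypothetical. Expected counts are computed the usual way, as row total times column total divided by the grand total.

```python
def chi_square_statistic(observed):
    """Return (chi2, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical counts: rows are product categories, columns are (returned, kept).
returns_table = [
    [30, 70],  # category A
    [10, 90],  # category B
]

chi2, df = chi_square_statistic(returns_table)
print(f"chi-square = {chi2:.2f}, degrees of freedom = {df}")
```

For 1 degree of freedom, the 5% critical value is about 3.841. A statistic above that suggests the return rate differs between the two categories.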
8. Cluster Analysis
Introduction
Cluster Analysis is an unsupervised learning technique used to group similar data
points into clusters, so that the points within a cluster are more similar to each other than to
points in other clusters. It is widely used in market research, customer segmentation, and
pattern recognition.
Types of Clustering
1. Hierarchical Clustering:
o Creates a tree-like structure (dendrogram) to group data.
o Useful for small datasets.
2. K-Means Clustering:
o Divides data into a predefined number (K) of clusters.
o Minimizes the within-cluster variance.
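The K-Means steps can be sketched in plain Python. This is a minimal version of the standard assign-then-update loop, assuming the starting centroids are supplied by the caller so the result is reproducible. The function name `k_means` and the customer data are hypothetical, and real projects would normally use a library implementation.

```python
from math import dist

def k_means(points, centroids, max_iter=100):
    """Repeat two steps until the centroids stop moving: assign points, then update centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its points (unchanged if empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (annual spend in thousands, store visits per month).
customers = [(1, 2), (2, 1), (1.5, 1.5), (8, 9), (9, 8), (8.5, 8.5)]
start = [(1, 2), (8, 9)]  # assumed starting centroids, K = 2

centroids, clusters = k_means(customers, start)
print("Centroids:", centroids)
print("Clusters:", clusters)
```

Moving each centroid to the mean of its assigned points is what reduces the within-cluster variance at every step. The final grouping can depend on the starting centroids, so the algorithm is often run several times with different starts.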
Application in Business
Cluster analysis helps businesses maximize marketing ROI, reduce churn, and improve
operational efficiency.
Decision-Making
Data analytics enables fact-based decision-making by transforming raw data into actionable
insights. It replaces intuition with data-driven strategies. For example, analyzing past sales
can help forecast future demand, aiding in inventory planning.
Customer Insights
Businesses can use analytics to deeply understand customer behavior, preferences, and
feedback. By analyzing customer purchase history, a retailer can personalize product
recommendations and marketing messages, leading to improved customer retention.
Operational Efficiency
Competitive Advantage
Data-driven companies gain a competitive edge by reacting faster to market trends and
making smarter strategic decisions. For instance, companies like Amazon and Netflix thrive
because they leverage analytics to recommend products and content, enhancing customer
satisfaction.