Business Analytics Notes

The document covers key concepts in business analytics, including mean, standard deviation, skewness, normality, t-tests, chi-square tests, and cluster analysis. It emphasizes the importance of these statistical measures and tests in understanding data distributions, making informed decisions, and improving business operations. Additionally, it highlights how data analytics can enhance customer insights, forecasting, operational efficiency, and competitive advantage.

Uploaded by

priya24laasya
Copyright: © All Rights Reserved

BUSINESS ANALYTICS NOTES

3. Standard Deviation, Mean, Skewness

Mean

The mean is a fundamental measure of central tendency that provides a single value
representing the average of a dataset. It is calculated by summing all values and dividing by
the number of observations. The formula for mean is:

$$\text{Mean} = \frac{\sum x}{n}$$

where $\sum x$ is the sum of all data points, and $n$ is the number of data points. The
mean gives an idea of the "typical" value in a dataset and is widely used in business analytics
to understand key metrics like average sales, customer spending, or production output.
However, it is sensitive to extreme values (outliers), which can distort its accuracy. For
example, if a company’s average employee salary is calculated including a very high CEO
salary, the result may not represent the typical employee’s salary.
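
To make the outlier effect concrete, here is a minimal Python sketch using only the standard library. The salary figures are invented for illustration and do not come from the notes; the median is shown alongside the mean only as a point of comparison.

```python
from statistics import mean, median

# Hypothetical annual salaries (illustrative values, not real data).
staff_salaries = [40_000, 42_000, 45_000, 47_000, 51_000]
ceo_salary = 1_000_000

mean_without_ceo = mean(staff_salaries)  # sum of values / number of values
mean_with_ceo = mean(staff_salaries + [ceo_salary])
median_with_ceo = median(staff_salaries + [ceo_salary])

print(f"Mean without CEO: {mean_without_ceo:,.0f}")
print(f"Mean with CEO:    {mean_with_ceo:,.0f}")
print(f"Median with CEO:  {median_with_ceo:,.0f}")
```

With these figures the mean jumps from 45,000 to roughly 204,000 once the CEO is included, while the median stays at 46,000, which is much closer to what a typical employee earns.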

Standard Deviation (SD)

Standard Deviation is a measure of dispersion or spread in a dataset. It indicates how much
individual data points deviate from the mean. A low standard deviation suggests that the
values are close to the mean, whereas a high standard deviation indicates that the values are
more spread out. The formula for standard deviation (for a sample) is:

PAGE NO. = 53
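
The notes point to page 53 for this formula, which is not reproduced here. For reference, the standard textbook definition of the sample standard deviation, writing $\bar{x}$ for the sample mean and using $x$ and $n$ as in the mean formula, is:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Whether page 53 uses this $n - 1$ form or the population form with $n$ in the denominator is an assumption here.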

Standard deviation is critical in business to assess risk, consistency and performance. For
example, in stock market analysis, a high standard deviation means the stock is volatile and
risky, whereas a low SD means it is more stable.
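
The stock example can be sketched in Python with the standard library. The daily returns below are hypothetical and chosen so that both stocks have the same mean but very different spread.

```python
from statistics import mean, stdev

# Hypothetical daily returns (in percent) for two stocks; illustrative values only.
stable_stock = [0.2, 0.1, 0.3, 0.2, 0.2]
volatile_stock = [2.5, -3.0, 4.0, -2.0, -0.5]

for name, returns in [("stable", stable_stock), ("volatile", volatile_stock)]:
    # statistics.stdev uses the sample formula (n - 1 in the denominator).
    print(f"{name}: mean = {mean(returns):.2f}, sample SD = {stdev(returns):.2f}")
```

Both series average 0.2% per day, so the mean alone cannot tell them apart; the standard deviation is what exposes the second stock as the riskier one.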

Skewness

Skewness measures the asymmetry of the distribution of data. If the data is perfectly
symmetrical, the distribution has zero skewness, as a normal distribution does (although
zero skewness alone does not make a distribution normal). If the tail is longer on the right
side, the distribution is positively skewed, and if the tail is longer on the left, it is
negatively skewed. The formula for sample skewness is:

PAGE NO. = 61, 64 & 65
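
The notes point to pages 61, 64 and 65 for the formula, which are not reproduced here. As an assumption about what those pages contain, one widely used definition is the adjusted Fisher-Pearson sample skewness, where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation:

$$G_1 = \frac{n}{(n - 1)(n - 2)} \sum \left( \frac{x - \bar{x}}{s} \right)^3$$

Textbooks also use Karl Pearson's coefficient, $3(\text{Mean} - \text{Median}) / s$; which form the notes intend is not stated.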

Skewness helps in identifying whether the mean is a reliable measure of central tendency. For
example, in customer income data, if a few customers earn significantly more than the rest,
the data will show positive skewness. This impacts the choice of statistical techniques and
summarization methods.
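
The income example can be sketched in Python as follows. The function name `sample_skewness` and the data are hypothetical, and the adjusted Fisher-Pearson definition used here is one common choice rather than necessarily the formula the notes refer to.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (needs at least 3 values)."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # sample SD, n - 1 in the denominator
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)

# Hypothetical customer incomes (in thousands); a few high earners stretch the right tail.
incomes = [30, 32, 35, 36, 38, 40, 42, 45, 150, 220]
symmetric = [1, 2, 3, 4, 5]

print(f"Income skewness:    {sample_skewness(incomes):.2f}")
print(f"Symmetric skewness: {sample_skewness(symmetric):.2f}")
```

The income data gives a clearly positive value, while the symmetric data gives a value of essentially zero.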

4. Normality and Distribution of Data

What is Normality?

Normality refers to a condition where the dataset follows a normal distribution, also known
as the Gaussian distribution. A normal distribution is symmetric, bell-shaped, and centered
around the mean. In this distribution, the mean, median, and mode are all equal. It is widely
used in statistics due to its natural occurrence in many real-life phenomena such as employee
performance, product weight, or exam scores.

Properties of Normal Distribution

The normal distribution is defined by two parameters: mean (μ) and standard deviation (σ).
The shape of the curve is determined by these two. It has key properties:

• It is symmetric around the mean.
• About 68.26% of the data lies within ±1σ, 95.44% within ±2σ, and 99.73% within ±3σ.
• The total area under the curve is 1.
• It extends infinitely in both directions, though practically most data lies within ±3σ.
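
The ±1σ, ±2σ and ±3σ shares can be checked with Python's standard-library `statistics.NormalDist`. This is a small verification sketch, not part of the original notes.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Probability of falling within k standard deviations of the mean.
    shares[k] = z.cdf(k) - z.cdf(-k)
    print(f"within ±{k}σ: {shares[k]:.2%}")
```

The computed values round to 68.27%, 95.45% and 99.73%. The slightly lower 68.26% and 95.44% quoted in the list above are the figures obtained by doubling four-digit z-table entries (0.3413 and 0.4772).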

Why Normality Matters in Analytics


Many parametric tests such as t-tests, regression analysis, and ANOVA assume normality
of the data (for regression, of the residuals). If the assumption of normality is violated,
particularly with small samples, the results from these tests may not be valid. For example,
if you want to evaluate employee performance based on a training program using a t-test,
the data should ideally be normally distributed for accurate interpretation.

How to Check for Normality

Normality can be visually assessed using:

• Histograms
• Box plots
• Q-Q (quantile-quantile) plots

Additionally, statistical tests like the Shapiro-Wilk Test and Kolmogorov-Smirnov Test are
used to test normality.
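
As a rough numerical stand-in for a Q-Q plot, the sketch below pairs each sorted observation with the normal quantile expected at its rank and measures how close the pairs are to a straight line. It uses only the standard library; it is not the Shapiro-Wilk or Kolmogorov-Smirnov test, and the function names, plotting position and sample data are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    n = len(values)
    ordered = sorted(values)
    fitted = NormalDist(mean(values), stdev(values))
    # (i + 0.5) / n is one common choice of plotting position.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

def correlation(pairs):
    """Pearson correlation between the two coordinates of each pair."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    spread = sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    return cov / spread ** 0.5

# Hypothetical samples: one roughly bell-shaped, one strongly right-skewed.
bell_shaped = [48, 50, 51, 52, 53, 54, 55, 56, 58, 60]
right_skewed = [1, 1, 2, 2, 3, 3, 4, 5, 20, 60]

print(f"Q-Q correlation, bell-shaped:  {correlation(qq_points(bell_shaped)):.3f}")
print(f"Q-Q correlation, right-skewed: {correlation(qq_points(right_skewed)):.3f}")
```

A correlation close to 1 means the points would fall near a straight line on a Q-Q plot, which is what normal data looks like; the skewed sample scores noticeably lower. In practice a formal test from a statistics package would be used alongside the plot.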

Non-Normal Distributions

If data is not normally distributed, it could be skewed, bimodal, or uniform. In such cases,
non-parametric tests such as the Mann-Whitney U test or Kruskal-Wallis test are more
appropriate. For example, income data often follows a positively skewed distribution, and
using a non-parametric approach would yield more reliable results.
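
To show what a rank-based comparison looks like, here is a minimal sketch of the Mann-Whitney U statistic computed by direct pair counting. The function name and the income figures are hypothetical, and the final step of comparing U with a table value or converting it to a p-value is left out.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs where a exceeds b, counting ties as one half."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical monthly incomes (in thousands) for customers of two store formats.
format_a = [22, 25, 27, 30, 31, 150]
format_b = [35, 38, 40, 42, 45, 300]

u_a = mann_whitney_u(format_a, format_b)
u_b = mann_whitney_u(format_b, format_a)
print(f"U for A = {u_a}, U for B = {u_b}, total pairs = {len(format_a) * len(format_b)}")
```

Because the statistic depends only on which value in each pair is larger, the two extreme incomes (150 and 300) carry no more weight than any other observation, which is why this approach suits skewed data.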

6. T-Test

Definition and Purpose

The t-test is a parametric test used to compare means, either the mean of one group against
a reference value or the means of two groups, and to determine whether the difference is
statistically significant. It is especially useful when the sample size is small and the
population standard deviation is unknown.

Types of T-Tests

1. One-sample t-test: Compares the mean of a single group with a known or
hypothesized population mean.
Example: Is the average delivery time of a service different from 30 minutes?
2. Independent (two-sample) t-test: Compares the means of two independent groups.
Example: Compare average sales of two different branches.
3. Paired t-test: Used when the same group is measured twice (before and after a
treatment).
Example: Measure productivity of employees before and after training.
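
The one-sample and paired cases can be sketched with the standard library alone. The function names and data below are hypothetical; the code returns only the t statistic and degrees of freedom, and the comparison with a critical value is left to a t-table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, hypothesized_mean):
    """t statistic and degrees of freedom for a one-sample t-test."""
    n = len(values)
    t = (mean(values) - hypothesized_mean) / (stdev(values) / sqrt(n))
    return t, n - 1

def paired_t(before, after):
    """Paired t-test: a one-sample t-test on the per-subject differences."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0)

# Hypothetical delivery times in minutes (is the mean different from 30?).
delivery_times = [28, 33, 31, 35, 29, 34, 32, 30]
# Hypothetical units produced per employee before and after training.
before_training = [50, 48, 52, 47, 51]
after_training = [54, 50, 55, 49, 56]

t_one, df_one = one_sample_t(delivery_times, 30)
t_pair, df_pair = paired_t(before_training, after_training)
print(f"One-sample: t = {t_one:.2f}, df = {df_one}")
print(f"Paired:     t = {t_pair:.2f}, df = {df_pair}")
```

With these invented figures the one-sample t is about 1.73 on 7 degrees of freedom, below the two-tailed 5% critical value of about 2.365, so the mean delivery time would not be judged different from 30 minutes. The paired t is about 5.49 on 4 degrees of freedom, above the critical value of about 2.776.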

Formula for Independent t-test
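
The formula itself is missing from the notes. A standard form, assuming the pooled-variance (equal variances) version that matches the assumptions listed in this section, is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Here $\bar{x}_1, \bar{x}_2$ are the sample means, $s_1, s_2$ the sample standard deviations, and $n_1, n_2$ the sample sizes. The statistic has $n_1 + n_2 - 2$ degrees of freedom.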

Assumptions

• Data should be approximately normally distributed
• Samples are independent
• Variances of the groups should be equal (homogeneity of variance)

The t-test is widely used in business to compare employee performance, marketing
campaign results, or sales before and after a price change.
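
A branch comparison of this kind can be sketched as an independent two-sample t-test with pooled variance. The function name `pooled_t` and the sales figures are hypothetical, and the equal-variance form is assumed because it matches the homogeneity-of-variance assumption in this section.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_1, group_2):
    """Independent two-sample t statistic with pooled variance, plus degrees of freedom."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / df
    t = (mean(group_1) - mean(group_2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical weekly sales (in thousands) at two branches.
branch_a = [20, 22, 19, 24, 25]
branch_b = [28, 30, 27, 31, 29]

t_stat, df = pooled_t(branch_a, branch_b)
print(f"t = {t_stat:.2f}, df = {df}")
```

Here t is about -5.22 on 8 degrees of freedom. Its absolute value exceeds the two-tailed 5% critical value of about 2.306, so with these invented numbers the difference between the branches would be judged statistically significant.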

7. Chi-Square Test

Overview

The Chi-Square Test (χ²) is a non-parametric test used to examine the association
between categorical variables. It is used when data is in the form of frequencies or counts,
not continuous variables.

Types of Chi-Square Tests

1. Chi-square test of independence:
o Tests whether two categorical variables are independent.
o Example: Is customer satisfaction independent of geographic location?
2. Chi-square goodness-of-fit test:
o Checks if a sample distribution matches an expected distribution.
o Example: Are product sales equally distributed across all weekdays?

Formula
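
The formula is missing from the notes. The standard chi-square statistic, writing $O$ for the observed frequency and $E$ for the expected frequency in each cell or category, is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

For the test of independence, $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$ and the degrees of freedom are $(r - 1)(c - 1)$ for a table with $r$ rows and $c$ columns. For the goodness-of-fit test with $k$ categories, the degrees of freedom are $k - 1$.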

Assumptions

• Data must be in counts
• Categories must be mutually exclusive
• Expected frequency in each cell should be ≥ 5

The Chi-square test is often used in business to assess customer preferences, employee
satisfaction by department, or relationship between product category and return rate.
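
A test of independence on a small contingency table can be sketched without external libraries. The function name and the survey counts are hypothetical; the code computes the statistic and degrees of freedom only, leaving the comparison with a critical value to a chi-square table or a statistics package.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed_count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical survey counts: rows are regions, columns are satisfied / not satisfied.
observed = [
    [60, 40],  # North
    [30, 70],  # South
]

chi2, df = chi_square_independence(observed)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

With these counts every expected frequency is at least 45, so the rule of thumb on expected frequencies is met. The statistic is about 18.18 on 1 degree of freedom, well above the 5% critical value of about 3.841, which would suggest that satisfaction depends on region in this invented data.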

8. Cluster Analysis

Introduction

Cluster Analysis is an unsupervised learning technique used to group similar data
points into clusters, so that points within a cluster are more similar to each other than to
points in other clusters. It is widely used in market research, customer segmentation, and
pattern recognition.

Purpose and Importance

The main purpose of clustering is to discover hidden structures or patterns in large
datasets. For instance, a company can use clustering to segment its customers into groups
such as price-sensitive, brand-loyal, and occasional buyers, allowing targeted marketing
strategies for each group.

Types of Clustering

1. Hierarchical Clustering:
o Creates a tree-like structure (dendrogram) to group data.
o Useful for small datasets.
2. K-Means Clustering:
o Divides data into a predefined number (K) of clusters.
o Minimizes the within-cluster variance.

Steps in K-Means Clustering

• Choose number of clusters (K)
• Randomly assign data points to clusters
• Calculate cluster centroids
• Reassign points based on nearest centroid
• Repeat until convergence
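
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the function name `k_means`, the optional `initial_labels` parameter, the re-seeding of empty clusters with a random point, and the customer data are all hypothetical choices made for this example.

```python
import random

def k_means(points, k, initial_labels=None, seed=0, max_iter=100):
    """K-means following the listed steps; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Random initial assignment of points to clusters, unless the caller supplies one.
    if initial_labels is not None:
        labels = list(initial_labels)
    else:
        labels = [rng.randrange(k) for _ in points]
    centroids = []
    for _ in range(max_iter):
        # Calculate centroids: the coordinate-wise mean of each cluster's members.
        centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Assumption: an empty cluster is re-seeded with a random data point.
                centroids.append(tuple(rng.choice(points)))
        # Reassign every point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Convergence: stop once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Hypothetical customers: (visits per month, average basket value).
customers = [(1, 10), (2, 12), (1, 11), (8, 60), (9, 58), (10, 62)]
labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```

Passing `initial_labels` makes a run reproducible by hand; for example, starting from `[0, 1, 0, 1, 0, 1]` the first three customers end up in one cluster and the last three in the other after a single reassignment. Because the result can depend on the starting assignment, K-means is usually run several times and the solution with the lowest within-cluster variance is kept.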

Application in Business

• Customer segmentation for personalized offers
• Fraud detection in banking
• Inventory categorization based on turnover and value

Cluster analysis helps businesses maximize marketing ROI, reduce churn, and improve
operational efficiency.

9. Importance of Data Analytics in Business

Decision-Making

Data analytics enables fact-based decision-making by transforming raw data into actionable
insights. It replaces intuition with data-driven strategies. For example, analyzing past sales
can help forecast future demand, aiding in inventory planning.

Customer Insights

Businesses can use analytics to deeply understand customer behavior, preferences, and
feedback. By analyzing customer purchase history, a retailer can personalize product
recommendations and marketing messages, leading to improved customer retention.

Forecasting and Planning


Using predictive analytics, businesses can anticipate future trends, customer behavior, or
risks. For example, banks use historical loan data to predict default risks, allowing better
credit decisions.

Operational Efficiency

Analytics helps in identifying inefficiencies in processes. By tracking KPIs (Key
Performance Indicators), companies can reduce wastage, optimize resource utilization, and
improve service delivery. In manufacturing, analytics is used for quality control and
production optimization.

Competitive Advantage

Data-driven companies gain a competitive edge by reacting faster to market trends and
making smarter strategic decisions. For instance, companies like Amazon and Netflix thrive
because they leverage analytics to recommend products and content, enhancing customer
satisfaction.

OTHER TOPICS ARE IN THE NOTEBOOK FOR REFERENCE.
