
Finals Cheat Sheet - Summary: Introduction to Business Analytics
Introduction to Business Analytics (National University of Singapore)


DATA, INFORMATION AND DATABASES
Data: numerical or textual facts and figures that are collected through some type of measurement process.
Information: the result of analyzing data; that is, extracting meaning from data to support evaluation and decision making.
A data set is simply a collection of data. Marketing survey responses, a table of historical stock prices, and a collection of measurements of dimensions of a manufactured item are examples of data sets.
A database is a collection of related files containing records on people, places, or things. The people, places, or things for which we store and maintain information are called entities. A database for an online retailer that sells instructional fitness books and DVDs, for instance, might consist of a file for three entities: publishers from which goods are purchased, customer sales transactions, and product inventory. A database file is usually organized in a two-dimensional table, where the columns correspond to each individual element of data (called fields, or attributes) and the rows represent records of related data elements.

A metric is a unit of measurement that provides a way to objectively quantify performance. For example, senior managers might assess overall business performance using such metrics as net profit, return on investment, market share, and customer satisfaction. A plant manager might monitor such metrics as the proportion of defective parts produced or the number of inventory turns each month. For a Web-based retailer, some useful metrics are the percentage of orders filled accurately and the time taken to fill a customer's order. Measurement is the act of obtaining data associated with a metric. Measures are numerical values associated with a metric.

PROBABILITY
Classical definition: probabilities can be deduced from theoretical arguments.
Relative frequency definition: probabilities are based on empirical data.
Subjective definition: probabilities are based on judgment and experience.

Statistical measures of goodness of fit:
• Chi-square (needs at least 50 data points)
• Kolmogorov-Smirnov (works well for small samples) → compares cumulative distributions; H0: same distribution, H1: different distribution
• Anderson-Darling (puts more weight on the differences between the tails of the distributions)
• Shapiro's normality test (tests data against the normal distribution); H0: normally distributed, H1: not normally distributed

CENTRAL LIMIT THEOREM
The central limit theorem states that if the sample size is large enough, the sampling distribution of the mean is approximately normally distributed, regardless of the distribution of the population, and that the mean of the sampling distribution will be the same as that of the population. It also states that if the population is normally distributed, then the sampling distribution of the mean will be normal for any sample size.

Standard Error of the Mean $= \sigma / \sqrt{n}$
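As a quick sanity check of the theorem (not part of the original sheet), a minimal R simulation with invented data: sample means drawn from a skewed Exponential population come out roughly normal, with spread close to $\sigma/\sqrt{n}$.

```r
# CLT illustration: means of samples from a skewed population look normal.
set.seed(42)
n <- 50                                                     # sample size
sample_means <- replicate(10000, mean(rexp(n, rate = 1)))   # Exp(1): mu = 1, sigma = 1

sd(sample_means)    # empirical standard error of the mean
1 / sqrt(n)         # theoretical sigma / sqrt(n) = 0.1414...
hist(sample_means)  # roughly bell-shaped despite the skewed population
```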
Reliability means that data are accurate and consistent.
Validity means that data correctly measure what they are supposed to measure.

A dashboard is a visual representation of a set of key business measures. It is derived from the analogy of an automobile's control panel, which displays speed, gasoline level, temperature, and so on. Dashboards provide important summaries of key business information to help manage a business process or function.

Pareto Analysis: roughly 80% of output comes from 20% of input.

A cross-tabulation is a tabular method that displays the number of observations in a data set for different subcategories of two categorical variables. A cross-tabulation table is often called a contingency table. The subcategories of the variables must be mutually exclusive and exhaustive, meaning that each observation can be classified into only one subcategory and, taken together over all subcategories, they must constitute the complete data set.

CONFIDENCE INTERVALS FOR THE MEAN
With known population standard deviation:
$\bar{x} \pm z_{\alpha/2}\,(\sigma / \sqrt{n})$
Standard $z_{\alpha/2}$ values: $z_{0.975} = 1.96$ (for a 95% interval)

With unknown population standard deviation:
$\bar{x} \pm t_{\alpha/2,\,n-1}\,(s / \sqrt{n})$
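A minimal R sketch of both intervals; the data vector x and the "known" sigma are invented for illustration:

```r
x <- c(12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9)  # illustrative sample

# Known population sd (z-interval), assuming sigma = 0.3:
sigma <- 0.3
mean(x) + c(-1, 1) * qnorm(0.975) * sigma / sqrt(length(x))

# Unknown population sd (t-interval) -- t.test computes it directly:
t.test(x, conf.level = 0.95)$conf.int
```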
DATA TYPE     SINGLE VARIABLE                           MULTIPLE VARIABLES
CATEGORICAL   Pie, Frequency Barplot, Frequency Table   Barplot, Contingency Table
NUMERICAL     Barplot, Histogram, Frequency Table       Group Barplot, Scatterplot, Contingency Table
TREND         Line Chart                                Line Chart, Surface Chart
MEASURES
• LOCATION: Mean, Median, Mode
• DISPERSION: Range, Variance, Standard Deviation, Chebyshev's Theorem, Coefficient of Variation
• SHAPE: Skewness, Kurtosis
• ASSOCIATION: Covariance, Correlation

CONFIDENCE INTERVALS FOR A PROPORTION
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \quad \hat{p} = \frac{x}{n}$
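In R, the interval can be computed by hand from the formula above, or via prop.test (which uses a slightly different Wilson-type interval); the counts below are invented:

```r
x <- 56; n <- 200                 # illustrative: 56 successes out of 200
p_hat <- x / n
p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)  # formula above

prop.test(x, n)$conf.int          # built-in alternative (not identical)
```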
MEAN AND STANDARD DEVIATION
Population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, $\quad \sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$
Sample: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, $\quad s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$

Chebyshev's Theorem: $P(\mu - k\sigma \le X \le \mu + k\sigma) \ge 1 - \frac{1}{k^2}$
Empirical Rule (normal data): k = 1 ≈ 68%, k = 2 ≈ 95%, k = 3 ≈ 99.7%

PREDICTION INTERVALS
$\bar{x} \pm t_{\alpha/2,\,n-1}\left(s\sqrt{1 + \frac{1}{n}}\right)$
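A hand computation of the 95% prediction interval in R, reusing an invented sample:

```r
x <- c(12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9)  # illustrative sample
n <- length(x)
mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) * sqrt(1 + 1/n)  # 95% PI
```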

Coefficient of Variation (CV) = Standard Deviation / Mean
Return to Risk = 1 / CV (i.e., Mean / Standard Deviation)

Covariance is a measure of the linear association between two variables, X and Y. Like the variance, different formulas are used for populations and samples.
Correlation is a measure of the linear relationship between two variables, X and Y, which does not depend on the units of measurement.
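In R (illustrative vectors; cov() and cor() both use the sample formulas):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.0, 2.9, 4.2, 4.8, 6.1)
cov(x, y)   # sample covariance: depends on the units of x and y
cor(x, y)   # Pearson correlation: unit-free, always in [-1, 1]
```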

$z = \frac{x - \mu}{\sigma} \qquad t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
(where $\mu_0$ is the hypothesized population mean)

Multicollinearity: predictors are highly correlated with each other (|cor| > 0.7).
Interaction: modelled as the product of two predictors (A × B); see the sketch below.
The principle of parsimony: keep models as simple as possible.
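A sketch of both ideas in R; the data frame df and its variables are invented for illustration:

```r
set.seed(1)
df <- data.frame(A = rnorm(100), B = rnorm(100))
df$y <- 1 + 2 * df$A - df$B + 0.5 * df$A * df$B + rnorm(100)  # invented truth

# Interaction: y ~ A * B expands to A + B + A:B (the product term)
m <- lm(y ~ A * B, data = df)
summary(m)

# Multicollinearity screen: flag predictor pairs with |cor| > 0.7
cor(df[, c("A", "B")])
```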

FORECASTING   NO SEASONALITY                 SEASONALITY
NO TREND      Simple Moving Average,         Holt-Winters no-trend smoothing,
              Simple Exponential Smoothing   Multiple Regression
TREND         Double Exponential Smoothing   Holt-Winters additive/multiplicative

Single Exponential Smoothing
$\hat{y}_t = \alpha \cdot y_t + (1 - \alpha) \cdot \hat{y}_{t-1}$
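In R, stats::HoltWinters fits exactly this recursion when trend and seasonality are switched off; the series below is invented, and alpha is chosen by least squares:

```r
y <- ts(c(50, 52, 51, 55, 54, 56, 58, 57, 60, 59))   # illustrative series
fit <- HoltWinters(y, beta = FALSE, gamma = FALSE)   # no trend, no seasonality
fit$alpha                  # fitted smoothing constant
predict(fit, n.ahead = 3)  # flat forecasts, as expected for SES
```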

HYPOTHESIS TESTING
The null hypothesis is denoted by H0, and the alternative hypothesis is denoted by H1. Using sample data, we either
1. reject the null hypothesis and conclude that the sample data provide sufficient statistical evidence to support the alternative hypothesis, or
2. fail to reject the null hypothesis and conclude that the sample data do not support the alternative hypothesis.
If we fail to reject the null hypothesis, then we can only accept as valid the existing theory or belief, but we can never prove it.

             H0 IS TRUE              H0 IS FALSE
REJECT H0    Type I Error (p = α)    Correct
ACCEPT H0    Correct                 Type II Error (p = β)

The probability of making a Type I error, that is, P(rejecting H0 | H0 is true), is denoted by α and is called the level of significance. This defines the risk you are willing to take of incorrectly concluding that the alternative hypothesis is true when, in fact, the null hypothesis is true. The value of α can be controlled by the decision maker and is selected before the test is conducted. Commonly used levels for α are 0.10, 0.05, and 0.01.

The probability of correctly failing to reject the null hypothesis, or P(not rejecting H0 | H0 is true), is called the confidence coefficient and is calculated as 1 − α. For a confidence coefficient of 0.95, we mean that we expect 95 out of 100 samples to support the null hypothesis rather than the alternative hypothesis when H0 is actually true.

Unfortunately, we cannot control the probability of a Type II error, P(not rejecting H0 | H0 is false), which is denoted by β. Unlike α, β cannot be specified in advance but depends on the true value of the (unknown) population parameter.

The value 1 − β is called the power of the test and represents the probability of correctly rejecting the null hypothesis when it is indeed false, or P(rejecting H0 | H0 is false). We would like the power of the test to be high (equivalently, we would like the probability of a Type II error to be low) to allow us to make a valid conclusion. The power of the test is sensitive to the sample size; small sample sizes generally result in a low value of 1 − β. The power can be increased by taking larger samples, which enable us to detect small differences between the sample statistics and population parameters with more accuracy. However, a larger sample size incurs higher costs, giving new meaning to the adage that there is no such thing as a free lunch. This suggests that if you choose a small level of significance, you should try to compensate with a large sample size when you conduct the test.

DATA MINING
Approaches:
- Data Exploration and Reduction: identifying groups in which elements are in some way similar
  o Sampling
  o Data Visualisations: boxplots, parallel coordinates chart, scatterplot/variable plot matrix
  o Cluster Analysis (hierarchical clustering) → dendrogram (see the sketch after this list)
    ▪ Agglomerative clustering methods:
      • Single Linkage (nearest neighbour)
      • Complete Linkage (furthest distance)
      • Average Linkage (averaging groups)
      • Ward's hierarchical clustering (sum of squares)
    ▪ Divisive clustering methods
- Classification: analyzing data to predict how to classify a new data element
  o k-Nearest Neighbors (categorical); rule of thumb: $k = \sqrt{n} < 20$
  o Discriminant Analysis
  o Logistic Regression (result is a probability between 0 and 1)
- Association: analyzing databases to identify natural associations among variables and create rules for target marketing or buying recommendations
- Cause-and-Effect Modelling: developing analytic models to describe relationships between metrics that drive business performance
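A minimal hierarchical-clustering sketch in R on the built-in USArrests data (the dataset choice is ours, not the course's):

```r
d  <- dist(scale(USArrests))           # Euclidean distances on standardised data
hc <- hclust(d, method = "ward.D2")    # Ward's criterion; try "single",
                                       # "complete" or "average" for other linkages
plot(hc)                               # the dendrogram
cutree(hc, k = 4)                      # cut the tree into 4 clusters
```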
Net Present Value: $NPV = \sum_{i=0}^{n} \frac{C_i}{(1+r)^i}$

STATISTICAL TESTS
- Compare two sample means with normal errors: t.test
- Compare means of two or more population groups: aov (ANOVA)
  o H0: $\mu_1 = \mu_2 = \dots = \mu_n$
  o H1: at least one mean is different from the others
  Requirements: the observations
  a. are randomly and independently obtained,
  b. are normally distributed, and
  c. have equal variances.
- Compare equality of two variances: var.test (F-test)
  o H0: $\sigma_1^2 = \sigma_2^2$;  H1: $\sigma_1^2 \ne \sigma_2^2$
  $F = s_1^2 / s_2^2$
- Compare more than two variances: bartlett.test
  o H0: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$
  o Ha: $\sigma_i^2 \ne \sigma_j^2$ for at least one pair (i, j)
- Compare proportions: prop.test
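The same tests as R calls; the vectors, grouping factor, and counts below are invented for illustration:

```r
set.seed(2)
x <- rnorm(30, mean = 10)
y <- rnorm(30, mean = 11)
g <- gl(3, 10, labels = c("A", "B", "C"))   # grouping factor for ANOVA

t.test(x, y)                        # two sample means
summary(aov(x ~ g))                 # means of 3+ groups
var.test(x, y)                      # two variances (F-test)
bartlett.test(x ~ g)                # 3+ variances
prop.test(c(45, 60), c(100, 120))   # two proportions
```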

REGRESSION STATISTICS
R-squared → proportion of variation in Y explained by the model ($R^2 = 1 - SSE/SST$); ranges from 0 (worst) to 1 (best)
Multiple R → $\sqrt{R^2}$; in simple regression this is |r|, the sample correlation coefficient ($-1 \le r \le 1$)
Adjusted R-squared → adjusted for sample size and number of X variables
Standard Error → variability between observed and predicted values
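Where these figures appear in R's regression output, using the built-in cars data as a stand-in example:

```r
m <- lm(dist ~ speed, data = cars)   # simple regression on built-in data
s <- summary(m)
s$r.squared         # R-squared
sqrt(s$r.squared)   # Multiple R (= |r| in simple regression)
s$adj.r.squared     # Adjusted R-squared
s$sigma             # Residual standard error
```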

DIAGNOSTIC PLOTS
RESIDUALS vs FITTED → residuals should scatter randomly around zero with no pattern
NORMAL Q-Q → the less the points deviate from the line, the more normal the residuals
SCALE-LOCATION → checks homoscedasticity; the more random and horizontal, the better
RESIDUALS vs LEVERAGE → checks for influential outliers
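All four plots come from calling plot() on a fitted lm object; the model below reuses the built-in cars example so the snippet stands alone:

```r
m <- lm(dist ~ speed, data = cars)
par(mfrow = c(2, 2))   # 2x2 grid
plot(m)                # Residuals vs Fitted, Normal Q-Q,
                       # Scale-Location, Residuals vs Leverage
```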
