
Module 1: Descriptive Statistics

Lecture 1: Introduction and Descriptive Statistics


Data Types
Qualitative/categorical
● Mutually exclusive labels (one label cannot mean two things)
● Often not numbers; if they are, the numbers have no mathematical meaning
- Nominal: ordering/ranking makes no sense, numerical labels are arbitrary
- Ordinal: ordering/ranking has meaning/can be interpreted, numerical labels
respect the ordering
Quantitative/numerical
● Numbers used to record certain events, numbers have mathematical meaning
- Interval: differences between quantities are meaningful, but ratios are not; zero has no
natural meaning
- Ratio: both differences and ratios of two quantities are meaningful; zero is meaningful

Using categorical/qualitative data


Frequency distribution
● Frequency: the total number of occurrences for each
category


● Relative frequency: the fraction of the total number of
items belonging to a category (eg. 102/808 ≈ 0.1262)
● Percent frequency: relative frequency x 100%
Histograms
● Categories on x-axis
● Frequency, relative frequency, percent frequency on y-axis
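The three frequency measures above can be sketched in Python; the list of plan labels below is hypothetical:

```python
# Sketch: frequency, relative frequency and percent frequency for
# a hypothetical list of categorical labels (phone-plan purchases).
from collections import Counter

plans = ["$29", "$49", "$29", "$99", "$49", "$29", "$49", "$49"]

freq = Counter(plans)                                 # frequency per category
n = len(plans)                                        # total number of items
rel_freq = {k: v / n for k, v in freq.items()}        # fraction per category
pct_freq = {k: 100 * v for k, v in rel_freq.items()}  # relative frequency x 100%

print(freq["$49"], rel_freq["$49"], pct_freq["$49"])  # 4 0.5 50.0
```

Note the relative frequencies always sum to 1, so the percent frequencies sum to 100%.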

Using numerical/quantitative data


Frequency distributions and histograms
● Categories on x-axis are grouped (eg. 0-5, 5-10, 10-15)
● Density frequency: frequency divided by the bin width, so bars of different widths remain comparable

Probability theory
● Random variable (r.v.): a variable whose value occurs randomly
● Population: the complete pool of a certain random variable
● Sample: a random collection of a certain size drawn from the population

Probability distribution
● Probability distribution - the general shape of probability for
values that a random variable may take

Notation
● Random variable denoted by X, Y (capital letters)
- Eg. X: number of children in household
- Eg. Y: amount of time spent by husband on
housework per day
● realisations/observations of a random variable denoted by xᵢ, yᵢ (lowercase letters
with subscript)
- Eg. x₁: number of children in household is 1
- Eg. y₁₃₇: the 137th observation of the amount of time spent by the husband on housework per day
● N and n denote the size or number of observations
- N refers to the population size
- n denotes the sample size

Descriptive Statistics
Central tendency
● Measure of central tendency yields info about the centre of a set of numbers (the
distribution of a r.v.) – it does not describe the span of the dataset or how far values
are from the middle
● gives an idea of what typical, middle, or average value a r.v. can take
● sometimes called measures of location

three measures of central tendency

Mode
● most frequently occurring value in a set of data


● If there are 2 modes, the 2 modes are listed and the data is said to be bimodal
● Datasets with 3 or more modes are referred to as multimodal
● Concept of mode is often used in determining sizes (eg. the most common clothing size)
● Appropriate descriptive summary measure for categorical data

Median
● middle value in an ordered array of numbers


● locate the median by finding the (n+1)/2 th term in the ordered array
● Large and small values do not inordinately influence the median – hence it is the
best measure of location to use in the analysis of variables in which extreme but
acceptable values can occur at just one end of the data
● Not all info from the dataset is used
● Data must be quantitative or be able to be ranked

Mean
● Average of a set of numbers


● Sample mean is represented by X̄
● Population mean is represented by µ
● Data should be quantitative as it needs to be summed
● Affected by all values – advantage because it reflects all the data, but
disadvantage because extreme values pull the mean towards extremes
● To calculate the mean forecast value, we need to multiply each possible value by
its probability and sum up the products.
- If we denote the r.v. by X: µ = E(X) = Σᵢ xᵢ · P(X = xᵢ)

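The three measures of central tendency can be computed with Python's standard `statistics` module; the sample below is made up, with one extreme value to show how the mean is pulled while the median is not:

```python
# Sketch: mode, median and mean on a small made-up sample.
from statistics import mean, median, mode

x = [2, 3, 3, 4, 7, 9, 30]   # 30 is an extreme but acceptable value

print(mode(x))    # 3 – most frequently occurring value
print(median(x))  # 4 – the (7+1)/2 = 4th term in the ordered array
print(mean(x))    # ≈ 8.29 – pulled towards the extreme value 30
```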

Variability
● Measures of variability yield info about how far a realisation of the r.v. tends to be
away from the centre of its distribution; they describe the spread/dispersion of a dataset
● Gives an idea of fluctuation and volatility across realisations of the r.v.
● The more variability in a dataset, the less typical individual values are of the whole set
● Using measures of variability in conjunction with measures of central tendency
makes possible a more complete numerical description of the data (a measure of
variability is necessary to complement the mean value when describing data)
● The more spread out the r.v. is, the larger the risk/dispersion/variability
● Also called measures of scale, spread, dispersion or risk
● Measures of variability
- Variance (Var) - average of squared distance from the mean
- Standard deviation (std): square root of variance
- Coefficient of variation (CV) - standard deviation ÷ mean × 100%

Variability formulas
Variance
● Computes the average squared distance between data points and their mean; the
formula differs slightly between sample and population
● Population variance (finite population)
- σ² = Σᵢ (xᵢ − µ)² ∕ N
- Denoted by σ² (sigma squared) or Var(X), the variance of X
● Sample variance
- s² = Σᵢ (xᵢ − X̄)² ∕ (n − 1)
- Denoted by s²
Standard deviation
● Standard deviation solves the problem of squared units. It has the same unit as the
original data
● Population standard deviation
- Denoted by σ (sigma) or std(X)
● Sample standard deviation
- Denoted by s
Coefficient of variation
● Measures standard deviation per unit of mean
● In finance, when the r.v. X denotes asset returns, CV measures risk per unit of
expected return
● It is unit free, because the numerator and denominator have the same unit as
the original data and they cancel each other
● Population CV: CV = σ ∕ µ × 100%
- when σ increases, CV increases
- when µ increases, CV decreases
- Ratio between risk and expected return
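A short Python sketch of variance, standard deviation and CV on made-up numbers; the `statistics` module provides both population (`pvariance`, `pstdev`) and sample (`variance`) versions:

```python
# Sketch: population vs sample variability measures on made-up data.
from statistics import mean, pvariance, pstdev, variance

x = [4.0, 6.0, 5.0, 7.0, 8.0]

sigma2 = pvariance(x)        # population variance σ²: divides by N
s2 = variance(x)             # sample variance s²: divides by n − 1
sigma = pstdev(x)            # population standard deviation σ = √σ²
cv = sigma / mean(x) * 100   # coefficient of variation, in %

print(sigma2, s2, round(cv, 2))  # 2.0 2.5 23.57
```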
Skewness
Shape
● Central tendency and variability are useful to describe and summarise data or the
distribution of r.v.’s
● Skewness - a measure of asymmetry
● Mode: value on the horizontal axis where the high point of the curve occurs
● Mean: pulled towards the tail of the distribution (drawn towards the extreme values)
● Median: generally located somewhere between the mode and the mean

Lecture 2: Probability theory


● Multi-dimensional data
● Experiment: a random process that creates outcomes (eg. the data collection
procedure)
● Sample space: the set of all possible outcomes
● Event: a set of outcomes (can contain no outcome, single outcome or multiple
outcomes) of an experiment to which probability is assigned. So an event is a subset
of the sample space
● Relative frequency: outcomes receive probability corresponding to their number of
occurrences → P(outcome i) = number of occurrences of outcome i ÷ total number of
occurrences of all outcomes

Law of addition
Joint vs marginal probabilities
● Distinguish joint and marginal probability through multidimensional outcomes
● Joint probability: denotes relative frequency when asking about all dimensions
- Eg. what is relative frequency that customer bought a $49 plan on a weekday
● Marginal probability: displays relative frequency when only asking about a single
dimension

Law of total probability, version 1


● The complement of event A is denoted A′ (pronounced “A prime”), meaning “not A”;
a dash/bar at the top (Ā) likewise means the outcome does not occur
● When referring to joint probability, we use the
intersection symbol “∩”. The event A∩B (read: the
intersection of A and B, or A intersection B) is
the event where both A and B are true, i.e. both A and B occur

Venn diagram: visualisation of probability


● Venn diagram shows logic relations across sets
● The external rectangle indicates the whole sample space
● The internal circle indicates some event A
Joint events
● A joint event such as A∩B is the intersection (∩) of A and B
Union of events
● Indicates the event A or B happens
● This is denoted by A∪B, pronounced as the union of A and B or A union B.
So P(A∪B) indicates the probability that A or B is true or that A or B occurs

General rule of addition
● P(A∪B) = P(A) + P(B) − P(A∩B)

Mutually exclusive events


● If events A and B cannot occur at the same time, we say A and B are
mutually exclusive (events)
● Any event and its complement are mutually exclusive. Either “A
occurs” or “A does not occur”
● P(A∩A′) = 0
● For mutually exclusive events, the rule of addition simplifies to P(A∪B) = P(A) + P(B)

Collectively exhaustive events


● If the occurrence of events A and B covers the whole sample
space, we say A and B are collectively exhaustive (events)
● Any event and its complement are collectively exhaustive. “A
occurs” and “A does not occur” make up all possible outcomes
● P(A∪A′) = 1

Conditional probability and independence


Conditional probabilities
● P(A|B) denotes the probability that event A occurs, conditional on B occurring
● The symbol P(X=x|Y=y) denotes the probability of r.v. X taking value x, conditional on
the r.v. Y taking value y
● formula: P(A|B) = P(A∩B) ∕ P(B)
● Bayes rule: P(A|B) = P(B|A) P(A) ∕ P(B)
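A numerical sketch of the conditional-probability formula and Bayes rule; the probabilities below are hypothetical:

```python
# Sketch: conditional probability and Bayes rule on hypothetical numbers.
p_b = 0.4          # P(B), a marginal probability
p_a = 0.25         # P(A), a marginal probability
p_a_and_b = 0.1    # P(A ∩ B), a joint probability

p_a_given_b = p_a_and_b / p_b            # P(A|B) = P(A∩B) / P(B)
p_b_given_a = p_a_given_b * p_b / p_a    # Bayes: P(B|A) = P(A|B)P(B) / P(A)

print(p_a_given_b, p_b_given_a)  # 0.25 0.4
```

The second line shows the point of Bayes rule: it converts one conditional probability, P(A|B), into the reversed one, P(B|A), using only the marginals.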

Law of total probability


● Joint probability = conditional probability multiplied by the marginal probability: P(A∩B) = P(A|B) P(B)

Independent events: formula


● If A and B are independent events, whether or not B occurs does not affect the
probability that A occurs; likewise, whether or not A occurs does not affect the
probability that B occurs
● Formula: P(A∩B) = P(A) P(B)

● Bayes rule: under independence, P(A|B) = P(A) and P(B|A) = P(B)
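The independence formulas can be checked numerically; the probabilities below are made up so that P(A∩B) = P(A)P(B) holds:

```python
# Sketch: numerical check of independence on made-up probabilities.
p_a, p_b = 0.5, 0.3
p_a_and_b = 0.15

independent = abs(p_a_and_b - p_a * p_b) < 1e-12  # P(A∩B) = P(A)P(B)?
p_a_given_b = p_a_and_b / p_b                     # equals P(A) when independent

print(independent, p_a_given_b)  # True 0.5
```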

Implications of formulas

Binomial experiments
● Eg. toss a coin 3 times in a row and you are interested in how likely it is that you get
exactly two heads
● A binomial experiment assesses the number of a certain outcome from repeated
independent trials
● Each trial has two possible outcomes (eg. heads or tails, success or failure)

Binomial tree
● When two events are independent, P(A|B) = P(A)
● Suppose we have three products, each of which can be defective
(D) with probability p or functional (F) with probability
q = 1 − p
Binomial distribution
● A r.v. X taking value in (0,1,...,n) is said to follow the binomial distribution denoted by
𝑋 ~ 𝐵𝑖𝑛(𝑛, 𝑝)

● P(X = x) = C(n, x) · pˣ · (1 − p)ⁿ⁻ˣ
● pˣ: the probability of x successes
● (1 − p)ⁿ⁻ˣ: the probability of n − x failures. So in total we have n trials
● The factor C(n, x) = n! ∕ (x! (n − x)!) (the combinatorial operator) computes the
number of cases/combinations of choosing x objects from the set of n objects.
Remember the factorial operator m! = 1 × 2 × 3 × … × (m − 1) × m

● Properties of binomial distribution:


- Almost all distributions have expectation (i.e. mean) and variance (and thus
standard deviation).
- Every distribution (their pdf) is characterised by some parameters.
→ The binomial distribution has two parameters, 𝒏 (the number of trials) and
𝒑 (the success probability or success rate)
→ the mean (expectation) and variance of 𝑋~𝐵𝑖𝑛(𝑛, 𝑝) are given by
E(X) = np and Var(X) = np(1 − p)
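These formulas can be sketched in Python for the three-coin-toss example (n = 3, p = 0.5), using the standard-library `math.comb` as the combinatorial operator:

```python
# Sketch: binomial pmf, mean and variance for X ~ Bin(3, 0.5),
# i.e. counting heads in three fair coin tosses.
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 3, 0.5
print(binom_pmf(2, n, p))  # 0.375 – probability of exactly two heads
print(n * p)               # 1.5   – mean E(X) = np
print(n * p * (1 - p))     # 0.75  – variance Var(X) = np(1 − p)
```

Summing the pmf over x = 0, …, n gives 1, since the n + 1 outcomes are collectively exhaustive.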
