Module 1 - Descriptive Stats

Uploaded by

jennylehuynh29

Module 1: Descriptive Statistics

Data Types
Qualitative/categorical
● Mutually exclusive labels (one label cannot mean two things)
● Usually not numbers; if numbers are used, they have no mathematical meaning
- Nominal: ordering/ranking makes no sense; numerical labels are arbitrary
- Ordinal: ordering/ranking has meaning and can be interpreted; numerical labels
respect the ordering
Quantitative/numerical
● Numbers used to record certain events; the numbers have mathematical meaning
- Interval: the difference between two quantities is meaningful, but their ratio is
not; zero has no natural meaning
- Ratio: both the difference and the ratio of two quantities are meaningful; zero is
meaningful

Using categorical/qualitative data

Frequency distribution
● Frequency: the total number of occurrences for each
category
● Relative frequency: the fraction of the total number of items
belonging to a category (eg. 102 ÷ 808 = 0.1262)
● Percent frequency: relative frequency × 100%
Histograms
● Categories on x-axis
● Frequency, relative frequency, percent frequency on y-axis
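
The three frequency measures above can be sketched in a few lines of Python. The category labels and counts are made up for illustration (chosen so that one category reproduces the 102 ÷ 808 example); they are not data from the module.

```python
from collections import Counter

# Hypothetical categorical sample: 808 observations in three made-up categories.
observations = ["A"] * 102 + ["B"] * 406 + ["C"] * 300

frequency = Counter(observations)        # number of occurrences per category
total = sum(frequency.values())          # total number of items

# Relative frequency = frequency / total; percent frequency = relative x 100.
relative = {cat: count / total for cat, count in frequency.items()}
percent = {cat: 100 * share for cat, share in relative.items()}

for cat in sorted(frequency):
    print(f"{cat}: freq={frequency[cat]}, rel={relative[cat]:.4f}, pct={percent[cat]:.2f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100%, which is a quick sanity check on any frequency table.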

Using numerical/quantitative data

Frequency distributions and histograms
● Values on the x-axis are grouped into classes (eg. 0-5, 5-10, 10-15)
● Density on the y-axis: relative frequency divided by class width, so that the bar areas sum to 1
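
A rough sketch of grouping numerical data into classes, assuming "density" means relative frequency divided by class width. The data, the class boundaries, and the convention that each class includes its lower boundary but not its upper one are all assumptions made for this example.

```python
# Hypothetical numerical sample and class boundaries (not from the module).
data = [1, 2, 4, 6, 7, 7, 8, 11, 12, 14]
edges = [0, 5, 10, 15]              # classes [0,5), [5,10), [10,15)

# Count how many observations fall in each class.
counts = [0] * (len(edges) - 1)
for x in data:
    for i in range(len(counts)):
        if edges[i] <= x < edges[i + 1]:
            counts[i] += 1
            break

n = len(data)
widths = [edges[i + 1] - edges[i] for i in range(len(counts))]
relative = [c / n for c in counts]                      # relative frequency per class
density = [r / w for r, w in zip(relative, widths)]     # height of each histogram bar
area = sum(d * w for d, w in zip(density, widths))      # total bar area

print(counts, relative, density, area)
```

With density on the y-axis the total bar area is 1, which is what lets a histogram be compared with a probability density function.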

Probability theory
● Random variable (r.v.) - a variable whose value is the outcome of a random process
● Population - the complete pool of possible observations of a certain random variable
● Sample - a random collection of a certain size drawn from the population

Probability distribution
● Probability distribution - describes how probability is spread over the values that a
random variable may take

Notation
● Random variable denoted by X, Y (capital letters)
- Eg. X: number of children in household
- Eg. Y: amount of time spent by husband on
housework per day
● realisations/observations of a random variable denoted by xᵢ,
yᵢ (lowercase letters with subscript)
- Eg. x₁: number of children in the 1st household observed
- Eg. y₁₃₇: amount of time spent on housework per day by the husband in the 137th household
● N and n denote the size or number of observations
- N denotes the population size
- n denotes the sample size

Descriptive Statistics
Central tendency
● A measure of central tendency yields info about the centre of a set of numbers
(the distribution of a r.v.); it does not focus on the span of the dataset or how far values
are from the middle numbers
● Gives an idea of the typical, middle, or average value that a r.v. can take
● Sometimes called measures of location

Three measures of central tendency

Mode
● Most frequently occurring value in a set of data
● If there are 2 modes, both are listed and the data is said to be bimodal
● Datasets with 3 or more modes are referred to as multimodal
● The concept of mode is often used in determining sizes
● Appropriate descriptive summary measure for categorical data

Median
● Middle value in an ordered array of numbers
● Locate the median by finding the (n+1)/2 th term in the ordered array (for even n,
average the two middle terms)
● Large and small values do not inordinately influence the median, hence it is the
best measure of location to use in the analysis of variables in which extreme but
acceptable values can occur at just one end of the data
● Not all info from the dataset is used
● Data must be quantitative or be able to be ranked
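
As a small illustration of the mode and the median, the sketch below uses Python's standard `statistics` module on made-up data (the shoe sizes are hypothetical). It also checks the (n+1)/2 position rule for an odd-sized sample and shows that one extreme value leaves the median unchanged.

```python
import statistics

# Hypothetical data: shoe sizes sold in one day (made up for illustration).
sizes = [38, 39, 39, 40, 41, 41, 41, 42, 44]

modes = statistics.multimode(sizes)     # list of all most-frequent values

ordered = sorted(sizes)
n = len(ordered)
position = (n + 1) / 2                  # 1-based position of the median term
median_by_position = ordered[int(position) - 1]   # direct lookup works because n is odd
median = statistics.median(sizes)

# Add one extreme but acceptable value at the upper end.
with_outlier = sizes + [60]
median_with_outlier = statistics.median(with_outlier)

print(modes, position, median, median_with_outlier)
```

`statistics.multimode` returns every mode, so a bimodal or multimodal dataset shows up as a list with two or more entries.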

Mean
● Average of a set of numbers
● Sample mean is represented by X̄
● Population mean is represented by μ
● Data should be quantitative as it needs to be summed
● Affected by all values: an advantage because it reflects all the data, but a
disadvantage because extreme values pull the mean towards the extremes
● To calculate the mean forecast value (expected value), multiply each possible value by
its probability and sum up the products

- If we denote the r.v. by X: E(X) = Σ xᵢ × P(X = xᵢ), summing over all possible values xᵢ
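
The two ways of computing a mean described above can be sketched as follows. The sample values and the probability distribution are invented for illustration; the expected value is computed as the sum of each possible value times its probability.

```python
import statistics

# Hypothetical sample (made up): four ordinary values and one extreme value.
sample = [10, 12, 11, 13, 54]
sample_mean = statistics.mean(sample)       # pulled up by the extreme value 54
sample_median = statistics.median(sample)   # barely affected by it

# Hypothetical probability distribution of a forecast X (probabilities sum to 1).
values = [0, 1, 2, 3]
probs = [0.1, 0.4, 0.3, 0.2]

# Mean forecast value: multiply each possible value by its probability and sum.
expected = sum(x * p for x, p in zip(values, probs))

print(sample_mean, sample_median, expected)
```

Comparing `sample_mean` with `sample_median` shows the trade-off noted above: the mean uses every value, so a single extreme observation moves it a long way.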

Variability
● Measures of variability yield info about how far a realisation of the r.v. is likely to
lie from the centre of its distribution; they describe the spread/dispersion of a dataset
● Give an idea of fluctuation and volatility across realisations of the r.v.
● The more variability in a dataset, the less typical the central value is of the whole set
● Using measures of variability in conjunction with measures of central tendency
makes possible a more complete numerical description of the data (a measure of
variability is necessary to complement the mean value when describing data)
● The more spread out the r.v. is, the larger the risk/dispersion
● Also called measures of scale, spread, dispersion or risk
● Measures of variability
- Variance (Var): average of squared distance from the mean
- Standard deviation (std): square root of variance
- Coefficient of variation (CV): standard deviation ÷ mean × 100%

Variability formulas
Variance
● Computes the average squared distance between data points and their mean; the
formula depends on whether the data are a sample or a population
● Population variance
- Finite population of size N
- Denoted by σ² (sigma squared) or Var(X) (variance of X)
- σ² = Σ (xᵢ − μ)² ÷ N
● Sample variance
- Denoted by s²
- s² = Σ (xᵢ − X̄)² ÷ (n − 1)
Standard deviation
● Standard deviation solves the problem of squared units: it has the same unit as the
original data
● Population standard deviation
- Denoted by σ (sigma) or std(X)
- σ = √σ²
● Sample standard deviation
- Denoted by s
- s = √s²
Coefficient of variation
● Measures standard deviation per unit of mean
● In finance, when the r.v. X denotes asset returns, CV measures risk per unit of
expected return
● It is unit free, because both the numerator and denominator have the same unit as the
original data and they cancel each other
● Population CV
- CV = σ ÷ μ × 100%
- When σ increases, CV increases
- When μ increases, CV decreases
- Ratio between risk and expected return
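
A minimal sketch of these measures using Python's standard `statistics` module, assuming the usual definitions (population variance divides by N, sample variance divides by n − 1). The return figures are hypothetical.

```python
import statistics

# Hypothetical asset returns in percent (made up for illustration).
returns = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = statistics.fmean(returns)               # mean of the data
pop_var = statistics.pvariance(returns)      # population variance: divide by N
sample_var = statistics.variance(returns)    # sample variance: divide by n - 1
pop_std = statistics.pstdev(returns)         # square root of population variance
sample_std = statistics.stdev(returns)       # square root of sample variance

# Coefficient of variation: standard deviation per unit of mean, in percent.
cv = pop_std / mu * 100

print(mu, pop_var, sample_var, pop_std, cv)
```

The sample variance is always a little larger than the population variance computed on the same numbers, because it divides the same sum of squared deviations by n − 1 instead of N.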
Skewness
Shape
● Central tendency and variability are useful to describe and summarise data or the
distribution of r.v.’s
● Skewness - measure of asymmetry
● Mode: value on the horizontal axis where the high point of the curve occurs
● Mean: located towards the tail of the distribution (drawn towards the extreme values)
● Median: generally located somewhere between the mode and the mean
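
The ordering of mode, median and mean in a skewed distribution can be checked on a small made-up dataset with a long right tail. The numbers are hypothetical and chosen only to make the pattern visible.

```python
import statistics

# Hypothetical right-skewed data: most values are small, a few are very large.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mode = statistics.mode(right_skewed)        # peak of the distribution
median = statistics.median(right_skewed)    # middle of the ordered data
mean = statistics.mean(right_skewed)        # drawn towards the long right tail

print(mode, median, mean)
```

Here the mode is smallest and the mean is largest, with the median in between, which is the typical pattern when the tail is on the right; for a left tail the order reverses.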

Probability theory
● Multi-dimensional data
● Experiment: a random process that creates outcomes (eg. the data collection
procedure)
● Sample space: the set of all possible outcomes
● Event: a set of outcomes (can contain no outcome, single outcome or multiple
outcomes) of an experiment to which probability is assigned. So an event is a subset
of the sample space
● Relative frequency: each outcome receives a probability corresponding to its number of
occurrences → P(outcomeᵢ) = number of occurrences of outcomeᵢ ÷ total number of
occurrences of all outcomes

Law of addition
Joint vs marginal probabilities
● Joint and marginal probabilities are distinguished when outcomes are multidimensional
● Joint probability: denotes the relative frequency when asking about all dimensions
- Eg. what is the relative frequency that a customer bought a $49 plan on a weekday?
● Marginal probability: denotes the relative frequency when only asking about a single
dimension
- Eg. what is the relative frequency that a customer bought a $49 plan?
● The complement of the event A is denoted as A’ (pronounced as A prime), meaning not A;
a dash at the top means not the outcome

When referring to joint probability, we use the intersection symbol “∩”. The event A∩B (read: the
intersection of A and B, or A intersection B) means the event where both A and B are
true or both A and B occur
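
To make the joint/marginal distinction concrete, the sketch below builds relative frequencies from a two-dimensional table of purchase counts. The counts, the $29 plan and the weekend category are invented for illustration; only the "$49 plan on a weekday" question comes from the notes.

```python
# Hypothetical purchase counts (made up): (plan, day type) -> number of customers.
counts = {
    ("$29", "weekday"): 30, ("$29", "weekend"): 20,
    ("$49", "weekday"): 35, ("$49", "weekend"): 15,
}
total = sum(counts.values())

# Joint probabilities: relative frequency of each (plan, day type) combination.
joint = {outcome: c / total for outcome, c in counts.items()}
p_49_and_weekday = joint[("$49", "weekday")]

# Marginal probabilities: sum the joint probabilities over the other dimension.
p_49 = sum(p for (plan, _), p in joint.items() if plan == "$49")
p_weekday = sum(p for (_, day), p in joint.items() if day == "weekday")

print(p_49_and_weekday, p_49, p_weekday)
```

The marginal probability of the $49 plan is obtained by adding the joint probabilities for weekday and weekend, which is why marginals appear in the margins of such a table.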

Venn diagram: visualisation of probability

● Venn diagram shows logic relations across sets
● The external rectangle indicates the whole sample space
● The internal circle indicates some event A
Joint events
● A joint event such as A∩B is the intersection (∩) of A and B
Union of events
● Indicates the event that A or B (or both) happens
● This is denoted by A∪B, pronounced as the union of A and B, or A union B.
So P(A∪B) is the probability that A or B is true, or that A or B occurs

Mutually exclusive events

● If events A and B cannot occur at the same time (A can occur only if B does not
occur), we say A and B are mutually exclusive (events)
● Any event and its complement are mutually exclusive. Either “A
occurs” or “A does not occur”
● P(A∩A’) = 0

Collectively exhaustive events

● If the occurrence of events A and B covers the whole sample
space, we say A and B are collectively exhaustive (events)
● Any event and its complement are collectively exhaustive. “A occurs” and “A does not
occur” make up all possible outcomes
● P(A∪A’) = 1
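
Events, complements, intersections and unions map directly onto Python sets. The sketch below uses one roll of a fair die as an illustrative sample space (an assumption, not an example from the notes) and also checks the standard addition rule P(A∪B) = P(A) + P(B) − P(A∩B).

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}       # one roll of a fair die (illustrative)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}                # event: the roll is even
B = {4, 5, 6}                # event: the roll is greater than 3
A_comp = sample_space - A    # complement A': the roll is not even

p_union = prob(A | B)                             # P(A or B)
p_addition = prob(A) + prob(B) - prob(A & B)      # addition rule

print(p_union, p_addition, prob(A & A_comp), prob(A | A_comp))
```

The last two printed values confirm that an event and its complement are mutually exclusive (probability 0 of both) and collectively exhaustive (probability 1 of either).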

Conditional probabilities and independence

Conditional probabilities
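● P(A|B) denotes the probability that event A occurs, given that B occurs
● The symbol P(X=x|Y=y) denotes the probability of r.v. X taking value x, conditional on
the r.v. Y taking value y
● Formula: P(A|B) = P(A∩B) ÷ P(B), for P(B) > 0

● Bayes rule: P(A|B) = P(B|A) × P(A) ÷ P(B)

Law of total probability

● Joint probability = conditional probability multiplied by marginal probability:
P(A∩B) = P(A|B) × P(B)
● Summing the joint probabilities over B and its complement gives the marginal
probability: P(A) = P(A|B) × P(B) + P(A|B’) × P(B’)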
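
A short worked sketch of the conditional probability rules, using exact fractions. The events and all input probabilities are hypothetical: B stands for "bought on a weekday" and A for "bought the $49 plan". It uses the standard identities P(A∩B) = P(A|B) × P(B), P(A) = P(A|B) × P(B) + P(A|B’) × P(B’), and Bayes rule P(B|A) = P(A|B) × P(B) ÷ P(A).

```python
from fractions import Fraction

# Hypothetical inputs (made up for illustration).
p_B = Fraction(65, 100)             # P(B): purchase happens on a weekday
p_A_given_B = Fraction(35, 65)      # P(A|B): $49 plan, given a weekday
p_A_given_notB = Fraction(15, 35)   # P(A|B'): $49 plan, given not a weekday

# Joint probability = conditional probability x marginal probability.
p_A_and_B = p_A_given_B * p_B

# Law of total probability: combine both branches (B and its complement).
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)

# Bayes rule: reverse the direction of conditioning.
p_B_given_A = p_A_given_B * p_B / p_A

print(p_A_and_B, p_A, p_B_given_A)
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared with hand calculations without rounding noise.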

Independent events: formula

● If A and B are independent events, whether or not B occurs should not affect the
probability that A occurs; also, whether or not A occurs should not affect the probability
that B occurs
● Formula: P(A∩B) = P(A) × P(B)

● Bayes rule: P(A|B) = P(B|A) × P(A) ÷ P(B) = P(B) × P(A) ÷ P(B) = P(A)
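
Independence can be checked numerically by testing whether P(A∩B) equals P(A) × P(B). The sketch below does this for two tosses of a fair coin, an illustrative sample space chosen for this example; the events A, B and C are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: two tosses of a fair coin, four equally likely outcomes.
sample_space = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if o[0] == "H"}       # first toss is heads
B = {o for o in sample_space if o[1] == "H"}       # second toss is heads
C = {o for o in sample_space if o == ("H", "H")}   # both tosses are heads

independent_AB = prob(A & B) == prob(A) * prob(B)  # separate tosses
independent_AC = prob(A & C) == prob(A) * prob(C)  # C depends on the first toss
p_A_given_B = prob(A & B) / prob(B)                # equals P(A) under independence

print(independent_AB, independent_AC, p_A_given_B)
```

A and B come out independent because knowing the second toss says nothing about the first, while A and C do not, because C can only happen when A does.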

Implications of formulas

Binomial experiments
● Eg. toss a coin 3 times in a row and you are interested in how likely it is that you get
exactly two heads
● A binomial experiment assesses the number of a certain outcome from repeated
independent trials
● Each trial has two possible outcomes (eg. heads or tails, success or failure)

Binomial tree
● When two outcomes are independent, P(A|B) = P(A)
● Suppose we have three products, each of which can be defective (D) with probability p or
functional (F) with probability q = 1 − p
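
The binomial tree can be walked explicitly in code: enumerate every branch of D/F outcomes, keep the branches with the required number of D's, and add up their probabilities (each branch probability is a product because the trials are independent). The function name `prob_exact` is hypothetical, and the closed-form binomial formula is included only as a cross-check.

```python
from itertools import product
from math import comb

def prob_exact(k, n, p):
    """P(exactly k outcomes 'D' in n independent trials), summed over tree branches."""
    q = 1 - p
    total = 0.0
    for branch in product("DF", repeat=n):      # every path through the tree
        if branch.count("D") == k:
            total += p ** k * q ** (n - k)      # independence: multiply along the path
    return total

# Coin example from the notes: exactly two heads in three tosses (D plays "heads").
p_two_heads = prob_exact(2, 3, 0.5)

# Cross-check with the standard binomial formula C(n, k) * p^k * q^(n-k).
closed_form = comb(3, 2) * 0.5 ** 2 * 0.5 ** 1

print(p_two_heads, closed_form)
```

Three of the eight equally likely branches contain exactly two heads, so both calculations give 3/8.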

Continuous probability distributions

● Discrete probability distribution: the distribution of a discrete random variable
● Discrete random variable: a r.v. that takes discrete values. A discrete r.v. typically
counts something
- Eg. number of kids in a household, number of successes in n trials
● Continuous random variable: a r.v. that takes values on (part of) the real line.
A continuous r.v. typically measures something
- Eg. waiting time in a queue, height of soldiers, inflation rates

Two kinds of probability distribution functions: discrete and continuous. In both cases the
probabilities add up to 1

Probability density function

● A continuous probability distribution for X is defined by means of a probability
density function (pdf), which assigns a non-negative value to possible outcomes of X such
that the density integrates to 1 (this means that the area under the curve is 1). The
probability that X lies between two numbers is the area under the pdf between those numbers
Contrast with a discrete random variable
● For a continuous r.v. we do not ask for P(X=x), where x is some specific value, because
P(X=x) = 0 always
● A continuous r.v. has infinitely many outcomes. If every single outcome had positive
probability, the probabilities would add up to infinity and not 1
● Eg. What is the probability that a random person waits exactly
2.71285748634050284… minutes?
- The probability is 0. However, the probability that a person waits between
2.71…84 and 2.71…9 is strictly positive
Implications for inequalities
● Since P(X=x) = 0, strict and weak inequalities give the same probability: P(X<x) = P(X≤x)

Cumulative distribution function for a continuous r.v.

● P(X<x) for a continuous r.v. defines the cumulative
distribution function (cdf)

Conditions for the pdf f_X(x):

1. Total area under the pdf equals 1: P(−∞ < X < ∞) = 1
2. Because probabilities are areas under the pdf, the pdf can never be negative:
a negative value would imply negative probabilities over some range
- Eg.

Continuous uniform distribution

● A r.v. X taking any value within [a,b] is said to follow the continuous uniform
distribution if all potential outcomes (realisations) between a and b are equally likely
● X ~ Unif(a,b)
● There are two parameters:
- a: the minimum value that X can assume
- b: the maximum value that X can assume
- E(X) = (a+b) ÷ 2, Var(X) = (b−a)² ÷ 12

● For any continuous r.v., P(x₁ < X < x₂) = P(X < x₂) − P(X < x₁): the area under the pdf
from x₁ to x₂ is the difference between the values of the cdf at x₂ and x₁
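
A minimal sketch of the continuous uniform distribution, assuming the standard pdf 1 ÷ (b − a) on [a,b] and cdf (x − a) ÷ (b − a). The function names and the parameter values a = 2, b = 10 are hypothetical and chosen for illustration.

```python
def uniform_pdf(x, a, b):
    """Density of Unif(a, b): constant on [a, b], zero elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X < x) for X ~ Unif(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2.0, 10.0                  # hypothetical minimum and maximum of X

mean = (a + b) / 2                # E(X)
variance = (b - a) ** 2 / 12      # Var(X)

# P(4 < X < 7) as a difference of cdf values, i.e. the area under the pdf.
p_between = uniform_cdf(7.0, a, b) - uniform_cdf(4.0, a, b)
area_check = uniform_pdf(5.0, a, b) * (7.0 - 4.0)   # rectangle: height x width

print(mean, variance, p_between, area_check)
```

Because the pdf is flat, the area between two points is just a rectangle, so the cdf difference and the height-times-width calculation agree.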
