Lecture 1 Descriptive Statistics

Uploaded by haikal shariff

AGR 5201
ADVANCED STATISTICAL METHODS

Lecture 1
Descriptive Statistics
DESCRIPTIVE STATISTICS
POPULATION VS. SAMPLE

Population
Consists of all possible values of a variable

Sample
A part of a population

What is a sample?

• A sample is a part of the population
• It is randomly selected from a large population
DESCRIPTIVE STATISTICS - EXAMPLE

An Illustration:
Below are IQ test scores for 13 students in Class A and 13 students in Class B. Which group is smarter?

Class A: 102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97
Class B: 127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105

Each individual may be different. If you try to understand a group by remembering the qualities of
each member, you become overwhelmed and fail to understand the group.
DESCRIPTIVE STATISTICS

Which group is smarter now?

Class A average IQ: 110.54
Class B average IQ: 110.23

They’re roughly the same!

With a summary descriptive statistic, it is much easier to
answer our question.
TYPES OF DESCRIPTIVE STATISTICS

1. Graphics: Organize Data

• Tables
• Graphs

2. Numeric: Summarize Data

• Central Tendency (measure of location)
• Variation (measure of spread)
1. Graphics: Organize Data

• Tables
• Frequency Distributions
• Relative Frequency Distributions

• Graphs
• Bar Chart or Histogram
• Stem and Leaf Plot
• Frequency Polygon
Grouped Relative Frequency Distribution

Relative Frequency Distribution of IQ for Two Classes (Classes A and B combined, n = 26)

IQ             Frequency   Percent   Cumulative percent
80 – 89            3         11.5          11.5
90 – 99            5         19.2          30.8
100 – 109          7         26.9          57.7
110 – 119          4         15.4          73.1
120 – 129          3         11.5          84.6
130 – 139          2          7.7          92.3
140 – 149          1          3.8          96.2
150 and over       1          3.8         100
Total             26        100
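A grouped frequency table is just a tally followed by two divisions. The Python sketch below assumes the 26 combined scores listed for Classes A and B, 10-point classes starting at 80, and one open-ended class for 150 and over; `grouped_frequency` is a hypothetical helper name, not a library function.

```python
from collections import Counter

# Scores for Classes A and B as listed in this lecture (26 students combined).
class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
class_b = [127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105]


def grouped_frequency(values, start=80, width=10, open_top=150):
    """Return (label, frequency, percent, cumulative percent) rows.

    Values are grouped into classes of `width` points starting at `start`;
    every value >= `open_top` falls into one open-ended top class.
    Empty classes are omitted.
    """
    counts = Counter()
    for v in values:
        if v >= open_top:
            lower = open_top
        else:
            lower = start + (v - start) // width * width
        counts[lower] += 1

    n = len(values)
    rows, running = [], 0
    for lower in sorted(counts):
        running += counts[lower]
        if lower == open_top:
            label = f"{lower} and over"
        else:
            label = f"{lower} - {lower + width - 1}"
        rows.append((label, counts[lower], 100 * counts[lower] / n, 100 * running / n))
    return rows


for label, freq, pct, cum in grouped_frequency(class_a + class_b):
    print(f"{label:<13} {freq:>3} {pct:6.1f} {cum:6.1f}")
```

Changing `width` or `start` regroups the same data, which is a quick way to see how the choice of class intervals changes the picture.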
Histogram
[Figure: histogram]

Bar Graph
[Figure: bar graph]

Stem and Leaf Plot
[Figure: stem-and-leaf plot with columns "Stem" and "Leaf"]
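A stem-and-leaf plot keeps every raw value while still showing the shape of the distribution. The Python sketch below assumes the common convention of using all digits but the last as the stem and the units digit as the leaf, applied to the combined Class A and B scores; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
class_b = [127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105]


def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(class_a + class_b)
# Print every stem in the range, including empty ones, so gaps stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>3} | {leaves}")
```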
2. Numeric Summarizing Data:

• Central Tendency (measure of location)

• Mean
• Median
• Mode

• Variation (measure of spread)

• Range
• Interquartile Range
• Variance
• Standard Deviation
Mean

Most commonly called the “average.”

Symbol: $\bar{Y}$, read as “Y bar”
To get the mean, add up the values for each case and divide by the total number of cases.
The formula for the mean:

$\bar{Y} = \frac{Y_1 + Y_2 + \dots + Y_n}{n} = \frac{\sum_{i=1}^{n} Y_i}{n}$
Mean

Some symbolic conventions in this class:

• Y = your variable (could also be X, Q, or another letter)

• “bar” or line over symbol of your variable = mean of that variable
• Y1 = first case’s value on variable Y
• “. . .” = ellipsis = continue sequentially
• Yn = last case’s value on variable Y
• n = number of cases in your sample
• Σ = Greek letter “sigma” = sum or add up what follows
• i = a typical case or each case in the sample (1 through n)
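Mean

Class A: 102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97
$\bar{Y}_A = \frac{1437}{13} = 110.54$

Class B: 127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105
$\bar{Y}_B = \frac{1433}{13} = 110.23$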
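The mean formula translates directly into code. This Python sketch assumes the Class A and Class B scores as listed above; `mean` is a hypothetical helper written out by hand to mirror the formula (the standard library's `statistics.mean` gives the same result).

```python
# Class A and Class B IQ scores as listed in this lecture.
class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
class_b = [127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105]


def mean(values):
    """Y-bar: add up the value for each case, then divide by n, the number of cases."""
    return sum(values) / len(values)


print(sum(class_a), round(mean(class_a), 2))  # 1437 110.54
print(sum(class_b), round(mean(class_b), 2))  # 1433 110.23
```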
Mean

The mean is the “balance point.”

Each person’s score is like 1 kg placed at the score’s position on a see-saw. Below, three scores (93, 106, and 131) sit on a 200 cm see-saw; their mean, 110, is the place on the see-saw where a fulcrum finds balance:

• 1 kg at 93 cm: 17 units below the fulcrum at 110 cm
• 1 kg at 106 cm: 4 units below the fulcrum
• 1 kg at 131 cm: 21 units above the fulcrum

The scale is balanced because…
17 + 4 on the left = 21 on the right
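The balance-point idea can be checked numerically: deviations from the mean always sum to zero, apart from floating-point rounding. The sketch below assumes the three see-saw scores and the Class A scores; `deviations` is a hypothetical helper name.

```python
class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
seesaw = [93, 106, 131]  # the three 1 kg weights on the see-saw


def deviations(values):
    """Return each value minus the mean (Y_i - Y-bar)."""
    y_bar = sum(values) / len(values)
    return [y - y_bar for y in values]


print(deviations(seesaw))       # [-17.0, -4.0, 21.0]
print(sum(deviations(seesaw)))  # 0.0
# For Class A the sum is zero apart from tiny floating-point error.
print(abs(sum(deviations(class_a))) < 1e-9)  # True
```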
Mean

• Means can be badly affected by outliers (data points with extreme values
unlike the rest)
• Outliers can make the mean a bad measure of central tendency or common
experience

[Figure: Income in the U.S., showing the mean income of all Americans, with Bill Gates as an outlier]
Median
The middle value when a variable’s values are ranked in order; the point that
divides a distribution into two equal halves.
When data are listed in order, the median is the point at which 50% of the cases
are above and 50% below it.
The 50th percentile.

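The formula for the median, where $Y_{(k)}$ is the k-th value in ranked order:

If n is odd, median = $Y_{((n+1)/2)}$

If n is even, median = $\frac{Y_{(n/2)} + Y_{(n/2+1)}}{2}$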
Median

IQ score in Class A (13 students = odd n)

Using the formula for odd n, the median is the value at position (13 + 1)/2 = 7 in ranked order:

89 93 97 98 102 106 109 110 115 119 128 131 140

Median = 109
(six cases on the left, six on the right)
Median

If the first student (IQ 102) were to drop out of Class A, there would be a new median (even n = 12):

89 93 97 98 106 109 110 115 119 128 131 140

Median = (109 + 110) / 2 = 109.5
(six cases on the left, six cases on the right)
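The two median rules (odd and even n) can be sketched in Python as below, assuming the Class A scores listed earlier; `median` is a hand-written hypothetical helper (the standard library's `statistics.median` applies the same rules).

```python
class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]


def median(values):
    """Middle value of the ranked data (average of the two middle values when n is even)."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]  # position (n + 1) / 2, counting from 1
    return (ranked[middle - 1] + ranked[middle]) / 2  # positions n/2 and n/2 + 1


print(median(class_a))      # 109
print(median(class_a[1:]))  # 109.5 (first student, IQ 102, dropped)
```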
Median

• The median is barely affected by outliers, making it a better measure of central tendency than the mean when data are skewed: it better describes the “typical person.”

[Figure: income of all Americans, with Bill Gates as an outlier]
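To see why the median resists outliers while the mean does not, the sketch below adds one made-up extreme score (250, purely hypothetical) to Class A and compares the two measures using Python's standard `statistics` module.

```python
import statistics

class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
with_outlier = class_a + [250]  # 250 is a hypothetical extreme score, not real data

for label, data in [("Class A", class_a), ("Class A + outlier", with_outlier)]:
    print(label, round(statistics.mean(data), 2), statistics.median(data))
# Class A 110.54 109
# Class A + outlier 120.5 109.5
```

One extreme case moves the mean by about 10 IQ points but the median by only half a point.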
Median

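• If the recorded values for a variable form a symmetric distribution, the median and mean are identical.
• In skewed data, the mean lies further toward the skew (the long tail) than the median.

[Figure: a symmetric distribution, where the mean and median coincide, and a skewed distribution, where the mean lies further toward the tail than the median]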
Mode

The most common data point is called the mode.

The combined IQ scores for Classes A & B:
80 87 89 93 93 96 97 98 102 103 105 106 109 109 109 110 111 115 119 120 127 128 131 131 140 162
The mode is 109 (it appears three times)!

BTW, it is possible to have more than one mode!
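Python's standard `statistics` module (Python 3.8+ for `multimode`) can find the mode, and `multimode` also reports ties. The sketch below assumes the combined Class A and B scores; the small bimodal list at the end is made up purely for illustration.

```python
import statistics

class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
class_b = [127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105]
combined = class_a + class_b

print(statistics.mode(combined))       # 109 (appears three times)
print(statistics.multimode(combined))  # [109]

# A small made-up data set with two modes ("bimodal").
print(statistics.multimode([1, 1, 2, 3, 3]))  # [1, 3]
```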

Mode

It may not be at the center of a distribution.

[Figure: a “bimodal” distribution, with two modes]
Mode

It may give you the most likely experience rather than the “typical” or “central” experience.
Range

The spread, or the distance, between the lowest and highest values of a
variable.
To get the range for a variable, you subtract its lowest value from its highest
value.
Class A: 102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97
Class B: 127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105

Class A Range = 140 - 89 = 51
Class B Range = 162 - 80 = 82
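The range is a one-line computation. In this Python sketch, assuming the Class A and B scores above, the hypothetical helper is named `value_range` so that it does not shadow Python's built-in `range`.

```python
class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
class_b = [127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105]


def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


print(value_range(class_a))  # 51
print(value_range(class_b))  # 82
```

Because it uses only the two most extreme values, the range is as sensitive to outliers as any statistic can be.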

Interquartile Range (IQR)

• A quartile is the value that marks one of the divisions that breaks a series of values into four equal parts.
• The median is the second quartile (the 50th percentile) and divides the cases in half.
• The 25th percentile is the quartile that divides the first ¼ of cases from the latter ¾.
• The 75th percentile is the quartile that divides the first ¾ of cases from the latter ¼.
• The interquartile range is the distance between the 25th percentile and the 75th percentile: IQR = 75th percentile − 25th percentile.
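Unlike the median, quartiles have no single agreed formula: software packages interpolate between ranked values in different ways, so the IQR can differ slightly from one package to another. The Python sketch below uses the standard library's `statistics.quantiles` (Python 3.8+) on the Class A scores and shows both of its built-in methods.

```python
import statistics

class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]

for method in ("exclusive", "inclusive"):
    # n=4 asks for the three cut points that split the data into quarters.
    q1, q2, q3 = statistics.quantiles(class_a, n=4, method=method)
    print(method, q1, q2, q3, "IQR =", q3 - q1)
# exclusive 97.5 109.0 123.5 IQR = 26.0
# inclusive 98.0 109.0 119.0 IQR = 21.0
```

Both methods agree on the median (109) but not on the outer quartiles, which is worth remembering when comparing an IQR computed by hand with one reported by software.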
Variance

A measure of the spread of the recorded values on a variable.

A measure of dispersion.
The larger the variance, the further the individual cases are from the mean.

The smaller the variance, the closer the individual scores are to the mean.
Variance

• Deviation: the distance of each case from the mean, $Y_i - \bar{Y}$
Variance

Class A (mean = 110.54): 102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97

The deviation of 102 from 110.54 is?
102 - 110.54 = -8.54

Deviation of 115?
115 - 110.54 = 4.46
Example: Deviations of IQ for Class A

i     Yi     Deviation
1     102     -8.54
2     128     17.46
3     131     20.46
4      98    -12.54
5     140     29.46
6      93    -17.54
7     110     -0.54
8     115      4.46
9     109     -1.54
10     89    -21.54
11    106     -4.54
12    119      8.46
13     97    -13.54
Variance

• We want to add these to get total deviations, but if we were to do that, we would get zero every time. Why? (Recall the balance point: the deviations below the mean exactly cancel those above it.)
• We need a way to eliminate negative signs.
• Squaring the deviations will eliminate the negative signs…
Back to the IQ example,

A deviation squared for 102 is:

$(102 - 110.54)^2 = (-8.54)^2 = 72.93$

and a deviation squared of 115:

$(115 - 110.54)^2 = (4.46)^2 = 19.89$
Variance

If you were to add all the squared deviations together, you’d get what we call the “Sum of Squares” (SS).
The formula for the sum of squares (SS):

$SS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
Example: Sum of squares of IQ for Class A

i     Yi     Deviation    Deviation squared
1     102     -8.54          72.9316
2     128     17.46         304.8516
3     131     20.46         418.6116
4      98    -12.54         157.2516
5     140     29.46         867.8916
6      93    -17.54         307.6516
7     110     -0.54           0.2916
8     115      4.46          19.8916
9     109     -1.54           2.3716
10     89    -21.54         463.9716
11    106     -4.54          20.6116
12    119      8.46          71.5716
13     97    -13.54         183.3316
Total                      2891.2308
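The sum of squares can be computed straight from its formula. This Python sketch assumes the Class A scores and uses the exact mean (1437/13) rather than the rounded 110.54; to two decimal places the result matches the table total.

```python
class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]

y_bar = sum(class_a) / len(class_a)                        # exact mean, 1437 / 13
squared_deviations = [(y - y_bar) ** 2 for y in class_a]   # (Y_i - Y-bar)^2
ss = sum(squared_deviations)                               # sum of squares

print(round(y_bar, 2), round(ss, 2))  # 110.54 2891.23
```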
Variance

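The last step…

Dividing the sum of squares by the degrees of freedom gives an approximate average of the squared deviations: the “VARIANCE”

In general,

$s^2 = \frac{SS}{df} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$

where the degrees of freedom (df) = number of cases in the sample − 1 = n − 1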
Variance

For Class A, Variance = SS / (n - 1)

= 2891.23 / 12 = 240.94

How helpful is that???

Variance is in squared units (IQ points²), so it does not directly show the size of the variation in the original units of the dataset.
Standard Deviation

To convert variance into something meaningful, let’s create the standard deviation.

The square root of the variance puts the spread back into the original units of the data:

$s = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$

In other words, the standard deviation is roughly the typical distance of an observation from its mean (strictly, it is the square root of the approximate average squared deviation, rather than the plain average distance).
Standard Deviation

For Class A: $s = \sqrt{240.94} \approx 15.52$ IQ points
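Variance and standard deviation follow from the sum of squares in two more steps. The Python sketch below assumes the Class A scores and the n − 1 (sample) definition used in this lecture, and cross-checks against the standard library's `statistics.variance` and `statistics.stdev`, which use the same n − 1 divisor.

```python
import math
import statistics

class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]

n = len(class_a)
y_bar = sum(class_a) / n
ss = sum((y - y_bar) ** 2 for y in class_a)

variance = ss / (n - 1)   # divide by the degrees of freedom, n - 1
sd = math.sqrt(variance)  # back to IQ points

print(round(variance, 2), round(sd, 2))  # 240.94 15.52

# The standard library's sample functions agree.
print(math.isclose(variance, statistics.variance(class_a)))  # True
print(math.isclose(sd, statistics.stdev(class_a)))           # True
```

If the data were a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` (divisor n) would be the matching functions.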
Standard Deviation

• Larger s.d. = greater amounts of variation around the mean.

For example:
[Figure: two distributions, both with mean = 3; s.d. = 1 on the left and s.d. = 0.5 on the right]

• s.d. = 0 only when all values are the same (only when you have a constant and not a “variable”)
• If you were to “rescale” a variable, the s.d. would change by the same factor: if we changed units so the mean equaled 30, the s.d. on the left would be 10, and on the right, 5.
• Like the mean, the s.d. will be inflated by an outlier case value.
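The rescaling rule can be checked with two tiny made-up data sets, chosen only so that their means and s.d.s match the example (mean 3 with s.d. 1, and mean 3 with s.d. 0.5). Multiplying every value by 10 multiplies both the mean and the s.d. by 10.

```python
import statistics

left = [2, 3, 4]       # hypothetical data: mean 3, s.d. 1
right = [2.5, 3, 3.5]  # hypothetical data: mean 3, s.d. 0.5

for name, data in [("left", left), ("right", right)]:
    rescaled = [10 * y for y in data]  # change of units: the mean becomes 30
    print(name,
          statistics.mean(data), statistics.stdev(data),
          statistics.mean(rescaled), statistics.stdev(rescaled))
```

Adding a constant to every value, by contrast, shifts the mean but leaves the s.d. unchanged.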
Mean and Variance
Review

1. Deviation
2. Deviation squared
3. Sum of squares
4. Variance
5. Standard deviation
2. Numeric Summarizing Data:

• Central Tendency (measure of location)

• Mean
• Median
• Mode

• Variation (measure of spread)

• Range
• Interquartile Range
• Variance
• Standard Deviation

• …Wait! There’s one more!!

Box-plots

A way to graphically portray almost all descriptive statistics at once is the box-plot
(shows the location and spread).
A box-plot shows:
• Upper and lower quartiles
• Mean
• Median
• Range
• Outliers (values more than 1.5 × IQR below the lower quartile or above the upper quartile)
Box-plots

[Figure: box-plot of IQ (vertical axis 80 to 180) showing the minimum 82, lower quartile 96.5, median 106.5, mean 110.5, upper quartile 123.5, maximum 162, and the whiskers]

Interquartile Range (IQR) = 123.5 – 96.5 = 27

The max and min values of the data are 162 and 82, respectively.
• The lower limit for the bottom ‘whisker’:
= Value at 25th percentile − (1.5 × IQR)
= 96.5 − (1.5 × 27)
= 96.5 − 40.5
= 56 (the lowest value the bottom whisker can reach)
• The upper limit for the top ‘whisker’:
= Value at 75th percentile + (1.5 × IQR)
= 123.5 + (1.5 × 27)
= 123.5 + 40.5
= 164 (the highest value the top whisker can reach)
• The lower and upper limits are 56 and 164, respectively, so any value smaller than 56 or greater than 164 is considered an outlier.
• The max value of our data is 162 (which is < 164), thus there is no outlier at the top of the dataset. The same goes for the minimum value, 82 (which is > 56).
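The fence arithmetic above can be wrapped in two small functions. This Python sketch assumes the quartiles read from the box-plot (96.5 and 123.5) and the conventional multiplier k = 1.5; `outlier_fences` and `find_outliers` are hypothetical names, and the quartiles themselves will vary slightly with the quartile method a package uses.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = outlier_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Quartiles read from the box-plot; 82 and 162 are the min and max.
print(outlier_fences(96.5, 123.5))            # (56.0, 164.0)
print(find_outliers([82, 162], 96.5, 123.5))  # []
```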
Distribution Shape and Box-plot
