Module 2 - Exploratory Data Analysis (EDA) : Central Tendency and Variability

Uploaded by

teganalexis

Module 2 - Exploratory Data Analysis (EDA)
Central Tendency and Variability

Text: Field, A. 2009 2nd edition
-Chapter 1: 1.7
-Chapter 2: 2.1 – 2.5
-Chapter 4: 4.1 – 4.9
Describing a Population/Sample
• Statistics is the study of data which has some
element of random variation - random variable.

• This variation in the variable under study can be

conceptualised as a frequency or probability
distribution.

• An example - Distribution of a normal random variable (x)
[Figure: distribution curve of a normal random variable, horizontal axis x]
• The properties of this distribution can be described in
several ways - Central tendency, Position, Variability
Describing a Population/Sample
• Central Tendency or “Average”
– Mode
– Median
– Mean
• Position
– Quantiles
– Quartiles
– Percentiles
• Variability or Dispersion
– Range, Interquartile Range (IQR)
– Variance, Standard Deviation
– Standard Error of the Sample Mean
[Figure: histogram of height (x axis: height, 16 to 32; y axis: Frequency); Mean = 23.03, Std. Dev. = 2.7412, N = 50]
Working With an Example

Note that for the following definitions, we will be working with the following data set (n=23) of individual weights (kg)

73 78.5 73 65.5 71.5
93 83 75.6 39 76
68.5 80 61 98 74.5
101 80.5 86.5 69.5
65.5 87 61.5 52.5
Central Tendency - Mode
• The mode is the most common value
• It has the highest frequency in the dataset
• You can see that the example dataset has two modes: 65.5kg and 73kg both have a frequency of 2
• This dataset is bimodal

Value  Frequency
39     1
52.5   1
61     1
61.5   1
65.5   2
68.5   1
69.5   1
71.5   1
73     2
74.5   1
75.6   1
76     1
78.5   1
80     1
80.5   1
83     1
86.5   1
87     1
93     1
98     1
101    1
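The frequency table above can be reproduced with a short Python sketch. The slides use SPSS, so the variable names and the choice of the standard-library `collections.Counter` and `statistics.multimode` are illustrative assumptions, not part of the course material.

```python
from collections import Counter
from statistics import multimode

# Example weights (kg) from the slides, n = 23
weights = [73, 78.5, 73, 65.5, 71.5, 93, 83, 75.6, 39, 76,
           68.5, 80, 61, 98, 74.5, 101, 80.5, 86.5, 69.5,
           65.5, 87, 61.5, 52.5]

# Frequency of every distinct value (the Value / Frequency table)
frequencies = Counter(weights)

# multimode returns every value that shares the highest frequency,
# so a bimodal dataset yields two values
modes = sorted(multimode(weights))

for value in sorted(frequencies):
    print(value, frequencies[value])
print("Modes:", modes)
```

Using `multimode` rather than `mode` matters here: `mode` would return only one of the two tied values.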
Central Tendency - Median
• The median is the middle value in an ordered list of n numbers
• 50% of the data lie on either side of this value
• It is also represented as Q2 (2nd Quartile)
• The position of Q2 can be calculated by using the following:
$$\text{position of } Q_2 = \frac{n + 1}{2}$$
Calculating the Median
In our example the dataset contains 23 numbers:
$$\frac{n + 1}{2} = \frac{23 + 1}{2} = 12$$
Therefore the 12th number in the ascending data set will be the median (Q2 = 74.5kg)

Order  Number
1      39
2      52.5
3      61
4      61.5
5      65.5
6      65.5
7      68.5
8      69.5
9      71.5
10     73
11     73
12     74.5   (Q2, median)
13     75.6
14     76
15     78.5
16     80
17     80.5
18     83
19     86.5
20     87
21     93
22     98
23     101
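A minimal Python sketch of the same calculation, assuming the (n + 1) / 2 position rule from the slide; the variable names are illustrative. The standard-library `statistics.median` is used only as a cross-check.

```python
from statistics import median

# Example weights (kg) from the slides, n = 23
weights = [73, 78.5, 73, 65.5, 71.5, 93, 83, 75.6, 39, 76,
           68.5, 80, 61, 98, 74.5, 101, 80.5, 86.5, 69.5,
           65.5, 87, 61.5, 52.5]

ordered = sorted(weights)        # ascending order, as in the Order / Number table
n = len(ordered)
position = (n + 1) / 2           # 1-based position of Q2

# n is odd here, so the position is a whole number and we can index directly
q2 = ordered[int(position) - 1]  # convert the 1-based position to a 0-based index

print(position, q2, median(weights))
```

For an even n the position falls halfway between two order numbers, and the median is conventionally taken as the average of those two values.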
Central Tendency - Mean
• Sample mean
– Represented by $\bar{x}$
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
• Population mean
– Represented by $\mu$
• Note that $\sum_{i=1}^{n}$ means
– sum all values from 1 to n
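The summation in the sample mean formula maps directly onto a sum over the data. The sketch below applies it to the example weights; the variable names are illustrative and not taken from the slides.

```python
# Example weights (kg) from the slides, n = 23
weights = [73, 78.5, 73, 65.5, 71.5, 93, 83, 75.6, 39, 76,
           68.5, 80, 61, 98, 74.5, 101, 80.5, 86.5, 69.5,
           65.5, 87, 61.5, 52.5]

n = len(weights)
total = sum(weights)   # sum of x_i for i = 1 .. n
mean = total / n       # sample mean, x-bar

print(f"sum = {total:.1f} kg, n = {n}, mean = {mean:.1f} kg")
```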
Calculating the Mean
• The summation of all of our data values
$$\sum_{i=1}^{23} x_i = 1714.1 \text{ kg}$$
• Divided by the number of values (n = 23)
• So the mean is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1714.1}{23} \approx 74.5 \text{ kg}$$
Position
• Quantiles
– General name for measures of position that
divide the distribution (or ranked data) into
equal groups. For example, quarters, tenths,
hundredths, etc.
• Quartiles
– Measures of position that divide the
distribution (or ranked data) into Quarters.
• Percentiles
– Measures of position that divide the
distribution (or ranked data) into 100 equal
subsets
Central Tendency vs. Variability
• The mean, median, and mode all tell us about
the central tendency of a distribution.

• They cannot tell us about the spread of the

distribution (variability).
Variability - Range

• The Range of the distribution of data is

given by the difference between the
maximum value and the minimum value
Range = Max - Min

• A measurement of variability that usually

accompanies the Median.
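As a quick check of Range = Max - Min on the example weights, here is a small Python sketch (variable names are illustrative):

```python
# Example weights (kg) from the slides, n = 23
weights = [73, 78.5, 73, 65.5, 71.5, 93, 83, 75.6, 39, 76,
           68.5, 80, 61, 98, 74.5, 101, 80.5, 86.5, 69.5,
           65.5, 87, 61.5, 52.5]

# Range = Max - Min
data_range = max(weights) - min(weights)
print(max(weights), min(weights), data_range)
```

Because the range depends only on the two most extreme values, a single unusual observation (such as the 39 kg weight) changes it considerably.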
Variability - Interquartile Range
• Quartiles are the three points (Q1, Q2, Q3)
in the distribution defining four equal
quarters.
• The quartiles cut the data distribution into
four sections each containing 25% of the
data.
[Figure: distribution divided at Q1, Q2 and Q3 into four sections, each containing 25% of the data]
Variability - Interquartile Range
• The Interquartile Range (IQR) is represented by
the difference between the lower quartile (Q1)
and the upper quartile (Q3)
• These quartile positions can be calculated via
$$\frac{n + 1}{4} \text{ for } Q_1 \qquad \frac{3(n + 1)}{4} \text{ for } Q_3$$
• The IQR can then be calculated using the value
at these positions
• A measurement of variability that usually
accompanies the Median.
Calculating the Interquartile Range
$$\frac{n + 1}{4} = \frac{23 + 1}{4} = 6 \text{ for } Q_1$$
The 6th number: Q1 is therefore 65.5kg.
$$\frac{3(n + 1)}{4} = \frac{3(23 + 1)}{4} = 18 \text{ for } Q_3$$
The 18th number: Q3 is therefore 83.0kg.

IQR = Q3 - Q1
= 83.0 - 65.5
= 17.5

Order  Number
1      39
2      52.5
3      61
4      61.5
5      65.5
6      65.5   (Q1)
7      68.5
8      69.5
9      71.5
10     73
11     73
12     74.5   (Q2 or Median)
13     75.6
14     76
15     78.5
16     80
17     80.5
18     83     (Q3)
19     86.5
20     87
21     93
22     98
23     101
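The quartile positions and the IQR can be checked with a short Python sketch. The helper `value_at_position` is a hypothetical name, and its linear interpolation for fractional positions is an assumption that goes beyond the slides (the example here only produces whole-number positions). Other software may use different quartile rules and give slightly different values.

```python
# Example weights (kg) from the slides, n = 23
weights = [73, 78.5, 73, 65.5, 71.5, 93, 83, 75.6, 39, 76,
           68.5, 80, 61, 98, 74.5, 101, 80.5, 86.5, 69.5,
           65.5, 87, 61.5, 52.5]


def value_at_position(ordered, position):
    """Return the value at a 1-based position in sorted data.

    Fractional positions are linearly interpolated between the two
    neighbouring order numbers (an assumption, not from the slides).
    """
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


ordered = sorted(weights)
n = len(ordered)

q1 = value_at_position(ordered, (n + 1) / 4)      # position 6
q3 = value_at_position(ordered, 3 * (n + 1) / 4)  # position 18
iqr = q3 - q1

print(q1, q3, iqr)
```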
Variability Around the Mean
Variation around the mean can be described as the difference (or distance) between the data point and the mean
[Figure: data points plotted around a horizontal line marking the Mean; vertical axis from 30 to 80]
Variability Around the Mean
We cannot simply add up the differences between each number and the mean, because the sum of these differences will be zero - the positive differences will cancel out the negative differences

• If we square the differences then we will always get a positive number
– the total of these squared differences is known as the sum of squares (SS)
– this can be represented by the following equation
$$SS = \sum (x - \bar{x})^2$$
– Where;
$\bar{x}$ represents the mean
$x$ represents each individual number

Number  Mean      Number - Mean   Difference Squared
73      74.52609  -1.526086957    2.328941
93      74.52609  18.47391304     341.2855
68.5    74.52609  -6.026086957    36.31372
101     74.52609  26.47391304     700.8681
65.5    74.52609  -9.026086957    81.47025
78.5    74.52609  3.973913043     15.79198
83      74.52609  8.473913043     71.8072
80      74.52609  5.473913043     29.96372
80.5    74.52609  5.973913043     35.68764
87      74.52609  12.47391304     155.5985
73      74.52609  -1.526086957    2.328941
75.6    74.52609  1.073913043     1.153289
61      74.52609  -13.52608696    182.955
86.5    74.52609  11.97391304     143.3746
61.5    74.52609  -13.02608696    169.6789
65.5    74.52609  -9.026086957    81.47025
39      74.52609  -35.52608696    1262.103
98      74.52609  23.47391304     551.0246
69.5    74.52609  -5.026086957    25.26155
52.5    74.52609  -22.02608696    485.1485
71.5    74.52609  -3.026086957    9.157202
76      74.52609  1.473913043     2.17242
74.5    74.52609  -0.026086957    0.000681
Total             0               4386.944
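The cancelling of the raw differences, and the sum of squares that avoids it, can both be seen in a few lines of Python. This is a sketch on the example weights with illustrative variable names.

```python
# Example weights (kg) from the slides, n = 23
weights = [73, 78.5, 73, 65.5, 71.5, 93, 83, 75.6, 39, 76,
           68.5, 80, 61, 98, 74.5, 101, 80.5, 86.5, 69.5,
           65.5, 87, 61.5, 52.5]

mean = sum(weights) / len(weights)

# Difference between each number and the mean
deviations = [x - mean for x in weights]

# The raw differences cancel out (zero, up to floating-point rounding)
sum_of_deviations = sum(deviations)

# Squaring removes the signs; the total is the sum of squares (SS)
sum_of_squares = sum(d ** 2 for d in deviations)

print(f"sum of differences = {sum_of_deviations:.6f}")
print(f"sum of squares     = {sum_of_squares:.3f}")
```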
Variability Around the Mean

• Although useful in some calculations, the sum of squares does not take into account the number of observations (is dependent on sample size).
• There are some important ways that the spread of the data around the mean can be represented (based on sum of squares).
– The Variance ($s^2$).
– The Standard Deviation ($s$).
– The Standard Error of the Sample Mean (S.E. or $s_{\bar{x}}$).
Variability - Sample Variance
• The Variance uses the Sum of Squares adjusted for the number of “independent” observations in the sample: the “average” variation
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
• We can use the Sum of Squares calculated in the previous slide:
$$s^2 = \frac{4386.944}{23 - 1} = 199.4 \text{ kg}^2$$
Notice that we are in squared units
Variability -
Sample Standard Deviation

• The sample’s Standard Deviation is the square root of the Variance:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
$$s = \sqrt{199.4} = 14.12 \text{ kg}$$
Notice that we are now back in our original units
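The variance and standard deviation can be checked in Python. The sketch below computes both from the sum of squares with the n - 1 divisor, then compares them with the standard-library `statistics.variance` and `statistics.stdev`, which use the same divisor. Variable names are illustrative.

```python
import math
from statistics import stdev, variance

# Example weights (kg) from the slides, n = 23
weights = [73, 78.5, 73, 65.5, 71.5, 93, 83, 75.6, 39, 76,
           68.5, 80, 61, 98, 74.5, 101, 80.5, 86.5, 69.5,
           65.5, 87, 61.5, 52.5]

n = len(weights)
mean = sum(weights) / n
sum_of_squares = sum((x - mean) ** 2 for x in weights)

sample_variance = sum_of_squares / (n - 1)  # s^2, in kg^2
sample_sd = math.sqrt(sample_variance)      # s, back in kg

print(f"s^2 = {sample_variance:.1f} kg^2, s = {sample_sd:.2f} kg")
# Library cross-check (also divides by n - 1)
print(f"{variance(weights):.1f} {stdev(weights):.2f}")
```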
The Standard Error
of the Sample Mean
• The Std. Dev. divided by the square root of n is called the Standard Error of the sample mean - we will encounter this measure later on in the course.
$$s_{\bar{x}} = \sqrt{\frac{s^2}{n}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{199.4}{23}} = \frac{14.12}{\sqrt{23}} = 2.94$$
Sample VS Population
Sample                          Population
$\bar{x}$ = sample mean         $\mu$ = population mean
$s$ = sample std dev            $\sigma$ = population std dev
$s^2$ = sample variance         $\sigma^2$ = population var.
$n$ = sample size               $N$ = population size

Sample Only
$s_{\bar{x}}$ = Standard error of the sample mean (S.E.)
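The standard error of the sample mean for the example weights can be checked with a minimal Python sketch. It assumes the sample standard deviation with the n - 1 divisor, which is what `statistics.stdev` computes; variable names are illustrative.

```python
import math
from statistics import stdev

# Example weights (kg) from the slides, n = 23
weights = [73, 78.5, 73, 65.5, 71.5, 93, 83, 75.6, 39, 76,
           68.5, 80, 61, 98, 74.5, 101, 80.5, 86.5, 69.5,
           65.5, 87, 61.5, 52.5]

n = len(weights)
s = stdev(weights)                # sample standard deviation (n - 1 divisor)
standard_error = s / math.sqrt(n) # standard error of the sample mean

print(f"s = {s:.2f} kg, S.E. = {standard_error:.2f} kg")
```

Unlike the standard deviation, the standard error shrinks as the sample size grows, because of the division by the square root of n.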
Module 2 - Exploratory Data
Analysis (EDA)

Graphical Methods

Text: Field, A. 2009 2nd edition

-Chapter 1: 1.7
-Chapter 2: 2.1 – 2.5
-Chapter 4: 4.1 – 4.9
Graphical Methods & SPSS
• Graphical methods are a good way of
summarising information and are useful to
visualise patterns within your data.
• Various methods can be used depending on the
measurement scale of the variables.

• SPSS is the statistical package that you will

be using this semester and has a similar
spreadsheet format to Microsoft Excel.
• Generally, when entering data into SPSS,
each column contains a different variable.
Graphs for Discrete Variables
• Measurement scale - nominal or ordinal
– Other terms -categorical, binned, class, qualitative
– Examples - gender, age group, trap type

• Common graphical methods are:

– Pie charts for proportions, percentages, or values
that sum to a fixed value
– Bar charts for most other discrete variables

• Data can be entered into SPSS in two forms

– Each case (row) represents a single observation
– Each case (row) represents the count, percentage,
or proportion of each level of the discrete variable
Data Entry for Discrete Variables
Data entry type 1 :-
Can create charts directly
using this type of data
Data entry type 2:-
First tell SPSS that each
discrete level has been counted
An Example - Mass (%) of Each
Element Within a Star
• The data is entered into SPSS as in data entry type 2
• You must then tell SPSS to weight each observation (case) by the variable “mass”
• You will need to do this for a pie chart and for a bar graph
Making a Pie
Chart in SPSS
The Pie Chart

[Figure: pie chart of element mass with slices Hydrogen, Helium and Other; cases weighted by MASS]

Making a Bar Chart in SPSS
Simple Bar Chart
One variable with three categories

[Figure: bar chart of Count by Element (Hydrogen, Helium, Other); cases weighted by MASS]
Clustered Bar Chart
Two variables with two categories each

[Figure: clustered bar chart of Count (0 to 500) by Smoking Status (Smoker, Non Smoker), clustered by Cancer Status (Cancer, No Cancer); cases weighted by freq]
Graphs for Continuous Variables
• Measurement scale - Scale
– Other terms - quantitative
– Examples - Length, Temperature, Species Richness

• Common graphical methods are:

– For a single sample - Histograms, Box and Whisker
plots, Error Bar plots, Q-Q plots.
– For 2 or more samples - Clustered Box and
Whisker plots, Clustered Error Bar plots.
– For 2 scale variables - Scatter plots.
An Example - Plant Heights
We will be using the following data set of
plant heights (cm) to construct a histogram.

21 24.5 20 23.5 24.5

20 26 21 24 25
21.5 23.5 21 20 28
23 24.5 22.5 21 28
21 25 21.5 22 26
21.5 26.5 22.5 21.5 25
24 21.5 23 16.5 29
25.5 23 25 19 31
20.5 22.5 23 19 21.5
24 23.5 23 19.5 22.5
Histogram
To create a histogram by hand, we need to create a series of “bins” or categories.
– The data ranges from 16.5 to 31.0.
– We can use the following groups to classify the data.

You can see that the ‘bins’ have been organised so that each datum belongs to a unique group

Bin         Tally   Frequency
16 – 17.9
18 – 19.9
20 – 21.9
22 – 23.9
24 – 25.9
26 – 27.9
28 – 29.9
30 – 31.9
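The tally can also be done programmatically. The Python sketch below assigns each plant height to one of the eight bins above. The bin width of 2 and the starting edge of 16 are read off the bin labels; the variable names and the text tally marks are illustrative.

```python
# Plant heights (cm) from the slides, n = 50
heights = [21, 24.5, 20, 23.5, 24.5, 20, 26, 21, 24, 25,
           21.5, 23.5, 21, 20, 28, 23, 24.5, 22.5, 21, 28,
           21, 25, 21.5, 22, 26, 21.5, 26.5, 22.5, 21.5, 25,
           24, 21.5, 23, 16.5, 29, 25.5, 23, 25, 19, 31,
           20.5, 22.5, 23, 19, 21.5, 24, 23.5, 23, 19.5, 22.5]

first_edge = 16   # lower edge of the first bin
bin_width = 2     # 16 – 17.9, 18 – 19.9, ...
counts = [0] * 8  # one counter per bin

for h in heights:
    # Integer division places each height in exactly one bin
    counts[int((h - first_edge) // bin_width)] += 1

for i, count in enumerate(counts):
    low = first_edge + i * bin_width
    print(f"{low} – {low + 1.9:.1f}  {'|' * count:<16} {count}")
```

Each height falls in exactly one bin because the bin index is computed from the lower edge, which mirrors the requirement that every datum belongs to a unique group.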
Histogram
Histogram of Plant height

[Figure: empty histogram axes; x axis: Height Categories (or Bins), 16 – 17.9 to 30 – 31.9; y axis: Frequency, 0 to 16]
Histogram
Here’s One We Prepared Earlier
Histogram of Plant height

[Figure: completed histogram of plant height; x axis: Height Categories (or Bins); y axis: Frequency, 0 to 16]

Histogram
Using SPSS
• SPSS will create the bins, work out frequencies and create the histogram for you
• The data needs to be entered in a single column
Histogram
Using SPSS
Histogram
Using SPSS
[Figure: SPSS histogram of a single sample, variable height (8 bins); x axis: height, 16 to 32; y axis: Frequency; Mean = 23.03, Std. Dev. = 2.7412, N = 50]

Histogram
Using SPSS
[Figure: SPSS histogram of a single sample, variable height (16 bins); x axis: height, 16 to 32; y axis: Frequency; Mean = 23.03, Std. Dev. = 2.7412, N = 50]

Q-Q Plot
• For a single sample
• Plots the quantiles of a variable's distribution (observed - unknown distribution) against the quantiles of a test distribution (expected - e.g. Normal Dist.).
• The test distribution (expected values) has the same mean and standard deviation as the observed data.
• Available test distributions include Beta, Chi-square, Exponential, Gamma, Logistic, Lognormal, Normal, Student’s t, and Uniform.
Q-Q Plot

• Probability plots are generally used to determine whether the distribution of a variable (observed - unknown distribution) matches a given distribution (expected - e.g. Normal Dist.).
• If the selected variable matches the test distribution, the points line up on a 45° line (observed = expected).
• Note, if using a sample from a population the sample size needs to be reasonably large.
• An alternative is the P-P plot (percentile plot)
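The observed and expected quantiles behind a normal Q-Q plot can be sketched in Python for the plant height data. The plotting positions (i - 0.5) / n used here are an assumption chosen for simplicity; SPSS offers several proportion estimation formulas and its defaults may give slightly different expected values. The variable names are illustrative.

```python
from statistics import NormalDist, mean, stdev

# Plant heights (cm) from the slides, n = 50
heights = [21, 24.5, 20, 23.5, 24.5, 20, 26, 21, 24, 25,
           21.5, 23.5, 21, 20, 28, 23, 24.5, 22.5, 21, 28,
           21, 25, 21.5, 22, 26, 21.5, 26.5, 22.5, 21.5, 25,
           24, 21.5, 23, 16.5, 29, 25.5, 23, 25, 19, 31,
           20.5, 22.5, 23, 19, 21.5, 24, 23.5, 23, 19.5, 22.5]

observed = sorted(heights)  # observed quantiles
n = len(observed)

# Test distribution: normal with the same mean and standard deviation
reference = NormalDist(mean(heights), stdev(heights))

# Expected quantiles at assumed plotting positions (i - 0.5) / n
expected = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each (observed, expected) pair is one point on the Q-Q plot;
# points close to the line observed = expected suggest normality
pairs = list(zip(observed, expected))
for obs, exp in pairs[:5]:
    print(f"observed {obs:5.1f}   expected {exp:6.2f}")
```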

Q-Q Plot
[Figure: Normal Q-Q Plot of HEIGHT; x axis: Observed Value (observed quantiles from our sample of plant heights); y axis: Expected Normal Value (expected quantiles for a normal distribution with the same mean and standard deviation as the observed distribution)]
Box and Whisker Plots
• The Box includes
– The Median
– Q1 and Q3 as the edges of the box

• The Whiskers
– either (method 1) – “5 number summary”
• Max and the Min are the ends of the whiskers
– or (method 2) – default method used in SPSS
• Q3 + 1.5 × IQR and Q1 - 1.5 × IQR are the ends of the whiskers
• Q3 + 3.0 × IQR and Q1 - 3.0 × IQR are the borders between outliers and extreme outliers
• symbols used for outliers (O) and extreme outliers (*)
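The method 2 limits can be applied to the example weights from the first part of the module. The Python sketch below takes Q1 = 65.5 and Q3 = 83.0 from the worked IQR example; SPSS computes its boxplot quartiles with its own rule, so its limits may differ slightly. Variable names are illustrative.

```python
# Example weights (kg) from the slides, n = 23
weights = [73, 78.5, 73, 65.5, 71.5, 93, 83, 75.6, 39, 76,
           68.5, 80, 61, 98, 74.5, 101, 80.5, 86.5, 69.5,
           65.5, 87, 61.5, 52.5]

# Quartiles taken from the worked IQR example (assumed, not recomputed)
q1, q3 = 65.5, 83.0
iqr = q3 - q1

# Limits at 1.5 x IQR separate ordinary values from outliers
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Limits at 3.0 x IQR separate outliers from extreme outliers
lower_extreme = q1 - 3.0 * iqr
upper_extreme = q3 + 3.0 * iqr

extreme_outliers = [x for x in weights
                    if x < lower_extreme or x > upper_extreme]
outliers = [x for x in weights
            if (x < lower_limit or x > upper_limit)
            and x not in extreme_outliers]

print("1.5 x IQR limits:", lower_limit, upper_limit)
print("3.0 x IQR limits:", lower_extreme, upper_extreme)
print("outliers (o):", outliers, " extreme outliers (*):", extreme_outliers)
```

Under these assumptions the 39 kg weight falls just below the lower 1.5 × IQR limit, so it would be drawn as an outlier rather than as the end of the lower whisker.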
Box and Whisker Plot
Method 1 - 5 Number Summary
This type of Box and Whisker Plot is the simplest.
It is based on a five number summary:- Max, Q3, Q2, Q1, Min
[Figure: box and whisker plot labelled Max, Q3, Q2 (Median), Q1 and Min, with the Range and IQR marked]
Box and Whisker Plot
Method 2 - SPSS (Boxplot)
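[Figure: SPSS boxplot labelled, top to bottom: Extreme Outlier (*) beyond Q3 + 3 × IQR; Outliers (o) between Q3 + 1.5 × IQR and Q3 + 3 × IQR; upper whisker at Q3 + 1.5 × IQR (or max); Q3; Q2 (Median); Q1; lower whisker at Q1 - 1.5 × IQR (or min); Outlier (o) above Q1 - 3 × IQR]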
Making a Boxplot in SPSS
SPSS Clustered Boxplot
Note: Outlier present in second site (sample)
[Figure: clustered boxplot of GALLS for several samples (SITES 1 to 5), N = 8 per site]

Error Bar Plot

The Error Bar plot is used to represent

• The mean
• Plus a measure of variation around the mean
– Confidence Interval of the Sample Mean
– The Standard Error of the Sample Mean
– The Standard Deviation of the sample

• The most common form of the Error Bar Plot
– Is the Standard Error Plot
– Mean ± 1 Standard Error of the Sample Mean
Error Bar Plot in
SPSS

Make sure you select the correct measure of variability
The default multiplier is 2 so make sure that you always change it to 1
SPSS Clustered Error Bar Plot

Note: Mean ± 1 S.E.
[Figure: clustered error bar plot, Mean +- 1 SE GALLS for several samples (SITES 1 to 5), N = 8 per site]

Scatter Plot
Two scale variables

[Figure: two scatter plots of Oxygen Concentration (2.00 to 5.00) against Temperature (-20.00 to 20.00); the second includes the line of best fit or linear regression model, R Sq Linear = 0.979]

Scatter Plot
Three scale variables

[Figure: three-dimensional scatter plot with Turbidity as one axis; the other axis names are not recoverable]
