16/10/2022

BFC 34303
CIVIL ENGINEERING STATISTICS
Chapter 1
Descriptive Statistics
Faculty of Civil Engineering and Built Environment
Universiti Tun Hussein Onn Malaysia

What is ‘statistics’?
Statistics is the science that deals with collecting, classifying, presenting,
describing, analysing and interpreting data to enable us to draw
conclusions and make reasonable decisions.

It can be divided into two categories:


a) Descriptive statistics
b) Inferential statistics

Descriptive statistics
The activity of collecting, classifying, presenting and describing
quantitative data.
Methods for organising (frequency table), representing (graphs) and
summarising data (central tendency and variability).

Inferential statistics
The part of statistics dealing with techniques and methods for interpreting
the results obtained from descriptive statistics, so that conclusions about a
population can be drawn from a sample.

Population
Population is the entire (complete) collection of data whose properties are analysed.
It contains all the subjects of interest.
Can be of any size; its items need not be uniform but must share at least one measurable feature.

Sample
A portion of the population selected for study.
A sample is any set of entities, cases, subjects, items or experimental units chosen from the population.

Random Sample
A random sample is a sample selected in such a way that each element
of the population has the same chance of being selected.
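
As a minimal sketch of this idea, Python's standard `random.sample` draws a simple random sample without replacement, so every element has the same chance of being selected. The population of 30 numbered elements, the sample size of 5 and the seed are illustrative assumptions, not values from these notes.

```python
import random

# Hypothetical population: 30 elements labelled 1 to 30 (illustrative only).
population = list(range(1, 31))

# Fixed seed so the sketch is reproducible; drop it for a fresh draw each run.
rng = random.Random(2022)

# Draw 5 distinct elements; each element is equally likely to be chosen.
sample = rng.sample(population, k=5)
print(sample)
```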

Parameter
Parameter is a numerical measurement describing some characteristics
of a population.
Eg. the population mean and variance

Statistic
Statistic is a numerical measurement describing some characteristics of
a sample.
Eg. the sample mean and variance

Variable
Any measured characteristic or attribute that differs for different
elements.
For example, if the weights of 30 people were measured, then weight
would be a variable.
Can be classified as quantitative or qualitative.

Quantitative Variable
The variable being studied is numeric and measured on an ordinal,
interval or ratio scale.
Eg. Ambient temperature, vehicular speed and walking distance.

Qualitative Variable
The variable being studied is non-numeric and measured on a nominal
scale.
Also called ‘categorical’ variable.
Eg. Gender, eye colour and educational level.

Scale: Property (examples)
Ordinal: Data are ranked (position in race, exam grade, clothes size)
Interval: Meaningful differences between values (temperature)
Ratio: Meaningful zero point and ratio between values (distance to class, number of patients seen)
Nominal: Data may only be classified (gender, marital status)

Data
A set of data is a collection of observations, measurements or
information obtained for a study.
It can be classified as qualitative data or quantitative data.

Quantitative Data
• Data that can be measured numerically or counted.
• Can be either continuous data or discrete data.
• Eg. length, time, mass, temperature

Qualitative Data
• Data that are not in numerical form but instead assigned as attributes.
• Eg. race, age, gender, marital status

Discrete Data
• Data that can only take exact and countable values.
• Eg. number of students in a class, number of cars sold in a day, number of persons in a family.

Continuous Data
• Data that take any value over a certain interval and can be measured to a certain degree of accuracy (correct to certain decimal places).
• Eg. weight of students in a class, time taken to complete race, fat content in canned food.

Ungrouped and Grouped Data

Ungrouped Data
Raw data that are not arranged in class intervals.
Where a frequency distribution is used, it lists each individual value in order.
Example:
Weight of seven students: 56, 74, 68, 90, 52, 48, 65
Number of cars owned per household:

No. of cars owned    0    1    2    3
No. of households    6   28   12    4
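
One way to hold the cars-per-household table in code is sketched below. The variable names are illustrative; the frequencies are those in the table. Expanding the table back into raw observations and tallying them with `collections.Counter` shows that the ungrouped frequency distribution and the raw data carry the same information.

```python
from collections import Counter

# Frequency table from the example: number of cars owned -> number of households.
cars_table = {0: 6, 1: 28, 2: 12, 3: 4}

# Total number of households surveyed.
total_households = sum(cars_table.values())

# Expand the table into one raw observation per household.
raw_observations = [cars for cars, count in cars_table.items() for _ in range(count)]

# Tallying the raw observations recovers the same frequency table.
tally = Counter(raw_observations)
print(total_households, dict(tally))
```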

Grouped Data
Data is grouped according to class intervals before the frequency
distribution is assigned.
Example:
Height of students in a class:

Height (cm)        150-159   160-169   170-179   180-189
No. of students          5        11        21         8
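
The grouping step can be sketched as follows. The ten raw heights are hypothetical values chosen only to illustrate the tally (the notes give the finished table, not the raw measurements); the class intervals are the ones used in the example.

```python
# Hypothetical raw heights in cm (illustrative; not the data behind the table).
heights_cm = [152, 158, 161, 165, 169, 170, 174, 177, 179, 183]

# Class intervals from the example, as (lower limit, upper limit).
class_limits = [(150, 159), (160, 169), (170, 179), (180, 189)]

# Start every class at zero, then count each height in the class that contains it.
frequency = {f"{lo}-{hi}": 0 for lo, hi in class_limits}
for height in heights_cm:
    for lo, hi in class_limits:
        if lo <= height <= hi:
            frequency[f"{lo}-{hi}"] += 1
            break

print(frequency)
```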

Measures of Location

The measures of location are the mean, median, mode, quartile and percentile.

Measures of Central Tendency

Central tendency is a statistical measure that determines a single value
that accurately describes the centre of the distribution and represents the
entire distribution of scores.
The goal is to identify the single value that is the best representative for
the entire set of data.
The measures of central tendency are the mean, median and mode.

Mode
• The mode is the most frequent score in a data set.

Median
• The median is the middle score for a data set that has been arranged in order of magnitude.

Mean
• The mean (or average) is equal to the sum of all the values in a data set divided by the number of values in the data set.
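
The three definitions can be tried out with Python's standard `statistics` module. This is a small illustrative sketch using the six values of Data Set 1 from the Measures of Dispersion section of these notes. With an even number of values, the median is taken as the average of the two middle values.

```python
import statistics

# Data Set 1 from the Measures of Dispersion section.
data = [6, 7, 8, 6, 9, 6]

mean_value = statistics.mean(data)      # (6 + 7 + 8 + 6 + 9 + 6) / 6 = 7
median_value = statistics.median(data)  # sorted: 6, 6, 6, 7, 8, 9 -> (6 + 7) / 2 = 6.5
mode_value = statistics.mode(data)      # 6 occurs three times

print(mean_value, median_value, mode_value)
```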

Quartiles
Quartiles are values that divide a data set into four parts containing an
approximately equal number of observations.
The total of 100% is split into four equal parts (four quarters):

25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

Interquartile Range = Q3 – Q1

First quartile (Q1) or lower quartile
Second quartile (Q2) or middle quartile, which is also the median
Third quartile (Q3) or upper quartile
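
A minimal sketch of the quartiles and interquartile range, using the seven student weights from the Ungrouped Data example and `statistics.quantiles` (Python 3.8 or later). Several conventions exist for locating quartiles in small data sets, and the notes do not say which one is intended. The library's default "exclusive" method used here is therefore an assumption, and other methods can give slightly different values.

```python
import statistics

# Weights of seven students from the Ungrouped Data example.
weights = [56, 74, 68, 90, 52, 48, 65]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
# The default "exclusive" method places Qk at position k(n + 1)/4 in the sorted data.
q1, q2, q3 = statistics.quantiles(weights, n=4)

# Interquartile range = Q3 - Q1.
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 52.0 65.0 74.0 22.0
```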

Percentiles
Percentiles divide a set of data which are arranged in ascending order
into 100 equal parts.
A percentile is a measure used to indicate the value below which a given
percentage of observations in a group of observations fall.
For example, the 25th percentile is the value below which 25% of the
observations may be found.
Note:
25th percentile (P25) = First quartile (Q1)
50th percentile (P50) = Second quartile (Q2), which is also the median
75th percentile (P75) = Third quartile (Q3)
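
The note above can be checked numerically. The sketch below reuses the seven student weights from the Ungrouped Data example and asks `statistics.quantiles` for 100 equal parts. The library's default "exclusive" interpolation is an assumption, since the notes do not fix a convention for ungrouped data.

```python
import statistics

# Weights of seven students from the Ungrouped Data example.
weights = [56, 74, 68, 90, 52, 48, 65]

# n=100 returns the 99 cut points P1 ... P99; P_k sits at index k - 1.
percentiles = statistics.quantiles(weights, n=100)
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

# n=4 returns Q1, Q2, Q3 for comparison.
quartiles = statistics.quantiles(weights, n=4)

print([p25, p50, p75], quartiles)
```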

Measures of Dispersion

The measures of dispersion are the range, variance and standard deviation.

Measures of dispersion (or variation) describe how spread out a set of
data is, or the extent of the variability in individual items of the distribution.
Let us look at the following data sets to see how measures of central
tendency differ from measures of dispersion:

Data Set 1: 6, 7, 8, 6, 9, 6 (Mean = 7) (Range = 9 – 6 = 3)
Data Set 2: 5, 7, 2, 6, 13, 9 (Mean = 7) (Range = 13 – 2 = 11)

Most of the numbers in data set 1 are close to the mean value, while in
data set 2 the numbers are spread away from the mean. The difference in
the spread can be determined by a measure of dispersion.
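
The comparison of the two data sets can be reproduced with a few lines of Python. This is only an illustrative check, with the range taken as the largest value minus the smallest value.

```python
data_set_1 = [6, 7, 8, 6, 9, 6]
data_set_2 = [5, 7, 2, 6, 13, 9]

def mean(data):
    return sum(data) / len(data)

def value_range(data):
    # Range = largest observation minus smallest observation.
    return max(data) - min(data)

mean_1, mean_2 = mean(data_set_1), mean(data_set_2)
range_1, range_2 = value_range(data_set_1), value_range(data_set_2)

# Same centre, very different spread.
print(mean_1, range_1)
print(mean_2, range_2)
```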
However, range is not a good measure of dispersion because it is
influenced by the extreme values and the calculation does not cover all
observations.
Variance and standard deviation are the most useful and widely used
measures of dispersion. Although they are influenced by the extreme
values, the calculations cover all the observations.

Variance
Variance (σ² or s²) is the average of the squared differences from the
mean.

Standard Deviation
Standard deviation (σ or s) is a measure of dispersion of observations
within a data set. It is simply the square root of the variance.
If the observations are all close to the mean, then the standard deviation
is close to zero.
If many observations are far from the mean, then the standard deviation
is far from zero.
If all the observations are equal, then the standard deviation is zero.

The equation for variance ($s^2$) is given below:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

The equation for standard deviation ($s$) is given below:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where $\bar{x}$ is the mean and $n$ is the number of observations.
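
As a worked check of the two equations, the sketch below applies them to Data Set 1 and Data Set 2 from the Measures of Dispersion section. The helper name `sample_variance` is illustrative. Python's `statistics.variance` and `statistics.stdev` use the same n − 1 divisor and are printed for comparison.

```python
import math
import statistics

data_set_1 = [6, 7, 8, 6, 9, 6]
data_set_2 = [5, 7, 2, 6, 13, 9]

def sample_variance(data):
    # s^2 = sum of squared deviations from the mean, divided by n - 1.
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

var_1 = sample_variance(data_set_1)  # squared deviations sum to 8, so 8 / 5
var_2 = sample_variance(data_set_2)  # squared deviations sum to 70, so 70 / 5
sd_1 = math.sqrt(var_1)
sd_2 = math.sqrt(var_2)

print(var_1, sd_1)
print(var_2, sd_2)
print(statistics.variance(data_set_1), statistics.stdev(data_set_1))
```

Both data sets have a mean of 7, but the second has the larger variance and standard deviation, matching the visual impression that its values lie further from the mean.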

Stem-and-Leaf Diagram

A stem-and-leaf diagram (or display) is a method for presenting
quantitative data in a graphical format to assist in visualising the shape
of a distribution.
The "stem" is the first digit or digits, and the "leaf" is the last digit.

[Figure: example stem-and-leaf diagram with a Stem column and a Leaf column]

To construct a stem-and-leaf diagram:


1. Arrange the data in order of magnitude (ascending order).
2. Place the stems in order, vertically from smallest to largest.
3. Place the leaves in order, in each row from smallest to largest.
4. Create a key for the stem-and-leaf diagram so that people know how
to interpret the diagram.

Online tutorial: https://www.youtube.com/watch?v=_7m0Q_m2ppg
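
The four construction steps can be sketched in Python as below. The function name `stem_and_leaf` is illustrative, and the sketch assumes a non-empty list of non-negative whole numbers, with the last digit as the leaf. The data are the seven student weights from the Ungrouped Data example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} for a non-empty list of non-negative whole numbers."""
    rows = defaultdict(list)
    for value in sorted(values):        # step 1: ascending order
        stem, leaf = divmod(value, 10)  # leaf = last digit, stem = remaining digits
        rows[stem].append(leaf)         # step 3: leaves arrive already sorted
    # Step 2: stems from smallest to largest, keeping empty stems as empty rows.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}

weights = [56, 74, 68, 90, 52, 48, 65]
diagram = stem_and_leaf(weights)

for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 5 | 2 means 52")            # step 4: key for the reader
```

Empty stems (here the 80s) are kept as blank rows so that gaps in the data remain visible in the shape of the diagram.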

Distribution of Data
A symmetric (bell-shaped) curve is one in which the two sides of the
distribution would match exactly if the figure were folded over its
central point. A normal distribution has this shape.

[Figure: symmetric, bell-shaped curve]

A distribution is said to be skewed to the right, or positively skewed,
when most of the data are concentrated on the left of the distribution.
The right tail clearly extends farther from the distribution's centre than
the left tail.

[Figure: positive skew, with most data on the left and the right tail elongated]

A distribution is said to be skewed to the left, or negatively skewed, if
most of the data are concentrated on the right of the distribution.
The left tail clearly extends farther from the distribution's centre than the
right tail.

[Figure: negative skew, with most data on the right and the left tail elongated]

Interpreting Distribution of Data from Stem-and-Leaf Diagram


If the stem-and-leaf diagram is turned on its side, it will look like the
following:

The distribution shows that most data are clustered at the right. The left
tail extends farther from the data centre than the right tail. Therefore, the
distribution is skewed to the left or negatively skewed.

Box-and-Whisker Plot
A box-and-whisker plot (also called a box plot) displays the five-number
summary of a set of data.

The five-number summary is the:


1. Minimum
2. First quartile
3. Second quartile (median)
4. Third quartile
5. Maximum

In a box plot, we draw a box from the first quartile to the third quartile. A
vertical line goes through the box at the median.

[Figure: a horizontal and a vertical box-and-whisker plot, each on a scale from 0 to 70, marking the minimum, the quartiles and the maximum]

To construct a box-and-whisker plot:

1. Determine the five-number summary.
2. Draw a horizontal axis on which the numbers obtained in step 1 can
be located. Above this axis, mark the five-number summary with
vertical lines.
3. Connect the quartiles to each other to make a box, and then connect
the box to the maximum and minimum lines.
4. Calculate the values of the upper and lower inner fences to determine
whether the data have outliers.

Upper inner fence = Q3 + 1.5*(Q3 – Q1)
Lower inner fence = Q1 – 1.5*(Q3 – Q1)

Online tutorial: https://www.youtube.com/watch?v=o7qWblT5NZI
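
Steps 1 and 4 can be sketched numerically as below. The function names are illustrative, and the quartiles come from `statistics.quantiles` with its default "exclusive" method, which is an assumption because the notes do not fix a quartile convention for ungrouped data. The data are the seven student weights from the Ungrouped Data example. The extra value of 150 is hypothetical and is added only to show an outlier being flagged.

```python
import statistics

def five_number_summary(data):
    # Minimum, Q1, Q2 (median), Q3, maximum.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)

def inner_fences(q1, q3):
    # Lower fence = Q1 - 1.5*(Q3 - Q1); upper fence = Q3 + 1.5*(Q3 - Q1).
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    # Observations lying outside the inner fences.
    _, q1, _, q3, _ = five_number_summary(data)
    lower, upper = inner_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

weights = [56, 74, 68, 90, 52, 48, 65]
summary = five_number_summary(weights)
fences = inner_fences(summary[1], summary[3])

print(summary, fences)
print(find_outliers(weights))          # all weights lie inside the fences
print(find_outliers(weights + [150]))  # hypothetical extra value lies above the upper fence
```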

[Figure: box-and-whisker plot on a scale from 10 to 100 with the lower and upper inner fences marked; min, Q1, Q2, Q3 and max all lie between the fences]

The data lie within the upper and lower inner fences, so the data have no outliers.

[Figure: box-and-whisker plot on the same scale with one observation outside the inner fences, labelled as an outlier]

An observation that lies outside the fences is known as an outlier.

Shape of Distribution: Symmetry and Skewness

The diagram below shows a symmetrical distribution (such as a normal
distribution). The ‘whiskers’ are the same length and the median (Q2) is
in the centre of the box.

[Figure: box-and-whisker plot with equal whiskers and Q2 midway between Q1 and Q3]

The diagram below shows a positively skewed distribution (skewed to
the right). The left ‘whisker’ is shorter than the right ‘whisker’ and the
median (Q2) is nearer to Q1.

[Figure: box-and-whisker plot with a short left whisker, a long right whisker and Q2 close to Q1]

The diagram below shows a negatively skewed distribution (skewed
to the left). The left ‘whisker’ is longer than the right ‘whisker’ and the
median (Q2) is nearer to Q3.

[Figure: box-and-whisker plot with a long left whisker, a short right whisker and Q2 close to Q3]
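
The reading rules for the three box plots can be collected into a small helper. This is a rough sketch: the function name and the sample five-number summaries are hypothetical, and the test for symmetry uses exact equality, which real data will rarely satisfy. When the two cues (whisker lengths and median position) disagree, the sketch reports the shape as inconclusive rather than guessing.

```python
def box_plot_shape(minimum, q1, q2, q3, maximum):
    """Classify the shape of a distribution from its five-number summary."""
    left_whisker = q1 - minimum
    right_whisker = maximum - q3
    lower_box = q2 - q1  # distance from Q1 to the median
    upper_box = q3 - q2  # distance from the median to Q3

    if left_whisker == right_whisker and lower_box == upper_box:
        return "symmetric"
    if left_whisker < right_whisker and lower_box < upper_box:
        return "positively skewed"  # median nearer to Q1, right whisker longer
    if left_whisker > right_whisker and lower_box > upper_box:
        return "negatively skewed"  # median nearer to Q3, left whisker longer
    return "inconclusive"

# Hypothetical five-number summaries (min, Q1, Q2, Q3, max).
print(box_plot_shape(10, 20, 30, 40, 50))
print(box_plot_shape(10, 20, 25, 40, 70))
print(box_plot_shape(10, 40, 55, 60, 70))
```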

Analysing Grouped Data

Measures of location: mean, median, mode, quartile, decile and percentile.
Measures of dispersion: range, interquartile range, variance and standard deviation.

Formula

$$\text{Mean, } \bar{x} = \frac{\sum fx}{\sum f}$$

where x = data value (the class midpoint for grouped data) and f = frequency

$$\text{Mode} = L_m + \left( \frac{d_1}{d_1 + d_2} \right) c$$

where $L_m$ = lower boundary of the class containing the mode
$d_1$ = difference between the frequency of the mode class and the
frequency of the class immediately before it
$d_2$ = difference between the frequency of the mode class and the
frequency of the class immediately after it
$c$ = size of the mode class
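
As a worked sketch, the mean and mode formulas can be applied to the height table from the Grouped Data example. Two assumptions are made here that the notes do not state explicitly: x is taken as the class midpoint, and the lower class boundary is taken as the lower limit minus 0.5, which suits heights recorded to the nearest centimetre.

```python
# Height table from the Grouped Data example: (lower limit, upper limit, frequency).
classes = [(150, 159, 5), (160, 169, 11), (170, 179, 21), (180, 189, 8)]

# x is taken as the class midpoint, e.g. (150 + 159) / 2 = 154.5.
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Mean = sum(f * x) / sum(f).
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)

# Mode class = class with the highest frequency (170-179 here).
m = freqs.index(max(freqs))
lower_boundary = classes[m][0] - 0.5            # assumed boundary: 169.5
class_size = classes[m][1] - classes[m][0] + 1  # 10 cm
d1 = freqs[m] - (freqs[m - 1] if m > 0 else 0)               # 21 - 11 = 10
d2 = freqs[m] - (freqs[m + 1] if m < len(freqs) - 1 else 0)  # 21 - 8 = 13
grouped_mode = lower_boundary + d1 / (d1 + d2) * class_size

print(round(grouped_mean, 2), round(grouped_mode, 2))  # 171.61 173.85
```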

$$\text{Median} = L_m + \left( \frac{\frac{n}{2} - F_L}{f_m} \right) c$$

where $L_m$ = lower boundary of the class containing the median
$n$ = total number of observations
$F_L$ = cumulative frequency of the class before the median class
$f_m$ = frequency of the median class
$c$ = size of the median class
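
A worked sketch of the median formula on the height table from the Grouped Data example follows. The lower class boundary is assumed to be the lower limit minus 0.5 (heights recorded to the nearest centimetre), and the median class is taken as the first class whose cumulative frequency reaches n/2.

```python
from itertools import accumulate

# Height table from the Grouped Data example: (lower limit, upper limit, frequency).
classes = [(150, 159, 5), (160, 169, 11), (170, 179, 21), (180, 189, 8)]
freqs = [f for _, _, f in classes]

n = sum(freqs)                        # 45 observations
cumulative = list(accumulate(freqs))  # [5, 16, 37, 45]

# Median class: first class whose cumulative frequency reaches n / 2 = 22.5.
m = next(i for i, cf in enumerate(cumulative) if cf >= n / 2)

L_m = classes[m][0] - 0.5                # assumed lower boundary: 169.5
F_L = cumulative[m - 1] if m > 0 else 0  # cumulative frequency before the class: 16
f_m = freqs[m]                           # frequency of the median class: 21
c = classes[m][1] - classes[m][0] + 1    # class size: 10

grouped_median = L_m + (n / 2 - F_L) / f_m * c
print(round(grouped_median, 2))  # 172.6
```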

$$\text{Quartile, } Q_k = L_k + \left( \frac{\frac{k}{4} n - F_L}{f_k} \right) c_k$$

$$\text{Percentile, } P_k = L_k + \left( \frac{\frac{k}{100} n - F_L}{f_k} \right) c_k$$

$$\text{Decile, } D_k = L_k + \left( \frac{\frac{k}{10} n - F_L}{f_k} \right) c_k$$

where k = 1, 2, 3, …
$L_k$ = lower boundary of the class where $Q_k$, $P_k$, $D_k$ lies
$n$ = total number of observations
$F_L$ = cumulative frequency of the class before the $Q_k$, $P_k$, $D_k$ class
$f_k$ = frequency of the class where $Q_k$, $P_k$, $D_k$ lies
$c_k$ = size of the class where $Q_k$, $P_k$, $D_k$ lies
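
The quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), so one helper can cover all three. The sketch below is illustrative: the function name `grouped_quantile` and its `parts` argument are not from the notes, the lower boundary is assumed to be the lower limit minus 0.5, and the data are the height table from the Grouped Data example.

```python
from itertools import accumulate

def grouped_quantile(classes, k, parts):
    """k-th quantile of grouped data split into `parts` equal parts
    (parts = 4 for quartiles, 10 for deciles, 100 for percentiles)."""
    freqs = [f for _, _, f in classes]
    n = sum(freqs)
    target = k * n / parts  # (k / parts) * n
    cumulative = list(accumulate(freqs))
    # Class containing the quantile: first cumulative frequency reaching the target.
    i = next(idx for idx, cf in enumerate(cumulative) if cf >= target)
    lo, hi, f_k = classes[i]
    L_k = lo - 0.5          # assumed lower class boundary
    c_k = hi - lo + 1       # class size
    F_L = cumulative[i - 1] if i > 0 else 0
    return L_k + (target - F_L) / f_k * c_k

# Height table: (lower limit, upper limit, frequency).
classes = [(150, 159, 5), (160, 169, 11), (170, 179, 21), (180, 189, 8)]

q1 = grouped_quantile(classes, 1, 4)
q3 = grouped_quantile(classes, 3, 4)
d5 = grouped_quantile(classes, 5, 10)
p50 = grouped_quantile(classes, 50, 100)

print(round(q1, 2), round(q3, 2), round(d5, 2), round(p50, 2))
```

Here D5 and P50 both coincide with the grouped median, and Q3 − Q1 gives the interquartile range for the grouped data.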

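$$\text{Variance, } s^2 = \frac{\sum fx^2 - \frac{\left( \sum fx \right)^2}{\sum f}}{\sum f - 1}$$

$$\text{Standard Deviation, } s = \sqrt{\frac{\sum fx^2 - \frac{\left( \sum fx \right)^2}{\sum f}}{\sum f - 1}}$$

where x = data value (the class midpoint for grouped data) and f = frequency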
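
A worked sketch of the grouped variance and standard deviation on the height table from the Grouped Data example, taking x as the class midpoint (an assumption, since the notes describe x only as the data):

```python
import math

# Height table from the Grouped Data example: (lower limit, upper limit, frequency).
classes = [(150, 159, 5), (160, 169, 11), (170, 179, 21), (180, 189, 8)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]  # x = class midpoint
freqs = [f for _, _, f in classes]

sum_f = sum(freqs)                                         # 45
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))      # 7722.5
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints)) # 1328791.25

# s^2 = (sum(f x^2) - (sum(f x))^2 / sum(f)) / (sum(f) - 1)
grouped_variance = (sum_fx2 - sum_fx ** 2 / sum_f) / (sum_f - 1)
grouped_sd = math.sqrt(grouped_variance)

print(round(grouped_variance, 2), round(grouped_sd, 2))  # 80.1 8.95
```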
