
Civil Stat 02

The document discusses summarizing and presenting numerical data through various graphical methods such as histograms, frequency polygons, frequency curves, and ogives. It provides examples and steps for creating frequency tables and histograms based on river discharge data from 1922 to 1971.

Uploaded by

Abdel Samie

Applied Statistics

Lecture (2)
Graphical Presentation of Data
Numerical Descriptive Measures

Dr. Mohamed Gad Lecture 2 – Page 1


Describing Numerical Data With
Graphs
• Histogram

• Frequency Polygon

• Frequency Curve

• Less-than & More-than Ogive (Cumulative Frequency Polygon)


Steps to Create Frequency Tables

◼ Sort raw data in ascending order:


12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

◼ Find range: 58 - 12 = 46
◼ Select number of classes: 5 (usually between 5 and 15)
◼ The smaller the number of classes, the greater the loss of information
◼ Compute class interval (width): 10 (46/5 then round up)
◼ Determine class boundaries (limits): 10, 20, 30, 40, 50, 60

◼ Compute class midpoints: 15, 25, 35, 45, 55

◼ Count observations & assign to classes
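
The steps above can be sketched in Python. This is a minimal illustration, not part of the original lecture: the function name `frequency_table` and its parameters are hypothetical, and the lower boundary (10) and width (10) are passed in explicitly because choosing a convenient rounded value is a judgment call.

```python
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

def frequency_table(values, lower, width, num_classes):
    """Count values into classes [lower + j*width, lower + (j+1)*width)."""
    counts = [0] * num_classes
    for x in values:
        j = int((x - lower) // width)  # index of the class that contains x
        if not 0 <= j < num_classes:
            raise ValueError(f"{x} falls outside the class boundaries")
        counts[j] += 1
    n = len(values)
    rows = []
    for j, f in enumerate(counts):
        lo = lower + j * width
        rows.append({"lower": lo, "upper": lo + width,
                     "midpoint": lo + width / 2,
                     "frequency": f, "relative": f / n})
    return rows

data_range = max(data) - min(data)  # 58 - 12 = 46
table = frequency_table(data, lower=10, width=10, num_classes=5)
for row in table:
    print(f"{row['lower']} but under {row['upper']}: "
          f"f = {row['frequency']}, rf = {row['relative']:.2f}")
```

Each class includes its lower boundary and excludes its upper boundary, which matches the "10 but under 20" wording used for the classes.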

Frequency Table

Data in ordered array:


12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class j            Frequency fj   Relative Frequency rfj   Percentage
10 but under 20    3              0.15                     15
20 but under 30    6              0.30                     30
30 but under 40    5              0.25                     25
40 but under 50    4              0.20                     20
50 but under 60    2              0.10                     10
Total              20             1.00                     100
Graphing Numerical Data:
The Histogram
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: Histogram of the data. Horizontal axis: class midpoints, with bar edges at the class boundaries; vertical axis: frequency. There are no gaps between bars.]
Graphing Numerical Data:
The Frequency Polygon
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

[Figure: Frequency polygon of the data. Horizontal axis: class midpoints; vertical axis: frequency.]
Graphing Numerical Data:
The Frequency Curve
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

[Figure: Frequency curve of the data. Horizontal axis: class midpoints; vertical axis: frequency.]
Cumulative Frequency Curve
Create Cumulative Frequency Table first
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: frequency chart of the data over the class midpoints.]

Can we also do this for "more than" (>)?
Tabulating and Graphing
Numerical Data: Example
◼ Consider the data for the mean flow of a river for
the month of May during the period from 1922 to
1971 (see Table below)
year  discharge (m³/s)   year  discharge (m³/s)   year  discharge (m³/s)   year  discharge (m³/s)   year  discharge (m³/s)
1922 3532 1932 2338 1942 1608 1952 1949 1962 2568
1923 2071 1933 1873 1943 1456 1953 1396 1963 1944
1924 4188 1934 1243 1944 1570 1954 1344 1964 2062
1925 2080 1935 2849 1945 2301 1955 1886 1965 3919
1926 2036 1936 2359 1946 1460 1956 1786 1966 2944
1927 2685 1937 3070 1947 1584 1957 1455 1967 2175
1928 1832 1938 1222 1948 1410 1958 3025 1968 2877
1929 1500 1939 2841 1949 1490 1959 1828 1969 3208
1930 2856 1940 2110 1950 1959 1960 1401 1970 4750
1931 3043 1941 2058 1951 1981 1961 2427 1971 1475



Tabulating Numerical Data:
Example (Continued)
◼ Sort raw data in ascending order:
1222, 1243, …, 4750
◼ Number of observations n = 50
◼ Minimum discharge is 1222 m³/s
◼ Maximum discharge is 4750 m³/s
◼ Find range: 4750 - 1222 = 3528
◼ Select number of classes: 6 (usually between 5 and 15)
◼ Compute class interval (width): 600 (3528/6 = 588, then round up)
◼ Determine class boundaries (limits): 1200, 1800, 2400, 3000, 3600, 4200, 4800
◼ Compute class midpoints: 1500, 2100, 2700, 3300, 3900, 4500
◼ Count observations & assign to classes
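
The same steps can be traced for the river data with a short Python sketch. The variable names are illustrative only, and the rounded width (600) and starting boundary (1200) are entered by hand as in the slide rather than derived by a rule.

```python
# Mean May discharge (m3/s), 1922 to 1971, one row per decade.
discharge = [
    3532, 2071, 4188, 2080, 2036, 2685, 1832, 1500, 2856, 3043,
    2338, 1873, 1243, 2849, 2359, 3070, 1222, 2841, 2110, 2058,
    1608, 1456, 1570, 2301, 1460, 1584, 1410, 1490, 1959, 1981,
    1949, 1396, 1344, 1886, 1786, 1455, 3025, 1828, 1401, 2427,
    2568, 1944, 2062, 3919, 2944, 2175, 2877, 3208, 4750, 1475,
]

n = len(discharge)
data_range = max(discharge) - min(discharge)  # 4750 - 1222
num_classes = 6
raw_width = data_range / num_classes          # 588.0 before rounding
width = 600                                   # rounded up by hand
lower = 1200                                  # chosen starting boundary

boundaries = [lower + j * width for j in range(num_classes + 1)]
classes = list(zip(boundaries, boundaries[1:]))
midpoints = [(a + b) / 2 for a, b in classes]

# Each class is "a but under b": a is included, b is excluded.
counts = [sum(a <= x < b for x in discharge) for a, b in classes]
relative = [f / n for f in counts]

for (a, b), f, rf in zip(classes, counts, relative):
    print(f"{a} but under {b}: f = {f}, rf = {rf:.2f}")
```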



Example (Continued)

Class No. j   Class Interval Ij (m³/s)   Description            Frequency fj   Relative Frequency rfj
1             (1200, 1800)               1200 but under 1800    16             0.32
2             (1800, 2400)               1800 but under 2400    18             0.36
3             (2400, 3000)               2400 but under 3000    8              0.16
4             (3000, 3600)               3000 but under 3600    5              0.10
5             (3600, 4200)               3600 but under 4200    2              0.04
6             (4200, 4800)               4200 but under 4800    1              0.02
Total                                                           50             1.00


Example (Continued)
[Figure: Frequency histogram of the discharge data.]

[Figure: Relative frequency histogram of the discharge data.]


Example (Continued)

Area under polygon = Area under histogram


Example (Continued)
This is called an Ogive
Cumulative frequency polygon & cumulative frequency curve (smooth Ogive)

Boundary Value (m³/s)   Description       Cumulative Frequency   Cumulative Relative Frequency
1,200                   Less than 1,200   0                      0.00
1,800                   Less than 1,800   16                     0.32
2,400                   Less than 2,400   34                     0.68
3,000                   Less than 3,000   42                     0.84
3,600                   Less than 3,600   47                     0.94
4,200                   Less than 4,200   49                     0.98
4,800                   Less than 4,800   50                     1.00

What does the “more than” Ogive look like?
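
One way to explore that question is to compute both cumulative series from the class frequencies of the discharge example. The sketch below is only illustrative (the variable names are not from the lecture); it treats "more than" as the number of observations at or above each boundary, which is the complement of the "less than" count.

```python
from itertools import accumulate

boundaries = [1200, 1800, 2400, 3000, 3600, 4200, 4800]
frequencies = [16, 18, 8, 5, 2, 1]   # class frequencies from the example
n = sum(frequencies)

# "Less than" ogive: observations below each boundary (rises from 0 to n).
less_than = [0] + list(accumulate(frequencies))

# "More than" ogive: observations at or above each boundary (falls from n to 0).
more_than = [n - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b}: less than = {lt} ({lt / n:.2f}), more than = {mt} ({mt / n:.2f})")
```

The two curves mirror each other: at every boundary the two counts add up to n, so the "more than" ogive starts at 100% and decreases to 0.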



More-Than Curve (Ogive)

[Figure: More-than curve. Horizontal axis: Classes (2 to 22); vertical axis: %F (0 to 100).]


Graphing Bivariate Numerical Data

Scatter Plot of bi-variate numerical data



Describing Numerical Data with Numbers
– Numerical Descriptive Measures

◼ Measures of central tendency
   ◼ Mean, median, mode
◼ Measure of variation (or dispersion)
   ◼ Range, variance and standard deviation, coefficient of variation
◼ Measure of shape
   ◼ Skewness coefficient
◼ Measure of accordance
   ◼ Coefficient of correlation
Measures of Central Tendency

Central Tendency: Average or Arithmetic Mean

◼ Population mean (a parameter): $\mu = \dfrac{1}{N}\sum_{i=1}^{N} X_i$
◼ Sample mean (a statistic): $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$
Mean (Arithmetic Mean)
◼ Mean (arithmetic mean)
◼ Sample mean ($n$ = sample size):
$\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
◼ Population mean ($N$ = population size):
$\mu = \dfrac{\sum_{i=1}^{N} X_i}{N} = \dfrac{X_1 + X_2 + \cdots + X_N}{N}$
Mean (Arithmetic Mean)
(continued)

◼ The most common measure of central tendency
◼ Affected by extreme values (outliers)
   Data 1, 3, 5, 7, 9: Mean = 5
   Data 1, 3, 5, 7, 24: Mean = 8
Median
◼ The variate value that divides the data into two equal
halves
◼ Not affected by extreme values
   Data 1, 3, 5, 7, 9: Median = 5
   Data 1, 3, 5, 7, 24: Median = 5
◼ In an ordered array, the median is the “middle” number
◼ If n or N is odd, the median is the middle number, at position (n+1)/2.
◼ If n or N is even, the median is the average of the two middle numbers, at positions n/2 and n/2 + 1.
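
The position rule and the effect of an outlier can be checked with a small Python sketch. The helper name `median_by_position` is hypothetical; it simply spells out the odd and even cases, and the standard-library `statistics.median` is used as a cross-check.

```python
from statistics import mean, median

def median_by_position(values):
    """Median using the 1-based position rule for an ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]                # position (n+1)/2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # positions n/2 and n/2 + 1

a = [1, 3, 5, 7, 9]
b = [1, 3, 5, 7, 24]   # same data with one extreme value

print(mean(a), median_by_position(a))   # mean and median agree
print(mean(b), median_by_position(b))   # the mean moves, the median does not
```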
Mode
◼ A measure of central tendency
◼ Value that occurs most often
◼ Not affected by extreme values
◼ There may be no mode
◼ There may be several modes
   Data 1, 3, 5, 5, 7, 9, 9, 9, 10, 12, 12, 13, 14: Mode = 9
   Data 0, 1, 2, 3, 4, 5, 6: No mode
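
A short Python sketch of these cases follows. The function `modes` is a hypothetical helper, and it adopts one convention as an assumption: when every value occurs exactly once, it reports no mode (an empty list) rather than listing every value.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # Assumed convention: no value repeats, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 3, 5, 5, 7, 9, 9, 9, 10, 12, 12, 13, 14]))  # one mode
print(modes([0, 1, 2, 3, 4, 5, 6]))                          # no mode
print(modes([1, 1, 2, 2, 3]))                                # several modes
```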
Measures of Variation
Variation
◼ Range
   ◼ Relative range
◼ Variance
   ◼ Population variance
   ◼ Sample variance
◼ Standard deviation
   ◼ Population standard deviation
   ◼ Sample standard deviation
◼ Coefficient of variation
Range

◼ Measure of variation
◼ Difference between the largest and the
smallest observations:
$\text{Range} = X_{\text{Largest}} - X_{\text{Smallest}}$
◼ Ignores the way in which data are distributed

[Figure: two dot plots on the axis 7 to 12; in both, Range = 12 - 7 = 5.]


Relative Range

$\text{Relative Range} = \dfrac{\text{Range}}{\text{Mean}} = \dfrac{X_{\text{Largest}} - X_{\text{Smallest}}}{\text{Mean}}$


Variance
◼ Important measure of variation
◼ Shows variation about the mean
◼ Sample variance:
$S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
◼ Population variance:
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Standard Deviation
◼ Most important measure of variation
◼ Shows variation about the mean
◼ Has the same units as the original data
◼ Sample standard deviation:
$S = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
◼ Population standard deviation:
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Coefficient of Variation
$CV = \left(\dfrac{S}{\bar{X}}\right) \times 100\%$

◼ Measures relative variation


◼ Always in percentage (%)
◼ Shows variation relative to mean
◼ Is used to compare two or more sets of data
measured in different units
◼ Not suitable if mean is close to zero
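
As an illustration, the measures of variation can be computed for the 20-value data set used earlier in the lecture. This is a sketch using Python's standard `statistics` module, where `variance` and `stdev` are the sample versions (divisor n - 1); the variable names are illustrative.

```python
from statistics import mean, variance, stdev

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

x_bar = mean(data)                   # sample mean
data_range = max(data) - min(data)   # largest minus smallest
relative_range = data_range / x_bar  # range relative to the mean
s2 = variance(data)                  # sample variance, divisor n - 1
s = stdev(data)                      # sample standard deviation
cv = s / x_bar * 100                 # coefficient of variation, in percent

print(f"mean = {x_bar:.1f}, range = {data_range}, "
      f"relative range = {relative_range:.2f}")
print(f"S^2 = {s2:.2f}, S = {s:.2f}, CV = {cv:.1f}%")
```

The standard deviation is in the same units as the data, while the relative range and the coefficient of variation are unit free, which is what makes them usable for comparing data sets measured in different units.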
Shape of a Distribution

◼ Skewness Coefficient
◼ Describes how data is distributed
◼ Measure of shape
◼ For a population or large sample:
$C_S = \dfrac{\sqrt{n}\,\sum_{i=1}^{n} (X_i - \bar{X})^3}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]^{3/2}}$
◼ Corrected form of $C_S$, for a small sample:
$C_S = \dfrac{n^2}{(n - 1)(n - 2)} \cdot \dfrac{\sqrt{n}\,\sum_{i=1}^{n} (X_i - \bar{X})^3}{\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]^{3/2}}$
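
A hedged Python sketch of the skewness coefficient follows. It assumes the large-sample form is the third central moment divided by the 3/2 power of the second central moment (both with divisor n), and that the small-sample correction multiplies this ratio by n²/((n - 1)(n - 2)). Both are assumptions about the intended formulas, and other texts use slightly different corrections. The function name `skewness` is hypothetical.

```python
def skewness(values, corrected=False):
    """Skewness coefficient C_S; corrected=True applies the small-sample factor."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - x_bar) ** 3 for x in values) / n  # third central moment
    cs = m3 / m2 ** 1.5
    if corrected:
        # Assumed correction factor for small samples.
        cs *= n ** 2 / ((n - 1) * (n - 2))
    return cs

symmetric = [1, 3, 5, 7, 9]
right_skewed = [1, 3, 5, 7, 24]

print(skewness(symmetric))                     # zero for symmetric data
print(skewness(right_skewed))                  # positive: long right tail
print(skewness(right_skewed, corrected=True))  # larger in magnitude
```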
Shape of a Distribution

◼ Symmetric or skewed

◼ CS < 0: Left-Skewed, Mean < Median < Mode
◼ CS = 0: Symmetric, Mean = Median = Mode
◼ CS > 0: Right-Skewed, Mode < Median < Mean


Descriptive Measure using Grouped
Data (Frequency Distribution)
◼ Sample Mean
$\bar{X} = \dfrac{\sum_{j=1}^{k} f_j X_{\text{Class}j}}{\sum_{j=1}^{k} f_j} = \dfrac{1}{n}\sum_{j=1}^{k} f_j X_{\text{Class}j}$
◼ Sample Variance
$S^2 = \dfrac{1}{n - 1}\sum_{j=1}^{k} f_j (X_{\text{Class}j} - \bar{X})^2$

$X_{\text{Class}j}$ is the mid-point for class $j$

$f_j$ is the frequency for class $j$


Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

[Figure: Histogram of the data, with frequencies 3, 6, 5, 4, 2 at class midpoints 15, 25, 35, 45, 55.]


Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

From the raw data: $\bar{X} = 32.4$, $S^2 = 160.57$

From the grouped data (frequencies 3, 6, 5, 4, 2 at class midpoints 15, 25, 35, 45, 55, with $n = 20$):

$\bar{X} = \dfrac{1}{n}\sum_{j=1}^{k} f_j X_{\text{Class}j} = \dfrac{1}{20}(45 + 150 + 175 + 180 + 110) = 33$

$S^2 = \dfrac{1}{n - 1}\sum_{j=1}^{k} f_j (X_{\text{Class}j} - \bar{X})^2 = 153.68$
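
The grouped-data mean and variance can be reproduced with a few lines of Python. This is a minimal sketch with illustrative variable names; it uses the class midpoints and frequencies of the 20-value example.

```python
midpoints = [15, 25, 35, 45, 55]   # class midpoints X_Classj
frequencies = [3, 6, 5, 4, 2]      # class frequencies f_j
n = sum(frequencies)               # 20 observations in total

# Grouped mean: each midpoint is weighted by its class frequency.
grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Grouped sample variance: weighted squared deviations, divisor n - 1.
grouped_var = sum(f * (x - grouped_mean) ** 2
                  for f, x in zip(frequencies, midpoints)) / (n - 1)

print(f"grouped mean = {grouped_mean}, grouped variance = {grouped_var:.2f}")
```

The grouped values differ slightly from the raw-data values because every observation in a class is replaced by the class midpoint.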


Descriptive Measure using Grouped
Data (Frequency Distribution)

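◼ Sample Mode
$\text{Mode} = L_1 + \left(\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right) \times C$, with $C = L_2 - L_1$

$L_1$ and $L_2$ are the lower and upper boundaries of the modal class (the class with the highest frequency). $\Delta_1$ is the modal class frequency minus the frequency of the class before it, and $\Delta_2$ is the modal class frequency minus the frequency of the class after it.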


Descriptive Measure using Grouped
Data (Frequency Distribution)

◼ Sample Mode (example)
For the 20-value data set, the modal class is 20 but under 30: $L_1 = 20$, $L_2 = 30$, $C = 10$, $\Delta_1 = 6 - 3 = 3$, $\Delta_2 = 6 - 5 = 1$

$\text{Mode} = 20 + \dfrac{3}{3 + 1} \times 10 = 27.5$
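
A small Python sketch of the grouped mode calculation is given below. The function name `grouped_mode` is hypothetical, and two assumptions are made for edge cases that the slide does not cover: a missing neighbouring class is treated as having frequency 0, and if several classes tie for the highest frequency the first one is used.

```python
def grouped_mode(lower_bounds, width, frequencies):
    """Mode = L1 + d1 / (d1 + d2) * C for the modal class."""
    j = max(range(len(frequencies)), key=frequencies.__getitem__)  # modal class
    before = frequencies[j - 1] if j > 0 else 0                    # assumed 0 at the edge
    after = frequencies[j + 1] if j < len(frequencies) - 1 else 0  # assumed 0 at the edge
    d1 = frequencies[j] - before
    d2 = frequencies[j] - after
    if d1 + d2 == 0:
        raise ValueError("flat frequencies: the formula is undefined")
    return lower_bounds[j] + d1 / (d1 + d2) * width

mode = grouped_mode([10, 20, 30, 40, 50], 10, [3, 6, 5, 4, 2])
print(mode)  # 27.5
```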


Descriptive Measure using Grouped
Data (Frequency Distribution)

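◼ Sample Median
$\text{Median} = L_1 + \dfrac{N/2 - \sum f_i}{f_{\text{median}}} \times C$, with $C = L_2 - L_1$

$L_1$ and $L_2$ are the lower and upper boundaries of the class that contains the median, $f_{\text{median}}$ is the frequency of that class, and $\sum f_i$ is the sum of the frequencies of all classes up to $L_1$.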


Descriptive Measure using Grouped
Data (Frequency Distribution)

◼ Sample Median (example)
For the 20-value data set, $N/2 = 10$ and the median is in the class 30 but under 40: $L_1 = 30$, $L_2 = 40$, $C = 10$, $\sum f_i = 3 + 6 = 9$, $f_{\text{median}} = 5$

$\text{Median} = 30 + \dfrac{10 - 9}{5} \times 10 = 32$
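
The grouped median can be sketched in Python in the same way. The function name `grouped_median` is hypothetical; it walks through the classes, accumulating frequencies until it reaches the class that contains the N/2-th observation, then interpolates inside that class.

```python
def grouped_median(lower_bounds, width, frequencies):
    """Median = L1 + (N/2 - cumulative frequency before L1) / f_median * C."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("empty frequency table")
    cumulative = 0  # sum of frequencies of all classes below the current one
    for lower, f in zip(lower_bounds, frequencies):
        if cumulative + f >= n / 2:
            # The median lies in this class: interpolate linearly inside it.
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies do not cover N/2")

med = grouped_median([10, 20, 30, 40, 50], 10, [3, 6, 5, 4, 2])
print(med)  # 32.0
```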
Coefficient of Correlation
◼ Measures the strength of the linear
relationship between two quantitative
variables
$r = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$


Features of
Correlation Coefficient
◼ Unit free
◼ Ranges between –1 and 1
◼ The closer to –1, the stronger the negative linear
relationship
◼ The closer to 1, the stronger the positive linear
relationship
◼ The closer to 0, the weaker any linear relationship
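
These features can be illustrated with a short Python sketch of the correlation coefficient. The function name `correlation` and the small data sets are hypothetical and made up purely for illustration; they are not from the lecture.

```python
import math

def correlation(xs, ys):
    """Coefficient of correlation r between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
print(correlation(x, [3, 5, 7, 9, 11]))   # perfect positive linear relationship
print(correlation(x, [10, 8, 6, 4, 2]))   # perfect negative linear relationship
print(correlation(x, [2, 1, 4, 3, 5]))    # positive but weaker relationship
```

Because the deviations appear in both the numerator and the denominator, rescaling either variable (for example changing its units) leaves r unchanged.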



Scatter Plots of Data with
Various Correlation Coefficients
[Figure: five scatter plots of Y against X, with r = -1, r = -0.6, r = 0, r = 0.6 and r = 1.]
