
Engineering Statistics - Chapter 2 Jafar

The document discusses frequency distributions, which are used to summarize large datasets by grouping raw data into class intervals and frequencies. A frequency distribution presents grouped data in a table with class boundaries, frequencies, and other properties. It reduces data size for easier analysis while focusing on important subgroups. The document provides examples to illustrate key aspects of constructing and interpreting frequency distributions.

Chapter 2

Presentation of Statistical Data

College of Engineering
Department of Petroleum Engineering
41 Hussein Jasim Mohammed
Frequency Distribution
• Ungrouped data is raw data, or data that have not been classified and summarized in
any way.

• Grouped data is the data that have been organized into a frequency distribution.

• Frequency distribution is a summary of data presented in the form of class intervals


(or classes) and frequencies.

• The benefits of grouping data are:


1- Reducing large-size data sets for easier interpretation and understanding.
2- Focusing on important subpopulations and ignoring irrelevant ones.
3- Improving the accuracy and efficiency of estimation.

Frequency Distribution (cont.)
• The following table contains the data of lifetimes in hours of 200 light bulbs:
Item Lifetime (hours)

Frequency Distribution (cont.)
• The data of the previous table is grouped and presented in the following table which
is called frequency distribution:

Frequency Distribution (cont.)
• Frequency distribution basic properties are:
• A class interval is represented by two numbers called class boundaries or class endpoints.
• The smaller of the two class boundaries is the lower class boundary.
• The larger of the two class boundaries is the upper class boundary.
• The class width (also referred to as the class size or class length) of a frequency distribution is
the difference between the upper and lower class boundaries of any class in that frequency
distribution.
• The frequency of a class interval is the number of values that occur within that class interval.


Frequency Distribution (cont.)
• For example, for the following frequency distribution:
Class Interval Frequency
0-10 4
10-20 12
20-30 13
30-40 19
40-50 7
50-60 5
Total 60

• The start of the frequency distribution is 0 and the end is 60.


• There are 6 class intervals (or classes).
• The boundaries of the first class are 0 (the lower class boundary) and 10 (the
upper class boundary).
• The class width for the fifth class = 50 – 40 = 10 (which is the same for all classes).
• The frequency of the third class is 13.

Frequency Distribution (cont.)
Notice that the frequency distribution may be constructed in such a way that there is a gap between
the upper class boundary of each class and the lower class boundary of the next class.
• The following frequency distribution of heights of 100 male soldiers at one of the Iraqi army units
is an example of this way of presenting frequency distribution.

Height (cm) Number of Soldiers


160-164 5
165-169 18
170-174 42
175-179 26
180-184 6
185-189 3
Total 100

• The class end values in this case are called class limits.
• For example, for the second class, 165 is the lower class limit and 169 is the upper class limit.

Frequency Distribution (cont.)
• Class boundaries are selected so that no value of the data can fit into more than one class.
• A value can fit into more than one class when it is equal to a class boundary.
• In the following table, values such as 8, 12, 16, or any other value equal to one of the class
boundaries (except the start and the end of the frequency distribution) can fit in two classes.

Class Frequency
4-8 3
8-12 7
12-16 12
16-20 28
20-24 8
24-26 2
Total 60

• In general, the problem of data values that can fit into more than one class can be
avoided by using the left-end inclusion convention, which means that a class interval contains its
left boundary but not its right boundary.

Frequency Distribution
• This convention is explained in the following table:

Class Interval Format: 8-12, 8-under 12, or 8 ≤ x < 12 (three ways of writing the same class)

Meaning:
- Lower class boundary = 8
- Upper class boundary = 12
- The class contains all values that are greater than or equal to 8 and less than 12.

Notice that the left-end inclusion convention explained in the above table will be
used throughout the study of the Engineering Statistics subject.
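
The left-end inclusion convention can be sketched in a few lines of Python. This is only an illustration: the function name `find_class` and the representation of the classes as an ascending list of boundaries are assumptions, not part of the lecture.

```python
def find_class(value, boundaries):
    """Return the index of the class [lower, upper) that contains value.

    boundaries is an ascending list of class boundaries, e.g. [4, 8, 12].
    Returns None if the value lies outside the frequency distribution.
    """
    for i in range(len(boundaries) - 1):
        lower, upper = boundaries[i], boundaries[i + 1]
        if lower <= value < upper:  # left boundary included, right boundary excluded
            return i
    return None


# Boundaries of the classes 4-8, 8-12, 12-16, 16-20, 20-24 and 24-26
boundaries = [4, 8, 12, 16, 20, 24, 26]
class_of_8 = find_class(8, boundaries)    # index 1: 8 belongs to 8-12, not 4-8
class_of_12 = find_class(12, boundaries)  # index 2: 12 belongs to 12-16
```

With this rule every value equal to an interior boundary lands in exactly one class, namely the class that starts at that boundary.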

Frequency Distribution (cont.)
• Other frequency distribution properties are:
• Class midpoint (or class mark) is the value halfway across the class interval and can
be calculated as the average of the two class boundaries.
• The relative frequency of a class is the frequency of that class divided by the total
frequency.
• The cumulative frequency for each class interval is the frequency for that class
interval added to the sum of the preceding classes’ frequencies.

• The tables below show a set of data about the heights of 50 male players at a
football club and the frequency distribution of this data set. The frequency distribution
includes classes, frequencies, class midpoints, relative frequencies and cumulative
frequencies.

Frequency Distribution (cont.)
Players’ Heights (cm) (Ungrouped Data)
161.2 171.3 165.8 174.6 181.3 177.1 169.2 174.9 169.8 175.4
174.2 179.2 170.9 170.0 166.3 173.7 176.3 171.0 184.3 167.5
180.8 172.2 165.0 177.2 173.5 186.2 178.1 173.3 185.6 177.6
173.6 165.2 179.2 171.4 175.9 164.8 168.9 170.9 179.7 171.7
188.7 172.0 178.9 166.6 183.6 176.2 174.4 177.0 160.0 182.7

Frequency Distribution (Grouped Data)
Height (cm)   Number of Players   Class Midpoint          Relative Frequency   Cumulative Frequency
160-165       3                   (160+165)/2 = 162.5     0.06                 3
165-170       9                   (165+170)/2 = 167.5     0.18                 12
170-175       17                  (170+175)/2 = 172.5     0.34                 29
175-180       13                  (175+180)/2 = 177.5     0.26                 42
180-185       5                   (180+185)/2 = 182.5     0.10                 47
185-190       3                   (185+190)/2 = 187.5     0.06                 50
Total         50
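
The midpoint, relative frequency and cumulative frequency columns can be recomputed from the class boundaries and frequencies alone. The following Python sketch does this for the players' heights table; the variable names are chosen for illustration only.

```python
from itertools import accumulate

# Class boundaries and frequencies taken from the players' heights table
classes = [(160, 165), (165, 170), (170, 175), (175, 180), (180, 185), (185, 190)]
frequencies = [3, 9, 17, 13, 5, 3]

total = sum(frequencies)
# Class midpoint: average of the two class boundaries
midpoints = [(lower + upper) / 2 for lower, upper in classes]
# Relative frequency: class frequency divided by the total frequency
relative = [f / total for f in frequencies]
# Cumulative frequency: running sum of the class frequencies
cumulative = list(accumulate(frequencies))

for (lower, upper), f, m, r, c in zip(classes, frequencies, midpoints, relative, cumulative):
    print(f"{lower}-{upper}  {f:>3}  {m:>6.1f}  {r:.2f}  {c:>3}")
```

The last cumulative frequency always equals the total frequency, which is a quick check on a hand-built table.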

Frequency Distribution (cont.)
• The steps of constructing a frequency distribution are:
1- Determine the range of the raw data. The range is defined as the difference between the
largest and smallest values. The range for the previous example = 188.7 - 160.0 = 28.7.
2- Determine the number of classes that the frequency distribution will contain. As a general
rule, select between 5 and 15 classes.
3- Determine the width of the class interval. Approximately, the class width is calculated by
dividing the range by the number of classes. Normally, the number is rounded up to the next
whole number.
4- Choose a start for the frequency distribution which must be equal to or lower than the lowest
number of the ungrouped data. This starting value is the lower boundary of the first class.
5- Add the class width from step 3 to the lower boundary of the first class of step 4 to get its
upper boundary.
6- Continue creating the remaining classes until you reach the number you chose in step 2. The
end value of the frequency distribution (the upper boundary of last class) should be higher
than the highest number of the ungrouped data.
7- Complete the other details of the frequency distribution: frequencies, midpoints, relative
frequencies and cumulative frequencies.
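
As a rough sketch, steps 1 to 6 and the frequency count of step 7 can be written as two small Python functions. The function names, the default of starting at the smallest value rounded down to a whole number, and the choice of 6 classes for the players' heights are assumptions made for this illustration.

```python
import math


def build_classes(data, num_classes, start=None):
    """Follow steps 1-6: return (class_width, [(lower, upper), ...])."""
    data_range = max(data) - min(data)              # step 1: range
    width = math.ceil(data_range / num_classes)     # step 3: round up to a whole number
    if start is None:
        start = math.floor(min(data))               # step 4: at or below the smallest value
    classes = [(start + i * width, start + (i + 1) * width)  # steps 5-6
               for i in range(num_classes)]
    if classes[-1][1] <= max(data):
        raise ValueError("last upper boundary must exceed the largest value")
    return width, classes


def count_frequencies(data, classes):
    """Step 7 (frequencies only), using left-end inclusion: lower <= x < upper."""
    return [sum(1 for x in data if lower <= x < upper) for lower, upper in classes]


# Players' heights (cm) from the previous example
heights = [
    161.2, 171.3, 165.8, 174.6, 181.3, 177.1, 169.2, 174.9, 169.8, 175.4,
    174.2, 179.2, 170.9, 170.0, 166.3, 173.7, 176.3, 171.0, 184.3, 167.5,
    180.8, 172.2, 165.0, 177.2, 173.5, 186.2, 178.1, 173.3, 185.6, 177.6,
    173.6, 165.2, 179.2, 171.4, 175.9, 164.8, 168.9, 170.9, 179.7, 171.7,
    188.7, 172.0, 178.9, 166.6, 183.6, 176.2, 174.4, 177.0, 160.0, 182.7,
]
width, classes = build_classes(heights, num_classes=6)
frequencies = count_frequencies(heights, classes)
```

With 6 classes the range of 28.7 gives a class width of 5 after rounding up, so the classes run from 160-165 to 185-190. Rounding the width up to a whole number suits data on this scale; data with a much smaller range would need a finer rounding rule.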

Frequency Distribution (cont.)
Example 1: For the following table which contains 60 years of raw data of the unemployment
rates for Canada, construct a frequency distribution including class midpoints,
relative frequencies, and cumulative frequencies.

2.3 7.0 6.3 11.3 9.6


2.8 7.1 5.6 10.6 9.1
3.6 5.9 5.4 9.7 8.3
2.4 5.5 7.1 8.8 7.6
2.9 4.7 7.1 7.8 6.8
3.0 3.9 8.0 7.5 7.2
4.6 3.6 8.4 8.1 7.7
4.4 4.1 7.5 10.3 7.6
3.4 4.8 7.5 11.2 7.2
4.6 4.7 7.6 11.4 6.8
6.9 5.9 11.0 10.4 6.3
6.0 6.4 12.0 9.5 6.0

Frequency Distribution (cont.)
Solution:
1- Range = 12.0 - 2.3 = 9.7
2- Let the number of classes = 6
3- Class width = 9.7 / 6 ≈ 1.62
Use class width = 2
4- Let the start of the frequency distribution = 1 ⇒ First class lower boundary = 1
5- First class upper boundary = 1 + 2 = 3 ⇒ First class is: 1-3
6- The other classes are: 3-5, 5-7, 7-9, 9-11 and 11-13.
7- Complete the other details of the frequency distribution by determining the
frequencies, midpoints, relative frequencies and cumulative frequencies.

Frequency Distribution (cont.)
The frequency distribution is as shown in the following table.

Class    Frequency   Class Midpoint   Relative Frequency   Cumulative Frequency
1-3      4           2                0.0667               4
3-5      12          4                0.2000               16
5-7      13          6                0.2167               29
7-9      19          8                0.3167               48
9-11     7           10               0.1167               55
11-13    5           12               0.0833               60
Total    60
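
The columns of this table can be checked by counting directly from the raw data of Example 1. The Python sketch below assumes the left-end inclusion convention (lower ≤ x < upper); the variable names are illustrative.

```python
from itertools import accumulate

# Unemployment rates from Example 1, row by row
rates = [
    2.3, 7.0, 6.3, 11.3, 9.6,
    2.8, 7.1, 5.6, 10.6, 9.1,
    3.6, 5.9, 5.4, 9.7, 8.3,
    2.4, 5.5, 7.1, 8.8, 7.6,
    2.9, 4.7, 7.1, 7.8, 6.8,
    3.0, 3.9, 8.0, 7.5, 7.2,
    4.6, 3.6, 8.4, 8.1, 7.7,
    4.4, 4.1, 7.5, 10.3, 7.6,
    3.4, 4.8, 7.5, 11.2, 7.2,
    4.6, 4.7, 7.6, 11.4, 6.8,
    6.9, 5.9, 11.0, 10.4, 6.3,
    6.0, 6.4, 12.0, 9.5, 6.0,
]

width = 2
lowers = [1, 3, 5, 7, 9, 11]  # lower boundaries of the classes 1-3, 3-5, ..., 11-13

# Left-end inclusion: a rate r is counted in the class with lower <= r < lower + width
frequencies = [sum(1 for r in rates if lo <= r < lo + width) for lo in lowers]
total = len(rates)
midpoints = [lo + width / 2 for lo in lowers]
relative = [round(f / total, 4) for f in frequencies]
cumulative = list(accumulate(frequencies))
```

Values that sit exactly on a boundary, such as 3.0, 7.0 and 11.0, are counted in the class that starts at that boundary.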

Charts and Graphs
• One of the most effective methods for showing data to help decision makers is
graphical presentation.
• Using graphs and charts, the decision maker can often get an overall picture of the
data and reach some useful conclusions only by studying the chart or graph.

• The most commonly used types of charts and graphs in statistics to display data are:
1- Histogram
2- Frequency Polygon
3- Ogive
4- Stem-and-Leaf Plot
5- Pie Chart
6- Bar Graph
7- Scatter Diagram

Charts and Graphs (cont.)
1- Histogram
• The histogram is a graphical display of the frequency distribution.
• A histogram is a bar graph of class data, with the bars placed adjacent to each
other.
• The vertical axis of a histogram usually represents the class frequency, and the
resulting graph is called a histogram (or frequency histogram).
• If the vertical axis of a histogram represents the relative class frequency, then the
graph is called a relative frequency histogram.

• The steps of constructing a histogram are:


1- Label the class interval boundaries on a horizontal scale (x-axis).
2- Mark and label the vertical scale (y-axis) with the frequencies (or the relative
frequencies.)
3- Above each class, draw a rectangle whose height is equal to the frequency (or
relative frequency) corresponding to that class.
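
A quick way to preview a histogram without plotting software is a text rendering. The Python sketch below is an illustration only: it draws the bars horizontally, one row per class, whereas the histogram described above uses vertical rectangles. The function name `text_histogram` is an assumption, and the frequencies are those of the Example 1 frequency distribution.

```python
classes = ["1-3", "3-5", "5-7", "7-9", "9-11", "11-13"]
frequencies = [4, 12, 13, 19, 7, 5]


def text_histogram(labels, counts, symbol="#"):
    """Return one line per class; the bar length equals the class frequency."""
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {symbol * count} ({count})"
            for label, count in zip(labels, counts)]


lines = text_histogram(classes, frequencies)
print("\n".join(lines))
```

The longest bar identifies the class with the highest frequency, here 7-9.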

Charts and Graphs (cont.)
Example 2: Draw the frequency histogram for the set of data of example 1 using the results
already obtained.
Solution:
Using the results already obtained for example 1, the histogram is as shown in the following graph.

Histogram of Canadian Unemployment Data for 60 Years


Class Frequency
1-3 4
3-5 12
5-7 13
7-9 19
9-11 7
11-13 5

Charts and Graphs (cont.)
2- Frequency Polygon
• A frequency polygon, like the histogram, is a graphical display of class frequencies.
• In a frequency polygon each class frequency is plotted as a dot at the class midpoint,
and the dots are connected by a series of line segments.
• The vertical axis of a frequency polygon represents the class frequency.

• The steps of constructing a frequency polygon are:


1- Scale and label class midpoints along the horizontal axis (x-axis).
2- Mark and label the vertical scale (y-axis) with the frequencies.
3- Plot a dot for the associated frequency value at each class midpoint.
4- Connect these midpoint dots to complete the graph.

Charts and Graphs (cont.)
Example 3: Draw the frequency polygon for the set of data of example 1 using the obtained results.

Solution:
Using the obtained results for example 1, the frequency polygon is as shown in the following graph.

Frequency Polygon of Canadian Unemployment Data for 60 Years


Class    Frequency   Class Midpoint
1-3      4           2
3-5      12          4
5-7      13          6
7-9      19          8
9-11     7           10
11-13    5           12

Charts and Graphs (cont.)
3- Ogive
• An ogive (o-jive) is a cumulative frequency polygon.
• Ogives are graphs that are used to display how many numbers lie below or above a
particular variable or value.

• The steps of constructing an Ogive are:


1- Construction begins by labeling the x-axis with the class endpoints.
2- Mark and label the vertical scale (y-axis) with the frequencies. However, the use
of cumulative frequency values requires that the scale along the y-axis be large
enough to include the frequency total.
3- A dot of zero frequency is plotted at the beginning of the first class, and
construction proceeds by marking a dot at the end of each class interval for the
cumulative value.
4- Connect the dots with line segments to complete the ogive.
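
The points to plot in steps 3 and 4 can be computed as follows. This Python sketch uses the class endpoints and frequencies of Example 1; the function name `ogive_points` is an assumption for illustration.

```python
from itertools import accumulate

boundaries = [1, 3, 5, 7, 9, 11, 13]  # class endpoints of Example 1
frequencies = [4, 12, 13, 19, 7, 5]


def ogive_points(boundaries, frequencies):
    """Return (x, cumulative frequency) pairs for an ogive.

    The first point has zero frequency at the start of the first class;
    each later point sits at the end of a class interval.
    """
    cumulative = [0] + list(accumulate(frequencies))
    return list(zip(boundaries, cumulative))


points = ogive_points(boundaries, frequencies)
# [(1, 0), (3, 4), (5, 16), (7, 29), (9, 48), (11, 55), (13, 60)]
```

Reading the point at x = 7, for example, shows that 29 of the 60 values lie below 7.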

Charts and Graphs (cont.)
Example 4: Draw an ogive for the set of data of example 1 using the results previously found.
Solution:
Using the results found for example 1, the ogive is as shown in the graph below.

Ogive of Canadian Unemployment Data for 60 Years


Class    Frequency   Cumulative Frequency
1-3      4           4
3-5      12          16
5-7      13          29
7-9      19          48
9-11     7           55
11-13    5           60

Charts and Graphs (cont.)
4- Stem-and-Leaf Plot
• The stem-and-leaf plot technique is simple and provides an exceptional view of the data.
• A stem-and-leaf plot is constructed by separating the digits for each number of the
data into two groups, a stem and a leaf.
• The leftmost digits are the stem and consist of the higher valued digits.
• The rightmost digits are the leaves and contain the lower values.
• If a set of data has only two digits, the stem is the value on the left and the leaf is
the value on the right. For example, if 34 is one of the numbers, the stem is 3 and the
leaf is 4.
• For numbers with more than two digits, division of stem and leaf is a matter of
researcher preference.

Charts and Graphs (cont.)
Example 5: The following table contains grades from a Computer Programming exam given to 35
students. Draw a stem-and-leaf plot of this data.
86 77 91 60 55 76 92
47 88 67 23 59 72 75
83 77 68 82 97 89 81
75 74 39 67 79 83 70
78 91 68 49 56 94 81
Solution:
Stem-and-leaf plot is shown below.

Stem-and-Leaf Plot of Computer Programming Exam Grades


Stem Leaf
2 3
3 9
4 7 9
5 5 6 9
6 0 7 7 8 8
7 0 2 4 5 5 6 7 7 8 9
8 1 1 2 3 3 6 8 9
9 1 1 2 4 7
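
The same plot can be produced programmatically. The Python sketch below assumes whole-number grades, with the tens digit as the stem and the units digit as the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem) and units digit (leaf)."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting first keeps stems and leaves in order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Computer Programming exam grades from Example 5
grades = [
    86, 77, 91, 60, 55, 76, 92,
    47, 88, 67, 23, 59, 72, 75,
    83, 77, 68, 82, 97, 89, 81,
    75, 74, 39, 67, 79, 83, 70,
    78, 91, 68, 49, 56, 94, 81,
]
plot = stem_and_leaf(grades)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because every grade appears as exactly one leaf, the number of leaves equals the number of students, which is a useful check on a hand-drawn plot.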

Charts and Graphs (cont.)
Example 6: The following data represents the costs (in dollars) of a sample of 30 postal mailings
by a company. Use the whole numbers as a stem and the decimal places as a leaf.
3.67 2.75 9.15 5.11 3.32 2.09
1.83 10.94 1.93 3.89 7.20 2.78
6.72 7.80 5.47 4.15 3.55 3.53
3.34 4.95 5.42 8.64 4.84 4.10
5.10 6.45 4.65 1.97 2.84 3.21
Solution:
Using dollars as a stem and cents as a leaf, a stem-and-leaf plot of the data is as shown below.
Stem-and-Leaf Plot of Postal Mailings Costs
Stem Leaf
1 83 93 97
2 09 75 78 84
3 21 32 34 53 55 67 89
4 10 15 65 84 95
5 10 11 42 47
6 45 72
7 20 80
8 64
9 15
10 94

Charts and Graphs (cont.)
5- Pie Chart
• A pie chart is a circular diagram of data where the area of the whole pie represents
100% of the data and slices of the pie represent the percentage breakdown of the
categories.
• Pie charts show the relative magnitudes of the parts to the whole.

• The steps of constructing a Pie Chart are:


1- Determine the proportion of the item to the whole by dividing each item by the
total.
2- Because a circle contains 360°, each proportion is then multiplied by 360 to obtain
the correct number of degrees to represent each item.
3- The pie chart is then completed by determining each of the other angles and then
drawing the slices.

Charts and Graphs (cont.)
Example 7: The following table contains annual sales for the top petroleum refining companies in
the United States in the last year. Construct a pie chart from this data.

Company Annual Sales ($ millions)


Exxon Mobil 372,824
Chevron 210,783
Conoco Phillips 178,558
Valero Energy 96,758
Marathon Oil 60,044
Sunoco 42,101
Totals 961,068

Solution:
• First convert the raw sales figures to proportions by dividing each sales figure by the total sales
figure. This proportion is analogous to the relative frequency computed for frequency distributions.

Charts and Graphs (cont.)
• Multiply each proportion by 360 to obtain the angle in degrees for each item.

Company           Annual Sales ($ millions)   Proportion   Angle (degrees)
Exxon Mobil       372,824                     0.3879       139.64
Chevron           210,783                     0.2193       78.95
Conoco Phillips   178,558                     0.1858       66.89
Valero Energy     96,758                      0.1007       36.25
Marathon Oil      60,044                      0.0625       22.50
Sunoco            42,101                      0.0438       15.77
Totals            961,068                     1.0000       360.00

[Figure: pie chart, "Petroleum Refining Companies’ Sales in the Last Year"]
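
The proportion and angle columns can be reproduced with a short Python sketch. Rounding each proportion to four decimal places before multiplying by 360 is an assumption inferred from the tabulated values; multiplying unrounded proportions gives angles that can differ in the second decimal place.

```python
# Annual sales ($ millions) from Example 7
sales = {
    "Exxon Mobil": 372824,
    "Chevron": 210783,
    "Conoco Phillips": 178558,
    "Valero Energy": 96758,
    "Marathon Oil": 60044,
    "Sunoco": 42101,
}

total = sum(sales.values())
# Proportion of each company relative to the whole, rounded to 4 decimals
proportions = {name: round(value / total, 4) for name, value in sales.items()}
# A full circle is 360 degrees, so each slice gets proportion * 360 degrees
angles = {name: round(p * 360, 2) for name, p in proportions.items()}

for name in sales:
    print(f"{name:<16} {proportions[name]:.4f} {angles[name]:>7.2f}")
```

The angles should add up to 360 degrees, apart from small rounding differences.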

Charts and Graphs (cont.)
6- Bar Graph
• A bar graph (or bar chart) contains two or more categories along one axis and a
series of bars, one for each category, along the other axis.
• The length of the bar represents the magnitude of the variable (amount, frequency,
money, percentage, etc.) for each category.
• A bar graph generally is constructed from the same type of data that is used to
produce a pie chart.
• The bar graph may be either horizontal or vertical.
• When a bar graph is used to represent the values for more than one object within the
same category, it is called a grouped bar graph.

Charts and Graphs (cont.)
Example 8: For the data set of example 7, draw a vertical bar graph.

Solution:
• A vertical bar graph is as shown below.

[Figure: vertical bar graph, "Petroleum Refining Companies’ Sales in the Last Year". Axes: Company and Annual Sales ($ millions), with the sales scale running from 0 to 400,000. Bar values: Exxon Mobil 372,824; Chevron 210,783; Conoco Phillips 178,558; Valero Energy 96,758; Marathon Oil 60,044; Sunoco 42,101.]

Charts and Graphs (cont.)
Example 9: For the data set of example 7, draw a horizontal bar graph.

Solution:
• A horizontal bar graph is as shown below.

[Figure: horizontal bar graph, "Petroleum Refining Companies’ Sales in the Last Year". Vertical axis: Company; horizontal axis: Annual Sales ($ millions), from 0 to 400,000. Bar values: Sunoco 42,101; Marathon Oil 60,044; Valero Energy 96,758; Conoco Phillips 178,558; Chevron 210,783; Exxon Mobil 372,824.]

Charts and Graphs (cont.)
Example 10: A car company produces 4 different colors of one of its car models. The sales for 6
months are shown in the table below. Represent this data using a vertical bar chart.

            Number of Cars Sold
Month       White   Red     Grey    Blue
January     7000    6000    4500    4000
February    7500    5500    6000    4500
March       8000    7000    6500    5500
April       8500    6000    7000    5000
May         8500    6500    5500    4500
June        9500    6000    5000    3500

Charts and Graphs (cont.)
Solution:
• Since there is more than one value for each category, a grouped bar graph should be used,
as shown below.

[Figure: grouped vertical bar graph, "Number of Cars Sold During 6 Months". Horizontal axis: Month (January to June); vertical axis: Number of Cars Sold (0 to 10000); four bars per month, for White Cars, Red Cars, Grey Cars and Blue Cars.]

Charts and Graphs (cont.)
7- Scatter Diagram
• A scatter diagram (or scatter plot) is a two-dimensional plot of pairs of values
of two numerical variables.
• Scatter diagrams are used to explore (or investigate) the relationship between two
numerical variables.
• A positive relationship between the two variables would be indicated if the points
suggest a line going up from left to right, which means that as one variable increases,
the other increases too.
• A negative relationship between the two variables would be indicated if the points
suggest a line going down from left to right, meaning that as one variable increases,
the other decreases.
• If the scatter plot dots suggest a completely horizontal line, a completely vertical
line, or no line at all, then no relationship would be indicated.
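
The direction of a relationship can also be checked numerically. The Python sketch below uses the Pearson correlation coefficient, which is a technique brought in here for illustration and is not introduced in these slides: a value near +1 corresponds to points rising from left to right, and a value near -1 to points falling. The data are hypothetical, and returning 0 for a perfectly horizontal or vertical pattern is a convention adopted for this sketch, since the coefficient is undefined in that case.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired values (between -1 and +1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # Horizontal or vertical line: treated here as "no relationship"
        return 0.0
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration data, not taken from the lecture
x = [1, 2, 3, 4, 5]
y_up = [2.1, 3.9, 6.2, 8.1, 9.8]     # rises with x: positive relationship
y_down = [9.8, 8.1, 6.2, 3.9, 2.1]   # falls as x rises: negative relationship
y_flat = [5, 5, 5, 5, 5]             # horizontal line: no relationship

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_flat = pearson_r(x, y_flat)
```

A coefficient close to zero only rules out a straight-line relationship, so the scatter diagram itself should still be inspected.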

Charts and Graphs (cont.)

An example of positive correlation An example of negative correlation

Scatter plots which display no relationship between the variables plotted

Charts and Graphs (cont.)
Example 11: In the table below, y is the purity of oxygen produced in a chemical distillation process,
and x is the percentage of hydrocarbons present in the main condenser of the
distillation unit. Present a scatter diagram of this data and give your conclusion about
the relationship between x and y.
Test Hydrocarbons Level Purity
Number x (%) y (%)

Charts and Graphs (cont.)
Solution:
• A scatter diagram of the data set is as shown below.

• The scatter diagram shows that the points lie scattered randomly around a straight line, and there
is a positive relationship between percentage of hydrocarbons and purity of oxygen produced.
