Engineering Statistics - Chapter 2 Jafar
College of Engineering
Department of Petroleum Engineering
41 Hussein Jasim Mohammed
Frequency Distribution
• Ungrouped data are raw data, that is, data that have not been classified or summarized in
any way.
• Grouped data are data that have been organized into a frequency distribution.
Frequency Distribution (cont.)
• The following table contains the data of lifetimes in hours of 200 light bulbs:
Item Lifetime (hours)
Frequency Distribution (cont.)
• The data of the previous table are grouped and presented in the following table, which
is called a frequency distribution:
Frequency Distribution (cont.)
• The basic properties of a frequency distribution are:
• A class interval is represented by two numbers called class boundaries or class endpoints.
• The smaller of the two class boundaries is the lower class boundary.
• The larger of the two class boundaries is the upper class boundary.
• The class width (also referred to as the class size or class length) of a frequency distribution is
the difference between the upper and lower class boundaries of any class in that frequency
distribution.
• The frequency of a class interval is the number of values that occur within that class interval.
Frequency Distribution (cont.)
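• For example, in the following frequency distribution the second class has lower class boundary 10,
upper class boundary 20, class width 20 - 10 = 10, and frequency 12: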
Class Interval Frequency
0-10 4
10-20 12
20-30 13
30-40 19
40-50 7
50-60 5
Total 60
Frequency Distribution (cont.)
Notice that the frequency distribution may be constructed in such a way that there is a gap between
the upper class boundary of each class and the lower class boundary of the next class.
• The following frequency distribution of the heights of 100 male soldiers at one of the Iraqi army units
is an example of this way of presenting a frequency distribution.
• The class end values in this case are called class limits.
• For example, for the second class, 165 is the lower class limit and 169 is the upper class limit.
Frequency Distribution (cont.)
• Class boundaries are selected so that no value of the data can fit into more than one class.
• A value can fit into two classes when it is equal to a class boundary.
• In the following table, values such as 8, 12, 16, or any other value equal to one of the class
boundaries (except the start and the end of the frequency distribution) can fit into two classes.
Class Frequency
4-8 3
8-12 7
12-16 12
16-20 28
20-24 8
24-26 2
Total 60
• In general, the problem of a data value fitting into more than one class can be
avoided by using the left-end inclusion convention, which means that a class interval contains its
left boundary but not its right boundary.
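The left-end inclusion convention can be illustrated with a minimal Python sketch. It assumes the class boundaries of the table above; the function name find_class is hypothetical and chosen only for illustration.

```python
# Class boundaries taken from the table above (4-8, 8-12, ..., 24-26).
boundaries = [4, 8, 12, 16, 20, 24, 26]

def find_class(value, boundaries):
    """Return (lower, upper) of the class containing value, or None."""
    for lower, upper in zip(boundaries, boundaries[1:]):
        # Left-end inclusion: the left boundary belongs to the class,
        # the right boundary belongs to the next class.
        if lower <= value < upper:
            return (lower, upper)
    return None  # value lies outside the frequency distribution

print(find_class(8, boundaries))    # (8, 12), not (4, 8)
print(find_class(12, boundaries))   # (12, 16)
print(find_class(7.9, boundaries))  # (4, 8)
```

With this rule each value equal to an interior boundary is counted exactly once, in the class that starts at that boundary.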
Frequency Distribution (cont.)
• This convention is explained in the following table:
• Notice that the left-end inclusion convention explained in the above table will be
used throughout the study of the Engineering Statistics subject.
Frequency Distribution (cont.)
• Other frequency distribution properties are:
• Class midpoint (or class mark) is the value halfway across the class interval and can
be calculated as the average of the two class boundaries.
• The relative frequency of a class is the frequency of that class divided by the total
frequency.
• The cumulative frequency for each class interval is the frequency for that class
interval added to the sum of the preceding classes’ frequencies.
• The following tables show a set of data about the heights of 50 male players at a
football club and the frequency distribution of this data set. The frequency distribution
includes classes, frequencies, class midpoints, relative frequencies and cumulative
frequencies.
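As a small illustration of these three properties, the following Python sketch computes them for the example frequency distribution given earlier (classes 0-10 through 50-60 with frequencies 4, 12, 13, 19, 7 and 5). The variable names are illustrative only.

```python
# Classes and frequencies from the earlier example table.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [4, 12, 13, 19, 7, 5]

total = sum(frequencies)  # total frequency = 60

# Class midpoint: average of the two class boundaries.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Relative frequency: class frequency divided by the total frequency.
relative = [f / total for f in frequencies]

# Cumulative frequency: running sum of the frequencies.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)

print("Class\tFreq\tMidpoint\tRelative\tCumulative")
for (lower, upper), f, m, r, c in zip(classes, frequencies, midpoints,
                                      relative, cumulative):
    print(f"{lower}-{upper}\t{f}\t{m}\t\t{r:.3f}\t\t{c}")
```

The relative frequencies always sum to 1, and the last cumulative frequency always equals the total frequency, which gives two quick checks on a hand-built table.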
Frequency Distribution (cont.)
Players’ Heights (cm), Ungrouped Data (Total 50 values)
161.2 171.3 165.8 174.6 181.3 177.1 169.2 174.9 169.8 175.4
174.2 179.2 170.9 170.0 166.3 173.7 176.3 171.0 184.3 167.5
180.8 172.2 165.0 177.2 173.5 186.2 178.1 173.3 185.6 177.6
173.6 165.2 179.2 171.4 175.9 164.8 168.9 170.9 179.7 171.7
188.7 172.0 178.9 166.6 183.6 176.2 174.4 177.0 160.0 182.7
Frequency Distribution (cont.)
• The steps of constructing a frequency distribution are:
1- Determine the range of the raw data. The range is defined as the difference between the
largest and smallest values. The range for the previous example = 188.7 - 160.0 = 28.7.
2- Determine the number of classes that the frequency distribution will contain. As a general
rule, select between 5 and 15 classes.
3- Determine the width of the class interval. Approximately, the class width is calculated by
dividing the range by the number of classes. Normally, the number is rounded up to the next
whole number.
4- Choose a start for the frequency distribution which must be equal to or lower than the lowest
number of the ungrouped data. This starting value is the lower boundary of the first class.
5- Add the class width from step 3 to the lower boundary of the first class of step 4 to get its
upper boundary.
6- Continue creating the remaining classes until you reach the number you chose in step 2. The
end value of the frequency distribution (the upper boundary of last class) should be higher
than the highest number of the ungrouped data.
7- Complete the other details of the frequency distribution: frequencies, midpoints, relative
frequencies and cumulative frequencies.
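The steps above can be sketched in Python for the players' heights data. This is only an illustration under stated assumptions: the choice of 6 classes and a start value of 160 are assumptions made here (the slides do not state which values were used), the function name build_frequency_distribution is hypothetical, and the left-end inclusion convention is applied when counting.

```python
import math

heights = [
    161.2, 171.3, 165.8, 174.6, 181.3, 177.1, 169.2, 174.9, 169.8, 175.4,
    174.2, 179.2, 170.9, 170.0, 166.3, 173.7, 176.3, 171.0, 184.3, 167.5,
    180.8, 172.2, 165.0, 177.2, 173.5, 186.2, 178.1, 173.3, 185.6, 177.6,
    173.6, 165.2, 179.2, 171.4, 175.9, 164.8, 168.9, 170.9, 179.7, 171.7,
    188.7, 172.0, 178.9, 166.6, 183.6, 176.2, 174.4, 177.0, 160.0, 182.7,
]

def build_frequency_distribution(data, num_classes, start):
    """Follow steps 1-7: range, class width, classes, then class details."""
    data_range = max(data) - min(data)               # step 1
    width = math.ceil(data_range / num_classes)      # step 3, rounded up
    total = len(data)
    rows = []
    cumulative = 0
    for i in range(num_classes):                     # steps 4-6
        lower = start + i * width
        upper = lower + width
        # Left-end inclusion: lower boundary included, upper excluded.
        frequency = sum(1 for x in data if lower <= x < upper)
        cumulative += frequency
        rows.append({                                # step 7
            "lower": lower,
            "upper": upper,
            "frequency": frequency,
            "midpoint": (lower + upper) / 2,
            "relative": frequency / total,
            "cumulative": cumulative,
        })
    return rows

# Assumed choices: 6 classes (step 2) starting at 160 (step 4).
rows = build_frequency_distribution(heights, num_classes=6, start=160)
for row in rows:
    print(f"{row['lower']}-{row['upper']}: f={row['frequency']}, "
          f"mid={row['midpoint']}, rel={row['relative']:.2f}, "
          f"cum={row['cumulative']}")
```

With these choices the class width is 28.7 / 6 = 4.78, rounded up to 5, and the classes run from 160-165 to 185-190, so the last upper boundary (190) is higher than the largest height (188.7), as step 6 requires.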
Frequency Distribution (cont.)
Example 1: For the following table, which contains 60 years of raw data of the unemployment
rates for Canada, construct a frequency distribution including class midpoints,
relative frequencies, and cumulative frequencies.
Frequency Distribution (cont.)
Solution:
1- Range = 12.0 - 2.3 = 9.7
2- Let the number of classes = 6
3- Class width = 9.7 / 6 = 1.62
Use class width = 2
4- Let the start of the frequency distribution = 1 ⇒ First class lower boundary = 1
5- First class upper boundary = 1 + 2 = 3 ⇒ First class is: 1-3
6- The other classes are: 3-5, 5-7, 7-9, 9-11 and 11-13.
7- Complete the other details of the frequency distribution by determining the
frequencies, midpoints, relative frequencies and cumulative frequencies.
Frequency Distribution (cont.)
The frequency distribution is as shown in the following table.
Charts and Graphs
• One of the most effective methods for showing data to help decision makers is
graphical presentation.
• Using graphs and charts, the decision maker can often get an overall picture of the
data and reach some useful conclusions only by studying the chart or graph.
• The most commonly used types of charts and graphs in statistics to display data are:
1- Histogram
2- Frequency Polygon
3- Ogive
4- Stem-and-Leaf Plot
5- Pie Chart
6- Bar Graph
7- Scatter Diagram
Charts and Graphs (cont.)
1- Histogram
• The histogram is a graphical display of the frequency distribution.
• A histogram is a bar graph of class data, with the bars placed adjacent to each
other.
• The vertical axis of a histogram usually represents the class frequency, and the
resulting graph is called a histogram (or frequency histogram).
• If the vertical axis of a histogram represents the relative class frequency, then the
graph is called a relative frequency histogram.
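A rough text-based sketch of a frequency histogram is given below. It uses the earlier example frequency distribution (classes 0-10 through 50-60) and draws each bar sideways with one symbol per observation; a plotting library would normally be used instead, and the function name text_histogram is hypothetical.

```python
# Classes and frequencies from the earlier example table.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [4, 12, 13, 19, 7, 5]

def text_histogram(classes, frequencies, symbol="#"):
    """Return one text line per class; bar length equals class frequency."""
    lines = []
    for (lower, upper), f in zip(classes, frequencies):
        label = f"{lower}-{upper}"
        lines.append(f"{label:>7} | {symbol * f} {f}")
    return lines

for line in text_histogram(classes, frequencies):
    print(line)

# For a relative frequency histogram, the bar heights would be these values.
total = sum(frequencies)
relative_heights = [f / total for f in frequencies]
```

Because the classes are adjacent and of equal width, the rows touch each other just as the bars of a histogram do.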
Charts and Graphs (cont.)
Example 2: Draw the frequency histogram for the set of data of example 1 using the results
already obtained.
Solution:
Using the results already obtained for example 1, the histogram is as shown in the following graph.
Charts and Graphs (cont.)
2- Frequency Polygon
• A frequency polygon, like the histogram, is a graphical display of class frequencies.
• In a frequency polygon each class frequency is plotted as a dot at the class midpoint,
and the dots are connected by a series of line segments.
• The vertical axis of a frequency polygon represents the class frequency.
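The points of a frequency polygon can be computed with a short Python sketch. It again assumes the earlier example frequency distribution; the function name polygon_points is hypothetical.

```python
# Classes and frequencies from the earlier example table.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [4, 12, 13, 19, 7, 5]

def polygon_points(classes, frequencies):
    """Return (class midpoint, class frequency) pairs, one per class."""
    return [((lower + upper) / 2, f)
            for (lower, upper), f in zip(classes, frequencies)]

points = polygon_points(classes, frequencies)
print(points)
# Joining consecutive points with straight line segments gives the polygon.
```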
Charts and Graphs (cont.)
Example 3: Draw the frequency polygon for the set of data of example 1 using the obtained results.
Solution:
Using the obtained results for example 1, the frequency polygon is as shown in the following graph.
Charts and Graphs (cont.)
3- Ogive
• An ogive (o-jive) is a cumulative frequency polygon.
• Ogives are graphs that are used to display how many data values lie below or above a
particular value.
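A minimal Python sketch of the points of a "less than" ogive follows. It assumes the common convention of plotting each cumulative frequency at the upper class boundary and starting from zero at the first lower boundary; the data are the earlier example frequency distribution, and ogive_points is a hypothetical name.

```python
# Classes and frequencies from the earlier example table.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [4, 12, 13, 19, 7, 5]

def ogive_points(classes, frequencies):
    """Return (boundary, cumulative frequency) pairs for a 'less than' ogive."""
    points = [(classes[0][0], 0)]  # no values lie below the first boundary
    running = 0
    for (lower, upper), f in zip(classes, frequencies):
        running += f
        points.append((upper, running))  # values below this upper boundary
    return points

points = ogive_points(classes, frequencies)
print(points)
```

Reading the point (30, 29), for instance, says that 29 of the 60 values lie below 30.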
Charts and Graphs (cont.)
Example 4: Draw an ogive for the set of data of example 1 using the results previously found.
Solution:
Using the results found for example 1, the ogive is as shown in the graph below.
Charts and Graphs (cont.)
4- Stem-and-Leaf Plot
• The stem-and-leaf plot technique is simple and provides an exceptional view of the data.
• A stem-and-leaf plot is constructed by separating the digits for each number of the
data into two groups, a stem and a leaf.
• The leftmost digits are the stem and consist of the higher valued digits.
• The rightmost digits are the leaves and consist of the lower valued digits.
• If a set of data has only two digits, the stem is the value on the left and the leaf is
the value on the right. For example, if 34 is one of the numbers, the stem is 3 and the
leaf is 4.
• For numbers with more than two digits, division of stem and leaf is a matter of
researcher preference.
Charts and Graphs (cont.)
Example 5: The following table contains grades from a Computer Programming exam given to 35
students. Draw a stem-and-leaf plot of this data.
86 77 91 60 55 76 92
47 88 67 23 59 72 75
83 77 68 82 97 89 81
75 74 39 67 79 83 70
78 91 68 49 56 94 81
Solution:
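The stem-and-leaf plot is shown below.
Stem-and-Leaf Plot of Computer Programming Exam Grades
Stem Leaf
2 3
3 9
4 7 9
5 5 6 9
6 0 7 7 8 8
7 0 2 4 5 5 6 7 7 8 9
8 1 1 2 3 3 6 8 9
9 1 1 2 4 7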
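The grouping of the 35 grades into stems and leaves can also be sketched in Python. The tens digit is taken as the stem and the units digit as the leaf; the function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

# Grades from Example 5.
grades = [
    86, 77, 91, 60, 55, 76, 92,
    47, 88, 67, 23, 59, 72, 75,
    83, 77, 68, 82, 97, 89, 81,
    75, 74, 39, 67, 79, 83, 70,
    78, 91, 68, 49, 56, 94, 81,
]

def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem); leaves are sorted."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 34 -> stem 3, leaf 4
        groups[stem].append(leaf)
    return dict(groups)

plot = stem_and_leaf(grades)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

Sorting the values first means the leaves on each stem come out in increasing order, which makes the shape of the distribution easier to read.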
Charts and Graphs (cont.)
Example 6: The following data represent the costs (in dollars) of a sample of 30 postal mailings
by a company. Draw a stem-and-leaf plot using the whole numbers as a stem and the decimal places as a leaf.
3.67 2.75 9.15 5.11 3.32 2.09
1.83 10.94 1.93 3.89 7.20 2.78
6.72 7.80 5.47 4.15 3.55 3.53
3.34 4.95 5.42 8.64 4.84 4.10
5.10 6.45 4.65 1.97 2.84 3.21
Solution:
Using dollars as a stem and cents as a leaf, a stem-and-leaf plot of the data is as shown below.
Stem-and-Leaf Plot of Postal Mailings Costs
Stem Leaf
1 83 93 97
2 09 75 78 84
3 21 32 34 53 55 67 89
4 10 15 65 84 95
5 10 11 42 47
6 45 72
7 20 80
8 64
9 15
10 94
Charts and Graphs (cont.)
5- Pie Chart
• A pie chart is a circular diagram of data where the area of the whole pie represents
100% of the data and slices of the pie represent the percentage breakdown of the
categories.
• Pie charts show the relative magnitudes of the parts to the whole.
Charts and Graphs (cont.)
Example 7: The following table contains annual sales for the top petroleum refining companies in
the United States in the last year. Construct a pie chart from this data.
Solution:
• First convert the raw sales figures to proportions by dividing each sales figure by the total sales
figure. This proportion is analogous to relative frequency computed for frequency distributions.
• Multiply each proportion by 360 to obtain the angle in degrees for each item.
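The two steps of the solution can be sketched in Python. The company names and sales figures below are hypothetical placeholders, not the data of Example 7.

```python
# Hypothetical sales figures, used only to illustrate the two steps.
sales = {"Company A": 50, "Company B": 30, "Company C": 20}

total_sales = sum(sales.values())

# Step 1: proportion = sales figure / total sales figure.
proportions = {name: value / total_sales for name, value in sales.items()}

# Step 2: angle in degrees = proportion * 360.
angles = {name: p * 360 for name, p in proportions.items()}

for name in sales:
    print(f"{name}: proportion = {proportions[name]:.2f}, "
          f"angle = {angles[name]:.1f} degrees")
```

The proportions sum to 1 and the angles sum to 360 degrees, which is a useful check before drawing the slices.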
Charts and Graphs (cont.)
6- Bar Graph
• A bar graph (or bar chart) contains two or more categories along one axis and a
series of bars, one for each category, along the other axis.
• The length of the bar represents the magnitude of the variable (amount, frequency,
money, percentage, etc.) for each category.
• A bar graph generally is constructed from the same type of data that is used to
produce a pie chart.
• The bar graph may be either horizontal or vertical.
• When a bar graph is used to represent the values for more than one object within the
same category, it is called a grouped bar graph.
Charts and Graphs (cont.)
Example 8: For the data set of example 7, draw a vertical bar graph.
Solution:
• A vertical bar graph is as shown below.
[Figure: vertical bar graph. Vertical axis: Annual Sales ($ millions), 0 to 200,000; categories: Exxon Mobil, Chevron, Conoco Phillips, Valero Energy, Marathon Oil, Sunoco; bar labels shown: 210,783; 178,558; 96,758; 60,044; 42,101.]
Charts and Graphs (cont.)
Example 9: For the data set of example 7, draw a horizontal bar graph.
Solution:
• A horizontal bar graph is as shown below.
[Figure: horizontal bar graph of the annual sales data; the bar label shown is Chevron, 210,783.]
Charts and Graphs (cont.)
Example 10: A car company produces 4 different colors of one of its car models. The sales for 6
months are shown in the table below. Represent this data using a vertical bar chart.
Charts and Graphs (cont.)
Solution:
• Since there is more than one value for each category, a grouped bar graph should be used,
as shown below.
[Figure: grouped vertical bar graph. Vertical axis: Number of Cars Sold (0 to 9000); horizontal axis: Month (January to June); series: White Cars, Red Cars, Grey Cars, Blue Cars.]
Charts and Graphs (cont.)
7- Scatter Diagram
• A scatter diagram (or scatter plot) is a two-dimensional plot of pairs of values
of two numerical variables.
• Scatter diagrams are used to explore (or investigate) the relationship between two
numerical variables.
• A positive relationship between the two variables would be indicated if the points
suggest a line going up from left to right, which means that as one variable increases,
the other increases too.
• A negative relationship between the two variables would be indicated if the points
suggest a line going down from left to right, meaning that as one variable increases,
the other decreases.
• If the scatter plot dots suggest a completely horizontal line, a completely vertical
line, or no line at all, then no relationship would be indicated.
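The direction of the relationship can also be checked numerically. The sketch below goes slightly beyond the slides: it computes the Pearson correlation coefficient, a standard measure that is positive for an upward trend and negative for a downward trend. The data points are hypothetical, the function name pearson_r is illustrative, and returning 0.0 for a completely horizontal or vertical line is a choice made here to match the last bullet above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired values xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # Completely vertical or horizontal line: no relationship indicated.
        return 0.0
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))   # upward line: positive value
print(pearson_r(x, [8, 6, 4, 2]))   # downward line: negative value
print(pearson_r(x, [5, 5, 5, 5]))   # horizontal line: 0.0
```

A value near +1 or -1 corresponds to points lying close to a rising or falling straight line; a value near 0 corresponds to no clear line.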
Charts and Graphs (cont.)
Example 11: In the table below, y is the purity of oxygen produced in a chemical distillation process,
and x is the percentage of hydrocarbons that are present in the main condenser of the
distillation unit. Present a scatter diagram of this data and give your conclusion about
the relationship between x and y.
Test Number    Hydrocarbons Level, x (%)    Purity, y (%)
Charts and Graphs (cont.)
Solution:
• A scatter diagram of the data set is as shown below.
• The scatter diagram shows that the points lie scattered randomly around a straight line, and there
is a positive relationship between percentage of hydrocarbons and purity of oxygen produced.