CE Data Analysis Chap1.
DESCRIPTIVE STATISTICS
DESCRIPTIVE AND INDUCTIVE STATISTICS
◦Statistics
is a science that deals with the methods of collecting,
organizing, summarizing and interpreting data in
order to draw valid conclusions (or inferences)
from them. These methods are categorized as
belonging to two major areas called descriptive
statistics and inductive or inferential statistics.
◦Descriptive statistics
is concerned with the collection and
presentation of data and the description of
some of their features to yield meaningful
information without attempting to draw
any inferences from them.
◦Inductive or inferential statistics
is concerned with the development and use
of mathematical tools to go beyond data
presentation and make forecasts and
inferences. The concept of probability is basic
to the development and understanding of
inductive statistics.
POPULATION, SAMPLE AND VARIABLES
One of the goals of statistical investigation is to acquire
information or draw some conclusions about a large group of items
on the basis of a few. When it is impossible or impractical to
observe the entire set of observations, we must, therefore, depend
on a subset of the observations.
A population consists of all the individuals or objects in a group
under study. A subcollection of items drawn from a population
under study is called a sample. The characteristic that is being
studied is a variable. A variable may be qualitative or quantitative.
STATISTICS
DESCRIPTIVE
1. Organizing and summarizing data using numbers and graphs
2. Data summary: bar graphs, histograms, pie charts, etc.; shape of graph and skewness
3. Measures of central tendency: mean, median, and mode
4. Measures of variability: range, variance, and standard deviation
INFERENTIAL
1. Using sample data to make an inference or draw a conclusion about the population
2. Uses probability to determine how confident we can be that the conclusions we make are correct (confidence intervals and margins of error)
QUALITATIVE VARIABLES
• Categorical or nonnumeric
-gender
-eye color
-religious affiliation
-political affiliation
-major
QUANTITATIVE VARIABLES
• How much or how many
-company revenues
-age
-salary
-IQ
Discrete variables –
can only assume certain values, and gaps exist between those values
Continuous variables –
can assume any value within a certain range
-profits
-square footage
COLLECTION OF DATA
Statistics deals also with the development of techniques for collecting
data. Data should be properly collected so that an investigator may be able
to answer the question under consideration with a reasonable degree of
confidence.
The simplest method for ensuring a representative selection of samples
is to take a simple random sample. In this method of sampling, any
particular subset of the specified size has the same chance of being
selected. For example, suppose a sample of size 100 is to be taken from
100,000 items produced. Each item’s serial number could be written on
identical small sheets of paper, placed in a box and jumbled thoroughly,
and 100 sheets then drawn.
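The paper-slip procedure above can be imitated in software. The following is a minimal sketch, assuming the items are identified by hypothetical serial numbers 1 to 100,000; the function name and the seed are illustrative only.

```python
import random

def simple_random_sample(serial_numbers, size, seed=None):
    """Draw `size` items without replacement.

    random.sample gives every subset of the requested size the same
    chance of being selected, which is the defining property of a
    simple random sample.
    """
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return rng.sample(serial_numbers, size)

# Hypothetical serial numbers standing in for the 100,000 items produced.
items = range(1, 100_001)
sample = simple_random_sample(items, 100, seed=1)
print(len(sample), "items drawn, all distinct:", len(set(sample)) == 100)
```

Leaving `seed` as `None` gives a different sample on each run, which is what one would normally want in practice.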
◦Another method is stratified sampling. This involves
dividing the population into non-overlapping groups and
taking a sample from each group. For instance, suppose a
light bulb manufacturer wishes to investigate the lifetime
of its bulbs. If 25-watt, 60-watt, and 100-watt bulbs are
produced, a separate sample could be selected from
each of the three bulb sizes. This would result in
information on all three bulb sizes.
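As a rough illustration of the light bulb case, the sketch below draws a separate simple random sample from each group. The bulb identifiers, the group sizes, and the sample size of 5 per group are assumptions made for the example, not values from the text.

```python
import random

def stratified_sample(strata, size_per_stratum, seed=None):
    """Take a simple random sample of the same size from every stratum.

    `strata` maps a group name to the list of units in that group;
    the groups are assumed not to overlap.
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, size_per_stratum)
            for name, units in strata.items()}

# Hypothetical bulb IDs for the three wattages mentioned above.
strata = {
    "25-watt": [f"25W-{i}" for i in range(1000)],
    "60-watt": [f"60W-{i}" for i in range(1000)],
    "100-watt": [f"100W-{i}" for i in range(1000)],
}
sample = stratified_sample(strata, 5, seed=1)
for name, units in sample.items():
    print(name, units)
```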
TABULAR AND GRAPHICAL METHODS IN DESCRIPTIVE STATISTICS
◦ Descriptive statistics can be divided into two subject areas.
The first area consists of data presentation using visual
techniques in the form of tables and graphs. The other area
consists of numerical summary measures for data sets.
◦ Many visual techniques may be familiar to you. However,
we focus our discussion on a few selected techniques of
data presentation that are most useful and
relevant to probability and inferential statistics.
FREQUENCY DISTRIBUTIONS
The organization of data in tabular form yields frequency
distributions. Data in frequency distributions may be grouped or
ungrouped.
Raw data are collected data that have not been organized
numerically. An arrangement of raw data in ascending or descending
order of magnitude is an array. In an array, any value may appear
several times. The number of times a value appears in the listing is its
frequency. The relative frequency of any observation is obtained by
dividing the actual frequency of the observation by the total
frequency.
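The definitions of array, frequency and relative frequency can be sketched in a few lines. The raw data below are made up for illustration and do not come from any example in the text.

```python
from collections import Counter

# Hypothetical raw data (not from the text).
raw = [3, 1, 2, 3, 3, 2, 1, 3]

array = sorted(raw)              # raw data arranged in ascending order
frequency = Counter(array)       # how many times each value appears
total = sum(frequency.values())  # total frequency, equal to len(raw)
relative = {value: f / total for value, f in frequency.items()}

for value in sorted(frequency):
    print(value, frequency[value], relative[value])
```

The relative frequencies always add up to 1, which is a quick check on any frequency table.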
UNGROUPED DATA
When the data set is small (n ≤ 30) or when there are few distinct values, the data may be organized without
grouping.
EXAMPLE 1.1
A certain machine is to dispense 1.5 kilos of sodium nitrate. To determine whether it is properly adjusted
to dispense 1.5 kilos, the quality control engineer weighed 30 bags of sodium nitrate, nominally 1.5 kilos each, after the
machine was adjusted. The data given below refer to the net weight (in kilos) of each bag.
146 126 119 119 105 132 126 118 100 113
80-89 II 2 0.04
∑ 50 1.00
24.5 23.6 24.1 25.0 22.9 24.7 23.8 25.2 23.7 24.4
24.7 23.9 25.1 24.6 23.3 24.3 24.6 23.9 24.1 24.4
24.5 25.7 23.6 24.0 23.9 24.2 24.7 24.9 25.0 24.8
24.5 23.4 24.9 24.8 24.7 24.1 22.8 23.1 25.3 24.6
The lowest value is 22.8; therefore, 22.5 may be taken as the
lower limit of the 1st class. With a class size of 0.5,
22.5 + 0.5 = 23.0 is the lower limit of the 2nd class.
Gasoline consumption   Tally                No. of cars      Relative
(miles/gallon)                              (frequency, f)   frequency
22.5 – 22.9            II                   2                0.050
23.0 – 23.4            III                  3                0.075
23.5 – 23.9            IIII – II            7                0.175
24.0 – 24.4            IIII – III           8                0.200
24.5 – 24.9            IIII – IIII – IIII   14               0.350
25.0 – 25.4            IIII                 5                0.125
25.5 – 25.9            I                    1                0.025
∑                                           40               1.000
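The tallying in this table can be reproduced from the 40 gasoline consumption readings listed above. This is a minimal sketch: the function name is illustrative, and it assumes the data are recorded to one decimal place so that each reading can be converted to an integer number of tenths before it is assigned to a class.

```python
def grouped_frequencies(values, lower_limit, width, n_classes):
    """Count how many values fall in each of n_classes equal-width classes.

    Values are converted to integer tenths so that class membership does
    not depend on floating-point rounding (assumes one-decimal data).
    """
    lo = round(lower_limit * 10)
    w = round(width * 10)
    counts = [0] * n_classes
    for v in values:
        k = (round(v * 10) - lo) // w
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[k] += 1
    return counts

mileage = [
    24.5, 23.6, 24.1, 25.0, 22.9, 24.7, 23.8, 25.2, 23.7, 24.4,
    24.7, 23.9, 25.1, 24.6, 23.3, 24.3, 24.6, 23.9, 24.1, 24.4,
    24.5, 25.7, 23.6, 24.0, 23.9, 24.2, 24.7, 24.9, 25.0, 24.8,
    24.5, 23.4, 24.9, 24.8, 24.7, 24.1, 22.8, 23.1, 25.3, 24.6,
]

counts = grouped_frequencies(mileage, lower_limit=22.5, width=0.5, n_classes=7)
relative = [c / len(mileage) for c in counts]
print(counts)
print(relative)
```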
TABLE 1.4 CLASS LIMITS, CLASS BOUNDARIES AND CLASS MARKS FOR THE FREQUENCY DISTRIBUTION PRESENTED IN TABLE 1.2
Classes Class Boundaries Class Marks
Table 1.5 class limits, class boundaries and class marks of frequency distribution presented in Table 1.3
NUMERICAL SUMMARY MEASURES
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$$
MEASURES OF CENTRAL TENDENCY: MEAN, MEDIAN AND MODE
MEAN
The arithmetic mean, or simply the mean, is the overall average.
If the data represent the entire population, the mean of the
values is referred to as the population mean, μ. This mean is a
quantitative measure describing a characteristic of the population
and is therefore a parameter. If the data constitute a sample
drawn from a population, the mean is referred to as the sample
mean, $\bar{x}$, which is a statistic.
If there are n observations with numerical values $x_1, x_2, \dots, x_n$, then the
sample mean is given by
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Fundamentals of Probability and Statistics of Engineering
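The population mean, μ, is given by
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
For data organized in a frequency distribution, where $f_i$ is the frequency of $x_i$ and $n = \sum f_i$,
$$\bar{x} = \frac{\sum f_i x_i}{n}$$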
Example 1.5
The following data represent the time in seconds for 9 glued
samples to dry and attain their bond strength: 3.6, 2.5, 3.1, 4.3,
2.4, 2.9, 2., 4.1 and 3.4. Calculate the mean.
SOLUTION:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{28.8}{9} = 3.2 \text{ seconds}$$
Example 1.6
Find the mean weight of sodium nitrate in Example 1.1
SOLUTION:
Weight (kilos)   No. of bags (frequency, f)   fx
1.46             6                            8.76
1.48             4                            5.92
1.49             5                            7.45
1.50             6                            9.00
1.52             9                            13.68
∑                30                           44.81
$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{44.81}{30} = 1.49 \text{ kilos}$$
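The frequency-weighted mean used in Example 1.6 can be checked with a short sketch. The function name is illustrative; the weights and bag counts are the ones tabulated in the example.

```python
def frequency_mean(values, freqs):
    """Mean of data given as distinct values with their frequencies:
    sum(f * x) divided by n, where n = sum(f)."""
    n = sum(freqs)
    total = sum(f * x for x, f in zip(values, freqs))
    return total / n

weights = [1.46, 1.48, 1.49, 1.50, 1.52]  # kilos
bags = [6, 4, 5, 6, 9]                    # frequency of each weight

mean_weight = frequency_mean(weights, bags)
print(round(mean_weight, 2), "kilos")
```

The same function applies to grouped data if the class marks are passed in place of the individual values.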
Example 1.7
◦ Find the mean of gasoline consumption in Example 1.3
Solution:
Classes        f    Class mark, x   fx
22.5 – 22.9    2    22.7            45.4
23.0 – 23.4    3    23.2            69.6
23.5 – 23.9    7    23.7            165.9
24.0 – 24.4    8    24.2            193.6
24.5 – 24.9    14   24.7            345.8
25.0 – 25.4    5    25.2            126.0
25.5 – 25.9    1    25.7            25.7
∑              40                   972.0
$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{972}{40} = 24.3 \text{ miles/gallon}$$
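MEDIAN
For n observations $x_1, x_2, \dots, x_n$ arranged in order of magnitude, the median is
$$\tilde{x} = x_{(n+1)/2} \quad \text{if } n \text{ is odd}$$
$$\tilde{x} = \frac{x_{n/2} + x_{n/2+1}}{2} \quad \text{if } n \text{ is even}$$
The sample median $\tilde{x}$ is used to estimate the population median $\tilde{\mu}$.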
Example 1.8
For the set of numbers 1, 3, 3, 5, 6, 8, 9, 9, 10
$$\tilde{x} = x_5 = 6$$
Example 1.9
For the set of numbers 4, 4, 7, 9, 11, 12, 15, 18
$$\tilde{x} = \frac{x_4 + x_5}{2} = \frac{9 + 11}{2} = 10$$
Example 1.10
Find the median of the data in Example 1.5.
Solution:
Arrange the data in ascending magnitude: 2.3, 2.5, 2.6, 2.9, 3.1, 3.4, 3.6, 4.1, 4.3
$$\tilde{x} = x_5 = 3.1 \text{ seconds}$$
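The odd and even cases of the median can be sketched as one small function and checked against Examples 1.8 to 1.10. The function name is illustrative; Python's standard `statistics.median` would give the same results.

```python
def median(values):
    """Middle value of the ordered data (odd n), or the average of the
    two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # x_{(n+1)/2}, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2   # average of x_{n/2} and x_{n/2+1}

print(median([1, 3, 3, 5, 6, 8, 9, 9, 10]))       # Example 1.8
print(median([4, 4, 7, 9, 11, 12, 15, 18]))       # Example 1.9
print(median([2.3, 2.5, 2.6, 2.9, 3.1, 3.4, 3.6, 4.1, 4.3]))  # Example 1.10
```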
$$\tilde{x} = L_m + \left(\frac{\frac{n}{2} - (\sum f)_L}{f_m}\right) C$$
Where:
$L_m$ = lower class boundary of the median class
$(\sum f)_L$ = sum of frequencies of all classes lower than the median class
$f_m$ = frequency of the median class
$C$ = size of the median class
Example 1.11
Determine the median of the data in Example 1.2. Refer to Table 1.8
Solution:
$$\tilde{x} = 23.95 + \left(\frac{20 - 12}{8}\right)(0.5) = 24.45$$
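The grouped-median formula can be sketched as follows. The function name is illustrative. The frequencies are those of the gasoline consumption table, and the lower class boundaries are assumed to sit 0.05 below each lower class limit, which matches the 23.95 used in the solution above.

```python
def grouped_median(lower_boundaries, freqs, class_size):
    """Median of grouped data: L_m + ((n/2 - cumulative f below) / f_m) * C."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("no observations")
    below = 0  # sum of frequencies of all classes lower than the current one
    for boundary, f in zip(lower_boundaries, freqs):
        if below + f >= n / 2:  # first class whose cumulative frequency reaches n/2
            return boundary + (n / 2 - below) / f * class_size
        below += f
    raise ValueError("median class not found")

boundaries = [22.45, 22.95, 23.45, 23.95, 24.45, 24.95, 25.45]
freqs = [2, 3, 7, 8, 14, 5, 1]

print(round(grouped_median(boundaries, freqs, 0.5), 2))
```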
MODE
Mode is the value which occurs with the greatest frequency. The sample mode
is designated as $\hat{x}$ and the population mode by $\hat{\mu}$.
Example 1.12
For the set of numbers 3, 3, 5, 7, 9, 10, 11, 10, 11, 12, 9, 18, 9
$$\hat{x} = 9 \text{ (unimodal)}$$
Example 1.13
The set of numbers 6, 7, 9, 10, 12 has no mode.
Example 1.14
For the set of values 2.2, 3.1, 4.1, 4.1, 5.4, 5.4, 5.4, 5.4, 6.2, 7.7, 7.7, 8.5, 8.5, 8.5, 9.3
$$\hat{x} = 5.4 \text{ and } 8.5 \text{ (bimodal)}$$
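Finding the mode amounts to counting occurrences and keeping the value or values with the highest count. The sketch below is one possible reading of the definition: it returns every value tied for the highest frequency, and returns an empty list when no value repeats, following Example 1.13. The function name and the last data set are illustrative.

```python
from collections import Counter

def modes(values):
    """Return the value(s) with the greatest frequency, in ascending order.

    An empty list means no mode: every value occurs exactly once.
    """
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of empty data is undefined")
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 3, 5, 7, 9, 10, 11, 10, 11, 12, 9, 18, 9]))  # Example 1.12
print(modes([6, 7, 9, 10, 12]))                              # Example 1.13
print(modes([1, 1, 2, 2, 3]))                                # hypothetical two-mode data
```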
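MODE OF GROUPED DATA
$$\hat{x} = L_{mo} + \left(\frac{d_1}{d_1 + d_2}\right) c$$
where:
$L_{mo}$ = lower class boundary of the modal class
$d_1$ = difference between the frequency of the modal class and that of the class preceding it
$d_2$ = difference between the frequency of the modal class and that of the class following it
$c$ = size of the modal class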
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
The sample variance s² is a statistic. A statistic that estimates the true parameter on the
average is said to be unbiased. Dividing by n will underestimate the population variance on the
average. To compensate for the bias in estimating σ², we use n – 1 in the divisor. The number
n – 1 is called the degrees of freedom.
Thus, if there are n numerical observations $x_1, x_2, \dots, x_n$ in a sample, the deviation
of each individual observation from the mean is $x_i - \bar{x}$.
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
Shortcut formula in Finding the Sample Variance
Therefore:
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1}$$
STANDARD DEVIATION
Standard deviation is the positive square root of the variance.
The population standard deviation σ = √σ² and the sample standard
deviation s = √s² .
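For data in a frequency distribution, where $f_i$ is the frequency of $x_i$ and $n = \sum f_i$:
$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n-1} \quad \text{or} \quad s^2 = \frac{\sum f_i x_i^2 - \frac{1}{n}\left(\sum f_i x_i\right)^2}{n-1}$$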
EXAMPLE 1.17
The following readings are the tensile strengths, in kg/cm², of six
specimens of carbon steel.
2.46 2.65 2.40 2.44 2.41 2.58
a) What is the mean tensile strength and the standard deviation of the tensile
strengths?
b) Suppose each reading is expressed in kg/m², what is the standard deviation?
SOLUTION:
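a) The mean is $\bar{X} = \frac{\sum X_i}{n} = \frac{14.94}{6} = 2.49$ kg/cm². Summing the squared deviations $(X_i - \bar{X})^2$ gives 0.0516, so
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} = \frac{0.0516}{5} \approx 0.01$$
Using the shortcut formula:
$$s^2 = \frac{1}{5}\left[37.2522 - \frac{(14.94)^2}{6}\right] = \frac{1}{5}(0.0516) \approx 0.01$$
$$s \approx 0.1 \text{ kg/cm}^2$$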
b.) If the readings are expressed in kg/m², each reading is multiplied by 10⁴, since 1 kg/cm² = 10⁴ kg/m².
Therefore, the standard deviation of the new set of data is
s = (10⁴)(0.1) = 1000 kg/m²
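Both forms of the sample variance, and the effect of changing units, can be checked numerically on the six tensile strength readings. This is a minimal sketch with illustrative function names; the standard library's `statistics.variance` and `statistics.stdev` compute the same quantities.

```python
import math

def sample_variance(xs):
    """Definition form: sum of squared deviations from the mean over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Shortcut form: (sum of x^2 minus (sum of x)^2 / n) over n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

strengths = [2.46, 2.65, 2.40, 2.44, 2.41, 2.58]  # kg/cm^2

s2 = sample_variance(strengths)
s = math.sqrt(s2)
# Multiplying every reading by 10^4 multiplies the standard deviation by 10^4.
s_per_m2 = math.sqrt(sample_variance([x * 1e4 for x in strengths]))

print(round(s2, 5), round(sample_variance_shortcut(strengths), 5))
print(round(s, 4), round(s_per_m2, 1))
```

Without intermediate rounding the standard deviation is closer to 0.1016 kg/cm², so the converted value comes out near 1016 kg/m²; the 1000 kg/m² in the solution follows from rounding s to 0.1 first.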
Example 1.18
◦ Determine the standard deviation of gasoline consumption in Example 1.3
Solution:
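With $x_i$ the class marks, $\sum f_i (x_i - \bar{x})^2 = 19.60$, so
$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n-1} = \frac{19.60}{39} = 0.503$$
◦ s = 0.71 mi/gal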
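The figures in this solution can be reproduced from the class marks and frequencies of the gasoline consumption table. The sketch below uses an illustrative function name and treats every observation in a class as if it sat at the class mark, which is the usual grouped-data approximation.

```python
import math

def grouped_variance(marks, freqs):
    """Return (mean, sum of f*(x - mean)^2, sample variance) for grouped data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(marks, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs))
    return mean, ss, ss / (n - 1)

marks = [22.7, 23.2, 23.7, 24.2, 24.7, 25.2, 25.7]  # class marks, miles/gallon
freqs = [2, 3, 7, 8, 14, 5, 1]

mean, ss, variance = grouped_variance(marks, freqs)
print(round(mean, 2), round(ss, 2), round(variance, 3), round(math.sqrt(variance), 2))
```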
Chapter 2
PROBABILITY refers to the study of randomness and uncertainty of
an outcome. The theory of probability provides methods that will
permit us to quantify the chances, or likelihood, associated with
various outcomes of an event.
P= nPr = n!/(n-r)!
3. The number of permutations of n objects of which n1 are identical,
n2 are identical, …, nm are identical is
P = n!/(n1! n2! … nm!)
4. The number of permutations of n distinct objects arranged
in a circle is
P = (n-1)!
COMBINATION
Combination is the number of ways of selecting r objects from n without regard to order.
The number of combinations of n objects taken r at a time is
nCr = n!/[r!(n-r)!]
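The counting formulas above map directly onto Python's `math` module (`math.perm` and `math.comb`, available from Python 3.8). The two helper functions are illustrative names for the repeated-object and circular cases, which have no built-in equivalent.

```python
import math
from collections import Counter

def permutations_with_repeats(objects):
    """n! / (n1! n2! ... nm!) for a sequence in which some objects are identical."""
    result = math.factorial(len(objects))
    for count in Counter(objects).values():
        result //= math.factorial(count)  # each division is exact
    return result

def circular_permutations(n):
    """(n - 1)! arrangements of n distinct objects in a circle."""
    return math.factorial(n - 1)

print(math.perm(4, 4))                           # all 4 digits 6, 7, 8, 9
print(permutations_with_repeats("MILLENNIUM"))   # letters with repeats
print(circular_permutations(4))                  # 4 letters in a circle
print(math.comb(9, 3))                           # 3 items chosen from 9
```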
EXAMPLE 2.1
How many numbers can be formed using all the digits 6, 7, 8, and 9?
Solution: To form different numbers, arrange all the 4 digits and the
arrangements are the number of numbers formed.
P=4! = 24 numbers
Example 2.2
How many distinct permutations are there in the word MILLENNIUM?
Solution: There are 2 M’s, 2 L’s, 2 I’s, and 2 N’s
P = 10!/(2! 2! 2! 2!) = 226,800
Example 2.3
In how many ways can 4 letters a, b, c and d be arranged in a circle?
Solution: P= (4-1)! = 3! = 6 ways
Example 2.4
From a box containing 4 defective and 5 non-defective items, how many
samples of size 3 are possible
A. With no restrictions
B. With 1 defective and 2 non-defective items
C. With 2 defective and 1 non-defective item, if a certain defective item must
be in the sample chosen
a) n = number of ways of selecting 3 from 9
n = 9C3 = 9!/[3!(9-3)!] = 9!/(3! 6!) = 84 samples
b) n = 4C1 · 5C2 = [4!/(1! 3!)] · [5!/(2! 3!)] = 4(10) = 40 samples
c) n = no. of samples with 2 defective and 1 non-defective item, with a certain defective item in the sample
chosen
n1 = no. of ways of selecting 1 defective from among the 3 remaining defective items
n2 = no. of ways of selecting 1 non-defective from among the 5 non-defective items
n = 3C1 · 5C1 = [3!/(1! 2!)] · [5!/(1! 4!)] = 3(5) = 15 samples
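The three counts in Example 2.4 can be cross-checked by listing every possible sample and counting those that meet each condition. The item labels are hypothetical, and `D1` stands in for the particular defective item that must be included in part c.

```python
from itertools import combinations
from math import comb

defective = ["D1", "D2", "D3", "D4"]
good = ["G1", "G2", "G3", "G4", "G5"]

# Every unordered sample of 3 items from the 9 in the box.
samples = list(combinations(defective + good, 3))

def n_defective(sample):
    """Number of defective items in a sample."""
    return sum(1 for item in sample if item in defective)

a = len(samples)                                        # no restrictions
b = sum(1 for s in samples if n_defective(s) == 1)      # 1 defective, 2 non-defective
c = sum(1 for s in samples
        if n_defective(s) == 2 and "D1" in s)           # 2 defective, D1 included

print(a, b, c)
print(comb(9, 3), comb(4, 1) * comb(5, 2), comb(3, 1) * comb(5, 1))
```

Enumeration is only practical for small boxes like this one; the combination formulas give the same counts without listing anything.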
Example 2.5
A jar contains 3 red marbles, 7 green marbles and 10 white marbles. If a
marble is drawn from the jar at random, what is the probability that this
marble is white?
A box of 10 fuses has two defective fuses. In how many ways can
one select three of these fuses and get
a.) neither of the defective fuses
b.) one defective fuse
c.) both of the defective fuses
PROBABILITY OF AN EVENT
The objective of probability is to assign to each event A a number P(A), called the probability of the
event A, which will give a precise measure of the chance that A will happen.
When all outcomes are equally likely, the probability of an event A is the ratio of the number of outcomes
favorable to A to the total number of outcomes. If NA is the number of outcomes favorable to event A and
N is the total number of outcomes in the sample space, then
P(A) = NA/N
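As a small illustration of this ratio, the sketch below uses exact fractions and the marble counts from Example 2.5 (3 red, 7 green, 10 white). It assumes every marble is equally likely to be drawn; the function name is illustrative.

```python
from fractions import Fraction

def probability(favorable, total):
    """P(A) = N_A / N for equally likely outcomes."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need 0 <= favorable <= total and total > 0")
    return Fraction(favorable, total)

jar = {"red": 3, "green": 7, "white": 10}
n_outcomes = sum(jar.values())

p_white = probability(jar["white"], n_outcomes)
print(p_white, float(p_white))
```

Using `Fraction` keeps the result exact, and the probabilities of the three colours add up to 1.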
Example 1.1
In the experiment of examining 3 bulbs, find the
probability of the following events:
a.) exactly 2 bulbs are defective
b.) at least 2 bulbs are defective
Solution:
The sample space for this experiment is