
CE Data Analysis Chap1.

Descriptive statistics is concerned with collecting, organizing, and summarizing data without drawing inferences. It involves presenting data through tables and graphs, and calculating numerical measures like mean, median and mode. Inferential statistics uses probability and mathematical tools to make forecasts and inferences beyond the data. A population is all individuals/objects in a group, while a sample is a subset selected from the population. Variables can be qualitative like gender, or quantitative like age. Data can be organized through ungrouped or grouped frequency distributions.

Uploaded by

Angel Antonio

CHAPTER 1

DESCRIPTIVE STATISTICS
DESCRIPTIVE AND INDUCTIVE
STATISTICS
◦Statistics
is a science that deals with the methods of collecting,
organizing, summarizing and interpreting data in
order to draw valid conclusions (or inferences)
from them. These methods are categorized as
belonging to two major areas called descriptive
statistics and inductive or inferential statistics.
◦Descriptive statistics
is concerned with the collection and
presentation of data and the description of
some of their features to yield meaningful
information without attempting to draw
any inferences from them.
◦Inductive or inferential statistics
is concerned with the development and use
of mathematical tools to go beyond data
presentation and make forecasts and
inferences. The concept of probability is basic
to the development and understanding of
inductive statistics.
POPULATION, SAMPLE AND
VARIABLES
One of the goals of statistical investigation is to acquire
information or draw some conclusions about a large group of items
on the basis of a few. When it is impossible or impractical to
observe the entire set of observations, we must, therefore, depend
on a subset of the observations.
A population consists of all the individuals or objects in a group
under study. A subcollection of items drawn from a population
under study is called a sample. The characteristic that is being
studied is a variable. A variable may be qualitative or quantitative.
STATISTICS
DESCRIPTIVE
1. Organizing and summarizing data using numbers and graphs
2. Data summary: bar graphs, histograms, pie charts, etc.; shape of graph and skewness
3. Measures of central tendency: mean, median, and mode
4. Measures of variability: range, variance, and standard deviation
INFERENTIAL
1. Using sample data to make an inference or draw a conclusion about the population
2. Uses probability to determine how confident we can be that the conclusions we make are correct (confidence intervals and margins of error)
QUALITATIVE VARIABLES
• Categorical or nonnumeric
- gender
- eye color
- religious affiliation
- political affiliation
- major
QUANTITATIVE VARIABLES
• How much or how many
- company revenues
- age
- salary
- IQ
Discrete variables –
can only assume certain values and gaps exist
between those values
Continuous variables –
can assume any value within a certain range
- profits
- square footage
COLLECTION OF DATA
Statistics deals also with the development of techniques for collecting
data. Data should be properly collected so that an investigator may be able
to answer the question under consideration with a reasonable degree of
confidence.
The simplest method for ensuring a representative selection of samples
is to take a simple random sample. In this method of sampling, any
particular subset of the specified size has the same chance of being
selected. For example, suppose a sample of size 100 will be taken from the 100,000
items produced. Each item’s serial number could be noted on identical
small sheets of paper, placed in a box and jumbled thoroughly, and 100 sheets then drawn.
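The paper-slip procedure above can be imitated in software. The following is a minimal sketch, assuming the 100,000 items carry serial numbers 1 to 100,000 (the numbering scheme, the function name and the seed are illustrative assumptions, not part of the original example).

```python
import random


def simple_random_sample(serial_numbers, size, seed=None):
    """Draw `size` distinct serial numbers; every subset of that size is equally likely."""
    rng = random.Random(seed)  # a fixed seed only makes the illustration repeatable
    return rng.sample(serial_numbers, size)


# Assumed numbering: items are labelled 1..100000.
serials = range(1, 100_001)
sample = simple_random_sample(serials, 100, seed=42)
print(len(sample), "items selected, for example:", sample[:5])
```

Sampling is done without replacement, which matches drawing slips from the box without returning them.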
◦Another method is stratified sampling. This involves
dividing the population into non-overlapping groups and
taking a sample from each group. For instance, the manufacturer of a
light bulb wishes to investigate the lifetime of their
bulbs. If 25-watt, 60-watt, and 100-watt bulbs were
produced, a separate sample could be selected from
each of the three bulb sizes. This would result in
information on all the three bulb sizes.
TABULAR AND GRAPHICAL
METHODS IN DESCRIPTIVE
STATISTICS
◦ Descriptive statistics can be divided into two subject areas.
The first area consists of data presentation using visual
techniques in the form of tables and graphs. The other area
consists of numerical summary measures for data sets.
◦ There may be many visual techniques which are familiar to
you, however, we focus our discussion on a selected few
techniques of data presentation that are most useful and
relevant to probability and inferential statistics.
FREQUENCY DISTRIBUTIONS
The organization of data in tabular form yields frequency
distributions. Data in frequency distributions may be grouped or
ungrouped.
Raw data are collected data that have not been organized
numerically. An arrangement of raw data in ascending or descending
order of magnitude is an array. In an array, any value may appear
several times. The number of times a value appears in the listing is its
frequency. The relative frequency of any observation is obtained by
dividing the actual frequency of the observation by the total
frequency.
UNGROUPED DATA
When the data is small (n ≤ 30) or when there are few distinct values, the data may be organized without
grouping.
EXAMPLE 1.1
A certain machine is to dispense 1.5 kilos of sodium nitrate. To determine whether it is properly adjusted
to dispense 1.5 kilos, the quality control engineer weighed 30 bags of sodium nitrate, nominally 1.5 kilos each, after the
machine was adjusted. The data given below refer to the net weight (in kilos) of each bag.

1.46 1.49 1.52 1.50 1.46 1.52

1.52 1.50 1.49 1.50 1.46 1.46

1.50 1.52 1.49 1.52 1.46 1.49

1.52 1.46 1.52 1.48 1.52 1.50

1.50 1.48 1.49 1.52 1.48 1.48

ARRANGE THE DATA IN A FREQUENCY TABLE.
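One way to build the ungrouped frequency table is to count how often each distinct weight occurs and divide by the total. The sketch below uses the 30 weights listed above; the variable names are illustrative.

```python
from collections import Counter

# Net weights (kilos) of the 30 bags from Example 1.1.
weights = [
    1.46, 1.49, 1.52, 1.50, 1.46, 1.52,
    1.52, 1.50, 1.49, 1.50, 1.46, 1.46,
    1.50, 1.52, 1.49, 1.52, 1.46, 1.49,
    1.52, 1.46, 1.52, 1.48, 1.52, 1.50,
    1.50, 1.48, 1.49, 1.52, 1.48, 1.48,
]

counts = Counter(weights)          # frequency of each distinct value
n = len(weights)                   # total frequency
table = [(w, counts[w], counts[w] / n) for w in sorted(counts)]

print("Weight  f   Relative f")
for w, f, rel in table:
    print(f"{w:.2f}   {f:2d}   {rel:.3f}")
```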


GROUPED DATA
Statistical data gathered in large masses (n ≥ 30) can be assessed by grouping the data
into different classes.
The following are suggested steps in forming a frequency distribution from raw data:
1. Find the range (R). The range is the difference between the largest and smallest
value.
2. Decide on a suitable number of classes. This will depend upon what information the
table is supposed to present. Sturges suggested the number of classes (m) as
m = 1 + 3.3 log n, where n = number of cases and log is the base-10 logarithm.
3. Determine the class size (c).
c = R/m
The class size (c) may be rounded off to the same place value as the data.
4. Find the number of observations in each class. This is the class frequency (f).
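Steps 1 to 3 can be sketched numerically. The figures below assume the compressive-strength example of this chapter (n = 50, smallest value 82, largest value 148); rounding the number of classes up to a whole number is one reasonable choice, not a rule stated in the text.

```python
import math


def sturges_classes(n):
    """Suggested number of classes, m = 1 + 3.3 log10(n)."""
    return 1 + 3.3 * math.log10(n)


n = 50                      # number of cases
smallest, largest = 82, 148
R = largest - smallest      # step 1: range
m = sturges_classes(n)      # step 2: about 6.6 classes
m_rounded = math.ceil(m)    # assumption: round up to a whole number of classes
c = R / m_rounded           # step 3: class size before rounding to a convenient width

print(f"R = {R}, m = {m:.2f} -> {m_rounded} classes, c = {c:.2f}")
```

A raw class size of about 9.4 would normally be rounded to a convenient width such as 10, which is the width used in the frequency table of that example.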
CLASS INTERVALS
◦ Classes represent the grouping or classification. The
range of values in a class is the class interval
consisting of a lower limit and an upper limit.
Whenever possible, we must make the class
interval of equal width and make the ranges
multiples of numbers which are easy to work with
such as 6, 10 or 100.
EXAMPLE
The following are data on the observed compressive strength in psi
of 50 samples of concrete interlocking blocks.
136 92 115 118 121 137 132 120 104 125

119 115 101 129 87 108 110 133 135 126

127 103 110 126 118 82 104 137 120 95

146 126 119 119 105 132 126 118 100 113

106 125 117 102 146 129 124 113 95 148

PREPARE THE FREQUENCY DISTRIBUTION TABLE.


The number of observed values tallied in each class is the class frequency. The
relative frequency of each class is also obtained and presented in Table
1.2.
Compressive Strength (psi)   Tally              No. of blocks (frequency, f)   Relative Frequency
80-89                        II                 2                              0.04
90-99                        III                3                              0.06
100-109                      IIIII-IIII         9                              0.18
110-119                      IIIII-IIIII-III    13                             0.26
120-129                      IIIII-IIIII-III    13                             0.26
130-139                      IIIII-II           7                              0.14
140-149                      III                3                              0.06
∑                                               50                             1.00

TABLE 1.2 FREQUENCY DISTRIBUTION OF COMPRESSIVE STRENGTH OF CONCRETE INTERLOCKING BLOCKS
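The tallying in Table 1.2 can be checked with a short script. This is a sketch that assumes the class limits 80-89, 90-99, ..., 140-149 used in the table; the variable names are illustrative.

```python
# Compressive strengths (psi) of the 50 concrete interlocking blocks.
strengths = [
    136, 92, 115, 118, 121, 137, 132, 120, 104, 125,
    119, 115, 101, 129, 87, 108, 110, 133, 135, 126,
    127, 103, 110, 126, 118, 82, 104, 137, 120, 95,
    146, 126, 119, 119, 105, 132, 126, 118, 100, 113,
    106, 125, 117, 102, 146, 129, 124, 113, 95, 148,
]

width = 10
lower_limits = list(range(80, 150, width))   # 80, 90, ..., 140

# Class frequency: number of observations between the lower and upper class limits.
freq = [sum(lo <= x <= lo + width - 1 for x in strengths) for lo in lower_limits]
n = len(strengths)
rel = [f / n for f in freq]

for lo, f, r in zip(lower_limits, freq, rel):
    print(f"{lo}-{lo + width - 1}: f = {f:2d}, relative f = {r:.2f}")
```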


EXERCISE 1.3
The following are the observed gasoline consumption in miles per gallon of 40
cars. Arrange the data in a frequency distribution.

24.5 23.6 24.1 25.0 22.9 24.7 23.8 25.2 23.7 24.4

24.7 23.9 25.1 24.6 23.3 24.3 24.6 23.9 24.1 24.4

24.5 25.7 23.6 24.0 23.9 24.2 24.7 24.9 25.0 24.8

24.5 23.4 24.9 24.8 24.7 24.1 22.8 23.1 25.3 24.6
The lowest value is 22.8; therefore, 22.5 may be the
lower limit of the 1st class. 22.5 + 0.5 = 23.0 is the
lower limit of the 2nd class.
Gasoline Consumption (miles/gallon)   Tally                No. of cars (frequency, f)   Relative Frequency
22.5 - 22.9                           II                   2                            0.050
23.0 - 23.4                           III                  3                            0.075
23.5 - 23.9                           IIII - II            7                            0.175
24.0 - 24.4                           IIII - III           8                            0.200
24.5 - 24.9                           IIII - IIII - IIII   14                           0.350
25.0 - 25.4                           IIII                 5                            0.125
25.5 - 25.9                           I                    1                            0.025
∑                                                          40                           1.000

TABLE 1.3 FREQUENCY DISTRIBUTION OF THE GASOLINE CONSUMPTION


CLASS MARKS AND CLASS BOUNDARIES
The midpoint of the class interval is the class mark. It is half the sum
of lower and upper limits of a class. A point that represents halfway,
or a dividing point between successive classes is the class boundary.
The upper class boundary of the first class is the dividing point
between the first class and the second class. The lower class
boundary of the second class is the dividing point between the first
class and the second class.
Thus in Table 1.2, ½ (89 + 90) = 89.5 is the upper class boundary of
the first class. This is the lower boundary of the second class.
The class mark of the first class is equal to ½ (80 + 89) = 84.5. For the
succeeding classes, the class mark may be obtained by adding c = 10 to the
preceding class mark, since the classes have equal widths.

Classes      Class Boundaries   Class Marks
80 - 89      79.5 - 89.5        84.5
90 - 99      89.5 - 99.5        94.5
100 - 109    99.5 - 109.5       104.5
110 - 119    109.5 - 119.5      114.5
120 - 129    119.5 - 129.5      124.5
130 - 139    129.5 - 139.5      134.5
140 - 149    139.5 - 149.5      144.5

TABLE 1.4 CLASS LIMITS, CLASS BOUNDARIES AND CLASS MARKS FOR FREQUENCY DISTRIBUTION
PRESENTED IN TABLE 1.2
Classes        Class Boundaries   Class Marks
22.5 - 22.9    22.45 - 22.95      22.7
23.0 - 23.4    22.95 - 23.45      23.2
23.5 - 23.9    23.45 - 23.95      23.7
24.0 - 24.4    23.95 - 24.45      24.2
24.5 - 24.9    24.45 - 24.95      24.7
25.0 - 25.4    24.95 - 25.45      25.2
25.5 - 25.9    25.45 - 25.95      25.7

TABLE 1.5 CLASS LIMITS, CLASS BOUNDARIES AND CLASS MARKS FOR FREQUENCY DISTRIBUTION
PRESENTED IN TABLE 1.3
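The class boundaries and class marks can be derived from the class limits. A minimal sketch for the integer-valued classes of Table 1.2 follows; the function name and the `unit` parameter (the gap between the upper limit of one class and the lower limit of the next) are illustrative assumptions.

```python
def class_boundaries_and_marks(limits, unit):
    """Return (lower boundary, upper boundary, class mark) for each (lower, upper) limit pair.

    `unit` is the gap between successive classes, e.g. 90 - 89 = 1 for Table 1.2.
    """
    half = unit / 2
    return [(lo - half, hi + half, (lo + hi) / 2) for lo, hi in limits]


# Class limits of Table 1.2: 80-89, 90-99, ..., 140-149.
limits = [(lo, lo + 9) for lo in range(80, 150, 10)]
rows = class_boundaries_and_marks(limits, unit=1)

for (lo, hi), (lb, ub, mark) in zip(limits, rows):
    print(f"{lo}-{hi}: boundaries {lb}-{ub}, class mark {mark}")
```

The same function applies to the gasoline classes with `unit=0.1`, although decimal class limits may show small floating-point rounding in the printed boundaries.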
NUMERICAL SUMMARY
MEASURES

We will focus on numerical summary measures for quantitative data, that is,
on calculating a set of numbers that will characterize the data set and
convey some of its salient features. The important characteristics of a set
of numbers are its location, particularly the center, and its variability.
Numerical summary measures can be calculated from either a sample
or a population. Any quantitative measure that describes a
characteristic of a sample is a statistic; one that describes a
characteristic of a population is a parameter.
The summation notation and rules below will be useful in dealing with
summary measures.
Summation Notation
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n = \sum x_i$$
MEASURES OF CENTRAL TENDENCY:
MEAN, MEDIAN AND MODE
MEAN
The arithmetic mean or simply the mean is the overall average.
If the data represent the entire population, the mean of the
values is referred to as the population mean, μ. This mean is a
quantitative measure describing a characteristic of a population
and therefore it is a parameter. If the data constitute a sample
drawn from a population, the mean is referred to as the sample
mean, $\bar{x}$, which is a statistic.
If there are n observations with numerical values x1, x2, …, xn, then the
sample mean is given by
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Fundamentals of Probability and Statistics of Engineering
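The population mean, μ, is given by
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
If the values x1, x2, …, xm occur f1, f2, …, fm times, then
$$\bar{x} = \frac{\sum_{i=1}^{m} f_i x_i}{n}, \qquad n = \sum_{i=1}^{m} f_i$$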
Example 1.5
The following data represent the time in seconds for 9 glued
samples to dry and attain their bond strength: 3.6, 2.5, 3.1, 4.3,
2.3, 2.9, 2.6, 4.1 and 3.4. Calculate the mean.
SOLUTION:
$$\bar{x} = \frac{\sum x_i}{n} = \frac{28.8}{9} = 3.2 \text{ seconds}$$

Example 1.6
Find the mean weight of sodium nitrate in Example 1.1
SOLUTION:
Weight (kilos), x   No. of bags (frequency, f)   fx
1.46                6                            8.76
1.48                4                            5.92
1.49                5                            7.45
1.50                6                            9.00
1.52                9                            13.68
∑                   30                           44.81
$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{44.81}{30} = 1.49 \text{ kilos}$$
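The frequency-weighted mean of Example 1.6 can be reproduced as follows. This is a small sketch using the distinct weights and their frequencies; the variable names are illustrative.

```python
# Distinct weights (kilos) and how many bags had each weight (Example 1.6).
values = [1.46, 1.48, 1.49, 1.50, 1.52]
freqs = [6, 4, 5, 6, 9]

n = sum(freqs)                                    # total number of bags
total = sum(f * x for f, x in zip(freqs, values)) # sum of f_i * x_i
mean = total / n

print(f"n = {n}, sum of fx = {total:.2f}, mean = {mean:.2f} kilos")
```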

MEAN OF GROUPED DATA


If the data in a frequency table consist of n observations having
m classes with class marks x1, x2, …, xm and class frequencies f1, f2, …,
fm, then the sample mean is
$$\bar{x} = \frac{\sum_{i=1}^{m} f_i x_i}{n}$$
Example 1.7
◦ Find the mean of gasoline consumption in Example 1.3
Solution:
Classes        f    Class mark, x   fx
22.5 - 22.9    2    22.7            45.4
23.0 - 23.4    3    23.2            69.6
23.5 - 23.9    7    23.7            165.9
24.0 - 24.4    8    24.2            193.6
24.5 - 24.9    14   24.7            345.8
25.0 - 25.4    5    25.2            126.0
25.5 - 25.9    1    25.7            25.7
∑              40                   972.0

Table 1.8 Frequency distribution of gasoline consumption described in Table 1.3

$$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{972.0}{40} = 24.3 \text{ mi/gal}$$



Median
The median of a set of numbers in an array is either the middle
value or the arithmetic mean of the two middle values.
If n is odd:
$$\tilde{x} = x_{(n+1)/2}$$
If n is even:
$$\tilde{x} = \frac{x_{n/2} + x_{n/2+1}}{2}$$
The sample median $\tilde{x}$ is used to estimate the population median $\tilde{\mu}$.

Example 1.8
For the set of numbers 1, 3, 3, 5, 6, 8, 9, 9, 10 (n = 9)
$$\tilde{x} = x_5 = 6$$
Example 1.9
For the set of numbers 4, 4, 7, 9, 11, 12, 15, 18 (n = 8)
$$\tilde{x} = \frac{x_4 + x_5}{2} = \frac{9 + 11}{2} = 10$$
Example 1.10
Find the median of the data in Example 1.5.
Solution:
Arrange the data in ascending magnitude: 2.3, 2.5, 2.6, 2.9, 3.1, 3.4, 3.6, 4.1, 4.3
$$\tilde{x} = x_5 = 3.1 \text{ seconds}$$
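The odd and even cases can be written as one small function. The sketch below sorts the data first and then applies the positional rule; the function name is illustrative, and Python's standard `statistics.median` gives the same results.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # n odd: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average the two middle values


print(median([1, 3, 3, 5, 6, 8, 9, 9, 10]))                       # Example 1.8
print(median([4, 4, 7, 9, 11, 12, 15, 18]))                       # Example 1.9
print(median([2.3, 2.5, 2.6, 2.9, 3.1, 3.4, 3.6, 4.1, 4.3]))      # Example 1.10
```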

MEDIAN OF GROUPED DATA

$$\tilde{x} = L_m + \left[\frac{\frac{n}{2} - \left(\sum f\right)_L}{f_m}\right] c$$

Where:
L_m = lower class boundary of the median class
(∑f)_L = sum of frequencies of all classes lower than the median class
f_m = frequency of the median class
c = size of the median class
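Example 1.11
Determine the median of the data in Example 1.3. Refer to Table 1.8.
Solution:
Here n = 40, so n/2 = 20. The cumulative frequencies are 2, 5, 12, 20, …, so the
median class is 24.0 - 24.4, with L_m = 23.95, (∑f)_L = 12, f_m = 8 and c = 0.5.
$$\tilde{x} = 23.95 + \frac{(20 - 12)(0.5)}{8} = 24.45 \text{ mi/gal}$$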
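The grouped-median formula can be sketched in code. The version below takes the median class to be the first class whose cumulative frequency reaches n/2, which is how the worked example chooses it; the function and argument names are illustrative assumptions.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median of grouped data: L_m + ((n/2 - cumulative f below) / f_m) * c."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    below = 0                               # sum of frequencies of the lower classes
    for L, f in zip(lower_boundaries, freqs):
        if f > 0 and below + f >= half:     # first class whose cumulative frequency reaches n/2
            return L + (half - below) / f * width
        below += f
    raise ValueError("median class not found")


# Gasoline consumption classes (lower class boundaries and frequencies).
boundaries = [22.45, 22.95, 23.45, 23.95, 24.45, 24.95, 25.45]
freqs = [2, 3, 7, 8, 14, 5, 1]
result = grouped_median(boundaries, freqs, width=0.5)
print(f"median = {result:.2f} mi/gal")
```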

MODE
Mode is the value which occurs with the greatest frequency. The sample mode
is designated as $\hat{x}$ and the population mode by $\hat{\mu}$.
Example 1.12
For the set of numbers 3, 3, 5, 7, 9, 10, 11, 10, 11, 12, 9, 18, 9
$\hat{x}$ = 9 (unimodal)

Example 1.13
The set of numbers 6, 7, 9, 10, 12 has no mode.
Example 1.14
For the set of values 2.2, 3.1, 4.1, 4.1, 5.4, 5.4, 5.4, 5.4, 6.2, 7.7, 7.7, 8.5, 8.5, 8.5, 9.3

$\hat{x}$ = 5.4 and 8.5 (bimodal)
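Finding the mode amounts to counting occurrences and keeping the value or values with the highest count. The sketch below applies a strict convention: only values tied for the single highest count are reported, and a set in which no value repeats is reported as having no mode. The function name and the last data set are illustrative assumptions.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no value repeats."""
    counts = Counter(values)
    top = max(counts.values())          # assumes a non-empty data set
    if top == 1:
        return []                       # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([3, 3, 5, 7, 9, 10, 11, 10, 11, 12, 9, 18, 9]))   # Example 1.12
print(modes([6, 7, 9, 10, 12]))                               # Example 1.13
print(modes([1, 1, 2, 2, 3]))                                 # hypothetical two-mode data
```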
MODE OF GROUPED DATA
$$\hat{x} = L_{mo} + \left[\frac{d_1}{d_1 + d_2}\right] c$$

where:
L_mo = lower class boundary of the modal class
d1 = excess of modal frequency over frequency of the next lower class
d2 = excess of modal frequency over frequency of the next higher class
c = size of the modal class interval
MEASURES OF VARIABILITY:

RANGE, VARIANCE AND STANDARD DEVIATION

IMPORTANCE OF MEASURING VARIABILITY


RANGE
Range is the simplest measure of variability. It is the least satisfactory because
it provides no information at all about the data between the highest and the
lowest values. Suppose we evaluate the variability of the scores of two students using
the range, with R_A = 8 and R_B = 60. We may say that student A performs better only upon knowing
the scores in between. If we do not have that information, the range of 8 could
be the difference between two failing scores, so we cannot infer that student A is
better than student B.
VARIANCE
Variance is a measure that considers the position of each observation relative to the mean. It
is defined as the mean of the squares of all the deviations from the mean.

If there are N numerical measurements x1, x2, …, xN in a population, the deviation from the
mean of each individual observation is obtained as x_i − μ. Hence the population variance σ² is:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

The sample variance s² is a statistic. A statistic that estimates the true parameter on the
average is said to be unbiased. Dividing by n will underestimate the population variance on the
average. To compensate for the bias in estimating σ², we use n − 1 in the divisor. The number
n − 1 is called the degrees of freedom.
Thus, if there are n numerical observations x1, x2, …, xn in a sample, the deviation
of each individual observation from the mean is x_i − $\bar{x}$, and

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
Shortcut formula in Finding the Sample Variance
Therefore:
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$
STANDARD DEVIATION
Standard deviation is the positive square root of the variance.
The population standard deviation σ = √σ² and the sample standard
deviation s = √s² .

VARIANCE OF GROUPED DATA


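If the data in a frequency table consist of n observations having m classes with
class marks x1, x2, …, xm, class frequencies f1, f2, …, fm and sample mean $\bar{x}$, then

$$s^2 = \frac{\sum_{i=1}^{m} f_i (x_i - \bar{x})^2}{n-1} \quad \text{or} \quad s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{m} f_i x_i^2 - \frac{\left(\sum_{i=1}^{m} f_i x_i\right)^2}{n}\right]$$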
EXAMPLE 1.17
The following readings were the obtained tensile strength in kg/cm² of six
specimens of carbon steel.
2.46 2.65 2.40 2.44 2.41 2.58
a) What is the mean tensile strength and the standard deviation of the tensile
strengths?
b) Suppose each reading is expressed in kg/m², what is the standard deviation?
SOLUTION:
X_i      (X_i − X̄)²   X_i²
2.58     0.0081       6.6564
2.65     0.0256       7.0225
2.40     0.0081       5.7600
2.46     0.0009       6.0516
2.44     0.0025       5.9536
2.41     0.0064       5.8081
∑ 14.94  0.0516       37.2522

a) $$\bar{X} = \frac{14.94}{6} = 2.49 \text{ kg/cm}^2$$

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} = \frac{0.0516}{5} \approx 0.01$$

Using the shortcut formula:

$$s^2 = \frac{1}{5}\left[37.2522 - \frac{(14.94)^2}{6}\right] = \frac{1}{5}(0.0516) \approx 0.01$$

s ≈ 0.1 kg/cm²
b) If the readings are expressed in kg/m², each reading will be multiplied by 10⁴, since 1 kg/cm² = 10⁴ kg/m².
Therefore, the standard deviation of the new set of data is
s = (10⁴)(0.1) = 1000 kg/m²
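The calculations in Example 1.17 can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor. This sketch also evaluates the shortcut formula directly and confirms that multiplying every reading by 10⁴ multiplies the standard deviation by 10⁴. Without intermediate rounding, s comes out slightly above the rounded 0.1 used in the text.

```python
import statistics

# Tensile strength readings (kg/cm^2) of the six specimens.
readings = [2.46, 2.65, 2.40, 2.44, 2.41, 2.58]
n = len(readings)

mean = statistics.mean(readings)
s2 = statistics.variance(readings)      # sample variance, divisor n - 1
s = statistics.stdev(readings)          # positive square root of s2

# Shortcut formula: (sum of squares - (sum)^2 / n) / (n - 1)
s2_shortcut = (sum(x * x for x in readings) - sum(readings) ** 2 / n) / (n - 1)

# Part (b): express each reading in kg/m^2 (multiply by 10^4).
s_m2 = statistics.stdev([x * 1e4 for x in readings])

print(f"mean = {mean:.2f}, s^2 = {s2:.5f}, s = {s:.4f} kg/cm^2")
print(f"shortcut s^2 = {s2_shortcut:.5f}, s in kg/m^2 = {s_m2:.1f}")
```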
Example 1.18
◦ Determine the standard deviation of gasoline consumption in Example 1.3
Solution:
Classes        f    x      fx      f(x − X̄)²   fx²
22.5 - 22.9    2    22.7   45.4    5.12        1030.58
23.0 - 23.4    3    23.2   69.6    3.63        1614.72
23.5 - 23.9    7    23.7   165.9   2.52        3931.83
24.0 - 24.4    8    24.2   193.6   0.08        4685.12
24.5 - 24.9    14   24.7   345.8   2.24        8541.26
25.0 - 25.4    5    25.2   126.0   4.05        3175.20
25.5 - 25.9    1    25.7   25.7    1.96        660.49
∑              40          972.0   19.60       23639.20

$$\bar{X} = \frac{972}{40} = 24.3 \text{ mi/gal}$$

$$s^2 = \frac{19.60}{39} = 0.503$$

s = 0.71 mi/gal
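The grouped-data variance in Example 1.18 can be reproduced from the class marks and frequencies alone. A minimal sketch, with illustrative variable names:

```python
import math

# Class marks and frequencies of the gasoline consumption distribution.
class_marks = [22.7, 23.2, 23.7, 24.2, 24.7, 25.2, 25.7]
freqs = [2, 3, 7, 8, 14, 5, 1]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, class_marks)) / n

# Sum of f * (x - mean)^2 over all classes, then divide by n - 1.
ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, class_marks))
s2 = ss / (n - 1)
s = math.sqrt(s2)

print(f"mean = {mean:.1f}, sum f(x - mean)^2 = {ss:.2f}, s^2 = {s2:.3f}, s = {s:.2f} mi/gal")
```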
Chapter 2
PROBABILITY refers to the study of randomness and uncertainty of
an outcome. The theory of probability provides methods that will
permit us to quantify the chances, or likelihood, associated with
various outcomes of an event.

SAMPLE SPACE OF AN EXPERIMENT


An experiment is any action or process that generates data. The set
of all possible outcomes of an experiment is the sample space, S.
Each outcome in a sample space is called an element or a sample
point.
Methods of Describing a Sample
Space
1. If the sample space has a finite number of sample points, we may
describe the set by listing the elements separated by commas and
enclosed in braces.

2. If the sample space has a large or infinite number of sample points, we may
describe the set by a statement or rule.
Example 2.1
An experiment consists of examining a bulb to determine whether or
not it is defective. Using D for defective and N for not defective, the
sample space for the experiment is
S = {D, N}
Another such experiment is tossing a single coin. The set of all
possible outcomes is
S = {H, T}
COUNTING TECHNIQUES
If the number of possible outcomes in an experiment is quite large, the effort of
constructing the list of outcomes becomes prohibitive. By using some counting
rules, it is possible to determine the number of outcomes without listing.
PERMUTATION
A permutation is an arrangement of all or part of a group of objects or
elements. Order is an important aspect of permutation.
1. The number of permutations of n distinct objects taken n at a time is
P = n!
2. The number of permutations of n distinct objects taken r at a time is

P = nPr = n!/(n-r)!
3. The number of permutations of n objects of which n1 are identical,
n2 are identical, …, nm are identical is
P = n!/(n1! n2! … nm!)
4. The number of permutations of n distinct objects arranged
in a circle is

P = (n-1)!
COMBINATION
A combination is a selection of r objects from n without regard to order.
The number of combinations of n objects taken r at a time is
nCr = n!/(r!(n-r)!)

EXAMPLE 1.1
How many numbers can be formed using all the digits 6, 7, 8, and 9?
Solution: To form different numbers, arrange all the 4 digits and the
arrangements are the number of numbers formed.
P=4! = 24 numbers
Example 1.2
How many distinct permutations are there in the word MILLENNIUM?
Solution: There are 2 M’s, 2 L’s, 2 I’s and 2 N’s among the 10 letters.
P = 10!/(2! 2! 2! 2!) = 226,800
Example 2.3
In how many ways can 4 letters a, b, c and d be arranged in a circle?
Solution: P = (4-1)! = 3! = 6 ways
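The permutation counts in the examples above can be verified with Python's `math` module. In this sketch, `multiset_permutations` and `circular_permutations` are illustrative helper names, and the `math.perm(5, 2)` call is a hypothetical case added only to show the nPr rule.

```python
import math
from collections import Counter


def multiset_permutations(word):
    """n! / (n1! n2! ... nm!) for a word with repeated letters."""
    total = math.factorial(len(word))
    for count in Counter(word).values():
        total //= math.factorial(count)   # divide out arrangements of identical letters
    return total


def circular_permutations(n):
    """(n - 1)! arrangements of n distinct objects in a circle."""
    return math.factorial(n - 1)


print(math.factorial(4))                    # numbers formed from the digits 6, 7, 8, 9
print(multiset_permutations("MILLENNIUM"))  # distinct permutations of MILLENNIUM
print(circular_permutations(4))             # a, b, c, d arranged in a circle
print(math.perm(5, 2))                      # hypothetical: 5 distinct objects taken 2 at a time
```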

Example 2.4
From a box containing 4 defective and 5 non-defective items, how many
samples of size 3 are possible
A. with no restrictions?
B. with 1 defective and 2 non-defective items?
C. with 2 defective and 1 non-defective item, if a certain defective item must
be in the sample chosen?
Solution:
a) n = number of ways of selecting 3 from 9
n = 9C3 = 9!/(3!(9-3)!) = 9!/(3! 6!) = 84 samples

b) n = number of samples with 1 defective and 2 non-defective items
n1 = no. of ways of selecting 1 defective from 4 defective items
n2 = no. of ways of selecting 2 non-defective from 5 non-defective items
n = 4C1 · 5C2 = [4!/(1! 3!)] · [5!/(2! 3!)] = 4(10) = 40 samples

c) n = no. of samples with 2 defective and 1 non-defective item, with a certain defective item in the sample
chosen. Since one defective item is already fixed in the sample:
n1 = no. of ways of selecting 1 more defective from among the 3 remaining defective items
n2 = no. of ways of selecting 1 non-defective from among 5 non-defective items
n = 3C1 · 5C1 = [3!/(1! 2!)] · [5!/(1! 4!)] = 3(5) = 15 samples
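The three counts in Example 2.4 can be cross-checked in two ways: with the combination formula via `math.comb`, and by brute-force enumeration of every sample of size 3. The item labels (`D0` to `D3`, `N0` to `N4`) and the choice of `D0` as the required defective item are illustrative assumptions.

```python
import math
from itertools import combinations

# Formula-based counts.
a = math.comb(9, 3)                      # no restrictions
b = math.comb(4, 1) * math.comb(5, 2)    # 1 defective, 2 non-defective
c = math.comb(3, 1) * math.comb(5, 1)    # 2 defective incl. a fixed one, 1 non-defective

# Brute-force check: label the items and enumerate all samples of size 3.
items = [f"D{i}" for i in range(4)] + [f"N{i}" for i in range(5)]
samples = list(combinations(items, 3))


def n_defective(sample):
    return sum(item.startswith("D") for item in sample)


brute_a = len(samples)
brute_b = sum(n_defective(s) == 1 for s in samples)
brute_c = sum(n_defective(s) == 2 and "D0" in s for s in samples)

print(a, b, c)
print(brute_a, brute_b, brute_c)
```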
Example 2.5
A jar contains 3 red marbles, 7 green marbles and 10 white marbles. If a
marble is drawn from the jar at random, what is the probability that this
marble is white?

A box of 10 fuses has two defective fuses. In how many ways can
one select three of these fuses and get
a.) neither of the defective fuses
b.) one defective fuse
c.) both of the defective fuses
PROBABILITY OF AN EVENT
The objective of probability is to assign to each event A a number P(A), called the probability of the
event A, which will give a precise measure of the chance that A will happen.
When all outcomes are equally likely, the probability of an event A is the ratio of the number of outcomes
favorable to A to the total number of outcomes. If nA is the number of outcomes favorable to event A and N is
the total number of outcomes in the sample space, then

P(A) = nA/N
Example 1.1
In the experiment of examining 3 bulbs, find the
probability of the following events:
a.) exactly 2 bulbs are defective
b.) at least 2 bulbs are defective
Solution:
The sample space for this experiment is
