Engineering Probability and Statistics


ENGINEERING PROBABILITY AND STATISTICS

SYLLABUS FOR ENGINEERING PROBABILITY AND STATISTICS


Course Description: This course focuses on the descriptive branch of statistics, which comprises data analysis and the organization of raw data into frequency tables; measures of central tendency and dispersion; an introduction to probability and counting techniques; probability laws; Bayes' rule; random variables; and discrete and continuous probability distributions and their applications in real-world settings.

COURSE OUTLINE: PRELIM PERIOD


- Definition of terms
- Differentiate between descriptive and inferential statistics
- Organizing data
- Construction of frequency table, and its graphical representation (histogram, frequency polygon, stem and leaf plot)
- Computation for mean, median and mode for grouped and ungrouped data
- Computation of range, variance and standard deviation for grouped and ungrouped data
- Counting Rules and Venn Diagram

COURSE OUTLINE: MIDTERM

- Concept of Probability and theories
- Counting Sample Points
- Venn Diagram
- Addition and Multiplication Rules
- Conditional and Bayes Rules
- Concept of Random Variables
- Discrete and Continuous Random Variables

COURSE OUTLINE: FINAL PERIOD

- Special discrete probability distributions (uniform, binomial, multinomial, geometric, hypergeometric, negative binomial and Poisson distributions)
- Special continuous distributions (normal and continuous uniform distributions)
- Sampling theory and sampling distributions
- Estimation of parameters

Quizzes: There are two exams per period. Each exam is worth 100 points. The time limit on all exams is one hour. Only material covered in class will be included in the exams. (No calculator, no examination)

Assignments: There is a maximum of two homework assignments per period. Do homework problems carefully and to the best of your ability. Show all your calculations and reasoning. You will receive credit for honest attempts to answer all the questions, even if your answers are incorrect. Homework that is sloppy or incomplete will not earn full credit. Homework should be turned in on time.

Seatworks: There is a maximum of two seatworks per period. The problem is usually computational and something we have just

INTRODUCTION
STATISTICS is a collection of methods for planning experiments, obtaining data, and then organizing, summarizing, analyzing, interpreting, and drawing conclusions based on the data.

DESCRIPTIVE STATISTICS comprises those methods concerned with collecting and describing a set of data so as to yield meaningful information (the purpose is to summarize or display data so that we can quickly obtain an overview).

INFERENTIAL STATISTICS comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire population (it allows us to make claims or

Population - All subjects possessing a common characteristic that is being studied. It consists of the totality of the observations with which we are concerned.
Parameter - Characteristic or measure obtained from a population.
Sample - A subgroup or subset of the population.
Statistic (not to be confused with Statistics) - Characteristic or measure obtained from a sample.

Population vs. Sample


The population includes all objects of interest whereas the sample is only a portion of the population. Parameters are associated with populations and statistics with samples. Parameters are usually denoted using Greek letters (mu, sigma) while statistics are usually denoted using Roman letters (x, s). There are several reasons why we don't work with populations. They are usually large, and it is often impossible to get data for every object we're studying. We compute statistics, and use them to estimate parameters. The computation is the first part of the statistics course (Descriptive Statistics) and the estimation is the second part (Inferential Statistics)

Classify the following statements as belonging to the area of descriptive statistics or inferential statistics:
(a) As a result of recent cutbacks by the oil-producing nations, we can expect the price of gasoline to double in the next year.
(b) At least 5% of all fires reported last year in a certain city were deliberately set by arsonists.
(c) Of all patients who have received this particular type of drug at a local clinic, 60% later developed significant side effects.
(d) Assuming that less than 20% of the Colombian coffee beans were destroyed by frost this past winter, we should expect an increase of no more than 30 cents for a kilogram of coffee by the end of the year.

Histogram - A graph that displays the data by using vertical bars of various heights to represent frequencies. The base of each bar spans the class boundaries of its class.
Pie Chart - Graphical depiction of data as slices of a pie. The frequency determines the size of the slice. The number of degrees in any slice is the relative frequency times 360 degrees.

Frequency Polygon Constructed by plotting class frequencies against class marks and connecting the consecutive points by straight lines.

HISTOGRAM

FREQUENCY POLYGON

Stem and Leaf Plot A data plot which uses part of the data value as the stem and the rest of the data value (the leaf) to form groups or classes. This is very useful for sorting data quickly.

Ogive
A frequency polygon of the cumulative frequency or the relative cumulative frequency. It is obtained by plotting the cumulative frequency less than any upper class boundary against that upper class boundary and joining the consecutive points by straight lines. The vertical axis is the cumulative frequency or relative cumulative frequency. The horizontal axis is the class boundaries. The graph always starts at zero at the lowest class boundary and ends at the total frequency (for a cumulative frequency) or 1.00 (for a relative cumulative frequency).

DESCRIPTIVE STATISTICS
Measures of Central Location
Any measure indicating the center of a set of data, arranged in an increasing or decreasing order of magnitude, is called a measure of central location or measure of central tendency. The most commonly used measures of central location are the mean, median, and mode. The most important of these, and the one we shall consider first, is the mean.

THE MEAN
The arithmetic mean or mean, of a set of measurements is the sum of the measurements divided by the total number of measurements.
For Ungrouped Data: Let $x_1, x_2, x_3, \ldots, x_n$ be n observations of a random variable X. The sample mean, denoted by $\bar{x}$ (x-bar), is the arithmetic average of these values. That is,

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

For Grouped Data:

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

Where:
$f_i$ is the frequency of class interval i
$x_i$ is the class midpoint of class interval i

THE MEDIAN
For Ungrouped Data:

Let $x_1, x_2, x_3, \ldots, x_n$ be sample observations arranged in order from smallest to largest. The sample median for this collection is given by the middle observation if n is odd. If n is even, the sample median is the average of the two middle observations.

For Grouped Data: When the data are grouped into a frequency distribution, the median is obtained by finding the cell that has the middle number and then interpolating within the cell.

$$\tilde{x} = L_b + \frac{n/2 - {<}cf_{i-1}}{f_i}\,(i) \quad\text{OR}\quad \tilde{x} = U_b - \frac{n/2 - {>}cf_{i-1}}{f_i}\,(i)$$

where:
$L_b$ = lower class boundary of the interpolated interval
$U_b$ = upper class boundary of the interpolated interval
$f_i$ = frequency of the interpolated interval
$(i)$ = class width
${<}cf_{i-1}$ = less than cumulative frequency of the class before the interpolated interval
${>}cf_{i-1}$ = greater than cumulative frequency of the class

THE MODE
The last measure of central tendency is the mode. For a finite population, the population mode is the value of X that occurs most often. The mode of a sample is the value that occurs most often in the sample. The drawback to this measure is that there might not be a unique mode: there might be no single number that occurs more often than any other. For this reason, the mode is not a particularly useful descriptive measure. When the data are grouped into a frequency distribution, the midpoint of the cell with the highest

EXAMPLES:
1. The reaction times for a random sample of 9 subjects to a stimulant were recorded as 2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1 and 4.3 seconds. Calculate the mean, median and mode.

Mean = (2.5 + 3.6 + 3.1 + 4.3 + 2.9 + 2.3 + 2.6 + 4.1 + 4.3) / 9 = 29.7 / 9 = 3.3
Median: arranged in order, the data are 2.3, 2.5, 2.6, 2.9, 3.1, 3.6, 4.1, 4.3, 4.3. The middle (fifth) observation is 3.1, so Median = 3.1
Mode = 4.3
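The reaction-time example can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the course material.

```python
import statistics

# Reaction times (seconds) from the worked example.
times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 4.3]

mean = statistics.mean(times)        # sum of the observations divided by n
median = statistics.median(times)    # middle value of the sorted data (n = 9 is odd)
modes = statistics.multimode(times)  # every value tied for the highest count

print(round(mean, 1), median, modes)
```

`multimode` returns a list, which makes the "no unique mode" case visible instead of hiding it.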

1. The numbers of incorrect answers on a true or false test for a random sample of 15 students were recorded as follows: 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, and 2. Find the mean, median and mode.
2. The numbers of building permits issued last month to 12 construction firms in a small midwestern city were 4, 7, 0, 7, 11, 4, 1, 15, 3, 5, 8, and 7. Treating the data as a population, find the mean, median and mode.
3. What is the average for a student who received grades of 85, 76, and 82 on 3 tests and a 79 on the final examination, if the final examination counts three times as much as each of the three tests?
4. The pilot has to pass three tests. The second test is weighted three times as much as the first, and the third test is weighted four times as much as the first. The pilot reached the scores of 40 on the first test, 45 on the second test and 60 on the third test. Calculate the mean.
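Exercises 3 and 4 call for a weighted mean, where each score is multiplied by its weight before averaging. The sketch below is one possible way to set it up; the helper name `weighted_mean` is hypothetical, and the weights are read directly from the exercise wording.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Exercise 3: three tests with weight 1 each, final examination with weight 3.
student_average = weighted_mean([85, 76, 82, 79], [1, 1, 1, 3])

# Exercise 4: weights 1, 3 and 4 relative to the first test.
pilot_average = weighted_mean([40, 45, 60], [1, 3, 4])

print(student_average, pilot_average)
```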

Construction of Frequency Table

Frequency Distribution - The organization of raw data in table form with classes and frequencies. An arrangement of a large mass of data by grouping into different classes of the same size and determining the number of observations that fall in each of the classes.
Ungrouped Frequency Distribution - A frequency distribution of numerical data. The raw data is not grouped.
Grouped Frequency Distribution - A frequency distribution where several numbers are grouped into one class.
Class Limits - The smallest and the largest values that can fall in a given class interval.
Lower Class Limit - The smallest value in a class interval.
Upper Class Limit - The largest value in a class interval.
Class Frequency - The number of observations falling in a particular class (denoted by the letter f).

Class Boundaries - Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting 0.5 units from the lower class limit and the upper class boundary is found by adding 0.5 units to the upper class limit.
Class Width - The difference between the upper and lower class boundaries of any class. The class width is also the difference between the lower limits of two consecutive classes or the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.
Class Mark (Midpoint) - The number in the middle of the class. It is found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.

The relative frequency of a class is defined as the frequency of the class divided by the total number of measurements.
Data that are presented in the form of a frequency distribution are called grouped data.
A set of measurements that has not been organized numerically is called raw data.
An arrangement of raw numerical data in descending or ascending order of magnitude is called an array.
The total frequency of all values less than the upper class boundary of a given interval, up to and including that interval, is called the cumulative frequency.

1. Twenty students are enrolled in the foreign language department, and their major fields are as follows: Spanish, Spanish, French, Italian, French, Spanish, German, German, Russian, Russian, French, German, German, German, Spanish, Russian, German, Italian, German and Spanish.
a) Make a frequency distribution table.
b) Make a frequency bar graph.
2. A survey of 1000 adults was conducted to determine what meal they preferred to have in a fast food restaurant. Forty percent preferred breakfast, 30% preferred lunch, 20% preferred dinner, and 10% preferred a snack. Display this information in a pie chart.

Major Field    No. of students
German         7
Russian        3
Spanish        5
French         3
Italian        2
Total          20

The frequency distribution table is constructed by writing down the major field and next to it the number of students

Example: One hundred families were chosen at random, and their yearly income was recorded.
Income of 100 families

Income in thousands    No. of families
10 - 14                3
15 - 19                12
20 - 24                19
25 - 29                20
30 - 34                23
35 - 39                18
40 - 44                5
Total                  100

Frequency distribution table

In the income table, each range of incomes (10 - 14, 15 - 19, 20 - 24, 25 - 29, 30 - 34, 35 - 39, 40 - 44) is a class interval. The numbers 10 and 14 are called class limits: 10 is the lower limit and 14 is the upper limit. The number of families in each class (3, 12, 19, 20, 23, 18, 5; total 100) is the class frequency. In the table, the class width is 5.

The smaller number is the lower class boundary, and the larger is the upper class boundary. The difference between the upper and lower class boundaries is called the class width or class size

The class mark (or class midpoint) is the midpoint of the class interval:

Class Midpoint = (Lower class limit + Upper class limit) / 2

The class mark of the class interval 35 - 39 is (35 + 39) / 2 = 37

1. Find the class boundaries, class marks and class widths for the following intervals:

Interval         Class boundaries     Class mark    Class width
7 - 13           6.5 - 13.5           10            7
(-5) - (-1)      (-5.5) - (-0.5)      -3            5
10.4 - 18.7      10.35 - 18.75        14.55         8.4
0.346 - 0.418    0.3455 - 0.4185      0.382         0.073
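The half-unit rule for boundaries depends on how many decimal places the raw data carry, which is easy to get wrong with binary floating point. The sketch below is one way to automate it; the function name `describe_class` and the choice to pass the limits as strings to `decimal.Decimal` are assumptions made here for exactness, not something the course prescribes.

```python
from decimal import Decimal

def describe_class(lower, upper):
    """Return (lower boundary, upper boundary, class mark, class width).

    The limits are given as strings so the number of decimal places in
    the raw data is preserved exactly.
    """
    lo, hi = Decimal(lower), Decimal(upper)
    # Number of decimal places recorded in the limits, e.g. 1 for "10.4".
    places = max(-lo.as_tuple().exponent, -hi.as_tuple().exponent, 0)
    # Half of one unit in the last recorded decimal place.
    half_unit = Decimal(1).scaleb(-places) / 2
    lower_boundary = lo - half_unit
    upper_boundary = hi + half_unit
    class_mark = (lo + hi) / 2
    class_width = upper_boundary - lower_boundary
    return lower_boundary, upper_boundary, class_mark, class_width

for limits in [("7", "13"), ("-5", "-1"), ("10.4", "18.7"), ("0.346", "0.418")]:
    print(limits, describe_class(*limits))
```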

The weight of 50 men is depicted in the table below in the form of a frequency distribution.

Weight       Freq.    Boundary         Classmark    R. Freq.    C. Freq.
115 - 121    2        114.5 - 121.5    118          .04         2
122 - 128    3        121.5 - 128.5    125          .06         5
129 - 135    13       128.5 - 135.5    132          .26         18
136 - 142    15       135.5 - 142.5    139          .30         33
143 - 149    9        142.5 - 149.5    146          .18         42
150 - 156    5        149.5 - 156.5    153          .10         47
157 - 163    3        156.5 - 163.5    160          .06         50

Worked values for the first class:
Class boundary = 115 - 0.5 = 114.5 and 121 + 0.5 = 121.5
Class mark = (114.5 + 121.5) / 2 = 236 / 2 = 118
Relative frequency = 2 / 50 = .04
Cumulative frequency = 0 + 2 = 2 and 2 + 3 = 5

Compute for the mean, median and mode.
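The derived columns of the weight table (boundaries, class marks, relative and cumulative frequencies) follow mechanically from the class limits and frequencies. A minimal sketch, assuming the limits are whole numbers so that the half unit is 0.5:

```python
from itertools import accumulate

# Class limits and frequencies from the weight table.
limits = [(115, 121), (122, 128), (129, 135), (136, 142),
          (143, 149), (150, 156), (157, 163)]
freqs = [2, 3, 13, 15, 9, 5, 3]

n = sum(freqs)                                         # total number of men
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]
class_marks = [(lo + hi) / 2 for lo, hi in limits]
rel_freqs = [f / n for f in freqs]                     # frequency / total
cum_freqs = list(accumulate(freqs))                    # running total of frequencies

for row in zip(limits, freqs, boundaries, class_marks, rel_freqs, cum_freqs):
    print(row)
```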

The frequency table represents the final examination scores for a statistics course. Find the mean, the median, and the mode.

Class Interval    Frequency    Class mark    Cumulative Frequency
10 - 19           3            14.5          3
20 - 29           2            24.5          5
30 - 39           3            34.5          8
40 - 49           4            44.5          12
50 - 59           5            54.5          17
60 - 69           11           64.5          28
70 - 79           14           74.5          42
80 - 89           14           84.5          56
90 - 99           4            94.5          60

$$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i}$$

$$\text{Mean} = \frac{(3)(14.5) + (2)(24.5) + (3)(34.5) + (4)(44.5) + (5)(54.5) + (11)(64.5) + (14)(74.5) + (14)(84.5) + (4)(94.5)}{3 + 2 + 3 + 4 + 5 + 11 + 14 + 14 + 4} = \frac{3960}{60}$$

Mean = 66

$$\text{Median} = L_b + \frac{n/2 - {<}cf_{i-1}}{f_i}\,(i)$$

$$\text{Median} = 69.5 + \frac{60/2 - 28}{14}\,(10)$$

Median = 70.93

Mode = Classmark with the highest frequency
Mode = 74.5 and 84.5
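The grouped-data calculation above can be reproduced step by step. This is a sketch under the assumption that all classes share the same width (10 here); the variable names are illustrative.

```python
from itertools import accumulate

# Class marks and frequencies from the final examination table.
marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [3, 2, 3, 4, 5, 11, 14, 14, 4]
width = 10  # common class width (assumed equal for every class)

n = sum(freqs)
grouped_mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Median class: first class whose cumulative frequency reaches n/2.
cum = list(accumulate(freqs))
k = next(i for i, c in enumerate(cum) if c >= n / 2)
lower_boundary = marks[k] - width / 2       # Lb of the median class
cf_before = cum[k - 1] if k > 0 else 0      # cumulative frequency before it
grouped_median = lower_boundary + (n / 2 - cf_before) / freqs[k] * width

# Modal class marks: every class tied for the highest frequency.
modal_marks = [x for f, x in zip(freqs, marks) if f == max(freqs)]

print(grouped_mean, round(grouped_median, 2), modal_marks)
```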

The 25 measurements given below represent the sulfur level in the air for a sample of 25 days. The unit used is parts per million.

27 35 33 39 41 32 28 40 37 41 28 44 41 39 35 32 45 36 37 35 31 36 35 44 33

a. Make a frequency distribution table b. Construct Histogram c. Compute for relative frequency

Class Interval    Frequency    Relative freq.
26 - 29           3            .12
30 - 33           5            .20
34 - 37           8            .32
38 - 41           6            .24
42 - 45           3            .12
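The tally behind this table, together with a rough text version of the histogram, can be sketched as follows. The class intervals are the ones shown in the table; the text bars are only a stand-in for a drawn histogram.

```python
# Sulfur levels (parts per million) for the 25 sampled days.
sulfur = [27, 35, 33, 39, 41, 32, 28, 40, 37, 41, 28, 44, 41,
          39, 35, 32, 45, 36, 37, 35, 31, 36, 35, 44, 33]

# Class limits taken from the table.
classes = [(26, 29), (30, 33), (34, 37), (38, 41), (42, 45)]

# Count the measurements that fall within each pair of class limits.
counts = [sum(lo <= x <= hi for x in sulfur) for lo, hi in classes]
rel_freqs = [c / len(sulfur) for c in counts]

for (lo, hi), c, r in zip(classes, counts, rel_freqs):
    # One asterisk per observation gives a sideways histogram.
    print(f"{lo}-{hi}  {'*' * c:<8}  {r:.2f}")
```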

Guidelines for constructing Class Intervals and frequency distributions


1. Find the range of the measurements, which is the difference between the largest and smallest measurements.
2. Divide the range of the measurements by the approximate number of class intervals desired. The number of class intervals is usually between 5 and 20, depending on the data. Then round the result to a convenient unit, which should be easy to work with.
3. The first class interval should contain the smallest measurement, and the last class interval should contain the largest measurement.
4. Determine the number of measurements that fall into each class interval.

The following scores represent the final examination grades for an elementary statistics course:

23 80 52 41 60 34 60 77 10 71
78 67 79 81 64 83 89 17 32 95
75 54 76 82 57 41 78 64 84 69
74 65 25 72 48 74 52 92 80 88
84 63 70 85 98 62 90 80 82 55
81 74 15 85 36 76 67 43 79 61

Using 9 intervals with the lowest starting at 10,
a. Set up a frequency distribution table
b. Construct a cumulative frequency distribution
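The four guidelines for constructing class intervals can be applied to these scores programmatically. The sketch below assumes integer-valued data, rounds the class width up to the next whole number, and assumes the chosen start is not larger than the smallest score; the helper name `class_intervals` is hypothetical.

```python
import math
from itertools import accumulate

scores = [23, 80, 52, 41, 60, 34, 60, 77, 10, 71,
          78, 67, 79, 81, 64, 83, 89, 17, 32, 95,
          75, 54, 76, 82, 57, 41, 78, 64, 84, 69,
          74, 65, 25, 72, 48, 74, 52, 92, 80, 88,
          84, 63, 70, 85, 98, 62, 90, 80, 82, 55,
          81, 74, 15, 85, 36, 76, 67, 43, 79, 61]

def class_intervals(data, n_classes, start):
    """Return integer class limits (lower, upper) covering the data."""
    data_range = max(data) - min(data)                  # guideline 1
    width = max(1, math.ceil(data_range / n_classes))   # guideline 2, rounded up
    intervals = []
    lower = start
    while lower <= max(data):                           # guideline 3
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals

intervals = class_intervals(scores, n_classes=9, start=10)
# Guideline 4: count the scores in each interval, then accumulate.
freqs = [sum(lo <= x <= hi for x in scores) for lo, hi in intervals]
cum_freqs = list(accumulate(freqs))

for (lo, hi), f, c in zip(intervals, freqs, cum_freqs):
    print(f"{lo}-{hi}: frequency {f}, cumulative {c}")
```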

MEASURES OF VARIABILITY - refers to the extent of scatter or dispersion around the zone of central tendency

A. RANGE

One measure of variation is the range, which has the advantage of being very easy to compute. The range, R, of a set of n measurements is defined as the difference between the largest and smallest measurements.

Formula: Range = Highest score - Lowest score, or R = (H - L)

B. VARIANCE and STANDARD DEVIATION

The variance of a population of N measurements is defined to be the average of the squares of the deviations of the measurements about their mean $\mu$. The population variance is denoted by $\sigma^2$ and is given by the formula

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad\text{for ungrouped data}$$

$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{N} \quad\text{for grouped data}$$

The variance of a sample of n measurements is defined to be the sum of the squared deviations of the measurements about their mean $\bar{x}$ divided by (n - 1). The sample variance is denoted by $s^2$ and is given by the formula

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad\text{for ungrouped data}$$

$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} \quad\text{for grouped data}$$

The standard deviation, in essence, represents the average amount of variability in a set of measures, using the mean as a reference point. Strictly speaking, the standard deviation is the positive square root of the average of the squared deviations.

1. The reaction times for a random sample of 9 subjects to a stimulant were recorded as 2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1 and 4.3 seconds. Calculate the range, variance and standard deviation.

R = HV - LV = 4.3 - 2.3 = 2

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

$$s^2 = \frac{(2.5-3.3)^2 + (3.6-3.3)^2 + (3.1-3.3)^2 + (4.3-3.3)^2 + (2.9-3.3)^2 + (2.3-3.3)^2 + (2.6-3.3)^2 + (4.1-3.3)^2 + (4.3-3.3)^2}{9 - 1} = 0.6325 \quad\text{(sample variance)}$$

$$s = \sqrt{0.6325} = 0.795298686 \text{ or } 0.80 \quad\text{(sample standard deviation)}$$
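The sample variance and standard deviation for the reaction times can be verified both from the definition and with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor. A minimal sketch with illustrative variable names:

```python
import math
import statistics

times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 4.3]

data_range = max(times) - min(times)   # highest value minus lowest value

# Definition: sum of squared deviations about the mean, divided by n - 1.
n = len(times)
x_bar = sum(times) / n
s2_manual = sum((x - x_bar) ** 2 for x in times) / (n - 1)

# Library equivalents for the sample variance and standard deviation.
s2 = statistics.variance(times)
s = statistics.stdev(times)

print(round(data_range, 1), round(s2_manual, 4), round(s2, 4), round(s, 2))
```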

The frequency table below represents the final examination scores for a statistics course. Find the population range, population variance and population standard deviation.

Class Interval    Frequency    Class mark    Cumulative Frequency
10 - 19           3            14.5          3
20 - 29           2            24.5          5
30 - 39           3            34.5          8
40 - 49           4            44.5          12
50 - 59           5            54.5          17
60 - 69           11           64.5          28
70 - 79           14           74.5          42
80 - 89           14           84.5          56
90 - 99           4            94.5          60

Range = Highest Upper Class Boundary - Smallest Lower Class Boundary = 99.5 - 9.5 = 90

$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{N}$$

$$\sigma^2 = \frac{3(14.5 - 66)^2 + 2(24.5 - 66)^2 + 3(34.5 - 66)^2 + 4(44.5 - 66)^2 + 5(54.5 - 66)^2 + 11(64.5 - 66)^2 + 14(74.5 - 66)^2 + 14(84.5 - 66)^2 + 4(94.5 - 66)^2}{60} = 432.75$$

$$\sigma = \sqrt{432.75} = 20.80264406 \text{ or } 20.80$$
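The grouped population variance above can be checked with a few lines of code. This sketch treats each class mark as standing in for every score in its class, which is the same approximation the hand calculation makes; the variable names are illustrative.

```python
import math

# Class marks and frequencies from the final examination table.
marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [3, 2, 3, 4, 5, 11, 14, 14, 4]

N = sum(freqs)                                      # population size
mu = sum(f * x for f, x in zip(freqs, marks)) / N   # grouped population mean

# Population variance: frequency-weighted squared deviations divided by N.
pop_variance = sum(f * (x - mu) ** 2 for f, x in zip(freqs, marks)) / N
pop_std_dev = math.sqrt(pop_variance)

# Range from the outermost class boundaries.
pop_range = 99.5 - 9.5

print(pop_range, pop_variance, round(pop_std_dev, 2))
```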
