PSM Syllabus

The document outlines the course MAT256: Probability and Statistical Modelling, which covers fundamental concepts in probability and statistics, sampling techniques, hypothesis testing, and regression analysis. It includes course outcomes, assessment patterns, a detailed syllabus divided into modules, and sample questions for evaluation. The course is designed for students in Computer Science and Engineering with a focus on applications in engineering and science.

Uploaded by

A K2

Computer Science and Engineering (Data Science)

Code     Course Name                              Category   L   T   P   Credit   Year of Introduction
MAT256   PROBABILITY AND STATISTICAL MODELLING    BSC        3   1   0   4        2019

Preamble: This course gives learners a clear understanding of the fundamental concepts of
probability and statistics. It covers the modern theory of probability and statistics, important
models of sampling, techniques of hypothesis testing, and correlation and regression. The course
helps learners find varied applications in engineering and science, such as disease modelling,
climate prediction and computer networks.
Prerequisite: A sound knowledge of calculus.

Course Outcomes

CO1  Explain the concept, properties and important models of discrete random variables and use
     them to analyze suitable random phenomena (Cognitive Knowledge Level: Apply)

CO2  Summarize the properties and relevant models of continuous random variables and use
     them to analyze suitable random phenomena (Cognitive Knowledge Level: Apply)

CO3  Make use of concepts of sampling and theory of estimation to solve application level
     problems (Cognitive Knowledge Level: Apply)

CO4  Organize the basic concepts in hypothesis testing and develop decision procedures for the
     most frequently encountered testing problems (Cognitive Knowledge Level: Apply)

CO5  Build statistical methods like correlation and regression analysis to interpret experimental
     data (Cognitive Knowledge Level: Apply)

Mapping of course outcomes with program outcomes

PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12

CO1

CO2

CO3

CO4

CO5

CO6

Abstract POs defined by National Board of Accreditation

PO#   Broad PO                                     PO#    Broad PO

PO1   Engineering Knowledge                        PO7    Environment and Sustainability

PO2   Problem Analysis                             PO8    Ethics

PO3   Design/Development of solutions              PO9    Individual and team work

PO4   Conduct investigations of complex problems   PO10   Communication

PO5   Modern tool usage                            PO11   Project Management and Finance

PO6   The Engineer and Society                     PO12   Lifelong learning

Assessment Pattern

Bloom’s Category   Continuous Assessment Tests     End Semester Examination
                   Test 1 (%)     Test 2 (%)       Marks (%)

Remember           30             30               30

Understand         30             30               30

Apply              40             40               40

Analyze

Evaluate

Create

Mark Distribution

Total Marks   CIE Marks   ESE Marks   ESE Duration

150           50          100         3 hours

Continuous Internal Evaluation Pattern:


Attendance                                                         10 marks
Continuous Assessment Tests (Average of Internal Tests 1 & 2)      25 marks
Continuous Assessment Assignment                                   15 marks

Internal Examination Pattern


Each of the two internal examinations is to be conducted out of 50 marks. The first series test
should preferably be conducted after completing the first half of the syllabus, and the second
series test after completing the remaining part of the syllabus. There will be two parts: Part A
and Part B. Part A contains 5 questions (preferably, 2 questions each from the completed modules
and 1 question from the partly completed module), each carrying 3 marks, adding up to 15 marks.
Students should answer all questions from Part A. Part B contains 7 questions (preferably, 3
questions each from the completed modules and 1 question from the partly completed module), each
carrying 7 marks. Out of the 7 questions, a student should answer any 5.
End Semester Examination Pattern:
There will be two parts: Part A and Part B. Part A contains 10 questions with 2 questions from
each module, having 3 marks for each question. Students should answer all questions. Part B
contains 2 full questions from each module, of which the student should answer any one. Each
question can have a maximum of 2 sub-divisions and carries 14 marks.
Syllabus

Module-1 (Discrete probability distributions)


Discrete random variables and their probability distributions, Expectation, mean and variance,
Binomial distribution, Poisson distribution, Poisson approximation to the binomial distribution,
Discrete bivariate distributions, marginal distributions, Independent random variables,
Expectation - multiple random variables.
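
As an illustration of the Poisson approximation to the binomial distribution listed in this module, the following Python sketch compares the two probability mass functions term by term. The values n = 100 and p = 0.02 are illustrative assumptions, not values from the syllabus; the approximating Poisson distribution uses λ = np.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 100, 0.02          # illustrative values (assumed)
lam = n * p               # Poisson parameter used for the approximation

# Largest pointwise gap between the two pmfs over k = 0..10
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))

for k in range(5):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
print("largest gap for k = 0..10:", max_gap)
```

The approximation is generally considered reasonable when n is large and p is small, which is why a small p was chosen here.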

Module-2 (Continuous probability distributions)


Continuous random variables and their probability distributions, Expectation, mean and variance,
Uniform, exponential and normal distributions, Continuous bivariate distributions, marginal
distributions, Independent random variables. Expectation-multiple random variables, independent
and identically distributed (i.i.d) random variables and Central limit theorem (Proof not required).
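
To make the central limit theorem item concrete, here is a minimal Python sketch of the usual normal approximation for a sum of i.i.d. random variables. The numbers are assumptions chosen for illustration: 100 exponential lifetimes with mean 50 hours (so the standard deviation is also 50 hours) and a threshold of 6000 hours for the total.

```python
from math import sqrt
from statistics import NormalDist

n = 100              # number of i.i.d. terms (assumed)
mean_life = 50.0     # mean of each exponential lifetime, in hours (assumed)
sd_life = mean_life  # for an exponential distribution, sd equals the mean
threshold = 6000.0   # total hours to exceed (assumed)

# By the CLT, the sum is approximately Normal(n * mean, sqrt(n) * sd)
total_mean = n * mean_life
total_sd = sqrt(n) * sd_life

z = (threshold - total_mean) / total_sd
p_exceed = 1 - NormalDist().cdf(z)
print("z =", z, " approximate P(total > threshold) =", round(p_exceed, 4))
```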

Module-3 (Sampling Techniques)


Need for Sampling, Some Fundamental Definitions, Important Sampling Distributions, Sampling
Theory, Sandler’s A-test, Concept of Standard Error, Estimation, Estimating the Population
Mean(µ), Estimating Population Proportion, Sample Size and its Determination, Determination of
Sample Size through the Approach Based on Precision Rate and Confidence Level, Determination
of Sample Size through the Approach Based on Bayesian Statistics
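
The precision-and-confidence approach to sample size can be sketched as follows. This assumes the commonly presented formulas for estimating a mean within ±e: n = z²σ²/e² for an effectively infinite population, and n = z²σ²N / ((N − 1)e² + z²σ²) for a finite population of size N. The function name and the numeric inputs are hypothetical, chosen only for illustration.

```python
from math import ceil

def sample_size_for_mean(z, sigma, e, N=None):
    """Sample size needed to estimate a mean within +/- e.

    z     : standard normal value for the chosen confidence level
    sigma : population standard deviation (assumed known)
    e     : acceptable error (precision)
    N     : population size; None means the population is treated as infinite
    """
    if N is None:
        return ceil(z**2 * sigma**2 / e**2)
    return ceil(z**2 * sigma**2 * N / ((N - 1) * e**2 + z**2 * sigma**2))

# Illustrative inputs (assumed): 95% confidence, sigma = 2, precision 0.5
n_infinite = sample_size_for_mean(1.96, 2.0, 0.5)
n_finite = sample_size_for_mean(1.96, 2.0, 0.5, N=1000)
print(n_infinite, n_finite)
```

The result is rounded up because a fractional observation cannot be sampled; the finite-population version never needs more observations than the infinite one.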

Module-4 (Testing of Hypothesis)


Hypothesis and Test Procedures, Tests about a population mean, Tests concerning a population
proportion, p-values, Single factor ANOVA, F-test, Multiple comparisons in ANOVA, Two factor
ANOVA
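
For the single factor ANOVA and F-test topics, a minimal Python sketch of the F statistic is given below. It computes F = MSTr / MSE from the treatment and error sums of squares. The three groups are hypothetical toy data, and the function name is an assumption made for this illustration.

```python
def one_way_anova_f(groups):
    """F statistic for single-factor ANOVA: mean square treatment / mean square error."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group (treatment) and within-group (error) sums of squares
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return mstr / mse

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # hypothetical data
f_stat = one_way_anova_f(groups)
print("F =", f_stat, "with df =", (len(groups) - 1, 9 - len(groups)))
```

The statistic would then be compared with a critical value of the F distribution with (k − 1, n − k) degrees of freedom, taken from tables or statistical software.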

Module-5 (Correlation and Regression Analysis)


Simple Linear Regression Model, Estimating model parameters, Correlation, Non-Linear and
multiple regression, Assessing Model Adequacy, Regression with transformed values, Polynomial
Regression, Multiple Regression Analysis
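
Estimating the parameters of the simple linear regression model can be sketched with the least squares formulas: slope = Sxy / Sxx and intercept = ȳ − slope · x̄. The data below are hypothetical and serve only to show the computation; the function name is likewise an assumption.

```python
def fit_line(xs, ys):
    """Least squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

xs = [1, 2, 3, 4, 5]                 # hypothetical predictor values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]      # hypothetical responses
intercept, slope = fit_line(xs, ys)
print("fitted line: y =", round(intercept, 2), "+", round(slope, 2), "x")
```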

Text Books
1. Jay L. Devore, Probability and Statistics for Engineering and the Sciences, 8th edition,
Cengage, 2012
2. C.R. Kothari, Research Methodology: Methods and Techniques, New Age International
Publishers
Reference Books
1. Hossein Pishro-Nik, Introduction to Probability, Statistics and Random Processes, Kappa
Research, 2014 (Also available online at www.probabilitycourse.com)
2. Sheldon M. Ross, Introduction to Probability and Statistics for Engineers and Scientists, 4th
edition, Elsevier, 2009.
3. T. Veerarajan, Probability, Statistics and Random Processes, Tata McGraw-Hill, 2008.
4. B.S. Grewal, Higher Engineering Mathematics, 36th edition, Khanna Publishers, 2010.
5. Levin R.I. and Rubin D.S., Statistics for Management, 7th edition, Prentice Hall of India
Pvt. Ltd., New Delhi, 2001.
6. Srivastava T.N., Shailaja Rego, Statistics for Management, Tata McGraw Hill, 2008.
7. Anand Sharma, Statistics for Management, Second Revised edition, Himalaya Publishing
House, 2008.
8. Goon A.M., Gupta M.K. and Dasgupta B., Fundamentals of Statistics, Vol. I & II,
8th edition, The World Press, Kolkata, 2002.
9. Irwin Miller and Marylees Miller, John E. Freund’s Mathematical Statistics with
Applications, 7th edition, Pearson Education, Asia, 2006.
10. Paul S. Levy, Stanley Lemeshow, Sampling of Populations: Methods and Applications,
4th edition, John Wiley & Sons, 2008.

Course Level Assessment Questions


Course Outcome 1 (CO1):
1. Organizers of a concert are limiting ticket sales to a maximum of 4 tickets per customer.
Let T be the number of tickets purchased by a random customer. Here is the probability
distribution of T:

T = number of tickets   1     2     3     4
P(T)                    0.1   0.3   0.2   0.4

Calculate the expected value of T.


2. X is a binomial random variable B (n, p) with n = 100 and p= 0.1. How would you
approximate it by a Poisson random variable?
3. Three balls are drawn at random without replacement from a box containing 2 white, 3
red and 4 black balls. If X denotes the number of white balls drawn and Y denotes the
number of red balls drawn, find the joint probability distribution of (X, Y).
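
For question 1 above, the expected value follows directly from the definition E[T] = Σ t · P(T = t). The short Python sketch below applies that definition to the ticket distribution given in the question and also reports the variance as an additional check; the variable names are illustrative.

```python
values = [1, 2, 3, 4]            # number of tickets, from the question
probs = [0.1, 0.3, 0.2, 0.4]     # P(T = t), from the question

# A valid probability distribution must sum to 1
assert abs(sum(probs) - 1.0) < 1e-12

expected_t = sum(t * p for t, p in zip(values, probs))
second_moment = sum(t**2 * p for t, p in zip(values, probs))
variance_t = second_moment - expected_t**2

print("E[T] =", round(expected_t, 2), " Var(T) =", round(variance_t, 2))
```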

Course Outcome 2 (CO2):

1. What can you say about P(X = a) for any real number a when X is a (i) discrete
random variable? (ii) continuous random variable?
2. Let X be a random variable with PDF given by

\[ f_X(x) = \begin{cases} c x^2, & |x| \le 1 \\ 0, & \text{otherwise} \end{cases} \]

a. Find the constant c.
b. Find E(X) and Var(X).
c. Find P(X ≥ 1/2).
3. A string, 1 meter long, is cut into two pieces at a random point between its ends. What
is the probability that the length of one piece is at least twice the length of the other?
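
For question 2 above, every quantity reduces to the integral of a power of x over an interval, so the answers can be checked with exact rational arithmetic. The sketch below is one possible check, not a prescribed solution method; the helper name integrate_monomial is hypothetical.

```python
from fractions import Fraction

def integrate_monomial(power, a, b):
    """Exact integral of x**power over [a, b]."""
    return (Fraction(b) ** (power + 1) - Fraction(a) ** (power + 1)) / (power + 1)

# Normalisation: c * integral of x^2 over [-1, 1] must equal 1
c = 1 / integrate_monomial(2, -1, 1)

mean = c * integrate_monomial(3, -1, 1)            # E(X) = c * integral of x^3
second_moment = c * integrate_monomial(4, -1, 1)   # E(X^2) = c * integral of x^4
variance = second_moment - mean**2

p_tail = c * integrate_monomial(2, Fraction(1, 2), 1)   # P(X >= 1/2)

print("c =", c, " E(X) =", mean, " Var(X) =", variance, " P(X >= 1/2) =", p_tail)
```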

Course Outcome 3 (CO3):


1. In a random selection of 64 of the 2400 intersections in a small city, the mean number of
scooter accidents per year was 3.2 and the sample standard deviation was 0.8.
(a) Make an estimate of the standard deviation of the population from the sample standard
deviation.
(b) Work out the standard error of mean for this finite population.
(c) If the desired confidence level is 0.90, what will be the upper and lower limits of the
confidence interval for the mean number of accidents per intersection per year?
2. Suppose a certain hotel management is interested in determining the percentage of the
hotel’s guests who stay for more than 3 days. The reservation manager wants to be 95 per
cent confident that the percentage has been estimated to be within ± 3% of the true value.
What is the most conservative sample size needed for this problem?
3. 500 articles were selected at random out of a batch containing 10000 articles and 30 were
found defective. How many defective articles would you reasonably expect to find in the
whole batch?
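
For question 1 above, parts (b) and (c) can be sketched in Python as follows. The sketch assumes the finite population correction factor sqrt((N − n) / (N − 1)) and a normal (z) critical value, which is a common choice for a sample of this size; a course text may prescribe a different convention.

```python
from math import sqrt
from statistics import NormalDist

N, n = 2400, 64          # population and sample sizes, from the question
x_bar, s = 3.2, 0.8      # sample mean and sample standard deviation
confidence = 0.90

# Standard error of the mean with the finite population correction (assumed form)
fpc = sqrt((N - n) / (N - 1))
se = s / sqrt(n) * fpc

# Two-sided critical value from the standard normal distribution
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
lower, upper = x_bar - z * se, x_bar + z * se

print("SE =", round(se, 4), " CI =", (round(lower, 3), round(upper, 3)))
```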

Course Outcome 4 (CO4):


1. A manufacturer of sprinkler systems used for fire protection in office buildings claims that
the true average system-activation temperature is 130°F. A sample of n=9 systems, when
tested, yields a sample average activation temperature of 131.08°F. If the distribution of
activation temperatures is normal with standard deviation 1.5°F, does the data contradict the
manufacturer’s claim at significance level α=0.01?
2. Let µ denote the true average radioactivity level (picocuries per liter). The value 5 pCi/L is
considered the dividing line between safe and unsafe water. Would you recommend testing
H0: µ =5 versus Ha: µ >5 or H0: µ =5 versus Ha: µ <5 ? Explain your reasoning.
3. Pairs of P-values and significance levels, α, are given. For each pair, state whether the
observed P-value would lead to rejection of H0 at the given significance level.
a. P-value=0.084, α=0.05
b. P-value=0.003, α=0.001
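
For question 1 above, a z test with known standard deviation can be sketched as below. The sketch assumes a two-sided alternative (Ha: µ ≠ 130), since the claim would be contradicted by a departure in either direction; that choice is an interpretation of the question rather than something it states.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 130.0       # claimed mean activation temperature (°F)
x_bar = 131.08    # sample mean
sigma = 1.5       # known population standard deviation
n = 9
alpha = 0.01

z = (x_bar - mu0) / (sigma / sqrt(n))

# Two-sided P-value (assumed alternative: mu != mu0)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = p_value < alpha

print("z =", round(z, 2), " P-value =", round(p_value, 4), " reject H0:", reject)
```

The same comparison of a P-value with α answers question 3: H0 is rejected exactly when the P-value is at most α.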

Course Outcome 5 (CO5):


1. Calculate and interpret the correlation coefficient of the two variables below.

Person   Hand   Height

A        17     150
B        15     154
C        19     169
D        17     172
E        21     175

2. You are told that a 95% CI for expected lead content when traffic flow is 15, based on a
sample of n=10 observations is (462.1, 597.7). Calculate a CI with confidence level 99%
for expected lead content when traffic flow is 15.
3. A trucking company considered a multiple regression model for relating the dependent
variable y=total daily travel time for one of its drivers (hours) to the predictors x1=distance
travelled (miles) and x2=the number of deliveries made. Suppose that the model equation
is Y = -0.800 + 0.060 x1 + 0.900 x2 + ε. What is the mean value of travel time when distance
traveled is 50 miles and three deliveries are made?
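
For question 1 above, the sample correlation coefficient r = Sxy / sqrt(Sxx · Syy) can be computed as in the following Python sketch, which uses the hand and height values from the question's table. The function name is an assumption made for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

hand = [17, 15, 19, 17, 21]            # persons A to E
height = [150, 154, 169, 172, 175]

r = pearson_r(hand, height)
print("r =", round(r, 3))
```

A value of r near +1 indicates a strong positive linear association, and with only five observations the estimate should be interpreted cautiously.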

Model Question Paper

QP CODE:

Reg No: _______________

Name: _________________ PAGES : 4

APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY

FOURTH SEMESTER B.TECH DEGREE EXAMINATION, MONTH & YEAR

Course Code: MAT256

Course Name: Probability and Statistical Modelling

Max.Marks:100 Duration: 3 Hours

PART A

Answer All Questions. Each Question Carries 3 Marks

1. Let X denote the number that shows up when an unfair die is tossed. Faces 1
to 5 of the die are equally likely, while face 6 is twice as likely as any other.
Find the probability distribution, mean and variance of X.

2. A piece of equipment consists of 5 components, each of which may fail independently
with probability 0.15. If the equipment is able to function properly when at
least 3 of the components are operational, what is the probability that it
functions properly?

3. A random variable has a normal distribution with standard deviation 10. If the
probability that it will take on a value less than 82.5 is 0.82, what is the probability
that it will take on a value more than 58.3?

4. X and Y are independent random variables with X following an exponential
distribution with parameter µ and Y following an exponential distribution
with parameter λ. Find P(X + Y ≤ 1).

5. Discuss the difference between F-distribution and Chi-square distribution.

6. From a random sample of 36 New Delhi civil service personnel, the mean age
and the sample standard deviation were found to be 40 years and 4.5 years
respectively. Construct a 95 per cent confidence interval for the mean age of
civil servants in New Delhi.

7. A sample of 50 lenses used in eyeglasses yields a sample mean thickness of 3.05
mm and a sample standard deviation of 0.34 mm. The desired true average
thickness of such lenses is 3.20 mm. Does the data strongly suggest that the true
average thickness of such lenses is something other than what is desired? Test
using α = 0.05.

8. A random sample of 110 lightning flashes in a certain region resulted in a sample
average radar echo duration of 0.81 sec and a sample standard deviation of 0.34
sec. Calculate a 99% (two-sided) confidence interval for the true average echo
duration µ, and interpret the resulting interval.

9. Let the test statistic T have a t distribution when H0 is true. Give the significance
level for the following situation: Ha: µ > µ0, df = 15, rejection region t ≥ 3.733.

10. Calculate the regression coefficient and obtain the lines of regression for the
following data:

X   1   2   3    4    5    6    7
Y   9   8   10   12   11   13   14
                                                                        (10x3=30)

Part B
(Answer any one question from each module. Each question carries 14 Marks)

11. (a) The probability mass function of a discrete random variable is P(x) = kx, (7)
x = 1, 2, 3, where k is a positive constant. Find (i) the value of k (ii) P(X ≤ 2)
(iii) E[X] (iv) Var(1 - X).

(b) Find the mean and variance of a binomial random variable. (7)

OR

12. (a) Accidents occur at an intersection at a Poisson rate of 2 per day. What is the (7)
probability that there would be no accidents on a given day? What is the
probability that in January there are at least 3 days (not necessarily
consecutive) without any accidents?

(b) One fair die is rolled. Let X denote the number on the die and Y = 0 or 1, (7)
according as the die shows an even number or odd number. Find (i) the joint
probability distribution of X and Y, (ii) the marginal distributions. (iii) Are X
and Y independent?

13. (a) The IQ of an individual randomly selected from a population follows a normal (7)
distribution with mean 100 and standard deviation 15. Find the probability
that an individual has IQ (i) above 140 (ii) between 120 and 130.

(b) A continuous random variable X is uniformly distributed with mean 1 and (7)
variance 4/3. Find P(X < 0).

OR

14. (a) The joint density function of random variables X and Y is given by (7)

\[ f(x, y) = \begin{cases} e^{-(x+y)}, & x > 0,\ y > 0 \\ 0, & \text{otherwise} \end{cases} \]

Find P(X + Y ≤ 1). Are X and Y independent? Justify.

(b) The lifetime of a certain type of electric bulb may be considered as an (7)
exponential random variable with mean 50 hours. Using central limit
theorem, find the approximate probability that 100 of these electric bulbs
will provide a total of more than 6000 hours of burning time.

15. (a) A market research survey in which 64 consumers were contacted states (7)
that 64 percent of all consumers of a certain product were motivated by the
product’s advertising. Find the confidence limits for the proportion of
consumers motivated by advertising in the population, given a confidence
level equal to 0.95.

(b) Determine the size of the sample for estimating the true weight of the cereal (7)
containers for the universe with N = 5000 on the basis of the following
information:
(i) the variance of weight = 4 ounces on the basis of past records.
(ii) estimate should be within 0.8 ounces of the true average weight with 99%
probability.

OR

16. (a) The foreman of ABC mining company has estimated the average quantity of (7)
iron ore extracted to be 36.8 tons per shift and the sample standard deviation
to be 2.8 tons per shift, based upon a random selection of 4 shifts. Construct
a 90 percent confidence interval around this estimate.

(b) What should be the size of the sample if a simple random sample from a (7)
population of 4000 items is to be drawn to estimate the percent defective
within 2 per cent of the true value with 95.5 per cent probability? What would
be the size of the sample if the population is assumed to be infinite in the
given case?

17. The calibration of a scale is to be checked by weighing a 10-kg test specimen
25 times. Suppose that the results of different weighings are independent of
one another and that the weight on each trial is normally distributed with
σ = 0.200 kg. Let µ denote the true average weight reading on the scale.
(a) What hypotheses should be tested? (7)
(b) Suppose the scale is to be recalibrated if either x̄ ≥ 10.1032 or x̄ ≤ 9.8968. (7)
What is the probability that recalibration is carried out when it is actually
unnecessary?

OR

18. (a) Lightbulbs of a certain type are advertised as having an average lifetime of (7)
750 hours. The price of these bulbs is very favorable, so a potential customer
has decided to go ahead with a purchase arrangement unless it can be
conclusively demonstrated that the true average lifetime is smaller than what
is advertised. A random sample of 50 bulbs was selected, the lifetime of each
bulb determined, and the appropriate hypotheses were tested using Minitab,
resulting in the accompanying output.

Variable   N    Mean     StDev   SEMean   Z       P-Value
lifetime   50   738.44   38.20   5.40     -2.14   0.016

What conclusion would be appropriate for a significance level of 0.05? A
significance level of 0.01? What significance level and conclusion would you
recommend?

(b) The recommended daily dietary allowance for zinc among males older than (7)
age 50 years is 15 mg/day. The article “Nutrient Intakes and Dietary Patterns
of Older Americans: A National Study” reports the following summary data
on intake for a sample of males age 65–74 years: n = 115, x̄ = 11.3, and s = 6.43.
Does this data indicate that average daily zinc intake in the population of all
males ages 65–74 falls below the recommended allowance?

19. The flow rate y (m^3/min) in a device used for air-quality measurement
depends on the pressure drop x (inches of water) across the device’s filter.
Suppose that for x values between 5 and 20, the two variables are related
according to the simple linear regression model with true regression line
y = -0.12 + 0.095x
(a) What is the expected change in flow rate associated with a 1-inch increase (7)
in pressure drop? Explain.
(b) What change in flow rate can be expected when pressure drop decreases (7)
by 5 inches?

OR

20. Suppose that in a certain chemical process the reaction time y (hr) is related to
the temperature (°F) in the chamber in which the reaction takes place
according to the simple linear regression model with equation y = 5.00 - 0.01x
and σ = 0.075.
(a) What is the expected change in reaction time for a 1°F increase and 10°F (7)
increase in temperature?
(b) What is the expected reaction time when temperature is 200°F and 250°F? (7)

Teaching Plan

No   Contents   No. of Lecture Hours (45 hrs)

Module-1 (Discrete Probability distributions) (9 hours)

1.1 Discrete random variables 1 hour

1.2 Probability Distributions 1 hour

1.3 Expectation, mean and variance 1 hour

1.4 Binomial distribution 1 hour

1.5 Poisson distribution 1 hour

1.6 Poisson approximation to binomial Distribution 1 hour

1.7 Discrete bivariate distributions 1 hour

1.8 Marginal distributions, Independent Random variables 1 hour

1.9 Expectation-multiple random variables 1 hour

Module-2 (Continuous Probability distributions) (9 hours)

2.1 Continuous random variables and probability distributions 1 hour

2.2 Expectation, mean and variance 1 hour

2.3 Uniform distributions 1 hour

2.4 Exponential Distribution 1 hour

2.5 Normal distribution 1 hour

2.6 Continuous Bivariate distributions 1 hour

2.7 Marginal distributions, Independent random variables 1 hour

2.8 Expectation-multiple random variables, i.i.d random variables 1 hour

2.9 Central limit theorem. 1 hour

Module-3 (Sampling Techniques) (9 hours)

3.1 Need for Sampling 1 hour

3.2 Some fundamental Definitions, Important Sampling Distributions 1 hour

3.3 Sampling Theory, Sandler’s A-test 1 hour

3.4 Concept of Standard Error, Estimation , Estimating the Population Mean(µ) 1 hour

3.5 Estimating Population Proportion 1 hour

3.6 Sample Size and its Determination 1 hour

3.7 Determination of Sample Size through the Approach Based on Precision Rate and Confidence Level 1 hour

3.8 Determination of Sample Size through the Approach Based on Bayesian Statistics 1 hour

3.9 Determination of Sample Size through the Approach Based on Bayesian Statistics (continued) 1 hour

Module-4 (Testing of Hypothesis) (9 hours)

4.1 Null and alternate Hypothesis 1 hour

4.2 Test Procedures 1 hour

4.3 Tests about a population mean 1 hour

4.4 Tests concerning a population proportion 1 hour

4.5 p-values 1 hour

4.6 Single factor ANOVA 1 hour

4.7 F-Test 1 hour

4.8 Multiple comparisons in ANOVA 1 hour

4.9 Two factor ANOVA 1 hour

Module-5 (Correlation and Regression Analysis) (9 hours)

5.1 Simple Linear Regression Model(Lecture 1) 1 hour

5.2 Simple Linear Regression Model(Lecture 2) 1 hour

5.3 Estimating model parameters 1 hour

5.4 Correlation 1 hour

5.5 Non-Linear and multiple regression 1 hour

5.6 Assessing Model Adequacy 1 hour

5.7 Regression with transformed values 1 hour

5.8 Polynomial Regression 1 hour

5.9 Multiple Regression Analysis 1 hour
