(UMA401)
Department of Mathematics
TIET, Patiala
LIST OF EXPERIMENTS:
S. No. | Title | Number of hours | CO-mapping
1 | Basics of R programming | 2 | CO-1
2 | Descriptive statistics, Sample space, definition of probability | 2 | CO-1 and CO-2
3 | Probability distributions | 3 | CO-3
4 | Mathematical Expectation, Moments and Functions of Random Variables | 4 | CO-2 and CO-3
5 | Continuous Probability Distributions | 3 | CO-3
6 | Joint probability mass and density functions | 2 | CO-2
7 | Chi-square, t-distribution, F-distribution | 2 | CO-3 and CO-4
8 | Sample statistics and Testing hypothesis | 4 | CO-4
Probability and Statistics (UMA401)
Experiment 1: Basics of R programming
We hope you already have R and RStudio installed on your computers. Good luck with
your programming in R! Let us start with the very basics of R programming in our
first lab :)
If this is your first time using R, there is a lot to explore. Try writing code for the
following problems. This will give you a general idea of how to use R, and you will
learn more over the next few labs.
(1) Create a vector c = [5, 10, 15, 20, 25, 30] and write a program which returns the
maximum and minimum of this vector.
(2) Write a program in R to find the factorial of a number, taking the number as input
from the user. Print an error message if the input number is negative.
(3) Write a program to print the first n terms of the Fibonacci sequence. You may take n
as an input from the user.
(4) Write an R program to make a simple calculator which can add, subtract, multiply
and divide.
(5) Explore plot, pie, barplot etc. (the plotting functions built into R).
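As a starting point for problems (1) to (3), here is a minimal sketch. The names v, fact and fib_terms are illustrative choices, the Fibonacci sequence is assumed to start 0, 1, and inputs are passed as function arguments; in an interactive session you could read them with readline() instead.

```r
# (1) Maximum and minimum of a vector
#     (named v to avoid confusion with the built-in c() function).
v <- c(5, 10, 15, 20, 25, 30)
cat("max =", max(v), " min =", min(v), "\n")

# (2) Factorial with an error for negative input (hypothetical helper).
fact <- function(n) {
  if (n < 0) stop("factorial is not defined for negative numbers")
  prod(seq_len(n))  # the product of an empty sequence is 1, so fact(0) is 1
}

# (3) First n terms of the Fibonacci sequence, assumed to start 0, 1.
fib_terms <- function(n) {
  if (n <= 0) return(numeric(0))
  f <- numeric(n)            # all zeros, so f[1] is already 0
  if (n >= 2) f[2] <- 1
  if (n >= 3) for (i in 3:n) f[i] <- f[i - 1] + f[i - 2]
  f
}

print(fact(5))
print(fib_terms(7))
```

Calling fact(-1) stops with the error message, which is one way to meet the requirement in problem (2).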
Experiment 2: Descriptive statistics, Sample space, definition of
probability
(1) (a) Suppose there is a chest of coins with 20 gold, 30 silver and 50 bronze coins.
You randomly draw 10 coins from this chest. Write R code that gives one outcome
of the sample space for this experiment. (Use sample(), a built-in function in R.)
(b) In a surgical procedure, the chances of success and failure are 90% and 10%
respectively. Generate an outcome for the next 10 surgical procedures performed.
(Use prob, an argument of the sample() function.)
(2) A room has n people, and each has an equal chance of being born on any of the 365
days of the year (for simplicity, we ignore leap years). What is the probability
that at least two people in the room have the same birthday?
(a) Use an R simulation to estimate this for various n.
(b) Find the smallest value of n for which the probability of a match is greater than
0.5.
(3) Write an R function for computing conditional probability. Call this function to
solve the following problem:
Suppose the probability of the weather being cloudy is 40%. Also suppose the
probability of rain on a given day is 20% and the probability of clouds on a rainy
day is 85%. If it is cloudy outside on a given day, what is the probability that it will
rain that day?
(4) The iris dataset is a built-in dataset in R that contains measurements on 4 different
attributes (in centimeters) for 150 flowers from 3 different species. Load this dataset
and do the following:
(5) R does not have a standard built-in function to calculate the statistical mode (the
built-in mode() returns the storage mode of an object instead). Create a user-defined
function that takes a vector as input and returns its mode as output.
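The problems above can be sketched as follows. This is one possible approach, not a model answer: the helper names (birthday_sim, birthday_exact, cond_prob, get_mode), the seed, and the number of replications are all illustrative assumptions.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# (1a) Draw 10 coins without replacement from the chest.
chest <- rep(c("gold", "silver", "bronze"), times = c(20, 30, 50))
draw <- sample(chest, size = 10)

# (1b) Ten procedures with P(success) = 0.9, via the prob argument.
surgeries <- sample(c("success", "failure"), size = 10,
                    replace = TRUE, prob = c(0.9, 0.1))

# (2) Birthday problem: simulated estimate and exact value for n people.
birthday_sim <- function(n, reps = 5000) {
  mean(replicate(reps, any(duplicated(sample(365, n, replace = TRUE)))))
}
birthday_exact <- function(n) 1 - prod((365 - seq_len(n) + 1) / 365)
n_min <- min(which(sapply(1:100, birthday_exact) > 0.5))

# (3) Conditional probability by Bayes' rule:
#     P(A | B) = P(B | A) * P(A) / P(B).
cond_prob <- function(p_b_given_a, p_a, p_b) p_b_given_a * p_a / p_b
p_rain_given_cloudy <- cond_prob(0.85, 0.20, 0.40)

# (5) Statistical mode: the most frequent value (the first one on ties).
get_mode <- function(v) {
  u <- unique(v)
  u[which.max(tabulate(match(v, u)))]
}

print(draw)
print(surgeries)
print(c(simulated = birthday_sim(23), exact = birthday_exact(23)))
print(n_min)
print(p_rain_given_cloudy)
print(get_mode(c(2, 3, 3, 5, 3, 2)))
```

The simulated estimate fluctuates around the exact value, so increasing reps is the natural knob to turn when the two disagree.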
Experiment 3: Probability distributions
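When working with different statistical distributions, we often want to make probabilistic
statements based on the distribution. We typically want to know one of four things: the
density (p.d.f.) at a particular value, the cumulative probability (c.d.f.) at a particular
value, the quantile corresponding to a particular probability, or a random draw of values
from the distribution.
Every distribution that R handles has four functions. There is a root name; for example,
the root name for the normal distribution is 'norm'. This root is prefixed by one of the
letters p (for "probability", the c.d.f.), q (for "quantile", the inverse c.d.f.), d (for
"density", the p.f. or p.d.f.) and r (for "random", a random draw from the distribution).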
For the normal distribution, these functions are pnorm, qnorm, dnorm, and rnorm. For the
binomial distribution, these functions are pbinom, qbinom, dbinom, and rbinom. And so
forth. For a continuous distribution (like the normal), the most useful functions for
problems involving probability calculations are the "p" and "q" functions (c.d.f. and
inverse c.d.f.), because the density (p.d.f.) calculated by the "d" function yields
probabilities only through integration, which the "p" function already does for you.
For a discrete distribution (like the binomial), the "d" function calculates the density
(p.f.), which in this case is a probability f(x) = P(X = x) and hence is useful in
calculating probabilities.
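To make the four prefixes concrete, here is a small sketch using the standard normal and a binomial distribution. The evaluation point 1.5 and the seed are arbitrary choices; integrate() is base R's numerical integrator, used here only to check that integrating the "d" function agrees with the "p" function.

```r
x <- 1.5

d <- dnorm(x)     # density of N(0, 1) at x
p <- pnorm(x)     # c.d.f.: P(Z <= x)
q <- qnorm(p)     # inverse c.d.f.: recovers x from the probability p
set.seed(1)       # arbitrary seed for reproducibility
r <- rnorm(3)     # three random draws from N(0, 1)

# For a continuous distribution, the area under the density between
# two points equals the difference of the c.d.f. at those points.
area_p   <- pnorm(1) - pnorm(-1)
area_int <- integrate(dnorm, lower = -1, upper = 1)$value

# For a discrete distribution, the "d" function is itself a probability:
# P(X = 2) for X ~ binom(size = 5, prob = 0.5).
p_two <- dbinom(2, size = 5, prob = 0.5)

print(c(d = d, p = p, q = q))
print(r)
print(c(area_p = area_p, area_int = area_int))
print(p_two)
```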
(1) Roll 12 dice simultaneously, and let X denote the number of 6's that appear.
Calculate the probability of getting 7, 8 or 9 sixes using R. (Try the function pbinom.
If we set S = {get a 6 on one roll}, then P(S) = 1/6 and the rolls constitute Bernoulli
trials; thus X ∼ binom(size = 12, prob = 1/6) and we are looking for P(7 ≤ X ≤ 9).)
(2) Assume that the test scores of a college entrance exam follow a normal distribution
with mean 72 and standard deviation 15.2. What percentage of students score 84 or
more in the exam?
(3) On average, five cars arrive at a particular car wash every hour. Let X count the
number of cars that arrive from 10AM to 11AM; then X ∼ Poisson(λ = 5). What is the
probability that no car arrives during this time? Next, suppose the car wash above
is in operation from 8AM to 6PM, and let Y be the number of customers that
appear in this period. Since this period covers a total of 10 hours, we get that Y ∼
Poisson(λ = 5 × 10 = 50). What is the probability that there are between 48 and 50
customers, inclusive?
(4) Suppose in a certain shipment of 250 Pentium processors there are 17 defective
processors. A quality control consultant randomly collects 5 processors for inspection
to determine whether or not they are defective. Let X denote the number of defectives
in the sample. Find the probability of exactly 3 defectives in the sample, that is, find
P(X = 3).
(5) A recent national study showed that approximately 44.7% of college students have
used Wikipedia as a source in at least one of their term papers. Let X equal the
number of students in a random sample of size n = 31 who have used Wikipedia as a
source.
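Problems (1) to (4) above each reduce to one or two calls to a "p" or "d" function. The following sketch shows one way to set them up; the variable names are illustrative, and treating problem (4) as hypergeometric (sampling without replacement) is an assumption based on the wording of the problem.

```r
# (1) X ~ binom(12, 1/6): P(7 <= X <= 9) = P(X <= 9) - P(X <= 6).
p1 <- pbinom(9, size = 12, prob = 1/6) - pbinom(6, size = 12, prob = 1/6)
p1_check <- sum(dbinom(7:9, size = 12, prob = 1/6))  # same value via dbinom

# (2) Scores ~ N(mean = 72, sd = 15.2): P(score >= 84), upper tail.
p2 <- pnorm(84, mean = 72, sd = 15.2, lower.tail = FALSE)

# (3) X ~ Poisson(5): P(X = 0); Y ~ Poisson(50): P(48 <= Y <= 50).
p3a <- dpois(0, lambda = 5)
p3b <- ppois(50, lambda = 50) - ppois(47, lambda = 50)

# (4) 17 defective and 233 good processors, 5 drawn without replacement:
#     P(X = 3) for a hypergeometric random variable.
p4 <- dhyper(3, m = 17, n = 233, k = 5)

print(c(p1 = p1, p2 = p2, p3a = p3a, p3b = p3b, p4 = p4))
```

Note the off-by-one in the discrete cases: because pbinom and ppois return P(X ≤ x), the lower limit is subtracted at 6 and 47, not at 7 and 48.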
Experiment 4: Mathematical Expectation, Moments and Functions of
Random Variables
x 0 1 2 3 4
p(x) 0.41 0.37 0.16 0.05 0.01
Find the average number of imperfections per 10 meters of this fabric. (Try the functions
sum(), weighted.mean() and c(a %*% b) to find the expected value/mean.)
(2) The time T, in days, required for the completion of a contracted project is a random
variable with probability density function f(t) = 0.1 exp(−0.1t) for t > 0 and 0
otherwise. Find the expected value of T. Use the function integrate() to find the
expected value of the continuous random variable T.
(3) A bookstore purchases three copies of a book at $6.00 each and sells them for $12.00
each. Unsold copies are returned for $2.00 each. Let X = number of copies sold and
Y = net revenue. If the probability mass function of X is:
x 0 1 2 3
p(x) 0.1 0.2 0.2 0.5
(4) Find the first and second moments about the origin of the random variable X with
probability density function f(x) = 0.5 exp(−|x|), 1 < x < 10 and 0 otherwise.
Further, use the results to find the mean and variance. (kth moment = E(X^k), Mean =
first moment and Variance = second moment − Mean^2.)
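The hints in this experiment can be sketched as below: three equivalent ways to compute a discrete expected value from the first table, and integrate() for the continuous cases. The function names f_T, f_X and moment are illustrative; problem (4) simply applies the formula and range exactly as stated in the problem.

```r
# Discrete case: x and p(x) from the first table of this experiment.
x <- 0:4
p <- c(0.41, 0.37, 0.16, 0.05, 0.01)

m_sum  <- sum(x * p)            # definition: sum of x * p(x)
m_wm   <- weighted.mean(x, p)   # weights are the probabilities
m_prod <- c(x %*% p)            # inner product, c() drops the matrix shape

# (2) Continuous case: E(T) is the integral of t * f(t) over (0, Inf).
f_T <- function(t) 0.1 * exp(-0.1 * t)
E_T <- integrate(function(t) t * f_T(t), lower = 0, upper = Inf)$value

# (4) k-th moment about the origin, using f(x) on 1 < x < 10 as stated.
f_X <- function(x) 0.5 * exp(-abs(x))
moment <- function(k) {
  integrate(function(x) x^k * f_X(x), lower = 1, upper = 10)$value
}
mean_X <- moment(1)
var_X  <- moment(2) - mean_X^2

print(c(m_sum = m_sum, m_wm = m_wm, m_prod = m_prod))
print(E_T)
print(c(first = moment(1), second = moment(2), variance = var_X))
```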
Experiment 5: Continuous Probability Distributions
(1) Let X be the time (in minutes) that a person has to wait in order to take a
flight. If a flight takes off every hour, then X ∼ U(0, 60). Find the probability that
(2) The time (in hours) required to repair a machine is an exponentially distributed
random variable with parameter λ = 1/2.
(3) The lifetime of certain equipment is described by a random variable X that follows
a Gamma distribution with parameters α = 2 and β = 1/3.
(a) Find the probability that the lifetime of the equipment is (i) 3 units of time, and
(ii) at least 1 unit of time.
(b) What is the value of c, if P(X ≤ c) ≥ 0.70? (Hint: try the quantile function
qgamma().)
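For problem (3), parts (a)(ii) and (b), a minimal sketch follows. It assumes that α is the shape and β is the scale parameter; if your course treats β as a rate, pass rate = 1/3 instead of scale = 1/3. The variable names are illustrative.

```r
alpha <- 2      # shape
beta  <- 1/3    # ASSUMPTION: scale parameter (use rate = beta if it is a rate)

# (a)(ii) P(X >= 1): upper tail of the c.d.f.
p_at_least_1 <- pgamma(1, shape = alpha, scale = beta, lower.tail = FALSE)

# (b) Smallest c with P(X <= c) >= 0.70: the 0.70 quantile.
c_70 <- qgamma(0.70, shape = alpha, scale = beta)

print(p_at_least_1)
print(c_70)
print(pgamma(c_70, shape = alpha, scale = beta))  # check: returns 0.70
```

The last line is a useful habit with any "q" function: feed the quantile back into the matching "p" function and confirm that the probability comes back.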
Experiment 6: Joint probability mass and density functions
Write R code to
(2) The joint probability mass function of two random variables X and Y is
f (x, y) = (x + y)/30; x = 0, 1, 2, 3; y = 0, 1, 2
Write R code to
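Whatever the specific task, a common first step with a joint p.m.f. such as f(x, y) = (x + y)/30 is to tabulate it as a matrix, check that it sums to 1, and read off the marginals. The sketch below shows one way to do that; the names joint, g and h are illustrative.

```r
x <- 0:3
y <- 0:2

# Joint p.m.f. as a matrix: rows index x, columns index y.
joint <- outer(x, y, function(x, y) (x + y) / 30)
dimnames(joint) <- list(x = as.character(x), y = as.character(y))

total <- sum(joint)     # a valid p.m.f. must sum to 1
g <- rowSums(joint)     # marginal p.m.f. of X
h <- colSums(joint)     # marginal p.m.f. of Y

print(joint)
print(total)
print(g)
print(h)
```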
Experiment 7: Chi-square, t-distribution, F-distribution
(1) Use the rt(n, df) function in R to investigate the t-distribution for n = 100 and
df = n − 1, and plot a histogram of the generated values.
(2) Use the rchisq(n, df) function in R to investigate the chi-square distribution with
n = 100 and df = 2, 10, 25.
(3) Generate a vector of 100 values between −6 and 6. Use the dt() function in R to
find the values of a t-distribution given a random variable X and degrees of freedom
1, 4, 10, 30. Using these values, plot the density function of Student's t-distribution
with 30 degrees of freedom. Also show a comparison of the probability density
functions for the different degrees of freedom (1, 4, 10, 30).
(i) To find the 95th percentile of the F-distribution with (10, 20) degrees of
freedom.
(ii) To calculate the area under the curve for the interval [0, 1.5] and the interval
[1.5, ∞) of an F-curve with v1 = 10 and v2 = 20 (use pf()).
(iii) To calculate the quantile for a given area (= probability) under the curve for an
F-curve with v1 = 10 and v2 = 20 that corresponds to q = 0.25, 0.5, 0.75 and
0.999 (use qf()).
(iv) To generate 1000 random values from the F-distribution with v1 = 10 and v2 =
20 (use rf()) and plot a histogram.
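The tasks in this experiment can be sketched as follows. The seed, plot titles and variable names are illustrative choices, and df = n − 1 in the first step is an assumption about the intended degrees of freedom for problem (1).

```r
set.seed(1)  # arbitrary seed for reproducibility
n <- 100

# t-distribution sample and histogram (ASSUMPTION: df = n - 1).
hist(rt(n, df = n - 1), main = "t sample, df = 99", xlab = "value")

# Chi-square samples for several degrees of freedom.
for (d in c(2, 10, 25)) {
  hist(rchisq(n, df = d), main = paste("chi-square sample, df =", d),
       xlab = "value")
}

# t densities on a grid of 100 values between -6 and 6.
x <- seq(-6, 6, length.out = 100)
dfs <- c(1, 4, 10, 30)
dens <- sapply(dfs, function(d) dt(x, df = d))  # one column per df
matplot(x, dens, type = "l", lty = 1, col = 1:4,
        xlab = "x", ylab = "density", main = "t densities")
legend("topright", legend = paste("df =", dfs), col = 1:4, lty = 1)

# F-distribution with v1 = 10 and v2 = 20.
q95   <- qf(0.95, df1 = 10, df2 = 20)                      # (i)
lower <- pf(1.5, df1 = 10, df2 = 20)                       # (ii) [0, 1.5]
upper <- pf(1.5, df1 = 10, df2 = 20, lower.tail = FALSE)   # (ii) [1.5, Inf)
quants <- qf(c(0.25, 0.5, 0.75, 0.999), df1 = 10, df2 = 20)  # (iii)
hist(rf(1000, df1 = 10, df2 = 20), main = "F(10, 20) sample",
     xlab = "value")                                       # (iv)

print(q95)
print(c(lower = lower, upper = upper))
print(quants)
```

As the degrees of freedom grow, the t density curves in the comparison plot become visibly closer to the standard normal shape, with lighter tails.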
Experiment 8: Sample statistics and Testing hypothesis
See the red vertical line in the histogram? That's the population mean. Comment on
whether the data are normally distributed.
Now perform the following tasks:
(a) Draw sufficiently many samples of size 10, calculate their means, and plot them
in R as a histogram. Do you get a normal distribution?
(b) Now repeat the same with sample sizes 50, 500 and 9000. Comment on what you
observe.
Here, we get a good bell-shaped curve, and the sampling distribution approaches a
normal distribution as the sample size increases. Therefore, we can recommend that the
organization use the sampling distribution of the mean for further analysis.
2. The following table gives the ages and cholesterol levels for a random
sample of 10 men:
Age 58 69 43 39 63 52 47 31 74 36
Cholesterol 189 235 193 177 154 191 213 165 198 181
Plot the scatter diagram and a regression line that will enable us to predict cholesterol
level from age. Further, estimate the cholesterol level of a 60-year-old man.
3. A research methodology course has recently been added to the PhD curriculum at
the Thapar Institute of Engineering and Technology, Patiala. To evaluate its
effectiveness, students take a test on formulating research problems and writing
research papers both before and after completing the course. Below are the marks for
a random sample of ten students:
Before the course 145 173 158 141 167 159 154 167 145 153
After the course 155 167 156 149 168 162 158 169 157 161
Assume that the differences between the pre-course and post-course test scores are
normally distributed, and a high score on the test indicates a strong level of
assertiveness. Do the collected data, at the 5% level of significance, provide enough
evidence to conclude that research scholars become more assertive after completing the
course?
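Problems 2 and 3 can be approached with lm() and t.test() from base R, as in the sketch below. The variable names are illustrative, and using a one-sided paired t-test (alternative "greater", after minus before) is an assumption about how the hypothesis "scores increase after the course" is best encoded.

```r
# Problem 2: simple linear regression of cholesterol on age.
age  <- c(58, 69, 43, 39, 63, 52, 47, 31, 74, 36)
chol <- c(189, 235, 193, 177, 154, 191, 213, 165, 198, 181)

fit <- lm(chol ~ age)
plot(age, chol, xlab = "Age", ylab = "Cholesterol",
     main = "Cholesterol vs age")
abline(fit, col = "red")   # fitted regression line

# Predicted cholesterol level for a 60-year-old man.
pred60 <- predict(fit, newdata = data.frame(age = 60))

# Problem 3: paired t-test, H0: mean difference = 0
# against H1: mean(after - before) > 0.
before <- c(145, 173, 158, 141, 167, 159, 154, 167, 145, 153)
after  <- c(155, 167, 156, 149, 168, 162, 158, 169, 157, 161)
tt <- t.test(after, before, paired = TRUE, alternative = "greater")

print(coef(fit))
print(pred60)
print(tt)
```

Reject the null hypothesis at the 5% level if tt$p.value is below 0.05; for these data the test statistic is about 2.26 on 9 degrees of freedom.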