Probst at Lab
We hope you already have R and RStudio installed on your computers. Good luck with
your programming in R! Let us start with the very basics of R programming here in our
first Lab :)
If this is your first time using R, there is a lot to explore. Try writing
code for the following problems. This will give you a general idea of how to use R, and
over the next few labs you will learn more.
(1) Create a vector c = [5, 10, 15, 20, 25, 30] and write a program which returns the
maximum and minimum of this vector.
(2) Write a program in R to find the factorial of a number, taking the number as input
from the user. Print an error message if the input number is negative.
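A minimal sketch for problem (2). It assumes the number is passed to a hypothetical helper, `fact()`, instead of being read interactively, and it signals the error with `stop()`; in an R session the value could come from `readline()` as shown in the commented lines.

```r
# Hypothetical helper: factorial of a non-negative whole number.
fact <- function(n) {
  if (n < 0) {
    stop("Factorial is not defined for negative numbers.")
  }
  result <- 1
  for (i in seq_len(n)) {   # seq_len(0) is empty, so fact(0) returns 1
    result <- result * i
  }
  result
}

fact(5)  # 120

# Interactive use (uncomment in an R session):
# n <- as.integer(readline("Enter a number: "))
# fact(n)
```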
(3) Write a program to print the first n terms of the Fibonacci sequence. You may take n as
an input from the user.
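One possible sketch for problem (3), assuming the sequence starts 0, 1 and that n is passed to a hypothetical helper `fib()` rather than typed in by the user.

```r
# Hypothetical helper: first n terms of the Fibonacci sequence, starting 0, 1.
fib <- function(n) {
  if (n <= 0) return(numeric(0))
  terms <- numeric(n)          # filled with zeros, so terms[1] is already 0
  if (n >= 2) terms[2] <- 1
  if (n >= 3) {
    for (i in 3:n) terms[i] <- terms[i - 1] + terms[i - 2]
  }
  terms
}

fib(8)  # 0 1 1 2 3 5 8 13
```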
(4) Write an R program to make a simple calculator which can add, subtract, multiply
and divide.
(5) Explore plot, pie, barplot etc. (the plotting options) which are built-in functions in R.
Probability and Statistics (UCS410)
Experiment 2: Descriptive statistics, Sample space, definition of
probability
(1) (a) Suppose there is a chest of coins with 20 gold, 30 silver and 50 bronze coins.
You randomly draw 10 coins from this chest. Write an R code which will give us the
sample space for this experiment. (use of sample(): an in-built function in R)
(b) In a surgical procedure, the chances of success and failure are 90% and 10%
respectively. Generate a sample space for the next 10 surgical procedures performed.
(use of the prob argument of sample(), an in-built function in R)
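A short sketch of both parts of problem (1), assuming that "sample space" here means one simulated outcome of the experiment. In base R the outcome probabilities are supplied through the `prob` argument of `sample()`. The seed and the variable names are arbitrary choices.

```r
set.seed(1)  # arbitrary seed so the draws are reproducible

# (a) Chest with 20 gold, 30 silver and 50 bronze coins; draw 10 without replacement.
chest <- rep(c("gold", "silver", "bronze"), times = c(20, 30, 50))
draw <- sample(chest, size = 10)
draw

# (b) Ten independent procedures with P(success) = 0.9 and P(failure) = 0.1.
outcomes <- sample(c("success", "failure"), size = 10,
                   replace = TRUE, prob = c(0.9, 0.1))
outcomes
```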
(2) A room has n people, and each has an equal chance of being born on any of the 365
days of the year. (For simplicity, we’ll ignore leap years.) What is the probability
that at least two people in the room have the same birthday?
(a) Use an R simulation to estimate this for various n.
(b) Find the smallest value of n for which the probability of a match is greater than
0.5.
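A possible sketch for problem (2). The helper names `birthday_sim()` and `birthday_exact()`, the number of trials and the seed are all assumptions made for illustration; the exact formula is included only as a check on the simulation.

```r
set.seed(42)  # arbitrary seed

# (a) Estimate P(at least one shared birthday) among n people by simulation.
birthday_sim <- function(n, trials = 10000) {
  hits <- replicate(trials, {
    days <- sample(365, n, replace = TRUE)
    any(duplicated(days))
  })
  mean(hits)
}

# Exact probability, for comparison with the simulated estimate.
birthday_exact <- function(n) 1 - prod((365 - seq_len(n) + 1) / 365)

sapply(c(10, 23, 40), birthday_sim)

# (b) Smallest n whose exact probability exceeds 0.5.
n_values <- 1:100
smallest_n <- min(n_values[sapply(n_values, birthday_exact) > 0.5])
smallest_n  # 23
```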
(3) Write an R function for computing conditional probability. Call this function to do
the following problem:
Suppose the probability of the weather being cloudy is 40%. Also suppose the
probability of rain on a given day is 20% and that the probability of clouds on a rainy day
is 85%. If it’s cloudy outside on a given day, what is the probability that it will rain
that day?
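A minimal sketch for problem (3), assuming the function is meant to apply Bayes' rule, P(A | B) = P(B | A) P(A) / P(B). The name `cond_prob()` and its argument names are hypothetical.

```r
# Hypothetical helper: P(A | B) from P(B | A), P(A) and P(B) via Bayes' rule.
cond_prob <- function(p_b_given_a, p_a, p_b) {
  p_b_given_a * p_a / p_b
}

# A = rain, B = cloudy.
p_rain_given_cloudy <- cond_prob(p_b_given_a = 0.85, p_a = 0.20, p_b = 0.40)
p_rain_given_cloudy  # 0.425
```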
(4) The iris dataset is a built-in dataset in R that contains measurements on 4 different
attributes (in centimeters) for 150 flowers from 3 different species. Load this dataset
and do the following:
(5) R does not have a standard in-built function to calculate the statistical mode. Create a
user-defined function to calculate the mode of a data set in R. This function takes a vector
as input and gives the mode value as output.
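One common way to write such a function is sketched below. The name `get_mode()` is hypothetical, and the tie-breaking rule (return the value that appears first) is an assumption, since the sheet does not say what to do with ties.

```r
# Hypothetical helper: most frequent value of a vector.
# If several values tie, the one that appears first in v is returned.
get_mode <- function(v) {
  uniq <- unique(v)
  uniq[which.max(tabulate(match(v, uniq)))]
}

get_mode(c(2, 1, 2, 3, 1, 2, 4))  # 2
get_mode(c("a", "b", "b"))        # "b"
```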
Experiment 3: Probability distributions
(1) Roll 12 dice simultaneously, and let X denote the number of 6’s that appear. Calculate
the probability of getting 7, 8 or 9 sixes using R. (Try using the function pbinom.
If we set S = {get a 6 on one roll}, then P(S) = 1/6 and the rolls constitute Bernoulli
trials; thus X ∼ binom(size=12, prob=1/6) and we are looking for P(7 ≤ X ≤ 9).)
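A short sketch of the calculation, using P(7 ≤ X ≤ 9) = P(X ≤ 9) − P(X ≤ 6). The second line sums the point masses with `dbinom()` as a cross-check; the variable names are arbitrary.

```r
# P(7 <= X <= 9) for X ~ binom(size = 12, prob = 1/6), via the CDF.
p_cdf <- pbinom(9, size = 12, prob = 1/6) - pbinom(6, size = 12, prob = 1/6)

# Same probability by summing the point masses.
p_pmf <- sum(dbinom(7:9, size = 12, prob = 1/6))

p_cdf  # about 0.00129
```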
(2) Assume that the test scores of a college entrance exam fit a normal distribution.
Furthermore, the mean test score is 72, and the standard deviation is 15.2. What is
the percentage of students scoring 84 or more in the exam?
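The upper-tail probability can be read directly from `pnorm()` with `lower.tail = FALSE`, as in this sketch (the variable name is arbitrary).

```r
# P(score >= 84) for scores ~ Normal(mean = 72, sd = 15.2).
p_84_or_more <- pnorm(84, mean = 72, sd = 15.2, lower.tail = FALSE)

round(100 * p_84_or_more, 2)  # percentage of students, roughly 21.5
```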
(3) On the average, five cars arrive at a particular car wash every hour. Let X count the
number of cars that arrive from 10AM to 11AM; then X ∼ Poisson(λ = 5). What is the
probability that no car arrives during this time? Next, suppose the car wash above
is in operation from 8AM to 6PM, and we let Y be the number of customers that
appear in this period. Since this period covers a total of 10 hours, we get that Y ∼
Poisson(λ = 5 × 10 = 50). What is the probability that there are between 48 and 50
customers, inclusive?
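Both probabilities follow from `dpois()`; a sketch with arbitrary variable names is given here. The interval probability could equally be written as `ppois(50, 50) - ppois(47, 50)`.

```r
# P(X = 0) for X ~ Poisson(lambda = 5); this equals exp(-5).
p_no_car <- dpois(0, lambda = 5)

# P(48 <= Y <= 50) for Y ~ Poisson(lambda = 50).
p_48_to_50 <- sum(dpois(48:50, lambda = 50))

p_no_car
p_48_to_50  # about 0.168
```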
(4) Suppose in a certain shipment of 250 Pentium processors there are 17 defective
processors. A quality control consultant randomly collects 5 processors for inspection to
determine whether or not they are defective. Let X denote the number of defectives
in the sample. Find the probability of exactly 3 defectives in the sample, that is, find
P(X = 3).
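Since the processors are drawn without replacement, X is hypergeometric, and `dhyper()` gives the point probability. In R's parameterisation, `m` is the number of defectives (17), `n` the number of non-defectives (250 − 17 = 233) and `k` the sample size (5). The second line repeats the calculation with binomial coefficients as a check.

```r
# P(X = 3): 3 defectives in a sample of 5 from 17 defective and 233 good processors.
p3 <- dhyper(3, m = 17, n = 233, k = 5)

# Same value from the counting formula.
p3_check <- choose(17, 3) * choose(233, 2) / choose(250, 5)

p3  # about 0.00235
```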
(5) A recent national study showed that approximately 44.7% of college students have
used Wikipedia as a source in at least one of their term papers. Let X equal the
number of students in a random sample of size n = 31 who have used Wikipedia as a
source.
2. The time T, in days, required for the completion of a contracted project is a random
variable with probability density function f(t) = 0.1 e^(-0.1t) for t > 0 and 0 otherwise. Find
the expected value of T.
Use the function integrate() to find the expected value of the continuous random variable T.
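A minimal sketch of the `integrate()` call, computing E(T) as the integral of t f(t) over (0, ∞). The names `f` and `expected_T` are arbitrary.

```r
# Density of T: f(t) = 0.1 * exp(-0.1 * t) for t > 0.
f <- function(t) 0.1 * exp(-0.1 * t)

# E(T) = integral of t * f(t) from 0 to infinity.
expected_T <- integrate(function(t) t * f(t), lower = 0, upper = Inf)

expected_T$value  # 10 days, up to numerical error
```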
3. A bookstore purchases three copies of a book at $6.00 each and sells them for $12.00
each. Unsold copies are returned for $2.00 each. Let X = {number of copies sold} and
Y = {net revenue}. If the probability mass function of X is
x 0 1 2 3
p(x) 0.1 0.2 0.2 0.5
4. Find the first and second moments about the origin of the random variable X with
probability density function f(x) = 0.5 e^(-|x|), 1 < x < 10 and 0 otherwise. Further use the
results to find the mean and variance.
(kth moment = E(X^k), Mean = first moment and Variance = second moment – Mean^2.)
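The moments can be sketched with `integrate()` as below. The function is taken exactly as stated in the problem, on 1 < x < 10, without any renormalisation; `raw_moment()` is a hypothetical helper name.

```r
# Function as stated in the problem: f(x) = 0.5 * exp(-|x|) on 1 < x < 10.
f <- function(x) 0.5 * exp(-abs(x))

# Hypothetical helper: k-th moment about the origin, E(X^k).
raw_moment <- function(k) {
  integrate(function(x) x^k * f(x), lower = 1, upper = 10)$value
}

m1 <- raw_moment(1)   # first moment (mean)
m2 <- raw_moment(2)   # second moment
var_x <- m2 - m1^2    # variance = second moment - mean^2

c(mean = m1, second_moment = m2, variance = var_x)
```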
1. Let X be the time (in minutes) that a person has to wait in order to take a flight.
If a flight takes off every hour, then X ~ U(0, 60). Find the probability that
(a) waiting time is more than 45 minutes, and
(b) waiting time lies between 20 and 30 minutes.
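Both parts reduce to differences of the uniform CDF, which `punif()` provides. A short sketch with arbitrary variable names:

```r
# X ~ U(0, 60)
p_more_than_45 <- punif(45, min = 0, max = 60, lower.tail = FALSE)      # (a) 0.25
p_20_to_30 <- punif(30, min = 0, max = 60) - punif(20, min = 0, max = 60)  # (b) 1/6

p_more_than_45
p_20_to_30
```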
2. The time (in hours) required to repair a machine is an exponentially distributed random
variable with parameter λ = 1/2.
(a) Find the value of density function at x = 3.
(b) Plot the graph of exponential probability distribution for 0 ≤ x ≤ 5.
(c) Find the probability that a repair time takes at most 3 hours.
(d) Plot the graph of cumulative exponential probabilities for 0 ≤ x ≤ 5.
(e) Simulate 1000 exponentially distributed random numbers with λ = ½ and plot the
simulated data.
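Parts (a) to (e) map onto `dexp()`, `pexp()` and `rexp()`, assuming λ is the rate parameter. The sketch below uses an arbitrary seed, an arbitrary grid of x values, and a histogram as one possible way to plot the simulated data.

```r
rate <- 1 / 2

d3 <- dexp(3, rate = rate)   # (a) density at x = 3
p3 <- pexp(3, rate = rate)   # (c) P(X <= 3)

x <- seq(0, 5, by = 0.01)
plot(x, dexp(x, rate = rate), type = "l", xlab = "x", ylab = "f(x)")  # (b) density
plot(x, pexp(x, rate = rate), type = "l", xlab = "x", ylab = "F(x)")  # (d) CDF

set.seed(123)                        # arbitrary seed
sim <- rexp(1000, rate = rate)       # (e) simulated repair times
hist(sim, breaks = 30, main = "1000 simulated repair times", xlab = "hours")
```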
Probability and Statistics(UCS410) Exp. sheet 06 (Joint probability mass and density functions)
————————————————————————————————————————————
(2) The joint probability mass function of two random variables X and Y is
(vi) find E(X), E(Y), E(XY), Var(X), Var(Y), Cov(X, Y) and the correlation coefficient.
School of Mathematics(SOM)
————————————————————————————————————————————
(1) Use the rt(n, df) function in R to investigate the t-distribution for n = 100 and df = n − 1 and plot
(2) Use the rchisq(n, df) function in R to investigate the chi-square distribution with n = 100 and
df = 2, 10, 25.
(3) Generate a vector of 100 values between -6 and 6. Use the dt() function in R to find the values of a
t-distribution given a random variable x and degrees of freedom 1, 4, 10, 30. Using these values, plot
the density function for Student’s t-distribution with degrees of freedom 30. Also shows a comparison
(i) To find the 95th percentile of the F-distribution with (10, 20) degrees of freedom.
(ii) To calculate the area under the curve for the interval [0, 1.5] and the interval [1.5, +∞) of
(iii) To calculate the quantile for a given area (= probability) under the curve for an F-curve
with v1 = 10 and v2 = 20 that corresponds to q = 0.25, 0.5, 0.75 and 0.999. (use the qf())
(iv) To generate 1000 random values from the F-distribution with v1 = 10 and v2 = 20 (use
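The four F-distribution tasks can be sketched with `qf()`, `pf()` and `rf()`. Two assumptions are made because the text is cut off: that the curve in (ii) is the same F-curve with (10, 20) degrees of freedom, and that `rf()` is the intended function in (iv). The seed and variable names are arbitrary.

```r
# (i) 95th percentile of F(10, 20).
q95 <- qf(0.95, df1 = 10, df2 = 20)

# (ii) Areas under the curve on [0, 1.5] and on [1.5, +Inf).
area_below <- pf(1.5, df1 = 10, df2 = 20)
area_above <- pf(1.5, df1 = 10, df2 = 20, lower.tail = FALSE)

# (iii) Quantiles for the given probabilities.
quantiles <- qf(c(0.25, 0.5, 0.75, 0.999), df1 = 10, df2 = 20)

# (iv) 1000 random values from F(10, 20).
set.seed(7)  # arbitrary seed
f_sample <- rf(1000, df1 = 10, df2 = 20)

q95
c(area_below, area_above)
quantiles
```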
See the red vertical line in the histogram? That’s the population mean. Comment on
whether the data is normally distributed or not.
Now perform the following tasks:
(a) Draw sufficient samples of size 10, calculate their means, and plot them in R
by making a histogram. Do you get a normal distribution?
(b) Now repeat the same with sample sizes 50, 500 and 9000. Can you comment on
what you observe?
Here, we get a good bell-shaped curve, and the sampling distribution approaches a
normal distribution as the sample size increases. Therefore, we can recommend that the
organization use the sampling distribution of the mean for further analysis.
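The sampling experiment in tasks (a) and (b) can be sketched as follows. The data set used in the sheet is not reproduced here, so the code simulates a hypothetical right-skewed population of 10,000 values as a stand-in; the helper `sample_means()`, the number of repetitions and the seed are likewise assumptions.

```r
set.seed(2024)  # arbitrary seed

# Hypothetical stand-in population (right-skewed); replace with the real data.
population <- rexp(10000, rate = 1 / 50)

# Means of `reps` random samples of size n, drawn without replacement.
sample_means <- function(n, reps = 1000) {
  replicate(reps, mean(sample(population, n)))
}

op <- par(mfrow = c(2, 2))
for (n in c(10, 50, 500, 9000)) {
  hist(sample_means(n), main = paste("Sample size", n), xlab = "Sample mean")
  abline(v = mean(population), col = "red")  # population mean
}
par(op)
```

With a skewed population like this one, the histogram for samples of size 10 is still visibly asymmetric, while the larger sample sizes give narrower and more symmetric histograms centred on the red line.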