R Programming Student Lab Manual

The R function pgeom(q, prob, lower.tail) gives the cumulative probability
(lower.tail = TRUE for the left tail, lower.tail = FALSE for the right tail) of
observing q or fewer failures prior to the first success.
Example: A sports marketer randomly selects persons on the street until he
encounters someone who attended a game last season. What is the
probability that the marketer encounters at most x <= 5 people who did not
attend a game before finding someone who did, when the population
probability is p = 0.20?
Sol: p = 0.20, n = 5
p <- 0.20
n <- 5
# exact cumulative probability of 5 or fewer failures before the first success
pgeom(q = n, prob = p, lower.tail = TRUE)
[1] 0.737856
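As a quick sketch of the lower.tail argument, the complementary right-tail
probability (more than 5 failures before the first success) is obtained by
flipping the flag:

# Right tail: probability of more than 5 failures before the first success
pgeom(q = 5, prob = 0.20, lower.tail = FALSE)
[1] 0.262144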

Experiment 10. Implement built-in R functions for sample statistics and
statistical tests
10(a) Calculate Confidence Interval in R for Normal Distribution
# Assume mean = 12
# Standard deviation = 3
# Sample size n = 30
# 95 percent confidence interval, so each tail holds .025 (use the 0.975 quantile)

Solution
> center <- 12
> sd <- 3
> n <- 30
> E <- qnorm(0.975)*sd/sqrt(n)
> E
[1] 1.073516
> lower_bound <- center - E
> lower_bound
[1] 10.92648
> upper_bound <- center + E
> upper_bound
[1] 13.07352
Therefore lower_bound is 10.92648 and upper_bound is 13.07352.
Thus the range in this case is between 10.9 and 13.1 (rounding outwards).
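When the standard deviation is estimated from the sample itself rather than
assumed known, the same interval would normally be built with the t
distribution. A minimal sketch, reusing the numbers above:

# 95% CI using the t distribution (sd estimated from the sample)
center <- 12
s <- 3
n <- 30
E_t <- qt(0.975, df = n - 1) * s / sqrt(n)   # t quantile with n - 1 degrees of freedom
c(center - E_t, center + E_t)                # slightly wider than the normal-based interval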

Experiment 10 (b) Hypothesis Testing


Hypothesis testing is mathematically related to the problem of finding
confidence intervals, but the approach is different. For confidence intervals,
you use the data to tell you where the unknown parameter should lie; for
hypothesis testing, you make a hypothesis about the value of the unknown
parameter and then calculate how likely it is that you would observe data as
extreme as, or more extreme than, what you saw. With R you will not notice
much difference, as the same functions are used for both; the way you use
them is slightly different, though.
Question: Consider a simple survey. You ask 100 people (randomly chosen)
and 42 say "yes" to your question. Does this support the hypothesis that the
true proportion is 50%? To answer this, we set up a test of hypothesis. The null
hypothesis, denoted H0, is that p = 0.5; the alternative hypothesis, denoted HA,
in this example is p ≠ 0.5. This is a so-called "two-sided" alternative. To test
the hypothesis, we use the function prop.test, as with the confidence
interval calculation.
Solution:
> prop.test(42,100,p=.5)

Note the p-value of 0.1336. The p-value reports how likely we are to see this
data, or worse, assuming the null hypothesis. The notion of "worse" is implied
by the alternative hypothesis. In this example, the alternative is two-sided, as
either too small or too large a value of the test statistic is consistent with HA.
In particular, the p-value is the probability of 42 or fewer, or 58 or more,
answering "yes" when the chance a person will answer "yes" is fifty-fifty.

Now, the p-value is not so small as to make an observation of 42 out of 100
seem unreasonable under the null hypothesis. Thus, one would "accept" the
null hypothesis. Next, we repeat the survey, but this time suppose we ask 1000
people and 420 say yes. Does this still support the null hypothesis that p = 0.5?
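Running the same test on the larger sample is a one-line call of the same form:

> prop.test(420, 1000, p = .5)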

Now the p-value is tiny (that's 0.0000004956!) and the null hypothesis is not
supported. That is, we "reject" the null hypothesis. This illustrates that the
p-value depends not just on the ratio, but also on n. In particular, this is because
the standard error of the sample proportion gets smaller as n gets larger.
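A quick numerical sketch of that shrinking standard error, assuming p = 0.5:

# Standard error of a sample proportion, sqrt(p*(1-p)/n), at p = 0.5
sqrt(0.5 * 0.5 / 100)     # 0.05 for n = 100
sqrt(0.5 * 0.5 / 1000)    # about 0.0158 for n = 1000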

Experiment 11. Implement R Program to predict data using Linear Regression model.

Linear regression is one of the simplest and most common supervised machine
learning algorithms that data scientists use for predictive modeling. In this
experiment, we use linear regression to build a model that predicts cherry tree
volume from metrics that are much easier for people who study trees to measure.

 Collect some data relevant to the problem (more is almost always better).
 Clean, augment, and preprocess the data into a convenient form, if
needed.
 Conduct an exploratory analysis of the data to get a better sense of it.
 Using what you find as a guide, construct a model of some aspect of the
data.
 Use the model to answer the question you started with, and validate your
results.

 Simple Linear Regression is handled by the built-in function 'lm' in R, as the
sketch below illustrates.
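As an illustrative sketch of this workflow, assuming R's built-in trees dataset
(girth, height and volume of 31 felled cherry trees):

# A minimal sketch of the workflow using the built-in 'trees' dataset
data(trees)                                        # Girth, Height, Volume of cherry trees
summary(trees)                                     # quick exploratory look at the data
fit <- lm(Volume ~ Girth + Height, data = trees)   # construct the model
summary(fit)                                       # inspect the coefficients and fit
predict(fit, newdata = data.frame(Girth = 12, Height = 75))   # use the model to answer a question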

Creating the Linear Regression Model and fitting it with the training set
regressor = lm(formula = Y ~ X, data = training_set)
This line creates a regressor and fits it to the training data set. Here Y is the
response column and X the predictor column of training_set.
Multiple Linear Regression is also handled by the function lm.

Creating the Multiple Linear Regressor and fitting it with the training set
regressor = lm(Y ~ ., data = training_set)
The formula 'Y ~ .' takes all variables in training_set except Y as
independent variables.
lm() Function
This function creates the relationship model between the predictor and the
response variable.
Create Relationship Model & get the coefficients
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y ~ x)

print(relation)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
   -38.4551       0.6746

Get the Summary of the Relationship
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function and print the model summary.
relation <- lm(y ~ x)
print(summary(relation))

predict() Function
Syntax
The basic syntax for predict() in linear regression is:
predict(object, newdata)
Following is the description of the parameters used:
 object is the model object already created using the lm() function.
 newdata is a data frame containing the new values for the predictor variable.

Predict the weight of new persons

# The predictor vector (height in cm).
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)

# The response vector (weight in kg).
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y ~ x)

# Find the weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)
       1
76.22869
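predict() also accepts several new values at once; a small sketch with
illustrative heights:

# Predict weights for several heights in one call (illustrative values)
new_heights <- data.frame(x = c(160, 170, 180))
predict(relation, new_heights)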
Visualize the Regression Graphically

# Create the predictor and response variable.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# Give the chart file a name.
png(file = "linearregression.png")

# Plot the chart and add the fitted regression line.
plot(y, x, col = "blue", main = "Height & Weight Regression",
     cex = 1.3, pch = 16, xlab = "Weight in Kg", ylab = "Height in cm")
abline(lm(x ~ y))

# Save the file.
dev.off()

Experiment 12. Plotting in R
The most used plotting function in R programming is the plot() function. It
is a generic function, meaning, it has many methods which are called according
to the type of object passed to plot().
In the simplest case, we can pass in a vector and we will get a scatter plot of
magnitude vs index. But generally, we pass in two vectors and a scatter plot of
these points is produced.
For example, the command plot(c(1,2),c(3,5)) would plot the points (1,3) and
(2,5).
Here is a more concrete example where we plot the sine function over the range
-pi to pi.

x <- seq(-pi,pi,0.1)
plot(x, sin(x))

Adding Titles and Labelling Axes
We can add a title to our plot with the parameter main. Similarly, xlab and ylab
can be used to label the x-axis and y-axis respectively.

plot(x, sin(x),
main="The Sine Function", ylab="sin(x)")

Changing Color and Plot Type


We can see above that the plot uses circular points that are black in color.
These are the default plotting character and color.
We can change the plot type with the argument type. It accepts the following
strings and has the given effect.

"p" - points "l" - lines


"b" - both points and lines
"c" - empty points joined by lines "o" - overplotted points and lines "s" and "S" -
stair steps
"h" - histogram-like vertical lines
"n" - does not produce any points or line
Similarly, we can define the color using col.

plot(x, sin(x),
main="The Sine Function", ylab="sin(x)",
type="l", col="blue")

R 3D PLOTS
There are many functions in R programming for creating 3D plots. In this
section, we will discuss the persp() function, which can be used to create 3D
surfaces in perspective view.
This function mainly takes three variables, x, y and z, where x and y are
vectors defining locations along the x- and y-axes. The height of the surface
(z-axis) is given in the matrix z.
As an example, let's plot a cone. A simple right circular cone can be obtained
with the following function.

cone <- function(x, y) {
  sqrt(x^2 + y^2)
}
Now let’s prepare our variables.

x <- y <- seq(-1, 1, length = 20)
z <- outer(x, y, cone)
We used the function seq() to generate a vector of equally spaced numbers.
Then, we used the outer() function to apply the function cone at every
combination of x and y.
Finally, plot the 3D surface as follows.
persp(x, y, z)

Adding Titles and Labelling Axes to Plot
We can add a title to our plot with the parameter main. Similarly, xlab, ylab and
zlab can be used to label the three axes.
Rotational angles
We can define the viewing direction using the parameters theta and phi.
By default, theta (the azimuthal direction) is 0 and phi (the colatitude) is 15.
Colouring and Shading Plot
Colouring of the plot is done with parameter col. Similarly, we can add shading
with the parameter shade.
persp(x, y, z,
main="Perspective Plot of a Cone", zlab = "Height",
theta = 30, phi = 15,
col = "springgreen", shade = 0.5)
