Statistical Methods Lab Manual-2021-22
R is a programming language and free software environment for statistical computing and
graphics that is supported by the R Foundation for Statistical Computing. The R language is
widely used among statisticians and data miners for developing statistical software and data
analysis.
R is an implementation of the S programming language combined with lexical
scoping semantics inspired by Scheme. S was created by John Chambers in 1976, while at Bell
Labs. There are some important differences, but much of the code written for S runs unaltered.
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland,
New Zealand, and is currently developed by the R Development Core Team, of which Chambers
is a member. R is named partly after the first names of the first two R authors and partly as a play
on the name of S. The project was conceived in 1992, with an initial version released in 1995.
R and its libraries implement a wide variety of statistical and graphical techniques,
including linear and nonlinear modeling, classical statistical tests, time-series analysis,
classification, clustering, and others. R is easily extensible through functions and extensions, and
the R community is noted for its active contributions in terms of packages. Many of R's standard
functions are written in R itself, which makes it easy for users to follow the algorithmic choices
made.
If a user types 2+2 at the R command prompt and presses Enter, the computer replies with 4, as
shown below:
> 2+2
[1] 4
Features of R
R provides a large, coherent and integrated collection of tools for data analysis.
R provides graphical facilities for data analysis and display, either directly on screen or as printed
output.
2. Click the "download R" link in the middle of the page under "Getting Started."
3. Select a CRAN location (a mirror site) and click the corresponding link.
4. Click on the "Download R for Windows" link at the top of the page.
6. Save the installer file (.exe on Windows, .pkg on macOS), double-click it to open, and follow the installation instructions.
3. Click on the version recommended for your system, or the latest Mac version, save the
.dmg file on your computer, double-click it to open, and then drag and drop it to your
applications folder.
To Install R Packages:
The capabilities of R are extended through user-created packages, which allow specialized
Sweave), etc. These packages are developed primarily in R, and sometimes in Java, C, C++, and
Fortran. The R packaging system is also used by researchers to create compendia to organize
research data, code and report files in a systematic way for sharing and public archiving.
A core set of packages is included with the installation of R, with more than 12,500
additional packages available at the Comprehensive R Archive Network (CRAN).
Packages are collections of R functions, data, and compiled code in a well-defined format.
The directory where packages are stored is called the library. R comes with a standard set of
packages. Others are available for download and installation. Once installed, they have to be
loaded into the session to be used.
.libPaths() # get library location
library() # see all packages installed
search() # see packages currently loaded
Adding R Packages: You can expand the types of analyses you do by adding other packages. A complete
2. To use the package, invoke the library(package) command to load it into the current
session. (You need to do this once in each session, unless you customize your
It turns out the ability to estimate ordered logistic or probit regression is included in the
MASS package.
To install this package, run the following command:
> install.packages("MASS")
You will be asked to pick a CRAN mirror from which to download (generally the closer the
faster) and R will install the package to your library. Installing a package does not make its functions
available by itself. To use the new package, you have to load it with library(MASS) each time you start an R
session. R then knows all the functions that are included in the MASS package. To see what functions are
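The install-then-load workflow described above can be sketched as follows. This is only a minimal sketch: it assumes a reachable CRAN mirror if MASS is missing, and the requireNamespace() guard is an addition made here so that nothing is downloaded when the package is already installed.

```r
# Install MASS only if it is not already in the library
# (assumption: a CRAN mirror is reachable when installation is needed).
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}

# Load the package; this has to be repeated in every new R session.
library(MASS)

# polr() is the MASS function for ordered logistic or probit regression.
has_polr <- exists("polr")
print(has_polr)

# List the first few objects that MASS makes available.
head(ls("package:MASS"))
```

Running search() afterwards should list "package:MASS" among the loaded packages.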
The Workspace
The workspace is your current R working environment and includes any user-defined
objects (vectors, matrices, data frames, lists, functions). At the end of an R session, the user can
save an image of the current workspace that is automatically reloaded the next time R is started.
Commands are entered interactively at the R user prompt. Up and down arrow keys scroll
through your command history.
You will probably want to keep different projects in different physical directories. Here are
settings
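A few standard workspace commands are sketched below. The file name demo_workspace.RData and the use of tempdir() are assumptions chosen so that the example does not touch a real project directory.

```r
# Show the current working directory.
getwd()

# Create two objects and list what the workspace contains.
x <- c(1, 2, 3)
msg <- "hello"
ls()

# Save selected objects to a file (hypothetical name, placed in tempdir()).
ws_file <- file.path(tempdir(), "demo_workspace.RData")
save(x, msg, file = ws_file)

# Remove the objects, then restore them from the saved file.
rm(x, msg)
load(ws_file)
print(x)
```

save.image() works the same way but stores every object in the workspace, and setwd() switches to a different project directory.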
EXPERIMENT-2
EXPLORE THE DATA TYPES OF R AND DEMONSTRATE THE BASIC OPERATIONS ON DATA
TYPES.
1. DATA TYPES
You may like to store information of various data types like character, wide character, integer,
floating point, double floating point, Boolean etc. Based on the data type of a variable, the
operating system allocates memory and decides what can be stored in the reserved memory.
The variables are assigned with R-Objects and the data type of the R-object becomes the data type
of the variable. There are many types of R-objects. The frequently used ones are
Factors: Factors are R-objects which are created using a vector. A factor stores the vector
along with the distinct values of the elements in the vector as labels. The labels are
Data Frames: Data frames are tabular data objects. Unlike a matrix, in a data frame each
column can contain different modes of data. The first column can be numeric while the
second column can be character and third column can be logical. It is a list of vectors of
equal length.
Lists: A list is an R-object which can contain many different types of elements inside it
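The three object types described above can be created as in the following sketch. All object names and values here are made up for illustration.

```r
# Factor: a vector stored together with its distinct values (levels).
colours <- factor(c("red", "green", "red", "blue"))
levels(colours)

# Data frame: columns of equal length that may differ in mode.
df <- data.frame(
  id     = c(1, 2, 3),
  name   = c("a", "b", "c"),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
sapply(df, class)

# List: elements of different types held in one object.
lst <- list(1:3, "text", TRUE)
length(lst)
```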
Modes: All objects have a certain mode. Some objects can only deal with one mode at a time, others can
5. logical: data containing logical constants (i.e. TRUE and FALSE) By atomic, we
mean the vector only holds data of a single type.
numeric: 2, 15.5
R provides many functions to examine features of vectors and other objects, for example
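A few of these inspection functions are shown in the sketch below, using small illustrative vectors (the variable names are assumptions, not part of the manual).

```r
v <- c(2, 15.5)
class(v)         # "numeric"
typeof(v)        # "double"
length(v)        # 2
is.numeric(v)    # TRUE

flag <- c(TRUE, FALSE)
mode(flag)       # "logical"

s <- "R"
is.character(s)  # TRUE
```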
[1] 2418
II. 697/41
[1] 17
2. Assign the value of 39 to x.
Sol:
> x <- 39
> x
[1] 39
3. Assign the value of 22 to y.
Sol:
> y <- 22
> y
[1] 22
4. Make z the value of x - y.
Sol:
> z <- x - y
5. Display the value of z in the console.
Sol:
> z
[1] 17
6. Calculate the square root of 2345, and perform a log2 transformation on the result.
Sol:
> log2(sqrt(2345))
[1] 5.597686
7. Type the following code, which assigns numbers to objects x and y.
x <- 10
y <- 20
III. Calculate the 10-based logarithm of 100, and multiply the result with the
cosine of π. Hint: see ?log and ?pi.
Sol:
> log10(100)*cos(pi)
[1] -2
Built-in Functions:
Almost everything in R is done through functions. Here I'm only referring to numeric and character
functions that are commonly used in creating or recoding variables.
Numeric Functions
Function Description
ceiling(x) ceiling(3.475) is 4
floor(x) floor(3.475) is 3
trunc(x) trunc(5.99) is 5
exp(x) e^x
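The numeric functions in the table can be tried directly at the prompt. The negative-number lines are an addition here, included to show where trunc() and floor() differ.

```r
ceiling(3.475)   # 4
floor(3.475)     # 3
trunc(5.99)      # 5
exp(1)           # Euler's number e, about 2.718282

# trunc() drops the fractional part, floor() rounds down,
# so they disagree for negative numbers.
trunc(-5.99)     # -5
floor(-5.99)     # -6
```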
2. Print the numbers 1 to 10 in reverse order. Hint: use the rev function.
Sol:
> rev(1:10)
[1] 10 9 8 7 6 5 4 3 2 1
3. Calculate the cumulative sum of those numbers, but in reverse order.
Sol:
> rev(cumsum(1:10))
[1] 55 45 36 28 21 15 10 6 3 1
4. Find 10 random numbers between 0 and 100. (Hint: you can use the sample()
function)
Sol:
> sample(0:100, 10)
The second argument sets how many values are drawn. Without it, sample(1:100) returns all 100
numbers in random order. The output is random and differs on every run.
EXPERIMENT-3
Vectors are generally created using the c() function. Since a vector must have elements of the same type,
this function will try to coerce elements to the same type if they are different.
Coercion goes from lower to higher types: from logical to integer to double to character.
x <- c(1, 5, 4, 9, 0)
typeof(x)
[1] "double"
length(x)
typeof(x)
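The coercion order (logical, then integer, then double, then character) can be checked with a short sketch like the one below. The vectors a, b and d are illustrative names introduced here.

```r
a <- c(TRUE, 2L)                 # logical + integer -> integer
typeof(a)                        # "integer"

b <- c(1L, 2.5)                  # integer + double -> double
typeof(b)                        # "double"

d <- c(1, 5.4, TRUE, "hello")    # anything + character -> character
d                                # "1" "5.4" "TRUE" "hello"
typeof(d)                        # "character"
```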
Example 1: Creating a vector using : operator
x <- 1:7;x
y <- 2:-2; y
More complex sequences can be created using the seq() function, for example by defining the number of points in an
interval, or the step size.
> seq(1, 3, by=0.2)
[1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0
VECTORS EXERCISE - I
1. Consider two vectors, x, y
x=c(4,6,5,7,10,9,4,15)
y=c(0,10,1,8,2,3,4,1)
What is the value of: x*y and x+y?
> x+y
[1] 4 16 6 15 12 12 8 16
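Both operators work element by element on vectors of equal length. The following sketch repeats the exercise so that the product can be seen next to the sum.

```r
x <- c(4, 6, 5, 7, 10, 9, 4, 15)
y <- c(0, 10, 1, 8, 2, 3, 4, 1)

# Element-wise product: x[i] * y[i] for each position i.
x * y   # 0 60 5 56 20 27 16 15

# Element-wise sum: x[i] + y[i] for each position i.
x + y   # 4 16 6 15 12 12 8 16
```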
3. If x=c(1:12)
What is the value of: dim(x)?
What is the value of: length(x)?
Sol:
> x<-c(1:12)
> dim(x)
NULL
> length(x)
[1] 12
> y
[1] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
> x < y
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
6. If x=c('blue', 'red', 'green', 'yellow'), what is the value of: is.character(x)?
Sol:
> x <- c('blue', 'red', 'green', 'yellow')
> is.character(x)
[1] TRUE
> a <- c(10,2,4,15)
> b <- c(3,12,4,11)
> a
[1] 10 2 4 15
> b
[1] 3 12 4 11
> rbind(a,b)
  [,1] [,2] [,3] [,4]
a   10    2    4   15
b    3   12    4   11
VECTORS EXERCISE - II
1. The numbers below are the first ten days of rainfall amounts in 1996. Read them in to a
vector using the c() function 0.1, 0.6, 33.8, 1.9, 9.6, 4.3, 33.7, 0.3, 0.0, 0.1
Sol:
> rainfall<-c(0.1, 0.6, 33.8, 1.9, 9.6, 4.3, 33.7, 0.3, 0.0, 0.1)
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
2. Inspect Table and answer the following questions:
I. What was the mean rainfall, how about the standard deviation?
Sol:
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
> mean(rainfall)
[1] 8.44
> sd(rainfall)
[1] 13.66473
II. Calculate the cumulative rainfall (’running total’) over these ten days. Confirm
that the last value of the vector that this produces is equal to the total sum of the
rainfall.
Sol:
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
> cumsum(rainfall)
[1] 0.1 0.7 34.5 36.4 46.0 50.3 84.0 84.3 84.3 84.4
> all.equal(sum(rainfall), cumsum(rainfall)[10])
[1] TRUE
III. Which day saw the highest rainfall? Hint: which.max()
Sol:
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
> max(rainfall)
[1] 33.8
> which.max(rainfall)
[1] 3
3. Compute the problem sum ((x - mean(x)) ^2).
Sol:
> x<-c(1:10)
> sum ((x - mean(x)) ^2)
[1] 82.5
4. The weights of five people before and after a diet programme are
given in the table.
Read the `before' and `after' values into two different vectors called before and after. Use R to
evaluate the amount of weight lost for each participant. What is the average amount of weight
lost?
Sol:
> before
[1] 78 72 78 79 105
> after
[1] 67 65 79 70 93
> weightlost<-before-after
> weightlost
[1] 11 7 -1 9 12
> mean(weightlost)
[1] 7.6
Matrices: A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the
matrix function.
Creating Matrices: To create matrices we will use the matrix() function. The matrix()
function takes the following arguments:
• data an R object (this could be a vector).
• nrow the desired number of rows.
• ncol the desired number of columns.
• byrow a logical statement to populate the matrix by either row or by
column.
Creation of matrix
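A matrix like the matrix1 printed in the exercises that follow can be created as in this sketch. It assumes matrix1 was built from the numbers 1 to 9 filled column by column, which matches the printed output. The byrow variant matrix2 is added for comparison.

```r
# Fill a 3 x 3 matrix column by column (the default, byrow = FALSE).
matrix1 <- matrix(data = 1:9, nrow = 3, ncol = 3)
matrix1

# Fill the same values row by row instead.
matrix2 <- matrix(data = 1:9, nrow = 3, ncol = 3, byrow = TRUE)
matrix2

# Check the dimensions.
dim(matrix1)
```

With the default fill order, matrix1[1, 3] is 7, while matrix2[1, 3] is 3.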
Manipulation of Matrix
f) matrix1
Sol:
> matrix1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
g) matrix1[1, 3]
Sol:
> matrix1[1, 3]
[1] 7
matrix1[ 2, ]
Sol:
> matrix1[ 2, ]
[1] 2 5 8
h) matrix1[,-2]
Sol:
> matrix1[,-2]
[,1] [,2]
[1,] 1 7
[2,] 2 8
[3,] 3 9
j) matrix1[1, 1] = 15
Sol:
> matrix1[1, 1] = 15
> matrix1
     [,1] [,2] [,3]
[1,]   15    4    7
[2,]    2    5    8
[3,]    3    6    9
k) matrix1[ ,2 ] = 1
Sol:
> matrix1[ ,2 ] = 1
> matrix1
     [,1] [,2] [,3]
[1,]   15    1    7
[2,]    2    1    8
[3,]    3    1    9
l) matrix1[ ,2:3 ] = 2
Sol:
> matrix1[ ,2:3 ] = 2
> matrix1
     [,1] [,2] [,3]
[1,]   15    2    2
[2,]    2    2    2
[3,]    3    2    2
Mathematical Operations
R can do matrix arithmetic. Below is a list of some basic operations we can do.
+ - * / standard scalar or by element operations
%*% matrix multiplication
t() transpose
solve() inverse
det() determinant
chol() Cholesky decomposition
eigen() eigenvalues and eigenvectors
crossprod() cross product.
[1,] 1 2 3
[2,] 4 2 6
[3,] -3 -1 -3
> B%*%B%*%B
[,1] [,2] [,3]
[1,] -6 0 0
[2,] 0 -6 0
[3,] 0 0 -6
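Several of the listed operations can be tried on the matrix B shown above. The sketch below re-enters B row by row from the printed values. The variable names B3 and B_inv are introduced here for illustration.

```r
# Matrix B from the printed output, entered row by row.
B <- matrix(c( 1,  2,  3,
               4,  2,  6,
              -3, -1, -3), nrow = 3, byrow = TRUE)

# Matrix multiplication: B cubed equals -6 times the identity matrix.
B3 <- B %*% B %*% B
B3

# Transpose, determinant and inverse.
t(B)
det(B)
B_inv <- solve(B)

# Multiplying a matrix by its inverse gives the identity (up to rounding).
round(B %*% B_inv, 10)
```

Note that * would multiply element by element, which is why %*% is needed for the matrix product.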
sol:
[1,] 1 3 5 7
[2,] 2 4 6 8
b) Calculate Transpose.
Sol:
> t(m)
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
[4,] 7 8
c) Calculate Inverse.
Sol:
> solve(m)
Error in solve.default(m) : 'a' (2 x 4) must be square
e) Calculate the Multiplication of the matrix.
Sol:
> m1<-m%*%m
> m1
[,1] [,2] [,3]
[1,] 52 49 62
[2,] 45 57 79
[3,] 116 106 143
>
f) Construct a matrix with 10 columns and 10 rows, all filled with
random numbers between 0 and 100.
Sol:
> m <- matrix(runif(100, min = 0, max = 100), ncol = 10)
By default runif() draws numbers between 0 and 1, so the range has to be given explicitly.
g) Calculate the row means of this matrix (Hint: use rowMeans). Also
calculate the standard deviation across the row means (now also use sd()).
Sol:
> m1 <- rowMeans(m)
> m1
> sd(m1)
Because the matrix is random, the row means and their standard deviation differ on every run.
h) Now remake the above matrix with 100 columns, and 10 rows. Then
calculate the column means (using, of course, colMeans).
Sol:
> m <- matrix(runif(1000, min = 0, max = 100), ncol = 100, nrow = 10)
> m1 <- colMeans(m)
> m1
EXPERIMENT-4
EXPLORE THE CONTROL STRUCTURES OF R AND DEMONSTRATE WITH ONE EXAMPLE
R if statement
if (test_expression) {
statement
}
If the test_expression is TRUE, the statement gets executed. But if it’s FALSE, nothing happens.
Here, test_expression should be a single logical or numeric value. Older versions of R used only the first
element of a longer vector (with a warning); since R 4.2.0 a condition of length greater than one is an error.
In the case of a numeric value, zero is taken as FALSE, rest as TRUE.
Flowchart of if statement
Example: if statement
x <- 5
if(x > 0){
print("Positive number")
}
Output
[1] "Positive number"
} else {
} else {
} else {
Output 1
factorial = 1
0 is 1")
} else {
for(i in 1:num){
factorial = factorial * i
Output
Enter a number: 8
} else {
Enter a number: 89
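The factorial fragments above (factorial = 1 and the for loop over 1:num) can be assembled into a runnable sketch. To keep it testable without keyboard input, the version below wraps the logic in a function. The name compute_factorial and the error message are assumptions, not taken from the manual.

```r
# Hypothetical helper: factorial of a non-negative integer using a for loop.
compute_factorial <- function(num) {
  if (num < 0) {
    stop("factorial does not exist for negative numbers")
  }
  result <- 1
  # seq_len(0) is empty, so the loop is skipped and 0! stays 1.
  for (i in seq_len(num)) {
    result <- result * i
  }
  result
}

compute_factorial(8)   # 40320
compute_factorial(0)   # 1
```

Using seq_len(num) instead of 1:num avoids the case 1:0, which would loop over 1 and 0. In an interactive session the input could come from as.integer(readline(prompt = "Enter a number: ")).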
ITERATIVE CONTROL STRUCTURES
FOR LOOP
A for loop is used to iterate over a vector in R programming.
for (val in sequence) {
statement
}
Here, sequence is a vector and val takes on each of its values during the loop. In each iteration, statement is
evaluated.
1. Program to count the number of even numbers in a vector.
x <- c(2,5,3,9,8,11,6)
count <- 0
for (val in x) {
# an even number leaves no remainder when divided by 2
if(val %% 2 == 0) count = count + 1
}
print(count)
Output
[1] 3
# Program to check if the input number is prime or not
# take input from the user
flag = 0
for(i in 2:(num-1)){
if ((num %% i) == 0) {
flag = 0
break
} else {
Output 1
Enter a number: 25
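A complete primality check in the spirit of the fragments above might look like the sketch below. The function name is_prime and the square-root bound are choices made here, not taken from the manual. Handling 2 and 3 separately also avoids the pitfall that 2:(num-1) counts downwards when num is 2.

```r
# Hypothetical helper: TRUE if num is a prime number, FALSE otherwise.
is_prime <- function(num) {
  if (num < 2) return(FALSE)      # primes are greater than 1
  if (num < 4) return(TRUE)       # 2 and 3 are prime
  # Trial division: a divisor up to sqrt(num) exists if num is composite.
  for (i in 2:floor(sqrt(num))) {
    if (num %% i == 0) return(FALSE)
  }
  TRUE
}

is_prime(25)   # FALSE, because 25 = 5 * 5
is_prime(89)   # TRUE
```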
3. Program to display multiplication table.
user
10 times
for(i in 1:10) {
Output
Enter a number: 7
ITERATIVE CONTROL STRUCTURES
WHILE LOOP
In R programming, while loops are used to loop until a specific condition is
met.
Syntax of while loop
while (test_expression) {
statement
}
Here, test_expression is evaluated and the body of the loop is entered if the result is TRUE.
The statements inside the loop are executed and the flow returns to
evaluate the test_expression again.
This is repeated each time until test_expression evaluates to FALSE, in which case, the loop exits.
Example of while Loop
i <- 1
while (i < 6) {
print(i)
i = i+1
}
Output
# initialize sum
sum = 0
# find the sum of the cube of each digit
temp = num
temp %% 10
floor(temp / 10)
} else {
Output 1
Enter a number: 23
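The fragments above (sum = 0, temp %% 10, floor(temp / 10)) match the usual digit-by-digit check for an Armstrong number, where the sum of the cubes of the digits equals the number itself. A minimal sketch under that assumption follows. The function name is_armstrong is hypothetical.

```r
# Hypothetical helper: TRUE if the sum of the cubes of the digits equals num.
is_armstrong <- function(num) {
  total <- 0
  temp <- num
  while (temp > 0) {
    digit <- temp %% 10          # last digit
    total <- total + digit^3     # add its cube
    temp <- floor(temp / 10)     # drop the last digit
  }
  total == num
}

is_armstrong(23)    # FALSE: 2^3 + 3^3 = 35
is_armstrong(153)   # TRUE:  1 + 125 + 27 = 153
```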
if(num < 0) {
} else {
sum = 0
num - 1
Output
Enter a number: 10
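The fragments above (sum = 0, num - 1, the negative-number check) suggest a program that adds the natural numbers up to the input with a while loop. The sketch below is one way to write it. The function name sum_natural and the error message are assumptions.

```r
# Hypothetical helper: sum of the natural numbers 1..num using a while loop.
sum_natural <- function(num) {
  if (num < 0) {
    stop("enter a non-negative number")
  }
  total <- 0
  while (num > 0) {
    total <- total + num
    num <- num - 1
  }
  total
}

sum_natural(10)   # 55
```

The result can be cross-checked against the closed form n * (n + 1) / 2.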
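# take input from the user
nterms = as.integer(readline(prompt = "How many terms? "))
# first two terms
n1 = 0
n2 = 1
count = 2
# check if the number of terms is valid
if(nterms <= 0) {
print("Please enter a positive integer")
} else {
if(nterms == 1) {
print("Fibonacci sequence:")
print(n1)
} else {
print("Fibonacci sequence:")
print(n1)
print(n2)
while(count < nterms) {
nth = n1 + n2
print(nth)
# update values
n1 = n2
n2 = nth
count = count + 1
}
}
}
Output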
EXPERIMENT-5
CREATE R FUNCTIONS AND USE THEM WITH SIMPLE SCRIPTS.
A data frame is a two-dimensional data structure in R. It is a special case of a list in which
each component has equal length. Each component forms a column, and the contents of the
of 3 variables:
$ SN : int 1 2
$ Age : num 21 15
Notice above that the third column, Name, is of type factor instead of a character vector.
In versions of R before 4.0.0, the data.frame() function converts character vectors into factors by default.
To suppress this behavior, we can pass the argument
stringsAsFactors=FALSE. Since R 4.0.0 this is already the default.
obs. of 3 variables:
$ SN : int 1 2
$ Age : num 21 15
Many data input functions of R, like read.table(), read.csv(), read.delim() and read.fwf(), also read
data into a data frame.
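A small data frame with the columns SN, Age and Name from the str() output above can be built as in this sketch. The names "John" and "Dora" are placeholders chosen here, since the original values are not shown.

```r
# Character column kept as character.
people <- data.frame(
  SN   = 1:2,
  Age  = c(21, 15),
  Name = c("John", "Dora"),   # hypothetical names
  stringsAsFactors = FALSE
)
str(people)

# The same data with the character column converted to a factor.
people_f <- data.frame(
  SN   = 1:2,
  Age  = c(21, 15),
  Name = c("John", "Dora"),
  stringsAsFactors = TRUE
)
class(people_f$Name)   # "factor"

# Columns are accessed with $ and rows/columns with [row, column].
people$Age
people[2, "Name"]
```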
EXPERIMENT-6
EXPLORE THE DATA ANALYTICS LIFE CYCLE.
As a data analyst or someone who works with data regularly, it’s important to understand how to
manage a data analytics project so you can ensure efficiency and get the best results for your clients. One
The data analytics lifecycle describes the process of conducting a data analytics project, which
consists of six key steps based on the CRISP-DM methodology. These steps include: understanding the
business issue, understanding the data set, preparing the data, exploratory analysis, validation, and
visualization and presentation of the findings.
1. Understand the Business Issues: When presented with a data project, you will be given a brief outline
of the expectations. From that outline, you should identify the key objectives that the business is trying to
uncover. You should examine the overall scope of the work, business objectives, information the
stakeholders are seeking, the type of analysis they want you to use, and the deliverables (the outputs of the
You need to have these elements clearly defined prior to beginning your data analysis project to
provide the best deliverable you can. Additionally, it’s important to ask as many questions as you can at the
outset of the project because, often, you may not have another chance before the completion of the project.
2. Understand Your Data Set: There are a variety of tools you can use to organize your data. When
presented with a small dataset, you can use Excel, but for heftier jobs, you’ll likely want to use more robust
tools to explore and prepare your data. Muñoz suggests R, Python, Alteryx, Tableau Prep or Tableau
Desktop to help prepare your data for cleaning. Within these programs, you should identify key
variables to help categorize the data. When going through the data sets, look for errors in the data. These
can be anything from omitted data, data that doesn’t logically make sense, duplicate data, or even spelling
errors. These errors and missing values need to be amended so you can properly clean your data.
3. Prepare the Data: Once you have organized and identified all the variables in your dataset, you can
begin cleaning. In this step, you will impute missing values, create new broad categories to help categorize
data that doesn’t have a proper place, and remove any duplicates in your data. Imputing average data scores
for categories where there are missing values will help the data be processed more efficiently without
skewing it.
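The two cleaning actions mentioned in this step, removing duplicates and imputing an average for missing values, could look as follows in R. The data frame scores is made-up illustration data, not part of the manual.

```r
# Made-up data with one duplicated row and missing scores.
scores <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  score = c(10, NA, NA, 30, 20)
)

# Remove exact duplicate rows.
scores <- scores[!duplicated(scores), ]

# Replace missing scores with the mean of the observed scores.
observed_mean <- mean(scores$score, na.rm = TRUE)
scores$score[is.na(scores$score)] <- observed_mean

scores
```

Mean imputation keeps the column mean unchanged but reduces its variance, so it is best treated as a simple first approach.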
4. Perform Exploratory Analysis and Modeling: In this step, you will begin building models to test your
data and seek out answers to the objectives given. Using different statistical modeling methods, you can
determine which is the best for your data. Common models include linear regressions, decision trees, and
5. Validate Your Data: Once you have crafted your models, you’ll need to assess the data and determine
if you have the correct information for your deliverable. Did the models work properly? Does the data need
more cleaning? Did you find the outcome the client was looking to answer? If not, you may need to go
over the previous steps again. You should expect a lot of trial and error!
6. Visualize and Present Your Findings: Once you have all your deliverables met, you can begin your
data visualization. In many cases, data visualization will be crucial in communicating your findings to the
client. Not all clients are data-savvy, and interactive visualization tools like Tableau are tremendously
useful in illustrating your conclusions to clients. Being able to tell a story with your data is essential.
Telling a story will help explain to the client the value of your findings.
As with any project, you need to identify your objectives clearly. Outlining your work will ensure
you get the best deliverables for your clients. While all of these steps are important, if you start the project
without all the data you need, you are likely to have to backtrack.
EXPERIMENT-7
IMPORTING & EXPORTING THE DATA FROM (I) CSV FILE (II) EXCEL FILE.
1. Reading different types of data sets (.txt, .csv) from web and disk and writing them to a file in a specific disk
location.
library(utils)
data <- read.csv("input.csv")
data
Output :-
id, name, salary, start_date, dept
1 1 Rick 623.30 2012-01-01 IT
2 2 Dan 515.20 2013-09-23 Operations
3 3 Michelle 611.00 2014-11-15 IT
4 4 Ryan 729.00 2014-05-11 HR
5 NA Gary 843.25 2015-03-27 Finance
6 6 Nina 578.00 2013-05-21 IT
7 7 Simon 632.80 2013-07-30 Operations
8 8 Guru 722.50 2014-06-17 Finance
data <- read.csv("input.csv")
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
Output:-
[1] TRUE
[1] 5
[1] 8
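Exporting works in the opposite direction with write.csv(). The sketch below writes a small data frame to a temporary file and reads it back. The file name output.csv, the use of tempdir() and the three-row subset of the employee data are assumptions made for this example.

```r
# A small data frame using the first three rows of the employee data.
emp <- data.frame(
  id     = 1:3,
  name   = c("Rick", "Dan", "Michelle"),
  salary = c(623.30, 515.20, 611.00)
)

# Write it to a CSV file (hypothetical name, placed in tempdir()).
out_file <- file.path(tempdir(), "output.csv")
write.csv(emp, out_file, row.names = FALSE)

# Read the file back to confirm the export.
back <- read.csv(out_file)
print(back)
```

row.names = FALSE prevents R from adding an extra first column with the row numbers.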
install.packages("xlsx")
library("xlsx")
data <- read.xlsx("input.xlsx", sheetIndex = 1)
data
Output:-
1 Rick      623.3   1/1/2012    IT
2 Dan       515.2   9/23/2013   Operations
3 Michelle  611     11/15/2014  IT
4 Ryan      729     5/11/2014   HR
5 Gary      843.25  3/27/2015   Finance
6 Nina      578     5/21/2013   IT
7 Simon     632.8   7/30/2013   Operations
8 Guru      722.5   6/17/2014   Finance
EXPERIMENT-8
DATA VISUALIZATIONS
install.packages("ggplot2")
library(ggplot2)
input <- mtcars[,c('mpg','cyl')]
head(input)
if (dev.cur() > 1) dev.off() # close the graphics device only if one is open
Output :-
                   mpg cyl
Mazda RX4         21.0   6
Mazda RX4 Wag     21.0   6
Datsun 710        22.8   4
Hornet 4 Drive    21.4   6
Hornet Sportabout 18.7   8
Valiant           18.1   6
b. Find the outliers using plot.
v=c(50,75,100,125,150,175,200)
boxplot(v)
c. Plot the histogram, bar chart and pie chart on sample data.
Histogram
library(graphics)
v <- c(9,13,21,8,36,22,12,41,31,33,19)
# Plot the histogram.
hist(v)
Output:-
Bar chart
library(graphics)
H <- c(7,12,28,3,41)
M <- c("Jan","Feb","Mar","Apr","May")
# Plot the bar chart.
barplot(H,names.arg = M,xlab = "Month",ylab = "Revenue",col = "blue",main = "Revenue chart",border
= "red")
dev.off()
Pie Chart
library(graphics)
x <- c(21, 62, 10, 53)
labels<- c("London", "NewYork", "Singapore", "Mumbai")
# Plot the Pie chart.
pie(x,labels)
dev.off()
Output: pie chart with slices labelled London, NewYork, Singapore and Mumbai.
EXPERIMENT-9
IN DETAIL.
size<-c(1.4,2.6,1.0,3.7,5.5,3.2,3.0,4.9,6.3)
weight<-c(0.9,1.8,2.4,3.5,3.9,4.4,5.1,5.6,6.3)
tail<-c(0.7,1.3,0.7,2.0,3.6,3.0,2.9,3.9,4.0)
mouse<-data.frame(size,weight,tail)
mouse
plot(mouse$weight,mouse$size)
simple<-lm(size~weight,data=mouse)
summary(simple)
abline(simple,col="red",lwd=2)
Output::
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5813 0.9647 0.603 0.5658
weight 0.7778 0.2334 3.332 0.0126 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
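Once the model is fitted, the coefficients can be extracted and used for prediction. The sketch below refits the same model on the size and weight data from this experiment. The new weight of 4.0 is a hypothetical value chosen for illustration.

```r
size   <- c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3)
weight <- c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3)
mouse  <- data.frame(size, weight)

# Fit size as a linear function of weight.
simple <- lm(size ~ weight, data = mouse)

# Intercept and slope, as reported in the summary table.
coefs <- coef(simple)
coefs

# Predict the size for a hypothetical mouse of weight 4.0.
new_mouse <- data.frame(weight = 4.0)
pred <- predict(simple, newdata = new_mouse)
pred   # about 3.69

# Proportion of variance explained by the model.
summary(simple)$r.squared
```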
Output Console::
Output Plots::
EXPERIMENT-10
DETAIL.
no<-c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25)
dt<-
c(16.68,11.50,12.03,14.88,13.75,18.11,8.00,17.83,79.24,21.50,40.33,21.00,13.50,19.75,24.00,29.00,15.35
,19.00,9.50,35.10,17.90,52.32,18.75,19.83,10.75)
cases<-c(7,3,3,4,6,7,2,7,30,5,16,10,4,6,9,10,6,7,3,17,10,26,9,8,4)
distance<-
c(560,220,340,80,150,330,110,210,1460,605,688,215,255,462,448,776,200,132,36,770,140,810,450,635,
150)
vending<-data.frame(dt,cases,distance)
plot(vending)
mlr<-lm(dt~cases+distance,data=vending)
summary(mlr)
Output::
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.341231 1.096730 2.135 0.044170 *
cases 1.615907 0.170735 9.464 3.25e-09 ***
distance 0.014385 0.003613 3.981 0.000631 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
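The fitted multiple regression can be used to predict the delivery time for a new observation. The sketch below refits the model on the data from this experiment. The new stop with 8 cases and a distance of 275 is a hypothetical example introduced here.

```r
dt <- c(16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 79.24, 21.50,
        40.33, 21.00, 13.50, 19.75, 24.00, 29.00, 15.35, 19.00, 9.50, 35.10,
        17.90, 52.32, 18.75, 19.83, 10.75)
cases <- c(7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10, 6, 7, 3, 17,
           10, 26, 9, 8, 4)
distance <- c(560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688, 215, 255,
              462, 448, 776, 200, 132, 36, 770, 140, 810, 450, 635, 150)
vending <- data.frame(dt, cases, distance)

# Fit delivery time on number of cases and distance.
mlr <- lm(dt ~ cases + distance, data = vending)
coefs <- coef(mlr)
round(coefs, 4)

# Predict for a hypothetical new stop, with a 95% confidence interval.
new_stop <- data.frame(cases = 8, distance = 275)
pred <- predict(mlr, newdata = new_stop, interval = "confidence")
pred
```

The first column of pred is the point prediction (about 19.22 here), and the lwr and upr columns give the interval for the mean response.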
Output Console:
Output Plots::
EXPERIMENT-11
DETAIL.
df <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
str(df)
## 'data.frame': 400 obs. of 4 variables:
## $ admit: int 0 1 1 1 0 1 1 0 1 0 ...
## $ gre : int 380 660 800 640 520 760 560 400 540 700 ...
## $ gpa : num 3.61 3.67 4 3.19 2.93 3 2.98 3.08 3.39 3.92 ...
## $ rank : int 3 3 1 4 4 2 1 2 3 2 ...
sum(is.na(df))
## [1] 0
# treat rank as a categorical variable and fit the logistic regression
df$rank <- as.factor(df$rank)
logit <- glm(admit ~ gre + gpa + rank, data = df, family = "binomial")
summary(logit)
##
## Call:
## glm(formula = admit ~ gre + gpa + rank, family = "binomial",
## data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6268 -0.8662 -0.6388 1.1490 2.0790
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.989979 1.139951 -3.500 0.000465 ***
## gre 0.002264 0.001094 2.070 0.038465 *
## gpa 0.804038 0.331819 2.423 0.015388 *
## rank2 -0.675443 0.316490 -2.134 0.032829 *
## rank3 -1.340204 0.345306 -3.881 0.000104 ***
## rank4 -1.551464 0.417832 -3.713 0.000205 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 499.98 on 399 degrees of freedom
## Residual deviance: 458.52 on 394 degrees of freedom
## AIC: 470.52
##
## Number of Fisher Scoring iterations: 4
x <- data.frame(gre=790,gpa=3.8,rank=as.factor(1))
p <- predict(logit,x)
p
## 1
## 0.85426
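By default, predict() on a glm object returns the linear predictor (the log-odds), not a probability. The sketch below reproduces the value above by hand from the printed coefficients and converts it to a probability with plogis(). The variable names are introduced here, and the coefficients are copied from the summary table, so the result is only as precise as the printed digits.

```r
# Coefficients copied from the summary output above.
b0    <- -3.989979   # intercept
b_gre <-  0.002264
b_gpa <-  0.804038

# rank = 1 is the baseline level, so it contributes no extra term.
eta <- b0 + b_gre * 790 + b_gpa * 3.8
eta          # log-odds, about 0.854

# The inverse logit turns log-odds into a probability.
prob <- plogis(eta)
prob         # about 0.70
```

With the fitted model available, predict(logit, x, type = "response") returns the probability directly.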
EXPERIMENT-12
fitall<-read.csv("C:\\Users\\Blessy Anjaleena\\Desktop\\Fitting.csv")
plot(fitall)
fit<-lm(y~x1+x2+x3+x4,data=fitall)
summary(fit)
#backward selection
step(fit,direction="backward")
fitstart=lm(y~1,data=fitall)
fitstart
#forward selection
f<-step(fitstart,direction="forward",scope=formula(fit))
summary(f)
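Because Fitting.csv is not available here, the same backward and forward selection steps can be tried on the built-in mtcars data. This is a sketch under that substitution. The choice of mpg as the response and of wt, hp, qsec and drat as candidate predictors is an assumption made for illustration.

```r
# Full model with all candidate predictors, and an intercept-only start model.
full  <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
start <- lm(mpg ~ 1, data = mtcars)

# Backward selection: start from the full model and drop terms by AIC.
backward <- step(full, direction = "backward", trace = 0)

# Forward selection: start from the intercept and add terms from the scope.
forward <- step(start, direction = "forward", scope = formula(full), trace = 0)

formula(backward)
formula(forward)
```

Setting trace = 0 suppresses the step-by-step log. Leaving it at the default prints the AIC of every candidate model at each step.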
Output::
Series: temp.ts
ARIMA(0,1,0)
Output Console::
Output Plots::