
Statistical Methods Lab Manual-2021-22


INDEX

S.No  Name of the Experiment

1. Exploring R, R-Studio Environment and Installation process, Explore the features.
2. Explore the data types of R and demonstrate the basic operations on data types.
3. Create vectors and Matrices
4. Explore the control structures of R and demonstrate with one example under each case.
5. Create R functions and use them with simple scripts.
6. Explore the Data Analytics Life Cycle.
7. Importing & Exporting the data from (i) CSV file (ii) Excel file.
8. Data VISUALIZATIONS through
   a) Histogram
   b) Pie Chart
   c) Box Plot
   d) Density Plots
9. Demonstrate simple linear regression analysis. Analyze results in detail.
10. Demonstrate multiple regression models. Analyze results in detail.
11. Demonstrate logistic regression model. Analyze results in detail.
12. Demonstrate other regression model. Analyze results in detail.
EXPERIMENT-1

Exploring R, R-Studio Environment and Installation process Explore the features

R is a programming language and free software environment for statistical computing and

graphics that is supported by the R Foundation for Statistical Computing. The R language is

widely used among statisticians and data miners for developing statistical software and data

analysis.

R is an implementation of the S programming language combined with lexical

scoping semantics inspired by Scheme. S was created by John Chambers in 1976, while at Bell

Labs. There are some important differences, but much of the code written for S runs unaltered.

R was created by Ross Ihaka and Robert Gentleman at the University of Auckland,

New Zealand, and is currently developed by the R Development Core Team, of which Chambers

is a member. R is named partly after the first names of the first two R authors and partly as a play

on the name of S. The project was conceived in 1992, with an initial version released in 1995

and a stable beta version in 2000.

R and its libraries implement a wide variety of statistical and graphical techniques,

including linear and nonlinear modeling, classical statistical tests, time-series analysis,

classification, clustering, and others. R is easily extensible through functions and extensions, and

the R community is noted for its active contributions in terms of packages. Many of R's standard

functions are written in R itself, which makes it easy for users to follow the algorithmic choices

made.

R is an interpreted language; users typically access it through a command-line interpreter. If a user types 2+2 at the R command prompt and presses Enter, the computer replies with 4, as shown below:

> 2+2
[1] 4

Features of R

As stated earlier, R is a programming language and software environment for statistical

analysis, graphics representation and reporting.

The following are the important features of R

• R is a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions, and input and output facilities.

• R has an effective data handling and storage facility.

• R provides a suite of operators for calculations on arrays, lists, vectors and matrices.

• R provides a large, coherent and integrated collection of tools for data analysis.

• R provides graphical facilities for data analysis and display, either directly on the computer screen or printed on paper.

To Install R and R Packages

1. Open an internet browser and go to www.r-project.org.

2. Click the "download R" link in the middle of the page under "Getting Started."

3. Select a CRAN location (a mirror site) and click the corresponding link.

4. Click on the "Download R for WINDOWS" link at the top of the page.

5. Click on the file containing the latest version of R under "Files."

6. Save the installer file (an .exe file on Windows, a .pkg file on macOS), double-click it to open, and follow the installation instructions.

7. Now that R is installed, you need to download and install RStudio.

To Install RStudio:

1. Go to www.rstudio.com and click on the "Download RStudio" button.

2. Click on "Download RStudio Desktop."

3. Click on the version recommended for your system, save the installer (an .exe file on Windows, a .dmg file on macOS), and double-click it to open. On Windows, follow the installation instructions; on macOS, drag and drop RStudio to your Applications folder.

To Install R Packages:

The capabilities of R are extended through user-created packages, which allow specialized statistical techniques, graphical devices, import/export capabilities, reporting tools (knitr, Sweave), etc. These packages are developed primarily in R, and sometimes in Java, C, C++, and Fortran. The R packaging system is also used by researchers to create compendia to organize research data, code and report files in a systematic way for sharing and public archiving.

A core set of packages is included with the installation of R, with more than 12,500 additional packages (as of May 2018) available at the Comprehensive R Archive Network (CRAN).

Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. R comes with a standard set of packages. Others are available for download and installation. Once installed, they have to be loaded into the session to be used.

.libPaths() # get library location
library() # see all packages installed
search() # see packages currently loaded

Adding R Packages: You can expand the types of analyses you do by adding other packages. A complete list of contributed packages is available from CRAN.

Follow these steps:

1. Download and install a package (you only need to do this once).

2. To use the package, invoke the library(package) command to load it into the current
session. (You need to do this once in each session, unless you customize your

environment to automatically load it each time.)

Installing and Loading Packages

It turns out the ability to estimate ordered logistic or probit regression is included in the

MASS package.

To install this package, run the following command:

> install.packages("MASS")

You will be asked to pick a CRAN mirror from which to download (generally the closer the faster), and R will install the package to your library. Installing a package does not make it available in the current session. To use the new package, you have to load it each time you start an R session, like so:

> library("MASS")

R now knows all the functions that are included in the MASS package. To see what functions are implemented in the MASS package, type:

> library(help = "MASS")
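The two steps above (install once, load in every session) can be combined in a short script. This is only a sketch: the mirror URL below is an assumption, and any CRAN mirror can be used instead.

```r
# Install MASS only when it is not already in the library, then load it.
# The repos URL is an assumed mirror, not one prescribed by this manual.
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS", repos = "https://cloud.r-project.org")
}
library(MASS)

# Confirm that the package is attached to the current session
"package:MASS" %in% search()
```

Running the script a second time skips the installation and only loads the package.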

The Workspace

The workspace is your current R working environment and includes any user-defined

objects (vectors, matrices, data frames, lists, functions). At the end of an R session, the user can

save an image of the current workspace that is automatically reloaded the next time R is started.

Commands are entered interactively at the R user prompt. Up and down arrow keys scroll

through your command history.

You will probably want to keep different projects in different physical directories. Here are

some standard commands for managing your workspace.

getwd() # print the current working directory
ls() # list the objects in the current workspace
setwd(mydirectory) # change to mydirectory
setwd("c:/docs/mydir") # note / instead of \ in Windows

# view and set options for the session
help(options) # learn about available options
options() # view current option settings
EXPERIMENT-2

EXPLORE THE DATA TYPES OF R AND DEMONSTRATE THE BASIC OPERATIONS ON DATA TYPES.

1. DATA TYPES

You may like to store information of various data types like character, wide character, integer,

floating point, double floating point, Boolean etc. Based on the data type of a variable, the

operating system allocates memory and decides what can be stored in the reserved memory.

The variables are assigned with R-Objects and the data type of the R-object becomes the data type

of the variable. There are many types of R-objects. The frequently used ones are

• Vectors: A basic data structure of R containing the same type of data.

• Matrices: A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix function.

• Factors: Factors are the R-objects which are created using a vector. A factor stores the vector along with the distinct values of the elements in the vector as labels. The labels are always character, irrespective of whether the input vector is numeric, character, Boolean, etc. Factors are useful in statistical modelling.

• Data Frames: Data frames are tabular data objects. Unlike a matrix, each column of a data frame can contain a different mode of data. The first column can be numeric while the second column can be character and the third column can be logical. A data frame is a list of vectors of equal length.

• Lists: A list is an R-object which can contain many different types of elements inside it, like vectors, functions and even another list.
Modes: All objects have a certain mode. Some objects can only deal with one mode at a time, others can store elements of multiple modes. R distinguishes the following modes:

1. integer: integers (e.g. 1, 2 or -69)

2. numeric: real numbers (e.g. 2.336, -0.35)

3. complex: complex or imaginary numbers

4. character: elements made up of text strings (e.g. "text", "Hello World!", or "123")

5. logical: data containing logical constants (i.e. TRUE and FALSE)

Note that the mode() function reports both integers and real numbers as "numeric"; typeof() distinguishes "integer" from "double".

The basic vector types of R are atomic. By atomic, we mean the vector only holds data of a single type:

• character: "a", "swc"

• numeric: 2, 15.5

• integer: 2L (the L tells R to store this as an integer)

• logical: TRUE, FALSE

• complex: 1+4i (complex numbers with real and imaginary parts)

R provides many functions to examine features of vectors and other objects, for example

• class( ) - what kind of object is it (high-level)?

• typeof( ) - what is the object’s data type (low-level)?

• length( ) - how long is it? What about two-dimensional objects?
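As a small illustration of these three functions, the sketch below applies them to a vector and to a matrix. The object names v and m are made up for this example and do not appear elsewhere in the manual.

```r
# Hypothetical example objects
v <- c(2L, 5L, 9L)            # an integer vector
m <- matrix(1:6, nrow = 2)    # a 2 x 3 integer matrix

class(v)     # high-level class of the vector
typeof(v)    # low-level storage type
length(v)    # number of elements

class(m)     # class of a two-dimensional object
typeof(m)    # storage type of the cells
length(m)    # total number of cells (rows x columns)
dim(m)       # number of rows and columns
```

For a two-dimensional object, length() counts all cells, so dim() is usually the more informative function.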

1. Use R to calculate the following:


I. 31 * 78
Sol: > 31*78

[1] 2418

II. 697/41

Sol: > 697 /41

[1] 17

2. Assign the value of 39 to x.
Sol: > x<-39
> x
[1] 39
3. Assign the value of 22 to y.
Sol: > y<-22
> y
[1] 22
4. Make z the value of x - y.
Sol: > z<- x - y
5. Display the value of z in the console.
Sol: > z
[1] 17

6. Calculate the square root of 2345, and perform a log2 transformation on the result.
Sol: > log2(sqrt(2345))
[1] 5.597686
7. Type the following code, which assigns numbers to objects x and y: x <- 10, y <- 20

I. Calculate the product of x and y.
Sol: > x<-10
> y<-20
> x*y
[1] 200
II. Store the result in a new object called z.
Sol: > z<-x*y
> z
[1] 200
8. Calculate the following quantities:
I. The sum of 100.1, 234.9 and 12.01.
Sol: > 100.1+234.9+12.01
[1] 347.01

II. The square root of 256.
Sol: > sqrt(256)
[1] 16

III. Calculate the 10-based logarithm of 100, and multiply the result with the cosine of π. Hint: see ?log and ?pi.
Sol: > log10(100)*cos(pi)
[1] -2
Built-in Functions:
Almost everything in R is done through functions. Here I'm only referring to numeric and character functions that are commonly used in creating or recoding variables.

Numeric Functions

Function                  Description
abs(x)                    absolute value
sqrt(x)                   square root
ceiling(x)                ceiling(3.475) is 4
floor(x)                  floor(3.475) is 3
trunc(x)                  trunc(5.99) is 5
round(x, digits=n)        round(3.475, digits=2) is 3.48
signif(x, digits=n)       signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x)    also acos(x), cosh(x), acosh(x), etc.
log(x)                    natural logarithm
log10(x)                  common logarithm
exp(x)                    e^x

1. Calculate the cumulative sum (’running total’) of the numbers 2, 3, 4, 5, 6. Hint: use the cumsum() function.
Sol: > sum(2:6)
[1] 20
> cumsum(2:6)
[1] 2 5 9 14 20

2. Print the numbers 1 to 10 in reverse order. Hint: use the rev function.
Sol: > rev(1:10)
[1] 10 9 8 7 6 5 4 3 2 1
3. Calculate the cumulative sum of those numbers, but in reverse order.
Sol: > rev(cumsum(1:10))
[1] 55 45 36 28 21 15 10 6 3 1
4. Find 10 random numbers between 0 and 100. (Hint: you can use the sample() function.)
Sol: > sample(0:100, 10)
This call returns 10 values drawn without replacement from 0 to 100; the values differ on every run. Calling sample(1:100) without a size argument instead returns all 100 numbers in random order.

5. Calculate and verify the value of x where x = 5, 5*x -> x, x.
Sol: > x<- 5
> 5*x->x
> x
[1] 25
6. Compute log to the base 10 (log10) of the sqrt of 100. Do not use variables.
Sol: > log10(sqrt(100))
[1] 1
EXPERIMENT-3

CREATING VECTORS AND MATRICES

Vectors: A basic data structure of R containing the same type of data.

Vectors are generally created using the c() function. Since a vector must have elements of the same type, this function will try to coerce elements to the same type if they are different.

Coercion is from lower to higher types: from logical to integer to double to character.

x <- c(1, 5, 4, 9, 0)

typeof(x)

[1] "double"

length(x)

[1] 5

x <- c(1, 5.4, TRUE, "hello")

x

[1] "1" "5.4" "TRUE" "hello"

typeof(x)

[1] "character"

If we want to create a vector of consecutive numbers, the : operator is very helpful.
Example 1: Creating a vector using : operator

x <- 1:7; x

[1] 1 2 3 4 5 6 7

y <- 2:-2; y

[1] 2 1 0 -1 -2

More complex sequences can be created using the seq() function, like defining number of points in an
interval, or the step size.

Example 2: Creating a vector using seq() function

seq(1, 3, by=0.2) # specify step size

[1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0

seq(1, 5, length.out=4) # specify length of the vector

[1] 1.000000 2.333333 3.666667 5.000000

VECTORS EXERCISE - I
1. Consider two vectors, x, y
x=c(4,6,5,7,10,9,4,15)
y=c(0,10,1,8,2,3,4,1) What is the value of: x*y and x+y

Sol: > x<-c(4,6,5,7,10,9,4,15)
> y<-c(0,10,1,8,2,3,4,1)
> x
[1] 4 6 5 7 10 9 4 15
> y
[1] 0 10 1 8 2 3 4 1
> x*y
[1] 0 60 5 56 20 27 16 15
> x+y
[1] 4 16 6 15 12 12 8 16

2. Consider two vectors, a, b


a=c(1,5,4,3,6)
b=c(3,5,2,1,9) What is the value of: a<=b Sol:
> a<-c(1,5,4,3,6)
> b<-c(3,5,2,1,9)
> a<=b
[1] TRUE TRUE FALSE FALSE TRUE

3. If x=c(1:12)
What is the value of: dim(x)
What is the value of: length(x)
Sol:
> x<-c(1:12)
> dim(x)
NULL
> length(x)
[1] 12

4. If a=c(12:5) What is the value of: is.numeric(a)


Sol:
> a<-c(12:5)
> typeof(a)
[1] "integer"
> is.numeric(a)
[1] TRUE

5. Consider two vectors, x, y


x=letters [1:10]
y=letters[15:24] What is the value of: x<y Sol:
> x<-letters[1:10]
> y<-letters[15:24]
> x
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
> y
[1] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
> x<y
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

6. If x=c ('blue', 'red', 'green', 'yellow') what is the value of: is.character(x).
Sol:

> x<-c ('blue', 'red', 'green', 'yellow')


> typeof(x)
[1] "character"

> is.character(x)
[1] TRUE

7. If x=c('blue',10,'green',20) What is the value of: is.character(x).
Sol:

> x<-c('blue',10,'green',20)
> typeof(x)
[1] "character"

> is.character(x)
[1] TRUE

8. Consider two vectors, a, b


a=c(10,2,4,15)
b=c(3,12,4,11) What is the value of: rbind(a,b) SOL:

> a<-c(10,2,4,15)
> b<-c(3,12,4,11)
> a
[1] 10 2 4 15

> b
[1] 3 12 4 11

> rbind(a,b)
  [,1] [,2] [,3] [,4]
a   10    2    4   15
b    3   12    4   11

9. Consider two vectors, a, b


a=c(1,2,4,5,6)
b=c(3,2,4,1,9) What is the value of: cbind(a,b) Sol:
> a=c(1,2,4,5,6)
> b=c(3,2,4,1,9)
> cbind (a,b)
     a b
[1,] 1 3
[2,] 2 2
[3,] 4 4
[4,] 5 1
[5,] 6 9

VECTORS EXERCISE - II

1. The numbers below are the first ten days of rainfall amounts in 1996. Read them in to a
vector using the c() function 0.1, 0.6, 33.8, 1.9, 9.6, 4.3, 33.7, 0.3, 0.0, 0.1
Sol:
> rainfall<-c(0.1, 0.6, 33.8, 1.9, 9.6, 4.3, 33.7, 0.3, 0.0, 0.1)
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
2. Inspect Table and answer the following questions:
I. What was the mean rainfall, how about the standard deviation?
Sol:
rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
> mean(rainfall)
[1] 8.44
> sd(rainfall)
[1] 13.66473

II. Calculate the cumulative rainfall (’running total’) over these ten days. Confirm
that the last value of the vector that this produces is equal to the total sum of the
rainfall.

Sol:
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
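> cumsum(rainfall)
[1] 0.1 0.7 34.5 36.4 46.0 50.3 84.0 84.3 84.3 84.4
> sum(rainfall)
[1] 84.4
> all.equal(sum(rainfall), cumsum(rainfall)[10])
[1] TRUE
The total must be compared with the last element of the cumulative sum, cumsum(rainfall)[10], not with rainfall[10]. all.equal() is used instead of == so that floating-point rounding does not affect the comparison.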
III. Which day saw the highest rainfall? Hint: which.max()
Sol:
> rainfall
[1] 0.1 0.6 33.8 1.9 9.6 4.3 33.7 0.3 0.0 0.1
> max(rainfall)
[1] 33.8
> which.max(rainfall)
[1] 3
3. Compute the problem sum ((x - mean(x)) ^2).
Sol:
> x<-c(1:10)
> sum ((x - mean(x)) ^2)
[1] 82.5

4. The weights of five people before and after a diet programme are
given in the table.

Read the `before' and `after' values into two different vectors called before and after. Use R to
evaluate the amount of weight lost for each participant. What is the average amount of weight
lost?

Sol:
> before<-c(78, 72, 78, 79, 105)
> before
[1] 78 72 78 79 105

> after<-c(67, 65, 79, 70, 93)
> after
[1] 67 65 79 70 93

> weightlost<-before-after
> weightlost
[1] 11 7 -1 9 12
> mean(weightlost)

[1] 7.6

Matrices: A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the

matrix function.

Creating Matrices: To create matrices we will use the matrix() function. The matrix()
function takes the following arguments:
• data: an R object (this could be a vector).
• nrow: the desired number of rows.
• ncol: the desired number of columns.
• byrow: a logical value that determines whether the matrix is populated by row (TRUE) or by column (FALSE, the default).
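The effect of the byrow argument is easiest to see side by side. The sketch below fills the same six values both ways; the names values, by_col and by_row are chosen only for this illustration.

```r
values <- 1:6

# Default (byrow = FALSE): values run down the columns
by_col <- matrix(data = values, nrow = 2, ncol = 3)

# byrow = TRUE: values run along the rows
by_row <- matrix(data = values, nrow = 2, ncol = 3, byrow = TRUE)

by_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```

Only one of nrow and ncol needs to be given when the length of the data determines the other.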
Creation of matrix

a) matrix1 <- matrix ( data = 1, nrow = 3, ncol = 3)


Sol:
> matrix1 <- matrix ( data = 1, nrow = 3, ncol = 3)
> matrix1
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 1 1 1
[3,] 1 1 1
b) vector8 <- 1:12 matrix3 <- matrix ( data = vector8 , nrow = 4)
Sol:
> vector8 <- c(1:12)
> vector8
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> matrix3 <- matrix ( data = vector8 , nrow = 4)
> matrix3
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
c) v1<- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3) Sol:
> v1<- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)
> v1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
d) v2<- matrix(1:8, ncol = 2)
Sol:
> v2<- matrix(1:8, ncol = 2)
> v2
[,1] [,2]
[1,] 1 5
[2,] 2 6
[3,] 3 7
[4,] 4 8
e) matrix1 = matrix(1:9, nrow = 3)
matrix1 + 2
Sol:
> matrix1 = matrix(1:9, nrow = 3)
> matrix1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> matrix1+2
     [,1] [,2] [,3]
[1,]    3    6    9
[2,]    4    7   10
[3,]    5    8   11

Manipulation of Matrix
f) matrix1
Sol:
> matrix1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
g) matrix1[1, 3]
Sol:
> matrix1[1, 3]
[1] 7
h) matrix1[ 2, ]
Sol:
> matrix1[ 2, ]
[1] 2 5 8
i) matrix1[,-2]
Sol:
> matrix1[,-2]
     [,1] [,2]
[1,]    1    7
[2,]    2    8
[3,]    3    9

j) matrix1[1, 1] = 15 Sol:
> matrix1[1, 1] = 15
> matrix1
[,1] [,2] [,3]
[1,] 15 4 7
[2,] 2 5 8
[3,] 3 6 9
k) matrix1[ ,2 ] = 1
Sol:
> matrix1[ ,2 ] = 1
> matrix1
     [,1] [,2] [,3]
[1,]   15    1    7
[2,]    2    1    8
[3,]    3    1    9
l) matrix1[ ,2:3 ] = 2 Sol:
> matrix1[ ,2:3 ] = 2
> matrix1
[,1] [,2] [,3]
[1,] 15 2 2
[2,] 2 2 2
[3,] 3 2 2
Mathematical Operations
R can do matrix arithmetic. Below is a list of some basic operations we can do.
 + - * / standard scalar or by element operations
 %*% matrix multiplication
 t() transpose
 solve() inverse
 det() determinant
 chol() cholesky decomposition
 eigen() eigenvalues and eigenvectors
 crossprod() cross product.
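The operations listed above can be tried on one small matrix. The sketch below uses a hypothetical 2 x 2 symmetric, positive-definite matrix A (an assumption made so that solve() and chol() both succeed); it is not one of the matrices used in the exercises.

```r
# Hypothetical example matrix, filled column by column: rows are (2, 1) and (1, 3)
A <- matrix(c(2, 1, 1, 3), nrow = 2)

A * A                      # element-by-element product
A %*% A                    # matrix product
t(A)                       # transpose (A is symmetric, so this equals A)
det(A)                     # determinant: 2*3 - 1*1
A_inv <- solve(A)          # inverse
round(A %*% A_inv, 10)     # identity matrix, up to rounding error
eigen(A)$values            # eigenvalues
crossprod(A)               # same as t(A) %*% A
R <- chol(A)               # upper-triangular factor with t(R) %*% R equal to A
t(R) %*% R
```

Note the difference between * and %*%: the first multiplies matching cells, the second performs the row-by-column product of linear algebra.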

> B<-matrix(nrow=3,ncol=3,data=c(1,2,3,4,2,6,-3,-1,-3) , byrow=TRUE)
> B
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    2    6
[3,]   -3   -1   -3
> B%*%B%*%B
     [,1] [,2] [,3]
[1,]   -6    0    0
[2,]    0   -6    0
[3,]    0    0   -6

Sol:
> m<-matrix(nrow=2,ncol=4,data=c(1,3,5,7,2,4,6,8) , byrow=TRUE)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
b) Calculate Transpose.
Sol:
> t(m)
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
[4,] 7 8
c) Calculate Inverse.
Sol:
> solve(m)
Error in solve.default(m) : 'a' (2 x 4) must be square

Only a square matrix can have an inverse, so m is redefined as a 3 x 3 matrix for the remaining parts:

> m<-matrix(nrow=3,ncol=3,data=c(1,3,5,7,2,4,6,8,9) , byrow=TRUE)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    7    2    4
[3,]    6    8    9
d) Calculate Determinant.
Sol:
> det(m)
[1] 89

e) Calculate the Multiplication of the matrix.
Sol:
> m1<-m%*%m
> m1
[,1] [,2] [,3]
[1,] 52 49 62
[2,] 45 57 79
[3,] 116 106 143
>
f) Construct a matrix with 10 columns and 10 rows, all filled with random numbers between 0 and 100.
Sol:
> m <- matrix(runif(100, min=0, max=100), ncol=10)
By default runif() draws values between 0 and 1, so the min and max arguments are needed to cover the range 0 to 100.
g) Calculate the row means of this matrix (Hint: use rowMeans). Also calculate the standard deviation across the row means (now also use sd()).
Sol:
> m1<-rowMeans(m)
> m1
> sd(m1)
The numbers are random, so the ten row means and their standard deviation differ on every run.

h) Now remake the above matrix with 100 columns and 10 rows. Then calculate the column means (using, of course, colMeans).
Sol:
> m <- matrix(runif(1000, min=0, max=100), ncol=100, nrow=10)
> m1<-colMeans(m)
> m1
EXPERIMENT-4
EXPLORE THE CONTROL STRUCTURES OF R AND DEMONSTRATE WITH ONE EXAMPLE UNDER EACH CASE

CONDITIONAL CONTROL STRUCTURES

R if statement

The syntax of if statement is:

if (test_expression) {
  statement
}

If the test_expression is TRUE, the statement gets executed. But if it’s FALSE, nothing happens.
Here, test_expression must evaluate to a single logical or numeric value. Older versions of R used only the first element of a longer vector; since R 4.2.0, a condition of length greater than one is an error.
In the case of a numeric value, zero is taken as FALSE, rest as TRUE.

Flowchart of if statement

Example: if statement

x <- 5
if(x > 0){
  print("Positive number")
}

Output

[1] "Positive number"

Develop programs on if-else in R.


1. Program to check the leap year or not.

# Program to check if the input year is a leap year or not
year = as.integer(readline(prompt="Enter a year: "))
if((year %% 4) == 0) {
  if((year %% 100) == 0) {
    if((year %% 400) == 0) {
      print(paste(year,"is a leap year"))
    } else {
      print(paste(year,"is not a leap year"))
    }
  } else {
    print(paste(year,"is a leap year"))
  }
} else {
  print(paste(year,"is not a leap year"))
}

Output 1

Enter a year: 1900

[1] "1900 is not a leap year"

2. Find the Factorial of a given Number.

# take input from the user
num = as.integer(readline(prompt="Enter a number: "))
factorial = 1
# check if the number is negative, positive or zero
if(num < 0) {
  print("Sorry, factorial does not exist for negative numbers")
} else if(num == 0) {
  print("The factorial of 0 is 1")
} else {
  for(i in 1:num) {
    factorial = factorial * i
  }
  print(paste("The factorial of", num ,"is",factorial))
}

Output

Enter a number: 8

[1] "The factorial of 8 is 40320"

3. Check whether the given number is Even or Odd.

# Program to check if the input number is odd or even.
# A number is even if division by 2 gives a remainder of 0.
# If the remainder is 1, it is odd.
num = as.integer(readline(prompt="Enter a number: "))
if((num %% 2) == 0) {
  print(paste(num,"is Even"))
} else {
  print(paste(num,"is Odd"))
}

Output

Enter a number: 89

[1] "89 is Odd"
ITERATIVE CONTROL STRUCTURES
FOR LOOP
A for loop is used to iterate over a vector in R programming.

Syntax of for loop

for (val in sequence) {
  statement
}

Here, sequence is a vector and val takes on each of its value during the loop. In each iteration, statement is
evaluated.

Flowchart of for loop

1. Program to count the number of even numbers in a vector.

x <- c(2,5,3,9,8,11,6)
count <- 0
for (val in x) {
  if(val %% 2 == 0) count = count+1
}
print(count)

Output

[1] 3

2. Program to Check Whether the given number is prime or not.

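# Program to check if the input number is prime or not
# take input from the user
num = as.integer(readline(prompt="Enter a number: "))
flag = 0
# prime numbers are greater than 1
if(num > 1) {
  flag = 1
  # check for factors
  for(i in 2:(num-1)) {
    if ((num %% i) == 0) {
      flag = 0
      break
    }
  }
}
# 2:(num-1) counts down to 1 when num is 2, so handle 2 separately
if(num == 2) flag = 1
if(flag == 1) {
  print(paste(num,"is a prime number"))
} else {
  print(paste(num,"is not a prime number"))
}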

Output 1

Enter a number: 25

[1] "25 is not a prime number"

3. Program to display multiplication table.

# R Program to find the multiplication table (from 1 to 10)
# take input from the user
num = as.integer(readline(prompt = "Enter a number: "))
# use for loop to iterate 10 times
for(i in 1:10) {
  print(paste(num, 'x', i, '=', num*i))
}

Output

Enter a number: 7

[1] "7 x 1 = 7"
[1] "7 x 2 = 14"
[1] "7 x 3 = 21"
[1] "7 x 4 = 28"
[1] "7 x 5 = 35"
[1] "7 x 6 = 42"
[1] "7 x 7 = 49"
[1] "7 x 8 = 56"
[1] "7 x 9 = 63"
[1] "7 x 10 = 70"
ITERATIVE CONTROL STRUCTURES
WHILE LOOP
In R programming, while loops are used to loop until a specific condition is
met.
Syntax of while loop

while (test_expression) {
  statement
}

Here, test_expression is evaluated and the body of the loop is entered if the result is TRUE.
The statements inside the loop are executed and the flow returns to
evaluate the test_expression again.

This is repeated each time until test_expression evaluates to FALSE, in which case, the loop exits.

Flowchart of while Loop

Example of while Loop

i <- 1
while (i < 6) {
  print(i)
  i = i+1
}

Output

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

1. Check whether the given number is an Armstrong number or not.

# take input from the user
num = as.integer(readline(prompt="Enter a number: "))
# initialize sum
sum = 0
# find the sum of the cube of each digit
temp = num
while(temp > 0) {
  digit = temp %% 10
  sum = sum + (digit ^ 3)
  temp = floor(temp / 10)
}
# display the result
if(num == sum) {
  print(paste(num, "is an Armstrong number"))
} else {
  print(paste(num, "is not an Armstrong number"))
}

Output 1

Enter a number: 23

[1] "23 is not an Armstrong number"

2. Find sum of natural numbers without formula.

# take input from the user
num = as.integer(readline(prompt = "Enter a number: "))
if(num < 0) {
  print("Enter a positive number")
} else {
  sum = 0
  # use while loop to iterate until zero
  while(num > 0) {
    sum = sum + num
    num = num - 1
  }
  print(paste("The sum is", sum))
}

Output

Enter a number: 10

[1] "The sum is 55"

3. Program to print the Fibonacci Series

# take input from the user
nterms = as.integer(readline(prompt="How many terms? "))
# first two terms
n1 = 0
n2 = 1
count = 2
# check if the number of terms is valid
if(nterms <= 0) {
  print("Please enter a positive integer")
} else {
  if(nterms == 1) {
    print("Fibonacci sequence:")
    print(n1)
  } else {
    print("Fibonacci sequence:")
    print(n1)
    print(n2)
    while(count < nterms) {
      nth = n1 + n2
      print(nth)
      # update values
      n1 = n2
      n2 = nth
      count = count + 1
    }
  }
}

Output

How many terms? 7

[1] "Fibonacci sequence:"
[1] 0
[1] 1
[1] 1
[1] 2
[1] 3
[1] 5
[1] 8
EXPERIMENT-5
CREATE R FUNCTIONS AND USE THEM WITH SIMPLE SCRIPTS.

A data frame is a two-dimensional data structure in R. It is a special case of a list in which each component has equal length. Each component forms a column, and the contents of the components form the rows.

Creating Data Frame in R


We can create a data frame using the data.frame() function, for example as follows.

x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Dora"))

str(x) # structure of x

'data.frame': 2 obs. of 3 variables:
$ SN : int 1 2
$ Age : num 21 15
$ Name: Factor w/ 2 levels "Dora","John": 2 1

Notice above that the third column, Name, is of type factor instead of a character vector.
In R versions before 4.0.0, the data.frame() function converts character vectors into factors by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, and Name is stored as a character vector.
To suppress the conversion in older versions, we can pass the argument stringsAsFactors=FALSE.

x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John", "Dora"), stringsAsFactors = FALSE)

str(x) # now the third column is a character vector

'data.frame': 2 obs. of 3 variables:
$ SN : int 1 2
$ Age : num 21 15
$ Name: chr "John" "Dora"

Many data input functions of R like, read.table(), read.csv(), read.delim(), read.fwf() also read

data into a data frame.
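Since this experiment is about writing functions and using them in simple scripts, the sketch below defines one small function and applies it to the data frame x from above. The function name column_mean and its arguments are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper: mean of one numeric column of a data frame
column_mean <- function(df, column) {
  values <- df[[column]]
  if (!is.numeric(values)) {
    stop("column '", column, "' is not numeric")
  }
  mean(values, na.rm = TRUE)   # ignore missing values
}

# Simple script that uses the function
x <- data.frame(SN = 1:2, Age = c(21, 15), Name = c("John", "Dora"),
                stringsAsFactors = FALSE)
column_mean(x, "Age")   # (21 + 15) / 2 = 18
```

Calling column_mean(x, "Name") stops with an error message, because the Name column is not numeric.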

EXPERIMENT-6
EXPLORE THE DATA ANALYTICS LIFE CYCLE.

As a data analyst or someone who works with data regularly, it’s important to understand how to manage a data analytics project so you can ensure efficiency and get the best results for your clients. One of the first steps in doing so is understanding the data analytics lifecycle.

The data analytics lifecycle describes the process of conducting a data analytics project, which

consists of six key steps based on the CRISP-DM methodology. These steps include: understanding the

business issue, understanding the data set, preparing the data, exploratory analysis, validation, and

visualization and presentation.

1. Understand the Business Issues: When presented with a data project, you will be given a brief outline

of the expectations. From that outline, you should identify the key objectives that the business is trying to

uncover. You should examine the overall scope of the work, business objectives, information the

stakeholders are seeking, the type of analysis they want you to use, and the deliverables (the outputs of the

project) they want.

You need to have these elements clearly defined prior to beginning your data analysis project to

provide the best deliverable you can. Additionally, it’s important to ask as many questions as you can at the

outset of the project because, often, you may not have another chance before the completion of the project.

2. Understand Your Data Set: There are a variety of tools you can use to organize your data. When presented with a small dataset, you can use Excel, but for heftier jobs, you’ll likely want to use more rigid tools to explore and prepare your data. Muñoz suggests R, Python, Alteryx, Tableau Prep or Tableau Desktop to help prepare your data for its cleaning. Within these programs, you should identify key variables to help categorize the data. When going through the data sets, look for errors in the data. These can be anything from omitted data, data that doesn’t logically make sense, duplicate data, or even spelling errors. These errors need to be corrected so you can properly clean your data.

3. Prepare the Data: Once you have organized and identified all the variables in your dataset, you can begin cleaning. In this step, you will impute missing values, create new broad categories to help categorize data that doesn’t have a proper place, and remove any duplicates in your data. Imputing average data scores for categories where there are missing values will help the data be processed more efficiently without skewing it.
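A minimal sketch of this cleaning step in R is shown below. The data frame scores and its columns are made up for the illustration; the group mean is used here as one possible reading of "average data scores for categories".

```r
# Made-up data with one missing score and one duplicated row
scores <- data.frame(
  id    = c(1, 2, 3, 3, 4),
  group = c("a", "a", "b", "b", "b"),
  score = c(10, NA, 30, 30, 50)
)

# Remove exact duplicate rows
scores <- scores[!duplicated(scores), ]

# Impute each missing score with the mean of the observed scores in its group
group_means <- ave(scores$score, scores$group,
                   FUN = function(s) mean(s, na.rm = TRUE))
missing <- is.na(scores$score)
scores$score[missing] <- group_means[missing]

scores
```

After these steps the data frame has four rows and no missing scores; the missing value in group "a" is replaced by 10, the only observed score in that group.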

4. Perform Exploratory Analysis and Modeling: In this step, you will begin building models to test your

data and seek out answers to the objectives given. Using different statistical modeling methods, you can

determine which is the best for your data. Common models include linear regressions, decision trees, and

random forest modeling, among others.
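As a small illustration of the modeling step, the sketch below fits a linear regression to the mtcars data set that ships with R. The choice of variables (mpg explained by wt) is an assumption made only for this example.

```r
# Linear regression: miles per gallon explained by car weight
fit <- lm(mpg ~ wt, data = mtcars)

coef(fit)                 # intercept and slope of the fitted line
summary(fit)$r.squared    # share of the variance in mpg explained by wt
```

Other candidate models can be fitted to the same data and compared in the same way before choosing one.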

5. Validate Your Data: Once you have crafted your models, you’ll need to assess the data and determine

if you have the correct information for your deliverable. Did the models work properly? Does the data need

more cleaning? Did you find the outcome the client was looking to answer? If not, you may need to go

over the previous steps again. You should expect a lot of trial and error!

6. Visualize and Present Your Findings: Once you have all your deliverables met, you can begin your

data visualization. In many cases, data visualization will be crucial in communicating your findings to the

client. Not all clients are data-savvy, and interactive visualization tools like Tableau are tremendously

useful in illustrating your conclusions to clients. Being able to tell a story with your data is essential.

Telling a story will help explain to the client the value of your findings.

As with any project, you need to identify your objectives clearly. Outlining your work will ensure

you get the best deliverables for your clients. While all of these steps are important, if you start the project

without all the data you need, you are likely to have to backtrack.

EXPERIMENT-7
IMPORTING & EXPORTING THE DATA FROM (I) CSV FILE (II) EXCEL FILE.

1. Reading different types of data sets (.txt, .csv) from the web and from disk, and writing a file to a specific disk location.

library(utils)
# Read the CSV file from the current working directory into a data frame
data <- read.csv("input.csv")
data

Output :-
  id     name salary start_date       dept
1  1     Rick 623.30 2012-01-01         IT
2  2      Dan 515.20 2013-09-23 Operations
3  3 Michelle 611.00 2014-11-15         IT
4  4     Ryan 729.00 2014-05-11         HR
5 NA     Gary 843.25 2015-03-27    Finance
6  6     Nina 578.00 2013-05-21         IT
7  7    Simon 632.80 2013-07-30 Operations
8  8     Guru 722.50 2014-06-17    Finance

data <- read.csv("input.csv")

# Confirm the result is a data frame and report its dimensions
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))

Output:-

[1] TRUE
[1] 5
[1] 8
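
The exporting half of this experiment can be sketched with `write.csv()`. So that the sketch runs without input.csv, the first three rows of the output above are retyped into a data frame. The file name output.csv and the use of `tempdir()` as the target folder are assumptions; substitute the disk location you actually want.

```r
# A few rows retyped from the output above, so the sketch runs without input.csv
data <- data.frame(
  id         = c(1, 2, 3),
  name       = c("Rick", "Dan", "Michelle"),
  salary     = c(623.30, 515.20, 611.00),
  start_date = c("2012-01-01", "2013-09-23", "2014-11-15"),
  dept       = c("IT", "Operations", "IT"),
  stringsAsFactors = FALSE
)

# Keep only the rows for the IT department
it_staff <- subset(data, dept == "IT")

# Write to a specific location; tempdir() stands in for a real folder
out_path <- file.path(tempdir(), "output.csv")
write.csv(it_staff, out_path, row.names = FALSE)

# Read the file back to confirm the export
check <- read.csv(out_path)
print(check)
```

Setting `row.names = FALSE` stops R from writing its row numbers as an extra unnamed first column.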

2. Reading Excel data sheet in R.

install.packages("xlsx")
library("xlsx")
# Read the first worksheet of the workbook into a data frame
data <- read.xlsx("input.xlsx", sheetIndex = 1)
data

Output:-

  id     name salary start_date       dept
1  1     Rick 623.30 2012-01-01         IT
2  2      Dan 515.20 2013-09-23 Operations
3  3 Michelle 611.00 2014-11-15         IT
4  4     Ryan 729.00 2014-05-11         HR
5 NA     Gary 843.25 2015-03-27    Finance
6  6     Nina 578.00 2013-05-21         IT
7  7    Simon 632.80 2013-07-30 Operations
8  8     Guru 722.50 2014-06-17    Finance

3. Reading XML dataset in R.

install.packages("XML")
library("XML")
library("methods")

# Parse the XML file into an R object
result <- xmlParse(file = "input.xml")
result

Output:-

1 Rick 623.3 1/1/2012 IT
2 Dan 515.2 9/23/2013 Operations
3 Michelle 611 11/15/2014 IT
4 Ryan 729 5/11/2014 HR
5 Gary 843.25 3/27/2015 Finance
6 Nina 578 5/21/2013 IT
7 Simon 632.8 7/30/2013 Operations
8 Guru 722.5 6/17/2014 Finance

EXPERIMENT-8
DATA VISUALIZATIONS

a. Find the data distributions using box and scatter plot.

install.packages("ggplot2")
library(ggplot2)
input <- mtcars[, c('mpg', 'cyl')]
head(input)

# Box plot of mileage grouped by number of cylinders
boxplot(mpg ~ cyl, data = mtcars, xlab = "number of cylinders",
        ylab = "miles per gallon", main = "mileage data")

# Close the graphics device
dev.off()

Output :-
                   mpg cyl
Mazda RX4         21.0   6
Mazda RX4 Wag     21.0   6
Datsun 710        22.8   4
Hornet 4 Drive    21.4   6
Hornet Sportabout 18.7   8
Valiant           18.1   6
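
Part (a) also asks for a scatter plot, which the listing above does not draw. A minimal sketch follows. Plotting mileage against the `wt` column of mtcars (weight in 1000 lbs) is an assumption about which pair of variables is wanted, and the output file name is hypothetical.

```r
# Write the plot to a PDF file in the temporary folder (hypothetical file name)
out_file <- file.path(tempdir(), "mpg_vs_wt.pdf")
pdf(out_file)

# Scatter plot: one point per car
plot(mtcars$wt, mtcars$mpg,
     xlab = "weight (1000 lbs)", ylab = "miles per gallon",
     main = "mileage against weight")

# Close the device so the file is written to disk
dev.off()
```

Pairing `pdf()` with `dev.off()` is what saves the figure to a file. In an interactive session, omit both calls to see the plot in the plotting window instead.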

b. Find the outliers using a box plot.

v <- c(50, 75, 100, 125, 150, 175, 200)
boxplot(v)
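
The sample vector in part (b) has no points beyond the whiskers, so its box plot shows no outliers. As a sketch, appending one extreme value (400, chosen here purely for illustration) makes the effect visible. `boxplot.stats()` returns the points that `boxplot()` would draw as outliers, that is, those lying more than 1.5 times the box length beyond the box.

```r
v <- c(50, 75, 100, 125, 150, 175, 200)
out_v <- boxplot.stats(v)$out      # empty: nothing is flagged
out_v

w <- c(v, 400)                     # 400 is a made-up extreme value
out_w <- boxplot.stats(w)$out      # the extreme value is flagged
out_w
```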

c. Plot the histogram, bar chart and pie chart on sample data.

Histogram

library(graphics)
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Create the histogram.
hist(v, xlab = "Weight", col = "blue", border = "green")
dev.off()

Output:- [Figure: histogram of v, x-axis labelled "Weight"]

Bar chart

library(graphics)
H <- c(7,12,28,3,41)
M <- c("Jan","Feb","Mar","Apr","May")
# Plot the bar chart.
barplot(H, names.arg = M, xlab = "Month", ylab = "Revenue", col = "blue",
        main = "Revenue chart", border = "red")
dev.off()

Pie Chart

library(graphics)
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
# Plot the pie chart.
pie(x, labels = labels)
dev.off()

Output:- [Figure: pie chart with slices labelled London, New York, Singapore and Mumbai]

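
A density plot, a smoothed alternative to the histogram, can be sketched in the same way. The sketch reuses the sample vector from the histogram listing. The output file name and the colours are arbitrary choices.

```r
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

# Kernel density estimate (Gaussian kernel by default)
d <- density(v)

# Write the plot to a PDF file in the temporary folder (hypothetical file name)
out_file <- file.path(tempdir(), "density.pdf")
pdf(out_file)
plot(d, main = "Density of v", xlab = "Weight")
polygon(d, col = "lightblue", border = "blue")   # shade the area under the curve
dev.off()
```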
EXPERIMENT-9

DEMONSTRATE SIMPLE LINEAR REGRESSION ANALYSIS. ANALYZE RESULTS IN DETAIL.

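# Sample measurements for nine mice
size   <- c(1.4,2.6,1.0,3.7,5.5,3.2,3.0,4.9,6.3)
weight <- c(0.9,1.8,2.4,3.5,3.9,4.4,5.1,5.6,6.3)
tail   <- c(0.7,1.3,0.7,2.0,3.6,3.0,2.9,3.9,4.0)
mouse  <- data.frame(size, weight, tail)
mouse

# Scatter plot with weight on the x-axis and size on the y-axis
plot(mouse$weight, mouse$size)

# Fit size as a linear function of weight and overlay the fitted line
simple <- lm(size ~ weight, data = mouse)
summary(simple)
abline(simple, col = "red", lwd = 2)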

Output::
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.5813     0.9647   0.603   0.5658
weight        0.7778     0.2334   3.332   0.0126 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.19 on 7 degrees of freedom
Multiple R-squared:  0.6133, Adjusted R-squared:  0.558
F-statistic:  11.1 on 1 and 7 DF,  p-value: 0.01256
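
Reading the output: the slope estimate of 0.7778 means the predicted size rises by about 0.78 units for each additional unit of weight, and its p-value of 0.0126 is below 0.05, so the relationship is statistically significant at the 5% level. The intercept (p = 0.5658) is not significantly different from zero. The multiple R-squared of 0.6133 says that weight explains about 61% of the variation in size. The sketch below pulls these quantities out of the fitted model. The new weight of 4.0 is an arbitrary value chosen for illustration.

```r
size   <- c(1.4, 2.6, 1.0, 3.7, 5.5, 3.2, 3.0, 4.9, 6.3)
weight <- c(0.9, 1.8, 2.4, 3.5, 3.9, 4.4, 5.1, 5.6, 6.3)
mouse  <- data.frame(size, weight)

simple <- lm(size ~ weight, data = mouse)

coef(simple)                          # intercept and slope
confint(simple, level = 0.95)         # 95% confidence intervals for both
r2 <- summary(simple)$r.squared       # proportion of variance explained
r2

# Predicted size, with a 95% prediction interval, for a weight of 4.0
new_mouse <- data.frame(weight = 4.0)
pred <- predict(simple, newdata = new_mouse, interval = "prediction")
pred
```

With only nine observations the prediction interval is wide, which is worth reporting alongside the point prediction.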

Output Plots:: [Figure: scatter plot of size against weight with the fitted regression line in red]

EXPERIMENT-10

DEMONSTRATE MULTIPLE REGRESSION ANALYSIS. ANALYZE RESULTS IN DETAIL.

no <- 1:25   # observation number
dt <- c(16.68,11.50,12.03,14.88,13.75,18.11,8.00,17.83,79.24,21.50,40.33,21.00,13.50,
        19.75,24.00,29.00,15.35,19.00,9.50,35.10,17.90,52.32,18.75,19.83,10.75)
cases <- c(7,3,3,4,6,7,2,7,30,5,16,10,4,6,9,10,6,7,3,17,10,26,9,8,4)
distance <- c(560,220,340,80,150,330,110,210,1460,605,688,215,255,462,448,776,200,
              132,36,770,140,810,450,635,150)
vending <- data.frame(dt, cases, distance)

# Scatter-plot matrix of all pairs of variables
plot(vending)

# dt is the response; cases and distance are the predictors
mlr <- lm(dt ~ cases + distance, data = vending)
summary(mlr)

Output::
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.341231   1.096730   2.135 0.044170 *
cases       1.615907   0.170735   9.464 3.25e-09 ***
distance    0.014385   0.003613   3.981 0.000631 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.259 on 22 degrees of freedom
Multiple R-squared:  0.9596, Adjusted R-squared:  0.9559
F-statistic: 261.2 on 2 and 22 DF,  p-value: 4.687e-16
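
Reading the output: holding distance fixed, each additional case adds about 1.62 to the predicted dt, and holding cases fixed, each additional unit of distance adds about 0.014. Both predictors are significant at the 0.1% level. The adjusted R-squared of 0.9559 says the two predictors together explain about 96% of the variation in dt, and the overall F-test (p = 4.687e-16) rejects the hypothesis that both slopes are zero. The sketch below refits the model and predicts dt for a new observation. The values 8 cases and 275 distance units are arbitrary illustrative choices.

```r
dt <- c(16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 79.24, 21.50,
        40.33, 21.00, 13.50, 19.75, 24.00, 29.00, 15.35, 19.00, 9.50, 35.10,
        17.90, 52.32, 18.75, 19.83, 10.75)
cases <- c(7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10, 6, 7, 3, 17,
           10, 26, 9, 8, 4)
distance <- c(560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688, 215, 255,
              462, 448, 776, 200, 132, 36, 770, 140, 810, 450, 635, 150)
vending <- data.frame(dt, cases, distance)

mlr <- lm(dt ~ cases + distance, data = vending)

confint(mlr)                                   # 95% intervals for each coefficient
adj_r2 <- summary(mlr)$adj.r.squared           # adjusted R-squared
adj_r2

# Predicted dt for a new observation (illustrative values)
new_obs <- data.frame(cases = 8, distance = 275)
pred <- predict(mlr, newdata = new_obs)
pred
```

Passing `data = vending` to `lm()` keeps the model tied to the data frame, so `predict()` can match the columns of `new_obs` by name.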

Output Plots:: [Figure: scatter-plot matrix of dt, cases and distance]

EXPERIMENT-11

DEMONSTRATE LOGISTIC REGRESSION ANALYSIS. ANALYZE RESULTS IN DETAIL.

df <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
str(df)
## ‘data.frame’: 400 obs. of 4 variables:
## $ admit: int 0 1 1 1 0 1 1 0 1 0 …
## $ gre : int 380 660 800 640 520 760 560 400 540 700 …
## $ gpa : num 3.61 3.67 4 3.19 2.93 3 2.98 3.08 3.39 3.92 …
## $ rank : int 3 3 1 4 4 2 1 2 3 2 …

sum(is.na(df))
## [1] 0

summary(df)
##      admit             gre             gpa             rank
##  Min.   :0.0000   Min.   :220.0   Min.   :2.260   Min.   :1.000
##  1st Qu.:0.0000   1st Qu.:520.0   1st Qu.:3.130   1st Qu.:2.000
##  Median :0.0000   Median :580.0   Median :3.395   Median :2.000
##  Mean   :0.3175   Mean   :587.7   Mean   :3.390   Mean   :2.485
##  3rd Qu.:1.0000   3rd Qu.:660.0   3rd Qu.:3.670   3rd Qu.:3.000
##  Max.   :1.0000   Max.   :800.0   Max.   :4.000   Max.   :4.000

df$rank <- as.factor(df$rank)

logit <- glm(admit ~ gre+gpa+rank,data=df,family="binomial")

summary(logit)
##
## Call:
## glm(formula = admit ~ gre + gpa + rank, family = "binomial",
##     data = df)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.6268  -0.8662  -0.6388   1.1490   2.0790
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.989979   1.139951  -3.500 0.000465 ***
## gre          0.002264   0.001094   2.070 0.038465 *
## gpa          0.804038   0.331819   2.423 0.015388 *
## rank2       -0.675443   0.316490  -2.134 0.032829 *
## rank3       -1.340204   0.345306  -3.881 0.000104 ***
## rank4       -1.551464   0.417832  -3.713 0.000205 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 499.98  on 399  degrees of freedom
## Residual deviance: 458.52  on 394  degrees of freedom
## AIC: 470.52
##
## Number of Fisher Scoring iterations: 4

# New applicant; predict() on a glm returns the log-odds (link scale) by default
x <- data.frame(gre = 790, gpa = 3.8, rank = as.factor(1))
p <- predict(logit, x)
p
##       1
## 0.85426
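
By default `predict()` on a `glm` object returns the linear predictor, so the value 0.85426 above is a log-odds, not a probability. The sketch below shows the conversion by hand, using the coefficient estimates printed in the summary so that it runs without downloading the data. With the fitted model available, `predict(logit, x, type = "response")` returns the probability directly.

```r
# Coefficient estimates copied from the summary output above
b0    <- -3.989979
b_gre <-  0.002264
b_gpa <-  0.804038
# rank 1 is the reference level of the factor, so it adds nothing

# Linear predictor (log-odds) for gre = 790, gpa = 3.8, rank = 1
log_odds <- b0 + b_gre * 790 + b_gpa * 3.8

# Inverse logit: 1 / (1 + exp(-log_odds))
prob <- plogis(log_odds)

log_odds
prob

# Odds ratios: multiplicative change in the odds of admission
# for a one-unit increase in each predictor
exp(c(gre = b_gre, gpa = b_gpa))
```

The log-odds of about 0.854 corresponds to an admission probability of roughly 0.70 for this applicant. The negative coefficients for rank2 to rank4 mean that, for the same gre and gpa, applicants from those ranks have lower odds of admission than applicants from rank 1.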

EXPERIMENT-12

DEMONSTRATE OTHER REGRESSION MODELS. ANALYZE RESULTS IN DETAIL (Stepwise, Forward, Backward Regression)

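fitall <- read.csv("C:\\Users\\Blessy Anjaleena\\Desktop\\Fitting.csv")
plot(fitall)

# Full model with all four predictors
fit <- lm(y ~ x1 + x2 + x3 + x4, data = fitall)
summary(fit)

# Backward selection: start from the full model and drop terms
step(fit, direction = "backward")

# Forward selection: start from the intercept-only model and add terms,
# with the full model's formula as the upper limit of the search
fitstart <- lm(y ~ 1, data = fitall)
fitstart
f <- step(fitstart, direction = "forward", scope = formula(fit))
summary(f)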

Output::
Series: temp.ts
ARIMA(0,1,0)

sigma^2 estimated as 205.7:  log likelihood=-159.2
AIC=320.4   AICc=320.5   BIC=322.06

Training set error measures:
                    ME     RMSE      MAE       MPE     MAPE      MASE        ACF1
Training set -2.133825 14.16017 11.56617 -0.383208 1.968452 0.6008403 -0.02787511
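
The file Fitting.csv is not included in this section, so the selection strategies can be tried on the built-in mtcars data instead. This is a sketch under that substitution: the response `mpg` and the four candidate predictors are arbitrary choices, and `trace = 0` only suppresses the step-by-step log that `step()` normally prints.

```r
# Upper (full) and lower (intercept-only) models on a built-in data set
full <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

# Backward: start full, drop the term whose removal lowers AIC most
backward <- step(full, direction = "backward", trace = 0)

# Forward: start empty, add the term that lowers AIC most
forward <- step(null, direction = "forward",
                scope = formula(full), trace = 0)

# Stepwise: terms may be added or dropped at every step
both <- step(null, direction = "both",
             scope = formula(full), trace = 0)

formula(backward)
formula(forward)
formula(both)

# Compare the selected models by AIC (lower is better)
AIC(backward, forward, both)
```

The three strategies need not select the same model. When they disagree, compare the candidates by AIC and by how sensible the retained predictors are for the problem.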
