
Lab Manual Page No 1

The document provides instructions on various data science concepts in R including: 1. Downloading and installing R and setting the library path 2. The different data types in R such as integer, numeric, complex, character etc. 3. Creating functions to perform basic arithmetic operations like addition, subtraction etc.

Uploaded by

R.R.Rao
Copyright
© © All Rights Reserved

DATA SCIENCE LAB MANUAL

1. Downloading, Installing and Setting the Path for R


R is a scripting language that provides an environment for statistical computing, data
science and graphics.
R is an open-source, object-oriented programming language for statistical computing and data
visualisation.
The installation suite for the R language can be downloaded from the Comprehensive R Archive
Network (CRAN). The network includes mirror websites for downloading the suite from different
countries.
To download R, users need to visit the CRAN mirror page and click on the URL of the chosen mirror,
which redirects them to the respective site.
https://cran.r-project.org
R is offered as a precompiled binary distribution of the base system and contributed packages. Different
distributions of R are available for different operating systems (OS) such as Windows, Mac and Linux.

Downloading R for Windows

Windows users first need to download and install the binaries for the base distribution. At the time of
writing, the base binary distribution is R 3.3.1. Users can also check and download previous
versions of R, as well as Rtools, from the mirror website. Rtools is used for building R and its packages.

Installing R for Windows

Installing R on Windows is simple. Users need to double-click the downloaded binary named R-3.3.1-win.exe
and follow the graphical installer. Command-line installation options are also available for Windows.
The command .libPaths() can be used to get or set the path of the package library.

> .libPaths()

O/P: C:/R/R-3.4.3/library
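As a minimal sketch, the library path and the running R version can be checked together. The printed values depend on the machine, so no output is shown; the variable names paths and stats_dir are only illustrative.

```r
# Directories that R searches for installed packages
paths <- .libPaths()
print(paths)

# Version string of the R session that is running
print(R.version.string)

# The 'stats' package ships with R, so find.package() should locate it
stats_dir <- find.package("stats")
print(stats_dir)
```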
2. R data types
The basic data types in R are logical, integer, numeric (double), complex, character and raw.

# numeric (double)
x=5.6
print(class(x))
print(typeof(x))
y=5
print(class(y))
print(typeof(y))
# integer, created with as.integer() or the L suffix
x=as.integer(5)
print(class(x))
print(typeof(x))
y=5L
print(class(y))
print(typeof(y))
# logical
x=4
y=3
z=x>y
print(z)
print(class(z))
print(typeof(z))
# complex
x=4+3i
print(class(x))
print(typeof(x))
# character
char="Magnet"
print(class(char))
print(typeof(char))
x=as.integer(6)
print(class(x))
# raw
raw_variable <- charToRaw("welcome to Programiz")
print(raw_variable)
print(class(raw_variable))
char_variable <- rawToChar(raw_variable)
print(char_variable)
print(class(char_variable))
# mixing integer and double values in c() gives a double vector
dbl_var <- c(1L,2.5,4.5)
dbl_var
int_var <- c(1L,6L,10L)
int_var
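The dbl_var example above relies on type coercion: when values of different types are combined with c(), R converts them to the most general type present. A short sketch (the names v1, v2 and v3 are illustrative):

```r
# Combining values of different types coerces them to the most general type
v1 <- c(TRUE, 2L)     # logical + integer   -> integer
v2 <- c(2L, 3.5)      # integer + double    -> double
v3 <- c(3.5, "a")     # double + character  -> character

print(typeof(v1))
print(typeof(v2))
print(typeof(v3))
```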

3. Program to make a simple calculator that can add, subtract, multiply and divide using functions
add <- function(x, y) {

return(x + y)

}

subtract <- function(x, y) {

return(x - y)

}

multiply <- function(x, y) {

return(x * y)

}

divide <- function(x, y) {

return(x / y)

}

# take input from the user

print("Select operation.")

print("1.Add")

print("2.Subtract")

print("3.Multiply")

print("4.Divide")
choice = as.integer(readline(prompt="Enter choice[1/2/3/4]: "))

num1 = as.integer(readline(prompt="Enter first number: "))

num2 = as.integer(readline(prompt="Enter second number: "))

operator <- switch(choice,"+","-","*","/")

result <- switch(choice, add(num1, num2), subtract(num1, num2), multiply(num1, num2),
divide(num1, num2))

print(paste(num1, operator, num2, "=", result))

Output

[1] "Select operation."

[1] "1.Add"

[1] "2.Subtract"

[1] "3.Multiply"

[1] "4.Divide"

Enter choice[1/2/3/4]: 4

Enter first number: 20

Enter second number: 4

[1] "20 / 4 = 5"
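readline() only waits for input in an interactive session, so the program above cannot be run unattended as a script. A minimal non-interactive sketch is given below; the function name calculate and its argument order are assumptions made for illustration, not part of the original program.

```r
# Non-interactive variant: the choice and the operands are passed as arguments
calculate <- function(choice, num1, num2) {
  operator <- switch(choice, "+", "-", "*", "/")
  # switch() returns NULL when the choice is outside 1..4
  if (is.null(operator)) stop("choice must be 1, 2, 3 or 4")
  result <- switch(choice, num1 + num2, num1 - num2, num1 * num2, num1 / num2)
  paste(num1, operator, num2, "=", result)
}

print(calculate(4, 20, 4))   # same case as the sample run above
print(calculate(1, 2, 3))
```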

2. Find the Rectangle area using R programming


RectangleHeight <- 2

RectangleWidth <- 4

RectangleArea <- RectangleHeight * RectangleWidth

RectangleHeight

[1] 2

RectangleWidth

[1] 4

RectangleArea

[1] 8

4. Demonstrate the process of creating a user-defined function in R


R provides built-in functions such as min(), max(), summary(), mean(), aggregate(), tapply() and sapply(),
several of which can be applied to grouped data. When no built-in function gives the required output, a
user-defined function can compute it from an input vector.
Ex: R has no built-in function for the statistical mode, so a user-defined function is required to calculate
it. Here the input is a vector and the output is the mode value.
# Create the function
getmode <- function(y){
uniqy <- unique(y)
uniqy[which.max(tabulate(match(y,uniqy)))]
}
v <- c(5,6,4,8,5,7,4,6,5,8,3,2,1)
# Calculate the mode with the user-defined function
resultmode <- getmode(v)
print(resultmode)
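The mode calculation combines unique(), match(), tabulate() and which.max(). The sketch below runs the same steps one at a time on the same input vector so that each intermediate result can be inspected; the names pos and counts are illustrative.

```r
y <- c(5, 6, 4, 8, 5, 7, 4, 6, 5, 8, 3, 2, 1)

uniqy  <- unique(y)         # distinct values, in order of first appearance
pos    <- match(y, uniqy)   # position of each element of y within uniqy
counts <- tabulate(pos)     # how often each distinct value occurs

print(uniqy)
print(counts)
print(uniqy[which.max(counts)])   # the value with the highest count
```

If several values share the highest count, which.max() returns the first of them, so the function reports only one mode.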

5. Program on R objects like list and data.frame, and operations on datasets


# create a list, then remove and add elements
emp <- list(EmpName= "keir",EmpUnit="lawyer",Empsal=55000)
emp
emp$EmpUnit <- NULL
emp
length(emp)
emp$EmpCity = "london"
emp
length(emp)
emp1 <- list(EmpDesg = "prosecutor")
emp1
Emplist <- list(emp,emp1)
Emplist
# list the built-in datasets and explore the Orange data frame
data()
names(Orange)
summary(Orange)
str(Orange)
Orange
head(Orange,n=3)
tail(Orange,n=3)
class(Orange)
dim(Orange)
table(Orange$age)
table(Orange$Tree)
table(Orange$circumference)
Orange[1]
Orange[2]
Orange[,3]
# read a CSV file from the working directory into a data frame
TD <- read.csv("Hardware.csv")
TD
TD[1]
# select the first row of the built-in BOD data frame
BOD
SRow <- BOD[1,]
SRow
str(SRow)
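R <- data.frame(RN = c('A','B','C'),RM = c(10,20,30))
R
T <- data.frame(TN= c('A','B','C'),TM= c(10,20,30))
T
# R and T share no column names, so the key columns are named explicitly
K <- merge(R,T,by.x = "RN",by.y = "TN")
K
B <- data.frame(BN= c('A','C'),BM= c(100,200))
B
# all.x / all.y keep every row of the first / second data frame
E <- merge(B,T,by.x = "BN",by.y = "TN",all.x = TRUE)
E
E <- merge(B,T,by.x = "BN",by.y = "TN",all.y = TRUE)
E
E <- merge(T,B,by.x = "TN",by.y = "BN",all.x = TRUE)
E
E <- merge(T,B,by.x = "TN",by.y = "BN",all.y = TRUE)
E
E <- merge(B,T,by.x = "BN",by.y = "TN",all.x = FALSE)
E
E <- merge(T,B,by.x = "TN",by.y = "BN",all.y = FALSE)
E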
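The effect of the all, all.x and all.y arguments of merge() is easiest to see on two small data frames that share a key column. The following is a sketch with made-up illustrative data; the names left, right, id, score and bonus are not from the manual.

```r
# Two small illustrative tables sharing the key column 'id'
left  <- data.frame(id = c("A", "B", "C"), score = c(10, 20, 30))
right <- data.frame(id = c("A", "C", "D"), bonus = c(100, 200, 300))

inner     <- merge(left, right, by = "id")                 # ids present in both
left_join <- merge(left, right, by = "id", all.x = TRUE)   # every id of 'left'
full      <- merge(left, right, by = "id", all = TRUE)     # every id of either

print(inner)
print(left_join)
print(full)
```

Rows without a partner in the other table are filled with NA.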
# S is a data frame with a Fruit.Name column; it is not defined in this manual
list(Fruit.Name = S$Fruit.Name)
# list the available datasets and load one
data()
data(iris)
iris
Orange
# list the installed packages and attach some of them
library()
library(tools)
library(Matrix)
# packages for JSON and web data; RCurl is assumed to be installed already
install.packages("rjson")
library(RCurl)
# getURL() from RCurl and htmlTreeParse() from the XML package each need a URL or document as argument

EmpNo <- c(6,11,1966)
EmpName <- c("puthin","jinping","starmer")
ProjName <- c("R","PL","GBMKL")
Employee <- data.frame(EmpNo, EmpName, ProjName)
Employee
# select columns by position
Employee[2]
Employee[1:2]
# select a row, then a column
Employee[3,]
Employee[,3]
# name the rows and select rows by name
row.names(Employee) <- c("Employee1","Employee2","Employee3")
row.names(Employee)
Employee
Employee["Employee1",]
Employee["Employee3",]
Employee[c("Employee2","Employee1"),]
Employee[c("Employee1","Employee2"),]
Employee["Employee2",]
# select columns by name
Employee[["EmpName"]]
Employee[c("EmpNo","ProjName")]
6. Operations on different datasets and packages

View(mtcars)
ncol(mtcars)
# library paths and installed packages
.libPaths()
installed.packages()

packageDescription("stats")
help(package="stats")
packageDescription("Matrix")
help(package="Matrix")

plot(trees,col="red",pch=33)
help(package="datasets")
datasets::AirPassengers
library(datasets)
AirPassengers
packageDescription("parallel")
packageDescription("tools")
packageDescription("utils")
packageDescription("translations")
packageDescription("survival")
find.package("survival")


7. Find the Square area using R

Squarelength <- 4
Squarearea <- Squarelength*Squarelength
Squarelength
Squarearea
AirPassengers
ncol(AirPassengers)   # NULL: AirPassengers is a time series, not a matrix
8. ncol, nrow, str, rnorm and summary functions on datasets
datasets::mtcars
ncol(mtcars)
nrow(mtcars)
mtcars
summary(mtcars)
str(mtcars)
str(str)
str(ls)
# rnorm(n, mean, sd) draws n random values from a normal distribution
rnorm(100,2,4)
rnorm(2,3)
help(rnorm)
x <- rnorm(100,2,2)
x
summary(x)
str(x)

9. head, tail, edit and plot functions on datasets

data()
data(trees)
trees
head(trees,n=7)
tail(trees,n=2)
summary(trees)
View(trees)
edit(trees)
plot(trees)
plot(trees,col="green")
dir()
list.files()
plot(trees,col="blue")
plot(trees,col="pink")
plot(trees,col="red")

# read.csv() and read.xlsx() need a file name; read.xlsx() comes from an add-on package such as xlsx or openxlsx
mtcars
summary(mtcars$mpg)
str(c(1,2,3,4,5,6))

10. Arithmetic expressions in R
9+23
4-2
5*4
5/4
4^5
4**5
23 %% 9
5 %/% 4
sqrt(9)
sqrt(225)
2<4
TRUE==FALSE
FALSE==FALSE
x <- c(1:5)
x
x[(x>2)|(x<5)]
x[(x>2)&(x<5)]
x>2
x<4
x==3
x>=3
x<=2
Installing the rjson and XML packages in R
install.packages("rjson")

install.packages("XML")
# Titanic is a built-in dataset (a table), not a function
Titanic
OPERATIONS ON SEQUENCE
a <- seq(5,11,by=2)
a
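11. MATRIX IN R
# a holds 4 values, so the dimensions must multiply to 4
matrix(a,2,2)
matrix(a,4,1)
matrix(a,1,4)
# dim<- reshapes in place; the product of the dimensions must equal length(a)
dim(a) <- c(2,2)
a
x <- 6:11
x
mat <- matrix(x,2,3)
mat
mat <- matrix(x,3,2)
mat
# x has 6 values, so it is recycled to fill the 9 cells (some R versions warn about this)
mat <- matrix(x,3,3)
mat
mat[2,2]
mat[3,3]
mat[2,3]
mat[1,1]
mat[3,2]
mat[3,1]
mat[2,1]
mat[,3]
mat[2,]
mat[3,]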
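matrix() fills its cells column by column unless byrow = TRUE is given. The sketch below shows both fill orders for the same six values; the names by_col and by_row are illustrative.

```r
x <- 6:11

by_col <- matrix(x, nrow = 2, ncol = 3)                 # default: column by column
by_row <- matrix(x, nrow = 2, ncol = 3, byrow = TRUE)   # row by row

print(by_col)
print(by_row)
print(t(by_col))   # transpose: rows and columns swapped
```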
sin, cos and tan functions in R
# the trigonometric functions take angles in radians
sin(x)
cos(x)
tan(x)
sin(90)
cos(90)
tan(90)
1/tan(90)   # cotangent; base R has no cot() function
c(1,2,6)+c(11,6,55)
x <- seq(1,20,0.1)
x
y <- sin(x)
y
plot(x,y)
x <- seq(1,20,5)
x
y <- cos(x)
y
plot(x,y)
x <- seq(0,1,0.1)
x
y <- cos(x)
y
plot(x,y)
x <- seq(1,20,0.1)
x
y <- cos(x)
y
plot(x,y)
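R's trigonometric functions expect radians, so sin(90) is the sine of 90 radians, not of 90 degrees. A small sketch of converting degrees first; the helper name deg2rad is an assumption for illustration and is not a base R function.

```r
# Hypothetical helper: convert degrees to radians
deg2rad <- function(deg) deg * pi / 180

sin_90_deg  <- sin(deg2rad(90))    # sine of 90 degrees
cos_180_deg <- cos(deg2rad(180))   # cosine of 180 degrees

print(sin_90_deg)
print(cos_180_deg)
print(sin(90))                     # sine of 90 radians, a different value
```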
Reading a .csv file in R
read.csv("sampledata.csv")
InputData <- read.csv("D:/sampledata.csv")
# read.table() reads the same file when the header and separator are given
read.table('D:/sampledata.csv', header=TRUE, sep=',')
read.table('D:/sampledata.csv', header=FALSE, sep=',')
save.image("C:\\Anitha\\R.R.Rao 19-12-22")
data()
x <- seq(10,25,by=5)
x
matrix(a,2,2)
dim(a)
edit(x)
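The file paths above only work if the files exist on drive D:. As a self-contained sketch, a small data frame can be written to a temporary file and read back with both read.csv() and read.table(); the data and the names sample_df, path, from_csv and from_table are illustrative.

```r
# Small illustrative data frame written to a temporary CSV file
sample_df <- data.frame(id = 1:3, name = c("a", "b", "c"))
path <- tempfile(fileext = ".csv")
write.csv(sample_df, path, row.names = FALSE)

# read.csv() assumes a header row and a comma separator
from_csv <- read.csv(path)

# read.table() needs both stated explicitly
from_table <- read.table(path, header = TRUE, sep = ",")

print(from_csv)
print(from_table)

unlink(path)   # remove the temporary file
```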
12. Descriptive Statistics in R
mtcars
summary(mtcars)
min(mtcars)
max(mtcars)
range(mtcars)
# mean() and IQR() work on vectors, so they are applied column by column
sapply(mtcars,mean)
BOD
sapply(BOD,mean)
sapply(mtcars,IQR)
x <- 1:6
x
summary(x)
min(x)
max(x)
range(x)
mean(x)
median(x)
mad(x)
IQR(x)
quantile(x)
# apply() needs a matrix, so x is reshaped first
m <- matrix(x,3,2)
m
dim(m)
apply(m,1,mean)
a <- seq(10,25,by=5)
a
matrix(a,2,2)
x <- c(6,11,4)
median.result <- median(x)
print(median.result)
x <- c(-6,11,4)
median.result <- median(x)
print(median.result)
numbers <- c(6,11,4)
median(numbers)
barplot(numbers)
abline(h = median(numbers))
numbers <- c(6,11,4)
mean(numbers)
deviation <- sd(numbers)
deviation
barplot and abline functions in R
barplot(numbers)
abline(h = sd(numbers))
abline(h = sd(numbers) + mean(numbers))
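sd() returns the sample standard deviation, which divides the sum of squared deviations by n - 1. The sketch below recomputes it by hand for the same three numbers; the names m, squared_dev and manual_sd are illustrative.

```r
numbers <- c(6, 11, 4)

m           <- mean(numbers)          # 7
squared_dev <- (numbers - m)^2        # 1, 16, 9
manual_sd   <- sqrt(sum(squared_dev) / (length(numbers) - 1))

print(manual_sd)
print(sd(numbers))   # same value as the manual calculation
```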
13. apply, bins, median, sd, histogram, barplot and abline functions in R

h <- c(1,2,3)
bins <- c(0,5,10,15)
bins
hist(h,xlab="Values",ylab="Colours",col="red",xlim=c(0,3),
ylim=c(0,3),breaks=bins)
EmpNo <- c(6,11,1966)
EmpName <- c("RAM","RAHIM","ROBERT")
ProjName <- c("R","PL","GBMKL")
Employee <- data.frame(EmpNo, EmpName, ProjName)
Employee
Employee[2]
Employee[1:2]
Employee[3,]
Employee[,3]
row.names(Employee) <- c("Employee1","Employee2","Employee3")
row.names(Employee)
Employee
Employee["Employee1",]
Employee["Employee3",]
Employee[c("Employee2","Employee1"),]
Employee[c("Employee1","Employee2"),]
Employee["Employee2",]
Employee[["EmpName"]]
Employee[c("EmpNo","ProjName")]
# add a column and sort by it
Employee$EmpExpYears <- c(6,11,4)
Employee
Employee[order(Employee$EmpExpYears),]
Employee[order(-Employee$EmpExpYears),]
dim(Employee)
nrow(Employee)
names(Employee)
edit(Employee)
Employee[1:3,]
Employee[1:3,1:2]
head(Employee)
# filter rows with subset()
subset(Employee, EmpExpYears > 6)
subset(Employee, EmpExpYears > 4, select = c(EmpNo))
subset(Employee, EmpName == "RAM")
subset(Employee, EmpNo == 6)
subset(Employee, EmpNo == 6 | EmpNo == 11)
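subset() is a convenience wrapper around logical indexing, and the two forms select the same rows. A sketch on a small stand-alone data frame; the names staff, years, by_subset, by_index and sorted are illustrative.

```r
staff <- data.frame(name = c("RAM", "RAHIM", "ROBERT"), years = c(6, 11, 4))

# Rows with more than 4 years, keeping only the name column
by_subset <- subset(staff, years > 4, select = name)
by_index  <- staff[staff$years > 4, "name", drop = FALSE]

print(by_subset)
print(by_index)

# Sort by years, largest first
sorted <- staff[order(-staff$years), ]
print(sorted)
```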
Reading a .txt file in R
read.table("d:/sep.txt", sep="\t")
read.table("d:/sep.txt", sep="\t",header=TRUE)
mydata <- read.table("d:/mydata.txt")
read.table("d:/mydata.txt",sep="\t",header=TRUE)
mydata <- read.table("d:/mydata.txt",sep="\t")

14. sapply, min and max functions in R


Employee
summary(Employee[4])
min(Employee[4])
max(Employee[4])
range(Employee[4])
Employee[,3]
mean(Employee[,4])
median(Employee[,4])
mad (Employee[,4])
IQR (Employee[,4])
quantile(Employee[,4])
sapply(Employee[4],mean)
sapply(Employee[4],min)
sapply(Employee[4],range)
sapply(Employee[4],quantile)
which.min(Employee$EmpExpYears)
which.max(Employee$EmpExpYears)
# read.table() needs a file name (or a text= argument) in addition to header
read.table("D:/sep.txt",header= TRUE,sep=",")
library(plyr)

Missing value functions in R


x <- c(6,11,5,NA,55,3)
y <- c("red","safron",NA,"NA")
is.na(x)
is.na(y)
# d is used as the name so that the c() function is not masked
d <- as.data.frame(matrix(c(1:11,NA),ncol=3))
d
na.omit(d)
na.exclude(d)
na.pass(d)
na.fail(d)   # stops with an error because d contains a missing value
sum(is.na(d))
rowSums(is.na(d))
rowMeans(is.na(d))*length(d)
anyNA(c(-11,NaN,11))
anyNA(c(-11,NA,11))
anyNA(c(-11,11))
any(is.na(c(-6,NA,6)))
is.na(c(-6,6))
any(is.na(c(-6,6)))
any(is.na(c(-6,NaN,6)))
# base R has no is.valid() or is.invalid(); is.finite() is FALSE for NA, NaN, Inf and -Inf
is.finite(c(-4,NA,4))
is.finite(c(-4,Inf,4))
is.infinite(c(-4,Inf,4))
is.finite(c(-11,-Inf,11))
is.nan(c(-11,Inf,11))
is.nan(c(-4,Inf,NaN))
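The checks above differ in what they treat as missing: is.na() is TRUE for both NA and NaN, is.nan() only for NaN, and is.finite() is FALSE for NA, NaN and infinite values. A short sketch on one vector; the names v and clean_mean are illustrative.

```r
v <- c(1, NA, NaN, Inf)

print(is.na(v))       # TRUE for NA and NaN
print(is.nan(v))      # TRUE for NaN only
print(is.finite(v))   # TRUE for ordinary numbers only

# Keep only the finite values before summarising
clean_mean <- mean(v[is.finite(v)])
print(clean_mean)
```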
Employee
summary(Employee$EmpExpYears)
summary(Employee$EmpNo)
duration = Employee$EmpExpYears
max(duration)-min(duration)
save.image("C:\\Users\\R\\Documents\\.RData")
mtcars
head(subset(mtcars, select = 'gear'))
factor(mtcars$gear)
w = table(mtcars$gear)
w
t = as.data.frame(w)
t
names(t)[1] = 'gear'
t
names(t)[2] = 'rao'
t
w
save.image("C:\\Users\\R\\OneDrive\\Documents\\.RData")
cbind(w)
getmode <- function(y){
uniqy <- unique(y)
uniqy[which.max(tabulate(match(y, uniqy)))]
}

v <- c(5,6,4,8,5,7,4,6,5,8,3,2,1)
resultmode <- getmode(v)
print(resultmode)
charv <- c("Rt","cc","cc","bm","cc")
resultmode <- getmode (charv)
print(resultmode)
save.image("C:\\Users\\R\\OneDrive\\Documents\\.RData")
x <- c(15,54,6.5,9.2,36,5.3,8,-7,-5)
result.mean <- mean(x)
print(result.mean)
x <- c(1,2,3,4,5,6)
result.mean <- mean(x)
print(result.mean)
result.mean <- mean(x,trim= 0.1)
print(result.mean)
x <- c(6,11,4,NA)
result.mean <- mean(x)
print(result.mean)
result.mean <- mean(x,na.rm= TRUE)
print(result.mean)
numbers <- c(6,11,4)
mean(numbers)
barplot(numbers)
abline(h = mean(numbers))
h <- density(c(6,11,4))
plot(h)
plot(h,xlab="Values",ylab="Density")
h <- c(6,11,4)

15. LINEAR RELATIONSHIP IN R

barplot(h,xlab= "categories",ylab= "Values",col="red")
png(file= "samplebarchart.png")
barplot(h,horiz=TRUE)
dev.off()
barplot(h,xlab="Values",ylab="Categories",col="red",horiz=TRUE)
colors <- c("red","yellow","blue")
months <- c("May","Jan","Nov")
regions <- c("ENG","AUS","NZ")
Values <- matrix(c(2,9,3,11,9,4,8,7,3),nrow=3,ncol=3,byrow=TRUE)
Values
rownames(Values) <- regions
rownames(Values)
Values
colnames(Values) <- months
Values
barplot(Values,col=colors,width=2,beside=TRUE,names.arg= months,main=
"total revenue 2022 by month")
days <- c("tue","wed")
months <- c("May","Nov")
colors <- c("red","blue")
val <- matrix(c(6,11,4,12,3,36),nrow= 3,ncol=2,byrow=TRUE)
val
barplot(val,main= "total",names.arg=months,xlab="Months",ylab="Days",col=colors)
legend("topleft",days,cex=1.3,fill=colors)
Employee
cor(Employee$EmpNo,Employee$EmpExpYears)
16. LINEAR REGRESSION MODELS
x <- Employee$EmpNo
y <- Employee$EmpExpYears
n <- nrow(Employee)
xmean <- mean(Employee$EmpNo)
ymean <- mean(Employee$EmpExpYears)
xiyi <- x*y
# least-squares slope: (sum(x*y) - n*xmean*ymean) / (sum(x^2) - n*xmean^2)
numerator <- sum(xiyi)- n*xmean*ymean
denominator <- sum(x^2)-n*(xmean^2)
b1 <- numerator/denominator
b0 <- ymean - b1*xmean
b1
b0
xmean
ymean
model_R <- lm(Employee$EmpExpYears ~ Employee$EmpNo)
model_R
summary(model_R)
# draw the scatter plot first, then add the fitted line with abline()
plot(Employee$EmpNo,Employee$EmpExpYears,col="red",main="Linear Regression",
cex= 1.3,pch=16,xlab="No of Emp",
ylab= "Employee$EmpExpYears")
abline(model_R)
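The hand-computed slope b1 and intercept b0 should agree with the coefficients that lm() reports. The sketch below checks this on a small made-up data set; the x and y values are illustrative only and are not taken from the Employee table.

```r
# Illustrative data (made up for this sketch)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
n <- length(x)

# Least-squares slope and intercept from the closed-form formulas
b1 <- (sum(x * y) - n * mean(x) * mean(y)) / (sum(x^2) - n * mean(x)^2)
b0 <- mean(y) - b1 * mean(x)

# The same model fitted by lm()
fit <- lm(y ~ x)

print(c(intercept = b0, slope = b1))
print(coef(fit))
```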
17. Linear Model function in R programming
x <- c(160,180,200)

y <- c(80,90,100)

relation <- lm(y~x)


print(relation)

print(summary(relation))

predict( ) function in R

a <- data.frame(x=170)

result <- predict(relation,a)

print(result)

 1

85

# draw the scatter plot first, then add the fitted line with abline()
plot(y,x,col = "blue", main = "Height & Weight Regression",

cex=1.3, pch = 16, xlab = "Weight in Kg", ylab = "Height in cm")

abline(lm(x~y))

18. Scatter plot and box plot in R using the cars dataset

cars

head(cars)

scatter.smooth(x = cars$speed, y = cars$dist, main = "Dist ~ Speed")

Implementing box plot

par(mfrow = c(1,2)) # divide the graph area into 2 columns

boxplot(cars$speed, main = "Speed", sub = paste("Outlier rows: ",

boxplot.stats(cars$speed)$out)) # box plot for speed

boxplot(cars$dist, main = "Distance", sub = paste("Outlier rows: ",

boxplot.stats(cars$dist)$out)) # box plot for distance

Beta coefficients in R

linearMod <- lm(dist ~ speed, data = cars)

print(linearMod)
print(summary(linearMod))

plot(cars$dist, cars$speed, col = "red", main = "Speed & Distance Regression",

cex = 1.3, pch = 16, xlab = "Distance", ylab = "Speed")

abline(lm(cars$speed ~ cars$dist))

19. Binary Logistic Regression --- glm( ) function using R

E <- factor(c("NA","C","N"))

res <- cbind(clear = c(6,5,10), "not clear" = c(9,10,0))

res

     clear not clear

[1,]     6         9

[2,]     5        10

[3,]    10         0

EN = (E == "N")

EC = (E == "C")

b1 <- glm(formula = res ~ EC+EN, family = binomial("logit"))

b1

summary(b1)

20. mle( ) function implementation in R

library(stats4) # mle() is provided by the stats4 package

f <- function(x) sum((x-1)^2)

m <- mle(f, start = list(x=10))


m

summary(m)

optim( ) function in R

f <- function(x) sum((x-1)^2)

optim(c(10,10),f)

21. nlm( ) function implementation in R

f <- function(n) sum((n-1)^2)

nlm(f, c(10,10))
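The function sum((x-1)^2) has its minimum where every element equals 1, so both optimisers should end close to c(1, 1) from the starting point c(10, 10). A sketch comparing the two results; the names res_optim and res_nlm are illustrative, and optim() is run with its default Nelder-Mead method.

```r
f <- function(p) sum((p - 1)^2)

res_optim <- optim(c(10, 10), f)   # default method: Nelder-Mead
res_nlm   <- nlm(f, c(10, 10))     # Newton-type minimiser

print(res_optim$par)       # estimated minimiser from optim()
print(res_optim$value)     # function value at that point
print(res_nlm$estimate)    # estimated minimiser from nlm()
print(res_nlm$minimum)     # function value at that point
```

optim() returns its estimate in $par, while nlm() uses $estimate, so the two result lists are not interchangeable.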

Tree Models

install.packages("tree")

library(tree)

attach(cars)

names(cars)

model <- tree(cars)

plot(model)

text(model)
22. Autocorrelation function and partial autocorrelation implementation in R

# random walk Y built from white noise Z
Z <- rnorm(250,0,2)

Y <- numeric(250)

Y[1] <- Z[1]

for(i in 2:250) Y[i] <- Y[i-1]+Z[i]

plot.ts(Y)

acf(Y)

# white noise for comparison
Y <- rnorm(250,0,2)

par(mfrow = c(1,2))

plot.ts(Y)

acf(Y)

# x and y are two time series of equal length that must already exist in the workspace
par(mfrow = c(1,2))

acf(x,type = "p")

acf(y,type = "p")

par(mfrow=c(1,1))

acf(cbind(x,y))

acf(cbind(x,y),type = "p")
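A random walk is strongly autocorrelated, while the white noise it is built from is not. The sketch below computes both autocorrelation functions without plotting so that the lag-1 values can be compared directly. The seed is arbitrary, and cumsum() is used as a shorthand for the loop Y[i] <- Y[i-1] + Z[i].

```r
set.seed(1)                 # arbitrary seed, only for reproducibility

Z <- rnorm(250, 0, 2)       # white noise
Y <- cumsum(Z)              # random walk built from the noise

acf_walk  <- acf(Y, plot = FALSE)
acf_noise <- acf(Z, plot = FALSE)
pacf_walk <- acf(Y, type = "partial", plot = FALSE)

# Element 1 of an ACF is lag 0, so element 2 is lag 1
print(acf_walk$acf[2])
print(acf_noise$acf[2])

# A partial ACF starts at lag 1
print(pacf_walk$acf[1])
```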
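23. Detrended function in R programming

# three-point moving average; the loop is missing in the source and is filled in here as an assumption
ma3 <- function(x){

y <- numeric(length(x)-2)

for (i in 2:(length(x)-1)) y[i-1] <- (x[i-1]+x[i]+x[i+1])/3

y

}

# second is a numeric time series that must already exist in the workspace
detrended <- second - predict(lm(second ~ I(1:length(second))))

ts.plot(detrended)

acf(detrended, type = "p")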

Time series modelling

par(mfrow= c(1,2))

acf(lynx,main="")

acf(lynx,type = "p",main="")

model10 <- arima(lynx,order=c(1,0,0))

model20 <- arima(lynx,order=c(2,0,0))

AIC(model10,model20)

24. Distance measuring for clustering in R

R <- 1:16

mat <- matrix(R,4,4)

mat

m <- mat

dist(m,method = "euclidean")

dist(m, method = "manhattan")

dist(m, method = "binary")


dist(m,method = "maximum")

dist(m,method = "canberra")

dist(m, method = "minkowski")
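dist() measures the distance between the rows of the matrix. As a check, the Euclidean, Manhattan and maximum distances between rows 1 and 2 can be computed by hand and compared with the dist() results; the variable names in this sketch are illustrative.

```r
m <- matrix(1:16, 4, 4)

r1 <- m[1, ]   # 1 5 9 13
r2 <- m[2, ]   # 2 6 10 14

# Distances between rows 1 and 2 computed by hand
euclid    <- sqrt(sum((r1 - r2)^2))
manhattan <- sum(abs(r1 - r2))
maximum   <- max(abs(r1 - r2))

# The same distances taken from dist()
d_euclid    <- as.matrix(dist(m, method = "euclidean"))[2, 1]
d_manhattan <- as.matrix(dist(m, method = "manhattan"))[2, 1]
d_maximum   <- as.matrix(dist(m, method = "maximum"))[2, 1]

print(c(euclid, d_euclid))
print(c(manhattan, d_manhattan))
print(c(maximum, d_maximum))
```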

25. rbind in Clustering

mtcars

ncol(mtcars)

[1] 11

nrow(mtcars)

[1] 32

x <- mtcars["Toyota Corolla",]

y <- mtcars["Toyota Corona",]

rbind(x,y)

dist(rbind(x,y))

z <- mtcars["Pontiac Firebird",]

dist(rbind(y,z))

dist(as.matrix(mtcars))

26. hclust in R

mt <- matrix(1:100, 10,10) # creating a 10 by 10 matrix

# calculating the Euclidean distance


ed <- dist(mt,method = "euclidean")

# apply hclust( ) for clustering

h1 <- hclust(ed)

h1

# plotting the clustering

plot(h1)

27. Implementation of K-means clustering in R

# Creating a matrix with 5 rows and 4 columns

m <- matrix(1:20,5,4)

# now apply the k-means algorithm

km <- kmeans(m,centers=3)

km

# plot the kmeans( ) function output

plot(m,col = km$cluster,

main = "K means output with 3 clusters",

pch= 20,cex= 4)

iris

head(iris)

newiris <- iris

head(newiris)

newiris$Species <- NULL


head(newiris)

km <- kmeans(newiris,3)

table(iris$Species,km$cluster)

plot(newiris[c("Sepal.Length", "Sepal.Width")],col = km$cluster)

points(km$centers[,c("Sepal.Length", "Sepal.Width")],col=1:3,pch =8,cex =2)
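kmeans() starts from randomly chosen centres, so the cluster numbers and sometimes the clusters themselves change from run to run. A sketch that fixes the seed and uses several random starts is given below; the seed value and nstart = 10 are arbitrary choices for illustration.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility

features <- iris[, 1:4]            # the four numeric measurements, without Species

# nstart = 10 tries ten random starts and keeps the best solution
km <- kmeans(features, centers = 3, nstart = 10)

# Cross-tabulate the known species against the clusters found
confusion <- table(iris$Species, km$cluster)
print(confusion)
print(km$size)                     # number of observations in each cluster
```

The cluster labels 1, 2 and 3 are arbitrary, so the table should be read row by row rather than along the diagonal.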

28. CURE algorithm

CURE (Clustering Using Representatives) is another large-scale clustering

algorithm.

Initialization of the CURE algorithm

The CURE algorithm follows the concept of Euclidean space.

29.Distance Measuring in Clustering

R <- 1:16

[1] 1 2 3 4 5 6…………16

mat <- matrix(R,4,4)

mat

m <- mat

# the Euclidean distance

dist(m, method = "euclidean")

# Manhattan distance

dist(m, method = "manhattan")

# the Jaccard method for binary data


dist(m, method = "binary")

# maximum distance

dist(m, method = "maximum")

# Canberra distance

dist(m, method = "canberra")

# Minkowski distance

dist(m, method = "minkowski")

30. eclat( ) function in association rules in clustering

apriori( ) function

The package arules provides a function apriori( ) that performs association rule mining using
the Apriori algorithm. The function determines the frequent itemsets, association rules and
association hyperedges.

eclat( ) function

support(x, transactions, type, ...)

R <- support(items(ap), TM, type = "absolute")

ri <- ruleInduction(ap, TM)

ri

inspect(head(ri, by = "lift"))

sample(x, size, replace, ...)

random.transactions(nItems, nTrans, method, ...)

rt <- random.transactions(nItems = 20, nTrans = 10, method = "independent")


rt
