DATA SCIENCE LAB MANUAL

1. Downloading, Installing and Setting the Path for R


R is a scripting programming language that provides an environment for statistical computing, data science and graphics.
R is an open-source, object-oriented programming language for statistical computing and data visualisation.
The integrated development suite for R can be downloaded from the Comprehensive R Archive Network (CRAN). The network includes mirror websites for downloading the suite in different countries.
To download R, users need to visit the CRAN mirror page and click on the URL of the chosen mirror, which redirects them to the respective site.
https://cran.r-project.org
R is offered as a precompiled binary distribution of the base system and contributed packages. Different distributions of R are available for different operating systems (OS) such as Windows, Mac and Linux.

Downloading R for Windows

Windows users first need to download and install the binaries for the base distribution. At the time of writing, the current version of the base binary distribution was R 3.3.1. Users can also check for and download previous versions of R, as well as Rtools, from the mirror website. Rtools is used for building R and its packages.

Installing R for Windows

Installing R on Windows is simple: users need to double-click the downloaded binary (for example, R-3.3.1-win.exe). Both graphical and command-line installation options are available on Windows.
The function .libPaths() can be used to get or set the path of the package library.

> .libPaths()

O/P: C:/R/R-3.4.3/library
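Calling .libPaths() with no argument only reads the library path. To set it, pass a character vector of directories. The sketch below prepends a hypothetical personal library folder (`mylib`, created under the session's temporary directory purely for illustration); in practice you would point it at a permanent folder.

```r
# Remember the current library paths so they can be restored later
old_paths <- .libPaths()

# Hypothetical personal library folder (temporary, for illustration only)
mylib <- file.path(tempdir(), "mylib")
dir.create(mylib, showWarnings = FALSE)

# Setting: .libPaths() keeps only directories that actually exist
.libPaths(c(mylib, old_paths))

# Getting: the new folder is now searched first when loading packages
print(.libPaths())
```

Calling `.libPaths(old_paths)` afterwards restores the original search order.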
2. R data types
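The basic (atomic) data types in R are logical (Boolean), integer, double (numeric), complex, character and raw. R has no separate float type; decimal numbers are stored as double.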

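# numeric (double) values
x = 5.6
print(class(x))      # "numeric"
print(typeof(x))     # "double"
y = 5
print(class(y))      # "numeric"
print(typeof(y))     # "double"
# integer values: use as.integer() or the L suffix
x = as.integer(5)
print(class(x))      # "integer"
print(typeof(x))     # "integer"
y = 5L
print(class(y))      # "integer"
print(typeof(y))     # "integer"
# logical (Boolean) values
x = 4
y = 3
z = x > y
print(z)             # TRUE
print(class(z))      # "logical"
print(typeof(z))     # "logical"
# complex values
x = 4 + 3i
print(class(x))      # "complex"
print(typeof(x))     # "complex"
# character values
char = "Magnet"
print(class(char))   # "character"
print(typeof(char))  # "character"
# raw values: charToRaw() converts a string to its bytes
raw_variable <- charToRaw("welcome to Programiz")
print(raw_variable)
print(class(raw_variable))    # "raw"
char_variable <- rawToChar(raw_variable)
print(char_variable)          # "welcome to Programiz"
print(class(char_variable))   # "character"
# mixing integer and double values in c() gives a double vector
dbl_var <- c(1L, 2.5, 4.5)
dbl_var              # 1.0 2.5 4.5
int_var <- c(1L, 6L, 10L)
int_var              # 1 6 10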
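As a compact check of the six atomic types listed above, the following sketch applies typeof() to one example value of each type. It then shows how c() coerces mixed values to the most flexible type, which is why dbl_var above became a double vector.

```r
# One example value of each atomic type
values <- list(TRUE, 5L, 5.6, 4 + 3i, "Magnet", charToRaw("A"))
types <- sapply(values, typeof)
print(types)   # "logical" "integer" "double" "complex" "character" "raw"

# c() coerces mixed values along: logical < integer < double < complex < character
print(typeof(c(TRUE, 2L)))   # "integer"
print(typeof(c(1L, 2.5)))    # "double"
print(typeof(c(1, "a")))     # "character"
```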

3. Program to make a simple calculator that can add, subtract, multiply and divide using functions
add <- function(x, y) {
  return(x + y)
}
subtract <- function(x, y) {
  return(x - y)
}
multiply <- function(x, y) {
  return(x * y)
}
divide <- function(x, y) {
  return(x / y)
}
# take input from the user
print("Select operation.")
print("1.Add")
print("2.Subtract")
print("3.Multiply")
print("4.Divide")
choice = as.integer(readline(prompt = "Enter choice[1/2/3/4]: "))
num1 = as.numeric(readline(prompt = "Enter first number: "))
num2 = as.numeric(readline(prompt = "Enter second number: "))
# switch() with an integer picks the argument at that position
operator <- switch(choice, "+", "-", "*", "/")
result <- switch(choice, add(num1, num2), subtract(num1, num2),
                 multiply(num1, num2), divide(num1, num2))
print(paste(num1, operator, num2, "=", result))

Output

[1] "Select operation."
[1] "1.Add"
[1] "2.Subtract"
[1] "3.Multiply"
[1] "4.Divide"
Enter choice[1/2/3/4]: 4
Enter first number: 20
Enter second number: 4
[1] "20 / 4 = 5"
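readline() only waits for input in an interactive session; when a script is run with Rscript it returns an empty string. The sketch below wraps the same switch() logic in a hypothetical helper, `calculate()`, that takes the choice and numbers as arguments. This makes the calculator usable non-interactively and easy to test.

```r
# Hypothetical helper: same switch() logic as the interactive program
calculate <- function(choice, num1, num2) {
  operators <- c("+", "-", "*", "/")
  if (!choice %in% 1:4) stop("choice must be 1, 2, 3 or 4")
  result <- switch(choice, num1 + num2, num1 - num2, num1 * num2, num1 / num2)
  paste(num1, operators[choice], num2, "=", result)
}

print(calculate(4, 20, 4))   # "20 / 4 = 5"
print(calculate(1, 2, 3))    # "2 + 3 = 5"
```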

2. Find the Rectangle area using R programming


RectangleHeight <- 2
RectangleWidth <- 4
RectangleArea <- RectangleHeight * RectangleWidth
RectangleHeight
[1] 2
RectangleWidth
[1] 4
RectangleArea
[1] 8

4. Demonstrate the process of creating a user-defined function in R.

R provides many built-in functions, such as min(), max(), summary(), mean(), aggregate(), tapply() and sapply(); some of these can be applied to grouped data. When no built-in function gives the required output, we can define our own function that takes an input vector and returns the result.
EX: R has no built-in function to calculate the statistical mode (the built-in mode() returns an object's storage mode), so a user-defined function is required. Here the input is a vector and the output is the mode value.
# Create the function
getmode <- function(y) {
  uniqy <- unique(y)
  uniqy[which.max(tabulate(match(y, uniqy)))]
}
v <- c(5,6,4,8,5,7,4,6,5,8,3,2,1)
# Calculate the mode with the user-defined function
resultmode <- getmode(v)
print(resultmode)   # 5
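To see how getmode() works, the sketch below runs its three steps separately on the same vector. match() gives each element's position in the list of unique values, and tabulate() counts those positions. which.max() then picks the most frequent value; on a tie it returns the first one.

```r
v <- c(5, 6, 4, 8, 5, 7, 4, 6, 5, 8, 3, 2, 1)
uniqv <- unique(v)                  # 5 6 4 8 7 3 2 1
idx <- match(v, uniqv)              # position of each element within uniqv
counts <- tabulate(idx)             # occurrences of each unique value
print(counts)                       # 3 2 2 2 1 1 1 1
print(uniqv[which.max(counts)])     # 5
```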

5. Program on R objects like list, data.frame and operations on datasets

# create a list with named elements
emp <- list(EmpName = "keir", EmpUnit = "lawyer", Empsal = 55000)
emp
# remove an element by assigning NULL
emp$EmpUnit <- NULL
emp
length(emp)
# add a new element
emp$EmpCity = "london"
emp
length(emp)
# a list of lists
emp1 <- list(EmpDesg = "prosecutor")
emp1
Emplist <- list(emp, emp1)
Emplist
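# list the built-in datasets
data()
# explore the built-in Orange dataset
names(Orange)
summary(Orange)
str(Orange)
Orange
head(Orange, n = 3)
tail(Orange, n = 3)
class(Orange)
dim(Orange)
table(Orange$age)
table(Orange$Tree)
table(Orange$circumference)
Orange[1]     # first column as a data frame
Orange[2]
Orange[, 3]   # third column as a vector
# read an external CSV file (Hardware.csv must be in the working directory)
TD <- read.csv("Hardware.csv")
TD
TD[1]
# select the first row of the built-in BOD dataset
BOD
SRow <- BOD[1, ]
SRow
str(SRow)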
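# two data frames whose key columns have different names
R <- data.frame(RN = c('A','B','C'), RM = c(10,20,30))
R
T <- data.frame(TN = c('A','B','C'), TM = c(10,20,30))   # note: this masks the shortcut T for TRUE
T
# the key columns must be named with by.x and by.y; without a common column
# name, merge() returns every combination of rows instead of a join
K <- merge(R, T, by.x = "RN", by.y = "TN")
K
B <- data.frame(BN = c('A','C'), BM = c(100,200))
B
# all.x and all.y take logical values (TRUE/FALSE)
E <- merge(B, T, by.x = "BN", by.y = "TN", all.x = TRUE)    # keep all rows of B
E
E <- merge(B, T, by.x = "BN", by.y = "TN", all.y = TRUE)    # keep all rows of T
E
E <- merge(T, B, by.x = "TN", by.y = "BN", all.x = TRUE)    # keep all rows of T
E
E <- merge(T, B, by.x = "TN", by.y = "BN", all.y = TRUE)    # keep all rows of B
E
E <- merge(B, T, by.x = "BN", by.y = "TN", all.x = FALSE)   # matching rows only
E
E <- merge(T, B, by.x = "TN", by.y = "BN", all.y = FALSE)   # matching rows only
E
rm(T)   # remove the data frame so that T means TRUE again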
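The merges above only show the difference between all.x and all.y when one table has keys the other lacks. The sketch below adds a hypothetical key 'D' that exists only in the second table and compares the four join types by their row counts. The names `Tdf` and `Bdf` are used instead of T and B so that T is not masked.

```r
Tdf <- data.frame(TN = c("A", "B", "C"), TM = c(10, 20, 30))
Bdf <- data.frame(BN = c("A", "C", "D"), BM = c(100, 200, 300))   # 'D' is hypothetical

inner <- merge(Bdf, Tdf, by.x = "BN", by.y = "TN")                # keys in both: A, C
left  <- merge(Bdf, Tdf, by.x = "BN", by.y = "TN", all.x = TRUE)  # all of Bdf: A, C, D
right <- merge(Bdf, Tdf, by.x = "BN", by.y = "TN", all.y = TRUE)  # all of Tdf: A, B, C
full  <- merge(Bdf, Tdf, by.x = "BN", by.y = "TN", all = TRUE)    # A, B, C, D
print(full)   # unmatched cells are filled with NA
```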
# list datasets and installed packages, and load packages
data()
library()
library(tools)
library(Matrix)
data(iris)
iris
# packages for reading web data: install once, then load
install.packages("rjson")
library(RCurl)   # RCurl must already be installed

EmpNo <- c(6, 11, 1966)
EmpName <- c("puthin", "jinping", "starmer")
ProjName <- c("R", "PL", "GBMKL")
Employee <- data.frame(EmpNo, EmpName, ProjName)
Employee
Employee[2]              # second column (as a data frame)
Employee[1:2]            # first two columns
Employee[3, ]            # third row
Employee[, 3]            # third column (as a vector)
row.names(Employee) <- c("Employee1", "Employee2", "Employee3")
row.names(Employee)
Employee
Employee["Employee1", ]  # a row selected by its name (note the comma)
Employee["Employee3", ]
Employee[c("Employee2", "Employee1"), ]
Employee[c("Employee1", "Employee2"), ]
Employee[["EmpName"]]    # a single column as a vector
Employee[c("EmpNo", "ProjName")]
6. Operations on different datasets and packages

View(mtcars)
ncol(mtcars)
.libPaths()
installed.packages()
packageDescription("stats")
help(package = "stats")
packageDescription("Matrix")   # package names are case-sensitive
help(package = "Matrix")
plot(trees, col = "red", pch = 33)
help(package = "datasets")
datasets::AirPassengers
library(datasets)
AirPassengers
packageDescription("parallel")
packageDescription("tools")
packageDescription("utils")
packageDescription("translations")
packageDescription("survival")
find.package("survival")

7. Find the Square area using R

Squarelength <- 4
Squarearea <- Squarelength * Squarelength
Squarelength
Squarearea
AirPassengers
length(AirPassengers)   # AirPassengers is a time series, so ncol() returns NULL
8. ncol, nrow, str, rnorm and summary functions on datasets
datasets::mtcars
ncol(mtcars)
nrow(mtcars)
mtcars
summary(mtcars)
str(mtcars)
str(str)            # str() also describes functions
str(ls)
rnorm(100, 2, 4)    # 100 random values with mean 2 and sd 4
rnorm(2, 3)         # 2 random values with mean 3 and sd 1
help(rnorm)
x <- rnorm(100, 2, 2)
x
summary(x)
str(x)

9. head, tail, edit and plot functions on datasets

data()
data(trees)
trees
head(trees, n = 7)
tail(trees, n = 2)
summary(trees)
View(trees)
edit(trees)
plot(trees)
plot(trees, col = "green")
plot(trees, col = "blue")
plot(trees, col = "pink")
plot(trees, col = "red")
dir()
list.files()
mtcars
summary(mtcars$mpg)
str(c(1,2,3,4,5,6))

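10. Arithmetic expressions in R
9 + 23        # 32
4 - 2         # 2
5 * 4         # 20
5 / 4         # 1.25
4^5           # 1024
4**5          # 1024 (** is accepted as ^)
23 %% 9       # 5 (remainder)
5 %/% 4       # 1 (integer division)
sqrt(9)       # 3
sqrt(225)     # 15
2 < 4         # TRUE
T == FALSE    # FALSE
F == FALSE    # TRUE
x <- c(1:5)
x
x[(x > 2) | (x < 5)]   # 1 2 3 4 5
x[(x > 2) & (x < 5)]   # 3 4
x > 2         # FALSE FALSE TRUE TRUE TRUE
x < 4         # TRUE TRUE TRUE FALSE FALSE
x == 3        # FALSE FALSE TRUE FALSE FALSE
x >= 3        # FALSE FALSE TRUE TRUE TRUE
x <= 2        # TRUE TRUE FALSE FALSE FALSE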
Installing the rjson and XML packages in R
install.packages("rjson")
install.packages("XML")
Titanic   # Titanic is a built-in dataset (a table), not a function
OPERATIONS ON SEQUENCE
a <- seq(5,11,by=2)
a
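11. MATRIX IN R
matrix(a, 2, 2)      # fill a 2 x 2 matrix column by column
matrix(a, 2, 1)      # uses only the first 2 values (recent R versions warn)
matrix(a, 1, 2)
dim(a) <- c(2, 2)    # turn the vector itself into a 2 x 2 matrix
a
x <- 6:11
x
mat <- matrix(x, 2, 3)
mat
mat <- matrix(x, 3, 2)
mat
mat <- matrix(x, 3, 3)   # only 6 values for 9 cells: R recycles them (recent versions warn)
mat
mat[2, 2]
mat[3, 3]
mat[2, 3]
mat[1, 1]
mat[3, 2]
mat[3, 1]
mat[2, 1]
mat[, 3]   # third column
mat[2, ]   # second row
mat[3, ]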
sin, cos and tan functions in R
sin(x)
cos(x)
tan(x)
sin(90)        # angles are in radians, so this is not sin of 90 degrees
cos(90)
tan(90)
1 / tan(90)    # base R has no cot(); use 1/tan()
c(1,2,6) + c(11,6,55)
x <- seq(1, 20, 0.1)
x
y <- sin(x)
y
plot(x, y)
x <- seq(1, 20, 5)
x
y <- cos(x)
y
plot(x, y)
x <- seq(0, 1, 0.1)
x
y <- cos(x)
y
plot(x, y)
x <- seq(1, 20, 0.1)
y <- cos(x)
y
plot(x, y)
reading .csv file in R
# the file must exist at the given path
InputData <- read.csv("D:/sampledata.csv")
InputData
read.table("D:/sampledata.csv", header = TRUE, sep = ",")
# save the workspace and quit
save.image("C:\\Anitha\\R.R.Rao 19-12-22")
q()
data()
x <- seq(10, 25, by = 5)
x
matrix(x, 2, 2)
dim(x)      # NULL: a plain vector has no dim attribute
edit(x)
12. Descriptive Statistics in R
mtcars
summary(mtcars)
min(mtcars)
max(mtcars)
range(mtcars)
colMeans(mtcars)       # mean() does not accept a data frame; use colMeans() or sapply()
BOD
sapply(BOD, mean)
sapply(mtcars, IQR)    # IQR of every column
x <- 1:6
x
summary(x)
min(x)
max(x)
range(x)
mean(x)
median(x)
mad(x)
IQR(x)
quantile(x)
# apply() needs a matrix (or array), not a plain vector
m <- matrix(x, 3, 2)
m
apply(m, 1, mean)      # row means
a <- seq(10, 25, by = 5)
a
matrix(a, 2, 2)
x <- c(6, 11, 4)
median.result <- median(x)
print(median.result)   # 6
x <- c(-6, 11, 4)
median.result <- median(x)
print(median.result)   # 4
numbers <- c(6, 11, 4)
median(numbers)
barplot(numbers)
abline(h = median(numbers))
mean(numbers)
deviation <- sd(numbers)
deviation
barplot and abline functions in R
barplot(numbers)
abline(h = sd(numbers))
abline(h = sd(numbers) + mean(numbers))
13. Apply, bins, median, sd, histogram, barplot and abline functions in R

h <- c(1, 2, 3)
bins <- c(0, 5, 10, 15)   # break points of the histogram bins
bins
hist(h, xlab = "Values", ylab = "Frequency", col = "red", xlim = c(0, 15),
     ylim = c(0, 3), breaks = bins)
EmpNo <- c(6, 11, 1966)
EmpName <- c("RAM", "RAHIM", "ROBERT")
ProjName <- c("R", "PL", "GBMKL")
Employee <- data.frame(EmpNo, EmpName, ProjName)
row.names(Employee) <- c("Employee1", "Employee2", "Employee3")
Employee
Employee[c("EmpNo", "ProjName")]
# add a new column
Employee$EmpExpYears <- c(6, 11, 4)
Employee
# sort rows by experience, ascending and descending
Employee[order(Employee$EmpExpYears), ]
Employee[order(-Employee$EmpExpYears), ]
dim(Employee)
nrow(Employee)
names(Employee)
edit(Employee)
Employee[1:3, ]
Employee[1:3, 1:2]
head(Employee)
# select rows (and columns) that meet a condition
subset(Employee, EmpExpYears > 6)
subset(Employee, EmpExpYears > 4, select = c(EmpNo))
subset(Employee, EmpName == "RAM")
subset(Employee, EmpNo == 6)
subset(Employee, EmpNo == 6 | EmpNo == 11)
reading txt file in R
# read a tab-separated text file whose first line holds the column names
read.table("d:/sep.txt", sep = "\t", header = TRUE)
mydata <- read.table("d:/mydata.txt", sep = "\t", header = TRUE)
mydata
# read.table() reads plain text only; a .docx file must first be saved as .txt

14. sapply, min and max functions in R

Employee
summary(Employee[4])
min(Employee[4])
max(Employee[4])
range(Employee[4])
mean(Employee[, 4])
median(Employee[, 4])
mad(Employee[, 4])
IQR(Employee[, 4])
quantile(Employee[, 4])
sapply(Employee[4], mean)
sapply(Employee[4], min)
sapply(Employee[4], range)
sapply(Employee[4], quantile)
which.min(Employee$EmpExpYears)
which.max(Employee$EmpExpYears)
read.table("D:/sep.txt", header = TRUE, sep = ",")
library(plyr)

missing value functions in R

x <- c(6, 11, 5, NA, 55, 3)
y <- c("red", "saffron", NA, "NA")   # the string "NA" is not a missing value
is.na(x)
is.na(y)
c <- as.data.frame(matrix(c(1:11, NA), ncol = 3))
c
na.omit(c)       # drop rows that contain NA
na.exclude(c)
na.pass(c)       # return the data unchanged
na.fail(c)       # stops with an error because c contains NA
sum(is.na(c))    # total number of missing values
rowSums(is.na(c))                 # missing values per row
rowMeans(is.na(c)) * length(c)    # same result: proportion times number of columns
anyNA(c(-11, NaN, 11))
anyNA(c(-11, NA, 11))
anyNA(c(-11, 11))
any(is.na(c(-6, NA, 6)))
is.na(c(-6, 6))
any(is.na(c(-6, 6)))
any(is.na(c(-6, NaN, 6)))
is.finite(c(-4, Inf, 4))
is.infinite(c(-4, Inf, 4))
is.finite(c(-11, -Inf, 11))
is.nan(c(-11, Inf, 11))
is.nan(c(-4, Inf, NaN))
Employee
summary(Employee$EmpExpYears)
summary(Employee$EmpNo)
duration = Employee$EmpExpYears
max(duration) - min(duration)    # spread of experience
save.image("C:\\Users\\R\\Documents\\.RData")
mtcars
head(subset(mtcars, select = 'gear'))
factor(mtcars$gear)
w = table(mtcars$gear)           # frequency of each gear value
w
t = as.data.frame(w)
t
names(t)[1] = 'gear'
names(t)[2] = 'count'
t
w
save.image("C:\\Users\\R\\OneDrive\\Documents\\.RData")
cbind(w)
# user-defined function to find the mode
getmode <- function(y) {
  uniqy <- unique(y)
  uniqy[which.max(tabulate(match(y, uniqy)))]
}
v <- c(5,6,4,8,5,7,4,6,5,8,3,2,1)
resultmode <- getmode(v)
print(resultmode)   # 5
# the same function works for character vectors
charv <- c("Rt","cc","cc","bm","cc")
resultmode <- getmode(charv)
print(resultmode)   # "cc"
save.image("C:\\Users\\R\\OneDrive\\Documents\\.RData")
x <- c(15,54,6.5,9.2,36,5.3,8,-7,-5)
result.mean <- mean(x)
print(result.mean)
x <- c(1,2,3,4,5,6)
result.mean <- mean(x)
print(result.mean)
# trim = 0.1 drops the lowest and highest 10% of values (rounded down, so none of 6 values)
result.mean <- mean(x, trim = 0.1)
print(result.mean)
x <- c(6,11,4,NA)
result.mean <- mean(x)                 # NA, because x contains a missing value
print(result.mean)
result.mean <- mean(x, na.rm = TRUE)   # ignore the NA
print(result.mean)
numbers <- c(6,11,4)
mean(numbers)
barplot(numbers)
abline(h = mean(numbers))
h <- density(c(6,11,4))
plot(h, xlab = "Values", ylab = "Density")
h <- c(6,11,4)

15. LINEAR RELATIONSHIP IN R

barplot(h, xlab = "categories", ylab = "Values", col = "red")
# save a horizontal bar chart to a PNG file
png(file = "samplebarchart.png")
barplot(h, horiz = TRUE)
dev.off()
barplot(h, xlab = "Values", ylab = "Categories", col = "red", horiz = TRUE)
colors <- c("red", "yellow", "blue")
months <- c("May", "Jan", "Nov")
regions <- c("ENG", "AUS", "NZ")
Values <- matrix(c(2,9,3,11,9,4,8,7,3), nrow = 3, ncol = 3, byrow = TRUE)
rownames(Values) <- regions
colnames(Values) <- months
Values
barplot(Values, col = colors, width = 2, beside = TRUE, names.arg = months,
        main = "total revenue 2022 by month")
days <- c("tue", "wed")
months <- c("May", "Nov")
colors <- c("red", "blue")
val <- matrix(c(6,11,4,12,3,36), nrow = 3, ncol = 2, byrow = TRUE)
val
barplot(val, main = "total", names.arg = months, xlab = "Months", ylab = "Days", col = colors)
legend("topleft", days, cex = 1.3, fill = colors)
Employee
cor(Employee$EmpNo, Employee$EmpExpYears)
16. LINEAR REGRESSION MODELS
# least-squares estimates computed by hand
x <- Employee$EmpNo
y <- Employee$EmpExpYears
n <- nrow(Employee)
xmean <- mean(x)
ymean <- mean(y)
xiyi <- x * y
numerator <- sum(xiyi) - n * xmean * ymean
denominator <- sum(x^2) - n * (xmean^2)
b1 <- numerator / denominator   # slope
b0 <- ymean - b1 * xmean        # intercept
b1
b0
xmean
ymean
# the same model fitted with lm()
model_R <- lm(Employee$EmpExpYears ~ Employee$EmpNo)
model_R
summary(model_R)
plot(Employee$EmpNo, Employee$EmpExpYears, col = "red", main = "Linear Regression",
     cex = 1.3, pch = 16, xlab = "No of Emp", ylab = "Employee$EmpExpYears")
abline(model_R)
17. Linear Model function in R programming
x <- c(160, 180, 200)
y <- c(80, 90, 100)
relation <- lm(y ~ x)
print(relation)
print(summary(relation))   # warns of an essentially perfect fit for this data

predict( ) function in R

a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)
 1
85

plot(y, x, col = "blue", main = "Height & Weight Regression",
     cex = 1.3, pch = 16, xlab = "Weight in Kg", ylab = "height in cm")
abline(lm(x ~ y))

18. Scatter plot and box plot in R using the cars dataset

cars
head(cars)
scatter.smooth(x = cars$speed, y = cars$dist, main = "Dist ~ Speed")

Implementing box plot

par(mfrow = c(1, 2))  # divide the graph area into 2 columns
# box plot for speed; boxplot.stats()$out holds the outlying values
boxplot(cars$speed, main = "Speed",
        sub = paste("Outliers:", paste(boxplot.stats(cars$speed)$out, collapse = ", ")))
# box plot for distance
boxplot(cars$dist, main = "Distance",
        sub = paste("Outliers:", paste(boxplot.stats(cars$dist)$out, collapse = ", ")))

beta co-efficients in R

linearMod <- lm(dist ~ speed, data = cars)
print(linearMod)
print(summary(linearMod))
plot(cars$dist, cars$speed, col = "red", main = "Speed & Distance Regression",
     cex = 1.3, pch = 16, xlab = "Distance", ylab = "Speed")
abline(lm(cars$speed ~ cars$dist))
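The beta coefficients reported by lm() can be reproduced directly. The sketch below computes them for dist ~ speed in two ways. The first uses the simple-regression formulas b1 = cov(x, y) / var(x) and b0 = mean(y) - b1 * mean(x). The second uses the matrix form beta = (X'X)^(-1) X'y, which extends to several predictors.

```r
fit <- lm(dist ~ speed, data = cars)

# Simple-regression formulas
b1 <- cov(cars$speed, cars$dist) / var(cars$speed)
b0 <- mean(cars$dist) - b1 * mean(cars$speed)

# Matrix form: solve (X'X) beta = X'y, with a column of 1s for the intercept
X <- cbind(1, cars$speed)
beta <- solve(t(X) %*% X, t(X) %*% cars$dist)

print(c(b0 = b0, b1 = b1))
print(as.vector(beta))
print(coef(fit))
```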

19. Binary Logistic Regression: glm( ) function using R

E <- factor(c("NA", "C", "N"))
res <- cbind(clear = c(6, 5, 10), "not clear" = c(9, 10, 0))
res
     clear not clear
[1,]     6         9
[2,]     5        10
[3,]    10         0
EN = (E == "N")
EC = (E == "C")
# a two-column response gives (successes, failures) for each group
b1 <- glm(res ~ EC + EN, family = binomial("logit"))
b1
summary(b1)
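The model above uses grouped counts. When each row is a single 0/1 outcome, glm() takes the binary response directly. The sketch below uses the built-in mtcars data and models am (0 = automatic, 1 = manual) against weight wt; this choice of variables is ours, for illustration only. It also checks that the fitted probabilities equal the inverse logit of the linear predictor.

```r
fit <- glm(am ~ wt, data = mtcars, family = binomial("logit"))
print(coef(fit))

eta <- predict(fit, type = "link")       # linear predictor b0 + b1 * wt
p <- predict(fit, type = "response")     # fitted probabilities
# the logit link means p = 1 / (1 + exp(-eta))
print(head(cbind(eta, p, by_hand = 1 / (1 + exp(-eta)))))
```

A negative coefficient for wt means heavier cars are less likely to have a manual gearbox.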

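20. mle( ) function implementation in R

library(stats4)   # mle() is in the stats4 package
# f is treated as the negative log-likelihood to be minimised
f <- function(x) sum((x - 1)^2)
m <- mle(f, start = list(x = 10))
m
summary(m)

optim( ) function in R

f <- function(x) sum((x - 1)^2)
optim(c(10, 10), f)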
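The function f above is only a toy objective. A more typical use of optim() for maximum likelihood is sketched below. It minimises the negative log-likelihood of a normal model on simulated (hypothetical) data and compares the result with the closed-form estimates, the sample mean and the standard deviation with divisor n. Optimising log(sigma) instead of sigma is a design choice that keeps the standard deviation positive.

```r
set.seed(42)
x <- rnorm(200, mean = 5, sd = 2)   # simulated sample (hypothetical data)

# negative log-likelihood of a normal model; par = c(mu, log(sigma))
negloglik <- function(par) {
  mu <- par[1]
  sigma <- exp(par[2])   # optimising log(sigma) keeps sigma positive
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

fit <- optim(c(0, 0), negloglik, method = "BFGS")
mu_hat <- fit$par[1]
sigma_hat <- exp(fit$par[2])

# closed-form MLEs for comparison (divisor n, not n - 1)
print(c(mu_hat = mu_hat, sigma_hat = sigma_hat))
print(c(mean = mean(x), sd_mle = sqrt(mean((x - mean(x))^2))))
```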

21. nlm( ) function implementation in R

f <- function(n) sum((n - 1)^2)
nlm(f, c(10, 10))
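nlm() can also pass extra arguments to the objective through its `...` argument, and it can use an analytic gradient attached as the "gradient" attribute instead of finite differences. The sketch below follows the pattern of the ?nlm help page, with a target point a = c(3, 5) chosen for illustration.

```r
# f takes the parameter vector x plus an extra argument a
f <- function(x, a) {
  value <- sum((x - a)^2)
  # optional analytic gradient, used by nlm() instead of finite differences
  attr(value, "gradient") <- 2 * (x - a)
  value
}

res <- nlm(f, p = c(10, 10), a = c(3, 5))   # a is passed on to f through ...
print(res$estimate)   # close to c(3, 5)
print(res$minimum)    # close to 0
print(res$code)       # 1 or 2 indicates probable convergence
```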

Tree Models

install.packages("tree")
library(tree)
names(cars)
model <- tree(dist ~ speed, data = cars)   # regression tree of stopping distance on speed
plot(model)
text(model)
22. Autocorrelation function and partial autocorrelation implementation in R

# random walk: Y[i] = Y[i-1] + Z[i]
Z <- rnorm(250, 0, 2)
Y <- numeric(250)
Y[1] <- Z[1]
for (i in 2:250) Y[i] <- Y[i - 1] + Z[i]
plot.ts(Y)
acf(Y)
# white noise for comparison
W <- rnorm(250, 0, 2)
par(mfrow = c(1, 2))
plot.ts(W)
acf(W)
# partial autocorrelation (type = "p") of both series
par(mfrow = c(1, 2))
acf(Y, type = "p")
acf(W, type = "p")
par(mfrow = c(1, 1))
# cross-correlations between the two series
acf(cbind(Y, W))
acf(cbind(Y, W), type = "p")
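acf() computes the lag-k autocorrelation as the sum of products of deviations from the mean k steps apart, divided by the total sum of squared deviations. The sketch below applies that formula by hand to a simulated random walk (built with cumsum(), which is equivalent to the loop above) and compares the result with acf(..., plot = FALSE).

```r
set.seed(1)
z <- rnorm(250, 0, 2)
w <- cumsum(z)   # random walk: w[i] = w[i-1] + z[i]

manual_acf <- function(x, k) {
  n <- length(x)
  xc <- x - mean(x)
  sum(xc[1:(n - k)] * xc[(1 + k):n]) / sum(xc^2)
}

manual <- sapply(1:5, function(k) manual_acf(w, k))
builtin <- acf(w, lag.max = 5, plot = FALSE)$acf[2:6]   # element 1 is lag 0
print(round(rbind(manual, builtin), 4))
```

For a random walk these values decay slowly from close to 1, which is the pattern seen in the acf(Y) plot.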
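23. Detrending a time series in R programming

# ma3(): 3-point moving average (body completed here; only its first line survived)
ma3 <- function(x) {
  y <- numeric(length(x) - 2)
  for (i in 2:(length(x) - 1)) {
    y[i - 1] <- (x[i - 1] + x[i] + x[i + 1]) / 3
  }
  y
}
# remove a linear trend; 'second' is a numeric series that is not defined in this manual
detrended <- second - predict(lm(second ~ I(1:length(second))))
ts.plot(detrended)
acf(detrended, type = "p")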
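Since `second` is not defined above, the sketch below applies both ideas to a hypothetical simulated series: a straight-line trend plus noise. It computes a 3-point centred moving average without a loop (checked against stats::filter()), then removes the linear trend by subtracting the fitted line. This is the same construction as `detrended` above. After detrending, the series has no remaining linear slope.

```r
set.seed(2)
n <- 100
tt <- 1:n
series <- 0.5 * tt + rnorm(n)   # hypothetical series: linear trend plus noise

# 3-point centred moving average without a loop
ma3_vec <- function(x) {
  m <- length(x)
  (x[1:(m - 2)] + x[2:(m - 1)] + x[3:m]) / 3
}
smooth <- ma3_vec(series)

# linear detrending: subtract the fitted straight line
detrended <- series - predict(lm(series ~ tt))
ts.plot(detrended)
acf(detrended, type = "partial")
```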

Time series modelling

par(mfrow = c(1, 2))
acf(lynx, main = "")
acf(lynx, type = "p", main = "")
model10 <- arima(lynx, order = c(1, 0, 0))   # AR(1) model
model20 <- arima(lynx, order = c(2, 0, 0))   # AR(2) model
AIC(model10, model20)                        # the lower AIC indicates the better model

24. Distance measuring in R (clustering)

R <- 1:16
mat <- matrix(R, 4, 4)
mat
m <- mat
dist(m, method = "euclidean")
dist(m, method = "manhattan")
dist(m, method = "binary")
dist(m, method = "maximum")
dist(m, method = "canberra")
dist(m, method = "minkowski")
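To show what each dist() method computes, the sketch below evaluates the formulas by hand for the first two rows of the same 4 x 4 matrix and compares them with dist(). The Minkowski power p = 3 is an arbitrary choice here; dist() uses p = 2 by default, which equals the Euclidean distance.

```r
m <- matrix(1:16, 4, 4)
p <- m[1, ]   # 1 5 9 13
q <- m[2, ]   # 2 6 10 14

euclidean  <- sqrt(sum((p - q)^2))            # 2
manhattan  <- sum(abs(p - q))                 # 4
maximum    <- max(abs(p - q))                 # 1
canberra   <- sum(abs(p - q) / abs(p + q))
minkowski3 <- sum(abs(p - q)^3)^(1 / 3)       # Minkowski distance with p = 3

print(c(euclidean = euclidean, manhattan = manhattan, maximum = maximum,
        canberra = canberra, minkowski3 = minkowski3))
```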

25. rbind in Clustering

mtcars
ncol(mtcars)
[1] 11
nrow(mtcars)
[1] 32
x <- mtcars["Toyota Corolla", ]
y <- mtcars["Toyota Corona", ]
rbind(x, y)
dist(rbind(x, y))
z <- mtcars["Pontiac Firebird", ]
dist(rbind(y, z))
dist(as.matrix(mtcars))

26. hclust in R

mt <- matrix(1:100, 10, 10)   # creating a 10 x 10 matrix
# calculating Euclidean distances
ed <- dist(mt, method = "euclidean")
# apply hclust( ) for clustering
h1 <- hclust(ed)
h1
# plotting the clustering dendrogram
plot(h1)
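A dendrogram on its own does not assign points to clusters; cutree() cuts it into a chosen number of groups. The sketch below applies this to the built-in USArrests data, scaled so that each variable carries equal weight. It uses complete linkage (the hclust() default) and k = 3 groups; the number of groups is an arbitrary choice for illustration.

```r
d <- dist(scale(USArrests), method = "euclidean")
hc <- hclust(d, method = "complete")

groups <- cutree(hc, k = 3)   # cluster number (1, 2 or 3) for each state
print(table(groups))

plot(hc, cex = 0.6)
rect.hclust(hc, k = 3, border = "red")   # outline the 3 groups on the dendrogram
```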

27. Implementation of K-means clustering in R.

# creating a matrix (5 points with 4 coordinates each)
m <- matrix(1:20, 5, 4)
# now apply the k-means algorithm
km <- kmeans(m, centers = 3)
km
# plot the kmeans( ) output
plot(m, col = km$cluster,
     main = "K means output with 3 clusters",
     pch = 20, cex = 4)

iris
head(iris)
newiris <- iris
head(newiris)
newiris$Species <- NULL     # drop the class label before clustering
head(newiris)
km <- kmeans(newiris, 3)
table(iris$Species, km$cluster)
plot(newiris[c("Sepal.Length", "Sepal.Width")], col = km$cluster)
points(km$centers[, c("Sepal.Length", "Sepal.Width")], col = 1:3, pch = 8, cex = 2)
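kmeans() hides the iterations. The sketch below writes out the basic two-step loop (Lloyd's algorithm): assign each point to its nearest centre, then move each centre to the mean of its points, until the centres stop changing. The function name `simple_kmeans` is hypothetical. Its initial centres are supplied by the caller, and it is not the Hartigan-Wong variant that kmeans() uses by default.

```r
simple_kmeans <- function(X, centers, max_iter = 20) {
  X <- as.matrix(X)
  k <- nrow(centers)
  for (iter in seq_len(max_iter)) {
    # squared Euclidean distance: rows are points, columns are centres
    d <- sapply(seq_len(k), function(j) colSums((t(X) - centers[j, ])^2))
    cluster <- max.col(-d, ties.method = "first")   # index of the nearest centre
    new_centers <- centers
    for (j in seq_len(k)) {
      if (any(cluster == j)) {   # keep the old centre if a cluster is empty
        new_centers[j, ] <- colMeans(X[cluster == j, , drop = FALSE])
      }
    }
    if (all(abs(new_centers - centers) < 1e-9)) break   # converged
    centers <- new_centers
  }
  list(cluster = cluster, centers = centers)
}

# two well-separated groups of 20 points each (simulated data)
set.seed(3)
X <- rbind(matrix(rnorm(40, 0, 0.1), ncol = 2),
           matrix(rnorm(40, 10, 0.1), ncol = 2))
res <- simple_kmeans(X, centers = X[c(1, 40), ])
print(table(res$cluster))
```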

28. CURE algorithm

CURE (Clustering Using REpresentatives) is another large-scale clustering algorithm.

Initialization of CURE algorithm

The CURE algorithm assumes a Euclidean space: each cluster is represented by a fixed number of well-scattered points, which are then shrunk toward the cluster centroid.
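The step that gives CURE its name is choosing representative points for a cluster. The sketch below covers only that step, under stated assumptions. The function `cure_representatives` and its parameters `n_rep` (number of representatives) and `alpha` (shrink fraction) are hypothetical names. The farthest-point rule picks well-scattered points, and each one is moved a fraction alpha toward the centroid. The full algorithm then repeatedly merges the two clusters whose representatives are closest, which is not shown here.

```r
cure_representatives <- function(points, n_rep = 4, alpha = 0.2) {
  points <- as.matrix(points)
  centroid <- colMeans(points)
  # distance from every point to its nearest row of 'ref'
  nearest_dist <- function(ref) {
    apply(points, 1, function(p) min(sqrt(colSums((t(ref) - p)^2))))
  }
  # first representative: the point farthest from the centroid
  chosen <- which.max(nearest_dist(matrix(centroid, nrow = 1)))
  # next representatives: the point farthest from those already chosen
  while (length(chosen) < min(n_rep, nrow(points))) {
    chosen <- c(chosen, which.max(nearest_dist(points[chosen, , drop = FALSE])))
  }
  reps <- points[chosen, , drop = FALSE]
  # shrink each representative toward the centroid by the fraction alpha
  centre_mat <- matrix(centroid, nrow(reps), ncol(reps), byrow = TRUE)
  reps + alpha * (centre_mat - reps)
}

# corners of a square plus its centre; alpha = 0.5 halves each corner's distance
pts <- rbind(c(0, 0), c(0, 2), c(2, 0), c(2, 2), c(1, 1))
reps <- cure_representatives(pts, n_rep = 4, alpha = 0.5)
print(reps)
```

Shrinking makes the representatives less sensitive to outliers on the edge of a cluster, while using several points (rather than one centroid) lets CURE follow non-spherical cluster shapes.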

29. Distance Measuring in Clustering

R <- 1:16
R
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
mat <- matrix(R, 4, 4)
mat
m <- mat
# Euclidean distance
dist(m, method = "euclidean")
# Manhattan distance
dist(m, method = "manhattan")
# binary (Jaccard) distance for binary data
dist(m, method = "binary")
# maximum distance
dist(m, method = "maximum")
# Canberra distance
dist(m, method = "canberra")
# Minkowski distance
dist(m, method = "minkowski")

30. eclat( ) function in association rules

apriori( ) function

The package arules provides a function apriori( ) that performs association rule mining using the Apriori algorithm. The function can mine frequent itemsets, association rules and association hyperedges.
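Both apriori() and eclat() are built on the support of an itemset (the fraction of transactions that contain all its items) and the confidence of a rule. The base-R sketch below computes these quantities without arules, on a small hypothetical set of five shopping baskets. It also applies the Apriori idea of building candidate pairs only from items that are already frequent. The helper names `item_support` and `rule_confidence` are hypothetical and are not arules functions.

```r
# hypothetical transactions (shopping baskets)
trans <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("milk", "butter"),
  c("bread", "milk", "butter"),
  c("milk")
)
items <- sort(unique(unlist(trans)))
# logical transaction-by-item matrix
tm <- t(sapply(trans, function(tr) items %in% tr))
colnames(tm) <- items

# support: fraction of transactions containing every item of the itemset
item_support <- function(itemset) {
  mean(rowSums(tm[, itemset, drop = FALSE]) == length(itemset))
}
# confidence of the rule lhs => rhs
rule_confidence <- function(lhs, rhs) item_support(c(lhs, rhs)) / item_support(lhs)

# Apriori idea: only combine items that are frequent on their own (min support 0.4)
frequent_items <- items[sapply(items, item_support) >= 0.4]
item_pairs <- combn(frequent_items, 2, simplify = FALSE)
pair_support <- sapply(item_pairs, item_support)
names(pair_support) <- sapply(item_pairs, paste, collapse = ",")
print(pair_support)
print(rule_confidence("bread", "milk"))   # 0.4 / 0.6
```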

eclat( ) function

support(x, transactions, type, ...)
# 'ap' (itemsets mined earlier) and 'TM' (a transactions object) are not defined in this manual
R <- support(items(ap), TM, type = "absolute")
ri <- ruleInduction(ap, TM)
ri
inspect(head(ri, by = "lift"))
sample(x, size, replace, ...)
random.transactions(nItems, nTrans, method, ...)
rt <- random.transactions(nItems = 20, nTrans = 10, method = "independent")
rt
