Ds

The document outlines various statistical analyses and data manipulation techniques using R, including correlation tests, t-tests, ANOVA, decision trees, and clustering methods. It also includes practical applications of time-series forecasting and principal component analysis on different datasets. Additionally, there are examples of MongoDB queries for data retrieval and manipulation.

Uploaded by

sefami1889
# accuracy: share of predictions that match the true class
# (biopsy1 is not defined anywhere in this listing)
mean(biopsy1$predict==biopsy1$class)

Aim : Practical of Decision Tree
# Regression tree
data<-read.csv("Hitters.csv",sep = ",",header = T)
View(data)
str(data)
summary(data)
names(data)
library(rpart)
regtree<-rpart(Salary~Hits+Runs+Years,data=data)
regtree
plot(regtree)
text(regtree)
install.packages("rpart.plot")
library(rpart.plot)
rpart.plot(regtree)
View(regtree)
# cp - complexity parameter
regtree$cptable
# prune at the cp with the lowest cross-validated error (xerror)
cp=regtree$cptable[which.min(regtree$cptable[,"xerror"]),"CP"]
pr=prune(regtree,cp=cp)
rpart.plot(pr)
# Classification tree
library("MASS")
data("biopsy")
View(biopsy)
str(biopsy)
names(biopsy)
summary(biopsy)
biopsy$ID=NULL
classtree<-rpart(class~.,data=biopsy)
rpart.plot(classtree)
biopsy$pred=predict(classtree,biopsy,type = "class")
table(biopsy$pred,biopsy$class)
install.packages("titanic")
library("titanic")
data("titanic_train")
str(titanic_train)
View(titanic_train)
titanic_train$Name=NULL
# Survived is coded 0/1, so method = "class" is needed to get a classification tree
titanictree<-rpart(Survived~Pclass+Age+Parch,data = titanic_train,method = "class")
rpart.plot(titanictree)
# Classification tree
golf<-read.csv("Golf.csv",sep = ',',header = T)
View(golf)
str(golf)
names(golf)
library("rpart")
install.packages("rpart.plot")
library("rpart.plot")
tree<-rpart(Play~.,data=golf,control = rpart.control(minsplit = 1,minbucket = 1,cp=0))
rpart.plot(tree)

Aim : Practical of Hypothesis Testing
# one sample t-test
data<-read.csv("onesample.csv",sep = ",",header = T)
View(data)
str(data)
summary(data)
boxplot(data)
t.test(data$Time,mu=80,alternative="greater")
# two sample t-test
my_data<-read.csv("twosample.csv",sep = ",",header = T)
View(my_data)
str(my_data)
summary(my_data)
boxplot(my_data)
var.test(my_data$time_g1,my_data$time_g2,alternative="two.sided")
t.test(my_data$time_g1,my_data$time_g2,alternative="two.sided")
# paired t-test
time<-read.csv("paired_t_test.csv",sep = ",",header = T)
View(time)
str(time)
summary(time)
boxplot(time)
t.test(time$time_before,time$time_after,alternative="greater",paired = T)
# correlation
cor<-read.csv("correlation.csv",sep = ",",header = T)
View(cor)
str(cor)
summary(cor)
cor.test(cor$aptitude,cor$job_prof,alternative = "two.sided",method = "pearson")
# paired t-test application
stud<-read.csv("student.csv",sep = ",",header = T)
View(stud)
str(stud)
summary(stud)
boxplot(stud)
t.test(stud$Test1,stud$Test2,alternative="less",paired = T)
# correlation - Ice cream
ice<-read.csv("icecream.csv",sep = ",",header = T)
View(ice)
str(ice)
summary(ice)
boxplot(ice)
cor.test(ice$Total.sales,ice$Temp,alternative = "two.sided",method = "pearson")

Aim : Practical of Analysis of Variance
# one-way anova test
data1<-read.csv("one-way-anova.csv",sep = ",", header = T)
names(data1)
str(data1)
data1$dept<-as.factor(data1$dept)
str(data1)
summary(data1)
View(data1)
head(data1)
anv1<-aov(formula=satindex~dept,data=data1)
summary(anv1)

# two-way anova test
data2<-read.csv("crop-data.csv",sep = ",",header = T)
names(data2)
str(data2)
data2$density<-as.factor(data2$density)
# assumption: block and fertilizer are also stored as numeric codes, so convert them too
data2$block<-as.factor(data2$block)
data2$fertilizer<-as.factor(data2$fertilizer)
str(data2)
summary(data2)
head(data2)
View(data2)
anv2<-aov(formula=yield~density+block+fertilizer,data=data2)
summary(anv2)

mydata<-read.csv("newsadv.csv")
View(mydata)
names(mydata)
anv<-aov(formula=Count~Day+Section,data=mydata)
summary(anv)

Aim : Practical of Clustering
# K-means clustering on IRIS dataset
data("iris")
names(iris)
newdata<-iris[,-5]
head(newdata)
# k-means starts from random centres; fixing the seed makes the clusters repeatable
set.seed(123)
fit<-kmeans(newdata,3)
library(cluster)
clusplot(newdata,fit$cluster,color=T,shade=T,labels=2,lines=0)
fit
fit$size
dim(newdata)
# Hierarchical clustering on IRIS dataset
# dist function is used to compute the distance matrix
# i.e. Euclidean distance between every pair of observations
clust<-hclust(dist(iris[,3:4]))
plot(clust)
clusterCut<-cutree(clust,3)
table(clusterCut,iris$Species)
clust<-hclust(dist(iris[,3:4]),method = "average")
plot(clust)
clusterCut<-cutree(clust,3)
table(clusterCut,iris$Species)

Aim : Practical of Time-Series Forecasting
# Time Series Analysis and Forecasting on AirPassengers
install.packages("forecast")
library(forecast)
data("AirPassengers")
class(AirPassengers)
head(AirPassengers)
sum(is.na(AirPassengers))
summary(AirPassengers)
plot(AirPassengers)
tsdata<-ts(AirPassengers,frequency = 12)
ddata<-decompose(tsdata)
plot(ddata)
holt<-HoltWinters(tsdata,beta = FALSE,gamma = FALSE)
plot(holt)
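
The Holt-Winters fit above switches off the trend and seasonal terms (beta = FALSE, gamma = FALSE), which leaves simple exponential smoothing and produces no forecast. As a minimal sketch, assuming a 12-month horizon and letting HoltWinters() estimate all three smoothing parameters, a forecast could be obtained with base R alone. The names hw and pred are illustrative and do not appear in the original listing.

```r
# Sketch only: Holt-Winters with trend and seasonality, then a 12-month forecast
data("AirPassengers")

# leaving alpha, beta and gamma unset lets HoltWinters() estimate them
hw <- HoltWinters(AirPassengers)

# predict() on a HoltWinters fit returns a ts matrix with columns fit, upr, lwr
pred <- predict(hw, n.ahead = 12, prediction.interval = TRUE)

# observed series, fitted values and the forecast with its interval
plot(hw, pred)
```

The 12-month horizon is an assumption chosen to match the monthly frequency; any other n.ahead works the same way.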
# Time Series Analysis on Rainfall dataset
rainfall<-read.csv("rainfall.csv",sep = ",",header = T)
head(rainfall)
summary(rainfall)
class(rainfall)
tsdata<-ts(rainfall,frequency = 12,start = c(2012,1))
class(tsdata)
plot(tsdata)
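
The start = c(2012,1) and frequency = 12 arguments tell ts() that the first observation is January 2012 and that there are twelve observations per year. The sketch below shows how that indexing behaves on a single numeric vector. The values in rain_mm are randomly generated placeholders, not data from rainfall.csv, and the names rain_mm, rain_ts and y2013 are hypothetical.

```r
# Sketch only: placeholder values standing in for one numeric column of rainfall.csv
set.seed(1)
rain_mm <- round(runif(36, min = 0, max = 300), 1)   # 36 made-up monthly totals

# first observation is January 2012, twelve observations per year
rain_ts <- ts(rain_mm, frequency = 12, start = c(2012, 1))

start(rain_ts)   # first (year, month) of the series
end(rain_ts)     # last (year, month) of the series

# window() extracts a sub-period by (year, month)
y2013 <- window(rain_ts, start = c(2013, 1), end = c(2013, 12))
y2013
```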

Aim : Practical of Principal Component Analysis.


# Principal Component Analysis upon IRIS dataset
data("iris")
str(iris)
summary(iris)
mypr<-prcomp(iris[,-5])
mypr
summary(mypr)
plot(mypr,type="l")
biplot(mypr)
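
The prcomp() call above runs on the raw measurements, so variables with larger variance dominate the components. As a hedged sketch, assuming the variables should be standardised first, the share of variance explained by each component can be computed from the sdev element of the result. The names pr_scaled and var_explained are illustrative and not part of the original listing.

```r
# Sketch only: PCA on standardised variables and the variance explained per component
data("iris")

# centre each variable and scale it to unit variance before the PCA
pr_scaled <- prcomp(iris[, -5], center = TRUE, scale. = TRUE)

# sdev holds the standard deviations of the components; squaring gives variances
var_explained <- pr_scaled$sdev^2 / sum(pr_scaled$sdev^2)

# cumulative share of variance, useful for deciding how many components to keep
round(cumsum(var_explained), 3)
```

The same proportions appear in the "Proportion of Variance" row of summary(pr_scaled).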

db.student.insert({_id:101,RollNo:4,Name:"Laxmi",Marks:450,Hobbies:["Reading","Dancing"]});

db.student.find({Class:"TYCS"},{Name:1,Class:1,_id:0})
db.student.find({Class:{$ne:"TYCS"}},{Name:1,Class:1,_id:0})
db.student.find().sort({Marks:1}) //ascending
db.student.find({Class:"TYCS",Marks:{$gt:400}})
//or, and, not
db.student.find({$or:[{Class:"TYCS"},{Marks:{$gt:500}}]})
db.student.find({Class:{$ne:"TYCS"}},{Name:1,Class:1,_id:0}) // returns the name and class of students whose class is not TYCS
db.student.update({RollNo:2},{$set:{Marks:531}})
db.student.remove({Class:"FYCS"})
db.student.updateMany({Class:"TYCS"},{$inc:{Marks:5}})
db.Employee.aggregate([{$group:{"_id":"$Dept","Count":{$sum:1}}}]) // returns the number of employees in each department
…………..:"$Dept","Count":{$avg:"$Salary"}}})

db.student.find({},{Name:1,Marks:1,_id:0}).sort({Marks:1}) // returns only Name and Marks (via the projection argument), sorted by Marks ascending
