Big Data Analytics Programs Only

The document discusses various data processing operations in R including: 1) Importing data from CSV, text, and Excel files using functions like read.csv, read.table, and library XML. 2) Exporting data frames to text files using write.table. 3) Performing numerical operations on user input data like calculating sum, max, min, mean, square root and rounding using functions such as sum, max, min, mean, sqrt, and round.

Uploaded by

usha chidambaram

EX1. To get input from the user and perform numerical operations (MAX, MIN,
AVG, SUM, SQRT, ROUND) using R.

> a=scan()

1: 32

2: 23

3: 34

4: 67

5: 45

> sum(a)

[1] 201

> max(a)

[1] 67

> min(a)

[1] 23

> p=sqrt(a)

> print(p)

[1] 5.656854 4.795832 5.830952 8.185353 6.708204

> round(p)

[1] 6 5 6 8 7

> mean(a)

[1] 40.2
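scan() with no arguments waits for keyboard input, so the session above cannot be rerun from a script. A minimal sketch, assuming the same five values are supplied as a string through scan's text argument:

```r
# Read the five values from a string instead of the keyboard.
a <- scan(text = "32 23 34 67 45", quiet = TRUE)

# Collect the summary statistics from EX1 in one named vector.
stats <- c(sum = sum(a), max = max(a), min = min(a), mean = mean(a))
print(stats)

# Square roots, rounded to the nearest whole number and to two decimals.
print(round(sqrt(a)))
print(round(sqrt(a), 2))
```

The second argument of round() sets the number of decimal places; leaving it out rounds to whole numbers, as in the session above.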
EX2. To perform data import/export (.CSV, .XLS, .TXT) operations
using data frames in R.

#Data import from .CSV


# R program to read a comma-separated file using read.csv()
# strip.white = TRUE drops the spaces that follow each comma in the file
x <- read.csv("E:/Data/myfile.csv", header = TRUE, strip.white = TRUE)
# print x
print(x)

Output:

Col1 Col2 Col3
1 100 a1 b1
2 200 a2 b2
3 300 a3 b3

#Data import from .txt


# Simple R program to read a whitespace-separated txt file
x<-read.table("D:/Data/myfile.txt", header=FALSE)

# print x
print(x)

Output:
V1 V2 V3
1 100 a1 b1
2 200 a2 b2
3 300 a3 b3
#Data import from .xml
# Note: this example parses an XML file with the XML package; it does not read an .xls workbook
# loading the library and other important packages
library("XML")
library("methods")

# the contents of sample.xml are parsed


data <- xmlParse(file = "sample.xml")

print(data)

Output:
1
Alia
620
IT
2
Brijesh
440
Commerce
3
Yash
600
Humanities
4
Mallika
660
IT
5
Zayn
560
IT
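None of the imports above reads an Excel workbook. One option is the readxl package, which the original programs do not use. The sketch below assumes readxl is installed and reads the sample workbook that ships with it, so no file path has to be invented:

```r
# Assumption: the readxl package is installed (install.packages("readxl")).
if (requireNamespace("readxl", quietly = TRUE)) {
  # readxl_example() returns the path of a sample workbook bundled with readxl.
  path <- readxl::readxl_example("datasets.xls")
  # read_excel() reads the first sheet by default and returns a tibble.
  xls_data <- as.data.frame(readxl::read_excel(path))
  print(head(xls_data))
} else {
  message("readxl is not installed; skipping the Excel import example")
}
```

For a workbook of your own, replace path with its location; the sheet argument of read_excel() selects a sheet other than the first.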
#Exporting data from R as .txt
#Creating a data frame
>df=data.frame("name"=c("xxxx","yyyy","zzzz"),
+"language"=c("R","python","java"),"age"=c(21,20,22))
#col.names=NA writes an empty header cell above the row-name column
>write.table(df,file="mydata.txt",sep="\t",row.names=TRUE,col.names=NA)

#Exporting data from R without row names
# Creating a dataframe
df = data.frame( "Name" = c("Amiya", "Raj", "Asish"), "Language" = c("R", "Python",
"Java"), "Age" = c(22, 25, 45) )
# Export a data frame to a tab-separated text file using write.table()
write.table(df,file = "myDataFrame.txt",sep = "\t",row.names = FALSE)
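A quick way to confirm an export is to read the file back and compare it with the original. A minimal sketch, assuming a temporary file from tempfile() instead of a fixed path:

```r
df <- data.frame(Name = c("Amiya", "Raj", "Asish"),
                 Language = c("R", "Python", "Java"),
                 Age = c(22, 25, 45),
                 stringsAsFactors = FALSE)

# Write to a temporary tab-separated file (location chosen by tempfile()).
path <- tempfile(fileext = ".txt")
write.table(df, file = path, sep = "\t", row.names = FALSE)

# Read it back with the same separator and a header row.
df_back <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
print(df_back)

# Compare names and values of the two data frames.
print(all.equal(df, df_back))
unlink(path)
```

The separator passed to read.table() must match the one used by write.table(); otherwise each row comes back as a single column.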
EX3. To get the input matrix from the user and perform matrix addition, subtraction,
multiplication, inverse, transpose, and division operations using the vector concept in R.
#To create a matrix in R (general syntax)
#matrix(data, nrow, ncol, byrow, dimnames)
#To create matrix using R programming
>matrix1<-matrix(c(11, 13, 15, 12, 14, 16),nrow =2, ncol =3, byrow = TRUE)
>print(matrix1)
[,1] [,2] [,3]
[1,] 11 13 15
[2,] 12 14 16
#Arranging elements sequentially by row.
P <- matrix(c(5:16), nrow = 4, byrow = TRUE)
print(P)
[,1] [,2] [,3]
[1,] 5 6 7
[2,] 8 9 10
[3,] 11 12 13
[4,] 14 15 16
# Arranging elements sequentially by column.
Q <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(Q)
[,1] [,2] [,3]
[1,] 3 7 11
[2,] 4 8 12
[3,] 5 9 13
[4,] 6 10 14

# Defining the column and row names.


row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
R <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)

#To add P and R (the sum keeps R's row and column names)

>Addition<-P+R

>print(Addition)

col1 col2 col3


row1 8 10 12
row2 14 16 18
row3 20 22 24
row4 26 28 30
#To subtract
>subtraction<- Q-P

>print(subtraction)

[,1] [,2] [,3]


[1,] -2 1 4
[2,] -4 -1 2
[3,] -6 -3 0
[4,] -8 -5 -2
#To multiply (element-wise: * multiplies matching entries)
>mul<-P*Q

>print(mul)

[,1] [,2] [,3]


[1,] 15 42 77
[2,] 32 72 120
[3,] 55 108 169
[4,] 84 150 224
#To divide (element-wise: / divides matching entries)
>div<-P/Q
>print(div)
[,1] [,2] [,3]
[1,] 1.666667 0.8571429 0.6363636
[2,] 2.000000 1.1250000 0.8333333
[3,] 2.200000 1.3333333 1.0000000
[4,] 2.333333 1.5000000 1.1428571
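The * and / operators above work entry by entry. The matrix product in the linear-algebra sense uses %*%, which needs the column count of the left matrix to equal the row count of the right one, so P %*% Q is not defined for two 4 x 3 matrices. A small sketch with two hypothetical 2 x 2 matrices, A and B, showing both kinds of multiplication and "division" as multiplication by an inverse:

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
B <- matrix(c(5, 6, 7, 8), nrow = 2, byrow = TRUE)

elementwise <- A * B           # multiplies matching entries
product     <- A %*% B         # rows of A times columns of B
quotient    <- A %*% solve(B)  # "A divided by B" as A times the inverse of B

print(elementwise)
print(product)
print(quotient)
```

Multiplying quotient by B on the right gives A back, up to floating-point rounding.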

#To invert the matrix


> INV<-matrix(c(11, 13, 15, 12),nrow =2, ncol =2, byrow = TRUE)
>print(INV)
[,1] [,2]
[1,] 11 13
[2,] 15 12
> inv<-solve(INV)
>print(inv)
[,1] [,2]
[1,] -0.1904762 0.2063492
[2,] 0.2380952 -0.1746032

#To Transpose matrix


> A12<-t(INV)
>print(A12)
[,1] [,2]
[1,] 11 15
[2,] 13 12
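solve() stops with an error when the matrix is singular, and its result is easy to verify: a matrix times its inverse is the identity matrix. A short sketch of both checks on the same 2 x 2 matrix:

```r
INV <- matrix(c(11, 13, 15, 12), nrow = 2, byrow = TRUE)

# A determinant of zero would mean the matrix has no inverse.
d <- det(INV)
print(d)

inv <- solve(INV)

# INV %*% inv should be the 2 x 2 identity, up to floating-point rounding.
check <- INV %*% inv
print(round(check, 10))
print(isTRUE(all.equal(check, diag(2))))
```

all.equal() is used instead of == because floating-point results are rarely exact.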
EX4. To perform Association Rule Mining and Clustering using R.
a)Association rule mining
#loading required packages
> install.packages("arules")
> install.packages("arulesViz")
>library(arules)
>library(arulesViz)
>library(RColorBrewer)
#loading dataset
>data("Groceries")
>head(Groceries)
> itemFrequencyPlot(Groceries,topN=10,type='absolute')
> rule=apriori(Groceries,parameter = list(supp=0.001,conf=0.9))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen maxlen target ext
0.9 0.1 1 none FALSE TRUE 5 0.001 1 10 rules TRUE

Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 9
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
sorting and recoding items ... [157 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.01s].
writing ... [129 rule(s)] done [0.00s].
creating S4 object ... done [0.00s]
> rule
set of 129 rules
> inspect (rule)
lhs rhs support confidence coverage lift count
[1] {liquor, red/blush wine} => {bottled beer} 0.001931876 0.9047619 0.002135231 11.235269 19
[2] {curd, cereals} => {whole milk} 0.001016777 0.9090909 0.001118454 3.557863 10
[3] {soups, bottled beer} => {whole milk} 0.001118454 0.9166667 0.001220132 3.587512 11
[4] {whipped/sour cream, house keeping products} => {whole milk} 0.001220132 0.9230769 0.001321810 3.612599 12
[5] {pastry, sweet spreads} => {whole milk} 0.001016777 0.9090909 0.001118454 3.557863 10
[6] {rice, sugar} => {whole milk} 0.001220132 1.0000000 0.001220132 3.913649 12
[7] {rice, bottled water} => {whole milk} 0.001220132 0.9230769 0.001321810 3.612599 12
[8] {canned fish, hygiene articles} => {whole milk} 0.001118454 1.0000000 0.001118454 3.913649 11
[9] {grapes, onions} => {other vegetables} 0.001118454 0.9166667 0.001220132 4.737476 11
[10] {hard cheese, oil} => {other vegetables} 0.001118454 0.9166667 0.001220132 4.737476 11
[11] {root vegetables, butter, rice} => {whole milk} 0.001016777 1.0000000 0.001016777 3.913649 10
[12] {herbs, whole milk, fruit/vegetable juice} => {other vegetables} 0.001016777 0.9090909 0.001118454 4.698323 10
[13] {citrus fruit, tropical fruit, herbs} => {whole milk} 0.001118454 0.9166667 0.001220132 3.587512 11
[14] {root vegetables, whipped/sour cream, flour} => {whole milk} 0.001728521 1.0000000 0.001728521 3.913649 17
[15] {butter, soft cheese, domestic eggs} => {whole milk} 0.001016777 1.0000000 0.001016777 3.913649 10
[16] {tropical fruit, whipped/sour cream, soft cheese} => {other vegetables} 0.001220132 0.9230769 0.001321810 4.770605 12
[17] {root vegetables, whipped/sour cream, soft cheese} => {whole milk} 0.001220132 0.9230769 0.001321810 3.612599 12
[18] {citrus fruit, root vegetables, soft cheese} => {other vegetables} 0.001016777 1.0000000 0.001016777 5.168156 10
[19] {frankfurter, tropical fruit, frozen meals} => {other vegetables} 0.001016777 0.9090909 0.001118454 4.698323 10
[20] {frankfurter, tropical fruit, frozen meals} => {whole milk} 0.001016777 0.9090909 0.001118454 3.557863 10
[21] {tropical fruit, butter, frozen meals} => {whole milk} 0.001016777 0.9090909 0.001118454 3.557863 10
[22] {tropical fruit, whipped/sour cream, hard cheese} => {other vegetables} 0.001016777 0.9090909 0.001118454 4.698323 10
[23] {pork, whole milk, butter milk} => {other vegetables} 0.001016777 0.9090909 0.001118454 4.698323 10
[24] {pip fruit, butter milk, fruit/vegetable juice} => {other vegetables} 0.001016777 0.9090909 0.001118454 4.698323 10
[25] {frankfurter, root vegetables, sliced cheese} => {whole milk} 0.001016777 0.9090909 0.001118454 3.557863 10
[26] {butter, whipped/sour cream, sliced cheese} => {whole milk} 0.001220132 0.9230769 0.001321810 3.612599 12
[27] {yogurt, oil, coffee} => {other vegetables} 0.001016777 0.9090909 0.001118454 4.698323 10
[28] {root vegetables, onions, napkins} => {other vegetables} 0.001016777 0.9090909 0.001118454 4.698323 10
[29] {sausage, berries, butter} => {whole milk} 0.001016777 0.9090909 0.001118454 3.557863 10
…
[126] {tropical fruit, other vegetables, whole milk, yogurt, oil} => {root vegetables} 0.001016777 0.9090909 0.001118454 8.340400 10
[127] {tropical fruit, other vegetables, butter, yogurt, domestic eggs} => {whole milk} 0.001016777 0.9090909 0.001118454 3.557863 10
[128] {citrus fruit, root vegetables, whole milk, yogurt, whipped/sour cream} => {other vegetables} 0.001016777 0.9090909 0.001118454 4.698323 10
[129] {citrus fruit, tropical fruit, root vegetables, whole milk, yogurt} => {other vegetables} 0.001423488 0.9333333 0.001525165 4.823612 14
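The support, confidence, coverage, and lift columns follow directly from transaction counts. The sketch below recomputes rule [2], {curd, cereals} => {whole milk}, in base R from figures printed in this document: 9835 transactions, a rule count of 10, and 2513 transactions containing whole milk (the count of the {} => {whole milk} rule in EX7). The left-hand-side count of 11 is inferred from the printed coverage (0.001118454 x 9835) and is not shown in the output.

```r
n        <- 9835   # transactions in Groceries
count_xy <- 10     # transactions containing curd, cereals and whole milk
count_x  <- 11     # transactions containing curd and cereals (inferred from coverage)
count_y  <- 2513   # transactions containing whole milk

support    <- count_xy / n                # share of all transactions matching the whole rule
coverage   <- count_x / n                 # share matching the left-hand side
confidence <- count_xy / count_x          # share of left-hand-side matches that also contain whole milk
lift       <- confidence / (count_y / n)  # confidence relative to the baseline share of whole milk

print(round(c(support = support, confidence = confidence,
              coverage = coverage, lift = lift), 6))
```

A lift above 1 means the left-hand side makes the right-hand side more likely than its baseline frequency; a lift of exactly 1 means the two are independent.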
b) clustering
> data("iris")
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
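6 5.4 3.9 1.7 0.4 setosa
> x=iris[,3:4]   # petal length and petal width only
> set.seed(123)  # k-means starts from random centers; a fixed seed makes the clusters repeatable
> model=kmeans(x,3)
> library(cluster)
> clusplot(x,model$cluster)
> clusplot(x,model$cluster,color=TRUE,shade=TRUE)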
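Since iris also records the true species, the clusters can be checked against it with a cross-table. A minimal sketch in base R; the seed value and nstart = 25 are assumptions added here for stability and are not part of the original program:

```r
data("iris")
x <- iris[, 3:4]

set.seed(123)                                 # assumed seed, for repeatable output
model <- kmeans(x, centers = 3, nstart = 25)  # nstart tries 25 random starts and keeps the best

# Rows: cluster number, columns: true species.
comparison <- table(cluster = model$cluster, species = iris$Species)
print(comparison)

# Cluster centers in the original units (cm).
print(model$centers)
```

Cluster numbers are arbitrary labels, so a good result is one where each species falls mostly into a single row, whichever row that is.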
EX5. To perform data pre-processing operations i) Handling Missing data ii) MinMax
normalization
i) Handling Missing data
# Create a data frame
> dataframe <- data.frame( Name = c("Banu", "Anil", "Varshini", "Veena"),Physics = c(98, 87,
+ 91, 94), Chemistry = c(NA, 84, 93, 87), Mathematics = c(91, 86, NA, NA) )

> #Print dataframe


> print(dataframe)
Name Physics Chemistry Mathematics
1 Banu 98 NA 91
2 Anil 87 84 86
3 Varshini 91 93 NA
4 Veena 94 87 NA
> listMissingColumns <- colnames(dataframe)[ apply(dataframe, 2, anyNA)]
> print(listMissingColumns)
[1] "Chemistry" "Mathematics"
> meanMissing <- apply(dataframe[,colnames(dataframe) %in% listMissingColumns],
+ 2, mean, na.rm = TRUE)
> print(meanMissing)
Chemistry Mathematics
88.0 88.5
> medianMissing <- apply(dataframe[,colnames(dataframe) %in% listMissingColumns],
+ 2, median, na.rm = TRUE)
> print(medianMissing)
Chemistry Mathematics
87.0 88.5
> # Importing library
> library(dplyr)
> # Create a data frame
> dataframe <- data.frame( Name = c("Banu", "Anil", "Varshini", "Veena"),Physics = c(98, 87,
+ 91, 94),Chemistry = c(NA, 84, 93, 87), Mathematics = c(91, 86, NA, NA) )

> listMissingColumns <- colnames(dataframe)[ apply(dataframe, 2, anyNA)]


> meanMissing <- apply(dataframe[,colnames(dataframe) %in% listMissingColumns],
+ 2, mean, na.rm = TRUE)
> medianMissing <- apply(dataframe[,colnames(dataframe) %in% listMissingColumns],
+ 2, median, na.rm = TRUE)
> newDataFrameMedian <- dataframe %>% mutate(
+ Chemistry = ifelse(is.na(Chemistry), medianMissing[1], Chemistry),
+ Mathematics = ifelse(is.na(Mathematics), medianMissing[2],Mathematics))
> print(newDataFrameMedian)
Name Physics Chemistry Mathematics
1 Banu 98 87 91.0
2 Anil 87 84 86.0
3 Varshini 91 93 88.5
4 Veena 94 87 88.5
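The mutate() call above names each column by hand. The same median imputation can be written in base R so that it covers every numeric column automatically. A sketch of that alternative, which does not need dplyr; the variable name imputed is introduced here for illustration:

```r
dataframe <- data.frame(
  Name = c("Banu", "Anil", "Varshini", "Veena"),
  Physics = c(98, 87, 91, 94),
  Chemistry = c(NA, 84, 93, 87),
  Mathematics = c(91, 86, NA, NA),
  stringsAsFactors = FALSE
)

imputed <- dataframe
# Only numeric columns can have a median, so Name is skipped.
numeric_cols <- names(imputed)[sapply(imputed, is.numeric)]
for (col in numeric_cols) {
  missing <- is.na(imputed[[col]])
  imputed[[col]][missing] <- median(imputed[[col]], na.rm = TRUE)
}
print(imputed)
```

Swapping median for mean in the loop gives mean imputation instead.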
ii) MinMax normalization
> # rescale x to the range [0, 1]: (x - min) / (max - min)
> min_max_norm<-function(x){(x-min(x))/(max(x)-min(x))}
> iris_norm<-as.data.frame(lapply(iris[1:4],min_max_norm))
> head(iris_norm)

Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 0.22222222 0.6250000 0.06779661 0.04166667
2 0.16666667 0.4166667 0.06779661 0.04166667
3 0.11111111 0.5000000 0.05084746 0.04166667
4 0.08333333 0.4583333 0.08474576 0.04166667
5 0.19444444 0.6666667 0.06779661 0.04166667
6 0.30555556 0.7916667 0.11864407 0.12500000
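min_max_norm as written returns NA for every entry if the column contains a missing value, and NaN if all values are equal (division by zero). A hedged variant, with the hypothetical name min_max_norm_safe, ignores NAs when finding the range and maps a constant column to 0:

```r
min_max_norm_safe <- function(x) {
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)
  if (hi == lo) {
    # Constant column: no spread to rescale, so return 0 (NA stays NA).
    return(ifelse(is.na(x), NA_real_, 0))
  }
  (x - lo) / (hi - lo)
}

iris_norm <- as.data.frame(lapply(iris[1:4], min_max_norm_safe))
# Every column should now span exactly 0 to 1.
print(sapply(iris_norm, range))

print(min_max_norm_safe(c(2, NA, 4, 6)))
print(min_max_norm_safe(c(5, 5, 5)))
```

Returning 0 for a constant column is one convention among several; dropping such columns before normalizing is another reasonable choice.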
EX6. To perform Simple Linear Regression with R.
> x<-c(151,174,138,186,128,136,179,163,152,131)
> y<-c(63,81,56,91,47,57,76,72,62,48)
> relation<-lm(y~x)
> print(relation)

Output:
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
-38.4551 0.6746
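The fitted coefficients define the line y = -38.4551 + 0.6746 x, which can be used to predict y for a new x. A sketch using predict() for a hypothetical new observation x = 170, with the same value computed by hand from the coefficients:

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# Predict y for a hypothetical new observation with x = 170.
new_point <- data.frame(x = 170)
pred <- predict(relation, new_point)
print(pred)

# The same value by hand: intercept + slope * x.
by_hand <- coef(relation)[["(Intercept)"]] + coef(relation)[["x"]] * 170
print(by_hand)

# R-squared: share of the variance in y explained by x.
print(summary(relation)$r.squared)
```

The column name in new_point must match the predictor name used in the formula (x here), otherwise predict() cannot find the variable.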
EX7. To perform market basket analysis using Association Rules (Apriori).
# Loading Libraries
library(arules)
library(arulesViz)
library(RColorBrewer)

# import dataset
data("Groceries")

# using apriori() function


rules <- apriori(Groceries,
parameter = list(supp = 0.01, conf = 0.2))
Apriori

Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen
0.2 0.1 1 none FALSE TRUE 5 0.01 1
maxlen target ext
10 rules TRUE

Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE

Absolute minimum support count: 98

set item appearances ...[0 item(s)] done [0.00s].


set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
sorting and recoding items ... [88 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 done [0.00s].
writing ... [232 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
# using inspect() function
inspect(rules[1:10])
lhs rhs support confidence coverage lift count
[1] {} => {whole milk} 0.25551601 0.2555160 1.00000000 1.000000 2513
[2] {hard cheese} => {whole milk} 0.01006609 0.4107884 0.02450432 1.607682 99
[3] {butter milk} => {other vegetables} 0.01037112 0.3709091 0.02796136 1.916916 102
[4] {butter milk} => {whole milk} 0.01159126 0.4145455 0.02796136 1.622385 114
[5] {ham} => {whole milk} 0.01148958 0.4414062 0.02602949 1.727509 113
[6] {sliced cheese} => {whole milk} 0.01077783 0.4398340 0.02450432 1.721356 106
[7] {oil} => {whole milk} 0.01128622 0.4021739 0.02806304 1.573968 111
[8] {onions} => {other vegetables} 0.01423488 0.4590164 0.03101169 2.372268 140
[9] {onions} => {whole milk} 0.01209964 0.3901639 0.03101169 1.526965 119
[10] {berries} => {yogurt} 0.01057448 0.3180428 0.03324860 2.279848 104
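The first ten rules are listed in the order apriori() produced them, which says little about which ones are most interesting. A sketch that filters the rules by their right-hand side and ranks them by lift, assuming the arules package is installed as in the listing above; the name milk_rules is introduced here for illustration:

```r
# Assumption: the arules package is installed, as in the listing above.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  data("Groceries")
  rules <- apriori(Groceries,
                   parameter = list(supp = 0.01, conf = 0.2),
                   control = list(verbose = FALSE))

  # Keep rules whose right-hand side is whole milk, strongest lift first.
  milk_rules <- subset(rules, subset = rhs %in% "whole milk")
  milk_rules <- sort(milk_rules, by = "lift")
  inspect(head(milk_rules, 5))
}
```

Sorting by "confidence" or "support" works the same way; lift is used here because it corrects for how common whole milk is overall.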

# using itemFrequencyPlot() function


arules::itemFrequencyPlot(Groceries, topN = 20,
col = brewer.pal(8, 'Pastel2'),
main = 'Relative Item Frequency Plot',
type = "relative",
ylab = "Item Frequency (Relative)")
EX8. To perform time-series analysis using R (the exercise names stock market data; the program below uses the air passengers data set).
# Install the library
install.packages("prophet")
# Load the library
library(prophet)
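# Dataset: prophet expects a data frame with columns ds (date) and y (value)
ap <- read.csv("example_air_passengers.csv")
m <- prophet(ap)
# Future dates to forecast: 365 daily steps past the last observation
future <- make_future_dataframe(m,periods = 365)

# Print the last future dates
cat("\nPredictions:\n")
tail(future)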

Output:
Predictions:
ds
504 1961-11-26
505 1961-11-27
506 1961-11-28
507 1961-11-29
508 1961-11-30
509 1961-12-01
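prophet() needs a data frame with a date column named ds and a numeric column named y. If example_air_passengers.csv is not at hand, a similar frame can be built from R's built-in AirPassengers series (monthly totals, 1949 to 1960); that the two sources hold the same figures is an assumption. Because the series is monthly, a monthly future frame may suit it better than 365 daily steps. The sketch uses base R only and leaves the prophet calls as comments:

```r
data("AirPassengers")

# One row per month, starting January 1949.
ap <- data.frame(
  ds = seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers)),
  y  = as.numeric(AirPassengers)
)
print(head(ap))
print(tail(ap, 3))

# With prophet installed, the frame plugs into the same calls as above:
# m <- prophet(ap)
# future <- make_future_dataframe(m, periods = 12, freq = "month")
```

The same two-column layout applies to stock data: put the trading date in ds and the price to forecast (for example the closing price) in y.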

# Forecast
forecast <- predict(m, future)
tail(forecast[c('ds', 'yhat',
'yhat_lower', 'yhat_upper')])
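# Save the forecast plot as a PNG file
png(file = "facebookprophetGFG.png")
# Plot (print() makes the ggplot render when the script is run with source())
print(plot(m, forecast))
# Close the device to write the file
dev.off()

# Save the trend and seasonality components as a PNG file
png(file = "facebookprophettrendGFG.png")
# Plot
prophet_plot_components(m, forecast)
# Close the device to write the file
dev.off()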
