Data Analytics Using R

Uploaded by

Sowndarya C
Data Analytics using R Programming
By
Dr. K. Sasirekha M.C.A., M.Phil., Ph.D.,
Department of Computer Science
Periyar University

Agenda

• Data Analytics – Basics


• Getting Started with R
• Some basic operations in R
• Packages in R
• Data Analytics with R

Data Analytics - Basics
What is Data?

• Data
  • Qualitative
  • Quantitative
    • Discrete (e.g. 5)
    • Continuous (e.g. 3.45)

Data Analytics - Basics
What is Data Analytics?

• Analyzing raw data in order to draw conclusions from the information it contains

Data Analytics - Basics

• Descriptive analytics: What has happened?

• Diagnostic analytics: Why did it happen? (what has happened, in depth)

• Predictive analytics: What might happen?

• Prescriptive analytics: What should we do?

Data Analytics - Basics

Applications
• Business Analytics
• Health Analytics
• Web Analytics
• Risk Analytics

Getting Started with R

• R (the language)
  • was created in the early 1990s
  • is based upon the S language
  • is a high-level language
  • is an interpreted language

Installing R
• To install R, first go to http://www.r-project.org and follow the CRAN download link.
• Choose a mirror close to you, click its link, and select your platform.

Choosing an IDE
• If you use R under Windows or Mac OS X, a graphical user interface (GUI) is available to you.

• Some of the best GUIs are:
  • Eclipse/Architect
  • RStudio (https://www.rstudio.com/)
  • Revolution-R
  • Live-R
  • Tinn-R

Variable Assignment
• Assign values to variables with the assignment operator "="

• Another assignment operator, "<-", is also in use and is the more conventional choice in R code

> X = 2
> X
[1] 2

> X <- 5
> X
[1] 5

• Comments start with "#" and run to the end of the line

Basic Data Types
• Numeric
• Integer
• Complex
• Logical
• Character
• Factor
• Date

Numeric
> x = 10.5       # assign a decimal value
> x              # print the value of x
[1] 10.5

> class(x)       # print the class name of x
[1] "numeric"

Integer
• To create an integer variable in R, we invoke the as.integer function.

For example,

> y = as.integer(3)
> y              # print the value of y
[1] 3
> class(y)       # print the class name of y
[1] "integer"
> is.integer(y)  # is y an integer?
[1] TRUE
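
As a small complement to as.integer, the sketch below shows the integer literal suffix and how conversion treats fractional values. The variable names y1, y2 and y3 are illustrative assumptions, not part of the slides.

```r
# Assumption: variable names are illustrative only.
y1 <- 3L                 # the L suffix creates an integer literal directly
y2 <- as.integer(3.99)   # as.integer() truncates toward zero, it does not round
y3 <- 3                  # a plain number is stored as numeric (double)

class(y1)   # "integer"
y2          # 3
class(y3)   # "numeric"
```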

Complex
• A complex value in R is defined via the pure imaginary value i.
• For example,

> z = 1 + 2i     # create a complex number
> z              # print the value of z
[1] 1+2i

> class(z)       # print the class name of z
[1] "complex"

Logical
> x = 1; y = 2   # sample values
> z = x > y      # is x larger than y?
> z              # print the logical value
[1] FALSE

> class(z)       # print the class name of z
[1] "logical"

Character

> x = as.character("hai")   # a quoted literal such as "hai" is already a character string
> x                         # print the character string
[1] "hai"

> class(x)                  # print the class name of x
[1] "character"

Date
> temp <- c("12-09-1973")
> z <- as.Date(temp, "%d-%m-%Y")   # parse day-month-year text into a Date
> z
[1] "1973-09-12"

> class(z)
[1] "Date"
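
Factor appears in the list of basic data types but has no slide of its own. A minimal sketch, assuming the same diabetes-type values that the data frame example uses later (the variable name is an assumption):

```r
# Assumption: example values only; a factor stores categorical data
# as integer codes plus a set of labels (levels).
diabetes <- factor(c("Type1", "Type2", "Type1", "Type1"))

class(diabetes)    # "factor"
levels(diabetes)   # the distinct categories: "Type1" "Type2"
table(diabetes)    # counts per category: Type1 = 3, Type2 = 1
```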

Data structures

• Before you can perform statistical analysis in R, your data has to be structured in some coherent way. To store your data, R has the following structures:

  • Vector
  • Matrix
  • Array
  • Data frame
  • List
  • Time-series

Vector
• A vector is a sequence of data elements of the same basic type.

• For example, here is a vector containing the three numeric values 2, 3, 5.

> c(2, 3, 5)
[1] 2 3 5

• Here is a vector of logical values.

> c(TRUE, FALSE, TRUE, FALSE, FALSE)
[1]  TRUE FALSE  TRUE FALSE FALSE
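
Vector Operations
# Creating a vector using the ':' operator
a = 1:5; a
b = -3:4; b

# Creating a vector using the seq function
# (named s rather than c, so the built-in c() function is not masked)
s = seq(from = 1, to = 10, by = 2); s

# Access elements of a vector
a[3]                                  # third element
a[1:3]                                # first three elements
a[c(FALSE, TRUE, TRUE, FALSE, TRUE)]  # elements where the mask is TRUE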
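
Vector Operations Cont’d
# Performing vector arithmetic (all operations are element-wise)
a = 1:4; b = 5:8
a
b
res = a + b; res
res = a - b; res
res = a * b; res
res = a / b; res
res = a + b^2; res
res = 2 + a; res        # the scalar 2 is added to every element

res = 2 + 3 * b; res    # multiplication binds tighter than addition
res = (2 + 3) * b; res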
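
Expressions such as 2 + a work because R recycles the shorter operand. A minimal sketch of that rule, assuming a second short vector (named short here purely for illustration):

```r
# Assumption: short illustrative vectors, not data from the slides.
a <- 1:4
short <- c(10, 20)

a + 2       # the scalar is recycled for every element: 3 4 5 6
a + short   # short is recycled twice: 11 22 13 24
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.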
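
Vector Operations Cont’d
# Vector repetition
e = rep(5, 4); e

# Replace a single element
e[1] = 10
e

# Keep only the elements that are not equal to 10
e = e[e != 10]
e

# Delete a single element (drop the first one)
e = e[-1]
e

# Empty the vector (use rm(e) to remove the variable itself)
e = NULL
e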
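
Matrix Operations
# A matrix is a two-dimensional array
# Creating a matrix (filled column by column unless byrow = TRUE)
A = matrix(1:9, nrow = 3); A
B = matrix(1:9, nrow = 3, byrow = TRUE); B

# Access elements of a matrix
A[2, 3]   # element in row 2, column 3
A[2, ]    # entire row 2
A[, 3]    # entire column 3

# Combining matrices
a = matrix(1:9, 3, 3); a
b = matrix(10:18, 3, 3); b
cbind(a, b)   # side by side: 3 rows, 6 columns
rbind(a, b)   # stacked: 6 rows, 3 columns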
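
Matrix Operations Cont’d
# Matrix arithmetic (element-wise, using a and b from the previous slide)
res = a + b; res
res = a - b; res
res = a * b; res
res = a / b; res

# Modify matrix elements
a[3, 3] = 0; a     # set a single element
a[a > 5] = 0; a    # set every element greater than 5 to 0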
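
The * operator on matrices multiplies element by element, which is easy to confuse with the matrix product. A minimal sketch of the difference, assuming two small 2 x 2 matrices chosen only for illustration:

```r
# Assumption: small example matrices, not the ones from the slides.
a <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
b <- matrix(5:8, nrow = 2)   # columns are (5, 6) and (7, 8)

a * b     # element-wise product
a %*% b   # matrix product (rows of a times columns of b)
t(a)      # transpose of a
```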
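
Array
• In R, arrays are generalizations of vectors and matrices to any number of dimensions.
> z = array(1:27, dim = c(3, 3, 3))

> dim(z)
[1] 3 3 3

> print(z)    # prints the three 3 x 3 slices in turn

> z[, , 3]    # the third slice, a 3 x 3 matrix holding 19 to 27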

List Operations
# A list can contain elements of different types, such as numbers, strings and vectors
mylist = list(c(1, 1, 2, 5, 14, 42), month.abb, matrix(c(3, -8, 1, -3), nrow = 2))
mylist

# Naming list elements
names(mylist) = c("numbers", "months", "matrix")
mylist

# A list's length is the number of top-level elements that it contains
length(mylist)
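
Once the elements are named, they can be pulled out by name. A minimal sketch of the common access forms, assuming the same list as above, rebuilt here so the example runs on its own:

```r
# Assumption: the list from the slide, rebuilt with its names in one step.
mylist <- list(numbers = c(1, 1, 2, 5, 14, 42),
               months  = month.abb,
               matrix  = matrix(c(3, -8, 1, -3), nrow = 2))

mylist$numbers            # access by name: the numeric vector
mylist[["months"]][1]     # double brackets return the element itself: "Jan"
class(mylist["months"])   # single brackets return a sub-list: "list"
length(mylist)            # 3 top-level elements
```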

List Operations Cont’d
# Arithmetic on list elements: extract the vectors with [[ ]] first,
# because arithmetic operators do not work on lists directly
L1 = list(1:5)
L1

L2 = list(6:10)
L2

L1[[1]] + L2[[1]]
L1[[1]] - L2[[1]]
L1[[1]] * L2[[1]]
L1[[1]] / L2[[1]]

Data Frame
# A data frame is a two-dimensional data structure in R

# Its columns can hold different types of data

# A data frame is created with the data.frame() function

# mydata <- data.frame(col1, col2, ..., colN)

# where col1, col2, ... are column vectors of equal length and of any type
# (such as character, numeric, or logical)

Data Frame Operations
# Creating a data frame
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")

patientdata <- data.frame(patientID, age, diabetes, status)
patientdata

# Data frame properties
nrow(patientdata)   # number of rows
ncol(patientdata)   # number of columns
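
Beyond nrow and ncol, a few base functions give a quick overview of a data frame. A minimal sketch, assuming the same patient data as above, rebuilt so it runs on its own:

```r
# Assumption: the data frame from the slide, rebuilt in one call.
patientdata <- data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  diabetes  = c("Type1", "Type2", "Type1", "Type1"),
  status    = c("Poor", "Improved", "Excellent", "Poor")
)

str(patientdata)        # column names, types and first values
dim(patientdata)        # rows and columns at once: 4 4
patientdata$age         # one column as a vector
mean(patientdata$age)   # 34.75
```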

Data Frame Operations Cont’d
# Accessing elements of a data frame
patientdata[1:2]    # the first two columns

# Modifying elements in a data frame
patientdata[1, "age"] <- 30
patientdata

# Adding a row to a data frame
patientdata <- rbind(patientdata, list(5, 40, "Type2", "Improved"))
patientdata

# Deleting components from a data frame
patientdata$gender <- NULL   # drops the gender column (no effect here: no such column exists)
patientdata
patientdata[-5, ]            # shows the data without row 5; assign the result to keep the change
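
Rows are often selected by a condition rather than by position. A minimal sketch, assuming the original four-row patient data (rebuilt here so the example is self-contained); the names older and type1 are illustrative:

```r
# Assumption: the original four-row data frame from the slides.
patientdata <- data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  diabetes  = c("Type1", "Type2", "Type1", "Type1"),
  status    = c("Poor", "Improved", "Excellent", "Poor")
)

# Logical indexing: keep the rows where age exceeds 30
older <- patientdata[patientdata$age > 30, ]

# subset(): filter rows and pick columns in one call
type1 <- subset(patientdata, diabetes == "Type1", select = c(patientID, status))

older$patientID   # 2 4
nrow(type1)       # 3
```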

Function and Control Statements
# The Fibonacci series: each number is the sum of the two preceding numbers.
# This function starts the series at 0, giving 0, 1, 1, 2, 3, 5, 8, etc.

Fibonacci <- function(n)
{
  # if-else statement
  if (n == 1)
  {
    x <- 0
  }
  else
  {
    x <- c(0, 1)
    # while loop: append terms until the vector holds n of them
    while (length(x) < n)
    {
      position <- length(x)
      new <- x[position] + x[position - 1]
      x <- c(x, new)
    }
  }
  return(x)   # the first n terms (assumes n >= 1)
}
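
The function above can be exercised as follows. This is a sketch only: the definition is repeated in compact form so the example runs on its own, and the for loop is an added illustration of a second control statement.

```r
# Same logic as the slide's function, repeated so the sketch is self-contained.
Fibonacci <- function(n) {
  if (n == 1) return(0)
  x <- c(0, 1)
  while (length(x) < n) {
    position <- length(x)
    x <- c(x, x[position] + x[position - 1])
  }
  x
}

Fibonacci(1)    # 0
Fibonacci(8)    # 0 1 1 2 3 5 8 13

# for loop: print only the even terms
for (term in Fibonacci(8)) {
  if (term %% 2 == 0) print(term)   # prints 0, 2 and 8
}
```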

Packages

• Packages are collections of R functions, compiled code, data, documentation, and tests, in a well-defined format.

• The directory where packages are stored is called the library.

• R comes with a standard set of packages.

• Others are available for download and installation.

> library()                   # see all packages installed
> install.packages("class")   # download and install a package
> search()                    # see packages currently loaded
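
Installing a package does not load it; library(name) attaches it for the current session. A minimal sketch of an install-if-missing pattern, assuming "class" purely as the example package named above:

```r
# Assumption: "class" is only the example package from the slide;
# any package name works the same way.
if (!requireNamespace("class", quietly = TRUE)) {
  install.packages("class")   # only reached when the package is missing
}
library(class)                # attach the package for this session

"package:class" %in% search() # TRUE once the package is attached
```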
Packages Cont’d

• Adding Packages

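Statistical Operations
# Take the built-in iris dataset without its fifth column (Species)
dm = iris[, -5]

# Convert the data frame into a numeric matrix
dm = as.matrix(dm)

# Mean, median and standard deviation over all values of the matrix
meandm = mean(dm)
meandm

mediandm = median(dm)
mediandm

sddm = sd(dm)
sddm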
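
The calls above pool all four measurement columns into one number each. When per-column statistics are wanted instead, a sketch along these lines may help (it assumes only base R and the built-in iris data):

```r
dm <- as.matrix(iris[, -5])   # the four numeric measurement columns

colMeans(dm)            # mean of each column
apply(dm, 2, median)    # median of each column (2 means "over columns")
apply(dm, 2, sd)        # standard deviation of each column
summary(iris[, -5])     # minimum, quartiles, mean and maximum per column
```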
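
Data Exploration Operations
s = c(50, 80, 90, 25, 70)

maximum = max(s)
minimum = min(s)

total = sum(s)
average = mean(s)              # mean() returns the single average; ave() would return a vector

squareroot = sqrt(s)
rounded = round(squareroot)    # named rounded so the round() function is not masked

summary(s)                     # minimum, quartiles, mean and maximum of s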

DATA VISUALIZATION OPERATIONS
# Visualization of Average Rainfall in India for Last 10 Years

Year = c(2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018)
Rainfall = c(69.43, 43.15, 35.23, 50.03, 60.02, 47.62, 48.38, 38.69, 52.48, 58.18)

names(Rainfall) = Year   # label each value with its year

# Pie Chart
pie(Rainfall, col = Year, main = "Average Rainfall in India for Last 10 Years")

# Bar Chart
barplot(Rainfall, col = Year, main = "Average Rainfall in India for Last 10 Years")

DATA VISUALIZATION OPERATIONS Cont’d

# Histogram
hist(Rainfall, col = "yellow", border = "blue")

# Line Graph
plot(Year, Rainfall, type = 'o', col = "blue", main = "Average Rainfall in India for Last 10 Years")

# Scatterplot
plot(Year, Rainfall, col = "red", main = "Average Rainfall in India for Last 10 Years")
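
Plots drawn this way appear on screen only. A minimal sketch of writing one to a file with axis labels, assuming the rainfall figures from the slides and a hypothetical output file in the session's temporary directory:

```r
# Assumption: values copied from the slide; the file name is hypothetical.
Year <- 2009:2018
Rainfall <- c(69.43, 43.15, 35.23, 50.03, 60.02, 47.62, 48.38, 38.69, 52.48, 58.18)

outfile <- file.path(tempdir(), "rainfall.pdf")
pdf(outfile, width = 8, height = 6)   # open a PDF graphics device
plot(Year, Rainfall, type = "o", col = "blue",
     xlab = "Year", ylab = "Average rainfall",
     main = "Average Rainfall in India for Last 10 Years")
dev.off()                             # close the device so the file is written
```

The same pattern works with png() or jpeg() in place of pdf() where those devices are available.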