R-Programming For Data Science
DATA SCIENCE
DR.S.AMUTHA
ASSOCIATE PROFESSOR
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
P.S.R ENGINEERING COLLEGE, SIVAKASI
• R is open-source software.
• R is well suited to projects that involve building machine learning and
deep learning models.
• R is a powerful statistical tool.
• R is widely regarded as one of the best tools for visualizing insights
through different graphs and charts.
• R has a steep learning curve: its syntax is quite different from that of
other languages, so it is slightly more challenging to learn than Python.
• R does not offer the basic security measures that are essential for
production-grade web applications.
• R is often slower than Python or MATLAB, and it does not manage memory
efficiently: R keeps its objects in memory, so it can require a lot of it.
• SENTIMENT ANALYSIS.
• UBER DATA ANALYSIS.
• MOVIE RECOMMENDATION SYSTEM.
• CREDIT CARD FRAUD DETECTION.
• WINE QUALITY PREDICTION.
• CUSTOMER SEGMENTATION.
• SPEECH EMOTION RECOGNITION.
• PRODUCT BUNDLE IDENTIFICATION.
Dr.S.AMUTHA ASSOCIATE PROFESSOR , CSE DEPT,PSREC 7
RESHAPING DATA
• t() FUNCTION
• TAKES A MATRIX OR DATA FRAME AS INPUT AND GIVES
THE TRANSPOSE OF THAT MATRIX OR DATA FRAME AS ITS
OUTPUT.
• SYNTAX:
t(MATRIX / DATA FRAME)
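A MINIMAL SKETCH OF THE TRANSPOSE DESCRIBED ABOVE. THE MATRIX m AND THE DATA FRAME df ARE MADE-UP EXAMPLES, NOT PART OF THE ORIGINAL SLIDES. NOTE THAT R IS CASE-SENSITIVE, SO THE FUNCTION IS WRITTEN t(), AND THAT TRANSPOSING A DATA FRAME GIVES BACK A MATRIX.

```r
# A small 2 x 3 matrix, filled column by column (hypothetical data)
m <- matrix(1:6, nrow = 2, ncol = 3)

# Transpose: rows become columns, so the result is 3 x 2
tm <- t(m)
print(m)
print(tm)

# t() also accepts a data frame, but the result is a matrix
df <- data.frame(a = c(1, 2), b = c(3, 4))
tdf <- t(df)
print(tdf)
print(is.matrix(tdf))
```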
rbind():
• WE CAN COMBINE VECTORS, MATRICES OR DATA FRAMES BY
ROWS USING THE rbind() FUNCTION.
• SYNTAX: rbind(X1, X2, X3)
• WHERE X1, X2 AND X3 CAN BE VECTORS OR MATRICES OR
DATA FRAMES.
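# VECTORS TO COMBINE (VALUES TAKEN FROM THE OUTPUT OF THIS EXAMPLE)
NAME <- c("SHAONI", "ESHA", "SOUMITRA", "SOUMI")
AGE <- c(24, 53, 62, 29)
ADDRESS <- c("PUDUCHERRY", "KOLKATA", "DELHI", "BANGALORE")
# cbind FUNCTION (COMBINING VECTORS THIS WAY YIELDS A CHARACTER MATRIX)
INFO <- cbind(NAME, AGE, ADDRESS)
print("COMBINING VECTORS INTO DATA FRAME USING CBIND ")
print(INFO)
# NEW ROWS TO APPEND, AS A DATA FRAME
NEWD <- data.frame(NAME = c("SOUNAK", "BHABANI"), AGE = c("28", "87"),
                   ADDRESS = c("BANGALORE", "KOLKATA"))
# rbind FUNCTION
NEW.INFO <- rbind(INFO, NEWD)
print("COMBINING DATA FRAMES USING RBIND ")
print(NEW.INFO)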
OUTPUT
[1] "COMBINING VECTORS INTO DATA FRAME USING CBIND "
     NAME       AGE  ADDRESS
[1,] "SHAONI"   "24" "PUDUCHERRY"
[2,] "ESHA"     "53" "KOLKATA"
[3,] "SOUMITRA" "62" "DELHI"
[4,] "SOUMI"    "29" "BANGALORE"
[1] "COMBINING DATA FRAMES USING RBIND "
      NAME AGE    ADDRESS
1   SHAONI  24 PUDUCHERRY
2     ESHA  53    KOLKATA
3 SOUMITRA  62      DELHI
4    SOUMI  29  BANGALORE
5   SOUNAK  28  BANGALORE
6  BHABANI  87    KOLKATA
MERGING TWO DATA FRAMES
• Syntax: merge(dfa, dfb, …)
NAME ID
1 ARJUN 113
2 SHAONI 111
3 SOUMI 112
4 ESHA 115
5 SOUNAK 114
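A SHORT SKETCH OF HOW merge() MIGHT BE USED WITH THE NAME/ID TABLE ABOVE. THE SECOND DATA FRAME dfb AND ITS MARKS COLUMN ARE HYPOTHETICAL, ADDED ONLY FOR ILLUSTRATION. BY DEFAULT merge() KEEPS ONLY THE ROWS WHOSE KEY APPEARS IN BOTH DATA FRAMES; all.x = TRUE KEEPS EVERY ROW OF THE FIRST ONE.

```r
# First data frame: the NAME/ID table shown above
dfa <- data.frame(NAME = c("ARJUN", "SHAONI", "SOUMI", "ESHA", "SOUNAK"),
                  ID = c(113, 111, 112, 115, 114))

# Second data frame: hypothetical values, for illustration only
dfb <- data.frame(ID = c(111, 112, 113, 116),
                  MARKS = c(70, 85, 90, 60))

# Default merge: only IDs present in both data frames are kept
inner <- merge(dfa, dfb, by = "ID")
print(inner)

# Keep every row of dfa; IDs without a match get NA in MARKS
left <- merge(dfa, dfb, by = "ID", all.x = TRUE)
print(left)
```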
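• melt():
• IT IS USED TO CONVERT A DATA FRAME INTO A MOLTEN (LONG-FORMAT) DATA FRAME.
• SYNTAX: melt(data, …, na.rm = FALSE, value.name = "value")
• WHERE,
• data: DATA TO BE MELTED
… : FURTHER ARGUMENTS PASSED TO THE METHOD (FOR EXAMPLE id.vars)
na.rm: IF TRUE, REMOVES NA VALUES, WHICH CONVERTS EXPLICIT MISSINGS INTO IMPLICIT MISSINGS
value.name: NAME OF THE VARIABLE USED TO STORE THE VALUES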
• dcast():
• IT IS USED TO AGGREGATE THE MOLTEN DATA FRAME INTO A
NEW FORM.
• SYNTAX: dcast(data, formula, fun.aggregate)
• WHERE,
• data: MOLTEN DATA FRAME TO BE CAST
formula: FORMULA THAT DEFINES HOW TO CAST
fun.aggregate: AGGREGATION FUNCTION, USED IF THE CAST REQUIRES DATA AGGREGATION
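# MELT AND CAST
library(reshape2)  # provides melt() and dcast()
a <- data.frame(id = c("1", "1", "2", "2"), points = c("1", "2", "1", "2"),
                x1 = c(5, 3, 6, 2), x2 = c(6, 5, 1, 4))  # numeric, so mean() works
print("melting")
# id.vars names the identifier columns; x1 and x2 become variable/value pairs
m <- melt(a, id.vars = c("id", "points"))
print(m)
print("casting")
# cast the molten data frame m (not a), averaging value for each id
idmn <- dcast(m, id ~ variable, mean)
print(idmn)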
• MELTING
id points variable value
 1      1       x1     5
 1      2       x1     3
 2      1       x1     6
 2      2       x1     2
 1      1       x2     6
 1      2       x2     5
 2      1       x2     1
 2      2       x2     4
• CASTING
id x1  x2
 1  4 5.5
 2  4 2.5
• download.file() to download the csv file that contains the traffic stop
data
download.file("http://bit.ly/ms_trafficstops_bw",
"data/ms_trafficstops_bw.csv")
• read.csv() to load into memory the content of the csv file as an object of
class data.frame.
trafficstops <- read.csv("data/ms_trafficstops_bw.csv")
• check the top (the first 6 lines) of this data frame using the
function head():
head(trafficstops)
• size:
• dim(trafficstops) - returns a vector with the number of rows in the first element,
and the number of columns as the second element (the dimensions of the
object)
• nrow(trafficstops) - returns the number of rows
• ncol(trafficstops) - returns the number of columns
• length(trafficstops) - returns number of columns
• names:
• names(trafficstops) - returns the column names (synonym
of colnames() for data.frame objects)
• rownames(trafficstops) - returns the row names
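The inspection functions listed above can be tried on any data frame. The sketch below uses a small made-up data frame named stops as a stand-in for trafficstops, so that it runs without downloading anything; its column names and values are assumptions, not the real traffic stop data.

```r
# Hypothetical stand-in for trafficstops (made-up columns and values)
stops <- data.frame(id = c("A1", "A2", "A3"),
                    driver_age = c(34, 51, 27),
                    violation = c("Speeding", "Seat belt", "Speeding"))

# Size
print(dim(stops))      # rows first, then columns
print(nrow(stops))
print(ncol(stops))
print(length(stops))   # for a data frame, the number of columns

# Names
print(names(stops))    # same result as colnames(stops)
print(rownames(stops))

# First rows (head() shows 6 by default; here only 2 are requested)
print(head(stops, 2))
```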
• Create a date variable named “x,” which contains three different date values.
• The year() function allows us to extract the year values for each element of
the vector.
• The month() function takes a single date value or a vector that contains dates
as elements and extracts the month from each as a number.
• What if we wanted the abbreviated name of each month instead? We
have to add the "label = TRUE" argument to the month() call, and the
month names are then shown in abbreviated form.
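The date steps above could look like the following sketch. It assumes the lubridate package, which provides year() and month(), is installed; the three dates in x are made-up example values.

```r
library(lubridate)  # assumed to be installed; provides year() and month()

# A date variable named x with three different (made-up) date values
x <- as.Date(c("2021-01-15", "2022-06-30", "2023-11-05"))

# Year of each element
years <- year(x)
print(years)

# Month of each element, as numbers
month_nums <- month(x)
print(month_nums)

# Abbreviated month names (returned as an ordered factor; labels follow the locale)
month_labels <- month(x, label = TRUE)
print(month_labels)
```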
• https://bookdown.org/taragonmd/phds/working-with-lists.html