Machine Learning_Unit III
Unit III
1. Introduction to R Programming
2. Variable
3. Output statement
4. Comment statements
5. Data Types
5.1 Vectors
5.2 Lists
5.3 Matrices
5.4 Arrays
5.5 Factors
5.6 Data Frames
6. Decision Statement
6.1 if Statement
6.2 if else statement
6.3 Switch Case statement
7. Loops Statements
8. Read Input
1. Introduction to R Programming
R is a programming language and software environment for statistical analysis, graphical
representation, and reporting.
For machine learning, the caret package is useful: its name is short for "Classification and
Regression Training", and it provides a uniform interface to a large number of machine learning
algorithms for solving supervised learning problems.
Open-source: R is an open-source programming language. It is free of cost and is
distributed under the GNU General Public License.
Comprehensive Language: R is a comprehensive programming language, meaning that it
supports statistical modeling as well as general software development.
R also supports object-oriented programming in addition to procedural programming.
Provides a Wide Array of Packages:
R is widely used because of its large collection of libraries. R has CRAN, which is
a repository holding more than 10,000 packages. These packages cover a wide range of
functionality and many different fields that deal with data.
Possesses a Number of Graphical Libraries: The most important feature of R that sets it
apart from other programming languages of Data Science is its massive collection of
graphical libraries like ggplot2, plotly, etc. that are capable of making aesthetic and quality
visualizations.
Cross-Platform Compatibility: R supports cross-platform compatibility. It runs on
Windows, macOS, and Linux.
Facilities for Various Industries: Almost every industry that makes use of data uses the
R language.
No Need for a Compiler: R is an interpreted language, so code runs without a separate compilation step.
Performs Fast Calculations: Through R, you can perform a wide variety of complex
operations on vectors, arrays, data frames and other data objects of varying sizes.
Integration with Other Technologies: R can be integrated with a number of different
technologies, frameworks, software packages, and programming languages. It can be
paired with Hadoop to use its distributed computing ability. It can also be integrated with
programs in other programming languages like C, C++, Java, Python, and FORTRAN.
Several Integrated Development Environments (IDEs) are available for R programming; one such IDE is
RStudio.
Save R Script
R scripts are written in the script editor window. Save the file with the extension .R.
2. Variable:
Variables are dynamically typed: the data type is not declared explicitly. The type of
data is determined by the value assigned to the variable.
There are three ways to assign a value to a variable
x = 10
y <- 20
30 -> m
Variable names are case sensitive.
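As a minimal sketch of the two points above (the type follows the assigned value, and names are case sensitive), the snippet below reuses one name for two kinds of value. The variable names are illustrative only.

```r
# The type of a variable is determined by the value it currently holds.
x <- 10                    # a plain number is stored as numeric (double)
class_before <- class(x)

x <- "ten"                 # the same name can later hold a character value
class_after <- class(x)

print(class_before)
print(class_after)

# Names are case sensitive: total and Total are two different variables.
total <- 1
Total <- 2
print(total == Total)
```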
3. Output statement
The function print() displays a value in the console. paste() or paste0() can be used inside
print() to combine a string and an object's value into one formatted output.
x <- 10
print(paste("value",x))
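The difference between paste() and paste0() is the separator. The short sketch below shows both, plus the sep argument of paste(); the message text is only an example.

```r
x <- 10

# paste() joins its arguments with a single space by default.
msg_space <- paste("value", x)

# paste0() joins with no separator, so any spacing must be written explicitly.
msg_plain <- paste0("value=", x)

# The sep argument of paste() sets a custom separator.
msg_sep <- paste("value", x, sep = ": ")

print(msg_space)
print(msg_plain)
print(msg_sep)
```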
4. Comment statements
Comments are non-executable statements.
The single-line comment symbol in R is #.
R does not support multi-line comments; begin each comment line with #.
5. Data Types
There are many types of R-objects. The frequently used ones are −
Vectors
Lists
Matrices
Arrays
Factors
Data Frames
5.1 Vectors
There are six atomic vector types that store basic values: logical, integer, double (numeric),
complex, character, and raw.
The built-in function class() returns the data type of a variable.
Example
X <- c(2,3,45)
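To make the use of class() concrete, the following sketch builds one small vector of each basic type and collects the class of each. The variable names are illustrative.

```r
v_logical   <- c(TRUE, FALSE)
v_integer   <- c(2L, 3L, 45L)     # the L suffix makes the values integer
v_double    <- c(2, 3, 45)        # plain numbers are stored as double
v_complex   <- c(1+2i, 3-1i)
v_character <- c("a", "b")
v_raw       <- charToRaw("Hi")    # raw vectors hold bytes

# Collect the class of each vector in one named character vector.
classes <- sapply(
  list(logical = v_logical, integer = v_integer, double = v_double,
       complex = v_complex, character = v_character, raw = v_raw),
  class
)
print(classes)
```

Note that class() reports a double vector as "numeric", while typeof() reports it as "double".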
5.2 Lists
A list is an R-object which can contain many different types of elements inside it like vectors,
functions and even another list inside it.
# Create a list.
list1 <- list(c(2,5,3),21.3,sin)
# Print the list.
print(list1)
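Once a list exists, its elements are usually read back by position. The sketch below, which rebuilds the same list so that it runs on its own, shows the difference between double and single brackets.

```r
list1 <- list(c(2, 5, 3), 21.3, sin)

# Double brackets return the element itself.
first_element <- list1[[1]]      # the numeric vector c(2, 5, 3)

# Single brackets return a smaller list that still wraps the element.
first_as_list <- list1[1]

# The third element is the sin function, so it can be called directly.
sin_of_zero <- list1[[3]](0)

print(first_element)
print(is.list(first_as_list))
print(sin_of_zero)
```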
5.3 Matrices
A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the
matrix function.
# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
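The byrow argument controls how the input vector fills the matrix. As a small sketch, the same vector is placed row by row and column by column below; N is a name introduced here only for comparison.

```r
values <- c('a', 'a', 'b', 'c', 'b', 'a')

# byrow = TRUE fills the first row, then the second row.
M <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)

# byrow = FALSE (the default) fills the first column, then the next.
N <- matrix(values, nrow = 2, ncol = 3, byrow = FALSE)

print(M)
print(N)

# Elements are accessed as [row, column].
print(M[2, 1])
print(N[2, 1])
```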
5.4 Arrays
While matrices are confined to two dimensions, arrays can have any number of dimensions. The
array function takes a dim attribute which creates the required number of dimensions. In the
example below we create an array with two elements, each of which is a 3x3 matrix.
# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)
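As a brief sketch of how such an array is indexed, the snippet below recreates the same array and reads its dimensions, one whole 3x3 slice, and a single element. The two colour values are recycled to fill all 18 cells.

```r
a <- array(c('green', 'yellow'), dim = c(3, 3, 2))

# dim() reports rows, columns, and the number of matrices.
print(dim(a))

# Leaving the first two indexes empty selects a whole 3x3 matrix.
second_matrix <- a[, , 2]
print(second_matrix)

# A single element is addressed as [row, column, matrix].
corner <- a[1, 1, 2]
print(corner)
```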
5.5 Factors
Factors are R objects created from a vector. A factor stores the vector along with the
distinct values of its elements as labels (levels). The labels are always character, irrespective
of whether the input vector is numeric, character, Boolean, etc. Factors are useful in
statistical modeling.
Factors are created using the factor() function. The nlevels() function gives the count of levels.
# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')
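The example above stops after creating the vector. A minimal sketch of the remaining steps, using the factor() and nlevels() functions named in the text, might look like this; factor_apple and counts are names chosen here for illustration.

```r
apple_colors <- c('green', 'green', 'yellow', 'red', 'red', 'red', 'green')

# Create a factor from the vector.
factor_apple <- factor(apple_colors)
print(factor_apple)

# The distinct values become the levels, sorted alphabetically by default.
print(levels(factor_apple))

# Count the levels.
print(nlevels(factor_apple))

# table() counts how often each level occurs.
counts <- table(factor_apple)
print(counts)
```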
6. Decision Statement
A decision statement selects the block of statements to be executed based on the result of
the given condition.
if statement
if else statement
switch statement
6.1 if Statement
Example
x <- 30L
if(is.integer(x)) {
print("X is an Integer")
}
6.2 if else statement
An if statement can be followed by an optional else statement, which executes when the boolean
expression is false.
Syntax
if(boolean_expression) {
  # statement(s) will execute if the boolean expression is true.
} else {
  # statement(s) will execute if the boolean expression is false.
}
Example
x <- c("what","is","truth")
# %in% is case sensitive, so "Truth" does not match "truth" and the else branch runs.
if("Truth" %in% x) {
  print("Truth is found")
} else {
  print("Truth is not found")
}
6.3 Switch Case statement
Syntax
The basic syntax for creating a switch statement in R is −
switch(expression, case1, case2, case3, ...)
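As a hedged illustration of switch(), the sketch below shows its two common forms: selecting by position with a number, and selecting by name with a string. The function describe_op and its case names are hypothetical, introduced only for this example.

```r
# With a number, switch() returns the argument at that position.
by_position <- switch(2, "first", "second", "third")
print(by_position)

# With a string, switch() matches the argument names.
# A final unnamed argument acts as the default case.
describe_op <- function(op) {
  switch(op,
         add = "addition",
         sub = "subtraction",
         "unknown operation")
}

print(describe_op("add"))
print(describe_op("mul"))
```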
7. Loops Statements
Loop statements are used to execute a block of statements repeatedly as long as the given
condition is true. The three loop statements in R programming are
for loop
while loop
repeat loop
7.1 for loop
The for loop is commonly used to iterate over the items of a sequence. It is an
entry-controlled loop: the test condition is checked first, and the body of the loop is
executed only if the condition is true.
Syntax
for (value in sequence)
{
statement
}
Example 1
for (val in 1: 5)
{
# statement
print(val)
}
Example 2 - Iterate value in vector
x <- c( 1,2,3,4,5)
for (i in x)
print(i)
7.2 while loop
Syntax
while ( condition )
{
statement
}
Example
val = 1
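The while example above stops after initializing val. A minimal sketch of a complete loop, assuming the intent was to print the values 1 to 5, could be:

```r
val <- 1

# The condition is tested before every pass, so the body runs while val <= 5.
while (val <= 5) {
  print(val)
  val <- val + 1   # without this update the loop would never end
}
```

After the loop finishes, val holds 6, the first value that fails the condition.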
7.3 repeat loop
Syntax
repeat
{
statement
if( condition )
{
break
}
}
Example
Print the values in the vector. Stop printing when an odd value is encountered.
x <- c( 10,24,68,67,80,70,72)
i=1
repeat
{
if(x[i] %% 2 !=0)
break
print(x[i])
i=i+1
}
Display: 10,24,68
8. Read Input
Developers often need to interact with users, either to get data or to
provide some sort of result.
In R it is also possible to take input from the user. There are two
methods for doing so: readline() and scan().
Syntax
readline()
readline("")
readline(prompt="")
Example
var = readline();
var = as.integer(var);
var = readline(prompt = "Enter any number : ");
The second method is scan(). To take double, string, or character input, specify the type
of the input value through the what argument of scan().
Syntax:
x = scan()
x = scan(what = double()) # for double
x = scan(what = " ") # for string
x = scan(what = character()) # for character
Reading a file with scan() works the same way as normal console input; the only difference
is that the file name and data type must be passed to scan().
Syntax:
x = scan("fileDouble.txt", what = double()) # for double
x = scan("fileString.txt", what = " ") # for string
x = scan("fileChar.txt", what = character()) # for character
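Because console input cannot be shown in a script, the sketch below demonstrates scan() without typing anything: it first writes a small temporary file and then reads it back. The file contents and variable names are made up for illustration. It also uses the text argument of scan(), which reads from a string instead of a file.

```r
# Write a small sample file to a temporary location (hypothetical data).
path <- tempfile(fileext = ".txt")
writeLines(c("1.5 2.5", "4"), path)

# Read every number in the file as double; quiet = TRUE hides the item count.
nums <- scan(path, what = double(), quiet = TRUE)
print(nums)

# The text argument lets scan() read from a string instead of a file.
words <- scan(text = "red green blue", what = character(), quiet = TRUE)
print(words)
```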
Example
x = scan()
print(x)
d = scan(what = double())
scan() reads input continuously; to terminate the input process, press the Enter key
twice on the console.
R can read from and write to various file formats such as csv, excel, xml etc.
The file should be present in the current working directory so that R can read it. It
is also possible to set any directory as the working directory with setwd() and read
files from there.
The csv file is a text file in which the values in the columns are separated by
commas. Let's consider the following data present in the file named input.csv.
You can create this file using Windows Notepad by copying and pasting this data.
Save the file as input.csv using the Save As All files (*.*) option in Notepad.
The file is read with the read.csv() function.
A particular column can be accessed using the column header or the column
index position.
By default, the result is returned as a data frame.
The function is.data.frame() is used to check whether an object is a data
frame or not.
Syntax
Object$ColumnHeader
object["header"]
object[index of column]
Example
data$Salary
data["Value"]
data[2]
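The contents of input.csv are not reproduced in these notes, so the sketch below writes a small stand-in file first. The columns Name, Year, and Value and all of their values are hypothetical. It then reads the file with read.csv() and applies the three access forms listed above.

```r
# Hypothetical sample data; the real contents of input.csv are not shown here.
path <- file.path(tempdir(), "input.csv")
writeLines(c("Name,Year,Value",
             "A,2020,10",
             "B,2021,25",
             "C,2022,40"), path)

data <- read.csv(path)
print(is.data.frame(data))

by_dollar <- data$Value      # $ returns the column as a plain vector
by_name   <- data["Value"]   # single brackets return a one-column data frame
by_index  <- data[2]         # the second column, also as a data frame

print(by_dollar)
print(by_name)
print(by_index)
```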
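A specific set of rows and columns can be accessed using the index position or
column header
s <- file[1:2] # the first two columns, still a data frame
print(is.data.frame(s)) #TRUE
Value <- file[["Value"]] # a single column as a vector
x <- c(file["Year"], file["Value"]) # a list holding the two columns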
The function subset() is used to select part of the data based on a condition.
More than one condition can be combined using logical operators.
The records are returned as a data frame.
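A short sketch of subset() with a combined condition follows. The data frame records and its values are hypothetical, standing in for data read from a CSV file.

```r
# Hypothetical data frame standing in for records read from a file.
records <- data.frame(Year  = c(2020, 2021, 2022, 2023),
                      Value = c(10, 25, 40, 5))

# Two conditions combined with the logical AND operator &.
recent_large <- subset(records, Year > 2020 & Value >= 25)
print(recent_large)

# The select argument keeps only the named columns of the matching rows.
only_value <- subset(records, Value < 20, select = Value)
print(only_value)

print(is.data.frame(recent_large))
```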
Columns accessed using the header or column index are returned as a data
frame. An individual column can be extracted as a plain vector by using double brackets [[ ]].
Value <- file[["Value"]]
print(is.data.frame(Value)) # FALSE
or
To convert a list to a data frame in R, call the as.data.frame() function and pass the list
as an argument to it.
The syntax to create an R data frame from a list using as.data.frame() is
as.data.frame(x), where x is the list.
Examples
In the following program, we take a list of numbers, and convert this list to Data
Frame.
example.R
list_1 <- list(41, 42, 43, 44, 45, 46) # each value becomes a column, giving a single row
df <- as.data.frame(list_1)
print(df)
If the values are vectors, each vector is converted into a column.
list_1 <- list(c(41, 42, 43), c(44, 45, 46)) # each vector becomes a column
df <- as.data.frame(list_1)
print(df)
Column Name
If the vectors are not named, the column names are generated automatically.
The column names can be passed through the col.names parameter.
list_1 <- list(c(41, 42, 43), c(44, 45, 46))
df <- as.data.frame(list_1, col.names = c("a", "b"))
print(df)
When list_1 contains named vectors, these vectors become columns in the data
frame, and the names of the vectors become the column names of the
data frame.
example.R
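The code for this example is missing from the notes. A minimal sketch of the named-vector case, assuming the names a and b, would be:

```r
# Each named vector becomes a column, and its name becomes the column name.
list_1 <- list(a = c(41, 42, 43), b = c(44, 45, 46))
df <- as.data.frame(list_1)
print(df)
print(names(df))
```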
The write.csv() function is used to export a data frame to a csv file.
write.csv(DataFrame Name, "Path to export the DataFrame\\File Name.csv",
row.names=FALSE)
Explore further how to append content to an existing CSV file instead of overwriting it each time.
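One possible approach to appending, offered as a sketch rather than the only way: write.csv() always rewrites the file, so the rows added later are written with write.table() and append = TRUE. The data frames, their columns, and the file name scores.csv are hypothetical.

```r
# Hypothetical data frames used only for this illustration.
df_first  <- data.frame(id = 1:2, score = c(90, 85))
df_second <- data.frame(id = 3:4, score = c(70, 95))
path <- file.path(tempdir(), "scores.csv")

# The first write creates (or overwrites) the file, including the header row.
write.csv(df_first, path, row.names = FALSE)

# Later writes append rows; the header is skipped so it is not repeated.
write.table(df_second, path, append = TRUE, sep = ",",
            col.names = FALSE, row.names = FALSE)

combined <- read.csv(path)
print(combined)
```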