
Machine Learning_Unit III

This document provides a comprehensive overview of R programming, covering fundamental concepts such as variables, data types, decision statements, loops, and input/output methods. It highlights R's capabilities for statistical analysis, its open-source nature, and its extensive library support, making it a preferred choice for data science applications. Additionally, it includes practical examples and syntax for various R programming constructs.

Uploaded by msudharsan157
Copyright © All Rights Reserved

Machine Learning – R Programming

Unit III
1. Introduction to R Programming
2. Variables
3. Output statement
4. Comment statements
5. Data Types
5.1 Vectors
5.2 Lists
5.3 Matrices
5.4 Arrays
5.5 Factors
5.6 Data Frames
6. Decision Statements
6.1 if Statement
6.2 if else statement
6.3 Switch Case statement
7. Loop Statements
7.1 for loop
7.2 While loop
7.3 repeat loop
8. Read Input
8.1 Using readline() method
8.2 Using scan() method
8.3 Read Data From File
8.4 Help Command

1. Introduction to R Programming
R is a programming language and software environment for statistical analysis, graphics
representation and reporting.

For machine learning, the caret package is useful: its name is short for “Classification and Regression Training”, and it provides a uniform interface to a large number of machine learning algorithms for solving supervised machine learning problems.
- Open-source: R is an open-source programming language. This means that it is free of cost and requires no license.
- Comprehensive Language: R is a comprehensive programming language, meaning that it provides services for statistical modeling as well as for software development.
- Object-Oriented: R supports object-oriented programming in addition to procedural programming.
- Provides a Wide Array of Packages: R is widely used because of the wide availability of libraries. CRAN, R's package repository, holds more than 10,000 packages, covering functionality for the many different fields that deal with data.
- Possesses a Number of Graphical Libraries: A feature that sets R apart from other data science languages is its large collection of graphical libraries, such as ggplot2 and plotly, which can produce high-quality visualizations.
- Cross-Platform Compatibility: R runs on all major operating systems, including Windows, macOS and Linux.
- Facilities for Various Industries: Almost every industry that makes use of data uses the R language.
- No Need for a Compiler: R is an interpreted language rather than a compiled one.
- Performs Fast Calculations: R can perform a wide variety of complex operations on vectors, arrays, data frames and other data objects of varying sizes.
- Integration with Other Technologies: R can be integrated with a number of different technologies, frameworks, software packages and programming languages. It can be paired with Hadoop to use its distributed computing ability, and it can be integrated with programs written in other languages such as C, C++, Java, Python and FORTRAN.

Download and install R (Windows installer) from: https://cran.r-project.org/bin/windows/base/


The R GUI (RGui) environment has two windows: a script editor window for writing code, and the R console window, which shows the output of the code.

Integrated development environments (IDEs) are also available for R programming; one such IDE is RStudio.
Save R Script
R scripts are written in the script editor window. Save the file with the extension .R.

The saved script can then be executed from the console.
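As a minimal sketch of saving and running a script, the code below writes a two-line script to a temporary .R file and executes it with source(). The file and its contents are hypothetical; in practice you would save the script from the editor and pass its path to source().

```r
# Write a small script to a temporary .R file (hypothetical content)
script_path <- tempfile(fileext = ".R")
writeLines(c("greeting <- 'Hello from a script'",
             "print(greeting)"), script_path)

# Execute the saved script from the console; by default the variables
# it creates are placed in the global environment
source(script_path)
```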

2. Variables
R is dynamically typed: the data type of a variable is not declared explicitly, but is determined by the value assigned to it.
There are three ways to assign a value to a variable:
x = 10
y <- 20
30 -> m
Variable names are case sensitive.
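A short sketch showing all three assignment forms, how a variable's type follows the value it holds, and case sensitivity. The variable names count and Count are chosen only for illustration.

```r
# All three forms assign a value to a variable
x = 10
y <- 20
30 -> m
print(c(x, y, m))        # [1] 10 20 30

# The data type comes from the value, so it can change on reassignment
print(class(x))          # [1] "numeric"
x <- "ten"
print(class(x))          # [1] "character"

# Names are case sensitive: count and Count are two different variables
count <- 1
Count <- 2
print(count == Count)    # [1] FALSE
```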

3. Output statement
The function print() is used to display a value in the console. The paste() and paste0() functions combine strings and object values so that they can be printed together as formatted output.
x <- 10
print(paste("value",x))
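The difference between paste() and paste0() lies in the separator placed between the pieces. A small sketch:

```r
x <- 10
# paste() separates its arguments with a single space by default
print(paste("value", x))               # [1] "value 10"
# paste0() joins them with no separator
print(paste0("value", x))              # [1] "value10"
# the sep argument of paste() sets a custom separator
print(paste("value", x, sep = ": "))   # [1] "value: 10"
```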

4. Comment statements
Comments are non-executable statements.
The single-line comment symbol in R is #.
R does not support multi-line comments; each line of a longer comment must start with #.

5. Data Types
There are many types of R objects. The most frequently used ones are:

- Vectors
- Lists
- Matrices
- Arrays
- Factors
- Data Frames
5.1 Vectors
There are six basic (atomic) vector types for storing the basic data types.
The built-in function class() returns the data type of a variable.

Data Type   Example                               Verify                                      class() result
Logical     TRUE, FALSE                           v <- TRUE; print(class(v))                  "logical"
Numeric     12.3, 5, 999                          v <- 23.5; print(class(v))                  "numeric"
Integer     2L, 34L, 0L                           v <- 2L; print(class(v))                    "integer"
Complex     3 + 2i                                v <- 2+5i; print(class(v))                  "complex"
Character   'a', "good", "TRUE", '23.4'           v <- "TRUE"; print(class(v))                "character"
Raw         "Hello" is stored as 48 65 6c 6c 6f   v <- charToRaw("Hello"); print(class(v))    "raw"

A vector holds multiple values of the same type and is created with the c() function.

Indexes range from 1 to n, where n is the length of the vector.

Example

X <- c(2, 3, 45)

Y <- c('red', 'green', 'yellow')
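A brief sketch of 1-based indexing on the example vectors, plus what happens when c() receives values of different types: every element is coerced to a common type.

```r
X <- c(2, 3, 45)
Y <- c('red', 'green', 'yellow')

print(length(Y))      # [1] 3
print(Y[1])           # [1] "red"  (indexing starts at 1)
print(Y[length(Y)])   # [1] "yellow" (the last element has index n)
print(X[2:3])         # elements 2 and 3: 3 45

# Mixing types in c() coerces every element to a common type
Z <- c(1, "two", TRUE)
print(class(Z))       # [1] "character"
```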

5.2 Lists
A list is an R object that can contain elements of different types, such as vectors, functions and even other lists.

# Create a list.
list1 <- list(c(2,5,3),21.3,sin)
# Print the list.
print(list1)
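A sketch of how list elements are retrieved: [[ ]] returns an element itself, while [ ] returns a smaller list. The named list student and its values are hypothetical, added only to show access with $ and a list nested inside a list.

```r
list1 <- list(c(2, 5, 3), 21.3, sin)

# [[ ]] extracts an element itself; [ ] returns a sub-list
print(list1[[1]])        # [1] 2 5 3
print(class(list1[1]))   # [1] "list"
print(list1[[3]](0))     # the stored function sin() applied to 0: [1] 0

# Elements can be named and accessed with $ (hypothetical values)
student <- list(name = "Asha", marks = c(80, 92),
                address = list(city = "Chennai"))
print(student$marks)          # [1] 80 92
print(student$address$city)   # [1] "Chennai"
```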

5.3 Matrices
A matrix is a two-dimensional rectangular data set. It can be created by passing a vector to the matrix() function.

# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
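The byrow argument controls how the input vector fills the matrix. This sketch indexes the example matrix M and contrasts it with the default column-by-column fill.

```r
M = matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
# byrow = TRUE fills row by row: row 1 is a a b, row 2 is c b a
print(dim(M))      # [1] 2 3
print(M[2, 1])     # row 2, column 1: [1] "c"
print(M[1, ])      # whole first row: [1] "a" "a" "b"

# With the default byrow = FALSE the same values fill column by column
N = matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3)
print(N[1, ])      # [1] "a" "b" "b"
```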

5.4 Arrays
While matrices are confined to two dimensions, arrays can have any number of dimensions. The array() function takes a dim attribute, which creates the required number of dimensions. The example below creates an array of two 3x3 matrices.

# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)
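The array in the example holds only two colour values, so R recycles them to fill every cell. A short sketch of inspecting the dimensions and indexing with three subscripts (row, column, matrix):

```r
a <- array(c('green','yellow'), dim = c(3, 3, 2))
# The two values are recycled to fill all 3 * 3 * 2 = 18 cells, column by column
print(dim(a))        # [1] 3 3 2
print(a[, , 2])      # the second 3x3 matrix
print(a[1, 2, 2])    # row 1, column 2 of matrix 2: [1] "green"
```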

5.5 Factors
Factors are R objects created from a vector. A factor stores the vector along with the distinct values of its elements as labels (levels). The labels are always character, whether the input vector is numeric, character, Boolean, etc. Factors are useful in statistical modeling.

Factors are created using the factor() function. The nlevels() function returns the number of levels.

# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')

# Create a factor object.
factor_apple <- factor(apple_colors)
# Print the factor.
print(factor_apple)
print(nlevels(factor_apple))
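Beyond nlevels(), two functions are handy when working with the example factor: levels() lists the distinct labels and table() counts how often each occurs. A minimal sketch:

```r
apple_colors <- c('green','green','yellow','red','red','red','green')
factor_apple <- factor(apple_colors)

# Levels are the distinct values, sorted alphabetically by default
print(levels(factor_apple))    # "green" "red" "yellow"
# table() counts how often each level occurs: green 3, red 3, yellow 1
print(table(factor_apple))
```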

5.6 Data Frames

Data frames are tabular data objects. Unlike a matrix, each column of a data frame can contain a different mode of data: the first column can be numeric, the second character and the third logical. A data frame is a list of vectors of equal length.

Data frames are created using the data.frame() function.

# Create the data frame.
BMI <- data.frame(
gender = c("Male", "Male","Female"),
height = c(152, 171.5, 165),
weight = c(81,93, 78),
Age = c(42,38,26)
)
print(BMI)
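A sketch of a few common ways to inspect the BMI data frame once it is created: str() summarises its structure, $ pulls out one column as a vector, and [row, ] selects a record.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age = c(42, 38, 26)
)

str(BMI)                 # column names, types and first values
print(BMI$weight)        # one column as a vector: 81 93 78
print(mean(BMI$weight))  # [1] 84
print(BMI[2, ])          # the second row, all columns
print(nrow(BMI))         # [1] 3
```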

6. Decision Statements

Decision statements select the block of statements to be executed based on the result of a given condition.
- if statement
- if else statement
- switch statement

6.1 if Statement

An if statement consists of a Boolean expression followed by one or more statements.

Syntax
if(boolean_expression) {
# statement(s) will execute if the boolean expression is true.
}

Example
x <- 30L
if(is.integer(x)) {
print("X is an Integer")
}

6.2 if else statement

An if statement can be followed by an optional else statement, which executes when the Boolean expression is false.
Syntax
if(boolean_expression) {
# statement(s) will execute if the boolean expression is true.
} else {
# statement(s) will execute if the boolean expression is false.
}

Example

x <- c("what","is","truth")

if("Truth" %in% x) {
print("Truth is found")
} else {
print("Truth is not found")
}
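Note that this example prints "Truth is not found": %in% compares strings exactly, and "Truth" differs from "truth" in case. A minimal sketch of a case-insensitive check, using tolower() to normalise both sides (an illustrative choice, not part of the original example):

```r
x <- c("what", "is", "truth")

# Exact match: "Truth" is not in x because string comparison is case sensitive
print("Truth" %in% x)                     # [1] FALSE

# Normalise both sides to lower case before testing membership
if (tolower("Truth") %in% tolower(x)) {
  print("Truth is found")
} else {
  print("Truth is not found")
}
```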

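6.3 Switch Case statement

A switch statement allows a variable to be tested for equality against a list of values. Each value is called a case, and the variable being switched on is checked for each case. In R, if the expression evaluates to a character string, it is matched against the names of the cases; if it evaluates to a number, the case at that position is selected.

Syntax
The basic syntax for creating a switch statement in R is:

switch(expression, case1, case2, case3....)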
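A short sketch of both forms of switch(): matching a character expression against named cases (with an unnamed final argument as the default) and selecting a case by numeric position. The fruit and colour values are illustrative.

```r
# Character expression: matched against the case names;
# the unnamed last argument is returned when no name matches
fruit <- "b"
result <- switch(fruit,
                 a = "apple",
                 b = "banana",
                 "unknown fruit")
print(result)            # [1] "banana"

# Numeric expression: selects the case at that position
colour <- switch(2, "red", "green", "blue")
print(colour)            # [1] "green"

# No match and no default: switch() returns NULL invisibly
print(is.null(switch("z", a = "apple")))  # [1] TRUE
```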

7. Loop Statements
Loop statements are used to execute a block of statements repeatedly, for as long as a given condition allows. The three loop statements in R programming are:
- for loop
- while loop
- repeat loop
7.1 for loop
The for loop is commonly used to iterate over the items of a sequence. It is an entry-controlled loop: the test (whether an item remains in the sequence) is checked before the body is executed, so the body is not executed at all if the sequence is empty.

Syntax
for (value in sequence)
{
statement
}
Example 1
for (val in 1:5)
{
# statement
print(val)
}
Example 2 - Iterate value in vector
x <- c( 1,2,3,4,5)
for (i in x)
print(i)
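Two common for-loop patterns, sketched on an illustrative vector: accumulating a running total over the values, and iterating over positions with seq_along() when the index itself is needed. The last loop shows the entry-controlled behaviour: an empty sequence means the body never runs.

```r
x <- c(4, 7, 1, 9)

# Accumulate a running total while iterating over the values
total <- 0
for (v in x) {
  total <- total + v
}
print(total)                   # [1] 21

# Iterate over positions with seq_along() when the index is needed
for (i in seq_along(x)) {
  print(paste("element", i, "is", x[i]))
}

# An empty sequence: the body never runs
runs <- 0
for (v in integer(0)) runs <- runs + 1
print(runs)                    # [1] 0
```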

7.2 While loop

The while loop is also an entry-controlled loop: the test condition is checked first, and then the body of the loop is executed. If the condition is false at the start, the body is never executed.

Syntax
while ( condition )
{
statement
}

Example
val = 1

while (val <= 5)
{
print(val)
val = val + 1
}

7.3 repeat loop

The repeat loop runs the same statement or group of statements repeatedly until a stop condition is encountered. A repeat loop has no built-in condition to terminate it; the programmer must place a condition within the loop's body and use a break statement to exit the loop. If no such condition is present in the body, the loop iterates infinitely.

Syntax
repeat
{
statement

if( condition )
{
break
}
}
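Example
Print the values in a vector, stopping when an odd value is encountered.
x <- c(10, 24, 68, 67, 80, 70, 72)
i <- 1
repeat
{
# stop at the first odd value, or at the end of the vector
if (i > length(x) || x[i] %% 2 != 0)
break
print(x[i])
i <- i + 1
}
Output: 10, 24 and 68 are printed; the loop stops at 67.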

8. Read Input

Developers often need to interact with users, either to get data or to provide some sort of result.
In R it is also possible to take input from the user. There are two methods for doing so in R:

Using readline() method

Using scan() method

8.1 Using readline() method

In R, the readline() method takes input in string format. If an integer is entered, it is read as a string: for example, entering 255 gives the string "255". The input must therefore be converted to the required format, in this case from the string "255" to the integer 255. R provides several functions for converting an input value to the desired data type.

Syntax
readline()
readline("")
readline(prompt="")

as.integer(n)   # convert to integer
as.numeric(n)   # convert to numeric type (double precision)
as.complex(n)   # convert to complex number (e.g. 3+2i)
as.Date(n)      # convert to date, etc.

Example
var = readline();
var = as.integer(var);
var = readline(prompt = "Enter any number : ");
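Because readline() needs an interactive console (in a non-interactive session it returns an empty string without waiting), the sketch below stands in literal strings for what a user might type and applies the conversion functions to them. The sample values are hypothetical.

```r
# Strings standing in for values typed at a readline() prompt
typed_int  <- "255"
typed_num  <- "3.75"
typed_cplx <- "3+2i"
typed_date <- "2024-01-15"

n <- as.integer(typed_int)
print(n + 1L)                      # [1] 256
print(class(n))                    # [1] "integer"

print(as.numeric(typed_num) * 2)   # [1] 7.5
print(Mod(as.complex(typed_cplx))) # modulus of 3+2i, about 3.606

d <- as.Date(typed_date)
print(d + 30)                      # [1] "2024-02-14"

# A value that cannot be converted becomes NA (with a warning)
print(suppressWarnings(as.integer("abc")))  # [1] NA

# Interactive use: var <- as.integer(readline(prompt = "Enter any number : "))
```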

8.2 Using scan() method

Another way to take user input in R is the scan() method, which takes input from the console. It is very handy when inputs need to be taken quickly for a mathematical calculation or for a dataset. The method reads data in the form of a vector or list, and it can also read input from a file.

Syntax:
To take double, string or character input, specify the type of the input value in the scan() method using the argument what.

x = scan()
x = scan(what = double())      # for double
x = scan(what = " ")           # for string
x = scan(what = character())   # for character

Reading a file with scan() works the same way as console input; the only difference is that the file name and data type are passed to the scan() method.

Syntax:
x = scan("fileDouble.txt", what = double())      # for double
x = scan("fileString.txt", what = " ")           # for string
x = scan("fileChar.txt", what = character())     # for character

Example
x = scan()
print(x)
d = scan(what = double())
# Input is read continuously; to end the input, press Enter twice (enter an empty line).

# string file input using scan()
s = scan("fileString.txt", what = " ")

# double file input using scan()
d = scan("fileDouble.txt", what = double())
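The files fileString.txt and fileDouble.txt are not supplied with these notes, so this sketch writes its own temporary file of numbers before reading it with scan(). It also uses scan()'s text argument to read from a string instead of a file. The file contents are hypothetical.

```r
# Create a small text file of numbers (hypothetical content)
path <- tempfile(fileext = ".txt")
writeLines(c("1.5 2.5", "4"), path)

# Read it back as a double vector; quiet = TRUE suppresses the "Read 3 items" message
d <- scan(path, what = double(), quiet = TRUE)
print(d)          # [1] 1.5 2.5 4.0

# Strings are split on white space by default
s <- scan(text = "red green blue", what = " ", quiet = TRUE)
print(s)
```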

8.3 Read Data From File

R can read and write various file formats such as CSV, Excel and XML.
The file should be present in the current working directory so that R can read it. It is also possible to set any directory as the working directory and read files from there.

The functions to get and set the working directory are:

getwd()
setwd("/web/com")

A CSV file is a text file in which the values in the columns are separated by commas. Let's consider the following data present in the file named input.csv.

You can create this file using Windows Notepad by copying and pasting this data. Save the file as input.csv using the Save As "All files (*.*)" option in Notepad.

8.3.1 Read CSV

A CSV file is read into a data frame using the read.csv() function.
data <- read.csv("annual.csv")
print(data)
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
print(colnames(data))
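Since annual.csv itself is not included, here is a self-contained sketch that first writes a small CSV to a temporary file and then reads it with read.csv(). The column names and values are hypothetical.

```r
# Write a small CSV file (hypothetical columns and values)
path <- tempfile(fileext = ".csv")
writeLines(c("Year,Units,Value",
             "2020,Dollars,150",
             "2021,Dollars,175",
             "2021,Count,40"), path)

data <- read.csv(path)
print(data)
print(is.data.frame(data))   # [1] TRUE
print(ncol(data))            # [1] 3
print(nrow(data))            # [1] 3  (the header line is not counted)
print(colnames(data))        # "Year" "Units" "Value"
```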

8.3.2 Access particular column

- A particular column can be accessed using the column header or the column index position.
- Single-bracket indexing (object["header"] or object[index]) returns the result as a data frame, while the $ operator returns the column as a vector.
- The function is.data.frame() checks whether an object is a data frame.

Syntax
Object$ColumnHeader
object["header"]
object[index of column]
Example
data$Salary
data["Value"]
data[2]
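A sketch comparing the return types of the access forms above. The data frame and its values are hypothetical stand-ins for a file read with read.csv().

```r
# Hypothetical data frame standing in for data read from a CSV file
data <- data.frame(Name = c("Ravi", "Meena"), Salary = c(500, 700))

print(class(data$Salary))     # [1] "numeric"    : $ returns a vector
print(class(data["Salary"]))  # [1] "data.frame" : [ ] by header
print(class(data[2]))         # [1] "data.frame" : [ ] by column index
print(class(data[[2]]))       # [1] "numeric"    : [[ ]] returns a vector
```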

8.3.3 Access set of columns

Specific sets of rows and columns can be accessed using index positions or column headers.

print(file[1:2]) # first and second columns

s <- file[1:2]
print(is.data.frame(s)) # TRUE

print(file[3:0]) # columns 3, 2 and 1 (in reverse order); index 0 is ignored
print(file[4:6]) # columns 4, 5 and 6
print(file[0:3]) # first three columns (index 0 is ignored)
print(file[1:2, ]) # first two records with all columns

Value <- file[["Value"]] # the Value column as a vector
x <- c(file["Year"], file["Value"]) # a list holding the Year and Value columns
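Why file[3:0] and file[0:3] behave as the comments describe: 3:0 is the sequence 3 2 1 0, and a 0 index selects nothing. A quick sketch on a hypothetical four-column data frame:

```r
# Hypothetical data frame with four columns
file <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)

print(3:0)                  # [1] 3 2 1 0
print(names(file[3:0]))     # "c" "b" "a" : 0 is dropped, order reversed
print(names(file[0:3]))     # "a" "b" "c"
print(nrow(file[1:2, ]))    # [1] 2  (first two records, all columns)
```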

8.3.4 Access part of column based on condition

The function subset() returns the part of a data frame that satisfies a condition.
More than one condition can be combined using the logical operators & (and) and | (or).
The records are returned as a data frame.

retval <- subset(data, salary == max(salary)) # data frame of all records satisfying the condition
info <- subset(data, salary > 600 & dept == "IT")
YearBase <- subset(file, Year == 2021)[c("Units", "Variable_code")] # matching records, only the given columns
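A runnable sketch of the subset() calls above on hypothetical employee records. It also shows subset()'s select argument, which keeps only the listed columns, as an alternative to indexing the result with [c(...)].

```r
# Hypothetical employee records
data <- data.frame(
  name   = c("Arun", "Bala", "Chitra", "Divya"),
  dept   = c("IT", "HR", "IT", "IT"),
  salary = c(550, 620, 800, 640)
)

retval <- subset(data, salary == max(salary))     # highest-paid record(s)
print(retval$name)                                 # [1] "Chitra"

info <- subset(data, salary > 600 & dept == "IT")  # both conditions must hold
print(info$name)                                   # [1] "Chitra" "Divya"

# select keeps only the listed columns
print(subset(data, dept == "HR", select = c(name, salary)))
```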
8.3.5 Convert Data Frame into List

Columns accessed with single brackets (by header or column index) are returned as a data frame. An individual column can instead be extracted as a plain vector using double brackets [[ ]], and as.list() converts the selected columns into a list.

Value <- file[["Value"]]
print(is.data.frame(Value)) # FALSE

or

Value <- as.list(file["Value"])
print(is.data.frame(Value)) # FALSE
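A short sketch contrasting the two results on a hypothetical data frame: [[ ]] yields a plain vector, while as.list() yields a list with one element per column. Note that is.list() is also TRUE for a data frame, so is.data.frame() is the check that tells them apart.

```r
# Hypothetical data frame
file <- data.frame(Year = c(2020, 2021), Value = c(150, 175))

v <- file[["Value"]]          # plain numeric vector
print(is.data.frame(v))       # [1] FALSE
print(is.numeric(v))          # [1] TRUE

l <- as.list(file)            # every column becomes a list element
print(is.data.frame(l))       # [1] FALSE
print(names(l))               # "Year" "Value"
print(l$Value)                # [1] 150 175
```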

8.3.6 Convert List into Data Frame

- A list (of values or vectors) can be changed into a data frame.
- Each element of the list is added as a column of the data frame.
- A header (column name) can be assigned to each column.
- All the elements must be of the same length.

To convert a list to a data frame in R, call the as.data.frame() function and pass the list as an argument to it.

The syntax to create an R data frame from a list using as.data.frame() is

as.data.frame(x, row.names = NULL, optional = FALSE, ...,
              cut.names = FALSE, col.names = names(x), fix.empty.names = TRUE,
              check.names = !optional,
              stringsAsFactors = FALSE)

When the Items are primitive

If the list is just a list of primitive values such as numbers, strings or logical values, then as.data.frame(x) returns a data frame with a single row, since each element of the list is converted to a column.

Examples
In the following program, we take a list of numbers and convert this list to a data frame.

example.R

list_1 <- list(41, 42, 43, 44, 45, 46) # Change to single row
df <- as.data.frame(list_1)
print(df)

When the Items are Vectors

If the list elements are vectors, each vector is converted into a column.
list_1 <- list(c(41, 42, 43), c(44, 45, 46)) # each vector becomes a column
df <- as.data.frame(list_1)
print(df)

Column Name
If the vectors are not named, column names are generated automatically.
Column names can also be supplied with the col.names parameter:
list_1 <- list(c(41, 42, 43), c(44, 45, 46))
df <- as.data.frame(list_1, col.names = c("a", "b"))
print(df)

Convert Named List to Data Frame

In the following program, we take a list of named vectors, list_1, convert it to a data frame df using the as.data.frame() function, and print the data frame to the console.

list_1 contains named vectors. These vectors become the columns of the data frame, and their names become the column names.

example.R

list_1 <- list(a = c(41, 42, 43), b = c(44, 45, 46))
df <- as.data.frame(list_1)
print(df)
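The three list-to-data-frame cases above can be checked by looking at the dimensions and column names of each result. A minimal sketch:

```r
# A list of single values becomes one row with one column per element
df1 <- as.data.frame(list(41, 42, 43, 44, 45, 46))
print(dim(df1))                    # [1] 1 6

# Unnamed vectors become columns; col.names supplies the headers
df2 <- as.data.frame(list(c(41, 42, 43), c(44, 45, 46)), col.names = c("a", "b"))
print(dim(df2))                    # [1] 3 2

# Named vectors: the names become the column names
df3 <- as.data.frame(list(a = c(41, 42, 43), b = c(44, 45, 46)))
print(names(df3))                  # "a" "b"
```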

8.3.7 Export Data Frame into CSV

The write.csv() function is used to export a data frame into a CSV file.
write.csv(DataFrameName, "Path to export the DataFrame\\FileName.csv", row.names = FALSE)

Exercise: explore how to append content to an existing CSV file instead of overwriting it each time.
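A round-trip sketch: write a data frame with write.csv(), inspect the header line of the file, and read it back with read.csv(). The data and the file name cities.csv are hypothetical; the file is placed in R's temporary directory.

```r
# Hypothetical data frame
df <- data.frame(id = 1:3, city = c("Chennai", "Madurai", "Salem"))

path <- file.path(tempdir(), "cities.csv")   # hypothetical output file
write.csv(df, path, row.names = FALSE)

# row.names = FALSE keeps R's row numbers out of the file
print(readLines(path)[1])        # the header line: "id","city"

back <- read.csv(path)
print(back)
```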

8.4 Help Command

To learn more about any function, use the help command:
help('print')

Enter this in the console to get the syntax and description of the function. The shorthand ?print does the same.
