
Edar M-1

Uploaded by

sknihal.cse
Copyright
© © All Rights Reserved

MODULE – 1

Introduction to R Programming

Reading and Getting Data into R, Viewing Named Objects, Types of Data Items, The
Structure of Data Items, Working with History Commands, Saving your Work in R.
Control Statements, Arithmetic and Boolean Operators, Functions, Return Values,
Environment and Scope Issues, Recursion.

What is R Language?
 R is a programming language.
 R is often used for statistical computing and graphical presentation to analyze and
visualize data.
 Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data
miners and statisticians for data analysis and developing statistical software.
 The official R software environment is an open-source free software environment
within the GNU package, available under the GNU General Public License.
 It is written primarily in C, Fortran, and R itself (partially self-hosting).
 Precompiled executables are provided for various operating systems.
 R has a command line interface. Multiple third-party graphical user interfaces are
also available, such as RStudio, an integrated development environment,
and Jupyter, a notebook interface.
 According to user surveys and studies of scholarly literature databases, R is one
of the most commonly used programming languages in data mining. As of
March 2022, R ranks 11th in the TIOBE index, a measure of programming
language popularity.

Note:
R Programming Paradigm: Multi-paradigm i.e.,
Procedural
Object oriented
Functional
Reflective
Imperative
Array
Designed by : Ross Ihaka and Robert Gentleman
Developer : R Core Team
First Appeared : August 1993
Stable Release : 4.1.3 / 10 March 2022
Typing Discipline : Dynamic
Filename extensions : .r, .rdata, .rds, .rda

Why Use R Language?


 It is a great resource for data analysis, data visualization, data science and
machine learning
 It provides many statistical techniques (such as statistical tests, classification,
clustering and data reduction)
 It is easy to draw graphs in R, such as pie charts, histograms, box plots and
scatter plots
 It works on different platforms (Windows, Mac, Linux)
 It is open-source and free
 It has large community support
 It has many packages (libraries of functions) that can be used to solve different
problems

Features of R Language
 R is a domain-specific programming language aimed at data analysis.
 It has some unique features which make it very powerful.
 Arguably the most important of these is the notion of vectors.
 Vectors allow us to perform a complex operation on a set of values in a
single command.
 The main features of R programming are as follows:
1. It is a simple and effective programming language which has been well
developed.
2. It is data analysis software.
3. It is a well-designed, easy, and effective language which includes
user-defined functions, loops, conditionals, and various I/O facilities.
4. It has a consistent and integrated set of tools which are used for data
analysis.
5. R contains a suite of operators for different types of calculation on
arrays, lists and vectors.
6. It provides effective data handling and storage facilities.
7. It is open-source, powerful, and highly extensible software.
8. It provides highly extensible graphical techniques.
9. It allows us to perform multiple calculations using vectors.
10. R is an interpreted language.
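
 The claim that vectors let us apply an operation to a whole set of values in a single command can be sketched as follows. The variable names and values here are made up purely for illustration; they do not come from the notes.

```r
# Hypothetical sample values, chosen only for illustration
heights_cm <- c(150, 162, 171, 180)
weights_kg <- c(50, 60, 72, 81)

# One command converts every element; no explicit loop is needed
heights_m <- heights_cm / 100

# Arithmetic between two vectors of equal length works element by element
bmi <- weights_kg / heights_m^2

print(heights_m)
print(round(bmi, 1))
```

The same element-by-element behaviour applies to most arithmetic and comparison operators in R.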

History of R Language
 The history of R goes back about 20-30 years.
 R was developed by Ross Ihaka and Robert Gentleman at the University of
Auckland, New Zealand, and the R Development Core Team currently develops
it.
 The name of the language is taken from the first letter of the first names of
both developers.
 The project was conceived in 1992.
 The initial version was released in 1995, and a stable beta version was
released in 2000.
 The following table shows the release, date, and description of notable
versions of R:

Release | Date | Description
0.16 | - | This is the last alpha version developed primarily by Ihaka and Gentleman. Much of the basic functionality from the "White Book" was implemented. The mailing lists commenced on 1 April 1997.
0.49 | 1997-04-23 | This is the oldest source release which is currently available on CRAN. CRAN is started on this date, with 3 mirrors that initially hosted 12 packages. Alpha versions of R for Microsoft Windows and the classic Mac OS are made available shortly after this version.
0.60 | 1997-12-05 | R becomes an official part of the GNU Project. The code is hosted and maintained on CVS.
0.65.1 | 1999-10-07 | First versions of update.packages and install.packages functions for downloading and installing packages from CRAN.
1.0 | 2000-02-29 | Considered by its developers stable enough for production use.
1.4 | 2001-12-19 | S4 methods are introduced and the first version for Mac OS X is made available soon after.
1.8 | 2003-10-08 | Introduced a flexible condition handling mechanism for signalling and handling condition objects.
2.0 | 2004-10-04 | Introduced lazy loading, which enables fast loading of data with minimal expense of system memory.
2.1 | 2005-04-18 | Support for UTF-8 encoding, and the beginnings of internationalization and localization for different languages.
2.6.2 | 2008-02-08 | Last version to support Windows 95, 98, Me and NT 4.0.
2.11 | 2010-04-22 | Support for Windows 64-bit systems.
2.12.2 | 2011-02-25 | Last version to support Windows 2000.
2.13 | 2011-04-14 | Added a new compiler function that allows speeding up functions by converting them to bytecode.
2.14 | 2011-10-31 | Added mandatory namespaces for packages. Added a new parallel package.
2.15 | 2012-03-30 | New load balancing functions. Improved serialisation speed for long vectors.
3.0.0 | 2013-04-03 | Support for numeric index values 2^31 and larger on 64-bit systems.
3.3.3 | 2017-03-06 | Last version to support Microsoft Windows XP.
3.4.0 | 2017-04-21 | Just-in-time compilation (JIT) of functions and loops to byte-code enabled by default.
3.5.0 | 2018-04-23 | Packages byte-compiled on installation by default. Compact internal representation of integer sequences. Added a new serialisation format to support compact internal representations.
3.6.0 | 2019-04-26 | Improved sampling from a discrete uniform distribution, which was noticeably non-uniform on large populations. New serialisation format supported since 3.5.0 becomes the default.
4.0.0 | 2020-04-24 | R now uses a stringsAsFactors = FALSE default, and hence by default no longer converts strings to factors in calls to data.frame( ) and read.table( ). Reference counting is used for tracking object sharing, which reduces the need for copying objects. New syntax for raw string constants.
4.1.0 | 2021-05-18 | Introduced |> as the pipe operator for base R syntax (similar to the %>% operator of the magrittr package) and the anonymous function shortcut syntax \(x) x+1.

Comparison between R and Python


 Data science deals with identifying, extracting, and representing meaningful
information from the data source.
 R, Python, SAS, SQL, Tableau, MATLAB, etc. are the most useful tools for data
science.
 R and Python are the most used ones. But still, it becomes confusing to choose
the better or the most suitable one among the two, R and Python.
The comparison is organised by comparison index, with the entry for R followed by the entry for Python.

Overview
R: R is an interpreted computer programming language which was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. The R Development Core Team currently develops R. R is also a software environment which is used for statistical analysis, graphical representation, reporting, and data modeling.
Python: Python is an interpreted high-level programming language used for general-purpose programming. Guido van Rossum created it, and it was first released in 1991. Python has a very simple and clean code syntax. It emphasizes code readability, and debugging is also simpler and easier in Python.

Specialties for data science
R: R packages have advanced techniques which are very useful for statistical work. The CRAN Task Views list many useful R packages. These packages cover everything from Psychometrics to Genetics to Finance.
Python: For finding outliers in a data set both R and Python are equally good. But for developing a web service to allow people to upload datasets and find outliers, Python is better.

Functionalities
R: For data analysis, R has inbuilt functionalities.
Python: Most of the data analysis functionalities are not inbuilt. They are available through packages like NumPy and pandas.

Key domains of application
R: Data visualization is a key aspect of analysis. R packages such as ggplot2, ggvis, lattice, etc. make data visualization easier.
Python: Python is better for deep learning because Python packages such as Caffe, Keras, OpenNN, etc. allow the development of deep neural networks in a very simple way.

Availability of packages
R: There are hundreds of packages and ways to accomplish the necessary data science tasks.
Python: Python has a few main packages, such as scikit-learn and pandas, for machine learning and data analysis, respectively.

R Advantages and Disadvantages


 R is one of the most popular programming languages for statistical modeling and
analysis.
 Like other programming languages, R has both advantages and disadvantages.
 It is a continuously evolving language, which means that many of its drawbacks
may fade away with future updates to R.
 The main pros and cons of R are as follows:

Advantages:
1) Open Source
 An open-source language is a language on which we can work without any need
for a license or a fee.
 R is an open-source language. We can contribute to the development of R by
optimizing our packages, developing new ones, and resolving issues.
2) Platform Independent
 R is a platform-independent language or cross-platform programming language
which means its code can run on all operating systems.
 R enables programmers to develop software for several competing platforms by
writing a program only once.
 R can run quite easily on Windows, Linux, and Mac.
3) Machine Learning Operations
 R allows us to do various machine learning operations such as classification and
regression.
 For this purpose, R provides various packages and features, including packages
for developing artificial neural networks. R is used by many leading data scientists.
4) Exemplary support for data wrangling
 R allows us to perform data wrangling.
 R provides packages such as dplyr and readr, which are capable of transforming
messy data into a structured form.
5) Quality plotting and graphing
 R simplifies quality plotting and graphing.
 R libraries such as ggplot2 and plotly produce visually appealing and
aesthetic graphs, which set R apart from other programming languages.
6) The array of packages
 R has a rich set of packages. The CRAN repository holds over 10,000 packages,
and the number is constantly growing.
 R provides packages for data science and machine learning operations.
7) Statistics
 R is mainly known as the language of statistics.
 This is the main reason why R is preferred over other programming languages for
the development of statistical tools.

8) Continuously Growing
 R is a constantly evolving programming language.
 Constantly evolving means that it changes and develops over time, much as our
taste in music and clothes evolves as we get older.
 R is state of the art and receives updates whenever new features are added.

Disadvantages
1) Data Handling
 In R, objects are stored in physical memory.
 This is in contrast with other programming languages like Python.
 R utilizes more memory as compared to Python.
 It requires the entire dataset to be held in one place, namely in memory.
 It is not an ideal option when we deal with Big Data.
2) Basic Security
 R lacks basic security features, which are an essential part of most programming
languages such as Python.
 Because of this, R has many restrictions, as it cannot easily be embedded in a
web application.
3) Complicated Language
 R is a very complicated language, and it has a steep learning curve.
 The people who don't have prior knowledge or programming experience may find
it difficult to learn R.
4) Weak Origin
 A main disadvantage of R is that it has limited built-in support for dynamic or 3D
graphics.
 The reason behind this is its origin. It shares its origin with a much older
programming language, "S".
5) Lesser Speed
 R is much slower than other programming languages such as MATLAB and Python.
 In comparison to other programming languages, R packages are much slower.
 In R, algorithms are spread across different packages.
 Programmers who have no prior knowledge of packages may find it difficult
to implement algorithms.

Applications of R
 R is used in many real-world applications. Some of the well-known users and
applications are as follows:
1. Facebook
2. Google
3. Twitter
4. HRDAG
5. Sunlight Foundation
6. RealClimate
7. NDAA
8. XBOX ONE
9. ANZ
10. FDA

Comments
 A comment is a description of a statement or program.
 Comments can be used to make code more readable.
 They can also be used to prevent execution when testing alternative code.
 Comments start with a #.
 When executing code, R ignores everything from the # to the end of the line.
 This example uses a comment before a line of code:

Example:
# This is a comment
"Hello World!"
Output:
[1] "Hello World!"

 This example uses a comment at the end of a line of code:
Example:
"Hello World!" # This is a comment
Output:
[1] "Hello World!"

 Comments do not have to be text that explains the code; they can also be used to
prevent R from executing a line of code:
Example 1:
# "Good morning!"
"Good night!"
Output:
[1] "Good night!"

Example 2:
"Good morning!"
#"Good night!"
Output:
[1] "Good morning!"

Multiline Comments:
 Unlike other programming languages, such as C, R has no syntax for
multiline comments. However, we can simply insert a # at the start of each line to
create multiline comments:
Example:
# This is a comment
# written in
# more than just one line
"Hello World!"
Output:
[1] "Hello World!"

Reading and Getting Data into R
 We will usually have sets of data to examine (that is, samples) and will want to
create more complex series of numbers to work on.
 We cannot perform any analyses if we do not have any data, so getting data into R
is a very important task.
 This section focuses on ways to create these complex samples and get data into R,
where we are able to undertake further analyses.

Using the combine Command for Making Data


 The simplest way to create a sample is to use the c( ) command. We can think of
this as short for combine or concatenate, which is essentially what it does. The
command takes the following form:
c(item.1, item.2, item.3, item.n)
 Everything in the parentheses is joined up to make a single item. More usually we
will assign the joined-up items to a named object:
sample.name = c(item.1, item.2, item.3, item.n)
 This is much like what we did when making simple result objects, except now our
sample objects consist of several values rather than a single value.

Entering Numerical Items as Data


 Numerical data do not need any special treatment; we simply type the values,
separated by commas, into the c( ) command.
 In the following example, imagine that we have collected some data (a sample)
and now want to get the values into R:
> data1 = c(3, 5, 7, 5, 3, 2, 6, 8, 5, 6, 9)
 Here we create a new object to hold our data and type the values into the
parentheses. The values are separated using commas.
 The "result" is not automatically displayed; to see the data we must type its name:
> data1
[1] 3 5 7 5 3 2 6 8 5 6 9

 Previously the named objects contained single values (the result of some
mathematical calculation).
 Here the named object data1 contains several values, forming a sample. The [1] at
the beginning shows us that the line begins with the first item (the number 3).
 When we get larger samples and more values, the display may well take up more
than one line, and R provides a number at the beginning of each
row so we can see “how far along” we are. In the following example we can see
that there are 41 values in the sample:

[1] 582 132 716 515 158 80 757 529 335 497 3369 746 201 277 593
[16] 361 905 1513 744 507 622 347 244 116 463 453 751 540 1950 520
[31] 179 624 448 844 1233 176 308 299 531 71 717

 The second row starts with [16], which tells us that the first value in that row is
the 16th in the sample. This simple index system makes it a bit easier to pick out
specific items.

 We can incorporate existing data objects with values to make new ones simply by
incorporating them as if they were values themselves (which of course they are).
In this example we take the numerical sample that we made earlier and
incorporate it into a larger sample:
> data1
[1] 3 5 7 5 3 2 6 8 5 6 9
> data2 = c(data1, 4, 5, 7, 3, 4)
> data2
[1] 3 5 7 5 3 2 6 8 5 6 9 4 5 7 3 4
 Here we take our first data1 object and add some extra values to create a new
(larger) sample. In this case we create a new item called data2, but we can also
overwrite the original as part of the process:
> data1 = c(6, 7, 6, 4, 8, data1)
> data1
[1] 6 7 6 4 8 3 5 7 5 3 2 6 8 5 6 9
 Now adding extra values at the beginning has modified the original sample.

Entering Text Items as Data


 If the data we require are not numerical, we simply use quotes to differentiate
them from numbers.
 There is no difference between using single and double quotes; R converts them
all to double.
 We can use either or both as long as the surrounding quotes for any single item
match, as shown in the following:
our.text = c("item1", "item2", 'item3')
 In practice though, it is a good habit to stick to one sort of quote; single quote
marks are easier to type.
 The following example shows a simple text sample comprising days of the
week:
> day1 = c('Mon', 'Tue', 'Wed', 'Thu')
> day1
[1] "Mon" "Tue" "Wed" "Thu"
 We can combine other text objects in the same way as we did for the numeric
objects previously, like so:
> day1 = c(day1, 'Fri')
> day1
[1] "Mon" "Tue" "Wed" "Thu" "Fri"
 If we mix text and numbers, the entire data object becomes a text variable and the
numbers are converted to text, shown in the following. We can see that the items
are text because R encloses each item in quotes:
> mix = c(data1, day1)
> mix
[1] "3" "5" "7" "5" "3" "2" "6" "8" "5" "6" "9" "Mon"
[13] "Tue" "Wed" "Thu" "Fri"
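
 The conversion that happens when text and numbers are mixed can be checked with class( ). The following is a minimal sketch using short made-up vectors (nums and days are illustrative names, not objects from the notes); as.numeric( ) is used here only to show that the original numbers can be recovered, with non-numeric text becoming NA.

```r
nums <- c(3, 5, 7)
days <- c("Mon", "Tue")

# Combining the two forces everything to the more general type (character)
mix <- c(nums, days)

print(class(nums))  # numeric before combining
print(class(mix))   # character after combining

# Converting back: real text cannot become a number, so it turns into NA.
# suppressWarnings() hides the coercion warning R would otherwise print.
back <- suppressWarnings(as.numeric(mix))
print(back)
```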

 The c( ) command used in the previous example is a quick way of getting a
series of values stored in a data object. This command is useful when we don’t
have very large samples, but it can be a bit tedious when a lot of typing is
involved. Other methods of getting data into R exist, which are examined below:

Using the scan Command for Making Data


 When using the c( ) command, typing all those commas to separate the values
can be a bit tedious.
 We can use another command, scan( ), to do a similar job. Unlike the c( )
command we do not insert the values in the parentheses but use empty
parentheses.
 The command then prompts us to enter our data. We generally begin by
assigning a name to hold the resulting data like so:
our.data = scan( )
 Once we press the Enter key we will be prompted to enter our data. The
following activity illustrates this process.
Example: Use scan( ) to Make Numerical Data

 Perform the following steps to practice storing data using the scan( ) command.
1. Begin the data entry process with the scan( ) command:
> data3 = scan( )
2. Now type some numerical values, separated by spaces, as follows:
1: 6 7 8 7 6 3 8 9 10 7
3. Now press the Enter key and type some more numbers on the fresh line:
11: 6 9
4. Press the Enter key once again to create a new line:
13:
5. Press the Enter key once more to finish the data entry:
13:
Read 12 items
6. Type the name of the object:
> data3
[1] 6 7 8 7 6 3 8 9 10 7 6 9

Entering Text as Data


 We can enter text using the scan( ) command, but if we simply enter our items in
quotes we will get an error message.
 We need to modify the command slightly like so:
scan(what = 'character')
 We must tell R to expect that the items typed in are characters, not numbers; to do
this we add the what = 'character' part in the parentheses. Note that character is
in quotes.
 Once the command runs it operates in an identical manner as before.
 In the following example a simple data item is created containing text stating the
days of the week:
> day2 = scan(what = 'character')
1: Mon Tue Wed
4: Thu
5:
Read 4 items
 Note that quotes are not needed for the entered data. R is expecting the entered
data to be text so the quotes can be left out.

 Typing the name of the object we just created displays the data, and we can see
that they are indeed text items and the quotes are there:
> day2
[1] "Mon" "Tue" "Wed" "Thu"
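
 The interactive sessions above cannot be replayed from a script. As a rough non-interactive sketch, scan( ) also accepts a text argument that supplies the input directly; the object name nums below is made up for illustration, and quiet = TRUE simply suppresses the "Read n items" message.

```r
# Text items: what = "character" tells scan() not to expect numbers
day2 <- scan(text = "Mon Tue Wed Thu", what = "character", quiet = TRUE)

# Numeric items: the default is to read numbers separated by white space
nums <- scan(text = "6 7 8 7 6 3", quiet = TRUE)

print(day2)
print(nums)
```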

Using the Clipboard to Make Data


 The scan( ) command is easier to use than the c( ) command because it does not
require commas.
 The command can also be used in conjunction with the clipboard, which is quite
useful for entering data from other programs (for example, a spreadsheet). To use
these commands, perform the following steps:
1. If the data are numbers in a spreadsheet, simply type the command in R as
usual before switching to the spreadsheet containing the data.
2. Highlight the necessary cells in the spreadsheet and copy them to the
clipboard.
3. Return to R and paste the data from the clipboard into R. As usual, R waits
until a blank line is entered before ending the data entry so we can continue
to copy and paste more data as required.
4. Once we are finished, enter a blank line to complete data entry.

 If the data are text, we add the what = 'character' instruction to the scan( )
command as before.
 At this point, if we can open the file in a spreadsheet, proceed with the
aforementioned four steps.
 If the file opens in a text editor or word processor, we must look to see how the
data items are separated before continuing.
 If the data are separated with simple spaces, we can simply copy and paste. If the
data are separated with some other character, we need to tell R which character is
used as the separator. For example, a common file type is CSV (comma-separated
values), which uses commas to separate the data items.
 To tell R we are using this separator, simply add an extra part to the command
like so:
scan(sep = ',')
 In this example R is told to expect a comma; note that we need to enclose the
separator in quotes. Here are some comma-separated numerical data:

23, 17, 12.5, 11, 17, 12, 14.5, 9
11, 9, 12.5, 14.5, 17, 8, 21

 To get these into R, use the scan( ) command like so:
> data4 = scan(sep = ',')
1: 23, 17, 12.5, 11, 17, 12, 14.5, 9
9: 11, 9, 12.5, 14.5, 17, 8, 21
16:
Read 15 items
> data4
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0 9.0 12.5 14.5 17.0 8.0 21.0

 Note that we have to press the Enter key on a blank line to finish the data entry.
Note also that some of the original data had decimal points (for example, 14.5); R
displays all the values with the same number of decimal places so that they line up.
If our data are separated by tab stops we can use sep = "\t" to tell R that this is the case.
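
 As a small sketch of the tab-separated case, the text argument of scan( ) is used below to stand in for pasted clipboard data. The values are a made-up subset of the numbers above and the object names are illustrative.

```r
# "\t" inside an R string is a tab character
tabbed <- "23\t17\t12.5\t11"

# sep = "\t" tells scan() that tabs, not spaces or commas, separate the items
data_tab <- scan(text = tabbed, sep = "\t", quiet = TRUE)

# Printing pads every value to the same number of decimal places
print(data_tab)
```

The stored values themselves are unchanged; only the printed display is padded.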
 If the data are text, we simply add what = 'character' and proceed as before. Here
are some text data contained in a CSV text file:
"Jan", "Feb", "Mar", "Apr", "May", "Jun"
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
 To get these data entered into R, perform the following steps:
1. Open the data file; in this case it has opened in a text editor and we see the
quotes and the comma separators.
2. Highlight the data required.
3. Copy to the clipboard.
4. Switch to R and type in the scan() command.
5. Paste the contents of the clipboard.
6. Press Enter on a blank line to end the data entry (this means that we have to
press Enter twice, once after the paste operation and once on the blank line).
7. Type the name of the data object created to view the entered data.

 The set of operations appears as follows:


> data5 = scan(sep = ',', what = 'char')
1: "Jan","Feb","Mar","Apr","May","Jun"
7: "Jul","Aug","Sep","Oct","Nov","Dec"
13:
Read 12 items
> data5

[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

 In this example both the sep = and what = instructions are used. Additionally, the
scan( ) command allows us to create data items from the keyboard or from
clipboard entries, thus enabling us to move data from other applications quite
easily. It is also possible to get the scan( ) command to read a file directly as
described in the following:

Reading a File of Data from a Disk


 To read a file with the scan( ) command we simply add file = 'filename' to the
command.
For example:
> data6 = scan(file = 'test data.txt')
Read 15 items
> data6
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0 9.0 12.5 14.5 17.0 8.0
21.0

 In this example the data file is called test data.txt, which is plain text, and the
numerical values are separated by spaces. Note that the filename must be
enclosed in quotes (single or double). Of course we can use the what = and sep =
instructions as appropriate.
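
 Reading from a file can be tried without preparing a file by hand. The sketch below writes a temporary file first; the file contents are made up and tempfile( ) merely stands in for a real file such as test data.txt.

```r
# Create a throwaway plain-text file holding space-separated numbers
path <- tempfile(fileext = ".txt")
writeLines(c("23 17 12.5 11", "17 12 14.5 9"), path)

# scan() reads across line breaks and returns a single numeric vector
data6 <- scan(file = path, quiet = TRUE)
print(data6)

# Remove the temporary file again
unlink(path)
```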
 R looks for our data file in the default directory. We can find the default
directory by using the getwd( ) command like so:
> getwd( ) # The first example shows the default for a Windows XP machine
[1] "C:/Documents and Settings/Administrator/My Documents"
> getwd( ) # The second example is for a Macintosh OS X system
[1] "/Users/markgardener"
> getwd( ) # The final example is for Linux (Ubuntu 10.10)
[1] "/home/mark"

 If our file is somewhere else we must type its name and location in full. The
location is relative to the default directory; for example, if the file were
on the desktop the command would be:
> data6 = scan(file = 'Desktop/test data.txt')

 The filename and directories are all case sensitive. We can also type in a URL
and link to a file over the Internet directly; once again the full URL is required.
 It may be easier to point permanently at a directory so that the files can be loaded
simply by typing their names. We can alter the working directory using the
setwd() command:
setwd('pathname')
 When using this command, replace the pathname part with the location of the
target directory. A relative location is interpreted relative to the current working
directory (a full path can also be given), so to switch to the Desktop folder the
following was used:
> setwd('Desktop')
> getwd( )
[1] "/Users/markgardener/Desktop"
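
 When changing the working directory inside a script it is often convenient to remember the old location and restore it afterwards. A minimal sketch, in which tempdir( ) stands in for a real folder such as Desktop and the object names are illustrative:

```r
old <- getwd()        # remember where we started
target <- tempdir()   # stand-in for a real target folder

setwd(target)         # move to the target directory
now <- getwd()
print(now)

setwd(old)            # restore the original working directory
```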

 We can look at a directory and see which files/folders are within it using the dir( )
or list.files( ) command:
dir( )
list.files( )
 The default is to show the files and folders in the current working directory, but
we can type in a path (in single quote marks) to list files in any directory. For
example:
dir('Desktop')
dir('Documents')
dir('Documents/Excel files')
 Note that the listing is in alphabetical order; files are shown with their extensions
and folders simply display the name.
 If we have files that do not have extensions (for example: .txt, .doc), it is harder
to work out which are folders and which are files.
 Invisible files are not shown by default, but we can choose to see them by adding
an extra instruction to the command like so:
dir(all.files = TRUE)
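
 The effect of all.files can be seen in a small throwaway folder. In this sketch the folder and file names are invented for illustration; files whose names start with a dot are the ones treated as invisible.

```r
# Build a small demonstration folder inside the session's temp directory
demo_dir <- file.path(tempdir(), "dir_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("b.txt", "a.csv", ".hidden")))
dir.create(file.path(demo_dir, "subfolder"), showWarnings = FALSE)

# Default listing: visible files and folders only
visible <- list.files(demo_dir)
print(visible)

# all.files = TRUE also shows dot-files (and the entries "." and "..")
everything <- list.files(demo_dir, all.files = TRUE)
print(everything)

# Tidy up
unlink(demo_dir, recursive = TRUE)
```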

Note:
 So far the data items that we have created are simple; they contain either a single
value (the result of a mathematical calculation) or several items.
 A sequence of data items of the same type is called a vector. If we only have a
single value, our vector contains only one item, that is, it has a length of 1.
 If we have multiple values, our vector is longer. When we display the vector R
provides an index to help us see how many items there are and how far along any
particular item is. Think of a vector as a one-dimensional data object; most of the
time we will deal with larger datasets than single vectors of values.
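
 The idea that even a single value is a vector of length 1 can be confirmed with length( ). A short sketch follows; the name single is made up, while sample1 reuses values that appear later in these notes.

```r
single <- 42
sample1 <- c(2, 5, 7, 3, 9, 4, 5)

print(length(single))     # a lone value is a vector of length 1
print(length(sample1))    # seven items
print(is.vector(single))  # TRUE: R has no separate scalar type

# The index shown in the display can also be used to pick out an item
third <- sample1[3]
print(third)
```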

Reading Bigger data Files


 The scan( ) command is helpful to read a simple vector. More often though, we
will have complicated data files that contain multiple items (in other words
two-dimensional items containing both rows and columns).
 Although it is possible to enter large amounts of data directly into R, it is more
likely that we will have our data stored in a spreadsheet. When we are sent data
items, the spreadsheet is also the most likely format we will receive.
 R provides the means to read data that is stored in a range of text formats, all of
which the spreadsheet is able to create.

The read.csv( ) Command


 In most cases we will have prepared data in a spreadsheet. Our dataset could be
quite large and it would be tedious to use the clipboard. When we have more
complex data it is better to use a new command, read.csv( ):
read.csv( )
 As we might expect, this looks for a CSV file and reads the enclosed data into R.
 We can add a variety of additional instructions to the command.
For example:
read.csv(file, sep = ',', header = TRUE, row.names)

 We can replace file with any filename as before.
 By default the separator is set to a comma but we can alter this if we need to.
This command expects the data to be in columns, and for each column to have a
helpful name.
 The instruction header = TRUE, the default, reads the first row of the CSV file
and uses its entries as the column names.
 We can override this with header = FALSE. The row.names part allows us to
specify row names for the data; generally this will be a column in the dataset (the
first one is most usual and sensible).
 We can set the row names to be one of the columns by setting row.names = n,
where n is the column number.
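
 The header and row.names instructions can be tried out without a file by passing the CSV content through the text argument of read.csv( ). The data below, including the site column, are invented for illustration.

```r
# A tiny CSV held in a string; the first row contains the column names
csv_text <- "site,abund,flow
A,9,2
B,25,3
C,15,5"

# header = TRUE (the default) plus row.names = 1: column 1 becomes row names
with_names <- read.csv(text = csv_text, header = TRUE, row.names = 1)
print(with_names)

# header = FALSE: the first row is treated as data and columns are named V1, V2, ...
no_header <- read.csv(text = csv_text, header = FALSE)
print(no_header)
```

With header = FALSE the heading row is read as ordinary data, which is why the second result has one more row than the first.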
Example:

 Some simple example data are shown in Table 1 below. Here we can see two
columns; each one is a variable. The first column is labeled abund; this is the
abundance of some water-living organism.
 The second column is labeled flow and represents the flow of water where the
organism was found.
Table 1: Simple Data From a Two Column Spreadsheet
ABUND FLOW
9 2
25 3
15 5
2 9
14 14
25 24
24 29
47 34

 In this case there are only two columns and it would not take too long to use the
scan( ) command to transfer the data into R.
 However, it makes sense to keep the two columns together and import them to R
as a single entity. To do so, perform the following steps:
1. If we have a file saved in a proprietary format (for example, XLS), save the data
as a CSV file instead.
2. Now choose a sensible name for the data and use the read.csv( ) command as follows:
> fw = read.csv(file.choose( ))
3. Select the file from the browser window. If we are using Linux, the filename
must be typed in full. Because the read.csv( ) command is expecting the data to
be separated with commas, we do not need to specify that. The data has headings
and because this is also the default, we do not need to tell R anything else.
4. To see the data, type its name like so:
> fw
abund flow
1 9 2
2 25 3
3 15 5
4 2 9
5 14 14
6 25 24
7 24 29
8 47 34
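
 Because file.choose( ) needs someone to pick a file interactively, the steps above cannot run unattended. As a rough non-interactive sketch, the Table 1 data can be written to a temporary CSV file and read back by path; the temporary file is an assumption made only so the example is self-contained.

```r
# Write the Table 1 data to a temporary CSV file
fw_path <- tempfile(fileext = ".csv")
writeLines(c("abund,flow", "9,2", "25,3", "15,5", "2,9",
             "14,14", "25,24", "24,29", "47,34"), fw_path)

# Read it by giving the path directly instead of using file.choose()
fw <- read.csv(fw_path)

str(fw)          # two integer columns, eight rows
print(fw$abund)  # a single column can be extracted with $

unlink(fw_path)
```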

Viewing Named Objects

 In a general way you “make” new items by providing a name followed by the
instruction that creates it.
 R is object oriented, which means that it expects to find named things to deal with
in some way.
 For example, if we are conducting an experiment and collecting data from several
samples, we want to create several named data objects in R in order to work on
them and do your analyses later on.
 As a reminder, the following examples show a few of the different ways you have
seen thus far to create named items:
answer1 = 23 + 17 / 2 + pi / 4
my.data = read.csv(file.choose( ))
sample1 = c(2, 5, 7, 3, 9, 4, 5)
 The following sections show how to view these items in R and remove them as
necessary.

Viewing Previously Loaded Named-Objects


 Once we have made a few objects and have them stored in R, we might forget
what we have previously loaded as time goes on.
 We need a way to see what R objects are available; to do this we use the ls( )
command like so:
ls( )

Viewing All Objects


 The ls( ) command lists all the named items that we have available.
 We can also use the objects( ) command, which is identical in function but slightly
longer to type. The result of either command is to list the objects stored in R at
the current moment:
> ls( )
[1] "answer1" "my.data" "sample1"

 This example contains three objects. The objects are listed in alphabetical order
(with all the uppercase before the lowercase); if we have a lot of objects, the
display will run to more lines like so:
[1] "A" "A.r" "B" "CI"
[5] "CI.1" "CI.dn" "CI.up" "Ell.F"

[9] "F" "F1" "area" "az"
[13] "bare" "beetle.cca" "beta" "bf"
[17] "bf.beta" "bf.lm" "biol" "biol.cca"
[21] "biomass" "bird" "bp" "bs"
[25] "bss" "but" "but.lm" "c3"

 Here there are 28 objects. At the beginning of each new row the display shows
you an index number relating to “how far along” the list of items you are. For
example the bare data object is the 13th item along (alphabetically).
 If we do not have any named objects at all, we get the following “result”:
> ls( )
character(0)

Viewing Only Matching Names


 We may want to limit the display to objects with certain names; this is especially
helpful if we have a lot of data already in R.
 We can limit the display by giving R a search pattern to match.

For example:
> ls(pattern = 'b')
[1] "bare" "beetle.cca" "beta" "bf" "bf.beta"
[6] "bf.lm" "biol" "biol.cca" "biomass" "bird"
[11] "bp" "bs" "bss" "but" "but.lm"
[16] "cbh" "cbh.glm" "cbh.sf" "food.b" "nectar.b"
[21] "pred.prob" "prob2odd" "tab.est" "tab1" "tab2"

 Here the pattern looks for everything containing a “b”. This is pretty broad so we
can refine it by adding more characters:
> ls(pattern = 'be')
[1] "beetle.cca" "beta" "bf.beta"

 Now the pattern picks up objects with “be” in the name. If we want to search for
objects beginning with a certain letter, we use the ^ character like so:
> ls(pattern = '^b')
[1] "bare" "beetle.cca" "beta" "bf" "bf.beta"

[6] "bf.lm" "biol" "biol.cca" "biomass" "bird"
[11] "bp" "bs" "bss" "but" "but.lm"

 Compare the following search listings. In the first case the pattern matches
objects beginning with “be” but in the second case the letters are enclosed in
square brackets:
> ls(pattern = '^be')
[1] "beetle.cca" "beta"

> ls(pattern = '^[be]')


[1] "bare" "beetle.cca" "beta" "bf" "bf.beta"
[6] "bf.lm" "biol" "biol.cca" "biomass" "bird"
[11] "bp" "bs" "bss" "but" "but.lm"
[16] "eF" "eF2" "env"

 The effect of the square brackets is to isolate the letters; each is treated as a
separate item, hence objects beginning with “b” or “e” are matched. We can
get the same result using a slightly different approach as well:
> ls(pattern = '^b|^e')
 The vertical bar (sometimes called a pipe) character stands for or, that is, we
want to search for objects beginning with “b” or beginning with “e”. The pattern
must not contain spaces around the bar, because a space would be treated as a
literal character to match.
 To find objects ending with a specific character, we use a dollar sign at the end
like so:
> ls(pattern = 'm$')
[1] "bf.lm" "but.lm" "cbh.glm" "dep.pm"
[5] "dm" "frit.glm" "frit.lm" "frit.sum"
[9] "hlm" "mf.lm" "mr.lm" "n.glm"
[13] "newt.glm" "newt.test.glm" "sales.lm" "sm"
[17] "t.glm" "test.glm" "test.lm" "test1.glm"
[21] "tt.glm" "worm.pm"

 We can use the period as a wildcard and R will match any character:
> ls(pattern = 'a.e')
[1] "area" "bare" "date" "sales" "sales.frame"
[6] "sales.lm" "sales.ts" "water"

> ls(pattern = 'a..e')
[1] "tab.est" "treatment"

 In the first example a single wildcard was used but in the second there are two.
This pattern matching uses more or less the same conventions as standard
Regular Expressions.
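
The patterns described above can be tried safely on a small set of names. This is a minimal sketch, assuming a scratch environment e filled with a few made-up object names (none of them are needed in the real workspace). It also uses grepl( ), which applies the same pattern rules to any character vector.

```r
# A scratch environment holding a few hypothetical object names
e <- new.env()
for (nm in c("bare", "beta", "bf.lm", "env", "eF", "area", "tab.est")) {
  assign(nm, 0, envir = e)
}

ls(envir = e, pattern = "^b")      # names beginning with "b"
ls(envir = e, pattern = "^[be]")   # names beginning with "b" or "e"
ls(envir = e, pattern = "^b|^e")   # same result, written with "|" (no spaces)
ls(envir = e, pattern = "m$")      # names ending with "m"
ls(envir = e, pattern = "a.e")     # "a", any one character, then "e"

# grepl() reports, item by item, whether the pattern matches
grepl("a..e", c("tab.est", "area"))
```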

Removing objects from R


 We can remove objects from memory and therefore permanently delete them
using the rm( ) or remove( ) commands.
 To remove objects we can simply list them in the parentheses of the command:
rm(list)
remove(list)
 We can type the names of the objects separated by commas.
For example:
> rm(answer1, my.data, sample1)
 This removes the objects answer1, my.data, and sample1 from the workspace.
 We can use the ls( ) command to produce a list, which will then be deleted. We
need to include the instruction list in the command like so:
> rm(list = ls(pattern = '^b'))
 Here the ls( ) command is used to search for objects beginning with “b” and
remove them.
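
Because rm( ) deletes objects permanently, it can be worth practising in a scratch environment first. The sketch below assumes an environment called e with three made-up objects. The envir argument is used here only to keep the real workspace untouched; it is not required in normal use.

```r
e <- new.env()                      # scratch environment for the demonstration
assign("answer1", 40, envir = e)
assign("bare", 1, envir = e)
assign("beta", 2, envir = e)

rm("answer1", envir = e)            # remove one object by name

# Remove everything whose name starts with "b"
rm(list = ls(envir = e, pattern = "^b"), envir = e)

ls(envir = e)                       # nothing is left
```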

Types of Data Items


 Data items can exist in one of two forms: numbers or text values. R regards these
as numeric or character.

Number Data:
 Values that are whole numbers can be stored as integer values, whereas values
that contain decimals are numeric.
 The distinction is fairly minor, but if we have a list of values that contains both
integers and decimals, R will regard the entire sample as numeric.
> data3
[1] 6 7 8 7 6 3 8 9 10 7 6 9
> data7

[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0 9.0 12.5 14.5 17.0 8.0 21.0

 In the first example the values are all whole numbers.


 In the second example some of them have decimal places, but R appends
decimals to all of the data to achieve an equal level of precision; in this case they
all have at least one decimal place.
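
The class( ) command shows how R has actually stored a set of numbers. The short sketch below uses made-up vectors. Note that a whole number typed without a suffix is stored as numeric (double), and that the L suffix, which the examples above do not use, asks for integer storage.

```r
whole <- c(6, 7, 8)        # typed without a suffix: stored as double
ints  <- c(6L, 7L, 8L)     # the L suffix requests integer storage
mixed <- c(6L, 7.5)        # combining integer and decimal values

class(whole)       # "numeric"
class(ints)        # "integer"
class(mixed)       # "numeric": the whole vector is promoted
is.numeric(ints)   # TRUE: integers still count as numeric data
```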

Text Items:
 Data that are not numbers are text. R recognizes two sorts of text
data items.
 We can think of the first kind as plain text labels; R calls these character values.
> data8
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
 These items display as plain text and have the quote marks. However, another
type of non-numeric data is called a factor:
> cut
[1] mow mow mow mow mow unmow unmow unmow unmow
Levels: mow unmow
 Here the data are text but they are not in quotes.
 When they are displayed the text appears plain without quote marks, but with an
additional line (Levels) listing the different values that occur in the data.

Converting Between Number and Text Data:


 We can shift between the two kinds of text quite easily.
 The following example begins with data that is a factor. The as.character( )
command is used to convert to plain text. Then the plain text is converted back to
a factor using the as.factor( ) command:
> cut
[1] mow mow mow mow mow unmow unmow unmow unmow
Levels: mow unmow
> cut2 = as.character(cut)
> cut2
[1] "mow" "mow" "mow" "mow" "mow" "unmow" "unmow" "unmow" "unmow"
> cut3 = as.factor(cut2)
> cut3

[1] mow mow mow mow mow unmow unmow unmow unmow
Levels: mow unmow

 In this case new data objects were created but the original object could be
overwritten with the new one.

 We can do a similar thing with numbers. If we begin with data that contain
decimals, that is, numeric, we can convert to integers using the as.integer( )
command. The fractional part is simply dropped, so 12.5 becomes 12.
 We can convert integer values to numeric using the as.numeric( ) command; the
dropped decimals are not restored:

> data7
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0 9.0 12.5 14.5 17.0 8.0 21.0
> data7i = as.integer(data7)
> data7i
[1] 23 17 12 11 17 12 14 9 11 9 12 14 17 8 21
> data7n = as.numeric(data7i)
> data7n
[1] 23 17 12 11 17 12 14 9 11 9 12 14 17 8 21
 We can also convert numbers to text using as.character( ):
> data7c = as.character(data7)
> data7c
[1] "23" "17" "12.5" "11" "17" "12" "14.5" "9" "11" "9" "12.5"
[12] "14.5" "17" "8" "21"

 We can also try converting text into numbers like so:


> data7nt = as.numeric(data7c)
> data7nt
[1] 23.0 17.0 12.5 11.0 17.0 12.0 14.5 9.0 11.0 9.0 12.5 14.5 17.0 8.0 21.0

 This works out fine if the text is sensible; in the preceding example the text
values were originally numbers.
 Now see what happens if you try this on a factor:
> cut
[1] mow mow mow mow mow unmow unmow unmow unmow

> cut.n = as.numeric(cut)
> cut.n
[1] 1 1 1 1 1 2 2 2 2

 Here we get a surprising (but potentially useful) result; the numbers are the
codes of the factor levels (1 for mow, 2 for unmow).
 If we try to convert something that really is not going to work, R gives a warning
like so:
> data8
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
> data8n = as.numeric(data8)
Warning message:
NAs introduced by coercion
> data8n
[1] NA NA NA NA NA NA NA NA NA NA NA NA

 In this case the data is plain text and cannot be forced into any sensible number,
so we end up with a string of NAs.
 If we were to convert the plain text to a factor first and then to a number, that
would be a different story:
> data8c = as.numeric(as.factor(data8))
> data8c
[1] 5 4 8 1 9 7 6 2 12 11 10 3
 Here one command is nested inside the other. R evaluated the as.factor( ) part
first and then converted that into numbers.
 We started with twelve months and can see that they have been assigned numbers;
notice how R has indexed them alphabetically.
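
Two practical consequences of the level codes are sketched below with made-up data. First, supplying the levels explicitly (here with the built-in constant month.abb) keeps calendar order instead of alphabetical order. Second, when a factor holds labels that look like numbers, converting through as.character( ) recovers the values rather than the codes.

```r
months <- c("Jan", "Feb", "Mar")

# Default levels are sorted alphabetically: Feb, Jan, Mar
as.numeric(as.factor(months))                    # 2 1 3

# Explicit levels keep calendar order
as.numeric(factor(months, levels = month.abb))   # 1 2 3

# A factor whose labels look like numbers
f <- factor(c("10", "20", "10"))
as.numeric(f)                  # 1 2 1    (the level codes)
as.numeric(as.character(f))    # 10 20 10 (the values themselves)
```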

Statement

 A statement is the smallest logical entity that can exist independently in an R
program.
 Nothing smaller than a statement (an expression, a variable, a constant, etc.) can
exist independently in an R program until it is made part of a statement.

Classification of Statements:
1. Selection Statements
2. Looping Statements
3. Unconditional Statements

1. Selection Statements:
 Based upon the outcome of a particular condition, selection statements transfer
control from one point to another.
 Selection statements select a statement to be executed among a set of various
statements.
 The selection statements available in R are as follows:
a. if statement
b. if-else statement
c. nested if-else statement
d. if-else-if ladder statement
e. switch statement

a) if statement

 This control structure checks whether the expression provided in parentheses is
TRUE.
 If it is, the statements in braces {} are executed; otherwise they are skipped.

Syntax:

if(expression){
statements
....
....
}
Flow Diagram: if statement
Example: # To implement if statement in R

print("Enter the m:")
m=scan( )
if(m<10)
{
print("number is less than 10")
}

Input & Output:

[1] "Enter the m:"
1: 7
2:
Read 1 item
[1] "number is less than 10"

b) if-else statement

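 It is similar to the if statement, but when the test expression fails, the
statements in the else block are executed.
 The else keyword must follow the closing brace of the if block on the same line;
at the top level, R treats an else that starts a new line as a syntax error.

Syntax:
if(expression){
statements
....
....
} else {
statements
....
....
}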
Flow Diagram: if-else statement

Example: if-else statement

x <-5
# Check value is less than or greater than 10
if(x > 10){
print(paste(x, "is greater than 10"))
}else{
print(paste(x, "is less than 10"))
}

Output:

[1] "5 is less than 10"

c) Nested if-else statement

 When an if-else block appears as a statement inside an if block, or inside an
else block, it is called a nested if-else statement.
 When the parent if condition is true, the child if condition inside it is checked; if
the child condition is false, the child else block is executed. All of this happens
within the parent if block.
 If the parent if condition is false, the parent else block is executed, which may
also contain a child if-else statement.
Syntax:
if(parent condition is true) {
if( child condition 1 is true) {
execute this statement
} else {
execute this statement
}
} else {
if(child condition 2 is true) {
execute this statement
} else {
execute this statement
}
}

Flow Diagram: Nested if-else statement

Example: # Nested if-else statement

print("Enter the a:")


a=scan( )
print("Enter the b:")
b=scan( )
print("Enter the c:")
c=scan( )

if(a>b){
if(a>c){
print("a is greater")
}
else{
print("c is greater")
}
}else{
if(b>c){
print("b is greater")
}else{
print("c is greater")
}
}

Input:
[1] "Enter the a:"
1: 40
2:
Read 1 item
[1] "Enter the b:"
1: 20
2:
Read 1 item
[1] "Enter the c:"
1: 10
2:
Read 1 item
Output:
[1] "a is greater"

d) if-else-if ladder
 It is similar to the if-else statement; the only difference is that another if
statement is attached to the else.
 If the condition given to the if block is true, the statements within that block are
executed. Otherwise the next else-if condition is checked, and if it is true the
statements within its block are executed, and so on down the ladder.

Syntax:

if(condition 1 is true) {
execute this statement
} else if(condition 2 is true) {
execute this statement
} else {
execute this statement
}

Flow Diagram: if-else-if ladder

Example : # R if-else-if ladder


a <- 67
b <-76
c <-99

if(a > b && b > c){


print("condition a > b > c is TRUE")
} else if(a < b && b > c){
print("condition a < b > c is TRUE")
} else if(a < b && b < c){
print("condition a < b < c is TRUE")
}

Output:

[1] "condition a < b < c is TRUE"
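
A ladder usually ends with a plain else that catches every case the earlier tests missed. The following is a minimal sketch using a made-up helper function, size_label( ), with arbitrary cut-off values of 10 and 100.

```r
# Hypothetical helper: label a number by comparing it with two cut-offs
size_label <- function(x) {
  if (x < 10) {
    "small"
  } else if (x < 100) {
    "medium"
  } else {
    "large"   # final else: reached only when every earlier test is FALSE
  }
}

size_label(5)
size_label(50)
size_label(500)
```

The conditions are tested from top to bottom and only the first block whose condition is true is executed.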

e) switch statement

 In a switch statement, the expression is matched against a list of cases. If a
match is found, switch returns that case’s value. When the expression is a number,
it selects a case by position and there is no default case; if the number does not
correspond to any case, switch returns NULL.

Syntax:
switch (expression, case1, case2, case3,…,case n )
Flow Diagram: switch Statement
Example: # switch statement

#to implement switch statement


print("Enter the b:")
b=scan( )
a<-switch(
b,
"apple",
"strawberry",
"Banana",
"grapes",
"orange"
)
print(a)

Input:

[1] "Enter the b:"


1: 3
2:
Read 1 item

Output:

[1] "Banana"
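
The two ways of selecting a case can be compared directly. This sketch uses a made-up function, fruit_colour( ). With a character selector, cases are matched by name and a trailing unnamed value acts as the default; with a numeric selector, a position that is out of range gives NULL.

```r
# Numeric selector: picks by position; out of range gives NULL
is.null(switch(7, "apple", "strawberry", "Banana"))   # TRUE

# Character selector: matches names; the last unnamed value is the default
fruit_colour <- function(fruit) {
  switch(fruit,
         apple  = "red",
         banana = "yellow",
         "unknown")
}

fruit_colour("banana")
fruit_colour("kiwi")
```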

2. Looping Statements
 In R programming, we need a control structure to run a block of code
multiple times.
 Loops are among the most fundamental and powerful programming concepts.
 A loop is a control statement that allows multiple executions of a statement or a
set of statements. The word ‘looping’ means cycling or iterating.
 There are two components of a loop: the control statement and the loop body.
 The control statement controls the execution of statements depending on the
condition, and the loop body consists of the set of statements to be executed.
 To execute the same lines of code many times in a program, a programmer can
simply use a loop.
 There are three types of loop in R programming:
a. for Loop
b. while Loop
c. repeat Loop
a) for loop statement:
 It is a control statement that makes it easy to construct a loop that runs a
statement or a set of statements multiple times.
 A for loop is commonly used to iterate over the items of a sequence.
 It is an entry controlled loop: R checks whether any items remain in the sequence
before each pass, so the loop body is not executed at all if the sequence is
empty.
Syntax:
for (value in sequence)
{
statement
}

Flow Diagram: for Loop statement

Example: for loop statement


print("Enter the m:")
m=scan( )
for(i in 1:m){
print(i)
}
Input & Output:
[1] "Enter the m:"
1: 10
2:
Read 1 item
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
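
One detail of the 1:m form is worth checking when m can be zero. The sketch below, with m set to 0 purely for illustration, counts how many times the body runs. It compares 1:m with seq_len(m), a base R function that returns an empty sequence for zero.

```r
m <- 0

# 1:m with m = 0 is the sequence 1, 0, so the body runs twice
count_colon <- 0
for (i in 1:m) count_colon <- count_colon + 1

# seq_len(m) is empty when m = 0, so the body never runs
count_seq <- 0
for (i in seq_len(m)) count_seq <- count_seq + 1

c(count_colon, count_seq)   # 2 0
```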

b) while loop
 It is a control statement that runs a statement or a set of statements
repeatedly as long as the given condition remains true.
 It is also an entry controlled loop: the test condition is checked first and then the
body of the loop is executed, so the loop body is not executed at all if the test
condition is false at the start.

Syntax:
while ( condition )
{
statement
}

Flow Diagram: while loop


Example: while loop statement

print("Enter the m:")


m=scan( )
i=1
while(i<=m){
print(i)
i<-i+1
}
Input & Output:
[1] "Enter the m:"
1: 5
2:
Read 1 item
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
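
The entry controlled behaviour can be seen without keyboard input. This is a small sketch built around a made-up function, halvings( ), which counts how many times a number can be halved while it is still at least 1.

```r
# Hypothetical helper: count the halvings needed to bring n below 1
halvings <- function(n) {
  steps <- 0
  while (n >= 1) {       # tested before every pass, including the first
    n <- n / 2
    steps <- steps + 1
  }
  steps
}

halvings(8)     # 4
halvings(0.5)   # 0: the condition is false at entry, so the body never runs
```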

c) repeat loop statement


 It is a simple loop that runs the same statement or group of statements
repeatedly until a stop condition is encountered.
 The repeat loop has no condition of its own to terminate the loop; the programmer
must place a condition within the loop’s body and use a break statement to
terminate the loop.
 If the body of the repeat loop contains no such condition and break, it will
iterate infinitely.
Syntax:
repeat
{
statement
if( condition )
{
break
}
}
Flow Diagram: repeat loop statement

 To terminate the repeat loop, we use a jump statement, the break keyword.
The program below illustrates the use of a repeat loop in R programming.

Example: repeat loop statement

i=1
repeat{
print(i)
i<-i+1
if(i>5){
break
}
}

Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
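
Because the exit test sits inside the body, the statements placed before it always run at least once. The sketch below starts with i already above the limit (the value 10 is chosen only to show this) and counts the passes.

```r
i <- 10
runs <- 0
repeat {
  runs <- runs + 1      # executed before any test is made
  if (i > 5) break      # exit test placed inside the body
  i <- i + 1
}
runs   # 1
```

A while loop written with the condition i <= 5 would not have run at all for the same starting value.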

Nested loops:
 A nested for-loop has a for-loop inside another for-loop.
 For each iteration of the outer for-loop, the inner loop runs through its whole
sequence.
 Once the inner for-loop has finished for a particular outer iteration, the outer
for-loop moves on to its next iteration and the inner loop is executed again for
that iteration.
 This process repeats until the outer for-loop has worked through its sequence.

Syntax:
for (element1 in sequence1) {
for(element2 in sequence2){
# body
}
}

Example: Nested loops

print("Enter the m:")


m=scan( )
for(i in 1:m){
for(j in 1:i)
{
print(paste(i,"*",j,"=",i*j))
}
}

Input & Output:


[1] "Enter the m:"
1: 2
2:
Read 1 item
[1] "1 * 1 = 1"
[1] "2 * 1 = 2"
[1] "2 * 2 = 4"

3. Unconditional Statements
 The unconditional statements available in R are as follows:
a. break statement
b. return statement
c. next Statement
a) break statement:
 A break statement is used to stop the iteration of a loop, i.e., to terminate it
based on a condition and move the flow of control to the statement just after the
loop. It can be used in for, while and repeat loops.
 If there are nested loops, this statement terminates only the innermost loop that
contains it.

Syntax:
...
if (condition) {
break
}
...
Flow Diagram: break statement

Example: break statement

for (x in 1:10) {
if (x == 5) {
break
}
print(x)
}

Output:
[1] 1
[1] 2
[1] 3
[1] 4
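
The effect of break inside nested loops can be checked with a short sketch. The vector visited is a made-up name used to record which (i, j) pairs were reached before the inner loop stopped.

```r
visited <- character(0)
for (i in 1:3) {
  for (j in 1:3) {
    if (j == 2) break                    # leaves the inner loop only
    visited <- c(visited, paste(i, j))
  }
}
visited   # "1 1" "2 1" "3 1"
```

The outer loop still completes all three iterations; only the inner loop is cut short each time.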

b) return statement:
 The return statement returns the result of a function and hands control back to
the calling function.

Syntax:

return(expression)

Example: return statement

func <- function(x){


if(x > 0){
return("Positive")
}else if(x < 0){
return("Negative")
}else{
return("Zero")
}
}

func(1)
func(0)
func(-1)

Output:

[1] "Positive"
[1] "Zero"
[1] "Negative"
c) next Statement:
 next statement is used to skip the current iteration without executing the further
statements and continues the next iteration cycle without terminating the loop.

Syntax:

loop(condition){
if (condition) {
next
}
expression statement
}

Flow Diagram: next statement

Example: next statement


# Defining vector
x <-1:10
# Print even numbers
for(i in x){
if(i%%2 != 0){
next #Jumps to next loop
}
print(i)
}

Output:
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10

Design and implement a program to find the factorial of a given number

 The factorial of a positive whole number is the product of all whole numbers
from 1 to that number. The factorial of 0 is defined as 1, and negative
numbers do not have a factorial.
 For example, the factorial of 4 is 24:
4! = 1*2*3*4 = 24

Symbol
 The symbol for factorial is "!" (the exclamation mark), which is placed after the
number, as in 4!.

Example 1: Find factorial of a number with for loop

 In this example we find the factorial using a for loop.
 To understand this example we should know the basics of the for loop and the
if-else structure in R.

findfactorial <- function(n){
factorial <- 1
if ((n == 0) || (n == 1))
factorial <- 1
else{
for(i in 1:n)
factorial <- factorial * i
}
return (factorial)
}

Output:
findfactorial(0)
[1] 1
findfactorial(3)
[1] 6

 In this simple example we have defined a function findfactorial which takes one
argument, the number whose factorial we want to find.
 A variable named factorial is defined, and because the smallest possible factorial
is 1, we assign 1 to that variable.
 After that we use an if-else structure. If the number is 0 or 1 then the factorial
is 1; this is the if condition.
 In the else block the number is more than 1, which means the factorial can be
calculated by multiplying all numbers from 1 to that number.
 We use a variable i which goes from 1 to the number, and in each iteration of the
for loop the running product is multiplied by i.
 In the end a return statement is used to return factorial from this function.
 Of course this can be done without a function, but with a function the logic is
clearer. Functions are the basic units of R programming, and once defined they
can be called again and again.

 After grasping the logic of the factorial in R code, we can write another R
program, which reports the factorial of the number supplied by the user. Here is
another program, run in RStudio:

findfact <- function(n){
factorial <- 1
if( n < 0 )
print("Factorial of negative numbers is not possible")
else if( n == 0 )
print("Factorial of 0 is 1")
else {
for(i in 1:n)
factorial <- factorial * i
print(paste("Factorial of", n, "is", factorial))
}
}

Output:
>findfact(-3)
[1] "Factorial of negative numbers is not possible"
> findfact(0)
[1] "Factorial of 0 is 1"
> findfact(5)
[1] "Factorial of 5 is 120"

 R has a built-in function, factorial( ), which we may use to find factorials; the
functions written here are for learning how to write functions and use the for
loop, if-else and if-else-if structures.

Example 2: Find factorial of a number with while loop


 In this example we find the factorial with a while loop in R. To understand
this example we should know the basics of the while loop and the if-else structure
in R.

findfactorial <- function(n){


factorial <- 1
if(n==0 | n==1){
factorial <- 1
} else{
while(n >= 1){
factorial <- factorial * n
n <- n-1
}
}
return (factorial)
}
Example 3: Factorial of a number with Recursion
 In this example you will find factorial in R with recursion or recursive function.
 For understanding this example you should know the basics of Recursion in R

#Factorial with recursive function

findfactorial <- function(n) {
if (n == 0) return (1)
else
return (n * findfactorial (n-1))
}

Output:
 We may call this function with any non-negative whole number and it will return
the factorial of that number.

> findfactorial(5)
[1] 120
> findfactorial(4)
[1] 24
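
A recursive function needs a base case that is always reached, otherwise it keeps calling itself. The recursive version above never reaches n == 0 when given a negative number or a fraction. The following is a minimal sketch of a guarded variant; the name safe_factorial( ) and the error message are illustrative and not part of the original program. The last line compares the result with R's built-in factorial( ).

```r
# Hypothetical variant of findfactorial() that rejects invalid input
safe_factorial <- function(n) {
  if (n < 0 || n != floor(n)) {
    stop("n must be a non-negative whole number")
  }
  if (n == 0) return(1)          # base case: stops the recursion
  n * safe_factorial(n - 1)      # recursive case: n! = n * (n-1)!
}

safe_factorial(5)                    # 120
safe_factorial(5) == factorial(5)    # TRUE: agrees with the built-in
```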
