BA-unit 3.
3
Getting Started with R
Ms. Asha Yadav
Assistant Professor
Department of Computer Science
School of Open Learning
University of Delhi
STRUCTURE
3.1 Learning Objectives
3.2 Introduction
3.3 Installation
3.4 Importing Data from Spreadsheet Files
3.5 Commands and Syntax
3.6 Data Type
3.7 Operators
3.8 Functions
3.9 Summary
3.10 Answers to In-Text Questions
3.11 Self-Assessment Questions
3.12 References
3.13 Suggested Readings
Department of Distance & Continuing Education, Campus of Open Learning,
School of Open Learning, University of Delhi
3.2 Introduction
You have already studied the importance of data analytics, and in the previous lesson we explored data preparation, summarisation and visualisation with spreadsheets. In this chapter we introduce R programming (referred to as R henceforth), a popular open-source programming language designed primarily for statistical computing and data analysis.

Consider a retail company, “ShopSmart,” that needs to analyse its daily sales. It currently uses Excel for basic data handling, but this becomes challenging as the volume of data grows. Switching to R helps ShopSmart analyse larger datasets seamlessly, create informative visualizations, and understand customer purchasing trends. This ultimately leads to better marketing strategies and inventory management.

R was developed in the early 1990s by statisticians Ross Ihaka and Robert Gentleman. It has become a widely accepted tool among data scientists, researchers, and analysts. Compared with traditional tools like Excel, R is more flexible and scalable for data analysis, because programmers can perform more complex computations and produce richer visual displays on large datasets. R is used in a wide range of areas. In business and commerce, it is used for:
- market prediction
- financial risk appraisal
- investment optimization
- customer behaviour analysis
- sales forecasting
- inventory control
- customer feedback analysis
- customer segmentation
- campaign performance analysis

R is therefore an important tool for data-driven decision making in commerce, and it gives commerce students the ability to process and interpret data efficiently. Let’s look at some of the advantages of R and why we should prefer it for data analysis.
Commercial statistical software often requires costly licenses, but R is completely free to use. This makes it accessible to anyone interested in learning data analysis without any financial investment.
R is a versatile statistical platform that provides a wide range of data analysis techniques, so virtually any type of data analysis can be performed with it efficiently. It also has state-of-the-art graphics capabilities for visualization.
Data is usually gathered from a variety of sources, and analysing it in one place has its own challenges. R can read data from many kinds of sources, including text files, spreadsheets, databases, and web APIs, which makes it suitable for almost any business environment.
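As a small illustration of reading a text file, the sketch below writes a tiny CSV file to a temporary location. It then reads the file back with base R's read.csv(). The file contents and the variable names (csv_path, sales) are made up for the example. With real data you would pass the path to your own file instead.

```r
# Create a small, made-up CSV file in a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("day,units,revenue",
             "Mon,12,2400",
             "Tue,15,3000",
             "Wed,9,1800"), csv_path)

# Read the file into a data frame (read.csv() comes with base R).
sales <- read.csv(csv_path)

str(sales)               # 3 observations of 3 variables
summary(sales$revenue)   # quick numeric summary of one column
```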
3.3 Installation
To begin with R, students need to install two things. The first is R, the base programming language. The second is RStudio, an Integrated Development Environment (IDE) that makes working with R much easier. RStudio provides a more user-friendly interface than R’s base interface, which makes coding, viewing output, and managing projects more straightforward. Follow the steps in Table 3.1 to download R and RStudio.
Table 3.1: Installation of R and RStudio
For R
Step 1: Go to CRAN (Comprehensive R Archive Network) at https://cran.r-project.org/.
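After R is installed, you can confirm that it works by typing a few commands at the console prompt (>). The lines below only query the running R session and create one variable. The exact version text printed depends on the release you installed, and the variable name greeting is purely illustrative.

```r
# Print the version of R that is running, e.g. "R version 4.x.y (...)".
print(R.version.string)

# A first calculation and a first variable.
2 + 3
greeting <- "Hello, R!"
print(greeting)
```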
R Interface
RStudio Interface
Source Editor Pane: In the RStudio IDE, the source editor (marked as 1 in Figure 3.1) is where you write R code. It is a text editor that supports various forms of R documents (shown as 2 in Figure 3.1), such as a standard R Script, R Markdown, R Notebook and R Sweave. We write and edit code here in the editor.
Console Pane: This pane (shown as 3 in Figure 3.1) contains the R interpreter, where R code is processed. It shows the execution of R code written in the editor and displays the results.
Environment Pane: This pane gives access to the variables created in the current R session. From the environment window, you can export the workspace with all its variables to an R Data file. You can also import a workspace from an existing file.
Output Pane: This pane contains the Files, Plots, Packages, Help, Viewer, and Presentation tabs.
- The Files tab lets users explore files on the local storage system.
- The Plots tab displays all graphical output produced by the R interpreter.
- The Packages tab lists the packages installed in your RStudio and lets you load them manually.
- The Help tab lets you search and view documentation for R functions.
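The workspace export and import described for the Environment pane can also be done from code. Below is a minimal sketch using base R's save() and load(). The object names and the temporary file path are illustrative only.

```r
# Create a couple of objects in the current session.
monthly_sales <- c(1200, 1350, 1100)
store_name <- "ShopSmart"

# Save them to an R Data (.RData) file; tempfile() gives a scratch path.
rdata_path <- tempfile(fileext = ".RData")
save(monthly_sales, store_name, file = rdata_path)

# Remove the objects, then restore them from the file.
rm(monthly_sales, store_name)
load(rdata_path)

print(monthly_sales)   # the values are back
# save.image() would instead save every object in the workspace at once.
```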
If you want to use code or data written by other people, you can do so through packages. As you already know, R has strong open-source community support, so many R packages are available. R packages are pre-written sets of functions for performing specific tasks, and they extend R’s capabilities. In simple terms, a package bundles functions, data, and help documentation in one place. In Figure 3.2 we have installed the ‘tidyverse’ package in RStudio. Tidyverse is a popular collection of packages designed to make data science easier, with tools for importing, tidying, transforming, and visualizing data.
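A package only needs to be installed once. It must then be loaded with library() in each session where you use it. The sketch below wraps both steps in a small helper, load_or_install(). This is a hypothetical name introduced here for illustration and not part of R. Installing requires an internet connection, so the tidyverse call is left commented out. The stats package, which ships with R, is used to demonstrate loading instead.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # downloads from CRAN; needs internet
  }
  library(pkg, character.only = TRUE)  # attach the package to this session
}

# load_or_install("tidyverse")        # uncomment to install/load tidyverse

load_or_install("stats")               # stats ships with R, no download needed
"package:stats" %in% search()          # TRUE: the package is attached
```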
You can specify the sheet name, the range of cells, and the column types for better control. Sample code is shown in Code Window 1.
Code Window 1
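Here is a minimal sketch of such a call. It assumes the readxl package is already installed, which you can do once with install.packages("readxl"). It uses the example workbook datasets.xlsx that ships with readxl, so no file of your own is needed. The sheet, range and column types chosen here are only illustrative.

```r
library(readxl)

# Path to an example workbook bundled with readxl.
path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook.
excel_sheets(path)

# Read the header row plus 10 data rows from columns A to C of the "mtcars"
# sheet, declaring all three columns as numeric.
mtcars_part <- read_excel(path,
                          sheet = "mtcars",
                          range = "A1:C11",
                          col_types = c("numeric", "numeric", "numeric"))

print(mtcars_part)   # a tibble with 10 rows and 3 columns
```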
IN-TEXT QUESTIONS
1. What is the difference between a package and a library in R?
2. Which package is commonly used to import Excel files into R?
Code Window 2
Comments are text written to make code clearer. They help readers understand your code, and the interpreter ignores them during program execution. A single-line comment starts with #, and everything after the # on that line is treated as a comment. R does not support multi-line (block) comments, so each comment line needs its own #.
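The short example below shows both kinds of comment. The variable names are made up for illustration. Because R has no block-comment syntax, the longer note is written as several lines that each begin with #.

```r
# Daily sales calculation for one store
units_sold <- 40      # a comment can also follow code on the same line
unit_price <- 25

# The next two lines form a multi-line note:
# each line needs its own # symbol.
revenue <- units_sold * unit_price
print(revenue)        # 1000
```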
Code Window 3
Variables act as containers that hold data or values, which can be used and manipulated throughout your program. A variable is created in R with the assignment operator <- or =. Variables make working with data easier by giving meaningful names to the values you want to use. Variables in R are flexible: you do not have to declare their type explicitly, because R automatically determines whether you are storing a number, text, or something else.
Code Window 4
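Below is a minimal sketch of creating variables with both assignment operators. The names and values are illustrative. class() reports what kind of value R has inferred for each variable, so no type declaration is needed.

```r
store <- "ShopSmart"     # assignment with <-
visitors = 350           # assignment with = also works
is_open <- TRUE

class(store)      # "character"
class(visitors)   # "numeric"
class(is_open)    # "logical"

# A variable can later be reassigned to a different kind of value.
visitors <- "three hundred fifty"
class(visitors)   # now "character"
```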
There are certain rules for valid variable names in R, as discussed below:
A variable name can include letters (a-z, A-Z), digits (0-9), the dot (.) and the underscore (_). However, it cannot start with a digit or an underscore.
R is case sensitive, so var and Var are two different identifiers.
Reserved keywords in R cannot be used as variable names.
No special characters other than the underscore and the dot are allowed.
Variable names starting with a dot are allowed, but the dot must not be followed by a digit. Starting a name with a dot is not advised, because such names are hidden from the default ls() listing.
Some examples of valid and invalid variable names are shown in Table 3.2.
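You can check the naming rules above from code with base R's make.names(), which converts a string into a syntactically valid name. The helper is_valid_name() below is a hypothetical name introduced for illustration. It treats a name as valid when make.names() leaves it unchanged. The candidate names are made up.

```r
# Hypothetical helper: a name is syntactically valid if make.names() leaves it unchanged.
is_valid_name <- function(x) make.names(x) == x

candidates <- c("sales_2024", "total.sales", ".hidden", "2sales",
                "_sales", "my sales", "total$", "if", ".2x")
data.frame(name = candidates, valid = is_valid_name(candidates))

# R is case sensitive: these are two different variables.
var <- 1
Var <- 2
var == Var   # FALSE
```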
Keywords: Keywords are an integral part of R’s syntax and are used to implement various functionalities in R. They are also called reserved words, because they are predefined and have specific roles within the language. The complete list of reserved words in R can be viewed by executing ‘help(reserved)’ or ‘?reserved’. Table 3.3 shows the list of reserved words in R.
Table 3.3: Reserved Words
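The sketch below shows what happens when a reserved word is used as a variable name. The assignments are passed as text to parse() so that the errors can be caught with tryCatch() instead of stopping the script. The result names (tried_if, tried_true) are illustrative.

```r
# In an interactive session, either line opens the list of reserved words:
# help(reserved)
# ?reserved

# 'if' cannot appear on the left of an assignment: this is a syntax error.
tried_if <- tryCatch(parse(text = "if <- 5"),
                     error = function(e) "syntax error")

# TRUE is also reserved: the code parses, but the assignment fails when run.
tried_true <- tryCatch(eval(parse(text = "TRUE <- 1")),
                       error = function(e) "invalid assignment")

print(tried_if)
print(tried_true)

# Because R is case sensitive, 'If' is an ordinary, valid name.
If <- 5
```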
3.6 Data Type
Unlike C or C++, R does not require you to declare a variable’s data type. R is dynamically typed: a variable takes the type of whatever value it is assigned, and that type can change if a different kind of value is assigned later. Various data types are available in R, and Table 3.4 lists them. Apart from these basic data types, R also supports flexible data structures such as vectors, lists, arrays, and data frames, which will be discussed in later lessons.
Table 3.4: Data Types in R
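A quick way to explore R’s basic data types is to create one value of each and ask R what it is with class() and typeof(). The values below are illustrative. Factors are included because the chapter summary lists them among the primary types; internally they are integer codes with a set of labels called levels.

```r
price    <- 99.5      # numeric (stored as double)
quantity <- 3L        # integer: the L suffix makes a whole number an integer
customer <- "Ravi"    # character (text)
paid     <- TRUE      # logical
z        <- 2 + 3i    # complex
rating   <- factor(c("High", "Low", "High"))   # factor: categories as levels

class(price)      # "numeric"
typeof(price)     # "double"
class(quantity)   # "integer"
class(customer)   # "character"
class(paid)       # "logical"
class(z)          # "complex"
class(rating)     # "factor"
levels(rating)    # "High" "Low"
```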
R provides functions to view the variables that are currently defined in your R environment:
ls() lists all variables in the current environment.
ls(pattern = "name") lists only the variables whose names match the given pattern (a regular expression).
objects() is another function that displays the same list.
We can also remove variables from the R environment using the following functions:
rm(variable_name) removes a single variable.
rm(var1, var2, var3) removes all the variables passed as arguments.
rm(list = ls()) removes all variables.
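The functions above can be tried together as follows. The variable names are made up for the example. The line that clears everything is left commented out, because it would delete every variable in your session.

```r
sales_jan <- 100
sales_feb <- 120
target <- 500

ls()                                   # every variable in the environment
sales_vars <- ls(pattern = "^sales")   # names that start with "sales"
print(sales_vars)                      # includes "sales_feb" and "sales_jan"

rm(sales_jan)                          # remove one variable
rm(sales_feb, target)                  # remove several at once
exists("target", inherits = FALSE)     # FALSE: it is gone

# rm(list = ls())                      # would remove ALL variables
```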
3.7 Operators
Operators are symbols that tell R what action to perform on data, from basic calculations to logical comparisons. R provides several kinds of operators, which we discuss in this section.
Arithmetic Operators are the simplest and most frequently used operators. They carry out basic mathematical operations such as addition, subtraction, multiplication, and division. For example, 5 + 3 adds two numbers and gives 8, while 5 %% 2 computes the remainder of dividing 5 by 2, which is 1. Exponentiation is available through either the ^ or the ** operator, which raises a number to a power. These operators are not restricted to single numbers. They also work element-wise on numeric vectors, so a single expression can process an entire column of data, even in very large datasets. Table 3.5 shows various arithmetic operators.
Table 3.5: Arithmetic Operators
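The arithmetic operators can be tried directly at the console, as shown below. The integer-division operator %/% is included alongside %% because the two are commonly used together. The prices vector is made up to show element-wise behaviour.

```r
5 + 3     # 8    addition
5 - 3     # 2    subtraction
5 * 3     # 15   multiplication
5 / 2     # 2.5  division
5 %/% 2   # 2    integer division (quotient)
5 %% 2    # 1    modulus (remainder)
2 ^ 3     # 8    exponentiation
2 ** 3    # 8    same as ^

# Operators work element-wise on vectors: apply a 10% discount to every price.
prices <- c(100, 250, 400)
discounted <- prices * 0.9
print(discounted)   # 90 225 360
```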
Relational Operators compare values and check conditions such as equality, greater than, or less than. For instance, 5 > 3 checks whether 5 is greater than 3 and returns TRUE. Similarly, 5 == 3 checks for equality and returns FALSE. They are widely used to filter or subset data, that is, to find the rows of a dataset that satisfy some condition. For example, age > 18 can be used to find all rows in a dataset where the age is above 18. Relational operators always return logical values (TRUE or FALSE). Table 3.6 shows various relational operators.
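Below is a minimal sketch of relational operators, first on single numbers and then as a row filter on a small data frame. The customer names and ages are made-up illustration data.

```r
5 > 3    # TRUE
5 == 3   # FALSE
5 != 3   # TRUE
5 <= 5   # TRUE

# Made-up customer data for illustration.
customers <- data.frame(
  name = c("Amit", "Priya", "Rohan", "Neha"),
  age  = c(17, 25, 34, 16)
)

customers$age > 18                          # FALSE TRUE TRUE FALSE
adults <- customers[customers$age > 18, ]   # keep rows where the test is TRUE
print(adults)
```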
Logical Operators combine or modify logical values. The & operator performs an AND operation, which is TRUE only if both conditions are satisfied. For instance, TRUE & FALSE is FALSE. Likewise, the | operator performs an OR operation, which is TRUE if at least one condition is satisfied. The ! operator negates a logical value, turning TRUE into FALSE and vice versa. Logical operators are especially useful when dealing with multiple conditions in your data. For instance, age > 18 & gender == "Male" can filter male individuals above the age of 18 in a dataset. Table 3.7 shows various logical operators.
Table 3.7: Logical Operators
Operator  Description                Example           Result
&         AND (element-wise)         TRUE & FALSE      FALSE
&&        AND (single comparison)    TRUE && TRUE      TRUE
|         OR (element-wise)          TRUE | FALSE      TRUE
||        OR (single comparison)     FALSE || FALSE    FALSE
!         NOT (negation)             !TRUE             FALSE
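The sketch below contrasts the element-wise and single-value forms. The people data frame is made up for illustration. The element-wise & filters rows, while && is used inside if(), where a single TRUE or FALSE is expected. Since R 4.3.0, passing a vector longer than one to && or || is an error.

```r
# Made-up data for illustration.
people <- data.frame(
  age    = c(22, 17, 35, 40),
  gender = c("Male", "Male", "Female", "Male")
)

# Element-wise &: one TRUE/FALSE per row.
is_adult_male <- people$age > 18 & people$gender == "Male"
print(is_adult_male)                 # TRUE FALSE FALSE TRUE
adult_males <- people[is_adult_male, ]

# && combines two single TRUE/FALSE values, e.g. inside if ().
if (nrow(people) > 0 && any(is_adult_male)) {
  message("Found ", sum(is_adult_male), " adult male(s)")
}

!TRUE                                # FALSE
```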
3.8 Functions
In R, user-defined functions let you create reusable blocks of code that perform specific tasks. Functions are especially useful in business analytics for automating repetitive operations, performing custom calculations, or implementing domain-specific logic. By defining your own functions, you can encapsulate complex logic in simple, reusable units, which improves the clarity and efficiency of your code.
In R, a function is created with the keyword function. Inputs are specified as arguments, and you write the logic that works with these inputs to generate the desired output. A well-crafted function has three components:
- a name, which is a descriptive identifier for the function
- arguments, which are the variables passed into the function to customise its behaviour
- a body, which is the code block where the logic is executed

Code Window 5 shows an example of a function in R.
Code Window 5
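Below is a further sketch of a user-defined function with all three components. Its name, profit_margin, is hypothetical and chosen only for illustration. It has three arguments, revenue, cost and digits, where digits has a default value. Its body computes the margin and returns the last evaluated value.

```r
# Hypothetical function: profit margin as a percentage of revenue.
profit_margin <- function(revenue, cost, digits = 1) {
  if (any(revenue <= 0)) stop("revenue must be positive")
  margin <- (revenue - cost) / revenue * 100
  round(margin, digits)   # the last evaluated value is returned
}

profit_margin(revenue = 50000, cost = 38000)   # 24
profit_margin(1200, 1000, digits = 2)          # 16.67

# Arithmetic is vectorised, so the function also works on whole columns.
profit_margin(c(500, 800), c(400, 600))        # 20 25
```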
3.9 Summary
This chapter has given an overview of working with R for data analysis. It covered both foundational concepts and practical tools to equip readers with essential skills. The introduction looked at the basics of this versatile programming language, its powerful features, and its critical role in data analysis. The Installation section guided readers through setting up R and RStudio to ensure a smooth start to coding. The chapter then covered Importing Data, explaining how to read spreadsheet files into R using the readxl package, an important part of data preparation. The section on Commands and Syntax focused on R-specific rules, such as case sensitivity and proper formatting, that help avoid errors during execution.
The Data Types section highlighted the primary types (numeric, character, logical, and factors) that form the basis of data handling and analysis in R. The chapter then described arithmetic, relational, logical, and special operators, which are used for data manipulation and analysis. Finally, the chapter presented Functions, both built-in and user-defined, which let you perform tasks with minimal repetition and compute complex calculations efficiently. Together, these topics lay a strong foundation for using R in data analysis.
3.11 Self-Assessment Questions
1. How do you install and load a package in R? Provide a code example.
2. Explain the difference between numeric and character data types in
R.
3. Write a user-defined function in R that calculates the square of a
number.
4. Describe how to import a .csv file into R. Mention any required
functions or packages.
5. Identify two relational and two logical operators in R.
3.12 References
Wickham, H., & Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media.
Matloff, N. (2011). The Art of R Programming. No Starch Press.