
BA-unit 3.

This document serves as a lesson on getting started with R programming, outlining its installation, data handling capabilities, and basic commands. It emphasizes the advantages of using R for data analysis, particularly in business contexts, and provides guidance on importing data and using RStudio. The lesson also covers fundamental concepts such as data types, operators, and functions in R.


LESSON 3
Getting Started with R
Ms. Asha Yadav
Assistant Professor
Department of Computer Science
School of Open Learning
University of Delhi

STRUCTURE
3.1 Learning Objectives
3.2 Introduction
3.3 Installation
3.4 Importing Data from Spreadsheet Files
3.5 Commands and Syntax
3.6 Data Type
3.7 Operators
3.8 Functions
3.9 Summary
3.10 Answers to In-Text Questions
3.11 Self-Assessment Questions
3.12 References
3.13 Suggested Readings

3.1 Learning Objectives


By the end of this chapter, students should be able to:
Define key concepts in R, such as packages, data structures, and basic commands.
Explain and utilize the capabilities of R for handling and manipulating data.
Understand and explain advantages of using R for data analysis.
Differentiate and utilize different data types and operators.
Write functions in R programming language.

PAGE 51
Department of Distance & Continuing Education, Campus of Open Learning,
School of Open Learning, University of Delhi

Business Analytics.indd 51 10-Jan-25 3:51:46 PM


BUSINESS ANALYTICS

3.2 Introduction
You have already studied the importance of data analytics, and in the
previous lesson we explored data preparation, summarisation and
visualisation with spreadsheets. In this chapter we introduce R, a popular
open-source programming language designed primarily for statistical
computing and data analysis. Consider a retail company, “ShopSmart”, that
needs to analyse its daily sales. It currently uses Excel for basic data
handling, but this becomes challenging as the data grows. Switching to R
helps “ShopSmart” analyse larger datasets, create informative
visualizations, and understand customer purchasing trends, ultimately
leading to better marketing strategies and inventory management. R was
developed in the early 1990s by statisticians Ross Ihaka and Robert
Gentleman, and it has become a widely accepted tool among data scientists,
researchers, and analysts. Compared with traditional tools like Excel, R is
more flexible and scalable for data analysis: it can handle more complex
computations and visual displays for large datasets. R is used in many
fields. In business and commerce, it is used for tasks such as market
prediction, financial risk assessment, investment optimization, customer
behaviour analysis, sales forecasting, inventory control, customer feedback
analysis, customer segmentation, and campaign performance analysis. R is
therefore an important tool for data-driven decision-making in commerce,
and it gives commerce students the ability to process and interpret data
efficiently. Some of the advantages of R, and reasons to prefer it for data
analysis, are listed below.
Statistical software often carries costly licenses, but R is
completely free to use, which makes it accessible to anyone interested
in learning data analysis without needing to invest money.
R is a versatile statistical platform that provides a wide range of
data analysis techniques, so almost any type of analysis can be
performed in it, and it has state-of-the-art graphics capabilities
for visualization.
Data is usually gathered from a variety of sources, and analysing it
in one place is a challenge. R can manage data from a variety
of sources, including text files, spreadsheets, databases, and web
APIs, making it suitable for most business environments.
R is compatible with a broad range of platforms, including Windows,
Unix, and macOS, so it is likely to run on almost any computer
you use.
The R community, which provides extensive support for R
programmers, has developed thousands of packages that extend R’s
capabilities into specialized areas, such as ‘quantmod’ for finance,
‘ggplot2’ for visualization, and packages for machine learning
algorithms.
Despite these advantages, R can be difficult to learn at first.
Because it has so many features, the documentation is extensive and the
help files can be overwhelming. Many functions come from optional
packages written by different contributors, so information can be
scattered and hard to find, and it is challenging to understand
everything that R can do. In this lesson we discuss the basics of R,
starting with installation (students are expected to follow the
installation steps).

3.3 Installation
To begin with R, students need to install both R (the base programming
language) and RStudio, an Integrated Development Environment
(IDE) that makes working with R much easier. RStudio provides a more
user-friendly interface than R’s base interface, which makes coding,
viewing output, and managing projects more straightforward.
Follow the steps in Table 3.1 to download R and RStudio.
Table 3.1: Installation of R and RStudio
For R
Step 1: Go to CRAN (Comprehensive R Archive Network) at
https://cran.r-project.org/.

Step 2: Choose your operating system (Windows, macOS, or Linux).

Download and run the installer.

R Interface

For RStudio
Visit RStudio’s website at
https://www.rstudio.com/products/rstudio/download/.

Choose the free version, “RStudio Desktop.”

Follow the installation prompts.

RStudio Interface

Understanding RStudio IDE


The RStudio display is divided into several panes (as shown in Figure 3.1),
which can be customized as required. The most important panes that you
will see by default are described below:

Source Editor Pane: The source editor (marked as 1 in Figure 3.1)
is a text editor where you write and edit R code. It can be used
for various kinds of R files (shown as 2 in Figure 3.1), such as
standard R Script, R Markdown, R Notebook, and R Sweave.
Console Pane: This pane (shown as 3 in Figure 3.1) contains the R
interpreter, where R code is processed. It shows the execution of
R code (for example, code written in the editor) and displays the
results.
Environment Pane: This pane gives access to the variables
created in the current R session. The workspace containing all
variables can be exported to an R data file, or imported from an
existing one, from the Environment window.
Output Pane: This pane contains the Files, Plots, Packages, Help,
Viewer, and Presentation tabs. The Files tab lets you explore files
on the local storage system. The Plots tab displays all graphical
output produced by the R interpreter. In the Packages tab you
can view the packages installed in your RStudio and load them
manually. In the Help tab you can search and view the documentation
for R functions.

Figure 3.1: RStudio IDE


Packages: When you download and install R for the first time,
you install the base R software, which contains most of the
functions that you will use frequently, such as mean() and hist().
However, if you want to use code or data written by other people, you
can do so through packages. Because R has an open community, many
R packages are available. An R package is a pre-written set of
functions for a particular task that extends R’s capabilities. In
simple terms, a package bundles functions, data, and help files
in one place. In Figure 3.2 we have installed the ‘tidyverse’
package in RStudio. It is a popular collection of packages designed
to make data science easier, and it includes tools for importing,
tidying, transforming, and visualizing data.

Figure 3.2: Package Installation RStudio


Libraries: A library is the directory where packages are stored on your
computer. To load an installed package into your R session, you use the
function library(), which makes the package’s functions and datasets
available for use.
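The difference between installing a package (once) and loading it (in every session) can be sketched as follows. This is a minimal illustration, not a required procedure: the install line is left commented out because it needs an internet connection, and the variable name lib_dirs is chosen here only for the example.

```r
# One-time installation (needs internet access); uncomment to run.
# install.packages("tidyverse")

# Load the package in each new session, but only if it is installed.
if (requireNamespace("tidyverse", quietly = TRUE)) {
  library(tidyverse)
} else {
  message("tidyverse is not installed yet; run install.packages(\"tidyverse\")")
}

# The library: the directories where installed packages are stored.
lib_dirs <- .libPaths()
print(lib_dirs)
```

The same two steps apply to any other package, such as readxl.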

3.4 Importing Data from Spreadsheet Files


Importing data from spreadsheets is common in business analytics
because most business data is stored in formats such as Excel. In R,
you can import spreadsheet data into your workspace with packages
like readxl and openxlsx. The readxl package reads both .xls and .xlsx
files, while openxlsx works with .xlsx files, and neither requires Excel
to be installed on your computer. The readxl package is
especially straightforward and effective. Its core function, read_excel(),
reads data directly from a spreadsheet and loads it into an R data frame.
You can specify the sheet name, range of cells, and column types for
better control. Sample code is shown in Code Window 1.

Code Window 1
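A minimal sketch of such an import is given below, assuming the readxl package is installed. To keep the example runnable without a file of your own, it reads the example workbook that ships with readxl; the object name cars and the chosen sheet and range are illustrative assumptions.

```r
# Sketch: runs only if the readxl package is installed.
if (requireNamespace("readxl", quietly = TRUE)) {
  library(readxl)

  # Path to an example workbook bundled with readxl. Replace it with the
  # path to your own file, e.g. "sales_data.xlsx" (a hypothetical name).
  path <- readxl_example("datasets.xlsx")

  # Read one sheet, restricted to a cell range (header row + 5 data rows),
  # and treat every column as numeric.
  cars <- read_excel(path, sheet = "mtcars", range = "A1:D6",
                     col_types = "numeric")

  print(class(cars))  # the result is a tibble, which is also a data frame
  print(cars)
}
```

Leaving out sheet, range, and col_types makes read_excel() read the whole first sheet and guess the column types.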
IN-TEXT QUESTIONS
1. What is the difference between a package and a library in R?
2. Which package is commonly used to import Excel files into R?

3.5 Commands and Syntax


The most basic program in any programming language is “Hello World”,
so we start learning basic commands with it. R commands can be typed
at the command prompt of the interpreter (the console), or written in
a script file, where the complete code is written first and then run.
If you have not installed RStudio, you can also run R code on an online
compiler.
Note: All the code in this SLM was written and executed on Google Colab.

Code Window 2
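A “Hello World” program in R can be as short as one line. The sketch below uses the built-in print() function, and the exact message text is only an example.

```r
# Display a message on the console.
print("Hello, World!")
# [1] "Hello, World!"
```

The [1] in the output is R's index of the first element shown on that line.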
Comments are text written to make code clearer. They help the
reader to understand your code and are ignored by the interpreter during
program execution. A single-line comment starts with #, and everything
after # on that line is ignored. R does not support multi-line comments,
so each line of a longer comment must begin with #.


Code Window 3
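The use of # can be sketched as follows. The variable name total is an assumption made for this illustration.

```r
# This is a single-line comment; R ignores it when running the code.
total <- 5 + 3  # a comment can also follow code on the same line

# R has no multi-line comment syntax, so a longer note
# is written by starting every line with the # symbol.
print(total)
```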
Variables act as containers that hold data or values, which can be used
and manipulated throughout your program. A variable is created in
R with the assignment operator <- or =. Variables make data easier to
work with by giving meaningful names to the values
you want to use. Variables in R are flexible: you don’t have to declare
their type explicitly. R automatically determines whether you’re storing
a number, text, or something else.

Code Window 4
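As a small illustration of creating variables, the sketch below stores three kinds of values. The names and values (store_name, daily_sales, is_open) are hypothetical and chosen to match the ShopSmart scenario.

```r
store_name <- "ShopSmart"   # text (character)
daily_sales = 15250.75      # number (numeric); = also assigns
is_open <- TRUE             # logical

print(store_name)
print(daily_sales)
print(is_open)

# No type declaration is needed, and the same variable can later
# hold a value of a different type.
daily_sales <- "not recorded"
print(class(daily_sales))
```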
The rules for valid variable names in R are as follows:
A variable name can include letters (a-z, A-Z), digits (0-9), the
dot (.), and the underscore (_), but it cannot start with a digit or
an underscore.
R is case sensitive, so var and Var are two different identifiers.
Reserved keywords in R cannot be used as variable names.
No special character other than the underscore and the dot is allowed.
A variable name may start with a dot, but the dot must not
be followed by a digit. Starting a name with a dot is not
advised.
Some examples of valid and invalid variable names are shown in Table 3.2.

Table 3.2: Identifiers
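One way to check the naming rules in practice is the base R function make.names(), which converts any string into a syntactically valid name. A string is already a valid name exactly when make.names() leaves it unchanged. The candidate names below are made up for illustration and are not taken from Table 3.2.

```r
# Hypothetical candidate names to test against the rules.
candidates <- c("sales_2024", "total.sales", ".hidden",
                "2sales", "_tmp", ".2way", "if", "my-var")

# A name is valid if make.names() does not need to change it.
is_valid <- make.names(candidates) == candidates

print(data.frame(name = candidates, valid = is_valid))
# sales_2024, total.sales and .hidden are valid; the rest are not:
# 2sales (starts with a digit), _tmp (starts with an underscore),
# .2way (dot followed by a digit), if (reserved word),
# my-var (contains a special character).
```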

Keywords: Keywords are an integral part of R’s syntax and are used to
implement various functionalities in R. They are also called reserved
words, and each has a predefined role within the language. The
list of reserved words can be viewed
by executing ‘help(reserved)’ or ‘?reserved’. Table 3.3 shows the
list of reserved words in R.
Table 3.3: Reserved Words

3.6 Data Type
Unlike C or C++, R does not require a variable to be declared with a
data type. R is dynamically typed: the type of a variable is determined
by the value assigned to it. Various data types are available in R, and
a list is shown in Table 3.4. Apart from these basic data types, R also
supports flexible data structures such as vectors, lists, arrays, and
data frames, which will be
discussed in later lessons.
Table 3.4: Data Types in R
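The type that R has assigned to a value can be inspected with the built-in class() function. The sketch below uses made-up values for a few common types; it is not a reproduction of Table 3.4.

```r
price    <- 249.5        # numeric (decimal number)
quantity <- 12L          # integer (the L suffix marks an integer)
product  <- "Notebook"   # character (text)
in_stock <- TRUE         # logical (TRUE or FALSE)
segment  <- factor(c("retail", "wholesale", "retail"))  # factor (categories)

print(class(price))      # "numeric"
print(class(quantity))   # "integer"
print(class(product))    # "character"
print(class(in_stock))   # "logical"
print(class(segment))    # "factor"
print(levels(segment))   # "retail" "wholesale"
```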

R provides functions to view the variables that are currently
defined in your R environment:
Use ls() to list all variables in the current environment.
ls(pattern = "name") lists the variables whose names match the
given pattern.
Another function that can be used to display variables is objects().
We can also remove variables from the R environment using the following
functions:
rm(variable_name) removes a single variable.
rm(var1, var2, var3) removes all the variables given as
arguments.
rm(list = ls()) removes all variables.

rm(list = ls(pattern = "temp")) removes all variables matching
the given pattern.
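The listing and removal functions can be tried together, as in the sketch below. The variable names (temp_sales, temp_cost, region) are hypothetical, and rm(list = ls()) is only mentioned in a comment because it would clear every variable in the session.

```r
temp_sales <- 100
temp_cost  <- 60
region     <- "North"

print(ls())                   # every variable in the current environment
print(ls(pattern = "temp"))   # only the names that contain "temp"
print(objects())              # same result as ls()

rm(region)                        # remove a single variable
rm(list = ls(pattern = "temp"))   # remove all variables matching "temp"
print(ls(pattern = "temp"))       # character(0): nothing matches any more

# rm(list = ls()) would remove all remaining variables; use it with care.
```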

3.7 Operators
Operators are tools that perform operations on data. Whether we
do basic calculations or more advanced logical comparisons, operators
tell R what action to take on the data. This section discusses the
operators available in R.
Arithmetic Operators are the simplest and most frequently used
operators. They carry out simple math operations like addition,
subtraction, multiplication, and division. For example, 5 + 3 adds two
numbers and gives 8, and 5 %% 2 calculates the remainder of 5 divided
by 2, which is 1. Advanced arithmetic is also available, such
as exponentiation with either the ^ or the ** operator, which
raises a number to a power. These operators are not restricted to
single numbers. They also work element-wise on numeric vectors, which
makes computation easy even on very large datasets. Table 3.5 shows
the arithmetic operators.
Table 3.5: Arithmetic Operators
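The arithmetic operators described above can be tried directly. In this sketch the vectors unit_price and quantity are made-up data used to show element-wise behaviour.

```r
print(5 + 3)    # addition: 8
print(5 - 3)    # subtraction: 2
print(5 * 3)    # multiplication: 15
print(5 / 2)    # division: 2.5
print(5 %% 2)   # remainder: 1
print(5 %/% 2)  # integer division: 2
print(2 ^ 3)    # exponentiation: 8
print(2 ** 3)   # same as 2 ^ 3: 8

# Operators work element by element on vectors.
unit_price <- c(10, 20, 30)
quantity   <- c(2, 3, 4)
revenue    <- unit_price * quantity
print(revenue)  # 20 60 120
```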

Relational Operators are used to compare values and check for conditions
like equality, greater than, or less than. For instance, 5 > 3 checks whether
5 is greater than 3 and returns TRUE. Similarly, 5 == 3 checks for equality
and returns FALSE. Relational operators are also widely used in filtering
or subsetting data, to find the rows of a dataset that satisfy some
condition. For example, you could use age > 18 to find all rows in a
dataset where the age is above 18. Relational operations always return
logical values (TRUE or FALSE). Table 3.6 shows the relational operators.

Table 3.6: Relational Operators
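A short sketch of relational operators, including the age > 18 filter mentioned above, is given here. The age values are made up for illustration.

```r
print(5 > 3)    # TRUE
print(5 == 3)   # FALSE
print(5 != 3)   # TRUE
print(5 <= 3)   # FALSE

# Comparing a vector gives one logical value per element.
age <- c(15, 22, 18, 30)
is_adult <- age > 18
print(is_adult)        # FALSE TRUE FALSE TRUE

# The logical vector can then be used to keep only the matching values.
adults <- age[is_adult]
print(adults)          # 22 30
```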

Logical Operators let you combine or modify logical values. You can
use & to perform an AND operation, which is TRUE only if both
conditions are satisfied. For instance, TRUE & FALSE is FALSE. Likewise,
the | operator performs an OR operation, where the result
is TRUE if at least one condition is satisfied. The ! operator negates a
logical value, turning TRUE into FALSE and vice versa. Logical operators
are especially useful when dealing with multiple conditions in your data.
For instance, age > 18 & gender == "Male" can filter male individuals
above the age of 18 in a dataset. Table 3.7 shows the logical operators.
Table 3.7: Logical Operators
Operator  Description              Example          Result
&         AND (element-wise)       TRUE & FALSE     FALSE
&&        AND (single comparison)  TRUE && TRUE     TRUE
|         OR (element-wise)        TRUE | FALSE     TRUE
||        OR (single comparison)   FALSE || FALSE   FALSE
!         NOT (negation)           !TRUE            FALSE
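The filter described above can be sketched on a small data frame. The customers data below is invented for this example, and the column names age and gender are assumptions.

```r
# Hypothetical customer data.
customers <- data.frame(
  age    = c(25, 17, 40, 19),
  gender = c("Male", "Female", "Male", "Female")
)

# Element-wise AND: keep rows where both conditions are TRUE.
adult_males <- customers[customers$age > 18 & customers$gender == "Male", ]
print(adult_males)

# Element-wise OR: keep rows where at least one condition is TRUE.
under_18_or_male <- customers[customers$age < 18 | customers$gender == "Male", ]
print(under_18_or_male)

# && and || compare single values; ! negates.
print(TRUE && FALSE)   # FALSE
print(FALSE || TRUE)   # TRUE
print(!TRUE)           # FALSE
```

Use & and | when filtering whole columns, and && and || for single conditions.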

Assignment Operators are used to store values in variables. The most
commonly used operator is <-, which assigns a value to a variable, as in
x <- 10. You can also use = for assignment, but <- is preferred in R
because it is unambiguous and is the established convention of the
language. R also provides the right assignment operator (->), which
stores the value on its left in the variable on its right, as in
10 -> x. These operators are the workhorses behind variables and make
data easy to manipulate. Table 3.8 shows the assignment operators.

Table 3.8: Assignment Operators
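The three assignment forms discussed above behave the same way, as this small sketch shows. The variable names are arbitrary.

```r
x <- 10    # left assignment (preferred style)
y = 20     # = also assigns
30 -> z    # right assignment: the value on the left goes into z

total <- x + y + z
print(total)   # 60
```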

3.8 Functions
In R, user-defined functions enable you to create reusable blocks of
code to perform specific tasks. Functions are especially useful in
business analytics for automating repetitive operations, performing custom
calculations, or implementing domain-specific logic. By defining your
own functions, you can encapsulate complex logic in simple, reusable
units, which improves the clarity and efficiency of your code.
In R, a function is defined with the keyword function. Inputs are
specified as arguments, and you write the logic that works with these
inputs and generates the desired output. A well-crafted function has
three components: the name, a descriptive identifier for the function;
the arguments, variables passed into the function to customize it; and
the body, the code block where the logic is executed. Code Window 5
shows an example of a function in R.

Code Window 5
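A minimal sketch of a user-defined function with these three components is given below. The function calculate_profit and its arguments are hypothetical and are used only to illustrate the structure.

```r
# Name: calculate_profit; arguments: revenue and cost; body: inside { }.
calculate_profit <- function(revenue, cost) {
  profit <- revenue - cost
  return(profit)
}

# Call the function with single values.
print(calculate_profit(5000, 3200))   # 1800

# Because subtraction is element-wise, the same function works on vectors.
monthly_profit <- calculate_profit(c(5000, 6200), c(3200, 4100))
print(monthly_profit)                 # 1800 2100
```

Once defined, the function can be reused wherever the same calculation is needed.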
IN-TEXT QUESTION
3. How do you define a user-defined function in R?

3.9 Summary
This chapter has given an overview of working with R for data analysis,
covering both foundational concepts and practical tools, to equip readers
with the essential skills. The Introduction looked at the basics of this
versatile programming language, its powerful features, and its critical
role in data analysis. The Installation section guided readers through
setting up R and RStudio to ensure a smooth start to coding. The
chapter then covered Importing Data, explaining how to read spreadsheet
files into R using the readxl package, an important part of data
preparation. The Commands and Syntax section focused on R’s case
sensitivity and the need for proper formatting
to avoid execution errors.
The Data Types section highlighted the primary types: numeric, character,
logical, and factors, which form the basis of data handling and analysis in R.
The Operators section described the arithmetic, relational,
logical, and assignment operators used for data manipulation
and analysis. Finally, the chapter ended with
Functions, both built-in and user-defined, which perform tasks with minimal
repetition and carry out complex calculations efficiently. Together, these
topics set a strong foundation for using R in data analysis.

3.10 Answers to In-Text Questions


1. A package is a collection of functions and data, while a library
is the location where installed packages are stored.
2. The readxl package is used to import Excel files.
3. A user-defined function is created by assigning function(arguments)
to a name, followed by a body that holds the logic to return a result.

3.11 Self-Assessment Questions
1. How do you install and load a package in R? Provide a code example.
2. Explain the difference between numeric and character data types in
R.
3. Write a user-defined function in R that calculates the square of a
number.
4. Describe how to import a .csv file into R. Mention any required
functions or packages.
5. Identify two relational and two logical operators in R.

3.12 References
Wickham, H., & Grolemund, G. (2016). R for Data Science: Import,
Tidy, Transform, Visualize, and Model Data. O’Reilly Media.
Matloff, N. (2011). The Art of R Programming. No Starch Press.

3.13 Suggested Readings


Verzani, J. (2014). Using R for Introductory Statistics. CRC Press.
Grolemund, G. (2014). Hands-On Programming with R. O’Reilly
Media.
