Module 5 Introduction to R Programming

Uploaded by

Hareesh bly
Module - V

Introduction to R Programming

R Programming:

1. Basics of R
2. Installation of RStudio
3. Vectors
4. Matrices
5. Data types
6. Importing files
7. Writing files
8. Merging files
9. Data manipulation
10. Creation and deletion of new variables
11. Sorting of data
12. Functions
13. Graphical presentation and descriptive statistics

What is R programming?
 R is a programming language designed primarily for statistical computing and
graphics, although it can also be used for general-purpose programming.
 It is an interpreted language: code is executed line by line, without a separate
compilation step.
 R is mainly used in data analysis and research.
 R supports procedural programming with functions and, for some functions,
object-oriented programming with generic functions.
 It is widely used as statistical software and as a data analysis tool.

Reasons to Learn R Programming


R is an open-source programming language and software environment widely
used for statistical analysis, data visualization, and machine learning. Its user-
friendly interface and extensive libraries make it an excellent choice for
researchers, statisticians, and data scientists. Here’s why R is preferred:
 Free and Open-Source: R is open to everyone, meaning users can modify,
share and distribute their work freely.
 Designed for Data: R is built for data analysis, offering a comprehensive set of
tools for statistical computing and graphics.
 Large Package Repository: The Comprehensive R Archive Network
(CRAN) offers thousands of add-on packages for specialized tasks.
 Cross-Platform Compatibility: R can work on Windows, Mac and Linux
operating systems.
 Great for Visualization: With packages like ggplot2, R makes it easy to create
informative charts and plots.

History of ‘R’ Programming:


Ross Ihaka and Robert Gentleman

 1991 - Ross Ihaka and Robert Gentleman begin work on a new dialect of S as a
research project for the Department of Statistics at the University of Auckland.
 1993 - The first announcement of R reaches the public via the data archive
StatLib and the s-news mailing list.
 1995 - Fellow statistician Martin Mächler convinces R's inventors to release the
language under the GNU General Public License, making R both free to use and
open-source.
 1996 - Ihaka and Gentleman publish their seminal paper introducing R to the
world.
 1997 - The R Core Team is formed. This group is the only one with write access
to the R source code, and it reviews and enacts suggested changes to the
language. The same year, the Comprehensive R Archive Network (CRAN) is
formed. This repository of open-source R packages, which extend the language
itself, helps professionals with a wide range of tasks.
 2000 - R version 1.0.0 is released to the public.
 2003 - The R Foundation is formed to hold and administer the R software
copyright and to provide support for the R language project.
 2004 - R version 2.0.0 is released.
 2009 - The R Journal, an open-access journal for statistical computing and
research, is established.
 2013 - R version 3.0.0 is released.
 2020 - R version 4.0.0 is released.
 June 2023 - R version 4.3.1 is the current release at the time of writing.



Features of R Programming language

1. Statistical Analysis: R provides a wide array of statistical techniques,
including linear and nonlinear modeling, time-series analysis, and clustering,
making it a robust tool for data analysis.

2. Data Visualization: One of R's standout features is its ability to create high-
quality graphics and visualizations. Packages like ggplot2 allow users to
produce complex and aesthetically pleasing plots with ease.

3. Cross-Platform Compatibility: R runs smoothly on Windows, macOS, and
Linux.

4. Reproducibility: Tools like R Markdown facilitate combining code, output,
and text for fully reproducible research and reports.

5. Open Source: Being a GNU project, R is open-source, which means it is free
to use and has a large community contributing to its development. This
fosters collaboration and innovation within the user community.

6. Data Handling: R is designed to handle and manipulate large datasets
efficiently, although it can be memory-intensive with very large datasets.

7. Integration with Other Languages: R can be integrated with other
programming languages like C, C++, and Python, allowing users to leverage
the strengths of multiple languages in their projects.

8. User-Friendly Syntax: While R has a learning curve, its syntax is designed
to be user-friendly, especially for those familiar with statistical concepts,
making it accessible for statisticians and data analysts.

Applications of R
R is used in a variety of fields, including:
 Data Science and Machine Learning: R is widely used for data analysis,
statistical modeling and machine learning tasks.
 Finance: Financial analysts use R for quantitative modeling and risk analysis.
 Healthcare: In clinical research, R helps analyze medical data and test
hypotheses.
 Academia: Researchers and statisticians use R for data analysis and
publishing reproducible research.

Advantages of R Programming
 Comprehensive Statistical Tools: R includes many statistical functions and
models, making it a strong choice for data analysis.
 Customizable Visualizations: R's visualization tools allow customization of
anything from a simple bar chart to a detailed heatmap.
 Extensive Community Support: R has a large user base, and there are
countless resources, forums and tutorials available.
 Highly Extendable: The availability of over 15,000 R packages means we can
extend R's functionality to suit most projects and needs.

Disadvantages of R Programming
 Memory Intensive: R can be slow with very large datasets, consuming a lot
of memory.
 Limited Support for Error Handling: Unlike some other programming
languages, R has less robust error handling features.
 Steeper Learning Curve: Beginners might face challenges with some of R’s
complex features and syntax.
 Performance: R’s performance can lag behind languages like Python or C++
when it comes to speed, especially for large-scale operations.

Installation of RStudio

https://teacherscollege.screenstepslive.com/a/1108074-install-r-and-rstudio-for-windows

Array:
An array is a linear data structure in which all elements are arranged
sequentially. It is a collection of elements of the same data type stored at
contiguous memory locations.

Note: Contiguous memory allocation is a memory allocation technique in which a
single continuous block of memory is allotted, so the elements sit next to each
other in memory.

Vectors
In R programming, a vector is one of the most fundamental data structures. It is a
one-dimensional array that can hold elements of the same type, such as numeric,
character, or logical values. Vectors are used extensively in R for data manipulation
and analysis.

Characteristics of Vectors
 Homogeneous: All elements in a vector must be of the same type (e.g., all
numeric or all character).
 One-dimensional: Vectors are essentially one-dimensional arrays.
 Indexing: Elements in a vector can be accessed using indices, which start
from 1 in R.
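
The characteristics above can be checked directly in the console. The following is a minimal sketch; the variable names (mixed, fruits, and so on) are illustrative and not part of the original notes.

```r
# Homogeneous: mixing types in c() coerces every element to one common type
mixed <- c(1, "two", TRUE)
print(class(mixed))          # all elements have become character strings

# Indexing starts at 1, not 0
fruits <- c("apple", "banana", "cherry")
first_fruit   <- fruits[1]   # first element
last_two      <- fruits[2:3] # a range of positions
without_first <- fruits[-1]  # a negative index drops that position

print(first_fruit)
print(last_two)
print(without_first)
```

Because a vector can hold only one type, R silently converts the number and the logical value to character here, which is worth remembering when combining values from different sources.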

Creating Vectors
There are several ways to create vectors in R:
1. Using the c() function: The most common method, where c() stands for
"combine."

numeric_vector = c(1, 2, 3, 4, 5)
character_vector = c("apple", "banana", "cherry")
logical_vector = c(TRUE, FALSE, TRUE)

2. Using the seq() function: For creating sequences.

seq_vector = seq(1, 10, by = 2)
# Creates a sequence from 1 to 10 with a step of 2. Output: 1 3 5 7 9

3. Using the rep() function: To replicate values.

rep_vector = rep(1:3, times = 3)
# Replicates the sequence 1, 2, 3 three times. Output: 1 2 3 1 2 3 1 2 3

Matrix in R:

In R programming, a matrix is a two-dimensional array that holds elements of
the same data type, organized in rows and columns. Matrices are particularly useful
for mathematical computations and data analysis, especially in linear algebra and
statistical modeling.

In a matrix, rows run horizontally and columns run vertically. In R programming,
matrices are two-dimensional, homogeneous data structures.

Characteristics of Matrices

 Two-dimensional: Matrices consist of rows and columns.
 Homogeneous: All elements in a matrix must be of the same type (numeric,
character, etc.).
 Accessing Elements: Elements can be accessed using row and column
indices.
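
As a minimal sketch of element access, the example below builds a small matrix and reads values by row and column index. The matrix m and the variable names are illustrative assumptions.

```r
# A 2 x 3 matrix filled column by column (the default)
m <- matrix(1:6, nrow = 2, ncol = 3)

element    <- m[2, 3]  # row 2, column 3
first_row  <- m[1, ]   # leaving the column index empty selects the whole row
second_col <- m[, 2]   # leaving the row index empty selects the whole column
dims       <- dim(m)   # number of rows, then number of columns

print(element)
print(first_row)
print(second_col)
print(dims)
```

Selecting a single row or column returns a plain vector rather than a one-row or one-column matrix.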

Creating Matrices

You can create matrices in R using the matrix() function or by converting vectors
into matrices. Here are a few methods to create matrices:

1. Using the matrix() function:

You specify the data, the number of rows and columns, and the arrangement (by
row or by column).

Syntax:
matrix(data, nrow, ncol, byrow = FALSE)

 data : the values to place in the matrix
 nrow : number of rows
 ncol : number of columns
 byrow : logical value; if TRUE, the matrix is filled row by row, otherwise
(the default) column by column

Example:
data = 1:6
my_matrix = matrix(data, nrow = 2, ncol = 3)
print(my_matrix)

Output:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

2. Using the rbind() and cbind() functions:

These functions allow you to bind vectors as rows or columns of a matrix.

Example:
row1 = c(1, 2, 3)
row2 = c(4, 5, 6)
my_matrix2 = rbind(row1, row2) # Bind rows
print(my_matrix2)

Output:
     [,1] [,2] [,3]
row1    1    2    3
row2    4    5    6

Example:
col1 = c(1, 4)
col2 = c(2, 5)
my_matrix3 = cbind(col1, col2) # Bind columns
print(my_matrix3)

Output:
     col1 col2
[1,]    1    2
[2,]    4    5

Data Types in R Programming


Data types in R define the kind of values that variables can hold. Choosing
the right data type helps optimize memory usage and computation. Unlike some
languages, R does not require explicit data type declarations, and a variable can
change its type dynamically during execution.
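
The dynamic typing described above can be sketched as follows. The variable name value and the helper vector seen_types are illustrative only.

```r
# The same variable can hold values of different types over time
value <- 42L                       # starts as an integer
seen_types <- class(value)

value <- "forty-two"               # now a character string
seen_types <- c(seen_types, class(value))

value <- (value == "forty-two")    # now the logical result of a comparison
seen_types <- c(seen_types, class(value))

print(seen_types)
```

No declaration is needed at any point; R determines the type from the value being assigned.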

R has the following basic data types. The table shows each data type, the
values it can take, and an example.

Basic Data Type   Values                                   Example
Numeric           Set of all real numbers                  numeric_value = 3.14
Integer           Set of all integers, Z                   integer_value = 42L
Logical           TRUE and FALSE                           logical_value = TRUE
Complex           Set of complex numbers                   complex_value = 1 + 2i
Character         "a", "b", "c", ..., "@", "#", "$",       character_value = "Hello Geeks"
                  ..., "1", "2", ... etc.
Raw               as.raw()                                 single_raw = as.raw(255)

1. Numeric Data type in R

Decimal values are called numeric in R. Numeric is the default data type for
numbers: if we assign a number to a variable x as follows, x will be of numeric
type, even when the value has no decimal part. Numeric values are stored as
double-precision floating-point numbers.

Example:
x = 5
print(class(x))    # Output: [1] "numeric"
print(typeof(x))   # Output: [1] "double"

2. Integer Data type in R

R supports an integer data type, which covers whole numbers. We can create an
integer, or convert a value into one, using the as.integer() function.
We can also add the capital 'L' suffix to a number to denote that it is of the
integer data type.

Example:
x = as.integer(5)
print(class(x))    # Output: [1] "integer"
print(typeof(x))   # Output: [1] "integer"

y = 5L
print(class(y))    # Output: [1] "integer"
print(typeof(y))   # Output: [1] "integer"

3. Logical Data type in R

The logical data type represents Boolean values, which have two possible
values: TRUE or FALSE. A logical value is often created via a comparison
between variables.

Example:
x = 4
y = 3
z = x > y
print(z)           # Output: [1] TRUE
print(class(z))    # Output: [1] "logical"
print(typeof(z))   # Output: [1] "logical"

4. Complex Data type in R

R supports a complex data type, which covers the set of all complex numbers. It
is used to store numbers with an imaginary component.

Example:
x = 4 + 3i
print(class(x))    # Output: [1] "complex"
print(typeof(x))   # Output: [1] "complex"

5. Character Data type in R

The character data type stores character values, or strings. Strings in R can
contain letters, numbers, and symbols.
The easiest way to denote that a value is of character type is to wrap it in
single or double quotation marks.

Example:
char = "Geeksforgeeks"
print(class(char))    # Output: [1] "character"
print(typeof(char))   # Output: [1] "character"


6. Raw data type in R

The raw data type is used to store and work with data at the byte level. A raw
vector holds a sequence of unprocessed bytes, which enables low-level
operations on binary data.

Example:
x = as.raw(c(0x1, 0x2, 0x3, 0x4, 0x5))
print(x)

Output:
[1] 01 02 03 04 05

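
To make the byte-level idea more concrete, the sketch below converts a short string to raw bytes and back using the base R functions charToRaw() and rawToChar(). The string "Hi" and the variable names are illustrative.

```r
# Each character of an ASCII string becomes one byte (printed in hexadecimal)
bytes <- charToRaw("Hi")
print(bytes)

# The same bytes viewed as ordinary integers
byte_values <- as.integer(bytes)
print(byte_values)

# Converting the bytes back recovers the original string
round_trip <- rawToChar(bytes)
print(round_trip)
```
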
Importing files
Importing files in R programming is the process of loading external datasets—
such as those stored in CSV, Excel, text, or database files—into R’s working
environment for analysis and manipulation. There are several methods and
functions available, depending on the file type and your workflow.
1. Importing CSV Files
CSV (Comma-Separated Values) files are widely used for storing tabular data. You
can use the read.csv() function to import these files.

Example:
# Import a CSV file
data_csv = read.csv("path/to/your/file.csv")
# Display the first few rows of the data
head(data_csv)
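
Since "path/to/your/file.csv" is only a placeholder, the following self-contained sketch first creates a small temporary CSV file and then imports it. The file contents and column names are made up for illustration.

```r
# Create a small CSV file in a temporary location (illustrative data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,Alice,85",
             "2,Bob,90"), csv_path)

# Import it; the first line is used as the column names by default
data_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the structure: column names, types and first values
str(data_csv)

# Remove the temporary file
unlink(csv_path)
```

Setting stringsAsFactors = FALSE keeps text columns as character vectors on older R versions as well; from R 4.0.0 onward this is already the default.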

2. Importing Excel Files

To import Excel files (.xls or .xlsx formats), you can use the readxl package. If you
haven't installed this package yet, you can do so using the following command:

install.packages("readxl")

After installation, you can use the read_excel() function.

Example:
library(readxl)
# Import an Excel file
data_excel = read_excel("path/to/your/file.xlsx")
# Display the first few rows of the data
head(data_excel)

3. Importing Text Files


For text files, such as tab-delimited files, you can use the read.table() function or
the read.delim() function, which is specifically designed for tab-delimited files.

Example:
# Import a tab-delimited text file
data_text = read.delim("path/to/your/file.txt")
# Display the first few rows of the data
head(data_text)
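
The relationship between read.delim() and read.table() can be sketched with a temporary tab-delimited file. The file contents are illustrative; the arguments header and sep are standard read.table() arguments.

```r
# Create a small tab-delimited file in a temporary location (illustrative data)
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id\tcity",
             "1\tPune",
             "2\tDelhi"), txt_path)

# read.delim() assumes a header line and tab separators
data_delim <- read.delim(txt_path)

# read.table() needs both to be stated explicitly
data_table <- read.table(txt_path, header = TRUE, sep = "\t")

print(data_delim)
print(data_table)

unlink(txt_path)
```

For other delimiters, such as a semicolon or a pipe character, read.table() with a different sep value is the more general choice.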

4. Importing Data from Other Formats


R also supports importing data from other formats like JSON and XML. For JSON files,
you can use the jsonlite package.
Example:
install.packages("jsonlite")
library(jsonlite)
# Import a JSON file
data_json = fromJSON("path/to/your/file.json")
# Display the structure of the data
str(data_json)
Writing files
R is a powerful language widely used for data analytics in various fields.
Analyzing data involves reading data from, and writing data to, various files
such as Excel, CSV, and text files. This section covers several ways of writing
data to different types of files using R.

1. Writing Data to CSV files in R Programming Language

CSV stands for Comma Separated Values. These files are used to handle large
amounts of tabular data. Following is the syntax to write to a CSV file:

Syntax:
write.csv(my_data, file = "my_data.csv")
write.csv2(my_data, file = "my_data.csv")

Here, write.csv() and write.csv2() are functions available in base R.
 write.csv() uses "." for the decimal point and a comma (",") for the
separator.
 write.csv2() uses a comma (",") for the decimal point and a semicolon
(";") for the separator.
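
The difference between the two functions is easiest to see by writing the same data both ways and looking at the raw lines. This is a minimal sketch; my_data and the temporary file paths are illustrative, and row.names = FALSE is used so that row numbers are not written as an extra column.

```r
# Illustrative data with a decimal column
my_data <- data.frame(id = 1:3, price = c(1.5, 2.25, 3))

# Write with write.csv(): "." as decimal point, "," as separator
csv_path <- tempfile(fileext = ".csv")
write.csv(my_data, file = csv_path, row.names = FALSE)
csv_lines <- readLines(csv_path)

# Write with write.csv2(): "," as decimal point, ";" as separator
csv2_path <- tempfile(fileext = ".csv")
write.csv2(my_data, file = csv2_path, row.names = FALSE)
csv2_lines <- readLines(csv2_path)

print(csv_lines)
print(csv2_lines)

# Reading the first file back gives the original values
read_back <- read.csv(csv_path)

unlink(c(csv_path, csv2_path))
```

A file written with write.csv2() should be read back with read.csv2(), which expects the same conventions.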

2. Writing Data to text files

Text files are commonly used in almost every application. Writing to .txt files
is very similar to writing CSV files. Following is the syntax to write a
tab-separated text file:

Syntax:
write.table(my_data, file = "my_data.txt", sep = "\t")

Note that the sep argument is the string placed between values, so an empty
string (sep = "") would write the values with nothing between them.
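
A minimal, self-contained sketch of writing a tab-separated text file is shown below. The data frame and the temporary path are illustrative; row.names and quote are standard write.table() arguments used here to keep the file plain.

```r
# Illustrative data
my_data <- data.frame(name = c("Alice", "Bob"), score = c(85, 90))

# Write a tab-separated file without row names or quotation marks
txt_path <- tempfile(fileext = ".txt")
write.table(my_data, file = txt_path, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Look at the raw lines that were written
txt_lines <- readLines(txt_path)
print(txt_lines)

unlink(txt_path)
```
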

3. Writing Data to Excel files:

To write data to Excel we need to install the xlsx package, a Java-based
solution for reading, writing, and formatting Excel files. It can be installed
as follows:

install.packages("xlsx")

It can then be loaded and used with the following general syntax:

library("xlsx")
write.xlsx(my_data, file = "result.xlsx",
           sheetName = "my_data", append = FALSE)

Merging Files:

Merging files refers to the process of combining the contents of two or more files
into a single file. This is commonly done in programming, data analysis, and version
control to consolidate information, resolve differences, or create a unified dataset.

Merging files in R typically involves combining two or more data frames based on
common columns. This is a common task in data analysis, allowing you to create
richer datasets by integrating information from different sources. R provides several
methods for merging data, including the base merge() function, the dplyr package,
and the data.table package.

1. Using the merge() Function


The merge() function in base R is a straightforward way to combine data frames. It
supports various types of joins, such as inner join, left join, right join, and full outer
join.

Example:

# Creating two data frames
df1 = data.frame(ID = c(1, 2, 3, 4), Name = c("A", "B", "C", "D"),
                 Age = c(25, 30, 35, 40))
df2 = data.frame(ID = c(2, 3, 4, 5),
                 Occupation = c("Engineer", "Teacher", "Doctor", "Lawyer"),
                 Salary = c(5000, 4000, 6000, 7000))

# Inner join (default behavior)
inner_join = merge(df1, df2, by = "ID")
print(inner_join)

Output:

ID  Name  Age  Occupation  Salary
2   B     30   Engineer    5000
3   C     35   Teacher     4000
4   D     40   Doctor      6000
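
The other join types mentioned above are selected with the all.x, all.y and all arguments of merge(). The sketch below uses cut-down versions of the two data frames; the result names (left_join_df and so on) are illustrative.

```r
df1 <- data.frame(ID = c(1, 2, 3, 4), Name = c("A", "B", "C", "D"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Occupation = c("Engineer", "Teacher", "Doctor", "Lawyer"))

# Left join: keep every row of df1, fill missing matches with NA
left_join_df <- merge(df1, df2, by = "ID", all.x = TRUE)

# Right join: keep every row of df2
right_join_df <- merge(df1, df2, by = "ID", all.y = TRUE)

# Full outer join: keep every row of both data frames
full_join_df <- merge(df1, df2, by = "ID", all = TRUE)

print(left_join_df)
print(right_join_df)
print(full_join_df)
```

IDs that appear in only one data frame get NA in the columns that come from the other one.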

2. Using the dplyr Package


The dplyr package provides a more intuitive syntax for merging data frames, using
functions like inner_join(), left_join(), right_join(), and full_join().

Example:

# Load dplyr package
library(dplyr)

# Creating the same data frames
df1 <- data.frame(ID = c(1, 2, 3, 4), Name = c("A", "B", "C", "D"),
                  Age = c(25, 30, 35, 40))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Occupation = c("Engineer", "Teacher", "Doctor", "Lawyer"),
                  Salary = c(5000, 4000, 6000, 7000))

# Left join
left_join_result <- left_join(df1, df2, by = "ID")
print(left_join_result)

Output:

ID  Name  Age  Occupation  Salary
1   A     25   <NA>        NA
2   B     30   Engineer    5000
3   C     35   Teacher     4000
4   D     40   Doctor      6000

3. Using the data.table Package


The data.table package is optimized for speed and efficiency, especially with large
datasets. It uses a similar syntax to the base R merge() function but is generally
faster.

Example:

# Load data.table package
library(data.table)

# Creating the same data as data tables
dt1 <- data.table(ID = c(1, 2, 3, 4), Name = c("A", "B", "C", "D"),
                  Age = c(25, 30, 35, 40))
dt2 <- data.table(ID = c(2, 3, 4, 5),
                  Occupation = c("Engineer", "Teacher", "Doctor", "Lawyer"),
                  Salary = c(5000, 4000, 6000, 7000))

# Full outer join
full_join_result <- merge(dt1, dt2, by = "ID", all = TRUE)
print(full_join_result)

Output:

ID  Name  Age  Occupation  Salary
1   A     25   <NA>        NA
2   B     30   Engineer    5000
3   C     35   Teacher     4000
4   D     40   Doctor      6000
5   <NA>  NA   Lawyer      7000

Data manipulation
Data manipulation in R programming refers to the process of transforming,
organizing, and preparing data for analysis. It involves tasks such as selecting
specific data, filtering rows, modifying data, summarizing information, and
reshaping datasets. R provides a variety of functions and packages (like dplyr, tidyr,
and base R functions) to perform efficient data manipulation.

Key Concepts of Data Manipulation in R:

1. Selecting Data: Choosing specific columns or rows.
2. Filtering Data: Subsetting data based on conditions.
3. Mutating Data: Creating new variables or modifying existing ones.
4. Summarizing Data: Aggregating data to find summaries.
5. Reshaping Data: Changing data structure between wide and long formats.

Example: Data Manipulation Using Base R and dplyr

Suppose we have a dataset about students’ marks:

# Sample data frame
students <- data.frame(
  StudentID = 1:5,
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Subject = c("Math", "Science", "Math", "History", "Science"),
  Score = c(85, 90, 78, 88, 92),
  Pass = c(TRUE, TRUE, FALSE, TRUE, TRUE)
)

StudentID  Name     Subject  Score  Pass
1          Alice    Math     85     TRUE
2          Bob      Science  90     TRUE
3          Charlie  Math     78     FALSE
4          David    History  88     TRUE
5          Eva      Science  92     TRUE

Tasks:

1. Select only Name and Score columns.
2. Filter students who passed.
3. Create a new column indicating whether the student scored above 80.
4. Find the average score of students in Science.

R Code for Data Manipulation:

Using base R:

# 1. Select Name and Score columns
selected <- students[, c("Name", "Score")]

# 2. Filter students who passed
passed_students <- students[students$Pass == TRUE, ]

# 3. Create a new column 'HighScore' indicating if Score > 80
students$HighScore <- students$Score > 80

# 4. Calculate average Score for Science subject
science_scores <- students$Score[students$Subject == "Science"]
avg_science_score <- mean(science_scores)

print(selected)
print(passed_students)
print(students)
print(paste("Average Science Score:", avg_science_score))

Using dplyr package (more concise):

library(dplyr)

# 1. Select Name and Score
selected <- students %>%
  select(Name, Score)

# 2. Filter students who passed
passed_students <- students %>%
  filter(Pass == TRUE)

# 3. Add 'HighScore' column
students <- students %>%
  mutate(HighScore = Score > 80)

# 4. Compute average score in Science
avg_science_score <- students %>%
  filter(Subject == "Science") %>%
  summarise(AverageScore = mean(Score))

print(selected)
print(passed_students)
print(students)
print(avg_science_score)

Output of print(selected), the selected columns (Name and Score):

     Name Score
1   Alice    85
2     Bob    90
3 Charlie    78
4   David    88
5     Eva    92
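
The summarizing step can also be done for every subject at once. The sketch below is a base R variant that is not part of the original example: it uses aggregate() to compute the mean score per subject and order() to rank students by score. The data frame repeats the relevant columns of the students data so that the block runs on its own.

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Subject = c("Math", "Science", "Math", "History", "Science"),
  Score = c(85, 90, 78, 88, 92),
  stringsAsFactors = FALSE
)

# Mean score for each subject (one row per subject)
avg_by_subject <- aggregate(Score ~ Subject, data = students, FUN = mean)
print(avg_by_subject)

# Students ranked from highest to lowest score
ranked <- students[order(students$Score, decreasing = TRUE), ]
print(ranked)
```

The formula Score ~ Subject reads as "Score grouped by Subject"; more grouping columns can be added on the right-hand side with +.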

https://www.geeksforgeeks.org/r-language/data-manipulation-in-r-with-dplyr-package/

Creation and Deletion of New Variables
