Module 5 Introduction to R Programming
R Programming:
1. Basics of R
2. Installation of RStudio
3. Vectors
4. Matrices
5. Data types
6. Importing files
7. Writing files
8. Merging Files
9. Data Manipulation
10. Creation and Deletion of New Variables
11. Sorting of Data
12. Functions
13. Graphical Presentation and Descriptive Statistics
What is R programming?
R is a programming language and software environment for statistical computing
and graphics.
It is an interpreted language: code is executed line by line, without a separate
compilation step.
R is mainly used in data analysis and research.
R supports procedural programming with functions and, for some functions,
object-oriented programming with generic functions.
It is widely used as statistical software and as a data analysis tool.
1997 - The R Core Team was formed. This group is the only
one with write access to the R source code; its members review
and enact any suggested changes to the language.
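The two programming styles mentioned above can be sketched in a few lines. This is only an illustration: the names circle_area and describe are hypothetical and do not come from R itself. The object-oriented part uses R's S3 system, where a generic function picks a method based on the class of its argument.

```r
# Procedural style: an ordinary function.
circle_area <- function(r) pi * r^2

# Object-oriented style (S3): describe() is a generic function that
# dispatches to a method chosen by the class of its argument.
describe <- function(x, ...) UseMethod("describe")
describe.default   <- function(x, ...) "an R object"
describe.numeric   <- function(x, ...) paste("a numeric vector of length", length(x))
describe.character <- function(x, ...) paste("a character vector of length", length(x))

circle_area(1)          # 3.141593
describe(c(1.5, 2.5))   # "a numeric vector of length 2"
describe("hello")       # "a character vector of length 1"
describe(TRUE)          # "an R object" (falls back to the default method)
```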
Applications of R
R is used in a variety of fields, including:
Data Science and Machine Learning: R is widely used for data analysis,
statistical modeling and machine learning tasks.
Finance: Financial analysts use R for quantitative modeling and risk analysis.
Healthcare: In clinical research, R helps analyze medical data and test
hypotheses.
Academia: Researchers and statisticians use R for data analysis and
publishing reproducible research.
Advantages of R Programming
Comprehensive Statistical Tools: R includes many statistical functions and
models, making it the ideal choice for data analysis.
Customizable Visualizations: R’s visualization tools allow extensive customization,
from a simple bar chart to a detailed heatmap.
Extensive Community Support: R has a large user base and there are
countless resources, forums and tutorials available.
Highly Extendable: The availability of over 15,000 R packages means we can
extend R's functionality to suit any project or need.
Disadvantages of R Programming
Memory Intensive: R keeps data in memory by default, so very large datasets
can consume a lot of memory and slow processing down.
Limited Support for Error Handling: Unlike some other programming
languages, R has less robust error handling features.
Steeper Learning Curve: Beginners might face challenges with some of R’s
complex features and syntax.
Performance: R’s performance can lag behind languages like Python or C++
when it comes to speed, especially for large-scale operations.
Installation of RStudio
https://teacherscollege.screenstepslive.com/a/1108074-install-r-and-rstudio-for-windows
Array:
An array is a linear data structure in which all elements are arranged sequentially.
It is a collection of elements of the same data type stored at contiguous memory
locations.
Note: Contiguous memory allocation is a memory allocation technique in which a process is allotted a
single continuous block of space in memory.
Vectors
In R programming, a vector is one of the most fundamental data structures. It is a
one-dimensional array that can hold elements of the same type, such as numeric,
character, or logical values. Vectors are used extensively in R for data manipulation
and analysis.
Characteristics of Vectors
Homogeneous: All elements in a vector must be of the same type (e.g., all
numeric or all character).
One-dimensional: Vectors are essentially one-dimensional arrays.
Indexing: Elements in a vector can be accessed using indices, which start
from 1 in R.
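The three characteristics above can be checked directly in the console. This is a minimal sketch; the variable names mixed and v are illustrative.

```r
# Homogeneous: mixing types coerces every element to one common type.
mixed <- c(1, "two", TRUE)
class(mixed)        # "character"

# One-dimensional: a plain vector has a length but no dim attribute.
v <- c(10, 20, 30, 40)
length(v)           # 4
is.null(dim(v))     # TRUE

# Indexing starts at 1.
v[1]                # 10
v[c(2, 4)]          # 20 40
v[-1]               # 20 30 40 (a negative index drops that element)
```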
Creating Vectors
There are several ways to create vectors in R:
1. Using the c() function: The most common method, where c() stands for
"combine."
numeric_vector = c(1, 2, 3, 4, 5)
character_vector = c("apple", "banana", "cherry")
logical_vector = c(TRUE, FALSE, TRUE)
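Besides c(), base R has a few other common ways to build a vector. The sketch below uses the colon operator, seq(), rep() and numeric(); the variable names are illustrative.

```r
ints     <- 1:5                      # integer sequence 1 2 3 4 5
steps    <- seq(0, 1, by = 0.25)     # 0.00 0.25 0.50 0.75 1.00
repeated <- rep("a", times = 3)      # "a" "a" "a"
zeros    <- numeric(3)               # 0 0 0 (a numeric vector of given length)

print(ints)
print(steps)
print(repeated)
print(zeros)
```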
Matrices
In a matrix, rows run horizontally and columns run vertically. In R programming,
matrices are two-dimensional, homogeneous data structures.
Characteristics of Matrices
Two-dimensional: Elements are arranged in rows and columns.
Homogeneous: All elements in a matrix must be of the same type.
Creating Matrices
You can create matrices in R using the matrix() function or by converting vectors
into matrices. Here are a few methods to create matrices:
Syntax:
matrix(data, nrow, ncol, byrow = FALSE)
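A short sketch of the matrix() syntax above, showing how byrow changes the fill order. The names m_col and m_row are illustrative.

```r
# Default (byrow = FALSE): the data fills the matrix column by column.
m_col <- matrix(1:6, nrow = 2, ncol = 3)

# byrow = TRUE: the data fills the matrix row by row.
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(m_col)
print(m_row)

dim(m_row)     # 2 3
m_col[1, 2]    # 3 (row 1, column 2 when filled by column)
m_row[1, 2]    # 2 (row 1, column 2 when filled by row)
```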
Example:
row1 = c(1, 2, 3)
row2 = c(4, 5, 6)
my_matrix2 = rbind(row1, row2) # Bind rows
print(my_matrix2)
Output:
     [,1] [,2] [,3]
row1    1    2    3
row2    4    5    6
Example:
col1 <- c(1, 4)
col2 <- c(2, 5)
my_matrix3 = cbind(col1, col2) # Bind columns
print(my_matrix3)
Output:
     col1 col2
[1,]    1    2
[2,]    4    5
Data types
R has the following basic data types: numeric, integer, logical, complex,
character, and raw. The examples that follow show a value of each type together
with the results of class() and typeof().
Example:
x = 5
print(class(x))
print(typeof(x))
Output:
[1] "numeric"
[1] "double"
Example:
x = as.integer(5)
print(class(x))
print(typeof(x))
y = 5L
print(class(y))
print(typeof(y))
Example:
x=4
y=3
z=x>y
print(z)
print(class(z))
print(typeof(z))
Example:
x = 4 + 3i
print(class(x))
print(typeof(x))
Example:
char = "Geeksforgeeks"
print(class(char))
print(typeof(char))
Example:
x = as.raw(c(0x1, 0x2, 0x3, 0x4, 0x5))
print(x)
Output:
[1] 01 02 03 04 05
There are several tasks that can be performed on R data types, such as finding
the type of an object, checking whether it is of a given type, and converting it
to another type.
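Typical data type tasks are finding an object's type, checking it, and converting it. A minimal sketch using the base functions class(), typeof(), the is.*() family and the as.*() family follows; the variable names are illustrative.

```r
x <- 5

# Find the data type.
class(x)               # "numeric"
typeof(x)              # "double"

# Check the data type.
is.numeric(x)          # TRUE
is.character(x)        # FALSE

# Convert to another data type.
y <- as.character(x)   # "5"
z <- as.integer("42")  # 42L
class(y)               # "character"
typeof(z)              # "integer"
as.logical(0)          # FALSE
```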
Importing files
Importing files in R programming is the process of loading external datasets—
such as those stored in CSV, Excel, text, or database files—into R’s working
environment for analysis and manipulation. There are several methods and
functions available, depending on the file type and your workflow.
1. Importing CSV Files
CSV (Comma-Separated Values) files are widely used for storing tabular data. You
can use the read.csv() function to import these files.
Example:
# Import a CSV file
data_csv = read.csv("path/to/your/file.csv")
# Display the first few rows of the data
head(data_csv)
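2. Importing Excel Files
Excel files can be imported with the read_excel() function from the readxl
package.
Example: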
library(readxl)
# Import an Excel file
data_excel = read_excel("path/to/your/file.xlsx")
# Display the first few rows of the data
head(data_excel)
3. Importing Text Files
Tab-delimited text files can be imported with the read.delim() function.
Example:
# Import a tab-delimited text file
data_text = read.delim("path/to/your/file.txt")
# Display the first few rows of the data
head(data_text)
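The import examples above assume files that already exist on disk. As a self-contained sketch, the code below first creates two small temporary files and then reads them back with read.csv() and read.delim(). The file contents and variable names are made up for illustration.

```r
# Create a small CSV file in a temporary location (illustrative data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Alice,85", "2,Bob,90"), csv_path)

# Create the same data as a tab-delimited text file.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tscore", "1\tAlice\t85", "2\tBob\t90"), txt_path)

# Import both files; the first line is treated as the header by default.
data_csv  <- read.csv(csv_path)
data_text <- read.delim(txt_path)

head(data_csv)
names(data_text)   # "id" "name" "score"
nrow(data_text)    # 2
```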
Writing files
Data frames can be written to disk with functions such as write.csv(),
write.csv2() and write.table().
Syntax:
write.csv(my_data, file = "my_data.csv")
write.csv2(my_data, file = "my_data.csv")
Here,
write.csv() and write.csv2() are functions provided by base R.
write.csv() uses “.” for the decimal point and a comma (“,”) for the
separator.
write.csv2() uses a comma (“,”) for the decimal point and a semicolon
(“;”) for the separator.
Syntax:
write.table(my_data, file = "my_data.txt", sep = "\t") # tab-separated
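A minimal sketch of the three writing functions, using temporary files so that nothing is left in the working directory. The data frame is invented for illustration, and two choices here are assumptions rather than part of the syntax above: row.names = FALSE (to avoid writing row names as an extra column) and a tab separator for write.table().

```r
my_data <- data.frame(id = 1:3, value = c(1.5, 2.25, 3))

csv_file  <- tempfile(fileext = ".csv")
csv2_file <- tempfile(fileext = ".csv")
txt_file  <- tempfile(fileext = ".txt")

write.csv(my_data,  file = csv_file,  row.names = FALSE)
write.csv2(my_data, file = csv2_file, row.names = FALSE)
write.table(my_data, file = txt_file, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Inspect the first data row of each file to compare the conventions.
readLines(csv_file)[2]    # "1,1.5"
readLines(csv2_file)[2]   # "1;1,5"
readLines(txt_file)[2]    # "1\t1.5"

# Reading the CSV back recovers the original values.
back <- read.csv(csv_file)
all.equal(back, my_data)  # TRUE
```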
Merging Files:
Merging files refers to the process of combining the contents of two or more files
into a single file. This is commonly done in programming, data analysis, and version
control to consolidate information, resolve differences, or create a unified dataset.
Merging files in R typically involves combining two or more data frames based on
common columns. This is a common task in data analysis, allowing you to create
richer datasets by integrating information from different sources. R provides several
methods for merging data, including the base merge() function, the dplyr package,
and the data.table package.
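A minimal sketch of the base merge() function mentioned above, using two small data frames that share an ID column. The arguments by, all.x and all belong to merge(); the result names inner, left and full are illustrative.

```r
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("A", "B", "C", "D"),
                  Age = c(25, 30, 35, 40))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Occupation = c("Engineer", "Teacher", "Doctor", "Lawyer"),
                  Salary = c(5000, 4000, 6000, 7000))

# Inner join (default): only IDs present in both data frames.
inner <- merge(df1, df2, by = "ID")

# Left join: every row of df1, with NA where df2 has no match.
left <- merge(df1, df2, by = "ID", all.x = TRUE)

# Full outer join: every ID from either data frame.
full <- merge(df1, df2, by = "ID", all = TRUE)

inner$ID     # 2 3 4
nrow(left)   # 4
full$ID      # 1 2 3 4 5
```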
Example:
df1 = data.frame(ID = c(1, 2, 3, 4),
                 Name = c("A", "B", "C", "D"),
                 Age = c(25, 30, 35, 40))
df2 = data.frame(ID = c(2, 3, 4, 5),
                 Occupation = c("Engineer", "Teacher", "Doctor", "Lawyer"),
                 Salary = c(5000, 4000, 6000, 7000))
Example:
library(dplyr)
# Left join
left_join_result <- left_join(df1, df2, by = "ID")
print(left_join_result)
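Output:
  ID Name Age Occupation Salary
1  1    A  25       <NA>     NA
2  2    B  30   Engineer   5000
3  3    C  35    Teacher   4000
4  4    D  40     Doctor   6000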
Example:
library(data.table)
Output:
Data manipulation
Data manipulation in R programming refers to the process of transforming,
organizing, and preparing data for analysis. It involves tasks such as selecting
specific data, filtering rows, modifying data, summarizing information, and
reshaping datasets. R provides a variety of functions and packages (like dplyr, tidyr,
and base R functions) to perform efficient data manipulation.
Tasks:
Using base R:
print(selected)
print(passed_students)
print(students)
print(paste("Average Science Score:", avg_science_score))
library(dplyr)
Name Score
1 Alice 85
2 Bob 90
3 Charlie 78
4 David 88
5 Eva 92
print(selected)
print(passed_students)
print(students)
print(avg_science_score)
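The print() calls above refer to objects whose definitions are not shown. The following is a minimal base R sketch of the same kinds of tasks, assuming a students data frame built from the Name and Score values listed above. The pass mark of 80, the Grade column and the name avg_score are illustrative assumptions; since no Science column is shown, the average is taken over Score.

```r
students <- data.frame(
  Name  = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Score = c(85, 90, 78, 88, 92)
)

# Select a column (the result is still a data frame).
selected <- students["Name"]

# Filter rows: keep students scoring above the assumed pass mark of 80.
passed_students <- students[students$Score > 80, ]

# Create a new variable, then sort by Score in descending order.
students$Grade <- ifelse(students$Score >= 90, "A", "B")
students <- students[order(-students$Score), ]

# Summarize: the mean score.
avg_score <- mean(students$Score)

print(selected)
print(passed_students)
print(students)
print(paste("Average Score:", avg_score))   # "Average Score: 86.6"
```

A column can be deleted again by assigning NULL to it, for example students$Grade <- NULL.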
https://www.geeksforgeeks.org/r-language/data-manipulation-in-r-with-dplyr-package/