R Programming Notes

The document provides comprehensive notes on R programming, covering key concepts such as data frames, functions like help() and setwd(), and advantages of R over other programming languages. It includes explanations of data types, built-in functions, and methods for reading data from files, as well as practical examples for creating factors and plotting multiple curves. Additionally, it highlights the importance of R in statistical computing and data analysis, emphasizing its open-source nature and rich ecosystem of packages.

Uploaded by

ajitpmbxr2000
R Programming Notes : Q 1 (all parts i to vii)

(i) Briefly explain the importance of dataframe in R.

A data frame is one of the most important data structures in R used to store tabular data. It can
hold data of different types (numeric, character, factor, etc.) in columns. Each column can have
a different data type, but all columns must have the same number of rows. Data frames are
commonly used in data analysis, as they resemble tables in databases or Excel sheets. They
are ideal for statistical modeling and manipulation of datasets.

(ii) What is the purpose of help() in R?

The help() function in R is used to get documentation or information about R functions,
packages, or topics. It is helpful when you want to understand how a function works or what
arguments it takes. For example, help(mean) shows information about the mean() function.
You can also use a shortcut like ?mean.

(iii) How is a function called in R?

In R, a function is called by using its name followed by parentheses. If the function requires
arguments, they are passed inside the parentheses. For example, calling the sum() function
with arguments looks like this: sum(2, 3, 5) which will return 10. User-defined functions are
also called in the same way.

(iv) How are factors created in R?

Factors in R are created using the factor() function. Factors are used to represent
categorical data and store it as levels. For example:

gender <- factor(c("Male", "Female", "Female", "Male"))

This creates a factor variable with levels "Male" and "Female". Factors are useful in statistical
modeling and data analysis.

(v) Explain the purpose of polygon() function used in R.

The polygon() function in R is used to draw polygons by connecting a series of (x, y)
coordinates. It is useful in creating custom shapes, shading areas under curves, or highlighting
regions in a plot. It adds to an existing plot, so a plot must be opened first. Example:

x <- c(1, 2, 3, 4)
y <- c(2, 3, 2, 1)
plot(x, y, type = "n")   # empty plot that sets up the axes
polygon(x, y, col = "lightblue")

This draws a light blue polygon connecting the points.

(vi) What do you mean by data casting?

Data casting refers to converting data from one type to another in R. For example, converting a
numeric value to a character, or a factor to numeric. This is done using functions like
as.numeric(), as.character(), as.logical(), etc. For example:

x <- as.numeric("5")

This casts the character "5" into a numeric value 5.
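
One pitfall is worth a short illustration: calling as.numeric() directly on a factor returns the internal level codes, not the labels. The sketch below uses made-up variable names and also shows what happens when text cannot be converted.

```r
# A factor whose labels look like numbers
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                # internal level codes: 1 2 1
values <- as.numeric(as.character(f))  # the labels as numbers: 10 20 10

# Text that is not a number becomes NA (R also issues a warning)
bad <- suppressWarnings(as.numeric("abc"))

print(codes)
print(values)
print(is.na(bad))
```

Converting through as.character() first is the usual way to recover the original numbers from a factor.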

(vii) Explain the purpose of setwd().

The setwd() function is used to set the working directory in R. The working directory is the
folder where R looks for files to read and where it saves files. Setting it correctly is important for
file input/output operations. Example:

setwd("C:/Users/Student/Documents")

This sets the working directory to the specified path so that you can easily read or write files
from/to that location.

Q 2: Write down the advantages of R language over other
programming languages of this category.

R is a specialized language designed for data analysis and statistical computing. It offers
several advantages over other languages and tools such as Python, SAS, or MATLAB,
especially in the context of statistical analysis and data visualization:

1.​ Free and Open Source: R is completely free to use, modify, and distribute. This makes
it accessible to individuals and institutions without licensing fees.
2.​ Rich Collection of Packages: R has thousands of packages in CRAN (Comprehensive
R Archive Network) for different domains like bioinformatics, machine learning, finance,
and more.
3.​ Built-in Statistical Functions: R has a vast library of inbuilt functions for linear and
nonlinear modeling, time-series analysis, clustering, classification, and more.
4.​ Powerful Data Visualization: R excels at data visualization using packages like
ggplot2, lattice, and plotly. It can create detailed and publication-quality graphs.
5.​ Great for Data Cleaning and Manipulation: With packages like dplyr, tidyr, and
data.table, R makes it easy to manipulate and clean large datasets efficiently.
6.​ Platform Independent: R runs on all major operating systems (Windows, macOS,
Linux).
7.​ Community Support: R has an active global community of users, which means it's easy
to find help, tutorials, and packages.
8.​ Integration with Other Languages: R can work with C, C++, Java, and Python,
allowing integration into larger software projects.​

Q 3: Write and explain the different data types
supported by the R programming language.

R supports a variety of data types that are essential for data processing and analysis. The main
data types in R are:

1. Numeric: Decimal (real) numbers, stored in double precision. A number typed without a
decimal point is also numeric by default. Example: x <- 5.6 or y <- 10.
2. Integer: Whole numbers, explicitly defined using the L suffix. Example: x <- 10L.
3. Character: Text or string data, enclosed in quotes. Example: name <- "R Language".
4.​ Logical: Boolean values that are either TRUE or FALSE. Used in conditions and logical
operations. Example: x <- TRUE.
5.​ Complex: Numbers with real and imaginary parts. Example: z <- 4 + 5i.
6.​ Raw: Represent bytes and are used in advanced programming tasks like encryption or
data compression.

Each of these data types is the foundation for more complex structures like vectors, matrices,
and data frames. R provides functions like class() and typeof() to check the data type of
any object. Understanding data types is crucial for writing error-free and optimized code in R.
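
As a quick check of the list above, the following sketch creates one value of each type and reports what class() and typeof() return. The list and its element names are only illustrative.

```r
# One value of each basic type (names are illustrative)
values <- list(
  numeric   = 5.6,
  integer   = 10L,
  character = "R Language",
  logical   = TRUE,
  complex   = 4 + 5i,
  raw       = as.raw(255)
)

# class() gives the high-level class, typeof() the internal storage type
types <- data.frame(
  class  = sapply(values, class),
  typeof = sapply(values, typeof)
)
print(types)
```

Note that a numeric value reports class "numeric" but typeof "double", which is why y <- 10 is not an integer unless written as 10L.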

Q 4: Write the code of a program in R which will accept the height and
weight of all students of your class and it will display the details of all
students whose height is less than 6 feet and weight is more than 90
kg.

R Code:
# Accepting height and weight of students
# Sample data (you can take input manually or read from a file)
students <- data.frame(
  Name = c("Ravi", "Pooja", "Amit", "Sneha", "Karan"),
  Height = c(5.5, 6.1, 5.8, 5.4, 6.0),  # in feet
  Weight = c(92, 85, 95, 88, 91)        # in kg
)

# Display the original data (cat() prints the heading without quotes)
cat("All Students' Data:\n")
print(students)

# Filtering students with height < 6 and weight > 90
filtered_students <- subset(students, Height < 6 & Weight > 90)

# Displaying the filtered data
cat("\nStudents with height < 6 feet and weight > 90 kg:\n")
print(filtered_students)

✅ Output:
All Students' Data:
   Name Height Weight
1  Ravi    5.5     92
2 Pooja    6.1     85
3  Amit    5.8     95
4 Sneha    5.4     88
5 Karan    6.0     91

Students with height < 6 feet and weight > 90 kg:
  Name Height Weight
1 Ravi    5.5     92
3 Amit    5.8     95

✅ Explanation:

● The data.frame() function is used to create a table-like structure.
● The subset() function filters the students where Height < 6 and Weight > 90.
● This is a common way to process and display data in R using conditional logic.

Q 5: Explain the utility of the following built-in functions of R with
examples:

✅ (a) paste()
The paste() function is used to concatenate (join) multiple strings together.

Syntax: paste(..., sep = " ", collapse = NULL)

Example:

name <- paste("R", "Programming", sep = " ")
print(name) # Output: "R Programming"

You can also concatenate vectors:

x <- c("Hello", "Good")
y <- c("World", "Morning")
paste(x, y, sep = "-") # Output: "Hello-World" "Good-Morning"
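
The collapse argument from the syntax line is not used in the examples above. A small sketch of what it does, together with the related base function paste0(), might look like this (the vectors are made up for illustration):

```r
words <- c("R", "is", "fun")

# collapse joins the elements of one vector into a single string
sentence <- paste(words, collapse = " ")

# paste0() behaves like paste() with sep = ""
ids <- paste0("student_", 1:3)

print(sentence)  # "R is fun"
print(ids)       # "student_1" "student_2" "student_3"
```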

✅ (b) max()
The max() function returns the maximum value from a set of numeric values.

Example:

marks <- c(45, 67, 89, 74, 56)
max(marks) # Output: 89

It helps find the highest value in a dataset.

✅ (c) seq()
The seq() function is used to generate a sequence of numbers.

Syntax: seq(from, to, by) or seq(from, to, length.out = n)

Example:

seq(1, 10, by = 2) # Output: 1 3 5 7 9

It is useful for creating loops, indexing, or generating evenly spaced numbers.
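
The length.out form and a decreasing sequence can be sketched as follows. seq_along() is a related base function that is convenient for loop indices; the marks vector is only sample data.

```r
evenly_spaced <- seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
countdown     <- seq(10, 1, by = -3)        # 10 7 4 1

# seq_along() gives the index positions of a vector
marks <- c(45, 67, 89)
idx   <- seq_along(marks)                   # 1 2 3

print(evenly_spaced)
print(countdown)
print(idx)
```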

✅ (d) mean()
The mean() function is used to calculate the average of numeric values.

Example:

scores <- c(80, 85, 90, 75, 95)
mean(scores) # Output: 85

It is commonly used in statistics to find the central value of a dataset.

Q 6 : Describe the R functions for reading a matrix or data frame from a file. Also
demonstrate with some examples. List the different problems which can be encountered
during this process.

R Functions for Reading Matrix/Data Frame from a File

1. read.table()

○ Reads a file in table format and creates a data frame.
○ Typical call for a comma-separated file: read.table(file, header = TRUE, sep = ",")
○ The defaults are header = FALSE and sep = "" (any whitespace), so both usually need to
be set explicitly.

Example:

data <- read.table("data.txt", header = TRUE, sep = ",")
print(data)

2. read.csv()

○ A wrapper for read.table() with header = TRUE and sep = "," by default.
○ Syntax: read.csv(file, header = TRUE)

Example:

df <- read.csv("students.csv")
print(df)

3. readLines()

○ Reads text lines from a file. Useful for line-by-line processing.

Example:

lines <- readLines("file.txt")
print(lines)

4. scan()

○ Reads data into a vector or list. Useful for numeric data.

Example:

numbers <- scan("numbers.txt")
print(numbers)

5. Converting Data Frame to Matrix

○ Use as.matrix() to convert a data frame to a matrix. If the columns have mixed types,
the result is a character matrix.

Example:

df <- read.csv("data.csv")
mat <- as.matrix(df)
print(mat)
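
The file names in the examples above are placeholders. The following self-contained sketch writes a small temporary file first (the contents are made up) so that the reading functions can be tried without any existing data.

```r
# Write a small sample file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("Name,Height,Weight",
             "Ravi,5.5,92",
             "Pooja,6.1,85"), path)

df1 <- read.csv(path)                              # header = TRUE, sep = "," by default
df2 <- read.table(path, header = TRUE, sep = ",")  # same result, options spelled out
raw_lines <- readLines(path)                       # one character string per line

# Only the numeric columns are converted, so the matrix stays numeric
mat <- as.matrix(df1[, c("Height", "Weight")])

str(df1)
print(mat)
```

Selecting the numeric columns before calling as.matrix() avoids the whole matrix being coerced to character by the Name column.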

Common Problems Encountered

1. Incorrect Delimiters:
The file might use a delimiter other than a comma (e.g., tab, semicolon), which causes
incorrect parsing.

2. Missing Headers:
If header is set incorrectly, the first data row may be read as column names, or the
column names may be read as data.

3. Incorrect File Path:
The file may not be located in the working directory, or the path may be incorrect.

4. Encoding Issues:
Characters stored in an encoding other than the one R expects (for example, non-UTF-8
text) can cause errors or be displayed incorrectly.

5. Data Type Mismatch:
Columns with mixed data types can cause coercion issues or produce unintended
results.

6. Empty or Corrupted Files:
Reading an empty or corrupted file may throw errors.
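
Several of these problems can be guarded against in code. The sketch below is one possible approach, not a standard recipe: safe_read_csv() is a hypothetical helper written for this example, and the semicolon-delimited sample file is made up.

```r
# Hypothetical helper: returns a data frame, or NULL if the file cannot be read
safe_read_csv <- function(path, sep = ",", encoding = "UTF-8") {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  tryCatch(
    read.table(path, header = TRUE, sep = sep,
               fileEncoding = encoding, stringsAsFactors = FALSE),
    error = function(e) {
      message("Could not read file: ", conditionMessage(e))
      NULL
    }
  )
}

# A semicolon-delimited sample file
path <- tempfile(fileext = ".csv")
writeLines(c("Name;Marks", "Ravi;80", "Pooja;91"), path)

wrong  <- safe_read_csv(path)              # wrong delimiter: everything lands in one column
right  <- safe_read_csv(path, sep = ";")   # correct delimiter: two columns
absent <- safe_read_csv("no_such_file.csv")

print(ncol(wrong))
print(ncol(right))
str(right)
```

Checking ncol() and str() right after reading is a cheap way to notice a wrong delimiter or an unexpected column type early.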

Q 7: Write and explain the procedure to create and generate the R-factors
and factor levels.

Answer: In R, factors are used to represent categorical data. Internally, a factor stores
integer codes together with the set of levels (categories) that those codes refer to.

Creating Factors
# Create a vector of genders
gender <- c("Male", "Female", "Female", "Male", "Male")

# Convert to factor
gender_factor <- factor(gender)

# Print the factor
print(gender_factor)

Output:
[1] Male   Female Female Male   Male
Levels: Female Male

Checking Levels
levels(gender_factor)

Output:

[1] "Female" "Male"

Changing the Order of Levels

gender_factor <- factor(gender, levels = c("Male", "Female"))

Creating Ordered Factors

# Education levels
education <- c("High School", "Bachelor", "Master", "PhD")

# Ordered factor
education_factor <- factor(education,
                           levels = c("High School", "Bachelor",
                                      "Master", "PhD"),
                           ordered = TRUE)

Summary of Factor Usage

● factor(): Creates a factor.
● levels(): Retrieves or sets levels.
● nlevels(): Returns the number of levels.
● is.factor(): Checks if a variable is a factor.
● ordered = TRUE: Creates an ordered factor.
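
The functions in this summary can be tried together on a small ordered factor. The size variable and its levels below are invented for illustration.

```r
size <- factor(c("Medium", "Small", "Large", "Small"),
               levels = c("Small", "Medium", "Large"),
               ordered = TRUE)

n_levels <- nlevels(size)      # 3
codes    <- as.integer(size)   # internal codes: 2 1 3 1
counts   <- table(size)        # counts per level, in level order
smaller  <- size < "Large"     # TRUE TRUE FALSE TRUE

print(is.factor(size))
print(counts)
print(smaller)
```

Comparisons such as size < "Large" are only meaningful for ordered factors; on an unordered factor they produce NA with a warning.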

Q 8: Explain the apply method used in R. Also explain lapply and sapply
with suitable examples.

Answer:

R provides a family of apply functions to perform repetitive tasks over data structures without
writing explicit loops.

1. apply()

Used for matrices or data frames, applying a function over rows or columns.

Syntax:

apply(X, MARGIN, FUN)

● X: matrix or data frame
● MARGIN: 1 for rows, 2 for columns
● FUN: function to apply

Example:

mat <- matrix(1:9, nrow = 3)
apply(mat, 1, sum) # Row-wise sum: 12 15 18

2. lapply()

Applies a function over each element of a list or vector, returns a list.

Example:

nums <- list(a = 1:5, b = 6:10)
lapply(nums, mean)

3. sapply()

Similar to lapply(), but returns a vector or matrix instead of a list (if possible).

Example:

nums <- list(a = 1:5, b = 6:10)
sapply(nums, mean)

Summary Table
Function   Input Type          Output Type    Use Case
apply()    Matrix/Data Frame   Vector/Array   Apply function across rows/cols
lapply()   List/Vector         List           Apply function element-wise
sapply()   List/Vector         Simplified     Same as lapply but simplified
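
To make the differences in output type concrete, here is a short side-by-side sketch. vapply() is an additional base function not covered above; it works like sapply() but checks the type of each result.

```r
mat <- matrix(1:6, nrow = 2)     # rows are (1, 3, 5) and (2, 4, 6)

row_sums <- apply(mat, 1, sum)   # 9 12
col_sums <- apply(mat, 2, sum)   # 3 7 11

nums <- list(a = 1:5, b = 6:10)
as_list   <- lapply(nums, mean)  # list with a = 3 and b = 8
as_vector <- sapply(nums, mean)  # named numeric vector: a = 3, b = 8

# vapply() requires each result to be a single numeric value here
checked <- vapply(nums, mean, numeric(1))

print(row_sums)
print(as_list)
print(as_vector)
```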

Q 9: Explain how multiple curves are plotted in the same graph. Illustrate
with a suitable example.

Answer:

In R, you can plot multiple curves (lines) on the same graph using the plot() function
followed by lines() or points() for additional curves.

Step-by-Step Procedure

1. Use plot() to draw the first curve.
2. Use lines() or points() to add more curves to the same plot.
3. Use different colors or line types to differentiate curves.

Example:
# Data for curve 1
x <- 1:10
y1 <- x^2

# Data for curve 2
y2 <- x^1.5

# Plot the first curve
plot(x, y1, type = "l", col = "blue", lwd = 2, ylim = c(0, 100),
     xlab = "X values", ylab = "Y values", main = "Multiple Curves")

# Add second curve
lines(x, y2, col = "red", lwd = 2, lty = 2)

# Add legend
legend("topleft", legend = c("y = x^2", "y = x^1.5"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)

Explanation:

● type = "l": line plot
● col: color of the line
● lty: line type (solid, dashed, etc.)
● lwd: line width
● ylim: sets y-axis range to fit all curves
● legend(): helps identify each curve

Output:

This will create a graph with two curves:

● A blue solid line for y = x^2
● A red dashed line for y = x^1.5
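
An alternative to plot() followed by lines() is the base function matplot(), which draws one curve per matrix column and chooses an axis range that fits all of them. The sketch below reuses the same two curves; the column names are illustrative.

```r
x  <- 1:10
ys <- cbind(square = x^2, power_1.5 = x^1.5)   # one column per curve

# matplot() picks a y-axis range that covers every column
matplot(x, ys, type = "l", lty = 1:2, lwd = 2, col = c("blue", "red"),
        xlab = "X values", ylab = "Y values", main = "Multiple Curves")
legend("topleft", legend = c("y = x^2", "y = x^1.5"),
       col = c("blue", "red"), lty = 1:2, lwd = 2)

# The same range can be computed by hand and passed as ylim to plot()
y_range <- range(ys)
print(y_range)
```

Computing range() over all curves avoids hard-coding a ylim value that might cut off one of them.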

UNIT-I of R Programming

1. Introduction to R: What is R?

R is a programming language and environment developed specifically for statistical computing
and graphics. It was created by statisticians Ross Ihaka and Robert Gentleman and is now
maintained by the R Core Team. R is widely used in academia, research, and
industry for data analysis, statistical modeling, machine learning, and data visualization. It is
open-source and freely available, which contributes to its widespread popularity. Being a
domain-specific language, R is optimized for data manipulation, statistical analysis, and
graphical representation of results.

2. Why R?

R is chosen by statisticians and data scientists for several reasons. It has a rich ecosystem of
packages for performing statistical tests, building predictive models, and visualizing data. R is
highly extensible, allowing users to develop their own functions and packages. Additionally, R is
supported by a strong community, with numerous tutorials, documentation, and forums available
online. It is especially useful for exploratory data analysis due to its interactive environment and
strong visualization capabilities.

3. Advantages of R over Other Programming Languages

R has several advantages over general-purpose programming languages like C, Java, or
Python when it comes to data analysis and statistics. It has built-in functions for a wide range of
statistical techniques. R also supports high-quality data visualization through libraries like
ggplot2 and lattice. Unlike many other languages, R allows for direct manipulation and
analysis of datasets without requiring extensive boilerplate code. The huge number of
specialized packages in R (like dplyr, caret, shiny, etc.) gives it an edge for data-centric
tasks.

4. R Studio: R Command Prompt, R Script File, Comments

RStudio is an Integrated Development Environment (IDE) for R that makes coding easier and
more organized. The R command prompt is where you can directly enter and execute R
commands interactively. The R script file is a text file where you write multiple lines of code to
execute all at once or step-by-step. These files are saved with the .R extension. Comments in
R are written using the # symbol and are used to describe the code. Comments help make the
code readable and maintainable, especially when shared with others or revisited later.

5. Handling Packages in R: Installing an R Package, Few Commands to Get Started

R allows users to expand its functionalities by using packages, which are collections of
functions, data, and documentation bundled together. To use a package, you must first install it
using install.packages("package_name"). Once installed, you must load it into your R
session using library(package_name). To check which packages are installed, use the
command installed.packages(). To know more about a package, use
packageDescription("package_name"). If you need help with a function or package, use
help(function_name) or simply ?function_name. You can also use
find.package("package_name") to locate the installation path.

6. Input and Output – Entering Data from Keyboard

In R, data can be entered directly from the keyboard. This is useful for testing or small inputs.
You can use the c() function to create vectors, like x <- c(1, 2, 3, 4). For entering text
data, use c("apple", "banana"). You can also use the scan() function for numeric input
from the keyboard. This kind of input is typically useful for quick data entry or interactive
prompts. R also supports reading data from files (CSV, Excel, etc.), but keyboard entry is
fundamental for learning the language.

7. R - Data Types: Vectors, Lists, Matrices, Arrays, Factors, Data Frame

● Vectors are the most basic data type in R and can only hold elements of the same type
(e.g., numeric or character). Example: v <- c(1, 2, 3)

● Lists can store elements of different types (e.g., numbers, strings, vectors, and even
other lists). Example: lst <- list(1, "hello", TRUE)

● Matrices are 2D data structures where all elements must be of the same type. Created
using the matrix() function.

● Arrays are like matrices but with more than two dimensions. Example:
array(1:8, dim = c(2,2,2)) creates a 3D array.

● Factors are used to handle categorical data and store data as levels. Example:
factor(c("male", "female", "male"))

● Data Frames are tabular data structures where each column can contain different types
of data. Example: data.frame(Name=c("Alice","Bob"), Age=c(25,30)). Data
frames are similar to Excel sheets and are widely used for data analysis.

8. R - Variables: Variable Assignment, Data Types of Variable, Finding
Variables using ls(), Deleting Variables

In R, variables are created by assigning values using the <- or = operators. Example: x <-
10 or y = "Hello". The data type of a variable is determined by the kind of data stored in it.
You can check the type using typeof(x) or class(x).

To see all variables currently stored in the R environment, use the command ls(). This helps in
keeping track of your workspace. If you want to delete a variable, use the rm() function. For
example, rm(x) will remove the variable x from memory.

UNIT-II of R Programming

R Operators

R provides various types of operators that are used to perform operations on variables and
values.

Arithmetic Operators include + for addition, - for subtraction, * for multiplication,
/ for division, ^ for exponentiation, %% for modulus (remainder), and %/% for integer division. For
example, if a <- 10 and b <- 3, then a + b returns 13, a %% b returns 1, and a ^ b
returns 1000.

Relational Operators are used to compare two values and return logical results
(TRUE or FALSE). These operators include == (equal to), != (not equal to), <, >, <=, and >=. For
example, a > b returns TRUE and a == b returns FALSE.

Logical Operators are used for combining logical expressions. These include & (element-wise
AND), | (element-wise OR), ! (NOT), and && and || (AND and OR for single logical values, with
short-circuit evaluation; since R 4.3.0 they give an error if an operand has more than one
element). For example, (a > 5 & b < 5) returns TRUE, while !(a == b) returns TRUE.

Assignment Operators are used to assign values to variables. The common ones are <-, =, ->,
and <<-. For example, x <- 5 and 5 -> y both assign the value 5 to a variable.

Miscellaneous Operators include : (sequence generator) and %in% (checks if an element
belongs to a vector). For example, 1:5 gives 1 2 3 4 5, and 3 %in% c(1,2,3) returns TRUE.
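
A few of the operators that are not shown in the examples above can be tried as follows. The increment() function is a hypothetical helper included only to show what <<- does.

```r
a <- 10
b <- 3

quotient  <- a %/% b                         # 3 (integer division)
remainder <- a %% b                          # 1
elementwise <- c(TRUE, FALSE) & c(TRUE, TRUE)   # TRUE FALSE
single      <- (a > 5) && (b < 5)            # TRUE
membership  <- c(3, 7) %in% c(1, 2, 3)       # TRUE FALSE

# <<- assigns to a variable in an enclosing environment, not a local one
counter <- 0
increment <- function() counter <<- counter + 1
increment()

print(c(quotient, remainder))
print(elementwise)
print(counter)                               # 1
```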

R Decision Making

Decision-making statements in R help execute code conditionally. The if statement evaluates a
condition, and if it is TRUE, executes a block of code. For instance, if (x > 0) {
print("Positive") } will print "Positive" only when x is greater than 0. The if-else
statement provides two paths: one for TRUE and one for FALSE. For example, if (x > 0) {
print("Positive") } else { print("Negative") } prints "Positive" if x is greater
than 0, otherwise it prints "Negative". The if – else if – else statement allows checking multiple
conditions in sequence. For example:

if (x > 0) {
  print("Positive")
} else if (x == 0) {
  print("Zero")
} else {
  print("Negative")
}

This structure checks whether x is positive, zero, or negative. The switch statement is used
when you want to match a value with multiple choices and execute a corresponding block. For
example:

x <- 2
switch(x, "one", "two", "three") # Returns "two"

This returns "two" as it is the second option corresponding to the value of x.
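
switch() also accepts a character string, in which case it matches by name and an unnamed final value acts as the default. The sketch below shows this, along with ifelse(), the vectorised counterpart of if-else. The remark() function and the grade letters are made up for illustration.

```r
# Hypothetical helper mapping a grade letter to a remark
remark <- function(grade) {
  switch(grade,
         A = "Excellent",
         B = "Good",
         "Needs improvement")   # unnamed last value is the default
}

print(remark("A"))   # "Excellent"
print(remark("Z"))   # "Needs improvement"

# ifelse() applies the condition to every element of a vector
x <- c(-2, 0, 5)
signs <- ifelse(x > 0, "Positive", ifelse(x == 0, "Zero", "Negative"))
print(signs)         # "Negative" "Zero" "Positive"
```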



R Loops

Loops are used to execute a block of code repeatedly. The repeat loop continues executing a
block until a break statement is used to stop it. For example:

x <- 1
repeat {
  print(x)
  x <- x + 1
  if (x > 5) break
}

This will print numbers 1 to 5. The while loop keeps running as long as the condition is TRUE.
For instance:

x <- 1
while (x <= 5) {
  print(x)
  x <- x + 1
}

This prints 1 to 5. The for loop is useful for iterating through elements in a vector or list.
Example:

for (i in 1:5) {
  print(i)
}

This also prints 1 to 5. To control loop execution, R provides break and next statements. The
break statement exits the loop prematurely when a condition is met. The next statement skips
the current iteration and continues with the next one. Example:

for (i in 1:5) {
  if (i == 3) next
  print(i)
}

This skips 3 and prints 1, 2, 4, 5.


R Functions

Functions in R allow for reusable blocks of code. You define a function using the function()
keyword. A simple user-defined function might look like:

add <- function(a, b) {
  return(a + b)
}
add(2, 3) # Returns 5

This function takes two arguments and returns their sum. Built-in functions are predefined and
commonly used in R. For example:

● mean(c(1, 2, 3, 4)) returns the average: 2.5
● sum(1:5) returns 15
● min(c(4, 2, 9)) returns 2
● max(c(4, 2, 9)) returns 9
● paste("Hello", "R") returns "Hello R"
● seq(1, 10, by = 2) returns 1 3 5 7 9

You can also call functions without arguments. For example:

greet <- function() {
  print("Hello World")
}
greet() # Outputs "Hello World"

Functions can be called with argument values too. Example:

multiply <- function(x, y) {
  return(x * y)
}
multiply(4, 5) # Returns 20
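
Arguments can also have default values and can be passed by name in any order. A minimal sketch, using a hypothetical power() function:

```r
# exponent has a default value, so it may be omitted in the call
power <- function(base, exponent = 2) {
  base^exponent
}

squared <- power(3)                       # 9 (default exponent)
cubed   <- power(3, 3)                    # 27 (matched by position)
named   <- power(exponent = 3, base = 2)  # 8 (matched by name)

print(c(squared, cubed, named))
```

The value of the last evaluated expression is returned automatically, so an explicit return() is optional.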

UNIT-III of R Programming

R – Strings: Manipulating Text in Data

R provides several functions to manipulate strings or textual data. The substr() function is
used to extract or replace substrings in a character vector. For example, substr("Welcome",
1, 4) returns "Welc". The strsplit() function splits strings into substrings based on a
delimiter and returns a list. For instance, strsplit("Hello World", " ") returns a list whose
first element is the character vector "Hello" "World". The paste() function combines strings.
For example, paste("R", "Programming") returns "R Programming"; using
paste(..., sep = "-") gives a custom separator. The grep() function searches for patterns in
strings and returns matching indices. For example, grep("a", c("cat", "dog", "rat")) returns
1 and 3 (for "cat" and "rat"). The toupper() and tolower() functions convert text to uppercase
and lowercase respectively. Example: toupper("r language") returns "R LANGUAGE" and
tolower("HELLO") returns "hello".
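
The functions above can be combined as in the following sketch. nchar() and sub() are further base R functions not mentioned in the notes, and the sample strings are illustrative.

```r
s <- "Hello World"

parts <- strsplit(s, " ")[[1]]   # [[1]] extracts the vector "Hello" "World" from the list
first <- substr(s, 1, 5)         # "Hello"
n     <- nchar(s)                # 11 characters

animals <- c("cat", "dog", "rat")
idx     <- grep("a", animals)                 # 1 3
matches <- grep("a", animals, value = TRUE)   # "cat" "rat"
swapped <- sub("World", "R", s)               # "Hello R"

print(parts)
print(matches)
print(swapped)
```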

R – Vectors

Vectors are basic data structures in R. A sequence vector can be created using : or seq();
for example, 1:5 gives 1 2 3 4 5, and seq(1, 10, 2) gives 1 3 5 7 9. The rep()
function repeats elements; rep(1:3, times=2) gives 1 2 3 1 2 3. Vector access is done
using indexing, such as v[2] for the second element. You can assign names to vector
elements like names(v) <- c("a", "b", "c"). Vector math allows operations like
addition, subtraction, multiplication, and division element-wise. For example, c(1,2,3) +
c(4,5,6) results in 5 7 9. Vector recycling occurs when unequal-length vectors are used in
operations; shorter ones are reused. Sorting vectors is done with sort(), order(), and
rev(). For example, sort(c(3,1,2)) returns 1 2 3.
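
Named access, recycling, and the difference between sort() and order() are easiest to see on a small example. The vector below is made up for illustration.

```r
v <- c(b = 30, a = 10, c = 20)

by_name <- v["a"]        # access by name: 10
large   <- v[v > 15]     # logical indexing: b = 30, c = 20

# Recycling: the shorter vector is repeated to match the longer one
recycled <- c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24

sorted     <- sort(v)        # values in increasing order: 10 20 30
positions  <- order(v)       # positions that would sort v: 2 3 1
descending <- rev(sort(v))   # 30 20 10

print(recycled)
print(sorted)
print(positions)
```

sort() returns the sorted values, while order() returns the indices, which is what is needed to reorder another vector or the rows of a data frame.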

R – List

Lists can hold different types of data. A list is created using list() like myList <-
list(name="Tom", age=25, scores=c(85,90,95)). You can use tags (names) to
access elements: myList$name. Elements can be added using indexing: myList$city <-
"Delhi", or removed using myList$age <- NULL. To get the size of a list, use
length(myList). You can merge lists using c(list1, list2), and convert a list to a
vector using unlist(myList).

R – Matrices

A matrix is a two-dimensional array of the same data type, created using matrix() function.
For example, matrix(1:6, nrow=2) gives a 2×3 matrix. Accessing matrix elements is
done with row and column indices like m[1,2]. Matrix computations include element-wise
addition (+), subtraction (-), multiplication (*), and division (/). For example, m1 + m2 adds two
matrices. Matrix multiplication (not element-wise) is done using %*%.
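
The difference between * and %*% is shown in the sketch below. diag() and t() are base functions for creating an identity matrix and transposing; the matrices are illustrative.

```r
m1 <- matrix(1:4, nrow = 2)   # filled column by column: rows (1, 3) and (2, 4)
m2 <- diag(2)                 # 2 x 2 identity matrix

elementwise <- m1 * m2        # rows (1, 0) and (0, 4)
product     <- m1 %*% m2      # equals m1, because m2 is the identity
transposed  <- t(m1)          # rows (1, 2) and (3, 4)
cell        <- m1[1, 2]       # 3

print(elementwise)
print(product)
```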

R – Arrays

Arrays in R are used to store multi-dimensional data. An array can be created with the array()
function. Example: array(1:8, dim=c(2,2,2)) creates a 3D array. You can name rows,
columns, and layers by passing the dimnames argument while creating the array. Accessing
elements is done with indices such as arr[1,2,1] (1st row, 2nd column, 1st layer). Arrays
support manipulation using indexing, and calculations across elements can be done using
functions like apply(arr, MARGIN, FUN), where MARGIN is 1 for rows, 2 for columns, and 3
for layers.
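
A short sketch of a named 3D array and apply() over different margins follows. The dimension names are invented for the example.

```r
arr <- array(1:8, dim = c(2, 2, 2),
             dimnames = list(c("r1", "r2"), c("c1", "c2"), c("layer1", "layer2")))

by_index <- arr[1, 2, 1]               # 3
by_name  <- arr["r2", "c2", "layer2"]  # 8

layer_totals <- apply(arr, 3, sum)     # layer1 = 10, layer2 = 26
row_totals   <- apply(arr, 1, sum)     # r1 = 16, r2 = 20

print(layer_totals)
print(row_totals)
```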

R – Factors

Factors are used for categorical data. You can create a factor using factor(). For example,
gender <- factor(c("Male", "Female", "Male")). R internally stores the levels
(categories). You can check levels using levels(gender). To generate factor levels
automatically, use gl(). Example: gl(2, 3, labels=c("Control", "Treatment"))
generates a factor with 2 levels, each repeated 3 times.

R – Data Frames

Data frames are table-like structures in R with rows and columns. You can create a data frame
using data.frame(). For example:

df <- data.frame(Name=c("A", "B"), Age=c(21, 22))

Access elements using column names like df$Name, or df[1,2] for a specific cell. To
understand the data, use:

● dim(df) – dimensions (rows and columns),
● nrow(df) – number of rows,
● ncol(df) – number of columns,
● str(df) – structure of the data,
● summary(df) – summary statistics,
● names(df) – column names,
● head(df) – first 6 rows (by default),
● tail(df) – last 6 rows (by default),
● edit(df) – opens an editor and returns the edited copy of the data.

To extract data, use subset() or simple indexing. To expand a data frame, use:

● df$newCol <- c(1,2) to add a column,
● rbind(df, newRow) to add a row.

You can join data frames using rbind() (row-wise) and cbind() (column-wise), and merge two
data frames on a common key using merge(df1, df2, by="ID"). For reshaping data, use the
reshape2 package functions melt() and dcast() to convert data between wide and long formats.
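
The joining functions can be sketched with two small data frames. The tables and their contents are made up; all.x = TRUE is a merge() argument that keeps unmatched rows from the first table.

```r
students <- data.frame(ID = c(1, 2, 3), Name = c("A", "B", "C"))
marks    <- data.frame(ID = c(1, 2, 4), Marks = c(80, 90, 70))

inner <- merge(students, marks, by = "ID")                # IDs 1 and 2 only
left  <- merge(students, marks, by = "ID", all.x = TRUE)  # keeps ID 3 with Marks = NA

# rbind() needs a row with the same column names
students2 <- rbind(students, data.frame(ID = 4, Name = "D"))

# cbind() adds columns to a data frame with the same number of rows
with_age <- cbind(students, Age = c(21, 22, 23))

print(inner)
print(left)
```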

UNIT-IV of R Programming

Loading and Handling Data in R

R allows users to handle files and directories easily while working on data projects. The
working directory is the location on your computer where R reads and writes files. You can find
the current working directory using getwd(), and change it using
setwd("path/to/your/folder"). To see what files are in the working directory, use dir()
or list.files(). Properly setting the working directory ensures that R knows where to find or
save files, which is essential when reading or writing data.

R - CSV Files: Reading and Writing

CSV (Comma-Separated Values) files are widely used for storing tabular data. In R, the function
read.csv("filename.csv") is used to read CSV files into data frames. For example, data
<- read.csv("students.csv") loads the file into an object called data. Once loaded, you
can analyze the dataset using several built-in functions. The summary(data) function gives a
summary of each column (like mean, min, max for numeric data). You can use
min(data$Age), max(data$Age), and range(data$Age) to find minimum, maximum, and
range of a specific column. For central tendencies, use mean(data$Marks) for average and
median(data$Marks) for the middle value. The apply() function is used to apply a function
(like mean, sum, etc.) to rows or columns. For example, apply(data[,2:4], 2, mean) will
return column-wise means for the selected columns.

To write data into a CSV file, use write.csv(data, "output.csv"). This is helpful for
exporting processed data or results to be shared or used in other applications.
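
A round trip through a temporary CSV file ties these functions together. The data frame below is sample data; row.names = FALSE is a write.csv() argument that stops R from writing an extra index column.

```r
data <- data.frame(Name  = c("A", "B", "C"),
                   Age   = c(20, 21, 22),
                   Marks = c(70, 85, 90))

path <- tempfile(fileext = ".csv")
write.csv(data, path, row.names = FALSE)
loaded <- read.csv(path)

age_range <- range(loaded$Age)                   # 20 22
avg_marks <- mean(loaded$Marks)                  # 245 / 3
mid_marks <- median(loaded$Marks)                # 85
col_means <- apply(loaded[, 2:3], 2, mean)       # column-wise means of Age and Marks

summary(loaded)
print(col_means)
```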

R - Excel Files: Reading Excel Files

To read Excel files in R, external packages like readxl or openxlsx are required. These
packages provide functions to directly import .xlsx files. First, you install the package using
install.packages("readxl") and load it with library(readxl). Then, use
read_excel("file.xlsx") to load an Excel sheet into R as a data frame (readxl returns a
tibble, which is a kind of data frame). This method avoids converting Excel files into CSV
format, and it is useful when working with workbooks that have multiple sheets, since the
sheet argument selects which one to read. Cell formatting is not imported.

Data Visualization in R (Base Plots)

R is known for its excellent data visualization capabilities. With base R functions, you can create
various types of charts:

● Bar Charts are used to show the count of categories. Example:
barplot(table(data$Gender)).
● Histograms display the distribution of numeric data. For example, hist(data$Marks)
shows how marks are spread across intervals.
● Frequency Polygons can be created by combining histogram data with line plots,
helping to visualize distribution trends more clearly.
● Density Plots are smoothed histograms that show the probability distribution.
plot(density(data$Marks)) is used for this.
● Scatter Plots help in visualizing the relationship between two numeric variables. Use
plot(data$Height, data$Weight) to create one.
● Box and Whisker Plots are great for understanding the spread and outliers in data.
Example: boxplot(data$Marks).
● Heat Maps visualize data intensity using colors. Use heatmap(matrix_data) where
matrix_data is a numeric matrix.
● Contour Plots are used to represent 3D data in 2D form using contour lines. These are
useful in surface analysis and are created using contour(matrix_data).

Each of these plots can be customized with labels, colors, and styles for better presentation.
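
The frequency polygon is the only chart in the list without a function call, so here is one possible sketch. It assumes synthetic marks generated with rnorm(); hist(plot = FALSE) computes the bins without drawing, and the bin midpoints are then joined with lines.

```r
set.seed(1)                                     # reproducible sample data
marks <- round(rnorm(100, mean = 60, sd = 10))  # synthetic marks, for illustration only

h <- hist(marks, plot = FALSE)                  # compute the bins without drawing

# Join the bin midpoints to form the frequency polygon
plot(h$mids, h$counts, type = "b",
     xlab = "Marks", ylab = "Frequency", main = "Frequency Polygon")
```

To overlay the polygon on the histogram itself, draw hist(marks) first and then call lines(h$mids, h$counts).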

Data Visualization Using ggplot2

The ggplot2 package is a powerful tool for creating complex and aesthetically pleasing
visualizations. It uses a layered grammar of graphics and requires installing the package using
install.packages("ggplot2") and loading it with library(ggplot2).

● A bar chart can be created with ggplot(data, aes(x=Gender)) + geom_bar(),
which automatically counts frequency.
● A histogram can be made using ggplot(data, aes(x=Marks)) +
geom_histogram(binwidth=5) to visualize distribution.
● For a scatter plot, use ggplot(data, aes(x=Height, y=Weight)) +
geom_point() to study the relationship between height and weight.
● A box plot to visualize quartiles and outliers can be drawn with ggplot(data,
aes(x=Class, y=Marks)) + geom_boxplot().
● A density plot for smooth distribution is created using ggplot(data,
aes(x=Marks)) + geom_density().

ggplot2 makes it easy to add themes, labels, colors, titles, and save plots as images. It is
widely used in data science and statistics for high-quality graphing.
