Unit 1 - Data Analysis Using R

Uploaded by

Nilavan Nilavan
Semester Course Code Title of the Course Hours Credits

V 21UCC53CC10 DATA ANALYSIS USING R

Unit – I (15 Hours)


Introduction - downloading and installing R - IDEs and text editors - handling packages in R.
Getting started with R: Loading and handling data in R - Challenges in Analytical Data
Processing - Expression, Variables, Functions - Missing Values treatment in R - Using the
‘as’ Operator in R – Vectors – Matrices - List.

INTRODUCTION TO R:
R is a programming language and environment for statistical computing, data science and
graphics. It was inspired by, and is mostly compatible with, the statistical language S
developed at Bell Laboratories (formerly AT&T, now Lucent Technologies).
It can be integrated with programming languages such as Java, C++, Python and Ruby.

R is free. It is available under the terms of the Free Software Foundation’s GNU General Public
License in source code form.
It is available for Windows, Mac and a wide variety of Unix platforms (including FreeBSD,
Linux, etc.).

DOWNLOADING AND INSTALLING R
Downloading R
Go to the R Project website and follow these steps:

 Click on "Download R": On the R Project homepage, you'll typically find a link or
button that says "Download R." Click on it.
 Select a CRAN mirror: CRAN (Comprehensive R Archive Network) mirrors are
servers that host R packages and the R software itself. Choose a location that is
geographically close to you to ensure faster download speeds.
 Choose your operating system: There are versions of R available for Windows, macOS,
and various Linux distributions. Select the version that corresponds to your operating
system.
 Download the installer: Click on the link to download the installer package appropriate
for your operating system.

Downloading R and RStudio

Downloading and Installing R

1. Visit CRAN: Go to the CRAN R project website.


2. Choose Your OS: Click on the appropriate link for your operating system (Windows,
macOS, or Linux).
3. Download R:
o Windows: Click "Download R for Windows" -> "base" -> "Download R X.X.X for Windows".
o macOS: Click "Download R for macOS" -> download the .pkg file.
o Linux: Follow the instructions specific to your distribution or use package managers.
4. Run Installer:
o Windows: Double-click the downloaded .exe file and follow the wizard.
o macOS: Double-click the .pkg file and follow the instructions.
o Linux: Use package manager commands (e.g., sudo apt install r-base for
Ubuntu).
5. Complete Installation: Follow the prompts to complete the installation process.

Downloading and Installing RStudio

6. Visit RStudio: Go to the RStudio download page.


7. Select Your Edition: Choose the "Desktop" version for individual use.
8. Download RStudio:
o Windows: Click the "Download RStudio for Windows" button.
o macOS: Click the "Download RStudio for macOS" button.
o Linux: Click the "Download RStudio for Linux" button and select the
appropriate package.
9. Run Installer:
o Windows: Double-click the downloaded .exe file and follow the installation
instructions.
o macOS: Open the downloaded .dmg file, then drag and drop RStudio into the
Applications folder.
o Linux: Follow instructions for your distribution, such as sudo dpkg -i rstudio-
x.x.x-amd64.deb for Debian-based systems.
10. Complete Installation: Follow the setup prompts to finish the RStudio installation.

Final Steps

11. Open R: Locate and launch R from your applications or start menu.
12. Open RStudio: Locate and launch RStudio from your applications or start menu.
13. Verify Installation:
o In RStudio, check that R is properly integrated by running version in the R
console.
14. Update Packages: In RStudio, update your packages by running update.packages() in
the R console.
15. Start Coding: Begin using RStudio for your data analysis and coding tasks!

RStudio
RStudio is an integrated development environment (IDE) specifically designed for R
programming. It provides a comprehensive and user-friendly interface for writing, debugging,
and running R code.

RStudio is the most widely used tool for writing, testing, and running R code (Figure 1.7). It's
user-friendly and open source. The typical screen of RStudio includes several key parts:
 Console: Where commands are typed and outputs are displayed.
 Workspace: Shows active objects (variables, datasets) used in the current session.
 History: Displays a log of previously executed commands.
 Files: Provides a view of folders and files in the current directory.
 Plots: Shows graphical outputs such as plots and charts.
 Packages: Lists add-ons and packages needed for specific tasks.
 Help: Provides documentation and assistance on using RStudio, commands, and more.
Eclipse with StatET
Eclipse with StatET is an IDE originally known for Java and C++, but it's also used for R
programming. StatET adds tools for coding in R and building R packages. It supports local and
remote R installations and can expand its capabilities with add-ons like Sweave and Wikitext.
Key features include:

 Console: Executes R commands and displays results.
 Object Browser: Helps navigate and manage R objects.
 Package Manager: Manages R packages for project dependencies.
 Debugger: Assists in finding and fixing issues in R code.
 Data Viewer: Displays datasets and allows exploration.
 R Help System: Provides documentation and assistance for R functions and commands.

HANDLING packages in R

 R Packages (Code Modules): Collections of functions, datasets, and documentation
bundled together to extend R’s capabilities.
 Library (Package Directory): A folder where R packages are stored. The default
library is where R installs packages, but you can set a different path if needed.
 Popularity (Extensive Availability): There are over 10,000 packages available on
CRAN, making R highly versatile and widely used.
 Usage (Extend Functionality): Packages add new features and tools to R, and users
can create and share their own packages with others.
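
The points above can be sketched in a few lines of base R. This is only an illustration: the package name "tools" is chosen here because it ships with R, so the install step is skipped; for a CRAN package such as dplyr the same pattern would download it, which needs internet access.

```r
# Folders (libraries) where this R installation keeps its packages
library_paths <- .libPaths()
print(library_paths)

# Package used for this illustration; "tools" is bundled with R
pkg <- "tools"

# Install the package only when it is missing
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package into the current session
library(pkg, character.only = TRUE)

# Check the installed version and confirm the package is attached
pkg_version <- packageVersion(pkg)
is_attached <- paste0("package:", pkg) %in% search()
print(pkg_version)
print(is_attached)
```

Checking with requireNamespace() before calling install.packages() avoids re-downloading a package every time a script runs.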

Category                        Package       Description
Data Manipulation and Analysis  dplyr         Grammar of data manipulation with consistent verbs for common tasks.
Data Manipulation and Analysis  tidyr         Helps tidy and clean data.
Data Manipulation and Analysis  data.table    Enhanced data.frames for faster and memory-efficient data manipulation.
Data Visualization              ggplot2       Declarative graphics creation based on The Grammar of Graphics.
Data Visualization              plotly        Creates interactive web-based graphics.
Data Visualization              lattice       High-level data visualization functions.
Statistical Modeling            caret         Streamlines the creation of predictive models.
Statistical Modeling            randomForest  Implements random forest algorithm for classification and regression.
Data Import/Export              readr         Fast and user-friendly functions for reading rectangular data.
Data Import/Export              readxl        Simplifies reading Excel files.
Data Import/Export              jsonlite      High-performance JSON parser and generator.

Getting Started with R

 Data Exploration in R: Involves summarizing and visualizing datasets to understand
variables and data structures for further analysis.
 Working with Directory:
o getwd(): Retrieves the current working directory.

o setwd(): Sets the working directory to a specified location.
o dir() / list.files(): Lists files and directories in the current or specified
directory, with options to:
 Display files in a specific path.
 Show absolute paths.
 Match files/directories with a pattern.
 Recursively list files/directories.
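
A small sketch of the directory functions listed above. The folder and file names (r_dir_demo, sales.csv, notes.txt) are invented for this illustration and are created inside R's temporary directory so nothing else is touched.

```r
# Remember the current working directory so it can be restored later
old_dir <- getwd()

# Build a scratch folder containing one file and one sub-folder with a file
scratch <- file.path(tempdir(), "r_dir_demo")
dir.create(file.path(scratch, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(scratch, "sales.csv"))
file.create(file.path(scratch, "sub", "notes.txt"))

# Make the scratch folder the working directory
setwd(scratch)

# List files that match a pattern
csv_files <- list.files(pattern = "\\.csv$")

# List files recursively, including those in sub-folders
all_files <- list.files(recursive = TRUE)

# List files of a specific path with their full paths
full_paths <- list.files(path = scratch, full.names = TRUE)

print(csv_files)
print(all_files)
print(full_paths)

# Return to the original working directory
setwd(old_dir)
```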

Data Types in R

R is a programming language that uses variables to store information. When variables
are created, memory is reserved to hold their values. The amount of memory reserved
depends on the data type of the variables, such as boolean, numbers, characters, etc.

In R, variables are not explicitly declared with data types. Instead, they store R objects,
and the data type of the R object determines the data type of the variable. The most
commonly used R objects are:

 Vector
 List
 Matrix
 Array
 Factor
 Data Frame

The simplest R object is the atomic vector, which holds elements of a single basic data type.
All other R objects are built on atomic vectors. The basic data types supported by R are:

 Logical
 Numeric (double)
 Integer
 Character
 Complex
 Raw

 Vector: A collection of ordered homogeneous elements.
 Matrix: A vector with two-dimensional shape information.
 List: A vector with possibly heterogeneous elements. The elements of a list can be
numeric vectors, character vectors, matrices, arrays, and lists.
 Data Frame: A list of possibly heterogeneous vectors of the same length. The elements of
a data frame can be numeric vectors, factor vectors, and logical vectors,
but they must all be of the same length.
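
The basic types and the objects built on them can be inspected with typeof() and class(). The values below are arbitrary examples picked for this sketch.

```r
# One example value for each basic (atomic) type
values <- list(
  logical   = TRUE,
  integer   = 5L,
  double    = 3.14,
  character = "R",
  complex   = 2 + 3i,
  raw       = charToRaw("A")
)

# typeof() reports the basic storage type of each value
basic_types <- sapply(values, typeof)
print(basic_types)

# class() describes the higher-level object built on those types
m  <- matrix(1:4, nrow = 2)
df <- data.frame(id = 1:2, name = c("a", "b"))
print(class(m))
print(class(df))

# A data frame is stored internally as a list of equal-length vectors
df_storage <- typeof(df)
print(df_storage)
```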

In R, there are several ways to work with dates, including functions to get the current date,
manipulate dates, and format dates. Here are some common date functions in R:

1. Getting the Current Date and Time:
o Sys.Date(): Returns the current date.
o Sys.time(): Returns the current date and time.

current_date <- Sys.Date()


current_time <- Sys.time()
print(current_date)
print(current_time)

2. Creating Date Objects:


o as.Date(): Converts a character string to a date object.

date_string <- "2024-07-11"


date_obj <- as.Date(date_string)
print(date_obj)

3. Formatting Dates:
o format(): Formats a date object to a specified string format.

formatted_date <- format(date_obj, "%B %d, %Y") # "July 11, 2024"


print(formatted_date)

4. Date Arithmetic:
o You can perform arithmetic operations on date objects to add or subtract days.

new_date <- date_obj + 10 # Adds 10 days


earlier_date <- date_obj - 5 # Subtracts 5 days
print(new_date)
print(earlier_date)

5. Differences Between Dates:


o difftime(): Calculates the difference between two date or time objects.

date1 <- as.Date("2024-07-01")
date2 <- as.Date("2024-07-11")
difference <- difftime(date2, date1, units = "days")
print(difference) # Time difference of 10 days

6. Parsing Dates from Strings:


o strptime(): Parses a date-time string into a POSIXlt object (use as.POSIXct() to convert the result to POSIXct).

date_time_string <- "2024-07-11 15:30:00"


date_time_obj <- strptime(date_time_string, format="%Y-%m-%d %H:%M:%S")
print(date_time_obj)

These functions cover the basics of handling dates in R. There are many more functions and
packages available (such as lubridate) that provide additional capabilities and convenience
for working with dates and times.

VARIABLES

In R, variables are used to store data that can be referenced and manipulated later in
the program.

Creating Variables

1. Assignment Operator (<- or =):


o The <- operator is traditionally used in R to assign values to variables.
o The = operator can also be used, but <- is preferred for clarity.
x <- 10
y = 5
2. Variable Names:
o Variable names can include letters, numbers, periods (.), and underscores (_).
o They cannot start with a number or an underscore and cannot contain spaces.
variable1 <- "Hello"
variable2 <- 20
variable_3 <- TRUE
Types of Variables
1. Numeric:
o Stores numbers.
num <- 42
2. Character:
o Stores strings of text.
text <- "This is a string"
3. Logical:
o Stores TRUE or FALSE values.
flag <- TRUE
4. Vector:
o Stores a sequence of data elements of the same type.
numbers <- c(1, 2, 3, 4, 5)
5. List:
o Stores a collection of elements that can be of different types.
my_list <- list(name = "John", age = 30, married = TRUE)
6. Matrix:
o Stores data in a two-dimensional array.
my_matrix <- matrix(1:9, nrow = 3)

7. Data Frame:
o Stores tabular data.
my_data <- data.frame( name = c("John", "Jane", "Doe"),
age = c(30, 25, 22), married = c(TRUE, FALSE, TRUE))

NOTE: POINTS TO REMEMBER


(i) Assign a value of 50 to the variable called ‘Var’.
> Var <- 50
Or
> Var = 50

(ii) Print the value in the variable, ‘Var’.


> Var
[1] 50

(iii) Perform arithmetic operations on the variable, ‘Var’.


> Var + 10
[1] 60
> Var / 2
[1] 25

Variables can be reassigned values either of the same data type or of a different
data type.
(iv) Reassign a string value to the variable, ‘Var’.
> Var <- "R is a Statistical Programming Language"
Print the value in the variable, ‘Var’.
> Var
[1] "R is a Statistical Programming Language"
(v) Reassign a logical value to the variable, ‘Var’.

> Var <- TRUE


> Var

[1] TRUE

Functions

Functions in R are used to encapsulate a block of code that performs a specific task. They can
take inputs, process them, and return outputs.

Creating Functions

1. Basic Function Structure:


o Functions are defined using the function keyword.
o They can take arguments and return values using the return() function or
implicitly returning the last evaluated expression.

# Define a simple function
my_function <- function() {
  print("Hello, World!")
}

# Call the function
my_function()

2. Function with Arguments:


o Functions can take one or more arguments.

# Function with one argument


greet <- function(name) {
print(paste("Hello,", name))
}

greet("Alice")
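
The two ways of returning a value mentioned earlier (an explicit return() and the last evaluated expression) can be sketched as follows. The function names add_tax and safe_divide, and the default rate of 0.25, are made up for this illustration.

```r
# Implicit return: the last evaluated expression is the result.
# 'rate' has a default value that is used when the caller omits it.
add_tax <- function(amount, rate = 0.25) {
  amount * (1 + rate)
}

# Explicit return(): leave the function early for a special case
safe_divide <- function(x, y) {
  if (y == 0) {
    return(NA)
  }
  x / y
}

print(add_tax(100))              # uses the default rate
print(add_tax(100, rate = 0.5))  # overrides the default rate
print(safe_divide(6, 3))
print(safe_divide(1, 0))
```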

MIN, MAX, SEQ FUNCTIONS IN R

In R, min, max, and seq are commonly used functions for handling numerical data and
generating sequences. Here's an overview of how to use these functions effectively:

min Function

The min function is used to find the smallest value in a numeric vector.

# Example vector
vec <- c(10, 5, 8, 3, 7)

# Find the minimum value


minimum_value <- min(vec)
print(minimum_value)

 Handling NA values:
o By default, min will return NA if there are any missing values (NA) in the
vector.
o Use na.rm = TRUE to remove NA values before finding the minimum.
o The na.rm argument stands for "NA remove" and is a standard argument
recognized by many R functions for handling missing values.

# Example vector with NA


vec_with_na <- c(10, 5, NA, 3, 7)

# Find the minimum value, ignoring NA


minimum_value_no_na <- min(vec_with_na, na.rm = TRUE)
print(minimum_value_no_na)

max Function

The max function is used to find the largest value in a numeric vector.

# Example vector
vec <- c(10, 5, 8, 3, 7)

# Find the maximum value


maximum_value <- max(vec)
print(maximum_value)

 Handling NA values:
o By default, max will return NA if there are any missing values (NA) in the
vector.
o Use na.rm = TRUE to remove NA values before finding the maximum.

# Example vector with NA


vec_with_na <- c(10, 5, NA, 3, 7)

# Find the maximum value, ignoring NA


maximum_value_no_na <- max(vec_with_na, na.rm = TRUE)
print(maximum_value_no_na)

seq Function

The seq function is used to generate sequences of numbers. It can be used in several ways:

1. Basic Sequence:
o Generates a sequence from a starting value to an ending value.

# Generate a sequence from 1 to 10


sequence <- seq(1, 10)
print(sequence)

2. Specifying the Step Size:


o You can specify the step size between each number in the sequence using the
by argument.

# Generate a sequence from 1 to 10 with a step size of 2


sequence_by <- seq(1, 10, by = 2)
print(sequence_by)

3. Specifying the Length of the Sequence:


o You can specify the desired length of the sequence using the length.out
argument.

# Generate a sequence from 1 to 10 with a specified length of 5


sequence_length <- seq(1, 10, length.out = 5)
print(sequence_length)

4. Generating a Sequence Using a Specified Number of Points:


o This is similar to specifying the length but often used for generating sequences
over an interval.

# Generate a sequence from 1 to 10 with 5 points

sequence_points <- seq(1, 10, length.out = 5)
print(sequence_points)

Examples

Here are some practical examples combining these functions:

1. Find the Minimum and Maximum of a Sequence:

# Generate a sequence from 1 to 20


sequence <- seq(1, 20)

# Find the minimum and maximum values in the sequence


min_value <- min(sequence)
max_value <- max(sequence)

print(min_value)
print(max_value)

2. Generate a Sequence and Handle NA Values:

# Example vector with NA values


vec_with_na <- c(1, 2, NA, 4, 5, NA, 7)

# Find the minimum and maximum values, ignoring NA


min_no_na <- min(vec_with_na, na.rm = TRUE)
max_no_na <- max(vec_with_na, na.rm = TRUE)

print(min_no_na)
print(max_no_na)

3. Create a Custom Sequence and Find Extremes:

# Generate a sequence from 0 to 100 in steps of 10


custom_sequence <- seq(0, 100, by = 10)

# Find the minimum and maximum values


min_custom <- min(custom_sequence)
max_custom <- max(custom_sequence)

print(custom_sequence)
print(min_custom)
print(max_custom)
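
As a possible extension of the examples above, base R also offers range(), which.min(), which.max() and seq_along() for related tasks. The vector below reuses the values from the earlier min and max examples.

```r
vec <- c(10, 5, 8, 3, 7)

# range() returns the minimum and maximum together
value_range <- range(vec)
print(value_range)

# which.min() and which.max() return the position of the extreme value
pos_min <- which.min(vec)
pos_max <- which.max(vec)
print(pos_min)
print(pos_max)

# seq_along() generates the index sequence 1, 2, ..., length(vec)
positions <- seq_along(vec)
print(positions)
```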

Manipulating Text in Data


Manipulating text data in R is a common task, especially when dealing with data cleaning,
pre-processing, and analysis.

Manipulating text in R involves using various functions and techniques to modify, clean, and
analyze text data, such as concatenating strings, changing case, extracting substrings, pattern
matching, replacing text, and splitting or joining strings. This is essential for preparing and
transforming textual information for further analysis and processing.

Basic String Functions

1. Concatenating Strings (paste and paste0):


o paste combines strings with a separator.
o paste0 combines strings without any separator.

# Concatenate strings with a space separator


str1 <- "Hello"
str2 <- "World"
combined <- paste(str1, str2)
print(combined) # Output: "Hello World"

# Concatenate strings without a separator


combined_no_sep <- paste0(str1, str2)
print(combined_no_sep) # Output: "HelloWorld"

2. Changing Case (toupper, tolower):


o toupper converts text to uppercase.
o tolower converts text to lowercase.

text <- "Hello World"


upper_text <- toupper(text)
lower_text <- tolower(text)
print(upper_text) # Output: "HELLO WORLD"
print(lower_text) # Output: "hello world"

3. String Length (nchar):


o nchar returns the number of characters in a string.

text <- "Hello"
text_length <- nchar(text) # Named text_length because 'length' is also the name of a built-in function
print(text_length) # Output: 5

4. Substrings (substr, substring):


o substr extracts or replaces substrings in a string.
o substring is similar but allows for more flexibility with vector inputs.

text <- "Hello World"


sub_text <- substr(text, 1, 5) # Extracts characters from position 1 to 5
print(sub_text) # Output: "Hello"

# Replace part of a string
# Only as many characters as the replacement contains are overwritten
substr(text, 1, 5) <- "Hi"
print(text) # Output: "Hillo World"

Regular Expressions

1. Pattern Matching (grep, grepl):


o grep returns the indices of matches.
o grepl returns a logical vector indicating if a match was found.

text_vector <- c("apple", "banana", "cherry", "date")


pattern <- "a"

# Find indices of matches


matches <- grep(pattern, text_vector)
print(matches) # Output: 1 2 4

# Logical vector of matches


matches_logical <- grepl(pattern, text_vector)
print(matches_logical) # Output: TRUE TRUE FALSE TRUE

2. String Replacement (sub, gsub):


o sub replaces the first occurrence of a pattern.
o gsub replaces all occurrences of a pattern.

text <- "Hello World"

# Replace the first occurrence of "o" with "O"


sub_text <- sub("o", "O", text)
print(sub_text) # Output: "HellO World"

# Replace all occurrences of "o" with "O"


gsub_text <- gsub("o", "O", text)
print(gsub_text) # Output: "HellO WOrld"
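
The patterns above are plain characters, but the same functions accept regular-expression syntax such as anchors and character classes. A brief sketch, reusing the fruit names from the earlier example:

```r
fruits <- c("apple", "banana", "cherry", "date")

# "^a" matches strings that start with "a"
starts_with_a <- grepl("^a", fruits)
print(starts_with_a)

# "e$" matches strings that end with "e"
ends_with_e <- grepl("e$", fruits)
print(ends_with_e)

# "[aeiou]" matches any single lowercase vowel
no_vowels <- gsub("[aeiou]", "", fruits)
print(no_vowels)

# "\\s+" matches one or more whitespace characters
single_spaced <- gsub("\\s+", " ", "Hello    World")
print(single_spaced)
```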

String Splitting and Joining

1. Splitting Strings (strsplit):


o strsplit splits a string into substrings based on a specified delimiter.

text <- "apple,banana,cherry,date"


split_text <- strsplit(text, ",")
print(split_text) # Output: list(c("apple", "banana", "cherry", "date"))

2. Joining Strings (paste, paste0):


o These functions can also be used to join elements of a vector into a single
string.

# Join elements with a comma separator


joined_text <- paste(c("apple", "banana", "cherry", "date"), collapse = ",")
print(joined_text) # Output: "apple,banana,cherry,date"

Stringr Package

The stringr package provides a cohesive set of functions designed for consistent and easier
string manipulation.

# Install and load the stringr package


install.packages("stringr")
library(stringr)

# Examples of stringr functions


text <- "Hello World"

# String length
str_length(text) # Output: 11

# Substring
str_sub(text, 1, 5) # Output: "Hello"

# Uppercase
str_to_upper(text) # Output: "HELLO WORLD"

# Pattern matching
str_detect(text, "World") # Output: TRUE

# Replace text
str_replace(text, "World", "Everyone") # Output: "Hello Everyone"

Missing Values treatment in R


Missing values treatment in R involves identifying, handling, and imputing NA
(missing) values in datasets to ensure accurate and meaningful data analysis. Techniques
include removing rows or columns with missing values, replacing NAs with specific values
or summary statistics (like mean or median), and using advanced imputation methods to
estimate and fill in missing data.

In R, NA (Not Available) represents missing values and Inf (Infinite) represents
infinite values. R provides several functions that identify missing values during
processing.
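
To make the distinction between NA and Inf concrete, the following sketch tests a small made-up vector with is.na(), is.infinite() and is.finite().

```r
# A vector holding ordinary numbers, a missing value and two infinite values
values <- c(1, NA, 1 / 0, -1 / 0, 5)
print(values)

# Which entries are missing?
missing_flags <- is.na(values)
print(missing_flags)

# Which entries are infinite?
infinite_flags <- is.infinite(values)
print(infinite_flags)

# Which entries are ordinary finite numbers?
finite_flags <- is.finite(values)
print(finite_flags)

# Mean of the finite entries only
usable_mean <- mean(values[finite_flags])
print(usable_mean)
```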

1. Identifying Missing Values

 Using is.na:
o is.na returns a logical vector indicating which elements are NA.

# Example vector with missing values


vec <- c(1, 2, NA, 4, NA, 6)

# Identify missing values


is.na(vec)

2. Removing Missing Values

 Removing NAs from a vector:


o Use na.omit or na.exclude to remove NA values.

# Remove NAs from a vector


clean_vec <- na.omit(vec)

 Removing rows with NAs from a data frame:


o Use na.omit to remove rows with any NA values.

# Example data frame with missing values


df <- data.frame(x = c(1, 2, NA), y = c(NA, 4, 5))

# Remove rows with NAs


clean_df <- na.omit(df)
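
Before removing anything it is often useful to count the missing values first. A minimal sketch using base R, with the same small data frame as above:

```r
df <- data.frame(x = c(1, 2, NA), y = c(NA, 4, 5))

# Total number of missing values in the data frame
total_missing <- sum(is.na(df))
print(total_missing)

# Number of missing values in each column
missing_per_column <- colSums(is.na(df))
print(missing_per_column)

# Rows that contain no missing values at all
complete_rows <- complete.cases(df)
print(complete_rows)

# Keep only the complete rows (same rows that na.omit keeps)
kept_rows <- df[complete_rows, ]
print(kept_rows)
```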

3. Replacing Missing Values

 Replacing NAs with a specific value:

o Use replace or indexing to fill NA values with a specified value.

# Replace NAs with zero


vec[is.na(vec)] <- 0

 Replacing NAs with the mean or median:


o Calculate the mean or median, ignoring NA, and use it to fill missing values.

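# Replace NAs with the mean of the vector
# (vec is re-created here because the previous step already replaced its NAs with zero)
vec <- c(1, 2, NA, 4, NA, 6)
mean_val <- mean(vec, na.rm = TRUE)
vec[is.na(vec)] <- mean_val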

4. Using Functions with na.rm Argument

 Many functions in R have an na.rm argument to handle missing values by removing


them during the computation.

# Calculate the mean, ignoring NAs
mean(vec, na.rm = TRUE)

5. Using the tidyverse for Missing Values

 The tidyverse package provides convenient functions for handling missing data.

# Load tidyverse package


library(tidyverse)

# Replace NAs using tidyr's replace_na() inside dplyr's mutate()
df <- df %>%
  mutate(x = replace_na(x, 0), y = replace_na(y, 0))

6. Imputing Missing Values

 Using the mice package:


o mice is used for multivariate imputation by chained equations.

# Install and load the mice package


install.packages("mice")
library(mice)

# Perform multiple imputation


imputed_data <- mice(df, m = 5, maxit = 50, method = 'pmm', seed = 500)
complete_data <- complete(imputed_data)

 Using the Hmisc package:


o Hmisc provides tools for imputing missing data.

# Install and load the Hmisc package


install.packages("Hmisc")

library(Hmisc)

# Impute missing values in one column using the mean (impute() works on a vector)
imputed_x <- impute(df$x, fun = mean)

Coercion

In R programming, coercion refers to the process of converting an object from one


class or data type to another. R performs coercion automatically in some situations, but it can
also be done explicitly by the programmer. Coercion is essential for data manipulation and
analysis, ensuring that data types are compatible with functions and operations.

Coercion converts one data type to another. For example, the logical value TRUE yields 1
when converted to numeric, and the logical value FALSE yields 0.

Types of Coercion in R

1. Implicit Coercion: Automatically performed by R when it encounters mixed types


within an operation.
2. Explicit Coercion: Performed by the user using functions to convert data types
intentionally.

Implicit Coercion

R will automatically coerce data types to make them compatible in certain operations. For
example, combining numeric and character data in a vector will result in all elements being
coerced to characters.

x <- c(1, "a", 3.14)

print(x)

# [1] "1" "a" "3.14"
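
Implicit coercion follows a fixed order: logical values become integers, integers become doubles, and anything becomes character when a string is present. The sketch below checks this with typeof() on a few made-up vectors.

```r
# Logical combined with integer becomes integer
type_a <- typeof(c(TRUE, 2L))
print(type_a)

# Integer combined with double becomes double
type_b <- typeof(c(2L, 3.5))
print(type_b)

# Double combined with character becomes character
type_c <- typeof(c(3.5, "a"))
print(type_c)

# In arithmetic, TRUE counts as 1 and FALSE as 0
true_count <- sum(c(TRUE, FALSE, TRUE))
print(true_count)
```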

Explicit Coercion

Explicit coercion is performed using specific functions designed to convert one type to
another.

Common coercion functions include as.numeric(), as.character(), as.logical(), and as.factor().
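
When an explicit conversion is not possible, R returns NA for that element and issues a warning. The input strings below are invented for this sketch, and suppressWarnings() is used only to keep the output quiet.

```r
# Character input where one entry is not a valid number
raw_input <- c("10", "20", "thirty")

# Entries that cannot be converted become NA
numbers <- suppressWarnings(as.numeric(raw_input))
print(numbers)

# Identify which entries failed to convert
failed <- is.na(numbers) & !is.na(raw_input)
print(raw_input[failed])

# as.logical() understands "TRUE"/"FALSE" style strings, other text becomes NA
flags <- as.logical(c("TRUE", "false", "yes"))
print(flags)

# Logical to numeric: TRUE becomes 1 and FALSE becomes 0
as_numbers <- as.numeric(c(TRUE, FALSE))
print(as_numbers)
```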

List

 Definition: A list in R is a collection of elements that can be of different types,


including numbers, strings, vectors, and even other lists.

 Usage: Lists are used to store heterogeneous data. Each element can be named for
easier access.

my_list <- list(name = "Ajay", age = 25, scores = c(85, 90, 88))

Vector

 Definition: A vector in R is a sequence of elements that are all of the same type. It
can be a numeric, character, logical, or integer vector.
 Usage: Vectors are the basic data type in R for storing a sequence of values.

numeric_vector <- c(1, 2, 3, 4, 5)


character_vector <- c("a", "b", "c")

Data Frame

 Definition: A data frame is a table or a two-dimensional array-like structure where


each column can contain different types of data (numeric, character, factor, etc.).
 Usage: Data frames are used for storing data tables. It is one of the most common data
structures for data analysis in R.

df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30), scores = c(85, 90))

Array

 Definition: An array in R is a multi-dimensional data structure that can store elements


of the same type. An array can have more than two dimensions.
 Usage: Arrays are used to store data in more than two dimensions.

my_array <- array(1:24, dim = c(2, 3, 4)) # 2 rows, 3 columns, 4 layers

Matrix

 Definition: A matrix is a two-dimensional array where elements are of the same type.
It is essentially a collection of vectors of the same length.
 Usage: Matrices are used for mathematical computations and storing tabular data.

my_matrix <- matrix(1:9, nrow = 3, ncol = 3)

Summary

 List: Heterogeneous elements, any data type.


 Vector: Homogeneous elements, one data type.
 Data Frame: Heterogeneous columns, each column one data type.
 Array: Multi-dimensional, homogeneous elements.
 Matrix: Two-dimensional, homogeneous elements.

These data structures are foundational in R and are used extensively for data manipulation
and analysis.

Using the ‘as’ Operator to Change the Structure of Data
In R, the ‘as’ operator refers to a family of functions whose names begin with "as."
(for example, as.numeric). These functions coerce data from one type or structure to
another, which is essential for data manipulation and analysis.

They are crucial for converting data types to ensure compatibility with various
functions and analyses. Common as functions include as.numeric, as.character, as.integer,
as.factor, as.Date, and as.logical, each serving a specific purpose in data type coercion.

Common as Functions

1. as.numeric: Converts data to numeric type.

# Example: Convert a character vector to numeric


char_vector <- c("1", "2", "3")
num_vector <- as.numeric(char_vector)
print(num_vector) # Output: 1 2 3

2. as.character: Converts data to character type.

# Example: Convert a numeric vector to character


num_vector <- c(1, 2, 3)
char_vector <- as.character(num_vector)
print(char_vector) # Output: "1" "2" "3"

3. as.integer: Converts data to integer type.

# Example: Convert a numeric vector to integer


num_vector <- c(1.5, 2.8, 3.2)
int_vector <- as.integer(num_vector)
print(int_vector) # Output: 1 2 3 (the fractional part is dropped, not rounded)

4. as.factor: Converts data to factor type, which is used for categorical data.

# Example: Convert a character vector to factor


char_vector <- c("apple", "banana", "cherry")
factor_vector <- as.factor(char_vector)
print(factor_vector) # Output: apple banana cherry

5. as.Date: Converts data to Date type.

# Example: Convert a character vector to Date


char_dates <- c("2024-07-09", "2024-08-10")
date_vector <- as.Date(char_dates)
print(date_vector) # Output: "2024-07-09" "2024-08-10"

6. as.logical: Converts data to logical type (TRUE/FALSE).

# Example: Convert a numeric vector to logical
num_vector <- c(1, 0, 1, 0)
logical_vector <- as.logical(num_vector)
print(logical_vector) # Output: TRUE FALSE TRUE FALSE

Example: Converting Data Types in a Data Frame

# Example data frame


df <- data.frame(
num_col = c("1", "2", "3"),
char_col = c(1, 2, 3),
stringsAsFactors = FALSE
)

# Convert columns to appropriate types


df$num_col <- as.numeric(df$num_col)
df$char_col <- as.character(df$char_col)

print(df)
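
The examples so far change the type of the values. The same family of functions can also change the structure that holds them, for instance turning a matrix into a vector or a data frame. A short sketch with made-up data:

```r
# Start with a 2 x 3 matrix
m <- matrix(1:6, nrow = 2)

# Matrix to plain vector (values are read column by column)
v <- as.vector(m)
print(v)

# Matrix to data frame (columns get the default names V1, V2, V3)
m_df <- as.data.frame(m)
print(m_df)

# Data frame back to matrix
m_again <- as.matrix(m_df)
print(dim(m_again))

# Vector to list, and list back to vector with unlist()
l <- as.list(1:3)
flat <- unlist(l)
print(length(l))
print(flat)
```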

VECTOR
 Vector is a one-dimensional array that can store a sequence of elements of the same
type.
Vector Definition
 A vector in R is a basic data structure that contains a collection of elements, all
of which must be of the same type. This type can be numeric, integer, character,
or logical.
 Vectors are essential in R because many functions operate on vectors, making them a
fundamental concept in data manipulation and analysis.
Characteristics of Vectors in R
1. Homogeneity: All elements in a vector must be of the same type.
2. Indexing: Elements in a vector can be accessed and manipulated using their index
positions, which start at 1 in R.
3. Element-wise Operations: Many operations in R are vectorized, meaning they can be
applied to each element of a vector simultaneously.
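
The indexing and element-wise behaviour described above can be illustrated as follows. The values in the scores vector are arbitrary.

```r
scores <- c(10, 20, 30, 40, 50)

# Positive indices select elements (positions start at 1)
first_and_third <- scores[c(1, 3)]
print(first_and_third)

# A negative index drops the element at that position
without_first <- scores[-1]
print(without_first)

# A logical condition selects the elements for which it is TRUE
above_25 <- scores[scores > 25]
print(above_25)

# Vectorized arithmetic is applied to every element
doubled <- scores * 2
print(doubled)

# Two vectors of equal length are combined element by element
shifted <- scores + c(1, 2, 3, 4, 5)
print(shifted)
```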
Types of Vectors
1. Numeric Vectors: Contain numbers (real or decimal numbers).
numeric_vector <- c(1.1, 2.2, 3.3)

NOTE: The c() function in R stands for "combine" or "concatenate." It is used to create a vector
by combining its arguments into a single vector.
22
 In this case, c(1.1, 2.2, 3.3) creates a vector containing the numeric values 1.1, 2.2, and
3.3.
2. Integer Vectors: Contain integer numbers.
integer_vector <- c(1, 2, 3) # indicates integers.
3. Character Vectors: Contain strings or text.
character_vector <- c("apple", "banana", "cherry")
4. Logical Vectors: Contain Boolean values (TRUE or FALSE).
logical_vector <- c(TRUE, FALSE, TRUE)
Creating Vectors
Vectors are typically created using the c() function, which stands for "combine" or
"concatenate":
SNO  VECTOR TYPE  SYNTAX
1    Numeric      vector <- c(1, 2, 3, 4, 5)
2    Character    vector <- c("a", "b", "c")
3    Logical      vector <- c(TRUE, FALSE, TRUE)

Examples of Vector Operations


Vectors support various operations and functions:
# Arithmetic operations
numeric_vector <- c(1, 2, 3)
numeric_vector <- numeric_vector * 2 # Multiplies each element by 2
print(numeric_vector) # Output: 2 4 6

# Accessing elements
second_element <- numeric_vector[2]
print(second_element) # Output: 4

# Functions on vectors
vector_sum <- sum(numeric_vector)

print(vector_sum) # Output: 12
vector_mean <- mean(numeric_vector)
print(vector_mean) # Output: 4

Lists in R

A list in R is a data structure that can contain elements of different types, such as numbers,
strings, vectors, and even other lists. Unlike vectors, which can only hold elements of the
same type, lists are versatile and can mix different data types.

Syntax:
list_name <- list(element1, element2, ..., elementN)
Example:

Let's create a simple list containing different types of data: a numeric vector, a character
vector, and a logical value.

my_list <- list(


numbers = c(1, 2, 3),
fruits = c("apple", "banana", "cherry"),
is_true = TRUE
)

print(my_list)
Output:
$numbers
[1] 1 2 3

$fruits
[1] "apple" "banana" "cherry"

$is_true
[1] TRUE
List-Related Operations:

1. Accessing List Elements:


o By Index:

my_list[[1]] # Access the first element (numeric vector)

o By Name:

my_list$fruits # Access the 'fruits' element

o Using [ Operator:

my_list["fruits"] # Returns the element as a list

2. Modifying List Elements:
o Change an element:

my_list$numbers <- c(4, 5, 6)

o Add a new element:

my_list$new_item <- "new element"

3. Combining Lists:
o Combine two lists:

another_list <- list(colors = c("red", "green", "blue"))


combined_list <- c(my_list, another_list)

4. Removing List Elements:


o Remove an element:

my_list$fruits <- NULL

5. Length of List:
o Find the number of elements in the list:

length(my_list)
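
The difference between the [ and [[ operators is easy to miss, so here is a short sketch of it. The list is re-created so the example runs on its own, and sapply() is shown as one way to apply a function to every element.

```r
my_list <- list(
  numbers = c(1, 2, 3),
  fruits  = c("apple", "banana", "cherry"),
  is_true = TRUE
)

# Single brackets return a list that contains the element
sub_list <- my_list["fruits"]
print(class(sub_list))

# Double brackets return the element itself
element <- my_list[["fruits"]]
print(class(element))

# Indexing can be chained to reach a value inside an element
second_fruit <- my_list[["fruits"]][2]
print(second_fruit)

# Apply length() to every element of the list
element_lengths <- sapply(my_list, length)
print(element_lengths)
```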
Example with Simple Data Set:

Let's consider a data set with information about students: their names, ages, and scores in a
test.

students <- list(


names = c("Alice", "Bob", "Charlie"),
ages = c(20, 22, 23),
scores = c(85, 90, 88)
)

# Accessing the names


students$names

# Adding a new element - gender


students$gender <- c("F", "M", "M")

# Modifying the scores


students$scores <- c(87, 92, 89)

# Removing the ages


students$ages <- NULL

# Printing the list and checking its length
print(students)
length(students)

Output:
$names
[1] "Alice" "Bob" "Charlie"

$scores
[1] 87 92 89

$gender
[1] "F" "M" "M"

[1] 3 # Length of the list after removing 'ages'

Matrix in R

Definition:

A matrix in R is a two-dimensional data structure that contains elements of the same data
type (numeric, character, or logical). It is essentially a collection of vectors of the same
length, arranged in rows and columns.

Syntax:
matrix(data, nrow, ncol, byrow = FALSE, dimnames = NULL)

 data: The elements to be included in the matrix.


 nrow: Number of rows.
 ncol: Number of columns.
 byrow: Logical value. If TRUE, the matrix is filled by rows, otherwise by columns.
 dimnames: A list of row and column names.

Example:

Let's create a simple matrix with 3 rows and 3 columns.

my_matrix <- matrix(


data = 1:9,
nrow = 3,
ncol = 3,
byrow = TRUE,
dimnames = list(
c("Row1", "Row2", "Row3"),
c("Col1", "Col2", "Col3")
)
)

print(my_matrix)
Output:
     Col1 Col2 Col3
Row1    1    2    3
Row2    4    5    6
Row3    7    8    9
Operations on Matrices:

1. Accessing Elements:
o Access a specific element:

my_matrix[2, 3] # Element in the 2nd row, 3rd column

o Access a specific row:

my_matrix[1, ] # All elements in the 1st row

o Access a specific column:

my_matrix[, 2] # All elements in the 2nd column

2. Matrix Arithmetic:
o Addition/Subtraction:

matrix1 <- matrix(1:4, nrow = 2, ncol = 2)


matrix2 <- matrix(5:8, nrow = 2, ncol = 2)
result_add <- matrix1 + matrix2 # Addition
result_sub <- matrix1 - matrix2 # Subtraction

o Element-wise Multiplication:

result_mul <- matrix1 * matrix2 # Element-wise multiplication

o Matrix Multiplication:

result_matrix_mul <- matrix1 %*% matrix2 # Matrix multiplication

3. Transposing a Matrix:
o Transpose the matrix:

transposed_matrix <- t(my_matrix)
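
Because matrix() fills by column unless byrow = TRUE is given, the same data can produce two different layouts. The sketch below compares them and shows a few summary functions; the matrices are made up for this illustration.

```r
# Same data, filled by column (the default) and by row
by_col <- matrix(1:6, nrow = 2)
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
print(by_col)
print(by_row)

# The element in row 1, column 2 differs between the two layouts
col_fill_value <- by_col[1, 2]
row_fill_value <- by_row[1, 2]

# Transposing swaps the dimensions
transposed_dim <- dim(t(by_col))
print(transposed_dim)

# Row and column totals
row_totals <- rowSums(by_row)
col_totals <- colSums(by_row)
print(row_totals)
print(col_totals)

# apply() runs a function over rows (1) or columns (2)
row_max <- apply(by_row, 1, max)
print(row_max)
```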


Example Program:
# Create a 2x2 matrix
matrix1 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

# Create another 2x2 matrix


matrix2 <- matrix(c(5, 6, 7, 8), nrow = 2, ncol = 2)

# Perform matrix addition


add_result <- matrix1 + matrix2

# Perform matrix multiplication


mul_result <- matrix1 %*% matrix2

# Print results
print("Matrix Addition Result:")
print(add_result)

print("Matrix Multiplication Result:")


print(mul_result)

Output:
[1] "Matrix Addition Result:"
     [,1] [,2]
[1,]    6   10
[2,]    8   12

[1] "Matrix Multiplication Result:"
     [,1] [,2]
[1,]   23   31
[2,]   34   46