Basics of R and RStudio:
R is a programming language and environment primarily used for
statistical computing and graphics. It's popular among data analysts,
statisticians, and researchers for its robust capabilities in data analysis,
visualization, and machine learning. RStudio, on the other hand, is an
integrated development environment (IDE) for R that provides a user-
friendly interface, making it easier to write, debug, and execute R code.
It offers features like syntax highlighting, code completion, and built-in
tools for data visualization and package management. Together, R and
RStudio form a powerful combination for data analysis and statistical
computing.
Here are some basic concepts and features:
o Data Structures: R has several basic data structures including
vectors, matrices, data frames, and lists.
o Vectors: The most basic data structure in R. A vector holds numeric,
character, or logical data, with all elements of the same type.
o Functions: R has a vast number of built-in functions for statistical
analysis, data manipulation, and visualization. You can also create
your own functions.
o Packages: R's functionality is extended through packages. You can
install packages from CRAN (Comprehensive R Archive Network)
using the install.packages() function and load them into your R
session using the library() function.
o Data Manipulation: R provides powerful tools for data
manipulation such as subsetting, merging, and transforming
datasets.
o Data Visualization: R offers various packages for data
visualization, including ggplot2, lattice, and base graphics.
o R Markdown: R Markdown is an authoring format that enables
easy creation of dynamic documents, including reports,
presentations, and dashboards, integrating R code with narrative
text and output.
o Debugging: RStudio provides features for debugging your code,
including setting breakpoints, stepping through code, and viewing
variable values.
o Version Control: RStudio has integrated support for version
control systems like Git, allowing you to manage your code
changes effectively.
o Help and Documentation: R and RStudio provide extensive help
documentation and online resources for learning and
troubleshooting.
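As a minimal sketch of the data manipulation tools mentioned in the list above, the following base R example merges, transforms, and subsets two small data frames. The data frames people and scores and their contents are made up for illustration.

```r
# Two small illustrative data frames (hypothetical data)
people <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"))
scores <- data.frame(id = c(1, 2, 3), score = c(82, 67, 91))

# Merging: join the two tables on their shared "id" column
combined <- merge(people, scores, by = "id")

# Transforming: add a column derived from an existing one
combined$passed <- combined$score >= 70

# Subsetting: keep only the rows that satisfy a condition
passed_only <- combined[combined$passed, ]
print(as.character(passed_only$name))   # [1] "Alice"   "Charlie"
```

Packages such as dplyr offer alternative ways to do the same operations; base R is used here so that nothing needs to be installed.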
Setting variables in R
Variables are set using the assignment operator <- or the equal sign =.
Here's how you do it:
# Using the assignment operator
x <- 10
# Using the equal sign
y = 20
# You can also compute a new variable from existing ones
a <- 5
b <- 8
c <- a + b
# Printing the variables
print(x)
print(y)
print(c)
Output:
[1] 10
[1] 20
[1] 13
This output indicates that x is 10, y is 20, and c (which is the sum of a
and b) is 13. The [1] indicates the index of the first value shown on that
line of output.
Knowing about objects in R
In R, everything is treated as an object. Objects are entities that contain
data (values) along with metadata (attributes) that describe the data.
There are several types of objects in R, including:
Scalars: Single values such as numbers, characters, or logical values.
R has no separate scalar type; each of these is stored as a vector of length 1.
x <- 10 # Numeric scalar
y <- "hello" # Character scalar
z <- TRUE # Logical scalar
Vectors: Ordered collections of elements of the same data type.
nums <- c(1, 2, 3, 4, 5) # Numeric vector
chars <- c("a", "b", "c") # Character vector
bools <- c(TRUE, FALSE, TRUE) # Logical vector
Matrices: 2-dimensional arrays with elements of the same data type.
mat <- matrix(1:9, nrow = 3, ncol = 3) # Numeric matrix
Data Frames: Tables where each column can be a different data type.
df <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 35),
Married = c(TRUE, FALSE, TRUE)
)
Lists: Ordered collections of objects (vectors, matrices, data frames,
etc.) of possibly different types.
lst <- list(
nums = c(1, 2, 3),
chars = c("a", "b", "c"),
mat = matrix(1:9, nrow = 3, ncol = 3))
Functions: R treats functions as objects, which can be assigned to
variables and passed as arguments to other functions.
add <- function(a, b) {
return(a + b)
}
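To illustrate how a function can be passed as an argument, here is a small sketch. The helper apply_twice() is hypothetical and exists only for this example; sapply() and class() are base R functions.

```r
# A function stored in a variable, as in the add() example above
add <- function(a, b) {
  a + b
}

# apply_twice() is a hypothetical helper that receives a function f as an argument
apply_twice <- function(f, x) {
  f(f(x, x), x)
}

result <- apply_twice(add, 3)   # add(add(3, 3), 3)
print(result)                   # [1] 9

# Functions can also be passed to built-in functions such as sapply()
squares <- sapply(1:4, function(n) n^2)
print(squares)                  # [1]  1  4  9 16

# class() reports what kind of object a name refers to
print(class(add))               # [1] "function"
```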
Attributes of objects
In R, objects can have attributes associated with them. Attributes
provide additional metadata about the object.
o Some common attributes include:
Names: Many objects in R, such as vectors, lists, and data frames, can
have names associated with their elements. You can assign names using
the names() function.
# Creating a numeric vector with names
nums <- c(1, 2, 3)
names(nums) <- c("first", "second", "third")
Dimensions: Matrices and arrays have dimensions, which specify the
number of rows and columns (and additional dimensions for arrays).
You can set dimensions using the dim() function.
# Creating a matrix with dimensions
mat <- matrix(1:6, nrow = 2, ncol = 3)
dim(mat) <- c(3, 2) # Changing dimensions to 3 rows, 2 columns
Class: The class attribute specifies the type of object. Many functions in
R use the class attribute to determine how to handle objects. You can
set or modify the class using the class() function.
# Creating a numeric vector and setting its class to "myclass"
nums <- c(1, 2, 3)
class(nums) <- "myclass"
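Factor Levels: Factors are categorical variables in R. They have levels
that represent the categories. You can inspect or rename levels using the
levels() function. Assigning a new vector to levels() renames the existing
levels in place, so to change their order, rebuild the factor with the
levels argument instead.
# Creating a factor (levels default to alphabetical order: blue, green, red)
colors <- factor(c("red", "blue", "green"))
# Reordering the levels without changing the data
colors <- factor(colors, levels = c("red", "green", "blue"))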
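The attributes described above can be inspected as well as set. The sketch below uses made-up objects (nums, mat, sizes) to show how names, dimensions, and factor levels can be read back.

```r
# A named numeric vector: names can be given directly inside c()
nums <- c(first = 1, second = 2, third = 3)
print(names(nums))
print(nums[["second"]])          # [1] 2  (lookup by name)

# attributes() lists every attribute attached to an object
mat <- matrix(1:6, nrow = 2, ncol = 3)
print(attributes(mat)$dim)       # [1] 2 3

# Reordering factor levels leaves the underlying data unchanged
sizes <- factor(c("low", "high", "low"))            # default levels: "high" "low"
reordered <- factor(sizes, levels = c("low", "high"))
print(levels(reordered))         # [1] "low"  "high"
print(as.character(reordered))   # [1] "low"  "high" "low"
```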
str() and summary() functions
str() Function: The str() function provides a compact display of
the internal structure of an R object. It is particularly useful for
understanding complex objects like lists, data frames, and other
user-defined objects.
The output of str() includes the object's class, its length or
dimensions (if applicable), and, for data frames, the type and first
few values of each column.
Example:
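# Example using str() with a data frame
# stringsAsFactors = TRUE stores Name as a factor; since R 4.0.0 the
# default is FALSE, which would show Name as chr instead
df <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 35),
Married = c(TRUE, FALSE, TRUE),
stringsAsFactors = TRUE
)
str(df)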
Output:
'data.frame': 3 obs. of 3 variables:
$ Name : Factor w/ 3 levels "Alice","Bob",..: 1 2 3
$ Age : num 25 30 35
$ Married: logi TRUE FALSE TRUE
summary() Function: The summary() function provides
summary statistics for the variables in a data frame or vector. For
numeric data it computes the minimum, 1st quartile, median, mean,
3rd quartile, and maximum.
For factors, it provides counts of each level.
The output varies depending on the type of data being
summarized.
Example:
# Example using summary() with a data frame
summary(df)
Output:
      Name        Age          Married
 Alice  :1   Min.   :25.00   Mode :logical
 Bob    :1   1st Qu.:27.50   FALSE:1
 Charlie:1   Median :30.00   TRUE :2
             Mean   :30.00
             3rd Qu.:32.50
             Max.   :35.00
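summary() and str() also work on objects other than data frames. The following sketch uses made-up objects (ages, status) to show how the result depends on the type of input.

```r
# summary() on a numeric vector returns named descriptive statistics
ages <- c(25, 30, 35)
age_summary <- summary(ages)
print(names(age_summary))
print(age_summary[["Median"]])   # [1] 30

# summary() on a factor returns the count of each level
status <- factor(c("single", "married", "married"))
status_counts <- summary(status)
print(status_counts[["married"]])   # [1] 2

# str() gives a compact view of a nested object such as a list
str(list(a = 1:3, b = "text"))
```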
R workspace: In R, the workspace refers to the environment
where objects (variables, functions, etc.) are stored during an R
session. When you work in R, you create objects, load data, and
perform calculations, all of which are stored in the workspace.
The workspace includes:
1. Objects: Variables, data frames, lists, functions, etc., that you have
defined or loaded into R during the session.
2. Functions: User-defined functions or built-in functions available for
use.
3. Settings: Environment settings and options.
The workspace is stored in memory and persists until you end your R
session or explicitly remove objects from it. You can save the entire
workspace or specific objects within it to a file using functions like
save() and saveRDS(), and later load them back into R using functions
like load() and readRDS().
It's important to manage your workspace effectively, avoiding clutter
and unnecessary objects to keep your R session running smoothly and
efficiently.
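The workspace functions mentioned above can be sketched as follows. The object names are made up, and a temporary file is used so that the example does not write into your project folder.

```r
# Create two objects in the workspace
scores <- c(90, 85, 77)
temp_value <- 42

# ls() lists the names of the objects currently in the workspace
print(ls())

# rm() removes an object that is no longer needed
rm(temp_value)
print(exists("temp_value"))         # [1] FALSE

# Save a single object to a file and read it back
rds_path <- tempfile(fileext = ".rds")
saveRDS(scores, rds_path)
restored <- readRDS(rds_path)
print(identical(scores, restored))  # [1] TRUE
```

save() and load() follow the same pattern but store one or more objects together with their names.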
Creating sequences in R: In R, you can create sequences using
the seq() function. This function generates sequences of numbers
according to specified parameters. Here's how you can use it:
Basic Sequence: To create a sequence of numbers from a starting point
to an ending point, you can use the syntax seq(from, to).
# Create a sequence from 1 to 10
seq1 <- seq(1, 10)
print(seq1)
Output:
[1] 1 2 3 4 5 6 7 8 9 10
Specifying Increment: You can also specify the increment between
numbers in the sequence using the by parameter.
# Create a sequence from 1 to 10 with increment of 2
seq2 <- seq(1, 10, by = 2)
print(seq2)
Output:
[1] 1 3 5 7 9
Specifying Length: Instead of specifying the increment, you can specify
the length of the sequence using the length.out parameter.
# Create a sequence with 5 numbers from 1 to 10
seq3 <- seq(from = 1, to = 10, length.out = 5)
print(seq3)
Output:
[1] 1.00 3.25 5.50 7.75 10.00
Geometric Sequence: seq() only generates evenly spaced (arithmetic)
sequences, and its along.with parameter simply produces the indices
1, 2, ... for another object. To create a geometric sequence, raise the
ratio to a sequence of exponents.
# Create a geometric sequence with 10 numbers starting from 1 with a
# ratio of 2
seq4 <- 2^seq(0, 9)
print(seq4)
Output:
[1]   1   2   4   8  16  32  64 128 256 512
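A few related base R helpers are often used alongside seq(). This short sketch shows a descending sequence plus seq_len() and seq_along(); the vector fruits is made up for illustration.

```r
# A negative increment produces a descending sequence
countdown <- seq(10, 1, by = -3)
print(countdown)            # [1] 10  7  4  1

# seq_len(n) generates 1, 2, ..., n
idx <- seq_len(5)
print(idx)                  # [1] 1 2 3 4 5

# seq_along(x) generates one index per element of x
fruits <- c("apple", "banana", "orange")
positions <- seq_along(fruits)
print(positions)            # [1] 1 2 3

# rev() reverses any sequence
print(rev(idx))             # [1] 5 4 3 2 1
```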
Operators in R: In R, there are various types of operators used
for different purposes, including arithmetic, assignment,
comparison, logical, and special operators. Here's an overview:
1. Arithmetic Operators: These are used with numeric values to perform
common mathematical operations.
- Addition +
- Subtraction -
- Multiplication *
- Division /
- Exponentiation ^ or **
- Modulo (remainder) %%
2. Assignment Operators: These are used to assign values to variables.
- Assignment <- or =
- R has no compound assignment operators such as += or -=; write
x <- x + 1 instead.
3. Comparison Operators: These are used to compare two values.
- Equal to ==
- Not equal to !=
- Less than <
- Greater than >
- Less than or equal to <=
- Greater than or equal to >=
4. Logical Operators: These are used to combine conditional statements.
- AND & (element-wise) or && (single values)
- OR | (element-wise) or || (single values)
- NOT !
5. Miscellaneous Operators: These are used for special purposes.
- Colon : for creating sequences
- %in% for testing if elements are contained in a vector
- %*% for matrix multiplication
- %/% for integer division
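The less familiar operators in the list above are easiest to understand by example. The values below are arbitrary and chosen only for illustration.

```r
# Modulo and integer division
print(17 %% 5)              # [1] 2
print(17 %/% 5)             # [1] 3

# Element-wise logical operators work on whole vectors
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)
print(x & y)                # [1]  TRUE FALSE FALSE
print(x | y)                # [1] TRUE TRUE TRUE

# && and || compare single values, typically inside if() conditions
print(x[1] && y[1])         # [1] TRUE

# %in% tests membership
print(3 %in% c(1, 2, 3))    # [1] TRUE

# %*% performs matrix multiplication (not element-wise multiplication)
m <- matrix(1:4, nrow = 2)
product <- m %*% m
print(product)
```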
Packages in R
Some popular R packages across various domains:
1. Data Manipulation and Visualization:
- ggplot2: Data visualization based on the Grammar of Graphics.
- dplyr: Data manipulation tools for data frames.
- tidyr: Tools for tidy data.
- reshape2: Reshape and aggregate data.
2. Statistical Analysis:
- stats: Included with R; provides lm() for fitting linear models and
glm() for fitting generalized linear models.
- survival: Survival analysis.
- nlme: Linear and nonlinear mixed-effects models.
3. Machine Learning:
- caret: Classification and regression training.
- randomForest: Random forests for classification and regression.
- xgboost: Extreme Gradient Boosting.
- keras: Deep learning with Keras.
4. Bioinformatics:
- Bioconductor: Collection of packages for bioinformatics and
computational biology.
- GenomicRanges: Genomic annotations and range operations.
- DESeq2: Differential expression analysis for RNA-seq data.
- biomaRt: Access to BioMart databases.
5. Time Series Analysis:
- forecast: Time series forecasting.
- tseries: Time series analysis and computational finance.
- xts: Extensible time series.
6. Geospatial Analysis:
- sf: Simple Features for handling spatial data.
- sp: Classes and methods for spatial data.
- raster: Geographic data analysis and modeling.
7. Text Mining:
- tm: Text mining framework.
- quanteda: Quantitative analysis of textual data.
- text2vec: Framework for text mining and natural language
processing.
8. Web Scraping:
- rvest: Web scraping tools.
- httr: HTTP client for web APIs.
- xml2: XML parsing and manipulation.
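A common pattern is to install a package only if it is missing and then load it. The sketch below uses the stats package, which ships with R, so that it runs without network access; in practice you would substitute the name of a CRAN package from the list above. The model fitted on the built-in mtcars dataset is purely illustrative.

```r
# Name of the package to use (stats is bundled with R)
pkg <- "stats"

# Install the package only if it is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package; character.only = TRUE lets library() accept a variable
library(pkg, character.only = TRUE)

# Functions can also be called with an explicit package prefix
fit <- stats::lm(mpg ~ wt, data = mtcars)
print(names(coef(fit)))   # [1] "(Intercept)" "wt"
```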
Creating script files in R:
In R, you can create script files to write and save your R code for
later use or sharing. Script files typically have a ".R" extension and
contain a series of R commands or functions. Here's how you can
create and work with script files in R:
Creating a Script File:
1. Open your preferred text editor or integrated development
environment (IDE) such as RStudio.
2. Write your R code in the editor.
3. Save the file with a ".R" extension.
Writing R Code:
1. Write your R code in the script file just like you would in the R
console.
2. You can include comments in your script using the # symbol.
Comments are ignored by R and are used to add explanatory
notes to your code.
Example script file (example_script.R):
a <- 5
b <- 10
# Calculate the sum (named total to avoid masking the built-in sum())
total <- a + b
# Print the result
print(total)
Output:
[1] 15
Running Script Files:
To execute the code in a script file, you can either:
Open the file in RStudio and click the "Source" button.
Use the source() function in the R console to run the script.
# Run the script file "example_script.R"
source("example_script.R")
Working Directory:
By default, R looks for script files in the current working directory. You
can use the setwd() function to change the working directory if needed.
You can also specify the full path to the script file when using the
source() function.
# Change working directory
setwd("path/to/your/directory")
# Run the script with full path
source("path/to/your/script/example_script.R")
Using script files is a convenient way to organize and execute your R
code, especially for larger projects or when you want to reuse code
snippets. It also helps in documenting your analysis and sharing your
work with others.
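The whole workflow can be tried without leaving the console. This sketch writes a small script to R's temporary directory and then runs it with source(); the file name and variable names are made up for illustration.

```r
# Build a path for the script inside R's temporary directory
script_path <- file.path(tempdir(), "example_script.R")

# Write three lines of R code into the script file
writeLines(c(
  "a <- 5",
  "b <- 10",
  "total <- a + b"
), script_path)

# Run the script; local = TRUE evaluates it in the environment
# where source() is called
source(script_path, local = TRUE)
print(total)      # [1] 15

# getwd() shows the directory against which relative paths are resolved
print(getwd())
```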
Vectors in R: A vector is a basic data structure that stores elements of
the same data type in a sequence. Vectors can be created using the c()
function, which concatenates elements together. They can contain
numeric, character, logical, or other types of data.
# Creating a numeric vector
numeric_vector <- c(1, 2, 3, 4, 5)
# Creating a character vector
character_vector <- c("apple", "banana", "orange")
# Creating a logical vector
logical_vector <- c(TRUE, FALSE, TRUE, TRUE)
Creating Vectors: The c() function stands for "combine" or
"concatenate". Because all elements of a vector must share one type, R
coerces mixed input to the most flexible type present:
mixed_vector <- c(1, "apple", TRUE) # Coerced to character: "1" "apple" "TRUE"
Accessing Elements: You can access elements of a vector using square
brackets [], indexing starts from 1:
# Accessing elements
numeric_vector[3] # Returns the third element of numeric_vector
character_vector[2] # Returns the second element of character_vector
logical_vector[c(1, 3)] # Returns the first and third elements of logical_vector
Vector Operations: You can perform operations on vectors element-wise:
# Vector addition
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
result <- vec1 + vec2 # Element-wise addition
result <- vec1 * vec2 # Element-wise multiplication
result <- vec1 / vec2 # Element-wise division
result <- vec1 ^ vec2 # Element-wise exponentiation
Vector Functions: R provides many built-in functions to perform
operations on vectors:
o Sum of vector elements
total <- sum(vec1)
o Mean of vector elements
mean_value <- mean(vec1)
o Median of vector elements
median_value <- median(vec1)
o Maximum and minimum of vector elements
max_value <- max(vec1)
min_value <- min(vec1)
o Sorting a vector
sorted_vec <- sort(vec1)
Vector Length and Type: You can find out the length of a vector
using the length() function, and the data type using the typeof()
function:
o Length of vector
vec_length <- length(vec1)
o Data type of vector
vec_type <- typeof(vec1)
Vector Manipulation: You can manipulate vectors by adding, removing,
or modifying elements:
o Adding elements to a vector
vec1 <- c(vec1, 4, 5) # Adds 4 and 5 to the end of vec1
o Removing elements from a vector
vec1 <- vec1[-3] # Removes the third element from vec1
o Modifying elements
vec1[2] <- 10 # Changes the second element of vec1 to 10
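Beyond the index-based edits above, elements can be inserted at a chosen position with append() and selected or replaced by condition. The vector vec and its values are made up for this sketch.

```r
vec <- c(10, 20, 30)

# Insert 15 after the first element
vec <- append(vec, 15, after = 1)
print(vec)                  # [1] 10 15 20 30

# Replace every element that satisfies a condition
vec[vec > 15] <- 0
print(vec)                  # [1] 10 15  0  0

# Remove elements by condition instead of by position
vec <- vec[vec != 0]
print(vec)                  # [1] 10 15
```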
Types of vectors: In R, vectors can be classified into different
types based on the data they contain. The main types of vectors in
R are:
Numeric Vector (numeric): Numeric vectors contain numeric (real)
values. These values can be integers or decimal numbers.
Example:
numeric_vector <- c(1, 2, 3.5, 4, 5)
Integer Vector (integer): Integer vectors contain integer values only,
written with an L suffix. They are stored as 32-bit integers on all platforms.
Example:
integer_vector <- c(1L, 2L, 3L, 4L, 5L)
Character Vector (character): Character vectors contain strings of
characters.
Example:
character_vector <- c("apple", "banana", "orange")
Logical Vector (logical): Logical vectors contain logical (Boolean) values,
which can be TRUE or FALSE.
Example:
logical_vector <- c(TRUE, FALSE, TRUE)
Complex Vector (complex): Complex vectors contain complex numbers
with real and imaginary parts.
Example:
complex_vector <- c(2 + 3i, 4 - 2i, 1 + 5i)
Raw Vector (raw): Raw vectors contain raw bytes.
Example:
raw_vector <- as.raw(c(0x41, 0x42, 0x43))
Factor Vector (factor): Factor vectors are used to represent categorical
data. They are internally stored as integers with labels.
Example:
factor_vector <- factor(c("male", "female", "male", "female"))
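You can check which of these types a vector has with typeof() and class(). The sketch below is illustrative only; the variable names f and mixed are made up.

```r
# typeof() reports the internal storage type
print(typeof(c(1, 2, 3.5)))     # [1] "double"
print(typeof(c(1L, 2L)))        # [1] "integer"
print(typeof(c("a", "b")))      # [1] "character"
print(typeof(TRUE))             # [1] "logical"
print(typeof(2 + 3i))           # [1] "complex"
print(typeof(as.raw(65)))       # [1] "raw"

# A factor is stored as integer codes with a "factor" class on top
f <- factor(c("male", "female", "male"))
print(typeof(f))                # [1] "integer"
print(class(f))                 # [1] "factor"
print(as.integer(f))            # [1] 2 1 2

# Mixing types in c() coerces everything to one common type
mixed <- c(1, "apple", TRUE)
print(typeof(mixed))            # [1] "character"

# Largest value an integer vector can hold
print(.Machine$integer.max)     # [1] 2147483647
```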
Accessing and manipulating vectors: In R, you can access and
manipulate vectors in various ways:
1. Accessing elements: Use indexing to access specific elements in
a vector.
For example
my_vector <- c("apple", "banana", "orange", "grape", "kiwi")
first_element <- my_vector[1]
second_element <- my_vector[2]
subset_vector <- my_vector[c(2, 4)]
Output:
print(first_element) # Output: "apple"
print(second_element) # Output: "banana"
print(subset_vector) # Output: "banana" "grape"
2. Slicing: You can extract a subset of elements using slicing. A negative
index returns the vector without the element at that position.
For example
my_vector <- c("apple", "banana", "orange", "grape", "kiwi")
my_vector <- my_vector[-3]
print(my_vector)
# Output: "apple" "banana" "grape" "kiwi"
3. Modifying elements: Assign new values to elements of a vector
using indexing.
For example
my_vector <- c("apple", "banana", "orange", "grape", "kiwi")
my_vector[3] <- "strawberry"
print(my_vector)
Output: "apple" "banana" "strawberry" "grape" "kiwi"
4. Appending elements: Use the c() function to append elements to
a vector.
For example
my_vector <- c("apple", "banana", "orange")
my_vector <- c(my_vector, "grape", "kiwi")
print(my_vector)
Output: "apple" "banana" "orange" "grape" "kiwi"
5. Vectorized operations: Perform operations on entire vectors at
once.
For example
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
result_add <- vec1 + vec2
result_mul <- vec1 * vec2
Output:
print(result_add) # Output: 5 7 9
print(result_mul) # Output: 4 10 18
6. Vector functions: R provides various functions for vector
manipulation, such as length(), sum(), mean(), sort(), unique(), etc.
Note that sum() requires numeric or logical input; calling it on a
character vector raises an error. To join strings, use paste().
my_vector <- c("apple", "banana", "orange", "grape", "kiwi")
combined <- paste(my_vector, collapse = "")
length_my_vector <- length(my_vector)
sorted_vector <- sort(my_vector)
unique_elements <- unique(my_vector)
Output:
print(combined) # Output: "applebananaorangegrapekiwi"
print(length_my_vector) # Output: 5
print(sorted_vector)
# Output: "apple" "banana" "grape" "kiwi" "orange"
print(unique_elements)
# Output: "apple" "banana" "orange" "grape" "kiwi"
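Functions such as unique() are more informative on a vector that actually contains repeats. The following sketch uses a made-up vector fruits to show unique(), duplicated(), and which() together.

```r
fruits <- c("apple", "banana", "apple", "kiwi", "banana")

# unique() keeps the first occurrence of each value
print(unique(fruits))               # [1] "apple"  "banana" "kiwi"

# duplicated() flags elements that repeat an earlier value
print(duplicated(fruits))           # [1] FALSE FALSE  TRUE FALSE  TRUE

# which() returns the positions where a condition is TRUE
print(which(fruits == "banana"))    # [1] 2 5

# Counting distinct values
print(length(unique(fruits)))       # [1] 3
```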
Basic arithmetic operations on numeric vectors: In R, you
can perform basic arithmetic operations on numeric vectors.
Here are some examples:
1. Addition: Use the + operator to add corresponding elements of two
vectors or add a scalar value to each element of a vector.
vector1 <- c(1, 2, 3)
vector2 <- c(4, 5, 6)
result <- vector1 + vector2
# result: [5, 7, 9]
result <- vector1 + 10
# result: [11, 12, 13]
2. Subtraction: Use the - operator to subtract corresponding elements
of two vectors or subtract a scalar value from each element of a vector.
result <- vector1 - vector2
# result: [-3, -3, -3]
result <- vector1 - 2
# result: [-1, 0, 1]
3. Multiplication: Use the * operator to multiply corresponding
elements of two vectors or multiply each element of a vector by a
scalar value.
result <- vector1 * vector2
# result: [4, 10, 18]
result <- vector1 * 3
# result: [3, 6, 9]
4. Division: Use the / operator to divide corresponding elements of two
vectors or divide each element of a vector by a scalar value.
result <- vector2 / vector1
# result: [4, 2.5, 2]
result <- vector1 / 2
# result: [0.5, 1, 1.5]
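Adding a scalar to a vector, as in the examples above, is a special case of R's recycling rule: the shorter vector is repeated until it matches the length of the longer one. A minimal sketch with made-up values:

```r
long_vec <- c(10, 20, 30, 40)
short_vec <- c(1, 2)

# short_vec is recycled as 1, 2, 1, 2
recycled_sum <- long_vec + short_vec
print(recycled_sum)         # [1] 11 22 31 42

# A single number is a vector of length 1, recycled across all elements
remainders <- long_vec %% 3
print(remainders)           # [1] 1 2 0 1
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which usually signals a mistake.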
Define the mean, median, mode, range, quartiles,
standard deviation, etc. of numeric vectors:
1. Mean: The arithmetic average of a set of numbers. It is calculated
by summing all the numbers in the set and then dividing by the
count of numbers.
Example:
numeric_vector <- c(10, 15, 20, 25, 30, 35, 40)
mean_value <- mean(numeric_vector)
Output: [1] 25
2. Median: The middle value of a set of numbers when they are
arranged in ascending or descending order. If there is an even number
of observations, the median is the average of the two middle values.
Example:
median_value <- median(numeric_vector)
Output: [1] 25
3. Mode: The value that appears most frequently in a set of numbers. A
set of numbers can have one mode, more than one mode (multimodal),
or no mode if all values occur with the same frequency.
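Example:
# Count occurrences with table() and keep the most frequent value(s);
# base R is used here instead of the modeest package
counts <- table(numeric_vector)
mode_value <- as.numeric(names(counts)[counts == max(counts)])
Output: [1] 10 15 20 25 30 35 40
Because every value in numeric_vector occurs exactly once, all seven
values are returned.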
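4. Range: The difference between the largest and smallest values in a
set of numbers. It provides a measure of the spread or variability of the
data. In R, range() returns the minimum and maximum; apply diff() to
get the difference between them.
Example:
range_value <- range(numeric_vector)
Output: [1] 10 40
range_width <- diff(range_value)
Output: [1] 30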
5. Quartiles: Values that divide a set of numbers into four equal parts.
The first quartile (Q1) represents the 25th percentile, the second
quartile (Q2) is the median (50th percentile), and the third quartile (Q3)
is the 75th percentile.
Example:
quartiles <- quantile(numeric_vector, probs = c(0.25, 0.5, 0.75))
Output:
 25%  50%  75%
17.5 25.0 32.5
6. Standard Deviation: A measure of the dispersion or spread of a set of
numbers. It indicates how much individual values deviate from the
mean of the set. A higher standard deviation indicates greater
variability in the data. R's sd() computes the sample standard
deviation, which divides by n - 1 rather than n.
Example:
sd_value <- sd(numeric_vector)
Output: [1] 10.80123
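To connect the definitions above with the built-in functions, the following sketch recomputes the mean and sample standard deviation by hand for the same numbers and compares them with mean() and sd(). The variable names are made up for illustration.

```r
x <- c(10, 15, 20, 25, 30, 35, 40)
n <- length(x)

# Mean: sum of the values divided by their count
manual_mean <- sum(x) / n
print(manual_mean)                        # [1] 25

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((x - manual_mean)^2) / (n - 1)

# Standard deviation: square root of the variance
manual_sd <- sqrt(manual_var)
print(all.equal(manual_sd, sd(x)))        # [1] TRUE

# Spread measures: width of the range and interquartile range
print(diff(range(x)))                     # [1] 30
print(IQR(x))                             # [1] 15
```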
Comparing vectors in R:
In R, you can compare vectors using relational operators, which return
logical values (TRUE or FALSE). Here are the common relational
operators for comparing vectors:
1. Equality: ==
- Compares if elements in two vectors are equal.
vector1 == vector2
2. Inequality: !=
- Compares if elements in two vectors are not equal.
vector1 != vector2
3. Greater than: >
- Compares if elements in one vector are greater than the
corresponding elements in another vector.
vector1 > vector2
4. Greater than or equal to: >=
- Compares if elements in one vector are greater than or equal to the
corresponding elements in another vector.
vector1 >= vector2
5. Less than: <
- Compares if elements in one vector are less than the corresponding
elements in another vector.
vector1 < vector2
6. Less than or equal to: <=
- Compares if elements in one vector are less than or equal to the
corresponding elements in another vector.
vector1 <= vector2
Example:
# Create two numeric vectors
vec1 <- c(1, 2, 3, 4, 5)
vec2 <- c(3, 2, 1, 4, 5)
# Compare vectors element-wise
greater_than <- vec1 > vec2
less_than <- vec1 < vec2
equal_to <- vec1 == vec2
not_equal_to <- vec1 != vec2
greater_than_or_equal_to <- vec1 >= vec2
less_than_or_equal_to <- vec1 <= vec2
# Print the comparison results
print(greater_than)
print(less_than)
print(equal_to)
print(not_equal_to)
print(greater_than_or_equal_to)
print(less_than_or_equal_to)
Output:
[1] FALSE FALSE TRUE FALSE FALSE
[1] TRUE FALSE FALSE FALSE FALSE
[1] FALSE TRUE FALSE TRUE TRUE
[1] TRUE FALSE TRUE FALSE FALSE
[1] FALSE TRUE TRUE TRUE TRUE
[1] TRUE TRUE FALSE TRUE TRUE
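Element-wise comparisons return one logical value per position. To reduce them to a single answer, or to find where the vectors differ, base R provides any(), all(), which(), and identical(). A short sketch using the same two vectors:

```r
vec1 <- c(1, 2, 3, 4, 5)
vec2 <- c(3, 2, 1, 4, 5)
equal_to <- vec1 == vec2

# Is at least one pair equal? Are all pairs equal?
print(any(equal_to))            # [1] TRUE
print(all(equal_to))            # [1] FALSE

# TRUE counts as 1, so sum() gives the number of matching positions
print(sum(equal_to))            # [1] 3

# Positions where the two vectors differ
print(which(vec1 != vec2))      # [1] 1 3

# identical() compares two whole objects and returns a single value
print(identical(vec1, vec2))    # [1] FALSE

# For decimal numbers, all.equal() tolerates tiny rounding differences
print(0.1 + 0.2 == 0.3)                     # [1] FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))    # [1] TRUE
```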
Sorting vectors: In R, you can sort vectors using the sort()
function.
1. Ascending Order: By default, sort() arranges the elements of a vector
in ascending order.
my_vector <- c(3, 1, 4, 2, 5)
sorted_vector <- sort(my_vector)
# sorted_vector: [1, 2, 3, 4, 5]
2. Descending Order: You can specify decreasing = TRUE to sort the
vector in descending order.
sorted_vector_desc <- sort(my_vector, decreasing = TRUE)
# sorted_vector_desc: [5, 4, 3, 2, 1]
3. Sorting Indexes: If you want the positions that would sort a vector
rather than the sorted values themselves, use the order() function,
which returns the indexes that would sort the vector.
sorted_indexes <- order(my_vector)
# sorted_indexes: [2, 4, 1, 3, 5]
sorted_vector_by_index <- my_vector[sorted_indexes]
# sorted_vector_by_index: [1, 2, 3, 4, 5]
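The main use of order() is to rearrange one vector according to the values of another. The sketch below uses two made-up parallel vectors, names_vec and ages.

```r
names_vec <- c("Alice", "Bob", "Charlie")
ages <- c(35, 25, 30)

# Positions that would sort ages in ascending order
by_age <- order(ages)
print(by_age)                   # [1] 2 3 1

# Apply those positions to the parallel vector of names
print(names_vec[by_age])        # [1] "Bob"     "Charlie" "Alice"

# Oldest first
oldest_first <- names_vec[order(ages, decreasing = TRUE)]
print(oldest_first)             # [1] "Alice"   "Charlie" "Bob"
```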
Character vectors and operations on character vectors:
In R, character vectors store strings of text. You can perform various
operations on character vectors, such as concatenation, subsetting, and
finding unique values.
Here are some common operations:
1. Concatenation: Join strings element-wise using the paste() function, or
combine strings into a single character vector using the c() function.
names <- c("Alice", "Bob", "Charlie")
greetings <- paste("Hello", names)
# greetings: ["Hello Alice", "Hello Bob", "Hello Charlie"]
2. Subsetting: Access specific elements or subsets of a character vector
using indexing or logical conditions.
first_name <- names[1]
# first_name: "Alice"
subset_names <- names[c(1, 3)]
# subset_names: ["Alice", "Charlie"]
# Select names starting with "A"
a_names <- names[startsWith(names, "A")]
# a_names: ["Alice"]
3. Finding Unique Values: Identify unique elements in a character
vector using the unique() function.
unique_names <- unique(names)
# unique_names: ["Alice", "Bob", "Charlie"]
4. String Manipulation: You can manipulate strings within character
vectors using functions like toupper(), tolower(), substring(), gsub(), etc.
uppercase_names <- toupper(names)
# uppercase_names: ["ALICE", "BOB", "CHARLIE"]
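5. Comparisons: Compare character vectors using relational operators
or test membership with %in%.
vowels <- c("a", "e", "i", "o", "u")
# %in% matches whole elements, so compare each name's first letter
starts_with_vowel <- tolower(substr(names, 1, 1)) %in% vowels
# starts_with_vowel: [TRUE, FALSE, FALSE]
is_bob <- names == "Bob"
# is_bob: [FALSE, TRUE, FALSE]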
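A few more of the string functions mentioned above, applied to the same three names. The vector is called names_vec here only to avoid masking the built-in names() function.

```r
names_vec <- c("Alice", "Bob", "Charlie")

# Number of characters in each string
print(nchar(names_vec))                     # [1] 5 3 7

# Extract the first three characters of each string
print(substr(names_vec, 1, 3))              # [1] "Ali" "Bob" "Cha"

# Replace every "e" with "3"
print(gsub("e", "3", names_vec))            # [1] "Alic3"   "Bob"     "Charli3"

# Collapse the whole vector into one string
print(paste(names_vec, collapse = ", "))    # [1] "Alice, Bob, Charlie"

# Test whether each string contains a pattern
print(grepl("li", names_vec))               # [1]  TRUE FALSE  TRUE
```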