R Data Types and Objects

The document provides a comprehensive overview of R data types and objects, detailing their characteristics, usage, and operations. It covers various data types including numeric, integer, character, logical, and complex types, as well as R objects such as vectors, lists, matrices, data frames, factors, and arrays. Additionally, it emphasizes best practices and efficiency tips for working with data in R.

Uploaded by

lokeshkumaar3421

R Data Types and Objects - Detailed Notes

 Overview of R Data Types:


o R data types are the fundamental building blocks for storing and manipulating
data in the R programming language.
o R supports several basic data types, each designed for specific kinds of data,
ensuring flexibility in statistical computing and data analysis.
o Understanding data types is crucial because R’s operations and functions often
behave differently depending on the type of data they process.
 Numeric Type:
o Represents numbers with decimal points (floating-point numbers), also known
as doubles in R.
o Example: x <- 3.14; typeof(x) returns "double".
o Used for continuous data, such as measurements, e.g., heights (5.9) or
temperatures (23.5).
o Default type for numbers unless specified otherwise; even whole numbers like 5 are
stored as double unless written with the L suffix (5L).
o Operations: Supports arithmetic like 3.14 + 2.86 (returns 6).
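As a rough illustration of the numeric type described above, the sketch below checks the storage type of a few values and shows why doubles are usually compared with a tolerance rather than with ==. The variable name total is illustrative and does not come from the notes.

```r
# Doubles are the default storage type for numbers in R.
x <- 3.14
typeof(x)        # "double"
typeof(5)        # "double": a bare 5 is not an integer
is.numeric(x)    # TRUE

# Floating-point arithmetic is approximate, so exact equality can surprise.
total <- 0.1 + 0.2
total == 0.3                    # FALSE
isTRUE(all.equal(total, 0.3))   # TRUE: comparison within a small tolerance
```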
 Integer Type:
o Represents whole numbers without decimal points.
o Defined explicitly using the L suffix, e.g., y <- 5L; typeof(y) returns
"integer".
o Example: z <- as.integer(5.7) converts 5.7 to 5, truncating the decimal.
o More memory-efficient than doubles (4 bytes per value instead of 8), useful for large
datasets with count data, e.g., number of items sold (10L).
o Use case: Count models such as Poisson regression expect non-negative whole-number
counts, e.g., glm(count ~ predictor, family=poisson).
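The integer behaviour described above can be sketched as follows. This is a minimal example; it adds two points the notes do not state explicitly, namely that mixing an integer with a double promotes the result to double, and that integers have a fixed maximum value.

```r
y <- 5L
typeof(y)              # "integer"

z <- as.integer(5.7)   # truncates toward zero, does not round
z                      # 5
as.integer(-5.7)       # -5

typeof(y + 1L)         # "integer"
typeof(y + 1)          # "double": mixing with a double promotes the result
typeof(y / 2L)         # "double": division always returns a double

.Machine$integer.max   # 2147483647, the largest representable integer
```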
 Character Type:
o Represents text or strings, enclosed in single (') or double (") quotes.
o Example: name <- "Alice"; typeof(name) returns "character".
o Useful for categorical labels, e.g., city <- "New York", or text data like
comments.
o Operations: String manipulation with functions like paste("Hello",
"World") (returns "Hello World") or nchar("Alice") (returns 5).
o Conversion: as.character(123) converts a number to a string, returning
"123".
 Logical Type:
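o Represents boolean values: TRUE and FALSE. T and F are predefined shortcuts, but they
are ordinary variables that can be reassigned, so the full names are safer.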
o Example: is_valid <- TRUE; typeof(is_valid) returns "logical".
o Generated by comparisons, e.g., 5 > 3 returns TRUE.
o Used in conditional statements: if (5 > 3) { print("Yes") } prints
"Yes".
o Operations: Logical operators like & (AND), | (OR), ! (NOT), e.g., TRUE &
FALSE returns FALSE.
o Use case: Subsetting data, e.g., vec <- c(10, 20, 30); vec[vec > 15]
returns 20, 30.
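The logical operations and subsetting described above might look like this in practice. The name mask is illustrative; sum(), any(), and all() are base R helpers that the notes do not mention.

```r
vec <- c(10, 20, 30)

mask <- vec > 15            # FALSE TRUE TRUE
vec[mask]                   # 20 30

sum(mask)                   # 2: TRUE counts as 1, FALSE as 0
any(vec > 25)               # TRUE
all(vec > 25)               # FALSE

# & and | work element-wise on vectors.
vec[vec > 15 & vec < 30]    # 20

# && and || expect single values and are the usual choice inside if().
if (length(vec) > 0 && vec[1] == 10) {
  message("first element is 10")
}
```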
 Complex Type:
o Represents numbers with real and imaginary parts, used in advanced
mathematical computations.
o Example: z <- 2 + 3i; typeof(z) returns "complex".
o Components: Re(z) returns 2 (real part); Im(z) returns 3 (imaginary part).
o Operations: z1 <- 1 + 2i; z2 <- 2 + 3i; z1 + z2 returns 3 + 5i.
o Use case: Signal processing or solving equations in physics, e.g., exp(1i *
pi) returns -1 + 0i (Euler’s formula).
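A small sketch of complex arithmetic along the lines above. Mod() and Conj() are base R functions added here for illustration, and the name euler is an assumption of this example.

```r
z <- 2 + 3i
typeof(z)     # "complex"
Re(z)         # 2
Im(z)         # 3
Mod(z)        # modulus, sqrt(2^2 + 3^2)
Conj(z)       # 2-3i

z1 <- 1 + 2i
z2 <- 2 + 3i
z1 + z2       # 3+5i
z1 * z2       # -4+7i, since (1*2 - 2*3) + (1*3 + 2*2)i

# Euler's formula: the result prints as -1+0i, but the imaginary part is a
# tiny rounding error rather than exactly zero, so compare with a tolerance.
euler <- exp(1i * pi)
isTRUE(all.equal(euler, -1 + 0i))   # TRUE
```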
 Overview of R Objects:
o R objects are structures that hold data of various types, used to organize and
manipulate data efficiently.
o Objects determine how data is stored, accessed, and processed in R, making
them essential for programming tasks.
 Vectors:
o One-dimensional arrays holding elements of the same type (homogeneous).
o Created with c(): vec <- c(1, 2, 3); typeof(vec) returns "double".
o Can hold any data type: char_vec <- c("a", "b", "c") for characters.
o Operations: Vectorized, e.g., vec * 2 returns 2, 4, 6.
o Use case: Store a sequence of numbers, e.g., ages (c(25, 30, 35)), for
statistical analysis like mean(vec).
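The vector operations above can be sketched as follows, using the ages example from the notes. The indexing and seq() lines are additions for illustration.

```r
ages <- c(25, 30, 35)
typeof(ages)          # "double"
length(ages)          # 3

ages * 2              # 50 60 70 (vectorized, no loop needed)
ages + c(1, 2, 3)     # 26 32 38 (element-wise)
mean(ages)            # 30

ages[2]               # 30: indexing starts at 1
ages[-1]              # 30 35: a negative index drops that element

int_seq <- 1:5                   # integer sequence 1 2 3 4 5
steps <- seq(0, 1, by = 0.25)    # 0.00 0.25 0.50 0.75 1.00
```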
 Lists:
o One-dimensional collections that can hold elements of different types
(heterogeneous).
o Created with list(): my_list <- list(1, "a", TRUE, c(10, 20)).
o Access: my_list[[1]] returns 1; my_list[1] returns a sublist.
o Named lists: person <- list(name="Alice", age=25); access with person$name.
o Use case: Store mixed data, e.g., metadata of a dataset (list(id=1,
data=c(10, 20), desc="test")).
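The difference between [[ ]] and [ ] on lists is easy to mix up, so here is a minimal sketch. The list named person and the added city element are illustrative.

```r
my_list <- list(1, "a", TRUE, c(10, 20))

my_list[[4]]              # 10 20: the element itself
class(my_list[4])         # "list": a sublist of length 1

# Each element keeps its own type.
sapply(my_list, typeof)   # "double" "character" "logical" "double"

person <- list(name = "Alice", age = 25)
person$name               # "Alice"
person[["age"]]           # 25

person$city <- "New York" # assigning to a new name adds an element
names(person)             # "name" "age" "city"
```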
 Matrices:
o Two-dimensional arrays, homogeneous, storing elements of the same type.
o Created with matrix(): mat <- matrix(1:6, nrow=2, ncol=3) creates a
2x3 matrix.
o Structure: Values fill column by column, so print(mat) shows row 1 as 1, 3, 5 and
row 2 as 2, 4, 6.
o Operations: Matrix algebra, e.g., mat %*% t(mat) for matrix multiplication.
o Use case: Linear algebra tasks, e.g., solving systems of equations or image
processing.
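A minimal sketch of the matrix features above. The matrices by_row and A, the vector b, and the use of solve() for a small linear system are illustrative additions, not taken from the notes.

```r
mat <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
mat
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

dim(mat)      # 2 3
mat[1, 2]     # 3
mat[, 3]      # 5 6: the whole third column

# byrow = TRUE fills row by row instead: rows are 1 2 3 and 4 5 6.
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

# (2x3) %*% (3x2) gives a 2x2 matrix.
prod_mat <- mat %*% t(mat)
#      [,1] [,2]
# [1,]   35   44
# [2,]   44   56

# Solving 2x + y = 3 and x + 3y = 5.
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(3, 5)
solution <- solve(A, b)   # x = 0.8, y = 1.4
```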
 Data Frames:
o Table-like structures, heterogeneous, where each column can have a different
type.
o Created with data.frame(): df <- data.frame(name=c("Alice",
"Bob"), age=c(25, 30)).
o Access: df$name or df[, "name"] for the name column; df[1, ] for the first
row.
o Properties: Combines features of lists (columns) and matrices (row-column
structure).
o Use case: Store datasets, e.g., survey data with columns for id, age, gender,
for analysis like summary(df).
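The data frame operations above can be sketched as follows. The city column is illustrative. stringsAsFactors = FALSE is passed explicitly because R versions before 4.0 converted strings to factors by default.

```r
df <- data.frame(
  name = c("Alice", "Bob"),
  age = c(25, 30),
  stringsAsFactors = FALSE
)

str(df)               # compact view of column names, types, and values

df$name               # "Alice" "Bob"
df[1, ]               # first row, still a data frame
df[df$age > 25, ]     # rows where age exceeds 25 (Bob)

df$city <- c("NY", "LA")   # add a column
nrow(df)              # 2
ncol(df)              # 3
sapply(df, class)     # one class per column
```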
 Factors:
o Used for categorical data, storing unique levels as labels.
o Created with factor(): f <- factor(c("low", "high", "low")).
o Levels: levels(f) returns "high", "low" (alphabetical by default).
o Internal storage: Stored as integers, e.g., as.numeric(f) returns codes like 2,
1, 2.
o Use case: Statistical modeling, e.g., lm(y ~ factor(group), data=df)
treats group as categorical.
o Customization: Reorder levels with factor(f, levels=c("low",
"high")).
 Arrays:
o Multi-dimensional extensions of matrices, homogeneous.
o Created with array(): arr <- array(1:12, dim=c(2, 3, 2)) creates a
2x3x2 array.
o Access: arr[1, 2, 1] for specific elements.
o Use case: Multi-dimensional data, e.g., 3D image data (height, width, color
channels).
o Operations: Similar to matrices, e.g., arr + 1 adds 1 to each element.
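A minimal sketch of array indexing, using the 2x3x2 example above. The use of apply() to sum each slice and the names plus_one and slice_sums are illustrative additions.

```r
arr <- array(1:12, dim = c(2, 3, 2))   # filled along the first dimension first
dim(arr)          # 2 3 2

arr[1, 2, 1]      # 3
arr[1, 2, 2]      # 9
arr[, , 2]        # the second 2x3 slice, holding 7 to 12

plus_one <- arr + 1L
plus_one[1, 2, 1]     # 4

# Sum over each slice along the third dimension.
slice_sums <- apply(arr, 3, sum)   # 21 57
```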
 Type and Object Checking:
o typeof(): Returns the data type, e.g., typeof(5L) returns "integer".
o class(): Returns the object class, e.g., class(df) returns "data.frame".
o str(): Displays the structure, e.g., str(df) shows column types and data.
o Use case: Debugging to ensure correct types before operations, e.g., guarding a
calculation with if (is.numeric(vec)).
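The checks above can be combined as in the sketch below. It also shows that typeof() and class() can disagree, since typeof() reports the underlying storage. The helper safe_mean is hypothetical and written only for this example.

```r
df <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
mat <- matrix(1:6, nrow = 2)

typeof(5L)     # "integer"
typeof(df)     # "list": a data frame is built on a list of columns
class(df)      # "data.frame"
typeof(mat)    # "integer"
class(mat)     # "matrix" "array" in R 4.0 and later

# Hypothetical helper: fail early when the input has the wrong type.
safe_mean <- function(v) {
  if (!is.numeric(v)) {
    stop("expected a numeric vector, got ", typeof(v))
  }
  mean(v)
}

safe_mean(c(1, 2, 3))   # 2
```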
 Coercion and Conversion:
o R automatically coerces types when they are mixed: c(1, "2") becomes the
character vector "1", "2".
o Explicit conversion: as.numeric("123") returns 123; as.character(123)
returns "123".
o Factors: as.factor(c("a", "b")) for categorical data; as.numeric(f) on a factor f
returns its integer codes, not the original labels.
o Use case: Prepare data for analysis, e.g., converting strings to numbers for
calculations.
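The coercion rules above follow a fixed order (logical, then integer, then double, then character), which the sketch below illustrates. The names mixed, flags, and bad are illustrative. suppressWarnings() is used only to keep the example quiet.

```r
mixed <- c(1, "2")
mixed                  # "1" "2": the number became a string
typeof(mixed)          # "character"

flags <- c(1, TRUE)
flags                  # 1 1: TRUE became 1
typeof(c(1L, 2.5))     # "double": the integer was promoted

TRUE + TRUE            # 2: logicals act as 0/1 in arithmetic

as.numeric("123")      # 123
as.character(123)      # "123"

# A string that is not a number converts to NA (normally with a warning).
bad <- suppressWarnings(as.numeric("text"))
is.na(bad)             # TRUE
```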
 Practical Applications:
o Numeric/Integer: Compute statistics, e.g., mean(c(1, 2, 3)).
o Character: Label data, e.g., df$city <- c("NY", "LA").
o Logical: Filter data, e.g., df[df$age > 25, ].
o Vectors: Store sequences for analysis, e.g., sales <- c(100, 200, 300).
o Lists: Organize mixed data, e.g., list(data=vec, params=list(mean=0,
sd=1)).
o Data Frames: Analyze tabular data, e.g., summary(df) for descriptive
statistics.
 Best Practices:
o Choose the right type/object for the task: Use factors for categorical data, data
frames for datasets.
o Check types with typeof() or class() to avoid errors in operations.
o Avoid unnecessary coercion to prevent data loss, e.g., as.numeric("text")
returns NA with a warning.
o Use str() to understand complex objects before manipulation.
o Keep each vector or matrix to one intended type; mixing types silently coerces
every element to a common type.
 Efficiency Tips:
o Integers are more memory-efficient than doubles for whole numbers (4 bytes per
value instead of 8).
o Pre-allocate vectors, e.g., vector("numeric", length=1000), instead of growing
them inside a loop.
o Use data frames rather than plain lists for tabular data; they support row and
column subsetting directly.
o Factors can reduce memory usage for categorical data with many repeated values,
since each element is stored as an integer code.
o Arrays are efficient for multi-dimensional numerical data, avoiding nested
lists.
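Two of the tips above, pre-allocation and integer storage, can be sketched as follows. The names n, squares, squares_vec, int_bytes, and dbl_bytes are illustrative, and exact sizes reported by object.size() vary by platform, so none are shown.

```r
n <- 1000

# Pre-allocate the result, then fill it in place.
squares <- vector("numeric", length = n)   # a vector of n zeros
for (i in seq_len(n)) {
  squares[i] <- i^2
}

# A vectorized expression gives the same result without a loop.
squares_vec <- seq_len(n)^2
isTRUE(all.equal(squares, squares_vec))    # TRUE

# Integer vectors take roughly half the memory of double vectors.
int_bytes <- as.numeric(utils::object.size(integer(n)))
dbl_bytes <- as.numeric(utils::object.size(numeric(n)))
int_bytes < dbl_bytes                      # TRUE
```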
