R-Programming-Cheat-Sheet

Uploaded by

Mohiuddin Ahmed
Copyright: © All Rights Reserved

Table of Contents

Basics: install.packages, library, assignment (<-), print, class
Data Structures: c, list, matrix, data.frame, df$a or df
Data Manipulation: filter, select, mutate, summarize, arrange
Data Visualization: plot, barplot, hist, boxplot
Statistics: mean, median, sd, cor, lm
Programming: if, for, while, function, apply
Machine Learning: Matrices, Linear Model, Visualize Residuals
File I/O: read.csv, write.csv, readRDS, saveRDS, list.files

This cheat sheet provides a quick reference for essential R programming
commands, helping you perform data manipulation, visualization, and
statistical analysis with confidence. It covers foundational topics like installing
packages and understanding R's data structures, alongside advanced tasks
such as building models and applying machine learning techniques.

Each section includes concise syntax and practical examples to illustrate how R
commands are used in real-world scenarios. You'll find guidance on working
with vectors, lists, matrices, and data frames, performing common data
wrangling tasks like filtering and summarizing, and creating visualizations
such as histograms, bar plots, and boxplots. The cheat sheet also highlights R's
capabilities for statistical analysis with commands like mean, lm, and cor.

Designed for clarity and accessibility, this resource is ideal for data analysts,
statisticians, and programmers seeking to enhance their workflows in R.
Whether you're exploring data, developing algorithms, or building
reproducible reports, this cheat sheet ensures you can quickly apply R's
powerful tools to your projects.

R Cheat Sheet
Basics

Syntax for: Install Package
How to use: install.packages("dplyr")
Explained: Installs the dplyr package.

Syntax for: Load Package
How to use: library(dplyr)
Explained: Loads the dplyr package into the current R session.

Syntax for: Assignment
How to use: x <- 5
Explained: Assigns value 5 to the variable x.

Syntax for: Print Output
How to use: print(x)
Explained: Prints the value of x to the console.

Syntax for: Literals and Data Types
How to use: TRUE, 125, 12.5, "Hello"
Explained: Examples of logical, numeric, and character literals in R. A bare 125 is stored as numeric (double); write 125L for an integer literal.

Syntax for: Extracting Numbers from Strings
How to use:
  library(readr)
  library(dplyr)   # provides mutate()
  data_frame <- mutate(data_frame, column = parse_number(column))
Explained: Uses parse_number to extract numeric values from string columns.

Syntax for: Basic String Indexing
How to use: str_sub("Dataquest is awesome", 1, 9)
Explained: Extracts "Dataquest" as a substring by specifying start and end indices.

Data Structures

Syntax for: Create Vector
How to use: c(1, 2, 3)
Explained: Combines elements into a vector.

Syntax for: Create List
How to use: list(a=1, b="two")
Explained: Creates a list with named elements.

Syntax for: Create Matrix
How to use: matrix(1:6, nrow=2)
Explained: Creates a matrix with 2 rows and 3 columns.

Syntax for: Create Data Frame
How to use: data.frame(a=1:3, b=4:6)
Explained: Creates a data frame with columns a and b.

Syntax for: Access Element
How to use: df$a or df[1, 1]
Explained: df$a returns column a; df[1, 1] returns the element in row 1, column 1.

Syntax for: Loading stringr package
How to use: library(stringr)
Explained: Loads the stringr library to work with strings in R.

Syntax for: Opening a JSON File
How to use:
  library(jsonlite)
  f <- fromJSON('filename.json')
Explained: Loads a JSON file into an R dataframe using the jsonlite package.

Syntax for: Creating a List
How to use: new_list <- list("data scientist", c(50000,40000), "programming experience")
Explained: Defines a list containing diverse data types.
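As a rough illustration of how the Basics and Data Structures entries fit together, the sketch below builds each structure and reads values back out. It uses base R only, and the object names (v, lst, m, df) are placeholders chosen for this example rather than names from the cheat sheet.

```r
# Placeholder objects for illustration; the names are assumptions.
v   <- c(1, 2, 3)                     # numeric vector
lst <- list(a = 1, b = "two")         # list with named elements
m   <- matrix(1:6, nrow = 2)          # 2 rows, 3 columns, filled column by column
df  <- data.frame(a = 1:3, b = 4:6)   # data frame with columns a and b

class(v)      # "numeric"
class(125L)   # "integer" (the L suffix marks an integer literal)
lst$b         # "two"
dim(m)        # 2 3
df$a          # column a: 1 2 3
df[1, 1]      # row 1, column 1: 1
df[2, "b"]    # row 2, column b: 5
```

Indexing with df[row, column] accepts either positions or column names, so df[2, "b"] and df[2, 2] return the same value here.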

Data Manipulation

Syntax for: Filter Rows
How to use: filter(df, a > 2)
Explained: Filters rows where column a is greater than 2.

Syntax for: Select Columns
How to use: select(df, a, b)
Explained: Selects specific columns by name.

Syntax for: Mutate Columns
How to use: mutate(df, c = a + b)
Explained: Adds a new column c as the sum of a and b.

Syntax for: Summarize Data
How to use: summarize(df, avg=mean(a))
Explained: Calculates the mean of column a and returns it as avg.

Syntax for: Arrange Rows
How to use: arrange(df, desc(a))
Explained: Sorts rows by column a in descending order.

Syntax for: Importing Data
How to use: data <- read_csv("name_of_file_with_data.csv")
Explained: Imports a dataset into R using the read_csv function from readr.

Syntax for: Summing Values Across Rows
How to use: df %>% mutate(new_column_name = rowSums(.[1:3]))
Explained: Sums the specified columns for each row and adds the result as a new column.

Syntax for: Summing Values Across Columns
How to use: df %>% bind_rows(summarise(., across(everything(), sum)))
Explained: Sums each column and appends the totals as a new row. across() only works inside a dplyr verb such as summarise(), and every column must be numeric.

Syntax for: Importing CSV files
How to use: dataframe <- read_csv("name_of_the_dataset.csv")
Explained: Reads CSV files into R using readr's read_csv() for efficient data import.

Data Visualization

Syntax for: Creating a Basic Plot
How to use: data %>% ggplot()
Explained: Initializes a basic ggplot2 chart without specifying any aesthetics.

Syntax for: Creating Subplots
How to use:
  data %>% ggplot(aes(x = variable_1, y = variable_2)) +
    geom_line() +
    facet_wrap(~variable_3)
Explained: Plots subsets of data in separate facets.

Syntax for: Creating Bar Chart
How to use:
  data_frame %>% ggplot(aes(x = variable_1, y = variable_2)) +
    geom_col()
Explained: Creates a bar chart using ggplot2, mapping variables to x and y axes.

Syntax for: Plotting multiple columns
How to use:
  data %>% ggplot(aes(x = variable_1)) +
    geom_line(aes(y = variable_2)) +
    geom_line(aes(y = variable_3))
Explained: Plots multiple columns on the same axes using ggplot2.

Syntax for: Scatter plots
How to use: ggplot(data = uber_trips, aes(x = distance, y = cost)) + geom_point()
Explained: Generates scatterplots to visualize bivariate relationships in ggplot2.

Syntax for: Scatter plots with Labels
How to use:
  ggplot(data = df, aes(x = predictor, y = response)) +
    geom_point() +
    scale_y_continuous(labels = scales::comma)
Explained: Creates scatterplots with y-axis labels formatted using commas instead of scientific notation.
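The data manipulation verbs are usually chained with the pipe. The following is a minimal sketch of such a chain, assuming the dplyr package (version 1.0 or later, for across()) is installed. The data frame and the names result, totals, and with_total are made up for this example.

```r
library(dplyr)

# Small made-up data frame for illustration.
df <- data.frame(a = 1:3, b = 4:6)

# Keep rows where a > 1, add c = a + b, then sort by a descending.
result <- df %>%
  filter(a > 1) %>%
  mutate(c = a + b) %>%
  arrange(desc(a))
# result has two rows: (a = 3, b = 6, c = 9) and (a = 2, b = 5, c = 7)

# Collapse to a single summary value.
totals <- summarize(result, avg = mean(a))   # avg is 2.5

# Append a row of column totals to the original data frame.
with_total <- bind_rows(df, summarise(df, across(everything(), sum)))
# last row of with_total: a = 6, b = 15
```

Each verb returns a new data frame and leaves df unchanged, which is why the column totals at the end are still computed from the original three rows.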

Data Visualization (continued)

Syntax for: Scatterplot with Comma Labels
How to use:
  ggplot(data = df, aes(x = predictor, y = response)) +
    scale_y_continuous(labels = scales::comma) +
    geom_point()
Explained: Plots a scatterplot with y-axis labels in comma format.

Syntax for: Scatterplot with Groups
How to use:
  ggplot(data = df, aes(x = predictor, y = response)) +
    geom_point() +
    facet_wrap(~ categorical_variable, ncol = 2)
Explained: Creates scatterplots of response vs predictor, grouped by a categorical variable.

Syntax for: Vertical Bar Chart
How to use: ggplot(data = df, aes(x = col)) + geom_bar()
Explained: Creates a vertical bar chart to visualize counts of data.

Syntax for: Grouped Bar Plot
How to use:
  ggplot(data = df, aes(x = col_1, fill = col_2)) +
    geom_bar(position = "dodge")
Explained: Creates a grouped bar plot to compare frequency distributions of categorical variables.

Statistics & Probability

Syntax for: Mean
How to use: mean(x)
Explained: Calculates the mean of vector x.

Syntax for: Median
How to use: median(x)
Explained: Calculates the median of vector x.

Syntax for: Weighted Mean
How to use: w_mean <- weighted.mean(x = distribution, w = weights)
Explained: Computes the weighted mean of a numerical vector using specific weights. The result is stored in w_mean rather than mean so that it does not share a name with the mean() function.

Syntax for: Standard Deviation
How to use: sd(x)
Explained: Calculates the standard deviation of x.

Syntax for: Correlation
How to use: cor(x, y)
Explained: Calculates the correlation between x and y.

Syntax for: Linear Model
How to use: lm(y ~ x, data=df)
Explained: Fits a linear regression model.

Syntax for: Types of Variables
How to use: # Example Variables: Age (Quantitative), Gender (Qualitative)
Explained: Classify variables as Quantitative (numerical) or Qualitative (categorical).

Syntax for: P-Value Decision Threshold
How to use:
  if (p_value < 0.05) {
    print('Reject null hypothesis')
  } else {
    print('Fail to reject null hypothesis')
  }
Explained: Decides on hypothesis rejection using a common p-value threshold of 0.05.
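To show how the statistics entries connect, here is a small base R sketch that computes the summary measures, fits a linear model, and applies the 0.05 threshold to the slope's p-value. The data values and the names fit, p_value, and decision are invented for illustration. Taking the p-value from the coefficient table of summary() is one common approach, not the only one.

```r
# Made-up data: y is roughly 2 * x plus a little noise.
x <- 1:10
y <- c(2.3, 4.1, 6.2, 7.9, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2)

mean(x)     # 5.5
median(x)   # 5.5
sd(x)       # about 3.03
cor(x, y)   # very close to 1 for this data

# Weighted mean: (1*1 + 2*1 + 3*2) / (1 + 1 + 2) = 2.25
w_mean <- weighted.mean(x = c(1, 2, 3), w = c(1, 1, 2))

# Fit y ~ x and pull the slope's p-value from the coefficient table.
fit <- lm(y ~ x, data = data.frame(x = x, y = y))
p_value <- summary(fit)$coefficients["x", "Pr(>|t|)"]

decision <- if (p_value < 0.05) {
  "Reject null hypothesis"
} else {
  "Fail to reject null hypothesis"
}
print(decision)
```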

Statistics & Probability (continued)

Syntax for: Chi-Squared Distribution
How to use: pchisq(3.84, df = 1)
Explained: Calculates the cumulative probability for a chi-squared distribution with specific degrees of freedom.

Syntax for: Chi-Squared Test
How to use: pchisq(q = 10, df = 5)
Explained: Calculates the cumulative probability for a chi-squared statistic of 10 with 5 degrees of freedom.

Syntax for: Multi-category Chi-squared Test
How to use:
  data <- table(income$sex, income$high_income)
  chisq.test(data)
Explained: Builds a contingency table and performs a chi-squared test on it.

Syntax for: Computing Mode in R
How to use:
  compute_mode <- function(vector) {
    counts_df <- tibble(vector) %>%
      group_by(vector) %>%
      summarise(frequency = n()) %>%
      arrange(desc(frequency))
    counts_df$vector[1]
  }
Explained: Defines a function to calculate the mode of a given vector using dplyr functions.

Syntax for: Calculate Z-score
How to use: z_score <- function(value, vector) { (value - mean(vector)) / sd(vector) }
Explained: Calculates the Z-score for a value relative to a vector's distribution.

Syntax for: Simulate Coin Toss
How to use:
  set.seed(1)
  coin_toss <- function() { if (runif(1) <= 0.5) 'HEADS' else 'TAILS' }
Explained: Simulates a random coin toss using R's uniform random numbers.

Syntax for: Addition Rule for Probability
How to use: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Explained: Formula to calculate probabilities of unions of events, adjusting for overlap in non-exclusive cases.

Syntax for: Independent Events
How to use: P(A ∩ B) = P(A) * P(B)
Explained: The probability that independent events both occur is the product of their individual probabilities.

Syntax for: Product Rule in Experiments
How to use: total_outcomes <- a * b
Explained: Calculates the total outcomes for two independent experiments using the product rule.

Syntax for: Uniform Distribution
How to use:
  # Assuming all outcomes have equal chance
  outcomes <- c(1, 2, 3, 4, 5, 6)
  probabilities <- rep(1/6, 6)
  paste('Outcome:', outcomes, 'Probability:', probabilities)
Explained: Demonstrates a uniform distribution for a dice roll, where outcomes are equally likely.
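The probability rules above can be checked numerically. The sketch below is one way to do that in base R, assuming a fair six-sided die as the sample space. The events A and B and the helper p() are hypothetical names introduced for this example.

```r
# Sample space: one roll of a fair die (an assumption for this example).
outcomes <- 1:6
A <- c(2, 4, 6)   # event: roll is even
B <- c(4, 5, 6)   # event: roll is greater than 3

# Hypothetical helper: probability of an event under equally likely outcomes.
p <- function(event) length(event) / length(outcomes)

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
p_union_rule   <- p(A) + p(B) - p(intersect(A, B))   # 3/6 + 3/6 - 2/6
p_union_direct <- p(union(A, B))                     # 4/6
all.equal(p_union_rule, p_union_direct)              # TRUE

# Z-score of 3 within c(1, 2, 3): mean is 2, sd is 1, so the score is 1.
z_score <- function(value, vector) (value - mean(vector)) / sd(vector)
z_score(3, c(1, 2, 3))

# Repeat the coin toss many times; the share of heads should be near 0.5.
set.seed(1)
coin_toss <- function() if (runif(1) <= 0.5) "HEADS" else "TAILS"
tosses <- replicate(1000, coin_toss())
mean(tosses == "HEADS")

# Upper-tail probability for a chi-squared statistic of 3.84 with 1 degree
# of freedom; this is about 0.05, the usual significance threshold.
p_value <- pchisq(3.84, df = 1, lower.tail = FALSE)
```

Note that pchisq() returns the lower-tail (cumulative) probability by default, so a test's p-value is either 1 - pchisq(q, df) or the lower.tail = FALSE form used here.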

Statistics & Probability (continued)

Syntax for: Conditional Probability Calculation
How to use: P_A_given_B <- P_A_and_B / P_B
Explained: Computes P(A|B) given the probability of A and B, and the probability of B.

Syntax for: Conditional Probability
How to use: P_A_given_B <- length(intersect(A, B)) / length(B)
Explained: Computes P(A|B) using set cardinalities.

Syntax for: Conditional Probability Definition
How to use: P_A_given_B <- 1 - P_Ac_given_B
Explained: Conditional probabilities are interrelated; P(A|B) and its complement P(Ac|B) can each be calculated from the other.

Syntax for: Independence
How to use: P_A_and_B <- P_A * P_B
Explained: Defines independent events: the joint probability equals the product of the individual probabilities.

Programming

Syntax for: If Statement
How to use: if (x > 0) print("positive")
Explained: Executes code if the condition is true.

Syntax for: For Loop
How to use: for (i in 1:3) print(i)
Explained: Iterates over a sequence.

Syntax for: While Loop
How to use: while (x < 5) x <- x + 1
Explained: Repeats code while the x < 5 condition is true.

Syntax for: Syntax for functions
How to use:
  function_name <- function(input) {
    # Code to manipulate the input
    return(output)
  }
Explained: Defines a reusable function structure in R.

Syntax for: Define Function
How to use: f <- function(a, b) a + b
Explained: Defines a function with two arguments.

Syntax for: Apply Function
How to use: apply(m, 1, sum)
Explained: Applies a function over the rows (second argument 1) or columns (second argument 2) of a matrix.

Syntax for: Exponentiation
How to use: 3^5
Explained: Calculates 3 raised to the power of 5.

Syntax for: Creating Dates
How to use: ymd('20/04/21')
Explained: Converts a string into a Date object, reading it as year-month-day. ymd() comes from the lubridate package.

Syntax for: Creating Dates from Strings
How to use: ymd("20/04/21")
Explained: Converts a string to a date object using the specified format.

Syntax for: Define Window Frame
How to use: ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
Explained: Defines a window frame including one row before and after the current row for computations. This is a SQL window-frame clause, not R syntax.
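The conditional probability formulas and the programming constructs can be combined in a single short sketch. The version below is base R only and assumes a fair six-sided die. The function cond_prob() and the events A and B are hypothetical names for this example.

```r
# Hypothetical helper: P(A|B) from set cardinalities, guarding against empty B.
cond_prob <- function(A, B) {
  if (length(B) == 0) stop("B must contain at least one outcome")
  length(intersect(A, B)) / length(B)
}

A <- c(2, 4, 6)   # event: roll is even
B <- c(4, 5, 6)   # event: roll is greater than 3

p_a_given_b  <- cond_prob(A, B)                  # 2/3
p_ac_given_b <- cond_prob(setdiff(1:6, A), B)    # 1/3, using the complement of A
all.equal(p_a_given_b, 1 - p_ac_given_b)         # TRUE

# While loop: x counts up from 0 and stops once it reaches 5.
x <- 0
while (x < 5) x <- x + 1

# apply() over a 2 x 3 matrix: margin 1 means rows, margin 2 means columns.
m <- matrix(1:6, nrow = 2)
row_totals <- apply(m, 1, sum)   # 9 12
col_totals <- apply(m, 2, sum)   # 3 7 11

# Base R counterpart to the date entries: parse a two-digit year-month-day string.
d <- as.Date("20/04/21", format = "%y/%m/%d")    # 2020-04-21
```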

Machine Learning

Syntax for: Fitting a Linear Model
How to use: lm_fit <- lm(response ~ predictor, data = df)
Explained: Fits a linear regression model with a response and a predictor variable.

Syntax for: Visualize Residuals
How to use:
  library(ggplot2)
  ggplot(data.frame(residuals = lm_fit$residuals), aes(x = residuals)) +
    geom_histogram()
Explained: Visualizes the distribution of residuals to check the linear model's fit.

Syntax for: Hyperparameter Grid Search
How to use:
  library(caret)   # provides train()
  # training_data and train_control are assumed to be defined already
  knn_grid <- expand.grid(k = 1:20)
  knn_model <- train(tidy_price ~ accommodates + bathrooms + bedrooms,
                     data = training_data,
                     method = "knn",
                     trControl = train_control,
                     preProcess = c("center", "scale"),
                     tuneGrid = knn_grid)
  plot(knn_model)
Explained: Performs grid search to optimize k for a k-NN model and visualizes the results.

Syntax for: Naive Bayes Algorithm
How to use: P(Spam|w1,...,wn) ∝ P(Spam) * Πi P(wi|Spam)
Explained: Classifies messages as spam using conditional probabilities.

File I/O

Syntax for: Read CSV
How to use: read.csv("file.csv")
Explained: Reads a CSV file into a data frame.

Syntax for: Write CSV
How to use: write.csv(df, "file.csv")
Explained: Writes a data frame to a CSV file.

Syntax for: Read RDS
How to use: readRDS("file.rds")
Explained: Reads an RDS file into R.

Syntax for: Write RDS
How to use: saveRDS(df, "file.rds")
Explained: Saves an object as an RDS file.

Syntax for: List Files
How to use: list.files()
Explained: Lists files in the current directory.
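The model-fitting and file entries can be combined into a small round trip. The sketch below, in base R, writes made-up data to a temporary CSV, reads it back, fits a linear model, and stores the fitted model as an RDS file. The data values and the temporary paths are assumptions for this example, and tempfile() is used so that nothing is written to the working directory.

```r
# Made-up data for illustration.
df <- data.frame(
  predictor = c(0.5, 1.5, 2.5, 3.5, 4.5),
  response  = c(2.1, 3.9, 6.2, 7.8, 10.1)
)

# CSV round trip; row.names = FALSE avoids writing an extra index column.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
df_back <- read.csv(csv_path)

# Fit the model on the re-imported data.
lm_fit <- lm(response ~ predictor, data = df_back)

# RDS round trip: saveRDS() stores any single R object, including a model.
rds_path <- tempfile(fileext = ".rds")
saveRDS(lm_fit, rds_path)
lm_restored <- readRDS(rds_path)

# With an intercept in the model, least-squares residuals sum to
# (numerically) zero.
sum(residuals(lm_restored))

# Both temporary files show up in the session's temp directory.
basename(csv_path) %in% list.files(tempdir())
```

CSV keeps only the tabular values, while RDS preserves the full R object, which is why the fitted model is saved with saveRDS() rather than write.csv().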
