R Programming Easy

The document is a study guide for the secA.R file, covering foundational R programming concepts through various sections. It includes practical examples, explanations, and key terminologies related to basic syntax, data types, handling dates, vectors, data frames, and data visualization. The guide serves as a comprehensive reference for learning R programming with a focus on the iris dataset and its applications.

R Programming Tutorial: Study Guide for secA.R
This document provides a detailed breakdown of the secA.R file, which introduces foundational R programming concepts. It includes explanations, terminologies,
and practical insights for each section, making it a comprehensive reference for learning R.

Section 1: Basic R Syntax and Data Types

Code

100
1:22
iris
hist(iris$Sepal.Length)
plot(iris)
a <- 100
a = 100
A
'pratyasha'
TRUE
demo(graphics)
c(2.4)

Explanation

100 : Outputs the number 100 to the console. This demonstrates R's interactive nature where expressions are evaluated immediately.
1:22 : Creates a sequence of integers from 1 to 22. Output: [1] 1 2 3 ... 21 22 . The : operator generates a vector of consecutive numbers.
iris : Loads and displays the built-in iris dataset, which contains measurements of 150 iris flowers across 5 columns: Sepal.Length , Sepal.Width ,
Petal.Length , Petal.Width , and Species .
hist(iris$Sepal.Length) : Plots a histogram of the Sepal.Length column, showing how its values are distributed. The $ operator extracts a
column from a data frame.
plot(iris) : Creates a scatterplot matrix showing pairwise relationships between all five columns of iris (e.g., Sepal.Length vs. Petal.Length ); the factor column Species is plotted using its integer codes.
a <- 100 and a = 100 : Assigns 100 to variable a . Both <- and = are assignment operators in R, though <- is the conventional choice. Typing a outputs
100.
A : Attempts to reference variable A . R is case-sensitive, so A is a different name from a ; because A was never defined, this results in an error ( object 'A' not found ).
'pratyasha' : A character string. Output: [1] "pratyasha" .
TRUE : A logical (boolean) value. Output: [1] TRUE .
demo(graphics) : Runs a demonstration of R’s graphics capabilities, showcasing various plotting functions like histograms, scatterplots, and more.
c(2.4) : Uses the c() function to create a numeric vector with a single element, [1] 2.4 .
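
The assignment and case-sensitivity points above can be checked in a short session. This is a minimal sketch; the names b and result are illustrative and do not appear in secA.R, and it assumes a fresh session in which no object called A exists.

```r
# Both operators bind the value 100 to a name
a <- 100
b = 100
identical(a, b)            # TRUE

# R is case-sensitive: A is a different name from a.
# tryCatch() captures the error instead of stopping the script.
result <- tryCatch(A, error = function(e) e)
inherits(result, "error")  # TRUE when no object named A exists

# The : operator produces an integer vector; c(2.4) is a double vector
class(1:22)                # "integer"
class(c(2.4))              # "numeric"
length(1:22)               # 22
```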

Terminologies

Vector: A one-dimensional array of elements of the same type (e.g., 1:22 is an integer vector).
Data Frame: A table-like structure where columns can be of different types (e.g., iris is a data frame with numeric and factor columns).
Histogram: A graphical representation of a numeric variable’s distribution using bars, where bar height indicates frequency.
Assignment Operator : <- or = used to assign values to variables.
Logical Value : TRUE or FALSE , used for boolean operations.
Scatterplot Matrix: A grid of scatterplots showing pairwise relationships between variables.

Key Concepts

R is an interactive environment where you can test expressions directly.
The iris dataset is a standard dataset for learning, with 150 rows (observations) and 5 columns.
Basic plotting functions like hist() and plot() provide quick visualizations.
Assignment and data types (numeric, character, logical) are foundational in R.

Section 2: Checking Data Types

Code

#--------------------------------
23
# we want to check what the type of this 23 is
class(25)
class(25.9)
class('samikshya')
F
T
class(F)

Explanation

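23 : Prints 23. R reports a value’s type only when asked, for example with class() .
class(25) : Returns "numeric" . A number typed without a suffix is stored as a double even when it has no decimal part; writing 25L creates an integer.
class(25.9) : Returns "numeric" . Decimals are also numeric.
class('samikshya') : Returns "character" . Strings (enclosed in quotes) are character type.
F and T : Shorthand for FALSE and TRUE , logical values in R. Unlike TRUE and FALSE , T and F are ordinary variables that can be reassigned, so the full names are safer in scripts.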
class(F) : Returns "logical" , confirming F is a logical type.
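
To make the distinction between class and storage type concrete, the following sketch compares a plain number with an integer literal. The variable names x and y are illustrative only; typeof() and is.numeric() are base R functions that secA.R itself does not use.

```r
x <- 25        # a number literal without a suffix is stored as a double
y <- 25L       # the L suffix requests an integer

class(x)       # "numeric"
typeof(x)      # "double"
class(y)       # "integer"
is.numeric(y)  # TRUE: integers also count as numeric
class(FALSE)   # "logical"
```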

Terminologies

Class : The type of an object in R (e.g., numeric, character, logical).
Numeric: A data type for numbers, including integers and decimals.
Character: A data type for text strings.
Logical : A data type with values TRUE or FALSE .

Key Concepts

The class() function identifies an object’s type, which is crucial for data manipulation.
R supports several data types, and understanding them ensures proper operations (e.g., you can’t perform arithmetic on character data).
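
As a small illustration of why types matter, the sketch below tries arithmetic on a number stored as text and then converts it first. The variable price and its value are hypothetical.

```r
price <- "5"   # hypothetical value that arrived as text

# Arithmetic on a character value raises an error; tryCatch() captures it
outcome <- tryCatch(price + 1, error = function(e) "arithmetic failed")
outcome        # "arithmetic failed"

# Converting to numeric first makes the operation valid
total <- as.numeric(price) + 1
total          # 6
```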

Section 3: Handling Dates in R

Code

# how R handles dates
'2025-03-19'
# check the type
class('2025-03-19')
as.Date('2025-03-19')
class('2025-03-19')
# save the converted date in a variable
myDate = as.Date('2025-03-19')
class(myDate)
# how to extract the day or month or year
format(myDate, '%d')
format(myDate, '%m')
format(myDate, '%Y')

Explanation

'2025-03-19' : A character string representing a date in YYYY-MM-DD format.
class('2025-03-19') : Returns "character" , as it’s initially a string.
as.Date('2025-03-19') : Converts the string to a Date object, assuming the default format YYYY-MM-DD . Output: [1] "2025-03-19" .
class('2025-03-19') : Still returns "character" because the original string wasn’t reassigned.
myDate = as.Date('2025-03-19') : Assigns the converted date to myDate , now a Date object.
class(myDate) : Returns "Date" , confirming the type change.
format(myDate, '%d') : Extracts the day as the character string "19" . %d is a format specifier for day.
format(myDate, '%m') : Extracts the month as "03" . %m is for month.
format(myDate, '%Y') : Extracts the year as "2025" . %Y is for four-digit year.

Terminologies

Date Object: A special R class for storing dates, enabling arithmetic (e.g., date differences) and component extraction.
Format Specifiers: Symbols like %d (day), %m (month), %Y (year) used with format() to extract date parts.

Key Concepts

Dates typed or read in as text are character strings; convert them to Date objects with as.Date() before doing date arithmetic or extracting components.
The format() function extracts specific components, which is useful for time-based analysis (e.g., grouping sales by month).
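
The sketch below shows date arithmetic, parsing a non-default layout, and counting records per month. The order_dates values are made up for illustration and are not part of secA.R.

```r
# Hypothetical order dates, for illustration only
order_dates <- as.Date(c("2025-03-19", "2025-03-25", "2025-04-02"))

# Subtracting two Date objects gives the difference in days
days_between <- as.numeric(order_dates[3] - order_dates[1])
days_between   # 14

# Strings that are not in YYYY-MM-DD layout need an explicit format
d <- as.Date("19/03/2025", format = "%d/%m/%Y")
format(d)      # "2025-03-19"

# format() returns text; convert it when a number is needed
year_num <- as.integer(format(d, "%Y"))

# Count how many dates fall in each month
month_counts <- table(format(order_dates, "%m"))
month_counts
```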

Section 4: Vectors and Data Frames

Code

c(25, 23, 29, 32)
age = c(25, 23, 29, 32)
age2 = c(25, 23, 29, '3o')
age
age2
# create the columns we have in Excel
name = c('ab', 'cd', 'xy', 'po')
name
# check the type of age and name
class(age)
class(name)
class(age2)
# create the income level column
income = c('high', 'high', 'mid', 'high')
class(income)
# convert it to factor
income = as.factor(income)
class(income)
# create the connection among those vectors
# or create a data frame
data.frame(name, age, income)
# create a data frame which shows three of your friends names,
# their food habit and the sports they like
data.frame(name = c('a', 'b','c'), foodHabit = c('veg', 'veg', 'nv'),
           sports = c('tt', 'tennis', 'fb'))
class(iris)

Explanation

c(25, 23, 29, 32) : Creates a numeric vector [1] 25 23 29 32 .
age = c(25, 23, 29, 32) : Assigns the vector to age .
age2 = c(25, 23, 29, '3o') : Mixing a string ( '3o' ) with numbers causes coercion to character: [1] "25" "23" "29" "3o" .
name = c('ab', 'cd', 'xy', 'po') : Creates a character vector of names.
class(age) : Returns "numeric" .
class(name) : Returns "character" .
class(age2) : Returns "character" due to coercion.
income = c('high', 'high', 'mid', 'high') : Creates a character vector.
income = as.factor(income) : Converts income to a factor with levels "high" , "mid" . class(income) now returns "factor" .
data.frame(name, age, income) : Combines the vectors into a data frame:

  name age income
1   ab  25   high
2   cd  23   high
3   xy  29    mid
4   po  32   high

data.frame(name = c('a', 'b','c'), foodHabit = c('veg', 'veg', 'nv'), sports = c('tt', 'tennis', 'fb')) : Creates a data frame for
friends:

  name foodHabit sports
1    a       veg     tt
2    b       veg tennis
3    c        nv     fb

class(iris) : Returns "data.frame" .

Terminologies

Coercion : Automatic conversion of data types (e.g., numeric to character when mixed).
Factor: A categorical variable with predefined levels (e.g., "high" , "mid" ), used for grouping or analysis.
Data Frame: A 2D table-like structure where columns can be different types.
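
Coercion has a practical consequence: a single bad entry turns a whole column into text. The sketch below reuses the age2 vector from the code above and shows one way to recover the numbers. The name age2_num is illustrative, and suppressWarnings() is used here only to keep the output quiet.

```r
age2 <- c(25, 23, 29, "3o")   # the typo '3o' forces every element to character
class(age2)                   # "character"

# Converting back to numeric turns unparseable entries into NA (with a warning)
age2_num <- suppressWarnings(as.numeric(age2))
age2_num                      # 25 23 29 NA

# Missing values must be handled explicitly in calculations
mean(age2_num)                # NA
mean(age2_num, na.rm = TRUE)  # about 25.67
```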

Key Concepts

Vectors are the foundation of R data structures, and mixing types leads to coercion.
Factors are essential for categorical data, often used in statistical modeling.
Data frames organize multiple vectors into a tabular format, similar to a spreadsheet.
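
Factor levels default to alphabetical order, which may not match the intended ordering of categories. The following sketch, using the income vector from above, shows how the levels can be set explicitly with factor(); the names income_default and income_ordered are illustrative.

```r
income <- c("high", "high", "mid", "high")

# as.factor() sorts the levels alphabetically
income_default <- as.factor(income)
levels(income_default)       # "high" "mid"

# factor() lets you choose the level order yourself
income_ordered <- factor(income, levels = c("mid", "high"))
levels(income_ordered)       # "mid" "high"

# Each element is stored as an integer code pointing at a level
as.integer(income_ordered)   # 2 2 1 2

# table() counts observations per level
table(income_ordered)
```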

Section 5: Exploring Datasets (iris and spamData)

Code

#------------------------
# open iris dataset
iris
help(iris)
# get the first six observations
head(iris)
# get first 10 observations
head(iris, 10)
# get the column or variable names of the dataset
library(tidyverse)
spamData = read_tsv("C:/KIIT/DAR/Dataset/spamD.tsv")
spamData
colnames(spamData)
colnames(iris)
colnames(iris[2])
# check the dimension of the data (number of rows and columns)
dim(iris)
nrow(iris)
head(iris, 20)
# check the last 6 observations
tail(iris)
# check the last 10 observations
tail(iris, 10)
# get the second observation of iris dataset
iris[2, ]
# get the fifth column data or species data
iris[, 5]
# get the sepal.Length column data
iris[, 1]
iris[, "Sepal.Length"]
# get the petal length and petal width data from iris
iris[, c(3,4)]
iris[, 3:4]
# get the species and sepal length of the 10th observation
iris[10, c('Sepal.Length', 'Species')]
iris[10, ]
# get the species information for 1st, 51st , and 150th observation
iris[c(1,51,150), 'Species']
# get the species and sepal length for 10th to 20th observation
iris[10:20, c(1,5)]
# get the sepal length data of iris
iris$Sepal.Length

Explanation

iris : Displays the iris dataset (150 rows, 5 columns).
help(iris) : Opens documentation for iris , detailing its structure and source.
head(iris) : Shows the first 6 rows:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
...

head(iris, 10) : Shows the first 10 rows.
library(tidyverse) : Loads the core tidyverse packages, including readr , which provides read_tsv() for reading tab-separated files.
spamData = read_tsv("C:/KIIT/DAR/Dataset/spamD.tsv") : Attempts to read a tab-separated values (TSV) file into spamData . (Note: This requires the
file to exist at the specified path; otherwise, it will error.)
spamData : Displays the loaded dataset (if successful).
colnames(spamData) : Lists column names of spamData (assuming it loaded).
colnames(iris) : Returns [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species" .
colnames(iris[2]) : Returns "Sepal.Width" , the name of the second column.
dim(iris) : Returns [1] 150 5 (150 rows, 5 columns).
nrow(iris) : Returns 150 (number of rows).
head(iris, 20) : Shows the first 20 rows.
tail(iris) : Shows the last 6 rows.
tail(iris, 10) : Shows the last 10 rows.
iris[2, ] : Extracts the 2nd row as a data frame:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
2          4.9           3          1.4         0.2  setosa

iris[, 5] : Extracts the 5th column ( Species ), a factor vector.
iris[, 1] and iris[, "Sepal.Length"] : Extracts the Sepal.Length column, a numeric vector.
iris[, c(3,4)] and iris[, 3:4] : Extracts columns 3 and 4 ( Petal.Length , Petal.Width ).
iris[10, c('Sepal.Length', 'Species')] : Extracts Sepal.Length and Species for the 10th row:

   Sepal.Length Species
10          4.9  setosa

iris[10, ] : Extracts the entire 10th row.
iris[c(1,51,150), 'Species'] : Extracts Species for rows 1, 51, and 150: [1] setosa versicolor virginica .
iris[10:20, c(1,5)] : Extracts Sepal.Length and Species for rows 10 to 20.
iris$Sepal.Length : Extracts Sepal.Length using the $ operator.
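
One detail worth seeing in action is that the form of the index decides whether you get a vector or a data frame back. The sketch below is illustrative: the drop = FALSE argument and the logical filter are standard base R features that secA.R does not use, and the 7.5 threshold is an arbitrary choice.

```r
# A single column selected with [rows, cols] is simplified to a vector
col_vec <- iris[, 1]
is.numeric(col_vec)        # TRUE

# drop = FALSE, or single-bracket list-style indexing, keeps a data frame
col_df  <- iris[, 1, drop = FALSE]
col_df2 <- iris[1]
is.data.frame(col_df)      # TRUE
is.data.frame(col_df2)     # TRUE

# Rows can also be chosen with a logical condition instead of positions
long_sepals <- iris[iris$Sepal.Length > 7.5, c("Sepal.Length", "Species")]
long_sepals
```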

Terminologies

Tidyverse: A collection of R packages (e.g., dplyr , ggplot2 ) for data manipulation and visualization.
TSV: Tab-separated values file, a text format for tabular data.
Dimension: The size of a data frame (rows × columns).
Indexing: Using [rows, columns] to subset data (e.g., iris[2, ] for row 2).

Key Concepts

Datasets like iris are explored using functions like head() , tail() , and colnames() .
Indexing ( [rows, columns] ) and the $ operator are key for subsetting data.
External data can be loaded using functions like read_tsv() from the tidyverse.
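
Because the spamD.tsv path above exists only on the author's machine, here is a self-contained sketch of loading a TSV file. It swaps in base R's read.delim() for read_tsv() so that no package is required, and it writes a tiny made-up file to a temporary location; the file contents and the names tsv_path and demo_data are assumptions for illustration.

```r
# Write a tiny tab-separated file to a temporary location (made-up data)
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("id\tlabel", "1\tspam", "2\tnon-spam"), tsv_path)

# Checking that the file exists avoids the "file not found" error
if (file.exists(tsv_path)) {
  demo_data <- read.delim(tsv_path, stringsAsFactors = FALSE)
}

colnames(demo_data)   # "id" "label"
dim(demo_data)        # 2 2
```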

Section 6: Data Structure and Visualization

Code

# explore the visual analytics
#------------------------------------
# check the structure of the iris data frame
str(iris)
# glimpse() comes from dplyr, loaded with the tidyverse above
glimpse(iris)
# check the range of iris sepal length
range(iris$Sepal.Length)
# average petal length and petal width
mean(iris$Petal.Length)
mean(iris$Petal.Width)
summary(iris)
summary(mtcars)
# how sepal length is distributed w.r.t. petal length
plot(iris$Sepal.Length ~ iris$Petal.Length)
# check the distribution of petal length
plot(iris$Petal.Length)
plot(iris$Petal.Length ~ iris$Species)
plot(iris$Petal.Length, col = ifelse(iris$Species == "setosa", "darkgreen", "red"))
# check the distribution of sepal length
hist(iris$Sepal.Length)
# check the distribution of sepal length across the different species
boxplot(iris$Sepal.Length ~ iris$Species)

Explanation

str(iris) : Displays the structure of iris :

'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 ...

glimpse(iris) : A tidyverse alternative to str() , showing a compact view of the dataset.
range(iris$Sepal.Length) : Returns [1] 4.3 7.9 , the minimum and maximum values of Sepal.Length .
mean(iris$Petal.Length) : Returns ~3.76, the average Petal.Length .
mean(iris$Petal.Width) : Returns ~1.20, the average Petal.Width .
summary(iris) : Provides descriptive statistics for each numeric column (min, 1st quartile, median, mean, 3rd quartile, max) and a count per level for the factor column Species (50 of each species).
summary(mtcars) : Similar summary for the mtcars dataset.
plot(iris$Sepal.Length ~ iris$Petal.Length) : Creates a scatterplot with Petal.Length on the x-axis and Sepal.Length on the y-axis. The ~
denotes a formula (y ~ x).
plot(iris$Petal.Length) : Plots Petal.Length values against their row index as a scatterplot of points.
plot(iris$Petal.Length ~ iris$Species) : Because Species is a factor, plot() draws one boxplot of Petal.Length per species, showing the distribution across species.
plot(iris$Petal.Length, col = ifelse(iris$Species == "setosa", "darkgreen", "red")) : Plots Petal.Length with points colored based on
Species (setosa in dark green, others in red).
hist(iris$Sepal.Length) : Histogram of Sepal.Length , showing its distribution.
boxplot(iris$Sepal.Length ~ iris$Species) : Boxplot comparing Sepal.Length across the three species, showing medians, quartiles, and outliers.
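
The numbers behind the boxplot can also be inspected directly. The sketch below is an illustrative addition: tapply() and the plot = FALSE argument of boxplot() are base R features not used in secA.R, and the names species_means and box_stats are hypothetical.

```r
# Mean sepal length per species
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)
species_means   # setosa 5.006, versicolor 5.936, virginica 6.588

# plot = FALSE returns the boxplot statistics without drawing anything
box_stats <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)
box_stats$names       # group labels, one per species
box_stats$stats       # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
box_stats$stats[3, ]  # the median of each species
```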

Terminologies

Structure: Detailed information about a dataset’s variables and types.
Range : The minimum and maximum values of a variable.
Mean: The average value of a numeric variable.
Summary: A statistical overview of a dataset (min, quartiles, mean, max).
Scatterplot: A plot of two variables with points showing their relationship.
Boxplot : A plot showing median, quartiles, and outliers for a variable across groups.

Key Concepts

str() and glimpse() help understand a dataset’s structure.
Statistical functions like mean() , range() , and summary() provide quick insights.
Base R plotting ( plot() , hist() , boxplot() ) is a simple way to visualize data distributions and relationships.
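
As a variation on the base R plots in this section, the sketch below uses the data argument of the formula interface so that column names need no iris$ prefix, and it extends the two-color ifelse() idea to one color per species. The color choices, the axis labels, and the names species_cols, point_cols, and r are illustrative assumptions.

```r
# One color per species, picked by the factor's integer codes (1, 2, 3)
species_cols <- c("darkgreen", "red", "blue")
point_cols <- species_cols[as.integer(iris$Species)]

# The data argument lets the formula refer to columns by name
plot(Sepal.Length ~ Petal.Length, data = iris, col = point_cols,
     xlab = "Petal length (cm)", ylab = "Sepal length (cm)")
legend("topleft", legend = levels(iris$Species), col = species_cols, pch = 1)

# A correlation coefficient summarizes the relationship seen in the plot
r <- cor(iris$Petal.Length, iris$Sepal.Length)
r
```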

Section 7: Advanced Visualization with ggplot2

Code

#install.packages("ggplot2")
library(ggplot2)
# check the relationship between the sepal length and petal length of iris
ggplot(data = iris, mapping = aes(x= Petal.Length, y= Sepal.Length)) +
  geom_point()
# can we see the different species on the plot?
# can we segregate the points w.r.t. the species?
ggplot(data = iris, mapping = aes(x= Petal.Length, y= Sepal.Length)) +
  geom_point(aes(color = Species))
# check the distribution of sepal width
ggplot(data = iris, aes(x = Sepal.Width)) +
  geom_histogram(bins = 10, color = "tomato", fill = 'darkblue')
# do a density plot to check the distribution of petal length
ggplot(data = iris, aes(x= Petal.Length))+
  geom_density(aes(color = Species))

Explanation

library(ggplot2) : Loads the ggplot2 package, which provides a powerful system for creating layered, customizable plots.
ggplot(data = iris, mapping = aes(x= Petal.Length, y= Sepal.Length)) + geom_point() : Creates a scatterplot with Petal.Length on the x-axis
and Sepal.Length on the y-axis. ggplot() initializes the plot, aes() defines aesthetics, and geom_point() adds points.
ggplot(...) + geom_point(aes(color = Species)) : Enhances the scatterplot by coloring points based on Species (setosa, versicolor, virginica),
making it easier to see species-specific patterns.
ggplot(data = iris, aes(x = Sepal.Width)) + geom_histogram(bins = 10, color = "tomato", fill = 'darkblue') : Plots a histogram of
Sepal.Width with 10 bins. color sets the border color to "tomato" (a shade of red), and fill sets the bar fill to "darkblue".
ggplot(data = iris, aes(x= Petal.Length)) + geom_density(aes(color = Species)) : Creates a density plot of Petal.Length , with separate
curves for each species, colored differently.

Terminologies

ggplot2 : A powerful R package for creating layered, customizable plots.
Aesthetic (aes): Mappings like x , y , and color that define how variables are displayed in a plot.
Geom: Geometric objects (e.g., geom_point for points, geom_histogram for bars, geom_density for density curves).
Density Plot: A smoothed curve showing the distribution of a numeric variable, often used to compare groups.

Key Concepts

ggplot2 uses a grammar of graphics, where plots are built by layering components ( ggplot() , aes() , geom_ ).
Aesthetics like color allow for visual differentiation of groups (e.g., species).
ggplot2 histograms and density plots expose options such as bin count, border and fill colors, and per-group curves directly as arguments and aesthetics, which makes them easy to customize.
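
The layering idea can be seen by storing a plot in a variable and adding to it step by step. This is a minimal sketch, not part of secA.R: the title and axis labels are made up, the names p and p_labeled are illustrative, and the requireNamespace() guard is only there so the snippet does nothing when ggplot2 is not installed.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Start with the data, the aesthetic mapping, and one geom
  p <- ggplot(data = iris, mapping = aes(x = Petal.Length, y = Sepal.Length)) +
    geom_point(aes(color = Species))

  # Further layers are added to the stored plot with +
  p_labeled <- p +
    labs(title = "Sepal length vs. petal length",
         x = "Petal length (cm)", y = "Sepal length (cm)") +
    theme_minimal()

  # Typing p_labeled (or calling print(p_labeled)) draws the plot
}
```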

Section 8: Additional Dataset Exploration

Code

#-------------
mtcars
?mtcars
as.factor(cyl)

Explanation

mtcars : Loads and displays the built-in mtcars dataset, which contains data on 32 cars across 11 variables (e.g., mpg , cyl , hp ).
?mtcars : Opens the help documentation for mtcars , explaining its variables and source.
as.factor(cyl) : Attempts to convert cyl (number of cylinders) to a factor, but fails with an "object 'cyl' not found" error because cyl is a column of mtcars , not a standalone variable. Correct usage would be mtcars$cyl <- as.factor(mtcars$cyl) .

Key Concepts

The mtcars dataset is another standard dataset for learning R.
Converting numeric variables like cyl to factors is common when treating them as categories (e.g., 4, 6, 8 cylinders).
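
A short sketch of the conversion, working on a copy so the built-in mtcars stays unchanged. The name cars is illustrative.

```r
cars <- mtcars                    # copy, so the original dataset is untouched
class(cars$cyl)                   # "numeric"

cars$cyl <- as.factor(cars$cyl)   # refer to the column through the data frame
class(cars$cyl)                   # "factor"
levels(cars$cyl)                  # "4" "6" "8"

# Number of cars in each cylinder group
table(cars$cyl)                   # 4: 11, 6: 7, 8: 14
```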

Summary of Key Skills Learned

Basic Syntax: Creating sequences, assigning variables, and understanding data types (numeric, character, logical).
Date Handling : Converting strings to dates ( as.Date() ) and extracting components ( format() ).
Vectors and Data Frames: Building vectors ( c() ), handling coercion, converting to factors ( as.factor() ), and creating data frames ( data.frame() ).
Dataset Exploration: Loading datasets ( iris , mtcars ), viewing data ( head() , tail() ), checking structure ( str() , glimpse() ), and subsetting ( [] , $ ).
Statistical Analysis: Computing statistics ( mean() , range() , summary() ).
Visualization: Using base R ( plot() , hist() , boxplot() ) and ggplot2 ( geom_point() , geom_histogram() , geom_density() ) to create plots.
Categorical Data : Working with factors and visualizing group differences.

Practical Applications

Data Analysis : Use these skills to load, clean, and summarize datasets in real-world projects (e.g., analyzing sales data).
Visualization: Create plots to communicate insights effectively (e.g., comparing species in iris ).
Preprocessing: Prepare data by converting types (e.g., factors) and subsetting for modeling or reporting.

This guide walks through every line of secA.R to build an understanding of R programming fundamentals. You can copy this markdown content into a
tool like RStudio or a Markdown-to-PDF converter (e.g., Pandoc) to create a PDF for study.
