R Programming Easy
This document provides a detailed breakdown of the secA.R file, which introduces foundational R programming concepts. It includes explanations, terminologies,
and practical insights for each section, making it a comprehensive reference for learning R.
Code
100
1:22
iris
hist(iris$Sepal.Length)
plot(iris)
a <- 100
a = 100
A
'pratyasha'
TRUE
demo(graphics)
c(2.4)
Explanation
100 : Outputs the number 100 to the console. This demonstrates R's interactive nature where expressions are evaluated immediately.
1:22 : Creates a sequence of integers from 1 to 22. Output: [1] 1 2 3 ... 21 22 . The : operator generates a vector of consecutive numbers.
iris : Loads and displays the built-in iris dataset, which contains measurements of 150 iris flowers across 5 columns: Sepal.Length , Sepal.Width ,
Petal.Length , Petal.Width , and Species .
hist(iris$Sepal.Length) : Plots a histogram of the Sepal.Length column, showing how its values are distributed across bins. The $ operator extracts a
column from a data frame.
plot(iris) : Creates a scatterplot matrix showing pairwise relationships between all five columns of iris (e.g., Sepal.Length vs. Petal.Length ); the Species factor is plotted using its integer codes.
a <- 100 and a = 100 : Assigns 100 to variable a . Both <- and = are assignment operators in R, though <- is preferred for clarity. Typing a outputs
100.
A : Attempts to reference variable A , which isn’t defined. R is case-sensitive, so A is not the same as a , and the result is an error ( object 'A' not found ).
'pratyasha' : A character string. Output: [1] "pratyasha" .
TRUE : A logical (boolean) value. Output: [1] TRUE .
demo(graphics) : Runs a demonstration of R’s graphics capabilities, showcasing various plotting functions like histograms, scatterplots, and more.
c(2.4) : Uses the c() function to create a numeric vector with a single element, [1] 2.4 .
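The points above about assignment, case sensitivity, and vector creation can be checked in a short session. This is a minimal sketch; the names lookup , seq_22 , and single are illustrative and do not appear in secA.R.

```r
# Both assignment operators bind the value 100 to the name a
a <- 100
a = 100
print(a)                         # [1] 100

# R is case-sensitive: a exists, but A was never defined.
# tryCatch() captures the error message instead of stopping the script.
lookup <- tryCatch(A, error = function(e) conditionMessage(e))
print(lookup)

# The : operator builds a vector of consecutive integers
seq_22 <- 1:22
length(seq_22)                   # 22
class(seq_22)                    # "integer"

# c() combines values into a vector, here with a single element
single <- c(2.4)
length(single)                   # 1

# $ extracts one column of a data frame as a plain vector
length(iris$Sepal.Length)        # 150
```

Wrapping the failing lookup in tryCatch() is a choice made for this sketch so the remaining lines still run; typing A directly at the console simply reports the error.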
Terminologies
Vector: A one-dimensional array of elements of the same type (e.g., 1:22 is an integer vector, and c(2.4) is a numeric vector of length one).
Data Frame: A table-like structure where columns can be of different types (e.g., iris is a data frame with numeric and factor columns).
Histogram: A graphical representation of a numeric variable’s distribution using bars, where bar height indicates frequency.
Assignment Operator: <- or = , used to assign values to variables.
Logical Value: TRUE or FALSE , used for boolean operations.
Scatterplot Matrix: A grid of scatterplots showing pairwise relationships between variables.
Key Concepts
Code
#--------------------------------
23
# we want to check what the type of this 23 is
class(25)
class(25.9)
class('samikshya')
F
T
class(F)
Explanation
Terminologies
Key Concepts
The class() function identifies an object’s type, which is crucial for data manipulation.
R supports several data types, and understanding them ensures proper operations (e.g., you can’t perform arithmetic on character data).
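As a minimal sketch of the two points above, the following calls show what class() reports for the values used in this section and what happens when arithmetic is attempted on character data. The 25L literal, the variable names, and the as.numeric() conversion are additions for illustration and are not part of secA.R.

```r
# class() reports the type of each value
types <- c(
  class(25),            # "numeric": numbers are doubles by default
  class(25L),           # "integer": the L suffix requests an integer
  class(25.9),          # "numeric"
  class('samikshya'),   # "character"
  class(F)              # "logical": F and T are shorthand for FALSE and TRUE
)
print(types)

# Arithmetic on character data fails; capture the error message
msg <- tryCatch("25" + 1, error = function(e) conditionMessage(e))
print(msg)

# Converting the string to a number first makes the arithmetic valid
converted <- as.numeric("25") + 1
print(converted)        # [1] 26
```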
Code
Explanation
Terminologies
Date Object: A special R class for storing dates, enabling arithmetic (e.g., date differences) and component extraction.
Format Specifiers: Symbols like %d (day), %m (month), %Y (year) used with format() to extract date parts.
Key Concepts
Dates start as character strings but must be converted to Date objects using as.Date() for proper handling.
The format() function extracts specific components, which is useful for time-based analysis (e.g., grouping sales by month).
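The code for this section is missing from the document, so the following is only a sketch of the conversion and extraction steps described above. The date values and variable names are hypothetical and chosen for illustration.

```r
# A date usually arrives as a character string
raw <- "2024-03-15"
class(raw)                        # "character"

# as.Date() parses it; year-month-day is the default format
d <- as.Date(raw)
class(d)                          # "Date"

# format() extracts components as character strings
day   <- format(d, "%d")          # "15"
month <- format(d, "%m")          # "03"
year  <- format(d, "%Y")          # "2024"

# A non-default layout needs an explicit format string
later <- as.Date("01/04/2024", format = "%d/%m/%Y")

# Date objects support arithmetic, such as differences in days
gap <- as.numeric(later - d)
print(gap)                        # [1] 17
```

Because format() returns character strings, the extracted month can be used directly as a grouping key, for example when summarizing records by month.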
Code
Explanation
data.frame(name = c('a', 'b','c'), foodHabit = c('veg', 'veg', 'nv'), sports = c('tt', 'tennis', 'fb')) : Creates a data frame describing three
friends, with one row per friend and the columns name , foodHabit , and sports .
Terminologies
Coercion: Automatic conversion of data types (e.g., numeric to character when mixed).
Factor: A categorical variable with predefined levels (e.g., "high" , "mid" ), used for grouping or analysis.
Data Frame: A 2D table-like structure where columns can be different types.
Key Concepts
Vectors are the foundation of R data structures, and mixing types leads to coercion.
Factors are essential for categorical data, often used in statistical modeling.
Data frames organize multiple vectors into a tabular format, similar to a spreadsheet.
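The three ideas above (coercion, factors, and data frames) can be sketched together. The data.frame() call is the one quoted in the Explanation; the mixed and priority vectors are hypothetical examples, since the original code for this section is not shown.

```r
# Mixing types in c() coerces every element to the most flexible type
mixed <- c(1, "a", TRUE)
class(mixed)                      # "character"
print(mixed)                      # [1] "1"    "a"    "TRUE"

# A factor stores categories together with a fixed set of levels
priority <- factor(c("high", "mid", "high"), levels = c("mid", "high"))
levels(priority)                  # "mid" "high"
table(priority)                   # counts per level

# A data frame binds equal-length vectors into columns
friends <- data.frame(
  name      = c('a', 'b', 'c'),
  foodHabit = c('veg', 'veg', 'nv'),
  sports    = c('tt', 'tennis', 'fb')
)
dim(friends)                      # 3 3

# A character column can be turned into a factor for grouping
friends$foodHabit <- as.factor(friends$foodHabit)
levels(friends$foodHabit)         # "nv" "veg"
```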
Code
#------------------------
# open iris dataset
iris
help(iris)
# get the first six observations
head(iris)
# get first 10 observations
head(iris, 10)
# get the column or variable names of the dataset
library(tidyverse)
spamData = read_tsv("C:/KIIT/DAR/Dataset/spamD.tsv")
spamData
colnames(spamData)
colnames(iris)
colnames(iris[2])
# check the dimensions of the data (number of rows and columns)
dim(iris)
nrow(iris)
head(iris, 20)
# check the last 6 observations
tail(iris)
# check the last 10 observations
tail(iris, 10)
# get the second observation of iris dataset
iris[2, ]
# get the fifth column data or species data
iris[, 5]
# get the sepal.Length column data
iris[, 1]
iris[, "Sepal.Length"]
# get the petal length and petal width data from iris
iris[, c(3,4)]
iris[, 3:4]
# get the species and sepal length of the 10th observation
iris[10, c('Sepal.Length', 'Species')]
iris[10, ]
# get the species information for 1st, 51st , and 150th observation
iris[c(1,51,150), 'Species']
# get the species and sepal length for 10th to 20th observation
iris[10:20, c(1,5)]
# get the sepal length data of iris
iris$Sepal.Length
Explanation
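iris[10, c('Sepal.Length', 'Species')] : Returns the Sepal.Length and Species values of the 10th observation:
   Sepal.Length Species
10          4.9  setosa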
Terminologies
Tidyverse: A collection of R packages (e.g., dplyr , ggplot2 ) for data manipulation and visualization.
TSV: Tab-separated values file, a text format for tabular data.
Dimension: The size of a data frame (rows × columns).
Indexing: Using [rows, columns] to subset data (e.g., iris[2, ] for row 2).
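To make the [rows, columns] notation concrete, here is a minimal sketch that stores the results of several indexing calls from this section and inspects them. The variable names are illustrative only.

```r
# Rows come before the comma, columns after it
row2    <- iris[2, ]                        # one row, all five columns
species <- iris[, 5]                        # fifth column, returned as a factor
petal   <- iris[, 3:4]                      # two columns, returned as a data frame
colnames(petal)                             # "Petal.Length" "Petal.Width"

# Columns can be selected by position, by name, or with $
same <- identical(iris[, 1], iris$Sepal.Length)   # TRUE
same_by_name <- identical(iris[, "Sepal.Length"], iris$Sepal.Length)

# Rows and columns can be combined in a single call
obs10 <- iris[10, c('Sepal.Length', 'Species')]
picks <- iris[c(1, 51, 150), 'Species']     # setosa, versicolor, virginica
block <- iris[10:20, c(1, 5)]               # 11 rows, 2 columns
dim(block)

# A single index without a comma selects columns and keeps the data frame
colnames(iris[2])                           # "Sepal.Width"
```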
Key Concepts
Datasets like iris are explored using functions like head() , tail() , and colnames() .
Indexing ( [rows, columns] ) and the $ operator are key for subsetting data.
External data can be loaded using functions like read_tsv() from the tidyverse.
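The read_tsv() call in secA.R depends on a local file path that is not available here. As a stand-in, the sketch below writes a few rows of iris to a temporary TSV file and reads them back with base R's read.delim() , so it runs without the tidyverse. The file and variable names are assumptions made for illustration; if readr is installed, readr::read_tsv(tsv_path) could be used in place of read.delim(tsv_path) .

```r
# Write a small tab-separated file to a temporary location
tsv_path <- tempfile(fileext = ".tsv")
write.table(head(iris, 5), tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.delim() expects tab-separated text with a header row by default
loaded <- read.delim(tsv_path)

# Explore the loaded data the same way as a built-in dataset
dim(loaded)                       # 5 5
colnames(loaded)
head(loaded, 2)

# Remove the temporary file
unlink(tsv_path)
```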
Code
#explore the visual analytics
#------------------------------------
# check the structure of the iris data frame
str(iris)
glimpse(iris)
# check the range of iris sepal length
range(iris$Sepal.Length)
# average of petal length and petal width
mean(iris$Petal.Length)
mean(iris$Petal.Width)
summary(iris)
summary(mtcars)
# how the sepal lengths are distributed w.r.t. the petal lengths
plot(iris$Sepal.Length ~ iris$Petal.Length)
# check the distribution of petal length
plot(iris$Petal.Length)
plot(iris$Petal.Length ~ iris$Species)
plot(iris$Petal.Length, col = ifelse(iris$Species == "setosa", "darkgreen", "red"))
# check the distribution of sepal length
hist(iris$Sepal.Length)
# check the distribution of sepal length across the different species
boxplot(iris$Sepal.Length ~ iris$Species)
Explanation
Terminologies
Key Concepts
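The explanation for this section is missing, so the following sketch only shows what the numeric summary calls in the code return for iris . The tapply() call and the plot = FALSE argument are additions for illustration: they give the numbers behind the boxplot without opening a graphics device.

```r
# Smallest and largest sepal length
rng <- range(iris$Sepal.Length)           # 4.3 7.9

# Column averages
mean_pl <- mean(iris$Petal.Length)        # 3.758
mean_pw <- mean(iris$Petal.Width)         # about 1.199

# Mean sepal length per species, the grouping the boxplot visualizes
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
print(by_species)                         # setosa 5.006, versicolor 5.936, virginica 6.588

# boxplot() can return its five-number summaries without drawing
stats <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)
stats$names                               # "setosa" "versicolor" "virginica"
stats$stats                               # one column of summaries per species
```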
Code
#install.packages("ggplot2")
library(ggplot2)
# check the distribution between the sepal length and petal length of iris
ggplot(data = iris, mapping = aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point()
# can we see the distribution of the different species on the plot?
# can we segregate the points w.r.t. the species?
ggplot(data = iris, mapping = aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point(aes(color = Species))
# check the distribution of sepal width
ggplot(data = iris, aes(x = Sepal.Width)) +
  geom_histogram(bins = 10, color = "tomato", fill = 'darkblue')
# do a density plot to check the distribution of petal length
ggplot(data = iris, aes(x = Petal.Length)) +
  geom_density(aes(color = Species))
Explanation
library(ggplot2) : Loads the ggplot2 package, which provides a powerful system for creating layered, customizable plots.
ggplot(data = iris, mapping = aes(x= Petal.Length, y= Sepal.Length)) + geom_point() : Creates a scatterplot with Petal.Length on the x-
axis and Sepal.Length on the y-axis. ggplot() initializes the plot, aes() defines aesthetics, and geom_point() adds points.
ggplot(...) + geom_point(aes(color = Species)) : Enhances the scatterplot by coloring points based on Species (setosa, versicolor, virginica),
making it easier to see species-specific patterns.
ggplot(data = iris, aes(x = Sepal.Width)) + geom_histogram(bins = 10, color = "tomato", fill = 'darkblue') : Plots a histogram of
Sepal.Width with 10 bins. color sets the border color to "tomato" (a shade of red), and fill sets the bar fill to "darkblue".
ggplot(data = iris, aes(x= Petal.Length)) + geom_density(aes(color = Species)) : Creates a density plot of Petal.Length , with separate
curves for each species, colored differently.
Terminologies
Key Concepts
ggplot2 uses a grammar of graphics, where plots are built by layering components: ggplot() sets the data, aes() maps variables to aesthetics, and geom_*() functions draw the layers.
Aesthetics like color allow for visual differentiation of groups (e.g., species).
Histograms and density plots built with ggplot2 visualize distributions with more control over bins, colors, and grouping than the base R plotting calls used earlier.
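One way to see the layering idea is to build a plot in steps and store each stage in a variable. This is a minimal sketch that assumes the ggplot2 package is installed; the variable names and the labs() text are illustrative and not taken from secA.R.

```r
library(ggplot2)

# Step 1: data and aesthetic mappings only; nothing is drawn yet
base_plot <- ggplot(data = iris, mapping = aes(x = Petal.Length, y = Sepal.Length))

# Step 2: add a point layer, with color mapped to Species
scatter <- base_plot + geom_point(aes(color = Species))

# Step 3: add labels as a further component
labelled <- scatter +
  labs(title = "Sepal length vs. petal length",
       x = "Petal length (cm)",
       y = "Sepal length (cm)")

# Printing the object renders the plot
print(labelled)
```

Keeping the intermediate objects makes it easy to reuse base_plot with a different geom, which is the practical benefit of the layered design.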
Code
#-------------
mtcars
?mtcars
as.factor(cyl)
Explanation
mtcars : Loads and displays the built-in mtcars dataset, which contains data on 32 cars across 11 variables (e.g., mpg , cyl , hp ).
?mtcars : Opens the help documentation for mtcars , explaining its variables and source.
as.factor(cyl) : Attempts to convert cyl (number of cylinders) to a factor, but cyl is a column of mtcars , not a standalone object, so the call fails with an
error ( object 'cyl' not found ). Correct usage would be mtcars$cyl <- as.factor(mtcars$cyl) .
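The corrected conversion can be sketched as follows. Working on a copy named cars is a choice made for this example so that the built-in mtcars stays unchanged; the name is not part of secA.R.

```r
# Work on a copy so the built-in dataset is left untouched
cars <- mtcars
class(cars$cyl)                   # "numeric"

# Refer to the column through the data frame, then convert it
cars$cyl <- as.factor(cars$cyl)
class(cars$cyl)                   # "factor"
levels(cars$cyl)                  # "4" "6" "8"

# Count the cars in each cylinder group
cyl_counts <- table(cars$cyl)
print(cyl_counts)                 # 4: 11, 6: 7, 8: 14
```

Once cyl is a factor, functions such as table() , boxplot() , and modeling functions treat it as a grouping variable rather than a continuous number.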
Key Concepts
Practical Applications
Data Analysis : Use these skills to load, clean, and summarize datasets in real-world projects (e.g., analyzing sales data).
Visualization: Create plots to communicate insights effectively (e.g., comparing species in iris ).
Preprocessing: Prepare data by converting types (e.g., factors) and subsetting for modeling or reporting.
This guide covers every line of secA.R , ensuring you have a thorough understanding of R programming fundamentals. You can copy this markdown content into a
tool like RStudio or a markdown-to-PDF converter (e.g., Pandoc) to create a PDF for study.