Basics of R
Dr.V.Sree Ramani
C.B.I.T.
What is R?
R is a popular programming language used for statistical computing and graphical
presentation. Its most common use is to analyze and visualize data. R was created
by statistician Ross Ihaka and statistician and bioinformatician Robert Gentleman at
the University of Auckland in 1992, based on the programming language S. The
first official stable version (1.0) was released in 2000.
Why Use R?
•It is a great resource for data analysis, data visualization, data science and machine
learning
•It provides many statistical techniques (such as statistical tests, classification, clustering
and data reduction)
•It is easy to draw graphs in R, such as pie charts, histograms, box plots, scatter plots, etc.
•It works on different platforms (Windows, Mac, Linux)
•It is open-source and free
•It has a large community support
•It has many packages (libraries of functions) that can be used to solve different problems
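The chart types mentioned above are all available in base R graphics. The following is a minimal sketch that draws a histogram, a box plot, a scatter plot, and a pie chart into a temporary PDF file; the random data, the pie values, and the variable names are illustrative assumptions, not part of the original notes.

```r
# Draw four base-R plot types into a temporary PDF file
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)               # open a PDF graphics device
par(mfrow = c(2, 2))        # arrange plots in a 2 x 2 grid

set.seed(1)                 # make the random data reproducible
x <- rnorm(100)             # 100 standard normal values
y <- 2 * x + rnorm(100)     # a noisy linear relationship with x

hist(x, main = "Histogram")
boxplot(x, main = "Box plot")
plot(x, y, main = "Scatter plot")
pie(c(40, 35, 25), labels = c("A", "B", "C"), main = "Pie chart")

dev.off()                   # close the device, which writes the file
```

Writing to a PDF device keeps the sketch runnable without a screen; at an interactive console the same plotting calls draw directly in the plot window.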
Features of R Programming
• Basic statistics: The most common basic statistics are the mean, median, and
mode, which are known as "measures of central tendency." R computes the mean
and median with the built-in mean() and median() functions, and the mode takes
only a few lines of code.
• Static graphics: R is rich with facilities for creating static graphics. It
supports many plot types, including graphic maps, mosaic plots, biplots, and
many more.
• Probability distributions: Probability distributions play a vital role in statistics,
and R makes it easy to work with many of them, such as the binomial, normal,
and chi-squared distributions.
• Data analysis: It provides a large, coherent and integrated collection of tools
for data analysis.
• R packages: One of the major features of R is the wide availability of
packages (libraries). CRAN (Comprehensive R Archive Network) is a repository
holding more than 10,000 packages.
• Distributed computing: R supports distributed computing, a model in which the
components of a software system are shared among multiple computers to
improve efficiency and performance.
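Three of the features above (basic statistics, probability distributions, and distributed computing) can be tried directly at the console. This is a minimal sketch under a few assumptions: the scores vector is made-up data; stat_mode is a hypothetical helper (base R's mode() returns the storage mode of an object, not the statistical mode); and the last part assumes the parallel package that ships with R and a machine where two local worker processes can be started.

```r
# Measures of central tendency
scores <- c(4, 8, 8, 5, 10, 8, 3)
mean(scores)    # 6.571429
median(scores)  # 8

# stat_mode is a hypothetical helper, not part of base R
stat_mode <- function(v) {
  counts <- table(v)                            # frequency of each value
  as.numeric(names(counts)[which.max(counts)])  # first most frequent value
}
stat_mode(scores)  # 8

# Probability distributions: d = probability/density, p = cumulative, q = quantile
dbinom(3, size = 10, prob = 0.5)  # P(X = 3) for Binomial(10, 0.5): 0.1171875
pnorm(1.96)                       # P(Z <= 1.96) for a standard normal: about 0.975
qchisq(0.95, df = 2)              # 95th percentile of chi-squared with 2 df: about 5.99

# Distributed computing with the parallel package
library(parallel)
cl <- makeCluster(2)                            # start two local worker processes
squares <- parLapply(cl, 1:4, function(i) i^2)  # split the work across the workers
stopCluster(cl)                                 # release the workers
unlist(squares)  # 1 4 9 16
```

Here the cluster runs on one machine; makeCluster() can also be given host names so that the workers run on other computers, which is the distributed case described above.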
General arithmetic operations in R
Example1
"Hello World!"
Example2
5
10
25
Example3
5+5
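Example 3 uses the + operator. The sketch below shows R's other arithmetic operators on two numbers; the variable names a and b are illustrative.

```r
# Basic arithmetic operators in R
a <- 17
b <- 5
a + b    # 22       addition
a - b    # 12       subtraction
a * b    # 85       multiplication
a / b    # 3.4      division
a ^ b    # 1419857  exponentiation
a %% b   # 2        remainder (modulus)
a %/% b  # 3        integer division
```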
Basic Data Types
•integer - 1L, 55L, 100L (the suffix "L" declares the value as an integer)
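Integer is one of R's common basic (atomic) types, alongside numeric, character, logical, and complex. The following minimal sketch creates one value of each type and checks it with class(); the variable names are illustrative.

```r
# One value of each common basic type
x_int <- 55L        # integer: the L suffix makes it an integer
x_num <- 5.5        # numeric (stored as a double)
x_chr <- "hello"    # character
x_lgl <- TRUE       # logical
x_cpx <- 12 + 3i    # complex

class(x_int)  # "integer"
class(x_num)  # "numeric"
class(x_chr)  # "character"
class(x_lgl)  # "logical"
class(x_cpx)  # "complex"
class(55)     # "numeric": without L, whole numbers are stored as doubles
```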
2. List: Lists are heterogeneous data structures. They are very similar to
vectors except they can store data of different types. To create a list, we use
the list() function.
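A minimal sketch of creating a list and accessing its components; the element names and values are made up for illustration.

```r
# A list can hold elements of different types, optionally named
student <- list(name = "raj", id = 1L, marks = c(78, 85), passed = TRUE)

length(student)      # 4
student$name         # "raj": access an element by name with $
student[["marks"]]   # 78 85: double brackets return the element itself
sub <- student[1]    # single brackets return a sub-list
class(sub)           # "list"

student$grade <- "A" # add a new named element
length(student)      # 5
```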
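Note 1: When values of different types are combined in a vector, R coerces them to a single type in the order character > numeric > logical.
Note 2: Functions such as as.integer() and as.logical() convert values explicitly.
Note 3: assign("v2", list(1, 2, 3, "a", 12+3i, TRUE)) creates a list named v2 whose elements keep their own types.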
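The three notes can be checked at the console. This sketch contrasts coercion in c() with a list, which keeps each element's type; the names mixed_chr and mixed_num are illustrative.

```r
# Mixing types in c() forces one common type
mixed_chr <- c(1, "a", TRUE)    # character wins
mixed_chr                       # "1" "a" "TRUE"
mixed_num <- c(1, TRUE, FALSE)  # numeric wins over logical
mixed_num                       # 1 1 0

# Explicit conversion with as.*() functions
as.integer("12")     # 12
as.integer(3.9)      # 3 (the decimal part is dropped)
as.logical("TRUE")   # TRUE
as.logical(0)        # FALSE

# A list keeps each element's own type, so nothing is coerced
assign("v2", list(1, 2, 3, "a", 12 + 3i, TRUE))
sapply(v2, class)    # "numeric" "numeric" "numeric" "character" "complex" "logical"
```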
3. Matrix: Matrices are two-dimensional, homogeneous data structures. This means
that all values in a matrix have to be of the same type; if more than one data type is
supplied, the values are coerced to a single type. Matrices have rows and columns, and
by default they are filled in column-wise order. The basic syntax to create a matrix is:
> matrix(data, nrow, ncol, byrow, dimnames)
where data is the input vector, nrow is the number of rows, ncol is the number of columns,
byrow is a logical value that tells the function to fill the matrix row-wise (FALSE by
default), and
dimnames is a list of the names of the rows and columns.
Example:
> rownames = c("row1", "row2", "row3")
> colnames = c("col1", "col2", "col3")
> test_matrix2 = matrix(c(1:9), ncol = 3, dimnames = list(rownames, colnames))
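A short sketch of the default column-wise filling versus byrow = TRUE, indexing by row and column names, and type coercion inside a matrix. The names m_col, m_row, and m_mixed are illustrative; test_matrix2 is rebuilt with inline names so the block runs on its own.

```r
# Column-wise (default) versus row-wise filling
m_col <- matrix(1:6, nrow = 2)                # columns: (1, 2), (3, 4), (5, 6)
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # rows: (1, 2, 3), (4, 5, 6)
m_col[1, ]   # 1 3 5
m_row[1, ]   # 1 2 3
dim(m_col)   # 2 3

# Named rows and columns
test_matrix2 <- matrix(1:9, ncol = 3,
                       dimnames = list(c("row1", "row2", "row3"),
                                       c("col1", "col2", "col3")))
test_matrix2["row2", "col3"]  # 8

# Mixed input types are coerced to one type (here character)
m_mixed <- matrix(c(1, "a", TRUE, 2), nrow = 2)
typeof(m_mixed)  # "character"
```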
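4. Data Frame: Data frames are two-dimensional, heterogeneous data structures. Each
column holds values of one type, but different columns can hold different types. We
can create a data frame using the data.frame() function.
Example: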
> student_id <- c(1:5)
> student_name <- c("raj", "jacob", "iqbal", "shawn", "hitesh")
> student_rank <- c("third", "fifth", "second", "fourth", "first")
> student.data <- data.frame(student_id, student_name, student_rank)
> student.data
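A sketch of common ways to inspect and index the student.data frame from the example; it rebuilds the same vectors so the block runs on its own.

```r
student_id <- 1:5
student_name <- c("raj", "jacob", "iqbal", "shawn", "hitesh")
student_rank <- c("third", "fifth", "second", "fourth", "first")
student.data <- data.frame(student_id, student_name, student_rank)

dim(student.data)           # 5 3: five rows, three columns
str(student.data)           # column names and types
student.data$student_name   # one column as a vector
student.data[2, ]           # the second row (jacob)

# Select rows by a condition and pick one column
student.data[student.data$student_rank == "first", "student_name"]  # "hitesh"
```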
5. Array: Arrays are homogeneous data structures that can have more than two
dimensions. A three-dimensional array can be pictured as a collection of matrices
stacked one on top of the other in layers. We can create an array using the array()
function. The syntax is:
Array_name = array(data, dim, dimnames)
where data is the data that is filled into the array,
dim is a vector containing the dimensions of the array, and
dimnames is a list containing the names of the rows, columns, and
matrices inside the array.
Example:
> arr1 <- array(c(1:18), dim = c(2, 3, 3))
> arr1
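A minimal sketch of indexing arr1 by row, column, and layer, and of naming all three dimensions with dimnames; the array arr2 and its names are illustrative.

```r
arr1 <- array(1:18, dim = c(2, 3, 3))
dim(arr1)        # 2 3 3
arr1[2, 3, 1]    # 6: row 2, column 3, layer 1
arr1[, , 2]      # the second 2 x 3 matrix, holding 7 to 12

# Names for rows, columns, and matrices (layers)
arr2 <- array(1:12, dim = c(2, 3, 2),
              dimnames = list(c("r1", "r2"), c("c1", "c2", "c3"), c("m1", "m2")))
arr2["r1", "c2", "m2"]  # 9
```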
6. Factors: Factors are vectors that can only store predefined values.
They are useful for storing categorical data. Factors have two attributes:
• Class: its value is "factor", which makes a factor behave differently
from a normal vector.
• Levels: the set of allowed values.
You can create a factor using the factor() function.
For example:
fac <- factor(c("a", "b", "a", "b", "b"))
fac
Note: Factors can be created from both strings and integers. They are useful for
categorizing columns with a small set of unique values, such as "TRUE" or "FALSE", or
"MALE" or "FEMALE".
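The "predefined values" rule can be seen by assigning into a factor. In this sketch, fac is the factor from the example; assigning a value that is not one of its levels produces a warning and stores NA.

```r
fac <- factor(c("a", "b", "a", "b", "b"))
class(fac)     # "factor"
levels(fac)    # "a" "b"

fac[2] <- "a"  # allowed: "a" is an existing level
fac[1] <- "c"  # not a level: R warns "invalid factor level, NA generated"
fac            # NA a a b b, with levels a b
```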
# Creating a vector (input)
x <- c("female", "male", "male", "female")
print(x)
# Converting the vector x into a factor named gender
gender <- factor(x)
print(gender)
Output:
[1] "female" "male" "male" "female"
[1] female male male female
Levels: female male
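Once x is a factor, R stores it as integer codes plus a set of levels. A short sketch of inspecting the gender factor and of supplying levels explicitly; the size factor and its levels are illustrative.

```r
x <- c("female", "male", "male", "female")
gender <- factor(x)
levels(gender)      # "female" "male" (sorted by default)
nlevels(gender)     # 2
as.integer(gender)  # 1 2 2 1: the integer codes stored internally
table(gender)       # female: 2, male: 2

# Levels can be given explicitly, including ones not yet observed
size <- factor(c("low", "high", "low"), levels = c("low", "medium", "high"))
table(size)         # low: 2, medium: 0, high: 1
```

Supplying levels explicitly keeps the categories in a meaningful order and makes counts of unobserved categories (here medium) appear as zero.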