Statistics With R Unit 1

The document provides an introduction to R programming and data types in R. It discusses the basic data types in R including numeric, integer, logical, complex, and character data types. It explains how to find the data type of an object using the class() function. The document also covers R operators including arithmetic, logical, relational, and assignment operators. Finally, it provides details about data frames in R, noting that data frames store tabular data and can contain different data types across columns.

Unit I

Introduction to R Programming

R and R Studio, Logical Arguments, Missing Values, Characters, Factors and
Numeric, Help in R, Vector to Matrix, Matrix Access, Data Frames, Data Frame
Access, Basic Data Manipulation Techniques, Usage of various apply functions:
apply, lapply, sapply and tapply, Outliers treatment.

What is R
R is a popular programming language used for statistical computing and
graphical presentation.

Its most common use is to analyze and visualize data.

Why Use R?
• It is a great resource for data analysis, data visualization, data science
and machine learning
• It provides many statistical techniques (such as statistical tests,
classification, clustering and data reduction)
• It is easy to draw graphs in R, like pie charts, histograms, box plots,
scatter plots, etc.
• It works on different platforms (Windows, Mac, Linux)
• It is open-source and free
• It has large community support
• It has many packages (libraries of functions) that can be used to solve
different problems

Explain in detail the different data types in R.

Data Types
In programming, data type is an important concept.

Variables can store data of different types, and different types can do
different things.

In R, variables do not need to be declared with any particular type, and can
even change type after they have been set.
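
As a minimal sketch of this behaviour (the variable name my_var and its values are only illustrative), the same name can hold a number and later a string, and class() reports whichever type is current:

```r
# A variable's type follows the value it currently holds
my_var <- 30
first_class <- class(my_var)     # the value is a number here
print(first_class)

my_var <- "Sally"                # the same name now holds a string
second_class <- class(my_var)
print(second_class)
```

No declaration is needed at either step; the assignment alone decides the type.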

Basic Data Types


Basic data types in R can be divided into the following types:

Basic Data Type    Values
Numeric            Set of all real numbers
Integer            Set of all integers, Z
Logical            TRUE and FALSE
Complex            Set of complex numbers
Character          “a”, “b”, “c”, …, “@”, “#”, “$”, …, “1”, “2”, … etc.

Numeric Datatype
Decimal values are called numerics in R. It is the default data type for
numbers in R. If you assign a decimal value to a variable x as follows, x will
be of numeric type.
# A simple R program
# to illustrate Numeric data type
 
# Assign a decimal value to x
x = 5.6
 
# print the class name of variable
print(class(x))
 
# print the type of variable
print(typeof(x))
Integer Datatype
R supports integer data types which are the set of all integers. You can
create as well as convert a value into an integer type using
the as.integer() function. You can also use the capital ‘L’ notation as a suffix
to denote that a particular value is of the integer data type.
# A simple R program
# to illustrate integer data type
 
# Create an integer value
x = as.integer(5)
 
# print the class name of x
print(class(x))
 
# print the type of x
print(typeof(x))
 
# Declare an integer by appending an L suffix.
y = 5L
 
# print the class name of y
print(class(y))
 
# print the type of y
print(typeof(y))
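
The conversion side of as.integer() can be sketched as follows, assuming a few illustrative values that are not part of the original notes. It also contrasts a plain literal with one carrying the L suffix:

```r
# as.integer() drops the fractional part of a decimal value
whole <- as.integer(3.9)
print(whole)

# A string that holds a number can also be converted
from_text <- as.integer("12")
print(from_text)

# A plain literal is numeric; the L suffix makes it an integer
plain_class <- class(5)
suffix_class <- class(5L)
print(plain_class)
print(suffix_class)
```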
Logical Datatype
R has a logical data type that takes a value of either TRUE or FALSE. A logical
value is often created via a comparison between variables.
# A simple R program
# to illustrate logical data type
 
# Sample values
x = 4
y = 3
 
# Comparing two values
z = x > y
 
# print the logical value
print(z)
 
# print the class name of z
print(class(z))
 
# print the type of z
print(typeof(z))
Complex Datatype
R supports a complex data type that covers the set of all complex numbers. The
complex data type is used to store numbers with an imaginary component.
# A simple R program
# to illustrate complex data type
 
# Assign a complex value to x
x = 4 + 3i
 
# print the class name of x
print(class(x))
 
# print the type of x
print(typeof(x))
Character Datatype
R supports a character data type that covers all letters and special
characters. It stores character values or strings. Strings in R can
contain letters, numbers, and symbols. The easiest way to denote that a
value is of character type in R is to wrap the value inside single or double
quotation marks.
# A simple R program
# to illustrate character data type
 
# Assign a character value to char
char = "Geeksforgeeks"
 
# print the class name of char
print(class(char))
 
# print the type of char
print(typeof(char))
Find data type of an object
To find the data type of an object, use the class() function: pass the object
as an argument and class() returns the name of its data type.
Syntax:
class(object)
# A simple R program
# to find data type of an object
 
# Logical
print(class(TRUE))
 
# Integer
print(class(3L))
 
# Numeric
print(class(10.5))
 
# Complex
print(class(1+2i))
 
# Character
print(class("12-04-2020"))

R – Operators
Operators are symbols that direct the R interpreter to perform various kinds of
operations on the operands. Operators carry out the various
mathematical, logical, and decision operations on complex numbers,
integers, and numeric values given as input operands.
R Operators
R supports four main kinds of binary operators, along with a few miscellaneous
operators. This section covers the various types of operators in the R
programming language and their usage.
Types of operators in the R language
• Arithmetic Operators
• Logical Operators
• Relational Operators
• Assignment Operators
• Miscellaneous Operators
Arithmetic Operators
Arithmetic operations simulate various math operations, like addition,
subtraction, multiplication, division, and modulo using the specified operator
between operands, which may be either scalar values, complex numbers, or
vectors. The operations are performed element-wise at the corresponding
positions of the vectors. 
Operator   Name                                 Example
+          Addition                             x + y
-          Subtraction                          x - y
*          Multiplication                       x * y
/          Division                             x / y
^          Exponent                             x ^ y
%%         Modulus (remainder from division)    x %% y
%/%        Integer division                     x %/% y

Input :  a <- c(1, 0.1)
         b <- c(2.33, 4)
         print(a + b)
Output : 3.33 4.10
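
The remaining operators in the table can be tried the same way. The following is a small sketch with made-up values (x, y, v and w are illustrative names) showing the modulus and integer division operators on single numbers and element-wise on vectors:

```r
x <- 17
y <- 5

remainder <- x %% y     # what is left after dividing 17 by 5
quotient  <- x %/% y    # how many whole times 5 fits into 17
power     <- x ^ 2
print(c(remainder, quotient, power))

# With vectors, each position is computed separately
v <- c(10, 20, 30)
w <- c(3, 4, 7)
vec_remainder <- v %% w
vec_quotient  <- v %/% w
print(vec_remainder)
print(vec_quotient)
```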
Logical Operators
Logical operations carry out element-wise decision operations, based on the
specified operator between the operands, which are then evaluated to either
a TRUE or FALSE logical value. Any non-zero numeric value is treated as
TRUE, whether it is a real or a complex number.

Operator   Description
&          Element-wise logical AND. Returns TRUE where both elements are TRUE.
&&         Logical AND on single values. Returns TRUE if both statements are TRUE.
|          Element-wise logical OR. Returns TRUE where at least one of the elements is TRUE.
||         Logical OR on single values. Returns TRUE if at least one of the statements is TRUE.
!          Logical NOT. Returns FALSE if the statement is TRUE, and TRUE if it is FALSE.
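
A short sketch of these operators, using illustrative vectors a and b, may help separate the element-wise forms (&, |) from the forms meant for single values (&&, ||):

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# Element-wise: one result per position
and_result <- a & b
or_result  <- a | b
not_result <- !a
print(and_result)
print(or_result)
print(not_result)

# Single values: one overall result, typical in if() conditions
x <- 4
both   <- (x > 2) && (x < 10)
either <- (x < 2) || (x > 3)
print(both)
print(either)

# Zero converts to FALSE, any other number to TRUE
from_numbers <- as.logical(c(0, 2, -1))
print(from_numbers)
```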

Relational Operators
The relational operators carry out comparison operations between the
corresponding elements of the operands. Each comparison returns TRUE if
the first operand satisfies the relation with respect to the second, and FALSE
otherwise. A TRUE value is always considered to be greater than FALSE.

Operator   Name                        Example
==         Equal                       x == y
!=         Not equal                   x != y
>          Greater than                x > y
<          Less than                   x < y
>=         Greater than or equal to    x >= y
<=         Less than or equal to       x <= y
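
As an illustration (the vectors x and y below are made up for this sketch), each relational operator compares the operands position by position and gives back a logical vector:

```r
x <- c(1, 5, 3)
y <- c(2, 5, 1)

equal_result   <- x == y
differ_result  <- x != y
greater_result <- x > y
at_most_result <- x <= y
print(equal_result)
print(differ_result)
print(greater_result)
print(at_most_result)

# TRUE counts as 1 and FALSE as 0 in a comparison
true_vs_false <- TRUE > FALSE
print(true_vs_false)
```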

Assignment Operators
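Assignment operators are used to assign values to various data objects in
R. The objects may be integers, vectors, or functions. These values are then
stored under the assigned variable names. There are two kinds of assignment
operators: left (<-, <<-) and right (->, ->>).
my_var <- 3     # left assignment

my_var <<- 3    # left assignment in an enclosing (or the global) environment

3 -> my_var     # right assignment

3 ->> my_var    # right assignment in an enclosing (or the global) environment

my_var          # print my_var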

Miscellaneous Operators
These operators serve special purposes: creating sequences of numbers,
testing whether an element belongs to a vector, and multiplying matrices.

Operator   Description                                  Example
:          Creates a series of numbers in a sequence    x <- 1:10
%in%       Finds out if an element belongs to a vector  x %in% y
%*%        Matrix multiplication                        x <- Matrix1 %*% Matrix2
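
The three operators can be tried together in a short sketch. The names in_check, m1 and m2 are illustrative, and m2 is chosen as an identity matrix so the product is easy to verify by eye:

```r
# : builds the sequence 1, 2, ..., 10
x <- 1:10
print(x)

# %in% tests each left-hand element for membership in x
in_check <- c(3, 12) %in% x
print(in_check)

# %*% multiplies matrices; multiplying by the identity returns m1
m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(c(1, 0, 0, 1), nrow = 2)
product <- m1 %*% m2
print(product)
```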
Write in detail about data frames in R.

Data Frames
R Programming Language is an open-source programming language that is
widely used as a statistical software and data analysis tool. Data Frames in
the R language are generic data objects which are used to store
tabular data. Data frames can also be interpreted as matrices in which each
column can be of a different data type. A data frame is made up
of three principal components: the data, rows, and columns.

R – Data Frames

Following are the characteristics of a data frame.

• The column names should be non-empty.
• The row names should be unique.
• The data stored in a data frame can be of numeric, factor or character type.
• Each column should contain the same number of data items.

Data Frames are data displayed in a table format.

A Data Frame can have different types of data inside it. While the first column
can be character, the second and third can be numeric or logical. However,
all values within a single column should have the same type of data.
Use the data.frame() function to create a data frame:

# Create a data frame
Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Print the data frame
Data_Frame
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45
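
One way to confirm that each column carries a single type is to ask for the class of every column. This is only a sketch: col_types and frame_size are illustrative names, and the class reported for Training depends on the R version (character from R 4.0 onward, factor in earlier versions with default settings).

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# One class name per column
col_types <- sapply(Data_Frame, class)
print(col_types)

# Number of rows and columns
frame_size <- dim(Data_Frame)
print(frame_size)
```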

Summarize the Data


Use the summary() function to summarize the data from a Data Frame:

Example
Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

Data_Frame

summary(Data_Frame)

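     Training     Pulse          Duration
 Other   :1   Min.   :100.0   Min.   :30.0
 Stamina :1   1st Qu.:110.0   1st Qu.:37.5
 Strength:1   Median :120.0   Median :45.0
              Mean   :123.3   Mean   :45.0
              3rd Qu.:135.0   3rd Qu.:52.5
              Max.   :150.0   Max.   :60.0

The counts shown for Training appear when that column is a factor. From R 4.0
onward, data.frame() keeps strings as character by default, so summary()
reports Length, Class and Mode for Training unless stringsAsFactors = TRUE is set.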
Write a short note on the various apply functions in R.

apply(), lapply(), sapply(), and tapply() in R

The apply() family is part of base R. This family of
functions helps us to apply a certain function to a data frame, list, or
vector and return the result as a list or vector, depending on the function we
use. The following four functions of the apply() family are covered here:

apply() function
The apply() function lets us apply a function to the rows or columns of a
matrix or data frame. It takes the matrix or data frame as an
argument, along with the function and whether it has to be applied by row or
by column, and returns the result in the form of a vector, array, or list of
the values obtained.

Syntax: apply( x, margin, function )

Parameters:
• x: the input array, including a matrix.
• margin: if the margin is 1 the function is applied across rows; if the
margin is 2 it is applied across columns.
• function: the function that is to be applied to the input
data.
# create sample data
sample_matrix <- matrix(c(1:10), nrow = 3, ncol = 10)
  
print( "sample matrix:")
sample_matrix
  
# Use apply() function across row to find sum
print("sum across rows:")
apply( sample_matrix, 1, sum)
  
# use apply() function across column to find mean
print("mean across columns:")
apply( sample_matrix, 2, mean)
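
The effect of the margin is easier to check by hand on a smaller matrix. The sketch below uses an illustrative 2 x 3 matrix m, which is not part of the original example:

```r
# Filled column by column: row 1 is 1 3 5, row 2 is 2 4 6
m <- matrix(1:6, nrow = 2)
print(m)

# Margin 1: one sum per row
row_sums <- apply(m, 1, sum)
print(row_sums)

# Margin 2: one mean per column
col_means <- apply(m, 2, mean)
print(col_means)
```

The row result has two values (one per row) and the column result has three (one per column), which is a quick way to tell which margin was used.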
lapply() function
The lapply() function helps us in applying functions on list objects and returns
a list object of the same length. The lapply() function in the R Language
takes a list, vector, or data frame as input and gives output in the form of a
list object. Since the lapply() function applies a certain operation to all the
elements of the list it doesn’t need a MARGIN. 

Syntax: lapply( x, fun )
Parameters:
• x: the input vector or object.
• fun: the function that is to be applied to the input data.

# create sample data
names <- c("priyank", "abhiraj","pawananjani",
           "sudhanshu","devraj")
print( "original data:")
names
  
# apply lapply() function
print("data after lapply():")
lapply(names, toupper)
sapply() function
The sapply() function helps us in applying functions on a list, vector, or data
frame and simplifies the result to a vector or matrix where possible. The
sapply() function in the R Language takes a list, vector, or data frame as
input and gives output in the form of a vector or matrix. Since the sapply()
function applies a certain operation to all the elements of the object it doesn’t
need a MARGIN. It is the same as lapply() with the only difference being the
type of return object.
Syntax: sapply( x, fun )
Parameters:
• x: the input vector or object.
• fun: the function that is to be applied to the input data.
# create sample data
sample_data<- data.frame( x=c(1,2,3,4,5,6),
                          y=c(3,2,4,2,34,5))
print( "original data:")
sample_data
  
# apply sapply() function
print("data after sapply():")
sapply(sample_data, max)
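
The difference in return type between lapply() and sapply() can be seen by running both on the same input. This is a small sketch with an illustrative list named values:

```r
values <- list(a = 1:3, b = 4:6)

# lapply() keeps the results in a list
as_list <- lapply(values, sum)
print(as_list)
print(class(as_list))

# sapply() simplifies the same results to a named vector
as_vector <- sapply(values, sum)
print(as_vector)
print(class(as_vector))
```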
tapply() function
The tapply() function helps us to compute statistical measures (mean, median,
min, max, etc.) or a self-written function for each level of a factor variable
that groups a vector. It splits the vector into subsets and then applies the
function to each of the subsets. For example, in an organization, if we have
data on the salaries of employees and we want to find the mean salary for male
and female employees, then we can use the tapply() function with gender as the
factor variable.
Syntax: tapply( x, index,  fun )
Parameters:
• x: the input vector or object.
• index: the factor vector that helps us distinguish the
data.
• fun: the function that is to be applied to the input data.
# load library tidyverse
library(tidyverse)
  
# print head of diamonds dataset
print(" Head of data:")
head(diamonds)
  
# apply tapply function to get average price by cut
print("Average price for each cut of diamond:")
tapply(diamonds$price, diamonds$cut, mean)
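
The salary example described above can also be sketched in base R without any extra package. The salary figures and the names salary, gender and mean_salary below are invented purely for illustration:

```r
# Hypothetical salaries and the gender of each employee
salary <- c(50000, 62000, 58000, 45000, 70000, 54000)
gender <- factor(c("male", "female", "female", "male", "male", "female"))

# Mean salary within each level of the factor
mean_salary <- tapply(salary, gender, mean)
print(mean_salary)
```

The result has one entry per factor level, named after the level.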
Explain the plot function in R with an example.

Plot
The plot() function is used to draw points (markers) in a diagram.

The function takes parameters for specifying points in the diagram.

Parameter 1 specifies points on the x-axis.

Parameter 2 specifies points on the y-axis.

At its simplest, you can use the plot() function to plot two numbers against
each other:

Example
Draw one point in the diagram, at position 1 on the x-axis and position 3 on the y-axis:

plot(1, 3)
Example
Draw two points in the diagram, one at position (1, 3) and one in position (8,
10):

plot(c(1, 8), c(3, 10))
Multiple Points
You can plot as many points as you like; just make sure you have the same
number of points on both axes:

Example
plot(c(1, 2, 3, 4, 5), c(3, 7, 8, 9, 12))
Example
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 8, 9, 12)

plot(x, y)
Sequences of Points
If you want to draw dots in a sequence, on both the x-axis and the y-axis,
use the : operator:

Example
plot(1:10)
Draw a Line
The plot() function also takes a type parameter with the value l to draw a
line to connect all the points in the diagram:

Example
plot(1:10, type="l")
Plot Labels
The plot() function also accepts other parameters, such
as main, xlab and ylab, if you want to customize the graph with a main title and
different labels for the x-axis and y-axis:

Example
plot(1:10, main="My Graph", xlab="The x-axis", ylab="The y axis")
Scatter Plots
As shown in the Plot section above, the plot() function is used to plot
numbers against each other.

A "scatter plot" is a type of plot used to display the relationship between two
numerical variables, and plots one dot for each observation.

It needs two vectors of the same length, one for the x-axis (horizontal) and one
for the y-axis (vertical):

Example
x <- c(5,7,8,7,2,2,9,4,11,12,9,6)
y <- c(99,86,87,88,111,103,87,94,78,77,85,86)

plot(x, y)
Pie Charts
A pie chart is a circular graphical view of data.

Use the pie() function to draw pie charts:

Example
# Create a vector of values, one per pie slice
x <- c(10,20,30,40)

# Display the pie chart
pie(x)
Bar Charts
A bar chart uses rectangular bars to visualize data. Bar charts can be
displayed horizontally or vertically. The height or length of each bar is
proportional to the value it represents.

Use the barplot() function to draw a vertical bar chart:

Example
# x-axis values
x <- c("A", "B", "C", "D")

# y-axis values
y <- c(2, 4, 6, 8)

barplot(y, names.arg = x)
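
For the horizontal form mentioned above, barplot() has a horiz argument. The following sketch reuses the same values; bar_positions is an illustrative name for the bar midpoints that barplot() returns:

```r
# Category labels
x <- c("A", "B", "C", "D")

# Bar lengths
y <- c(2, 4, 6, 8)

# horiz = TRUE draws the bars from left to right
bar_positions <- barplot(y, names.arg = x, horiz = TRUE)
```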
For more programs in R:
https://www.w3schools.com/r/
