Data Types in R Programming

Uploaded by

Amisha Sharma
Data Types in R Programming

In a programming language, we use variables to store different kinds of information. A variable is a name bound to a reserved memory location that holds a value. When we create a variable in a program, some space is reserved in memory.

In R, there are several data types, such as integer and character (string). R allocates memory based on the data type of the value and decides what can be stored in the reserved memory.

The following data types are used in R programming:

Data type   Example                        Description
Logical     TRUE, FALSE                    A special data type for data with only two possible
                                           values, which can be construed as true/false.
Numeric     12, 32, 112, 5432              A decimal value is called numeric in R. It is the
                                           default computational data type.
Integer     3L, 66L, 2346L                 The suffix L tells R to store the value as an integer.
Complex     Z = 1+2i, t = 7+3i             A complex value has a real part and an imaginary part;
                                           the imaginary part is written with the suffix i.
Character   'a', "good", "TRUE", '35.4'    A character value represents a string. We convert
                                           objects into character values with the as.character()
                                           function.
Raw                                        A raw value holds raw bytes.

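As a minimal sketch of the data types described above, the following snippet creates one value of each type and asks R for its class. The variable names (values, types, as_text) and the sample value 12.5 are chosen here for illustration only.

```r
# One value of each basic data type (sample values are illustrative).
values <- list(
  logical   = TRUE,
  numeric   = 12.5,
  integer   = 3L,
  complex   = 1 + 2i,
  character = "good",
  raw       = charToRaw("A")
)

# class() reports the type name R uses for each value.
types <- vapply(values, class, character(1))
print(types)

# as.character() converts a value of another type to a character value.
as_text <- as.character(35.4)
print(as_text)
```

Note that a plain number such as 12 is stored as numeric (double) unless the L suffix is used.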
Data Structures in R Programming
Data structures are important to understand: they are the objects we manipulate on a day-to-day basis in R. Converting between object types is one of the most common sources of frustration for beginners. We can say that everything in R is an object.

R has many data structures, which include:

1. Atomic vector
2. List
3. Array
4. Matrices
5. Data Frame
6. Factors

Vectors

A vector is the most basic data object in R. There are six types of atomic vectors: logical, integer, double, complex, character, and raw. "A vector is a collection of elements which is most commonly of mode character, integer, logical or numeric." A vector can be one of the following two types:

1. Atomic vector
2. Lists

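The snippet below is a small sketch of the six atomic vector types and of the difference between atomic vectors and lists. All variable names and sample values are illustrative assumptions, not part of the original text.

```r
# One atomic vector of each of the six types.
lgl <- c(TRUE, FALSE, TRUE)
int <- c(1L, 2L, 3L)
dbl <- c(1.5, 2, 3.25)
cpl <- c(1 + 2i, 7 + 3i)
chr <- c("a", "good")
rw  <- as.raw(c(1, 255))

# typeof() shows the storage type of each vector.
kinds <- c(typeof(lgl), typeof(int), typeof(dbl),
           typeof(cpl), typeof(chr), typeof(rw))
print(kinds)

# An atomic vector holds a single type, so mixing types in c()
# coerces every element to one common type (here, character).
mixed <- c(1L, 2.5, "three")
print(typeof(mixed))

# A list is also a vector, but it is not atomic.
list_is_vector <- is.vector(list(1, "a"))
list_is_atomic <- is.atomic(list(1, "a"))
print(c(list_is_vector, list_is_atomic))
```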
List

In R, a list is a general-purpose container. Unlike an atomic vector, a list is not restricted to a single mode: it can contain a mixture of data types. Lists are also known as generic vectors because each element of a list can be any type of R object. "A list is a special type of vector in which each element can be a different type."

We can create a list with list() or as.list(). We can use vector("list", n) to create an empty list of the required length n.
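The three ways of creating a list mentioned above can be sketched as follows. The names person, from_vector, and empty, and the values stored in them, are hypothetical.

```r
# list() builds a list whose elements may have different types.
person <- list(name = "Nishka", age = 21L, scores = c(88.5, 92))
element_types <- vapply(person, class, character(1))
print(element_types)

# as.list() turns each element of a vector into a list element.
from_vector <- as.list(1:3)
print(length(from_vector))

# vector("list", n) creates a list of n empty (NULL) slots.
empty <- vector("list", 3)
print(length(empty))
print(is.null(empty[[1]]))
```

Elements are read with [[ ]] or, for named elements, with $, for example person$name or person[["scores"]].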

Arrays

Arrays are data objects that can store data in more than two dimensions. "An array is a collection of a similar data type with contiguous memory allocation." For example, if we create an array of dimension (2, 3, 4), it creates four rectangular matrices, each with two rows and three columns.

In R, an array is created with the array() function. This function takes a vector as input and uses the values in the dim parameter to shape it into an array.
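To make the (2, 3, 4) example concrete, here is a short sketch that builds such an array from the integers 1 to 24. The input data and the variable names are assumptions made for illustration.

```r
# Build a 2 x 3 x 4 array from a vector of 24 values.
arr <- array(1:24, dim = c(2, 3, 4))
print(dim(arr))

# Each slice along the third dimension is a 2 x 3 matrix.
first_slice <- arr[, , 1]
print(first_slice)

# Values are filled column by column, then slice by slice.
print(arr[1, 1, 2])  # first element of the second slice
print(arr[2, 3, 4])  # last element of the last slice
```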

Matrices

A matrix is an R object in which the elements are arranged in a two-dimensional rectangular layout. All elements of a matrix are of the same atomic type. Matrices containing numeric elements are the ones used for mathematical calculations. A matrix is created with the matrix() function in R.

Syntax

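The basic syntax for creating a matrix is as follows:

matrix(data, nrow, ncol, byrow, dimnames)

Here, data is the input vector, nrow and ncol give the number of rows and columns, byrow is a logical value that fills the matrix row by row when TRUE, and dimnames is a list of row and column names.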
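A brief sketch of a matrix() call follows. It uses the argument names of base R's matrix() function (nrow, ncol, byrow, dimnames); the data and the row and column names are made up for illustration.

```r
# Fill a 2 x 3 matrix row by row and name its rows and columns.
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
print(m)

# Elements can be read by position or by name.
print(m[2, 2])
print(m["r2", "b"])

# Without byrow = TRUE, the matrix is filled column by column.
by_col <- matrix(1:6, nrow = 2, ncol = 3)
print(by_col[2, 1])
```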

Data Frames

A data frame is a two-dimensional, array-like structure. It is a table in which each column contains the values of one variable and each row contains one set of values, one from each column.

A data frame has the following characteristics:

1. The column names should be non-empty.
2. The row names should be unique.
3. A data frame can store numeric, factor, or character type data.
4. Each column should contain the same number of data items.

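The characteristics above can be checked on a small data frame. This is only a sketch: the data frame students and its contents are hypothetical, and stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version.

```r
# A small data frame with three columns of different types.
students <- data.frame(
  name   = c("Arpita", "Nishka", "Sumit"),
  height = c(152, 162, 147),
  passed = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)
print(students)

# Column names are non-empty and row names are unique.
print(names(students))
print(anyDuplicated(rownames(students)) == 0)

# Every column has the same number of items; each has its own type.
column_types <- sapply(students, class)
print(column_types)
print(dim(students))
```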
Factors

Factors are data objects used to categorize data and store it as levels. The values being categorized can be strings or integers. Factors are most useful for columns that have a limited number of unique values, and they are widely used in data analysis for statistical modeling.

Factors are created with the factor() function, which takes a vector as its input parameter.

R factors
A factor is a data structure used for fields that take only a predefined, finite number of values, that is, variables with a limited number of distinct values. Factors categorize the data and store it as levels. The values can be given as integers or strings, and factors are useful for columns that have a limited number of unique values.

A factor stores an integer code for each element, together with the labels associated with those codes. The predefined set of values is known as the levels. By default, R sorts the levels, which means alphabetical order for character data.

Arguments of the factor() function

The factor() function accepts the following arguments:

1. x
It is the input vector which is to be transformed into a factor.
2. levels
It is an optional vector of the unique values that x can take.
3. labels
It is an optional character vector of labels for the levels, given in the same order as levels.
4. exclude
It specifies the values to be excluded when forming the set of levels.
5. ordered
It is a logical flag which determines whether the levels are treated as ordered.
6. nmax
It specifies an upper bound on the maximum number of levels.

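The following sketch shows how several of the options listed above behave when passed to factor(). The input vector sizes and its values are invented for illustration.

```r
# Input vector; "unknown" is deliberately not one of the levels.
sizes <- c("medium", "small", "large", "small", "unknown")

# levels fixes the set and order of categories, labels renames them,
# and ordered = TRUE makes comparisons between levels meaningful.
size_factor <- factor(
  sizes,
  levels  = c("small", "medium", "large"),
  labels  = c("S", "M", "L"),
  ordered = TRUE
)
print(size_factor)

# Values that are not among the levels become NA.
print(is.na(size_factor[5]))

# The underlying integer codes follow the order of the levels.
print(as.integer(size_factor))

# exclude removes a value from the set of levels.
without_unknown <- factor(sizes, exclude = "unknown")
print(levels(without_unknown))
```

Because the factor is ordered, a comparison such as size_factor[1] < size_factor[3] (M against L) is allowed and evaluates to TRUE.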

How to create a factor?

In R, it is quite simple to create a factor. A factor is created in two steps:

1. In the first step, we create a vector.
2. In the next step, we convert the vector into a factor.

R provides the factor() function to convert a vector into a factor. The syntax of the factor() function is as follows:

factor_data <- factor(vector)

Let's see an example to understand how the factor() function is used.



Example

# Creating a vector as input.
data <- c("Shubham","Nishka","Arpita","Nishka","Shubham","Sumit","Nishka","Shubham","Sumit","Arpita","Sumit")

print(data)
print(is.factor(data))

# Applying the factor function.
factor_data <- factor(data)

print(factor_data)
print(is.factor(factor_data))

Output

[1] "Shubham" "Nishka" "Arpita" "Nishka" "Shubham" "Sumit" "Nishka"
[8] "Shubham" "Sumit" "Arpita" "Sumit"
[1] FALSE
[1] Shubham Nishka Arpita Nishka Shubham Sumit Nishka Shubham Sumit
[10] Arpita Sumit
Levels: Arpita Nishka Shubham Sumit
[1] TRUE

Accessing components of factor

As with vectors, we can access the components of a factor. The process is very similar to that for vectors: we can access elements by index or with a logical vector. Let's see an example that shows the different ways of accessing the components.

Example

# Creating a vector as input.
data <- c("Shubham","Nishka","Arpita","Nishka","Shubham","Sumit","Nishka","Shubham","Sumit","Arpita","Sumit")

# Applying the factor function.
factor_data <- factor(data)

# Printing all elements of the factor
print(factor_data)

# Accessing the 4th element of the factor
print(factor_data[4])

# Accessing the 5th and 7th elements
print(factor_data[c(5,7)])

# Accessing all elements except the 4th one
print(factor_data[-4])

# Accessing elements using a logical vector
print(factor_data[c(TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE)])

Output

[1] Shubham Nishka Arpita Nishka Shubham Sumit Nishka Shubham Sumit
[10] Arpita Sumit
Levels: Arpita Nishka Shubham Sumit

[1] Nishka
Levels: Arpita Nishka Shubham Sumit

[1] Shubham Nishka
Levels: Arpita Nishka Shubham Sumit

[1] Shubham Nishka Arpita Shubham Sumit Nishka Shubham Sumit Arpita
[10] Sumit
Levels: Arpita Nishka Shubham Sumit

[1] Shubham Shubham Sumit Nishka Sumit
Levels: Arpita Nishka Shubham Sumit

Modification of factor

As with data frames, R allows us to modify a factor. We can modify a value of a factor by simply re-assigning it. However, we cannot assign a value outside the factor's predefined levels: if we try, R inserts NA and issues a warning. To add such a value, we first have to add it as a level, and then we can assign it to an element of the factor.

Let's see an example to understand how modification is done in factors.

Example

# Creating a vector as input.
data <- c("Shubham","Nishka","Arpita","Nishka","Shubham")

# Applying the factor function.
factor_data <- factor(data)

# Printing all elements of the factor
print(factor_data)

# Change the 4th element of the factor to "Arpita"
factor_data[4] <- "Arpita"
print(factor_data)

# Try to change the 4th element of the factor to "Gunjan"
factor_data[4] <- "Gunjan" # cannot assign values outside levels; NA is stored
print(factor_data)

# Adding the value to the levels
levels(factor_data) <- c(levels(factor_data),"Gunjan") # Adding new level
factor_data[4] <- "Gunjan"
print(factor_data)

Output

[1] Shubham Nishka Arpita Nishka Shubham
Levels: Arpita Nishka Shubham
[1] Shubham Nishka Arpita Arpita Shubham
Levels: Arpita Nishka Shubham
Warning message:
In `[<-.factor`(`*tmp*`, 4, value = "Gunjan") :
invalid factor level, NA generated
[1] Shubham Nishka Arpita <NA> Shubham
Levels: Arpita Nishka Shubham
[1] Shubham Nishka Arpita Gunjan Shubham
Levels: Arpita Nishka Shubham Gunjan

Factor in Data Frame

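When we create a data frame with a column of text data, R can treat this text column as categorical data and create a factor from it. In versions of R before 4.0.0 this conversion was the default; from R 4.0.0 onwards, data.frame() keeps text columns as character vectors unless stringsAsFactors = TRUE is passed.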

Example

# Creating the vectors for the data frame.
height <- c(132,162,152,166,139,147,122)
weight <- c(40,49,48,40,67,52,53)
gender <- c("male","male","female","female","male","female","male")

# Creating the data frame.
# stringsAsFactors = TRUE is needed in R 4.0.0 and later to convert text columns to factors.
input_data <- data.frame(height,weight,gender,stringsAsFactors = TRUE)
print(input_data)

# Testing if the gender column is a factor.
print(is.factor(input_data$gender))

# Printing the gender column to see the levels.
print(input_data$gender)

Output

  height weight gender
1    132     40   male
2    162     49   male
3    152     48 female
4    166     40 female
5    139     67   male
6    147     52 female
7    122     53   male
[1] TRUE
[1] male male female female male female male
Levels: female male

Changing order of the levels

In R, we can change the order of the levels in a factor by calling the factor() function again and passing the levels in the required order.

Example

data <- c("Nishka","Gunjan","Shubham","Arpita","Arpita","Sumit","Gunjan","Shubham")
# Creating the factor
factor_data <- factor(data)
print(factor_data)

# Apply the factor function with the required order of the levels.
new_order_factor <- factor(factor_data,levels = c("Gunjan","Nishka","Arpita","Shubham","Sumit"))
print(new_order_factor)

Output

[1] Nishka Gunjan Shubham Arpita Arpita Sumit Gunjan Shubham
Levels: Arpita Gunjan Nishka Shubham Sumit
[1] Nishka Gunjan Shubham Arpita Arpita Sumit Gunjan Shubham
Levels: Gunjan Nishka Arpita Shubham Sumit

Generating Factor Levels

R provides the gl() function to generate factor levels. Its most commonly used arguments are n, k, and labels. Here, n and k are integers which indicate how many levels we want and how many times each level is repeated.

The syntax of the gl() function is as follows:

gl(n, k, labels)

1. n indicates the number of levels.
2. k indicates the number of replications.
3. labels is a vector of labels for the resulting factor levels.

Example

gen_factor <- gl(3,5,labels=c("BCA","MCA","B.Tech"))
gen_factor

Output

[1] BCA BCA BCA BCA BCA MCA MCA MCA MCA MCA
[11] B.Tech B.Tech B.Tech B.Tech B.Tech
Levels: BCA MCA B.Tech
