Data Types in R Programming
R has several basic data types, such as integer, double, character, and logical. Memory is
allocated according to the data type of a variable, and the type determines what can be stored
in that memory.
The following data structures are used in R programming:
1. Atomic vector
2. List
3. Array
4. Matrix
5. Data frame
6. Factor
Vectors
A vector is the most basic data object in R. There are six types of atomic vectors: logical,
integer, double, complex, character, and raw. "A vector is a collection of elements which is
most commonly of mode character, integer, logical or numeric." A vector can be one of the
following two types:
1. Atomic vector
2. List
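As a minimal sketch of the atomic types, the snippet below builds one small vector of each type and checks it with typeof(). The variable names and values are arbitrary illustrations, not part of the original text.

```r
# One small vector of each atomic type; the values are arbitrary examples.
v_logical   <- c(TRUE, FALSE)
v_integer   <- c(1L, 2L)
v_double    <- c(1.5, 2.5)
v_complex   <- c(1+2i, 3-1i)
v_character <- c("a", "b")
v_raw       <- as.raw(c(1, 255))

types <- c(typeof(v_logical), typeof(v_integer), typeof(v_double),
           typeof(v_complex), typeof(v_character), typeof(v_raw))
print(types)

# Mixing types in c() coerces every element to the most general type.
mixed <- c(1L, 2.5, "three")
print(typeof(mixed))   # "character"
```

Because an atomic vector holds a single type, c() silently coerces mixed input; a list is needed to keep elements of different types side by side.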
List
In R, a list is a general-purpose container. Unlike an atomic vector, a list is not restricted
to a single mode and can contain a mixture of data types. Lists are also known as generic
vectors because each element can be any type of R object. "A list is a special type of vector
in which each element can be a different type."
We can create a list with list() or as.list(), and we can use vector() to create an empty list
of a required length.
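The three ways of creating a list mentioned above can be sketched as follows. The element names and values (name, age, scores) are hypothetical and chosen only for illustration.

```r
# A list may mix types, including other vectors.
person <- list(name = "Nishka", age = 25L, scores = c(81.5, 92))
print(typeof(person))        # "list"
print(person$scores[2])      # 92

# as.list() turns each element of a vector into a separate list element.
from_vector <- as.list(1:3)
print(length(from_vector))   # 3

# vector("list", n) creates a list of length n whose elements are all NULL.
empty <- vector("list", 2)
print(is.null(empty[[1]]))   # TRUE
```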
Arrays
Arrays are data objects that can store data in more than two dimensions. "An array is a
collection of a similar data type with contiguous memory allocation." For example, an array of
dimension (2, 3, 4) contains four rectangular matrices, each with two rows and three columns.
In R, an array is created with the array() function. This function takes a vector as input and
uses the values in the dim parameter to shape the array.
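A short sketch of the (2, 3, 4) case described above, assuming the array is filled with the integers 1 to 24 purely for illustration:

```r
# Fill a 2 x 3 x 4 array with the values 1..24 (R fills in column-major order).
a <- array(1:24, dim = c(2, 3, 4))
print(dim(a))             # 2 3 4

# Each slice along the third dimension is a 2 x 3 matrix.
first_slice <- a[, , 1]
print(dim(first_slice))   # 2 3

# Element in row 2, column 3 of the fourth matrix.
print(a[2, 3, 4])         # 24
```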
Matrices
Data Frames
Factors
Factors are data objects used to categorize data and store it as levels. A factor can be built
from both character and integer vectors. Factors are especially useful for columns that have a
limited number of unique values, which is common in data analysis and statistical modeling.
Factors are created with the factor() function, which takes a vector as its input.
R factors
A factor is a data structure for fields that take only a predefined, finite number of values,
such as categorical variables. It can be built from both integer and character values and is
useful for columns that have a limited number of unique values.
A factor contains a predefined set of values known as levels. Its elements are stored
internally as integers, and each integer is associated with a level label. By default, R sorts
the levels (alphabetically for character data).
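To make the relationship between levels and the stored integers concrete, here is a minimal sketch. The sizes vector is hypothetical sample data.

```r
# Hypothetical category data for illustration.
sizes <- factor(c("small", "large", "medium", "small"))

# Levels are the sorted unique values.
print(levels(sizes))       # "large" "medium" "small"

# Internally, each element is an integer index into the levels.
print(as.integer(sizes))   # 3 1 2 3
print(nlevels(sizes))      # 3
```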
Attributes of a factor
R provides the factor() function to convert a vector into a factor. The basic syntax of
factor() is:
    factor_data <- factor(vector)
Like vectors, we can access the components of a factor. The process is very similar to that for
vectors: we can access elements by index or by using logical vectors. The following example
shows different ways of accessing the components.
Example
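The listing for this example did not survive in this copy of the text. The sketch below is a reconstruction that is consistent with the output that follows. The input vector is read off the first printed result, and the choice of index 4 for the single-element access is an assumption (indices 2 and 7 would print the same value).

```r
# Input vector, reconstructed from the first printed result below.
data <- c("Shubham", "Nishka", "Arpita", "Nishka", "Shubham", "Sumit",
          "Nishka", "Shubham", "Sumit", "Arpita", "Sumit")
factor_data <- factor(data)

# Print all elements of the factor.
print(factor_data)

# Access a single element by index (index 4 is an assumption).
print(factor_data[4])

# Access every element except the 4th one.
print(factor_data[-4])

# Selection with a logical vector (not printed, so the output below still matches).
sumit_only <- factor_data[factor_data == "Sumit"]
```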
Output
[1] Shubham Nishka Arpita Nishka Shubham Sumit Nishka Shubham Sumit
[10] Arpita Sumit
Levels: Arpita Nishka Shubham Sumit
[1] Nishka
Levels: Arpita Nishka Shubham Sumit
[1] Shubham Nishka Arpita Shubham Sumit Nishka Shubham Sumit Arpita
[10] Sumit
Levels: Arpita Nishka Shubham Sumit
Like data frames, factors can be modified. We can change a value in a factor by simply
re-assigning it. However, we cannot assign a value that is outside the factor's predefined
levels: doing so stores NA and raises a warning. To insert such a value, we must first add it
as a level and then assign it to the factor.
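A minimal sketch of this behaviour, using hypothetical sample names. suppressWarnings() is used only to keep the demonstration quiet; without it R reports an invalid factor level.

```r
# Hypothetical sample data for illustration.
f <- factor(c("Arpita", "Nishka", "Sumit"))

# Re-assigning to an existing level works as expected.
f[2] <- "Arpita"

# Assigning a value that is not a level stores NA (normally with a warning).
g <- f
suppressWarnings(g[3] <- "Gunjan")
print(is.na(g[3]))   # TRUE

# Add the new level first, then assign the value.
levels(f) <- c(levels(f), "Gunjan")
f[3] <- "Gunjan"
print(f)
```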
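When we create a data frame with a column of text data and stringsAsFactors = TRUE, R treats
the text column as categorical data and creates a factor from it. This was the default
behaviour before R 4.0.0; since R 4.0.0, text columns remain character vectors unless
stringsAsFactors = TRUE is set.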
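The following sketch shows a text column being converted to a factor when a data frame is created. The column names and values are hypothetical, and stringsAsFactors = TRUE is passed explicitly so the result does not depend on the R version.

```r
# Hypothetical data frame for illustration.
df <- data.frame(name = c("Arpita", "Sumit", "Arpita"),
                 marks = c(80, 75, 90),
                 stringsAsFactors = TRUE)

# The text column is stored as a factor; the numeric column is unchanged.
print(is.factor(df$name))    # TRUE
print(levels(df$name))       # "Arpita" "Sumit"
print(is.factor(df$marks))   # FALSE
```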
In R, we can change the order of the levels of a factor by calling factor() again and passing
the required order in the levels argument.
Example
    data <- c("Nishka", "Gunjan", "Shubham", "Arpita", "Arpita", "Sumit", "Gunjan", "Shubham")
    # Create the factor; levels default to sorted order.
    factor_data <- factor(data)
    print(factor_data)

    # Apply factor() again with the required order of the levels.
    new_order_factor <- factor(factor_data,
                               levels = c("Gunjan", "Nishka", "Arpita", "Shubham", "Sumit"))
    print(new_order_factor)
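Output
[1] Nishka Gunjan Shubham Arpita Arpita Sumit Gunjan Shubham
Levels: Arpita Gunjan Nishka Shubham Sumit
[1] Nishka Gunjan Shubham Arpita Arpita Sumit Gunjan Shubham
Levels: Gunjan Nishka Arpita Shubham Sumit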
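R provides the gl() function to generate a factor from a pattern of levels. Its main arguments
are n, k, and labels. Here, n is an integer giving the number of levels, k is an integer giving
how many times each level is repeated, and labels is an optional vector of level names. The
labels argument should be passed by name, because the third positional argument of gl() is
length.
    gl(n, k, labels = seq_len(n))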
Example
    gen_factor <- gl(3, 5, labels = c("BCA", "MCA", "B.Tech"))
    gen_factor
Output
[1] BCA BCA BCA BCA BCA MCA MCA MCA MCA MCA
[11] B.Tech B.Tech B.Tech B.Tech B.Tech
Levels: BCA MCA B.Tech