What Is A Data Frame in R?
1. Data Analysis: Data frames are the foundation for data analysis in R.
You can perform statistical tests, hypothesis testing, and regression
analysis with structured data.
2. Data Visualization: Data frames are compatible with R’s data
visualization packages (e.g., ggplot2), allowing you to create a wide
range of charts, graphs, and visualizations for data exploration and
presentation.
3. Data Cleaning and Preprocessing: Data frames are used to clean and
preprocess data, including handling missing values, dealing with
outliers, and standardizing data.
4. Data Subsetting and Filtering: Analysts use data frames to extract
specific subsets of data based on criteria and conditions, facilitating
focused analysis.
5. Merging and Joining Data: Data frames are essential for combining
data from multiple sources. You can merge or join data based on
common variables to create comprehensive datasets.
6. Grouped Operations: Packages like dplyr make it easy to perform
grouped operations and aggregations on data, making it simple to
compute group-wise statistics.
7. Machine Learning: Many machine learning algorithms in R require
data frames as input. You can prepare your data in a data frame format
before applying machine learning techniques.
8. Time Series Analysis: Data frames are used to store and analyze time
series data, enabling time-based operations and modeling.
9. Reporting and Dashboards: Data frames are employed in creating
reports and dashboards using RMarkdown, Shiny, and other reporting
tools, providing a structured format for data presentation.
10. Export and Sharing: After analysis and modeling, you can export
results as data frames for sharing with colleagues or use in other
applications.
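A few of the tasks listed above (subsetting, merging, and group-wise aggregation) can be sketched in base R. The data frames `sales` and `regions` and their column names are hypothetical, chosen only for illustration.

```r
# Hypothetical example data (not from a real dataset)
sales <- data.frame(
  region_id = c(1, 1, 2, 2, 3),
  amount    = c(100, 150, 200, 50, 300)
)
regions <- data.frame(
  region_id = c(1, 2, 3),
  region    = c("North", "South", "East")
)

# Subsetting and filtering: keep rows where amount exceeds 100
large_sales <- sales[sales$amount > 100, ]

# Merging: join the two data frames on their common column
merged <- merge(sales, regions, by = "region_id")

# Grouped operation: total amount per region
totals <- aggregate(amount ~ region, data = merged, FUN = sum)
print(totals)
```

Packages such as dplyr offer equivalent verbs (filter, join, summarise), but the base R functions above need no extra installation.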
2. Initialize a Data Frame in R using data.frame()
To explore a data frame, the first step is to create one. The simplest way is
the data.frame() function: pass it one or more vectors of the same length,
usually as named arguments. Each vector becomes a column of the data frame,
and the common length of the vectors becomes the number of rows, so every
column must have the same length. R offers other ways to create a data frame
as well, but this section focuses on data.frame().
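The equal-length requirement can be checked with a short sketch. The variable names below are illustrative only. Note that data.frame() recycles a shorter vector when its length divides the longest one evenly (a single value is the common case) and raises an error otherwise.

```r
# Columns of equal length combine into a 3-row data frame
ok <- data.frame(x = 1:3, y = c("a", "b", "c"))

# A length-1 vector is recycled to fill every row
recycled <- data.frame(x = 1:3, flag = TRUE)

# Lengths 3 and 2 cannot be reconciled, so data.frame() raises an error;
# tryCatch() turns that error into a marker value for this demonstration
result <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) "error"
)

print(nrow(ok))
print(recycled)
print(result)
```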
2.1 Syntax of data.frame()
To initialize a data frame in R, you can use the data.frame() function, which
takes one or more vectors (or a list of vectors) as arguments; each one becomes
a column. In R, an atomic vector contains elements of a single data type, such
as logical, integer, double, character, complex, or raw. Let’s create vectors
of equal length and pass them into this function to get the data frame.
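# Create vectors of equal length
id <- c(10, 11, 12, 13)
name <- c('sai', 'ram', 'deepika', 'sahithi')
dob <- as.Date(c('1990-10-02', '1981-3-24', '1987-6-14', '1985-8-16'))
# Combine the vectors into a data frame; each vector becomes a column
df <- data.frame(id, name, dob)
# Print the data frame
print("Create data frame:")
df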
As you can see from the above example, I have created a data frame using
vectors.
By default, the data frame assigns sequential numbers as row indexes, starting
from 1.
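As a minimal sketch of the default row indexes, the snippet below reads them back with rownames() and then supplies custom labels through the row.names argument of data.frame(). The labels "r1" and "r2" and the shortened two-row data are assumptions made for illustration.

```r
# Default row names are the sequence 1, 2, ... stored as character strings
df_default <- data.frame(
  id   = c(10, 11),
  name = c("sai", "ram")
)
print(rownames(df_default))

# Custom row labels (hypothetical) supplied via row.names
df_labeled <- data.frame(
  id        = c(10, 11),
  name      = c("sai", "ram"),
  row.names = c("r1", "r2")
)

# Rows can then be selected by label as well as by position
print(df_labeled["r2", ])
```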
Alternatively, you can define the vectors directly inside the data.frame()
call, naming each column as you go, instead of creating them beforehand.
# Create DataFrame
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  dob = as.Date(c('1990-10-02', '1981-3-24', '1987-6-14', '1985-8-16'))
)
# Print DataFrame
df
This yields the same data frame as the previous example.
To get the type of each column of the data frame, you can use
the sapply() function. Pass the data frame as the first argument and the
class function as the second; sapply() applies class() to every column and
returns the class of each one.
# Display datatypes
print(sapply(df, class))
# Output (R versions before 4.0.0; from R 4.0.0 onward, name is "character"):
# id name dob
# "numeric" "factor" "Date"
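One caveat with sapply(df, class) is that some columns carry more than one class, in which case sapply() returns a list rather than a named character vector. The sketch below illustrates this with a hypothetical `events` data frame containing a date-time column, and shows one possible workaround using vapply() to keep only the first class of each column.

```r
# Hypothetical data frame with a POSIXct column, which has two classes
events <- data.frame(
  id      = c(1, 2),
  created = as.POSIXct(c("2024-01-01 10:00:00", "2024-01-02 11:30:00"),
                       tz = "UTC")
)

# The per-column results differ in length, so sapply() returns a list
classes <- sapply(events, class)
print(classes)

# Keep only the first class of each column to get a character vector
first_class <- vapply(events, function(col) class(col)[1], character(1))
print(first_class)
```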
You can also inspect the column types with the str() function. Pass the data
frame to str() and it prints the number of observations and variables,
followed by each column's name, its type, and its first few values.
# Display datatypes
str(df)
# Output (R versions before 4.0.0; from R 4.0.0 onward, name is shown as chr)
# 'data.frame': 4 obs. of 3 variables:
# $ id : num 10 11 12 13
# $ name: Factor w/ 4 levels "deepika","ram",..: 4 2 1 3
# $ dob : Date, format: "1990-10-02" "1981-03-24" "1987-06-14" "1985-08-16"
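Whether a character column is stored as a factor depends on the stringsAsFactors argument of data.frame(), whose default changed from TRUE to FALSE in R 4.0.0. The sketch below sets the argument explicitly so the result does not depend on the installed R version. The names `as_chr` and `as_fct` are illustrative.

```r
name <- c("sai", "ram", "deepika", "sahithi")

# Keep strings as plain character vectors (the default from R 4.0.0 onward)
as_chr <- data.frame(name = name, stringsAsFactors = FALSE)

# Convert strings to factors (the default before R 4.0.0)
as_fct <- data.frame(name = name, stringsAsFactors = TRUE)

print(class(as_chr$name))
print(class(as_fct$name))

# Factor levels are the sorted unique values of the column
print(levels(as_fct$name))
```

Setting the argument explicitly is a reasonable habit when a script has to run on both older and newer R installations.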