
STA 272 Chapter 02 Notes and Codes Data Frames in R

The document provides an overview of data frames in R, describing their structure, creation, and manipulation. It includes examples of creating data frames, accessing and modifying data, and using built-in datasets like mtcars and iris for analysis. Additionally, it highlights common operations such as filtering, sorting, and aggregating data within data frames.

Data Frames in R

A data frame is one of the most common data structures in R. It is similar to a table in a
relational database or a spreadsheet in Excel. A data frame is a collection of columns, and
different columns can hold different data types (numeric, character, factor, etc.). Each column
is typically a vector of a single type, and all columns in a data frame have the same length.
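To make the idea of equal-length column vectors concrete, here is a minimal sketch. The data frame `df` and its column names are invented for illustration and do not come from these notes.

```r
# Hypothetical example data, not from the notes
df <- data.frame(
  id    = 1:3,
  label = c("a", "b", "c"),
  score = c(2.5, 3.0, 4.5)
)

# Each column is a vector with its own type
col_classes <- sapply(df, class)
print(col_classes)

# Every column has the same length, which equals the number of rows
col_lengths <- sapply(df, length)
print(col_lengths)

# Columns whose lengths cannot be recycled to a common length are rejected
bad <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
print(inherits(bad, "try-error"))
```

The last call is wrapped in `try()` only so that the script keeps running after the expected error.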

1. Creating a Data Frame

You can create a data frame using the data.frame() function.

# Creating a simple data frame
employee_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 28),
  Department = factor(c("HR", "Finance", "IT", "HR")),
  Salary = c(50000, 60000, 70000, 55000)
)

# Viewing the data frame
print(employee_data)

Output:

     Name Age Department Salary
1   Alice  25         HR  50000
2     Bob  30    Finance  60000
3 Charlie  35         IT  70000
4   David  28         HR  55000
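Once a data frame exists, it usually helps to inspect its shape and column types before working with it. The sketch below rebuilds `employee_data` so that it runs on its own, then uses a few standard inspection functions.

```r
# Rebuild the example data frame from the notes
employee_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 28),
  Department = factor(c("HR", "Finance", "IT", "HR")),
  Salary = c(50000, 60000, 70000, 55000)
)

# Compact overview: one line per column with its type and first values
str(employee_data)

# Number of rows and columns
shape <- dim(employee_data)
print(shape)

# Column names
column_names <- names(employee_data)
print(column_names)
```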

2. Accessing Data in a Data Frame

You can access specific rows, columns, or individual elements using various methods.

2.1. Accessing Columns:

You can access columns in a data frame using $ or by indexing.

# Accessing a single column
employee_data$Name

# Accessing a column by name
employee_data[, "Age"]

# Accessing multiple columns
employee_data[, c("Name", "Salary")]
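The different ways of selecting a column do not all return the same kind of object. A minimal sketch of the difference, with `employee_data` rebuilt so the block runs on its own:

```r
# Rebuild the example data frame from the notes
employee_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 28),
  Department = factor(c("HR", "Finance", "IT", "HR")),
  Salary = c(50000, 60000, 70000, 55000)
)

# $ and [[ ]] return the column itself, as a plain vector
ages_vector  <- employee_data$Age
ages_vector2 <- employee_data[["Age"]]

# Single brackets with one name return a one-column data frame
ages_df <- employee_data["Age"]

# [, "Age"] returns a vector unless drop = FALSE is given
ages_df2 <- employee_data[, "Age", drop = FALSE]

print(class(ages_vector))
print(class(ages_df))
```

Keeping a one-column result as a data frame is mostly useful when later code expects to call functions such as `nrow()` or `names()` on it.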

2.2. Accessing Rows:

You can access rows using the row index.

# Accessing the first row
employee_data[1, ]

# Accessing specific rows
employee_data[c(1, 3), ]
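Row indexing also works from the other end and with exclusions. The following sketch shows a few variants on the same example data, rebuilt here so the block is self-contained.

```r
# Rebuild the example data frame from the notes
employee_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 28),
  Department = factor(c("HR", "Finance", "IT", "HR")),
  Salary = c(50000, 60000, 70000, 55000)
)

# First two rows and last row
first_two <- head(employee_data, 2)
last_one  <- tail(employee_data, 1)

# A negative index drops rows: everything except the first row
all_but_first <- employee_data[-1, ]

print(first_two)
print(last_one)
print(all_but_first)
```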

2.3. Accessing Specific Elements:

You can access individual elements using [row, column].

# Accessing the element in row 2, column 4
employee_data[2, 4]

# Accessing the "Salary" of "Charlie"
employee_data[employee_data$Name == "Charlie", "Salary"]

3. Modifying Data Frames

You can easily add, remove, or modify data in a data frame.

3.1. Adding a New Column:

# Adding a new column "Experience" to the data frame
employee_data$Experience <- c(3, 5, 10, 4)
print(employee_data)
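A new column does not have to be typed in by hand. It can also be computed from existing columns. In the sketch below, the column names `Salary_k` and `Senior` and the age cutoff of 30 are my own choices for illustration.

```r
# Rebuild the example data frame from the notes
employee_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 28),
  Department = factor(c("HR", "Finance", "IT", "HR")),
  Salary = c(50000, 60000, 70000, 55000)
)

# Derived numeric column: salary in thousands (hypothetical column name)
employee_data$Salary_k <- employee_data$Salary / 1000

# Derived logical column: TRUE for employees aged 30 or older (assumed cutoff)
employee_data$Senior <- employee_data$Age >= 30

print(employee_data)
```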

3.2. Adding a New Row:

You can add a new row using rbind().

# Adding a new row
new_employee <- data.frame(
  Name = "Eve",
  Age = 27,
  Department = factor("IT", levels = levels(employee_data$Department)),
  Salary = 58000,
  Experience = 2
)

employee_data <- rbind(employee_data, new_employee)
print(employee_data)
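The example above builds the new row's factor with the existing levels. As a sketch of what happens when the new row carries a level that is not yet present, the block below appends a made-up employee in a made-up "Marketing" department. The name, age, salary, and department are all hypothetical.

```r
# Rebuild the example data frame from the notes
employee_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 28),
  Department = factor(c("HR", "Finance", "IT", "HR")),
  Salary = c(50000, 60000, 70000, 55000)
)
print(levels(employee_data$Department))

# Hypothetical new row with a department that is not an existing level
new_employee <- data.frame(
  Name = "Frank",
  Age = 41,
  Department = factor("Marketing"),
  Salary = 62000
)

# rbind() matches columns by name and expands factor levels as needed
combined <- rbind(employee_data, new_employee)
print(levels(combined$Department))
print(combined)
```

The column names of the two data frames have to match for `rbind()` to work. A row with a missing or misspelled column name produces an error.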

3.3. Removing a Column:


You can remove a column by assigning NULL to it.

# Removing the "Experience" column
employee_data$Experience <- NULL
print(employee_data)
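Besides adding and removing, existing values can be changed in place with the same indexing syntax. The sketch below uses invented changes (a new age for Bob and a 5% raise) purely for illustration.

```r
# Rebuild the example data frame from the notes
employee_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 28),
  Department = factor(c("HR", "Finance", "IT", "HR")),
  Salary = c(50000, 60000, 70000, 55000)
)

# Change a single cell: set Bob's Age (hypothetical new value)
employee_data[employee_data$Name == "Bob", "Age"] <- 31

# Update a whole column: apply a 5% raise (hypothetical rate)
employee_data$Salary <- employee_data$Salary * 1.05

# Alternative to NULL: keep every column except "Age" in a new data frame
without_age <- employee_data[, names(employee_data) != "Age"]

print(employee_data)
print(without_age)
```

Unlike assigning NULL, the last approach leaves `employee_data` untouched and returns a copy without the column.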

4. Built-in Datasets in R

R comes with several built-in datasets that are useful for learning and testing. Here are a few
commonly used ones:

4.1. The mtcars Dataset:

The mtcars dataset describes 32 car models from the 1974 Motor Trend magazine, with 11
variables such as miles per gallon (mpg), number of cylinders (cyl), and horsepower (hp).

# Loading the mtcars dataset (optional: built-in datasets are available without this call)
data(mtcars)

# Viewing the first few rows
head(mtcars)
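One detail of mtcars that is easy to miss is that the car names are stored as row names rather than in a column. A minimal sketch of how to check this and move them into a regular column follows. The copy `cars` and the column name `model` are my own choices.

```r
# Size of the dataset: rows and columns
print(dim(mtcars))

# The car model names live in the row names
print(head(rownames(mtcars)))

# Work on a copy and move the row names into an ordinary column
cars <- mtcars
cars$model <- rownames(cars)
rownames(cars) <- NULL

print(head(cars[, c("model", "mpg", "cyl", "hp")]))
```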

Example Analysis:

# Calculating the average miles per gallon
mean_mpg <- mean(mtcars$mpg)
mean_mpg

# Plotting horsepower vs. miles per gallon
plot(mtcars$hp, mtcars$mpg, main = "Horsepower vs. Miles Per Gallon",
     xlab = "Horsepower", ylab = "Miles Per Gallon", col = "blue", pch = 19)
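An overall mean hides differences between groups of cars. As a small extension of the analysis above, this sketch compares the average mpg across cylinder counts. The variable names are my own.

```r
# How many cars have 4, 6, and 8 cylinders
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# Average miles per gallon within each cylinder group
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)
```

In this dataset the average mpg drops as the number of cylinders goes up.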

4.2. The iris Dataset:

The iris dataset is famous in data science and machine learning. It contains measurements, in
centimeters, of 150 flowers from three iris species (setosa, versicolor, virginica), 50 per
species. The variables are sepal length, sepal width, petal length, petal width, and species.

# Loading the iris dataset
data(iris)

# Viewing the structure of the dataset
str(iris)

# Summary statistics
summary(iris)
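A few quick checks complement str() and summary(). The sketch below counts the flowers per species and averages the four numeric columns. The variable names are my own.

```r
# Number of rows and columns
print(dim(iris))

# Flowers per species
species_counts <- table(iris$Species)
print(species_counts)

# Column types: four numeric measurements and one factor
col_classes <- sapply(iris, class)
print(col_classes)

# Means of the four measurement columns
measurement_means <- colMeans(iris[, 1:4])
print(measurement_means)
```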

Example Analysis:

# Boxplot of Sepal Length by Species
boxplot(Sepal.Length ~ Species, data = iris,
        main = "Sepal Length by Species",
        xlab = "Species", ylab = "Sepal Length",
        col = c("red", "green", "blue"))

# Scatter plot of Sepal Length vs. Sepal Width
# Index a color vector by the Species factor so the points match the legend
species_colors <- c("red", "green", "blue")
plot(iris$Sepal.Length, iris$Sepal.Width, col = species_colors[iris$Species],
     main = "Sepal Length vs. Sepal Width",
     xlab = "Sepal Length", ylab = "Sepal Width",
     pch = 19)
legend("topright", legend = levels(iris$Species), col = species_colors,
       pch = 19)

5. Common Data Frame Operations

Here are a few common operations often performed on data frames:

5.1. Filtering Rows:

You can filter rows based on conditions.

# Filter rows where Salary is greater than 55000
high_salary <- employee_data[employee_data$Salary > 55000, ]
print(high_salary)
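Conditions can be combined with & (and) and | (or), and the base function subset() offers a shorter way to write the same filters. The sketch below rebuilds the original four-row `employee_data` so it runs on its own. The thresholds are arbitrary choices for illustration.

```r
# Rebuild the original four-row example data frame
employee_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 28),
  Department = factor(c("HR", "Finance", "IT", "HR")),
  Salary = c(50000, 60000, 70000, 55000)
)

# Two conditions that must both hold
hr_high <- employee_data[employee_data$Department == "HR" &
                           employee_data$Salary > 52000, ]
print(hr_high)

# Either condition may hold; subset() lets you use column names directly
young_or_it <- subset(employee_data, Age < 30 | Department == "IT")
print(young_or_it)

# Filter rows and keep only some columns in one call
names_only <- subset(employee_data, Salary > 55000, select = c(Name, Salary))
print(names_only)
```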

5.2. Sorting Data Frames:

You can sort data frames by one or more columns.

# Sort the employee data by Salary in descending order
sorted_employee_data <- employee_data[order(employee_data$Salary, decreasing = TRUE), ]
print(sorted_employee_data)
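Sorting by more than one column works by passing several keys to order(). As a sketch, the block below sorts by Department first and then by Salary from highest to lowest within each department. It rebuilds the original four-row data so it runs on its own.

```r
# Rebuild the original four-row example data frame
employee_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 28),
  Department = factor(c("HR", "Finance", "IT", "HR")),
  Salary = c(50000, 60000, 70000, 55000)
)

# First key: Department (in the order of its factor levels)
# Second key: Salary, negated so that larger salaries come first
by_dept_then_salary <- employee_data[order(employee_data$Department,
                                           -employee_data$Salary), ]
print(by_dept_then_salary)
```

Negating a key only works for numeric columns. For a character key, one option is to reverse the whole sort with decreasing = TRUE.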

5.3. Aggregating Data:

You can aggregate data using the aggregate() function.

# Calculate the average Salary by Department
avg_salary_by_dept <- aggregate(Salary ~ Department, data = employee_data, mean)
print(avg_salary_by_dept)
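aggregate() is one of several base R tools for grouped summaries. This sketch shows group counts with table(), a grouped mean with tapply(), and aggregate() applied to two columns at once. It rebuilds the original four-row data, and the variable names are my own.

```r
# Rebuild the original four-row example data frame
employee_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 28),
  Department = factor(c("HR", "Finance", "IT", "HR")),
  Salary = c(50000, 60000, 70000, 55000)
)

# Number of employees per department
dept_counts <- table(employee_data$Department)
print(dept_counts)

# Mean salary per department as a named vector
avg_salary <- tapply(employee_data$Salary, employee_data$Department, mean)
print(avg_salary)

# Mean of several columns per department in one data frame
avg_by_dept <- aggregate(cbind(Salary, Age) ~ Department,
                         data = employee_data, FUN = mean)
print(avg_by_dept)
```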

6. Application Examples using Built-in Datasets

6.1. Social Science: Analyzing Demographic Data using mtcars

The variables in mtcars are not demographic, but they can stand in for socio-economic features
to practice the same workflow: measure how strongly two numeric variables are related, then
plot them.

# Analyzing the relationship between weight (wt) and miles per gallon (mpg)
cor(mtcars$wt, mtcars$mpg)

# Visualization
plot(mtcars$wt, mtcars$mpg, main = "Car Weight vs. Miles Per Gallon",
     xlab = "Weight (1000 lbs)", ylab = "Miles Per Gallon", col = "purple",
     pch = 18)
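A correlation gives the strength and direction of the relationship, while a simple linear regression also estimates how much mpg changes per unit of weight. The following is a minimal sketch of that next step, not part of the original notes. The variable names are my own.

```r
# Correlation between weight and miles per gallon
r <- cor(mtcars$wt, mtcars$mpg)
print(r)

# Simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# With a single predictor, R-squared equals the squared correlation
r_squared <- summary(fit)$r.squared
print(r_squared)
```

The negative slope matches the negative correlation: heavier cars in this dataset tend to have lower mpg.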

6.2. Business Analytics: Customer Segmentation using iris

You can treat the species in iris as customer segments and analyze their characteristics.

# Calculate the average Sepal Length for each Species
avg_sepal_length <- aggregate(Sepal.Length ~ Species, data = iris, mean)
print(avg_sepal_length)

# Boxplot comparison of Petal Length across species
boxplot(Petal.Length ~ Species, data = iris,
        main = "Petal Length by Species",
        xlab = "Species", ylab = "Petal Length",
        col = c("pink", "yellow", "cyan"))
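Profiling a segment usually means summarizing every measurement, not just one. As a sketch, the dot in the formula below tells aggregate() to average all remaining columns for each species. The variable names are my own.

```r
# Mean of all four measurements for each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)

# Which species has the largest average petal length
longest_petals <- as.character(
  species_means$Species[which.max(species_means$Petal.Length)]
)
print(longest_petals)
```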

Conclusion

Data frames are the go-to structure for most tabular data in R. Understanding how to create,
manipulate, and analyze data frames is crucial for any data analysis work. With built-in
datasets like mtcars and iris, you can practice these concepts and explore more complex
analyses.

Happy analyzing!
