Statistics and Data Science with R Part -4

Uploaded by

Mahima Mehra
Statistics and Data Science with R
Arrays

• Arrays in R are multi-dimensional data structures created using the array() function.

• Arrays store elements of the same data type and can have one or more dimensions; a two-dimensional array is a matrix.

• Elements are accessed by giving one index per dimension.

• Arrays are widely used for handling multi-dimensional data in scientific and computational applications.
2D Array

# Creating a 2D array
data_values <- c(1, 2, 3, 4, 5, 6)                 # Input data
my_2d_array <- array(data_values, dim = c(3, 2))   # 3x2 array, filled column by column
print(my_2d_array)

# Accessing elements of an array using indices
element <- my_2d_array[1, 1]   # 1st row, 1st column
print(element)
3D Array

# Creating a 3D array holding the values 1 to 24 (3 rows, 4 columns, 2 layers)
my_array <- array(data = 1:24, dim = c(3, 4, 2))
print(my_array)

# Accessing elements of an array using indices
element <- my_array[2, 3, 1]   # Element in the 2nd row, 3rd column, and 1st 'layer'
print(element)

# Changing the dimensions of an array
# (the new dimensions must multiply to the same number of elements: 4 * 3 * 2 = 24)
dim(my_array) <- c(4, 3, 2)
print(my_array)
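
The indexing and reshaping steps above rely on R filling arrays column by column. The following is a minimal sketch of that behaviour, using a small hypothetical array `a` and illustrative dimension labels that are not part of the original slides:

```r
# Hypothetical example array: 2 rows, 3 columns, 2 layers
a <- array(1:12, dim = c(2, 3, 2))

# R fills arrays column by column, so the first column of layer 1 holds 1 and 2
first_col <- a[, 1, 1]
print(first_col)

# Optional labels for each dimension (names chosen only for illustration)
dimnames(a) <- list(c("r1", "r2"), c("c1", "c2", "c3"), c("layer1", "layer2"))
by_name <- a["r2", "c3", "layer1"]   # same element as a[2, 3, 1]
print(by_name)

# Reassigning dim() keeps the stored values in the same order and only
# changes the shape; it also drops any dimnames
b <- a
dim(b) <- c(3, 2, 2)
print(b[, , 1])
```

Because only the shape changes, the element that was at one position before reshaping will generally sit at a different row and column afterwards.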
Data Frame

• Data frames in R are tabular data structures representing rows and columns, similar to a spreadsheet or database table.

• They are created using the data.frame() function and allow for handling heterogeneous data: each column can hold a different data type.

• Accessing elements involves indexing based on row and column positions or by column names.

• Data frames are fundamental for data manipulation, exploration, and analysis in R and are extensively used in data science workflows.
Creating a Data Frame

# Creating a data frame with different data types
my_data_frame <- data.frame(
  ID = 1:3,
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Passed = c(TRUE, FALSE, TRUE)
)

print(my_data_frame)
Accessing Data Frame Elements

# Accessing elements of a data frame using row and column indices
element <- my_data_frame[2, 3]   # Element in the 2nd row and 3rd column (Bob's Age)
print(element)

# Accessing a column by its name
ages <- my_data_frame$Age   # Extracting the 'Age' column
print(ages)
Manipulating a Data Frame

# Adding a new column to the data frame
my_data_frame$City <- c("New York", "Los Angeles", "Chicago")
print(my_data_frame)

# Removing a column from the data frame
my_data_frame <- my_data_frame[, -3]   # Removes the 3rd column (Age)
print(my_data_frame)
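
Indexing by row and column also covers filtering rows by a condition and working with columns by name. The sketch below uses a hypothetical data frame `students` with the same structure as the one in the slides; it is one possible way to do this in base R, not the only one:

```r
# Hypothetical data frame mirroring the structure used in the slides
students <- data.frame(
  ID = 1:3,
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Passed = c(TRUE, FALSE, TRUE)
)

# Keep only the rows where a condition holds (a logical vector as row index)
passed_only <- students[students$Passed, ]

# Select several columns by name
names_and_ages <- students[, c("Name", "Age")]

# Remove a column by name; unlike a position such as -3,
# this does not depend on the current column order
students$Age <- NULL

print(passed_only)
print(names(students))
```

Removing columns by name tends to be safer in longer scripts, because adding or reordering columns earlier in the code does not silently change which column is dropped.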
Factor

• A factor is a categorical variable used to represent qualitative data. Factors are used to define and store categorical data, which can be nominal or ordinal in nature.

• They represent levels or categories within a variable and are particularly useful in statistical modeling, especially when dealing with categorical data in analyses or visualizations.

• Factors are created using the factor() function or by specifying the factor type when importing data.
Factor

Levels
• Factors have levels that define unique categories
or groups within the variable.
• Levels can be predefined or assigned explicitly
using the levels argument.

Ordered Factors
• Factors can be ordered or unordered. Ordered
factors have a specific sequence or hierarchy
among their levels (e.g., low, medium, high).
Creating a Factor

# Creating a factor; the levels are inferred from the data and sorted alphabetically
gender <- c("Male", "Female", "Female", "Male", "Male")
factor_gender <- factor(gender)
print(factor_gender)

# Creating a factor with custom levels and specifying the order
temperature <- c("Low", "Medium", "High", "Low")
factor_temperature <- factor(temperature,
                             levels = c("Low", "Medium", "High"),
                             ordered = TRUE)
print(factor_temperature)
Viewing and Modifying Factors

# Viewing the levels of a factor
print(levels(factor_gender))

# Changing the levels of a factor (here, adding a new top level "Very High")
factor_temperature <- factor(factor_temperature,
                             levels = c("Low", "Medium", "High", "Very High"))
print(factor_temperature)
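
To illustrate what the ordering of levels makes possible, here is a small sketch using a hypothetical ordered factor `sizes` (the name and values are chosen only for this example):

```r
# Hypothetical ordered factor with an explicit level order
sizes <- factor(c("Low", "High", "Medium", "Low"),
                levels = c("Low", "Medium", "High"),
                ordered = TRUE)

# Ordered factors support comparisons against a level
is_above_low <- sizes > "Low"
print(is_above_low)

# Count how many observations fall in each level
counts <- table(sizes)
print(counts)

# Internally, each value is stored as the integer position of its level
codes <- as.integer(sizes)
print(codes)
```

Comparisons such as `>` are only meaningful for ordered factors; on an unordered factor R returns NA with a warning.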
Data Import

In R, importing and exporting data is a fundamental task, as it allows you to bring data
from external sources into your R environment for analysis and export results for further
use or sharing.

1. CSV Files
Reading CSV files: Use the base R function read.csv(), or read_csv() from the readr package.
install.packages("readr")   # only needed once
library(readr)
data <- read_csv("path/to/your/file.csv")

2. Excel Files
Reading Excel files: Use the readxl package, which provides the read_excel() function.
install.packages("readxl")   # only needed once
library(readxl)
data <- read_excel("path/to/your/file.xlsx", sheet = 1)
Data Export

1. CSV Files
Writing to CSV files: Use the base R function write.csv(), or write_csv() from the readr package.
write.csv(data, "path/to/save/your/file.csv")
write_csv(data, "path/to/save/your/file.csv")

2. Excel Files
Writing to Excel files: Use the writexl package, which provides the write_xlsx() function.
install.packages("writexl")   # only needed once
library(writexl)
write_xlsx(data, "path/to/save/your/file.xlsx")
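
The export and import steps can be checked together with a round trip. The sketch below uses only base R and a temporary file, so it runs without extra packages; the data frame `scores` and the variable names are hypothetical:

```r
# Hypothetical data to export
scores <- data.frame(Name = c("Alice", "Bob"), Score = c(90.5, 72))

# Write to a temporary CSV file; row.names = FALSE avoids an extra
# column of row numbers in the output
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Read the file back into a new data frame
scores_back <- read.csv(csv_path)
print(scores_back)

# Clean up the temporary file
unlink(csv_path)
```

By default write.csv() also writes the row names as the first column, which then appears as an extra column on re-import; passing row.names = FALSE is a common way to avoid that.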
Built-in Datasets

To list all available datasets in R, you can use the data() function with no arguments:

# Load the datasets package (it's usually loaded by default)
library(datasets)

# List all built-in datasets
data()
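
As a small sketch of how the listing can be used in a script rather than read interactively, the example below restricts the listing to the datasets package and then uses two of its datasets by name. The variable name `dataset_index` is an assumption for illustration:

```r
# List only the datasets that ship with the 'datasets' package;
# the 'results' component is a character matrix with one row per dataset
dataset_index <- data(package = "datasets")$results
print(head(dataset_index[, c("Item", "Title")]))

# Built-in datasets can be used directly by name
print(dim(mtcars))
print(head(iris, 3))
```

Help pages for individual datasets are available in the usual way, for example with ?mtcars.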
Introduction to dplyr

• The dplyr package in R is a powerful tool for data manipulation and transformation.

• It's particularly popular due to its intuitive syntax and efficiency when working with data frames.

• The dplyr package is part of the tidyverse and provides a set of functions to manipulate data in a data frame. Key functions include select(), filter(), mutate(), arrange(), and summarise().

• In dplyr, the %>% operator, known as the pipe operator, is used to chain together multiple operations, making the code more readable and concise.
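
To make the role of the pipe concrete, the sketch below writes the same three steps once as nested function calls and once as a piped chain. It assumes dplyr is already installed and uses the built-in mtcars dataset; the variable names are illustrative:

```r
suppressPackageStartupMessages(library(dplyr))

# Nested calls: has to be read from the inside out
nested <- arrange(filter(select(mtcars, mpg, cyl, hp), cyl == 4), desc(mpg))

# The same steps with the pipe: read top to bottom
piped <- mtcars %>%
  select(mpg, cyl, hp) %>%   # keep three columns
  filter(cyl == 4) %>%       # keep 4-cylinder cars
  arrange(desc(mpg))         # sort by mpg, highest first

# Both forms produce the same result
print(identical(nested, piped))
print(head(piped, 3))
```

The pipe passes the result on its left as the first argument of the function on its right, which is why each dplyr verb takes the data frame as its first argument.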
Data Manipulation and transformation using dplyr

# Install and load the dplyr package
install.packages("dplyr")   # only needed once
library(dplyr)

# Load the mtcars dataset
data("mtcars")

# View the dataset
View(mtcars)

# Show the structure of the dataset
str(mtcars)
# Top 6 rows of the dataset
head(mtcars)

# Bottom 6 rows of the dataset
tail(mtcars)

# Show statistical summary of the dataset
summary(mtcars)

# Selecting Variables
# You can use the select() function to choose specific columns.

# Select specific columns (e.g., mpg and cyl)
selected_data <- mtcars %>%
  select(mpg, cyl)
View(head(selected_data))
# Renaming Variables
# You can use the rename() function to rename columns (new_name = old_name).
renamed_data <- selected_data %>%
  rename(
    Miles_per_Gallon = mpg,
    Cylinder_Count = cyl
  )

# Display the result
View(head(renamed_data))
# Filtering Rows
# The filter() function allows you to subset rows based on certain conditions.

# Filter rows where mpg is greater than 20
filtered_data <- mtcars %>%
  filter(mpg > 20)

# Display the result
View(head(filtered_data))
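
filter() also accepts several conditions at once. The following sketch, which assumes dplyr is installed and uses illustrative thresholds and variable names, shows conditions combined with AND and with OR:

```r
suppressPackageStartupMessages(library(dplyr))

# Conditions separated by commas must all hold (logical AND)
efficient_four_cyl <- mtcars %>%
  filter(mpg > 25, cyl == 4)

# Use | when either condition is enough (logical OR)
efficient_or_powerful <- mtcars %>%
  filter(mpg > 30 | hp > 300)

print(nrow(efficient_four_cyl))
print(nrow(efficient_or_powerful))
```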
# Mutating and Transforming Data
# Use the mutate() function to create new variables or modify existing ones.

# Add a new column that calculates the power-to-weight ratio
mutated_data <- mtcars %>%
  mutate(
    Power_to_Weight = hp / wt
  )

# Display the result
View(head(mutated_data))
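
Within a single mutate() call, a column created earlier can be used by later expressions. The sketch below assumes dplyr is installed and relies on the fact that wt in mtcars is recorded in units of 1000 lbs; the new column names are hypothetical:

```r
suppressPackageStartupMessages(library(dplyr))

converted <- mtcars %>%
  mutate(
    wt_kg = wt * 1000 * 0.45359237,      # wt is in 1000 lbs; convert to kilograms
    hp_per_tonne = hp / (wt_kg / 1000)   # uses wt_kg defined on the line above
  )

print(head(converted[, c("hp", "wt", "wt_kg", "hp_per_tonne")]))
```

Assigning to an existing column name inside mutate() replaces that column instead of adding a new one.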
# Grouping and Summarizing Data
# The group_by() function groups data by one or more variables, and summarise()
# calculates summary statistics for each group.

# Group by the number of cylinders and calculate average mpg and hp
grouped_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    Avg_MPG = mean(mpg),
    Avg_HP = mean(hp),
    Count = n()
  )

# Display the result
print(grouped_summary)
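
The arrange() function listed among the key dplyr functions can be added as a final step to sort a grouped summary. This is a minimal sketch assuming dplyr is installed; the name `ranked_summary` is illustrative:

```r
suppressPackageStartupMessages(library(dplyr))

# Summarise by cylinder count, then sort groups from highest to lowest average mpg
ranked_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    Avg_MPG = mean(mpg),
    Count = n()
  ) %>%
  arrange(desc(Avg_MPG))

print(ranked_summary)
```

Without desc(), arrange() sorts in ascending order.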
© 2024 Grant Thornton Bharat LLP. All rights reserved.