
Statistics and Data Science With R Part - 4

Uploaded by

Mahima Mehra


Statistics and Data Science with R

Arrays

• Arrays in R are multi-dimensional data structures created using the array() function.

• Arrays store elements of a single data type and can have any number of dimensions (a two-dimensional array is a matrix).

• Elements are accessed by giving one index per dimension of the array.

• Arrays are widely used for handling multi-dimensional data in scientific and computational applications.

2D Array

# Creating a 2D array
data_values <- c(1, 2, 3, 4, 5, 6) # Input data
my_2d_array <- array(data_values, dim = c(3, 2)) # 3x2 array, filled column by column
print(my_2d_array)

# Accessing an element of the array using its row and column indices
element <- my_2d_array[1, 1] # 1st row, 1st column
print(element)

3D Array

# Creating a 3D array from the sequence 1 to 24
my_array <- array(data = 1:24, dim = c(3, 4, 2))
print(my_array)

# Accessing an element of the array using its indices
element <- my_array[2, 3, 1] # Element in the 2nd row, 3rd column, and 1st 'layer'
print(element)

# Changing the dimensions of the array (the total number of elements must stay 24)
dim(my_array) <- c(4, 3, 2)
print(my_array)

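Assigning to dim() relabels the same underlying vector rather than rearranging it, because R stores array elements in column-major order. The following is a minimal sketch of that behaviour; the array name `a` is illustrative and not part of the slides.

```r
# Illustrative array: values 1..24 stored column by column
a <- array(1:24, dim = c(3, 4, 2))
before <- a[2, 3, 1]            # row 2, column 3, layer 1
print(before)                   # 8

# Reshape in place; the underlying data order is unchanged
dim(a) <- c(4, 3, 2)
after_same_index <- a[2, 3, 1]  # same indices now point at a different value
print(after_same_index)         # 10

# Locate where the original value ended up after reshaping
print(which(a == before, arr.ind = TRUE))
```

The same indices return a different element after reshaping, so code that relies on fixed positions should be re-checked whenever dimensions change.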
Data Frame

• Data frames in R are tabular data structures made up of rows and columns, similar to a spreadsheet or database table.

• They are created using the data.frame() function and can hold heterogeneous data: each column has a single type, but different columns can have different types.

• Elements are accessed by row and column position or by column name.

• Data frames are fundamental for data manipulation, exploration, and analysis in R and are used extensively in data science workflows.

Creating a Data Frame

# Creating a data frame with different data types
my_data_frame <- data.frame(
  ID = 1:3,
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Passed = c(TRUE, FALSE, TRUE)
)

print(my_data_frame)

Accessing Data Frame Elements

# Accessing an element of a data frame using row and column indices
element <- my_data_frame[2, 3] # Element in the 2nd row and 3rd column
print(element)

# Accessing a column by its name
ages <- my_data_frame$Age # Extracting the 'Age' column
print(ages)

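Column names and logical conditions can be combined inside the square brackets. The sketch below rebuilds a data frame with the same shape as the one on the slide under the illustrative name `df`, so that it runs on its own.

```r
# Illustrative data frame (same columns as the slide example)
df <- data.frame(
  ID = 1:3,
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Passed = c(TRUE, FALSE, TRUE)
)

# A single column by name; [[ ]] is equivalent to $ here
ages <- df[["Age"]]

# Rows selected by a logical condition, one column selected by name
passed_names <- df[df$Passed, "Name"]

# Selecting several columns by name keeps the data frame structure
older_than_23 <- df[df$Age > 23, c("Name", "Age")]

print(ages)
print(passed_names)
print(older_than_23)
```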
Manipulating a Data Frame

# Adding a new column to the data frame
my_data_frame$City <- c("New York", "Los Angeles", "Chicago")
print(my_data_frame)

# Removing a column from the data frame
my_data_frame <- my_data_frame[, -3] # Removes the 3rd column (Age)
print(my_data_frame)

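Removing a column by position breaks silently if the column order changes, so removing it by name is often safer. Two common base R ways to do that are sketched here, using an illustrative data frame `df` rather than the slide's object.

```r
# Illustrative data frame
df <- data.frame(
  ID = 1:3,
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Option 1: assign NULL to the column
df_a <- df
df_a$Age <- NULL

# Option 2: keep every column whose name is not "Age"
df_b <- df[, setdiff(names(df), "Age"), drop = FALSE]

print(names(df_a))
print(names(df_b))
```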
Factor

• A factor is a categorical variable used to represent qualitative data. Factors define and store categorical data, which can be nominal or ordinal.

• They represent the levels (categories) within a variable and are particularly useful in statistical modeling and in visualizations that involve categorical data.

• Factors are created using the factor() function or by specifying the factor type when importing data.

Levels
• Factors have levels that define the unique categories or groups within the variable.
• Levels can be inferred from the data or assigned explicitly using the levels argument.

Ordered Factors
• Factors can be ordered or unordered. Ordered factors have a specific sequence or hierarchy among their levels (e.g., low, medium, high).

Creating a Factor

# Creating a factor; the levels are inferred from the data and sorted alphabetically
gender <- c("Male", "Female", "Female", "Male", "Male")
factor_gender <- factor(gender)
print(factor_gender)

# Creating a factor with custom levels and specifying the order
temperature <- c("Low", "Medium", "High", "Low")
factor_temperature <- factor(temperature, levels = c("Low", "Medium", "High"), ordered = TRUE)
print(factor_temperature)

Viewing and Modifying Factors

# Viewing the levels of a factor
print(levels(factor_gender))

# Changing the levels of a factor: adding a new level, "Very High" (existing values are unchanged)
factor_temperature <- factor(factor_temperature, levels = c("Low", "Medium", "High", "Very High"))
print(factor_temperature)

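The practical difference between ordered and unordered factors shows up in comparisons. The sketch below recreates the temperature factor under the illustrative name `temp_f` and shows what the level order enables.

```r
temperature <- c("Low", "Medium", "High", "Low")
temp_f <- factor(temperature, levels = c("Low", "Medium", "High"), ordered = TRUE)

# Ordered factors support comparison operators based on level order
below_high <- temp_f < "High"
print(below_high)          # TRUE TRUE FALSE TRUE

# Frequency of each level, reported in level order
print(table(temp_f))

# as.integer() returns the underlying level codes, not the labels
print(as.integer(temp_f))  # 1 2 3 1
```

For an unordered factor the `<` comparison is not meaningful: R returns NA with a warning.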
Data Import

In R, importing and exporting data is a fundamental task: it lets you bring data from external sources into your R environment for analysis and export results for further use or sharing.

1. CSV Files
Reading CSV files: Use the base R read.csv() function or read_csv() from the readr package.
install.packages("readr") # Run once to install the package
library(readr)
data <- read_csv("path/to/your/file.csv")

2. Excel Files
Reading Excel files: Use the readxl package, which provides the read_excel() function.
install.packages("readxl") # Run once to install the package
library(readxl)
data <- read_excel("path/to/your/file.xlsx", sheet = 1)

Data Export

1. CSV Files
Writing to CSV files: Use the base R write.csv() function or write_csv() from the readr package.
write.csv(data, "path/to/save/your/file.csv") # Base R; also writes row names unless row.names = FALSE
write_csv(data, "path/to/save/your/file.csv") # readr; requires library(readr)

2. Excel Files
Writing to Excel files: Use the writexl package.
install.packages("writexl") # Run once to install the package
library(writexl)
write_xlsx(data, "path/to/save/your/file.xlsx")

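A quick way to check an export is to write a small data frame and read it back. This sketch uses only base R, so no packages are needed; the data frame `scores` and the temporary path are illustrative placeholders.

```r
# Illustrative data to export
scores <- data.frame(Name = c("Alice", "Bob"), Score = c(90.5, 82))

# Placeholder path in the session's temporary directory
csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE stops write.csv() from adding a column of row numbers
write.csv(scores, csv_path, row.names = FALSE)

# Read the file back and compare it with the original
restored <- read.csv(csv_path)
print(restored)
print(all.equal(scores, restored))

unlink(csv_path)  # Clean up the temporary file
```

Without row.names = FALSE, the file gains an extra unnamed first column, which read.csv() then imports as a column called X.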
Built-in Datasets

To list all available datasets in R, you can use the data() function with no arguments:

# Load the datasets package (it's usually loaded by default)
library(datasets)

# List all built-in datasets
data()

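Calling data() opens an interactive listing, which is awkward to use in scripts. The sketch below, restricted to the datasets package, pulls the dataset names out of the returned object instead; the variable names are illustrative.

```r
# data() returns an object whose $results matrix has one row per dataset
catalog <- data(package = "datasets")$results

# The "Item" column holds the dataset names
dataset_names <- catalog[, "Item"]
print(head(dataset_names))
print("mtcars" %in% dataset_names)

# Built-in datasets can be used directly by name
print(dim(mtcars))  # 32 rows, 11 columns
```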
Introduction to dplyr

• The dplyr package in R is a powerful tool for data manipulation and transformation.

• It is particularly popular for its intuitive syntax and its efficiency when working with data frames.

• The dplyr package is part of the tidyverse and provides a set of functions for manipulating data in a data frame. Key functions include select(), filter(), mutate(), arrange(), and summarise().

• In dplyr, the %>% operator, known as the pipe operator, chains multiple operations together, making the code more readable and concise.

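To see what the pipe buys in readability, the sketch below writes the same three-step operation twice, once as nested calls and once as a pipeline. It assumes dplyr is already installed; the object names `nested` and `piped` are illustrative.

```r
library(dplyr)

# Nested calls: read from the inside out
nested <- arrange(select(filter(mtcars, mpg > 25), mpg, hp), desc(mpg))

# Piped calls: read top to bottom, one step per line
piped <- mtcars %>%
  filter(mpg > 25) %>%
  select(mpg, hp) %>%
  arrange(desc(mpg))

print(piped)
print(identical(nested, piped))  # Both forms produce the same result
```

The pipe passes the result on its left as the first argument of the function on its right, which is why each step can omit the data argument.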
Data Manipulation and Transformation Using dplyr

# Install and load the dplyr package
install.packages("dplyr") # Run once to install the package
library(dplyr)

# Load the mtcars dataset
data("mtcars")

# View the dataset (opens a spreadsheet-style viewer)
View(mtcars)

# Show the structure of the dataset
str(mtcars)

# Top 6 rows of the dataset
head(mtcars)

# Bottom 6 rows of the dataset
tail(mtcars)

# Show a statistical summary of the dataset
summary(mtcars)

# Selecting Variables
# You can use the select() function to choose specific columns.

# Select specific columns (e.g., mpg and cyl)
selected_data <- mtcars %>%
  select(mpg, cyl)
View(head(selected_data))

# Renaming Variables
# You can use the rename() function to rename columns, using new_name = old_name.
renamed_data <- selected_data %>%
  rename(
    Miles_per_Gallon = mpg,
    Cylinder_Count = cyl
  )

# Display the result
View(head(renamed_data))

# Filtering Rows
# The filter() function allows you to subset rows based on certain conditions.

# Filter rows where mpg is greater than 20
filtered_data <- mtcars %>%
  filter(mpg > 20)

# Display the result
View(head(filtered_data))

# Mutating and Transforming Data
# Use the mutate() function to create new variables or modify existing ones.

# Add a new column that calculates the power-to-weight ratio (hp per 1000 lbs, since wt is in units of 1000 lbs)
mutated_data <- mtcars %>%
  mutate(
    Power_to_Weight = hp / wt
  )

# Display the result
View(head(mutated_data))

# Grouping and Summarizing Data
# The group_by() function groups data by one or more variables, and summarise()
# calculates summary statistics for each group.

# Group by the number of cylinders and calculate the average mpg and hp and the number of cars
grouped_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    Avg_MPG = mean(mpg),
    Avg_HP = mean(hp),
    Count = n()
  )

# Display the result
print(grouped_summary)

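arrange() is listed among the key dplyr functions but is not demonstrated in the examples. A minimal sketch of sorting with it follows, assuming dplyr is installed; the object name `sorted_data` is illustrative.

```r
library(dplyr)

# Sort by cylinder count (ascending), then by mpg within each count (descending)
sorted_data <- mtcars %>%
  arrange(cyl, desc(mpg))

# Display the first rows of the two sorting columns
print(head(sorted_data[, c("mpg", "cyl")]))
```

Wrapping a column in desc() reverses its sort order; columns listed later only break ties among the earlier ones.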

© 2024 Grant Thornton Bharat LLP. All rights reserved.
