
Data Transformation 1 Reviewed


Uploaded by

JORDI MASFERRER


[IN007]

Data Analysis Tools

Data Transformation

IN007 – Data Analysis Tools Pág. 1

Basics
■ Coding basics

■ Tibbles and data frames basics
■ Flights ➔ library(nycflights13)

■ View(flights)
■ glimpse(flights)
■ ?flights

■ Data types

■ Data transformations – introduction

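■ The five key dplyr functions (verbs):
– filter(): pick observations (rows) by their values
– arrange(): reorder the rows
– select(): pick variables (columns) by their names
– mutate(): create new variables from existing ones
– summarise(): collapse many values down to a single summary
– All of these can be used in conjunction with group_by(), which changes the scope of each function from operating on the entire dataset to operating on it group by group. Together, these six functions provide the verbs for a language of data manipulation.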

■ Make sure you have installed (install.packages("dplyr"), only once) and loaded the dplyr package (in every session):

■ library(dplyr)

■ <VERB>(<data frame>, <manipulations>)

– …and remember to assign the result to a variable if you want to use it later: the verbs return a new data frame and never modify the original one.
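A minimal sketch of the <VERB>(<data frame>, <manipulations>) pattern. It uses a small invented data frame (toy) instead of flights, so it runs without nycflights13; the values are made up for illustration only.

```r
library(dplyr)

# Toy data frame standing in for flights (values invented for illustration)
toy <- tibble(month = c(1, 1, 2), day = c(1, 2, 1), dep_delay = c(5, -3, 40))

# Every verb takes a data frame as its first argument and returns a NEW data frame
filter(toy, month == 1)          # printed, then discarded
jan <- filter(toy, month == 1)   # stored for later use

nrow(toy)  # 3: the original data frame is not modified
nrow(jan)  # 2
```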

Filter
■ Filter
■ Allows you to subset observations based on their values.
■ The first argument is the name of the data frame (you already know that)
■ The second and subsequent arguments are the expressions that filter the data frame.
■ For example, we can select all flights on January 1st with:
■ filter(flights, month == 1, day == 1)
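To try filter() without loading nycflights13, here is a sketch on a hypothetical mini version of flights that keeps only a few columns. It also shows that comma-separated conditions are combined with AND, exactly like &.

```r
library(dplyr)

# Hypothetical mini version of flights (invented rows, only a few columns)
toy <- tibble(
  month = c(1, 1, 1, 2, 12),
  day   = c(1, 1, 2, 1, 1),
  dest  = c("IAH", "MIA", "BQN", "ATL", "ORD")
)

# Comma-separated conditions are combined with AND
jan1 <- filter(toy, month == 1, day == 1)
jan1$dest   # "IAH" "MIA"

# Same result written with the & operator
jan1_and <- filter(toy, month == 1 & day == 1)
```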

■ Just run the command to see the results

■ Assign it to a variable if you want to keep the results for further use

■ To assign and print at the same time, wrap the whole assignment in parentheses:
■ (jan1 <- filter(flights, month == 1, day == 1))

■ Comparison operators:
■ ==, >, >=, <, <=, !=
■ For floating-point numbers, use near(x, y) instead of ==
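Why near() instead of ==: computers use finite-precision arithmetic, so two numbers that should be equal may differ in the last digits. A short check (near() is the dplyr function mentioned above):

```r
library(dplyr)

# Finite-precision arithmetic makes == fail on values that "should" be equal
sqrt(2)^2 == 2        # FALSE
1 / 49 * 49 == 1      # FALSE

# near() compares with a small built-in tolerance
near(sqrt(2)^2, 2)    # TRUE
near(1 / 49 * 49, 1)  # TRUE
```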

■ Logical operators

■ Can you tell the result of the following commands?

■ filter(flights, month == 11 | month == 12)
■ filter(flights, month != 11 & month != 12)
■ filter(flights, month != 11 | month != 12)
■ filter(flights, month == 11 & month == 12)
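Try answering first. To check your answers, run the four commands on a tiny hypothetical data frame with one row per month, so the output shows exactly which months survive each condition:

```r
library(dplyr)

# Hypothetical data: one row per month
months <- tibble(month = 1:12)

filter(months, month == 11 | month == 12)$month  # 11 12
filter(months, month != 11 & month != 12)$month  # 1 to 10
filter(months, month != 11 | month != 12)$month  # 1 to 12: always TRUE
filter(months, month == 11 & month == 12)$month  # nothing: always FALSE
```

The third condition is always TRUE because no month can be both 11 and 12, and the fourth is always FALSE for the same reason. The second is equivalent to !(month %in% c(11, 12)).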

■ Obtain a list of the flights that were NOT delayed by more than an hour, either on arrival or on departure.

■ Obtain a list of flights that were scheduled to depart in February between 5:00 am and 9:59 am

■ Any ideas on how to check the results?
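One possible solution, sketched on hypothetical flights-like rows (invented values). It assumes, as in nycflights13, that sched_dep_time is an integer in HHMM format, so 5:00 am is 500 and 9:59 am is 959. Checking the range() of the filtered columns is a quick way to verify the results.

```r
library(dplyr)

# Hypothetical flights-like rows; sched_dep_time uses the HHMM format
toy <- tibble(
  month          = c(2, 2, 2, 3, 2),
  sched_dep_time = c(455L, 500L, 959L, 700L, 1000L),
  dep_delay      = c(10, 75, -2, 0, NA),
  arr_delay      = c(5, 80, -10, 61, 3)
)

# 1) Not delayed by more than an hour on arrival AND on departure
#    (the row with a missing dep_delay is dropped: its condition is NA, not TRUE)
on_time <- filter(toy, arr_delay <= 60, dep_delay <= 60)
# Same result by De Morgan's law: NOT (late on arrival OR late on departure)
on_time2 <- filter(toy, !(arr_delay > 60 | dep_delay > 60))

# 2) Scheduled to depart in February between 5:00 and 9:59 (inclusive)
feb_morning  <- filter(toy, month == 2, sched_dep_time >= 500, sched_dep_time <= 959)
feb_morning2 <- filter(toy, month == 2, between(sched_dep_time, 500, 959))

# Checks: the extreme values must lie within the requested limits
range(on_time$arr_delay)
range(feb_morning$sched_dep_time)  # 500 959
```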

■ Always beware of null / unknown / missing / Not Available / NA values

■ is.na always returns TRUE (1) or FALSE (0). Nothing else.

■ filter() only includes rows where the condition is TRUE; it excludes both FALSE and NA
values. If you want to preserve missing values, ask for them explicitly.
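A small sketch with an invented one-column data frame: comparisons with NA return NA, so filter() silently drops those rows unless you ask for them with is.na().

```r
library(dplyr)

df <- tibble(x = c(1, NA, 3))

NA > 5       # NA: comparing with an unknown value gives an unknown result
is.na(NA)    # TRUE: is.na() never returns NA

kept_plain <- filter(df, x > 1)             # only 3: the NA row is dropped
kept_na    <- filter(df, is.na(x) | x > 1)  # NA and 3: missing values kept explicitly
```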

■ Filtering NAs

Arrange
■ arrange() works similarly to filter(), but instead of selecting rows it changes their order. It reorders the rows (not the columns), that is, the observations, in ascending order by default.

■ arrange(flights, year, month, day) → sort first by year, then by month, then by day

■ arrange(flights, desc(dep_delay)) → sort by dep_delay in descending order

■ Missing values are always sorted at the end → how can we change this?

■ arrange(flights, desc(is.na(dep_time)), dep_time)

■ Remember that is.na returns only TRUE (1) or FALSE (0), so if you sort it in descending order you get TRUE first, which means the NA values come first.
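A quick check on an invented dep_time vector: arrange() keeps NA at the end even inside desc(), and sorting by desc(is.na(...)) first moves the missing values to the top.

```r
library(dplyr)

toy <- tibble(dep_time = c(517, NA, 2, NA, 1459))

asc      <- arrange(toy, dep_time)$dep_time                         # 2 517 1459 NA NA
desc_ord <- arrange(toy, desc(dep_time))$dep_time                   # 1459 517 2 NA NA
na_first <- arrange(toy, desc(is.na(dep_time)), dep_time)$dep_time # NA NA 2 517 1459
```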

Select
■ Just like filter() generates a subset of observations (rows), select() generates a subset of variables (columns)

■ rename() is a variant of select() that changes variable names
■ Unlike select(), it keeps all the variables not explicitly mentioned
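A sketch of the difference on a one-row invented data frame: rename() keeps every column, while renaming inside select() keeps only the columns you mention. The syntax in both cases is new_name = old_name.

```r
library(dplyr)

toy <- tibble(year = 2013, month = 1, tailnum = "N14228")

renamed <- rename(toy, tail_num = tailnum)   # new_name = old_name
names(renamed)                               # "year" "month" "tail_num"

# select() can rename too, but it drops every column you do not mention
selected <- select(toy, tail_num = tailnum)
names(selected)                              # "tail_num"
```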

■ select(…, everything()) → useful, among other things, to reorder variables (e.g. to move some of them to the beginning of the data frame)
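For example, on an invented one-row data frame: everything() adds the remaining columns without duplicating the ones already selected. In dplyr 1.0.0 and later, relocate() is a dedicated verb for the same job.

```r
library(dplyr)

toy <- tibble(year = 2013, month = 1, day = 1, air_time = 227,
              time_hour = "2013-01-01 05:00")

moved <- select(toy, time_hour, air_time, everything())
names(moved)   # "time_hour" "air_time" "year" "month" "day"

# relocate() moves the named columns to the front and keeps the rest
relocated <- relocate(toy, time_hour, air_time)
```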

■ Different ways to use Select
■ Brainstorm as many ways as possible to select dep_time, dep_delay, arr_time, and
arr_delay from flights.
■ select(flights, dep_time, dep_delay, arr_time, arr_delay)
■ select(flights, "dep_time", "dep_delay", "arr_time", "arr_delay")
■ select(flights, 4, 6, 7, 9) # column positions (breaks if the column order changes)
■ select(flights, all_of(c("dep_time", "dep_delay", "arr_time", "arr_delay")))
■ select(flights, any_of(c("dep_time", "dep_delay", "arr_time", "arr_delay")))
■ select(flights, starts_with("dep_"), starts_with("arr_"))
■ vars <- c("year", "dep_time", "dep_delay", "arr_time", "arr_delay")
select(flights, any_of(vars))
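The difference between all_of() and any_of() only shows up when a name in the vector does not exist. A sketch with an invented data frame and a deliberately wrong name:

```r
library(dplyr)

toy  <- tibble(dep_time = 517, dep_delay = 2, arr_time = 830, arr_delay = 11)
vars <- c("dep_time", "arr_delay", "not_a_column")

# any_of() silently skips names that do not exist
lenient <- names(select(toy, any_of(vars)))   # "dep_time" "arr_delay"

# all_of() is strict: a missing name raises an error
strict <- tryCatch(select(toy, all_of(vars)), error = function(e) "error")
strict   # "error"
```

Use all_of() when a missing column should stop your script, and any_of() when missing columns are acceptable.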

Mutate
■ Add new variables (columns) at the end of your data frame

■ You can also add CONSTANTS: a single value is repeated on every row

IN007 – Data Analysis Tools Pág. 23

Data Transformation
Mutate
■ You can refer to variables you just created
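A sketch of mutate() on invented flights-like values: new columns go at the end, a constant is repeated on every row, and a column created earlier in the same call (gain, hours) can be used right away. The names gain, hours, gain_per_hour and origin_city are illustrative only.

```r
library(dplyr)

toy <- tibble(dep_delay = c(2, 4, -5), arr_delay = c(11, 20, -18),
              air_time = c(227, 227, 160))

res <- mutate(toy,
  gain          = dep_delay - arr_delay,   # new column from existing ones
  hours         = air_time / 60,
  gain_per_hour = gain / hours,            # uses two columns created just above
  origin_city   = "New York"               # constant repeated on every row
)
res$gain   # -9 -16 13
```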

■ Using transmute instead of mutate keeps only the variables you name in the call (typically the newly created ones):
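A minimal sketch with invented values. Note that in dplyr 1.1.0 and later, transmute() is marked as superseded, and mutate(.keep = "none") is the suggested replacement; both keep only the columns named in the call.

```r
library(dplyr)

toy <- tibble(dep_delay = c(2, 4), arr_delay = c(11, 20), air_time = c(227, 227))

res <- transmute(toy,
  gain  = dep_delay - arr_delay,
  hours = air_time / 60
)
names(res)   # "gain" "hours": the original columns are gone
```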

■ Some useful operations (remember that mutate and transmute do not aggregate data: the number of observations does not change!):
■ Arithmetic operators: +, -, *, /, ^.
■ Modular arithmetic: %/% (integer division) and %% (remainder), where x == y * (x %/% y) + (x %% y)

■ Logical comparisons: <, <=, >, >=, !=, and ==

■ Offsets: find the "previous" (lag()) or "next" (lead()) values in a vector. Useful for comparing values behind or ahead of the current values.
■ Cumulative aggregates: running sums, products, mins and maxes with cumsum(), cumprod(), cummin(), cummax(); and dplyr's cummean() for cumulative means.
■ Ranking: there are a number of ranking functions, but you should start with min_rank(). It does the most usual type of ranking (e.g. 1st, 2nd, 2nd, 4th). The default gives the smallest values the smallest ranks; use desc(x) to give the largest values the smallest ranks.
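The operations above applied to small invented vectors. The dep_time example assumes the HHMM format used by flights, where %/% 100 extracts the hour and %% 100 the minutes.

```r
library(dplyr)

dep_time <- c(517, 533, 1459)   # HHMM format
dep_time %/% 100                # hours:   5  5 14
dep_time %% 100                 # minutes: 17 33 59

x <- c(1, 2, 3, 4)
lag(x)       # NA 1 2 3
lead(x)      # 2 3 4 NA
x - lag(x)   # NA 1 1 1: change with respect to the previous value
cumsum(x)    # 1 3 6 10
cummean(x)   # 1.0 1.5 2.0 2.5

y <- c(1, 2, 2, NA, 3, 4)
min_rank(y)        # 1 2 2 NA 4 5: smallest value gets rank 1
min_rank(desc(y))  # 5 3 3 NA 2 1: largest value gets rank 1
```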

■ Exercises (in order!)

■ Add a new variable to the flights data frame: a ranking from most to least delayed flights, based on the arrival delay. Keep only the date fields, destination, arrival delay and scheduled departure time
■ a <- transmute(flights, year, month, day, sched_dep_time, dest, arr_delay,
most_delayed = min_rank(desc(arr_delay)))

■ Arrange the data frame by the newly created field. To break ties, sort by scheduled departure time
■ b <- arrange(a, most_delayed, sched_dep_time)

■ Add a cumulative delay field.

■ (c <- mutate(b, cumulated_delay = cumsum(arr_delay)))
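The same three steps on a hypothetical four-row stand-in for flights (invented values), so you can follow every intermediate result. The last object is called c_df here rather than c, only to avoid reusing the name of R's c() function. Because arrange() puts the NA ranks last, cumsum() only turns into NA at the very end.

```r
library(dplyr)

# Hypothetical stand-in for flights with only the columns the exercise keeps
toy <- tibble(
  year = 2013, month = 1, day = c(1, 1, 2, 2),
  sched_dep_time = c(600L, 515L, 900L, 700L),
  dest = c("MIA", "ATL", "ORD", "IAH"),
  arr_delay = c(30, 120, NA, 30)
)

a <- transmute(toy, year, month, day, sched_dep_time, dest, arr_delay,
               most_delayed = min_rank(desc(arr_delay)))
b <- arrange(a, most_delayed, sched_dep_time)   # ties broken by scheduled time
c_df <- mutate(b, cumulated_delay = cumsum(arr_delay))

c_df$dest             # "ATL" "MIA" "IAH" "ORD"
c_df$most_delayed     # 1 2 2 NA
c_df$cumulated_delay  # 120 150 180 NA
```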

Summarise
■ summarise() collapses a data frame into a summary of the data
■ Used alone, it creates one single row:

■ Generally, we use it with group_by() to summarise information by group
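A sketch on an invented data frame showing both cases: summarise() alone returns one row, and after group_by() it returns one row per group. na.rm = TRUE is needed because one delay is missing.

```r
library(dplyr)

toy <- tibble(month = c(1, 1, 2, 2), dep_delay = c(10, NA, 4, 8))

# Alone: the whole data frame collapses into ONE row
overall <- summarise(toy, delay = mean(dep_delay, na.rm = TRUE))
overall   # delay = 7.33

# With group_by(): one row per group
by_month <- group_by(toy, month)
monthly  <- summarise(by_month, delay = mean(dep_delay, na.rm = TRUE))
monthly   # month 1: 10, month 2: 6
```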

■ Count: useful to check that you’re not drawing conclusions based on very small
amounts of data
■ n() → Count the number of observations
■ sum(!is.na(x)) → Count the number of non-missing values
■ n_distinct(x) → Count the number of distinct values
■ Counts and proportions of logical values: sum(x > 10), mean(y == 0)
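– summarise(n_early = sum(dep_time < 500))
» Counts the observations for which dep_time is earlier than 5:00 am (500 in HHMM format). If dep_time has missing values the sum is NA, unless you use sum(dep_time < 500, na.rm = TRUE)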

■ Exercise: get the number of flights per day.
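One possible solution, sketched on invented date columns: group by the three date fields and count rows with n(). count() is a dplyr shortcut for the same group-and-count. .groups = "drop" just returns an ungrouped result and avoids a message.

```r
library(dplyr)

toy <- tibble(year = 2013, month = c(1, 1, 1, 2), day = c(1, 1, 2, 1))

by_day  <- group_by(toy, year, month, day)
per_day <- summarise(by_day, n = n(), .groups = "drop")
per_day$n   # 2 1 1

# Shortcut: count() groups and counts in one step
per_day2 <- count(toy, year, month, day)
```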

■ Exercise: what is the relationship between the distance of the destinations and the
average delay for each of them?

■ Tips:
1. Understand the statement
2. Imagine and mentally visualize what you’re trying to achieve
3. Start thinking about the necessary steps to get there
1. Which data will you require? How is it structured?
2. What kind of visualization can help you?

■ 1. Group flights by destination.

■ 2. Summarise to compute distance, average delay, and number of flights.

■ 3. Filter to remove noisy points if necessary (hint: at least, exclude destinations with fewer than 20 flights)

■ 4. Plot the results

■ 5. Any outliers you would like to ignore? How could you have detected them? How
can you filter them?

■ 6. What conclusions can you draw?
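Steps 1 to 3 sketched on hypothetical flights-like rows (invented destinations, distances and delays). The threshold count >= 20 follows the hint above; on the real flights data you would group flights instead of toy.

```r
library(dplyr)

# Hypothetical rows: destination, distance (miles) and arrival delay (minutes)
toy <- tibble(
  dest      = c(rep("ATL", 25), rep("BOS", 22), rep("HNL", 3)),
  distance  = c(rep(762, 25), rep(187, 22), rep(4983, 3)),
  arr_delay = c(rep(c(10, 20), 12), NA, rep(5, 22), c(-10, 0, 10))
)

# 1. Group flights by destination
by_dest <- group_by(toy, dest)

# 2. One row per destination: number of flights, distance and average delay
delays <- summarise(by_dest,
  count = n(),
  dist  = mean(distance, na.rm = TRUE),
  delay = mean(arr_delay, na.rm = TRUE)
)

# 3. Drop noisy destinations with fewer than 20 flights
delays <- filter(delays, count >= 20)
delays
```

For step 4, assuming ggplot2 is installed, a scatterplot such as ggplot(delays, aes(x = dist, y = delay)) + geom_point(aes(size = count), alpha = 1/3) + geom_smooth(se = FALSE) shows the relationship, and makes very distant or very small destinations easy to spot for step 5.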

■ Some of the most common operations with summarise include, but are not limited to:

■ mean(variable) ➔ we’ve seen it before. Computes the mean / average of the values for each group

■ sum(variable) ➔ sums all the values. The variable must be numeric (or logical); otherwise it won’t work.

■ max(variable), min(variable) ➔ gets the maximum or the minimum value

■ The “counts” we saw before (n, n_distinct…)

■ Any other operation you’ve seen before

■ Remember to add na.rm = TRUE to these functions if the variable has missing values

■ Tourist flats exercise: obtain a table with the number of tourist flats per district and sort it in descending order. Filter out any results you consider an error. Represent the resulting information in a bar chart
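A possible approach, sketched on hypothetical data: the real dataset is not shown in the slides, so the data frame flats and its column district are assumptions, and "errors" are taken here to be missing or empty district names.

```r
library(dplyr)

# Hypothetical data: one row per flat; check the real column names in your dataset
flats <- tibble(district = c("Eixample", "Ciutat Vella", "Eixample", NA,
                             "Les Corts", "Eixample", "Ciutat Vella", ""))

# Count flats per district, sorted from most to fewest
per_district <- count(flats, district, sort = TRUE)
# Drop rows that look like data errors (assumed: missing or empty district)
per_district <- filter(per_district, !is.na(district), district != "")
per_district   # Eixample 3, Ciutat Vella 2, Les Corts 1
```

For the bar chart, assuming ggplot2 is installed, ggplot(per_district, aes(x = reorder(district, -n), y = n)) + geom_col() keeps the bars in descending order.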

■ Census exercise: create new columns with the % of men, women and foreigners (ESTRANGERS) in each section. Arrange the resulting table by % of women

■ d <- `2022_09_TAULA_MAP_SCENSAL` ➔ careful with the backtick ` (not ’, not ´): it is required because the object name starts with a digit

■ Plot the relationship between the different age ranges and the number of foreigners in each section. In which age ranges can you find a more positive / negative relationship? Can you find any explanation for that?
■ If there are any outliers, DELETE THEM from your dataset.
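For the percentage columns, a sketch under explicit assumptions: the column names men, women and foreigners are hypothetical placeholders (the real table uses its own names, e.g. ESTRANGERS), and the section total is assumed to be men + women.

```r
library(dplyr)

# Hypothetical census sections; replace the column names with the real ones
census <- tibble(
  section    = c("001", "002", "003"),
  men        = c(500, 300, 420),
  women      = c(500, 450, 380),
  foreigners = c(100, 150, 40)
)

census_pct <- mutate(census,
  total       = men + women,              # assumed definition of the total
  pct_men     = 100 * men / total,
  pct_women   = 100 * women / total,
  pct_foreign = 100 * foreigners / total
)
census_pct <- arrange(census_pct, desc(pct_women))
census_pct$section   # "002" "001" "003"
```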

Combining operations with the pipe
■ No pipe:

■ Pipe:
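A sketch comparing both styles on invented data: x %>% f(y) is the same as f(x, y), so the pipe removes the need to name every intermediate result and the steps read from top to bottom. (R 4.1 and later also has a native pipe, |>.)

```r
library(dplyr)

toy <- tibble(dest = c("ATL", "ATL", "BOS", "BOS", "BOS"),
              arr_delay = c(10, 20, 5, NA, 15))

# Without the pipe: every intermediate result needs a name
by_dest <- group_by(toy, dest)
delays  <- summarise(by_dest, count = n(), delay = mean(arr_delay, na.rm = TRUE))
no_pipe <- filter(delays, count > 2)

# With the pipe: the same steps, read top to bottom
with_pipe <- toy %>%
  group_by(dest) %>%
  summarise(count = n(), delay = mean(arr_delay, na.rm = TRUE)) %>%
  filter(count > 2)

identical(no_pipe, with_pipe)   # TRUE
```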

■ Some more examples:
■ Average delays (previous exercise), removing cancelled flights first

■ Planes that have the highest average delays:
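A sketch of both examples on hypothetical rows, assuming (as is common with flights) that a cancelled flight is one with missing dep_delay and arr_delay. The resulting table has the three variables tailnum, n and delay.

```r
library(dplyr)

# Hypothetical rows; the NA delays represent a cancelled flight
toy <- tibble(
  tailnum   = c("N1", "N1", "N2", "N2", "N3"),
  dep_delay = c(10, NA, 60, 30, -5),
  arr_delay = c(15, NA, 80, 40, -10)
)

# Remove cancelled flights first
not_cancelled <- toy %>%
  filter(!is.na(dep_delay), !is.na(arr_delay))

# Average arrival delay per plane, highest first
plane_delays <- not_cancelled %>%
  group_by(tailnum) %>%
  summarise(n = n(), delay = mean(arr_delay)) %>%
  arrange(desc(delay))

plane_delays   # N2 (n = 2, delay 60), N1 (1, 15), N3 (1, -10)
```

Keeping n next to the average matters: planes with only one or two flights can produce extreme averages, which is worth checking when you plot the results.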

■ Always keep in mind the data frame you’re generating!
■ At this point you have a table with three variables: tailnum, n, delay

■ Plot your results in order to understand them!

■ Does this make sense? Why?

Summarise
■ Measures of spread: sd(x), IQR(x), mad(x). The root mean squared deviation, or
standard deviation sd(x), is the standard measure of spread. The interquartile range
IQR(x) and median absolute deviation mad(x) are robust equivalents that may be more
useful if you have outliers.
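A quick illustration with an invented vector containing one extreme value: sd() is pulled up strongly by the outlier, while IQR() and mad() barely notice it. mad() is scaled by the constant 1.4826 by default.

```r
x <- c(1, 2, 3, 4, 100)   # one extreme value

sd(x)    # about 43.6: dominated by the outlier
IQR(x)   # 2: spread of the middle 50% of the values
mad(x)   # 1.4826: median absolute deviation (1 times the default scale 1.4826)
```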

■ Measures of rank: min(x), quantile(x, 0.25), max(x)
■ Measures of position: first(x), nth(x, 2), last(x)
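On an invented vector, the difference between rank and position: min(), quantile() and max() depend on the values, while first(), nth() and last() (from dplyr) depend only on the order of the rows, so inside summarise() you usually arrange() first.

```r
library(dplyr)

x <- c(5, 3, 9, 1)

min(x)              # 1
quantile(x, 0.25)   # 2.5 (a named value, "25%")
max(x)              # 9
first(x)            # 5: the first element, not the smallest
nth(x, 2)           # 3
last(x)             # 1
```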
