Introduction To Data-2

The document provides an introduction to exploring flight delay data from New York City airports in 2013. It discusses loading necessary packages, exploring the data frame structure and variables, and using dplyr verbs like filter() and summarize() to extract subsets of the data for analysis. Examples filter flights headed to Raleigh-Durham and calculate delay statistics, demonstrating how to work with the data.

Uploaded by

Sampada Desai

Introduction to data
Complete all Exercises, and submit answers to Questions on the Coursera platform.

Some define statistics as the field that focuses on turning information into knowledge. The first step in that process
is to summarize and describe the raw information, the data. In this lab we explore flights, specifically a random
sample of domestic flights that departed from the three major New York City airports in 2013. We will generate
simple graphical and numerical summaries of data on these flights and explore delay times. As this is a large data
set, along the way you’ll also learn the indispensable skills of data processing and subsetting.

Getting started
Load packages
In this lab we will explore the data using the dplyr package and visualize it using the ggplot2 package.
The data can be found in the companion package for this course, statsr.

Let’s load the packages.

library(statsr)
library(dplyr)
library(ggplot2)

Data
The Bureau of Transportation Statistics (http://www.rita.dot.gov/bts/about/) (BTS) is a statistical agency that is a
part of the Research and Innovative Technology Administration (RITA). As its name implies, BTS collects and
makes available transportation data, such as the flights data we will be working with in this lab.

We begin by loading the nycflights data frame. Type the following in your console to load the data:

data(nycflights)

The data frame containing 32735 flights that shows up in your workspace is a data matrix, with each row
representing an observation and each column representing a variable. R calls this data format a data frame,
which is a term that will be used throughout the labs.

To view the names of the variables, type the command

names(nycflights)
## [1] "year" "month" "day" "dep_time" "dep_delay"
## [6] "arr_time" "arr_delay" "carrier" "tailnum" "flight"
## [11] "origin" "dest" "air_time" "distance" "hour"
## [16] "minute"

This returns the names of the variables in this data frame. The codebook (description of the variables) is included
below. This information can also be found in the help file for the data frame which can be accessed by typing
?nycflights in the console.

year, month, day: Date of departure

dep_time, arr_time: Departure and arrival times, local timezone.
dep_delay, arr_delay: Departure and arrival delays, in minutes. Negative times represent early
departures/arrivals.
carrier: Two letter carrier abbreviation.

9E: Endeavor Air Inc.
AA: American Airlines Inc.
AS: Alaska Airlines Inc.
B6: JetBlue Airways
DL: Delta Air Lines Inc.
EV: ExpressJet Airlines Inc.
F9: Frontier Airlines Inc.
FL: AirTran Airways Corporation
HA: Hawaiian Airlines Inc.
MQ: Envoy Air
OO: SkyWest Airlines Inc.
UA: United Air Lines Inc.
US: US Airways Inc.
VX: Virgin America
WN: Southwest Airlines Co.
YV: Mesa Airlines Inc.
tailnum: Plane tail number
flight: Flight number
origin, dest: Airport codes for origin and destination. (Google can help you with what code stands for
which airport.)
air_time: Amount of time spent in the air, in minutes.
distance: Distance flown, in miles.
hour, minute: Time of departure broken into hour and minutes.

A very useful function for taking a quick peek at your data frame, and viewing its dimensions and data types is
str, which stands for structure.

str(nycflights)
## Classes 'tbl_df' and 'data.frame': 32735 obs. of 16 variables:
## $ year : int 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
## $ month : int 6 5 12 5 7 1 12 8 9 4 ...
## $ day : int 30 7 8 14 21 1 9 13 26 30 ...
## $ dep_time : int 940 1657 859 1841 1102 1817 1259 1920 725 1323 ...
## $ dep_delay: num 15 -3 -1 -4 -3 -3 14 85 -10 62 ...
## $ arr_time : int 1216 2104 1238 2122 1230 2008 1617 2032 1027 1549 ...
## $ arr_delay: num -4 10 11 -34 -8 3 22 71 -8 60 ...
## $ carrier : chr "VX" "DL" "DL" "DL" ...
## $ tailnum : chr "N626VA" "N3760C" "N712TW" "N914DL" ...
## $ flight : int 407 329 422 2391 3652 353 1428 1407 2279 4162 ...
## $ origin : chr "JFK" "JFK" "JFK" "JFK" ...
## $ dest : chr "LAX" "SJU" "LAX" "TPA" ...
## $ air_time : num 313 216 376 135 50 138 240 48 148 110 ...
## $ distance : num 2475 1598 2475 1005 296 ...
## $ hour : num 9 16 8 18 11 18 12 19 7 13 ...
## $ minute : num 40 57 59 41 2 17 59 20 25 23 ...

The nycflights data frame is a massive trove of information. Let’s think about some questions we might want
to answer with these data:

We might want to find out how delayed flights headed to a particular destination tend to be.
We might want to evaluate how departure delays vary over months.
Or we might want to determine which of the three major NYC airports has a better on time percentage for
departing flights.

Seven verbs
The dplyr package offers seven verbs (functions) for basic data manipulation:

filter()
arrange()
select()
distinct()
mutate()
summarise()
sample_n()

We will use some of these functions in this lab, and learn about others in a future lab.

Analysis
Departure delays in flights to Raleigh-Durham (RDU)
We can examine the distribution of departure delays of all flights with a histogram.

ggplot(data = nycflights, aes(x = dep_delay)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This function says to plot the dep_delay variable from the nycflights data frame on the x-axis. It also defines
a geom (short for geometric object), which describes the type of plot you will produce.

Histograms are generally a very good way to see the shape of a single distribution, but that shape can change
depending on how the data is split between the different bins. You can easily define the binwidth you want to use:

ggplot(data = nycflights, aes(x = dep_delay)) +
  geom_histogram(binwidth = 15)

ggplot(data = nycflights, aes(x = dep_delay)) +
  geom_histogram(binwidth = 150)

Exercise: How do these three histograms with the various binwidths compare?

If we want to focus on departure delays of flights headed to RDU only, we need to first filter the data for flights
headed to RDU (dest == "RDU") and then make a histogram of departure delays of only those flights.

rdu_flights <- nycflights %>%
  filter(dest == "RDU")
ggplot(data = rdu_flights, aes(x = dep_delay)) +
  geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Let’s decipher these two commands:

Command 1: Take the nycflights data frame, filter for flights headed to RDU, and save the result as a
new data frame called rdu_flights.
== means “if it’s equal to”.
RDU is in quotation marks since it is a character string.
Command 2: Basically the same ggplot call from earlier for making a histogram, except that it uses the data
frame for flights headed to RDU instead of all flights.

Logical operators: Filtering for certain observations (e.g. flights from a particular airport) is often of interest in data
frames where we might want to examine observations with certain characteristics separately from the rest of the
data. To do so we use the filter function and a series of logical operators. The most commonly used logical
operators for data analysis are as follows:

== means “equal to”
!= means “not equal to”
> or < means “greater than” or “less than”
>= or <= means “greater than or equal to” or “less than or equal to”
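
As a minimal sketch of how these operators behave inside filter, the chunk below uses a small made-up data frame (toy_flights and its values are hypothetical and are not part of nycflights), so the results are easy to check by eye.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration (not taken from nycflights)
toy_flights <- data.frame(
  dest      = c("RDU", "SFO", "RDU", "LAX"),
  dep_delay = c(-3, 20, 45, 0)
)

# == keeps rows whose destination is exactly "RDU"
to_rdu <- toy_flights %>%
  filter(dest == "RDU")

# != keeps every row except those headed to "RDU"
not_rdu <- toy_flights %>%
  filter(dest != "RDU")

# >= keeps rows with a departure delay of 20 minutes or more
late <- toy_flights %>%
  filter(dep_delay >= 20)
```

Each call returns a new data frame and leaves toy_flights unchanged.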

We can also obtain numerical summaries for these flights:

rdu_flights %>%
  summarise(mean_dd = mean(dep_delay), sd_dd = sd(dep_delay), n = n())

## # A tibble: 1 x 3
## mean_dd sd_dd n
## <dbl> <dbl> <int>
## 1 11.69913 35.55567 801

Note that in the summarise function we created a list of three elements. The names of these elements are user
defined, like mean_dd, sd_dd, n, and you could customize these names as you like (just don’t use spaces in
your names). Calculating these summary statistics also requires that you know the function calls. Note that n()
reports the sample size.

Summary statistics: Some useful function calls for summary statistics for a single numerical variable are as
follows:

mean
median
sd
var
IQR
range
min
max
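
To see what these functions return, we can try them on a short vector of made-up delay values (delays is a hypothetical object, not taken from nycflights). Inside summarise the same calls would be applied to a column such as dep_delay.

```r
# Hypothetical delays in minutes, made up for illustration
delays <- c(-5, 0, 3, 10, 62)

mean(delays)    # arithmetic average
median(delays)  # middle value once the data are sorted
sd(delays)      # sample standard deviation
var(delays)     # sample variance, the square of sd
IQR(delays)     # third quartile minus first quartile
range(delays)   # smallest and largest value, as a vector of length 2
min(delays)     # smallest value
max(delays)     # largest value
```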

We can also filter based on multiple criteria. Suppose we are interested in flights headed to San Francisco (SFO)
in February:

sfo_feb_flights <- nycflights %>%
  filter(dest == "SFO", month == 2)

Note that we can separate the conditions using commas if we want flights that are both headed to SFO and in
February. If we are interested in either flights headed to SFO or in February we can use the | instead of the
comma.
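
The difference between the comma and | can be sketched on a small made-up data frame (toy_flights is hypothetical and used only for illustration). With the comma a row must satisfy both conditions; with | it only needs to satisfy at least one.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration (not taken from nycflights)
toy_flights <- data.frame(
  dest  = c("SFO", "SFO", "RDU", "LAX"),
  month = c(2, 7, 2, 5)
)

# Comma: headed to SFO AND in February
both <- toy_flights %>%
  filter(dest == "SFO", month == 2)

# Vertical bar: headed to SFO OR in February (or both)
either <- toy_flights %>%
  filter(dest == "SFO" | month == 2)

nrow(both)
nrow(either)
```

Only the first toy row meets both conditions, while the first three rows meet at least one of them.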

Question 1 Create a new data frame that includes flights headed to SFO in February, and save
this data frame as sfo_feb_flights. How many flights meet these criteria?
A. 68

B. 1345

C. 2286

D. 3563

E. 32735
# type your code for Question 1 here, and Knit

Question 2 Make a histogram and calculate appropriate summary statistics for arrival delays of
sfo_feb_flights. Which of the following is false?

A. The distribution is unimodal.

B. The distribution is right skewed.

C. No flight is delayed more than 2 hours.

D. The distribution has several extreme values on the right side.

E. More than 50% of flights arrive on time or earlier than scheduled.

# type your code for Question 2 here, and Knit

Another useful functionality is being able to quickly calculate summary statistics for various groups in your data
frame. For example, we can modify the above command using the group_by function to get the same summary
stats for each origin airport:

rdu_flights %>%
  group_by(origin) %>%
  summarise(mean_dd = mean(dep_delay), sd_dd = sd(dep_delay), n = n())

## # A tibble: 3 x 4
## origin mean_dd sd_dd n
## <chr> <dbl> <dbl> <int>
## 1 EWR 13.365517 32.08492 145
## 2 JFK 15.396667 40.30535 300
## 3 LGA 7.904494 32.18620 356

Here, we first grouped the data by origin, and then calculated the summary statistics.

Question 3 Calculate the median and interquartile range for arr_delay of flights in the
sfo_feb_flights data frame, grouped by carrier. Which carrier has the
highest IQR of arrival delays?
A. American Airlines

B. JetBlue Airways

C. Virgin America

D. Delta and United Airlines

E. Frontier Airlines

# type your code for Question 3 here, and Knit

Departure delays over months
Which month would you expect to have the highest average delay departing from an NYC airport?

Let’s think about how we would answer this question:

First, calculate monthly averages for departure delays. With the new language we are learning, we need to
group_by months, then summarise mean departure delays.
Then, we need to arrange these average delays in descending order.

nycflights %>%
  group_by(month) %>%
  summarise(mean_dd = mean(dep_delay)) %>%
  arrange(desc(mean_dd))

## # A tibble: 12 x 2
## month mean_dd
## <int> <dbl>
## 1 7 20.754559
## 2 6 20.350293
## 3 12 17.368189
## 4 4 14.554477
## 5 3 13.517602
## 6 5 13.264800
## 7 8 12.619097
## 8 2 10.687227
## 9 1 10.233333
## 10 9 6.872436
## 11 11 6.103183
## 12 10 5.880374

Question 4 Which month has the highest average departure delay from an NYC airport?
A. January

B. March

C. July

D. October

E. December

# type your code for Question 4 here, and Knit

Question 5 Which month has the highest median departure delay from an NYC airport?
A. January

B. March

C. July
D. October

E. December

# type your code for Question 5 here, and Knit

Question 6 Is the mean or the median a more reliable measure for deciding which month(s) to
avoid flying if you really dislike delayed flights, and why?
A. Mean would be more reliable as it gives us the true average.

B. Mean would be more reliable as the distribution of delays is symmetric.

C. Median would be more reliable as the distribution of delays is skewed.

D. Median would be more reliable as the distribution of delays is symmetric.

E. Both give us useful information.

We can also visualize the distributions of departure delays across months using side-by-side box plots:

ggplot(nycflights, aes(x = factor(month), y = dep_delay)) +
  geom_boxplot()

There is some new syntax here: We want departure delays on the y-axis and the months on the x-axis to produce
side-by-side box plots. Side-by-side box plots require a categorical variable on the x-axis; however, in the data
frame month is stored as a numerical variable (numbers 1-12). Therefore we can force R to treat this variable
as categorical, which R calls a factor, with factor(month).
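
The following sketch, using a made-up vector month_num rather than the real month column, illustrates what factor does: the numbers become labels for categories, which is what a box plot needs on the x-axis.

```r
# Hypothetical month values, made up for illustration
month_num <- c(1, 2, 2, 12)

# Convert the numbers into a categorical (factor) variable
month_fct <- factor(month_num)

is.numeric(month_num)  # the original vector is numeric
is.factor(month_fct)   # the converted vector is a factor
levels(month_fct)      # the distinct categories, stored as text labels
```

The original vector is left as it is, just as factor(month) inside aes does not change the month column stored in nycflights.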

On time departure rate for NYC airports
Suppose you will be flying out of NYC and want to know which of the three major NYC airports has the best on
time departure rate of departing flights. Suppose also that for you a flight that is delayed for less than 5 minutes is
basically “on time”. You consider any flight delayed for 5 minutes or more to be “delayed”.

In order to determine which airport has the best on time departure rate, we need to

first classify each flight as “on time” or “delayed”,
then group flights by origin airport,
then calculate on time departure rates for each origin airport,
and finally arrange the airports in descending order for on time departure percentage.

Let’s start with classifying each flight as “on time” or “delayed” by creating a new variable with the mutate
function.

nycflights <- nycflights %>%
  mutate(dep_type = ifelse(dep_delay < 5, "on time", "delayed"))

The first argument in the mutate function is the name of the new variable we want to create, in this case
dep_type. Then if dep_delay < 5 we classify the flight as "on time" and "delayed" if not, i.e. if the flight
is delayed for 5 or more minutes.

Note that we are also overwriting the nycflights data frame with the new version of this data frame that
includes the new dep_type variable.
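
A small sketch of the same mutate and ifelse pattern on made-up delays (the toy data frame is hypothetical) shows how the cutoff behaves. A delay of exactly 5 minutes is not less than 5, so it is classified as “delayed”.

```r
library(dplyr)

# Hypothetical departure delays in minutes, made up for illustration
toy <- data.frame(dep_delay = c(-2, 4, 5, 30))

# Add a new column: "on time" when the delay is under 5 minutes, "delayed" otherwise
toy <- toy %>%
  mutate(dep_type = ifelse(dep_delay < 5, "on time", "delayed"))

toy
```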

We can handle all the remaining steps in one code chunk:

nycflights %>%
  group_by(origin) %>%
  summarise(ot_dep_rate = sum(dep_type == "on time") / n()) %>%
  arrange(desc(ot_dep_rate))

## # A tibble: 3 x 2
## origin ot_dep_rate
## <chr> <dbl>
## 1 LGA 0.7279229
## 2 JFK 0.6935854
## 3 EWR 0.6369892

The summarise step is telling R to count up how many records of the current group are on time,
sum(dep_type == "on time"), and divide that result by the total number of elements in the current
group, n(), to get a proportion, then to store the answer in a new variable called ot_dep_rate.
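
To make the calculation concrete, here is a sketch on a made-up data frame with six flights (toy and its values are hypothetical). It also computes the same proportion with mean(dep_type == "on time"), which works because R treats TRUE as 1 and FALSE as 0; this alternative is our addition and is not part of the lab's code.

```r
library(dplyr)

# Hypothetical flights, made up for illustration: 2 from JFK and 4 from LGA
toy <- data.frame(
  origin   = c("JFK", "JFK", "LGA", "LGA", "LGA", "LGA"),
  dep_type = c("on time", "delayed", "on time", "on time", "on time", "delayed")
)

rates <- toy %>%
  group_by(origin) %>%
  summarise(
    # count of on time flights divided by the group size
    ot_dep_rate     = sum(dep_type == "on time") / n(),
    # the same proportion, computed as the mean of a logical vector
    ot_dep_rate_alt = mean(dep_type == "on time")
  ) %>%
  arrange(desc(ot_dep_rate))

rates
```

In this toy data one of the two JFK flights and three of the four LGA flights are on time, so LGA is listed first.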

Question 7 If you were selecting an airport simply based on on time departure percentage,
which NYC airport would you choose to fly out of?
A. EWR

B. JFK

C. LGA

# type your code for Question 7 here, and Knit

We can also visualize the distribution of on time departure rate across the three airports using a segmented bar
plot.

ggplot(data = nycflights, aes(x = origin, fill = dep_type)) +
  geom_bar()

Question 8 Mutate the data frame so that it includes a new variable that contains the average
speed, avg_speed traveled by the plane for each flight (in mph). What is the tail
number of the plane with the fastest avg_speed? Hint: Average speed can be
calculated as distance divided by number of hours of travel, and note that
air_time is given in minutes. If you just want to show the avg_speed and
tailnum and none of the other variables, use the select function at the end of your
pipe to select just these two variables with select(avg_speed, tailnum). You
can Google this tail number to find out more about the aircraft.
A. N666DN

B. N755US

C. N779JB

D. N947UW

E. N959UW

# type your code for Question 8 here, and Knit

Question 9 Make a scatterplot of avg_speed vs. distance. Which of the following is true
about the relationship between average speed and distance?
A. As distance increases the average speed of flights decreases.

B. The relationship is linear.

C. There is an overall positive association between distance and average speed.

D. There are no outliers.

E. The distribution of distances is uniform over 0 to 5000 miles.

# type your code for Question 9 here, and Knit

# air_time is in minutes, so divide it by 60 to express avg_speed in miles per hour
nycflights <- nycflights %>% mutate(avg_speed = distance / (air_time / 60))
ggplot(data = nycflights, aes(x = distance, y = avg_speed)) + geom_point()

Question 10 Suppose you define a flight to be “on time” if it gets to the destination on time or
earlier than expected, regardless of any departure delays. Mutate the data frame to
create a new variable called arr_type with levels "on time" and "delayed"
based on this definition. Then, determine the on time arrival percentage based on
whether the flight departed on time or not. What proportion of flights that were
"delayed" departing arrive "on time"?

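# arr_type follows the definition in the question: arriving on time or early counts as "on time"
nycflights <- nycflights %>%
  mutate(arr_type = ifelse(arr_delay <= 0, "on time", "delayed"))
# proportion of flights that departed "delayed" but still arrived "on time"
sum(nycflights$dep_type == "delayed" & nycflights$arr_type == "on time") /
  sum(nycflights$dep_type == "delayed")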

[NUMERIC INPUT]

# type your code for Question 10 here, and Knit
