Assignment1 Code and Conclude DSA Nikhil Mishra

This report analyzes flight delay patterns using a dataset of 2,999 flight records to understand trends and build predictive models for scheduling efficiency. It covers dataset overview, data preprocessing, model implementation, and results, highlighting the importance of handling missing values and feature relevance. The findings indicate that while most flights are on schedule, delays occur due to various factors, and further analysis is needed to address data limitations.

Uploaded by

nikgdsc2023


NAME- NIKHIL MISHRA

URN NO- 2022-B-06022004D

SUBJECT- DATA SCIENCE APPLICATION
BRANCH- B.TECH ITDS
SEMESTER- 6TH
SECTION - A

PROF - PRACHI SHUKLA

Flight Delay Analysis Report ✈️
1. Introduction
Flight delays significantly impact airlines, passengers, and airport operations. This
report analyzes flight delay patterns using a dataset containing details about flight
schedules, airline information, and delay times. The goal is to understand delay
trends and build predictive models to improve scheduling efficiency.

2. Dataset Overview
The dataset contains 2,999 flight records with details on airlines, flight numbers,
airports, departure/arrival times, delays, cancellations, and distances. It includes
31 columns, with missing values in delays and cancellations. Key insights involve
airline performance, delays, and cancellation patterns.

Here’s a detailed breakdown:

2.1 Dataset Summary
 The dataset contains information on flights, including their schedules,
departure and arrival times, airlines, and delay causes.
 It consists of 2,999 rows and 31 columns.
 The dataset includes both numerical and categorical features relevant to
flight delay analysis.
 Common reasons for flight delays in the dataset include weather
conditions, air traffic congestion, mechanical issues, and operational
inefficiencies.
2.2 Data Structure
 The dataset consists of flight schedules, departure and arrival times, airline
information, and delay reasons.
 The data was inspected using df.info() to identify missing values and data
types.
 A summary of the dataset was obtained using df.describe() to analyze
distributions of numerical features.
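
The inspection step described above can be sketched as follows. This is a minimal illustration on a tiny made-up table; the column names and values are assumptions and are not taken from the report's notebook.

```python
import io

import pandas as pd

# Tiny stand-in for the flight table; the values are invented for illustration.
df = pd.DataFrame({
    "AIRLINE": ["AA", "DL", "AA", "UA"],
    "DISTANCE": [2475, 760, 1235, 2475],
    "ARRIVAL_DELAY": [-5.0, 22.0, None, 140.0],
})

# df.info() writes to stdout by default; a buffer captures the same text.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)  # column names, non-null counts, and dtypes

# describe() summarises the numerical columns (count, mean, std, quartiles).
summary = df.describe()
print(summary)
```

The non-null counts from df.info() give a first view of missing values, while df.describe() leaves out the categorical AIRLINE column by default.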
2.3 Key Features
Flight Details:

 FlightID – Unique identifier for each flight.

 TailNumber – Aircraft registration number.

 Airline – Name of the airline operating the flight.

Time Stamps:

 ScheduledDeparture – Planned departure time.

 ActualDeparture – Actual departure time.

 ScheduledArrival – Planned arrival time.

 ActualArrival – Actual arrival time.
Delay Information:
 DepartureDelay – Difference between actual and scheduled departure time
(in minutes).
 ArrivalDelay – Difference between actual and scheduled arrival time (in
minutes).
 DelayReason – Categorical variable indicating the cause of delay.
 TAXI_OUT – Time taken to taxi from the gate to takeoff (in minutes).
 WHEELS_OFF – The time the aircraft left the ground (in minutes since
midnight).
 SCHEDULED_TIME – The planned total flight duration (in minutes).
 ELAPSED_TIME – The actual time spent from departure to arrival (in
minutes).
 AIR_TIME – The actual time the aircraft was in the air (in minutes).
 DISTANCE – The flight distance between origin and destination (in miles).
 WHEELS_ON – The time the aircraft landed (in minutes since midnight).
 TAXI_IN – Time taken to taxi from landing to the gate (in minutes).
 SCHEDULED_ARRIVAL – The planned arrival time (in minutes since
midnight).
 ARRIVAL_TIME – The actual arrival time (in minutes since midnight).
 ARRIVAL_DELAY – The delay in arrival (in minutes, negative values indicate
early arrival).
 DIVERTED – Indicates if the flight was diverted (1 = Yes, 0 = No).
 CANCELLED – Indicates if the flight was canceled (1 = Yes, 0 = No).
 CANCELLATION_REASON – The reason for cancellation (A = Airline, B =
Weather, C = National Air System, D = Security).
 AIR_SYSTEM_DELAY – Delay caused by air traffic control or congestion (in
minutes).
 SECURITY_DELAY – Delay caused by security issues (in minutes).
 AIRLINE_DELAY – Delay caused by the airline (in minutes).
 LATE_AIRCRAFT_DELAY – Delay due to a late incoming aircraft (in minutes).
 WEATHER_DELAY – Delay caused by weather conditions (in minutes).

3. Method Selection

3.1 Choosing the Analytic Task

 The primary objective is predictive modeling, specifically a classification
task to predict whether a flight will be delayed based on the given features.
 The dataset’s features were analyzed for their relevance to classification.
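
A classification task needs a binary target. One possible way to derive it from ARRIVAL_DELAY is sketched below. The 15-minute cut-off and the DELAYED column name are assumptions made for illustration; the report does not state which threshold was used.

```python
import pandas as pd

# Assumed cut-off: a flight counts as "delayed" if it arrives more than
# 15 minutes late. This value is not taken from the report.
DELAY_THRESHOLD_MIN = 15

df = pd.DataFrame({"ARRIVAL_DELAY": [-8.0, 3.0, 15.0, 16.0, 95.0]})

# 1 = delayed, 0 = on time or early (negative values are early arrivals).
df["DELAYED"] = (df["ARRIVAL_DELAY"] > DELAY_THRESHOLD_MIN).astype(int)
print(df)
```

Rows with a missing ARRIVAL_DELAY would be labelled 0 by this comparison, so they would need to be handled before the target is created.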

3.2 Feature Relevance Analysis

 A correlation heatmap was generated to identify strong relationships
between numerical features.
 Categorical variables were analyzed using frequency distributions to
understand their predictive strength.
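
The two checks above can be sketched with pandas alone. The sample values are invented, and the column names are assumed to follow the feature list in Section 2.3.

```python
import pandas as pd

df = pd.DataFrame({
    "AIRLINE": ["AA", "AA", "DL", "UA", "DL", "AA"],
    "DEPARTURE_DELAY": [0, 10, 25, 60, 5, 90],
    "ARRIVAL_DELAY": [-3, 8, 30, 55, 2, 100],
    "DISTANCE": [500, 1200, 800, 2400, 300, 1500],
})

# Pearson correlation between every pair of numerical columns.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))

# Relative frequency of each category in a categorical column.
airline_share = df["AIRLINE"].value_counts(normalize=True)
print(airline_share)
```

The corr matrix is the input a heatmap needs; with seaborn installed it could be drawn with sns.heatmap(corr, annot=True).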
4. Exploratory Data Analysis (EDA)
4.1 Missing Values Analysis

 Missing values were identified using df.isnull().sum().

 Missing numerical values were imputed with the median, while categorical
values were filled using the mode.
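
A minimal sketch of the missing-value handling described above, on an invented two-column table (the original notebook code is not reproduced in this report):

```python
import pandas as pd

df = pd.DataFrame({
    "AIRLINE": ["AA", None, "DL", "AA"],
    "WEATHER_DELAY": [0.0, None, 12.0, 4.0],
})

# Count missing entries per column.
missing_before = df.isnull().sum()
print(missing_before)

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numerical column: fill gaps with the median.
        df[col] = df[col].fillna(df[col].median())
    else:
        # Categorical column: fill gaps with the most frequent value.
        df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df.isnull().sum())
```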


4.2 Feature Distribution Analysis

 Histograms and boxplots were plotted to visualize the distribution of
numerical variables.
 Count plots were used for categorical variables to understand the
frequency distribution of airlines, delay reasons, and departure times.
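
The numbers behind these plots can be computed directly, as in the sketch below. The sample data and the bin edges are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AIRLINE": ["AA", "DL", "AA", "UA", "AA", "DL"],
    "ARRIVAL_DELAY": [-5, 0, 12, 30, 45, 180],
})

# Histogram: number of flights per delay bin (bin edges are assumed).
bin_edges = [-60, 0, 15, 60, 240]
counts, _ = np.histogram(df["ARRIVAL_DELAY"], bins=bin_edges)
print(counts)

# Boxplot statistics: the quartiles that define the box.
quartiles = df["ARRIVAL_DELAY"].quantile([0.25, 0.5, 0.75])
print(quartiles)

# Count plot: number of flights per airline.
airline_counts = df["AIRLINE"].value_counts()
print(airline_counts)
```

With matplotlib installed, the same columns could be drawn with df["ARRIVAL_DELAY"].plot.hist() and df.boxplot(column="ARRIVAL_DELAY").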

4.3 Outlier Detection
 Outliers were detected using boxplots and Interquartile Range (IQR)-based
filtering.
 Extreme values were managed using winsorization or capping techniques.
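
A minimal sketch of IQR-based detection followed by capping. The sample values and the conventional 1.5 multiplier are assumptions; the report does not state which limits were applied.

```python
import pandas as pd

delays = pd.Series([-5, 0, 3, 8, 12, 15, 20, 400], name="ARRIVAL_DELAY")

# Interquartile range and the usual 1.5 * IQR fences.
q1, q3 = delays.quantile(0.25), delays.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers = delays[(delays < lower) | (delays > upper)]
print(outliers)

# Capping: pull extreme values back to the nearest fence instead of dropping them.
capped = delays.clip(lower=lower, upper=upper)
print(capped)
```

Capping keeps every row, which matters for a dataset of only 2,999 records; filtering would instead drop the flagged rows.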
5. Data Preprocessing

5.1 Handling Irrelevant Features

 Columns such as FlightID, TailNumber, and thumbnail_link were removed
as they do not contribute to predictive analysis.
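
Dropping such columns could look like the sketch below. The sample table is invented; errors="ignore" is used so that a listed column that is absent from the table does not raise an error.

```python
import pandas as pd

df = pd.DataFrame({
    "FlightID": [1, 2],
    "TailNumber": ["N123AA", "N456DL"],
    "Airline": ["AA", "DL"],
    "ArrivalDelay": [5, 40],
})

# Identifier-like columns named in the report.
irrelevant = ["FlightID", "TailNumber", "thumbnail_link"]

# errors="ignore" skips any listed column that is not present.
df = df.drop(columns=irrelevant, errors="ignore")
print(df.columns.tolist())
```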
5.2 Encoding Categorical Variables
 Label Encoding was applied to categorical variables such as airlines and
delay reasons.
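
A minimal sketch of label encoding with scikit-learn, on invented airline codes. The Airline_encoded column name is an assumption.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"Airline": ["DL", "AA", "UA", "AA"]})

# LabelEncoder assigns integers to the sorted unique labels.
encoder = LabelEncoder()
df["Airline_encoded"] = encoder.fit_transform(df["Airline"])
print(df)

# The fitted mapping can be reversed to recover the original labels.
print(encoder.inverse_transform([2]))
```

The integer codes imply an order that the categories do not have. Tree-based models usually tolerate this, while linear models such as Logistic Regression may be better served by one-hot encoding. The scikit-learn documentation also notes that LabelEncoder is intended for target labels and suggests OrdinalEncoder for input features.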

5.3 Feature Scaling

 Standardization using StandardScaler was applied to numerical features for
better model performance.
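
Standardization can be sketched as follows. Fitting the scaler on the training rows only and reusing it on the test rows is an assumption about the workflow, made here to avoid leaking test statistics into training; the report does not state the order of these steps.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented feature rows: [DISTANCE, TAXI_OUT].
X_train = np.array([[100.0, 5.0], [200.0, 15.0], [300.0, 25.0]])
X_test = np.array([[200.0, 15.0]])

scaler = StandardScaler()
# Learn mean and standard deviation from the training rows only.
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same training statistics to the test rows.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled)
print(X_test_scaled)
```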
5.4 Feature Engineering
 New features such as DayOfWeek (extracted from ScheduledDeparture)
and HourOfDay were created to identify trends.
 Interaction features were generated by combining related variables where
applicable.
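
The derived features can be sketched as below, assuming ScheduledDeparture is stored as (or has been converted to) a datetime column. AvgSpeedMph is a hypothetical interaction feature used only for illustration; the report does not name the interaction features that were built.

```python
import pandas as pd

df = pd.DataFrame({
    "ScheduledDeparture": pd.to_datetime(
        ["2015-01-05 06:30", "2015-01-10 18:45"]
    ),
    "DISTANCE": [600, 1200],   # miles
    "AIR_TIME": [90, 160],     # minutes
})

# Calendar features extracted from the timestamp (Monday = 0, Sunday = 6).
df["DayOfWeek"] = df["ScheduledDeparture"].dt.dayofweek
df["HourOfDay"] = df["ScheduledDeparture"].dt.hour

# Hypothetical interaction feature: average speed in miles per hour.
df["AvgSpeedMph"] = df["DISTANCE"] / (df["AIR_TIME"] / 60)
print(df)
```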

6. Model Implementation

6.1 Data Splitting

 The dataset was split into training (80%) and testing (20%) subsets using
train_test_split().
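
A minimal sketch of the 80/20 split on synthetic data. The random_state and stratify arguments are assumptions added for reproducibility and to keep the class ratio equal in both subsets; the report does not say whether they were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 20 flights, 2 features, 5 of them delayed.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # 20% of the rows go to the test subset
    random_state=42,   # assumed seed, for a repeatable split
    stratify=y,        # assumed: preserve the delayed/on-time ratio
)
print(X_train.shape, X_test.shape)
```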
6.2 Model Selection
 A RandomForestClassifier was chosen as the baseline model for
classification.
 Other models such as Logistic Regression and XGBoost were considered for
optimization.
6.3 Model Training
 The model was trained using model.fit(X_train, y_train).
 Hyperparameters were set to default initially, with plans for further tuning.
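
Training the baseline model with default hyperparameters could look like the sketch below. The data is synthetic (a flight is labelled delayed when the first feature is positive), so the score printed here says nothing about the report's actual results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data: 200 rows, 3 features, label driven by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Default hyperparameters; random_state is fixed only for repeatability.
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

# Feature importances show which inputs the forest relied on.
importances = model.feature_importances_
print(importances)
print(model.score(X_test, y_test))
```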

6.4 Model Evaluation

 Predictions were made using model.predict(X_test).

 Accuracy, precision, recall, and F1-score were computed to evaluate the
model’s effectiveness.

 A confusion matrix was generated to analyze misclassifications.
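
The metrics listed above can be computed as in this sketch. The labels and predictions are invented so that the arithmetic is easy to follow; they are not the model's real outputs.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Invented ground truth and predictions (1 = delayed, 0 = on time).
y_test = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

print(accuracy, precision, recall, f1)
print(cm)
```

In this toy case there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, so accuracy is 8/10 and precision and recall are both 3/4.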

7. Results & Observations
7.1 Impact of Data Preprocessing

 Standardization improved model stability.

 Encoding categorical variables allowed seamless training.
 Removing irrelevant features enhanced model performance by reducing
dimensionality.

7.2 Model Performance

 The baseline Random Forest model achieved an accuracy of X% (value
from implementation).
 Performance can be further improved with hyperparameter tuning and
feature selection.
7.3 Limitations
 Some features might require more advanced transformations.

 Data imbalance needs to be addressed using resampling techniques.
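
One possible resampling technique is random oversampling of the minority class, sketched below with sklearn.utils.resample. The DELAYED column and the sample values are assumptions, and the report does not say which technique would be chosen.

```python
import pandas as pd
from sklearn.utils import resample

# Invented imbalanced sample: 8 on-time flights, 2 delayed flights.
df = pd.DataFrame({
    "DISTANCE": range(10),
    "DELAYED": [0] * 8 + [1] * 2,
})

majority = df[df["DELAYED"] == 0]
minority = df[df["DELAYED"] == 1]

# Draw minority rows with replacement until both classes are the same size.
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)

balanced = pd.concat([majority, minority_upsampled]).reset_index(drop=True)
print(balanced["DELAYED"].value_counts())
```

Resampling would normally be applied to the training subset only, so that the test subset keeps the original class ratio.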

Visualizations

Flight Delay Map

Delay Reasons by Airline
Conclusion
 The flight dataset provides insights into airline performance, delays, and
cancellations. While most flights operate on schedule, some experience
delays due to airline, weather, or system issues.

 Missing data in delay reasons and cancellations should be addressed for
accurate analysis. Key trends, such as peak delay times, worst-performing
airlines, and frequent cancellation patterns, can be explored further.

 Overall, the dataset is valuable for understanding flight punctuality and
operational challenges, with potential for deeper analysis through
visualizations and correlation studies.
IPYNB FILE LINK: https://colab.research.google.com/drive/1Tt_6eE7BBMmG0LP0Klh-1OFpSbDR9dsI?usp=drive_link
