My Capstone Project Presentation

Valentin Veintemilla

October 9th, 2024


Outline

• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix

Executive Summary
• Summary of methodologies
o SpaceX Data Collection using SpaceX API
o SpaceX Data Collection using Web Scraping
o SpaceX Data Wrangling
o SpaceX Exploratory Data Analysis using SQL
o SpaceX EDA Data Visualization
o SpaceX Launch Sites Analysis with Folium and Plotly Dash
o SpaceX Machine Learning Landing Prediction

• Summary of all results
o EDA Results
o Interactive Visual Analytics and Dashboards
o Predictive Analysis (Classification)
Introduction

• Project background and context
SpaceX advertises Falcon 9 rocket launches on its website at a cost of 62
million dollars, while other providers charge upward of 165 million dollars per
launch. Much of the savings comes from SpaceX's ability to reuse the first stage.
Therefore, if we can determine whether the first stage will land, we can estimate
the cost of a launch. This information can be useful to a competing company that
wants to bid against SpaceX for a rocket launch.
• Problems we want to answer
In this capstone, we predict whether the Falcon 9 first stage will land successfully,
using data from the Falcon 9 rocket launches advertised on the SpaceX website.

Section 1

Methodology

Executive Summary
• Data collection methodology:
o Data was collected using the SpaceX REST API and web scraping from Wikipedia
• Perform data wrangling
o Data was processed using one-hot encoding for categorical features
• Perform exploratory data analysis (EDA) using visualization and SQL
• Perform interactive visual analytics using Folium and Plotly Dash
• Perform predictive analysis using classification models
o Data was split into training and test sets. We then trained different classification
algorithms and chose the best one.
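
The split-and-compare step described above could look roughly like the following. This is a minimal sketch, not the project's notebook code: the synthetic data from make_classification, the 80/20 split, the random seed and the set of four candidate models are all assumptions chosen for illustration (the slides only name the Decision Tree explicitly).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded launch features (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=2)

# Hold out 20% of the rows for testing (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# Candidate classifiers (an illustrative choice of models).
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "decision_tree": DecisionTreeClassifier(random_state=2),
    "knn": KNeighborsClassifier(),
}

# Fit each model on the training set and score it on the held-out test set.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}

# Pick the model with the highest test accuracy.
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

When several models tie on test accuracy, max() simply returns the first one, so a tie-breaking rule (such as comparing train or cross-validation accuracy) has to be applied separately.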
Data Collection

• Data collection is the process of gathering and measuring information on targeted
variables in an established system, which then enables one to answer relevant questions
and evaluate outcomes. As mentioned, the dataset was collected through the SpaceX REST API
and web scraping from Wikipedia.
• For the REST API, we started with a GET request. We then decoded the response
content as JSON and turned it into a pandas dataframe using json_normalize(). Finally, we
cleaned the data, checked for missing values and filled them in where needed.
• For web scraping, we used BeautifulSoup to extract the launch records from an HTML
table, parsed the table and converted it to a pandas dataframe for further analysis.
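
As an illustration of the REST API path, the sketch below flattens a few JSON-like records with json_normalize() and fills a missing value. It is a hedged example only: the sample records, the field names and the choice of mean imputation are assumptions, and no network request is made (in the real workflow the records would come from the decoded API response).

```python
import pandas as pd

# Hypothetical records shaped like a decoded JSON API response; in the real
# workflow these would come from something like requests.get(url).json().
sample_launches = [
    {"flight_number": 1, "rocket": {"name": "Falcon 9"}, "payload_mass_kg": 500.0},
    {"flight_number": 2, "rocket": {"name": "Falcon 9"}, "payload_mass_kg": None},
    {"flight_number": 3, "rocket": {"name": "Falcon 9"}, "payload_mass_kg": 700.0},
]

# Flatten nested JSON objects into columns such as "rocket.name".
df = pd.json_normalize(sample_launches)

# Make sure the payload column is numeric so that None becomes NaN.
df["payload_mass_kg"] = pd.to_numeric(df["payload_mass_kg"])

# Check for missing values, then fill them (mean imputation is an assumption).
missing_before = int(df["payload_mass_kg"].isna().sum())
df["payload_mass_kg"] = df["payload_mass_kg"].fillna(df["payload_mass_kg"].mean())

print(df)
```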

Data Collection – SpaceX API

• The information was extracted from a public API where the data is stored
(https://api.spacexdata.com/v4/launches/past)

• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/1-jupyter-labs-spacex-data-collection-api.ipynb

Data Collection - Scraping

• The information was extracted from a Wikipedia table
(https://en.wikipedia.org/wiki/List_of_Falcon_9_and_Falcon_Heavy_launches)
• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/2-jupyter-labs-webscraping.ipynb

Data Wrangling

• We performed exploratory data analysis and determined the training labels.
• We calculated the number of launches at each site, and the number of occurrences
of each orbit.
• We created a landing outcome label from the outcome column and exported the results to CSV.
• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/3-labs-jupyter-spacex-Data%20wrangling.ipynb
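
The counting and labeling steps above can be sketched as follows. This is an illustrative example, not the notebook's code: the sample rows, the field names, the outcome strings and the set of outcomes treated as failures are all assumptions.

```python
from collections import Counter

# Hypothetical rows; the field names and outcome strings are assumptions.
launches = [
    {"LaunchSite": "CCAFS SLC-40", "Orbit": "LEO", "Outcome": "None None"},
    {"LaunchSite": "CCAFS SLC-40", "Orbit": "GTO", "Outcome": "True ASDS"},
    {"LaunchSite": "KSC LC-39A", "Orbit": "GTO", "Outcome": "True RTLS"},
    {"LaunchSite": "VAFB SLC-4E", "Orbit": "SSO", "Outcome": "False ASDS"},
]

# Number of launches per site and number of occurrences of each orbit.
launches_per_site = Counter(row["LaunchSite"] for row in launches)
orbit_counts = Counter(row["Orbit"] for row in launches)

# Outcomes treated as a failed landing (assumed encoding).
bad_outcomes = {"None None", "None ASDS", "False ASDS", "False RTLS", "False Ocean"}

# Binary training label: 1 = landed successfully, 0 = did not land.
for row in launches:
    row["Class"] = 0 if row["Outcome"] in bad_outcomes else 1

labels = [row["Class"] for row in launches]
print(launches_per_site, orbit_counts, labels)
```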

EDA with Data Visualization

• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/5-edadataviz.ipynb
EDA with SQL
• We ran SQL queries to explore and understand the dataset:
o Display the names of the unique launch sites in the space mission.
o Display 5 records where launch sites begin with the string 'CCA'.
o Display the total payload mass carried by boosters launched by NASA (CRS).
o Display the average payload mass carried by booster version F9 v1.1.
o List the date when the first successful landing outcome on a ground pad was achieved.
o List the names of the boosters which have success on a drone ship and have a payload mass greater than 4000 but less than 6000.
o List the total number of successful and failed mission outcomes.
o List the names of the booster_versions which have carried the maximum payload mass.
o List the records which display the month names, failed landing_outcomes on a drone ship, booster versions and launch_site for the months in year 2015.
o Rank the count of successful landing_outcomes between 2010-06-04 and 2017-03-20 in descending order.
• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/4-jupyter-labs-eda-sql-coursera_sqllite.ipynb
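
A few of the queries listed above might be written as in the sketch below, using Python's built-in sqlite3 module against an in-memory database. The table name SPACEXTBL, the column names and the sample rows are hypothetical and only serve to make the example runnable; the actual schema in the notebook may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows (assumptions for illustration only).
conn.execute(
    "CREATE TABLE SPACEXTBL ("
    "launch_site TEXT, customer TEXT, payload_mass_kg REAL, booster_version TEXT)"
)
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?, ?, ?)",
    [
        ("CCAFS SLC-40", "NASA (CRS)", 500.0, "F9 v1.0"),
        ("CCAFS SLC-40", "NASA (CRS)", 700.0, "F9 v1.1"),
        ("KSC LC-39A", "SES", 5300.0, "F9 FT"),
        ("VAFB SLC-4E", "Iridium", 9600.0, "F9 FT"),
    ],
)

# Names of the unique launch sites.
sites = [row[0] for row in conn.execute(
    "SELECT DISTINCT launch_site FROM SPACEXTBL ORDER BY launch_site"
)]

# Up to 5 records where the launch site begins with 'CCA'.
cca_rows = conn.execute(
    "SELECT * FROM SPACEXTBL WHERE launch_site LIKE 'CCA%' LIMIT 5"
).fetchall()

# Total payload mass carried for the customer NASA (CRS).
total_nasa = conn.execute(
    "SELECT SUM(payload_mass_kg) FROM SPACEXTBL WHERE customer = 'NASA (CRS)'"
).fetchone()[0]

# Average payload mass carried by booster version F9 v1.1.
avg_v11 = conn.execute(
    "SELECT AVG(payload_mass_kg) FROM SPACEXTBL WHERE booster_version = 'F9 v1.1'"
).fetchone()[0]

print(sites, len(cca_rows), total_nasa, avg_v11)
conn.close()
```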
Build an Interactive Map with Folium
• The Folium map object is a map centered on NASA Johnson Space Center in Houston, Texas
o Red circle at NASA Johnson Space Center's coordinates with a label showing its name (folium.Circle, folium.map.Marker).
o Red circles at each launch site's coordinates with a label showing the launch site name (folium.Circle, folium.map.Marker, folium.features.DivIcon).
o Grouping of points in a cluster to display multiple pieces of information for the same coordinates (folium.plugins.MarkerCluster).
o Markers to show successful and unsuccessful landings: green for a successful landing and red for an unsuccessful landing (folium.map.Marker, folium.Icon).
o Markers to show the distance between a launch site and key locations (railway, highway, coastline, city), with a line plotted between them (folium.map.Marker, folium.PolyLine, folium.features.DivIcon).
• These objects were created to better understand the problem and the data. They make it easy to show all
launch sites, their surroundings and the number of successful and unsuccessful landings.
• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/6-lab_jupyter_launch_site_location.ipynb
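
The distance shown next to each proximity marker has to be computed from two latitude/longitude pairs. The slides do not say which formula was used; the sketch below assumes the haversine great-circle formula with a mean Earth radius of 6371 km, and the coordinates are hypothetical placeholders, not the project's values.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical coordinates for a launch site and a nearby coastline point.
site = (28.5632, -80.5768)
coast = (28.5634, -80.5680)

distance = haversine_km(*site, *coast)
print(f"{distance:.2f} km")
```

The resulting value can then be used as the label text of a marker, with a line drawn between the two points.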
Build a Dashboard with Plotly Dash

• The dashboard has dropdown, pie chart, range slider and scatter plot components.
o The dropdown allows a user to choose a single launch site or all launch sites (dash_core_components.Dropdown).
o The pie chart shows the total successes and total failures for the launch site chosen with the dropdown component (plotly.express.pie).
o The range slider allows a user to select a payload mass within a fixed range (dash_core_components.RangeSlider).
o The scatter chart shows the relationship between two variables, in particular Success vs Payload Mass (plotly.express.scatter).
• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/7-spacex_dash_app.py
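
The data selection behind the dropdown and the range slider can be sketched in plain Python, as below. This is not the app's code: the records, the field names and the "ALL" sentinel value are assumptions, and in a Dash app this logic would typically sit inside callback functions that return Plotly figures.

```python
# Hypothetical launch records; field names and values are assumptions.
launches = [
    {"site": "CCAFS SLC-40", "payload_kg": 2500, "success": 1},
    {"site": "CCAFS SLC-40", "payload_kg": 5300, "success": 0},
    {"site": "KSC LC-39A", "payload_kg": 3600, "success": 1},
    {"site": "KSC LC-39A", "payload_kg": 9600, "success": 1},
]


def select_launches(records, site="ALL", payload_range=(0, 10000)):
    """Return the records matching the dropdown site and the slider range."""
    low, high = payload_range
    return [
        r for r in records
        if (site == "ALL" or r["site"] == site) and low <= r["payload_kg"] <= high
    ]


def pie_counts(records):
    """Count successes and failures, i.e. the two slices of the pie chart."""
    successes = sum(r["success"] for r in records)
    return {"success": successes, "failure": len(records) - successes}


ksc = select_launches(launches, site="KSC LC-39A")
mid_payload = select_launches(launches, payload_range=(3000, 6000))
print(pie_counts(ksc), pie_counts(mid_payload))
```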

Predictive Analysis (Classification)

• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/8-SpaceX_Machine%20Learning%20Prediction_Part_5.ipynb
Results

• Exploratory data analysis results
• Interactive analytics demo in screenshots
• Predictive analysis results

Section 2
Flight Number vs. Launch Site

We observe that, at each site, the success rate increases as the flight number increases.

Payload vs. Launch Site

Depending on the launch site, a heavier payload may be a factor in a successful landing.
On the other hand, a payload that is too heavy can make a landing fail.

Success Rate vs. Orbit Type

This plot shows the success rate for the different orbit types. We note that ES-L1,
GEO, HEO and SSO have the best success rates.
Flight Number vs. Orbit Type

We notice that the success rate increases with the number of flights for the LEO orbit.
For some orbits, such as GTO, there is no apparent relation between the success rate and the
number of flights. However, we can suppose that the high success rate of orbits such as
SSO or HEO reflects knowledge gained during earlier launches to other orbits.
Payload vs. Orbit Type

The weight of the payloads can have a great influence on the success rate of the
launches in certain orbits. For example, heavier payloads improve the success rate for
the LEO orbit. Another finding is that decreasing the payload weight for a GTO orbit
improves the success of a launch.

Launch Success Yearly Trend

Since 2013, we can see an increase in the SpaceX rocket success rate.
All Launch Site Names

Launch Site Names Begin with 'CCA'

Total Payload Mass

Average Payload Mass by F9 v1.1

First Successful Ground Landing Date

Successful Drone Ship Landing with Payload between 4000 and 6000

Total Number of Successful and Failure Mission Outcomes

Boosters Carried Maximum Payload

2015 Launch Records

Rank Landing Outcomes Between 2010-06-04 and 2017-03-20

Section 3
Folium map – Ground stations

Folium map – Color Labeled Markers

Green markers represent successful launches. Red markers represent unsuccessful launches. We
note that KSC LC-39A has a higher launch success rate.
Folium Map – Distances between CCAFS SLC-40 and its proximities

Is CCAFS SLC-40 in close proximity to railways? Yes
Is CCAFS SLC-40 in close proximity to highways? Yes
Is CCAFS SLC-40 in close proximity to the coastline? Yes
Does CCAFS SLC-40 keep a certain distance away from cities? No
Section 4
Dashboard – Total success by Site

Dashboard – Total successful launches for Site KSC LC-39A

Dashboard – Payload mass vs Outcome for all sites with different payload mass selected

Section 5
Classification Accuracy

Confusion Matrix

Conclusions
• The success of a mission can be explained by several factors, such as the launch site, the orbit and
especially the number of previous launches. Indeed, we can assume that knowledge gained between
launches made it possible to go from launch failures to successes.
• The orbits with the best success rates are GEO, HEO, SSO and ES-L1.
• Depending on the orbit, the payload mass can be a criterion to take into account for the success
of a mission. Some orbits require a light or heavy payload mass, but in general lighter
payloads perform better than heavier ones.
• With the current data, we cannot explain why some launch sites are better than others (KSC LC-39A
is the best launch site). To answer this question, we could obtain atmospheric or
other relevant data.
• For this dataset, we chose the Decision Tree algorithm as the best model, even though the test
accuracy is identical across all the models used. We chose the Decision Tree because it
has a better train accuracy.
