My Capstone Project Presentation
• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix
Executive Summary
• Summary of methodologies
o SpaceX Data Collection using SpaceX API
o SpaceX Data Collection using Web Scraping
o SpaceX Data Wrangling
o SpaceX Exploratory Data Analysis using SQL
o SpaceX EDA Data Visualization
o SpaceX Launch Sites Analysis with Folium and Plotly Dash
o SpaceX Machine Learning Landing Prediction
Section 1
Methodology
Executive Summary
• Data collection methodology:
• Data was collected using the SpaceX REST API and web scraping from Wikipedia
Data Collection – SpaceX API
• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/1-jupyter-labs-spacex-data-collection-api.ipynb
Data Collection - Scraping
Data Wrangling
EDA with Data Visualization
• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/5-edadataviz.ipynb
EDA with SQL
• We ran SQL queries to explore and understand the dataset:
o Display the names of the unique launch sites in the space mission.
o Display 5 records where launch sites begin with the string 'CCA'.
o Display the total payload mass carried by boosters launched by NASA (CRS).
o List the date when the first successful landing outcome on a ground pad was achieved.
o List the names of the boosters that landed successfully on a drone ship and carried a payload mass greater than 4000 but less than 6000.
o List the names of the booster_versions that carried the maximum payload mass.
o List the records showing the month names, failed landing_outcomes on a drone ship, booster versions, and launch_site for the months of 2015.
o Rank the count of successful landing_outcomes between 04-06-2010 and 20-03-2017 in descending order.
• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/4-jupyter-labs-eda-sql-coursera_sqllite.ipynb
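A few of the queries listed on this slide can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, not the notebook's actual code: the table name SPACEXTBL, the column names, and every row below are hypothetical placeholders chosen for illustration.

```python
import sqlite3

# Hypothetical schema and placeholder rows (not real launch records).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE SPACEXTBL (
           Date TEXT, Booster_Version TEXT, Launch_Site TEXT,
           Payload_Mass_kg INTEGER, Customer TEXT, Landing_Outcome TEXT)"""
)
rows = [
    ("2015-01-10", "Booster A", "CCA-SITE-1", 2400, "NASA (CRS)", "Failure (drone ship)"),
    ("2016-05-06", "Booster B", "CCA-SITE-1", 4700, "Customer X", "Success (drone ship)"),
    ("2017-02-19", "Booster C", "KSC-SITE-2", 2500, "NASA (CRS)", "Success (ground pad)"),
]
conn.executemany("INSERT INTO SPACEXTBL VALUES (?, ?, ?, ?, ?, ?)", rows)


def query(sql, params=()):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql, params).fetchall()


# Names of the unique launch sites.
unique_sites = query("SELECT DISTINCT Launch_Site FROM SPACEXTBL ORDER BY Launch_Site")

# Up to 5 records whose launch site begins with 'CCA'.
cca_records = query("SELECT * FROM SPACEXTBL WHERE Launch_Site LIKE 'CCA%' LIMIT 5")

# Total payload mass carried for one customer.
nasa_total = query(
    "SELECT SUM(Payload_Mass_kg) FROM SPACEXTBL WHERE Customer = ?", ("NASA (CRS)",)
)[0][0]

# Date of the first successful ground pad landing (ISO dates sort as text).
first_ground = query(
    "SELECT MIN(Date) FROM SPACEXTBL WHERE Landing_Outcome = 'Success (ground pad)'"
)[0][0]

# Boosters with a successful drone ship landing and 4000 < payload < 6000.
drone_mid = query(
    """SELECT Booster_Version FROM SPACEXTBL
       WHERE Landing_Outcome = 'Success (drone ship)'
         AND Payload_Mass_kg > 4000 AND Payload_Mass_kg < 6000"""
)
```

Storing dates as ISO strings (YYYY-MM-DD) is a design choice of this sketch: it lets MIN and BETWEEN compare dates correctly as plain text.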
Build an Interactive Map with Folium
• The Folium map object is centered on NASA Johnson Space Center in Houston, Texas.
o Red circle at NASA Johnson Space Center's coordinates with a label showing its name (folium.Circle, folium.map.Marker).
o Red circles at each launch site's coordinates with a label showing the launch site name (folium.Circle, folium.map.Marker, folium.features.DivIcon).
o Points grouped in a cluster to display multiple, different pieces of information for the same coordinates (folium.plugins.MarkerCluster).
o Markers showing successful and unsuccessful landings: green for successful, red for unsuccessful (folium.map.Marker, folium.Icon).
o Markers showing the distance between a launch site and key locations (railway, highway, coastline, city), with a line plotted between them (folium.map.Marker, folium.PolyLine, folium.features.DivIcon).
• These objects were created to better understand the problem and the data. They make it easy to show all launch sites, their surroundings, and the number of successful and unsuccessful landings.
• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/6-lab_jupyter_launch_site_location.ipynb
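The distance markers on this slide need a distance between two latitude/longitude pairs. One common way to get it is the haversine formula, sketched below. This is an assumption about the approach rather than the notebook's confirmed code, and the coordinates are placeholders, not surveyed positions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Placeholder coordinates for a launch site and a nearby coastline point.
launch_site = (28.56, -80.57)
coastline_point = (28.56, -80.56)

distance = haversine_km(*launch_site, *coastline_point)
label = f"{distance:.2f} KM"  # text that a map marker could display
```

The two coordinate pairs could also serve as the endpoints of the line drawn between the launch site and the key location.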
Build a Dashboard with Plotly Dash
• The dashboard has dropdown, pie chart, range slider, and scatter plot components.
o The dropdown lets a user choose one launch site or all launch sites (dash_core_components.Dropdown).
o The pie chart shows the total successes and total failures for the launch site chosen with the dropdown component (plotly.express.pie).
o The range slider lets a user select a payload mass within a fixed range (dash_core_components.RangeSlider).
o The scatter chart shows the relationship between two variables, in particular Success vs Payload Mass (plotly.express.scatter).
• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/7-spacex_dash_app.py
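The filtering that would sit behind the dropdown and range slider can be sketched in plain Python, independent of Dash. The function names, the "ALL" dropdown value, the field names, and the rows are all hypothetical. A real app would place this logic inside Dash callbacks and pass the result to the chart functions.

```python
ALL_SITES = "ALL"  # hypothetical dropdown value meaning "all launch sites"


def filter_launches(launches, site=ALL_SITES, payload_range=(0, 10000)):
    """Keep launches for the chosen site whose payload falls in the slider range."""
    low, high = payload_range
    return [
        row
        for row in launches
        if (site == ALL_SITES or row["site"] == site)
        and low <= row["payload_kg"] <= high
    ]


def pie_counts(launches):
    """Count successes and failures, the two slices of the pie chart."""
    successes = sum(row["success"] for row in launches)
    return {"success": successes, "failure": len(launches) - successes}


# Placeholder rows (not real launch records); success is 1 or 0.
launches = [
    {"site": "Site A", "payload_kg": 500, "success": 1},
    {"site": "Site A", "payload_kg": 5300, "success": 0},
    {"site": "Site B", "payload_kg": 3100, "success": 1},
    {"site": "Site B", "payload_kg": 9600, "success": 1},
]

site_a = filter_launches(launches, site="Site A")
mid_payload = filter_launches(launches, payload_range=(2000, 6000))
```

Keeping the filter separate from the plotting code makes it testable without starting a Dash server.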
Predictive Analysis (Classification)
• https://github.com/vale20m/Applied-Data-Science-Capstone/blob/main/8-SpaceX_Machine%20Learning%20Prediction_Part_5.ipynb
Results
Section 2
Flight Number vs. Launch Site
Payload vs. Launch Site
Success Rate vs. Orbit Type
This plot shows the success rate for each orbit type. ES-L1, GEO, HEO, and SSO have the highest success rates.
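A per-orbit success rate is the mean of a 0/1 outcome within each orbit group. The sketch below shows that computation in plain Python. The field names ("Orbit", "Class") and the rows are assumptions for illustration, not the project's data.

```python
from collections import defaultdict


def success_rate_by_group(records, key):
    """Return {group: fraction of records with Class == 1} for the given key."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        successes[record[key]] += record["Class"]
    return {group: successes[group] / totals[group] for group in totals}


# Placeholder rows; Class is 1 for a successful landing and 0 otherwise.
records = [
    {"Orbit": "Orbit A", "Class": 1},
    {"Orbit": "Orbit A", "Class": 0},
    {"Orbit": "Orbit B", "Class": 1},
    {"Orbit": "Orbit B", "Class": 1},
]

rates = success_rate_by_group(records, "Orbit")
# Orbits sorted from highest to lowest success rate, as a bar chart might show.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

A rate computed from only a few launches is fragile, so it helps to look at the group sizes alongside the rates.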
Flight Number vs. Orbit Type
For LEO, the success rate increases with the number of flights. For some orbits, such as GTO, there is no apparent relationship between the success rate and the number of flights. The high success rate of orbits such as SSO or HEO may be due to knowledge gained during earlier launches to other orbits.
Payload vs. Orbit Type
Payload mass can strongly influence launch success in certain orbits. For example, heavier payloads improve the success rate for LEO. Conversely, lighter payloads improve the success rate for GTO.
Launch Success Yearly Trend
Since 2013, the SpaceX rocket success rate has increased.
All Launch Site Names
Launch Site Names Begin with 'CCA'
Total Payload Mass
Average Payload Mass by F9 v1.1
First Successful Ground Landing Date
Successful Drone Ship Landing with Payload between 4000 and 6000
Total Number of Successful and Failure Mission Outcomes
Boosters Carried Maximum Payload
2015 Launch Records
Rank Landing Outcomes Between 2010-06-04 and 2017-03-20
Section 3
Folium map – Ground stations
Folium map – Color Labeled Markers
Green markers represent successful launches; red markers represent unsuccessful launches. KSC LC-39A has a higher launch success rate than the other sites.
Folium Map – Distances between CCAFS SLC-40 and its proximities
Dashboard – Total successful launches for Site KSC LC-39A
Dashboard – Payload mass vs Outcome for all sites with different payload mass selected
Section 5
Classification Accuracy
Confusion Matrix
Conclusions
• The success of a mission can be explained by several factors, such as the launch site, the orbit, and especially the number of previous launches. We can assume that knowledge gained between launches made it possible to go from launch failure to success.
• The orbits with the best success rates are GEO, HEO, SSO, and ES-L1.
• Depending on the orbit, payload mass can be a criterion for mission success. Some orbits require a light or heavy payload mass, but in general lighter payloads perform better than heavier ones.
• With the current data, we cannot explain why some launch sites perform better than others (KSC LC-39A is the best launch site). Atmospheric or other relevant data could help answer this question.
• For this dataset, we chose the Decision Tree algorithm as the best model, even though test accuracy is identical across all the models used, because it has the better train accuracy.
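The selection rule in the last bullet (highest test accuracy, with train accuracy as the tie-breaker) can be written as a short sketch. The model names and scores are hypothetical placeholders, not the results of this project.

```python
def pick_best_model(scores):
    """Pick the model with the highest test accuracy, breaking ties on train accuracy."""
    return max(scores, key=lambda name: (scores[name]["test"], scores[name]["train"]))


# Placeholder accuracies for illustration only.
scores = {
    "Model A": {"train": 0.80, "test": 0.75},
    "Model B": {"train": 0.85, "test": 0.75},
    "Model C": {"train": 0.90, "test": 0.70},
}

best = pick_best_model(scores)
```

Train accuracy is a weak tie-breaker because it can reward overfitting. Cross-validated accuracy is a common alternative when test scores are tied.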