Asteroid

The document outlines a data analytics competition focused on analyzing asteroid risks, emphasizing the importance of detecting and classifying hazardous asteroids to mitigate potential impacts on Earth. Participants are tasked with using data analytics and machine learning techniques to develop predictive models based on various asteroid features, with a structured problem statement divided into five main parts, including exploratory data analysis, feature engineering, classification, and anomaly detection. Teams must submit well-documented code and a detailed report, with evaluations based on clarity, innovation, and feasibility of their solutions.

Uploaded by

zwksgjsyy7


DATA ANALYTICS NSSC’24

Cosmic Collision: Analysing Asteroid Risks with Data
OVERVIEW
Asteroids are small, rocky objects that orbit the Sun, primarily found in the region between Mars
and Jupiter known as the asteroid belt. They are remnants from the early solar system, formed
over 4.6 billion years ago, and are considered planetesimals, or building blocks of planets that
never coalesced into a larger body due to gravitational disturbances, particularly from Jupiter.

Asteroid impacts have been a constant force shaping Earth's history. From minor events that
leave barely a trace to catastrophic collisions that have caused mass extinctions, these cosmic
encounters have played a significant role in our planet's evolution. Given the potential for
devastating consequences, scientists worldwide are dedicated to detecting and tracking
potentially threatening asteroids. To mitigate this risk, it is crucial to identify and analyse
the asteroids that could pose a hazard.

Objective of the Analysis: Data-Driven Classification of Hazardous Asteroids

The primary objective of this analysis is to use data analytics and machine learning techniques
to determine the likelihood of an asteroid being hazardous to Earth based on various features
provided in the dataset. This includes examining characteristics such as the asteroid's size,
orbital parameters, velocity, and proximity to Earth's orbit.

By analysing these features, the goal is to develop predictive models that classify asteroids into
hazardous and non-hazardous categories. This classification is crucial for prioritising which
asteroids require further monitoring, potential deflection efforts, or other mitigation strategies.

We will also identify asteroids exhibiting unusual or anomalous behaviour that may warrant
closer attention.
GENERAL INSTRUCTIONS
1. The dataset for the problem can be found here: Dataset
2. The description of the features used is given here: Description of the Features
3. The problem statement consists of 5 parts, and each part has a few subproblems.
4. These problems are based on the Solar System Dynamics data captured by NASA.
5. Students can participate in teams of 3-4 members.
6. The weightage of each question is mentioned next to the question.
7. Participants are free to use any programming language, environment, and
library. However, it is preferable to use Google Colab or Jupyter Notebook as the
programming environments.
8. For submission, participants should submit the .ipynb file of the solution. The
code should be well-commented to clearly convey the work done. Each team
should also submit a PDF report containing a detailed solution and approach. All
plots and outputs must be shown in the report with proper explanations and
descriptions. Final answers should be highlighted.
9. All code snippets should be attached to the report.
10. Only 15 minutes will be allocated for each team to present their solutions.
Therefore, all relevant conclusions must be presented promptly.
11. Participants will be evaluated on the clarity and applicability of the solution, their
innovation, and the feasibility of their conclusions.
12. Important: Ensure that each .ipynb code block is clearly associated
with the corresponding question and sub-question. Use markdown to
label each block, so it is easy to identify which problem and
subproblem the block is solving. Failure to do so may lead to
disqualification.

Note: The problem statement below is worth a total of 100 MARKS. Your
score will be scaled to 60 MARKS, with the remaining 40 MARKS allocated
for the presentation.
PROBLEM STATEMENT
1. Exploratory Data Analysis (EDA) (20 points)

1.1 Data Inspection (7 points):

● Inspect the dataset and determine the data types of all features (numerical, categorical).
(1 MARK)
● Calculate and analyse basic statistics for each numerical feature, including range, mean,
median, standard deviation, and quartiles. (2 MARKS)
● Identify features that have missing values. (1 MARK)
● Identify the numerical and categorical features of the dataset to use for further analysis.
(1 MARK)
● Use imputation to fill the null values in the dataset. How is this process different for
numerical and categorical columns? (2 MARKS)
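
One common way to treat the two column types differently is to fill numerical gaps with the median and categorical gaps with the most frequent value (the mode). The sketch below illustrates that idea using only the Python standard library; the helper name `impute_column` and the toy columns are hypothetical and not taken from the competition dataset. In a pandas workflow, `DataFrame.fillna` would typically play the same role.

```python
from statistics import median, mode

def impute_column(values, kind):
    """Fill None entries: median for numerical columns, mode for categorical ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to learn a fill value from
    fill = median(observed) if kind == "numerical" else mode(observed)
    return [fill if v is None else v for v in values]

# Hypothetical toy columns (assumed names and values, for illustration only).
diameter_km = [0.12, None, 0.40, 0.25, None]
orbit_class = ["Apollo", "Aten", None, "Apollo", "Amor"]

print(impute_column(diameter_km, "numerical"))
print(impute_column(orbit_class, "categorical"))
```

The median is used rather than the mean because it is less sensitive to skewed distributions and outliers; whether that is the right choice depends on what the EDA shows for each feature.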

1.2 Statistical Inference (6 points):

● Plot the distribution of numerical features to assess the skewness of the data. Does this
dataset require normalisation? If yes, normalise/scale the dataset. (Hint: Use
histograms) (2 MARKS)
● Identify potential outliers in the numerical columns using any statistical technique (e.g.,
box plots, z-score, etc.). (2 MARKS)
● Explore the relationship between different features using scatter plots or correlation
matrices. (Hint: Use Seaborn or similar libraries) (2 MARKS)
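
As one possible reading of the outlier bullet above, the following sketch applies the interquartile-range rule that underlies a box plot: values beyond 1.5 times the IQR from the quartiles are flagged. The function name `iqr_outliers` and the sample velocities are assumptions made for illustration only.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical relative velocities in km/s (made-up numbers).
velocities = [12.1, 13.4, 11.8, 12.9, 13.0, 12.5, 45.0]
print(iqr_outliers(velocities))
```

A z-score rule would work similarly, but it relies on the mean and standard deviation, which the outliers themselves can distort.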

1.3 Visualisation (4 points):

● Create a pairplot using Seaborn to visualise relationships between multiple numerical
features simultaneously. (2 MARKS)
● What do you infer from these plots? How do the diagonal plots and off-diagonal plots in
a pairplot differ in the information they provide? (2 MARKS)

1.4 Tackling Class Imbalance (3 points):

● Is there a classification bias (class imbalance) in this dataset? If yes, how would you
tackle it? (2 MARKS)
● Discuss the implications of class imbalance for model performance. (1 MARK)
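
One of several ways to tackle imbalance is to weight each class inversely to its frequency, so that errors on the rare class cost more during training. The sketch below computes such weights as n_samples / (n_classes * class_count); the function name and the 90/10 label split are hypothetical. Resampling techniques (oversampling the minority class or undersampling the majority class) are alternatives that this sketch does not cover.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical label column: 90 non-hazardous (0) and 10 hazardous (1) asteroids.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With a split this skewed, a model that always predicts "not hazardous" would reach 90% accuracy while missing every hazardous asteroid, which is why metrics such as recall or ROC AUC are usually examined alongside accuracy.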
2. Numerical Interpretation and Mathematical Analysis (20 points)

2.1 Feature Engineering (15 points):

● Combine the approach_date, month, and year features into a single feature representing
the day of the year. Convert it into a ‘datetime’ format. (1 MARK)
● Calculate the ratio of Miss Distance vs. Semi-major axis. Create a 'Time Until Approach'
feature based on the difference between the 'Epoch Date Close Approach' and the current
date. (3 MARKS)
● Calculate the eccentricity of the orbit, average orbital velocity, and orbital period using
Kepler’s Law. (3 MARKS)
● Calculate the heliocentric distance, escape velocity, and specific orbital energy. (3
MARKS)
● Calculate the Specific Angular Momentum using the formula: h=sqrt(GMa(1−e²)). (1
MARK)
● Calculate the velocity at Perihelion and Aphelion. (1 MARK)
● Average the Miss distance of various categories and find the closest approach distance. (1
MARK)
● Calculate Synodic Period and Mean Motion using the orbital period. (2 MARKS)
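
Several of the orbital quantities listed above follow from the semi-major axis a and the eccentricity e. The sketch below shows one way to compute them for a heliocentric orbit. It assumes a is given in astronomical units and that the central body is the Sun; the function names, the constants chosen, and the sample orbit are illustrative assumptions, and the dataset's actual column names and units should be checked against the feature description.

```python
import math

MU_SUN = 1.32712440018e20  # GM of the Sun in m^3/s^2
AU_M = 1.495978707e11      # one astronomical unit in metres
DAY_S = 86400.0            # seconds per day

def orbital_period(a_au):
    """Kepler's third law: T = 2*pi*sqrt(a^3 / mu), in seconds."""
    a = a_au * AU_M
    return 2.0 * math.pi * math.sqrt(a ** 3 / MU_SUN)

def mean_motion(a_au):
    """Mean motion n = 2*pi / T, in radians per second."""
    return 2.0 * math.pi / orbital_period(a_au)

def specific_angular_momentum(a_au, e):
    """h = sqrt(mu * a * (1 - e^2)), in m^2/s."""
    return math.sqrt(MU_SUN * a_au * AU_M * (1.0 - e ** 2))

def perihelion_aphelion_speed(a_au, e):
    """Vis-viva speeds at r = a(1 - e) and r = a(1 + e), in m/s."""
    base = math.sqrt(MU_SUN / (a_au * AU_M))
    v_peri = base * math.sqrt((1.0 + e) / (1.0 - e))
    v_aph = base * math.sqrt((1.0 - e) / (1.0 + e))
    return v_peri, v_aph

def specific_orbital_energy(a_au):
    """epsilon = -mu / (2a), in J/kg (negative for a bound orbit)."""
    return -MU_SUN / (2.0 * a_au * AU_M)

def synodic_period(a_au, earth_a_au=1.0):
    """1/S = |1/T_earth - 1/T|, in seconds; undefined if the periods are equal."""
    t, t_earth = orbital_period(a_au), orbital_period(earth_a_au)
    return 1.0 / abs(1.0 / t_earth - 1.0 / t)

# Hypothetical orbit (made-up values): a = 1.5237 AU, e = 0.2.
a_au, e = 1.5237, 0.2
print("period (days):", orbital_period(a_au) / DAY_S)
print("synodic period (days):", synodic_period(a_au) / DAY_S)
print("h (m^2/s):", specific_angular_momentum(a_au, e))
print("v_peri, v_aph (m/s):", perihelion_aphelion_speed(a_au, e))
```

A quick sanity check is that h equals the perihelion distance times the perihelion speed, since the velocity is perpendicular to the radius there.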

2.2 Additional Features (5 points):

● Create additional features as per your understanding of the problem for improving
accuracy. More marks are awarded for innovative and effective features. (5 MARKS)

3. Handling Binned Values (5 points):

Note: Binned features are categorical variables where values are grouped into discrete
categories such as: [very slow, slow, fast, very fast, etc.].

● Modify the binned features that have an ordinal relationship in this manner:

(very slow = 0, slow = 1, fast = 2, very fast = 3, etc). (2 MARKS)

● One-hot encode the binned features whose relationship is not strictly ordinal. (3
MARKS)
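
The two encodings described above can be sketched as follows. The category order and the orbit-class labels are hypothetical examples, and the helper names are not from any library; with pandas, a mapping via `Series.map` and `pandas.get_dummies` would be the usual equivalents.

```python
# Assumed ordering for an ordinal binned feature (illustrative only).
SPEED_ORDER = ["very slow", "slow", "fast", "very fast"]

def ordinal_encode(values, order):
    """Map each label to its position in the given order."""
    rank = {label: position for position, label in enumerate(order)}
    return [rank[v] for v in values]

def one_hot_encode(values):
    """Return the sorted categories and one 0/1 row per value."""
    categories = sorted(set(values))
    rows = [[int(v == c) for c in categories] for v in values]
    return categories, rows

print(ordinal_encode(["slow", "very fast", "very slow"], SPEED_ORDER))

categories, rows = one_hot_encode(["Apollo", "Aten", "Apollo"])
print(categories)
print(rows)
```

Ordinal codes preserve the ranking between bins, whereas one-hot columns avoid implying an order that does not exist.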
4. Hazardous Classification (35 points):
● Build a robust and efficient classifier to classify asteroids as Hazardous (1) or Not
Hazardous (0). (7 MARKS)
● Implement K-Fold Cross Validation for training. Train the model for each value of K
from 2 to 10. Plot the loss and accuracy versus epochs for these K values. (12 MARKS)
● Optimise all the hyperparameters used in the classifier by selecting an appropriate
optimisation method. (8 MARKS)
● Plot the ROC curve and Confusion Matrix to quantify the performance of your classifier.
(4 MARKS)
● Use SHAP Values, Permutation Importance, or Partial Dependence Plots to list the most
and least useful features. (4 MARKS)
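
The K-fold loop over K = 2 to 10 might be organised along the following lines. This is a standard-library sketch, not a recommended classifier: `fit_threshold` is a deliberately trivial stand-in model on a single made-up feature, and all names and data are hypothetical. In practice a library splitter such as scikit-learn's `KFold` or `StratifiedKFold` would normally replace `k_fold_indices`, and stratification is usually preferable when the classes are imbalanced.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation

def fit_threshold(xs, ys):
    """Toy stand-in classifier: midpoint between the two class means."""
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    if not zeros or not ones:
        return mean(xs)  # fallback when a class is missing from the training fold
    return (mean(zeros) + mean(ones)) / 2

def predict_threshold(threshold, xs):
    """Predict class 1 for values above the threshold, else class 0."""
    return [int(x > threshold) for x in xs]

def cross_val_accuracy(xs, ys, k):
    """Mean validation accuracy over the k folds."""
    scores = []
    for train, validation in k_fold_indices(len(xs), k):
        model = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        predictions = predict_threshold(model, [xs[i] for i in validation])
        hits = sum(p == ys[i] for p, i in zip(predictions, validation))
        scores.append(hits / len(validation))
    return mean(scores)

# Made-up, cleanly separable data: one feature, 10 samples per class.
xs = [float(v) for v in range(10)] + [float(v) for v in range(20, 30)]
ys = [0] * 10 + [1] * 10

results = {k: cross_val_accuracy(xs, ys, k) for k in range(2, 11)}
for k, accuracy in results.items():
    print(f"K={k}: mean validation accuracy = {accuracy:.3f}")
```

The `results` dictionary can then be plotted against K; recording the per-epoch loss and accuracy inside each fold would additionally require an iteratively trained model.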

5. Anomaly Detection (20 points):

● Perform anomaly detection using:
○ (i) Any inbuilt library of your choice. (4 MARKS)
○ (ii) Writing your own anomaly detection algorithm. (12 MARKS)
● Store the results as a new column in the dataset. Print the number of anomalies detected
by each method. (2 MARKS)
● Compare the results from both methods by plotting a Confusion Matrix. Print the
number of examples flagged by both algorithms. (2 MARKS)
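
For the self-written detector, one simple option is the modified z-score based on the median and the median absolute deviation (MAD), flagging values whose score exceeds 3.5. The sketch below implements that rule on a single feature and then compares its flags against a second flag column with a 2x2 agreement matrix. The data, the `library_flags` column (a placeholder standing in for the output of a library detector), and all function names are assumptions made for illustration.

```python
from statistics import median

def mad_anomalies(values, threshold=3.5):
    """Flag values whose modified z-score 0.6745*(x - median)/MAD exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0] * len(values)  # no spread, so nothing can be flagged
    return [int(abs(0.6745 * (v - med) / mad) > threshold) for v in values]

def agreement_matrix(flags_a, flags_b):
    """2x2 matrix: rows index method A's flag, columns index method B's flag."""
    matrix = [[0, 0], [0, 0]]
    for a, b in zip(flags_a, flags_b):
        matrix[a][b] += 1
    return matrix

# Hypothetical miss distances (arbitrary units, made-up values).
miss_distance = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 9.0, 0.8]
own_flags = mad_anomalies(miss_distance)

# Placeholder flags standing in for a library detector's output (made up).
library_flags = [0, 0, 0, 0, 0, 1, 1, 0]

matrix = agreement_matrix(own_flags, library_flags)
print("anomalies (own method):", sum(own_flags))
print("anomalies (library placeholder):", sum(library_flags))
print("agreement matrix:", matrix)
print("flagged by both:", matrix[1][1])
```

A multivariate detector would need a distance or density measure across several features; this univariate rule is only a starting point.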
