
IT445 Project

This document provides instructions for a group project on analyzing an asteroid dataset. The project requires students to: 1. Select a dataset on asteroids and form groups of 3-5 students. 2. Analyze the data to understand its structure, trends, and anomalies. This includes describing the data, reducing dimensions, providing statistics, and validating a hypothesis about asteroid size and danger. 3. Present the data analysis in a written report that follows a template, and in a presentation, for a total of 15 marks. The deadline is Sunday 08/05/2022 at 23:59.


College of Computing and Informatics

Project
Deadline: Sunday 08/05/2022 @ 23:59
[Total Mark for this Project is 15]
Group Details: CRN: 21907
Name: Maha AlMutairi. ID: S180056413
Name: Hanan AlEnazi. ID: S160165963
Name: Doaa Zainaddin. ID: S180225092
Name: Somaya AlShehri. ID: S170068838
Name: Saadah AlMutairi. ID: S170281918

Instructions:

• You must submit two separate copies (one Word file and one PDF file) using the Assignment Template on Blackboard via the allocated folder. These files must not be in compressed format.
• It is your responsibility to check and make sure that you have uploaded both correct files.
• A zero mark will be given if you try to bypass SafeAssign (e.g. misspelling words, removing spaces between words, hiding characters, using different character sets, converting text into images or languages other than English, or any other kind of manipulation).
• Email submissions will not be accepted.
• You are advised to make your work clear and well presented. This includes filling in your information on the cover page.
• You must use this template; failing to do so will result in a zero mark.
• You MUST show all your work, and text must not be converted into an image unless specified otherwise by the question.
• Late submission will result in a ZERO mark.
• The work should be your own; copying from students or other resources will result in a ZERO mark.
• Use Times New Roman font for all your answers.

Learning Outcome(s): CLO 1, 2, 5
Project: 15 Marks

CLO 1: Demonstrate an understanding of the concepts of decision analysis and decision support systems (DSS), including probability, modelling, decisions under uncertainty, and real-world problems.
CLO 2: Describe advanced Business Intelligence, Business Analytics, Data Visualization, and Dashboards.
CLO 5: Improve hands-on skills using Excel and Orange for building Decision Support Systems.

Students can form groups of three students, send their names to the instructor before 3rd March, and select one dataset from the datasets provided in the link below. Otherwise, the instructors will form the groups automatically and assign the unselected datasets to the groups.

https://www.coursera.org/articles/data-analytics-projects-for-beginners
"10 free public datasets for EDA"

Use only the selected or assigned dataset, and analyze the data using Microsoft Excel to discover the structure of the data, trends, patterns, or any anomalies based on your own hypothesis. Perform the following tasks. You should use visualization to aid your answers.

Your project will include two main parts:

1. The final project report, which must incorporate all of the following 5 tasks and be written using the provided template (10 marks distributed among the tasks below).
2. A presentation that illustrates your 5 tasks (5 marks).

==========================================================

Task 1: Understand and describe the nature and structure of the selected dataset. (2 marks)

• A brief description of the dataset.
• Identify the features of the dataset.
• Propose a hypothesis / assumption (between 2 variables) to validate.

Task 2: Reduce the dimension of the dataset to support the hypothesis validation. If necessary, do data preprocessing on any missing values, duplicate values, etc. You can also generate new features from any of the provided features that may support your hypothesis. Due to the limited processing power of some devices, you can reduce your dataset to 1000 tuples. (2 marks)
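The preprocessing that Task 2 asks for (missing values, duplicates, and reducing the data to 1000 tuples) can be sketched in Python as below. This is a minimal illustration, not part of the assignment: it assumes the CSV has already been read into a list of dictionaries (for example with `csv.DictReader`), and the function name `reduce_dataset`, the fixed seed, and the choice to drop incomplete rows rather than fill them in are our own assumptions.

```python
import random


def reduce_dataset(rows, max_rows=1000, seed=42):
    """Drop incomplete rows, remove exact duplicates, then sample at most max_rows rows.

    rows: list of dicts, e.g. list(csv.DictReader(f)) for the downloaded CSV.
    """
    # 1. Missing values: keep only rows where every field is present and non-blank.
    complete = [
        r for r in rows
        if all(v is not None and str(v).strip() != "" for v in r.values())
    ]

    # 2. Duplicates: keep the first occurrence of each identical row, preserving order.
    seen = set()
    unique = []
    for r in complete:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Size limit: sample without replacement, seeded so the subset is reproducible.
    if len(unique) <= max_rows:
        return unique
    return random.Random(seed).sample(unique, max_rows)
```

Fixing the seed means the same 1000-row subset is produced every time, so it can be reopened in Excel for the later tasks.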

Task 3: Provide descriptive statistics for some features using statistical methods to understand the dataset better, and answer the following analysis questions: (3 marks)

• Compare different attributes (features). What trend did you find?
• Include any of the measures of central tendency, such as the mean, median, and mode.
• Describe the spread of your data. This may include measures such as variance, standard deviation, skewness, and kurtosis.

(You are encouraged to pose other analysis questions based on any trend you notice in the dataset.)

Task 4: Validate the hypothesis from Task 1 by investigating the relationship between the two quantitative variables you have chosen, using correlation, regression, and R-squared, and draw possible conclusions. (2 marks)

Task 5: Show a visual representation of your analysis (hint: use the right chart/graph for your data analysis). (1 mark)

Project Report Template

1. Introduction
Asteroids are rocky bodies orbiting the Sun, some of which pass close to planets such as Earth. Depending on several attributes, an asteroid may or may not be classified as hazardous. We found a dataset that records these attributes and is ready for analysis. In this project, the dataset was obtained, preprocessed, analyzed, and visualized to test the relationship between the estimated diameter of an asteroid and whether it is classified as hazardous.

2. Body section
2.1 Data
The chosen dataset for this project is NASA's asteroid data, which NASA publishes through its Near-Earth Object Web Service. The dataset has 40 features and 4,687 rows. It describes these objects through features such as their names, IDs, and descriptive measurements like their estimated diameters. The features include categorical data, such as the name, and numerical data, such as the ID, estimated size, and distance. The data can be downloaded from this link:
https://www.kaggle.com/datasets/shrutimehta/nasa-asteroids-classification

Hypothesis:
It is commonly assumed that larger asteroids are more likely to be classified as dangerous than smaller ones. In this dataset, the asteroids classified as dangerous have a mean diameter of 0.70 kilometers, while the non-hazardous asteroids have a mean diameter of 0.40 kilometers. We therefore hypothesize that there is a positive correlation between the estimated minimum diameter of an asteroid and its hazard classification.
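The per-class means quoted above can be reproduced by grouping the rows on the Hazardous column. A minimal sketch, assuming the rows are dictionaries read from the Kaggle CSV with the column names used in this report; the report does not state which diameter column produced the 0.70 and 0.40 figures, so the column is a parameter here, and `mean_by_class` is a hypothetical helper name.

```python
def mean_by_class(rows, value_col="Est Dia in KM(min)", class_col="Hazardous"):
    """Return {class label: mean of value_col} for each label found in class_col."""
    totals, counts = {}, {}
    for r in rows:
        label = r[class_col]
        # Accumulate a running sum and count per class label.
        totals[label] = totals.get(label, 0.0) + float(r[value_col])
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}
```

The same counts also give the class proportions behind a pie chart of the Hazardous feature.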
Descriptive Statistics:
The following table shows the descriptive statistics of the estimated minimum diameter (in kilometres) of the detected objects.

Est Dia in KM(min)

Mean 0.204604203
Standard Error 0.005398253
Median 0.110803882
Mode 0.152951935
Standard Deviation 0.369573402
Sample Variance 0.136584499
Kurtosis 652.5915736
Skewness 17.67010725
Range 15.57854187
Minimum 0.001010543
Maximum 15.57955241
Sum 958.9798995
Count 4687
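The table above has the layout of Excel's Descriptive Statistics tool (Analysis ToolPak). The sketch below computes the same rows in plain Python so the numbers can be cross-checked. It assumes the skewness and kurtosis rows use the same adjusted sample formulas as Excel's SKEW and KURT functions; the tie-breaking rule for the mode is our own assumption, and `describe` is a hypothetical name.

```python
import math
from collections import Counter


def describe(values):
    """Summary statistics laid out like Excel's Descriptive Statistics output.

    Needs at least 4 values that are not all equal (kurtosis divides by n - 3).
    """
    n = len(values)
    mean = sum(values) / n
    ordered = sorted(values)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

    # Mode: most frequent value, ties broken by first appearance; None if nothing repeats.
    counts = Counter(values)
    top = max(counts.values())
    mode = next(v for v in values if counts[v] == top) if top > 1 else None

    var = sum((x - mean) ** 2 for x in values) / (n - 1)  # sample variance
    sd = math.sqrt(var)

    # Adjusted sample skewness and excess kurtosis (the SKEW and KURT formulas).
    m3 = sum(((x - mean) / sd) ** 3 for x in values)
    m4 = sum(((x - mean) / sd) ** 4 for x in values)
    skew = n / ((n - 1) * (n - 2)) * m3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": median,
        "Mode": mode,
        "Standard Deviation": sd,
        "Sample Variance": var,
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": ordered[-1] - ordered[0],
        "Minimum": ordered[0],
        "Maximum": ordered[-1],
        "Sum": sum(values),
        "Count": n,
    }
```

Read this way, the very large kurtosis (652.59) and skewness (17.67) of the minimum diameter indicate a long right tail: most objects are small and a few are very large, which is also why the mean (0.205 km) sits well above the median (0.111 km).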

We also computed descriptive statistics for other features. Two worth noting are the relative velocity and the miss distance, shown in the following table:

Statistic            Relative Velocity km per sec   Miss Dist.(kilometers)
Mean                 13.97081106                    38413467
Standard Error       0.106530016                    318588.5
Median               12.91788922                    39647712
Mode                 12.28855526                    5421689
Standard Deviation   7.293222605                    21811098
Sample Variance      53.19109596                    4.76E+14
Kurtosis             0.81028737                     -1.1896
Skewness             0.887879907                    -0.10239
Range                44.29824252                    74754990
Minimum              0.335504112                    26609.89
Maximum              44.63374663                    74781600
Sum                  65481.19146                    1.8E+11
Count                4687                           4687

The Hazardous feature has two values (True and False); their distribution is shown in the pie chart below:

2.2 Methods
The data was obtained from the Kaggle website, which hosts thousands of datasets. The dataset was downloaded as a CSV (comma-separated values) file and opened in MS Excel for analysis, preprocessing, and preparation. The selected dataset has been processed and studied multiple times by other researchers in the field. The method used in this research is quantitative: the numerical features are analysed in relation to the "class" of each detected object (hazardous or not).

2.3 Analysis
Data Pre-processing
The original dataset has some features that we do not need for our analysis and classification, including Name and Neo Reference ID, so we deleted them. The "Close Approach Date" feature is also not needed: the date is not related to this analysis and does not help with classification, so we deleted it as well. The same applies to the "Orbit Determination Date" feature.
The original dataset also contains an "Orbiting Body" feature that has only one value, "Earth". Because it is identical for every row, including it makes no difference to the analysis, so we deleted it too. The same applies to the "Equinox" feature, which also contains only one value.
The remaining features matter to the analysis, but some of them contain redundant data that has to be deleted. These are the features that store the same diameter in multiple measurement units:
"Est Dia in KM(min)", "Est Dia in KM(max)", "Est Dia in M(min)", "Est Dia in M(max)", "Est Dia in Miles(min)", "Est Dia in Miles(max)", "Est Dia in Feet(min)", "Est Dia in Feet(max)".
We kept only the two features measured in kilometres, "Est Dia in KM(min)" and "Est Dia in KM(max)", and deleted the other six.
We also noticed that the class feature, which records whether the object is hazardous, is stored as True/False, so we converted it to numbers: 1 for True and 0 for False.
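The column deletions and the True/False encoding described above can be written as a short script, which makes the steps repeatable. A minimal sketch, assuming the rows come from `csv.DictReader` and that the header names match the Kaggle CSV as quoted in this report (for example "Neo Reference ID" and "Orbiting Body"); `DROP_COLUMNS`, `constant_columns`, and `preprocess` are hypothetical names.

```python
# Columns the report deletes. Names are assumed to match the Kaggle CSV header.
DROP_COLUMNS = {
    # Identifiers and dates not used in the analysis
    "Name", "Neo Reference ID", "Close Approach Date", "Orbit Determination Date",
    # Single-valued columns
    "Orbiting Body", "Equinox",
    # Diameter repeated in units other than kilometres
    "Est Dia in M(min)", "Est Dia in M(max)",
    "Est Dia in Miles(min)", "Est Dia in Miles(max)",
    "Est Dia in Feet(min)", "Est Dia in Feet(max)",
}


def constant_columns(rows):
    """Return the sorted names of columns that hold a single distinct value."""
    return sorted(k for k in rows[0] if len({r[k] for r in rows}) == 1)


def preprocess(rows):
    """Drop the columns above and encode Hazardous as 1 (True) or 0 (False)."""
    cleaned = []
    for r in rows:
        out = {k: v for k, v in r.items() if k not in DROP_COLUMNS}
        # The CSV stores the class as the text "True"/"False".
        out["Hazardous"] = 1 if str(r["Hazardous"]).strip().lower() == "true" else 0
        cleaned.append(out)
    return cleaned
```

Running `constant_columns` first is one way to confirm which features, like Orbiting Body and Equinox, carry no information before deleting them.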
Correlation of features:
In this correlation analysis, we focus on the size (diameter) of the asteroids and whether they are hazardous. In Excel, we created a correlation matrix showing the relationship between the diameter features (min, max) and the Hazardous class. The results show a positive correlation of 0.132 between each diameter feature and Hazardous: larger asteroids tend to be classified as hazardous more often, although the correlation is weak. The table also shows that the two diameter features are perfectly correlated with each other (a correlation of 1).

                     Est Dia in KM(min)   Est Dia in KM(max)   Hazardous
Est Dia in KM(min)   1
Est Dia in KM(max)   1                    1
Hazardous            0.132424352          0.132424352          1

We also created a correlation matrix for all variables (features) in the dataset, shown in the following table. Positive values indicate a positive correlation and negative values a negative correlation. We highlighted the Hazardous feature to show how the other features correlate with this class.
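Excel's Correlation tool reports Pearson's r for every pair of columns. The sketch below reproduces that calculation, assuming each feature is already a list of numbers with Hazardous coded as 0/1 (in which case Pearson's r is the point-biserial correlation). Because r does not change when a variable is multiplied by a positive constant, identical Hazardous correlations for the min and max diameters (0.132424352 for both) are what one would expect if the max diameter is a fixed multiple of the min diameter, which is consistent with their correlation of 1. The helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Lower-triangular {(row, col): r} for a dict of name -> numeric list, like Excel's output."""
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[: i + 1]
    }
```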

Regression Analysis and R-Square analysis:

We ran a regression analysis of the two features, Est Dia in KM(min) and Hazardous, to examine the nature of their relationship. The fitted regression line has a positive slope, as shown in the following diagram.

The table below summarizes the regression, including the R-Square value.

Regression Statistics

Multiple R 0.350919113

R Square 0.123144224

Adjusted R Square 0.12214222

Standard Error 0.397283612

Observations 999
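The regression summary above follows the layout of Excel's Regression tool. The sketch below recomputes the same quantities for one predictor by ordinary least squares; the function name and dictionary keys are our own. One property worth checking: in a simple regression with an intercept, Multiple R equals the absolute value of Pearson's r, and R Square is its square (0.3509² ≈ 0.123). The regression above therefore corresponds to r ≈ 0.35 on its 999 observations, which differs from the 0.132 in the correlation matrix; the two analyses were presumably run on different sets of rows, although the report does not say which.

```python
import math


def simple_regression(xs, ys):
    """Least-squares fit y = intercept + slope * x with Excel-style summary statistics."""
    n, k = len(xs), 1  # k = number of predictors
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx

    # Residual and total sums of squares give R Square.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot

    return {
        "Intercept": intercept,
        "Slope": slope,
        "Multiple R": math.sqrt(r2),
        "R Square": r2,
        "Adjusted R Square": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "Standard Error": math.sqrt(ss_res / (n - k - 1)),
        "Observations": n,
    }
```

Because Hazardous is coded 0/1, this straight-line fit is a linear probability model; logistic regression is the more usual choice for a binary class, but the linear version is what matches Excel's output.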

2.4 Results
We conducted a correlation analysis and a regression analysis with its R-Square value. The correlation analysis gives a value of 0.132424352, which indicates a weak positive correlation between the two variables. The regression line has a positive slope. The R-Square value of 0.123144224 means that the estimated minimum diameter explains about 12% of the variance in the Hazardous class.

3. Conclusion
The goal of this analysis was to find the relationship between two variables (features) in the dataset: the estimated diameter of the asteroids and whether they are hazardous. We found a positive correlation between the two variables and a regression line with a positive slope, with an R-Square value of about 0.12. This report also shows correlations between other features in the dataset, but we focused on these two variables to test the hypothesis, which is supported by this dataset, although the relationship is weak. The results are consistent with the general view that asteroids with larger diameters are more dangerous than smaller ones.

For future analysis, researchers can test the relationship between asteroid hazardousness and other features in the same dataset.
