2. Application of Machine Learning
This paper discusses three algorithms, and there has been a considerable amount of work on each of them. In the paper 'Improved naive Bayes classification algorithm for traffic risk management' by Hong Chen, Songhua Hu, Rui Hua and Xiuju Zhao, it is mentioned that the naive Bayes classification algorithm is widely used in big data analysis and other fields because of its simple and fast algorithm structure. To address the shortcomings of the naive Bayes classification algorithm, the paper uses feature weighting and Laplace calibration to improve it, obtaining an improved naive Bayes classification algorithm. Through numerical simulation, it is found that when the sample size is large, the accuracy of the improved algorithm is more than 99% and very stable; when the number of sample attributes is less than 400 and the number of categories is less than 24, the accuracy is more than 95%. Through empirical research, it is found that the improved algorithm greatly improves the correct rate of discriminant analysis, from 49.5% to 92%. Through robustness analysis, the improved naive Bayes classification algorithm shows higher accuracy. [1]
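The alpha parameter of scikit-learn's MultinomialNB applies this kind of Laplace smoothing (alpha=1.0 is the classic Laplace estimate). The sketch below is only an illustration of the idea, not the authors' algorithm: the count data is synthetic and the per-feature weights are invented for the example, not the paper's weighting scheme.

import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

# Toy count data: 200 samples, 6 non-negative integer features, 2 classes.
rng = np.random.default_rng(0)
X = rng.poisson(lam=[1, 2, 3, 1, 2, 3], size=(200, 6))
y = (X[:, :3].sum(axis=1) > X[:, 3:].sum(axis=1)).astype(int)

# Illustrative feature weights (not the paper's scheme): scale each
# column before fitting so some features count more than others.
weights = np.array([1.0, 1.5, 2.0, 1.0, 1.5, 2.0])
Xw = X * weights

X_train, X_test, y_train, y_test = train_test_split(Xw, y, random_state=0)

# alpha=1.0 is Laplace smoothing: it keeps zero-count feature/class
# combinations from producing zero probabilities.
clf = MultinomialNB(alpha=1.0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))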
In the paper 'KNN Model-Based Approach in Classification' by Gongde Guo, Hui Wang, David Bell, Yaxin Bi and Kieran Greer, it is mentioned that k-nearest-neighbours (kNN) is a simple but effective method for classification. The major drawbacks of kNN are (1) its low efficiency, since being a lazy learning method prohibits it in many applications such as dynamic web mining for a large repository, and (2) its dependency on the selection of a 'good value' for k. In this paper, the authors propose a novel kNN-type method for classification that aims to overcome these shortcomings. Their method constructs a kNN model for the data, which replaces the data itself as the basis of classification. The value of k is automatically determined, varies for different data, and is optimal in terms of classification accuracy. The construction of the model reduces the dependency on k and makes classification faster. Experiments were carried out on public datasets from the UCI machine learning repository to test the method. The experimental results show that the kNN-based model compares well with C5.0 and kNN in terms of classification accuracy, but is more efficient than standard kNN. [2]
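The paper's model construction is not reproduced here, but the dependency on a 'good value' of k that it targets can be illustrated with a plain cross-validated grid search over k, a common stand-in for hand-picking k. This is a minimal scikit-learn sketch on the built-in iris data:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Search candidate k values with 5-fold cross-validation instead of
# fixing k by hand.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 21))},
    cv=5,
)
search.fit(X_train, y_train)
print("best k:", search.best_params_["n_neighbors"])
print("test accuracy:", search.score(X_test, y_test))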
In 'Random Forests and Decision Trees' by Jehad Ali, Rehanullah Khan, Nasir Ahmad and Imran Maqsood, the authors compare the classification results of two models, Random Forest and J48, for classifying twenty versatile datasets. They took 20 datasets available from the UCI repository, with instance counts varying from 148 to 20,000. The classification parameters consist of correctly classified instances, incorrectly classified instances, F-measure, precision, accuracy and recall. They discuss the pros and cons of using these models for large and small datasets. The classification results show that Random Forest gives better results for the same number of attributes on large datasets, i.e. those with a greater number of instances, while J48 is handy with small datasets (fewer instances). The results from the breast cancer datasets show that when the number of instances increased from 286 to 699, the percentage of correctly classified instances for Random Forest increased from 69.23% to 96.13%; that is, for datasets with the same number of attributes but more instances, Random Forest accuracy increased. [3]
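A minimal scikit-learn sketch of the same kind of comparison is given below. It uses the library's bundled Wisconsin diagnostic breast cancer data (569 instances, related to but not one of the paper's exact datasets), and scikit-learn's CART-based DecisionTreeClassifier stands in for WEKA's J48 (C4.5).

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Bundled Wisconsin diagnostic breast cancer data (569 instances).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# DecisionTreeClassifier is CART-based, used here as a stand-in for J48.
for name, model in [
    ("Decision Tree", DecisionTreeClassifier(random_state=0)),
    ("Random Forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))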
In 'Do we need hundreds of classifiers to solve real world classification problems?' by Amorim, D.G., Barro, S., Cernadas, E. and Delgado, M.F., the authors evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian methods, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest neighbours, partial least squares and principal component regression, logistic and multinomial regression, multivariate adaptive regression splines and other methods). They use 121 datasets from the UCI database to study classifier behaviour independently of the dataset collection. The winners are the random forest (RF) versions implemented in R (and accessed via caret) and the SVM with Gaussian kernel implemented in C using LibSVM. [4]
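scikit-learn's SVC class is itself a LibSVM wrapper, so the study's runner-up can be sketched as below. This is a minimal example on the built-in wine data, with illustrative C and gamma settings rather than tuned values:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# kernel="rbf" is the Gaussian kernel singled out in the study.
# Feature scaling matters for RBF-kernel SVMs, hence the pipeline.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))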
In 'Trends in extreme learning machines: a review' by Huang, G., Huang, G., Song, S. and You, K., the authors aim to report the current state of theoretical research and practical advances on the extreme learning machine (ELM). Apart from classification and regression, ELM has recently been extended to clustering, feature selection, representational learning and many other learning tasks. Due to its remarkable efficiency, simplicity and impressive generalization performance, ELM has been applied in a variety of domains, such as biomedical engineering, computer vision, system identification, and control and robotics. [5]
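The basic single-hidden-layer ELM that the review builds on is simple enough to sketch in NumPy: the hidden-layer weights are drawn at random and never trained, and only the output weights are obtained, via a single least-squares solve. This is a minimal sketch on the iris data, not the review's code:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_hidden = 50

# Random, fixed hidden layer: these weights are never updated.
W = rng.normal(size=(X.shape[1], n_hidden))
b = rng.normal(size=n_hidden)

def hidden(A):
    return np.tanh(A @ W + b)

# One-hot targets; output weights come from one least-squares solve.
T = np.eye(3)[y_train]
beta, *_ = np.linalg.lstsq(hidden(X_train), T, rcond=None)

pred = hidden(X_test) @ beta
print("test accuracy:", (pred.argmax(axis=1) == y_test).mean())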
3. Methodology
The methodology followed consisted of 9 main steps. The coding was done in the Python programming language and was executed using Anaconda. The algorithm of the program was as follows:
STEP 1: START
STEP 2: We import the scikit-learn (sklearn) package
STEP 3: We split the data frame into categorical and numeric features to perform
preprocessing, as sketched below
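A minimal pandas sketch of such a split is shown below; the example frame and its columns are hypothetical, since the project's actual dataset is not reproduced here:

import pandas as pd

# Hypothetical example frame standing in for the project's dataset.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "income": [40000.0, 52000.0, 61000.0],
    "gender": ["F", "M", "F"],
    "city": ["Pune", "Delhi", "Mumbai"],
})

# Split columns by dtype so each group can be preprocessed separately
# (e.g. scaling for numeric features, encoding for categorical ones).
numeric_df = df.select_dtypes(include="number")
categorical_df = df.select_dtypes(exclude="number")
print(list(numeric_df.columns), list(categorical_df.columns))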
4. Results
The train score tells us how well the model generalized or fitted the training data. If the model fits too closely to data with a lot of variance, this causes over-fitting: the model curves a lot to fit the training data and generalizes very poorly, which leads to a poor test score.

The test score is computed once our model is ready. Before this step we have not touched this data set, so it represents a real-life scenario. The higher the score, the better the model generalizes.
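For illustration, the gap between the two scores can be seen by fitting an unpruned decision tree, which tends to memorize its training data. This is a minimal scikit-learn sketch, not this project's exact pipeline:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree can fit the training data perfectly (train score 1.0)
# while scoring noticeably lower on held-out data: over-fitting.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train score:", model.score(X_train, y_train))
print("test score:", model.score(X_test, y_test))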
[Chart: Naive Bayes train and test scores]
[Chart: Decision Tree train and test scores]
It is observed from the above charts that, out of the three machine learning algorithms, Decision Tree scored the highest.

5. Conclusion
From the findings of this project, it can be concluded that Decision Tree is the most accurate algorithm, followed by Naïve Bayes and KNN. Both the train and test scores of the Decision Tree model were the highest of the three.
Machine learning has great uses in the future. It is used in internet search engines, email filters to sort out spam, websites that make personalised recommendations, banking software that detects unusual transactions, and many apps on our phones, such as voice recognition. It is efficient because it easily identifies trends and patterns.
References
1. Hong Chen, Songhua Hu, Rui Hua and Xiuju Zhao, 'Improved naive Bayes classification algorithm for traffic risk management' [1]
2. Gongde Guo, Hui Wang, David Bell, Yaxin Bi and Kieran Greer, 'KNN Model-Based Approach in Classification' [2]
3. Jehad Ali, Rehanullah Khan, Nasir Ahmad and Imran Maqsood, 'Random Forests and Decision Trees' [3]
4. Amorim, D.G., Barro, S., Cernadas, E. and Delgado, M.F., 'Do we need hundreds of classifiers to solve real world classification problems?', Journal of Machine Learning Research, 2014 [4]
5. Huang, G., Huang, G., Song, S. and You, K., 'Trends in extreme learning machines: a review', Neural Networks, 2015 [5]