
A Survey on Machine Learning Algorithms: Concepts and Applications

S. Akshaya1, Student, I B.Sc. AI; B. Dikshaa2, Student, I B.Sc. AI;
P. Rakshaya3, Student, I B.Sc. AI; Dr. R. Shanthy4, AP/CS
Sona College of Arts and Science

ABSTRACT: Over the past few decades, Machine Learning (ML) has evolved from the
endeavour of a few computer enthusiasts exploring the possibility of computers learning to
play games, and a branch of Mathematics (Statistics) that seldom considered computational
approaches, into an independent research discipline. It has not only provided the necessary
base for the statistical-computational principles of learning procedures, but has also developed
various algorithms that are regularly used for text interpretation, pattern recognition, and
many other commercial purposes, and has led to a separate research interest in data mining for
identifying hidden regularities or irregularities in social data that grow by the second. This paper
explains the concept and evolution of Machine Learning and some of the popular
Machine Learning algorithms, and compares three of the most popular algorithms on
some basic notions. The Sentiment140 dataset was used, and the performance of each algorithm
in terms of training time, prediction time and prediction accuracy has been
documented and compared.
KEYWORDS: Machine Learning, Algorithm, Data, Training, Accuracy

I. INTRODUCTION

Machine learning is a paradigm that may refer to learning from past experience (which in this
case is previous data) to improve future performance. The sole focus of this field is
automatic learning methods. Learning refers to the modification or improvement of an algorithm
based on past "experiences", automatically and without external assistance from humans.
While designing a machine (a software system), the programmer always has a specific
purpose in mind. For instance, consider J. K. Rowling's Harry Potter series and Robert
Galbraith's Cormoran Strike series. To confirm the claim that it was indeed Rowling who
had written those books under the name Galbraith, two experts were engaged by The London
Sunday Times, and using forensic Machine Learning they were able to prove that the claim
was true. They developed a machine learning algorithm, "trained" it with writing samples from
Rowling as well as other writers so it could seek and learn the underlying patterns, and then "tested"
it on the books by Galbraith. The algorithm concluded that Rowling's and Galbraith's writing
matched most closely in several aspects. So instead of designing an algorithm to address the
problem directly, a researcher using Machine Learning seeks an approach through which the
machine, i.e., the algorithm, will come up with its own solution based on the example or
training data set provided.

A. MACHINE LEARNING: INTERSECTION OF STATISTICS AND COMPUTER SCIENCE


Machine Learning was the phenomenal outcome of Computer Science and Statistics
joining forces. Computer Science focuses on building machines that solve particular
problems, and tries to identify whether problems are solvable at all. The main approach that
Statistics fundamentally employs is inference from data, modelling hypotheses and measuring the
reliability of conclusions. The defining idea of Machine Learning is a little different, but
nonetheless partially dependent on both. Whereas Computer Science concentrates on
manually programming computers, ML addresses the problem of getting computers to
re-program themselves whenever they are exposed to new data.

Methods used in Machine Learning: Over the past years an enormous number of ML algorithms
have been introduced. Only some of them were able to solve the problem at hand, so the rest were
replaced by others [3]. ML algorithms fall into three broad categories: supervised learning,
unsupervised learning and reinforcement learning.

Supervised learning: It consists of a given set of input variables (training data) which are
pre-labelled, together with target data [5]. Using the input variables, it generates a mapping
function to map inputs to the required outputs. The parameter adjustment procedure continues
until the system acquires a suitable level of accuracy on the training data.
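
To make the supervised-learning loop concrete, here is a minimal sketch in Python using scikit-learn (cited as [9] in the references). The dataset, model choice and split ratio are illustrative assumptions, not the exact setup used for the Sentiment140 experiments reported in the abstract.

```python
# Minimal supervised-learning sketch: learn a mapping f : X -> Y from
# pre-labelled training data, then measure accuracy on held-out data.
from sklearn.datasets import load_iris            # illustrative stand-in dataset
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                 # features X and labels y

# Hold out part of the labelled data to estimate generalisation accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)         # any supervised estimator works here
model.fit(X_train, y_train)                       # "training": adjust parameters on labelled data

y_pred = model.predict(X_test)                    # apply the learned mapping to unseen inputs
print("test accuracy:", accuracy_score(y_test, y_pred))
```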

B. PSYCHOLOGY OF HUMANS AND ANIMALS


A third field of study that is strongly tied to machine learning is the study of the human and
animal brain in neuroscience, psychology, and allied disciplines. Researchers hypothesised that
the way an animal or human mind learns through time and experience would probably not differ
much from how a machine could learn from experience. However, studies focused on applying
human brain learning techniques to machine learning challenges have not yet produced results
as promising as those focusing on statistical and computational approaches. This could be
because there is still much to learn about the psychology of humans and animals.
Notwithstanding these challenges, machine learning and human learning are increasingly
being studied together.

C. DATA MINING, ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING


In practice, these three disciplines are so intertwined and overlapping that it is almost impossible
to draw a boundary or hierarchy among them. To put it in other words, these three
fields are symbiotically related, and a combination of these approaches may be used as a tactic
to produce more efficient and sensitive outputs. Roughly, data mining is about
interpreting any kind of data, and it lays the foundation for both artificial intelligence and
machine learning. In practice, it not only samples information from various sources but also
analyses and recognises patterns and correlations existing in that information that would
have been difficult to interpret manually. Hence, data mining is not a mere method to prove a
hypothesis but a method for drawing relevant hypotheses. The mined data and the
corresponding patterns and hypotheses may be utilised as the basis for both machine learning
and artificial intelligence. Artificial intelligence may be broadly defined as machines
having the ability to solve a given problem on their own without any human intervention. The
solutions are not programmed directly into the system; instead, the necessary data and the AI
interpreting that data produce a solution by themselves. The interpretation that goes on underneath
is nothing but a data mining algorithm. Machine learning takes the approach to an
advanced level by providing the data essential for a machine to train and modify itself suitably
when exposed to new data. This is known as "training". It focuses on extracting information
from considerably large sets of data, and then detects and identifies underlying patterns
using various statistical
measures to improve its ability to interpret new data and produce more effective results.
Evidently, some parameters should be "tuned" at the incipient level for better productivity.
Machine learning is the foothold of artificial intelligence. It is improbable to design
any machine having abilities associated with intelligence, like language or vision, that gets there
at once; that task would have been almost impossible to solve. Moreover, a system cannot
be considered completely intelligent if it lacks the ability to learn and improve from its
previous exposures.

II. Literature Survey


The several applications mentioned earlier suggest considerable advancement so far in
ML algorithms and their fundamental theory. The discipline is branching in several
directions, probing a range of learning problems. ML is a vast discipline, and over the past few
decades numerous researchers have added their work to this field. The enumeration of these
works is practically endless, and mentioning every work is beyond the scope of this paper.
However, this paper describes the main research questions that are being pursued at present
and provides references to some of the recent notable works on each task. [10][11][25][26][27]

A. USING UNLABELLED DATA IN SUPERVISED LEARNING


Supervised learning algorithms approximate the relation between features and labels by
defining an estimator f : X → Y from a particular set of pre-labelled training data {(x_i, y_i)}.
The main challenge in this approach is that pre-labelled data is not always readily available. So
before applying supervised classification, data need to be preprocessed, filtered and labelled
using unsupervised learning, feature extraction, dimensionality reduction, etc., thereby adding
to the total cost. This hike in cost can be reduced effectively if the supervised algorithm can
make use of unlabelled data (e.g., images) as well. Interestingly, in many special instances of
learning problems with additional assumptions, unlabelled data can indeed be warranted to
improve the expected accuracy of supervised learning; consider, for example, classifying web pages
or detecting spam emails. Currently, active researchers are seriously considering new
algorithms or new learning problems to exploit unlabelled data efficiently.
[12][13][14][15][16]
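
A common way to exploit unlabelled data alongside a small labelled set is semi-supervised label propagation. The following is a minimal sketch using scikit-learn's LabelPropagation, where unlabelled points are marked with -1; the dataset and the fraction of hidden labels are illustrative assumptions rather than a setup taken from the surveyed works.

```python
# Semi-supervised sketch: most labels are hidden (-1) and recovered by
# propagating the few known labels through the data's neighbourhood graph.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.semi_supervised import LabelPropagation
from sklearn.metrics import accuracy_score

X, y_true = load_digits(return_X_y=True)

rng = np.random.RandomState(0)
y_partial = y_true.copy()
unlabelled = rng.rand(len(y_true)) < 0.9          # hide 90% of the labels
y_partial[unlabelled] = -1                        # -1 marks "no label available"

model = LabelPropagation(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)                           # uses both labelled and unlabelled points

# Evaluate only on the points whose labels were hidden during training.
pred = model.transduction_[unlabelled]
print("accuracy on unlabelled points:", accuracy_score(y_true[unlabelled], pred))
```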

B. TRANSFERRING THE LEARNING EXPERIENCE


In many real-life problems, the supervised algorithm may involve learning a family of
related functions (e.g., diagnosis functions for hospitals across the globe) rather than a
single function. Even if the diagnosis functions for different cities (e.g., Kolkata and
London) are presumed to be relatively different, some commonalities are anticipated as well.
ML algorithms like hierarchical Bayesian methods give one approach: they assume the
learning parameters of both functions, say for Kolkata and London respectively,
share some common prior probabilities, and allow the data from different city hospitals to
override the relevant priors as fitting. The subtlety increases further when the transfer among the
functions is compounded.
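
As a rough illustration of the shared-prior idea (not the method of any specific cited work), the sketch below partially pools per-hospital estimates toward a global estimate, with the amount of pooling controlled by an assumed pseudo-count; hospitals with little local data lean more on the shared prior, while data-rich hospitals largely override it. The counts and the pseudo-count value are hypothetical.

```python
# Partial-pooling sketch: per-hospital positive-diagnosis rates shrunk
# toward a shared (global) rate, in the spirit of hierarchical Bayes.

# Hypothetical counts: (number of patients, number of positive diagnoses).
hospitals = {
    "Kolkata": (40, 12),     # small sample -> relies more on the shared prior
    "London": (2000, 820),   # large sample -> mostly keeps its local estimate
}

total_n = sum(n for n, _ in hospitals.values())
total_pos = sum(k for _, k in hospitals.values())
global_rate = total_pos / total_n                 # shared prior mean

PSEUDO_COUNT = 100.0                              # assumed prior strength (a tuning choice)

for name, (n, k) in hospitals.items():
    local_rate = k / n
    # Weighted blend of local evidence and the shared prior.
    pooled = (n * local_rate + PSEUDO_COUNT * global_rate) / (n + PSEUDO_COUNT)
    print(f"{name}: local={local_rate:.3f}  pooled={pooled:.3f}  global={global_rate:.3f}")
```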

C. LINKING DIFFERENT ML ALGORITHMS


Various ML algorithms have been introduced and experimented with in a number of domains.
One line of research aims to discover the possible correlations among the existing ML
algorithms, and the appropriate cases or scenarios in which to use a particular algorithm. Consider
these two supervised classification algorithms, Naive Bayes and Logistic Regression. They
approach many data sets distinctly, but their equivalence can be demonstrated when
applied to specific types of training data (i.e., when the criteria of the Naive Bayes
classifier are fulfilled, and the number of examples
in the training set tends to infinity). In general, the conceptual understanding of ML algorithms,
their convergence features, and their respective effectiveness and limitations to date remain a
central research concern.
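
To illustrate the kind of empirical comparison this section refers to, here is a small sketch that fits Naive Bayes and Logistic Regression on synthetic data with class-conditionally independent Gaussian features, so the two classifiers should agree increasingly closely as the training set grows. The data generator and sample sizes are illustrative assumptions.

```python
# Compare GaussianNB and LogisticRegression on data that roughly satisfies
# the Naive Bayes assumption (class-conditionally independent features).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.RandomState(0)

def make_data(n):
    """Two classes, independent Gaussian features with class-dependent means."""
    y = rng.randint(0, 2, size=n)
    X = rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(n, 5))
    return X, y

X_test, y_test = make_data(5000)
for n_train in (100, 1000, 10000):                # agreement grows with training size
    X_train, y_train = make_data(n_train)
    nb = GaussianNB().fit(X_train, y_train)
    lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    agree = np.mean(nb.predict(X_test) == lr.predict(X_test))
    print(f"n={n_train}: NB acc={accuracy_score(y_test, nb.predict(X_test)):.3f}  "
          f"LR acc={accuracy_score(y_test, lr.predict(X_test)):.3f}  agreement={agree:.3f}")
```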

D. BEST STRATEGIC APPROACH FOR LEARNERS THAT COLLECT THEIR OWN DATA


A broader research discipline focuses on learning systems that, instead of mechanically using
data collected by some other means, actively collect data for their own processing and
learning. The research is devoted to finding the most effective strategy for completely handing
control over to the learning algorithm. For example, consider a drug-testing system which
tries to learn the success of a drug while monitoring the exposed patients for possible
unknown side effects and, in turn, tries to minimise them. [17][18][19][20]
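
One standard strategy for a learner that chooses its own data is uncertainty sampling, in which the algorithm repeatedly asks for labels of the examples it is least sure about. The sketch below is a generic illustration of that loop (not the drug-testing system mentioned above); the dataset, query budget and base classifier are assumptions for demonstration.

```python
# Active-learning sketch: the learner picks which unlabelled examples to
# query next, based on its own prediction uncertainty (uncertainty sampling).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
rng = np.random.RandomState(0)

labelled = list(rng.choice(len(X), size=20, replace=False))   # small seed set
pool = [i for i in range(len(X)) if i not in set(labelled)]

model = LogisticRegression(max_iter=2000)
for _ in range(10):                               # query budget: 10 rounds of 10 labels
    model.fit(X[labelled], y[labelled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1.0 - proba.max(axis=1)         # least confident = most informative
    query = np.argsort(uncertainty)[-10:]         # indices (within the pool) to label next
    for idx in sorted(query, reverse=True):
        labelled.append(pool.pop(idx))            # "ask the oracle" for these labels

model.fit(X[labelled], y[labelled])               # final fit on everything queried so far
print("labelled examples used:", len(labelled))
print("accuracy on the full dataset:", model.score(X, y))
```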

E. PRIVACY PRESERVING DATA MINING


This approach, which involves successfully applying data mining and obtaining results without
exploiting the underlying private information, is attracting a variety of research communities and
beyond. Consider a medical diagnosis routine trained with data from hospitals all over the
world. Due to privacy concerns, this kind of application is not widely pursued. Even though
this presents a crossroads between data mining and data privacy, ongoing research suggests a
system can have both. One proposed solution to the above problem is to develop a shared
learning algorithm instead of a central database. Each of the hospitals would only be allowed to
employ the algorithm under pre-defined restrictions to protect the privacy of the patients, and
then hand it over to the next. This is a booming research domain, combining statistical
exploitation of data and recent cryptographic techniques to ensure data privacy.
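
The "shared algorithm, no central database" idea can be sketched as sequentially updating one model on each hospital's data, so only model parameters travel between sites, never the raw records. The hospital data, model choice and pass ordering below are illustrative assumptions, and a real deployment would also need the cryptographic protections this section mentions.

```python
# Pass-the-model sketch: a single incremental classifier is updated at each
# hospital in turn; only model parameters are shared, raw patient data stays local.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)

def local_dataset(n):
    """Hypothetical local patient records: 4 features and a binary diagnosis."""
    X = rng.normal(size=(n, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

hospital_data = {"Hospital A": local_dataset(300),
                 "Hospital B": local_dataset(500),
                 "Hospital C": local_dataset(200)}

shared_model = SGDClassifier(random_state=0)      # supports incremental updates
classes = np.array([0, 1])                        # must be declared for partial_fit

for name, (X, y) in hospital_data.items():        # the model is "handed over" in turn
    shared_model.partial_fit(X, y, classes=classes)
    print(f"{name}: local accuracy after update = {shared_model.score(X, y):.3f}")
```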

F. NEVER-ENDING LEARNERS
Most machine learning tasks entail training the learner on certain data sets, then
setting the learner aside and utilising its output. In contrast, humans and other
animals learn continuously, acquiring different skills in succession with experience, and use
these learnings and abilities in a thoroughly synergistic way. Despite the sizeable commercial
applications of ML algorithms, learning in machines (computers) to date has remained
strikingly lacking compared to learning in humans or animals. An alternative approach
that more diligently captures the multiplicity, adeptness and accumulating character of
human learning is called never-ending learning. For instance, the Never-Ending
Language Learner (NELL) [8] is a learner whose function is learning to read web pages, and it
has been reported to read the world wide web every hour since January 2010. NELL has obtained
almost 80 million confidence-weighted beliefs (e.g., servedWith(tea, biscuits)) and
has been able to learn millions of pairs of features and parameters that enable it to acquire
these beliefs. Furthermore, it has become competent in extracting more beliefs and
discarding old inaccurate ones, thereby adding to a collection of confidence and provenance
information for each belief.
REFERENCES

1. T. M. Mitchell, Machine Learning, McGraw-Hill International, 1997.
2. T. M. Mitchell, The Discipline of Machine Learning, CMU-ML-06-108, 2006.
3. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, 2000.
4. E. Osuna, R. Freund, and F. Girosi, Support vector machines: training and applications, AI Memo 1602, MIT, May 1997.
5. V. Vapnik, Statistical Learning Theory, John Wiley & Sons, 1998.
6. C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, 2(2):1-47, 1998.
7. Taiwo Oladipupo Ayodele, Types of Machine Learning Algorithms, in New Advances in Machine Learning, Yagang Zhang (Ed.), InTech, 2010.
8. T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling, Never-Ending Learning, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2014.
9. F. Pedregosa et al., Scikit-learn: Machine Learning in Python, JMLR 12, pp. 2825-2830, 2011.
10. J. Wang, T. Jebara, and S.-F. Chang, Semi-supervised learning using greedy max-cut, Journal of Machine Learning Research, Volume 14(1), 771-800, 2013.
11. O. Chapelle, V. Sindhwani, and S. S. Keerthi, Optimization Techniques for Semi-Supervised Support Vector Machines, Journal of Machine Learning Research, Volume 9, 203-233, 2013.
12. J. Baxter, A model of inductive bias learning, Journal of Artificial Intelligence Research, 12:149-198, 2000.
13. S. Ben-David and R. Schuller, Exploiting task relatedness for multiple task learning, in Conference on Learning Theory, 2003.
14. W. Dai, G. Xue, Q. Yang, and Y. Yu, Transferring Naive Bayes classifiers for text classification, AAAI Conference on Artificial Intelligence, 2007.
15. H. Hlynsson, Transfer learning using the minimum description length principle with a decision tree application, Master's thesis, University of Amsterdam, 2007.
16. Z. Marx, M. Rosenstein, L. Kaelbling, and T. Dietterich, Transfer learning with an ensemble of background tasks, in NIPS Workshop on Transfer Learning, 2005.
17. R. Conway and D. Strip, Selective partial access to a database, in Proceedings of the ACM Annual Conference, 85-89, 1976.
18. P. D. Stachour and B. M. Thuraisingham, Design of LDV: A multilevel secure relational database management system, IEEE Trans. Knowledge and Data Eng., Volume 2, Issue 2, 190-209, 1990.
19. R. Oppliger, Internet security: Firewalls and beyond, Comm. ACM, Volume 40, Issue 5, 92-102, 1997.
20. Rakesh Agrawal and Ramakrishnan Srikant, Privacy Preserving Data Mining, SIGMOD '00: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Volume 29, Issue 2, Pages 439-450, 2000.
21. A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell, Toward an architecture for never-ending language learning, AAAI, Volume 5, 3, 2010.
22. X. Chen, A. Shrivastava, and A. Gupta, NEIL: Extracting visual knowledge from web data, in Proceedings of ICCV, 2013.
23. P. Donmez and J. G. Carbonell, Proactive learning: cost-sensitive active learning with multiple imperfect oracles, in Proceedings of the 17th ACM Conference on Information and Knowledge Management, 619-628, ACM, 2008.
24. T. M. Mitchell, J. Allen, P. Chalasani, J. Cheng, O. Etzioni, M. N. Ringuette, and J. C. Schlimmer, Theo: A framework for self-improving systems, Arch. for Intelligence, 323-356, 1991.
25. P. A. Gregory and A. C. Gail, Self-supervised ARTMAP, Neural Networks, Volume 23, 265-282, 2010.
26. T. Cour, B. Sapp, and B. Taskar, Learning from partial labels, Journal of Machine Learning Research, Volume 12, 1501-1536, 2012.
27. M. Adankon and M. Cheriet, Genetic algorithm-based training for semi-supervised SVM, Neural Computing and Applications, Volume 19(8).
