Unit 5 - Machine Learning
Resulting set: The score S(f_i) is computed from the training data by measuring some criterion on feature f_i. By convention, a high score indicates a valuable (relevant) feature.
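As an illustration, the following is a minimal sketch of one such filter-style score, using the absolute Pearson correlation between each feature and the target as the criterion; the toy dataset and the choice of correlation as the criterion are assumptions for illustration (and, as the next point notes, correlation alone has limits).

```python
# Minimal sketch: S(f_i) = |corr(f_i, y)| for each feature column of X.
import numpy as np

def feature_scores(X, y):
    """Score each feature by its absolute correlation with the target."""
    scores = []
    for i in range(X.shape[1]):
        corr = np.corrcoef(X[:, i], y)[0, 1]
        scores.append(abs(corr))
    return np.array(scores)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)  # only feature 0 is relevant
print(feature_scores(X, y))  # feature 0 should receive the highest score
```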
• The correlation between variables and the target is not enough to assess relevance
• Filter Methods
• Wrapper Methods
• Embedded Methods
Embedded Methods
• Embedded methods are specific to a given learning machine
• Perform variable selection implicitly in the process of training
• E.g., the Winnow algorithm (a linear unit with multiplicative updates), sketched below.
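Below is a minimal sketch of Winnow's multiplicative update rule, assuming Boolean feature vectors and the conventional threshold theta = n; the toy OR-target dataset and the promotion factor alpha = 2 are illustrative choices. Because irrelevant features keep small weights, the trained weight vector itself performs the variable selection.

```python
# Minimal sketch of the Winnow algorithm on Boolean features.
import numpy as np

def winnow_train(X, y, alpha=2.0, epochs=10):
    n = X.shape[1]
    w = np.ones(n)          # all weights start at 1
    theta = n               # conventional threshold
    for _ in range(epochs):
        for x, target in zip(X, y):
            pred = int(w @ x >= theta)
            if pred == 0 and target == 1:
                w[x == 1] *= alpha   # promote active features on a miss
            elif pred == 1 and target == 0:
                w[x == 1] /= alpha   # demote active features on a false alarm
    return w, theta

# Toy example: the target is the OR of the first two features.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 8))
y = (X[:, 0] | X[:, 1]).astype(int)
w, theta = winnow_train(X, y)
print(w)  # the relevant features end up with large weights
```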
Type of problem: Algorithms are designed to solve specific problems, so it is important to know what type of problem we are dealing with and which kind of algorithm works best for each type. Without going into much detail, at a high level machine learning algorithms can be classified into Supervised, Unsupervised, and Reinforcement learning. Supervised learning by itself can be categorized into Regression, Classification, and Anomaly Detection.
Size of training set: This factor is a big player in our choice of algorithm. For a small training set, high-bias/low-variance classifiers (e.g., Naive Bayes) have an advantage over low-bias/high-variance classifiers (e.g., kNN), since the latter will overfit. But low-bias/high-variance classifiers start to win out as the training set grows (they have lower asymptotic error), since high-bias classifiers aren't powerful enough to provide accurate models [1].
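The following sketch illustrates this trade-off empirically on an assumed synthetic scikit-learn dataset; Gaussian Naive Bayes stands in for a high-bias/low-variance classifier and kNN for a low-bias/high-variance one.

```python
# Minimal sketch: compare test accuracy of Naive Bayes vs. kNN
# as the training set grows.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1000, random_state=0)

for n in (50, 200, 1000, 4000):
    nb = GaussianNB().fit(X_tr[:n], y_tr[:n])
    knn = KNeighborsClassifier().fit(X_tr[:n], y_tr[:n])
    print(n, nb.score(X_te, y_te), knn.score(X_te, y_te))
# Naive Bayes tends to lead at small n; kNN usually catches up as n grows.
```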
Accuracy: Depending on the application, the required accuracy will differ. Sometimes an approximation is adequate, which may lead to a huge reduction in processing time. In addition, approximate methods tend to be more robust to overfitting.
Training time: Different algorithms have different running times. Training time is normally a function of the size of the dataset and the target accuracy.
Linearity: Many machine learning algorithms, such as linear regression, logistic regression, and support vector machines, make use of linearity. These assumptions aren't bad for some problems, but on others they bring accuracy down. Despite their dangers, linear algorithms are very popular as a first line of attack. They tend to be algorithmically simple and fast to train.
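The sketch below illustrates the caveat, assuming scikit-learn's two-moons toy dataset: a linear model is limited by its straight decision boundary, while an RBF-kernel SVM can fit the curved one.

```python
# Minimal sketch: a linear model vs. a kernel method on nonlinear data.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
linear = LogisticRegression().fit(X, y)
kernel = SVC(kernel='rbf').fit(X, y)
print('linear  :', linear.score(X, y))   # capped by the straight boundary
print('rbf SVM :', kernel.score(X, y))   # can follow the curved boundary
```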
Number of parameters: Parameters affect the algorithm's behavior, such as error tolerance or the number of iterations. Typically, algorithms with large numbers of parameters require the most trial and error to find a good combination. Even though having many parameters typically provides greater flexibility, the training time and accuracy of the algorithm can sometimes be quite sensitive to getting just the right settings.
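As a concrete illustration of this trial-and-error process, the following is a minimal sketch using scikit-learn's grid search over an SVM's C and gamma parameters; the dataset and the grid values are assumptions for illustration.

```python
# Minimal sketch: exhaustive search over a small parameter grid.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)  # accuracy is sensitive to C and gamma
```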
Number of features: The number of features in some datasets can be very large compared to the number of data points. This is often the case with genetic or textual data. The large number of features can bog down some learning algorithms, making training time infeasibly long. Some algorithms, such as Support Vector Machines, are particularly well suited to this case [2,3].
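The sketch below illustrates the many-features case on a tiny, invented text corpus: TF-IDF vectorization produces far more features than documents, a regime in which a linear SVM remains practical.

```python
# Minimal sketch: high-dimensional sparse text features with a linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = ["cheap meds online now", "meeting moved to friday",
        "win cash prizes today", "quarterly report attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (toy labels)

X = TfidfVectorizer().fit_transform(docs)   # features outnumber documents
clf = LinearSVC().fit(X, labels)
print(X.shape, clf.predict(X))
```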
1. AMAZON WEB SERVICES (AWS)
Amazon Web Services (AWS) comes with several AI toolkits for developers. For example, AWS Rekognition utilizes AI to build image interpretation and facial recognition into apps with common biometric security features.
Furthermore, AWS Lex is the conversational engine behind Amazon's personal assistant Alexa. This technology enables developers to integrate chatbots into mobile and web applications. AWS Polly, on the other hand, utilizes AI to turn written text into lifelike speech in 24 languages with 47 voices.
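As a minimal sketch of calling one of these services, the following uses boto3 to ask Polly to synthesize speech; it assumes AWS credentials and a default region are already configured, and the voice name is illustrative.

```python
# Minimal sketch: text-to-speech with AWS Polly via boto3.
import boto3

polly = boto3.client('polly')
response = polly.synthesize_speech(Text='Hello from Polly',
                                   OutputFormat='mp3',
                                   VoiceId='Joanna')
with open('hello.mp3', 'wb') as f:
    f.write(response['AudioStream'].read())  # save the returned audio stream
```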
2. AI-ONE
This is a tool that enables developers to build intelligent assistants within almost all software applications. Often referred to as biologically inspired intelligence, ai-one's Analyst Toolbox is equipped with the following:
• APIs
• Building agents
• Document library
The primary benefit of this tool is its ability to turn data into generalized sets of rules that enable in-depth ML and AI structures.
3. DEEPLEARNING4J
Deeplearning4j, or Deep Learning for Java, is a leading open source deep learning (DL) library written for Java and the Java Virtual Machine (JVM). It's specifically designed to run on enterprise platforms such as Apache Spark and Hadoop.
It also includes the following:
• Boltzmann machine
• Deep autoencoder
• Deep belief net
• Doc2vec
• Recursive neural tensor network
• Stacked denoising autoencoder
• Word2vec
4. APACHE MAHOUT