Machine Learning: Short Hand Book
ML CONCEPTS
FOR EVERYONE
Hi everyone. Let’s talk numbers.
According to Google Trends, interest in Machine Learning (ML) has increased by 400 percent since 2004. We have watched Machine Learning in Macedonia and Europe grow from the domain of a small number of data scientists into a generally accepted, mainstream part of analysis and business. While this has opened doors for many innovations and improvements for us and our customers worldwide, confusion about what exactly Machine Learning is provokes reactions ranging from curiosity to anxiety among people everywhere.

We decided to make and share this e-guideline inspired by the notion that there are not many resources out there that precisely answer "What is machine learning?", and even fewer that try to do so using minimal technical terms. When explained simply, the basic concept of machine learning is not very difficult to grasp.

This guidebook begins by defining some basic concepts. It then goes on to explain some of the most common algorithms used in machine learning today, such as linear regression and tree-based models. Then we go deeper and discuss how to evaluate and fine-tune these models. Next, we look at clustering models. And we finish with some resources that you can explore if you want to learn more.
DATA MASTERS MACHINE LEARNING HANDBOOK
AN INTRODUCTION TO
MACHINE LEARNING AND
DATA SCIENCE CONCEPTS
KEY CONCEPTS OF DATA SCIENCE AND MACHINE LEARNING
THE ML
Machine Learning is a way for machines to imitate and adapt human-like behavior. This means that computers learn things without being explicitly programmed to do so. But how exactly does that happen?
ALGORITHMS
Algorithms are nothing more than a set of rules that a computer can follow. Think about this example: how did you learn to do long division? Perhaps you learned to take the denominator and divide it into the first digits of the numerator, then subtract the subtotal and continue with the next digits until you were left with a remainder.
OK, so what an algorithm does is take that same logic and carry it out much faster than any human can.
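The long-division rules above are exactly the kind of step-by-step procedure a computer can follow. As an illustration, here is a minimal Python sketch of schoolbook long division for non-negative integers, where the divisor plays the role of the denominator. The function name `long_division` is our own, and the per-digit `//` stands in for the mental "how many times does it fit?" step.

```python
def long_division(dividend: int, divisor: int) -> tuple[int, int]:
    """Divide non-negative integers digit by digit, as taught in school."""
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    if dividend < 0:
        raise ValueError("dividend must be non-negative")
    quotient_digits = []
    remainder = 0
    for digit in str(dividend):
        # Bring down the next digit of the dividend.
        remainder = remainder * 10 + int(digit)
        # How many times does the divisor fit into the current partial value?
        q = remainder // divisor
        quotient_digits.append(str(q))
        # Subtract the subtotal and carry what is left to the next step.
        remainder -= q * divisor
    return int("".join(quotient_digits)), remainder


print(long_division(7346, 5))  # (1469, 1)
```

Each loop iteration is one "bring down, divide, subtract" round, so the computer repeats the same rules a person would, only faster.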
In Machine Learning, problems can be classified into two groups: prediction and clustering. Prediction is a process in which the algorithm estimates an output variable from a set of input variables. For example, given the characteristics of an email, we can predict whether it is spam or not.
REGRESSION PROBLEMS
Regression problems are problems where the output to predict is a number (e.g. the number of emails sent in a day).

CLASSIFICATION PROBLEMS
Classification problems are problems where the output to predict is a class (e.g. whether an email is spam or not spam).
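To see the difference between the two problem types, the sketch below uses one simple learning method, k-nearest neighbours (our choice for illustration; the handbook does not discuss it), for both. The data sets and function names are hypothetical. For regression it averages the numeric targets of the closest examples; for classification it takes a majority vote over their labels.

```python
from statistics import mean, mode


def nearest(train, x, k):
    """Return the k (feature, target) pairs whose feature is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]


def predict_number(train, x, k=3):
    """Regression: the target is a number, so average the neighbours' values."""
    return mean(target for _, target in nearest(train, x, k))


def predict_class(train, x, k=3):
    """Classification: the target is a class, so take a majority vote."""
    return mode(target for _, target in nearest(train, x, k))


# Hypothetical data: (hours at work, emails sent that day).
emails_sent = [(2, 10), (4, 22), (6, 31), (8, 40), (10, 52)]
# Hypothetical data: (links in the email, label).
emails_labelled = [(0, "not spam"), (1, "not spam"), (2, "not spam"),
                   (7, "spam"), (9, "spam"), (12, "spam")]

print(predict_number(emails_sent, 7, k=2))  # 35.5
print(predict_class(emails_labelled, 8))    # spam
```

The learning method is the same in both cases; only the kind of output (a number versus a class) changes how the neighbours' answers are combined.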
TOP PREDICTION
ALGORITHMS
PROS AND CONS OF SOME OF THE
TOP MACHINE LEARNING ALGORITHMS
The most prominent and commonly used algorithms in machine learning belong to one of three groups: linear models, tree-based models, and neural networks. Let’s take a look at each group:
LINEAR MODELS
Statisticians often use this class of models because of their simplicity. The output variable is represented as a linear relation of known input variables (independent variables), so a prediction is nothing more than the output of an equation given the input variables.

TREE-BASED MODELS
When people talk about machine learning problems or solutions, they almost always mention tree-based models, such as decision trees.

NEURAL NETWORKS
This type of model can handle very complex datasets and tasks, but neural networks are often slow to train because they tend to have very complicated architectures.
LINEAR
MODEL
APPROACH
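As a minimal sketch of the linear-model idea, assuming a single input variable and an ordinary least squares fit (the handbook does not name a fitting method), the code below learns the intercept and slope of y = intercept + slope * x from example data, then predicts by simply evaluating that equation. Function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one input variable)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict_line(intercept, slope, x):
    """A prediction is just the output of the fitted equation."""
    return intercept + slope * x


intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)                      # 1.0 2.0
print(predict_line(intercept, slope, 10))    # 21.0
```

Once the two numbers are learned, making a prediction costs a single multiplication and addition, which is part of why linear models are valued for their simplicity.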
TREE-BASED
MODEL
APPROACH
Decision trees are not as scary and complicated as they sound. A decision tree is a graph that uses a branching mechanism to show every possible output of a decision. One of the reasons they are so powerful is that they can be easily visualized, so a human can understand what’s going on. Imagine a flowchart where each level is a question with a yes or no answer. Eventually, the answers lead you to a solution to the initial problem. That is a decision tree.

It’s like making a cake. You decide how much flour, sugar and chocolate you put in while you make it. Different decisions about the amount of each ingredient, or about whether to include it at all, lead to a different type of cake (or output, in our case). In machine learning, the branches are usually binary yes/no answers.
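The flowchart idea can be made concrete with a small sketch of how a tree learns its yes/no questions from data. This assumes a CART-style approach with Gini impurity and questions of the form "is feature f <= threshold?", which is one common choice, not necessarily what any particular library does. The cake data and all names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when all labels are the same, larger when mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_question(rows, labels):
    """Find the question 'is row[f] <= t?' whose yes/no groups are purest."""
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            yes = [lab for row, lab in zip(rows, labels) if row[f] <= t]
            no = [lab for row, lab in zip(rows, labels) if row[f] > t]
            if not yes or not no:
                continue  # this question does not split the data
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree of yes/no questions; leaves hold the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    question = best_question(rows, labels)
    if question is None:
        return majority
    _, f, t = question
    yes_idx = [i for i, row in enumerate(rows) if row[f] <= t]
    no_idx = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "yes": build_tree([rows[i] for i in yes_idx], [labels[i] for i in yes_idx],
                          depth + 1, max_depth),
        "no": build_tree([rows[i] for i in no_idx], [labels[i] for i in no_idx],
                         depth + 1, max_depth),
    }


def classify(tree, row):
    """Walk from the root, answering one yes/no question per level."""
    while isinstance(tree, dict):
        tree = tree["yes"] if row[tree["feature"]] <= tree["threshold"] else tree["no"]
    return tree


# Hypothetical cakes: (cups of sugar, grams of chocolate) -> type of cake.
rows = [(1, 0), (1, 50), (2, 0), (2, 100), (3, 100), (3, 0)]
labels = ["plain", "chocolate", "plain", "chocolate", "chocolate", "plain"]
tree = build_tree(rows, labels)
print(tree)  # {'feature': 1, 'threshold': 0, 'yes': 'plain', 'no': 'chocolate'}
print(classify(tree, (2, 30)))  # chocolate
```

Because the learned tree is just nested questions, printing it shows a human exactly which question is asked at each level.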
NEURAL
NETWORKS
Neural networks refer to a biological phenomenon: interconnected neurons that exchange messages with each other. Artificial neural networks are loosely modeled on the human brain, with neuron nodes interconnected like a web. This idea has been adapted to the world of machine learning and is called ANN (Artificial Neural Networks).

Deep Learning … the word that everyone talks about, but not many know what it means. Deep Learning is an artificial intelligence function that imitates the workings of the human brain in processing data and creating patterns for decision making. Deep Learning uses neural networks with several layers stacked one after the other.
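To illustrate layers of neurons "put one after the other", here is a hedged sketch of a forward pass through a tiny artificial neural network with two inputs, two hidden neurons and one output neuron. The weights are picked by hand for illustration rather than learned; in practice they are found by training, which is the slow part. With these weights the network computes XOR (output 1 when exactly one input is 1), something a single neuron of this kind cannot do on its own.

```python
def step(z):
    """Threshold activation: the neuron 'fires' (1) when its input is positive."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of all inputs plus a bias."""
    return [step(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]


# Hand-picked (not trained) weights for a 2-2-1 network that computes XOR.
NETWORK = [
    ([[1, 1], [1, 1]], [-0.5, -1.5]),  # hidden layer: an "OR" neuron and an "AND" neuron
    ([[1, -1]], [-0.5]),               # output layer: "OR and not AND"
]


def forward(x, network=NETWORK):
    """Pass the input through each layer in turn; return the single output."""
    for weights, biases in network:
        x = layer(x, weights, biases)
    return x[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, forward([a, b]))
```

Deep networks follow the same pattern with many more layers and neurons, and with smooth activations instead of a hard threshold so that their weights can be trained.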
UNSUPERVISED LEARNING
AND CLUSTERING
OVERVIEW OF THE MOST USED METHODS IN CLUSTERING
K-MEANS CLUSTERING
K-Means is a clustering algorithm in which similar data points are grouped into a cluster. It is iterative: each iteration lowers (or leaves unchanged) the total squared distance between the data points and the centres of their clusters, so it converges to a local minimum, not necessarily the best possible clustering. It takes K as input, which is how many groups you want to see. Place K centroids at random locations in your space. Then, using Euclidean distance, calculate the distance between each data point and each centroid, and assign each data point to the cluster whose centroid is closest to it. Recalculate each cluster centre as the mean of the data points assigned to it. Repeat until no further changes occur.

HIERARCHICAL CLUSTERING
Hierarchical clustering is another clustering algorithm in which similar data points are grouped into a cluster. It builds a hierarchy of clusters. In the bottom-up (agglomerative) variant described here, the algorithm starts with every data point assigned to a cluster of its own. Then the two nearest clusters are merged into one, and this step is repeated. The algorithm terminates when there is only a single cluster left.
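The K-means steps can be sketched in Python as follows. This is a minimal version assuming 2-D points given as tuples; it places the initial centroids on randomly chosen data points (a common way to pick "random locations") and leaves a centroid where it is if its cluster ever becomes empty. Names and data are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means; returns (centroids, cluster index of each point)."""
    rng = random.Random(seed)
    # Start with k centroids at randomly chosen data points.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Assign each point to its closest centroid (Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no further changes: converged
        assignments = new_assignments
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if the cluster is empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because the result depends on where the centroids start, it is common to run K-means several times with different seeds and keep the clustering with the smallest total distance.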
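The bottom-up hierarchical procedure can be sketched as follows. How to measure the distance between two clusters is a choice the text leaves open; this sketch assumes single linkage (the distance between their closest members). It records every merge so the hierarchy can be inspected afterwards. Names and data are hypothetical.

```python
import math


def agglomerative(points):
    """Single-linkage agglomerative clustering; returns the list of merges."""
    # Start with every point in a cluster of its own.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the two nearest clusters (single linkage: closest pair of members).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Remove the two clusters (j > i, so delete j first) and add their union.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return merges


for left, right, dist in agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]):
    print(left, "+", right, "at distance", round(dist, 2))
```

Stopping the loop early, for example once a chosen number of clusters remains, gives a flat clustering instead of the full hierarchy.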
APPLICATION OF MACHINE
LEARNING
OVERVIEW OF THE MOST COMMON DOMAINS WHERE
MACHINE LEARNING IS IMPLEMENTED
There are many uses of Machine Learning in various areas, such as medicine, defense, technology, finance and security. These fields use different applications of supervised, unsupervised and reinforcement learning. The drawing below shows some of the domains where machine learning is applied.
Figure: Machine Learning applications (branches: clustering, regression, reinforcement learning; examples shown: targeted marketing, customer segmentation, population growth prediction, market forecasting, estimating life expectancy, robot navigation, skill acquisition, learning tasks).
DATA SCIENCE LEARNING PATH
A data scientist is someone who knows how to extract meaning from data and interpret it, which requires tools and methods from both statistics and machine learning. Data scientists spend a lot of time collecting, cleaning and munging data, because real-world data is rarely clean.
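Cleaning and munging usually means normalising messy values, dropping records that cannot be used, and removing duplicates. The sketch below shows those three steps on a few hypothetical CSV records using only Python's standard library; the column names and cleaning rules are assumptions for illustration, and real projects need rules chosen for their own data.

```python
import csv
import io

# Hypothetical raw export with stray spaces, a missing value, a duplicate and a typo.
RAW = """name,age,city
 Ana ,34,Skopje
Marko,,Bitola
Ana,34,Skopje
Elena,twenty,Ohrid
Petar,41, skopje
"""


def clean(raw_csv):
    """Return (name, age, city) tuples after normalising, filtering and deduplicating."""
    rows = []
    seen = set()
    for record in csv.DictReader(io.StringIO(raw_csv)):
        # Normalise: trim whitespace and use consistent capitalisation for cities.
        name = record["name"].strip()
        city = record["city"].strip().title()
        age_text = record["age"].strip()
        if not age_text.isdigit():
            continue  # drop rows with a missing or non-numeric age
        row = (name, int(age_text), city)
        if row in seen:
            continue  # drop exact duplicates after normalisation
        seen.add(row)
        rows.append(row)
    return rows


print(clean(RAW))  # [('Ana', 34, 'Skopje'), ('Petar', 41, 'Skopje')]
```

Note that the duplicate "Ana" row is only detected after trimming the spaces, which is why normalising comes before deduplicating.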
Figure: Data scientist learning path (labels shown: data ingestion, data scientist toolbox).
ABOUT
DATA MASTERS
1000 SKOPJE
NORTH MACEDONIA
WWW.DATAMASTERS.CO
We are a dedicated and passionate team, specialized in BI and Advanced Analytics, extremely motivated to address the challenges of any company and help it become a Master of its Data.
Each of us, with our own portfolio and experience, is ready to share our knowledge of Business Intelligence, Data Warehousing, Product Development and Machine Learning. We are eager to collaborate with companies and help them reach their full potential.
Business Intelligence integrates data from across your enterprise and puts self-service reporting and analysis at your fingertips. It helps decision makers spend less time looking for answers and more time implementing strategic decisions. Our BI Masters can develop custom solutions for you and provide services in Data Integration, Data Warehouse Management, Enterprise Reporting, ETL and Data Visualization.
Advanced Analytics: the nature of AI is complex, and because of that many business and technology leaders aren’t sure where to start. In addition, machine learning projects can be slow and expensive to implement. Data Masters has a simple mission: to make machine learning, AI and data science much more accessible, faster and more efficient. Our AI Masters can teach machines to read (natural language processing, NLP), to understand business data and to make high-value custom predictions across the most common business verticals.
Product development covers the complete process of bringing a new product to market. Our engineers and designers work in tandem to understand the core of your product. With both technical and methodological expertise, our team minimizes the costs of production and assembly while dedicating itself to the analytical component of your product.