ML_Concepts&Algorithms
Hermann Völlinger,
Mathematics & IT-Architecture
www.dhbw-stuttgart.de
Page 2
Status: 1 December 2020
Elective module Computer Science (Wahlmodul Informatik)
Data Mining
Data Mining & Martin Clement
Evaluation Share
Page 3
Data Mining:
• Data and data analysis
• Clustering
• Classification
• Association analysis
• Further methods, e.g.:
• Regression
• Deviation detection
• Visualization
Page 6
Page 7
Page 8
Supervised Learning
Supervised learning is the most popular paradigm for machine learning. It is the easiest to
understand and the simplest to implement. It is very similar to teaching a child with the
use of flash cards.
Given data in the form of examples with labels, we can feed a learning algorithm these
example-label pairs one by one, allowing the algorithm to predict the label for each
example, and giving it feedback as to whether it predicted the right answer or not.
Unsupervised Learning
Unsupervised learning is very much the opposite of supervised learning. It features no
labels. Instead, our algorithm would be fed a lot of data and given the tools to understand
the properties of the data. From there, it can learn to group, cluster, and/or organize the
data in a way such that a human (or other intelligent algorithm) can come in and make
sense of the newly organized data.
Reinforcement Learning
Reinforcement learning is learning from mistakes. Place a reinforcement learning algorithm
into any environment and it will make a lot of mistakes in the beginning. So long as we
provide some sort of signal to the algorithm that associates good behaviors with a positive
signal and bad behaviors with a negative one, we can reinforce our algorithm to prefer
good behaviors over bad ones. Over time, our learning algorithm learns to make fewer
mistakes than it used to.
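As a quick, self-contained illustration of the first two paradigms (not part of the original slide material; the dataset and model choices are arbitrary examples), a minimal scikit-learn sketch:

```python
# Minimal sketch contrasting supervised and unsupervised learning.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the algorithm sees example-label pairs and learns to predict labels.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = KNeighborsClassifier().fit(X_tr, y_tr)
print("supervised test accuracy:", clf.score(X_te, y_te))

# Unsupervised: the algorithm sees only the examples (no labels) and
# groups them by the structure it finds in the data.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", [list(clusters).count(c) for c in range(3)])
```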
We have 8 chapters; 3 are skipped (see gray font). The dates can
change if necessary:
1. ML1: Introduction to Machine Learning - 29.09.2020, 10:00-12:30
2. ML2: Concept Learning: Version Spaces & Candidate Elim. Algorithm
3. ML3: Supervised and Unsupervised Learning - 06.&13.10.20, 10:00-12:30
4. ML4: Decision Tree Learning - 20.10.2020, 10:00-12:30
5. ML5: Simple Linear - & Multiple Regression - 27.10.2020, 10:00-12:30
6. ML6: Neural Networks: Convolutional NN - 03.11.2020, 10:00-12:30
7. ML7: Neural Network: BackPropagation Algorithm
8. ML8: Support Vector Machines (SVM)
Current timetable:
https://rapla.dhbw-stuttgart.de/rapla?key=txB1FOi5xd1wUJBWuX8lJoG0cr9RVi1zB7e5WYTczgq3qJZTian3jkGZb9hTVbzP
Page 10
8. [HVö-1]: Hermann Völlinger: Script of the Lecture "Introduction to Data Warehousing"; DHBW Stuttgart; WS2019
9. [HVö-2]: Hermann Völlinger et al.: Exercises & Solutions of the Lecture "Introduction to Data Warehousing"; DHBW Stuttgart; WS2019
10. [HVö-3]: Hermann Völlinger: MindMap of the Lecture "Machine Learning: Concepts & Algorithms"; DHBW Stuttgart; WS2020
11. [HVö-4]: Hermann Völlinger et al.: Exercises & Solutions of the Lecture "Machine Learning: Concepts & Algorithms"; DHBW Stuttgart; WS2020
12. [HVö-5]: Hermann Völlinger: Script of the Lecture "Machine Learning: Concepts & Algorithms"; DHBW Stuttgart; WS2020
13. [HVö-6]: Hermann Völlinger: GitHub to the Lecture "Machine Learning: Concepts & Algorithms"; see: https://github.com/HVoellinger/Lecture-Notes-to-ML-WS2020
14. [MatLab]: MathWorks eBook: "Reinforcement Learning with MATLAB - Understanding Training and Deployment"; MathWorks 2019; https://www.slideshare.net/HiteshMohapatra/reinforcement-learning-ebook-part3
15. [SfUni-1]: Stanford University (USA) - Machine Learning Course, by Andrew Ng: https://www.coursera.org/learn/machine-learning
Page 11
17. [Sift]: Sift Science Engineering - Article: "Deep Learning for Fraud Detection - Introduction": https://engineering.siftscience.com/deep-learning-fraud-detection/
20. [TMun]: Toshinori Munakata: "Fundamentals of the new Artificial Intelligence", Springer Verlag, 2nd Edition (January 2011)
21. [TU-Darm]: TU Darmstadt: Data Mining und Maschinelles Lernen - WS 17/18 "Einführung in maschinelles Lernen und Data Mining": http://www.ke.tu-darmstadt.de/lehre/ws-17-18/mldm
Page 12
Besides the information you get from the literature, you also have the chance
to learn from other ML experts in your town. For example, you can visit a
Meetup (e.g. under the logo "Cyber Valley") in Stuttgart/Tübingen.
Groupwork (2 persons). Compare the differences of the three categories, see slide
"goal of lecture (2/2)":
1. Supervised - (SVL)
2. Unsupervised - (USL)
3. Reinforcement - (RL)
Give a short description of each category and explain the differences (~5 minutes for
each category).
Page 13
https://www.youtube.com/watch?v=5dLG3JDk2VU https://www.youtube.com/watch?v=XHjIqQBsPjk
Page 14
Like the dream of flying, artificial intelligence (AI) has long been a dream and vision. Muscle
power was not enough to enable man to fly (see the "Schneider von Ulm"). Even the Wright
brothers would not have gotten far using a steam engine weighing tons. Only the development
of the light gasoline engine brought the breakthrough.
Similarly, in computing, only the quantum leap in performance enabled the new digital revolution.
See the next slide.
Page 15
There are three main drivers for digitalization (i.e. ML), compared to the end
of the last century (1997-1998), only 20 years ago:
~100 times more CPU power, available via cloud-based infrastructure
~100 times more data (and more use cases) than 20 years ago
~100 times better mathematical algorithms and models
=> ~1 million times more possibilities
Moore's law is the observation that the number of transistors in a dense integrated circuit doubles about every two years.
https://en.wikipedia.org/wiki/Moore%27s_law
Page 16
https://www.youtube.com/watch?v=OmJ-4B-mS-Y
In most cases where people are interviewed about the usage of AI, it
was not used in the production processes of German companies (see
chemistry, travel, logistics, machine building, etc.). Only in automotive (20%)
or finance (~10%) do we see some progress in this area (compare slide 23).
Currently the topic of artificial intelligence is dividing opinions. On the one
hand, it brings great progress; on the other, it carries risks that are difficult to
assess. This becomes evident in the discussion about self-driving cars, which
would make road traffic much safer but are highly debated and are feared to
be partly unpredictable.
Tesla founder Elon Musk warns against the use of artificial intelligence, which
could be more dangerous for humanity than nuclear weapons. The Germans
are also skeptical about the use of artificial intelligence in general, as the above
YouGov survey clearly shows.
Page 18
Page 19
See the book "Machine Learning" by Tom Mitchell, McGraw Hill, 1997:
https://www.cs.cmu.edu/~tom/mlbook.html
Machine Learning is the study of computer algorithms that improve automatically
through experience. Applications range from datamining programs that discover
general rules in large data sets, to information filtering systems that automatically
learn users' interests. This book provides a single source introduction to the
field. It is written for advanced undergraduate and graduate students, and for
developers and researchers in the field. No prior background in artificial
intelligence or statistics is assumed.
See the following two examples:
1. Chess Playing, where Task T is playing chess. Performance measure P is
percent of games won against opponents and Training experience E is
playing practice games against itself.
2. Robot Driving, where Task T is driving on public four-lane highways using
vision sensors. Performance measure P is average distance traveled before
an error (as judged by human overseer) and Training experience E is a
sequence of images and steering commands recorded while observing a
human driver.
Page 20
Page 21
Page 22
Page 23
Page 24
Page 25
In addition to this variety of data types and growing data volume, incoming data can
also evolve with respect to velocity, that is, more data being generated at a faster or a
variable pace. Business rules define the business process and include objectives,
constraints, preferences, policies, best practices, and boundaries. Mathematical
models and computational models are techniques derived from the mathematical
sciences, computer science and related disciplines such as applied statistics, machine
learning, operations research, natural language processing, computer vision, pattern
recognition, image processing, speech recognition, and signal processing.
The correct application of all these methods and the verification of their results imply
the need for resources on a massive scale (human, computational and temporal) for
every prescriptive analytics project.
In order to spare the expense of dozens of people, high-performance machines and
weeks of work, one must consider a reduction of resources and therefore a reduction
in the accuracy or reliability of the outcome. The preferable route is a reduction that
produces a probabilistic result within acceptable limits.
https://youtu.be/ARotWjhRpjE
Page 26
Cambridge Consultants — A new artificial intelligence system can turn simple sketches
into paintings reminiscent of works by great artists of the 19th and 20th centuries,
researchers say.
The artificial intelligence (AI) system, dubbed Vincent, learned to paint by "studying"
8,000 works of art from the Renaissance up to the 20th century. According to the
system's creators — engineers from the United Kingdom-based research and innovation
company Cambridge Consultants — Vincent is unique not only in its ability to make art
that is actually enjoyable but also in its capability to respond promptly to human input.
"Vincent allows you to draw edges with a pen, edges of a picture you can imagine in
your mind, and from those pictures, it produces a possible painting based on its
training," said Monty Barlow, director of machine learning at Cambridge Consultants,
who led the project. "There is this concern that artificial intelligence will start replacing
people doing things for them, but Vincent allows humans to take part in the decisions of
the creativity of artificial intelligence." [Super-Intelligent Machines: 7 Robotic Futures]
https://blog.netapp.com/how-vincent-ai-learned-to-paint/
Page 27
Teaching Vincent
Barlow said that using only 8,000 works of art to train Vincent is by itself a major
achievement. Previously, a similar system would have needed millions, or even
billions, of samples to learn to paint.
"Most machine learning deployed today has been about classifying and feeding lots
and lots of examples into a system," Barlow said. "It's called supervised
learning. You show a million photos of a face, for example, and a million photos of
not a face, and it learns to detect faces."
Vincent uses a more sophisticated technique that allows the machine to teach itself
automatically, without constant human input. The system behind Vincent's abilities
is based on the so-called generative adversarial network, which was first described
in 2014.
The technique uses two neural networks that compete with each other. At the
beginning, both networks are trained, for example, on images of birds.
Subsequently, one network is tasked with producing more images of birds that
would persuade the other network that they are real. Gradually, the first network
gets better at producing realistic images, while the second one gets better at
spotting fakes, according to the researchers.
https://deepart.io/
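To make the idea of the two competing networks concrete, here is a minimal, hypothetical GAN sketch in PyTorch on 1-D toy data; it only illustrates the training loop described above, not Vincent's actual (far more elaborate) architecture:

```python
# Toy GAN sketch (PyTorch): a generator learns to mimic samples from N(3, 0.5)
# while a discriminator learns to tell real samples from generated ones.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0      # "real" data
    fake = G(torch.randn(64, 8))               # generated data from random noise
    # 1) train the discriminator to separate real from fake
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()
    # 2) train the generator to fool the discriminator
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()

samples = G(torch.randn(1000, 8))
print("generated mean/std (target ~3.0/0.5):", samples.mean().item(), samples.std().item())
```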
Page 28
The Handelsblatt has written a good report. “Digitization and Health - The Medicine of the Future
- How AI protects us against cancer and heart attack”: http://a.msn.com/05/de-de/BBT0lCR?ocid=se
Page 29
Page 30
Page 31
Page 32
Page 33
Page 34
Page 35
Today, machine-learning algorithms are mainly used in the field of image analysis
and recognition. In the future, speech recognition and processing will become more
important. The processing and analysis of large amounts of data is a core task of
such a digital infrastructure platform.
Therefore, IT managers must ensure that their IT can handle different artificial
intelligence processes. Server, storage and network infrastructures must be designed
for new ML-based workloads. Data management must also be prepared so that ML-
as-a-Service offerings in the cloud can be used. In the context of ML, alternative
hardware components such as GPU-based clusters from Nvidia, Google's Tensor
Processing Unit (TPU) or IBM's TrueNorth processor have become popular in recent
months.
Companies have to decide whether they want to invest themselves or use the
services of corresponding cloud providers. One of the major uses for ML is speech
recognition and processing. Amazon Alexa is currently moving into households;
Microsoft, Google, Facebook and IBM have invested a large part of their research
and development funds here and have acquired specialist firms.
It is foreseeable that natural-language communication at the customer interface
will become more natural. The operation of digital products and enterprise IT solutions
will also be possible via voice command. This affects both the customer frontend and
the IT backend.
Page 36
With large cloud providers including ML services and products in their service
portfolio, it's relatively easy for users to get started. Amazon Machine Learning,
Microsoft Azure Machine Learning, IBM Bluemix, and Google Machine Learning
provide cost-effective access to related services through the public cloud.
The more they get involved, the greater the risk of vendor lock-ins. Therefore, users
should think about their strategy before starting. IT service providers and managed
service providers can also deploy and operate ML systems and infrastructures,
making independence from the public cloud providers and their SLAs equally
possible.
Page 37
The usage behavior of ML is very different not only between, but also within the industries. In the
automotive industry, for example, there are big gaps between the pioneers and the latecomers.
Real-time image and video analysis and statistical methods and mathematical models from
machine learning and deep learning are widely used for the development and production of self-
driving cars. Some methods are also used to detect manufacturing defects.
The share of innovators who already use ML to a large extent is the largest in the automotive
industry, at around 20 percent. In contrast, however, there are 60 percent who deal with ML but are
still in the evaluation and planning phase. Thus it turns out that in the automotive industry a few
lighthouse projects shape the picture, but there is no question of industry-wide adoption yet.
The mechanical and plant engineering companies likewise have half (53 percent) in the evaluation and
planning phase. Nearly one-third use ML productively in selected applications, and 18 percent are
currently building prototypes. Next are the commercial and consumer goods companies, 44 percent
of which are testing ML in initial projects and prototypes. This is not surprising, given that these
companies usually have well-maintained data sets and a lot of experience in business intelligence
and data warehouses. If they succeed in measurably improving pricing strategies, product
availability or marketing campaigns, ML is seen as a welcome innovation tool for existing big-data
strategies.
The same applies to the IT, telecoms and media industries: there, ML processes have long been
used, for example, for serving online advertising, calculating purchase probabilities (conversion
rates) or personalizing web content and shopping recommendations. For professional service
providers, measuring and improving customer loyalty, quality of service and on-time delivery play
an important role, as these are the competitive differentiating factors.
Page 38
When it comes to selecting platforms and products, solutions from the public cloud play an
increasingly important role (ML as a Service). In order to avoid complexity, and because the
major cloud providers are also the leading innovators in this field, many users are choosing
these cloud solutions. While 38.1 percent of the respondents prefer solutions from the public
cloud, 19.1 percent choose proprietary solutions from selected providers and 18.5 percent
open-source alternatives. The rest either follow a hybrid strategy (15.5 percent) or have not yet
formed an opinion (8.8 percent).
Among the cloud-based solutions, AWS (https://aws.amazon.com/de/machine-learning/) has the
highest level of awareness: 71 percent of the decision makers indicate that they know Amazon
in this context. Microsoft, Google and IBM are also known to more than two-thirds of the survey
respondents in the ML environment. Interestingly enough, only 17 percent of respondents use
AWS cloud services in the context of evaluation, design and production operations for ML.
About one third of the respondents in each case deal with IBM Watson ML
(https://www.ibm.com/cloud/machine-learning), Microsoft Azure ML Studio
(https://azure.microsoft.com/en-us/services/machine-learning-studio) or the Google Cloud ML
Platform (https://cloud.google.com/ml-engine/docs/tensorflow/technical-overview). The analysts
believe that this has a lot to do with the manufacturers' marketing efforts. Accordingly, IBM and
Microsoft are investing heavily in their cognitive and AI strategies. Both have strong SME and
wholesale distribution and a large network of partners. Google, however, owes its position to
its image as a huge data and analytics machine, which drives the market through many
innovations, such as TensorFlow, many ML APIs and its own hardware. After all, HP
Enterprise with "Haven on Demand" is also one of the relevant ML players and is used by 14%.
The development of artificial intelligence is rapid and explosive at the same time: Algorithms that decide
for us, and the capabilities of machines that surpass us humans already raise many ethical questions:
How should an autonomous car behave in everyday life? Which rules will apply to robots in the future? Is it
possible to include ethics in AI?
The latter question is at the center of the lecture. The author assumes this could work and tries to explain
how to do it. He differentiates between human ethics in dealing with AI, which he calls external ethics,
and the ethics of the machines themselves, which he calls machine morality. The lecture discusses the
necessity and the benefit as well as the feasibility.
In the Special Interest Group AI at bwcon, we address ethical issues and challenges in the rapid
development of artificial intelligence with business representatives from southern Germany.
More about Special Interest Group
See also Stanley Kubrick’s famous movie from 1968: “2001: A Space Odyssey“
https://www.youtube.com/watch?v=XHjIqQBsPjk
HAL 9000: "I'm sorry Dave, I'm afraid I can't do that“ & “Deactivation of HAL 9000”
https://www.youtube.com/watch?v=ARJ8cAGm6JE&t=42s
https://www.youtube.com/watch?v=c8N72t7aScY&list=PLawr1rgf_CvSiNsWPbLOOrMKbcZRHJud7&index=25
Page 39
Give a short overview of the products and their features (~10 minutes each) and
give a comparison matrix of the 3 products and an evaluation.
What is your favorite product? (~5 minutes)
Page 40
Page 41
Not only in Silicon Valley but also in the northern Black Forest, artificial
intelligence is being driven forward. The medium-sized company Omikron wrestles with
Google for specialists and sells its search technology to the big players: Renault,
Fresenius and Siemens rely on machine learning from Pforzheim.
Groupwork (2 persons): summarize the results of the second and third YouTube
videos, "Supervised Learning" and "Unsupervised Learning" by Andrew Ng, in a
15-minute report. Create a small PowerPoint presentation. See:
https://www.youtube.com/playlist?list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN
Page 42
Page 43
References:
• T. Mitchell, 1997, Chapter 2.
• P. Winston, "Learning by Managing Multiple Models", in P.
Winston, Artificial Intelligence, Addison-Wesley Publishing Company,
1992, pp. 411-422.
See also:
http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/vspace/3_vspace.html
Motivation
Page 44
Introduction
Page 45
Definition of AQ Learning
Definition
AQ learning is a form of supervised machine learning of rules from examples and
background knowledge, performed by the well-known AQ family of programs and other
machine learning methods. AQ learning pioneered the separate-and-conquer approach to
rule learning, in which examples are sequentially covered until a complete class
description is formed. Derived knowledge is represented in the highly expressive form
of attributional rules.
Theoretical Background
The core of AQ learning is a simple version of the Aq (algorithm quasi-optimal) covering
algorithm, developed by Ryszard S. Michalski in the late 1960s (Michalski 1969). The
algorithm was initially developed for the purpose of minimization of logic functions
and later adapted for rule learning and other machine learning applications.
Simple Aq Algorithm
The Aq algorithm realizes a form of supervised learning. Given a set of positive events
(examples) P, a set of negative events N, and a...
Page 46
Page 47
https://onlinelibrary.wiley.com/doi/abs/10.1002/wics.78
Page 48
Page 49
Definition of AQ Learning
Page 50
Page 51
Page 52
https://images.app.goo.gl/ZiyTSGRYcnSzBsB59
Page 53
In the ML3 chapter we list the most common concepts and algorithms of
Supervised (SVL) and Unsupervised Learning (USVL):
In particular, under SVL we see classification methods like lazy learning (rote
learning, the kNN algorithm, etc.) and Bayes learning for text classification, and
also regression methods (e.g. simple linear regression).
In USVL we discuss clustering algorithms (e.g. K-means clustering) and association
algorithms (e.g. predictive market basket analysis).
https://en.wikipedia.org/wiki/Machine_learning#Supervised_and_semi-supervised_learning
Page 54
Supervised: All data is labeled and the algorithms learn to predict the
output from the input data.
Unsupervised: All data is unlabeled and the algorithms learn the inherent
structure from the input data.
Semi-supervised: Some data is labeled but most of it is unlabeled and a
mixture of supervised and unsupervised techniques can be used.
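A small sketch of the semi-supervised case, assuming scikit-learn's SelfTrainingClassifier (its convention: unlabeled samples are marked with -1):

```python
# Sketch: semi-supervised learning with scikit-learn; -1 marks unlabeled samples.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.8] = -1          # hide 80% of the labels

model = SelfTrainingClassifier(SVC(probability=True, random_state=0))
model.fit(X, y_partial)                         # uses labeled and unlabeled data
print("accuracy against the true labels:", model.score(X, y))
```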
Page 55
Business meaning: the VDM metric describes the distance in the behavior of different instances of the
attribute a. See for example the outcome of the concrete calculations in homework ML3.1.
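One common simplified form of the VDM for a single categorical attribute a is vdm_a(x, y) = sum over classes c of |P(c | a=x) - P(c | a=y)|^q. A sketch under that assumption, with a made-up toy attribute "refund" (the concrete data of homework ML3.1 is on the slide, not reproduced here):

```python
# Sketch of the Value Difference Metric (VDM) for one categorical attribute.
from collections import Counter

def vdm(values, labels, x, y, q=1):
    """Distance between attribute values x and y via class distributions."""
    classes = set(labels)
    n_x = Counter(l for v, l in zip(values, labels) if v == x)
    n_y = Counter(l for v, l in zip(values, labels) if v == y)
    tot_x, tot_y = sum(n_x.values()), sum(n_y.values())
    return sum(abs(n_x[c] / tot_x - n_y[c] / tot_y) ** q for c in classes)

# Toy data: attribute "refund" against a class label.
refund = ["yes", "yes", "no", "no", "no", "yes", "no"]
label  = ["ok",  "ok",  "ok", "bad", "bad", "ok", "bad"]
print(vdm(refund, label, "yes", "no"))   # behavioral distance d(refund=yes; refund=no)
```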
Page 56
https://pdfs.semanticscholar.org/f72c/bf9f16f244f5643273fa04c25e2697fe66b9.pdf
Page 57
Remark: Bayes learning is called Naive Bayes when the value of a particular feature
is assumed to be independent of the value of any other feature, given the class variable. For example, a fruit
may be considered to be an apple if it is red, round, and about 10 cm in diameter. A naive Bayes classifier
considers each of these features to contribute independently to the probability that this fruit is an
apple, regardless of any possible correlations between the color, roundness, and diameter features.
Bayes' theorem is useful when working with conditional probabilities (like we are doing here), because it provides us with a
way to reverse them. In our case we want the probability of a class given a sentence, so using this theorem we can reverse the conditional probability:
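For the "Sports" example used below, the reversal reads (standard form of Bayes' theorem):

```latex
P(\text{Sports}\mid\text{sentence})
  = \frac{P(\text{sentence}\mid\text{Sports})\,P(\text{Sports})}{P(\text{sentence})}
```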
Page 58
More details:
https://medium.com/analytics-vidhya/naive-bayes-classifier-for-text-classification-556fabaf252b#:~:text=The%20Naive%20Bayes%20classifier%20is,time%20and%20less%20training%20data.
The Naive Bayes classifier is a simple classifier that classifies based on probabilities
of events. It is commonly applied to text classification. Though it is a simple
algorithm, it performs well in many text classification problems.
Other pros include less training time and less training data; that is, less CPU and
memory consumption.
As with any machine learning model, we need to have an existing set of examples
(training set) for each category (class).
Let us consider sentence classification to classify a sentence to either ‘Sports’ or
‘Not Sports’. In this case, there are two classes (“Sports” and “Not Sports”). With
the training set, we can train a Naive Bayes classifier which we can use to
automatically categorize a new sentence.
Calculating Probabilities:
The final step is just to calculate every probability and see which one turns out to be larger. Calculating a probability is just
counting in our training data. First, we calculate the a priori probability of each tag: for a given sentence in our training data, the
probability that it is Sports = P(Sports)=3/5. Then, P(Not Sports)= 2/5. That’s easy enough.
Then, calculating P(game|Sports) means counting how many times the word “game” appears in Sports texts (2) divided by the
total number of words in sports (11). Therefore, P(game|Sports)=2/11.
However, we run into a problem here: “close” doesn’t appear in any Sports text! That means that P(close|Sports)=0. This is
rather inconvenient since we are going to be multiplying it with the other probabilities, so we’ll end up with zero.
Page 59
How do we do it? By using something called Laplace smoothing: we add 1 to every count so it’s never zero. To
balance this, we add the number of possible words to the divisor, so the division will never be greater than 1. In
our case, the possible words are (see notespage): [
'a', 'great', 'very', 'over', 'it', 'but', 'game', 'election', 'clean', 'close', 'the', 'was', 'forgettable', 'match'].
Since the number of possible words is 14 (I counted them!), applying smoothing we get
that P(game|Sports)=(2+1)/(11+14)=3/25. The full results are:
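The whole calculation can be scripted in a few lines. The sketch below uses the five classic "Sports / Not Sports" training sentences; they are reconstructed from the counts quoted above (P(Sports) = 3/5, 11 words in the Sports texts, 14 words in the vocabulary) and may differ cosmetically from the slide:

```python
# Sketch of the Naive Bayes text classification with Laplace smoothing.
from collections import Counter

train = [("a great game", "Sports"),
         ("the election was over", "Not Sports"),
         ("very clean match", "Sports"),
         ("a clean but forgettable game", "Sports"),
         ("it was a close election", "Not Sports")]

classes = {label for _, label in train}
vocab = {w for text, _ in train for w in text.split()}                    # 14 words
words = {c: [w for t, l in train for w in t.split() if l == c] for c in classes}
prior = {c: sum(1 for _, l in train if l == c) / len(train) for c in classes}

def score(sentence, c):
    counts = Counter(words[c])
    p = prior[c]
    for w in sentence.split():                               # naive independence
        p *= (counts[w] + 1) / (len(words[c]) + len(vocab))  # Laplace smoothing
    return p

s = "a very close game"
print({c: score(s, c) for c in classes})   # the larger value wins -> "Sports"
```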
https://www.youtube.com/watch?v=exHwwy9kVcg
Page 60
Page 61
Page 62
Log in to IBM Cloud and follow the tutorial descriptions (see links):
Create a "Voice Agent" by running the following steps:
• Set up the required IBM Cloud services
• Configure the TWILIO account
• Configure the Voice Agent on the IBM Cloud and import a skill by uploading
either skill-banking-balance-enquiry.json or skill-pizza-order-book-table.json
Link: https://en.wikipedia.org/wiki/Regression_analysis
Page 63
Page 64
Page 65
Page 66
Page 67
Page 68
K-means clustering is one of the simplest and most popular unsupervised machine learning
algorithms. Typically, unsupervised algorithms make inferences from datasets using only
input vectors, without referring to known or labelled outcomes. The objective of K-means
is simple: group similar data points together and discover underlying patterns. To
achieve this objective, K-means looks for a fixed number (k) of clusters in a dataset.
A cluster refers to a collection of data points aggregated together because of certain
similarities. You'll define a target number k, which refers to the number of centroids you
need in the dataset. A centroid is the imaginary or real location representing the center
of a cluster. Every data point is allocated to one of the clusters by minimizing the
in-cluster sum of squares.
In other words, the K-means algorithm identifies k centroids and then allocates every
data point to the nearest cluster, while keeping the centroids as small as possible.
The 'means' in K-means refers to averaging of the data, that is, finding the centroid.
K-means algorithm: let's see how the K-means machine learning algorithm works using
Python. We'll use the scikit-learn library and some random data to illustrate a K-means
clustering. See more details under:
https://towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1
https://github.com/bhattbhavesh91/k_means_iris_dataset/blob/master/K_in_K_means_Clustering.ipynb
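A minimal version of what such a notebook does (a sketch; the linked notebook's exact code may differ):

```python
# Sketch: K-means on the Iris data; the species labels are used only afterwards
# to inspect how well the found clusters match the three species.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("centroids:\n", km.cluster_centers_)
for c in range(3):
    values, counts = np.unique(y[km.labels_ == c], return_counts=True)
    print(f"cluster {c}:", dict(zip(values.tolist(), counts.tolist())))
```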
Page 69
How the K-means algorithm works: to process the learning data, the K-means algorithm
in data mining starts with a first group of randomly selected centroids, which are used as
the starting points for every cluster, and then performs iterative (repetitive)
calculations to optimize the positions of the centroids. It stops creating and optimizing
clusters when either: the centroids have stabilized (there is no change in their values
because the clustering has been successful), or the defined number of iterations has been
reached.
K-means clusters of the Iris dataset: the Iris dataset contains the data for 50 flowers from
each of the 3 species: Setosa, Versicolor and Virginica.
http://www.lac.inpe.br/~rafael.santos/Docs/CAP394/WholeStory-Iris.html
The data gives the measurements in centimeters of the variables sepal length and width
and petal length and width for each of the flowers. The goal of the study is to perform
exploratory analysis on the data and build a K-means clustering model to cluster them
into groups. Here we have assumed we do not have the species column to form clusters
and then used it to check our model performance. Since we are not using the species
column, we have an unsupervised learning method. A Python program using the
scikit-learn library can be seen under:
https://github.com/bhattbhavesh91/k_means_iris_dataset/blob/master/K_in_K_means_Clustering.ipynb
Page 70
YouTube videos:
https://www.youtube.com/watch?v=Cifl6cuEwMw
https://www.youtube.com/watch?v=nnp77iFxjrE
https://www.youtube.com/watch?v=pH3hQc585WQ
Use Case Description:
https://gallery.azure.ai/Experiment/a7299de725a141388f373e9d74ef2f86
This sample demonstrates how to perform clustering using k-means algorithm on the UCI Iris
data set. Also we apply multi-class Logistic regression to perform multi-class classification
and compare its performance with k-means clustering.
Clustering: Group Iris Data
This sample demonstrates how to perform clustering using the k-means algorithm on the
UCI Iris data set. In this experiment, we perform k-means clustering using all the features in
the dataset, and then compare the clustering results with the true class label for all samples.
We also use the Multiclass Logistic Regression module to perform multiclass classification
and compare its performance with that of k-means clustering.
Data
We used the Iris data set, a well-known benchmark dataset for multiclass classification from
the UCI repository. This dataset has 150 samples with 4 features and 1 label (the last
column). All features are numeric except the label, which is a string.
Page 71
Properties:
• Sup(X=>Y) = Sup(Y=>X)
• Lift(X=>Y) = Lift(Y=>X)
Question:
• How many rules do you have to consider in this example?
• Prove the answer: you have to consider 80 rules (40 for support and lift).
Page 72
N=5
Support(A=>D) := frq(A,D)/5 = 2/5
Support(C=>A) := frq(C,A)/5 = 2/5
Support(A=>C) := frq(A,C)/5 = 2/5
Support(B&C=>D) := frq(B&C,D)/5 = 1/5
Confidence(A=>D) := frq(A,D)/frq(A) = (2/5)/(3/5) = 2/3
Confidence(C=>A) := frq(C,A)/frq(C) = (2/5)/(4/5) = 2/4 = 1/2
Confidence(A=>C) := frq(A,C)/frq(A) = (2/5)/(3/5) = 2/3
Confidence(B&C=>D) := frq(B&C,D)/frq(B&C) = (1/5)/(3/5) = 1/3
Lift(A=>D) := Sup(A=>D)/(Sup(A)*Sup(D)) = (2/5)/(3/5*3/5) = (2/5)/(9/25) = 10/9
Lift(C=>A) := Sup(C=>A)/(Sup(C)*Sup(A)) = (2/5)/(4/5*3/5) = (2/5)/(12/25) = 10/12 = 5/6
Lift(A=>C) := Sup(A=>C)/(Sup(A)*Sup(C)) = (2/5)/(3/5*4/5) = (2/5)/(12/25) = 10/12 = 5/6
Lift(B&C=>D) := Sup(B&C=>D)/(Sup(B&C)*Sup(D)) = (1/5)/(3/5*3/5) = (1/5)/(9/25) = 5/9
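These calculations are easy to automate. In the sketch below, the five transactions are reconstructed so that all frequencies match the numbers above (frq(A)=3, frq(C)=4, frq(D)=3, frq(A,D)=frq(A,C)=2, frq(B&C)=3, frq(B&C,D)=1); the slide's own transaction table may look different:

```python
# Sketch: support / confidence / lift over a list of transactions.
transactions = [{"A", "C", "D"}, {"A", "B", "C"}, {"B", "C", "D"},
                {"B", "C"}, {"A", "D"}]
N = len(transactions)

def frq(*items):
    """Number of transactions containing all given items."""
    return sum(1 for t in transactions if set(items) <= t)

def support(lhs, rhs):
    return frq(*lhs, *rhs) / N

def confidence(lhs, rhs):
    return frq(*lhs, *rhs) / frq(*lhs)

def lift(lhs, rhs):
    return support(lhs, rhs) / ((frq(*lhs) / N) * (frq(*rhs) / N))

print(support(["A"], ["D"]), confidence(["A"], ["D"]), lift(["A"], ["D"]))
# -> 0.4, 0.666..., 1.111... (= 2/5, 2/3, 10/9 as computed above)
print(lift(["B", "C"], ["D"]))   # -> 0.555... (= 5/9)
```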
Page 73
Page 75
d(refund=yes; refund=no)
1 person: Review the example about Bayes learning in this lesson. Use the same training data as
in the lesson together with the newly tagged text. Run the Bayes text classification calculation for
the sentence "Hermann plays a TT match" and tag this sentence.
(Table: No. | Training text | Label)
Additional question: What will happen if we change the target to "Hermann plays a very clean
game"?
Optional* (1 P.): Define an algorithm in Python (use a Jupyter notebook) to automate the calculations.
Use the description under: https://medium.com/analytics-vidhya/naive-bayes-classifier-for-text-classification-556fabaf252b#:~:text=The%20Naive%20Bayes%20classifier%20is,time%20and%20less%20training%20data.
Page 76
Page 77
A further example of a finance chatbot is BOTTO, which runs on MS Azure, from Fiducia
AG (Karlsruhe). See a presentation from Fiducia and Adesso (Dortmund):
https://www.adesso.de/adesso/adesso-de/branchen/banken-finanzdienstleister/sonderthemen/forum-banken/praesentation-chatbot-botto-g-weber-fiducia-gad-it-ag.pdf
Page 78
Repeat K-means clustering (DM lesson or internet). Describe and explain the 4
necessary steps to reach the final clusters:
1. The centroids.
2. Assigning the first clusters.
3. Calculating the center of gravity and iterating.
4. The final clusters.
Calculate these measures for the following 8 item sets of a shopping basket (1 person, 10
minutes):
{Milk, Lemonade, Beer}; {Milk, Apple juice, Beer}; {Milk, Apple juice, Orange juice};
{Milk, Beer, Orange juice, Apple juice}; {Milk, Beer}; {Lemonade, Beer, Orange juice};
{Orange juice}; {Beer, Apple juice}
1. What is the support of the item set {Beer, Orange juice}?
2. What is the confidence of {Beer} => {Milk}?
3. Which association rules have support and confidence of at least 50%?
Page 79
Page 80
Decision tree learning uses a decision tree as a predictive model which maps
observations about an item to conclusions about the item's target value.
It is one of the predictive modelling approaches used in statistics, data
mining and machine learning. Tree models where the target variable can take a
finite set of values are called classification trees. In these tree structures, leaves
represent class labels and branches represent conjunctions of features that lead
to those class labels. Decision trees where the target variable can take continuous
values (typically real numbers) are called regression trees.
In decision analysis, a decision tree can be used to visually and explicitly
represent decisions and decision making. In data mining, a decision tree describes
data but not decisions; rather the resulting classification tree can be an input for
decision making.
Page 81
Page 82
There are a couple of algorithms to build a decision tree; the most important are:
1. ID3 (Iterative Dichotomiser 3) → uses the entropy function and information gain as metrics.
• Information entropy is the average rate at which information is produced by a stochastic source of data.
• Information gain is the change in information entropy H from a prior state to a state that takes some information as given: IG(T, a) = H(T) − H(T | a), where H(T | a) is the conditional entropy of T given the value of attribute a.
Page 83
Page 84
We have four X values (also called "features") = {outlook, temp, humidity, windy}, being
categorical, and one y value ("target") = {play: Y or N}, also categorical. So we need to
learn the mapping between X and y (what machine learning always does).
To better work with the data set, we first have to summarize the number of results
(yes/no) in the target variable y = "play" for each value of all features (attributes)
X = {outlook; temp; humidity; windy}.
To create a tree, we need to have a root node first, and we know that nodes are
features/attributes (outlook, temp, humidity and windy).
So, which one do we need to pick first?
Answer: determine the attribute that best classifies the training data and use this attribute
at the root of the tree. Repeat this process for each branch.
This means we are performing a top-down, greedy search through the space of possible
decision trees.
Okay, so how do we choose the best attribute?
Answer: use the attribute with the highest information gain, as in ID3.
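A short sketch of this root-node choice on the classic 14-row "Play Tennis" data (the standard version of the dataset, which the lecture's table should match):

```python
# Sketch: entropy and information gain on the "Play Tennis" data (ID3).
from math import log2
from collections import Counter

# rows: (outlook, temp, humidity, windy, play)
data = [("sunny", "hot", "high", False, "no"), ("sunny", "hot", "high", True, "no"),
        ("overcast", "hot", "high", False, "yes"), ("rainy", "mild", "high", False, "yes"),
        ("rainy", "cool", "normal", False, "yes"), ("rainy", "cool", "normal", True, "no"),
        ("overcast", "cool", "normal", True, "yes"), ("sunny", "mild", "high", False, "no"),
        ("sunny", "cool", "normal", False, "yes"), ("rainy", "mild", "normal", False, "yes"),
        ("sunny", "mild", "normal", True, "yes"), ("overcast", "mild", "high", True, "yes"),
        ("overcast", "hot", "normal", False, "yes"), ("rainy", "mild", "high", True, "no")]
features = ["outlook", "temp", "humidity", "windy"]

def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def info_gain(rows, i):
    # IG(T, a) = H(T) - H(T | a): subtract the weighted entropy of each subset.
    subsets = [[r for r in rows if r[i] == v] for v in {r[i] for r in rows}]
    return entropy(rows) - sum(len(s) / len(rows) * entropy(s) for s in subsets)

for i, f in enumerate(features):
    print(f, round(info_gain(data, i), 3))   # outlook has the highest gain (~0.247)
```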
Page 85
Page 86
• If half of the examples are of the positive class and half are of the negative class,
then the entropy is one, i.e. high.
Page 87
See the calculation of the measure IG(S, Outlook) for the "Playing Tennis" data:
Similarly, we can calculate it for the other two attributes (humidity and temp).
Page 88
Page 89
Page 90
Page 91
In CART we use the Gini index as a metric. We use the Gini index as our cost
function to evaluate splits in the dataset.
Gini index for a binary target variable
A Gini score gives an idea of how good a split is by how mixed the classes are in the
two groups created by the split. A perfect separation results in a Gini score of 0,
whereas the worst-case split results in 50/50 classes.
The minimum value of the Gini index is 0, reached when all observations belong to one label.
Page 92
The Gini index is also used in the Data Mining lecture, so this is a repetition of the
concepts. By using a different example than in the DM lecture, the Gini measure is
better understood.
Compare also the following YouTube video "Gini index based Decision Tree" about
the calculation of a Gini-index-based decision tree:
https://www.youtube.com/watch?v=2lEcfRuHFV4
Split value | 55 | 65 | 72 | 80 | 87 | 92 | 97 | 110 | 122 | 172 | 230
Yes (<= / >) | 0/3 | 0/3 | 0/3 | 0/3 | 1/2 | 2/1 | 3/0 | 3/0 | 3/0 | 3/0 | 3/0
No (<= / >) | 0/7 | 1/6 | 2/5 | 3/4 | 3/4 | 3/4 | 3/4 | 4/3 | 5/2 | 6/1 | 7/0
Gini | 0.420 | 0.400 | 0.375 | 0.343 | 0.417 | 0.400 | 0.300 | 0.343 | 0.375 | 0.400 | 0.420
Remark: for the calculation of the Gini index for each cell see the notes page of this slide.
Page 93
First calculate the Gini index of the first cell; we write Gini(55):
Gini(55) = Frq(<=55)*Gini(<=55) + Frq(>55)*Gini(>55), where Frq(X) := #(values in X)/#(all values).
We calculate: Gini(<=55) = 1-0²-0² = 1
Gini(>55) = 1-(3/10)²-(7/10)² = (100-9-49)/100 = 42/100 = 0.42
=> Gini(55) = 0/10*1 + 10/10*0.42 = 0.420
Second cell, Gini(65): Gini(<=65) = 1-0²-1² = 0
Gini(>65) = 1-(3/9)²-(6/9)² = (81-9-36)/81 = 36/81 = 4/9
=> Gini(65) = 1/10*0 + 9/10*4/9 = 0.400
Similarly: Gini(72) = 2/10*(1-0²-1²) + 8/10*(1-(3/8)²-(5/8)²) = 4/5*(64-9-25)/64 = 4/5*30/64 = 3/8 = 0.375
Gini(80) = 3/10*(1-0-1²) + 7/10*(1-(3/7)²-(4/7)²) = 7/10*(49-9-16)/49 = 7/10*24/49 = 12/35 ≈ 0.343
Gini(87) = 4/10*(1-(1/4)²-(3/4)²) + 6/10*(1-(2/6)²-(4/6)²) = 4/10*(16-1-9)/16 + 6/10*(36-4-16)/36
= 3/20 + 4/15 = (9+16)/60 = 25/60 = 5/12 ≈ 0.417
Gini(92) = 5/10*(1-(2/5)²-(3/5)²) + 5/10*(1-(1/5)²-(4/5)²) = 5/10*(25-4-9)/25 + 5/10*(25-1-16)/25
= 5/10*12/25 + 5/10*8/25 = 6/25 + 4/25 = 10/25 = 0.400
Gini(97) = 6/10*(1-(3/6)²-(3/6)²) + 4/10*(1-(0/4)²-(4/4)²) = 6/10*(1-1/4-1/4) + 4/10*0
= 6/10*1/2 = 3/10 = 0.300
By the symmetry of the cell values you can see that Gini(110) = Gini(80) = 0.343;
Gini(122) = Gini(72) = 0.375; Gini(172) = Gini(65) = 0.400 and Gini(230) = Gini(55) = 0.420.
Result: the best split value is 97, as it has the lowest Gini index.
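The same scan can be verified programmatically. The ten (income, class) records below are reconstructed from the counts in the table (3 "yes", 7 "no"); they are an assumption about the underlying slide data:

```python
# Sketch: weighted Gini index for each candidate split value.
records = [(60, "no"), (70, "no"), (75, "no"), (85, "yes"), (90, "yes"),
           (95, "yes"), (100, "no"), (120, "no"), (125, "no"), (220, "no")]
splits = [55, 65, 72, 80, 87, 92, 97, 110, 122, 172, 230]

def gini(rows):
    """Gini impurity of a group of (value, class) records."""
    if not rows:
        return 0.0
    p_yes = sum(1 for _, c in rows if c == "yes") / len(rows)
    return 1 - p_yes**2 - (1 - p_yes)**2

for s in splits:
    left = [r for r in records if r[0] <= s]
    right = [r for r in records if r[0] > s]
    g = len(left) / len(records) * gini(left) + len(right) / len(records) * gini(right)
    print(s, round(g, 3))    # minimum at split value 97 (Gini = 0.300)
```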
Page 94
Calculate the Gini index of each of the attributes/features of the "Play Tennis" example. Use
the frequency table X→y:
We calculate it for every row and split the data accordingly in our binary tree.
We repeat this process recursively.
Page 95
Page 96
Fill the frequency table for outlook:
outlook | overcast | sunny | rainy | sum
YES | 4 | 2 | 3 | 9
NO | 0 | 3 | 2 | 5
sum | 4 | 5 | 5 | 14
Gini(outlook)
= 4/14*Gini(overcast) + 5/14*Gini(sunny) + 5/14*Gini(rainy)
= 4/14*(1-(4/4)²-(0/4)²) + 5/14*(1-(2/5)²-(3/5)²) + 5/14*(1-(3/5)²-(2/5)²)
= 4/14*0 + 5/14*(12/25) + 5/14*(12/25) = 10/14*12/25 = 2/7*6/5 = 12/35 ≈ 0.343
Similarly: Gini(windy) = 0.429; Gini(temp) = 0.44; Gini(humidity) = 0.367
=> choose outlook as the root node
Page 97
Calculate the Gini index of each of the attributes/features of the "Play Tennis" example. Use the
frequency table X→y:
Gini(temp)
= 4/14*Gini(hot) + 6/14*Gini(mild) + 4/14*Gini(cool)
= 4/14*(1-(2/4)²-(2/4)²) + 6/14*(1-(4/6)²-(2/6)²) + 4/14*(1-(3/4)²-(1/4)²)
= 4/14*(1-1/4-1/4) + 6/14*(36/36-16/36-4/36) + 4/14*(16/16-9/16-1/16)
= 4/14*1/2 + 6/14*(16/36) + 4/14*6/16 = 1/7 + 4/21 + 3/28 = 37/84 ≈ 0.44
Gini(windy)
= 8/14*Gini(false) + 6/14*Gini(true)
= 8/14*(1-(6/8)²-(2/8)²) + 6/14*(1-(3/6)²-(3/6)²)
= 8/14*(16/16-9/16-1/16) + 6/14*(1-1/4-1/4) = 8/14*6/16 + 6/14*1/2
= 3/14 + 3/14 = 6/14 = 3/7 ≈ 0.429
Gini(humidity)
= 7/14*Gini(high) + 7/14*Gini(normal)
= 7/14*(1-(3/7)²-(4/7)²) + 7/14*(1-(6/7)²-(1/7)²)
= 7/14*(49/49-9/49-16/49) + 7/14*(49/49-36/49-1/49) = 7/14*24/49 + 7/14*12/49
= 12/49 + 6/49 = 18/49 ≈ 0.367
Following the same procedure, we can build the rest of the tree with the Gini index.
Page 98
Case A:
Gini(N1) = 1 - (4/7)² - (3/7)² = 49/49 - 16/49 - 9/49 = 24/49 ≈ 0.4898
Gini(N2) = 1 - (2/5)² - (3/5)² = 25/25 - 4/25 - 9/25 = 12/25 = 0.48
Gini(A) = 7/12*24/49 + 5/12*12/25 = 2/7 + 1/5 = 17/35 ≈ 0.486
Case B:
Gini(N1) = 1 - (1/5)² - (4/5)² = 25/25 - 1/25 - 16/25 = 8/25 = 0.32
Gini(N2) = 1 - (5/7)² - (2/7)² = 49/49 - 25/49 - 4/49 = 20/49 ≈ 0.4082
Gini(B) = 5/12*8/25 + 7/12*20/49 = 2/15 + 5/21 = 14/105 + 25/105 = 39/105 = 13/35 ≈ 0.371
Gini(windy|sunny)
= 3/5*(1-(1/3)²-(2/3)²) + 2/5*(1-(1/2)²-(1/2)²) = 3/5*(9/9-1/9-4/9) + 2/5*(1-1/4-1/4)
= 3/5*4/9 + 2/5*1/2 = 7/15 ≈ 0.467
Gini(humidity|sunny)
= 3/5*(1-(0/3)²-(3/3)²) + 2/5*(1-(2/2)²-(0/2)²) = 3/5*(1-0-1) + 2/5*(1-1-0) = 0
Page 100
temp with outlook=rainy | hot | mild | cool | sum
YES | 0 | 2 | 1 | 3
NO | 0 | 1 | 1 | 2
sum | 0 | 3 | 2 | 5
Gini(temp|rainy)
= 0* + 3/5*(1-(2/3)²-(1/3)²) + 2/5*(1-(1/2)²-(1/2)²) = 3/5*(4/9) + 2/5*1/2
= 4/15 + 3/15 = 7/15 ≈ 0.467
Remark*: there is no data record for the value "hot" → the training set is too small.
windy with outlook=rainy | FALSE | TRUE | sum
YES | 3 | 0 | 3
NO | 0 | 2 | 2
sum | 3 | 2 | 5
Gini(windy|rainy)
= 3/5*(1-(3/3)²-(0/3)²) + 2/5*(1-(0/2)²-(2/2)²) = 3/5*(1-1-0) + 2/5*(1-0-1) = 3/5*0 + 2/5*0 = 0
Page 101
Overfitting and pruning show the limitations of ML decision tree methods and give
a reason to consider special ML methods for special problems. An ML method
that fits all applications does not exist.
Pruning is a technique in machine learning that reduces the size of decision trees by removing
sections of the tree that provide little power to classify instances.
Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by
the reduction of overfitting.
Pruning should reduce the size of a learning tree without reducing predictive accuracy as
measured by a cross-validation set. There are many techniques for tree pruning that differ in the
measurement that is used to optimize performance.
To lessen the chance of, or amount of, overfitting, several techniques are available
(e.g. model comparison, cross-validation, regularization, early
stopping, pruning, Bayesian priors, or dropout).
The basis of some techniques is either (1) to explicitly penalize overly complex
models or (2) to test the model's ability to generalize by evaluating its performance
on a set of data not used for training, which is assumed to approximate the typical
unseen data that a model will encounter.
Decision Tree
No. | Plant (Anl) | Type | Temp. | Pressure | Fill level | Error
1001 | 123 | TN | 244 | 140 | 4600 | no
1002 | 123 | TO | 200 | 130 | 4300 | no
1009 | 128 | TSW | 245 | 108 | 4100 | yes
1028 | 128 | TS | 250 | 112 | 4100 | no
1043 | 128 | TSW | 200 | 107 | 4200 | no
1088 | 128 | TO | 272 | 170 | 4400 | yes
1102 | 128 | TSW | 265 | 105 | 4100 | no
1119 | 123 | TN | 248 | 138 | 4800 | yes
1122 | 123 | TM | 200 | 194 | 4500 | yes
Page 103
Page 104
Page 105
Page 106
Page 107
Page 108
Page 109
Example "Gießerei" (foundry): root cause analysis and result prediction
Rule:
IF opening time > 1287
AND heating circuit 12 <= 598.8
THEN 100% scrap is produced
Page 110
http://d-nb.info/992620961/34
Page 111
Page 112
Page 113
https://console.bluemix.net/dashboard/apps/?cm_mmc=Email_Nurture-_-Cloud_Platform-_-WW_WW-_-LoginBodyLoginButton&cm_mmca1=000002FP&cm_mmca2=10001675&cm_mmca3=M00010245&cvosrc=email.Nurture.M00010245&cvo_campaign=Cloud_Platform-WW_WW
Page 114
Page 115
Page 116
Page 117
Page 118
Page 119
Hint to H4.3: See also the Jupyter notebook for homework H4.3 with the name
"ML4-Homework-H4_3.ipynb" in [HVö-6], GitHub/HVoellinger:
https://github.com/HVoellinger/Lecture-Notes-to-ML-WS2020
Hint to H4.4*: Another link to the paper can be found at:
https://nbn-resolving.org/urn:nbn:de:gbv:ilm1-2008000255
Page 120
https://github.com/HVoellinger/Lecture-Notes-to-ML-WS2020/blob/master/QYuIc.gif
Page 121
Page 122
Page 123
Page 124
For the definition of R² we need some measures of the training set (the set of observation points). These measures are the Sum of
Squares Total (SST), the Sum of Squares Error (SSE) and the Sum of Squares Regression (SSR). SSR is not needed for the definition of
R², but we will use it later in the chapter:
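Written out in the notation used later in this chapter (ŷᵢ := y(xᵢ) is the predicted value, ȳ := M(y) the mean of the yᵢ), the three measures and R² are:

```latex
\mathrm{SST}=\sum_{i=1}^{n}\bigl(y_i-\bar{y}\bigr)^{2},\qquad
\mathrm{SSE}=\sum_{i=1}^{n}\bigl(y_i-\hat{y}_i\bigr)^{2},\qquad
\mathrm{SSR}=\sum_{i=1}^{n}\bigl(\hat{y}_i-\bar{y}\bigr)^{2},\qquad
R^{2}=1-\frac{\mathrm{SSE}}{\mathrm{SST}}
```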
Page 125
Page 126
The question is how well the independent variables are suited to explain the
variance of the dependent variable, or to predict its values. This is where R² comes
into play. It is a measure that cannot be less than 0 and not greater than 1. Since
R² is a share value, it is also often given in percent.
If a regression has an R² close to 0, it means that the chosen independent
variables are not well suited to predict the dependent variable. One then
also speaks of a bad model adaptation ("poor model fit").
You get a better impression of the surface when you check the
slices where intercept or slope are constant. See the sketches
"slice-const. slope" and "slice-const. intercept" (on the right side).
What are the conditions for R² to be maximal? From calculus we
know that a function in the variables a, b has extreme values if the
differential dR² = (∂R²/∂a)*da + (∂R²/∂b)*db = 0; it is a maximum if additionally d(dR²) < 0.
See the tangent lines in the sketches on the right side. To
calculate the two variables a and b you have to solve the two
equations ∂R²/∂a = 0 and ∂R²/∂b = 0 and check the maximum condition.
We will use these conditions for the calculation of the coefficients a,
b in y = a + b*x (see the "least squares fitting" method).
Page 127
We now need some helpful formulas about sums and mean values, because we will need them later in the calculation
of the "optimal" coefficients a, b (see the "least squares fit" (LSF) method):
Proposition (P5.1): You can easily prove that the following equations are valid (let M(x) := Mean(xi)):
(i) sum[(xi - M(x))²] = sum(xi²) - n*M(x)²
(ii) sum[(yi - M(y))²] = sum(yi²) - n*M(y)²
(iii) sum[(xi - M(x))*(yi - M(y))] = sum(xi*yi) - n*M(x)*M(y)
Proof: "straightforward"
(i) sum[(xi - M(x))²] = sum[xi² - 2*M(x)*xi + M(x)²] (binomial formula)
= sum(xi²) - 2*M(x)*sum(xi) + sum(M(x)²) = sum(xi²) - 2*n*M(x)² + n*M(x)² = sum(xi²) - n*M(x)², because sum(xi) = n*M(x).
(ii) analogously: sum[(yi - M(y))²] = sum(yi²) - 2*M(y)*sum(yi) + sum[M(y)²] = sum(yi²) - 2n*M(y)² + n*M(y)² = sum(yi²) - n*M(y)²
(iii) analogously: sum[(xi - M(x))*(yi - M(y))] = sum[xi*yi - M(x)*yi - xi*M(y) + M(x)*M(y)] ("multiply out all factors")
= sum(xi*yi) - n*M(x)*M(y) - n*M(x)*M(y) + n*M(x)*M(y) = sum(xi*yi) - n*M(x)*M(y) q.e.d.
Page 128
Task: For a simple linear regression (sLR) line f(x) = a + b*x, calculate the coefficients a and b such
that f(x) is optimal.
Solution:
The condition "f(x) is optimal" is equivalent to R² = 1 - SSE/SST = max. (T5.1) => SSE = min.
We know from the introductory mathematics lecture ("extreme value problems") that the first derivative
of S ("dS") must be zero, and that for a minimum the second derivative (d(dS)) must additionally be greater than zero.
Start with the first derivative: dS = (∂S/∂a)*da + (∂S/∂b)*db = 0 => ∂S/∂a = 0 (1) and ∂S/∂b = 0 (2)
We write "sum" for the summation symbol.
Page 129
Page 130
Find the "least squares fit" y = b0 + b1*x for the experimental data points: {(1, 2), (3, 4), (2, 6), (4, 8), (5, 12), (6, 13), (7, 15)}
Solution:
Number of points N = 7; mean values ("Mittelwerte"): [M(x), M(y)] = [28/7; 60/7] ≈ [4; 8.5714]
Set up a table with the quantities included in the above formulas for b0 and b1, and also the quantities for the calculation of R²:
i | xi | yi | xi*yi | xi² | y(xi) | SSE = (yi - y(xi))² | SST = (yi - M(y))² | SSR = (y(xi) - M(y))²
1 | 1 | 2 | 2 | 1 | 2.0357 | 0.001274 | 43.1833 | 42.7101
2 | 3 | 4 | 12 | 9 | 6.3929 | 5.7260 | 20.8977 | 4.7459
3 | 2 | 6 | 12 | 4 | 4.2143 | 3.1887 | 6.6121 | 18.9843
4 | 4 | 8 | 32 | 16 | 8.5715 | 0.3266 | 0.3265 | 0
5 | 5 | 12 | 60 | 25 | 10.7501 | 1.5623 | 11.7553 | 4.7467
6 | 6 | 13 | 78 | 36 | 12.9287 | 0.00661 | 19.6125 | 18.9861
7 | 7 | 15 | 105 | 49 | 15.1073 | 0.01151 | 41.3269 | 42.7180
sum | 28 | 60 | 301 | 140 | | 10.8230 | 143.7143 | 132.8911
(Check: SST = SSE + SSR holds here: 143.7143 ≈ 10.8230 + 132.8911.)
Substitute these values into formulas I and II; compare with Python:
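A quick NumPy check of the table above, using the closed-form coefficients from formulas I and II (b1 = (sum(xi*yi) - n*M(x)*M(y)) / (sum(xi²) - n*M(x)²), b0 = M(y) - b1*M(x)):

```python
# Sketch: closed-form least squares fit, verified against np.polyfit.
import numpy as np

x = np.array([1, 3, 2, 4, 5, 6, 7])
y = np.array([2, 4, 6, 8, 12, 13, 15])
n = len(x)

b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)                        # -> -0.1429, 2.1786 (matches the table's y(xi))

b1_np, b0_np = np.polyfit(x, y, 1)   # np.polyfit returns [slope, intercept]
print(b0_np, b1_np)                  # same result

y_hat = b0 + b1 * x
SSE = np.sum((y - y_hat)**2)
SST = np.sum((y - y.mean())**2)
print("R^2 =", 1 - SSE / SST)        # ~0.9247; SST = SSE + SSR holds here
```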
Page 131
Let y = a + b*x be an optimal sLR line; then we have the following properties:
i. Let M(x) and M(y) be the mean values of all xi and yi => the sLR line goes through the center of mass
("Schwerpunkt") point (M(x), M(y)) if the model includes an intercept term (i.e. is not forced to go
through the origin).
ii. The sum of the errors ei := yi - y(xi) is zero if the model includes an intercept term: sum(ei) = 0
iii. The values ei and xi are uncorrelated (whether or not there is an intercept term in the model): sum(xi*ei) = 0
Proof:
Part (i): Use the equations of the "least squares fit" method for sLR, take the formulas (I) and (II) for the coefficients a and
b, and insert the values into the regression line to prove y(M(x)) = a + b*M(x) = M(y):
(I),(II) => y(M(x)) = a + b*M(x) = (1/det)*[M(y)*sum(xi²) - M(x)*sum(xi*yi)] + (1/det)*[sum(xi*yi) - n*M(x)*M(y)]*M(x) (1)
where det := sum(xi²) - n*M(x)² (2)
The red parts cancel, so (1) => y(M(x)) = (M(y)/det)*[sum(xi²) - n*M(x)²] = (M(y)/det)*det = M(y), using (2).
Parts (ii) and (iii): see equations (2) and (3) in Theorem (T5.2). q.e.d.
Remark: With the "center of mass" condition we have a "quick check" whether a line is a candidate for an optimal
sLR model. Since only one of the two conditions for the determination of an optimal regression line is fulfilled with the
"center of mass", it is a necessary but not a sufficient condition ("notwendig, aber nicht hinreichend").
Page 132
Sometimes in the literature or in YouTube videos you see the formula "SST = SSR + SSE" (SSE, SST as on the slides
before, and SSR := sum_i(f(xi) - Mean(yi))²). Wikipedia (https://en.wikipedia.org/wiki/Coefficient_of_determination):
"… In some cases the total sum of squares equals the sum of the two other sums of squares defined above…".
We can prove that this formula is true if we have the "optimal" regression line (take care!).
Theorem (T5.2): Let y(x) = a + bx be a regression line with a = intercept and b = slope, with b <> 0. Then:
y(x) is optimal ==> SST = SSR + SSE (Equation (E5.1))
Proof: Show that SST = SSR + SSE under the condition: y(x) is optimal <=> ∂R²/∂a = 0 and ∂R²/∂b = 0 for y(x).
Page 133
Manual calculation of two sLR lines (green, red) (homework (H5.1_a)), comparison with the optimal sLR line (homework (H5.1_b)), and check of the results with the alternative metric R² := SSR/SST.
Decide which is the "better" sLR line: y = 1.5 + 0.5*x or y = 1.25 + 0.5*x?
Solution:
Number of points N = 3; mean values ("Mittelwerte"): [M(x), M(y)] = [2; 7/3]
Set up a table with the quantities included in the above formulas for a and b, and also the quantities for the calculation of R²:
Green line y = 1.5 + 0.5*x:
i | xi | yi | xi*yi | xi² | y(xi) | SSE = (yi - y(xi))² | SST = (yi - M(y))² | SSR = (y(xi) - M(y))²
1 | 1 | 2 | 2 | 1 | 2.00 | 0 | 0.1111 | 0.1111
2 | 3 | 3 | 9 | 9 | 3.00 | 0 | 0.4444 | 0.4444
3 | 2 | 2 | 4 | 4 | 2.50 | 0.25 | 0.1111 | 0.0278
sum | 6 | 7 | 15 | 14 | | 0.25 | 0.6667 | 0.5833
R² = 1 - SSE/SST = 0.6250; R² := SSR/SST = 0.8750; SSR + SSE = 0.8333 <> SST
Red line y = 1.25 + 0.5*x:
y(xi) | SSE = (yi - y(xi))² | SST = (yi - M(y))² | SSR = (y(xi) - M(y))²
1.75 | 0.0625 | 0.1111 | 0.3403
2.75 | 0.0625 | 0.4444 | 0.1736
2.25 | 0.0625 | 0.1111 | 0.0069
sum | 0.1875 | 0.6667 | 0.5208
R² = 1 - SSE/SST = 0.71875; R² := SSR/SST = 0.78125; SSR + SSE = 0.7083 <> SST
From homework (H5.1_b) we get the data for the "optimal" sLR line:
y(xi) | SSE = (yi - y(xi))² | SST = (yi - M(y))² | SSR = (y(xi) - M(y))²
11/6 | (1/6)² = 1/36 | (-1/3)² = 1/9 | 1/4
17/6 | (1/6)² = 1/36 | (2/3)² = 4/9 | 1/4
14/6 | (-2/6)² = 4/36 | (1/3)² = 1/9 | 0
sum: 42/6 = 7 | 6/36 = 1/6 | 2/3 | 1/2
R² = 1 - SSE/SST = 0.75; R² := SSR/SST = 0.75; SSR + SSE = 2/3 = SST
With the definition R² := SSR/SST we would get the result that the green line (0.875) is the best sLR line of the three; with R² = 1 - SSE/SST it is the optimal line (0.75) => the metric SSR/SST is not applicable for non-optimal lines.
Page 134
Conjecture
I state the following conjecture. At the time it was only a conjecture, since I could not prove it. → The proof has since been done →
Theorem
Idea of a proof:
If you do not believe that "<==" is correct, you can, for example, refute it by constructing a counterexample. This
would mean you have to find a training set TS with an sLR line satisfying SST = SSR + SSE which is not optimal.
Checking this is a sub-task in homework (H5.5)*.
Page 135
Page 136
Page 137
Page 138
Page 139
Part b: Do the same with a new x := "effort for homework [h]": {(homework[h], score[pt.])} = {(5, 41), (4, 27), (5, 35), (3, 26), (9, 48), (8, 45), (10, 46), (5, 27), (3, 29), (3, 19)}
Task: Build the model sLR(x,y). Compare and check your result with the output of a Python program. Answer the
following questions:
1. Q1: How many points would a student achieve without any preparation or without doing any homework?
2. Q2: How many points would a student achieve with (10 hours of preparation for the exam) or (10 hours of homework)?
3. Q3: How much effort do you need to reach enough points (=25) to pass the exam?
Additional question/remark: Our calculation uses one of the two variables independently of the other variable. What
is the difference to the mLR model results? Is the R² (calculated here) different from the Adj.R² we got in Example (E5.3)?
Page 140
Example (E5.1) - First part: {(exam prep.[h], score[pt.])} = {(7, 41), (3, 27), (5, 35), (3, 26), (8, 48), (7, 45), (10, 46), (3, 27), (5, 29), (3, 19)}
Second part: similar to the first part, build the above table with the new data:
{(homework[h], score[pt.])} = {(5, 41), (4, 27), (5, 35), (3, 26), (9, 48), (8, 45), (10, 46), (5, 27), (3, 29), (3, 19)}
Example (E5.2): "Manual calculation for the sLR model with Iowa houses data"
Take a subset of 10 data records and calculate a, b and R² manually, using the matrices of the lesson. Compare your
result with the results of homework ML5.3: coding with the dataset "Iowa Homes" to predict the house price
based on "Square Feet" and a second variable (e.g. "Age of Home"). Compare your result with the output of a Python
program. See the same link to GitHub as above in E5.1.
Page 141
Status: 1. Dezember 2020
Answer = 2: df := Degrees of Freedom = Number of observations - 2
Answer = 3: n := Number of observations, k := Number of variables
Page 142
Status: 1. Dezember 2020
See also the YouTube video: "Regression II: Degrees of Freedom EXPLAINED |
Adjusted R-Squared": https://www.youtube.com/watch?v=4otEcA3gjLk
Consider also the YouTube video from Andrew Ng: "Lecture 4.1 - Linear Regression
With Multiple Variables" and the following Lectures 4.x:
https://www.youtube.com/watch?v=Q4GNLhRtZNc&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&index=18
Page 143
Status: 1. Dezember 2020
R² only works as intended in a simple linear regression model with one explanatory
variable. For a multiple regression made up of several independent variables (k > 1),
R² must be adjusted.
The adjusted R² compares the descriptive power of regression models that include
different numbers of predictors. Every predictor added to a model increases R² and
never decreases it. Thus a model with more terms may seem to have a better fit
merely because it has more terms. The adjusted R² compensates for the addition of
variables: it only increases if the new term improves the model more than would be
expected by chance, and it decreases when a predictor improves the model less than
chance would predict.
➔ Rule: if k > 1, choose the regression model where Adj.R² is maximal.
In an overfitting condition, an incorrectly high value of R² is obtained, which leads to
a decreased ability to predict. This is not the case with the adjusted R².
For a more detailed interpretation of the differences between R² and Adj.R², see also
the YouTube video: "Regression II: Degrees of Freedom EXPLAINED | Adj. R²":
https://www.youtube.com/watch?v=4otEcA3gjLk
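The adjustment uses the standard formula Adj.R² = 1 - (1 - R²)*(n - 1)/(n - k - 1), with n observations and k independent variables; a small sketch (the helper adjusted_r2 is our own illustration):

def adjusted_r2(r2, n, k):
    """Adj.R² = 1 - (1 - R²) * (n - 1) / (n - k - 1),
    for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example with the values from Homework H5.4 (n=4 points, k=2 predictors):
print(adjusted_r2(0.9474, 4, 2))  # ~0.842, matching the hint Adj.R² ~ 0.8421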
Proof:
Page 144
Status: 1. Dezember 2020
Page 145
Status: 1. Dezember 2020
https://github.com/HVoellinger/Lecture-Notes-to-ML-WS2020/blob/master/QYuIc.gif
Page 146
Status: 1. Dezember 2020
Page 147
Status: 1. Dezember 2020
Let z = a + b*x + c*y be an optimal mLR(k=2)-plane; then we have the following properties:
Let M(x), M(y) and M(z) be the mean values over all xi, yi and zi ➔ the center of mass point ("Schwerpunkt")
(M(x), M(y), M(z)) lies on the mLR(k=2)-plane.
Proof: The statement follows directly from the LSF calculation, formula (I).
Page 148
Status: 1. Dezember 2020
Similar to Example (E5.1): Find the "least square fit" z = a + b*x + c*y for z := "Achieved points (score) of exam [pt.]",
depending on the two parameters x := "Effort exam preparation [h]" and y := "Effort for homework [h]". Data from Training
Set TS = {(x, y; z) | (exam prep. [h], homework [h]; score [pt.])} = {(7,5;41), (3,4;27), (5,5;35), (3,3;26), (8,9;48), (7,8;45),
(10,10;46), (3,5;27), (5,3;29), (3,3;19)}
Task: Build the model mLR(x, y; z). Compare and check your result with the output of a Python program.
Answer the following three questions:
1. Q1: How many points would a student achieve without any preparation and without doing any homework?
2. Q2: How many points would a student achieve with 10 hours of preparation for the exam and 10 hours of homework?
3. Q3: How much effort is needed to reach the 25 points required to pass the exam?
Additional question/remark: Our calculation uses both variables. What is the difference to our sLR-model
results? Compare the Adj.R² (calculated here) to the two R² values you got in Example (E5.1).
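A minimal Scikit-learn sketch (our own illustration) for checking the mLR(k=2)-model of this example:

import numpy as np
from sklearn.linear_model import LinearRegression

# TS = {(exam prep. [h], homework [h]; score [pt.])} from above
X = np.array([[7, 5], [3, 4], [5, 5], [3, 3], [8, 9],
              [7, 8], [10, 10], [3, 5], [5, 3], [3, 3]], dtype=float)
z = np.array([41, 27, 35, 26, 48, 45, 46, 27, 29, 19], dtype=float)

model = LinearRegression().fit(X, z)
a, (b, c) = model.intercept_, model.coef_
print(f"z = {a:.4f} + {b:.4f}*x + {c:.4f}*y")

n, k = X.shape                                  # n=10 observations, k=2 variables
r2 = model.score(X, z)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # adjusted R² for k=2
print(f"R² = {r2:.4f}, Adj.R² = {adj_r2:.4f}")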
Page 149
Status: 1. Dezember 2020
Example (E5.4): "Manual calculation for mLR(k=2)-model with Iowa Houses Data"
Similar to Example (E5.2), we take a subset of 10 data records and manually calculate a, b, c and Adj.R² using
the matrices of the lesson. Compare your result with the results of the Homework: coding with the dataset "Iowa
Homes" to predict the "House Price" based on "Square Feet" and a second variable (e.g. "Age of Home").
Compare your result with the output of a Python program.
Page 150
Status: 1. Dezember 2020
Page 151
Status: 1. Dezember 2020
Page 152
Status: 1. Dezember 2020
Exercises to ML5
Homework H5.1 - “sLR manual calculations of R² & Jupyter Notebook
(Python)”
Consider 3 points P1 = (1|2), P2 = (3|3) and P3 = (2|2) in the xy-plane. (Part b: 1 person; rest: 1 person.)
Part a: Calculate the sLR measure R² for the two estimated sLR-lines y = 1.5 + 0.5*x and y = 1.25 + 0.5*x.
Which estimation (red or green) is better? Check the "center of mass". (Hint: R² := 1 - SSE/SST.)
Part b: Calculate the optimal regression line y = a + b*x, using the formulas developed in the
lesson for the coefficients a and b. What is R² for this line?
Part c: Build a Jupyter Notebook (Python) to check the manual calculations of Part b. You can use
the approach of the lesson, using the Scikit-learn Python library. Optional*: plot a picture of
the "mountain landscape" for R² over the (a, b)-plane.
Part d: Sometimes in the literature or in YouTube videos you see the formula "SST = SSR + SSE"
(for SSE and SST see the lesson; SSR := sum_i (f(xi) - M(y))²). Theorem (ML5-2): This formula is only
true if we have the optimal regression line; for all other lines it is wrong! Check this for the
two lines of Part a (red and green) and the optimal regression line calculated in Part b.
Page 153
Status: 1. Dezember 2020
Part d:
Exercises to ML5
Homework H5.2* - "Create a Python program for sLR with Iowa Houses
Data":
2 persons: See the video, which shows the coding using the Keras library & Python:
https://www.youtube.com/watch?v=Mcs2x5-7bc0. Repeat the coding with the dataset "Iowa
Homes" to predict the "House Price" based on "Square Feet". See the result:
Page 154
Status: 1. Dezember 2020
Task:
• Part A: Calculate Adj.R² for the given R² values of a "Housing Price" example (see table below). Do
you see a "trend"?
• Part B: What would be the best model if n = 25 and if n = 10 (use Adj.R²)?
Page 155
Status: 1. Dezember 2020
Exercises to ML5
Homework H5.4 - "mLR (k=2) manual calculations of Adj.R² & Jupyter
Notebook (Python)"
Consider the 4 points P1 = (1|2|3), P2 = (3|3|4), P3 = (2|2|4) and P4 = (4|3|6) in 3-
dimensional space:
Part a: Calculate the measure Adj.R² for the two hyperplanes H1 := plane defined
by {P1, P2, P3} and H2 := plane defined by {P2, P3, P4}. Which plane (red or green) is the
better mLR estimation?
Part b: What is the optimal regression plane z = a + b*x + c*y? Use the formulas
developed with the "Least Square Fit for mLR" method for the coefficients a, b and c. What
is Adj.R² for this plane? (Hint: a = 17/4, b = 3/2, c = -3/2; R² ~ 0.9474 and Adj.R² ~ 0.8421)
Part c: Build a Jupyter Notebook (Python) to check the manual calculations of Part b.
You can use the approach of the lesson, using the Scikit-learn Python library.
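A short sketch for Part c (our own illustration with Scikit-learn); the printed values should reproduce the hint of Part b:

import numpy as np
from sklearn.linear_model import LinearRegression

# The four points (x|y|z) of H5.4
X = np.array([[1, 2], [3, 3], [2, 2], [4, 3]], dtype=float)
z = np.array([3, 4, 4, 6], dtype=float)

model = LinearRegression().fit(X, z)
print("a =", model.intercept_, " (b, c) =", model.coef_)  # expect 4.25, [1.5, -1.5]

r2 = model.score(X, z)                        # expect ~0.9474
adj = 1 - (1 - r2) * (4 - 1) / (4 - 2 - 1)    # n=4, k=2 -> expect ~0.8421
print("R² =", r2, " Adj.R² =", adj)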
Page 156
Status: 1. Dezember 2020
Exercises to ML5
Examine this direction of the (SST = SSE + SSR) condition. We could assume that the
condition "SST = SSR + SSE" (*) also implies that y(x) is an optimal regression line.
In many examples this is true! (See Homework H5.1_a.)
Task: Decide between the two possibilities a) and b) (2 persons, one for each step):
a. The statement is true, so you have to prove it. I.e., show that the "mixed term"
of the equation being zero (sum_i (fi - yi)*(fi - M(y)) = 0) implies an optimal sLR-line.
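As a pointer for part a (a short derivation, not on the original slide): with fi := y(xi), binomial expansion gives
SST = sum_i (yi - M(y))² = sum_i ((yi - fi) + (fi - M(y)))² = SSE + SSR + 2*sum_i (yi - fi)*(fi - M(y)).
So condition (*) holds exactly when the mixed term sum_i (yi - fi)*(fi - M(y)) vanishes (the same sum as in the task, up to sign).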
Page 157
Status: 1. Dezember 2020
https://www.youtube.com/watch?v=3JQ3hYko51Y
Page 158
Status: 1. Dezember 2020
• “Convolutional Neural Network Tutorial (CNN) | How CNN Works | Deep Learning
Tutorial | Simplilearn“ https://www.youtube.com/watch?v=Jy9-aGMB_TE
Page 159
Status: 1. Dezember 2020
Page 160
Status: 1. Dezember 2020
Page 161
Status: 1. Dezember 2020
The other concept is the convolution operation: we have a single input image and
we apply a filter, also called a mask, kernel, template or window. The red square
marks the position of this window on the input image; by applying the filter, denoted
here by Omega, to the nine pixels of the image at that position, we combine the
pixel values with the filter values.
For each 3x3 position we obtain one output value by combining the pixels of the
input window, weighted by the values inside the filter. The result of the convolution
at this position X is the value of this weighted sum (see the right side of the picture).
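A naive NumPy sketch of this operation (our own illustration; note that, like most CNN libraries, it does not flip the filter, so it is technically a cross-correlation):

import numpy as np

def convolve2d(image, omega):
    """Slide the filter window over the image ('valid' positions only) and
    return, for every position, the sum of the pixel values weighted by
    the filter values - the operation described above."""
    kh, kw = omega.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = image[i:i + kh, j:j + kw]  # the input window under the mask
            out[i, j] = np.sum(window * omega)  # weighted sum -> one output value
    return out

image = np.arange(25, dtype=float).reshape(5, 5)       # toy 5x5 "image"
omega = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])   # example 3x3 filter
print(convolve2d(image, omega))                        # 3x3 output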
Page 162
Status: 1. Dezember 2020
Page 163
Status: 1. Dezember 2020
Page 164
Status: 1. Dezember 2020
The second convolutional layer again runs filters and the ReLU function.
The output is 3 tensors (25x25).
Page 165
Status: 1. Dezember 2020
Page 166
Status: 1. Dezember 2020
The FC (i.e. fully-connected) layer computes the class scores, resulting in a volume of
size 1x1x10, where each of the 10 numbers corresponds to a class score, such as among the
10 categories of CIFAR-10. As with ordinary Neural Networks, and as the name
implies, each neuron in this layer is connected to all the numbers in the
previous volume.
Output: 4 neurons.
In general, the FC layer is the fully connected layer of neurons at the end of a CNN.
Neurons in a fully connected layer have full connections to all activations in the
previous layer, as in regular Neural Networks, and they work in a similar way.
Page 167
Status: 1. Dezember 2020
Page 168
Status: 1. Dezember 2020
Regular Neural Nets: Neural Networks receive an input (a single vector), and transform
it through a series of hidden layers. Each hidden layer is made up of a set of neurons,
where each neuron is fully connected to all neurons in the previous layer, and where
neurons in a single layer function completely independently and do not share any
connections. The last fully-connected layer is called the “output layer” and in
classification settings it represents the class scores. Regular Neural Nets don’t scale well
to full images.
3D volumes of neurons. Convolutional Neural Networks take advantage of the fact that
the input consists of images and they constrain the architecture in a more sensible way.
In particular, unlike a regular Neural Network, the layers of a ConvNet have neurons
arranged in 3 dimensions: width, height, depth. (Note that the word depth here refers
to the third dimension of an activation volume, not to the depth of a full Neural
Network, which can refer to the total number of layers in a network.) For example, the
input images in CIFAR-10 are an input volume of activations, and the volume has
dimensions 32x32x3 (width, height, depth respectively). As we will soon see, the
neurons in a layer will only be connected to a small region of the layer before it, instead
of all of the neurons in a fully-connected manner. Moreover, the final output layer
would for CIFAR-10 have dimensions 1x1x10, because by the end of the ConvNet
architecture we will reduce the full image into a single vector of class scores, arranged
along the depth dimension.
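A small sketch of such an architecture (assuming TensorFlow/Keras is installed; the layer sizes are illustrative, not taken from the slides), whose summary prints the width x height x depth of each activation volume:

import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 32, 3))                     # CIFAR-10 input volume
x = tf.keras.layers.Conv2D(16, 3, activation="relu")(inputs)   # neurons see small regions
x = tf.keras.layers.MaxPooling2D(2)(x)
x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
x = tf.keras.layers.MaxPooling2D(2)(x)
x = tf.keras.layers.Flatten()(x)
outputs = tf.keras.layers.Dense(10)(x)                         # vector of 10 class scores
model = tf.keras.Model(inputs, outputs)
model.summary()   # shows each activation volume, down to the class-score vector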
Page 169
Status: 1. Dezember 2020
Page 170
Status: 1. Dezember 2020
Page 171
Status: 1. Dezember 2020
Page 172
Status: 1. Dezember 2020
Page 173
Status: 1. Dezember 2020
DNNs have already made breakthroughs in a number of practical applications, the most
widely used being speech recognition. Without DNNs, Siri would not have been possible.
DNNs are also used for playing games: what is new with DNNs is that systems like Google
AlphaGo have learned the game strategy independently. The victory of IBM Deep Blue
against Kasparov in chess was a victory of ingenious programmers in combination with
superior computing power. AlphaGo, on the other hand, has achieved its progress since
October - when it beat the European champion - by playing and learning against itself.
This is the new capability of Deep Neural Networks (DNNs). In other games, DNNs have
already demonstrated how autonomous their learning has become: for example, Space
Invaders, where the DNN became a master player just by "watching" the screen pixels
and playing around with joystick moves.
See also: https://www.latimes.com/science/sciencenow/la-sci-sn-computer-learning-space-invaders-20150224-story.html
However, for many areas, such as autonomous driving or lending, the use of such
networks is extremely critical and risky due to their "black box" nature, as it is difficult to
interpret how or why the models come to particular conclusions. It is an open issue to
understand and explain the decision making of deep neural networks.
https://rg-stuttgart.gi.de/veranstaltung/deep-learning-und-autonomes-fahren/
Automated driving promises to redesign mobility. For an automated vehicle to
navigate safely on the road, it has to be equipped with artificial intelligence. This
allows the vehicle to perceive its surroundings, analyze the traffic and make
decisions.
An essential building block for this new intelligence are artificial neural networks
(ANNs; German: KNNs). Although these have been the subject of research for decades,
only in recent years have they achieved a quality that brings their use in road traffic
within reach. This goes back to the development of so-called deep learning, in which
ANNs with several million parameters can be successfully trained.
These deep networks can be used wherever explicit modeling by humans
is hardly possible. In particular, they are used in the processing of sensor data and
the understanding of complex traffic scenarios.
Page 175
Status: 1. Dezember 2020
Fraunhofer IEE has more than 15 years of experience in forecasting for volatile
energy producers. Enercast GmbH (https://www.enercast.de/) has been developing
reliable and accurate performance forecasts and extrapolations for wind + photovoltaic
systems since 2011, together with Fraunhofer in a cooperation project. What is special
about the calculations is that they rely on artificial intelligence - through the use
of neural networks. All data is processed by an algorithm. For more details see also
the whitepaper: https://www.enercast.de/wp-content/uploads/2018/04/whitepaper-prognosen-wind-solar-kuenstliche-intelligenz-neuronale-netze_110418_EN.pdf
Detail: The entire infrastructure was built up gradually - always with the objective of
achieving the maximum possible forecast quality. In fact, multiple networks are always
used per system (i.e. for a single wind turbine). For a complete wind farm, this leads
to hundreds of networks at work. The initial training of the networks mostly happens on
Cassandra clusters. Only when a certain quality is achieved does the switch to
operational mode take place - then without Cassandra. The networks then change
continuously and adapt to the respective local conditions. That is, if there are changes
in the local climate situation, the networks correct themselves automatically to
provide optimal performance predictions. Exactly this adaptability is the big advantage
over the classic, static models. The static models describe the situation through a set
of fixed parameters in an algorithm. Economically, the static models deliver faster
results, whereas the AI-based systems achieve higher forecasting quality in the
medium term.
Page 176
Status: 1. Dezember 2020
Page 177
Status: 1. Dezember 2020
Page 178
Status: 1. Dezember 2020
https://gallery.azure.ai/Experiment/Compare-Multi-class-Classifiers-Letter-recognition-2
Page 179
Status: 1. Dezember 2020
Page 180
Status: 1. Dezember 2020
Page 181
Status: 1. Dezember 2020
What is BackPropagation?
Page 182
Status: 1. Dezember 2020
https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
Page 183
Status: 1. Dezember 2020
*******placeholder*************
Page 184
Status: 1. Dezember 2020
Page 185
Status: 1. Dezember 2020
See also:
https://courses.cs.washington.edu/courses/cse573/05au/support-vector-machines.ppt
*******placeholder*************
Page 186
Status: 1. Dezember 2020
*******placeholder*************
Page 187
Status: 1. Dezember 2020
Page 188
Status: 1. Dezember 2020
Page 189
Status: 1. Dezember 2020
https://scikit-learn.org/stable/auto_examples/svm/plot_iris_svc.html#sphx-glr-auto-examples-svm-plot-iris-svc-py
*********** placeholder********************
Page 190
Status: 1. Dezember 2020
*******placeholder*************
Page 191
Status: 1. Dezember 2020
Appendix
Appendix-1
1. ML1: Introduction to Machine Learning (ML)
2. ML2: Concept Learning: VSpaces & Cand. Elim. Algo.
3. ML3: Supervised and Unsupervised Learning
4. ML4: Decision Tree Learning
5. ML5: simple Linear Regression (sLR) & multiple Linear Regression (mLR)
6. ML6: Neural Networks: Convolutional NN
7. ML7: Neural Network: BackPropagation Algorithm
8. ML8: Support Vector Machines (SVM)