Exercises to Lecture – Très Intéressant
Contents
Numbers of Exercises per Chapter .................................................................................................... 3
Links to Further Literature: .................................................................................................................. 3
Exercises to Lesson ML0: General Remarks and Goals of Lecture (ML) ................................... 4
Homework H0.1- “Three Categories of Machine Learning” ................................................ 4
Exercises to Lesson ML1: Introduction to Machine Learning (ML) ............................................... 6
Homework H1.1 - “Most Popular ML Technologies + Products” ....................................... 6
Homework H1.2 - “Ethics in Artificial Intelligence” ............................................................. 17
Homework H1.3 (optional)- “Create Painting with DeepArt” ............................................ 20
Homework H1.4 (optional) - Summary of video “What is ML?” ....................................... 20
Homework H1.5 (optional) – “Summary of video ‘Supervised & Unsupervised Learning’” ...... 20
Exercises to Lesson ML2: Concept Learning: Version Spaces & Candidate Elimination ....... 33
Homework H2.1 – “Version Space for EnjoySport” ............................................................. 33
Homework H2.2 – “Version Space – Second Example” ...................................................... 33
Exercises to Lesson ML3: Supervised and Unsupervised Learning .......................................... 34
Homework H3.1 - “Calculate Value Difference Metric”....................................................... 34
Homework H3.2 – “Bayes Learning for Text Classification” ............................................ 35
Homework H3.3 (advanced)* – “Create in IBM Cloud two services Voice Agent and
Watson Assistant Search Skill with IBM Watson Services” ............................................. 41
Homework H3.4* – “Create a K-Means Clustering in Python” ......................................... 47
Homework H3.5 – “Repeat + Calculate Measures for Association” ............................... 55
Exercises to Lesson ML4: Decision Tree Learning ....................................................................... 60
Homework H4.1 - “Calculate ID3 and CART Measures”..................................................... 60
Homework H4.2 - “Define the Decision Tree for UseCase “Predictive Maintenance”
(slide p.77) by calculating the GINI Indexes” ........................................................................ 75
Homework H4.3* - “Create and describe the algorithm to automate the calculation of
the Decision Tree for UseCase “Predictive Maintenance” ................................................ 80
Homework H4.4* - “Summary of the Article … prozessintegriertes
Qualitätsregelungssystem…” ................................................................................................... 84
Homework H4.5* - “Create and describe the algorithm to automate the calculation of
the Decision Tree for the Use Case “Playing Tennis” using ID3 method” .................. 87
Exercises to Lesson ML5: simple Linear Regression (sLR) & multiple Linear Regression
(mLR) .................................................................................................................................................... 91
Homework H5.1 - “sLR manual calculations of R² & Jupyter Notebook (Python)” .... 91
Homework H5.2*- “Create a Python Pgm. for sLR with Iowa Houses Data” ................ 97
Homework H5.3 – “Calculate Adj.R² for MR” ........................................................................ 98
Homework H5.4 - “mLR (k=2) manual calculations of Adj.R² & Jupyter Notebook
(Python) to check results” ......................................................................................................... 99
Homework H5.5* - Decide (SST=SSE+SSR) => optimal sLR- line? ............................... 105
Exercises to Lesson ML6: Convolutional Neural Networks (CNN) ........................................... 106
Homework H6.1 – “Power Forecasts with CNN in UC2” .................................................. 106
Homework H6.2 – “Evaluate AI Technology of UC3” ........................................................ 106
Homework H6.3* – “Create Summary to GO Article”........................................................ 106
Homework H6.4* – “Create Summary to BERT Article” ................................................... 106
Exercises to Lesson ML7: BackPropagation for Neural Networks ........................................... 110
Homework H7.1 – “Exercise of an Example with Python” .............................................. 110
Homework H7.2 – “Exercise of an Example with Python” .............................................. 110
Exercises to Lesson ML8: Support Vector Machines (SVM) ..................................................... 111
Homework H8.1 – “Exercise of an Example with Python” .............................................. 111
Homework H8.2 – “Exercise of an Example with Python” .............................................. 111
Homework H8.3 – “Exercise of an Example with Python” .............................................. 111
Homework H8.4 – “Exercise of an Example with Python” .............................................. 111
Groupwork (2 Persons): Compare the differences between the three categories; see slide
“goal of lecture (2/2)”:
1. Supervised Learning (SVL)
2. Unsupervised Learning (USL)
3. Reinforcement Learning (RIF)
Give short descriptions of the categories and explain the differences (~5 minutes per
category).
First Solution:
Give a short overview of the products and their features (~10 minutes for each)
and give a comparison matrix of the three products with an evaluation. Which is your
favorite product (~5 minutes)?
First Solution:
Second Solution:
Page 10 | Date: 22 December 2022 | Exercises to ML DHBW Stuttgart – WS2020
Solutions:
Groupwork (2 Persons) - summarize the results of the second and third YouTube
videos “Supervised Learning” and “Unsupervised Learning” by Andrew Ng in a
15-minute report. Create a small PowerPoint presentation. See:
https://www.youtube.com/playlist?list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN
Solutions:
Second Solution:
Solutions:
*********** placeholder********************
Solutions:
….
Solutions:
1 Person: Review the example about Bayes Learning in this lesson. Use the same
training data as in the lesson together with the new tagged text. Run the Bayes
text-classification calculation for the sentence “Hermann plays a TT match” and tag
this sentence.
Additional question: What happens if we change the target sentence to “Hermann plays a
very clean game”?
Optional* (1 Person): Define an algorithm in Python (use a Jupyter Notebook) to automate
the calculations. Use the description under:
https://medium.com/analytics-vidhya/naive-bayes-classifier-for-text-classification-556fabaf252b#:~:text=The%20Naive%20Bayes%20classifier%20is,time%20and%20less%20training%20data.
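Such an automation can be sketched in plain Python. This is a minimal sketch of a Naive Bayes classifier with Laplace smoothing; the training sentences below are placeholders, not the lesson's actual data:

```python
from collections import Counter

# Hypothetical training data; substitute the lesson's actual sentences.
train = [
    ("a great game", "Sports"),
    ("the election was over", "Not Sports"),
    ("very clean match", "Sports"),
    ("a clean but forgettable game", "Sports"),
    ("a very close game", "Sports"),
    ("it was a close election", "Not Sports"),
]

def classify(sentence):
    """Naive Bayes with Laplace smoothing: argmax_c P(c) * prod P(w|c)."""
    vocab = {w for text, _ in train for w in text.split()}
    scores = {}
    for label in {lbl for _, lbl in train}:
        texts = [t for t, lbl in train if lbl == label]
        words = Counter(w for t in texts for w in t.split())
        total = sum(words.values())
        score = len(texts) / len(train)            # class prior P(c)
        for w in sentence.lower().split():
            # Laplace smoothing: +1 in the numerator, +|vocab| in the denominator
            score *= (words[w] + 1) / (total + len(vocab))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("Hermann plays a TT match"))  # "Sports" for these placeholder sentences
```

With the real training data of the lesson the same function tags the sentence; only the `train` list changes.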
[1]: # This notebook was created by Alireza Gholami and Jannik Schwarz
# Importing everything we need
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import word_tokenize
# Import library time to check execution with date + time information
import time
#check versions of libraries
print('pandas version is: {}'.format(pd.__version__))
import sklearn
print('sklearn version is: {}'.format(sklearn.__version__))
[4]: # Here we are actually creating the matrix for sport and not sport sentences
tdm_sport, vector_sport, X_sport = vectorisation('Sports')
tdm_not_sport, vector_not_sport, X_not_sport = vectorisation('Not Sports')
print (f'Sport sentence matrix: \n{tdm_sport}\n')
[11]: # We're using Laplace smoothing:
# if a new word occurs, its raw probability would be 0,
# so every word count gets incremented by one.
def laplace(freq, total_count, total_feat):
    prob_sport_or_not = []
    for my_word in new_word_list:
        if my_word in freq.keys():
            counter = freq[my_word]
        else:
            counter = 0
        # total_count is the number of words in the class's sentences,
        # total_feat the total number of features (vocabulary size)
        prob_sport_or_not.append((counter + 1) / (total_count + total_feat))
    return prob_sport_or_not
# multiplying the result with the ratio of sports sentences to the total amount of sentences (here: 4/6)
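The comment above refers to the class prior. A small sketch of how the smoothed word probabilities would be combined with it; the probability values below are placeholders, not the notebook's actual outputs:

```python
import math

# Placeholder outputs of laplace(...) for the words of the target sentence:
prob_words_sport = [2/20, 1/20, 3/20]
prob_words_not   = [1/15, 1/15, 1/15]

prior_sport = 4 / 6   # 4 of the 6 training sentences are "Sports"
prior_not   = 2 / 6

score_sport = prior_sport * math.prod(prob_words_sport)
score_not   = prior_not   * math.prod(prob_words_not)

# The sentence gets the tag with the higher (unnormalised) score:
tag = 'Sports' if score_sport > score_not else 'Not Sports'
print(tag)
```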
Homework for 2 Persons: Log in into IBM Cloud and follow the tutorial descriptions
(see links):
1. “Voice Agent” (1 person)
a. Set up the required IBM Cloud Services
b. Configure the TWILIO Account
c. Configure the Voice Agent on the IBM Cloud and Import Skill by uploading either
• skill-banking-balance-enquiry.json or
• skill-pizza-order-book-table.json
See tutorial:
https://github.com/FelixAugenstein/digital-tech-tutorial-watson-assistant-search-skill
Remark: You can integrate the two skills, such that when the dialog skill has no
answer you show the search results. The reading of texts from the search results of
the search skill is unfortunately not (yet) possible. Watson can only display the
search result with title/description etc. as on Google. The tutorial in the cloud docs on
the same topic is also helpful: https://cloud.ibm.com/docs/assistant?topic=assistant-
skill-search-add
Solutions:
Ad1: by Hermann Völlinger; 12.3.2020
You link the phone-number with your solution “Watson-Voice Agent Tutorial”, see:
Finally, you can see the final configuration by opening the service app “Watson-Voice
Agent Tutorial”. See the following screenshot:
By opening the Watson Assistant, we see all available solutions, i.e. dialog- and
search skills. Under “my second assistant” we see the two dialog skills “hermann
skill” and “voice”:
After opening “voice” we see all intents (number = 12). Some are imported from the JSON
file; others were created manually, like #machine, #FirstExample or #SecondExample:
You can define questions (see #machine) and also answers of the voice assistant
(“chatbot”):
So, one gets the final flow chart of the dialog skill for the Voice Agent “voice”. See here
the response to the question “What is Machine Learning?”:
Similarly, you see here the logic for the question “What is my Balance?”:
Ad2: By Niklas Gysinn & Maximilian Wegmann, DHBW Stg. SS2020 (4.3.2020)
Creating a Watson Search (Discovery) Skill using the IBM Cloud
Source used: https://github.com/FelixAugenstein/digital-tech-tutorial-watson-assistant-search-skill
First of all, we created two services. One service for crawling and indexing the
website information and one for providing the assistant functionality.
The discovery service uses various news sites (e.g. German “Tagesschau”) to
retrieve the latest articles and make them available to the assistant.
This information can then be accessed via a "chat" provided by the IBM Watson
Assistant service.
Homework H3.4* – “Create a K-Means Clustering in Python”
Homework for 2 Persons: Create a Python algorithm (in a Jupyter Notebook) which
clusters the following points:
1.2 Preparations
1.2.1 Import of libraries
The first step is to import the necessary library packages.
%matplotlib inline
import copy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn as sk
from sklearn.cluster import KMeans
# to check the time of execution, import the time module
import time
# check versions of libraries
print('pandas version is: {}'.format(pd.__version__))
print('numpy version is: {}'.format(np.__version__))
print('sklearn version is: {}'.format(sk.__version__))
1.2.2 Dataset
The second step is defining data to work with. The data frame contains two arrays of x and y
coordinates. These build several points in a two-dimensional space.
old_x = old_centroids[i][0]
old_y = old_centroids[i][1]
dx = (centroids[i][0] - old_centroids[i][0]) * 0.75
dy = (centroids[i][1] - old_centroids[i][1]) * 0.75
ax.arrow(old_x, old_y, dx, dy, head_width=2, head_length=3, fc=colmap[i],ec=colmap[i])
plt.show()
# display result
displayDataset(df, centroids)
[11]: # Dataset
df = pd.DataFrame({
'x': [12, 20, 28, 18, 29, 33, 24, 45, 45, 52, 51, 52, 55, 53, 55, 61, 64, 69, 72],
'y': [39, 36, 30, 52, 54, 46, 55, 59, 63, 70, 66, 63, 58, 23, 14, 8, 19, 7, 24] })
Invoke the imported k-Means constructor with the number of clusters (here 3). Then train the
model with the dataset.
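The two steps just described can be sketched as follows, using the dataset from cell [11]; `n_init` and `random_state` are added here for reproducibility and are not in the original notebook:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Same points as in cell [11]:
df = pd.DataFrame({
    'x': [12, 20, 28, 18, 29, 33, 24, 45, 45, 52, 51, 52, 55, 53, 55, 61, 64, 69, 72],
    'y': [39, 36, 30, 52, 54, 46, 55, 59, 63, 70, 66, 63, 58, 23, 14, 8, 19, 7, 24]})

# k-means with 3 clusters; pinned parameters make the run reproducible
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(df)

print(kmeans.cluster_centers_)   # 3 centroids in the (x, y) plane
print(kmeans.labels_)            # cluster index (0-2) for each point
```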
To 2a.:
We have 8 market baskets -> Support(Bier=>Orangensaft) = frq(Bier, Orangensaft)/8.
We see two baskets which contain Bier and Orangensaft together
-> Support = 2/8 = 1/4 = 25%.
To 2b.:
We see that frq(Bier) = 6 and frq(Bier, Milch) = 4 -> Conf(Bier=>Milch) = 4/6 = 2/3 ≈ 66.7%.
To 2c.:
To reach a support >= 50% we need items/products which occur in at least 4 of the 8 baskets.
We see for example that Milch is in 5 baskets (we write #Milch = 5), #Bier = 6, #Apfelsaft = 4,
#Orangensaft = 4 and #Limonade = 2.
Only the 2-item pair #(Milch, Bier) = 4 reaches the minimum of 4 occurrences. We see this by
calculating the frequency matrix frq(X=>Y) for all tuples (X, Y).
It is easy to see that there are no 3-item sets with a minimum of 4 occurrences: only
Sup(Bier, Milch) is >= 50%, but for all X: Sup({Bier, Milch}, X) < 50%.
We see from the above matrix that Supp(Milch=>Bier) = Supp(Bier=>Milch) = 4/8 = 1/2 = 50%.
We now calculate: Conf(Milch=>Bier) = 4/#Milch = 4/5 = 80%.
From Question 2b, we know that Conf(Bier=>Milch) ≈ 66.7%.
Solution: Only the two association rules (Bier=>Milch) and (Milch=>Bier) have support
and confidence >= 50%.
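The support and confidence measures can be computed generically. A minimal sketch; the four baskets below are hypothetical toy data, not the exercise's eight baskets:

```python
def support(baskets, items):
    """Fraction of baskets containing all given items."""
    items = set(items)
    return sum(items <= set(b) for b in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Conf(A => B) = frq(A and B) / frq(A)."""
    a, ab = set(antecedent), set(antecedent) | set(consequent)
    frq_a  = sum(a  <= set(b) for b in baskets)
    frq_ab = sum(ab <= set(b) for b in baskets)
    return frq_ab / frq_a

# Hypothetical toy baskets (NOT the 8 baskets of the exercise):
baskets = [
    {'Bier', 'Milch'}, {'Bier', 'Orangensaft'},
    {'Milch', 'Limonade'}, {'Bier', 'Milch', 'Apfelsaft'},
]
print(support(baskets, {'Bier', 'Milch'}))       # 0.5
print(confidence(baskets, {'Bier'}, {'Milch'}))  # 2/3
```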
First Solution with ID3 (Hermann Völlinger, Feb. 2020): Missing calculations on ID3
method (see page number of the corresponding lecture slides on the right top):
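The ID3 measures are based on entropy and information gain; a minimal sketch of both formulas. The 9-YES/5-NO distribution and the 5/4/5 partition below match the classic “Playing Tennis” dataset (used in H4.5), not necessarily the slide being solved here:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum p_i * log2(p_i) over the class frequencies."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(labels, partitions):
    """Gain(S, A) = H(S) - sum |S_v|/|S| * H(S_v)."""
    total = len(labels)
    return entropy(labels) - sum(len(p) / total * entropy(p) for p in partitions)

# 9 YES / 5 NO, the classic "Playing Tennis" distribution:
labels = ['YES'] * 9 + ['NO'] * 5
print(round(entropy(labels), 3))  # 0.940

# e.g. Outlook = sunny (2 YES / 3 NO), overcast (4 YES), rain (3 YES / 2 NO):
partitions = [['YES'] * 2 + ['NO'] * 3, ['YES'] * 4, ['YES'] * 3 + ['NO'] * 2]
print(round(information_gain(labels, partitions), 3))  # 0.247
```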
Second Solution with ID3 (Lars Gerne & Nils Hauschel, 03/31/20):
First Solution with CART: Missing calculations on CART method using GINI Index
as a metric (see page number of the corresponding lecture slides on the right top):
see Notes Page in the lecture presentation.
Optional*: Create and describe the algorithm to automate the calculation of steps
1. to 3.
Ad 1:
We calculate first the matrix for Druck by looking at the Data Table:

Nr.    Anl   Typ   Temp   Druck   Füllst.   Fehler
1001   123   TN    244    140     4600      NO
1002   123   TO    200    130     4300      NO
1009   128   TSW   245    108     4100      YES
1028   128   TS    250    112     4100      NO
1043   128   TSW   200    107     4200      NO
1088   128   TO    272    170     4400      YES
1102   128   TSW   265    105     4100      NO
1119   123   TN    248    138     4800      YES
1122   123   TM    200    194     4500      YES

When we follow strictly the approach of slide 67, we have to consider intervals for the
classes "<=" and ">" and a split-point in the middle of each interval (see slide p. 67).

We calculate next the matrix for Temp.:

Temp.
Values       200, 200, 200   244    245    248    250    265    272
Error        NO, NO, YES     NO     YES    YES    NO     NO     YES
Split-Point  178     222     244,5  246,5  249    257,5  268,5  275,5
Interval     <= >    <= >    <= >   <= >   <= >   <= >   <= >   <= >
NO           0  5    2  3    3  2   3  2   3  2   4  1   5  0   5  0
YES          0  4    1  3    1  3   2  2   3  1   3  1   3  1   4  0
GINI

Finally we calculate the matrix for Füllst.:

Füllst.
Values       4100, 4100, 4100   4200   4300   4400   4500   4600   4800
Error        NO, NO, YES        NO     NO     YES    YES    NO     YES
Split-Point  4050   4150   4250   4350   4450   4550   4700   4900
Interval     <= >   <= >   <= >   <= >   <= >   <= >   <= >   <= >
NO           0  5   2  3   3  2   4  1   4  1   4  1   5  0   5  0
YES          0  4   1  3   1  3   1  3   2  2   3  1   3  1   4  0
GINI
Ad2:
Druck
Values       105    107    108    112    130    138    140    170    194
Error        NO     NO     YES    NO     NO     YES    NO     YES    YES
Split-Point  104    106    107,5  110    121    134    139    155    182    206
Interval     <= >   <= >   <= >   <= >   <= >   <= >   <= >   <= >   <= >   <= >
NO           0  5   1  4   2  3   2  3   3  2   4  1   4  1   5  0   5  0   5  0
YES          0  4   0  4   0  4   1  3   1  3   1  3   2  2   2  2   3  1   4  0
GINI         0.494  0.444  0.381  0.481  0.433  0.344  0.444  0.317  0.417  0.494
First we calculate Gini(Druck) for the split value 139:
Gini(Druck) = 6/9*Gini(<=139) + 3/9*Gini(>139)
            = 2/3*(1 - (4/6)² - (2/6)²) + 1/3*(1 - (1/3)² - (2/3)²)
            = 2/3*((36-16-4)/36) + 1/3*((9-1-4)/9) = 8/27 + 4/27 = 4/9 ≈ 0.444

Second we calculate Gini(Druck) for the split value 155:
Gini(Druck) = 7/9*Gini(<=155) + 2/9*Gini(>155)
            = 7/9*(1 - (2/7)² - (5/7)²) + 2/9*(1 - (2/2)² - (0/2)²)
            = 7/9*((49-4-25)/49) + 0 = 7/9*(20/49) = 20/63 ≈ 0.317
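These split-point calculations can be verified with a short helper; a minimal sketch using the Druck values from the table above:

```python
def gini(yes, no):
    """Gini impurity of a node with `yes` and `no` samples."""
    total = yes + no
    if total == 0:
        return 0.0
    return 1 - (yes / total) ** 2 - (no / total) ** 2

def gini_split(values, errors, split_point):
    """Weighted Gini of the <= / > partition at split_point."""
    left  = [e for v, e in zip(values, errors) if v <= split_point]
    right = [e for v, e in zip(values, errors) if v >  split_point]
    n = len(values)
    return (len(left) / n) * gini(left.count('YES'), left.count('NO')) \
         + (len(right) / n) * gini(right.count('YES'), right.count('NO'))

# Druck column with its error labels from the Data Table:
druck  = [105, 107, 108, 112, 130, 138, 140, 170, 194]
errors = ['NO', 'NO', 'YES', 'NO', 'NO', 'YES', 'NO', 'YES', 'YES']
print(round(gini_split(druck, errors, 139), 3))  # 0.444
print(round(gini_split(druck, errors, 155), 3))  # 0.317
```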
Temp.
Values       200, 200, 200   244    245    248    250    265    272
Error        NO, NO, YES     NO     YES    YES    NO     NO     YES
Split-Point  178     222     244,5  246,5  249    257,5  268,5  275,5
Interval     <= >    <= >    <= >   <= >   <= >   <= >   <= >   <= >
NO           0  5    2  3    3  2   3  2   3  2   4  1   5  0   5  0
YES          0  4    1  3    1  3   2  2   3  1   3  1   3  1   4  0
GINI         0.494   0.481   0.433  0.489  0.481  0.492  0.417  0.494
We see that the value of the GINI index only depends on the distribution of YES's and NO's:
for the split values 178, 222, 244,5, 249, 268,5 and 275,5 we can reuse the GINI of Druck,
since the distributions of YES and NO are the same.
So we only need to calculate GINI(Temp.) for the split values 246,5 and 257,5.
RESULT: Looking for the lowest GINI, we find 0.317 for the feature DRUCK at the split value 155.
=> DRUCK is the root node and the split value is 155. Our decision tree is now:
Ad3:
We need to calculate the GINI indexes for all remaining 7 values (where Druck <= 155) for the
features Temp. and Füllst.:

Temp.
Values       200, 200   244    245    248    250    265
Error        NO, NO     NO     YES    YES    NO     NO
Split-Point  178    222    244,5  246,5  249    257,5  272,5
Interval     <= >   <= >   <= >   <= >   <= >   <= >   <= >
NO           0  5   2  3   3  2   3  2   3  2   4  1   5  0
YES          0  2   0  2   0  2   1  1   2  0   2  0   2  0

GINI(246,5) = 4/7*(1 - (3/4)² - (1/4)²) + 3/7*(1 - (1/3)² - (2/3)²) = 4/7*((16-9-1)/16) + 3/7*((9-1-4)/9) = 3/14 + 4/21 = 17/42 ≈ 0.405
GINI(257,5) = 6/7*(1 - (4/6)² - (2/6)²) + 1/7*(1 - 0 - 1) = 6/7*(4/9) + 0 = 8/21 ≈ 0.381
Füllst.
Values       4100, 4100, 4100   4200   4300   4600   4800
Error        NO, NO, YES        NO     NO     NO     YES
Split-Point  4050   4150   4250   4450   4700   4900
Interval     <= >   <= >   <= >   <= >   <= >   <= >
NO           0  5   2  3   3  2   4  1   5  0   5  0
YES          0  2   1  1   1  1   1  1   1  1   2  0
GINI         0.408  0.405  0.405  0.371  0.238  0.408
For the split values 4050, 4150, 4250 and 4900 we can reuse the GINI calculations from Temp.
GINI(4450) = 5/7*(1 - (4/5)² - (1/5)²) + 2/7*(1 - (1/2)² - (1/2)²) = 5/7*((25-16-1)/25) + 2/7*(1/2) = 8/35 + 1/7 = 13/35 ≈ 0.371
GINI(4700) = 6/7*(1 - (5/6)² - (1/6)²) + 1/7*(1 - 0 - 1) = 6/7*((36-25-1)/36) = 6/7*(10/36) = 5/21 ≈ 0.238
Solution: Created by H. Fritze & P. Mäder (DHBW, SS2020) and H. Völlinger (DHBW,
WS2020). The following screenshots are from a Jupyter Notebook (using Python 3):
Groupwork (2 Persons) – read and create a short summary of a specific part of the
article/dissertation by Hans W. Dörmann Osuna: “Ansatz für ein prozessintegriertes
Qualitätsregelungssystem für nicht stabile Prozesse”.
Link to article: http://d-nb.info/992620961/34
For the two chapters (1 Person, 15 Minutes):
• Chapter 7.1 „Aufbau des klassischen Qualitätsregelkreises”
• Chapter 7.2. “Prädiktive dynamische Prüfung”
Subheadings
• „Aufgaben“
• „Voraussetzungen für die Datenerfassung“
• „Datenauswertung“
▪ „Data Understanding“
▪ „Data Preparation“
▪ „Modellierung und Datenanalyse“
▪ „Implementierung“
„Aufgaben“ - Functions
During production data is collected and compared to target values. If the values do not
match, the system automatically acts to correct itself:
1. Plan
2. Do
3. Check
4. Act
„Data Understanding“
• What variables are relevant for my process?
• What must be taken into consideration?
„Data Preparation “
• Goal: Creation of a table with which current data can be compared to target values
• Generation of initial target values by testing and measurements as well as opinions of
specialists and more
• CART- and CHAID- decision trees as well as rule-based System as possible methods
„Implementierung“ - Implementation
• Creation of new variables and target values based on new solutions
• Adaptation of existing target values to accommodate new knowledge and rules
See the rest of this Jupyter Notebook with the name “Homework_H4.5-
DecTree_ID3.ipynb” (as PDF: “Homework_H4.5-DecTree_ID3.pdf”) in [HVö-6]:
GitHub/HVoellinger: https://github.com/HVoellinger/Lecture-Notes-to-ML-WS2020
Part a: Calculate the sLR measure R² for the two estimated sLR lines
y = 1,5 + 0,5*x and y = 1,25 + 0,5*x. Which estimation (red or green) is better? (1
Person, 15 minutes). (Hint: R² = 1 - SSE/SST.)
Part c: Build a Jupyter Notebook (Python) to check the manual calculations of Part b.
You can use the approach of the lesson by using the Scikit-learn Python library.
Optional*: Plot a picture of the “mountain landscape” of R² over the (a, b)-plane.
Part d: Sometimes in the literature or in YouTube videos you see the formula
“SST = SSR + SSE” (SSE, SST: see lesson; SSR := sum_i (f(x_i) - Mean(y))²).
Theorem (ML5-2): “This formula is only true if we have the optimal regression line.
For all other lines it is wrong!” Check this for the two lines of Part a (red and green)
and the optimal regression line calculated in Part b.
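Part a's calculation can be automated with a small helper. A minimal sketch; the observation points below are placeholders, since the exercise's own points are on the slide:

```python
def r_squared(xs, ys, a, b):
    """R² = 1 - SSE/SST for the line y = a + b*x."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Placeholder observation points, NOT the exercise's data:
xs = [1, 2, 3, 4]
ys = [2, 3, 3, 4]
print(r_squared(xs, ys, 1.5, 0.5))   # red line   y = 1,5  + 0,5*x -> 0.75 here
print(r_squared(xs, ys, 1.25, 0.5))  # green line y = 1,25 + 0,5*x -> 0.375 here
```

Which line wins depends on the data; with the slide's points the result may differ from this toy set.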
Solutions:
Part b:
A detailed description and an Excel document with the integrated formulas for the
calculation of the coefficients a, b can be found at GitHub/HVoellinger:
https://github.com/HVoellinger/Lecture-Notes-to-ML-WS2020
Part c:
Homework H5.2*- “Create a Python Pgm. for sLR with Iowa Houses Data”
2 Persons: See the video, which shows the coding using the Keras library & Python:
https://www.youtube.com/watch?v=Mcs2x5-7bc0. Repeat the coding with the dataset
“Iowa Homes” to predict the house price based on square feet. See the result:
Solutions:
See also the YouTube video: “Regression II: Degrees of Freedom EXPLAINED |
Adjusted R-Squared”; https://www.youtube.com/watch?v=4otEcA3gjLk
Task:
• Part A: Calculate Adj.R² for the given R² of a “Housing Price” example (see table
below). Do you see a “trend”?
• Part B: What would be the best model if n = 25 and if n = 10 (use Adj.R²)?
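The standard adjustment formula, Adj.R² = 1 - (1 - R²)(n - 1)/(n - k - 1), can be tabulated with a short helper; the R² values below are placeholders, not the values from the exercise's table:

```python
def adjusted_r_squared(r2, n, k):
    """Adj.R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Placeholder R² values for models with k = 1..4 predictors:
for k, r2 in enumerate([0.80, 0.84, 0.86, 0.87], start=1):
    print(k,
          round(adjusted_r_squared(r2, n=25, k=k), 4),
          round(adjusted_r_squared(r2, n=10, k=k), 4))
```

Note the trend the task hints at: for small n, adding predictors (larger k) can lower Adj.R² even while R² rises.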
Part A:
Part B:
Part a: Calculate the mLR measure Adj.R² for the two hyperplanes H1 := plane
defined by {P1, P2, P3} and H2 := plane defined by {P2, P3, P4}. Which plane (red or
green) is the better mLR estimation? (Hint: calculate Adj.R².)
Part b: What is the optimal regression plane z = a + b*x + c*y? Use the formulas
developed with the “Least Square Fit for mLR” method for the coefficients a, b and c.
What is Adj.R² for this plane? (Hint: a = 17/4, b = 3/2, c = -3/2; R² ≈ 0.9474 and Adj.R² ≈ 0.8421.)
Part c: Build a Jupyter Notebook (Python) to check the manual calculations of part b.
You can use the approach of the lesson by using the Scikit-learn Python library.
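A sketch of Part c with scikit-learn; the observation points below are placeholders, since the exercise's points P1..P4 are on the slide:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder (x, y) -> z observations, NOT the exercise's P1..P4:
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
z = np.array([4.0, 6.0, 2.5, 4.0])

model = LinearRegression().fit(X, z)
r2 = model.score(X, z)                 # R² of the fitted plane z = a + b*x + c*y

# Adjusted R² penalises the number of predictors k:
n, k = len(z), X.shape[1]
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(model.intercept_, model.coef_, r2, adj_r2)
```

With the slide's points the printed coefficients should reproduce the hint values a = 17/4, b = 3/2, c = -3/2.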
Part c:
For the rest, see [HVö-6]: Dr. Hermann Völlinger: GitHub to the lecture “Machine Learning:
Concepts & Algorithms”; see: https://github.com/HVoellinger/Lecture-Notes-to-ML-WS2020
Examine this direction of the (SST = SSE + SSR) condition. We could assume that the
condition "SST = SSR + SSE" (*) also implies that y(x) is an optimal regression line.
In many examples this is true (see Homework H5.1, Part a).
Task: Decide between the two possibilities a) and b) (2 Persons, one for each step):
a. The statement is true, so you have to prove it. I.e. show that the vanishing of the
“mixed term” of the equation (sum_i [(f(x_i) - y_i) * (f(x_i) - Mean(y))] = 0) implies an
optimal sLR line.
b. To prove that it is wrong, it is enough to construct a counterexample: define a
training set TS = {observation points} and an sLR line which satisfies condition (*)
but is not an optimal sLR line.
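The condition can be probed numerically. A minimal sketch with placeholder points (not the lecture's data); the gap SST - (SSE + SSR) equals twice the “mixed term”, so it vanishes exactly when the mixed term does:

```python
def decomposition_gap(xs, ys, a, b):
    """Return SST - (SSE + SSR) for the line f(x) = a + b*x.

    SSE = sum (y_i - f(x_i))², SSR = sum (f(x_i) - mean(y))²,
    SST = sum (y_i - mean(y))². The gap equals
    2 * sum (y_i - f(x_i)) * (f(x_i) - mean(y)), the "mixed term".
    """
    mean_y = sum(ys) / len(ys)
    f = [a + b * x for x in xs]
    sse = sum((y - fi) ** 2 for y, fi in zip(ys, f))
    ssr = sum((fi - mean_y) ** 2 for fi in f)
    sst = sum((y - mean_y) ** 2 for y in ys)
    return sst - (sse + ssr)

# Placeholder points; their optimal least-squares line is y = 1.5 + 0.6*x:
xs, ys = [1, 2, 3, 4], [2, 3, 3, 4]
print(round(decomposition_gap(xs, ys, 1.5, 0.6), 10))  # ~0.0 for the optimal line
print(round(decomposition_gap(xs, ys, 1.6, 0.5), 10))  # 0.32: condition (*) fails here
```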
Groupwork (2 Persons): Evaluate and explain in more detail the CNN in “UC2:
Fraunhofer + enercast: Power forecasts for renewable energy with CNN”:
https://www.enercast.de/wp-content/uploads/2018/04/whitepaper-prognosen-wind-solar-kuenstliche-intelligenz-neuronale-netze_110418_EN.pdf
Solutions:
…..
Solutions:
……
Groupwork (2 Persons) - read and create a summary of the main results of the article
“Mastering the game of Go with deep neural networks and tree search”
https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf
Solutions:
…..
Groupwork (2 Persons): Read and summarize the main results of the article about
BERT. See Ref. [BERT]: Jacob Devlin and others: “BERT: Pre-training of Deep
Bidirectional Transformers for Language Understanding”; Google (USA); 2019.
*********** placeholder********************
Solutions:
….