
Intel Educational Mindshare Initiative in Artificial Intelligence

AI and Predictive Analytics in Data-Center Environments

http://dcai.bsc.es

Exercises Session 2 – Machine Learning


In this part we will work through some exercises illustrating the concepts of Data Science and Machine Learning seen in the second session, Data and Machine Learning.

Important: All the following exercises are done in Python, a scripting platform and language commonly found on computing servers and in data-centers. The same experiments can also be repeated on macOS and Windows (using the latest WSL), with possible limitations; this does not mean there are no other ways to achieve the same results on those systems.

Exercise 1 – Data Science


In this part we will see some examples of asking questions of the data, and attempt to identify methods to obtain answers through analytics and machine learning.

Asking the correct Questions to the Data

Let's classify the following questions as "descriptive", "exploratory", "inference", "prediction" or "causal", and find methods to apply to each:

1. We have a dataset from an airline, with the registry of passengers of a plane. The dataset contains the personal data of the passengers, with their age and nationality. We are asked to retrieve some information for commercial segmentation: which age ranges exist in the data? Which is the most frequent age range? And which is the most common country among the passengers?

 This is a descriptive question. We are asked to retrieve basic information from the features of our data, in order to understand the characteristics of the sampled population. We can solve this through analytics by looking at feature ranges and statistics like the average, the mode, the max and the min. A summary of the dataset will provide much of that information.
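
As a quick sketch of how such a summary could be obtained, assuming the registry were loaded into a Pandas dataframe with hypothetical age and country columns:

import pandas as pd

# Hypothetical passenger registry; the column names and values are made up
passengers = pd.DataFrame({
    "age": [34, 29, 45, 62, 29, 18, 45, 51],
    "country": ["ES", "FR", "ES", "DE", "ES", "IT", "FR", "ES"],
})

# Basic statistics of the age feature: min, max, mean, quartiles
print(passengers["age"].describe())

# Group ages into ranges and find the most frequent one
age_ranges = pd.cut(passengers["age"], bins=[0, 18, 30, 45, 60, 100])
print(age_ranges.value_counts().idxmax())

# Most usual country among the passengers
print(passengers["country"].mode()[0])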

2. We have a dataset from a library, with the registry of book loans, containing data on the users, the borrowed books, the loan dates and the book categories. We are asked whether there is any preference for book categories across different users and ages.
 We have here an exploratory question. We are asked whether there are relations among the features we are presented with. We need to cross ages and categories to discover, for each age, which are the most popular categories (computing the mode, grouping by ages or ranges of ages). Here we could even do some clustering of ages according to categories, to discover the best partitioning into age ranges and obtain the best representation of preferences.
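
A rough sketch of that cross-tabulation, again with made-up column names and values:

import pandas as pd

# Hypothetical lending registry; columns and values are made up
loans = pd.DataFrame({
    "age": [12, 15, 34, 36, 65, 70, 33, 14],
    "category": ["comics", "comics", "novel", "essay", "history",
                 "history", "novel", "comics"],
})

# Group ages into ranges and compute the most frequent category per range
loans["age_range"] = pd.cut(loans["age"], bins=[0, 18, 40, 100])
preference = loans.groupby("age_range", observed=True)["category"] \
                  .agg(lambda s: s.mode()[0])
print(preference)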

3. We have a dataset of web site access logs, with the registry of accesses, including the evidence of an attack coming from a specific country of origin, alongside the registries of legitimate users. We are asked whether we can determine if, in the future, the same pattern of attack that distinguishes the attackers from the legitimate users will still be valid to separate them, and whether that pattern repeats in attacks from other countries.

 Here we have an inference question, as we are asked whether the patterns we are finding are applicable to future or different data. We can look for the same patterns across datasets. A naïve method could be to take the machine learning model that captured the pattern on the target data, attempt to make inferences with it on different data, and then check the error. If the new data does not fit the model, it is possible that the data is different (different patterns, behaviors, relations...); if it does fit, it is possible the pattern is more general than just the training data.
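
A minimal sketch of that naïve check, using purely synthetic stand-ins for the two log extracts (in practice these would be the feature matrices built from the real access logs):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-ins for two log extracts turned into feature matrices
X_a, y_a = make_classification(n_samples=500, n_features=8, random_state=0)
X_b, y_b = make_classification(n_samples=500, n_features=8, random_state=1)

# Fit a model on the dataset where the attack pattern was observed...
model = LogisticRegression(max_iter=1000).fit(X_a, y_a)

# ...then apply it, unchanged, to the other dataset and compare the error
print("Accuracy on the original data:", accuracy_score(y_a, model.predict(X_a)))
print("Accuracy on the new data:     ", accuracy_score(y_b, model.predict(X_b)))
# A large drop on the new data suggests the pattern does not transfer;
# a similar score suggests it is more general than just the training data.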

4. We have a dataset from a streaming music service, with the registry of users and their music preferences. We want to know whether, given a random user and their listening history, this user can be classified into one specific category or several of them.

 This is a prediction question. We are asked to create a method that learns from user preferences and separates users by categories, so that labeled users can (most probably) be offered recommendations. If we have the categories (the users are labeled) we can apply machine learning classification methods to predict which labels a user has. If we don't have those labels a priori, we can apply clustering to find new labels for the users, then discover which classes of user exist, and classify future users.
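
Sketched with synthetic listening profiles (the data, the features and the choice of 4 clusters are all assumptions made for illustration):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Stand-in for per-user listening profiles (rows: users, columns: genre play counts)
profiles = rng.poisson(lam=3.0, size=(200, 6))

# No a-priori labels: discover user categories with clustering
user_labels = KMeans(n_clusters=4, random_state=0).fit_predict(profiles)

# Once labels exist, a classifier can assign a category to a new user
clf = KNeighborsClassifier().fit(profiles, user_labels)
new_user = rng.poisson(lam=3.0, size=(1, 6))
print("Predicted category for the new user:", clf.predict(new_user)[0])
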
Exercise 2 – Supervised Learning

Classification with the Iris Dataset

With these exercises we will take a first look at how to create, train and evaluate a Machine Learning model with Python and Scikit-learn (abbreviated Sklearn). Sklearn is a free-software machine learning library for the Python programming language featuring various classification, regression and clustering algorithms, and it is widely used both in industry and academia.

Loading the data


For this first example we will use the Iris dataset that has been already introduced in the course. Sklearn
conveniently includes a pre-processed version of the Iris dataset in its datasets package, saving us the
hassle of downloading and wrangling with the data. We will start by importing the package
sklearn.datasets, which contains the Iris data:

from sklearn import datasets

Now we can load the Iris data using the method load_iris:

iris_ds = datasets.load_iris()

To avoid having to type iris_ds. to access the features and labels of the dataset we will save the
dataset's features (iris_ds.data) and labels (iris_ds.target) into variables called X and y
respectively:

X = iris_ds.data
y = iris_ds.target

One of the advantages of the Sklearn datasets package is that it comes with "loading" functions for its datasets. Unfortunately, other datasets and files often do not have loaders, and we'll have to load them by reading a CSV file and then selecting the data corresponding to X and y manually.

Training and validation datasets


A crucial step in the model creation process is being able to test the model on some data that has not been used to train it. Because we only have one dataset and we are not able to get more data on the Iris flowers, we will need to set aside some of the data we have in order to use it for validation purposes. We can do this using Sklearn's train_test_split, which will split the dataset in two with a given train-test size ratio:

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3,
random_state = 0)

You can see that we have also passed a random_state parameter to train_test_split. random_state sets a seed for the Random Number Generator (RNG) used to perform the split, and it allows us to obtain the same results at every run of this example, ensuring repeatability.
Building the classification model
Once our training and validation datasets are ready, we can move on to creating our model. For this first classification example we will use a Logistic Regression model:
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression( solver = 'lbfgs', multi_class = 'auto',
random_state = 0 )

Again, we have passed a random_state. You will see this quite often when working with Sklearn. We have also passed two other parameters, solver and multi_class. We are actually passing their default values, but we need to do so to silence some warnings (fixed in later versions of Sklearn).

Right now our logistic regression model is like an empty box, as we have not trained it with any data.
Sklearn uses a common API for all its models, with fit used to train the model on training data and
predict used to make predictions on data. We can then train the model using fit:

log_reg.fit( X_train, y_train );

Evaluating the model


Now that the model has been trained, we can obtain a first measure of the quality of the fit with the score method. For classifiers such as logistic regression, score returns the mean accuracy, i.e. the fraction of correctly classified samples:

train_score = log_reg.score( X_train, y_train )

print("Training accuracy: {}".format(train_score))

Training accuracy: 0.9809523809523809

That's a fairly good score! According to it, our model correctly classifies about 98% of the training samples. The Iris dataset is a quite simple dataset, so this result should not be surprising.

Let's now make predictions on the validation data using predict:

lr_test_prediction = log_reg.predict(X_test)
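
Before looking at the confusion matrix, a quick additional check we could make (reusing the variables above) is the accuracy on the held-out data:

from sklearn.metrics import accuracy_score

# Fraction of validation samples whose predicted class matches the true label
test_accuracy = accuracy_score(y_test, lr_test_prediction)
print("Test accuracy: {}".format(test_accuracy))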

Confusion Matrix
We can generate the prediction's confusion matrix with confusion_matrix:

from sklearn.metrics import confusion_matrix


print("Logistic Regression confusion
matrix:\n{}".format(confusion_matrix(y_test, lr_test_prediction)))

Logistic Regression confusion matrix:


[[16 0 0]
[ 0 17 1]
[ 0 0 11]]

Although the confusion matrix above has all the information we need, its format is not very appealing.
We can generate a better visualization using a heatmap from the Seaborn package (plot in Figure 2.1):

import matplotlib.pyplot as plt


import seaborn as sns
plt.figure(figsize = (10,8))
sns.heatmap(confusion_matrix(y_test, lr_test_prediction), annot=True);

Visualizing error
To be able to visualize how the test and training errors evolve as the training process progresses it would
be interesting to plot them with regards to the numbers of samples used in training. This can be achieved
with the learning_curve method (plot in Figure 2.2):

# Courtesy of Sklearn documentation

import numpy as np
from sklearn.model_selection import learning_curve, ShuffleSplit

# Cross validation with 100 iterations to get smoother mean test and train
# score curves, each time with 20% data randomly selected as a validation set.
cv = ShuffleSplit(n_splits=100, test_size=0.2, random_state=0)

estimator = LogisticRegression(solver='lbfgs', multi_class='auto',
                               random_state=0)

plt.figure(figsize=(8,6))
plt.ylim(0.7, 1.01)
plt.xlabel("Training examples")
plt.ylabel("Score")

train_sizes, train_scores, test_scores = learning_curve( estimator, X, y,
    cv=cv, n_jobs=-1, train_sizes=np.linspace(.1, 1.0, 10) )
plt.grid()

plt.plot( train_sizes, np.mean(train_scores, axis=1), 'o-', color="r",
          label="Training score" )
plt.plot( train_sizes, np.mean(test_scores, axis=1), 'o-', color="g",
          label="Test score" )
plt.legend( loc="best" );

Figure 2.1 Figure 2.2


Regression with the Boston Housing Dataset

In this exercise we will take a more in-depth view of the process of building a supervised learning model,
particularly a Linear Regression. We will start by importing the packages we will need for plotting and data
manipulation:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets

Loading and inspecting the data


For this example, we will use the well-known Boston Housing dataset, which includes the median prices of houses in different neighborhoods of the Boston area. As with the Iris dataset, Sklearn offers a pre-processed version in its datasets package, so we can directly start working with it (note that load_boston has been deprecated and removed in recent Sklearn versions, so this example requires an older version). In the same fashion as with the Iris, we can load it with the load_boston function:

boston_ds = datasets.load_boston()

Sklearn also packages a description of the dataset within the structure returned by load_boston, accessible through the DESCR attribute:

print(boston_ds.DESCR)

[HERE THE DESCRIPTION OF THE DATASET, ALSO REFERENCES TO AUTHORS]

We can access the feature names through the feature_names attribute:

boston_ds.feature_names

array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
       'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7')

All the data stored in the structure returned by load_boston is in Numpy's array format, with the features, labels and feature names each stored in a separate array. This format can be somewhat inconvenient when manipulating and exploring data, so we will convert the dataset into Pandas' dataframe format. Pandas is Python's library for data manipulation and analysis, which lets us store data in a relational, database-table style with its DataFrames.

We will first create a dataframe with the features and their respective names (recall that the feature data
is stored in the data attribute and the list of feature names is stored in feature_names):

boston_df_raw = pd.DataFrame(data = boston_ds.data,
                             columns = boston_ds.feature_names)

Once we have created the dataframe with the features, we can add the MEDV column containing the
target values:

boston_df_raw['MEDV'] = boston_ds.target

You can check the number of rows and columns of a dataframe with the shape attribute:

boston_df_raw.shape

(506, 14)

You can also visualize the first rows of the dataframe with nice formatting using head. Passing a number to head will display that many rows; if no argument is passed, it defaults to displaying 5 rows:

boston_df_raw.head()

CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT MEDV
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2

Some preliminary analysis and data preparation


In this example we have a significant number of features, and most likely not all of them will be directly related to MEDV. Because we want to avoid bloating our linear regression model with features that do not provide much (if any) information on the target, it is advisable to first explore the data and find which features are most correlated with MEDV.
A particularly convenient way to do this is through a Correlation Matrix, which displays the correlations between the indicated variables. In order to visualize this effectively we can use a heatmap, as we did with the confusion matrix in one of the previous examples (plot in Figure 2.3):

plt.figure(figsize=(14, 12))
sns.heatmap(boston_df_raw.corr(), annot=True);

Cells corresponding to highly correlated variables take colors on the far ends of the scale. By analyzing the values on the heatmap above we can see that the features MEDV is most correlated with are RM (0.7) and LSTAT (-0.74), so we will use them to make our predictions.

Another frequent way of visualizing data is through a scatterplot matrix, which displays how the different variables relate to each other (Seaborn's version also includes the histogram of each variable on the diagonal). As we have already identified that MEDV is most related to RM and LSTAT, we will only display the scatter matrix for those three variables (plot in Figure 2.4):

sns.pairplot(boston_df_raw[['MEDV', 'RM', 'LSTAT']]);


Figure 2.3

Observing the scatterplots of MEDV against the other two variables we can see that the value 50
appears for many different values of the other variables, and it does not follow the overall trend of the
rest of the data. Because MEDV ranges from 0 to 50 these values might not correspond to an actual
median price but rather be the result of truncating the real value to the maximum of the scale (50). To
avoid considering modified values we will remove them from the data:

boston_df = boston_df_raw[boston_df_raw['MEDV'] < 50]

If we check the number of rows in the dataset with shape we can see we have deleted 16 rows:

boston_df.shape

(490, 14)

Taking a further look at the data, we can see that the relation between MEDV and LSTAT is definitely not linear. Observing LSTAT's histogram, there is a clear positive skewness, which suggests that we might be able to address this non-linearity by applying a logarithmic transformation. Let's create a new dataframe with the transformed LSTAT:

df = pd.DataFrame(boston_df[['MEDV', 'RM']])
df['logLSTAT'] = np.log(boston_df['LSTAT']);

And now let's plot its scatter matrix (plot in Figure 2.5):

sns.pairplot(df);

Figure 2.4 Figure 2.5

Now the relationship between MEDV and logLSTAT definitely looks more like a linear one.

Training and validation datasets


As we discussed in the previous exercises, we want to split our data into training and validation instances
so we can test the linear regression model with data that has not been used to train it. For that purpose,
we use the train_test_split function, this time only requiring the dataframe containing both the
features and label data (one of the advantages of working with dataframes):

from sklearn.model_selection import train_test_split


train_df, test_df = train_test_split(df, test_size=0.3, random_state=0)

Now that the data is ready we can move on to creating and training the model. To avoid writing the feature
names multiple times we can store them in a variable:

feats = ['RM', 'logLSTAT']


labels = 'MEDV'

Building the linear regression model

from sklearn.linear_model import LinearRegression


lin_reg = LinearRegression().fit(train_df[feats].values,
                                 train_df[labels].values.reshape(-1,1))

The values.reshape(-1,1) call is used to reshape the Numpy array into the column shape required by Sklearn. You can see that we have also applied the fit method directly, rather than creating the model and training it in two separate steps.
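
As a minimal illustration of what that reshape does (with made-up values):

import numpy as np

y_flat = np.array([24.0, 21.6, 34.7])   # shape (3,): a plain 1-D array
y_col = y_flat.reshape(-1, 1)            # shape (3, 1): one column, rows inferred
print(y_flat.shape, y_col.shape)         # prints (3,) (3, 1)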

Evaluating the model


We can check the coefficients and intercept of our freshly trained model:

print("Coefs: {}".format(lin_reg.coef_))
print("Intercept: {}".format(lin_reg.intercept_ ))

Coefs: [[ 2.9001909 -8.69545777]]


Intercept: [24.44541119]

The coefficients alone don't provide much insight on the quality of the model, so we can calculate the R^2
score to check how much variance on the data our model is able to explain:

train_score = lin_reg.score(train_df[feats].values,
train_df[labels].values.reshape(-1,1))

print("R2 Score: {}".format(train_score))


R2 Score: 0.7149388632460918

Not a great score, but take into account that we are making predictions using only two variables. Let's now calculate the Root Mean Square Error:

from sklearn.metrics import mean_squared_error


train_prediction = lin_reg.predict(train_df[feats].values)
rmse = np.sqrt(mean_squared_error(train_df[labels].values.reshape(-1,1),
                                  train_prediction))
print("RMSE on training data: {}".format(rmse))

RMSE on training data: 3.978592909011344

The results above are on training data; at this point in the course you might have the intuition that testing on training data alone does not give much insight into the quality of the model. To get an actual measure of the quality of our model we need to test it on the validation data. For that purpose, we will predict MEDV for each of the observations in the validation dataset and calculate the Root Mean Square Error (RMSE) with respect to the real values:

test_prediction = lin_reg.predict(test_df[feats].values)
rmse = np.sqrt(mean_squared_error(test_df[labels].values.reshape(-1,1),
test_prediction))
print("RMSE on test data: {}".format(rmse))

RMSE on test data: 4.734323703976905

As expected, the RMSE on the test data is slightly above the one on the training data.
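
For completeness, the R^2 score on the validation data can be obtained the same way as on the training data; a quick sketch reusing the variables above:

# R^2 on the held-out data, to compare against the training score
test_score = lin_reg.score(test_df[feats].values,
                           test_df[labels].values.reshape(-1, 1))
print("R2 Score on test data: {}".format(test_score))
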
Exercise 3 – Unsupervised Learning

Performing some Clustering


Now, we'll take a look at unsupervised learning through the quite simple yet powerful K-means algorithm.
We will start by importing the required packages as usual:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

And setting a seed for repeatability:

seed=25740565

For illustration purposes we will generate our own data for this example so we can have full control over
it. Let's start by defining the cluster centers:

centers = np.array(((0,0), (6,0), (6, 6)))

In order to generate data for clustering purposes, Sklearn provides the make_blobs function, which allows us to generate isotropic Gaussian data points or 'blobs' that can be used for clustering:

from sklearn.datasets import make_blobs  # samples_generator was removed in newer Sklearn versions


X, y = make_blobs(n_samples = 300, centers = centers, n_features = 2,
cluster_std = [0.5, 1.5, 1], random_state = seed)

We have told make_blobs to generate 300 2-dimensional points spread over 3 blobs, each one centered
at one of the points we defined in centers. As usual, it is a good idea to plot the data (whenever
dimensionality allows) to see what it looks like (plot in Figure 3.1):

fig, ax = plt.subplots(figsize=(6,6))
ax.scatter(X.T[0], X.T[1], s=3);

Adding the cluster centers (plot in Figure 3.2):

ax.scatter(x=centers.T[0], y=centers.T[1], s=80)


fig

And giving it some color so we can better differentiate the clusters (plot in Figure 3.3):

fig, ax = plt.subplots(figsize=(6,6))
ax.scatter(X.T[0], X.T[1], s=3, c=y);
ax.scatter(x=centers.T[0], y=centers.T[1], s=80, c=[0,1,2]);
Figure 3.1 Figure 3.2 Figure 3.3

As usual, let's split the data into training and validation datasets:

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3,
random_state = seed)

We can now build our clustering model and train it using fit. K-means requires the number of clusters K as a parameter, so we will use our privileged knowledge and set K = 3:

from sklearn.cluster import KMeans


kmeans = KMeans(n_clusters=3, random_state=seed).fit(X_train)

We can check the labels that the algorithm has assigned with the labels_ property:

kmeans.labels_

array([1, 2, 0, 1, 1, 2, 2, 0, 2, 0, 2, 0, 2, 2, 1, 1, 0, 0, 2, 1, 0, 1,
2, 2, 2, 0, 0, 2, 1, 1, 0, 2, 2, 0, 0, 0, 2, 0, 1, 2, 0, 1, 2, 0,
2, 1, 0, 0, 0, 2, 0, 2, 2, 2, 2, 0, 2, 1, 1, 0, 1, 0, 0, 1, 2, 1,
1, 2, 1, 2, 0, 0, 2, 1, 2, 0, 0, 0, 2, 1, 2, 1, 0, 1, 2, 0, 2, 2,
1, 1, 2, 1, 0, 1, 0, 1, 2, 1, 0, 0, 1, 2, 0, 0, 0, 1, 0, 0, 1, 2,
2, 2, 0, 0, 1, 2, 0, 1, 2, 1, 2, 0, 0, 0, 2, 0, 0, 0, 2, 0, 2, 2,
0, 1, 2, 2, 1, 2, 0, 2, 1, 2, 2, 0, 1, 0, 2, 2, 1, 0, 1, 2, 1, 1,
1, 2, 1, 1, 2, 2, 1, 2, 0, 1, 0, 1, 2, 2, 0, 2, 0, 1, 0, 1, 1, 1,
1, 2, 2, 1, 1, 0, 1, 0, 1, 0, 0, 2, 0, 1, 0, 0, 0, 2, 2, 0, 0, 1,
2, 1, 2, 1, 0, 1, 2, 2, 1, 1, 2, 0], dtype=int32)

And the cluster centers obtained by k-means with cluster_centers_:

kmeans.cluster_centers_.T

array([[ 6.14166207,  5.74371096, -0.01398996],
       [ 5.93004311, -0.24699213,  0.01725255]])

If we compare the clustering obtained by k-means with the original clusters, we can see it did a pretty good job (don't mind the cluster colors not matching from one plot to another; k-means re-assigns label values to each cluster when done training) (plot in Figure 3.4):
fig, ax = plt.subplots(1, 2, figsize = (14,6))
ax[0].scatter(X_train.T[0], X_train.T[1], s = 3, c = y_train);
ax[0].scatter(x = centers.T[0], y = centers.T[1], s = 80, c = [0,1,2]);
ax[0].set_title('Actual data')
ax[1].scatter(X_train.T[0], X_train.T[1], s = 3, c = kmeans.labels_)
ax[1].scatter(x = kmeans.cluster_centers_.T[0], y =
kmeans.cluster_centers_.T[1], s = 80, c = [0,1,2])
ax[1].set_title('k-means clustering');

Figure 3.4

Overlaying both plots, we can see the cluster centers have a pretty good match (actual centers in red, plot
in Figure 3.5):

fig, ax = plt.subplots(figsize = (6,6))


ax.scatter(X_train.T[0], X_train.T[1], s = 3, c = kmeans.labels_);
ax.scatter(x = kmeans.cluster_centers_.T[0], y =
kmeans.cluster_centers_.T[1], s = 80, c = [0,1,2])
ax.scatter(x = centers.T[0], y = centers.T[1], s = 80, c = ['red']);

Another way to evaluate the quality of a clustering is the homogeneity_score, which measures to what extent the points in each cluster belong to a single class:

from sklearn.metrics import homogeneity_score


homogeneity_score(y_train, kmeans.labels_)

0.9603651641018862

The main drawback of homogeneity is that it only takes into account whether the points within the same cluster belong to the same class, so if all the points in two different clusters belong to the same class the homogeneity score will still be high, even though there should be only one cluster.

One of the main issues with K-means is that you need to specify the number of clusters you want the algorithm to look for. As you normally do not know the number of clusters beforehand, tuning this hyperparameter can be tricky, especially for large values of K. For example, if we look for only two clusters in our dataset (plot in Figure 3.6):
kmeans2 = KMeans(n_clusters = 2, random_state = seed).fit(X_train)
fig, ax = plt.subplots(figsize = (6,6))
ax.scatter(X_train.T[0], X_train.T[1], s = 3, c = kmeans2.labels_);
ax.scatter(x = kmeans2.cluster_centers_.T[0], y =
kmeans2.cluster_centers_.T[1], s = 80, c = [0,1])
ax.scatter(x = centers.T[0], y = centers.T[1], s = 80, c = ['red']);

We can see that k-means has aggregated two clusters into a single one, finding the two clusters we asked for (original cluster centers in red). Checking the homogeneity score, we can see it's significantly lower:

homogeneity_score(y_train, kmeans2.labels_)
0.47267561432978034

Alternatively, if we say K = 5 (plot in Figure 3.7):

kmeans5 = KMeans(n_clusters=5, random_state=seed).fit(X_train)


fig, ax = plt.subplots(figsize=(6,6))
ax.scatter(X_train.T[0], X_train.T[1], s=3, c=kmeans5.labels_);
ax.scatter(x=kmeans5.cluster_centers_.T[0], y=kmeans5.cluster_centers_.T[1],
s=80, c=[0,1,2,3,4])
ax.scatter(x=centers.T[0], y=centers.T[1], s=80, c=['red']);

Figure 3.5 Figure 3.6 Figure 3.7

We can now see that it has found two extra clusters that were not there in the first place. Calculating the homogeneity score:

homogeneity_score(y_train, kmeans5.labels_)
0.9999999999999998

As we mentioned before, because the score only checks that the points within each cluster belong to the same class, it comes out as effectively one. Taking it to an extreme, if we made as many clusters as data points the homogeneity score would also be 1, so you need to put some thought into using and interpreting this score.
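
When the true labels are not available (the usual case in clustering), a common heuristic for picking K is to run K-means for several values and look at the inertia (within-cluster sum of squares) or the silhouette score. A sketch, reusing X_train and seed from above:

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

for k in range(2, 7):
    km = KMeans(n_clusters=k, random_state=seed).fit(X_train)
    print("k={}: inertia={:.1f}, silhouette={:.3f}".format(
        k, km.inertia_, silhouette_score(X_train, km.labels_)))
# The "elbow" where the inertia stops dropping sharply, or the K with the
# highest silhouette, is usually a reasonable choice (here we would expect 3).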

A case of Reinforcement Learning


Now we will see an example of reinforcement learning, a learning paradigm which does not learn from a labeled dataset but rather from a series of actions and their corresponding rewards. Let's start with some theory:

Let Temporal Differencing Q-Learning be the update formula:

Q(s,a) ← Q(s,a) + α · (R(s) + γ · max_{a'} Q(s',a') − Q(s,a))

...also known as SARSA when the next state and action are selected from our current Q(s,a) look-up table.

 Q(s,a): the score of the current state given the current action
 α: the learning rate
 R(s): the reward for reaching state s
 γ: the discount factor for the expected score of the next state
 max_{a'} Q(s',a'): the best expected score over the available actions in the next state s'

We also consider the Utility Function U(s) as

U(s) = max_a Q(s,a)

so at each step we want to maximize the utility of our state by choosing the best action each time:

Q(s,a) ← Q(s,a) + α · (R(s) + γ · U(s') − Q(s,a))

When s is a final state, we consider the utility to be the final reward obtained.
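
As a tiny worked example of a single update (with arbitrary values): taking α = 0.5, γ = 1.0, a current score Q(s,a) = 0.5, a reward R(s) = 0 and a best expected next score of 1, the update gives Q(s,a) ← 0.5 + 0.5 · (0 + 1.0 · 1 − 0.5) = 0.75. The same step in Python:

# One Temporal Differencing update with arbitrary example values
alpha, gamma = 0.5, 1.0
q_sa, reward, best_next_q = 0.5, 0.0, 1.0
q_sa = q_sa + alpha * (reward + gamma * best_next_q - q_sa)
print(q_sa)  # 0.75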

Implementation of the Q-Learning function


Before implementing the qlearn function itself we need to implement the utility function which
exhaustively finds the best score, action and state given a set of valid actions and current scoring:

import numpy as np
import numpy.random as rd
import math

# Auxiliary function to get the index of a given action
def find_action_index(actions, a):
    return [x == a[0] and y == a[1] for x, y in actions].index(True)

# Utility function // Fitness function
def utility_function(s, valid_actions, q_scoring):
    best_q = -math.inf
    best_a = -math.inf
    best_s = s

    # Shuffle the actions so that ties are broken at random
    shuffled_actions = rd.permutation(valid_actions)
    for a_prime in shuffled_actions:
        # Tentative state after applying the action
        s_prime = a_prime + s
        # Check score for Q(state', action')
        q_prime = q_scoring[s_prime[0], s_prime[1],
                            find_action_index(actions, a_prime)]
        if q_prime > best_q:
            best_q = q_prime
            best_a = a_prime
            best_s = s_prime
    return {"q": best_q, "a": best_a, "s": best_s}

We also need a viability function that returns the list of possible actions from a given state:

# Viability function
def viability_function(actions, s):
    valid_actions = []
    for a in actions:
        a_prime = a
        # Keep only the actions that stay inside the reward grid
        # (uses the global rewards array to know the space dimensions)
        if s[0] + a_prime[0] >= 0 \
                and s[1] + a_prime[1] >= 0 \
                and s[0] + a_prime[0] < rewards.shape[0] \
                and s[1] + a_prime[1] < rewards.shape[1]:
            valid_actions.append(a_prime)
    return np.array(valid_actions)

Now we can create the qlearn function:

# qlearn function
def qlearn(actions, rewards, s_initial, alpha, gamma, max_iters,
           q_scoring=None):
    # Initialize scoring, state and action variables
    if q_scoring is None:
        q_scoring = np.full([rewards.shape[0], rewards.shape[1],
                             len(actions)], 0.5)
    s = s_initial
    a = actions[0]

    # Initialize counters for breaking the loop
    convergence_count = 0
    iteration_count = 0
    prev_status = (-1, -1)

    while iteration_count < max_iters:
        # Get valid actions
        valid_actions = viability_function(actions, s)

        # Get maximum Q in current state s
        best = utility_function(s, valid_actions, q_scoring)

        # Check convergence: count consecutive iterations in which the best
        # next state equals the current or the previous state, and stop
        # after a small threshold
        convergence_count = convergence_count + 1 \
            if (all((s == best['s']).flatten())
                or all(prev_status == best['s'])) else 0
        if convergence_count > 5: break

        # Solve Reward
        r = rewards[s[0], s[1]]

        # Update State-Action Table
        action_index = find_action_index(actions, a)
        current_score = q_scoring[s[0], s[1], action_index]
        q_scoring[s[0], s[1], action_index] = current_score + alpha * (r +
            gamma * best['q'] - current_score)

        # Change the status
        prev_status = s
        s = best['s']
        a = best['a']
        iteration_count = iteration_count + 1

        print("Iteration: {}".format(iteration_count))
        print("  Best Position: ({},{})".format(*best['s']))
        print("  Best Action: ({},{})".format(*best['a']))
        print("  Best Value: {}".format(best['q']))
        print()

    return {"s": s, "a": a, "r": rewards[s[0], s[1]],
            "it": iteration_count, "q": q_scoring}

Example case
Consider a two-dimensional space of 4 x 4 cells, with the following rewards for being in each cell:

      [,0]   [,1]   [,2]   [,3]
[0,]  +0     +0     +0     +0
[1,]  +0     +0     +0     +0
[2,]  +0     +0    -0.04   +0
[3,]  +0     +0    +1     -0.1

There is a goal point, position [3,2], with a high reward for being on it, and no reward or a negative reward for leaving it. Our actions are the king movements in a chess game, plus the No Operation (NOP) movement. Adding the NOP movement allows us to remain in the best position once found, and then exhaust the convergence steps until the loop breaks, finishing the game. The drawback of the NOP is that we could get stuck in a local sub-optimum, whereas forcing the agent to always move could let us escape from such positions.

Problem details:
 The space has dimensions 4 x 4
 The goal is to reach [3,2] (we don't say which cell is the goal; rather, we reward it better)
 The start point is chosen at random
 The reward depends only on the current position

# Seed the RNG for repeatability
rd.seed(2019)

# Generate the king movements by iterating all possible coordinates
# between -1 and 1 (the (0,0) pair is the NOP movement)
actions = np.array([(i,j) for i in range(-1,2) for j in range(-1,2)])

# Generate the reward map
rewards = np.zeros((4,4))
rewards[2,2] = -0.04
rewards[3,2] = 1
rewards[3,3] = -0.1

# Choose an initial state uniformly at random from all possible positions
initial_state = (rd.randint(0,4), rd.randint(0,4))

alpha = 0.5
gamma = 1.0
max_iters = 50

# Run the function
final_result = qlearn(actions, rewards, initial_state, alpha=alpha,
                      gamma=gamma, max_iters=max_iters)
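
To connect the result back to the theory, a small sketch of how the returned Q-table can be turned into the utility U(s) = max_a Q(s,a) of each cell (reusing final_result from above; the exact values will depend on the random start):

# Utility of each cell: the best score over the 9 possible actions
utilities = final_result["q"].max(axis=2)
print(np.round(utilities, 2))
# The highest utilities should concentrate around the rewarded cell [3,2]
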
Exercise 4 – Neural Networks

Simple Multi-Layer Perceptron using Sklearn


We will now see how to implement a simple MultiLayer Perceptron (MLP) using Sklearn, before applying
distribution technologies.

Generating the Data


For this example, we will generate a simple dataset to train our first full-fledged neural network. Let's start by importing the necessary packages for number manipulation and visualization, and setting up a seed for the RNG:

import numpy as np
import numpy.random as rd
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set up a seed for the RNG


seed = 25740565
rd.seed(seed)

We would like to see how the neural network performs on non-linearly separable data, so we will generate two groups of points in a radial layout:

# Define data parameters


num_features = 2
num_classes = 2
num_samples = 2000

mean = 0.0
var = 0.26
threshold = 0.25

x = rd.normal(mean, var, num_samples)


y = rd.normal(mean, var, num_samples)
labels = np.array([np.linalg.norm(s) > threshold for s in zip(x,y)])
colors = ['r', 'b']

features = np.vstack((x,y)).T

This data is obviously non-separable by a logistic regression or linear models (without transforming the
feature vectors x and y). We can plot it to see that it is not linearly separable (plot in Figure 4.1):

plt.figure(figsize=(6,6))
plt.scatter(x,y, c=[colors[int(l)] for l in labels], s=0.2);
plt.xlim(-1,1); plt.ylim(-1,1);

Now we create a training dataset and a test dataset for validation purposes:

from sklearn.model_selection import train_test_split


features_train, features_test, labels_train, labels_test = \
    train_test_split(features, labels, random_state=seed)

Now we train the model, using the fit function to fit it to the data. We select a MultiLayer Perceptron with a single hidden layer of 64 neurons:

from sklearn.neural_network import MLPClassifier


mlp = MLPClassifier(hidden_layer_sizes=(64))
mlp.fit(features_train, labels_train)

Time to test! We can use the predict function to pass new data (here the test dataset) through the model, and then obtain the confusion matrix. As we have 2 classes, the confusion matrix will be a 2 x 2 table of "Real vs Predicted" values. We want all the big numbers to be on the diagonal ("Real Red & Predicted Red", "Real Blue & Predicted Blue"):

from sklearn.metrics import classification_report,confusion_matrix


predictions = mlp.predict(features_test)
print(confusion_matrix(labels_test,predictions))

[[191 11]
[ 6 292]]

As we see, the network learns pretty well, as we have around 97% of values correctly predicted. We can
also print the summary of the classification, and see the accuracy and precision values:

print(classification_report(labels_test,predictions))

              precision    recall  f1-score   support

       False       0.97      0.95      0.96       202
        True       0.96      0.98      0.97       298

    accuracy                           0.97       500
   macro avg       0.97      0.96      0.96       500
weighted avg       0.97      0.97      0.97       500

We can finish by plotting how the test points are classified as "red" or "blue", and see that they follow the same pattern as the shape we used to design the dataset (plot in Figure 4.2):

plt.figure(figsize=(6,6))
plt.scatter(features_test.T[0], features_test.T[1], c=[colors[int(l)] for l
in predictions], s=0.2);
plt.xlim(-1,1); plt.ylim(-1,1);
Figure 4.1 Figure 4.2

Reminder
All these exercises are meant to show the basic concepts of these technologies. To gain a better understanding and discover all the capabilities of these methods and applications, check the reference manuals and play with new examples.
