Python Scikit Learn PDF
3 Appendix
3.1 Basic Python
3.1.1 Data types
3.1.2 Control flows
3.1.3 Functions
3.1.4 Object-oriented programming
3.1.5 Exceptions
3.1.6 Working with modules
3.1.7 File management
3.2 Scientific Python for matrix calculation with numpy
3.2.1 Creating matrices: help, array, matrix, ndim, shape, reshape, vstack, hstack, tolist, ix_
3.2.2 Transforming matrices: dot, multiply, *, power, **, /, >, nonzero, copy, flatten
3.2.3 Generating matrices: r_, zeros, ones, eye, diag, rand, randn, randint, max, maximum, norm, logical_and, logical_or, &, |
3.2.4 Calculating with matrices: inv, pinv, matrix_rank, solve, lstsq, svd, transpose, eig, sort, linspace, meshgrid, mgrid, ogrid, concatenate, tile, squeeze, integrate
3.3 Scientific Python for advanced computations with scipy
Prerequisite
Before starting this tutorial, you need to install Python 3 and Scientific Python with
numpy, scipy, matplotlib, scikit-learn and possibly iPython. The easiest way to
do so is to use Pyzo http://www.pyzo.org/downloads.html, which installs everything at once.
There are versions for Microsoft Windows (zip versions do not require administrator
privileges), Mac OS X and even Linux.
If Python 3 is already installed with Scientific Python, install scikit-learn with the
pip tool, for instance by typing pip3 install scikit-learn -U (or
pip install scikit-learn -U) in the command line interface.
Chapter 1
Objective of the tutorial
Among the systems to be managed, living area systems are quite unique because they
are highly human-machine cooperative systems. Indeed, the physical and control
parts provide services to occupants, but occupants are also part of the system and
strongly influence it with their own behavior. Human activities cannot be neglected and
are part of the system state, which is composed of:
context related to time, weather conditions, energy costs, heat gains per zone,
but also to occupants' current positions and activities
controls related to the positions of doors, windows, flaps and shutters, and to the
configurations of the HVAC system and other electric appliances
reactions related to indoor temperature and air quality, and to the satisfaction of
occupants regarding services provided by electric appliances.
Future building energy management systems may cover a large set of applications.
They can support retrospective analyses of past periods by estimating and correlating
actions and variables such as occupancy, usage of windows, or heat flows through
windows. Using simulation means, they can extrapolate future states
depending on hypothetical controls, or replay past situations with changed controls.
To develop all these applications, energy management systems have to embed
knowledge and data models of the living area system; they have to be equipped with
learning, estimation, simulation and optimization capabilities.
Because living zones are related both to physics and to occupants, the state char-
acterizing a zone at a given time is also related to occupants. What might matter
is:
location of occupants
activities of occupants
actions performed on the envelope
actions performed on the HVAC system
actions performed on other appliances
Estimating the human part of the state can be helpful in different regards.
Usage analysis Measuring, and estimating what cannot be measured, can help occu-
pants to discover costly routines. This can be done by replaying a past period,
displaying both energy impacts and past human behaviors.
Occupant behavior modeling Estimating non-measurable variables related to hu-
man behavior can ease the tuning of reactive human activity models. These
models can then be co-simulated with models of the physics to better represent
human-physical zone systems.
Performance assessment Building performance is highly dependent on usage and
it is therefore difficult to guarantee performance unless human activity is
monitored and meets a committed limit. Human activity estimation can help to
check whether the limit is met or not. If human activities are known, intrinsic
building performances, such as envelope performance, can be computed.
1.3 What is machine learning?
This tutorial aims at illustrating the capabilities of machine learning for estimating
occupancy and human activities, using Scientific Python.
Machine learning algorithms can figure out how to perform important tasks by
generalizing from examples. This is often feasible and cost effective where manual
programming is not. As more data becomes available, more ambitious problems can
be tackled. As a result, machine learning is widely used in computer science and
other fields.
Machine learning usually refers to the changes in systems that perform tasks asso-
ciated with artificial intelligence (AI). Such tasks involve recognition, diagnosis,
planning, robot control, prediction, etc.
[Figure: the office testbed instrumented with sensors — video cameras, a microphone, a motion detector, window and door contacts, CO2/VOC sensors, temperature and humidity sensors (office and corridor), illuminance sensors, and power consumption meters for the heater and other electric appliances.]
All the data possibly used for estimating the occupancy will be called features, as in
the machine learning community.
The objective followed by this tutorial is to use Scientific Python to determine what
are the most relevant sensors to estimate the number of occupants in the office.
We are going to determine the best way to combine measurements to get a good
occupancy estimator. It will be extended to activity estimation.
The scikit-learn project started in 2007 as a Google Summer of Code project
proposed by David Cournapeau. Later, still in 2007, Matthieu Brucher started to
work on this project as part of his PhD thesis. In 2010, Fabian Pedregosa, Gael
Varoquaux, Alexandre Gramfort and Vincent Michel from INRIA took leadership
of the project and made the first public release on February 1st, 2010. Since then,
several releases have appeared and a thriving international community is leading the
development.
For training, scikit-learn embeds a set of testing data corresponding to iris flower
species (see http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.
html). The Iris dataset consists of the following features: sepal length in cm,
sepal width in cm, petal length in cm and petal width in cm. Target classes to
predict are iris setosa, iris versicolour and iris virginica.
Machine learning generates models from data. For that reason, we will start by
discussing how data can be represented in order to be understood by the computer.
n samples the number of samples: each sample is an item to process (e.g. classify). A
sample can be a document, a picture, a sound, a video, an astronomical object,
a row in a database or CSV file, or whatever you can describe with a fixed set
of quantitative traits.
n features the number of features or distinct traits that can be used to describe each
item in a quantitative manner. Features are generally real-valued, but may
be boolean or discrete-valued in some cases. The number of features must
be fixed in advance. However it can be very high (e.g. millions of features)
with most of them being zeros for a given sample. In this case, scipy sparse
matrices can be useful in that they are much more memory-efficient than
numpy arrays.
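As an illustration, the shape convention above can be checked with a small numpy array (a toy data matrix of ours, not one of the scikit-learn datasets):

```python
import numpy as np

# a toy data matrix: 4 samples, each described by 3 quantitative features
X = np.array([[5.1, 3.5, 1.4],
              [4.9, 3.0, 1.4],
              [6.2, 3.4, 5.4],
              [5.9, 3.0, 5.1]])
n_samples, n_features = X.shape
print(n_samples, n_features)  # 4 3
```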
Whatever the classifier is, scikit-learn proposes common methods to process data:
model.fit() fit training data. For supervised learning applications, it accepts two
arguments: the data X and the labels y (e.g. model.fit(X, y)). For unsupervised
learning applications, it accepts only one single argument, the data X (e.g.
model.fit(X)).
model.predict() given a trained model, predict the label of a new set of data. This
method accepts one argument, the new data X_new (e.g. model.predict(X_new)),
and returns the learned label for each object in the array.
model.predict_proba() For classification problems, some estimators also provide
this method, which returns the probability that a new observation has each
categorical label. In this case, the label with the highest probability is returned
by model.predict().
model.score() For classification or regression problems, most estimators implement
a score method. Scores are between 0 and 1; a larger score indicates a better
fit.
model.transform() Given an unsupervised model, transform new data into the new
basis. It also accepts one argument X_new, and returns the new representation
of the data based on the unsupervised model.
model.fit_transform() Some estimators implement this method, which performs
more efficiently a fit and a transform on the same input data.
from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()
X, y = iris.data, iris.target
k_means = KMeans(n_clusters=3, random_state=0) # fixing the RNG in KMeans
k_means.fit(X)
y_pred = k_means.predict(X)
Tutorials and a user guide about scikit-learn can be found at http://scikit-learn.org/stable/tutorial/ and http://www.math.unipd.it/~aiolli/corsi/1213/aa/user_guide-0.12-git.pdf.
Chapter 2
Estimation of occupancy in an
office
Entropy is an attribute of a random variable that measures its disorder. The higher
the entropy, the higher the disorder associated with the variable, i.e. the less it
can be predicted. Mathematically, entropy is defined by:

H(y) = − Σ_{i=0}^{n−1} p(y = Y_i) log2 p(y = Y_i)
Solution:

import math

def entropy(X):
    # empirical entropy of the list of samples X
    H = 0.0
    for value in set(X):
        p = X.count(value) / len(X)
        H -= p * math.log2(p)
    return H

X1 = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
X2 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1]
X3 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(entropy(X1)) # 1.0
print(entropy(X2)) # 0.41381685030363374
print(entropy(X3)) # 0.0

2.2 Information gain to measure the purity of a decision
Information gain can now be defined for two random variables, x and y, as:

IG(y, x) = H(y) − H(y|x)    (2.1)

where:
x: an input discrete random variable (x_k is a sample) used as feature (dom(x) = {X_0, ..., X_{nX−1}})
y: the target discrete random variable (y_k is a sample) (dom(y) = {Y_0, ..., Y_{nY−1}})
H(y): the entropy of y
H(y|x): the conditional entropy of y given x with:

H(y|x) = Σ_{i=0}^{nX−1} p(x = X_i) H(y_{k | x_k = X_i})    (2.2)
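These definitions can be sketched in plain Python (the helper functions below are ours, assuming discrete-valued features; they are not part of scikit-learn):

```python
import math
from collections import Counter

def entropy(y):
    # H(y) = - sum over observed values of p(y = Yi) * log2 p(y = Yi)
    n = len(y)
    return -sum((c / n) * math.log2(c / n) for c in Counter(y).values())

def information_gain(y, x):
    # IG(y, x) = H(y) - H(y|x), where H(y|x) weights the entropy of y
    # within each group x = Xi by p(x = Xi)
    n = len(y)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return entropy(y) - sum(len(g) / n * entropy(g) for g in groups.values())

x = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # a feature, e.g. a door position
y = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # labels fully determined by x here
print(information_gain(y, x))  # 1.0: knowing x removes all the disorder of y
```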
The higher the reduction of disorder according to feature x, the more information is
gained for determining y, thus making x a good feature to use for classifying y.
Solution:
First step: read the values from the file data.txt

door, window, occupancy_data = [], [], []
file = open("data.txt", 'r')
title = file.readline()  # skip the header line
for line in file:
    line_token = line.split(";")
    door.append(float(line_token[0]))
    window.append(float(line_token[1]))
    occupancy_data.append(float(line_token[2]))  # store the values of each column of data.txt in 3 separate lists
file.close()
print(occupancy_data)
To go further using real data, two datasets are available: a training dataset in the SQLite
database office.db and a validation dataset in testing.db. Training data covers
11 days from 04-May-2015 to 14-May-2015 while validation data is collected over
4 days from 17-May-2015 to 21-May-2015.
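Reading such a database can be done with the standard sqlite3 module. Here is a minimal sketch; the actual table and column names inside office.db are not shown in this excerpt, so the default query below is only an assumption:

```python
import sqlite3

def load_rows(db_path, query="SELECT * FROM data"):  # table name 'data' is a guess
    connection = sqlite3.connect(db_path)
    try:
        return connection.execute(query).fetchall()
    finally:
        connection.close()
```

The returned rows can then be converted into feature lists like the ones built above.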
Solution: There are three kinds of data: continuous data (figure 2.3), discrete data
(figure 2.4) and occupancy labels (figure 2.5).
Another issue has to be solved: how to calculate the information gain for continuous
variables such as CO2 concentration or (average) occupancy.
Solution: According to the prior discretization, it comes out that the best sensors to
estimate occupancy are:
Feature IG
microphone 0.68
motion fluctuations 0.62
occupancy from physical model 0.55
power consumption 0.5
CO2 mean 0.5
CO2 derivative 0.452
doors position 0.41
windows position 0.341
indoor temperature 0.08
mean (tempoutside -tempinside ) 0.07
day type 0
Finally, after removing less important features, the main informative features are
found to be:
1. Microphone.
2. Occupancy estimation from CO2 physical model.
3. Motion detector counting.
4. Occupancy estimation from power consumption.
5. Door position.
Propose a classifier based on a decision tree using scikit-learn and classify the sample
Xnew = [2, 2]
Solution:

from sklearn import tree
X = [[0, 0], [1, 1], [0, 1]]  # training data (3 samples with 2 features)
Y = [0, 1, 0]  # labelling data
classifier = tree.DecisionTreeClassifier(random_state=0, criterion='entropy', max_depth=None)
classifier.fit(X, Y)  # fit the tree according to training and labelling data
print(classifier.predict([[2, 2]]))  # predict the new target sample (validation)
with open("tree.dot", 'w') as file:
    tree.export_graphviz(classifier, out_file=file)  # save the tree as a dot file and open it with Graphviz or a text editor (e.g. WordPad on Microsoft Windows)
Figure 2.6 represents the learned decision tree. The right arrow corresponds to the
decision taken when the condition in the box is satisfied.
To go further, we are going to use scikit-learn to generate the decision tree for
estimating the number of occupants based on datasets office.db and testing.db.
Exercise 2.7 Depth of decision trees Run example5.py and observe the
results, then propose some changes to the DecisionTreeClassifier arguments to limit
the depth of the tree.
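One possible change (a sketch on toy data of ours, not the content of example5.py) is to bound the depth with the max_depth argument:

```python
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [1, 1], [0, 1], [1, 0]]  # toy training data
Y = [0, 1, 0, 1]
# max_depth=2 stops splitting after two levels, trading accuracy for simplicity
shallow = DecisionTreeClassifier(criterion='entropy', max_depth=2, random_state=0)
shallow.fit(X, Y)
print(shallow.get_depth())  # at most 2, by construction
```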
The C4.5 decision tree algorithm has been used to perform recognition by using
aggregated features and the labels extracted from video cameras. 5 occupancy levels
have been defined to generate decision trees because of the maximum number of
Figure 2.7 shows the results obtained from the learned decision tree considering all
the features as input to the detection model, where we plot both the actual occupancy
profile and the estimated profile as a graph of number of occupants with respect
to time (the time quantum is 30 minutes). The accuracy achieved is 79% (number of
correctly estimated points divided by the total number of points), and the average error
is 0.32 (average distance between actual points and estimated points). The following
table represents the values of average error for each class of estimation:
Figure 2.8 shows the result obtained from the decision tree considering only the
main features. It leads to an improvement in occupancy estimation, with an accuracy
of 82% and an average error of 0.24 occupant. Additionally, the results
indicate that the microphone, the occupancy estimation from the CO2 physical model,
the motion detector, the power consumption and the door contact have the largest
correlation with the number of occupants.
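The two figures of merit used above can be sketched as follows (helper functions of our own, assuming integer occupancy levels):

```python
def accuracy(actual, estimated):
    # share of correctly estimated points
    return sum(a == e for a, e in zip(actual, estimated)) / len(actual)

def average_error(actual, estimated):
    # average distance between actual and estimated occupancy
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)

actual = [0, 1, 2, 2, 1]
estimated = [0, 1, 2, 1, 1]
print(accuracy(actual, estimated))       # 0.8
print(average_error(actual, estimated))  # 0.2
```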
2.3 Decision tree as a classifier
Here are the errors for each class when considering all the features:
Comparing the results obtained from the decision tree using all the features with the
one using only the main features, a significant improvement in occupancy estimation
for all levels can be observed. The estimation of human activities is also proposed.
Figure 2.9 represents the results of estimating human activities, with an average
error of 0.02. The considered activities are given by:
Note that the average error takes into account each change in the estimation values,
while the accuracy considers only the correctly estimated points, even if there are
changes in the other estimation values.
Propose a classifier based on a random forest using scikit-learn and classify the sample
Xnew = [2, 2]
Solution:

from sklearn.ensemble import RandomForestClassifier
X = [[0, 0], [1, 1], [0, 1]]  # training data (3 samples with 2 features)
Y = [0, 1, 0]  # labelling data
classifier = RandomForestClassifier(n_estimators=10, random_state=0, criterion='entropy', max_depth=None)  # n_estimators is the number of trees in the forest
classifier.fit(X, Y)  # fit the trees according to training and labelling data
print(classifier.predict([[2, 2]]))  # predict the target (validation)
Run example8.py and compare the results of the random forest with the
results of example5.py.
Chapter 3
Appendix
The Python programming language is one of the most popular programming lan-
guages for scientific computing. Considering the number of papers that compare it
to Matlab, we can infer a growing popularity in the scientific community. Thanks to its
high-level interactive nature and its maturing ecosystem of scientific libraries, it is
an appealing choice for algorithmic development and exploratory data analysis. Moreover,
as a general-purpose language, it is increasingly used not only in academic settings
but also in industry.
Python 2 is not fully compatible with Python 3, and Python 2 will no longer receive
new releases. Python 3 is the better choice for beginners.
Working with Python can be done just like with the command line window of Matlab
using IDLE, or using iPython with a lot more facilities. For beginners who would like
to test Python, we recommend the Pyzo distribution because it is easy to install
on any OS, but also because it installs iPython, scientific libraries and a basic
integrated development environment named IEP (http://www.pyzo.org/downloads.
html). Nevertheless, a simple text editor such as Sublime Text or Komodo
Edit can also be used to develop and run code. For those who would like an
advanced integrated development environment, we recommend PyCharm (http:
//www.jetbrains.com/pycharm/).
Getting help about a function or an object can be done from within Python. For
instance, to get help about the function list, you can do: help(list), print(list.__doc__) or
dir(list). Official documentation can be found at http://docs.python.org/3/.
Here are some code examples illustrating basic data types and how they can be
processed:
myvar = 3 # no semicolon at the end of lines
myvar += 2
print(myvar) # 5
myvar -= 1
print(myvar) # 4
# a variable is not explicitly typed. myvar was referring to an integer. It will now refer to a character string.
myvar = """This is a multiline comment.
The following lines concatenate the two strings.""" # here is a string including a carriage return
mystring = "Hello"
mystring += " world." # use `+' to concatenate strings
print(mystring) # Hello world.
print(mystring[2:5]) # llo
print(mystring[2:-4]) # llo wo (negative index counts from the end)
print('a %s parrot' % 'dead') # a dead parrot
[x for x in mystring] # ['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '.']
'world' in mystring # True
myvar, mystring = mystring, myvar # This swaps the variables in one line. It does not violate
, strong typing because values are not actually being assigned, but new objects are bound
, to the old names.
a=(1,2,3) # a tuple
a[1] # 2
a[1] = 3 # error: 'tuple' object does not support item assignment
mylist = ["List item 1", 2, 3.14] # heterogeneous list
print(mylist[:]) # ['List item 1', 2, 3.1400000000000001]
print(mylist[0:2]) # ['List item 1', 2]
print(mylist[-3:-1]) # ['List item 1', 2]
print(mylist[1:]) # [2, 3.14]
mylist.extend([4,5,6]) # ['List item 1', 2, 3.14, 4, 5, 6]
mylist.append(7) # ['List item 1', 2, 3.14, 4, 5, 6, 7]
print(mylist[::2]) # ['List item 1', 3.14, 5, 7]
mylist[0]=8 # [8, 2, 3.14, 4, 5, 6, 7]
mylist.sort() # mylist = [2, 3.14, 4, 5, 6, 7, 8]
mylist.reverse() # [8, 7, 6, 5, 4, 3.14, 2]
mylist.index(3.14) # 5
del mylist[2:4] # [8, 7, 4, 3.14, 2]
mylist[1:2] = [4.5,2.3] # [8, 4.5, 2.3, 4, 3.14, 2]
[x for x in range(2,23,3)] # [2, 5, 8, 11, 14, 17, 20]
[x for x in range(2,23,3) if x%2 == 1] # [5, 11, 17]
[x * y for x in [1, 2, 3] for y in [3, 4, 5]] # [3, 4, 5, 6, 8, 10, 9, 12, 15]
any([i % 3 for i in [3, 3, 4, 4, 3]]) # "any" returns true if any item in the list is true
sum(1 for i in [3, 3, 4, 4, 3] if i == 4) # 2
d1 = {}
d2 = {'spam': 2, 'eggs': 3}
d3 = {'food': {'ham': 1, 'egg': 2}}
d2['eggs'] #[Out]# 3
d3['food']['ham'] #[Out]# 1
'eggs' in d2 #[Out]# True
d2.keys() #[Out]# dict_keys(['spam', 'eggs'])
d2.values() #[Out]# dict_values([2, 3])
len(d1) #[Out]# 0
d2[5] = 'ok' #[Out]# {'spam': 2, 'eggs': 3, 5: 'ok'}
del d2[5] #[Out]# {'eggs': 3, 'spam': 2}
d2.items() #[Out]# dict_items([('spam', 2), ('eggs', 3)])
for k, v in d2.items(): # key: spam val: 2
print('key: ', k, 'val: ', v) # key: eggs val: 3
Comparison operators are similar to other languages (==, !=, <, <=, >, >=). Addi-
tionally, Python offers the is and is not comparison operators to test object identities,
i.e. addresses in memory:
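A small example (ours) of the difference between value equality and identity:

```python
a = [1, 2, 3]
b = a          # b is bound to the same object as a
c = list(a)    # c is a new object with equal content
print(a == c)  # True: same values
print(a is c)  # False: two distinct objects in memory
print(a is b)  # True: same object, same address
```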
number = 5
def f():
    print(number)  # reads the global variable
def g():
    number = 3  # creates a local variable; the global one is unchanged
    print(number)
def h():
    global number
    print(number)
    number = 3  # modifies the global variable
f() # 5
g() # 3
f() # 5
h() # 5
h() # 3
3.1.2 Control flows
Python control flows look like those of other languages except curly brackets are
not used but indentations determine the code blocks:
x = -3
if x < 0:
x=0
print('Negative change to zero')
elif x == 0:
print('Zero')
elif x == 1:
print('Single')
else:
print('None') # Negative change to zero
for i in range(3,10,2):
print(i) # 3, 5, 7, 9
i=0
rangelist = range(10)
while rangelist[i] < 4:
print(rangelist[i]) # 0, 1, 2, 3
i += 1
3.1.3 Functions
def ask_ok(prompt, retries=4, complaint='Yes or no, please!'):
    while True:
        ok = input(prompt)
        if ok in ('y', 'ye', 'yes'):
            return True
        elif ok in ('n', 'no'):
            return False
        retries = retries - 1
        if retries < 0:
            raise IOError('refused')
        print(complaint)

def passing_example(a_list, an_int=2, a_string='A default string'):
    a_list.append('A new item')
    an_int = 4
    return a_list, an_int, a_string

passing_example([1, 2, 3], 10) ## ([1, 2, 3, 'A new item'], 4, 'A default string')
def f(x):
return x % 2 != 0 and x % 3 != 0
def g(x):
return x**3
def vargtest(a, b, *nkw, **kw):
    print('a is:', a)
    print('b is:', b)
    for each in nkw:
        print('additional nonkeyword arg:', each)
    for k in kw:
        print('additional keyword arg: "%s": %s' % (k, kw[k]))

vargtest(1, 2, 3, 4, x=2, y=3)
vargtest(1, 2, *(3, 4), **{'x': 2, 'y': 3})
# both yield
# a is: 1
# b is: 2
# additional nonkeyword arg: 3
# additional nonkeyword arg: 4
# additional keyword arg: "x": 2
# additional keyword arg: "y": 3
Python can also be used to develop object-oriented code. It supports multiple
inheritance and overriding, but does not support method overloading (several methods
with the same name and different signatures). In practice, this is not a big issue
thanks to default argument values.
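For instance, a single method with a default argument value can play the role of two overloaded methods (a small illustrative class of ours, not from the tutorial):

```python
class Greeter:
    # one method with a default argument replaces two overloads
    def greet(self, name="world"):
        return "Hello " + name

g = Greeter()
print(g.greet())         # Hello world
print(g.greet("Alice"))  # Hello Alice
```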
class Dog:
    counter = 0
    def __init__(self, name): # constructor
        self.name = name
        Dog.counter += 1
    def bark(self):
        return self.name + '(' + str(Dog.counter) + ') wouaf'

dog1 = Dog('Lassie')
dog2 = Dog('Rex')
print(dog1.bark()) # Lassie(2) wouaf
print(dog2.bark()) # Rex(2) wouaf
class Vehicle:
    def __init__(self, initial_covered_distance):
        self.covered_distance = initial_covered_distance
    def add_distance(self, new_covered_distance):
        self.covered_distance += new_covered_distance
class Surface:
    def get_surface(self):
        raise NotImplementedError()
    def get_perimeter(self):
        raise NotImplementedError()
    def __str__(self):
        return 'Surface is: %f and perimeter: %f' % (self.get_surface(), self.get_perimeter())
class Color:
    def __init__(self, color: int=0):
        self.color = color
    def set(self, color: int):
        self.color = color
    def get(self):
        return self.color

class Rectangle(Surface, Color):  # not shown in the excerpt; a plausible definition
    def __init__(self, length: float, width: float, color: int=0):
        Color.__init__(self, color)
        self.length, self.width = length, width
    def get_surface(self):
        return self.length * self.width
    def get_perimeter(self):
        return 2 * (self.length + self.width)

class Square(Rectangle):
    def __init__(self, length: float, color: int=0):
        Rectangle.__init__(self, length, length, color)
import math
3.1.5 Exceptions
def some_function():
    try:
        10 / 0  # division by zero raises an exception
    except ZeroDivisionError:
        print("Oops, invalid.")
    else:  # no exception occurred, we're good
        pass
    finally:
        # this is executed after the code block is run and all exceptions have been
        # handled, even if a new exception is raised while handling
        print("We're done with that.")

some_function()
# Oops, invalid.
# We're done with that.
In Python, each file is a module and all the code inside can be imported using a
single import mymodule to load the content of the file. If you want some code
not to be run when importing, create a main section starting with:

if __name__ == '__main__':
    pass  # put here the code to be executed when the module is directly called, and not when it is imported
Consider a file myio.py that contains a function called load(). To use that function
from another module, just do:
import myio
myio.load()
But be careful with multiple definitions. Just specific parts of a module can also be
imported:
from myio import load
load()
In order to make visible only a subset of the entities (functions, classes, ...) in a mod-
ule, declare an __all__ variable such as __all__ = ['myfunction1', 'myfunction2', ..., 'Class1', 'Class2', ...].
The entities that are not in __all__ are not imported by from mymodule import *.
Finally, all the modules in a folder can be imported at once by importing the folder.
If only some modules must be imported, create a file named '__init__.py'
containing an __all__ variable such as __all__ = ['module1', 'module2', ...].
file = open('data.txt','w')
file.write('line1\n') #[Out]# 6
file.write('line2') #[Out]# 5
file.write('line3') #[Out]# 5
sequence = ['line4\n', 'line5\n']
file.writelines(sequence)
file.close()
# content of data.txt
# line1
# line2line3line4
# line5
file=open('data.txt','r')
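The excerpt stops after reopening the file for reading; typical read patterns look like this (a sketch, recreating the data.txt content from the listing above):

```python
# recreate data.txt as in the listing above, then read it back
with open('data.txt', 'w') as file:
    file.write('line1\nline2line3line4\nline5\n')
with open('data.txt') as file:   # the with statement closes the file automatically
    content = file.read()        # whole content as a single string
print(content)
with open('data.txt') as file:
    for line in file:            # iterate over the lines
        print(line.rstrip())
```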
Because Python is very relevant for prototyping solutions, but also because it is open
and free, the Python language is today widely used by diverse scientific communities.
It is sometimes used as a Matlab replacement. Matlab is more focused on matrix
computation, signal processing and automatic control, whereas Scientific Python
is broader because it also gathers the community interested in computer science.
Scientific Python is composed of many libraries coming from different scientific
communities, for instance:
The first 6 libraries usually come together under the name Scientific Python or
SciPy (pronounced "Sigh Pie"). Documentation of the 6 libraries is available at
http://www.scipy.org/docs.html. There is a magic import, from pylab import *, coming
with matplotlib together with numpy and scipy, that performs the proper imports in such
a way that Python becomes very close to Matlab.
The same processing can involve different methods depending on the objects used.
import numpy as np
F=np.matrix(np.arange(1,17).reshape(4,4))
F
#[Out]# matrix([[ 1, 2, 3, 4],
#[Out]# [ 5, 6, 7, 8],
#[Out]# [ 9, 10, 11, 12],
#[Out]# [13, 14, 15, 16]])
G=np.matrix([[1,4,7],[3,1,2],[9,5,6],[8,4,1]])
G
#[Out]# matrix([[1, 4, 7],
#[Out]# [3, 1, 2],
#[Out]# [9, 5, 6],
#[Out]# [8, 4, 1]])
F*G
#[Out]# matrix([[ 66, 37, 33],
#[Out]# [150, 93, 97],
#[Out]# [234, 149, 161],
#[Out]# [318, 205, 225]])
F*F
#[Out]# matrix([[ 90, 100, 110, 120],
#[Out]# [202, 228, 254, 280],
#[Out]# [314, 356, 398, 440],
#[Out]# [426, 484, 542, 600]])
np.multiply(F,F)
#[Out]# matrix([[ 1, 4, 9, 16],
#[Out]# [ 25, 36, 49, 64],
#[Out]# [ 81, 100, 121, 144],
#[Out]# [169, 196, 225, 256]])
G=F[:,::-1] # reverse the column order
G
#[Out]# matrix([[ 4, 3, 2, 1],
#[Out]# [ 8, 7, 6, 5],
#[Out]# [12, 11, 10, 9],
#[Out]# [16, 15, 14, 13]])
X=G/F # elementwise division (note: this does not solve F*X=G)
X
#[Out]# matrix([[ 4. , 1.5 , 0.66666667, 0.25 ],
#[Out]# [ 1.6 , 1.16666667, 0.85714286, 0.625 ],
#[Out]# [ 1.33333333, 1.1 , 0.90909091, 0.75 ],
#[Out]# [ 1.23076923, 1.07142857, 0.93333333, 0.8125 ]])
F**2
#[Out]# matrix([[ 90, 100, 110, 120],
#[Out]# [202, 228, 254, 280],
#[Out]# [314, 356, 398, 440],
#[Out]# [426, 484, 542, 600]])
F**3
#[Out]# matrix([[ 3140, 3560, 3980, 4400],
#[Out]# [ 7268, 8232, 9196, 10160],
#[Out]# [11396, 12904, 14412, 15920],
#[Out]# [15524, 17576, 19628, 21680]])
np.power(F,3)
#[Out]# matrix([[ 1, 8, 27, 64],
#[Out]# [ 125, 216, 343, 512],
#[Out]# [ 729, 1000, 1331, 1728],
#[Out]# [2197, 2744, 3375, 4096]])
F>4
#[Out]# matrix([[False, False, False, False],
#[Out]# [ True, True, True, True],
#[Out]# [ True, True, True, True],
3.2.3 Generating matrices: r_, zeros, ones, eye, diag, rand, randn,
randint, max, maximum, norm, logical_and, logical_or, &, |
from numpy import linalg
A=np.random.randint(0,10,(3,2))
A
#[Out]# array([[9, 0],
#[Out]# [4, 1],
#[Out]# [6, 9]])
B=np.random.randint(0,10,(3,1))
B
#[Out]# array([[8],
#[Out]# [5],
#[Out]# [8]])
linalg.lstsq(A,B) #solve Ax~B (least square)
#[Out]# (array([[ 0.92999204],
#[Out]# [ 0.28122514]]),
#[Out]# array([ 1.14677804]),
#[Out]# 2,
#[Out]# array([ 13.07127037, 6.64393639]))
# best solution is x1=0.92999204, x2=0.28122514
J=np.random.randint(0,11,(3,4))
J
#[Out]# array([[ 3, 4, 0, 7],
#[Out]# [ 2, 10, 4, 6],
#[Out]# [ 9, 4, 0, 8]])
(U,S,V)=linalg.svd(J)
U
#[Out]# array([[0.45603171, 0.09977533, 0.88435285],
#[Out]# [0.61454651, 0.75404815, 0.23182748],
#[Out]# [0.64371396, 0.64919664, 0.40518645]])
S
#[Out]# array([ 18.22998676, 7.28665454, 2.36056104])
V
#[Out]# array([[0.4602644 , 0.57841226, 0.134843 , 0.65985856],
#[Out]# [ 0.63595706, 0.62368726, 0.4139338 , 0.18770089],
#[Out]# [0.61734242, 0.17013296, 0.39283455, 0.66001828],
#[Out]# [0.05102585, 0.497502 , 0.8100353 , 0.30615508]])
np.dot(U.transpose(),U)
#[Out]# array([[ 1.00000000e+00, 5.55111512e-17, 0.00000000e+00],
#[Out]# [ 5.55111512e-17, 1.00000000e+00, 0.00000000e+00],
#[Out]# [ 0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])
np.dot(V.transpose(),V)
#[Out]# array([[ 1.00000000e+00, 4.44089210e-16, 1.38777878e-16,
#[Out]# 0.00000000e+00],
#[Out]# [ 4.44089210e-16, 1.00000000e+00, 1.11022302e-16,
#[Out]# 1.11022302e-16],
#[Out]# [ 1.38777878e-16, 1.11022302e-16, 1.00000000e+00,
#[Out]# 9.71445147e-17],
#[Out]# [ 0.00000000e+00, 1.11022302e-16, 9.71445147e-17,
#[Out]# 1.00000000e+00]])
D=np.hstack([np.diag(S),np.zeros((3,1))])
D
#[Out]# array([[ 18.22998676, 0. , 0. , 0. ],
#[Out]# [ 0. , 7.28665454, 0. , 0. ],
#[Out]# [ 0. , 0. , 2.36056104, 0. ]])
np.dot(np.dot(U,D),V)
#[Out]# array([[ 3.00000000e+00, 4.00000000e+00, 9.99200722e-16,
#[Out]# 7.00000000e+00],
#[Out]# [ 2.00000000e+00, 1.00000000e+01, 4.00000000e+00,
#[Out]# 6.00000000e+00],
#[Out]# [ 9.00000000e+00, 4.00000000e+00, 1.60982339e-15,
#[Out]# 8.00000000e+00]])
L=np.matrix([[6,3,10],[0,2,2],[2,1,5]])
L
#[Out]# matrix([[ 6, 3, 10],
#[Out]# [ 0, 2, 2],
#[Out]# [ 2, 1, 5]])
V,P=linalg.eig(L)
V
#[Out]# array([2.88089912, 4.24581024, 1.63508888])
P
#[Out]# matrix([[ 0.76017728, 0.95227747, 0.73605421],
#[Out]# [ 0.24634855, 0.20299732, 0.66592894],
#[Out]# [ 0.60120121, 0.22794673, 0.12150244]])
P*np.diag(V)*linalg.inv(P)
#[Out]# matrix([[ 6.00000000e+00, 3.00000000e+00, 1.00000000e+01],
#[Out]# [ 1.33226763e-15, 2.00000000e+00, 2.00000000e+00],
#[Out]# [ 2.00000000e+00, 1.00000000e+00, 5.00000000e+00]])
L0=L.copy()
L1=L.copy()
L0.sort(0)
L0
#[Out]# matrix([[ 0, 3, 10],
#[Out]# [ 2, 1, 5],
#[Out]# [ 6, 2, 2]])
L1.sort(1)
L1
#[Out]# matrix([[10, 3, 6],
#[Out]# [ 2, 0, 2],
#[Out]# [ 5, 1, 2]])
np.linspace(1,3,4)
#[Out]# array([ 1. , 1.66666667, 2.33333333, 3. ])
np.meshgrid([1,2,4],[2,4,5])
#[Out]# [array([[1, 2, 4],
#[Out]# [1, 2, 4],
#[Out]# [1, 2, 4]]), array([[2, 2, 2],
#[Out]# [4, 4, 4],
#[Out]# [5, 5, 5]])]
np.mgrid[0:9,0:6]
#[Out]# array([[[0, 0, 0, 0, 0, 0],
#[Out]# [1, 1, 1, 1, 1, 1],
#[Out]# [2, 2, 2, 2, 2, 2],
#[Out]# [3, 3, 3, 3, 3, 3],
#[Out]# [4, 4, 4, 4, 4, 4],
#[Out]# [5, 5, 5, 5, 5, 5],
#[Out]# [6, 6, 6, 6, 6, 6],
#[Out]# [7, 7, 7, 7, 7, 7],
#[Out]# [8, 8, 8, 8, 8, 8]],
#[Out]#
#[Out]# [[0, 1, 2, 3, 4, 5],
#[Out]# [0, 1, 2, 3, 4, 5],
#[Out]# [0, 1, 2, 3, 4, 5],
#[Out]# [0, 1, 2, 3, 4, 5],
#[Out]# [0, 1, 2, 3, 4, 5],
3.3.1 Loading and saving matrices: save, load, savetxt, loadtxt, savemat, loadmat, expm, interp1d
# use numpy own format for one matrix
import numpy as np
import scipy.io
data=np.random.randn(2,1000)
np.save('pop.npy', data)
data1 = np.load('pop.npy')
# use text format for one matrix
np.savetxt('pop2.txt', data)
data2 = np.loadtxt('pop2.txt')
# use matlab format for several matrices ({'var1':matrix1,'var2',...})
scipy.io.savemat('pop3.mat',{'data':data})
data3=scipy.io.loadmat('pop3.mat')
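The subsection title also lists expm and interp1d, which do not appear in the listing above; a minimal sketch of both (the sample values are made up):

```python
import numpy as np
from scipy.linalg import expm
from scipy.interpolate import interp1d

# matrix exponential: for a diagonal matrix, expm(diag(d)) == diag(exp(d))
D = np.diag([0.0, 1.0])
E = expm(D)                      # [[1, 0], [0, e]]

# piecewise-linear interpolation of sampled points
xs = np.array([0.0, 1.0, 2.0])
ys = xs**2                       # samples of y = x**2
f = interp1d(xs, ys)             # callable interpolant
print(E[1, 1], f(1.5))           # e ~ 2.718, f(1.5) = (1 + 4)/2 = 2.5
```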
Compute u1 + 3*u2 - u3/5
Compute the scalar product of u1 and u2
Compute (A^2 - I)*e^A*u1 - A^-1*u2
Extract the size of A
What is the rank of A?
Solve A*x = u1
Solution:
from pylab import *
from scipy.linalg import expm
u1=matrix([[1],[2],[3]])
u2=matrix([[5],[2],[1]])
u3=matrix([[1],[3],[7]])
A=matrix([[2,3,4],[7,6,5],[2,8,7]])
print(u1+3*u2-u3/5)
print(u1.transpose()*u2)
print((A**2-eye(3))*expm(A)*u1-inv(A)*u2)
print(A.shape)
print(matrix_rank(A))
print(solve(A,u1))
print(A*solve(A,u1))
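Since the earlier function list also mentions lstsq, the solution of A*x = u1 can be cross-checked against it; a sketch using plain numpy calls (array names are illustrative):

```python
import numpy as np

A = np.array([[2, 3, 4], [7, 6, 5], [2, 8, 7]])
u1 = np.array([[1], [2], [3]])

x_solve = np.linalg.solve(A, u1)                  # exact solve for square A
x_lstsq = np.linalg.lstsq(A, u1, rcond=None)[0]   # least squares, same answer here

print(np.allclose(A @ x_solve, u1))   # residual check
print(np.allclose(x_solve, x_lstsq))
```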
Solution:
from pylab import *
from numpy.linalg import *
X0=matrix([[3],[1]])
X=X0.copy()
A=matrix([[.5,.8],[.3,.8]])
for i in range(10):
print(X)
X=A*X
X=X0.copy()
for i in range(1000):
X=A*X
print('X='+str(X))
V,P=eig(A)
print(V)
print('X='+str(P*matrix_power(diag(V),1000)*inv(P)*X0))
A = P D P^-1 with D = diag(V)
A^n = (P D P^-1)(P D P^-1)...(P D P^-1) = P D^n P^-1
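The diagonalization identity above can be checked numerically; a minimal sketch with a small symmetric matrix (chosen so the eigendecomposition stays real):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # symmetric -> real eigenvalues
V, P = np.linalg.eig(A)
n = 10

An = np.linalg.matrix_power(A, n)          # A**n by repeated multiplication
An_diag = P @ np.diag(V**n) @ np.linalg.inv(P)   # A**n via A = P D P^-1

print(np.allclose(An, An_diag))
```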
3.4 Scientific Python for plotting with matplotlib
Solution:
from pylab import *
from numpy.linalg import pinv
A=matrix([[2,3,7,3],[1,2,4,7],[0,1,5,1]])
B=matrix([[1],[3],[1]])
Aplus = pinv(A)
Aplus
#[Out]# matrix([[ 0.05819261, 0.01104518, 0.05636696],
#[Out]# [ 0.18051118, 0.10798722, 0.17092652],
#[Out]# [ 0.04016431, 0.00100411, 0.14148791],
#[Out]# [ 0.02031036, 0.11300776, 0.12163396]])
M=(Aplus*A-eye(4))
M
#[Out]# matrix([[0.89465997, 0.25303514, 0.08133272, 0.15362848],
#[Out]# [ 0.25303514, 0.0715655 , 0.02300319, 0.04345048],
#[Out]# [ 0.08133272, 0.02300319, 0.00739388, 0.01396623],
#[Out]# [ 0.15362848, 0.04345048, 0.01396623, 0.02638065]])
X0=Aplus*B
X0
#[Out]# matrix([[0.0313099 ],
#[Out]# [0.314377 ],
#[Out]# [ 0.18466454],
#[Out]# [ 0.23769968]])
solutions=M*randint(-10,10,(4,10))+X0
A*solutions
#[Out]# matrix([[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
#[Out]# [ 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.],
#[Out]# [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
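Every column of A*solutions equals B because Aplus*A - I maps any vector into the null space of A; a sketch verifying this property with numpy (the size of the random family is arbitrary):

```python
import numpy as np

A = np.array([[2., 3., 7., 3.], [1., 2., 4., 7.], [0., 1., 5., 1.]])
B = np.array([[1.], [3.], [1.]])

Aplus = np.linalg.pinv(A)
X0 = Aplus @ B                      # minimum-norm solution of A x = B
M = Aplus @ A - np.eye(4)           # projects any vector into null(A)
solutions = M @ np.random.randn(4, 5) + X0   # a family of solutions

print(np.allclose(A @ solutions, np.tile(B, (1, 5))))
```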
matplotlib tries to make easy things easy and hard things possible. You can generate
plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just
a few lines of code.
For simple plotting the pyplot interface provides a MATLAB-like interface, particularly
when combined with IPython. For the power user, you have full control of line styles,
font properties, axes properties, etc., via an object-oriented interface or via a set of
functions familiar to MATLAB users.
Simple examples are provided in what follows as an illustration of the matplotlib
capabilities. Further documentation can be found at http://matplotlib.org/contents.html.
This code:
import numpy
from numpy.random import randn
import matplotlib.pyplot as plt
npoints=100
x=[i for i in range(npoints)]
y1 = randn(npoints)
y2 = randn(npoints)
y3 = randn(npoints)
y4 = randn(npoints)
plt.subplot(2,1,1)
plt.plot(x,y1)
plt.plot(x,y2,':')
plt.grid()
plt.xlabel('time')
plt.ylabel('series 1 and 2')
plt.axis('tight')
plt.subplot(2,1,2)
plt.plot(x,y3)
plt.plot(x,y4,'r')
plt.grid()
plt.xlabel('time')
plt.ylabel('series 3 and 4')
plt.axis('tight')
plt.show()
leads to:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import collections
t = np.arange(0.0, 2, 0.01)
s1 = np.sin(2*np.pi*t)
s2 = 1.2*np.sin(4*np.pi*t)
fig, ax = plt.subplots()
ax.set_title('using span_where')
ax.plot(t, s1, color='black')
ax.axhline(0, color='black', lw=2)
collection = collections.BrokenBarHCollection.span_where(
    t, ymin=0, ymax=1, where=s1>0, facecolor='green', alpha=0.5)
ax.add_collection(collection)
collection = collections.BrokenBarHCollection.span_where(
    t, ymin=-1, ymax=0, where=s1<0, facecolor='red', alpha=0.5)
ax.add_collection(collection)
plt.show()
yields:
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
ax.xaxis.set_minor_locator(months)
datemin = datetime.date(r.date.min().year, 1, 1)
datemax = datetime.date(r.date.max().year+1, 1, 1)
ax.set_xlim(datemin, datemax)
def price(x): return '$%1.2f'%x
ax.format_xdata = mdates.DateFormatter('%Y-%m-%d')
ax.format_ydata = price
ax.grid(True)
fig.autofmt_xdate()
plt.show()
yields:
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')
plt.subplots_adjust(left=0.15)
plt.show()
yields:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for c, z in zip(['r', 'g', 'b', 'y'], [30, 20, 10, 0]):
    xs = np.arange(20)
    ys = np.random.rand(20)
    cs = [c] * len(xs)
    cs[0] = 'c'
    ax.bar(xs, ys, zs=z, zdir='y', color=cs, alpha=0.8)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()
yields:
3.4.24 Contours
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from matplotlib import cm
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X, Y, Z = axes3d.get_test_data(0.05)
cset = ax.contour(X, Y, Z, cmap=cm.coolwarm)
ax.clabel(cset, fontsize=9, inline=1)
plt.show()
yields:
fig = plt.figure()
ax = fig.gca(projection='3d')
X, Y, Z = axes3d.get_test_data(0.05)
ax.plot_surface(X, Y, Z, rstride=8, cstride=8, alpha=0.3)
cset = ax.contour(X, Y, Z, zdir='z', offset=-100, cmap=cm.coolwarm)
cset = ax.contour(X, Y, Z, zdir='x', offset=-40, cmap=cm.coolwarm)
cset = ax.contour(X, Y, Z, zdir='y', offset=40, cmap=cm.coolwarm)
ax.set_xlabel('X')
ax.set_xlim(-40, 40)
ax.set_ylabel('Y')
ax.set_ylim(-40, 40)
ax.set_zlabel('Z')
ax.set_zlim(-100, 100)
plt.show()
yields:
3.4.26 Surface
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
import numpy as np
plt.ion()
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X, Y, Z = axes3d.get_test_data(0.1)
ax.plot_wireframe(X, Y, Z, rstride=5, cstride=5)
for angle in range(0, 360):
ax.view_init(30, angle)
plt.draw()
yields:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = np.linspace(0, 1, 100)
y = np.sin(x * 2 * np.pi) / 2 + 0.5
ax.plot(x, y, zs=0, zdir='z', label='zs=0, zdir=z')
colors = ('r', 'g', 'b', 'k')
for c in colors:
x = np.random.sample(20)
y = np.random.sample(20)
ax.scatter(x, y, 0, zdir='y', c=c)
ax.legend()
ax.set_xlim3d(0, 1)
ax.set_ylim3d(0, 1)
ax.set_zlim3d(0, 1)
plt.show()
yields:
3.4.31 Points
import matplotlib.pyplot as plt
from numpy.random import rand
for color in ['red', 'green', 'blue']:
n = 750
x, y = rand(2, n)
scale = 200.0 * rand(n)
plt.scatter(x, y, c=color, s=scale, label=color,
alpha=0.3, edgecolors='none')
plt.legend()
plt.grid(True)
plt.show()
yields:
3.4.34 Radar
import numpy as np
import matplotlib.pyplot as plt
N = 20
theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)
ax = plt.subplot(111, polar=True)
bars = ax.bar(theta, radii, width=width, bottom=0.0)
for r, bar in zip(radii, bars):
bar.set_facecolor(plt.cm.jet(r / 10.))
bar.set_alpha(0.5)
plt.show()
yields:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.projections.polar import PolarAxes
from matplotlib.projections import register_projection

def _radar_factory(num_vars):
    theta = 2*np.pi * np.linspace(0, 1-1./num_vars, num_vars)
    theta += np.pi/2
    def unit_poly_verts(theta):
        x0, y0, r = [0.5] * 3
        verts = [(r*np.cos(t) + x0, r*np.sin(t) + y0) for t in theta]
        return verts
    class RadarAxes(PolarAxes):
        name = 'radar'
        def fill(self, *args, **kwargs):
            closed = kwargs.pop('closed', True)
            return super(RadarAxes, self).fill(closed=closed, *args, **kwargs)
        def plot(self, *args, **kwargs):
            lines = super(RadarAxes, self).plot(*args, **kwargs)
            for line in lines:
                self._close_line(line)
        def _close_line(self, line):
            x, y = line.get_data()
            if x[0] != x[-1]:
                x = np.concatenate((x, [x[0]]))
                y = np.concatenate((y, [y[0]]))
                line.set_data(x, y)
        def set_varlabels(self, labels):
            self.set_thetagrids(theta * 180/np.pi, labels)
        def _gen_axes_patch(self):
            verts = unit_poly_verts(theta)
            return plt.Polygon(verts, closed=True, edgecolor='k')
    register_projection(RadarAxes)
    return theta
labels = ['v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'v7', 'v8', 'v9']
values = [1, 1, 2, 7, 4, 0, 3, 10, 6]
optimum = [5, 3, 2, 4, 5, 7, 5, 8, 5]
N = len(labels)
theta = _radar_factory(N)
max_val = max(max(optimum), max(values))
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='radar')
ax.plot(theta, values, color='k')
ax.plot(theta, optimum, color='r')
ax.set_varlabels(labels)
plt.show()
yields:
import matplotlib.pyplot as plt
from matplotlib.sankey import Sankey
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, xticks=[], yticks=[],
                     title="Flow Diagram of a Widget")
sankey = Sankey(ax=ax, scale=0.01, offset=0.2, head_angle=180,
                format='%.0f', unit='%')
sankey.add(flows=[25, 0, 60, -10, -20, -5, -15, -10, -40],
           labels=['', '', '', 'First', 'Second', 'Third', 'Fourth',
                   'Fifth', 'Hurray!'],
           orientations=[-1, 1, 0, 1, 1, 1, -1, -1, 0],
           pathlengths=[0.25, 0.25, 0.25, 0.25, 0.25, 0.6, 0.25, 0.25,
                        0.25],
           patchlabel="Widget\nA",
           alpha=0.2, lw=2.0)  # Arguments to matplotlib.patches.PathPatch()
diagrams = sankey.finish()
diagrams[0].patch.set_facecolor('#37c959')
diagrams[0].texts[-1].set_color('r')
diagrams[0].text.set_fontweight('bold')
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, xticks=[], yticks=[], title="Two Systems")
flows = [0.25, 0.15, 0.60, -0.10, -0.05, -0.25, -0.15, -0.10, -0.35]
sankey = Sankey(ax=ax, unit=None)
sankey.add(flows=flows, label='one',
           orientations=[-1, 1, 0, 1, 1, 1, -1, -1, 0])
sankey.add(flows=[-0.25, 0.15, 0.1], fc='#37c959', label='two',
           orientations=[-1, -1, -1], prior=0, connect=(0, 0))
diagrams = sankey.finish()
diagrams[1].patch.set_hatch('/')
plt.legend(loc='best')
plt.show()
yields:
3.4.38 Vectors
3.4.39 Insets
from pylab import *
dt = 0.001
t = arange(0.0, 10.0, dt)
r = exp(-t[:1000]/0.05)
x = randn(len(t))
s = convolve(x, r)[:len(x)]*dt
plot(t, s)
axis([0, 1, 1.1*amin(s), 2*amax(s)])
xlabel('time (s)')
ylabel('current (nA)')
title('Gaussian colored noise')
a = axes([.65, .6, .2, .2], axisbg='y')
n, bins, patches = hist(s, 400, normed=1)
title('Probability')
setp(a, xticks=[], yticks=[])
a = axes([0.2, 0.6, .2, .2], axisbg='y')
plot(t[:len(r)], r)
title('Impulse response')
setp(a, xlim=(0, .2), xticks=[], yticks=[])
show()
yields:
Solution:
Given samples (x_i, y_i), assume the model y_i = theta1*x_i^2 + theta2. Stack the data as
Y = [y_0 ... y_{n-1}]
M_X = [[x_0^2 ... x_{n-1}^2],
       [1    ...  1        ]]
so that Y = Theta M_X with Theta = [theta1 theta2]. The least-squares estimate is
Theta = Y M_X^T (M_X M_X^T)^-1
and the fitted values are then
Y' = Theta M_X'
import numpy as np
import matplotlib.pyplot as plt
x=np.r_[-5:6]
y=[12.5,7,5,1.6,1.1,3.2,2.6,0.7,3.4,7.6,13]
Y=np.matrix(y)
M=np.vstack([np.power(np.matrix(x),2),np.ones(Y.shape)])
theta=Y*M.transpose()*np.linalg.inv(M*M.transpose())
print(theta)
xest=np.linspace(-5,5,41)
Mest=np.vstack([np.power(np.matrix(xest),2), np.ones((1,len(xest)))])
Yest=theta*Mest
yest=Yest.tolist()[0]
plt.plot(x,y,'b')
plt.plot(x,y,'bo')
plt.plot(xest,yest,'r:')
plt.grid()
plt.show()
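The closed-form estimate above can be cross-checked with np.linalg.lstsq on the same design matrix; a sketch (variable names are illustrative):

```python
import numpy as np

x = np.arange(-5, 6)
y = np.array([12.5, 7, 5, 1.6, 1.1, 3.2, 2.6, 0.7, 3.4, 7.6, 13.0])

# design matrix with columns [x**2, 1] for the model y = theta1*x**2 + theta2
M = np.column_stack([x**2, np.ones_like(x, dtype=float)])
theta, *_ = np.linalg.lstsq(M, y, rcond=None)
print(theta)   # [theta1, theta2]
```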
Solution:
import numpy as np
import matplotlib.pyplot as plt
__all__ = ['findOutliers']

def _isOutlier(x: list, middleindex: int, halfwidth=5, threshold=4):
    window = np.hstack([x[middleindex-halfwidth:middleindex],
                        x[middleindex+1:middleindex+halfwidth+1]])
    windowmean = np.mean(window)
    windowstd = np.std(window)
    if abs(x[middleindex]-windowmean) > threshold*windowstd:
        return True
    return False

def findOutliers(x: list):
    # body reconstructed from the usage below
    outlierIndice, outlierValues = [], []
    for i in range(len(x)):
        if _isOutlier(x, i):
            outlierIndice.append(i)
            outlierValues.append(x[i])
    return (outlierIndice, outlierValues)
3.5 Code for calculating entropy and information gain
if __name__ == '__main__':
x=np.random.rand(1,100).reshape(-1)
x[40]+=2
x[27]+=3
x[51]-=2
(outlierIndice,outlierValues)=findOutliers(x)
print(outlierIndice)
plt.plot(x)
plt.plot(outlierIndice,outlierValues,'ro')
plt.grid()
plt.axis('tight')
plt.show()
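A more robust variant, not part of the original solution, replaces the windowed mean and standard deviation with the global median and MAD (median absolute deviation); a sketch:

```python
import numpy as np

def find_outliers_mad(x, threshold=4.0):
    """Flag points farther than `threshold` scaled MADs from the median."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med)) * 1.4826   # ~ std for Gaussian data
    mask = np.abs(x - med) > threshold * mad
    return np.nonzero(mask)[0], x[mask]

idx, vals = find_outliers_mad(np.r_[np.zeros(50), 10.0])
print(idx)   # [50]
```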
class DatabaseExtraction():
def get_seconds_in_sample(self):
return self.seconds_in_timequantum
def close(self):
self.connection.commit()
self.connection.close()
return p0 * v0 - p1 * v1
def discretize(value, bounds=((0,0), (0,1), (1,1))): # discretization of window and door data
for i in range(len(bounds)):
if value >= bounds[i][0] and value<= bounds[i][1]:
return i
return -1
seconds_in_sample = 1800
database_extraction = DatabaseExtraction('testing.db', '17/05/2015 00:30:00', '21/05/2015 00:30:00', seconds_in_sample)
###################################
times, occupancies = database_extraction.get_sampled_measurement('labels')
times, door_openings = database_extraction.get_sampled_measurement('Door_contact')
times, window_openings = database_extraction.get_sampled_measurement('window')
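The entropy and information-gain computations announced in the section title fall outside this excerpt; a minimal sketch of what they typically compute (assumed, not the original code):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a sequence of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, feature):
    """Entropy reduction obtained by splitting `labels` on `feature` values."""
    labels = np.asarray(labels)
    feature = np.asarray(feature)
    h = entropy(labels)
    for v in np.unique(feature):
        mask = feature == v
        h -= mask.mean() * entropy(labels[mask])
    return h

print(entropy([0, 0, 1, 1]))                         # 1.0
print(information_gain([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (perfect split)
```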
import numpy
from statistics import stdev
from time import strftime, localtime, strptime, mktime
from matplotlib.pylab import *
from sklearn import metrics, tree
class DatabaseExtraction():
def get_seconds_in_sample(self):
return self.seconds_in_timequantum
previous_epochtime = augmented_data_epochtimes[i-1]
previous_datasample_epochtime = self.to_datasample_epochtime(augmented_data_epochtimes[i-1])
epochtime = augmented_data_epochtimes[i]
datasample_epochtime = self.to_datasample_epochtime(augmented_data_epochtimes[i])
previous_value = augmented_data_values[i-1]
integrator += (epochtime - previous_epochtime) * previous_value
if datasample_epochtime > previous_datasample_epochtime:
sampled_data_epochtimes.append(previous_datasample_epochtime)
sampled_data_values.append(integrator / self.seconds_in_timequantum)
integrator = 0
return sampled_data_epochtimes, sampled_data_values
def close(self):
self.connection.commit()
self.connection.close()
def average_error(actual_occupants):
    level_midpoint = []
    for i in range(len(actual_occupants)):
        x = actual_occupants[i]
        if x == 0:
            level_midpoint.append(0)
        elif 0 < x < 2:
            level_midpoint.append(1)
        elif x >= 2:
            level_midpoint.append(3.15)
    return level_midpoint
classifier = tree.DecisionTreeClassifier(random_state=0, criterion='entropy', max_depth=None)
classifier.fit(X, Y)
features_importance = classifier.feature_importances_
X_t = []
for i in range(0, len(test_label)):
test_features = [test_feature1[i],test_feature2[i],test_feature3[i]]
X_t.append(test_features)
Y_t=classifier.predict(X_t)
accuracy = classifier.score(X_t, test_label)
#save the tree
with open("tree.dot", 'w') as f:
    tree.export_graphviz(classifier, out_file=f)
return accuracy, features_importance, Y_t, classifier
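The fit/predict/score calls above can be exercised on a toy dataset; a self-contained sketch (the feature values are invented, not taken from the office database):

```python
from sklearn import tree

# toy features: [CO2 level, door open] -> occupancy class
X = [[400, 0], [420, 0], [900, 1], [950, 1]]
Y = [0, 0, 1, 1]

classifier = tree.DecisionTreeClassifier(random_state=0, criterion='entropy')
classifier.fit(X, Y)

print(classifier.predict([[410, 0], [930, 1]]))  # expect classes [0, 1]
print(classifier.score(X, Y))                    # 1.0 on training data
```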
seconds_in_sample = 1800
database_extraction = DatabaseExtraction('office4.db', '4/05/2015 00:30:00', '15/05/2015 00:30:00', seconds_in_sample)
times, occupancies = database_extraction.get_sampled_measurement('labels')
times, co2in = database_extraction.get_sampled_measurement('co2')
for i in range(len(co2in)): # removal of outliers
if co2in[i] >= 2500:
co2in[i] = 390
times, Tin = database_extraction.get_sampled_measurement('Temperature')
times, door = database_extraction.get_sampled_measurement('Door_contact')
times, RMS = database_extraction.get_sampled_measurement('RMS')
print("Accuracy = ",accuracy)
print("Features Importance (microphone, physical model, door position) = ", features_importance)
print(metrics.classification_report(label_test, estimated_occupancy, target_names=['class 0', 'class 1', 'class 2']))
error = []
for i in range(len(label_test)):
s = labelerror[i] - occupancies_test[i]
error.append(abs(s))
average_error = (numpy.sum(error)) / len(labelerror)
print("average_error = ", average_error)
plot(label_test, 'bo-', label='actual_level')
plot(estimated_occupancy, 'ro-', label='predicted_level')
3.7 Code for estimating occupancy using random forest
plt.ylabel('Occupancy Level')
plt.xlabel('Time Quantum')
axis('tight')
grid()
plt.legend(('measured','estimated'), loc =0)
plt.show()
features_importance, Y_t, clf, confusionMatrics)
class DatabaseExtraction():
def get_seconds_in_sample(self):
return self.seconds_in_timequantum
augmented_data_epochtimes.append(epochtime)
augmented_data_values.append(augmented_data_values[-1])
if self.to_datasample_epochtime(data_epochtime) > self.to_datasample_epochtime(augmented_data_epochtimes[-1]):
    augmented_data_epochtimes.append(self.to_datasample_epochtime(data_epochtime))
    augmented_data_values.append(augmented_data_values[-1])
augmented_data_epochtimes.append(data_epochtime)
augmented_data_values.append(data_values[i])
for epochtime in range(self.to_datasample_epochtime(augmented_data_epochtimes[-1]),
                       to_epochtime(self.ending_datetime), self.seconds_in_timequantum):
    augmented_data_epochtimes.append(epochtime + self.seconds_in_timequantum)
    augmented_data_values.append(augmented_data_values[-1])
sampled_data_epochtimes, sampled_data_values = [], []
integrator = 0
for i in range(1, len(augmented_data_epochtimes)):
    previous_epochtime = augmented_data_epochtimes[i-1]
    previous_datasample_epochtime = self.to_datasample_epochtime(augmented_data_epochtimes[i-1])
    epochtime = augmented_data_epochtimes[i]
    datasample_epochtime = self.to_datasample_epochtime(augmented_data_epochtimes[i])
    previous_value = augmented_data_values[i-1]
    integrator += (epochtime - previous_epochtime) * previous_value
    if datasample_epochtime > previous_datasample_epochtime:
        sampled_data_epochtimes.append(previous_datasample_epochtime)
        sampled_data_values.append(integrator / self.seconds_in_timequantum)
        integrator = 0
return sampled_data_epochtimes, sampled_data_values
def close(self):
self.connection.commit()
self.connection.close()
def average_error(actual_occupants):
    level_midpoint = []
    for i in range(len(actual_occupants)):
        x = actual_occupants[i]
        if x == 0:
            level_midpoint.append(0)
        elif 0 < x < 2:
            level_midpoint.append(1)
        elif x >= 2:
            level_midpoint.append(3.15)
    return level_midpoint
seconds_in_sample = 1800
database_extraction = DatabaseExtraction('office4.db', '4/05/2015 00:30:00', '15/05/2015 00:30:00', seconds_in_sample)
times, occupancies = database_extraction.get_sampled_measurement('labels')
times, co2in = database_extraction.get_sampled_measurement('co2')
for i in range(len(co2in)): # removal of outliers
if co2in[i] >= 2500:
co2in[i] = 390
times, Tin = database_extraction.get_sampled_measurement('Temperature')
times, door = database_extraction.get_sampled_measurement('Door_contact')
times, RMS = database_extraction.get_sampled_measurement('RMS')
print("Accuracy = ",accuracy)