Machine Learning
Edward Tsang
Classical Economics
Built on critical assumptions
Machine Learning
Combinatorial Explosion
Define Target
Investigator
Data preparation
Which ML method? Off the shelf whenever possible
Apply ML method
Machine Learning program
What to learn?
• We could try to predict tomorrow's price
• We could try to predict whether prices will go up (or down) by a margin
– E.g. will the price go up by r% within n days?
• Notes:
– Always ask: can the market be predicted?
– There is no magic in machine learning
– The harder the task, the smaller the chance of success
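The target "will the price go up by r% within n days?" can be turned into training labels. The sketch below is one possible reading, not a prescribed method: the function name `label_rise`, the use of the highest price in the next n days, and the sample prices are all assumptions made for illustration.

```python
from typing import List, Optional

def label_rise(prices: List[float], r: float, n: int) -> List[Optional[bool]]:
    """Label day t True if the price reaches at least r% above prices[t]
    on any of the next n days. Days without n future prices get None."""
    labels: List[Optional[bool]] = []
    for t, price in enumerate(prices):
        window = prices[t + 1 : t + 1 + n]
        if len(window) < n:
            labels.append(None)  # not enough future data to decide
            continue
        labels.append(max(window) >= price * (1 + r / 100))
    return labels

# Hypothetical daily prices, for illustration only
prices = [100.0, 101.0, 99.0, 104.0, 103.0, 108.0]
labels = label_rise(prices, r=3.0, n=2)
print(labels)
```

A larger r or a smaller n makes positive labels rarer, which is one way to see why a harder target leaves less chance of success.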
Classification Performance
Confusion Matrix (10 cases, each a pair of reality and prediction)
              Prediction
              −     +     Total
Reality  −    5     2     7
         +    1     2     3
Total         6     4     10
Performance Measures
              Prediction                        Prediction
              −     +     Total                 −     +     Total
Reality  −    7     0     7        Reality  −   5     2     7
         +    0     3     3                 +   1     2     3
Total         7     3     10       Total        6     4     10
              Prediction
              −       +       Total
Reality  −    TN 5    FP 2    7
         +    FN 1    TP 2    3
Total         6       4       10
True Positive Rate = recall = TP/(TP+FN) = 2/(2+1) = 67%
TN = True Negative
FN = False Negative = Miss = Type II Error
FP = False Positive = False Alarm = Type I Error
TP = True Positive
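The four counts are enough to compute the usual performance measures. A minimal sketch using the standard definitions; the function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
def classification_measures(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Standard measures derived from the four confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Convention assumed here: report 0.0 when the denominator is zero
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }

# Counts from the matrix above: TN = 5, FP = 2, FN = 1, TP = 2
m = classification_measures(tn=5, fp=2, fn=1, tp=2)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```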
[Figure: True Positive Rate plotted against False Positive Rate, both from 0.0 to 1.0, with the diagonal labelled "Random"]
– Points on the diagonal represent random predictions
• A curve can be fitted on multiple measures
– Note: points may not cover classifications widely
• Area under the curve measures the classifier’s performance
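Classifier 1
              Predictions
              −       +       Total
Reality  −    300     200     500
         +    100     400     500
Total         400     600     1000

Classifier 2
              Predictions
              −       +       Total
Reality  −    400     100     500
         +    200     300     500
Total         600     400     1000

[Figure: Classifier 1 and Classifier 2 plotted as points, True Positive Rate against False Positive Rate, both from 0.0 to 1.0]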
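Each classifier above gives one point on the plot of True Positive Rate against False Positive Rate. The sketch below computes those points and a trapezoidal area under the piecewise-linear curve through (0, 0), the points, and (1, 1). Two things are assumptions here: the "−" row of Classifier 2 (400, 100) is taken from the row and column totals, and the trapezoid rule is only one way of fitting a curve through the points.

```python
def roc_point(tn: int, fp: int, fn: int, tp: int) -> tuple:
    """Return (false positive rate, true positive rate) for one matrix."""
    return fp / (fp + tn), tp / (tp + fn)

def auc_trapezoid(points) -> float:
    """Area under the piecewise-linear curve through (0,0), points, (1,1)."""
    pts = sorted([(0.0, 0.0)] + list(points) + [(1.0, 1.0)])
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

c1 = roc_point(tn=300, fp=200, fn=100, tp=400)
# The "−" row of Classifier 2 is inferred from the totals (assumption)
c2 = roc_point(tn=400, fp=100, fn=200, tp=300)

print("Classifier 1:", c1, "area:", auc_trapezoid([c1]))
print("Classifier 2:", c2, "area:", auc_trapezoid([c2]))
print("Both points:", auc_trapezoid([c1, c2]))
```

A single point gives only a coarse estimate of the area, which is why several operating points per classifier are preferable when they are available.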
Scarce opportunities 1
Ideal Predictions
              Prediction
              −        +       Share
Reality  −    9,900    0       99%
         +    0        100     1%
Share         99%      1%

Easy Predictions
              Prediction
              −        +       Share
Reality  −    9,900    0       99%
         +    100      0       1%
Share         100%     0%
Scarce opportunities 2
Easy Predictions
              Prediction
              −        +       Share
Reality  −    9,900    0       99%
         +    100      0       1%
Share         100%     0%

Random Predictions
              Prediction
              −        +       Share
Reality  −    9,801    99      99%
         +    99       1       1%
Share         99%      1%
Scarce opportunities 3
Random Predictions
              Prediction
              −        +       Share
Reality  −    9,801    99      99%
         +    99       1       1%
Share         99%      1%
Random moves from − to +
Accuracy = 98.02%
Precision = Recall = 1%

Better Predictions
              Prediction
              −        +       Share
Reality  −    9,810    90      99%
         +    90       10      1%
Share         99%      1%
Better moves from − to +
Accuracy = 98.2%
Precision = Recall = 10%
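The figures in the scarce-opportunity slides can be checked with a few lines of arithmetic. The sketch below recomputes accuracy, precision and recall for the four matrices; the dictionary keys and the choice to return None for an undefined precision are assumptions made for illustration.

```python
def measures(tn: int, fp: int, fn: int, tp: int) -> tuple:
    """Return (accuracy, precision, recall); None where undefined."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp) if tp + fp else None  # nothing predicted "+"
    recall = tp / (tp + fn) if tp + fn else None
    return accuracy, precision, recall

cases = {
    "ideal":  dict(tn=9900, fp=0,  fn=0,   tp=100),
    "easy":   dict(tn=9900, fp=0,  fn=100, tp=0),
    "random": dict(tn=9801, fp=99, fn=99,  tp=1),
    "better": dict(tn=9810, fp=90, fn=90,  tp=10),
}
results = {name: measures(**counts) for name, counts in cases.items()}
for name, (accuracy, precision, recall) in results.items():
    print(name, accuracy, precision, recall)
```

The easy predictor scores 99% accuracy while finding no opportunity at all, and the better predictor scores lower accuracy than the easy one. This is why precision and recall are the more informative measures when positives are scarce.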
Inspiration
• An average human brain has
– 86 billion (8.6 x 10^10) neurons
– Used as storage as well as working memory
• A desktop PC may have:
– 8 GB (8 x 10^9 bytes) of RAM
– 1 terabyte (10^12 bytes) of storage
• Human brains are very efficient
• Connections between neurons matter!
[Figure: network of Input Units, Hidden Units and Output Units]
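A network of input, hidden and output units can be sketched as two layers of weighted sums. The code below is only an illustration of how connections carry the computation: the layer sizes, the sigmoid activation, the zero biases and the random weights are all assumptions, and no training is shown.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One unit per weight row: weighted sum of inputs plus bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def random_weights(n_units: int, n_inputs: int, rng: random.Random):
    """Hypothetical untrained weights, one row of connections per unit."""
    return [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_units)]

rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 2  # assumed sizes
w_hidden = random_weights(n_hidden, n_in, rng)
w_out = random_weights(n_out, n_hidden, rng)

hidden = layer([0.5, -1.0, 2.0], w_hidden, [0.0] * n_hidden)
outputs = layer(hidden, w_out, [0.0] * n_out)
print(outputs)
```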
[Figure: network of Input Units, ….. Hidden Units and Output Units]
AlphaGo Master
AlphaGo 4-1 Lee Sedol, 2016
[Figure: network of Input Units, ….. ….. Hidden Units and Output Units]
Genetic Algorithms
Terminology in GA
String (e.g. 1 0 0 0 1 1 0)    =  Chromosome
with Building blocks           =  with Genes
with values                    =  with alleles
evaluation                     =  fitness
Selection of Parents
Crossover
Parents      Offspring    x     F(x)
1 0 1 0 1    1 0 0 1 0    18    280
0 1 0 1 0    0 1 1 0 1    13    295
0 1 1 0 1    0 1 1 1 0    14    296
0 1 0 1 0    0 1 0 0 1    9     271
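The offspring in the crossover table are consistent with one-point crossover: each pair of parents swaps tails at a cut point. The sketch below reproduces the four rows. The cut points (2 and 3) are inferred from the strings, and the fitness function F(x) = -x^2 + 28x + 100 is an assumption: it is a quadratic that matches the four F(x) values shown, not a definition given on the slide.

```python
def crossover(p1: str, p2: str, point: int) -> tuple:
    """One-point crossover: swap the tails of two strings after `point` bits."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def fitness(bits: str) -> int:
    """Assumed fitness: decode the string as a binary number x, then
    apply a quadratic that fits the F(x) values in the table."""
    x = int(bits, 2)
    return -x * x + 28 * x + 100

# (parent 1, parent 2, inferred cut point)
pairs = [("10101", "01010", 2), ("01101", "01010", 3)]
rows = []
for p1, p2, point in pairs:
    for child in crossover(p1, p2, point):
        rows.append((child, int(child, 2), fitness(child)))

for child, x, f in rows:
    print(child, x, f)
```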
Mutations
0 1 1 0 1  →  0 1 1 1 1
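The mutation example changes a single bit of the string. A minimal sketch of bit-flip mutation follows; the function names and the per-bit mutation rate are assumptions, since the slide shows only the string before and after.

```python
import random

def flip(bits: str, index: int) -> str:
    """Flip the bit at a given position (0-based)."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1 :]

def mutate(bits: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b
        for b in bits
    )

print(flip("01101", 3))                       # the change shown above
print(mutate("01101", 0.1, random.Random(0)))  # random variant, low rate
```

A low rate keeps most offspring intact while still letting new bit patterns enter the population.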
Evolution in GA
[Figure: the GA cycle]
Population of strings: 10101, 01010, 11000, 01101
Selection, with shares 10101: 24%, 01010: 28%, 11000: 19%, 01101: 29%
Parents: 1 0 1 0 1 and 0 1 0 1 0
Crossover [Mutation]
Offspring: 1 0 1 1 0 and 0 1 0 0 1
New Population
Replace: the new population replaces the population of strings
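The selection shares in the cycle (24%, 28%, 19%, 29%) are consistent with fitness-proportionate ("roulette wheel") selection. The sketch below recomputes them under an assumed fitness function, F(x) = -x^2 + 28x + 100 with x the decoded binary string. This quadratic is inferred from the numbers on the slides rather than stated there.

```python
import random

def fitness(bits: str) -> int:
    """Assumed fitness function (inferred, not given in the source)."""
    x = int(bits, 2)
    return -x * x + 28 * x + 100

population = ["10101", "01010", "11000", "01101"]
scores = [fitness(s) for s in population]

# Each string's share of the wheel is its fitness over the total fitness
shares = [round(100 * s / sum(scores)) for s in scores]
print(dict(zip(population, shares)))

# Draw two parents with probability proportional to fitness
rng = random.Random(1)
parents = rng.choices(population, weights=scores, k=2)
print(parents)
```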
Exploitation vs Exploration
• To succeed, GA must maintain a healthy
balance between exploitation and exploration
• Exploitation: encourage fitness
– In Selection, fit strings have a higher chance of
being picked as parents
• Exploration: allow randomness
– Selection also depends on randomness
– Crossover and Mutation are both completely
random operations
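The balance described above can be seen in a small GA loop: selection weights parents by fitness (exploitation), while the random draw, the random cut point and the random bit flips provide exploration. This is a sketch under stated assumptions, not a reference implementation: the fitness function, the mutation rate, the number of generations and full replacement of the population are all illustrative choices.

```python
import random

def fitness(bits: str) -> int:
    """Assumed fitness: F(x) = -x^2 + 28x + 100, positive for 5-bit x."""
    x = int(bits, 2)
    return -x * x + 28 * x + 100

def evolve(population, generations: int, mutation_rate: float, rng: random.Random):
    for _ in range(generations):
        weights = [fitness(s) for s in population]
        offspring = []
        while len(offspring) < len(population):
            # Exploitation: fitter strings are more likely to be parents
            p1, p2 = rng.choices(population, weights=weights, k=2)
            # Exploration: random cut point and random bit flips
            point = rng.randrange(1, len(p1))
            for child in (p1[:point] + p2[point:], p2[:point] + p1[point:]):
                child = "".join(
                    ("1" if b == "0" else "0") if rng.random() < mutation_rate else b
                    for b in child
                )
                offspring.append(child)
        # Replace the old population with the new one
        population = offspring[: len(population)]
    return population

start = ["10101", "01010", "11000", "01101"]
final = evolve(start, generations=20, mutation_rate=0.05, rng=random.Random(0))
print(final, [fitness(s) for s in final])
```

Raising `mutation_rate` pushes the search towards exploration; setting it to zero leaves crossover as the only source of new strings.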
GA Expertise required
Testing new trading strategies and Market Mechanism Design
[Figure: Agent 1, Agent 2, ..., Agent n interacting with an Artificial Market]
1. Modelling (of the agents and of the market)
2. Interaction
3. Observe
Supplementary Information
Types of Data
Boolean
Nominal
Ordinal
Interval
Ratio
Data Types
• Boolean: True or False
• Nominal: {Dog, Cat, Horse, Monkey}
• Ordinal: 1st, 2nd, 3rd, 4th
• Interval: 1-10, 11-20, 21-30, 31-40
• Ratio: 2.4, 7.8