
Intelligent Optimization Algorithm for Master (2)

The document discusses intelligent algorithms for solving optimization problems, highlighting methods such as hill-climbing and genetic algorithms. It also covers reinforcement learning techniques, including Q-learning and deep Q-learning, to optimize decision-making in uncertain environments. Additionally, it suggests hybrid approaches combining heuristic methods and neural networks for improved problem-solving in specific cases like RCPSP.


Topic: Intelligent algorithm for optimization problems
• Optimization problem example
• Optimization problem example b
• Optimization problem example c
RCPSP math formulation
• Objective function: minimize the makespan, i.e. min(max(ft[1], …, ft[N])), given es[i], i = 1, …, N
• Subject to: ft[i] = es[i] + d[i]
• If activity i is a predecessor of activity j (i is in pa[j]), then es[j] >= ft[i]
• For any day t and resource type k, the total amount of resource k consumed by the activities i that are active on day t must not exceed the limit:
  ∑_i rs[i,k,t] <= rsl[k]
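Below is a minimal sketch of checking this resource constraint, assuming lists es and d, a per-activity consumption table rs, and a per-type limit rsl; these names and the data layout are illustrative, not taken from the course code.

```python
# Minimal sketch (illustrative names): activity i runs from day es[i] to
# ft[i] = es[i] + d[i] and consumes rs[i][k] units of resource type k on each
# of those days; rsl[k] is the availability limit for resource type k.

def resource_feasible(es, d, rs, rsl, horizon):
    """Check that, for every day t and resource k, sum_i rs[i][k] <= rsl[k]."""
    ft = [es[i] + d[i] for i in range(len(es))]
    for t in range(horizon):
        for k in range(len(rsl)):
            used = sum(rs[i][k] for i in range(len(es)) if es[i] <= t < ft[i])
            if used > rsl[k]:
                return False
    return True

# Example: 3 activities, 1 resource type with a limit of 4 units per day.
print(resource_feasible(es=[0, 0, 2], d=[2, 3, 2],
                        rs=[[2], [2], [3]], rsl=[4], horizon=5))  # False: day 2 needs 5 units
```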
It is not possible to solve the problem by manual operation. A heuristic approach is promising for this problem.
How can we use IA to solve the optimization problem?
• Hill-climbing algorithm (competition between two individuals)
• Compare f(x0) with f(x0 ± delta); keep the better point and repeat.
Example
• See hill_climb algorithm.py
• Find the maximum value of sin(x^2) + 2*cos(2*x)
• x is in [5, 8]
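The referenced hill_climb algorithm.py is not reproduced here; the following is a minimal sketch of a hill climber for this objective, with the step size delta and the iteration count chosen arbitrarily.

```python
import math
import random

def f(x):
    # Objective from the slide: maximize sin(x^2) + 2*cos(2*x) on [5, 8].
    return math.sin(x ** 2) + 2 * math.cos(2 * x)

def hill_climb(lo=5.0, hi=8.0, delta=0.01, iters=10000):
    x = random.uniform(lo, hi)                 # random starting point x0
    for _ in range(iters):
        step = random.choice([-delta, delta])  # propose x0 +/- delta
        x_new = min(hi, max(lo, x + step))     # stay inside [lo, hi]
        if f(x_new) > f(x):                    # competition between the two points
            x = x_new                          # keep the better one
    return x, f(x)

print(hill_climb())
```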
How can we use IA to solve the optimization problem?
• Genetic algorithm (competition among a group)
• Many individuals
• Crossover and mutation
GA process
• Step 1: Generate an individual answer (the answer should be feasible)
• Step 2: Generate a population of answers
• Step 3: Build the objective function for the problem
• Step 4: Evaluate the population using the objective function
• Step 5: Select answers according to their fitness values
• Step 6: Crossover
• Step 7: Mutation
• Step 8: Go back to Step 4
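A minimal real-valued GA sketch of the eight steps above, applied to the same toy objective as the hill-climbing example; the population size, crossover rate, and mutation rate are illustrative choices, not values from the slides.

```python
import math
import random

LO, HI = 5.0, 8.0

def fitness(x):
    # Step 3: the objective function (same toy problem as before).
    return math.sin(x ** 2) + 2 * math.cos(2 * x)

def ga(pop_size=30, generations=100, cx_rate=0.8, mut_rate=0.2):
    # Steps 1-2: generate a population of feasible answers.
    pop = [random.uniform(LO, HI) for _ in range(pop_size)]
    for _ in range(generations):
        # Step 4: evaluate the population with the objective function.
        ranked = sorted(pop, key=fitness, reverse=True)
        # Step 5: selection; keep the better half as parents.
        parents = ranked[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            # Step 6: crossover; blend two parents.
            child = (a + b) / 2 if random.random() < cx_rate else a
            # Step 7: mutation; small Gaussian perturbation, clipped to stay feasible.
            if random.random() < mut_rate:
                child = min(HI, max(LO, child + random.gauss(0, 0.1)))
            children.append(child)
        pop = children                 # Step 8: back to Step 4.
    best = max(pop, key=fitness)
    return best, fitness(best)

print(ga())
```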
Variation: crossover and mutation for binary values
Variation: crossover and mutation for decimal values
Variation: mutation for decimal values
Advantages and disadvantages
• Problem-independent (treats the objective as a black box)
• Not guaranteed to reach the global optimum
• Many parameters to tune
• Relatively slow because of the crossover/mutation operators
Several algorithms with few parameters and a simple evolution structure
• (1+1) ES
• Only mutation
Several algorithms with few parameters and a simple evolution structure
• (μ + λ) ES (μ parents, each parent produces λ children, all are evaluated, select the best μ, repeat)
• Only mutation
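A minimal sketch of a (μ + λ) ES with mutation only, following the description above (each parent produces λ children) on the same toy objective; μ, λ, and the mutation step size are illustrative.

```python
import math
import random

def f(x):
    return math.sin(x ** 2) + 2 * math.cos(2 * x)

def mu_plus_lambda_es(mu=5, lam=4, sigma=0.1, generations=200, lo=5.0, hi=8.0):
    parents = [random.uniform(lo, hi) for _ in range(mu)]
    for _ in range(generations):
        # Mutation only: each parent produces lam children by Gaussian perturbation.
        children = [min(hi, max(lo, p + random.gauss(0, sigma)))
                    for p in parents for _ in range(lam)]
        # Parents and children are all evaluated together; keep the best mu.
        parents = sorted(parents + children, key=f, reverse=True)[:mu]
    return parents[0], f(parents[0])

print(mu_plus_lambda_es())
```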
DE flow chart (more on mutation)
Differential evolution (DE)
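The slides only show the DE flow chart, so the sketch below is an assumption: the classic DE/rand/1/bin scheme applied to a stand-in sphere objective, with F and CR set to common textbook values.

```python
import random

def sphere(x):
    # Stand-in objective (not from the slides): minimize the sum of squares.
    return sum(v * v for v in x)

def de(dim=5, pop_size=20, F=0.5, CR=0.9, generations=200, lo=-5.0, hi=5.0):
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three other random individuals (DE/rand/1).
            a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
            mutant = [a[d] + F * (b[d] - c[d]) for d in range(dim)]
            # Binomial crossover between the target vector and the mutant.
            j_rand = random.randrange(dim)
            trial = [mutant[d] if (random.random() < CR or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            trial = [min(hi, max(lo, v)) for v in trial]
            # Greedy selection: the trial replaces the target only if it is no worse.
            if sphere(trial) <= sphere(pop[i]):
                pop[i] = trial
    best = min(pop, key=sphere)
    return best, sphere(best)

print(de())
```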
PSO (competition and cooperation)
• Particle Swarm Optimization (PSO)
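A minimal PSO sketch on the same toy objective as the earlier examples, showing the cooperation (personal best and swarm best) and competition aspects; the inertia weight and acceleration coefficients are common default values, not taken from the slides.

```python
import math
import random

def f(x):
    return math.sin(x ** 2) + 2 * math.cos(2 * x)

def pso(n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, lo=5.0, hi=8.0):
    x = [random.uniform(lo, hi) for _ in range(n_particles)]
    v = [0.0] * n_particles
    pbest = x[:]                   # each particle's own best position
    gbest = max(x, key=f)          # best position found by the whole swarm
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = random.random(), random.random()
            # Velocity: inertia + pull toward personal best + pull toward swarm best.
            v[i] = w * v[i] + c1 * r1 * (pbest[i] - x[i]) + c2 * r2 * (gbest - x[i])
            x[i] = min(hi, max(lo, x[i] + v[i]))
            if f(x[i]) > f(pbest[i]):
                pbest[i] = x[i]
                if f(x[i]) > f(gbest):
                    gbest = x[i]
    return gbest, f(gbest)

print(pso())
```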
What is a NN?
Surrogate optimization
• To find an approximate function for the data, traditionally using a Gaussian process with a kernel function
Neural network (surrogate optimization)
• The concept of surrogate optimization
• To find an approximate function for the data, traditionally using a Gaussian process with a kernel function
• But a NN is more powerful at fitting the data
• (An example) … NN for optimization
• Differentiable, continuous function
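A minimal sketch of the surrogate-optimization loop: sample the expensive function, fit a cheap model, optimize the model, and evaluate the true function only at the suggested point. To keep it dependency-light, the sketch swaps in a simple polynomial (numpy.polyfit) as the surrogate where the slides propose a Gaussian process or a NN; the objective and all parameters are illustrative.

```python
import numpy as np

def expensive_f(x):
    # Stand-in for an expensive objective (illustrative only).
    return np.sin(x ** 2) + 2 * np.cos(2 * x)

lo, hi = 5.0, 8.0
xs = list(np.linspace(lo, hi, 6))       # a few initial sample points
ys = [expensive_f(x) for x in xs]

for _ in range(20):
    # Fit a cheap surrogate to the data collected so far (a polynomial here;
    # the slides suggest a Gaussian process or, more powerfully, a NN).
    coeffs = np.polyfit(xs, ys, deg=min(6, len(xs) - 1))
    surrogate = np.poly1d(coeffs)
    # Optimize the surrogate cheaply by dense evaluation on a grid.
    grid = np.linspace(lo, hi, 1000)
    x_next = float(grid[np.argmax(surrogate(grid))])
    # Evaluate the true function only at the suggested point and grow the data set.
    xs.append(x_next)
    ys.append(expensive_f(x_next))

best = int(np.argmax(ys))
print(xs[best], ys[best])
```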


RL (reinforcement learning)
• Based on dynamic programming and control theory
• Subproblems
• Each subproblem is represented by states and controlled variables
RL (reinforcement learning)
• Learning what?
• Learning a reaction strategy (policy) for an unknown environment or a given state
RL
• Learning from data
• State: (fire)
• Action: (oil)
• Rw_f(state, action) = reward
• Rw_f(fire, use oil) = -50
• Rw_f(fire, use water) = 100
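The fire example can be stored as a small lookup table; the dictionary layout below is just one possible representation, while the -50 and 100 values come from the slide.

```python
# Reward function of the fire example, stored as a (state, action) lookup table.
reward = {
    ("fire", "use oil"): -50,
    ("fire", "use water"): 100,
}

def rw_f(state, action):
    return reward[(state, action)]

print(rw_f("fire", "use water"))   # 100
```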
Using a Q table to store the knowledge
• Data is stored in a table with the results for paired data (state, action)
• Given the Q table, a greedy strategy selects the action for the current state
• Here, states are discrete and independent in the fire example.

          Action a   Action b
State 1   Q(1,a)     Q(1,b)
State 2   Q(2,a)     Q(2,b)
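A minimal sketch of the greedy strategy over such a Q table; the numeric Q values below are placeholders for illustration.

```python
# Q table for the two-state, two-action layout shown above (placeholder values).
q_table = {
    ("state 1", "a"): 0.2, ("state 1", "b"): 0.8,
    ("state 2", "a"): 0.5, ("state 2", "b"): 0.1,
}

def greedy_action(state, actions=("a", "b")):
    # Greedy strategy: pick the action with the highest stored Q value for this state.
    return max(actions, key=lambda a: q_table[(state, a)])

print(greedy_action("state 1"))   # "b"
```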
Using a Q table to store the knowledge
• For a consecutive (sequential) task, the states have specific requirements.
• They should satisfy the Markov property.
Consecutive task or risky environment
Explore vs exploit in RL
• For an unknown environment, how do we explore?
• Epsilon-greedy strategy
RL target
• For an unknown environment, by taking a lot of trial and error, the agent obtains precious data:
• s0-a0-r0-s1-a1-r1-…-sn-an-rn-… (one episode)
• Sometimes the immediate reward may not be clear until the end of the episode:
• s0-a0-s1-a1-…-sn-an-…
• This is much like a multi-armed bandit slot game.
• So the target for the agent is to maximize the expected return
• Return = r0 + dis*r1 + dis^2*r2 + … + dis^n*rn
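The return can be computed directly from the list of rewards of one episode; dis is the discount factor (the later slides use 0.9).

```python
def discounted_return(rewards, dis=0.9):
    # Return = r0 + dis*r1 + dis^2*r2 + ... + dis^n*rn
    return sum((dis ** t) * r for t, r in enumerate(rewards))

# Rewards r0, r1, r2 of one short episode:
print(discounted_return([10, -100, 0]))   # 10 + 0.9*(-100) + 0.81*0 = -80.0
```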
Consecutive task or risky environment
• For an unknown environment, by taking a lot of trial and error, the agent obtains precious data:
• 2,right,0,3
• 2,left,0,3
• 2,left,10,1,left,-100,0
• 2,left,10,1,right,-100,0
• How can we use this experimental data to calculate the Q table?
Bellman equation

For a trajectory s1-r1-s2-r2-…, the value satisfies v(s1) = r1 + dis*v(s2)
Monte Carlo Q table
• One episode:
• 2,left,10,1,left,-100,0
• 2,left,10,1
• 1,left,-100,0 (end state)
• q(1,left)=-100+dis*0=-100
• q(2,left)=10+0.9*(-100)=-80
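A minimal sketch of the Monte Carlo estimate: walk backward through each complete episode, accumulate the discounted return, and average the sampled returns per (state, action) pair. It reproduces the hand calculation above with dis = 0.9.

```python
from collections import defaultdict

dis = 0.9
returns = defaultdict(list)          # (state, action) -> sampled returns

def update_from_episode(episode):
    """episode = [(state, action, reward), ...] ending in a terminal state."""
    g = 0.0
    for state, action, reward in reversed(episode):
        g = reward + dis * g         # discounted return from this step onward
        returns[(state, action)].append(g)

def q(state, action):
    vals = returns[(state, action)]
    return sum(vals) / len(vals)     # Monte Carlo average over episodes

# Episode from the slide: 2,left,10,1,left,-100,0 (terminal state 0)
update_from_episode([(2, "left", 10), (1, "left", -100)])
print(q(1, "left"))    # -100.0
print(q(2, "left"))    # 10 + 0.9*(-100) = -80.0
```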
Update the knowledge
• q(1,left)=-100
• q(2,left)=-80
• New episode
• 2,left,10,1,right,-100,0
• 2,left,10,1
• 1,right,-100,0
• q(1,right)=-100+dis*0=-100
• q(2,left)=10+0.9*(-100)=-80
• Update the knowledge again
• q(1,left)=-100, q(1,right)=-100, q(2,left)=(-80-80)/2=-80
Monte Carlo Q table
• Needs a lot of experiments (exploration) to obtain a Q table useful for exploitation
• If the environment is too complicated, some states may never be visited, for example in NP problems.
• Needs a complete episode
Bellman optimality equation: Q-learning
• s1-a1-r1-s2 (section of an episode)
• Q(s1,a1) = r1 + dis * max_a Q(s2,a)   (Q-learning)
• s1-a1-r1-s2-a2 (section of an episode)
• Q(s1,a1) = r1 + dis * Q(s2,a2)   (SARSA learning)
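A minimal tabular sketch of both update rules, replayed on the transitions from the earlier example; as on the slide, the value is overwritten directly (no learning rate), and unseen entries default to 0.

```python
dis = 0.9
Q = {}                       # (state, action) -> value; missing entries default to 0
actions = ["left", "right"]

def get(s, a):
    return Q.get((s, a), 0.0)

def q_learning_update(s1, a1, r1, s2):
    # Q-learning: bootstrap from the best action in the next state.
    Q[(s1, a1)] = r1 + dis * max(get(s2, a) for a in actions)

def sarsa_update(s1, a1, r1, s2, a2):
    # SARSA: bootstrap from the action actually taken in the next state.
    Q[(s1, a1)] = r1 + dis * get(s2, a2)

q_learning_update(1, "left", -100, 0)    # 1,left,-100,0  -> Q(1,left)  = -100
q_learning_update(1, "right", -100, 0)   # 1,right,-100,0 -> Q(1,right) = -100
q_learning_update(2, "left", 10, 1)      # 2,left,10,1    -> 10 + 0.9*max(-100,-100) = -80
print(Q[(2, "left")])
```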
Deep Q learning
• If the state space is infinite, a Q table is not feasible.
• We use a NN to fit the data for the Q value.
Data
• s0,a0,r0,s1,a1,r1,s2,…
• (s0,a0,r0), (s1,a1,r1), … (the form we used before to build the NN)
• (s0,a0,r0,s1): f(s0,a0) = r0 + dis * max_i f(s1,ai), where i ranges over all actions
• Use this estimated value as the target to train and validate the NN
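A sketch of how the training target is formed from one transition (s0, a0, r0, s1), assuming some function approximator q_net(state) that returns one predicted Q value per action; q_net below is only a placeholder, since the actual network and training loop are framework-specific and not shown in the slides.

```python
dis = 0.9

def q_net(state):
    # Placeholder: in practice a trained neural network returning one value per action.
    return [0.0, 0.0]

def td_target(r0, s1, done):
    # Target from the slide: f(s0, a0) = r0 + dis * max_i f(s1, a_i).
    # If s1 is terminal, there is no future value to bootstrap from.
    return r0 if done else r0 + dis * max(q_net(s1))

# The network is then trained so that its prediction q_net(s0)[a0] moves toward this target.
print(td_target(r0=10, s1="some state", done=False))   # 10.0 with the placeholder net
```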
Policy gradient
https://towardsdatascience.com/reinforcement-learning-explained-visually-part-6-policy-gradients-step-by-step-f9f448e73754
Actor critic method
Application in RCPSP
• NN to approximate a function from a matrix which stores the results of the function, with row and column as inputs
• Monte Carlo may not be enough to explore the whole search space
• Heuristic methods are good at searching.
• A hybrid method may be a way to solve RCPSP
See practical ga for case 1.py
isos for case 1 improved.py
puregaforcase2.py
Thanks
