22 06 24 Deep Q Learning

Uploaded by

Asif Raza

Models of Higher Brain Functions

A Tutorial on Deep-Q-Learning

Robert Lange
@RobertTLange
June 2022

@Sprekelerlab @ECNBERLIN @SCIOI


The Success Story.
• DM - Atari DQN (2013, 2015)
• DM - AlphaGo (2016, 2017)
• DM - AlphaZero (2018)
• OpenAI - Dexterity (2018, 2019)
• OpenAI - Five (Dota 2 - 2019)
• DM - AlphaStar (StarCraft II - 2019)

A Roadmap for Today.
• MDP Formalism
• The RL Problem
• Value-Based RL

Timeline:
• 1996: Fitted Q-Learning
• 2013: DQN
• 2015: PER
• 2016: Double DQN
• 2016: Dueling DQN



The Action Perception Loop of RL.

[Figure: the agent-environment loop. The agent emits an action; the environment returns the next state and a reward.]

Agent: $a_t \sim \pi(\cdot \mid s_t)$
Environment: $s_{t+1} \sim T(\cdot \mid s_t, a_t)$, $r_{t+1} \sim R(s_t, a_t)$

• MDP Formalism: $(\mathcal{S}, \mathcal{A}, T, R, \gamma)$
• Deterministic Policy: $\pi(s): \mathcal{S} \to \mathcal{A}$
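The loop above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the two-state, two-action MDP, the array layout (`T[s, a]` as a probability vector over next states, `R[s, a]` as a deterministic reward) and the function name `rollout` are all hypothetical and not taken from the slides.

```python
import numpy as np

def rollout(T, R, policy, s0, steps, rng):
    """Run the agent-environment loop for a fixed number of steps.

    T[s, a] is a probability vector over next states, R[s, a] a scalar
    reward (deterministic here for simplicity), and policy[s] the action
    chosen by a deterministic policy pi: S -> A.
    """
    s, trajectory = s0, []
    for _ in range(steps):
        a = int(policy[s])                                 # agent: a_t = pi(s_t)
        s_next = int(rng.choice(T.shape[-1], p=T[s, a]))   # s_{t+1} ~ T(.|s_t, a_t)
        r = float(R[s, a])                                 # r_{t+1} = R(s_t, a_t)
        trajectory.append((s, a, r, s_next))
        s = s_next
    return trajectory

# Hypothetical toy MDP: taking action a always leads to state a.
T = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]]])
R = np.array([[0.0, 1.0],
              [0.0, 2.0]])
policy = np.array([1, 1])  # always pick action 1

traj = rollout(T, R, policy, s0=0, steps=3, rng=np.random.default_rng(0))
total_reward = sum(r for _, _, r, _ in traj)
```

Each stored tuple has the form (s, a, r, s'), which is the same transition format that experience replay reuses later in the deck.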
The Reinforcement Learning Problem.
• Value function:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s, \pi\right]$$

• The RL problem:

$$V^{\star}(s) = \max_{\pi \in \Pi} V^{\pi}(s)$$

• Action-Value function (Bellman equation, R. Bellman):

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s, a_t = a\right] = \sum_{s' \in \mathcal{S}} T(s, a, s') \left[R(s, a, s') + \gamma Q^{\pi}(s', a = \pi(s'))\right]$$
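For a small, fully known MDP, the recursive form of the action-value function can be turned into a fixed-point iteration. The sketch below assumes tabular arrays `T[s, a, s']` and `R[s, a, s']` and a deterministic policy; the toy MDP and the name `evaluate_q` are hypothetical.

```python
import numpy as np

def evaluate_q(T, R, policy, gamma, iters=500):
    """Iterate Q(s,a) <- sum_s' T[s,a,s'] * (R[s,a,s'] + gamma * Q(s', pi(s')))."""
    n_states, n_actions, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        q_next = Q[np.arange(n_states), policy]   # Q(s', pi(s')) for every s'
        Q = np.einsum("sap,sap->sa", T, R + gamma * q_next[None, None, :])
    return Q

# Hypothetical toy MDP: action a always leads to state a,
# and arriving in state 1 pays a reward of 1.
T = np.zeros((2, 2, 2))
T[:, 0, 0] = 1.0
T[:, 1, 1] = 1.0
R = np.zeros((2, 2, 2))
R[:, :, 1] = 1.0
policy = np.array([1, 1])  # always pick action 1

Q = evaluate_q(T, R, policy, gamma=0.5)
V = Q[np.arange(2), policy]  # V^pi(s) = Q^pi(s, pi(s))
```

With a reward of 1 at every step under this policy and a discount of 0.5, the value of each state is the geometric series 1 / (1 - 0.5) = 2, which the iteration recovers.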
Temporal Difference Learning in MDPs.

$$Q^{\pi}(s, a) = \sum_{s' \in \mathcal{S}} T(s, a, s') \left[R(s, a, s') + \gamma Q^{\pi}(s', a = \pi(s'))\right]$$

• Bootstrapping/TD-Learning:

$$Q(s, a)_{k+1} = Q(s, a)_k + \eta \Big( \underbrace{r + \gamma \max_{a' \in \mathcal{A}} Q(s', a')_k}_{\text{Target: } Y_k} - Q(s, a)_k \Big)$$

TD Error: the bracketed term, $Y_k - Q(s, a)_k$.

• Dopamine Reward Prediction Error Hypothesis:

Schultz, Dayan, Montague (1997)

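A single tabular TD update needs only one observed transition and no access to the transition model. A minimal sketch, assuming a NumPy table `Q[s, a]`; the function name and the example numbers are hypothetical.

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, eta, gamma):
    """Apply one Q-learning update in place and return the TD error."""
    target = r + gamma * np.max(Q[s_next])   # bootstrapped target Y_k
    td_error = target - Q[s, a]              # Y_k - Q(s, a)_k
    Q[s, a] += eta * td_error                # step of size eta towards the target
    return td_error

Q = np.zeros((2, 2))
delta_1 = q_learning_step(Q, s=0, a=1, r=1.0, s_next=1, eta=0.5, gamma=0.9)
delta_2 = q_learning_step(Q, s=1, a=0, r=0.0, s_next=0, eta=0.5, gamma=0.9)
```

The second update receives no reward but still moves `Q[1, 0]` upward, because its target bootstraps from the estimate `Q[0, 1]` learned in the first step.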
The Curse of Dimensionality in DRL.

[Figure: tabular $Q(s, a)$ with one row per state $s_1, s_2, \ldots, s_{|\mathcal{S}|}$ and one column per action $a_1, a_2, \ldots, a_{|\mathcal{A}|}$; cell $(i, j)$ stores $Q(s_i, a_j)$.]

$$s_t \in \mathbb{R}^{20 \times 10 \times 5} \Rightarrow 2^{1000}$$

Even if each of the $20 \cdot 10 \cdot 5 = 1000$ state entries were binary, the table would need $2^{1000}$ rows.
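The size of such a table can be checked with exact integer arithmetic. The sketch assumes, purely for illustration, that every state entry is binary and that there are four actions; neither assumption comes from the slides.

```python
import math

n_entries = math.prod((20, 10, 5))   # entries in a 20 x 10 x 5 state tensor
n_states = 2 ** n_entries            # assumption: every entry is binary
n_actions = 4                        # hypothetical action count
table_cells = n_states * n_actions   # one Q-value per (state, action) pair
n_digits = len(str(n_states))        # decimal digits needed to write n_states
```

A number of states with several hundred decimal digits rules out storing or visiting every table cell, which motivates the function approximation introduced next.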
Overcoming the Curse of Dimensionality.

Problem: What do we do if the state space is too large?

Idea: Combat the curse via generalisation by function approximation.

Replace the table $Q(s, a)$ with a parametrised function $Q(s, a; \theta)$.

[Figure: a neural network with layer parameters $W_0$, $b_0$ and hidden layer $h$ outputs $|\mathcal{A}|$ values, one per action; the action is chosen as $a_t = \arg\max_a Q(s_t, a; \theta)$.]
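A parametrised Q-function of this kind can be sketched as a one-hidden-layer network that maps a state vector to one value per action. The layer sizes, the ReLU nonlinearity, the initialisation scale and all function names below are assumptions chosen for illustration.

```python
import numpy as np

def init_params(state_dim, hidden_dim, n_actions, rng):
    """Create theta = {W0, b0, W1, b1} for a one-hidden-layer network."""
    return {
        "W0": rng.normal(0.0, 0.1, (state_dim, hidden_dim)),
        "b0": np.zeros(hidden_dim),
        "W1": rng.normal(0.0, 0.1, (hidden_dim, n_actions)),
        "b1": np.zeros(n_actions),
    }

def q_values(theta, s):
    """Return Q(s, a; theta) for all actions at once."""
    h = np.maximum(0.0, s @ theta["W0"] + theta["b0"])   # ReLU hidden layer
    return h @ theta["W1"] + theta["b1"]                 # |A| outputs

def greedy_action(theta, s):
    """Pick a_t = argmax_a Q(s_t, a; theta)."""
    return int(np.argmax(q_values(theta, s)))

rng = np.random.default_rng(0)
theta = init_params(state_dim=1000, hidden_dim=32, n_actions=4, rng=rng)
s = np.ones(1000)                      # a flattened 20 x 10 x 5 state
q = q_values(theta, s)
a_t = greedy_action(theta, s)
n_params = sum(p.size for p in theta.values())
```

The parameter count grows with the state dimension rather than with the number of distinct states, so similar states share parameters and therefore share value estimates.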
Fitted Q-Learning - Gordon (1996).
• Regression Problem - Mean Squared Bellman/TD Error:

LM SBE = Es,a,r,s0 [(Q(s, a; ✓k )


<latexit sha1_base64="KitHtnxNhHkQ5o9eeiEU8Xz+PEg=">AAADEXichVHLahRBFD1pX0l8ZNSlm8IhOMI4dAdBQYQQneDCQEKcJDIzDlWdykwz/aK6JhCb+Qp/wx9wJ27dufPxKS48XXYEDZJqqu+959576j5UHkeF9f2vC96Fi5cuX1lcWr567fqNlcbNW3tFNjOh7oVZnJkDJQsdR6nu2cjG+iA3WiYq1vtq+qzy7x9rU0RZ+sqe5HqYyHEaHUWhtIRGjc1BIu0klHH5cj4qt3Y3unPxVDhQqbJLrGjLtmkX9+b91k6LxhMxsBNt5Wh6XzwQrynerA1Hjabf8d0RZ5WgVpqoz3bW+IYBDpEhxAwJNFJY6jEkCn59BPCRExuiJGaoRc6vMccyc2eM0oyQRKf8j2n1azSlXXEWLjvkKzGvYabAKu+mY1SMrl7V1AvKn7xvHTb+7wulY64qPKFUZFxyjFvELSaMOC8zqSNPazk/s+rK4giPXTcR68sdUvUZ/uF5To8hNnUega6LHJNDOfuYE0gpe6ygmvIpg3AdH1JKJ7VjSWtGST5DWU2f9XDNwb9LPavsrXUCvxPsPGyub9QLX8Qd3EWLW32EdbzANusI8R5f8B0/vHfeB++j9+l3qLdQ59zGX8f7/AvAfakN</latexit>
Yk ) 2 ]
0 0
Yk = r + max
0
Q(s , a ; ✓k )
<latexit sha1_base64="1wD1k29/nTB246kxSB2PhWZe0ik=">AAADCnichVHLahRBFD1pX8n4GnXppnCQiShDtwgJiBifuBEScJJIOjTVnUpP0U+qa4JxmD/wN/wBd+LWnVtd6Le48HTZETRIqqm+955776n7iOtcN9b3vy94p06fOXtucal3/sLFS5f7V65uNtXUJGqcVHlltmPZqFyXamy1zdV2bZQs4lxtxdmT1r91oEyjq/KVPazVbiHTUu/rRFpCUf/h6ygTD4QRt0WYyqKQIizkm2gmhyLUZWvYSSLz2aP5XGwsN8M7Qg7vi9BOlJVRdqsX9Qf+yHdHHFeCThmgO+tV/wdC7KFCgikKKJSw1HNINPx2EMBHTWwXM2KGmnZ+hTl6zJ0ySjFCEs34T2ntdGhJu+VsXHbCV3Jew0yBm7zPHWPM6PZVRb2h/Mn71mHpf1+YOea2wkPKmIxLjvElcYsJI07KLLrIo1pOzmy7stjHqutGs77aIW2fyR+ep/QYYpnzCDxzkSk5YmcfcAIl5ZgVtFM+YhCu4z1K6aRyLGXHKMlnKNvpsx6uOfh3qceVzbujwB8FG/cGa4+7hS/iOm5gmVtdwRpeYJ11JHiPL/iKb94774P30fv0O9Rb6HKu4a/jff4Fng+lXg==</latexit>
a 2A

• Update Weights with SGD + Backprop:

✓k+1 = ✓k + ↵(Yk
<latexit sha1_base64="2+NFUqS7sEStguF1toJB6Jodjms=">AAADJHichVFNSxxBEH1OjN+ajR5zabIEVozLTBAUgiAmSi4BBdcPXFl6xnZ3mN6ZYaZXMMv+nfyN/IHcQg7mkJvX+Ac8+KadDaiIPfRU1atXr6u7/FSHuXHdyxHnxejLsfGJyanpmdm5V5XX8/t50ssC1QgSnWSHvsyVDmPVMKHR6jDNlOz6Wh340acif3CusjxM4j1zkaqTrmzH4VkYSEOoVTlqmo4ystWPlryBWBdlGIkl0ZQ67cjaEYNlsVvL38uPw+ziomjG0tesG0KDh5RWperWXbvEY8crnSrKtZNU/qCJUyQI0EMXCjEMfQ2JnN8xPLhIiZ2gTyyjF9q8wgBTrO2RpciQRCP+24yOSzRmXGjmtjrgKZo7Y6XAO+5tq+iTXZyq6Oe0N9zfLNZ+8oS+VS46vKD1qThpFb8SN+iQ8Vxlt2QOe3m+sriVwRnW7G1C9pdapLhn8F/nMzMZschmBLYss00N38bnfIGYtsEOilceKgh741Naaa2yKnGpKKmX0Ravz344Zu/hUB87+x/qnlv3dleqG5vlwCfwBm9R41RXsYEv2GEfAX7gCv9w7Xx3fjq/nN93VGekrFnAveX8vQWVULCI</latexit>
Q(s, a; ✓k ))r✓k Q(s, a; ✓k )

• No convergence guarantees
• Wasteful online updates
• Weight updates change targets
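A minimal sketch of the target and weight update above, assuming a linear Q-function $Q(s, a; \theta) = \theta_a^\top \phi(s)$, so that the gradient with respect to $\theta_a$ is simply the feature vector $\phi(s)$. The function names, the feature vector `phi`, and the default values of `gamma` and `alpha` are illustrative assumptions, not taken from the slides.

```python
def q_value(theta, phi, a):
    """Q(s, a; theta) for a linear model: dot product of theta[a] and phi(s)."""
    return sum(w * x for w, x in zip(theta[a], phi))


def td_target(theta, r, phi_next, gamma):
    """Y_k = r + gamma * max_a' Q(s', a'; theta_k)."""
    return r + gamma * max(q_value(theta, phi_next, a) for a in range(len(theta)))


def sgd_step(theta, phi, a, r, phi_next, gamma=0.99, alpha=0.1):
    """One update theta_{k+1} = theta_k + alpha * (Y_k - Q) * grad Q."""
    y = td_target(theta, r, phi_next, gamma)   # target is treated as a constant
    td_error = y - q_value(theta, phi, a)
    new_theta = [row[:] for row in theta]      # copy, leave the input untouched
    # For a linear Q, the gradient w.r.t. theta[a] is phi(s); other rows get zero.
    new_theta[a] = [w + alpha * td_error * x for w, x in zip(theta[a], phi)]
    return new_theta, td_error


theta = [[0.0, 0.0], [0.0, 0.0]]               # two actions, two features
theta, delta = sgd_step(theta, phi=[1.0, 0.0], a=0, r=1.0,
                        phi_next=[0.0, 1.0], gamma=0.9, alpha=0.5)
```

Because the target $Y_k$ is computed from the same $\theta_k$ that is being updated, every step also moves the target, which is the third issue listed above.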
DQNs - Mnih et al (2013, 2015).

Wasteful online updates → Store & reuse past transitions.

• Experience Replay: $\mathcal{D} = \{\langle s, a, r, s' \rangle\}$ ("Dataset")

\[ L_{MSBE} = \mathbb{E}_{s, a, r, s' \sim U(\mathcal{D})} \left[ \left( Q(s, a; \theta_k) - Y_k \right)^2 \right] \]

[Figure: replay buffer holding the transitions $\langle s_1, a_1, r_2, s_2 \rangle$, $\langle s_2, a_2, r_3, s_3 \rangle$, ..., $\langle s_t, a_t, r_{t+1}, s_{t+1} \rangle$, ..., $\langle s_N, a_N, r_{N+1}, s_{N+1} \rangle$, $\langle s_{N+1}, a_{N+1}, r_{N+2}, s_{N+2} \rangle$]
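The replay dataset can be sketched as a fixed-capacity buffer with uniform sampling. This is a minimal illustration, assuming first-in-first-out eviction once the capacity is reached; the class name `ReplayBuffer`, its methods, and the `seed` argument are hypothetical and not taken from the slides.

```python
import random
from collections import deque


class ReplayBuffer:
    """Stores transitions (s, a, r, s_next) and samples them uniformly."""

    def __init__(self, capacity, seed=0):
        # deque with maxlen drops the oldest transition when full (assumed FIFO).
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, s, a, r, s_next):
        self.storage.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement: s, a, r, s' ~ U(D).
        return self.rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(s=t, a=0, r=1.0, s_next=t + 1)
batch = buffer.sample(batch_size=2)   # two stored transitions, drawn uniformly
```

Each sampled mini-batch is then used to estimate the loss $L_{MSBE}$, so a single transition can contribute to many gradient steps instead of one.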


DQNs - Mnih et al (2013, 2015).

Weight updates change targets → Slowly changing target network.

• Target Networks: $Y_k = r + \gamma \max_{a' \in \mathcal{A}} Q(s', a'; \theta_k^-)$
• Update every C iterations: $k \bmod C = 0: \theta_k^- \leftarrow \theta_k$
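The periodic hard update can be sketched as follows. This is only an illustration: the parameters are a plain list, the "SGD step" is a stand-in increment, and the names `sync_target`, `theta_target`, and `history` are assumptions.

```python
import copy


def sync_target(k, theta, theta_target, C):
    """Return the target parameters to use after iteration k."""
    if k % C == 0:
        return copy.deepcopy(theta)   # hard update: theta^- <- theta
    return theta_target               # otherwise keep the frozen copy


theta = [0.0]
theta_target = copy.deepcopy(theta)
history = []
for k in range(1, 7):
    theta[0] += 1.0                   # stand-in for one SGD step on theta
    theta_target = sync_target(k, theta, theta_target, C=3)
    history.append((k, theta[0], theta_target[0]))
```

Between synchronisations the targets $Y_k$ are computed from the frozen copy, so they stay fixed while the online weights move.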
An Intermezzo - Reverse Mode Automatic Differentiation.
[Figure: network diagram. Forward pass: input `in` → flatten → ($W_0, b_0$) → hidden layer $h_1$ → ($W_1, b_1$) → `out` → arg max → "Left". Backward-pass labels: $\nabla_{out}$ and $\propto W_1^T \nabla_{h_1}$.]
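A hand-written backward pass for a two-layer network of the kind pictured can make the reverse-mode idea concrete. This is a sketch under several assumptions not stated on the slide: a ReLU hidden layer, a squared-error loss on the output of a single action `a`, and pure-Python lists instead of an array library. All function names are illustrative.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward(params, x):
    W0, b0, W1, b1 = params
    pre = [z + b for z, b in zip(matvec(W0, x), b0)]    # pre-activation
    h1 = [max(0.0, z) for z in pre]                     # ReLU (assumed)
    out = [z + b for z, b in zip(matvec(W1, h1), b1)]   # one Q-value per action
    return pre, h1, out


def loss(params, x, a, y):
    out = forward(params, x)[2]
    return 0.5 * (out[a] - y) ** 2


def backward(params, x, a, y):
    """Propagate gradients from the loss back to every parameter."""
    W0, b0, W1, b1 = params
    pre, h1, out = forward(params, x)
    g_out = [0.0] * len(out)
    g_out[a] = out[a] - y                               # dL/d out
    g_W1 = [[g * h for h in h1] for g in g_out]         # outer(g_out, h1)
    # dL/d h1 = W1^T g_out
    g_h1 = [sum(W1[i][j] * g_out[i] for i in range(len(W1)))
            for j in range(len(h1))]
    g_pre = [g if z > 0 else 0.0 for g, z in zip(g_h1, pre)]   # through ReLU
    g_W0 = [[g * xi for xi in x] for g in g_pre]        # outer(g_pre, x)
    return g_W0, g_pre, g_W1, g_out                     # bias grads: g_pre, g_out
```

One forward pass followed by one backward pass yields the gradient for all weights at once, which is why reverse mode suits networks with many parameters and a scalar loss.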
Super-Human Performance on ATARI 57 Benchmark.
Mnih et al., 2015

• Reward clipping
• Down-sampling/grey scaling
• Frame skipping/concatenation
• Centered RMSProp
• Number of iterations between target updates
• ER buffer capacity

• Structured Exploration
• Long-Term Credit Assignment
• Sample Efficiency
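Three of the preprocessing items from the first list above (reward clipping, down-sampling/grey scaling, frame concatenation) can be sketched in a few lines. This is an illustrative sketch only: the channel-mean grey conversion, the stride-based down-sampling, and the names `clip_reward`, `to_gray`, `downsample`, and `FrameStack` are assumptions, not the exact pipeline of Mnih et al.

```python
from collections import deque


def clip_reward(r):
    """Clip rewards to [-1, 1] so the error scale is comparable across games."""
    return max(-1.0, min(1.0, r))


def to_gray(frame):
    """frame: rows of (R, G, B) tuples. Uses a plain channel mean (assumed)."""
    return [[sum(px) / 3.0 for px in row] for row in frame]


def downsample(gray, factor):
    """Keep every `factor`-th pixel in both dimensions (assumed scheme)."""
    return [row[::factor] for row in gray[::factor]]


class FrameStack:
    """Concatenates the k most recent frames into one observation."""

    def __init__(self, k):
        self.frames = deque(maxlen=k)

    def push(self, frame):
        if not self.frames:
            # At episode start, repeat the first frame k times.
            self.frames.extend([frame] * self.frames.maxlen)
        else:
            self.frames.append(frame)
        return list(self.frames)
```

Stacking several frames gives the network access to motion information that a single frame does not contain.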
A Toy Gridworld Example.
Some DQN Hyperparameter Intuition.
Prioritized ER - Schaul et al (2016).

Memory Replay = Computation + Storage


[Figure: replay memory holding the transitions $\langle s_1, a_1, r_1, s_2 \rangle$, ..., $\langle s_t, a_t, r_t, s_{t+1} \rangle$, ..., $\langle s_N, a_N, r_N, s_{N+1} \rangle$; labels: "Sample Uniform" and "Sliding Window"]

• Prioritization - Sample in proportion to learning progress: $|\delta|$ (the absolute TD error)

• Rank-based: $p_i = \frac{1}{\mathrm{rank}(i)}$
• Proportional: $p_i = |\delta_i| + \epsilon$
\[ P(i) = \frac{p_i^\alpha}{\sum_k p_k^\alpha} \]
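The two priority definitions and the resulting sampling distribution $P(i)$ can be sketched as below. This is a naive O(N) illustration, not the efficient data structure used in practice; the function names and the default `eps` are assumptions.

```python
import random


def proportional_priorities(td_errors, eps=1e-6):
    """p_i = |delta_i| + eps; eps keeps zero-error transitions sampleable."""
    return [abs(d) + eps for d in td_errors]


def rank_based_priorities(td_errors):
    """p_i = 1 / rank(i), where rank 1 is the largest |delta_i|."""
    order = sorted(range(len(td_errors)), key=lambda i: -abs(td_errors[i]))
    p = [0.0] * len(td_errors)
    for rank, i in enumerate(order, start=1):
        p[i] = 1.0 / rank
    return p


def sampling_probabilities(priorities, alpha):
    """P(i) = p_i^alpha / sum_k p_k^alpha; alpha = 0 recovers uniform sampling."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(probs, batch_size, rng):
    """Draw transition indices with replacement according to P(i)."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


td_errors = [0.5, -2.0, 1.0]
probs = sampling_probabilities(proportional_priorities(td_errors), alpha=0.6)
indices = sample_indices(probs, batch_size=4, rng=random.Random(0))
```

The exponent $\alpha$ interpolates between uniform replay ($\alpha = 0$) and fully greedy prioritization (large $\alpha$).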
Prioritized Experience Replay - Schaul et al (2016).
\[ s, a, r, s' \sim U(\mathcal{D}) \;\neq\; s, a, r, s' \sim P(\mathcal{D}) \]

• Solution: Importance sampling! $w_i = \left( \frac{1}{N} \cdot \frac{1}{P(i)} \right)^\beta$
• Starting from 0.4, linearly anneal $\beta$ to 1.
\[ L^{IS}_{MSBE} = \mathbb{E}_{s, a, r, s' \sim P} \left[ w \left( Q(s, a; \theta_k) - Y_k \right)^2 \right] \]
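The importance-sampling correction and the $\beta$ schedule can be sketched as follows. The function names, the `total_steps` horizon, and the optional normalisation by the largest weight (a stabilisation detail that does not appear on the slide) are assumptions made for illustration.

```python
def is_weights(probs, beta, normalize=False):
    """w_i = (1/N * 1/P(i))^beta, optionally scaled so that max_i w_i = 1."""
    n = len(probs)
    w = [(1.0 / (n * p)) ** beta for p in probs]
    if normalize:
        m = max(w)
        w = [x / m for x in w]
    return w


def beta_schedule(step, total_steps, beta_start=0.4, beta_end=1.0):
    """Linearly anneal beta from beta_start to beta_end, then hold."""
    frac = min(1.0, step / total_steps)
    return beta_start + frac * (beta_end - beta_start)


def weighted_msbe(weights, q_values, targets):
    """Mini-batch estimate of E[w * (Q - Y)^2]."""
    return sum(w * (q - y) ** 2
               for w, q, y in zip(weights, q_values, targets)) / len(weights)


probs = [0.25, 0.75]                       # P(i) for a two-element buffer
w = is_weights(probs, beta=beta_schedule(step=0, total_steps=100))
batch_loss = weighted_msbe(w, q_values=[1.0, 2.0], targets=[0.0, 0.0])
```

With $\beta = 1$ the weights fully compensate for the non-uniform sampling; with uniform $P(i) = 1/N$ every weight equals 1 and the loss reduces to the unweighted $L_{MSBE}$.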
The One Equation Summary.
Motivation: Curse of Dimensionality in large state spaces + efficiency

Fitted Q-L. Yk = r + max Q(s 0 0


, a ; ✓k )
0 a 2A
<latexit sha1_base64="1wD1k29/nTB246kxSB2PhWZe0ik=">AAADCnichVHLahRBFD1pX8n4GnXppnCQiShDtwgJiBifuBEScJJIOjTVnUpP0U+qa4JxmD/wN/wBd+LWnVtd6Le48HTZETRIqqm+955776n7iOtcN9b3vy94p06fOXtucal3/sLFS5f7V65uNtXUJGqcVHlltmPZqFyXamy1zdV2bZQs4lxtxdmT1r91oEyjq/KVPazVbiHTUu/rRFpCUf/h6ygTD4QRt0WYyqKQIizkm2gmhyLUZWvYSSLz2aP5XGwsN8M7Qg7vi9BOlJVRdqsX9Qf+yHdHHFeCThmgO+tV/wdC7KFCgikKKJSw1HNINPx2EMBHTWwXM2KGmnZ+hTl6zJ0ySjFCEs34T2ntdGhJu+VsXHbCV3Jew0yBm7zPHWPM6PZVRb2h/Mn71mHpf1+YOea2wkPKmIxLjvElcYsJI07KLLrIo1pOzmy7stjHqutGs77aIW2fyR+ep/QYYpnzCDxzkSk5YmcfcAIl5ZgVtFM+YhCu4z1K6aRyLGXHKMlnKNvpsx6uOfh3qceVzbujwB8FG/cGa4+7hS/iOm5gmVtdwRpeYJ11JHiPL/iKb94774P30fv0O9Rb6HKu4a/jff4Fng+lXg==</latexit>

0 0
DQN Yk = r + max
0
Q(s , a ; ✓k ) + ER
<latexit sha1_base64="r9AwJBx8MBnFYR6/ZWzLtWavi5A=">AAADC3ichVHLahRBFD1pX0l8ZNSlm8JBJqIO3SIoiDA+cSMk4CSRdGyqO5Weop9U1wTjMJ/gb/gD7sStO7cK+i0uPF12BA2Saqrvvefee+o+4jrXjfX97wveiZOnTp9ZXFo+e+78hZXexUsbTTU1iRonVV6ZrVg2KtelGlttc7VVGyWLOFebcfa49W/uK9PoqnxpD2q1U8i01Hs6kZZQ1Bu9ijLxQBhxQ4SpLAopwkK+iWZyIEJdtoadJDKfPZzPxfpqM7gp5OC+CO1EWRllr29dj3p9f+i7I44qQaf00Z21qvcDIXZRIcEUBRRKWOo5JBp+2wjgoya2gxkxQ007v8Icy8ydMkoxQhLN+E9pbXdoSbvlbFx2wldyXsNMgWu8zxxjzOj2VUW9ofzJ+9Zh6X9fmDnmtsIDypiMS47xBXGLCSOOyyy6yMNajs9su7LYwz3XjWZ9tUPaPpM/PE/oMcQy5xF46iJTcsTO3ucESsoxK2infMggXMe7lNJJ5VjKjlGSz1C202c9XHPw71KPKhu3h4E/DNbv9EePuoUv4gquYpVbvYsRnmONdSR4jy/4im/eO++D99H79DvUW+hyLuOv433+BQ6xpek=</latexit>
a 2A

Double DQN: $Y_k^{DDQN} = r + \gamma Q(s', \arg\max_{a \in \mathcal{A}} Q(s', a; \theta_k); \theta_k^-)$
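
The Double DQN target separates action selection (online parameters $\theta_k$) from action evaluation (target parameters $\theta_k^-$). A minimal NumPy sketch of that split follows; the function name `double_dqn_targets`, the `dones` mask, and the toy Q-values are assumptions for illustration only.

```python
import numpy as np


def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Y = r + gamma * Q(s', argmax_a Q(s', a; theta); theta^-) for a batch.

    next_q_online: shape (B, num_actions), online-network Q-values at s'
    next_q_target: shape (B, num_actions), target-network Q-values at s'
    dones:         shape (B,), 1.0 where s' is terminal (assumption)
    """
    best_actions = next_q_online.argmax(axis=1)         # select with theta
    batch_idx = np.arange(len(rewards))
    evaluated = next_q_target[batch_idx, best_actions]  # evaluate with theta^-
    return rewards + gamma * (1.0 - dones) * evaluated


rewards = np.array([1.0, 0.0])
next_q_online = np.array([[0.2, 0.1], [0.0, 1.0]])
next_q_target = np.array([[0.5, 2.0], [1.0, 3.0]])
dones = np.array([0.0, 0.0])
targets = double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.9)
```

In the first toy transition the online network prefers action 0, so the target uses the target network's value 0.5 rather than its maximum 2.0, which is how the overestimation of the plain max is dampened.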

PER (Prioritized Experience Replay): $p_i = |\delta_i| + \epsilon$, $P(i) = \frac{p_i^\alpha}{\sum_k p_k^\alpha}$, $w_i = \left( \frac{1}{N} \cdot \frac{1}{P(i)} \right)^\beta$
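
The three PER quantities (priority, sampling probability, importance weight) can be sketched in a few lines of NumPy. The function names, default values for `alpha`, `beta`, and `eps`, and the toy TD errors are assumptions for this example; real implementations usually store priorities in a sum-tree for efficiency and often also divide the weights by their maximum, neither of which is shown here.

```python
import numpy as np


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """P(i) = p_i^alpha / sum_k p_k^alpha with p_i = |delta_i| + eps."""
    priorities = np.abs(td_errors) + eps
    scaled = priorities ** alpha
    return scaled / scaled.sum()


def importance_weights(probs, beta=0.4):
    """w_i = (1/N * 1/P(i))^beta, correcting for non-uniform sampling."""
    n = len(probs)
    return (1.0 / (n * probs)) ** beta


# Toy TD errors for a buffer holding three transitions.
td_errors = np.array([1.0, -1.0, 2.0])
probs = per_probabilities(td_errors, alpha=1.0, eps=0.0)
weights = importance_weights(probs, beta=1.0)

# Draw a minibatch of indices according to P(i).
rng = np.random.default_rng(0)
sampled = rng.choice(len(probs), size=2, p=probs)
```

Setting `alpha=0.0` recovers uniform sampling, in which case every importance weight equals 1.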

Dueling DQN: $Q(s, a; \theta_1, \theta_2, \theta_3) = V(s; \theta_1, \theta_3) + \left( A(s, a; \theta_1, \theta_2) - \frac{1}{|\mathcal{A}|} \sum_{a' \in \mathcal{A}} A(s, a'; \theta_1, \theta_2) \right)$
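
The dueling aggregation combines a state-value stream and a mean-centred advantage stream into Q-values. The sketch below shows only that final aggregation step in NumPy; the function name `dueling_q` and the toy inputs are assumptions, and the two streams would normally be the outputs of network heads on top of a shared torso.

```python
import numpy as np


def dueling_q(values, advantages):
    """Q(s, a) = V(s) + (A(s, a) - mean_a' A(s, a')) for a batch.

    values:     shape (B,) or (B, 1), output of the value stream
    advantages: shape (B, num_actions), output of the advantage stream
    """
    values = np.reshape(values, (-1, 1))  # broadcast V(s) across actions
    centred = advantages - advantages.mean(axis=1, keepdims=True)
    return values + centred


values = np.array([1.0, -2.0])
advantages = np.array([[1.0, 3.0], [0.0, 0.0]])
q_values = dueling_q(values, advantages)
```

Subtracting the mean advantage makes the decomposition identifiable: the average of `q_values` over actions equals the value stream's output for each state.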

Many Open Questions: Intrinsic Motivation, Partial Observability, Multi-Agent, Transfer
