Dlvu Lecture12


Lecture 12: Transformers

Peter Bloem
Deep Learning 2020

dlvu.github.io

THE PLAN

part one: self-attention

part two: transformers

part three: famous transformers

part four: advanced tricks

A recurrent neural network is any neural network that has a cycle in it.

PART ONE: SELF-ATTENTION

RECAP: SEQUENCE-TO-SEQUENCE LAYERS

[Figure: a sequence-to-sequence (s2s) layer mapping a sequence of inputs to a sequence of outputs along the time axis]


Defining property: can handle sequences of different lengths with the same parameters.

Versatile: label-to-sequence, sequence-to-label, sequence-to-sequence, autoregressive training.

Causal or non-causal: causal models can only look backward.

RECURRENT CONNECTIONS, CONVOLUTIONS

[Figure: an RNN layer with hidden states h0 ... h4 (sequential processing) next to a Conv1D layer (finite "memory")]

We've seen two examples of (non-trivial) sequence-to-sequence layers so far: recurrent neural networks and convolutions. RNNs have the benefit that they can potentially look infinitely far back into the sequence, but they require fundamentally sequential processing, making them slow. Convolutions don't have this drawback (we can compute each output vector in parallel if we want to), but the downside is that they are limited in how far back they can look into the sequence.
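The trade-off described above can be made concrete with a toy sketch. This is only an illustration under stated assumptions: the scalar RNN, the two-tap causal kernel, and the names `rnn_layer` and `conv_layer` are hypothetical and not taken from the lecture. The sketch changes only the first input of a sequence and checks whether the last output notices.

```python
import math

def rnn_layer(xs, w_in=0.5, w_rec=0.5):
    """Toy scalar RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h, hs = 0.0, []
    for x in xs:
        # Step t needs the hidden state from step t-1, so this loop
        # cannot be parallelised over time.
        h = math.tanh(w_in * x + w_rec * h)
        hs.append(h)
    return hs

def conv_layer(xs, kernel=(0.5, 0.5)):
    """Toy causal 1D convolution, zero-padded on the left."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(xs)
    # Each output depends only on k inputs and not on other outputs,
    # so all positions could be computed in parallel.
    return [
        sum(w * x for w, x in zip(kernel, padded[t:t + k]))
        for t in range(len(xs))
    ]

base = [0.0, 0.0, 0.0, 0.0, 0.0]
bumped = [1.0, 0.0, 0.0, 0.0, 0.0]  # differs only in the first input

# Does the last output change when only the first input changes?
rnn_changed = rnn_layer(base)[-1] != rnn_layer(bumped)[-1]
conv_changed = conv_layer(base)[-1] != conv_layer(bumped)[-1]
print(rnn_changed, conv_changed)
```

In this toy setting the RNN's last state still depends on the first input, while the convolution's last output only sees the final two inputs, which is the finite "memory" the notes refer to.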
Self-attention is another sequence-to-sequence
layer, and one which provides us with the best of
both worlds: parallel processing and a potentially
infinite memory.

SELF-ATTENTION

Best of both worlds: parallel computation and long dependencies.

Simple self-attention: the basic idea


Practical self-attention: adding some bells and whistles.

We’ll explain the name later.


SELF-ATTENTION

[Figure: a self-attention layer mapping inputs x1 ... x6 to outputs y1 ... y6]

$y_i = \sum_j w_{ij} x_j$ with $\sum_j w_{ij} = 1$

At heart, the operation of self-attention is very simple. Every output is simply a weighted sum over the inputs. The trick is that the weights in this sum are not parameters. They are derived from the inputs.

Note that this means that the input and output dimensions of a self-attention layer are always the same. If we want to transform to a different dimension, we'll need to add a projection layer.
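As a hedged illustration of the weighted sum y_i = sum_j w_ij x_j, here is a minimal pure-Python sketch. The function name `weighted_sum_layer` and the weight matrix are made up for illustration. In particular, the weights are fixed by hand here purely to show the arithmetic, whereas self-attention derives them from the inputs.

```python
def weighted_sum_layer(xs, weights):
    """Compute y_i = sum_j w_ij * x_j.

    xs      : list of input vectors (lists of floats), all of the same length
    weights : one row per output; each row holds one weight per input
              and is assumed to sum to 1
    """
    dim = len(xs[0])
    ys = []
    for row in weights:
        # Mix the input vectors coordinate by coordinate.
        y = [sum(w * x[d] for w, x in zip(row, xs)) for d in range(dim)]
        ys.append(y)
    return ys

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [            # hand-picked for illustration; every row sums to 1
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.25, 0.25, 0.5],
]
ys = weighted_sum_layer(xs, weights)
print(ys)
```

Every output is a mixture of input vectors, so each output has the same dimension as the inputs (2 in this example), matching the remark about input and output dimensions.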
$w'_{ij} = x_i^T x_j$

$w_{ij} = \frac{\exp w'_{ij}}{\sum_j \exp w'_{ij}}$

[Figure: outputs y1 ... y6 computed from inputs x1 ... x6 using multiplications (×) and a softmax]

w31xw
1 x
322w
xw331xw
33 4wx32
5x
34ww
1w
x 2w
x33
631 35 w
xw
32xw
334
36w
4wx
335xw
35 w
xwx36w
634 ww
x 35xw
wwx36w
xw
xwx w
ww xwwx xww
x x x wx xw x x <latexit sha1_base64="UwzBCGnFqsCXotGJ+NPMfcVd3xo=">AAAN4XicfZdbb9s2GIbV7tRl9Zaul70RFhQYhiCQEh9RFKgl2+jF2mZBTlsUBBRNK4JpkaAox66g62F3w273B/Zrdrvt34zyQQeKsq4+8H356uEn0aZciv2QG8Z/jx5/8ulnn3/x5Mu9r542vv5m/9m3lyGJGEQXkGDCrl0QIuwH6IL7HKNryhCYuRhduVM71a/miIU+Cc75kqLbGfACf+JDwMXQ3b7lQDd2Fsmd6bzalsd5eZKXzbxs5WXbeXW3f2AcGatLrxbmpjjQNtfp3bOnv+45YwKjGQo4xCAMb0yD8tsYMO5DjJI9JwoRBXAKPHQT8Un3NvYDGnEUwER/KbRJhHVO9HRB+thnCHK8FAWAzBcJOrwHDEAulr1XjgpRAGYoPBzPfRquy3DurQsORM9u48Wqp0lpYuwxQO99uCiRxWAWzgC/rwyGy5lbHkQRRmw+Kw+mlIJRci4Qg36Y9uBUNOYDTR9TeE5ON/r9kt6jIEziiOGkOFEIiDE0ERNXZYh4ROPVYsS7MQ1fcxahw7Rcjb0eADY9Q+NDkVMaKONMMAG8PORKy1hMRK/TfgXoAZLZDATj2KFJ7HC04LFzeJQI8aXuYuF3CWDjsvMsiWMnbaPr6mfCWhLfF8T3sjgsiMPNTQge6xPC9Ll4JQgLdWHUhYX5EIXl2RfZ7Il+IUdfFsRLWbwqiFey6EYFNaqo84I6r6gPBfVBVhcFcSGLy4K4rOTygspl9WNB/CiL17tu+vOum/4ixYqnI3bRUmx8NBG/Vat3Lp7CJH57/u7HJO6trs2bEiHdLBuhuzWejNqdXieRZbzVm6OuaQ2qemboWH3TVhkyR79jG4NhznIseTNow2gP+205CuJc7/bsUVXPYY1+Z2ApDDntyG4OO5v2IRRIVi/z2b12s9IWL8vpWZbV6lX1zGA1bbt7rDBkDnswGPTtFQqNGMVI8tKtsd1uGdUomgV1jXazr9DzB2B0LasCSwss1sgyB+aKhSOAJSfPXha72+8N5SCe99/q23blAfJC+7u2OWgqDDlra9AanqxICAOBJ3eFZO1rd7onXTmKZEGjjniCFRaS32nUs4xOhYUUWEaWbaWNLe9EscluzNt4/YOc7zv9wNSTRDaLjSab0723NUterDDjerfSvsuvnoBr4asrhbAuHirCYS0MVLHAeniohIe74L2q36uL9xThXi2Mp2Lx6uE9Jby3C55W/bQunirCaS0MVbHQeniqhKe74HnVz+viuSKc18JwFQuvh+dKeL4LnlT9pC6eKMJJLQxRsZB6eKKEJ2V48eeRHvMB1tNjMsG6H2wOBqXfLJoeH6ZQnCTX7vXCB0h8LjD0ThwrPogzLhBnvB9iBzBv5geJ+HzwnMO02mUEi61RVOLTxZQ/VKrF5fGR2ToyfmoevLE2HzFPtBfad9r3mql1tDfaW+1Uu9Cg9pf2t/aP9m8DNn5r/N74Y219/Ggz57lWuhp//g+5Exzc</latexit>
<latexit sha1_base64="0qRstZIKf4JOUFLBsGr5ufSVrM0=">AAAN7XicfZdNb9s2HMbV7q3L6jXdjrsICwoMQxBIiV8xFKgl2ehhTbMgb1tsBBRNK4IpkaAox66g8z7BbsOu+wL7Mtt1+yCjbEeWKMo6kXwePvrpL9EmXYr9iBvG30+efvTxJ59+9uzzvS+eN758sf/yq6uIxAyiS0gwYTcuiBD2Q3TJfY7RDWUIBC5G1+7MzvTrOWKRT8ILvqRoHAAv9Kc+BFwM3e2fjiBJHtK75MRMRz/kneNi56TYaRY7rWKnLTp3+wfGkbG69GrD3DQOtM11dvfy+a97owmBcYBCDjGIolvToHycAMZ9iFG6N4ojRAGcAQ/dxnzaHSd+SGOOQpjqr4Q2jbHOiZ49nD7xGYIcL0UDQOaLBB3eAwYgFyXYK0dFKAQBig4nc59G62Y099YNDkT9xsliVd+0NDHxGKD3PlyUyBIQRAHg95XBaBm45UEUY8TmQXkwoxSMknOBGPSjrAZnojDvafbKogtyttHvl/QehVGaxAynxYlCQIyhqZi4akaIxzRZPYz4TmbRa85idJg1V2OvHcBm52hyKHJKA2WcKSaAl4dc6TEWU1HrrF4heoAkCEA4SUY0TUYcLXgyOjxKhfhKd7HwuwSwSdl5nibJKCuj6+rnwloSTwviqSwOCuJgcxOCJ/qUMH0uPgnCIl0YdWFhPkRRefZlPnuqX8rRVwXxShavC+K1LLpxQY0r6rygzivqQ0F9kNVFQVzI4rIgLiu5vKByWf1QED/I4s2um/6866a/SLHi7YhVtBQLH03F79bqm0tmME3eXrz7MU16q2vzpcRIN8tG6D4aT4btTq+TyjJ+1JvDrmk5VT03dKy+aasMuaPfsQ1nsGU5lrw5tGG0B/22HAXxVu/27GFV38Ia/Y5jKQxb2qHdHHQ25UMolKxe7rN77WalLF6e07Msq9Wr6rnBatp291hhyB224zh9e4VCY0Yxkrz00dhut4xqFM2Duka72Vfo2xdgdC2rAksLLNbQMh1zxcIRwJKT5x+L3e33BnIQ39bf6tt25QXyQvm7tuk0FYYta8tpDU5WJISB0JOrQvLytTvdk64cRfKgYUe8wQoL2d5p2LOMToWFFFiGlm1lhS2vRLHIbs1xsv5B3q47/cDU01Q2i4Umm7O192iWvFhhxvVupX2XXz0B18JXnxTCunioCIe1MFDFAuvhoRIe7oL3qn6vLt5ThHu1MJ6KxauH95Tw3i54WvXTuniqCKe1MFTFQuvhqRKe7oLnVT+vi+eKcF4Lw1UsvB6eK+H5LnhS9ZO6eKIIJ7UwRMVC6uGJEp6U4cWfR7bNB1jPtskE63642RiUfrNotn2YQbGTXLvXD+4gcVxg6J3YVrwXe1wg9njfJyPAvMAPU3F88EaHWWuXESwejaIlji6mfFCpNq6Oj8zWkfFT8+CNtTnEPNO+0b7VvtNMraO90d5qZ9qlBrW/tH+0f7X/GqTxW+P3xh9r69Mnmzlfa6Wr8ef/TP8idg==</latexit>

<latexit sha1_base64="UwzBCGnFqsCXotGJ+NPMfcVd3xo=">AAAN4XicfZdbb9s2GIbV7tRl9Zaul70RFhQYhiCQEh9RFKgl2+jF2mZBTlsUBBRNK4JpkaAox66g62F3w273B/Zrdrvt34zyQQeKsq4+8H356uEn0aZciv2QG8Z/jx5/8ulnn3/x5Mu9r542vv5m/9m3lyGJGEQXkGDCrl0QIuwH6IL7HKNryhCYuRhduVM71a/miIU+Cc75kqLbGfACf+JDwMXQ3b7lQDd2Fsmd6bzalsd5eZKXzbxs5WXbeXW3f2AcGatLrxbmpjjQNtfp3bOnv+45YwKjGQo4xCAMb0yD8tsYMO5DjJI9JwoRBXAKPHQT8Un3NvYDGnEUwER/KbRJhHVO9HRB+thnCHK8FAWAzBcJOrwHDEAulr1XjgpRAGYoPBzPfRquy3DurQsORM9u48Wqp0lpYuwxQO99uCiRxWAWzgC/rwyGy5lbHkQRRmw+Kw+mlIJRci4Qg36Y9uBUNOYDTR9TeE5ON/r9kt6jIEziiOGkOFEIiDE0ERNXZYh4ROPVYsS7MQ1fcxahw7Rcjb0eADY9Q+NDkVMaKONMMAG8PORKy1hMRK/TfgXoAZLZDATj2KFJ7HC04LFzeJQI8aXuYuF3CWDjsvMsiWMnbaPr6mfCWhLfF8T3sjgsiMPNTQge6xPC9Ll4JQgLdWHUhYX5EIXl2RfZ7Il+IUdfFsRLWbwqiFey6EYFNaqo84I6r6gPBfVBVhcFcSGLy4K4rOTygspl9WNB/CiL17tu+vOum/4ixYqnI3bRUmx8NBG/Vat3Lp7CJH57/u7HJO6trs2bEiHdLBuhuzWejNqdXieRZbzVm6OuaQ2qemboWH3TVhkyR79jG4NhznIseTNow2gP+205CuJc7/bsUVXPYY1+Z2ApDDntyG4OO5v2IRRIVi/z2b12s9IWL8vpWZbV6lX1zGA1bbt7rDBkDnswGPTtFQqNGMVI8tKtsd1uGdUomgV1jXazr9DzB2B0LasCSwss1sgyB+aKhSOAJSfPXha72+8N5SCe99/q23blAfJC+7u2OWgqDDlra9AanqxICAOBJ3eFZO1rd7onXTmKZEGjjniCFRaS32nUs4xOhYUUWEaWbaWNLe9EscluzNt4/YOc7zv9wNSTRDaLjSab0723NUterDDjerfSvsuvnoBr4asrhbAuHirCYS0MVLHAeniohIe74L2q36uL9xThXi2Mp2Lx6uE9Jby3C55W/bQunirCaS0MVbHQeniqhKe74HnVz+viuSKc18JwFQuvh+dKeL4LnlT9pC6eKMJJLQxRsZB6eKKEJ2V48eeRHvMB1tNjMsG6H2wOBqXfLJoeH6ZQnCTX7vXCB0h8LjD0ThwrPogzLhBnvB9iBzBv5geJ+HzwnMO02mUEi61RVOLTxZQ/VKrF5fGR2ToyfmoevLE2HzFPtBfad9r3mql1tDfaW+1Uu9Cg9pf2t/aP9m8DNn5r/N74Y219/Ggz57lWuhp//g+5Exzc</latexit>
<latexit sha1_base64="0qRstZIKf4JOUFLBsGr5ufSVrM0=">AAAN7XicfZdNb9s2HMbV7q3L6jXdjrsICwoMQxBIiV8xFKgl2ehhTbMgb1tsBBRNK4IpkaAox66g8z7BbsOu+wL7Mtt1+yCjbEeWKMo6kXwePvrpL9EmXYr9iBvG30+efvTxJ59+9uzzvS+eN758sf/yq6uIxAyiS0gwYTcuiBD2Q3TJfY7RDWUIBC5G1+7MzvTrOWKRT8ILvqRoHAAv9Kc+BFwM3e2fjiBJHtK75MRMRz/kneNi56TYaRY7rWKnLTp3+wfGkbG69GrD3DQOtM11dvfy+a97owmBcYBCDjGIolvToHycAMZ9iFG6N4ojRAGcAQ/dxnzaHSd+SGOOQpjqr4Q2jbHOiZ49nD7xGYIcL0UDQOaLBB3eAwYgFyXYK0dFKAQBig4nc59G62Y099YNDkT9xsliVd+0NDHxGKD3PlyUyBIQRAHg95XBaBm45UEUY8TmQXkwoxSMknOBGPSjrAZnojDvafbKogtyttHvl/QehVGaxAynxYlCQIyhqZi4akaIxzRZPYz4TmbRa85idJg1V2OvHcBm52hyKHJKA2WcKSaAl4dc6TEWU1HrrF4heoAkCEA4SUY0TUYcLXgyOjxKhfhKd7HwuwSwSdl5nibJKCuj6+rnwloSTwviqSwOCuJgcxOCJ/qUMH0uPgnCIl0YdWFhPkRRefZlPnuqX8rRVwXxShavC+K1LLpxQY0r6rygzivqQ0F9kNVFQVzI4rIgLiu5vKByWf1QED/I4s2um/6866a/SLHi7YhVtBQLH03F79bqm0tmME3eXrz7MU16q2vzpcRIN8tG6D4aT4btTq+TyjJ+1JvDrmk5VT03dKy+aasMuaPfsQ1nsGU5lrw5tGG0B/22HAXxVu/27GFV38Ia/Y5jKQxb2qHdHHQ25UMolKxe7rN77WalLF6e07Msq9Wr6rnBatp291hhyB224zh9e4VCY0Yxkrz00dhut4xqFM2Duka72Vfo2xdgdC2rAksLLNbQMh1zxcIRwJKT5x+L3e33BnIQ39bf6tt25QXyQvm7tuk0FYYta8tpDU5WJISB0JOrQvLytTvdk64cRfKgYUe8wQoL2d5p2LOMToWFFFiGlm1lhS2vRLHIbs1xsv5B3q47/cDU01Q2i4Umm7O192iWvFhhxvVupX2XXz0B18JXnxTCunioCIe1MFDFAuvhoRIe7oL3qn6vLt5ThHu1MJ6KxauH95Tw3i54WvXTuniqCKe1MFTFQuvhqRKe7oLnVT+vi+eKcF4Lw1UsvB6eK+H5LnhS9ZO6eKIIJ7UwRMVC6uGJEp6U4cWfR7bNB1jPtskE63642RiUfrNotn2YQbGTXLvXD+4gcVxg6J3YVrwXe1wg9njfJyPAvMAPU3F88EaHWWuXESwejaIlji6mfFCpNq6Oj8zWkfFT8+CNtTnEPNO+0b7VvtNMraO90d5qZ9qlBrW/tH+0f7X/GqTxW+P3xh9r69Mnmzlfa6Wr8ef/TP8idg==</latexit> <latexit 
sha1_base64="0qRstZIKf4JOUFLBsGr5ufSVrM0=">AAAN7XicfZdNb9s2HMbV7q3L6jXdjrsICwoMQxBIiV8xFKgl2ehhTbMgb1tsBBRNK4IpkaAox66g8z7BbsOu+wL7Mtt1+yCjbEeWKMo6kXwePvrpL9EmXYr9iBvG30+efvTxJ59+9uzzvS+eN758sf/yq6uIxAyiS0gwYTcuiBD2Q3TJfY7RDWUIBC5G1+7MzvTrOWKRT8ILvqRoHAAv9Kc+BFwM3e2fjiBJHtK75MRMRz/kneNi56TYaRY7rWKnLTp3+wfGkbG69GrD3DQOtM11dvfy+a97owmBcYBCDjGIolvToHycAMZ9iFG6N4ojRAGcAQ/dxnzaHSd+SGOOQpjqr4Q2jbHOiZ49nD7xGYIcL0UDQOaLBB3eAwYgFyXYK0dFKAQBig4nc59G62Y099YNDkT9xsliVd+0NDHxGKD3PlyUyBIQRAHg95XBaBm45UEUY8TmQXkwoxSMknOBGPSjrAZnojDvafbKogtyttHvl/QehVGaxAynxYlCQIyhqZi4akaIxzRZPYz4TmbRa85idJg1V2OvHcBm52hyKHJKA2WcKSaAl4dc6TEWU1HrrF4heoAkCEA4SUY0TUYcLXgyOjxKhfhKd7HwuwSwSdl5nibJKCuj6+rnwloSTwviqSwOCuJgcxOCJ/qUMH0uPgnCIl0YdWFhPkRRefZlPnuqX8rRVwXxShavC+K1LLpxQY0r6rygzivqQ0F9kNVFQVzI4rIgLiu5vKByWf1QED/I4s2um/6866a/SLHi7YhVtBQLH03F79bqm0tmME3eXrz7MU16q2vzpcRIN8tG6D4aT4btTq+TyjJ+1JvDrmk5VT03dKy+aasMuaPfsQ1nsGU5lrw5tGG0B/22HAXxVu/27GFV38Ia/Y5jKQxb2qHdHHQ25UMolKxe7rN77WalLF6e07Msq9Wr6rnBatp291hhyB224zh9e4VCY0Yxkrz00dhut4xqFM2Duka72Vfo2xdgdC2rAksLLNbQMh1zxcIRwJKT5x+L3e33BnIQ39bf6tt25QXyQvm7tuk0FYYta8tpDU5WJISB0JOrQvLytTvdk64cRfKgYUe8wQoL2d5p2LOMToWFFFiGlm1lhS2vRLHIbs1xsv5B3q47/cDU01Q2i4Umm7O192iWvFhhxvVupX2XXz0B18JXnxTCunioCIe1MFDFAuvhoRIe7oL3qn6vLt5ThHu1MJ6KxauH95Tw3i54WvXTuniqCKe1MFTFQuvhqRKe7oLnVT+vi+eKcF4Lw1UsvB6eK+H5LnhS9ZO6eKIIJ7UwRMVC6uGJEp6U4cWfR7bNB1jPtskE63642RiUfrNotn2YQbGTXLvXD+4gcVxg6J3YVrwXe1wg9njfJyPAvMAPU3F88EaHWWuXESwejaIlji6mfFCpNq6Oj8zWkfFT8+CNtTnEPNO+0b7VvtNMraO90d5qZ9qlBrW/tH+0f7X/GqTxW+P3xh9

x1 x2 x3 x4 xx15 xx26 x3 x4xx15xx26x3 x4xx15xx26x3 x4xx15xx26x3 x4xx15xx26x3 x4 x5 x6


<latexit sha1_base64="UwzBCGnFqsCXotGJ+NPMfcVd3xo=">AAAN4XicfZdbb9s2GIbV7tRl9Zaul70RFhQYhiCQEh9RFKgl2+jF2mZBTlsUBBRNK4JpkaAox66g62F3w273B/Zrdrvt34zyQQeKsq4+8H356uEn0aZciv2QG8Z/jx5/8ulnn3/x5Mu9r542vv5m/9m3lyGJGEQXkGDCrl0QIuwH6IL7HKNryhCYuRhduVM71a/miIU+Cc75kqLbGfACf+JDwMXQ3b7lQDd2Fsmd6bzalsd5eZKXzbxs5WXbeXW3f2AcGatLrxbmpjjQNtfp3bOnv+45YwKjGQo4xCAMb0yD8tsYMO5DjJI9JwoRBXAKPHQT8Un3NvYDGnEUwER/KbRJhHVO9HRB+thnCHK8FAWAzBcJOrwHDEAulr1XjgpRAGYoPBzPfRquy3DurQsORM9u48Wqp0lpYuwxQO99uCiRxWAWzgC/rwyGy5lbHkQRRmw+Kw+mlIJRci4Qg36Y9uBUNOYDTR9TeE5ON/r9kt6jIEziiOGkOFEIiDE0ERNXZYh4ROPVYsS7MQ1fcxahw7Rcjb0eADY9Q+NDkVMaKONMMAG8PORKy1hMRK/TfgXoAZLZDATj2KFJ7HC04LFzeJQI8aXuYuF3CWDjsvMsiWMnbaPr6mfCWhLfF8T3sjgsiMPNTQge6xPC9Ll4JQgLdWHUhYX5EIXl2RfZ7Il+IUdfFsRLWbwqiFey6EYFNaqo84I6r6gPBfVBVhcFcSGLy4K4rOTygspl9WNB/CiL17tu+vOum/4ixYqnI3bRUmx8NBG/Vat3Lp7CJH57/u7HJO6trs2bEiHdLBuhuzWejNqdXieRZbzVm6OuaQ2qemboWH3TVhkyR79jG4NhznIseTNow2gP+205CuJc7/bsUVXPYY1+Z2ApDDntyG4OO5v2IRRIVi/z2b12s9IWL8vpWZbV6lX1zGA1bbt7rDBkDnswGPTtFQqNGMVI8tKtsd1uGdUomgV1jXazr9DzB2B0LasCSwss1sgyB+aKhSOAJSfPXha72+8N5SCe99/q23blAfJC+7u2OWgqDDlra9AanqxICAOBJ3eFZO1rd7onXTmKZEGjjniCFRaS32nUs4xOhYUUWEaWbaWNLe9EscluzNt4/YOc7zv9wNSTRDaLjSab0723NUterDDjerfSvsuvnoBr4asrhbAuHirCYS0MVLHAeniohIe74L2q36uL9xThXi2Mp2Lx6uE9Jby3C55W/bQunirCaS0MVbHQeniqhKe74HnVz+viuSKc18JwFQuvh+dKeL4LnlT9pC6eKMJJLQxRsZB6eKKEJ2V48eeRHvMB1tNjMsG6H2wOBqXfLJoeH6ZQnCTX7vXCB0h8LjD0ThwrPogzLhBnvB9iBzBv5geJ+HzwnMO02mUEi61RVOLTxZQ/VKrF5fGR2ToyfmoevLE2HzFPtBfad9r3mql1tDfaW+1Uu9Cg9pf2t/aP9m8DNn5r/N74Y219/Ggz57lWuhp//g+5Exzc</latexit> <latexit 
sha1_base64="UwzBCGnFqsCXotGJ+NPMfcVd3xo=">AAAN4XicfZdbb9s2GIbV7tRl9Zaul70RFhQYhiCQEh9RFKgl2+jF2mZBTlsUBBRNK4JpkaAox66g62F3w273B/Zrdrvt34zyQQeKsq4+8H356uEn0aZciv2QG8Z/jx5/8ulnn3/x5Mu9r542vv5m/9m3lyGJGEQXkGDCrl0QIuwH6IL7HKNryhCYuRhduVM71a/miIU+Cc75kqLbGfACf+JDwMXQ3b7lQDd2Fsmd6bzalsd5eZKXzbxs5WXbeXW3f2AcGatLrxbmpjjQNtfp3bOnv+45YwKjGQo4xCAMb0yD8tsYMO5DjJI9JwoRBXAKPHQT8Un3NvYDGnEUwER/KbRJhHVO9HRB+thnCHK8FAWAzBcJOrwHDEAulr1XjgpRAGYoPBzPfRquy3DurQsORM9u48Wqp0lpYuwxQO99uCiRxWAWzgC/rwyGy5lbHkQRRmw+Kw+mlIJRci4Qg36Y9uBUNOYDTR9TeE5ON/r9kt6jIEziiOGkOFEIiDE0ERNXZYh4ROPVYsS7MQ1fcxahw7Rcjb0eADY9Q+NDkVMaKONMMAG8PORKy1hMRK/TfgXoAZLZDATj2KFJ7HC04LFzeJQI8aXuYuF3CWDjsvMsiWMnbaPr6mfCWhLfF8T3sjgsiMPNTQge6xPC9Ll4JQgLdWHUhYX5EIXl2RfZ7Il+IUdfFsRLWbwqiFey6EYFNaqo84I6r6gPBfVBVhcFcSGLy4K4rOTygspl9WNB/CiL17tu+vOum/4ixYqnI3bRUmx8NBG/Vat3Lp7CJH57/u7HJO6trs2bEiHdLBuhuzWejNqdXieRZbzVm6OuaQ2qemboWH3TVhkyR79jG4NhznIseTNow2gP+205CuJc7/bsUVXPYY1+Z2ApDDntyG4OO5v2IRRIVi/z2b12s9IWL8vpWZbV6lX1zGA1bbt7rDBkDnswGPTtFQqNGMVI8tKtsd1uGdUomgV1jXazr9DzB2B0LasCSwss1sgyB+aKhSOAJSfPXha72+8N5SCe99/q23blAfJC+7u2OWgqDDlra9AanqxICAOBJ3eFZO1rd7onXTmKZEGjjniCFRaS32nUs4xOhYUUWEaWbaWNLe9EscluzNt4/YOc7zv9wNSTRDaLjSab0723NUterDDjerfSvsuvnoBr4asrhbAuHirCYS0MVLHAeniohIe74L2q36uL9xThXi2Mp2Lx6uE9Jby3C55W/bQunirCaS0MVbHQeniqhKe74HnVz+viuSKc18JwFQuvh+dKeL4LnlT9pC6eKMJJLQxRsZB6eKKEJ2V48eeRHvMB1tNjMsG6H2wOBqXfLJoeH6ZQnCTX7vXCB0h8LjD0ThwrPogzLhBnvB9iBzBv5geJ+HzwnMO02mUEi61RVOLTxZQ/VKrF5fGR2ToyfmoevLE2HzFPtBfad9r3mql1tDfaW+1Uu9Cg9pf2t/aP9m8DNn5r/N74Y219/Ggz57lWuhp//g+5Exzc</latexit> <latexit 
sha1_base64="UwzBCGnFqsCXotGJ+NPMfcVd3xo=">AAAN4XicfZdbb9s2GIbV7tRl9Zaul70RFhQYhiCQEh9RFKgl2+jF2mZBTlsUBBRNK4JpkaAox66g62F3w273B/Zrdrvt34zyQQeKsq4+8H356uEn0aZciv2QG8Z/jx5/8ulnn3/x5Mu9r542vv5m/9m3lyGJGEQXkGDCrl0QIuwH6IL7HKNryhCYuRhduVM71a/miIU+Cc75kqLbGfACf+JDwMXQ3b7lQDd2Fsmd6bzalsd5eZKXzbxs5WXbeXW3f2AcGatLrxbmpjjQNtfp3bOnv+45YwKjGQo4xCAMb0yD8tsYMO5DjJI9JwoRBXAKPHQT8Un3NvYDGnEUwER/KbRJhHVO9HRB+thnCHK8FAWAzBcJOrwHDEAulr1XjgpRAGYoPBzPfRquy3DurQsORM9u48Wqp0lpYuwxQO99uCiRxWAWzgC/rwyGy5lbHkQRRmw+Kw+mlIJRci4Qg36Y9uBUNOYDTR9TeE5ON/r9kt6jIEziiOGkOFEIiDE0ERNXZYh4ROPVYsS7MQ1fcxahw7Rcjb0eADY9Q+NDkVMaKONMMAG8PORKy1hMRK/TfgXoAZLZDATj2KFJ7HC04LFzeJQI8aXuYuF3CWDjsvMsiWMnbaPr6mfCWhLfF8T3sjgsiMPNTQge6xPC9Ll4JQgLdWHUhYX5EIXl2RfZ7Il+IUdfFsRLWbwqiFey6EYFNaqo84I6r6gPBfVBVhcFcSGLy4K4rOTygspl9WNB/CiL17tu+vOum/4ixYqnI3bRUmx8NBG/Vat3Lp7CJH57/u7HJO6trs2bEiHdLBuhuzWejNqdXieRZbzVm6OuaQ2qemboWH3TVhkyR79jG4NhznIseTNow2gP+205CuJc7/bsUVXPYY1+Z2ApDDntyG4OO5v2IRRIVi/z2b12s9IWL8vpWZbV6lX1zGA1bbt7rDBkDnswGPTtFQqNGMVI8tKtsd1uGdUomgV1jXazr9DzB2B0LasCSwss1sgyB+aKhSOAJSfPXha72+8N5SCe99/q23blAfJC+7u2OWgqDDlra9AanqxICAOBJ3eFZO1rd7onXTmKZEGjjniCFRaS32nUs4xOhYUUWEaWbaWNLe9EscluzNt4/YOc7zv9wNSTRDaLjSab0723NUterDDjerfSvsuvnoBr4asrhbAuHirCYS0MVLHAeniohIe74L2q36uL9xThXi2Mp2Lx6uE9Jby3C55W/bQunirCaS0MVbHQeniqhKe74HnVz+viuSKc18JwFQuvh+dKeL4LnlT9pC6eKMJJLQxRsZB6eKKEJ2V48eeRHvMB1tNjMsG6H2wOBqXfLJoeH6ZQnCTX7vXCB0h8LjD0ThwrPogzLhBnvB9iBzBv5geJ+HzwnMO02mUEi61RVOLTxZQ/VKrF5fGR2ToyfmoevLE2HzFPtBfad9r3mql1tDfaW+1Uu9Cg9pf2t/aP9m8DNn5r/N74Y219/Ggz57lWuhp//g+5Exzc</latexit> <latexit 
sha1_base64="UwzBCGnFqsCXotGJ+NPMfcVd3xo=">AAAN4XicfZdbb9s2GIbV7tRl9Zaul70RFhQYhiCQEh9RFKgl2+jF2mZBTlsUBBRNK4JpkaAox66g62F3w273B/Zrdrvt34zyQQeKsq4+8H356uEn0aZciv2QG8Z/jx5/8ulnn3/x5Mu9r542vv5m/9m3lyGJGEQXkGDCrl0QIuwH6IL7HKNryhCYuRhduVM71a/miIU+Cc75kqLbGfACf+JDwMXQ3b7lQDd2Fsmd6bzalsd5eZKXzbxs5WXbeXW3f2AcGatLrxbmpjjQNtfp3bOnv+45YwKjGQo4xCAMb0yD8tsYMO5DjJI9JwoRBXAKPHQT8Un3NvYDGnEUwER/KbRJhHVO9HRB+thnCHK8FAWAzBcJOrwHDEAulr1XjgpRAGYoPBzPfRquy3DurQsORM9u48Wqp0lpYuwxQO99uCiRxWAWzgC/rwyGy5lbHkQRRmw+Kw+mlIJRci4Qg36Y9uBUNOYDTR9TeE5ON/r9kt6jIEziiOGkOFEIiDE0ERNXZYh4ROPVYsS7MQ1fcxahw7Rcjb0eADY9Q+NDkVMaKONMMAG8PORKy1hMRK/TfgXoAZLZDATj2KFJ7HC04LFzeJQI8aXuYuF3CWDjsvMsiWMnbaPr6mfCWhLfF8T3sjgsiMPNTQge6xPC9Ll4JQgLdWHUhYX5EIXl2RfZ7Il+IUdfFsRLWbwqiFey6EYFNaqo84I6r6gPBfVBVhcFcSGLy4K4rOTygspl9WNB/CiL17tu+vOum/4ixYqnI3bRUmx8NBG/Vat3Lp7CJH57/u7HJO6trs2bEiHdLBuhuzWejNqdXieRZbzVm6OuaQ2qemboWH3TVhkyR79jG4NhznIseTNow2gP+205CuJc7/bsUVXPYY1+Z2ApDDntyG4OO5v2IRRIVi/z2b12s9IWL8vpWZbV6lX1zGA1bbt7rDBkDnswGPTtFQqNGMVI8tKtsd1uGdUomgV1jXazr9DzB2B0LasCSwss1sgyB+aKhSOAJSfPXha72+8N5SCe99/q23blAfJC+7u2OWgqDDlra9AanqxICAOBJ3eFZO1rd7onXTmKZEGjjniCFRaS32nUs4xOhYUUWEaWbaWNLe9EscluzNt4/YOc7zv9wNSTRDaLjSab0723NUterDDjerfSvsuvnoBr4asrhbAuHirCYS0MVLHAeniohIe74L2q36uL9xThXi2Mp2Lx6uE9Jby3C55W/bQunirCaS0MVbHQeniqhKe74HnVz+viuSKc18JwFQuvh+dKeL4LnlT9pC6eKMJJLQxRsZB6eKKEJ2V48eeRHvMB1tNjMsG6H2wOBqXfLJoeH6ZQnCTX7vXCB0h8LjD0ThwrPogzLhBnvB9iBzBv5geJ+HzwnMO02mUEi61RVOLTxZQ/VKrF5fGR2ToyfmoevLE2HzFPtBfad9r3mql1tDfaW+1Uu9Cg9pf2t/aP9m8DNn5r/N74Y219/Ggz57lWuhp//g+5Exzc</latexit> <latexit 
sha1_base64="UwzBCGnFqsCXotGJ+NPMfcVd3xo=">AAAN4XicfZdbb9s2GIbV7tRl9Zaul70RFhQYhiCQEh9RFKgl2+jF2mZBTlsUBBRNK4JpkaAox66g62F3w273B/Zrdrvt34zyQQeKsq4+8H356uEn0aZciv2QG8Z/jx5/8ulnn3/x5Mu9r542vv5m/9m3lyGJGEQXkGDCrl0QIuwH6IL7HKNryhCYuRhduVM71a/miIU+Cc75kqLbGfACf+JDwMXQ3b7lQDd2Fsmd6bzalsd5eZKXzbxs5WXbeXW3f2AcGatLrxbmpjjQNtfp3bOnv+45YwKjGQo4xCAMb0yD8tsYMO5DjJI9JwoRBXAKPHQT8Un3NvYDGnEUwER/KbRJhHVO9HRB+thnCHK8FAWAzBcJOrwHDEAulr1XjgpRAGYoPBzPfRquy3DurQsORM9u48Wqp0lpYuwxQO99uCiRxWAWzgC/rwyGy5lbHkQRRmw+Kw+mlIJRci4Qg36Y9uBUNOYDTR9TeE5ON/r9kt6jIEziiOGkOFEIiDE0ERNXZYh4ROPVYsS7MQ1fcxahw7Rcjb0eADY9Q+NDkVMaKONMMAG8PORKy1hMRK/TfgXoAZLZDATj2KFJ7HC04LFzeJQI8aXuYuF3CWDjsvMsiWMnbaPr6mfCWhLfF8T3sjgsiMPNTQge6xPC9Ll4JQgLdWHUhYX5EIXl2RfZ7Il+IUdfFsRLWbwqiFey6EYFNaqo84I6r6gPBfVBVhcFcSGLy4K4rOTygspl9WNB/CiL17tu+vOum/4ixYqnI3bRUmx8NBG/Vat3Lp7CJH57/u7HJO6trs2bEiHdLBuhuzWejNqdXieRZbzVm6OuaQ2qemboWH3TVhkyR79jG4NhznIseTNow2gP+205CuJc7/bsUVXPYY1+Z2ApDDntyG4OO5v2IRRIVi/z2b12s9IWL8vpWZbV6lX1zGA1bbt7rDBkDnswGPTtFQqNGMVI8tKtsd1uGdUomgV1jXazr9DzB2B0LasCSwss1sgyB+aKhSOAJSfPXha72+8N5SCe99/q23blAfJC+7u2OWgqDDlra9AanqxICAOBJ3eFZO1rd7onXTmKZEGjjniCFRaS32nUs4xOhYUUWEaWbaWNLe9EscluzNt4/YOc7zv9wNSTRDaLjSab0723NUterDDjerfSvsuvnoBr4asrhbAuHirCYS0MVLHAeniohIe74L2q36uL9xThXi2Mp2Lx6uE9Jby3C55W/bQunirCaS0MVbHQeniqhKe74HnVz+viuSKc18JwFQuvh+dKeL4LnlT9pC6eKMJJLQxRsZB6eKKEJ2V48eeRHvMB1tNjMsG6H2wOBqXfLJoeH6ZQnCTX7vXCB0h8LjD0ThwrPogzLhBnvB9iBzBv5geJ+HzwnMO02mUEi61RVOLTxZQ/VKrF5fGR2ToyfmoevLE2HzFPtBfad9r3mql1tDfaW+1Uu9Cg9pf2t/aP9m8DNn5r/N74Y219/Ggz57lWuhp//g+5Exzc</latexit> <latexit 
sha1_base64="UwzBCGnFqsCXotGJ+NPMfcVd3xo=">AAAN4XicfZdbb9s2GIbV7tRl9Zaul70RFhQYhiCQEh9RFKgl2+jF2mZBTlsUBBRNK4JpkaAox66g62F3w273B/Zrdrvt34zyQQeKsq4+8H356uEn0aZciv2QG8Z/jx5/8ulnn3/x5Mu9r542vv5m/9m3lyGJGEQXkGDCrl0QIuwH6IL7HKNryhCYuRhduVM71a/miIU+Cc75kqLbGfACf+JDwMXQ3b7lQDd2Fsmd6bzalsd5eZKXzbxs5WXbeXW3f2AcGatLrxbmpjjQNtfp3bOnv+45YwKjGQo4xCAMb0yD8tsYMO5DjJI9JwoRBXAKPHQT8Un3NvYDGnEUwER/KbRJhHVO9HRB+thnCHK8FAWAzBcJOrwHDEAulr1XjgpRAGYoPBzPfRquy3DurQsORM9u48Wqp0lpYuwxQO99uCiRxWAWzgC/rwyGy5lbHkQRRmw+Kw+mlIJRci4Qg36Y9uBUNOYDTR9TeE5ON/r9kt6jIEziiOGkOFEIiDE0ERNXZYh4ROPVYsS7MQ1fcxahw7Rcjb0eADY9Q+NDkVMaKONMMAG8PORKy1hMRK/TfgXoAZLZDATj2KFJ7HC04LFzeJQI8aXuYuF3CWDjsvMsiWMnbaPr6mfCWhLfF8T3sjgsiMPNTQge6xPC9Ll4JQgLdWHUhYX5EIXl2RfZ7Il+IUdfFsRLWbwqiFey6EYFNaqo84I6r6gPBfVBVhcFcSGLy4K4rOTygspl9WNB/CiL17tu+vOum/4ixYqnI3bRUmx8NBG/Vat3Lp7CJH57/u7HJO6trs2bEiHdLBuhuzWejNqdXieRZbzVm6OuaQ2qemboWH3TVhkyR79jG4NhznIseTNow2gP+205CuJc7/bsUVXPYY1+Z2ApDDntyG4OO5v2IRRIVi/z2b12s9IWL8vpWZbV6lX1zGA1bbt7rDBkDnswGPTtFQqNGMVI8tKtsd1uGdUomgV1jXazr9DzB2B0LasCSwss1sgyB+aKhSOAJSfPXha72+8N5SCe99/q23blAfJC+7u2OWgqDDlra9AanqxICAOBJ3eFZO1rd7onXTmKZEGjjniCFRaS32nUs4xOhYUUWEaWbaWNLe9EscluzNt4/YOc7zv9wNSTRDaLjSab0723NUterDDjerfSvsuvnoBr4asrhbAuHirCYS0MVLHAeniohIe74L2q36uL9xThXi2Mp2Lx6uE9Jby3C55W/bQunirCaS0MVbHQeniqhKe74HnVz+viuSKc18JwFQuvh+dKeL4LnlT9pC6eKMJJLQxRsZB6eKKEJ2V48eeRHvMB1tNjMsG6H2wOBqXfLJoeH6ZQnCTX7vXCB0h8LjD0ThwrPogzLhBnvB9iBzBv5geJ+HzwnMO02mUEi61RVOLTxZQ/VKrF5fGR2ToyfmoevLE2HzFPtBfad9r3mql1tDfaW+1Uu9Cg9pf2t/aP9m8DNn5r/N74Y219/Ggz57lWuhp//g+5Exzc</latexit>

VECTORIZED

To vectorize this operation, we can concatenate the input and output sequences into matrices and perform the simple self-attention operation in three steps:

$W' = X^T X$
$W = \mathrm{softmax}(W')$
$Y = W X^T$

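The three steps above can be sketched in plain Python. This is a minimal illustration rather than the lecture's own code: the helper names (`transpose`, `matmul`, `softmax_rows`, `simple_self_attention`) and the toy numbers are assumptions. X is taken to hold one input vector per column, so that $X^T X$ contains all pairwise dot products, and the softmax is assumed to be applied to each row of $W'$.

```python
import math

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply matrices A (n x m) and B (m x p), both stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def softmax_rows(A):
    """Apply a numerically stable softmax to each row of A."""
    out = []
    for row in A:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def simple_self_attention(X):
    """X is k x t (one input vector per column). Returns W (t x t) and Y (t x k)."""
    Xt = transpose(X)          # t x k, one input vector per row
    W_raw = matmul(Xt, X)      # step 1: W' = X^T X, all pairwise dot products
    W = softmax_rows(W_raw)    # step 2: every row of W sums to one
    Y = matmul(W, Xt)          # step 3: Y = W X^T, row i is the output y_i
    return W, Y

# Toy input with made-up numbers: k = 2 features, t = 3 time steps.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
W, Y = simple_self_attention(X)
```

In practice one would use a tensor library instead of nested lists; the list version is used here only so that the sketch runs without dependencies.
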
TAKE NOTE

In simple self-attention, $w_{ii}$ (the weight from $x_i$ to $y_i$) usually has the most weight.
This is not a big problem, but we'll allow this to change later.

Simple self-attention has no parameters.


Whatever parameterized mechanism generates $x_i$ (like an embedding layer) drives the self-attention.

There is a linear operation between X and Y.


Non-vanishing gradients through $Y = WX^T$, vanishing gradients through $W = \mathrm{softmax}(X^T X)$.


TAKE NOTE

No problem looking far back into the sequence.


In fact, every input has the same distance to every output.

More of a set model than a sequence model. No access to the sequential information.
We'll fix this by encoding the sequential structure into the embeddings. Details later.

Permutation equivariant.
for any permutation p of the input: p(sa(X)) = sa(p(X))
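
The equivariance property can be checked numerically. The sketch below uses a loop-based version of simple self-attention and compares p(sa(X)) with sa(p(X)) for one fixed permutation. The function names, the toy vectors and the chosen permutation are assumptions made for illustration.

```python
import math

def simple_self_attention(xs):
    """xs is a list of t input vectors; returns the list of t output vectors."""
    ys = []
    for xi in xs:
        # Raw weights: dot product of x_i with every input x_j.
        raw = [sum(a * b for a, b in zip(xi, xj)) for xj in xs]
        # Softmax over j (shifted by the maximum for numerical stability).
        m = max(raw)
        exps = [math.exp(r - m) for r in raw]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output y_i: weighted sum of all inputs.
        k = len(xi)
        ys.append([sum(w * xj[d] for w, xj in zip(weights, xs)) for d in range(k)])
    return ys

def permute(seq, order):
    """Reorder a sequence: position n of the result holds seq[order[n]]."""
    return [seq[i] for i in order]

xs = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [-1.0, 0.5]]
order = [2, 0, 3, 1]

left = permute(simple_self_attention(xs), order)    # p(sa(X))
right = simple_self_attention(permute(xs, order))   # sa(p(X))

# Largest element-wise difference; only floating-point rounding should remain.
max_diff = max(abs(a - b) for u, v in zip(left, right) for a, b in zip(u, v))
```

The two results agree up to rounding error because each output only depends on the set of inputs and on its own input vector, not on where either sits in the sequence.
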


A LITTLE MORE INTUITION: DOT PRODUCTS

[Figure: a set of users and a set of movies, connected by "likes" links.]

To build some intuition for why self-attention works, we need to look into how dot products function. To do so, we'll leave the realm of sequence learning for a while and dip our toes briefly into the pool of recommendation.

Imagine that we have a set of users and a set of movies, with no features about any of them except an incomplete list of which user liked which movie. Our task is to predict which other movies a given user will like.

If we had features for each movie and user, we


could match them up like this. We multiply how
much the user likes romance by how much
user u movie m
<latexit sha1_base64="62Z/cMQEVE70aUQzqhSK3UNWDA0=">AAAOI3icfZfLbuM2GIWV6W2ajttMu2q7IRoMULRBICWOLygCjCXbmEVnJg1ym0ZBQNG0IpgSCYpy7BG0K9An6dN0V3TTRV+gT1HKkWWJkqJNCJ7D408/SYV0GPFCoev/bD354MOPPv7k6afbnz1rff7FzvMvL0IacYTPESWUXzkwxMQL8LnwBMFXjGPoOwRfOjMr1S/nmIceDc7EkuEbH7qBN/UQFLLrduc3G7mxLfBCxFGIOUhAbDtTECWJ/ROwEco0n849vBb9JAG2vW0jmqkhohzLzmOQpkXJrZGO9OVf8OO66yDrOth0HWZdh7c7u/q+vnpAtWFkjV0te05unz/7fdueUBT5OBCIwDC8NnQmbmLIhYcITrZt+TYMohl08XUkpr2b2AtYJHCAEvBCatOIAEFBWhIw8ThGgixlAyLuyQSA7iCHSMjCbZejQhxAH4d7k7nHwodmOHcfGgLKqt/Ei9WsJKWBscshu/PQokQWQz/0obirdIZL3yl34ohgPvfLnSmlZFScC8yRF6Y1OJGFecvSiQ7P6Emm3y3ZHQ7CJI44SYoDpYA5x1M5cNUMsYhYvHoZubpm4bHgEd5Lm6u+4yHks1M82ZM5pY4yzpRQKMpdjvIai6msdVqvAN8j6vswmMQ2S7LFZe/tJ1J8ARwi/Q6FfFJ2niZxbKdldBxwKq0l8U1BfKOKo4I4yn6EkgmYUg7mcklQHgJpBNLCPYTD8ujzfPQUnKvRFwXxQhUvC+KlKjpRQY0q6rygzivqfUG9V9VFQVyo4rIgLiu5oqAKVX1fEN+r4lVBvFLFdwXxnSr+qsTK2ZG7aCk3Pp7Kr91qzcUzlMSvzl7/nMT91ZOtlAgDo2xEztp4OO50+91Elclab497hjms6rmhaw4Mq86QOwZdSx+ONiwHijeH1vXOaNBRoxDZ6L2+Na7qG1h90B2aNYYN7dhqj7pZ+TAOFKub+6x+p10pi5vn9E3TPOpX9dxgti2rd1BjyB3WcDgcWCsUFnFGsOJla2Onc6RXo1ge1NM77UGNvpkAvWeaFVhWYDHHpjE0ViwCQ6I4Rb5YrN6gP1KDxKb+5sCyKhMoCuXvWcawXWPYsB4Nj0aHKxLKYeCqVaF5+Trd3mFPjaJ50LgrZ7DCQje/NO6berfCQgssY9My08KWd6LcZNfGTfzwQd7sO7BrgCRRzXKjqeZ0763NipfUmEmzu9b+mL9+AGmEr74pQk3xqCYcNcKgOhbUDI9q4dFj8G7V7zbFuzXhbiOMW8fiNsO7tfDuY/Cs6mdN8awmnDXCsDoW1gzPauHZY/Ci6hdN8aImXDTCiDoW0QwvauHFY/C06qdN8bQmnDbC0DoW2gxPa+FpGV7+80iP+ZCA9JhMCfCC7GBQ+max9PgwkxeYzP3w4kMsrwscv5bHirfyjAvlGe+H2Ibc9b0gkdcH195LW48Z4WJtlC15dTHUi0q1cXGwbxzt67+0d1+a2SXmqfat9p32vWZoXe2l9ko70c41pP23tbP19dY3rT9af7b+av39YH2ylY35Sis9rX//B6LUNMo=</latexit>

romance there is in the movie. If both are positive


score = u1 m1 + u2 m2 + u3 m3 has romance
of negative, the score is increased. If one is
has acbon
<latexit sha1_base64="62Z/cMQEVE70aUQzqhSK3UNWDA0=">AAAOI3icfZfLbuM2GIWV6W2ajttMu2q7IRoMULRBICWOLygCjCXbmEVnJg1ym0ZBQNG0IpgSCYpy7BG0K9An6dN0V3TTRV+gT1HKkWWJkqJNCJ7D408/SYV0GPFCoev/bD354MOPPv7k6afbnz1rff7FzvMvL0IacYTPESWUXzkwxMQL8LnwBMFXjGPoOwRfOjMr1S/nmIceDc7EkuEbH7qBN/UQFLLrduc3G7mxLfBCxFGIOUhAbDtTECWJ/ROwEco0n849vBb9JAG2vW0jmqkhohzLzmOQpkXJrZGO9OVf8OO66yDrOth0HWZdh7c7u/q+vnpAtWFkjV0te05unz/7fdueUBT5OBCIwDC8NnQmbmLIhYcITrZt+TYMohl08XUkpr2b2AtYJHCAEvBCatOIAEFBWhIw8ThGgixlAyLuyQSA7iCHSMjCbZejQhxAH4d7k7nHwodmOHcfGgLKqt/Ei9WsJKWBscshu/PQokQWQz/0obirdIZL3yl34ohgPvfLnSmlZFScC8yRF6Y1OJGFecvSiQ7P6Emm3y3ZHQ7CJI44SYoDpYA5x1M5cNUMsYhYvHoZubpm4bHgEd5Lm6u+4yHks1M82ZM5pY4yzpRQKMpdjvIai6msdVqvAN8j6vswmMQ2S7LFZe/tJ1J8ARwi/Q6FfFJ2niZxbKdldBxwKq0l8U1BfKOKo4I4yn6EkgmYUg7mcklQHgJpBNLCPYTD8ujzfPQUnKvRFwXxQhUvC+KlKjpRQY0q6rygzivqfUG9V9VFQVyo4rIgLiu5oqAKVX1fEN+r4lVBvFLFdwXxnSr+qsTK2ZG7aCk3Pp7Kr91qzcUzlMSvzl7/nMT91ZOtlAgDo2xEztp4OO50+91Elclab497hjms6rmhaw4Mq86QOwZdSx+ONiwHijeH1vXOaNBRoxDZ6L2+Na7qG1h90B2aNYYN7dhqj7pZ+TAOFKub+6x+p10pi5vn9E3TPOpX9dxgti2rd1BjyB3WcDgcWCsUFnFGsOJla2Onc6RXo1ge1NM77UGNvpkAvWeaFVhWYDHHpjE0ViwCQ6I4Rb5YrN6gP1KDxKb+5sCyKhMoCuXvWcawXWPYsB4Nj0aHKxLKYeCqVaF5+Trd3mFPjaJ50LgrZ7DCQje/NO6berfCQgssY9My08KWd6LcZNfGTfzwQd7sO7BrgCRRzXKjqeZ0763NipfUmEmzu9b+mL9+AGmEr74pQk3xqCYcNcKgOhbUDI9q4dFj8G7V7zbFuzXhbiOMW8fiNsO7tfDuY/Cs6mdN8awmnDXCsDoW1gzPauHZY/Ci6hdN8aImXDTCiDoW0QwvauHFY/C06qdN8bQmnDbC0DoW2gxPa+FpGV7+80iP+ZCA9JhMCfCC7GBQ+max9PgwkxeYzP3w4kMsrwscv5bHirfyjAvlGe+H2Ibc9b0gkdcH195LW48Z4WJtlC15dTHUi0q1cXGwbxzt67+0d1+a2SXmqfat9p32vWZoXe2l9ko70c41pP23tbP19dY3rT9af7b+av39YH2ylY35Sis9rX//B6LUNMo=</latexit>

has comedy user u movie m


positive and one is negative, the score is
decreased.
user u, movie m: $\text{score} = u_1 m_1 + u_2 m_2 + u_3 m_3$

[Figure: user features (likes romance, likes action, likes comedy) matched against the features of a movie. Slide annotation: "No features? Embedding vectors!"]

Note that we're not just taking into account the sign of the values, but also the magnitude. If a user's preference for action is near zero, it doesn't matter much for the score whether the movie has action.
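The scoring rule above can be sketched in a few lines of NumPy. The feature values below are made up purely for illustration (they are not from the lecture); the point is only that a near-zero preference makes the corresponding movie feature irrelevant to the score.

```python
import numpy as np

# Hypothetical feature values, chosen for illustration only.
# Feature order: romance, action, comedy.
user = np.array([0.9, 0.0, -0.8])       # likes romance, indifferent to action, dislikes comedy
movie_a = np.array([1.0, 1.0, -1.0])    # has romance and action, no comedy
movie_b = np.array([1.0, -1.0, -1.0])   # same as movie_a, but without action


def score(u, m):
    """Dot product: score = u1*m1 + u2*m2 + u3*m3."""
    return float(np.dot(u, m))


score_a = score(user, movie_a)
score_b = score(user, movie_b)

# The user's action preference is zero, so whether the movie has action
# makes no difference to the score.
print(score_a, score_b)
```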
As a simple example, let's build a sequence classifier consisting of just one embedding layer followed by a global max pooling layer. We'll imagine a sentiment classification task where the aim is to predict whether a restaurant review is positive or negative.

[Figure: inputs "this restaurant was not too terrible" -> embedding layer -> embeddings -> simple self-attention -> output sequence -> global max pooling.]

If we did this without the self-attention layer, we would essentially have a model where each word can only contribute to the output score independently of the others. This is known as a bag of words model. In this case, the word terrible would probably cause us to predict that this is a negative review. In order to see that it might be a positive review, we need to recognize that the meaning of the word terrible is moderated by the word not. This is what the self-attention can do for us.
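To make the bag of words behavior concrete, here is a minimal sketch of the classifier without the self-attention layer. The embedding table is random and untrained, and the final linear readout `w_out` is an assumption added here to turn the pooled vector into a single score; neither comes from the lecture. Because every word is embedded independently of its neighbors, reordering the words cannot change the result.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["this", "restaurant", "was", "not", "too", "terrible"]
word_to_index = {w: i for i, w in enumerate(vocab)}

k = 4                                            # embedding dimension (arbitrary choice)
embeddings = rng.normal(size=(len(vocab), k))    # untrained, random embedding table
w_out = rng.normal(size=k)                       # hypothetical linear readout (assumption)


def classify(words):
    x = embeddings[[word_to_index[w] for w in words]]   # embedding layer: (t, k)
    pooled = x.max(axis=0)                               # global max pooling over time: (k,)
    return float(pooled @ w_out)                         # scalar sentiment score


review = ["this", "restaurant", "was", "not", "too", "terrible"]
shuffled = ["terrible", "not", "this", "too", "was", "restaurant"]

# Both orderings give exactly the same score: "not" cannot modify "terrible".
print(classify(review), classify(shuffled))
```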

If the embedding vectors of not and terrible have a high dot product together, the weight of the input vector for not becomes high, allowing it to influence the meaning of the word terrible in the output sequence.

[Figure: the output y_terrible as a weighted sum over the input vectors of "this restaurant was not too terrible", with the weight on v_not derived from the dot product of v_terrible and v_not.]
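The weighted sum described here can be sketched as follows. This assumes the raw dot products are turned into weights with a row-wise softmax; the embeddings are random stand-ins rather than learned vectors, so the actual weight between terrible and not is not meaningful, only the mechanics are.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def simple_self_attention(x):
    """x: (t, k) sequence of input vectors. Returns outputs y (t, k) and weights w (t, t)."""
    raw = x @ x.T               # dot product of every pair of input vectors
    w = softmax(raw, axis=1)    # each row of weights sums to one
    y = w @ x                   # every output is a weighted sum over all inputs
    return y, w


words = ["this", "restaurant", "was", "not", "too", "terrible"]
rng = np.random.default_rng(0)
x = rng.normal(size=(len(words), 4))    # random stand-ins for learned embeddings
y, w = simple_self_attention(x)

i, j = words.index("terrible"), words.index("not")
weight_of_not = w[i, j]    # how much v_not contributes to y_terrible
print(weight_of_not)
```

With trained embeddings, a large dot product between the vectors for terrible and not would make `w[i, j]` large, which is the effect the notes describe.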

BELLS AND WHISTLES: STANDARD SELF-ATTENTION

Standard self-attention adds some bells and whistles to this basic framework. We'll discuss the three most important additions.

• scaled dot product
• key, value and query transformations
• multi-head attention
SCALED SELF-ATTENTION

Scaled self-attention is very simple: instead of using the dot product, we use the dot product divided by the square root of the input dimension. This ensures that the input and output of the self-attention operation have similar variance.

$$w'_{ij} = \frac{x_i^T x_j}{\sqrt{k}}$$

where k is the input dimension.

Why √k? Imagine a vector in ℝ^k with values all c. Its Euclidean length is √k·c. Therefore, we are dividing out the amount by which the increase in dimension increases the length of the average vector. Transformer models usually apply normalization at every layer, so we can usually assume that the input is standard-normally distributed.
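Both parts of this argument can be checked numerically. The sketch below first verifies the length of a constant vector, then estimates the variance of dot products between standard-normal vectors with and without the √k scaling. The sample size, seed and dimensions are arbitrary choices made for this illustration.

```python
import numpy as np

# 1. A vector in R^k with all entries equal to c has Euclidean length sqrt(k) * c.
c = 0.5
lengths = {k: float(np.linalg.norm(np.full(k, c))) for k in (4, 64, 1024)}

# 2. Dot products of standard-normal vectors have variance that grows with k;
#    dividing by sqrt(k) brings the variance back to roughly one.
rng = np.random.default_rng(0)
k, n = 64, 10_000
a = rng.normal(size=(n, k))
b = rng.normal(size=(n, k))
raw = (a * b).sum(axis=1)      # n unscaled dot products
scaled = raw / np.sqrt(k)      # n scaled dot products

raw_var = float(raw.var())
scaled_var = float(scaled.var())
print(lengths, raw_var, scaled_var)
```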

KEYS, QUERIES AND VALUES

In each self-attention computation, every input vector occurs in three distinct roles:
• the value: the vector that is used in the weighted sum that ultimately provides the output
• the query: the input vector that corresponds to the current output, matched against every input vector
• the key: the input vector that the query is matched against to determine the weight

[Figure: the computation of y_terrible over the inputs "this restaurant was not too terrible", with the vectors acting as the key, the query and the value labelled.]

ATTENTION AS A SOFT DICTIONARY

In a dictionary, all the operations are discrete: a query only matches a single key, and returns only the value corresponding to that key.

d = {'a': 1, 'b': 2, 'c': 3}
d['b']  # the query 'b' matches only the key 'b' and returns the value 2

key  value
a    1
b    2
c    3

ATTENTION AS A SOFT DICTIONARY

Attention is a soft dictionary:
• key, query and value are vectors
• every key matches the query to some extent, as determined by their dot product
• a mixture of all values is returned, with softmax-normalized dot products as mixture weights

If the weight of only one query/key pair is non-zero after the softmax (the limit in which one dot product is much larger than all the others), we recover the operation of a normal dictionary.

Self-attention: attention with keys, queries and values from the same set.
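The soft lookup described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's code: the function names `softmax` and `soft_lookup` and the toy one-hot keys are assumptions made for the example.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability; the result sums to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_lookup(query, keys, values):
    """Return a mixture of all values, weighted by softmax(query . key)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Hypothetical toy data: three one-hot keys, each with a one-dimensional value.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
values = [[1.0], [2.0], [3.0]]

# A moderate query mixes all three values, favoring the third.
soft = soft_lookup([0.0, 0.0, 1.0], keys, values)

# When one dot product dominates, the lookup behaves like a hard dictionary.
hard = soft_lookup([0.0, 0.0, 50.0], keys, values)
```

With the dominant query, the weights on the first two keys become vanishingly small, so the result is numerically the value stored under the third key.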
KEY, QUERY AND VALUE TRANSFORMATIONS

To give the self-attention some more flexibility in determining its behavior, we multiply each input vector by three different k-by-k parameter matrices, which gives us a different vector to act as key, query and value.

Note that this makes the self-attention operation a layer with parameters (where before it had none).

Slide: introduce matrices K, Q, V for linear transforms, and associated biases:

$$k_i = K x_i + b_k \qquad q_i = Q x_i + b_q \qquad v_i = V x_i + b_v$$
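As a rough sketch of how these three transformations fit into the self-attention operation, the plain-Python function below computes a key, query and value for every input vector and then mixes the values with softmax-normalized query/key dot products. The helper names (`matvec`, `add`, `softmax`, `self_attention`) and the toy inputs are assumptions for illustration, and the dot products are left unscaled here.

```python
import math

def matvec(M, x):
    # Matrix-vector product, with M given as a list of rows.
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(xs, K, Q, V, bk, bq, bv):
    """Self-attention over a list of input vectors xs."""
    keys = [add(matvec(K, x), bk) for x in xs]      # k_i = K x_i + b_k
    queries = [add(matvec(Q, x), bq) for x in xs]   # q_i = Q x_i + b_q
    vals = [add(matvec(V, x), bv) for x in xs]      # v_i = V x_i + b_v
    ys = []
    for q in queries:
        # One weight per key, normalized over the whole sequence.
        weights = softmax([sum(a * b for a, b in zip(q, key)) for key in keys])
        ys.append([sum(w * v[d] for w, v in zip(weights, vals))
                   for d in range(len(vals[0]))])
    return ys

# Hypothetical toy input: three vectors of dimension k = 2.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
ys = self_attention(xs, identity, identity, identity, zeros, zeros, zeros)
```

The output has one vector per input vector, and each output is a weighted average of the value vectors.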

MULTI-HEAD ATTENTION

In many sentences, there are different relations to model. Here, the meaning of the word “terrible” is inverted by “not” and moderated by “too”. Its relation to the word “restaurant” is completely different: it describes a property of the restaurant.

The idea behind multi-head self-attention is that multiple relations are best captured by different self-attention operations.

[Slide figure: the sentence "this restaurant was not too terrible", with the relations "inverts", "moderates" and "property of" marked.]

MULTI-HEAD SELF-ATTENTION

The idea of multi-head attention is that we project the input sequence down to several lower-dimensional sequences, and apply a separate low-dimensional self-attention to each of these. After this, we concatenate their outputs, and apply another linear transformation (biases not shown).

[Slide figure: multi-head self-attention. The input is split by W1 and W2, fed to self-attention 1 (K1, Q1, V1) and self-attention 2 (K2, Q2, V2), and the outputs are concatenated and passed through Wo.]

IMPLEMENTATION NOTE

If we were to implement this scheme naively, we would first apply one transformation to split the input vector, and then apply some smaller transformations to turn this into a key, query and value. However, since these are all linear transformations, we can compose them into three larger transformations, and compute the keys, queries and values directly in the splitting operation. This shows that multi-head attention can be implemented with the same number of parameters as single-head attention on the same input dimension.

NB: the parameters of the Wo transformation are extra, but they are not strictly necessary.

Slide formula:

$$w'_{ij} = \frac{x_i^T x_j}{\sqrt{k}}$$

[Slide figure: in the naive scheme the input is first split into "head 1 input" and "head 2 input", which are then turned into the key, query and value of each head; in the composed scheme the input is mapped directly to "h1 key", "h1 query", "h1 value", "h2 key", "h2 query", "h2 value".]
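The composed scheme can be sketched as follows: three k-by-k matrices produce full-width keys, queries and values, which are then cut into one slice per head. This is an illustrative plain-Python sketch under stated assumptions, not the lecture's implementation: the function names are hypothetical, the Wo transformation and biases are omitted, the number of heads is assumed to divide k, and each head scales its dot products by the square root of the per-head dimension (an assumption; the slide formula writes the scale as the square root of k).

```python
import math

def matvec(M, x):
    # Matrix-vector product, with M given as a list of rows.
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    # Scaled dot-product self-attention on lists of equal-length vectors.
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, key)) / scale
                           for key in keys])
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
    return outputs

def multi_head_attention(xs, K, Q, V, heads):
    size = len(xs[0]) // heads            # per-head dimension
    keys = [matvec(K, x) for x in xs]     # all heads' keys in one product
    queries = [matvec(Q, x) for x in xs]
    values = [matvec(V, x) for x in xs]
    outputs = [[] for _ in xs]
    for h in range(heads):
        lo, hi = h * size, (h + 1) * size
        head_out = attend([q[lo:hi] for q in queries],
                          [key[lo:hi] for key in keys],
                          [v[lo:hi] for v in values])
        for out, part in zip(outputs, head_out):
            out.extend(part)              # concatenate the head outputs
    return outputs

def projection_parameters(k, heads):
    # Each head uses three k-by-(k / heads) matrices.
    return heads * 3 * k * (k // heads)

# Hypothetical toy input: two vectors of dimension k = 4, two heads.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
xs = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 2.0]]
ys = multi_head_attention(xs, identity, identity, identity, heads=2)
```

`projection_parameters(k, heads)` evaluates to 3k² for any number of heads that divides k, which is the parameter-count argument made in the note above.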
RECAP

Self-attention: sequence-to-sequence layer with
• parallel computation
• perfect long-term memory

Fundamentally a set-to-set layer, no access to the sequential structure of the input.

A large part of the behavior comes from the parameters upstream.


PART TWO: TRANSFORMERS

transformer:
Any sequence-based model that primarily uses self-attention to propagate
information along the time dimension.

more broadly:
Any model that primarily uses self-attention to propagate information
between the basic units of our instances.
pixels -> image transformer
graph nodes -> graph transformer

TRANSFORMER BLOCK

The basic building block of transformer models is usually a simple transformer block.

The details differ per transformer, but the basic ingredients are usually: one self-attention, one feed-forward layer applied individually to each token in the sequence, and a layer normalization and residual connection for each.

Note that the self-attention is the only operation in the block that propagates information across the time dimension. The other layers operate on each token independently.

from torch import nn

class Block(nn.Module):
    # self.layernorm, self.attention and self.linear are assumed to be
    # created in __init__ (not shown on the slide).

    def forward(self, x):
        y = self.layernorm(x)   # layer normalization
        y = self.attention(y)   # self-attention
        x = x + y               # residual connection

        y = self.layernorm(x)   # layer normalization
        y = self.linear(y)      # feed-forward, applied to each token
        return x + y            # residual connection

[Slide figure, bottom to top: layer normalization, self-attention, + (res), layer normalization, feed-forward, + (res).]

LAYER NORMALIZATION

Layer normalization is like batch normalization, except that it normalizes along a different dimension of the batch tensor. For each individual vector representing a token, the mean and variance are computed over that vector's own d features.

$\{x^{bt}\}_{b,t}$: input vectors (one per timestep t and batch instance b) in $\mathbb{R}^d$
$\{y^{bt}\}_{b,t}$: output vectors (one per timestep t and batch instance b) in $\mathbb{R}^d$
$\gamma, \beta$: learnable parameter vectors

$$\mu^{bt} = \frac{1}{d} \sum_i x_i^{bt} \qquad \text{(mean over token)}$$
$$\sigma^{bt} = \frac{1}{d} \sum_i (x_i^{bt} - \mu^{bt})^2 \qquad \text{(variance over token)}$$
$$\hat{x}^{bt} = \frac{x^{bt} - \mu^{bt}}{\sqrt{\sigma^{bt} + \epsilon}} \qquad \text{(standardize)}$$
$$y^{bt} = \gamma \otimes \hat{x}^{bt} + \beta \qquad \text{(rescale, elementwise)}$$

[Slide figure: the batch tensor with axes "time", "batch" and "input features"; each token vector is normalized to N(0, 1).]

Note that this does not propagate information across the time dimension. That is still reserved for the self-attention only.

While layer normalization tends to work a little less well than batch normalization, the great benefit here is that its behavior doesn’t depend on the batch size. This is important, because transformer models are often so big that we can only train on single-instance batches. We can accumulate the gradients, but the forward pass
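The layer normalization equations on the slide can be sketched per token vector in plain Python. This is a minimal illustration: the function name `layer_norm`, the value of `eps` and the toy batch are assumptions for the example, and gamma and beta are fixed here rather than learned.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one token vector x over its own d features."""
    d = len(x)
    mu = sum(x) / d                                # mean over token
    var = sum((xi - mu) ** 2 for xi in x) / d      # variance over token
    return [g * (xi - mu) / math.sqrt(var + eps) + b   # standardize, rescale
            for xi, g, b in zip(x, gamma, beta)]

# Hypothetical toy batch: batch[b][t] is one token vector with d = 4.
batch = [
    [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 10.0, 10.0]],
]
gamma = [1.0, 1.0, 1.0, 1.0]
beta = [0.0, 0.0, 0.0, 0.0]

# Every token is normalized on its own: no statistics are shared across
# timesteps or across instances in the batch.
normed = [[layer_norm(x, gamma, beta) for x in seq] for seq in batch]
```

Because the function only ever sees one token vector at a time, its output for a token cannot depend on the batch size or on the other tokens in the sequence.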

Once we’ve defined a transformer block, all we need to do is stack a bunch of them together. Then, if we have a sequence-to-label task, we just need one global pooling operation and we have a sequence-to-label model.

[Slide figure, bottom to top: input embeddings, transformer block, transformer block, transformer block, output sequence, global sum/avg/max pooling.]
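The global pooling step can be sketched as a function that collapses the output sequence of the last transformer block into a single vector, whatever the sequence length. The function name `global_pool` and the toy output sequence are assumptions made for this example.

```python
def global_pool(sequence, mode="avg"):
    """Collapse a sequence of equal-length vectors into a single vector."""
    columns = list(zip(*sequence))   # one tuple per feature, across time
    if mode == "sum":
        return [sum(c) for c in columns]
    if mode == "avg":
        return [sum(c) / len(c) for c in columns]
    if mode == "max":
        return [max(c) for c in columns]
    raise ValueError("mode must be 'sum', 'avg' or 'max'")

# Hypothetical output sequence: three timesteps, two features each.
outputs = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
pooled = global_pool(outputs, "avg")
```

The pooled vector has the same size for any sequence length, so a single linear layer on top of it can produce the label.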

WHAT ABOUT AUTOREGRESSIVE MODELS?

What about autoregressive modeling?

If we do this naively, we have a problem: the self-attention operation can just look ahead in the sequence to predict what the next token will be. We will never learn to predict the future from the past. In short, the transformer block is not a causal sequence-to-sequence operation.

[Slide figure: a stack of three transformer blocks marked "not causal", with the character sequence "h e l l o ! !" shown as inputs at the bottom and as targets at the top.]
MASKING: MAKING SELF-ATTENTION CAUSAL

The solution is simple: when we compute the attention weights, we mask out any attention from
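One common way to implement such a mask can be sketched as follows. The sketch assumes the standard causal convention, in which the output at position i may only attend to positions j <= i, and it applies the mask by setting the disallowed raw scores to minus infinity before the softmax. The function name `causal_attention_weights` and the toy scores are hypothetical.

```python
import math

def causal_attention_weights(raw_scores):
    """Row-wise softmax over raw_scores[i][j], with positions j > i masked."""
    weights = []
    for i, row in enumerate(raw_scores):
        # Assumption: position i may only look at positions j <= i.
        masked = [s if j <= i else float("-inf") for j, s in enumerate(row)]
        top = max(masked)                 # finite, since j == i is never masked
        exps = [math.exp(s - top) for s in masked]   # exp(-inf) is 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical raw scores for a sequence of length 3 (all equal).
raw = [[0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
w = causal_attention_weights(raw)
```

Each row still sums to one, but all weight above the diagonal is exactly zero, so no output can depend on later inputs.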
