L09 MLP Slides
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching
Lecture 09
Multilayer Perceptrons
Sebastian Raschka STAT 453: Intro to Deep Learning 1
Today
We will finally be able to solve the XOR problem ...
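To make the XOR claim concrete, here is a minimal Python sketch of a 2-2-1 network with sigmoid units. The weights are hand-picked (a hypothetical choice, not taken from the lecture). One hidden unit approximates OR and the other approximates AND, and the output fires for "OR but not AND", which is exactly XOR. No training is involved; the point is only that such weights exist for a network with a hidden layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hand-picked weights (not from the lecture).
# Hidden unit 1 ~ OR(x1, x2), hidden unit 2 ~ AND(x1, x2).
W1 = [[20.0, 20.0],
      [20.0, 20.0]]
b1 = [-10.0, -30.0]
# Output ~ "OR and not AND", i.e. XOR.
w2 = [20.0, -20.0]
b2 = -10.0

def mlp_xor(x1, x2):
    h = [sigmoid(W1[j][0] * x1 + W1[j][1] * x2 + b1[j]) for j in range(2)]
    return sigmoid(w2[0] * h[0] + w2[1] * h[1] + b2)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    # Rounding the sigmoid output gives the predicted class label.
    print(f"{x1} XOR {x2} -> {round(mlp_xor(x1, x2))}")
```

Running it prints the XOR truth table: 0, 1, 1, 0 for the four inputs in the order shown.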
[Figure: computation graph of a multilayer perceptron with inputs $x_1, x_2$, hidden activations $a_1^{(1)}, a_2^{(1)}, a_3^{(1)}$ and $a_1^{(2)}, a_2^{(2)}$, output $o$, target $y$, and loss $\mathcal{L}(y, o) = l$.]
where $o := \sigma(z) = \sigma(\mathbf{w}^\top \mathbf{x} + b)$
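By contrast, a single unit $o = \sigma(\mathbf{w}^\top \mathbf{x} + b)$ draws one linear decision boundary, because $o \ge 0.5$ exactly when $z \ge 0$. It can therefore classify at most three of the four XOR points. The sketch below checks this numerically by brute-force search over an arbitrary grid of $(w_1, w_2, b)$ values. The grid is an illustrative assumption, and the search is a sanity check, not a proof.

```python
import itertools
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def n_correct(w1, w2, b):
    """Count XOR points classified correctly by o = sigmoid(w1*x1 + w2*x2 + b), threshold 0.5."""
    correct = 0
    for (x1, x2), y in XOR:
        o = sigmoid(w1 * x1 + w2 * x2 + b)
        pred = 1 if o >= 0.5 else 0
        correct += int(pred == y)
    return correct

# Illustrative grid: -3.0, -2.5, ..., 3.0 for each of w1, w2, b.
grid = [k / 2 for k in range(-6, 7)]
best = max(n_correct(w1, w2, b) for w1, w2, b in itertools.product(grid, repeat=3))
print(best)  # 3
```

The general argument is short. Classifying $(0,0)$ and $(1,1)$ as 0 requires $b < 0$ and $w_1 + w_2 + b < 0$. Classifying $(0,1)$ and $(1,0)$ as 1 requires $w_2 + b \ge 0$ and $w_1 + b \ge 0$. Adding the last two gives $w_1 + w_2 + b \ge -b > 0$, a contradiction.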
[Figure: a spectrum of learning algorithms, ranging from scalar feedback to vector feedback; panel labels include "Input", "Parameter 1", and "Error lands".]
Fig. 1 | A spectrum of learning algorithms. a | Left to right: a neural network computes an output through a series of simple computational units. To improve its outputs for a task, it adjusts the synapses between these units. Simple Hebbian learning, which dictates that a synaptic connection should strengthen if a presynaptic neuron reliably contributes to a postsynaptic neuron's firing, cannot make meaningful changes to the ... b | Backpropagation and perturbation ...
Lillicrap, T. P., Santoro, A., Marris, L., & Akerman, C. (n.d.). Backpropagation and the brain. Nature Reviews Neuroscience, 1–12. https://doi.org/10.1038/s41583-020-0277-3
Computation Graph with Multiple Fully-Connected Layers
= Multilayer Perceptron
Here, we could also write $o$ as $a_1^{(3)}$.
[Figure: the same computation graph with example edge weights labeled: $w_{1,1}^{(1)}$, $w_{1,2}^{(1)}$, $w_{3,2}^{(1)}$ (inputs to the first hidden layer), $w_{1,1}^{(2)}$, $w_{1,3}^{(2)}$ (first to second hidden layer), and $w_{1,1}^{(3)}$ (second hidden layer to the output $o$); loss $\mathcal{L}(y, o) = l$. The output unit is labeled "output layer (layer 4)".]
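A forward pass through this graph applies one fully connected layer after another and keeps every activation, so the output is simply the last activation, $o = a_1^{(3)}$. The sketch below assumes a 2-3-2-1 architecture read off the figure, sigmoid activations in every layer, and random initial weights. It also assumes the convention that `W[i][j]` stores $w_{i,j}^{(l)}$, the weight from unit $j$ of the previous layer to unit $i$ of layer $l$ (inferred from labels such as $w_{3,2}^{(1)}$), with Python's 0-based indices, so $w_{3,2}^{(1)}$ is `W[2][1]`. The helpers `dense`, `init_layer`, and `forward` are hypothetical names.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(W, b, a_prev):
    """One fully connected layer: a_i = sigmoid(sum_j W[i][j] * a_prev[j] + b[i])."""
    return [sigmoid(sum(w_ij * a_j for w_ij, a_j in zip(row, a_prev)) + b_i)
            for row, b_i in zip(W, b)]

def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (an arbitrary illustrative choice)."""
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

rng = random.Random(0)
layer_sizes = [2, 3, 2, 1]  # x -> a^(1) -> a^(2) -> a^(3) = o
params = [init_layer(n_out, n_in, rng)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Return [x, a^(1), a^(2), a^(3)]: every activation, not just the output."""
    activations = [x]
    for W, b in params:
        activations.append(dense(W, b, activations[-1]))
    return activations

acts = forward([1.0, 0.0])
o = acts[3][0]                     # the single output unit, i.e. a_1^(3)
w_3_2_layer1 = params[0][0][2][1]  # w_{3,2}^{(1)} with 0-based indices
print([len(a) for a in acts])      # [2, 3, 2, 1]
```

Keeping the full list of activations, rather than only the final output, is what a backward pass later needs.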
[Figure (build): the same computation graph, with the output still labeled "output layer (layer 4)" and an additional "layer 3" label.]
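If, as the labels suggest, the inputs count as layer 1 so that the output is layer 4, the network has three weight layers. Counting parameters shows what "fully connected" costs. Assuming the 2-3-2-1 layout sketched in the figure and one bias per non-input unit (biases are not drawn in the figure), each weight layer contributes `n_in * n_out` weights plus `n_out` biases. The function name `count_params` is hypothetical.

```python
def count_params(layer_sizes, bias=True):
    """Number of trainable parameters in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out      # weight matrix W^(l) has shape (n_out, n_in)
        if bias:
            total += n_out         # one bias per unit in the receiving layer
    return total

sizes = [2, 3, 2, 1]               # 4 layers of units -> 3 weight layers
print(len(sizes) - 1)              # 3
print(count_params(sizes))         # 20
print(count_params(sizes, bias=False))  # 14
```

Because every unit connects to every unit in the next layer, the weight count grows with the product of adjacent layer widths.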
a2
<latexit sha1_base64="Rx/RXsiT+s/v11w3kFUY/JZyKRU=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuK+ix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPuVx/TcvV82i+W3Io7B1olXkZKkKHRL371BhFJBJWGcKx113Nj46dYGUY4nRZ6iaYxJmM8pF1LJRZU++n84Ck6s8oAhZGyJQ2aq78nUiy0nojAdgpsRnrZm4n/ed3EhFd+ymScGCrJYlGYcGQiNPseDZiixPCJJZgoZm9FZIQVJsZmVLAheMsvr5JWteLVKu7dRal+ncWRhxM4hTJ4cAl1uIUGNIGAgGd4hTdHOS/Ou/OxaM052cwx/IHz+QOf5o+h</latexit>
(1) (1)
w3,2 a3 could use sigmoid here
<latexit sha1_base64="PK5wtTnoxgAbdbB3wQfMpe3ws7s=">AAAB9XicbVBNS8NAEJ34WetX1aOXxSJUkJK0gh6LXjxWsB/QpmWz3bRLN5uwu7GUkP/hxYMiXv0v3vw3btsctPXBwOO9GWbmeRFnStv2t7W2vrG5tZ3bye/u7R8cFo6OmyqMJaENEvJQtj2sKGeCNjTTnLYjSXHgcdryxnczv/VEpWKheNTTiLoBHgrmM4K1kXqTflK9RJW0l5Sci7RfKNplew60SpyMFCFDvV/46g5CEgdUaMKxUh3HjrSbYKkZ4TTNd2NFI0zGeEg7hgocUOUm86tTdG6UAfJDaUpoNFd/TyQ4UGoaeKYzwHqklr2Z+J/XibV/4yZMRLGmgiwW+TFHOkSzCNCASUo0nxqCiWTmVkRGWGKiTVB5E4Kz/PIqaVbKTrVcebgq1m6zOHJwCmdQAgeuoQb3UIcGEJDwDK/wZk2sF+vd+li0rlnZzAn8gfX5A8OOkWE=</latexit>
<latexit sha1_base64="F0cJIqijoEg/scv4wVZxoymO2Dc=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquLeix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPuVx/Tsnc+7RdLbsWdA60SLyMlyNDoF796g4gkgkpDONa667mx8VOsDCOcTgu9RNMYkzEe0q6lEguq/XR+8BSdWWWAwkjZkgbN1d8TKRZaT0RgOwU2I73szcT/vG5iwis/ZTJODJVksShMODIRmn2PBkxRYvjEEkwUs7ciMsIKE2MzKtgQvOWXV0nrouJVK+5drVS/zuLIwwmcQhk8uIQ63EIDmkBAwDO8wpujnBfn3flYtOacbOYY/sD5/AGf6o+h</latexit>
x1 (2)
<latexit sha1_base64="sJdgXiAVm2a4S+4dRd3rRrYB1HY=">AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lU0GPRi8eK1hbaUDbbTbt0swm7EyGE/gQvHhTx6i/y5r9x2+agrQ8GHu/NMDMvSKQw6LrfTmlldW19o7xZ2dre2d2r7h88mjjVjLdYLGPdCajhUijeQoGSdxLNaRRI3g7GN1O//cS1EbF6wCzhfkSHSoSCUbTSfdb3+tWaW3dnIMvEK0gNCjT71a/eIGZpxBUySY3pem6Cfk41Cib5pNJLDU8oG9Mh71qqaMSNn89OnZATqwxIGGtbCslM/T2R08iYLApsZ0RxZBa9qfif100xvPJzoZIUuWLzRWEqCcZk+jcZCM0ZyswSyrSwtxI2opoytOlUbAje4svL5PGs7p3X3buLWuO6iKMMR3AMp+DBJTTgFprQAgZDeIZXeHOk8+K8Ox/z1pJTzBzCHzifPw3gjaM=</latexit> <latexit sha1_base64="fJxEJZDwIRAXzsny9UbZpYFPXZ4=">AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0mqoMeiF48V7Qe0oWy2k3bpZhN2N0Io/QlePCji1V/kzX/jts1BWx8MPN6bYWZekAiujet+O4W19Y3NreJ2aWd3b/+gfHjU0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4dua3n1BpHstHkyXoR3QoecgZNVZ6yPq1frniVt05yCrxclKBHI1++as3iFkaoTRMUK27npsYf0KV4UzgtNRLNSaUjekQu5ZKGqH2J/NTp+TMKgMSxsqWNGSu/p6Y0EjrLApsZ0TNSC97M/E/r5ua8NqfcJmkBiVbLApTQUxMZn+TAVfIjMgsoUxxeythI6ooMzadkg3BW355lbRqVe+i6t5fVuo3eRxFOIFTOAcPrqAOd9CAJjAYwjO8wpsjnBfn3flYtBacfOYY/sD5/AEPZI2k</latexit>
<latexit sha1_base64="85kz+6+8sUyRlr+84amQIqvMQLQ=">AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0msoMeiF48V7Qe0oWy2k3bpZhN2N0Io/QlePCji1V/kzX/jts1BWx8MPN6bYWZekAiujet+O4W19Y3NreJ2aWd3b/+gfHjU0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4dua3n1BpHstHkyXoR3QoecgZNVZ6yPq1frniVt05yCrxclKBHI1++as3iFkaoTRMUK27npsYf0KV4UzgtNRLNSaUjekQu5ZKGqH2J/NTp+TMKgMSxsqWNGSu/p6Y0EjrLApsZ0TNSC97M/E/r5ua8NqfcJmkBiVbLApTQUxMZn+TAVfIjMgsoUxxeythI6ooMzadkg3BW355lbQuql6t6t5fVuo3eRxFOIFTOAcPrqAOd9CAJjAYwjO8wpsjnBfn3flYtBacfOYY/sD5/AEQ6I2l</latexit>
a1 o1
<latexit sha1_base64="5HJHR/B9CHeIlPgqihTyAybn2c4=">AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lU0GPRi8eK9gPaUDbbSbt0swm7G7GE/gQvHhTx6i/y5r9x2+agrQ8GHu/NMDMvSATXxnW/ncLK6tr6RnGztLW9s7tX3j9o6jhVDBssFrFqB1Sj4BIbhhuB7UQhjQKBrWB0M/Vbj6g0j+WDGSfoR3QgecgZNVa6f+p5vXLFrbozkGXi5aQCOeq98le3H7M0QmmYoFp3PDcxfkaV4UzgpNRNNSaUjegAO5ZKGqH2s9mpE3JilT4JY2VLGjJTf09kNNJ6HAW2M6JmqBe9qfif10lNeOVnXCapQcnmi8JUEBOT6d+kzxUyI8aWUKa4vZWwIVWUGZtOyYbgLb68TJpnVe+86t5dVGrXeRxFOIJjOAUPLqEGt1CHBjAYwDO8wpsjnBfn3fmYtxacfOYQ/sD5/AEMWo2i</latexit>
<latexit sha1_base64="P19Wda8vivmLhYYW0RO0w4mIIBA=">AAAB6nicbVA9SwNBEJ2LXzF+RS1tFoNgFe5U0DJoYxnRmEByhL3NXrJkb/fYnRNCyE+wsVDE1l9k579xk1yhiQ8GHu/NMDMvSqWw6PvfXmFldW19o7hZ2tre2d0r7x88Wp0ZxhtMS21aEbVcCsUbKFDyVmo4TSLJm9HwZuo3n7ixQqsHHKU8TGhfiVgwik66192gW674VX8GskyCnFQgR71b/ur0NMsSrpBJam078FMMx9SgYJJPSp3M8pSyIe3ztqOKJtyG49mpE3LilB6JtXGlkMzU3xNjmlg7SiLXmVAc2EVvKv7ntTOMr8KxUGmGXLH5ojiTBDWZ/k16wnCGcuQIZUa4WwkbUEMZunRKLoRg8eVl8nhWDc6r/t1FpXadx1GEIziGUwjgEmpwC3VoAIM+PMMrvHnSe/HevY95a8HLZw7hD7zPH/6VjZk=</latexit>
<latexit sha1_base64="vfx38n+ae04OFRd5luhElMypRJ0=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuK+ix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPue49puXo+7RdLbsWdA60SLyMlyNDoF796g4gkgkpDONa667mx8VOsDCOcTgu9RNMYkzEe0q6lEguq/XR+8BSdWWWAwkjZkgbN1d8TKRZaT0RgOwU2I73szcT/vG5iwis/ZTJODJVksShMODIRmn2PBkxRYvjEEkwUs7ciMsIKE2MzKtgQvOWXV0mrWvFqFffuolS/zuLIwwmcQhk8uIQ63EIDmkBAwDO8wpujnBfn3flYtOacbOYY/sD5/AGeXI+g</latexit>
(1)
a2
<latexit sha1_base64="UEIEXkJI4Qcu+777LfA5dwpJBR0=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuK+ix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPuVx/Tsnc+7RdLbsWdA60SLyMlyNDoF796g4gkgkpDONa667mx8VOsDCOcTgu9RNMYkzEe0q6lEguq/XR+8BSdWWWAwkjZkgbN1d8TKRZaT0RgOwU2I73szcT/vG5iwis/ZTJODJVksShMODIRmn2PBkxRYvjEEkwUs7ciMsIKE2MzKtgQvOWXV0mrWvFqFffuolS/zuLIwwmcQhk8uIQ63EIDmkBAwDO8wpujnBfn3flYtOacbOYY/sD5/AGeYI+g</latexit>
x2 (2)
a2 o2
<latexit sha1_base64="lp3ZeQ57DPk1EUCeNsK7Ny2DKe8=">AAAB6nicbVBNSwMxEJ3Ur1q/qh69BIvgqexWQY9FLx4r2g9ol5JNs21oNlmSrFCW/gQvHhTx6i/y5r8xbfegrQ8GHu/NMDMvTAQ31vO+UWFtfWNzq7hd2tnd2z8oHx61jEo1ZU2qhNKdkBgmuGRNy61gnUQzEoeCtcPx7cxvPzFtuJKPdpKwICZDySNOiXXSg+rX+uWKV/XmwKvEz0kFcjT65a/eQNE0ZtJSQYzp+l5ig4xoy6lg01IvNSwhdEyGrOuoJDEzQTY/dYrPnDLAkdKupMVz9fdERmJjJnHoOmNiR2bZm4n/ed3URtdBxmWSWibpYlGUCmwVnv2NB1wzasXEEUI1d7diOiKaUOvSKbkQ/OWXV0mrVvUvqt79ZaV+k8dRhBM4hXPw4QrqcAcNaAKFITzDK7whgV7QO/pYtBZQPnMMf4A+fwAAKI2a</latexit>
L(y, o)
<latexit sha1_base64="jRuGYuNAf6C7yfYguq+x/vIHq08=">AAACDHicbZDLSsNAFIYnXmu9VV26GSxCBSmJCrosunHhooK9QBvKZDpph05mwsxECCEP4MZXceNCEbc+gDvfxkkaQVt/GPj4zznMOb8XMqq0bX9ZC4tLyyurpbXy+sbm1nZlZ7etRCQxaWHBhOx6SBFGOWlpqhnphpKgwGOk402usnrnnkhFBb/TcUjcAI049SlG2liDSrUfID3GiCU3aS1nz0/i9Bj+sEiPTJddt3PBeXAKqIJCzUHlsz8UOAoI15ghpXqOHWo3QVJTzEha7keKhAhP0Ij0DHIUEOUm+TEpPDTOEPpCmsc1zN3fEwkKlIoDz3RmK6rZWmb+V+tF2r9wE8rDSBOOpx/5EYNawCwZOKSSYM1iAwhLanaFeIwkwtrkVzYhOLMnz0P7pO6c1u3bs2rjsoijBPbBAagBB5yDBrgGTdACGDyAJ/ACXq1H69l6s96nrQtWMbMH/sj6+AYGV5uW</latexit>
<latexit sha1_base64="gBTwEt+X3BPX1KgMo6lYVWIC09o=">AAAB6nicbVBNS8NAEJ34WetX1aOXxSJ4KkkV9Fj04rGi/YA2lM120y7dbMLuRCyhP8GLB0W8+ou8+W/ctjlo64OBx3szzMwLEikMuu63s7K6tr6xWdgqbu/s7u2XDg6bJk414w0Wy1i3A2q4FIo3UKDk7URzGgWSt4LRzdRvPXJtRKwecJxwP6IDJULBKFrp/qlX7ZXKbsWdgSwTLydlyFHvlb66/ZilEVfIJDWm47kJ+hnVKJjkk2I3NTyhbEQHvGOpohE3fjY7dUJOrdInYaxtKSQz9fdERiNjxlFgOyOKQ7PoTcX/vE6K4ZWfCZWkyBWbLwpTSTAm079JX2jOUI4toUwLeythQ6opQ5tO0YbgLb68TJrVindece8uyrXrPI4CHMMJnIEHl1CDW6hDAxgM4Ble4c2Rzovz7nzMW1ecfOYI/sD5/AEN3o2j</latexit>
<latexit sha1_base64="Rx/RXsiT+s/v11w3kFUY/JZyKRU=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuK+ix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPuVx/TcvV82i+W3Io7B1olXkZKkKHRL371BhFJBJWGcKx113Nj46dYGUY4nRZ6iaYxJmM8pF1LJRZU++n84Ck6s8oAhZGyJQ2aq78nUiy0nojAdgpsRnrZm4n/ed3EhFd+ymScGCrJYlGYcGQiNPseDZiixPCJJZgoZm9FZIQVJsZmVLAheMsvr5JWteLVKu7dRal+ncWRhxM4hTJ4cAl1uIUGNIGAgGd4hTdHOS/Ou/OxaM052cwx/IHz+QOf5o+h</latexit>
(1)
a3
(2) o3
<latexit sha1_base64="F0cJIqijoEg/scv4wVZxoymO2Dc=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquLeix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPuVx/Tsnc+7RdLbsWdA60SLyMlyNDoF796g4gkgkpDONa667mx8VOsDCOcTgu9RNMYkzEe0q6lEguq/XR+8BSdWWWAwkjZkgbN1d8TKRZaT0RgOwU2I73szcT/vG5iwis/ZTJODJVksShMODIRmn2PBkxRYvjEEkwUs7ciMsIKE2MzKtgQvOWXV0nrouJVK+5drVS/zuLIwwmcQhk8uIQ63EIDmkBAwDO8wpujnBfn3flYtOacbOYY/sD5/AGf6o+h</latexit>
a3
<latexit sha1_base64="vBxbcVs2Wnfm0yi6DKhPPczIBHw=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuK+ix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPu1x7TcvV82i+W3Io7B1olXkZKkKHRL371BhFJBJWGcKx113Nj46dYGUY4nRZ6iaYxJmM8pF1LJRZU++n84Ck6s8oAhZGyJQ2aq78nUiy0nojAdgpsRnrZm4n/ed3EhFd+ymScGCrJYlGYcGQiNPseDZiixPCJJZgoZm9FZIQVJsZmVLAheMsvr5JWteLVKu7dRal+ncWRhxM4hTJ4cAl1uIUGNIGAgGd4hTdHOS/Ou/OxaM052cwx/IHz+QOhcI+i</latexit>
<latexit sha1_base64="wT+53Eb88nVtTpSfn4qk4kRsjrU=">AAAB6nicbVBNSwMxEJ3Ur1q/qh69BIvgqexaQY9FLx4r2g9ol5JNs21oNlmSrFCW/gQvHhTx6i/y5r8xbfegrQ8GHu/NMDMvTAQ31vO+UWFtfWNzq7hd2tnd2z8oHx61jEo1ZU2qhNKdkBgmuGRNy61gnUQzEoeCtcPx7cxvPzFtuJKPdpKwICZDySNOiXXSg+rX+uWKV/XmwKvEz0kFcjT65a/eQNE0ZtJSQYzp+l5ig4xoy6lg01IvNSwhdEyGrOuoJDEzQTY/dYrPnDLAkdKupMVz9fdERmJjJnHoOmNiR2bZm4n/ed3URtdBxmWSWibpYlGUCmwVnv2NB1wzasXEEUI1d7diOiKaUOvSKbkQ/OWXV0nrourXqt79ZaV+k8dRhBM4hXPw4QrqcAcNaAKFITzDK7whgV7QO/pYtBZQPnMMf4A+fwABrI2b</latexit>
(1)
a4
(2)
<latexit sha1_base64="uxWzlquY+EeW/UpcO69SCXeIYtQ=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquLeix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPu1x7Tsnc+7RdLbsWdA60SLyMlyNDoF796g4gkgkpDONa667mx8VOsDCOcTgu9RNMYkzEe0q6lEguq/XR+8BSdWWWAwkjZkgbN1d8TKRZaT0RgOwU2I73szcT/vG5iwis/ZTJODJVksShMODIRmn2PBkxRYvjEEkwUs7ciMsIKE2MzKtgQvOWXV0nrouJVK+5drVS/zuLIwwmcQhk8uIQ63EIDmkBAwDO8wpujnBfn3flYtOacbOYY/sD5/AGhdI+i</latexit>
a4
<latexit sha1_base64="vsgJntgqeAyiGWhpRcem3fXieTw=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuLeix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPu1x7TcvV82i+W3Io7B1olXkZKkKHRL371BhFJBJWGcKx113Nj46dYGUY4nRZ6iaYxJmM8pF1LJRZU++n84Ck6s8oAhZGyJQ2aq78nUiy0nojAdgpsRnrZm4n/ed3EhFd+ymScGCrJYlGYcGQiNPseDZiixPCJJZgoZm9FZIQVJsZmVLAheMsvr5JWteJdVNy7Wql+ncWRhxM4hTJ4cAl1uIUGNIGAgGd4hTdHOS/Ou/OxaM052cwx/IHz+QOi+o+j</latexit>
use softmax if this is a multi-clas
(1) problem with mutually exclusive classes
a5 <latexit sha1_base64="NHK0ywkULzi4Jl2BlAjdO8n2Yig=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquVfRY9OKxgv2Qdi3ZNNuGJtklyQpl6a/w4kERr/4cb/4b03YP2vpg4PHeDDPzgpgzbVz328mtrK6tb+Q3C1vbO7t7xf2Dpo4SRWiDRDxS7QBrypmkDcMMp+1YUSwCTlvB6Gbqt56o0iyS92YcU1/ggWQhI9hY6QH3Lh7Tsnc66RVLbsWdAS0TLyMlyFDvFb+6/YgkgkpDONa647mx8VOsDCOcTgrdRNMYkxEe0I6lEguq/XR28ASdWKWPwkjZkgbN1N8TKRZaj0VgOwU2Q73oTcX/vE5iwis/ZTJODJVkvihMODIRmn6P+kxRYvjYEkwUs7ciMsQKE2MzKtgQvMWXl0nzrOJVK+7deal2ncWRhyM4hjJ4cAk1uIU6NICAgGd4hTdHOS/Ou/Mxb8052cwh/IHz+QOi/o+j</latexit>
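As a rough sketch of one forward pass through a network like the one in the figure, assume logistic sigmoid hidden units, a softmax output layer, and a cross-entropy loss for L(y, o). The 2-3-3 layer sizes and all weights below are made up for illustration, and the helper names are our own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def dense(x, W, b):
    # W[j][i] is the (hypothetical) weight from input i to unit j
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

def forward(x, W1, b1, W2, b2):
    a1 = [sigmoid(z) for z in dense(x, W1, b1)]  # hidden activations
    return softmax(dense(a1, W2, b2))            # class-membership probabilities

def cross_entropy(y, o):
    # y is an integer class label, o a list of probabilities
    return -math.log(o[y])

# Hypothetical weights for a 2-3-3 network
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.2, -0.5, 0.3], [0.7, 0.1, -0.4], [-0.6, 0.3, 0.5]]
b2 = [0.0, 0.0, 0.0]

o = forward([1.0, 2.0], W1, b1, W2, b2)
loss = cross_entropy(2, o)
```

With mutually exclusive classes the softmax outputs sum to one, so the loss only needs the probability of the true class.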
Is It Deep Learning?
Note That the Loss is Not Convex Anymore
• Linear regression, Adaline, Logistic Regression, and Softmax Regression have convex loss functions
• This is not the case anymore; in practice, we usually end up at different local minima if we repeat the training (e.g., by changing the random seed for weight initialization or shuffling the dataset while leaving all settings the same)
[Figure: loss-surface visualizations from "Visualizing the Loss Landscape of Neural Nets", arXiv:1712.09913v3]
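A small, hypothetical illustration of why the loss is no longer convex: swapping the two hidden units of a network gives a different parameter vector with the same loss, but the average of the two parameter vectors can be much worse. That violates the convexity inequality L((θ_A + θ_B)/2) ≤ (L(θ_A) + L(θ_B))/2. The network, data, and weights are made up for this demo.

```python
def relu(z):
    return max(0.0, z)

def net(x, params):
    # Two hidden ReLU units, no biases: f(x) = v1*relu(w1*x) + v2*relu(w2*x)
    w1, v1, w2, v2 = params
    return v1 * relu(w1 * x) + v2 * relu(w2 * x)

def mse(params, xs, ys):
    return sum((net(x, params) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0]
ys = [relu(x) for x in xs]  # target the network can represent exactly

theta_a = (1.0, 1.0, -1.0, 0.0)   # hidden unit 1 does all the work
theta_b = (-1.0, 0.0, 1.0, 1.0)   # same function, hidden units swapped
theta_mid = tuple((a + b) / 2 for a, b in zip(theta_a, theta_b))

loss_a = mse(theta_a, xs, ys)
loss_b = mse(theta_b, xs, ys)
loss_mid = mse(theta_mid, xs, ys)
```

Both endpoints reach zero loss, while the midpoint outputs 0 everywhere and has loss 2.5. Such permutation symmetries are one reason different seeds can land in different (but equally good) minima.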
import torch.nn.functional as
$\sigma(z) = \max(0, z)$ (ReLU)
$\sigma(z) = z$ (identity)
https://github.com/rasbt/stat453-deep-learning-ss21/blob/master/L09/code/xor-problem.ipynb
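To see why the non-linearity matters for XOR, here is a sketch with hand-picked weights (not the weights learned in the linked notebook). With σ(z) = max(0, z) the 2-2-1 network reproduces XOR exactly; with σ(z) = z the same weights collapse into the single linear function 2 − x1 − x2, and no linear function of x1, x2 can fit XOR.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def xor_mlp(x1, x2, act):
    # Hand-picked (not trained) weights for a 2-2-1 network
    h1 = act(x1 + x2)        # hidden unit 1
    h2 = act(x1 + x2 - 1.0)  # hidden unit 2
    return h1 - 2.0 * h2     # linear output unit

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 0]
relu_out = [xor_mlp(a, b, relu) for a, b in inputs]
linear_out = [xor_mlp(a, b, identity) for a, b in inputs]
```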
$\sigma(z) = z$ (identity)
$\sigma(z) = \frac{1}{1 + \exp(-z)}$ (logistic sigmoid)
$\mathrm{Tanh}(z) = \frac{\exp(z) - \exp(-z)}{\exp(z) + \exp(-z)}$
$\mathrm{HardTanh}(z) = \begin{cases} 1 & \text{if } z > 1 \\ -1 & \text{if } z < -1 \\ z & \text{otherwise} \end{cases}$
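A direct transcription of the sigmoid, Tanh, and HardTanh definitions, as a sketch. It also checks the identity Tanh(z) = 2σ(2z) − 1, which shows that tanh is a rescaled, zero-centered sigmoid. The function names are our own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Direct translation of the formula; overflows for large |z| (math.tanh is the robust choice)
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def hard_tanh(z):
    # Piecewise-linear approximation of tanh, clipped to [-1, 1]
    if z > 1:
        return 1.0
    if z < -1:
        return -1.0
    return z

zs = [-3.0, -0.5, 0.0, 0.5, 3.0]
tanh_vals = [tanh(z) for z in zs]
via_sigmoid = [2.0 * sigmoid(2.0 * z) - 1.0 for z in zs]  # tanh(z) = 2*sigmoid(2z) - 1
```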
Tanh ("tanH")
• Advantages of Tanh
  • Mean centering
  • Positive and negative values
  • Larger gradients (than the logistic sigmoid)
Additional tip: Also good to normalize inputs to mean zero and use random weight initialization with avg. weight centered at zero
Also simple derivative:
$\frac{d}{dz}\mathrm{Tanh}(z) = 1 - \mathrm{Tanh}(z)^2$
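A quick numerical check of the derivative formula, together with a comparison against the logistic sigmoid. At z = 0, tanh has slope 1, whereas the sigmoid, with σ'(z) = σ(z)(1 − σ(z)), has slope 0.25. This is one way to read the "larger gradients" bullet. The helper names are our own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh_grad(z):
    # d/dz tanh(z) = 1 - tanh(z)^2
    return 1.0 - math.tanh(z) ** 2

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def numerical_grad(f, z, h=1e-5):
    # Central finite-difference approximation of f'(z)
    return (f(z + h) - f(z - h)) / (2 * h)

check_points = [-2.0, -0.5, 0.0, 0.5, 2.0]
max_error = max(abs(numerical_grad(math.tanh, z) - tanh_grad(z)) for z in check_points)
```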
$\mathrm{ReLU}(z) = \max(0, z)$
$\mathrm{PReLU}(z) = \begin{cases} z, & \text{if } z \ge 0 \\ \alpha z, & \text{otherwise} \end{cases}$
[Plot labels: α = 0.025; α = 1]
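A small sketch of the difference that matters for training. For negative inputs, ReLU's gradient is exactly zero, whereas PReLU passes a gradient of α, and α itself receives the gradient z. The α = 0.025 in the usage below is just an example value, and the function names are our own.

```python
import math

def relu(z):
    return max(0.0, z)

def prelu(z, alpha):
    # In PReLU alpha is learned; keeping alpha fixed gives a Leaky ReLU
    return z if z >= 0 else alpha * z

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # we use 0 at z = 0 by convention

def prelu_grad_z(z, alpha):
    return 1.0 if z >= 0 else alpha

def prelu_grad_alpha(z):
    # d PReLU / d alpha, needed to learn alpha by gradient descent
    return 0.0 if z >= 0 else z
```

A unit whose pre-activations are negative for every input gets no weight updates under ReLU, but still gets a small gradient under PReLU.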
Figure 1: Left Panel: ReLU and Parametric SoftPlus. Right Panel: the first derivatives for ReLU and Parametric SoftPlus. Compared to ReLU, Parametric Softplus is smooth with continuous derivatives.
It is commonly believed that networks cannot be both accurate and robust, that gaining robustness means losing accuracy. [...] Our key observation is that the widely-used ReLU activation function significantly weakens adversarial training due to its non-smooth nature. Hence we propose smooth adversarial training (SAT), in which we replace ReLU with its smooth approximations to strengthen adversarial training.

[...] training samples. Guided by this principle, we identify that ReLU, a widely-used activation function in most network architectures, significantly weakens adversarial training due to its non-smooth nature, e.g., ReLU's gradient gets an abrupt change when its input is zero, as illustrated in Figure 1. To fix the issue induced by ReLU, in this paper, we propose smooth adversarial training (SAT), which enforces architectural smoothness via replacing ReLU with its smooth approximations for improving the gradient quality in adversarial training (Figure 1 shows Parametric SoftPlus, an example of smooth approximations for ReLU). With smooth activation functions, SAT is able to feed the networks with harder adversarial training samples and compute better gradient updates for network optimization, hence substantially strengthens adversarial training. Our experiment results show that SAT improves [...]
[Plot panels: Forward and Backward, for Parametric SoftPlus, Swish, GELU, ELU, and SmoothReLU]
Figure 2: Visualizations of smooth activation functions and their derivatives.
As shown above, improving ReLU's gradient can both strengthen the attacker and provide better gradient updates. Nonetheless, this strategy may be suboptimal as there still is a discrepancy between the forward pass (which we use ReLU) and the backward pass (which we use Parametric Softplus).

To fully exploit the potential of training with better gradients, we hereby propose smooth adversarial training (SAT), which enforces architectural smoothness via the exclusive usage of smooth activation functions in adversarial training. We keep all other network components the same, as most of them will not result in the issue of poor gradient.

[Plot: accuracy vs. adversarial robustness (%) for ReLU, Softplus, Parametric Softplus, ELU, GELU, SmoothReLU, and Swish]
Figure 3: Smooth activation functions improve adversarial training. Compared to ReLU, all smooth activation functions significantly boost robustness, while keeping accuracy almost the same.
[Table 2, partially visible]

4.1 Adversarial Training with Smooth Activation Functions
We consider the following activation functions as the smooth approximations of ReLU in SAT (Figure 2 plots these functions as well as their derivatives):
• Softplus [24]: f(x) = log(1 + exp(x)). We also consider its parametric version f(x) = (1/α) log(1 + exp(αx)), and set α = 10 as in Section 3.
• Swish [31, 9]: f(x) = x · sigmoid(x). Compared to other activation functions, Swish has a [...]

4.2 Ruling Out the Effect From x < 0
Compared to ReLU, in addition to being smooth, the functions above have non-zero re[...] negative inputs (x < 0) which may also affect adversarial training. To rule out this factor, [...] propose SmoothReLU, which flattens the activation function by only modifying ReLU af[...]

https://arxiv.org/abs/2006.14536
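The smooth ReLU approximations listed above are short enough to write out directly. This sketch follows the quoted definitions and uses α = 10 as the default for the parametric Softplus, as in the quoted text. The overflow-safe rewriting of Softplus is our own implementation choice. Because (1/α) log(1 + exp(αx)) − max(0, x) = (1/α) log(1 + exp(−α|x|)), the gap to ReLU is largest at x = 0, where it equals log(2)/α ≈ 0.069 for α = 10.

```python
import math

def relu(x):
    return max(0.0, x)

def softplus(x):
    # log(1 + exp(x)) rewritten as max(x, 0) + log(1 + exp(-|x|)) to avoid overflow
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def parametric_softplus(x, alpha=10.0):
    # (1 / alpha) * log(1 + exp(alpha * x)); larger alpha gives a closer fit to ReLU
    return softplus(alpha * x) / alpha

def swish(x):
    # x * sigmoid(x); unlike ReLU it is non-monotonic and dips below zero for x < 0
    return x / (1.0 + math.exp(-x))

grid = [i / 10 for i in range(-30, 31)]
max_gap = max(abs(parametric_softplus(x) - relu(x)) for x in grid)
```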
https://twitter.com/TheInsaneApp/status/1366324846976659461?s=20
Implementing Multilayer Perceptrons in PyTorch
Dead Neurons
Failure cases of a ~93% accuracy (not very good, but beside the point) 2-layer (1-hidden-layer) MLP on MNIST
(where t = target class and p = predicted class)
[Plot: training error and generalization error vs. model capacity, with the overfitting region marked]
Bias-Variance Decomposition
$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$
$\mathrm{Var}(\hat{\theta}) = E[\hat{\theta}^2] - (E[\hat{\theta}])^2$
$\mathrm{Var}(\hat{\theta}) = E\big[(E[\hat{\theta}] - \hat{\theta})^2\big]$
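The two definitions can be checked with a small Monte Carlo simulation. As an assumed example, take θ to be the variance σ² = 1 of a standard normal distribution and θ̂ the plug-in estimator that divides by n. Its bias is known to be −σ²/n, so about −0.2 for n = 5. Both variance formulas are computed from the same simulated estimates and agree up to floating-point error. The function names, sample size, and seed are our own choices.

```python
import random

def biased_var(sample):
    # Plug-in variance estimator: divides by n instead of n - 1
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def bias_and_variance(estimator, true_value, n, trials=20000, seed=123):
    rng = random.Random(seed)
    estimates = [estimator([rng.gauss(0.0, 1.0) for _ in range(n)])
                 for _ in range(trials)]
    mean_est = sum(estimates) / trials                                # E[theta_hat]
    bias = mean_est - true_value                                      # E[theta_hat] - theta
    var_1 = sum(e * e for e in estimates) / trials - mean_est ** 2    # E[theta_hat^2] - (E[theta_hat])^2
    var_2 = sum((mean_est - e) ** 2 for e in estimates) / trials      # E[(E[theta_hat] - theta_hat)^2]
    return bias, var_1, var_2

bias, var_1, var_2 = bias_and_variance(biased_var, true_value=1.0, n=5)
```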
Bias & Variance vs. Overfitting & Underfitting
[Plot: training error, bias, and variance vs. model capacity]
Capacity: abstract concept meaning roughly the number of parameters of the model times how efficiently the parameters are used
• Also, train/valid/test splits are usually sufficient for training & estimating the generalization performance in deep learning
https://arxiv.org/pdf/1811.12808.pdf
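Here is a minimal sketch of a three-way split on example indices. It assumes the data fits in memory and can be shuffled once up front. The 70/10/20 proportions and the seed are arbitrary choices, and the function name is hypothetical. The validation part is used for model selection and hyperparameter tuning; the test part is touched only for the final estimate of generalization performance.

```python
import random

def train_valid_test_split(n_examples, valid_frac=0.1, test_frac=0.2, seed=1):
    # Shuffle indices once with a fixed seed, then carve out test, validation, and training parts
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_examples * test_frac)
    n_valid = int(n_examples * valid_frac)
    test_idx = indices[:n_test]
    valid_idx = indices[n_test:n_test + n_valid]
    train_idx = indices[n_test + n_valid:]
    return train_idx, valid_idx, test_idx

train_idx, valid_idx, test_idx = train_valid_test_split(1000)
```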
[Plot: generalization error vs. model capacity, with a region labeled "Deep Learning"]
Architectures: CNNs (standard & ResNet) and transformers trained with cross-entropy loss
Figure 1 (left): Train and test error as a function of model size
Epoch-wise Double Descent
"In this section, we demonstrate a novel form of double-descent with respect to training epochs, which is consistent with our unified view of effective model complexity (EMC) and the generalized double descent hypothesis. Increasing the train time increases the EMC, and thus a sufficiently large model transitions from under- to over-parameterized over the course of training."
https://arxiv.org/abs/1912.02292
Figure 9: Left: Training dynamics for models in three regimes. Models are ResNet18s on CIFAR10 with 20% label noise, trained using Adam with learning rate 0.0001, and data augmentation. Right: Test error over (Model size × Epochs). Three slices of this plot are shown on the left.
Thoughts:
• At the critical region, only one model fits the data well, and it is very sensitive to noise
• Overparametrized models: many fit the data well; SGD finds one that memorizes the training set but also performs well on the test set
As illustrated in Figure 9, sufficiently large models can undergo a "double descent" behavior where [...]
https://arxiv.org/abs/1912.02292
Figure 2: Left: Test error as a function of model size and train epochs. The horizontal line corre-
sponds to model-wise double descent–varying model size while training for as long as possible. The
vertical line corresponds to epoch-wise double descent, with test error undergoing double-descent
as train time increases. Right: Train error of the corresponding models. All models are ResNet18s
trained on CIFAR-10 with 15% label noise, data-augmentation, and Adam for up to 4K epochs.
https://github.com/rasbt/deeplearning-models/blob/master/pytorch_ipynb/cnn/cnn-vgg16-cats-dogs.ipynb
Training/Validation/Test splits
Epoch: 001/100 | Batch 000/156 | Cost: 1136.912
Epoch: 001/100 | Batch 120/156 | Cost: 0.632
Epoch: 001/100 Train Acc.: 63.35% | Validation Acc.: 62.12%
Time elapsed: 3.09 min
Epoch: 002/100 | Batch 000/156 | Cost: 0.667
Epoch: 002/100 | Batch 120/156 | Cost: 0.664
Epoch: 002/100 Train Acc.: 66.05% | Validation Acc.: 66.32%
Time elapsed: 6.15 min
Epoch: 003/100 | Batch 000/156 | Cost: 0.613
Epoch: 003/100 | Batch 120/156 | Cost: 0.631
Epoch: 003/100 Train Acc.: 65.82% | Validation Acc.: 63.76%
Time elapsed: 9.21 min
Epoch: 004/100 | Batch 000/156 | Cost: 0.599
Epoch: 004/100 | Batch 120/156 | Cost: 0.583
Epoch: 004/100 Train Acc.: 66.75% | Validation Acc.: 64.52%
Time elapsed: 12.27 min
Epoch: 005/100 | Batch 000/156 | Cost: 0.591
Epoch: 005/100 | Batch 120/156 | Cost: 0.574
Epoch: 005/100 Train Acc.: 68.29% | Validation Acc.: 67.00%
Time elapsed: 15.33 min
...
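When a training run produces a log like the one above, a common use of the validation set is to pick the epoch (checkpoint) with the highest validation accuracy. The sketch below does this for the five epochs shown, using only the standard library. The regular expression assumes the exact line format printed above, and the test set plays no role in the choice.

```python
import re

# Epoch-summary lines copied from the training log above
log = """\
Epoch: 001/100 Train Acc.: 63.35% | Validation Acc.: 62.12%
Epoch: 002/100 Train Acc.: 66.05% | Validation Acc.: 66.32%
Epoch: 003/100 Train Acc.: 65.82% | Validation Acc.: 63.76%
Epoch: 004/100 Train Acc.: 66.75% | Validation Acc.: 64.52%
Epoch: 005/100 Train Acc.: 68.29% | Validation Acc.: 67.00%
"""

# Captures epoch number, training accuracy, and validation accuracy
pattern = re.compile(
    r"Epoch: (\d+)/\d+ Train Acc\.: ([\d.]+)% \| Validation Acc\.: ([\d.]+)%?"
)

records = [(int(e), float(tr), float(va)) for e, tr, va in pattern.findall(log)]
best_epoch, best_train, best_valid = max(records, key=lambda r: r[2])
```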
Parameters vs Hyperparameters
Parameters:
• weights (weight parameters)
• biases (bias units)
Hyperparameters:
• minibatch size
• data normalization schemes
• number of epochs
• number of hidden layers
• number of hidden units
• learning rate
• (random seed, why?)
• loss function
• various weights (weighting terms)
• activation function types
• regularization schemes (more later)
• weight initialization schemes (more later)
• optimization algorithm type (more later)
• ...
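To make the distinction concrete, the hyperparameters below are fixed before training. Some of them, here the hidden-layer sizes, determine how many parameters (weights and biases) the optimizer will learn. The field names and values are hypothetical. The 784-100-10 sizes correspond to an MNIST-style fully connected network with one hidden layer.

```python
from dataclasses import dataclass

@dataclass
class Hyperparameters:
    # Chosen before training (hypothetical example values)
    hidden_sizes: tuple = (100,)
    learning_rate: float = 0.1
    minibatch_size: int = 64
    num_epochs: int = 10
    random_seed: int = 123

def count_parameters(layer_sizes):
    # Learned during training: one weight per input-unit connection plus one bias per unit
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

hp = Hyperparameters()
n_params = count_parameters([784, *hp.hidden_sizes, 10])
```

For 784-100-10 this gives 784·100 + 100 + 100·10 + 10 = 79,510 trainable parameters.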
https://twitter.com/_ScottCondron/status/1363494433715552259?s=20
https://github.com/rasbt/stat453-deep-learning-ss20/blob/main/L09/code/custom-dataloader