
STAT 453: Introduction to Deep Learning and Generative Models
with Applications in Python

Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching

Lecture 09: Multilayer Perceptrons
Today

We will finally be able to solve the XOR problem ...

... and talk about cats!


Lecture Overview

1. Multilayer Perceptron Architecture
2. Nonlinear Activation Functions
3. Multilayer Perceptron Code Examples
4. Overfitting and Underfitting
5. Cats & Dogs and Custom Data Loaders
1. Multilayer Perceptron Architecture

Fully-connected feedforward neural networks with one or more hidden layers

Computation Graph with Multiple Fully-Connected Layers
= Multilayer Perceptron

Nothing new, really (bias not shown).

[Figure: computation graph with inputs $x_1, x_2$, a first hidden layer $a_1^{(1)}, a_2^{(1)}, a_3^{(1)}$, a second hidden layer $a_1^{(2)}, a_2^{(2)}$, and a single output $o$, where $o := \sigma(z) = \sigma(\mathbf{w}^\top \mathbf{x} + b)$ and the loss is $\mathcal{L}(y, o) = l$.]

(Assume a network for binary classification.)

$$\frac{\partial l}{\partial w_{1,1}^{(1)}}
= \frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial a_1^{(2)}} \cdot \frac{\partial a_1^{(2)}}{\partial a_1^{(1)}} \cdot \frac{\partial a_1^{(1)}}{\partial w_{1,1}^{(1)}}
+ \frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial a_2^{(2)}} \cdot \frac{\partial a_2^{(2)}}{\partial a_1^{(1)}} \cdot \frac{\partial a_1^{(1)}}{\partial w_{1,1}^{(1)}}$$
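This sum over the two paths (through $a_1^{(2)}$ and $a_2^{(2)}$) is exactly what reverse-mode autodiff computes. A minimal sketch, not from the slides (the 2-3-2-1 shapes and sigmoid activations are illustrative assumptions):

    import torch

    torch.manual_seed(123)
    x = torch.tensor([1.0, 2.0])
    y = torch.tensor([1.0])
    W1 = torch.randn(3, 2, requires_grad=True)  # layer 1: 2 inputs -> 3 hidden units
    W2 = torch.randn(2, 3, requires_grad=True)  # layer 2: 3 -> 2 hidden units
    w3 = torch.randn(1, 2, requires_grad=True)  # output layer: 2 -> 1 output

    a1 = torch.sigmoid(W1 @ x)   # a^(1)
    a2 = torch.sigmoid(W2 @ a1)  # a^(2)
    o = torch.sigmoid(w3 @ a2)   # o = sigma(z); bias omitted, as on the slide
    l = (y - o).pow(2).sum()     # some loss L(y, o) = l

    l.backward()
    # W1.grad[0, 0] is dl/dw_{1,1}^{(1)}: autograd sums the contributions of
    # both paths, just like the chain-rule expression above.
    print(W1.grad[0, 0])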
PERSPECTIVES

[Fig. 1 | A spectrum of learning algorithms. a | Left to right: feedforward network (no feedback), Hebbian learning, perturbation learning (scalar feedback), backpropagation, and backprop-like learning with a feedback network (vector feedback). A neural network computes an output through a series of simple computational units; to improve its outputs for a task, it adjusts the synapses between these units. Simple Hebbian learning, which dictates that a synaptic connection should strengthen if a presynaptic neuron reliably contributes to a postsynaptic neuron's firing, cannot make meaningful changes at deeper synapses. b | Weight perturbation, node perturbation, backpropagation, and backpropagation approximations differ in the precision of synaptic change in reducing error; algorithms on this spectrum learn at different speeds, from parameters wandering randomly on the error landscape to detailed feedback circuits that inform learning at all synapses.]

Lillicrap, T. P., Santoro, A., Marris, L., & Akerman, C. (2020). Backpropagation and the brain. Nature Reviews Neuroscience, 1-12. https://doi.org/10.1038/s41583-020-0277-3
Computation Graph with Multiple Fully-Connected Layers
= Multilayer Perceptron

Here, we could also write $o$ as $a_1^{(3)}$.

[Same computation graph as before, with the layers labeled: input layer (layer 1), 1st hidden layer (layer 2), 2nd hidden layer (layer 3), output layer (layer 4).]

Computation Graph with Multiple Fully-Connected Layers
= Multilayer Perceptron

A more common counting/naming scheme, because then a perceptron/Adaline/logistic regression model can be called a "1-layer neural network":

[Same computation graph, now labeled: input layer (layer 0), 1st hidden layer (layer 1), 2nd hidden layer (layer 2), output layer (layer 3).]

Computation Graph with Multiple Fully-Connected Layers
= Multilayer Perceptron

[Same computation graph; the annotations point out that we could use a sigmoid activation in each hidden layer and in the output layer.]


Computation Graph with Multiple Fully-Connected Layers
= Multilayer Perceptron

[Figure: computation graph with inputs $x_1, x_2$, a first hidden layer $a_1^{(1)}, \ldots, a_5^{(1)}$, a second hidden layer $a_1^{(2)}, \ldots, a_4^{(2)}$, and three outputs $o_1, o_2, o_3$ with targets $y_1, y_2, y_3$ and loss $\mathcal{L}(\mathbf{y}, \mathbf{o})$.]

Use softmax if this is a multi-class problem with mutually exclusive classes.

Is It Deep Learning?



Note That the Loss is Not Convex Anymore

• Linear regression, Adaline, logistic regression, and softmax regression have convex loss functions.

• This is not the case anymore; in practice, we usually end up at different local minima if we repeat the training (e.g., by changing the random seed for weight initialization or shuffling the dataset while leaving all other settings the same).

• In practice though, we WANT to explore different starting weights, because some lead to better solutions than others.

[Figure 1 from the cited paper: the loss surfaces of ResNet-56 (a) without and (b) with skip connections; a filter normalization scheme is used to enable comparisons of sharpness/flatness.]

Image source: Li, H., Xu, Z., Taylor, G., Studer, C. and Goldstein, T., 2018. Visualizing the loss landscape of neural nets. In Advances in Neural Information Processing Systems (pp. 6391-6401).
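A minimal sketch, not from the slides (the random dataset and layer sizes are illustrative assumptions), showing this in action: the same training run, repeated with different weight-initialization seeds on identical data, typically settles at different final losses.

    import torch

    torch.manual_seed(0)
    X, y = torch.randn(64, 10), torch.randn(64, 1)  # one fixed dataset

    def final_loss(seed):
        torch.manual_seed(seed)  # only the starting weights differ between runs
        model = torch.nn.Sequential(
            torch.nn.Linear(10, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
        for _ in range(300):
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(model(X), y)
            loss.backward()
            optimizer.step()
        return loss.item()

    # A convex loss would end at the same minimum every time; here the
    # final losses typically differ across seeds:
    print([round(final_loss(seed), 4) for seed in (1, 2, 3)])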
About Softmax & Sigmoid in the Output Layer
and Issues with MSE

• Sigmoid activation + MSE has the problem of very flat gradients when the output is very wrong, i.e., a 10^-5 probability and class label 1:

$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}_j} = -\frac{2}{n}\,(y - a)\,\sigma(z)\,(1 - \sigma(z))\,\mathbf{x}_j^\top \quad \text{(derivative for a sigmoid + MSE neuron)}$$

• Softmax (which forces the network to learn a probability distribution over the labels) in the output layer is better than sigmoid because of the mutually exclusive labels, as discussed in the Softmax lecture; hence, in the output layer, softmax is usually better than sigmoid.
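A minimal sketch, not from the slides, of that flat-gradient problem: for a single sigmoid unit whose prediction is confidently wrong, the MSE gradient with respect to the net input is tiny because it is scaled by the factor sigma(z)(1 - sigma(z)), while a cross-entropy loss still yields a strong gradient.

    import torch

    z = torch.tensor([-11.5], requires_grad=True)  # sigmoid(z) ~ 1e-5, but y = 1
    y = torch.tensor([1.0])

    loss_mse = (y - torch.sigmoid(z)).pow(2).sum()
    loss_mse.backward()
    print(z.grad)  # ~ -2e-5: almost no learning signal

    z.grad = None  # reset before the second backward pass
    loss_ce = -(y * torch.nn.functional.logsigmoid(z)).sum()  # CE term for y = 1
    loss_ce.backward()
    print(z.grad)  # ~ -1.0: large gradient despite the same wrong output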
What happens if we initialize the multilayer perceptron to all-zero weights?
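A minimal sketch, not from the slides (the 2-3-1 architecture and MSE loss are illustrative assumptions), that hints at the answer: with all-zero weights the hidden units are perfectly symmetric, so they all receive identical updates and never learn different features.

    import torch

    model = torch.nn.Sequential(torch.nn.Linear(2, 3), torch.nn.Sigmoid(),
                                torch.nn.Linear(3, 1))
    for param in model.parameters():
        torch.nn.init.zeros_(param)  # all-zero weights and biases

    x = torch.tensor([[1.0, 2.0]])
    loss = (model(x) - 1.0).pow(2).sum()
    loss.backward()

    print(model[2].weight.grad)  # identical entries: every hidden unit gets the same update
    print(model[0].weight.grad)  # all zeros: no signal even reaches the first layer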


2. Nonlinear Activation Functions

Complex, nonlinear decision boundaries with hidden layers and nonlinear activations
PyTorch Recap

    import torch
    import torch.nn.functional as F

    # Version 1: explicit layers
    # (num_hidden_1 and num_hidden_2 are assumed to be defined elsewhere,
    # e.g., as globals in the notebook)

    class MultilayerPerceptron(torch.nn.Module):

        def __init__(self, num_features, num_classes):
            super(MultilayerPerceptron, self).__init__()

            ### 1st hidden layer
            self.linear_1 = torch.nn.Linear(num_features,
                                            num_hidden_1)

            ### 2nd hidden layer
            self.linear_2 = torch.nn.Linear(num_hidden_1,
                                            num_hidden_2)

            ### Output layer
            self.linear_out = torch.nn.Linear(num_hidden_2,
                                              num_classes)

        def forward(self, x):
            out = self.linear_1(x)
            out = F.relu(out)
            out = self.linear_2(out)
            out = F.relu(out)
            logits = self.linear_out(out)
            probas = F.log_softmax(logits, dim=1)
            return logits, probas

    # Version 2: the equivalent model via torch.nn.Sequential

    class MultilayerPerceptron(torch.nn.Module):

        def __init__(self, num_features, num_classes):
            super(MultilayerPerceptron, self).__init__()

            self.my_network = torch.nn.Sequential(
                torch.nn.Linear(num_features, num_hidden_1),
                torch.nn.ReLU(),
                torch.nn.Linear(num_hidden_1, num_hidden_2),
                torch.nn.ReLU(),
                torch.nn.Linear(num_hidden_2, num_classes)
            )

        def forward(self, x):
            logits = self.my_network(x)
            probas = F.softmax(logits, dim=1)
            return logits, probas
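A minimal usage sketch, not from the slides (the hidden sizes and the 784/10 feature/class counts are placeholder assumptions):

    num_hidden_1, num_hidden_2 = 128, 64  # the globals the classes above rely on
    model = MultilayerPerceptron(num_features=784, num_classes=10)

    features = torch.randn(32, 784)       # a random mini-batch of 32 examples
    logits, probas = model(features)
    print(logits.shape, probas.shape)     # torch.Size([32, 10]) for both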


Solving the XOR Problem with Non-Linear


Activations

Sebastian Raschka STAT 453: Intro to Deep Learning 18


Solving the XOR Problem with Non-Linear
Activations

Left: 1-hidden layer MLP with linear activation function, σ(z) = z
Right: 1-hidden layer MLP with non-linear activation function (ReLU), σ(z) = max(0, z)

https://github.com/rasbt/stat453-deep-learning-ss21/blob/master/L09/code/xor-problem.ipynb
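
A minimal self-contained sketch in the spirit of the linked notebook (the hidden size, learning rate, and epoch count are arbitrary choices, not the notebook's exact settings):

import torch

torch.manual_seed(1)
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([0, 1, 1, 0])

model = torch.nn.Sequential(
    torch.nn.Linear(2, 8),
    torch.nn.ReLU(),  # swap in torch.nn.Identity() and XOR becomes unlearnable
    torch.nn.Linear(8, 2))

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(1000):
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(X), y)
    loss.backward()
    optimizer.step()

print(model(X).argmax(dim=1))  # typically tensor([0, 1, 1, 0]); rerun with another seed if a unit dies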

Sebastian Raschka STAT 453: Intro to Deep Learning 19



A Selection of Common Activation Functions (1)


Identity:
σ(z) = z

(Logistic) Sigmoid:
σ(z) = 1 / (1 + exp(−z))

Tanh ("tanH"):
Tanh(z) = (exp(z) − exp(−z)) / (exp(z) + exp(−z))

Hard Tanh:
HardTanh(z) = 1 if z > 1; −1 if z < −1; z otherwise

Sebastian Raschka STAT 453: Intro to Deep Learning 20


A Selection of Common Activation Functions (1)

[Plots of the (logistic) sigmoid and Tanh ("tanH") functions]

• Advantages of Tanh:
• Mean centering
• Positive and negative values
• Larger gradients

Additional tip: it is also good to normalize the inputs to mean zero and to use a random weight initialization with the average weight centered at zero.

Tanh also has a simple derivative:
d/dz Tanh(z) = 1 − Tanh(z)²
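
The derivative identity is easy to verify numerically with autograd (a small sanity check, not part of the original slide):

import torch

z = torch.linspace(-3, 3, steps=7, requires_grad=True)
torch.tanh(z).sum().backward()
print(torch.allclose(z.grad, 1 - torch.tanh(z.detach())**2))  # True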
Sebastian Raschka STAT 453: Intro to Deep Learning 21

A Selection of Common Activation Functions (2)


ReLU (Rectified Linear Unit):
ReLU(z) = z if z ≥ 0; 0 otherwise
        = max(0, z)

Leaky ReLU (with a small fixed constant α, e.g., α = 0.025):
LeakyReLU(z) = z if z ≥ 0; α·z otherwise
             = max(0, z) + α·min(0, z)

ELU (Exponential Linear Unit; e.g., with α = 1):
ELU(z) = max(0, z) + min(0, α·(exp(z) − 1))

PReLU (Parameterized Rectified Linear Unit); same form as Leaky ReLU, but here α is a trainable parameter:
PReLU(z) = z if z ≥ 0; α·z otherwise
         = max(0, z) + α·min(0, z)
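
All four functions are available as PyTorch built-ins; a short sketch (the α values mirror the constants above):

import torch
import torch.nn.functional as F

z = torch.linspace(-2., 2., steps=5)

print(F.relu(z))
print(F.leaky_relu(z, negative_slope=0.025))          # fixed alpha
print(F.elu(z, alpha=1.0))
prelu = torch.nn.PReLU(num_parameters=1, init=0.25)   # alpha is a learnable parameter
print(prelu(z))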

Sebastian Raschka STAT 453: Intro to Deep Learning 22


https://arxiv.org/abs/2006.14536
[Figure 1: Left panel: ReLU and Parametric SoftPlus (forward pass). Right panel: the first derivatives for ReLU and Parametric SoftPlus (backward pass). Compared to ReLU, Parametric Softplus is smooth with continuous derivatives.]

Excerpts from the paper:

"It is commonly believed that networks cannot be both accurate and robust, that gaining robustness means losing accuracy. [...] Our key observation is that the widely-used ReLU activation function significantly weakens adversarial training due to its non-smooth nature. Hence we propose smooth adversarial training (SAT), in which we replace ReLU with its smooth approximations to strengthen adversarial training."

"Guided by this principle, we identify that ReLU, a widely-used activation function in most network architectures, significantly weakens adversarial training due to its non-smooth nature, e.g., ReLU's gradient gets an abrupt change when its input is zero, as illustrated in Figure 1. To fix the issues induced by ReLU, we propose smooth adversarial training (SAT), which enforces architectural smoothness via replacing ReLU with its smooth approximations for improving the gradient quality in adversarial training (Figure 1 shows Parametric SoftPlus, an example of smooth approximations for ReLU). With smooth activation functions, SAT is able to feed the networks with harder adversarial training samples and compute better gradient updates for network optimization, hence substantially strengthens adversarial training."

Sebastian Raschka STAT 453: Intro to Deep Learning 23
[Figure 2: Visualizations of smooth activation functions (Parametric SoftPlus, Swish, GELU, ELU, SmoothReLU) and their derivatives.]

https://arxiv.org/abs/2006.14536

Excerpts from the paper:

"As shown above, improving ReLU's gradient can both strengthen the attacker and provide better gradient updates. Nonetheless, this strategy may be suboptimal as there still is a discrepancy between the forward pass (which we use ReLU) and the backward pass (which we use Parametric Softplus). To fully exploit the potential of training with better gradients, we hereby propose smooth adversarial training (SAT), which enforces architectural smoothness via the exclusive usage of smooth activation functions in adversarial training. We keep all other network components the same, as most of them will not result in the issue of poor gradient."

"4.1 Adversarial Training with Smooth Activation Functions. We consider the following activation functions as the smooth approximations of ReLU in SAT (Figure 2 plots these functions as well as their derivatives):
• Softplus [24]: f(x) = log(1 + exp(x)). We also consider its parametric version f(x) = (1/α) log(1 + exp(αx)), and set α = 10 as in Section 3.
• Swish [31, 9]: f(x) = x · sigmoid(x). [...]"

[Figure 3: Smooth activation functions improve adversarial training. Compared to ReLU, all smooth activation functions significantly boost robustness, while keeping accuracy almost the same.]

Sebastian Raschka STAT 453: Intro to Deep Learning 24
https://twitter.com/TheInsaneApp/status/1366324846976659461?s=20
Sebastian Raschka STAT 453: Intro to Deep Learning 25
Implementing Multilayer Perceptrons in
PyTorch

1. Multilayer Perceptron Architecture

2. Nonlinear Activation Functions

3. Multilayer Perceptron Code Examples

4. Overfitting and Underfitting

5. Cats & Dogs and Custom Data Loaders

Sebastian Raschka STAT 453: Intro to Deep Learning 26


Multilayer Perceptron with Sigmoid Activation and MSE Loss:
https://github.com/rasbt/stat453-deep-learning-ss21/blob/main/L09/code/mlp-pytorch_sigmoid-mse.ipynb
Training Accuracy: 90.7%
Test Accuracy: 91.32%

vs.

Multilayer Perceptron with Softmax Activation and Cross Entropy Loss:
https://github.com/rasbt/stat453-deep-learning-ss21/blob/main/L09/code/mlp-pytorch_softmax-crossentr.ipynb
Training Accuracy: 99.0%
Test Accuracy: 97.61%
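
For reference, a minimal sketch of the two loss setups being compared (hypothetical tensors; see the linked notebooks for the full training code):

import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)            # hypothetical network outputs
targets = torch.randint(0, 10, (8,))

# softmax + cross entropy: operates directly on the logits
ce_loss = F.cross_entropy(logits, targets)

# sigmoid + MSE: requires one-hot-encoded targets
onehot = F.one_hot(targets, num_classes=10).float()
mse_loss = F.mse_loss(torch.sigmoid(logits), onehot)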

Sebastian Raschka STAT 453: Intro to Deep Learning 27



Dead Neurons

• ReLU is probably the most popular activation function (simple to compute, fast, good results)
• But ReLU neurons especially might "die" during training
• This can happen if, e.g., a weight update pushes the net input so far into the negative range that the ReLU output is zero for every training example; since the gradient is 0 for z < 0, the neuron never recovers
• Not necessarily bad; it can be considered a form of regularization
• (Compared to sigmoid/Tanh, ReLU suffers less from the vanishing gradient problem but can more easily "explode")
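
A rough way to look for candidate dead units (a sketch with a hypothetical single-layer model and random inputs): a hidden ReLU unit that outputs zero for every example in a large batch receives no gradient through that batch:

import torch

torch.manual_seed(0)
layer = torch.nn.Sequential(torch.nn.Linear(20, 50), torch.nn.ReLU())
features = torch.randn(128, 20)

with torch.no_grad():
    activations = layer(features)
    dead = (activations == 0).all(dim=0)  # zero output across the entire batch
    print(f"candidate dead units: {dead.sum().item()} / {dead.numel()}")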

Sebastian Raschka STAT 453: Intro to Deep Learning 28



Wide vs Deep Architectures


(Breadth vs Depth)

MLPs with one (large) hidden layer are already universal function approximators [1-3], so why do we want to use deeper architectures?

[1] Balázs Csanád Csáji (2001). Approximation with Artificial Neural Networks. Faculty of Sciences, Eötvös Loránd University, Hungary.
[2] Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function", Mathematics of Control, Signals, and Systems, 2(4), 303–314. doi:10.1007/BF02551274
[3] Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359-366.

Sebastian Raschka STAT 453: Intro to Deep Learning 29



Wide vs Deep Architectures


(Breadth vs Depth)

• We can achieve the same expressiveness with more layers but fewer parameters (combinatorics); fewer parameters => less overfitting
• Also, having more layers provides some form of regularization: later layers are constrained by the behavior of earlier layers
• However, more layers => vanishing/exploding gradients
• Later: different layers for different levels of feature abstraction (DL is really more about feature learning than just stacking multiple layers)

Sebastian Raschka STAT 453: Intro to Deep Learning 30



The problems with models that are too simple
and models that fit the training data "too well"

1. Multilayer Perceptron Architecture

2. Nonlinear Activation Functions

3. Multilayer Perceptron Code Examples

4. Overfitting and Underfitting

5. Cats & Dogs and Custom Data Loaders

Sebastian Raschka STAT 453: Intro to Deep Learning 31


[Slides 32-41: figure-only slides (no extractable text)]
Recommended Practice: Looking at Some
Failure Cases

Failure cases of a ~93% accuracy (not very good, but beside the point)
2-layer (1-hidden layer) MLP on MNIST
(where t=target class and p=predicted class)

Sebastian Raschka STAT 453: Intro to Deep Learning 42



Overfitting and Underfitting

We usually use the test set error as an estimator of the generalization error.

[Figure: error vs. model capacity; the training error keeps decreasing with capacity while the generalization error eventually rises again, and the widening gap marks the overfitting regime.]

Sebastian Raschka STAT 453: Intro to Deep Learning 43


Bias-Variance Decomposition

General definition (we ignore noise in this lecture for simplicity):

Bias(θ̂) = E[θ̂] − θ

Var(θ̂) = E[(θ̂ − E[θ̂])²] = E[θ̂²] − (E[θ̂])²

Intuition: the bias measures how far the estimates are off from the true value on average; the variance measures how much the estimates scatter around their own mean.

Sebastian Raschka STAT 453: Intro to Deep Learning 44
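
For completeness, the standard one-step derivation behind these definitions (squared-error loss, noise term ignored as above):

\mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta)^2\big]
  = \underbrace{\mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]}_{\operatorname{Var}(\hat{\theta})}
  + \underbrace{\big(\mathbb{E}[\hat{\theta}] - \theta\big)^2}_{\operatorname{Bias}(\hat{\theta})^2}

where the cross term vanishes because \mathbb{E}\big[\hat{\theta} - \mathbb{E}[\hat{\theta}]\big] = 0.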
Bias & Variance vs Overfitting & Underfitting

[Figure: training error and generalization error vs. model capacity; the bias increases in the underfitting regime, the variance increases in the overfitting regime.]

capacity: abstract concept meaning roughly the number of parameters of the model times how efficiently the parameters are used

Sebastian Raschka STAT 453: Intro to Deep Learning 45



Further Reading Material

• A more detailed understanding of bias and variance is not mandatory for this class

• Also, train/valid/test splits are usually sufficient for training & estimating the generalization performance in deep learning

• However, if you are interested, we covered these topics in the model evaluation lectures in my STAT 451 class; you can find a compilation of the lecture material here:

Raschka, S. (2018). Model evaluation, model selection, and algorithm selection in machine learning. arXiv preprint arXiv:1811.12808.
https://arxiv.org/pdf/1811.12808.pdf
Sebastian Raschka STAT 453: Intro to Deep Learning 46

Deep Learning Works Best with Large Datasets

[Figure: generalization error as a function of training dataset size; traditional machine learning levels off, whereas deep learning keeps improving as the training dataset grows.]

Sebastian Raschka STAT 453: Intro to Deep Learning 47


"[...] double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify the above phenomena by defining a new complexity measure we call the effective model complexity and conjecture a generalized double descent with respect to this measure. Furthermore, our notion of model complexity allows us to identify certain regimes where increasing (even quadrupling) the number of train samples actually hurts test performance."

https://arxiv.org/abs/1912.02292

Architectures: CNNs (standard & ResNet) and transformers trained with cross-entropy loss

[Figure 1 from the paper (left): train and test error as a function of model size]

Sebastian Raschka STAT 453: Intro to Deep Learning 48
EPOCH-WISE DOUBLE DESCENT

"In this section, we demonstrate a novel form of double-descent with respect to training epochs, which is consistent with our unified view of effective model complexity (EMC) and the generalized double descent hypothesis. Increasing the train time increases the EMC—and thus a sufficiently large model transitions from under- to over-parameterized over the course of training. [...] As illustrated in Figure 9, sufficiently large models can undergo a 'double descent' behavior [...]"

https://arxiv.org/abs/1912.02292

"Figure 9: Left: Training dynamics for models in three regimes. Models are ResNet18s on CIFAR10 with 20% label noise, trained using Adam with learning rate 0.0001, and data augmentation. Right: Test error over (Model size × Epochs). Three slices of this plot are shown on the left."

Thoughts:
• At the critical region, only one model fits the data well, and it is very sensitive to noise
• Overparametrized models: many fit the data well; SGD finds one that memorizes the training set but also performs well on the test set

Sebastian Raschka STAT 453: Intro to Deep Learning 49


https://arxiv.org/abs/1912.02292

Figure 2: Left: Test error as a function of model size and train epochs. The horizontal line corre-
sponds to model-wise double descent–varying model size while training for as long as possible. The
vertical line corresponds to epoch-wise double descent, with test error undergoing double-descent
as train time increases. Right: Train error of the corresponding models. All models are ResNet18s
trained on CIFAR-10 with 15% label noise, data-augmentation, and Adam for up to 4K epochs.

Sebastian Raschka STAT 453: Intro to Deep Learning 50


"These twin effects are shown in Figure 11a. Note that there is a range of model sizes where these effects 'cancel out'—and having 4× more train samples does not help test performance when training to completion. Outside the critically-parameterized regime, for sufficiently under- or over-parameterized models, having more samples helps. This phenomenon is corroborated in Figure 3, which shows test error as a function of both model and sample size [...]"

https://arxiv.org/abs/1912.02292

Sebastian Raschka STAT 453: Intro to Deep Learning 51


Training multilayer perceptrons with your own
datasets

1. Multilayer Perceptron Architecture

2. Nonlinear Activation Functions

3. Multilayer Perceptron Code Examples

4. Overfitting and Underfitting

5. Cats & Dogs and Custom Data Loaders

Sebastian Raschka STAT 453: Intro to Deep Learning 52


VGG16 Convolutional Neural Network for
Kaggle's Cats and Dogs Images
A "real world" example

https://github.com/rasbt/deeplearning-models/blob/master/pytorch_ipynb/cnn/cnn-vgg16-cats-dogs.ipynb

Sebastian Raschka STAT 453: Intro to Deep Learning 53


Training/Validation/Test splits

• The ratio depends on the dataset size, but an 80/5/15 split is usually a good idea

• The training set is used for training; it is not necessary to plot the training accuracy during training, but it can be useful

• The validation set accuracy provides a rough estimate of the generalization performance (it can be optimistically biased if you design the network to do well on the validation set; "information leakage")

• The test set should only be used once, to get an unbiased estimate of the generalization performance
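
In PyTorch, such a split can be produced with torch.utils.data.random_split; a minimal sketch (the TensorDataset is a stand-in for whatever dataset you actually use, and the seed is arbitrary):

import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.randn(1000, 3), torch.randint(0, 2, (1000,)))  # placeholder data

n_train = int(len(dataset) * 0.80)
n_valid = int(len(dataset) * 0.05)
n_test = len(dataset) - n_train - n_valid  # remainder, i.e., 15%

train_set, valid_set, test_set = random_split(
    dataset, [n_train, n_valid, n_test],
    generator=torch.Generator().manual_seed(123))  # fixed seed so the split is reproducible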

Sebastian Raschka STAT 453: Intro to Deep Learning 54



Training/Validation/Test splits
Epoch: 001/100 | Batch 000/156 | Cost: 1136.912
Epoch: 001/100 | Batch 120/156 | Cost: 0.632
Epoch: 001/100 Train Acc.: 63.35% | Validation Acc.: 62.12%
Time elapsed: 3.09 min
Epoch: 002/100 | Batch 000/156 | Cost: 0.667
Epoch: 002/100 | Batch 120/156 | Cost: 0.664
Epoch: 002/100 Train Acc.: 66.05% | Validation Acc.: 66.32%
Time elapsed: 6.15 min
Epoch: 003/100 | Batch 000/156 | Cost: 0.613
Epoch: 003/100 | Batch 120/156 | Cost: 0.631
Epoch: 003/100 Train Acc.: 65.82% | Validation Acc.: 63.76%
Time elapsed: 9.21 min
Epoch: 004/100 | Batch 000/156 | Cost: 0.599
Epoch: 004/100 | Batch 120/156 | Cost: 0.583
Epoch: 004/100 Train Acc.: 66.75% | Validation Acc.: 64.52%
Time elapsed: 12.27 min
Epoch: 005/100 | Batch 000/156 | Cost: 0.591
Epoch: 005/100 | Batch 120/156 | Cost: 0.574
Epoch: 005/100 Train Acc.: 68.29% | Validation Acc.: 67.00%
Time elapsed: 15.33 min
...

Sebastian Raschka STAT 453: Intro to Deep Learning 55



Parameters vs Hyperparameters

Parameters:
• weights (weight parameters)
• biases (bias units)

Hyperparameters:
• minibatch size
• data normalization schemes
• number of epochs
• number of hidden layers
• number of hidden units
• learning rates
• (random seed, why?)
• loss function
• various weights (weighting terms)
• activation function types
• regularization schemes (more later)
• weight initialization schemes (more later)
• optimization algorithm type (more later)
• ...

(Mostly no scientific explanation, mostly engineering; need to try many things -> "graduate student descent")

Sebastian Raschka STAT 453: Intro to Deep Learning 56



https://twitter.com/_ScottCondron/status/1363494433715552259?s=20

Sebastian Raschka STAT 453: Intro to Deep Learning 57


Custom DataLoader Classes ...
• Example showing how you can create your own data loader to efficiently iterate through your own collection of images (pretend the MNIST images there are some custom image collection)

https://github.com/rasbt/stat453-deep-learning-ss20/blob/main/L09/code/custom-dataloader
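
The core pattern in the linked example is a torch.utils.data.Dataset subclass; a condensed sketch (the CSV layout, column names, and transform are placeholder assumptions, not the notebook's exact code):

import os
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class MyDataset(Dataset):
    def __init__(self, csv_path, img_dir, transform=None):
        df = pd.read_csv(csv_path)  # assumed columns: "filename", "label"
        self.img_dir = img_dir
        self.img_names = df["filename"].values
        self.labels = df["label"].values
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        img = Image.open(os.path.join(self.img_dir, self.img_names[index]))
        if self.transform is not None:
            img = self.transform(img)
        return img, self.labels[index]

# usage sketch:
# train_dataset = MyDataset("train.csv", "train_images/", transform=transforms.ToTensor())
# train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)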

Sebastian Raschka STAT 453: Intro to Deep Learning 58


Good news: We can solve non-linear problems now! Yay! :)

Bad news: Our multilayer neural nets have a lot of parameters now, and it's easy to overfit the data! :(

Sebastian Raschka STAT 453: Intro to Deep Learning 59


Next Lecture

Regularization for deep neural networks
to prevent overfitting

Sebastian Raschka STAT 453: Intro to Deep Learning 60


