L10 Regularization Slides
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching
Lecture 10: Regularization Methods with Applications in Python
• Early stopping
• L1/L2 regularization (norm penalties)
• Dropout
[Figure: original images vs. randomly augmented versions]
https://github.com/rasbt/stat453-deep-learning-ss21/blob/master/L10/code/data-augmentation.ipynb
• use the test set only once, at the end (for an unbiased estimate of generalization performance)
• use validation accuracy for tuning (always recommended)
[Figure: the dataset is split into training, validation, and test datasets; a plot shows validation-set accuracy over training epochs]
Other Ways for Dealing with Overfitting if Collecting More Data is not Feasible
$\text{Cost}_{w,b} = \frac{1}{n} \sum_{i=1}^{n} L\big(y^{[i]}, \hat{y}^{[i]}\big)$
$\text{L2-Regularized-Cost}_{w,b} = \frac{1}{n} \sum_{i=1}^{n} L\big(y^{[i]}, \hat{y}^{[i]}\big) + \frac{\lambda}{n} \sum_{j} w_j^2$
where $\sum_j w_j^2 = \|w\|_2^2$ and $\lambda$ is a hyperparameter
$\text{L2-Regularized-Cost}_{w,b} = \frac{1}{n} \sum_{i=1}^{n} L\big(y^{[i]}, \hat{y}^{[i]}\big) + \frac{\lambda}{n} \sum_{l=1}^{L} \|w^{(l)}\|_F^2$
where $\|w^{(l)}\|_F^2$ is the squared Frobenius norm: $\|w^{(l)}\|_F^2 = \sum_i \sum_j \big(w_{i,j}^{(l)}\big)^2$
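As a quick concreteness check, here is a minimal sketch (assuming PyTorch; the matrix shape is arbitrary) verifying that the squared Frobenius norm is just the sum of all squared entries of a weight matrix:

import torch

w = torch.randn(3, 4)  # a hypothetical layer weight matrix
fro_squared = torch.linalg.matrix_norm(w, ord='fro') ** 2
sum_of_squares = (w ** 2).sum()
print(torch.allclose(fro_squared, sum_of_squares))  # True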
$w_{i,j} := w_{i,j} - \eta \, \frac{\partial L}{\partial w_{i,j}}$
# regularize loss
L2 = 0.
for name, p in model.named_parameters():
    if 'weight' in name:
        L2 = L2 + (p**2).sum()
cost = cost + LAMBDA/n * L2  # add the penalty term; LAMBDA (λ) and n defined elsewhere
optimizer.zero_grad()
cost.backward()
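As a side note, PyTorch optimizers also expose a weight_decay argument that adds the penalty gradient directly to the parameter gradients; for plain SGD this is equivalent to an L2 penalty on the loss, though it decays all parameters (including biases) and uses a different scaling convention than the λ/n above:

# L2 regularization via the optimizer instead of a manual penalty term
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)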
Dropout
1. Improving generalization performance
2. Avoiding overfitting with (1) more data and (2) data augmentation
3. Reducing network capacity & early stopping
4. Adding norm penalties to the loss: L1 & L2 regularization
5. Dropout
5.1 The Main Concept Behind Dropout
5.2 Dropout: Co-Adaptation Interpretation
5.3 Dropout: Ensemble Method Interpretation
5.4 Dropout in PyTorch
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014).
Dropout: a simple way to prevent neural networks from overfitting. The Journal of
Machine Learning Research, 15(1), 1929-1958.
[Figure: a fully connected network with inputs $x_1, x_2$, hidden units $a_1^{(1)}, a_2^{(1)}, a_3^{(1)}$, second-layer units $a_1^{(2)}, a_2^{(2)}$, and output $o$]
• p := drop probability
• v := random sample from a uniform distribution in range [0, 1]
• ∀i: v_i := 0 if v_i < p, else 1 (v then acts as a binary mask on the activations; see the sketch below)
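A minimal sketch (assuming PyTorch; the function name and values are illustrative) of applying such a random binary mask to a layer's activations during training:

import torch

def dropout_forward(a, p=0.5):
    # v: random sample from a uniform distribution in the range [0, 1)
    v = torch.rand_like(a)
    # mask: 0 where v_i < p, 1 otherwise; applied elementwise
    mask = (v >= p).float()
    return a * mask

a = torch.tensor([0.5, 1.2, -0.3, 2.0])
print(dropout_forward(a, p=0.5))  # on average, a fraction p of units are zeroed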
(you may know this as the "geometric mean" from other classes)
• However, using the last model after training and scaling the predictions by a factor of 1-p approximates the geometric mean and is much cheaper
(actually, it's exactly the geometric mean if we have a linear model)
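A hedged sketch of the two equivalent scaling conventions (illustrative code, not from the lecture notebook): scale activations by 1-p at test time, or use "inverted" dropout, which scales by 1/(1-p) during training so that test time is a plain forward pass. PyTorch's torch.nn.Dropout uses the inverted variant.

def dropout_at_test_time(a, p=0.5):
    # no units are dropped at test time; scale so expected
    # activations match those seen during training
    return a * (1.0 - p)

def inverted_dropout_forward(a, p=0.5):
    # scale by 1/(1-p) at training time instead,
    # so no scaling is needed at test time
    v = torch.rand_like(a)
    return a * (v >= p).float() / (1.0 - p)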
cost.backward()
minibatch_cost.append(cost)

### UPDATE MODEL PARAMETERS
optimizer.step()

model.eval()  # switch off dropout for evaluation
with torch.no_grad():
    cost = compute_loss(model, train_loader)
    epoch_cost.append(cost)
    print('Epoch: %03d/%03d Train Cost: %.4f' % (
          epoch+1, NUM_EPOCHS, cost))

print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))
Without dropout:
https://github.com/rasbt/stat453-deep-learning-ss21/blob/master/L10/code/dropout.ipynb
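A minimal model sketch (layer sizes are illustrative, not taken from the notebook) showing torch.nn.Dropout and the train/eval switch that turns it on and off:

import torch

class MLP(torch.nn.Module):
    def __init__(self, num_features, num_classes, drop_proba=0.5):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(num_features, 128),
            torch.nn.ReLU(),
            torch.nn.Dropout(drop_proba),  # active only in training mode
            torch.nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.layers(x)

model = MLP(num_features=784, num_classes=10)
model.train()  # dropout enabled
model.eval()   # dropout disabled; plain forward pass, no extra scaling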
[Figure: the same fully connected network diagram, with inputs $x_1, x_2$, hidden units $a_1^{(1)}, a_2^{(1)}, a_3^{(1)}$, units $a_1^{(2)}, a_2^{(2)}$, and output $o$]
• Generalization of Dropout
• More "possibilities"
• Less popular & doesn't work as well in practice
• Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014).
Dropout: a simple way to prevent neural networks from overfitting. The Journal of
Machine Learning Research, 15(1), 1929-1958.
http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf