
Advanced Topics:

Semi-Supervised Learning YouTube Playlist

Maziar Raissi

Assistant Professor

Department of Applied Mathematics

University of Colorado Boulder

[email protected]

<latexit sha1_base64="USyVVUP1NTUKfRIMnttD99743OA=">AAACL3icbVC7TgMxEPTxDO8AJY1FhESTcBckoIygoQwSgUghQnu+vcTCZ59sH1IU5UP4DL6AFr4A0SAaCv4CX3IFEKYaze5odidMBTfW99+8mdm5+YXF0tLyyura+kZ5c+vKqEwzbDEllG6HYFBwiS3LrcB2qhGSUOB1eHeWz6/vURuu5KUdpNhNoCd5zBlYJ92WD6tVqtFZDEo71qhA0JLLHgUZHShNBYQoaKpVCr3CVfFr/hh0mgQFqZACzdvy502kWJa4CCbAmE7gp7Y7BG05EzhavskMpsDuoIcdRyUkaLrD8XMjuueUiMbukFhJS8fqT8cQEmMGSeg2E7B983eWi//NOpmNT7pDLtPMomSToDgT1CqaN0UjrpFZMXAEmObuVsr6oIFZ1+evlMjkp41cL8HfFqbJVb0WHNXqF/VK47RoqER2yC7ZJwE5Jg1yTpqkRRh5IE/kmbx4j96r9+59TFZnvMKzTX7B+/oGw0iqPw==</latexit>

– representation learning and/or label propagation


Virtual Adversarial Training: A Regularization Method
for Supervised and Semi-Supervised Learning YouTube Video

→ full objective function

D_l → labeled dataset

negative log likelihood for labeled data

D_ul → unlabeled dataset

Train p(y|x, θ) using D_l and D_ul.
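In the notation of the cited VAT paper, the full objective function referred to above is presumably the labeled negative log-likelihood plus a smoothness regularizer evaluated on both datasets (the weighting coefficient α below comes from that paper, not from these slides):

ℓ(D_l, θ) + α · R_vadv(D_l, D_ul, θ),  where  R_vadv(D_l, D_ul, θ) = (1 / (N_l + N_ul)) Σ_{x* ∈ D_l, D_ul} LDS(x*, θ)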


<latexit sha1_base64="HodMeROlhay1ISPStSkT9GLwkeQ=">AAACQHicbVDLSsNAFJ34rPVVdelmsBEUpCQK6lKsC5cKVoW2lJvJtB2cTMLMjVhiPsjf8Afc6g+IO3HrykntwqoHBg7n3Ms9c4JECoOe9+JMTE5Nz8yW5srzC4tLy5WV1UsTp5rxBotlrK8DMFwKxRsoUPLrRHOIAsmvgpt64V/dcm1ErC5wkPB2BD0luoIBWqlTqV9oEIq6ydbg/m6nhX2OsO3S1AjVo24rAuwzkNlJ3pEuBRWOa1kqc7fWqVS9mjcE/Uv8EamSEc46lddWGLM04gqZBGOavpdgOwONgkmel1up4QmwG+jxpqUKIm7a2fCzOd20Ski7sbZPIR2qPzcyiIwZRIGdLJKa314h/uc1U+wetjOhkhS5Yt+HuqmkGNOiORoKzRnKgSXAtLBZKeuDBoa237EroSmi5WVbjP+7hr/kcrfm79f2znerR8ejikpknWyQLeKTA3JETskZaRBGHsgTeSYvzqPz5rw7H9+jE85oZ42Mwfn8Aiqyr3w=</latexit>

Adversarial Training
<latexit sha1_base64="nv2tOMZFUebmxB3CejJlp3054qE=">AAACHHicbVDLSsNAFJ3UV62vqksXDhbBVUkqqMuqG5cV2lpoQ7mZTNqhk0mYmQgldOlv+ANu9Q/ciVvBH/A7nLRZ2NYDA4dz7p17OF7MmdK2/W0VVlbX1jeKm6Wt7Z3dvfL+QVtFiSS0RSIeyY4HinImaEszzWknlhRCj9MHb3Sb+Q+PVCoWiaYex9QNYSBYwAhoI/XLx73pH6mk/uTazyZBMuC4KYEJJgb9csWu2lPgZeLkpIJyNPrln54fkSSkQhMOSnUdO9ZuClIzwumk1EsUjYGMYEC7hgoIqXLTaYgJPjWKj4NImic0nqp/N1IIlRqHnpkMQQ/VopeJ/3ndRAdXbspEnGgqyOxQkHCsI5y1gn0mKdF8bAgQyUxWTIYggWjTyNwVX2XRJiVTjLNYwzJp16rORfX8vlap3+QVFdEROkFnyEGXqI7uUAO1EEFP6AW9ojfr2Xq3PqzP2WjByncO0Rysr1+9gKLc</latexit>

Reduction LDS would make the model smooth at each data point.
<latexit sha1_base64="Qkkr92Jle5Ax8lZdZomMBSDeFmw=">AAACO3icbZA9TxtBEIb3ICTEkMQJJc0oFhKVdUckSBWhJAUFBZAYkIxlze2NuZX347Q7B7Is/5v8Df4AbdJQI1GgtPS5My7Cxyut9OidGc3smxZaBY7jq2hu/sXCy1eLrxtLy2/evmu+/3AYXOkldaTTzh+nGEgrSx1WrOm48IQm1XSUDr/V9aMz8kE5+5NHBfUMnlo1UBK5svrNLweUlbJm2P3+A85dqTMwOCTgnMC4jDQE4xzngAyEMocMGaFwynK70W+24nY8FTyFZAYtMdNev3lzkjlZGrIsNYbQTeKCe2P0rKSmSeOkDFSgHOIpdSu0aCj0xtN/TmCtcjIYOF89yzB1/58YowlhZNKq0yDn4XGtNp+rdUsefO6NlS1KJivvFw1KDeygDg0y5UmyHlWA0qvqVpA5epRcRftgSxbq0yZ1MMnjGJ7C4UY72Wx/2t9obX+dRbQoVsVHsS4SsSW2xY7YEx0hxS9xKX6LP9FFdB3dRn/vW+ei2cyKeKDo7h87nq2B</latexit>

VAT on semi-supervised learning can be given a similar interpretation as label propagation.


<latexit sha1_base64="mYvT/27SKLZ8v5G3AuYig8K9fG4=">AAACYHicbVA9bxNBEF1f+DDmIw500IywkGiw7oIElIE0lIkUO5Fsy5pbjy+r7Jd25yysk38gP4GWIm1a6HJ3cUESnrTS03szO08v91pFTtNfnWTnwcNHj7tPek+fPX+x2997OY6uDJJG0mkXznKMpJWlESvWdOYDock1neYXh41/uqIQlbMnvPY0M1hYtVQSuZbmfTll+sHV+OsJOAuRjPoQS09hpSItQBMGq2wBEi3kBIVakQWEqIzSGEBZplDf4/Y3wAgac9Lgg/NYtOJw05v3B+kwbQH3SbYlA7HF0bx/OV04WRqyLDXGOMlSz7MKAyupadOblpE8ygssaFJTi4birGrL2MC7skm+dKF+lqFV/92o0MS4Nnk9aZDP412vEf/nTUpefplVyvqSycqbQ8tSAztomoWFCiRZr2uCMqg6K8hzDCjrjm5fWcQmWltMdreG+2S8P8w+DT8e7w8Ovm0r6oo34q14LzLxWRyI7+JIjIQUP8WV+CP+dn4n3WQ32bsZTTrbnVfiFpLX15N3ufI=</latexit>

divergence between two distributions


<latexit sha1_base64="v9bt+1EYL0wajhoM0cArLcuq7gM=">AAACIXicbVDLTgJBEJzFF+IL9ehlIpp4IruYqEeiF4+YCJgAIbOzDUyYnd3M9EoI4Qf8DX/Aq/6BN+PNePc7nAUOAnbSSaWqO9VdfiyFQdf9cjIrq2vrG9nN3Nb2zu5efv+gZqJEc6jySEb6wWcGpFBQRYESHmINLPQl1P3+TarXH0EbEal7HMbQCllXiY7gDC3Vzp8EwspdUByoDzgAUBQHEQ2stRZ+kk6Zdr7gFt1J0WXgzUCBzKrSzv80g4gnISjkkhnT8NwYWyOmUXAJ41wzMRAz3mddaFioWAimNZp8M6anlgloJ9K2FdIJ+3djxEJjhqFvJ0OGPbOopeR/WiPBzlVrJFScoP13atRJJMWIptHYnzVwlEMLGNfC3kp5j2nG0QY45xKY9LRxzgbjLcawDGqlondRPL8rFcrXs4iy5IgckzPikUtSJrekQqqEkyfyQl7Jm/PsvDsfzud0NOPMdg7JXDnfv6aipOk=</latexit>

! cross entropy
<latexit sha1_base64="qCZrjboG+BNRgEZ9uVoGwxMyiA0=">AAACHXicbVC7TsMwFHXKq5RXgJHFokJiqhJAwFjBwlgk+pCaqHIcp7XqxJF9A1RRV36DH2CFP2BDrIgf4DtwHwNtOZKlo3PuyydIBdfgON9WYWl5ZXWtuF7a2Nza3rF39xpaZoqyOpVCqlZANBM8YXXgIFgrVYzEgWDNoH898pv3TGkukzsYpMyPSTfhEacEjNSxsad4twdEKfmAPWCPkFMltcYsASXTwbBjl52KMwZeJO6UlNEUtY7944WSZrEZQAXRuu06Kfg5UcCpYMOSl2mWEtonXdY2NCEx034+/skQHxklxJFU5iWAx+rfjpzEWg/iwFTGBHp63huJ/3ntDKJLP+dJmgFL6GRRlAkMEo9iwSFXjIIYGEKo4uZWTHtEEQomvJktoR6dNiyZYNz5GBZJ46TinldOb8/K1atpREV0gA7RMXLRBaqiG1RDdUTRE3pBr+jNerberQ/rc1JasKY9+2gG1tcvI3ajqQ==</latexit>

q(y|x_l) → true distribution


<latexit sha1_base64="a0M8Y8L4spVqA5ZztI7oW2budGM=">AAACKnicbVDLTgJBEJz1ifhCPXqZSEz0INlVox6JXjxiImACGzI7NDBx9uFMr7JZ+Qp/wx/wqn/gjXg1foezwEHUSjqpVHenq8uLpNBo20NrZnZufmExt5RfXlldWy9sbNZ0GCsOVR7KUN14TIMUAVRRoISbSAHzPQl17/Yi69fvQWkRBteYROD6rBuIjuAMjdQqHNztJY/9ltynTSW6PWRKhQ+0idDHFFUMtG08KOHF2figVSjaJXsE+pc4E1IkE1Raha9mO+SxDwFyybRuOHaEbsoUCi5hkG/GGiLGb1kXGoYGzAftpqO3BnTXKG3aCZWpAOlI/bmRMl/rxPfMpM+wp3/3MvG/XiPGzpmbiiCKEQI+PtSJJcWQZhmZnxVwlIkhjCthvFLeY4pxNElOXWnrzNogb4Jxfsfwl9QOS85J6ejquFg+n0SUI9tkh+wRh5ySMrkkFVIlnDyRF/JK3qxn690aWh/j0RlrsrNFpmB9fgNOrKjq</latexit>

q(y|xl ) ⇡ h(y; yl ) ! one-hot-vector


<latexit sha1_base64="5myDp2Dx+9YDvevokb/dJfaKAAA=">AAACOHicbVDLThtBEJwlPBzzMskxlxEWkjlg7QKCSFwQueRoJPyQbMuaHbe9I2Znlplex6vFH8Nv5AdyhWNuOQVx5QsYPw4BUlJLpapudXeFiRQWff+3t/RheWV1rfCxuL6xubVd2vnUsDo1HOpcS21aIbMghYI6CpTQSgywOJTQDK+/Tf3mCIwVWl1hlkA3ZkMlBoIzdFKvdHZTyW7HPblPOyxJjB7TqJKdZTPBiGGEzBj9g3YQxphrBQeRxoMRcNRm0iuV/ao/A31PggUpkwVqvdLfTl/zNAaFXDJr24GfYDdnBgWXMCl2UgsJ49dsCG1HFYvBdvPZkxO655Q+HWjjSiGdqf9O5Cy2NotD1xkzjOxbbyr+z2unOPjazYVKUgTF54sGqaSo6TQx2hfG/SszRxg3wt1KecQM4+hyfbWlb6enTYoumOBtDO9J47AanFSPLo/L5xeLiArkC9klFRKQU3JOvpMaqRNO7sgvck8evJ/eH+/Re5q3LnmLmc/kFbznF0xardc=</latexit>

for L2 norm
<latexit sha1_base64="CpFtbwwpEDq07NbTMU+bsYOV6BE=">AAACEnicbVA7TsNAFFzzDeEXQFQ0KxIkqsgOCCgjaCgogkQ+UmJZ6806WWW9a+0+IyIrt+ACtHADOkTLBbgA58BOUpCEqUYz8/RG40eCG7Dtb2tpeWV1bT23kd/c2t7ZLeztN4yKNWV1qoTSLZ8YJrhkdeAgWCvSjIS+YE1/cJP5zUemDVfyAYYRc0PSkzzglEAqeYXDDrAnSAKlcenOq5SwVDoceYWiXbbHwIvEmZIimqLmFX46XUXjkEmgghjTduwI3IRo4FSwUb4TGxYROiA91k6pJCEzbjKuP8InqdLFWYVAScBj9e9FQkJjhqGfJkMCfTPvZeJ/XjuG4MpNuIxiYJJOHgWxwKBwtgXucs0oiGFKCNU87Yppn2hCIV1s5kvXZNVG+XQYZ36GRdKolJ2L8tn9ebF6PZ0oh47QMTpFDrpEVXSLaqiOKErQC3pFb9az9W59WJ+T6JI1vTlAM7C+fgHuz52D</latexit>

Power iteration method and the finite difference method

→ for L1 norm

D(r, x*, θ) := D[p(y|x*, θ̂), p(y|x* + r, θ)]

D(r, x*, θ̂) takes the minimal value at r = 0 and ∇_r D(r, x*, θ̂)|_{r=0} = 0.

Virtual Adversarial Training


<latexit sha1_base64="eJAErnnUpOVTn8Fo8/IyOaHYy6k=">AAACJHicbVDLTsJAFJ3iC/FVdelmIjG6Ii0m6hJ14xITQBIg5HY6wITptJmZmpCGT/A3/AG3+gfujAs3Lv0Op6ULAU8yyck59849OV7EmdKO82UVVlbX1jeKm6Wt7Z3dPXv/oKXCWBLaJCEPZdsDRTkTtKmZ5rQdSQqBx+mDN75N/YdHKhULRUNPItoLYCjYgBHQRurbp93sj0RSf9piUsfA8bWfboBkhjckMMHEsG+XnYqTAS8TNydllKPet3+6fkjigApNOCjVcZ1I9xKQmhFOp6VurGgEZAxD2jFUQEBVL8nCTPGJUXw8CKV5QuNM/buRQKDUJPDMZAB6pBa9VPzP68R6cNVLmIhiTQWZHRrEHOsQp+1gn0lKNJ8YAkQykxWTEUgg2jQyd8VXabRpyRTjLtawTFrVintROb+vlms3eUVFdISO0Rly0SWqoTtUR01E0BN6Qa/ozXq23q0P63M2WrDynUM0B+v7FylEpjM=</latexit>

H → Hessian matrix of D(r, x*, θ̂) at r = 0

<latexit sha1_base64="/o6EV7PJQ1Ftnha4/cpzXc5ODek=">AAACP3icbVDLSgNBEJz1bXxFPXoZTQQFCbsK6lH04lHFqJCE0DvpuIOzD2d61bDmf/wNf8Cr+gN6E6/enI05+GoYKKqru2vKT5Q05LrPzsDg0PDI6Nh4YWJyanqmODt3YuJUC6yKWMX6zAeDSkZYJUkKzxKNEPoKT/2Lvbx/eoXayDg6pk6CjRDOI9mWAshSzeLuESYKBPLy5Urn9ma1zK8lBVyS4SLVGiPiaEiGQFaS5JI1Xg+AsjoFSNBdLS82iyW34vaK/wVeH5RYvw6axZd6KxZpaJcLBcbUPDehRgaapFDYLdRTgwmICzjHmoURhGgaWe+vXb5smRZvx9o+a67Hfp/IIDSmE/pWaU0H5ncvJ//r1VJqbzcyGSUpYSS+DrVTxSnmeXC8JTUKUh0LQGhpvXIRgAZBNt4fV1omt9Yt2GC83zH8BSfrFW+zsnG4XtrZ7Uc0xhbYElthHttiO2yfHbAqE+yOPbBH9uTcO6/Om/P+JR1w+jPz7Ec5H59KT68G</latexit>

Replace q(y|x) with its current estimate p(y|x, ✓)!


→ first dominant eigenvector of H with magnitude ε

Local Distributional Smoothness


<latexit sha1_base64="ihQdvUEbk04GcPmd1TZsNVHgbdo=">AAACHHicbVDLSgMxFM3UV62vUZcuDBbBVZmpoC6LunDhoqJ9QDuUTCbThmaSIckIZejS3/AH3OofuBO3gj/gd5hpZ2FbDwTOPfdezs3xY0aVdpxvq7C0vLK6VlwvbWxube/Yu3tNJRKJSQMLJmTbR4owyklDU81IO5YERT4jLX94lfVbj0QqKviDHsXEi1Cf05BipI3Usw9vBUYMXhsnSf0kE015HwmhB5wo1bPLTsWZAC4SNydlkKPes3+6gcBJRLjGDCnVcZ1YeymSmmJGxqVuokiM8BD1ScdQjiKivHTykTE8NkoAQyHN4xpO1L8bKYqUGkW+mYyQHqj5Xib+1+skOrzwUsrjRBOOp0ZhwqAWMEsFBlQSrNnIEIQlNbdCPEASYW2ym3EJVHbauGSCcedjWCTNasU9q5zeVcu1yzyiIjgAR+AEuOAc1MANqIMGwOAJvIBX8GY9W+/Wh/U5HS1Y+c4+mIH19Qs2HqKP</latexit>

⇠ = 10 6 , d is a randomly sampled u
<latexit sha1_base64="vApTuqdlrtscS+LwCyy239DU06Q=">AAACNHicbVDLSgNBEJz1bXxFPXppTAQPGnYV1IsgevGoYFRIYuidndXBeSwzs2JY8in+hj/gVT9A8CZ69BucxBx81amo6qa6K84Ety4Mn4Oh4ZHRsfGJydLU9MzsXHl+4dTq3FBWp1pocx6jZYIrVnfcCXaeGYYyFuwsvj7o+Wc3zFiu1YnrZKwl8VLxlFN0XmqXt6vNWw67EIUXxfpWt7oG1aQK3AKCQZVoKTpgUWaCJZAr7uCGUacNtMuVsBb2AX9JNCAVMsBRu/zeTDTNJVOOCrS2EYWZaxVoHKeCdUvN3LIM6TVesoanCiWzraL/YBdWvJJA6nNTrRz01e8bBUprOzL2kxLdlf3t9cT/vEbu0p1WwVWWO6boV1CaC3Aaem1Bwo3/13eQcKSG+1uBXqFB6nynP1IS2zutW/LFRL9r+EtON2rRVm3zeKOytz+oaIIskWWySiKyTfbIITkidULJHXkgj+QpuA9egtfg7Wt0KBjsLJIfCD4+Ab6gqSQ=</latexit>

⇠ = 10 6 , d is a randomly sampled unit vector


<latexit sha1_base64="vApTuqdlrtscS+LwCyy239DU06Q=">AAACNHicbVDLSgNBEJz1bXxFPXppTAQPGnYV1IsgevGoYFRIYuidndXBeSwzs2JY8in+hj/gVT9A8CZ69BucxBx81amo6qa6K84Ety4Mn4Oh4ZHRsfGJydLU9MzsXHl+4dTq3FBWp1pocx6jZYIrVnfcCXaeGYYyFuwsvj7o+Wc3zFiu1YnrZKwl8VLxlFN0XmqXt6vNWw67EIUXxfpWt7oG1aQK3AKCQZVoKTpgUWaCJZAr7uCGUacNtMuVsBb2AX9JNCAVMsBRu/zeTDTNJVOOCrS2EYWZaxVoHKeCdUvN3LIM6TVesoanCiWzraL/YBdWvJJA6nNTrRz01e8bBUprOzL2kxLdlf3t9cT/vEbu0p1WwVWWO6boV1CaC3Aaem1Bwo3/13eQcKSG+1uBXqFB6nynP1IS2zutW/LFRL9r+EtON2rRVm3zeKOytz+oaIIskWWySiKyTfbIITkidULJHXkgj+QpuA9egtfg7Wt0KBjsLJIfCD4+Ab6gqSQ=</latexit>

virtual adversarial perturbation

Miyato, Takeru, et al. "Virtual adversarial training: a regularization method for supervised and semi-supervised learning." IEEE Transactions on Pattern Analysis and Machine Intelligence 41.8 (2018): 1979-1993.
Mean teachers are better role models:
Weight-averaged consistency targets improve semi-supervised deep learning results YouTube Video

J → consistency loss

student model → weights θ and noise η

teacher model → weights θ′ and noise η′

J(θ) = E_{x,η,η′}[ ‖f(x, θ′, η′) − f(x, θ, η)‖² ]

θ′_t = α θ′_{t−1} + (1 − α) θ_t → Exponential Moving Average (EMA)

Three types of noise:
– random translations and horizontal flips of the input images
– Gaussian noise on the input layer
– dropout applied within the network
Ramp up the scale of the consistency loss from zero to its final value.

Temporal Ensembling: Maintains an exponential moving average of label predictions on each training example.

Tarvainen, Antti, and Harri Valpola. "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results."
arXiv preprint arXiv:1703.01780 (2017).
MixMatch: A Holistic Approach
to Semi-Supervised Learning YouTube Playlist

p_model(y|x; θ) → a generic model which produces a distribution over class labels y for an input x with parameters θ

Consistency Regularization
<latexit sha1_base64="lGcpTrxly2mD6OOFbIy56UTToZQ=">AAACInicbVDLTgIxFO3gC/GFunTTSEhckRlM1CWRjUs08kiAkE7nAg2dzqTtmIwTvsDf8Afc6h+4M65MXPsddmAWAp6kycm55z563JAzpW37y8qtrW9sbuW3Czu7e/sHxcOjlgoiSaFJAx7IjksUcCagqZnm0AklEN/l0HYn9bTefgCpWCDudRxC3ycjwYaMEm2kQbHcm81IJHjTeiCU2QiCxvgORhEnkj1mvpJdsWfAq8TJSAllaAyKPz0voJEPQlNOlOo6dqj7CZGaUQ7TQi9SEBI6ISPoGiqID6qfzE6Z4rJRPDwMpHlC45n6tyMhvlKx7xqnT/RYLddS8b9aN9LDq37CRBiln5wvGkYc6wCn2WCPSaCax4YQKpm5FdMxkYRqk+DCFk+lp00LJhhnOYZV0qpWnIvK+W21VLvOIsqjE3SKzpCDLlEN3aAGaiKKntALekVv1rP1bn1Yn3Nrzsp6jtECrO9fRwel0A==</latexit>

encourage the model to produce the same output distribution when its inputs are perturbed
<latexit sha1_base64="xiJlcTn8iP/j/kJhlqgslRJCOgo=">AAACXXicbZA7T8MwFIXd8C6vAgMDi0WFxFQlIAEjgoWxSLQgtVVxnFtq4diRfQ1UUf8e/4GJjYkVZpy0A68rWTo69177+IszKSyG4UslmJmdm19YXKour6yurdc2NttWO8OhxbXU5iZmFqRQ0EKBEm4yAyyNJVzH9+dF//oBjBVaXeEog17K7pQYCM7QW/3abRfhCXNQ3N/I7oDiEGiqE5AUNc2MThyfmJalQLXDzCFNfDAjYlfcQR+HoKhAS4XyPUuZAZqBQWdiSMb9Wj1shGXRvyKaijqZVrNfe+smmrsUFHLJrO1EYYa9nBkUXMK42nUWMsbvfdiOl8rHsr28JDGme95J6EAbfxTS0v2+kbPU2lEa+8mU4dD+7hXmf72Ow8FJLy9/6FlNHhq4ElKB1RMxwFGOvGDcCJ+V8iEzjKOH/+OVxBbRxlUPJvqN4a9oHzSio8bh5UH99GyKaJHskF2yTyJyTE7JBWmSFuHkmbyTD/JZeQ3mgpVgbTIaVKY7W+RHBdtfDzm57Q==</latexit>

‖p_model(y|Augment(x); θ) − p_model(y|Augment(x); θ)‖₂²

Augment(x) → a stochastic transformation (the two terms above are not identical)
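A minimal sketch of the consistency term above; `augment` stands in for the stochastic Augment(x), so the two forward passes see two different random augmentations of the same input.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, augment, x):
    p1 = F.softmax(model(augment(x)), dim=1)   # p_model(y | Augment(x); theta), first draw
    p2 = F.softmax(model(augment(x)), dim=1)   # second, independent draw of Augment(x)
    return ((p1 - p2) ** 2).sum(dim=1).mean()  # squared l2 distance, averaged over the batch
```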


“Mean Teacher” → replaces one of the terms above with the output of the model using an exponential moving average of model parameter values

“Virtual Adversarial Training” (VAT) → computing an additive perturbation to apply to the input which maximally changes the output class distribution

Entropy Minimization
<latexit sha1_base64="0aXAwrsok8vfH1RpFsYnfSGXvco=">AAACHHicbVDLSsNAFJ3UV62vqksXDhbBVUkqqMuiCG6ECvYBbSiTyaQdOpmEmYkQQ5b+hj/gVv/AnbgV/AG/w0mahW09MHA45965h+OEjEplmt9GaWl5ZXWtvF7Z2Nza3qnu7nVkEAlM2jhggeg5SBJGOWkrqhjphYIg32Gk60yuMr/7QISkAb9XcUhsH4049ShGSkvD6uEg/yMRxE2vuRJBGMNbyqlPH4uJmlk3c8BFYhWkBgq0htWfgRvgyCdcYYak7FtmqOwECUUxI2llEEkSIjxBI9LXlCOfSDvJQ6TwWCsu9AKhH1cwV/9uJMiXMvYdPekjNZbzXib+5/Uj5V3YCeVhpAjH00NexKAKYNYKdKkgWLFYE4QF1VkhHiOBsNLdzVxxZRYtrehirPkaFkmnUbfO6qd3jVrzsqioDA7AETgBFjgHTXADWqANMHgCL+AVvBnPxrvxYXxOR0tGsbMPZmB8/QIjDaMb</latexit>

encourage the model to output confident predictions on unlabeled data


<latexit sha1_base64="SvQdp9y3O5Gr+KnDFD0klktiiDA=">AAACRHicbVBNSyNBEO1x/dqsH3H3uJdmg7CnMKOgHkUP6zELGw0kIdR015jGnu6hu0YIIT/Jv+Ef8CSs1z3tTbyKPUkOa7IFBY9XX69eWmjlKY4fo5UPq2vrG5sfa5+2tnd263ufL70tncC2sNq6TgoetTLYJkUaO4VDyFONV+nNeVW/ukXnlTW/aFRgP4drozIlgAI1qP9AI8IuuEZOQ+S5lag5WW5LKkriwppMSTTEw1apRDXkuTW8NBpS1Ci5BAJeG9QbcTOeBl8GyRw02Dxag/qfnrSizMNuocH7bhIX1B+DIyU0Tmq90mMB4iYo6wZoIEffH08fnvD9wEieWRfSVCoD++/EGHLvR3kaOnOgoV+sVeT/at2SspP+WJnwezBmdigrp45U7nGpHArSowBAOBW0cjEEB4KCx++uSF9Jm1TGJIs2LIPLg2Zy1Dz8edA4PZtbtMm+sm/sO0vYMTtlF6zF2kywO/bAfrOn6D76Gz1HL7PWlWg+84W9i+j1DcoHsl4=</latexit>

<latexit sha1_base64="oSsE/gUUvbn/U2LrL96yM5sx7hg=">AAACRXicbVA9bxNBEN0zkBiHJAZKmhVOpKSx7hwJkGgsaFIGCSeWbMua25uLV9mP0+5c5OPwX+Jv8AdoKKBMlw7RwtpxQRJe9fRmRu/NSwslPcXx96jx4OGjjc3m49bWk+2d3fbTZ6felk7gQFhl3TAFj0oaHJAkhcPCIehU4Vl68X45P7tE56U1H6kqcKLh3MhcCqAgTdvHWhqp5SfkNEOOhpwtKm5zvldMx4RzqrXNUC0Oqs/zt3wclggO93huHS+NghQVZjwDgmm7E3fjFfh9kqxJh61xMm1fjTMrSh08hQLvR0lc0KQGR1IoXLTGpccCxAWc4yhQAxr9pF59vOD7QclWKXJriK/Ufy9q0N5XOg2bGmjm786W4v9mo5LyN5NamqIkNOLGKC8VJ8uX9fFMOhSkqkBAOBmycjEDB4JCybdcMr+MtmiFYpK7Ndwnp71u8qp79KHX6b9bV9RkL9hLdsAS9pr12TE7YQMm2Bf2jf1gP6Ov0XX0K/p9s9qI1jfP2S1Ef/4CjeOywA==</latexit>

minimize the entropy of p_model(y|x; θ) for unlabeled data
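A minimal sketch of the entropy-minimization term on an unlabeled batch; assuming `model` returns logits.

```python
import torch
import torch.nn.functional as F

def entropy_loss(model, x_unlabeled):
    log_p = F.log_softmax(model(x_unlabeled), dim=1)     # log p_model(y|x; theta)
    return -(log_p.exp() * log_p).sum(dim=1).mean()      # mean entropy over the batch
```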


“Pseudo-Label” → constructing hard (1-hot) labels from high-confidence predictions on unlabeled data and using these as training targets in a standard cross-entropy loss


“Pseudo-Label” and “Sharpening” also achieve entropy minimization
<latexit sha1_base64="WcDaY8EU5wcGVRaIY40eocbhIhQ=">AAACQnicbVDLSiNBFK32GeMr6tJNYRDdGLoVZlyKIsxiFhGNCjGY29U3SWE9mqpqIYZ80fzG/IDLcfYu3IlbF1YnvfB1oOBwzn3ViVPBrQvDf8HE5NT0zGxprjy/sLi0XFlZPbc6MwwbTAttLmOwKLjChuNO4GVqEGQs8CK+Ocr9i1s0lmt15voptiR0Fe9wBs5L15XjdrtuMUv0zm+IUWxtUVAJbbdPe2BSVFx1c0lYTYH1ON4iReWMTvtUcsUlvyvmVMNaOAL9SqKCVEmB+nXl8SrRLJN+GBNgbTMKU9cagHGcCRyWrzKLKbAb6GLTUwUSbWsw+u6QbnoloR1t/FOOjtT3HQOQ1vZl7CsluJ797OXid14zc5391oCrNHOo2HhRJxPUaZpnRxNukDnR9wSY4f5WynxOwJxP+MOWxOanDcs+mOhzDF/J+W4t+lHbO9mtHhwWEZXIOtkg2yQiP8kB+UXqpEEY+UPuyQP5H/wNnoLn4GVcOhEUPWvkA4LXNziEsSg=</latexit>

Traditional Regularization
<latexit sha1_base64="93d7bR0k6r7eX8gIxrg92gWlJQk=">AAACInicbVDLSsNAFJ3Ud31VXboZLIKrklRQl0U3LlX6ENogN5ObdujkwcxEqCFf4G/4A271D9yJK8G13+GkzcLXgYHDOefOvRwvEVxp2363KnPzC4tLyyvV1bX1jc3a1nZXxalk2GGxiOW1BwoFj7CjuRZ4nUiE0BPY88Znhd+7Ral4HLX1JEE3hGHEA85AG+mmtj+Y/pFJ9PO2BJ8XMgh6hcNUgOR3Za5uN+wp6F/ilKROSlzc1D4HfszSECPNBCjVd+xEuxlIzZnAvDpIFSbAxjDEvqERhKjcbHpKTveN4tMgluZFmk7V7xMZhEpNQs8kQ9Aj9dsrxP+8fqqDEzfjUZJqjNhsUZAKqmNadEN9LpFpMTEEmDRVMMpGIIFp0+CPLb4qTsurphjndw1/SbfZcI4ah5fNeuu0rGiZ7JI9ckAcckxa5JxckA5h5J48kifybD1YL9ar9TaLVqxyZof8gPXxBSDvpbk=</latexit>

weight-decay and mixup


<latexit sha1_base64="IInwi2O4fp7WCpCkjop7R/upTV8=">AAACE3icbVDLSsNAFJ3UV62vqODGzWAR3FiSCuqy6MZlBfuANpTJ5KYdOpmEmYkaaj/DH3Crf+BO3PoB/oDf4fSxsK0HLhzOuZd7OH7CmdKO823llpZXVtfy64WNza3tHXt3r67iVFKo0ZjHsukTBZwJqGmmOTQTCSTyOTT8/vXIb9yDVCwWdzpLwItIV7CQUaKN1LEPHoB1e/o0AEoyTESAI/aYJh276JScMfAicaekiKaoduyfdhDTNAKhKSdKtVwn0d6ASM0oh2GhnSpICO2TLrQMFSQC5Q3G+Yf42CgBDmNpRmg8Vv9eDEikVBb5ZjMiuqfmvZH4n9dKdXjpDZhIUg2CTh6FKcc6xqMycMAkUM0zQwiVzGTFtEckodpUNvMlUKNow4Ipxp2vYZHUyyX3vHR2Wy5WrqYV5dEhOkInyEUXqIJuUBXVEEVP6AW9ojfr2Xq3PqzPyWrOmt7soxlYX7+DJZ5r</latexit>

(x1 , p1 ), (x2 , p2 ) ! pair of two examples with their corresponding labels


<latexit sha1_base64="jQqsQZEng6LGZTZSVeiHTvZh+3A=">AAACW3icbZDPbtNAEMY3Ln9KKBCKOHEZESG1UhXZAZUeK7hwLBJpKyXBWq/H8arr3dXuuElk5e36Ej1w5cAVHoB1mgNtGWmkn75vRjP6Mqukpzi+7kRbDx4+erz9pPt059nzF72Xu6fe1E7gSBhl3HnGPSqpcUSSFJ5bh7zKFJ5lF59b/+wSnZdGf6OlxWnFZ1oWUnAKUtr7vrdIkwOwabJ/AIGHLQ/3YeLkrCTunJnDhHBBjeXSgSmA5gZwwSur0MNcUglUYrCEcQ69NTqXegaKZ6j8Ku3140G8LrgPyQb6bFMnae/nJDeirlCTUNz7cRJbmjbckRQKV91J7dFyccFnOA6oeYV+2qxzWMG7oORQGBdaE6zVfzcaXnm/rLIwWXEq/V2vFf/njWsqjqaN1LYm1OLmUFErIANtqJBLh4LUMgAXToZfQZTccUEh+ltXct++tuqGYJK7MdyH0+EgORy8//qhf/xpE9E2e8Pesj2WsI/smH1hJ2zEBLtiv9hv9qfzI9qKutHOzWjU2ey8Yrcqev0X9A62WA==</latexit>

0
<latexit sha1_base64="S5BUfE/+tuxJ1McUG1d8zC4h6k8=">AAACN3icbVDLSgMxFM34tr6qLt0Eq+iqzKioCILoxqWCVaFTyp1M2oZmMkNyp7YM/Rd/wx9wq1tX7sStf2DazsKqBwKHcx/n5gSJFAZd982ZmJyanpmdmy8sLC4trxRX125NnGrGKyyWsb4PwHApFK+gQMnvE80hCiS/C9oXg/pdh2sjYnWDvYTXImgq0RAM0Er14omvRbOFoHX8QH3kXcw6oISUQCPRTROaGm7oli/tyhB26CnN6Va/Xiy5ZXcI+pd4OSmRHFf14ocfxiyNuEImwZiq5yZYy0CjYJL3C771SoC1ocmrliqIuKllwz/26bZVQtqItX0K6VD9OZFBZEwvCmxnBNgyv2sD8b9aNcXGcS0TKkmRKzYyaqSSYkwHgdFQaM5Q9iwBpoW9lbIWaGBoYx1zCc3gtH7BBuP9juEvud0re4fl/euD0tl5HtEc2SCbZJd45IickUtyRSqEkUfyTF7Iq/PkvDsfzueodcLJZ9bJGJyvb9ldrPI=</latexit>

! vanilla mixup uses = Loss Function


<latexit sha1_base64="Ms/nIKno+dIMKCowCo1I2BqIcng=">AAACFXicbVDNSgMxGMzWv1r/ql4EL8EieCq7FdRjURAPHirYVmiXks1m29BssiRZoSzra/gCXvUNvIlXz76Az2F2uwfbOhAYZr4v8zFexKjStv1tlZaWV1bXyuuVjc2t7Z3q7l5HiVhi0saCCfngIUUY5aStqWbkIZIEhR4jXW98lfndRyIVFfxeTyLihmjIaUAx0kYaVA/6+R+JJH56K5SC1zHHU6tm1+0ccJE4BamBAq1B9afvCxyHhGvMkFI9x460myCpKWYkrfRjRSKEx2hIeoZyFBLlJnl6Co+N4sNASPO4hrn6dyNBoVKT0DOTIdIjNe9l4n9eL9bBhZtQHsWacDwNCmIGtYBZHdCnkmDNJoYgLKm5FeIRkghrU9pMiq+y09KKKcaZr2GRdBp156x+eteoNS+LisrgEByBE+CAc9AEN6AF2gCDJ/ACXsGb9Wy9Wx/W53S0ZBU7+2AG1tcvBBGf0w==</latexit>

x0 closer to x1 than x2
<latexit sha1_base64="OOEM7qGU+RzIGkbhV4qcsVsdRtA=">AAACHHicbVDLSsNAFJ3UV62vqEsXDjaiq5JUUJdFNy4r2Ae0IUwm03boZBJmJtISuvQ3/AG3+gfuxK3gD/gdTtosbOuBC4dz7uXee/yYUals+9sorKyurW8UN0tb2zu7e+b+QVNGicCkgSMWibaPJGGUk4aiipF2LAgKfUZa/vA281uPREga8Qc1jokboj6nPYqR0pJnHlujMwtiFkkioIqgNfIcC6oB4hmtWp5Ztiv2FHCZODkpgxx1z/zpBhFOQsIVZkjKjmPHyk2RUBQzMil1E0lihIeoTzqachQS6abTRybwVCsB7EVCF1dwqv6dSFEo5Tj0dWeI1EAuepn4n9dJVO/aTSmPE0U4ni3qJSz7OEsFBlQQrNhYE4QF1bdCPEACYaWzm9sSyOy0SUkH4yzGsEya1YpzWbm4r5ZrN3lERXAETsA5cMAVqIE7UAcNgMETeAGv4M14Nt6ND+Nz1low8plDMAfj6xdnlJ+i</latexit>

MixMatch
<latexit sha1_base64="x1xnCbbiDjzLbxwhS5goN4KMDBg=">AAACEHicbVDLSsNAFJ34rPUV7dLNYBFclaSCuiy6cVOoYB/QhjKZTNqhk0mYmYgh5Cf8Abf6B+7ErX/gD/gdTtIsbOuBgcM59865HDdiVCrL+jbW1jc2t7YrO9Xdvf2DQ/PouCfDWGDSxSELxcBFkjDKSVdRxcggEgQFLiN9d3ab+/1HIiQN+YNKIuIEaMKpTzFSWhqbtVHxRyqIl7XpUxspPB2bdathFYCrxC5JHZTojM2fkRfiOCBcYYakHNpWpJwUCUUxI1l1FEsSITxDEzLUlKOASCctgjN4phUP+qHQjytYqH83UhRImQSungyQmsplLxf/84ax8q+dlPIoVoTjeZAfM6hCmDcBPSoIVizRBGFB9a0QT5FAWOm+FlI8mZ+WVXUx9nINq6TXbNiXjYv7Zr11U1ZUASfgFJwDG1yBFrgDHdAFGCTgBbyCN+PZeDc+jM/56JpR7tTAAoyvXw/KnbU=</latexit>

<latexit sha1_base64="a/0ULtWisjZicns+6kscEiAmNMs=">AAACWHicbZDLbtNAFIZPDPSSQpuWJZsRUdVUKpYNiLJBqoAFy6ImbaU4SseTk2SU8Yw1cwxElh+Ox4AHgC28AePUC3o50ki//nOdL82VdBRFP1rBg4eP1tY3Nttbj59s73R2986dKazAgTDK2MuUO1RS44AkKbzMLfIsVXiRLj7U+YsvaJ00uk/LHEcZn2k5lYKTt8adYaJkNi77LLFyNidurfnKooolhN+oPJtzm6OuevkR6x+yd439UVouWE+GGB6xqyuj8cXc0MHBIZv4i61Mi3p4Ne50ozBaBbsr4kZ0oYnTcedXMjGiyFCTUNy5YRzlNCq5JSkUVu2kcJhzseAzHHqpeYZuVK4gVGzfOxM2NdY/TWzl/t9R8sy5ZZb6yozT3N3O1eZ9uWFB07ejUuq8INTietG0UIwMq4n6P1sUpJZecGGlv5UJz40L8txvbJm4+rSq7cHEtzHcFecvw/hN+Orz6+7J+wbRBjyD59CDGI7hBD7BKQxAwHf4DX/gb+tnAMF6sHldGrSanqdwI4K9f4/gtPM=</latexit>

lim_{T→0} Sharpen(p, T) = Dirac (i.e., “one-hot”) distribution
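A minimal sketch of the sharpening operator; p is assumed to be a tensor of class probabilities with classes on the last dimension.

```python
import torch

def sharpen(p, T):
    p_temp = p ** (1.0 / T)                            # proportional to p_i^(1/T)
    return p_temp / p_temp.sum(dim=-1, keepdim=True)   # as T -> 0 this tends to a one-hot (Dirac) distribution
```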
Berthelot, David, et al. "Mixmatch: A holistic approach to semi-supervised learning." arXiv preprint arXiv:1905.02249 (2019).
Self-training with Noisy Student
improves ImageNet classification YouTube Video

ImageNet-C and ImageNet-P test sets include images with common corruptions and perturbations such as blurring, fogging, rotation and scaling. ImageNet-A test set consists of difficult images that cause significant drops in accuracy to state-of-the-art models. These test sets are considered as “robustness” benchmarks.

mCE (mean corruption error) is the weighted average of error rates on different corruptions, with AlexNet's error rate as a baseline (lower is better). mFR (mean flip rate) measures the model's probability of flipping predictions under perturbations, with AlexNet as a baseline (lower is better).
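A sketch of the mCE computation described above (not code from the papers); it assumes per-corruption error rates already aggregated over severities, stored in placeholder dictionaries keyed by corruption name.

```python
def mean_corruption_error(model_err, alexnet_err):
    """Average, over corruptions, of the model's error rate normalized by AlexNet's (lower is better)."""
    ratios = [model_err[c] / alexnet_err[c] for c in model_err]
    return sum(ratios) / len(ratios)
```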

Xie, Qizhe, et al. "Self-training with Noisy Student improves ImageNet classification." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
FixMatch: Simplifying Semi-Supervised
Learning with Consistency and Confidence YouTube Video
<latexit sha1_base64="VngCmXqjYAz+t3ON0R4E1CaC6ss=">AAACv3icbVFNb9QwEJ2Ej5bla4ET4mKxQioHVkmRCuJU4MIBoaKy3ZV2V5XjOFurjh3ZzopVtD+0f6C/g5c0SLRlJNtv3pvxjGaySisfkuQiiu/cvXd/Z/fB4OGjx0+eDp89P/G2dkJOhNXWzTLupVZGToIKWs4qJ3mZaTnNzr+2+nQtnVfW/AqbSi5LvjKqUIIHUHbIaUGBJP3G3dAxUEmK3gHVVMFztIbvgXJi9B0vB2fAGVqB2UPkMfi39KnX110Wh/o3psarwWRQdP9TDj/gMNwWt0LdCnm2y2edUsLL2xz2Eh2wvp8CrIPG8asAsz0djpJx0hm7DdIejKi3o9Ph5SK3oi6lCUJz7+dpUoVlw11QQsvtYFF7WXFxzldyDmh4Kf2y6Wa9ZW/A5KywDscE1rH/ZjS89H5TZogseTjzN7WW/J82r0PxcdkoU9VBGnFVqKg1C5a1i2O5clIEvQHgwin0ysQZd1wErPdaldy3rW0HGEx6cwy3wcn+OD0Yv/+5Pzr80o9ol17Ra6w2pQ90SN/oiCYkor3oRzSNZvHneBWbuLoKjaM+5wVds3jzB8CMt0Q=</latexit>

Semi-Supervised Learning (SSL): Leveraging unlabeled data to improve a model’s performance


Extensions of the consistency regularization idea
<latexit sha1_base64="m0hF+4Gp2h+FbAElWWVhPIWiETY=">AAACOXicbVDLSgMxFM3UV62vqks3wSK4KjMV1IWLogguK9gHtKVkMnfa0ExmSDJiHfoz/oY/4FZ3Lt2IuPUHzEy7sK0XAodz7uPkuBFnStv2u5VbWl5ZXcuvFzY2t7Z3irt7DRXGkkKdhjyULZco4ExAXTPNoRVJIIHLoekOr1K9eQ9SsVDc6VEE3YD0BfMZJdpQveJFJ9uRSPDG1w8aRNqpcOhjPQBMDTYeQNARltCPOZHsMZvEzAPSK5bssp0VXgTOFJTQtGq94mfHC2kcgNCUE6Xajh3pbkKkZpTDuNCJFUSEDkkf2gYKEoDqJpnDMT4yjIf9UJonNM7YvxMJCZQaBa7pDIgeqHktJf/T2rH2z7sJE1Gc/nRyyI851iFOI8Mek0A1HxlAqGTGK6YDIgnVJtiZK55KrY0LJhhnPoZF0KiUndPyyW2lVL2cRpRHB+gQHSMHnaEqukE1VEcUPaEX9IrerGfrw/qyvietOWs6s49myvr5BQzmr08=</latexit>

– using an adversarial transformation in place of ↵


<latexit sha1_base64="Jj4KEgWs7x93JLeLQZu141pgENY=">AAACOHicbVDLSgMxFM34rPVVdekm2ApuWmYqqOCm6MZlBfuAtpQ7mUwbmskMSUYoQz/G3/AH3OrSnSvFrV9gZtqFbT0QOJx7LvfkuBFnStv2u7Wyura+sZnbym/v7O7tFw4OmyqMJaENEvJQtl1QlDNBG5ppTtuRpBC4nLbc0W06bz1SqVgoHvQ4or0ABoL5jIA2Ur9wXS7jWDExwCAweKkVJAOOtQSh/FAGmREzgSMOhOLQx6Uu8GgIpX6haFfsDHiZODNSRDPU+4XPrheSOKBCEw5KdRw70r0EpGaE00m+GysaARnBgHYMFRBQ1UuyT07wqVE8bBKZJzTO1L8bCQRKjQPXOE3moVqcpeJ/s06s/atewkQUayrI9JAfmwZCnDaGPSYp0XxsCBDJTFZMhiCBaFPW3BVPpdEmeVOMs1jDMmlWK85F5fy+WqzdzCrKoWN0gs6Qgy5RDd2hOmoggp7QC3pFb9az9WF9Wd9T64o12zlCc7B+fgFkcayr</latexit>

– using a running average or past model predictions for one invocation of pm


<latexit sha1_base64="fTru/LkN6H1lkyI4ZgYPVN6H4Nc=">AAACVXicbVDLbhNBEBwvJgRDEgNHLi1sJC6xdhMp5BjBhaOR8EOyLWt2ttcZZR6rmd4o1sq/xm8g7ogb/AESs2sfcEKfSlVd3aVKCyU9xfH3VvSo/fjgyeHTzrPnR8cn3Rcvx96WTuBIWGXdNOUelTQ4IkkKp4VDrlOFk/TmY61PbtF5ac0XWhe40HxlZC4Fp0Atu9PTUyi9NCvg4EpjGhQMfIVgHRTcE2iboYJwNpOidnnIg2QNgjS3dnsJbA79YjknvKNKb/qw7PbiQdwMPATJDvTYbobL7s95ZkWp0ZBQ3PtZEhe0qLgjKRRuOvPSY8HFTUg2C9BwjX5RNQ1s4G1gsiZWbg1Bw/7rqLj2fq3TsKk5Xfv7Wk3+T5uVlF8uKmmKktCI7aO8VEAW6johkw4FqXUAXDgZsoK45o4LCqXvfcl8HW3TCcUk92t4CMZng+RicP75rHf1YVfRIXvN3rB3LGHv2RX7xIZsxAT7yn6wX+x361vrT9SODrarUWvnecX2Jjr5C5kAtaE=</latexit>

– using a cross-entropy loss in place of the squared l2 loss


<latexit sha1_base64="6mQCxy9qndhg626lfYOGiTeoIcU=">AAACPXicbVBNTxsxEPUGWmj6ldJjLyOSSr0k2k0l4BjBhSOVyIcUQjTrnU2seO3F9iJFUf5O/wZ/gGsr8QPaU9UrV5yQAwk8ydKbNzN64xfnUlgXhndBaWv71eud3Tflt+/ef/hY+bTXsbownNpcS216MVqSQlHbCSeplxvCLJbUjScni373mowVWp27aU6DDEdKpIKj89Kw0qrXobBCjQCBG21tnZQzOp+C9AUIBblETqBTcGMCe1WgoQRq8rJZW44MK9WwES4Bz0m0IlW2wtmw8vci0bzIvA+XaG0/CnM3mKFxgkualy8KSznyCY6o76nCjOxgtvzpHL56JYFUG/+Ug6X6dGOGmbXTLPaTGbqx3ewtxJd6/cKlR4OZUHnhSPFHo7SQ4DQsYoNEGOJOTj1BboS/FfgYDXLnw11zSezitHnZBxNtxvCcdJqN6KDx/Uez2jpeRbTLvrB99o1F7JC12Ck7Y23G2U92y36x38FN8Cf4F/x/HC0Fq53PbA3B/QNP264E</latexit>

– using stronger forms of augmentation


<latexit sha1_base64="Re9iweHt9V9D9CJ/SRpRpil573Q=">AAACJHicbVDLTgIxFO3gC/GFunTTSIxuIDOYqEuiG5eYCJLAhHRKZ2joY9J2TMiET/A3/AG3+gfujAs3Lv0OOzALAU/S5OTc1+kJYka1cd0vp7Cyura+UdwsbW3v7O6V9w/aWiYKkxaWTKpOgDRhVJCWoYaRTqwI4gEjD8HoJqs/PBKlqRT3ZhwTn6NI0JBiZKzUL59WqzDRVERQGyVFRBQMpeIayhCiJOJEmLyz4tbcKeAy8XJSATma/fJPbyBxki3ADGnd9dzY+ClShmJGJqVeokmM8AhFpGupQJxoP51+aAJPrDLIjNgnDJyqfydSxLUe88B2cmSGerGWif/VuokJr/yUijgxRODZoTBh0EiYpQMHVBFs2NgShBW1XiEeIoWwsRnOXRnozNqkZIPxFmNYJu16zbuond/VK43rPKIiOALH4Ax44BI0wC1oghbA4Am8gFfw5jw7786H8zlrLTj5zCGYg/P9C5xipVA=</latexit>

Pseudo-labeling
<latexit sha1_base64="yA5HrAlxfzslA3Phj/6GWXStwF4=">AAACF3icbVDLSsNAFJ3UV62vqDvdDBbBjSWpoC6LblxWsA9oQ5lMbtqhkwczE6GEgr/hD7jVP3Anbl36A36HkzQL23rgwuGc++K4MWdSWda3UVpZXVvfKG9WtrZ3dvfM/YO2jBJBoUUjHomuSyRwFkJLMcWhGwsggcuh445vM7/zCEKyKHxQkxicgAxD5jNKlJYG5lE/35EK8KZNCYkXnXPiZuuGA7Nq1awceJnYBamiAs2B+dP3IpoEECrKiZQ924qVkxKhGOUwrfQTCTGhYzKEnqYhCUA6aX5/ik+14mE/ErpChXP170RKAikngas7A6JGctHLxP+8XqL8aydlYZwoCOnskJ9wrCKcBYI9JoAqPtGEUMH0r5iOiCBU6djmrngye21a0cHYizEsk3a9Zl/WLu7r1cZNEVEZHaMTdIZsdIUa6A41UQtR9IRe0Ct6M56Nd+PD+Jy1loxi5hDNwfj6BbWAoLs=</latexit>

τ → predefined threshold

→ one-hot (hard) label

L → number of classes

encourages model predictions to be low-entropy (i.e., high-confidence) on unlabeled data

X = {(x_b, p_b) : b = 1, …, B} → batch of B labeled examples
FixMatch
<latexit sha1_base64="H/m3o+EUyXykbSJTZAxpMP5TnWc=">AAACEHicbVDNSsNAGNzUv1r/oj16WSyCp5JUUI9FQbwIFWwttKFsNpt26WYTdjdiCHkJX8CrvoE38eob+AI+h5s0B9s6sDDMfN/Ox7gRo1JZ1rdRWVldW9+obta2tnd298z9g54MY4FJF4csFH0XScIoJ11FFSP9SBAUuIw8uNOr3H94JELSkN+rJCJOgMac+hQjpaWRWR8Wf6SCeNk1fbpFCk9GZsNqWgXgMrFL0gAlOiPzZ+iFOA4IV5ghKQe2FSknRUJRzEhWG8aSRAhP0ZgMNOUoINJJi+AMHmvFg34o9OMKFurfjRQFUiaBqycDpCZy0cvF/7xBrPwLJ6U8ihXheBbkxwyqEOZNQI8KghVLNEFYUH0rxBMkEFa6r7kUT+anZTVdjL1YwzLptZr2WfP0rtVoX5YVVcEhOAInwAbnoA1uQAd0AQYJeAGv4M14Nt6ND+NzNloxyp06mIPx9QsEcZ2u</latexit>

x_b → training example

ℓ_s → supervised loss

p_b → one-hot labels

ℓ_u → unsupervised loss

U = {u_b : b = 1, …, μB} → batch of μB unlabeled examples

μ → determines the relative sizes of X and U

p_m(y|x) → predicted class distribution produced by the model for input x

H(p, q) → cross-entropy between two probability distributions p and q

q̂_b → pseudo-label

A(·) → strong augmentation (AutoAugment/RandAugment + Cutout)


<latexit sha1_base64="j+knyZmNY8AkD2m6PUYWKYZWpMg=">AAACWXicbVDLThsxFHWmD0KgbYAlG6tRpaBK6QxFlCVtNyyDRAApE0V3PM7EwmOP7GtKNJqf619U7FG35QvwJLPg0SNZOjrnvnySQgqLYfinFbx6/ebtWnu9s7H57v2H7tb2udXOMD5iWmpzmYDlUig+QoGSXxaGQ55IfpFc/az9i2turNDqDBcFn+SQKTETDNBL024c54BzBrL8XvVjlmrco7ER2RzBGP2LxshvsLRotMoouCznCpettA8OdaN8MaDShtPPlHnH4V417fbCQbgEfUmihvRIg+G0exenmrl6DJNg7TgKC5yUYFAwyatO7CwvgF1BxseeKsi5nZTLFCr6ySspnWnjnz9jqT7uKCG3dpEnvrL+s33u1eL/vLHD2dGkFKpwyBVbLZo5SVHTOlKaCsMZyoUnwIzwt1I2BwMMffBPtqS2Pq3q+GCi5zG8JOf7g+hw8PX0oHf8o4moTXbJR9InEflGjskJGZIRYeQ3+Uv+kfvWbdAK2kFnVRq0mp4d8gTBzgOQO7hI</latexit>

α(·) → weak augmentation (flip and shift)
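As a concrete illustration of the two pipelines above, here is a minimal torchvision sketch; the 32×32 input size (CIFAR-style) and the exact parameter values are assumptions, not the paper's reference settings.

```python
# Illustrative weak/strong augmentation pipelines; parameter values are assumptions.
from torchvision import transforms

# α(·): flip and shift (random crop after reflection padding)
weak_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4, padding_mode="reflect"),
    transforms.ToTensor(),
])

# A(·): RandAugment policy followed by a Cutout-style occlusion
strong_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4, padding_mode="reflect"),
    transforms.RandAugment(),
    transforms.ToTensor(),
    transforms.RandomErasing(p=1.0),  # stand-in for Cutout
])
```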


<latexit sha1_base64="nJg3OuQ/Cd14EOZv4BAUmu2PZ+c=">AAACe3icdVFNb9NAEF27fJTwFSi3XlZElVKEIpvPHgtcOBaJtJXiKBqvx/Yq611rd0yJLPM/uSP+BRLrNIemhZFWenpv3szobVor6SiKfgbhzq3bd+7u3hvcf/Dw0ePhk6enzjRW4FQYZex5Cg6V1DglSQrPa4tQpQrP0uWnXj/7htZJo7/SqsZ5BYWWuRRAnloMfxwkFVApQLUfunEiMkOHPLGyKAmsNRc8IfxOrSNrdMGhKSrUtPZ2gwRUXcL/TRcIyy0LH+dK1hx0xl0pczrsFsNRNInWxW+CeANGbFMni+HvJDOi6UcKBc7N4qimeQuWpFDob2oc1iCWUODMQw0Vunm7zqnjB57JeG6sf5r4mr3qaKFyblWlvrMPxV3XevJf2qyh/GjeSl03hFpcLsobxcnwPnSeSYuC1MoDEFb6W7kowYIg/zVbWzLXn9YNfDDx9RhugtNXk/jd5PWXN6Pjj5uIdtk+e87GLGbv2TH7zE7YlAn2KxgEe8Gz4E84Cl+ELy9bw2Dj2WNbFb79C2NKwxc=</latexit>

Consistency Regularization
<latexit sha1_base64="lGcpTrxly2mD6OOFbIy56UTToZQ=">AAACInicbVDLTgIxFO3gC/GFunTTSEhckRlM1CWRjUs08kiAkE7nAg2dzqTtmIwTvsDf8Afc6h+4M65MXPsddmAWAp6kycm55z563JAzpW37y8qtrW9sbuW3Czu7e/sHxcOjlgoiSaFJAx7IjksUcCagqZnm0AklEN/l0HYn9bTefgCpWCDudRxC3ycjwYaMEm2kQbHcm81IJHjTeiCU2QiCxvgORhEnkj1mvpJdsWfAq8TJSAllaAyKPz0voJEPQlNOlOo6dqj7CZGaUQ7TQi9SEBI6ISPoGiqID6qfzE6Z4rJRPDwMpHlC45n6tyMhvlKx7xqnT/RYLddS8b9aN9LDq37CRBiln5wvGkYc6wCn2WCPSaCax4YQKpm5FdMxkYRqk+DCFk+lp00LJhhnOYZV0qpWnIvK+W21VLvOIsqjE3SKzpCDLlEN3aAGaiKKntALekVv1rP1bn1Yn3Nrzsp6jtECrO9fRwel0A==</latexit>

→ both α and p_m are stochastic functions, so the two terms in this equation will indeed have different values
<latexit sha1_base64="/gjf0Ur+Z8/XeuFTVkdA3ZN3+zM=">AAACjHicbVFdb9MwFHXCx0aBrcDjXizaSTygKoFpICGkCSTE45DoNqmpqhvnprHm2Jl901JF+Q37ffsD/APecdo+sI0rWT465375OK2UdBRFN0H44OGjxzu7T3pPnz3f2++/eHnmTG0FjoVRxl6k4FBJjWOSpPCisghlqvA8vfza6ecLtE4a/ZNWFU5LmGuZSwHkqVn/OrFyXhBYa5Y8IfxFTWqo4MMEVFXAkIPO+LCabaSy9YRF7siIAhxJwfNai66Ve8ud4VQgp6W/0ZaOS+0J6The1etxfCmV8myGmPECFsgzmedoURNfgKrRtbP+IBpF6+D3QbwFA7aN01n/d5IZUZe+h1Dg3CSOKpo2YP1yCtteUjusQFzCHCceaijRTZu1cS0/9EzGc2P98Tus2X8rGiidW5WpzyyBCndX68j/aZOa8o/TRuqqJtRiMyivFSdvjf8F/26LgtTKAxBWdkZ6Qy0Ib9ztKZnrVmt73pj4rg33wdm7UXw8ev/jaHDyZWvRLjtgr9kbFrMP7IR9Z6dszAT7ExwEw+Aw3AuPwk/h501qGGxrXrFbEX77C267yC4=</latexit>

Sohn, Kihyuk, et al. "FixMatch: Simplifying semi-supervised learning with consistency and confidence." arXiv preprint arXiv:2001.07685 (2020).
Training Data-Efficient Image Transformers
& Distillation Through Attention YouTube Video

DeiT: Data-Efficient Image Transformers


<latexit sha1_base64="encSOh291Z605DDzGxAyF5uK9JM=">AAACJHicbVDLSgMxFM3UV62vUZdugkV0Y5mpoOKqaAXdVegL2qFkMhkbmmSGJCOU0k/wN/wBt/oH7sSFG5d+h5l2FrZ6IHA493Fujh8zqrTjfFq5hcWl5ZX8amFtfWNzy97eaaookZg0cMQi2faRIowK0tBUM9KOJUHcZ6TlD67SeuuBSEUjUdfDmHgc3QsaUoy0kXr2YZXQ+gWsIo2Or0OjUyI0vDVdBNYlEiqMJDfzPbvolJwJ4F/iZqQIMtR69nc3iHDCzTrMkFId14m1N0JSU8zIuNBNFIkRHhijjqECcaK80eRDY3hglAAaa/PMORP198QIcaWG3DedHOm+mq+l4n+1TqLDc29ERZxoIvDUKEwY1BFM04EBlQRrNjQEYUnNrRD3kURYmwxmXAKVnjYumGDc+Rj+kma55J6WTu7KxcplFlEe7IF9cARccAYq4AbUQANg8AiewQt4tZ6sN+vd+pi25qxsZhfMwPr6AcXopNE=</latexit>

<latexit sha1_base64="I56gSfpm7wq++2xjx5UfI4O5+SI=">AAACM3icbVDLSiNBFK32MTrRGaOznE0xQdDFhG4D0WVQRxzBx4BJhBjC7eobU6S6uqm6LYaQP/E3/AG3+gMyOxF3/sNUHovxcaDgcM693FMnTJW05PsP3tT0zOynufnPuYXFL1+X8ssrNZtkRmBVJCoxZyFYVFJjlSQpPEsNQhwqrIfdnaFfv0RjZaJPqZdiM4YLLdtSADmplS//dgIeIfFdIODHWvX42lHCf10RGg1qLNtMdDhYfrB3+rPk+4frrXzBL/oj8PckmJACm+CklX8+jxKRxahJKLC2EfgpNftgSAqFg9x5ZjEF0XVhGo5qiNE2+6P/DfiqUyLeTox7mvhI/X+jD7G1vTh0kzFQx771huJHXiOj9lazL3WaEWoxPtTOFKeED8vikTQoyFUSSRBGuqxcdMCAcOW8vhLZYbRBzhUTvK3hPaltFINysfRno1DZnlQ0z76zH2yNBWyTVdg+O2FVJtg1u2V37N678f56j97TeHTKm+x8Y6/gvfwDI3mosw==</latexit>

ImageNet Data Only (No External Data such as JFT-300M)

Distillation through attention


<latexit sha1_base64="mispBklfBmYs9bCruSkmKpgpdZI=">AAACJnicbVDLSsNAFL2pr1pfVZduBougm5JUUJdFXbisYB/QhjKZTJqhkwczE6GEfIO/4Q+41T9wJ+LOld/hJO3Ctl4YOJxz7z13jhNzJpVpfhmlldW19Y3yZmVre2d3r7p/0JFRIghtk4hHoudgSTkLaVsxxWkvFhQHDqddZ3yT691HKiSLwgc1iakd4FHIPEaw0tSwejYodqSCutmttmOcFwpSvoiSkY+wUjSc9tbMulkUWgbWDNRgVq1h9WfgRiQJ9DjhWMq+ZcbKTrFQjHCaVQaJpDEmYzyifQ1DHFBpp8U5GTrRjIu8SOgXKlSwfydSHEg5CRzdGWDly0UtJ//T+onyruyUhXGi/0WmRl7CkYpQng9ymaBE8YkGmAimb0XExwITpVOcc3FlflpW0cFYizEsg06jbl3Uz+8bteb1LKIyHMExnIIFl9CEO2hBGwg8wQu8wpvxbLwbH8bntLVkzGYOYa6M71+HIKd9</latexit>

<latexit sha1_base64="IybaAzzAudkwjY2vkSQdiV6jIEo=">AAACPXicbVDLSgMxFM34rPVVdekmWAQFGWYqqLiqduNGUbE+qKVkMrdtMJMMyR2hlP6Ov+EPuFXwA3Qlbt2a1i58XQgczj33npsTpVJYDIJnb2R0bHxiMjeVn56ZnZsvLCyeW50ZDlWupTaXEbMghYIqCpRwmRpgSSThIrqp9PsXt2Cs0OoMOynUE9ZSoik4Q0c1CuUzYLwNhh7qGOQu3aMWjVYtKpwQKJfMWid3gjXwW/4GPYXWEeAVrWh168B6o1AM/GBQ9C8Ih6BIhnXcKLxex5pnCSgcbK+FQYr1LjMouIRe/jqzkDJ+4+xrDiqWgK13Bz/t0VXHxLSpjXsK6YD9PtFlibWdJHLKhGHb/u71yf96tQybO/WuUGmGoPiXUTOTFDXtx0ZjYYCj7DjAuBHuVsrbzDCOLtwfLrHtn9bLu2DC3zH8BeclP9zyN09KxfL+MKIcWSYrZI2EZJuUyQE5JlXCyR15II/kybv3Xrw37/1LOuINZ5bIj/I+PgGkka2b</latexit>

Teacher Model: A strong image classifier (e.g., RegNetY ConvNet)


<latexit sha1_base64="IybaAzzAudkwjY2vkSQdiV6jIEo=">AAACPXicbVDLSgMxFM34rPVVdekmWAQFGWYqqLiqduNGUbE+qKVkMrdtMJMMyR2hlP6Ov+EPuFXwA3Qlbt2a1i58XQgczj33npsTpVJYDIJnb2R0bHxiMjeVn56ZnZsvLCyeW50ZDlWupTaXEbMghYIqCpRwmRpgSSThIrqp9PsXt2Cs0OoMOynUE9ZSoik4Q0c1CuUzYLwNhh7qGOQu3aMWjVYtKpwQKJfMWid3gjXwW/4GPYXWEeAVrWh168B6o1AM/GBQ9C8Ih6BIhnXcKLxex5pnCSgcbK+FQYr1LjMouIRe/jqzkDJ+4+xrDiqWgK13Bz/t0VXHxLSpjXsK6YD9PtFlibWdJHLKhGHb/u71yf96tQybO/WuUGmGoPiXUTOTFDXtx0ZjYYCj7DjAuBHuVsrbzDCOLtwfLrHtn9bLu2DC3zH8BeclP9zyN09KxfL+MKIcWSYrZI2EZJuUyQE5JlXCyR15II/kybv3Xrw37/1LOuINZ5bIj/I+PgGkka2b</latexit>

Teacher Model: A strong image classifier (e.g., RegNetY ConvNet)


Soft Distillation

L_global = (1 − λ) L_CE(ψ(Z_s), y) + λ τ² KL(ψ(Z_s / τ), ψ(Z_t / τ))

ψ → softmax
τ → temperature
λ → coefficient balancing the cross-entropy on the true label y and the KL term
Z_s, Z_t → student and teacher logits

DeiT⚗ → models trained with our transformer-specific distillation


Hard-label Distillation

L_global^hardDistill = ½ L_CE(ψ(Z_s), y) + ½ L_CE(ψ(Z_s), y_t), with y_t = argmax_c Z_t(c) the hard label predicted by the teacher

Fixing the positional encoding across resolutions

Use a lower training resolution and fine-tune the network at a larger resolution
Keep the image patch sizes the same ⟹ N (sequence length) changes
⟹ need to adapt positional encodings (use bicubic interpolation)
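A sketch of this adaptation, assuming a learned positional embedding of shape (1, 1 + N, D) with the class-token embedding first (DeiT's distillation token would be carried along the same way); names and shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed, new_grid):
    """pos_embed: (1, 1 + N, D), class-token embedding first, N = old_grid**2."""
    cls_pos, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    old_grid = int(patch_pos.shape[1] ** 0.5)
    dim = patch_pos.shape[-1]
    # (1, N, D) -> (1, D, old_grid, old_grid): treat the embeddings as a 2-D grid
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                              mode="bicubic", align_corners=False)
    # back to (1, new_N, D) and re-attach the class-token embedding
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([cls_pos, patch_pos], dim=1)

# e.g. 224×224 → 384×384 with 16×16 patches: the grid grows from 14×14 to 24×24
# new_pos_embed = interpolate_pos_embed(model.pos_embed, new_grid=24)
```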


Touvron, Hugo, et al. "Training data-efficient image transformers & distillation through attention." International Conference on Machine Learning. PMLR, 2021.
Questions?
YouTube Playlist
