CLIP Summary

A summary of the CLIP model
Contrastive Language-Image Pre-training (CLIP) (Radford, 2021)

(1) Training
- A text encoder and an image encoder each produce an embedding. A learned linear transformation is applied to the encoder embeddings.
- For a mini-batch, the text vectors T_1, ..., T_N and the image vectors I_1, ..., I_N give an N x N matrix of pairwise products I_i · T_j, followed by a softmax.
- Loss: maximize the cosine similarity for the correct text-image pair and minimize it for the wrong pairs.
- Interpretation: we know which text goes with which image in our training data, because that is how the data was scraped.

Inference

(2) Create dataset classifier from label text
- The text-encoder input is a hand-crafted, engineered prompt built from the label, e.g. "A photo of a {object}."
- Dataset-specific variants: "{label}, a type of ..." or "Satellite photo of a {label}." (for satellite photos).
- Engineered prompts improve performance compared to using only the label.

(3) Use for zero-shot prediction
- Which of the texts is the image closest to? E.g. "A photo of a dog."
- The similarities between the image embedding and each text embedding go through a softmax:

$$p(y = i \mid x) = \frac{\exp\left(\cos(T_i, I)/\tau\right)}{\sum_j \exp\left(\cos(T_j, I)/\tau\right)}$$

  W_i: learned projection of image to embedding
  W_t: learned projection of text to embedding
  \tau: learned temperature parameter
- No training is needed: the pretrained model can be used directly on a new task (zero-shot classification).

Figure 3: Numpy-like pseudocode for the core of an implementation of CLIP.
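The training objective and the zero-shot softmax described in these notes can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the function names (`l2_normalize`, `log_softmax`, `clip_loss`, `zero_shot_probs`) are made up for this example, the encoder outputs are assumed to be given as plain arrays, and the similarities are divided by a temperature `tau` as in the notes' formula.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def log_softmax(z, axis):
    """Numerically stable log-softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def clip_loss(image_features, text_features, W_i, W_t, tau):
    """Symmetric contrastive loss for N aligned (image, text) pairs.

    image_features: [N, d_i] image-encoder outputs (assumed given)
    text_features:  [N, d_t] text-encoder outputs (assumed given)
    W_i, W_t:       learned projections into the shared embedding space
    tau:            temperature (assumption: similarities are divided by it)
    """
    I_e = l2_normalize(image_features @ W_i)  # [N, d_e]
    T_e = l2_normalize(text_features @ W_t)   # [N, d_e]
    logits = (I_e @ T_e.T) / tau              # entry (i, j) = cos(I_i, T_j) / tau
    diag = np.arange(logits.shape[0])         # correct pairs lie on the diagonal
    # Each image should pick its own text (row-wise softmax) ...
    loss_image_to_text = -log_softmax(logits, axis=1)[diag, diag].mean()
    # ... and each text should pick its own image (column-wise softmax).
    loss_text_to_image = -log_softmax(logits, axis=0)[diag, diag].mean()
    return (loss_image_to_text + loss_text_to_image) / 2


def zero_shot_probs(image_embedding, label_embeddings, tau):
    """p(y = i | x): softmax over cosine similarities to each label prompt."""
    sims = l2_normalize(label_embeddings) @ l2_normalize(image_embedding)  # [K]
    return np.exp(log_softmax(sims / tau, axis=0))
```

For example, `zero_shot_probs(np.array([1.0, 0.0]), np.eye(2), tau=1.0)` assigns the higher probability to the first label, whose embedding points in the same direction as the image embedding. In practice the rows of `label_embeddings` would come from encoding prompts such as "A photo of a {object}." for every class name.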
