DLNLP CH-6 Notes
Major challenges:
- Human language and understanding is rich and intricate.
- Language differences: there are many different languages spoken by humans.
- Training data: training data is a curated collection of input-output pairs, where the input represents the features/attributes of the data.
- Complexity of the task: tasks such as classification of text or analyzing the sentiment of the text may require less time compared to more complex tasks such as machine translation or answering questions.
Limitations:
- difficulty in handling ambiguity
- lack of scalability
Some models in NLP:
- Pretrained models are deep learning models that have been trained on huge amounts of data before fine-tuning for a specific task.
- BERT-based model (Bidirectional Encoder Representations from Transformers).
- GPT-2 (Generative Pretrained Transformer): GPT-2 is a transformer-based model pretrained on an extensive English corpus in a self-supervised manner.
- Transformer-XL: Transformer-XL is a variant of the transformer model which includes relative positional encoding and a recurrence mechanism.
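These pretrained models can be loaded in a few lines; a minimal sketch, assuming the Hugging Face transformers library (not part of the notes) and its standard public checkpoints bert-base-uncased and gpt2:

```python
# Minimal sketch: loading pretrained models via Hugging Face transformers.
# Assumes `pip install transformers torch`; checkpoints download on first use.
from transformers import pipeline

# BERT: bidirectional encoder, used here for masked-word prediction.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Paris is the [MASK] of France.")[0]["token_str"])

# GPT-2: left-to-right generator, pretrained self-supervised on English text.
generator = pipeline("text-generation", model="gpt2")
print(generator("Natural language processing is", max_new_tokens=10)[0]["generated_text"])
```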
Normalization methods:
- Stemming: with stemming, a word is cut off at its stem, the smallest unit of that word from which you can create the descendant words.
- Lemmatization: lemmatization seeks to address this issue. This process uses a data structure that relates all forms of a word back to its simplest form, or lemma.
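The difference is easy to see with NLTK; a minimal sketch, assuming nltk is installed and its WordNet data has been downloaded:

```python
# Minimal sketch: stemming vs. lemmatization with NLTK.
# Assumes `pip install nltk`; the lemmatizer needs the WordNet data
# (some NLTK versions also need the "omw-1.4" download).
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["studies", "running", "better"]:
    # Stemming chops the word down to a crude stem (may not be a real word);
    # lemmatization maps it to its dictionary form (lemma).
    print(word, "->", stemmer.stem(word), "/", lemmatizer.lemmatize(word, pos="v"))
```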
Vectorization:
Document #1: "He is a good boy, she is also good"
Document #2: "Radhika is a good person"
Vocabulary: a, also, boy, good, He, is, person, she, Radhika
Each word in the vocabulary is assigned an index, and each document is represented as a vector of word counts over that vocabulary.
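This bag-of-words vectorization of the two example documents can be reproduced with scikit-learn; a minimal sketch, assuming scikit-learn is installed:

```python
# Minimal sketch: bag-of-words vectorization of the two example documents.
# Assumes scikit-learn is installed (pip install scikit-learn).
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "He is a good boy, she is also good",   # Document #1
    "Radhika is a good person",             # Document #2
]

# A custom token_pattern keeps single-letter tokens like "a" in the vocabulary;
# tokens are lowercased by default.
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
counts = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # vocabulary: a, also, boy, good, he, ...
print(counts.toarray())                    # one count vector per document
```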
Part-of-speech (POS) tagging:
- Annotate each word in a sentence with a part-of-speech marker (e.g., English part-of-speech tags).
- It is the lowest level of syntactic analysis and helps with word sense disambiguation.
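A minimal sketch of POS tagging with NLTK's pretrained English tagger (data package names vary slightly across NLTK versions):

```python
# Minimal sketch: POS tagging with NLTK's pretrained English tagger.
# Assumes `pip install nltk`; newer NLTK versions may instead need the
# "punkt_tab" and "averaged_perceptron_tagger_eng" downloads.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("Radhika is a good person")
print(nltk.pos_tag(tokens))
# e.g. [('Radhika', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('person', 'NN')]
```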
TF-IDF vector:
- Term Frequency (TF): how common (or uncommon) a word is for a document.
- Inverse Document Frequency (IDF): measures how important a word is across the dataset.
- TF-IDF = TF x IDF

Worked example:
Document #1: "He is a good boy, she is also good"
Document #2: "Radhika is a good person"
Vocabulary: a, also, boy, good, He, is, person, she, Radhika

TF(He, doc #1) = 1/9 = 0.11 (1 occurrence out of 9 words)
TF(good, doc #1) = 2/9 = 0.22
IDF(He) = log(2/1) = 0.301 (appears in 1 of 2 documents)
IDF(good) = log(2/2) = 0 (appears in both documents)
TF-IDF(He, doc #1) = 0.11 x 0.301 = 0.0331
TF-IDF(good, doc #1) = 0.22 x 0 = 0

Each document is then represented by its TF-IDF vector over the vocabulary.
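The numbers above can be reproduced directly with the same definitions (TF = count / document length, IDF = log10(N / document frequency)); a minimal sketch:

```python
# Minimal sketch reproducing the worked TF-IDF example above.
# TF = count(word, doc) / len(doc); IDF = log10(N / docs containing word).
import math

docs = [
    "He is a good boy she is also good".split(),  # Document #1 (9 words)
    "Radhika is a good person".split(),           # Document #2
]

def tf(word, doc):
    return doc.count(word) / len(doc)

def idf(word):
    df = sum(1 for doc in docs if word in doc)
    return math.log10(len(docs) / df)

for word in ["He", "good"]:
    print(word,
          "TF:", round(tf(word, docs[0]), 2),
          "IDF:", round(idf(word), 3),
          "TF-IDF:", round(tf(word, docs[0]) * idf(word), 4))
# He: TF 0.11, IDF 0.301, TF-IDF ~0.033 (0.0331 in the rounded hand calculation)
# good: TF 0.22, IDF 0.0, TF-IDF 0.0
```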
Sequential model:
Introduction:
- Sequential learning refers to machine learning models designed for data that follow a sequence.
- This includes text, audio clips, video clips, and time series data, where order matters.
- Sequence models thrive on data that are inherently ordered, unlike traditional models that assume data points are independent.
- These models are adept at processing and analyzing sequences like sentences, time series, and discrete sequence data.
CNN model vs. sequential model:
- While convolutional neural networks (CNNs) excel with spatial data (e.g., images), sequence models are tailored for sequential data, offering a more effective approach for such datasets.
- Sequential data is not independent and identically distributed (i.i.d.); the sequential order creates dependencies between data points, necessitating specialized models.
Applications:
- Sequence models are pivotal in speech and voice recognition.
- Time series prediction.
- Natural language processing (NLP), where understanding the sequence is crucial.
Sequence modelling:
- Sequence modelling is the process where a sequence of output values is generated from a sequence of input values.
- It is utilized across various data types, including time-series data and text sequences.
- Sequential data refers to datasets where each data point is connected to or dependent on other points within the same sequence. This interconnectedness means the order of the data points is crucial for analysis and interpretation.
Examples of sequence data:
- speech recognition
- video activity recognition
- music generation
- named entity recognition
- sentiment classification
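Of these, sentiment classification is the simplest to sketch end to end; below is a minimal, hypothetical PyTorch model (all sizes and the toy batch are illustrative assumptions, not from the notes):

```python
# Minimal sketch: an LSTM sentiment classifier (sequence -> single label).
# All sizes and the toy batch are illustrative assumptions.
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):          # (batch, seq_len)
        x = self.embed(token_ids)          # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)         # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])            # (batch, num_classes) logits

model = SentimentLSTM()
toy_batch = torch.randint(0, 1000, (4, 12))  # 4 sequences of 12 token ids
print(model(toy_batch).shape)                # torch.Size([4, 2])
```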
Seq2seq:
[Diagram: input -> compressed data -> reconstructed input]
Seq2seq model:
- Core functionality: seq2seq models are designed to process a sequence of words (such as sentences) as input and generate a corresponding sequence of words as output.
- Underlying technology: while based on the recurrent neural network (RNN) architecture, seq2seq models often employ advanced variants like LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Units); Google Translate, for example, utilizes LSTM.
- Mechanism: seq2seq models operate by considering two inputs at each step:
  * the current input from the user
  * feedback from its previous output, which is then reused as an additional input.
- Architecture: an Encoder followed by a Decoder (a code sketch follows below).
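A bare-bones encoder-decoder wiring in PyTorch (again a hypothetical sketch; dimensions and the toy data are illustrative assumptions):

```python
# Minimal sketch: seq2seq encoder-decoder with LSTMs.
# The encoder compresses the input sequence into its final hidden state;
# the decoder generates the output sequence conditioned on that state.
# All dimensions and the toy batch are illustrative assumptions.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.src_embed = nn.Embedding(vocab_size, embed_dim)
        self.tgt_embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encoder: read the whole source sequence; keep only its final state.
        _, state = self.encoder(self.src_embed(src_ids))
        # Decoder: start from the encoder state. At training time the target
        # sequence is fed in (teacher forcing stands in for "feedback from its
        # previous output", which is used step by step at inference time).
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), state)
        return self.out(dec_out)             # (batch, tgt_len, vocab_size)

model = Seq2Seq()
src = torch.randint(0, 1000, (2, 7))         # 2 source sequences, length 7
tgt = torch.randint(0, 1000, (2, 5))         # 2 target sequences, length 5
print(model(src, tgt).shape)                 # torch.Size([2, 5, 1000])
```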
Seq2seq autoencoders: