Basics of NLP

Learning Paradigm

3 settings we explore for in-context learning


1) Zero-shot: the model predicts the answer given only a natural language description of
the task. No gradient updates are performed.
2) One-shot: in addition to task description, the model sees a single example of the task.
No gradient updates are performed.
3) Few-shot: In addition to the task description, the model sees a few examples of the
task. No gradient updates.
Traditional Fine-tuning (not used for GPT-3):
- Fine tuning:
1) The model is trained via repeated gradient updates using a large corpus of example
tasks.
Example (English→French translation, see the prompt-format sketch below):
sea otter -> loutre de mer (gradient update)
peppermint -> menthe poivrée (gradient update)
plush giraffe -> girafe peluche (gradient update)
cheese -> … (prompt)
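A minimal sketch (not from the source) of how the three prompt formats differ; the build_prompt helper and the example strings are illustrative only, mirroring the translation example above. No gradient updates are involved in any of the three settings.

```python
# Illustrative sketch: zero-, one- and few-shot prompts differ only in how many
# solved examples are placed in the context; the model parameters stay fixed.

def build_prompt(task_description, examples, query):
    """Assemble an in-context learning prompt from a task description,
    zero or more solved examples, and the query to be completed."""
    lines = [task_description]
    for source, target in examples:          # empty list -> zero-shot
        lines.append(f"{source} -> {target}")
    lines.append(f"{query} ->")              # the model completes this line
    return "\n".join(lines)

task = "Translate English to French:"
demos = [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")]

zero_shot = build_prompt(task, [], "cheese")
one_shot  = build_prompt(task, demos[:1], "cheese")
few_shot  = build_prompt(task, demos, "cheese")
print(few_shot)
```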

CATEGORIZATION OF LEARNING
Disclaimer:
This categorization is rather coarse
The list of paradigms is extendable
Not everything is unambiguous, there might be overlap
Connection to tasks/data:
Given the task, some paradigms are more suitable
Given the amount of data, a specific paradigm might be preferable
Presence/Absence of labels makes certain paradigms (in)feasible
Distinction between:
Embedding texts
Pre-training & fine-tuning a model
Prompting
Interaction & Generation
Agents

WORD VECTORS: ONE-HOT ENCODING


Problem statement
Words are discrete units
We can represent them as (high-dimensional) one-hot vectors
This makes it difficult/impossible to e.g. capture similarity between synonyms
Documents can be represented as a vector of word occurrences (bag-of-words)
Example of one-hot: w(football) = [0,0,0,0,1,0,0,0,0,0,0,0], w(basketball) = [0,1,0,0,0,0,0,0,0,0,0,0],

Problems of one-hot embeddings


high dimensionality
not possible to measure similarity
Alternative: Dense embeddings

WORD VECTORS: EMBEDDING


Measuring similarity now possible:

Not only possible for words, but for whole documents:


Use Case: Document retrieval
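To make the contrast concrete, here is a small numpy sketch with invented vector values (not from the source): one-hot vectors of distinct words are always orthogonal, while dense embeddings allow meaningful cosine similarities that can also be used to rank documents for retrieval.

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# One-hot: every pair of distinct words is orthogonal, so similarity is always 0.
w_football   = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
w_basketball = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
print(cosine(w_football, w_basketball))            # 0.0 -- no notion of relatedness

# Dense embeddings (values invented for illustration): similarity becomes meaningful.
emb = {
    "football":   np.array([0.9, 0.8, 0.1]),
    "basketball": np.array([0.8, 0.9, 0.2]),
    "cheese":     np.array([0.1, 0.0, 0.9]),
}
print(cosine(emb["football"], emb["basketball"]))  # high
print(cosine(emb["football"], emb["cheese"]))      # low

# Document retrieval: rank documents by cosine similarity of their vectors
# (e.g. averaged word embeddings) to a query vector.
def retrieve(query_vec, doc_vecs):
    return sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
                  reverse=True)
```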
PRE-TRAIN/FINE-TUNE
Problem statement
The larger the models, the more data is needed to train them
(Labeled) Data is scarce and expensive!
Many languages are underrepresented in terms of resources: number of speakers (of a language) ≠ amount of available text
Unlabeled (English) text data is ubiquitous
[Figure: standard machine learning setup vs. transfer learning setup]

Pre-training:
Using unlabeled corpora with self-supervised objectives is referred to as Pre-Training
Pre-training examples require no annotation, the inherent structure of the text is exploited
Construction of different self-supervised objectives, which are assumed to:
- cover different phenomena better than the others
- work more efficiently for learning
Example 1: predict the next word in a sentence
Example 2: predict a masked word
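A minimal sketch of how such self-supervised training examples can be derived from raw text without any annotation; the sentence and the single-token masking are simplifications (real pre-training recipes such as BERT's masking scheme differ).

```python
import random

text = "unlabeled text data is ubiquitous".split()

# Example 1: next-word prediction -- (context, target) pairs from raw text.
next_word_examples = [(text[:i], text[i]) for i in range(1, len(text))]
# e.g. (['unlabeled', 'text'], 'data')

# Example 2: masked-word prediction -- hide one token, predict it from both sides.
i = random.randrange(len(text))
masked_input = text[:i] + ["[MASK]"] + text[i + 1:]
masked_example = (masked_input, text[i])
```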

Fine-tuning:
The second phase of transfer learning, i.e. adapting the pre-trained model to a labeled data set for a specific
downstream task is referred to as Fine-Tuning
Far less labeled data required compared to a scenario w/o pre-training

PROMPTING
Accessing pre-trained models:
Fine-tune them
Also possible: No fine-tuning, but ..
 Zero-Shot Transfer w/o ANY labeled data (only describe the task)
 Few-Shot Transfer w/ FEW labeled data points (describe the task, and show examples as context)
this is called in-context learning
In both of the latter cases, good pre-training becomes even more important
Definition(s):
GPT-3 paper: "Task Description" (accompanied by samples + labels)
Prompt: Describes the task the model should perform
Prompt Engineering: Finding the best prompt(s) for one (or across multiple) task(s)
Prompt Tuning: Add trainable weights ("soft prompt") to inputs and fine-tune
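As an illustration of the soft-prompt idea, a minimal PyTorch-style sketch: a small set of trainable vectors is prepended to the (frozen) pre-trained model's input embeddings, and only these vectors are updated during fine-tuning. Class name and sizes are assumptions for illustration, not a specific library API.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Prepend n_prompt trainable vectors to the input embeddings.
    Only these vectors receive gradient updates; the model stays frozen."""
    def __init__(self, n_prompt: int, d_model: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_prompt, d_model) * 0.02)

    def forward(self, input_embeds):              # (batch, seq_len, d_model)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

soft = SoftPrompt(n_prompt=20, d_model=768)
dummy_inputs = torch.randn(2, 10, 768)            # stand-in for token embeddings
extended = soft(dummy_inputs)                     # shape: (2, 30, 768)
```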

CHATTING / GENERATION
Interacting with the model
Larger model sizes, reduced latency and improved training regimes enable conversations with the models
Enables the user to:
- have multi-turn conversations, with the model "remembering" previous inputs
- refine the prompt in case of unsatisfactory output
- use increased context sizes for the prompts
Still: Static, pre-trained model with "knowledge"
Interacting with the model: Persona-Chat Benchmark
OUTLOOK: Agents

NLP TASKS
Learning goals
Understand the different types of tasks (low- vs. high-level)
Purely Linguistic tasks vs. more general classification tasks

CATEGORIZATION OF NLP TASKS


Distinction between:
Language modeling
Token-level classification
Sequence-level classification
Similarity / Retrieval
Text generation
Connection to learning paradigms:
Given the task, some learning paradigms are more suitable
Tasks can be formulated differently to fit a given learning paradigm
Amount of available (labeled) data might depend on task
Presence/Absence of labels important to consider

LANGUAGE MODELING

Predict the next token:


S = "Where are we …" (word being predicted, should be "going")
P(S) = P(Where) x P(are|Where) x P(we|Where are) x P(going|Where are we)
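A toy sketch of this factorization with invented conditional probabilities, only to make the chain rule concrete.

```python
import math

# Invented conditional probabilities, only to make the factorization concrete.
cond_probs = {
    ("Where",):                      0.05,   # P(Where)
    ("Where", "are"):                0.30,   # P(are | Where)
    ("Where", "are", "we"):          0.20,   # P(we | Where are)
    ("Where", "are", "we", "going"): 0.40,   # P(going | Where are we)
}

log_p = sum(math.log(p) for p in cond_probs.values())
print(math.exp(log_p))   # P("Where are we going") under this toy model
```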

CATEGORIZATION OF NLP TASKS


"Low-Level" tasks:
Token-level Classification: Problems on a word/token level
Modeling relationships between words/tokens
"High-Level" tasks:
Sequence-level Classification: Problems on a sequence level
Retrieval: Assess (semantic) similarity on document-level
Producing sequences of text based on an input sequence, known as seq2seq tasks
Note: The latter one is also an instance of a generation task
LOW-LEVEL: SEQUENCE TAGGING
POS-tagging (part of speech):
Examples: Time flies like an arrow. // Fruit flies like a banana.
IN = preposition or subordinating conjunction (conjunction here); VBZ = verb, 3rd person singular present; DT = determiner; NN = singular noun
LOW-LEVEL: STRUCTURE PREDICTION- Chunking/Parsing

LOW-LEVEL: SEMANTICS

Word sense disambiguation

NAMED ENTITY RECOGNITION (NER)


BIO-Tagging
B – beginning of an entity (B-PER for person, B-LOC for location), I – inside an entity (e.g. I-PER, I-LOC), O – outside any entity
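A small sketch (illustrative, not from the source) that decodes a BIO tag sequence back into entity spans, using the tag scheme described above.

```python
def bio_to_entities(tokens, tags):
    """Collect (entity_type, token_span) pairs from a BIO-tagged sequence."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):                 # a new entity begins
            if current:
                entities.append((current_type, current))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current:   # entity continues
            current.append(token)
        else:                                    # "O": outside any entity
            if current:
                entities.append((current_type, current))
            current, current_type = [], None
    if current:
        entities.append((current_type, current))
    return entities

tokens = ["Angela", "Merkel", "visited", "Munich"]
tags   = ["B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_entities(tokens, tags))   # [('PER', ['Angela', 'Merkel']), ('LOC', ['Munich'])]
```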
NER AS TOKEN-LEVEL CLASSIFICATION
Pre-train/fine-tune:

HIGH-LEVEL NLP TASKS


 Information Extraction: search, event detection, textual entailment
 Writing Assistance: spell checking, grammar checking, auto-completion
 Text Classification: spam, sentiment, author, plagiarism
 Natural language understanding: metaphor analysis, argumentation mining, question-answering
 Natural language generation: summarization, tutoring systems, chat bots
 Multilinguality: machine translation, cross-lingual information retrieval

SEQUENCE-LEVEL CLASSIFICATION
Output can also be non-binary, i.e. multi-class/-label

Reformulation as generative task:

RETRIEVAL: Document retrieval


GENERATION: MACHINE TRANSLATION
A brief History of Machine Translation
Rule-Based Machine Translation (50s – 80s): Dictionaries + Grammatical Rules
Example-Based Machine Translation (80s – 90s): First suggested by Makoto Nagao (1984), Based on bilingual text
corpora
Statistical Machine Translation (90s – 10s): Mostly driven by IBM research
Neural Machine Translation (10s – now): Based on neural networks (LSTMs, Transformers)

SEQ2SEQ MODELING

The model reads an input sentence “ABC” and produces “WXYZ” as the output sentence. The model stops making
predictions after outputting the end-of-sentence token. Note that the LSTM reads the input sentence in reverse,
because doing so introduces many short-term dependencies in the data that make the optimization problem much
easier.
Notes:
In the meantime: Transformers replaced LSTMs
Overall architecture (Encoder-Decoder) still used
Used for:
(Neural) Machine Translation
Summarization
Question answering
TRADITIONAL BENCHMARKING: NLU
Nine sentence- or sentence-pair language understanding tasks
Public leaderboard, (still) very popular benchmark collection

WinoGrande: Test whether the model can identify the correct reference

HellaSwag
Pick the best ending to the context.

PIQA (Physical Interaction: Question Answering): Test whether the model can identify the most plausible continuation (see also LAMBADA, HellaSwag)
Neural Probabilistic Language Model
Learning goals
grasp importance of the “look-up table“ a.k.a. embedding layer
understand computational implications of language modeling

WHAT IS A LANGUAGE MODEL?


Wikipedia says:
"A statistical language model is a probability distribution over sequences of words"
This means (a) assigning a probability to a sequence of words, e.g.
P("we are all interested in NLP")
and (b) assigning a probability to the likelihood of a word given a sequence of words, e.g.
P("NLP"|"we are all interested in")

MAKING USE OF THE MARKOV-ASSUMPTION


The Markov-Assumption
"The future is independent of the past given the present"
In NLP context:
- Next word only depends on the k previous words
- k-th order Markov assumption, with k to be chosen manually
"Traditional" count-based models
Good baselines, but severe shortcomings
Lacking the ability to generalize
WHAT ARE POTENTIAL PROBLEMS?
Curse of dimensionality
Linear increase in context size leads to an exponential increase in the number of parameters
Considering a vocabulary of size |V| = 100,000: already for bi-grams there are |V|^2 = 10^10 possible combinations
Sparsity
Again considering |V| = 1,000,000 and bi-grams as context:
It is unlikely to observe all of the bi-gram combinations (a) ever, or (b) often enough
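A tiny count-based bigram model on a toy corpus illustrates the sparsity problem: only a handful of the |V|^2 possible bigrams are ever observed, and any unseen bigram receives probability zero.

```python
from collections import Counter

corpus = "we are all interested in nlp and we are all interested in data".split()

unigrams = Counter(corpus)
bigrams  = Counter(zip(corpus, corpus[1:]))

def p_bigram(w2, w1):
    """Maximum-likelihood estimate P(w2 | w1) -- zero for any unseen bigram."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(p_bigram("are", "we"))    # seen in the corpus: > 0
print(p_bigram("nlp", "we"))    # never observed together: 0.0, even though plausible

vocab_size = len(unigrams)
print(len(bigrams), "observed of", vocab_size ** 2, "possible bigrams")
```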

A NEURAL PROBABILISTIC LANGUAGE MODEL


Idea
Using a neural network induces non-linearity and overcomes the shortcomings of traditional models
(a) Linear increase in #parameters with increasing context size
(b) Better generalization

Input: Context of (n - 1) words


In between:

 Look-up table
 Non-linearity, e.g. tanh, ReLU

Output: Probability distribution over the next word
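A minimal numpy sketch of this architecture (look-up table, concatenation of the n-1 context embeddings, tanh non-linearity, softmax over the vocabulary); all sizes and initializations are illustrative assumptions. Note how the final softmax runs over the whole vocabulary, which is exactly the cost issue discussed next.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, h, context = 10_000, 64, 128, 3          # vocab, embedding dim, hidden dim, n-1

C = rng.normal(scale=0.01, size=(V, d))        # look-up table (embedding matrix)
H = rng.normal(scale=0.01, size=(context * d, h))
U = rng.normal(scale=0.01, size=(h, V))        # output layer: one score per word

def predict_next(context_ids):
    x = C[context_ids].reshape(-1)             # look up and concatenate embeddings
    hidden = np.tanh(x @ H)                    # non-linearity
    scores = hidden @ U
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()                 # softmax over the whole vocabulary

probs = predict_next([12, 7, 431])             # P(next word | 3 context words)
print(probs.shape, probs.sum())                # (10000,) 1.0
```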

WHAT COULD BE PROBLEMATIC?


Computational cost
Vanilla softmax is expensive
Proposed solution(s):
1) Hierarchical softmax
2) Sampling Approaches
Still relying on the Markov assumption: Context window has to be specified
manually

Word Embeddings
Learning goals
Understand what word embeddings are
Learn the main methods for creating them

MOTIVATION
How to represent words/tokens in a neural network?
Possible solution: one-hot encoded indicator vectors of length |V|.

Question: Why is this a bad idea?


- Parameter explosion (|V| might be > 1M)
- All word vectors are orthogonal to each other - no notion of word similarity
- Learn one word vector (“word embedding”) per word i

- Typical dimensionality: low (much smaller than |V|)
- Embedding matrix: stacks one embedding vector per vocabulary word (|V| rows, one per word)
Question: Advantages of using word vectors?
- We can express similarities between words, e.g., with cosine similarity

- Since the embedding operation is a lookup operation, we only need to update the vectors that occur in a
given training batch
Supervised training?
Training embeddings from scratch:
- Initialize randomly and learn it during training phase
- Words that play similar roles w.r.t. task get similar embeddings.
- Example: Sentiment Classification: we might expect words expressing similar sentiment (e.g. "good" and "great") to end up with similar embeddings
- We typically have more unlabeled than labeled data. Can we learn embeddings from the unlabeled data?
Question: What could be a problem at test time? If training set is small, many words are unseen during
training and therefore have random vectors

Distributional hypothesis: “A word is characterized by the company it keeps“ (J.R. Firth, 1957)
Idea:
 Learn similar vectors for words that occur in similar contexts
 Three different (milestone) methods:
o Word2Vec
o GloVe (not covered)
o FastText

WORD2VEC AS A BIGRAM LANGUAGE MODEL


Model architecture:
Words in our vocabulary are represented as two sets of vectors:
- one set used when a word is to be predicted (output vectors)
- one set used when a word is conditioned on as context (input vectors)
Predict word “i” given previous word “j”:
Question: What is a possible function f(·)?

Answer: Softmax 
Question: Problem with training softmax? (IT IS SLOW)
Answer: Needs to compute dot products with the whole vocabulary in the denominator for every single prediction
SPEEDING UP TRAINING: NEGATIVE SAMPLING

One option: Hierarchical Softmax (not covered) reduces complexity from O(|V|) to O(log |V|)

Another trick: Negative Sampling (a variant of noise contrastive estimation): Changes the objective function; the
resulting model is not a language model anymore!
IDEA: Instead of predicting the probability distribution over the whole vocabulary, make binary decisions for a small
number of words.
- “Positive“ samples: Bigrams seen in the corpus.
- “Negative“ samples: Random bigrams (not seen in corpus)
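A minimal numpy sketch of this idea under the definitions above: for a bigram seen in the corpus push σ(dot product) towards 1, and for M randomly drawn "negative" bigrams push it towards 0, so only M+1 dot products are needed per update instead of |V|. Sizes and initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, M = 5_000, 50, 5                            # vocab size, embedding dim, negatives per positive

W_context = rng.normal(scale=0.01, size=(V, d))   # vectors for words conditioned on as context
W_target  = rng.normal(scale=0.01, size=(V, d))   # vectors for words being predicted

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(context_id, target_id):
    """Binary objective: the observed bigram vs. M randomly drawn 'negative' bigrams."""
    v = W_context[context_id]
    pos = sigmoid(W_target[target_id] @ v)        # push towards 1 for seen bigrams
    neg_ids = rng.integers(0, V, size=M)          # random words serve as negatives
    neg = sigmoid(-(W_target[neg_ids] @ v))       # push sigma(dot) towards 0 for negatives
    return -np.log(pos) - np.log(neg).sum()       # only M+1 dot products, not |V|

print(negative_sampling_loss(context_id=42, target_id=7))
```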
NEGATIVE SAMPLING: LIKELIHOOD
Given: positive training set pos(O), negative training set neg(O)

Question: Why not just maximize the likelihood?


WORD2VEC WITH NEGATIVE SAMPLING

Maximize likelihood of training data: L(θ) = Π_i p(y(i) | x(i); θ)

↔ minimize negative log likelihood: -Σ_i log p(y(i) | x(i); θ)


Question: What do these components stand for in Word2Vec with negative sampling?
x(i) Word pair, from corpus OR randomly created
y(i) Label:
 1 = word pair is from positive training set,
 0 = word pair is from negative training set

Parameters θ of the model: the two sets of word vectors (the vectors for predicted words and the vectors for context words)

SPEEDING UP TRAINING: NEGATIVE SAMPLING


Constructing a good negative training set can be difficult
Often it is some random perturbation of the training data (e.g. replacing the second word of each bigram by a
random word).
The number of negative samples is often a multiple (1x to 20x) of the number of positive samples
Negative sets are often constructed per batch
Question: How many dot products do we need to calculate for a given word pair? How does this compare to the
naive and hierarchical softmax?

SKIP-GRAM (WORD2VEC)
Create a fake task:
Training objective: Given a word, predict the neighbouring words
Generation of samples: Sliding fixed-size window over the text

Idea: Learn many bigram language models at the same time.


Given word w[t], predict words inside a window around w[t]:
One position before the target word: p(w[t-1] | w[t])
One position after the target word: p(w[t+1] | w[t])
Two positions before the target word: p(w[t-2] | w[t])
... up to a specified window size c.
Models share all parameters!
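A small sketch of how the (center, context) training pairs are generated with a sliding window of size c; the sentence is invented for illustration, and all pairs are trained with the same shared embedding parameters.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for all positions within the window."""
    for t, center in enumerate(tokens):
        for offset in range(-window, window + 1):
            j = t + offset
            if offset != 0 and 0 <= j < len(tokens):
                yield center, tokens[j]

sentence = "we are all interested in nlp".split()
print(list(skipgram_pairs(sentence, window=2))[:6])
# [('we', 'are'), ('we', 'all'), ('are', 'we'), ('are', 'all'), ('are', 'interested'), ...]
```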
SKIP-GRAM: OBJECTIVE
Optimize the joint likelihood of the 2c language models:
Π over positions t and offsets j with 1 ≤ |j| ≤ c of p(w[t+j] | w[t])

Negative log-likelihood for the whole corpus (of size N):
NLL = -Σ_{t=1..N} Σ_{1 ≤ |j| ≤ c} log p(w[t+j] | w[t])

Using negative sampling as approximation, each term log p(w[t+j] | w[t]) becomes
log σ(u(w[t+j]) · v(w[t])) + Σ_{m=1..M} log σ(-u(neg_m) · v(w[t]))

where u(neg_m) is the word vector of a random word, u/v are the two vector sets from above, and M is the number of negatives per positive sample

FASTTEXT
Accomplishments:
Words can be represented as dense, low-dimensional vectors
Easy to capture similarity between words
Additive Compositionality of word vectors
Open issues:
Even if we train Word2Vec on a very large corpus, we will still encounter unknown words at test time
What about rare words?
Orthography can often help us:
w(remuneration) should be similar to:
- w(remunerate): same stem
- w(iteration), w(consideration): same suffix, ~ same part of speech
Assume we want to represent the word example:
Character n-grams (n = 3): <ex, exa, xam, amp, mpl, ple, le>
In practice, we don't fix a single n but use a range of character n-gram lengths (typically 3 to 6, see FASTTEXT TRAINING below).
Note that the 4-gram exam (occurring inside <example>) is different from the word <exam>.

Representation of a known word: Average of the word’s embedding and char-n-gram embeddings

Representation of an unknown word: Average of char-n-gram embeddings
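A sketch of the n-gram extraction and averaging described above; the lazy random initialization of n-gram vectors stands in for trained embeddings and is an assumption for illustration only.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of '<word>' including boundary symbols."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]

print(char_ngrams("example", 3, 3))   # ['<ex', 'exa', 'xam', 'amp', 'mpl', 'ple', 'le>']

d = 50
rng = np.random.default_rng(0)
word_vecs  = {"example": rng.normal(size=d)}   # full-word embeddings (known words only)
ngram_vecs = {}                                # n-gram embeddings (lazily initialized here)

def represent(word):
    """Average of the word's own vector (if known) and its char-n-gram vectors."""
    vecs = [ngram_vecs.setdefault(g, rng.normal(size=d)) for g in char_ngrams(word)]
    if word in word_vecs:                      # known word: include the word vector
        vecs.append(word_vecs[word])
    return np.mean(vecs, axis=0)

known   = represent("example")        # word vector + n-gram vectors
unknown = represent("exemplary")      # unseen word: n-gram vectors only
```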

FASTTEXT TRAINING
ngrams typically contains character 3- to 6-grams

Replace the word vector in the skip-gram objective with its new definition (the average over the word's embedding and its char-n-gram embeddings). During backpropagation, the loss gradient is distributed to the word vector and the associated n-gram vectors.

SUMMARY
Word2Vec as a bigram Language Model
Negative Sampling
Skipgram: Predict words in window given word in the middle
fastText: N-gram embeddings generalize to unseen words

USING PRETRAINED EMBEDDINGS


Knowledge transfer from unlabelled corpus
Design choice: Fine-tune embeddings on task or freeze them?
- Pro: Can learn/strengthen features that are important for task
- Contra: Training vocabulary is a small subset of the entire vocabulary → we might overfit and mess up the topology w.r.t. unseen words
Resources:
https://fasttext.cc/docs/en/crawl-vectors.html
https://nlp.stanford.edu/projects/glove/
ANALOGY MINING

w(a) - w(b) + w(c) ≈ w(d?)

d = argmax over w(d') in W of cos(w(a) - w(b) + w(c), w(d'))
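A sketch of this analogy computation over a tiny set of hand-made vectors, invented purely to show the mechanics; real analogies require embeddings trained on a large corpus, and the three query words are usually excluded from the candidates.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c, embeddings):
    """Solve 'a - b + c ≈ d?' by ranking all candidate words with cosine similarity."""
    query = embeddings[a] - embeddings[b] + embeddings[c]
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(query, embeddings[w]))

# Toy, hand-made vectors only to show the mechanics (dims: royalty, gender, food).
emb = {
    "king":   np.array([0.9,  0.9, 0.0]),
    "man":    np.array([0.1,  0.9, 0.0]),
    "woman":  np.array([0.1, -0.9, 0.0]),
    "queen":  np.array([0.9, -0.9, 0.0]),
    "cheese": np.array([0.0,  0.0, 1.0]),
}
print(analogy("king", "man", "woman", emb))   # 'queen'
```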
SUMMARY
Applications of Word Embeddings
- Word vector initialization in neural networks for NLP tasks, e.g. sentiment classification of reviews, topical classification of news
- Analogy mining
- Information retrieval: semantic search, query expansion
- Simple and fast aggregations of sentence representations
