
NLP ASSIGNMENT - 1
Monika. S
917722H031

1. NLP Tokenization

PROGRAM:
import nltk

# Downloads the necessary data for tokenization and text processing
nltk.download('punkt')       # Tokenizer model for breaking text into words/sentences
nltk.download('stopwords')   # Common English stopwords (e.g., "is", "the")
nltk.download('wordnet')     # WordNet lexical database for lemmatization

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Sample input text
text = "NLP is amazing and it's evolving rapidly!"

# Tokenize text into words and convert to lowercase
tokens = word_tokenize(text.lower())

# Remove punctuation or non-alphabetic tokens
tokens = [word for word in tokens if word.isalpha()]

# Remove common stopwords (e.g., "is", "and", "it")
filtered = [word for word in tokens if word not in stopwords.words('english')]

# Initialize stemmer and lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Apply stemming to each filtered word
print("Stemmed:", [stemmer.stem(w) for w in filtered])

# Apply lemmatization to each filtered word
print("Lemmatized:", [lemmatizer.lemmatize(w) for w in filtered])

Description:

1. nltk.download()
○ Purpose: Downloads required resources like tokenizer models ('punkt'),
stopwords, and lexical database ('wordnet') for processing.

2. word_tokenize()

○ Purpose: Splits the input sentence into individual words (tokens) for further
processing.

3. .lower()

○ Purpose: Converts the entire text to lowercase to ensure uniformity when
comparing or filtering words.

4. isalpha()

○ Purpose: Checks if each token contains only alphabetic characters, removing
punctuation and numbers.

5. stopwords.words('english')

○ Purpose: Provides a list of common English words (like "is", "the", "and") to be
removed as they carry little meaning.

6. PorterStemmer

○ Purpose: Reduces words to their root form using rule-based stemming (e.g.,
"evolving" → "evolv").

7. WordNetLemmatizer

○ Purpose: Converts words to their base or dictionary form (e.g., "evolving" →
"evolve") using the WordNet lexical database. By default it treats every word as a
noun, so verbs are only reduced when a part-of-speech tag is passed (as shown below).
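
A minimal sketch of the part-of-speech behaviour, reusing the lemmatizer object from
the program above:

# Without a POS tag the word is treated as a noun and left as-is
print(lemmatizer.lemmatize("evolving"))           # -> evolving
# pos='v' tells WordNet to lemmatize it as a verb
print(lemmatizer.lemmatize("evolving", pos="v"))  # -> evolve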

OUTPUT:

2. Feature Extraction (BoW and TF-IDF)

PROGRAM:
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
"NLP is fun",
"NLP is powerful",
"NLP is transforming industries"
]

bow = CountVectorizer()
X_bow = bow.fit_transform(corpus)
print("BoW:", X_bow.toarray())
print("Features:", bow.get_feature_names_out())

tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(corpus)
print("TF-IDF:", X_tfidf.toarray())

Description:
1. from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

○ Purpose: Imports two key tools for text feature extraction:

■ CountVectorizer: Converts text to a Bag-of-Words (BoW) model.

■ TfidfVectorizer: Converts text to a TF-IDF (Term Frequency-Inverse
Document Frequency) representation.

2. corpus

○ Purpose: A list of text documents that serve as the input for vectorization.

3. CountVectorizer()

○ Purpose: Initializes the BoW vectorizer, which counts the frequency of each word
in the corpus.
4. fit_transform(corpus)

○ Purpose: Learns the vocabulary from the corpus and transforms the documents
into a numerical matrix.

○ For BoW: Each element represents the count of a word in a document.

○ For TF-IDF: Each element represents the importance of a word in a document
relative to the corpus.

5. X_bow.toarray()

○ Purpose: Converts the sparse matrix result of CountVectorizer into a dense array
for easier viewing.

6. bow.get_feature_names_out()

○ Purpose: Retrieves the list of unique words (features) identified in the corpus.

7. TfidfVectorizer()

○ Purpose: Initializes the TF-IDF vectorizer, which evaluates word importance by
considering frequency and uniqueness across all documents.

8. X_tfidf.toarray()

○ Purpose: Converts the sparse TF-IDF matrix to a dense array to view the TF-IDF
scores.
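
To make the TF-IDF output easier to read, each weight can be paired with its feature name.
A small illustrative sketch reusing the tfidf and X_tfidf objects from the program above
(the pairing loop is an addition, not part of the original assignment code):

# Print the non-zero TF-IDF weights of the first document, word by word
feature_names = tfidf.get_feature_names_out()
first_doc_scores = X_tfidf.toarray()[0]
for word, score in zip(feature_names, first_doc_scores):
    if score > 0:
        print(f"{word}: {score:.3f}")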

Output:

3. Tokenization (Classical vs Modern)

PROGRAM:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
sentence = "Unbelievable performance by the transformer model!"
tokens = tokenizer.tokenize(sentence)
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print("Tokens:", tokens)
print("Token IDs:", token_ids)
DESCRIPTION:
from transformers import AutoTokenizer

● Purpose: Imports the AutoTokenizer class from the Hugging Face Transformers library,
which automatically selects the appropriate tokenizer for a given pre-trained model.

AutoTokenizer.from_pretrained("bert-base-uncased")

● Purpose: Loads the tokenizer associated with the BERT base model (uncased version,
meaning it lowercases all input).

● Automatically downloads and caches the tokenizer if it's not already available.

sentence

● Purpose: The input sentence that will be tokenized.

tokenizer.tokenize(sentence)

● Purpose: Splits the input sentence into subword tokens based on BERT's WordPiece
tokenization.


● Handles complex words and unknown tokens by breaking them into known subword pieces
(e.g., "unbelievable" is split into "un" followed by "##"-prefixed pieces).

tokenizer.convert_tokens_to_ids(tokens)

● Purpose: Converts each subword token into its corresponding numerical ID from BERT’s
vocabulary.

● These IDs are the actual inputs to the BERT model.

print("Tokens:", tokens)

● Purpose: Displays the list of tokens generated by the tokenizer.

print("Token IDs:", token_ids)


● Purpose: Shows the list of token IDs corresponding to each token.
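
Note that tokenize() returns only the subword pieces. Calling the tokenizer directly on the
sentence additionally inserts BERT's special tokens ([CLS] and [SEP]) and builds the model-ready
inputs. A short sketch reusing the tokenizer and sentence from the program above:

# The full encoding adds [CLS] at the start and [SEP] at the end
encoded = tokenizer(sentence)
print("Input IDs:", encoded["input_ids"])
print("Tokens with specials:", tokenizer.convert_ids_to_tokens(encoded["input_ids"]))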

OUTPUT:

4. Contextual Word Embeddings (BERT)

Program:
from transformers import AutoTokenizer, AutoModel
import torch

# Load the matching tokenizer (same as in the previous section) and the model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("NLP is powerful", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print("Embedding shape:", outputs.last_hidden_state.shape)

Description:
from transformers import AutoModel

● Purpose: Imports the pre-trained model loader from Hugging Face's Transformers
library, allowing dynamic selection of model architecture (like BERT).

import torch

● Purpose: Imports PyTorch, which is used to manage tensors and control model
computation (like disabling gradient tracking).

AutoModel.from_pretrained("bert-base-uncased")

● Purpose: Loads the pre-trained BERT base model with lowercase (uncased) inputs.
● Only returns hidden states (not classification heads).

tokenizer("NLP is powerful", return_tensors="pt")

● Purpose: Tokenizes the input sentence and returns it as PyTorch tensors ("pt" stands for
PyTorch).

● Prepares inputs like input_ids and attention_mask for the model.

with torch.no_grad():

● Purpose: Disables gradient computation since you’re doing inference (not training). This
saves memory and speeds up computation.

model(**inputs)

● Purpose: Feeds the tokenized input into the BERT model.

● The **inputs unpacks arguments like input_ids and attention_mask.

outputs.last_hidden_state

● Purpose: Contains the embeddings (hidden states) for each token in the input sequence
from the last BERT layer.

outputs.last_hidden_state.shape

● Purpose: Prints the shape of the output tensor, typically (batch_size, sequence_length,
hidden_size), e.g., (1, 5, 768).
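
Since last_hidden_state holds one vector per token, a common way to obtain a single sentence
vector is to average the token embeddings using the attention mask. This mean-pooling step is
an illustrative addition (not part of the original program), continuing from the inputs and
outputs above:

# Mean-pool the token embeddings, ignoring padding positions
mask = inputs["attention_mask"].unsqueeze(-1)              # (1, seq_len, 1)
summed = (outputs.last_hidden_state * mask).sum(dim=1)     # (1, 768)
sentence_embedding = summed / mask.sum(dim=1)
print("Sentence embedding shape:", sentence_embedding.shape)  # torch.Size([1, 768])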

OUTPUT:
5.TF-IDF Similarity:

PROGRAM:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Example sentences
sentence1 = "I love machine learning"
sentence2 = "Artificial intelligence is fascinating"

# TF-IDF Vectorization
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform([sentence1, sentence2])

# Cosine similarity
similarity = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])
print(f"TF-IDF similarity: {similarity[0][0]:.4f}")

Description:
from sklearn.feature_extraction.text import TfidfVectorizer

● Purpose: Imports the tool for converting text into TF-IDF vectors, which reflect word
importance relative to the document and the corpus.

from sklearn.metrics.pairwise import cosine_similarity


● Purpose: Imports the function to compute cosine similarity, which measures the angle
between two vectors—used here to find how similar two sentences are.

sentence1, sentence2

● Purpose: The two input sentences you want to compare for semantic similarity.

TfidfVectorizer()

● Purpose: Initializes the vectorizer that transforms the input text into TF-IDF-weighted
vectors.

fit_transform([sentence1, sentence2])

● Purpose: Learns vocabulary and computes the TF-IDF matrix for the input sentences.

tfidf_matrix[0:1], tfidf_matrix[1:2]

● Purpose: Selects the vector for each individual sentence (row slicing) to compute
pairwise similarity.

cosine_similarity()

● Purpose: Calculates how similar the two TF-IDF vectors are based on the cosine of the
angle between them. Returns a value between 0 (no similarity) and 1 (identical).

similarity[0][0]

● Purpose: Extracts the similarity score from the result matrix (since it's a 1x1 array here).

print(f"...")

● Purpose: Displays the final cosine similarity score, formatted to 4 decimal places.
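
For reference, the same score can be computed by hand: cosine similarity is the dot product of
the two vectors divided by the product of their norms. A quick NumPy sketch reusing
tfidf_matrix from the program above (illustrative only; cosine_similarity does exactly this):

import numpy as np

v1 = tfidf_matrix[0].toarray().ravel()
v2 = tfidf_matrix[1].toarray().ravel()
# cos(theta) = (v1 . v2) / (||v1|| * ||v2||)
manual = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"Manual TF-IDF similarity: {manual:.4f}")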

Output:

6.SEMANTIC SIMILARITY:

PROGRAM:
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

sentence1 = "I love machine learning."
sentence2 = "I enjoy studying artificial intelligence."

embeddings = model.encode([sentence1, sentence2], convert_to_tensor=True)
similarity = util.pytorch_cos_sim(embeddings[0], embeddings[1])

print(f"Semantic similarity: {similarity.item():.4f}")

Description:
1. from sentence_transformers import SentenceTransformer, util

○ Purpose: Imports the Sentence-BERT model and utility functions.

■ SentenceTransformer: Loads pre-trained models for sentence embeddings.

■ util: Provides helper functions like cosine similarity in PyTorch.

2. SentenceTransformer('all-MiniLM-L6-v2')

○ Purpose: Loads a lightweight and fast pre-trained Sentence-BERT model.

○ Use case: Great for sentence-level semantic tasks like similarity, clustering, etc.

3. sentence1, sentence2

○ Purpose: The two input sentences to compare semantically.

4. model.encode([...], convert_to_tensor=True)

○ Purpose: Converts input sentences into dense vector representations
(embeddings).

○ convert_to_tensor=True returns PyTorch tensors for direct use in similarity
computations.

5. util.pytorch_cos_sim(embeddings[0], embeddings[1])

○ Purpose: Computes cosine similarity between the two sentence embeddings
using PyTorch.

6. similarity.item()
○ Purpose: Converts the single tensor value (similarity score) to a regular float
value for printing.

7. print(f"...")

○ Purpose: Displays the computed semantic similarity score, formatted to 4
decimal places.
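
The same utilities also work on more than two sentences at once: encoding a whole list and
computing the pairwise similarity matrix. A small sketch reusing the model object above
(util.cos_sim is the newer name for the same cosine-similarity helper; the extra sentence is
illustrative):

more_sentences = [
    "I love machine learning.",
    "I enjoy studying artificial intelligence.",
    "The weather is nice today."
]
emb = model.encode(more_sentences, convert_to_tensor=True)
# Pairwise cosine similarity matrix of shape (3, 3)
print(util.cos_sim(emb, emb))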

Output:

7.Topic Modeling with LDA (Latent Dirichlet Allocation)

Program:
import gensim
from gensim import corpora
from pprint import pprint
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

# Step 2: document corpus
documents = [
    "I love watching cricket and football with my friends.",
    "Messi and Ronaldo are amazing football players.",
    "Machine learning and AI are transforming technology.",
    "Python and Java are popular programming languages.",
    "Studying for exams requires focus and good sleep.",
    "Teachers play an important role in shaping our future.",
    "Movies and music help me relax after a long day.",
    "Marvel and DC make great superhero films.",
    "Eating fruits and vegetables keeps you healthy.",
    "Regular exercise improves both mental and physical health."
]

# Step 3: Preprocessing - tokenize and lowercase
texts = [[word.lower() for word in doc.split()] for doc in documents]

# Step 4: Create dictionary and bag-of-words corpus
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Step 5: Train LDA model
lda_model = gensim.models.LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=5,        # Adjust based on the expected number of topics
    random_state=42,
    passes=20,           # More passes = better convergence
    alpha='auto',
    per_word_topics=True
)

# Step 6: Display the topics
print("\nTop words in each topic:\n")
pprint(lda_model.print_topics(num_words=5))

# Step 7: Inference on a new sentence
new_doc = "I enjoy programming in Python and learning AI."
new_bow = dictionary.doc2bow(new_doc.lower().split())
topics = lda_model.get_document_topics(new_bow)

print("\n🔍 Topic distribution for new sentence:")
for topic_num, prob in topics:
    print(f"Topic {topic_num}: {prob:.4f}")
Description:

import gensim and from gensim import corpora

● Purpose: Imports Gensim, a popular NLP library for topic modeling and vector space
modeling. corpora helps in creating the dictionary and BoW representations.

warnings.filterwarnings(...)

● Purpose: Suppresses deprecation warnings to keep the output clean.

documents

● Purpose: A list of text documents (your input corpus) to extract topics from.

texts = [[word.lower() for word in doc.split()] for doc in documents]

● Purpose: Preprocesses each document:

○ Splits into words.

○ Converts to lowercase for consistency.

corpora.Dictionary(texts)

● Purpose: Builds a mapping (dictionary) from words to unique IDs.

doc2bow(text)

● Purpose: Converts each document into a bag-of-words vector:

○ Each document is represented as a list of (word_id, frequency) tuples.

gensim.models.LdaModel(...)

● Purpose: Trains an LDA topic model:

○ corpus: the BoW representation.

○ id2word: dictionary for mapping IDs back to words.

○ num_topics: number of latent topics to discover.


○ passes: number of iterations over the corpus.

○ alpha='auto': automatically tunes the document-topic distribution.

○ per_word_topics=True: tracks word distributions per topic.

lda_model.print_topics(num_words=5)

● Purpose: Retrieves and prints the top 5 words associated with each discovered topic.

● Useful for interpreting what each topic is about.

pprint(...)

● Purpose: Nicely formats the output for readability.
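
To see what doc2bow actually produces, one document's bag-of-words vector can be inspected
against the dictionary. A short illustrative sketch reusing dictionary and texts from the
program above (the exact word IDs depend on the dictionary's internal ordering):

# (word_id, count) pairs for the first document
first_bow = dictionary.doc2bow(texts[0])
print(first_bow)
# Map the IDs back to the words for readability
print([(dictionary[word_id], count) for word_id, count in first_bow])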

Output:

8.RNN:

PROGRAM:
sentences = [
    "i love nlp",
    "i love machine learning",
    "nlp is fun",
    "deep learning is powerful",
    "i enjoy learning"
]

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)

total_words = len(tokenizer.word_index) + 1
print("Vocabulary Size:", total_words)

# Generate input sequences (predict next word)
input_sequences = []
for line in sentences:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        ngram_seq = token_list[:i+1]
        input_sequences.append(ngram_seq)

# Padding sequences
max_len = max([len(x) for x in input_sequences])
input_sequences = pad_sequences(input_sequences, maxlen=max_len, padding='pre')

# Features and labels
X, y = input_sequences[:, :-1], input_sequences[:, -1]
y = np.array(y)

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
from tensorflow.keras.utils import to_categorical

y_cat = to_categorical(y, num_classes=total_words)

model = Sequential()
model.add(Embedding(total_words, 10, input_length=max_len-1))  # 10-dim embeddings
model.add(SimpleRNN(64))  # You can also try LSTM
model.add(Dense(total_words, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(X, y_cat, epochs=200, verbose=0)

def predict_next_word(seed_text, tokenizer, model, max_len):
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = pad_sequences([token_list], maxlen=max_len-1, padding='pre')
    predicted_probs = model.predict(token_list, verbose=0)[0]
    predicted_index = np.argmax(predicted_probs)

    for word, index in tokenizer.word_index.items():
        if index == predicted_index:
            return word

# Example usage
seed = "i love"
predicted = predict_next_word(seed, tokenizer, model, max_len)
print(f"'{seed}' → '{predicted}'")

DESCRIPTION:
sentences

● Purpose: A list of sentences to train a simple language model for predicting the next
word based on a given seed text.

Tokenizer() and fit_on_texts(sentences)

● Purpose:

○ Tokenizer(): Creates a tokenizer to process the text.

○ fit_on_texts(sentences): Tokenizes the sentences, assigning a unique integer
index to each word (creates a word index).

total_words = len(tokenizer.word_index) + 1

● Purpose: Calculates the total number of unique words in the vocabulary, adding 1 to
account for padding.

tokenizer.texts_to_sequences([line])

● Purpose: Converts each sentence into a sequence of word indices based on the
vocabulary learned by the tokenizer.
input_sequences

● Purpose: Generates n-grams (sequences of words) for training. For each sentence, all
possible n-grams are created to predict the next word based on previous words.

pad_sequences(input_sequences, maxlen=max_len, padding='pre')

● Purpose: Pads the input sequences to ensure they have the same length by adding
zeros at the beginning ('pre' padding).

X, y = input_sequences[:, :-1], input_sequences[:, -1]

● Purpose:

○ X: Features (all words except the last word of the sequence).

○ y: Labels (the last word in each sequence).

to_categorical(y, num_classes=total_words)

● Purpose: Converts the labels into categorical format for multi-class classification (one-
hot encoding).

Sequential()

● Purpose: Initializes the Keras Sequential model, which is a linear stack of layers.

Embedding(total_words, 10, input_length=max_len-1)

● Purpose: Adds an embedding layer that converts word indices into dense vectors of
fixed size (10 here), representing words in a continuous vector space.

SimpleRNN(64)

● Purpose: Adds a simple recurrent neural network layer with 64 units. This processes
sequences to capture dependencies between words.

Dense(total_words, activation='softmax')

● Purpose: Adds a fully connected layer with softmax activation, which outputs a
probability distribution over all possible next words.

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])


● Purpose: Compiles the model with the Adam optimizer and categorical cross-entropy
loss function. Accuracy is tracked during training.

model.fit(X, y_cat, epochs=200, verbose=0)

● Purpose: Trains the model on the input data for 200 epochs, with no verbosity.

predict_next_word(seed_text, tokenizer, model, max_len)

● Purpose: Defines a function to predict the next word based on a seed text:

○ Converts the seed text into a sequence of word indices.

○ Pads the sequence.

○ Uses the trained model to predict the next word, returning the word
corresponding to the predicted index.

np.argmax(predicted_probs)

● Purpose: Retrieves the index of the word with the highest probability as predicted by the
model.
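
As the comment in the program suggests, SimpleRNN can be swapped for an LSTM layer, which
typically captures longer word dependencies better. A hedged sketch of that variant, reusing
the data (X, y_cat, total_words, max_len) prepared above; only the recurrent layer changes:

from tensorflow.keras.layers import LSTM

lstm_model = Sequential()
lstm_model.add(Embedding(total_words, 10, input_length=max_len-1))
lstm_model.add(LSTM(64))  # LSTM cell in place of SimpleRNN
lstm_model.add(Dense(total_words, activation='softmax'))
lstm_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
lstm_model.fit(X, y_cat, epochs=200, verbose=0)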

Output:
