Module 5 - Natural Language Processing
Word Clouds
From a text we can build a cloud of words by applying Natural Language Processing (NLP) techniques, with each word drawn at a size proportional to how often it occurs. This looks neat but doesn't really tell us anything. A more interesting approach might be to scatter the words so that horizontal position indicates posting popularity and vertical position indicates resume popularity, which produces a visualization that conveys a few insights.
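As a rough sketch of this idea (the words and their popularity scores below are made up for illustration, and the matplotlib library is assumed), each word can be placed at coordinates given by its two popularity scores and sized by their total:

import matplotlib.pyplot as plt

# Made-up (word, posting popularity, resume popularity) triples for illustration
data = [("big data", 100, 15), ("Hadoop", 95, 25), ("Python", 75, 50),
        ("R", 50, 40), ("machine learning", 80, 20), ("statistics", 20, 60),
        ("data science", 60, 70), ("analytics", 90, 3)]

def text_size(total):
    """Scale font size between 8 (total = 0) and 28 (total = 200)."""
    return 8 + total / 200 * 20

for word, posting_popularity, resume_popularity in data:
    plt.text(posting_popularity, resume_popularity, word,
             ha="center", va="center",
             size=text_size(posting_popularity + resume_popularity))

plt.xlabel("Popularity on Job Postings")
plt.ylabel("Popularity on Resumes")
plt.axis([0, 100, 0, 100])
plt.show()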
n-gram Models
N-grams in NLP refer to contiguous sequences of n words extracted from text for language
processing and analysis. An n-gram can be as short as a single word (unigram) or span
multiple words (bigram, trigram, etc.). These n-grams capture the contextual information and
relationships between words in a given text.
We can use n-grams to build language models that predict which word comes next given a history
of words.
An N-gram language model predicts the probability of a given N-gram within any sequence of
words in the language. A good N-gram model can predict the next word in a sentence, i.e., the
value of P(w | h), the probability of word w given the history h.
Examples: unigrams ("This", "article", "is", "on", "NLP") or bigrams ("This article", "article is",
"is on", "on NLP").
An N-gram model is a statistical language model used in natural language processing (NLP) and
computational linguistics. It predicts the likelihood of a word (or sequence of words) based on the
preceding N-1 words.
1. N-gram: An N-gram is a sequence of N words. For example, in the sentence "I love natural
language processing," some examples of N-grams are:
o 1-gram (unigram): "I", "love", "natural", "language", "processing"
o 2-gram (bigram): "I love", "love natural", "natural language", "language
processing"
o 3-gram (trigram): "I love natural", "love natural language", "natural language
processing"
2. N-gram Model: An N-gram model predicts the probability of a word given the N-1
preceding words. It's based on the Markov assumption that the probability of a word depends
only on the previous N-1 words (not the entire history of preceding words); a small bigram
sketch in code follows this list.
3. Application: N-gram models are used in various NLP tasks:
o Language Modeling: Predicting the next word in a sequence.
o Speech Recognition: Matching acoustic signals to sequences of words.
o Machine Translation: Predicting the next word in a translated sentence.
o Spell Checking and Correction: Identifying errors by looking at the probabilities
of word sequences.
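As a minimal sketch of these ideas, a bigram model can be built simply by counting which words follow which in a corpus and then sampling the next word from those counts (the tiny corpus below is made up for illustration; real models are trained on much larger text):

import random
from collections import defaultdict

# Toy corpus (made up for illustration)
corpus = "this article is on NLP this article is short this is an example".split()

# For each word, record the words observed to follow it (a bigram transition table)
transitions = defaultdict(list)
for prev_word, next_word in zip(corpus, corpus[1:]):
    transitions[prev_word].append(next_word)

def generate(start, length=6):
    """Generate text by repeatedly sampling the next word given only the previous word."""
    word, output = start, [start]
    for _ in range(length):
        followers = transitions.get(word)
        if not followers:                    # no observed continuation
            break
        word = random.choice(followers)      # sampling proportional to bigram counts
        output.append(word)
    return " ".join(output)

print(generate("this"))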
Grammar
A different approach to modeling language is with grammars, rules for generating acceptable
sentences.
Grammar is defined as the rules for forming well-structured sentences.
In NLP, a grammar is a set of rules for constructing sentences in a language; it is used to
understand and analyze the structure of sentences in text data.
This includes identifying parts of speech such as nouns, verbs, and adjectives, determining the
subject and predicate of a sentence, and identifying the relationships between words and phrases.
1. Grammar Rules:
We'll define a set of production rules in the form of A -> B, where A is a non-terminal
symbol (category) and B is a sequence of symbols (terminals or non-terminals).
o S -> NP VP: A sentence (S) consists of a noun phrase (NP) followed by a verb
phrase (VP).
o NP -> Det N: A noun phrase (NP) consists of a determiner (Det) followed by a noun
(N).
o NP -> ProperNoun: A noun phrase (NP) can also be a proper noun (ProperNoun).
o VP -> V NP: A verb phrase (VP) consists of a verb (V) followed by a noun phrase
(NP).
o Det -> "the" | "a": Determiners (Det) can be "the" or "a".
o N -> "dog" | "cat" | "ball": Nouns (N) can be "dog", "cat", or "ball".
o ProperNoun -> "John" | "Mary" | "Alice": Proper nouns (ProperNoun) can be
"John", "Mary", or "Alice".
o V -> "chased" | "ate" | "threw": Verbs (V) can be "chased", "ate", or "threw".
2. Generating Sentences:
Using the above grammar rules, we can generate valid English sentences (a code sketch follows these examples):
o Example 1:
▪ Start with S.
▪ Apply S -> NP VP.
▪ Apply NP -> ProperNoun (e.g., "John").
▪ Apply VP -> V NP (e.g., "chased" NP).
▪ Apply NP -> Det N (e.g., "the" N).
▪ Apply N -> "dog" (e.g., "the dog").
▪ Constructed sentence: "John chased the dog."
o Example 2:
▪ Start with S.
▪ Apply S -> NP VP.
▪ Apply NP -> Det N (e.g., "a" N).
▪ Apply N -> "cat" (e.g., "a cat").
▪ Apply VP -> V NP (e.g., "ate" NP).
▪ Apply NP -> ProperNoun (e.g., "Alice").
▪ Constructed sentence: "A cat ate Alice."
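A small sketch of this generation process (the dictionary below encodes exactly the rules listed above; representing the grammar as a Python dictionary is just one possible choice):

import random

# The grammar rules above, written as: non-terminal -> list of possible productions
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["ProperNoun"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"], ["ball"]],
    "ProperNoun": [["John"], ["Mary"], ["Alice"]],
    "V": [["chased"], ["ate"], ["threw"]],
}

def expand(symbol):
    """Recursively expand a symbol by picking one of its productions at random."""
    if symbol not in grammar:                    # terminal word, e.g. "dog"
        return [symbol]
    production = random.choice(grammar[symbol])  # e.g. S -> NP VP
    return [word for part in production for word in expand(part)]

print(" ".join(expand("S")))   # e.g. "John chased the dog"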
Topic Modeling
Topic modeling in NLP refers to the process of automatically identifying topics or themes present
in a collection of text documents. It's a statistical technique used to uncover the hidden semantic
structures in a corpus and is widely used for tasks like document clustering, information retrieval,
and summarization. One of the most popular algorithms for topic modeling is Latent Dirichlet
Allocation (LDA). Here's an overview of how topic modeling works and its applications:
Latent Dirichlet Allocation (LDA)
1. Conceptual Basis:
o LDA assumes that each document in a corpus (a large and structured collection of written or
spoken texts) is a mixture of topics, and each topic is a mixture of words. It posits a generative
process where:
▪ Each document is generated by sampling a distribution of topics.
▪ Each word in the document is generated by sampling a topic from the document's
topic distribution and then sampling a word from the topic's word distribution.
2. Key Components:
o Topics: Latent topics are distributions over words. Each topic can be interpreted as a set
of words that co-occur frequently within the same context.
o Document-Topic Distribution: Each document is represented as a distribution over
topics, indicating the proportion of each topic present in the document.
o Word-Topic Distribution: Each topic is represented as a distribution over words,
indicating the likelihood of each word appearing under that topic.
3. Steps in LDA:
o Initialization: Start with random assignment of words to topics.
o Iteration: Iteratively refine the assignment of words to topics to maximize the likelihood
of the observed data under the model.
o Inference: Estimate the posterior distribution of topics given the words in the documents
using techniques like variational inference or Gibbs sampling.
4. Applications:
o Document Clustering: Group similar documents together based on their topic
distributions.
o Information Retrieval: Identify relevant documents based on their topic distributions
rather than just keywords.
o Topic Summarization: Automatically generate summaries of documents based on the
most representative topics.
o Content Recommendation: Recommend related articles or documents based on their
topic similarity.
o Exploratory Analysis: Gain insights into large collections of text data by identifying
prevalent themes or trends.
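As an illustrative sketch (assuming scikit-learn, which these notes do not require, and a made-up four-document corpus), LDA can be fitted and its document-topic and word-topic distributions inspected as follows:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus (made up for illustration)
documents = [
    "python machine learning data model",
    "football match goal team player",
    "data model training python code",
    "team player league football season",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(documents)            # document-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)                   # document-topic distribution, one row per document

# Word-topic distribution: most representative words for each topic
words = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [words[i] for i in topic.argsort()[-4:]]
    print("topic", k, ":", top_words)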
Gibbs sampling
Gibbs sampling is a technique for generating samples from multidimensional distributions when we only
know some of the conditional distributions.
Start with any (valid) values for x and y, and then repeatedly alternate between replacing x with a random
value picked conditional on y and replacing y with a random value picked conditional on x. After a number
of iterations, the resulting values of x and y will represent a sample from the unconditional joint distribution.
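A toy sketch of this procedure (the joint distribution here, the value of one die x and the sum of two dice y, is an assumption chosen only because both conditionals are easy to write down):

import random

def roll_a_die():
    return random.randrange(1, 7)

def sample_y_given_x(x):
    """Conditional of y given x: y is x plus an independent die roll."""
    return x + roll_a_die()

def sample_x_given_y(y):
    """Conditional of x given y: x is uniform over the die values consistent with the sum y."""
    lower, upper = max(1, y - 6), min(6, y - 1)
    return random.randrange(lower, upper + 1)

def gibbs_sample(num_iters=100):
    x, y = 1, 2                       # any valid starting values
    for _ in range(num_iters):
        x = sample_x_given_y(y)       # alternate the two conditional updates
        y = sample_y_given_x(x)
    return x, y                       # approximately a draw from the joint distribution of (x, y)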
Gibbs Sampling is used to infer the topic distribution of words in documents and the word distribution of
topics. In LDA, for instance, each word in a document is assigned to a topic based on the current
assignments of all other words in the document. Gibbs Sampling iteratively updates these assignments to
approximate the posterior distribution over topics.
Network analysis
Network analysis is a field of study that focuses on analyzing and understanding complex systems
represented as networks or graphs. These networks consist of nodes (vertices) and edges
(connections between nodes), which can represent a wide range of entities and relationships
depending on the context of study.
Example
Social Networks: Nodes represent individuals or organizations, and edges represent relationships
(friendships, collaborations, etc.).
Betweenness centrality
Betweenness centrality measures how often a node lies on the shortest paths between pairs of
other nodes. For a node v it is defined as

C_B(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st

where:
o σ_st is the total number of shortest paths from node s to node t, and
o σ_st(v) is the number of those shortest paths that pass through v.
Closeness centrality
Closeness centrality is a measure (metric) used in network analysis to determine how central a node
is to the network by calculating the average shortest path distance from the node to all other nodes
in the network. In essence, it quantifies how quickly a node can interact with other nodes in the
network.
Interpretation: Nodes with high closeness centrality are those that can reach other nodes
quickly. They are effective in spreading information or influence efficiently across the network
because they have shorter average distances to other nodes.
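In symbols (a standard formulation, stated here since the formula itself does not appear in these notes): for a connected network with N nodes, C(v) = (N - 1) / Σ_u d(v, u), where d(v, u) is the shortest-path distance between v and u; this is the reciprocal of the average distance from v to every other node.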
Eigenvector centrality
Definition: Eigenvector centrality of a node v is a measure that assigns a score to the node
based on the principle that connections to high-scoring nodes contribute more to the node's
score than connections to low-scoring nodes.
Interpretation: Nodes with higher eigenvector centrality scores are those that are not only
well-connected but are also connected to other nodes that themselves have high centrality
scores. Therefore, eigenvector centrality captures a notion of influence that propagates through
the network.
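In symbols (again a standard formulation not written out in these notes): the score x_v of node v satisfies x_v = (1/λ) Σ_{u ∈ N(v)} x_u, where N(v) is the set of neighbors of v and λ is a constant; equivalently, A x = λ x, so the vector of centrality scores is an eigenvector of the network's adjacency matrix A (the one belonging to the largest eigenvalue).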
In network analysis, directed graphs and PageRank play significant roles in understanding and
evaluating the structure and importance of nodes within a network. Here’s how they are utilized:
Directed Graphs: In a directed graph each edge has a direction (for example, user A follows user B,
or page A links to page B), so relationships need not be symmetric. Many real networks, such as the
web and follower networks on social media, are naturally modeled as directed graphs.
PageRank: PageRank assigns each node an importance score based on the structure of incoming links:
a node is important if it is pointed to by other important nodes. Intuitively, it models a random
surfer who follows outgoing edges and occasionally jumps to a random node, and it ranks nodes by how
often they are visited.
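As a rough sketch of how these measures can be computed in practice (assuming the networkx library, which these notes do not require, and a small made-up directed graph):

import networkx as nx

# Small directed graph; the edges are made up for illustration
G = nx.DiGraph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "B")])

print(nx.betweenness_centrality(G))                  # betweenness centrality of each node
print(nx.closeness_centrality(G))                    # closeness centrality of each node
print(nx.eigenvector_centrality(G.to_undirected()))  # eigenvector centrality (undirected copy for stability)
print(nx.pagerank(G))                                # PageRank scores on the directed graph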
Recommender Systems
One common data problem is producing recommendations of some sort. Netflix recommends movies
you might want to watch. Amazon recommends products you might want to buy. Twitter recommends users
you might want to follow.
Manual Curation: Manual curation in recommender systems refers to the process of human intervention
in selecting, filtering, or modifying recommendations that the system generates.
Example: Before the Internet, when we needed book recommendations we would go to the library, where
a librarian was available to suggest books that were relevant to our interests.
But this method doesn't scale particularly well, and it's limited by an individual's personal knowledge and
imagination.
Recommending What's Popular: One easy approach is to simply recommend what's popular.
Most recommendation systems use collaborative filtering to find similar patterns or information
about the users. The two types of Collaborative Filtering are user-based and item-based.
User-Based Collaborative Filtering is a technique used to predict the items that a user might like
on the basis of ratings given to those items by other users who have similar taste to that of the
target user.
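A minimal sketch of user-based similarity (the users_interests data below is made up for illustration, and cosine similarity is one common choice of similarity measure):

import math

# Toy data (made up): each user's list of interests
users_interests = [
    ["Python", "statistics", "machine learning"],
    ["Python", "databases", "SQL"],
    ["statistics", "probability", "machine learning"],
]

# All distinct interests, in a fixed order
unique_interests = sorted({interest for interests in users_interests for interest in interests})

def make_user_interest_vector(interests):
    """Binary vector with a 1 for each interest the user has."""
    return [1 if interest in interests else 0 for interest in unique_interests]

user_interest_matrix = [make_user_interest_vector(interests) for interests in users_interests]

def cosine_similarity(v, w):
    dot = sum(v_i * w_i for v_i, w_i in zip(v, w))
    return dot / (math.sqrt(sum(v_i * v_i for v_i in v)) * math.sqrt(sum(w_i * w_i for w_i in w)))

# How similar is user 0 to every user (including themselves)?
print([cosine_similarity(user_interest_matrix[0], v) for v in user_interest_matrix])

The interests of the most similar users, weighted by these similarities, can then be suggested to the target user. (The user_interest_matrix and unique_interests built here are also the structures the interest-based approach below starts from.)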
Limitations:
As noted, this approach may face challenges with very large datasets or high-dimensional
interest spaces:
● Curse of Dimensionality: In large datasets with many interests, finding truly similar users
becomes harder.
● Sparse Data: If users have only a few interests, similarity calculations may not accurately reflect
user preferences.
● Dynamic Interests: User interests may change over time, requiring constant updates to similarity
calculations.
The alternative approach described here focuses on computing similarities between interests
directly, rather than between users. This method allows for generating recommendations by
aggregating interests that are similar to a user's current interests. Here's how it works step-by-
step:
1. Transposing User-Interest Matrix
First, transpose the user_interest_matrix so that rows correspond to interests and columns
correspond to users. This transformation allows us to compute similarities between interests.
interest_user_matrix = [[user_interest_vector[j]
                         for user_interest_vector in user_interest_matrix]
                        for j, _ in enumerate(unique_interests)]
Here, interest_user_matrix[j] will have 1 for each user who has the interest
unique_interests[j], and 0 otherwise.
2. Computing Interest Similarities and Suggestions
From this matrix, pairwise similarities between interests can be computed (for example, with cosine
similarity). For a given user, each candidate interest is then scored by adding up its similarities to
the interests the user already has, and the candidates are sorted by that weight into a list of
(suggestion, weight) pairs called suggestions. The suggestion function ends by optionally filtering
out interests the user already has:

if include_current_interests:
    return suggestions
else:
    return [(suggestion, weight)
            for suggestion, weight in suggestions
            if suggestion not in users_interests[user_id]]
Matrix Factorization
Matrix Factorization is a powerful technique used in recommendation systems to decompose a large user-
item interaction matrix into lower-dimensional matrices that represent latent factors. This approach aims to
uncover hidden patterns or latent features that explain the observed interactions between users and items
(or interests, in this case). Singular Value Decomposition (SVD) is a classical method for matrix
factorization.
Basic Concept:
1. User-Item Matrix:
o Suppose you have a matrix R where rows correspond to users and columns correspond to
items (or interests). Each entry R[i][j] represents the interaction (e.g., rating, interest
indication) of user i with item j.
2. Matrix Decomposition:
o Matrix Factorization decomposes the matrix R into two lower-dimensional matrices:
▪ User matrix U: Represents users in terms of latent factors.
▪ Item matrix V: Represents items (interests) in terms of the same latent factors.
3. Learning Latent Factors:
o The goal is to learn the matrices U and V such that their product approximates R well.
This is typically done by minimizing a loss function that quantifies the difference
between the predicted ratings (or interest indications) and the actual ratings in R.
4. Recommendations:
o Once U and V are learned, recommendations can be made by:
▪ Predicting the missing entries in R.
▪ Recommending items (interests) with the highest predicted values for a given
user.
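A rough sketch of this idea using a plain truncated SVD from NumPy (the rating matrix below is made up, with zeros standing in for missing ratings; real systems typically use factorization methods that handle missing entries explicitly, but the decompose-and-reconstruct pattern is the same):

import numpy as np

# Toy user-item rating matrix (made up); 0 marks an unrated item
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

# Decompose R and keep k latent factors
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
U_k = U[:, :k] * s[:k]       # user matrix: users x latent factors
V_k = Vt[:k, :]              # item matrix: latent factors x items

R_hat = U_k @ V_k            # predicted ratings, approximating R
print(np.round(R_hat, 2))

# Recommend, for user 0, the unrated item with the highest predicted rating
user = 0
unrated_items = np.where(R[user] == 0)[0]
best = unrated_items[np.argmax(R_hat[user, unrated_items])]
print("recommend item", best)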