
Module 5- Natural language processing (NLP)

Natural language processing (NLP) refers to computational techniques involving language.


Word Clouds
● A word cloud is a simple, weighted visual representation of the vocabulary contained in a textual
dataset that allows us to estimate the contents of the data at a glance. It contains the most frequently
occurring words in the data, with more frequent words appearing larger in size than less frequent
ones.
● The word cloud approach is just to arrange the words on a page in a cool-looking font.
For example, imagine that, for each of some collection of data science–related buzzwords, you have two
numbers between 0 and 100 — the first representing how frequently it appears in job postings, the second
how frequently it appears on resumes:

Arranged as a plain word cloud, this looks neat but doesn't really tell us anything. A more interesting approach is to scatter the words so that horizontal position indicates posting popularity and vertical position indicates resume popularity, which produces a visualization that conveys a few insights, as sketched below.
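A minimal sketch of this scatter idea in Python with matplotlib is shown below; the buzzwords and their popularity scores are made-up illustrative values, not real data:

import matplotlib.pyplot as plt

# Made-up (job-posting popularity, resume popularity) scores, each between 0 and 100
data = [("big data", 100, 15), ("Hadoop", 95, 25), ("Python", 75, 50),
        ("R", 50, 40), ("machine learning", 80, 20), ("statistics", 20, 60),
        ("data science", 60, 70), ("analytics", 90, 3), ("team player", 85, 85)]

def text_size(total):
    # scale font size from 8 (total = 0) up to 28 (total = 200)
    return 8 + total / 200 * 20

for word, job_popularity, resume_popularity in data:
    plt.text(job_popularity, resume_popularity, word,
             ha='center', va='center',
             size=text_size(job_popularity + resume_popularity))

plt.xlabel("Popularity on Job Postings")
plt.ylabel("Popularity on Resumes")
plt.axis([0, 100, 0, 100])
plt.show()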

More generally, from a text we can build a word cloud by applying natural language processing (NLP) techniques.
n-gram Models

N-grams in NLP refer to contiguous sequences of n words extracted from text for language
processing and analysis. An n-gram can be as short as a single word (unigram) or span multiple
words (bigram, trigram, etc.). These n-grams capture the contextual information and
relationships between words in a given text.
We can use n-grams to build language models that predict which word comes next given a history
of words.
An N-gram language model predicts the probability of a given N-gram within any sequence of
words in the language. A good N-gram model can predict the next word in a sentence, i.e., the
value of p(w | h), the probability of a word w given a history h.
For example, the sentence "This article is on NLP" yields the unigrams ("This", "article", "is",
"on", "NLP") and the bigrams ("This article", "article is", "is on", "on NLP").

An N-gram model is a statistical language model used in natural language processing (NLP) and
computational linguistics. It predicts the likelihood of a word (or sequence of words) based on the
preceding N-1 words.

Here’s a breakdown of the key concepts:

1. N-gram: An N-gram is a sequence of N words. For example, in the sentence "I love natural
language processing," some examples of N-grams are:
o 1-gram (unigram): "I", "love", "natural", "language", "processing"
o 2-gram (bigram): "I love", "love natural", "natural language", "language
processing"
o 3-gram (trigram): "I love natural", "love natural language", "natural language
processing"
2. N-gram Model: An N-gram model predicts the probability of a word given the N-1
preceding words. It's based on the Markov assumption that the probability of a word only
depends on the previous N-1 words (not the entire history of preceding words).
3. Application: N-gram models are used in various NLP tasks:
o Language Modeling: Predicting the next word in a sequence.
o Speech Recognition: Matching acoustic signals to sequences of words.
o Machine Translation: Predicting the next word in a translated sentence.
o Spell Checking and Correction: Identifying errors by looking at the probabilities
of word sequences.
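As a concrete illustration, here is a minimal bigram-model sketch in Python. The tiny corpus is made up purely for illustration, and the probabilities are simple maximum-likelihood counts:

import random
from collections import defaultdict

# A tiny, made-up corpus
corpus = "I love natural language processing and I love language models".split()

# transitions[w] lists every word observed immediately after w
transitions = defaultdict(list)
for prev, curr in zip(corpus, corpus[1:]):
    transitions[prev].append(curr)

def bigram_probability(prev, word):
    # maximum-likelihood estimate of p(word | prev) = count(prev, word) / count(prev)
    followers = transitions[prev]
    return followers.count(word) / len(followers) if followers else 0.0

def predict_next(word):
    # sample a plausible next word from the observed continuations of `word`
    followers = transitions.get(word)
    return random.choice(followers) if followers else None

print(bigram_probability("love", "language"))   # 0.5 in this toy corpus
print(predict_next("love"))                     # 'natural' or 'language'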

Grammar

A different approach to modeling language is with grammars, rules for generating acceptable
sentences.
Grammar is defined as the rules for forming well-structured sentences.

Grammar in NLP is a set of rules for constructing sentences in a language used to understand and
analyze the structure of sentences in text data.
This includes identifying parts of speech such as nouns, verbs, and adjectives, determining the
subject and predicate of a sentence, and identifying the relationships between words and phrases.

● Mathematically, a grammar G can be written as a 4-tuple (N, T, S, P) where:


o N or VN = set of non-terminal symbols or variables.
o T or ∑ = set of terminal symbols.
o S = Start symbol where S ∈ N
o P = Production rules for Terminals as well as Non-terminals.

Let's construct a grammar that can generate simple English sentences:

1. Grammar Rules:

We'll define a set of production rules in the form of A -> B, where A is a non-terminal
symbol (category) and B is a sequence of symbols (terminals or non-terminals).

o S -> NP VP: A sentence (S) consists of a noun phrase (NP) followed by a verb
phrase (VP).
o NP -> Det N: A noun phrase (NP) consists of a determiner (Det) followed by a noun
(N).
o NP -> ProperNoun: A noun phrase (NP) can also be a proper noun (ProperNoun).
o VP -> V NP: A verb phrase (VP) consists of a verb (V) followed by a noun phrase
(NP).
o Det -> "the" | "a": Determiners (Det) can be "the" or "a".
o N -> "dog" | "cat" | "ball": Nouns (N) can be "dog", "cat", or "ball".
o ProperNoun -> "John" | "Mary" | "Alice": Proper nouns (ProperNoun) can be
"John", "Mary", or "Alice".
o V -> "chased" | "ate" | "threw": Verbs (V) can be "chased", "ate", or "threw".
2. Generating Sentences:

Using the above grammar rules, we can generate valid English sentences:

o Example 1:
▪ Start with S.
▪ Apply S -> NP VP.
▪ Apply NP -> ProperNoun (e.g., "John").
▪ Apply VP -> V NP (e.g., "chased" NP).
▪ Apply NP -> Det N (e.g., "the" N).
▪ Apply N -> "dog" (e.g., "the dog").
▪ Constructed sentence: "John chased the dog."
o Example 2:
▪ Start with S.
▪ Apply S -> NP VP.
▪ Apply NP -> Det N (e.g., "a" N).
▪ Apply N -> "cat" (e.g., "a cat").
▪ Apply VP -> V NP (e.g., "ate" NP).
▪ Apply NP -> ProperNoun (e.g., "Alice").
▪ Constructed sentence: "A cat ate Alice."

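The generation process above can be sketched in a few lines of Python. The grammar below is the toy grammar defined earlier, written as a dictionary of production rules; this is only an illustrative sketch, not the only way to encode a grammar:

import random

# The toy grammar from above: non-terminals map to lists of possible expansions.
# A symbol that never appears as a key is a terminal word.
grammar = {
    "S":          [["NP", "VP"]],
    "NP":         [["Det", "N"], ["ProperNoun"]],
    "VP":         [["V", "NP"]],
    "Det":        [["the"], ["a"]],
    "N":          [["dog"], ["cat"], ["ball"]],
    "ProperNoun": [["John"], ["Mary"], ["Alice"]],
    "V":          [["chased"], ["ate"], ["threw"]],
}

def expand(symbol):
    # recursively expand a symbol into a list of terminal words
    if symbol not in grammar:                  # terminal word: nothing left to expand
        return [symbol]
    production = random.choice(grammar[symbol])
    words = []
    for sym in production:
        words.extend(expand(sym))
    return words

def generate_sentence():
    return " ".join(expand("S"))

print(generate_sentence())   # e.g. "Mary threw a ball" or "the cat chased John"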

Topic Modeling

Topic modeling in NLP refers to the process of automatically identifying topics or themes present
in a collection of text documents. It's a statistical technique used to uncover the hidden semantic
structures in a corpus and is widely used for tasks like document clustering, information retrieval,
and summarization. One of the most popular algorithms for topic modeling is Latent Dirichlet
Allocation (LDA). Here's an overview of how topic modeling works and its applications:
Latent Dirichlet Allocation (LDA)
1. Conceptual Basis:
o LDA assumes that each document in a corpus (a large and structured set of texts (written
or spoken)) is a mixture of topics, and each topic is a mixture of words. It posits a generative
process where:
▪ Each document is generated by sampling a distribution of topics.
▪ Each word in the document is generated by sampling a topic from the document's
topic distribution and then sampling a word from the topic's word distribution.
2. Key Components:
o Topics: Latent topics are distributions over words. Each topic can be interpreted as a set
of words that co-occur frequently within the same context.
o Document-Topic Distribution: Each document is represented as a distribution over
topics, indicating the proportion of each topic present in the document.
o Word-Topic Distribution: Each topic is represented as a distribution over words,
indicating the likelihood of each word appearing under that topic.
3. Steps in LDA:
o Initialization: Start with random assignment of words to topics.
o Iteration: Iteratively refine the assignment of words to topics to maximize the likelihood
of the observed data under the model.
o Inference: Estimate the posterior distribution of topics given the words in the documents
using techniques like variational inference or Gibbs sampling.
4. Applications:
o Document Clustering: Group similar documents together based on their topic
distributions.
o Information Retrieval: Identify relevant documents based on their topic distributions
rather than just keywords.
o Topic Summarization: Automatically generate summaries of documents based on the
most representative topics.
o Content Recommendation: Recommend related articles or documents based on their
topic similarity.
o Exploratory Analysis: Gain insights into large collections of text data by identifying
prevalent themes or trends.
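Below is a minimal sketch of topic modeling with LDA using scikit-learn (assuming a recent version of scikit-learn is installed); the four-document corpus is made up purely for illustration, and the number of topics is chosen arbitrarily:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# A tiny, made-up corpus: two documents about pets, two about the stock market
documents = [
    "the cat sat on the mat and the cat purred",
    "dogs and cats make friendly pets",
    "stock prices rose as the market rallied",
    "investors bought shares in the stock market",
]

# Bag-of-words counts
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

# Fit LDA with 2 latent topics
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)      # document-topic distribution, one row per document

# Inspect the word-topic distribution: top words for each topic
words = vectorizer.get_feature_names_out()
for topic_id, weights in enumerate(lda.components_):
    top_words = [words[i] for i in weights.argsort()[-4:][::-1]]
    print(f"Topic {topic_id}: {top_words}")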

Gibbs sampling

Gibbs sampling is a technique for generating samples from multidimensional distributions when we only
know some of the conditional distributions.

Working of Gibbs sampling

Start with any (valid) values for x and y, then repeatedly alternate: replace x with a random value picked
conditional on y, and replace y with a random value picked conditional on x. After a number of iterations,
the resulting values of x and y will represent a sample from the (unconditional) joint distribution.
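A classic toy illustration: let x be the value of one die and y the sum of two dice. Only the two conditional distributions are needed, which is all Gibbs sampling requires. The sketch below follows that example:

import random

def roll_a_die():
    return random.choice([1, 2, 3, 4, 5, 6])

# Joint distribution: x = value of the first die, y = sum of both dice.

def random_y_given_x(x):
    # given the first die shows x, y is x plus a fresh second roll
    return x + roll_a_die()

def random_x_given_y(y):
    # given the total y, the first die is equally likely to be any value consistent with y
    if y <= 7:
        return random.randrange(1, y)        # x can be 1 .. y - 1
    else:
        return random.randrange(y - 6, 7)    # x can be y - 6 .. 6

def gibbs_sample(num_iters=100):
    x, y = 1, 2                              # any valid starting point
    for _ in range(num_iters):
        x = random_x_given_y(y)
        y = random_y_given_x(x)
    return x, y

print(gibbs_sample())   # approximately one draw from the joint distribution of (x, y)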

Gibbs Sampling is used to infer the topic distribution of words in documents and the word distribution of
topics. In LDA, for instance, each word in a document is assigned to a topic based on the current
assignments of all other words in the document. Gibbs Sampling iteratively updates these assignments to
approximate the posterior distribution over topics.

Network analysis

Network analysis is a field of study that focuses on analyzing and understanding complex systems
represented as networks or graphs. These networks consist of nodes (vertices) and edges
(connections between nodes), which can represent a wide range of entities and relationships
depending on the context of study.
Example

Social Networks: Nodes represent individuals or organizations, and edges represent relationships
(friendships, collaborations, etc.).

Information Networks: Nodes represent sources of information, and edges represent
communication or citation links.

Betweenness centrality

Betweenness centrality is a measure (metric) used in network analysis to identify the importance
of a node within a network. It quantifies the number of times a node acts as a bridge along the
shortest path between two other nodes in the network.

Here’s a breakdown of what betweenness centrality entails:

1. Definition: Betweenness centrality of a node v is calculated based on the number of
shortest paths between pairs of nodes that pass through v.
2. Importance: Nodes with high betweenness centrality have significant influence over
the transfer of information or resources within a network. They often control the flow
of information between other nodes, acting as connectors or bridges.
3. Calculation: The betweenness centrality CB(v) of a node v is computed as the
fraction of shortest paths between all pairs of nodes that pass through v:

CB(v) = Σ_{s ≠ v ≠ t} σst(v) / σst

where:

o σst is the total number of shortest paths from node s to node t,
o σst(v) is the number of those paths that pass through v.
4. Application: Betweenness centrality is used in various fields such as social network
analysis, transportation network analysis, and infrastructure analysis to identify
critical nodes that, if removed, could disrupt the network.
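As a quick illustrative sketch (assuming the networkx library is available), betweenness centrality can be computed on a small toy graph like this:

import networkx as nx

# A small toy graph in which nodes "C" and "D" bridge two triangles
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"),
                  ("C", "D"),
                  ("D", "E"), ("D", "F"), ("E", "F")])

betweenness = nx.betweenness_centrality(G)   # normalized scores by default
print(betweenness)   # "C" and "D" score highest: most shortest paths pass through them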

Closeness centrality

Closeness centrality is a measure (metric) used in network analysis to determine how central a node
is to the network by calculating the average shortest path distance from the node to all other nodes
in the network. In essence, it quantifies how quickly a node can interact with other nodes in the
network.

Here are the key aspects of closeness centrality:


1. Definition: Closeness centrality of a node v is defined as the reciprocal of the average
shortest-path distance from v to all other nodes in the network. It measures how close a
node is, on average, to every other node in the network.

2. Calculation: For a network with N nodes, the closeness centrality of v is

C(v) = (N - 1) / Σ_{u ≠ v} d(v, u)

where d(v, u) is the shortest-path distance between v and u.

3. Interpretation: Nodes with high closeness centrality are those that can reach other nodes
quickly. They are effective at spreading information or influence efficiently across the network
because their average distances to other nodes are short.

Eigenvector centrality

Eigenvector centrality is another measure (metric) used in network analysis to assess the
importance of a node within a network. It evaluates a node's centrality based not only on its direct
connections, but also on the centrality of its neighboring nodes. In essence, it assigns higher
centrality scores to nodes that are connected to other nodes that are themselves central in the
network.

Here are the key points about eigenvector centrality:

1. Definition: Eigenvector centrality of a node v is a measure that assigns a score to the node
based on the principle that connections to high-scoring nodes contribute more to the node's
score than connections to low-scoring nodes.
2. Calculation: The score x_v of node v is proportional to the sum of the scores of its neighbors,
x_v = (1/λ) Σ_{u adjacent to v} x_u; in matrix form the score vector x satisfies A x = λ x, i.e., it is an
eigenvector of the adjacency matrix A (the one associated with the largest eigenvalue λ).
3. Interpretation: Nodes with higher eigenvector centrality scores are those that are not only
well connected but are also connected to other nodes that themselves have high centrality
scores. Eigenvector centrality therefore captures a notion of influence that propagates through
the network.
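Continuing the same toy graph (and again assuming networkx is available), closeness and eigenvector centrality can be computed in the same way; this is only a sketch:

import networkx as nx

G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"),
                  ("C", "D"),
                  ("D", "E"), ("D", "F"), ("E", "F")])

closeness = nx.closeness_centrality(G)       # reciprocal of the average shortest-path distance
eigenvector = nx.eigenvector_centrality(G)   # scores that propagate from well-connected neighbors

print(closeness)     # "C" and "D" are closest, on average, to everyone else
print(eigenvector)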

Directed Graphs and PageRank

In network analysis, directed graphs and PageRank play significant roles in understanding and
evaluating the structure and importance of nodes within a network. Here’s how they are utilized:

Directed Graphs:

● Definition: A directed graph consists of nodes (vertices) connected by directed edges
(arcs), where each edge has a direction indicating the flow or relationship between nodes.
● Representation: Directed graphs are used to model relationships where the direction of
interaction matters, such as in social networks (followers and followees), citation networks
(citing and cited papers), and traffic flow networks (roads and intersections).
● Analysis: Directed graphs allow for the analysis of connectivity patterns, influence
propagation, and flow dynamics within a network. They enable the identification of nodes
with high in-degree (receiving many connections) and out-degree (sending many
connections), which can reveal nodes’ roles and importance within the network structure.

PageRank:

● Algorithm: PageRank is an algorithm developed by Google's founders to measure the
importance of nodes in a directed graph, originally applied to web pages and their
hyperlinks.
● Concept: PageRank assigns a numerical score to each node based on the quantity and
quality of connections to it. Nodes with higher incoming links (backlinks) from important
nodes are considered more influential.
● Calculation: PageRank iteratively computes the score for each node by propagating
importance through the network, where nodes with higher scores contribute more to the
scores of nodes they link to.
● Applications: In network analysis, PageRank is used to identify central nodes that act as
hubs or authorities within a network, indicating their influence in information
dissemination, communication flow, or resource distribution.
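A short sketch of PageRank on a toy directed graph, again assuming networkx is available; the damping factor alpha below is the conventional 0.85:

import networkx as nx

# A small directed graph: think of nodes as pages and edges as hyperlinks
D = nx.DiGraph()
D.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"),
                  ("C", "A"), ("D", "C")])

ranks = nx.pagerank(D, alpha=0.85)   # alpha is the damping factor
print(ranks)   # "C" collects the most incoming links, so it receives the highest score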

Recommender Systems

One common data problem is producing recommendations of some sort. Netflix recommends movies
you might want to watch. Amazon recommends products you might want to buy. Twitter recommends users
you might want to follow.

There are several ways to use data to make recommendations.

Manual Curation: Manual curation in recommender systems refers to the process of human intervention
in selecting, filtering, or modifying recommendations that the system generates.

Example: Before the Internet, when we needed book recommendations we would go to the library, where
a librarian was available to suggest books that were relevant to our interests.

But this method doesn't scale particularly well, and it's limited by an individual's knowledge and
imagination.

Recommending What's Popular: One easy approach is simply to recommend whatever is popular.

Most recommendation systems use collaborative filtering to find similar patterns or information
about the users. The two types of Collaborative Filtering are user-based and item-based.

User-Based Collaborative Filtering

User-Based Collaborative Filtering is a technique used to predict the items that a user might like
on the basis of ratings given to those items by other users who have similar tastes to those of the
target user.

Step 1: Measure similarity between users using cosine similarity

Given two vectors, v and w, it's defined as:

import math

def dot(v, w):
    # sum of componentwise products
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def cosine_similarity(v, w):
    return dot(v, w) / math.sqrt(dot(v, v) * dot(w, w))
● It measures the “angle” between v and w.
● If v and w point in the same direction, then the numerator and denominator are equal, and
their cosine similarity equals 1.
● If v and w point in opposite directions, then their cosine similarity equals -1.
● And if v is 0 whenever w is not (and vice versa) then dot(v, w) is 0 and so the cosine
similarity will be 0.
● We’ll apply this to vectors of 0s and 1s, each vector v representing one user’s interests.
● v[i] will be 1 if the user specified the ith interest, and 0 otherwise.
● Accordingly, “similar users” will mean “users whose interest vectors most nearly point in
the same direction.”
● Users with identical interests will have similarity 1. Users with no identical interests will
have similarity 0.
● Otherwise the similarity will fall in between, with numbers closer to 1 indicating “very
similar” and numbers closer to 0 indicating “not very similar.”

Step 2: Find unique interests

A good place to start is collecting the known interests and (implicitly) assigning indices to them:
use a set comprehension to find the unique interests, put them in a list, and sort it.
The first interest in the resulting list will be interest 0, and so on:

unique_interests = sorted({interest
                           for user_interests in users_interests
                           for interest in user_interests})
Step 3: Create a user interest vector
Produce an "interest" vector of 0s and 1s for each user: iterate over the unique_interests list,
substituting a 1 if the user has that interest and a 0 if not (see the sketch after Step 4).

Step 4: Create a matrix of user interests

user_interest_matrix[i][j] equals 1 if user i specified interest j, 0 otherwise.
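A minimal sketch of Steps 3 and 4, assuming users_interests is a list of interest lists (one per user) and unique_interests is the sorted list built in Step 2:

def make_user_interest_vector(user_interests):
    # 0/1 vector whose i-th entry is 1 iff the user has unique_interests[i]
    return [1 if interest in user_interests else 0
            for interest in unique_interests]

# One row per user, one column per interest
user_interest_matrix = [make_user_interest_vector(user_interests)
                        for user_interests in users_interests]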

Step 5: Compute the pairwise similarities between all of our users

user_similarities[i][j] gives us the similarity between users i and j.
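A sketch of this step, reusing the cosine_similarity function defined in Step 1:

# user_similarities[i][j] is the cosine similarity between the interest vectors of users i and j
user_similarities = [[cosine_similarity(interest_vector_i, interest_vector_j)
                      for interest_vector_j in user_interest_matrix]
                     for interest_vector_i in user_interest_matrix]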


Step 6: Find the most similar users to a given user
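One possible sketch of such a helper, built on the user_similarities matrix from the previous step:

def most_similar_users_to(user_id):
    # other users with nonzero similarity to user_id, most similar first
    pairs = [(other_user_id, similarity)
             for other_user_id, similarity in enumerate(user_similarities[user_id])
             if user_id != other_user_id and similarity > 0]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)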

Step 7: Generate suggestions


For a given user (user_id), use the most similar users found in the previous step to suggest
new interests.
Aggregate interests from these similar users, weighting each interest by the similarity score of
the user who has that interest.

Step 8: Implement the suggestions

Implement a user_based_suggestions function that, for each candidate interest, sums up the
similarities of the other users interested in it, then sorts the interests by their total weighted
similarity scores.

The output of user_based_suggestions(user_id) is a list of suggested interests sorted by
their weighted similarity scores, as sketched below.
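A minimal sketch of this function, assuming the helpers defined above (most_similar_users_to, users_interests):

from collections import defaultdict

def user_based_suggestions(user_id, include_current_interests=False):
    # add up each interest's weight over all similar users who have it
    suggestions = defaultdict(float)
    for other_user_id, similarity in most_similar_users_to(user_id):
        for interest in users_interests[other_user_id]:
            suggestions[interest] += similarity

    # sort by total weighted similarity, highest first
    suggestions = sorted(suggestions.items(), key=lambda pair: pair[1], reverse=True)

    if include_current_interests:
        return suggestions
    return [(suggestion, weight)
            for suggestion, weight in suggestions
            if suggestion not in users_interests[user_id]]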

Limitations:
As noted, this approach may face challenges with very large datasets or high-dimensional
interest spaces:

● Curse of Dimensionality: In large datasets with many interests, finding truly similar users
becomes harder.
● Sparse Data: If users have only a few interests, similarity calculations may not accurately reflect
user preferences.
● Dynamic Interests: User interests may change over time, requiring constant updates to similarity
calculations.

Item-Based Collaborative Filtering

The alternative approach described here focuses on computing similarities between interests
directly, rather than between users. This method allows for generating recommendations by
aggregating interests that are similar to a user's current interests. Here's how it works, step by step:
1. Transposing User-Interest Matrix
First, transpose the user_interest_matrix so that rows correspond to interests and columns
correspond to users. This transformation allows us to compute similarities between interests.
interest_user_matrix = [[user_interest_vector[j]
                         for user_interest_vector in user_interest_matrix]
                        for j, _ in enumerate(unique_interests)]

Here, interest_user_matrix[j] will have a 1 for each user who has the interest
unique_interests[j], and a 0 otherwise.

2. Computing Interest Similarities


Use cosine similarity to compute pairwise similarities between interests based on
interest_user_matrix. This creates an interest_similarities matrix where
interest_similarities[i][j] represents the cosine similarity between interest i and interest
j.
interest_similarities = [[cosine_similarity(user_vector_i, user_vector_j)
                          for user_vector_j in interest_user_matrix]
                         for user_vector_i in interest_user_matrix]

3. Finding Most Similar Interests


Implement a function (most_similar_interests_to) to find interests most similar to a given
interest (interest_id). It filters out the interest itself and interests with zero similarity, then
sorts them by similarity in descending order.
def most_similar_interests_to(interest_id):
    similarities = interest_similarities[interest_id]
    pairs = [(unique_interests[other_interest_id], similarity)
             for other_interest_id, similarity in enumerate(similarities)
             if interest_id != other_interest_id and similarity > 0]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
4. Generating Suggestions for Users
For a given user (user_id), aggregate the similarities of interests similar to their current interests
to generate recommendations.
from collections import defaultdict

def item_based_suggestions(user_id, include_current_interests=False):
    # for each interest the user has, add in the similarities of the similar interests
    suggestions = defaultdict(float)
    user_interest_vector = user_interest_matrix[user_id]

    for interest_id, is_interested in enumerate(user_interest_vector):
        if is_interested == 1:
            similar_interests = most_similar_interests_to(interest_id)
            for interest, similarity in similar_interests:
                suggestions[interest] += similarity

    # sort by total weighted similarity, highest first
    suggestions = sorted(suggestions.items(),
                         key=lambda pair: pair[1],
                         reverse=True)

    if include_current_interests:
        return suggestions
    else:
        return [(suggestion, weight)
                for suggestion, weight in suggestions
                if suggestion not in users_interests[user_id]]

Matrix Factorization

Matrix Factorization is a powerful technique used in recommendation systems to decompose a large user-
item interaction matrix into lower-dimensional matrices that represent latent factors. This approach aims to
uncover hidden patterns or latent features that explain the observed interactions between users and items
(or interests, in this case). Singular Value Decomposition (SVD) is a classical method for matrix
factorization.

Basic Concept:
1. User-Item Matrix:
o Suppose you have a matrix R where rows correspond to users and columns correspond to
items (or interests). Each entry R[i][j] represents the interaction (e.g., rating, interest
indication) of user i with item j.
2. Matrix Decomposition:
o Matrix Factorization decomposes the matrix R into two lower-dimensional matrices:
▪ User matrix U: Represents users in terms of latent factors.
▪ Item matrix V: Represents items (interests) in terms of the same latent factors.
3. Learning Latent Factors:
o The goal is to learn the matrices U and V such that their product approximates R well.
This is typically done by minimizing a loss function that quantifies the difference
between the predicted ratings (or interest indications) and the actual ratings in R.
4. Recommendations:
o Once U and V are learned, recommendations can be made by:
▪ Predicting the missing entries in R.
▪ Recommending items (interests) with the highest predicted values for a given
user.
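Below is a minimal sketch of the idea using a truncated SVD from NumPy on a tiny, made-up ratings matrix. Note that plain SVD treats missing ratings as zeros, which is a simplification; production systems usually minimize the loss only over observed entries (for example, with gradient descent or alternating least squares):

import numpy as np

# Hypothetical user-item rating matrix R: rows = users, columns = items (0 = unrated)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)

k = 2                                        # number of latent factors to keep

# Truncated SVD: R is approximately U_k * diag(s_k) * Vt_k
U, s, Vt = np.linalg.svd(R, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Split the singular values between the two factor matrices
user_factors = U_k * np.sqrt(s_k)                   # shape (num_users, k)
item_factors = (np.sqrt(s_k)[:, None] * Vt_k).T     # shape (num_items, k)

# Reconstructed matrix: previously zero entries now hold predicted scores
R_hat = user_factors @ item_factors.T
print(np.round(R_hat, 2))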
