AP For NLP-LO1
CONTENTS:
• NLP
• NLTK
• NLP Pre-processing
WHAT IS NATURAL LANGUAGE
PROCESSING (NLP)?
• Natural Language Processing is an interdisciplinary field of Artificial
Intelligence.
• It is a set of techniques used to teach computers to understand and
interpret human languages, much as we do.
• It is the art of extracting information and hidden insights from
unstructured text.
• It is a sophisticated field that enables computers to process text data at
large scale.
• The ultimate goal of NLP is to make computers
and computer-controlled bots understand and interpret Human
Languages, just as we do.
COMPONENTS OF
NLP
• Natural Language
Understanding
• Natural Language
Generation
Figure: Components
of NLP
NATURAL LANGUAGE UNDERSTANDING:-
• NLU helps the machine understand and analyze human language by extracting
information such as keywords, emotions, relations, and semantics from large
volumes of text.
Let’s see what challenges a machine faces:
He is looking for a match.
• What do you understand by the ‘match’ keyword?
• This is Lexical Ambiguity. It happens when a word has different meanings. Lexical
ambiguity can be resolved by using part-of-speech (POS) tagging techniques.
The Fish is ready to eat.
• What do you understand by the above example?
• This is Syntactic Ambiguity, also called Grammatical Ambiguity: a
sequence of words admits more than one meaning.
NATURAL LANGUAGE
GENERATION:-
• Tokenization
• POS- Tagging
• NER
• Lemmatization
• Sentence Boundary Detection
• Text Classification
WHAT IS GENSIM?
• Topic modeling
• Word2Vec
• Document Similarity
• Stop words are repetitive words that don’t hold any information, and they are
filtered out of the text. For example, words like {that, these, below, is, are}
don’t provide any information, so they need to be removed from the text. Stop
words are considered noise. NLTK provides a huge list of stop words.
• Very common words like 'in', 'is', and 'an' are often used as stop words since they
don’t add a lot of meaning to a text in and of themselves.
STEMMING VS. LEMMATIZATION
Stemming:
• Stemming is a process that stems or removes the last few characters from a
word, often leading to incorrect meanings and spellings.
• For instance, stemming the word ‘Caring’ would return ‘Car’.
• Stemming is used in case of large datasets where performance is an issue.
Lemmatization:
• Lemmatization considers the context and converts the word to its meaningful
base form, which is called the Lemma.
• For instance, lemmatizing the word ‘Caring’ would return ‘Care’.
• Lemmatization is computationally expensive since it involves look-up tables.
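The contrast can be sketched in plain Python. The naive suffix-stripper and the tiny lemma lookup table below are illustrative assumptions, not NLTK's actual implementations (real work would use NLTK's `PorterStemmer` and `WordNetLemmatizer`):

```python
# Toy contrast between stemming and lemmatization.
# These simplified versions only demonstrate the behaviour described above;
# use nltk.stem.PorterStemmer / WordNetLemmatizer in practice.

def naive_stem(word):
    """Chop common suffixes off the end of a word (no dictionary check)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.lower().endswith(suffix):
            return word.lower()[: -len(suffix)]
    return word.lower()

# A lemmatizer relies on look-up tables (hence its higher computational cost).
LEMMA_TABLE = {"caring": "care", "better": "good", "feet": "foot"}

def naive_lemmatize(word):
    """Return the word's meaningful base form (Lemma) from a lookup table."""
    return LEMMA_TABLE.get(word.lower(), word.lower())

print(naive_stem("Caring"))       # -> 'car'  (chopped; meaning lost)
print(naive_lemmatize("Caring"))  # -> 'care' (meaningful base form)
```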
CHUNKING
• Semantic Similarity: Words with similar meanings are located close to each
other in the vector space. For example, the words "king" and "queen" might be
close to each other.
• Word2Vec
• FastText
• Transformers
• Python Implementation
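As a minimal sketch of how "closeness" in a vector space is usually measured: cosine similarity between embedding vectors. The 3-dimensional vectors below are made up purely for illustration (real Word2Vec embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy, made-up embeddings purely for illustration.
king  = [0.9, 0.8, 0.1]
queen = [0.8, 0.9, 0.1]
apple = [0.1, 0.2, 0.9]

print(cosine_similarity(king, queen))  # high: similar meanings, close in space
print(cosine_similarity(king, apple))  # low: unrelated meanings, far apart
```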
THANKS!
Any questions?
k-Nearest Neighbors
• To make a prediction for a new data point, the algorithm finds the closest
data points in the training dataset: its “nearest neighbors.”
k- Neighbors Classification
• The prediction is then simply the known output for this training point
k- Neighbors Classification
• Here, three new data points have been added, marked as Star.
• For each of them, I marked the closest point in the training set. The
prediction of the one-nearest-neighbor algorithm is the label of
that point.
• We then assign the class that is more frequent: in other words, the
majority class among the k-nearest neighbors
k- Neighbors Classification
• import mglearn
• mglearn.plots.plot_knn_classification(n_neighbors=3)
k- Neighbors Classification
• You can see that the prediction for the new data point at the top
left is not the same as the prediction when we used only one
neighbor.
• Step 1 − For implementing any algorithm, we need dataset. So during the first step
of KNN, we must load the training as well as test data.
• Step 2 − Next, we need to choose the value of K i.e. the nearest data points. K can
be any integer.
• Step 3 − For each point in the test data:
• 3.1 − Calculate the distance between the test point and each row of the
training data (e.g. using Euclidean distance).
• 3.2 − Now, based on the distance values, sort them in ascending order.
• 3.3 − Next, it will choose the top K rows from the sorted array.
• 3.4 − Now, it will assign a class to the test point based on the most
frequent class of these rows.
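The steps above can be sketched from scratch in Python. The tiny 2-D dataset below is made up for illustration:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, test_point, k=3):
    """Predict the class of test_point by majority vote of its k nearest
    training points (Steps 3.1-3.4 above)."""
    # 3.1: Euclidean distance from the test point to every training row.
    distances = [
        (math.dist(row, test_point), label)
        for row, label in zip(train_X, train_y)
    ]
    # 3.2: sort by distance, ascending.
    distances.sort(key=lambda pair: pair[0])
    # 3.3: choose the top k rows.
    top_k = [label for _, label in distances[:k]]
    # 3.4: assign the most frequent class among these rows.
    return Counter(top_k).most_common(1)[0][0]

# Made-up 2-D training data.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["Pass", "Pass", "Pass", "Fail", "Fail", "Fail"]

print(knn_predict(train_X, train_y, (2, 2)))  # -> 'Pass'
print(knn_predict(train_X, train_y, (8, 7)))  # -> 'Fail'
```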
KNN- Example
Task: Classify the given instance according to the classes of the
training data using the KNN algorithm. The value of K = 3.
• Euclidean distance
KNN- Example
• Next, it will choose the top K rows from the sorted array.
• Now, it will assign a class to the test point based on the most frequent class of these
rows.
• D1 = Pass
• D6 = Fail
• D4 = Fail
• Fail occurs twice among the 3 nearest neighbors and Pass once, so the test
instance is classified as Fail.
• When it gets the training data, KNN does not learn a model; it just
stores the given data (it is a “lazy learner”).
KNN- DISADVANTAGES
• Since we need to store the whole training set for every test set, it
requires a lot of space.
• Feature Scaling
• from sklearn.preprocessing import StandardScaler
• scaler = StandardScaler()
• X_train_scaled = scaler.fit_transform(X_train)
• X_test_scaled = scaler.transform(X_test)
KNN- Implementation Steps
• Make Prediction
• y_pred = knn.predict(X_test_scaled)
KNN- Implementation Steps
• Model Accuracy
• accuracy = accuracy_score(y_test, y_pred)
• print("Model Accuracy:", accuracy)
KNN- Implementation Steps
• Confusion matrix
• conf_matrix = confusion_matrix(y_test, y_pred)
• print("Confusion Matrix:")
• print(conf_matrix)
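Putting the fragments above together, a complete run might look like the following sketch. Since the slides do not name a dataset, scikit-learn's built-in Iris dataset is used as a stand-in, and `n_neighbors=3`, `test_size=0.3`, and `random_state=42` are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load a stand-in dataset and split it into train and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Feature scaling (fit the scaler on training data only).
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit the classifier and make predictions.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)

# Model accuracy.
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)

# Confusion matrix.
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)
```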