
1. Describe the following NLP libraries:


i. NLTK (Natural Language Toolkit):
Description: NLTK is a platform for building Python programs that work
with human language data. It contains text processing libraries for
tokenization, parsing, classification, stemming, tagging, and semantic
reasoning.
Use Cases: NLTK is widely used for prototyping and building research
systems. It is great for learning and experimenting with NLP concepts, but
may not be the best choice for production environments due to its slower
performance compared to libraries like SpaCy.

ii. SpaCy:
Description: SpaCy is an industrial-strength NLP library that is designed for
production use. It offers fast and efficient processing of text, with a focus
on providing practical tools for tasks like tokenization, parsing, named
entity recognition, and more.
Use Cases: SpaCy is well-suited for real-world applications that require fast
and accurate NLP processing. It's commonly used in building applications
for information extraction, natural language understanding, and machine
learning pipelines.

iii. Gensim:
Description: Gensim is a Python library specifically designed for topic
modeling and document similarity analysis. It is optimized for handling
large text collections, using data streaming and incremental online
algorithms, which makes it memory-efficient.
Use Cases: Gensim is widely used for tasks like topic modeling, document
similarity analysis, and information retrieval. It's particularly popular for its
implementations of algorithms like Latent Semantic Analysis (LSA), Latent
Dirichlet Allocation (LDA), and Word2Vec.

iv. Transformers:
Description: The Transformers library, developed by Hugging Face,
provides state-of-the-art general-purpose architectures for natural
language processing, including BERT, GPT, RoBERTa, and more. It offers
thousands of pre-trained models that can be easily used for a wide range
of NLP tasks.
Use Cases: Transformers is widely used for tasks that require deep
learning models, such as text classification, translation, summarization,
and question answering. It's a go-to library for leveraging pre-trained
models and fine-tuning them for specific NLP tasks.
2. Count the number of words in a given text:
i. how many of the words are made up of alphabetic characters.

ii. how many of the words are made up of digits.
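A minimal sketch for this exercise, assuming words are whitespace-separated; `str.isalpha()` and `str.isdigit()` classify each token (the sample sentence is made up):

```python
def count_word_types(text):
    """Count total words, alphabetic-only words, and digit-only words."""
    words = text.split()
    alpha = sum(1 for w in words if w.isalpha())    # only letters
    numeric = sum(1 for w in words if w.isdigit())  # only digits
    return len(words), alpha, numeric

total, alpha, numeric = count_word_types("I bought 2 books and 10 pens yesterday")
# total=8, alpha=6, numeric=2
```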

3 (i). Find the total count of unique words.

(ii). Find the total occurrences of each word.
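Both parts can be done with `collections.Counter`; a small sketch on a made-up sentence:

```python
from collections import Counter

text = "the cat sat on the mat the cat"
words = text.split()

counts = Counter(words)      # occurrences of each word
unique_count = len(counts)   # total count of unique words

print(unique_count)          # 5
print(counts["the"])         # 3
print(counts["cat"])         # 2
```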


4. Study the method of NLTK:
i. Concordance:
The ‘concordance’ method is used to find and display occurrences of a
word in a text along with some context. It shows the word in the middle of
a window of surrounding words, helping to understand how the word is
used in different contexts.

ii. Similar:
The ‘similar’ method is used to find words that appear in a similar context
as the specified word. It helps in discovering words that are used in similar
ways within the text.

iii. Common_contexts:

The ‘common_contexts’ method is used to find contexts where
two or more specified words appear together. It helps in understanding
how different words are related based on their shared contexts.

iv. Dispersion plot:


The ‘dispersion_plot’ method is used to create a graphical representation
of the distribution of words in a text. It shows the location of specified
words within the text, which can be useful for analyzing how certain words
are used throughout the text.
v. Generate:
The ‘generate’ method is used to generate random text based on the style
and vocabulary of the given text. It uses a simple algorithm to produce
text that mimics the original text's patterns.

vi. Download:
The ‘download’ function is not a method of a text object but a function in
the nltk module used to download additional resources, such as corpora,
tokenizers, and other data packages that are used by NLTK.
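These methods can be tried on any token list by wrapping it in `nltk.text.Text`, so no corpus download is needed. A small sketch, assuming NLTK is installed (the sample sentence is made up):

```python
from nltk.text import Text

tokens = ("the lazy dog sleeps and the lazy fox sleeps "
          "while the quick fox runs").split()
text = Text(tokens)

text.concordance("fox")                # each "fox" with surrounding context
text.similar("fox")                    # words used in similar contexts (here: dog)
text.common_contexts(["fox", "dog"])   # contexts shared by both words (lazy _ sleeps)
# text.dispersion_plot(["fox", "the"]) # needs matplotlib; draws a plot
hits = text.concordance_list("fox")    # concordance results as a list
```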

5. Write a function that takes a list of words (containing duplicates) and
returns a list of words (containing no duplicates) sorted by decreasing frequency.
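A sketch of this function using `Counter.most_common()`, which already orders by decreasing frequency:

```python
from collections import Counter

def unique_by_frequency(words):
    """Return the unique words, sorted by decreasing frequency."""
    counts = Counter(words)
    return [word for word, _ in counts.most_common()]

result = unique_by_frequency(["apple", "banana", "apple", "cherry", "banana", "apple"])
# ['apple', 'banana', 'cherry']
```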

6. Implementation of Bag of Words without using scikit-learn.
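A pure-Python sketch of Bag of Words: build a vocabulary from all documents, then represent each document as a vector of word counts (the two documents are made up):

```python
from collections import Counter

docs = ["the cat sat", "the dog sat on the mat"]

# Vocabulary across all documents, sorted for a stable column order
vocab = sorted({word for doc in docs for word in doc.split()})

def bag_of_words(doc, vocab):
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

vectors = [bag_of_words(doc, vocab) for doc in docs]
# vocab:      ['cat', 'dog', 'mat', 'on', 'sat', 'the']
# vectors[0]: [1, 0, 0, 0, 1, 1]
# vectors[1]: [0, 1, 1, 1, 1, 2]
```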


7. Implementation of Bag of Words with using scikit-learn.
8. Implementation of Bag of Words with preprocessing.

9. Implementation of Bag of Words without preprocessing.
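For question 8, a typical preprocessing pipeline lowercases, strips punctuation, and removes stopwords before counting (the tiny stopword list here is an illustrative assumption, not a standard one); question 9 is the same pipeline without the `preprocess` step:

```python
import string
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "on", "and"}  # tiny illustrative list

def preprocess(doc):
    doc = doc.lower()
    doc = doc.translate(str.maketrans("", "", string.punctuation))
    return [w for w in doc.split() if w not in STOPWORDS]

docs = ["The cat sat on the mat!", "A dog and a cat."]
tokenized = [preprocess(d) for d in docs]

vocab = sorted({w for tokens in tokenized for w in tokens})
vectors = [[Counter(tokens)[w] for w in vocab] for tokens in tokenized]
# vocab:   ['cat', 'dog', 'mat', 'sat']
# vectors: [[1, 0, 1, 1], [1, 1, 0, 0]]
```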

10. Implementation of TF-IDF.
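A from-scratch sketch of TF-IDF, using the common definitions tf(t, d) = count(t, d) / len(d) and idf(t) = log(N / df(t)); other variants (smoothed idf, raw counts) exist, so treat these formulas as one reasonable choice:

```python
import math

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
N = len(docs)

def tf(term, doc):
    """Term frequency: share of the document taken up by the term."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Inverse document frequency: rarer terms score higher."""
    df = sum(1 for doc in docs if term in doc)  # document frequency
    return math.log(N / df)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

print(round(tf_idf("cat", docs[0], docs), 4))  # 0.1352
print(tf_idf("the", docs[0], docs))            # 0.0: "the" is in every document
```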
