NLP Assignment-1
Department of
Artificial Intelligence and Data Science
Assignment No.- 1
Natural Language Processing
[I] NLTK
Provided Functionalities:
1. Tokenization
2. Part-of-Speech Tagging (PoS Tagging)
3. Named Entity Recognition (NER)
4. Sentiment Analysis
5. Lemmatization and Stemming
6. Frequency Distribution
7. Concordance
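A minimal sketch of three of the functionalities listed above
(tokenization, stemming, and frequency distribution), assuming NLTK is
installed. The sample sentence is made up for illustration, and
wordpunct_tokenize is chosen here only because, unlike word_tokenize, it
needs no separately downloaded data.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Illustrative sample text (an assumption, not from the assignment)
text = "The cats are running. The cat runs fast."

# Tokenization: regex-based splitting into words and punctuation
tokens = wordpunct_tokenize(text)

# Stemming: reduce each alphabetic token to its Porter stem
stemmer = PorterStemmer()
stems = [stemmer.stem(tok) for tok in tokens if tok.isalpha()]

# Frequency distribution: count how often each stem occurs
freq = FreqDist(stems)

print(tokens)
print(freq.most_common(3))
```

Stemming before counting groups inflected forms such as "cats" and "cat"
under one entry, which is usually what a frequency analysis wants.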
Platform Dependence:
NLTK is platform-independent and runs on Windows, macOS, and Linux.
Advantages:
1. Comprehensive: NLTK offers a comprehensive set of tools for natural
language processing.
2. Educational Resources: NLTK is one of the most widely used NLP
libraries, so many free and paid learning resources are available for it.
3. Customization: Users can customize and extend its functionality
according to their specific needs.
Disadvantages:
1. Speed: NLTK is slower than other NLP libraries, especially in
large-scale NLP applications.
2. Learning Curve: Beginners may find some parts of the library hard to
understand.
[II] spaCy
Provided Functionalities:
1. Tokenization
2. Part-of-Speech Tagging (PoS Tagging)
3. Named Entity Recognition (NER)
4. Lemmatization
5. Dependency Parsing
6. Word Embeddings
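7. Text Classification (e.g. sentiment analysis, using a trained or
third-party component; spaCy ships no ready-made sentiment analyzer)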
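A minimal sketch of spaCy tokenization, assuming spaCy is installed. It
uses spacy.blank("en"), which builds a tokenizer-only English pipeline and
so needs no model download; the sample sentence is made up for
illustration. PoS tagging, NER, lemmatization, and dependency parsing
additionally require a trained pipeline (for example en_core_web_sm)
loaded with spacy.load.

```python
import spacy

# Blank English pipeline: rule-based tokenizer only, no trained components
nlp = spacy.blank("en")

# Illustrative sample text (an assumption, not from the assignment)
doc = nlp("Apple is looking at buying a startup.")

# Each Token exposes its text and its position in the Doc
tokens = [token.text for token in doc]

for token in doc:
    print(token.i, token.text)
```

With a trained pipeline loaded instead, the same loop could also read
attributes such as token.pos_ and token.lemma_, and doc.ents would hold
the named entities.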
Platform Dependence:
spaCy is platform-independent and runs on Windows, macOS, and Linux.
Advantages:
1. High Performance: Known for its speed and efficiency, making it
suitable for large-scale text processing.
2. User-Friendly: spaCy has an easy-to-use API, making it accessible for
both beginners and experienced developers.
3. Pre-trained Models: Comes with pre-trained models for various
languages and domains, saving time and resources.
Disadvantages:
1. Limited Customization: Can be less flexible and customizable than
some other NLP libraries for certain advanced tasks.
2. Resource Intensive: Certain operations, such as loading large models,
can be resource-intensive.
[III] Gensim
Provided Functionalities:
1. Topic Modeling
2. Document Similarity
3. Word Embeddings
Platform Dependence:
Gensim is platform-independent and runs on Windows, macOS, and Linux.
Advantages:
1. Efficiency: Gensim is designed for efficiency and scalability, making it
suitable for processing large corpora and streaming data.
2. Memory Efficiency: Gensim can stream documents one at a time, so a
large corpus does not need to fit in memory all at once.
3. Topic Modeling Expertise: It excels in tasks related to topic modeling,
making it a preferred choice for researchers and practitioners in this
domain.
Disadvantages:
1. Limited Pre-processing Tools: Gensim has fewer pre-processing tools
than some other NLP libraries, so users might need to rely on
additional libraries for certain tasks.
2. Learning Curve: Gensim's API can have a steeper learning curve for
beginners than more user-friendly libraries like spaCy.
Installing NLP Libraries
1. NLTK