
NLP Assignment-1

The document compares various popular Python libraries for natural language processing (NLP) including NLTK, spaCy, and Gensim. It discusses their provided functionalities, platform dependence, supported NLP approaches and tasks, advantages, and disadvantages.

Uploaded by

amey bhirange

Bansilal Ramnath Agarwal Charitable Trust's

Vishwakarma Institute of Information Technology

Department of Artificial Intelligence and Data Science

Name: Jawale Ritesh Ulhas

Class: TY Division: C Roll No: 373020

Semester: VI Academic Year: 2023-24


Subject Name & Code: Natural Language Processing: ADUA32203
Title of Assignment: Comparative study of available libraries for Natural Language
Processing with respect to provided functionalities, platform dependence, supported NLP
approaches, supported NLP tasks, advantages, disadvantages, etc.
Date of Performance: 02/02/2024 Date of Submission: 02/02/2024

Assignment No. 1
Natural Language Processing

• NLP stands for Natural Language Processing.
• It is a branch of artificial intelligence that focuses on enabling computers
to understand, interpret, and generate human language.
• NLP involves the development of algorithms and models to analyze and
process text or speech data, allowing machines to perform tasks such as
language translation, sentiment analysis, summarization, and question
answering.
• The goal of NLP is to bridge the gap between human communication and
computer understanding, enabling more natural and effective interaction
between humans and machines.

NLP Libraries in Python

1. NLTK (Natural Language Toolkit)
2. TextBlob
3. CoreNLP (Stanford; Java-based, usable from Python through wrappers such
as Stanza)
4. Gensim
5. spaCy
6. Polyglot
7. scikit-learn
8. Pattern
9. Hugging Face Transformers

Some NLP Libraries in Depth

[I] Natural Language Toolkit (NLTK)

Provided Functionalities:
1. Tokenization
2. Part-of-Speech Tagging (PoS Tagging)
3. Named Entity Recognition (NER)
4. Sentiment Analysis
5. Lemmatization and Stemming
6. Frequency Distribution
7. Concordance
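A minimal sketch of a few of these functionalities (tokenization, stemming, frequency distribution and concordance), assuming NLTK is installed. It deliberately uses `wordpunct_tokenize` and `PorterStemmer`, which need no extra data. Functions such as `word_tokenize`, `pos_tag`, `ne_chunk` and `WordNetLemmatizer` first require downloading NLTK data packages with `nltk.download(...)`, and the exact package names vary between NLTK versions. The sample sentence is made up.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.text import Text
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample text
text = "Cats are running. The cat runs faster than the other cats."

# Tokenization: split into word and punctuation tokens (no data download needed)
tokens = wordpunct_tokenize(text)

# Stemming: strip suffixes with the Porter algorithm (alphabetic tokens only)
stemmer = PorterStemmer()
stems = [stemmer.stem(tok.lower()) for tok in tokens if tok.isalpha()]

# Frequency distribution over the stems
freq = FreqDist(stems)

print(tokens)
print(freq.most_common(3))

# Concordance: print each occurrence of "cat" with its surrounding context
Text(tokens).concordance("cat")
```

Stemming maps "running" and "runs" to the same stem "run", so the frequency distribution counts them together.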

Platform Dependence:
• NLTK is platform-independent and can run on Windows, macOS and Linux.

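Supported NLP Approaches:
1. Rule-based Approach: NLTK provides tools for rule-based processing
and parsing, such as regular-expression tokenizers and chunkers and
context-free grammars.
2. Statistical Approach: NLTK supports statistical models for tasks such
as part-of-speech tagging (e.g., n-gram and perceptron taggers) and text
classification (e.g., Naive Bayes).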
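The two approaches can be shown side by side in this sketch, assuming NLTK is installed. The chunk grammar and the tiny training sentences are hypothetical examples. The input is tagged by hand so that no pretrained tagger has to be downloaded.

```python
from nltk.chunk import RegexpParser
from nltk.tag import UnigramTagger

# Rule-based: a regular-expression chunk grammar that groups an optional
# determiner, any adjectives and a noun into a noun phrase (NP).
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tagged = [("the", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumped", "VBD")]
tree = chunker.parse(tagged)
print(tree)

# Statistical: a unigram tagger learns the most frequent tag for each word
# from (toy, hypothetical) tagged training sentences.
train = [
    [("the", "DT"), ("dog", "NN"), ("barked", "VBD")],
    [("a", "DT"), ("dog", "NN"), ("ran", "VBD")],
]
tagger = UnigramTagger(train)

# Words never seen in training get the tag None (no backoff tagger is set)
print(tagger.tag(["the", "dog", "ran", "home"]))
```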

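Supported NLP Tasks:
• NLTK supports a wide range of NLP tasks, including text preprocessing
(tokenization, stemming, lemmatization), linguistic analysis (tagging,
chunking, parsing) and machine-learning-based applications such as text
classification.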
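As an example of a machine-learning-based application, a toy sentiment classifier can be built with NLTK's `NaiveBayesClassifier`. The word-presence features, the training sentences and the helper name `word_features` are illustrative assumptions. A real classifier would need a much larger labeled dataset.

```python
from nltk.classify import NaiveBayesClassifier


def word_features(sentence):
    """Hypothetical feature extractor: mark each lower-cased word as present."""
    return {f"has({word})": True for word in sentence.lower().split()}


# Tiny hand-labeled training set (made-up examples)
train = [
    (word_features("great movie loved it"), "pos"),
    (word_features("wonderful acting great story"), "pos"),
    (word_features("terrible plot hated it"), "neg"),
    (word_features("boring story awful acting"), "neg"),
]

classifier = NaiveBayesClassifier.train(train)

# Features never seen during training (e.g. "the") are ignored by the classifier
print(classifier.classify(word_features("loved the great acting")))
print(classifier.classify(word_features("hated the awful plot")))
```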

Advantages:
1. Comprehensive: NLTK offers a comprehensive set of tools for natural
language processing, along with access to many corpora and lexical
resources such as WordNet.
2. Educational Resources: Because NLTK is one of the most widely used NLP
libraries, many free and paid learning resources are available for it.
3. Customization: Users can customize and extend its functionality to suit
their specific needs.
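One simple form of customization is defining a tokenizer from your own regular expression. In this sketch the pattern is a hypothetical example: `RegexpTokenizer` keeps hashtags as single tokens, while NLTK's generic `wordpunct_tokenize` splits them apart.

```python
from nltk.tokenize import RegexpTokenizer, wordpunct_tokenize

text = "Loving #NLP with NLTK!"

# Custom rule: a hashtag (# followed by word characters) or a plain word;
# anything else, such as "!", is dropped.
hashtag_tokenizer = RegexpTokenizer(r"#\w+|\w+")
custom_tokens = hashtag_tokenizer.tokenize(text)

# Generic tokenizer for comparison: splits "#" from "NLP" and keeps "!"
generic_tokens = wordpunct_tokenize(text)

print(custom_tokens)
print(generic_tokens)
```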

Disadvantages:
1. Speed: NLTK is slower than other NLP libraries such as spaCy, especially
in large-scale NLP applications.
2. Learning Curve: Beginners can find some functionalities of the library
hard to understand.

[II] spaCy

Provided Functionalities:
1. Tokenization
2. Part-of-Speech Tagging (PoS Tagging)
3. Named Entity Recognition (NER)
4. Lemmatization
5. Dependency Parsing
6. Word Embeddings
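7. Text Classification (e.g., sentiment analysis, via a trainable text
categorizer rather than a built-in sentiment model)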

Platform Dependence:
• spaCy is platform-independent and can run on Windows, macOS and Linux.

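Supported NLP Approaches:
1. Rule-based Approach: spaCy provides rule-based components such as the
token Matcher, PhraseMatcher and EntityRuler for pattern matching and
rule-based entity recognition.
2. Deep Learning Integration: spaCy's trained pipelines are neural network
models, and spaCy can be integrated with deep learning libraries such as
PyTorch and TensorFlow (e.g., through its Thinc backend and the
spacy-transformers package).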
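The rule-based side can be sketched with spaCy's token `Matcher` on a blank English pipeline. A blank pipeline only has a tokenizer, so no trained model is needed. The pattern name `NLP_TERM` and the example text are hypothetical. `matcher.add(name, [pattern])` is the spaCy v3 signature.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # tokenizer only, no trained components
matcher = Matcher(nlp.vocab)

# Rule: three consecutive tokens "natural language processing", any casing
pattern = [{"LOWER": "natural"}, {"LOWER": "language"}, {"LOWER": "processing"}]
matcher.add("NLP_TERM", [pattern])

doc = nlp("Natural Language Processing with spaCy is fast. natural language processing!")

# Each match is (match_id, start_token, end_token)
spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(spans)
```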

Supported NLP Tasks:
• spaCy supports a wide range of NLP tasks, including text processing,
linguistic analysis (tagging, parsing, named entity recognition) and
deep-learning-based applications.

Advantages:
1. High Performance: Known for its speed and efficiency, making it
suitable for large-scale text processing.
2. User-Friendly: spaCy has an easy-to-use API, making it accessible for
both beginners and experienced developers.
3. Pre-trained Models: Comes with pre-trained models for various
languages and domains, saving time and resources.

Disadvantages:
1. Limited Customization: May be less flexible and customizable
compared to some other NLP libraries for certain advanced tasks.
2. Resource Intensive: Loading and running the larger pipelines (for
example, transformer-based models) requires considerable memory and
compute.

[III] Gensim

Provided Functionalities:
1. Topic Modelling
2. Document Similarity
3. Word Embeddings

Platform Dependence:
• Gensim is platform-independent and can run on Windows, macOS and Linux.

Supported NLP Approaches:
1. Gensim primarily supports unsupervised learning approaches, especially
in the domain of topic modeling (e.g., LSI/LSA and LDA). It excels at
techniques that capture semantic relationships between words and
documents, such as word embeddings (Word2Vec, FastText) and document
embeddings (Doc2Vec).

Supported NLP Tasks:
1. Topic Modeling
2. Document Similarity
3. Word Embeddings
4. Text Summarization (only in Gensim versions before 4.0; the
gensim.summarization module was removed in 4.0)
5. Word Similarity

Advantages:
1. Efficiency: Gensim is designed for efficiency and scalability, making it
suitable for processing large corpora and streaming data.
2. Memory Efficiency: Gensim processes a corpus as a stream of documents,
so it can handle datasets larger than the available RAM.
3. Topic Modeling Expertise: It excels at topic modeling tasks, making it a
preferred choice for researchers and practitioners in this domain.

Disadvantages:
1. Limited Pre-processing Tools: Gensim has fewer pre-processing tools
compared to some other NLP libraries, so users might need to rely on
additional libraries for certain tasks.
2. Learning Curve: Gensim's API might have a steeper learning curve for
beginners compared to more user-friendly libraries like spaCy.

Installing NLP Libraries

1. NLTK

a. Install NLTK: pip install nltk (package page: https://pypi.python.org/pypi/nltk)

b. Test the installation: start Python and type import nltk


2. spaCy

a. pip install -U pip setuptools wheel

b. pip install -U spacy


c. python -m spacy download en_core_web_sm

3. Gensim

a. pip install gensim
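To test all three installations at once, a small standard-library script can try importing each package and report its version. The helper name `check_libraries` is hypothetical. It reads the conventional `__version__` attribute and falls back to "unknown" if a package does not define it.

```python
import importlib


def check_libraries(names):
    """Return {name: version string} for importable packages, None otherwise."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            results[name] = None
            continue
        results[name] = getattr(module, "__version__", "unknown")
    return results


print(check_libraries(["nltk", "spacy", "gensim"]))
```

A value of None means the package could not be imported and the corresponding pip step should be repeated.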


Comparing NLP Libraries

| Aspect | NLTK | spaCy | Gensim |
|---|---|---|---|
| Scope/Primary Focus | General-purpose NLP toolkit | Industrial-strength NLP library | Topic modeling, document similarity, word embeddings |
| Ease of Use | Beginner-friendly with extensive documentation | User-friendly API with straightforward syntax | Moderate learning curve, specialized for certain tasks |
| Preprocessing Tools | Comprehensive set for text processing | Efficient tokenization, POS tagging, named entity recognition | Limited compared to NLTK and spaCy, but sufficient for its focus |
| Performance | Moderate speed; may be slower for large-scale tasks | High performance; designed for speed | Efficient and scalable, suitable for large corpora and streaming data |
| Community Support | Large and active community | Growing community; increasing popularity | Active community, especially in the domain of topic modeling |
| Language Support | Broad support for multiple languages | Support for various languages, but English-centric in some models | Support for multiple languages, but models might be language-specific |
| Main Use Cases | Text processing, linguistic analysis, and research | Industry-grade applications, production-ready NLP | Topic modeling, document similarity, and word embeddings |
| NLP Approaches | Covers various NLP approaches and tasks | Primarily focuses on rule-based and machine learning approaches | Specialized for unsupervised learning approaches, especially in topic modeling |
| Integration with Deep Learning | Limited deep learning integration | Integration with deep learning libraries (TensorFlow, PyTorch) | Limited; focuses more on traditional NLP algorithms |
| Notable Algorithms/Models | Includes algorithms for tokenization, stemming, POS tagging, etc. | Statistical models for POS tagging, dependency parsing, named entity recognition | LSA, LDA for topic modeling; Word2Vec, Doc2Vec for word embeddings |
| Use in Research vs. Production | Used widely in research and academia | Suitable for both research and production environments | Often used in research and specific applications, less common in production |
Conclusion:
Thus, we have studied different NLP libraries in Python, namely NLTK, spaCy
and Gensim, in detail and learned how to install them. We have also carried
out a comparative study of available libraries for Natural Language
Processing with respect to provided functionalities, platform dependence,
supported NLP approaches, supported NLP tasks, advantages, disadvantages,
etc.
