
Assignment 1 – Natural Language Processing

Section A: Multiple Choice Questions (MCQs)


1. What is the primary goal of Natural Language Processing (NLP)?
a) To create new languages b) To understand and process human language
c) To replace human communication d) To improve computer hardware

2. Which of the following is an application of NLP?
a) Image recognition b) Sentiment analysis c) Data encryption d) Network routing

3. In the context of NLP, what does “Bag of Words” refer to?
a) A method for encoding words based on their order b) A technique for removing punctuation from text
c) A statistical method for representing text data d) A type of language model used in machine translation

4. Which of the following libraries is commonly used for NLP in Python?
a) NumPy b) Pandas c) NLTK d) Matplotlib

5. What does “TF-IDF” stand for in text processing?
a) Term Frequency-Inverse Document Frequency b) Term Frequency-Initial Document Frequency
c) Total Frequency-Inverse Document Frequency d) Term Frequency-Information Density Factor

Section B: Fill in the Blanks


1. NLP aims to bridge the gap between human ________ and computer ________.
2. The 'Bag of Words' model ignores the ________ of words in a text and focuses on their ________.
3. In text processing, ________ is used to count the frequency of words in a document.
4. The library ________ is used for NLP tasks such as tokenization and parsing in Python.
5. The TF-IDF model helps to determine the ________ of a word in a document relative to its occurrence in the entire corpus.

Section C: Case Study Questions


1. Case Study 1: Sentiment Analysis
A company wants to analyze customer reviews to determine the overall sentiment (positive, negative, or neutral) towards
their products. Describe how NLP can be used to accomplish this task and list the steps involved in building a sentiment
analysis model.

2. Case Study 2: Chatbots


A startup is developing a chatbot to assist customers with frequently asked questions. Explain how NLP techniques can be
applied to understand user queries and provide appropriate responses. Include the role of text processing in this application.

3. Case Study 3: Document Classification


An organization needs to classify incoming documents into different categories such as “Finance”, “Healthcare”, and
“Technology”. Discuss how NLP methods, including Bag of Words and TF-IDF, can be utilized for document classification and
the potential challenges involved.

Section D: Subjective Questions (3 Marks Each)


1. Explain the difference between human language and computer language in the context of NLP. Why is it challenging to
process human language with computers?
Ans. Human language is complex, ambiguous, and context-dependent, often involving nuances like slang, idioms, and
emotions. In contrast, computer languages are structured, with fixed syntax and semantics. The challenge in NLP arises
because human language lacks strict rules and varies widely, making it difficult for computers to interpret context,
disambiguate meanings, and handle variations.

2. Describe the process of tokenization in text processing and its importance in NLP applications.
Ans. Tokenization is the process of breaking down text into smaller units, such as words or sentences, known as tokens. It’s
important because most NLP tasks (e.g., sentiment analysis, translation) require working with individual tokens rather than
entire text blocks, enabling better analysis, feature extraction, and text understanding.
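For illustration, a minimal tokenization sketch using NLTK (the sample sentence is invented; it assumes NLTK is installed and the 'punkt' tokenizer models are available, with newer NLTK versions also requiring 'punkt_tab'):

import nltk
nltk.download('punkt', quiet=True)  # tokenizer models; newer versions may also need 'punkt_tab'

from nltk.tokenize import sent_tokenize, word_tokenize

text = "NLP is fascinating. It powers chatbots and machine translation."
print(sent_tokenize(text))  # ['NLP is fascinating.', 'It powers chatbots and machine translation.']
print(word_tokenize(text))  # ['NLP', 'is', 'fascinating', '.', 'It', 'powers', ...]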
3. Discuss the advantages and limitations of the Bag of Words model for text representation.
Ans. Advantages: The Bag of Words (BoW) model is simple to implement and effective for representing text by counting word
occurrences, enabling easy text classification and clustering. Limitations: It disregards word order, context, and meaning,
leading to loss of semantic information and an inability to handle polysemy (words with multiple meanings).
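As a sketch of BoW in practice, scikit-learn's CountVectorizer is one common implementation (the library choice and sample documents are assumptions, not prescribed by the assignment):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)  # sparse document-term count matrix

print(vectorizer.get_feature_names_out())
# ['cat' 'dog' 'log' 'mat' 'on' 'sat' 'the']
print(bow.toarray())
# [[1 0 0 1 1 1 2]   <- counts only; word order is discarded
#  [0 1 1 0 1 1 2]]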

4. Explain how TF-IDF improves upon the Bag of Words model for text analysis. Provide an example of how TF-IDF might be
used in practice.
Ans. TF-IDF (Term Frequency-Inverse Document Frequency) improves BoW by giving less weight to common words and more
weight to rare but important words in the text, thereby enhancing relevance. For example, in document classification, TF-IDF
helps identify key terms that distinguish one document from others by reducing the impact of frequent but non-informative
words (e.g., "the," "is").
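A minimal sketch using scikit-learn's TfidfVectorizer (the library choice and sample documents are illustrative assumptions):

from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the market fell on interest rate fears",
    "the patient was treated at the hospital",
    "the new phone ships with a faster chip",
]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

# "the" occurs in every document, so it receives the lowest inverse-document-
# frequency weight; distinguishing terms like "hospital" score higher.
for term, weight in zip(vectorizer.get_feature_names_out(), tfidf.toarray()[1]):
    if weight > 0:
        print(f"{term}: {weight:.2f}")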

5. What is NLTK, and how does it assist in Natural Language Processing tasks? Mention at least two functionalities provided by
the NLTK library.
Ans. NLTK (Natural Language Toolkit) is a comprehensive library in Python that supports NLP tasks such as text preprocessing,
tokenization, and sentiment analysis. Two key functionalities of NLTK include:
• Tokenization: Breaking text into words or sentences.
• Stemming and Lemmatization: Reducing words to their base or root form.
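A short sketch of the second functionality (tokenization was illustrated above; this assumes the 'wordnet' corpus is downloaded, and some NLTK versions also need 'omw-1.4'):

import nltk
nltk.download('wordnet', quiet=True)  # lexical database used by the lemmatizer

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("running"))                  # 'run'   (crude suffix stripping)
print(lemmatizer.lemmatize("mice"))             # 'mouse' (dictionary-based, nouns by default)
print(lemmatizer.lemmatize("better", pos="a"))  # 'good'  (adjective)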

6. In text processing, what are stop words, and why are they typically removed from text data before analysis? Provide
examples of common stop words.
Ans. Stop words are common words (e.g., "the," "is," "in") that appear frequently in text but carry little meaning. They are
removed before analysis to reduce noise and focus on more significant words. Removing stop words improves the accuracy
and efficiency of NLP tasks such as text classification and search indexing.
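A minimal stop-word removal sketch using NLTK's built-in English stop word list (the sample sentence is invented; it assumes the 'stopwords' and 'punkt' resources are downloaded):

import nltk
nltk.download('stopwords', quiet=True)
nltk.download('punkt', quiet=True)

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words('english'))
tokens = word_tokenize("The movie is one of the best films of the year")
filtered = [t for t in tokens if t.lower() not in stop_words]
print(filtered)  # high-frequency words like 'The', 'is', 'of' are dropped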
