
Chat-Bot using NLTK Library

Submitted by:
Ishant Kumawat
19bcon085
So what is a chatbot?

A chatbot is an artificial intelligence-powered piece of software embedded in a device (Siri, Alexa, Google Assistant, etc.), application, website, or other network. It gauges a consumer's needs and then assists in performing a particular task such as a commercial transaction, a hotel booking, or a form submission. Today almost every company has a chatbot deployed to engage with its users. Some of the ways in which companies are using chatbots are:

 To deliver flight information
 To connect customers with their finances
 As customer support
How do Chatbots work?

There are broadly two variants of chatbots: rule-based and self-learning.

1. In a rule-based approach, a bot answers questions based on a set of rules it is trained on. The rules can range from very simple to very complex. Such bots can handle simple queries but fail to manage complex ones. A minimal sketch of this idea appears after this list.

2. Self-learning bots use machine learning-based approaches and are more efficient than rule-based bots. These bots can be of two further types: retrieval-based or generative.
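
As a hedged illustration of the rule-based approach, the sketch below pairs regular-expression patterns with canned replies; the patterns, responses, and function name are hypothetical examples, not part of any particular chatbot library.

import re

# A rule-based bot: each rule pairs a regex pattern with a canned reply.
# These patterns and responses are illustrative only.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.I), "Hello! How can I help you today?"),
    (re.compile(r"\bflight\b", re.I), "Please share your flight number for status updates."),
    (re.compile(r"\b(bye|goodbye)\b", re.I), "Goodbye! Have a nice day."),
]

def reply(message: str) -> str:
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    # Anything outside the rules exposes the approach's main weakness.
    return "Sorry, I don't understand that yet."

print(reply("Hi there"))               # Hello! How can I help you today?
print(reply("What about my flight?"))  # Please share your flight number...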
Pre-Requisites
1. Scikit-Learn: Scikit-learn (sklearn) is one of the most useful and robust libraries for machine learning in Python. The sklearn library contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction. Please note that sklearn is used to build machine learning models; it should not be used for reading, manipulating, or summarizing data. There are better libraries for that (e.g. NumPy, Pandas). A small usage sketch follows the feature list below.
Important Features of scikit-learn:

 Simple and efficient tools for data mining and data analysis. It features various
classification, regression and clustering algorithms including support vector
machines, random forests, gradient boosting, k-means, etc.

 Accessible to everybody and reusable in various contexts.

 Built on top of NumPy, SciPy, and matplotlib.

 Open source, commercially usable – BSD license.
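
As a rough sketch of how sklearn might fit into a chatbot, the example below trains a tiny intent classifier; the toy dataset, the choice of TF-IDF features, and the Naive Bayes model are our own illustrative assumptions, not something the slides prescribe.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative "intent" dataset; a real bot would need far more examples.
texts = ["book a flight", "flight status please", "check my balance",
         "transfer money to savings", "talk to support", "I need help with an error"]
intents = ["flight", "flight", "finance", "finance", "support", "support"]

# TF-IDF turns text into numeric features; Naive Bayes classifies them.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, intents)

print(model.predict(["what is my account balance"]))  # likely ['finance']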


NLP (Natural Language Processing):

Natural language processing (NLP) refers to the branch of computer science, and more specifically the branch of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. NLP has existed for more than 50 years and has roots in the field of linguistics. It has a variety of real-world applications in a number of fields, including medical research, search engines, and business intelligence.

NLP combines computational linguistics (rule-based modeling of human language) with statistical, machine learning, and deep learning models. Together, these technologies enable computers to process human language in the form of text or voice data and to 'understand' its full meaning, complete with the speaker's or writer's intent and sentiment.

NLP enables computers to understand natural language as humans do. Whether the language is spoken or written, natural language processing uses artificial intelligence to take real-world input, process it, and make sense of it in a way a computer can understand. Just as humans have different sensors (such as ears to hear and eyes to see), computers have programs to read text and microphones to collect audio. And just as humans have a brain to process that input, computers have programs to process their respective inputs. At some point in processing, the input is converted into code that the computer can understand.
There are two main phases to natural language processing:
1. Data Pre-Processing and 2. Algorithm Development.

Data pre-processing involves preparing and "cleaning" text data so that machines can analyze it. Pre-processing puts data into a workable form and highlights features in the text that an algorithm can work with. There are several ways this can be done, including:
Tokenization:

 Tokens are the building blocks of Natural Language.

 Tokenization is a common task in Natural Language Processing (NLP). It is a fundamental step both in traditional NLP methods like the Count Vectorizer and in advanced deep learning-based architectures like Transformers.

 Tokenization is a way of separating a piece of text into smaller units called tokens. Tokens can be words, characters, or sub-words. Hence, tokenization can be broadly classified into 3 types: word, character, and sub-word (n-gram characters) tokenization.
 As tokens are the building blocks of Natural Language, the most common way of processing raw text happens at the token level.

 Tokenization is the foremost step when modeling text data. Tokenization is performed on the corpus to obtain tokens, and these tokens are then used to prepare a vocabulary. A vocabulary is the set of unique tokens in the corpus.

 Remember that a vocabulary can be constructed by considering each unique token in the corpus or by considering only the top K most frequently occurring words. A short tokenization sketch follows this list.
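
A minimal sketch of sentence and word tokenization with NLTK; the sample text is our own, and the download call assumes the standard "punkt" tokenizer models (newer NLTK releases may also ask for "punkt_tab").

import nltk
nltk.download("punkt", quiet=True)  # tokenizer models, fetched once

from nltk.tokenize import sent_tokenize, word_tokenize

text = "Chatbots are fun to build. NLTK makes tokenization easy!"

print(sent_tokenize(text))
# ['Chatbots are fun to build.', 'NLTK makes tokenization easy!']

print(word_tokenize(text))
# ['Chatbots', 'are', 'fun', 'to', 'build', '.', 'NLTK', 'makes', 'tokenization', 'easy', '!']

# The vocabulary is the set of unique tokens in the corpus.
vocabulary = set(word_tokenize(text.lower()))
print(sorted(vocabulary))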
Stop Word Removal:

 This is when common words are removed from text so that the unique words offering the most information about the text remain.

 Stop word removal is one of the most commonly used preprocessing steps across different NLP applications. The idea is simply to remove the words that occur commonly across all the documents in the corpus. Typically, articles and pronouns are classified as stop words. These words have no significance in some NLP tasks such as information retrieval and classification, because they are not very discriminative.

 On the contrary, in some NLP applications stop word removal has very little impact. Most of the time, the stop word list for a given language is a well hand-curated list of the words that occur most commonly across corpora. Removing stop words therefore helps build a cleaner dataset with better features for a machine learning model. The sketch after this list shows the step in NLTK.
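
A brief sketch of stop word removal using NLTK's built-in English stop word list; the sample sentence is illustrative.

import nltk
nltk.download("stopwords", quiet=True)
nltk.download("punkt", quiet=True)

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

text = "This is a simple example showing the removal of stop words"
stop_words = set(stopwords.words("english"))

# Keep only tokens that are not in the stop word list.
tokens = word_tokenize(text.lower())
filtered = [t for t in tokens if t not in stop_words]
print(filtered)  # ['simple', 'example', 'showing', 'removal', 'stop', 'words']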
Lemmatization and Stemming:

 Stemming and lemmatization are text normalization (sometimes called word normalization) techniques in the field of Natural Language Processing that are used to prepare text, words, and documents for further processing.

 Stemming and lemmatization are themselves forms of NLP and are widely used in text mining. Text mining is the process of analyzing texts written in natural language and extracting high-quality information from them. It involves looking for interesting patterns in the text, or extracting data from the text to be inserted into a database. Text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modelling (i.e., learning relations between named entities). The sketch after this list contrasts the two techniques in NLTK.
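
A small sketch contrasting NLTK's Porter stemmer with its WordNet lemmatizer; the word list is our own illustration.

import nltk
nltk.download("wordnet", quiet=True)  # lexicon used by the lemmatizer
nltk.download("omw-1.4", quiet=True)

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

words = ["running", "studies", "better", "wolves"]

# Stemming chops suffixes heuristically; the result may not be a real word.
print([stemmer.stem(w) for w in words])          # ['run', 'studi', 'better', 'wolv']

# Lemmatization maps words to dictionary forms, optionally using a POS hint.
print([lemmatizer.lemmatize(w) for w in words])  # ['running', 'study', 'better', 'wolf']
print(lemmatizer.lemmatize("running", pos="v"))  # 'run'
print(lemmatizer.lemmatize("better", pos="a"))   # 'good'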
Part-of-Speech Tagging:

This is when words are marked based on the part of speech they represent, such as nouns, verbs, and adjectives. Part-of-speech tags are properties of words that define their main context, function, and usage in a sentence. Some of the commonly used part-of-speech tags are listed below, followed by a short tagging sketch:

i. Nouns: define any object or entity.

ii. Verbs: define some action.

iii. Adjectives and Adverbs: act as modifiers, quantifiers, or intensifiers in a sentence.
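
A quick sketch of POS tagging with NLTK's default tagger; the sentence is illustrative, and the exact tags can vary slightly between NLTK versions.

import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)  # default POS tagger

from nltk import pos_tag, word_tokenize

sentence = "The quick brown fox jumps over the lazy dog"
print(pos_tag(word_tokenize(sentence)))
# Output resembles:
# [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'),
#  ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]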
NLTK Library
• The Python programming language provides a wide range of tools and libraries for attacking specific NLP tasks. Many of these are found in the Natural Language Toolkit, or NLTK, an open source collection of libraries, programs, and educational resources for building NLP programs.

• NLTK includes libraries for many of the NLP tasks listed above, plus libraries for subtasks such as sentence parsing, word segmentation, stemming and lemmatization (methods of trimming words down to their roots), and tokenization (for breaking phrases, sentences, paragraphs, and passages into tokens that help the computer better understand the text). It also includes libraries for implementing capabilities such as semantic reasoning, the ability to reach logical conclusions based on facts extracted from text. A combined pre-processing sketch follows below.
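
Putting the earlier steps together, a minimal NLTK pre-processing pipeline for a chatbot might look like the sketch below; the helper name preprocess and the sample sentence are our own assumptions.

import nltk
for resource in ("punkt", "stopwords", "wordnet"):
    nltk.download(resource, quiet=True)

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

def preprocess(text):
    """Tokenize, drop stop words and punctuation, then lemmatize."""
    lemmatizer = WordNetLemmatizer()
    stop_words = set(stopwords.words("english"))
    tokens = word_tokenize(text.lower())
    return [lemmatizer.lemmatize(t) for t in tokens
            if t.isalpha() and t not in stop_words]

print(preprocess("The bots were answering questions about flights quickly!"))
# ['bot', 'answering', 'question', 'flight', 'quickly']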
NLP Use Cases :
 Spam Detection

 Machine Translation

 Virtual Agents and Chat-Bots

 Social Media and Sentiment Analysis

 Text Summarization

 Text Classification

 Text Extraction
References:
 Analytics Vidhya

 Medium.com

 IBM official NLP documentation

 KDnuggets

 Wikipedia

 Udemy
Thank You !!
