Natural Language Processing: Practical 1
• First parameter of open() is the path of the file, including drive, folder, sub-folder, filename and extension
• Second parameter is the mode in which the file is opened
• “r” opens the file in read-only mode
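The sketch below shows the two parameters in use. It assumes a hypothetical sample file named nlp_sample.txt in the system temp folder, which the code creates first so the example runs anywhere. Because the file is opened in "r" mode, an attempt to write to it raises io.UnsupportedOperation.

```python
import io
import os
import tempfile

# Hypothetical sample file, created here only so the example is self-contained
path = os.path.join(tempfile.gettempdir(), "nlp_sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Natural Language Processing")

# First parameter: path of the file; second parameter: mode ("r" = read-only)
file = open(path, "r", encoding="utf-8")
print(file.mode)  # r

# A file opened in "r" mode cannot be written to
write_failed = False
try:
    file.write("more text")
except io.UnsupportedOperation:
    write_failed = True
print("Write blocked:", write_failed)

file.close()
```

On Windows, a full path with drive and folders is easiest to write as a raw string, for example r"C:\Users\student\Documents\sample.txt" (a hypothetical path), so the backslashes are not treated as escape characters.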
Check the Contents of the File
• Save the contents of a file in a variable by reading the file
• The read() function reads the contents of the file from start to end; store the result in a text variable
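A minimal sketch of reading a whole file into a variable, again assuming a hypothetical sample file that the code writes to the temp folder first. The with statement closes the file automatically once reading is done.

```python
import os
import tempfile

# Hypothetical sample file so the example runs without any setup
path = os.path.join(tempfile.gettempdir(), "nlp_sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Tokenization splits text into tokens.\nStemming reduces words to stems.")

# read() returns the whole file, from start to end, as a single string
with open(path, "r", encoding="utf-8") as file:
    text = file.read()

print(text)
```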
• Use the re.sub() function to replace HTTP and HTTPS URLs with a blank space, and store the result in a separate variable
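One way to write this step, assuming a URL is "http:// or https:// followed by any run of non-space characters". The pattern https?:// makes the "s" optional, so both schemes are matched, and the original text is left unchanged because the result goes into a new variable.

```python
import re

text = "Read the notes at https://example.com/nlp and http://example.org today."

# Replace every http:// or https:// URL (up to the next whitespace) with a space
clean_text = re.sub(r"https?://\S+", " ", text)

print(clean_text)
```

Because \S+ stops only at whitespace, punctuation stuck to the end of a URL (such as a full stop) is removed along with it. If the extra spaces are unwanted, " ".join(clean_text.split()) collapses them.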
Stemming using
1. PorterStemmer
2. LancasterStemmer
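A short sketch comparing the two stemmers from NLTK's nltk.stem module. It assumes NLTK is installed (pip install nltk); neither stemmer needs any data from nltk.download(). The word list is an arbitrary example.

```python
from nltk.stem import LancasterStemmer, PorterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

words = ["running", "studies", "connection", "happily"]

# Print each word next to its Porter and Lancaster stems
for word in words:
    print(f"{word:12} Porter: {porter.stem(word):10} Lancaster: {lancaster.stem(word)}")
```

The Lancaster stemmer is generally more aggressive than Porter and often produces shorter stems, so comparing both on the same words is a quick way to see the difference.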
Import Tokenizer
Python Practical
Objectives
• What is tokenization?
• Different ways to perform tokenization
What is Tokenization?
• Tokenization is a common task that data scientists come across when working with text data
• It consists of splitting a text into smaller units, known as tokens
• Tokenization is usually the first step in an NLP project because it is the foundation for developing good models and helps us better understand the text
Different Ways to Tokenize Text
1. Using split() function
2. NLTK
Tokenization with split() - Steps
• Open the text document
• Use nltk.download() to download the NLTK data packages
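The sketch below contrasts the two approaches on the same sentence. It assumes NLTK is installed. NLTK's word_tokenize needs the Punkt tokenizer models fetched with nltk.download(), so to stay runnable offline this sketch swaps in wordpunct_tokenize, a regex-based NLTK tokenizer that needs no downloaded data.

```python
from nltk.tokenize import wordpunct_tokenize

text = "Tokenization is the first step. It splits text into tokens!"

# Method 1: str.split() breaks on whitespace; punctuation stays attached to words
split_tokens = text.split()

# Method 2: an NLTK tokenizer; punctuation becomes separate tokens
nltk_tokens = wordpunct_tokenize(text)

print(split_tokens)
print(nltk_tokens)
```

With split(), "step." and "tokens!" remain single tokens, while the NLTK tokenizer separates the words from the "." and "!".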
Natural Language Processing
Practical 3 BS(SE) 5th Semester
Punctuations
• Symbols
• Used to punctuate text
• .,;“‘!- [different punctuation marks]
How to remove punctuation marks?
• Three methods:
• Using string library
• Using for each loop
• Using replace() function
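Methods 2 and 3 can be sketched as follows. As an assumption, both reuse string.punctuation as the set of marks to remove; any custom string of punctuation marks would work the same way.

```python
import string

text = "Hello, world! NLP; is fun."

# Method 2: for loop, keep only characters that are not punctuation marks
no_punct_loop = ""
for char in text:
    if char not in string.punctuation:
        no_punct_loop += char

# Method 3: replace() each punctuation mark with an empty string
no_punct_replace = text
for mark in string.punctuation:
    no_punct_replace = no_punct_replace.replace(mark, "")

print(no_punct_loop)
print(no_punct_replace)
```

Note that string.punctuation contains only ASCII marks, so curly quotes such as “ and ‘ are not removed unless they are added to the set.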
Method # 1 – Using string library
1. Import string library
2. Open a text file and read it
3. Print text
4. Remove punctuations
5. Print text without punctuations
Step # 1 – Import string library
• Python's string module is a built-in module that contains constants, utility functions, and classes for string manipulation.
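A quick look at a few of the module's contents: string.punctuation is the constant used for removing punctuation, and string.capwords() is one of its utility functions.

```python
import string

# Constants
print(string.punctuation)      # ASCII punctuation characters
print(string.ascii_lowercase)  # abcdefghijklmnopqrstuvwxyz
print(string.digits)           # 0123456789

# Utility function: capitalize each word
title = string.capwords("natural language processing")
print(title)  # Natural Language Processing
```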
Step # 2 – Open text file and read it
• Use the method discussed earlier
Step # 3 – Print text
• Use the method discussed earlier
Step # 4 – Remove punctuations
• Use translate() and maketrans() methods
https://datagy.io/python-remove-punctuation-from-string/
Step # 5 – Print text without punctuations
Complete Code
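A sketch of the five steps of Method # 1 in one script. It assumes a hypothetical sample file in the temp folder, written first so the example is self-contained. str.maketrans("", "", chars) builds a table that maps every character in chars to None, and translate() then deletes those characters.

```python
import os
import string  # Step 1 - import string library
import tempfile

# Hypothetical sample file standing in for the practical's text file
path = os.path.join(tempfile.gettempdir(), "punctuation_sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello, world! Is NLP fun? Yes: it is.")

# Step 2 - open the text file and read it
with open(path, "r", encoding="utf-8") as file:
    text = file.read()

# Step 3 - print text
print(text)

# Step 4 - remove punctuations with maketrans() and translate()
table = str.maketrans("", "", string.punctuation)
clean_text = text.translate(table)

# Step 5 - print text without punctuations
print(clean_text)
```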
Detect Language of Text
Python Practical 4
Using “TextBlob”
• Steps:
• Install TextBlob library
Step 1: Install TextBlob Library
• Open Command Prompt
• Install TextBlob library using command:
• pip install textblob
Step 2: Import TextBlob Library
• Open the Python IDLE shell
• Write the code to import TextBlob:
• from textblob import TextBlob
Step 3: Detect the language
• Create a text variable and store a sentence in it: