
NLP Lab Exp 01

natural language processing

Roll No.

Marks :
Experiment No. 01
BATCH - Sign :

Aim: To perform preprocessing of text (Tokenization, Filtration)
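
The aim names tokenization, so a minimal sketch of that step is given here for reference. It uses only the standard `re` module; the function name `simple_tokenize` and the sample sentence are assumptions made for illustration and are not part of the lab programs.

```python
import re

def simple_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("Hello, world! NLP is fun.")
print(tokens)
# ['Hello', ',', 'world', '!', 'NLP', 'is', 'fun', '.']
```

Keeping punctuation as separate tokens makes it easy to filter it out later; a plain `text.split()` would leave it attached to the neighbouring word.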

Programs:

1. Converting to Lowercase
# Implementation of lowercase conversion

def lower_case_conversion(text):
    """
    Input :- string
    Output :- lowercase string
    """
    lower_text = text.lower()
    return lower_text

ex_lowercase = "This is an example Sentence for LOWER case conversion"

lowercase_result = lower_case_conversion(ex_lowercase)
print(lowercase_result)

## Output::

this is an example sentence for lower case conversion


2. Removal of HTML tags
# HTML tags removal Implementation using regex module

import re

def remove_html_tags(text):
    """
    Return :- String without Html tags
    input :- String
    Output :- String
    """
    html_pattern = r'<.*?>'
    without_html = re.sub(pattern=html_pattern, repl=' ', string=text)
    return without_html

ex_htmltags = """

<body>
<div>
<h1>Hi, this is an example text with Html tags. </h1>
</div>
</body>

"""
htmltags_result = remove_html_tags(ex_htmltags)
print(f"Result :- \n {htmltags_result}")

## Output::

Result :-

Hi, this is an example text with Html tags.
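
The pattern `<.*?>` works for simple markup like the example, but `.` does not match a newline, so a tag that spans two lines is left in place. As a possible alternative, the sketch below uses the standard library's `html.parser` to collect only the text content. The names `TextExtractor` and `strip_tags` are assumptions made for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text found between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for every piece of text outside a tag
        self.chunks.append(data)

def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the pieces and collapse runs of whitespace into single spaces
    return " ".join(" ".join(parser.chunks).split())

print(strip_tags("<body><div><h1>Hi, this is an example text with Html tags. </h1></div></body>"))
# Hi, this is an example text with Html tags.
```

With its default settings the parser also decodes character references, so `&amp;` in the input comes out as `&`.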


3. Removal of URLs
# Implementation of Removing URLs using python regex

import re

def remove_urls(text):
    """
    Return :- String without URLs
    input :- String
    Output :- String
    """
    url_pattern = r'https?://\S+|www\.\S+'
    without_urls = re.sub(pattern=url_pattern, repl=' ', string=text)
    return without_urls

# example text which contains URLs in it

ex_urls = """
This is an example text for URLs like http://google.com & https://www.facebook.com/ etc.
"""

# calling remove_urls function with example text (ex_urls)

urls_result = remove_urls(ex_urls)
print(f"Result after removing URLs from text :- \n {urls_result}")

## Output::

Result after removing URLs from text :-

This is an example text for URLs like & etc.


4. Removing Numbers
# Implementation of Removing numbers using python regex

import re

def remove_numbers(text):
    """
    Return :- String without numbers
    input :- String
    Output :- String
    """
    number_pattern = r'\d+'
    without_number = re.sub(pattern=number_pattern, repl=" ", string=text)
    return without_number

# example text which contains numbers in it

ex_numbers = """
This is an example sentence for removing numbers like 1, 5,7, 4 ,77 etc.
"""
# calling remove_numbers function with example text (ex_numbers)
numbers_result = remove_numbers(ex_numbers)
print(f"Result after removing number from text :- \n {numbers_result}")

## Output::

Result after removing number from text :-

This is an example sentence for removing numbers like , , , , etc.
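
Replacing each number with a space leaves runs of two or three spaces in the string, although the output above is shown with single spaces. A small sketch of a clean-up step that collapses them is given below; the helper name `collapse_whitespace` and the shortened sample string are assumptions for illustration.

```python
import re

def collapse_whitespace(text):
    """Replace every run of whitespace with one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

raw = "numbers like 1, 5,7, 4 ,77 etc."
without_numbers = re.sub(r"\d+", " ", raw)   # same substitution as remove_numbers
print(collapse_whitespace(without_numbers))
# numbers like , , , , etc.
```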


5. Converting numbers to words
from num2words import num2words

# function to convert numbers to words

def num_to_words(text):
    """
    Return :- text in which all integers are written out as words
    Input :- string
    Output :- string
    """
    # splitting text into words on whitespace
    after_splitting = text.split()

    for index in range(len(after_splitting)):
        # only tokens made up entirely of digits are converted
        if after_splitting[index].isdigit():
            after_splitting[index] = num2words(int(after_splitting[index]))

    # joining list into string with space
    numbers_to_words = ' '.join(after_splitting)
    return numbers_to_words

# example text which contains numbers in it

ex_numbers = """
This is an example sentence for converting numbers to words like 1 to one, 5 to five, 74 to seventy-four, etc.
"""
# calling num_to_words function with example text (ex_numbers)
numbers_result = num_to_words(ex_numbers)
print(f"Result after converting numbers to its words from text :- \n {numbers_result}")

## Output::

Result after converting numbers to its words from text :-

This is an example sentence for converting numbers to words like one to one, five to five, seventy-four to seventy-four, etc.
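
Because the text is split on whitespace, a number with punctuation attached, such as "5," is not all digits and would be left unchanged. One possible way around this is to substitute each run of digits with a callback, as sketched below. So that the sketch runs without third-party packages, `to_words` and its small lookup table are hypothetical stand-ins for `num2words`; `digits_to_words` is also an assumed name.

```python
import re

# Hypothetical stand-in for num2words, covering only the numbers used below
SMALL_NUMBERS = {1: "one", 5: "five", 74: "seventy-four"}

def to_words(number):
    # Fall back to the digits themselves when the number is not in the table
    return SMALL_NUMBERS.get(number, str(number))

def digits_to_words(text, convert=to_words):
    """Replace each run of digits with its word form, keeping nearby punctuation."""
    return re.sub(r"\d+", lambda match: convert(int(match.group())), text)

print(digits_to_words("Rooms 1, 5, and 74."))
# Rooms one, five, and seventy-four.
```

Passing `num2words` as the `convert` argument should give the same behaviour for arbitrary integers. Unlike the split-and-join version, this keeps the original spacing and line breaks.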

Conclusions:
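Hence we have performed preprocessing of text (tokenization and filtration): lowercase conversion, removal of HTML tags, URLs and numbers, and conversion of numbers to words.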
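
As a closing illustration, the filtration steps can be chained into one function that ends with whitespace tokenization. This is only a sketch: the function name `preprocess`, the order of the steps, and the sample input are assumptions, and the patterns are the same ones used in the programs above.

```python
import re

def preprocess(text):
    """Lowercase, filter out tags, URLs and numbers, then split into tokens."""
    text = text.lower()
    text = re.sub(r"<.*?>", " ", text)                   # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"\d+", " ", text)                     # numbers
    return text.split()                                  # whitespace tokenization

print(preprocess("<p>Visit https://www.example.com for 3 FREE tips</p>"))
# ['visit', 'for', 'free', 'tips']
```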
