Beginners Practical Guide To NLP

This document provides a beginner's guide to natural language processing with Python, introducing basic NLP tasks like part-of-speech tagging, tokenization, stemming, lemmatization and frequency analysis using the NLTK library.

Uploaded by

clinton migono
Beginner Practical Guide of Natural Language Processing (NLP) | by As... https://medium.com/ml-research-lab/beginner-practical-guide-of-natural...

1 of 18 1/30/2023, 5:00 PM

pip install nltk

import nltk

nltk.download()

import urllib.request

response = urllib.request.urlopen('http://php.net/')
html = response.read()
print(html)

from bs4 import BeautifulSoup
import urllib.request

response = urllib.request.urlopen('http://php.net/')
html = response.read()
# This requires the html5lib module to be installed.
soup = BeautifulSoup(html, "html5lib")
text = soup.get_text(strip=True)
print(text)
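To make the tag-stripping step concrete, here is a minimal standard-library sketch of the same idea. It is not how BeautifulSoup works internally; the class name TextExtractor and the helper extract_text are hypothetical and introduced only for illustration. It also joins the text chunks with a space, which keeps words from neighbouring elements apart (BeautifulSoup's get_text accepts a separator argument for the same purpose).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML string, chunks joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = "<html><head><style>p{}</style></head><body><h1>Hi</h1><p>there <b>you</b></p></body></html>"
print(extract_text(sample))
```

For real pages a dedicated parser such as BeautifulSoup is the more robust choice; the sketch only shows what "cleaning HTML" amounts to.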

from bs4 import BeautifulSoup
import urllib.request

response = urllib.request.urlopen('http://php.net/')
html = response.read()
soup = BeautifulSoup(html, "html5lib")
text = soup.get_text(strip=True)
tokens = text.split()
print(tokens)
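Splitting on whitespace is the simplest possible tokenizer, and it leaves punctuation attached to words. The sketch below contrasts it with a small regular-expression tokenizer. The pattern TOKEN_PATTERN is an assumption chosen for illustration, not something taken from NLTK.

```python
import re

text = "Hello Adam, how are you?"

# Whitespace splitting keeps punctuation glued to the neighbouring word.
whitespace_tokens = text.split()

# Hypothetical pattern: runs of word characters, or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")
regex_tokens = TOKEN_PATTERN.findall(text)

print(whitespace_tokens)  # ['Hello', 'Adam,', 'how', 'are', 'you?']
print(regex_tokens)       # ['Hello', 'Adam', ',', 'how', 'are', 'you', '?']
```

With whitespace splitting, 'Adam,' and 'Adam' would be counted as different tokens in a frequency analysis.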

from bs4 import BeautifulSoup
import urllib.request
import nltk

response = urllib.request.urlopen('http://php.net/')
html = response.read()
soup = BeautifulSoup(html, "html5lib")
text = soup.get_text(strip=True)
tokens = text.split()
freq = nltk.FreqDist(tokens)
for key, val in freq.items():
    print(str(key) + ':' + str(val))
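What the frequency distribution computes can be sketched with the standard library alone. The example below swaps in collections.Counter for nltk.FreqDist (in recent NLTK releases FreqDist is, to my knowledge, a Counter subclass, so most_common should be available there as well). The sample sentence is made up for illustration.

```python
from collections import Counter

# Made-up sample text standing in for the scraped page.
tokens = "the cat sat on the mat because the mat was warm".split()

freq = Counter(tokens)

# Print only the most frequent tokens instead of the whole distribution.
for word, count in freq.most_common(2):
    print(f"{word}:{count}")
```

Limiting the output to the top entries is usually easier to read than printing every token of a web page.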

# Requires the matplotlib library to be installed.
freq.plot(20, cumulative=False)

from nltk.corpus import stopwords

stopwords.words('english')

clean_tokens = list()
sr = stopwords.words('english')
for token in tokens:
    if token not in sr:
        clean_tokens.append(token)
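One detail worth illustrating: as far as I know, the entries in NLTK's English stop word list are lowercase, so a capitalised token such as "The" passes the membership test above unchanged. The sketch below lowercases each token before the lookup and uses a set for faster membership checks. STOP_WORDS is a hypothetical mini list used so the example runs without NLTK data.

```python
# Hypothetical mini stop list; NLTK's English list is much longer.
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lowercase form is in the stop word set."""
    return [token for token in tokens if token.lower() not in stop_words]


tokens = ["The", "history", "of", "PHP", "is", "long"]
print(remove_stop_words(tokens))  # ['history', 'PHP', 'long']
```

The same function should work with the NLTK list by passing set(stopwords.words('english')) as stop_words.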

from bs4 import BeautifulSoup
import urllib.request
import nltk
from nltk.corpus import stopwords

response = urllib.request.urlopen('http://php.net/')
html = response.read()
soup = BeautifulSoup(html, "html5lib")
text = soup.get_text(strip=True)
tokens = text.split()
clean_tokens = list()
sr = stopwords.words('english')
for token in tokens:
    if token not in sr:
        clean_tokens.append(token)
freq = nltk.FreqDist(clean_tokens)
for key, val in freq.items():
    print(str(key) + ':' + str(val))

# Requires the matplotlib library to be installed.
freq.plot(20, cumulative=False)

Hello Adam, how are you? I hope everything is going well. Today is a good day, see you dude.

from nltk.tokenize import sent_tokenize

mytext = "Hello Adam, how are you? I hope everything is going well. Today is a good day, see you dude."
print(sent_tokenize(mytext))

['Hello Adam, how are you?', 'I hope everything is going well.', 'Today is a good day, see you dude.']

Hello Mr. Adam, how are you? I hope everything is going well. Today is a good day, see you dude.

from nltk.tokenize import sent_tokenize

mytext = "Hello Mr. Adam, how are you? I hope everything is going well. Today is a good day, see you dude."
print(sent_tokenize(mytext))

Output:
['Hello Mr. Adam, how are you?', 'I hope everything is going well.', 'Today is a good day, see you dude.']
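To see why a trained sentence tokenizer is useful here, compare it with a naive rule. The sketch below splits after every '.', '?' or '!' that is followed by whitespace; the regular expression is an assumption chosen for illustration and is not what NLTK uses.

```python
import re

text = "Hello Mr. Adam, how are you? I hope everything is going well."

# Naive rule: a sentence ends at '.', '?' or '!' followed by whitespace.
naive_sentences = re.split(r"(?<=[.?!])\s+", text)

print(naive_sentences)
# ['Hello Mr.', 'Adam, how are you?', 'I hope everything is going well.']
```

The naive rule breaks the first sentence after the abbreviation "Mr.", whereas the sent_tokenize output above keeps "Hello Mr. Adam, how are you?" in one piece.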

from nltk.tokenize import word_tokenize

mytext = "Hello Mr. Adam, how are you? I hope everything is going well. Today is a good day, see you dude."
print(word_tokenize(mytext))

Output:
['Hello', 'Mr.', 'Adam', ',', 'how', 'are', 'you', '?',
 'I', 'hope', 'everything', 'is', 'going', 'well', '.',
 'Today', 'is', 'a', 'good', 'day', ',', 'see', 'you', 'dude', '.']

from nltk.tokenize import sent_tokenize

mytext = "Bonjour M. Adam, comment allez-vous? J'espère que tout va bien. Aujourd'hui est un bon jour."
print(sent_tokenize(mytext, "french"))

Output:
['Bonjour M. Adam, comment allez-vous?', "J'espère que tout va bien.", "Aujourd'hui est un bon jour."]

from nltk.corpus import wordnet

syn = wordnet.synsets("pain")
print(syn[0].definition())
print(syn[0].examples())

Output:
a symptom of some physical hurt or disorder
['the patient developed severe pain and distension']

from nltk.corpus import wordnet

syn = wordnet.synsets("NLP")
print(syn[0].definition())
syn = wordnet.synsets("Python")
print(syn[0].definition())

Output:
the branch of information science that deals with natural language information
large Old World boas

from nltk.corpus import wordnet

synonyms = []
for syn in wordnet.synsets('Computer'):
    for lemma in syn.lemmas():
        synonyms.append(lemma.name())
print(synonyms)

Output:
['computer', 'computing_machine', 'computing_device', 'data_processor', 'electronic_computer',

from nltk.corpus import wordnet

antonyms = []
for syn in wordnet.synsets("small"):
    for lemma in syn.lemmas():
        if lemma.antonyms():
            antonyms.append(lemma.antonyms()[0].name())
print(antonyms)

Output:
['large', 'big', 'big']

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print(stemmer.stem('working'))
print(stemmer.stem('worked'))

Output:
work
work

from nltk.stem import SnowballStemmer

print(SnowballStemmer.languages)

Output:
'danish', 'dutch', 'english', 'finnish', 'french', 'german', 'hungarian', 'italian',
'norwegian', 'porter', 'portuguese', 'romanian', 'russian', 'spanish', 'swedish'

from nltk.stem import SnowballStemmer

french_stemmer = SnowballStemmer('french')

print(french_stemmer.stem("French word"))

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

print(stemmer.stem('increases'))

Output:
increas

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

print(lemmatizer.lemmatize('increases'))

Output:
increase
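The difference between the two outputs above ('increas' versus 'increase') comes from how each technique works: a stemmer strips suffixes by rule, while a lemmatizer looks words up in a vocabulary. The toy sketch below illustrates that contrast only. It is neither the Porter algorithm nor WordNet; SUFFIXES, TOY_LEMMAS, toy_stem and toy_lemmatize are all hypothetical names made up for this example.

```python
# Hypothetical suffix list; real stemmers use many more rules and conditions.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical mini vocabulary standing in for a real lexical database.
TOY_LEMMAS = {"increases": "increase", "stones": "stone", "jokes": "joke"}


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Return the dictionary form if known, otherwise the word unchanged."""
    return TOY_LEMMAS.get(word, word)


print(toy_stem("increases"))       # increas
print(toy_lemmatize("increases"))  # increase
```

Rule-based stripping needs no vocabulary but can produce non-words; dictionary lookup returns real words but only for entries it knows.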

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

print(lemmatizer.lemmatize('playing', pos="v"))

Output:
play

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize('playing', pos="v"))
print(lemmatizer.lemmatize('playing', pos="n"))
print(lemmatizer.lemmatize('playing', pos="a"))
print(lemmatizer.lemmatize('playing', pos="r"))

Output:
play
playing
playing
playing

from nltk.stem import WordNetLemmatizer
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print(stemmer.stem('stones'))
print(stemmer.stem('speaking'))
print(stemmer.stem('bedroom'))
print(stemmer.stem('jokes'))
print(stemmer.stem('lisa'))
print(stemmer.stem('purple'))
print('----------------------')
print(lemmatizer.lemmatize('stones'))
print(lemmatizer.lemmatize('speaking'))
print(lemmatizer.lemmatize('bedroom'))
print(lemmatizer.lemmatize('jokes'))
print(lemmatizer.lemmatize('lisa'))
print(lemmatizer.lemmatize('purple'))

Output:
stone
speak
bedroom
joke
lisa
purpl
----------------------
stone
speaking
bedroom
joke
lisa
purple
