
DWM Exp10

This notebook fetches the Wikipedia article on artificial intelligence, extracts its paragraph text with Beautiful Soup, and summarizes it with NLTK. It cleans the text, tokenizes it into sentences, computes normalized word frequencies, scores each sentence by the weights of its words, and joins the seven highest-scoring sentences into a summary.

Uploaded by

Temp

Untitled4.ipynb - Colaboratory https://colab.research.google.com/drive/1mBgeBK93...

pip install beautifulsoup4

Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.10/dist-pa


Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-pac

pip install lxml

Requirement already satisfied: lxml in /usr/local/lib/python3.10/dist-packages (4.

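import re
import urllib.request

import bs4 as bs
import nltk

# Download the NLTK data used below: the stopword list and the Punkt sentence tokenizer.
nltk.download('stopwords')
nltk.download('punkt')

# Fetch the raw HTML of the Wikipedia article (the URL is cut off in the source).
scraped_data = urllib.request.urlopen('https://en.wikipedia.org/wiki/Artificial_int
article = scraped_data.read()

# Parse the HTML with the lxml parser and collect every <p> element.
parsed_article = bs.BeautifulSoup(article, 'lxml')
paragraphs = parsed_article.find_all('p')

# Concatenate the text of all paragraphs into one string.
article_text = ""
for p in paragraphs:
    article_text += p.text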

[nltk_data] Downloading package stopwords to /root/nltk_data...


[nltk_data] Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!

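# Replace numeric citation markers such as [12] with a space.
article_text = re.sub(r'\[[0-9]*\]', ' ', article_text)
# Collapse runs of whitespace into a single space.
article_text = re.sub(r'\s+', ' ', article_text)
print(article_text)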

Artificial intelligence (AI) is the intelligence of machines or software, as oppo
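The two substitutions above can be tried on a short string without fetching anything. This is a minimal sketch; the `sample` text is a made-up stand-in for the scraped article, not data from the notebook.

```python
import re

# Hypothetical sample standing in for the scraped article text.
sample = "AI is intelligence shown by machines.[1]  It is also\na field of study.[23][a]"

# Replace numeric citation markers such as [1] or [23] with a space.
no_citations = re.sub(r'\[[0-9]*\]', ' ', sample)

# Collapse every run of whitespace (spaces, newlines, tabs) into one space.
cleaned = re.sub(r'\s+', ' ', no_citations)

print(cleaned)
```

Because the pattern only allows digits between the brackets, a non-numeric marker such as `[a]` is left in place.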

# Keep only ASCII letters; this copy is used for word counting.
formatted_article_text = re.sub(r'[^a-zA-Z]', ' ', article_text)
formatted_article_text = re.sub(r'\s+', ' ', formatted_article_text)
print(formatted_article_text)

Artificial intelligence AI is the intelligence of machines or software as opposed
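The letters-only pattern is blunt: it drops digits, punctuation, and any non-ASCII letter. A small sketch on an invented sentence (the `sample` string is an assumption for illustration) shows what that does to tokens:

```python
import re

# Hypothetical sample; \u00ef is the letter i with a diaeresis, as in "naive".
sample = "In 2023, GPT-4's na\u00efve baseline scored 90%."

# Replace every character that is not an ASCII letter with a space.
letters_only = re.sub(r'[^a-zA-Z]', ' ', sample)

# Collapse the resulting runs of spaces.
letters_only = re.sub(r'\s+', ' ', letters_only)

print(repr(letters_only))
```

Stray fragments such as `s` and `ve` then count as words of their own, which is one reason the word-frequency step is only a rough measure.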

# Split the cleaned text (punctuation intact) into sentences.
sentence_list = nltk.sent_tokenize(article_text)
print(sentence_list)

['\nArtificial intelligence (AI) is the intelligence of machines or software, as o


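stopwords = nltk.corpus.stopwords.words('english')

# Count how often each non-stopword occurs in the letters-only text.
# The comparison is case-sensitive, so capitalized forms such as 'The' are not filtered out.
word_frequencies = {}
for word in nltk.word_tokenize(formatted_article_text):
    if word not in stopwords:
        if word not in word_frequencies:
            word_frequencies[word] = 1
        else:
            word_frequencies[word] += 1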
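The counting loop can also be written with `collections.Counter`. The sketch below is self-contained, so it makes two substitutions that are not in the notebook: a tiny hand-written `stop_words` set instead of NLTK's English list, and `str.split` instead of `nltk.word_tokenize`.

```python
from collections import Counter

# Hypothetical stand-ins for NLTK's stopword list and tokenizer.
stop_words = {"the", "is", "of", "or", "as"}
text = "AI is the intelligence of machines or software AI research studies machines"

# Count every token that is not a stopword.
word_frequencies = Counter(w for w in text.split() if w not in stop_words)

print(word_frequencies)
```

A `Counter` behaves like a dict, so the later normalization and scoring steps would work on it unchanged.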

# Normalize each count by the highest count so that weights fall in (0, 1].
maximum_frequency = max(word_frequencies.values())

for word in word_frequencies:
    word_frequencies[word] = word_frequencies[word] / maximum_frequency

print(word_frequencies)

{'Artificial': 0.058823529411764705, 'intelligence': 0.3697478991596639, 'AI': 1.0
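The normalization divides every count by the largest one, so the most frequent word gets weight 1.0 (as `'AI'` does in the output above). A minimal sketch with invented counts:

```python
# Hypothetical raw counts, chosen so the arithmetic is easy to follow.
word_counts = {"AI": 4, "learning": 2, "search": 1}

# The most frequent word defines the scale.
maximum_frequency = max(word_counts.values())

# Build the weights in a new dict instead of overwriting the counts.
word_weights = {word: count / maximum_frequency for word, count in word_counts.items()}

print(word_weights)  # {'AI': 1.0, 'learning': 0.5, 'search': 0.25}
```

Writing the weights to a separate dict keeps the raw counts available, which also makes the cell safe to run twice.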

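# Score each sentence shorter than 30 words by summing the weights of its words.
# Tokens are lower-cased here, while the keys of word_frequencies keep their
# original case, so capitalized keys such as 'AI' never match.
sentence_scores = {}
for sent in sentence_list:
    for word in nltk.word_tokenize(sent.lower()):
        if word in word_frequencies:
            if len(sent.split(' ')) < 30:
                if sent not in sentence_scores:
                    sentence_scores[sent] = word_frequencies[word]
                else:
                    sentence_scores[sent] += word_frequencies[word]

print(sentence_scores)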

{'\nArtificial intelligence (AI) is the intelligence of machines or software, as o
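The scoring cell lower-cases each sentence before looking words up, while the printed `word_frequencies` keys keep their capitalization (`'Artificial'`, `'AI'`), so those keys cannot match. One possible variant, sketched here under assumptions, keeps both sides lower-case. The weights, the sentences, the helper name `score_sentence`, and the punctuation stripping (a rough stand-in for `nltk.word_tokenize`) are all hypothetical.

```python
# Hypothetical weights with lower-case keys, so lower-cased tokens can match.
word_weights = {"ai": 1.0, "machines": 0.5, "learning": 0.25}

sentences = [
    "AI is the intelligence of machines.",
    "Learning matters.",
    "Nothing relevant here.",
]

def score_sentence(sentence, weights, max_words=30):
    """Sum the weights of known words; return None for sentences of max_words or more."""
    words = sentence.split(' ')
    if len(words) >= max_words:
        return None
    # Strip surrounding punctuation and lower-case each word before the lookup.
    tokens = [w.strip('.,;:!?()"\'').lower() for w in words]
    return sum(weights.get(t, 0.0) for t in tokens)

sentence_scores = {}
for s in sentences:
    score = score_sentence(s, word_weights)
    # As in the original loop, keep only sentences with at least one weighted word.
    if score:
        sentence_scores[s] = score

print(sentence_scores)
```

Applying this to the notebook would also require lower-casing the words when `word_frequencies` is built, which would change the printed outputs shown here.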

import heapq

# Pick the 7 highest-scoring sentences and join them into the summary.
summary_sentences = heapq.nlargest(7, sentence_scores, key=sentence_scores.get)

summary = ' '.join(summary_sentences)
print(summary)

[64]
A machine with artificial general intelligence should be able to solve a wide vari
Deep learning has drastically improved the performance of programs in many importa
and others. [45] Deep learning uses artificial neural networks for all of these ty
Artificial intelligence (AI) is the intelligence of machines or software, as oppos
Many researchers began to doubt that the current practices would be able to imitat
Learning algorithms for neural networks use local search to choose the weights tha
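`heapq.nlargest` iterates over the dict's keys and ranks them by the value that `key=sentence_scores.get` returns, so the summary lists sentences from highest to lowest score rather than in article order. The sketch below uses invented sentences and scores to show both orderings. Restoring document order is an optional variation, not something the notebook does.

```python
import heapq

# Hypothetical scores; dict insertion order stands in for article order.
sentence_scores = {
    "First sentence.": 0.9,
    "Second sentence.": 0.4,
    "Third sentence.": 1.5,
}

# Rank the keys (sentences) by their score and keep the top two.
top_two = heapq.nlargest(2, sentence_scores, key=sentence_scores.get)
summary = ' '.join(top_two)
print(summary)  # Third sentence. First sentence.

# Optional: present the chosen sentences in their original order instead.
in_document_order = [s for s in sentence_scores if s in top_two]
print(' '.join(in_document_order))  # First sentence. Third sentence.
```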
