Information Retrieval Journal
c = a & b;
System.out.println("a & b = " + c);
c = a | b;
System.out.println("a | b = " + c);
c = a ^ b;
System.out.println("a ^ b = " + c);
c = ~a;
System.out.println("~a = " + c);
c = a << 2;
System.out.println("a << 2 = " + c);
c = a >> 2;
System.out.println("a >> 2 = " + c);
c = a >>> 2;
System.out.println("a >>> 2 = " + c);
}
}
OUTPUT:
PRACTICAL NO 2
Program Statement :- Implement the PageRank algorithm.
Steps:
Step 1: Open cmd and install numpy and scipy:
“pip install numpy”, “pip install scipy”
CODE:
import numpy as np
from scipy.sparse import csc_matrix

def pageRank(G, s=.85, maxerr=.0001):
    n = G.shape[0]
    # transform G into markov matrix A (np.float is removed in newer NumPy, so plain float is used)
    A = csc_matrix(G, dtype=float)
    rsums = np.array(A.sum(1))[:, 0]
    ri, ci = A.nonzero()
    A.data /= rsums[ri]
    # bool array of sink states
    sink = rsums == 0
    # compute pagerank r until we converge
    ro, r = np.zeros(n), np.ones(n)
    while np.sum(np.abs(r - ro)) > maxerr:
        ro = r.copy()
        # calculate each pagerank one state at a time
        for i in range(0, n):
            # inlinks of state i
            Ai = np.array(A[:, i].todense())[:, 0]
            # account for sink states
            Di = sink / float(n)
            # account for teleportation to state i
            Ei = np.ones(n) / float(n)
            r[i] = ro.dot(Ai * s + Di * s + Ei * (1 - s))
    # return normalized pagerank
    return r / float(sum(r))

if __name__ == '__main__':
    # Example extracted from 'Introduction to Information Retrieval'
    G = np.array([[0, 0, 1, 0, 0, 0, 0],
                  [0, 1, 1, 0, 0, 0, 0],
                  [1, 0, 1, 1, 0, 0, 0],
                  [0, 0, 0, 1, 1, 0, 0],
                  [0, 0, 0, 0, 0, 0, 1],
                  [0, 0, 0, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1, 0, 1]])
    print(pageRank(G, s=.86))
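Each pass of the while loop applies the damped random-surfer update: with probability s the surfer follows a link (treating sink pages as linking everywhere), and with probability 1-s it teleports to a random page. As a quick sanity check of the function (an illustrative 3-node cycle, not part of the practical), a symmetric ring graph should give every page roughly the same rank:

# Sanity check (assumed example): a directed 3-node cycle 0 -> 1 -> 2 -> 0.
# By symmetry each page should receive a PageRank of roughly 1/3.
ring = np.array([[0, 1, 0],
                 [0, 0, 1],
                 [1, 0, 0]])
print(pageRank(ring, s=.85))   # expect values close to [0.333, 0.333, 0.333]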
OUTPUT:
PRACTICAL NO.3
Program Statement :- Write a program to compute the Levenshtein (edit) distance between two strings.
CODE:
public class Levenshtein {
    public static int distance(String a, String b) {
        a = a.toLowerCase();
        b = b.toLowerCase();
        // costs[j] holds the edit distance between the prefix of a seen so far and the first j characters of b
        int[] costs = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) costs[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int nw = costs[0];   // value diagonally up-left
            costs[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]),
                        a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                nw = costs[j];
                costs[j] = cj;
            }
        }
        return costs[b.length()];
    }
}
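As a quick check of the method (an illustrative call, not part of the original listing), distance("kitten", "sitting") should return 3: substitute k with s, substitute e with i, and insert g.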
OUTPUT:
PRACTICAL NO: 4
Program Statement :- Write a program to compute the similarity between two text
documents.
Steps:
Open cmd and type the following commands:
“pip install nltk”
“pip install numpy”
CODE:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import numpy as np
import nltk
#nltk.download()
#nltk.download('punkt')
#nltk.download('stopwords')

def process(file):
    raw = open(file).read()
    tokens = word_tokenize(raw)
    words = [w.lower() for w in tokens]
    porter = nltk.PorterStemmer()
    stemmed_tokens = [porter.stem(t) for t in words]
    # removing stop words
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [w for w in stemmed_tokens if w not in stop_words]
    # count words
    count = nltk.defaultdict(int)
    for word in filtered_tokens:
        count[word] += 1
    return count

def cos_sim(a, b):
    dot_product = np.dot(a, b)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    return dot_product / (norm_a * norm_b)

def getSimilarity(dict1, dict2):
    all_words_list = []
    for key in dict1:
        all_words_list.append(key)
    for key in dict2:
        all_words_list.append(key)
    all_words_list_size = len(all_words_list)
    # np.int is removed in newer NumPy, so plain int is used here
    v1 = np.zeros(all_words_list_size, dtype=int)
    v2 = np.zeros(all_words_list_size, dtype=int)
    i = 0
    for key in all_words_list:
        v1[i] = dict1.get(key, 0)
        v2[i] = dict2.get(key, 0)
        i = i + 1
    return cos_sim(v1, v2)

if __name__ == '__main__':
    dict1 = process('C://Users//DELL//Downloads//text1.txt')
    dict2 = process('C://Users//DELL//Downloads//text2.txt')
    print("Similarity between two text documents:", getSimilarity(dict1, dict2))
OUTPUT:
PRACTICAL NO: 5
Program Statement :- Write a map-reduce program to count the number of occurrences
of each alphabetic character in the given dataset. The count for each letter should be
case-insensitive (i.e., include both upper-case and lower-case versions of the letter;
ignore non-alphabetic characters).
Installation of Hadoop:
1) To install Hadoop on your Windows machine, first download and install the latest Java JDK
and set JAVA_HOME to your Java installation path; in this case it is “C:\Java\jdk1.8.0_171”.
The following software should be prepared to install Hadoop 3.1.2 on Windows 10 64-bit:
Download Hadoop 3.1.2 (Link: http://www-eu.apache.org/dist/hadoop/common/hadoop-3.1.2/hadoop-
3.1.2-src.tar.gz).
Java JDK 1.8.0 (Link: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-
2133151.html)
2) Check whether Java 1.8.0 is already installed on your system by running "javac -version".
If Java is not installed, first install it under "C:\Java".
Set the JAVA_HOME environment variable on Windows 10 (see steps 1, 2, 3 and 4 below).
Next, add the Hadoop bin directory and the Java bin directory to the Path environment variable.
4) Configuration:-
Edit the file C:/hadoop-3.1.2/etc/hadoop/core-site.xml, paste the XML below into it, and save the file.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Edit the file C:/hadoop-3.1.2/etc/hadoop/hdfs-site.xml, paste the XML below into it, and save the file.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>C:\hadoop-3.1.2\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop-3.1.2\data\datanode</value>
</property>
</configuration>
Edit the file C:/hadoop-3.1.2/etc/hadoop/yarn-site.xml, paste the XML below into it, and save the file.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Edit the file C:/hadoop-3.1.2/etc/hadoop/hadoop-env.cmd and replace the line set "JAVA_HOME=%JAVA_HOME%"
with set "JAVA_HOME=C:\Java\jdk1.8.0_171" (i.e., the path to the JDK installed in step 1).
5)Hadoop Configuration:-
Download file Hadoop Configuration.zip (Link: https://github.com/MuhammadBilalYar/HADOOP-
INSTALLATION-ON-WINDOW-10/blob/master/Hadoop%20Configuration.zip)
Delete the bin folder at C:\hadoop-3.1.2\bin and replace it with the bin folder from the Hadoop
Configuration.zip just downloaded.
Open cmd and type the command "hdfs namenode -format". You will see the NameNode being formatted.
6) Testing :-
Open cmd, change directory to "C:\hadoop-3.1.2\sbin" and type "start-all.cmd" to start the Hadoop daemons.
Then open http://localhost:9870 in a browser to check the NameNode web UI.
7) Word Count:
Open cmd and go to the Hadoop folder; the commands used to run the job are sketched after the input files below.
iii) Create two Python files, a) mapper.py and b) reducer.py, in C:/hadoop-3.1.2 and write the
following code:
a)mapper.py:
code:
#!/usr/bin/env python
import sys

def read_input(file):
    for line in file:
        # split the line into words
        yield line.split()

def main(separator='\t'):
    # input comes from STDIN (standard input)
    data = read_input(sys.stdin)
    for words in data:
        for word in words:
            print('%s%s%d' % (word, separator, 1))

main()
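The mapper above emits one record per word. To match the practical statement exactly (case-insensitive counts of alphabetic characters only), a variant mapper could emit one record per letter instead; this is a hedged sketch, not part of the original submission:

#!/usr/bin/env python
# Sketch of a per-character mapper (assumption: same key<TAB>1 record format as above).
import sys

def main(separator='\t'):
    for line in sys.stdin:
        for ch in line:
            if ch.isalpha():                                    # ignore non-alphabetic characters
                print('%s%s%d' % (ch.lower(), separator, 1))    # lower-case key => case-insensitive count

main()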
b)reducer.py:
code:
#!/usr/bin/env python
import sys

last_turf = None
turf_count = 0
line1 = []
for l1 in sys.stdin:
    line1.append(l1)
line1.sort()
for line in line1:
    line = line.strip()
    turf, count = line.split("\t")
    count = int(count)
    # sum up counts for consecutive occurrences of the same key
    if turf == last_turf:
        turf_count += count
    else:
        if last_turf is not None:
            print('%s\t%d' % (last_turf, turf_count))
        last_turf = turf
        turf_count = count
# emit the final key
if last_turf is not None:
    print('%s\t%d' % (last_turf, turf_count))
file1.txt:
file2.txt:
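To run the job, the input files are copied into HDFS and the Hadoop streaming jar is invoked with the two scripts. A possible sequence (a sketch; the streaming jar usually sits under share\hadoop\tools\lib in Hadoop 3.1.2, but the exact path and file names may differ on your installation):
“hdfs dfs -mkdir /input”
“hdfs dfs -put file1.txt file2.txt /input”
“hadoop jar share\hadoop\tools\lib\hadoop-streaming-3.1.2.jar -input /input -output /output -mapper "python mapper.py" -reducer "python reducer.py" -file mapper.py -file reducer.py”
“hdfs dfs -cat /output/part-00000”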
OUTPUT:
PRACTICAL NO 6
Program Statement :- Implement a basic IR system using Lucene.
Steps:
Once your project is created successfully, you will have the following content in your Project Explorer −
LuceneConstants.java
This class is used to provide various constants to be used across the sample application.
CODE :
package com.tutorialspoint.lucene;

public class LuceneConstants {
   // field names shared by the Indexer and the Searcher, plus the maximum number of search results
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}
TextFileFilter.java
This class is used as a .txt file filter.
CODE :
package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;

public class TextFileFilter implements FileFilter {
   @Override
   public boolean accept(File pathname) {
      return pathname.getName().toLowerCase().endsWith(".txt");
   }
}
Indexer.java
This class is used to index the raw data so that we can make it searchable using the Lucene library.
CODE :
package com.tutorialspoint.lucene;
import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
Directory indexDirectory =
FSDirectory.open(new File(indexDirectoryPath));
writer.close();
file.getName(),Field.Store.YES,Field.Index.NOT_ANALYZED);
file.getCanonicalPath(),Field.Store.YES,Field.Index.NOT_ANALYZED);
document.add(contentField);
document.add(fileNameField);
document.add(filePathField);
return document;
System.out.println("Indexing "+file.getCanonicalPath());
writer.addDocument(document);
throws IOException {
&& !file.isHidden()
&& file.exists()
&& file.canRead()
&& filter.accept(file)
){
indexFile(file);
return writer.numDocs();
Searcher.java
This class is used to search the indexes created by the Indexer to search the requested content.
CODE :
package com.tutorialspoint.lucene;
import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
QueryParser queryParser;
Query query;
throws IOException {
Directory indexDirectory =
FSDirectory.open(new File(indexDirectoryPath));
LuceneConstants.CONTENTS,
new StandardAnalyzer(Version.LUCENE_35));
query = queryParser.parse(searchQuery);
return indexSearcher.doc(scoreDoc.doc);
indexSearcher.close();
LuceneTester.java
This class is used to test the indexing and search capability of the Lucene library.
CODE :
package com.tutorialspoint.lucene;
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
Indexer indexer;
Searcher searcher;
LuceneTester tester;
try {
tester.createIndex();
tester.search("Mohan");
} catch (IOException e) {
e.printStackTrace();
} catch (ParseException e) {
e.printStackTrace();
int numIndexed;
indexer.close();
+(endTime-startTime)+" ms");
}
private void search(String searchQuery) throws IOException, ParseException {
System.out.println(hits.totalHits +
System.out.println("File: "
+ doc.get(LuceneConstants.FILE_PATH));
searcher.close();
OUTPUT:
Once you've run the program successfully, you will have the following content in your index
directory –
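For this version of Lucene, the index directory typically contains a segments.gen and a segments_N file together with per-segment files such as _0.fdt, _0.fdx, _0.fnm, _0.frq, _0.nrm, _0.prx, _0.tii and _0.tis; the exact set of files may vary.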
OUTPUT:
PRACTICAL NO 7
Program Statement :- Write a program for Pre-processing of a Text Document: stop
word removal.
Steps :-
Step 1: Open cmd and install nltk:
“pip install nltk”
Then, from a Python shell, download the required NLTK data once: nltk.download('punkt') and nltk.download('stopwords').
CODE:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
input_str = "NLTK is a leading platform for building Python programs to work with human language data."
stop_words = set(stopwords.words('english'))
tokens = word_tokenize(input_str)
result = [i for i in tokens if i not in stop_words]
print(result)
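With the standard NLTK English stop-word list, the printed result should be roughly ['NLTK', 'leading', 'platform', 'building', 'Python', 'programs', 'work', 'human', 'language', 'data', '.']: the lower-case function words is, a, for, to and with are removed, while the content words and the punctuation token are kept.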
OUTPUT:
PRACTICAL NO:8
Program Statement :- Write a program for mining Twitter to identify tweets for a specific
period and to identify trends and named entities.
Steps :-
Install a few packages to create this:
tweepy, twitter, tkinter, textblob, matplotlib.
App created:
Generate a consumer key, consumer secret, access token and access token secret:
CODE :
import tweepy
from tkinter import *
from time import sleep
from datetime import datetime
from textblob import TextBlob
import matplotlib.pyplot
import twitter

def load_api():
    consumer_key = '8lGhUCuzaOprx5uYtkaGazqpj'
    consumer_secret = 'fv8bRgDSusvjQSMNgvSmL4CYTMy3Acc0NlesCgRQPvVapRzr4e'
    access_token = '1090831287902846976-OiCwMRqiWGa3aXrJsjDVQQi5sZqdNK'
    access_token_secret = 'Lxt4A9w3uq42h7cKRj9HAuYvtMotGptItqL1B9GBY9YpY'
    #auth=tweepy.OAuthHandler(consumer_key,consumer_secret)
    #auth.set_access_token(access_token,access_token_secret)
    #api=tweepy.API(auth)
    auth = twitter.oauth.OAuth(access_token, access_token_secret, consumer_key, consumer_secret)
    twitter_api = twitter.Twitter(auth=auth)
    return twitter_api

def getE1():
    return E1.get()

def getE2():
    return E2.get()

def getData():
    #getE1()
    keyword = getE1()
    #getE2()
    numberOfTweets = getE2()
    numberOfTweets = int(numberOfTweets)
    twitter_api = load_api()
    w_wo = 1            # WOEID 1 = worldwide trends
    us_woh = 23424977   # WOEID of the United States
    wTrend = twitter_api.trends.place(_id=w_wo)
    UTrend = twitter_api.trends.place(_id=us_woh)
    #abc=wTrend[3]
    print(wTrend)
    print(UTrend)

root = Tk()
label1 = Label(root, text="Search")
E1 = Entry(root, bd=5)
label2 = Label(root, text="Sample Size")
E2 = Entry(root, bd=5)
submit = Button(root, text="Submit", command=getData)
label1.pack()
E1.pack()
label2.pack()
E2.pack()
submit.pack(side=BOTTOM)
root.mainloop()
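The two print calls in getData() dump the raw JSON returned by the trends/place endpoint. Assuming the standard response shape (a list whose first element holds a 'trends' key), the trend names alone could be pulled out inside getData() with, for example: names = [t['name'] for t in wTrend[0]['trends']].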
OUTPUT:
PRACTICAL NO 9:
Program Statement :- Write a program to implement simple web crawler.
Steps :
Step 1: Open cmd and install requests and bs4:
“pip install requests”, “pip install bs4”
CODE:
import requests
from bs4 import BeautifulSoup

def web(page, WebUrl):
    if(page > 0):
        url = WebUrl
        code = requests.get(url)
        plain = code.text
        s = BeautifulSoup(plain, "html.parser")
        for link in s.findAll('a', {'class': 's-access-detail-page'}):
            tet = link.get('title')
            print(tet)
            tet_2 = link.get('href')
            print(tet_2)

web(1,'http://www.amazon.in/s/ref=s9_acss_bw_cts_VodooFS_T4_w?rh=i%3Aelectronics%2Cn%3A976419031%2Cn%3A%21976420031%2Cn%3A1389401031%2Cn%3A1389432031%2Cn%3A1805560031%2Cp_98%3A10440597031%2Cp_36%3A1500000-99999999&bbn=1805560031&rw_html_to_wsrp=1&pf_rd_m=A1K21FY43GMZF8&pf_rd_s=merchandised-search-3&pf_rd_r=2EKZMFFDEXJ5HE8RVV6E&pf_rd_t=101&pf_rd_p=c92c2f88-469b-4b56-')
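The CSS class 's-access-detail-page' is specific to the Amazon results page above and may change over time; the same crawling pattern works for any page. A minimal sketch (https://example.com is an illustrative URL, not part of the practical) that prints every hyperlink on a page:

# Minimal sketch of the same pattern on an arbitrary page (illustrative URL).
import requests
from bs4 import BeautifulSoup

def crawl_links(url):
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for link in soup.find_all('a', href=True):   # every anchor tag that carries an href
        print(link.get('href'))

crawl_links('https://example.com')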
OUTPUT:
Practical No:10
Program Statement : Write a program to parse XML text, generate a web graph and
compute topic-specific PageRank.
Steps:
Step 1:
Open cmd and type following command:
“pip install requests”
(The csv and xml.etree modules used below are part of the Python standard library, so no separate install is needed for them.)
CODE:
import csv
import requests
import xml.etree.ElementTree as ET

def loadRSS():
    url = 'http://www.hindustantimes.com/rss/topnews/rssfeed.xml'
    resp = requests.get(url)
    with open('topnewsfeed.xml', 'wb') as f:
        f.write(resp.content)

def parseXML(xmlfile):
    tree = ET.parse(xmlfile)
    root = tree.getroot()
    newsitems = []
    for item in root.findall('./channel/item'):
        news = {}
        for child in item:
            if child.tag == '{http://search.yahoo.com/mrss/}content':
                news['media'] = child.attrib['url']
            else:
                news[child.tag] = child.text.encode('utf8')
        newsitems.append(news)
    return newsitems

def savetoCSV(newsitems, filename):
    fields = ['guid', 'title', 'pubDate', 'description', 'link', 'media']
    with open(filename, 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fields)
        writer.writeheader()
        writer.writerows(newsitems)

loadRSS()
newsitems = parseXML('topnewsfeed.xml')
savetoCSV(newsitems, 'topnews.csv')

def generate_edges(graph):
    edges = []
    for node in graph:
        for neighbour in graph[node]:
            edges.append((node, neighbour))
    return edges
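The listing above downloads and parses the RSS feed and defines generate_edges, but stops short of the web-graph and topic-specific PageRank part of the statement. A minimal sketch of that part follows; the small link graph and the topic set are illustrative assumptions, and restricting the teleport vector to the topic pages is what makes the PageRank topic-specific:

import numpy as np

# Illustrative web graph: node -> list of outlinks (assumed example, not real data)
graph = {0: [1, 2], 1: [2], 2: [0, 3], 3: [2]}
print(generate_edges(graph))              # edge list of the web graph

topic = {2, 3}                            # assumed set of pages relevant to the topic
n = len(graph)

# Row-stochastic link matrix built from the graph
A = np.zeros((n, n))
for node, neighbours in graph.items():
    for nb in neighbours:
        A[node, nb] = 1.0 / len(neighbours)

# Teleport only to topic pages instead of uniformly over all pages
v = np.array([1.0 if i in topic else 0.0 for i in range(n)])
v /= v.sum()

s, r = 0.85, np.ones(n) / n
for _ in range(100):                      # power iteration
    r = s * A.T.dot(r) + (1 - s) * v

print("Topic-specific PageRank:", r / r.sum())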
OUTPUT:
topnewsfeed.xml:
topnews.csv: