Practical No. 9

Aim: Calculate PageRank along with Hub and Authority scores.

Theory:

PageRank:

PageRank ranks web pages by their importance in the hyperlink graph: a page is considered important if it is linked to by other important pages, and importance is propagated iteratively along incoming links.

Hubs and Authorities (HITS):

Hubs and Authorities, also known as HITS (Hyperlink-Induced Topic Search) or Kleinberg's algorithm, is another link-analysis algorithm for ranking web pages. It identifies two types of nodes in a network: hubs and authorities.

Hubs are nodes that point to many good authorities. They act as directories or resource lists for broad information.

Authorities are nodes that are pointed to by many good hubs. They are seen as authoritative sources on a specific topic.

The HITS algorithm iteratively updates hub and authority scores based on the link structure of the
graph until convergence.
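
For intuition, the update rule can be written out directly. The snippet below is a minimal, illustrative power-iteration sketch of HITS (not the networkx routine used in the source code); on the small example graph it converges to approximately the same hub and authority scores shown in the output below.

# Illustrative HITS power iteration (a sketch, not the networkx implementation)
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
nodes = {n for edge in edges for n in edge}

hub = {n: 1.0 for n in nodes}
auth = {n: 1.0 for n in nodes}

for _ in range(100):  # a fixed iteration count stands in for a convergence test
    # Authority score: sum of the hub scores of the nodes pointing to this node
    auth = {n: sum(hub[u] for u, v in edges if v == n) for n in nodes}
    # Hub score: sum of the authority scores of the nodes this node points to
    hub = {n: sum(auth[v] for u, v in edges if u == n) for n in nodes}
    # Normalize each set of scores to sum to 1 (as nx.hits reports them)
    a_total, h_total = sum(auth.values()), sum(hub.values())
    auth = {n: s / a_total for n, s in auth.items()}
    hub = {n: s / h_total for n, s in hub.items()}

print("Hub Scores:", hub)
print("Authority Scores:", auth)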

Source Code:

import networkx as nx

# Create a directed graph (replace this with your own graph)
G = nx.DiGraph()
G.add_edges_from([(1, 2), (1, 3), (2, 3), (3, 1)])

# Calculate PageRank
pagerank_scores = nx.pagerank(G)

# Calculate HITS (hub and authority) scores
hits_scores = nx.hits(G)

# Print the results
print("PageRank Scores:", pagerank_scores)
print("Hub Scores:", hits_scores[0])
print("Authority Scores:", hits_scores[1])

Output:

runfile('C:/Users/ckt/untitled0.py', wdir='C:/Users/ckt')

PageRank Scores: {1: 0.387789442707259, 2: 0.21481051315058508, 3: 0.3974000441421556}

Hub Scores: {1: 0.6180339887498948, 2: 0.38196601125010515, 3: 0.0}

Authority Scores: {1: 0.0, 2: 0.3819660112501052, 3: 0.6180339887498949}

Practical No. 10

Aim: Demonstrate a simple web scraping process using Python within the Spyder environment.

Theory:

Web scraping is the process of extracting data or information from websites. It involves accessing
and retrieving the content of web pages, parsing the HTML or XML structure of the page, and then
extracting the desired information. Web scraping is commonly used for various purposes, including
data mining, data analysis, and content aggregation.

The code below retrieves the HTML content of a specified URL, parses it with the BeautifulSoup library, extracts the text content, and prints it to the console. It serves as a basic framework that can be extended and adapted for more specific data-extraction needs, and it also demonstrates the importance of checking the HTTP response status code before any further processing.
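
As an illustration of how this framework can be adapted to pull specific elements rather than all of the page text, the following sketch collects the hyperlinks on a page. The URL is a placeholder chosen for this sketch, and the exact tags worth extracting depend on the HTML structure of the target page.

import requests
from bs4 import BeautifulSoup

# Illustrative extension: extract all hyperlinks instead of the plain text.
# 'https://example.com' is a placeholder URL used only for this sketch.
response = requests.get('https://example.com')

if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    # find_all('a') returns every anchor tag; get('href') reads its link target
    links = [a.get('href') for a in soup.find_all('a') if a.get('href')]
    print(links)
else:
    print(f"Error: Unable to fetch content. Status code: {response.status_code}")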

Source Code:

import requests
from bs4 import BeautifulSoup

# Specify the URL you want to scrape
url = 'https://google.com'

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find and print the text content (modify as needed based on the HTML structure)
    text_content = soup.get_text()
    print(text_content)
else:
    print(f"Error: Unable to fetch content. Status code: {response.status_code}")

Output:

runfile('C:/Users/ckt/untitled2.py', wdir='C:/Users/ckt')

GoogleSearch Images Maps Play YouTube News Gmail Drive More »Web History | Settings | Sign in
Advanced searchGoogle offered in: हिन्दी বাংলা తెలుగు मराठी தமிழ் ગુજરાતી ಕನ್ನಡ
മലയാളം ਪੰਜਾਬੀ AdvertisingBusiness SolutionsAbout GoogleGoogle.co.in© 2024 - Privacy - Terms

Practical No. 11

Aim: Write a Python program to perform N-gram analysis, specifically focusing on unigrams,
bigrams, and trigrams, using the Natural Language Toolkit (NLTK).

Theory:
In natural language processing and information retrieval, N-grams are contiguous sequences of 'n'
items from a given sample of text or speech. Unigrams, bigrams, and trigrams are specific cases
where 'n' is set to 1, 2, and 3, respectively.

1. Unigrams:

Unigrams are the simplest form of N-grams, representing single words. They capture the most basic
lexical information from a text. In the provided code, unigrams are generated by tokenizing the input
text, resulting in a list of individual words.

2. Bigrams:

Bigrams represent pairs of adjacent words in a sequence. They provide a bit more context than
unigrams by considering the relationships between consecutive words. In the code, bigrams are
created by sliding a window of size 2 over the list of tokens.

3. Trigrams:

Trigrams extend the concept to triples of consecutive words. They offer a higher level of context
compared to bigrams and provide more insight into the structure and flow of language. Trigrams in
the code are generated by considering three consecutive words at a time.
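
The sliding-window construction described above can also be written out by hand. The short sketch below is illustrative (the helper name make_ngrams is chosen here and is not part of NLTK); it produces the same tuples that nltk.util.ngrams returns.

# Illustrative sliding-window N-gram construction (make_ngrams is a name used
# only in this sketch; nltk.util.ngrams does the equivalent internally).
def make_ngrams(tokens, n):
    # Slide a window of size n over the token list and collect each window as a tuple
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ['this', 'is', 'a', 'sample']
print(make_ngrams(tokens, 2))  # [('this', 'is'), ('is', 'a'), ('a', 'sample')]
print(make_ngrams(tokens, 3))  # [('this', 'is', 'a'), ('is', 'a', 'sample')]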

Source Code:

import nltk
from nltk import word_tokenize
from nltk.util import ngrams

# Sample text
text = "This is a sample text for unigram, bigram, and trigram extraction using NLTK."

# Tokenize the text
tokens = word_tokenize(text.lower())  # Converting to lowercase for consistency

# Unigrams
unigrams = list(ngrams(tokens, 1))

# Bigrams
bigrams = list(ngrams(tokens, 2))

# Trigrams
trigrams = list(ngrams(tokens, 3))

# Print the results
print("Original Text:", text)
print("\nUnigrams:", unigrams)
print("\nBigrams:", bigrams)
print("\nTrigrams:", trigrams)

Output:

Original Text: This is a sample text for unigram, bigram, and trigram extraction using NLTK.

Unigrams: [('this',), ('is',), ('a',), ('sample',), ('text',), ('for',), ('unigram',), (',',), ('bigram',), (',',), ('and',),
('trigram',), ('extraction',), ('using',), ('nltk',), ('.',)]

Bigrams: [('this', 'is'), ('is', 'a'), ('a', 'sample'), ('sample', 'text'), ('text', 'for'), ('for', 'unigram'),
('unigram', ','), (',', 'bigram'), ('bigram', ','), (',', 'and'), ('and', 'trigram'), ('trigram', 'extraction'),
('extraction', 'using'), ('using', 'nltk'), ('nltk', '.')]

Trigrams: [('this', 'is', 'a'), ('is', 'a', 'sample'), ('a', 'sample', 'text'), ('sample', 'text', 'for'), ('text', 'for',
'unigram'), ('for', 'unigram', ','), ('unigram', ',', 'bigram'), (',', 'bigram', ','), ('bigram', ',', 'and'), (',', 'and',
'trigram'), ('and', 'trigram', 'extraction'), ('trigram', 'extraction', 'using'), ('extraction', 'using', 'nltk'),
('using', 'nltk', '.')]

Practical No. 12

Aim: Write a Python program to evaluate the performance of an information retrieval model
using standard evaluation metrics.

Theory:

Information retrieval (IR) model evaluation is crucial for assessing the effectiveness of algorithms in
retrieving relevant information from large datasets. Several key metrics are commonly used to
measure the performance of these models. In the context of the provided program, three
fundamental metrics—Precision, Recall, and F1 Score—are employed.

1. Precision:

Precision is a metric that quantifies the accuracy of the positive predictions made by a retrieval
model. It is calculated as the ratio of true positives to the sum of true positives and false positives.

Precision is particularly relevant in scenarios where the cost of false positives is high, and there is a
need for confidence in the relevance of retrieved documents.

2. Recall:

Recall, also known as sensitivity or true positive rate, measures the ability of a retrieval model to
capture all relevant documents. It is calculated as the ratio of true positives to the sum of true
positives and false negatives.

3. F1 Score:

The F1 Score is the harmonic mean of precision and recall, providing a balanced measure of a
model's overall performance. It takes both false positives and false negatives into account, making it
suitable for scenarios where precision and recall need to be balanced.
Source Code:

from sklearn.metrics import precision_score, recall_score, f1_score

# Sample data (ground truth and predicted relevance)
ground_truth = [1, 0, 1, 0, 1, 1, 0, 0, 1, 1]  # Binary relevance labels (1: relevant, 0: non-relevant)
predicted_relevance = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]  # Binary predictions

# Calculate evaluation metrics
precision = precision_score(ground_truth, predicted_relevance)
recall = recall_score(ground_truth, predicted_relevance)
f1 = f1_score(ground_truth, predicted_relevance)

# Print the results
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

Output:

runfile('C:/Users/ckt/untitled4.py', wdir='C:/Users/ckt')

Precision: 0.6666666666666666

Recall: 0.6666666666666666

F1 Score: 0.6666666666666666
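
As a quick check, these values can be reproduced by hand from the sample data: there are 4 true positives, 2 false positives, and 2 false negatives, so precision = 4/6, recall = 4/6, and the F1 score (their harmonic mean) is also 4/6 ≈ 0.667. A minimal sketch of this manual computation, using the same lists as in the source code:

# Manual verification of the sklearn results (illustrative sketch, same sample data as above)
ground_truth = [1, 0, 1, 0, 1, 1, 0, 0, 1, 1]
predicted_relevance = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]

# Count true positives, false positives, and false negatives
tp = sum(1 for g, p in zip(ground_truth, predicted_relevance) if g == 1 and p == 1)
fp = sum(1 for g, p in zip(ground_truth, predicted_relevance) if g == 0 and p == 1)
fn = sum(1 for g, p in zip(ground_truth, predicted_relevance) if g == 1 and p == 0)

precision = tp / (tp + fp)                          # 4 / 6
recall = tp / (tp + fn)                             # 4 / 6
f1 = 2 * precision * recall / (precision + recall)  # also 4 / 6

print("TP, FP, FN:", tp, fp, fn)                    # 4 2 2
print("Precision:", precision, "Recall:", recall, "F1:", f1)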
