Practical No. 9

Aim: Calculate PageRank along with Hub and Authority scores.

Theory:

PageRank:

PageRank ranks web pages by their importance in the hyperlink graph: a page is considered important if it is linked to by other important pages, and importance is propagated iteratively along incoming links.

Hubs and Authorities (HITS):

Hubs and Authorities, also known as HITS (Hyperlink-Induced Topic Search) or Kleinberg's algorithm, is another link-analysis algorithm for ranking web pages. It identifies two types of nodes in a network: hubs and authorities.

Hubs are nodes that point to many good authorities. They act as directories or resource lists for broad information.

Authorities are nodes that are pointed to by many good hubs. They are seen as authoritative sources on a specific topic.

The HITS algorithm iteratively updates hub and authority scores based on the link structure of the
graph until convergence.
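
For intuition, the update rule can be written out directly. The snippet below is a minimal, illustrative power-iteration sketch of HITS (not the networkx routine used in the source code); on the small example graph it converges to approximately the same hub and authority scores shown in the output below.

# Illustrative HITS power iteration (a sketch, not the networkx implementation)
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
nodes = {n for edge in edges for n in edge}

hub = {n: 1.0 for n in nodes}
auth = {n: 1.0 for n in nodes}

for _ in range(100):  # a fixed iteration count stands in for a convergence test
    # Authority score: sum of the hub scores of the nodes pointing to this node
    auth = {n: sum(hub[u] for u, v in edges if v == n) for n in nodes}
    # Hub score: sum of the authority scores of the nodes this node points to
    hub = {n: sum(auth[v] for u, v in edges if u == n) for n in nodes}
    # Normalize each set of scores to sum to 1 (as nx.hits reports them)
    a_total, h_total = sum(auth.values()), sum(hub.values())
    auth = {n: s / a_total for n, s in auth.items()}
    hub = {n: s / h_total for n, s in hub.items()}

print("Hub Scores:", hub)
print("Authority Scores:", auth)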

Source Code:

import networkx as nx

# Create a directed graph (replace this with your own graph)
G = nx.DiGraph()
G.add_edges_from([(1, 2), (1, 3), (2, 3), (3, 1)])

# Calculate PageRank
pagerank_scores = nx.pagerank(G)

# Calculate HITS (hub and authority) scores
hits_scores = nx.hits(G)

# Print the results
print("PageRank Scores:", pagerank_scores)
print("Hub Scores:", hits_scores[0])
print("Authority Scores:", hits_scores[1])

Output:

runfile('C:/Users/ckt/untitled0.py', wdir='C:/Users/ckt')

PageRank Scores: {1: 0.387789442707259, 2: 0.21481051315058508, 3: 0.3974000441421556}

Hub Scores: {1: 0.6180339887498948, 2: 0.38196601125010515, 3: 0.0}

Authority Scores: {1: 0.0, 2: 0.3819660112501052, 3: 0.6180339887498949}

Practical No. 10

Aim: Demonstrate a simple web scraping process using Python within the Spyder environment.

Theory:

Web scraping is the process of extracting data or information from websites. It involves accessing
and retrieving the content of web pages, parsing the HTML or XML structure of the page, and then
extracting the desired information. Web scraping is commonly used for various purposes, including
data mining, data analysis, and content aggregation.

The code below retrieves the HTML content of a specified URL, parses it with the BeautifulSoup library, extracts the text content, and prints it to the console. It serves as a basic framework that can be extended and adapted for more specific data-extraction needs, and it also demonstrates the importance of checking the HTTP response status code before any further processing.
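
As an illustration of how this framework can be adapted to pull specific elements rather than all of the page text, the following sketch collects the hyperlinks on a page. The URL is a placeholder chosen for this sketch, and the exact tags worth extracting depend on the HTML structure of the target page.

import requests
from bs4 import BeautifulSoup

# Illustrative extension: extract all hyperlinks instead of the plain text.
# 'https://example.com' is a placeholder URL used only for this sketch.
response = requests.get('https://example.com')

if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    # find_all('a') returns every anchor tag; get('href') reads its link target
    links = [a.get('href') for a in soup.find_all('a') if a.get('href')]
    print(links)
else:
    print(f"Error: Unable to fetch content. Status code: {response.status_code}")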

Source Code:

import requests
from bs4 import BeautifulSoup

# Specify the URL you want to scrape
url = 'https://google.com'

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find and print the text content (modify as needed based on the HTML structure)
    text_content = soup.get_text()
    print(text_content)
else:
    print(f"Error: Unable to fetch content. Status code: {response.status_code}")

Output:

runfile('C:/Users/ckt/untitled2.py', wdir='C:/Users/ckt')

GoogleSearch Images Maps Play YouTube News Gmail Drive More »Web History | Settings | Sign in
Advanced searchGoogle offered in: हिन्दी বাংলা తెలుగు मराठी தமிழ் ગુજરાતી ಕನ್ನಡ
മലയാളം ਪੰਜਾਬੀ AdvertisingBusiness SolutionsAbout GoogleGoogle.co.in© 2024 - Privacy - Terms

Practical No. 11

Aim: Write a Python program to perform N-gram analysis, specifically focusing on unigrams,
bigrams, and trigrams, using the Natural Language Toolkit (NLTK).

Theory:
In natural language processing and information retrieval, N-grams are contiguous sequences of 'n'
items from a given sample of text or speech. Unigrams, bigrams, and trigrams are specific cases
where 'n' is set to 1, 2, and 3, respectively.

1. Unigrams:

Unigrams are the simplest form of N-grams, representing single words. They capture the most basic
lexical information from a text. In the provided code, unigrams are generated by tokenizing the input
text, resulting in a list of individual words.

2. Bigrams:

Bigrams represent pairs of adjacent words in a sequence. They provide a bit more context than
unigrams by considering the relationships between consecutive words. In the code, bigrams are
created by sliding a window of size 2 over the list of tokens.

3. Trigrams:

Trigrams extend the concept to triples of consecutive words. They offer a higher level of context
compared to bigrams and provide more insight into the structure and flow of language. Trigrams in
the code are generated by considering three consecutive words at a time.
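
The sliding-window construction described above can also be written out by hand. The short sketch below is illustrative (the helper name make_ngrams is chosen here and is not part of NLTK); it produces the same tuples that nltk.util.ngrams returns.

# Illustrative sliding-window N-gram construction (make_ngrams is a name used
# only in this sketch; nltk.util.ngrams does the equivalent internally).
def make_ngrams(tokens, n):
    # Slide a window of size n over the token list and collect each window as a tuple
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ['this', 'is', 'a', 'sample']
print(make_ngrams(tokens, 2))  # [('this', 'is'), ('is', 'a'), ('a', 'sample')]
print(make_ngrams(tokens, 3))  # [('this', 'is', 'a'), ('is', 'a', 'sample')]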

Source Code:

import nltk
from nltk import word_tokenize
from nltk.util import ngrams

# Sample text
text = "This is a sample text for unigram, bigram, and trigram extraction using NLTK."

# Tokenize the text
tokens = word_tokenize(text.lower())  # Converting to lowercase for consistency

# Unigrams
unigrams = list(ngrams(tokens, 1))

# Bigrams
bigrams = list(ngrams(tokens, 2))

# Trigrams
trigrams = list(ngrams(tokens, 3))

# Print the results
print("Original Text:", text)
print("\nUnigrams:", unigrams)
print("\nBigrams:", bigrams)
print("\nTrigrams:", trigrams)

Output:

Original Text: This is a sample text for unigram, bigram, and trigram extraction using NLTK.

Unigrams: [('this',), ('is',), ('a',), ('sample',), ('text',), ('for',), ('unigram',), (',',), ('bigram',), (',',), ('and',),
('trigram',), ('extraction',), ('using',), ('nltk',), ('.',)]

Bigrams: [('this', 'is'), ('is', 'a'), ('a', 'sample'), ('sample', 'text'), ('text', 'for'), ('for', 'unigram'),
('unigram', ','), (',', 'bigram'), ('bigram', ','), (',', 'and'), ('and', 'trigram'), ('trigram', 'extraction'),
('extraction', 'using'), ('using', 'nltk'), ('nltk', '.')]

Trigrams: [('this', 'is', 'a'), ('is', 'a', 'sample'), ('a', 'sample', 'text'), ('sample', 'text', 'for'), ('text', 'for',
'unigram'), ('for', 'unigram', ','), ('unigram', ',', 'bigram'), (',', 'bigram', ','), ('bigram', ',', 'and'), (',', 'and',
'trigram'), ('and', 'trigram', 'extraction'), ('trigram', 'extraction', 'using'), ('extraction', 'using', 'nltk'),
('using', 'nltk', '.')]

Practical No. 12

Aim: Write a Python program to evaluate the performance of an information retrieval model
using standard evaluation metrics.

Theory:

Information retrieval (IR) model evaluation is crucial for assessing the effectiveness of algorithms in
retrieving relevant information from large datasets. Several key metrics are commonly used to
measure the performance of these models. In the context of the provided program, three
fundamental metrics—Precision, Recall, and F1 Score—are employed.

1. Precision:

Precision is a metric that quantifies the accuracy of the positive predictions made by a retrieval
model. It is calculated as the ratio of true positives to the sum of true positives and false positives.

Precision is particularly relevant in scenarios where the cost of false positives is high, and there is a
need for confidence in the relevance of retrieved documents.

2. Recall:

Recall, also known as sensitivity or true positive rate, measures the ability of a retrieval model to
capture all relevant documents. It is calculated as the ratio of true positives to the sum of true
positives and false negatives.

3. F1 Score:

The F1 Score is the harmonic mean of precision and recall, providing a balanced measure of a
model's overall performance. It takes both false positives and false negatives into account, making it
suitable for scenarios where precision and recall need to be balanced.
Source Code:

from sklearn.metrics import precision_score, recall_score, f1_score

# Sample data (ground truth and predicted relevance)
ground_truth = [1, 0, 1, 0, 1, 1, 0, 0, 1, 1]  # Binary relevance labels (1: relevant, 0: non-relevant)
predicted_relevance = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]  # Binary predictions

# Calculate evaluation metrics
precision = precision_score(ground_truth, predicted_relevance)
recall = recall_score(ground_truth, predicted_relevance)
f1 = f1_score(ground_truth, predicted_relevance)

# Print the results
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)

Output:

runfile('C:/Users/ckt/untitled4.py', wdir='C:/Users/ckt')

Precision: 0.6666666666666666

Recall: 0.6666666666666666

F1 Score: 0.6666666666666666
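
As a quick check, these values can be reproduced by hand from the sample data: there are 4 true positives, 2 false positives, and 2 false negatives, so precision = 4/6, recall = 4/6, and the F1 score (their harmonic mean) is also 4/6 ≈ 0.667. A minimal sketch of this manual computation, using the same lists as in the source code:

# Manual verification of the sklearn results (illustrative sketch, same sample data as above)
ground_truth = [1, 0, 1, 0, 1, 1, 0, 0, 1, 1]
predicted_relevance = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]

# Count true positives, false positives, and false negatives
tp = sum(1 for g, p in zip(ground_truth, predicted_relevance) if g == 1 and p == 1)
fp = sum(1 for g, p in zip(ground_truth, predicted_relevance) if g == 0 and p == 1)
fn = sum(1 for g, p in zip(ground_truth, predicted_relevance) if g == 1 and p == 0)

precision = tp / (tp + fp)                          # 4 / 6
recall = tp / (tp + fn)                             # 4 / 6
f1 = 2 * precision * recall / (precision + recall)  # also 4 / 6

print("TP, FP, FN:", tp, fp, fn)                    # 4 2 2
print("Precision:", precision, "Recall:", recall, "F1:", f1)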
