Ir Task

The document outlines a lab activity on Boolean retrieval methods in information retrieval, comparing approaches with and without preprocessing. It details the processes of creating inverted indexes, performing Boolean queries, and the impact of text preprocessing techniques like stemming and stopword removal on search accuracy. The conclusion emphasizes that preprocessing enhances search efficiency and accuracy by standardizing text input.

DEPARTMENT OF CREATIVE TECHNOLOGIES

NAME: MARIAM AFZAAL

REG ID: 231139
CLASS: BS AI IV ‘B’
SUBJECT: INFORMATION RETRIEVAL
SUBMITTED TO: MA’AM FAIZA QAMAR
LAB ACTIVITY:

Boolean Retrieval without Preprocessing:

1. index_without_preprocessing = {}
   for doc_id, text in Chapter_1.items():
       for word in text.split():
           index_without_preprocessing.setdefault(word, []).append(doc_id)

Explanation:

• It loops through all documents in Chapter_1 and processes each word.

• The words are stored directly in index_without_preprocessing without any modifications.
• Each word is mapped to a list of document IDs in which it appears.
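
As a minimal sketch of the loop described above, the following builds the same kind of index over a toy corpus. The two-document Chapter_1 dictionary is a hypothetical stand-in (the lab's actual data is not shown here), so the output only illustrates the mechanics.

```python
# Hypothetical stand-in for the lab's Chapter_1 data (not from the original).
Chapter_1 = {
    1: "In the name of Allah, the Compassionate, the Merciful.",
    2: "Praise be to Allah Lord of the worlds",
}

index_without_preprocessing = {}
for doc_id, text in Chapter_1.items():
    for word in text.split():
        index_without_preprocessing.setdefault(word, []).append(doc_id)

# Tokens keep their case and any attached punctuation,
# so "Allah," and "Allah" end up as two separate keys.
print(sorted(index_without_preprocessing))

# A word repeated within one document adds that doc_id once per occurrence.
print(index_without_preprocessing["the"])
```

Because the postings are lists, repeated words produce repeated document IDs; converting to a set at query time (as step 2 does) hides this.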

2. allah_docs = set(index_without_preprocessing.get("Allah", []))
   compassionate_docs = set(index_without_preprocessing.get("Compassionate", []))
   result = allah_docs & compassionate_docs
   print("Without Preprocessing (Allah & Compassionate):", result)

Explanation:

• Finds documents containing "Allah" and "Compassionate" separately.

• Uses the & (AND) operator to find the intersection of both sets.
• The result is a set of document IDs containing both words.
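
The sketch below illustrates why exact-match lookups on a raw index are fragile. The index contents and the helper name raw_and are assumptions made for illustration; they are not part of the lab code.

```python
# Hypothetical raw index of the kind built in step 1 (toy data, not from the lab).
index_without_preprocessing = {
    "Allah": [1, 2],
    "allah": [3],
    "Compassionate": [1],
    "Compassionate,": [2],
}

def raw_and(term_a, term_b, index):
    """Intersect the posting lists of two exact, unnormalized terms."""
    return set(index.get(term_a, [])) & set(index.get(term_b, []))

# Only document 1 matches: document 2 stores "Compassionate," with a comma.
print(raw_and("Allah", "Compassionate", index_without_preprocessing))

# A lowercase query term misses documents 1 and 2 entirely.
print(raw_and("allah", "Compassionate", index_without_preprocessing))
```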

3. def create_inverted_index(documents):
       inv_index = {}
       for doc_id, text in documents.items():
           for word in text.lower().split():
               word = word.strip(".,!?")
               inv_index.setdefault(word, set()).add(doc_id)
       return inv_index

Explanation:

• Converts all words to lowercase.

• Removes punctuation marks like . , ! ?.
• Stores document IDs in a dictionary with words as keys.
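
To make the effect of these steps concrete, here is a small run of the same function on hypothetical toy documents (not the lab's Chapter_1). It also shows a limitation worth knowing: str.strip only removes the listed characters from the ends of a token, so other punctuation such as ";" survives.

```python
def create_inverted_index(documents):
    inv_index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            word = word.strip(".,!?")  # trims only leading/trailing . , ! ?
            inv_index.setdefault(word, set()).add(doc_id)
    return inv_index

# Hypothetical toy documents for illustration only.
docs = {
    1: "Allah, the Compassionate!",
    2: "allah; the merciful",
    3: "well-known, don't",
}
inv_index = create_inverted_index(docs)

print(inv_index["allah"])   # document 1 only
print(inv_index["allah;"])  # document 2: the semicolon was not stripped
print(sorted(inv_index))
```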

4. def boolean_retrieval(query, inv_index):
       # Supports "term", "term1 and term2", or "term1 or term2".
       query_terms = query.lower().split()
       if not query_terms:
           return set()  # empty query: nothing to look up
       if "and" in query_terms:
           term_1_docs = inv_index.get(query_terms[0], set())
           term_2_docs = inv_index.get(query_terms[2], set())
           return term_1_docs & term_2_docs
       elif "or" in query_terms:
           term_1_docs = inv_index.get(query_terms[0], set())
           term_2_docs = inv_index.get(query_terms[2], set())
           return term_1_docs | term_2_docs
       else:
           return inv_index.get(query_terms[0], set())

Explanation:

• Converts the query to lowercase.

• If the query contains "AND", it finds the intersection of documents.
• If the query contains "OR", it finds the union of documents.
• If there is only one word, it returns the documents containing that word.
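
The function above reads exactly two operands at fixed positions. As a hedged sketch of one way to generalize it, the hypothetical helper below (boolean_retrieval_multi, not part of the lab) accepts any number of terms joined by a single operator type. Treating terms with no operator as an implicit AND is an assumption of this sketch, not the original behavior.

```python
def boolean_retrieval_multi(query, inv_index):
    """Evaluate 'a and b and c' or 'a or b or c' (one operator type per query)."""
    tokens = query.lower().split()
    operators = {t for t in tokens if t in ("and", "or")}
    if len(operators) > 1:
        raise ValueError("mixing 'and' and 'or' is not supported in this sketch")
    terms = [t for t in tokens if t not in ("and", "or")]
    if not terms:
        return set()
    postings = [set(inv_index.get(t, set())) for t in terms]
    if operators == {"or"}:
        return set.union(*postings)
    # A single term, or terms joined by AND, reduce to an intersection.
    return set.intersection(*postings)

# Hypothetical toy index for illustration only.
toy_index = {"compassionate": {1, 2}, "merciful": {2, 3}, "lord": {2}}
print(boolean_retrieval_multi("compassionate and merciful", toy_index))
print(boolean_retrieval_multi("compassionate or merciful or lord", toy_index))
print(boolean_retrieval_multi("lord", toy_index))
```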

5. inv_index = create_inverted_index(Chapter_1)
print("\nInverted Index:\n", inv_index)

Explanation:

• Calls create_inverted_index() to generate the inverted index with basic normalization (lowercasing and stripping of . , ! ?).
• Prints the generated index.

6. query = "compassionate and merciful"
   results = boolean_retrieval(query, inv_index)
   print(f"\nBoolean Retrieval Results for '{query}': {results}")

Explanation:
• Retrieves documents containing both "compassionate" and "merciful" using AND.

7. query = "compassionate or merciful"
   results = boolean_retrieval(query, inv_index)
   print(f"\nBoolean Retrieval Results for '{query}': {results}")

Explanation:
• Retrieves documents containing either "compassionate" or "merciful" using OR.

Boolean Retrieval with Preprocessing:
1. import re
   from nltk.stem import PorterStemmer
   from nltk.corpus import stopwords
   import nltk

   nltk.download('stopwords')
   nltk.download('punkt')

Explanation:

• re is used for text cleaning.

• PorterStemmer is used for stemming words to their root form.
• stopwords are common words (like "and", "the") that are removed.
• nltk.download() ensures required resources are available.

2. stemmer = PorterStemmer()
   stop_words = set(stopwords.words('english'))

Explanation:

• PorterStemmer() is initialized for word stemming.

• stopwords.words('english') loads a list of common words to ignore.

3. def preprocess_text(text):
       text = text.lower()                  # Convert text to lowercase
       text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
       return text

Explanation:

• Converts text to lowercase to ensure case insensitivity.

• Removes punctuation marks using regex.
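
A quick check of what this regex does in practice, using made-up input strings. The pattern [^\w\s] deletes every character that is neither a word character nor whitespace, so apostrophes and hyphens vanish (joining the surrounding letters), while underscores are kept because \w includes them.

```python
import re

def preprocess_text(text):
    text = text.lower()                  # case folding
    text = re.sub(r'[^\w\s]', '', text)  # drop anything that is not \w or whitespace
    return text

print(preprocess_text("Allah, the Compassionate!"))
print(preprocess_text("don't well-known"))
print(preprocess_text("snake_case stays"))
```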

4. def create_inverted_index_with_preprocessing(chapter):
       index = {}
       for doc_id, text in chapter.items():
           text = preprocess_text(text)
           for word in text.split():
               if word not in stop_words and word:  # Stop word removal and empty word check
                   stemmed_word = stemmer.stem(word)  # Stemming
                   index.setdefault(stemmed_word, []).append(doc_id)
       return index

Explanation:

• Preprocesses the text using preprocess_text().

• Splits text into words.
• Removes stopwords and performs stemming.
• Stores words in index along with document IDs.
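
The pipeline above can be sketched without NLTK by passing the stemmer and stopword list in as parameters. Everything named here (build_index, toy_stem, STOP_WORDS, the toy documents) is a hypothetical stand-in; in particular toy_stem is not the Porter algorithm. This variant also stores postings as sets, an assumed design change that avoids repeated document IDs.

```python
import re

# Hypothetical mini stopword list; the lab uses NLTK's English list instead.
STOP_WORDS = {"the", "of", "to", "in", "and", "or"}

def toy_stem(word):
    """Stand-in stemmer that only drops a trailing 's'; NOT the Porter algorithm."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

def build_index(documents, stem=toy_stem, stop_words=STOP_WORDS):
    index = {}
    for doc_id, text in documents.items():
        cleaned = re.sub(r"[^\w\s]", "", text.lower())
        for word in cleaned.split():
            if word not in stop_words:
                # A set keeps each doc_id once, however often the word repeats.
                index.setdefault(stem(word), set()).add(doc_id)
    return index

# Hypothetical toy documents for illustration only.
docs = {1: "The worlds, the world.", 2: "Lord of the worlds"}
index = build_index(docs)
print(index)
```

In the lab's setting, stemmer.stem and the NLTK stop_words set would be passed in place of the stand-ins.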

5. def boolean_retrieval_preprocessed(query, index):
       terms = query.lower().split()
       processed_terms = [stemmer.stem(term) for term in terms if term not in stop_words and term]

       if not processed_terms:
           return set()…

Explanation:

• Converts query to lowercase and removes stopwords.

• Stems each query word.
• If "AND" is present, it finds documents common to all terms.
• If "OR" is present, it finds documents containing any of the terms.
• If only one word is given, it returns matching documents.
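
The rest of this function is cut off in the source, so the following is only one possible reading of the behavior the bullets describe, not the author's code. The function name, the injected stem and stop_words parameters, and the toy data are all assumptions. One detail matters in any such implementation: "and" and "or" are themselves in NLTK's English stopword list, so the operator has to be detected before stopwords are filtered out.

```python
def boolean_retrieval_preprocessed_sketch(query, index, stem, stop_words):
    """Hypothetical completion of the elided logic; not the author's code."""
    terms = query.lower().split()
    # Detect the operator first: "and"/"or" would be lost to stopword removal.
    use_or = "or" in terms and "and" not in terms
    processed = [stem(t) for t in terms if t not in stop_words]
    if not processed:
        return set()
    # The lab's index stores lists, so convert postings to sets here.
    postings = [set(index.get(t, [])) for t in processed]
    return set.union(*postings) if use_or else set.intersection(*postings)

# Hypothetical inputs: a tiny stopword set, a hard-coded stand-in stem mapping,
# and a toy index with list postings (including a repeated doc_id).
toy_stop_words = {"and", "or", "the"}
toy_stems = {"compassionate": "compassion", "merciful": "merci"}

def toy_stem(word):
    return toy_stems.get(word, word)

toy_index = {"compassion": [1, 2, 2], "merci": [2, 3]}

for q in ("compassionate and merciful", "compassionate or merciful", "merciful"):
    print(q, "->", boolean_retrieval_preprocessed_sketch(q, toy_index, toy_stem, toy_stop_words))
```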

6. inv_index_preprocessed = create_inverted_index_with_preprocessing(Chapter_1)
   print("\nInverted Index with Preprocessing:\n", inv_index_preprocessed)

Explanation:

• Calls create_inverted_index_with_preprocessing() to generate the index.
• Prints the processed inverted index.

7. query = "compassionate and merciful"
   results = boolean_retrieval_preprocessed(query, inv_index_preprocessed)
   print(f"\nBoolean Retrieval Results for '{query}': {results}")

Explanation:

• Retrieves documents containing both "compassionate" and "merciful".

CONCLUSION:
1. Boolean retrieval without preprocessing:

• The text is used as it is, without changing the case, removing punctuation, or filtering common words.
• This can give inaccurate results because “Allah” and “allah” would be treated as different words.

2. Boolean retrieval with preprocessing:

• The text is cleaned (lowercased, punctuation removed, stopwords removed, and words stemmed to their root forms).
• This makes the search more accurate and efficient, finding documents even with slight word variations.
