IR Task
Explanation:
Explanation:
3.
def create_inverted_index(documents):
    inv_index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            word = word.strip(".,!?")  # Remove leading/trailing punctuation
            if word:  # Skip tokens that were only punctuation
                inv_index.setdefault(word, set()).add(doc_id)
    return inv_index
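Explanation:
• Builds an inverted index: each word is lowercased, stripped of surrounding punctuation (.,!?), and mapped to the set of IDs of the documents that contain it.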
5.
inv_index = create_inverted_index(Chapter_1)
print("\nInverted Index:\n", inv_index)
Explanation:
• Builds the inverted index for Chapter_1 and prints it.
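The contents of Chapter_1 are not shown here, so the following is only a sketch of what the index looks like on a hypothetical two-document corpus. The names sample_docs and sample_index are assumptions made for illustration.

```python
def create_inverted_index(documents):
    """Map each lowercased, punctuation-stripped word to the set of doc IDs containing it."""
    inv_index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            word = word.strip(".,!?")
            if word:
                inv_index.setdefault(word, set()).add(doc_id)
    return inv_index


# Hypothetical sample data standing in for Chapter_1.
sample_docs = {
    1: "The compassionate, the merciful.",
    2: "Praise the merciful!",
}

sample_index = create_inverted_index(sample_docs)
print(sorted(sample_index["merciful"]))       # [1, 2]
print(sorted(sample_index["compassionate"]))  # [1]
```

Each key is a word and each value is the set of documents in which that word occurs, which is what the Boolean queries operate on.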
Explanation:
• Retrieves documents containing both "compassionate" and "merciful" using AND.
Explanation:
• Retrieves documents containing either "compassionate" or "merciful" using OR.
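The code for the AND and OR queries is not included in the text, so here is a minimal sketch of how they could work on an index whose values are sets. The function names boolean_and and boolean_or and the toy_index data are assumptions, not the original implementation.

```python
def boolean_and(inv_index, term1, term2):
    """Documents that contain both terms (set intersection)."""
    return inv_index.get(term1, set()) & inv_index.get(term2, set())


def boolean_or(inv_index, term1, term2):
    """Documents that contain at least one of the terms (set union)."""
    return inv_index.get(term1, set()) | inv_index.get(term2, set())


# Hypothetical index: word -> set of document IDs.
toy_index = {"compassionate": {1, 3}, "merciful": {1, 2}}

print(sorted(boolean_and(toy_index, "compassionate", "merciful")))  # [1]
print(sorted(boolean_or(toy_index, "compassionate", "merciful")))   # [1, 2, 3]
```

Using .get(term, set()) means a term that is missing from the index contributes an empty set instead of raising a KeyError.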
Boolean Retrieval with Preprocessing:
1.
import re
import nltk
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords

nltk.download('stopwords')  # Stop-word lists used below
nltk.download('punkt')      # Tokenizer data
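Explanation:
• Imports re for regular expressions and NLTK's PorterStemmer and stop-word corpus, then downloads the stopwords and punkt resources.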
2.
stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))
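Explanation:
• Creates the Porter stemmer and loads the English stop words into a set for fast membership checks.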
3.
def preprocess_text(text):
    text = text.lower()  # Convert text to lowercase
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    return text
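Explanation:
• Lowercases the text and removes every character that is not a letter, digit, underscore, or whitespace.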
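A quick check of what preprocess_text returns, using a made-up input sentence (the sentence is an assumption, not taken from Chapter_1):

```python
import re


def preprocess_text(text):
    text = text.lower()                  # Convert text to lowercase
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    return text


cleaned = preprocess_text("In the name of Allah, the Most Merciful!")
print(cleaned)  # in the name of allah the most merciful
```

Because punctuation is deleted rather than replaced with a space, a hyphenated word such as "well-known" becomes the single token "wellknown".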
4.
def create_inverted_index_with_preprocessing(chapter):
    index = {}
    for doc_id, text in chapter.items():
        text = preprocess_text(text)
        for word in text.split():
            if word not in stop_words and word:  # Stop word removal and empty word check
                stemmed_word = stemmer.stem(word)  # Stemming
                # doc_id is appended once per occurrence, so a posting list may repeat an ID
                index.setdefault(stemmed_word, []).append(doc_id)
    return index
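Explanation:
• Builds the inverted index from preprocessed text: stop words are skipped and each remaining word is stemmed before its document ID is recorded.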
if not processed_terms:
    return set()…
Explanation:
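Only a fragment of the retrieval function survives above, so the following is a sketch of how an AND query over a preprocessed index might look, not the original code. Only the name processed_terms comes from the fragment; every other name is an assumption. To run without NLTK downloads, it uses a small hand-written stop-word set and a crude suffix stripper as stand-ins for stopwords.words('english') and PorterStemmer.

```python
import re

# Stand-ins (assumptions) for NLTK's stop-word list and PorterStemmer.
STOP_WORDS = {"the", "of", "and", "is", "in", "to"}


def crude_stem(word):
    """Rough stand-in for a real stemmer: strips one common suffix."""
    for suffix in ("ing", "ful", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_terms(text):
    """Lowercase, remove punctuation, drop stop words, stem the rest."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return [crude_stem(w) for w in text.split() if w not in STOP_WORDS]


def build_index(documents):
    index = {}
    for doc_id, text in documents.items():
        for term in preprocess_terms(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def boolean_and_preprocessed(index, query):
    """Documents containing every query term after preprocessing."""
    processed_terms = preprocess_terms(query)
    if not processed_terms:
        return set()
    result = set(index.get(processed_terms[0], set()))
    for term in processed_terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents.
docs = {
    1: "Praise the merciful.",
    2: "He praises mercy.",
    3: "The merciful guides.",
}
index = build_index(docs)
print(sorted(boolean_and_preprocessed(index, "Praises, merciful!")))  # [1]
```

The query goes through the same preprocessing as the documents, so "Praises" in the query matches "Praise" in document 1. A query made up only of stop words yields no terms and returns an empty set.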
6.
inv_index_preprocessed = create_inverted_index_with_preprocessing(Chapter_1)
print("\nInverted Index with Preprocessing:\n", inv_index_preprocessed)
Explanation:
• Builds the preprocessed inverted index for Chapter_1 and prints it.
Explanation:
1. Without preprocessing Boolean retrieval:
• The text is used as it is, without changing the case, removing punctuation, or filtering common words.
• This can give inaccurate results because “Allah” and “allah” would be treated as different words.
2. With preprocessing Boolean retrieval:
• The text is cleaned (lowercased, punctuation removed, stopwords removed, and words stemmed to their root forms).
• This makes the search more accurate and efficient, finding documents even when the word form differs slightly from the query.
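The difference can be seen on a small made-up example. This is only a sketch: docs, build_index, raw_index, and clean_index are hypothetical names, and the cleaning step here covers just lowercasing and punctuation removal.

```python
import re

# Hypothetical documents.
docs = {1: "Praise be to Allah.", 2: "allah is merciful"}


def build_index(documents, normalize):
    """Index whitespace-separated tokens after applying normalize to the text."""
    index = {}
    for doc_id, text in documents.items():
        for token in normalize(text).split():
            index.setdefault(token, set()).add(doc_id)
    return index


# Without preprocessing: the text is used as it is.
raw_index = build_index(docs, lambda text: text)

# With preprocessing: lowercase and remove punctuation.
clean_index = build_index(docs, lambda text: re.sub(r"[^\w\s]", "", text.lower()))

print(sorted(raw_index.get("allah", set())))    # [2]
print(sorted(clean_index.get("allah", set())))  # [1, 2]
```

In the raw index, document 1 is stored under the key "Allah." (capitalized, with the period attached), so a search for "allah" misses it.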