IR Code Document 1
Information Retrieval (IR) is the process of obtaining relevant information from large
repositories, typically in response to a user query. IR systems are used in search engines,
digital libraries, and databases. IR differs from data retrieval in that it deals with
unstructured data such as text and multimedia. Key components include indexing, querying,
and ranking. IR is fundamental to how we access information in the digital age.
```python
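docs = ["IR is the foundation of search engines.", "Information retrieval uses indexing."]

# Inverted index: term -> list of IDs of the documents that contain it
index = {}
for i, doc in enumerate(docs):
    for word in doc.lower().split():
        # Drop the trailing period so "engines." is indexed as "engines"
        index.setdefault(word.strip('.'), []).append(i)
print(index)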
```
```python
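# Look up a single term in the index built above.
# Index keys are lowercase, so the query term must be lowercase too.
query = "retrieval"
results = index.get(query, [])  # empty list if the term was never indexed
print(f"Documents containing '{query}':", results)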
```
IR systems often preprocess text with stop-word removal, stemming, and lemmatization, so that
common filler words are dropped and variants of a word map to the same index term.
Libraries like `nltk` can help with preprocessing.
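The preprocessing steps above can be sketched with the standard library alone. This is a rough illustration, not how `nltk` does it: the `STOP_WORDS` set and the `crude_stem` and `preprocess` helpers are hypothetical names made up for this example, and the suffix stripping is far cruder than a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# real systems use much larger curated lists.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def crude_stem(token):
    """Strip a few common English suffixes. This is NOT the Porter algorithm."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, tokenize on letters/digits, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("IR is the foundation of search engines.")
print(tokens)  # ['ir', 'foundation', 'search', 'engin']
```

Applying the same `preprocess` function to both documents and queries keeps their terms comparable, which is the main point of this step.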
Advanced IR incorporates TF-IDF, word embeddings, and neural networks to capture the
semantic meaning of queries and documents.
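Of the techniques listed, TF-IDF is simple enough to sketch in plain Python. The version below assumes one common formulation (term frequency normalized by document length, times the natural log of N over document frequency); other variants exist, and the helper names `tf_idf` and `score` are illustrative only. Word embeddings and neural models are not covered here.

```python
import math
from collections import Counter

docs = [
    "IR is the foundation of search engines.",
    "Information retrieval uses indexing.",
]

# Same simple tokenization as the indexing example: lowercase, split, strip periods
tokenized = [[w.strip(".") for w in d.lower().split()] for d in docs]

# Document frequency: in how many documents does each term occur?
df = Counter()
for doc_tokens in tokenized:
    df.update(set(doc_tokens))

def tf_idf(term, doc_tokens):
    """TF-IDF weight of one term in one document (0.0 for unseen terms)."""
    if df[term] == 0:
        return 0.0
    tf = doc_tokens.count(term) / len(doc_tokens)
    idf = math.log(len(tokenized) / df[term])
    return tf * idf

def score(query, doc_tokens):
    """Sum the TF-IDF weights of the query terms for one document."""
    return sum(tf_idf(t, doc_tokens) for t in query.lower().split())

# Rank document IDs by descending score for a two-term query
query = "retrieval indexing"
ranked = sorted(range(len(docs)), key=lambda i: score(query, tokenized[i]), reverse=True)
print(ranked)  # [1, 0]
```

With this formulation, a term that appears in every document gets an IDF of zero, so it contributes nothing to the ranking.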
IR Basics with Python
Information Retrieval (IR) is about finding relevant documents from a large collection. In
Python, simple IR systems can be built using standard libraries.
```python
docs = ["IR is fun", "Information retrieval is important"]

# Map each lowercase term to the IDs of the documents that contain it
index = {}
for i, doc in enumerate(docs):
    for word in doc.lower().split():
        index.setdefault(word, []).append(i)
print(index)
```
This builds a basic inverted index mapping terms to document IDs.
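One way to use such an index is a Boolean AND query that intersects the posting lists of every query term. The sketch below rebuilds the same index so it runs on its own; `search_all` is a hypothetical helper name, and treating a multi-word query as AND is an assumption rather than the only option.

```python
docs = ["IR is fun", "Information retrieval is important"]

# Rebuild the inverted index: term -> list of document IDs
index = {}
for i, doc in enumerate(docs):
    for word in doc.lower().split():
        index.setdefault(word, []).append(i)

def search_all(query):
    """Return the sorted IDs of documents that contain every query term."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []  # empty query matches nothing
    return sorted(set.intersection(*postings))

print(search_all("is"))            # [0, 1]
print(search_all("retrieval is"))  # [1]
print(search_all("fun retrieval")) # []
```

Converting posting lists to sets keeps the intersection simple; a larger system would more likely merge sorted posting lists instead.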