Introduction To Information Retrieval - by William Scott - Medium

This document introduces information retrieval and provides an overview of key concepts. It defines information retrieval as finding relevant materials to satisfy an information need, typically text documents. It discusses how information retrieval systems take queries, understand them, search a document corpus, and return relevant results. The document also explains why simple keyword searches are not enough due to issues like synonyms and homographs, and how intelligent information retrieval models address these problems.

Introduction to Information Retrieval

William Scott, Feb 3, 2019

This is a series on Information Retrieval techniques, covering basic concepts with implementations and easily understandable examples.

For those who are highly interested, I suggest the book “Introduction to Information Retrieval” by Manning.

Click here to check out the git repo.

Information Retrieval Series:

1. Introduction

2. Unigram Indexing & Positional Indexing

3. TF-IDF

More to come…

What is Information Retrieval?

Information Retrieval, just as the name suggests, is the retrieval of information. What we do here is refine that retrieval so that it satisfies an information need. So, we can sum up information retrieval as:

Finding relevant materials to satisfy an information need.

A few points to note: when we say we want to find materials, we mean documents, specifically text documents, and the text in those documents is highly unstructured. If you are still not sure what a text document could be, just think of it as a website for the time being.

So what an IR system does is this: it takes a query from the user, understands it, searches its corpus, and returns the relevant documents.
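To make that flow concrete, here is a minimal sketch in Python. It is only an illustration under simple assumptions: the toy corpus, the document ids, and the function names tokenize and search are all made up for this example, and "understanding" the query is reduced to lowercasing and splitting on whitespace.

```python
# Hypothetical toy corpus; the ids and texts are invented for illustration.
CORPUS = {
    "doc1": "The barber shop on Main Street offers haircuts and shaves",
    "doc2": "Apple released a new phone this week",
    "doc3": "How to bake an apple pie at home",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def search(query, corpus):
    """Return ids of documents sharing at least one token with the query.

    Documents are ordered by how many query tokens they contain
    (most first), with ties broken alphabetically by id.
    """
    query_tokens = set(tokenize(query))
    scores = {}
    for doc_id, text in corpus.items():
        overlap = query_tokens & set(tokenize(text))
        if overlap:
            scores[doc_id] = len(overlap)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


print(search("apple pie", CORPUS))  # ['doc3', 'doc2']
```

Counting shared tokens is about the crudest possible notion of relevance, but it shows the three steps: take the query, search the corpus, return results in some order.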

Why can’t we do ctrl+f?

We keep saying that we want to find things, so why not just build a program that checks whether the query exists in a document or not? We could do that, but it would not always work. The reasons are as follows:

Synonyms: Many words have alternatives. For example, a user who wants a haircut could search for “salon” or “barber”. We cannot show him only the documents that contain “barber” and leave out those that contain “salon”, because for this need they mean the same thing. More general examples are Mom and Mother, or Hat and Cap.

Homographs: These are words that have the same spelling but different meanings in different sentences. We are not concerned with pronunciation here. “Lie” can mean lying on a bed or lying to another person. “Tear” can mean tearing a paper or having tears (as in crying). “Apple” could be a company or a fruit.
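Both problems are easy to reproduce with a plain substring search, which is roughly what ctrl+f does. The sketch below is only an illustration: the four documents and the function name ctrl_f are made up for this example.

```python
# Hypothetical documents, invented for illustration.
documents = [
    "Visit our salon for a fresh haircut",
    "The barber on Main Street is open late",
    "Apple reported record quarterly earnings",
    "Add one chopped apple to the pie filling",
]


def ctrl_f(query, docs):
    """Return every document that contains the query as a substring,
    ignoring case. No notion of meaning is involved."""
    needle = query.lower()
    return [doc for doc in docs if needle in doc.lower()]


# Synonym problem: the salon document is relevant but is not found.
print(ctrl_f("barber", documents))

# Homograph problem: the company and the fruit are both returned,
# with no way to tell which one the user meant.
print(ctrl_f("apple", documents))
```

The first search misses a relevant document, and the second returns an irrelevant one alongside the relevant one. Exact matching gives us no handle on either case.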

Because of these problems, we need to build an intelligent IR model that can understand the user’s query and return the relevant documents. Do not worry about these problems for now; we will deal with them later. As a gist, we handle them through an important stage called preprocessing, where the information is turned into a more general form that helps us relate words much better.
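As a rough idea of what "a more general form" could look like, here is a small sketch. It is not the preprocessing pipeline used later in the series. The SYNONYMS table is a hypothetical, hand-written mapping, and the only steps shown are lowercasing, dropping punctuation, and replacing each word with a canonical synonym.

```python
import re

# Hypothetical hand-written synonym table (an assumption for this sketch):
# each key is rewritten to its canonical form on the right.
SYNONYMS = {"salon": "barber", "mom": "mother", "cap": "hat"}


def preprocess(text):
    """Lowercase the text, keep only alphanumeric tokens, and map each
    token to its canonical synonym when one is listed."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(token, token) for token in tokens]


print(preprocess("Salon near me!"))   # ['barber', 'near', 'me']
print(preprocess("barber near me"))   # ['barber', 'near', 'me']
```

After this step the two queries look identical, so a document about barbers can match a query about salons. Note that a lookup table like this does nothing for homographs, since "apple" the company and "apple" the fruit are still the same token. Telling them apart needs context.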

Intelligent IR
When we are trying to retrieve relevant documents, we first need to define relevance. Are we going to retrieve the latest documents? Or are we going to retrieve the documents which match the subject?

An intelligent IR model does not depend on just one factor to determine relevance. Metadata, authoritativeness, the type of information need, the meaning of the query, the meaning of the sentences in the document, and many other such factors are considered.
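One simple way to picture several factors working together is a weighted sum. The sketch below is purely illustrative: the Doc fields, the three factors chosen (term match, authority, recency), the weights, and the 30-day recency scale are all assumptions made for this example, not values taken from any real system.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str
    authority: float  # assumed to lie in [0, 1]; higher means more trusted
    age_days: int     # days since the document was published


def relevance(query, doc, w_match=0.6, w_authority=0.3, w_recency=0.1):
    """Combine three factors into one score using hypothetical weights."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(doc.text.lower().split())
    # Fraction of query tokens that appear in the document.
    match = len(query_tokens & doc_tokens) / len(query_tokens) if query_tokens else 0.0
    # Newer documents score closer to 1, older ones closer to 0.
    recency = 1.0 / (1.0 + doc.age_days / 30.0)
    return w_match * match + w_authority * doc.authority + w_recency * recency


docs = [
    Doc("a", "cheap haircut deals", authority=0.2, age_days=10),
    Doc("b", "haircut tips from experts", authority=0.9, age_days=10),
]
ranked = sorted(docs, key=lambda d: relevance("haircut", d), reverse=True)
print([d.doc_id for d in ranked])  # ['b', 'a']
```

Both documents match the query equally well and are equally old, so the more authoritative one comes first. Changing the weights changes what "relevant" means, which is exactly the question raised above.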

Basic Terminology:
Collection / Corpus: collection of documents

Query: Information need

Token: An individual unit, such as a word or a set of characters

Rank: The relevance of a document according to some measure

Resources:
Introduction to Information Retrieval — Manning
