
(Fhi) Chapter 3

Uploaded by

Bikila Dessalegn
Copyright
© © All Rights Reserved

Chapter Three

Information Retrieval and EBM


Chapter Objectives
• At the end of successful completion of the chapter, the students will be able to:
• Explain Information Retrieval (IR) and Evidence Based Medicine (EBM)
• List the components of IR
• Describe the types of IR
Information Retrieval(IR) and EBM
• Introduction to information retrieval
• Information is processed data, such as news or
facts about something
• This information can be represented in
the form of:
• Text, image, audio, video
• XML and structured documents
• Source codes
• Applications/web services
Information Retrieval(IR) and EBM…cont’d
• Introduction to information retrieval……
• Retrieval
• Fetching something that has been stored
• Main objective of IR
• Provide the users with effective access to and
interaction with information resources
• Goal of IR
• To search large document collection to retrieve
small subsets relevant to user’s information need
Information Retrieval(IR) and EBM…cont’d
• Introduction to information retrieval…..
• Purpose/role of an IR system
• An information retrieval system is designed to retrieve the
document or information required by the user community
• It should make the right information available to the right
user
• It aims at collecting and organizing information in one or
more subject areas in order to provide it to the user as soon
as possible.
• It serves as a bridge between the world of creators or
generators of information and the users of that information
Information Retrieval(IR) and EBM…cont’d
• Introduction to information retrieval…..
• Application areas within Information
retrieval:
• Cross language retrieval
• Speech/broadcast retrieval
• Text categorization
• Text summarization
• Structured document element retrieval (XML)
Information Retrieval(IR) and EBM…cont’d
• Introduction to information retrieval…..
• Information Retrieval vs information extraction
• Information retrieval
• given a set of query terms and a set of documents,
select only the most relevant documents
• Information extraction
• extract from the text what the document
means
Information Retrieval(IR) and EBM…cont’d
• Introduction to information retrieval…..
Difference between data retrieval and information retrieval
Parameter                Database/data retrieval                  Information retrieval
Example                  Database query                           WWW search
What we are retrieving   Structured data                          Mostly unstructured
Queries we are posing    Formally defined queries, unambiguous    Expressed in natural language
Matching                 Exact                                    Partial, best match
Inferences               Deduction                                Induction
Model                    Deterministic                            Probabilistic
Information Retrieval(IR) and EBM…cont’d
• Introduction to information retrieval…..
• Kinds of information retrieval systems
• In house information retrieval systems
• are set up by a particular library or information center
to serve mainly the users within the organization
• One particular type of database is the library catalogue
• Online information retrieval systems
• retrieve data from web sites, web pages and servers
that may include data bases, images, texts, tables, etc.
Information Retrieval(IR) and EBM…cont’d
• Introduction to information retrieval…..
• Features of an information retrieval system
• an effective information retrieval system must
have provisions for:
• Prompt dissemination of information
• Filtering of information
• The right amount of information at the right time
• Active switching of information
• Receiving information in an economical way
Information Retrieval(IR) and EBM…cont’d
• Introduction to information retrieval…..
• Features of an information retrieval system
• an effective information retrieval system must have
provisions for:
• Browsing
• Getting information in an economical way
• Current literature
• Access to other information system
• Interpersonal communications
• Personalized help
Information Retrieval(IR) and EBM…cont’d
• What is information retrieval (IR)?
• Very broad term
• is finding material (usually documents) of
unstructured nature (usually text) that satisfies
an information need from within large
collections (usually stored on computers)

• Structured data – e.g., a relational database
• Unstructured data – data which does not have a clear,
semantically overt, easy-for-a-computer structure
Information Retrieval(IR) and EBM…cont’d
• What is information retrieval (IR)?
• Hundreds of millions of people engage in IR
every day when they use web search engines.
• IR is fast becoming the dominant form of
information access, overtaking traditional
database-style searching
Information Retrieval(IR) and EBM…cont’d
• The field of information retrieval also covers supporting
users in browsing or filtering document collections, or in
further processing a set of retrieved documents.
• Given a set of documents, clustering is the task of
coming up with a good grouping of the documents based
on their contents.
• E.g., arranging books on a shelf by topic
• Classification is the task of deciding which class(es), if
any, each of a set of documents belongs to.
Information Retrieval(IR) and EBM…cont’d
• Information retrieval systems can be:
• Web search – searching over billions of documents
• Personal information retrieval – e.g., filtering emails
• Enterprise, institutional and domain-specific search
• Searching a corporation’s internal documents
• Searching a database of patents or research articles
Information Retrieval(IR) and EBM…cont’d
• Searching for information needed and relevance
• An information need is the topic about which the
user desires to know more
• A query is what the user conveys to the computer
in an attempt to communicate the information
need
• A document is relevant if the user perceives that it
contains information of value with respect to their
personal information need
Information Retrieval(IR) and EBM…cont’d
• An information retrieval system is comprised of:
• The indexing system
• The query system
• The indexing system – analyzes the documents
downloaded from the web and creates the
indexes that allow search queries to be
made
• The query system – is the search engine’s visible
interface, that is, the part the users interact with
Information Retrieval(IR) and EBM…cont’d
• History of information retrieval
• IR from the 1960s to the 1970s:
• Initial exploration of text retrieval systems for
“small” corpora of scientific abstracts and law and
business documents
• Development of the basic Boolean and vector-space
models of retrieval
• Prof. Salton and his students at Cornell University
were the leading researchers in the area
Information Retrieval(IR) and EBM…cont’d
• History of information retrieval…..
• IR in the 1980s:
• Large document database system, many
run by companies:
• Lexis-Nexis
• Dialog
• MEDLINE
Information Retrieval(IR) and EBM…cont’d
• History of information retrieval……
• IR in the 1990s:
• Searching FTPable documents on the internet
• Archie
• WAIS
• Searching the world wide web
• Lycos
• Yahoo
• Altavista
Information Retrieval(IR) and EBM…cont’d
• History of information retrieval……
• IR in 1990’s:…..
• Organized competitions
• NIST TREC
• Recommender systems
• Ringo
• Amazon
• NetPerceptions
• Automated Text Categorization and Clustering
Information Retrieval(IR) and EBM…cont’d
• History of information retrieval……
• IR in the 2000s:
• Link analysis for Web Search
• Google
• Automated Information Extraction
• Whizbang
• Fetch
• Burning Glass
• Question Answering
• TREC Q/A track
Information Retrieval(IR) and EBM…cont’d
• History of information retrieval……
• IR in the 2000s…..
• Multimedia IR
• Image
• Video
• Audio and music
• Cross-language IR
• DARPA TIDES
• Document Summarization
• Learning to Rank
Information Retrieval(IR) and EBM…cont’d
• Information retrieval and Related areas
• Database management
• Library and information Science
• Artificial Intelligence
• Natural Language Processing
• Machine learning
Information Retrieval(IR) and EBM…cont’d
• Information retrieval and Related areas……
• Database Management
• Focused on structured data stored in relational
tables rather than free-form text.
• Focused on efficient processing of well-
defined queries in a formal language (SQL).
• Clearer semantics for both data and queries.
• Recent move towards semi-structured data
(XML) brings it closer to IR.
Information Retrieval(IR) and EBM…cont’d
• Information retrieval and Related areas……
• Library and Information Science
• Focused on the human user aspects of information
retrieval (human-computer interaction, user
interface, visualization).
• Concerned with effective categorization of human
knowledge.
• Concerned with citation analysis and bibliometrics
(structure of information).
• Recent work on digital libraries brings it closer to
Computer Science & IR.
Information Retrieval(IR) and EBM…cont’d
• Information retrieval and Related areas……
• Artificial Intelligence
• Focused on the representation of knowledge,
reasoning, and intelligent action.
• Formalisms for representing knowledge and
queries:
• First-order Predicate Logic
• Bayesian Networks
• Recent work on web ontologies and intelligent
information agents brings it closer to IR.
Information Retrieval(IR) and EBM…cont’d
• Information retrieval and Related areas……
• Natural Language Processing
• Focused on the syntactic, semantic, and
pragmatic analysis of natural language text
and discourse.
• Ability to analyze syntax (phrase structure) and
semantics could allow retrieval based on
meaning rather than keywords.
Information Retrieval(IR) and EBM…cont’d
• Information retrieval and Related areas……
• Natural Language Processing: IR Directions
• Methods for determining the sense of an
ambiguous word based on context (word sense
disambiguation).
• Methods for identifying specific pieces of
information in a document (information
extraction).
• Methods for answering specific natural language
questions from document corpora or structured
data like FreeBase or Google’s Knowledge Graph.
Information Retrieval(IR) and EBM…cont’d
• Information retrieval and Related areas……
• Machine Learning
• Focused on the development of computational
systems that improve their performance with
experience.
• Automated classification of examples based on
learning concepts from labeled training examples
(supervised learning).
• Automated methods for clustering unlabeled
examples into meaningful groups (unsupervised
learning)
Information Retrieval(IR) and EBM…cont’d
• Information retrieval and Related areas……
• Machine Learning: IR Directions
• Text Categorization
• Automatic hierarchical classification (Yahoo).
• Adaptive filtering/routing/recommending.
• Automated spam filtering.
• Text Clustering
• Clustering of IR query results.
• Automatic formation of hierarchies (Yahoo).
• Learning for Information Extraction
• Text Mining
• Learning to Rank
Information Retrieval(IR) and EBM…cont’d
• What is Evidence Based Medicine (EBM)?
• It is the application of the best available
research to clinical care, which requires the
integration of evidence with clinical expertise
and patient values.
• It is the conscientious, explicit and judicious use of
current best evidence in making decisions
about the care of individual patients.
Information Retrieval(IR) and EBM…cont’d
• What is Evidence Based Medicine (EBM)?.....
• It requires that decisions about health and social
care are based on the best available, current,
valid and relevant evidence.
• These decisions should be made by those
receiving care, informed by the tacit and
explicit knowledge of those providing care,
within the context of available resources.
Information Retrieval(IR) and EBM…cont’d
• What is Evidence Based Medicine (EBM)?.....
• Best available research (patient-oriented
research):
• Illuminates the accuracy and precision of diagnostic
tests
• Highlights the importance of prognostic markers
• Establishes the efficacy and safety of therapeutic,
rehabilitative or preventive healthcare strategies
• Seeks to understand the patient experience
Information Retrieval(IR) and EBM…cont’d
• What is Evidence Based Medicine (EBM)?.....
• In EBM the physicians:
• use their clinical skills and prior experience to
rapidly identify each patient’s unique clinical
situation
• apply the evidence, tailored to the individual’s risks
versus benefits of potential interventions.
• The ultimate goal of EBM is to support the
patient by contextualizing the evidence with
the patient’s preferences, concerns and expectations.
Information Retrieval(IR) and EBM…cont’d
• What is Evidence Based Medicine (EBM)?
• Applying EBM properly:
• has the potential to be a great equalizer
• striving for equitable care for patients in
disparate parts of the world
• can play a prominent role in policy making
• politicians are increasingly speaking to their use
of research evidence to inform their decision-making
as a declaration of legitimacy
Information Retrieval(IR) and EBM…cont’d
• Effective EBM is comprised of the combination of:
• Individual clinical expertise
• Best external evidence
• Patient values and expectations
Information Retrieval(IR) and EBM…cont’d
• What is Evidence Based Medicine (EBM)?
• The three fundamental principles of EBM are:
A/ Optimal clinical decision making requires
awareness of the evidence
B/ EBM provides guidance to decide whether
evidence is more or less trustworthy
• How confident can we be:
• of the properties of diagnostic tests
• of our patients’ prognoses or of the impact
of our therapeutic options
Information Retrieval(IR) and EBM…cont’d
C/ Evidence alone is never sufficient to make a
clinical decision
• Decision makers must always trade off:
• the benefits and risks, burden, and costs
associated with alternative management
strategies
• and, in doing so, consider their patients’ unique
predicament, values and preferences
Components of Information Retrieval
• Information retrieval is concerned with:
• representing, searching, and manipulating large collections
of electronic text and other human-language data
• Before conducting a search:
• a user has an information need, which underlies and drives
the search process.
• As a result of the information need:
• the user constructs and issues a query to the IR system.
• A query consists of:
• a small number of terms, with two or three terms being
typical for a Web search.
Components of Information Retrieval…cont’d
• Depending on the information need, a query
term may be a date, a number, a musical note,
or a phrase.
• For example:
• the term “inform*” might match any word starting
with that prefix
• Informs
• Informal
• Informant
• Informative
• etc.
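The prefix example above can be sketched in a few lines of Python. This is only an illustration: the function name expand_prefix and the small vocabulary are assumptions made for the example, and the trailing "*" syntax is just one possible convention.

```python
def expand_prefix(term, vocabulary):
    """Return the vocabulary words matched by a term such as 'inform*'."""
    if term.endswith("*"):
        prefix = term[:-1]
        # Every word that starts with the prefix is a match.
        return sorted(w for w in vocabulary if w.startswith(prefix))
    # No wildcard: the term matches only itself, if it is in the vocabulary.
    return [term] if term in vocabulary else []


# Hypothetical vocabulary, chosen to mirror the words listed on the slide.
vocabulary = {"informs", "informal", "informant", "informative", "index"}
print(expand_prefix("inform*", vocabulary))
# ['informal', 'informant', 'informative', 'informs']
```

A real system would look the prefix up in a sorted term dictionary rather than scanning every word.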
Components of Information Retrieval…cont’d
• Users typically issue simple keyword queries, but IR
systems often support a richer query syntax,
frequently with complex Boolean and pattern
matching operators.
• These facilities may be used to limit a search to a
particular Web site, to specify constraints on fields
such as author and title, or to apply other filters,
restricting the search to a subset of the collection.
• A user interface mediates between the user and the
IR system, simplifying the query creation process
when these richer query facilities are required.
Components of Information Retrieval…cont’d
• The user’s query is processed by a search engine, which may
be running:
• on the user’s local machine
• on a large cluster of machines in a remote geographic location
• or anywhere in between the two
• A major task of a search engine is to maintain and
manipulate an inverted index for a document collection.
• the index forms the principal data structure used by the engine for
searching and relevance ranking.
• As its basic function, an inverted index provides a mapping
between terms and the locations in the collection in which
they occur.
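A minimal sketch of an inverted index, assuming whitespace tokenization and lowercasing (both are simplifications chosen for the example, not something the slides prescribe). The function name and the two sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to a list of (doc_id, position) pairs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for position, term in enumerate(text.lower().split()):
            index[term].append((doc_id, position))
    return index


# Hypothetical two-document collection.
documents = {
    1: "evidence based medicine",
    2: "information retrieval and evidence",
}
index = build_inverted_index(documents)
print(index["evidence"])  # [(1, 0), (2, 3)]
```

Looking up a term now returns its locations directly, without scanning every document.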
Components of Information Retrieval…cont’d
• To support relevance ranking algorithms, the
search engine maintains collection statistics
associated with the index such as:
• the number of documents containing each term
• the length of each document
• Using the inverted index, collection statistics,
and other data, the search engine accepts
queries from its users, processes these queries,
and returns ranked lists of results.
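The two statistics named above could be gathered as follows. This is a sketch under the same simplifying assumption of whitespace tokenization; the names collection_statistics, doc_freq and doc_length are illustrative.

```python
from collections import Counter


def collection_statistics(documents):
    """Return (document frequency per term, length of each document)."""
    doc_freq = Counter()
    doc_length = {}
    for doc_id, text in documents.items():
        terms = text.lower().split()
        doc_length[doc_id] = len(terms)
        # Count each term once per document, however often it occurs.
        doc_freq.update(set(terms))
    return doc_freq, doc_length


# Hypothetical two-document collection.
documents = {
    1: "evidence based medicine",
    2: "information retrieval and evidence",
}
doc_freq, doc_length = collection_statistics(documents)
print(doc_freq["evidence"], doc_length[2])  # 2 4
```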
Components of Information Retrieval…cont’d
• To perform relevance ranking
• the search engine computes a score, sometimes called a
retrieval status value (RSV), for each document.
• After sorting documents according to their scores, the
result list must be subjected to further processing, such
as the removal of duplicate or redundant results.

• For example, a Web search engine might report only
one or two results from a single host or domain, eliminating
the others in favor of pages from different sources.
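The sort-then-filter step can be sketched as below. The scores are taken as given (the slides do not fix a scoring formula), and the per-host cap max_per_host, the function name and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urlparse


def rank_and_filter(scored_results, max_per_host=2):
    """Sort (url, score) pairs by score, keeping at most max_per_host per host."""
    ranked = sorted(scored_results, key=lambda r: r[1], reverse=True)
    kept, seen = [], {}
    for url, score in ranked:
        host = urlparse(url).netloc
        if seen.get(host, 0) < max_per_host:
            kept.append((url, score))
            seen[host] = seen.get(host, 0) + 1
    return kept


# Hypothetical scored results; the scores stand in for retrieval status values.
results = [
    ("http://a.example/1", 0.9),
    ("http://a.example/2", 0.8),
    ("http://b.example/1", 0.7),
]
print(rank_and_filter(results, max_per_host=1))
# [('http://a.example/1', 0.9), ('http://b.example/1', 0.7)]
```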
Components of Information Retrieval…cont’d
• Traditional information retrieval system (IRS)
• Three major components of Traditional IRS
• Document subsystem
• Acquisition, Representation, File organization
• User subsystem
• Problem, Representation, Query
• Searching /Retrieval subsystem
• Matching, Retrieved objects
Components of Information Retrieval…cont’d
• An information retrieval system has three major components
• the document subsystem
• the users subsystem
• the searching/retrieval subsystem
• Functions of an IR system:
• Analysis of documents and organization of information (creation of
a document database)
• Analysis of users’ queries and preparation of a strategy to search the
database
• Actual searching or matching of users’ queries with the database,
and finally
• Retrieval of items that fully or partially match the search
statement
Components of Information Retrieval…cont’d

• Acquisition (Document subsystem)
• Selection of documents & other objects from
various web resources.
• Mostly text-based documents
• full texts, titles, abstracts...
• but also other objects:
• data, statistics, images, maps, trade marks, sounds ...
• The data are collected by a web crawler and
stored in a database.
Components of Information Retrieval…cont’d
• Representation of documents, objects (Document subsystem)
• Indexing – many ways:
• free text terms (even in full texts)
• controlled vocabulary - thesaurus
• manual & automatic techniques.
• Abstracting; summarizing
• Bibliographic description:
• author, title, sources, date…
• metadata
• Classifying, clustering
• Organizing in fields & limits
• Basic Index, Additional Index, Limits
Components of Information Retrieval…cont’d
• File organization (Document subsystem)
• Sequential
• record (document) by record
• Inverted
• term by term; list of records under each term
• Combination
• indexes inverted, documents sequential
• When only citations are retrieved, separate document
files are needed
• Large file approaches
• for efficient retrieval by computers
Components of Information Retrieval…cont’d
• Problem (user subsystem)
• Related to users’ task, situation
• vary in specificity, clarity
• Produces information need
• ultimate criterion for effectiveness of retrieval
• how well was the need met?
• The information need for the same problem may
change, evolve or shift during the IR process,
requiring adjustment in searching
• often more than one search for the same problem over
time
Components of Information Retrieval…cont’d
• Representation (user subsystem)
• Converting a concept to query.
• What we search for.
• Query terms are stemmed and corrected using a dictionary.
• Focus toward a good result
• Subject to feedback changes
Components of Information Retrieval…cont’d
• Query - search statement (user & system)
• Translation into systems requirements & limits
• start of human-computer interaction
• query is the thing that goes into the computer
• Selection of files, resources
• Search strategy - selection of:
• search terms & logic
• possible fields, delimiters
• controlled & uncontrolled vocabulary
• variations in effectiveness tactics
• Reiterations from feedback
• several feedback types:
• relevance feedback, magnitude feedback..
• query expansion & modification
Components of Information Retrieval…cont’d
• Matching - searching (Searching subsystem)
• Process of matching, comparing
• search: what documents in the file match the query as stated?
• Various search algorithms:
• exact match - Boolean
• still available in most, if not all systems
• best match - ranking by relevance
• increasingly used e.g. on the web
• hybrids incorporating both
• e.g. Target, Rank in DIALOG
• Each has strengths, weaknesses
• No ‘perfect’ method exists and probably never will
Components of Information Retrieval…cont’d
• Retrieved documents -from system to user (IR
Subsystem)
• Various order of output:
• Last In First Out (LIFO); sorted
• ranked by relevance
• ranked by other characteristics
• Various forms of output
• When citations only: possible links to document
delivery
• Base for relevance, utility evaluation by users
• Relevance feedback
Components of Information Retrieval…cont’d
• ISSUES IN IR
• Information retrieval is concerned with:
• representing, searching, and manipulating
large collections of electronic text and other
human-language data.
• Three Big Issues in IR
• Relevance
• Evaluation
• Emphasis on users and their information needs
Components of Information Retrieval…cont’d
• Relevance
• It is the fundamental concept in IR.
• A relevant document contains the information that a
person was looking for when s/he submitted a query to
the search engine.
• There are many factors that go into a person’s decision as
to whether a document is relevant.
• These factors must be taken into account when designing
algorithms for comparing text and ranking documents.
• Simply comparing the text of a query with the text of a
document and looking for an exact match, as might be
done in a database system, produces very poor results.
Components of Information Retrieval…cont’d
• To address the issue of relevance, retrieval models are
used.
• A retrieval model is a formal representation of the
process of matching a query and a document. It is
the basis of the ranking algorithm that is used in a
search engine to produce the ranked list of
documents.
• A good retrieval model will find documents that are
likely to be considered relevant by the person who
submitted the query.
Components of Information Retrieval…cont’d
• To address the issue of relevance, retrieval
models are used…...
• The retrieval models used in IR typically model
the statistical properties of text rather than
the linguistic structure.
• For example, the ranking algorithms are more
concerned with the counts of word occurrences
than with whether the word is a noun or an adjective.
Components of Information Retrieval…cont’d
• Evaluation
• Two of the evaluation measures are
precision and recall.
• Precision is the proportion of retrieved
documents that are relevant.
• Recall is the proportion of relevant
documents that are retrieved.
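The two definitions above translate directly into set arithmetic. The following is a minimal sketch; the function name and the document ids in the example are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall from collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, three relevant overall, two in common.
precision, recall = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(precision)  # 0.5
```

Here precision is 2/4 and recall is 2/3: half of what was returned is relevant, and two of the three relevant documents were found.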
Components of Information Retrieval…cont’d
• Emphasis on users and their information needs
• The users of a search engine are the ultimate judges of
quality.
• Text queries are often poor descriptions of what the user
actually wants, compared to a request to a database
system, such as for the balance of a bank account.
• Despite their lack of specificity, one-word queries are very
common in web search.
• Techniques such as query suggestion, query expansion
and relevance feedback use interaction and context to
refine the initial query in order to produce better ranked
results
Components of Information Retrieval…cont’d
• Main problems
• Document and query indexing
• How to represent their contents?
• Query evaluation
• To what extent does a document correspond to a
query?
• System evaluation
• How good is a system?
• Are the retrieved documents relevant? (precision)
• Are all the relevant documents retrieved? (recall)
Components of Information Retrieval…cont’d
• Why is IR difficult?
• Vocabularies mismatching
• Language can be used to express the same
concepts in many different ways, with different words.
This is referred to as the vocabulary mismatch problem
in information retrieval.
• E.g., synonymy: car vs automobile
• Queries are ambiguous
• Content representation may be inadequate and
incomplete
• The user is the ultimate judge, but we don’t know how the
judge judges.
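One common way to soften the vocabulary mismatch problem is query expansion. The sketch below assumes a tiny hand-written synonym table (SYNONYMS is hypothetical; real systems typically draw on a thesaurus or learn such relations) and treats a document as matching when it contains at least one variant of every query term.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}


def expand_query(terms, synonyms=SYNONYMS):
    """Replace each query term by the group of the term plus its synonyms."""
    return [sorted({term} | synonyms.get(term, set())) for term in terms]


def matches(expanded_query, document_text):
    """True if the document contains at least one word from every group."""
    words = set(document_text.lower().split())
    return all(any(t in words for t in group) for group in expanded_query)


document = "automobile repair manual"
print(matches([["car"], ["repair"]], document))              # False
print(matches(expand_query(["car", "repair"]), document))    # True
```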
Components of Information Retrieval…cont’d
• Challenges in IR
• Scale, distribution of documents

• Controversy over the unit of indexing

• High heterogeneity

• Retrieval strategies
Types of information Retrieval
• Retrieval
• the two broad approaches to information
retrieval are:
• Exact-match searching
• allows the user precise control over the items
retrieved
• Partial-match searching,
• recognizes the inexact nature of both indexing and
retrieval
• instead attempts to return the user content ranked
by how close it comes to the user’s query
Types of Information Retrieval…cont’d
• Exact-Match Retrieval
• In exact-match searching, the IR system gives the
user all documents that exactly match the criteria
specified in the search statement(s).
• Uses Boolean searching - the Boolean operators
AND, OR, and NOT are usually required to create a
manageable set of documents.
• Most of the early operational IR systems, from the
1950s through the 1970s, used the exact-match
approach.
Types of Information Retrieval…cont’d
• In modern times, exact-match searching tends to
be associated with retrieval from bibliographic
and annotated databases, while the partial-
match approach tends to be used with full-text
searching.
• Typically the first step in exact-match retrieval is
to select terms to build sets.
• use attributes, such as the author name,
publication type, or gene identifier (in the
secondary source identifier field of MEDLINE)
Types of Information Retrieval…cont’d
• Once the search term(s) and attribute(s) have been selected,
they are combined with the Boolean operators.
• The Boolean AND operator is typically used to narrow a
retrieval set to contain only documents with two or more
concepts.
• The Boolean OR operator is usually used when there is more
than one way to express a concept.
• The Boolean NOT operator is often employed as a
subtraction operator that must be applied to another set.
• Some systems more accurately call this the ANDNOT
operator.
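The three operators map naturally onto set operations over the document sets retrieved for each term. The sketch below uses made-up terms and document ids; it is an illustration of the idea, not the query syntax of any particular system.

```python
# Hypothetical sets of document ids retrieved for each search term.
postings = {
    "asthma": {1, 2, 5},
    "children": {2, 3, 5},
    "pediatric": {4, 5},
    "review": {5},
}

# AND narrows: documents containing both concepts.
both = postings["asthma"] & postings["children"]

# OR broadens: more than one way to express the same concept.
either = postings["children"] | postings["pediatric"]

# NOT subtracts from another set (hence "ANDNOT").
without_reviews = both - postings["review"]

print(both, either, without_reviews)
```

Note that the subtraction needs a left-hand set to subtract from, which is why NOT cannot stand alone as a query.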
Types of Information Retrieval…cont’d
• Some retrieval systems allow terms in searches to be
expanded by using the wild-card character, which adds
all words to the search that begin with the letters up
until the wild-card character. This approach is also called
truncation.
• Unfortunately, there is no standard approach to using
wild-card characters, so syntax for them varies from
system to system.
• PubMed, for example, allows a single asterisk at the end
of a word to signify a wild-card character. Thus, the
query word can* will lead to the words cancer and
Candid, among others, being added to the search
Types of Information Retrieval…cont’d
• Partial-Match Retrieval
• Although partial-match searching was
conceptualized very early, it did not see
widespread use in IR systems until the advent
of Web search engines in the 1990s.

• This is most likely because exact-match
searching tends to be preferred by “power
users” whereas partial-match searching is
preferred by novice searchers.
Types of Information Retrieval…cont’d
• Exact-match searching requires an
understanding of Boolean operators and (often)
the underlying structure of databases (e.g., the
many fields in MEDLINE), whereas partial-match
searching allows a user to simply enter a few terms
and start retrieving documents.
• Although partial-match searching does not
exclude the use of non-term attributes of
documents, and for that matter does not even
exclude the use of Boolean operators, the most
common use of this type of searching is with a query
of a small number of words, also known as a
natural language query.
Types of Information Retrieval…cont’d
• In the partial-match approach, documents are
typically ranked by their closeness of fit to the query.
• That is, documents containing more query terms will
likely be ranked higher, since those with more query
terms will in general be more likely to be relevant to
the user. As a result this process is called relevance
ranking.
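As a rough illustration of this idea, the sketch below ranks documents by how many distinct query terms they contain. Counting term overlap is a deliberate simplification (real systems use weighted scores), and the function name and sample documents are assumptions made for the example.

```python
def rank_by_overlap(query, documents):
    """Rank documents by the number of distinct query terms they contain."""
    query_terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_terms & set(text.lower().split()))
        if overlap:  # partial matches are kept, non-matches are dropped
            scored.append((doc_id, overlap))
    # Highest overlap first; ties broken by document id.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical collection.
documents = {
    1: "asthma treatment in children",
    2: "asthma in adults",
    3: "diabetes care",
}
print(rank_by_overlap("asthma children treatment", documents))
# [(1, 3), (2, 1)]
```

Document 2 matches only one of the three query terms, so it is still returned but ranked below document 1.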
• Self evaluation questions
• Explain information retrieval and evidence based medicine.

• What are the types of information retrieval?

• List the components of information retrieval.
