
Unit IV

The document discusses user search techniques and information visualization, focusing on search statements, similarity measures, relevance feedback, and selective dissemination of information. It explains the binding process for translating user queries into system-compatible formats and various similarity measures for ranking search results. Additionally, it covers the integration of Boolean logic with weighted searches and ranking algorithms to enhance information retrieval systems.

Uploaded by

kuppamhyndu

USER SEARCH TECHNIQUES

AND INFORMATION VISUALIZATION

UNIT-IV
CATALOGING AND INDEXING

User Search Techniques: Search Statements and Binding, Similarity Measures and Ranking,
Relevance Feedback, Selective Dissemination of Information Search, Weighted Searches of
Boolean Systems, Searching the INTERNET and Hypertext

Information Visualization: Introduction to Information Visualization, Cognition and Perception,
Information Visualization Technologies

Search Statements

 A search statement is the elaboration of an information need. It is generated by the user to
specify the concepts being sought. The statement may use traditional Boolean logic, and in some
systems the user can assign weights to the different concepts in it.
 The search statement directly affects the ability of the information retrieval system to find
relevant items.
 In general, the longer the search query, the easier it is for the system to find relevant items.
 BINDING: Binding means that a more abstract form is redefined into a more specific form.
o At the first level, the search statement is bound to the vocabulary and past experience of the user.
o At the next level, the search statement is parsed for use by a specific search system.
o The search system translates the query into its own metalanguage.
o In the final level of binding:
 The search is applied to a specific database.
 This binding is based on the statistics of the processing tokens in the database,
that is, on the current contents of the database.
 In short, binding is the process of translating or mapping the user's abstract request
into specific terms and formats that a computer system can process. There are
three levels of binding:
 User-Level Binding – Concept to Words
 System-Level Binding – Query to Search Engine Metalanguage
 Database-Level Binding – Match Query with Data

K. JAYASRI | UNIT – IV | CSM 1 Information Retrieval System (IRS)



 “Find me information on the impact of oil spills in Alaska on the price of oil.”
 Step 1 – User Binding:
o Extracted terms: impact, oil, spills, Alaska, price, etc.
 Step 2 – System Binding:
o Terms are mapped to synonyms and given weights:
o oil (.606), petroleum (.65), price (.16), cost (.25), value (.10), etc.
 Step 3 – Database Binding:
o Weights are adjusted again based on the document statistics and indexing semantics of that
particular database.
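
The three binding steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list, the thesaurus structure, the default weight of 0.5, and the use of a log-based inverse document frequency for the database step are all hypothetical choices, not the method of any particular system. Only the weights for oil, petroleum, price, cost and value are taken from the example.

```python
import math

# Hypothetical stop-word list used for user-level binding.
STOP_WORDS = {"find", "me", "information", "on", "the", "of", "in"}

# Hypothetical thesaurus; the weights echo the example query.
THESAURUS = {
    "oil": {"oil": 0.606, "petroleum": 0.65},
    "price": {"price": 0.16, "cost": 0.25, "value": 0.10},
}

def user_binding(statement):
    """Step 1: reduce the natural-language statement to distinct concept words."""
    terms, seen = [], set()
    for raw in statement.split():
        word = raw.strip(".,?!").lower()
        if word and word not in STOP_WORDS and word not in seen:
            seen.add(word)
            terms.append(word)
    return terms

def system_binding(terms, default_weight=0.5):
    """Step 2: expand terms to synonyms and attach weights (default is assumed)."""
    weighted = {}
    for term in terms:
        for synonym, weight in THESAURUS.get(term, {term: default_weight}).items():
            weighted[synonym] = weight
    return weighted

def database_binding(weighted, doc_freq, n_docs):
    """Step 3: rescale weights with statistics of one specific database.

    Terms that never occur in the database are dropped.
    """
    return {
        term: weight * math.log(n_docs / doc_freq[term])
        for term, weight in weighted.items()
        if doc_freq.get(term)
    }

terms = user_binding(
    "Find me information on the impact of oil spills in Alaska on the price of oil."
)
weighted = system_binding(terms)
bound = database_binding(weighted, {"oil": 10, "petroleum": 2}, n_docs=100)
```

In this toy database "petroleum" is rarer than "oil", so the database-level step raises its weight relative to "oil" even further.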

Similarity Measures and Ranking

 In general, searching is concerned with calculating the “similarity” between “a user’s search
statement” and “the items in the database”.
 Similarity may be computed against the total item or against logical passages within the item.
 For example, every paragraph, or every 100 words, may be defined as a passage.

 SIMILARITY MEASURES

 Different similarity measures can be used to calculate the similarity between
an item and the search statement.
 The same measures are used to calculate the similarity between documents for clustering purposes.
 Simple Sum of Products

o SIM(Item_i, Item_j) = Σ_k (Term_i,k × Term_j,k)
o The formula sums the products of the corresponding term weights of two items.
o This is a basic approach but it lacks normalization, which can distort results
when items vary in length.
 Croft’s Similarity Formula

o C is a tuning constant.
o IDF_i is the inverse document frequency of term i, which gives higher weight to rarer
terms.
o K is a second tuning constant (typically 0.3 to 0.5).
o TF_i,j is the frequency of term i in item j.
o maxfreq_j is the maximum frequency of any term in item j.
o The formula adjusts term weights based on their frequency and rarity, improving
relevance.
 Cosine Similarity

o Measures the cosine of the angle between two vectors (document and
query). A value of 1 means the vectors point in the same direction, and 0
means they are orthogonal (no terms in common).
o The denominator normalizes for vector length, so with non-negative term
weights the result is between 0 and 1.
o A variant (the fourth equation) simplifies the denominator but still normalizes
the score.
 Jaccard and Dice Measures
o Jaccard: The denominator depends on the number of common terms,
producing scores between -1 and 1. It penalizes dissimilarities more heavily.


o Dice: The Dice measure simplifies the denominator and adds a factor of 2 in the
numerator. It also ranges from -1 to 1, but it is less sensitive to the number of common
terms.
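
As a hedged sketch, the measures in this section can be written as small Python functions. The forms used here are the commonly published vector versions (squared sums in the Jaccard and Dice denominators, and a normalized term frequency of K + (1 - K) · TF / maxfreq inside Croft's sum); they are assumptions and may differ in detail from the formulas shown on the original slides.

```python
import math

def dot(x, y):
    """Simple sum of products of corresponding term weights."""
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    denom = math.sqrt(dot(x, x)) * math.sqrt(dot(y, y))
    return dot(x, y) / denom if denom else 0.0

def jaccard(x, y):
    # The shared-term product is subtracted in the denominator.
    denom = dot(x, x) + dot(y, y) - dot(x, y)
    return dot(x, y) / denom if denom else 0.0

def dice(x, y):
    # Simpler denominator, factor of 2 in the numerator.
    denom = dot(x, x) + dot(y, y)
    return 2 * dot(x, y) / denom if denom else 0.0

def croft(query_terms, tf, idf, C=1.0, K=0.3):
    """Croft-style score (assumed form).

    tf maps term -> frequency in the item, idf maps term -> inverse
    document frequency. C and K are the tuning constants.
    """
    maxfreq = max(tf.values())
    score = 0.0
    for term in query_terms:
        if term in tf:
            f = K + (1 - K) * tf[term] / maxfreq  # normalized term frequency
            score += (C + idf[term]) * f
    return score

query = [1, 1, 0]
item = [1, 0, 1]
scores = (cosine(query, item), jaccard(query, item), dice(query, item))
croft_score = croft(["oil", "spill"], {"oil": 2, "spill": 4},
                    {"oil": 1.0, "spill": 2.0})
```

For these two binary vectors the Jaccard score (1/3) is lower than the Dice score (0.5), which reflects the heavier penalty Jaccard places on non-shared terms.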

 Example: the simple “sum of the products” similarity formula is used to calculate the similarity
between the query and each document. If no threshold is specified, all three documents
are considered hits. If a threshold of 4 is selected, then only DOC1 is returned.

 Query threshold process:
 The threshold defines which items are placed in the resultant hit file for the query.
 If no threshold is specified, all the documents are considered hits.

SIM (Q, DOC_1) = 0 x 0 + 0 x 1 + 0 x 1 + 0 x 0 + 1 x 2 + 0 x 0 + 0 x 0 + 1 x 3 + 1 x 1 + 0 x 0
SIM (Q, DOC_1) = 2 + 3 + 1
SIM (Q, DOC_1) = 6

SIM (Q, DOC_2) = 0 x 1 + 0 x 3 + 0 x 3 + 0 x 2 + 1 x 0 + 0 x 1 + 0 x 0 + 1 x 0 + 1 x 0 + 0 x 0
SIM (Q, DOC_2) = 0

SIM (Q, DOC_3) = 0 x 0 + 0 x 0 + 0 x 1 + 0 x 3 + 1 x 3 + 0 x 3 + 0 x 0 + 1 x 0 + 1 x 0 + 0 x 2
SIM (Q, DOC_3) = 3
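
The calculation can be checked with a short Python sketch. The document vectors are read off the second factor of each product in the SIM expansions, and the query vector is the one used for DOC_1 and DOC_2. The function names and the "score >= threshold" convention are illustrative assumptions.

```python
QUERY = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0]
DOCS = {
    "DOC1": [0, 1, 1, 0, 2, 0, 0, 3, 1, 0],
    "DOC2": [1, 3, 3, 2, 0, 1, 0, 0, 0, 0],
    "DOC3": [0, 0, 1, 3, 3, 3, 0, 0, 0, 2],
}

def sum_of_products(query, doc):
    """Simple sum-of-products similarity between two term vectors."""
    return sum(q * d for q, d in zip(query, doc))

def search(query, docs, threshold=None):
    """Return (name, score) pairs in rank order.

    With no threshold every document is a hit; otherwise only
    documents scoring at least the threshold are kept (assumed convention).
    """
    scores = {name: sum_of_products(query, vec) for name, vec in docs.items()}
    hits = [(name, s) for name, s in scores.items()
            if threshold is None or s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

all_hits = search(QUERY, DOCS)
filtered = search(QUERY, DOCS, threshold=4)
```

Sorting the hits by descending score is also the simplest form of ranking: the most likely relevant item, DOC1, is presented first.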

 Ranking:

 Once items are identified as possibly relevant to the user’s query, it is best
to present the most likely relevant items first.
 This process is called “Ranking”.

Relevance Feedback

 The idea behind relevance feedback is that the new query should be based on the old query.
 The old query is modified to increase the weight of terms found in relevant items and to decrease the
weight of terms found in non-relevant items.
 The first major work on relevance feedback was published in 1965 by Rocchio.
 This technique not only modifies the weights of terms in the original query but also allows the query
to be expanded with new terms from the relevant items.
 The revised Rocchio formula for query modification:
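
A minimal sketch of a Rocchio-style update, assuming the commonly cited form new_query = α · old_query + β · centroid(relevant items) − γ · centroid(non-relevant items). The default constants and the clipping of negative weights to zero are illustrative assumptions, not values taken from these notes.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Return a modified query vector (assumed Rocchio-style form)."""
    n = len(query)

    def centroid(vectors):
        # Average vector of a set of items; all zeros if the set is empty.
        if not vectors:
            return [0.0] * n
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]

    pos = centroid(relevant)
    neg = centroid(non_relevant)
    # Negative weights are clipped to zero (a common convention, assumed here).
    return [max(0.0, alpha * q + beta * p - gamma * m)
            for q, p, m in zip(query, pos, neg)]

old_query = [1, 0, 1, 0]
relevant_items = [[2, 2, 0, 0], [0, 2, 0, 0]]
non_relevant_items = [[0, 0, 0, 4]]
new_query = rocchio(old_query, relevant_items, non_relevant_items)
```

The second term was absent from the old query but receives a weight of 1.5 in the new one, which illustrates query expansion from the relevant items. The fourth term, seen only in a non-relevant item, stays at zero.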


Positive feedback:

 Positive feedback is weighted significantly greater than negative feedback.
 Many times positive feedback alone is used in a relevance feedback environment.
 Positive feedback is more likely to move a query closer to the user's information need.

Negative feedback:

 Negative feedback may help, but in some cases it actually reduces the effectiveness
of a query.

 The “Impact of Relevance Feedback” figure shows how positive and negative feedback
affect the query’s position in the document space:
 Circles: Represent documents (filled circles are non-relevant, open circles are relevant).
 Oval: The set of items retrieved by the query.
 Solid Box: The original query’s position.
 Hollow Box: The query’s position after feedback.


 Recent experiments with relevance feedback during the TREC sessions have shown conclusively the
advantages of relevance feedback.
 Queries using relevance feedback produce significantly better results than queries that are
manually enhanced.
 When users enter queries with only a few terms, automatic relevance feedback based on just the
rank values of the retrieved items has been used.
 This concept is called pseudo-relevance feedback, blind feedback, or local context analysis.
It does not require human relevance judgments.
 The highest ranked items from the query are automatically assumed to be relevant.
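
Pseudo-relevance feedback can be sketched as follows. This is a simplified illustration: it scores with a plain sum of products, treats the top_k ranked items as relevant, and adds a fraction beta of their centroid to the query. The parameter names and values are assumptions.

```python
def score(query, doc):
    """Sum-of-products similarity between two term vectors."""
    return sum(q * d for q, d in zip(query, doc))

def pseudo_relevance_feedback(query, docs, top_k=2, beta=0.5):
    """Expand a query using its own highest ranked items.

    No human judgment is involved: the top_k items are simply
    assumed to be relevant.
    """
    ranked = sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)
    assumed_relevant = [docs[name] for name in ranked[:top_k]]
    centroid = [sum(col) / len(assumed_relevant) for col in zip(*assumed_relevant)]
    expanded = [q + beta * c for q, c in zip(query, centroid)]
    return ranked[:top_k], expanded

docs = {
    "d1": [2, 1, 1, 0],
    "d2": [1, 1, 1, 0],
    "d3": [0, 0, 0, 3],
}
top_items, expanded_query = pseudo_relevance_feedback([1, 1, 0, 0], docs)
```

The third term was not in the short original query, but because it occurs in both top-ranked items it enters the expanded query with weight 0.5.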

Selective Dissemination of Information Search

 Selective dissemination of information (SDI) systems, frequently called “dissemination systems”,
are becoming more prevalent with the growth of the Internet.
 A dissemination system is sometimes labeled a “push” system, whereas a search system is
called a “pull” system.
 SDI systems automatically deliver new items to users when the items match a user’s profile
(a static query representing the user’s interests).
 Key characteristics include:
o Push vs. Pull: In SDI, the system pushes relevant items to the user as new data arrives,
unlike search systems where users pull information via queries.
o Profiles: These are static, broad search statements with many terms (hundreds,
compared to a few in ad hoc queries), covering a user’s general information needs.
o Time Parameter: Profiles are active as long as their time parameter is in the future,
delivering items asynchronously to the user’s "mail" file.
 Examples
o Logicon Message Dissemination System (LMDS):
 Treats profiles as a static database and the new item as a query.
 Uses a least frequently occurring trigraph (three-character sequence) algorithm
to quickly eliminate profiles that don’t match the item, then analyzes potential
matches in detail.
o Personal Library Software (PLS):
 Accumulates new items into a database and periodically runs profiles against it,
sacrificing real-time delivery for the use of retrospective search software.
o RetrievalWare and InRoute:
 RetrievalWare: Uses a statistical algorithm that doesn’t rely on corpora data,
comparing each profile to the item after filtering out irrelevant terms.
 InRoute: Uses IDF, which it computes and updates as items arrive, stabilizing
over time as more items are processed.
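
The push model and the idea of a quick trigraph screen can be illustrated with a small sketch. It is not a description of how LMDS is actually implemented: the profile contents, the user names, the way trigraph rarity is estimated (counts over the profile terms themselves) and the OR-style matching rule are all assumptions made for the example. Profile terms are assumed to be at least three characters long.

```python
import re
from collections import Counter

def trigraphs(text):
    """All three-character sequences in the lower-cased text."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

# Hypothetical static profiles: user -> terms of interest.
PROFILES = {
    "alice": ["petroleum", "alaska"],
    "bob": ["visualization"],
}

# Trigraph frequencies estimated over all profile terms (an assumption).
FREQ = Counter(t for terms in PROFILES.values()
               for term in terms for t in trigraphs(term))

def rarest_trigraph(term):
    """Least frequently occurring trigraph of a term (ties broken alphabetically)."""
    return min(sorted(trigraphs(term)), key=lambda t: FREQ[t])

# One screening key per profile term.
KEYS = {user: {rarest_trigraph(term) for term in terms}
        for user, terms in PROFILES.items()}

def disseminate(item):
    """Treat the new item as the query and the profiles as the database."""
    item_trigraphs = trigraphs(item)
    words = set(re.findall(r"[a-z]+", item.lower()))
    recipients = []
    for user, terms in PROFILES.items():
        if not (KEYS[user] & item_trigraphs):
            continue  # quick elimination: no profile term can be present
        if any(term in words for term in terms):
            recipients.append(user)  # detailed match succeeded
    return recipients

recipients = disseminate("Oil spill reported in Alaska.")
```

The screen never rejects a true match, because any term that occurs in the item contributes all of its trigraphs, including the key. It only spares the detailed comparison for profiles that cannot match.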

Weighted Searches of Boolean Systems

 Boolean queries (e.g., "A AND B", "A OR B", "A NOT B") are traditionally strict—they return items
that exactly match the conditions (e.g., both A and B for AND).

 In weighted index systems (where terms have weights representing their importance) strict
Boolean logic can be too restrictive (AND) or too general (OR), leading to suboptimal retrieval
results.
 Additionally, pure Boolean systems lack ranking, which is critical for user experience in
information retrieval.
 The goal is to combine Boolean logic with weighted systems to allow for fuzziness and ranking.
1. Fuzzy Set Approach (Fox and Sharat)
 Fuzzy sets introduce the concept of "degree of membership" (DEG), allowing
terms to partially belong to a set rather than strictly being in or out. This is
useful for combining Boolean logic with weights.
 Degree of Membership for AND and OR:
DEG(A AND B) = min(DEG_A, DEG_B)
DEG(A OR B) = max(DEG_A, DEG_B)

2. Mixed Min and Max (MMM) Model
 The MMM model calculates the similarity between a query (QUERY) and a
document (DOC) as a linear combination of the minimum and maximum weights of
the terms:
SIM(QUERY_OR, DOC) = C_OR1 × max(DOC_1, ..., DOC_n) + C_OR2 × min(DOC_1, ..., DOC_n)
SIM(QUERY_AND, DOC) = C_AND1 × min(DOC_1, ..., DOC_n) + C_AND2 × max(DOC_1, ..., DOC_n)

3. Paice’s Extension of MMM
 Paice expanded the MMM model by considering all term weights (not just
min/max) and sorting them.
 Similarity is calculated as a weighted sum of the sorted term weights, in which
terms later in the sort order contribute progressively less.
 This method requires more computation due to sorting but provides a more
comprehensive use of weights.

4. P-norm Model
 For OR queries, the origin (all weights = 0) is the worst case; the best documents
are farthest from the origin.
 For AND queries, the ideal point is the unit vector (all weights = 1); the best
documents are closest to this point.

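 The similarity formulas measure these distances with a p-norm; the parameter p
controls how strictly the AND and OR operators are interpreted.

5. Salton’s Approach (No Weights in Indexes)
 Salton’s method applies Boolean operations first, then refines results using
weights.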
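
The first four models can be compared side by side in a short Python sketch. The forms below are the commonly published ones and are assumptions here: MMM coefficients that sum to 1, Paice's weighting by powers of a constant r over the sorted term weights, and the usual P-norm distance formulas with query term weights a_i and document term weights d_i.

```python
def fuzzy_and(weights):
    return min(weights)

def fuzzy_or(weights):
    return max(weights)

def mmm_or(weights, c_or1=0.7):
    # OR gives most of the credit to the maximum weight.
    return c_or1 * max(weights) + (1 - c_or1) * min(weights)

def mmm_and(weights, c_and1=0.7):
    # AND gives most of the credit to the minimum weight.
    return c_and1 * min(weights) + (1 - c_and1) * max(weights)

def paice(weights, r, operator):
    """All weights contribute; sort descending for OR, ascending for AND."""
    ordered = sorted(weights, reverse=(operator == "or"))
    numerator = sum(r ** i * w for i, w in enumerate(ordered))
    denominator = sum(r ** i for i in range(len(ordered)))
    return numerator / denominator

def pnorm_or(doc_w, query_w, p):
    """Normalized distance from the origin (all weights 0)."""
    numerator = sum((a ** p) * (d ** p) for a, d in zip(query_w, doc_w))
    denominator = sum(a ** p for a in query_w)
    return (numerator / denominator) ** (1 / p)

def pnorm_and(doc_w, query_w, p):
    """One minus the normalized distance from the ideal point (all weights 1)."""
    numerator = sum((a ** p) * ((1 - d) ** p) for a, d in zip(query_w, doc_w))
    denominator = sum(a ** p for a in query_w)
    return 1 - (numerator / denominator) ** (1 / p)

doc = [0.2, 0.8]      # document weights for terms A and B
equal = [1.0, 1.0]    # both query terms equally important
results = {
    "fuzzy": (fuzzy_and(doc), fuzzy_or(doc)),
    "mmm": (mmm_and(doc), mmm_or(doc)),
    "paice": (paice(doc, 0.5, "and"), paice(doc, 0.5, "or")),
    "pnorm": (pnorm_and(doc, equal, 2), pnorm_or(doc, equal, 2)),
}
```

For this document every model scores the OR query higher than the AND query, but only the pure fuzzy model ignores one of the two weights entirely. With p = 1 the P-norm AND and OR scores coincide (both are the average weight), and as p grows they approach the fuzzy min and max.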

Boolean Operations

 Two circles represent sets A and B. The areas are:
 White: (A NOT B)
 Black dotted: (A AND B)
 Grey: (B NOT A)

Weighted Interpretation

 For AND:
 At weight 1.0 for B, only the black dotted area (A AND B) is included.
 As B’s weight decreases to 0.0, the white area (A NOT B) is added.
 For OR:
 At weight 0.0 for B, only A (white + black dotted) is included.
 As B’s weight increases to 1.0, the grey area (B NOT A) is added proportionally.
 For NOT:
 At weight 0.0 for B, all of A (white + black dotted) is included.
 As B’s weight increases to 1.0, the black dotted area (A AND B) is removed,
leaving only the white area (A NOT B).

Ranking Algorithms

1. Coarse Grain Ranking
2. Fine Grain Ranking

Ranking Algorithms - Coarse Grain Ranking

Completeness:

 Example: If the query has 5 terms and the item contains 3 of them (or their synonyms),
the completeness is 3/5 = 0.6.
 This sets an upper limit on the item’s rank. If query terms are weighted (e.g., some terms are
marked as more important), those weights are factored into the score.

Semantic Distance:

 Synonyms (e.g., “buy” for “purchase”) increase the score, while antonyms (e.g., “sell” for
“buy”) decrease it.
 The closer the semantic relationship, the more weight is added to the ranking.

Contextual Evidence:

 Example: If the query term is “charge” with the context of “paying for an object,” finding
words like “buy,” “purchase,” or “debt” in the item suggests that “charge” is used in the desired
sense, increasing the item’s score.
 This helps disambiguate terms with multiple meanings (e.g., “charge” as in payment vs.
“charge” as in an electrical charge).
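
The completeness factor lends itself to a short sketch. The function below is an illustration only: the synonym table, the optional per-term weights and the example terms are hypothetical.

```python
def completeness(query_terms, item_terms, synonyms=None, weights=None):
    """Fraction of (weighted) query terms found in the item, directly or via a synonym."""
    synonyms = synonyms or {}
    weights = weights or {}
    item = set(item_terms)
    total = 0.0
    matched = 0.0
    for term in query_terms:
        w = weights.get(term, 1.0)  # unweighted terms count as 1
        total += w
        if term in item or item & set(synonyms.get(term, ())):
            matched += w
    return matched / total if total else 0.0

query = ["buy", "used", "car", "engine", "warranty"]
item = ["purchase", "car", "with", "warranty"]
plain = completeness(query, item, synonyms={"buy": ["purchase"]})
weighted = completeness(query, item, synonyms={"buy": ["purchase"]},
                        weights={"engine": 3.0})
```

Three of the five query terms are covered (one through the synonym "purchase"), giving 3/5 = 0.6. Marking the missing term "engine" as three times as important lowers the completeness to 3/7.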

Ranking Algorithms - Fine Grain Ranking

Proximity:

• A proximity factor is calculated, which is highest when terms are adjacent and
decreases as the distance between terms increases.
• Example: If the query terms “machine” and “learning” appear next to each other
(e.g., “machine learning”), the item scores higher than if they are in different
paragraphs (e.g., “machine” in the introduction and “learning” in the conclusion).

Impact of Proximity:

• In long documents, if query terms are widely distributed (e.g., one term at the
beginning and another at the end), the fine grain rank can drop significantly,
even to zero, despite the terms being present.
• This reflects the intuition that scattered terms are less likely to indicate relevance
to the query as a whole.
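
A proximity factor with this behaviour can be sketched as below. The 1/distance decay is an assumed choice for illustration; real systems may use other functions and may cut the factor to zero beyond some distance. The two query terms are assumed to be different words.

```python
def proximity_factor(tokens, term_a, term_b):
    """1.0 when the two terms are adjacent, smaller as they move apart, 0.0 if either is missing."""
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    if not pos_a or not pos_b:
        return 0.0
    gap = min(abs(i - j) for i in pos_a for j in pos_b)  # closest pair of occurrences
    return 1.0 / gap

adjacent = proximity_factor("machine learning is fun".split(),
                            "machine", "learning")
scattered = proximity_factor(
    "machine vision differs from statistical learning".split(),
    "machine", "learning")
```

Adjacent terms score 1.0, while the same two terms five positions apart score 0.2.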

Searching the INTERNET and Hypertext

 Internet Search Mechanisms

The Internet in the 1990s relied on search engines like Yahoo, AltaVista, and Lycos to help users find
information. These search engines worked by:

o Indexing Process: Automated processes (called crawlers or spiders) visited websites,
retrieved text, and created indexes. These indexes stored references to web content,
including URLs (web addresses), to help users locate relevant pages.
 Lycos indexed only the home pages of websites, focusing on a limited scope.
 AltaVista indexed all text on a website, providing a more comprehensive search.
o Ranking Algorithms: Search results were ranked using simple statistical methods based
on word frequency in the indexed text. This helped prioritize results most relevant to
the user's query.
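
The indexing and ranking steps can be sketched as a tiny inverted index with frequency-based ranking. The page contents and URLs are made up for the example, and no actual crawling is performed; the sketch starts from text that a crawler is assumed to have already retrieved.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the URLs containing it, with occurrence counts."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index

def search(index, query):
    """Rank URLs by the summed frequency of the query words."""
    scores = Counter()
    for word in tokenize(query):
        for url, freq in index.get(word, {}).items():
            scores[url] += freq
    return [url for url, _ in scores.most_common()]

# Hypothetical crawled pages.
pages = {
    "http://a.example": "oil spill oil price",
    "http://b.example": "oil painting",
    "http://c.example": "weather",
}
index = build_index(pages)
ranked_urls = search(index, "oil price")
```

The first page mentions the query words three times in total and is ranked ahead of the second, which mentions them once. The third page contains none of them and is not returned.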
 Six Key Characteristics of Intelligent Agents
Intelligent agents were software programs designed to autonomously search the Internet for
information based on user-defined needs.

1. Autonomy
 Agents operated independently without constant human input, navigating websites
based on predefined criteria to collect relevant information.

2. Communication Ability
 Agents used standard protocols (e.g., Z39.50, a library search protocol) to interact
with websites and retrieve data.

3. Capacity for Cooperation
 Agents could work together or with other systems to achieve common goals, such
as sharing search results.

4. Capacity for Reasoning
 Rule-based: Followed user-defined conditions and actions (e.g., "if a page contains
'AI,' save it").
 Knowledge-based: Learned from past actions to improve future searches.
 Artificial Evolution-based: Created new, smarter agents to handle complex tasks.

5. Adaptive Behavior
 Agents assessed their environment and adjusted their actions to better meet user
needs, combining autonomy and reasoning.

6. Trustworthiness
 Users needed to trust that agents would act in their best interests, retrieving
relevant and accessible information.

Information Visualization

 Doyle (1962): Proposed "semantic road maps" to give users a visual overview of a database’s
content, allowing them to focus queries on specific themes.

 Miller (1968): Emphasized spatial organization to map database information.
 Sammon (1969): Developed a non-linear mapping algorithm to reveal document associations,
enabling visual "road maps."
 By the 1990s, technological advancements and the explosion of digital information made
visualization a practical and commercially viable field.
 Why Information Visualization Matters
o Historically, IR systems have focused on algorithmic improvements (e.g., indexing,
searching, clustering) rather than on how information is displayed to users. This was due to:
 Technological limitations: Early systems lacked the capability for sophisticated visual
displays.
 Academic focus: Researchers prioritized algorithmic challenges over human-computer
interface (HCI) design.
 Multidisciplinary complexity: Effective visualization requires integrating cognitive
science, perception, and HCI.
 Visualization can be divided into:
o Link visualization: Shows relationships between items (e.g., networks or connections).
o Attribute visualization: Reveals content patterns across many items (e.g., how search
terms influence results).

Cognition and Perception

 User-machine interfaces have shifted from basic typewriter-like interactions to more complex
systems such as WIMP (windows, icons, menus, pointer) interfaces, which handle multiple tasks
simultaneously.
 As computer displays became common, the focus turned to representing information visually in
ways that align with human cognitive processes.
 The goal is to reduce the mental effort (cognitive overhead) users spend finding and
understanding information by leveraging human perception, particularly vision but also other
senses such as hearing and touch.
 Background on Vision and Cognition
o Gestalt Psychology: The mind organizes visual input into meaningful wholes using these rules:
 Proximity: Nearby objects are grouped together.
 Similarity: Similar objects are grouped together.
 Continuity: Smooth, continuous patterns are preferred (e.g., a circle with a line through
it is seen as a circle and a line, not two half-circles).
 Closure: Gaps are mentally filled to form a whole (e.g., a dashed square is still perceived
as a square).
 Connectedness: Linked elements are seen as a single unit.

 Aspects of the Visualization Process
o Preattentive Processing
 The figure shows three groups of shapes: upward arrows (left), left-right arrows
(middle), and "C" shapes (right).
 The visual system quickly detects the boundary between the upward and left-right
arrows due to orientation differences (preattentive).
 Detecting the boundary between the left-right arrows and the "C" shapes requires
more conscious effort because it involves different objects, not just orientations.
 Preattentive processes are automatic, low-level, and preconscious. They detect basic
visual features like borders or orientation changes quickly.
 Implication: Information encoded in orientations (e.g., grouping by direction) is
easier to detect than information encoded in different shapes, as it leverages the
retina’s feature detectors.

o Rotated Objects and Symmetry
 Recognizing rotated objects (e.g., a square turned 45 degrees) requires more
cognitive effort than recognizing upright ones, as symmetry is easier to detect
along vertical axes.
 Figure 8.2 (Rotating a Square and Reversing Letters in “REAL”):
 The figure shows a square (left) and a rotated square (right, now a diamond).
 It also shows the word "REAL" with letters reversed.
 The rotated square is harder to recognize as a square, and reversed letters are
harder to read, because they deviate from familiar orientations.
 Implication: Interfaces should use familiar orientations (e.g., vertical/horizontal)
to reduce cognitive load.

o Optical Illusions and Color
 A light object on a dark background appears larger than a dark object on a light
background. Small objects in displays should use bright colors to stand out.
 Color Use:
 Colors (hue, saturation, lightness) are used to organize and enhance
visuals. Humans are drawn to primary colors (red, blue, green, yellow)
and retain them longer.
 Depth Cues:
 Monocular cues like shading, perspective, and occlusion depict depth.
Brightness enhances depth perception more than contrast.
 Depth/size recognition is learned early (Gibson and Walk, 1960, showed
six-month-olds understand depth), making depth a reliable way to
represent information.
o Configural Effects
 Configural clues allow quick recognition of abstract conditions by arranging objects
in recognizable patterns.
 Figure 8.3 (Distortion of a Regular Polygon): The figure shows a square (left) and a
distorted square (right) with one side elongated.
 The visual system quickly detects the distortion because it deviates from the
expected equal-sided shape.
 Implication: Configural clues can substitute high-level cognitive processes with
faster low-level visual ones, useful for monitoring systems (e.g., detecting
anomalies in operations).

o Spatial Frequency
 The visual system constructs images from multiple channels (spatial frequency,
orientation, contrast).
 Spatial frequency measures light-dark cycles per degree of visual field.
 Distinct images are easier to process for motion/changes than blurred ones, so
certain spatial frequencies can help highlight patterns in dynamic displays.
o Natural Visual Processing
 The visual system is tuned to real-world patterns like horizontal/vertical
references, subdued colors, and terrain/depth.
 Bright colors in displays mimic natural attention cues (e.g., noticing bright
flowers), and depth-based graphics align with everyday depth processing.
 Implication: Visualizations should mimic real-world sensory experiences to
reduce cognitive effort.

o Challenges in Visualization
 Context and Bias: Interpretation depends on the user’s background. Recent
experiences can bias perception (e.g., seeing clusters where none exist if
clusters were recently relevant).
 Individual Differences: Past experiences shape how users interpret visuals,
which may differ from the designer’s intent.

Hierarchical Visualization: Cone Tree and Perspective Wall

Cone Tree

 Hierarchical data (e.g., organizational structures or document clusters) can be
visualized using tree structures.
 However, traditional 2D trees become cluttered with large datasets, so 3D
representations are used.

Description:

 This is a 3D visualization from the Information Visualizer (Xerox PARC).
 The tree’s root node is at the apex, with child nodes arranged in a circular base.
 Each child node can be a parent to another cone, forming a nested structure.
 Selecting a node rotates it to the front for focus.

Purpose: It shows the size and structure of subtrees, helping users understand the hierarchy
and navigate to documents (represented as squares at the leaf nodes). Higher nodes may
represent clusters or semantic centroids.

Example Use: In a document retrieval system, this could represent clusters of related articles,
with leaf nodes as individual documents.

Perspective Wall

Description:

 Also from the Information Visualizer, the Perspective Wall divides information into three areas:
 a focused central area (e.g., "A GUI: The What Frame")
 two out-of-focus side areas (e.g., "Report," "Letter").
 The wall-like structure provides a perspective view.

Purpose: It allows users to focus on a specific area while keeping the broader context visible,
aiding navigation in large datasets.

Example Use: In an IR system, this could display search results with the central area showing a
selected document cluster, while side areas show related clusters.

Tree Maps for Hierarchical Data

Tree maps use nested rectangular boxes to represent hierarchical data, maximizing screen
space.

In this example, the map shows computer-related articles divided into categories like CPU, OS,
Memory, Network Management, and others (e.g., Word Processing, Spreadsheet Software).

Purpose: The size of each box reflects the number of items in a category, and the layout
indicates relationships (e.g., CPU and OS are grouped under operating systems, separate from
applications).

Example Use: A user searching for computer articles can quickly see which topics (e.g., OS)
have the most documents and explore related areas.
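
The box-sizing idea behind a tree map can be sketched with a simple "slice" layout, in which a rectangle is divided among categories in proportion to their item counts. This is one basic layout strategy chosen for illustration; the article counts are made up, and nesting would be obtained by calling the function again on each resulting box with the other orientation.

```python
def slice_layout(sizes, x, y, width, height, horizontal=True):
    """Split a rectangle among categories in proportion to their sizes.

    Returns name -> (x, y, width, height).
    """
    total = sum(sizes.values())
    rects = {}
    offset = 0.0
    for name, size in sizes.items():
        if horizontal:
            w = width * size / total
            rects[name] = (x + offset, y, w, height)
            offset += w
        else:
            h = height * size / total
            rects[name] = (x, y + offset, width, h)
            offset += h
    return rects

# Hypothetical article counts per category.
counts = {"OS": 60, "CPU": 20, "Memory": 20}
boxes = slice_layout(counts, 0, 0, 100, 50)
```

With these counts the OS box takes 60% of the width, so the category with the most documents is immediately visible as the largest rectangle.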

Search Statement Analysis: Understanding Query Impact

Query Window: Displays and allows editing of the search query.

Graphic View Window: A scatterplot where each icon (circle or ellipse) represents a
document or cluster. Circles show single items with relevance weights, while ellipses indicate
clusters.

Item Summary Window: Shows bibliographic details of selected items.

Purpose: It helps users visualize the distribution of search results and their
relevance, making it easier to refine queries.

Example Use: A user searching for “machine learning” can see clusters of documents and
their relevance scores, then adjust the query based on the scatterplot.

Visualization of Results

This display, from SIGIR ’96, shows a grid where columns represent documents and rows
represent query terms (e.g., “affirmative-action,” “construct-industry”).

The height of bars indicates the weight of each term in a document.

Purpose: Users can see which terms most influenced retrieval (by scanning columns) and
how terms contributed overall (by scanning rows), helping identify underperforming terms
for query refinement.

Example Use: For the query “How affirmative action affected the construction industry,” a
user might notice “construct” has low impact and adjust the query to include related terms.

DCARS Query Histogram

From Calspan’s DCARS system, this histogram displays documents as rows,
with tile bars showing the contribution of query terms (e.g., “arms,” “proliferation”)
to retrieval.

The width of each bar reflects term importance.


Cityscape for Thematic Representation

This uses a 3D “city” metaphor where skyscrapers represent themes or concepts (e.g.,
document clusters).

The height of buildings indicates the importance or size of the theme, and
connections between buildings show relationships.

Purpose: It allows users to “move” through the cityscape, zooming in on specific themes to
reveal more details, providing an intuitive way to explore complex data.

Example Use: In a news retrieval system, a user might see a tall building for “oil spill” with
connected buildings for related events, navigating to explore specific articles.
