Unit IV
CATALOGING AND INDEXING
User Search Techniques: Search Statements and Binding, Similarity Measures and Ranking,
Relevance Feedback, Selective Dissemination of Information Search, Weighted Searches of
Boolean Systems, Searching the INTERNET and Hypertext
Search Statements
“Find me information on the impact of oil spills in Alaska on the price of oil.”
Step 1 – User Binding:
o Extracted terms: impact, oil, spills, Alaska, price, etc.
Step 2 – System Binding:
o Mapped to synonyms and given weights:
o oil (.606), petroleum (.65), price (.16), cost (.25), value (.10), etc.
Step 3 – Database Binding:
o Weights adjusted again based on document statistics and indexing semantics of that
particular database.
In general, searching is concerned with calculating the “similarity” between a user’s search
statement and the items in the database.
The similarity can be applied to total items or to logical passages within an item.
For example, every paragraph, or every 100 words, may be defined as a passage.
Different similarity measures can be used to calculate the similarity between an item and the
search statement; the same measures can also be used to calculate the similarity between
documents for clustering purposes.
Simple Sum of Products
o Calculates similarity by summing the products of the corresponding term weights
of two items.
o This is a basic approach but lacks normalization, which can lead to issues
when items vary in length.
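The sum-of-products measure can be sketched as follows, representing each item as a dictionary mapping terms to weights (the example query and document weights are illustrative, not from the text):

```python
def sum_of_products(weights_a, weights_b):
    """Similarity as the sum of products of matching term weights.

    Terms missing from either item contribute zero. No length
    normalization is applied, so longer items tend to score higher.
    """
    return sum(w * weights_b.get(term, 0.0) for term, w in weights_a.items())

# Hypothetical term-weight vectors:
query = {"oil": 0.6, "spill": 0.5, "price": 0.2}
doc = {"oil": 1.0, "price": 0.5, "alaska": 0.8}
# similarity = 0.6*1.0 + 0.2*0.5 = 0.7 (only shared terms contribute)
```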
Croft’s Similarity Formula
o where:
o C is a tuning constant.
o K is a tuning constant (typically 0.3 to 0.5).
o IDFi is the inverse document frequency, which gives higher weight to rarer
terms.
o TFi,j is the frequency of term i in item j.
o maxfreqj is the maximum frequency of any term in item j.
o This formula adjusts term weights based on their frequency and rarity, improving
relevance.
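The formula itself is not reproduced above; the sketch below assumes a commonly cited form, (C + IDFi) multiplied by the normalized frequency K + (1 − K) · TFi,j / maxfreqj:

```python
import math

def croft_weight(freq_ij, maxfreq_j, n_docs, doc_freq_i, C=1.0, K=0.3):
    """Croft-style term weight (an assumed form, per the definitions
    above): (C + IDF_i) * normalized term frequency.

    K damps raw frequency toward a baseline; IDF_i = log(N / n_i)
    boosts terms that appear in few documents.
    """
    idf = math.log(n_docs / doc_freq_i)
    tf = K + (1 - K) * freq_ij / maxfreq_j
    return (C + idf) * tf
```

With this form, a term at maximum frequency in a document that appears in every document gets weight C; a rare term at the same frequency scores much higher.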
Cosine Similarity
o Measures the cosine of the angle between two vectors (document and
query). A value of 1 means the vectors are identical (same direction), and 0
means they are orthogonal (unrelated).
o The denominator normalizes for vector length, ensuring the result is
between 0 and 1.
o A variant simplifies the denominator but still normalizes
the score.
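A minimal sketch of cosine similarity over term-weight dictionaries:

```python
import math

def cosine(a, b):
    """Cosine of the angle between two term-weight vectors (dicts).

    The denominator normalizes both vectors to unit length, so the
    score is independent of item length and lies in [0, 1] for
    non-negative weights.
    """
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_a * norm_b)
```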
Jaccard and Dice Measures
o Jaccard: The denominator depends on the number of common terms,
producing scores between -1 and 1. It penalizes dissimilarities more heavily.
o Dice: The Dice measure simplifies the denominator and adds a factor of 2 in the
numerator, also ranging from -1 to 1, but it is less sensitive to the number of common
terms.
The simple “sum of the products” similarity formula is used to calculate the similarity
between the query and each document. If no threshold is specified, all three documents
are considered hits; if a threshold of 4 is selected, then only DOC1 is returned.
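The weighted forms of Jaccard and Dice can be sketched as below (assuming the common formulation where the denominators use sums of squared weights; for non-negative weights these score between 0 and 1):

```python
def jaccard(a, b):
    """Weighted Jaccard: dot / (sum of squares - dot).

    The denominator shrinks as the items share more terms, so common
    terms are rewarded and dissimilarity is penalized heavily.
    """
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    sq = sum(w * w for w in a.values()) + sum(w * w for w in b.values())
    return dot / (sq - dot) if (sq - dot) else 0.0

def dice(a, b):
    """Weighted Dice: 2 * dot / (sum of squares).

    The denominator is fixed regardless of overlap, making the score
    less sensitive to the number of common terms than Jaccard.
    """
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    sq = sum(w * w for w in a.values()) + sum(w * w for w in b.values())
    return 2 * dot / sq if sq else 0.0
```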
Ranking:
Once items are identified as possibly relevant to the user’s query, the best practice is
to present the most likely relevant items first.
This process is called “Ranking”.
Relevance Feedback
The relevance feedback concept is that a new query should be based on the old query:
the old query is modified to increase the weight of terms found in relevant items and decrease
the weight of terms found in non-relevant items.
The first major work on relevance feedback was published in 1965 by Rocchio.
This technique not only modified the terms in the original query but also allowed expansion of
new terms from the relevant items.
The revised Rocchio formula for query modification:

Q_new = Q_old + (1/r) * Σ(i=1..r) DR_i − (1/nr) * Σ(i=1..nr) DN_i

where r is the number of relevant items, nr is the number of non-relevant items, DR_i are the
vectors of the relevant items, and DN_i are the vectors of the non-relevant items.
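A sketch of Rocchio-style query modification over term-weight dictionaries, assuming the standard form (old query plus the average relevant-item vector, minus the average non-relevant-item vector):

```python
def rocchio(query, relevant, non_relevant):
    """Modify a query vector: add the average relevant-item vector,
    subtract the average non-relevant-item vector.

    Terms from relevant items that were not in the original query are
    added (query expansion). Negative weights are clipped to zero, a
    common practical choice not dictated by the formula itself.
    """
    terms = set(query)
    for d in relevant + non_relevant:
        terms |= set(d)
    new_q = {}
    for t in terms:
        w = query.get(t, 0.0)
        if relevant:
            w += sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if non_relevant:
            w -= sum(d.get(t, 0.0) for d in non_relevant) / len(non_relevant)
        if w > 0:
            new_q[t] = w
    return new_q
```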
The “impact of relevance feedback” figure shows how positive and negative feedback
shift the query’s position in the document space:
Circles: Represent documents (filled circles are non-relevant, open circles are relevant).
Oval: The set of items retrieved by the query.
Solid Box: The original query’s position.
Hollow Box: The query’s position after feedback.
Recent experiments with relevance feedback during TREC sessions have shown conclusively the
advantages of relevance feedback.
Queries using relevance feedback produce significantly better results than manually
enhanced queries. Because users tend to enter queries with very few terms, automatic
relevance feedback based on the rank values of the retrieved items is used instead.
This concept is called pseudo-relevance feedback, blind feedback, or local text analysis.
It does not require human relevance judgments: the highest-ranked items returned by the
query are automatically assumed to be relevant.
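A minimal sketch of blind (pseudo-relevance) feedback; the `score` parameter and `top_k` default are hypothetical names, standing in for any similarity function and cutoff:

```python
def dot(a, b):
    """Simple sum-of-products similarity over term-weight dicts."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def blind_feedback(query, docs, score, top_k=2):
    """Pseudo-relevance feedback: rank the documents, assume the
    top_k are relevant, and fold their terms into the query with
    averaged weights. No human judgment is involved.
    """
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    new_q = dict(query)
    for d in ranked[:top_k]:
        for t, w in d.items():
            new_q[t] = new_q.get(t, 0.0) + w / top_k
    return new_q
```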
Boolean queries (e.g., "A AND B", "A OR B", "A NOT B") are traditionally strict: they return only
items that exactly match the conditions (e.g., both A and B for AND). Weighted searches of
Boolean systems relax this by assigning weights to query terms and ranking items by how well
they satisfy the Boolean expression.
This method requires more computation due to sorting but provides a more
comprehensive use of weights.
4. P-norm Model
For OR queries, the origin (all weights = 0) is the worst case; the best documents
are farthest from the origin.
For AND queries, the ideal point is the unit vector (all weights = 1); the best
documents are closest to this point.
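The geometric description above matches the standard p-norm scoring functions; a sketch under that assumption (term weights in [0, 1], p a tuning parameter):

```python
def pnorm_or(weights, p=2.0):
    """P-norm OR: scaled distance from the origin, the worst case
    for OR. p=1 behaves like an average; large p approaches strict
    Boolean OR, where the maximum weight dominates.
    """
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)

def pnorm_and(weights, p=2.0):
    """P-norm AND: one minus the scaled distance from the all-ones
    ideal point; documents closest to (1, 1, ..., 1) score highest.
    """
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)
```

For a document matching one of two AND terms perfectly and the other not at all, the AND score is 1 − sqrt(1/2) ≈ 0.29: partial credit, rather than the strict Boolean score of 0.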
Boolean Operations
Weighted Interpretation
Completeness: If the query has 5 terms and the item contains 3 of them (or their synonyms),
the completeness is 3/5 = 0.6. This sets an upper limit on the item’s rank. If query terms are
weighted (e.g., some terms are marked as more important), those weights are factored into
the score.

Semantic relationships: Synonyms (e.g., “buy” for “purchase”) increase the score, while
antonyms (e.g., “sell” for “buy”) decrease it. The closer the semantic relationship, the more
weight is added to the ranking.

Context: If the query term is “charge” with the context of “paying for an object,” finding
words like “buy,” “purchase,” or “debt” in the item suggests that “charge” is used in the
desired sense, increasing the item’s score. This helps disambiguate terms with multiple
meanings (e.g., “charge” as in payment vs. “charge” as in an electrical charge).
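The completeness upper bound described above can be sketched as:

```python
def completeness(query_terms, item_terms):
    """Fraction of query terms (or their synonyms, if item_terms
    already includes synonym matches) present in the item; this
    caps the item's achievable rank.
    """
    matched = sum(1 for t in query_terms if t in item_terms)
    return matched / len(query_terms)

# 3 of 5 query terms found in the item -> completeness 0.6
```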
The Internet in the 1990s relied on search engines like Yahoo, AltaVista, and Lycos to help users find
information. It also gave rise to intelligent agents that searched on a user’s behalf, characterized by:
1. Autonomy
Agents operated independently without constant human input, navigating websites
based on predefined criteria to collect relevant information.
2. Communication Ability
Agents used standard protocols (e.g., Z39.50, a library search protocol) to interact
with websites and retrieve data.
5. Adaptive Behavior
Agents assessed their environment and adjusted their actions to better meet user
needs, combining autonomy and reasoning.
6. Trustworthiness
Users needed to trust that agents would act in their best interests, retrieving
relevant and accessible information.
Doyle (1962): Proposed "semantic road maps" to give users a visual overview of a database’s
content, allowing them to focus queries on specific themes.
The shift in user-machine interfaces from basic typewriter-like interactions to more complex
systems like WIMP (windows, icons, menus, pointer) interfaces, which handle multiple tasks
simultaneously.
As computer displays became common, the focus turned to representing information visually in
ways that align with human cognitive processes.
The goal is to reduce the mental effort (cognitive overhead) users spend finding and
understanding information by leveraging human perception—particularly vision, but also other
senses like audio and touch.
Background on Vision and Cognition
o Gestalt Psychology: The mind organizes visual input into meaningful wholes using rules:
o Proximity: Nearby objects are grouped together.
o Similarity: Similar objects are grouped together.
o Continuity: Smooth, continuous patterns are preferred (e.g., a circle with a line through
it is seen as a circle and a line, not two half-circles).
o Closure: Gaps are mentally filled to form a whole (e.g., a dashed square is still perceived
as a square).
o Connectedness: Linked elements are seen as a single unit.
o Spatial Frequency
The visual system constructs images from multiple channels (spatial frequency,
orientation, contrast).
Spatial frequency measures light-dark cycles per degree of visual field.
Distinct images are easier to process for motion/changes than blurred ones, so
certain spatial frequencies can help highlight patterns in dynamic displays.
o Natural Visual Processing
The visual system is tuned to real-world patterns like horizontal/vertical
references, subdued colors, and terrain/depth.
Bright colors in displays mimic natural attention cues (e.g., noticing bright
flowers), and depth-based graphics align with everyday depth processing.
Implication: Visualizations should mimic real-world sensory experiences to
reduce cognitive effort.
Cone Tree
Description:
Visualization of Results