DVT U4 My Notes
4.1 Introduction
Complex Queries:
For example, finding documents related to the "spread of flu" requires
more than searching for the word "flu"; it involves analyzing connections
and patterns among documents.
Levels of Text Representations (with Examples)
1. Lexical Level
At this level, the focus is on breaking down raw text into basic units, called
tokens. These tokens can be words, phrases, or character sequences. A
lexical analyzer applies rules, often using regular expressions or finite
state machines, to identify and classify these units.
Example:
Input: "The quick brown fox jumps over the lazy dog."
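A minimal sketch of tokenizing this input with a regular expression. The pattern, the lowercasing step, and the function name tokenize are assumptions made for illustration; real lexical analyzers often use richer rules.

```python
import re

# Assumed token pattern: runs of letters/digits, allowing internal apostrophes.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def tokenize(text):
    """Split raw text into lowercase word tokens, dropping punctuation."""
    return [match.group(0).lower() for match in TOKEN_PATTERN.finditer(text)]


print(tokenize("The quick brown fox jumps over the lazy dog."))
print(tokenize("restaurants near me"))
```

Lowercasing is a design choice here so that "The" and "the" count as the same token in later steps; drop it if case matters.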
In a search engine, this level enables the system to break a query like
"restaurants near me" into individual words for further analysis.
2. Syntactic Level
Example:
Input: "The quick brown fox jumps over the lazy dog."
Output:
o "The" → Article
o "quick" → Adjective
o "fox" → Noun
o "jumps" → Verb.
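The tagging shown above can be imitated with a dictionary lookup. This is a deliberately simplified stand-in for a real part-of-speech tagger: the LEXICON table and the tag function are hypothetical, and the entries for words not listed above ("brown", "lazy", "dog", "over") are assumptions.

```python
# Hypothetical hand-written lexicon; a real tagger would use a trained model
# and the surrounding context, not a fixed word-to-tag table.
LEXICON = {
    "the": "Article",
    "quick": "Adjective",
    "brown": "Adjective",
    "lazy": "Adjective",
    "fox": "Noun",
    "dog": "Noun",
    "jumps": "Verb",
    "over": "Preposition",
}


def tag(tokens):
    """Pair each token with its tag, or "Unknown" if it is not in the lexicon."""
    return [(token, LEXICON.get(token.lower(), "Unknown")) for token in tokens]


sentence = "The quick brown fox jumps over the lazy dog"
for word, label in tag(sentence.split()):
    print(f"{word} -> {label}")
```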
For instance, in a chatbot, this level helps interpret "Book a flight to New
York tomorrow" by tagging "flight" as a noun, "New York" as a place, and
"tomorrow" as a date.
3. Semantic Level
This level goes deeper by extracting the meaning of the text and
understanding relationships between words or phrases in a given context.
The focus is on interpreting what the text means, not just its structure or
individual tokens.
Example:
Input: "The quick brown fox jumps over the lazy dog."
Example: For a voice assistant like Siri, interpreting the command "Set a
timer for 10 minutes" involves:
By integrating all three levels, systems can provide more intelligent and
meaningful interactions, enhancing the user experience across
applications like search engines, chatbots, and recommendation systems.
4.3 The Vector Space Model
1. Term Vectors:
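Count-Terms(tokenStream)
1. Initialize terms as an empty table that maps each term to a count.
2. For each token t in tokenStream, increment terms[t] (starting from 0 if
t has not been seen before).
3. Return terms.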
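A short sketch of Count-Terms in Python, assuming the token stream is any iterable of strings. The name count_terms and the use of collections.Counter as the term table are choices made for this example.

```python
from collections import Counter


def count_terms(token_stream):
    """Build a term vector: a mapping from each term to its occurrence count."""
    terms = Counter()
    for token in token_stream:
        terms[token] += 1
    return terms


tokens = "the quick brown fox jumps over the lazy dog".split()
term_vector = count_terms(tokens)
print(term_vector["the"])  # 2
print(term_vector["fox"])  # 1
```

Counter returns 0 for unseen terms, which is convenient when comparing vectors built from different documents.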
2. Zipf’s Law
Implication:
Summarizing text often requires only a few high-frequency words to
capture key ideas.
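Zipf's law is commonly stated as: a term's frequency is roughly inversely proportional to its frequency rank. The sketch below ranks terms by frequency and measures how much of a text the top k terms cover. The token list is synthetic toy data chosen to have roughly 1/rank counts, and both function names are made up for this example.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, term, frequency) triples, most frequent term first."""
    ranked = Counter(tokens).most_common()
    return [(rank, term, freq) for rank, (term, freq) in enumerate(ranked, start=1)]


def top_k_coverage(tokens, k):
    """Fraction of all token occurrences covered by the k most frequent terms."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(freq for _, freq in counts.most_common(k))
    return covered / total


# Synthetic data: counts 6, 3, 2, 1 approximate a 1/rank falloff.
tokens = ["a"] * 6 + ["b"] * 3 + ["c"] * 2 + ["d"]

for rank, term, freq in rank_frequency(tokens):
    print(rank, term, freq, rank * freq)

print(top_k_coverage(tokens, 1))  # 0.5
print(top_k_coverage(tokens, 2))  # 0.75
```

On this toy data the two most frequent terms already account for three quarters of all tokens, which is the effect the implication above relies on.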
The model is paired with a similarity or distance measure (e.g., cosine
similarity) for various tasks:
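As an illustration of such a measure, cosine similarity between two term vectors can be sketched as follows. The vectors are assumed to be term-to-count mappings, and the three example documents are invented for this sketch.

```python
import math
from collections import Counter


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-count vectors."""
    shared_terms = set(vec_a) & set(vec_b)
    dot = sum(vec_a[term] * vec_b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Invented example documents.
doc_a = Counter("flu spread in the city".split())
doc_b = Counter("flu cases spread quickly".split())
doc_c = Counter("stock prices fall".split())

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))  # 0.0, no shared terms
```

Because the measure depends only on the angle between vectors, a long and a short document with the same term proportions score as highly similar.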
Visualization:
1. Word Clouds
2. Word Tree
Features:
3. TextArc
Features:
o Interactive tools let users explore the text's flow and structure.
Example: A TextArc of Alice in Wonderland positions evenly
distributed words at the center and section-specific words at the
circumference.
4. Arc Diagrams
Features:
5. Literature Fingerprinting
Features:
1. Navigation
2. Selection
Features:
3. Filtering
Features:
4. Reconfiguring
Purpose: Change how data is represented to uncover different
perspectives or patterns.
Features:
5. Encoding
Features:
6. Connecting
Features:
7. Abstracting/Elaborating
Purpose: Adjust the granularity of information displayed.
Features:
8. Hybrid Techniques
Features:
1. Navigation Operators
These operators help users move through and adjust their view of the
data, making it easier to focus on specific areas or explore the dataset
from different perspectives.
2. Selection Operators
Selection operators let users highlight specific parts of the data for deeper
analysis or actions.
3. Filtering Operators
4. Reconfiguring Operators
5. Encoding Operators
6. Connection Operators
7. Abstraction/Elaboration Operators
These operators let users adjust the level of detail in the visualization,
zooming in for specifics or zooming out to see the bigger picture.
Summary
1. Focus
The focus refers to the specific area or point of interest where the user’s
attention is directed.
2. Extents
Extents define the range or boundaries within which the interaction takes
place, accommodating the dimensions of the space being explored.
Example: In Excel, when filtering a dataset, the extents could be the rows
and columns selected for the operation—such as filtering only rows for
"2023" in a "Year" column. On a map, extents represent the visible region
being navigated, like the boundaries of a city you’re exploring. For 3D
visualizations, extents might include depth and height dimensions within
which objects are manipulated.
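The two 2D cases above can be sketched in a few lines. The rows, the column names other than "Year", the coordinates, and the helper names filter_rows and in_extent are all made up for illustration.

```python
def filter_rows(rows, column, value):
    """Keep only the rows whose given column equals the given value."""
    return [row for row in rows if row.get(column) == value]


def in_extent(point, extent):
    """Check whether an (x, y) point lies inside a rectangular extent.

    The extent is assumed to be (min_x, min_y, max_x, max_y).
    """
    x, y = point
    min_x, min_y, max_x, max_y = extent
    return min_x <= x <= max_x and min_y <= y <= max_y


# Made-up spreadsheet rows.
rows = [
    {"Year": 2022, "Region": "North", "Sales": 120},
    {"Year": 2023, "Region": "North", "Sales": 150},
    {"Year": 2023, "Region": "South", "Sales": 90},
]
print(filter_rows(rows, "Year", 2023))

# Made-up visible map region and two points to test against it.
visible_region = (0.0, 0.0, 10.0, 5.0)
print(in_extent((3.0, 4.0), visible_region))   # True
print(in_extent((12.0, 4.0), visible_region))  # False
```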
3. Transformation
Summary
Unified Experience