Classification Methods
Scatter-Gather clusters the whole collection into groups of documents, from which the user
selects (gathers) the groups of interest.
The selected groups are merged, and the resulting set is clustered again. This process is
repeated until a cluster of interest is found.
Example: A collection of New York Times news stories is clustered ("scattered") into eight
clusters (top row). The user manually gathers three of these into a smaller collection,
"International Stories", and performs another scattering operation. This process repeats until a
small cluster with relevant documents is found (e.g., Trinidad).
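The scatter and gather steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the clustering step is a toy method (farthest-first seeds with Jaccard similarity on word sets) swapped in for whatever clustering algorithm a real system would use, and the names scatter, gather, scatter_gather and choose are hypothetical, not taken from the original method.

```python
def tokenize(doc):
    """Represent a document as a set of lowercase words."""
    return set(doc.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def scatter(docs, k):
    """Cluster docs into at most k groups (toy method, assumed for illustration)."""
    if not docs:
        return []
    token_sets = [tokenize(d) for d in docs]
    # Farthest-first seeds: start with document 0, then repeatedly add the
    # document that is least similar to the seeds chosen so far.
    seeds = [0]
    while len(seeds) < min(k, len(docs)):
        candidates = [i for i in range(len(docs)) if i not in seeds]
        seeds.append(min(
            candidates,
            key=lambda i: max(jaccard(token_sets[i], token_sets[s]) for s in seeds),
        ))
    # Assign every document to its most similar seed.
    clusters = {s: [] for s in seeds}
    for i, doc in enumerate(docs):
        best = max(seeds, key=lambda s: jaccard(token_sets[i], token_sets[s]))
        clusters[best].append(doc)
    return [c for c in clusters.values() if c]


def gather(clusters, selected):
    """Merge the clusters whose indices the user selected."""
    merged = []
    for idx in selected:
        merged.extend(clusters[idx])
    return merged


def scatter_gather(docs, k, choose, max_rounds=5):
    """Alternate scatter and gather; choose(clusters) stands in for the user."""
    current = list(docs)
    for _ in range(max_rounds):
        clusters = scatter(current, k)
        selected = choose(clusters)
        if not selected:
            break
        gathered = gather(clusters, selected)
        if len(gathered) == len(current):
            break  # the selection no longer narrows the collection
        current = gathered
    return current


# Example: a simulated user who keeps every cluster that mentions "trinidad".
stories = [
    "Trinidad oil exports rise",
    "Trinidad election results announced",
    "Stock markets fall in New York",
    "New York stock exchange rallies",
    "Local football team wins cup",
]
keep = lambda clusters: [
    i for i, c in enumerate(clusters) if any("trinidad" in d.lower() for d in c)
]
print(scatter_gather(stories, 3, keep))
```

In an interactive system the choose callback would be replaced by the user's selection on screen; the loop structure stays the same.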
Collection clustering
Clustered collections store documents ordered by the clustered index key value.
Clustered collections have the following benefits compared to non-clustered collections:
• Faster queries on clustered collections without needing a secondary index, such as queries
with range scans and equality comparisons on the clustered index key.
• Clustered collections have a lower storage size, which improves performance for queries
and bulk inserts.
• Clustered collections have additional performance improvements for inserts, updates,
deletes, and queries.
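To see why key-ordered storage makes range scans and equality lookups fast without a secondary index, consider the toy in-memory model below. It is a sketch under stated assumptions, not the storage engine of any real database: the class ClusteredCollection and its methods are hypothetical, and the "collection" is just two parallel Python lists kept sorted by key.

```python
import bisect


class ClusteredCollection:
    """Toy model: documents are stored physically ordered by their key."""

    def __init__(self):
        self._keys = []  # sorted list of keys
        self._docs = []  # documents, in the same order as their keys

    def insert(self, key, doc):
        """Insert a document at its sorted position; keys must be unique."""
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            raise KeyError(f"duplicate key: {key!r}")
        self._keys.insert(pos, key)
        self._docs.insert(pos, doc)

    def find(self, key):
        """Equality lookup by binary search; returns None if the key is absent."""
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._docs[pos]
        return None

    def range(self, low, high):
        """Range scan: all documents with low <= key < high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return self._docs[start:end]


coll = ClusteredCollection()
for key, title in [(30, "third"), (10, "first"), (20, "second")]:
    coll.insert(key, {"title": title})

print(coll.find(20))
print(coll.range(10, 30))
```

Because the documents themselves are stored in key order, a range scan is one contiguous slice and needs no separate index structure that points back into the data.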
Language Modelling