Exploring rated datasets with rating maps

S Amer-Yahia, S Kleisarchaki, NK Kolloju… - Proceedings of the 26th …, 2017 - dl.acm.org
Proceedings of the 26th International Conference on World Wide Web, 2017 - dl.acm.org
Online rated datasets have become a source for large-scale population studies for analysts and a means for end-users to accomplish routine tasks such as finding a book club. Existing systems, however, provide only limited insight into the opinions of different segments of the rater population. In this paper, we develop a framework for finding and exploring population segments and their opinions. We propose rating maps, a collection of (population segment, rating distribution) pairs, where a segment, e.g., {18-29 year old males in CA}, has a rating distribution in the form of a histogram that aggregates its ratings for a set of items (e.g., movies starring Russell Crowe). We formalize the problem of building rating maps dynamically given desired input distributions. Our problem raises two challenges: (i) the choice of an appropriate measure for comparing rating distributions, and (ii) the design of efficient algorithms to find segments. We show that the Earth Mover's Distance (EMD) is well suited to comparing rating distributions and prove that finding segments whose rating distributions are close to the input distributions is NP-complete. We propose an efficient algorithm for building Partition Decision Trees and heuristics for combining the resulting partitions to further improve their quality. Our experiments on real and synthetic datasets validate the utility of rating maps for both analysts and end-users.
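To make the notions of a rating map and an EMD comparison concrete, the following is a minimal sketch and not the paper's implementation. It assumes ratings on a 1-5 scale and a ground distance of 1 between adjacent rating values (the abstract does not state the ground distance), in which case the EMD between two normalized histograms reduces to the sum of absolute differences of their cumulative sums. The helper names `histogram` and `emd_1d`, the segment labels, and the toy ratings are all hypothetical.

```python
from collections import Counter


def histogram(ratings, scale=(1, 2, 3, 4, 5)):
    """Return the normalized histogram of `ratings` over an ordered rating scale."""
    if not ratings:
        raise ValueError("a segment needs at least one rating")
    counts = Counter(ratings)
    total = len(ratings)
    return [counts[r] / total for r in scale]


def emd_1d(p, q):
    """EMD between two normalized histograms over the same ordered bins.

    Assumes unit ground distance between adjacent bins, so the EMD is the
    sum of absolute differences between the running (cumulative) sums.
    """
    distance = 0.0
    carried = 0.0  # mass that still has to be moved to later bins
    for p_i, q_i in zip(p, q):
        carried += p_i - q_i
        distance += abs(carried)
    return distance


# A toy rating map: (population segment, rating distribution) pairs.
# Segments and ratings are made up for illustration only.
rating_map = {
    ("age=18-29", "gender=M", "state=CA"): histogram([5, 5, 4, 4]),
    ("age=30-44", "gender=F", "state=NY"): histogram([1, 2, 2, 1]),
}

# A desired input distribution: half 4-star, half 5-star ratings.
target = [0.0, 0.0, 0.0, 0.5, 0.5]

# Pick the segment whose rating distribution is closest to the target.
closest = min(rating_map, key=lambda seg: emd_1d(rating_map[seg], target))
```

This brute-force scan over two fixed segments only illustrates the distance measure. The search over all candidate segments is the part the abstract describes as NP-complete and addresses with Partition Decision Trees.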