WP3 Challenges & Hybrid Models: Dan Brickley, VUA Pro-Netics & BBC
Theory and Practice
9 8 5 2 0 9
0 0 8 8 8 6
3 2 7 9 9 8
More likely ...
9 0 0 0 0 9
0 0 8 0 8 0
0 0 7 0 9 8
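The contrast between the two grids above can be made concrete by counting empty cells. This is a minimal sketch under two assumptions the slides do not state: that rows are users and columns are programmes, and that a 0 means "no rating recorded". The function name `sparsity` is illustrative only.

```python
# Grids copied from the "Theory and Practice" and "More likely ..." slides.
# Assumption: rows are users, columns are programmes, 0 means "no rating".
theory = [
    [9, 8, 5, 2, 0, 9],
    [0, 0, 8, 8, 8, 6],
    [3, 2, 7, 9, 9, 8],
]
likely = [
    [9, 0, 0, 0, 0, 9],
    [0, 0, 8, 0, 8, 0],
    [0, 0, 7, 0, 9, 8],
]


def sparsity(grid):
    """Return the fraction of cells that hold no rating (value 0)."""
    cells = [value for row in grid for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


print(f"theory: {sparsity(theory):.2f}")  # 3 of 18 cells are empty
print(f"likely: {sparsity(likely):.2f}")  # 11 of 18 cells are empty
```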
TV preference data is very sparse!
• Even for a single service (e.g. Netflix), data is
‘overwhelmingly sparse’
• For NoTube’s open systems, the challenges
multiply:
– often no global view, only per-user data
– many ways of identifying the same content item
– many ways of identifying the same user
– never mind other entities (actors, directors, ...)
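One way to cope with "many ways of identifying the same content item" is to group identifiers that are known to refer to the same thing. The sketch below uses a union-find structure over pairs of equivalent identifiers. It is an illustration, not NoTube's method; the function name `merge_identifiers` and every identifier in the example are hypothetical.

```python
def merge_identifiers(same_as_pairs):
    """Group identifiers connected by 'same item' links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for ident in list(parent):
        groups.setdefault(find(ident), set()).add(ident)
    return list(groups.values())


# Hypothetical identifiers, for illustration only.
pairs = [
    ("imdb:tt0000001", "dbpedia:Example_Show"),
    ("dbpedia:Example_Show", "bbc:b0000001"),
    ("imdb:tt0000002", "bbc:b0000002"),
]
groups = merge_identifiers(pairs)
for group in groups:
    print(sorted(group))
```

The same grouping applies to user accounts, given a list of account pairs that a user has chosen to link.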
Challenges: Sparsity, Fragmentation
A hybrid approach to sparsity
Diversity and Fragmentation
• Diversity of the Web
– reading lists: bookcrossing, librarything, amazon
– music on last.fm, spotify, ...
– news sites, social networks, blogs ...
• How to integrate while respecting privacy?
• Good news: OAuth deployment growing &
social sites expose their recommendations
• Bad news: user-by-user data makes large-scale
analysis of trends harder
OAuth? RDFa?
• OAuth lets sites negotiate access with users
• e.g., Facebook knows lots of movies I “like”.
• NoTube can use OAuth to ask me to share that
data with TV services
• RDFa data from movie pages (IMDB, Rotten
Tomatoes) is consumed at Facebook
• This makes certain pages attractive as content
identifiers, a ‘taste graph’ alongside ‘social
graph’
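The "ask me to share that data" step can be sketched as the first leg of an authorization request. The example assumes an OAuth 2.0 authorization-code style flow, which the slides do not specify. The endpoint, client id, redirect URI and scope name are all hypothetical placeholders, and `build_authorization_url` is an illustrative helper rather than any provider's API.

```python
import secrets
from urllib.parse import urlencode


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scope):
    """Return (url, state) for sending the user to the provider's consent page."""
    state = secrets.token_urlsafe(16)  # random value echoed back to guard against CSRF
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


# All values below are hypothetical placeholders.
url, state = build_authorization_url(
    "https://provider.example/oauth/authorize",
    "notube-demo-client",
    "https://tv.example/callback",
    "likes.read",
)
print(url)
```

The user approves or declines on the provider's site. The TV service only ever receives the data the user agreed to share, which is what makes the approach compatible with per-user privacy.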
RDFa in IMDB and
RottenTomatoes HTML
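Markup of this kind is typically carried in `<meta property="..." content="...">` elements, as in the Open Graph protocol that Facebook reads. The sketch below collects those property/content pairs using only the Python standard library. It is not a full RDFa parser, and the sample HTML is invented for illustration rather than copied from IMDB or Rotten Tomatoes.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect property/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and "property" in attributes and "content" in attributes:
            self.properties[attributes["property"]] = attributes["content"]


# Invented sample page, for illustration only.
sample_html = """
<html><head>
<meta property="og:title" content="Example Movie" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://example.org/title/1" />
</head><body></body></html>
"""

extractor = PropertyExtractor()
extractor.feed(sample_html)
for name, value in extractor.properties.items():
    print(name, "=", value)
```

The `og:url` value is the part that makes a page usable as a content identifier: two services that both record it can recognise that they are talking about the same film.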
LOD challenges
• Linked Open Data for TV is new
– datasets evolving, changing
– quality varies
– modelling styles vary
– ‘lumpy’, uneven coverage
• ‘Pattern recommender’ finds paths
– from items in user profile to new content
– handles variation between Linked Data sources
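Finding paths from a profile item to new content can be sketched as a breadth-first search over a small set of triples. This is an assumed, simplified reading of the ‘pattern recommender’, not its actual implementation. The triples, the `show:` and `person:` prefixes, and the function names are hypothetical.

```python
from collections import deque

# Hypothetical triples (subject, predicate, object), for illustration only.
triples = [
    ("show:A", "actor", "person:X"),
    ("show:B", "actor", "person:X"),
    ("show:B", "director", "person:Y"),
    ("show:C", "director", "person:Y"),
]


def build_graph(triples):
    """Index triples in both directions; '^' marks an inverse edge."""
    graph = {}
    for s, p, o in triples:
        graph.setdefault(s, []).append((p, o))
        graph.setdefault(o, []).append((f"^{p}", s))
    return graph


def find_paths(graph, start, max_hops):
    """Return all simple paths of up to max_hops edges starting at `start`.

    A path is a list alternating nodes and predicates:
    [node, predicate, node, predicate, node, ...].
    """
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) // 2 == max_hops:
            continue
        for predicate, neighbour in graph.get(path[-1], []):
            if neighbour in path[::2]:  # do not revisit a node
                continue
            new_path = path + [predicate, neighbour]
            found.append(new_path)
            queue.append(new_path)
    return found


graph = build_graph(triples)
for path in find_paths(graph, "show:A", max_hops=4):
    if path[-1].startswith("show:"):
        print(" ".join(path))
```

Because the search only follows whatever edges exist, it tolerates sources that model the same relationship with different predicates: each predicate simply yields a differently labelled path.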
Content Pattern-based
Recommendations
• Paths in Linked Open Data
• Diversity & Serendipity measures
Participation Pattern
• Person X played role Y
in TV program Z
Influence Pattern
• Person X influenced
by person Y (direct)
• Persons X and Y
influenced by person
Z (indirect)
• 6,562 dbpedia:influenced
triples
• 11,776 dbpedia:influencedBy
triples
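The direct and indirect cases can be sketched over a handful of triples. The example treats `influenced` and `influencedBy` as inverses of one another and folds both into a single (influenced person, influencer) relation. That treatment is an assumption made here, and the people and function names are hypothetical.

```python
# Hypothetical triples, for illustration only.
triples = [
    ("person:A", "influencedBy", "person:C"),
    ("person:C", "influenced", "person:B"),  # same fact shape, stated in the other direction
    ("person:D", "influencedBy", "person:A"),
]


def normalise(triples):
    """Fold both predicates into (influenced person, influencer) pairs."""
    pairs = set()
    for s, p, o in triples:
        if p == "influencedBy":
            pairs.add((s, o))
        elif p == "influenced":
            pairs.add((o, s))
    return pairs


def direct_influences(pairs, person):
    """Direct pattern: everyone who influenced `person`."""
    return {y for x, y in pairs if x == person}


def shared_influence(pairs, person):
    """Indirect pattern: people who share at least one influencer with `person`."""
    mine = direct_influences(pairs, person)
    return {x for x, y in pairs if y in mine and x != person}


pairs = normalise(triples)
print(direct_influences(pairs, "person:A"))
print(shared_influence(pairs, "person:A"))
```

Normalising first matters because the two predicates have different triple counts in the data, so a query over only one of them would miss links.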
Analysis of Patterns in Dataset
                            # items                             # items
recommendations             1266      recommendations           222
- Individual brands         411       - Individual brands       100
paths                       17,001    paths
- with linkedmdb:actor      15,257    - influencedBy (all)      1202
- with linkedmdb:director   1155
Plans and challenges
• Richer integration between components
– currently this occurs in the application; can we
exploit LOD patterns prior to Mahout analysis?
• Polish & packaging; more patterns and rules
• Track and influence evolving standards (W3C)
• Work-in-progress with ‘big data’ analysis -
‘what kinds of TV links are shared by the kind
of people who follow @stephenfry on
Twitter?’