WP3 Challenges & Hybrid Models: Dan Brickley, VUA Pro-Netics & BBC


WP3

Challenges & Hybrid models


Dan Brickley, VUA
Pro-netics & BBC
Overview
• Challenges for TV in Social Web
– theory and practice of our hybrid approach
• 3 Interconnected problems:
– Privacy, Sparsity and Heterogeneity
• What we built (and why)
– different kinds of recommender
– ways of integrating them
• Plans and options for final developments

Theory and Practice

9 8 5 2 0 9

0 0 8 8 8 6

3 2 7 9 9 8

More likely ...

9 0 0 0 0 9

0 0 8 0 8 0

0 0 7 0 9 8

TV preference data is very sparse!
• Even for a single service (e.g. Netflix), data is
‘overwhelmingly sparse’
• For NoTube’s open systems, challenges
multiply:
– often no global view, only per-user data
– many ways of identifying the same content item
– many ways of identifying the same user
– never mind other entities (actors, directors, ...)
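To make 'sparse' concrete, here is a minimal sketch that measures how full the two toy rating grids from the 'Theory and Practice' and 'More likely ...' slides are. It assumes that 0 stands for 'no rating', which the slides do not state; the function name `density` is illustrative only.

```python
def density(ratings):
    """Fraction of cells holding a rating; 0 is treated as 'no rating' (assumption)."""
    cells = [value for row in ratings for value in row]
    return sum(1 for value in cells if value != 0) / len(cells)

# Grids copied from the slides: users as rows, items as columns (assumed layout).
theory = [[9, 8, 5, 2, 0, 9], [0, 0, 8, 8, 8, 6], [3, 2, 7, 9, 9, 8]]
practice = [[9, 0, 0, 0, 0, 9], [0, 0, 8, 0, 8, 0], [0, 0, 7, 0, 9, 8]]

print(f"theory: {density(theory):.2f}, practice: {density(practice):.2f}")
```

Real preference data is far emptier than even the second grid, and per-user access plus multiple identifiers for the same item push the filled fraction down further.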

Challenges: Sparsity, Fragmentation

• Content identifiers (WP1)
– Wikipedia/DBpedia URLs? Freebase?
– RottenTomatoes.com, IMDB.com, broadcaster IDs
• Social Web interoperability
– Bob’s on Facebook, Charlie’s on Twitter
– negotiating access to non-public data (OAuth)
– reconciling metadata models, rating models
Fragmentation by site

A hybrid approach to sparsity

• Find patterns and paths in factual data
• Collaborative filtering - from bulk rating data
• Experiments with ‘big data’ (e.g. Twitter crawl)
• Models for combining recommenders
• Strategies for inferring ‘sameAs’ links
• ...or grouping items together (by series, brand)
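One way to picture what 'sameAs' links buy us: once two identifiers are known to name the same item, ratings attached to either can be pooled. The sketch below uses a small union-find structure for this. It is an illustration only, not the project's actual method, and the identifiers (`dbpedia:Example_Show`, `imdb:tt0000000`, `bbc:b0000000`) are made-up placeholders.

```python
class SameAs:
    """Union-find over identifiers: ids linked by sameAs share one representative."""

    def __init__(self):
        self.parent = {}

    def find(self, ident):
        self.parent.setdefault(ident, ident)
        while self.parent[ident] != ident:
            # Path halving: point this node at its grandparent as we walk up.
            self.parent[ident] = self.parent[self.parent[ident]]
            ident = self.parent[ident]
        return ident

    def link(self, a, b):
        """Record that a and b identify the same thing."""
        self.parent[self.find(a)] = self.find(b)

    def same(self, a, b):
        return self.find(a) == self.find(b)


links = SameAs()
links.link("dbpedia:Example_Show", "imdb:tt0000000")   # placeholder ids
links.link("imdb:tt0000000", "bbc:b0000000")
print(links.same("dbpedia:Example_Show", "bbc:b0000000"))
print(links.same("dbpedia:Example_Show", "bbc:b9999999"))
```

The same structure could group episodes under a series or brand by linking each episode identifier to the brand identifier, at the cost of treating them as interchangeable.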
Challenge: Privacy
• TV preferences are very personal data
• Relevant standards (OAuth) are new
– deployed widely in Social Web during NoTube
– slower adoption in TV and broadcast world
• We can use OAuth to request permission to
read a user’s closed data (e.g. Facebook ‘Like’s)
• This per-user access limits our ability to find general
trends across an entire audience (except in public data - Twitter?)

Diversity and Fragmentation
• Diversity of the Web
– reading lists: bookcrossing, librarything, amazon
– music on last.fm, spotify, ...
– news sites, social networks, blogs ...
• How to integrate while respecting privacy?
• Good news: OAuth deployment growing &
social sites expose their recommendations
• Bad news: user-by-user data makes large-scale
analysis of trends harder
OAuth? RDFa?
• OAuth lets sites negotiate access with users
• e.g., Facebook knows lots of movies I “like”.
• NoTube can use OAuth to ask me to share that
data with TV services
• RDFa data from movie pages (IMDB, Rotten Tomatoes)
is consumed at Facebook
• This makes certain pages attractive as content
identifiers: a ‘taste graph’ alongside the ‘social graph’
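As a rough illustration of how a page can serve as a content identifier, the sketch below reads RDFa-style `property`/`content` pairs from `<meta>` tags, the form used by Open Graph markup. The page fragment, its URL and its title are invented placeholders, and the class name `MetaProperties` is hypothetical; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class MetaProperties(HTMLParser):
    """Collects property/content pairs from <meta> tags (RDFa-style markup)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "property" in attributes and "content" in attributes:
            self.properties[attributes["property"]] = attributes["content"]


# Hypothetical page fragment; the URL and title are placeholders.
PAGE = """
<html><head>
<meta property="og:type" content="video.movie" />
<meta property="og:title" content="Example Film" />
<meta property="og:url" content="http://movies.example.org/title/123" />
</head></html>
"""

parser = MetaProperties()
parser.feed(PAGE)
print(parser.properties["og:url"])
```

The `og:url` value is the piece that can act as a shared identifier: anything that records a 'like' against that URL is talking about the same item.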
RDFa in IMDB and RottenTomatoes HTML

Aggregated by Facebook (and then, by us...)


What we built
• Main WP3 work: beancounter and pattern
recommender
• Aggregate, normalize and merge social Web
activity streams, then match against enriched
TV metadata to produce recommendations
• We also have a Mahout-based collaborative
filtering recommender, with ‘item to item’
recommendations based on bulk ratings data
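The aggregate, normalize, merge and match steps can be sketched roughly as follows. This is not the beancounter's code or data model: the source names (`likes`, `posts`), the record fields and the `ex:` identifiers are all hypothetical, chosen only to show the shape of the pipeline.

```python
def normalize(source, record):
    """Map a source-specific record to a common (user, verb, item) activity."""
    if source == "likes":
        return (record["who"], "like", record["page"])
    if source == "posts":
        return (record["author"], "share", record["link"])
    raise ValueError(f"unknown source: {source}")


def merge(streams):
    """Normalize every record and drop exact duplicates, keeping first-seen order."""
    seen, merged = set(), []
    for source, records in streams.items():
        for record in records:
            activity = normalize(source, record)
            if activity not in seen:
                seen.add(activity)
                merged.append(activity)
    return merged


def match(activities, programmes):
    """Return programmes whose enriched metadata mentions any item in the activities."""
    items = {item for _, _, item in activities}
    return [title for title, links in programmes.items() if items & links]


# Hypothetical input: two streams in different shapes, and enriched TV metadata.
streams = {
    "likes": [{"who": "alice", "page": "ex:PersonX"}],
    "posts": [{"author": "alice", "link": "ex:ShowA"},
              {"author": "alice", "link": "ex:ShowA"}],
}
programmes = {
    "Programme One": {"ex:PersonX", "ex:PersonY"},
    "Programme Two": {"ex:PersonZ"},
}

activities = merge(streams)
print(match(activities, programmes))
```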

LOD challenges
• Linked Open Data for TV is new
– datasets evolving, changing
– quality varies
– modelling styles vary
– ‘lumpy’, uneven coverage
• ‘Pattern recommender’ finds paths
– from items in user profile to new content
– handles variation between Linked Data sources

Content Pattern-based Recommendations
• Paths in Linked Open Data
• Diversity & Serendipity measures

Participation Pattern
• Person X played role Y in TV program Z
• 194,649 lmdb:actor triples
• 53,180 lmdb:director triples
• 28,549 lmdb:writer triples
• 1,262 lmdb:film_story_contributor triples
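A participation path can be read as: an item in the user's profile, a person who took part in it, and another item that person took part in. The sketch below finds such paths over a handful of triples. The `ex:` identifiers are placeholders and the function name is hypothetical; only the `lmdb:` property names come from the slide.

```python
from collections import defaultdict


def participation_paths(triples, profile):
    """Find paths: profile item -> role -> person -> role -> item outside the profile."""
    by_person = defaultdict(list)  # person -> [(item, role)]
    by_item = defaultdict(list)    # item -> [(role, person)]
    for item, role, person in triples:
        by_person[person].append((item, role))
        by_item[item].append((role, person))

    paths = []
    for seed in profile:
        for role, person in by_item[seed]:
            for other, other_role in by_person[person]:
                if other not in profile:
                    paths.append((seed, role, person, other_role, other))
    return paths


# Placeholder triples in (item, property, person) form.
triples = [
    ("ex:ShowA", "lmdb:actor", "ex:PersonX"),
    ("ex:ShowB", "lmdb:director", "ex:PersonX"),
    ("ex:ShowC", "lmdb:actor", "ex:PersonY"),
]

for path in participation_paths(triples, {"ex:ShowA"}):
    print(" -> ".join(path))
```

Because the roles on each side of the person are kept in the path, a recommendation can be explained ("directed by someone who acted in a show you liked") rather than just scored.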

Influence Pattern
• Person X influenced by person Y (direct)
• Persons X and Y both influenced by person Z (indirect)
• 6,562 dbpedia:influenced triples
• 11,776 dbpedia:influencedBy triples
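The two forms of the influence pattern can be separated with a short sketch: direct links are the (person, influencer) pairs themselves, and indirect links are pairs of people who share an influencer. The input pairs are placeholders, the function name is hypothetical, and only 'influencedBy'-direction pairs are assumed as input.

```python
from collections import defaultdict
from itertools import combinations


def influence_links(influenced_by):
    """Split (person, influencer) pairs into direct links and shared-influencer links."""
    direct = set(influenced_by)

    followers = defaultdict(set)  # influencer -> people they influenced
    for person, influencer in influenced_by:
        followers[influencer].add(person)

    indirect = set()
    for influencer, people in followers.items():
        for a, b in combinations(sorted(people), 2):
            indirect.add((a, b, influencer))
    return direct, indirect


# Placeholder data: X and Y are both influenced by Z; X is also influenced by Y.
pairs = [("ex:X", "ex:Z"), ("ex:Y", "ex:Z"), ("ex:X", "ex:Y")]
direct, indirect = influence_links(pairs)
print(sorted(direct))
print(sorted(indirect))
```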

Analysis of Patterns in Dataset
Participation pattern                       # items
recommendations                                1266
- Individual brands                             411
paths                                        17,001
- with linkedmdb:actor                       15,257
- with linkedmdb:director                      1155
- with linkedmdb:writer                         569
- with linkedmdb:film_story_contributor          20

Influence pattern                           # items
recommendations                                 222
- Individual brands                             100
paths
- influencedBy (all)                           1202
- influencedBy (unique)                         521

Dataset (BBC EPG metadata):
– 12,777 (7,756 title enrichment) programmes
– 1260 (401 enriched) brands (unique titles)
– 35,227 (19,394 enriched) person names in metadata
– 9,315 (4,590 enriched) unique person names in metadata
Collaborative filtering

(item similarity measures from bulk ratings data)
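For readers unfamiliar with item-to-item similarity, here is a minimal sketch using cosine similarity over per-item rating vectors. It is a Python stand-in for the idea, not Mahout's API (Mahout is a Java library), and the show names, users and ratings are invented.

```python
from math import sqrt


def cosine(ratings_a, ratings_b):
    """Cosine similarity of two items' ratings, each a dict of user -> rating."""
    shared = ratings_a.keys() & ratings_b.keys()
    dot = sum(ratings_a[user] * ratings_b[user] for user in shared)
    norm_a = sqrt(sum(value * value for value in ratings_a.values()))
    norm_b = sqrt(sum(value * value for value in ratings_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(item, ratings):
    """Rank every other item by similarity to the given one, best first."""
    scores = {
        other: cosine(ratings[item], ratings[other])
        for other in ratings
        if other != item
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Invented bulk ratings: item -> {user: rating}.
ratings = {
    "ShowA": {"u1": 9, "u2": 8},
    "ShowB": {"u1": 9, "u2": 7},
    "ShowC": {"u3": 8},
}

print(most_similar("ShowA", ratings))
```

Items with no raters in common score zero, which is exactly where sparse data hurts this approach and where factual paths can fill in.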


Hybrid models:
factual paths and statistical similarity

(and not to mention ‘@wossy’ is on Twitter with 1 million followers...)
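One simple way to combine the two kinds of evidence is a weighted sum of their scores. The sketch below shows that and nothing more: the slides do not say which combination model is used, so the weighting scheme, the `weight` parameter and the scores are all assumptions for illustration.

```python
def hybrid_scores(path_scores, cf_scores, weight=0.5):
    """Weighted sum of two score dicts; an item missing from one side scores 0 there."""
    items = path_scores.keys() | cf_scores.keys()
    combined = {
        item: weight * path_scores.get(item, 0.0)
        + (1 - weight) * cf_scores.get(item, 0.0)
        for item in items
    }
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)


# Invented scores: factual-path evidence and statistical similarity per item.
path_scores = {"ShowB": 1.0, "ShowC": 0.5}
cf_scores = {"ShowB": 0.2, "ShowD": 0.9}

for item, score in hybrid_scores(path_scores, cf_scores):
    print(f"{item}: {score:.2f}")
```

An item known to only one recommender still gets ranked, which is the practical benefit of a hybrid when either source alone is sparse.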


Status
• We can show a standards-based system that
– integrates TV preference data from diverse Web sources
– matches this with enriched TV metadata
– finds graph patterns linking users to content
– integrates with ‘classic’ recommender approaches
– builds on open source (Cliopatria, Mahout)
– supports real-time multi-screen exploration

Plans and challenges
• Richer integration between components
– currently this occurs in the application; can we
exploit LOD patterns prior to Mahout analysis?
• Polish & packaging; more patterns and rules
• Track and influence evolving standards (W3C)
• Work-in-progress with ‘big data’ analysis -
‘what kinds of TV links are shared by the kind
of people who follow @stephenfry on
Twitter?’
