WP3 Challenges & Hybrid Models: Dan Brickley, VUA Pro-Netics & BBC

WP3
Challenges & Hybrid models

Dan Brickley, VUA
Pro-netics & BBC
Overview
• Challenges for TV in Social Web
– theory and practice of our hybrid approach
• 3 Interconnected problems:
– Privacy, Sparsity and Heterogeneity
• What we built (and why)
– different kinds of recommender
– ways of integrating them
• Plans and options for final developments
Theory and Practice
9 8 5 2 0 9
0 0 8 8 8 6
3 2 7 9 9 8
More likely ...
9 0 0 0 0 9
0 0 8 0 8 0
0 0 7 0 9 8
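The contrast between the two grids can be expressed as the fraction of empty cells. This is a minimal sketch, assuming that a 0 means "no rating recorded" (the slides do not state this) and that rows and columns stand for users and items:

```python
# The two rating grids from the slides; rows and columns are unlabeled there.
# Assumption: a 0 stands for "no rating recorded".
theory = [
    [9, 8, 5, 2, 0, 9],
    [0, 0, 8, 8, 8, 6],
    [3, 2, 7, 9, 9, 8],
]
practice = [
    [9, 0, 0, 0, 0, 9],
    [0, 0, 8, 0, 8, 0],
    [0, 0, 7, 0, 9, 8],
]


def sparsity(matrix):
    """Return the fraction of cells that hold no rating (value 0)."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


print(f"theory:   {sparsity(theory):.2f}")
print(f"practice: {sparsity(practice):.2f}")
```

Under that assumption, 3 of 18 cells are empty in the first grid and 11 of 18 in the second.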
TV preference data is very sparse!
• Even for a single service (e.g. Netflix), data is
‘overwhelmingly sparse’
• For NoTube’s open systems, challenges
multiply:
– often no global view, only per-user data
– many ways of identifying the same content item
– many ways of identifying the same user
– never mind other entities (actors, directors, ...)
Challenges: Sparsity, Fragmentation
• Content identifiers (WP1)
– Wikipedia/DBpedia URLs? Freebase?
– RottenTomatoes.com, IMDB.com, broadcaster IDs
• Social Web interoperability
– Bob’s on Facebook, Charlie’s on Twitter
– negotiating access to non-public data (OAuth)
– reconciling metadata models, rating models
Fragmentation by site
A hybrid approach to sparsity
• Find patterns and paths in factual data
• Collaborative filtering - from bulk rating data
• Experiments with ‘big data’ (e.g. Twitter
crawl)
• Models for combining recommenders
• Strategies for inferring ‘sameAs’ links
• ...or grouping items together (by series,
brand)
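One way to picture the ‘sameAs’ strategy: treat each link as an edge and group identifiers that end up connected. The sketch below uses a union-find structure for this; it is an illustration, not the project's own code, and all identifiers in it are hypothetical placeholders.

```python
def find(parent, item):
    """Follow parent links to the representative of item's group."""
    parent.setdefault(item, item)
    while parent[item] != item:
        parent[item] = parent[parent[item]]  # path halving keeps chains short
        item = parent[item]
    return item


def merge_same_as(links):
    """Group identifiers that are connected by sameAs-style links."""
    parent = {}
    for left, right in links:
        root_left, root_right = find(parent, left), find(parent, right)
        if root_left != root_right:
            parent[root_right] = root_left
    groups = {}
    for item in parent:
        groups.setdefault(find(parent, item), set()).add(item)
    return list(groups.values())


# Hypothetical identifiers for two content items seen on different sites.
links = [
    ("dbpedia:Example_Show", "imdb:tt0000001"),
    ("imdb:tt0000001", "bbc:brand-001"),
    ("dbpedia:Other_Show", "imdb:tt0000002"),
]
for group in merge_same_as(links):
    print(sorted(group))
```

The same grouping step could also serve for series or brand membership, with "belongs to the same brand" links in place of sameAs links.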
Challenge: Privacy
• TV preferences are very personal data
• Relevant standards (OAuth) are new
– deployed widely in Social Web during NoTube
– slower adoption in TV and broadcast world
• We can use OAuth to request permission to
read a user’s closed data (e.g. Facebook ‘Like’s)
• Per-user permission limits our ability to find general
trends across an entire audience (except in public data, e.g. Twitter?)
Diversity and Fragmentation
• Diversity of the Web
– reading lists: bookcrossing, librarything, amazon
– music on last.fm, spotify, ...
– news sites, social networks, blogs ...
• How to integrate while respecting privacy?
• Good news: OAuth deployment growing &
social sites expose their recommendations
• Bad news: user-by-user data makes large-scale
analysis of trends harder
OAuth? RDFa?
• OAuth lets sites negotiate access with users
• e.g., Facebook knows lots of movies I “like”.
• NoTube can use OAuth to ask me to share that
data with TV services
• RDFa data from movie pages (IMDB, Rotten
Tomatoes) is consumed at Facebook
• This makes certain pages attractive as content
identifiers, a ‘taste graph’ alongside ‘social
graph’
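To illustrate how a page can act as a content identifier, the sketch below reads RDFa-style `property` attributes from `<meta>` elements, assuming the Open Graph style of markup. The HTML fragment, its URL and its title are hypothetical placeholders, not taken from any real site.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith("og:") and "content" in attributes:
            self.properties[name] = attributes["content"]


# Hypothetical page fragment; the URL and title are placeholders.
SAMPLE_HTML = """
<html><head>
<meta property="og:type" content="video.movie" />
<meta property="og:title" content="Example Film" />
<meta property="og:url" content="http://movies.example.org/title/123" />
</head><body></body></html>
"""

parser = OpenGraphParser()
parser.feed(SAMPLE_HTML)
print(parser.properties["og:url"])  # the page's own URL, usable as an item ID
```

The `og:url` value is what makes such a page attractive as an identifier: any site that records a ‘like’ against it can be matched with any other site that uses the same URL.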
RDFa in IMDB and
RottenTomatoes HTML
Aggregated by Facebook (and then, by us...)

What we built
• Main WP3 work: beancounter and pattern
recommender
• Aggregate, normalize and merge social Web
activity streams, then match against enriched
TV metadata to produce recommendations
• We also have a Mahout-based collaborative
filtering recommender, with ‘item to item’
recommendations based on bulk ratings data
LOD challenges
• Linked Open Data for TV is new
– datasets evolving, changing
– quality varies
– modelling styles vary
– ‘lumpy’, uneven coverage
• ‘Pattern recommender’ finds paths
– from items in user profile to new content
– handles variation between Linked Data sources
Content Pattern-based
Recommendations
• Paths in Linked Open Data
• Diversity & Serendipity measures
Participation Pattern
• Person X played role Y
in TV program Z
• 194,649 lmdb:actor triples
• 53,180 lmdb:director triples
• 28,549 lmdb:writer triples
• 1,262
lmdb:film_story_contributor
triples
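The participation pattern can be sketched as a two-step path: from a programme in the user's profile to a person credited on it, then on to another programme with the same person. The triples and names below are hypothetical stand-ins for linkedmdb-style data, and the function is an illustration rather than the pattern recommender's actual implementation.

```python
from collections import defaultdict

# Hypothetical (programme, role, person) facts standing in for linkedmdb-style triples.
TRIPLES = [
    ("programme:A", "actor", "person:1"),
    ("programme:B", "actor", "person:1"),
    ("programme:B", "director", "person:2"),
    ("programme:C", "director", "person:2"),
    ("programme:D", "writer", "person:3"),
]


def participation_paths(profile, triples):
    """Return (seen, role, person, role, candidate) paths to unseen programmes."""
    by_person = defaultdict(list)
    for programme, role, person in triples:
        by_person[person].append((programme, role))
    paths = []
    for person, credits in by_person.items():
        for seen, seen_role in credits:
            if seen not in profile:
                continue
            for candidate, candidate_role in credits:
                if candidate not in profile:
                    paths.append((seen, seen_role, person, candidate_role, candidate))
    return paths


for path in participation_paths({"programme:A"}, TRIPLES):
    print(" -> ".join(path))
```

Each returned path doubles as an explanation of the recommendation, since it names the person and roles that connect the two programmes.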
Influence Pattern
• Person X influenced
by person Y (direct)
• Person X and Y
influenced by person
Z (indirect)
• 6,562 dbpedia:influenced
triples
• 11,776 dbpedia:influencedBy
triples
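The direct and indirect forms of the influence pattern differ only in path length. A minimal sketch over hypothetical (person, influencer) pairs standing in for dbpedia:influencedBy triples:

```python
# Hypothetical (person, influencer) pairs standing in for dbpedia:influencedBy triples.
INFLUENCED_BY = [
    ("person:X", "person:Y"),
    ("person:X", "person:Z"),
    ("person:W", "person:Z"),
]


def direct_influences(person, pairs):
    """People who directly influenced the given person."""
    return {influencer for subject, influencer in pairs if subject == person}


def indirect_peers(person, pairs):
    """People who share at least one influencer with the given person."""
    mine = direct_influences(person, pairs)
    return {
        subject
        for subject, influencer in pairs
        if influencer in mine and subject != person
    }


print(sorted(direct_influences("person:X", INFLUENCED_BY)))
print(sorted(indirect_peers("person:X", INFLUENCED_BY)))
```

Here person:X and person:W are linked indirectly because both were influenced by person:Z.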
Analysis of Patterns in Dataset

Participation pattern                       # items
recommendations                               1,266
- Individual brands                             411
paths                                        17,001
- with linkedmdb:actor                       15,257
- with linkedmdb:director                     1,155
- with linkedmdb:writer                         569
- with linkedmdb:film_story_contributor          20

Influence pattern                           # items
recommendations                                 222
- Individual brands                             100
paths
- influencedBy (all)                          1,202
- influencedBy (unique)                         521

Dataset (BBC EPG metadata):
– 12,777 programmes (7,756 with title enrichment)
– 1,260 brands, i.e. unique titles (401 enriched)
– 35,227 person names in metadata (19,394 enriched)
– 9,315 unique person names in metadata (4,590 enriched)
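The figures on this slide support two quick checks: the per-role path counts add up to the stated total of 17,001 paths, and the enrichment coverage of the dataset can be read off as a ratio. The numbers below are copied from the slide; the variable and function names are only illustrative.

```python
# Path counts per linkedmdb role, copied from the slide.
path_counts = {
    "actor": 15257,
    "director": 1155,
    "writer": 569,
    "film_story_contributor": 20,
}
total_paths = sum(path_counts.values())
print("total participation paths:", total_paths)

# (total, enriched) pairs for the BBC EPG dataset, copied from the slide.
dataset = {
    "programmes": (12777, 7756),
    "brands": (1260, 401),
    "person names": (35227, 19394),
    "unique person names": (9315, 4590),
}


def coverage(total, enriched):
    """Share of entries for which enrichment succeeded."""
    return enriched / total


for label, (total, enriched) in dataset.items():
    print(f"{label}: {coverage(total, enriched):.1%} enriched")
```

Roughly three in five programmes were enriched, but only about a third of brands, which is one concrete view of the ‘lumpy’ coverage discussed earlier.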
Collaborative filtering
(item similarity measures from bulk ratings data)
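As a rough picture of what an item similarity measure computes, the sketch below derives item-to-item cosine similarity from a small ratings table. It is plain Python standing in for the idea, not the Mahout-based recommender itself, and the ratings are hypothetical.

```python
from math import sqrt

# Hypothetical bulk ratings: user -> {item: rating}.
RATINGS = {
    "user1": {"item:A": 5, "item:B": 4},
    "user2": {"item:A": 4, "item:B": 5, "item:C": 1},
    "user3": {"item:B": 2, "item:C": 5},
}


def item_vectors(ratings):
    """Turn user -> item ratings into item -> user rating vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine(left, right):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(left) & set(right)
    dot = sum(left[user] * right[user] for user in shared)
    norm = sqrt(sum(v * v for v in left.values())) * sqrt(
        sum(v * v for v in right.values())
    )
    return dot / norm if norm else 0.0


def most_similar(item, ratings):
    """Rank all other items by similarity to the given item."""
    vectors = item_vectors(ratings)
    scores = {
        other: cosine(vectors[item], vector)
        for other, vector in vectors.items()
        if other != item
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


print(most_similar("item:A", RATINGS))
```

With these ratings, item:B ranks above item:C as a neighbour of item:A, because the same users rated A and B highly.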

Hybrid models:
factual paths and statistical similarity
(and not to mention ‘@wossy’ is on Twitter with 1 million followers...)
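One simple way to combine factual paths with statistical similarity is a weighted sum of the two scores per item. The slides do not say how the project weighs the two, so the function, the default weight and the sample scores below are all hypothetical.

```python
def blend(path_scores, similarity_scores, weight=0.5):
    """Weighted sum of two score dictionaries; a missing score counts as 0.

    weight is the share given to path-based scores (hypothetical default).
    """
    items = set(path_scores) | set(similarity_scores)
    combined = {
        item: weight * path_scores.get(item, 0.0)
        + (1 - weight) * similarity_scores.get(item, 0.0)
        for item in items
    }
    # Highest score first; ties broken by item name for a stable order.
    return sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical scores from a path-based and a similarity-based recommender.
path_scores = {"item:B": 1.0, "item:D": 0.5}
similarity_scores = {"item:B": 0.9, "item:C": 0.4}

for item, score in blend(path_scores, similarity_scores):
    print(item, round(score, 2))
```

An item found by only one recommender still appears in the result, which helps when either source is sparse.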

Status
• We can show a standards-based system that
– integrates TV preference data from diverse Web sources
– matches this with enriched TV metadata
– finds graph patterns linking users to content
– integrates with ‘classic’ recommender approaches
– builds on open source (ClioPatria, Mahout)
– supports real-time multi-screen exploration
Plans and challenges
• Richer integration between components
– currently this occurs in the application; can we
exploit LOD patterns prior to Mahout analysis?
• Polish & packaging; more patterns and rules
• Track and influence evolving standards (W3C)
• Work-in-progress with ‘big data’ analysis -
‘what kinds of TV links are shared by the kind
of people who follow @stephenfry on
Twitter?’