Intro
Nasser Zalmout (Amazon), Chenwei Zhang (Amazon), Xian Li (Amazon), Yan Liang (Amazon), Xin Luna Dong (Amazon→Facebook)
Outline
Overview and Introduction 20 min
Knowledge Extraction 40 min
Knowledge Cleaning 25 min
Break 20 min
Ontology Mining 25 min
Applications 20 min
Conclusion and Future Directions 10 min
Overview and Introduction 20 min
Knowledge Extraction
Knowledge Cleaning
Q&A
Overview and Break
Applications
Q&A
Knowledge Graph Example for 2 Songs
[Figure: a knowledge graph with song entities mid127 and mid129, each linked by "name" and "genre" edges; values shown include “Pop”, “Dance-pop”, and “Country pop”.]
Product Graph vs. Knowledge Graph
[Figure: generic KGs compared with product graphs (PG).]
Thousands of attributes
Dong et al., AutoKnow: Self-driving knowledge collection for products of thousands of types, SigKDD, 2020.
Our Goal: Self-Driving Product Knowledge Collection
[Figure: AutoKnow takes a Taxonomy (Grocery, Snacks, Drinks, Candy), the Catalog, and User logs as input and produces a Product KG. The output taxonomy adds Pretzels, and the graph links Prod. 1, Prod. 2, and Prod. 3 to their types and values through hasType, flavor, and color edges, with a synonym edge between "Chocolate" and "Choc.".]
Catalog:
Product     Type     Flavor   Color
Product 1   Snacks   Cherry
Product 2   Candy    ?        ?
Product 3   Candy    Choc.    Gold
Results:
● #Types ↑ 3X
● Defect rate ↓ up to 68 percentage points
Amazon AutoKnow: Self-Driving Product Knowledge Collection
[Figure: AutoKnow architecture.]
• Input Data: PT Taxonomy; Catalog; Behavioral Signals (e.g., search logs, reviews, Q&A)
• Ontology Suite: Taxonomy Enrichment; Relation Discovery
• Data Suite: Data Imputation; Data Cleaning; Synonym Discovery
• Broad Graph: Ontology; {product, attribute, value}; {value, synonym, value}
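The two triple shapes in the Broad Graph, {product, attribute, value} and {value, synonym, value}, can be illustrated with a small sketch. This is not the AutoKnow implementation: the `Triple` class, the helper functions, and the toy rows (loosely modeled on the candy example) are all hypothetical and only show how synonym triples might be used to normalize attribute values before querying.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A single (subject, predicate, object) edge in the graph."""
    subject: str
    predicate: str
    obj: str


# Hypothetical toy data; the values are illustrative, not real catalog rows.
attribute_triples = [
    Triple("Product 1", "flavor", "Cherry"),
    Triple("Product 2", "flavor", "Chocolate"),
    Triple("Product 3", "flavor", "Choc."),
    Triple("Product 3", "color", "Gold"),
]
synonym_triples = [Triple("Choc.", "synonym", "Chocolate")]


def canonical_values(triples, synonyms):
    """Rewrite attribute values to their canonical form using synonym triples."""
    mapping = {s.subject: s.obj for s in synonyms}
    return [
        Triple(t.subject, t.predicate, mapping.get(t.obj, t.obj))
        for t in triples
    ]


def products_with(triples, attribute, value):
    """Return the sorted products that carry the given attribute value."""
    return sorted(
        t.subject for t in triples
        if t.predicate == attribute and t.obj == value
    )


normalized = canonical_values(attribute_triples, synonym_triples)
print(products_with(normalized, "flavor", "Chocolate"))
```

Without the synonym step, a query for "Chocolate" would miss Product 3, whose value is recorded as "Choc.".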
Self-Driving to Navigate a Large Space
• Automatic: Fully ML-based
• Annotation free: Weak learning based on existing Catalog data and user behavior
• One-size-fits-all: Few taxonomy-aware models
• Self guidance: Identify important attributes and categories to focus efforts
Key Intuition I: Learning with Limited Labels Generated from Existing Catalog Data
[Figure: a Taxonomy (Grocery; Snacks, Drinks; Pretzels, Candy) and a Catalog row with missing values (Product 2, Candy, ?, ?), alongside a graph in which Prod. 1, Prod. 2, and Prod. 3 carry hasType, flavor, and color edges.]
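One common way to generate labels from existing catalog data is distant supervision: when a product already has a catalog value for an attribute, occurrences of that value in the product title are tagged as training labels for a sequence tagger. The sketch below assumes this approach; the slides do not specify the exact procedure, and the function name `weak_bio_labels`, the whitespace tokenization, and the example title are hypothetical.

```python
def weak_bio_labels(title, attribute, value):
    """Tag title tokens with BIO labels wherever the catalog value appears.

    Matching is exact on lower-cased, whitespace-separated tokens, which is
    a deliberate simplification for illustration.
    """
    tokens = title.split()
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    target = [v.lower() for v in value.split()]
    n = len(target)
    if n == 0:
        return tokens, labels
    for i in range(len(tokens) - n + 1):
        if lowered[i:i + n] == target:
            labels[i] = f"B-{attribute}"
            for j in range(i + 1, i + n):
                labels[j] = f"I-{attribute}"
    return tokens, labels


# Hypothetical catalog entry: the title text plus a known flavor value.
tokens, labels = weak_bio_labels(
    "Dark Chocolate Covered Pretzels 8 oz", "flavor", "dark chocolate"
)
for token, label in zip(tokens, labels):
    print(token, label)
```

Labels produced this way are noisy (a value may be missing from the title or appear in an unrelated sense), so they are best treated as weak supervision and not as ground truth.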
Deliver the Data Business
• High-precision models
• E2E pipeline + AutoML to reduce modeling cost
• Scale-up to reduce #models: 1000s categories, 10s languages, 100s attributes
• Higher yield from multi-modal models
Tutorial Structure
[Figure: the AutoKnow architecture (Input Data, Ontology Suite, Data Suite, Broad Graph) annotated with tutorial sections.]
• Sec 2: Data Imputation
• Sec 3: Data Cleaning
• Sec 4: Ontology Suite
• Sec 5: Applications
Section Structure
• Problem Definition
What are unique challenges for PG beyond generic KGs?
• Short answer -- key intuition
What are key intuitions for building product KGs?
• Long answer -- details
What are practical tips?
• Reflection/short-answer
Can we apply the techniques to other domains?
Key Questions We Answer in This Tutorial
• Q1. What are the unique challenges in building a product knowledge graph, and what are the solutions?