
Berkeley Dataproduct Talk

The document discusses Pete Skomoroch's background in data science and his work creating data products at LinkedIn. It describes the process for creating data products, including defining the problem, modeling, data collection, feature engineering, and iteration. As an example, it outlines LinkedIn's skills data product, including extracting and standardizing skills from profiles, skills pages, suggested skills algorithms using Naive Bayes classification, and the viral growth of over 1 billion skill endorsements. It emphasizes building on initial data foundations to continuously expand and improve data products over time.


Data Products Deep Dive


Pete Skomoroch
@peteskomoroch
3/31/14
Berkeley CS194-16: Intro to Data Science

Some Background

Physics/Math BS Undergrad
Analyst / Software Engineer @ ProfitLogic - 3.5 years
Biodefense Engineer / ML Student @ MIT - 3.5 years
Sr. Research Engineer @ AOL Search - 1 year
Director @ Juice Analytics - 1 year
Consulting @ Cloudera, Amazon etc. - 1 year
Principal Data Scientist @ LinkedIn - 4 years

Four types of data scientist (at least)

source: "Analyzing the Analyzers" O'Reilly


Media

Data Scientists create data products

The data product process

Verify you are solving the right problem


Theory + model design
Measurement: data collection and cleaning
Feature engineering & model development
Error analysis and investigation
Iterate and improve each step in the process
Leverage derived data to build new products

Data factories & flywheels

Source: http://www.linkedin.com/channels/disrupt2013
Steve Jennings/Getty Images Entertainment

Data Product Example: LinkedIn Skills

Skill Extraction and Standardization Pipeline
Skill Pages
Skills Section on Member Profiles
Suggested Skills Algorithm and Email
Skill Endorsements

Skill Discovery: Unsupervised Topics from Profile Specialties Section

Extract


Topic Clustering & Phrase Sense Disambiguation


Deduplication Signals from Mechanical Turk


Sample Task for Mechanical Turk Workers


Mechanical Turk Standardization

Skill Phrase Deduplication


Tagging Skill Phrases

Document
(ex: Profile)

Tagging: Extract potential skill phrases from text

Lead designer and engineer for the implementation of a user-centric, fully-configurable UI for data aggregation and reporting.

Developed over 20 SaaS custom applications using Python, Javascript and RoR.

JavaScript

RoR

Python

SaaS

Standardize unambiguous phrase variants


ror
rubyonrails
ruby on rails development
ruby rails
ruby on rail

Ruby on Rails

Tokenization
Phrases
(up to 6 words)

Skills Tagger
Skills
(unordered)

Skills Classifier

Skills
(ranked by relevance)
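
The tokenization and tagging stages above can be sketched roughly as follows. This is a minimal illustration, not the LinkedIn implementation: the `CANONICAL` table, the regular expression, and the function names are assumptions made for the example. Only the five Ruby on Rails variants and the 6-word phrase limit come from the slide. The relevance-ranking classifier stage is not shown.

```python
import re

# Hypothetical variant table mapping unambiguous phrase variants to a canonical skill.
# The five Ruby on Rails variants mirror the slide; the other entries are assumptions.
CANONICAL = {
    "ror": "Ruby on Rails",
    "rubyonrails": "Ruby on Rails",
    "ruby on rails development": "Ruby on Rails",
    "ruby rails": "Ruby on Rails",
    "ruby on rail": "Ruby on Rails",
    "python": "Python",
    "javascript": "JavaScript",
    "saas": "SaaS",
}

MAX_PHRASE_WORDS = 6  # the slide caps candidate phrases at 6 words


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def candidate_phrases(tokens, max_words=MAX_PHRASE_WORDS):
    """Yield every contiguous run of 1..max_words tokens as a phrase."""
    for start in range(len(tokens)):
        for length in range(1, max_words + 1):
            if start + length > len(tokens):
                break
            yield " ".join(tokens[start:start + length])


def tag_skills(text):
    """Return the unordered set of canonical skills found in the text."""
    tokens = tokenize(text)
    return {CANONICAL[p] for p in candidate_phrases(tokens) if p in CANONICAL}
```

Calling `tag_skills` on the sample profile sentence from the slide returns the four skills shown there (JavaScript, RoR as Ruby on Rails, Python, SaaS) as an unordered set. A dictionary lookup only handles unambiguous variants; ambiguous phrases would need the disambiguation step described earlier in the deck.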

Skills Related to Big Data


Skills Correlated with the Job Title Data Scientist


SkillRank: Algorithm for Top People


How do we get more people into the skill graphs?

Profile

Suggested Skills Inference

How suggested/inferred skills work:


Extract attributes

The skill likelihood is a conditional model

Probabilities are combined using a Naïve Bayes Classifier

If you are an engineer at Apple, you probably know about iPhone Development.

Feature
Vectors

- Company ID
- Title ID
- Groups ID
- Industry ID
-

Skills Classifier



Skills
(ranked by likelihood)
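
As a rough illustration of how per-attribute probabilities could be combined with Naïve Bayes to rank skills, here is a small sketch. It is not the production model: the class name `SkillNaiveBayes`, the add-one smoothing, the string-valued attributes, and the toy training data (including the company "Acme") are all assumptions made for the example. The slide only states that company, title, group, and industry IDs feed a Naïve Bayes classifier.

```python
import math
from collections import Counter, defaultdict


class SkillNaiveBayes:
    """Tiny categorical Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.skill_counts = Counter()               # training profiles per skill
        self.feature_counts = defaultdict(Counter)  # skill -> (field, value) -> count
        self.values = defaultdict(set)              # field -> distinct values seen

    def fit(self, profiles):
        """profiles: iterable of (attributes dict, set of skills)."""
        for attrs, skills in profiles:
            for field, value in attrs.items():
                self.values[field].add(value)
            for skill in skills:
                self.skill_counts[skill] += 1
                for item in attrs.items():
                    self.feature_counts[skill][item] += 1

    def rank(self, attrs):
        """Sort skills by log P(skill) + sum of log P(attribute | skill)."""
        total = sum(self.skill_counts.values())
        scores = {}
        for skill, n in self.skill_counts.items():
            score = math.log(n / total)
            for field, value in attrs.items():
                seen = self.feature_counts[skill][(field, value)]
                vocab = len(self.values[field] | {value})
                # Add-one smoothing keeps unseen attribute values from zeroing the score.
                score += math.log((seen + 1) / (n + vocab))
            scores[skill] = score
        return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up training data in the spirit of the Apple example on the slide.
profiles = [
    ({"company": "Apple", "title": "Engineer"}, {"iPhone Development"}),
    ({"company": "Apple", "title": "Engineer"}, {"iPhone Development", "Objective-C"}),
    ({"company": "Acme", "title": "Accountant"}, {"Bookkeeping"}),
]
model = SkillNaiveBayes()
model.fit(profiles)
ranking = model.rank({"company": "Apple", "title": "Engineer"})
```

With this toy data, `ranking` places "iPhone Development" first for an Apple engineer. Working in log space avoids numeric underflow when many attributes are multiplied together, and the independence assumption between attributes is what makes the model "naïve".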


Skill Recommendations for Your LinkedIn Profile
4% Conversion

49% Conversion


Reputation: Build Endorsements Product to Collect More Graph Edges


PYMK + Suggested Skills


Viral Growth: 1 Billion Endorsements in 5 Months


Social Viral Tagging = Lots of Data


Skill marketing
Skill recommendations
Virality only
Suggested endorsements

How Did We Gather this Data?


1. Desire + Social Proof
2. Viral Loops + Network Effects
3. Data Foundation + Recommendation Algorithms


Recap: Data Product Evolution

Skill Extraction and Standardization Pipeline
Skill Pages
Skills Section on Member Profiles
Suggested Skills Algorithm and Email > 20M members
Skill Endorsements > 60M members, 3B+ Edges
Big product wins in engagement, recall, relevance
SkillRank & Reputation integration
Sets stage for next generation of products

Questions?
@peteskomoroch
http://datawrangling.com
http://www.linkedin.com/in/peterskomoroch
