
Big Data Ecosystem in LinkedIn Data

LinkedIn uses big data across many of its products and services. It collects and analyzes data from its more than 560 million users to provide personalized recommendations and insights. Its big data products include People You May Know, job recommendations, talent and marketing insights, and social graph analysis. LinkedIn also uses technologies such as Apache Kafka, Hadoop, and machine learning to handle large volumes of data and scale its platforms and services.

Uploaded by

tamoghna ghosh

Big Data Ecosystem in LinkedIn Data

Ecosystem at LinkedIn
Brief History of LinkedIn
- Launched in 2003 by Reid Hoffman (https://ourstory.linkedin.com/)
- 2005: Introduced first business lines: Jobs and Subscriptions
- 2006: Launched public profiles (achieved profitability/new features)
- 2008: LinkedIn goes GLOBAL! (https://business.linkedin.com/)
- 2012: Site transformation/rapid growth
- 2013: ~225 million members (27% of LinkedIn subscribers are recruiters)
- 2014: Next decade focused on map of digital economy
Three Major Data Dimensions @LinkedIn

LinkedIn Challenges for Web-scale OLAP
● Horizontally scalable
○ currently 560+ million users
○ adding 2 new members per second
● Quick response time to users’ queries
● High availability
● High read & write throughput (billions of monthly page views)
● Heavy dependency on the slowest node’s response, as data is spread across various nodes
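
The last challenge above, dependence on the slowest node, can be made concrete with a small sketch. It assumes a query that fans out to every node holding part of the data and must wait for all of them; the 1% straggler rate and the function names are made up for illustration and are not LinkedIn figures.

```python
def fanout_latency(node_latencies_ms):
    """A fan-out query completes only when its slowest node has answered."""
    return max(node_latencies_ms)


def straggler_probability(num_nodes, p_slow=0.01):
    """Chance that at least one of num_nodes is slow for a given query.

    p_slow is an assumed, illustrative per-node probability of a slow reply.
    """
    return 1 - (1 - p_slow) ** num_nodes


# Two fast nodes cannot hide one slow node.
print(fanout_latency([10.0, 12.0, 200.0]))

# The more nodes a query touches, the more often it hits a straggler.
for n in (1, 10, 100):
    print(n, round(straggler_probability(n), 3))
```

Under these assumed numbers, a query touching 100 nodes meets a slow node far more often than a single-node query, which is why spreading data across nodes puts pressure on tail latency.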
OLAP (Online Analytical Processing) use cases: Job Recommendations, Product and Premium Insights, Profile Analytics
LinkedIn Products using Big Data

1. People You May Know

• The “People You May Know” feature on your My Network page suggests LinkedIn members for you to connect with.

• These recommendations are based on commonalities between you and other LinkedIn members, as well as contacts you’ve imported from your email and mobile address books.
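
One simple way to picture "commonalities" is to count shared connections. The sketch below is only an illustration under that assumption; it is not LinkedIn's actual algorithm, and the function name, the toy graph, and the member names are hypothetical.

```python
from collections import Counter


def suggest_connections(graph, member, top_n=3):
    """Rank members who are not yet connected to `member` by shared connections."""
    direct = graph.get(member, set())
    shared = Counter()
    for connection in direct:
        for candidate in graph.get(connection, set()):
            # Skip the member themselves and people they already know.
            if candidate != member and candidate not in direct:
                shared[candidate] += 1
    # Highest shared count first; ties broken by name for a stable order.
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Toy, made-up connection graph (each edge is listed in both directions).
graph = {
    "ana": {"bo", "cy"},
    "bo": {"ana", "dee", "eli"},
    "cy": {"ana", "dee"},
    "dee": {"bo", "cy"},
    "eli": {"bo"},
}
print(suggest_connections(graph, "ana"))  # [('dee', 2), ('eli', 1)]
```

A production system would combine many more signals (shared employers, schools, imported contacts), but the ranking idea is the same.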

2. Network/Social Graph

• Streamlined search that combines results for people, companies, jobs, groups, and other filters into one personalized result.

3. Talent Solutions

• Instant trends and movement across the talent marketplace with access to real-time supply and demand data.

• Intuitive data and insights to acquire, develop, and retain talent.

• Build deeper knowledge on priority markets, talent pools, and companies.

• See who you’re gaining talent from, and losing talent to.

• Understand how your workforce compares with your competitors.

4. LinkedIn Pipeline Builder

• Lets talent share their profile, email, and phone number with a click of a button to express interest in your opportunities.

• Lets companies fill high-priority, high-volume, or hard-to-fill roles by reaching candidates automatically, when they visit LinkedIn, with personalized Sponsored Updates and Recruitment Ads.

5. LinkedIn Marketing Solutions

• Over 560M professionals are on LinkedIn. Advertisers can target them by job title, function, or industry.

• Controlled spending with flexible pricing options.

6. LinkedIn ProFinder

• Marketplace designed to help freelance white-collar professionals find consumers and small businesses in need of their services.

7. Salary Insights

8. LinkedIn Jobs

• LinkedIn uses big data to provide a deep, up-to-date, and insightful data set on professionals.

• The data is used to match roles to qualified professionals through targeted job promotions across LinkedIn and candidate recommendations.
Scaling at LinkedIn

• Vertical Scaling
• Master-Slave Model
• Partitioned DB
• Service-Oriented Architecture – Microservices
What's New?
• Microservices Architecture – Application-Level Logic Implementation vs. DB-Level Logic Implementation
• Open Source
• Voldemort – NoSQL vs. SQL, i.e. No Impedance Mismatch, Implicit Schema, Data Replication and Partitioning
• Eventual Consistency
• Kafka – Highly Scalable Messaging System; can handle 1.4 trillion messages per day with strong durability and low latency
• Hadoop – Big data platform now handling 100s of petabytes of data
• Machine Learning – Using Google's TensorFlow for feeds and smart replies
• TonY – Runs TensorFlow on YARN (Hadoop)
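
The "data replication and partitioning" point in the Voldemort bullet can be pictured with a consistent-hashing ring, in which each key is stored on several distinct nodes. This is a simplified sketch and not Voldemort's actual API; the class `HashRing`, the node names, and the parameters `replication_factor` and `vnodes` are assumptions made for illustration.

```python
import hashlib
from bisect import bisect_right


class HashRing:
    """Toy consistent-hashing ring that places each key on several nodes."""

    def __init__(self, nodes, replication_factor=2, vnodes=8):
        self.replication_factor = replication_factor
        # Each node gets several positions (virtual nodes) on the ring.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect_right(self.points, self._hash(key))
        owners = []
        for step in range(len(self.ring)):
            node = self.ring[(start + step) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == self.replication_factor:
                break
        return owners


ring = HashRing(["node-a", "node-b", "node-c"], replication_factor=2)
print(ring.nodes_for("member:42"))  # two distinct nodes hold this key
```

Because writes reach the replicas at slightly different times, a read served by one replica may briefly lag another, which is where the eventual consistency bullet comes in.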
Still Not Scalable?
Kafka
• 1,400 Kafka brokers receiving over 2 petabytes of data a week
• Strong Durability and Low Latency
• Kafka REST Proxy
• Nuage: a self-service portal for online data-infrastructure resources
• Kafka MirrorMaker consumes data from a source cluster and produces it to a target cluster

https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin
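
To put the Kafka figures in this deck into per-second terms, here is a back-of-the-envelope sketch. It assumes 1 petabyte = 10**15 bytes and an even spread of traffic across brokers; the deck's numbers may also come from different dates, so treat the results as rough orders of magnitude.

```python
MESSAGES_PER_DAY = 1.4e12   # "1.4 trillion messages per day" (from the deck)
BROKERS = 1400              # "1,400 Kafka brokers" (from the deck)
BYTES_PER_WEEK = 2e15       # "2 petabytes a week", assuming 1 PB = 10**15 bytes

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def messages_per_second():
    """Average message rate implied by the daily total."""
    return MESSAGES_PER_DAY / SECONDS_PER_DAY


def bytes_per_broker_per_second():
    """Average inbound bytes per broker, assuming an even spread (an assumption)."""
    return BYTES_PER_WEEK / BROKERS / SECONDS_PER_WEEK


print(f"{messages_per_second() / 1e6:.1f} million messages/s")
print(f"{bytes_per_broker_per_second() / 1e6:.2f} MB/s per broker")
```

These are averages only; real traffic is bursty, so peak rates would be higher.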
Create on Your Own vs Reuse

Voldemort
Kafka
Norbert
Lucene
Sensei

Hadoop
Pig
Hive
Avro
Apache Traffic Server
Polyglot Persistence

Product                 Feature                NoSQL DB
People You May Know     Caching                Key Value
Network/Social Graph    Network of Relations   Graph
Messenger               Consistency            Column Family
Pipeline Builder        Network                Graph
Marketing Solutions     Analytics              Column Family
ProFinder               Catalog                Document Based
Salary Insights         Analytics              Column Family
Jobs                    Catalog                Document Based

Data Extraction & Visualization
● HQL queries to derive insights from data
○ Top locations with highest job count
○ Job title and count per location
○ Top job titles recently listed

● Query visualization
○ Locations of certain jobs listed
○ Profile Headlines with Highest Connections

● Economic graph
○ Comprehensive digital map of the world economy
○ Popular destination cities of college graduates
○ Areas with high concentrations of skills
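
The deck runs these aggregations as HQL queries. As a rough illustration of the first one, "top locations with highest job count", the sketch below performs the equivalent group-and-count in Python; the rows and the field names `title` and `location` are made up and do not come from the deck's data set.

```python
from collections import Counter


def top_locations(job_rows, n=3):
    """Roughly: group by location, count rows, order by count descending, keep n."""
    counts = Counter(row["location"] for row in job_rows)
    return counts.most_common(n)


# Hypothetical job listings, for illustration only.
jobs = [
    {"title": "Data Engineer", "location": "Bengaluru"},
    {"title": "Data Analyst", "location": "Bengaluru"},
    {"title": "ML Engineer", "location": "Bengaluru"},
    {"title": "Data Engineer", "location": "Mumbai"},
    {"title": "Recruiter", "location": "Mumbai"},
    {"title": "Data Analyst", "location": "Pune"},
]
print(top_locations(jobs, n=2))  # [('Bengaluru', 3), ('Mumbai', 2)]
```

The same pattern, keyed on `(location, title)` instead of `location`, would give the job title count per location.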
Acquisitions
LinkedIn has acquired 21 companies, including:
● mSpoke 2010 - Offers an adaptive personalization engine
● Careerify 2015 - An online platform offering recruitment solutions
● Lynda 2015 - E-learning platform
● PointDrive 2016 - A sales-oriented application that improves the way you share content
● Glint 2018 - Offers an employee engagement platform

Microsoft acquisition of LinkedIn ($26.2 billion)
○ 590 million total LinkedIn users (2018)
○ Continues to acquire startups

Business units
Talent Solutions, Marketing Solutions, Premium Subscriptions, and Learning Solutions
Potential updates and Future of LinkedIn

[Figure: Magic Quadrant for Analytics and Business Intelligence Platforms, Feb 2018]

• We can expect more integration with other Microsoft products.

• The job seeker platform is growing, and this is where LinkedIn's focus lies.

• Lynda (e-learning platform): a go-to place for many job seekers.

• “Resumes will vanish”: technology will provide more signals.

• Online tools will help recruiters test for learning quotient.


Submitted by Group 6
Gokulaparthiban C N IPMX11010
Manish Kumar IPMX11013
Tamoghna IPMX11037
Venkata Siva Teja IPMX11039
Vineeth Kumar IPMX11042
