How We Improved Our Performance Using ElasticSearch Plugins - Part 2 - by Xiaohu Li - Tinder Tech Blog - Medium
Problem
When we query ES to fetch recommendations to serve, we need to send a list
of users to skip. For example, users that you have already seen recently and
users that you are already matched with should not be recommended to you
again. This skip list can be reasonably high for very active users. We use the
terms query on ES for the skip list.
{
  "query": {
    "bool": {
      "must": [...],
      "must_not": [
        {
          "terms": {
            "user_number": [1, 2, 3, ...]
          }
        },
        ...
      ]
    }
  }
}
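To make the problem concrete, here is a small Python sketch (illustrative only; the field names mirror the request shape above) that builds this request for skip lists of different sizes and measures the resulting JSON payload:

```python
import json

def build_query(skip_ids):
    # Build the recommendation query with the skip list expressed as a
    # "terms" clause under must_not, mirroring the request shape above.
    return {
        "query": {
            "bool": {
                "must_not": [
                    {"terms": {"user_number": skip_ids}}
                ]
            }
        }
    }

small = len(json.dumps(build_query(list(range(100)))))
large = len(json.dumps(build_query(list(range(50_000)))))
print(small, large)  # the payload grows linearly with the skip list
```

Every id in the skip list is shipped verbatim in the request and matched on the data nodes, which is why both request size and query cost scale with the list.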
However, we suspected that the “terms” query was inefficient for very large
lists. We conducted a performance test using queries with skip lists of
different sizes using the “terms” filter. From the results below, performance
and skip list size have a clear inverse relationship.
p50 latency comparison with different skip list sizes (terms filter)
Solution
Fundamentally, the solution is to find an alternative to using the terms
query. Our idea was to send a serialized skip list using a compressed data
structure, which could then be deserialized and used on the ES server.
Assuming that the serialization and deserialization overhead is acceptable,
not only would this reduce latency by avoiding a large terms query, but it
could also greatly reduce the size of our query requests.
Now that we are familiar with the usage of the ES plugin, we thought
about how we could leverage it to optimize the skip list. In addition to adding
a new script that could be used by our LoaderPlugin, another possibility was
to add a new custom API using an ActionPlugin (similar to what we did for
observability in Part 1). We will cover implementation details and tradeoffs
below.
Plugin Types
ActionPlugin
To use the serialized skip list through a custom API, which we will call
“_newsearch”, the following steps are required.
1. Serialize the skip list on the client.
2. Send a query to ES using the _newsearch API and pass in the serialized
list.
3. In the ES cluster, the query node sends a search query to the data node
without the skip list. The requested document count is equal to the
requested document count sent by the client plus the size of the skip list
because the skip list will be applied on the query node.
4. Receive the ranked documents in the query node. Deserialize the skip
list. Include documents that are not in the skip list up to the requested
size sent by the client and return to the client.
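The query-node side of steps 3 and 4 can be sketched as follows. This is a minimal Python sketch, not the actual plugin code; `newsearch`, `search_fn`, and the document ids are hypothetical names for illustration:

```python
def newsearch(client_size, skip_list, search_fn):
    # Over-fetch from the data nodes: request enough documents so that
    # client_size results can survive the skip filtering on the query node.
    ranked = search_fn(size=client_size + len(skip_list))
    skip = set(skip_list)
    # Apply the skip list only after ranking, then trim to the client's size.
    return [doc for doc in ranked if doc not in skip][:client_size]

# Hypothetical ranked results returned by the data nodes (best first).
ranked_docs = [5, 9, 2, 7, 1, 8]
print(newsearch(3, [9, 7], lambda size: ranked_docs[:size]))  # → [5, 2, 1]
```

The over-fetch in `search_fn(size=...)` is exactly the "increased load" tradeoff listed below: every skipped document still had to be ranked before being discarded.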
Pros:
- Easy to implement
Cons:
- Unnecessary processing: the skip logic occurs after the sorting phase, so the relevance factor is still calculated for documents in the skip list
- Increased load: potentially needs to rank extra documents, which may be heavy for queries with large skip lists
- Increased load on query nodes, since they need to deserialize and apply the skip list
- Updates require a cluster restart, since this approach does not take advantage of the LoaderPlugin
LoaderPlugin
To use the serialized skip list by leveraging the LoaderPlugin from Part 1, we
will need to add a new script to deserialize and apply the skip list. This new
script uses the following workflow.
1. Serialize the skip list on the client.
2. Send a query to ES through the standard _search API. Send the serialized
skip list through “params” in the request. Specify a script that uses a skip
list deserializer in the “source” field. Add a “min_score” (a standard ES
field) parameter to the query (used in the next step). Here is an example:
{
  "min_score": -100000,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            ...
          ]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "params": {
                "key1": value1,
                ...
              },
              "source": "my_bitmap_script",
              "lang": "tinder_scripts"
            }
          }
        }
      ]
    }
  }
}
3. On the data node, the skip list will be deserialized. For documents that
should be skipped, the script will return a relevance factor lower than
min_score, so they will be omitted.
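The scoring trick in step 3 can be sketched like this. It is an illustrative Python sketch of the idea; the real script runs inside ES, and the function and variable names here are hypothetical:

```python
MIN_SCORE = -100000

def skip_aware_score(user_number, skip_set, base_score):
    # Documents in the skip list receive a score below min_score,
    # so ES discards them before returning results to the query node.
    if user_number in skip_set:
        return MIN_SCORE - 1
    return base_score

skip = {42, 99}
scores = [skip_aware_score(u, skip, 1.0) for u in (7, 42, 13)]
print(scores)  # → [1.0, -100001, 1.0]
```

Because the filtering happens during scoring on the data nodes, no extra documents need to be fetched or ranked on the query node.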
Pros:
- Reduced load on data nodes, since deserializing a skip list is much faster than evaluating a large terms filter
Cons:
ActionPlugin vs. LoaderPlugin: p50 latency (50k skip list size, bitmap serialization)
While the latency was similar at lower QPS, the difference is quite obvious at
125 QPS. As we originally expected, the LoaderPlugin yielded much better
performance.
Data structures
Now that we had determined which plugin implementation to use, we still
had to decide which data structure to use to serialize the skip list. To help us
make a decision, we conducted more performance tests comparing the
following data structures.
- Hash set
- Bloom filter
- Roaring bitmap
The size of the serialized skip list has a potential impact on ES network
latency since it will be included in the request. Using a skip list of size 10
million, the serialized skip list size of each implementation is shown below.
Since a standard hash set is not designed for compression, it was expected to
be larger than the raw values in the terms list. Conversely, the bloom filter
and roaring bitmap produced serialized skip lists that were much smaller.
Although a smaller size will result in reduced network bandwidth usage, it
may not have any correlation with reduced latency or cluster load.
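The size gap can be illustrated with a toy bitset. This is not an actual Roaring bitmap, which additionally compresses runs and sparse containers per 16-bit chunk, but it shows why bit-packing beats a raw id list for dense id ranges:

```python
import json

def serialize_bitset(ids):
    # One bit per possible user id: dense id ranges pack ~8 ids per byte,
    # versus several bytes per id in a plain JSON terms list.
    bits = 0
    for i in ids:
        bits |= 1 << i
    return bits.to_bytes((bits.bit_length() + 7) // 8, "little")

ids = list(range(0, 100_000, 2))            # 50k ids in a dense range
raw_size = len(json.dumps(ids).encode())    # size as a plain terms list
packed_size = len(serialize_bitset(ids))    # size as a bitset
print(raw_size, packed_size)
```

For sparse id spaces a plain bitset degrades, which is exactly the case Roaring bitmap's hybrid containers are designed to handle.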
Therefore, we implemented each data structure using a LoaderPlugin script
and tested latency using various skip list sizes.
Below are the results when comparing the data structures using a skip list of
size 10k.
It was clear that bloom filter and bitmap were much better than the rest of
the pack. We did more performance testing comparing those two with larger
skip lists. Below are the results when using a skip list of size 40k.
Even though bloom filter is slightly faster than bitmap, it has the issue of
false positives. Since the difference in performance is small, we decided to
use bitmap because it does not impact our business logic.
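For context on the false-positive issue, here is a minimal Bloom filter sketch using only the standard library (illustrative; the class name and parameters are arbitrary choices, not what we ran in production). A Bloom filter can report an id as present when it was never added, so a user who should have been recommended could be wrongly skipped, but it never misses an id that was actually added:

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits, self.num_hashes, self.bits = num_bits, num_hashes, 0

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May return True for items never added (false positive),
        # but never False for items that were added (no false negatives).
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

skip = BloomFilter()
for user in range(1000):
    skip.add(user)
print(all(user in skip for user in range(1000)))  # → True: no false negatives
```

A false-positive rate can be tuned down with more bits and hashes, but never to zero, which is why a structure with exact membership semantics fit our business logic better.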
Final decision
In the end, we decided to adopt RoaringBitmap as the data structure and
implement it as a LoaderPlugin.
Impact
By quickly iterating through our ES plugin development cycle, we were able to
validate the functionality of this plugin in production while keeping the hit size
intact, and roll it out 100% transparently to our valued users.
We saw around a 35% and 50% CPU utilization drop for query nodes and data
nodes, respectively.
Query Nodes
Data Nodes
We are very excited to announce that by releasing this plugin, we are not
only able to provide a better user experience while keeping business logic
intact, but also gain significant headroom in our cluster capacity for future
growth.
Summary
After building out a framework for plugin dynamic loading and iterating, we
pushed our cluster’s performance to the next level by actively identifying our
current bottlenecks, investigating and testing different options, and finally
delivering benefits to our end users. A few key findings we gathered during this
process:
This concludes our latest innovation on how we operate our ES cluster and
make it hyper-scaled, which is just one of many engineering challenges we
are tackling at Tinder. If you are interested in challenging yourself and want
to work with talented teammates, please take a look at our job website for
openings.