© 2018 Adobe. All Rights Reserved. Adobe Confidential.
Elastic @ Adobe: Making Search Smarter with Machine Learning at Scale

--- Baldo Faieta & Gaurav Kukal
Adobe is unique in the search space: a large part of its content is non-textual (images, videos, 3D templates, PSD, DCX), yet billions of content pieces also carry text.
A Buffet of Search Use Cases
1. Search based on computer vision & metadata
2. Deep textual and hybrid content search
3. Video and richer-format search
4. Enterprise search
5. Discovery and recommendations
Computer Vision, ML, AI to the Rescue
“Computer vision is an interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do.”
Landscape of Search, Product-Wise
[Diagram: Adobe Experience Cloud (Adobe Marketing Cloud, Adobe Analytics Cloud, Adobe Advertising Cloud; Experience Manager, Campaign, Primetime, Target, Audience Manager, Analytics), Adobe Creative Cloud, Adobe Document Cloud]
Landscape of Search, Product-Wise: Current State
[Diagram: Adobe Experience Cloud (Adobe Marketing Cloud, Adobe Analytics Cloud, Adobe Advertising Cloud; Experience Manager, Campaign, Primetime, Target, Audience Manager, Analytics), Adobe Creative Cloud, Adobe Document Cloud]
SEARCH EXPERIENCE MATTERS MORE THAN EVER
Search Architecture @ Adobe: Bird's-Eye View
Adobe Search Elasticsearch Stats
• 3 geographical regions
• ~10 billion documents
• 18 production Elasticsearch clusters
• 200 shards, max # shards 16
• ~400 virtual machines in AWS and Azure
• 2 public clouds
• Live ingestion rate: ~6,000 per second
• Ingestion capacity for reindexing: ~25,000 per second
• ~600 queries per second
* As of June 2018
Adobe Search Elasticsearch: Self-Managed
1. Self-managed clusters on public clouds
2. Moved Lr search from hosted AWS Elasticsearch to self-managed: 3.4 billion docs
* As of June 2018
Why Elasticsearch?
1. We have seen the SolrCloud code, and cloud support there is an afterthought
2. Stability and resilience are very high (if done right)
3. The right balance of open source with a tight review process
* As of June 2018
Adobe Search: Playing with Elasticsearch
1. Elasticsearch as a white box: ES plugin for image/video similarity & search ranking
2. Cost saving: hot & cold indices, static and live indices
3. Index any generic entity into Elasticsearch with zero code changes
4. Out-of-order events: optimistic locks using ES scripting
5. Custom ACLs in Elasticsearch for enterprise searches
* As of June 2018
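Item 4 can be illustrated with a minimal sketch, assuming each event carries a monotonically increasing timestamp and the stored document remembers the timestamp of the last event applied (the field names `event_ts`, `last_event_ts`, and `fields` are hypothetical, and this is not Adobe's actual script). The Painless update script applies an event only if it is newer and otherwise sets `ctx.op = 'noop'`; `apply_event` mirrors the same rule locally so it can be checked without a cluster.

```python
from typing import Any, Dict, Optional

# Painless script for the Elasticsearch update API: apply an event only if it is newer
# than the last applied one; otherwise turn the update into a no-op.
# Field names (last_event_ts, event_ts, fields) are hypothetical.
NEWER_WINS_SCRIPT = (
    "if (ctx._source.last_event_ts == null || params.event_ts > ctx._source.last_event_ts) {"
    "  ctx._source.putAll(params.fields);"
    "  ctx._source.last_event_ts = params.event_ts;"
    "} else {"
    "  ctx.op = 'noop';"
    "}"
)


def build_update_body(event: Dict[str, Any]) -> Dict[str, Any]:
    """Body for an Elasticsearch update request; the upsert covers the very first event."""
    return {
        "script": {
            "lang": "painless",
            "source": NEWER_WINS_SCRIPT,
            "params": {"event_ts": event["event_ts"], "fields": event["fields"]},
        },
        "upsert": {**event["fields"], "last_event_ts": event["event_ts"]},
    }


def apply_event(doc: Optional[Dict[str, Any]], event: Dict[str, Any]) -> Dict[str, Any]:
    """Local mirror of the script's rule, so the logic can be checked without a cluster."""
    if doc is None:  # first event: behaves like the upsert
        return {**event["fields"], "last_event_ts": event["event_ts"]}
    if doc.get("last_event_ts") is None or event["event_ts"] > doc["last_event_ts"]:
        return {**doc, **event["fields"], "last_event_ts": event["event_ts"]}
    return doc  # stale (out-of-order) event: no-op
```

The update API already reads, modifies, and writes the document under a version check, so passing `retry_on_conflict` lets concurrent updates retry instead of failing; the script then decides again whether the retried event is still the newest.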
ML Deep dive
© 2018 Adobe Systems Incorporated. All Rights Reserved.
Stock Powered by Elasticsearch
License images for use in projects, ads, websites, etc.
• Over 130M images
• Images indexed by tags, price, type, …
• Novel visual search applications surface more results and act as a differentiator
Do these better using deep learned representations (DLR)
Deep Learned Representation: Embedding
Project into a continuous space where similarity in a particular property maps to (Euclidean) distance:
• Dense, small vectors (1k dimensions)
• Similarity score corresponds to squared distance
• Dimensions usually don't have meaning
• Usually trained using (deep) neural networks
• Used to power the deep search engine
[Figure: image embedding space]
What nearby embeddings share, by embedding type:
• Image: similar tags ("semantic")
• Word (word2vec): similar word context
• Face: similar face attributes
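For reference, with a query embedding $q$ and a candidate embedding $x$ of dimension $d$ (about 1k here), the squared-distance score is written out below; smaller values mean more similar. The second identity assumes L2-normalized embeddings, which the slides do not state; under that assumption, ranking by squared distance is the same as ranking by cosine similarity.

$$
s(q, x) = \lVert q - x \rVert_2^2 = \sum_{i=1}^{d} (q_i - x_i)^2,
\qquad
\lVert q \rVert = \lVert x \rVert = 1 \;\Rightarrow\; s(q, x) = 2 - 2\,\langle q, x \rangle
$$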
Image Embedding
Trained using a convolutional neural network (CNN) on an auxiliary task (e.g., classification)
[Figure: image → CNN (layer 1 … layer 5) → embedding layer → tag indicators, e.g. DOG 1, SURF 0, SUNSET 1, SUN 0, WOMAN 1, PALM 0, HEAD 0, 50s 0, BEACH 1, STARFISH 0, SMILE 0, RETRIEVER 0, …; image tags: WOMAN, DOG, BEACH, …]
Training:
• Feed in an image; the network tries to predict the tag indicators, and the errors are backpropagated
Embedding layer = layer before last:
• The embedding is the output of the embedding layer
• Captures an abstract representation of the image's tags
• 1k-dimensional vector
• Learns filters that are activated when patterns arise at various layers
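The training signal above can be sketched as follows, assuming one sigmoid output per tag and a binary cross-entropy loss (a common choice for multi-label tagging; the slides do not name the loss). The embedding is simply the input to this last layer; all names and sizes here are illustrative, not the production model.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def tag_probabilities(embedding: List[float], weights: List[List[float]],
                      bias: List[float]) -> List[float]:
    """Last layer: one logistic unit per tag, fed by the embedding (the layer before last)."""
    return [
        sigmoid(sum(w * e for w, e in zip(row, embedding)) + b)
        for row, b in zip(weights, bias)
    ]


def multilabel_bce(probs: List[float], indicators: List[int], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between predicted tag probabilities and 0/1 tag indicators."""
    total = 0.0
    for p, y in zip(probs, indicators):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(indicators)
```

Backpropagating this loss through the CNN is what shapes the embedding; at search time the tag layer is discarded and only the embedding is kept.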
Deep Search Engine
Get embeddings for all images and index them.
[Figure: query → query embedding → deep search engine, which holds the embeddings of all images]
Query using the embedding corresponding to the query image. Relevant results are the images whose embeddings are closest to the query embedding in Euclidean distance; nearby embeddings correspond to "semantically" similar images.
Finding similar images:
• Calculate the similarity for all candidates
• Pick the top k
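The exhaustive "find similar" step can be sketched in a few lines, assuming embeddings are held in an in-memory dict keyed by a hypothetical image ID. This is the brute-force baseline: its cost grows linearly with the corpus, which is what the reverse index and PQ-codes address.

```python
import heapq
from typing import Dict, List, Tuple


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_similar(query: List[float], index: Dict[str, List[float]],
                 k: int) -> List[Tuple[float, str]]:
    """Exhaustive search: score every candidate, keep the k smallest distances."""
    return heapq.nsmallest(
        k, ((squared_distance(query, emb), image_id) for image_id, emb in index.items())
    )
```

`heapq.nsmallest` keeps only k items in memory, so picking the top k costs O(n log k) on top of the n distance calculations.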
Towards a Reverse Index for Embeddings
Calculating the distance to 100M+ embeddings per query is too expensive.
[Figure: query and embeddings grouped into buckets (Bucket1, Bucket2, Bucket3, …)]
So, bucketize (cluster) the embeddings, assigning each to one of 1k buckets.
Find the 20 buckets nearest to the query using the bucket centroids.
Only compare the query with embeddings in those 20 nearest buckets (2% of the corpus).
Reverse index:
• Bucket to image embeddings
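The bucketing step is an inverted-file index; a minimal sketch follows, assuming the bucket centroids have already been trained (for example with k-means) and using exact distances inside the probed buckets. With 1k buckets and `nprobe = 20`, only about 2% of the corpus is compared. All function names are illustrative.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_buckets(query: Vector, centroids: List[Vector], nprobe: int) -> List[int]:
    """IDs of the nprobe bucket centroids closest to the query."""
    return heapq.nsmallest(nprobe, range(len(centroids)),
                           key=lambda b: squared_distance(query, centroids[b]))


def build_reverse_index(embeddings: Dict[str, Vector],
                        centroids: List[Vector]) -> Dict[int, List[str]]:
    """Reverse index: bucket ID -> IDs of the images whose embedding falls in that bucket."""
    index: Dict[int, List[str]] = defaultdict(list)
    for image_id, emb in embeddings.items():
        index[nearest_buckets(emb, centroids, 1)[0]].append(image_id)
    return dict(index)


def search(query: Vector, embeddings: Dict[str, Vector], centroids: List[Vector],
           reverse_index: Dict[int, List[str]], k: int = 10,
           nprobe: int = 20) -> List[Tuple[float, str]]:
    """Compare the query only with embeddings stored in its nprobe nearest buckets."""
    candidates = (image_id
                  for b in nearest_buckets(query, centroids, nprobe)
                  for image_id in reverse_index.get(b, []))
    return heapq.nsmallest(k, ((squared_distance(query, embeddings[i]), i) for i in candidates))
```

The trade-off is recall: a true neighbor that landed in a bucket outside the 20 probed ones is missed, so `nprobe` trades accuracy against the number of distance calculations.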
PQ-Codes
Embeddings are still too big to keep in the index (4 KB × 100M+)
• Also, lots of floating-point distance calculations
[Figure: a 1k-float (4 KB) embedding split into subspaces s1, s2, …, s63, s64, each replaced by a one-byte code, giving a 64-byte PQ-code]
Compress each embedding using a trained encoder:
• Subdivide the embedding space into 64 subspaces
• Cluster each subspace into 256 clusters
• Encode every subspace-vector of the embedding with the ID of its nearest cluster (as a byte)
• The PQ-code is the concatenation of the subspace IDs
• Store the pqcode and bucketID in the index
During search:
• Because the codes quantize sub-vectors, we can precalculate values that depend on the quantized centroids and the bucket centroids
• The distance to a candidate is fast to compute because it uses lookup tables (LUTs) calculated once per query per bucket
For a query $q$ and a candidate in bucket $l$ with bucket centroid $c_l$ (the bucket terms imply the code represents the residual from $c_l$, reconstructed as $r = (r_1, \dots, r_{64})$ from the sub-centroids):
$\lVert q - c_l - r \rVert^2 = \lVert q \rVert^2 + \mathrm{tabQC} - 2\,\mathrm{ql} + \mathrm{tl} + 2\,\mathrm{tcl}$
• $\mathrm{tabQC} = \sum_j \left(\lVert r_j \rVert^2 - 2\langle q_j, r_j \rangle\right)$: query & quantized centroids, once per query
• $\mathrm{ql} = \langle q, c_l \rangle$: query & bucket, once per query per bucket
• $\mathrm{tl} = \lVert c_l \rVert^2$: bucket, precalculated
• $\mathrm{tcl} = \langle c_l, r \rangle = \sum_j \langle c_{l,j}, r_j \rangle$: bucket & sub-centroids, precalculated
$\lVert q \rVert^2$ is the same for every candidate, so it can be dropped when ranking.
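The decomposition can be checked numerically with a minimal sketch. It assumes, as the bucket terms imply, that the PQ-code encodes the residual between an embedding and its bucket centroid; the codebooks are random rather than trained (training, e.g. k-means per subspace, is omitted), and sizes are toy values instead of 1k dimensions, 64 subspaces, and 256 clusters. All names are illustrative.

```python
import random
from typing import List

Vector = List[float]
Codebooks = List[List[Vector]]  # codebooks[j][k]: sub-centroid k of subspace j


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sqdist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v: Vector, m: int) -> List[Vector]:
    """Split a vector into m equal-length sub-vectors, one per subspace."""
    step = len(v) // m
    return [v[j * step:(j + 1) * step] for j in range(m)]


def encode(x: Vector, bucket: Vector, codebooks: Codebooks) -> bytes:
    """PQ-code of the residual x - bucket: per subspace, the byte ID of the nearest sub-centroid."""
    residual = [a - b for a, b in zip(x, bucket)]
    ids = []
    for sub, book in zip(split(residual, len(codebooks)), codebooks):
        ids.append(min(range(len(book)), key=lambda k: sqdist(sub, book[k])))
    return bytes(ids)  # requires at most 256 sub-centroids per subspace


def decode(code: bytes, bucket: Vector, codebooks: Codebooks) -> Vector:
    """Approximate embedding: bucket centroid plus the concatenated sub-centroids."""
    residual = [v for j, k in enumerate(code) for v in codebooks[j][k]]
    return [c + r for c, r in zip(bucket, residual)]


def query_lut(q: Vector, codebooks: Codebooks) -> List[List[float]]:
    """tabQC table, once per query: ||r_jk||^2 - 2<q_j, r_jk> for every sub-centroid."""
    return [[dot(r, r) - 2 * dot(qj, r) for r in book]
            for qj, book in zip(split(q, len(codebooks)), codebooks)]


def bucket_lut(bucket: Vector, codebooks: Codebooks) -> List[List[float]]:
    """tcl table, precalculated per bucket: <c_j, r_jk> for every sub-centroid."""
    return [[dot(cj, r) for r in book]
            for cj, book in zip(split(bucket, len(codebooks)), codebooks)]


def adc_distance(code: bytes, q_norm2: float, tab_qc: List[List[float]], ql: float,
                 tl: float, tab_cl: List[List[float]]) -> float:
    """||q||^2 + tabQC - 2 ql + tl + 2 tcl, using only table lookups and additions per code."""
    tabqc = sum(tab_qc[j][k] for j, k in enumerate(code))
    tcl = sum(tab_cl[j][k] for j, k in enumerate(code))
    return q_norm2 + tabqc - 2 * ql + tl + 2 * tcl


if __name__ == "__main__":
    rng = random.Random(0)
    dim, m, k = 8, 4, 16  # toy sizes

    def gauss(n: int) -> Vector:
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    codebooks = [[gauss(dim // m) for _ in range(k)] for _ in range(m)]  # untrained, random
    bucket, x, q = gauss(dim), gauss(dim), gauss(dim)
    code = encode(x, bucket, codebooks)
    fast = adc_distance(code, dot(q, q), query_lut(q, codebooks), dot(q, bucket),
                        dot(bucket, bucket), bucket_lut(bucket, codebooks))
    direct = sqdist(q, decode(code, bucket, codebooks))
    print("code bytes:", len(code), "matches direct distance:", abs(fast - direct) < 1e-9)
```

The per-candidate work is 2 × 64 table lookups plus a few additions, instead of 1k floating-point subtractions and multiplications.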
Elasticsearch Pqcode Plugin
Deep search has to work in conjunction with the other asset data in ES, so we implement it as a plugin and store the pqcodes in ES:
• The plugin implements the comparison between the query embedding and the candidates' pqcodes
• The CAS analyzer outputs the query embedding
• The reverse index is used to limit the search to the nearest 20 buckets
• Scores are calculated for all candidates in those buckets
[Diagram: query image → CAS model/encoder → embedding + nearest buckets → Elasticsearch with the ES pqcode plugin; images → encoder → pqcodes → index]
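One way to express such a request is sketched below. The candidate restriction uses standard query DSL (a `bool` filter with a `terms` query inside `function_score`); the scoring part belongs to the custom plugin, whose request syntax the slides do not show, so the script `lang`/`source` values and the field names `bucket_id` and `pqcode` are hypothetical placeholders.

```python
import json
from typing import Any, Dict, List


def build_deep_search_request(query_embedding: List[float], bucket_ids: List[int],
                              k: int = 100) -> Dict[str, Any]:
    """Hypothetical request body: restrict candidates to the nearest buckets, let a plugin score them."""
    return {
        "size": k,
        "query": {
            "function_score": {
                # Reverse-index step: only documents whose bucket is among the nearest ones.
                "query": {"bool": {"filter": [{"terms": {"bucket_id": bucket_ids}}]}},
                "functions": [{
                    "script_score": {
                        "script": {
                            "lang": "pqcode",            # hypothetical custom script engine
                            "source": "adc_similarity",  # hypothetical script name
                            "params": {"embedding": query_embedding, "field": "pqcode"},
                        }
                    }
                }],
                # Use the plugin's score as-is. ES sorts by descending score, so the plugin
                # would have to map small distances to large scores.
                "boost_mode": "replace",
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_deep_search_request([0.1, 0.2], [3, 17], k=10), indent=2))
```

Keeping the bucket restriction as a plain filter lets it combine with other asset filters (price, type, ACLs) in the same `bool` query, which is the reason given for living inside ES.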
Demo – Find Similar Controls
Exploration/Refinement
• The majority of queries have 1 or 2 words
[Chart: number of searches (M, 0 to 60) by number of words per query (1 to 5)]
• 1-word queries
  • banana, christmas, family, beach, food, flowers, car, …
  • Very general
  • Exploration
• 2-word queries
  • happy family, wood background, doctor patient, business man, …
  • Still too general
  • Refinement by query rewrite
• Both modes can be helped by clustering
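Clustering PQ Codes
Use the pqcodes of the top 5k results to cluster them with k-means:
• Iteratively assign pqcodes to the cluster of the closest centroid and recalculate the centroids until convergence
• Centroids act like queries, and assignments like find-similar
• The decision to assign to a cluster uses only lookups, additions, and subtractions:
$s(y_1, x_i) - s(y_2, x_i) = \left(\lVert y_1 \rVert^2 + \mathrm{tabQC}_1 - 2\,\mathrm{ql}_1\right) - \left(\lVert y_2 \rVert^2 + \mathrm{tabQC}_2 - 2\,\mathrm{ql}_2\right)$
Here $s(y_k, x_i)$ is the distance from cluster centroid $y_k$ to candidate $x_i$, with $y_k$ in the role of the query: $\mathrm{tabQC}_k$ is computed once per cluster centroid and $\mathrm{ql}_k$ once per cluster centroid per bucket. The candidate-only terms tl and tcl cancel; $\lVert y_k \rVert^2$ does not, because it differs between centroids.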
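A minimal sketch of this k-means, under assumptions the slides imply but do not spell out: cluster centroids are full-precision vectors treated as queries, each pqcode encodes the residual from its bucket centroid, and centroids are recalculated as the mean of the decoded items. Initial centroids are passed in by the caller; function names are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Codebooks = List[List[Vector]]  # codebooks[j][k]: sub-centroid k of subspace j


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def split(v: Vector, m: int) -> List[Vector]:
    step = len(v) // m
    return [v[j * step:(j + 1) * step] for j in range(m)]


def decode(code: bytes, bucket: Vector, codebooks: Codebooks) -> Vector:
    residual = [v for j, k in enumerate(code) for v in codebooks[j][k]]
    return [c + r for c, r in zip(bucket, residual)]


def centroid_terms(y: Vector, codebooks: Codebooks, buckets: List[Vector]):
    """Terms for cluster centroid y acting as a query: ||y||^2, tabQC LUT, ql for every bucket."""
    tab_qc = [[dot(r, r) - 2 * dot(yj, r) for r in book]
              for yj, book in zip(split(y, len(codebooks)), codebooks)]
    return dot(y, y), tab_qc, [dot(y, c) for c in buckets]


def assign(code: bytes, bucket_id: int, terms) -> int:
    """Closest centroid. tl and tcl depend only on the item, so they are identical
    for every centroid and are left out of the comparison."""
    def score(t) -> float:
        y_norm2, tab_qc, ql = t
        return y_norm2 + sum(tab_qc[j][k] for j, k in enumerate(code)) - 2 * ql[bucket_id]
    return min(range(len(terms)), key=lambda i: score(terms[i]))


def kmeans_pqcodes(items: List[Tuple[bytes, int]], buckets: List[Vector], codebooks: Codebooks,
                   init_centroids: List[Vector], max_iter: int = 20):
    """k-means over (pqcode, bucket_id) items; returns (centroids, labels)."""
    centroids = [list(c) for c in init_centroids]
    # Decoded vectors are used only to recompute centroid means.
    points = [decode(code, buckets[b], codebooks) for code, b in items]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        terms = [centroid_terms(y, codebooks, buckets) for y in centroids]
        new_labels = [assign(code, b, terms) for code, b in items]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Because the tables are rebuilt once per centroid per iteration, each of the 5k assignments costs only lookups and additions, which keeps interactive clustering of a result page cheap.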
Demo – Clustering Stock Images