Search Queries Anomaly Detection Using Python
Aman Kharwal
Search Queries Anomaly Detection means identifying queries that are outliers according to their performance metrics. Spotting these outliers is valuable for businesses because they can signal potential issues or opportunities, such as unexpectedly high or low CTRs (click-through rates). If you want to learn how to detect anomalies in search queries, this article is for you. In this article, I'll take you through the task of Search Queries Anomaly Detection with Machine Learning using Python. Below is the process we can follow for this task:
1. Gather historical search query data from the source, such as a search engine or a
website’s search functionality.
2. Conduct an initial analysis to understand the distribution of search queries, their
frequency, and any noticeable patterns or trends.
3. Create relevant features or attributes from the search query data that can aid in anomaly
detection.
4. Choose an appropriate anomaly detection algorithm. Common methods include statistical
approaches like Z-score analysis and machine learning algorithms like Isolation Forests
or One-Class SVM.
5. Train the selected model on the prepared data.
6. Apply the trained model to the search query data to identify anomalies or outliers.
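Before training a full model, the Z-score approach mentioned in step 4 can serve as a quick statistical baseline. Here is a minimal sketch on made-up CTR values (the query names and numbers are illustrative, not from the dataset used below): a query is flagged when its CTR sits more than two standard deviations from the mean.

```python
import pandas as pd

# Toy data standing in for real search-query metrics (values are illustrative)
df = pd.DataFrame({
    "Top queries": [f"q{i}" for i in range(1, 11)],
    "CTR": [0.03, 0.04, 0.05, 0.04, 0.03, 0.05, 0.04, 0.03, 0.04, 0.45],
})

# Z-score: how many standard deviations each query's CTR sits from the mean
df["ctr_z"] = (df["CTR"] - df["CTR"].mean()) / df["CTR"].std()

# Flag queries more than 2 standard deviations from the mean
outliers = df[df["ctr_z"].abs() > 2]
print(outliers["Top queries"].tolist())  # ['q10']
```

Z-scores are simple and interpretable, but they assume roughly normal data and a single feature at a time, which is why the article moves to Isolation Forest later on.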
So, the process starts with collecting a dataset of search queries. I found an ideal dataset for this task. You can download the dataset from here. Now, let's import the necessary Python libraries and load the dataset:
```python
# Importing the necessary Python libraries
import re
from collections import Counter

import pandas as pd
import plotly.express as px

queries_df = pd.read_csv("Queries.csv")
print(queries_df.head())
```
```
                                 Top queries  Clicks  Impressions     CTR  Position
0                number guessing game python    5223        14578  35.83%      1.61
1                        thecleverprogrammer    2809         3456  81.28%      1.02
2           python projects with source code    2077        73380   2.83%      5.94
3  classification report in machine learning    2012         4959  40.57%      1.28
4                      the clever programmer    1931         2528  76.38%      1.09
```
Now, let's have a look at the column information of the data:

```python
print(queries_df.info())
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   Top queries  1000 non-null   object
 1   Clicks       1000 non-null   int64
 2   Impressions  1000 non-null   int64
 3   CTR          1000 non-null   object
 4   Position     1000 non-null   float64
dtypes: float64(1), int64(2), object(2)
memory usage: 39.2+ KB
None
```
Now, let’s convert the CTR column from a percentage string to a float:
```python
# Cleaning CTR column
queries_df['CTR'] = queries_df['CTR'].str.rstrip('%').astype('float') / 100
```
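As a quick sanity check on this conversion, here is what the same `str.rstrip`/`astype` chain does to a few made-up CTR strings in the dataset's "NN.NN%" format:

```python
import pandas as pd

# Illustrative CTR strings in the same "NN.NN%" format as the dataset
ctr = pd.Series(["35.83%", "81.28%", "2.83%"])

# Same transformation as above: drop the '%' and rescale to the 0-1 range
ctr_clean = ctr.str.rstrip("%").astype("float") / 100
print([round(v, 4) for v in ctr_clean])  # [0.3583, 0.8128, 0.0283]
```

With the CTR column as a float between 0 and 1, it can participate in correlations and model features alongside the other numeric columns.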
Now, let's look at the most common words in the search queries:

```python
# Function to clean and split the queries into words
def clean_and_split(query):
    words = re.findall(r'\b[a-zA-Z]+\b', query.lower())
    return words

# Split each query into words and count the frequency of each word
word_counts = Counter()
for query in queries_df['Top queries']:
    word_counts.update(clean_and_split(query))

word_freq_df = pd.DataFrame(word_counts.most_common(20),
                            columns=['Word', 'Frequency'])

# Plotting the word frequencies
fig = px.bar(word_freq_df, x='Word', y='Frequency',
             title='Top 20 Most Common Words in Search Queries')
fig.show()
```
Now, let’s have a look at the top queries by clicks and impressions:
```python
# Top queries by Clicks and Impressions
top_queries_clicks_vis = queries_df.nlargest(10, 'Clicks')[['Top queries', 'Clicks']]
top_queries_impressions_vis = queries_df.nlargest(10, 'Impressions')[['Top queries', 'Impressions']]

# Plotting
fig_clicks = px.bar(top_queries_clicks_vis, x='Top queries', y='Clicks',
                    title='Top Queries by Clicks')
fig_impressions = px.bar(top_queries_impressions_vis, x='Top queries',
                         y='Impressions', title='Top Queries by Impressions')
fig_clicks.show()
fig_impressions.show()
```
Now, let’s analyze the queries with the highest and lowest CTRs:
```python
# Queries with highest and lowest CTR
top_ctr_vis = queries_df.nlargest(10, 'CTR')[['Top queries', 'CTR']]
bottom_ctr_vis = queries_df.nsmallest(10, 'CTR')[['Top queries', 'CTR']]

# Plotting
fig_top_ctr = px.bar(top_ctr_vis, x='Top queries', y='CTR',
                     title='Top Queries by CTR')
fig_bottom_ctr = px.bar(bottom_ctr_vis, x='Top queries', y='CTR',
                        title='Bottom Queries by CTR')
fig_top_ctr.show()
fig_bottom_ctr.show()
```
Now, let’s have a look at the correlation between different metrics:
```python
# Correlation matrix visualization
correlation_matrix = queries_df[['Clicks', 'Impressions', 'CTR', 'Position']].corr()
fig_corr = px.imshow(correlation_matrix, text_auto=True, title='Correlation Matrix')
fig_corr.show()
```
Here's what the correlation matrix tells us (keep in mind that in Search Console, a lower Position value means a better ranking):

1. Clicks and Impressions are positively correlated: queries with more Impressions tend to receive more Clicks.
2. Clicks and CTR have a weak positive correlation: queries that earn more Clicks also tend to have a slightly higher Click-Through Rate.
3. Clicks and Position are weakly negatively correlated: queries ranking closer to the top (lower Position values) tend to receive more Clicks.
4. Impressions and CTR are negatively correlated: queries with many Impressions tend to have a lower Click-Through Rate.
5. Impressions and Position are positively correlated: queries with larger Position values (lower rankings) tend to accumulate more Impressions.
6. CTR and Position have a strong negative correlation: the further a query ranks from the top, the lower its Click-Through Rate.
Now, let’s detect anomalies in search queries. You can use various techniques for anomaly
detection. A simple and effective method is the Isolation Forest algorithm, which works well
with different data distributions and is efficient with large datasets:
```python
from sklearn.ensemble import IsolationForest

# Selecting relevant features
features = queries_df[['Clicks', 'Impressions', 'CTR', 'Position']]

# Initializing Isolation Forest
# contamination is the expected proportion of outliers in the data
iso_forest = IsolationForest(n_estimators=100, contamination=0.01)

# Fitting the model
iso_forest.fit(features)

# Predicting anomalies (-1 = anomaly, 1 = normal)
queries_df['anomaly'] = iso_forest.predict(features)

# Filtering out the anomalies
anomalies = queries_df[queries_df['anomaly'] == -1]
```
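Beyond the binary labels from `predict`, Isolation Forest also exposes a continuous anomaly score through `decision_function` (the lower the score, the more anomalous the point), which is handy for ranking flagged queries rather than just labelling them. Here is a self-contained sketch on synthetic data; with the real data, `X` would simply be the `features` DataFrame from above:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic stand-in for (Clicks, Impressions) pairs: 200 normal rows...
X = rng.normal(loc=[100, 2000], scale=[10, 100], size=(200, 2))
# ...plus one extreme row playing the role of an anomalous query
X = np.vstack([X, [5000, 70000]])

iso = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
iso.fit(X)

# decision_function gives a continuous score: the lower, the more anomalous,
# so sorting by it ranks queries from most to least suspicious
scores = iso.decision_function(X)
print(int(np.argmin(scores)))  # 200 -- the injected outlier scores as most anomalous
```

Sorting `anomalies` by this score would surface the most extreme queries first, which is useful when you can only review a handful manually.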
Here’s how to analyze the detected anomalies to understand their nature and whether they
represent true outliers or data errors:
```python
print(anomalies[['Top queries', 'Clicks', 'Impressions', 'CTR', 'Position']])
```

```
                          Top queries  Clicks  Impressions     CTR  Position
0         number guessing game python    5223        14578  0.3583      1.61
1                 thecleverprogrammer    2809         3456  0.8128      1.02
2    python projects with source code    2077        73380  0.0283      5.94
4               the clever programmer    1931         2528  0.7638      1.09
15         rock paper scissors python    1111        35824  0.0310      7.19
21              classification report     933        39896  0.0234      7.53
34           machine learning roadmap     708        42715  0.0166      8.97
82                           r2 score     367        56322  0.0065      9.33
167               text to handwriting     222        11283  0.0197     28.52
929                     python turtle      52        18228  0.0029     18.75
```
The anomalies in our search query data are not just outliers: they point to potential areas for growth, optimization, and strategic focus. Some of them may reflect emerging trends or areas of growing interest, and staying responsive to these trends will help in maintaining and growing the website's relevance and user engagement.
Summary
So, Search Queries Anomaly Detection means identifying queries that are outliers according to
their performance metrics. It is valuable for businesses to spot potential issues or opportunities,
such as unexpectedly high or low CTRs. I hope you liked this article on Search Queries Anomaly
Detection with Machine Learning using Python. Feel free to ask valuable questions in the
comments section below.
https://thecleverprogrammer.com/2023/11/20/search-queries-anomaly-detection-using-python/