Improving Click Through Rates
All content following this page was uploaded by Subid Dahal on 12 May 2024.
Abstract - This report presents a study on predictive analytics for predicting the click-through rates (CTR) of product promotion advertisements posted on various platforms. In the age of digital marketing, predicting advertisement click-through rates is critical for improving their performance. Marketing initiatives to promote a product often underperform, and predicting their click-through rates can help improve their performance. The purpose of this project is to create a predictive analytics model that uses big data tools and technology to predict the click-through rates of ads running on various platforms for product promotion. The proposed methodology will allow businesses to efficiently analyze large datasets and generate insights that will help improve the performance of their marketing initiatives. The project's predictive analytics model could help businesses make data-driven decisions and optimize their marketing campaigns, thus improving the digital advertising field.

Keywords - CTR, Advertisement, product promotion, predictive analytics

I. BACKGROUND STUDY
The return on investment for successful online advertising is currently very high. In 2016, the digital market of Europe was predicted to be worth 48.3 billion euros. As the internet has grown and new digital rivals have joined the market, the sector has exploded in size during the past two decades. Internet data about users' behaviors is being captured in massive amounts across numerous platforms while the number of online activities continues to grow exponentially. Mining this data can aid the platform in its advertising and management. As an information consumer, it can be challenging to find information of interest within an enormous quantity of information. Also, as a creator, it is not simple to ensure that the information one creates stands out and gathers the interest of users in today's competitive market. Advertising is now fundamental to running a successful business in the Internet age. It has become more complicated and data-driven, which requires deeper familiarity with consumer habits and preferences. With the use of predictive analytics, organizations can better anticipate customer behavior and make changes to their advertising strategies to boost CTR.
Advertising has expanded rapidly in recent years, and new entrepreneurs are constantly flooding in. At the same time, existing advertisers constantly release new ad campaigns. Many advertisers launch brand-new campaigns on a monthly or even daily basis. However, the previous campaign should be analyzed before launching a new one so that the advertisements are more effective. The success of an advertising campaign is measured by analyzing CTR, so a module that accurately analyzes and predicts those rates is important. Accordingly, various research has been done to accurately predict the CTR. Additionally, the rise of Big Data tools has provided a new method to examine user behavior.

II. PROBLEM STATEMENT
Numerous businesses spend a lot of money on advertising their goods and services; however, the return they receive on that investment is not as expected. In particular, the click-through rates of ads running on different platforms are low. Because of this, it is hard for businesses to figure out how to improve these rates and reach their marketing goals. Without reliable CTR analytics data, companies cannot make decisions based on previous marketing campaigns.

III. AIMS
The purpose of this predictive analytics project is to create a model that can accurately anticipate the click-through rates (CTR) of advertisements running on several platforms to promote a product.

IV. OBJECTIVES
This project's objective is to help marketing teams improve the success of their ad campaigns as well as boost engagement by providing more precise predictions of the CTR of advertisements displayed across different channels.
a. Cleaning and preprocessing the dataset to remove irrelevant data and ensure consistency.
b. Performing exploratory data analysis to identify the key factors that affect ad CTR.
c. Creating and training a machine learning model to predict ad CTR.
d. Evaluating the developed model's accuracy.
e. Identifying the best-performing ads and recommending strategies for improving ad performance.

V. CONTRIBUTION OF THE WORK CONNECTED WITH METHODOLOGY
Data preprocessing: In order to guarantee the data obtained for this research was in the proper format for
analysis, substantial cleaning as well as preprocessing was necessary. This involved operations including removing duplicates, identifying missing data, and classifying insightful data.
Feature engineering: The data had to be transformed into a format that could be analyzed. This included tasks such as creating new features and feature selection.
Machine learning model selection: Various machine learning algorithms were evaluated for their accuracy in predicting ad click-through rates.
Model training and evaluation: The dataset is split into training and test sets to evaluate the model's performance using accuracy, precision, and other metrics.
Optimization: Several methods were employed to improve the model's performance.
The final results were analyzed to find trends as well as patterns that might be used to make marketing efforts more effective.

VI. ORGANIZATION OF REPORT
The related-work section of the report covers previously published work on predicting click-through rates (CTR) in digital advertising. After that, several related papers are reviewed in the literature review section. Then, the methodology section describes the techniques employed in the analysis. Likewise, the results and discussion section presents the model performance and an analysis of the findings. Finally, the conclusion section presents a summary of the findings and the future scope.

VII. RELATED WORK
Several works on predicting click-through rates (CTR) in digital advertising have already been published. In one study, researchers used weekly ad data to evaluate the correlation between advertisement placement and viewer response. Their Logistic Regression-based CTR calculation is 90% accurate, leading them to the conclusion that ad placement and depth affect click-through rates. Another study presented a model that could forecast CTR for new advertisements based on ad attributes, term features, and advertisers. The researcher built the method on a Logistic Regression algorithm. The study emphasizes the importance of accurate predictions and how they boost the effectiveness of the advertising system by increasing both revenue and user satisfaction. (U. Singh, 2022)
Moreover, the classification problem has been approached from several different angles. For instance, a hybrid model combining decision trees with logistic regression was presented in one study. When compared to the performance of either model alone, the combined approach was found to be the more effective one. The authors saw a 3% gain in prediction accuracy using boosted decision trees with real-valued input features, leading them to infer that this method is superior to traditional linear classifiers. Research into novel methods has also led to the proposal of an alternative model, the Deep Interest Network (DIN), which outperforms more conventional models.
Despite these advancements, there are still challenges in accurately predicting CTR due to factors such as ad fatigue, seasonality, and changes in user behavior. This research aims to contribute to this field by analyzing large-scale datasets with a combination of MapReduce and Spark and investigating the impact of different feature engineering techniques on CTR prediction performance. (S. Patil, 2022)

VIII. LITERATURE REVIEW
A. Review I
Mrs. Solanke's 2022 research paper focuses on using machine learning to predict an advertisement's CTR. It compares the performance of four algorithms (Decision Trees, XGB, Random Forest, and LGBM) based on accuracy, AUC score, precision score, and F1 score. The study used the Avazu CTR Dataset, a collection of 11 days' worth of statistics on how many times certain web ads were clicked. Click-through data from the first 10 days is used for training, whereas data from the final day is used for testing.
LGBM was found to have the highest accuracy (0.8795) of the compared models, and it is the best algorithm on all four measures. LGBM has the highest AUC score in classifying whether or not an ad will be clicked, at 0.95, whereas Random Forest falls short at 0.73. Compared to the other algorithms, LGBM also has superior precision, F1 score, and accuracy, all around 0.87. Due to its low AUC, precision, and F1 scores, the decision tree classifier remains the least preferred method for CTR prediction.
Overall, the research lends credence to the idea that feature interaction and more user-related data are crucial to enhancing model performance. (M. R. Jaisinghani, 2022)

B. Review II
Mr. Bakhtyari and his colleagues put together a study examining the usage of deep factorization machines alongside boosting methods (the XDBoost algorithm). For prediction inference with limited resources, it employs a boosting method. It demonstrates that a plain boosting model can beat the XDBoost approach with proper feature creation and fine-tuned parameters. It employs exploratory data analysis to identify the most prominent features of the dataset and remove any irrelevant details. The authors then use a grid search strategy to determine the optimal values of the model's hyperparameters.
Three distinct datasets were created from the original dataset, containing 1%, 5%, and 10% of the total data, respectively. For all three datasets, the proposed XGBoost method achieves better results than the XDBoost models in terms of both AUC and Log Loss. This shows that XGBoost beats the XDBoost method on both measures in a small sample set with appropriate feature engineering and hyperparameter optimization. (Mirzaei, 2021)
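The evaluation measures that recur in the studies discussed in this report (accuracy, precision, F1 score, AUC, and Log Loss) can be illustrated with a small, self-contained sketch. This is not code from any of the cited papers: the labels and predicted click probabilities are made-up values, the function names are hypothetical, and the 0.5 decision threshold is an assumption.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means 'clicked'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(y_true, y_pred):
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if (p + r) else 0.0

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for t, q in zip(y_true, y_prob):
        q = min(max(q, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(q) + (1 - t) * math.log(1 - q))
    return total / len(y_true)

def roc_auc(y_true, y_prob):
    """Fraction of (clicked, not clicked) pairs ranked correctly; ties count 0.5."""
    pos = [q for t, q in zip(y_true, y_prob) if t == 1]
    neg = [q for t, q in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: true click labels and predicted click probabilities.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8]
y_pred = [1 if q >= 0.5 else 0 for q in y_prob]  # assumed threshold of 0.5
```

Accuracy, precision, and F1 depend on the chosen threshold, whereas AUC and Log Loss are computed directly from the predicted probabilities, which is one reason CTR studies often report the latter two.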
C. Review III
Using an embedding model along with deep learning, Mr. Jiang and his team created an approach to predict CTR that can efficiently analyze multi-field categorical data, determine complex nonlinear correlations, and capture the essential characteristics contributing to CTR prediction for advertisements. Experiments conducted on the original dataset confirm that the suggested method outperforms commonly used baseline models.
This research offers an embedding-based deep learning model called FMSDAELR to predict click-through rates (CTRs) for ad datasets with large amounts of multi-field categorical data. In the suggested technique, multi-field categorical data is first modeled using an embedding model and then a factorization machine (FM) to obtain a low-dimensional dense vector. Next, a stacked sparse autoencoder is used to automatically discover the critical feature dimensions. After that, it takes the raw data from the advertisement logs and determines the high-order cross features and complicated nonlinear correlations. Finally, logistic regression (LR) is used to estimate how often people will interact with adverts. Using a real-world dataset, the authors show that the proposed FMSDAELR model can achieve better CTR prediction performance than other baseline models by accurately capturing the features of the most significant dimensions as well as the underlying patterns in the massive multi-field categorical data found in advertisement logs. (Z. Jiang, 2018)

D. Review IV
The study by Mr. Tianyuan and his colleague introduces the idea of a "density matrix" from quantum probability theory. The researchers propose a new way to represent input cases that is based on the density matrix and can hold information about global second-order feature interactions. The approach combines the strengths of the density matrix and the Convolutional Neural Network into the Density Matrix Based Convolutional Neural Network (DMCNN), which can capture more feature interactions than other models. Experiments with the Criteo and Avazu datasets show that the model works well and achieves strong results on several metrics.
The proposed model achieves the highest prediction performance compared to all other baselines when tested on the Criteo and Avazu datasets. The performance of the LR model is among the lowest; however, it is the most quickly and easily trained model. FM performs better than LR, demonstrating the importance and validity of extracting feature interactions. Moreover, capturing high-order feature interactions is crucial, as shown by the fact that the other four deep models perform better than LR and FM. Models that can learn both high-order and low-order features, such as DeepFM and DMCNN, perform far better than models that learn solely high-order features, such as FNN and CCPM. So, the model combines the strengths of the density matrix and CNNs and solves the problem that the order of field embeddings affects the success of standard CNN models when predicting CTR. So far, this is the first time CTR prediction has attempted to use quantum probability theory. The authors evaluated the model on two well-known open datasets. The results of the experiment show that the model works effectively and is stable. (Hou, 2020)

E. Findings from literature review
In summary, this literature highlights the importance of advanced machine learning techniques such as feature engineering. It also highlights the importance of user-related data in predicting CTR for advertisements. Models such as LGBM, XGBoost, FMSDAELR, and DMCNN outperform traditional methods. Moreover, it shows the significance of capturing feature interactions and complex correlations. Integrating these methods can therefore enhance CTR prediction performance and improve advertising strategies.
The reviewed studies use different algorithms on different datasets. In our case, we will choose the algorithms after identifying the appropriate input variables; since we analyze CTR, our target variable is ctr.

IX. METHODOLOGY
A. Data collection
A Kaggle dataset is used for analyzing and predicting the click-through rates (CTR) of product promotion advertisements posted on various platforms.

B. Data Preprocessing
First, the dataset is stored in a MongoDB database and the Jupyter Notebook is connected to the database. Then, after importing PySpark into the notebook, a PySpark schema is created for the dataset.
After that, the data is reviewed to understand its contents, and the data cleaning process begins. Initially, the data has 35 columns and 72612 rows. The null values of every column are then checked. Seven columns have a high number of null values and are dropped, because they are less important and carry less meaning for analyzing CTR. One column (approved_budget) has the fewest null values and is filled with its mean value because it can be meaningful in analyzing CTR. After dealing with the null values, the data is checked for duplicates; the count is 0, which means that the dataset has no duplicate rows.
So, after dealing with null values and checking for duplicates, the data shape changes from 72612 rows and 35 columns to 72612 rows and 28 columns.
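The cleaning steps described in this section (checking nulls per column, dropping mostly empty columns, filling approved_budget with its mean, and counting duplicates) can be sketched as follows. The project itself used PySpark; this sketch uses plain Python on a list of dictionaries so that it runs without any extra dependencies. The 0.5 null-fraction threshold, the toy rows, and every column name other than approved_budget are assumptions made for illustration, since the report does not state them.

```python
from collections import Counter

def null_fraction(rows, column):
    """Share of rows in which `column` is missing (None)."""
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def drop_sparse_columns(rows, max_null_fraction=0.5):
    """Drop columns whose null share exceeds the (assumed) threshold."""
    columns = list(rows[0].keys())
    keep = [c for c in columns if null_fraction(rows, c) <= max_null_fraction]
    return [{c: r[c] for c in keep} for r in rows]

def fill_with_mean(rows, column):
    """Replace missing values in a numeric column with the column mean."""
    present = [r[column] for r in rows if r[column] is not None]
    mean_value = sum(present) / len(present)
    return [{**r, column: mean_value if r[column] is None else r[column]}
            for r in rows]

def count_duplicates(rows):
    """Number of rows that repeat an earlier, identical row."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return sum(n - 1 for n in counts.values())

# Toy rows; only `approved_budget` is a column name taken from the report.
rows = [
    {"campaign_id": 1, "approved_budget": 100.0, "notes": None, "ctr": 2.5},
    {"campaign_id": 2, "approved_budget": None, "notes": None, "ctr": 1.0},
    {"campaign_id": 3, "approved_budget": 300.0, "notes": None, "ctr": 4.0},
    {"campaign_id": 4, "approved_budget": 200.0, "notes": "x", "ctr": 0.5},
]

cleaned = fill_with_mean(drop_sparse_columns(rows), "approved_budget")
duplicates = count_duplicates(cleaned)
```

In this toy data the notes column is mostly null and is dropped, while the single missing approved_budget value is replaced by the mean of the values that are present, mirroring the two treatments of null values described above.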
C. Data Analysis and Visualization
After completing data preprocessing, the initial data is visualized in a Jupyter Notebook and Kibana.
The above figure shows the mean CTR over time. The mean CTR peaked in September at 8. This peak may be due to upcoming Indian festivals: most of the advertisements relate to jewelry, and Indians wear cultural dresses with jewelry during festivals. The next-highest values occur in 2022-05 and 2022-06, each with a mean CTR of 3.
Advertisers may therefore target festivals or other specific occasions to advertise their products and achieve a high CTR.
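A mean-CTR-over-time series of the kind discussed above can be computed with a short aggregation. The sketch below is illustrative only: it assumes CTR is expressed as a percentage of impressions that led to a click, and the field names (date, clicks, impressions) and all values are hypothetical rather than taken from the project's dataset.

```python
from collections import defaultdict
from statistics import mean

def ctr_percent(clicks, impressions):
    """CTR as a percentage: clicks per 100 impressions (assumed definition)."""
    return 100.0 * clicks / impressions if impressions else 0.0

def monthly_mean_ctr(records):
    """Group records by 'YYYY-MM' and average the per-record CTR."""
    by_month = defaultdict(list)
    for r in records:
        month = r["date"][:7]  # 'YYYY-MM-DD' -> 'YYYY-MM'
        by_month[month].append(ctr_percent(r["clicks"], r["impressions"]))
    return {month: mean(values) for month, values in sorted(by_month.items())}

# Hypothetical ad records, not values from the report.
records = [
    {"date": "2022-05-03", "clicks": 30, "impressions": 1000},
    {"date": "2022-05-20", "clicks": 10, "impressions": 1000},
    {"date": "2022-09-10", "clicks": 80, "impressions": 1000},
    {"date": "2022-09-25", "clicks": 40, "impressions": 1000},
]

monthly = monthly_mean_ctr(records)
peak_month = max(monthly, key=monthly.get)  # month with the highest mean CTR
```

Averaging per-record CTR treats every ad equally regardless of how many impressions it received; dividing total clicks by total impressions per month is an alternative that weights by volume.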