
3 Idiots' Approach for the Display Advertising Challenge
Yu-Chin Juan, Yong Zhuang, and Wei-Sheng Chin
NTU CSIE MLGroup

What Does This Competition Challenge Us to Do?

Predict the click probabilities of impressions.


Dataset

Label   I1   I2   ...   I13    C1         C2         ...   C26
1       3    20   ...   2741   68fd1e64   80e26c9b   ...   4cf72387
0       7    91   ...   1157   3516f6e6   cfc86806   ...   796a1a2e
0       12   73   ...   1844   05db9164   38a947a1   ...   5d93f8ab

#Train: 45M
#Test: 6M
#Features after one-hot encoding: 33M

Evaluation

$$\mathrm{logloss} = -\frac{1}{L}\sum_{i=1}^{L}\big[\,y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\,\big],$$

where $L$ is the number of instances, $y_i$ is the true label (0 or 1), and $\hat{y}_i$ is the predicted probability.
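For reference, a minimal Python sketch of this metric (the clipping constant eps is an assumption to keep the logarithms finite; the competition's exact implementation may differ):

```python
import numpy as np

def logloss(y_true, y_pred, eps=1e-15):
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(logloss([1, 0, 0], [0.9, 0.2, 0.1]))  # roughly 0.14
```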



These slides introduce our approach, which achieves 0.44488 and 0.44479 on the public and private leaderboards, respectively.

Flowchart

[Pipeline: CSV → Pre-A → GBDT → Pre-B → FFM → Calib. → Rst (final result)]

    Pre-A:  nnz = 13-39,  feat = 39
    GBDT:   nnz = 30,     feat = 30 × 2^7
    Pre-B:  nnz = 69,     feat = 10^6

nnz means the number of non-zero elements of each impression; feat represents
the size of the feature space.

Preprocessing-A

Purpose: generate features for GBDT.


All numerical data are included. (13 features)
Categorical features (after one-hot encoding) that appear more
than 4 million times are also included. (26 features)
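A rough pandas sketch of this selection (the column names I1-I13 and C1-C26 follow the dataset slide; bucketing infrequent categorical values into a single "other" value before encoding is an assumption, not the authors' code):

```python
import pandas as pd

def preprocess_a(df, threshold=4_000_000):
    num_cols = [f"I{i}" for i in range(1, 14)]   # 13 numerical features
    cat_cols = [f"C{i}" for i in range(1, 27)]   # 26 categorical features

    out = df[num_cols].fillna(0).copy()          # keep all numerical data
    for c in cat_cols:
        counts = df[c].value_counts()
        frequent = counts[counts > threshold].index
        # Keep only values frequent enough to one-hot encode; bucket the rest.
        out[c] = df[c].where(df[c].isin(frequent), other="other")
    return out
```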


Gradient Boosting Decision Tree (GBDT)


Purpose: generate GBDT features.


We use trees in GBDT to generate features.
30 trees with depth 7 are used.
30 features are generated for each impression.
This approach was proposed by Xinran He et al. at Facebook.

Gradient Boosting Decision Tree (GBDT)


Example: Assume that we have already trained a GBDT with 3 trees of depth 2.
We feed an impression x into these trees. The first tree assigns x to node 4, the
second to node 7, and the third to node 6. We then generate the features 1:4 2:7 3:6 for
this impression.

[Figure: impression x is fed into the three depth-2 trees; it falls into leaf node 4
of the first tree, node 7 of the second, and node 6 of the third, producing the
features 1:4, 2:7, and 3:6.]
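A rough sketch of this leaf-index encoding, with scikit-learn's GradientBoostingClassifier standing in for the authors' own GBDT implementation and random placeholder data in place of the Preprocessing-A features:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder data; in practice X holds the Preprocessing-A features.
X = np.random.rand(1000, 39)
y = np.random.randint(0, 2, size=1000)

gbdt = GradientBoostingClassifier(n_estimators=30, max_depth=7)
gbdt.fit(X, y)

# apply() returns the index of the leaf reached in every tree:
# shape (n_samples, n_estimators, 1) for binary classification.
leaves = gbdt.apply(X)[:, :, 0]

# Encode each impression as "tree:leaf" tokens, e.g. "1:4 2:7 3:6".
gbdt_features = [
    " ".join(f"{t + 1}:{int(leaf)}" for t, leaf in enumerate(row))
    for row in leaves
]
```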

Preprocessing-B


Purpose: generate features for FFM.


Numerical features (I1-I13) with values greater than 2 are transformed by
$v \leftarrow \lfloor \log(v)^{2} \rfloor$.
Categorical features (C1-C26) that appear fewer than 10 times are
transformed into a special value.
GBDT features are directly included.
These three groups of features are hashed into a 1M-dimensional space
by the hashing trick.
Each impression has 13 (numerical) + 26 (categorical) + 30
(GBDT) = 69 features.
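A minimal sketch of the first two transformations (the natural logarithm, the treatment of missing values, and the name of the special token are assumptions; the authors' preprocessing scripts may differ):

```python
import math

def transform_numerical(v):
    # I1-I13: values greater than 2 become floor(log(v)^2).
    if v in ("", None):
        return ""                      # leave missing values as-is
    v = int(v)
    return int(math.floor(math.log(v) ** 2)) if v > 2 else v

def transform_categorical(value, count, min_count=10):
    # C1-C26: values seen fewer than 10 times collapse to one special token.
    return value if count >= min_count else "rare"
```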

Hashing Trick

Each feature string is passed through a hash function, and the hash value
modulo 10^6 gives the feature index.

text          hash value                mod 10^6 (feature index)
I1:3          739920192382357839297     839297
C1-68fd1e64   839193251324345167129     167129
GBDT1:173     923490878437598392813     392813
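A sketch of the same idea with Python's hashlib (the slide does not specify which hash function was used, so the indices below will not match the example values above):

```python
import hashlib

def hashed_index(text, dim=10**6):
    # Map a feature string such as "C1-68fd1e64" to an index in [0, dim).
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return int(digest, 16) % dim

for token in ("I1:3", "C1-68fd1e64", "GBDT1:173"):
    print(token, "->", hashed_index(token))
```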

Field-aware Factorization Machine (FFM)


For the details of FFM, please check the following slides:


http://www.csie.ntu.edu.tw/~r01922136/slides/ffm.pdf

Calibration

Purpose: calibrate the final result.


The average CTRs on the public / private leaderboards are
0.2632 and 0.2627, respectively.
The average CTR of our submission is 0.2663.
There is a gap, so we subtract 0.003 from every prediction, and
the logloss is reduced by around 0.0001.
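A tiny sketch of this constant shift (the lower clipping bound eps is an assumption so the calibrated values stay valid probabilities):

```python
import numpy as np

def calibrate(preds, shift=0.003, eps=1e-6):
    # Subtract the constant gap and keep predictions inside (0, 1).
    return np.clip(np.asarray(preds) - shift, eps, 1 - eps)
```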


Running Time


Environment: a workstation with two 6-core CPUs.

All processes are parallelized.

Process       Time (min.)   Memory (GB)
Pre-A         8             0
GBDT          29            15
Pre-B         38            0
FFM           100           16
Calibration   1             0
Total         176

Comparison Among Different Methods

Method                     Public    Private
LR-Poly2                   0.44984   0.44954
FFM                        0.44613   0.44598
FFM + GBDT                 0.44497   0.44483
FFM + GBDT (v2)            0.44474   0.44462
FFM + GBDT + calib.        0.44488   0.44479
FFM + GBDT + calib. (v2)   0.44461   0.44449

v2: 50 trees and 8 latent factors
