
Survival Analysis For Cache Time-To-Live Optimization Presentation

This document discusses using survival analysis techniques to optimize the time-to-live (TTL) values for entries in a hotel rate cache. It begins with an introduction to survival analysis and key terms. Survival curves are estimated from historical rate change data using Kaplan-Meier and parametric models. Initial experiments successfully applied optimized TTLs for unavailable rates, reducing lookups without negatively impacting bookings. Future work includes expanding the approach to available rates and personalizing TTLs based on predictive variables.

Uploaded by

Mohamed Umar
Copyright
© Attribution Non-Commercial (BY-NC)

Survival Analysis & TTL Optimization

Rob Lancaster, Orbitz Worldwide

3/5/12

Outline
The Problem
Survival Analysis
  Intro
  Key Terms
  Techniques & Models:
    Kaplan-Meier Estimates
    Parametric Models
Optimizing Cache TTL
  Methods
  Results

The Problem
The hotel rate cache and TTL optimization.


The Hotel Rate Cache
Key/Value Store
  Key: Search Criteria
    hotel id
    host
    check-in
    check-out
    # people
    # rooms
  Value: Hotel Rate Information
Benefit = Reduced looks & latency
Cost = Increased re-price errors

The Hotel Rate Cache
Each cache entry is given a time-to-live (TTL).
TTLs were set based on intuition ages ago.
Goal: Optimize TTL to decrease looks and control re-price errors.
How? Ideally, find the greatest TTL value at which the probability of a rate change is below an acceptable threshold.
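A minimal sketch of the "How?" step, assuming we already have some survival function S(t) giving the probability that a cached rate is still unchanged after t hours. The function names, the whole-hour grid, the one-week horizon, and the example rate of 0.02 per hour are all illustrative assumptions, not values from the talk.

```python
import math
from typing import Callable

def max_ttl(survival: Callable[[float], float],
            max_change_prob: float,
            horizon_hours: int = 168) -> int:
    """Largest whole-hour TTL t such that P(change by t) = 1 - S(t)
    stays at or below max_change_prob.

    Assumes S is non-increasing, so the scan can stop at the first violation.
    """
    best = 0
    for t in range(1, horizon_hours + 1):
        if 1.0 - survival(t) > max_change_prob:
            break
        best = t
    return best

# Hypothetical survival curve: exponential with a made-up rate of 0.02 per hour.
def exp_survival(t: float, rate: float = 0.02) -> float:
    return math.exp(-rate * t)

# Accept at most a 10% chance that the rate has changed before the entry expires.
ttl = max_ttl(exp_survival, max_change_prob=0.10)
print(ttl)
```

Scanning a grid rather than inverting S(t) analytically keeps the sketch usable with an empirical curve (for example a Kaplan-Meier estimate) as well as a fitted one.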

Survival Analysis
A brief? introduction.


What is Survival Analysis?
Statistical procedures for predicting time until an event occurs.
Event: death, relapse, recovery, failure.
Examples:
  Heart transplant patients: time until death.
  Leukemia patients in remission: time until relapse.
  Prison parolees: time until re-arrest.

Key Terms
Survival Time, T vs. t
Failure
Censoring
Survival Function

Censoring
Period of no information
  Left-censored.
  Right-censored.
Causes:
  Individual is lost to follow-up
  Death from cause unrelated to event of interest
  Study ends
Models assume either failure or censoring.

Survival Function
Survival Function: S(t)
Probability of survival greater than t, i.e. that T > t
Properties:
  Non-increasing
  S(t) = 1, for t = 0
  S(t) = 0, for t = ∞

[Figure: example survival curves for the Weibull and log-logistic distributions, each on a vertical axis from 0 to 1.]

Kaplan-Meier Estimates
t_j: observation time
m_j: number of failures
q_j: number of censored observations
n_j: number at risk

t_j   m_j   q_j   n_j
0     0     0     14
1     1     0     14
2     1     1     13
4     2     1     11
6     0     2     8
7     1     0     6
9     1     0     5
10    2     2     4

$n_{j+1} = n_j - (m_j + q_j)$

Kaplan-Meier Estimates

$\hat{S}(t_j) = \prod_{i \le j} \left(1 - \frac{m_i}{n_i}\right)$
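The Kaplan-Meier table can be worked through in a few lines. This is a sketch using the standard product-limit estimator, with the failure and censoring counts from the slide as input; the function name and return shape are my own choices.

```python
def kaplan_meier(times, failures, censored, n_initial):
    """Return a list of (t_j, n_j, S_hat(t_j)) rows.

    n_j follows the recurrence n_{j+1} = n_j - (m_j + q_j), and the survival
    estimate is the running product of (1 - m_j / n_j).
    """
    rows = []
    at_risk, surv = n_initial, 1.0
    for t, m, q in zip(times, failures, censored):
        if at_risk > 0:
            surv *= 1.0 - m / at_risk
        rows.append((t, at_risk, surv))
        at_risk -= m + q  # failures and censored cases both leave the risk set
    return rows

# Counts taken from the slide's table; 14 subjects at risk at t = 0.
times    = [0, 1, 2, 4, 6, 7, 9, 10]
failures = [0, 1, 1, 2, 0, 1, 1, 2]   # m_j
censored = [0, 0, 1, 1, 2, 0, 0, 2]   # q_j

km = kaplan_meier(times, failures, censored, n_initial=14)
for t, n, s in km:
    print(f"t={t:>2}  at risk={n:>2}  S(t)={s:.4f}")
```

Note that censored observations shrink the risk set without lowering the survival estimate, which is how the method uses partial information instead of discarding it.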

Parametric Models
Accelerated Failure Time
Distributions, each defined by its S(t):
  Exponential
  Weibull
  Log-logistic
Assume a distribution.
Use regression to fit parameters.
The distribution is parameterized in terms of predictor variables and regression parameters.
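For reference, the three distributions can be sketched as survival functions. The forms below are common textbook parameterizations and are an assumption here; the talk may have used different ones. The symbols `lam` and `p`, and the helper `rate_from_predictors` with its log-linear link, are likewise illustrative.

```python
import math
from typing import Sequence

def exponential_survival(t: float, lam: float) -> float:
    """S(t) = exp(-lam * t): constant hazard lam."""
    return math.exp(-lam * t)

def weibull_survival(t: float, lam: float, p: float) -> float:
    """S(t) = exp(-lam * t**p): reduces to the exponential when p = 1."""
    return math.exp(-lam * t ** p)

def log_logistic_survival(t: float, lam: float, p: float) -> float:
    """S(t) = 1 / (1 + lam * t**p)."""
    return 1.0 / (1.0 + lam * t ** p)

def rate_from_predictors(intercept: float,
                         coeffs: Sequence[float],
                         predictors: Sequence[float]) -> float:
    """Assumed accelerated-failure-time link for the exponential model:
    1 / lam = exp(b0 + b1*x1 + ...), so predictors stretch or shrink time."""
    linear = intercept + sum(b * x for b, x in zip(coeffs, predictors))
    return math.exp(-linear)

# Hypothetical coefficients: one predictor with value 1.0.
lam = rate_from_predictors(intercept=3.0, coeffs=[0.5], predictors=[1.0])
print(exponential_survival(10.0, lam))
```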

Optimizing Cache TTL


Methods and early results.


Data Collection
Data is collected from service hosts in our hotel stack.
Includes every live rate search (aka burst) performed by our hotel stack.
Raw data: ~200 GB compressed, 10^8 records.
Extraction: <40 GB compressed, 10^9 records.

Data Preparation
Map/Reduce Job
  Key: unique search criteria (including hotel id)
  Sorted by date of occurrence
  Most important output:
    Does rate ever change? (how long)
    Does status ever change? (how long)
Results stored in Hive Table
  Predictors: location, lead time, los, chain, etc.

Data Preparation: Sample

Key (hotelid:checkin:checkout:ppl:rms): 12345:2012-03-01:2012-03-02:2:1 (same for every row)

Timestamp          Status       Rate   Status Change  Hours Until Status Change  Rate Change  Hours Until Rate Change
2012-01-10 5:00    Available    $100   TRUE           6                          TRUE         6
2012-01-10 8:00    Available    $100   TRUE           3                          TRUE         3
2012-01-10 11:00   Unavailable  N/A    TRUE           8                          N/A          N/A
2012-01-10 13:00   Unavailable  N/A    TRUE           6                          N/A          N/A
2012-01-10 14:00   Unavailable  N/A    TRUE           5                          N/A          N/A
2012-01-10 17:00   Unavailable  N/A    TRUE           2                          N/A          N/A
2012-01-10 19:00   Available    $120   FALSE          N/A                        TRUE         4
2012-01-10 22:00   Available    $120   FALSE          N/A                        TRUE         1
2012-01-10 23:00   Available    $150   FALSE          N/A                        FALSE        N/A
2012-01-11 1:00    Available    $150   FALSE          N/A                        FALSE        N/A
2012-01-11 3:00    Available    $150   FALSE          N/A                        N/A          N/A
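The derived columns in the sample can be reproduced with a short pass over the time-sorted observations for one key. This is a plain-Python sketch of that logic, not the actual Map/Reduce job; the function name and the `(changed, hours)` tuple encoding are assumptions. In survival terms, rows marked FALSE are right-censored: no change was seen before observation ended.

```python
from datetime import datetime

def hours_until_change(times, values):
    """For each observation return (changed, hours_until_change).

    (True, h)     the value next differs h hours later
    (False, None) later observations exist but the value never changes
    (None, None)  N/A: the value is missing, or this is the last observation
    """
    n = len(values)
    out = []
    for i in range(n):
        if values[i] is None or i == n - 1:
            out.append((None, None))
            continue
        # Index of the first later observation with a different value, if any.
        j = next((k for k in range(i + 1, n) if values[k] != values[i]), None)
        if j is None:
            out.append((False, None))
        else:
            out.append((True, (times[j] - times[i]).total_seconds() / 3600))
    return out

stamps = ["2012-01-10 05:00", "2012-01-10 08:00", "2012-01-10 11:00",
          "2012-01-10 13:00", "2012-01-10 14:00", "2012-01-10 17:00",
          "2012-01-10 19:00", "2012-01-10 22:00", "2012-01-10 23:00",
          "2012-01-11 01:00", "2012-01-11 03:00"]
times = [datetime.strptime(s, "%Y-%m-%d %H:%M") for s in stamps]
status = ["Available"] * 2 + ["Unavailable"] * 4 + ["Available"] * 5
rates = [100, 100, None, None, None, None, 120, 120, 150, 150, 150]

status_changes = hours_until_change(times, status)
rate_changes = hours_until_change(times, rates)
```

The quadratic scan is fine for a single key's history; a production job would more likely walk the sorted records once from the end.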

KM Estimates
[Figures: Kaplan-Meier survival curves, global and by traffic volume.]

Fitting the Survival Curve
Assume an exponential survival curve.
Apply simple linear regression.
Full data: R² = 0.9671
40 hrs: R² = 0.999
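One common way to fit an exponential curve with simple linear regression is to regress log S(t) on t, since S(t) = exp(-λt) implies log S(t) = -λt. The sketch below assumes that approach; the talk does not spell out the exact regression used, and the function names and the synthetic data are illustrative only.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R^2."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

def fit_exponential(times, survival):
    """Estimate lambda from (t, S(t)) points by regressing log S(t) on t."""
    points = [(t, math.log(s)) for t, s in zip(times, survival) if s > 0]
    slope, _, r_squared = fit_line([t for t, _ in points],
                                   [y for _, y in points])
    return -slope, r_squared

# Synthetic check: points generated from a known rate of 0.05 per hour.
hours = list(range(1, 41))
curve = [math.exp(-0.05 * t) for t in hours]
lam_hat, r2 = fit_exponential(hours, curve)
print(lam_hat, r2)
```

Restricting the fit to an early window, as the 40-hour figure suggests, amounts to passing only the points with t up to that cutoff.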

Survival Regression
Using survreg, we can fit our data to a given distribution.
Allows us to capture the influence of predictor values on survival rate.

Model Families


Production Testing
Divided hotels in 8 markets into A & B groups.
Modified TTL values for unavailable rates for B.
Prediction:
  Reduce the number of looks to B.
  Reduce the unavailability percentage for B.
  No negative impact on bookings or look-to-books for B.

Production Results


Conclusions and Next Steps
Conclusions
  Survival Analysis is well-suited for our problem.
  Great success in experiments for unavailable rates.
What's next?
  Available rates
  Introduction of predictor variables
  On-the-fly TTL calculation
  Beyond TTL

Thank you!

Questions?
