Internet Tomography: Bin Yu Statistics Department, UC Berkeley

This document summarizes research on network tomography for estimating network characteristics like link delays and origin-destination traffic matrices. It describes two main examples: (1) estimating link delays in a multicast network using observed delays, and (2) estimating origin-destination traffic matrices using link traffic counts. It then outlines the general linear network tomography model and discusses heuristics like using covariance of link measurements. Maximum likelihood and iterative proportional fitting approaches are proposed for parameter estimation and reconstructing the traffic matrices. Computational methods like moving windows and maximum pseudo-likelihood estimation are also introduced to address issues like non-stationarity and improve efficiency. Evaluation using real network data is discussed.

Internet Tomography

Bin Yu
Statistics Department, UC Berkeley
Collaborators
J. Cao, D. Davis, S. Vander Wiel, G. Liang, R. Castro, M. Coates, A. Hero, R. Nowak, N. Taft.

Related papers:

Cao, Davis, Vander Wiel, and Yu (JASA, 2000),
Coates, Hero, Nowak, and Yu (SPM, 2001),
Liang and Yu (IEEE-SP, 2003),
Castro, Coates, Liang, Nowak, and Yu (Statist. Sci., 2004),
Liang, Yu, and Taft (Proc. ISIT, 2004).
Medical Tomography
Computer assisted tomography (CAT scanning)
Positron emission tomography (PET scanning)
Single photon emission computed tomography (SPECT scanning)

All are inverse problems
Internet Tomography

A Lucent Network
Network Tomography
The term network tomography was first used by Vardi (1996) to
capture the similarities between origin-destination (OD) matrix
estimation from link counts and medical tomography. In network
inference, one commonly observes not the quantities of interest
but aggregations of them, and this holds well beyond OD
estimation.

Vardi (1996) also devised the linear tomography Poisson model for
OD traffic estimation. The linear form (not the Poisson
assumption) was later shown to approximate other network
tomography problems (cf. Coates, Nowak, Hero and Yu, 2002).
Why Network Tomography (NT)?
Network monitoring and management need

- Link packet loss probability
- Link delay
- Origin-Destination (OD) traffic matrix
- Topology/connectivity discovery
- Intrusion detection and prevention
- ...

These quantities are hard to measure directly, but indirect measurements of them are easy to obtain.

Network engineering and resource allocation include

- Routing optimization (OD information needed)
- Quality of service guarantee
- ...

NT Example 1: Multicast Link Delay Estimation
Probes are sent from the root of a multicast tree (where each
router duplicates the probes and sends them to its downstream
routers), and delays (Y) are observed at the receiver nodes only.
The problem is to infer the distribution of the internal link delays (X).
Obviously, we have

Y=AX,
where the 1's in the ith row of A specify the links that the ith
component of Y travels through.
$$
\begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}
\begin{pmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \end{pmatrix}
$$
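As a small illustration of Y = AX for this two-receiver tree, the sketch below builds the 0-1 matrix A from per-receiver paths and sums link delays along each path. The link numbering (links 1 and 2 lead to the receivers, link 3 is shared) is an assumption read off the 0-1 pattern (1 0 1; 0 1 1), and the delay values are made up.

```python
# Hypothetical two-receiver multicast tree: link 3 is shared,
# links 1 and 2 lead to receivers 1 and 2 (assumed numbering).
paths = {1: [1, 3], 2: [2, 3]}   # receiver -> links on its path
num_links = 3

def routing_matrix(paths, num_links):
    """Row i has a 1 in column j when receiver i's path uses link j."""
    return [[1 if j in paths[r] else 0 for j in range(1, num_links + 1)]
            for r in sorted(paths)]

def aggregate(A, x):
    """Compute y = A x for a 0-1 matrix A and a list of link values x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

A = routing_matrix(paths, num_links)
x_t = [2.0, 5.0, 1.5]            # made-up link delays x_{1,t}, x_{2,t}, x_{3,t}
y_t = aggregate(A, x_t)          # end-to-end delays y_{1,t}, y_{2,t}
print(A)    # [[1, 0, 1], [0, 1, 1]]
print(y_t)  # [3.5, 6.5]
```

Only y_t is observed in practice; the inference problem runs in the opposite direction, from y_t back to the distribution of x_t.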
NT Example 2: OD Traffic Matrix Estimation

                 dst-fddi (1)  dst-switch (2)  dst-local (3)  dst-corp (4)  total
orig-corp (4)        4→1            4→2             4→3            4→4        4
orig-local (3)       3→1            3→2             3→3            3→4        3
orig-switch (2)      2→1            2→2             2→3            2→4        2
orig-fddi (1)        1→1            1→2             1→3            1→4        1

n = 4 edge nodes, 1 router; 8 link counts are measured (4 origin, 4 destination).
J = n^2 = 16 OD pairs in X
I = 7 independent links in Y
Router 1 Link Data, Feb. 22
General Linear Network Tomography Model
At a given time t,

X : unknown quantity of interest (of dim J)
(e.g., link delay, traffic flow counts).

Y : known aggregations of X (of dim I).

Problem: predict or estimate X from Y with

AX = Y,
where A is a 0-1 routing matrix. Usually the number J of unknowns is
much larger than the number I of knowns, so this is a badly ill-posed
linear inverse problem.

The special case of OD traffic matrix estimation is of most interest
because of its importance to major service providers such as Sprint and
AT&T.
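To make the ill-posedness concrete, here is a minimal sketch for the one-router case of Example 2. It assumes each OD flow is counted once on its origin link and once on its destination link; that layout of A is an assumption for illustration, not taken from the slides. The rank is computed with exact fractions so no numerical library is needed.

```python
from fractions import Fraction

n = 4  # edge nodes around a single router, as in NT Example 2

# Columns: the n*n OD pairs (o, d). Rows: n origin links, then n destination links.
od_pairs = [(o, d) for o in range(n) for d in range(n)]
A = [[1 if o == i else 0 for (o, d) in od_pairs] for i in range(n)] + \
    [[1 if d == j else 0 for (o, d) in od_pairs] for j in range(n)]

def rank(matrix):
    """Rank by Gauss-Jordan elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

print(len(A), len(A[0]), rank(A))  # 8 16 7
```

Sixteen unknowns are constrained by only seven independent equations (total inbound traffic equals total outbound traffic, which removes one of the eight), so link counts at a single time point cannot determine X on their own.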
Heuristics to Recover X from Y.
Key observations:

Because traffic varies, the covariance of the link measurements Y
gives hints on how to attribute traffic to the different OD
pairs.

The mean traffic level is related to the variance of the traffic.
Roadmap for OD Estimation
Gaussian Model with Mean-Variance Relationship

Maximum Likelihood Estimation (MLE) for Parameters

Iterative Proportional Fitting (IPF) for OD Traffic Estimation based on
Parameter Estimation

Maximum Pseudo-Likelihood Estimation (MPLE) for Parameters

Sprint European Network Data Analysis

A Geometric View: MPLE + IPF vs. Gravity Model + MMI

New: A Partial Measurement Approach (APMA)

Basic Model (Cao et al, 2000)

OD: $x_t \sim \mathrm{Normal}(\lambda, \Sigma)$;
Link: $y_t = A x_t \sim \mathrm{Normal}(A\lambda, A \Sigma A')$;

where $\Sigma = \phi \, \mathrm{diag}(\lambda^c)$,
$\theta = (\phi, \lambda)$ is the unknown parameter, and
$\phi$: positive scale parameter
$\lambda$: unknown mean parameter
c: power of variance growth with mean, fixed

The variance-mean relation accounts for variation beyond Poisson
(c = 1 and $\phi$ = 1).

The Gaussian mean-variance model was verified in Cao et al
(2000) using LAN validation data, and more recently using
Sprint European backbone validation data by Medina et al (2002)
and Global Crossing European and American backbone validation
data by Gunnar et al (2004).
A Heuristic Identifiability Proof
Theorem: $\theta$ is identifiable for fixed c.

For the ith origin-destination pair, let

$y_o$: link count at the origin interface

$y_d$: link count at the destination interface.

The only bytes that contribute to both of these counts are those from
the ith OD pair, and thus

$$\mathrm{cov}(y_o, y_d) = \phi \lambda_i^c,$$

implying that $\lambda_i$ is determined up to the scale $\phi$. Additional
information from E(Y) identifies the scale and identifiability follows.

This proof formalizes the idea of using covariances between links,
motivated by the router 1 traffic plots.
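The covariance argument can be checked on a toy case. The sketch below assumes the one-router layout (each OD flow appears on exactly one origin link and one destination link), independent OD flows with variance phi * lambda^c, and made-up values for phi, c and lambda; none of these numbers come from the slides.

```python
n = 4                      # edge nodes around one router (assumed setup)
phi, c = 2.0, 2.0          # made-up scale and power
# Made-up mean traffic lambda for each OD pair (o, d)
lam = {(o, d): 1.0 + o + 0.5 * d for o in range(n) for d in range(n)}

def var(od):
    """Variance of one OD flow under Sigma = phi * diag(lambda^c)."""
    return phi * lam[od] ** c

def cov_origin_dest(o, d):
    """cov(y_o, y_d): sum of variances of flows counted on both links.

    OD flows are independent, so only flows that appear on both the
    origin-o link and the destination-d link contribute. In this layout
    that is the single pair (o, d).
    """
    on_origin = {(o, k) for k in range(n)}
    on_dest = {(k, d) for k in range(n)}
    return sum(var(od) for od in on_origin & on_dest)

o, d = 1, 3
cov_od = cov_origin_dest(o, d)
recovered = (cov_od / phi) ** (1.0 / c)   # lambda_i once the scale phi is known
print(cov_od, recovered)  # 24.5 3.5
```

Without knowing phi, the covariance only fixes phi * lambda_i^c, which is the "determined up to scale" statement in the proof.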
Maximum Likelihood Estimate (MLE) for Gaussian Model
Given observed data $Y = (y_1, y_2, \ldots, y_T)$, the log-likelihood function is

$$
l(\theta \mid Y) = -\frac{T}{2} \log |A \Sigma A'|
 - \frac{1}{2} \sum_{t=1}^{T} (y_t - A\lambda)' (A \Sigma A')^{-1} (y_t - A\lambda).
$$

Because $\Sigma$ is functionally related to $\lambda$, there is no analytic solution
that maximizes the above expression in terms of $\theta$, so the
Expectation-Maximization (EM) algorithm is used.

MLE computation with EM is too slow for large networks.

Each EM step has complexity $O(n_e^5)$ with sparse matrix calculations (Cao et al.
2000).
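For reference, the Gaussian log-likelihood, -(T/2) log|A Sigma A'| minus half the sum of the quadratic forms (y_t - A lambda)' (A Sigma A')^{-1} (y_t - A lambda), can be evaluated directly for a small problem. This is only a sketch of the objective, not the EM algorithm of Cao et al. The function name, the toy routing matrix and all numbers are illustrative assumptions, and A is assumed to have full row rank so that A Sigma A' is positive definite.

```python
import math

def gaussian_loglik(phi, lam, c, A, ys):
    """l(theta | Y) for y_t ~ Normal(A lam, A Sigma A'), Sigma = phi * diag(lam^c).

    Additive constants that do not depend on theta are dropped.
    """
    I, J = len(A), len(lam)
    sigma = [phi * l ** c for l in lam]
    mean = [sum(A[i][j] * lam[j] for j in range(J)) for i in range(I)]
    S = [[sum(A[i][k] * sigma[k] * A[j][k] for k in range(J)) for j in range(I)]
         for i in range(I)]
    # Cholesky factor L with S = L L'
    L = [[0.0] * I for _ in range(I)]
    for i in range(I):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(I))
    total = 0.0
    for y in ys:
        r = [y[i] - mean[i] for i in range(I)]
        # Forward substitution L z = r, so that r' S^{-1} r = z' z
        z = []
        for i in range(I):
            z.append((r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        total += sum(v * v for v in z)
    return -0.5 * len(ys) * logdet - 0.5 * total

A = [[1, 0, 1], [0, 1, 1]]            # toy 2 x 3 routing matrix
ll = gaussian_loglik(phi=1.0, lam=[1.0, 2.0, 3.0], c=1.0, A=A, ys=[[5.0, 5.0]])
print(ll)
```

Here A Sigma A' = [[4, 3], [3, 5]] with determinant 11, so the value equals -0.5 log 11 - 0.5 (5/11). A generic optimizer over (phi, lambda) could use this function, but the cost of the matrix factorization is what motivates the cheaper pseudo-likelihood later in the talk.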
Iterative Proportional Fitting (IPF) for OD Estimation
Given Initial Parameter Estimation
Given
(a) a set of summation linear constraints L (AX=Y)
(b) a starting distribution q for X (e.g. MLE estimates for mean OD traffic)

the I-projection of q onto L is

$$p_1 = \arg\min_{p \in L} D(p \,\|\, q).$$

Pythagorean equality:

$$D(p \,\|\, q) = D(p \,\|\, p_1) + D(p_1 \,\|\, q) \quad \text{for any } p \in L.$$

The Maximum Entropy Principle is the special case when q is uniform.

Iterative Proportional Fitting (IPF): a simple alternating minimization
procedure to find the I-projection.
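A minimal sketch of IPF for the one-router case, where the constraints AX = Y reduce to row sums (origin link counts) and column sums (destination link counts) of the OD table. For a general routing matrix the same idea cycles through the rows of A; this simplified version and its numbers are illustrative assumptions, not the authors' code.

```python
def ipf(prior, row_sums, col_sums, iters=100, tol=1e-10):
    """Scale a positive prior table until its margins match the targets."""
    x = [row[:] for row in prior]
    n_rows, n_cols = len(x), len(x[0])
    for _ in range(iters):
        # Fit the origin-link (row) constraints
        for i in range(n_rows):
            s = sum(x[i])
            if s > 0:
                x[i] = [v * row_sums[i] / s for v in x[i]]
        # Fit the destination-link (column) constraints
        for j in range(n_cols):
            s = sum(x[i][j] for i in range(n_rows))
            if s > 0:
                for i in range(n_rows):
                    x[i][j] *= col_sums[j] / s
        # Columns are exact after the last step; check the rows
        err = max(abs(sum(x[i]) - row_sums[i]) for i in range(n_rows))
        if err < tol:
            break
    return x

prior = [[1.0, 2.0], [3.0, 4.0]]       # made-up starting values q
x = ipf(prior, row_sums=[40.0, 60.0], col_sums=[30.0, 70.0])
print([sum(r) for r in x])
```

Each rescaling step is itself an I-projection onto one constraint, which is why alternating them converges to the I-projection onto the full set L. The result keeps the cross-product ratio of the prior (here 1*4 / (2*3)), so the starting values determine how traffic is split among OD pairs that the link counts cannot distinguish.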
Moving Windows to Address Nonstationarity
Dealing with nonstationarity:

A local likelihood is formed from the n observations in a moving window, such that

- Data inside each moving window are assumed to be i.i.d.;
- Moving windows overlap;
- Estimates from the previous window serve as starting values for the next one. (n=7)
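The moving-window scheme can be sketched as follows. The function `fit(window, start_value)` is a hypothetical estimator supplied by the caller (for example one run of EM started at `start_value`); the toy usage replaces it with a window mean so the example runs on its own. Treating n = 7 as the number of observations per window is an assumption.

```python
def moving_windows(series, n=7, step=1):
    """Yield (start, window) for overlapping windows of n consecutive observations."""
    for start in range(0, len(series) - n + 1, step):
        yield start, series[start:start + n]

def fit_over_windows(series, fit, init, n=7):
    """Fit each window, warm-starting from the previous window's estimate.

    `fit(window, start_value)` is a hypothetical estimator supplied by the caller.
    """
    estimates, current = [], init
    for _, window in moving_windows(series, n):
        current = fit(window, current)
        estimates.append(current)
    return estimates

# Toy usage: the "estimator" is a window mean and ignores the starting value
counts = [10, 12, 11, 13, 15, 14, 16, 18, 17]
means = fit_over_windows(counts, lambda w, start: sum(w) / len(w), init=0.0)
print(len(means), means[0])  # 3 13.0
```

Because consecutive windows share all but one observation, the previous estimate is usually close to the new optimum, which keeps the number of iterations per window small.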
Replacing MLE by Maximum Pseudo-Likelihood Estimation (MPLE)
(Liang and Yu, 2003, IEEE-SP)
To overcome the computational difficulty of MLE for Markov
random field (MRF) inference problems, Besag (1974) proposed a
pseudo-likelihood (PL) approach:

- Sub-problems are formed by neighborhood decomposition;
- The pseudo-likelihood function is obtained by multiplying the conditional
  likelihoods from the sub-problems, ignoring dependences among sub-problems.

Our pseudo-likelihood forms sub-problems differently, using pairs of links,
and multiplies likelihoods based on pairs instead of conditional likelihoods.

Both approaches share the same divide-and-conquer principle.

MPLE computation
In our experiments, we use sub-problems of all pairs.

The pseudo-EM algorithm is similar to the one used in Cao et al (2000), and
the same initial values are used.

The only difference is in the E-step: many small matrix inversions instead of
one big matrix inversion, and they can be run in parallel.

If the average length of OD paths is $O(n_e^{0.5})$, then the complexity of one
pseudo-EM step is $O(n_e^{3.5})$.

Recall that the EM step of MLE has complexity $O(n_e^5)$ with sparse matrix
calculations (Cao et al. 2000).
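One way to picture the pairwise decomposition: for every pair of links, keep only the OD pairs that traverse at least one of the two links, giving a small two-row sub-problem. The sketch below only forms these sub-problems. It is an assumed reading of the "all pairs" scheme, not the authors' implementation, and the 4 x 4 routing matrix is a made-up one-router example with two edge nodes.

```python
from itertools import combinations

def pair_subproblems(A):
    """For every pair of links, keep the OD columns that touch either link."""
    subs = []
    for i, j in combinations(range(len(A)), 2):
        cols = [k for k in range(len(A[0])) if A[i][k] or A[j][k]]
        sub_A = [[A[i][k] for k in cols], [A[j][k] for k in cols]]
        subs.append(((i, j), cols, sub_A))
    return subs

# Rows: origin links 0, 1 then destination links 0, 1; columns: OD pairs
A = [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]
subs = pair_subproblems(A)
print(len(subs))   # 6
print(subs[1])     # ((0, 2), [0, 1, 2], [[1, 1, 0], [1, 0, 1]])
```

Each sub-problem involves a 2 x 2 covariance matrix, so the E-step works with many tiny inversions that are independent of one another and can be distributed across processors.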
Estimated Mean Traffic
Computation Time Comparison for MLE and MPLE
Using the network simulator ns, we simulated two networks of 8 end nodes and 21
end nodes, based on the Lucent network topology. For estimating the traffic
counts, the computation times (in seconds, using R on a 1 GHz laptop) are as
follows:

# nodes   # links   MLE (s)   MPLE (s)   MPLE/MLE
4         7         48        12         0.25
8         16        82        18         0.21
21        49        2300      149        0.06
Sprint Europe Network Data With
Validation (OD Traffic Known)
Configuration: 13 PoPs, 18 internal links.
Directly measured OD traffic, X, through Cisco's NetFlow
Automatic 10-minute aggregation
Two Sample OD Traffic Plots
Periodicity
Slow variability of mean OD traffic
Smoothness (nonburstiness) most of the time
A Pictorial Comparison of Our Approach and AT&T's
Average relative errors for large OD traffic: 0.279 for pseudo+IPF and 0.305
for gravity+MMI.
Cumulative Distribution Plot of Relative Errors
Boxplot of Relative Errors
Boxplot of relative errors for large OD traffic: pseudo+IPF (red) and gravity+MMI
(black). All traffic is binned into 10 equally spaced levels.
Estimation Results: Two Sample OD Traffic
Gunnar et al (2004)
compare different tomographic methods on
Global Crossing validation data sets:
a. European network: 12 PoPs (132 OD pairs), 72 links.
The AT&T approach and variants give the best results:
about 10% relative error for the 29 largest OD pairs
(90% of total traffic). The worst-case bound (WCB, linear programming)
also gives comparable results.

b. American network: 25 PoPs (600 OD pairs), 284 links.
The AT&T approach and variants give the best results: 25%. WCB
gives 39% for the 155 largest OD pairs.

Both OD problems are much better posed than the Sprint data set.

APMA: a partial measurement approach
Liang et al (2004). Proc. ISIT, June.
Rationale:
Direct measurements of OD pairs through NetFlow are becoming
available but are still computationally expensive. We propose to
trade off computation against OD information gathering through the
APMA Algorithm:
i) For each t, select some OD pairs to measure;
ii) Plug the measured OD pairs into AX=Y and use IPF to obtain the
remaining Xs, with initial values for these Xs taken from the estimates at t-1.

Recently, Papagiannaki et al (2004) use complete OD information
measured every few days to estimate fanout coefficients, which are used
together with link counts for OD estimation (relative error rates 6-10%).
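Steps (i) and (ii) of the APMA algorithm can be sketched for a single-router OD table, where AX = Y reduces to row and column sums. The function name `apma_step`, the table layout and all numbers are illustrative assumptions; how the OD pairs are selected is left to the caller.

```python
def apma_step(prev_est, measured, row_sums, col_sums, iters=200):
    """One APMA time step for a single-router OD table (assumed layout).

    measured: dict {(o, d): value} of directly measured OD pairs at time t.
    prev_est: OD estimates from t-1, used as IPF starting values.
    """
    n = len(prev_est)
    # Remove measured traffic from the link counts it contributes to
    rows, cols = list(row_sums), list(col_sums)
    for (o, d), v in measured.items():
        rows[o] -= v
        cols[d] -= v
    # Start unmeasured cells from t-1; measured cells are held out (0)
    x = [[0.0 if (o, d) in measured else prev_est[o][d] for d in range(n)]
         for o in range(n)]
    for _ in range(iters):  # IPF on the remaining cells
        for o in range(n):
            s = sum(x[o])
            if s > 0:
                x[o] = [v * rows[o] / s for v in x[o]]
        for d in range(n):
            s = sum(x[o][d] for o in range(n))
            if s > 0:
                for o in range(n):
                    x[o][d] *= cols[d] / s
    for (o, d), v in measured.items():  # put the measured values back
        x[o][d] = v
    return x

prev = [[12.0, 18.0, 28.0], [6.0, 14.0, 26.0], [38.0, 9.0, 13.0]]  # made-up t-1 estimates
row_t, col_t = [60.0, 45.0, 60.0], [55.0, 43.0, 67.0]              # made-up link counts at t
est = apma_step(prev, {(0, 2): 30.0}, row_t, col_t)
print(est[0][2])  # 30.0
```

Every measured OD pair removes one unknown and tightens the constraints on the rest of its row and column, which is where the gain over link counts alone comes from.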
Selection Schemes
On-line selection:
Randomly select a few OD pairs to measure, with weights
a. uniform;
b. proportional to the estimated variances from the Gaussian
model, fitted to the estimated OD from t-1.

Off-line selection:
Make both schemes (a) and (b) deterministic by cycling through
the OD pairs according to a fixed list generated ahead of
time.
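The two selection schemes might be sketched as below. The function names are hypothetical, the weights passed in stand for the estimated variances from the Gaussian model, and the off-line variant simply cycles through whatever fixed list it is given.

```python
import random
from itertools import cycle, islice

def online_select(od_pairs, k, weights=None, rng=random):
    """Randomly pick k distinct OD pairs; weights=None gives scheme (a), uniform."""
    pool = list(od_pairs)
    w = [1.0] * len(pool) if weights is None else list(weights)
    chosen = []
    for _ in range(min(k, len(pool))):
        idx = rng.choices(range(len(pool)), weights=w)[0]
        chosen.append(pool.pop(idx))   # sample without replacement
        w.pop(idx)
    return chosen

def offline_schedule(od_pairs, k, steps):
    """Deterministic variant: cycle through a fixed list, k pairs per time step."""
    it = cycle(od_pairs)
    return [list(islice(it, k)) for _ in range(steps)]

pairs = [(o, d) for o in range(3) for d in range(3)]
picked = online_select(pairs, 2, rng=random.Random(0))   # scheme (a)
schedule = offline_schedule(pairs[:3], 2, 3)
print(schedule)  # [[(0, 0), (0, 1)], [(0, 2), (0, 0)], [(0, 1), (0, 2)]]
```

Weighting by estimated variance directs measurements toward the OD pairs whose traffic is hardest to predict from link counts, while the off-line list avoids any per-step computation at the routers.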

Estimation Results: Two Sample OD Traffic
Overall error rates with one OD pair measured
are 7% (uniform selection) and 3.5% (using
weights).
These rates are conservative because turning on
NetFlow at a router makes a whole row of OD pairs
available, not just one pair. With uniform
off-line selection of whole rows of OD traffic, the
error rate drops to 3.7%.
Compression can be used to reduce
transmission cost (sending the differences between the
estimated OD from t-1 and the measured OD at t).
Soule et al (2004) compare second-generation methods
Sprint European validation data set:
Methods that use information beyond link counts
give better results. E.g.
- Generalized Gravity+MMI (AT&T approach): 30%.
  Stable error rates across space, but not time.
- Kalman filter, PCA based, Fanout: they all use
  OD information one way or another, and give
  5-10% errors.

Parting Message:
Second-generation tomographic methods
go beyond link counts to drastically
reduce error rates.
APMA is one such method; it is
inexpensive and computationally fast.
First-generation methods are still useful.
For example, we are planning to use the
Gaussian model to give priors to feed into
Sprint's Kalman filter method.
