9 Large Network

Study of Internet

Study of Large Networks

6CCS3INS Internet Systems


2014-15 Toktam Mahmoodi, Department of Informatics, KCL
Outline

 Study of Networks
 Measurement on a real network
 Random Graphs
 Small World phenomena
 Power Law Distribution and Preferential attachment
 Information flow and epidemics
Large networks
Study of Networks

 Empirical: Study network data to find organisational principles


 How do we measure and quantify networks?

 Mathematical models: Graph theory, statistical models


 Models allow us to understand behaviors and distinguish surprising from
expected phenomena

 Algorithms for analysing graphs


 Hard computational challenges

 Historical study of networks with mathematical graph theory


 One of the pillars of discrete mathematics
 Started with Euler’s celebrated 1735 solution of the Königsberg bridge
problem.
Properties of Network

 While a small network can be visualised directly by its graph (N, g), larger networks can be more difficult to envision and describe.

 Therefore, we define a set of summary statistics or quantitative performance measures to describe and compare networks (focus on undirected graphs):

 Degree distributions
 Distance
 Diameter and average path length
 Clustering Coefficient
Properties of Network: Degree distribution

 Degree distribution, P(k)


 Probability that a randomly chosen node has degree k
 N(k) = number of nodes with degree k
 P(k) = N(k) / n, where n is the total number of nodes

 Distance
 between a pair of nodes is defined as the number of edges along
the shortest path connecting the nodes.
 In directed graphs, paths need to follow the direction of the arrows
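The definition P(k) = N(k) / n can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the adjacency-dictionary representation and the small example graph are assumptions chosen for the demonstration.

```python
from collections import Counter

def degree_distribution(adj):
    """Return {k: P(k)} for an undirected graph given as {node: set of neighbours}."""
    n = len(adj)
    # N(k): number of nodes whose degree is exactly k
    counts = Counter(len(neighbours) for neighbours in adj.values())
    return {k: count / n for k, count in sorted(counts.items())}

# Hypothetical example: a path 0-1-2-3 with an extra node 4 attached to node 1.
example = {
    0: {1},
    1: {0, 2, 4},
    2: {1, 3},
    3: {2},
    4: {1},
}

print(degree_distribution(example))
```

Three of the five nodes have degree 1, so P(1) = 3/5, while degrees 2 and 3 each occur once.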
Properties of Network: Diameter

 Let h(i, j) denote the length of the shortest path between nodes i and j (the distance between i and j).
 The diameter of a network is the largest distance between any two nodes in the network:
diameter = max_{i,j} h(i, j)
 The average path length is the average distance between any two nodes in the network:
average path length = (∑_{i>j} h(i, j)) / (n(n-1)/2)

 The average path length is bounded from above by the diameter; in some cases, it can be much shorter than the diameter.
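Both quantities can be computed with one breadth-first search per node. The sketch below assumes a connected, undirected, unweighted graph stored as an adjacency dictionary; the function names and the 4-node path used as an example are illustrative only.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Shortest-path distances h(source, v) to every node reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_and_average_path_length(adj):
    """Assumes the graph is connected and undirected."""
    all_dist = {u: bfs_distances(adj, u) for u in adj}
    # One distance per unordered pair, i.e. n(n-1)/2 values in total
    pair_distances = [all_dist[u][v] for u, v in combinations(adj, 2)]
    return max(pair_distances), sum(pair_distances) / len(pair_distances)

# Hypothetical example: the path 0-1-2-3
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(diameter_and_average_path_length(path))
```

For the path, the six pairwise distances are 1, 2, 3, 1, 2, 1, giving a diameter of 3 and an average path length of 10/6.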
Properties of Network: Clustering Coefficient

 What portion of i’s neighbors are connected to each other?
 Node i with degree k_i:
C_i = 2 e_i / (k_i (k_i − 1))
 where e_i is the number of edges between the neighbors of node i

 Average clustering:
C = (1/n) ∑_i C_i
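A short sketch of the clustering computation, using the standard definition C_i = 2 e_i / (k_i (k_i − 1)) and averaging over all nodes. Setting C_i = 0 for nodes with fewer than two neighbours is a common convention and an assumption here; the example graph is hypothetical.

```python
from itertools import combinations

def local_clustering(adj, i):
    """Fraction of pairs of i's neighbours that are themselves connected."""
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: no neighbour pairs, so C_i = 0
    # e_i: number of edges among the neighbours of i
    e = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * e / (k * (k - 1))

def average_clustering(adj):
    return sum(local_clustering(adj, i) for i in adj) / len(adj)

# Hypothetical example: triangle 0-1-2 plus a pendant node 3 attached to node 0
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(graph, 0), average_clustering(graph))
```

Node 0 has three neighbours with one edge among them, so C_0 = 1/3; nodes 1 and 2 have C = 1, node 3 has C = 0, and the average is 7/12.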
Measurement on a real network

 MSN Messenger activity in June 2006:


 150 GB/day (compressed)
 4.5 TB/month
 245 million users logged in
 180 million users engaged in conversations
 More than 30 billion conversations
 More than 255 billion exchanged messages
MSN Network: Degree Distribution (log-log plot)
MSN Network: Clustering

Avg. clustering of the MSN: C = 0.1140
MSN Network: Diameter

 Avg. path length: 6.6

 90% of the people can be reached in < 8 hops
Random Graphs

 We use the notation Gnp to denote the undirected Erdős-Rényi graph, a simple random graph model.
 Undirected graph with n nodes
 Each edge (u, v) is formed with probability p ∈ (0, 1), independently of every other edge (i.i.d.).
 n and p don’t uniquely determine the graph.
 We can have many different realisations for the same n and p.
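One way to draw a realisation of Gnp is to flip an independent biased coin for every possible edge. The sketch below is an illustration under that reading; the function name `sample_gnp` and the seed parameter are assumptions introduced for reproducibility.

```python
import random
from itertools import combinations

def sample_gnp(n, p, seed=None):
    """Return the edge set of one realisation of the Erdős-Rényi graph Gnp."""
    rng = random.Random(seed)
    edges = set()
    # Each of the n(n-1)/2 possible edges is included independently with probability p
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            edges.add((u, v))
    return edges

# The same n and p with different seeds give different realisations in general
a = sample_gnp(50, 0.1, seed=1)
b = sample_gnp(50, 0.1, seed=2)
print(len(a), len(b), a == b)
```

Only the seed changes between the two calls, which is what "n and p don’t uniquely determine the graph" means in practice.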
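How likely is a graph with E edges?

 P(E) is the probability that a given Gnp produces a graph with exactly E edges.

 P(E) follows the Binomial distribution
 Number of successes in a sequence of independent binary (yes/no) experiments, here one experiment per possible edge, n(n − 1)/2 in total:
P(E) = C(E_max, E) · p^E · (1 − p)^(E_max − E), with E_max = n(n − 1)/2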
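As a worked example, the binomial probability of seeing exactly E edges can be evaluated directly, treating each of the n(n − 1)/2 possible edges as one yes/no experiment. The function name and the choice n = 4, p = 0.5 are assumptions for illustration.

```python
from math import comb

def edge_count_probability(n, p, E):
    """P(E): probability that Gnp has exactly E edges (binomial distribution)."""
    max_edges = n * (n - 1) // 2  # number of possible edges, i.e. number of trials
    return comb(max_edges, E) * p**E * (1 - p) ** (max_edges - E)

n, p = 4, 0.5
probs = [edge_count_probability(n, p, E) for E in range(7)]
print(probs)
print(sum(probs))
```

With n = 4 there are 6 possible edges, so P(3) = C(6, 3) / 2^6 = 20/64, and the probabilities over E = 0..6 sum to 1.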
MSN Vs Random Graphs

 Degree distribution

 Clustering Coefficient

 Connected component 99% almost there


Real Network Vs Random Graphs

 Are real networks random graphs?


 The answer is simply NO!

 If Gnp is wrong, why did we spend time on it?


 It is the reference model for our analysis.
 It will help us calculate many quantities that can then be compared to the real data.
 It will help us understand to what degree a particular property is the result of some random process.

 While Gnp is WRONG, it will turn out to be extremely useful.
Small World phenomena

 Origins of the small-world idea: the Bacon number
 Create a network of Hollywood actors
 Connect two actors if they co-appeared in a movie.

 Bacon number: number of steps to Kevin Bacon
 As of Dec 2007, the highest (finite) Bacon number reported is 8
 Only approx. 12% of all actors cannot be linked to Bacon

 A recent study has shown that Christopher Lee is in fact the actual center of the movie universe.
Small World Properties

 Small diameter
 High clustering
Small World phenomena

 Taking a connected graph and adding a very small number of edges randomly, the diameter tends to drop drastically.
 This is known as the small world phenomenon.

 Short-term memory uses small world networks between neurons to remember this sentence.

 In modern mathematics, the center of the network of co-authorship is considered to be P. Erdős, resulting in the so-called Erdős number.
 Erdős numbers are small!
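The effect of a few random edges on the diameter can be tried out on a ring, where every node is linked only to its two neighbours. This is a sketch under assumed choices (ring of 200 nodes, 10 random shortcuts, fixed seed); the helper names are hypothetical.

```python
import random
from collections import deque

def ring_lattice(n):
    """Cycle on n nodes: each node is connected to its two nearest neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def eccentricity(adj, source):
    """Largest BFS distance from source (graph assumed connected)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    return max(eccentricity(adj, u) for u in adj)

def add_random_edges(adj, count, seed=None):
    """Return a copy of adj with `count` new edges between random node pairs."""
    rng = random.Random(seed)
    nodes = list(adj)
    new_adj = {u: set(vs) for u, vs in adj.items()}
    added = 0
    while added < count:
        u, v = rng.sample(nodes, 2)
        if v not in new_adj[u]:  # skip pairs that are already connected
            new_adj[u].add(v)
            new_adj[v].add(u)
            added += 1
    return new_adj

ring = ring_lattice(200)
shortcut = add_random_edges(ring, 10, seed=0)
print(diameter(ring), diameter(shortcut))
```

The ring alone has diameter n/2 = 100. Adding edges can never increase any distance, and in typical runs a handful of shortcuts already cuts the diameter substantially.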
Small World Experiment

 What is the typical shortest path length between any two people?
 Experiment on the global friendship network
 Can’t measure it directly; need to probe explicitly

 Small-world experiment [Milgram ’67]
 Picked 300 people in Omaha, Nebraska and Wichita, Kansas
 Asked them to get a letter to a stockbroker in Boston by passing it to somebody they know who they think could be connected to the broker.
 How many steps did it take?
Small World Experiment: 6 degrees of separation

 64 letters reached the target
 It took 6.2 steps on average

 Further observations:
 People who owned stock had shorter paths to the stockbroker than random people
 People from the Boston area had even shorter paths: 4.4 steps
Criticism of the Milgram Experiment

 31 of 64 chains passed through 1 of 3 people as their final step
 Not all links/nodes are equal
 Starting points and the target were non-random
 People in the experiment follow some strategy (e.g.,
geographic routing) instead of forwarding the letter to
everyone.
 They are not finding the shortest path!
 There are not many samples (only 64)
 People might have used extra information resources
Another Small World Experiment

 In 2003 Dodds, Muhamad and Watts performed the experiment using e-mail:
 18 targets of various backgrounds
 24,000 first steps (~1,500 per target)
 65% dropout per step
 384 chains completed (1.5% of e-mails reached the target)
 Average path length 4.01

 After the correction for dropout, average path length is ~ 7
Degree Distribution

 Degree distribution in a random graph:
 P(k) decays exponentially with k

 Observation in real networks:
 P(k) follows a power law
Degree Distribution
Node Degrees: Internet Autonomous system

[Faloutsos, Faloutsos and Faloutsos, 1999]
Node Degrees: Web

[Broder et al., 2000]


Node Degrees: other networks

[Barabasi, Albert, 1999]


Power-law degree exponent

 Power-law degree exponent is typically 2 < α < 3
 Web graph: αin = 2.1, αout = 2.4 [Broder et al. 00]
 Autonomous systems: α = 2.4 [Faloutsos, Faloutsos and Faloutsos 99]
 Actor-collaborations: α = 2.3 [Barabasi-Albert 00]
 Citations to papers: α ≈ 3 [Redner 98]
 Online social networks: α ≈ 2 [Leskovec et al. 07]
Scale-Free network

 Networks with a power-law tail in their degree distribution are called “scale-free networks”.

 The name comes from the scale-invariance property.

 Scale-free function: f(ax) = a^λ f(x)

 Power-law function: f(x) = x^λ, so f(ax) = a^λ x^λ = a^λ f(x)

Power Laws are Everywhere
Mathematics of Power Law

 Above a certain x value, the power law is always higher than the exponential.
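A short way to see this is to look at the ratio of the two functions. The symbols α (power-law exponent) and λ (exponential decay rate) are assumed here to be positive constants; the slides do not fix their values.

```latex
\frac{x^{-\alpha}}{e^{-\lambda x}}
  = \exp\bigl(\lambda x - \alpha \ln x\bigr)
  \xrightarrow[x \to \infty]{} \infty ,
\qquad \alpha > 0,\ \lambda > 0 .
```

The exponent λx − α ln x is increasing for x > α/λ and grows without bound, so once the ratio exceeds 1 it stays above 1: the power-law tail is heavier than any exponential tail.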
Random Vs. Scale-free Network
Preferential attachment

 Nodes arrive in order 1, 2, …, n
 At step j, let di be the degree of node i < j
 A new node j arrives and creates m out-links
 Prob. of j linking to a previous node i is proportional to the degree of node i, that is di.

[Price ‘65, Albert-Barabasi ’99, Mitzenmacher ‘03]
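The arrival process above can be simulated with a simple trick: keep a list in which every node appears once per incident edge, so that a uniform draw from the list is a degree-proportional draw. This is a sketch, not the cited authors' exact procedure; starting from a complete graph on m + 1 nodes and forcing the m targets of a new node to be distinct are assumptions made to keep the example well defined.

```python
import random

def preferential_attachment(n, m, seed=None):
    """Grow a graph on nodes 0..n-1; each new node links to m earlier nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = []
    endpoints = []  # node i appears d_i times, so uniform choice is degree-proportional
    # Assumed seed graph: complete graph on the first m + 1 nodes
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            edges.append((u, v))
            endpoints += [u, v]
    for j in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # assumption: redraw until m distinct targets
            targets.add(rng.choice(endpoints))
        for i in sorted(targets):
            edges.append((i, j))
            endpoints += [i, j]
    return edges

edges = preferential_attachment(1000, 2, seed=42)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(max(degree.values()), min(degree.values()))
```

Early nodes tend to accumulate far more links than late arrivals, which is the "rich get richer" effect behind the heavy-tailed degree distribution.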


Rich get Richer

 New nodes are more likely to link to nodes that already have high degree

 Examples:
 Citation: the rate of new citations to a paper is proportional to the number it already has.
Spreading through networks

 Behaviors that cascade from node to node like an epidemic

 Examples:
 Biological:
 Diseases via contagion
 Technological:
 Cascading failures
 Spread of information
 Social:
 Rumors, news
Diffusion Model

 Probabilistic models
 Models of influence or disease spreading
 Example: You “catch” a disease with some probability from each
active neighbour in the network

 Decision based models


 Models of product adoption, decision making
 A node observes decisions of its neighbours and makes its own
decision
 Example: You watch a movie if k of your friends told you about it.
Decision Based Model of Diffusion

 Example Scenario:
 Assume a network where everyone starts out having chosen action B
 A small set S of nodes has chosen A
 If more than 50% of one’s friends have chosen A, one will also change one’s action to A.
 That is, the threshold level for adopting A is set at q = 1/2: a node switches when the fraction of its friends choosing A exceeds q.
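The scenario can be played out step by step in code. The sketch below assumes the rule exactly as stated (a node switches to A when strictly more than a fraction q of its neighbours have chosen A, and never switches back); the function name and the 7-node path used as a test network are hypothetical.

```python
def run_cascade(adj, initial_adopters, q=0.5):
    """Repeat rounds of switching until no further node adopts A; return all adopters."""
    adopters = set(initial_adopters)
    while True:
        # Nodes still on B whose fraction of A-neighbours strictly exceeds q
        switching = {
            node
            for node in adj
            if node not in adopters
            and adj[node]
            and len(adj[node] & adopters) / len(adj[node]) > q
        }
        if not switching:
            return adopters
        adopters |= switching

# Hypothetical network: the path 0-1-2-3-4-5-6, with node 3 as the initial adopter
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 6} for i in range(7)}

print(sorted(run_cascade(path, {3}, q=0.5)))
print(sorted(run_cascade(path, {3}, q=0.4)))
```

With q = 0.5, node 3's neighbours see exactly half of their friends on A, which is not more than 50%, so nothing spreads. Lowering the threshold to q = 0.4 lets A travel along the whole path.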
Example Scenario
Network Cascade

 Consider an infinite graph G
 each node has a finite number of neighbours

 We say that a finite set S causes a cascade in G with threshold q if, when S adopts A, eventually every node in G adopts A.

 The “cascade capacity” of a graph G is the largest q for which some finite set S can cause a cascade.

 Fact: There is no (infinite) G where cascade capacity > ½.
 Proof idea: Suppose such a G exists: q > ½, finite S causes a cascade.
 Show a contradiction: argue that nodes stop switching after a finite number of steps.
Examples of infinite graphs

 Infinite Path: If q < 1/2 then a cascade occurs

 Infinite Tree: If q < 1/3 then a cascade occurs

 Infinite Grid: If q < 1/4 then a cascade occurs
Food for thought

 Stopping Cascade
 Let S be an initial set of adopters of A
 All nodes apply threshold q to decide whether to switch to A
 What prevents cascades from spreading?
Probabilistic Model of Diffusion

 Epidemic Model based on Random Trees.

 A patient meets d other people
 With probability q > 0 the patient infects each of them
 Question: for which values of d and q does the epidemic run forever?
Epidemic

 Let p_h = probability that there is an infected node at depth h of the tree.
 The epidemic will die out if lim_{h→∞} p_h = 0

 Recurrence for p_h on the tree: p_h = 1 − (1 − q·p_{h−1})^d

 lim_{h→∞} p_h is the result of iterating f(x) = 1 − (1 − q·x)^d
Epidemic

 Start the iteration at x = 1, since p_1 = 1.

 For the epidemic to die out we need f(x) to be below y = x.

 q·d = expected number of people that we infect
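The condition can be checked numerically by iterating the map f(x) = 1 − (1 − q·x)^d from x = 1. The sketch assumes this standard recurrence for the random-tree model, p_h = 1 − (1 − q·p_{h−1})^d with p_1 = 1; the function name and the values of q and d are chosen only for illustration.

```python
def infection_probability_at_depth(q, d, depth):
    """Iterate p_h = 1 - (1 - q * p_{h-1})**d, starting from p_1 = 1."""
    p = 1.0  # p_1 = 1: the root is infected
    for _ in range(depth - 1):
        p = 1 - (1 - q * p) ** d
    return p

# q*d = 0.6 < 1: the probability of an infection surviving to depth h shrinks to 0
print(infection_probability_at_depth(q=0.2, d=3, depth=200))

# q*d = 1.5 > 1: the probability settles at a positive value, the epidemic can persist
print(infection_probability_at_depth(q=0.5, d=3, depth=200))
```

Since f'(0) = q·d and f is concave on [0, 1], f stays below y = x exactly when the expected number of new infections per patient does not exceed 1. For q = 0.5, d = 3 the iteration converges to the positive fixed point 3 − √5 ≈ 0.76.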


Spreading Models of viruses
General Epidemic Model
SIR Model
SIS Model
Epidemic Threshold
Epidemic Threshold in SIS model
Experiment
Independent Cascade model
Exposure and Adoption
Exposure Curve
Example Application
Diffusion in Viral Marketing
Small Experiment

 Gephi
 Exploratory data analysis and visualisation tool for graphs and networks

 Available data sets to work with
 Movie ratings: IMDb data set
 http://www.imdb.com/interfaces
 Your own Facebook data
 Log in to your Facebook account and search for Netvizz
Reading

 Networks, Crowds, and Markets: Reasoning about a Highly Connected World
 Chapter 18 on Power Laws
 Chapter 20 on Small World Phenomena
 Chapters 19 & 21 on epidemics
