
A case of The Good, The Bad and The Ugly: a note on determining valuable originators in preferential attachment networks

Abhishek Nagaraj¹, Aseem Sood²

February 22, 2010

Abstract

Using computer simulations of preferential attachment networks, we find support for the hypothesis that well-connected nodes (as defined by standard parameters of network centrality) are ideal candidates to be chosen as originators in a social network. We do this by clustering candidate originator nodes into three distinct groups based on their diffusion characteristics and then examining their centrality parameters.

¹ Indian Institute of Management Calcutta.
² Indian Institute of Management Calcutta.

We are thankful to Prof. Arijit Sen of the Indian Institute of Management Calcutta under whose guidance this term paper was written.

1. INTRODUCTION

What causes an innovation or an idea to spread in a social network is not only a fundamentally interesting problem but also one with substantial sociological implications. Research in this area can be used to support theories about why certain ideas catch on, why certain technologies become popular, or why a certain disease reaches epidemic proportions. Specifically, we ask: how do the originator and her position in a social network influence the nature of diffusion? For example, how can computer companies engineer widespread software adoption by finding the right people to give free copies to? How can locating and quarantining the right people arrest the spread of virulent diseases? These are a few of the questions that motivate this article.

In this study, we use a simulation-based approach to generate our dataset. After repeatedly tracking the diffusion of innovations in algorithmically generated networks, we find support for the hypothesis that the more central a node is, the better a candidate it is for origination. More specifically, we find that good originators have favourable centrality scores when measured using standard parameters of node centrality, namely eigenvector centrality, Average Reciprocal Distance (ARD) and k-step reach (k = 3 in our particular case).

2. STUDY DESIGN

The data for our study come from a NetLogo simulation we built to analyse diffusion. A screenshot of the simulation area is shown in Fig. 1.

Fig. 1: Screenshot of the simulation area.

The simulation builds upon work by Adamic and Bakshy (SmallWorldDiffusionSIS, 2008) and Wilensky, U. (NetLogo Preferential Attachment model, 2005).

The network, shown in the rightmost area of the simulation, is composed of nodes, which represent entities, and links, which represent relationships between nodes. We attach no significance to the nature of this relationship and posit simply that all links represent a homogeneous relationship between any two nodes that share a link. The network consists of 500 nodes and was constructed algorithmically using a preferential attachment scheme (Barabási 2000), described as follows.

The preferential attachment scheme supposes that when a network grows, newcomers prefer to spend resources on establishing relationships with nodes that are already well connected. Connectedness is measured here by degree, i.e., the number of neighbours a node has. Wilensky (2005) describes the construction of the network as follows: "The model starts with two nodes connected by an edge. At each step, a new node is added. A new node picks an existing node to connect to randomly, but with some bias. More specifically, a node's chance of being selected is directly proportional to the number of connections it already has, or its 'degree.' This is the mechanism which is called 'preferential attachment.'"

Once the network has been constructed using this scheme, we randomly choose an origin node for our infection. The red dots in Fig. 1 are the nodes that have already contracted the infection. Once an origin has been chosen, the simulation runs as follows: for each infected node, each of its uninfected neighbours contracts the infection with probability 10%. Each such iteration counts as one tick, the unit of time we use for our measurements. The simulation therefore starts with a single red dot and runs until every dot has turned red. Given a particular network and a particular origin node, we record the percentage of nodes infected at each point in time. Fig. 2 is a sample graph generated from one such diffusion profile.
Fig. 2: Sample diffusion profile (panel labelled 1387), plotting percent infected against time (ticks).
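The network construction and infection rules described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' NetLogo code: each newcomer adds exactly one edge (as in Wilensky's description), infections are updated synchronously (a node infected during a tick starts transmitting on the next tick), and the function names preferential_attachment and diffuse are hypothetical.

```python
import random


def preferential_attachment(n_nodes, rng):
    """Grow a network of n_nodes (n_nodes >= 2) by preferential attachment.

    Starts with two nodes joined by an edge; each new node links to one
    existing node chosen with probability proportional to its degree.
    Returns an adjacency mapping {node: set_of_neighbours}.
    """
    neighbours = {0: {1}, 1: {0}}
    # Every edge adds both endpoints to this list, so a node appears once per
    # unit of degree and a uniform draw from it is degree-proportional.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        neighbours[new] = {target}
        neighbours[target].add(new)
        endpoints.extend([new, target])
    return neighbours


def diffuse(neighbours, origin, p_transmit, rng, max_ticks=10_000):
    """Simulate spreading from `origin`; return percent infected per tick.

    Index 0 of the result is the state before the first tick. In each tick,
    every infected node independently infects each uninfected neighbour with
    probability p_transmit; new infections take effect at the end of the
    tick (the synchronous-update assumption).
    """
    n = len(neighbours)
    infected = {origin}
    profile = [100.0 * len(infected) / n]
    while len(infected) < n and len(profile) <= max_ticks:
        newly_infected = set()
        for node in infected:
            for nb in neighbours[node]:
                if nb not in infected and rng.random() < p_transmit:
                    newly_infected.add(nb)
        infected |= newly_infected
        profile.append(100.0 * len(infected) / n)
    return profile


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so runs are reproducible
    network = preferential_attachment(500, rng)
    origin = rng.randrange(500)
    profile = diffuse(network, origin, 0.10, rng)
    print(f"origin {origin}: fully infected after {len(profile) - 1} ticks")
```

Because every newcomer adds a single edge, the resulting network is a tree with 499 edges, so it is connected and every node is eventually infected whenever the transmission probability is positive.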

We repeated this exercise for 60 randomly chosen origin nodes across three such large preferential attachment networks. Next, we used UCINet (Borgatti et al. 2002), a social network analysis tool, to calculate centrality parameters for each of our networks and for the originator nodes. Combining the data generated by our NetLogo simulations with the UCINet analysis, we built a dataset of 60 nodes, an extract of which is shown in Table 1.
NodeID  Degree  3Step  ARD    Eigenvector  per25  per50  per75  per90  per95
1003    0.002   0.416  0.291  0.118        33     44     56     70     82
1020    0.032   0.244  0.268  0.023        32     43     54     72     83
1075    0.012   0.28   0.264  0.028        20     30     45     60     70
1080    0.01    0.282  0.263  0.027        34     49     70     86     101
1084    0.002   0.416  0.291  0.118        19     31     46     67     76
1127    0.004   0.082  0.202  0.004        29     42     59     73     79

Table 1

In Table 1, the variables per25, per50, per75, per90 and per95 represent the number of ticks it took for 25%, 50%, 75%, 90% and 95% of the network, respectively, to be infected when the origin of the infection was the node indicated by NodeID. We use three standard centrality measures (as calculated by UCINet): Average Reciprocal Distance, 3-Step Reach and eigenvector centrality. These measures, well developed within the field of network analysis, take differing approaches to measuring how central a node is in a given network. We include their definitions and supporting notes in Appendix III.

Once the dataset of 60 nodes has been built, we use the variables per25, per50, per75, per90 and per95 to perform k-means clustering. K-means clustering is a method of cluster analysis that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. Based on the diffusion characteristic (as defined by the five time points along the curve shown in Fig. 2), we cluster our dataset into three distinct groups. The mean diffusion times in each of the three clusters are shown in Table 2.
Cluster  Distance clust-1  Distance clust-2  Distance clust-3  per25   per50   per75    per90    per95
bad      21.971            70.413            74.789            51.348  65.652  83.087   101.087  111.087
good     70.175            16.141            142.961           24.889  35.519  50.333   67.481   79.407
ugly     76.232            143.867           28.000            70.727  94.909  117.909  137.727  151.545

Table 2

Having clustered all of our nodes into three clusters based on their realized diffusion characteristics, we now need to show that the centrality measures of the nodes in these three clusters account for the behaviour they exhibit.
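The clustering step can be sketched as follows. Two assumptions to flag: the paper does not say which k-means implementation, initialisation or distance it used, so this is a plain Lloyd's algorithm with Euclidean distance and randomly chosen initial centroids; and the helper names (ticks_to_thresholds, kmeans) are hypothetical. The example input is only the six rows shown in Table 1; since the paper clustered all 60 nodes, running it on this extract is not expected to reproduce Table 2.

```python
import random

THRESHOLDS = (25, 50, 75, 90, 95)  # per25 ... per95


def ticks_to_thresholds(profile, thresholds=THRESHOLDS):
    """First tick at which percent infected reaches each threshold (None if never)."""
    return [next((tick for tick, pct in enumerate(profile) if pct >= t), None)
            for t in thresholds]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)
    return labels, centroids


# per25..per95 (ticks) for the six nodes shown in Table 1.
table1 = {
    1003: [33, 44, 56, 70, 82],
    1020: [32, 43, 54, 72, 83],
    1075: [20, 30, 45, 60, 70],
    1080: [34, 49, 70, 86, 101],
    1084: [19, 31, 46, 67, 76],
    1127: [29, 42, 59, 73, 79],
}
labels, centres = kmeans(list(table1.values()), 3, random.Random(0))
for node_id, label in zip(table1, labels):
    print(node_id, label)
```

Lloyd's algorithm only finds a local optimum, so in practice one would rerun it from several random initialisations and keep the solution with the smallest within-cluster sum of squares.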

3. RESULTS

First, we establish the nature of the three clusters obtained in Table 2. To do this, we plot an estimate of the diffusion curve of the average node in each cluster, defined by the average values of per25, per50, per75, per90 and per95 over the nodes in that cluster. This graph is shown in Fig. 3.

Fig. 3

As Fig. 3 shows, each cluster represents an average diffusion characteristic that can be strictly compared with that of any other cluster: at every one of the five time points, the cluster means are strictly ordered (Table 2). On average, nodes in the cluster we name good were the origins of faster diffusion than those in the cluster bad, whose nodes in turn did better than those in the cluster ugly. We show diffusion graphs for sample nodes from each of the three clusters in Appendix II. Having established this ranking of the three clusters, we are in a position to look at the centrality measures that characterise the nodes in each cluster. A summary of these centrality measures can be found in Table 3.
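The claim that the clusters can be strictly ranked can be checked directly against the cluster means reported in Table 2: one cluster ranks ahead of another only if its mean time is lower at every one of the five thresholds. A small sketch (the function names are hypothetical):

```python
# Cluster means from Table 2: ticks to reach 25, 50, 75, 90 and 95 percent infected.
CLUSTER_MEANS = {
    "good": [24.889, 35.519, 50.333, 67.481, 79.407],
    "bad": [51.348, 65.652, 83.087, 101.087, 111.087],
    "ugly": [70.727, 94.909, 117.909, 137.727, 151.545],
}


def strictly_faster(a, b):
    """True if curve `a` reaches every threshold in fewer ticks than curve `b`."""
    return all(x < y for x, y in zip(a, b))


def rank_clusters(means):
    """Order clusters from fastest to slowest; raise if two curves cannot be ranked."""
    names = sorted(means, key=lambda name: means[name][0])
    for faster, slower in zip(names, names[1:]):
        if not strictly_faster(means[faster], means[slower]):
            raise ValueError(f"{faster} and {slower} cannot be strictly ranked")
    return names


print(rank_clusters(CLUSTER_MEANS))  # ['good', 'bad', 'ugly']
```

Because strict dominance at every threshold is transitive, checking adjacent pairs in the sorted order is enough to rank all three clusters.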

Cluster  3Step  ARD    EigenVector
good     0.176  0.202  0.031
bad      0.100  0.155  0.014
ugly     0.043  0.052  0.001

Table 3

Table 3, our main results table, shows that all three centrality measures (3-Step Reach, ARD and eigenvector centrality) are highest for the cluster good, followed by the cluster bad and then the cluster ugly. This finding supports our hypothesis: nodes with high centrality measures are indeed good originators if rapid diffusion is desired.

4. FINAL THOUGHTS

Barabási and Albert (1999) report that preferential attachment networks can potentially be a good model for diverse real-world networks, including the World Wide Web and the network of scientific citations. We therefore feel confident that our results could be used to investigate what could turn out to be important phenomena in a wide variety of real-world networks.

Moreover, our study can be the starting point for a variety of allied investigations into the characteristics of originator nodes. In particular, we could differentiate among centrality measures and establish conditions under which some measures are more reliable than others. Another line of investigation could be to measure the impact of centrality on the speed of diffusion. A preliminary logistic regression, included in Appendix I, seems to indicate that there could be significant differences in the effect of each of the three measures in promoting rapid diffusion.

Finally, we conclude by restating our belief in the importance of analysing originator node centrality in understanding the nature of the diffusion process. Concentrating on where an idea begins could pay rich dividends in terms of how fast it spreads.

5. REFERENCES
Barabási, A.-L. (2000) Linked: The New Science of Networks. Perseus Publishing, Cambridge, Massachusetts, pp. 79-92.
Barabási, A.-L. and Albert, R. (1999) Emergence of scaling in random networks. Science, 286, pp. 509-512.
Borgatti, S.P., Everett, M.G. and Freeman, L.C. (2002) Ucinet for Windows: Software for Social Network Analysis. Harvard, MA: Analytic Technologies.
Wilensky, U. (1999) NetLogo. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.
Wilensky, U. (2005) NetLogo Preferential Attachment model. http://ccl.northwestern.edu/netlogo/models/PreferentialAttachment

Appendix I LOGIT REGRESSION

Appendix II SAMPLE DIFFUSION CHARACTERISTICS ACROSS CLUSTERS

Cluster 1: Bad (panels labelled 1387 and 1159)
Cluster 2: Good (panels labelled 2207 and 2468)
Cluster 3: Ugly (panels labelled 3303 and 3180)
Each panel plots percent infected against time (ticks).

Appendix III CENTRALITY MEASURES AND THEIR DEFINITIONS

Eigenvector Centrality

Given a network, we can define an adjacency matrix $A$ with $A_{ij} = 1$ if there is an edge between node $i$ and node $j$, and $A_{ij} = 0$ otherwise. The centrality of node $i$, denoted $c_i$, is then given by

$$c_i = \alpha \sum_j A_{ij} c_j$$

where $\alpha$ is a parameter chosen to ensure non-trivial solutions for centrality. In matrix form this reads $A\mathbf{c} = \frac{1}{\alpha}\mathbf{c}$, so the centralities thus obtained are the elements of an eigenvector of $A$, with $1/\alpha$ the corresponding eigenvalue. The normalized eigenvector centrality is the scaled eigenvector centrality divided by the maximum difference possible, expressed as a percentage. The greater the eigenvector centrality of a node, the greater its connectedness.

Average Reciprocal Distance

The farness of a node is the sum of its distances to all other nodes, and closeness is the reciprocal of this sum. Moving the reciprocal inside the summation, so that we sum the reciprocals of the individual distances, removes the effect of infinite distances between nodes that are not connected (such pairs simply contribute zero). The resulting quantity is normalized with respect to the maximum value of closeness to obtain the ARD in percentage terms. The ARD is thus a measure of closeness centrality, i.e. of how short the path lengths from a node to the other nodes are; the greater the ARD, the greater the connectedness of the node.

$$c_i = \sum_{j \neq i} \frac{1}{d_{ij}}$$

where $d_{ij}$ is the number of edges traversed along a shortest path from node $j$ to node $i$, and $c_i$ is the closeness parameter.

$$\text{Normalized ARD} = \frac{c_i}{c_{\max} - c_{\min}}$$

3-Step Reach

3-Step Reach measures the connectedness of a node within a circle of influence: for a node, it is the number of nodes reachable within three steps. The greater the 3-Step Reach of a node, the greater its circle of influence.
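The three measures above can be computed directly from an adjacency mapping. This is a minimal sketch, not UCINet's implementation, and the function names are hypothetical. Two scaling choices are assumptions: k-step reach and ARD are expressed as proportions of the other n - 1 nodes, and eigenvector centrality is found by power iteration and scaled to unit Euclidean length rather than by UCINet's percentage normalization, so its values are not directly comparable with Tables 1 and 3.

```python
import math
from collections import deque


def shortest_path_lengths(neighbours, source):
    """Breadth-first search: number of edges from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in neighbours[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def k_step_reach(neighbours, node, k=3):
    """Proportion of the other nodes that lie within k steps of `node`."""
    dist = shortest_path_lengths(neighbours, node)
    reached = sum(1 for other, d in dist.items() if other != node and d <= k)
    return reached / (len(neighbours) - 1)


def average_reciprocal_distance(neighbours, node):
    """Mean of 1/d_ij over the other nodes; unreachable nodes contribute 0."""
    dist = shortest_path_lengths(neighbours, node)
    total = sum(1.0 / d for other, d in dist.items() if other != node)
    return total / (len(neighbours) - 1)


def eigenvector_centrality(neighbours, max_iter=1000, tol=1e-10):
    """Power iteration for the leading eigenvector of the adjacency matrix A.

    Each step multiplies by (A + I): this has the same eigenvectors as A but,
    unlike A itself, still converges on connected bipartite graphs such as trees.
    """
    x = {v: 1.0 for v in neighbours}
    for _ in range(max_iter):
        y = {v: x[v] + sum(x[u] for u in neighbours[v]) for v in neighbours}
        norm = math.sqrt(sum(val * val for val in y.values()))
        y = {v: val / norm for v, val in y.items()}
        if sum(abs(y[v] - x[v]) for v in neighbours) < tol:
            return y
        x = y
    return x


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}  # a four-node chain 0-1-2-3
print(k_step_reach(path, 0, k=2))            # 2 of the 3 other nodes
print(average_reciprocal_distance(path, 0))  # (1 + 1/2 + 1/3) / 3
print(eigenvector_centrality(path))          # the two inner nodes score highest
```

Summing reciprocals of breadth-first-search distances is what lets ARD tolerate disconnected pairs, exactly as the definition above intends: a node that is never reached simply adds nothing to the sum.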
