
Social Network Analysis

The document discusses social network analysis and centrality measures. It introduces concepts like degree centrality, closeness centrality, betweenness centrality, and clustering coefficient. It also demonstrates creating undirected and directed graphs in NetworkX and calculating centrality measures on sample graphs.

Uploaded by

Nipuni
Copyright
© All Rights Reserved

social-network-analysis

March 24, 2024

[22]: # Get the required Packages

import numpy as np # linear algebra


import pandas as pd # data processing
import networkx as nx # network creation
import matplotlib.pyplot as plt # visualization
from IPython import display # for displaying Image

0.1 1. Undirected Graph


This section creates an undirected graph and adds nodes and edges to it.
[2]: # To create an empty undirected graph
G = nx.Graph()

# To add a node
G.add_node(1)
G.add_node(2)
G.add_node(3)
G.add_node(4)
G.add_node(7)
G.add_node(9)

#visualize
nx.draw_networkx(G)

[3]: # Adding multiple nodes at a time
G1=nx.Graph()
G1.add_nodes_from(['A','B','C','D','E','F'])

#visualize
nx.draw_networkx(G1)

[4]: # Adding multiple edges at a time
G1.add_edges_from([('A','B'), ('C','A'), ('B','D'), ('D','A'),
                   ('F','E'), ('E','C'), ('D','F')])

#visualize
nx.draw_networkx(G1)

[5]: # To add an edge. Since the graph is undirected, the order of the nodes in an edge doesn't matter.

G.add_edge(1,2)
G.add_edge(3,1)
G.add_edge(2,4)
G.add_edge(4,1)
G.add_edge(9,1)
G.add_edge(1,7)
G.add_edge(2,9)

# Adding multiple edges at a time


#G.add_edges_from([(1,2), (3,1),(2,4),(4,1),(9,1),(1,7),(2,9)])

#visualize
nx.draw_networkx(G)

[6]: # Homework - Try to generate an adjacency matrix for the above.
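
One possible answer to the homework, as a minimal sketch in plain Python. The edge list is copied from cell [5]; the row and column order in `nodes` is an assumption chosen to match the order in which the nodes were added. With NetworkX itself, `nx.to_numpy_array(G, nodelist=nodes)` should give the same matrix.

```python
# Edge list copied from cell [5]; the graph is undirected.
edges = [(1, 2), (3, 1), (2, 4), (4, 1), (9, 1), (1, 7), (2, 9)]
nodes = [1, 2, 3, 4, 7, 9]  # assumed row/column order

index = {node: i for i, node in enumerate(nodes)}
adjacency = [[0] * len(nodes) for _ in nodes]

for u, v in edges:
    # Undirected edge: mark both (u, v) and (v, u).
    adjacency[index[u]][index[v]] = 1
    adjacency[index[v]][index[u]] = 1

for node, row in zip(nodes, adjacency):
    print(node, row)
```

Each row sums to the degree of its node, and the matrix is symmetric because the graph is undirected.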

[7]: # To get all the nodes of a graph


node_list = G.nodes()
print("#1 Node List")
print(node_list)

# To get all the edges of a graph


edge_list = G.edges()
print("#2 Edge List")
print(edge_list)

#1 Node List
[1, 2, 3, 4, 7, 9]
#2 Edge List
[(1, 2), (1, 3), (1, 4), (1, 9), (1, 7), (2, 4), (2, 9)]

[8]: # To remove a node of a graph


G.remove_node(3)
node_list = G.nodes()
print("Node 3 is removed")
print(node_list)

#visualize
nx.draw_networkx(G)

Node 3 is removed
[1, 2, 4, 7, 9]

[9]: G1.remove_node('E')
nx.draw_networkx(G1)

[ ]:

[10]: # To remove an edge of a graph


G.remove_edge(1,2)
edge_list = G.edges()
print("Edge connecting node 1 and 2 is removed")
print(edge_list)

#visualize
nx.draw_networkx(G)

Edge connecting node 1 and 2 is removed


[(1, 4), (1, 9), (1, 7), (2, 4), (2, 9)]

[11]: # To find number of nodes
n = G.number_of_nodes()
print("Number of Nodes")
print(n)

# To find number of edges


m = G.number_of_edges()
print("Number of Edges")
print(m)

# To find degree of a node


# d will store degree of node 2
d = G.degree(2)
print("The degree of Node 2")
print(d)

# To find all the neighbor of a node


neighbor_list = list(G.neighbors(2))
print("Neighbour list of Node 2")
print(neighbor_list)

Number of Nodes
5
Number of Edges
5
The degree of Node 2
2
Neighbour list of Node 2
[4, 9]

[12]: # G.neighbors(2) returns an iterator (a dict_keyiterator), so cell [11] converts it into a list

0.2 Centrality Measures


0.2.1 1. Degree Centrality - number of connections for a node
This is based on the assumption that important nodes have many connections. It is useful for finding
highly connected or popular individuals, individuals who are likely to hold the most information, and
individuals who can quickly connect with the wider network.
Centrality_{degree}(v) = d_v/(|N|-1), where d_v is the degree of node v and N is the set of all
nodes of the graph.
[13]: centrality = nx.degree_centrality(G)
centrality

[13]: {1: 0.75, 2: 0.5, 4: 0.5, 7: 0.25, 9: 0.5}
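
As a check on the formula, the values above can be reproduced by hand. This is a minimal sketch in plain Python, not how NetworkX is implemented; the edge list is the state of G after cells [8] and [10] removed node 3 and the edge (1, 2).

```python
from collections import Counter

# Edges of G after node 3 and edge (1, 2) were removed.
edges = [(1, 4), (1, 9), (1, 7), (2, 4), (2, 9)]

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

n = len(degree)  # every remaining node has at least one edge
degree_centrality = {v: degree[v] / (n - 1) for v in sorted(degree)}
print(degree_centrality)
```

Node 1 has three of the four possible connections, which gives 3/4 = 0.75.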

[14]: # For directed graphs


#in_deg_centrality = nx.in_degree_centrality(G)
#out_deg_centrality = nx.out_degree_centrality(G)

0.2.2 2. Closeness Centrality - Closest node in the network


This is based on the assumption that important nodes are close to other nodes. It measures how close
a node is to all other nodes: for a connected graph it is (|N|-1) divided by the sum of the
shortest-path lengths from the given node to all other nodes, that is, the reciprocal of the average
distance. It is used for finding the individuals who are best placed to influence the entire network
most quickly.
[15]: close_centrality = nx.closeness_centrality(G)

# G is the Graph
close_centrality

[15]: {1: 0.8,


2: 0.5714285714285714,
4: 0.6666666666666666,
7: 0.5,
9: 0.6666666666666666}
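
The closeness values above can be reproduced with a breadth-first search. The sketch below assumes a connected, unweighted graph and uses closeness(v) = (|N|-1) / (sum of shortest-path lengths from v); the helper name `bfs_distances` is illustrative only.

```python
from collections import deque

edges = [(1, 4), (1, 9), (1, 7), (2, 4), (2, 9)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def bfs_distances(source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

n = len(adj)
closeness = {}
for v in sorted(adj):
    total = sum(bfs_distances(v).values())  # assumes the graph is connected
    closeness[v] = (n - 1) / total
print(closeness)
```

For node 1 the distances to nodes 4, 9, 7 and 2 are 1, 1, 1 and 2, so the result is 4/5 = 0.8.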

0.2.3 3. Betweenness Centrality - Nodes that act as bridges
Betweenness centrality measures how often a node lies on a shortest path between two other
nodes. It shows which nodes are ‘bridges’ between other nodes in a network: all shortest paths are
identified, and for each node we count how many of them pass through it. When a pair of nodes has
several shortest paths, each path contributes an equal fraction.
For graphs with a large number of nodes, the raw count can become very large, so we can normalize
it by dividing by the number of node pairs (excluding the current node). For directed graphs the
number of node pairs is (|N|-1)(|N|-2), while for undirected graphs it is (1/2)(|N|-1)(|N|-2).

[16]: bet_centrality = nx.betweenness_centrality(G, normalized=True, endpoints=False)

# normalized controls whether the values are normalized; endpoints controls
# whether the endpoints of each path are counted as lying on it.

bet_centrality

[16]: {1: 0.5833333333333333,


2: 0.08333333333333333,
4: 0.16666666666666666,
7: 0.0,
9: 0.16666666666666666}
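
To see where these numbers come from, the sketch below counts shortest paths directly. It is a brute-force illustration for small undirected graphs, not the algorithm NetworkX uses; `bfs_counts` is a hypothetical helper name. A node v lies on a shortest path from s to t exactly when d(s,v) + d(v,t) = d(s,t).

```python
from collections import deque
from itertools import combinations

edges = [(1, 4), (1, 9), (1, 7), (2, 4), (2, 9)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def bfs_counts(source):
    """Return (dist, sigma): distance and number of shortest paths from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

info = {v: bfs_counts(v) for v in adj}
betweenness = {v: 0.0 for v in adj}
for s, t in combinations(sorted(adj), 2):
    dist_s, sigma_s = info[s]
    dist_t, sigma_t = info[t]
    for v in adj:
        if v in (s, t):
            continue
        # Fraction of shortest s-t paths that pass through v.
        if dist_s[v] + dist_t[v] == dist_s[t]:
            betweenness[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]

n = len(adj)
pairs = (n - 1) * (n - 2) / 2  # undirected normalization from the text
betweenness = {v: betweenness[v] / pairs for v in sorted(adj)}
print(betweenness)
```

Node 1 carries every shortest path to node 7 plus half of the paths between nodes 4 and 9, for a raw score of 3.5; dividing by the 6 node pairs gives 0.583.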

[17]: # Question: What is Page Rank?
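
A short answer to the question, offered as a sketch: PageRank scores a node by the rank of the nodes that link to it, so a link from an important node counts for more than a link from an obscure one. The code below is a plain power iteration on a hypothetical four-node directed graph; the function name, the damping value of 0.85 and the iteration count are assumptions made for illustration. NetworkX provides `nx.pagerank(G, alpha=0.85)` for real use.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed edge list (illustrative only)."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for u, v in edges:
        out_links[u].append(v)

    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return rank

# Hypothetical four-page web: A -> B -> C -> A, plus D -> C.
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "C")]
ranks = pagerank(toy_edges)
print(ranks)
```

In this toy graph C ends up with the highest rank because it receives links from two nodes, while D, which nothing links to, keeps only the baseline share.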

0.3 Clustering Coefficient


[18]: # average clustering coefficient (the mean of the local coefficients of all nodes)
Clustering_coeff=nx.average_clustering(G)
Clustering_coeff

[18]: 0.0

[19]: # local clustering coefficient


CC=nx.clustering(G)
CC

[19]: {1: 0, 2: 0, 4: 0, 7: 0, 9: 0}
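
Every coefficient is 0 here because G contains no triangles. The sketch below computes the local coefficient by hand as 2T / (k(k-1)), where k is the number of neighbors of a node and T is the number of links among those neighbors. The extra edge (4, 9) is hypothetical, added only to show non-zero values, and `local_clustering` is an illustrative helper name.

```python
from itertools import combinations

def local_clustering(edges):
    """Fraction of each node's neighbor pairs that are themselves connected."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    result = {}
    for v in sorted(adj):
        k = len(adj[v])
        if k < 2:
            result[v] = 0.0  # fewer than two neighbors: no pairs to close
            continue
        links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
        result[v] = 2 * links / (k * (k - 1))
    return result

edges = [(1, 4), (1, 9), (1, 7), (2, 4), (2, 9)]  # G as it stands in cell [19]
print(local_clustering(edges))

# Hypothetical extra edge (4, 9), not in the notebook, closes two triangles.
with_triangle = local_clustering(edges + [(4, 9)])
print(with_triangle)
print(sum(with_triangle.values()) / len(with_triangle))  # average clustering
```

With the extra edge, node 2 reaches 1.0 because its only two neighbors, 4 and 9, are now linked to each other.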

[20]: #To delete all the nodes and edges


G.clear()

#visualize
nx.draw_networkx(G)

0.4 Example - Symmetric Network
Below you see a network of Bollywood actors as nodes. They are connected with solid lines if they
have worked together in at least one movie.
This network of actors is a symmetric network because the relationship “working together in a
movie” is symmetric: if A is related to B, then B is also related to A. Let us create this network
in NetworkX.
We will use the Graph() constructor to create a new network and the add_edge() method to add an
edge between two nodes.
[23]: display.Image("C:/Users/ramaleer/Desktop/python/Practical 4/image.png")
[23]:

Each network consists of:
• Nodes: The individuals whose network we are building. Actors in the above example.
• Edges: The connection between the nodes. It represents a relationship between the nodes of
the network. In our example, the relationship was that the actors have worked together.
[24]: # Graph = This class implements an undirected graph

G_symmetric = nx.Graph() # Initialize graph


G_symmetric.add_edge('Amitabh Bachchan','Abhishek Bachchan')
G_symmetric.add_edge('Amitabh Bachchan','Aamir Khan')
G_symmetric.add_edge('Amitabh Bachchan','Akshay Kumar')
G_symmetric.add_edge('Amitabh Bachchan','Dev Anand')
G_symmetric.add_edge('Abhishek Bachchan','Aamir Khan')
G_symmetric.add_edge('Abhishek Bachchan','Akshay Kumar')
G_symmetric.add_edge('Abhishek Bachchan','Dev Anand')
G_symmetric.add_edge('Dev Anand','Aamir Khan')

[25]: #visualize what we have generated
nx.draw_networkx(G_symmetric)

0.5 2. Directed Graph


[26]: # DiGraph stands for directed graphs
G = nx.DiGraph()

# create the edges - The order matters!!


G.add_edges_from([(1, 1), (1, 7), (2, 1), (2, 2), (2, 3),
(2, 6), (3, 5), (4, 3), (5, 4), (5, 8),
(5, 9), (6, 4), (7, 2), (7, 6), (8, 7)])

#plot size specification


plt.figure(figsize =(5, 5))
nx.draw_networkx(G,node_color = 'pink')

[ ]:

[27]: # getting different graph attributes

print("Total number of nodes: ", G.number_of_nodes())
print("Total number of edges: ", G.number_of_edges())
print("List of all nodes: ", list(G.nodes()))
print("List of all edges: ", list(G.edges()))
print("In-degree for all nodes: ", dict(G.in_degree()))
print("Out degree for all nodes: ", dict(G.out_degree()))

Total number of nodes: 9
Total number of edges: 15
List of all nodes: [1, 7, 2, 3, 6, 5, 4, 8, 9]
List of all edges: [(1, 1), (1, 7), (7, 2), (7, 6), (2, 1), (2, 2), (2, 3), (2, 6), (3, 5), (6, 4), (5, 4), (5, 8), (5, 9), (4, 3), (8, 7)]
In-degree for all nodes: {1: 2, 7: 2, 2: 2, 3: 2, 6: 2, 5: 1, 4: 2, 8: 1, 9: 1}
Out degree for all nodes: {1: 2, 7: 2, 2: 4, 3: 1, 6: 1, 5: 3, 4: 1, 8: 1, 9: 0}

[28]: print("List of all nodes we can go to in a single step from node 2: ",
      list(G.successors(2)))
print("List of all nodes from which we can go to node 2 in a single step: ",
      list(G.predecessors(2)))

List of all nodes we can go to in a single step from node 2: [1, 2, 3, 6]


List of all nodes from which we can go to node 2 in a single step: [2, 7]

[29]: print("In degree centrality(Inbound Links): ",nx.in_degree_centrality(G))


print("Out Degree centrality(Outbound Links): ",nx.out_degree_centrality(G))

In degree centrality(Inbound Links): {1: 0.25, 7: 0.25, 2: 0.25, 3: 0.25, 6: 0.25, 5: 0.125, 4: 0.25, 8: 0.125, 9: 0.125}
Out Degree centrality(Outbound Links): {1: 0.25, 7: 0.25, 2: 0.5, 3: 0.125, 6: 0.125, 5: 0.375, 4: 0.125, 8: 0.125, 9: 0.0}
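
These values follow the same rule as the undirected case, applied separately to incoming and outgoing edges: the in-degree or out-degree divided by |N|-1. The plain-Python sketch below recomputes them from the edge list in cell [26]; it is meant as a cross-check, not as a description of NetworkX internals.

```python
from collections import Counter

edges = [(1, 1), (1, 7), (2, 1), (2, 2), (2, 3),
         (2, 6), (3, 5), (4, 3), (5, 4), (5, 8),
         (5, 9), (6, 4), (7, 2), (7, 6), (8, 7)]

nodes = sorted({n for edge in edges for n in edge})
out_degree = Counter(u for u, _ in edges)  # edges leaving each node
in_degree = Counter(v for _, v in edges)   # edges arriving at each node

n = len(nodes)
in_centrality = {v: in_degree[v] / (n - 1) for v in nodes}
out_centrality = {v: out_degree[v] / (n - 1) for v in nodes}
print(in_centrality)
print(out_centrality)
```

Node 2 has four outgoing edges out of a possible eight targets, hence 0.5, while node 9 has none and scores 0.0. Note that a self-loop such as (1, 1) counts once as an incoming and once as an outgoing edge.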

1 The Marvel Universe SNA


The dataset contains heroes and comics, and the relationships between them. It is divided
into three files:
• nodes.csv: two columns (node, type), giving the name and the type (comic or hero) of each node.
• edges.csv: two columns (hero, comic), indicating in which comics each hero appears.
• hero-network.csv: the network of heroes who appear together in the comics. This is the file
loaded in the next cell.
[30]: # import the dataset
hero_network = pd.read_csv('C:/Users/ramaleer/Desktop/python/Practical 4/dataset/hero-network.csv')

# display the top 5


hero_network.head()

[30]: hero1 hero2


0 LITTLE, ABNER PRINCESS ZANDA
1 LITTLE, ABNER BLACK PANTHER/T'CHAL
2 BLACK PANTHER/T'CHAL PRINCESS ZANDA
3 LITTLE, ABNER PRINCESS ZANDA
4 LITTLE, ABNER BLACK PANTHER/T'CHAL
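
The first five rows already contain repeated pairs. An nx.Graph keeps only one edge per pair of nodes, so repeats do not add edges in the analysis that follows. If the number of co-appearances matters, one option, not used in this notebook, is to count each pair and keep the count as an edge weight. The sketch below does this in plain Python on the five rows shown above; treating a repeated row as a repeated co-appearance is an assumption.

```python
from collections import Counter

# The five rows shown by hero_network.head().
rows = [
    ("LITTLE, ABNER", "PRINCESS ZANDA"),
    ("LITTLE, ABNER", "BLACK PANTHER/T'CHAL"),
    ("BLACK PANTHER/T'CHAL", "PRINCESS ZANDA"),
    ("LITTLE, ABNER", "PRINCESS ZANDA"),
    ("LITTLE, ABNER", "BLACK PANTHER/T'CHAL"),
]

# Sort each pair so (a, b) and (b, a) count as the same undirected edge.
weights = Counter(tuple(sorted(pair)) for pair in rows)
for (a, b), w in weights.items():
    print(f"{a} -- {b}: {w}")
```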

[31]: # Size of the Hero Network


hero_network.shape

[31]: (574467, 2)

[32]: hero_network.describe()

[32]:                   hero1            hero2
      count            574467           574467
      unique             6211             6173
      top     CAPTAIN AMERICA  CAPTAIN AMERICA
      freq               8149             8350

[33]: # Selecting a sample of hero_network: the records relevant to Thor, Captain America and Iron Man

[34]: # selecting 3 subgroups with 25 records in each sample

np.random.seed(100)  # has no effect on the samples below, which pass random_state explicitly

Thor = hero_network[hero_network['hero1'] == 'THOR/DR. DONALD BLAK'].sample(25, random_state=0)

Cap = hero_network[hero_network['hero1'] == 'CAPTAIN AMERICA'].sample(25, random_state=0)

IronMan = hero_network[hero_network['hero1'].str.contains('IRON MAN/TONY STARK')].sample(25, random_state=0)
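
The Iron Man filter uses str.contains where the other two use ==. A likely reason, judging from the clustering output later in this notebook, is that some names in the file carry a trailing space ('IRON MAN/TONY STARK '), so an exact comparison would find nothing. The plain-Python sketch below illustrates the difference on three names; in pandas, hero_network['hero1'].str.strip() would be one way to normalize the column before comparing.

```python
# Names as they appear in the local clustering output of this notebook;
# note the trailing space on the last two.
names = ["CAPTAIN AMERICA", "IRON MAN/TONY STARK ", "VISION "]

target = "IRON MAN/TONY STARK"
exact_matches = [n for n in names if n == target]
substring_matches = [n for n in names if target in n]
stripped_matches = [n for n in names if n.strip() == target]

print(exact_matches)      # []
print(substring_matches)  # ['IRON MAN/TONY STARK ']
print(stripped_matches)   # ['IRON MAN/TONY STARK ']
```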

[35]: #combine the 3 samples together


Subset = pd.concat([Thor,Cap,IronMan],axis = 0)
Subset.head()

[35]: hero1 hero2


294518 THOR/DR. DONALD BLAK BLACK PANTHER/T'CHAL
109106 THOR/DR. DONALD BLAK BANNER, BETTY ROSS T
340853 THOR/DR. DONALD BLAK ENCHANTRESS/AMORA/HE
260226 THOR/DR. DONALD BLAK FALCON/SAM WILSON
84946 THOR/DR. DONALD BLAK WASP/JANET VAN DYNE

[36]: G=nx.from_pandas_edgelist(Subset, 'hero1', 'hero2')

plt.figure(figsize = (20,20))
nx.draw(G, with_labels=True,node_size = 8)
plt.show()

[37]: # Degree Centrality
deg_centrality=nx.degree_centrality(G)
degcentral_node= max(deg_centrality, key=deg_centrality.get)
print("Degree Centrality :", degcentral_node)

# Closeness Centrality
close_centrality=nx.closeness_centrality(G)
closeness_node=max(close_centrality, key=close_centrality.get)
print("Closeness Centrality :", closeness_node)

# Betweenness Centrality
bet_centrality = nx.betweenness_centrality(G, normalized=True, endpoints=False)
betcentral_node = max(bet_centrality, key=bet_centrality.get)
print("Betweenness Centrality :", betcentral_node)

Degree Centrality : CAPTAIN AMERICA


Closeness Centrality : IRON MAN/TONY STARK
Betweenness Centrality : IRON MAN/TONY STARK

[ ]:

[38]: #nx.degree_centrality(G)

[39]: # without the name labels


pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, node_size=100)
nx.draw_networkx_edges(G, pos, alpha=0.5)
plt.show()

[40]: # clustering coefficients
# nx.average_clustering returns the mean of the local coefficients

print("Global CC")
nx.average_clustering(G)

Global CC

[40]: 0.027260432414710115

[41]: print("Local CC")


local_cc=nx.clustering(G)
local_cc

Local CC

[41]: {'THOR/DR. DONALD BLAK': 0.003952569169960474,


"BLACK PANTHER/T'CHAL": 0,
'BANNER, BETTY ROSS T': 0,
'ENCHANTRESS/AMORA/HE': 0,
'FALCON/SAM WILSON': 0,
'WASP/JANET VAN DYNE ': 0,
'BOLT, COUNCILMAN AND': 0,
'BEETLE/ABNER RONALD ': 0,
'MEDUSA/MEDUSALITH AM': 0,
'DR. SPECTRUM II/DR. ': 0,
'QUASAR III/WENDELL V': 0,
'JARVIS, EDWIN ': 0,
'WRECKER III/DIRK GAR': 0,
'VISION ': 0.6666666666666666,
'HUGIN': 0,
'FIREBIRD/BONITA JUAR': 0,
'GALACTUS/GALAN': 0,
'HOTSHOT/LOUIS': 0,
'WONDER MAN/SIMON WIL': 0,
'DR. STRANGE/STEPHEN ': 0,
'CAPTAIN MARVEL/CAPTA': 0,
'BANNON, LANCE': 0,
'ODIN [ASGARDIAN]': 0,
'CAPTAIN AMERICA': 0.008658008658008658,
'THING/BENJAMIN J. GR': 0,
'NOVA/RICHARD RIDER': 0,
'THANOS': 0,
'CLINTON, BILL': 0,
'HORTON, PROFESSOR PH': 0,
'LEECH': 0,
'CAPTAIN MARVEL II/MO': 0,
'ANDROMEDA/ANDROMEDA ': 0,
'SCARLET CENTURION': 0,
'PAGAN': 0,
'IRONCLAD': 0,
'ROSENTHAL, BERNIE': 0,
'RUDOLFO, PRINCE': 0,
'TITANIA II/MARY SKEE': 1.0,
'IRON MAN/TONY STARK ': 0.010869565217391304,

'STORM, CHILI': 0,
'THUNDERSTRIKE/ERIC K': 0,
'HULK DOPPELGANGER': 0,
'TUATARA/COMMANDER AR': 0,
'USAGENT/CAPTAIN JOHN': 0,
'TYPHON': 0,
'RODGERS, MARIANNE': 0,
'ULTIMO': 0,
'ANT-MAN/DR. HENRY J.': 0,
'BYRD, SEN. HARRINGTO': 0,
'AQUARIAN/WUNDARR': 0,
'SPIDER-MAN/PETER PAR': 0,
'CYCLOPS/SCOTT SUMMER': 0,
'VENUS/APHRODITE/VICT': 0,
'MOON KNIGHT/MARC SPE': 0,
'MR. FANTASTIC/REED R': 0,
'ARBOGAST, BAMBI': 0,
'BINARY/CAROL DANVERS': 0,
'SCARLET WITCH/WANDA ': 0,
'MIRAGE': 0,
'DOLLAR BILL': 0,
'DR. SPECTRUM/JOSEPH ': 0,
'HAWK': 0}

[42]: # Find the maximum clustering coefficient and its corresponding node
max_clustering_node = max(local_cc, key=local_cc.get)
max_clustering_node

[42]: 'TITANIA II/MARY SKEE'
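
Note that max returns a single node, and only the first one if several nodes tie. A coefficient of 1.0 can also come from a node with just two neighbors that happen to be linked, so a ranked list may be more informative than the single maximum. The sketch below ranks a handful of values copied from the output above; the choice of the top three is arbitrary.

```python
# A few values copied from the local clustering output above.
local_cc = {
    "THOR/DR. DONALD BLAK": 0.003952569169960474,
    "VISION ": 0.6666666666666666,
    "CAPTAIN AMERICA": 0.008658008658008658,
    "TITANIA II/MARY SKEE": 1.0,
    "IRON MAN/TONY STARK ": 0.010869565217391304,
    "HAWK": 0,
}

# Sort by coefficient, highest first, and keep the top three.
top3 = sorted(local_cc.items(), key=lambda item: item[1], reverse=True)[:3]
for name, value in top3:
    print(f"{name!r}: {value:.4f}")
```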

dataset: https://www.kaggle.com/datasets/csanhueza/the-marvel-universe-social-network/code?select=nodes.csv
blog article for further reading: https://www.roxanne-euproject.org/news/blog/social-network-analysis-for-criminology-in-roxanne
[ ]:
