Getting Started With Graph Analysis in Python With Pandas and Networkx

This document is a tutorial on graph analysis in Python with pandas and NetworkX. It shows how to build a graph from a pandas DataFrame by connecting individuals who share the same phone number. The data is first cleaned to remove duplicate connections and self-connections, and the cleaned edge list is then used to build a NetworkX graph object that can be analyzed with graph algorithms.

Uploaded by ante mitar

Getting started with graph analysis in Python with pandas and networkx |... https://towardsdatascience.com/getting-started-with-graph-analysis-in-py...
4/25/2021, 3:23 PM

import pandas as pd

# Sample contact data. The 'Phone number' and 'Email' lists are cut off in the
# source (and the addresses were obfuscated), so their remaining entries are
# shown as "..." and must be filled in before this runs.
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6],
                   'First Name': ['Felix', 'Jean', 'James', 'Daphne', 'James', 'Peter'],
                   'Family Name': ['Revert', 'Durand', 'Wright', 'Hull', 'Conrad', 'Donovan'],
                   'Phone number': ['+33 6 12 34 56 78', '+33 7 00 00 00 00', '+33 6 12 34 56 78', ...],
                   'Email': [...]})

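Only the first three rows of the sample are fully visible above, so this quick sanity check uses just those (the email column is left out). Before building any graph, it can help to list which IDs sit behind each phone number: any number with two or more IDs is one that will produce edges. This is a minimal sketch of such a check with `groupby`, not a step from the original tutorial.

```python
import pandas as pd

# The three fully visible rows of the source data (email column omitted)
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'First Name': ['Felix', 'Jean', 'James'],
    'Family Name': ['Revert', 'Durand', 'Wright'],
    'Phone number': ['+33 6 12 34 56 78', '+33 7 00 00 00 00', '+33 6 12 34 56 78'],
})

# Collect the IDs attached to each phone number
ids_per_number = df.groupby('Phone number')['ID'].apply(list)

# Keep only the numbers shared by more than one person
shared = ids_per_number[ids_per_number.apply(len) > 1]
print(shared)
```

Here only '+33 6 12 34 56 78' is shared, by IDs 1 and 3, so those two people should end up connected.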

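# Column whose shared values define a connection, and column identifying each person
column_edge = 'Phone number'
column_ID = 'ID'

# Keep only the ID and phone-number columns, drop rows with no phone number,
# and remove duplicate (ID, phone number) pairs
data_to_merge = df[[column_ID, column_edge]].dropna(subset=[column_edge]).drop_duplicates()

# To create connections between people who have the same number,
# join the data with itself on the phone-number column.
data_to_merge = data_to_merge.merge(
    data_to_merge[[column_ID, column_edge]].rename(columns={column_ID: column_ID + "_2"}),
    on=column_edge
)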

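To see why the cleaning step that follows is needed, here is the same self-join run on the three rows whose phone numbers are fully visible in the source. Column names are hard-coded instead of using `column_ID` and `column_edge`, purely to keep the sketch self-contained.

```python
import pandas as pd

# The three fully visible rows of the source data (ID and phone number only)
data = pd.DataFrame({
    'ID': [1, 2, 3],
    'Phone number': ['+33 6 12 34 56 78', '+33 7 00 00 00 00', '+33 6 12 34 56 78'],
})

# Self-join on the phone number: every pair of IDs sharing a number becomes a
# row in both orders, and each ID is also paired with itself.
merged = data.merge(data.rename(columns={'ID': 'ID_2'}), on='Phone number')
print(merged)
```

The result has five rows: the genuine pair (1, 3), its mirror (3, 1), and the self-pairs (1, 1), (2, 2) and (3, 3). Only one of those rows carries real information.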

# Joining the data with itself also connects every person to themselves.
# Remove these self-connections to keep only pairs of different people.
d = data_to_merge[data_to_merge[column_ID] != data_to_merge[column_ID + "_2"]] \
    .dropna()[[column_ID, column_ID + "_2", column_edge]]

# To avoid counting each connection twice (person 1 connected to person 2 and
# person 2 connected to person 1), keep only rows where ID is lower than ID_2.
d = d[d[column_ID] < d[column_ID + "_2"]]

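Because the IDs are comparable, the two cleaning filters can also be collapsed into a single condition: requiring ID to be strictly lower than ID_2 discards self-connections (ID equal to ID_2) and mirrored duplicates (ID greater than ID_2) at once. This is an alternative formulation, sketched on the self-joined rows from the three visible records, not the tutorial's own code.

```python
import pandas as pd

# Self-joined rows for the three visible source records: (ID, ID_2, phone number)
merged = pd.DataFrame({
    'ID':   [1, 1, 3, 3, 2],
    'ID_2': [1, 3, 1, 3, 2],
    'Phone number': ['+33 6 12 34 56 78'] * 4 + ['+33 7 00 00 00 00'],
})

# A strict "<" drops self-connections and reversed duplicates in one step
edges = merged[merged['ID'] < merged['ID_2']].reset_index(drop=True)
print(edges)
```

Only the row (1, 3, '+33 6 12 34 56 78') survives, which is the single connection the sample data implies. The two-step version above is easier to read step by step; the one-line filter is shorter, at the cost of relying on the IDs being orderable.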

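import networkx as nx

# Build an undirected graph with one edge per connected pair, keeping the shared
# phone number as an edge attribute
G = nx.from_pandas_edgelist(df=d, source=column_ID, target=column_ID + '_2', edge_attr=column_edge)

# Add every individual as a node, so people who share no number with anyone
# still appear in the graph as isolated nodes
G.add_nodes_from(nodes_for_adding=df.ID.tolist())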

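Once the graph exists, standard NetworkX algorithms apply. As one possible example (not necessarily the analysis the article goes on to do), this sketch rebuilds the graph for the three visible records, then uses `nx.connected_components` to find groups of people linked by shared numbers and `nx.isolates` to find people linked to no one.

```python
import networkx as nx
import pandas as pd

# Cleaned edge list for the three visible source records, plus all their IDs
edges = pd.DataFrame({'ID': [1], 'ID_2': [3], 'Phone number': ['+33 6 12 34 56 78']})
all_ids = [1, 2, 3]

G = nx.from_pandas_edgelist(edges, source='ID', target='ID_2', edge_attr='Phone number')
G.add_nodes_from(all_ids)  # ID 2 shares no number, so it is added as an isolated node

# Each connected component is a group of people linked by shared numbers
groups = [sorted(c) for c in nx.connected_components(G)]
print(groups)

# The shared phone number is stored on the edge as an attribute
print(G.edges[1, 3]['Phone number'])

# Nodes with no edges at all
print(list(nx.isolates(G)))
```

On larger data, the same two calls scale to finding every cluster of people tied together through chains of shared numbers, not only direct pairs.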
