Data Structures (Spring 2021) Semester Project: Network Analysis Using Graphs
Data Structures (Spring 2021) Semester Project: Network Analysis Using Graphs
Overview
There are three datasets given with this project statement. You are required to complete the tasks, given
below, for any one of the datasets. At the maximum, you can try any two of the three datasets for more
understanding on how graphs help in investigating hidden patterns in the data (you can read it as, “for
bonus marks”). You should first try to complete all the tasks on the smallest dataset and then use the
same code for the others. (170 marks)
You are also required to submit a one-page report about your understanding of the dataset. In the report
you are required to explain the patterns in the dataset as much as you can. (10 marks)
Note: Your program MUST present a cool menu to run the tasks one at a time or all at once. We will use
this menu to test your program. If the menu is not cool/useful, then we will deduct 20 marks from the
overall obtained marks. If the output is not cool/readable, we will deduct further 10 marks from the
overall obtained marks.
Grading will be done individually after a demo. If a member is unable to explain the working/demo of
the project, he/she will be given the minimum (say zero) marks in the project while others will get
according to their understanding.
1
National University of Computer and Emerging Sciences
Arxiv GR-QC (General Relativity and Quantum Cosmology) collaboration network is from the e-print arXiv
(https://arxiv.org/archive/gr-qc) and covers scientific collaborations between authors papers submitted
to General Relativity and Quantum Cosmology category. If an author i co-authored a paper with author j,
the graph contains a undirected edge from i to j. If the paper is co-authored by k authors this generates a
completely connected (sub)graph on k nodes.
The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few
months of the inception of the arXiv, and thus represents essentially the complete history of its GR-QC
section.
2. Astro Physics collaboration network Dataset Description
Arxiv ASTRO-PH (Astro Physics) collaboration network is from the e-print arXiv
(https://arxiv.org/archive/astro-ph) and covers scientific collaborations between authors papers
submitted to Astro Physics category. If an author i co-authored a paper with author j, the graph contains
a undirected edge from i to j. If the paper is co-authored by k authors this generates a completely
connected (sub)graph on k nodes.
The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few
months of the inception of the arXiv, and thus represents essentially the complete history of its ASTRO-
PH section.
Tasks
First Set of Tasks – Graph Stats
1. Display the number of nodes (5 marks)
2. Display the number of edges (5 marks)
3. Display the number of source nodes (5 marks)
4. Display the number of sink nodes (5 marks)
5. Display the number of isolated nodes (5 marks)
6. Display the number of bridge edges (15 marks)
2
National University of Computer and Emerging Sciences
12. Display the size of the largest strongly connected component (SCC) (25 marks)
13. Display the size distribution of all SCCs (10 marks)
14. Display the size of the largest weakly connected component (WCC) (25 marks)
15. Display the size distribution of all WCCs (10 marks)
Algorithms
You will need the following algorithms for the third set of tasks.
3
National University of Computer and Emerging Sciences
4
National University of Computer and Emerging Sciences
Definitions
Source Vertex: A vertex with zero in-degree is called a source vertex
Isolated Vertex: A vertex with in-degree and out-degree both equal to zero is called an isolated vertex.
Degree Distribution: The degree distribution of a graph G(V, E), denoted by P(k), is defined as the
probability that a random chosen vertex has degree k. If |V|k denotes the number of vertices with degree
k,
Bridge Edge: A bridge edge is an edge whose removal disconnects the graph.
Articulation Vertex: An articulation vertex is defined as a vertex which when deleted renders the graph
a disconnected one.
Diameter: The diameter of a graph is the maximum of the distances between all pairs of vertices in this
graph. While computing the diameter of a graph, all infinite distances are disregarded.
Best of Luck ☺