Online Social Networks and Media
Project Report Guidelines

You can find some guidelines for the project report here. Make sure that you start the report early!

Paper Presentation Guidelines

The presentations will be
evaluated based on the quality of the presentation, and the comprehension of
the material covered. The following are some guidelines, tips and advice for
preparing your presentation.

• You have 20 minutes for the presentation (1-student group) or 25 minutes (2-student group). We will enforce the time limit and cut you off if you have not finished on time. 10 more minutes will be allocated for questions. We may randomly pick someone from the audience to ask a question, so everyone should pay attention.

• You should prepare around 20-25 slides, given that a slide takes around a minute to talk about on average.

• Break your presentation into thematic units. The following flow is very common:
  1. Motivate why the problem is important and give a high-level idea;
  2. Define the problem clearly;
  3. Present the main idea and the fundamental algorithms;
  4. Present the results (experimental or theoretical or both);
  5. Conclusions.

• The talk should be self-contained. Do not assume that the audience has read the paper, or some previous work that you consider known. Define all the concepts you need and all the notation that you use. Refer only to related work that you know.

• Since the time for the talk is short, you will need to focus on the important parts of the paper and avoid going through all the details. The goal is to give a summary of the paper and have a clear message. Just because you read the full paper, it does not mean that you should present everything. At the same time, you should not skip important information. Focusing on the right parts to present is important, since it shows that you understood the paper well.

• Prepare the slides carefully. Do not add too much text, and include only the necessary math. Do not use full sentences, but rather keywords and short phrases. Make sure the slides are readable and not too loaded. Never ever project parts of the paper PDF.

• Practice! Good talks are the result of a lot of practice, even if they seem spontaneous and fun to the audience. Practice the talk several times, and time yourself to make sure you are within the time bounds.

Some fun advice (to avoid) on how to give a bad talk (and more) here.

Projects

The list of projects
is available here. The projects will be done in teams of at most two students. Send an email to both instructors with the names of the team members and your selection by Friday 15/12/2023.

Deliverables and Timeline:

Assignment 2

Due December 6th in class.

In this assignment you will experiment with network embeddings for community detection and link prediction. For the assignment you will use the DBLP dataset you used for the second part of Assignment 1. The assignment has two parts.

A. Compute the node2vec embeddings for the nodes in the graph and apply the k-means clustering algorithm to obtain communities of nodes. Evaluate the communities in the same way as in Assignment 1. Compare your results with those you obtained with the community detection algorithms. Briefly comment on the results.

(Bonus) Experiment with more clustering algorithms for finding communities.

B.
For this part of the assignment, you will use the node2vec embeddings for link prediction. Specifically, you will build a binary classifier that predicts the probability that an edge e = (u, v) exists, given the embeddings of nodes u and v as input.

First, remove a set S of edges from the original graph. The goal is to predict S, that is, S is the test set for your classifier. Let G' be the graph where the edges in S have been removed. Compute the node2vec embeddings for G'. These are the embeddings that will be used for the classification. Check the node2vec paper for ways to combine these embeddings to create the features for the classifier.

To train the classifier, use as the positive class a set of edges that exist in G', and as the negative class a set of randomly selected pairs of nodes that are not connected in G. The size of the negative class should be the same as that of the positive class. Report the AUC when S includes 20%, 25% and 30% of the edges of the original graph. You may experiment with different ways to combine the embeddings for producing the classifier features.

Create a Jupyter Notebook with the code you have written, the output of your code, and any commentary you have on your results. You can either write your own code or use implementations provided by SNAP, NetworkX, or other sources. It is recommended to use existing libraries. Specify this in your report. Export your notebook to HTML and submit both the notebook and the HTML file. The assignments should be done in teams of at most two students.

(Bonus) Experiment with additional embedding algorithms and evaluate the performance for the two tasks above.

Assignment 1

Due November 22 in class.

In this assignment you
will experiment with network measurements and models, and community detection algorithms. For the assignment you will create a Jupyter Notebook which will contain the code you have written, the output of your code, and any commentary you have on your results. You can either write your own code or use implementations provided by SNAP, NetworkX, or other sources. It is recommended to use existing libraries. Specify this in your report. Export your notebook to HTML and submit both the notebook and the HTML file. The assignments should be done in teams of at most two students.

The assignment has two parts.

A. For the first part of the assignment you will experiment with network measurements and network generation models. You will work with the following graphs:

(1) The Wiki-Vote and the ego-Facebook graph from the SNAP dataset repository.
(2) An (undirected) Erdős-Rényi random graph.
(3) An (undirected) graph generated using preferential attachment.
(4) A graph generated using the forest fire model.

The number of nodes and (when possible) the (expected) number of edges of each of the synthetically generated graphs should be the same as those of the Wiki-Vote graph.

For these graphs:

a. Plot the degree distributions for each graph. Produce 3 plots (simple distribution, cumulative distribution, Zipf). All plots should be in log-log scale. (Use the grid option to put all plots per dataset in the same line.)
b. Compute and report the effective diameter for all graphs.
c. Compute and report the clustering coefficient for all graphs.

Briefly comment on the results.

B. For the second part of the assignment you will experiment with
community detection algorithms. For this question, you will use the DBLP10 dataset, which includes publications from computer science conferences between 2006 and 2015. Nodes correspond to authors. There is an edge between two authors if they have written an article together. The following information is available:

Co-authorship: Data in the form (id1, id2), meaning that the author with id1 co-authored an article with the author with id2.
Authors: Data in the form (id, n), indicating that the author (node) with identifier id has name n.
Label: Data in the form (id, c), indicating that the author with identifier id wrote a paper at conference c. Hence, the label of each author (node of your graph) is a set of conferences.

(a) Find communities in these graphs using a modularity-based algorithm. Report the number of clusters, and the size and modularity of each cluster. If necessary, experiment with different numbers of clusters to improve the quality of the clusters.

(b) (Optional) Use the labels of the users to evaluate the homogeneity of the clusters. For each pair of clusters C_i and C_j, compute the average similarity between the labels of a_i and a_j, where a_i is an author in C_i and a_j is an author in C_j. Use the Jaccard index to measure similarity (https://en.wikipedia.org/wiki/Jaccard_index). Report your findings using an m × m matrix, where m is the number of clusters.
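
As an illustration of the cluster-homogeneity evaluation in part B(b), the m × m matrix of average Jaccard similarities could be computed roughly as follows. This is a minimal sketch, not a reference solution. The function names `jaccard` and `cluster_similarity_matrix`, the input layout (clusters as lists of author ids, labels as a dict from author id to a set of conferences), and the toy data are all hypothetical. Two conventions are also assumptions: an author is never compared with itself on the diagonal, and empty cases get similarity 0.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets.

    Assumption: two empty sets get similarity 0.0.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_similarity_matrix(clusters, labels):
    """Return the m x m matrix of average pairwise Jaccard similarity.

    clusters: list of m lists of author ids (hypothetical layout)
    labels:   dict mapping author id -> set of conferences
    """
    m = len(clusters)
    matrix = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            # Assumption: an author is not compared with itself,
            # which only matters on the diagonal (i == j).
            sims = [
                jaccard(labels[a], labels[b])
                for a, b in product(clusters[i], clusters[j])
                if a != b
            ]
            # Assumption: if there are no valid author pairs, report 0.0.
            avg = sum(sims) / len(sims) if sims else 0.0
            matrix[i][j] = matrix[j][i] = avg  # the matrix is symmetric
    return matrix


# Toy data, made up for illustration only.
labels = {
    1: {"KDD", "WWW"},
    2: {"KDD"},
    3: {"SIGMOD"},
    4: {"SIGMOD", "VLDB"},
}
clusters = [[1, 2], [3, 4]]

matrix = cluster_similarity_matrix(clusters, labels)
for row in matrix:
    print(row)
# prints [0.5, 0.0] and then [0.0, 0.5]
```

High values on the diagonal and low values off the diagonal suggest homogeneous, well-separated clusters. Note that comparing all author pairs is quadratic in the cluster sizes, so for large clusters it may be necessary to average over a random sample of pairs instead.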