|
Online Social Networks and Media |
Home |
Project Report
Guidelines You can find some guidelines for the project
report here. Make sure that you start the report
early! Paper Presentation
Guidelines The presentations will be
evaluated based on the quality of the presentation, and the comprehension of
the material covered. The following are some guideline, tips and advice for
preparing your presentation. ·
You have 20 minutes
for the presentation (1 student group) and 25 minutes (2 students group).
We will enforce the time limit and cut you off if you have not completed on
time. 10 more minutes will be allocated for questions. We
may randomly pick someone from the audience to ask a question, so everyone
should pay attention. ·
You should prepare
around 20-25 slides, given that a slide takes around a minute to
talk about on average. ·
Break you presentation
into thematic units. The following flow is very common: 1.
Motivate why
the problem is important and give a high level idea; 2.
Define clearly the
problem; 3.
Present the main idea
and the fundamental algorithms; 4.
Present the results
(experimental or theoretical or both); 5.
Conclusions. ·
The talk should
be self-contained. Do not assume that the audience has read the
paper, or some previous work that you consider known. Define all the concepts
you need and all the notation that you use. Refer only to related
work that you know. ·
Since the time for the
talk is short, you will need to focus on the important parts of the paper and
avoid going through all the details. The goal is to give a summary of the
paper and have a clear message. Just because you read all the paper it does
not mean that you should present everything. At the same time, you should not
skip important information. Focusing on the right part to present is
important since it shows that you understood the paper well. ·
Prepare the slides
carefully. Do not add too much text, and only the math symbols necessary. Do
not use full sentences, but rather keywords and short phrases. Make sure the
slides are readable and not too loaded. Never ever project parts of the paper
pdf. ·
Practice! Good talks
are the result of a lot of practice even if they seem spontaneous and fun to
the audience. Practice the talk several times, and time yourself to make sure
you are within the time bounds. Some fun advice on how to give a bad talk
(and more) here. Projects The list of projects
is available here. The assignment is First-Come-First-Serve. Assignment 1 Due December 4, 2020 end of day. The goal of this
assignment is for you to familiarize yourself with network measurements and
network generation models, and experiment with clustering. The assignments should
be done individually. You can either write
your own code or use implementations provided by SNAP, NetworkX, or
other sources. Specify this in your report.
Consider the following graphs: (1) The Wiki-Vote graph from the SNAP dataset repository.
(2) An (undirected) Erdos-Renyi random graph. (3) An (undirected) graph generated using preferential attachment. (4) A graph generated using the forest fire model. The number of nodes of the generated graphs and (when possible) the
(expected) number of edges of each of the synthetically generated graphs
should be the same to one of the Wiki-Vote graph. For these graphs: a.
Plot the degree distributions for each graph. Produce 5 plots
(simple distribution, bins of exponential size,
cumulative, zipf). All plots should be in log-log scale. b.
Report the effective diameter for all graphs. c.
Report the clustering co-efficient for all graphs. In addition, for the Wiki-Vote and the Forest Fire graphs
compute the PageRank values and produce again 5 plots as before for the
distribution of the PageRank values for each of the two graphs. Submit a short report where you describe the experimental set-up
(including the values of the tuning parameters for each of the models), and
your results. Specifically, for each of the measurements, for each of the
graphs, comment on the results that you obtain, similar to the analysis
presented in class. Question 2 For this question, you will use the DBLP10
dataset that includes publications from computer science conferences between
2006 and 2015. Nodes correspond to authors. There is an edge between two
authors if they have written an article together. The following information
is available: Co-authorship: Data in the form (id1, id2)
meaning that author with id1 co-authored an article with author with id2. Authors: Data in the form (id, n)
indicating that the author (node) with identifier id has name n. Label: Data in the form (id, c) indicating that
author with identifier id wrote a paper at conference c. Hence, the label of
each author (node of your graph) is a set of conferences. (a) Find communities in these graphs using a modularity-based algorithm.
Report the number of clusters, the size and modularity of each cluster. If
necessary, experiment with different number of clusters to improve the
quality of the clusters. (b) Use the labels of the users to evaluate the homogeneity of the
clusters. For each pair of clusters Ci and Cj, compute
the average similarity between the labels of ai and aj,
where ai is an author in Ci and aj is an
author in Cj. Use the Jaccard index to measure similarity (https://en.wikipedia.org/wiki/Jaccard_index). Report your findings using m × m
matrix where m is the number of clusters. Question 3 [optional] Let |