|
Online Social Networks and Media |
Home |
Project Proposal Guidelines You can find some guidelines for the project report here. Make sure that you start
the report early! Presentation Schedule The updated schedule for the presentations is as follows: Wednesday 18/1, starting at 13:00: ·
Μαρία
Ζέρβα, Γιάννης
Κουβάτης: o Discovering top-k
teams of experts with/without a leader in social networks. Mehdi Kargar, Aijun An, CIKM 2011 ·
Χρήστος
Σπαθάρης: o Antisocial Behavior in
Online Discussion Communities. Justin Cheng, Cristian Danescu-Niculescu-Mizil,
Jure Leskovec, ICWSM 2015 ·
Μαρία
Παππά: o Tracking the Evolution
of Communities in Dynamic Social Networks, Derek Greene, Dónal
Doyle, Padraig Cunningham, ASONAM 2010 ·
Αδαμόπουλος
Γιώργος,
Βαλεκάρδας
Δημήτρης: o A hashtag
recommendation system for twitter data streams. Eriko Otsuka, Scott A.
Wallace, David Chiu, Computational Social Networks, 2016 ·
Κωνσταντίνος
Δημολίκας,
Chaysri Piyabhum: o Quantifying
Controversy in Social Media. Kiran Garimella, Gianmarco De Francisci Morales,
Aristides Gionis, Michael Mathioudakis,
WSDM 2016 ·
Παπαμιχαήλ
Άγγελος: o Near linear time
algorithm to detect community structures in large-scale networks. Usha Nandini Raghavan, Reka Albert, Soundar Kumara, Physical Review E 76, 2007 ·
Βιργινία
Τσίντζου: o Density-based place
clustering in geo-social networks. Shi, J., Mamoulis,
N., Wu, D., & Cheung, D. W.,
SIGMOD 2014 Paper Presentation Guidelines The presentations will be
evaluated based on the quality of the presentation, and the comprehension of
the material covered. The following are some guideline, tips and advice for
preparing your presentation. ·
You have 20 minutes for the presentation (1 student group) and 25 minutes
(2 students group). We will enforce the time limit and cut you off if you
have not completed on time. 10
more minutes will be allocated for questions. We may randomly pick someone
from the audience to ask a question, so everyone should pay attention. ·
You should prepare around 20-25 slides, given that a slide
takes around a minute to talk about on average. ·
Break you presentation into thematic units.
The following flow is very common:
·
The talk should be self-contained. Do not assume that the audience has read the
paper, or some previous work that you consider known. Define all the concepts
you need and all the notation that you use. Refer only to related
work that you know. ·
Since the time for the talk is short,
you will need to focus on the important parts of the paper and avoid going
through all the details. The goal is to give a summary of the paper and have
a clear message. Just because you read all the paper it does not mean that
you should present everything. At the same time, you should not skip
important information. Focusing on the right part to present is important
since it shows that you understood the paper well. ·
Prepare the slides carefully. Do not add
too much text, and only the math symbols necessary. Do not use full
sentences, but rather keywords and short phrases. Make sure the slides are
readable and not too loaded. Never ever project parts of the paper pdf. ·
Practice! Good talks are the result of a
lot of practice even if they seem spontaneous and fun to the audience.
Practice the talk several times, and time yourself to make sure you are
within the time bounds. Some fun advice on how
to give a bad talk (and more) here. Projects The list of projects is available here. The assignment is
First-Come-First-Serve. The timeline for the projects is as follows:
Assignment 2 Due on Wednesday 7/12 in class. For this
assignment, you can either write your own code or use implementations
provided by SNAP, NetworkX, or other sources.
Specify this in your report. Question 2 can be done in teams of two, while
the remaining questions should be done individually. Question 1 Consider an
alternative definition of density for a set of nodes S in a graph G to be the
minimum degree of any node in S in the induced subgraph G[S], that is, where G[S] denotes the subgraph induced
by the set of nodes S in the graph G, and degree(v,G[S])
denotes the degree of node Question 2 For this
Question you will use again the DBLP10 dataset you used for Assignment 1. Use
the link analysis ranking techniques we described in class to propose two
algorithms for ranking the conferences in the data. Your algorithms should
make use of the network between the authors. You can use existing
implementations of the link analysis ranking algorithms when available. Question 3 In class,
we studied the problem of selecting seed nodes to maximize the expected
spread of influence in a network for the independent cascade model. Consider
the case that we are given a set of seed nodes S that initiate the spread of
harmful information. We want to select a set of (non-seed) nodes to
“immunize” so as to stop the spread of the harmful information. Immunized
nodes do not propagate the information, thus blocking the spread. Given a
network G, the set S, and a budget K, we want to find a set of nodes B of
size K to immunize such that the expected number of “protected nodes” that
never receive the information, P(B), is maximized. a.
Show
that the function P(B) is not a submodular function.
Note that for this proof it suffices to construct a counter-example where the
submodularity property does not hold. (Hint: Assume
deterministic propagation of the information). b.
Propose
a heuristic algorithm for selecting the set B. Teams for the Assignment 1 Team 1 (Forest Fire): Μαρία Ζέρβα, Ιωάννης Κουβάτης, Χρήστος Σπαθάρης Team 2 (Kronecker graph):
Άγγελος Παπαμιχαήλ, Δημήτρης Βαλεκάρδας, Βιργινία Τσίντζου, Γιώργος Αδαμόπουλος Team 3 (Preferential attachment and
copying model): Μαρία Παππά, Κωνσταντίνος Δημολίκας, Chaysri Piyabhum Assignment 1 Due on Wednesday 16/11 in class. The goal of
this assignment is for you to familiarize yourself with network measurements
and network generation models, and experiment with community detection/graph
clustering algorithms. This is a group assignment. There will be 3 teams, and
each team will work with a different graph model. For this
assignment, you can either write your own code or use implementations
provided by SNAP, NetworkX, or other sources.
Specify this in your report. You will
use the DBLP10 dataset which you can download from here. Dataset description. The DBLP10
dataset includes publications from computers science conference between 2006
and 2015. Nodes correspond to authors. There is an edge between two authors
if they have written an article together. Co-authorship: Data in the form (id1, id2) meaning
that author with id1 co-authored an article with author with id2. Authors: Data in the form (id, n) indicating
that the author (node) with identifier id has name n. Label: Data in the form (id, c) indicating
that author with identifier id wrote a paper at conference c (a node may have
multiple labels). Question (1) [models and
measurements] Consider
the following graphs: (1) The
DBLP10 co-authorship graph, (2)
An Erdos-Renyi random graph with the same
number of nodes as DBLP10 graph, and (3) A graph
generated using one of the following
alternatives: (a) a Kronecker graph, (b)
preferential attachment and copying models, or (c) forest fire. Again the
number of nodes of the generated graph should be similar to the number of
nodes of DBLP. For these
graphs: (a)
Plot
the degree distribution. Produce 5 plots (simple distribution, bins of equal
size, bins of exponential size, cumulative, zipf). All plots should be in log-log scale. (b)
Report
the effective diameter. (c)
Report
the clustering co-efficient. Questions (2) [homophily] Use the
labels of the authors in the co-authorship to study the homophily
of the DBLP10 co-authorship graph.
Devise your own methodology for measuring homophily
and describe it in your report. Questions (3) [clustering] (a) Apply clustering
on the co-authorship graph. Report the number of clusters, information about
their size and the name of the authors in the largest cluster. Also, evaluate
the quality of clusters using the metrics described in class. You can use any
of the clustering algorithms we presented in class. If necessary, experiment
with different number of clusters. (b) Use the
labels of the authors to evaluate the homogeneity of the clusters. Devise
your own metric for this, and describe it in the report. |