|
Online Social Networks and Media |
Home |
Project Proposal Guidelines You can find some guidelines for the project report here. Make sure that you start
the report early! Paper Presentation Guidelines The presentations will be
evaluated based on the quality of the presentation, and the comprehension of
the material covered. The following are some guideline, tips and advice for
preparing your presentation. ·
You have 20 minutes for the presentation (1 student group) and 25 minutes
(2 students group). We will enforce the time limit and cut you off if you
have not completed on time. 10
more minutes will be allocated for questions. We may randomly pick someone
from the audience to ask a question, so everyone should pay attention. ·
You should prepare around 20-25 slides, given that a slide
takes around a minute to talk about on average. ·
Break you presentation into thematic
units. The following flow is very common:
·
The talk should be self-contained. Do not assume that the audience has read the
paper, or some previous work that you consider known. Define all the concepts
you need and all the notation that you
use. Refer only to related work that you know. ·
Since the time for the talk is short,
you will need to focus on the important parts of the paper and avoid going
through all the details. The goal is to give a summary of the paper and have
a clear message. Just because you read all the paper it does not mean that
you should present everything. At the same time, you should not skip
important information. Focusing on the right part to present is important
since it shows that you understood the paper well. ·
Prepare the slides carefully. Do not add
too much text, and only the math symbols necessary. Do not use full
sentences, but rather keywords and short phrases. Make sure the slides are
readable and not too loaded. Never ever project parts of the paper pdf. ·
Practice! Good talks are the result of a
lot of practice even if they seem spontaneous and fun to the audience.
Practice the talk several times, and time yourself to make sure you are
within the time bounds. Some fun advice on how
to give a bad talk (and more) here. Project Proposal Guidelines Deadline: Dec 24, 2015 5pm (by
email) The suggested length for the project proposal is 2 pages. In the header you should have the title of the project, the members of
the project and the URL of the web page of the project. The text body of the project should have three main parts: 1. Problem definition: Describe the goal of
your project and the questions that you want to answer. Write a few words to
motivate why the problem is important (e.g., mention a couple of
applications). 2. Methodology: Describe how you
will address the problem you defined. What are the steps you will take? Try
to be as specific as possible. 3. Evaluation: Explain how you will
evaluate your work. Describe the experiments you plan to do and the dataset
that will be used. At the end of the proposal, write down the paper that you will present
(full citation). Although it is not mandatory, we suggest that you write the project
proposal and the project report in English, so that it is accessible to
anyone to read. Presentation Schedule o January 13: Projects 2,4,7,8 o January 20: Projects 1, 6a, 6b, 9 Project Assignments ·
Δ. Τριαντής: 1:Ego network characterization Papers: o Lars Backstrom, Eytan Bakshy, Jon M. Kleinberg, Thomas M. Lento, Itamar Rosenn:Center
of Attention: How Facebook Users Allocate Attention across Friends. ICWSM
2011 o Johan Ugander, Brian Karrer, Lars Backstrom,
Cameron Marlow:The Anatomy
of the Facebook Social Graph. CoRR abs/1111.4503
(2011) ·
Θ. Κατσιγιάννης, Δ. Μπακάλης: 2: Local and global event
identification Paper: o Tye Rattenbury, Nathaniel Good, Mor Naaman:
Towards automatic extraction of event and place semantics from flickr tags. SIGIR 2007: 103-110 ·
Α. Μουταφίδου, Β. Τουλατζής: 4: Recommending tags for influence
maximization Papers: o Su Mon Kywe, Tuan-Anh Hoang, Ee-Peng
Lim, Feida Zhu: On Recommending Hashtags in Twitter
Networks. SocInfo 2012: 337-350 o Eva Zangerle, Wolfgang Gassler,
Gunther Specht, Recommending #-Tags in Twitter ·
Ν. Τιμονίδης: 6a:
Dense subgraphs over time Paper: o Mauro Sozio, Aristides Gionis: The
community-search problem and how to plan a successful cocktail party. KDD
2010: 939-948 ·
Α. Αποστόλου, Ε. Μπέκας: 6b: Dense subgraphs with activity
levels Paper: o Polina Rozenshtein, Aris Anagnostopoulos, Aristides
Gionis, Nikolaj Tatti: Event detection in activity networks. KDD 2014:
1176-1185 ·
Θ. Κούτσης, Π. Κούτση: 7: Virus propagation in evolving
networks Paper: o Jure Leskovec, Andreas Krause, Carlos Guestrin,
Christos Faloutsos, Jeanne M. VanBriesen,
Natalie S. Glance: Cost-effective outbreak detection in networks. KDD 2007:
420-42 ·
Α. Ματάκος: 8 Diversifying opinions Paper: o Aristides Gionis, Evimaria Terzi,
Panayiotis Tsaparas: Opinion Maximization in Social
Networks. SDM 2013: 387-395 ·
Α. Παπάς, Ν. Κουφός: 9: Extracting positive and negative
linked networks Use a discussion forum to produce a “signed” network based on the
pair-wise interactions of the users and check whether properties such as
stability hold Paper: o J. Leskovec,
D. Huttenlocher, J. Kleinberg. Predicting Positive
and Negative Links in Online Social Networks ACM WWW International conference
on World Wide Web (WWW), 2010. Projects The list of projects is available here. The assignment is First-Come-First-Serve. The timeline for the projects is as follows:
Assignment 3 Due on Wednesday 9/12 in class. All questions are individual. Question 1 Explain the SimRank algorithm using absorbing
random walks. Question 2 Assume a clique of size 3 (triangle) and of size 4. Each edge has
transmission probability p. Compute the expected spread if one node is the
seed for the Independent Cascade model. Assignment 2 Due on Wednesday 25/11 in class. Question 1 of the assignment will be done
in groups of 2 people, while the rest of the questions are individual. Question 1 The goal of this question is to experiment with algorithms for community
finding and link-analysis ranking. You are given a network of actors, where there is an edge between two
actors if they have played together in some movie between 1995 and 2004. You
are also given a file that maps the actors ids to names, and for each actor
we also have the main genre of movies in which (s)he
acted. You can download the data from these two links: http://cs224w.stanford.edu/homework/hw1/imdb_actor_edges.tsv http://cs224w.stanford.edu/homework/hw1/imdb_actors_key.tsv Preprocess the data and extract the largest connected component of the
graph. This is the part of the graph you will use in your experiments. The question has two parts: Part 1: (a)
Apply a community detection algorithm on the graph to derive
communities. You can use any of the algorithms presented in class. If you
need to fix the number of communities use the number of genres. (b)
Evaluate the results of the algorithm. Since the ground truth is
not available use at least two methods to do so: one using intra and inter
clustering density, and a semantic based one that uses the genres of the
nodes in each community. Part 2: (a)
Propose and implement and algorithm that uses link-analysis
techniques to compute the importance of the communities. Present the
communities in the order of their importance. (b)
Apply the same link analysis algorithm to rank the nodes in two
different ways: inside each community and in the whole graph. Then compute
the following: 1.
Get the top-k nodes overall (k is the number of communities).
Compute how many of these nodes belong to the top community. 2.
Get the top-k nodes overall and compute the fraction of these
nodes that are within the top-k of their corresponding community. You can use any community finding and link analysis algorithm you choose,
and you can use existing implementations for these algorithms. Turn in your
code and a report that outlines your overall approach, the algorithms, the
results and a commentary on your findings. Question 2 Do exercise 6 from Chapter 6 in the book Social Media Mining (pg. 214) Question 3 Prove that in the stationary distribution of a random walk on an
undirected graph the probability of a node Teams for the Assignment 1 Team 1 (Kronecker graph):
Σπύρος Απόστολου, Δημήτρης Τριαντης, Στάθης Μπαλικας, Εύδοξος Μπεκας Team 2 (Preferential attachment and
copying model): Αντώνης Ματάκος, Νίκος Κουφός, Αθανάσιος Παππάς, Νέστορας Τιμονίδης Team 3 (Small World):
Αναστασία Μουταφίδου,
Βασίλειος Τουλατζής,
Θεόδωρος
Κουτσής,
Παρασκευή
Κουτσή Team 4 (Forest Fire):
Τάσος Σουρής,
Βασίλης
Μπακάλης,
Γεώργιος Γκανάς,
Θεόφιλος Κατσιγιάννης,
Θανάσης Μπάσιος
Assignment 1 Due on Wednesday 4/11 in class. The goal of
this assignment is for you to familiarize yourself with network measurements
and network generation models. This is a group assignment. There will be 4
teams. Part 1 For this
part of the assignment, you will compute a number of measurements for a
number of graphs. The
measurements are: (a)
degree
distribution (b)
clustering
coefficient (c)
effective
diameter For (b) and
(c) you can either write your own code or use the implementations provided by
SNAP or NetworkX. For (a) you
should produce 5 plots (simple distribution, bins of equal size, bins of
exponential size, cumulative, zipf) For all
your measurements report the average over 100 different random graphs. The graphs
are: (a) An Erdos-Renyi
random graph (All teams) (b) A real network from the SNAP
dataset, specifically: ·
ca-AstroPh (Team 1) ·
ca-CondMat (Team 2) ·
ca-HepTh (Team 3) ·
ca-HepPg (Team 4) (c) A graph that you will generate
following one of the models below: ·
Kronecker
graph model (Team 1) ·
Preferential
attachment and copying model (Team 2) ·
Small
world (using the caveman and Watts-Strogatz model)
(Team 3) ·
Forest
Fire Model (as described in the class slides) (Team 4) For
generating the graphs in (a) and (c), you should write your own code. The
graph size in (a) and (c) should be around the same size as in (b). Prepare a
short write-up with the following: 1.
Any
information about your implementation that you find useful, in
particular, did you write your own
code or use one from some tool, did you make any assumptions (e.g. the
initial graph for the generative models), etc. 2.
Plots
of the degree distributions. For power-law distributions use the three
different ways of plotting the distribution and compute the power-law
exponent. 3.
Report
the clustering coefficient. 4.
Report
the effective diameter. For 2-4,
comment on the results. Also, write-down the values of the parameters of your
model. Part 2 For this
part, you are asked to study homophily on the
ego-Facebook dataset from SNAP. This dataset includes node features
(profiles), circles, and ego networks. There is no specific correct answer to
this part – be creative! Teams will
be formed in a first-come-first-serve way |