Online Social Networks and Media -- Homework

network

Online Social Networks and Media
Homework

Home

Homework

Slides & References

Reading Material

Resources

Project Proposal Guidelines

You can find some guidelines for the project report here. Make sure that you start the report early!

Paper Presentation Guidelines

The presentations will be evaluated based on the quality of the presentation, and the comprehension of the material covered. The following are some guideline, tips and advice for preparing your presentation.

· You have 20 minutes for the presentation (1 student group) and 25 minutes (2 students group). We will enforce the time limit and cut you off if you have not completed on time. 10 more minutes will be allocated for questions. We may randomly pick someone from the audience to ask a question, so everyone should pay attention.

· You should prepare around 20-25 slides, given that a slide takes around a minute to talk about on average.

· Break you presentation into thematic units. The following flow is very common:

Motivate why the problem is important and give a high level idea;
Define clearly the problem;
Present the main idea and the fundamental algorithms;
Present the results (experimental or theoretical or both);
Conclusions.

· The talk should be self-contained. Do not assume that the audience has read the paper, or some previous work that you consider known. Define all the concepts you need and all the notation that you use. Refer only to related work that you know.

· Since the time for the talk is short, you will need to focus on the important parts of the paper and avoid going through all the details. The goal is to give a summary of the paper and have a clear message. Just because you read all the paper it does not mean that you should present everything. At the same time, you should not skip important information. Focusing on the right part to present is important since it shows that you understood the paper well.

· Prepare the slides carefully. Do not add too much text, and only the math symbols necessary. Do not use full sentences, but rather keywords and short phrases. Make sure the slides are readable and not too loaded. Never ever project parts of the paper pdf.

· Practice! Good talks are the result of a lot of practice even if they seem spontaneous and fun to the audience. Practice the talk several times, and time yourself to make sure you are within the time bounds.

Some fun advice on how to give a bad talk (and more) here.

Project Proposal Guidelines

Deadline: Dec 24, 2015 5pm (by email)

The suggested length for the project proposal is 2 pages.

In the header you should have the title of the project, the members of the project and the URL of the web page of the project.

The text body of the project should have three main parts:

1. Problem definition: Describe the goal of your project and the questions that you want to answer. Write a few words to motivate why the problem is important (e.g., mention a couple of applications).

2. Methodology: Describe how you will address the problem you defined. What are the steps you will take? Try to be as specific as possible.

3. Evaluation: Explain how you will evaluate your work. Describe the experiments you plan to do and the dataset that will be used.

At the end of the proposal, write down the paper that you will present (full citation).

Although it is not mandatory, we suggest that you write the project proposal and the project report in English, so that it is accessible to anyone to read.

Presentation Schedule

o January 13: Projects 2,4,7,8

o January 20: Projects 1, 6a, 6b, 9

Project Assignments

· Δ. Τριαντής: 1:Ego network characterization

Papers:

o Lars Backstrom, Eytan Bakshy, Jon M. Kleinberg, Thomas M. Lento, Itamar Rosenn:Center of Attention: How Facebook Users Allocate Attention across Friends. ICWSM 2011

o Johan Ugander, Brian Karrer, Lars Backstrom, Cameron Marlow:The Anatomy of the Facebook Social Graph. CoRR abs/1111.4503 (2011)

· Θ. Κατσιγιάννης, Δ. Μπακάλης: 2: Local and global event identification

Paper:

o Tye Rattenbury, Nathaniel Good, Mor Naaman: Towards automatic extraction of event and place semantics from flickr tags. SIGIR 2007: 103-110

· Α. Μουταφίδου, Β. Τουλατζής: 4: Recommending tags for influence maximization

Papers:

o Su Mon Kywe, Tuan-Anh Hoang, Ee-Peng Lim, Feida Zhu: On Recommending Hashtags in Twitter Networks. SocInfo 2012: 337-350

o Eva Zangerle, Wolfgang Gassler, Gunther Specht, Recommending #-Tags in Twitter

· Ν. Τιμονίδης: 6a: Dense subgraphs over time

Paper:

o Mauro Sozio, Aristides Gionis: The community-search problem and how to plan a successful cocktail party. KDD 2010: 939-948

· Α. Αποστόλου, Ε. Μπέκας: 6b: Dense subgraphs with activity levels

Paper:

o Polina Rozenshtein, Aris Anagnostopoulos, Aristides Gionis, Nikolaj Tatti: Event detection in activity networks. KDD 2014: 1176-1185

· Θ. Κούτσης, Π. Κούτση: 7: Virus propagation in evolving networks

Paper:

o Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne M. VanBriesen, Natalie S. Glance: Cost-effective outbreak detection in networks. KDD 2007: 420-42

· Α. Ματάκος: 8 Diversifying opinions

Paper:

o Aristides Gionis, Evimaria Terzi, Panayiotis Tsaparas: Opinion Maximization in Social Networks. SDM 2013: 387-395

· Α. Παπάς, Ν. Κουφός: 9: Extracting positive and negative linked networks

Use a discussion forum to produce a “signed” network based on the pair-wise interactions of the users and check whether properties such as stability hold

Paper:

o J. Leskovec, D. Huttenlocher, J. Kleinberg. Predicting Positive and Negative Links in Online Social Networks ACM WWW International conference on World Wide Web (WWW), 2010.

Projects

The list of projects is available here. The assignment is First-Come-First-Serve.

The timeline for the projects is as follows:

Week before Christmas: Submit a ~2-page project proposal outlining what you plan to do. This should include the topic of your presentation
January 13 and 20: Presentations.
February 3: Submit full project.

Assignment 3

Due on Wednesday 9/12 in class. All questions are individual.

Question 1

Explain the SimRank algorithm using absorbing random walks.

Question 2

Assume a clique of size 3 (triangle) and of size 4. Each edge has transmission probability p. Compute the expected spread if one node is the seed for the Independent Cascade model.

Assignment 2

Due on Wednesday 25/11 in class. Question 1 of the assignment will be done in groups of 2 people, while the rest of the questions are individual.

Question 1

The goal of this question is to experiment with algorithms for community finding and link-analysis ranking.

You are given a network of actors, where there is an edge between two actors if they have played together in some movie between 1995 and 2004. You are also given a file that maps the actors ids to names, and for each actor we also have the main genre of movies in which (s)he acted. You can download the data from these two links:

http://cs224w.stanford.edu/homework/hw1/imdb_actor_edges.tsv

http://cs224w.stanford.edu/homework/hw1/imdb_actors_key.tsv

Preprocess the data and extract the largest connected component of the graph. This is the part of the graph you will use in your experiments.

The question has two parts:

Part 1:

(a) Apply a community detection algorithm on the graph to derive communities. You can use any of the algorithms presented in class. If you need to fix the number of communities use the number of genres.

(b) Evaluate the results of the algorithm. Since the ground truth is not available use at least two methods to do so: one using intra and inter clustering density, and a semantic based one that uses the genres of the nodes in each community.

Part 2:

(a) Propose and implement and algorithm that uses link-analysis techniques to compute the importance of the communities. Present the communities in the order of their importance.

(b) Apply the same link analysis algorithm to rank the nodes in two different ways: inside each community and in the whole graph. Then compute the following:

1. Get the top-k nodes overall (k is the number of communities). Compute how many of these nodes belong to the top community.

2. Get the top-k nodes overall and compute the fraction of these nodes that are within the top-k of their corresponding community.

You can use any community finding and link analysis algorithm you choose, and you can use existing implementations for these algorithms. Turn in your code and a report that outlines your overall approach, the algorithms, the results and a commentary on your findings.

Question 2

Do exercise 6 from Chapter 6 in the book Social Media Mining (pg. 214)

Question 3

Prove that in the stationary distribution of a random walk on an undirected graph the probability of a node is proportional to the degree of the node.

Teams for the Assignment 1

Team 1 (Kronecker graph): Σπύρος Απόστολου, Δημήτρης Τριαντης, Στάθης Μπαλικας, Εύδοξος Μπεκας

Team 2 (Preferential attachment and copying model): Αντώνης Ματάκος, Νίκος Κουφός, Αθανάσιος Παππάς, Νέστορας Τιμονίδης

Team 3 (Small World): Αναστασία Μουταφίδου, Βασίλειος Τουλατζής, Θεόδωρος Κουτσής, Παρασκευή Κουτσή

Team 4 (Forest Fire): Τάσος Σουρής, Βασίλης Μπακάλης, Γεώργιος Γκανάς, Θεόφιλος Κατσιγιάννης, Θανάσης Μπάσιος

Assignment 1

Due on Wednesday 4/11 in class.

The goal of this assignment is for you to familiarize yourself with network measurements and network generation models. This is a group assignment. There will be 4 teams.

Part 1

For this part of the assignment, you will compute a number of measurements for a number of graphs.

The measurements are:

(a) degree distribution

(b) clustering coefficient

For (b) and (c) you can either write your own code or use the implementations provided by SNAP or NetworkX.

For (a) you should produce 5 plots (simple distribution, bins of equal size, bins of exponential size, cumulative, zipf)

For all your measurements report the average over 100 different random graphs.

The graphs are:

(a) An Erdos-Renyi random graph (All teams)

(b) A real network from the SNAP dataset, specifically:

· ca-AstroPh (Team 1)

· ca-CondMat (Team 2)

· ca-HepTh (Team 3)

· ca-HepPg (Team 4)

· Kronecker graph model (Team 1)

· Preferential attachment and copying model (Team 2)

· Small world (using the caveman and Watts-Strogatz model) (Team 3)

· Forest Fire Model (as described in the class slides) (Team 4)

For generating the graphs in (a) and (c), you should write your own code. The graph size in (a) and (c) should be around the same size as in (b).

Prepare a short write-up with the following:

1. Any information about your implementation that you find useful, in particular, did you write your own code or use one from some tool, did you make any assumptions (e.g. the initial graph for the generative models), etc.

2. Plots of the degree distributions. For power-law distributions use the three different ways of plotting the distribution and compute the power-law exponent.

3. Report the clustering coefficient.

4. Report the effective diameter.

For 2-4, comment on the results. Also, write-down the values of the parameters of your model.

Part 2

For this part, you are asked to study homophily on the ego-Facebook dataset from SNAP. This dataset includes node features (profiles), circles, and ego networks. There is no specific correct answer to this part – be creative!

Teams will be formed in a first-come-first-serve way