CS059 – Data Mining
Fall 2012
Administrative
Class Hours: Tuesday 13:00-16:00, Room Ι3.
Grades: The grade for the course will be determined by the assignments.
Announcements
·
Thursday March 7. Final grades. You can see your final grade here. If you have any questions,
please contact me within the next one or two days.
·
Friday February 15. Assignment 5. You can submit Assignment 5
until the end of the day today.
·
Wednesday February 13. Assignment 4 grades. You can see the grades for
Assignment 4 (together with those of the previous assignments) here.
·
Tuesday February 12. PageRank in MATLAB. If you implement
PageRank in MATLAB, you should use the efficient code given in the course
slides.
·
Tuesday February 5. Deadline for Assignment 5. To accommodate those who are taking the Graphics exam, the deadline for
Assignment 5 is moved to February 15, 6pm.
·
Tuesday February 5. Assignment 3 grades. You can see the grades for
Assignment 3 (together with those of the previous assignments) here.
·
Tuesday January 29. Assignment 2 grades. You can see the grades for
Assignment 2 (together with those of Assignment 1) here.
·
Sunday January 27. Programming exercises. For the programming exercises
you are required to submit a written report
discussing the results of your program. You should also explain how to run
the code, and your code should include some comments.
·
Tuesday January 22. Assignment 5. Together with your code for
Assignment 5, also submit instructions on how to run your program.
·
Wednesday January 16. Deadline for Assignment 5. The new deadline for Assignment 5 is February 11. The assignment has been slightly modified, so download the new version from the
web page of the class.
·
Tuesday January 15. Assignment 1 grades. You can see the grades for
Assignment 1 here.
·
Tuesday January 15. Assignment 5: Assignment 5 is available on
the Assignments web page.
·
Monday January 14. January 15th Lecture. A reminder that tomorrow's class will start at 1:00.
·
Monday January 7. Evaluation. A reminder that tomorrow we
will have the course evaluation at the end of class.
·
Wednesday December 25. Deadline extension for Assignment 4. The deadline for Assignment 4 is extended to December 29. The
assignment should be handed in by the end of the day.
·
Thursday December 13. Course Evaluation: At the end of class on Tuesday,
December 18, we will hold the course evaluation.
·
Thursday December 13. Assignment 4: Assignment 4 is available on
the Assignments web page.
·
Wednesday December 5. Extra class, Extension for Assignment 3: This
Friday, December 7, there will be an extra lecture from 10:00 to 12:00. The deadline
for Assignment 3 is extended to Tuesday, December 11, at the beginning of
class.
·
Sunday December 2. Assignment 3, Question 2: Two corrections for Question
2. The exercise in the textbook is 8.27, not 9.27. In the equation, there is
a 2 in the denominator of the proportionality factor. The corrected
assignment has been posted on the Assignments web page.
·
Sunday December 2. Assignment 3, Question 4: For the precision and recall
values of k-means, report the mean value over 5 runs. In addition to the
precision/recall values, also give your empirical observations about the kind
of users that are grouped together in each cluster.
·
Saturday November 24. Assignment 3: Assignment 3 is available on
the Assignments web page.
·
Tuesday November 20. Extra class, Assignment 2: This Friday we will have an
extra lecture at 12:00 for an hour or two. You can submit Assignment 2 until
the end of day today without any penalty.
·
Sunday November 18. Assignment 2, Question 2: For the hash functions
provided by the book you should take the value of the function mod 5 as your
hash function.
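As a minimal sketch of this rule: wrap whichever hash function the book provides and reduce its value mod 5. The two base functions below are hypothetical placeholders chosen only for illustration; they are not the ones from the textbook.

```python
from typing import Callable

NUM_BUCKETS = 5  # the announcement asks for the function value mod 5


def mod5(base_hash: Callable[[int], int]) -> Callable[[int], int]:
    """Return a hash function that reduces base_hash(x) modulo 5."""
    def hashed(x: int) -> int:
        return base_hash(x) % NUM_BUCKETS
    return hashed


# Hypothetical base functions (assumptions, not taken from the book);
# substitute the functions given in the exercise.
h1 = mod5(lambda x: 2 * x + 1)
h2 = mod5(lambda x: 3 * x + 2)

for x in range(4):
    print(x, h1(x), h2(x))
```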
·
Thursday November 14. New class hours: From now on class hours will be
13:30 – 16:00. In case we need to cover some more material we will schedule
classes on Fridays.
·
Monday November 12. Class Hours: The class hours are still 13:00-16:00. There is a mistake in the
updated schedule on the department’s home page.
·
Friday, November 9. Assignment 2: Assignment 2 is available on
the Assignments web page.
·
Friday November 9. Free pass policy for Assignments: For the
Assignment deadlines you have 3 “free passes”. That is, you have three days
which you can use for extending the deadline of an assignment. Details on the
Assignments page.
·
Thursday November 8. Turn in of Assignment 1, Part B: You can turn in the
assignment until the end of the day on Friday without any penalty.
·
Thursday November 8. Clarifications for Assignment 1, Part B: Although integer items are the most common type of input, some of the
implementations in FIMI may also work with strings as items instead of
integers (one of your colleagues mentioned ECLAT). If this is the case, then
you do not need to do the conversion to integers.
·
Thursday November 8. Clarifications for Assignment 1, Part B:
o
For Question 3, if you plan to use WEKA, then each distinct word should
be made into an attribute that takes values true/false, depending on whether the word
is present or not. The number of attributes is then too large to fit in
memory, so you should use the sparse ARFF format. (For example, see the
following posting: http://old.nabble.com/convert-market-basket-data-to-binary-form-for-fp-growth-td30651604.html
-- there is more information online). Another idea is to throw out the words
that are not frequent enough, but you may still be left with too many words.
Alternatively, you can use one of the FIMI implementations (e.g., the LCM
implementation is easy to use). In the input file each row should be a
“basket” and the items are integers (separated with spaces), so you would
need to assign an id to each word.
o
For Question 2, the correct way to generate and count subsequences is
to consider the subsequences generated with the leftmost item in the
window. A different way to count the frequency is to count the number of
windows that contain a subsequence. Although this is slightly different from
what the question asks, it will be accepted.
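For the Question 3 input format described above (one basket per row, items as integers separated with spaces), the word-to-id step might look like the following sketch. The function names, the sample words, and the choice to start ids at 1 are assumptions made for illustration; check the documentation of the implementation you use for its exact input requirements.

```python
from typing import Dict, Iterable, List, Tuple


def encode_baskets(
    baskets: Iterable[Iterable[str]],
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Assign an integer id to each distinct word and encode every basket."""
    word_to_id: Dict[str, int] = {}
    encoded: List[List[int]] = []
    for basket in baskets:
        ids = set()  # a basket is a set: repeated words count once
        for word in basket:
            if word not in word_to_id:
                word_to_id[word] = len(word_to_id) + 1  # ids start at 1 (assumption)
            ids.add(word_to_id[word])
        encoded.append(sorted(ids))
    return encoded, word_to_id


def to_basket_lines(encoded: List[List[int]]) -> List[str]:
    """Format each basket as one row of space-separated integers."""
    return [" ".join(str(i) for i in row) for row in encoded]


# Hypothetical sample data, not taken from the assignment files.
sample = [["data", "mining", "student"], ["data", "student", "data"]]
encoded, vocabulary = encode_baskets(sample)
for line in to_basket_lines(encoded):
    print(line)
```

Keeping the `vocabulary` dictionary lets you map the integer itemsets in the output back to words for the report.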
·
Wednesday November 7. Time of Lecture, November 9: To avoid overlap with the
Operating Systems class, the lecture will take place at 11:00-14:00.
·
Thursday November 1. Clarifications for
Assignment 1, Part B:
o
For Question 2, the items of a subsequence maintain the order they have
in the sequence. For example, the sequence BBAC contains the subsequence BAC,
but not the subsequence ABC.
o
For Question 3, from the file with the Twitter profiles we are
interested only in the 11th (eleventh) field, which holds the user description.
This is the field from which you should extract the frequent itemsets
(frequent sets of words). If you want to consider other (additional) fields,
you can suggest it as part of option 3.
o
For Question 3, you have some freedom on how to preprocess the data. In
your report you should be clear about the choices that you have made.
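The first two clarifications above can be illustrated with a short sketch. The subsequence check follows the order-preserving definition given for Question 2. The field extraction assumes the profile file is tab-delimited, which is an assumption and not something stated here; adjust the delimiter to match the actual file. Both function names are hypothetical.

```python
from typing import Optional


def is_subsequence(candidate: str, sequence: str) -> bool:
    """True if the items of candidate appear in sequence in the same order
    (not necessarily contiguously)."""
    remaining = iter(sequence)
    # "item in remaining" advances the iterator past the first match,
    # so later items must be found after earlier ones.
    return all(item in remaining for item in candidate)


def description_field(line: str, delimiter: str = "\t") -> Optional[str]:
    """Return the 11th field of a profile line, or None if the line is too short.
    The tab delimiter is an assumption about the file format."""
    fields = line.rstrip("\n").split(delimiter)
    return fields[10] if len(fields) > 10 else None


print(is_subsequence("BAC", "BBAC"))  # True
print(is_subsequence("ABC", "BBAC"))  # False
```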
·
Thursday November 1. Postponed Lecture – Extension for Assignment 1, Part B:
Next week’s lecture (November 6th)
is postponed to Friday, November 9, 9:00-12:00. The deadline for the
second part of the first assignment is extended to the start of the lecture
on November 9.
·
Monday October 29. Assignment 1 – Part 2 – Question 2 - Correction: In the second question, when a subsequence appears more than once in
a window of length W, it should be counted only once. For example, for
the sequence AABC, for W = 4, the subsequence AB should be counted
only once, and not twice as was originally stated in the question. For the sequence AABCB,
for W = 4, the subsequence AB should be counted twice, once for the window
AABC, and once for the window ABCB, due to the new appearance of B at the end
of the window. This correction is necessary so that the anti-monotonicity
property holds. For bonus marks, give a counterexample that violates the
anti-monotonicity property.
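A minimal sketch of one counting scheme that reproduces the two examples above: slide a window of length W over the sequence and count each window that contains the subsequence at most once. This is the "number of windows that contain the subsequence" variant, so it is an illustration under that assumption and may differ from the exact scheme the question asks for. The function names are hypothetical.

```python
def is_subsequence(candidate: str, sequence: str) -> bool:
    """True if candidate's items appear in sequence in the same order."""
    remaining = iter(sequence)
    return all(item in remaining for item in candidate)


def count_windows(sequence: str, candidate: str, w: int) -> int:
    """Count the windows of length w that contain candidate as a subsequence.
    Each window contributes at most 1, however many times candidate occurs in it."""
    count = 0
    for start in range(len(sequence) - w + 1):
        window = sequence[start:start + w]
        if is_subsequence(candidate, window):
            count += 1
    return count


print(count_windows("AABC", "AB", 4))   # 1 (single window AABC)
print(count_windows("AABCB", "AB", 4))  # 2 (windows AABC and ABCB)
```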
·
Friday, October 26. Assignment 1 – part 2: Part 2 of Assignment 1 is
out, on the Assignments web page.
·
Thursday, October 25. Turn-in: To turn in the first part of
Assignment 1, use the command: turnin assignment1a@ple059 <your files>.
Write your name and student number in the submitted files.
·
Friday, October 19. Assignment 1 – part 1: Part 1 of Assignment 1 is
out, on the Assignments web page.