Lecture Slides
The slides for this course draw on slides and material from
other courses and books. We thank Tan, Steinbach and Kumar;
Anand Rajaraman and Jeff Ullman; and Evimaria Terzi for the
material from their slides that we have used in this course.
Lecture 1:
Introduction to Data Mining (ppt, pdf)
Lecture 2: Data,
pre-processing and post-processing (ppt, pdf)
Lecture 3: Frequent
Itemsets, Association Rules, Apriori
algorithm. (ppt, pdf)
Lecture 4: Frequent
Itemsets, Association Rules. Evaluation.
Beyond Apriori
(ppt, pdf)
Lecture 5: Similarity and
Distance. Metrics. Min-wise independent
hashing. (ppt, pdf)
Lecture 6: Min-wise independent hashing. Locality
Sensitive Hashing. Clustering, K-means
algorithm (ppt, pdf)
Lecture 7: Hierarchical
clustering, DBSCAN, Mixture models and the
EM algorithm (ppt, pdf)
Lecture 8a: Clustering Validity, Minimum
Description Length (MDL), Introduction to
Information Theory, Co-clustering using MDL. (ppt, pdf)
- Deepayan Chakrabarti,
Spiros Papadimitriou, Dharmendra Modha, Christos
Faloutsos, Fully Automatic
Cross-Associations, KDD 2004, Seattle,
August 2004. [PDF]
- Some details about MDL and Information
Theory can be found in the book “Introduction
to Data Mining” by Tan, Steinbach, Kumar
(chapters 2, 4).
Lecture 8b: Clustering Validity, Minimum
Description Length (MDL), Introduction to
Information Theory, Co-clustering using MDL. (ppt, pdf)
- Chapter 2,
Evimaria Terzi, Problems
and Algorithms for Sequence Segmentations, Ph.D.
Thesis (PDF)
Lecture 9: Dimensionality Reduction, Singular
Value Decomposition (SVD), Principal Component
Analysis (PCA). (ppt, pdf)
Lecture 10a: Classification. Decision Trees.
Evaluation. (ppt, pdf)
Lecture 10b: Classification. k-Nearest
Neighbor classifier, Logistic Regression,
Support Vector Machines (SVM), Naive Bayes (ppt, pdf)
Lecture 11: Naive Bayes classifier.
Supervised Learning. Web Search and PageRank (ppt, pdf)
Lecture 12: Link Analysis
Ranking: PageRank, HITS, Random
Walks (ppt, pdf)
Lecture 13: Absorbing Random
Walks. Coverage Problems (Set
Cover, Maximum Coverage) (ppt, pdf)