Y-7 Parallel Systems / Spring 2005
Main Project:
- MPI implementations and collective communications for switch-based ethernet clusters (Κρεμμυδάς - Σκυβαλίδας)
- Parallelism in the Linux kernel (Καραγιάννης - Μελισσόβας)
- Simulator for multistage interconnection networks (Σαούγκος)
- Survey of sDSM libraries and experimentation (Μαργαρίτη)
- Write an extensive report (at least 20-30 pages) about OpenMP: what it is, the history of its development, a description of its directives and operation, examples of how to use it, etc. It should amount to a complete tutorial on OpenMP. Try out (code, measure, and include in your report) OpenMP programs for pi, matrix times vector, and matrix times matrix, on atlantis (icc) and helios.cc.
- Problems 6.2 and 6.7 (for 6.2, a general algorithm for any N is required)
- Prove Amdahl's law using Gustafson's law or vice versa
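One standard route (a sketch of the usual argument, not necessarily the intended one): both laws describe the same execution, a serial part taking time s and a perfectly parallelizable part taking time c when spread over p processors.

```latex
% Parallel and serial times for the same job:
T_p = s + c, \qquad T_1 = s + p\,c .

% Gustafson measures the serial fraction on the parallel machine,
% \alpha = s/(s+c), giving the scaled speedup:
S \;=\; \frac{T_1}{T_p} \;=\; \frac{s + p c}{s + c} \;=\; \alpha + p\,(1-\alpha).

% Amdahl measures it on the serial machine, f = s/(s + p c).  Then
f + \frac{1-f}{p}
  \;=\; \frac{s}{s+pc} + \frac{c}{s+pc}
  \;=\; \frac{s+c}{s+pc}
  \;=\; \frac{1}{S},
\quad\text{so}\quad S = \frac{1}{f + (1-f)/p}.

% The two serial fractions are related by
f \;=\; \frac{\alpha}{\alpha + p\,(1-\alpha)},
% so each law follows from the other by this change of variable.
```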
- Find the optimum time for performing total exchange in linear graphs under the multiport model.
- (bonus: +1 on the final grade) Find, analyze and formally prove an optimal algorithm for the previous problem.
- Measure the message setup and per-byte times on helios.cc and on our local net. Use a simple ping-pong benchmark with message sizes from 1 byte to 64 KBytes. Plot the results and use a least-squares fit to determine the setup and per-byte times (repeat the measurements many times and average the results).
- Write a C program for mapping arbitrary meshes/tori to hypercubes, using (partial) gray codes. The program should ask for the dimension of the hypercube, the number of dimensions of the mesh, and the number of nodes in each dimension of the mesh.
- Problem 4.1 (you may assume square mesh and torus MxM).
- Write, execute and measure parallel MPI programs for matrix multiplication using checkerboard partitioning and strip partitioning (where each process calculates a set of rows of the result).
- Repeat your measurements (at least 4 times) and average the results
- Use 1, 4, 16, 64 processes, 512x512 integer matrices
- Your report should contain plots of the results and discussion.
- All programs are to be run on helios.cc and our local net.
- Reading assignment (for 19/4/2005):
- Hoard memory allocator: Μαργαρίτη, Σαούγκος
- Algorithms for locks and barriers: Καραγιάννης, Μελισσόβας
- HyperCup - forming hypercubes in P2P systems: Κρεμμυδάς, Σκυβαλίδας
- Prepare a detailed report on the MESI and MOESI cache coherency protocols.
- Write a detailed summary of paper [4] in the "Memory consistency" course notes.
- Prepare a detailed report on AMD-64 and Intel Itanium processor architectures, regarding their cache coherence mechanisms and their memory model.
- Problems: 3.2 - 3.6
- Learn how to time in Unix; find out about high resolution timers; write a summary report (3-4 pages).
- Write, execute and measure parallel programs for matrix multiplication, using POSIX threads (pthreads), checkerboard partitioning and self-scheduling.
Notes:
- Repeat your measurements (at least 4 times) and average the results
- Use 1, 4, 16, 64 threads, 512x512 integer matrices
- Your report should contain plots of the results and discussion.
- All programs are to be run on atlantis, titan and helios.cc.
- Here is a serial program for matrix multiplication and two matrices A, B to play with.
- Design, implement and test a barrier primitive for pthreads (barrier_init(), barrier_wait(), barrier_destroy()). Since you can easily find code on the internet, a great deal of emphasis will be placed on the detailed explanation of how your code works and on the tests you have done.
- Problems (on paper only, don't submit programs): 8.1, 8.3-8.6, 8.8, 8.10-8.14
- Create your own home page for this course
- Survey all parallel UoI machines in Computer Center & Dept. of Computer Science (IP address/name, model type, # processors, O.S., total memory)
- Obtain accounts on helios.cc and atlantis.cs.
- Problems: 1.1, 1.2 & DBPP 1.1, 1.2, 1.3, 1.6, 1.7
Course notes/material:
- Butterfly and related networks slides (postscript files, .tar.gz)
- Memory consistency (.ps.gz, in greek)
Students:
Student | URL
Αναστάσιος Καραγιάννης | ktasos
Νίκος Κρεμμυδάς | nkremmid
Παναγιώτης Σκυβαλίδας | skyval
Σπυριδούλα Μαργαρίτη | smargari
Σπύρος Μελισσόβας | peppe
Δημήτρης Σαούγκος | dsaougos
Course outline:
Introduction: parallel systems and their programming
Shared-memory programming: processes, threads, apps, openMP
Shared-memory machines: architecture, networks, cache coherence, memory consistency
Message-passing programming: communication primitives, MPI, apps
Distributed-memory machines: architecture, interconnection networks, routing, collective communications, distributed shared memory
Performance and metrics: Amdahl, Gustafson, granularity, scalability
Topics on shared-memory systems and programming: TBA
Topics on distributed-memory systems and message passing: TBA
Books:
- The instructor's
- I. Foster, Designing and Building Parallel Programs (DBPP), available online
- V. Kumar et al, Introduction to Parallel Computing: Design and Analysis of Algorithms
- D. Culler et al, Parallel Computer Architecture: A Hardware/Software Approach
Journals:
The library has many relevant technical journals & conference proceedings. Keep an eye on the following general-purpose magazines, too: IEEE Computer, IEEE Micro, ACM Computing Surveys.
A few relevant links:
- Internet Parallel Computing Archive
- MPI
- OpenMP
- Processes, shared memory and semaphores in Unix, a short tutorial (VVD)
- How to do timing in Unix (VVD)
- Instructions for using LAM / MPI (VVD)
AMD, Compaq, Cray, DEC, Fujitsu, HP,
IBM, Intel, Meiko, MIPS, Motorola, NEC, SGI, Sun