Y-7 Parallel Systems / Spring 2003
Final Exam: Thursday, 12/6/2003, 16:00
Project (30%):
Schedule:
- Tuesday 27/5/2003, progress presentation, 10 min (10%)
- 1 week after the final exams, final presentation, 15 min (10%)
- DSM over MPI (Τζίκας/Πετράκης and Καστίδου/Καρακασίδης)
- Implementation of various spin-lock algorithms for x86, SPARC and MIPS processors (Γεωργόπουλος)
- Implementation and performance study of various barrier algorithms (Παπαφώτη)
- POSIX Threads, implementations, performance (Καρβέλης)
- Survey of parallel programming languages (Φιλίππου)
- Historical survey of parallel machines (Σεβεντεκίδης)
- Benchmarks, visualization and profiling tools for OpenMP (Λεοντιάδης)
- Parallelization of a neural network training process using POSIX threads and MPI (Βαλασούλης)
- A small message passing library (Χάντας)
Optional mini-project (20%):
Study in depth the parallelization of Gaussian elimination (for solving
systems of linear equations), both for shared-memory programming (using
pthreads) and for message passing (using MPI). You are allowed to use books,
resources on the internet, etc. Discuss all your thoughts on how to
parallelize each part, and implement the parallel programs. Your report
should be very detailed and more than 10 pages in length. (A serial
elimination kernel is sketched at the end of this section.)
- Deadline/presentation: Friday 6/6/2003, 10:30
- Sample 128x128 linear system Ax = b: matrix A, vector b.
- Contest: The one who writes the fastest pthread program and the one who writes the fastest MPI program will be exempted from the final exam!
- Grades:
--> Βαλασούλης: 9, Καρακασίδης: 8, Λεοντιάδης: 9, Πετράκης: 10 (the winner!)
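For orientation only (the parallelization analysis is the whole point of the mini-project), a serial elimination kernel might look like the sketch below; the names and the use of partial pivoting are illustrative. The outer k loop carries a dependence across steps, while the row updates inside one step are mutually independent, which makes row distribution (block or cyclic) the natural starting point for both the pthreads and the MPI versions.

    #include <math.h>

    #define N 128

    /* Forward elimination with partial pivoting, then back substitution.
     * The updates of rows k+1..N-1 inside one elimination step are
     * mutually independent -- this is the work to split among threads
     * or MPI processes (e.g. cyclic row distribution for load balance). */
    void gauss(double a[N][N], double b[N], double x[N])
    {
        for (int k = 0; k < N; k++) {
            /* partial pivoting: bring the largest |a[i][k]| to row k */
            int p = k;
            for (int i = k + 1; i < N; i++)
                if (fabs(a[i][k]) > fabs(a[p][k])) p = i;
            if (p != k) {
                for (int j = k; j < N; j++) {
                    double t = a[k][j]; a[k][j] = a[p][j]; a[p][j] = t;
                }
                double t = b[k]; b[k] = b[p]; b[p] = t;
            }
            /* independent row updates: the parallel part */
            for (int i = k + 1; i < N; i++) {
                double m = a[i][k] / a[k][k];
                for (int j = k; j < N; j++)
                    a[i][j] -= m * a[k][j];
                b[i] -= m * b[k];
            }
        }
        /* back substitution (also parallelizable, but less profitable) */
        for (int i = N - 1; i >= 0; i--) {
            x[i] = b[i];
            for (int j = i + 1; j < N; j++)
                x[i] -= a[i][j] * x[j];
            x[i] /= a[i][i];
        }
    }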
- Problems: 4.1-4.3
- In 4.1 assume a square MxM mesh and torus. You may also assume that M is odd in the case of the torus. Obtain closed-form formulas.
- In 4.3, also map a 6x8x10 torus onto the smallest possible hypercube (using Gray codes); a reminder of the Gray-code trick follows.
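As a reminder of the standard trick (a sketch only; the sides of the 6x8x10 torus that are not powers of two need an even-length variant of the reflected code, and the principle carries over): gray(i) = i XOR (i >> 1) maps a ring of 2^k nodes onto a k-cube so that consecutive ring positions differ in exactly one bit, and a torus mapping concatenates one such code per dimension.

    #include <stdio.h>

    /* Reflected (binary) Gray code: consecutive values differ in one bit */
    unsigned gray(unsigned i) { return i ^ (i >> 1); }

    int main(void)
    {
        for (unsigned i = 0; i < 8; i++) {   /* ring of 8 -> 3-cube */
            unsigned g = gray(i);
            printf("ring node %u -> cube node %u%u%u\n", i,
                   (g >> 2) & 1, (g >> 1) & 1, g & 1);
        }
        return 0;
    }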
- Measure the message setup and per-byte times on helios.cc and on our local net. Use a simple ping-pong benchmark with message sizes from 1 byte to 64 KBytes. Plot the results and use a least-squares fit to determine the setup and per-byte times (you should repeat measurements many times and average the results). A minimal ping-pong sketch follows this list.
- Write, execute and measure parallel programs for matrix multiplication using checkerboard and strip partitioning (where each process computes a set of rows of the result); an MPI strip-partitioning sketch also follows this list.
- Repeat your measurements (at least 4 times) and average the results
- Use 1, 4, 16, 64 processes, 512x512 integer matrices
- Your report should contain plots of the results and discussion.
- All programs are to be run on helios.cc and on our local net.
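One possible shape for the ping-pong measurement (a sketch, assuming exactly two MPI processes and taking the message size on the command line; the repetition count is illustrative):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define REPS 1000                      /* repeat and average */

    int main(int argc, char *argv[])
    {
        int rank;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int size = atoi(argv[1]);          /* message size in bytes */
        char *buf = malloc(size);

        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {               /* run with exactly 2 processes */
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else {
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = (MPI_Wtime() - t0) / (2.0 * REPS);   /* one-way time */

        if (rank == 0)
            printf("%d bytes: %g sec one-way\n", size, t);

        free(buf);
        MPI_Finalize();
        return 0;
    }

With the one-way times t(n) measured over the range of sizes n, a least-squares fit of t(n) = t_setup + n * t_byte gives the two parameters.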
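And a minimal sketch of the strip-partitioned multiplication, assuming the number of processes divides N = 512 exactly: A is scattered by rows, B is broadcast whole, and each process computes its strip of C (a checkerboard version would partition all three matrices into square blocks instead).

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 512

    int main(int argc, char *argv[])
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int rows = N / nprocs;             /* assumes nprocs divides N */
        int *A = NULL, *C = NULL;
        int *B   = malloc(N * N * sizeof(int));
        int *myA = malloc(rows * N * sizeof(int));
        int *myC = malloc(rows * N * sizeof(int));

        if (rank == 0) {                   /* initialize on the root */
            A = malloc(N * N * sizeof(int));
            C = malloc(N * N * sizeof(int));
            for (int i = 0; i < N * N; i++) { A[i] = 1; B[i] = 2; }
        }

        double t0 = MPI_Wtime();
        MPI_Scatter(A, rows * N, MPI_INT, myA, rows * N, MPI_INT,
                    0, MPI_COMM_WORLD);
        MPI_Bcast(B, N * N, MPI_INT, 0, MPI_COMM_WORLD);

        for (int i = 0; i < rows; i++)     /* my strip of the result */
            for (int j = 0; j < N; j++) {
                int s = 0;
                for (int k = 0; k < N; k++)
                    s += myA[i * N + k] * B[k * N + j];
                myC[i * N + j] = s;
            }

        MPI_Gather(myC, rows * N, MPI_INT, C, rows * N, MPI_INT,
                   0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("%d procs: %g sec\n", nprocs, MPI_Wtime() - t0);

        MPI_Finalize();
        return 0;
    }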
- Problems: 9.1-9.3
- Survey modern commercial bus-based shared-memory machines.
- Survey modern commercial CPUs that support multiprocessing (model, basic characteristics, cache coherence protocol, memory consistency model).
- Problems: 3.2 - 3.6
- Read the instructor's notes on memory consistency and report errors, typos etc.
- Memory consistency (.ps.gz, in Greek)
- Show the equivalence of the butterfly network to the omega, baseline, or delta networks (here are some of the slides for the butterfly and related networks); a sketch of the standard stage notation follows.
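As a starting point for the notation (a sketch, not the proof): with N = 2^n terminals labeled by n-bit strings, each network is a cascade of the stage permutations below, and "equivalence" amounts to finding a relabeling of the switches that maps one cascade onto the other.

    % Stage permutations on n-bit terminal labels x = b_{n-1} b_{n-2} \cdots b_0
    \sigma(b_{n-1} b_{n-2} \cdots b_0) = b_{n-2} \cdots b_0 \, b_{n-1}
    % perfect shuffle: each omega stage is \sigma followed by a column of
    % 2x2 switches, i.e. an optional exchange on the last bit b_0
    \beta_i(x) = x \oplus 2^i
    % butterfly stage i pairs the nodes whose labels differ in bit i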
- Learn how to time things in Unix; find out about high-resolution timers; write a summary report (2-3 pages).
- Measure the time overhead of fork() and pthread_create(). You should repeat the measurements many times (e.g. 1000) and average the results. Run your tests on atlantis, titan and helios.cc. Be careful, it is a bit tricky!
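A minimal sketch of the pthread_create() half, using gettimeofday() (Solaris machines also offer the higher-resolution gethrtime(); finding what each system provides is part of the exercise). The fork() version is analogous, with the child exiting immediately and the parent calling wait():

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/time.h>

    #define REPS 1000                 /* average over many repetitions */

    void *nothing(void *arg) { (void)arg; return NULL; }

    int main(void)
    {
        struct timeval t0, t1;
        pthread_t tid;

        gettimeofday(&t0, NULL);
        for (int i = 0; i < REPS; i++) {
            pthread_create(&tid, NULL, nothing, NULL);
            pthread_join(tid, NULL);  /* note: creation + join measured
                                         together; without the join,
                                         threads pile up -- this is part
                                         of what makes it tricky */
        }
        gettimeofday(&t1, NULL);

        double usec = (t1.tv_sec - t0.tv_sec) * 1e6
                    + (t1.tv_usec - t0.tv_usec);
        printf("create+join: %g usec on average\n", usec / REPS);
        return 0;
    }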
- Write, execute and measure parallel programs for calculating π = 3.14... and for matrix multiplication.
- using SysV shared memory and processes, do π with block scheduling and matrix multiplication with checkerboard partitioning
- using POSIX threads (pthreads), do π with loop splitting and matrix multiplication with self-scheduling; a loop-splitting sketch follows this list.
- Repeat your measurements (at least 4 times) and average the results
- Use 1, 4, 16, 64 processes / threads, 512x512 integer matrices
- Your report should contain plots of the results and discussion.
- All programs are to be run on atlantis, titan and helios.cc.
- Here is a serial program for matrix multiplication and two matrices A, B to play with.
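For the pthreads/loop-splitting combination, a minimal sketch (thread t takes iterations t, t+T, t+2T, ...; partial sums stay private and are combined under a mutex at the end; the interval count and argument handling are illustrative):

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 10000000               /* number of integration intervals */

    int nthreads;
    double pi = 0.0;
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Loop splitting: thread t handles iterations t, t+T, t+2T, ... */
    void *work(void *arg)
    {
        int t = (int)(long)arg;
        double w = 1.0 / N, sum = 0.0;

        for (int i = t; i < N; i += nthreads) {
            double x = (i + 0.5) * w;     /* midpoint of interval i */
            sum += 4.0 / (1.0 + x * x);
        }
        pthread_mutex_lock(&lock);        /* one update per thread */
        pi += sum * w;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(int argc, char *argv[])
    {
        nthreads = (argc > 1) ? atoi(argv[1]) : 4;
        pthread_t tid[64];                /* up to 64, as in the assignment */

        for (long t = 0; t < nthreads; t++)
            pthread_create(&tid[t], NULL, work, (void *)t);
        for (int t = 0; t < nthreads; t++)
            pthread_join(tid[t], NULL);

        printf("pi = %.10f\n", pi);
        return 0;
    }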
- Problems: 8.1, 8.3-8.6, 8.8, 8.10-8.14
- Create your own home page for this course
- Survey all parallel UoI machines in Computer Center & Dept. of Computer Science (IP address/name, model type, # processors, O.S., total memory)
- Survey Grand Challenges (HPCC '96) and their time requirements
- Problems: 1.1, 1.2 & DBPP 1.1, 1.2, 1.3
Student pages:
Course outline:
Introduction: parallel systems and their programming
Shared-memory programming: processes, threads, apps, OpenMP
Shared-memory machines: architecture, networks, cache coherence, memory consistency
Message-passing programming: communication primitives, MPI, apps
Distributed-memory machines: architecture, interconnection networks, routing, collective communications, distributed shared memory
Performance and metrics: Amdahl, Gustafson, granularity, scalability
Topics on shared-memory systems and programming: TBA
Topics on distributed-memory systems and message passing: TBA
Books:
- The instructor's notes
- I. Foster, Designing and Building Parallel Programs, DBPP, available on-line
- V. Kumar et al., Introduction to Parallel Computing: Design and Analysis of Algorithms
- D. Culler et al, Parallel Computer Architecture: A Hardware/Software Approach
Journals:
The library has many relevant technical journals & conference proceedings.
Keep an eye on the following general-purpose magazines, too:
IEEE Computer, IEEE Micro, ACM Computing Surveys
A few relevant links:
- Internet Parallel Computing Archive
- MPI
- OpenMP
- Processes, shared memory and semaphores in Unix, a short tutorial (VVD)
- How to time things in Unix (VVD)
- Instructions for using LAM/MPI (VVD)
AMD, Compaq, Cray, DEC, Fujitsu, HP, IBM, Intel, Meiko, MIPS, Motorola, NEC, SGI, Sun