Y-7 Parallel Systems / Spring 2003

Final Exam: Thursday, 12/6/2003, 16:00

Project (30%):

  • Tueday 27/5/2003, progress presentation, 10 min (10%)
  • 1 week after the final exams, final presentation, 15 min (10%)
  1. DSM over MPI (Τζίκας/Πετράκης και Καστίδου/Καρακασίδης)
  2. Implementation of various spin-locks algorithms for x86, sparc, MIPS processors (Γεωργόπουλος)
  3. Implementation and performance study of various barrier algorithms (Παπαφώτη)
  4. POSIX Threads, implementations, performance (Καρβέλης)
  5. Survey of parallel programming languages (Φιλίππου)
  6. Historical survey of parallel machines (Σεβεντεκίδης)
  7. Benchmarks, visualization and profiling tools for OpenMP (Λεοντιάδης)
  8. Parallelization of a neural network training process using Posix threads and MPI (Βαλασούλης)
  9. A small message passing library (Χάντας)

Optional mini-project (20%):
Study the parallelization of Gaussian elimination (for solving systems of linear equations) in depth, for shared-memory programming (using pthreads) and message passing (using MPI). You are allowed to use books, resources on the internet, etc. Discuss all your thoughts on how to parallelize each part, and implement parallel programs. Your report should be very detailed, and more than 10 pages in length.

  • Deadline/presentation: Friday 6/6/2003, 10:30
  • Sample 128x128 linear system Ax = b: matrix A, vector b.
  • Contest: The one who writes the fastest pthread program and the one who writes the fastest MPI program will be exempted from the final exam!
  • Grades:
    --> Βαλασούλης: 9, Καρακασίδης: 8, Λεοντιάδης: 9, Πετράκης: 10 (the winner!)
Assignment 5 (due 20/5/2003):
  1. Problems: 4.1-4.3
    • In 4.1 assume square mesh and torus MxM. You may also assume that M is odd in the case of torus. Obtain closed-form formulas.
    • In 4.3, also map a 6x8x10 torus in the smallest possible hypercube (using Gray codes).
Assignment 4 (due 13/5/2003):
  1. Measure message setup and per-byte time in helios.cc and in our local net. Use a simple ping-pong benchmark and message sizes of 1 byte - 64 KBytes. Plot the results and use a least squares fitting to determine setup and per-byte time (you should repeat measurements many times and average the results).
  2. Write, execute and measure parallel programs for matrix multiplication using checkerboard and strip partitioning (where each process calculates a set of rows of the result).
    • Repeat your measurements (at least 4 times) and average the results
    • Use 1, 4, 16, 64 processes, 512x512 integer matrices
    • Your report should contain plots of the results and discussion.
    • All prorgams are to be run in helios.cc and our local net.
  3. Problems: 9.1-9.3
Assignment 3 (due 15/4/2003):
  1. Survey modern commercial bus-based shared memory machines.
  2. Survey modern commercial CPUs that support multiprocessing (model, basic characteristics, cache coherence protocol, memory consistency model).
  3. Problems: 3.2 - 3.6
  4. Read the instructor's notes on memory consistency and report errors, typos etc.
Additional class notes:
Assignment 2 supplement (due 26/3/2003):
  • Show the equivallency of butterfly and omega, baseline or delta networks
    (here are some of the slides for the butterly and related networks)
Assignment 2 (due 26/3/2003):
  1. Learn how to time in Unix; find out about high resolution timers;
    write a summary report (2-3 pages).
  2. Measure fork()'s and pthread_create()'s time overhead.
    You should repeat measurements many times (e.g. 1000) and average the results. Run your tests in atlantis, titan and helios.cc.
    Be careful, it is a bit tricky!
  3. Write, execute and measure parallel programs for calculating π = 3,14.. and for matrix multiplication.
    • using SysV shared memory and processes, do π with block scheduling and matrix multiplication with checkerboard partitioning
    • using Posix Threads (pthreads), do π with loop splitting and matrix multiplication with self-scheduling.
    • Repeat your measurements (at least 4 times) and average the results
    • Use 1, 4, 16, 64 processes / threads, 512x512 integer matrices
    • Your report should contain plots of the results and discussion.
    • All prorgams are to be run in atlantis, titan and helios.cc.
    • Here is a serial program for matrix multiplication and two matrices A, B to play with.
  4. Problems: 8.1, 8.3-8.6, 8.8, 8.10-8.14
Assignment 1 (due 4/3/2003):
  1. Create your own home page for this course
  2. Survey all parallel UoI machines in Computer Center & Dept. of Computer Science (IP address/name, model type, # processors, O.S., total memory)
  3. Survey Grand Challenges (HPCC '96) and their time requirements
  4. Problems: 1.1, 1.2 & DBPP 1.1, 1.2, 1.3

Student pages:

Student / URL

Student / URL
Κώστας Βαλασούλης merit
Άλκης Γεωργόπουλος alkisg
Αλέξης Καρακασίδης

Πέτρος Καρβέλης
Γεωργία Καστίδου

Ηλίας Λεοντιάδης
Μαρία Παπαφώτη

Γιάννης Πετράκης
Κώστας Σεβεντεκίδης

Δημήτρης Τζίκας
Αγγελος Φιλίππου

Γιάννης Χάντας

Course outline:
Introduction: parallel systems and their programming
Shared-memory programming: processes, threads, apps, openMP
Shared-memory machines: architecture, networks, cache coherence, memory consistency
Message-passing programming: communication primitives, MPI, apps
Distributed-memory machines: architecture, interconnection networks, routing, collective communications, distributed shared memory
Performance and metrics: Amdhal, Gustafson, granularity, scalability
Topics on shared-memory systems and programming: TBA
Topics on distributed-memory systems and message passing: TBA


  • The instructor's
  • I. Foster, Designing and Building Parallel Programms, DBPP, available on-line
  • V. Kumar et al, Introduction to Parallel Computing: Design and Analysis of Algorithms
  • D. Culler et al, Parallel Computer Architecture: A Hardware/Software Approach

The library has many relevant technical journals & conference proceedings. Keep an eye on the following general-purpose magazines, too:
IEEE Computer, IEEE Micro, ACM Computing Surveys

A few relevant links:
Some relevant companies:
AMD, Compaq, Cray, DEC, Fujitsu, HP, IBM, Intel, Meiko, MIPS, Motorola, NEC, SGI, Sun