Y-7 Parallel Systems / Fall 2003

Main Project:
  1. Algorithms for barriers, implementation and measurements (Κακολύρης)
  2. Algorithms for parallel sorting and implementations (Τσώτσος)
  3. Posix threads API, system-specific implementations, example usage of primitives (Παπαδόπουλος)
  4. RPC and Java RMI (Πλίτσης)
  5. OpenMP and NPB - Nasa Parallel Benchmarks (Ζέλιου)
Mini Project (due 20/1/2004):
    Study and use the communicators and process topologies of MPI.
    • Write an extensive report on their usage
    • Use the cartesian process topologies to implement different programs for matrix multiplication.
    • Report your results on helios.cc and our local net.
    You are to implement and compare the strip partitioning method (no process topologies needed), the Cannon's algorithm (can also be found in the Data Parallelism chapter of my book) and the algorithm in the "MPI: The complete reference" book, Section 6.8. Try to use non-blocking communications and the MPI_Sendrecv_replace() call.
Assignment 6 (due 3/2/2004):
  1. Problem 6.7
  2. Prove Amdahl's law using Gustafson's law
  3. Prove that in a hypercube exactly half of the nodes have odd weight
  4. Write C programs for mapping meshes/tori to hypercubes.
Assignment 5 (due 13/1/2004):
  1. Problem 4.1
    Assume the mesh and torus are square MxM
Assignment 4 (due 23/12/2003):
  1. Write, execute and measure parallel MPI programs for matrix multiplication using checkerboard and strip partitioning (where each process calculates a set of rows of the result).
    • Repeat your measurements (at least 4 times) and average the results
    • Use 1, 4, 16, 64 processes, 512x512 integer matrices
    • Your report should contain plots of the results and discussion.
    • All prorgams are to be run in helios.cc and our local net.
  2. Problems: 9.1-9.3
Assignment 3 (due 10/12/2003):
  1. Show the equvallency of butterfly, omega and binary delta networks
  2. Prepare a report on the MOESI cache coherency protocol of the AMD 64-bit processors.
  3. Survey the contemporary bus-based machines of Sun, HP/Compaq and SGI.
  4. Problems: 3.2, 3.3, 3.4, 3.5, 3.6
Assignment 2 (due 2/12/2003):
  1. Learn how to time in Unix; find out about high resolution timers; write a summary report (3-4 pages).
  2. Measure fork()'s and pthread_create()'s time overhead.
    You should repeat measurements many times (e.g. 1000) and average the results. Run your tests in atlantis, titan and helios.cc. Be careful, it is a bit tricky!
  3. Write, execute and measure parallel programs for calculating π = 3,14.. and for matrix multiplication.
    • using SysV shared memory and processes, do π with block scheduling and matrix multiplication with self-scheduling
    • using Posix Threads (pthreads), do π with loop splitting and matrix multiplication with checkerboard partitioning.
    • Repeat your measurements (at least 4 times) and average the results
    • Use 1, 4, 16, 64 processes / threads, 512x512 integer matrices
    • Your report should contain plots of the results and discussion.
    • All prorgams are to be run in atlantis, titan and helios.cc.
    • Here is a serial program for matrix multiplication and two matrices A, B to play with.
  4. Problems: 8.1, 8.3-8.6, 8.8, 8.10-8.14
Assignment 1 (due 18/11/2003):
  1. Create your own home page for this course
  2. Survey all parallel UoI machines in Computer Center & Dept. of Computer Science (IP address/name, model type, # processors, O.S., total memory)
  3. Problems: 1.1, 1.2 & DBPP 1.1, 1.2, 1.3

Course notes/material:


Student / URL
Ιωάννα Ζέλιου ioanna
Ανδρέας Κακολύρης akakolir
Φίλιππος Παπαδόπουλος
Ζήσης Πλίτσης
Θοδωρής Τσώτσος

Course outline:
Introduction: parallel systems and their programming
Shared-memory programming: processes, threads, apps, openMP
Shared-memory machines: architecture, networks, cache coherence, memory consistency
Message-passing programming: communication primitives, MPI, apps
Distributed-memory machines: architecture, interconnection networks, routing, collective communications, distributed shared memory
Performance and metrics: Amdhal, Gustafson, granularity, scalability
Topics on shared-memory systems and programming: TBA
Topics on distributed-memory systems and message passing: TBA


  • The instructor's
  • I. Foster, Designing and Building Parallel Programms, DBPP, available on-line
  • V. Kumar et al, Introduction to Parallel Computing: Design and Analysis of Algorithms
  • D. Culler et al, Parallel Computer Architecture: A Hardware/Software Approach

The library has many relevant technical journals & conference proceedings. Keep an eye on the following general-purpose magazines, too:
IEEE Computer, IEEE Micro, ACM Computing Surveys

A few relevant links:
Some relevant companies:
AMD, Compaq, Cray, DEC, Fujitsu, HP, IBM, Intel, Meiko, MIPS, Motorola, NEC, SGI, Sun