I came across an interesting introductory codeproject tutorial on Compare and Swap (CAS) instructions* for lock free parallelism. Parallel programming frequently makes use of locking constructs, such as mutexes, semaphores, and critical sections to suspend other threads while the lock is in place. Such suspensions typically consume a great deal of time and thus adversely affect system performance. Lock free and wait free parallelism in contrast uses atomic instructions such as CAS to guarantee that data is not corrupted during writes, and to avoid suspension of other threads. A good deal of theoretical background can be found by following the links from this Wikipedia article.
Valve has published a GDC 2007 presentation called Dragged Kicking and Screaming: Source Multicore that talks about how they migrated to a lock-free threading system; some parts of their implementation are also wait-free. They structured their game architecture around what they call a hybrid threading model, where algorithms are broken up according to appropriate task models. They support high level constructs of function threading, where work is parceled out to other threads, but appears to complete in line, array parallelism, where work is described in array slices, and queued execution. To that end, they provide some templated functors and functional interfaces that abstract the implementation details from the application programmers.
For comparison with another approach, please see this analysis of multi-threaded rendering in Capcom's MT2 engine. The MT2 engine has to deal with further complexity to support parallelism on the PS3 in that every SPU task has to have a corresponding PPU thread to marshal it.
I definitely have a lot to think about before I approach my next parallelization task!
* CAS takes three arguments; an address, the expected old value, and the new value. If the expected old value and the data at the address agree, then the new value is written to the address. If the address was written, success is reported. This allows an algorithm to read a value from memory, modify it, and write it to memory only if no other process has written to it in the mean time.
In a typical implementation, a lock free algorithm using CAS re-attempts its computation if a value was written under it. This algorithm would not be wait-free. The tutorial on codeproject implements a lock-free queue implementation that waits. The linked list implementation in Valve's presentation also waits. The algorithms are structured around atomic operations such as rewriting the head or tail of the list.