Christer Ericson has posted an analysis of input to rendered frame lag, with reference to Mick West's article in the September 2007 Game Developer magazine. Christer talks about restructuring the game loop to minimize latency, and derives the lowest possible latency structure. Between a user forming an intention, such as “press the button”, and the button actually getting pressed, there is a delay on the order of several tens of milliseconds. This delay is a result of the fact that neural signals travel at about fifty kilometers an hour, and have got a distance of around a meter to traverse, and proprioception requires a round trip to signal that the finger has felt the push. Our minds have already factored this delay into our perceptions, but it is a contributing case cause of games feeling mushy when frame rate lag is occuring - we perceive that the proprioceptive loop is closed (we feel that we wanted to push the button, and that the button was pushed), but our eyes see the lag of the action on the screen and that sensory disconnect results in the mushy feeling. An amusing side note is that the size of a baseball field naturally matches this inherent round trip latency - the distance of the pitcher to the batter is just larger than the speed of thought quite literally (coupled with the ability of the human musculature to accelerate the bat - 45m/s vs 18m = 400ms). The distance of a soccer penalty kick to the goalee is actually just smaller than the speed of thought + reaction time, which I think reflects the character of the gameplay! (Thanks to Mick for feedback on this point.) Christer's latency reduction technique requires advanced ju-ju, in that you have to know how to structure your rendering system such that the GPU can consume the render instructions as fast as you can generate them. Christer is right on in that he puts the vertical blank and the draw submissions right at the top of the frame - this gives the GPU the best possible chance to beat the raster to the next blank. I have a further extension to Christer's thesis, which is that in a multiple CPU scenario, there is in fact one more frame of delay that can encrue. If there is simulation going on, such as particle physics, smoke, and other such systems that don't need to show instantaneous response to input, it's possible to put that work on another CPU, and have it lag the game itself by a single frame. In the diagram, light blue indicates the shortest possible latency between user input and display. Yellow indicates the maximum latency between a user input and display, this being the case where the user performed an input in the Logic segment of Frame 1. Green indicates the latency of simulation to the display, and red indicates the worst case latency of simulation that reacts to user input from a previous frame. Nonetheless, having simulation as a separate process has a major advantage, in that heavy simulation work can proceed at its own pace and not block the critical draw segment which must always run faster than the GPU. It also allows an obvious way to leverage multi-core or multi-processor systems. The disadvantage of this is that resources such as texture or vertex buffers may need to be double buffered. Consider the calculation of a water surface. If the water is simulated in frame 2 in the diagram, it will be submitted to the GPU in frame 3. During that period, the vertex buffer is likely to be locked, so in order to calculate frame 4's water during frame 3, a second buffer is necessary in that period. An advanced technique is to put fences and syncs into the Draw segment. The fence would be injected immediately after the submission of the water surface in the Draw segment. When the fence is hit, the water simulation could be unblocked and it could recycle the vertex buffer as long as it is guaranteed that the simulation and buffer rewrite can be finished before the Render needs it again. A sync in the Draw segment just before the submission of the water surface would synchronize the simulation thread to ensure that the thread is done with the water surface. Ideally, in this advanced scenario, the simulation and render are interleaved such that thread synchronization occurs incredibly rarely allowing all processors to speed at maximum throughput. Our later internally developed PS2 titles worked this way. The submission engine architecture I talk about in other posts leverages this structure. The submission engine tracks a persistent draw list in order to leverage frame to frame coherency to minimize the amount of CPU time consumed by the Draw segment. The submission engine double buffers the draw list so that the creation of the next draw list can occur while the present one is being processed. This further leverages multi-core/multi-processor systems by reducing hard dependencies between segments.