Latency

Raising bandwidth is comparatively simple: add more channels, widen the bus, transfer more data per clock, or just raise the speed of the data transfers. Lowering latency is not nearly as simple. We want to lower latency as much as we can. Latency is defined as the time difference between when a command is given and when that command is actually executed. In any memory read or write operation, many things add to latency. One of the first occurs when the Front Side Bus and the memory controller are not running at the same clock speed. With any divider other than 1:1, the two devices have to wait for their clocks to line up before any signal can be sent between them. For example, when a memory divider of 5:4 is selected, for every 5 cycles of the FSB the memory completes only 4. That means the two can talk to each other only once every 5 FSB cycles. So if the CPU sends a read request to memory on the second cycle after the clocks last matched, the request must sit waiting 3 more cycles before it can actually go out. This is why it is always better for memory performance to try to use a 1:1 divider over all others, and why high-speed modules have become more common, to match the faster FSBs being used.
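The divider wait can be sketched in a few lines of Python. This is only a rough model of the description above, assuming the two clocks coincide exactly once every N FSB cycles for an N:M divider and that a request can cross only on one of those cycles. The function names are made up for illustration.

```python
def fsb_cycles_until_aligned(fsb_ratio: int, cycles_since_match: int) -> int:
    """FSB cycles a request waits before the two clocks next line up.

    Simplified model: with an fsb_ratio:mem_ratio divider, the clocks
    coincide once every `fsb_ratio` FSB cycles.
    """
    if fsb_ratio < 1:
        raise ValueError("fsb_ratio must be at least 1")
    return (-cycles_since_match) % fsb_ratio


def average_wait(fsb_ratio: int) -> float:
    """Mean wait if requests arrive equally often on every FSB cycle."""
    waits = [fsb_cycles_until_aligned(fsb_ratio, c) for c in range(fsb_ratio)]
    return sum(waits) / fsb_ratio


# 5:4 divider, request issued on the second cycle after the last match.
print(fsb_cycles_until_aligned(5, 2))  # 3
# 1:1 divider: every cycle is aligned, so there is never a wait.
print(fsb_cycles_until_aligned(1, 2))  # 0
# Average penalty of a 5:4 divider under this model, in FSB cycles.
print(average_wait(5))  # 2.0
```

Under this model a 1:1 divider always returns zero, which is the whole argument for running the memory in sync with the FSB.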

The rest of the latency is incurred in the memory module. When a request for a certain address is sent out by the north bridge memory controller, an ACTIVE command is first sent to the memory. This is followed by the row and bank address, which cause the desired row to go from "pre-charge" to "active". This is the tRP that can be adjusted in the BIOS. Generally this operation takes 2, 3 or 4 cycles. Next comes the Row Address Strobe (RAS) to Column Address Strobe (CAS) Delay, or tRCD. This is the amount of time needed to send the contents of the row to a buffer on the module. It also takes 2, 3 or 4 clock cycles. After this comes the much-ballyhooed tCL, or CAS Latency. This operation carries the column address with it, and takes 2, 2.5 or 3 clock cycles to send the contents of the now-defined cell to the driver. From there the burst sends the data out onto the bus. Now here is why CAS Latency matters. If the next memory read or write is to the same row, the only latency incurred is tCL to move from one cell to another. This happens often, since longer data strings usually occupy many successive addresses. Once another row is needed, though, a latency called tRAS is incurred before the next row can be called, and tRP starts the process over again. tRAS can take 5, 6, 7 or 8 clock cycles. This means that if a new row is required, 12 clock cycles of NOPs must be waited out before the data can be sent. This is a simplified version of what happens during a read/write operation. Many more operations take place, but these are the ones most motherboards let you change in the BIOS.
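To put rough numbers on the difference between staying in a row and changing rows, here is a small Python sketch. It assumes a deliberately simplified model: a same-row access costs only tCL, while a new-row access costs tRP + tRCD + tCL. tRAS is treated as a minimum row-open time and left out, so this does not try to reproduce the 12-cycle figure above. The function names, the example timings and the 200 MHz clock are illustrative assumptions.

```python
def access_cycles(t_cl: float, t_rcd: int, t_rp: int, same_row: bool) -> float:
    """Clock cycles before data starts to burst out (simplified model)."""
    if same_row:
        # Row is already active: only the column access is paid.
        return t_cl
    # Precharge, activate the new row, then do the column access.
    return t_rp + t_rcd + t_cl


def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Convert memory clock cycles to nanoseconds."""
    return cycles * 1000.0 / clock_mhz


# Assumed example: 200 MHz memory clock with tCL 2.5, tRCD 3, tRP 3.
hit = access_cycles(2.5, 3, 3, same_row=True)
miss = access_cycles(2.5, 3, 3, same_row=False)
print(hit, cycles_to_ns(hit, 200))    # 2.5 12.5
print(miss, cycles_to_ns(miss, 200))  # 8.5 42.5
```

Even in this stripped-down model the new-row access takes more than three times as long as the same-row one, which is why tightening any single timing only chips away at the total.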


Now, that was probably a bit confusing. I hope the following example will show WHY it is difficult to lower latency compared with raising bandwidth. Remember that latencies have dropped from about 120ns to around 50ns. In less time than that, bandwidth has gone from about 1GB/s for PC133 to about 8GB/s (theoretical) for a dual PC4000 setup.
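Before the analogy, a quick check of the figures above. The sketch below assumes the usual 64-bit (8-byte) module width, one transfer per clock for PC133 SDRAM at 133 MHz, and two transfers per clock for PC4000 (DDR500) at 250 MHz. The function name is made up for illustration.

```python
def peak_bandwidth_mb_s(clock_mhz: int, transfers_per_clock: int,
                        bus_bytes: int = 8, channels: int = 1) -> int:
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return clock_mhz * transfers_per_clock * bus_bytes * channels


pc133 = peak_bandwidth_mb_s(133, 1)                   # single-channel SDR
dual_pc4000 = peak_bandwidth_mb_s(250, 2, channels=2) # dual-channel DDR

print(pc133)        # 1064
print(dual_pc4000)  # 8000

# Improvement factors: bandwidth versus the 120ns -> 50ns latency drop.
print(round(dual_pc4000 / pc133, 1))  # 7.5
print(120 / 50)                       # 2.4
```

So theoretical bandwidth has grown roughly sevenfold over a period in which latency improved by a factor of only about 2.4.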

Consider a NASCAR team during a pit stop. The car first has to come in and stop in its stall. Then it must be jacked up on the driver's side, have fuel added and the tires removed and replaced, and be dropped back down. Everybody then races to the other side, jacks up the passenger side, repeats the same procedure with the tires, drops the car again, makes any needed changes to the spoiler, and takes out the fuel hose before the car can leave. Now, if a crew chief has trained his crew to the point where they literally can't get any faster, what is he to do? His only choice is to remove some of the steps. The problem with latency is that it is hard to remove any of the steps and still retain stability. Witness Intel's problems with Performance Acceleration Technology, known as "PAT" on the i875 boards and by many different names on the i865 ones. Enabling it tends to cause stability problems on the slower silicon of Springdale boards, because it removes some of the steps in a memory access.