Network processors exploit parallelism to attain wire-rate processing speeds. Different network processor architectures exploit several kinds of parallelism, including packet-level, instruction-level and thread-level parallelism.
Packet-level parallelism. Incoming packets are largely independent of one another and thus exhibit significant packet-level parallelism for the network processor to exploit. This independence is not complete, however: ordering constraints on packets within any given flow limit the available parallelism.
The key idea behind exploiting packet-level parallelism is to employ several processing engines (PEs) in parallel so that each PE can operate on a separate packet. All such PEs are programmed to perform similar functions. A straightforward implementation approach is to use simplified RISC cores, interconnected through a shared bus, as the PEs. Most commercial network processors exploit packet-level parallelism, typically by integrating multiple RISC cores on a single chip.
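As a rough illustration of how packets can be spread over PEs without breaking per-flow ordering, here is a minimal Python sketch. It is not taken from any particular network processor: the packet field names, the choice of a CRC over the 5-tuple, and NUM_PES are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_PES = 4  # hypothetical number of processing engines


def flow_key(pkt):
    """Identify a flow by its 5-tuple (field names are assumed for this sketch)."""
    return (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])


def pick_pe(pkt, num_pes=NUM_PES):
    """Map a packet to a PE index; every packet of one flow gets the same PE."""
    key = repr(flow_key(pkt)).encode()
    return zlib.crc32(key) % num_pes


def dispatch(packets, num_pes=NUM_PES):
    """Return one FIFO queue per PE, keeping arrival order inside each queue."""
    queues = defaultdict(list)
    for pkt in packets:
        queues[pick_pe(pkt, num_pes)].append(pkt)
    return queues
```

Packets from different flows can land on different PEs and be processed in parallel, while packets of the same flow stay in one queue in arrival order. The cost of this choice is that a single heavy flow cannot be spread over several PEs, which is the ordering limit mentioned above.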
Instruction-level parallelism. When instructions are independent of one another, their execution may take place in parallel or may be overlapped. This is instruction-level parallelism. Many network processors use heavily pipelined and multiple-issue architectures to exploit instruction-level parallelism and thus speed up the processing of each individual packet. Multiple-issue architectures include both superscalar and very long instruction word (VLIW) processors.
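To make "independent instructions" concrete, the following Python sketch checks register dependences between instructions and greedily packs consecutive independent ones into issue groups. It is a toy model under stated assumptions: the Instr type, the register names and the in-order greedy grouping are invented for illustration and do not describe how any real superscalar or VLIW toolchain schedules code.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str     # register written
    srcs: tuple   # registers read


def independent(earlier, later):
    """True if `later` has no RAW, WAW or WAR hazard against `earlier`."""
    raw = earlier.dest in later.srcs    # later reads what earlier writes
    waw = earlier.dest == later.dest    # both write the same register
    war = later.dest in earlier.srcs    # later overwrites what earlier reads
    return not (raw or waw or war)


def issue_groups(program, width=2):
    """Greedy in-order grouping into bundles of at most `width` instructions."""
    groups, current = [], []
    for ins in program:
        if len(current) < width and all(independent(p, ins) for p in current):
            current.append(ins)
        else:
            groups.append(current)
            current = [ins]
    if current:
        groups.append(current)
    return groups


program = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r5", "r6")),   # independent of the first instruction
    Instr("r7", ("r1", "r4")),   # needs r1 and r4
    Instr("r8", ("r7",)),        # needs r7
]
print([len(g) for g in issue_groups(program)])  # prints [2, 1, 1]
```

With a 2-wide issue the four instructions fit in three groups instead of four, because only the first two are independent. The dependent chain at the end cannot be overlapped no matter how wide the machine is.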
Thread-level parallelism. Thread-level parallelism lets the processor execute different instruction streams (threads) at the same time. A variety of architectures exploit thread-level parallelism to improve performance by executing multiple threads within one or multiple processors (PEs). Typically, a multithreaded architecture takes a superscalar processor as its basis and adds hardware support for multiple threads.
Types of multithreaded architectures include fine-grain multithreaded (FGMT) processors, simultaneous multithreaded (SMT) processors and single-chip multiprocessors (CMP).
Fine-grain multithreaded architecture. The FGMT architecture can fetch instructions from a different thread each cycle. This approach exploits the instruction-level parallelism within each thread and can also switch to another thread when one thread is stalled. The technique is limited by the degree of parallelism found in a single thread in a single cycle.
Simultaneous multithreaded architecture. The SMT architecture extends the FGMT architecture by adding support for instructions to be fetched and issued from multiple threads within the same cycle. One of its key advantages is that it can cope with varying levels of instruction-level parallelism within each thread and of thread-level parallelism between threads. If there is insufficient thread-level parallelism, most of the SMT processor's resources can be used to exploit instruction-level parallelism. On the other hand, when more thread-level parallelism exists, it can compensate for a lack of instruction-level parallelism within each thread. Thus the SMT architecture can use processor resources more effectively to achieve greater instruction throughput.
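The difference between FGMT and SMT issue can be sketched with a very simplified issue-slot model in Python. Everything here is an assumption made for illustration: ready[c][t] is a made-up count of instructions thread t could issue in cycle c, instructions that are not issued are simply dropped rather than carried over, and the round-robin thread pick stands in for whatever policy a real FGMT design uses.

```python
def fgmt_issued(ready, width):
    """FGMT model: one thread per cycle, chosen round-robin, skipping stalled threads."""
    total, nxt = 0, 0
    for cycle in ready:
        n = len(cycle)
        for k in range(n):
            t = (nxt + k) % n
            if cycle[t] > 0:                    # thread t is not stalled
                total += min(width, cycle[t])   # limited by that thread's ILP
                nxt = (t + 1) % n
                break
    return total


def smt_issued(ready, width):
    """SMT model: fill the issue slots of a cycle from all threads together."""
    total = 0
    for cycle in ready:
        slots = width
        for r in cycle:
            take = min(slots, r)
            total += take
            slots -= take
    return total


# Two threads, four cycles, 4-wide issue (16 issue slots in total).
ready = [[1, 2], [0, 3], [2, 2], [1, 0]]
print(fgmt_issued(ready, 4), smt_issued(ready, 4))  # prints 7 11
```

In this toy run the FGMT model fills 7 of the 16 slots, because each cycle is capped by what the single selected thread can issue, while the SMT model fills 11 by topping up unused slots from the other thread.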
Chip-multiprocessor. A CMP is a single chip partitioned into multiple independent processors, where each processor can operate on a different thread. This approach lets multiple threads execute completely in parallel, but the CMP architecture restricts each thread to the resources of the processor on which it is executing. This prevents chip multiprocessors from fully exploiting all possible instruction-level parallelism.
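The resource partitioning of a CMP can be illustrated with a small Python model. It is only a sketch under assumptions: each thread is pinned to its own core, ready[c][t] is a hypothetical count of instructions thread t could issue in cycle c, and core_width is an invented per-core issue width.

```python
def cmp_issued(ready, core_width):
    """CMP model: each thread runs on its own core and may use only that core's slots."""
    total = 0
    for cycle in ready:
        for r in cycle:
            total += min(core_width, r)   # idle slots on other cores cannot be borrowed
    return total


# Two threads on two 2-wide cores, four cycles (16 issue slots in total).
ready = [[1, 2], [0, 3], [2, 2], [1, 0]]
print(cmp_issued(ready, core_width=2))  # prints 10
```

In the second cycle one thread has three instructions ready but its core can issue only two, while the other core sits idle. That stranded slot is the kind of instruction-level parallelism a CMP cannot reach.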




Copyright Techfuels