The rapid expansion of software from simple text-based tools to massively complex, feature-rich, highly visual products would dominate the mass-market computing world during the 1980s and 90s. With this push came a higher demand on processors to both use more memory efficiently and grow in computing power, all while keeping costs at consumer-accessible levels.
RISE OF 32-BIT
During the mid-1980s, in response to the growing demands of software, the opening moves toward the mainstream adoption of 32-bit processor architecture would begin. While 32-bit architectures had existed in various forms as far back as 1948, particularly in mainframe use, at the desktop level only a few processors had full 32-bit capabilities. One of them was Motorola's 68020, introduced in 1984. Produced in speeds ranging from 12 MHz to 33 MHz, the 68020 had 32-bit internal and external data buses as well as a 32-bit address bus. Its arithmetic logic unit was also now natively 32-bit, allowing for single-clock-cycle 32-bit operations.
One year later, Intel would introduce its own true 32-bit processor family, the 80386. Not only did it offer a new set of 32-bit registers and a 32-bit internal architecture, it also added built-in debugging capabilities and a far more powerful memory management unit that addressed many of the criticisms of the 80286.
This allowed most of the instruction set to either target the newer 32-bit architecture or perform older 16-bit operations. With 32-bit addressing, the ability to directly address and manage 4 GiB (roughly 4.3 GB) of memory proved promising. This new scale of memory addressing would remain the predominant architecture of software for the next 15 years.
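The memory figure follows directly from the address width: each extra address bit doubles the number of bytes that can be reached. The short sketch below is only an illustration of that arithmetic; the function name is made up for this example and is not tied to any particular processor.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


# Compare a few address widths (the widths chosen here are just examples).
for bits in (16, 20, 24, 32):
    size = addressable_bytes(bits)
    print(f"{bits}-bit addresses reach {size:,} bytes")
```

For 32 address bits this works out to 4,294,967,296 bytes, or exactly 4 GiB.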
On top of this, protected mode could also be used in conjunction with a paging unit, combining segmentation and paging for memory management. The ability of the 386 to effectively disable segmentation by using one large segment allowed it to present a flat memory model in protected mode. This flat memory model, combined with the power of virtual addressing and paging, is arguably the most important feature change for the x86 processor family.
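To give a feel for what the paging unit does with a flat 32-bit address, the sketch below splits an address into the three fields used by the 386's two-level scheme with 4 KiB pages: a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset within the page. The function and constant names are invented for this illustration, and the actual table lookups are left out.

```python
PAGE_OFFSET_BITS = 12  # 4 KiB pages
TABLE_INDEX_BITS = 10  # 1024 entries per page table


def split_linear_address(addr: int) -> tuple:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    offset = addr & ((1 << PAGE_OFFSET_BITS) - 1)
    table = (addr >> PAGE_OFFSET_BITS) & ((1 << TABLE_INDEX_BITS) - 1)
    directory = addr >> (PAGE_OFFSET_BITS + TABLE_INDEX_BITS)
    return directory, table, offset


directory, table, offset = split_linear_address(0x00403025)
print(directory, table, hex(offset))  # prints: 1 3 0x25
```

In hardware, the directory and table indexes select entries in in-memory tables that supply the physical page, while the offset is carried over unchanged.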
PIPELINING
CPUs designed around pipelining can also generally run at higher clock speeds, because the simpler logic of each pipeline stage introduces fewer delays. Instruction data is usually passed from one stage to the next in pipeline registers, under the direction of control logic for each stage.
A data inconsistency that disrupts the flow of a pipeline is referred to as a data hazard. A control hazard occurs when a conditional branch instruction is still executing within the pipeline while instructions from the incorrect branch path are being loaded into it.
One common technique for handling data hazards is known as pipeline bubbling, in which the pipeline is stalled until the hazard clears. Operand forwarding is another technique, in which a result is passed directly to a later instruction in the pipeline before it has even been written back to a register. In some processor pipelines, out-of-order execution is used to help reduce underutilization of the pipeline during data hazard events.
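The difference between bubbling and forwarding can be sketched with a toy model of an in-order pipeline. Everything here is an assumption made for illustration: the three-instruction program, the register names, and the latency values (a result usable 3 cycles after issue without forwarding, 1 cycle with it, loosely following a textbook five-stage pipeline rather than any specific CPU).

```python
# Each instruction is (destination register, source registers).
PROGRAM = [
    ("r1", ("r2", "r3")),  # r1 = r2 + r3
    ("r4", ("r1", "r5")),  # needs r1 from the previous instruction
    ("r6", ("r4", "r1")),  # needs r4 from the previous instruction
]


def count_bubbles(program, result_latency):
    """Count stall cycles (bubbles) inserted to resolve data hazards.

    result_latency is the number of cycles after an instruction issues
    before a dependent instruction may issue (1 means no stall is needed).
    """
    ready_at = {}  # register -> earliest cycle a consumer may issue
    cycle = 0
    bubbles = 0
    for dest, sources in program:
        earliest = max([ready_at.get(s, 0) for s in sources], default=0)
        if earliest > cycle:
            bubbles += earliest - cycle  # stall until the operand is ready
            cycle = earliest
        ready_at[dest] = cycle + result_latency
        cycle += 1
    return bubbles


print(count_bubbles(PROGRAM, result_latency=3))  # bubbling only: 4
print(count_bubbles(PROGRAM, result_latency=1))  # with forwarding: 0
```

Under these assumptions the same dependent chain costs four wasted cycles when the pipeline can only stall, and none when results are forwarded.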
Control hazards are generally managed by attempting to choose the most likely path a conditional branch will take in order to avoid the need to reset the pipeline.
CACHING
In caching, a small amount of high-speed static memory is used to buffer access to a larger amount of lower-speed but less expensive dynamic memory.
A derived identifier called a tag is also stored within the cache block. It identifies which memory region, among all the regions that can map to that block, the block currently represents. While simple to implement, direct mapping creates a problem when two needed memory regions compete for the same mapped cache block.
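The competition problem is easy to see once an address is split into its parts. The sketch below assumes a hypothetical direct-mapped cache of 256 blocks of 16 bytes each; those sizes and the function name are chosen purely for illustration.

```python
BLOCK_SIZE = 16   # bytes per cache block (assumed)
NUM_BLOCKS = 256  # number of blocks in the cache (assumed)


def direct_map(address: int) -> tuple:
    """Return (tag, index, offset) for an address in a direct-mapped cache."""
    offset = address % BLOCK_SIZE         # byte within the block
    block_number = address // BLOCK_SIZE  # which memory block this is
    index = block_number % NUM_BLOCKS     # the one cache block it can use
    tag = block_number // NUM_BLOCKS      # which region currently occupies it
    return tag, index, offset


print(direct_map(0x1234))  # (1, 35, 4)
print(direct_map(0x2234))  # (2, 35, 4)
```

Both addresses land on cache block 35 but carry different tags, so a program alternating between them would evict one to load the other on every access.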
When an instruction invokes a memory access, the cache controller calculates the block set the address will reside in and the tag to look for within that set. If the block is found and is marked as valid, the requested data is read from the cache. This is known as a cache hit, and it is the ideal path of memory access due to its speed. If the address cannot be found within the cache, it must be fetched from slower system memory. This is known as a cache miss, and it comes with a large performance penalty, as it can stall an instruction cycle while the cache update is performed.
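The lookup described above can be sketched as a small set-associative cache model. This is a simplified illustration, not a description of any real controller: the sizes are arbitrary, a tag's presence in a set stands in for the valid bit, and least-recently-used replacement is an assumption, since no replacement policy is named here.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy cache that tracks only which tags are present in each set."""

    def __init__(self, num_sets=4, ways=2, block_size=16):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # One ordered mapping per set; the least recently used tag comes first.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address: int) -> bool:
        """Return True on a cache hit, False on a miss (after filling the block)."""
        block_number = address // self.block_size
        index = block_number % self.num_sets
        tag = block_number // self.num_sets
        cache_set = self.sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used tag
        cache_set[tag] = True  # fetched from system memory and now valid
        return False


cache = SetAssociativeCache()
results = [cache.access(a) for a in (0x00, 0x04, 0x40, 0x00, 0x80, 0x40)]
print(results)  # [False, True, False, True, False, False]
```

The last access misses because the third distinct block mapping to set 0 pushed out the least recently used one, which is the kind of miss that stalls the pipeline in a real system.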
Writing data to a memory location introduces its own complication, as the cache must now synchronize any changes made to it with system memory. The simplest policy is known as a write-through cache, where data written to the cache is immediately written to system memory. Another approach, known as a write-back or copy-back cache, tracks written blocks and only updates system memory when a block is evicted from the cache by replacement.
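The trade-off between the two policies shows up in how often system memory is actually written. The sketch below counts those writes for a tiny, hypothetical two-block cache; the block labels, the capacity, the write-allocate behavior, and the least-recently-used replacement are all assumptions made for the example.

```python
from collections import OrderedDict


def count_memory_writes(block_writes, policy, capacity=2):
    """Count writes reaching system memory for a sequence of block writes."""
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    cache = OrderedDict()  # block -> dirty flag, least recently used first
    memory_writes = 0
    for block in block_writes:
        if block in cache:
            cache.move_to_end(block)
        else:
            if len(cache) >= capacity:
                _, dirty = cache.popitem(last=False)
                if dirty:
                    memory_writes += 1  # dirty block copied back on eviction
            cache[block] = False
        if policy == "write-through":
            memory_writes += 1  # every write goes straight to memory
        else:
            cache[block] = True  # just mark the block as modified
    return memory_writes


writes = ["A", "A", "A", "B", "C", "A"]
print(count_memory_writes(writes, "write-through"))  # 6
print(count_memory_writes(writes, "write-back"))     # 2
```

The write-back count covers only blocks evicted so far; the two dirty blocks still sitting in the cache at the end would be written to memory when they are eventually replaced.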