By the late '70s, personal computers were available from many vendors, such as Tandy, Commodore, TI and Apple. Computers from different vendors were not compatible. Each vendor had their own architecture, their own operating system, their own bus interface, and their own software. When you purchased a computer, you implicitly made a major commitment to that vendor's standards.
In 1980, IBM decided to enter the PC market. They realized—correctly—that the never-ending fall in DRAM prices would soon make the 8/16 architecture obsolete. The next logical step would have been, say, a 16/32 architecture, such as the Motorola 68000. A 16/32 architecture would have improved performance by doubling memory bandwidth, and provided a 4G byte address space—enough for the foreseeable future.
Unfortunately, the 68000 wasn't actually available in 1980, and neither were any other single-chip microprocessors with a 16/32 architecture. The problem is that die size scales up roughly in proportion to register width. When you go from 8/16 to 16/32, all the registers get twice as big, all the data paths get twice as wide, and the ALU's adder carry chain gets twice as many terms: the core of the design roughly doubles in size. And in 1980, process technology simply hadn't reached the point where a single-chip 16/32 microprocessor could be fabricated at a marketable price.
The four segment registers define four segments:
Register | Segment |
---|---|
CS | the code segment |
DS | the data segment |
SS | the stack segment |
ES | the extra segment |
Most operations implicitly use the appropriate segment register: instruction fetches use CS, loads and stores use DS, and pushes and pops use SS. A few operations, such as block move, use both DS and ES: DS for the source and ES for the destination.
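To make the addressing concrete: in real mode the 8086 forms a 20-bit physical address by multiplying the 16-bit segment register value by 16 and adding the 16-bit offset. The Python sketch below models that calculation; the function name `physical_address` is only illustrative, and the 20-bit mask reflects the 8086's 20 address lines.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, truncated to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    # Shifting left 4 bits multiplies by 16; masking to 20 bits models the
    # 8086's 20 address lines, so addresses past FFFFF wrap around to 0.
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs can name the same byte.
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350
# Once a segment register is set, offsets reach only 64K bytes past its base.
print(hex(physical_address(0xF000, 0xFFFF)))  # 0xfffff
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0 (wraps around on the 8086)
```

Because the segment value moves in 16-byte steps, a segment can begin on any 16-byte boundary in the 1M byte space, which is why the same byte has many segment:offset names.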
Intel's documentation describes this architecture as a programming convenience: here's your code, here's your data, each neatly stored in its own segment. Students of computer science also like segmented architectures for various reasons having to do with OS design.
However, actually programming this machine is a nightmare. You can never just address anything. First, you have to make sure that a segment register is set up for it, and then you have to construct the address as an offset into that segment. A segment register can point anywhere in the entire 1M byte address space, but once it has been set, it only provides access to a 64K segment. If you have more than 64K of code or data, you have to reload segment registers on the fly. A particular problem is that there is no good way to index into an array that is bigger than 64K bytes.
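Here is what reloading segment registers on the fly amounts to when you walk an array bigger than 64K. The Python sketch below (`linear`, `naive_element`, and `normalized_element` are hypothetical helper names) contrasts simply adding an index to a 16-bit offset, which wraps around inside the same segment, with recomputing the segment from the full address. The recomputation shown, which leaves an offset of 0 to 15, is one common way of normalizing a segment:offset pair, not the only one.

```python
def linear(segment: int, offset: int) -> int:
    """8086 physical address: segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def naive_element(segment: int, offset: int, index: int) -> tuple[int, int]:
    """Address of byte `index` if only the 16-bit offset is advanced.
    The offset wraps at 64K, so the result never leaves the segment."""
    return segment, (offset + index) & 0xFFFF


def normalized_element(segment: int, offset: int, index: int) -> tuple[int, int]:
    """Address of byte `index` with the segment recomputed (normalized):
    fold everything into one 20-bit address, then split it back into a
    segment and an offset in the range 0-15."""
    address = (linear(segment, offset) + index) & 0xFFFFF
    return address >> 4, address & 0xF


base = (0x2000, 0x0000)   # a large array starting at 2000:0000
index = 0x12345           # a byte more than 64K into the array

seg, off = naive_element(*base, index)
print(f"{seg:04X}:{off:04X} -> {linear(seg, off):05X}")   # 2000:2345 -> 22345 (wrong)
seg, off = normalized_element(*base, index)
print(f"{seg:04X}:{off:04X} -> {linear(seg, off):05X}")   # 3234:0005 -> 32345 (right)
```

The naive version lands exactly 64K too low, and nothing in the hardware warns you about it.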
Although IBM initially captured the PC market, it subsequently lost it to clone vendors. Accustomed to being a monopoly supplier of mainframe computers, IBM was unprepared for the fierce competition that arose as Compaq, Leading Edge, AT&T, Dell, ALR, AST, Ampro, Diversified Technologies and others all vied for a share of the PC market. Besides low prices and high performance, the clone vendors provided one other very important thing to the PC market: an absolute hardware standard. In order to sell a PC clone, the manufacturer had to be able to guarantee that it would run all of the customer's existing PC software, and work with all of the customer's existing peripheral hardware. The only way to do this was to design the clone to be identical to the original IBM PC at the register level. Thus, the standard that the IBM PC defined became graven in stone as dozens of clone vendors shipped millions of machines that conformed to it in every detail. This standardization has been an important factor in the low cost and wide availability of PC systems. It has also been a serious obstacle in the attempt to move beyond the limitations of the PC architecture.
64K block | Addresses | Use |
---|---|---|
0-9 | 00000-9FFFF | RAM |
A-B | A0000-BFFFF | video RAM |
C-D | C0000-DFFFF | ROM on I/O cards |
E-F | E0000-FFFFF | ROM BIOS (operating system code) |
With 10 segments allowed for RAM, the PC can address up to 640K bytes of main memory. The space between 640K and 1M is reserved for hardware and operating system use. By the mid '80s this architecture was becoming obsolete. 256K-bit and 1M-bit DRAM chips were available; users were buying PCs with a full complement of 640K of RAM and wanted more. Unfortunately, as the table above shows, there is no place to put any more memory on a PC.
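The arithmetic behind the 640K limit follows directly from the memory map: the top hex digit of a 20-bit address picks one of sixteen 64K blocks, and only blocks 0 through 9 are RAM. A small Python sketch, with a hypothetical `region` lookup built from the table:

```python
# Use of each 64K block of the PC's 1M byte address space, keyed by the
# top hex digit of the 20-bit address (0-F), as in the memory map.
PC_MEMORY_MAP = {
    **{block: "RAM" for block in range(0x0, 0xA)},
    0xA: "video RAM", 0xB: "video RAM",
    0xC: "ROM on I/O cards", 0xD: "ROM on I/O cards",
    0xE: "ROM BIOS", 0xF: "ROM BIOS",
}


def region(address: int) -> str:
    """Return the use of the 64K block containing a 20-bit address."""
    if not 0 <= address <= 0xFFFFF:
        raise ValueError("PC addresses are 20 bits (00000-FFFFF)")
    return PC_MEMORY_MAP[address >> 16]


ram_blocks = sum(1 for use in PC_MEMORY_MAP.values() if use == "RAM")
print(f"{ram_blocks * 64}K bytes of RAM")  # 640K bytes of RAM
print(region(0x9FFFF))                     # RAM: the last byte of main memory
print(region(0xA0000))                     # video RAM: the first byte past 640K
```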
One solution was bank-select memory systems. A vendor would design a memory card, add some bank-select registers, and map selected blocks of memory into the PC address space, typically at C0000. With a bank-select system, the programmer is responsible for managing the bank-select registers and keeping track of which bank has which data. Today, bank-select systems generally conform to the Lotus/Intel/Microsoft Expanded Memory Specification (LIM-EMS). In this context, the word "expanded" specifically denotes a bank-select system.
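A bank-select system can be modeled as a window in the PC address space plus a register that chooses which block of the card's memory shows through it. This Python sketch describes a hypothetical card, not the LIM-EMS programming interface; the single window at C0000 and the 16K bank size are assumptions chosen for illustration. Note that the program, not the hardware, has to remember which bank holds which data.

```python
WINDOW_BASE = 0xC0000   # where the card's window appears in the PC address space
BANK_SIZE = 0x4000      # 16K bytes per bank (an assumed size for this sketch)


class BankSelectCard:
    """Hypothetical bank-select memory card with one window and one bank register."""

    def __init__(self, banks: int) -> None:
        self.memory = bytearray(banks * BANK_SIZE)  # all the RAM on the card
        self.bank_register = 0                      # which bank the window shows

    def select(self, bank: int) -> None:
        if not 0 <= bank < len(self.memory) // BANK_SIZE:
            raise ValueError("no such bank")
        self.bank_register = bank

    def _card_offset(self, address: int) -> int:
        # Translate a PC address inside the window to an offset on the card.
        if not WINDOW_BASE <= address < WINDOW_BASE + BANK_SIZE:
            raise ValueError("address is outside the window")
        return self.bank_register * BANK_SIZE + (address - WINDOW_BASE)

    def read(self, address: int) -> int:
        return self.memory[self._card_offset(address)]

    def write(self, address: int, value: int) -> None:
        self.memory[self._card_offset(address)] = value


card = BankSelectCard(banks=64)      # 64 x 16K = 1M bytes on the card
card.select(3)
card.write(0xC0000, 0x55)
card.select(4)
print(card.read(0xC0000))            # 0: bank 4 is a different block of RAM
card.select(3)
print(hex(card.read(0xC0000)))       # 0x55: the program had to remember bank 3
```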
Expanded memory no doubt allowed a few programs to use more than 640K of RAM, but it is clearly inadequate as a long-term solution to the need for more memory. The only real solution is to move to a bigger architecture. Intel took the first step by introducing the 80286 processor.
The 80286 gives the programmer a 16M byte address space. However, it is still hamstrung by the need to manipulate segment registers, and the fact that each segment is limited to 64K bytes, as in the 8086. More significantly, the 80286 is limited by the need to remain PC compatible.
Intel knew that they could not market a new processor unless it could run existing PC programs. Therefore, they designed the 80286 with two different execution modes: real mode and protected mode. Protected mode is the 16/24 architecture just described. Real mode is an exact emulation of the 8086 16/16 architecture. Real mode is sometimes called DOS mode. When an 80286 powers on, it boots up in real mode. This allows it to function as the processor in an IBM PC clone. Used this way, the 80286 provides a performance boost, due to its faster clocks and 16-bit data busses. However, the programmer is still restricted to the PC architecture, with its 1M byte address space and 640K RAM limitation. Since DOS and PC programs will not run on an 80286 processor in protected mode, most 80286 processors are run in real mode.
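For comparison, here is a simplified model of how an 80286 in protected mode turns a segment and offset into a physical address. The segment register no longer holds a base address directly; it selects a descriptor whose 24-bit base can lie anywhere in the 16M byte space, while the 16-bit limit still caps the segment at 64K. The sketch keeps only the base and limit fields and ignores descriptor tables, selectors, and privilege checks; `Descriptor` and `protected_address` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Only the base and limit fields of an 80286 segment descriptor."""
    base: int   # 24-bit physical base address, anywhere in the 16M byte space
    limit: int  # highest valid offset; at most FFFF, so a segment is at most 64K


def protected_address(desc: Descriptor, offset: int) -> int:
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("80286 offsets are 16 bits")
    if offset > desc.limit:
        # The real processor raises a protection exception here.
        raise IndexError("offset exceeds the segment limit")
    return (desc.base + offset) & 0xFFFFFF  # 24 address lines


data = Descriptor(base=0xA00000, limit=0xFFFF)   # a segment 10M bytes up
print(hex(protected_address(data, 0x1234)))      # 0xa01234
```

Reaching more than 64K of data still means switching to a different descriptor, which is the same segment juggling as on the 8086.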
The most common use of extended memory (RAM above the 1M byte boundary, which the 80286 can reach only in protected mode) is to provide a RAM disk for a DOS system. When a program wants access to data stored on the RAM disk, it sets a mode bit that switches the 80286 to protected mode. This gives it access to extended memory. The program then performs the desired data transfer between its own memory space and the RAM disk in extended memory. The 80286 provides no way to return from protected mode to real mode, so the program must then save its state and reset the processor. After the reset, the BIOS recognizes that this is not a cold boot and resumes execution of the original program in real mode. In practice, all of this is handled by a device driver for the RAM disk, such as RAMDRIVE.SYS.
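The RAM disk sequence is easier to follow as a toy simulation. In this Python sketch the `Machine` class and the `ramdisk_write` function are hypothetical; a real driver performs the mode switch and the reset with privileged instructions and BIOS support, which the sketch reduces to a mode flag and a saved resume point.

```python
class Machine:
    """Toy model of an 80286 PC running DOS (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.mode = "real"
        self.conventional = bytearray(640 * 1024)  # DOS memory below 640K
        self.extended = bytearray(1024 * 1024)     # memory above 1M (the RAM disk)
        self.saved_state = None                    # where to resume after a reset
        self.resets = 0

    def enter_protected_mode(self) -> None:
        self.mode = "protected"  # set the mode bit; nothing in the model clears it

    def copy_to_extended(self, src: int, dst: int, length: int) -> None:
        if self.mode != "protected":
            raise RuntimeError("extended memory is unreachable in real mode")
        self.extended[dst:dst + length] = self.conventional[src:src + length]

    def reset(self):
        """Reset the processor: the only way back to real mode on an 80286."""
        self.resets += 1
        self.mode = "real"
        resume, self.saved_state = self.saved_state, None
        return resume  # in the model, execution continues from the saved state


def ramdisk_write(m: Machine, src: int, dst: int, length: int):
    """One RAM disk write, in outline: switch, copy, save state, reset."""
    m.enter_protected_mode()
    m.copy_to_extended(src, dst, length)
    m.saved_state = "ramdisk_write done"  # save state before resetting
    return m.reset()                      # execution resumes in real mode


m = Machine()
m.conventional[0:4] = b"DATA"
print(ramdisk_write(m, 0, 0, 4), m.mode)   # ramdisk_write done real
```

Every transfer costs a processor reset, which goes a long way toward explaining why RAM disks were the main thing extended memory got used for.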
Intel intended the 80286 to provide a path for upward evolution of PC systems. In particular, they hoped that its DOS compatibility mode would allow it to gain acceptance, and that once there was a sufficient installed base of 80286 processors, software developers would begin writing operating systems and programs that used the features of protected mode. What actually happened was that PC clone vendors used it as a high-performance 8086, users ran it almost exclusively in real mode, and software developers balked at the intricacies and limitations of the protected mode segmented architecture.
In protected mode, the 80386 is a 32/32 architecture. The segmentation scheme is even more complex than that of the 80286, and I'll spare you the details. It does, however, allow 32-bit segment offsets, so a single segment can be up to 4G bytes. This allows a programmer to define a single segment that covers all of available memory, instead of having to continually juggle a collection of 64K byte segments. It also allows indexing into arrays that are larger than 64K bytes.
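The difference a 32-bit offset makes shows up as soon as you compute the offset of an element in a large array. This Python sketch (with a hypothetical `element_offset` helper) checks whether an element fits within one segment when offsets are 16 bits, as on the 8086 and 80286, or 32 bits, as on the 80386 in protected mode.

```python
def element_offset(index: int, element_size: int, offset_bits: int) -> int:
    """Offset of array[index] from the segment base, if it fits in one segment."""
    offset = index * element_size
    if offset + element_size > (1 << offset_bits):
        raise OverflowError(
            f"element {index} does not fit in a {offset_bits}-bit segment")
    return offset


# A 100,000-element array of 4-byte integers occupies 400,000 bytes.
print(hex(element_offset(99_999, 4, 32)))   # 0x61a7c: fine in one 80386 segment
try:
    element_offset(99_999, 4, 16)
except OverflowError as err:
    print(err)   # element 99999 does not fit in a 16-bit segment
```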
In real mode, the 80386 provides an exact emulation of the 8086 16/16 architecture.
Unfortunately, the capabilities of the 80386 are used little more than those of the 80286. DOS and PC programs will not run on an 80386 processor in protected mode, so most 80386 processors are run in real mode. The processor in my current machine runs in real mode. It provides access to 640K bytes of main memory and a 3456K byte RAM disk, for a total of 4M bytes of installed RAM.
One issue for a C compiler on the 8086 is whether addresses are 16 bits or 32 bits. A 16-bit address provides an offset into a single segment; the segment register must already have been loaded with the appropriate base address. A 32-bit address provides both a segment value and an offset into that segment. When accessing memory through a 32-bit address, the program first loads the segment register from the upper 16 bits of the address, and then uses the lower 16 bits of the address as an offset into that segment. 32-bit addresses require more memory and more CPU cycles, but they provide access to the entire 1M byte address space of the 8086 processor.
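The two kinds of address can be sketched as follows; Microsoft C calls them near and far pointers. The Python helpers `make_far`, `far_to_physical`, and `near_to_physical` are illustrative, and the sketch assumes the real-mode rule that a physical address is the segment value times 16 plus the offset.

```python
def make_far(segment: int, offset: int) -> int:
    """Pack segment:offset into one 32-bit value, segment in the upper 16 bits."""
    return (segment << 16) | offset


def far_to_physical(far: int) -> int:
    segment = far >> 16        # loaded into a segment register first
    offset = far & 0xFFFF      # then used as the offset into that segment
    return ((segment << 4) + offset) & 0xFFFFF


def near_to_physical(ds: int, near: int) -> int:
    """A 16-bit address means nothing without the segment already in DS."""
    return ((ds << 4) + near) & 0xFFFFF


p = make_far(0xB800, 0x00A0)
print(hex(p), hex(far_to_physical(p)))   # 0xb80000a0 0xb80a0
# The same 16-bit address names different bytes under different DS values.
print(hex(near_to_physical(0x1000, 0x0010)), hex(near_to_physical(0x2000, 0x0010)))
# 0x10010 0x20010
```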
If a program has less than 64K of data, then it can put all of its data into a single data segment and use 16-bit addresses to access it. Similarly, if a program has less than 64K of code, it can put all of its code into a single code segment, and use 16-bit addresses for jumps and subroutine calls. Conversely, if code or data do not fit within these limits, then the program must use 32-bit addresses. Microsoft C provides for all four possibilities through a set of memory models:
Memory model | Data addresses | Code addresses |
---|---|---|
Tiny | 16-bit | 16-bit |
Small | 16-bit | 16-bit |
Compact | 32-bit | 16-bit |
Medium | 16-bit | 32-bit |
Large | 32-bit | 32-bit |
Huge | 32-bit | 32-bit |
The Tiny memory model is the same as the Small model, except that the size of the code and data segments together must not exceed 64K bytes. Also, the Tiny model produces a .COM file instead of a .EXE file. .COM files are slightly smaller and load slightly faster under DOS.
The Huge memory model is the same as the Large memory model, except that individual arrays can exceed 64K bytes in size. However, addresses are still stored as segment:offset pairs, and the compiler declines to perform full 32-bit address arithmetic on them; in particular, it assumes that no single array element straddles a 64K segment boundary. As a result, Huge arrays are subject to the restriction that the size of the array element must be a power of 2, so that 64K is an exact multiple of the element size.
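The effect of the element size is easy to check. This Python sketch (the `straddling_elements` helper is hypothetical, and it assumes the array begins exactly on a 64K boundary) lists the elements whose bytes would span two segments:

```python
SEGMENT = 0x10000  # 64K bytes


def straddling_elements(element_size: int, count: int) -> list[int]:
    """Indices of elements that cross a 64K boundary, assuming the array
    starts exactly on a 64K boundary."""
    result = []
    for i in range(count):
        first = i * element_size           # first byte of element i
        last = first + element_size - 1    # last byte of element i
        if first // SEGMENT != last // SEGMENT:
            result.append(i)
    return result


print(straddling_elements(8, 20_000))   # []
print(straddling_elements(6, 20_000))   # [10922]
```

With 8-byte elements every 64K boundary falls between two elements; with 6-byte elements, element 10922 occupies bytes 65532 through 65537 and so crosses the first boundary.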
Choice of memory model is a compile-time option, so you can easily experiment with different models.