The 68030 and 68040 on the Zorro III Bus

by Michael Sinz


The Zorro III bus presents several special design issues for systems
with either a 68030 or 68040 CPU.  This article discusses those
design issues and offers solutions to the potential problems that
they present to those developing Zorro III devices, in particular,
Zorro III devices that are not exclusively memory expansion devices.


Background - 68030 and 68040 Caches

Both the 68030 and the 68040 have two caches: one for instructions
and one for data.  Each of the 68030's caches is 256 bytes long.  The
68040's caches are considerably larger: each is 4K long.  Both CPUs'
caches store memory in 16-byte blocks, each of which is referred to
as a cache line.  The CPU only keeps one address for each cache line.
Each cache line is further broken up into long words called cache
entries.  On the 68030, each cache entry is marked as either valid or
invalid, telling the CPU which long words in the cache still contain
valid data.  On the 68040, only an entire cache line can be marked as
valid or invalid.

When the 68030 caches a memory address, it uses bits four through
seven of the address as a hash value.  This value is an index that
tells the 68030 which cache line to use for a specific address.  This
means each memory address corresponds to only one cache line.  For
example, if the 68030 tried to read a long word from 0x0FFFFF9F,
since the 68030 index extends from bit four through seven, the index
is 0x9, which corresponds to the tenth cache line.  This also means
that many memory addresses correspond to the same cache line.  For
example, the addresses 0x01FFFF9F, 0x02FFFF9F, 0x03FFFF9F, and
0x04FFFF9F all correspond to the same cache line.
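
As a rough illustration of the indexing described above, the
following C sketch extracts the cache line index from an address.
The function names are invented for this example.  The entry
selector (bits two and three) is not stated above; it is an
assumption that follows from a 16-byte line holding four long words.

```c
#include <stdint.h>

/* Bits 4-7 of the address select one of the 16 cache lines
   (256 bytes of cache / 16 bytes per line). */
static unsigned cache_index_030(uint32_t addr)
{
    return (addr >> 4) & 0xFu;
}

/* Assumption: bits 2-3 select the long word (cache entry)
   within the 16-byte line. */
static unsigned cache_entry_030(uint32_t addr)
{
    return (addr >> 2) & 0x3u;
}
```

With these definitions, cache_index_030(0x0FFFFF9F) yields 0x9, and
the four addresses listed above all yield that same index.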

The 68040 cache uses a similar indexing scheme, but the 68040 is a
little more dynamic.  The 68040 has four cache lines for each index,
so the 68040 cache can hold on to four different addresses that share
the same index.  The 68040 also has a larger index (six bits).
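
The six-bit index is consistent with the cache sizes given earlier,
as this small C sketch shows.  The constant and function names are
invented for the example, and the sketch assumes the index bits sit
directly above the four bits that address a byte within a line (that
is, bits four through nine).

```c
#include <stdint.h>

enum {
    LINE_BYTES      = 16,    /* bytes per cache line            */
    CACHE_BYTES_040 = 4096,  /* size of each 68040 cache        */
    WAYS_040        = 4,     /* cache lines sharing one index   */
    SETS_040 = CACHE_BYTES_040 / (LINE_BYTES * WAYS_040)
};

/* 4096 / (16 * 4) = 64 index values, which need six bits. */
_Static_assert(SETS_040 == 64, "six index bits select 64 sets");

/* Assumption: the index is taken from address bits 4-9. */
static unsigned cache_set_040(uint32_t addr)
{
    return (addr >> 4) & (SETS_040 - 1);
}
```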

While the caches are active, when the CPU executes an instruction
that reads memory, it will check if that memory is already in the
cache.  If the memory is in the cache and the cache entry (or, in the
case of the 68040, the cache line) is marked as valid, a cache hit
occurs, so the CPU reads from the cache instead of main memory.  If
the memory is not in the cache, or the cache entry (or cache line) is
marked invalid, a cache miss occurs, so the CPU has to perform a main
memory fetch.

The cache lines are used in conjunction with the 68030's and 68040's
burst mode.  Normally, the 68030 fills its caches one entry at a
time.  While in burst mode, the CPU fills its caches a whole cache
line at a time. This helps to reduce the number of cache misses.

On the 68030, it is possible to turn burst mode on and off
independently of its caches.  If the 68030 cache is on and burst mode
is off, the 68030 can fill its cache a single long word at a time,
rather than the four long words at a time it would do in burst mode.  The
68040 is different.  On the 68040, the only way to turn on burst mode
is to turn on the cache, so there is no way to prevent a burst access
when using the cache.  The 68040 always fills a whole cache line at a
time.

The instruction cache on both CPUs is fairly straightforward.  As the
CPU fetches instructions from memory, it copies them into the cache
for quick access later.  The only time the CPU changes the
instruction cache is when it does a memory fetch.

The data cache is different.  The data cache can change when the CPU
fetches memory and when the CPU executes an instruction that writes
to memory.  The 68030 and 68040 deal with this differently.

The 68030 data cache is a write-through cache.  This means whenever
the 68030 executes an instruction that writes to memory, the 68030
always performs a write to main memory, even if that address is in
the data cache.

When a data write causes a cache miss, the 68030 will act as if it
has no data cache and write directly to memory (except in
write-allocate mode--see the next paragraph).  When a data write
causes a cache hit, the 68030 will update the cache entry (or
entries) as well as write to memory.  Basically, on a memory write
operation, the 68030 will only update cache entries that are
currently cached.  It will not allocate a new cache entry for a cache
miss.

The 68030 data cache has a mode called write-allocate.  In this mode,
the 68030 not only updates the data in the cache, but, in the case of
a cache miss, the 68030 can also allocate a new cache entry.  While
in this mode, if a data write causes a cache miss, the CPU first
marks the corresponding cache entry as invalid (or, if in burst mode,
the CPU marks the entire cache line as invalid).  If the data write
is a long word write and it is aligned on a long word boundary, the
CPU updates that long word in the cache and marks it as valid.

The 68040 data cache is not always a write-through cache.  It has a mode
called copyback.  While in copyback mode, a write operation on the
68040 will not write through to memory.  The data will remain in the
data cache until the CPU flushes it out.  For more information, see
the article, ``68040 Compatibility Warning'' from the July/August
1991 Amiga Mail.


Zorro III and the 68030

When the 68030 reads data from a memory address, it will cache that
address only if that memory address is marked as cachable.  Certain
areas of memory cannot be cachable, for example, hardware registers
of a Zorro III card.  On the Zorro III bus, when the CPU attempts to
read an address that is not cachable, the device that exists in that
address space asserts the Zorro III cache inhibit line (/CINH).  The
bus controller will turn this signal into the CPU's cache inhibit
signal, which tells the CPU not to cache the address.

The problem is with the 68030's data cache in write-allocate mode
(which the Amiga OS requires).  When write-allocate mode is disabled,
the 68030 will only allocate a cache entry for a data address if the
address is cachable.  The CPU knows if the address is cachable
because the device told it using the cache inhibit line.

While in write-allocate mode, the 68030 will also allocate a cache
entry during certain write operations.  If, while in write-allocate
mode, the 68030 writes a long word to a long word aligned address,
the 68030 will write to that address and will allocate a cache entry
for that address.  This provides a loophole where the 68030 will
allocate a cache entry for a non-cachable memory address.  If the CPU
does a long word write to a Zorro III hardware register that happens
to be aligned on a long word address, the 68030 will put that address
in the cache.  If the CPU attempts to read from that address again
and that address happens to still be in the data cache, it will see
the value in the cache and will not attempt to read the hardware
register.
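
The loophole can be demonstrated with a toy model.  The C sketch
below is not a description of the 68030's internals; it models a
single cache entry and a single device register, and every name in
it is invented for the example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model: one long word cache entry and one device register. */
struct entry {
    bool     valid;
    uint32_t addr;
    uint32_t data;
};

static struct entry cache_entry;  /* a single 68030 data cache entry */
static uint32_t device_reg;       /* what the hardware really holds  */

/* Long word write in write-allocate mode: the write goes through
   to the device, and an aligned long word write also allocates a
   cache entry, even though the device is not cachable. */
static void cpu_write_long(uint32_t addr, uint32_t value)
{
    device_reg = value;
    if ((addr & 3u) == 0) {
        cache_entry.valid = true;
        cache_entry.addr  = addr;
        cache_entry.data  = value;
    }
}

/* Read: a hit returns the cached copy.  A miss reads the device,
   which asserts cache inhibit, so nothing is allocated. */
static uint32_t cpu_read_long(uint32_t addr)
{
    if (cache_entry.valid && cache_entry.addr == addr)
        return cache_entry.data;
    return device_reg;
}

/* The hardware changes the register behind the CPU's back. */
static void device_update(uint32_t value)
{
    device_reg = value;
}
```

In this model, a cpu_write_long() of 1 to an aligned address,
followed by a device_update() to some other value, leaves
cpu_read_long() of that address returning the stale 1.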

So far, the conditions under which this loophole can occur have been
rare.  The loophole requires that a hardware register be both
writable and readable, aligned on a long word address, and be four
bytes long. This precludes the Amiga custom chip registers as they
are not both readable and writable (in general they are not four
bytes long either).  Zorro II devices don't apply as Zorro II devices
only have a 16 bit wide data path.  The small size of the 68030's
data cache also makes it tough for a register write and read to occur
without a cache flush happening in between.

However, as Zorro III devices start to hit the market, the conditions
under which the loophole can occur will become more commonplace.  To
avoid this problem, Zorro III card designers can utilize the
following hardware trick.

The trick is to ``mirror'' all of the hardware registers.  In this
scheme, every register that is both readable and writable is
accessible at two addresses.  One address is exclusively for reading
and the other address is exclusively for writing.  Now, if the 68030
performs a write and allocates a cache entry, the 68030 caches the
writing address, but not the reading address.
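
From the driver's side, a mirrored register might look like the
following C sketch.  The structure, its field names, and the
placement of the two aliases are hypothetical; a real card defines
its own register map.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical register map: one hardware register decoded at two
   addresses.  Software only ever writes the first alias and only
   ever reads the second. */
struct board_regs {
    volatile uint32_t control_wr;  /* write-only alias */
    volatile uint32_t control_rd;  /* read-only alias  */
};

/* Writes may allocate a cache entry for control_wr, which is
   harmless because that address is never read. */
static void set_control(struct board_regs *regs, uint32_t value)
{
    regs->control_wr = value;
}

/* Reads use the other alias, which a write never caches. */
static uint32_t get_control(const struct board_regs *regs)
{
    return regs->control_rd;
}
```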

Another hardware trick that might seem to be a viable solution is to
align 32-bit register ports so that they do not fall on a long word
boundary.  Using this method, the 68030 will never cache the register
address on a data write because it is not aligned properly.  The
problem with this method is that reading (or writing) a long word
from a non-long word aligned address is considerably slower than from
a long word aligned address. This can almost double the amount of bus
traffic, making the entire system slower.


The 68040 and Zorro III

The 68040 does not have the problem that the 68030 has with Zorro III
space.  The 68040 contains two registers to give data space a default
mapping without the need of a Memory Management Unit (MMU).  On an
Amiga with a 68040, Exec uses one of these registers to map the low
24 bits of the Amiga's address space (the Zorro II range,
$00000000-$00FFFFFF) as non-cachable and serialized.

The Amiga uses the second register to map the remaining memory
($01000000-$FFFFFFFF) as cachable and non-serialized.  Because of its
mapping, any RAM in this region will yield considerably higher
performance than RAM in Zorro II space.  Unfortunately, this mapping
can cause problems for a Zorro III device that is not RAM.

When the 68040 accesses a Zorro III device that is in cachable
address space, the device can still tell the CPU that an address is
not cachable by asserting the CPU's cache inhibit line.  This
overrides the default mapping the 68040 has placed on the address
space.  However, this does not stop the CPU from doing a full line
burst.  When accessing address space mapped as cachable, the 68040
will always attempt to read or write a block the size of an entire
cache line (four long words).

This presents a problem when the 68040 attempts to read from a Zorro
III device that is in cachable address space and the device asserts
the CPU's cache inhibit line.  The 68040 cannot notice that the Zorro
III device asserted the CPU's cache inhibit line until the 68040
reads the first long word of the burst cycle.  By the time the 68040
sees that the first long word is not cachable, it is already too late
to stop the burst cycle, so the 68040 finishes the burst.  When the
burst is done, the 68040 throws out the extra three long words from
the burst read.  In this case, the 68040 performs four long word
memory accesses instead of just one.

Writing to the device is even worse.  When the 68040 writes data to
an address that is not currently in the data cache, the 68040 will
first try to fill a cache line.  When the 68040 sees that the device
asserts the CPU's cache inhibit line, it will finish the read and
then write out one long word.  Essentially, to perform a single
memory write, the 68040 performs four memory reads and one memory
write.

These excessive memory accesses can significantly hinder system
performance.  Certain Zorro III designs could make the 68040 as much
as four to five times slower.
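
The "four to five times" figure follows from counting bus accesses,
as in this C sketch.  The function name is invented for the example,
and passing anything other than four long words per line is an
extrapolation rather than a description of an existing CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Bus accesses the 68040 performs for ONE long word access to a
   cache-inhibited device that sits in cachable address space. */
static unsigned bus_accesses(bool is_write, unsigned longs_per_line)
{
    assert(longs_per_line > 0);
    /* The attempted line fill reads every long word in the line;
       a write then adds one write cycle on top of that. */
    return longs_per_line + (is_write ? 1u : 0u);
}
```

For the 68040's four long word line, this gives four accesses for a
read and five for a write, where a single access would have done.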

The full line bursts also cause a second potential problem for some
possible Zorro III devices.  Reading certain types of hardware
registers will trigger the hardware to perform some extra function as
well.  It is not uncommon for a hardware device to supply a new data
value for a register after the CPU reads that register. If a Zorro
III device has such a register and the device is located in cachable
address space, the device can experience problems with reads and
writes of addresses surrounding the register.  If the CPU reads a
second hardware register that lies in the same quad-long word as the
first (i.e. both registers' addresses fall in the same cache line),
the CPU's full line burst will read the first register in addition
to the second register.  Because the CPU reads the first
register, the device will reload the first register with a new value,
losing the previous value.
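
The following C sketch models this side effect with a hypothetical
device whose first long word is a FIFO port that advances on every
read.  All names are invented for the example, and the order in
which the burst visits the long words is simplified, as it does not
matter here.

```c
#include <stdint.h>

#define LINE_BYTES 16u

/* Hypothetical device: the long word at offset 0 is a FIFO port
   that supplies the next value on every read.  Offset 4 is a
   status register in the same quad-long word. */
static uint32_t fifo[4] = {10, 20, 30, 40};
static unsigned fifo_head;

static uint32_t device_read(uint32_t offset)
{
    if (offset == 0 && fifo_head < 4)
        return fifo[fifo_head++];  /* reading has a side effect */
    return 0;                      /* status and unused long words */
}

/* Full line burst: read all four long words of the line that
   contains 'offset', but keep only the one that was asked for. */
static uint32_t cpu_burst_read(uint32_t offset)
{
    uint32_t base = offset & ~(LINE_BYTES - 1u);
    uint32_t wanted = 0;

    for (uint32_t o = base; o < base + LINE_BYTES; o += 4) {
        uint32_t v = device_read(o);
        if (o == offset)
            wanted = v;
    }
    return wanted;
}
```

In this model, reading the status register with cpu_burst_read(4)
also consumes the FIFO value 10, so the next read of the FIFO port
returns 20.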


The Solution

There is a solution that will fix both potential problems for Zorro
III cards on 68040-based Amigas.  The MMU in the 68040 can map
specific pages of memory as non-cachable.  The 37.10 version of the
68040.library creates MMU tables that map only Zorro III memory
devices as cachable (actually it maps all RAM except Chip RAM as
cachable).  The library marks other Zorro III devices as
non-cachable.  The new library prevents the 68040 from doing full
line bursts to non-cachable devices, so the CPU only reads or writes
one long word at a time.  As the 68040.library uses the MMU to map
all address space, invalid addresses can no longer cause bus errors
(Guru #00000002), which may help a few ill-behaved products to work
on 68040 systems.

The 37.10 68040.library is part of the V39 OS present on the current
A4000.  Developers who are working on or have released 68040-based
expansion devices should contact CATS to obtain information on
distributing the library with their product.

There is only one problem with this solution.  Not all 68040s have
MMUs.  There are three kinds of 68040 chip: the MC68040, the
MC68LC040, and the MC68EC040.  The MC68040 has both a Floating Point
Unit (FPU) and an MMU.  The MC68LC040 is a regular MC68040 without an
FPU.  The MC68EC040 is an MC68LC040 without an MMU.

As the 68040.library requires an MMU to map address space, the fix
described above will not work on systems with an MC68EC040.  Because
burst mode on the 68040 is activated along with the cache, there is
no way to prevent a 68EC040-equipped Amiga from doing full line
bursts when accessing cachable address space.  This means a 68EC040
cannot prevent the excessive reads and writes when accessing
non-cachable Zorro III devices that reside in cachable address space.
A 68EC040-equipped Amiga will experience a significant decrease in
performance when accessing non-cachable Zorro III devices.  For this
reason we cannot recommend that anyone use a 68EC040 (or any future
68000 series CPU that has no MMU) as the CPU on a Zorro III bus
system.

If someone decides not to heed this warning and creates a 68EC040 CPU
card for the Zorro III bus, there is nothing the 68EC040 card can do
to prevent these problems, although there is still a way for a Zorro
III card to prevent the second 68040 problem (the ``register
trigger'' problem).  A Zorro III card that needs to use ``trigger''
registers can arrange the trigger registers so that each register is
in its own quad-long word.  This way, when the 68EC040 reads one of
these registers, the read operation won't disturb other registers, as
two registers do not reside in the same quad-long word.  Note that
this fix will not prevent the first problem (the performance decrease
problem) and it does not address the possibility that future CPUs may
have an eight or sixteen long word cache line.
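
One way to express the one-register-per-quad-long-word layout in C
is sketched below.  The structure and field names are hypothetical,
and the sketch assumes the board's base address is itself aligned to
a 16-byte boundary.

```c
#include <stdint.h>
#include <stddef.h>

#define LINE_BYTES 16  /* one 68040 cache line (quad-long word) */

/* A trigger register padded out so that it occupies a whole
   quad-long word by itself. */
struct trigger_reg {
    volatile uint32_t value;
    uint32_t pad[LINE_BYTES / 4 - 1];  /* unused, no side effects */
};

/* Hypothetical register map with two trigger registers. */
struct board_regs {
    struct trigger_reg fifo_a;
    struct trigger_reg fifo_b;
};

_Static_assert(sizeof(struct trigger_reg) == LINE_BYTES,
               "each trigger register must fill one cache line");
```

Raising LINE_BYTES would widen the spacing for a CPU with a longer
cache line, at the cost of more address space per register.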