The DEA library is known to work on the following platforms using one of the
compilers mentioned below:

    Sun     Sparc       Solaris 1.* and 2.*

    Dec     Alpha       OSF/1 1.[23] 2.0
            DecStation  Ultrix 4.[234]

    IBM     RS6000      AIX 3.2

    SGI     Indigo,     IRIX [45]*
            PowerChallenge

The performance of the library functions, especially dea, depends heavily
on the choice of compiler and its options. A few machine instructions more
or less make a noticeable difference.
All machine dependencies are bundled in machine.h. There are three of them:

    1) a typedef for an unsigned 32-bit integer
    2) a macro for converting 4 consecutive bytes in ascending order into an
       integer such that the first byte becomes the most significant, the
       second byte the second most significant, and so on.
    3) a define to choose between two addressing modes for fast S-Box table
       lookup. If BYTEADDR is 1, addresses will be represented as byte pointers,
       otherwise as integer pointers.
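
As an illustration only, the three items could look roughly like the sketch
below. This is not the actual content of machine.h: the names u32,
BYTES_TO_U32 and sbox_lookup are hypothetical, the 64-entry table size is an
assumption, and only BYTEADDR is taken from the description above.

```c
#include <assert.h>

/* 1) Unsigned 32-bit integer. Assumption: unsigned int is 32 bits wide. */
typedef unsigned int u32;               /* hypothetical name */

/* 2) Pack 4 consecutive bytes, first byte most significant.
 *    p must point to unsigned char so that no sign extension occurs. */
#define BYTES_TO_U32(p)                                  \
    (((u32)(p)[0] << 24) | ((u32)(p)[1] << 16) |         \
     ((u32)(p)[2] <<  8) |  (u32)(p)[3])                 /* hypothetical name */

/* 3) Addressing mode for S-Box lookups. */
#ifndef BYTEADDR
#define BYTEADDR 1
#endif

/* Look up entry `index` (0..63) of an assumed 64-entry S-Box table. */
static u32 sbox_lookup(const u32 *table, u32 index)
{
#if BYTEADDR
    /* Byte pointer: table base plus an offset counted in bytes. */
    const unsigned char *base = (const unsigned char *)table;
    assert(index < 64);
    return *(const u32 *)(base + index * sizeof(u32));
#else
    /* Integer pointer: the compiler scales the index itself. */
    assert(index < 64);
    return table[index];
#endif
}
```

Both branches of sbox_lookup return the same value; they differ only in the
address arithmetic the compiler is asked to generate, which is the kind of
difference the timing notes below are about.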

For every supported platform there's a #if conditional testing the CPU type
around three lines which set the appropriate machine-dependent defines.
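
A minimal sketch of what such a conditional might look like follows. The
predefined macros tested (__sparc__, __alpha__ and their variants), the names
u32, BYTES_TO_U32, PORTABLE_BYTES_TO_U32 and machine_selfcheck, and the
BYTEADDR values are all assumptions chosen for illustration; they are not
copied from machine.h, and the BYTEADDR values are placeholders rather than
tuned settings.

```c
#include <assert.h>
#include <limits.h>

/* Portable byte packing, shared by all branches in this sketch. */
#define PORTABLE_BYTES_TO_U32(p)                         \
    (((u32)(p)[0] << 24) | ((u32)(p)[1] << 16) |         \
     ((u32)(p)[2] <<  8) |  (u32)(p)[3])

#if defined(__sparc__) || defined(__sparc)
/* SPARC: BYTEADDR value is a placeholder */
typedef unsigned int u32;
#define BYTES_TO_U32(p) PORTABLE_BYTES_TO_U32(p)
#define BYTEADDR 1
#elif defined(__alpha__) || defined(__alpha)
/* Alpha under OSF/1: long is 64 bits, so the 32-bit type must be int */
typedef unsigned int u32;
#define BYTES_TO_U32(p) PORTABLE_BYTES_TO_U32(p)
#define BYTEADDR 0
#else
/* Fallback for any other CPU, assuming a 32-bit unsigned int */
typedef unsigned int u32;
#define BYTES_TO_U32(p) PORTABLE_BYTES_TO_U32(p)
#define BYTEADDR 0
#endif

/* Run once when porting: checks the three defines of the chosen branch. */
static void machine_selfcheck(void)
{
    static const unsigned char bytes[4] = { 0x01, 0x23, 0x45, 0x67 };

    assert(sizeof(u32) * CHAR_BIT == 32);
    assert(BYTES_TO_U32(bytes) == 0x01234567u);
    assert(BYTEADDR == 0 || BYTEADDR == 1);
}
```

When adding a platform, a new #elif branch with its own three lines would be
enough, and calling machine_selfcheck() once catches a wrong integer width or
byte order early.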

Below you'll find a list of tested configurations, the first one being the
fastest. The others are in no specific order. The comments below each
configuration give further explanation. Unless stated otherwise, gcc means
version 2.5.8.

Solaris 1
Solaris 2   CC      = gcc
            ANSI    = '-ansi -pedantic'
            WARN    = -Wall
            OPTIM   = -O2

There was no chance to test the unbundled C compiler under Solaris 2 since the
licence had expired. Earlier tests showed that gcc 2.3.3 was better than C
2.0.1. The Solaris 1 C compiler doesn't understand ANSI C.

OSF/1 1.3   CC      = gcc
            ANSI    = '-ansi -pedantic'
            WARN    = -Wall
            OPTIM   = -O2
        or
            CC      = cc
            ANSI    = -std1
            OPTIM   = -O
        or
            CC      = c89
            ANSI    = -std
            WARN    = -check
            OPTIM   = -O

Both cc and c89 are ridiculously slow compared to gcc. These machines don't
deserve these compilers. Even compiling with -O3 doesn't help. Whether
BYTEADDR is defined as 1 or 0 doesn't matter; both addressing modes seem to be
equally fast.

OSF/1 2.0   CC      = cc
            ANSI    = -std1
            WARN    =
            OPTIM   = -O
        or
            CC      = gcc
            ANSI    = '-ansi -pedantic'
            WARN    = -Wall
            OPTIM   = -O2

The newer cc is as good as gcc when encrypting but still generates much
slower code for key schedule generation, and -O3 slows everything down.

DecStation 5?00
Ultrix 4    CC      = gcc
            ANSI    = '-ansi -pedantic'
            WARN    = -Wall
            OPTIM   = -O2

The system's C compiler doesn't understand ANSI C before Ultrix 4.4, and from
4.4 on it's still slower than gcc.

Indigo
IRIX 4      CC      = cc
            ANSI    = -ansi
            WARN    = '-fullwarn -woff 269'
            OPTIM   = -O -mips2 -sopt
        or
            CC      = gcc
            ANSI    = '-ansi -pedantic'
            WARN    = -Wall
            OPTIM   = -O2

The MIPS compiler is slightly faster, but that's not exactly reproducible. I
believe that the granularity of the system timer is too large.
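
One common way to work around a coarse timer is to repeat the measured
operation many times and divide the total by the repetition count. The sketch
below shows the idea with the standard clock() function. It is not the timing
code used for these notes; seconds_per_call, dummy_work and sink are
hypothetical names, and dummy_work merely stands in for the library call
being timed.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the operation being measured. */
static volatile unsigned long sink;

static void dummy_work(void)
{
    unsigned long i, acc = 0;

    for (i = 0; i < 1000; i++)
        acc += i * i;
    sink = acc;             /* keep the loop from being optimized away */
}

/* Average CPU seconds per call of fn, measured over reps calls. */
static double seconds_per_call(void (*fn)(void), long reps)
{
    clock_t start, stop;
    long i;

    assert(reps > 0);
    start = clock();
    for (i = 0; i < reps; i++)
        fn();
    stop = clock();
    return ((double)(stop - start) / CLOCKS_PER_SEC) / (double)reps;
}
```

The repetition count should be large enough that the total run time spans
many timer ticks; otherwise differences of a few percent between two
compilers disappear in the rounding.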

PowerChallenge
IRIX 5      CC      = gcc
            ANSI    = '-ansi -pedantic'
            WARN    = -Wall
            OPTIM   = -O2
        or
            CC      = cc
            ANSI    = -ansi
            WARN    = -fullwarn
            OPTIM   = -O [-mips2 -sopt]

The SGI MIPS compiler is 5% to 10% slower than gcc, with or without allowing
the MIPS 2 instruction set or running the source-to-source optimizer. Even
global register allocation does not help.

RS6000 320H
AIX 3.2.5   CC      = cc
            OPTIM   = -O3
        or
            CC      = gcc
            ANSI    = '-ansi -pedantic'
            WARN    = -Wall
            OPTIM   = -O2

I'm not sure whether the 'aggressive' optimizations allowed with -O3 will
work on other RS6000 CPUs as well. Anyway, cc -O3 gives about a 15%
performance increase compared to gcc -O2. cc -O2 is slower than gcc.
