On 20/10/2008, Shot (Piotr Szotkowski) <shot / hot.pl> wrote:
> M. Edward (Ed) Borasky:
>
> > Last time I looked, the difference between no optimization
> > whatsoever and "-O3 -march=<your chip here>" was about 30 percent.
>
> What benchmarks did you use? In my code's case, the difference between
> empty CFLAGS and CFLAGS='-O3 -march=native' is minimal (Athlon 64 X2).

It also depends on your chip. Recent AMD chips tend to have a sane
design with respect to the balance between the number of execution
units, cache sizes, the decoder, etc. The parts fit well together, so
the CPU can handle any code without much trouble.

On the other hand, Pentium 4 chips (the ones before Core 2; Core 2
chips are sometimes also called P4 for some reason) were very poorly
designed, with a slow decoder and an unbalanced number of execution
units. On this chip the compiler can reorder instructions so that they
reach the execution units faster and keep the CPU better saturated,
which improves performance considerably.

>
> (gcc's man page says -march implies the same -mtune, and that 'native'
> is intelligently handled to mean whatever arch is the best in my case.)
>
> > BTW ... 64-bit compiled is slower than 32-bit compiled on a 64-bit
> > chip, too ... cache sizes, alignments, and such, I suspect, though
> > I haven't taken the time to profile it.
>
> That's interesting. Can I build 32-bit Ruby and use it inside my x86_64
> system? If so, how? (Sorry, I'm a total novice when it comes to this.)

You probably do that by passing -m32 to gcc (for example in both CFLAGS
and LDFLAGS when running configure).

Obviously you would need 32-bit versions of all the libraries you use
in your extensions.

And you would not be able to use as much memory. The 32-bit address
space is very limited: 4GB in total, and on Linux a single process can
typically use only part of that.
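If you do try a 32-bit build, a quick way to check which kind of
interpreter you actually ended up with is to ask Ruby for its native
pointer width. This is only a sketch: the helper names pointer_bits and
address_space_gib are made up for illustration, and it relies on
Array#pack's "p" directive producing one native pointer's worth of bytes.

```ruby
# Hypothetical helper: pointer width of the running interpreter, in bits.
# [""].pack("p") packs a native pointer, so its byte length is sizeof(void *).
def pointer_bits
  [""].pack("p").bytesize * 8
end

# Hypothetical helper: theoretical address space, in GiB, for a pointer width.
# For 32 bits this is 2**32 bytes = 4 GiB, the ceiling mentioned above.
def address_space_gib(bits)
  (2**bits) / 2**30
end

bits = pointer_bits
puts "This Ruby is #{bits}-bit; theoretical address space: #{address_space_gib(bits)} GiB"
```

Run it once with each interpreter: a -m32 build should report 32 bits
and 4 GiB, while the default x86_64 build reports 64 bits.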

Thanks

Michal