syntonica wrote (Thu Jun 29, 2023 11:24 pm):
> Isn't this why we have -Os now? It does peephole optimizations and similar, but I think it doesn't unroll loops. I find it does about the same as O3 but not quite Ofast. That is, when my code object is tiny to begin with, there's no point in making it tinier.

mystran wrote (Fri Jun 30, 2023 6:40 am):
> -Os optimizes for size. If it can make the code 1 byte smaller by making it 10 times slower, it'll do that.

syntonica wrote (Fri Jun 30, 2023 6:52 am):
> Isn't that -Oz? I've never tried that one. I've only experimented with Os and it really is quite good, but -Ofast always beats it, but not by that much. If I was doing embedded, I'd be quite happy with -Os.

Oh.. you might be right. It used to be that -Os was ridiculous, but I guess they split the truly ridiculous stuff into -Oz instead.
edit:
Ok, so it looks like in GCC -Os is basically -O2 with a few things turned off, like loop alignment and such. In general, the stuff that can aggressively increase code size (eg. most of the loop restructuring and unrolling and whatever) is all in -O3 for GCC, and probably for clang as well.
-Ofast is the same as -O3 + -ffast-math, so comparing it with the others is kinda apples and oranges: -ffast-math allows floating point to be optimized as if the values were real numbers, and you can specify -ffast-math with any of the other optimization levels too.
Returning to the previous example of "(foo+foo)/2": if "foo" is float/double, then without -ffast-math I believe we can still rewrite it as "(foo+foo)*.5", because the reciprocal of 2 happens to be exact, but that's it. With -ffast-math we can assume (1) that foo is neither inf nor nan and (2) that (foo+foo) doesn't overflow to inf (doubling is otherwise exact in binary floating point, it just bumps the exponent), so we can optimize the whole thing to just "foo", which strict IEEE rules don't allow. It's a similar story with fused multiply-adds: without -ffast-math (or an explicit contraction flag) we can't necessarily use them on architectures that have them, because they skip the intermediate rounding and are "too accurate".
-ffast-math is also essential for things like vectorization. Suppose you have a loop computing the sum of floating-point values over an array. This is easy to vectorize: just compute partial sums in parallel, then do a horizontal sum at the very end... except without -ffast-math we can't do this, because we aren't allowed to change the order of the additions, which changes the rounding.
If you don't want to use -Ofast because -O3 blows up the code size (or compile time) too much, just add -ffast-math to whatever other optimization level you want and you'll get a better idea of what the real differences are.