These results are wrong. To test correctly, you need to compare the results of your function against a known higher-precision variant. To test 32-bit floats, compare against the library's double-precision function (which is proven to be within a few ulps). Even then it is still technically not correct: there are cases where properly rounding the result requires more than double precision.
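A minimal sketch of that kind of test, assuming Python's `math.sin` (a double) as the higher-precision reference. The helper names `to_f32`, `ulp_f32` and `ulp_error` are made up for this example, and `struct` is used only to emulate rounding to a 32-bit float.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def ulp_f32(x: float) -> float:
    """Spacing between adjacent 32-bit floats at the magnitude of x."""
    if x == 0.0:
        return 2.0 ** -149                 # smallest float32 subnormal
    _, e = math.frexp(abs(x))              # abs(x) = m * 2**e, 0.5 <= m < 1
    return 2.0 ** (max(e, -125) - 24)      # clamp covers the subnormal range

def ulp_error(approx_f32: float, x: float, reference=math.sin) -> float:
    """Error of a float32 result, in float32 ulps, against a double reference."""
    exact = reference(x)
    return abs(approx_f32 - exact) / ulp_f32(exact)

# Usage: a float32 sine obtained by rounding the double result is the
# baseline any approximation gets compared to (at most half an ulp off).
samples = [i / 100.0 for i in range(1, 157)]
worst = max(ulp_error(to_f32(math.sin(x)), x) for x in samples)
print(f"worst error: {worst:.3f} ulp")
```

To test your own function, pass its float32 result as `approx_f32` and look at the worst case over a dense set of inputs, not at the average.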
For an approximation we don't care about correct rounding at all. All we care about is trading precision for performance, and an error of a few ulps is usually more than good enough.
For sine and cosine, the precision/performance optimum is a minimax polynomial consisting of only odd terms (sine) or only even terms (cosine). You can't do much better without using tables. Padé or Taylor approximations, or loops of any kind, will not deliver the performance.
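To show the shape of such a polynomial, here is a rough sketch of an odd-term sine evaluated without loops, in Horner form on x*x. The name `sin_poly` is made up, the input is assumed to be already reduced to |x| <= pi/2, and the coefficients are only placeholders: they are plain Taylor terms, not minimax ones, which would have to come from a Remez-style fit.

```python
import math

# PLACEHOLDER coefficients: Taylor terms -1/3!, 1/5!, -1/7!, 1/9!.
# These are NOT minimax; a fitted set would give a smaller worst-case error
# for the same number of terms.
C3 = -1.0 / 6.0
C5 = 1.0 / 120.0
C7 = -1.0 / 5040.0
C9 = 1.0 / 362880.0

def sin_poly(x: float) -> float:
    """x + C3*x^3 + C5*x^5 + C7*x^7 + C9*x^9, intended for |x| <= pi/2."""
    x2 = x * x
    # Only odd powers appear, so the result is exactly odd in x.
    return x + x * x2 * (C3 + x2 * (C5 + x2 * (C7 + x2 * C9)))

print(sin_poly(0.5), math.sin(0.5))
```

The same straight-line structure carries over to C or SIMD code, where each step typically becomes one multiply-add. Cosine would use the even powers in the same way.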
Fiddling around with approximations is not really worth it unless you really know what you are doing.