x86-32: floating-point return values undergo implicit format conversion #66803
Comments
With SSE2 disabled, the floating-point semantics are pretty hopeless, with implicit conversions happening all over the place (e.g. spilling registers to stack). Most of those issues go away with SSE2 enabled, because we use SSE2 instructions and registers for all float/double operations.
The remaining issue with SSE2 enabled is that the default C ABI requires float and double values to be returned in x87 registers. Returning a float or double value thus converts it to x86_fp80 (and then back, in the caller). This conversion means that a signaling NaN cannot be returned, because the behind-the-scenes conversion to x86_fp80 raises an FP invalid exception and quiets the NaN. LLVM does support other ABIs which don't have this problem: you can either use an alternative calling convention on the function (such as "fastcc"), or annotate the return type with "inreg" (as seen here): llvm-project/llvm/lib/Target/X86/X86CallingConv.td, lines 300 to 304 in 575a648.
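To make the effect concrete, here is a minimal bit-level model of what the caller observes after a float takes the round trip through an x87 register with the invalid exception masked. The function name `model_x87_return_f32` is made up for illustration, and the model leaves out the exception flag that the real conversion also sets.

```c
#include <stdint.h>

/* Hypothetical model (not LLVM code): the binary32 bit pattern a caller sees
 * after the value was loaded into an x87 register and stored back as a float.
 * A NaN comes back with its quiet bit (bit 22) set, so an sNaN turns into a
 * qNaN with the same payload. Every non-NaN value is exactly representable
 * in the 80-bit format and comes back unchanged. */
static uint32_t model_x87_return_f32(uint32_t bits) {
    int is_nan = (bits & 0x7F800000u) == 0x7F800000u &&
                 (bits & 0x007FFFFFu) != 0;
    return is_nan ? (bits | 0x00400000u) : bits;
}
```

For example, the sNaN pattern 0x7F800001 comes back as 0x7FC00001, while 1.0f (0x3F800000) and infinity (0x7F800000) are unaffected.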
While this is a fundamental problem with the x86-32 ABI, I believe we could potentially fix it on the LLVM side, without breaking the ABI, because loading or storing an 80-bit value to or from an x87 FPU register does not trigger a conversion operation. Thus, we could write custom conversion routines to go from 32/64-bit float to 80-bit float (and back), and use them at the call boundary. Such a routine would have runtime overhead compared with the x87 FPU's native conversion support, and it's unclear whether anyone cares enough about precise x86-32 FP semantics to actually bother implementing it. But it seemed worth at least recording the issue and a possible resolution.
See also Rust investigations linked from rust-lang/rust#115567 @RalfJung
See also #29774 (7 years old now :) where it keeps asserting when SSE2 is disabled, due to some internal inconsistency. This has never been fixed, and there are lots and lots of duplicates...
This issue is specifically about the problems that remain when SSE2 is enabled.
I think we may need to extend the
Could this routine be called for NaN values only (to preserve their bits perfectly), using the native routine for other values? Then it could be skipped entirely when
The requirement is that a round trip from float/double (in memory) to an x87 register and back to memory preserves the exact bit pattern of the input, and raises no FP exception flags. I believe this is true for all values other than sNaN. So, totally doable. But it's still pretty complex, and will have some performance/code-size cost, so I doubt we want to enable this by default. sNaN is a pretty niche feature, and x86-32 is becoming a pretty niche architecture. The number of people who care about sNaN on x86-32 is likely pretty negligible. But it would be nice to at least offer a mode where this works. We could use a routine like below to convert from a float in integer form to a value in an x87 register (almost-C code, but hand-waving away the x87 calling convention boundaries).
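As a portable illustration of the widening direction, here is a sketch that builds the 80-bit image from the float's bit pattern using integer operations only, so no FP exception can be raised and an sNaN keeps its bits. The `x87_bits` struct and both function names are assumptions made for this example, not LLVM or compiler-rt APIs. A real routine would be written in asm, would load the resulting 80-bit image into the x87 register, and would likely take this path only when `f32_is_snan` is true, using the native load otherwise.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory image of an x87 80-bit value. */
typedef struct {
    uint64_t significand; /* bit 63 is the explicit integer bit */
    uint16_t sign_exp;    /* bit 15 = sign, bits 14..0 = biased exponent */
} x87_bits;

/* binary32 sNaN: exponent all ones, quiet bit (22) clear, payload nonzero. */
static bool f32_is_snan(uint32_t bits) {
    return (bits & 0x7F800000u) == 0x7F800000u &&
           (bits & 0x00400000u) == 0 &&
           (bits & 0x003FFFFFu) != 0;
}

/* Widen a binary32 bit pattern to the 80-bit format without any FP operation. */
static x87_bits f32_bits_to_x87(uint32_t bits) {
    uint16_t sign = (uint16_t)((bits >> 31) << 15);
    uint32_t exp = (bits >> 23) & 0xFFu;
    uint64_t frac = bits & 0x007FFFFFu;
    x87_bits out;

    if (exp == 0xFFu) {
        /* Infinity or NaN: keep the payload, including the quiet bit. */
        out.sign_exp = (uint16_t)(sign | 0x7FFFu);
        out.significand = 0x8000000000000000ull | (frac << 40);
    } else if (exp == 0 && frac == 0) {
        /* Signed zero. */
        out.sign_exp = sign;
        out.significand = 0;
    } else if (exp == 0) {
        /* binary32 subnormal: normal in the 80-bit format, so normalize. */
        unsigned shift = 0;
        while ((frac & 0x8000000000000000ull) == 0) {
            frac <<= 1;
            shift++;
        }
        /* value = frac * 2^-149, so exponent = 16383 + 63 - 149 - shift */
        out.sign_exp = (uint16_t)(sign | (16297u - shift));
        out.significand = frac;
    } else {
        /* Normal number: rebias the exponent, make the integer bit explicit. */
        out.sign_exp = (uint16_t)(sign | (exp - 127u + 16383u));
        out.significand = 0x8000000000000000ull | (frac << 40);
    }
    return out;
}
```

For the sNaN pattern 0x7F800001 this yields exponent field 0x7FFF and significand 0x8000010000000000, with the quiet bit (bit 62) still clear.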
Then, of course, in the caller, we'd need to do the same thing in reverse, to convert from 80-bit x87 back to float. Since you already need to copy and store out the 80-bit value from the x87 register to memory, to even detect that it's an sNaN, a separate inline fast-path doesn't seem that doable. Something like:
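A matching sketch for the caller side, again using only integer operations on the stored 80-bit image. The `x87_bits` layout and the function names are illustrative assumptions. The code covers only the sNaN slow path: any other value would go through the native narrowing conversion, which is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory image of an x87 80-bit value, as written by a
 * full-width store of the register (which performs no conversion). */
typedef struct {
    uint64_t significand; /* bit 63 is the explicit integer bit */
    uint16_t sign_exp;    /* bit 15 = sign, bits 14..0 = biased exponent */
} x87_bits;

/* 80-bit sNaN: exponent all ones, integer bit set, quiet bit (62) clear,
 * remaining payload nonzero. */
static bool x87_is_snan(x87_bits v) {
    return (v.sign_exp & 0x7FFF) == 0x7FFF &&
           (v.significand >> 63) == 1 &&
           (v.significand & 0x4000000000000000ull) == 0 &&
           (v.significand & 0x3FFFFFFFFFFFFFFFull) != 0;
}

/* Narrow an 80-bit sNaN to a binary32 bit pattern by keeping the top 23
 * fraction bits. Assumption: the low 40 significand bits are zero, which
 * holds for any value that was widened from binary32; otherwise payload
 * bits are lost. */
static uint32_t x87_snan_to_f32_bits(x87_bits v) {
    uint32_t sign = (uint32_t)(v.sign_exp >> 15) << 31;
    uint32_t frac = (uint32_t)(v.significand >> 40) & 0x007FFFFFu;
    return sign | 0x7F800000u | frac;
}
```

The image {significand 0x8000010000000000, exponent field 0x7FFF} is detected as an sNaN and narrows back to 0x7F800001, so the round trip through the widening direction preserves the original bits.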
Should be relatively straightforward to add a set of routines like that to compiler-rt (written in asm, so they can use a nonstandard ABI to deal with x87 stack I/O), and have the backend emit a call instead of native FST/FLD instructions. Could some of it be inlined instead? Perhaps, but IMO not terribly worthwhile, given that the check for whether to take the slow-path is already half the code. I don't plan to work on this, but if someone else does, hope the above helps. :)