Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEG_INFINITY * 0.0 is not a negative NaN but NEG_INFINITY.mul(0.0) is #81261

Closed
glandium opened this issue Jan 22, 2021 · 21 comments
Closed

NEG_INFINITY * 0.0 is not a negative NaN but NEG_INFINITY.mul(0.0) is #81261

glandium opened this issue Jan 22, 2021 · 21 comments
Labels
A-const-eval Area: Constant evaluation (MIR interpretation) A-floating-point Area: Floating point numbers and arithmetic A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@glandium
Copy link
Contributor

The following code:

use std::ops::Mul;

fn main() {
    assert_eq!(1.0f64.copysign(f64::NEG_INFINITY.mul(0.0)), -1.0f64);
    assert_eq!(1.0f64.copysign(f64::NEG_INFINITY * 0.0), -1.0f64);
}

has the following output:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1.0`,
 right: `-1.0`', src/main.rs:5:5

In other words, the first assert_eq! passes, but the second doesn't, while they should be the same thing.

Another variant that works: https://internals.rust-lang.org/t/should-neg-infinity-0-0-return-a-negative-nan/13883/2

@glandium glandium added the C-bug Category: This is a bug. label Jan 22, 2021
@glandium
Copy link
Contributor Author

The first assert_eq! also fails in release mode.

@bjorn3
Copy link
Member

bjorn3 commented Jan 22, 2021

MIR excerpt:

[...]
        _38 = const NaNf64;              // scope 0 at src/main.rs:5:32: 5:55
                                         // mir::Constant
                                         // + span: src/main.rs:5:32: 5:55
                                         // + literal: Const { ty: f64, val: Value(Scalar(0x7ff8000000000000)) }
        _37 = f64::<impl f64>::copysign(const 1f64, move _38) -> bb8; // scope 0 at src/main.rs:5:16: 5:56
[...]

Miri also fails on the first assertion. This combined with the fact that with LLVM it fails in release mode too makes me suspect that this is a bug in apfloat, which LLVM uses for compile time float operations. Rustc uses the rust port rustc_apfloat for const eval and by extension miri.

@bjorn3 bjorn3 added A-const-eval Area: Constant evaluation (MIR interpretation) A-floating-point Area: Floating point numbers and arithmetic A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 22, 2021
@silene
Copy link

silene commented Jan 22, 2021

Note that the IEEE-754 floating-point standard states the following: "For all other operations, this standard does not specify the sign bit of a NaN result, even when there is only one input NaN, or when the NaN is produced from an invalid operation." (Multiplication is one of these "other operations".) So, the fact that both assertions behave differently is certainly not forbidden by the standard.

@joshtriplett
Copy link
Member

@silene It's allowed by the standard, but it seems inconsistent to have different behavior depending on whether an operation is evaluated at compile time or runtime.

@glandium
Copy link
Contributor Author

Interestingly, the corresponding C++:

#include <iostream>
#include <cassert>
#include <cmath>

int main() {
	assert(std::copysign(1.0, -INFINITY * 0.0) == -1.0);
	return 0;
}

works with GCC at -O0 and -O3, but not clang, at neither optimization level.

@thomcc
Copy link
Member

thomcc commented Jan 23, 2021

It will also depend on the hardware. I've kind of known about this issue for a while (since digging into apfloat a while ago). The default behavior for APFloat's NaNs isn't the same as x86's behavior, which is problematic. In practice I believe the only long term solution is to make apfloat architecture-aware, but this is painful and diverges with what LLVM does.

That said, the printing might violate the IEEE standard, I'm not sure.

CC @workingjubilee, who has dug into some details here more recently, specifically the details around printing.

@RalfJung
Copy link
Member

So, just to make sure I understand -- f64::NEG_INFINITY.mul(0.0) creates a NaN, and copysign is used to test the NaNs sign?

This seems like it could be considered a part or a duplicate of #73328.

@thomcc
Copy link
Member

thomcc commented Jan 24, 2021

Yes, this is probably a duplicate of #73328, aside from the note that it may be non-conformant not to print the sign for -nan.

However, I don't know off the top if that's actually the case, and don't have my copy of IEEE754 handy.

@silene
Copy link

silene commented Jan 24, 2021

In practice I believe the only long term solution is to make apfloat architecture-aware, but this is painful

We did that for CompCert and it was definitely painful, yet CompCert does not even support that many architecture, compared to Rust and LLVM. And the pain was not due to formal verification, because this part cannot be verified anyway. It was a matter of deciphering the (incorrect) documentation of hardware constructors, writing a bazillion of tests to understand what the hardware actually does under the hood (e.g., is a - b physically wired as a + (-b) or not?) The only reason we went though all these hoops is because CompCert's semantics does not allow for unspecified bits (only unspecified bytes). If it did, we would gladly have marked the sign of NaN values as unspecified, as expressed by the floating-point standard.

@bluss
Copy link
Member

bluss commented Jan 24, 2021

Miri also fails on the first assertion. This combined with the fact that with LLVM it fails in release mode too makes me suspect that this is a bug in apfloat, which LLVM uses for compile time float operations. Rustc uses the rust port rustc_apfloat for const eval and by extension miri.

With float arithmetic not being stable in const fn, Rust could conservatively disable const eval for any float expression involving non-finite (intermediate) values - or is there a reason this can't work (? stable const expressions might be a problem).

@RalfJung
Copy link
Member

With float arithmetic not being stable in const fn

Well, float arithmetic is promoted to 'static lifetime on stable though and the result, computed by CTFE, is observable... and when I tried to forbid that crater found various users.

And the ConstProp pass might also be doing float arithmetic. That said, since the LLVM optimizer has the same behavior, it will change little to just disable this part of the rustc optimizer.

@bstrie
Copy link
Contributor

bstrie commented Feb 14, 2021

Rust could conservatively disable const eval for any float expression involving non-finite (intermediate) values

One thing is that, short of compiler magic, you'd need some way to provide f32::NAN, f64::INFINITY et al, which are currently defined like pub const NAN: f32 = 0.0_f32 / 0.0_f32;.

@thomcc
Copy link
Member

thomcc commented Feb 14, 2021

One thing is that, short of compiler magic, you'd need some way to provide f32::NAN, f64::INFINITY et al, which are currently defined like pub const NAN: f32 = 0.0_f32 / 0.0_f32;

It might be fine to allow expressions with NaN outputs, but not inputs.

This would still require fixing APFloat bugs, but not as many.

@RalfJung
Copy link
Member

IMO the best way forward for apfloat issues is to report them upstream in LLVM, help fix them there, and backport fixes to the Rust version.

@thomcc
Copy link
Member

thomcc commented Feb 15, 2021

I would guess that they don't care that the sign differs for NaNs on the target platform.

Also, fixing it requires complicating the API of APFloat so that it's aware of the target, so it's not a trivial change, unfortunately.

@RalfJung
Copy link
Member

Oh right, I forgot... the spec says the sign is not specified.

In that case I am not even sure if there's a bug here, or does IEEE mandate that these operations all be deterministic? And even then that seems like something we are already tracking at #73328, so I am inclined to close this as a duplicate.

@eddyb
Copy link
Member

eddyb commented Mar 13, 2021

The only reason we went though all these hoops is because CompCert's semantics does not allow for unspecified bits (only unspecified bytes). If it did, we would gladly have marked the sign of NaN values as unspecified, as expressed by the floating-point standard.

I wonder how infuriatingly difficult having undef bits (instead of bytes) in miri would be. It certainly feels like the clean solution, though you can't actually use undef, since it's not UB to observe those bits, just undesirable at e.g. compile-time.

@RalfJung
Copy link
Member

I wonder how infuriatingly difficult having undef bits (instead of bytes) in miri would be. It certainly feels like the clean solution, though you can't actually use undef, since it's not UB to observe those bits, just undesirable at e.g. compile-time.

The undef_mask in Allocation would have to become per-bit... but the more interesting question is, what would that entail? There is no operation in Rust to load a single bit from memory, and any byte-wise load would "explode" the undef to the full size of the load anyway. And even if we fixed that (which might not be faithful to what LLVM does), we'd still get UB when such a "value with an undef bit" is used for any kind of arithmetic, including further floating-point arithmetic.

So I see no practical way in which this would help here.

@eddyb I see you thumbs-upped by previous comment, so I am going to close this as a duplicate of #73328.

@workingjubilee
Copy link
Member

Just to clarify, IEEE754-2019 does not specify or interpret the sign bit of the quiet NaN result from this multiplication, correct, and if the IEEE754 function copySign, that could be represented in Rust as

fn copySign<T: Float>(x: T, y, Y:) -> T

is used, it gives "unspecified sign if y is a NaN".

@thomcc
Copy link
Member

thomcc commented Mar 18, 2021

I'm pretty sure that's not quite right, as it depends where the NaN comes from. Operations like abs() are required to unset the sign bit even if the input is NaN.

@workingjubilee
Copy link
Member

You are correct, if a NaN has its sign changed by abs or, indeed, by being the first operand to copySign, it then does have a specified sign bit. However, if the NaN is the second operand to copySign, the resulting value does not seem to be guaranteed. Presumably if you set a NaN's sign bit, using abs and friends, and then use it as an operand, you can rely on the resulting sign bit, but perhaps not before?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-eval Area: Constant evaluation (MIR interpretation) A-floating-point Area: Floating point numbers and arithmetic A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants