LLVM assertion when using i128 with the NVPTX #38824

Closed
japaric opened this issue Jan 4, 2017 · 31 comments · Fixed by #40257
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. O-NVPTX Target: the NVPTX LLVM backend for running rust on GPUs, https://llvm.org/docs/NVPTXUsage.html

Comments

@japaric (Member) commented Jan 4, 2017

This effectively means you can't compile core for these targets.

STR

$ edit foo.rs && cat $_
#![crate_type = "lib"]
#![feature(i128_type)]
#![feature(lang_items)]
#![feature(no_core)]
#![no_core]

fn foo() -> i128 { 0 }

#[lang = "copy"]
trait Copy {}

#[lang = "sized"]
trait Sized {}
$ edit nvptx64-nvidia-cuda.json && cat $_
{
  "arch": "nvptx64",
  "cpu": "sm_20",
  "data-layout": "e-i64:64-v16:16-v32:32-n16:32:64",
  "llvm-target": "nvptx64-nvidia-cuda",
  "max-atomic-width": 0,
  "os": "cuda",
  "panic-strategy": "abort",
  "target-endian": "little",
  "target-pointer-width": "64"
}
$ rustc --target nvptx64-nvidia-cuda foo.rs
rustc: /buildslave/rust-buildbot/slave/nightly-dist-rustc-linux/build/src/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:2492: virtual llvm::SDValue llvm::NVPTXTargetLowering::LowerReturn(llvm::SDValue, llvm::CallingConv::ID, bool, const llvm::SmallVectorImpl<llvm::ISD::OutputArg>&, const llvm::SmallVectorImpl<llvm::SDValue>&, const llvm::SDLoc&, llvm::SelectionDAG&) const: Assertion `ValVTs.size() == OutVals.size() && "Bad return value decomposition"' failed.

Meta

$ rustc -V
rustc 1.16.0-nightly (468227129 2017-01-03)

Already Fixed

$ cat msp430.json && cat $_
{
  "arch": "msp430",
  "asm-args": ["-mcpu=msp430"],
  "data-layout": "e-m:e-p:16:16-i32:16:32-a:16-n8:16",
  "executables": true,
  "linker": "msp430-elf-gcc",
  "llvm-target": "msp430",
  "max-atomic-width": 0,
  "no-integrated-as": true,
  "os": "none",
  "panic-strategy": "abort",
  "relocation-model": "static",
  "target-endian": "little",
  "target-pointer-width": "16",
  "vendor": "unknown"
}
$ rustc --target msp430 foo.rs
Return operand #4 has unhandled type i16
UNREACHABLE executed at /buildslave/rust-buildbot/slave/nightly-dist-rustc-linux/build/src/llvm/lib/CodeGen/CallingConvLower.cpp:114!
japaric added the A-LLVM and O-NVPTX labels Jan 4, 2017
@japaric (Member Author) commented Jan 4, 2017

This may have been caused by the i128 PR (#38482). No LLVM assertion is raised if I compile core with --cfg stage0, which removes the i128 stuff.

We could try to fix this, or we could cfg away the i128 stuff for the NVPTX targets in the meantime. The LLVM assertions render the NVPTX backend unusable in the nightlies.

cc @est31 @nagisa Have you seen any LLVM assertion like the ones above while working on implementing i128?
cc @alexcrichton cfg?

@est31 (Member) commented Jan 4, 2017

@japaric no, I can't remember seeing such an assertion. It seems to be in PTX-related code, but it's very likely that i128 broke some of the non-tier-1 platforms. You can only tell whether they break once they actually do :/

Can you try to produce a minimal reproducible example? Then we might know more...

@japaric (Member Author) commented Jan 4, 2017

Seems like the msp430 target broke too:

$ rustc --target msp430 $(rustc --print sysroot)/lib/rustlib/src/rust/src/libcore/lib.rs --emit=asm
Call result #4 has unhandled type i16
UNREACHABLE executed at /buildslave/rust-buildbot/slave/nightly-dist-rustc-linux/build/src/llvm/lib/CodeGen/CallingConvLower.cpp:167!

with

{
  "arch": "msp430",
  "asm-args": ["-mcpu=msp430"],
  "data-layout": "e-m:e-p:16:16-i32:16:32-a:16-n8:16",
  "executables": true,
  "linker": "msp430-elf-gcc",
  "llvm-target": "msp430",
  "max-atomic-width": 0,
  "no-integrated-as": true,
  "os": "none",
  "panic-strategy": "abort",
  "relocation-model": "static",
  "target-endian": "little",
  "target-pointer-width": "16",
  "vendor": "unknown"
}

Using --cfg stage0 works fine.

cc @pftbest

japaric changed the title from "LLVM assertion when compiling core to PTX" to "LLVM assertion when compiling core for NVPTX / MSP430" Jan 4, 2017
@japaric (Member Author) commented Jan 4, 2017

@est31 Examples

#![crate_type = "lib"]
#![feature(i128_type)]
#![feature(lang_items)]
#![feature(no_core)]
#![no_core]

fn foo() -> i128 {
    0
}

#[lang = "copy"]
trait Copy {}

#[lang = "sized"]
trait Sized {}
$ rustc --target nvptx64-nvidia-cuda foo.rs
rustc: /buildslave/rust-buildbot/slave/nightly-dist-rustc-linux/build/src/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:2492: virtual llvm::SDValue llvm::NVPTXTargetLowering::LowerReturn(llvm::SDValue, llvm::CallingConv::ID, bool, const llvm::SmallVectorImpl<llvm::ISD::OutputArg>&, const llvm::SmallVectorImpl<llvm::SDValue>&, const llvm::SDLoc&, llvm::SelectionDAG&) const: Assertion `ValVTs.size() == OutVals.size() && "Bad return value decomposition"' failed.
$ rustc --target msp430 foo.rs
Return operand #4 has unhandled type i16
UNREACHABLE executed at /buildslave/rust-buildbot/slave/nightly-dist-rustc-linux/build/src/llvm/lib/CodeGen/CallingConvLower.cpp:114!

#![crate_type = "lib"]
#![feature(i128_type)]
#![feature(lang_items)]
#![feature(no_core)]
#![no_core]

fn foo(x: i128) {}

#[lang = "copy"]
trait Copy {}

#[lang = "sized"]
trait Sized {}
$ rustc --target nvptx64-nvidia-cuda foo.rs
rustc: /buildslave/rust-buildbot/slave/nightly-dist-rustc-linux/build/src/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7952: void llvm::SelectionDAGISel::LowerArguments(const llvm::Function&): Assertion `InVals.size() == Ins.size() && "LowerFormalArguments didn't emit the correct number of values!"' failed.

japaric changed the title from "LLVM assertion when compiling core for NVPTX / MSP430" to "LLVM assertion when using i128 with the NVPTX and MSP430 targets" Jan 4, 2017
@pftbest (Contributor) commented Jan 4, 2017

The msp430 EABI doesn't specify how to return 128-bit values, and LLVM does not support them on msp430, so I don't see an easy fix for this.

@alexcrichton (Member) commented:

These are tier 3 platforms, so basically anything goes IMO in terms of language guarantees. If they don't support i128 at a fundamental level, then it seems like, if we want them to compile, we'll have to cfg out support.

@japaric (Member Author) commented Jan 4, 2017

I'm trying to understand this better:

  • If the ABI does not specify how 128-bit values are passed/returned, does that mean that the architecture doesn't support 128-bit values at the fundamental level?

  • Do the ABIs of all the other tier 1/2 targets (x86, ARM, MIPS, PowerPC, SystemZ) specify how 128-bit values are returned?

  • (What about wasm and asm.js? Do those even have an ABI? How does the 128-bit stuff even work there?)

I'm wondering if we should:

  • Add a supports_i128 field to target specifications, for targets that don't support i128 at a fundamental level.

  • Or maybe add a Cargo feature to core to cfg away the i128 stuff.

  • Or unilaterally cfg away i128 in core for nvptx and msp430 (a rough sketch of what such a gate could look like follows this list).
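A rough sketch (mine, not from the thread) of what a gate inside core could look like for either of the last two options; core_disable_i128 is an invented cfg name, not an existing flag:

// Hypothetical gate: compile core's 128-bit integer surface out for targets
// whose LLVM backends cannot lower i128 yet. `core_disable_i128` is an
// invented cfg name used only for this sketch.
#[cfg(not(core_disable_i128))]
mod i128_impls {
    // the i128/u128 trait impls and inherent methods would live here
}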

I personally don't like any of these options. To me this seems like a limitation in what LLVM can do today, one that can be fixed in the future.

Also, if we had pure MIR rlibs this wouldn't be a problem at all because the i128 stuff would never be passed to LLVM if it wasn't being used by the final application.

To sum it up, I'm not sure what to do here ...

@pftbest (Contributor) commented Jan 4, 2017

If the ABI does not specify how 128-bit values are passed/returned, does that mean that the architecture doesn't support 128-bit values at the fundamental level?

Right now on MSP430, function return values are passed back in four 16-bit registers (r12-r15), so in total you can return only 64 bits of data. There is an exception, though: for returning large structs, the data is placed on the stack and a reference is placed in the registers. So I guess it would be possible to bodge some hack that treats i128 like a struct, but I don't know if anyone has tried to implement 128-bit values on a 16-bit CPU, so it would be new territory and potentially a lot of work for a feature that may not be useful at all on a small MCU.
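To make the "treat i128 like a struct" idea concrete, here is a minimal Rust sketch (the type and function names are invented for this example): a 128-bit value modelled as a two-limb #[repr(C)] aggregate, which an indirect-return ABI would hand back through a hidden pointer rather than in r12-r15.

// Sketch only: a 128-bit value as a two-limb #[repr(C)] struct. On an ABI
// that returns large aggregates indirectly, the caller supplies a hidden
// pointer and the callee stores the two limbs through it.
#[repr(C)]
pub struct U128 {
    pub lo: u64,
    pub hi: u64,
}

// Source-level view: "return a 128-bit value by value".
pub fn make(lo: u64, hi: u64) -> U128 {
    U128 { lo, hi }
}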

@japaric (Member Author) commented Jan 4, 2017

@pftbest yeah, but if LLVM can generate code for a function that returns [u8; 16], then it should be able to use the same trick to return an i128. It's just that nobody has taught it how to do that. (i128 ops are a totally different matter, but at the very least LLVM should emit intrinsic calls instead of raising assertions.)

I don't know if anyone has tried to implement 128-bit values on a 16-bit CPU

How about i128 on 8-bit AVR? 😄

@est31 (Member) commented Jan 4, 2017

If the ABI does not specify how 128-bit values are passed/returned, does that mean that the architecture doesn't support 128-bit values at the fundamental level?

AFAIK, if Rust code calls Rust code, the ABI is totally irrelevant. We can do whatever we want. However, it starts becoming a problem once we talk to non-Rust places, e.g. for FFI, or when LLVM emits a call to i128 intrinsics.

  • For FFI we might want to end up forbidding extern "C" fns and #[repr(C)] structs that contain i128 on targets where i128 is not specified by the calling convention or is buggy in LLVM (an example of such a declaration follows these points).

  • With intrinsics, there has been major pain on Windows because LLVM assumed a different ABI than what Rust provided with extern "C". It was "fixed" by introducing the "unadjusted" ABI, which might work for nvptx64/msp430 as well, but it could just as well still cause an assertion failure.
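As an illustration of the first point, this is the kind of declaration that would be ruled out on such targets (a hypothetical FFI item, not taken from any real library):

// Hypothetical FFI declaration that the first point would forbid on targets
// whose C toolchain defines no 128-bit integer ABI: nothing on the C side
// specifies how these arguments and the return value are passed.
extern "C" {
    fn widemul128(a: i128, b: i128) -> i128;
}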

A real need for i128 only exists on platforms that want to build cryptographic libraries, or maybe even std, as those two places might want to use i128. The higher stages of the compiler itself require i128 support as well, so if your platform wants to be a host platform for rustc, you'll need to support i128.

I'm okay with any of the three options (a flag in target.json, a feature for core, or cfg-ing i128 out just for those two platforms).

@nagisa (Member) commented Jan 4, 2017

If the ABI does not specify how 128-bit values are passed/returned, does that mean that the architecture doesn't support 128-bit values at the fundamental level?

No.

Do the ABIs of all the other tier 1/2 targets (x86, ARM, MIPS, PowerPC, SystemZ) specify how 128-bit values are returned?

No. Notably, the x64 Windows ABI is fuzzy about how i128 values are handled, so after carefully reading it we arrived at the conclusion that they ought to be handled the same way as regular 16-byte structs on 64-bit Windows platforms.

(What about wasm and asm.js? Do those even have an ABI? How does the 128-bit stuff even work there?)

Add a supports_i128 field to target specifications. For targets that don't support i128 at the fundamental level.

A target cannot not support i128 at a fundamental level, in the same way that a target cannot not support writing Turing-complete software.

To sum it up, I'm not sure what to do here ...

IMO what we should do here is fudge the ABIs somewhat so that, on these targets, i128 arguments and return values are passed via pointers (or handled the same way an (i64, i64) pair would be).

The C ABI will be more complex; as @est31 already said, there will always be at least one user of the C ABI with i128 arguments and return values: LLVM with its intrinsics. The question here is how LLVM lowers to those intrinsic calls on these targets. If the backend does not lower them and instead asserts for whatever reason, the target has what's essentially a broken backend implementation.

Which does seem to be broken for at least NVPTX (test.ll here), in exactly the place where I'd expect it to be broken:

$ llc test.ll --filetype=asm -o - -march=nvptx
//
// Generated by LLVM NVPTX Back-End
//

.version 3.2
.target sm_20
.address_size 32

#0 0x00007f8a504a0698 llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/usr/bin/../lib/libLLVM-3.9.so+0x6d6698)
#1 0x00007f8a5049e5fe llvm::sys::RunSignalHandlers() (/usr/bin/../lib/libLLVM-3.9.so+0x6d45fe)
#2 0x00007f8a5049e73a (/usr/bin/../lib/libLLVM-3.9.so+0x6d473a)
#3 0x00007f8a4f6d70b0 __restore_rt (/usr/bin/../lib/libc.so.6+0x330b0)
#4 0x00007f8a51855d30 llvm::NVPTXTargetLowering::getArgumentAlignment(llvm::SDValue, llvm::ImmutableCallSite const*, llvm::Type*, unsigned int) const (/usr/bin/../lib/libLLVM-3.9.so+0x1a8bd30)
#5 0x00007f8a5185b76a llvm::NVPTXTargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const (/usr/bin/../lib/libLLVM-3.9.so+0x1a9176a)
#6 0x00007f8a50963e98 llvm::TargetLowering::LowerCallTo(llvm::TargetLowering::CallLoweringInfo&) const (/usr/bin/../lib/libLLVM-3.9.so+0xb99e98)
#7 0x00007f8a509efb71 llvm::TargetLowering::makeLibCall(llvm::SelectionDAG&, llvm::RTLIB::Libcall, llvm::EVT, llvm::ArrayRef<llvm::SDValue>, bool, llvm::SDLoc const&, bool, bool) const (/usr/bin/../lib/libLLVM-3.9.so+0xc25b71)
#8 0x00007f8a508ed938 (/usr/bin/../lib/libLLVM-3.9.so+0xb23938)
#9 0x00007f8a508fe045 (/usr/bin/../lib/libLLVM-3.9.so+0xb34045)
#10 0x00007f8a50906e47 (/usr/bin/../lib/libLLVM-3.9.so+0xb3ce47)
#11 0x00007f8a509075c7 llvm::SelectionDAG::LegalizeTypes() (/usr/bin/../lib/libLLVM-3.9.so+0xb3d5c7)
#12 0x00007f8a509c44dd llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/usr/bin/../lib/libLLVM-3.9.so+0xbfa4dd)
#13 0x00007f8a509cb75e llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/usr/bin/../lib/libLLVM-3.9.so+0xc0175e)
#14 0x00007f8a509ce33b llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/usr/bin/../lib/libLLVM-3.9.so+0xc0433b)
#15 0x00007f8a506cd1d1 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/usr/bin/../lib/libLLVM-3.9.so+0x9031d1)
#16 0x00007f8a505657c2 llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/bin/../lib/libLLVM-3.9.so+0x79b7c2)
#17 0x00007f8a50565b4b llvm::FPPassManager::runOnModule(llvm::Module&) (/usr/bin/../lib/libLLVM-3.9.so+0x79bb4b)
#18 0x00007f8a50565e74 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/bin/../lib/libLLVM-3.9.so+0x79be74)
#19 0x000000000041aacc (llc+0x41aacc)
#20 0x00000000004113e0 (llc+0x4113e0)
#21 0x00007f8a4f6c4291 __libc_start_main (/usr/bin/../lib/libc.so.6+0x20291)
#22 0x000000000041146a (llc+0x41146a)
Stack dump:
0.	Program arguments: llc test.ll --filetype=asm -o - -march=nvptx 
1.	Running pass 'Function Pass Manager' on module 'test.ll'.
2.	Running pass 'NVPTX DAG->DAG Pattern Instruction Selection' on function '@div'
fish: “llc test.ll --filetype=asm -o -…” terminated by signal SIGSEGV (Address boundary error)

That being said, since the C compilers for those targets do not support i128, the actual extern "C" interoperation does not matter whatsoever.

All that really needs to be fixed is LLVM's lowering of these instructions to intrinsic calls using whatever ABI works, even if that means passing arguments and taking the return value via pointers.

I, myself, am very strongly against splitting the language feature set depending on the target.

@androm3da (Contributor) commented:

How do we break the stalemate?

If we submitted changes upstream for each of the smaller targets without i128 support (providing indirection as for large structs), it seems like they would meet resistance. As @pftbest says, it's likely not considered useful, since there are many targets without native registers or support for such large values.

But we are also reluctant to limit this type to only the appropriate targets.

Does the "vision for 2017 cycle" provide any guidance? Does "provide a more seamless FFI story" help push us in one direction or the other?

@est31 (Member) commented Jan 23, 2017

@androm3da regardless of whether it is useful or not, we do need to support i128 on all targets, otherwise people will start using libraries that provide it across platforms. Even if it needs to be "virtualized" in some way and not wholly put into registers. In the worst case we'd have to provide some translation of i128 -> (u64, u64) before passing things to LLVM, but that would affect things like alignment and generally looks like an ugly workaround.
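At the value level such a translation is straightforward; here is a minimal Rust sketch (function names invented) of splitting an i128 into (u64, u64) halves and reassembling it, which is what the workaround above would have to do at every boundary handed to LLVM:

// Value-level sketch of the i128 -> (u64, u64) translation: split into
// (low, high) halves and reassemble. A real lowering would also have to
// deal with alignment and with the ABI of the pair itself.
fn split(x: i128) -> (u64, u64) {
    (x as u64, (x >> 64) as u64)
}

fn join(lo: u64, hi: u64) -> i128 {
    ((hi as i128) << 64) | lo as i128
}

fn main() {
    let x: i128 = -123_456_789_012_345_678_901_234_567_890;
    let (lo, hi) = split(x);
    assert_eq!(join(lo, hi), x);
}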

About FFI, I guess we can't really provide any FFI for i128 on targets where there is no C support for it, as generally the calling conventions only specify things that are part of C.

@nagisa (Member) commented Jan 23, 2017

it's likely not considered useful, since there are many targets without native registers or support for such large values.

None of the targets Rust supports have native support for 128-bit integers, other than the fact that some can implement unchecked division and multiplication involving a 128-bit integer operand with a single instruction.
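As a reminder of what "no native support" means in practice, wide-integer arithmetic is composed from narrower operations by every backend; here is a minimal Rust sketch (names invented) of a 128-bit add built from 64-bit adds with carry:

// A 128-bit add decomposed into 64-bit operations, the same kind of
// lowering visible in the PTX multiplication output later in this thread.
fn add128(a: (u64, u64), b: (u64, u64)) -> (u64, u64) {
    let (lo, carry) = a.0.overflowing_add(b.0);
    let hi = a.1.wrapping_add(b.1).wrapping_add(carry as u64);
    (lo, hi)
}

fn main() {
    // 2^64 - 1 plus 1 carries into the high limb.
    assert_eq!(add128((u64::MAX, 0), (1, 0)), (0, 1));
}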

If we submitted changes upstream for each of the smaller targets without i128 support (providing indirection as for large structs), it seems like they would meet resistance.

Why? All that needs to be done here is to have the backends fixed. LLVM is supposed to be agnostic over the bit width of integers, however useful or useless somebody on the internet perceives such functionality to be. In other words, it's a backend bug.

How do we break the stalemate?

So, it's probably pretty clear that all it takes is writing a patch to fix the bugs in these backends with respect to operations on large values. I have no knowledge of these targets, am not familiar with the backends, and don't have the hardware to test any fixes on, so these patches will have to be written by somebody else.


Right now on MSP430, function return values are passed back in four 16-bit registers (r12-r15), so in total you can return only 64 bits of data. There is an exception, though: for returning large structs, the data is placed on the stack and a reference is placed in the registers. So I guess it would be possible to bodge some hack that treats i128 like a struct, but I don't know if anyone has tried to implement 128-bit values on a 16-bit CPU, so it would be new territory and potentially a lot of work for a feature that may not be useful at all on a small MCU.

Late answer, but I’ve noticed this only now.

This observation is wrong in multiple places. Firstly, sret (a.k.a. returning via pointer) is not a hack. Neither is it an exception. That's what pretty much every single ABI¹ out there does for values that do not fit into register(s). It is most likely true that the MSP430 ABI does not specify what to do with PODs this large, so it falls on us to pick something.

¹: For a less obscure example, the x64 ABI used by code targeting Windows mandates that sret be used for any non-vector value over 64 bits in size, even though such values could fit into registers. The SysV ABI starts returning values via sret above some size threshold as well.
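For a concrete picture of what returning via sret amounts to, here is a hand-written Rust approximation (purely illustrative; the compiler performs this lowering itself, and all names here are invented):

// sret in a nutshell: instead of returning a large POD in registers, the
// caller allocates the slot and passes a hidden pointer; the callee writes
// the result through it.
#[repr(C)]
pub struct Big(pub [u8; 16]);

// Source-level signature: return a 16-byte POD by value.
pub fn make_big() -> Big {
    Big([0u8; 16])
}

// Roughly what an sret ABI lowers it to.
pub unsafe fn make_big_sret(out: *mut Big) {
    out.write(Big([0u8; 16]));
}

fn main() {
    let by_value = make_big();
    let mut slot = core::mem::MaybeUninit::<Big>::uninit();
    unsafe { make_big_sret(slot.as_mut_ptr()) };
    assert_eq!(by_value.0, unsafe { slot.assume_init() }.0);
}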

@awygle commented Jan 23, 2017

The MSP430 ABI specifies that aggregates larger than 32 bits are passed and returned via pointer. The backend just doesn't implement sret correctly yet (and in fact targets an old and outdated ABI in any case). I'm working on fixing this in LLVM. I'd bet that similar problems explain the NVPTX failures, but I know nothing about that platform so I can't say for sure.

@pftbest (Contributor) commented Jan 23, 2017

@nagisa I also agree that fixing the backend is the right thing to do. I just think that it is not as easy as you say it is. MSP430 is the only 16-bit architecture in LLVM, and this creates some trouble by itself.

@awygle Good to know that someone is working on this.

alexcrichton added a commit to alexcrichton/rust that referenced this issue Mar 10, 2017
LLVM: Update submodule to include SRet support patch for MSP430.

This patch is needed to fix rust-lang#38824 on MSP430.
I know that LLVM 4 is coming soon, but it would be great to have at least one working nightly before the update.

cc @awygle
r? @alexcrichton
bors added a commit that referenced this issue Mar 12, 2017
LLVM: Update submodule to include SRet support patch for MSP430.

This patch is needed to fix #38824 on MSP430.
I know that LLVM 4 is coming soon, but it would be great to have at least one working nightly before the update.

cc @awygle
r? @alexcrichton
@nagisa (Member) commented Mar 12, 2017

@pftbest does that LLVM patch really fix NVPTX as well?

@pftbest (Contributor) commented Mar 12, 2017

@nagisa No, it doesn't. This should be reopened for NVPTX.

eddyb reopened this Mar 12, 2017
@pftbest (Contributor) commented Mar 12, 2017

BTW, thanks to @awygle for fixing this in LLVM upstream.

japaric changed the title from "LLVM assertion when using i128 with the NVPTX and MSP430 targets" to "LLVM assertion when using i128 with the NVPTX" Apr 7, 2017
@japaric (Member Author) commented Apr 7, 2017

Triage: MSP430 has been fixed. No change in the NVPTX backend.

@lilith commented Jun 18, 2017

What needs to be done to get NVPTX working again?

@awygle commented Jun 19, 2017

You can see the patch I did for the MSP430 above - if it's the same problem you basically just have to add sret (structure return) support to the LLVM backend. I don't know anything about NVPTX but I'd bet it's pretty much the same procedure as what I did for the MSP.

@japaric (Member Author) commented Jun 20, 2017

@nathanaeljones Note that you can use the NVPTX backend today with the nightly compiler if you use a fork of the libcore crate that doesn't include 128-bit integers. Instructions here.

@denzp (Contributor) commented Jun 21, 2017

I have some progress on the issue.

First, I tried a hacky workaround: replacing i128 with <2 x i64> in arguments and return values using a custom FunctionPass.

define i128 @internal(i128, i128) unnamed_addr #0 {
  %a = mul i128 %0, %1
  ret i128 %a
}

This can be transformed to:

define <2 x i64> @internal(<2 x i64>, <2 x i64>) unnamed_addr #0 {
  %3 = bitcast <2 x i64> %0 to i128
  %4 = bitcast <2 x i64> %1 to i128
  %5 = mul i128 %4, %3
  %6 = bitcast i128 %5 to <2 x i64>
  ret <2 x i64> %6
}

And compiled to PTX with vanilla LLVM (llc):

.visible .func  (.param .align 16 .b8 func_retval0[16]) internal(
	.param .align 16 .b8 internal_param_0[16],
	.param .align 16 .b8 internal_param_1[16]
)
{
	.reg .b64 	%rd<11>;

	ld.param.u64 	%rd1, [internal_param_0];
	ld.param.u64 	%rd2, [internal_param_0+8];
	ld.param.u64 	%rd3, [internal_param_1+8];
	ld.param.u64 	%rd4, [internal_param_1];
	mul.lo.s64 	%rd5, %rd4, %rd2;
	mul.hi.u64 	%rd6, %rd4, %rd1;
	add.s64 	%rd7, %rd6, %rd5;
	mul.lo.s64 	%rd8, %rd3, %rd1;
	add.s64 	%rd9, %rd7, %rd8;
	mul.lo.s64 	%rd10, %rd4, %rd1;
	st.param.v2.b64	[func_retval0+0], {%rd10, %rd9};
	ret;
}

Then I made a less hacky workaround by patching the LLVM NVPTX backend, so it became possible to compile code with i128 without any additional passes.

target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

define i128 @internal(i128, i128) unnamed_addr #0 {
start:
  %a = mul i128 %0, %1
  ret i128 %a
}

define ptx_kernel void @foo(i128, i128, i128*) unnamed_addr #0 {
start:
  %a = call i128 @internal(i128 %0, i128 %1)
  store i128 %a, i128* %2

  ret void
}

attributes #0 = { norecurse nounwind readnone }

This is compiled by the patched llc to pretty much valid and working PTX assembly ;)

//
// Generated by LLVM NVPTX Back-End
//

.version 3.2
.target sm_20
.address_size 64

	// .globl	internal
                                        // @internal
.visible .func  (.param .align 8 .b8 func_retval0[16]) internal(
	.param .align 8 .b8 internal_param_0[16],
	.param .align 8 .b8 internal_param_1[16]
)
{
	.reg .b64 	%rd<11>;

// BB#0:                                // %start
	ld.param.u64 	%rd1, [internal_param_0+8];
	ld.param.u64 	%rd2, [internal_param_1];
	ld.param.u64 	%rd3, [internal_param_1+8];
	ld.param.u64 	%rd4, [internal_param_0];
	mul.lo.s64 	%rd5, %rd4, %rd3;
	mul.hi.u64 	%rd6, %rd4, %rd2;
	add.s64 	%rd7, %rd6, %rd5;
	mul.lo.s64 	%rd8, %rd1, %rd2;
	add.s64 	%rd9, %rd7, %rd8;
	mul.lo.s64 	%rd10, %rd4, %rd2;
	st.param.b64	[func_retval0+0], %rd10;
	st.param.b64	[func_retval0+8], %rd9;
	ret;
}

	// .globl	foo
.visible .entry foo(
	.param .align 8 .b8 foo_param_0[16],
	.param .align 8 .b8 foo_param_1[16],
	.param .u64 foo_param_2
)                                       // @foo
{
	.reg .b64 	%rd<9>;

// BB#0:                                // %start
	ld.param.u64 	%rd1, [foo_param_0];
	ld.param.u64 	%rd2, [foo_param_0+8];
	ld.param.u64 	%rd3, [foo_param_2];
	cvta.to.global.u64 	%rd4, %rd3;
	ld.param.u64 	%rd5, [foo_param_1];
	ld.param.u64 	%rd6, [foo_param_1+8];
	{ // callseq 0
	.reg .b32 temp_param_reg;
	.param .align 8 .b8 param0[16];
	st.param.b64	[param0+0], %rd1;
	st.param.b64	[param0+8], %rd2;
	.param .align 8 .b8 param1[16];
	st.param.b64	[param1+0], %rd5;
	st.param.b64	[param1+8], %rd6;
	.param .align 8 .b8 retval0[16];
	call.uni (retval0), 
	internal, 
	(
	param0, 
	param1
	);
	ld.param.b64	%rd7, [retval0+0];
	ld.param.b64	%rd8, [retval0+8];
	} // callseq 0
	st.global.u64 	[%rd4+8], %rd8;
	st.global.u64 	[%rd4], %rd7;
	ret;
}

But another i128-related problem appears. When instead of:

%a = mul i128 %0, %1

I write:

%a = sdiv i128 %0, %1

I receive an error:

LLVM ERROR: Cannot select: t34: i64 = ExternalSymbol'__divti3'
In function: internal

And a similar error when I try to compile libcore:

LLVM ERROR: Cannot select: t36: i64 = ExternalSymbol'__muloti4'
In function: _ZN4core3num22_$LT$impl$u20$i128$GT$14from_str_radix17h648079ef1c5aadbdE

Does anybody have any suggestions about these problems? It looks like PTX doesn't have instructions for these operations, and LLVM can't expand them.

@awygle commented Jun 21, 2017

It looks like the expansion from i128 to i64 is going OK, but the i64 sdiv/mulo operations aren't properly implemented. I'm not sure why that might be... it looks like it's trying to take the address of a library call and that isn't working. Possibly the way you implemented the i128 expansion prevented a subsequent i64 expansion from running, but I can't tell without seeing the code.

Run llc with -print-before-all and/or -print-machineinstrs to see where things diverge from what you expect. Look up in your platform ABI what the right action to take for 64-bit divides actually is. Post a link to your changes and someone may be able to take a look at them and see the problem.

@est31 (Member) commented Jun 21, 2017

@denzp

Looks like PTX doesn't have an instructions for the functions, and LLVM can't expand them.

Traditionally, in LLVM, __muloti4 is implemented by the compiler-rt library. However, it's disabled on most platforms. I'm not sure why, but compiler-rt is written in C, so they can probably only support i128 on platforms where they also provide C support, and clang's C support for i128 is currently very limited. Rust, however, does provide the __muloti4 function inside the libcompiler_builtins crate (which is a wrapper for compiler-rt). There is a good chance it works already!
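For reference, a rough Rust model of what that libcall computes: a 128-bit signed multiply that also reports overflow through an out-parameter (this approximates the semantics only, not the real implementation or exact signature):

// Approximate semantics of __muloti4: multiply two i128 values and report
// overflow via an out-parameter (sketch only, not the actual libcall).
fn muloti4(a: i128, b: i128, overflow: &mut i32) -> i128 {
    let (result, overflowed) = a.overflowing_mul(b);
    *overflow = overflowed as i32;
    result
}

fn main() {
    let mut of = 0;
    assert_eq!(muloti4(1i128 << 100, 2, &mut of), 1i128 << 101);
    assert_eq!(of, 0);
    let _ = muloti4(i128::MAX, 2, &mut of);
    assert_eq!(of, 1);
}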

@denzp (Contributor) commented Jun 21, 2017

@awygle
The output before and after each pass looks pretty much as expected. I think the more interesting output would be from -debug-only isel, but I don't really understand it.

Here is the output of the following commands:

llc -O0 -debug-only isel test.ll 2> isel.log  
llc -O0 -print-before-all -print-after-all -print-machineinstrs test.ll 2> passes.log

I tried to keep the changes to LLVM as minimal as possible: diff.

@est31
I'm quite confused about compiler-rt and libcompiler_builtins. Should they be compiled for the target or the host? If we can build at least libcompiler_builtins for NVPTX, can we then just link against it?

It would still require some more changes to LLVM to be able to treat libcalls as normal calls. But then all we would need is to link against libcompiler_builtins. This is blocked by #38787 but might be solved with my experimental linker.

@est31 (Member) commented Jun 21, 2017

@denzp

Should they be compiled for target or host?

For the target. At least it's that way on other platforms. What does LLVM do on 32-bit NVPTX when it encounters a 64-bit operation? On normal targets, this is one of the places where compiler-rt gets used. Or, say, when casting a float to an int and back.

@est31 (Member) commented Jun 21, 2017

@denzp by the way, I haven't congratulated you yet on your progress. Really great work, loving it!

Mark-Simulacrum added the C-bug label Jul 26, 2017
@bheisler commented:

I'd like to see this move forward, so I'll summarize the above. Please correct me if I'm wrong:

  • This issue was fixed for the MSP430 backend by submitting a patch to LLVM.
  • @denzp worked on a patch to do the same for NVPTX, but it was not submitted to LLVM. I imagine that since that was a year ago, the patch will need some work to bring it up to date.

Is this problem still happening with more recent builds? Perhaps it's been fixed upstream in the meantime. If it is still a problem, what would have to be done to fix it?

@denzp (Contributor) commented Aug 15, 2018

Well, my patch related to i128 lowering landed in LLVM about a year ago.

Now we can compile simple examples without any problems:

#![feature(lang_items)]
#![feature(no_core)]
#![crate_type = "lib"]
#![no_core]

#[no_mangle]
pub fn foo() -> i128 {
    1
}

#[lang = "copy"]
trait Copy {}

#[lang = "sized"]
trait Sized {}

with RUST_TARGET_PATH="targets" rustc --target nvptx64-nvidia-cuda test_i128.rs --emit asm

which produces the PTX assembly:

.version 3.2
.target sm_20
.address_size 64

.visible .func  (.param .align 8 .b8 func_retval0[16]) foo()
{
	.reg .b64 	%rd<3>;

	mov.u64 	%rd1, 1;
	st.param.b64 	[func_retval0+0], %rd1;
	mov.u64 	%rd2, 0;
	st.param.b64 	[func_retval0+8], %rd2;
	ret;

}

As far as I'm concerned, the issue can be closed. I was able to compile a kernel from japaric/nvptx with the latest nightly, so the hack with a custom libcore is now obsolete.
