Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Look into ways get_function_address could safely return a function #5

Closed
TheDan64 opened this issue Jul 3, 2017 · 16 comments
Closed

Comments

@TheDan64
Copy link
Owner

TheDan64 commented Jul 3, 2017

Currently it returns an address value pointer, which is in itself safe - but to actually use it, the user needs to unsafely transmute it into a rust extern "C" function. I'm wondering if this could be accomplished under the hood by "deserializing" the function's FunctionType (which has all of the needed type info) into rust types. Maybe writing a Serde plugin could help when it comes to complex types like StructType & PointerType.

Will probably need to account for edge cases - is VoidType deserializable? If so, would that just be () and is that ever useful? Is *() / &() valid in rust? Etc

@TheDan64 TheDan64 self-assigned this Jul 3, 2017
@TheDan64 TheDan64 added this to the 0.2.0 milestone Jul 3, 2017
@TheDan64 TheDan64 removed their assignment Jul 3, 2017
@TheDan64
Copy link
Owner Author

TheDan64 commented Jul 16, 2017

I'm starting to think that this might not be possible, and possible as a safe wrapper:

  1. While FunctionType may have all the required info such as return type, argument types, etc. It would only be accessible at runtime, so generation of a function signature at runtime is not possible I believe (we can't return a function generated at runtime)
  2. Even if the above wasn't a problem, and we could provide a safe wrapper for casting the address to an actual function, that function is very probably unsafe. Rust is not able to provide any guarantees about that function generated by LLVM. And indeed, it's very easy to shoot yourself in the foot.

So, this may just be an area where unsafety is required on the user's part 😢.

One slightly safer approach (that doesn't solve this problem but is a related idea) might be to return a non-copy address wrapper reference &MemoryAddress so that the underlying copy-able usize address isn't as easily abused and passed out of the scope/lifetime of the execution engine, but can still be transmuted to a function signature.

@remexre
Copy link

remexre commented Aug 7, 2017

I would just have a separate get_function_handle::<Args, Ret>() method that returns a Option<&Fn<Args, Output=Ret>>, doing the check at runtime. This definitely requires nightly though, and would probably be frowned upon (the Fn<Args, Output=Ret> bit is unstable)

@TheDan64
Copy link
Owner Author

TheDan64 commented Aug 7, 2017

Oh, neat, thanks! I didn't know something like this existed. I'll have to look into it more. What's the name for this feature? I could probably just have an experimental crate feature for stuff like this, so that the whole crate doesn't have to move to nightly.

@TheDan64
Copy link
Owner Author

TheDan64 commented Aug 7, 2017

Also, does this let you specify that the generated Fn is unsafe somehow?

@remexre
Copy link

remexre commented Aug 7, 2017

What's the name for this feature?

I think unboxed_closures; at least, that's the feature name for it.

Also, does this let you specify that the generated Fn is unsafe somehow?

I don't think so, because the Fn trait doesn't have the function marked as unsafe. Maybe make the get_function_handle method unsafe? (Although I agree, that's not 100% satisfying.)

@TheDan64
Copy link
Owner Author

TheDan64 commented Aug 7, 2017 via email

@Michael-F-Bryan
Copy link
Contributor

In the tutorial I'm working on I noticed how awkward it is to use the execution engine's get_function_address() method. Besides the fact that it should return a usize because we're using pointers, I'd expect the lifetime of the returned symbol to be tied to the execution engine.

The libloading crate does this really well when you are loading function pointers out of a dynamic library. This will always be an innately unsafe operation because there's no way to statically prove a function pointer generated at runtime has the correct signature, but by using Rust lifetimes and returning something like Symbol<'ee, T> where we use the T to infer what signature to transmute the function pointer to, we should be able to make this easier to use and reduce the foot guns.

In general, I feel like this crate can benefit a lot from using lifetimes more. I've already encountered a couple segfaults because I didn't fully understand inkwell/LLVM's ownership model and because it's not encoded in the typesystem there's no way for the compiler to help prevent these issues.

@TheDan64
Copy link
Owner Author

Good point, it maybe should be a usize.

The method itself should not be unsafe to call, but the result should be unsafe to manipulate, I think (but could be wrong). I've also thought about using the typesystem and lifetimes here to tie it to the EE somehow and my most recent idea was to return a &FunctionAddress which is just a thin wrapper tied to the EE similarly to your example. But yeah, thanks for the information, it's certainly a good resource.

I do want to get more issues solved via the type system, but I'm holding that off until the second iteration/version(v0.2.0) of inkwell(I've jotted down ideas here: #8) because it's a lot of additional work and I just want to focus on core functionality being complete first across multiple LLVM versions so that we can get an initial version out the door. And yeah, LLVM's ownership model is very confusing, and I've just been piecing it together myself as I go

@CAD97
Copy link

CAD97 commented Apr 1, 2018

(I saw this a while back when looking around at LLVM in Rust, so this has been sitting in my mind for a while. Then I had an idea, so) Just to spitball the idea:

Maybe you could use Frunk's HList to allow a nice safe interface to the (unsafe) function.

My first idea was that you could expose an interface that gave back a LLVMJitFn<...> where the interface for unsafe fn sum(_: i32, _: i32) -> i32 would be something like llvm_sum.arg(5).arg(5).call(), where each step effectively built the call, typechecked with the information used to build the fn in the first place.

Then I remembered seeing the blog posts about Frunk's HList. Above I've linked the actual blog post, but an HList is basically a type-level cons list, so you can have types like (i32, (i32, ()) and talk about them in a somewhat sane manner.

So, using unsafe fn sum(_: i32, _: i32) -> i32 as an example, the idea is that you'd provide a get_function call on the fully constructed JIT fn that returns some LLVMJitFn<HList![i32, i32], i32>, which would then provide the unsafe fn LLVMJitFn::invoke(arguments: HList![i32, i32]) -> i32.

Maybe if in the future the Fn trait stabilizes in some fashion and supports being tagged as unsafe to call, you could use that. But I think using HList you can "reinvent" the Fn trait in a not-too-cumbersome manner.

@Michael-F-Bryan
Copy link
Contributor

@CAD97 I like the idea of using something like a cons list for passing in arguments, but there may be issues trying to shoehorn a LLVM function signature (an instance of FunctionType) into something at the type level.

The big problem I see is you can't guarantee a function signature at compile time, because LLVM doesn't even generate the function until runtime. So about the best you can do is either provide some sort of builder type which will do runtime checks to validate the signature (llvm_sum.arg(5).arg(5).call()) or use libloading's get() which is essentially a smart transmute which uses a wrapper type to ensure the function symbol doesn't outlive where it came from.

Either way, I don't think it's sound to try and remove the unsafe. We're essentially plucking a random function pointer out of the JIT executor and then running it, there's no guarantee the pointer is valid or that it doesn't do unsafe stuff internally. We require people to use unsafe when calling into C for exactly the same reason.

@71
Copy link
Contributor

71 commented Apr 1, 2018

I also think it should stay unsafe. Using a wrapper type or fn(...) -> ... always requires an assumption to be made at compile-time. The only way I see of making this safe is by building the HList when creating the function (ie fn_type(return_type).arg<i32>(i32type).arg<i64>(i64type)), but it would get messy quickly. Plus, it would require Any and a lot of runtime reflection-fu to ensure the argument given to arg corresponds to the Rust type.

@Michael-F-Bryan
Copy link
Contributor

@6a I've made a PR which replaces the old get_function_address() with a get_function() which returns a Symbol<F>, allowing people to call the function (via a Deref impl) and it also holds a Rc<LLVMExecutionEngineRef> so the symbol doesn't accidentally outlive its execution engine.

Under the hood it's essentially just the old get_function_address() with an additional pointer cast at the end.

I don't know if you can constrain F to be just a function pointer, so if anyone has any ideas, feel free to let me know.

@71
Copy link
Contributor

71 commented Apr 1, 2018

Looks like a good compromise to me, but it essentially depends on @TheDan64's opinion. To my knowledge, there aren't any fn constraints, no; however I do think it is statically possible to check that size_of<F>() == size_of<uint>().

@Michael-F-Bryan
Copy link
Contributor

I tried using the static_assertions crate/hack and unfortunately it doesn't look like we'll be able to use it here without proper static asserts and all the related const fn machinery 😞

It seems like F is almost too generic!

error[E0512]: transmute called with types of different sizes
   --> src/execution_engine.rs:230:9
    |
230 |         assert_eq_size!(F, usize); // `F` must be the size of a pointer
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: source type: F (this type's size can vary)
    = note: target type: usize (64 bits)
    = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)

@TheDan64
Copy link
Owner Author

TheDan64 commented Apr 1, 2018

This doesn't seem unreasonable but one problem I have is that deref is a safe call, when in reality the call is unsafe... Or do you still get an unsafe function from deref? Meant to write this on the PR 😂

I wonder if we could use the nightly call from https://doc.rust-lang.org/nightly/std/ops/trait.Fn.html#required-methods ? Seems to be runtime function-ish

@Michael-F-Bryan
Copy link
Contributor

@TheDan64 given #36 is merged, would we say this issue is resolved? You still need to use a bit of unsafe, but grabbing a function pointer out of a JIT compiler's memory is always going to be an unsafe operation. At least #36 makes sure you can only get unsafe extern "C" functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants