[RFC] Introduce a mid-level IR (MIR) in the compiler that will drive borrowck, trans #1211
cc @rust-lang/compiler |
Sounds nice. However, as written this makes even rvalues be non-SSA - we may want to be smarter on that front. We would want to do at least these optimizations on the MIR, to prevent codegen regressions: |
of it to make quality error messages. | ||
3. This representation should encode drops, panics, and other | ||
scope-dependent items explicitly. | ||
4. This representation does not have to be well-typed Rust, though it |
well-typed Rust? The representation allows for some unsafe operations (e.g. unrestricted downcasts, unchecked indexing, calling unsafe functions) but should type-check.
@arielb1 I expect pure constant expressions to have a single value in the MIR, modulo associated constant projections. |
| [LVALUE...LVALUE] | ||
| CONSTANT | ||
| LEN(LVALUE) // load length from a slice, see section below | ||
| BOX // malloc for builtin box, see section below |
shouldn't this also need the adjustments?
I think what you mean by this is that simple things like |
One thing the current MIR does not make as explicit as it | ||
could is when something is *moved*. For by-value uses of a value, the | ||
code must still consult the type of the value to decide if that is a | ||
move or not. This could be made more explicit in the IR. |
In a world with drop calls explicitly encoded into the MIR, whether something is moved as opposed to copied doesn't matter at all; either the value will be explicitly dropped, or it won't. This is true with either the current embedded drop flags or explicit stack-based drop flags. Or am I missing something?
Eliding `memcpy` calls for moves, perhaps.
In a world with drop calls explicitly encoded into the MIR, whether something is moved as opposed to copied doesn't matter at all; either the value will be explicitly dropped, or it won't. This is true with either the current embedded drop flags or explicit stack-based drop flags. Or am I missing something?
For one thing, the MIR as I've described it thus far is allowed to DROP things that may have been moved. I'm assuming a later pass that determines precisely what needs to be dropped and inserts code to prevent double drops; this will be a type-based, control-flow-sensitive analysis, and hence it makes sense to do it after the MIR is built.
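To illustrate the kind of later pass described here, the following is a minimal sketch and not the RFC's design. It tracks one local along each path into a join point and decides what to do with a `DROP` placed there. The names `DropState`, `Event`, `DropAction`, and `elaborate_drop` are hypothetical, and the model assumes the local starts out initialized on every path.

```rust
/// What is known about one local at a program point (hypothetical toy lattice).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DropState {
    Init,       // definitely initialized: the DROP must run
    Moved,      // definitely moved out: the DROP can be removed
    MaybeMoved, // depends on the path taken: a dynamic flag is needed
}

/// The only events this sketch models for the tracked local.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    MoveOut, // the local was moved from
    Assign,  // the local was (re)initialized
}

/// What the pass does with a DROP at the join point.
#[derive(Debug, PartialEq)]
enum DropAction {
    Unconditional,
    Elide,
    CheckFlag,
}

/// Merge the states arriving from two predecessors.
fn join(a: DropState, b: DropState) -> DropState {
    if a == b { a } else { DropState::MaybeMoved }
}

/// State of the local at the end of one straight-line path,
/// assuming it was initialized at the start of that path.
fn state_after(events: &[Event]) -> DropState {
    events.iter().fold(DropState::Init, |_, e| match e {
        Event::MoveOut => DropState::Moved,
        Event::Assign => DropState::Init,
    })
}

/// Decide how to lower a DROP reached from the given predecessor paths.
fn elaborate_drop(predecessors: &[Vec<Event>]) -> DropAction {
    let state = predecessors
        .iter()
        .map(|p| state_after(p))
        .reduce(join)
        .unwrap_or(DropState::Init);
    match state {
        DropState::Init => DropAction::Unconditional,
        DropState::Moved => DropAction::Elide,
        DropState::MaybeMoved => DropAction::CheckFlag,
    }
}
```

A value moved on only one arm of an `if` arrives at the join as `MaybeMoved`, which is the case where a drop flag (embedded or stack-based) is still needed. A real pass would run this as a fixed-point dataflow analysis over the whole control-flow graph rather than over a single join.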
@eddyb Right. If we explicitly encode moves, a pass could look for copies where the source is not used anymore after the copy and turn that into a move. LLVM doesn't do that for calls, most likely because it sees the address as significant when you pass a pointer to a function.
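The copy-to-move rewrite suggested above can be sketched on a toy straight-line block. This is an illustration under stated assumptions, not rustc code: `Operand`, `Assign`, and `copies_to_moves` are hypothetical names, locals are plain indices, and the sketch ignores borrows, so it assumes no local has its address taken.

```rust
use std::collections::HashSet;

/// How a statement reads its source local (hypothetical toy type).
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    Copy(usize),
    Move(usize),
}

/// A straight-line statement `dest = operand`.
#[derive(Debug, Clone, PartialEq)]
struct Assign {
    dest: usize,
    src: Operand,
}

/// Turn `Copy(x)` into `Move(x)` wherever `x` is not read again afterwards.
/// `live_out` lists the locals that are still read after the block ends.
fn copies_to_moves(block: &mut [Assign], live_out: &HashSet<usize>) {
    // Backward liveness: `live` holds the locals read after the current statement.
    let mut live = live_out.clone();
    for stmt in block.iter_mut().rev() {
        // Writing `dest` ends the liveness of its previous value.
        live.remove(&stmt.dest);
        let src = match stmt.src {
            Operand::Copy(l) | Operand::Move(l) => l,
        };
        if let Operand::Copy(l) = stmt.src {
            if !live.contains(&l) {
                // No later read of the source: the copy can become a move.
                stmt.src = Operand::Move(l);
            }
        }
        // The source is read here, so it is live before this statement.
        live.insert(src);
    }
}
```

For the block `_1 = copy _0; _2 = copy _0; _3 = copy _1` with only `_3` live afterwards, the first statement stays a copy because `_0` is read again, while the other two become moves. A code generator could then skip the `memcpy` for those two.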
Looks good to me. SSA probably isn't worth it at this level: it's brilliant for lower-level optimisations, but it's somewhat more complex to build, and whatever we want to do can probably be handled with dataflow analysis and similar. As this is an internal thing, I'm not too bothered as long as we get something in this direction. The details can be changed later. I think that unsafe blocks are inappropriate for an MIR. I also think the property that |
to figure out on its own how to do unwinding at that point. Because | ||
the MIR doesn't "desugar" fat pointers, we include a special rvalue | ||
`LEN` that extracts the length from an array value whose type matches | ||
`[T]` or `[T;n]` (in the latter case, it yields a constant). Using |
Allowing LEN on fixed-size arrays seems like it just pointlessly complicates the MIR at the expense of possibly making it slightly easier to construct.
I don't think it complicates the MIR at all. If anything it's simpler as it doesn't require an extra rule for fixed-size arrays. Also, we still need to bounds-check fixed-size arrays, so this would have to be a separate path for them for no obvious reason.
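A small sketch of the argument above: if `LEN` accepts both `[T]` and `[T; n]`, the bounds check needs only one code path. The types and functions below (`ArrayTy`, `Len`, `lower_len`, `bounds_check`) are hypothetical stand-ins, not the RFC's or rustc's definitions.

```rust
/// Toy model of the types `LEN` accepts (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrayTy {
    /// `[T; n]`: the length is part of the type.
    Fixed(usize),
    /// `[T]`: the length is stored in the fat pointer at run time.
    Slice,
}

/// What `LEN(lvalue)` lowers to for each kind of type.
#[derive(Debug, PartialEq)]
enum Len {
    Constant(usize),
    LoadFromFatPointer,
}

/// Lower `LEN` without a separate rule in the MIR itself:
/// the fixed-size case simply folds to a constant.
fn lower_len(ty: ArrayTy) -> Len {
    match ty {
        ArrayTy::Fixed(n) => Len::Constant(n),
        ArrayTy::Slice => Len::LoadFromFatPointer,
    }
}

/// The single bounds check shared by arrays and slices.
/// `len` is whatever `LEN` evaluated to, constant or loaded.
fn bounds_check(index: usize, len: usize) -> Result<usize, String> {
    if index < len {
        Ok(index)
    } else {
        Err(format!("index {} out of bounds for length {}", index, len))
    }
}
```

The builder emits `LEN` followed by the same check for either type, and only the lowering of `LEN` differs.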
Have you thought about how serialization for MIR will work? |
its contents (it is not yet initialized). | ||
|
||
Note that having this kind of builtin box code is a legacy thing. The | ||
more generalized protocol that [RFC 809][809] specifies works in |
I think you meant to link to RFC 809 here.
I'm all for giving this a try. The direction of the proposed design makes sense to me. Many of the details will become more clear when working on the implementation. Only the Regarding debuginfo, the things that come to mind in the context of the MIR are source locations, scope information, and memory locations of local variables/arguments. Source Locations Scope Information Local Variables and Arguments |
The main question was whether it'd be worth CHECKING that property (that is, checking what operations are disallowed) on MIR. The reason to consider doing that is that it would be easier, since all derefs and calls are made very explicit. |
Not deeply. I don't foresee any particular difficulties. It should be much easier than serializing the AST, since there are no side-tables to be concerned with, and all internal links are, well, internal. That said, I'd like to define a canonical textual format for testing purposes (in an ideal world, we'd be able to supply MIR inputs directly to the compiler so we can skip early stages of the pipeline when testing). |
The existence of the "ReturnValue" lvalue allows us to do RVO, modulo the next bullet.
So, the main problem here that I see is aggregates. That is, if you have
it gets converted into:
which is obviously not what trans would produce. However, there are a lot of advantages to starting out with this form. But after safety checks are done, as I describe in the RFC, it is pretty easy to convert this to:
I'm assuming this would run after safety analyses but also after drops are rewritten to be more minimal, since I think there are some cases where you might wind up with double-frees if you're not careful.
This is why I separated out constants into their own thing. We can simplify constants and also rewrite MIR expressions as we choose.
This can conceivably be expressed by rewriting to reference the original lvalue. (Overall, I'm not sure how much optimization it makes sense to do on the MIR vs leaving it to LLVM -- we'll have to work out that trade-off. Certainly though we've found that doing optimizations in trans can be quite helpful for execution and compilation time so it's easy to see that the same will be true of the MIR. And I'm trying to think beyond LLVM as well, in which case doing more in the MIR would be helpful for portability -- especially Rust-specific things that would require custom LLVM passes or code anyway.) |
That would convert into something like
Your last example wasn't valid IR ( |
I was assuming that past a certain point we would enforce looser restrictions on what's valid or invalid. |
This looks very clean and a lot easier to work with. I'm definitely in favor. I had a few thoughts about the kind of desugaring we might want to do at this level and how it would interact with region and borrow checking:

Closures
Closures seem like a natural candidate for desugaring, since they are nearly equivalent to an anonymous struct with a trait impl. One subtlety is that assignment to a non-

CPS and friends
If we ever want Rust to support generators, async/await, coroutines, etc., this seems like the right place to do it. I've played around with writing a CPS transformation with pure macro rules and found several constructs that would be sound when doing region/borrow analysis in direct style but are not expressible in safe Rust after translation. Doing it at the MIR level after performing region/borrow checking would solve the issue nicely. On the other hand, the transformation also introduces trait bounds (e.g.

Lints
Do we allow pluggable lints at this level? It seems like some of the ones used by Servo (e.g. checking that GC roots are used properly) would need to operate on the MIR. Maybe I'm wrong and the HIR is enough. |
Regarding upvars, the MIR actually has a richer type system than the source language, and it includes

Regarding CPS transform, I agree this is the place to do it, and we'll have to do some work to produce good error messages. I think we'll gain some more experience in that regard with mapping closures etc (we've made some progress, but we definitely produce some suboptimal error messages in borrowck today, such as those that talk about "borrowing" when the borrow is implicit in the syntax today).

Regarding lints, I think it might make sense for some of them to operate on the MIR, but that's a long way off. @brson has also expressed interest in being able to write front-ends that generate MIR directly. So it seems plausible to me that we might sometime want to standardize a lowered Rust representation that can be consumed externally. |
Hear ye, hear ye. This RFC is entering final comment period. |
@bkoropoff The special unification logic in trait selection stuff is independent of the mir (which doesn't really touch on trait selection), but you're right I was forgetting about the rules to prevent assignments to moved upvars. What a pain. I should have pushed harder for mutpocalypse. :) In any case, to actually model that properly does require just a bit more extension of the type system: basically marking fields that cannot be directly assigned, even when reached uniquely (I've thought about proposing something similar from time to time -- obviously now it'd have to be more of a lint). As you say, not a big deal, but you're right that it has to be handled. |
@nikomatsakis Would trait selection still occur before desugaring closures to a struct + impl? I guess that's fine then, I was just hoping we could eliminate as much special case handling as possible. |
A big part of the complications closures bring is the type-system issues, and most of these (e.g. Assignments to non- |
On Fri, Jul 24, 2015 at 06:47:58PM -0700, Brian Koropoff wrote:
Yes, trait selection still occurs before desugaring. Trait selection |
Don't we already have a type IR? |
@arielb1 I'll try to write up what I'm talking about :) pretty orthogonal to this proposal. |
I'd suggest using some kind of standard format as a text representation - either a subset of Rust itself, or LES. That way nobody has to go to the trouble of designing a new syntax. |
I like this a lot! From a formal verification standpoint, this language is much better suited than the original AST. Fewer constructs, and more things explicit, it's almost like what I dreamed of ;-) Now, from a purely practical perspective, there's one thing I do not understand: What is the relationship to the recently accepted HIR? I'm surprised that the only relationship mentioned is that the |
@RalfJung It's possible that everything outside of function bodies may be kept around in HIR form. |
So what's the reason not to compile the AST directly to MIR+tables? Is there anything interesting happening on that intermediate stage? (I'm not trying to suggest that HIR has no place in this world; I'm just trying to figure out the reason behind your design decisions here.) |
The HIR is supposed to abstract over macros and name resolution. The new process should be:
Type checking is a big enough step to deserve its own IR. |
Thanks, that's a pretty simple but thorough list for someone who's curious but finds rustc development completely opaque. Similarly, I was hoping you could expand on it a bit. I've heard in the past that one way to improve code-gen speed is for rustc to optimize the amount of LLVM IR it creates. I'm not at all suggesting it happen in this implementation, but this seems like a great refactor to help with that, so I'm sure you guys are keeping it in mind. I'm just kinda curious where these IR reductions could/would take place in your list above, or whether it will be more piecemeal and happen in small increments at every level as appropriate. |
It's official. The compiler subteam has decided to accept this RFC. (As of this writing, there are a few missing votes, but @Aatch has expressed support in thread, and @pnkfelix has expressed support in person.) |
This proposal describes a mid-level IR that I believe we should use in the compiler. This is purely an implementation detail and should not affect the language, though it may make many language extensions and analyses easier to implement; the most notable of these is non-lexical lifetimes.
Rendered