Garbage collection #415

Closed
glaebhoerl opened this issue Oct 25, 2014 · 15 comments
Labels
A-allocation: Proposals relating to allocation.
A-machine: Proposals relating to Rust's abstract machine.
T-lang: Relevant to the language team, which will review and decide on the RFC.
T-libs-api: Relevant to the library API team, which will review and decide on the RFC.

Comments

@glaebhoerl
Contributor

We want to add support for garbage collection at some point.

We had a really long discussion about this back on the rust repository here. It also implicates the design for allocators.

My own belief is that the best plan would be precise tracing piggybacked off the existing trait and trait object system, i.e. compiler-derived trace routines (Trace impls) for each type, as outlined in my comment here. This would likely be very performant and avoid the need for any kind of headers on allocations, except for existentials (trait objects), which could/would have a Trace vtable pointer similarly to how Drop is currently done, i.e. this would also "just fall out" of the trait-based mechanism. By avoiding headers, we could also avoid imposing any costs on code which doesn't use GC.
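A minimal sketch of precise tracing driven by per-type trace routines, assuming a toy collector: `Gc` is a hypothetical handle (an index into an arena rather than a real pointer), the `Trace` impls are written by hand in place of compiler-derived ones, and the heap holds a single object type so the mark loop knows statically which `trace` to call. No object carries a header; in the proposal, only type-erased values (trait objects) would need to find their trace routine through a vtable.

```rust
use std::collections::HashSet;

/// Hypothetical handle to a managed object: an arena index, not a pointer.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Gc(usize);

/// Per-type trace routine: report every `Gc` directly reachable from `self`.
/// The proposal has the compiler derive these; here they are hand-written.
trait Trace {
    fn trace(&self, out: &mut Vec<Gc>);
}

// Types that cannot contain managed data trace nothing.
impl Trace for i32 {
    fn trace(&self, _out: &mut Vec<Gc>) {}
}

impl Trace for Gc {
    fn trace(&self, out: &mut Vec<Gc>) {
        out.push(*self);
    }
}

impl<T: Trace> Trace for Option<T> {
    fn trace(&self, out: &mut Vec<Gc>) {
        if let Some(x) = self {
            x.trace(out);
        }
    }
}

/// A user type; a derived impl would simply trace each field in turn.
struct Node {
    value: i32,
    next: Option<Gc>,
}

impl Trace for Node {
    fn trace(&self, out: &mut Vec<Gc>) {
        self.value.trace(out);
        self.next.trace(out);
    }
}

/// Mark phase: from the roots, follow the statically known `Trace` impl of
/// the object type. Objects carry no header describing their layout.
fn mark(heap: &[Node], roots: &[Gc]) -> HashSet<Gc> {
    let mut marked = HashSet::new();
    let mut work: Vec<Gc> = roots.to_vec();
    while let Some(gc) = work.pop() {
        if marked.insert(gc) {
            heap[gc.0].trace(&mut work);
        }
    }
    marked
}

fn main() {
    // 0 -> 1 -> 2 are reachable from the root; 3 is garbage.
    let heap = vec![
        Node { value: 10, next: Some(Gc(1)) },
        Node { value: 20, next: Some(Gc(2)) },
        Node { value: 30, next: None },
        Node { value: 40, next: None },
    ];
    let mut live: Vec<usize> = mark(&heap, &[Gc(0)]).into_iter().map(|g| g.0).collect();
    live.sort();
    println!("{:?}", live); // [0, 1, 2]
}
```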

(I am also not sure that we need to involve LLVM in any way, at least in the first round. My suspicion is that via the borrow checker and the type system (at least once we have static drops), we already have more information than would LLVM. Instead of stack maps, at least in the first iteration, in GC-using code we could have the compiler insert calls to register/unregister stack variables which may potentially contain managed data with the GC, based on borrow checker information.)
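A sketch of root registration without stack maps, assuming a hypothetical per-thread shadow stack (`ROOTS`) and hand-written `register_root`/`unregister_root` calls standing in for the ones the compiler would insert based on borrow checker information. It registers handle values rather than the addresses of stack slots, which only works for a non-moving collector; `allocate_somewhere` is a placeholder for a GC allocation.

```rust
use std::cell::RefCell;

/// Hypothetical managed handle.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Gc(usize);

thread_local! {
    // Hypothetical shadow stack of roots for the current thread.
    static ROOTS: RefCell<Vec<Gc>> = RefCell::new(Vec::new());
}

fn register_root(gc: Gc) {
    ROOTS.with(|r| r.borrow_mut().push(gc));
}

/// Remove the most recently registered root; scopes nest, so LIFO suffices.
fn unregister_root() {
    ROOTS.with(|r| {
        r.borrow_mut().pop();
    });
}

/// What a collection starting now would scan.
fn current_roots() -> Vec<Gc> {
    ROOTS.with(|r| r.borrow().clone())
}

/// Placeholder for allocating a managed object.
fn allocate_somewhere() -> Gc {
    Gc(7)
}

fn user_code() -> Vec<Gc> {
    let a = allocate_somewhere();
    register_root(a); // inserted: `a` may hold managed data
    let seen = current_roots(); // a collection here would see `a`
    unregister_root(); // inserted: `a` goes out of scope
    seen
}

fn main() {
    println!("{:?}", user_code()); // [Gc(7)]
    println!("{:?}", current_roots()); // []
}
```

Code that never allocates managed data never calls these functions, which is the "no cost to non-users" property the comment is after.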

@thestinger

I don't think forcing libraries to worry about tracing is worth it. It will add a significant amount of complexity, and with that come new memory safety issues. The need to add overhead to trait objects is unacceptable, as is forcing more bloat into every crate. Features that impose a cost whether or not you use them are not a good fit with the language. Rust has been steadily dropping features like segmented stacks and green threads that don't adhere to pay-for-what-you-use.

@thestinger

Niche features with a performance cost should be opt-in at compile-time and anyone who wants it can build a new set of standard libraries with it enabled. I still don't think the complexity would be worth it even in that scenario. I'm strongly against adding any form of tracing to the language / libraries and I intend to build a lot of community resistance against these costly, complex features. If it ends up being added, then it's going to be more great ammunition for a fork of the language.

@glaebhoerl
Contributor Author

@thestinger In either case it would be possible to avoid any kind of overhead from garbage collection support for code that doesn't want it (at least how I would do things; can't speak for others). That was actually one of my foremost priorities. Then it mainly boils down to the question of opt-in vs. opt-out. (But even in the opt-out case, it would be possible to opt out.) Basically in one universe, garbage collection support is provided by default and you write:

struct MyStruct<T: NoManaged> { ... }

and

Box<MyTrait+NoManaged>

to disallow the given types from containing managed data, and thereby avoid any overhead from tracing support (including having to consider the possibility in unsafe code). In the other universe, NoManaged is default and you write:

#[deriving(Trace)]
struct MyStruct<T> { ... }

and

Box<MyTrait+Trace>

to enable tracing support, and thereby allow storing managed data.
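Under the opt-in model, a sketch of what `#[deriving(Trace)]` might expand to, assuming a hypothetical `Trace` trait and `Gc` handle (modern Rust spells the attribute `#[derive]`). The generated impl carries a `T: Trace` bound, so `MyStruct<T>` is traceable only when its contents are, while generic code with no `Trace` bound compiles exactly as before.

```rust
use std::marker::PhantomData;

/// Hypothetical opt-in tracing trait.
trait Trace {
    fn trace(&self, out: &mut Vec<usize>);
}

/// Hypothetical managed pointer to a `T`; `id` stands in for an address.
struct Gc<T> {
    id: usize,
    _ty: PhantomData<T>,
}

impl<T> Gc<T> {
    fn new(id: usize) -> Self {
        Gc { id, _ty: PhantomData }
    }
}

impl<T> Trace for Gc<T> {
    fn trace(&self, out: &mut Vec<usize>) {
        out.push(self.id);
    }
}

impl Trace for u32 {
    fn trace(&self, _out: &mut Vec<usize>) {}
}

struct MyStruct<T> {
    count: u32,
    items: Vec<T>,
}

// Roughly the expansion of `#[deriving(Trace)]`: the impl exists only
// when every field type is itself `Trace`.
impl<T: Trace> Trace for MyStruct<T> {
    fn trace(&self, out: &mut Vec<usize>) {
        self.count.trace(out);
        for item in &self.items {
            item.trace(out);
        }
    }
}

/// Only opted-in types can be handed to the (hypothetical) collector.
fn managed_pointers<T: Trace>(value: &T) -> Vec<usize> {
    let mut out = Vec::new();
    value.trace(&mut out);
    out
}

/// Ordinary generic code has no `Trace` bound and is unaffected.
fn len<T>(s: &MyStruct<T>) -> usize {
    s.items.len()
}

fn main() {
    let traced = MyStruct { count: 2, items: vec![Gc::<u32>::new(1), Gc::new(2)] };
    println!("{:?}", managed_pointers(&traced)); // [1, 2]

    // `String` has no `Trace` impl, so `managed_pointers(&plain)` would not
    // compile, but non-GC code such as `len` works as before.
    let plain = MyStruct { count: 1, items: vec![String::from("x")] };
    println!("{}", len(&plain)); // 1
}
```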

Obviously you would prefer the latter. (I don't personally have a preference yet.) But once the infrastructure is in place (which is the same in either case), there would be lots of room to figure out the best way to expose it, and plenty of time to litigate the opt-in vs. opt-out debate.

(Again, I'm speaking only for myself here and have no idea what anybody else, not least the core team, wants to do.)

@thestinger

If the standard libraries support it, then it imposes overhead on everyone.

@thestinger

Simply outputting the metadata by default slows down compiles and results in more bloated binaries. If it's not opt-in via a compiler switch, then you're forcing costs on everyone. Either way, it forces a huge amount of complexity on the standard libraries because they need to cope with tracing. It will decrease the quality of the code for the common case where the niche feature isn't used.

@Ericson2314
Contributor

@thestinger, if it's opt-in (which it probably should be), the compile-time overhead when you don't use it should be no more than that of any other unused trait with many impls. The runtime overhead should be none whatsoever.

By "should be" I mean something that I feel is a mandatory goal shared by just about everyone interested, and an attainable goal too.

@thestinger

@Ericson2314: That's not at all true, as I explained above.

@Ericson2314
Contributor

@thestinger I have read everything you wrote, and I am not convinced.

  • "Simply outputting the metadata by default slows down compiles". Sure, but the deriving(trace) would be comparable to any other normal trait deriving. Since the trait is opt-in (as it is in my ideal scenario), quantifying over some arbitrary type does NOT add an implicit Trace bound, and thus you write your code just like today.

The standard library need not support GC types from the get-go. It seems reasonable to try to nail down the GC abstractions first, and then merge them into the standard library.

The problem of making a lot more functions generic occurs ONLY when the abstractions are used pervasively in the standard library. This problem is also triggered by making those functions allocator-agnostic without GC. My solution is to speculatively compile generic functions instantiated with their defaults in rlibs. This will mean that if your program uses jemalloc and no GC (the default args), compile times would be similar to today.

@Ericson2314
Contributor

@glaebhoerl With the dynamic registering of stack variables as you propose (which, because a pointer is registered, I think will prevent the variables from going in registers), I'm hopeful that a rough prototype could be made without any rustc or llvm support. Do you agree?

@thestinger

"I have read everything you wrote, and I am not convinced."

Yeah, that's how confirmation bias works.

"Sure, but the deriving(trace) would be comparable to any other normal trait deriving. Since the trait is opt-in (as it is in my ideal scenario), quantifying over some arbitrary type does NOT add an implicit Trace bound, and thus you write your code just like today."

So you didn't actually read my comments, because you're ignoring the problems with trait objects. You're also not countering the point about the increase in metadata at all.

"The standard library need not support GC types from the get-go. It seems reasonable to try to nail down the GC abstractions first, and then merge them into the standard library."

If the standard library ever supports garbage collection, it will add unacceptable overhead in terms of metadata and bloat.

"This problem is also triggered by making those functions allocator-agnostic without GC. My solution is to speculatively compile generic functions instantiated with their defaults in rlibs."

You're drawing a false equivalence here. Allocator support on collections would not result in bloated metadata, bloated code or slower compile times. It would be a pay-for-what-you-use feature, as it would only generate extra code for custom allocators. I don't see how speculative compilation is a good idea, considering that types like collections need to be instantiated for each set of type parameters. Since nearly all of the code is supposed to be inlined, there's very little that can actually be reused in any case.

"compile times would be similar to today."

No, adding metadata will significantly slow down compile times.

@thestinger

It's amusing that people are unable to have an honest debate about this. I've had productive debates about it with @pnkfelix and he never felt the need to deny that there are costs to supporting tracing.

The only way of completely avoiding a runtime / code size cost is making it a compile-time option and not building any of the standard libraries with it enabled by default. It will still introduce a significant amount of complexity into the standard libraries and get in the way of implementing optimizations. The compile-time switch would result in there being 4 dialects of Rust to test and support (tracing is one bit of diversity, unwinding is another, and surely there will be more proposals for costly, complex niche features).

@glaebhoerl
Contributor Author

@Ericson2314

"With the dynamic registering of stack variables as you propose (which, because a pointer is registered, I think will prevent the variables from going in registers), I'm hopeful that a rough prototype could be made without any rustc or llvm support. Do you agree?"

Doing it without rustc support seems like a tall order, but maybe at the "rough prototype" level something might be possible (after all, the Servo folks already did something vaguely similar). @huonw also had a prototype back at the discussion in the other repository. But yes, although I'm not a GC expert, unless I'm missing something, avoiding having to rely on LLVM seems like it should be possible (and probably advisable, at least in the short term).

@Ericson2314
Contributor

@glaebhoerl I think it would be an interesting thing to make, if for nothing else to demonstrate that at least tracing can be done without any cost to non-users.

@thestinger If you find this conversation unproductive, I am sorry. I value your insistence on features not costing non-users. If the bloat imposed by GC is as unavoidable and significant as you claim it is, then I will agree with you that GC shouldn't be added. I don't mean to be deceptive. If @pnkfelix admits there will be some cost, perhaps you both are aware of something I am missing.


"So you didn't actually read my comments, because you're ignoring the problems with trait objects."

Just to be sure, I searched for "trait object" and I got your sentence:

"The need to add overhead to trait objects is unacceptable, as is forcing more bloat into every crate."

and @glaebhoerl 's sentence:

"This would likely be very performant and avoid the need for any kind of headers on allocations, except for existentials (trait objects), which could/would have a Trace vtable pointer similarly to how Drop is currently done, i.e. this would also 'just fall out' of the trait-based mechanism."

The bloat you are referencing, I assume, is the extra trace method in every vtable, and to be clear I consider that bloat too. My previous understanding, which is what I thought @glaebhoerl followed up with, was that this was due to trace being opt-out in his original comment. If we make it opt-in, then while Box<MyTrait+Trace> has the extra method, Box<MyTrait> doesn't. In the opt-in scenario, Box<MyTrait> therefore has no bloat.
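A sketch of the vtable point, assuming hypothetical `Trace` and `Shape` traits. Current Rust only allows auto traits as additional bounds on a trait object, so `dyn Shape + Trace` is rejected; a combined `TracedShape` trait stands in for the `Box<MyTrait+Trace>` spelling. Only that trait's vtable gets a `trace` entry, and `Box<dyn Shape>` is unchanged.

```rust
/// Hypothetical tracing trait.
trait Trace {
    fn trace(&self, out: &mut Vec<usize>);
}

trait Shape {
    fn area(&self) -> f64;
}

/// Stand-in for `Box<MyTrait+Trace>`: one trait object type whose vtable
/// contains both `Shape`'s methods and `trace`.
trait TracedShape: Shape + Trace {}
impl<T: Shape + Trace> TracedShape for T {}

struct Square {
    side: f64,
    handle: usize, // pretend managed pointer
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

impl Trace for Square {
    fn trace(&self, out: &mut Vec<usize>) {
        out.push(self.handle);
    }
}

fn main() {
    // Vtable: drop glue, size, alignment and `area`; no `trace` entry.
    let plain: Box<dyn Shape> = Box::new(Square { side: 2.0, handle: 1 });
    // Vtable additionally holds `trace`, so a collector could scan it.
    let traced: Box<dyn TracedShape> = Box::new(Square { side: 3.0, handle: 5 });

    let mut roots = Vec::new();
    traced.trace(&mut roots);
    println!("{} {} {:?}", plain.area(), traced.area(), roots);
}
```

Both boxes are the same two-word fat pointer; the difference lives entirely in the vtable that each concrete coercion generates.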


"You're also not countering the point about the increase in metadata at all."

Again, what metadata? I absolutely agree stack maps are extra metadata that would clutter up the rlibs. But in @glaebhoerl's proposal for the first iteration, there are no stack maps. Either the registering of roots would be explicit, or it would happen one-to-one with the explicit calls to create or clone a GC root pointer, so it would be the next best thing. Both options make their costs very explicit and would seem not to impact those who don't use GC.
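A sketch of roots registered one-to-one with creating and cloning a root pointer, assuming a hypothetical `Root` handle and a per-thread counter in place of a real root set. Registration happens in `new`/`clone` and is undone in `Drop`, so a program that never creates a `Root` never touches the root set.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical per-thread count of live roots (stands in for a root set).
    static LIVE_ROOTS: Cell<usize> = Cell::new(0);
}

fn live_roots() -> usize {
    LIVE_ROOTS.with(|c| c.get())
}

/// Hypothetical rooted GC pointer; `id` stands in for the object address.
struct Root {
    id: usize,
}

impl Root {
    /// Creating a root registers it.
    fn new(id: usize) -> Self {
        LIVE_ROOTS.with(|c| c.set(c.get() + 1));
        Root { id }
    }
}

impl Clone for Root {
    /// Cloning registers a second root for the same object.
    fn clone(&self) -> Self {
        Root::new(self.id)
    }
}

impl Drop for Root {
    /// Dropping unregisters, so registration always balances.
    fn drop(&mut self) {
        LIVE_ROOTS.with(|c| c.set(c.get() - 1));
    }
}

fn main() {
    let a = Root::new(1);
    let b = a.clone();
    println!("{} {}", live_roots(), b.id); // 2 1
    drop(a);
    println!("{}", live_roots()); // 1
    drop(b);
    println!("{}", live_roots()); // 0
}
```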

Even if/when stack maps are added, I'd assume they can be enabled/disabled without affecting the semantics of code that does not use it. So while yes, there is another build target, there is no new dialect of Rust.


"If the standard library ever supports garbage collection, it will add unacceptable overhead in terms of metadata and bloat."

and

"No, adding metadata will significantly slow down compile times."

As illustrated above, the only metadata and bloat I am aware of are stack maps and the trace method in vtables. I have tried to explain the reasoning that leads me to believe both can be avoided in programs that do not use GC, without changing the semantics of Rust or forking a new dialect.


"You're drawing a false equivalence here. Allocator support on collections would not result in bloated metadata, bloated code or slower compile times. It would be a pay-for-what-you-use feature, as it would only generate extra code for custom allocators. I don't see how speculative compilation is a good idea, considering that types like collections need to be instantiated for each set of type parameters. Since nearly all of the code is supposed to be inlined, there's very little that can actually be reused in any case."

If I remember correctly, this concern is not my own but something I read elsewhere, perhaps in some meeting minutes. Perhaps my recollection is wrong, and there is no problem.

The concern is right now, Rust only compiles the monomorphizations of generic code that are actually used. The problem is that if one has a library where everything takes a type parameter, that effectively means that one gains nothing from compiling the library separately from the program it is used in, because in the library nothing is instantiated with a "concrete" type. If all the libraries the application developer use have a high proportion of generic code, the developer is forced to basically rebuild every time.

In the long run, I think this is just yet another reason why all compilers and build systems should support much more fine-grained caching, even down to individual functions. In the short run, speculatively compiling code instantiated with its default parameters seems like an adequate solution.

Allocators (with or without GC) are just examples of features that might make a far higher percentage of code polymorphic.
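A sketch of the default-parameter idea, assuming a hypothetical `Alloc` parameter with `Global` as its default (not the real `std::alloc` API). Generic methods are only code-generated where they are instantiated; a non-generic function over `Buf<u8>` (that is, `Buf<u8, Global>`) makes the library build that one instantiation itself, roughly what speculatively compiling the defaults would automate. Downstream code that calls `describe_default` links against the library's copy, while direct calls to the generic methods may still be instantiated again in the caller's crate.

```rust
use std::marker::PhantomData;

/// Hypothetical allocator parameter.
pub trait Alloc {
    const NAME: &'static str;
}

pub struct Global;
impl Alloc for Global {
    const NAME: &'static str = "global";
}

pub struct Arena;
impl Alloc for Arena {
    const NAME: &'static str = "arena";
}

/// A generic container whose allocator parameter defaults to `Global`.
pub struct Buf<T, A: Alloc = Global> {
    items: Vec<T>,
    _alloc: PhantomData<A>,
}

impl<T, A: Alloc> Buf<T, A> {
    pub fn new() -> Self {
        Buf { items: Vec::new(), _alloc: PhantomData }
    }
    pub fn push(&mut self, x: T) {
        self.items.push(x);
    }
    pub fn describe(&self) -> String {
        format!("{} items via {}", self.items.len(), A::NAME)
    }
}

/// Non-generic: every type is concrete (including the defaulted `A`), so
/// this function is compiled when the library itself is built.
pub fn describe_default(b: &Buf<u8>) -> String {
    b.describe()
}

fn main() {
    let mut b: Buf<u8> = Buf::new(); // same type as Buf<u8, Global>
    b.push(1);
    println!("{}", describe_default(&b));

    let mut c: Buf<u8, Arena> = Buf::new(); // non-default: a new instantiation
    c.push(1);
    c.push(2);
    println!("{}", c.describe());
}
```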

@magicgoose

re. opt-in vs opt-out:
The selection of opt-out GC was one of the bigger things that "killed" the D language, because it became pretty much impractical to use without GC: most code depended on it, and then it's not a C++ alternative anymore.

IMO, having GC is fine but then it should be opt-in…

@Centril added the T-lang and T-libs-api labels Feb 23, 2018
@Centril added the A-machine and A-allocation labels Nov 27, 2018
wycats pushed a commit to wycats/rust-rfcs that referenced this issue Mar 5, 2019
@Nadrieril
Member

As of 2023 this seems like an unlikely direction for Rust to go in. Given that there has been no discussion in years, I'm taking the initiative to close this. Anyone is welcome to reopen if you want to explore this possibility again (also, I haven't looked, but I'm pretty sure there are some more detailed proposals along these lines that already exist, either in this repo or on https://internals.rust-lang.org).

@Nadrieril closed this as not planned Dec 8, 2023

6 participants