Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thread_id_value tracking issue #67939

Open
Mark-Simulacrum opened this issue Jan 6, 2020 · 39 comments
Open

thread_id_value tracking issue #67939

Mark-Simulacrum opened this issue Jan 6, 2020 · 39 comments
Labels
A-concurrency Area: Concurrency related issues. B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Mark-Simulacrum
Copy link
Member

Mark-Simulacrum commented Jan 6, 2020

This is the tracking issue for the ThreadId::as_u64 method, which casts a thread ID to the underlying u64.

There are currently no known blockers to stabilization beyond making sure that the API indeed works for intended use cases.

Implemented originally in #67566.

@Mark-Simulacrum Mark-Simulacrum added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Jan 6, 2020
@JohnTitor JohnTitor added the B-unstable Blocker: Implemented in the nightly compiler and unstable. label Feb 28, 2020
@brain0
Copy link

brain0 commented Mar 18, 2020

Shouldn't this at least guarantee that a valid id is never 0?

@Mark-Simulacrum
Copy link
Member Author

Why would you expect such a guarantee?

We could clearly make it, but it seems a bit odd to me for us to do so...

@brain0
Copy link

brain0 commented Mar 18, 2020

Such a guarantee would allow storing the equivalent of Option<ThreadId> in an AtomicU64.

I have a case where I need to avoid some recursive operation: So, a thread starts the operation (by executing an FnOnce() passed from outside) and saves its thread id. If the same thread performs the same operation again recursively, I panic. If another thread does it, I park the thread. In this case, 0 would serve as a value indicating "no thread has been assigned yet".

The current implementation stores the ThreadId in NonZeroU64, so if we can make a stable guarantee that a thread id will never be 0, returning NonZeroU64 instead of u64 here would help a lot.

@Mark-Simulacrum
Copy link
Member Author

I think I would personally prefer to avoid using NonZero in this API. I'd suggest manually doing so in the consumer of the API, perhaps assigning u64::MAX instead of 0 as the sentinel for None.

@brain0
Copy link

brain0 commented Mar 18, 2020

Without any guarantee that a thread id cannot be u64::MAX that is not a good idea. And the current state is "there are no guarantees".

@Mark-Simulacrum
Copy link
Member Author

Well, sure, I guess, though realistically it's the sort of thing that would be fine to do IMO and panic on -- you're almost certainly not going to hit that.

I'm not opposed to accepting this, I guess -- would you be willing to file a PR changing the API as you've suggested?

@brain0
Copy link

brain0 commented Mar 18, 2020

Well, sure, I guess, though realistically it's the sort of thing that would be fine to do IMO and panic on -- you're almost certainly not going to hit that.

I always prefer a static guarantee (as_u64 returning a type that guarantees an invariant) over "almost certainly fine".

I'm not opposed to accepting this, I guess -- would you be willing to file a PR changing the API as you've suggested?

Sure, I can do that. But I still don't quite understand yet why you don't like it the idea.

@Mark-Simulacrum
Copy link
Member Author

It's not really that I don't like it -- I think you've convinced me that it can be useful and is pretty minimal -- but I was uncertain about making the guarantee going forward. But thinking about it more, 64 bits should be more than enough that we shouldn't need to worry about the zero being lost to us if we make the guarantee.

brain0 pushed a commit to brain0/rust that referenced this issue Mar 21, 2020
As discussed in rust-lang#67939, this allows turning Option<ThreadId> into Option<NonZeroU64> which
can then be stored inside an AtomicU64.
Centril added a commit to Centril/rust that referenced this issue Mar 22, 2020
Return NonZeroU64 from ThreadId::as_u64.

As discussed in rust-lang#67939, this allows turning Option<ThreadId> into Option<NonZeroU64> which
can then be stored inside an AtomicU64.
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Mar 22, 2020
Return NonZeroU64 from ThreadId::as_u64.

As discussed in rust-lang#67939, this allows turning Option<ThreadId> into Option<NonZeroU64> which
can then be stored inside an AtomicU64.
@Amanieu
Copy link
Member

Amanieu commented Mar 26, 2020

This API should return a usize, not a u64. AtomicU64 isn't supported on all platforms, while AtomicUsize is. Also since threads share the address space, it should be possible to guarantee that an id fits in a usize.

@Mark-Simulacrum
Copy link
Member Author

@Amanieu We currently use a u64 I think because (a sufficiently long lived) process could spawn more than 2^32 threads, not simultaneously. And currently we guarantee that the index is unique, i.e. will not wrap. I am unopposed to changing this, but we should do so in ThreadId::new and make a decision about behavior on wrapping.

@joshtriplett
Copy link
Member

joshtriplett commented May 13, 2020

There's value in not reusing the IDs, such as if you use them as unique identifiers for log files.

EDIT: I no longer think this is a good idea. Reuse is fine, and the tradeoffs of avoiding it are worse.

@KodrAus KodrAus added A-concurrency Area: Concurrency related issues. Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. labels Jul 29, 2020
ltratt added a commit to ltratt/yk that referenced this issue Dec 5, 2020
When we detect a possibly closed loop in the end user's program, we need to
determine whether the current thread is the one tracing it or not. We previously
allocated a chunk of memory when we started tracing and used its address as a
temporary thread ID.

Ideally we would use the system provided thread ID to determine this, but that
turns out to be a non-portable minefield. pthreads provides the pthread_t ID
type but that's notionally opaque (on OpenBSD, as far as I can tell from a quick
search, it's really an opaque struct about which no guarantees are provided,
though some programs such as Chrome do seem to rely on it being equivalent to a
usize). Some platforms (at least Linux and OpenBSD) use the PID ID type pid_t
for threads too and that's guaranteed to be a signed integer though not it seems
of a guaranteed size (though on both platforms it's actually a C int, so 32 bits
in practise). Rust provides a ThreadID type and an unstable "as_64" function
(rust-lang/rust#67939) but that provides no guarantees
about how much of the 64-bit integer space is used.

The challenge we have is that whatever ID we use it must fit into
(numbits(usize) - numbits(PHASE_TAGS)) i.e. we have 62 bits for the ID on a
64-bit system. If we rely on the system thread IDs it feels like we're storing
up a portability time bomb that might explode one day in the dim and distant
future. At least for now, the (minimal) extra performance we might get from that
doesn't seem worth the danger.

This commit is thus a relatively simple change. Rather than allocating a
malloc'd block every time we start tracing a thread, which means we have to be
quite careful about freeing memory, we allocate it on thread construction. This
simplifies the code slightly and (since it's an aligned address) we can feel
fairly safe that it plays nicely with PHASE_TAGS. We could, if we wanted,
allocate this block lazily the first time we trace but that feels like an
unnecessary optimisation at this point.
ltratt added a commit to ltratt/yk that referenced this issue Dec 6, 2020
When we detect a possibly closed loop in the end user's program, we need to
determine whether the current thread is the one tracing it or not. We previously
allocated a chunk of memory when we started tracing and used its address as a
temporary thread ID.

Ideally we would use the system provided thread ID to determine this, but that
turns out to be a non-portable minefield. pthreads provides the pthread_t ID
type but that's notionally opaque (on OpenBSD, as far as I can tell from a quick
search, it's really an opaque struct about which no guarantees are provided,
though some programs such as Chrome do seem to rely on it being equivalent to a
usize). Some platforms (at least Linux and OpenBSD) use the PID ID type pid_t
for threads too and that's guaranteed to be a signed integer though not it seems
of a guaranteed size (though on both platforms it's actually a C int, so 32 bits
in practise). Rust provides a ThreadID type and an unstable "as_64" function
(rust-lang/rust#67939) but that provides no guarantees
about how much of the 64-bit integer space is used.

The challenge we have is that whatever ID we use it must fit into
(numbits(usize) - numbits(PHASE_TAGS)) i.e. we have 62 bits for the ID on a
64-bit system. If we rely on the system thread IDs it feels like we're storing
up a portability time bomb that might explode one day in the dim and distant
future. At least for now, the (minimal) extra performance we might get from that
doesn't seem worth the danger.

This commit is thus a relatively simple change. Rather than allocating a
malloc'd block every time we start tracing a thread, which means we have to be
quite careful about freeing memory, we allocate it on thread construction. This
simplifies the code slightly and (since it's an aligned address) we can feel
fairly safe that it plays nicely with PHASE_TAGS. We could, if we wanted,
allocate this block lazily the first time we trace but that feels like an
unnecessary optimisation at this point.
bors bot added a commit to ykjit/yk that referenced this issue Dec 6, 2020
168: Don't create a malloc'd block each time we start tracing. r=vext01 a=ltratt

When we detect a possibly closed loop in the end user's program, we need to determine whether the current thread is the one tracing it or not. We previously allocated a chunk of memory when we started tracing and used its address as a temporary thread ID.

Ideally we would use the system provided thread ID to determine this, but that turns out to be a non-portable minefield. pthreads provides the pthread_t ID type but that's notionally opaque (on OpenBSD, as far as I can tell from a quick search, it's really an opaque struct about which no guarantees are provided, though some programs such as Chrome do seem to rely on it being equivalent to a usize). Some platforms (at least Linux and OpenBSD) use the PID ID type pid_t for threads too and that's guaranteed to be a signed integer though not it seems of a guaranteed size (though on both platforms it's actually a C int, so 32 bits in practise). Rust provides a ThreadID type and an unstable "as_64" function (rust-lang/rust#67939) but that provides no guarantees about how much of the 64-bit integer space is used.

The challenge we have is that whatever ID we use it must fit into (numbits(usize) - numbits(PHASE_TAGS)) i.e. we have 62 bits for the ID on a 64-bit system. If we rely on the system thread IDs it feels like we're storing up a portability time bomb that might explode one day in the dim and distant future. At least for now, the (minimal) extra performance we might get from that doesn't seem worth the danger.

This commit is thus a relatively simple change. Rather than allocating a malloc'd block every time we start tracing a thread, which means we have to be quite careful about freeing memory, we allocate it on thread construction. This simplifies the code slightly and (since it's an aligned address) we can feel fairly safe that it plays nicely with PHASE_TAGS. We could, if we wanted, allocate this block lazily the first time we trace but that feels like an unnecessary optimisation at this point.

This is based on a suggestion from @vext01 and a comment from @bjorn3.

Co-authored-by: Laurence Tratt <[email protected]>
LukasKalbertodt added a commit to LukasKalbertodt/rust that referenced this issue Feb 22, 2021
This changes the `Debug` impl of:
- `core::any::TypeId`
- `core::mem::Discriminant`
- `std::thread::ThreadId`
- `std::time::Instant`

All these types are explicitly described as "opaque" in the
documentation, so they shouldn't print implementation details in their
`Debug` implementation.

The change for `ThreadId` can possibly be reverted in the future, once
the method `thread_id_value` is stabilized:
rust-lang#67939
The same goes for `Discriminant` once/if the `DiscriminantKind` trait
gets stabilized.
But until then, these are opaque types.
@joshtriplett
Copy link
Member

This seems to have trailed off.

It sounds like there are two major approaches here: use u64 and make no guarantees, or use NonZeroU64 and guarantee that a thread ID will never be zero. I think we can reasonably guarantee that.

@jgarvin
Copy link

jgarvin commented Feb 25, 2021

I had an earlier post here I deleted b/c I was conflating two things.

  1. Guaranteeing that some value isn't used so you can have a niche. This is fine AFAICT at least on Linux, because pid_t is only 4 bytes, so there's lots of bits to spare.
  2. Making sure thread IDs are never recycled like @joshtriplett suggested.

I don't know how to do 2 on Linux. gettid() and pthread_self have no such guarantee, so it would have to come from Rust. gettid() will recycle if you spawn and delete many threads. Things that won't work:

A) Rust could maintain its own global u64 count that it increments at thread creation, but it is only possible to hook this for threads created by std::thread. FFI libraries could create threads undetected. AFAIK there is no platform support for hooking thread creation.

B) You could lazily count threads -- on request for an ID check a global HashMap<pid_t, bool> to try to track if a thread id has already been counted and if not bump the global count. But that has the same reuse problem. You could try to detect reuse by having a thread local that impls Drop to reliably detect threads being torn down, but the thread local documentation describes running destructors as "best effort."

C) You could have a thread local storing the current thread id, and when it's detected as not set it bumps the global count and returns it. This number won't be stable during thread teardown, b/c destruction order of thread locals isn't defined and on some platforms they get recreated if accessed after destruction.

I'm not a pthreads expert by any means, mostly commenting here because if there is a way to do this it's a very nice property to have so I want to know how to pull it off ;) Could assist with implementing safe APIs that do checks to make sure you are consistently accessing an object from the same thread.

@Amanieu
Copy link
Member

Amanieu commented Mar 16, 2021

Why use a global thread ID when we can just use the TLS base address instead?

  • Reading it uses a single instruction.
  • It is usize since it is a memory address.
  • It is unique for each thread in the current process.
  • It is guaranteed to be non-null.

As an example of something that specifically required a non-zero usize thread ID, consider parking_Lot's ReentrantMutex:

  • It must be usize instead of u64 because AtomicUsize is available on all platforms but AtomicU64 isn't.
  • It must be non-zero because we need a sentinel value to indicate that the lock has no current owner.

@joshtriplett
Copy link
Member

My concern was whether it would be safe to commit to thread IDs being usize on all future targets.

However, if there's a platform where that doesn't hold, we can always use our own, whether derived from TLS or something else. So usize seems fine.

@ltratt
Copy link
Contributor

ltratt commented Apr 12, 2021

I wonder if we can split this up a little bit, in the sense that it seems that one design decision dominates.

IMHO that major design decision is: does Rust's std lib keep its own thread IDs? If it doesn't, then life is easy, but thread IDs are also of limited use. If it does, then we open up more use cases, but have to worry about things like: are thread IDs unique for the lifetime of a process? should they a non-zero representation? and so on. Personally I'm rather neutral, but I think the status quo is unappealing: the documentation is vague about guarantees, but if you look at the source code, you'll see that it's currently making a strong guarantee. That strikes me as dangerous: I'm pretty sure there will be code out there now making an assumption that ThreadIds are unique for a process's lifetime even without using as_u64. If we don't want to make such a guarantee, then we might perhaps want to explicitly document that, even if the code doesn't change; it's easy to strengthen the guarantee in the future, but painful to weaken it.

[I must admit, I had totally forgotten this issue even existed -- I was surprised to see I'd referenced it a few months back! I raised https://github.com//pull/84083 as a result of my forgetting about this!]

@jweinst1
Copy link

I would say thread id's are somewhat useful for approaches to thread specific storage techniques, where you want to store things in a common data structure, that is safely separated by individual thread IDs. In theory, you can do this via just passing in a uniquely incremented usize when creating threads, but it would be nice to have a way to retrieve a unique one anywhere.

@Perksey
Copy link

Perksey commented Feb 9, 2022

Are there any major outstanding issues now that the clarification PR #84083 was merged?

@jgarvin
Copy link

jgarvin commented Feb 9, 2022

After examining the discussion at #84083 and the associated commit it's still not obvious to me how the ID works for threads created outside Rust. @joshtriplett mentions that only being able to access the ID via thread::current helps, but I'm not sure how. Does thread::current().id() lazily allocate a unique ID? If it uses TLS to do so is it safe even on platforms that can deallocate and then reallocate TLS storage during thread teardown? (if the latter can a thread report having multiple different but unique IDs during teardown?)

@permezel
Copy link

permezel commented Jun 3, 2022

I would like to be able to access the thread::current().id.0. IE: the private field in ThreadId. The reason for this is that I print the TID as a "{:04.4x}" as a fixed component of a trace output line for debugging multi-threaded programmes.
ThreadId(1): .... is not optimal.
0001: ... is optimal.

ThreadId(1):5c1e0b80aa859:I:          fs: Printing testing lists...
ThreadId(1):5c1e0b80c4e7d:D:          fs: debug
ThreadId(1):5c1e0b80c7f6e:E:          fs: Error

what I desire is:

0001:5c1e0b80aa859:I:          fs: Printing testing lists...
0001:5c1e0b80c4e7d:D:          fs: debug
0001:5c1e0b80c7f6e:E:          fs: Error

The second field is n/s counter in hex.
The third field is Info/Debug/Error tag.

If I cannot access the ThreadId as u64, I will need to write my own hash and lookup for each and every trace line output.

Not conducive to most efficient implementation.

@Zerowalker
Copy link

so ThreadId has no relevance to the actual id of the thread given by the OS?

@thomcc
Copy link
Member

thomcc commented Sep 9, 2022

It does not. The OSes may a thread's ID after it terminates (and also many OSes have multiple things that might be called the thread id).

@thomcc
Copy link
Member

thomcc commented Sep 9, 2022

That means there is no opportunity for a Rust-generated ID, and we should just use OS thread IDs. Yes, they might get reused after a thread exits; so be it.

FWIW our documentation on this gives a guarantee that we never reuse these: https://doc.rust-lang.org/nightly/std/thread/struct.ThreadId.html

ThreadIds are guaranteed not to be reused, even when a thread terminates

@wleslie
Copy link

wleslie commented Jan 6, 2023

Would this numeric ID be a suitable basis for implementing Ord for ThreadId? One potential usecase for this impl is to serve as an arbitrary tiebreaker for conflict resolution.

@tgross35
Copy link
Contributor

tgross35 commented Sep 11, 2023

A few things as a summary:

  • @thomcc pointed out above that our documentation unequivocally states that ThreadId must be unique for all eternity of the process. But...
  • @the8472 demonstrated that u64 is not big enough for that purpose
    Stabilize ThreadId::as_u64 #110738 (comment). I have also seen discussion on the kernel mailing lists that address space may start growing to 128 bits within a decade or so, does kind of track if uniqueness is desired.
  • Iirc, 64 bits of randomness is not considered enough to provide a resonable guarantee of nonrepeatedness in cryptographic contexts
  • Randomized IDs seems slightly preferred over sequential for the same reason as -Z randomize-layout - break reliance on behavior that happens to work but is not guaranteed. Using thread local storage address was one proposed way
  • Some C interop situations requies the gettid ID of a thread

The documentation is pretty strong, which is kind of unfortunate because it paints us into a corner - the first two points together seem to say that u128 is more or less the needed internal representation at this point, depending on how strongly the u64 expiration concern is.

Loose but interesting proposal that fits all of the above: maybe ThreadId could be a u128 structured like the following:

bits 0..32: the OS's TID
bits 32..48: lower 16 bits of a LCG-generated 92-bit number
bits 48..52: 0b1000
bits 52..64: middle 12 bits of a LCG-generated 92-bit number
bits 64..66: 0b10
bits 66..128: upper 64 bits of a LCG-generated 92-bit number

The form is comes from the soon-to-be-finalized UUIDv8 spec (all data custom except for the 0b1000 version and 0b10 variant). So with something like this we get the following:

  • The OS's TID can always be recovered from a ThreadID which means we don't need to store or retrieve two separate points of data. So ThreadId::os_tid(self) and ThreadId::as_u128(self) are both possible. (I am not sure how we would handle this otherwise)
  • We get a space of 92 bits for our pseudorandom nonrepeating generator (which is fast and doesn't require storing all generated values), plus the 32 bits of OS TID that is also mildly random. This gives a very strong uniqueness guarantee and LCGs are fast.
  • We get to display the TIDs as spec-compliant UUIDs, which is nice because u128 values aren't nice to read. Loggers can probably just print the first segment of the UUID to save space
  • It just so happens that the first segment of the 00000000-0000-0000-0000-000000000000 form is 4 bytes, so the TID is the portion up to the dash. Kind of a convenient coincidence
  • Anything that needs 64 bits can grab either half and still have a strong uniqueness guarantee

That all seems to work together almost perfectly. The downside is, of course, no atomic storage.

@ahicks92
Copy link
Contributor

I needed this the other day (specifically a nonzero id that fits in an atomic).

Skimming these discussions, why not fn to_os_id(&self) -> usize which loosens the guarantee that they will not be reused for the lifetime of the process, and which maybe even goes as far as specifying what the id is on various targets?

Then the representation of the full struct--where the guarantee of non-reuse is offered--can be whatever.

Docs could say something like "This is intended for advanced concurrency algorithms where..." but I am terrible at writing docs which are generally useful for people not already at least somewhat familiar with the subject matter when it comes to this stuff.

It seems to me that anyone who needs a non-reused integer thread id of a specific integer width knows enough to throw together the obvious solution with a static atomic counter and a thread local which matches their needs; the value of Rust stdlib in this context would be knowing how to get something as quickly as possible, since that is slow and/or branchy.

@tgross35
Copy link
Contributor

tgross35 commented Nov 25, 2023

I have been thinking about this for a while and come to a similar conclusion.

It seems like applications needing uniqueness could bring their own, is there any reason this has to be in std? Maybe they need u64 uniqueness, maybe they need u128 uniqueness, maybe the IDs need to be cryptographically unpredictable for some reason, etc.

If there is no reason that this couldn’t exist in a library then I don’t think we should try too hard to provide anything other than the OS thread ID.

edit: I forgot we already made these guarantees

@ahicks92
Copy link
Contributor

The problem is that ThreadId is stable and offers the guarantee of process-lifetime uniqueness and that can't be relaxed without breaking anyone who, e.g., does tls themselves with a hashmap or something. The discussion there is probably dead not because it's necessarily a good idea, but because the behavior is stable for better or worse. Any relaxing there would need to be done via methods which relax the guarantee, not changing the core type.

The other problem is that on a concrete level there's a case to be made in one of my current projects to use PI-futexes and it would be really nice if I didn't have to go outside stdlib just to get a Linux tid.

None of my use cases for this have absolutely required it, but I'd say every 2 months or so I come back to needing a u64 (or other integer) that's unique per thread and doesn't need to be unique for the lifetime of the process. Then I give up and write my own one way or another and go well okay kinda sucks that I can only access that id from tls on the thread that generated it but I guess it's good enough o well.

@marshrayms
Copy link

Using the stdlib doc search bar, it took me five seconds to find std::process::id().

Please may I have a simple, standard way to get the native OS numeric thread id?

I really don't think that's asking for a lot.

I don't care if it's provided by this type, a plain function, a macro, whatever.

I don't care about any uniqueness guarantees or promises about its value.

I don't care if it's a u128.

I just want the TID.

Please :-)

@ChrisDenton
Copy link
Member

That's different to what this issue is about. It would be a new API request. You may want to fill out an API Change Proposal (ACP) suggesting that: https://github.com/rust-lang/libs-team/issues/new/choose

@tgross35
Copy link
Contributor

tgross35 commented Mar 2, 2024

Unfortunately that isn't quite as straightforward as it would seem, see discussion starting at https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Printing.20thread.20ID.20on.20panic.20bikeshed/near/412207744

Still wouldn't hurt to have an ACP to get some direction there

@marshrayms
Copy link

@ChrisDenton

That's different to what this issue is about. It would be a new API request. You may want to fill out an API Change Proposal (ACP) suggesting that: https://github.com/rust-lang/libs-team/issues/new/choose

Oh, my apologies.

I'd read in the documentation that std::process::id() "Returns the OS-assigned process identifier associated with this process" and I wrongly assumed that std::thread::current().id() would be analogous. Nothing in the description of that method (or its entire doc page) suggested that it was a completely unrelated concept to what every other system that I am familiar with refers to as a "thread ID". Seeing the conversion to u64 only served to confirm my incorrect assumption.

@tgross35

Unfortunately that isn't quite as straightforward as it would seem, see discussion starting at https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Printing.20thread.20ID.20on.20panic.20bikeshed/near/412207744

That discussion seems to be about getting the OS thread ID for an arbitrary thread from its pthread_t.

All I am asking for is the OS TID of the current thread, i.e., gettid(), GetCurrentThreadId(). I need this to log for reference with external tools such as debuggers and profilers.

But I have been directed to make a fresh proposal for that, so, sorry for the noise.

@correabuscar
Copy link

correabuscar commented May 6, 2024

so ThreadId has no relevance to the actual id of the thread given by the OS?
@thomcc It does not. The OSes may a thread's ID after it terminates (and also many OSes have multiple things that might be called the thread id).

it seems a word is missing there, was it supposed to be may change ah it's may reuse most likely ?

@ahicks92
Copy link
Contributor

ahicks92 commented May 6, 2024

@correabuscar
Probably should be reuse. Many OSes only guarantee uniqueness for the currently running threads, not for the lifetime of the program; that's the problem.

@correabuscar
Copy link

Just a heads up, for those that didn't know (like me), std::thread::current() does alloc because it uses Arc at this point:

pub(crate) unsafe fn new(name: CString) -> Thread {
unsafe { Self::new_inner(ThreadName::Other(name)) }
}
pub(crate) fn new_unnamed() -> Thread {
unsafe { Self::new_inner(ThreadName::Unnamed) }
}
// Used in runtime to construct main thread
pub(crate) fn new_main() -> Thread {
unsafe { Self::new_inner(ThreadName::Main) }
}
/// # Safety
/// If `name` is `ThreadName::Other(_)`, the contained string must be valid UTF-8.
unsafe fn new_inner(name: ThreadName) -> Thread {
// We have to use `unsafe` here to construct the `Parker` in-place,
// which is required for the UNIX implementation.
//
// SAFETY: We pin the Arc immediately after creation, so its address never
// changes.
let inner = unsafe {
let mut arc = Arc::<Inner>::new_uninit();
let ptr = Arc::get_mut_unchecked(&mut arc).as_mut_ptr();
addr_of_mut!((*ptr).name).write(name);
addr_of_mut!((*ptr).id).write(ThreadId::new());
Parker::new_in_place(addr_of_mut!((*ptr).parker));
Pin::new_unchecked(arc.assume_init())
};

it's that Arc::<Inner>::new_uninit(); which does Global.allocate(layout) inside it (tested with rustc 1.76.0 btw, even tho i pointed to latest code above)

therefore std::thread::current().id().as_u64() is unusable as a unique identifier of threads for the purpose of say creating a noalloc thread_local type, which is what I wanted to do with it...

You may try the following example(playground) just in case it changed in the future, see if it still segfaults:

#![feature(thread_id_value)]

use std::alloc::{GlobalAlloc, Layout};

struct MyGlobalAllocator;
unsafe impl GlobalAlloc for MyGlobalAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let _=std::thread::current();
        //XXX: ^ infinite recursion because it's allocating, so Segmentation fault      (core dumped)
        //https://github.com/rust-lang/rust/blob/e8ada6ab253b510ac88edda131021d9878f2984f/library/std/src/thread/mod.rs#L1321-L1349
        let ptr = std::alloc::System.alloc(layout);
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        std::alloc::System.dealloc(ptr, layout);
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let new_ptr = std::alloc::System.realloc(ptr, layout, new_size);
        new_ptr
    }
}

#[global_allocator]
static GLOBAL_ALLOCATOR: MyGlobalAllocator = MyGlobalAllocator;

fn main() {
}

@CAD97
Copy link
Contributor

CAD97 commented Jun 4, 2024

#124881 utilizes an atomic ThreadId to identify which thread has locked a ReentrantLock. (On targets without a sufficiently large atomic, a seqlock is used and is fully sound; this could be extended to larger tid if needed.) The reason to bring it up is that to do so it duplicates the tid so it's not just in the static CURRENT: OnceCell<Thread> but also in an accompanying static CURRENT_ID: Cell<Option<ThreadId>> such that getting the current ThreadId can be extremely cheap in the common case and still work during TLS destruction. The current implementation uses thread::try_current() as the canonical way to get the assigned tid, but this could theoretically be flipped so thread::try_current_id() is the canonical source which thread::try_current() uses when it needs to initialize the initial Thread object for a thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-concurrency Area: Concurrency related issues. B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests