RFC: Promote the libc crate from the nursery #1291
Move the `libc` crate into the `rust-lang` organization after applying changes such as:

* Remove the internal organization of the crate in favor of just one flat namespace at the top of the crate.
* Set up a large number of CI builders to verify FFI bindings across many platforms in an automatic fashion.
* Define the scope of libc in terms of bindings it will provide for each platform.
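To make the flat-namespace change concrete, here is a minimal sketch of what it means for calling code. Both modules are hypothetical stand-ins, not the real `libc` source: the path under `nested` only illustrates a hierarchy organized by standard and platform, and `widen` and `negate` are made-up helpers.

```rust
// Hypothetical stand-in for a crate organized by standard and platform.
#[allow(non_camel_case_types, dead_code)]
mod nested {
    pub mod types {
        pub mod os {
            pub mod arch {
                pub mod c95 {
                    pub use std::os::raw::c_int;
                    pub type size_t = usize;
                }
            }
        }
    }
}

// Hypothetical stand-in for the flat layout: everything at the top level.
#[allow(non_camel_case_types, dead_code)]
mod flat {
    pub use std::os::raw::c_int;
    pub type size_t = usize;
}

// Both paths name the same underlying types; only the import path differs.
pub fn widen(len: flat::size_t) -> nested::types::os::arch::c95::size_t {
    len
}

pub fn negate(x: nested::types::os::arch::c95::c_int) -> flat::c_int {
    -x
}
```

With the flat layout a caller writes `flat::size_t` without needing to know which standard or platform module a name was filed under.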
How complete is the new libc supposed to be? In the current libc, large parts of the ISO C library are missing; the same is true for POSIX, IIRC. To some degree it can be described as "If some binding was needed for something in rust-lang/* and friends, it was included." Edit: everything else (flat namespacing, size_t changes, testing, removal of winapi) looks good to me.
👍 to the flat namespace. I don't think I've ever personally found the existing hierarchy too useful. Whenever I need to use With respect to |
First of all, rather than testing the bindings, I believe they should be fully auto-generated (similarly to winapi), possibly at crate build time. The only things that should receive real commits in that case are the lists of headers/libraries to look at/link to, and the generator itself. Provided the binding generators are correct (which is what you’d test), no API testing is necessary. I’m strongly in favour of keeping only the CRT on windows-msvc, but windows-gnu should expose the gnu part as well. I’m strongly in favour of the flat namespace. I’d like the crate to be named libc-sys or something similar, to match the conventions.
I intended to address the "what does this library contain" question in the section about
One worry I've had about this in the past is compatibility in the backwards and forwards direction. For example, let's say we're using a very new libc with a very old libstd. This means that any new On the flip side if we're using a very old libc with a very new libstd, it's likely that there's a number of types in libstd which aren't reexported by libc because when it was originally written those types didn't exist. The only real solution I know of to this is to put
I'm currently under the impression that it's not viable to generate any of these bindings at build time because that implies something like libclang being installed which would be a pretty unfortunate dependency of the
Also, there's no real downside to just running more tests! (especially because they already exist)
I debated for a bit about perhaps renaming this crate, but it's so widely used and ingrained that I think that ship has sailed at this point. It's also not clear that it would actually follow the existing conventions because
For the
There's probably an argument to be made that libc shouldn't expose any functions at all on Windows. (The crate wouldn't be completely empty because it would contain some types.) It might be a good idea to explicitly exclude pure math functions from the scope of libc ( |
The C runtime is so ubiquitous and difficult to change that I would expect any alteration in how it's linked to be a modification to the target triple entirely. I certainly do not want to promise that the standard library will link to a C runtime on all platforms for forever, rather I want there to always be a possibility for the standard library to be built completely independently of libc (e.g. on Unix) or the CRT (on Windows). That being said, the purpose of this library will just be to say "I'd like a C runtime linked in, please". Although the standard library may not link a C runtime, that'd be the purpose of this crate. In that sense, to answer your points:
I'm a little confused by this in the sense of if you don't have a C runtime, why would you want I would expect, however, that
I'm not sure I understand what the concern is here. If this is a problem, how is it supposed to work in the ideal case? Surely we have to always be able to link to "a CRT" as well as external C code?
I disagree that you never want these functions because I can imagine a niche use case where you're not dealing with many Rust types but instead more C types, so it may be easier to call these functions in that case. Regardless, though, this library represents an exact binding to the platform in question, not necessarily an opinionated version of "here's what we think you should call". For example I wouldn't reject a PR to add
While I agree that the functions are likely to be rarely used, I don't think we should necessarily actively remove them just because we don't think they should be there. I would expect a use case to eventually arise in one form or another and it's nice to have the bindings already available!
Could you elaborate a bit on this? While I agree that they may not be necessary (because the standard library provides them) I don't see how reimplementing them in Rust would affect this. By linking to |
There is no "standard CRT" on Windows. Every version of Visual Studio ships with a different, binary-incompatible version of the CRT.
You don't necessarily need a C runtime to do FFI on Windows. Some Windows APIs are defined to take wchar_t or a typedef of it (the winapi crate has a definition), but they're independent of any C runtime.
Ideally, APIs which allocate memory have a companion API to free it. http://blogs.msdn.com/b/oldnewthing/archive/2006/09/15/755966.aspx is a more in-depth description of the issue. |
This becomes a bit problematic when, for example, abs() on Android or sinf() on Windows is defined in a header, so there is no symbol to link against, so we would have to rewrite the definition in Rust to expose "libc" in the same way it would be visible to C code. There isn't any fundamental reason we can't do that, but it seems like a lot of useless work. |
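For a sense of scale, a header-only function such as `abs` can be restated in a few lines of Rust. This is a sketch only, not something the RFC or the crate prescribes. C leaves `abs(INT_MIN)` undefined, so the wrapping behaviour below is an assumption made for the example.

```rust
use std::os::raw::c_int;

/// Rust-side definition standing in for a header-only `abs`.
/// C leaves `abs(INT_MIN)` undefined; wrapping is this sketch's own choice.
pub extern "C" fn abs(i: c_int) -> c_int {
    i.wrapping_abs()
}
```

The function keeps the C ABI and signature, so Rust callers see the same surface a C program would get from the header.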
Not specific enough. When the scope section says "libc, libm, librt, libdl, and libpthread" I assume glibc is implied and the scope is based on implementations and not standards. At the same time Rust libc contains only a tiny part of what glibc provides. Is better (or even complete) coverage of glibc promised or encouraged? Will, for example, PRs adding wide string or locale stuff be accepted or worked on by rust-lang members? Will C99/C11 bindings/types be provided or accepted? Will glibc(musl/VC)-only functions/types/etc. be present in the library? In other words, what is the purpose of the library: to gather some popular stuff from popular implementations, or to provide a more or less complete C layer?
In general I'm not understanding your comments with respect to the design of this library, it sounds like you're talking about general "things to worry about" with MSVC which have not a lot of relevance to providing bindings to the CRT? For example, I don't understand how binary incompatibility of the CRT comes into play here. Surely the signature of functions like I'm also not sure why you're concerned about getting type definitions without a CRT? That sounds like the job of And finally, I don't understand how malloc/free are relevant here. It's of course a problem if you malloc in one DLL and then free in another with a different free function than intended, but how is that related to the design of It'd be helpful to me to articulate exact failure scenarios you're worried about, and keep in mind that this library isn't going to "just solve all your problems", it's just declarations to functions found on the platform in question!
This is only really problematic if you want the library to be "cross platform" in the sense that it's always conforming to some standard or another, but the RFC explicitly states that this library is not cross platform and it's quite specific to the platform in question. In that sense there's no problem here. We could provide the inline functions ourselves as well for these platforms. I also don't understand why you say that writing the inline definition in Rust is "useless work"? What are you trying to reason towards? Not exposing these symbols? Not binding Android at all? We don't necessarily have to seek out and implement all of these definitions, the RFC is just defining the scope of Currently when this RFC says that group of libraries on Linux, it really means it. There's no implication of glibc or of any particular standard, it's simply whatever's to be found in those libraries on all Linux distributions can be included in this library. We'd certainly accept PRs for wide strings, locales, new types, etc, so long as they're present on Linux in these libraries across all Linux distributions (e.g. following the letter-of-the-law of this RFC in terms of scope). The purpose of this library, stated in the RFC, is "to provide all of the definitions necessary to easily interoperate with C code". Each platform defines its own scope (and the tier 1 platforms are defined in this RFC), and as long as the function is within that scope it's welcome in |
If you want to draw the line that way, I guess that's fine. It's a good point that we don't actually have to implement all the functions which could theoretically be supported.
Sure, memcmp probably isn't changing... but for example, in VS2015, the CRT no longer provides a symbol named
It would be preferable if there were a canonical home for wchar_t, although I guess it's not a hard requirement.
Hmm... I'll try to work through various scenarios. Everything statically linked, built from source, mostly just works... but there are two issues. One, if the user uses functions like printf, where the symbol changes, or a function like wcstok, where the signature changes, libc has to know which CRT you will be linking against. Two, some libraries might prefer to pin a CRT version so that they don't have to deal with unknown runtime behavior changes in a future version of the CRT. Suppose I'm dealing with a badly written binary DLL, I might need to link dynamically against a particular CRT to get the right versions of Suppose I'm writing pure Rust code, and I decide to allocate memory using |
Thanks for the MSDN link! Definitely quite helpful to see what kind of breakage we're talking about. While good to keep in mind, I don't think it affects the design of this library all that much though. In the worst case it's got a build script which dynamically alters the surface area and API of With respect to the linkage issues, it also seems somewhat orthogonal to the API issue of |
If we just say by definition that libc picks up symbols from whatever CRT is linked in, and rustc controls which CRT is linked in, yes, there isn't really any design impact. There are other possible designs, though: if libc controls which CRT is linked in, that would probably be exposed through cargo features or something like that. We might be able to put that off until later, though. |
I presume this means we won't be using The scope for the Windows version of libc seems reasonable to me. I don't see the loss of Windows bindings as a drawback. With regards to |
To help write "cross platform" code in the sense of trying to avoid unnecessary Currently on Windows the leading underscore is removed and things like |
I'd like the following approach to organization of the public namespace to be considered: The purpose is to make the writer of consumer code conscious of the non-universal availability of those APIs, and facilitate management of downstream OS-specific code with There probably will still be some per-target variation, such as "holes" where particular functions aren't available on certain targets despite being available elsewhere in the wider family. But that kind of organization would help manage most of the differences. |
I expect that manual editing of target-specific definitions will leave room for error, even with the general intent to back it up with tests. To validate the data type definitions it's necessary to test correct functionality of some C API functions that use the data types, which may not be feasible for all APIs without imposing particular requirements on the test environment, such as filesystem, configuration of the network, DNS etc. Testing structure fields, bit flags, or enum constants may need a lot of care to isolate the effect of each definition. A more practical approach in the long term would be to generate the FFI definitions per target from the system C headers, and only edit the public reexports manually. The generated code for all supported targets would still be checked into git, but there should be tools to update it for a specific target when needed. |
That is a bit contradictory: if you are writing pure Rust code, surely you'd be using Rust allocators? :) If some Rust crates have different requirements for linking the CRT, they probably should do it explicitly; there could be a MSVC-specific crate to facilitate that.
The line may be drawn before functions that deal with state managed by the CRT, like |
Yes... if you dynamically link against the CRT, and don't override the linker settings, it all just works out. If you statically link against the CRT, or try to mix different versions of the CRT, you can run into trouble. |
Keep in mind that Rust's allocator doesn't use the CRT allocator, even if jemalloc is disabled. The only option that is always safe is to have the C library provide functions to free the things that it allocates.
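A minimal sketch of that pairing, written from the side of a library that hands out memory. The names `greeting_new` and `greeting_free` are hypothetical. The point is only that the pointer goes back to the same code, and so the same allocator, that produced it.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical library entry point: allocates a NUL-terminated string
/// with whatever allocator this library was built against.
pub extern "C" fn greeting_new() -> *mut c_char {
    CString::new("hello")
        .expect("literal contains no interior NUL")
        .into_raw()
}

/// Companion function: the only correct way to release a pointer that
/// came from `greeting_new`. Callers must not pass it to `free` themselves.
pub unsafe extern "C" fn greeting_free(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Reconstitute the CString so it is dropped by the allocator
        // that created it.
        unsafe { drop(CString::from_raw(ptr)) };
    }
}
```

A caller holds the raw pointer only between the two calls and never needs to know which CRT or allocator sits behind it.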
Thanks for bringing this up! I talked about this a bit in the drawbacks and alternative sections, but it's good to dive in here with more detail to see what it would look like. This is sort of what the library does today (only with standards, not surfaces kinda), so there's at least some precedent with this. I definitely agree that this fits Rust's "conventional platform-specific functionality" pattern better where the main surface area is entirely cross platform where modules then contain the specific functionality per-platform (e.g. In considering this, however, I found that it may not end up solving some of the points in the motivation section. For example:
Overall I ended up personally concluding that this form of organization was one where the cons outweighed the pros, but I'm curious how you think about these topics?
Yeah I expect that we may have to issue updates or fixes to APIs, and the hope is that with the automated testing in place we're at least "as covered as possible" and can hopefully prevent issuing a new major version of
I totally agree! @nagisa mentioned this earlier, and this RFC certainly doesn't preclude the approach of auto-generating FFI definitions and committing them in that form. That kind of one-time operation would be quite useful and could be easily verified with the testing infrastructure set up as well. |
This is also a bit of a maintenance burden on the standard library itself as it means that all the bindings it uses must move to `src/libstd/sys/windows/c.rs` in the immediate future.
Although with a bit of build system finagling (like say, making Rust use Cargo as its build system), you could make it use winapi for those things. 😉
@alexcrichton, I agree that it's unlikely that I'm thinking of segmenting the non-portable names along OS families correlated to the target configuration, where anything in |
I definitely agree that there's no speed bump to writing non-portable code with a flat namespace. I do think, however, that exposing e.g. You may want to take a look at the current organization of the |
This lint warning was originally intended to help against misuse of the old Rust `int` and `uint` types in FFI bindings where the Rust `int` was not equal to the C `int`. This confusion no longer exists (as Rust's types are now `isize` and `usize`), and as a result the need for this lint has become much less over time. Additionally, starting with [the RFC for libc][rfc] it's likely that `isize` and `usize` will be quite common in FFI bindings (e.g. they're the definition of `size_t` and `ssize_t` on many platforms). [rfc]: rust-lang/rfcs#1291 This commit disables these lints to instead consider `isize` and `usize` valid types to have in FFI signatures.
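As an illustration of the kind of signature this affects, here is a small sketch of a binding that uses `usize` where C says `size_t`. It assumes a platform where the two have the same width, which is the case the commit message describes. The wrapper name `c_length` is made up for the example.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

extern "C" {
    // C declares: size_t strlen(const char *s);
    // `usize` stands in for `size_t` here, as on many platforms.
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper used only to exercise the binding above.
pub fn c_length(text: &str) -> usize {
    let owned = CString::new(text).expect("no interior NUL bytes");
    // `owned` outlives the call, so the pointer stays valid.
    unsafe { strlen(owned.as_ptr()) }
}
```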
@alexcrichton for the auto-generated bindings on at least UNIX’es most of the relevant documentation can be pulled down from the manual pages or said manual pages linked to. e.g. something like
Manual pages usually already contain useful availability information (e.g.
) Lists of supported platforms can be pretty reliably generated over a few generations by making generators communicate (by generating lists of bound functions or something along the lines along with the bindings themselves). This would in a sense help somewhat with the fact that documentation pages are usually generated on linux only (and hence are linux specific). |
I am currently opposed to the line |
That sounds promising to me! It'd actually be quite nice if we could just auto-generate all the Linux (and perhaps other platforms) bindings at once and be done with it! As I've mentioned earlier, the question of linking in the CRT I believe is a moot point with respect to this RFC. I certainly agree there are improvements to be made, but it doesn't have much to do with the API of liblibc itself.
Can you clarify more between the cross-platform bit of the RFC:
and this comment:
Well And as you know, on the internals list I brought up Linux LFS and the effects of I think the right answer is for Rust to choose the "best" variant, with 64-bit |
You certainly bring up some excellent points! I agree that for now the "best" variant is probably the one that should be bound by liblibc, and we can perhaps add bindings in the future explicitly to older or different functions. I think one metric could be to compile a C program referencing the APIs in question with the "best set of #define directives" in play and then take a look at the object file and reference those symbol names (and corresponding structures). Consequently this is why we end up with crazy function names on OSX. Does that make sense? Do you think it should be fleshed out a little more? |
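Once a symbol name has been read off the object file, one way to attach it to a binding is the `#[link_name]` attribute. The sketch below shows only the mechanism. `atoi` is a placeholder symbol chosen because every C runtime provides it, and `parse_c_int` and `parse` are hypothetical names. A real binding would use whichever symbol the platform's headers select.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

extern "C" {
    // The Rust-facing name is on the declaration; the attribute names the
    // symbol the linker should resolve. `atoi` is only a placeholder.
    #[link_name = "atoi"]
    fn parse_c_int(s: *const c_char) -> c_int;
}

/// Safe wrapper used only to exercise the binding above.
pub fn parse(text: &str) -> c_int {
    let owned = CString::new(text).expect("no interior NUL bytes");
    unsafe { parse_c_int(owned.as_ptr()) }
}
```

Callers keep using one stable Rust name while the crate decides per target which underlying symbol that name maps to.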
Yes, I'm happy with that approach, thanks! |
I'm not familiar with how rustc operates on Windows, but unless output executables and DLL are all linked with CRT, Perhaps the only place where the CRT DLL name can visibly leak into the Rust build system is the links key for Cargo, so special care needs to be taken there. The nursery project does not currently have |
Consider a C/C++ program, which has already made a decision of which CRT to link. Now, a Rust library is added to the program, which depends on the libc crate. Ideally, libc needs to use whatever version of the CRT the C/C++ program had already decided to use, instead of having libc force the program to make its CRT choice based on libc's choice. |
I assume, in a future revision of the crate, that can be selected at build time by means of configuration features. This detail should not affect the Rust source of crates using One possible case where this could break is when the stable API has implicit dependency on the choice of the CRT version or linkage variant. We probably can set the cutoff on the oldest supported CRT based on what symbols |
As you can see above, I've filed a pull request that documents the equivalence and suggests using the Rust names. You mentioned caveats regarding auto-generated or auto-verified bindings, but I don't think any caveat is necessary. It is trivial for binding generators and verifiers to map between Rust and C names, and such programs should also prefer to use the Rust names. |
Digging through old Rust issues, I found rust-lang/rust#17547 which is about libc. |
The libs team discussed this during triage yesterday, and the decision was to merge, so I will do so. Thanks for the discussion everyone! |
This commit replaces the in-tree liblibc with the [external clone](https://github.com/rust-lang-nursery/libc) which has now evolved beyond the in-tree version in light of its [recent redesign](rust-lang/rfcs#1291). The primary changes here are:

* `src/liblibc/lib.rs` was deleted
* `src/liblibc` is now a submodule pointing at the external repository
* `src/libstd/sys/unix/{c.rs,sync.rs}` were both deleted having all bindings folded into the external liblibc.
* Many ad-hoc `extern` blocks in the standard library were removed in favor of bindings now being in the external liblibc.
* Many functions/types were added to `src/libstd/sys/windows/c.rs`, and the scattered definitions throughout the standard library were consolidated here.

At the API level this commit is **not a breaking change**, although it is only very lightly tested on the *BSD variants and is probably going to break almost all of their builds! Follow-up commits to liblibc should in theory be all that's necessary to get the build working on the *BSDs again.