Tracking Issue for `ascii::Char` (ACP 179) #110998

scottmcm · 2023-04-29T19:59:34Z

Feature gate: #![feature(ascii_char)] #![feature(ascii_char_variants)]

This is a tracking issue for the ascii::Char type from rust-lang/libs-team#179

https://doc.rust-lang.org/nightly/std/ascii/enum.Char.html

Public API

// core::ascii

#[repr(u8)]
enum Char {
    Null = 0,
    …
    Tilde = 127,
}

impl Debug for Char { … }
impl Display for Char { … }
impl Default for Char { ... }

impl Step for Char { ... } // so `Range<Char>` is an Iterator

impl Char {
    const fn from_u8(x: u8) -> Option<Self>;
    const unsafe fn from_u8_unchecked(x: u8) -> Self;
    const fn digit(d: u8) -> Option<Self>;
    const unsafe fn digit_unchecked(d: u8) -> Self;
    const fn as_u8(self) -> u8;
    const fn as_char(self) -> char;
    const fn as_str(&self) -> &str;
}

impl [Char] {
    const fn as_str(&self) -> &str;
    const fn as_bytes(&self) -> &[u8];
}

impl From<Char> for u8 { … }
impl From<Char> for char { … }
impl From<&[Char]> for &str { … }

// core::array

impl<const N: usize> [u8; N] {
    const fn as_ascii(&self) -> Option<&[ascii::Char; N]>;
    const unsafe fn as_ascii_unchecked(&self) -> &[ascii::Char; N];
}

// core::char

impl char {
    const fn as_ascii(&self) -> Option<ascii::Char>;
}

// core::num

impl u8 {
    const fn as_ascii(&self) -> Option<ascii::Char>;
}

// core::slice

impl [u8] {
    const fn as_ascii(&self) -> Option<&[ascii::Char]>;
    const unsafe fn as_ascii_unchecked(&self) -> &[ascii::Char];
}

// core::str

impl str {
    const fn as_ascii(&self) -> Option<&[ascii::Char]>;
}

Steps / History

Unresolved Questions

What should it be named? Code mixing char and Char might be too confusing.
How should its Debug impl work?
- make Debug impl for ascii::Char match that of char #115434
Is allowing as-casting from it a good or bad feature?
- FWIW, there's no char::to_u32, just as u32 for it.
Some of the as_ascii methods take &self for consistency with is_ascii. Should they take self instead where possible, as the usually-better option, or stick with &self for the consistency?

https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html ↩

The text was updated successfully, but these errors were encountered:

BurntSushi · 2023-05-02T14:17:04Z

I guess I'll kick off the discussion about how to actually define the ascii::Char type. Right now, it's an enum like this:

#[repr(u8)]
enum Char {
    Null = 0,
    …
    Tilde = 127,
}

I'd like to first make sure I understand the pros of using this representation. Do I have them right?

It provides a niche optimization such that Option<ascii::Char> is the same size as ascii::Char.
It provides a way to write ch as u8 where ch has type ascii::Char.
For "free," you can get cute and write the names of ASCII characters instead of their visual or numerical representation. (I don't mean to trivialize this by calling it "cute." I think it's a nice benefit, especially for non-printable characters.)

Are there more? Am I mistaken about the above?

And now, the mitigations:

I think the niche optimization can still be obtained in this case because std can use unstable features? And I think there's a way to say, "make this specific value the niche" without any other additional cost.
We can provide a ascii::Char::as_u8() method. (Which is already done in the implementation PR.) Is there anything else that being able to write ch as u8 buys us?
I don't think there is a mitigation here. It feels like a "nice to have" aspect of using an enum that we probably wouldn't go out of our way to re-create via other means.

In terms of negatives, why not do the enum in the first place? Good question. I'm not entirely sure. It feels to me like it stresses my weirdness budget. It's also a generally much bigger enum than most, and I wonder whether that will provoke any surprises in places. Maybe not.

ChayimFriedman2 · 2023-05-04T06:00:15Z

Do we need both u8::as_ascii() and ascii::Char::from_u8()?

Add `ascii::Char` (ACP#179) ACP second: rust-lang/libs-team#179 (comment) New tracking issue: rust-lang#110998 For now this is an `enum` as `@kupiakos` [suggested](rust-lang/libs-team#179 (comment)), with the variants under a different feature flag. There's lots more things that could be added here, and place for further doc updates, but this seems like a plausible starting point PR. I've gone through and put an `as_ascii` next to every `is_ascii`: on `u8`, `char`, `[u8]`, and `str`. As a demonstration, made a commit updating some formatting code to use this: scottmcm@ascii-char-in-fmt (I don't want to include that in this PR, though, because that brings in perf questions that don't exist if this is just adding new unstable APIs.)

scottmcm · 2023-05-04T23:31:00Z

Regarding the enum, @kupiakos said the following in

The names or properties of ASCII characters will never change and there aren't many of them, and so we might as well expose them as variants that can participate in pattern matching.

And mentioned icu4x's enum,

https://github.com/unicode-org/icu4x/blob/b6c4018a736e79790898c5b91ff3ab25d33192c2/utils/tinystr/src/asciibyte.rs#L8-L11

Though given those names it's possible it's just doing that for the niche and not for direct use.

(Man I wish we had private variants as a possibility.)

To your questions, @BurntSushi ,

Yeah, there's at least three other possibilites here: a) put the enum in a newtype (like ptr::Alignment) to still get the niches without exposing the variants b) use the rustc-internal magic c) just don't get a niche for now since that's not needed for its core scenario of convertibility to UTF-8 (though it's certainly nice to have)
Given that we're overall not fond of as casts, and it also allows weirder things like as i64, I actually wish the cast could be disabled, and have people use as_u8 as you mention. But that might also just happen by deprecating or warning on these casts, and eventually having ascii::Char: AsRepr<Repr = u8> does seem reasonable.
Now that Refactor core::char::EscapeDefault and co. structures #105076 has landed, I've been thinking of redoing the internals of those using ascii::Char to try out how it feels using the names. That would mean code like

            '\0' => EscapeDebug::backslash(Null),
            '\r' => EscapeDebug::backslash(CarriageReturn),
            '\n' => EscapeDebug::backslash(LineFeed ),

because without the variants it ends up being more like

            '\0' => EscapeDebug::backslash(b'\0'.as_ascii().unwrap()),
            '\r' => EscapeDebug::backslash(b'\r'.as_ascii().unwrap()),
            '\n' => EscapeDebug::backslash(b'\n'.as_ascii().unwrap()),

which is nice in that it keeps the escape sequences lining up, but having to copy-paste .as_ascii().unwrap() all over like that doesn't fill me with joy.

I guess this is yet another situation in which I'd like custom literals. Or something like a'\n', I guess, but that's way too far a jump to propose right now.

EDIT a while later: The PR to update EscapeDebug like that is #111524

Constify `[u8]::is_ascii` (unstably) UTF-8 checking in `const fn`-stabilized back in 1.63 (rust-lang#97367), but apparently somehow ASCII checking was never const-ified, despite being simpler. New constness-tracking issue for `is_ascii`: rust-lang#111090 I noticed this working on `ascii::Char`: rust-lang#110998

clarfonthey · 2023-05-13T19:16:35Z

I wasn't in the discussion when this type was created (which I fully support), but is there a reason to explicitly use an enum instead of a newtype + internal niche attribute like NonZeroU8?

If we decide to go the pattern types route (#107606) then NonZeroU8 could become u8 is 1.. and this could become u8 as ..128. Alternatively, if we go the generic integers route (#2581) then this could become uint<7> or just u7.

IMHO, explicitly going with an enum kind of prevents these kinds of optimisations later down the road, since it forces an enum as the representation.

BurntSushi · 2023-05-13T19:26:08Z

@clarfonthey I think the discussion above answers your question about why we went with the enum for now. In summary, I think its principle upside is that it gives nice names to each character. Not particularly compelling, but nice.

My general feeling is that nobody is stuck on using an enum here. My best guess is that unless something compelling comes up, I wouldn't be surprised if we changed to an opaque representation such that we could change to an enum later in a compatible manner. (I think that's possible?) Purely because it's the conservative route.

I don't grok the second two paragraphs of your comment. Probably because I don't know what "pattern types" are. I also don't know what uint<7> means. (Don't have time to follow your links at the moment.)

clarfonthey · 2023-05-13T19:52:25Z

I mean, having nice names is still achievable with constants, and this still would make match expressions work as you'd expect. Personally, the benefits I'm seeing here are the ability to have more-performant code without the need for extra unsafe code, since any slice of ASCII characters is automatically valid UTF-8. Otherwise, a type invariant would mostly be unnecessary, and you could just have constants for the names.

Explaining the two proposals:

Pattern types just allow you to write T is pat which is a new type that means T, but only values which match pat. So, Option<T> is Some(_) means "options which are some" and u8 is 1.. is effectively NonZeroU8.
Generic integers is a separate proposal which "converts" all the existing uN and iN types into aliases for a more generic uint<N> and int<N> type, which has the benefit of allowing non-power-of-two sizes and sizes larger than 128. The idea behind uint<7> is that it would be equivalent to u8 internally with the first bit always zero, although still allow all the normal operations on integers like pattern-matching and be castable to u8.

The pattern type version is very enticing because it very naturally allows exhaustive matching, while still just being a compile-time constraint. And like I said, we could still have constants for the different names for characters without really affecting things.

BurntSushi · 2023-05-13T20:12:03Z

having nice names is still achievable

Yes, as I said:

I don't think there is a mitigation here. It feels like a "nice to have" aspect of using an enum that we probably wouldn't go out of our way to re-create via other means.

I don't know what the stabilization story is for pattern types or generic integers, but I would prefer we focus on finding a path to stabilization that doesn't require being blocked on hypothetical language features. Or at least, focusing discussion on whether we can utilize them later in a backwards compatible fashion.

I still don't really know what pattern types are (I just don't have time right now to grok it), but generic integer types seem like an implementation detail that's consistent with defining ascii::Char as an opaque type?

clarfonthey · 2023-05-13T20:30:33Z

Right, the point here is that any opaque type would be equivalent to the primary hypothetical future implementations. So, explicitly not blocking on any particular path forward, but making sure that they're compatible.

scottmcm · 2023-05-13T20:56:51Z

I mean, having nice names is still achievable with constants, and this still would make match expressions work as you'd expect.

Though you can use a variant, but can't use an associated constant. Thus use ascii::Char::*; works with an enum, but not with constants for the names.

I agree that this could absolutely be done with the magic attributes. But is there anything in particular that you think is better with that approach?

The enum version still gets the niche, notably:

[src/main.rs:3] std::alloc::Layout::new::<Option<Option<Option<Option<Option<std::ascii::Char>>>>>>() = Layout {
    size: 1,
    align: 1 (1 << 0),
}

clarfonthey · 2023-05-14T06:32:57Z

I will admit that the lack of an ability to use associated constants is an immediate downgrade, although I do think that'll eventually be allowed to eventually deprecate the std::fN::consts modules.

As far as benefits -- the main ones I see are the ability to unify this type with any one of the potential future features I mentioned. If we stabilise an enum now, we are explicitly locked out of this.

That said, there is precedent for separating out the char type from the other integer types, and I guess that ascii::Char fits right in with that. I'm just not convinced that locking in this as an enum is worth losing the benefits we could get later, especially considering how it could be converted from an opaque type into an enum publicly later if we eventually decide that is better than the alternatives.

scottmcm · 2023-05-14T06:37:50Z

I think it's important that this stay a separate type. To me, it's an advantage that this not just be a u7, but that it have the semantic intent behind it of actually being ASCII.

For example, if we have pattern types, I don't want to impl Display for &[u8 is ..128], but I think it's possible that we'd add impl Display for &[ascii::Char]. (Like how Debug for char is very different from Debug for a hypothetical u32 is ..=1114111.)

Should it be something else to be able to stabilize a subset sooner? Maybe; I don't know. I think I'll leave questions like that more to libs-api.

clarfonthey · 2023-05-14T14:34:11Z

That's fair, and in that case I'll concede that the enum works in this case. I still am not sure if it's the best-performing version (I use the bounded-integer crate for all sorts of things, and it uses enums as it's codegen) but I feel convinced enough that we can stabilise this as a separate type.

`ascii::Char`-ify the escaping code in `core` This means that `EscapeIterInner::as_str` no longer needs unsafe code, because the type system ensures the internal buffer is only ASCII, and thus valid UTF-8. Come to think of it, this also gives it a (non-guaranteed) niche. cc `@BurntSushi` as potentially interested `ascii::Char` tracking issue: rust-lang#110998

safinaskar · 2023-05-22T18:42:14Z

I don't like name Char. It is too similar to char, despite these are totally different things. I propose ASCII instead

safinaskar · 2023-05-22T18:46:35Z

If you don't like name ASCII, then name it Septet, but, please, not Char

BurntSushi · 2023-05-22T18:50:29Z

I don't think ASCII or Septet are going to work. I can't imagine those being the names we end up with personally.

ascii::Char and char are not totally different things. They both represent abstract characters. One is just a more limited definition than the other, but both are in fact quite limited! ascii::Char is namespaced under the ascii module, which provides pretty adequate context for what kind of character it represents IMO.

clarfonthey · 2023-05-22T21:42:51Z

Honestly, the name Ascii for a type would make sense since it lives in the ascii module, since that would mirror existing other Rust types. It doesn't seem out of place to refer to an ASCII character as "an Ascii" especially considering how there are already lots of terms that use "ASCII" as a placeholder for "text," as in stuff like "ASCII art." Nowadays, people even refer to Unicode art as ASCII art.

In terms of precedent for the ascii::Char name, I would point toward the various Result types in modules offered across libstd and the ecosystem, where io::Result and fmt::Result are two common examples. While the base Result is in prelude, these separate Result types are in separate modules and often used exclusively as module-specific imports, although in rare cases code will directly import them. I don't think this is a good analogy, however, since ascii::Char and char actually are completely different types with entirely different properties, and not just shorthands for each other, even though you can convert one into the other.

The only real downside I can see to using the name Ascii is that it's converting an acronym into a a word and lowercasing most of the letters, which… didn't stop us for stuff like Arc and Rc and OsStr.

Personally, I've thought that the core::ascii module has been mostly useless until the addition of this character now that AsciiExt is deprecated, although revitalising it to introduce a type called Ascii would make the most sense to me. Plus, it would even lead a path forward to potentially adding this type to prelude if it were considered useful there, or maybe the preludes of other crates that rely on it.

BurntSushi · 2023-05-22T22:05:20Z

Using Ascii as a singular noun sounds very strange to me. "ASCII art" is using "ASCII" as an adjective. I'm not aware of any wide usage of "Ascii" as a singular noun referring to a single character. Its usage as a noun, in my experience, refers to a character set and/or encoding. Not a member of that set/encoding.

clarfonthey · 2023-05-22T22:28:25Z

I mean, it does feel weird, but then again, so does calling a vector a "vec" until you say it enough times. If you think of "ASCII" as shorthand for "ASCII character", I don't think that's too far-fetched.

Oh, and even "char" is an abbreviation.

kupiakos · 2023-06-10T04:06:50Z

2. Given that we're overall not fond of as casts, and it also allows weirder things like as i64, I actually wish the cast could be disabled

I see no issue with this; ascii::Char, or whatever it's called, can be fully represented by every numeric type except bool. The issue with as casts changing the value is irrelevant for this case.

I mean, it does feel weird

This message is composed of 420 ASCIIs.

bluebear94 · 2023-06-30T05:59:45Z

I also support calling this type Ascii because calling it Char is likely to give worse diagnostics if you mix up char and Char. The Char name would also result in confusion to readers unless you qualify it as ascii::Char, which is more verbose than Ascii.

Also, rustdoc currently doesn’t play well with types that are exported under a different name from their declaration; for instance, this method shows the return type as AsciiChar, which is the name used in the declaration of this type.

(Note: I dislike the convention of having multiple types named Error and Result in the standard library and would rather have had FmtError, IoError, and so on, so my opinions might be biased.)

scottmcm · 2024-02-22T20:37:42Z

With NonZero<T> in FCP (#120257 (comment)), I was inspired by it and jake's previous comment to come back to this and propose something that I think might be reasonable.

We're about to stabilize

pub struct NonZero<T>(/* private fields */)
where T: ZeroablePrimitive;

where ZeroablePrimitive is a forever-internal implementation detail, but there's always a get: NonZero<T> -> T.

So here we could do something similar:

pub struct Ascii<T: ?Sized>(/* private fields */)
where T: SupersetOfAscii;

Then for T: Sized we'd have the same get: Ascii<T> -> T, but we'd also have for everything a as_str: &Ascii<T> -> &str as well as allowing it to deref (perhaps indirectly) from &Ascii<T> to &T.

So internally that might be

pub struct Ascii<T: ?Sized + SupersetOfAscii>(<T as SupersetOfAscii>::StorageType);

Then we could implement that trait for various types -- u8 and u32 where the internal StorageType is a private thing with #[rustc_layout_scalar_valid_range_end(0x7F)], but also implement it for [u8; N] and [u8] and such to cover more general things.

(That would allow Ascii<u32>: AsRef<char> + AsRef<&str> for example, since you can get references to either from it. Might not be worth bothering having that, though, since I've never seen anything that cares about AsRef<char>.)

Thoughts, @BurntSushi ? Would you like to see that as a PR or an ACP (or nothing)?

programmerjake · 2024-02-22T20:57:34Z

If we have Ascii<T: ?Sized>: Deref<Target = T>, I think we'll need both Ascii<str> and Ascii<[u8]> as well as the corresponding owned types. We should have Ascii<[u8]>: AsRef<Ascii<str>> and visa versa, and other corresponding conversions for owned/borrowed types.

scottmcm · 2024-02-22T21:03:52Z

On Deref: yeah, I haven't quite thought this one through all the way. I added the "(perhaps indirectly)" parenthetical to try to add some space for it -- like maybe we don't always have Ascii<T: ?Sized>: Deref<Target = T> because we deref the ascii one to something else that then derefs to the original thing.

But thinking about it more as I type, maybe it's just a bad idea. We don't have &String -> &Vec<u8> as a deref coercion -- not even indirectly -- so maybe trying to have it here would be wrong too.

Maybe I should propose Ascii<T: ?Sized>: AsRef<T> as the general thing instead, since that we can definitely do, and we'll be more limited in which things we allow to Deref at all.

Kimundi · 2024-02-25T17:24:31Z

Heh, that actually reminds me of the API I came up with for a toy hex formatting crate of me: https://docs.rs/easy-hex/1.0.0/easy_hex/struct.Hex.html

Basically, the whole api resolves around being able to cast T to/from Hex<T> just to get changed trait impl semantic. In my case its just a repr(transparent) wrapper, so I do not change representation, but still the idea seems similar.

That said, I feel like a basic AsciiChar type is still the best course of action, otherwise it seems like the whole API discussion here has to be started from scratch :D

For the [AsciiChar] does not implement Debug ruight problem, could we maybe provide a struct AsciiSlice([AschiiChar]);, and just make it easy to convert to/from the basic slice type? I could image that becoming useful for more things than just the debug formatting impl.

jongiddy · 2024-04-25T09:40:56Z

Will it be possible to backwards-compatibly redefine [u8]::escape_ascii to return an Iterator<Item=AsciiChar> instead of the current Iterator<Item=u8>?

Currently the returned iterator provides a to_string() method that collects the characters into a string, but if any other iterators are chained, we lose the info that the bytes are safe to collect into a String.

programmerjake · 2024-04-25T10:33:16Z

Will it be possible to backwards-compatibly redefine [u8]::escape_ascii to return an Iterator<Item=AsciiChar> instead of the current Iterator<Item=u8>?

yes, but likely only if rust implements something like edition-dependent name resolution where the same name resolves to two different escape_ascii functions depending on the edition.

I previously proposed something like that: https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/Effect.20of.20fma.20disabled/near/274199236

clarfonthey · 2024-05-01T16:15:56Z

This is a random thought I had, but seeing the progress on this makes me wonder if we could make char itself implemented as a lang item struct instead of a primitive in the language without breaking anything.

Figuring out what would have to be done so avoid any breakages there could probably be useful here, since they would also need to be applied to ASCII chars too.

scottmcm · 2024-05-01T16:19:52Z

@clarfonthey Practically I think no, because char has exhaustive matching for its particular disjoint range, which isn't something exposable today.

Maybe eventually, with fancy enough matching-transparent pattern types, but I'm not sure it'd ever really be worth it. (For something like str, sure, but char has so much special stuff that I'm skeptical.)

clarfonthey · 2024-06-19T21:30:59Z

Getting back to this feature a bit. Genuine question: instead of the common named variants, now that the precedent has been set for CStr having dedicated c"..." literals, would it be reasonable to add a'x' and a"..." literals? This could also solve the issue with ascii_char_variants being unstable, preferring that you use the literals instead of the variants.

Not sure if implementing this as a POC in unstable would require an RFC or not.

leb-kuchen · 2024-06-29T10:26:44Z

is it possible to hide the variants in the sidebar of the documentation or even better the entire section?

core: optimise Debug impl for ascii::Char Rather than writing character at a time, optimise Debug implementation for core::ascii::Char such that it writes the entire representation with a single write_str call. With that, add tests for Display and Debug. Issue: rust-lang#110998

Rollup merge of rust-lang#120314 - mina86:i, r=Mark-Simulacrum core: optimise Debug impl for ascii::Char Rather than writing character at a time, optimise Debug implementation for core::ascii::Char such that it writes the entire representation with a single write_str call. With that, add tests for Display and Debug. Issue: rust-lang#110998

core: optimise Debug impl for ascii::Char Rather than writing character at a time, optimise Debug implementation for core::ascii::Char such that it writes the entire representation with a single write_str call. With that, add tests for Display and Debug. Issue: rust-lang/rust#110998

NathanielHardesty · 2024-09-20T19:11:47Z

Re: ASCII Literals

As an end user of Rust who wants good ASCII support, I believe the best way to improve this feature is to add literals such as a'A' and a"ASCII" as others have suggested. Doing fancy manipulations on strings is far less important (to me anyways) than just being able to write strings so that they can be stored in structs or passed into functions.

Currently, the best solution to arriving at the ASCII subset is this:

[Char::SmallA, Char::SmallS, Char::SmallC, Char::SmallI, Char::SmallI]

Being forced to write out all of the Char variants just to arrive at a [Char] is cumbersome and it seems as though these variants are an unstable feature unto themselves which is best avoided.

On the other hand, byte string literals can be used in this manner:

b"ascii".as_ascii().unwrap()
unsafe {b"ascii".as_ascii_unchecked()}

This forces me to use runtime checks, or simply cross my fingers, rather than use compile time checks in order to assure that the ASCII subset is not being violated. I'm perfectly fine with doing runtime checks when I'm parsing what very well could be arbitrary data, but source code is not arbitrary and violations should be able to be caught at compile time.

ChayimFriedman2 · 2024-09-21T17:13:03Z

@NathanielHardesty If the functions for constructing &[AsciiChar] from &[u8] are const, you can create a macro for it (it's still worse than compiler support, but not by much).

Marcondiro · 2024-09-25T08:21:59Z

Hello, do you plan to add a char::to_digit(self, radix: u32) equivalent?
This method was proposed in #95969 / #96110 for u8 but was rejected.
Thanks!

NobodyXu · 2024-09-25T10:39:39Z

@Marcondiro you will need to open an issue in rust-lang/libs-team, which is called an ACP.

The libs-api team would then provide you with feedback there.

scottmcm · 2024-09-26T01:49:16Z

Interestingly, it looks like char::from(a).to_digit(10) actually optimizes well: https://rust.godbolt.org/z/K5nebha3a

I had been about to say "just send a PR since it's something char has", but then I looked at the to_digit signature, and now I want an ACP to discuss u32 vs some other width for the argument and the return :/

CodesInChaos · 2024-11-18T17:17:49Z

Why do the fallible functions return Option instead of Result<T, AsciiError>? Returning Result would be consistent with utf8 conversion, and would produce a reasonable error when unwrapping. Plus TryFrom already needs an error type.
Would it make sense to add a family of const fn as_ascii_mut[_unchecked](&mut [u8]) -> Option<&mut [ascii::Char]> functions (for u8, [u8], str, [u8; N]) or is TryFrom enough?
Or should some of the conversion functions proposed in the original post be removed in favour of TryFrom?
NonZero<ascii::Char> support could be useful since it can be infallibly converted to CString.
Concerning naming the bikeshed, I like Ascii. It's short, and meaningful even outside the module context (e.g if it becomes part of the prelude at some point). But I'd be fine with AsciiChar as well.

zopsicle · 2024-11-18T18:59:42Z

@CodesInChaos &[NonZero<ascii::Char>] cannot be converted to &CStr because there is no terminating nul. It can be infallibly converted to CString though.

clarfonthey · 2024-11-18T20:55:58Z

NonZero<ascii::Char> actually could be implemented with just a library change, although I'm not sure whether libs-api would rather make that go through ACP first. I would imagine that, at least this being the first non-primitive type to support NonZero, it would be nice to ensure that this precedent is okay.

clarfonthey · 2024-11-18T20:57:19Z

On the other hand, byte string literals can be used in this manner:
* `b"ascii".as_ascii().unwrap()`

* `unsafe {b"ascii".as_ascii_unchecked()}`
This forces me to use runtime checks, or simply cross my fingers, rather than use compile time checks in order to assure that the ASCII subset is not being violated. I'm perfectly fine with doing runtime checks when I'm parsing what very well could be arbitrary data, but source code is not arbitrary and violations should be able to be caught at compile time.

It's worth adding that, if these were available in const context, you could verify they're valid at compile-time by wrapping them in a const block. Although I agree that adding literals is still a good idea.

nnethercote · 2024-11-19T05:03:15Z

A drive-by bikeshed comment that I don't think has been made before: UpperCaseA and LowerCaseA would be preferable to CapitalA and SmallA, for consistency with char::is_{upper,lower}_case, and because they are just clearer names.

ChrisDenton · 2024-11-19T07:32:56Z

I don't think these names are ever intended to be stable. IIRC using an enum is a bit of a hack to get all the niches. The intent is for it to appear more like char to users.

That said, the official Unicode™ name is "Latin Capital letter A"

scottmcm added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Apr 29, 2023

This was referenced Apr 30, 2023

Add ascii::Char (ACP#179) #111009

Merged

Add an "ascii character" type to reduce unsafe needs in conversions rust-lang/libs-team#179

Closed

scottmcm mentioned this issue May 4, 2023

Constify [u8]::is_ascii (unstably) #111222

Merged

scottmcm mentioned this issue May 13, 2023

ascii::Char-ify the escaping code in core #111524

Merged

This comment was marked as resolved.

Sign in to view

scottmcm mentioned this issue Mar 13, 2024

Add str::reverse method for in place string reversal rust-lang/libs-team#353

Closed

NightEule5 mentioned this issue Aug 15, 2024

Add read/write_ascii NightEule5/data-streams#20

Closed

ChaiTRex mentioned this issue Aug 21, 2024

Use assert_unsafe_precondition! in AsciiChar::digit_unchecked #129374

Merged

Tracking Issue for ascii::Char (ACP 179) #110998

Tracking Issue for ascii::Char (ACP 179) #110998

Comments

scottmcm commented Apr 29, 2023 • edited by Dylan-DPC Loading

Public API

Steps / History

Unresolved Questions

Footnotes

BurntSushi commented May 2, 2023

ChayimFriedman2 commented May 4, 2023

scottmcm commented May 4, 2023 • edited Loading

clarfonthey commented May 13, 2023

BurntSushi commented May 13, 2023

clarfonthey commented May 13, 2023 • edited Loading

BurntSushi commented May 13, 2023

clarfonthey commented May 13, 2023

scottmcm commented May 13, 2023

clarfonthey commented May 14, 2023

scottmcm commented May 14, 2023

clarfonthey commented May 14, 2023

safinaskar commented May 22, 2023

safinaskar commented May 22, 2023

BurntSushi commented May 22, 2023

clarfonthey commented May 22, 2023

BurntSushi commented May 22, 2023

clarfonthey commented May 22, 2023 • edited Loading

kupiakos commented Jun 10, 2023

bluebear94 commented Jun 30, 2023

scottmcm commented Feb 22, 2024

This comment was marked as resolved.

This comment was marked as resolved.

programmerjake commented Feb 22, 2024 • edited Loading

scottmcm commented Feb 22, 2024

Kimundi commented Feb 25, 2024 • edited Loading

jongiddy commented Apr 25, 2024

programmerjake commented Apr 25, 2024

clarfonthey commented May 1, 2024

scottmcm commented May 1, 2024

clarfonthey commented Jun 19, 2024

leb-kuchen commented Jun 29, 2024

NathanielHardesty commented Sep 20, 2024

ChayimFriedman2 commented Sep 21, 2024

Marcondiro commented Sep 25, 2024

NobodyXu commented Sep 25, 2024

scottmcm commented Sep 26, 2024

CodesInChaos commented Nov 18, 2024 • edited Loading

zopsicle commented Nov 18, 2024

clarfonthey commented Nov 18, 2024

clarfonthey commented Nov 18, 2024

nnethercote commented Nov 19, 2024

ChrisDenton commented Nov 19, 2024

Tracking Issue for `ascii::Char` (ACP 179) #110998

Tracking Issue for `ascii::Char` (ACP 179) #110998

scottmcm commented Apr 29, 2023 •

edited by Dylan-DPC

Loading

scottmcm commented May 4, 2023 •

edited

Loading

clarfonthey commented May 13, 2023 •

edited

Loading

clarfonthey commented May 22, 2023 •

edited

Loading

programmerjake commented Feb 22, 2024 •

edited

Loading

Kimundi commented Feb 25, 2024 •

edited

Loading

CodesInChaos commented Nov 18, 2024 •

edited

Loading