Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ascii::Char (ACP#179) #111009

Merged
merged 1 commit into from
May 4, 2023
Merged

Add ascii::Char (ACP#179) #111009

merged 1 commit into from
May 4, 2023

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented Apr 30, 2023

ACP second: rust-lang/libs-team#179 (comment)
New tracking issue: #110998

For now this is an enum as @kupiakos suggested, with the variants under a different feature flag.

There's lots more things that could be added here, and place for further doc updates, but this seems like a plausible starting point PR.

I've gone through and put an as_ascii next to every is_ascii: on u8, char, [u8], and str.

As a demonstration, made a commit updating some formatting code to use this: scottmcm@ascii-char-in-fmt (I don't want to include that in this PR, though, because that brings in perf questions that don't exist if this is just adding new unstable APIs.)

@rustbot
Copy link
Collaborator

rustbot commented Apr 30, 2023

r? @cuviper

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 30, 2023
@rustbot
Copy link
Collaborator

rustbot commented Apr 30, 2023

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-log-analyzer

This comment has been minimized.

@pitaj
Copy link
Contributor

pitaj commented Apr 30, 2023

Should ascii::Char implement TryFrom<u8> and TryFrom<char>?

@scottmcm
Copy link
Member Author

@pitaj Eventually that seems perfectly reasonable. But that needs an error type, so I left them out of the first PR.

(We can always add more later, especially while it's unstable.)

#[inline]
pub const fn as_char(self) -> char {
// SAFETY: Everything in the ASCII block is a valid USV.
unsafe { char::from_u32_unchecked(self as u32) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unsafe block can be avoided by doing char::from(self.as_u8()).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I should think more about my copy-pastes ;)

Updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait, const. No char::from in a const fn. Changed this to a FIXME instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as casting works however:

Suggested change
unsafe { char::from_u32_unchecked(self as u32) }
self.as_u8() as char

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, yet another problem with as: I forget it works and it's not something with a signature in rustdoc to remind me 😬

#[must_use]
#[unstable(feature = "ascii_char", issue = "110998")]
#[inline]
pub const fn as_ascii(&self) -> Option<ascii::Char> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this take &self and not self?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that self would make more sense, but I think that for the existing is_ascii above too.

So I made this &self for consistency with the existing methods here. I'll add a note to the tracking issue to make sure that it's discussed further before stabilization.

#[must_use]
#[unstable(feature = "ascii_char", issue = "110998")]
#[inline]
pub const fn as_ascii(&self) -> Option<ascii::Char> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@rust-log-analyzer

This comment has been minimized.

use crate::mem::transmute;

/// One of the 128 Unicode characters from U+0000 through U+007F,
/// often known as the [ASCII]subset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// often known as the [ASCII]subset.
/// often known as the [ASCII] subset.

#[inline]
pub const fn as_char(self) -> char {
// SAFETY: Everything in the ASCII block is a valid USV.
unsafe { char::from_u32_unchecked(self as u32) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as casting works however:

Suggested change
unsafe { char::from_u32_unchecked(self as u32) }
self.as_u8() as char

/// or returns `None` if it's too large.
#[unstable(feature = "ascii_char", issue = "110998")]
#[inline]
pub const fn from_u8(b: u8) -> Option<Self> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshedding incoming: Why not call the methods from_byte/as_byte? There is no stable precedent in the standard library for including the numeric type name in methods and from_byte is IMHO much simpler to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is, actually: https://doc.rust-lang.org/std/primitive.char.html#method.from_u32

So since it's char::from_u32, I figured that ascii::Char::from_u8 was the most consistent here.

But drop a note in the tracking issue if you feel strongly. I think that's a better place for bikeshedding discussions, so they'll be seen at (eventual) stabilization time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally missed that one! Seems very reasonable to rhyme with it. Bikeshedding complete 😉.

//! because it avoids a whole bunch of "are you sure you didn't mean `char`?"
//! suggestions from rustc if you get anything slightly wrong in here, and overall
//! helps with clarity as we're also referring to `char` intentionally in here.
//! Maybe that means that the stuttering name would be better for the export too?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm conflicted on this personally too. I agree with the decision at least in terms of this module internally, but I would personally very much like to just use use std::ascii; and ascii::Char instead of having to write out use std::ascii::AsciiChar;. It's an interesting stylistic point.

/// U+007F
#[unstable(feature = "ascii_char_variants", issue = "110998")]
Delete = 127,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with landing it like this, but I find that an enum like this is super weird. I'll post more of a comment on the tracking issue because I don't think I fully understand the trade offs here yet.

/// [block]: https://www.unicode.org/glossary/index.html#block
/// [chart]: https://www.unicode.org/charts/PDF/U0000.pdf
/// [NIST FIPS 1-2]: https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub1-2-1977.pdf
/// [NamesList]: https://www.unicode.org/Public/15.0.0/ucd/NamesList.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to do it now, but I would like to see a "Why would I use this?" section before stabilization. :-)

library/core/src/ascii/ascii_char.rs Show resolved Hide resolved
@scottmcm scottmcm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 3, 2023
#[inline]
pub const unsafe fn as_ascii_unchecked(&self) -> &[ascii::Char; N] {
let byte_ptr: *const [u8; N] = self;
let ascii_ptr = byte_ptr as *const [ascii::Char; N];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we can use cast(), right? We can also use the newly added ptr::from_ref().

@scottmcm
Copy link
Member Author

scottmcm commented May 4, 2023

I think this is ready for review for nightly now:
@rustbot ready

There's lots more to do here before it'd be ready for stabilization, but this is already 700+ lines, so I'd like to move those things (like putting examples on everything) into later PRs.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 4, 2023
@BurntSushi
Copy link
Member

Ship it. @bors r+

@bors
Copy link
Contributor

bors commented May 4, 2023

📌 Commit 8c781b0 has been approved by BurntSushi

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 4, 2023
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label May 4, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request May 4, 2023
…iaskrgr

Rollup of 8 pull requests

Successful merges:

 - rust-lang#108865 (Add a `sysroot` crate to represent the standard library crates)
 - rust-lang#110651 (libtest: include test output in junit xml reports)
 - rust-lang#110826 (Make PlaceMention a non-mutating use.)
 - rust-lang#110982 (Do not recurse into const generic args when resolving self lifetime elision.)
 - rust-lang#111009 (Add `ascii::Char` (ACP#179))
 - rust-lang#111100 (check array type of repeat exprs is wf)
 - rust-lang#111186 (Add `is_positive` method for signed non-zero integers.)
 - rust-lang#111201 (bootstrap: add .gitmodules to the sources)

Failed merges:

 - rust-lang#110954 (Reject borrows of projections in ConstProp.)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit ea0b650 into rust-lang:master May 4, 2023
@rustbot rustbot added this to the 1.71.0 milestone May 4, 2023
@scottmcm scottmcm deleted the ascii-char branch May 4, 2023 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants