-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a generic string pattern matching API #528
Conversation
impl<'a, 'b> Pattern<'a> for &'b str { /* ... */ } | ||
|
||
impl<'a, 'b> Pattern<'a> for &'b [char] { /* ... */ } | ||
impl<'a, F> Pattern<'a> for F where F: FnOnce(char) -> bool { /* ... */ } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would have to be FnMut
, not FnOnce
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, yeah.
implementations, as the different results would break the requirement for double ended iterators | ||
to behave like a double ended queues where you just pop elements from both sides. | ||
|
||
_However_, all iterators will still implement `DoubleEndedIterator` if the underling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/underling/underlying/
I wanted to thank @Kimundi for writing this up -- it's been a long process getting to this stage. I really like this direction. Not only does it clean up the API (by avoiding The only hesitation I have is around performance, but I'd say we can move forward with with a prototype at least, and try to get numbers early in the process to ensure that there's little, if any, regression. |
I like the direction this is going, and I think the pattern of putting something we think we might deprecate in the future in an opt-in trait is a nice way to give us expandability in the future. My first reaction to this is if it'd be possible to broaden this to also support Also, to bikeshed a tiny bit, I'd prefer |
Also bear in mind that associated types are unusable at present; notably, trait constraints don’t work. |
That... is an excellent point. While @Kimundi, what do you think of the following migration plan. We stabilize the full set of methods on That would also relieve some of the pressure to get this done in the alpha timeframe; we have plenty else on our plate. |
(Note, this plan banks on getting bounds done by beta1.) |
In more detail, I think we could do something like the following, where the #[stable]
fn contains(&self, needle: &str) -> bool;
#[unstable]
fn contains_char(&self, needle: char) -> bool;
#[stable]
fn split<Sep: CharEq>(&'a self, sep: Sep) -> CharSplits<'a, Sep>;
#[stable]
fn splitn<Sep: CharEq>(&'a self, count: uint, sep: Sep) -> CharSplitsN<'a, Sep>;
#[stable]
fn rsplitn<Sep: CharEq>(&'a self, count: uint, sep: Sep) -> CharSplitsN<'a, Sep>;
#[stable]
fn split_terminator<Sep: CharEq>(&'a self, sep: Sep) -> CharSplits<'a, Sep>;
#[stable]
fn match_indices(&'a self, sep: &'a str) -> MatchIndices<'a>;
#[unstable]
fn split_str(&'a self, &'a str) -> StrSplits<'a>;
#[stable]
fn starts_with(&self, needle: &str) -> bool;
#[stable]
fn ends_with(&self, needle: &str) -> bool;
#[unstable]
fn trim_chars<C: CharEq>(&'a self, to_trim: C) -> &'a str;
#[unstable]
fn trim_left_chars<C: CharEq>(&'a self, to_trim: C) -> &'a str;
#[unstable]
fn trim_right_chars<C: CharEq>(&'a self, to_trim: C) -> &'a str;
#[stable]
fn find<C: CharEq>(&self, search: C) -> Option<uint>;
#[stable]
fn rfind<C: CharEq>(&self, search: C) -> Option<uint>;
#[unstable]
fn find_str(&self, &str) -> Option<uint>;
#[stable]
fn replace(&self, from: &str, to: &str) -> String { ... } |
@erickt: I haven't really thought about generalizing it beyond string slices, but the exact same system should work there as well. Maybe there is a way to abstract the trait families here over the actual slice type, which would also enable using them for the proposed I mainly used @chris-morgan, @aturon: Trait constraints on associated types are not actually necessary for this proposal to work. They are nice because the allow early type error detection, but the condition they check can also happen later at the concrete
(Or does that constrain in @aturon: If we temporary stabilize on a subset of the design, then I'd try to do it the way I layed out as a primary alternative: Change the |
@erickt: Maybe something like this could exist later on as a general mechanism for pattern searches on slices? (With HKT and trait bounds working):
This could live in |
looks great, but I share the same concern for performance as aturon |
We should definitely give this some careful thought prior to stabilizing these APIs, because it would be great to be usable across many string/slice types. At the very least, it seems like the |
Yeah, that might actually be enough |
@aturon: I think I now agree with your plan to stabilize the methods as-is where a later upgrade path is possible. No need to cause unnecessary churn by renaming back and forth if we want to support them anyway in the long term. |
Btw, I think we might want to change |
This stabilizes most methods on `&str` working with patterns in a way that is forwards-compatible with a generic string pattern matching API: - Methods that are using the primary name for their operation are marked as `#[stable]`, as they can be upgraded to a full `Pattern` API later without existing code breaking. Example: `contains(&str)` - Methods that are using a more specific name in order to not clash with the primary one are marked as `#[unstable]`, as they will likely be removed once their functionality is merged into the primary one. Example: `contains_char<C: CharEq>(C)` - The method docs got changed to consistently refer to the pattern types as a pattern. - Methods whose names do not match in the context of the more generic API got renamed. Example: `trim_chars -> trim_matches` Additionally, all methods returning iterators got changed to return unique new types with changed names in accordance with the new naming guidelines. See also rust-lang/rfcs#528 Due to some deprecations and type changes, this is a [breaking-change]
I extended the |
I managed to get the previous worst case of 2x slowdown down to 1.2 or so locally, but in the combined bencher run its still at 2x: Looking at the |
@Kimundi would you mind updating the RFC with the changes to |
My only other concern is I know when I mentioned that before, you worried that we were growing the trait to encompass basically all of the methods provided by str, but:
|
This has been a long time in the making! Thanks for your persistence, @Kimundi This RFC will go a long way toward making the str API more consistent, ergonomic and flexible, allowing for a wide (and open-ended) variety of patterns to be used with the same set of methods. The prototype performance has gotten within spitting distance of the existing API, and there do not appear to be any fundamental performance drawbacks to the approach. We will likely keep the |
This is not a complete implementation of the RFC: - only existing methods got updated, no new ones added - doc comments are not extensive enough yet - optimizations got lost and need to be reimplemented See rust-lang/rfcs#528 Technically a [breaking-change]
Since the API is still unstabe, I was wondering if we could remove lifetime parameter from the |
@krdln Wow, didn't realize that this is possible. This sounds very doable indeed. |
I just tried it, an unfortunately it seems that the HRTB approach does not play well with associated types: impl<'a, P: Pattern> Clone for $t<'a, P>
where P::Searcher: Clone
{
fn clone(&self) -> Self {
let $s = self;
$e
}
}
I get ~54 errors, all because of attempting to use ::Searcher. This includes I could just change all those cases to use BoundedPattern<'a> directly, but that |
@Kimundi Does changing the bound to the following work? It should help. where <P as BoundedPattern<'a>>::Searcher: Clone By "from what I've tested" I meant that this black magic compiles: use std::str::pattern::Pattern;
trait Pat: for<'a>Pattern<'a> {
fn into_searcher<'a>(self, haystack: &'a str) -> <Self as Pattern<'a>>::Searcher {
<Self as Pattern<'a>>::into_searcher(self, haystack)
}
}
impl<P: for<'a>Pattern<'a>> Pat for P {} And I concluded that rest of usecases are simpler and will work. (btw, by this example I don't want to say if it's a good idea to copy |
@krdln: The issue is that apparently most function signatures that would become easier by using Like, |
It appears this was left out of RFC rust-lang/rfcs#528 because it might be useful to also generalize the second argument in some way. That doesn't seem to prevent generalizing the first argument now, however. This is a [breaking-change] because it could cause type-inference to fail where it previously succeeded. Also update docs for a few other methods that still referred to `&str` instead of patterns.
Edit: there used to be a question here, but it has since been deleted. I am leaving this amswer up because someone linked to it on users.rust-lang.org You can ask questions like these on users.rust-lang.org or reddit on ghe learnrust subreddit. You can do this already using the let mut split2 = name.splitn(',', 2);
if let (Some(last), Some(first)) = (split2.next(), split2.next()) {
...
} else ... If you have further questions please go to one of the forums. I'm active on users and would be happy to help there. |
This is not a complete implementation of the RFC: - only existing methods got updated, no new ones added - doc comments are not extensive enough yet - optimizations got lost and need to be reimplemented See rust-lang/rfcs#528 Technically a [breaking-change]
Summary
Stabilize all string functions working with search patterns around a new
generic API that provides a unified way to define and use those patterns.
Motivation
Right now, string slices define a couple of methods for string
manipulation that work with user provided values that act as
search patterns. For example,
split()
takes an type implementingCharEq
to split the slice at all codepoints that match that predicate.
Among these methods, the notion of what exactly is being used as a search
pattern varies inconsistently: Many work with the generic
CharEq
,which only looks at a single codepoint at a time; and some
work with
char
or&str
directly, sometimes duplicating a method toprovide operations for both.
This RFC proposes a set of traits for defining such patterns, which enable efficient iteration through all matches in a string.
Rendered