Stabilize feature(binary_heap_into_iter_sorted) #76234

scottmcm · 2020-09-02T03:11:21Z

This came up on an itertools issue, and looks like it's baked sufficiently, so I figured I'd ask about stabilization.

This would stabilize the following API:
https://doc.rust-lang.org/nightly/std/collections/struct.BinaryHeap.html#method.into_iter_sorted

impl<T: Ord> BinaryHeap {
    fn into_iter_sorted(self) -> IntoIterSorted<T>;
}
impl<T: Ord> Iterator for IntoIterSorted<T> { type Item = T; }
impl<T: Ord> ExactSizeIterator for IntoIterSorted<T> ;
impl<T: Ord> FusedIterator for IntoIterSorted<T> ;
impl<T: Clone> Clone for IntoIterSorted<T>;
impl<T: Debug> Debug for IntoIterSorted<T>;

There were comments about wishing that the ordinary .into_iter() could just be the sorted one, but .iter() cannot be in that order so that's not necessarily good. And anyways, IntoIter already implements DoubleEndedIterator so we can't really change the order even if we wanted to.

So I think the remaining potential questions are naming (it was originally proposed as into_iter_ordered but went in as into_iter_sorted, which I agree is more consistent with elsewhere in the library) and the standard whether this is useful enough (one could certainly use from_fn on the heap's pop, but that's potentially suboptimal as it doesn't have a size_hint).

This needs an FCP, so picking a libs member explicitly:
r? @KodrAus

Tracking issue #59278 (which also covers a drain version which I've intentionally not included here)

matklad · 2020-09-02T07:30:20Z

should we stabilize drain_sorted in the same step? the two methods feel pretty similar.

scottmcm · 2020-09-02T21:31:27Z

@matklad There were concerns about that version in the tracking issue and the use I ran into didn't need it, so I didn't include it here. We certainly could, if people wanted, though.

scottmcm · 2020-09-02T21:54:29Z

Hmm, I just saw #76250 -- if a min-max heap is strictly better than a binary heap then another option would be to implement that, which could efficiently be a DoubleEndedIterator...

KodrAus · 2020-09-03T02:13:43Z

Revisiting this after the original PR, the different ordering of into_iter_sorted and into_sorted_vec is a little unfortunate.

One way to resolve this could be to consider deprecating into_sorted_vec over into_iter_sorted().collect::<Vec<_>>(), but I don't think that's what we'd want to do. It looks like the reason to prefer into_sorted_vec (and thus keep it around) over into_iter_sorted().collect::<Vec<_>>() is that into_sorted_vec is more efficient, and specialization of into_iter_sorted().collect::<Vec<_>>() wouldn't help cases where you want to collect into something Vec-like, but isn't a Vec itself.

Another way could be to tweak this API a bit would be to slot a type in-between the iterators that lives in binary_heap:

pub struct Sorted<T>(BinaryHeap<T>);
pub struct SortedMut<'a, T>(&'a mut BinaryHeap<T>);

// current into_iter_sorted
impl<T> IntoIterator for Sorted<T> {
    type Item = T;
}

// equivalent to SortedMut<'a, T> as IntoIterator
impl<'a, T> IntoIterator for &'a mut Sorted<T> {
    type Item = T;
}

impl<'a, T> IntoIterator for SortedMut<'a, T> {
    type Item = T;
}

impl<'a, T> SortedMut<'a, BinaryHeap<T>> {
    // current drain_sorted
    pub fn drain(self) -> SortedMutDrain<'a, T>;
}

impl<T> BinaryHeap<T> {
    pub fn into_sorted(self) -> Sorted<T>;
    pub fn sorted_mut(&mut self) -> SortedMut<T>;
}

It would then be called like:

// instead of into_iter_sorted
for item in heap.into_sorted() {
    ..
}

// into_sorted_vec stays around
let v = heap.into_sorted_vec();

// instead of drain_sorted
for item in heap.sorted_mut().drain() {
    ..
}

// like drain_sorted, but doesn't clear the heap
let max_5 = heap.sorted_mut().into_iter().take(5).collect::<Vec<_>>();

That's extra machinery and indirection though we may not think is worthwhile. I do think it's worth exploring though, since into_iter_sorted is also the only into_iter-style named method we have that isn't just an implementation of IntoIterator, and this would avoid that.

scottmcm · 2020-09-03T02:39:57Z

Hmm, into_sorted_vec is a bit of a strange API -- it forces a heapsort, but even though the input is already in heap order, that's not the expensive part of the sort, so it's plausibly faster to just .into_vec() and call .sort_unstable() (since a heapsort is also unstable). To me, the only reason to use the heap to get sorted things is because you want to be lazy about it since you're going to stop early but at some unknown point. (If you have a known point then [T]::partition_at_index and sorting the subslice is faster.)

Your point about IntoIterator is a good one. It really makes me wish for a completely different heap API: one that doesn't have .iter() and .iter_mut() at all, just .into_iter() and .drain() (with people being expected to flip back to Vec if they really need arbitrary order). Especially if it were double-ended, so that both ascending and descending are available cheaply (one via .rev()) and to address #58174 directly.

KodrAus · 2020-09-03T07:50:54Z

Your point about IntoIterator is a good one. It really makes me wish for a completely different heap API: one that doesn't have .iter() and .iter_mut() at all, just .into_iter() and .drain() (with people being expected to flip back to Vec if they really need arbitrary order).

Do you think the way we'd want to do this would be to introduce a new MinMaxHeap type? I think since our hands seems a bit tied with BinaryHeap maybe we could just rename the into_iter_sorted method to into_sorted_iter and stabilize that along with drain_sorted, since they're useful to have. The naming convention being:

{op}_{sorted?}_{container?}

which then covers:

drain()
drain_sorted()

into_vec()
into_sorted_vec()

into_iter()
into_sorted_iter()

What do you think?

Hmm, into_sorted_vec is a bit of a strange API -- it forces a heapsort, but even though the input is already in heap order, that's not the expensive part of the sort, so it's plausibly faster to just .into_vec() and call .sort_unstable() (since a heapsort is also unstable).

I was curious about this so thought I'd give it a try. A simple microbenchmark seems to support the idea that sort_unstable is better as the size of the heap increases. I ran these on my i9 9900K Windows machine.

With a heap size of 10 elements:

test already_sorted_heap_collect_sorted_vec     ... bench:          79 ns/iter (+/- 0)
test already_sorted_heap_into_sorted_vec        ... bench:          69 ns/iter (+/- 0)
test already_sorted_heap_into_vec_sort_unstable ... bench:          71 ns/iter (+/- 0)
test interleaved_heap_collect_sorted_vec        ... bench:          78 ns/iter (+/- 0)
test interleaved_heap_into_sorted_vec           ... bench:          70 ns/iter (+/- 0)
test interleaved_heap_into_vec_sort_unstable    ... bench:          70 ns/iter (+/- 0)

With a heap size of 100 elements:

test already_sorted_heap_collect_sorted_vec     ... bench:         563 ns/iter (+/- 3)
test already_sorted_heap_into_sorted_vec        ... bench:         462 ns/iter (+/- 1)
test already_sorted_heap_into_vec_sort_unstable ... bench:         139 ns/iter (+/- 4)
test interleaved_heap_collect_sorted_vec        ... bench:         565 ns/iter (+/- 5)
test interleaved_heap_into_sorted_vec           ... bench:         467 ns/iter (+/- 3)
test interleaved_heap_into_vec_sort_unstable    ... bench:         489 ns/iter (+/- 1)

With a heap size of 1000 elements:

test already_sorted_heap_collect_sorted_vec     ... bench:      21,534 ns/iter (+/- 235)
test already_sorted_heap_into_sorted_vec        ... bench:      21,182 ns/iter (+/- 186)
test already_sorted_heap_into_vec_sort_unstable ... bench:         751 ns/iter (+/- 11)
test interleaved_heap_collect_sorted_vec        ... bench:      22,008 ns/iter (+/- 219)
test interleaved_heap_into_sorted_vec           ... bench:      22,112 ns/iter (+/- 381)
test interleaved_heap_into_vec_sort_unstable    ... bench:       6,044 ns/iter (+/- 32)

#![feature(test, binary_heap_into_iter_sorted)]
extern crate test;

use std::collections::BinaryHeap;

// tweak this to change the size of the heap
const N: usize = 1000;

fn interleaved() -> BinaryHeap<usize> {
    let mut heap = BinaryHeap::new();

    for (a, b) in (0..N / 2).zip((N / 2..N).into_iter().rev()) {
        heap.push(a);
        heap.push(b);
    }

    heap
}

fn already_sorted() -> BinaryHeap<usize> {
    let mut heap = BinaryHeap::new();

    for i in (0..N).into_iter().rev() {
        heap.push(i);
    }

    heap
}

#[bench]
fn interleaved_heap_into_sorted_vec(b: &mut test::Bencher) {
    let heap = interleaved();

    b.iter(|| {
        let heap = heap.clone();
        heap.into_sorted_vec()
    });
}

#[bench]
fn interleaved_heap_collect_sorted_vec(b: &mut test::Bencher) {
    let heap = interleaved();
    let mut sorted = Vec::new();

    sorted.extend(heap.clone().into_iter_sorted());

    b.iter(|| {
        let heap = heap.clone();
        sorted.clear();

        // Unlike the `into_vec` methods, this creates a new `Vec` each time
        // We amortize the cost a little be re-using it
        sorted.extend(heap.into_iter_sorted());

        test::black_box(&sorted);
    });
}

#[bench]
fn interleaved_heap_into_vec_sort_unstable(b: &mut test::Bencher) {
    let heap = interleaved();

    b.iter(|| {
        let mut heap = heap.clone().into_vec();
        heap.sort_unstable();
        heap
    });
}

#[bench]
fn already_sorted_heap_into_sorted_vec(b: &mut test::Bencher) {
    let heap = already_sorted();

    b.iter(|| {
        let heap = heap.clone();
        heap.into_sorted_vec()
    });
}

#[bench]
fn already_sorted_heap_collect_sorted_vec(b: &mut test::Bencher) {
    let heap = already_sorted();
    let mut sorted = Vec::new();

    sorted.extend(heap.clone().into_iter_sorted());

    b.iter(|| {
        let heap = heap.clone();
        sorted.clear();

        // Unlike the `into_vec` methods, this creates a new `Vec` each time
        // We amortize the cost a little be re-using it
        sorted.extend(heap.into_iter_sorted());

        test::black_box(&sorted);
    });
}

#[bench]
fn already_sorted_heap_into_vec_sort_unstable(b: &mut test::Bencher) {
    let heap = already_sorted();

    b.iter(|| {
        let mut heap = heap.clone().into_vec();
        heap.sort_unstable();
        heap
    });
}

Should we update the implementation of into_sorted_vec to just do into_vec + sort_unstable? 🙂

matklad · 2020-09-03T08:20:51Z

so it's plausibly faster to just .into_vec() and call .sort_unstable()

Oh wow, that means that max heap really doesn’t make sense.

the different ordering of into_iter_sorted and into_sorted_vec is a little unfortunate.

As well as just is the fact that iter_sorted returns reversely-sorted Items. I would totally expect anything _sorted to be ascending. Should we stick a rev into method‘s name?

bors · 2020-09-03T23:53:11Z

☔ The latest upstream changes (presumably #70793) made this pull request unmergeable. Please resolve the merge conflicts.

KodrAus · 2020-09-11T00:24:31Z

As well as just is the fact that iter_sorted returns reversely-sorted Items.

That tripped me up too. Since it's role is to call pop until you're done, maybe we should be thinking of it as iter_priority or something?

sekineh · 2020-09-29T08:50:22Z

For record, the reason I picked the method name into_iter_sorted() was:

Discoverability in IDE use case.

A user type heap.into_iter in IDE editor.
IDE will display method candidates:

.into_iter()
.into_iter_sorted()

User can discover the _sorted() version.

Edit: This merit also applies to .into_iter_ordered() if _sorted feels inconsistent because it's not ascending. Suffixes are generally good for IDE discoverability.
Edit: I implemented Implement ordered/sorted iterators on BinaryHeap as per #59278 #65091 and add .into_iter_sorted(). sekineh/binary-heap-plus-rs#20 to fix the into_iter() problem in which values are returned in unexpected (arbitrary) order.

sekineh · 2020-09-30T03:59:49Z

As well as just is the fact that iter_sorted returns reversely-sorted Items.

That tripped me up too. Since it's role is to call pop until you're done, maybe we should be thinking of it as iter_priority or something?

In rust iterator design, .iter() produces &Ts and .into_iter() produces Ts.
If .iter_priority() or somthing iter_*() produces Ts, it's confusing in my veiw.

My opinion is that the method is a variant of into_iter(), i.e. produces Ts, so it's natural to call it into_iter_<suffixhere>()

sekineh · 2020-09-30T06:18:25Z

pub struct Sorted<T>(BinaryHeap<T>);
pub struct SortedMut<'a, T>(&'a mut BinaryHeap<T>);

It's an interesting API redesign, but ~~cannot be done in a backward compatible way~~.

Sorted ~~adapter~~ newtype / sorted() method feels redundant because BinaryHeap is inherently sorted data structure from the begining.
I'd do in the opposite way, if redesign is permitted:

pub struct Unsorted<T>(BinaryHeap<T>);
pub struct UnsortedMut<'a, T>(&'a mut BinaryHeap<T>);

In this new world, we'd use iterator like below (LHS).

heap.unsorted().iter() == current heap.iter()
heap.unsorted().into_iter() == current heap.into_iter()

But actualy I won't implement unsorted() beacuse it's nearly same as into_vec() which already exists.

heap.into_vec().iter() == heap.iter()
heap.into_vec().into_iter() == heap.into_iter()

sekineh · 2020-09-30T08:22:19Z

{op}_{sorted?}_{container?}

For the order of name, I feel like into_sorted_vec() should be named as into_vec_sorted(). (But can't change now!)
It's common to define set of similar methods with different suffixes:

{basename}_{suffix?}

For example, min-max-heap crate provide the following set of methods [1]:

into_vec(): Returns a vector in arbitrary order
into_vec_asc(): Returns an ascending (sorted) vector.
into_vec_desc(): Returns an descending (sorted) vector.

They are highly discoverable. I'm not endorsing asc/desc suffixes for this feature. They are using the suffixes because it's double ended.

KodrAus · 2020-10-05T02:53:27Z

In rust iterator design, .iter() produces &Ts and .into_iter() produces Ts.

I actually think of it slightly differently. .iter() takes &self (which might mean it needs to produce &Ts) and .into_iter() consumes self (so will probably produce Ts). I was a bit inaccurate with my comment though and it should've been into_priority_iter.

I can appreciate the discoverability use-case preferring suffixes, and do think into_iter_{sorting} is actually better than into_{sorting}_iter. But I think aligning with the existing and stable into_sorted_vec is more important here.

Since I think into_iter_sorted().collect() is actually into_sorted_vec().rev(), I think my best suggestion right now is into_priority_iter and drain_priority.

sekineh · 2020-10-05T05:27:05Z

I agree that into_priority_iter and drain_priority are consistent with existing method name.
The word priority will signal different (reversed) ordering than into_sorted_vec().

Since I think into_iter_sorted().collect() is actually into_sorted_vec().rev(), I think my best suggestion right now is into_priority_iter and drain_priority.

scottmcm · 2020-10-14T16:10:19Z

I'm going to withdraw this PR, since the conversation has convinced me that things aren't ready yet here.

My personal conclusions:

Given a time machine I'd remove direct iteration in heap order, and would make that go through Vec: From<Heap<_>> or some sort of .as_unordered_slice().
Assuming it can be implemented efficiently, .into_iter() and something-like-.drain() on a heap should both be double-ended in sorted order. (With probably no .iter() nor .iter_mut().)
As a bonus, that would avoid issues like BinaryHeap example should show how to make a min-heap. #58174 as such a heap would natively have .pop_min() and .pop_max()

Stabilize feature(binary_heap_into_iter_sorted)

b0da96f

rust-highfive assigned KodrAus Sep 2, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 2, 2020

scottmcm mentioned this pull request Sep 2, 2020

Add helper method for taking the k smallest elements in an iterator rust-itertools/itertools#473

Merged

jonas-schievink added needs-fcp This change is insta-stable, so needs a completed FCP to proceed. relnotes Marks issues that should be documented in the release notes of the next release. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 2, 2020

jonas-schievink added this to the 1.48 milestone Sep 2, 2020

scottmcm closed this Oct 14, 2020

This was referenced Oct 14, 2020

Tracking issue for BinaryHeap sorted iterator methods #59278

Open

Add Min-Max Heap to std::collections #76250

Closed

scottmcm mentioned this pull request Dec 18, 2020

Feature request: incremental sort rust-itertools/itertools#508

Open

KodrAus mentioned this pull request Dec 21, 2020

Naming guidelines recommend into_iter() method instead of IntoIterator trait rust-lang/api-guidelines#206

Open

coriolinus mentioned this pull request Jul 16, 2022

Add method to return the k most common items and speed up most_common_*() methods coriolinus/counter-rs#25

Merged

scottmcm mentioned this pull request Aug 30, 2023

Performance of BinaryHeap::into_sorted_vec vs. Vec::sort_unstable #115357

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stabilize feature(binary_heap_into_iter_sorted) #76234

Stabilize feature(binary_heap_into_iter_sorted) #76234

scottmcm commented Sep 2, 2020

matklad commented Sep 2, 2020

scottmcm commented Sep 2, 2020

scottmcm commented Sep 2, 2020

KodrAus commented Sep 3, 2020

scottmcm commented Sep 3, 2020

KodrAus commented Sep 3, 2020

matklad commented Sep 3, 2020

bors commented Sep 3, 2020

KodrAus commented Sep 11, 2020

sekineh commented Sep 29, 2020 •

edited

Loading

sekineh commented Sep 30, 2020 •

edited

Loading

sekineh commented Sep 30, 2020 •

edited

Loading

sekineh commented Sep 30, 2020 •

edited

Loading

KodrAus commented Oct 5, 2020

sekineh commented Oct 5, 2020

scottmcm commented Oct 14, 2020

Stabilize feature(binary_heap_into_iter_sorted) #76234

Stabilize feature(binary_heap_into_iter_sorted) #76234

Conversation

scottmcm commented Sep 2, 2020

matklad commented Sep 2, 2020

scottmcm commented Sep 2, 2020

scottmcm commented Sep 2, 2020

KodrAus commented Sep 3, 2020

scottmcm commented Sep 3, 2020

KodrAus commented Sep 3, 2020

matklad commented Sep 3, 2020

bors commented Sep 3, 2020

KodrAus commented Sep 11, 2020

sekineh commented Sep 29, 2020 • edited Loading

Discoverability in IDE use case.

sekineh commented Sep 30, 2020 • edited Loading

sekineh commented Sep 30, 2020 • edited Loading

sekineh commented Sep 30, 2020 • edited Loading

KodrAus commented Oct 5, 2020

sekineh commented Oct 5, 2020

scottmcm commented Oct 14, 2020

sekineh commented Sep 29, 2020 •

edited

Loading

sekineh commented Sep 30, 2020 •

edited

Loading

sekineh commented Sep 30, 2020 •

edited

Loading

sekineh commented Sep 30, 2020 •

edited

Loading