Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for BinaryHeap sorted iterator methods #59278

Open
1 task
jonhoo opened this issue Mar 18, 2019 · 17 comments
Open
1 task

Tracking issue for BinaryHeap sorted iterator methods #59278

jonhoo opened this issue Mar 18, 2019 · 17 comments
Labels
A-collections Area: `std::collection` A-iterators Area: Iterators B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@jonhoo
Copy link
Contributor

jonhoo commented Mar 18, 2019

This tracks the stabilization of BinaryHeap::into_iter_sorted (binary_heap_into_iter_sorted) and BinaryHeap::drain_sorted (binary_heap_drain_sorted) implemented in #65091.

EDIT much later:

  • Stabilize feature(binary_heap_into_iter_sorted) #76234 looked at stabilizing part of this, but didn't come to a conclusion that it was ready-to-go as-is. The questions from there (such as consistency in order with into_sorted_vec) need to be addressed before a future attempt to stabilize.

Original feature request:


Currently, both BinaryHeap::into_iter and BinaryHeap::drain yield the heap elements in an arbitrary order. This seems counter-intuitive given the nature of a heap. Sadly, we can't simply change their behavior behind the scenes, as we would have to remove the implementation of DoubleEndedIterator. Should we perhaps add into_ordered_iter and drain_ordered? The implementation is pretty straightforward so I'd be happy to submit a PR.

@estebank estebank added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 18, 2019
@clarfonthey
Copy link
Contributor

Isn't into_sorted_vec().into_iter() sufficient?

@scottmcm
Copy link
Member

scottmcm commented Apr 1, 2019

@clarfon No, because getting the largest k items from that is O(n lg n + k), whereas with into_ordered_iter() it would be O(k lg n).

@ckaran

This comment has been minimized.

@clarfonthey

This comment has been minimized.

@ckaran

This comment has been minimized.

@jonas-schievink jonas-schievink added A-collections Area: `std::collection` C-enhancement Category: An issue proposing an enhancement or a PR with one. C-feature-request Category: A feature request, i.e: not implemented / a PR. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Jun 6, 2019
@sekineh
Copy link
Contributor

sekineh commented Oct 2, 2019

Hi,

I've implemented both API in the local rust tree.
I see some problems in the API and I'm proposing the better version at the bottom of this comment.

Any ideas?

Currently proposed API

.into_ordered_iter(self)

    /// Returns an iterator which retrieves elements in heap order.
    /// This method consumes the original heap.
    ///
    /// # Examples
    ///
    /// Basic usage:
    ///
    /// ```
    /// #![feature(binary_heap_into_ordered_iter)]
    /// use std::collections::BinaryHeap;
    /// let heap = BinaryHeap::from(vec![1, 2, 3, 4, 5]);
    ///
    /// assert_eq!(heap.into_ordered_iter().take(2).collect::<Vec<_>>(), vec![5, 4]);
    /// ```
    #[unstable(feature = "binary_heap_into_ordered_iter", issue = "59278")]
    pub fn into_ordered_iter(self) -> IntoOrderedIter<T> {
        IntoOrderedIter {
            inner: self,
        }
    }

.drain_ordered(&mut self)

    /// Returns an iterator which retrieves elements in heap order.
    /// The retrieved elements will be removed from the original heap.
    ///
    /// Note: Unlike other `.drain()` methods, this method removes elements *lazily*.
    /// Only the retrieved elements will be removed.
    ///
    /// # Examples
    ///
    /// Basic usage:
    ///
    /// ```
    /// #![feature(binary_heap_drain_ordered)]
    /// use std::collections::BinaryHeap;
    ///
    /// let mut heap = BinaryHeap::from(vec![1, 2, 3, 4, 5]);
    /// assert_eq!(heap.len(), 5);
    ///
    /// let removed = heap.drain_ordered().take(2).collect::<Vec<_>>(); 
    /// assert_eq!(removed, vec![5, 4]);
    /// assert_eq!(heap.len(), 3);
    ///
    /// let removed = heap.drain_ordered().take(3).collect::<Vec<_>>(); 
    /// assert_eq!(removed, vec![3, 2, 1]);
    /// assert_eq!(heap.len(), 0);
    /// ```
    #[inline]
    #[unstable(feature = "binary_heap_drain_ordered", issue = "59278")]
    pub fn drain_ordered(&mut self) -> DrainOrdered<'_, T> {
        DrainOrdered {
            inner: self,
        }
    }

Problems I see

  • (P1) Too many iters: New users won't get to the right one.
    => Edit: BinaryHeap doc example could be updated after this feature would be stabilized.
  • (P2) Code duplication: The impl of .into_ordered_iter and .drain_ordered is essentially the same. They are duplicated codes.
    => @scottmcm: Both owning and borrowing versions are useful.
  • (P3) .drain() inconsistency: The semantics of .drain_ordered() is lazy. Other all .drain() method immediately removes elements.
    => @scottmcm: I agree that drop(foo.drain_ordered()); should result in an empty foo. But that can be accomplished by just clearing in the iterator's drop (assuming it's safe so doesn't need to amplify).

Better API (Solution)

  • Implement one method .into_ordered_iter(&mut self).
    This is the mix of above .into_ordered_iter(self) and .drain_ordered(&mut self). I picked more useful one and changed the name.
  • Do not implement .drain_ordered() method.

-> Solves P2 and P3. Partially solves P1 (discoverability).

@scottmcm
Copy link
Member

scottmcm commented Oct 2, 2019

Both owning and borrowing versions are useful. I can't return a version that borrows from a local variable, but I can return the version that moves it.

That said, I agree that drop(foo.drain_ordered()); should result in an empty foo. But that can be accomplished by just clearing in the iterator's drop (assuming it's safe so doesn't need to amplify).

(I'm not worried about discoverability here; people have presumably been taught about the difference in things like Vec, which also has .into_iter() and .drain(..).)

@sekineh
Copy link
Contributor

sekineh commented Oct 2, 2019

Thanks @scottmcm for the comment. I updated my comment accordingly.

@jonhoo, are you still working on PR for this?

@jonhoo
Copy link
Contributor Author

jonhoo commented Oct 3, 2019

@sekineh nope, not currently working on a PR

bors added a commit that referenced this issue Oct 31, 2019
Implement ordered/sorted iterators on BinaryHeap as per #59278

I've implemented the ordered version of iterator on BinaryHeap as per #59278.

# Added methods:

* `.into_iter_sorted()`
  * like `.into_iter()`; but returns elements in heap order
* `.drain_sorted()`
  * like `.drain()`; but returns elements in heap order
  * It's a bit _lazy_; elements are removed on drop. (Edit: it’s similar to vec::Drain)

For `DrainSorted` struct, I implemented `Drop` trait following @scottmcm 's [suggestion](#59278 (comment))

# ~TODO~ DONE
* ~I think I need to add more tests other than doctest.~

# **Notes:**
* we renamed `_ordered` to `_sorted`, because the latter is more common in rust libs. (as suggested by @KodrAus )
@jonhoo jonhoo changed the title BinaryHeap owned iterators should yield items in heap order Tracking issue for BinaryHeap sorted iterator methods Nov 6, 2019
@jonhoo
Copy link
Contributor Author

jonhoo commented Nov 6, 2019

I've updated the issue to reflect that this now tracks the features implemented in #65091.

@jonhoo
Copy link
Contributor Author

jonhoo commented Nov 6, 2019

@KodrAus (not sure if you're the right person to ping for this) This should probably get C-tracking.

@KodrAus KodrAus added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Nov 6, 2019
@jonas-schievink jonas-schievink added B-unstable Blocker: Implemented in the nightly compiler and unstable. and removed C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Nov 26, 2019
@jonas-schievink jonas-schievink added the A-iterators Area: Iterators label Dec 12, 2019
@jonas-schievink
Copy link
Contributor

I kind of feel like in-order iteration should be the default for a heap. I was surprised to learn that the IntoIterator impl yields items in arbitrary order.

@jonas-schievink
Copy link
Contributor

Also the destructor of DrainSorted will currently leak items if one of their destructors panics.

@alexeymuranov
Copy link

alexeymuranov commented Mar 27, 2021

I kind of feel like in-order iteration should be the default for a heap. I was surprised to learn that the IntoIterator impl yields items in arbitrary order.

How about renaming?

  • iter -> into_iter_unsorted,
  • drain -> drain_unsorted,
  • into_iter_sorted -> iter
  • drain_sorted -> drain

Would such a renaming even need a deprecation warning?

@cwhakes
Copy link
Contributor

cwhakes commented Dec 2, 2022

I almost ran headlong into the underlying issue. I agree that it is unintuitive that results don't come out sorted. Additionally, if iteration order is arbitrary, reversing the order with DoubleEndedIterator is pointless and likely an incorrect attempt at making a MinHeap.

Whether .into_iter_sorted() is stablized or .iter() is renamed, the resulting Iterator should not be double-ended and any implementation thereof should be depreciated. If that is even possible,

bors added a commit to rust-lang-ci/rust that referenced this issue Jan 15, 2023
Leak amplification for peek_mut() to ensure BinaryHeap's invariant is always met

In the libs-api team's discussion around rust-lang#104210, some of the team had hesitations around exposing malformed BinaryHeaps of an element type whose Ord and Drop impls are trusted, and which does not contain interior mutability.

For example in the context of this kind of code:

```rust
use std::collections::BinaryHeap;
use std::ops::Range;
use std::slice;

fn main() {
    let slice = &mut ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
    let cut_points = BinaryHeap::from(vec![4, 2, 7]);
    println!("{:?}", chop(slice, cut_points));
}

// This is a souped up slice::split_at_mut to split in arbitrary many places.
//
// usize's Ord impl is trusted, so 1 single bounds check guarantees all those
// output slices are non-overlapping and in-bounds
fn chop<T>(slice: &mut [T], mut cut_points: BinaryHeap<usize>) -> Vec<&mut [T]> {
    let mut vec = Vec::with_capacity(cut_points.len() + 1);
    let max = match cut_points.pop() {
        Some(max) => max,
        None => {
            vec.push(slice);
            return vec;
        }
    };

    assert!(max <= slice.len());

    let len = slice.len();
    let ptr: *mut T = slice.as_mut_ptr();
    let get_unchecked_mut = unsafe {
        |range: Range<usize>| &mut *slice::from_raw_parts_mut(ptr.add(range.start), range.len())
    };

    vec.push(get_unchecked_mut(max..len));
    let mut end = max;
    while let Some(start) = cut_points.pop() {
        vec.push(get_unchecked_mut(start..end));
        end = start;
    }
    vec.push(get_unchecked_mut(0..end));
    vec
}
```

```console
[['7', '8', '9'], ['4', '5', '6'], ['2', '3'], ['0', '1']]
```

In the current BinaryHeap API, `peek_mut()` is the only thing that makes the above function unsound.

```rust
let slice = &mut ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
let mut cut_points = BinaryHeap::from(vec![4, 2, 7]);
{
    let mut max = cut_points.peek_mut().unwrap();
    *max = 0;
    std::mem::forget(max);
}
println!("{:?}", chop(slice, cut_points));
```

```console
[['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], [], ['2', '3'], ['0', '1']]
```

Or worse:

```rust
let slice = &mut ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
let mut cut_points = BinaryHeap::from(vec![100, 100]);
{
    let mut max = cut_points.peek_mut().unwrap();
    *max = 0;
    std::mem::forget(max);
}
println!("{:?}", chop(slice, cut_points));
```

```console
[['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], [], ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '\u{1}', '\0', '?', '翾', '?', '翾', '\0', '\0', '?', '翾', '?', '翾', '?', '啿', '?', '啿', '?', '啿', '?', '啿', '?', '啿', '?', '翾', '\0', '\0', '񤬐', '啿', '\u{5}', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\u{8}', '\0', '`@',` '\0', '\u{1}', '\0', '?', '翾', '?', '翾', '?', '翾', '
thread 'main' panicked at 'index out of bounds: the len is 33 but the index is 33', library/core/src/unicode/unicode_data.rs:319:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

---

This PR makes `peek_mut()` use leak amplification (https://doc.rust-lang.org/1.66.0/nomicon/leaking.html#drain) to preserve the heap's invariant even in the situation that `PeekMut` gets leaked.

I'll also follow up in the tracking issue of unstable `drain_sorted()` (rust-lang#59278) and `retain()` (rust-lang#71503).
@MiSawa
Copy link
Contributor

MiSawa commented Jun 2, 2023

I feel it odd that the return value of into_sorted_vec and into_iter_sorted has opposite order, and into_iter_sorted().is_sorted() is not true. Perhaps into_iter_descending (and rename into_sorted_vec to into_ascending_vec?) or something that makes it clear about the ordering would be nice.

@scottmcm
Copy link
Member

scottmcm commented Jun 2, 2023

Copying my thoughts from #76234 (comment):

My personal conclusions:

  • Given a time machine I'd remove direct iteration in heap order, and would make that go through Vec: From<Heap<_>> or some sort of .as_unordered_slice().
  • Assuming it can be implemented efficiently, .into_iter() and something-like-.drain() on a heap should both be double-ended in sorted order. (With probably no .iter() nor .iter_mut().)
  • As a bonus, that would avoid issues like BinaryHeap example should show how to make a min-heap. #58174 as such a heap would natively have .pop_min() and .pop_max()

So I'd be tempted to "solve" this issue by adding MinMaxHeap (#76250), and using documentation to encourage people to use that instead of BinaryHeap (though not actually deprecating BinaryHeap itself, since that feels rather strong).

I don't know if libs-api would be interested in the whole new collection type, though...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collection` A-iterators Area: Iterators B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests