a bit of respite

010110__010110

Iterator Blocks for Rust - Feature Survey

In the first concrete installment of my series on iterator blocks I want to explore the possible feature set that could/should be supported in Rust. In the following paragraphs I will try to collect things that I think one would expect from iterator blocks in Rust. Mind, however, that this is just my opinion so far. I'd very much like to discuss this with anyone interested on the mailing list. Nevertheless, I think a collection like this is valuable as a point of departure—even if most of it won't make into a hypothetical final implementation.

Coco Chanel

I categorized the points below into must haves, nice to haves, and things to think about. The syntax I use here is very close to C# and should not be considered a proposal for the real syntax. This topic is big enough to warrant its own post.

the must haves

The features in this category probably won't be disputed all too much. Core Functionality: Allow yielding things
This is kind of obvious but it should be clarified what it exactly means. I would suggest to take the C# functionality as a baseline here, where yield allows to implement
  • resumable functions
  • that may return a value whenever yielding control.
This allows for some interesting things:
  • One can write external iterators like internal ones.
  • It can also be used to implement stop-and-go control flow.
Implement std::iterator::Iterator
Iterators created from iterator blocks should implement the std::iterator::Iterator trait in order to be compatible with the rest of the standard library and language facilities such as for loops.
Support yielding owned values (~)
One should be able to yield owned values, such as in:
fn generate_owned_ints() -> Iterator<~int> {
    // yield an owned r-value
    yield return ~1;

    let x = ~2;
    let y = ~3;

    // yield an owned l-value
    yield return x;
    // x is not accessible here any more.

    // this case is a bit trickier because y has to be
    // retained across a *yield return* boundary, but it
    // should be supported nonetheless.
    yield return y;
}
This should also allow to create an iterator block that takes ownership of some collection and gradually moves values out of it (like std::vec::ConsumeIterator does).
Support borrowing iterators (&)
Iterator blocks should allow to yield borrowed references into any data structure iterated over:
fn strange_seq<'a>(input: &'a mut HashMap<int, int>)
                            -> Iterator<&'a mut int> {
    yield return input.get_mut(100);
    yield return input.get_mut(1000);
    yield return input.get_mut(100000);
}
Performance

Iterator blocks should be as fast as handwritten iterators if possible. In my view, this pretty much rules out any implementation relying on each iterator having its own stack. C#'s way of transforming the iterator block code into a lightweight state machine seems the way to go here.

It probably also means that iterators created from yielding functions should not require a heap allocation. That would mean that each iterator function should introduce its own type implementing std::iterator::Iterator to allow for stack allocation and static method dispatch. This is just me thinking aloud, though. There has also been talk about other ways to deal with this problem.

Good Error Messages
If a piece of iterator code is not legal the compiler should tell you as clearly as possible why it isn't and how you might fix it. This is an important requirement to keep in mind because it might be possible to implement iterator blocks as a mere AST transformation (like the new for loop). In this case, the error messages generated from the »de-sugared« version of the code might not be very helpful and even confusing.

the nice to haves

I could imagine an iterator block implementation that does not have the features in this category but that would still be useful. yield break
In C# you can call yield break from within an iterator block to end iteration explicitly. This more or less corresponds to a Rust Iterator returning None from next(). One can probably get by without this, but it's definitely useful.
No dependence on threads, scheduler and garbage collection
There seems to be quite some interest in using Rust without its runtime. The iterator block implementation should not depend on anything that would make it impossible to use it in such constrained contexts (once std::iterator can be used there too). The state machine transformation should be able to fullfil this requirement.
Efficient Iterator Nesting
This is a genuinely interesting item. Regular state machine iterators can become quite inefficient when nested, like in the following example.
enum LinearList {
    Node { data: int, tail: ~LinearList },
    Nil
}

fn traverse(list: &LinearList) -> Iterator<int> {
    match *list {
        Node { data: data, tail: ~ref tail } => {
            yield return data;

            // Recursively iterate
            for item in traverse(tail) {
                yield return item;
            }
        }
        Nil => {
            yield break;
        }
    }
}
With the code above every call to the outer iterator traverses every nesting level until the currently active one, leading to an O(n²) runtime for traversing a linear list. Obviously, there is some unnecessary overhead involved here. To the rescue come Erik Meijer and friends: In their Iterators revisited paper Jacobs et al present a rather straight forward implementation for nested iterators. It introduces a new yield everything from another iterator construct. With something like this the above example can be written as:
fn traverse(list: &LinearList) -> Iterator<int> {
    match *list {
        Node { data: data, tail: ~ref tail } => {
            yield return data;

            // yield everything from the sub iterator
            yield .. traverse(tail);
        }
        Nil => {
            yield break;
        }
    }
}
The paper proposes to introduce a root iterator, directly dispatching control into the correct sub-iterator, skipping any intermediate sub-iterators. This works for any kind of nested iterators and brings runtime down to O(n) when traversing an n element data structure. Unfortunately, Microsoft seems to hold a patent on this technique. Mind, though, that I'm not a lawyer. And the whole concept of software patents—especially on things any bright undergrad could develop on her own—does not quite compute in my European brain. So maybe there is some way to still use this intricate method of "using a stack to traverse a recursive structure" that Microsoft invented in 2004.

the things to think about

Well, it's really just one thing to think about. According to [James & Sabry, 2011] iterators in Ruby support taking arguments when resuming iteration. In Rust this would mean that the next() method in the `Iterator` trait would be able to take additional arguments:
pub trait Iterator<A, T> {
    /// Advance the iterator and return the next value.
    /// Return `None` when the end is reached.
    fn next(&mut self, arg: T) -> Option<A>;
}
The current Iterator trait would then be a special case with T == (). This sounds kind of nice but I could not come up with a good use case. It also might complicate things unnecessarily. But I wanted to have mentioned it.

conclusion

I hope I've provided you with some food for thought. I'd be interested in what others have to say about the topic. So head on over to the mailing list thread and let me know!