A C++ Zip Iterator

Parallel Iteration of... Whatever

"Zipper" from Pixabay

Python is one of my favourite languages and one of the features it has is the zip function, which allows for parallel iteration of arrays or list like objects. I wondered if something like that was possible in C++. There is almost certainly something like this out there already, however I thought it would be an interesting exercise I could try and do myself.

Boost does have a zip iterator, which accepts a range rather than a container, on the condition that the ranges are the same size. This avoids some weirdness, and in fairness, is more idiomatic C++.

It turns out this is possible, but not without some trickery.

This is the basic motivation:

auto a = std::vector<int>{1, 2, 3};
auto b = std::vector<int>{4, 5, 6};
auto c = std::vector<int>{};

/* Using a range-for expression with structured bindings from C++17 */
for (auto && [x, y] : zip(a, b)) {
    c.push_back(x + y);
}

/* "c" should contain 5, 7, 9 */

The idea is to use a range-for expression to unpack values into a structured binding, from a tuple returned by dereferencing an iterator that wraps a set of iterators to be iterated in parallel.

GitHub repository available here

The Idea

We need several things to achieve this:

  1. Helper function to return

  2. Container that provides begin() and end() to satisfy the range-for requirements, which returns

  3. Iterator, that when de-referenced, returns

  4. A tuple of values that can be de-structured through structured bindings

Zipping Two Ranges

The Zipper

I think a reasonable start would be the zipper class. This is the class that provides the begin and end functions for the range-for expression. It accepts and stores references to the containers to be iterated over.

template <typename T, typename U>
class zipper
{
public:
    /*  Select the most appropriate iterator for the past in containers.
        This will be covered later */
    using Iter1 = select_iterator_for<T>;
    using Iter2 = select_iterator_for<U>;

    using zip_type = zip_iterator<Iter1, Iter2>;

    /*  Constructor that takes forwarding references and forwards them to
        initialise the members */
    template <typename V, typename W>
    explicit zipper(V && a, W && b)
        : m_a{std::forward<V>(a)}
        , m_b{std::forward<W>(b)}
    {
    }

    /*  Functions to satisfy the range-for requirement. Returns iterators
        for begin and end functions respectively. */
    auto begin() -> zip_type {
        return zip_type{std::begin(m_a), std::begin(m_b)};
    }
    auto end() -> zip_type {
        return zip_type{std::end(m_a), std::end(m_b)};
    }

private:
    T m_a;
    U m_b;
};

The constructor is templatised in order to accept a forwarding (universal) reference, which is then forwarded (using std::forward) to initialise the member variables. This makes sure the value category is preserved such that if one of the values is an rvalue reference, the argument can be moved into the member.

Selecting the Correct Iterator

We need to select the correct iterator for the passed in types. If one of the types is a const reference then we need to make sure we use it’s const_iterator, rather than the normal iterator. std::decay is needed because if T is a reference, and is not a complete type we can’t deduce, i.e., we can’t deduce T&::iterator. Using decay will give us the fundamental type and allows to access it’s type definitions (as T::iterator).

template <typename T>
using select_iterator_for = std::conditional_t<
    std::is_const_v<std::remove_reference_t<T>>, 
    typename std::decay_t<T>::const_iterator,
    typename std::decay_t<T>::iterator>;

Constructing the Zipper

Utility method for creating the zipper in the place of the range expression in the for loop we are targeting. We take forwarding references for both arguments because we want to accept lvalues and rvalues, const and non-const.

template <typename T, typename U>
auto zip(T && t, U && u) {
    return zipper<T, U>{std::forward<T>(t), std::forward<U>(u)};
}

The Iterator

The zip iterator holds two iterators for the actual type being iterated over, and proxies their usual functions in tandem.

template <typename Iter1, typename Iter2>
class zip_iterator
{
public:
    using value_type = std::pair<
        select_access_type_for<Iter1>,
        select_access_type_for<Iter2>
    >;

    zip_iterator() = delete;

    zip_iterator(Iter1 iter_1_begin, Iter2 iter_2_begin)
        : m_iter_1_begin {iter_1_begin}
        , m_iter_2_begin {iter_2_begin}
    {
    }

    auto operator++() -> zip_iterator& {
        ++m_iter_1_begin;
        ++m_iter_2_begin;
        return *this;
    }

    auto operator++(int) -> zip_iterator {
        auto tmp = *this;
        ++*this;
        return tmp;
    }

    auto operator!=(zip_iterator const & other) {
        return !(*this == other);
    }

    auto operator==(zip_iterator const & other) {
        return m_iter_1_begin == other.m_iter_1_begin ||
                m_iter_2_begin == other.m_iter_2_begin;
    }

    auto operator*() -> value_type {
        return value_type{*m_iter_1_begin, *m_iter_2_begin};
    }

private:
    Iter1 m_iter_1_begin;
    Iter2 m_iter_2_begin;
};

The comparison we need to make in order to detect when iteration should end is when any of the iterators equal their end position. From a semantics point of view, equality in this sense is not really accurate; you would expect, ordinarily, that the test would be the aggregate equivalence of it’s members. However, if the things being iterated over have different sizes, both iterators will never reach the end at the same time. Therefore we need to return true as soon as any of the iterators are at their end position, terminating the iteration loop.

You could make this semantically correct by calculating what the minimum distance of all the iterator begin/end pairs are. However, this can have issues:

While iteration should stop when any of the underlying iterators match their end, it raises the question of what should happen if you want to have a reverse iterator, should iteration start at the end of each element, or should it start at the minimum index?

Selecting the Correct Access Type

The consideration here is what the access type is when the internal iterators are is de-referenced. In most cases this should be a reference, but this is not always possible. For example, std::vector<bool> is a special case in that it doesn’t actually store bools as you might think. It does some special bit twiddling to save on space (although arguably this is implementation defined). A normal iterator of such is incapable of returning a reference, and de-referencing returns a value.

template <typename Iter>
using select_access_type_for = std::conditional_t<
    std::is_same_v<Iter, std::vector<bool>::iterator> ||
    std::is_same_v<Iter, std::vector<bool>::const_iterator>,
    typename std::iterator_traits<Iter>::value_type,
    typename std::iterator_traits<Iter>::reference
>;

The Result

So now you should be able to do something like this:

/* Using a range-for expression with structured bindings from C++17 */
for (auto && [x, y] : zip(a, b)) {
    c.push_back(x + y);
}

Even though the zip() returns a temporary (as is necessary here), in a range-for expression, it’s lifetime will be extended. 1

It looks like the structured bindings can be qualified by value or as a forwarding reference (double ampersands in range-for do this). You can also provide a temporary:

/* Using a range-for expression with structured bindings from C++17 */
for (auto && [x, y] : zip(a, std::vector<int>{1, 2, 3})) {
    c.push_back(x + y);
}

In this case, the type of y (which represents the temporary) is bound to a reference, and i’m not sure that makes much sense. It might be better to have it bound to a const reference, but that would likely involve more trickery.

Zipping Multiple Ranges

You can extend this idea to zip multiple containers. I will gloss over this a little bit as it is quite a bit more complicated than the previous example.

The Zipper

This time, the references or values for the containers are stored in a tuple, and the iterator type (which is now also variadic) is bound to the various relevant iterators.

template <typename ... T>
class zipper
{
public:
    using zip_type = zip_iterator<select_iterator_for<T> ...>;

    /*  std::forward is used to preserve the value category of the
        containers */
    template <typename ... Args>
    zipper(Args && ... args)
        : m_args{std::forward<Args>(args)...}
    {
    }

    auto begin() -> zip_type {
        return std::apply([](auto && ... args){ 
                return zip_type(std::begin(args)...); 
            }, m_args);
    }

    auto end() -> zip_type {
        return std::apply([](auto && ... args){ 
                return zip_type(std::end(args)...); 
            }, m_args);
    }

private:
    std::tuple<T ...> m_args;
};

select_iterator_for is the same as in the previous example

std::apply takes a tuple and a function which accepts the tuple elements as function arguments. This required to expand the arguments in the m_args member variable, in order to call either begin or end and pass the results to a variadic function, which in this case is the zip iterator

Constructing the Zipper

template <typename ... T>
auto zip(T && ... t) {
    return zipper<T ...>{std::forward<T>(t)...};
}

The difference here is that the helper function is variadic, but the same principles apply.

The Iterator

Instead of storing the underlying iterators in names types, we have to store an unknown number of them. I choose a std::tuple as it makes it impossible to mistakenly compare truthfulness with different types; the iterator types become a part of the wrapper.

template <typename ... Iters>
class zip_iterator
{
public:
    using value_type = std::tuple<
        select_access_type_for<Iters>...
    >;

    zip_iterator() = delete;

    zip_iterator(Iters && ... iters)
        : m_iters {std::forward<Iters>(iters)...}
    {
    }

    /* Increment all the underlying iterators */
    auto operator++() -> zip_iterator& {
        std::apply([](auto & ... args){
                ((args += 1), ...);
            }, m_iters);
        return *this;
    }

    auto operator++(int) -> zip_iterator {
        auto tmp = *this;
        ++*this;
        return tmp;
    }

    auto operator!=(zip_iterator const & other){
        return !(*this == other);
    }

    /*  Elementwise compare another iterator of this type to detect if any
        of the elements match. */
    auto operator==(zip_iterator const & other){
        return any_match(m_iters, other.m_iters);
    }

    /*  Dereference operator that constructs a tuple (`value_type`) by
        expanding and dereferencing each of the underlying iterators in */
    auto operator*() -> value_type {
        return std::apply([](auto && ... args){ 
                return value_type(*args...); 
            }, m_iters);
    }

private:
    std::tuple<Iters...> m_iters;
};

std::apply is used here to increment each of the underlying iterators, and the equivalence operator returns true if any of the iterators pairs are equal (more on this below).

select_access_type_for is the same as in the previous example

Matching the End of the Tuples

Finally, the equivalence operator is a special case. In a twist of fate, we also need to do parallel iteration on all of the iterators! We can’t use std::apply as, as far as I can tell, only works on one tuple at a time. With a somewhat terrifying set of template functions:

/*  The index sequence is only used to deduce the Index sequence in the template
    declaration. It uses a fold expression which is applied to the indexes,
    using each expanded value to compare tuple value at that index. If any of
    the tuple elements are equal, the function will return true. */
template <typename ... Args, std::size_t ... Index>
auto any_match_impl(std::tuple<Args...> const & lhs,
        std::tuple<Args...> const & rhs,
        std::index_sequence<Index...>) -> bool {
    return (... | (std::get<Index>(lhs) == std::get<Index>(rhs)));
}


/*  User function for finding any elementwise match in two tuples. Forwards to
    to the implementation the two tuples along with a generated index sequence
    which will have the same length as the tuples. */
template <typename ... Args>
auto any_match(std::tuple<Args...> const & lhs, 
        std::tuple<Args...> const & rhs) -> bool {
    return any_match_impl(lhs, rhs, std::index_sequence_for<Args...>{});
}

If any of the iterator pairs are equal, the function will return true.

The Result

auto a = std::vector<int>{1, 2, 3, 4, 5, 6};
auto b = std::vector<int>{1, 2, 3, 4, 5, 6, 7};
auto c = std::vector<int>{0, 0, 0, 0, 0};
auto const & d = b;

for (auto && [x, y, z] : c9::zip(a, d, c)) {
    z = x + y;
}

auto expected = std::vector<int>{2, 4, 6, 8, 10};
assert(c == expected);

Footnotes