Python is one of my favourite languages and one of the features it has is the zip function, which allows for parallel iteration of arrays or list like objects. I wondered if something like that was possible in C++. There is almost certainly something like this out there already, however I thought it would be an interesting exercise I could try and do myself.
Boost does have a zip iterator, which accepts a range rather than a container, on the condition that the ranges are the same size. This avoids some weirdness, and in fairness, is more idiomatic C++.
It turns out this is possible, but not without some trickery.
This is the basic motivation:
auto a = std::vector<int>{1, 2, 3};
auto b = std::vector<int>{4, 5, 6};
auto c = std::vector<int>{};
/* Using a range-for expression with structured bindings from C++17 */
for (auto && [x, y] : zip(a, b)) {
c.push_back(x + y);
}
/* "c" should contain 5, 7, 9 */
The idea is to use a range-for expression to unpack values into a structured binding, from a tuple returned by dereferencing an iterator that wraps a set of iterators to be iterated in parallel.
GitHub repository available here
The Idea
We need several things to achieve this:
-
Helper function to return
-
Container that provides
begin()
andend()
to satisfy the range-for requirements, which returns -
Iterator, that when de-referenced, returns
-
A tuple of values that can be de-structured through structured bindings
Zipping Two Ranges
The Zipper
I think a reasonable start would be the zipper
class. This is the class that provides the begin
and end
functions for the range-for expression. It
accepts and stores references to the containers to be iterated over.
template <typename T, typename U>
class zipper
{
public:
/* Select the most appropriate iterator for the past in containers.
This will be covered later */
using Iter1 = select_iterator_for<T>;
using Iter2 = select_iterator_for<U>;
using zip_type = zip_iterator<Iter1, Iter2>;
/* Constructor that takes forwarding references and forwards them to
initialise the members */
template <typename V, typename W>
explicit zipper(V && a, W && b)
: m_a{std::forward<V>(a)}
, m_b{std::forward<W>(b)}
{
}
/* Functions to satisfy the range-for requirement. Returns iterators
for begin and end functions respectively. */
auto begin() -> zip_type {
return zip_type{std::begin(m_a), std::begin(m_b)};
}
auto end() -> zip_type {
return zip_type{std::end(m_a), std::end(m_b)};
}
private:
T m_a;
U m_b;
};
The constructor is templatised in order to accept a forwarding (universal)
reference, which is then forwarded (using std::forward
)
to initialise the member variables. This makes sure the value category is
preserved such that if one of the values is an rvalue reference, the argument
can be moved into the member.
Selecting the Correct Iterator
We need to select the correct iterator for the passed in types. If one of the
types is a const reference then we need to make sure we use it’s
const_iterator
, rather than the normal iterator.
std::decay
is needed because if
T
is a reference, and is not a complete type we can’t
deduce, i.e., we can’t deduce T&::iterator
. Using
decay will give us the fundamental type and allows to access it’s type
definitions (as T::iterator
).
template <typename T>
using select_iterator_for = std::conditional_t<
std::is_const_v<std::remove_reference_t<T>>,
typename std::decay_t<T>::const_iterator,
typename std::decay_t<T>::iterator>;
Constructing the Zipper
Utility method for creating the zipper in the place of the range expression in the for loop we are targeting. We take forwarding references for both arguments because we want to accept lvalues and rvalues, const and non-const.
template <typename T, typename U>
auto zip(T && t, U && u) {
return zipper<T, U>{std::forward<T>(t), std::forward<U>(u)};
}
The Iterator
The zip iterator holds two iterators for the actual type being iterated over, and proxies their usual functions in tandem.
template <typename Iter1, typename Iter2>
class zip_iterator
{
public:
using value_type = std::pair<
select_access_type_for<Iter1>,
select_access_type_for<Iter2>
>;
zip_iterator() = delete;
zip_iterator(Iter1 iter_1_begin, Iter2 iter_2_begin)
: m_iter_1_begin {iter_1_begin}
, m_iter_2_begin {iter_2_begin}
{
}
auto operator++() -> zip_iterator& {
++m_iter_1_begin;
++m_iter_2_begin;
return *this;
}
auto operator++(int) -> zip_iterator {
auto tmp = *this;
++*this;
return tmp;
}
auto operator!=(zip_iterator const & other) {
return !(*this == other);
}
auto operator==(zip_iterator const & other) {
return m_iter_1_begin == other.m_iter_1_begin ||
m_iter_2_begin == other.m_iter_2_begin;
}
auto operator*() -> value_type {
return value_type{*m_iter_1_begin, *m_iter_2_begin};
}
private:
Iter1 m_iter_1_begin;
Iter2 m_iter_2_begin;
};
The comparison we need to make in order to detect when iteration should end is when any of the iterators equal their end position. From a semantics point of view, equality in this sense is not really accurate; you would expect, ordinarily, that the test would be the aggregate equivalence of it’s members. However, if the things being iterated over have different sizes, both iterators will never reach the end at the same time. Therefore we need to return true as soon as any of the iterators are at their end position, terminating the iteration loop.
You could make this semantically correct by calculating what the minimum distance of all the iterator begin/end pairs are. However, this can have issues:
-
Random access iterators can have the distance easily calculated, through subtraction, however;
-
The distance of a pair of forward iterators is only known by performing a pass through to calculate it’s count, and;
-
Input iterators can’t be passed over multiple times.
While iteration should stop when any of the underlying iterators match their end, it raises the question of what should happen if you want to have a reverse iterator, should iteration start at the end of each element, or should it start at the minimum index?
Selecting the Correct Access Type
The consideration here is what the access type is when the internal iterators
are is de-referenced. In most cases this should be a reference, but this is not
always possible. For example, std::vector<bool>
is a
special case in that it doesn’t actually store bool
s
as you might think. It does some special bit twiddling to save on space
(although arguably this is implementation defined). A normal iterator of such is
incapable of returning a reference, and de-referencing returns a value.
template <typename Iter>
using select_access_type_for = std::conditional_t<
std::is_same_v<Iter, std::vector<bool>::iterator> ||
std::is_same_v<Iter, std::vector<bool>::const_iterator>,
typename std::iterator_traits<Iter>::value_type,
typename std::iterator_traits<Iter>::reference
>;
The Result
So now you should be able to do something like this:
/* Using a range-for expression with structured bindings from C++17 */
for (auto && [x, y] : zip(a, b)) {
c.push_back(x + y);
}
Even though the
zip()
returns a temporary (as is necessary here), in a range-for expression, it’s lifetime will be extended. 1
It looks like the structured bindings can be qualified by value or as a forwarding reference (double ampersands in range-for do this). You can also provide a temporary:
/* Using a range-for expression with structured bindings from C++17 */
for (auto && [x, y] : zip(a, std::vector<int>{1, 2, 3})) {
c.push_back(x + y);
}
In this case, the type of y
(which represents the
temporary) is bound to a reference, and i’m not sure that makes much sense. It
might be better to have it bound to a const
reference,
but that would likely involve more trickery.
Zipping Multiple Ranges
You can extend this idea to zip multiple containers. I will gloss over this a little bit as it is quite a bit more complicated than the previous example.
The Zipper
This time, the references or values for the containers are stored in a tuple, and the iterator type (which is now also variadic) is bound to the various relevant iterators.
template <typename ... T>
class zipper
{
public:
using zip_type = zip_iterator<select_iterator_for<T> ...>;
/* std::forward is used to preserve the value category of the
containers */
template <typename ... Args>
zipper(Args && ... args)
: m_args{std::forward<Args>(args)...}
{
}
auto begin() -> zip_type {
return std::apply([](auto && ... args){
return zip_type(std::begin(args)...);
}, m_args);
}
auto end() -> zip_type {
return std::apply([](auto && ... args){
return zip_type(std::end(args)...);
}, m_args);
}
private:
std::tuple<T ...> m_args;
};
select_iterator_for
is the same as in the previous example
std::apply
takes a tuple and a function which accepts
the tuple elements as function arguments. This required to expand the arguments
in the m_args member variable, in order to call either begin or end and pass the
results to a variadic function, which in this case is the zip iterator
Constructing the Zipper
template <typename ... T>
auto zip(T && ... t) {
return zipper<T ...>{std::forward<T>(t)...};
}
The difference here is that the helper function is variadic, but the same principles apply.
The Iterator
Instead of storing the underlying iterators in names types, we have to store an
unknown number of them. I choose a std::tuple
as it makes it impossible to
mistakenly compare truthfulness with different types; the iterator types
become a part of the wrapper.
template <typename ... Iters>
class zip_iterator
{
public:
using value_type = std::tuple<
select_access_type_for<Iters>...
>;
zip_iterator() = delete;
zip_iterator(Iters && ... iters)
: m_iters {std::forward<Iters>(iters)...}
{
}
/* Increment all the underlying iterators */
auto operator++() -> zip_iterator& {
std::apply([](auto & ... args){
((args += 1), ...);
}, m_iters);
return *this;
}
auto operator++(int) -> zip_iterator {
auto tmp = *this;
++*this;
return tmp;
}
auto operator!=(zip_iterator const & other){
return !(*this == other);
}
/* Elementwise compare another iterator of this type to detect if any
of the elements match. */
auto operator==(zip_iterator const & other){
return any_match(m_iters, other.m_iters);
}
/* Dereference operator that constructs a tuple (`value_type`) by
expanding and dereferencing each of the underlying iterators in */
auto operator*() -> value_type {
return std::apply([](auto && ... args){
return value_type(*args...);
}, m_iters);
}
private:
std::tuple<Iters...> m_iters;
};
std::apply
is used here to increment each of the
underlying iterators, and the equivalence operator returns true if any of the
iterators pairs are equal (more on this below).
select_access_type_for
is the same as in the previous example
Matching the End of the Tuples
Finally, the equivalence operator is a special case. In a twist of fate, we also
need to do parallel iteration on all of the iterators! We can’t use
std::apply
as, as far as I can tell, only works on
one tuple at a time. With a somewhat terrifying set of template functions:
/* The index sequence is only used to deduce the Index sequence in the template
declaration. It uses a fold expression which is applied to the indexes,
using each expanded value to compare tuple value at that index. If any of
the tuple elements are equal, the function will return true. */
template <typename ... Args, std::size_t ... Index>
auto any_match_impl(std::tuple<Args...> const & lhs,
std::tuple<Args...> const & rhs,
std::index_sequence<Index...>) -> bool {
return (... | (std::get<Index>(lhs) == std::get<Index>(rhs)));
}
/* User function for finding any elementwise match in two tuples. Forwards to
to the implementation the two tuples along with a generated index sequence
which will have the same length as the tuples. */
template <typename ... Args>
auto any_match(std::tuple<Args...> const & lhs,
std::tuple<Args...> const & rhs) -> bool {
return any_match_impl(lhs, rhs, std::index_sequence_for<Args...>{});
}
If any of the iterator pairs are equal, the function will return
true
.
The Result
auto a = std::vector<int>{1, 2, 3, 4, 5, 6};
auto b = std::vector<int>{1, 2, 3, 4, 5, 6, 7};
auto c = std::vector<int>{0, 0, 0, 0, 0};
auto const & d = b;
for (auto && [x, y, z] : c9::zip(a, d, c)) {
z = x + y;
}
auto expected = std::vector<int>{2, 4, 6, 8, 10};
assert(c == expected);
Footnotes
-
C++ Reference: Temporary range expression ↩