kirit.com

Created 26th April, 2005 13:12 (UTC), last edited 22nd December, 2016 03:24 (UTC)

Writing about C++, Programming, Fost 5, the web, Thailand and anything else that catches my attention—with some photos thrown in

Pacific++ trip report

Posted 31st October, 2017 02:25 (UTC), last edited 2nd November, 2017 01:31 (UTC)

Pacific++ is the first programming language conference I've been to, and I have to say that actually being there is a big step up from just watching videos on YouTube. The conference was small and intimate, which was great: I think I managed to talk to nearly half of the roughly 70 attendees. We even got mugs.

Day 1

Chandler Carruth opened with a live demo (video) of how to build clang and its tooling, and then went on to demo some of its other features.

I've been building a toolchain for our internal use so that we're able to make use of the new coroutines features that shipped with clang 5. I'd expected this to be a short term thing so hadn't put too much effort into it, so it was interesting to learn that Google actually do this weekly, building from the HEAD of master. Maybe tidying mine up for long term use will be worthwhile.

One thing Chandler didn't talk about is how they distribute the toolchain to developers, a problem I think we've solved quite well using Docker. I think it's something that I should probably blog about.

Making use of the LLVM linker (-fuse-ld=lld, which is much faster) and the link time optimiser (-flto=thin) also looked worthwhile.

He also showed us the CFI sanitizer (-fsanitize=cfi) which is designed to help harden your code in production (important for our networking code).

Chandler only needed four and a half minutes to compile all of clang, but he did later admit that he was using a 128 core monster he has at home, built especially to make clang builds fast.

Next up Toby Allsopp talked about coroutines (video). He gave a general high level overview of how they work and some of the things you can do with them, and most importantly, showed the benefits they bring in terms of code clarity. He has some great diagrams of the control flow through the coroutine and its calling function which would certainly help my tutorial articles. I'll link his talk from the introduction article once it's up on YouTube.

Matt Bentley took us through his higher performance list implementation (video), which uses an allocation scheme that tries to keep list cells contiguous in memory as far as possible. It turns out that there are many cases where the common wisdom of always just using a vector isn't best for performance, and he showed us places where list implementations (especially his) do win out.

Dean Michael Berris explained how XRay works. XRay is a tool for gathering performance data about the execution of your code, and is fast enough that it can be used in production systems with the data gathering turned on and off whilst the system is running.

This looks like a very interesting capability that is now included in newer versions of clang (and is yet another reason why building your own toolchain is worthwhile).

Dean took us through the low level implementation details of how the instrumentation is turned on and off at runtime and why the overhead is so low. He also showed some of the (offline) analysis tools that come with XRay.

Right now it doesn't support shared objects (dynamic linking with .so files), but Dean promised me that was work in progress. Still more reason to stay as near to HEAD with llvm as possible—and, if whoever is working on it needs somebody to test it out then please let me know :)

The final talk was Christian Blume's about his transwarp library, which is able to build up a graph (acyclic only) of tasks with their dependencies and then execute it multiple times (the alternative approach is to build the tree for each group of tasks and then tear it down once they've executed).

It takes a very different approach to how this would look with coroutines or the ranges TS, which may be easier to work with where the dependency graph is only known at runtime.

At the end of the day there was much lively discussion in the bar, followed by the speakers' dinner, which I'd decided to pay extra for given that I was there on my own and figured it would be a good opportunity to talk to more people (spoiler: it was well worth it).

At the dinner Chandler came out with a very interesting rant on the difficulties compilers have with user defined literals, especially for the compile time string handling of the type I've been playing around with in f5-cord's tstring class — something I'll have to think more about. Not only are there parsing problems with them, but more importantly, compile time substring processing is especially costly in terms of compile times (thankfully not something I care about).
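To give a feel for what's involved (a minimal sketch of my own, not Chandler's example and not the f5-cord code): a literal operator runs during compilation, and any substring-style processing then has to happen entirely inside constexpr evaluation, which is where the compile time cost comes from.

```cpp
#include <cstddef>

// Hypothetical literal suffix (_len is made up for this sketch): the
// compiler invokes this at compile time for "..."_len expressions.
constexpr std::size_t operator""_len(const char *, std::size_t n) {
    return n;
}

// The kind of compile time string processing that adds up: character by
// character work done during constexpr evaluation.
constexpr bool same_prefix(const char *a, const char *b, std::size_t n) {
    for (std::size_t i = 0; i != n; ++i)
        if (a[i] != b[i]) return false;
    return true;
}

static_assert("hello"_len == 5);
static_assert(same_prefix("hello", "help", 3));
```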

Day 2

The second day started off with Jason Turner (video) telling us about the importance of noexcept. I'd never worried too much about it because I thought that the overhead of having to manage the terminate condition mandated in the standard would negate any possible code generation improvements. Jason showed that to be categorically false, and even more importantly demonstrated many code improvements driven directly by trying to make noexcept the things that should be.
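One concrete place where noexcept changes the generated code (my own minimal illustration, not one of Jason's examples): std::vector only moves its elements during reallocation when the element's move constructor is noexcept, falling back to copies otherwise to preserve the strong exception guarantee. std::move_if_noexcept is the mechanism behind that choice.

```cpp
#include <type_traits>
#include <utility>

struct NothrowMove {
    NothrowMove() = default;
    NothrowMove(NothrowMove &&) noexcept = default;
    NothrowMove(const NothrowMove &) = default;
};

struct MayThrowMove {
    MayThrowMove() = default;
    MayThrowMove(MayThrowMove &&) {} // user provided, not noexcept
    MayThrowMove(const MayThrowMove &) = default;
};

// move_if_noexcept only yields an rvalue (allowing a move) when the move
// constructor can't throw; otherwise it yields an lvalue, forcing a copy.
static_assert(std::is_rvalue_reference_v<
    decltype(std::move_if_noexcept(std::declval<NothrowMove &>()))>);
static_assert(std::is_lvalue_reference_v<
    decltype(std::move_if_noexcept(std::declval<MayThrowMove &>()))>);
```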

Carl Cook showed a number of techniques used by high frequency trading developers to reduce latency when trading. I've been watching a lot of high throughput videos by games developers and then applying those techniques to Postgres database designs, so it was interesting to see some low latency examples too. For some of the things we want to do latency can be important so now I need to think about how to apply these techniques to databases.

Wanting to implement state machines was a big reason for me to pick up coroutines after watching Dominic Robinson's ACCU talk, so it was great to see what Nick Sarten had done using std::variant. He benchmarked a few different approaches, including one based on inheritance. That implementation, like a coroutine based one, performs a memory allocation on state change, and the benchmarks showed just how expensive that is. Something we're going to need to think a lot more about.
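The shape of a variant based state machine looks something like this (a hypothetical sketch of the general technique, not Nick's code): the states are alternatives of a std::variant, so a transition is just an assignment with no heap allocation, unlike the inheritance version where each state change allocates a new state object.

```cpp
#include <variant>

// Hypothetical three-state machine: each state is a plain struct living
// inside the variant's own storage.
struct Idle {};
struct Running { int progress = 0; };
struct Done {};

using State = std::variant<Idle, Running, Done>;

struct StartEvent {};
struct FinishEvent {};

// One overload per (state, event) pair we care about; any other
// combination leaves the state unchanged.
State on_event(const State &s, StartEvent) {
    if (std::holds_alternative<Idle>(s)) return Running{};
    return s;
}
State on_event(const State &s, FinishEvent) {
    if (std::holds_alternative<Running>(s)) return Done{};
    return s;
}
```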

Sarah Smith is clearly a much braver person than me. I've done some live coding talks, but she built an entire application using Qt and then ran it on both her desktop and her phone (iOS — of course the Android device had problems connecting to the dev environment, as they always do).

She also talked about her history and how she ended up a mobile developer and entrepreneur. I always love to hear stories about how people ended up where they are.

The final talk was Tom Isaacson showing how what look like the same warnings actually differ a lot between compilers, and how that affects cross platform development. I don't turn warnings into errors because of the range of different compilers we use, but I can certainly see the appeal, and it would be good to tighten this up and turn on more warnings. He showed some interesting ones that are not on by default.

Go to a conference

This was my first C++ conference (actually, my first real tech conference), and it was well worth the trip. I've watched a lot of videos from other conferences, and although they're great, it's nothing like actually being there and getting to talk to people.

I think my next mission should be to make it to the ACCU conference in the UK.


I'll link the videos as they come online. If any links are wrong, or could be better pointed elsewhere, please let me know. I've probably misunderstood some of what people were trying to say in their talks; the mistakes are my own.



Fost 5 release 5.17.09.45051 now out

Posted 23rd September, 2017 03:51 (UTC), last edited 23rd September, 2017 04:12 (UTC)

The layout changes to fostlib::json have now landed. Previously the atomic parts of JSON (bool, int, double, string etc.) were stored in a separate variant nested inside the one that handled the choice between atomic, array and object. By bringing this all into one variant the type is smaller and dealing with atomic types is faster.

As part of this change we also moved the object and array to being stored through a shared_ptr rather than directly. This was always the intended design, and means that taking a copy of a json is now significantly cheaper. We have more changes coming, now that we have this base, that should make the type smaller still and further improve performance.
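The layout described above can be sketched like this (my own illustration of the design, not fostlib's actual code): a single variant holds both the atomic values and the structured ones, with the array and object alternatives held by shared_ptr so that copying a json only bumps a reference count.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <variant>
#include <vector>

struct json;
using json_array = std::vector<json>;
using json_object = std::map<std::string, json>;

// One variant for everything: monostate models JSON null, the atomic
// types sit directly in the variant, and arrays/objects are shared.
struct json {
    std::variant<std::monostate, bool, std::int64_t, double, std::string,
                 std::shared_ptr<json_array>, std::shared_ptr<json_object>>
        value;
};
```

Copying a json that contains an array now just copies the shared_ptr, so both copies point at the same underlying storage.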

We're also still moving towards improving string handling with a few new immutable string types in the pipeline and a move towards using immutable string views in many more places.

It's also about time that we get serious with Python 3 support. All of our projects are now using this and we need to be able to support bindings for it correctly. There's still some debate about whether that should be a branch in fost-py or a separate library. In any case the entirety of the Python 2 version will be replaced with something completely new.

Building on Linux

You will need a C++14 compiler. Recent versions of either gcc or clang are suitable.

git clone --branch=5.17.09.45051 --recursive git@github.com:KayEss/fost-hello.git
cd fost-hello
Boost/build
hello/compile
dist/bin/hello-world-d

Download locations

Applications

  • beanbag — Stand alone transactional JSON database server — git@github.com:KayEss/beanbag.git
  • beanbag-seed — Seed project for giving you a starting point to develop web applications using Beanbag — git@github.com:KayEss/beanbag-seed.git
  • fost-hello — Sample seed project — git@github.com:KayEss/fost-hello.git
  • mengmon — Stand alone web server — git@github.com:KayEss/mengmon.git
  • wright — Experimental build system — git@github.com:KayEss/wright.git

Libraries

  • f5-cord — First version of a new string library with compile time string and Unicode support — git@github.com:KayEss/f5-cord.git
  • f5-threading — Preview of the first Fost 5 library which includes help for threading — git@github.com:KayEss/f5-threading.git
  • fost-aws — Amazon AWS and OpenStack — git@github.com:KayEss/fost-aws.git
  • fost-android — Eclipse project for Android that allows Fost 4 and Beanbags to be used on mobile devices — git@github.com:KayEss/fost-android.git
  • fost-android-ndk — The native code for Android. Includes required parts of Boost configured to use the standard Android build system.
  • fost-beanbag — Transactional JSON database — git@github.com:KayEss/fost-beanbag.git
  • fost-base — Build system and core libraries — git@github.com:KayEss/fost-base.git
  • fost-internet — Internet protocols, servers & clients — git@github.com:KayEss/fost-internet.git
  • fost-meta — All libraries in one wrapper — git@github.com:KayEss/fost-meta.git
  • fost-orm — Object/Relational mapping — git@github.com:KayEss/fost-orm.git
  • fost-postgres — PostgreSQL — git@github.com:KayEss/fost-postgres.git
  • fost-py — Python (2.x) bindings — git@github.com:KayEss/fost-py.git
  • fost-web — Web server libraries — git@github.com:KayEss/fost-web.git
  • fost-wright — Experiment in a build system — git@github.com:KayEss/fost-wright.git

Detailed change log

cord

  • Add a _t literal for creating tstring instances. This allows us to deprecate the vstring template.
  • Added iostream output for lstring.

fostgres

  • Add -h and -U options to fostgres-test to control how to access the database.

fost-base

  • Fix generation of log messages from JSON where the modules are an array.
  • Count the number of times an exception has been absorbed.
  • Remove variant for storing atoms in json
  • Don't print empty objects in the log stream.
  • Load the global logging configuration much later so it can be changed using the -i or -j command line switches.
  • Better integrate utf::u8_view and string.
  • Make array_view sliceable.
  • There is now an explicit constructor from std::string for the u8_view.
  • Make dates fully comparable.
  • Improve error report when coercing JSON to strings fails.
  • The unexpected_eof error can now take a boost::system::error_code for use with socket errors.

fost-internet

  • Ampersands, single quote and dollar symbol added to allowed URL path characters.
  • The conversion of HTTP request file paths to strings now works for a wider range of inputs. The conversion to boost::filesystem::wpath is problematic with Boost versions prior to 1.64 due to the way they use codecvt.

fost-orm

  • Improve error report when a beanbag update precondition wasn't met.

fost-postgres

  • Make it possible to iterate over the columns in the row.
  • Change libpqxx paths for latest code.
  • Implement assignment and pre-increment for recordset::const_iterator

fost-py

  • The spider always outputs a test.txt file with all results in it.

fost-web

  • The template middleware now uses the headers from the wrapped view instead of throwing them away.
  • Logging middleware also records the processing duration.


Awaiting

Created 27th August, 2017 06:37 (UTC), last edited 27th August, 2017 06:55 (UTC)

We've covered co_return and co_yield now. The final new keyword introduced by the coroutines TS is co_await. Where the first two keywords allow a coroutine to suspend whilst its consumer gets ready to use results, co_await is used to allow the coroutine to suspend whilst it waits for things it needs. I tend to think of co_return and co_yield as servicing the outside of the coroutine and co_await as dealing with the inside.

We're going to start with the refactored lazy class that we had a couple of tutorials ago and make it so that our lazy class supports co_await.

lazy<int> answer() {
    std::cout << "Thinking deep thoghts..." << std::endl;
    co_return 42;
}

sync<int> await_answer() {
    std::cout << "Started await_answer" << std::endl;
    auto a = answer();
    std::cout << "Got a coroutine, let's get a value" << std::endl;
    auto v = co_await a;
    std::cout << "And the coroutine value is: " << v << std::endl;
    v = co_await a;
    std::cout << "And the coroutine value is still: " << v << std::endl;
    co_return 0;
}

int main() {
    return await_answer().get();
}

The first thing to notice here is that we now have two coroutines, both answer and await_answer. We can't perform the await inside main because main (along with constructors and destructors) isn't allowed to be a coroutine.

The reason for this is that a coroutine's return value needs to be properly dealt with from its calling context, so that the coroutine does its work properly. If we tried to turn main into a coroutine there'd be nothing to handle this for us. There are similar problems for both constructors and destructors, neither of which return values.

Because there are two coroutines in play we'll only keep the prints for the lifetime tracking of the lazy instance and not the sync one. We're using sync only as a mechanism to allow us to enter a context within which we can use coroutines and it doesn't play any part in the example other than that.

The co_await API

Just like our co_return and co_yield examples, the co_await requires a particular API on anything we want to await. There are three parts:

  1. await_ready returns a boolean to describe whether or not the suspend is needed, true for don't suspend and false for suspend.
  2. await_suspend which is called when the suspend is needed because the value isn't ready yet.
  3. await_resume which returns the value that is being awaited on. This becomes the result of the co_await expression.

Let's look at what these would look like for our lazy class.

bool await_ready() {
    const auto ready = coro.done();
    std::cout << "Await " << (ready ? "is ready" : "isn't ready") << std::endl;
    return coro.done();
}

We can easily tell if the value is available by looking to see if the lazy is done or not. If it is done then we can move directly on to returning the value to the co_await. If it isn't done we're going to need to resume the lazy body so it can co_return something into the promise type for us to pass on.

void await_suspend(std::experimental::coroutine_handle<> awaiting) {
    std::cout << "About to resume the lazy" << std::endl;
    coro.resume();
    std::cout << "About to resume the awaiter" << std::endl;
    awaiting.resume();
}

Here we can directly see the two coroutines in our example. The function is a member of the lazy returned by answer so the class member coro is the handle to the answer coroutine. We always resume this because we can only end up here if the coroutine isn't done already (checked in the await_ready member).

The passed in awaiting handle is for the sync instance associated with the await_answer coroutine, which is where the co_await appears. Once the lazy is done we resume this coroutine.

The last part (returning the value) is trivial:

auto await_resume() {
    const auto r = coro.promise().value;
    std::cout << "Await value is returned: " << r << std::endl;
    return r;
}

When we run this we can see things working as expected (remember that we only print lifetime information for the lazy instance and not the sync one):

Started await_answer
Promise created
Send back a lazy
Created a lazy object
Started the coroutine, wait for the go!
Move constructed a lazy object
Lazy gone
Got a coroutine, let's get a value
Await isn't ready
About to resume the lazy
Thinking deep thoughts…
Got an answer of 42
Finished the coro
About to resume the awaiter
Await value is returned: 42
And the coroutine value is: 42
Await is ready
Await value is returned: 42
And the coroutine value is still: 42
Promise died
Lazy gone

The first time await_ready is called the lazy hasn't yet run so we can see answer being entered when it is resumed and then the execution path as the value makes its way back out. The second time, however, the lazy is already done so we don't go down the resume path as we skip directly on to returning a second copy of the value.

Comparison with get

Let's look again at our get implementation:

T get() {
    std::cout << "We got asked for the return value..." << std::endl;
    if ( not coro.done() ) coro.resume();
    return coro.promise().value;
}

You'll notice that all the three parts of our new API appear in this:

  1. The if statement's conditional expression appears as await_ready
  2. The resume appears in the await_suspend.
  3. The return is handled by the await_resume.

There is one rather important difference though. The await_suspend also gets given the handle to the second coroutine, and this allows it to do a number of interesting things with it, all of which we'll have to wait for future tutorials for. We'll see our first example in the next article where we'll use co_await for something a bit more cool and surprising.

Addendum 1: operator co_await

The thing that is a bit nasty about this code is that the await API is implemented directly on the lazy instance, and this allows it to be used (and abused) by anybody who has a lazy instance. Luckily there is something we can do about that. We can add await support to any type by adding support for operator co_await. Like many of these operators it can be defined as a member, or stand alone. The signature for the member version would be:

awaitable_type operator co_await();

Where awaitable_type now implements the above API. If we want it to be stand alone then it would be:

template<typename T>
awaitable_type operator co_await(lazy<T> &);

This is analogous to operators like == and <<.

Using this, and the new trick of local types we can use the following:

auto operator co_await() {
    struct awaitable_type {
        handle_type coro;
        bool await_ready() {
            const auto ready = coro.done();
            std::cout << "Await " << (ready ? "is ready" : "isn't ready") << std::endl;
            return coro.done();
        }
        void await_suspend(std::experimental::coroutine_handle<> awaiting) {
            std::cout << "Got to resume the lazy" << std::endl;
            coro.resume();
            std::cout << "Got to resume the awaiter" << std::endl;
            awaiting.resume();
        }
        auto await_resume() {
            const auto r = coro.promise().value;
            std::cout << "Await value is returned: " << r << std::endl;
            return r;
        }
    };
    return awaitable_type{coro};
}

Addendum 2: Tail call optimisation

In the current TS we have to resume both coroutines in the await_suspend. The lazy coroutine's body is executed first and when it suspends after the co_return our await_suspend resumes execution and then continues execution of the awaiting coroutine, in this case

void await_suspend(std::experimental::coroutine_handle<> awaiting) {
    std::cout << "About to resume the lazy" << std::endl;
    coro.resume();
    std::cout << "About to resume the awaiter" << std::endl;
    awaiting.resume();
}

It's not quite as dire as it seems though, because the call is in the tail position. This allows the compiler to perform tail call optimisation to remove the stack frame, something that the optimisers in modern compilers are pretty good at. This means optimised production builds should be fine, but it may prove a problem for debug builds.

To deal with this better a future version of the TS is likely to allow await_suspend to return a coroutine handle to be used as a continuation (that is, to be called immediately after await_suspend returns). This looks like:

auto await_suspend(std::experimental::coroutine_handle<> awaiting) {
    std::cout << "About to resume the lazy" << std::endl;
    coro.resume();
    std::cout << "Returning awaiting coroutine as a continuation" << std::endl;
    return awaiting;
}

In this version the stack frame for our await_suspend is completely gone before the awaiting coroutine is resumed.


Posted: 27th August, 2017 07:52 (UTC)
First announced.

Generating Iterators

Created 30th July, 2017 03:58 (UTC), last edited 27th August, 2017 06:43 (UTC)

Now that we have co_yield working and understand how it works we should be able to quite easily convert our generator to provide iterators so it acts like a normal C++ collection (or range if you're following the ranges TS work).

int main() {
    for ( auto v : count() )
        std::cout << ">> " << v << std::endl;
    return 0;
}

The refactoring is pretty much a matter of just moving code around, but there are a couple of things worth looking at more closely. First though some easy bits:

class iterator {
    generator &owner;
    bool done;
    void advance() {
        std::cout << "Moving to next" << std::endl;;
        owner.coro.resume();
        std::cout << "Are we done? ";
        auto still_going = not owner.coro.done();
        std::cout << (still_going ? "There is another" : "We're done") << std::endl;
        done = not still_going;
    }
};

The done flag is interesting. Because we don't have any sort of position, and certainly no way of rolling back, there isn't much we can do for equality of these iterator instances other than tell whether they're both at the end or not.

All we can do with the coroutine handle is either destroy or resume it. We can't copy them and we can't step back to a previous suspension point. Our options here are very limited. This leads to the following for !=:

bool operator != (const iterator &r) const {
    return done != r.done;
}

And now it becomes clear that our iterator constructor should mark done as true for the end iterator:

iterator(generator &o, bool d)
: owner(o), done(d) {
    if ( not done ) advance();
}

In generator we now know how to make and return the iterators:

iterator begin() {
    return iterator{*this, false};
}
iterator end() {
    return iterator{*this, true};
}

This just leaves the final two iterator members we need for our ranged for loop:

iterator &operator ++ () {
    advance();
    return *this;
}
int operator * () const {
    return owner.coro.promise().current_value;
}

Running gives us what we expect:

New promise created
generator
Initial suspend
~generator (empty)
Moving to next
Going to yield 1
Yielded 1
Are we done? There is another
>> 1
Moving to next
Going to yield 2
Yielded 2
Are we done? There is another
>> 2
Moving to next
Going to yield 3
Yielded 3
Are we done? There is another
>> 3
Moving to next
count() is done
return_void
Final suspend
Are we done? We're done
~generator (contains a coroutine)
Promise died

To make our generator properly and generally useful we should now turn it into a template and fill in the rest of the iterator so it has all of the expected members. I'll leave that as an exercise.

Yielding primes

So let's look at yielding something a little bit more interesting:

generator primes(int limit) {
    if ( limit >= 2 ) co_yield 2;
    for ( int n = 3; n <= limit; n += 2 ) {
        if ( is_prime(n) ) co_yield n;
    }
}

And we can call it like this:

int main() {
    std::cout << "coro4" << std::endl;
    for ( auto v : primes(10) )
        std::cout << ">> " << v << std::endl;
    return 0;
}

It's true that iterators can also be used as generators and we don't need this co_yield machinery to produce primes, but it's also true that this code is simpler, and that this just works. We haven't needed to alter any of our generator class code, or the way that the iterators work.

This is one of the big benefits of using coroutines. In the pre-coroutine world, if you want to generate different things then each one is a different iterator implementation where you have to break the code up in exactly the right way. With co_yield the compiler does this for you.
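Both versions of primes lean on an is_prime helper that isn't shown in these articles. A minimal trial-division sketch (my own, and plenty for the small limits used here) might look like:

```cpp
// Trial division: test odd candidates against every divisor d where
// d * d <= n. Fine for small n; a sieve would be better at scale.
bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}
```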

We passed a parameter to the coroutine this time. How does that work? We've already seen the coroutine_handle class that we use to deal with the promise_type and we use to resume and destroy the coroutine itself. We've sometimes referred to this as a stack frame, because that's also what it is. The compiler arranges to allocate some memory for us where information about the coroutine can be stored for as long as we keep it alive.

As well as our promise_type the compiler keeps whatever bookkeeping it needs to remember where the coroutine is up to (which suspension point we're currently at when the coroutine isn't running) and it also keeps the arguments and all the local parameters there. This “stack frame” that is kept on the heap is what allows us to remember everything we need to run inside the coroutine as we bounce between it and the code that uses the results.

Because there is only a single stack frame, and it lives on the heap, these sorts of coroutine are, rather confusingly, referred to as “stackless” coroutines: there's a stack frame, but not a whole stack.

If we were to have a whole stack then the sorts of coroutines we would be talking about would be “stackful” coroutines. Each type has its own set of trade-offs, but on most platforms stackful coroutines are easy to implement as a library and don't require compiler support. Stackless coroutines are most easily done with compiler support, and it's stackless coroutines that look to be standardised first (hopefully in C++20). The Boost coroutine libraries implement stackful coroutines if you want to play with those.

One of the biggest downsides of stackless coroutines is that they are “infectious”, in the sense that once you start to use them, more and more of your code becomes coroutines. This isn't so much a problem when you're only using co_return and co_yield, but we'll see it when we start to want to use co_await.

Stackful coroutines don't suffer from this, but the requirement to have a full stack means that each coroutine uses a lot more memory, meaning they don't scale to as many coroutines as you can with stackless ones.

Yielding an infinity

Let's play around with our prime numbers a bit more. Right now we have to know up front the largest number we want to deal with. Let's change that:

generator primes() {
    co_yield 2;
    for ( int n = 3; true; n += 2 ) {
        if ( is_prime(n) ) co_yield n;
    }
}
int main() {
    for ( auto v : primes() )
        std::cout << ">> " << v << std::endl;
    return 0;
}

Normally an infinite loop like this is a Bad Idea™ and indeed when we run this it does loop forever, so probably some limit is in order. Let's try to put it on the consumer side:

int main() {
    for ( auto v : primes() ) {
        if ( v > 15 ) break;
        std::cout << ">> " << v << std::endl;
    }
    return 0;
}

This is the output:

New promise created
generator
Initial suspend
~generator (empty)
Moving to next
Yielded 2
Are we done? There is another
>> 2
Moving to next
Yielded 3
Are we done? There is another
>> 3
Moving to next
Yielded 5
Are we done? There is another
>> 5
Moving to next
Yielded 7
Are we done? There is another
>> 7
Moving to next
Yielded 11
Are we done? There is another
>> 11
Moving to next
Yielded 13
Are we done? There is another
>> 13
Moving to next
Yielded 17
Are we done? There is another
~generator (contains a coroutine)
Promise died

If you look at the end of this you can see that despite the fact that the coroutine never finishes the promise destructor worked just fine. It turns out that everything that is in the stack frame is also properly dealt with. When the coroutine is destroyed through its handle then the compiler makes sure that the correct destructors for everything that is used are properly called.

We can absolutely rely on this behaviour, in the same way that we can rely on the compiler calling the correct destructors when an exception bypasses a function. So long as a coroutine is suspended it's always safe to destroy it and never get around to continuing it. Later on we'll play with some more things this can be used for. For now though, we'll move on to the last of the keywords: co_await.


Posted: 30th July, 2017 04:28 (UTC)
First announced.

Yielding Generators

Created 15th July, 2017 05:28 (UTC), last edited 30th July, 2017 03:59 (UTC)

We've seen how the promise_type together with the coroutine return type handles the interactions between the caller and the coroutine itself.

Our target is to be able to do something pretty simple:

generator count() {
    std::cout << "Going to yield 1" << std::endl;
    co_yield 1;
    std::cout << "Going to yield 2" << std::endl;
    co_yield 2;
    std::cout << "Going to yield 3" << std::endl;
    co_yield 3;
    std::cout << "count() is done" << std::endl;
}

co_yield is actually very similar to our co_return example previously. There is really just one additional component, and one small change:

  1. Because our generator doesn't have any sort of return value there is an implicit return that produces nothing. This means that instead of return_value we are going to need to fill in a return_void method.
  2. For each value that is yielded the compiler will arrange to make a call to yield_value.

These two things together mean that we need to suspend at the yields (so the user of the generator gets a chance to read the value) and we know we're finished yielding values when the return_void promise method gets called.

Let's get started

Let's sketch out something that should look fairly familiar:

struct generator {
    struct promise_type {
    };
    using handle_type =
        std::experimental::coroutine_handle<promise_type>;
private:
    generator(handle_type h)
    : coro(h) {
        std::cout << "generator" << std::endl;
    }
    handle_type coro;
};

This should look fairly similar to our lazy and sync examples.

We can track the lifetime of the promise to help us see what's going on and we also suspend the coroutine both at its start and end, all by adding these members to our promise_type:

promise_type() {
    std::cout << "Promise created" << std::endl;
}
~promise_type() {
    std::cout << "Promise died" << std::endl;
}
auto initial_suspend() {
    std::cout << "Started the coroutine, let's pause!" << std::endl;
    return std::experimental::suspend_always{};
}
auto final_suspend() {
    std::cout << "Finished the coroutine" << std::endl;
    return std::experimental::suspend_always{};
}

We still need to return the generator object from our promise_type, and we also have to do something if an exception happens.

auto get_return_object() {
    std::cout << "Send back a generator" << std::endl;
    return generator{handle_type::from_promise(*this)};
}
void unhandled_exception() {
    std::cout << "Exception...." << std::endl;
    std::exit(1);
}

For our co_return based lazy and sync classes we also had to provide a mechanism for the returned value to find its way to the caller. We still have to handle the coroutine finishing, but because this coroutine has no return statement we implement the return_void member instead:

void return_void() {
    std::cout << "return_void" << std::endl;
}

Note that, unlike initial_suspend and final_suspend, the result of return_void isn't used as an awaitable, so it should simply return void. There's no suspension here anyway: after the return the coroutine carries straight on to the final suspension point (where we will stop so the generator can clean up for us).

Yielding values

The last thing we need is to handle the yield itself. When the coroutine gets to a co_yield the compiler will call yield_value on our promise_type so we can see the value and decide what to do. We stash it in a current_value member on the promise so the generator can hand it to the caller later:

int current_value;
auto yield_value(int value) {
    std::cout << "Yielded " << value << std::endl;
    current_value = value;
    return std::experimental::suspend_always{};
}

It's also important that we suspend here. If we don't then our coroutine will carry on and run another co_yield. This will cause yield_value to be called again and we'll lose the value that was just given to us—it'll get overwritten by the new one.

As we saw before, we now have the coroutine performing its half of the control flow. The rest of the control flow that happens needs to be in the generator code because that is what our caller will interact with.

There is a dance between the generator and the promise_type to control the resumption of the coroutine and generation of values in exactly the right way. We need to make sure that we suspend and resume so that we get all of the values and don't skip any (especially easy at the start and end).

We need some sort of API that users of generator are going to use to fetch values. The obvious one is some sort of iterator based interface, but let's start with something a little bit simpler that involves less code.

int main() {
    auto g = count();
    while ( g.move_next() )
        std::cout << ">> " << g.current_value() << std::endl;
    return 0;
}

The implication here is that move_next will resume the coroutine until a co_yield or a return happens. If a co_yield happens it returns true; if the coroutine returns it returns false to signal the end of the yielded sequence. Inside the loop we expect the current_value method to return the last value yielded.

The latter is very straightforward:

int current_value() {
    return coro.promise().current_value;
}

This is exactly as we had with the earlier sync and lazy examples. We store the last yielded value in the promise and can easily retrieve it through the coroutine handle we store in the generator.

The sequence for fetching values should be:

  1. Enter the coroutine, but suspend before doing anything.
  2. When move_next is called, resume the coroutine so that it can either return or yield a value.
  3. If the coroutine is now done we didn't get a value, so move_next returns false.
  4. If we're not done then return true so that move_next can be called again (going back to step 2).

Let's do that and also track what's going on:

bool move_next() {
    std::cout << "Moving to next" << std::endl;
    coro.resume();
    std::cout << "Are we done? ";
    auto still_going = not coro.done();
    std::cout << (still_going ? "There is another" : "We're done") << std::endl;
    return still_going;
}

We now have enough to run:

Promise created
Send back a generator
generator
Started the coroutine, let's pause!
Moving to next
Going to yield 1
Yielded 1
Are we done? There is another
>> 1
Moving to next
Going to yield 2
Yielded 2
Are we done? There is another
>> 2
Moving to next
Going to yield 3
Yielded 3
Are we done? There is another
>> 3
Moving to next
count() is done
return_void
Finished the coroutine
Are we done? We're done

Examining this we can see the dance between the two sides, the promise_type and the generator, play out in the right way.

  1. The promise creates and returns the generator.
  2. The coroutine is entered, but suspended.
  3. The generator's move_next is called and it advances the coroutine.
  4. The coroutine yields its first value.
  5. move_next finds that the coroutine has yielded, so returns true.
  6. The value is printed.
  7. Steps 3 to 6 are repeated another couple of times.

The last time through is a bit different. The final move_next results in a call to return_void rather than yield_value, and the coroutine ends up suspended at the final suspension point. coro.done() now returns true, so we know that the coroutine has yielded everything it can (or wants to), move_next returns false and we're all done.

To see the importance of getting this dance right let's change the initial_suspend to keep going:

auto initial_suspend() {
    std::cout << "Started the coroutine, keep going!" << std::endl;
    return std::experimental::suspend_never{};
}

Now we'll see this sequence:

Promise created
Send back a generator
generator
Started the coroutine, keep going!
Going to yield 1
Yielded 1
Moving to next
Going to yield 2
Yielded 2
Are we done? There is another
>> 2
Moving to next
Going to yield 3
Yielded 3
Are we done? There is another
>> 3
Moving to next
count() is done
return_void
Finished the coroutine
Are we done? We're done

And because we didn't stop at the beginning, the coroutine ran straight to the first yield before the caller could do anything, and the first move_next then resumed past that value before it was ever read. We can fix this by changing the way the generator is used:

int main() {
    auto g = count();
    do {
        std::cout << ">> " << g.current_value() << std::endl;
    } while ( g.move_next() );
    return 0;
}

Now we see all of our values:

Promise created
Send back a generator
generator
Started the coroutine, keep going!
Going to yield 1
Yielded 1
>> 1
Moving to next
Going to yield 2
Yielded 2
Are we done? There is another
>> 2
Moving to next
Going to yield 3
Yielded 3
Are we done? There is another
>> 3
Moving to next
count() is done
return_void
Finished the coroutine
Are we done? We're done

But of course we have another problem: the code that uses the generator now assumes the generator yields at least one value. Our first API is probably the best for a general purpose generator, so let's put that back.

Fixing the leak

We still have a more subtle problem though. Notice we never see the “Promise died” message. This is because we're not yet managing the lifetime of the coroutine itself. We have the same issue that we saw with sync and lazy: we need to make the generator movable, but not copyable, and then destroy the coroutine at the correct point in time.

~generator() {
    std::cout << "~generator "
        << (not coro ? "(empty)" : "(contains a coroutine)")
        << std::endl;
    if ( coro ) coro.destroy();
}
generator(const generator &) = delete;
generator(generator &&g)
: coro(g.coro) {
    g.coro = nullptr;
}

Now we can see the coroutine itself is destroyed when our generator goes out of scope:

Promise created
Send back a generator
generator
Started the coroutine, let's pause!
~generator (empty)
Moving to next
Going to yield 1
Yielded 1
Are we done? There is another
>> 1
Moving to next
Going to yield 2
Yielded 2
Are we done? There is another
>> 2
Moving to next
Going to yield 3
Yielded 3
Are we done? There is another
>> 3
Moving to next
count() is done
return_void
Finished the coroutine
Are we done? We're done
~generator (contains a coroutine)
Promise died

Next we're going to convert this to an iterator, a straightforward enough procedure, but with a few things worth pointing out.


Categories:
Posted: 16th July, 2017 03:55 (UTC)
First announced.