Some Python asyncio disambiguation

2018-03-09

Recently I needed to work a little more in-depth with Python 3's asyncio. On the one hand, some people (like me) might scoff at this because it's just green threads and cooperative threading is a model that's fresh out of the '90s, and Python still has the GIL - and because Elixir, Erlang, Haskell, Clojure (also this), Java/Kotlin, and Go all handle async and M:N threading fine, and have for years. The Python folks have their own set of complaints, like "I don't understand Python's Asyncio" and "Why I went from Python to Go (and not node.js)". At least it is in good company with Emacs still.

On the other hand, it's still a useful enough paradigm that it's in the works for Rust (sort of… it had green threads which were removed in favor of a lighter approach) and broadly the JVM (sort of… they're trying to do fibers, not green threads). libuv brings something very similar to various languages, including C, and C already has an asyncio imitator with libgreen. Speaking of C, did anyone know that GLib has some decent support here via things like GTask, GThreadPool, and GAsyncQueue? I didn't until recently. But I digress…

asyncio is still preferable to manually writing code in continuation-passing style (as that's all callbacks are, and last time I had to write that many callbacks, I hated it enough that I added features to my EDSL to avoid it), it's still preferable to a lot of manual arithmetic on timer values to try to schedule things, and it's still preferable to doing blocking I/O all over the place and trying to escape it with other processes. Coroutines are also preferable to yet another object-oriented train-wreck when it comes to handling things like pipelines. While Python's had coroutines for quite a while now, asyncio perhaps makes them a little more obvious. David Beazley's slides are excellent for explaining its earlier coroutine support.

I found the Concurrency with Processes, Threads, and Coroutines tutorials to be an excellent overview of Python's asyncio, as well as most ways of handling concurrency in Python, and I highly recommend them.

However, I still had a few stumbling blocks in understanding, and below I give some notes I wrote to check my understanding. I put together a table to try to classify what method to use in different circumstances. As I use it here, calling "now" means turning control over to some other code, whereas calling "whenever" means retaining control but queuing up some code to be run in the background asynchronously (as much as possible).

| Call from | Call to | When/where | How |
|---|---|---|---|
| Either | Function | Now, same thread | Normal function call |
| Function | Coroutine | Now, same thread | .run_*() in event loop |
| Coroutine | Coroutine | Now, same thread | await |
| Either | Function | Whenever, same thread | Event loop .call_*() |
| Either | Coroutine | Whenever, same thread | Event loop .create_task() or asyncio.ensure_future() |
| Either | Function | Now, another thread | .run_in_executor() on ThreadPoolExecutor |
| Either | Function | Now, another process | .run_in_executor() on ProcessPoolExecutor |
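
To check my reading of the table above, here is a minimal sketch that exercises most of its rows in one place. The names plain, child, parent, and main are made up for illustration, and I left out the ProcessPoolExecutor row since it works the same way as the thread version but needs picklable arguments.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def plain(tag, log):
    # An ordinary function: it just records that it ran.
    log.append(tag)
    return tag


async def child(log):
    log.append("child")
    return "child-result"


async def parent(loop, log):
    # Coroutine -> coroutine, now, same thread: await
    result = await child(log)
    # Either -> function, whenever, same thread: .call_soon()
    loop.call_soon(plain, "call_soon", log)
    # Either -> coroutine, whenever, same thread: .create_task()
    task = loop.create_task(child(log))
    # Either -> function, now, another thread: .run_in_executor()
    with ThreadPoolExecutor(max_workers=1) as pool:
        threaded = await loop.run_in_executor(pool, plain, "thread", log)
    await task
    return result, threaded, task.result()


def main():
    log = []
    loop = asyncio.new_event_loop()
    try:
        # Function -> coroutine, now, same thread: .run_until_complete()
        return loop.run_until_complete(parent(loop, log)), log
    finally:
        loop.close()


results, log = main()
```

The order in which the "whenever" entries land in log relative to the thread's entry is not something I would rely on; only the first awaited child is guaranteed to come first.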

Futures & Coroutines

The documentation was also sometimes vague on the relation between coroutines and futures. My summary on what I figured out is below.

Python already had generator-based coroutines.

Python now has a language feature it refers to as "coroutines" in asyncio (and in calls like asyncio.iscoroutine()), but as of Python 2.5 it already supported a similar-but-not-entirely-the-same form of coroutine, and even earlier in a limited form via generators. See PEP 342 and Beazley's slides.
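
As a rough illustration of the older style, here is a sketch of a PEP 342 generator-based coroutine next to a native one. The running_average and native names are my own examples, not anything from the PEP or from asyncio.

```python
import asyncio


def running_average():
    # PEP 342 style: `yield` is an expression that receives
    # whatever the caller passes to .send().
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count


averager = running_average()
next(averager)               # prime it: run up to the first yield
first = averager.send(10)
second = averager.send(20)
averager.close()


async def native():
    # The newer language-level coroutine that asyncio talks about.
    return 42


coro = native()
is_native_coroutine = asyncio.iscoroutine(coro)
coro.close()                 # never scheduled, so close it explicitly
```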

Coroutines and Futures are mostly independent.

It just happens that both allow you to call things asynchronously. However, you can use coroutines/asyncio without ever touching a Future. Likewise, you can use a Future without ever touching a coroutine or asyncio. Note that a Future's .result() call isn't a coroutine.
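
A small sketch of that independence, under the assumption that the coroutine never actually suspends (otherwise it needs an event loop to drive it). The add and drive helpers are hypothetical names for this example.

```python
from concurrent.futures import ThreadPoolExecutor


async def add(a, b):
    return a + b


def drive(coro):
    # Step a coroutine by hand: no event loop and no Future involved.
    # This only works here because add() never awaits anything.
    try:
        coro.send(None)
    except StopIteration as stop:
        return stop.value
    coro.close()
    raise RuntimeError("coroutine suspended; it needs an event loop")


by_hand = drive(add(1, 2))

# A Future with no coroutine or asyncio anywhere in sight.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(pow, 2, 10)
    # .result() is an ordinary blocking method call, not something to await.
    from_future = future.result()
```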

They can still encapsulate each other.

A coroutine can encapsulate a Future simply by using await on it.

A Future can encapsulate a coroutine with asyncio.ensure_future() or the event loop's .create_task().
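
Both directions of encapsulation can be sketched roughly like this; compute, waits_on_future, and main are illustrative names I made up.

```python
import asyncio


async def compute():
    return 21 * 2


async def waits_on_future(loop):
    # A coroutine encapsulating a Future: just await it.
    future = loop.create_future()
    loop.call_soon(future.set_result, "from future")
    return await future


async def main(loop):
    via_await = await waits_on_future(loop)
    # A Future encapsulating a coroutine, both ways.
    task_a = asyncio.ensure_future(compute())
    task_b = loop.create_task(compute())
    kinds = (isinstance(task_a, asyncio.Future),
             isinstance(task_b, asyncio.Task))
    return via_await, await task_a, await task_b, kinds


loop = asyncio.new_event_loop()
try:
    outcome = loop.run_until_complete(main(loop))
finally:
    loop.close()
```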

Futures can implement asynchronicity(?) differently

The ability to make a Future from a coroutine was mentioned above; that's asyncio.Task, an implementation of asyncio.Future, but it's not the only way to make a Future.

concurrent.futures.Future provides other mostly-compatible ways. Its ThreadPoolExecutor provides Futures based on separate threads, and its ProcessPoolExecutor provides Futures based on separate processes.
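
Here is a sketch of what "mostly-compatible" looked like to me in practice, as far as I can tell: a concurrent.futures.Future can't be awaited directly, but asyncio.wrap_future() bridges it, and .result() on an unfinished Future behaves differently between the two. The square function and variable names are only for illustration.

```python
import asyncio
from concurrent.futures import Future as ThreadFuture, ThreadPoolExecutor


def square(n):
    return n * n


async def main(loop, pool):
    thread_future = pool.submit(square, 7)
    # concurrent.futures.Future has no __await__, so wrap it first.
    awaitable = hasattr(thread_future, "__await__")
    awaited = await asyncio.wrap_future(thread_future)

    # asyncio's Future refuses .result() before it is done,
    # where the concurrent.futures one would block instead.
    pending = loop.create_future()
    try:
        pending.result()
        raised = False
    except asyncio.InvalidStateError:
        raised = True
    pending.cancel()
    return isinstance(thread_future, ThreadFuture), awaitable, awaited, raised


loop = asyncio.new_event_loop()
try:
    with ThreadPoolExecutor(max_workers=1) as pool:
        summary = loop.run_until_complete(main(loop, pool))
finally:
    loop.close()
```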

Futures are always paired with some running context.

That is, a Future is already "started" - running, or scheduled to run, or already ran, or something along those lines, and this is why it has semantics for things like cancellation.

A coroutine by itself is not. The closest analogue is asyncio.Handle, which the event loop's .call_*() methods return once a callback has been scheduled to run.
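
A rough sketch of the difference: a bare coroutine object does nothing and has no cancel(), while the Task made from it is already scheduled and can be cancelled. The worker and main names are hypothetical, and the sleep of 10 seconds is never actually waited out because the task gets cancelled first.

```python
import asyncio


async def worker(log):
    log.append("started")
    await asyncio.sleep(10)
    log.append("finished")


async def main(loop):
    log = []
    coro = worker(log)                  # just an object; nothing runs yet
    await asyncio.sleep(0)
    before = list(log)

    task = asyncio.ensure_future(coro)  # now it has a running context
    handle = loop.call_soon(log.append, "callback")
    handle.cancel()                     # a Handle can be cancelled as well
    await asyncio.sleep(0)              # give the task a chance to start
    after = list(log)

    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return before, after, task.cancelled(), hasattr(coro, "cancel")


loop = asyncio.new_event_loop()
try:
    report = loop.run_until_complete(main(loop))
finally:
    loop.close()
```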

Other Event Loops

Quamash implements an asyncio event loop inside of Qt, and I used this on a project. I ran into many issues with this combination. Qt's juggling of multiple event loops seemed to cause many problems here, and I still have some unsolved issues in which calls to run_until_complete cause coroutines to die early with an exception because the event loop appears to have died. This came up regularly for me because of how often I would want a Qt slot to queue a task in the background, and it seems this is an acknowledged issue.

There is also uvloop. I presently have no need for extra performance (nor could I really use it alongside Qt), but it's helpful to know about.

Other References

There are a couple pieces of "official" documentation that can be good references as well:

PEP 342 and PEP 380 are relevant too.
