
Co-Operative Multi-tasking in Python

Note: This is a draft.

Introduction

Software developers are showing enormous interest in co-operative multi-tasking. This in turn has made all modern programming languages like Rust, Go, Python (3.6+), JavaScript/TypeScript, Julia, .NET etc. provide built-in syntax for co-operative multitasking. In this article, I will give a beginner's introduction to using co-op multitasking in Python. Once you get hold of co-op tasking, it helps you even if you use other programming languages. C++20 is adding similar support (coroutines), so you can re-use the understanding from this article if your primary language is C++. At the end, I will share some demo code to try on your console!

What is Co-operative Tasking?

In Linux, the default scheduling policy is pre-emptive multi-tasking. This means the kernel scheduler picks your thread to run on a core based on certain criteria, and it also decides when to pause the running thread. In other words, as a developer, you cannot control when your task moves from the running state to the paused state. This kind of scheduling forces us to do many things, like guarding critical regions of data/code, which in turn forces us to use locks/mutexes. This scheduling policy also has side effects such as unnecessary context switching, which wastes CPU resources.

Certainly pre-emptive tasking has benefits. However, a strong knowledge of alternative methods of tasking makes us better software developers. This is one reason we need to study co-operative multitasking: then you can decide which kind of tasking suits your upcoming s/w design. To understand co-operative tasking, let us first brush up on pre-emptive multi-tasking.

A brief on pre-emptive Multi-Tasking.

POSIX threads (pthreads) are widely used for preemptive multitasking. Calling sched_yield() in a POSIX thread will not turn it into non-preemptive multi-tasking. The reason is that the thread may be preempted even before it calls sched_yield().
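To see this concretely, here is a minimal Python sketch (the thread names and loop counts are my own illustration) that starts two preemptive threads. Even though each worker politely calls os.sched_yield(), where the switches actually happen is still decided by the OS scheduler, not by the code:

import os
import threading

def worker(name):
    """A preemptively scheduled thread: the OS may pause it at any point."""
    for i in range(3):
        print(f"{name}: step {i}")
        os.sched_yield()  # only a hint to the scheduler (Unix); it does not control where we get paused

threads = [threading.Thread(target=worker, args=(f"thread-{n}",)) for n in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

The exact interleaving of the printed lines varies from run to run; that unpredictability is exactly what pre-emptive scheduling means.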

Present common pattern

Dividing complex software into multiple threads or multiple processes is the dominant practice. POSIX pthread_create() / fork() and their wrappers are available on almost every computer; hence the near monopoly of these concepts, if not everywhere, at least in the minds of people from the C/C++ world. It is difficult for us humans to reason about true parallel processing, and this has its own impact on how we write multi-tasking software. Hence the need for better multi-tasking concepts and tools.

Why do we need Co-Operative tasking?

- We can design better software if we can schedule our own code (instead of being forcefully scheduled by the OS scheduler). For example, we can keep a job running as long as we want, without worrying about when control is handed to a peer task.
- We (the developers) can be a better judge of when to give back control than the mechanical scheduler.
- Context switches are fast, but not free.
- Threads take relatively more memory (and hence more costly cache too).
- The developer cannot control the scheduler, so we typically end up using locks, and a multitude of problems starts. Developers have to spend time discussing, designing and implementing locks! (A sketch contrasting the two styles follows this list.)
- Developers keep exploring ways to reduce the interference from the kernel scheduler.
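To make the lock point concrete, here is a sketch (the counter names and iteration counts are my own illustration, not a benchmark). With preemptive threads, the shared counter must be guarded by a lock; with asyncio tasks, the code between two await points cannot be interrupted by a peer task, so this simple update needs no lock:

import asyncio
import threading

# Preemptive threads: the shared counter must be guarded with a lock, because
# the OS may switch threads in the middle of the read-modify-write.
lock = threading.Lock()
thread_counter = 0

def thread_worker():
    global thread_counter
    for _ in range(100_000):
        with lock:
            thread_counter += 1

def run_threads():
    threads = [threading.Thread(target=thread_worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("thread_counter =", thread_counter)

# Co-operative tasks: between two awaits only one task runs, so this update
# cannot be interrupted by a peer task and needs no lock.
async_counter = 0

async def async_worker():
    global async_counter
    for _ in range(100_000):
        async_counter += 1
    await asyncio.sleep(0)  # explicitly hand control back to the event loop

async def run_async():
    await asyncio.gather(async_worker(), async_worker())
    print("async_counter =", async_counter)

run_threads()
asyncio.run(run_async())

Both counters end at 200000, but only the thread version needed the lock.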

Benefits of Co-Operative Multi-Tasking

No Callback Hell in Co-Operative Multi Tasking.

A lot of times, we can divide our work into a large number of smaller tasks, but we cannot create an equally large number of (POSIX) threads for them. We end up designing callbacks to solve this, but the cognitive load of such a design is too much. The runtime cost of co-operative multi-tasking is about the same as that of callback functions, yet we can create a very large number of tasks without loading the CPU, and the code stays linear. Thus, we can avoid callback hell.
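As an illustration (the function names and the fake I/O are my own, not a real API), compare a callback-chained flow with the equivalent async/await flow. The steps are identical, but the coroutine version reads top to bottom like ordinary sequential code:

import asyncio

# Callback style: each step must know which function to call next,
# and the control flow nests deeper with every additional step.
def fetch_user(user_id, on_done):
    on_done({"id": user_id, "name": "alice"})

def fetch_orders(user, on_done):
    on_done([f"order-{user['id']}-1", f"order-{user['id']}-2"])

def callback_flow(user_id):
    def got_user(user):
        def got_orders(orders):
            print("callbacks:", orders)
        fetch_orders(user, got_orders)
    fetch_user(user_id, got_user)

# Coroutine style: the same steps, written sequentially.
async def async_fetch_user(user_id):
    await asyncio.sleep(0.1)  # pretend network latency
    return {"id": user_id, "name": "alice"}

async def async_fetch_orders(user):
    await asyncio.sleep(0.1)
    return [f"order-{user['id']}-1", f"order-{user['id']}-2"]

async def coroutine_flow(user_id):
    user = await async_fetch_user(user_id)
    orders = await async_fetch_orders(user)
    print("coroutines:", orders)

callback_flow(7)
asyncio.run(coroutine_flow(7))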

Practical Co-Operative Multi Tasking in Python (JavaScript)

- It is provided by the Python interpreter in a single thread, using an event loop.
- For the OS, all your co-operating tasks appear as ONE thread, not multiple threads.
- Every time a task yields control, the event loop takes over and schedules the next task, based on which ready-to-run task has waited the longest.
- Co-routines are the key building blocks of this model.
- The async/await keywords were added to Python 3.6 / JavaScript for this purpose.

A minimal example is shown right after this list.
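Here is a minimal, self-contained sketch of that model (the coroutine names and delays are my own choice). Both coroutines run in one thread; every await asyncio.sleep() hands control back to the event loop, which then resumes whichever task is ready:

import asyncio

async def say(label, delay):
    """Yields to the event loop at the await, letting the other task run."""
    for i in range(3):
        await asyncio.sleep(delay)  # control goes back to the event loop here
        print(f"{label}: tick {i}")

async def main():
    # Both coroutines run concurrently inside a single OS thread.
    await asyncio.gather(say("fast", 0.1), say("slow", 0.25))

asyncio.run(main())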

Some Questions

Question: If all tasks run in a single thread, doesn't that reduce performance?
Answer: Most modern apps are IO bound. If you know your app is CPU intensive, async is the wrong choice. If it is IO bound, you may see improved performance, because the CPU's computing power is an invariant: having more threads does not increase the total available computation power. If you reduce the overall thread count in your system, the remaining threads get the redistributed power. It is a little tricky! More threads means more context switches; that is an overhead, however small it is, and a frictional loss.
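A quick way to feel the IO-bound benefit (the 0.5-second sleep stands in for a network call and is my own choice): five fake requests run one after another take about 2.5 seconds, while the same five run as co-operative tasks finish in about 0.5 seconds, all on one thread:

import asyncio
import time

async def fake_request(i):
    await asyncio.sleep(0.5)  # stands in for waiting on the network
    return f"response-{i}"

async def sequential():
    return [await fake_request(i) for i in range(5)]

async def concurrent():
    return await asyncio.gather(*(fake_request(i) for i in range(5)))

for runner in (sequential, concurrent):
    start = time.perf_counter()
    asyncio.run(runner())
    print(f"{runner.__name__}: {time.perf_counter() - start:.2f} s")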

Co-Operative Multi Tasking: Shift in our attitude

We tend to think that we spawn more threads to get more processing power. But this is a ZERO SUM game. In your product team, if every sub-team starts thinking this way, each one will start stealing processing power from the others, and the overall result is degradation. If your system has 6 CPU cores, at most 6 processes can run in parallel. However, a product/desktop/server typically has 100s to 1000s of processes, so you are not in imminent danger of having fewer processes than actual cores; still, we should keep this under watch.

Instead of thinking in competitive terms, think in co-operative terms. A better design consumes less processing power and makes us better architects. Also, in some cases co-op tasking may not bring more speed compared to pre-emptive tasking, so do not always think from the speed angle; think also from the simplicity angle, i.e., can writing the software become simpler? We almost never need locks/semaphores for the critical regions/data shared among co-op tasks.

If you can break the software logic into many thousands of sub-parts, compare running each part in a thread vs. as a co-operative task: which consumes fewer resources, and which is easier to write? Sometimes the old method of pre-emptive tasking may look better; one can keep using it.

Concerns on Co-operative Multi Tasking

Question: Isn't the power to hold the processor for as long as a task wants dangerous?
Answer: This power has its own advantages and risks. Even with multi-threading/multi-processing, one can easily deny processing power to peer threads through deadlocks, livelocks, and non-re-entrant or non-thread-safe code. Those problems exist largely because threads are preemptive; here you are solving them by being more co-operative. Agreed, there is a risk due to bad coding: since threads are preempted by the kernel, at least the remaining threads would not suffer from one badly behaved peer in the same way.
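Here is a small sketch of that risk (the names and timings are my own). A coroutine that never awaits keeps the single thread busy and starves every other task; sprinkling await asyncio.sleep(0) into its loop gives the event loop a chance to run the peers:

import asyncio
import time

async def greedy(cooperative):
    """Burns CPU for about 1 second; yields to the loop only if 'cooperative' is True."""
    end = time.monotonic() + 1.0
    while time.monotonic() < end:
        if cooperative:
            await asyncio.sleep(0)  # hand control back to the event loop

async def heartbeat():
    for _ in range(5):
        print("heartbeat", f"{time.monotonic():.2f}")
        await asyncio.sleep(0.2)

async def main(cooperative):
    print("cooperative =", cooperative)
    await asyncio.gather(greedy(cooperative), heartbeat())

asyncio.run(main(cooperative=False))  # heartbeat is starved until greedy finishes (~1 s)
asyncio.run(main(cooperative=True))   # heartbeat prints roughly every 0.2 s from the start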

Example code for Co-operative Multi-tasking.

Below, we see how to create multiple tasks and how to monitor them. While monitoring, we can call some other routines too; this is typically needed in real life. Usually, we would read network messages, create more tasks from them, and run forever. I just wanted to show that possibility in a single piece of code.

import asyncio

async def my_task_init(task_number):
    """ A co-routine, pretending to do some IO job.
    Typically, this co-routine would wait for an IO read/write msg. (Kafka, AIO sockets, pexpect)
    """
    print(f"started my_task_init_init_init({task_number})")
    await asyncio.sleep(2)  # pretending as if doing some IO work!
    print(f"returning my_task_init_init_init({task_number})")
    return task_number + 111

async def my_ANOTHER_task_init(name):
    """ A co-routine, pretending to do some IO job.
    Typically, this co-routine would wait for an IO read/write msg. (Kafka, AIO sockets, pexpect)
    """
    print(f"started my_ANOTHER_task_init({name})")
    await asyncio.sleep(3)  # pretending as if doing some IO work!
    print(f"returning my_ANOTHER_task_init({name})")
    return name + "is from solar system"

def my_few_tasks():
    """ Creates 2 + 1 = 3 co-operative tasks.
    Returns the task objects so that the caller can wait for them to complete.
    """
    tasks_list = []
    for i in range(2):
        task_obj = asyncio.create_task(my_task_init(i))  # we can pass different co-routines
        tasks_list.append(task_obj)
    # we can also create a task individually, with a separate init function.
    task_obj = asyncio.create_task(my_ANOTHER_task_init("earth"))
    tasks_list.append(task_obj)
    return tasks_list

async def main():
    """Calls a routine to create a few co-operative tasks.
    Checks in a loop whether each task is done and sleeps in between.
    If you want, you can create more tasks here, wait for a msg, etc.
    """
    print(f"Enter: main()")
    tasks_list = my_few_tasks()
    while True:
        # main can do whatever it wants here; it can even create more async tasks.
        await asyncio.sleep(1)
        all_done = True
        for i in range(len(tasks_list)):
            print(f" checking task[{i}]")
            if tasks_list[i].done():
                print(f"Done; return value of task: {tasks_list[i].result()}")
            else:
                print(f"task is not yet done!")
                # Here, you can add code to create more TASKS.
                # Something like: read an incoming network message, create a task to
                # process it in this main loop, and wait for its completion... blah blah
                all_done = False
        if all_done:
            print(f"all tasks are Completed/Done")
            return

asyncio.run(main())
_______________________________
OUTPUT
(sandbox) #python async_1.py
Enter: main()
started my_task_init_init_init(0)
started my_task_init_init_init(1)
started my_ANOTHER_task_init(earth)
checking task[0]
task is not yet done!
checking task[1]
task is not yet done!
checking task[2]
task is not yet done!
returning my_task_init_init_init(1)
returning my_task_init_init_init(0)
checking task[0]
Done; return value of task: 111
checking task[1]
Done; return value of task: 112
checking task[2]
task is not yet done!
returning my_ANOTHER_task_init(earth)
checking task[0]
Done; return value of task: 111
checking task[1]
Done; return value of task: 112
checking task[2]
Done; return value of task: earthis from solar system
all tasks are Completed/Done
(sandbox) #

Etymology

Let us look at two words: co-routine and concurrent.

Co-Routine
- Co : together
- Routine : a packaged unit of code.
A co-routine is a subroutine that can co-exist (maintain its run-time state) while other routines run.
This is what allows us to implement co-operative multitasking: co-routines can be entered,
paused and exited at various points. Compare the co-routine with the plain sub-routine yourself.

Concurrent (adjective: modifier of a noun; the adverb 'concurrently' modifies a verb)
- Con → from Latin; in English it becomes Co. Meaning: together, with.
Related to the time order in which things happen.
- Current → run
Hence: concurrent → running together. The emphasis is on time; the times overlap.

Concurrent: A system is said to be concurrent if it can support the progress of more than one action at the same time,
but not necessarily execute all the actions simultaneously.
Parallel: A system is said to be parallel if it can support the progress of more than one action at the same time,
and also execute all the actions simultaneously (at every physical instant of time).

So, use 'parallel' when simultaneous actions are expected. Use 'concurrent' when simultaneous actions
may or may not be executed; if they are executed simultaneously, better to use 'parallel'.
I feel (I may be wrong) that all parallel systems can be called concurrent, but not the other way around.
Words are tricky. If I am unsure about myself or my audience,
I just state explicitly how the actions are carried out w.r.t. time.
