# Rust Concurrency Framework for Mobile

**This is a collection of information that our team can collaborate on to develop a Mobile Rust position paper later this year. Because parts of it were drafted with the help of ChatGPT, some of the content is still thin and bland and needs more depth.**

There are two paths for the future development of mobile SoCs. The first is to extend linear innovation, continuously improving power consumption and performance through advanced 2/3nm chip process technology. The other is system-level innovation, such as domain-specific **software+chip** co-design, **chiplets** and **advanced 3D packaging**, which can improve performance and power consumption without the most advanced process technology. This new system-level hardware design method and architecture changes how software systems must be built, especially how they effectively support parallel heterogeneous computing.

Multi-core CPUs (with tens to hundreds of cores) are already a reality. The traditional hardware abstraction used by operating systems and programming languages is based on the shared-memory SMP multi-core architecture. In this architecture, the CPU is abstracted as logical hardware for single-processor serial processing, allowing software developers to keep using serial control-flow programming. To this end, operating systems and programming languages provide mechanisms such as locks and atomic instructions to manage shared memory. However, these software tools limit hardware performance and do not ease the developer's burden: developing parallel and concurrent applications is still a very difficult task. Maintaining cache coherence in a multi-core CPU requires hardware locks to provide atomic memory access, effectively turning parallel access into serial access. If the memory access model of the hardware system changes from strong consistency to weak consistency, access speed can increase by 20-100 times. GPUs are an example of this: their programming model encourages developers to use local fast memory and avoid using global shared memory.

Future software should break free from the constraints of this memory-sharing paradigm and allow developers to express data abstraction and concurrency abstraction in a way that runs on multi-core or even multi-machine systems without rewriting programs each time. The evolution of cloud computing from the scale-up mode of large servers to the scale-out mode of cluster expansion is consistent with breaking these constraints. In this process, multiple data-consistency storage models were introduced for applications, such as NoSQL databases offering eventual consistency, greatly improving the transaction processing speed of e-commerce applications. The introduction of the Go language provided good concurrent application abstractions, and Go became the main language for developing high-performance cloud computing infrastructure.

The next generation of mobile operating systems needs to provide systematic parallel and concurrent support in programming languages, operating systems, and programming frameworks in order to fully unleash the capabilities of the new generation of mobile SoC hardware, which is heterogeneous, AMP (Asymmetric Multi-Processing, non-SMP), and ships with built-in GPUs.
It is unrealistic to build a brand-new mobile software stack from scratch without leveraging existing mobile app ecosystems and code bases, but it is necessary to pursue disruptive innovation in a full-stack manner (programming language, OS, application framework, developer experience) in order to match the new area/performance-optimized hardware architecture. The Rust language and its rapidly developing ecosystem present an opportunity to innovate the next generation of the mobile application software stack. Swift's language design and its concurrency design offer some insight for future Mobile Rust projects. https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782

## Motivation - Why we need it

### Mobile workload needs fine-grained task control strategy

The concurrency framework for mobile workloads requires a fine-grained task control strategy which takes into account factors such as:

1) UI-related tasks should have higher priority and lower latency, while background tasks should have lower priority, consume less CPU/GPU resources, and have smaller impact on the scheduling and processing of other tasks, in order to provide a better user experience.
2) Battery efficiency.
3) Heterogeneous CPU/GPU architectures, e.g., the big.LITTLE architecture.

Mobile applications usually have strict QoS mechanisms. For example, Swift provides a 4-level QoS. First, there is a distinction between foreground and background applications. The foreground application is the app the user is currently using, so all green tasks belonging to this app need their priorities boosted. Within the app, green tasks that do UI rendering have the highest priority and require QoS guarantees such as 60 fps, so these UI green tasks need to be able to grab CPU resources at any time. Background green tasks within the app have the lowest priority, but still higher than all green tasks of background apps. In addition, there are also strong energy-efficiency QoS requirements inside the mobile SoC, with big cores pursuing performance and little cores pursuing energy efficiency. The app's green tasks need to determine whether to use performance cores or efficiency cores based on the application scenario.

Currently, the scheduling policies of DSRs (Domain Specific Runtimes, e.g., Tokio) do not support QoS classification, are mostly cooperative, and the binding between threads and cores is fixed. Therefore, the new Rust concurrency framework may need to take over the scheduling function of the DSR to achieve complex scheduling based on QoS and energy efficiency in mobile scenarios. https://orangeloops.com/2021/10/async-await-vs-coroutines-native-developer/

### Support multiple use cases

The concurrency framework needs to adapt to as many types of use cases as possible, which usually have different CPU architectures, different operating systems (or even no operating system), and different available resources (such as disk, memory, CPU, etc.). This is common in consumer products, such as mobile phones, small wearable devices, or large smart home appliances. The Rust compiler can resolve part of this adaptation, e.g., targets for different CPU architectures and operating systems. However, a large part of the adaptation still needs to be done by the concurrency framework, such as the reactor/I/O layer, libc differences, and even no-std environments for embedded devices with extremely limited resources; see the sketch below.
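As a minimal, hypothetical sketch of this kind of adaptation (the `std` feature name and the fallback function are assumptions for illustration, not an existing crate's API), a framework crate can gate its std-dependent pieces behind Cargo features so the same API surface compiles for hosted and no-std targets:

```rust
// Hypothetical sketch: adapting one crate to std and no-std targets via Cargo features.
// In Cargo.toml one might declare:  [features] default = ["std"]  std = []
#![cfg_attr(not(feature = "std"), no_std)]

// On hosted targets, back the clock with std; on no-std targets, fall back to a
// platform hook the embedder must provide (the name here is illustrative only).
#[cfg(feature = "std")]
pub fn now_millis() -> u64 {
    use std::time::{SystemTime, UNIX_EPOCH};
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0)
}

#[cfg(not(feature = "std"))]
pub fn now_millis() -> u64 {
    // An embedded port would read a hardware tick counter here.
    0
}
```

The same feature-gating pattern can select between, say, an epoll-backed reactor and a bare-metal interrupt-driven one while keeping the public API identical.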
### Programming paradigm: the developer's user interface

How can a single framework adapt to multiple use cases and workloads while providing developers with as stable a programming interface as possible? Different use cases may require different programming paradigms adapted to their specific needs. Even within the same use case, different applications and different business scenarios may also require different programming paradigms. For example, in an I/O-intensive application where tasks spend most of their time waiting for I/O, an asynchronous I/O programming paradigm is needed to reduce the overhead of task switching and improve task concurrency and CPU utilization (e.g., Tokio). On the other hand, a compute-intensive application that spends most of its time performing computations should adopt a parallel computing paradigm, where tasks are usually bound to a specific core to improve cache hit rates and reduce the overhead of inter-core task switching (e.g., Rayon). In addition, when it comes to data sharing and communication, the actor paradigm can be used to avoid data races.

Currently, Rust concurrency frameworks usually provide only one or a few programming paradigms, and users have to combine multiple crates, such as **Tokio+Rayon** or **Tokio+Actix**, to use other paradigms. The programming interfaces between different crates are not entirely unified, which increases the learning curve for developers. In general, a concurrency framework that supports multiple programming paradigms in a developer-friendly way will simplify the programming experience, improve productivity, and encode best practices for the different use case scenarios.

## Rust Concurrency Framework - What we hope for

### Consistent and user-friendly programming interface APIs

A user-friendly and consistent programming interface is one that is easy to use and understand. The programming interface should be intuitive and easy to learn, with a clear and consistent design that is easy to follow. This is especially important when working with complex systems, such as concurrent programming frameworks, where the complexity can quickly become overwhelming. It can be achieved through high-level abstractions and clear documentation that helps developers understand how the framework works and how to use it effectively. Additionally, providing a unified interface across programming paradigms helps simplify choosing the right approach for different types of applications and scenarios.

The Rust concurrency framework should have consistent APIs across use cases, so that their differences are shielded as much as possible by the framework implementation. Maintaining API stability helps reduce the cost of moving user code between different use cases. The ultimate goal is to let users write one set of application code that serves multiple use cases. At the same time, the API abstraction and design should reduce the developer's learning curve and improve productivity. To balance flexibility and usability, the API design should be layered: high-level APIs should hide most details for quick adoption, while low-level APIs allow users to make more detailed configurations, as sketched below. The API should also allow developers to focus on their own business logic without worrying about the implementation details of the framework.
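As a purely hypothetical illustration of such layering (none of these types or functions exist in any crate; they only sketch the shape of a two-level API), a quick-start `spawn` could sit above a builder that exposes QoS and naming knobs:

```rust
// Hypothetical layered API sketch; all names are illustrative only.
use std::future::Future;

/// QoS classes a mobile-oriented framework might expose (assumption, not an existing API).
pub enum Qos {
    UserInteractive,
    UserInitiated,
    Utility,
    Background,
}

/// Low-level API: a builder with detailed configuration.
pub struct TaskBuilder {
    qos: Qos,
    name: Option<&'static str>,
}

impl TaskBuilder {
    pub fn new() -> Self {
        Self { qos: Qos::Utility, name: None }
    }
    pub fn qos(mut self, qos: Qos) -> Self {
        self.qos = qos;
        self
    }
    pub fn name(mut self, name: &'static str) -> Self {
        self.name = Some(name);
        self
    }
    pub fn spawn<F: Future + Send + 'static>(self, fut: F) {
        // A real framework would hand `fut` to its executor here,
        // using `self.qos` and `self.name` for scheduling decisions.
        let _ = (self.qos, self.name, fut);
    }
}

/// High-level API: sensible defaults, one call for the common case.
pub fn spawn<F: Future + Send + 'static>(fut: F) {
    TaskBuilder::new().qos(Qos::Utility).spawn(fut);
}
```

The point of the layering is that most application code calls `spawn` and never sees the builder, while latency-sensitive code can opt into the detailed configuration without switching frameworks.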
### Multiple Programming Paradigms

A concurrency framework should provide users with rich programming paradigms, including asynchronous I/O (e.g., Tokio), parallel computation (e.g., Rayon), the actor model, task dependency expression, etc., to deal with various use case scenarios. At the same time, the different programming paradigms should be integrated and unified in the API design, maintaining consistency for developers.

Asynchronous I/O and parallel computation are used to handle I/O-intensive and compute-intensive tasks, respectively. The framework should provide easy-to-use interfaces that automatically parallelize users' computing tasks and support asynchronous I/O tasks in a similar way in the same or similar contexts. The actor model keeps user data inside actors, where data can only be shared through messages and cannot be accessed directly. Actors have both state and behavior, and users can define actor behavior that processes messages according to the current state and changes that state. The actor model is suitable for situations that require data sharing while avoiding data races, and has been proven effective by many languages and frameworks (such as Erlang and Akka). Task dependency expression identifies the other tasks (similar to Taskflow) or data that the current task depends on, guiding the framework to construct task execution flows and run those tasks in sequence or in parallel to achieve optimal execution efficiency and CPU utilization.

### Replaceable and customizable executor

An executor is responsible for executing submitted tasks and managing threads, and different types of executors can be used for different types of tasks and workloads. Having a replaceable and customizable executor allows users to choose an executor that best suits their needs and to optimize the performance of their concurrent program. This feature is particularly important for complex applications that require fine-tuning of the underlying executor to achieve optimal performance. Rust's async/await is executor-agnostic by design, which provides the built-in capability to support replaceable and customizable executors.

### Expandable reactor

An expandable reactor, in the context of a concurrency framework, refers to a design that allows resources to be added and removed dynamically to handle incoming requests or events. In a concurrency framework, the reactor is responsible for managing input/output (I/O) operations and events, such as network connections, file I/O, and user input. The expandable reactor design allows the framework to handle an increasing number of requests or events by scaling the resources allocated to the reactor up or down dynamically. The most typical reactor is the event source for network I/O, such as epoll on Linux and IOCP on Windows; different network I/O event sources need to be adapted for different operating systems. At the same time, the framework's reactor also needs to support additional, pluggable event sources. With a unified mechanism, various types of event monitoring can be brought under reactor management, such as file, memory, and signal events, thereby providing asynchronous interfaces for operations on these events.

### Flexible and customizable functional modules

To meet the various resource usage restrictions of embedded systems, a concurrency framework should support flexible and customizable functional modules. This requires that modules be loosely coupled and interact through interface registration, rather than direct function calls; a hypothetical sketch of such registration follows.
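The sketch below is a hedged illustration of interface registration (the `EventSource` trait, `Runtime` registry, and `DummyTimer` module are invented for this example, not part of any existing crate): a module implements a small trait and is registered with the runtime instead of being called directly.

```rust
// Hypothetical sketch of loosely coupled modules registered through an interface,
// rather than direct function calls. All names here are illustrative.
use std::collections::HashMap;

/// An event-source module (e.g., a timer driver or a network poller) that the
/// runtime drives only through this narrow interface.
pub trait EventSource: Send {
    fn name(&self) -> &'static str;
    /// Poll the source once; return how many events were delivered.
    fn poll(&mut self) -> usize;
}

#[derive(Default)]
pub struct Runtime {
    sources: HashMap<&'static str, Box<dyn EventSource>>,
}

impl Runtime {
    /// Modules register themselves; the runtime never names them directly,
    /// so a build for a constrained target can simply not register some of them.
    pub fn register(&mut self, source: Box<dyn EventSource>) {
        self.sources.insert(source.name(), source);
    }

    pub fn tick(&mut self) -> usize {
        self.sources.values_mut().map(|s| s.poll()).sum()
    }
}

struct DummyTimer;
impl EventSource for DummyTimer {
    fn name(&self) -> &'static str { "timer" }
    fn poll(&mut self) -> usize { 0 }
}

fn main() {
    let mut rt = Runtime::default();
    rt.register(Box::new(DummyTimer));
    println!("events this tick: {}", rt.tick());
}
```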
Specifically, the framework should support customizable features such as asynchronous I/O, asynchronous synchronization primitives, and asynchronous timers. Furthermore, the concurrency framework should also consider the no-std scenario. Even though some functions may not be available in a no-std environment, the framework should still support Rust asynchronous programming and provide asynchronous wrappers for whatever interfaces are available.

## The concepts of asynchronous, concurrent, and parallel computing (with the help of ChatGPT)

### Concurrent computing

Concurrent computing is a style of computing in which multiple tasks or processes make progress during overlapping time periods, often interleaved on a single processor/core. Multiple operations are in flight at the same time, although not necessarily at the same speed or with the same priority. Concurrent computing is used to improve the performance and efficiency of programs, especially when there are multiple independent tasks that can be executed in parallel.

Concurrency can be achieved using several techniques, including multi-threading, multi-processing, and coroutines. Multi-threading involves creating multiple threads within a single process, where each thread can execute a separate task concurrently. Multi-processing, on the other hand, involves creating multiple processes that run concurrently. Coroutines are a lightweight alternative to threads or processes, where multiple tasks are executed within a single thread using cooperative multitasking.

### Parallel computing

Parallel computing is a type of computing where multiple tasks are executed simultaneously, often on multiple processors or computers. This allows programs to achieve higher levels of performance and faster execution times than serial computing, where tasks are executed sequentially on a single processor.

Parallel computing can be achieved using several techniques, including shared-memory parallelism, distributed-memory parallelism, and hybrid parallelism. Shared-memory parallelism involves multiple processors accessing the same shared memory, allowing them to work on different parts of a problem simultaneously; this technique is often used in multi-core processors or high-performance computing clusters. Distributed-memory parallelism involves multiple processors or computers working together on a problem, with each processor or computer having its own local memory; this technique is often used in large-scale supercomputers or clusters of computers connected by high-speed networks. Hybrid parallelism combines shared-memory and distributed-memory parallelism to get the benefits of both, for example by using multiple processors within each node of a high-performance computing cluster and then using multiple nodes in a distributed computing environment.

### Asynchronous

Asynchronous computing is a programming paradigm that allows tasks to be executed independently and in a non-blocking manner. When a task is waiting for a resource (such as input/output), the program can switch to executing other tasks instead of blocking the whole program. As a result, asynchronous programming can provide improved performance and efficiency, especially in programs that involve I/O-bound tasks, such as web servers or network communication. In asynchronous programming, tasks are typically implemented using callbacks or coroutines.
Callbacks are functions that are executed when a task is completed, allowing the program to continue with other tasks while the original task is waiting for a resource. Coroutines, on the other hand, are functions that can be paused and resumed, allowing multiple tasks to be executed within a single thread of execution.

### Coroutine and Green Task

"Green task" and "coroutine" are two terms that are often used interchangeably, but they have slightly different meanings depending on the context.

A coroutine is a special type of function or subroutine that allows cooperative multitasking within a single thread of execution. Unlike traditional subroutines, which run to completion before returning control to the caller, coroutines can be suspended and resumed at specific points during their execution, allowing multiple tasks to be interleaved within a single thread. Coroutines can also be used in other contexts, such as state machines or generators. In a state machine, a coroutine can represent a particular state or behavior, with each suspension and resumption representing a transition to a new state. In a generator, a coroutine can produce a sequence of values, with each suspension and resumption representing the production of a new value in the sequence.

Green tasks are similar to threads (sometimes called user-space threads), but they are not managed by the operating system and are instead managed entirely within the concurrency framework. We can view a coroutine as a specific type of green task that allows cooperative multitasking within a single thread. So while coroutines are a type of green task, not all green tasks are coroutines. Green tasks can also include other types of lightweight tasks, such as green threads or greenlets, which may be used in contexts other than asynchronous programming.

In summary, while the terms "green task" and "coroutine" are often used interchangeably, a coroutine is a specific type of green task that is used for cooperative multitasking within a single thread, whereas a green task can refer to any lightweight task that is scheduled and managed by a concurrency framework.

### Stackful coroutine and stackless coroutine

Stackful coroutines (sometimes called fibers, or user-mode cooperatively scheduled threads) and stackless coroutines (compiler-synthesized state machines) are two different approaches to implementing coroutine functionality.

Stackful coroutines are implemented using a traditional call stack: each coroutine has its own stack that maintains the state of the coroutine when it is suspended. When a coroutine is resumed, its stack is restored and execution continues from the point where it was suspended. This approach allows for efficient context switching and suspension at any call depth, but it also requires more memory and can be more complex to implement. For example, if a coroutine's stack usage exceeds the designed threshold, the concurrency framework's runtime must grow the stack dynamically, and several designs for dynamic stack growth have emerged (such as segmented stacks and guarded stacks).

Stackless coroutines, on the other hand, do not maintain a separate call stack for each coroutine. They share a single call stack, and each coroutine's progress is maintained by a corresponding state machine managed by the compiler or coroutine library.
When a stackless coroutine is suspended, its state is saved to a separate data structure, such as a queue or the heap, and when it is resumed, its state is restored from that data structure. This approach has a smaller memory footprint and is simpler to implement, but it can be less efficient due to the overhead of managing the separate data structure. The paper **Fibers under the magnifying glass** (https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p1364r0.pdf) has a detailed analysis of the efficiency, scalability, and usability problems of stackful coroutines. However, to support (priority-based) preemptive scheduling, stackful coroutines can hardly be avoided, unless OS threads are leveraged for preemptive/priority scheduling.

### CPU core vs. OS process vs. OS thread vs. User Space thread

A CPU core is a physical processing unit that can execute instructions in parallel. A modern CPU can have multiple cores, allowing multiple instructions to be executed simultaneously.

An OS process is an instance of a program that is executed by the operating system. Each process has its own memory space and system resources, such as file handles and network sockets. Processes are isolated from each other and communicate using inter-process communication mechanisms.

An OS thread is a lightweight unit of execution within a process. Each thread shares the same memory space as the process and can access all the resources of the process. Threads can communicate with each other directly, without inter-process communication mechanisms.

A user-space thread (sometimes called a green task) is a thread that is managed entirely within the application, without the involvement of the operating system. User-space threads are implemented by a threading library or other concurrency framework, and they can be scheduled and managed independently of the operating system's scheduler. User-space threads can be more efficient than OS threads because they have less overhead, but they may not be able to take advantage of multi-core processors as efficiently as OS threads.

In summary, a CPU core is a physical processing unit, an OS process is an instance of a program executed by the operating system, an OS thread is a lightweight unit of execution within a process, and a user-space thread is a thread managed entirely within the application without the involvement of the operating system. Each of these concepts plays a role in concurrent programming and multi-threading, and understanding their differences is important for designing efficient and scalable software systems.

### Preemptive and cooperative multitasking

Operating systems normally support preemptive scheduling algorithms, while most concurrency frameworks and/or language runtimes support cooperative multitasking. The operating system uses scheduling algorithms to determine which processes and threads should be executed at any given time. The goal of these algorithms is to maximize system performance by ensuring that the CPU is utilized as efficiently as possible, while also meeting the requirements of the running processes and threads. There are many preemptive scheduling algorithms used by modern operating systems. Some common ones include:

* Round-robin scheduling: In this algorithm, each process or thread is given a fixed time slice or quantum of CPU time, after which the scheduler switches to the next process or thread in the queue.
This algorithm ensures that each process or thread gets a fair share of the CPU time, but it may not be optimal for certain workloads.
* Priority-based scheduling: In this algorithm, each process or thread is assigned a priority level based on its importance or urgency. The scheduler gives preference to processes or threads with higher priority levels, but may also temporarily boost the priority of lower-priority processes or threads to ensure that they are not starved of CPU time.
* Shortest job first (SJF) scheduling: In this algorithm, the scheduler gives priority to processes or threads with the shortest expected execution time. This algorithm can minimize the average waiting time for processes or threads, but requires accurate estimates of execution times.
* Multi-level feedback queue scheduling: This algorithm uses multiple queues with different priorities and time-slice lengths to balance the needs of different types of processes or threads. Processes or threads may move between queues based on their behavior and resource requirements.

In addition to these algorithms, modern operating systems also use techniques such as preemption, where the scheduler may interrupt a running process or thread to give CPU time to another process or thread, and affinity scheduling, where the scheduler attempts to keep a process or thread on the same CPU core to minimize cache misses and improve performance.

Cooperative multitasking is a type of multitasking where multiple tasks or processes are executed concurrently, but each task must **explicitly yield** control to other tasks at specific points in its execution. In cooperative multitasking, the system relies on the cooperation of the running tasks to ensure that each task gets a fair share of CPU time. Each task or thread is responsible for managing its own execution and must periodically yield control to other tasks or threads, typically by calling a yield function or a similar construct that allows other tasks to execute.

Cooperative multitasking can be simpler to implement than preemptive multitasking, and can be more efficient for certain types of workloads, since there is less overhead involved in task switching. However, it can also be less reliable, since a single misbehaving task can monopolize CPU time and starve other tasks of resources. Cooperative multitasking is often used in user-level threads or fibers, where the concurrency framework/runtime manages the scheduling of tasks.

Potentially, the framework/runtime can adopt preemptive scheduling algorithms. For example, the Go runtime implements preemptive scheduling by sending an OS signal to interrupt a goroutine that has been running for about 10 ms and determining whether it is at a safe point to yield to other tasks. (https://github.com/golang/proposal/blob/master/design/24543-non-cooperative-preemption.md, https://kerkour.com/cooperative-vs-preemptive-scheduling, https://medium.com/a-journey-with-go/go-asynchronous-preemption-b5194227371c; the implementation details are in https://go.dev/src/runtime/preempt.go.)

### Priority preemptive and priority inversion problem

In priority preemptive scheduling, each process or thread is assigned a priority level, and the scheduler gives preference to processes or threads with higher priority levels.
If a higher-priority process or thread becomes available, the scheduler will preempt the currently running process or thread to ensure that the higher-priority task is executed in a timely manner.

However, priority inversion is a problem that can occur when multiple processes or threads run concurrently with different priority levels. In some cases, a lower-priority process or thread may hold a resource that a higher-priority process or thread needs to execute. This can cause the higher-priority process or thread to be blocked, or to wait for an extended period of time, which can cause delays and reduce system performance. Priority inversion can be particularly problematic when the lower-priority process or thread is executing in a critical section of code, where it cannot be preempted by higher-priority tasks. This can lead to a situation known as priority inversion deadlock, where the higher-priority process or thread is blocked indefinitely and cannot make progress until the lower-priority process or thread releases the required resource.

One solution to the priority inversion problem is to use priority inheritance protocols, which ensure that a lower-priority process or thread inherits the priority of a higher-priority process or thread while it holds a resource that the higher-priority process or thread needs. This can help prevent priority inversion by temporarily boosting the priority of the lower-priority process or thread, allowing the higher-priority task to execute more quickly.

Overall, priority inversion is a complex problem that can be difficult to detect and resolve in systems with complex and dynamic scheduling requirements. Careful attention to system design, including the use of appropriate scheduling algorithms and priority inheritance protocols, can help minimize the impact of priority inversion and ensure that systems operate reliably and efficiently.

### Structured concurrency

Concurrency frameworks/runtimes often expose their capabilities through **structured concurrency**, a programming paradigm that provides a structured and safe way to manage concurrent operations in a program. The goal of structured concurrency is to make it easier to write correct and maintainable concurrent programs by enforcing certain rules and constraints on how concurrent operations are organized and executed.

In structured concurrency, concurrent operations are organized into hierarchies of scopes or contexts, where each scope or context is responsible for managing the lifetime and execution of its child operations. When a new concurrent operation is started, it is associated with the scope or context in which it is created, and must be explicitly structured and organized within that scope or context. One of the key principles of structured concurrency is that each scope or context is responsible for ensuring that all of its child operations complete successfully, or are aborted (or cancelled) if an error occurs. This helps avoid common concurrency issues such as race conditions, deadlocks, and resource leaks, by ensuring that all concurrent operations are properly structured and managed within their respective contexts.

Structured concurrency is often implemented using a combination of language features and programming patterns, such as asynchronous functions, coroutines, and structured exception handling. A small Rust-flavored sketch follows.
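As a minimal Rust sketch of the idea (using the standard library's scoped threads rather than any particular async framework), the scope guarantees that all child tasks finish before the enclosing block returns:

```rust
// Minimal structured-concurrency sketch with std scoped threads:
// every thread spawned inside the scope is joined before `scope` returns,
// so child work cannot outlive its parent scope.
fn main() {
    let data = vec![1, 2, 3, 4];

    let total: i32 = std::thread::scope(|s| {
        // Split the work between two child threads that borrow `data`.
        let left = s.spawn(|| data[..2].iter().sum::<i32>());
        let right = s.spawn(|| data[2..].iter().sum::<i32>());
        // Joining inside the scope propagates panics as errors we unwrap here.
        left.join().unwrap() + right.join().unwrap()
    });

    println!("total = {total}");
}
```

Async runtimes express the same hierarchy with constructs such as task groups or join sets, where the parent awaits all of its children before its own scope ends.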
By providing a structured and safe way to manage concurrency, structured concurrency can help make it easier to write correct and maintainable concurrent programs, even in complex and dynamic environments. The table below summarizes structured concurrency as implemented in modern programming languages, especially in the mobile domain. https://shahbhat.medium.com/structured-concurrency-in-modern-programming-languages-part-iv-kotlin-and-swift-7bf0e08de1dd

| Feature | Typescript (NodeJS) | GO | Rust | Kotlin | Swift |
|---|---|---|---|---|---|
| Structured scope | Built-in | Manually | Built-in | Built-in | Built-in |
| Asynchronous Composition | Yes | No | Yes | Yes | Yes |
| Error Handling | Natively using Exceptions | Manually storing errors in Response | Manually using Result ADT | Natively using Exceptions | Natively using Exceptions |
| Cancellation | Cooperative Cancellation | Built-in Cancellation or Cooperative Cancellation | Built-in Cancellation or Cooperative Cancellation | Built-in Cancellation or Cooperative Cancellation | Built-in Cancellation or Cooperative Cancellation |
| Timeout | No | Yes | Yes | Yes | Yes |
| Customized Execution Context | No | No | No | Yes | Yes |
| Race Conditions | No, due to NodeJS architecture | Possible due to shared state | No, due to strong ownership | Possible due to shared state | Possible due to shared state |
| Value Types | No | Yes | Yes | Yes | Yes |
| Concurrency paradigms | Event loop | Go-routine, CSP channels | OS-Thread, coroutine | OS-Thread, coroutine, CSP channels | OS-Thread, GCD queues, coroutine, Actor model |
| Type Checking | Static | Static but lacks generics | Strongly static types with generics | Strongly static types with generics | Strongly static types with generics |
| Suspends Async code using Continuations | No | No | Yes | Yes | Yes |
| Zero-cost based abstraction (async) | No | No | Yes | No | No |
| Memory Management | GC | GC | (Automated) Reference counting, Boxing | GC | Automated reference counting |

## Popular Concurrency Frameworks from the Rust Ecosystem

### Tokio (https://tokio.rs/tokio/tutorial)

Tokio is a popular asynchronous runtime for Rust that provides an event-driven, non-blocking I/O model for building high-performance network applications. Tokio is built on top of Rust's futures and async/await primitives, and provides a variety of APIs for working with asynchronous tasks, including timers, channels, and futures. One of the key features of Tokio is its ability to handle large numbers of concurrent connections with minimal overhead. This is achieved by using Rust's lightweight, zero-cost abstractions and by relying on asynchronous I/O to avoid blocking on I/O operations. Its main features are:

1. Non-blocking I/O: Tokio provides a non-blocking I/O model, which allows us to perform I/O operations without blocking the main thread of execution. This is important for building high-performance network applications that can handle many connections simultaneously.
1. Lightweight abstractions: Tokio is built on top of Rust's lightweight abstractions, which means that it has minimal overhead compared to other concurrency frameworks. This makes it well-suited for high-performance applications that need to handle a large number of concurrent connections.
1. Support for async/await: Tokio supports Rust's async/await syntax, which provides a clean and concise way to write asynchronous code. This makes asynchronous code easier to reason about and maintain, compared to frameworks that rely on more complex abstractions.
1. Protocol support: Tokio and its ecosystem provide support for a variety of network protocols, including TCP, UDP, and, through companion crates, HTTP, WebSocket, and more. This makes it easier to build network applications that use these protocols without having to implement them from scratch.

Tokio provides multiple variations of the runtime, everything from a multi-threaded, work-stealing runtime to a lightweight, single-threaded runtime. Each of these runtimes comes with many knobs to allow users to tune them to their needs. https://tokio.rs/blog/2019-10-scheduler https://tokio.rs/blog/2020-04-preemption

The Tokio runtime runs within a single OS process. It uses an event-driven reactor that handles I/O operations and schedules tasks on a set of worker threads. Tokio does not use multiple OS processes for parallelism; however, multiple instances of Tokio can be run across multiple OS processes to achieve horizontal scaling.

Although Tokio is useful for many projects that need to do a lot of things simultaneously, there are also some use cases where Tokio is not a good fit:

* Speeding up CPU-bound computations by running them in parallel on several threads. Tokio is designed for I/O-bound applications where each individual task spends most of its time waiting for I/O. If the only thing your application does is run computations in parallel, you should be using Rayon. That said, it is still possible to "mix & match" if you need to do both.
* Reading a lot of files. Although it seems like Tokio would be useful for projects that simply need to read a lot of files, Tokio provides no advantage here compared to an ordinary threadpool. This is because operating systems generally do not provide asynchronous file APIs.
* Sending a single web request. The place where Tokio gives you an advantage is when you need to do many things at the same time. If you need to use a library intended for asynchronous Rust such as reqwest, but you don't need to do a lot of things at once, you should prefer the blocking version of that library, as it will make your project simpler. Using Tokio will still work, of course, but provides no real advantage over the blocking API. If the library doesn't provide a blocking API, see the chapter on bridging with sync code.

Tokio is a powerful and flexible framework for building high-performance network applications in Rust, and is widely used in the Rust community for a variety of use cases, including web servers, database drivers, and distributed systems.

### Async-std (https://async.rs/, https://github.com/async-rs/async-std)

Async-std is an asynchronous runtime for Rust that provides a similar set of functionality to Tokio, but with a different design philosophy. Unlike Tokio, which aims to provide a low-level, composable set of building blocks for building high-performance network applications, async-std takes a more opinionated approach and provides a higher-level, batteries-included set of APIs that are easier to use and more beginner-friendly. async-std believes Async Rust should be as easy to pick up as Sync Rust.
It also believes that the best API is the one developers already know, and that providing an asynchronous counterpart to the standard library is the best way to give developers a reliable basis for both performance and productivity. Async-std is the embodiment of that vision. It combines single-allocation task creation with an adaptive lock-free executor, threadpool and network driver to create a smooth system that processes work at a high pace with low latency, using Rust's familiar stdlib API.

One of the key benefits of async-std is its ease of use. Because it provides a high-level set of abstractions, it can be easier for new Rust developers to get started with asynchronous programming than with Tokio's more complex set of building blocks. Additionally, async-std provides a more consistent and ergonomic set of APIs than working directly with the low-level std::future primitives, which can require a lot of boilerplate code.

Another benefit of async-std is its focus on cross-platform compatibility. Like Tokio, async-std is designed to work seamlessly on Windows, macOS, and Linux, making it a good choice for building cross-platform network applications. async-std also provides a number of built-in features that make it easy to build high-performance network applications, including a fast and scalable executor, support for asynchronous file and network I/O, and, through ecosystem crates, support for HTTP and WebSocket protocols.

async-std is a good choice for Rust developers who want to build high-performance network applications using a simple and easy-to-use set of APIs. Its focus on simplicity and cross-platform compatibility makes it suitable for beginners and experienced developers alike.

### Smol (https://github.com/smol-rs/smol)

Smol is another lightweight asynchronous runtime for Rust that provides an alternative to both async-std and Tokio. Like async-std, smol aims to provide a simple and easy-to-use set of abstractions for building asynchronous applications, but with an even simpler and more minimalistic design.

One of the key benefits of smol is its simplicity and ease of use. Unlike Tokio and async-std, which provide larger sets of building blocks for building asynchronous applications, smol provides a much smaller, more minimalistic set of abstractions that are easier to understand and use. This makes smol a good choice for beginners and for developers who prefer a more minimalistic and streamlined approach to asynchronous programming.

Another benefit of smol is its focus on performance. Because smol provides a simpler and more lightweight set of abstractions than other asynchronous runtimes, it can be more efficient and faster in some cases; its executor is small and designed to keep scheduling overhead low. Smol also provides the pieces needed to build high-performance network applications, including support for asynchronous file and network I/O, timers, and Rust's async/await syntax.

Smol is a good choice for Rust developers who want a simple and lightweight asynchronous runtime that is easy to use and fast. Its minimalistic design and focus on performance make it a good choice for building high-performance network applications with minimal overhead. A tiny usage sketch follows.
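A tiny, hedged usage sketch (assuming the `smol` crate's `block_on`, `spawn`, and `Timer` re-exports, which may differ across versions):

```rust
// Minimal smol sketch: run a small async main and spawn one background task.
// Assumes the `smol` crate; exact re-exports may vary by version.
use std::time::Duration;

fn main() {
    smol::block_on(async {
        // Spawn a task on smol's global executor.
        let task = smol::spawn(async {
            // Sleep without blocking an OS thread.
            smol::Timer::after(Duration::from_millis(50)).await;
            6 * 7
        });

        let answer = task.await;
        println!("background task produced {answer}");
    });
}
```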
### Comparing Tokio, Async-std and Smol

Tokio, async-std, and smol are three popular asynchronous runtimes for Rust that each provide a different set of features and design philosophies. Here are some of the key differences between these runtimes:

* Design philosophy: Tokio aims to provide a low-level, composable set of building blocks for building high-performance network applications. Async-std provides a higher-level, batteries-included set of APIs that are easier to use and more beginner-friendly. Smol is even more minimalistic and provides a simpler set of abstractions for building asynchronous applications.
* Ease of use: Async-std is the easiest of the three runtimes to pick up, thanks to its high-level set of abstractions and consistent, ergonomic APIs. Tokio and smol are more low-level and require a deeper understanding of asynchronous programming concepts.
* Performance: Tokio is generally considered the most performant of the three runtimes, thanks to its focus on low-level optimization and scalability. However, async-std and smol can be more efficient in some cases due to their simpler design and lightweight abstractions.
* Feature set: All three runtimes provide support for asynchronous file and network I/O, but they differ in their built-in features. For example, Tokio provides timers and a multi-threaded, work-stealing task scheduler out of the box, async-std mirrors the standard library's API surface, and smol keeps its core deliberately small and composable.
* Compatibility: All three runtimes are designed to work seamlessly on Windows, macOS, and Linux, but they differ in their compatibility with other libraries and frameworks. Tokio is the most widely used and has the most extensive ecosystem of third-party libraries, while async-std and smol are newer and have smaller but growing sets of libraries and frameworks that support them.

The choice of asynchronous runtime will depend on your specific use case, preferences, and skill level. Tokio is a good choice for building high-performance network applications with a low-level, composable set of building blocks; async-std is a good choice for beginners or developers who prefer a higher-level set of abstractions; and smol is a good choice for developers who want an even simpler and more minimalistic set of abstractions for building asynchronous applications.

### Embassy (https://embassy.dev/, https://github.com/embassy-rs/embassy)

Embassy is a next-generation framework for embedded applications: write safe, correct and energy-efficient embedded code faster, using the Rust programming language, its async facilities, and the Embassy libraries. Rust's async/await allows for unprecedentedly easy and efficient multitasking in embedded systems. Tasks get transformed at compile time into state machines that are run cooperatively. It requires no dynamic memory allocation and runs on a single stack, so no per-task stack size tuning is required. It removes the need for a traditional RTOS with kernel context switching, and is faster and smaller than one.

* Hardware Abstraction Layers - HALs implement safe, idiomatic Rust APIs to use the hardware capabilities, so raw register manipulation is not needed. The Embassy project maintains HALs for select hardware, but you can still use HALs from other projects with Embassy.
* Time that Just Works - No more messing with hardware timers. embassy_time provides Instant, Duration and Timer types that are globally available and never overflow.
* Real-time ready - **Tasks on the same async executor run cooperatively, but you can create multiple executors with different priorities, so that higher priority tasks preempt lower priority ones.** See the example (https://github.com/embassy-rs/embassy/blob/master/examples/nrf52840/src/bin/multiprio.rs).
* Low-power ready - Easily build devices with years of battery life. The async executor automatically puts the core to sleep when there's no work to do. Tasks are woken by interrupts; there is no busy-loop polling while waiting.
* Networking - The embassy-net network stack implements extensive networking functionality, including Ethernet, IP, TCP, UDP, ICMP and DHCP. Async drastically simplifies managing timeouts and serving multiple connections concurrently.
* Bluetooth - The nrf-softdevice crate provides Bluetooth Low Energy 4.x and 5.x support for nRF52 microcontrollers.
* LoRa - embassy-lora supports LoRa networking on STM32WL wireless microcontrollers and Semtech SX126x and SX127x transceivers.
* USB - embassy-usb implements a device-side USB stack. Implementations for common classes such as USB serial (CDC ACM) and USB HID are available, and a rich builder API allows building your own.
* Bootloader and DFU - embassy-boot is a lightweight bootloader supporting firmware application upgrades in a power-fail-safe way, with trial boots and rollbacks.

Embassy is also designed to work in the no_std environment, a Rust mode that allows developers to build applications without relying on the standard library. This is important for embedded applications, where resources such as memory and processing power may be limited.

### Comparing Tokio and Embassy

Tokio and Embassy are both Rust frameworks for building asynchronous and concurrent applications, but they have some significant differences in terms of design and use cases. One of the main differences between Tokio and Embassy is their target platforms: Tokio is designed primarily for building network applications on servers, while Embassy is designed for building embedded applications on microcontrollers and other resource-constrained devices. This means that Tokio is optimized for high-performance network I/O and distributed computing, while Embassy is optimized for low-power, low-latency embedded systems.

Both Tokio and Embassy use cooperative scheduling instead of preemptive scheduling; however, Embassy provides a flexible executor system that allows developers to assign different priorities to tasks based on their importance and resource requirements. This enables developers to create custom executors that can prioritize critical tasks and optimize resource usage for different application scenarios. Embassy's support for custom executors and task prioritization makes it a powerful tool for building efficient and responsive embedded applications that can handle a wide range of tasks and events.

### Rayon (https://github.com/rayon-rs/rayon)

Rayon is a framework for data parallelism in Rust. It is designed to make it easy to parallelize operations that can be broken down into smaller, independent subproblems that can be processed in parallel. The goal of Rayon is to make it simple for Rust developers to take advantage of multiple CPU cores without having to worry about the details of low-level threading and synchronization.
One of the main benefits of using Rayon is that it can automatically distribute workloads across all available CPU cores, making it possible to process data in parallel and achieve significant speedups for many types of operations. Rayon also provides a simple and intuitive API that is easy to use, even for developers who are not familiar with low-level threading and concurrency concepts. Rayon is particularly well-suited for Rust applications that need to process large amounts of data, such as scientific computing, data analysis, and machine learning. By parallelizing these computations across multiple CPU cores, Rayon can significantly reduce the time required to complete these tasks and improve the overall performance of the application.

```
use rayon::prelude::*;

fn sum_of_squares(input: &[i32]) -> i32 {
    input.par_iter() // <-- just change that!
         .map(|&i| i * i)
         .sum()
}
```

Rayon is a valuable tool for Rust developers who want to take advantage of the full power of modern multicore processors without having to deal with the complexity of low-level concurrency and synchronization.

### Comparing Tokio and Rayon

Tokio and Rayon are both Rust libraries that provide concurrency and parallelism, but they differ in their specific use cases and design goals.

**Tokio**

* Tokio is built on top of Rust's async/await syntax, which allows for writing asynchronous code in a natural and intuitive way.
* Tokio provides an event-driven model of concurrency, where a single thread can handle many simultaneous I/O-bound tasks by scheduling them to run cooperatively.
* Tokio uses a scheduler that implements a work-stealing algorithm to efficiently distribute work across threads.
* Tokio provides a rich set of I/O primitives for building network servers, clients, and other asynchronous applications.

**Rayon**

* Rayon is built on top of Rust's standard library and uses a thread pool to parallelize CPU-bound workloads across multiple threads.
* Rayon uses a divide-and-conquer approach to parallelism, where the workload is recursively divided into smaller sub-tasks until they can be executed independently.
* Rayon's API is based on functional programming concepts such as map, filter, and reduce, making it easy to parallelize common operations on collections.
* Rayon uses work-stealing to balance the workload across threads and to minimize contention for shared resources.

In summary, Tokio and Rayon provide different models of concurrency and parallelism, and are optimized for different kinds of workloads. Tokio is designed for I/O-bound tasks that can benefit from asynchronous, event-driven programming, while Rayon is designed for CPU-bound tasks that can be parallelized across multiple threads.

### Lunatic (https://github.com/lunatic-solutions/lunatic)

Lunatic is a universal runtime for fast, robust and scalable server-side applications. It is inspired by Erlang and can be used from any language that compiles to WebAssembly. Lunatic includes a set of libraries and a WebAssembly runtime which allows developers to build resilient actor systems. Lunatic chose WebAssembly instances as the abstraction for actors. Each instance has its own stack, heap and syscalls. This allows developers to have completely isolated execution environments per actor, without reaching for much heavier technologies like Docker.
Here are some key features and concepts of Lunatic:

**Actors, WASM, hot reloading and supervisors**

* Lunatic uses an actor model of concurrency, where each actor (a WASM module instance) is an isolated, independent entity that communicates with other actors through message passing.
* Lunatic uses the name "process" for actors. Lunatic's processes are lightweight, fast to create and destroy, and have low scheduling overhead. They are designed for massive concurrency.
* Lunatic's design is all about spawning super-lightweight processes (WASM instances), also known as green threads or goroutines in other runtimes. Lunatic's processes are fast to create, have a small memory footprint and a low scheduling overhead.
* Actors can be written in Rust, or in any other language that can communicate through the C ABI.

**Shared-nothing architecture**

* Lunatic uses a shared-nothing architecture, where each actor has its own memory space and cannot access memory owned by other actors.
* This provides strong isolation and fault-tolerance guarantees, since failures in one actor cannot affect other actors.

**Asynchronous I/O and Scheduling**

* The Lunatic framework provides a high-performance, asynchronous I/O system based on Rust's async/await syntax on top of the Tokio async framework.
* All processes are scheduled using an async executor, and all code running on top of Lunatic is automatically transformed to be non-blocking. All processes running on Lunatic are preemptively scheduled and executed by a work-stealing async executor (Tokio). This gives you the freedom to write simple blocking code, while the runtime makes sure it never actually blocks a thread when waiting on I/O.
* I/O operations can be performed without blocking the actor's execution, allowing efficient use of system resources. Even if you have an infinite loop somewhere in your code, the scheduling will remain fair and will not permanently block an execution thread.

Applications written on top of the Lunatic framework do not use Rust's async/await syntax; all APIs are synchronous. From the perspective of the developer, only blocking syscalls are used. However, the runtime takes care of scheduling another process when one is waiting on network traffic or is blocked for another reason. In this way, Lunatic gives developers the best of both worlds: straightforward development (without async keywords) and the highest performance possible.

## Popular Mobile Concurrency Frameworks from Other Languages and Ecosystems

### Apple Grand Central Dispatch (GCD)

Apple's Grand Central Dispatch (GCD) is a technology that provides an efficient and easy-to-use API for performing concurrent programming tasks on macOS, iOS, and other Apple platforms. It is a low-level API that allows developers to perform tasks concurrently by dividing them into smaller, independent parts and executing them simultaneously on different threads or processors.

GCD is based on the concept of queues, which are data structures that hold blocks of code waiting to be executed. These blocks can be executed concurrently or serially, depending on the type of queue used. There are three types of queues in GCD:

1. Serial Queues: These queues execute blocks of code one at a time, in the order in which they were added to the queue. This ensures that the blocks are executed in a predictable, deterministic manner.
2. Concurrent Queues: These queues execute blocks of code concurrently, meaning that multiple blocks can be executed simultaneously on different threads or processors. The blocks are still dequeued in the order in which they were added to the queue, but the order in which they complete is not deterministic.
3. Main Queue: This queue is a special serial queue that is associated with the main thread of the application. It is used to execute tasks on the main thread, which is the thread responsible for updating the user interface.

GCD provides a simple API for creating and managing queues, as well as dispatching blocks of code to them. Blocks of code can be added to queues using the dispatch_async() function, which takes a block of code as its argument and adds it to the specified queue. GCD also provides a number of other functions for managing queues and executing code asynchronously, including dispatch_sync(), dispatch_barrier_async(), and dispatch_group_async(). In addition to queues, GCD also provides a number of other features, including:

1. Dispatch Sources: Objects that monitor various system events, such as file system changes or network activity, and execute blocks of code when those events occur.
1. Dispatch Timers: Objects that execute blocks of code at specified intervals.
1. Dispatch Semaphores: Objects that allow you to control the concurrency of your code by limiting the number of blocks that can execute simultaneously.

Apple's Grand Central Dispatch is a powerful and flexible technology that allows developers to easily perform concurrent programming tasks on Apple platforms. Its simple API and efficient implementation make it a good choice for building high-performance, scalable applications. But with the introduction of the Swift language into the Apple ecosystem, GCD may be replaced by the new Swift language runtime.

### Swift Concurrency Runtime

https://docs.swift.org/swift-book/documentation/the-swift-programming-language/concurrency/

Swift Concurrency is a set of features introduced in Swift 5.5 to simplify and enhance the process of writing concurrent code in Swift. It is built on top of the existing Swift runtime and leverages the capabilities of Apple's Grand Central Dispatch (GCD) and other low-level concurrency APIs. It provides language-level features, libraries, and tools that allow developers to write concurrent code in a natural and easy-to-understand way. The Swift concurrency model is based on asynchronous programming, where code is executed asynchronously on different threads or processors, without blocking the main thread. This allows for better utilization of system resources and improved performance.

Some of the key features of the Swift Concurrency Runtime include:

1. Async/Await: Async/await is a language-level feature that simplifies writing asynchronous code. It allows developers to write code that looks and behaves like synchronous code, but runs asynchronously in the background. With async/await, developers can write asynchronous code that is easy to read, write, and debug.
2. Continuations: Continuations allow you to pause and resume execution of asynchronous tasks, making it possible to write complex control flow without blocking.
3. Actors: Actors are a concurrency primitive in Swift that provide a safe and easy-to-use way to write concurrent code.
3. Actors: Actors are a new concurrency primitive in Swift that provide a safe and easy-to-use way to write concurrent code. Actors are essentially objects that encapsulate state and behavior, and they ensure that access to that state is always synchronized and thread-safe. Actors are isolated from each other and communicate through asynchronous message passing.
4. Structured Concurrency: Structured Concurrency is a programming pattern that makes it easier to write and manage concurrent code. It ensures that all tasks started by a given block of code are completed before that block of code exits. This makes it easier to reason about the flow of control in concurrent code and helps prevent common concurrency bugs (a rough Rust analogue is sketched below, after the task-priority list).
5. Task API: The Task API is a set of libraries and tools that make it easier to create, manage, and debug tasks in Swift. Tasks are the fundamental unit of concurrency in Swift, and the Task API provides functions and utilities for creating, running, and canceling tasks.
6. Debugging and profiling tools: Swift Concurrency comes with new tools and APIs to help you debug and profile your concurrency code, including new Xcode debuggers and performance profiling tools.

#### Scheduling

Swift Concurrency also provides a number of scheduling policies that determine how tasks are executed. These policies include:

* Serial: Executes tasks one at a time, in the order they were added to the queue. This is useful for ensuring that tasks run in a specific order, such as updating the UI.
* Concurrent: Executes tasks concurrently, allowing multiple tasks to run at the same time. This is useful for maximizing performance and utilizing system resources efficiently.
* Deadline: Lets you set a deadline for a task, after which it will be cancelled if it has not completed. This is useful for preventing tasks from running indefinitely and wasting system resources.
* Group: Lets you group tasks together and specify a maximum number of concurrent tasks that can be executed at once. This is useful for managing resources and ensuring that the application does not exceed its capacity.
* Prioritized: Lets you prioritize tasks based on their importance, ensuring that higher-priority tasks are executed before lower-priority tasks.

There are four task priority levels that can be assigned to tasks. These priority levels determine the order in which tasks are executed when multiple tasks are waiting to run:

* background: The lowest priority level, used for tasks that are not time-sensitive and can run in the background without affecting the user experience. Examples include data synchronization, file backups, and system maintenance.
* utility: The second-lowest priority level, used for tasks that are important but not time-critical. Examples include image processing, data analysis, and non-critical network requests.
* user-initiated: The second-highest priority level, used for tasks that are initiated by the user and require a response within a reasonable time frame. Examples include UI updates, network requests initiated by the user, and user interactions.
* user-interactive: The highest priority level, used for tasks that are initiated by the user and require an immediate response to provide a seamless user experience. Examples include UI animations, touch input handling, and critical user interactions.
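Since this paper is ultimately about a Rust framework, it is worth noting that Rust's standard library already offers the structured guarantee mentioned in point 4 for OS threads, via std::thread::scope. The sketch below is a minimal illustration (the values and the split are arbitrary):

```
// Minimal sketch of the structured-concurrency guarantee using only the Rust
// standard library: std::thread::scope ensures every thread spawned inside the
// scope has finished before the scope (and the enclosing function) returns.
fn main() {
    let values = [1, 2, 3, 4];
    let mut left: i32 = 0;
    let mut right: i32 = 0;

    std::thread::scope(|s| {
        s.spawn(|| left = values[..2].iter().sum());
        s.spawn(|| right = values[2..].iter().sum());
        // Both spawned threads are joined automatically when the scope ends.
    });

    println!("{}", left + right); // 10
}
```

Async Rust does not yet have an equally direct built-in equivalent for spawned tasks, which is one of the gaps a mobile-oriented concurrency framework would likely need to close.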
You can assign a priority level to a task using the **TaskPriority** enumeration. For example, to create a user-interactive task, you can use the following code:

```
Task(priority: .userInteractive) {
    // Perform user-interactive work
}
```

By assigning the appropriate priority level to each task, you can ensure that your application uses system resources efficiently and provides a responsive, smooth user experience.

#### Does Swift support the M:N threading model?

No, Swift does not directly support the M:N threading model. M:N threading is a threading model where multiple user-space threads (M) are mapped onto a smaller number of kernel threads (N) in order to improve performance and scalability.

Swift Concurrency uses the actor model and relies on an underlying thread pool, provided by the runtime on top of GCD, to schedule tasks. This is conceptually similar to the runtimes of Go and other modern languages, although Go implements its own M:N scheduler rather than relying on a system-managed pool. The thread pool manages a set of kernel threads and assigns tasks to them as needed, allowing for efficient use of system resources.

While Swift does not directly expose the M:N threading model, it provides a powerful and flexible concurrency model that allows you to write efficient and scalable concurrent code. By using actors and structured concurrency, you can ensure that your code is easy to reason about, avoids common concurrency pitfalls, and makes good use of system resources.

#### What is the difference between a Swift task and an OS thread?

In Swift Concurrency, a task is a lightweight unit of work that represents a piece of code that needs to be executed asynchronously. A task differs from an operating system thread in several ways:

* Memory footprint: Tasks are more lightweight than threads and require less memory, so you can create and manage a large number of tasks without running out of memory.
* Concurrency control: Tasks are managed by the Swift Concurrency runtime and use a structured concurrency model that ensures safe and efficient use of system resources. In contrast, threads are managed by the operating system and are typically coordinated with lower-level primitives such as locks and semaphores, which can be more error-prone and harder to reason about.
* Scheduling: The Swift Concurrency runtime manages the scheduling and execution of tasks, which allows it to optimize the use of system resources and ensure that tasks are executed efficiently. In contrast, the operating system schedules threads based on its own algorithms and policies, which may not be optimized for your application's specific requirements.
* Cancellation: Tasks can be cancelled and cleaned up cooperatively by the Swift Concurrency runtime, which helps prevent resource leaks and ensures that your application uses system resources efficiently. In contrast, cancelling an OS thread is an abrupt operation that may not clean up the resources the thread was holding.

#### Does a Swift Task have its own stack?

No. Swift's task-based concurrency system, which is built on top of Swift's async/await syntax, does not use a separate stack for each task. Instead, it uses a unified call stack, much like Kotlin coroutines. Each task is associated with a continuation, a lightweight object that represents the task's state and execution context. When a task is suspended, its continuation is saved to the heap, and when it is resumed, the continuation is restored and execution continues from where it left off. Unlike traditional threads, each of which needs its own dedicated stack, Swift tasks use this "heap-based stack" approach, which allows many tasks to be managed efficiently and cheaply on a single call stack.
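Rust's async tasks follow the same stackless idea: the compiler turns every async fn into a state machine, and when a spawned task suspends at an .await point its state lives in that heap-allocated future rather than on a dedicated stack. A minimal sketch, assuming the Tokio runtime (the function name and return value are made up for illustration):

```
// Minimal sketch of the analogous "stackless" model in Rust, assuming Tokio:
// an async fn compiles to a state machine, and tokio::spawn stores that state
// machine on the heap instead of giving the task its own stack.
use std::time::Duration;

async fn fetch_temperature() -> f64 {
    // Suspension point: the task's state is saved in its future, not on a thread stack.
    tokio::time::sleep(Duration::from_millis(100)).await;
    21.5 // placeholder value standing in for a real network call
}

#[tokio::main]
async fn main() {
    // Spawning boxes the future (the task's "heap-based stack") onto the executor.
    let handle = tokio::spawn(fetch_temperature());
    let temperature = handle.await.expect("task panicked");
    println!("temperature: {temperature}");
}
```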
#### Example code: Swift Concurrency to perform a network request and update the UI

```
import UIKit

actor WeatherManager {
    var temperature: Double = 0.0

    func fetchWeather() async {
        // Simulate network delay
        try? await Task.sleep(nanoseconds: 1_000_000_000)

        // Perform network request (force-unwraps kept for brevity)
        let url = URL(string: "https://api.openweathermap.org/data/2.5/weather?q=San%20Francisco&appid=YOUR_APP_ID")!
        let (data, _) = try! await URLSession.shared.data(from: url)
        let json = try! JSONSerialization.jsonObject(with: data, options: []) as! [String: Any]
        let main = json["main"] as! [String: Any]
        let temp = main["temp"] as! Double

        // Update temperature (actor-isolated state, so this is data-race free)
        temperature = temp
    }
}

class WeatherViewController: UIViewController {
    @IBOutlet weak var temperatureLabel: UILabel!

    private let weatherManager = WeatherManager()

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)

        // Fetch weather data
        Task {
            await self.weatherManager.fetchWeather()

            // Reading actor state requires await; this Task inherits the main
            // actor from the view controller, so the UI can be updated directly.
            let temperature = await self.weatherManager.temperature
            self.temperatureLabel.text = "\(temperature)°F"
        }
    }
}
```

In this example, we define a **WeatherManager** actor that encapsulates the state and behavior of fetching weather data. It has a **fetchWeather** method that performs a network request using **URLSession** and updates the **temperature** property. In the **WeatherViewController**, we create an instance of **WeatherManager** and call **fetchWeather** asynchronously from a **Task**. Once the weather data is fetched, we read the actor's **temperature** (which requires **await**) and set the **temperatureLabel** text; because the **Task** was created on the main actor, the UI update runs on the main thread.

In summary, the Swift Concurrency Runtime is a powerful and flexible technology that simplifies writing concurrent code in Swift. Its async/await syntax, actors, structured concurrency, and task API make it easy to write high-performance, scalable applications.

### Kotlin Concurrency Runtime

Kotlin provides several concurrency constructs for writing asynchronous and concurrent code. These constructs are built on top of the Java concurrency primitives and provide additional abstractions that simplify and streamline concurrent programming. The coroutine is the key element of the Kotlin concurrency runtime, which manages the creation, scheduling, and execution of coroutines and provides a set of APIs for working with concurrency. Some of its key features include:

* Coroutine builders: Kotlin provides several coroutine builders, such as launch and async, that allow you to create and launch coroutines. These builders provide a simple and flexible way to create and manage coroutines.
* Suspending functions: Coroutines can use suspending functions to perform long-running work without blocking a thread. Suspending functions are functions that can be suspended and resumed later, which allows for efficient use of system resources.
* Coroutine contexts: Coroutine contexts define the execution context for coroutines. A context can include elements such as the dispatcher, which determines the thread pool used to execute the coroutine.
* Dispatchers: Dispatchers determine the thread pool that should be used for executing coroutines. Kotlin provides several built-in dispatchers, such as Dispatchers.IO and Dispatchers.Main, that let you pick the thread pool based on the type of work being performed.
* Coroutine cancellation: Coroutines can be cancelled through a Job object, which allows for efficient cleanup of resources associated with the coroutine.
* Channels: Channels provide a way of communicating between coroutines in a thread-safe and non-blocking manner. A channel can be used to send and receive messages between coroutines and offers a simple and efficient way of implementing producer-consumer and other message-passing patterns.
* Atomic variables: Atomic variables provide thread-safe, non-blocking access to shared variables. They offer a simple and efficient building block for synchronization primitives such as locks and semaphores.
* Executors: Executors manage and schedule threads for executing tasks. On the JVM these come from **ThreadPoolExecutor** and the **Executors** factory methods (for example, newSingleThreadExecutor() and newFixedThreadPool()), which can be used to execute tasks in parallel and manage system resources efficiently.

#### Scheduling

In Kotlin, task scheduling is handled by the underlying Java Virtual Machine (JVM), which provides thread pools for executing tasks in parallel. Kotlin's concurrency constructs, such as coroutines and executors, use these thread pools to schedule and execute tasks.

JVM thread pools are managed through the **ExecutorService** interface, which provides methods for submitting tasks to a pool and managing the pool's resources. Implementations are typically obtained from **ThreadPoolExecutor** directly or from the **Executors** factory methods (such as newSingleThreadExecutor() and newFixedThreadPool()), which let you customize the thread pool's behavior based on your application's requirements.

Kotlin's coroutines use the Dispatchers interface to manage task scheduling. Dispatchers provides several implementations, such as **Dispatchers.Default**, **Dispatchers.IO**, and **Dispatchers.Main**, which can be used to execute tasks in different thread contexts:

* Dispatchers.Main – Runs on the main thread, mainly for UI operations and light work.
* Dispatchers.IO – Optimized for I/O operations; doesn't use the main thread.
* Dispatchers.Default – Optimized for CPU-intensive work.

For example, the **Dispatchers.IO** dispatcher is optimized for I/O-bound tasks and uses a thread pool that is separate from the default dispatcher's pool.

Kotlin's **async** coroutine builder and the **await()** function execute tasks asynchronously and without blocking. The **async** builder returns a **Deferred** object, which represents a task that will be executed asynchronously in the background. Calling **await()** on the **Deferred** suspends until the task completes and then returns its result.

Kotlin's channels provide a way of communicating between coroutines in a non-blocking and thread-safe manner. Channels use a buffer to store messages and provide methods for sending and receiving messages between coroutines. The ***receive()*** method suspends the calling coroutine until a message is available in the channel, while the ***send()*** method suspends until the message has been accepted (for example, when a buffered channel has space again).
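For comparison with the Rust side of this paper, the same suspending producer/consumer pattern maps directly onto async channels. A minimal sketch, assuming the Tokio runtime (the buffer size and values are arbitrary):

```
// Minimal sketch of a suspending producer/consumer, analogous to Kotlin
// channels, assuming the Tokio runtime. Buffer size and values are arbitrary.
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // A bounded channel: send() suspends when the buffer is full,
    // recv() suspends until a message is available (like Kotlin's send/receive).
    let (tx, mut rx) = mpsc::channel::<i32>(4);

    // Producer task
    tokio::spawn(async move {
        for i in 0..10 {
            tx.send(i).await.expect("receiver dropped");
        }
        // Dropping tx closes the channel, ending the consumer loop below.
    });

    // Consumer: recv() returns None once the channel is closed and drained.
    while let Some(value) = rx.recv().await {
        println!("received {value}");
    }
}
```

As with Kotlin channels, the bounded buffer gives natural backpressure: the producer suspends when the buffer is full instead of blocking a thread.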
Kotlin does not provide built-in task (coroutine) priority levels the way Swift does. Instead, coroutine execution is driven by the JVM thread pools that run the coroutines. You can use Kotlin's **coroutineContext** to customize the context in which a coroutine is executed, including the thread pool used to execute it. The **coroutineContext** carries several elements, such as the **Dispatcher** and **Job**, which can be used to specify the thread pool and manage the coroutine's execution.

#### Does Kotlin support the M:N threading model?

Kotlin does not expose an M:N threading model at the language level, and on the JVM ordinary (platform) threads are mapped 1:1 onto operating system threads. The M:N behaviour comes from the coroutine machinery: many coroutines are multiplexed onto a bounded pool of JVM threads managed by dispatchers. In addition, Dispatchers.IO maintains a separate pool of threads for blocking I/O that is managed independently of the default pool.

Kotlin's concurrency constructs, such as coroutines and executors, use the JVM's thread pools to execute tasks in parallel. Kotlin's Dispatchers interface provides several implementations, such as Dispatchers.Default, Dispatchers.IO, and Dispatchers.Main, which can be used to execute tasks in different thread contexts. For example, the Dispatchers.IO dispatcher is optimized for I/O-bound tasks and uses a thread pool that is separate from the default dispatcher's pool.

#### What is the difference between a Kotlin task and an OS thread?

In Kotlin, a task refers to a unit of work that can be executed concurrently with other tasks, while an OS thread is an operating system construct that provides a separate context for executing code in parallel.

A Kotlin task is typically implemented using coroutines, which are lightweight "threads" that can be suspended and resumed at specific points in their execution. Coroutines provide a high-level abstraction for concurrent programming that makes it easy to write and reason about concurrent code. Coroutines are executed on a thread pool managed by the JVM and can be scheduled on different threads depending on their context.

In contrast, an OS thread is an independent execution context provided by the operating system. Each OS thread has its own stack, program counter, and set of registers, which allows it to execute code independently of other threads. The operating system schedules threads on the CPU using various scheduling algorithms and provides mechanisms for synchronization and communication between threads.

The key difference between Kotlin tasks and OS threads is the level of abstraction: tasks implemented as coroutines provide a high-level abstraction that makes concurrent code easy to write, while OS threads are a low-level abstraction that sits directly on top of the kernel's scheduler.

##### Does a Kotlin coroutine have its own stack?

Kotlin coroutines do not have their own stack. Instead, they share the call stack of the underlying thread they are running on, which allows for lightweight and efficient concurrency. Kotlin coroutines are built on top of suspending functions, which are functions that can be paused and resumed without blocking the underlying thread. When a coroutine calls a suspending function, it is suspended and its state is saved. When the suspending function completes, the coroutine is resumed from where it left off, allowing it to continue executing.
Because coroutines share the call stack of the underlying thread, they are much lighter weight than traditional threads, allowing many more concurrent tasks to be executed efficiently. Additionally, Kotlin provides a variety of tools for controlling and managing the concurrency of coroutines, including structured concurrency, coroutine scopes, and cancellation.

https://medium.com/@lucianoalmeida1/an-overview-on-kotlin-coroutines-d55e123e137b

#### Example code: perform a network request on a background thread and update the UI on the main thread

```
import android.os.Bundle
import android.widget.TextView
import androidx.appcompat.app.AppCompatActivity
import kotlinx.coroutines.*

class MainActivity : AppCompatActivity() {

    private lateinit var textView: TextView

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        textView = findViewById(R.id.textView) // assumes a TextView with id "textView" in the layout

        // Start a coroutine on a background thread
        // (in production code lifecycleScope is usually preferred over GlobalScope)
        GlobalScope.launch(Dispatchers.IO) {
            // Perform a network request
            val response = getResponseFromServer()

            // Update the UI on the main thread
            withContext(Dispatchers.Main) {
                updateUI(response)
            }
        }
    }

    private suspend fun getResponseFromServer(): String {
        // Perform network request
        delay(5000) // Simulate network delay
        return "Server response"
    }

    private fun updateUI(response: String) {
        // Update UI elements with response data
        textView.text = response
    }
}
```

In this example, we create a new coroutine using the ***GlobalScope.launch()*** function, which runs the coroutine on a background thread using the ***Dispatchers.IO*** dispatcher. We then perform a network request in the ***getResponseFromServer()*** function, which is a suspend function that can be paused and resumed at specific points during its execution. Once the network request is complete, we update the UI on the main thread using the ***withContext()*** function and the ***Dispatchers.Main*** dispatcher. This ensures that the UI updates are performed on the main thread, which is required in Android applications.

### Go Concurrency Runtime

Go is a programming language with built-in support for concurrency through its Goroutines and Channels constructs (CSP, Communicating Sequential Processes). Goroutines are lightweight threads of execution that can run concurrently with other Goroutines, while Channels provide a mechanism for communication and synchronization between Goroutines.

#### CSP (Communicating Sequential Processes)

CSP (Communicating Sequential Processes) is a concurrency model introduced by Tony Hoare in the 1970s. The Go programming language was designed with CSP as a guiding principle and provides built-in support for implementing concurrent programs using the CSP model.

In Go, CSP is implemented using two main constructs: Goroutines and Channels. Goroutines are lightweight threads of execution that can run concurrently with other Goroutines, while Channels provide a mechanism for communication and synchronization between Goroutines.

A Channel is a typed conduit through which Goroutines can send and receive values. When a Goroutine sends a value on a Channel, the value is added to the end of the Channel's internal queue. When a Goroutine receives a value from a Channel, it blocks until a value is available on the Channel's internal queue. Channels can be used to synchronize the execution of Goroutines and to pass data between them.
Here's an example of using Goroutines and Channels to implement a concurrent program that calculates the sum of an array of integers:

```
package main

import "fmt"

func main() {
    numbers := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

    // Create a Channel to receive partial sums
    sumChannel := make(chan int)

    // Launch a Goroutine for each pair of numbers
    for i := 0; i < len(numbers); i += 2 {
        go sum(numbers[i:i+2], sumChannel)
    }

    // Aggregate the partial sums received on the Channel
    totalSum := 0
    for i := 0; i < len(numbers)/2; i++ {
        totalSum += <-sumChannel
    }
    fmt.Println(totalSum)
}

func sum(numbers []int, sumChannel chan<- int) {
    sum := 0
    for _, number := range numbers {
        sum += number
    }
    sumChannel <- sum
}
```

In this example, we create a Channel to receive partial sums and launch one Goroutine per pair of elements (five in total). Each Goroutine calculates a partial sum and sends it on the Channel using the <- operator. The main Goroutine blocks until it has received all partial sums from the Channel, at which point it aggregates them and prints the total.

#### Go Concurrency Scheduling

In Go, Goroutines are scheduled by the Go runtime, which is responsible for mapping Goroutines onto operating system threads and scheduling them for execution on multiple CPUs or processor cores. The Go runtime includes a scheduler that uses a work-stealing algorithm to balance the workload across all available cores, which maximizes CPU utilization and minimizes contention for shared resources.

When a Goroutine is created, it is added to a pool of ready Goroutines. The scheduler then selects one or more Goroutines to run and assigns them to operating system threads. When a Goroutine blocks, such as when it is waiting for input on a Channel, the scheduler can switch to another Goroutine that is ready to run, which allows multiple Goroutines to execute concurrently on different threads.

##### Does the Go runtime support the M:N threading model?

Yes, Go's concurrency model is based on M:N threading, where M goroutines are multiplexed onto N operating system threads. This allows for efficient use of resources and scheduling of goroutines.

The Go runtime provides a scheduler that is responsible for mapping goroutines onto operating system threads and for managing the execution and scheduling of Goroutines across multiple CPUs or processor cores. The scheduler uses a work-stealing algorithm to ensure that work is evenly distributed across threads and to prevent any one thread from becoming overloaded. The scheduler can also dynamically adjust the number of threads used by the program based on workload and system resources.

In addition to the scheduler, the Go runtime also includes a garbage collector that is optimized for concurrent execution. The garbage collector runs concurrently with Goroutines and uses a tri-color marking algorithm to identify and collect unused memory. This allows Goroutines to continue executing while the garbage collector runs in the background, which helps to minimize pauses and improve overall performance.

##### Does a goroutine have its own stack?

Yes, Goroutines have their own stacks. When a Goroutine is created, the Go runtime allocates a small amount of memory for the Goroutine's stack. This memory is used to store local variables, function arguments, and other data that the Goroutine needs to execute. Unlike traditional threads, which typically have fixed-size stacks that can be quite large, Goroutine stacks are allocated dynamically and can grow or shrink as needed.
This means that Goroutines can be created and destroyed very quickly, without incurring the overhead of allocating and deallocating large blocks of memory. If a Goroutine needs more stack space than its initial allocation provides, the runtime will dynamically allocate additional stack space as needed, using a technique called stack splitting.

Stack splitting works by dividing the Goroutine's stack into small segments. When the Goroutine needs more stack space, the runtime allocates a new stack segment and adds it to the Goroutine's stack. This allows the Goroutine's stack to grow dynamically, without the need for large, fixed-size stacks.

The use of stack splitting allows Goroutines to be much more memory-efficient than traditional threads, which typically require large, fixed-size stacks that can be wasteful of memory. By dynamically allocating stack space as needed, Goroutines avoid the overhead of large stack allocations and can use memory more efficiently overall.

***Note: ChatGPT's knowledge here is outdated; newer Go versions use a stack-copying technique, which is much simpler to maintain but may incur a small performance hit.***

https://blog.cloudflare.com/how-stacks-are-handled-in-go/

The dynamic nature of Goroutine stacks also makes them more memory-efficient than traditional thread stacks. Because Goroutines only use as much stack space as they need, they avoid wasting memory on unused stack space. This can be especially important in applications that use large numbers of Goroutines, as it can help to reduce overall memory usage and improve performance.

##### Are Goroutines cooperatively scheduled?

No, Goroutines in Go are not cooperatively scheduled. Instead, they are scheduled using a preemptive scheduling algorithm: the Go runtime can interrupt a Goroutine and switch to another one even if the currently executing Goroutine is not blocked or waiting on any resources (since Go 1.14, goroutines can be preempted asynchronously even inside tight loops).

The preemptive scheduling algorithm used by Go is designed to be lightweight and efficient, and it is optimized for the kind of highly concurrent, asynchronous programming that is common in modern applications. By preemptively scheduling Goroutines, the Go runtime can ensure that all Goroutines have a fair chance to execute, even in the presence of long-running or blocking tasks.

##### Do Goroutines support priority scheduling?

No, Goroutines in Go do not support traditional priority scheduling. Instead, they use a technique called "asymmetric multiprocessing" to achieve similar functionality: a single operating system thread manages multiple Goroutines, and the Go runtime uses this to prioritize Goroutines based on their state and behavior rather than on explicit priority levels.

For example, the Go runtime may give higher priority to Goroutines that are blocked on I/O or waiting on a channel, as these Goroutines are more likely to be able to make progress quickly. Conversely, Goroutines that are busy performing compute-intensive tasks may be given lower priority, as they are less likely to make progress quickly.

This approach allows the Go runtime to achieve many of the benefits of traditional priority scheduling without some of the drawbacks. By prioritizing Goroutines based on their behavior and state, the runtime can ensure that the most important work gets done first, while still providing fair and efficient scheduling for all Goroutines.

***(Needs to double check, sounds like BS, since AMP (asymmetric multiprocessing) is more about CPU hardware architecture, as opposed to SMP.)***
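To close out the Go comparison from the Rust side: the fan-out/fan-in pattern from the sum example above maps directly onto Tokio's work-stealing, multi-threaded executor, which is likewise an M:N design (many tasks over a fixed pool of worker threads). A minimal sketch, assuming a recent Tokio with JoinSet:

```
// Minimal sketch mirroring the Go partial-sum example on Tokio's work-stealing,
// multi-threaded executor (an M:N model: many tasks over a fixed worker pool).
use tokio::task::JoinSet;

#[tokio::main(flavor = "multi_thread")]
async fn main() {
    let numbers: Vec<i64> = (1..=10).collect();

    // Fan out: one lightweight task per pair, multiplexed over the worker threads.
    let mut set = JoinSet::new();
    for pair in numbers.chunks(2) {
        let pair = pair.to_vec();
        set.spawn(async move { pair.iter().sum::<i64>() });
    }

    // Fan in: join the tasks and aggregate their partial sums.
    let mut total = 0;
    while let Some(partial) = set.join_next().await {
        total += partial.expect("task panicked");
    }
    println!("{total}"); // 55
}
```

Like goroutines, these tasks carry no built-in priority levels, which is exactly the kind of gap a QoS-aware mobile concurrency framework would need to fill.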
### DART Concurrency Runtime

https://dart.dev/guides/language/concurrency

Dart is a modern, object-oriented language that supports both concurrent and parallel computing. Dart provides a number of language features and libraries that make it easy to write concurrent and parallel code, including:

* Asynchronous programming with the async and await keywords: This allows you to write non-blocking, asynchronous code that can run in the background while your main program continues to execute. Async programming allows for non-blocking execution of I/O and other long-running operations, so the program can keep executing while waiting for an operation to complete. This approach can improve the responsiveness and scalability of Dart programs, particularly when multiple operations need to be executed concurrently.
* Isolates: Isolates are lightweight, independent threads of execution in Dart. Each isolate has its own memory heap and runs in its own isolated environment, allowing for true parallelism and shared-nothing concurrency. Isolates allow multiple independent computations to be executed simultaneously without interference, and they communicate with each other through asynchronous message passing.
* Message Passing: Dart also supports message passing, which allows for the exchange of data between concurrent processes or threads. In Dart, message passing is implemented using "ports", which are objects that provide a communication channel between two or more isolates. Dart's message passing system is designed to be fast and efficient, and can be used to implement a wide variety of concurrent and parallel algorithms and data structures.
* Streams: Streams provide a way to handle asynchronous data streams in Dart. They allow you to handle a sequence of values as they arrive, rather than blocking until all the data is available.
* Future and Completer: Futures represent the result of an asynchronous operation that hasn't completed yet. Completers are used to create and control the completion of futures.
* **Shared memory??**: Dart provides support for shared memory between isolates using typed arrays and atomic operations, allowing for efficient communication and synchronization between isolates. (Needs verification; isolates are normally shared-nothing.)

***Reached HackMD note limit, more DART info, see*** https://hackmd.io/_Nepnw2UQpGdh6ROPizFrg