# Books : Embedded System Programming Series
*This will focus more on the performance side and program interaction with the OS.
Peripheral-related chapters will be skipped.\(2025\/10\/08\)*
# Book \: Embedded Programming with Modern C++ Cookbook
> Reference \:
> * Embedded Programming with Modern C++ Cookbook(Kobo/電子書)
> https://24h.pchome.com.tw/books/prod/DJBQ3H-D900D7ZM1
> * Embedded Programming with Modern C++ Cookbook Repository
> https://github.com/PacktPublishing/Embedded-Programming-with-Modern-CPP-Cookbook
*In this post, I will only include some topics that interest me and some extra material that I read when studying this book.*
## Fundamentals of Embedded Systems
> Reference \:
> * Embedded Programming with Modern C++ Cookbook **Page 10**
> https://24h.pchome.com.tw/books/prod/DJBQ3H-D900D7ZM1
> A major part of an embedded program is checking the status, reading input, sending data, or controlling the external device.
:::info
:information_source: **Embedded System Categories**
My other posts \:
* **Microcontrollers \(MCU\)**
* *RTOS with STM32* \:
https://hackmd.io/@Erebustsai/S1B6s0ns2
* *MCU Basic & Advanced Study Note* \:
https://hackmd.io/@Erebustsai/SJ6xmLqY6
* **System on Chip \(SoC\)**
* *Linux Driver Development with Raspberry Pi* \:
https://hackmd.io/@Erebustsai/B1xvJYRrha
* *Raspberry Pi GPU Audio Video Programming* \:
https://hackmd.io/@Erebustsai/rJgn-Q89C
* *CUDA for Tegra Programming* \:
https://hackmd.io/MgVSv30jRnaEUJETjDiZgQ?view#CUDA-for-Tegra-Programming
* **Application-Specific Integrated Circuits \(ASIC\)**
* **FPGAs**
* *FPGA Verilog Development* \:
https://hackmd.io/@Erebustsai/HybNY2-IR
:::
### Working with Limited Resources
**Endianness**
> Reference \:
> * Embedded Programming with Modern C++ Cookbook **Page 15**
> https://24h.pchome.com.tw/books/prod/DJBQ3H-D900D7ZM1
With the 32-bit value 0x01020304 stored at address ptr \:
* **Big-endian**
| offset in memory | Value |
| ---------------- | ----- |
| ptr | 0x01 |
| ptr +1 | 0x02 |
| ptr +2 | 0x03 |
| ptr +3 | 0x04 |
* **Little-endian**
| offset in memory | Value |
| ---------------- | ----- |
| ptr | 0x04 |
| ptr +1 | 0x03 |
| ptr +2 | 0x02 |
| ptr +3 | 0x01 |
:::info
:information_source: **Big\-Endian vs Little\-Endian \: differences and detection code in C**
https://blog.gtwang.org/programming/difference-between-big-endian-and-little-endian-implementation-in-c/
:::
### Hardware Failure \& Influence of Environmental Conditions
For satellites or any device that needs to go to space, exposure to different levels of energetic particles is inevitable.
* Single Event Effects
https://radhome.gsfc.nasa.gov/radhome/see.htm
:::info
:bulb: **Checking for hardware failure**
My GitHub repository includes checks that the disk, RAM, and GPU are functioning correctly. Note that the CPU is not checked because the development board is connected over SSH; any CPU failure would cause the SSH session to disconnect.
https://github.com/Chen-KaiTsai/HomeLabHelper_repo/tree/main/system_integraty_test
:::
For the Nvidia Jetson series, the TX2i can be considered a good choice, providing higher resilience to temperature, humidity, and vibration and, most importantly, ECC memory. However, this kind of device is not broadly accessible to independent scholars.

* Cyclic Redundancy Check \(CRC\)
https://gordonfang-85054.medium.com/crc檢查碼編碼-fdec319341e9
### Using C++ for Embedded Development
> Reference \:
> * Embedded Programming with Modern C++ Cookbook **Page 19**
> https://24h.pchome.com.tw/books/prod/DJBQ3H-D900D7ZM1
In the book, the author lists features and techniques that make C++ the best choice for developers \:
* You don't pay for what you don't use.
* Object-oriented programming to tame the code complexity
* Resource acquisition is initialization \(RAII\)
* Exceptions
* A powerful standard library
* Threads and memory model as part of the language specification.
> ... You may ask why C++ does not enforce every method to be virtual by default. This approach is adopted by Java and doesn't seem to have any downsides.
>
> The reason is that virtual functions **are not free**. Function resolution is performed at runtime via the virtual table—an array of function pointers. It adds a slight overhead to the function invocation time. If you do not need dynamic polymorphism, you do not pay for it. That is why C++ developers add the virtual keyword, to explicitly agree with functionality that adds performance overhead.
## Setting Up Environment
Although the Raspberry Pi 3B\+ OS has been upgraded to the newest version, its `GLIBC` is still too old for binaries cross-compiled on the EPYC server, which runs *Ubuntu 22.04*.
**The following error shows \:**
```bash
/lib/arm-linux-gnueabihf/libc.so.6: version `GLIBC_2.34' not found
```
We can check the `GLIBC` versions available on the Raspberry Pi system with `strings /lib/arm-linux-gnueabihf/libc.so.6 | grep GLIBC`.
**Output \:**
As apparent in the following output, the newest `GLIBC` on my Raspberry Pi is 2.31, so the 2.34 required by the Ubuntu 22.04 toolchain is too new for it. Therefore, a transition to *WSL2 Ubuntu 18.04* is applied.
```bash
GLIBC_2.4
GLIBC_2.5
GLIBC_2.6
GLIBC_2.7
GLIBC_2.8
GLIBC_2.9
GLIBC_2.10
GLIBC_2.11
GLIBC_2.12
GLIBC_2.13
GLIBC_2.14
GLIBC_2.15
GLIBC_2.16
GLIBC_2.17
GLIBC_2.18
GLIBC_2.22
GLIBC_2.23
GLIBC_2.24
GLIBC_2.25
GLIBC_2.26
GLIBC_2.27
GLIBC_2.28
GLIBC_2.29
GLIBC_2.30
GLIBC_PRIVATE
GNU C Library (Debian GLIBC 2.31-13+rpt2+rpi1+deb11u11) stable release version 2.31.
```
### Using CMake as a Build System
**Simple Example**
```cmake
cmake_minimum_required(VERSION 3.10.2)
# The cross compilers must be set before project(), where CMake
# detects and tests the toolchain.
set(CMAKE_C_COMPILER /usr/bin/arm-linux-gnueabi-gcc)
set(CMAKE_CXX_COMPILER /usr/bin/arm-linux-gnueabi-g++)
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
project(hello)
add_executable(hello main.cpp)
```
Notice that the minimum required CMake version is the version of `cmake` that runs on my *Ubuntu 18.04*.
**Check Binary**
```
hello: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=ce2a2b7298fd29190937deffb4d7d8bb7532e699, not stripped
```
## Working with Different Architectures
:::info
:bulb: `size_t`
https://blog.csdn.net/qq_34018840/article/details/100884317
:::
### Converting the endianness
We can use network protocols to transfer data between devices with different endianness, since in binary network protocols the byte order is always big-endian.
**htonl, htons, ntohl, ntohs \- convert values between host and network byte order**
> Reference \:
> * Linux man page
> https://linux.die.net/man/3/htonl
> * Book Example
> https://github.com/PacktPublishing/Embedded-Programming-with-Modern-CPP-Cookbook/tree/master/Chapter03/enconv
:::info
:information_source: **Input-output system calls in C | Create, Open, Close, Read, Write**
https://www.geeksforgeeks.org/input-output-system-calls-c-create-open-close-read-write/
:::
### Data Alignment
The simple rule, as I learned from other books, is to sort the data fields in a `struct` from large to small. The book provides two rules for ordering the elements of a `struct` \:
* Group them by their size.
* Order the groups from largest to smallest data types.
> Reference \:
> * Embedded Programming with Modern C++ Cookbook **Page 84**
> https://24h.pchome.com.tw/books/prod/DJBQ3H-D900D7ZM1
> Modern operating systems operate 4 KB memory blocks or pages to map a process' virtual address space to physical memory. Aligning data structures on 4 KB boundaries can lead to performance gains
### Working with Packed Structures
> Reference\:
> C/C++ `__attribute__((__packed__))` 用法與範例
> https://shengyu7697.github.io/cpp-attribute-packed/
A `struct` marked with `__attribute__((packed))` will not be padded automatically. This reduces the `struct` size but does not enforce the alignment the compiler normally provides.
```cpp
struct ObjectMetadata1 {
    uint8_t access_flags;
    uint32_t size;
    uint32_t owner_id;
    Category category;
} __attribute__((packed));
```
Notice that this attribute works with `gcc` and `clang`. MSVC on Windows achieves the same effect differently, with `#pragma pack(1)`.
### Aligning Data with Cache Lines
> Reference\:
> * Alignas in C++ 11
> https://www.geeksforgeeks.org/cpp/alignas-in-cpp-11/
> * Example Source Code
> https://github.com/PacktPublishing/Embedded-Programming-with-Modern-CPP-Cookbook/blob/master/Chapter03/cache_align/cache_align.cpp
* Frequently accessing data that's used together should be put into the same cache line.
* Data used independently by different threads should not be put into the same cache line; otherwise the result is **false sharing**.
**Aligned Static Buffer**
```cpp
alignas(kAlignSize) char aligned_static_buffer[kAllocBytes];
```
The `alignas` specifier was added in `C++11` and can be applied to objects or types; it works for static array buffers.
**Aligned Dynamic Buffer**
```cpp
if (posix_memalign((void**)&aligned_dynamic_buffer,
                   kAlignSize, kAllocBytes)) { /* allocation failed */ }
```
`posix_memalign()` returns an error code if it fails to allocate memory or if the specified `kAlignSize` is not a power of two.
## Handling Interrupts
### Interrupt Service Routines
> Reference \:
> * Embedded Programming with Modern C++ Cookbook **Page 86**
> https://24h.pchome.com.tw/books/prod/DJBQ3H-D900D7ZM1
* Interrupts are identified by numbers, starting with 0. The numbers are mapped to the hardware interrupt request lines \(IRQ\) that physically correspond to specific processor pins.
* When an IRQ line is activated, the processor uses its number as an offset in the interrupt vector array to locate the address of the interrupt service routine. The interrupt vector array is stored in memory on a fixed address.
* Developers can define or redefine ISRs by updating the entries in the interrupt vector arrays.
* A processor can be programmed to enable or disable interrupts, either for specific IRQ lines or all interrupts at once. When interrupts are disabled, the processor does not invoke the corresponding ISRs, although the status of the IRQ lines can be read.
* IRQ lines can be programmed to trigger interrupts, depending on the signal on the physical pin. This can be at the low level of the signal, the high level of the signal, or the edge \(which is a transition from low to high or high to low\).
:::info
:information_source: **Interrupt Request \(IRQ\)**
An ISR should be short and fast; removing unnecessary instructions is important.
https://bcc16.ncu.edu.tw/pool/1.14.shtml
:::
## Debugging, Logging, and Profiling
### Working with Core Dump
:::info
:information_source: **Core dump**
https://wiki.archlinux.org/title/Core_dump
:::
## Memory Management
> Reference \:
> * Embedded Programming with Modern C++ Cookbook **Page 137**
> https://24h.pchome.com.tw/books/prod/DJBQ3H-D900D7ZM1
> ...the most important characteristic of memory management in embedded systems is determinism, or predictability.
> ...
> Similarly, predictability also applies to memory allocation and deallocation time. In many situations, embedded applications favor spending more memory to achieve deterministic timing.
### Computer Memory Architectures \(Hardware\)
> Reference \:
> * Shared memory
> https://en.wikipedia.org/wiki/Shared_memory
> * Distributed memory
> https://en.wikipedia.org/wiki/Distributed_memory
> * Distributed shared memory
> https://en.wikipedia.org/wiki/Distributed_shared_memory
**Shared Memory**

> https://en.wikipedia.org/wiki/File:MMU_and_IOMMU.svg
Basically, modern multi-core computers all use this architecture, with the following variations \:
* **UMA \(Uniform Memory Access\)**\: All processors share the physical memory uniformly. Modern computers with a single CPU chip usually belong to this type.
* **NUMA \(Non-uniform Memory Access\)**\: Memory access time differs and depends on the memory location relative to a processor. There are multiple CPU chips \(usually two or four\) in the system, with some memory modules connected to one chip and some to another, thus creating the difference in access cost.
**Distributed Memory**
Multiprocessor computer systems and computer clusters belong to this architecture\: in short, multiple computers connected with Ethernet or InfiniBand.
**Distributed Shared Memory**
> In computer science, distributed shared memory (DSM) is a form of memory architecture where physically separated memories can be addressed as a single shared address space.
A single address space is used for all processors; memory coherency becomes a problem.
:::info
The NVIDIA H100 GPU uses this design for its SMs.
> * NVIDIA Hopper Architecture In-Depth
> https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/#distributed_shared_memory
:::
### Using Dynamic Memory Allocation
**Caveat**
* Timing \: Memory allocation time depends on system status and the OS implementation. This characteristic prevents predicting the response time of a process.
* Memory Fragmentation \: Deallocating and allocating memory may create fragments in memory that are too small for other applications to utilize.
**Rule of Thumb**
Allocate all memory at the beginning of the application. Although this makes the application always run at peak memory usage, it provides more consistent memory consumption in the system.
### Exploring Object Pools
* Wikipedia
https://en.wikipedia.org/wiki/Object_pool_pattern
* Unity Learn \: Introduction to Object Pooling
https://learn.unity.com/tutorial/introduction-to-object-pooling#5ff8d015edbc2a002063971b
### Ring Buffer
* Wikipedia
https://zh.wikipedia.org/zh-tw/%E7%92%B0%E5%BD%A2%E7%B7%A9%E8%A1%9D%E5%8D%80
* 並行程式設計: Ring buffer
https://hackmd.io/@sysprog/concurrency-ringbuffer
* Book Implementation
https://github.com/PacktPublishing/Embedded-Programming-with-Modern-CPP-Cookbook/blob/master/Chapter06/ringbuf/ringbuf.cpp
### Shared Memory \(Software\)
* linux 共用記憶體 shm_open ,mmap的正確使用
https://blog.csdn.net/ababab12345/article/details/102931841
* C/C++: Shared Memory,不同 Process 資料交換的方式
https://www.bigcatblog.com/shared_memory/
* Book Implementation
https://github.com/PacktPublishing/Embedded-Programming-with-Modern-CPP-Cookbook/blob/master/Chapter06/shmem/shmem.cpp
Notice that the book's implementation uses a wrapper class to hide the low-level POSIX API.
:::info
:information_source: **`/dev/shm` Size**
`df -h` displays its size as about half of the system memory. Files under `/dev/shm` reside in system memory, so reading and writing them runs at memory speed. However, memory is only actually consumed when the file is used.
**On EPYC Server with 512G system memory**
```
tmpfs 252G 0 252G 0% /dev/shm
```
:::
### Using Specialized Memory
> Embedded systems often provide **access to their peripheral devices over specific ranges of memory addresses**. When a program accesses an address in such a region, it does not read or write a value in memory. Instead, data is sent to a device or read from a device mapped to this address.
:::success
* 記憶體對應技術-MMIO
https://blog.csdn.net/weixin_43405280/article/details/131663456
* Memory Mapped IO, Peripherals, and Registers
https://jsandler18.github.io/extra/peripheral.html
:::
**Book's Example**
> Reference\:
> * Book's Example Source Code
> https://github.com/PacktPublishing/Embedded-Programming-with-Modern-CPP-Cookbook/blob/master/Chapter06/timer/timer.cpp
The book provides an example of accessing the system timer peripheral on a Raspberry Pi 3.
> **The system timer is connected to the processor using an MMIO interface. This means it has a dedicated range of physical addresses, each of them with a specific format and purpose.**
```cpp
constexpr uint32_t kTimerBase = 0x3F003000;
struct SystemTimer {
    uint32_t CS;
    uint32_t counter_lo;
    uint32_t counter_hi;
};
```
However, `0x3F003000` is a physical address and our program runs in a virtual address space. We need to open `/dev/mem` in Linux, which provides access to physical memory addresses. Note that this requires `root`.
```cpp
int memfd = open("/dev/mem", O_RDWR | O_SYNC);
if (memfd < 0) {
    throw std::system_error(errno, std::generic_category(),
        "Failed to open /dev/mem. Make sure you run as root.");
}
SystemTimer *timer = (SystemTimer*)mmap(NULL, sizeof(SystemTimer),
    PROT_READ | PROT_WRITE, MAP_SHARED,
    memfd, kTimerBase);
if (timer == MAP_FAILED) {
    throw std::system_error(errno, std::generic_category(),
        "Memory mapping failed");
}
```
### Extra
In this chapter, there are many uses of `mmap`, a Linux system call. Most of the references above address specific topics of `mmap` usage. In this section, I include some YouTube videos by `Jacob Sorber` that show what `mmap` really is and how to use it \: https://www.youtube.com/@JacobSorber
:::success
Notice that most of the functions used in this book are Linux system calls or library functions; therefore, `man`-ing them is possible. For example, `man mmap` shows the **Linux Programmer's Manual**.
:::
* Reading and Writing Files in C, two ways \(fopen vs. open\)
https://www.youtube.com/watch?v=BQJBe4IbsvQ
* How processes get more memory. \(mmap, brk\)
https://www.youtube.com/watch?v=XV5sRaSVtXQ
* How to Map Files into Memory in C \(mmap, memory mapped file io\)
https://www.youtube.com/watch?v=m7E9piHcfr4
* Simple Shared Memory in C \(mmap\)
https://www.youtube.com/watch?v=rPV6b8BUwxM
## Multithreading and Synchronization
> Reference \:
> * Modern C++ \(cpp\) Concurrency by Mike Shah
> https://www.youtube.com/playlist?list=PLvv0ScY6vfd_ocTP2ZLicgqKnvq50OCXM
### Data Synchronization
:::info
:information_source: **`std::scoped_lock` in c++17**
*Can accept multiple mutexes to prevent deadlocks*
https://en.cppreference.com/w/cpp/thread/scoped_lock
:::
### Atomic Variables
> Reference \:
> * Embedded Programming with Modern C++ Cookbook **Page 181**
> https://24h.pchome.com.tw/books/prod/DJBQ3H-D900D7ZM1
**Reference to `std::atomic`**
https://en.cppreference.com/w/cpp/atomic/atomic
> In C++20, atomic variables receive wait, notify_all, and notify_one methods, similar to the methods of condition variables. They allow implementation of the logic that previously required condition variables by using much more efficient and lightweight atomic variables.
### C++ Memory Model
> Reference \:
> * 理解C/C++ 11記憶體模型
> https://juejin.cn/post/7350089511291289615
> * Memory Model in C++ 11 by geeksforgeeks
> https://www.geeksforgeeks.org/memory-model-in-cpp-11/
> * Arvid Norberg\: The C++ memory model\: an intuition
> https://www.youtube.com/watch?v=OyNG4qiWnmU
> * CppCon 2017\: Fedor Pikus “C\+\+ atomics, from basic to advanced. What do they really do?”
> https://www.youtube.com/watch?v=ZQFzMfHIxng
**What is a Memory Model \?**
* The access order of shared memory \/ data in a language with multi-threading support.
* What level of instruction reordering is allowed.
> Reference\:
> * Hardware Memory Models - 筆記
> https://blog.kennycoder.io/2022/07/18/Hardware-Memory-Models-%E7%AD%86%E8%A8%98/
When compiler optimization reorders the code, everything might not work as it seems.
```cpp
// Thread 1
x = 1;
done = 1;
// Thread 2
while (done == 0) { /*Loop body*/ }
print(x);
```
***Can Thread 2 print `0`\?***
The answer depends on the hardware and the compiler used.
* If an `x86` CPU is used and the compiler generates assembly line by line \(**no instruction reordering**\), it will always print `1`.
* If `ARM` or `PowerPC` is used, it can print `0`.
* If the compiler reorders the code, it can print `0` or even spin forever in the `while` loop.
**Memory Model in CPU Architecture**

* **Total Store Order \(TSO\)**\: Memory store operations have a global order that every processor sees. Stores must complete one by one, in the order the program issued them. To achieve TSO, the CPU design must sacrifice some concurrency performance.
* **Weak Memory Order \(WMO\)**\: Ordering is left to the programmer, who must insert memory fences to obtain whatever ordering is required.
**Memory Model in C++ Programming Languages**
:::success
Different programming languages have different memory model definitions.
:::
C++ provides **a** memory model at the language level that works across CPU architectures and compilers, so that multi-threaded programs can be portable.
C++ uses `memory order`s to describe the `memory model`, implemented on `atomic` variables\: atomic `load()` and `store()` operations take a memory-order argument \(e.g. acquire and release\).
**Sequenced-before**
Within **a single** thread, if statement A comes before statement B, the result and side effects of statement A can be seen by statement B.
```cpp
r2 = x.load(std::memory_order_relaxed);
y.store(42, std::memory_order_relaxed);
```
**Happens-before**
`happens-before` relates statements on **different** threads. If A happens-before B, the memory state after statement A can be seen by statement B. `happens-before` includes `inter-thread happens-before` and `synchronizes-with`.
**Synchronizes-with**
`synchronizes-with` describes how a modification propagates\: if a thread modifies a variable and the change can be seen by other threads, the operations are in a `synchronizes-with` relationship. It is essentially a `happens-before` across threads; therefore, an operation that `synchronizes-with` another also happens-before it.
**Carries Dependency**
Within **a single** thread, if statement A is `sequenced-before` statement B and the result of statement B is affected by statement A, their relationship is `carries dependency`.
```cpp
int *a = &var1; // A
int *b = &var2; // B
c = *a + *b; // C statement is carries dependency with statement A and B.
```
**C++ Memory Orders**
In C++, there are six different memory orders used to describe the memory model. From most relaxed to most strict \:
* `memory_order_relaxed`\: Only guarantees that the `load` and `store` on the atomic variable themselves are atomic.
```cpp
std::atomic<int> x = 0;
std::atomic<int> y = 0;
// Thread 1
r1 = y.load(memory_order_relaxed); // A
x.store(r1, memory_order_relaxed); // B
// Thread 2
r2 = x.load(memory_order_relaxed); // C
y.store(42, memory_order_relaxed); // D
```
After `join()`ing the above threads, `r2 == r1 == 42` is possible since reordering is allowed \(`D -> A -> B -> C`\). Program counters, thread stop flags \(e.g. in CUDA AI frameworks\), and simple sums are typical use cases.
* `memory_order_consume`\: `consume` and `release` should be used together. It can be used when threads only need to synchronize on operations that have a `carries dependency` relationship.
```cpp
std::atomic<std::string*> pstr;
int data;
// Thread 1
std::string* p = new std::string("Hello");
data = 42;
pstr.store(p, std::memory_order_release);
// Thread 2
std::string* p2;
while (!(p2 = pstr.load(std::memory_order_consume)))
;
assert(*p2 == "Hello"); // never fires: carries dependency
assert(data == 42);     // can fire: no carries dependency
```
* `memory_order_acquire`\: Establishes a `synchronizes-with` relationship. All writes before the `release` in thread 1 can be seen by all operations after the `acquire` in thread 2.
```cpp
std::atomic<bool> ready { false };
int data = 0;
std::atomic<int> var = { 0 };
// sender thread
data = 42;
var.store(100, std::memory_order_relaxed);
ready.store(true, std::memory_order_release);
// receiver thread
while (!ready.load(std::memory_order_acquire))
;
assert(data == 42); // never fires
assert(var == 100); // never fires
```
* `memory_order_release`\: Never used alone; it pairs with `acquire` or `consume` loads on the same atomic variable.
* `memory_order_acq_rel`\: Combines acquire and release semantics, typically for read-modify-write operations. TODO\: Hard to use
* `memory_order_seq_cst`\: The most strict order. Ordering is not only consistent within each thread; all threads also observe a single total order of all `seq_cst` operations.
## Time Points and Intervals
Use `std::chrono`.
### Monotonic Clock
* **System Clock**\:
Will be adjusted by time-synchronization services, and thus it might go backward.
* **Steady Clock**\:
> The steady clock is monotonic; it is never adjusted and never goes backward. This property has its cost—it is not related to wall clock time and is usually represented as the time since the last reboot.
> ...
> The steady clock should not be used for persistent timestamps that need to remain valid after reboots.
> ...
> Also, the steady clock should not be used for any time calculations involving time from different sources, such as remote systems or peripheral devices
* **High-resolution Clock**\:
`std::chrono::high_resolution_clock`
https://en.cppreference.com/w/cpp/chrono/high_resolution_clock.html
## Reduce Power Consumption
* ACPI \(Advanced Configuration and Power Interface\)
https://zh.wikipedia.org/zh-tw/%E9%AB%98%E7%BA%A7%E9%85%8D%E7%BD%AE%E4%B8%8E%E7%94%B5%E6%BA%90%E6%8E%A5%E5%8F%A3
* CPU Frequency Governors
https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt
# Book \: C++ System Programming Cookbook
*Will not include duplicated topics*
> Reference\:
> * C++ System Programming Cookbook: Practical recipes for Linux system-level programming using the latest C++ features \(Paperback\)
> https://www.tenlong.com.tw/products/9781838646554
> * Github Repository
> https://github.com/PacktPublishing/C-System-Programming-Cookbook
## Revisited C++
This chapter revisits important C++ features
:::info
:information_source: **My Other Posts**
* Books : The Art of ... Series?
https://hackmd.io/@Erebustsai/S1sguFsJkg
* Books : Optimized C++ & 21st Century C & Understanding and Using C Pointers & Effective Modern C++
https://hackmd.io/@Erebustsai/SJ6qYG7fJe
:::
### `std::filesystem`
> Reference\:
> * Filesystem library \(since C++17\)
> https://en.cppreference.com/w/cpp/filesystem.html
### The C++ Core Guidelines
> Reference\:
> * C++ Core Guidelines
> https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines
### Using `std::span`
This allows passing a plain array or a **contiguous** STL container to a function.
```cpp
void print(std::span<int> container) {
/* Do Something */
}
```
### How Ranges Work\?
> Reference\:
> * C++ 20 Ranges
> https://viml.nchc.org.tw/archive_blog_814/
To me, this gradually makes C++ not feel like C++. The performance of ranges depends entirely on how the pipeline is designed, and as programmers we need to follow that design to make the program run efficiently. The \"old\" way of writing it, however, is self-explanatory about how we optimized it and what kind of optimization is or can be applied\: writing things down step by step rather than making the code merely look easy. When it comes down to performance, this kind of code does not help.
This style is only friendly to newcomers coming from other languages; for someone who has used C++ for a long time, it is actually less intuitive. And the fact is, none of the books, tutorials, or material online treat this approach as the default or the foundation. Is it really beginner-friendly? I will not spend time on this kind of \"added\" feature, let alone change my programming habits for it.
### C++20 Modules
> Reference\:
> * 取代傳統 include 的 C++20 Modules(1/N)
> https://kheresy.wordpress.com/2025/01/07/modules-of-cpp-20/
This might be the future, but it is too early to use it practically in projects.
## Dealing with Processes and Threads
:::info
:information_source: **Process vs Thread**
> Reference\:
> * WTF is a Thread
> https://www.schneems.com/2017/10/23/wtf-is-a-thread/
> * How to choose between process and threads
> https://stackoverflow.com/a/9876890
*Although the titles of the references above mention only threads, the posts basically describe both processes and threads.*
:::
## Deep Dive into Memory Management
### `std::aligned_storage` is Deprecated
> Reference\:
> * C++23中 `std::aligned_storage` 被棄用的深度解析
> https://blog.csdn.net/weixin_61470881/article/details/144594288
> * What's the best way to have aligned storage for an object so you can do placement new \(and do explicit d'tor calls later\)?
> https://www.reddit.com/r/cpp/comments/1dree7m/whats_the_best_way_to_have_aligned_storage_for_an/
Basically just use `alignas()` to make things aligned.
## Multi-threaded Synchronization Mechanisms
My Other Post
* Books : The Art of … Series?
https://hackmd.io/@Erebustsai/S1sguFsJkg
* Books : Optimized C++ & 21st Century C & Understanding and Using C Pointers & Effective Modern C++
https://hackmd.io/@Erebustsai/SJ6qYG7fJe
## Pipes\, First-In First-Out \(FIFO\)\, Message Queues\, and Shared Memory

> Reference\: [Book](https://www.tenlong.com.tw/products/9781838646554) Page 147
:::success
:bulb: **Half-duplex, Full-duplex**
* **Half-duplex**\: Only one side can transmit at any given time.
* **Full-duplex**\: Both sides can transmit simultaneously.
:::
### Pipe
A relationship between the two processes \(e.g. parent and child\) is required so that the pipe is visible to both. The data is copied into the kernel, then copied out to the receiving process.
### FIFO \(named pipe\)
A pathname is required; the FIFO is created as a special kind of file, and any process can use it.
> FIFOs are typically used for IPC between processes on the same machine but, as it is based on files, if the file is visible by other machines, a FIFO could potentially be used for IPC between processes on different machines. Even in this case, the kernel is involved in the IPC, with data copied from kernel space to the user space of the processes.
### Message Queue
> A message queue is a linked list of messages stored in the kernel.
### Shared Memory
The fastest way to do IPC\: identified by a key, residing in kernel space, and requiring explicit synchronization.
:::info
:information_source: **Sample Codes**
> * Github Repository
> https://github.com/PacktPublishing/C-System-Programming-Cookbook
:::
## Dealing with Time Interfaces
### C++20 Calendar and TimeZone
> Reference\:
> * C++20 chrono 的日曆功能
> https://viml.nchc.org.tw/cpp20-chrono-calendar/
## Managing Signals
Signals are software interrupts that can be used to manage asynchronous events. They can be sent in many different ways, e.g. terminal input, or `htop`'s kill-with-signal feature.
### Signals

The following system call can be used\:
* `signal(SIGKILL, SIG_IGN)`\: Can ignore a signal. However, `SIGKILL` cannot be ignored.
* `signal(SIGTERM, handleSigTerm)`\: Pass a user defined function to handle a signal.
* `kill(pid, SIGTERM)`\: Send a signal to a process with `pid`.
* `raise(SIGTERM)`\: Raise signal for process itself.
## Scheduling
### Get Scheduler Policy
:::info
:information_source: **Linux System Development Related Headers**
* `sys/types.h`
https://www.ibm.com/docs/en/zos/3.1.0?topic=files-systypesh-typedef-symbols-structures
* `unistd.h` Unix Standard Header
https://zh.wikipedia.org/zh-tw/Unistd.h
* `sched.h`
https://pubs.opengroup.org/onlinepubs/7908799/xsh/sched.h.html
:::
* `SCHED_OTHER`\: The normal scheduler policy \(that is, not for real-time processes\)
* `SCHED_FIFO`\: First-in\/first-out
* `SCHED_RR`\: Round-robin
:::success
:bulb: `sched_setscheduler()`
This function requires elevated permissions.
:::
### `nice` Value
The `SCHED_OTHER` \(also called `SCHED_NORMAL`\) policy implements the Completely Fair Scheduler \(CFS\). The `nice()` system call adds the given increment to the nice value of the calling process.
:::info
:information_source: **nice value vs priority**
https://superuser.com/a/877353
TL;DR
The nice value is just a hint that resides in user space and is set by the user. The real priority is still decided by the kernel, which can ignore the nice value entirely.
:::
# Appendix
## Crash Course Study Note\: How CPUs Interact with So Many Different Devices
[Course Video](https://www.youtube.com/watch?v=tadUeiNe5-g)
* I\/O devices are slow compared to CPUs. Constantly monitoring I\/O changes wastes CPU cycles, because those changes are rare compared to how fast a CPU runs.
* An MCU is therefore used to monitor the I\/O changes and pass the detected result to the CPU.
* Most hardware has control chips that communicate with the CPU; the CPU only sends requests and receives results.
:::success
The CPU doesn't control I\/O devices. It communicates with them.
:::

> Reference\: https://youtu.be/tadUeiNe5-g?t=530
* **Memory-Mapped I\/O**
Mapping the on-device MCU memory\/registers\/RAM into the CPU's memory address space, so that reading\/writing those addresses interacts with the device. As described in [this section](https://hackmd.io/d8us7WQSQkCiY_3g4l0GyQ?view#Using-Specialized-Memory)
* **Isolated I\/O**
Requires extra hardware in the chip to map the I\/O ports.
* **Interrupt**
Devices interrupt the CPU.
* **Polling**
CPU checking on all the devices.
:::info
:information_source: **Northbridge \& Southbridge I\/O Chips**
Nowadays most of the northbridge and some of the southbridge functions are inside the CPU, and the rest is now a chipset \(e.g. X99, X299\)
:::