Try   HackMD

Q&A - Advanced Linux and C Language related

tags: 2022/09 C Q&A C Language

(2022/9/11) Collections of Q&A related to C language.
Latest update on 2022/12/27.


Table of Contents



User Space API - getcpu() / sched_getcpu()

A : from this article - NUMA Get Current Node/Core, it is recommended to use sched_getcpu() iso getcpu().

sched_getcpu() is the most stable way to get cpuid. Since, you were explicitly looking for both cpu and node id, that's why I replied with getcpu(). Actually, getcpu() don't have libc wrapper, you need to use syscalls() system call. And, this is another of reason sched_getcpu() is better than getcpu(), along with portability issues.

Below is the example to show the C programming and compiling tips.

getcpu-ex1.c

#include <stdio.h> #include <utmpx.h> #include <numa.h> int sched_getcpu(); int main(void) { int cpu = sched_getcpu(); int node = numa_node_of_cpu(cpu); printf("CPU : %d, Node : %d\n", cpu, node); return 0; }

Terminal

$ sudo apt-get install libnuma-dev $ gcc getcpu-ex1.c -o getcpu-ex1 -lnuma $ ./getcpu-ex1 CPU : 0, Node : 0

Try getcpu() - fails

In man sched_getcpu, it says the following. Let's give it a try, but not working. Seems it requires to call getcpu through syscall.

       The call
           cpu = sched_getcpu();
       is equivalent to the following getcpu(2) call:
           int c, s;
           s = getcpu(&c, NULL, NULL);
           cpu = (s == -1) ? s : c;

getcpu-ex2.c

clude <stdio.h> #include <utmpx.h> #define _GNU_SOURCE #include <sched.h> int getcpu(); int main(void) { int c, s, cpu; s = getcpu(&c, NULL, NULL); cpu = (s == -1) ? s : c; printf("CPU : %d\n", cpu); return 0; }

Terminal

$ gcc getcpu-ex2.c -o getcpu-ex2
/tmp/ccbE4ZCd.o: In function `main':
getcpu-ex2.c:(.text+0x2e): undefined reference to `getcpu'
collect2: error: ld returned 1 exit status

Try getcpu() - Success

Follow this article Linux System Call Tutorial with C to find below example which works with getcpu().

getcpu-ex3.c

#include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #include <sys/types.h> int main() { unsigned cpu, node; // Get current CPU core and NUMA node via system call // Note this has no glibc wrapper so we must call it directly syscall(SYS_getcpu, &cpu, &node, NULL); // Display information printf("This program is running on CPU core %u and NUMA node %u.\n\n", cpu, node); return 0; }

Terminal

$ gcc getcpu-ex3.c -o getcpu-ex3 $ ./getcpu-ex3 This program is running on CPU core 1 and NUMA node 0.

Further study



Memory Management : 8086 / 80286 / 80386 and onwards

This article, Chapter 2. Memory Addressing, is by far the most comprehensive article about x86 memory management I ever found.

CPU Mode Addressing Capacity-Physical Capacity-Virtual
8086 Real mode CS(16bits<<4):IP(16bits) 1MB (20bits) NA
80286 Real mode CS(16bits<<4):IP(16bits) 1MB (20bits) NA
Protected Virtual Address mode (PVAM) 32bits pointer
=
Selector (16bits) + Offset(16bits)
=>
Segment Base Address (24bits) + Offset(16bits)
16MB 1GB
80386 Real mode CS(16bits<<4):IP(16bits) 1MB (20bits) NA
Protected mode 32(16+16)/48(16+32)-bit pointer 4GB (32bits) 64TB (4GB/Segment X 16K Segments)
Virtual 8086 mode same as 8086 1MB NA

Figure below shows the format of a descriptor for the 80286 through the Pentium II. Note that each descriptor is 8 bytes in length, so the global and local descriptor tables are each a maximum of 64K bytes in length. Descriptors for the 80286 and the 80386 through the Pentium II differ slightly, but the 80286 descriptor is upward-compatible (with reserved 2 bytes). Though we can see the 'ugly' structure of descriptor in 80386, to be backward compatible with 80286.

Descriptor difference between 80286 and 80386


Memory Management : Segment Selectors and Segment Descriptors

Item CPU Description Example Components Location
Segment Register 8086 A 16-bit value of Segmentation is the process in which the main memory of the computer is logically divided into different segments and each segment has its own base address. It shifts 4 bits left, then adding Offset Registers to get physical address. CS, DS, SS, ES 16 bit Segment Registers CPU Segment Registers
Segment Selector 80286 Still a 16-bit value, but it now indexes a table of up to 16M (24bits) Segment Descriptors CS, DS, SS, ES 16 bits consistes of
1) 13-bit index value that is used to index the Segment Descriptor table;
2) 1 bit Table Indicator: This is a 1-bit flag that indicates whether the Segment Descriptor table is located in the Global Descriptor Table (GDT) or the Local Descriptor Table (LDT);
3) 2 bit Requested Privilege Level (RPL): This is a 2-bit field that specifies the privilege level of the code or data that is accessing the segment
CPU Segment Registers referring to Segment Descriptor Table in Memory, using lgdt instruction
Segment Descriptor 80286 Expanded to 24 bits for Base Address and contains additional information such as the segment size and the privilege level of the segment 24 bits
Segment Selector 80386 Still 16 bits, same as 80286, add 2 more Segment Selector Registers, FS and GS. The Segment Descriptor has been further expanded to 32 bits CS, DS, SS, ES, FS, GS 16 bits, same as 80286 Same as 80286
Segment Descriptor 80386 Has been further expanded to 32 bits Base Address 32 bits
Offset Register 8086 Store the offset through which the actual address is calculated. (CS:)IP
(DS:)BX, DI, SI
(SS:)SP, BP
(ES:)BX, DI, SI
16 bits CPU Offset Registers
Offset Register 80286 Store the offset through which the actual address is calculated. (CS:)IP
(DS:)BX, DI, SI
(SS:)SP, BP
(ES:)BX, DI, SI
16 bits CPU Offset Registers
Offset Register 80386 Store the offset through which the actual address is calculated. (CS:)EIP
(DS:)EBX, EDI, ESI
(SS:)ESP, EBP
(ES:)EBX, EDI, ESI
32 bits CPU Offset Registers

Segment Selector Format - 80286 and onwards

Segment Selector Format

Segment Descriptor Format between 80286 (total 6 bytes, 2 bytes are reserved) and 80386 (total 8 bytes)

Segment Descriptor Format

Capacity between 80286 and 80386

80286 vs 80386 Segment Descriptor

GDTR / LDTR Base and Limit

GDTR / LDTR Base and Limit

References:


Find another Hackmd x86assemlby for more info related to embedded assembly in C language.



Q: Mixed C and Assembly Programming in Embedded Systems - 3 ways of implementation : Instruction intrinsics, inline and embedded assembler

A:

Instruction intrinsics, and inline and embedded assembler are built into the compiler to enable the use of target processor features that cannot normally be accessed directly from C or C++.

Instruction intrinsics
Instruction intrinsics provide a way of easily incorporating target processor features in C and C++ source code without resorting to complex implementations in assembly language. They have the appearance of a function call in C or C++, but are replaced during compilation by assembly language instructions.

Inline assembler
The inline assembler supports interworking with C and C++. Any register operand can be an arbitrary C or C++ expression. The inline assembler also expands complex instructions and optimizes the assembly language code.

Note
The output object code might not correspond exactly to your input because of compiler optimization.

Embedded assembler
The embedded assembler enables you to use the full ARM assembler instruction set, including assembler directives. Embedded assembly code is assembled separately from the C and C++ code. A compiled object is produced that is then combined with the object from the compilation of the C and C++ source.

The following table summarizes the main differences between instruction intrinsics, inline assembler, and embedded assembler.

Table 3-1 Differences between instruction intrinsics, inline and embedded assembler

Feature Instruction Intrinsics Inline assembler Embedded assembler
Instruction set ARM and Thumb. ARM and Thumb. (a) ARM and Thumb.
ARM assembler directives None supported. None supported. All supported.
C/C++ expressions Full C/C++ expressions. Full C/C++ expressions. Constant expressions only.
Optimization of assembly code Full optimization. Full optimization. No optimization.
Inlining Automatically inlined. Automatically inlined. Can be inlined by linker if it is the right size and linker inlining is enabled.
Register access Physical registers, including PC, LR and SP. Virtual registers except PC, LR and SP. Physical registers, including PC, LR and SP.
Return instructions Generated automatically. Generated automatically. BX, BXJ, and BLX instructions are not supported. You must add them in your code.
BKPT instruction Supported. Not supported. Supported.

(a) The inline assembler supports Thumb instructions in ARMv6T2, ARMv6-M, and ARMv7.


Q: Mixed C and Assembly Programming in Embedded Systems - ARM describes how to write a mixture of C, C++, and assembly language code for the ARM architecture

A:


Q: Mixed C and Assembly Programming in Embedded Systems - using in-line assembly for different CPU architectures, and providing examples in more generic form

A:

Inline Assembly
One of the most common methods for using assembly code fragments in a C programming project is to use a technique called inline assembly. Inline assembly is invoked in different compilers in different ways. Also, the assembly language syntax used in the inline assembly depends entirely on the assembly engine used by the C compiler. Microsoft C++, for instance, only accepts inline assembly commands in MASM syntax, while GNU GCC only accepts inline assembly in GAS syntax (also known as AT&T syntax).

ARM : Can refer to Main page: Embedded Systems/ARM Microprocessors
Practically everyone using ARM processors uses the standard calling convention. This makes mixed C and ARM assembly programming fairly easy, compared to other processors. The simplest entry and exit sequence for Thumb functions is:

an_example_subroutine:
    PUSH {save-registers, lr} ; one-line entry sequence
    ; ... first part of function ...
    BL thumb_sub 	;Must be in a space of +/- 4 MB 
    ; ... rest of function goes here, perhaps including other function calls
    ; somehow get the return value in a1 (r0) before returning
    POP {save-registers, pc} ; one-line return sequence

The standard C calling convention for ARM is specified in detail by ARM PLC in "Procedure Call Standard for the ARM Architecture".

The simplest entry and exit sequence for 32-bit ARM functions is very similar to Thumb functions:

an_example_ARM32_subroutine:
    PUSH {r4-r11, lr} ; one-line function prologue
    ; ... first part of function ...
    BL subroutine_name 	;Must be in a space of +/- 4 MB 
    ; ... rest of function goes here, perhaps including other function calls
    ; ...
    POP {r4-r11, pc} ; one-line exit sequence (function epilogue)

ARM GCC Inline Assembler Cookbook is a good small article to read through before doing Inline Assembly. It starts with a simple example

/* NOP example */
asm("mov r0,r0");

More than one assembler instruction in a single inline asm statement.

asm(
"mov     r0, r0\n\t"
"mov     r0, r0\n\t"
"mov     r0, r0\n\t"
"mov     r0, r0"
);

So far, the assembler instructions are much the same as they'd appear in pure assembly language programs. However, registers and constants are specified in a different way, if they refer to C expressions. The general form of an inline assembler statement is

asm(code : output operand list : input operand list : clobber list);

Also talk about the solution is to add the volatile attribute to the asm statement to instruct the compiler to exclude your assembler code from code optimization. Remember, that you have been warned to use the initial example. Here is the revised version:

/* NOP example, revised */
asm volatile("mov r0, r0");

Reuse your assembler language parts by defining them as macros and put them into include files. Using such include files may produce compiler warnings, if they are used in modules, which are compiled in strict ANSI mode. To avoid that, you can write asm instead of asm and volatile instead of volatile. These are equivalent aliases. Here is a macro which will convert a long value from little endian to big endian or vice versa:

#define BYTESWAP(val) \
    __asm__ __volatile__ ( \
        "eor     r3, %1, %1, ror #16\n\t" \
        "bic     r3, r3, #0x00FF0000\n\t" \
        "mov     %0, %1, ror #8\n\t" \
        "eor     %0, %0, r3, lsr #8" \
        : "=r" (val) \
        : "0"(val) \
        : "r3", "cc" \
    );

Macro definitions will include the same assembler code whenever they are referenced. This may not be acceptable for larger routines. In this case you may define a C stub function. Here is the byte swap procedure again, this time implemented as a C function.

unsigned long ByteSwap(unsigned long val)
{
asm volatile (
        "eor     r3, %1, %1, ror #16\n\t"
        "bic     r3, r3, #0x00FF0000\n\t"
        "mov     %0, %1, ror #8\n\t"
        "eor     %0, %0, r3, lsr #8"
        : "=r" (val)
        : "0"(val)
        : "r3"
);
return val;
}

Q: Mixed C and Assembly Programming ARM Cortex-M MCU in-line assembly example Youtube

This Youtube - Lecture 32. Mixing C and Assembly with ARM Cortext-M MCU provides clear explanation with some examples.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →


References


Previous article - Q&A Linux
Next article - Q&A AI
back to marconi's blog