Solutions
A
You are required to write C programs that convert between different floating-point formats. One of these formats is the half-precision floating-point, also known as FP16
or float16
, which is a compact representation designed to save memory. Half-precision numbers are increasingly used in fields like deep learning and image processing. Unlike the single-precision floating-point format (float32
), defined by IEEE 754, FP16 has fewer bits allocated for the exponent.
The functions for converting between FP16 and float32 (FP32) are provided below.
Consider the fp16_to_fp32
function, which converts an FP16 value to a single-precision IEEE floating-point (float32) format.
Next, complete the fp32_to_fp16
function, which is responsible for converting a float32 value to FP16. The identifiers starting with #A
indicate the sections where you need to provide the appropriate answers.
The values for #A01
, #A02
, and #A03
must be expressed as hexadecimal integer literals in their shortest form. For example, use 0x7C00
rather than 0x007C00
.
A01 = 0x7800000
A02 = 0xFFF
A03 = 0x7E00
B
The bfloat16 format is a 16-bit floating-point representation, designed to provide a wide dynamic range by using a floating radix point. It is a shortened version of the 32-bit IEEE 754 single-precision format (binary32), aimed at accelerating machine learning.
The structure of the bfloat16 floating-point format is as follows.
Since bfloat16 (bf16) shares the same number of exponent bits as a 32-bit float, encoding and decoding values is relatively simple.
You are required to implement the fp32_to_bf16
function, which converts a float32 to bfloat16. This conversion should be binary identical to the BFloat16 format. Floats must round to the nearest even value, and NaNs should be quiet. Subnormal values should not be flushed to zero, unless explicitly needed.
B01 = 0x7f800000
B02 = 0x7fff
B03 = 0x10
B04 = 0x10
C
In Problem A, the fabsf
function was used within the fp32_to_fp16
function. Now, you are required to implement your own version of fabsf
.
C01 = 0x7FFFFFFF
Consider the C implementation of __builtin_clz
:
The __builtin_clz
function is a GCC built-in function that counts the number of leading zero bits in the binary representation of an unsigned integer, starting from the most significant bit (MSB). It returns an integer representing how many zero bits precede the first 1
in the binary representation of the number.
For example, given the number x
, __builtin_clz(x)
counts how many 0
s are at the beginning of the binary form of x
.
Examples of __builtin_clz
x = 16
00000000 00000000 00000000 00010000
27
(as there are 27 zeros before the first 1
).__builtin_clz(16)
returns 27
.x = 1
00000000 00000000 00000000 00000001
31
.__builtin_clz(1)
returns 31
.x = 255
00000000 00000000 00000000 11111111
24
.__builtin_clz(255)
returns 24
.x = 0xFFFFFFFF
(which is 4294967295
in decimal)
11111111 11111111 11111111 11111111
__builtin_clz(0xFFFFFFFF)
returns 0
.x = 0
(undefined behavior)
00000000 00000000 00000000 00000000
x = 0
, the result is undefined according to the GCC manual.Next, you will implement the fp16_to_fp32
function, which converts a 16-bit floating-point number in IEEE half-precision format (bit representation) to a 32-bit floating-point number in IEEE single-precision format (bit representation). This implementation will avoid using any floating-point arithmetic and will utilize the clz
function discussed earlier.
C02 = 0x70
C03 = 0x17
The C function is_aligned determines whether a given pointer ptr is aligned to the specified alignment. It returns 1 if the pointer is properly aligned and 0 otherwise.
If the alignment is guaranteed to be a power of 2, the modulo operation can be optimized using a bitwise-AND operation.
You are expected to write a function align_ptr
that aligns a pointer to the next address boundary based on the specified alignment. This function adjusts the given pointer to the next memory address that is a multiple of the provided alignment (which must be a power of 2, such as 1, 2, 4, or 8). If the pointer is already aligned, the original pointer will be returned. The alignment operation is performed using bitwise arithmetic without any conditional checks, making the function efficient.
D01 = ~alignment
Assuming that uintptr_t
represents the native word size of the platform (which is typically 4 or 8 bytes), you are required to implement your own version of memcpy
that prevents word-sized memory accesses at unaligned addresses, thus avoiding potential performance penalties or misaligned access problems. This implementation is more efficient than byte-by-byte copying, especially for larger memory blocks, as it utilizes word-sized memory transfers where possible.
D02 = *d++ = *s++
D03 = *d_word++ = *s_word++
D04 = n > 0
OR n
OR n != 0
D05 = *d++ = *s++
E
Examine a C program that includes multiple memory bugs, categorized as shown in the following table.
ID | Memory bug |
---|---|
A | Dangling Pointer |
B | Mismatched Memory Management Functions |
C | Use-After-Free |
D | Invalid Free |
E | Memory Leak |
F | Out-of-Bounds Access |
G | Double Free |
Now, you need to identify the causes of the memory bugs by selecting from the provided table and proposing fixes. Fields #E01
, #E02
, #E03
, #E04
, #E05
, and #E06
must be filled in the format "ID:code", where ID corresponds to the item listed in the table. For example, 'A' stands for 'Dangling pointer,' and 'code' represents your proposed change. If the code is empty, it means the suggested action is to remove the indicated line. For instance, 'D:' indicates the memory bug is classified as 'invalid free,' and the corresponding line should be removed. Note that no spaces are allowed in #E01
, #E02
, #E03
, #E04
, #E05
, and #E06
.
E01 = E:free(data);
E02 = F:for(i=0;i<5;i++)
E03 = G:
E04 = A:
E05 = D:
E06 = B:free(data);
F
You are required to write a C function that allocates nodes for a singly linked list with a specified number of nodes. The function dynamically allocates memory for the list, properly linking each node’s next pointer. The data field of each node remains uninitialized. It returns a pointer to the head of the newly created linked list, or NULL
if the length is 0. The caller is responsible for freeing the allocated memory.
F01 = &node->next
OR &(node->next)