We often use the strlen(str)
function to get the length of the string. This is broad-use, and simply implemented. In C, we cannot consider a string as an object. We need to find the '\0'(NULL) character to determine the end of the string.
C11 6.4.5
6. a byte or code of value zero is appended to each multibyte
character sequence that results from a string literal or literals.
C11 7.1.1
A string is a contiguous sequence of characters terminated by and including the first null
character. The term multibyte string is sometimes used instead to emphasize special
processing given to multibyte characters contained in the string or to avoid confusion
with a wide string. A pointer to a string is a pointer to its initial (lowest addressed)
character. The length of a string is the number of bytes preceding the null character and
the value of a string is the sequence of the values of the contained characters, in order.
Briefly, strlen() is the finding first 0 function.
First, we can implement the following in accordance with the behavior defined by the language:
It is a very easy implement - compare every character to find NULL character. But we can find out something - the function always comparing one character at one time. This can obviously be optimized!
Depending on the computer architecture, we can operate on data of a long size. Taking a computer with 64-bit architecture for example, 8 characters can be compared at a time, and the performance can be increased up to 8 times for longer strings.
If the string length is less than one long size (or smaller, correctly to say, to the address that matches the bytes address), the design does not improve its performance. We have to think about how to handle very short strings. In addition, we also need to find the aligned pointer.
If a long string is given, we will need to find the first '\0' in the later data. The later data should be spilted into many one-long-size block (sub-string), which can be used directly by memory.
After finding NULL charater, we can calculate the length by comparing with start pointer.
GNOME - GNU Network Object Model Environment
Glib is one of the bsp library for GNOME, but GNU plan fail.
Glib has good cross-platform features, and can deployed on different OS.
GNU Liberary provides this optimized implementation for strlen(), which will be described in detail in this article.
1. Small string / Find start-pointer
The first goal would be to find a aligned pointer on a longword boundary. If we find a NULL in here, return back.
In this for-loop for ((unsigned long int) char_ptr & (sizeof (longword) - 1)) != 0
we can find the padding address. The address will be the start-address for next stage.
Consider two string condition.
The pointer &str[1]
can not be used for the long-byte operator.
2. Check the block (sub-string)
So here is the question. How to find the null character in a long size data?
Determine if a word has a byte less than n
Test if a word x contains an unsigned byte with value < n. Specifically for n=1, it can be used to find a 0-byte by examining one long at a time, or any byte by XORing x with a mask first. Uses 4 arithmetic/logical operations when n is constant.
Requirements: x>=0; 0<=n<=128
#define hasless(x,n) (((x)-~0UL/255*(n))&~(x)&~0UL/255*128)
To count the number of bytes in x that are less than n in 7 operations, use
#define countless(x,n) \
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)
Determine if a word has a zero byte
The subexpression (v - 0x01010101UL), evaluates to a high bit set in any byte whenever the corresponding byte in v is zero or greater than 0x80. The sub-expression ~v & 0x80808080UL evaluates to high bits set in bytes where the byte of v doesn't have its high bit set (so the byte was less than 0x80). Finally, by ANDing these two sub-expressions the result is the high bits set where the bytes in v were zero, since the high bits set due to a value greater than 0x80 in the first sub-expression are masked off by the second.
We can notice how to build a mask in different platform by an elegant method in source code.
According to this implementation, we can speculate that there would be an 8x improvment performance in 64-bits architecture computer.
[holes](Printable characters
Codes 20hex to 7Ehex, known as the printable characters, represent letters, digits, punctuation marks, and a few miscellaneous symbols. There are 95 printable characters in total.[m] )
As a software engineer, what's the most ingenious line of code you've seen?
Test:
string size = 100000
run 10000 times and calculate the avg-time
Branch-Test
In this experiment, we can find that glib-preformance will have the best performance in -O1 option. In addition, string.h- strlen() is hardly affected by GCC-optimitize.
So what is the reason that -O2 amd -O3 can't achieve better performance in glib - strlen case?
//ToDo
What is strlen@PLT?
PTL(procedure linkage table) makes dynamic loading and linking easier to use. How is glibc loaded at runtime?
This should be the implementation on the hardware. Maybe we should test the strlen() in different platform.
https://stackoverflow.com/questions/1733281/where-is-the-implementation-of-strlen-in-gcc
//ToDo: Compare ARM vs Intel architecture
strlen
is an indeterminate function with time complexity O(n). In this section we will show what we can do or learn in glib-strlen implementation.
1. Do/Don't
Test:
2. upper string
ASCII :
A = 0x41 = 0b 0100 0000
a = 0x61 = 0b 0110 0000
…
Z = 0x5a = 0b 0101 1010
z = 0x7a = 0b 0111 1010
A single upper character can be transfered to a lower character by "or-ing a mask" (| 0b01000000). In other words, "and-ing a mask"(& 0b11011111) can let a lower character become a upper character.
Test upper a 100000 size string.
Output:
strupr()
and strlwr()
are not standard function, these functions are sometimes shipped with the implementation of the compiler on some platforms but generally it is recomemded to write your own custom functions to conver the string to upper and/or lower case.It is implementation-defined whether they are supported or not.
As a software engineer, what's the most ingenious line of code you've seen?
GCC-buildin
(cont.)