---
tags: SDE-good-read
topic: Optimized Strlen (cont.)
---
# Optimized Strlen (cont.)
## 1. uClibc vs glibc
[uClibc-ng](https://uclibc-ng.org/) -> https://git.uclibc-ng.org/git/uclibc-ng.git

uClibc-ng is a small C library for developing embedded Linux systems. It is much smaller than the GNU C Library, but nearly all applications supported by glibc also work perfectly with uClibc-ng.
Inside `strlen()`, the test that detects a zero byte in a `long`-sized word differs between glibc and uClibc.
[See Here](https://hackmd.io/@YLowy/HkQ68pjN5)
```c=
// uClibc
if (((longword - lomagic) & himagic) != 0)
```
vs
```c=
// glibc
if (((longword - lomagic) & ~longword & himagic) != 0)
```
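To make the two conditions easier to experiment with, here is a minimal sketch that wraps each one in a function. The names `maybe_zero_uclibc`, `maybe_zero_glibc`, `kLoMagic` and `kHiMagic` are made up for this note, and a 64-bit word is assumed.

```cpp
#include <cstdint>

// Byte-replicated constants used by both tests (64-bit words assumed).
constexpr std::uint64_t kLoMagic = 0x0101010101010101ULL;
constexpr std::uint64_t kHiMagic = 0x8080808080808080ULL;

// uClibc-style test: can also fire for a word that has no zero byte
// but contains a byte of 0x81 or above.
constexpr bool maybe_zero_uclibc(std::uint64_t longword) {
    return ((longword - kLoMagic) & kHiMagic) != 0;
}

// glibc-style test: the extra "& ~longword" drops bytes whose top bit
// was already set before the subtraction.
constexpr bool maybe_zero_glibc(std::uint64_t longword) {
    return ((longword - kLoMagic) & ~longword & kHiMagic) != 0;
}
```

Both functions agree on plain ASCII words and on words that contain a zero byte. They disagree only when a byte of 0x81 or above is present and no byte is zero.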
### Compare: what differs between the two libraries?
#### CASE 1: No zero byte in a long-sized word
We walk through the magic constants in `strlen()` step by step, using the uClibc test.
```c=
// uClibc
if (((longword - lomagic) & himagic) != 0)
```
##### 1. longword
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x63|0x63|0x63|0x63|0x63|0x63|0x63|0x63"];
}
```
##### 2. longword - lomagic(0x0101010101010101)
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x62|0x62|0x62|0x62|0x62|0x62|0x62|0x62"];
}
```
##### 3. (longword - lomagic) & himagic (0x8080808080808080)
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x62|0x62|0x62|0x62|0x62|0x62|0x62|0x62"];
}
```
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x80|0x80|0x80|0x80|0x80|0x80|0x80|0x80"];
}
```
---
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x00|0x00|0x00|0x00|0x00|0x00|0x00|0x00"];
}
```
```c=
//Result:
((longword - lomagic) & himagic) == 0
```
#### CASE 2: A zero byte in a long-sized word
##### 1. longword
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x63|0x00|0x63|0x63|0x63|0x63|0x63|0x63"];
}
```
##### 2. longword - lomagic(0x0101010101010101)
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x61|0xFF|0x62|0x62|0x62|0x62|0x62|0x62"];
}
```
##### 3. (longword - lomagic) & himagic (0x8080808080808080)
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x61|0xFF|0x62|0x62|0x62|0x62|0x62|0x62"];
}
```
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x80|0x80|0x80|0x80|0x80|0x80|0x80|0x80"];
}
```
---
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x00|0x80|0x00|0x00|0x00|0x00|0x00|0x00"];
}
```
```c=
//Result:
((longword - lomagic) & himagic) != 0
```
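The two cases can be replayed in code. The sketch below returns the intermediate values of steps 2 and 3 for a 64-bit word, reading the diagrams with the most significant byte on the left. `Steps` and `trace` are hypothetical names used only here.

```cpp
#include <cstdint>

constexpr std::uint64_t kLoMagic = 0x0101010101010101ULL;
constexpr std::uint64_t kHiMagic = 0x8080808080808080ULL;

// Intermediate values of the uClibc-style test for one 64-bit word.
struct Steps {
    std::uint64_t minus_lo;  // step 2: longword - lomagic
    std::uint64_t masked;    // step 3: (longword - lomagic) & himagic
};

constexpr Steps trace(std::uint64_t longword) {
    return Steps{longword - kLoMagic, (longword - kLoMagic) & kHiMagic};
}
```

For CASE 2, `trace(0x6300636363636363ULL)` gives `minus_lo == 0x61FF626262626262`: the zero byte wraps around to 0xFF and borrows from the byte to its left, which is why that neighbor becomes 0x61 instead of 0x62. After masking, only the 0x80 bit of the former zero byte survives.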
#### Difference between uClibc and glibc
Is `& ~longword` necessary for `strlen()`? What does it do?
Consider the situation below:
```c=
char myString[] = "AMAZON SDE READ";
```
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="A|M|A|Z|O|N| |S"];
node1 [fontsize=13, label ="D|E| |R|E|A|D|-"];
}
```
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x41|0x4D|0x41|0x5A|0x4F|0x4E|0x20|0x53"];
node1 [fontsize=13, label ="0x44|0x45|0x20|0x52|0x45|0x41|0x44|0x00"];
}
```
Now suppose I decide to change the "O" in AMAZON to "A".
```c=
myString[4] = 'A';
```
However, I accidentally type the wrong value, and now the string contains a non-ASCII character. (Strictly speaking, 0xF1 is not a standard 7-bit ASCII character; it falls in the extended ASCII range.)
```c=
myString[4] = 0xF1;
```
So now we will get a strange string as below.
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="A|M|A|Z|?|N| |S"];
node1 [fontsize=13, label ="D|E| |R|E|A|D|-"];
}
```
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x41|0x4D|0x41|0x5A|0xF1|0x4E|0x20|0x53"];
node1 [fontsize=13, label ="0x44|0x45|0x20|0x52|0x45|0x41|0x44|0x00"];
}
```
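In this case, the word-level test behaves differently in uClibc's and glibc's `strlen()`, even though both return the same length.
##### uClibc
1. Find the start point (advance byte by byte to an aligned address).
2. Loop over `long`-sized words to find one that may contain a zero byte.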
##### 1. longword
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x41|0x4D|0x41|0x5A|0xF1|0x4E|0x20|0x53"];
}
```
##### 2. longword - lomagic(0x0101010101010101)
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x40|0x4C|0x40|0x59|0xF0|0x4D|0x1F|0x52"];
}
```
##### 3. (longword - lomagic) & himagic (0x8080808080808080)
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x40|0x4C|0x40|0x59|0xF0|0x4D|0x1F|0x52"];
}
```
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x80|0x80|0x80|0x80|0x80|0x80|0x80|0x80"];
}
```
---
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="0x00|0x00|0x00|0x00|0x80|0x00|0x00|0x00"];
}
```
The uClibc test evaluates to true here even though the word contains no zero byte, so uClibc's `strlen()` falls back to checking this word byte by byte. glibc's `strlen()` skips that check thanks to `& ~longword`: any byte whose highest bit was already set in `longword` is masked out, so the test fires only when the word really contains a zero byte.
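
The same experiment can be run on the example string. This is a minimal sketch, assuming 64-bit words; `load_word`, `first_word_trips_uclibc`, `first_word_trips_glibc`, `Example` and `make_example` are names invented for this note.

```cpp
#include <cstdint>
#include <cstring>

constexpr std::uint64_t kLoMagic = 0x0101010101010101ULL;
constexpr std::uint64_t kHiMagic = 0x8080808080808080ULL;

// Load 8 bytes as one word. memcpy avoids alignment and aliasing problems.
inline std::uint64_t load_word(const char* p) {
    std::uint64_t w;
    std::memcpy(&w, p, sizeof w);
    return w;
}

// True if the first 8 bytes trip the uClibc-style test.
inline bool first_word_trips_uclibc(const char* s) {
    return ((load_word(s) - kLoMagic) & kHiMagic) != 0;
}

// True if the first 8 bytes trip the glibc-style test.
inline bool first_word_trips_glibc(const char* s) {
    const std::uint64_t w = load_word(s);
    return ((w - kLoMagic) & ~w & kHiMagic) != 0;
}

// The 16-byte example buffer, optionally with myString[4] = 0xF1.
struct Example {
    char bytes[16];
};

inline Example make_example(bool corrupt) {
    Example e;
    std::memcpy(e.bytes, "AMAZON SDE READ", 16);  // 15 characters + '\0'
    if (corrupt) e.bytes[4] = static_cast<char>(0xF1);
    return e;
}
```

With `corrupt == true`, the first word trips the uClibc-style test but not the glibc-style one. Because the first word has no zero byte, no borrow occurs, so the outcome does not depend on byte order.
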
1. Plain ASCII strings
Without `& ~longword`, uClibc performs one fewer operation per word, so it can be slightly faster.

2. Extended ASCII strings
However, if the string is full of extended ASCII bytes, uClibc's test fires on every word that contains a byte of 0x81 or above, and each hit triggers the byte-by-byte check. This hurts performance.
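
To put a number on that trade-off, the sketch below scans a buffer word by word with either test and counts how often it has to fall back to the byte-by-byte check. It is a simplified model, not the code of either library: `scan` and `ScanResult` are hypothetical, and the buffer size is assumed to be a multiple of 8 so that no read goes past the end.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::uint64_t kLoMagic = 0x0101010101010101ULL;
constexpr std::uint64_t kHiMagic = 0x8080808080808080ULL;

struct ScanResult {
    std::size_t length;        // index of the first '\0', or `size` if none
    std::size_t byte_rescans;  // words that were re-checked byte by byte
};

// `size` must be a multiple of 8 (assumption of this sketch).
// use_not_mask == true selects the glibc-style test, false the uClibc-style one.
inline ScanResult scan(const char* buf, std::size_t size, bool use_not_mask) {
    ScanResult r{size, 0};
    for (std::size_t i = 0; i + 8 <= size; i += 8) {
        std::uint64_t w;
        std::memcpy(&w, buf + i, sizeof w);
        std::uint64_t hit = (w - kLoMagic) & kHiMagic;
        if (use_not_mask) hit &= ~w;
        if (hit == 0) continue;  // fast path: no zero byte in this word
        ++r.byte_rescans;        // slow path: look at each byte
        for (std::size_t j = 0; j < 8; ++j) {
            if (buf[i + j] == '\0') {
                r.length = i + j;
                return r;
            }
        }
    }
    return r;
}
```

On the 16-byte example buffer with `myString[4] = 0xF1`, both variants report a length of 15, but the uClibc-style test needs two byte-wise rescans while the glibc-style test needs one.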

## 2. C++ strlen
In C++, we have two types of strings:
1. C-style strings
2. `std::string` (the C++ Standard Library string class)
### How to use C-style strings
Use them in C++ code by including the `<cstring>` header.
```cpp=
#include <cstring>
#include <iostream>

int main() {
    char str[] = "This is a C-style string";
    std::cout << str << "\n";
    // std::strlen scans for the terminating '\0', so it is O(n).
    std::cout << "string's size: " << std::strlen(str) << "\n";
}
```
```
cheyenyu@u49049006de455c:~/Desktop/SDEGoodRead$ g++ -o outcpp strlentest.cpp
cheyenyu@u49049006de455c:~/Desktop/SDEGoodRead$ ./outcpp
This is a C-style string
string's size: 24
```
### How to use std::strings
C-style strings are relatively unsafe: if the terminating 0x00 is missing, `strlen()` keeps reading past the end of the buffer, which can lead to a whole host of bugs.
The `std::string` class provided by the C++ Standard Library is a much safer alternative.
```cpp=
#include <iostream>
#include <string>

int main() {
    std::string str = "This is a C++ string class";
    std::cout << str << "\n";
    std::cout << "string's size: " << str.length() << "\n";
}
```
```
cheyenyu@u49049006de455c:~/Desktop/SDEGoodRead$ g++ -o outcpp strlentest.cpp
cheyenyu@u49049006de455c:~/Desktop/SDEGoodRead$ ./outcpp
This is a C++ string class
string's size: 26
```
A `std::string` object simply returns its private length member, so `length()` is O(1). In libstdc++:
```cpp=
///  Returns the number of characters in the string, not including any
///  null-termination.
size_type
length() const _GLIBCXX_NOEXCEPT
{ return _M_string_length; }
```
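One visible consequence of storing the length is that `length()` and `strlen()` can disagree when the string contains a `'\0'` in the middle. A small sketch follows; the helper names are made up for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length found by scanning for the first '\0' (what strlen does, O(n)).
inline std::size_t scanned_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Length stored inside the object (what length()/size() return, O(1)).
inline std::size_t stored_length(const std::string& s) {
    return s.length();
}

// A string with a '\0' in the middle: 'a', 'b', '\0', 'c', 'd'.
inline std::string with_embedded_nul() {
    return std::string("ab\0cd", 5);
}
```

For `with_embedded_nul()`, the stored length is 5 while the scanned length is 2. For ordinary text the two agree.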


---
https://www.youtube.com/watch?v=kPR8h4-qZdk&t=53s&ab_channel=CppCon
### std::string

SSO (Small String Optimization) & CoW (Copy on Write)
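```cpp=
class string {
    char *start;   // points at data.buffer (small) or at heap memory (large)
    size_t size;
    static const int kLocalSize = 15;
    union {
        char buffer[kLocalSize + 1];  // small string stored in place
        size_t capacity;              // heap capacity when the string is large
    } data;
};
```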
**Small String**
```graphviz
digraph G{
node [shape = record];
node0 [fontsize=13, label ="{{-0-|-1-|-2-|-3-|-4-|-5-|-6-|-7-}|char *start|size_t size|{S|D|E|G|O|O|D|R}|{E|A|D|/0|X|X|X|X}}"];
}
```
**Large String**
```graphviz
digraph G{
node [shape = record];
A [fontsize=13, label ="{{-0-|-1-|-2-|-3-|-4-|-5-|-6-|-7-}|<A1>char *start |size_t size|size_t capacity |{unused}}"];
B [fontsize=13, label ="{12|size}|{30|capacity}|{ X|refcnt}|{<B1>SDEGOODREAD|string in heap}|{|}"]
A:A1->B:B1
}
```
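The small/large split can be observed from the outside by checking whether the character data lives inside the string object itself. This is only a sketch: `stored_inline` is a hypothetical helper, and the result depends on the standard library in use (it assumes an SSO implementation such as current libstdc++, libc++ or MSVC; a CoW implementation would report `false` in both cases).

```cpp
#include <cstdint>
#include <string>

// True if the character data lives inside the std::string object itself,
// i.e. the string is using its small (inline) buffer.
inline bool stored_inline(const std::string& s) {
    const auto obj = reinterpret_cast<std::uintptr_t>(&s);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj && data < obj + sizeof(std::string);
}
```

With an SSO implementation, `stored_inline(std::string("SDEGOODREAD"))` is expected to be true, while a 100-character string is expected to live on the heap.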
### folly::fbstring
Source: `folly/FBString.h`
```cpp=
struct RefCounted {
std::atomic<size_t> refCount_;
Char data_[1];
static RefCounted * create(size_t * size);
static RefCounted * create(const Char * data, size_t * size);
static void incrementRefs(Char * p);
static void decrementRefs(Char * p);
};
struct MediumLarge {
Char* data_;
size_t size_;
size_t capacity_;
size_t capacity() const {
return kIsLittleEndian ? capacity_ & capacityExtractMask : capacity_ >> 2;
}
void setCapacity(size_t cap, Category cat) {
capacity_ = kIsLittleEndian
? cap | (static_cast<size_t>(cat) << kCategoryShift)
: (cap << 2) | static_cast<size_t>(cat);
}
};
union {
uint8_t bytes_[sizeof(MediumLarge)]; // For accessing the last byte.
Char small_[sizeof(MediumLarge) / sizeof(Char)];
MediumLarge ml_;
};
```
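The little-endian branch of `capacity()` and `setCapacity()` packs a category tag into the most significant byte of `capacity_`, which is also the last byte of the whole object. The sketch below illustrates that packing. It is not folly's code: the tag values 0x80 and 0x40, the mask value and the helper names are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Category tags in the top two bits of the last byte
// (values assumed for a little-endian layout).
enum class Category : std::uint8_t {
    kSmall = 0x00,
    kMedium = 0x80,
    kLarge = 0x40,
};

constexpr std::size_t kCategoryShift = (sizeof(std::size_t) - 1) * 8;
constexpr std::size_t kCategoryMask =
    static_cast<std::size_t>(0xC0) << kCategoryShift;
constexpr std::size_t kCapacityMask = ~kCategoryMask;

// Pack: capacity in the low bits, category in the top two bits.
constexpr std::size_t pack(std::size_t cap, Category cat) {
    return cap | (static_cast<std::size_t>(cat) << kCategoryShift);
}

constexpr std::size_t unpack_capacity(std::size_t packed) {
    return packed & kCapacityMask;
}

constexpr Category unpack_category(std::size_t packed) {
    return static_cast<Category>((packed & kCategoryMask) >> kCategoryShift);
}
```

Packing costs two bits of capacity range, but it lets the string find out which of the three layouts it is in by reading a single byte.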
**Small String (up to 23)**
```graphviz
digraph G{
node [shape = record];
A [fontsize=13, label ="{{-0-|-1-|-2-|-3-|-4-|-5-|-6-|-7-}|{S|D|E|G|O|O|D|R}|{E|A|D|/0|-|-|-|-}|{-|-|-|-|-|-|-|<star>*}}"];
B [fontsize=13, label ="{{<B1>-0-|-1-|-2-|-3-|-4-|-5-|-6-|<B3>-7-}|{0|<B2>0|-|s|i|z|<B4>e|-}}"]
C [fontsize=13, label ="00 = small string"]
D [fontsize=13, label ="size = 23 - strlen"]
A:star -> B:B1
A:star -> B:B3
B:B2->C
B:B4->D
}
```
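The `size = 23 - strlen` trick in the diagram can be sketched as follows. This is a simplified stand-in rather than folly's implementation: `SmallString` and `kMaxSmall` are hypothetical names, and the category bits are ignored (for a small string they are 00).

```cpp
#include <cstddef>
#include <cstring>

constexpr std::size_t kMaxSmall = 23;

// Simplified stand-in for the 24-byte small-string layout.
struct SmallString {
    char bytes[kMaxSmall + 1];  // up to 23 characters + 1 bookkeeping byte

    // Store `len` characters (len <= 23) and record the spare capacity.
    void assign(const char* s, std::size_t len) {
        std::memcpy(bytes, s, len);
        bytes[len] = '\0';
        // Last byte = 23 - len. When len == 23 this value is 0, so the
        // same byte also serves as the terminator.
        bytes[kMaxSmall] = static_cast<char>(kMaxSmall - len);
    }

    std::size_t size() const {
        return kMaxSmall - static_cast<std::size_t>(bytes[kMaxSmall]);
    }
};
```

Storing the remaining space instead of the length is what lets a full 23-character string fit: the bookkeeping byte becomes 0 exactly when the terminator is needed in that position.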
**Medium String (24-255)**
```graphviz
digraph G{
node [shape = record];
A [fontsize=13, label ="{{0|1|2|3|4|5|6|7}|<A1>char* data_|size_t size_|size_t capacity_}"];
B [fontsize=13, label ="{<B1>SDEGOODREAD|string in heap}"]
A:A1 -> B:B1
}
```
**Large String (over 255)**
```graphviz
digraph G{
node [shape = record];
A [fontsize=13, label ="{{0|1|2|3|4|5|6|7}|<A1>char* data_|size_t size_|size_t capacity_}"];
B [fontsize=13, label ="{12|size}|{30|capacity}|{ X|refcnt}|{<B1>SDEGOODREAD|string in heap}|{|}"]
A:A1 -> B:B1
}
```
## 3. Try it: ELF -> string lib
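```c=
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *str = "SDE Good Read";
    /* Use the result so the compiler has a reason to keep the strlen() call. */
    printf("%zu\n", strlen(str));
    return 0;
}
```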
```
$ gcc -g -static -o sl2 sl2.c
$ objdump -d -M intel -S sl2
```
## References
https://www.lookuptables.com/

