---
title: ByteWise - Declaring a String
tags: bytewise, c, string, string literal
---

# Declaring a String

Let me start with some code. This is a classic LeetCode problem, [20. Valid Parentheses](https://leetcode.com/problems/valid-parentheses/). The logic of this solution is sound: it reuses the input string's own memory as the stack for matching the closing parentheses. But that is not what we will focus on in this article. Take some time and see if you can find the problem.

:::spoiler :speech_balloon: **Hint**
It crashes with a **`SEGFAULT`**, and `GDB` tells us it happens at **line 7**.
:::

```c=
bool isValid(char *s) {
    char *reuse_ptr = s;                 // top of the stack, stored inside s itself

    for (char *iter=s; *iter; iter++) {
        switch (*iter) {
            case '(':
                *reuse_ptr = ')';        // push the expected closer by writing into s
                reuse_ptr++;
                continue;
            case '[':
                *reuse_ptr = ']';
                reuse_ptr++;
                continue;
            case '{':
                *reuse_ptr = '}';
                reuse_ptr++;
                continue;
            default: {
                if (0 == reuse_ptr-s) return false;   // nothing left to match
                char expected = *--reuse_ptr;         // pop
                if (*iter != expected) return false;
            }
        }
    }
    return 0 == reuse_ptr-s;             // valid only if the stack is empty
}

int main(void) {
    char* test_case = "(){[]}";
    assert(isValid(test_case));
    return 0;
}
```

Apparently, the **`SEGFAULT`** goes away when I change the **pointer notation** to the **array subscripting notation** when declaring the string. ***But why*** :question:

::: warning
:poop: Always look into the first-hand material before doing trial-and-error experiments.
:::

```diff
 int main(void) {
-    char* test_case = "(){[]}";
+    char test_case[] = "(){[]}";
     assert(isValid(test_case));
     return 0;
 }
```

## Undefined Behavior

First things first, let us look into the C standard. It says that modifying the array that `char* test_case` points to is ***undefined behavior.***

> #### § 6.7.9 32.) EXAMPLE 8
> The declaration
> ```c
> char s[] = "abc", t[3] = "abc";
> ```
> defines "plain" `char` array objects `s` and `t` whose elements are initialized with character string literals.
> This declaration is identical to
> ```c
> char s[] = { 'a', 'b', 'c', '\0' }, t[] = { 'a', 'b', 'c' };
> ```
> The contents of the arrays are modifiable. On the other hand, the declaration
> ```c
> char *p = "abc";
> ```
> defines `p` with type "pointer to `char`" and initializes it to point to an object with type "array of `char`" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use `p` to modify the contents of the array, **the behavior is undefined**.

:question: ***Could this be the cause? But it says undefined behavior, not segmentation fault.***

## What is a `SEGFAULT`

`SEGFAULT`, or segmentation fault, is defined as follows[^1]:

> In computing, a **segmentation fault** (often shortened to **segfault**) or **access violation** is a fault, or failure condition, **raised by hardware with memory protection**, **notifying an operating system (OS)** the software has attempted to access a restricted area of memory (a memory access violation). On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process (a program crash), and sometimes a core dump.
>
> [...]
> #### Writing to read-only memory
> Writing to **read-only memory** raises a segmentation fault. At the level of code errors, this occurs when the program writes to part of its own code segment or the read-only portion of the data segment, as these are loaded by the OS into read-only memory.

### Make use of the `SEGFAULT`

At this point, we know we can put this feature to use: data that we do not want mutated can be stored in a **read-only** segment.
When someone tries to modify the content, the write raises a `SEGFAULT`. It turns out that I unintentionally made the string **read-only** by declaring it as `char* test_case`.

But still, this is a guess. Let me verify it with the simple code shown below.

```c
char test_case_0[] = "()";   // array subscripting notation
char *test_case_00 = "()";   // pointer notation
```

Go into `GDB` and print the **addresses** of their storage.

```shell
gef➤  p &*test_case_0
$0 = 0x7fffffffda12 "()"
gef➤  p &*test_case_00
$1 = 0x555555556008 "()"
```

Use the `vmmap` command to see the **layout** of the memory and the **permissions** of each segment.

```shell
gef➤  vmmap
Start              End                Offset             Perm Path
0x00555555556000 0x00555555557000 0x00000000002000 r-- [executable binary]
[...]
0x007ffffffdd000 0x007ffffffff000 0x00000000000000 rw- [stack]
```

Now we can finally conclude:

- `char* test_case` only stores a pointer. The string literal it points to lives in the `.rodata` section of the executable file, which is mapped into a segment with **read** permission only (`r--` in the `vmmap` output above), so it is **not writable**.
- `char test_case[]` stores its own copy of the data on the **stack**, which is mapped `rw-` and can be modified.

[^1]: Wikipedia, "Segmentation fault", https://en.wikipedia.org/wiki/Segmentation_fault
[^_]: C For Dummies Blog, "Declaring a String Literal as a Pointer", https://c-for-dummies.com/blog/?p=3475
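
One way to act on this conclusion is to stop writing into the caller's string at all. The sketch below is not the original solution: `is_valid_const` is a hypothetical variant that takes a `const char *`, so the compiler rejects accidental writes through `s`, and it keeps its stack in a private heap buffer at the cost of the memory-reuse trick. Treating an allocation failure as "invalid" is an assumption made only to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of isValid: the input is never modified, so it is
 * safe to pass a string literal that lives in read-only memory. */
bool is_valid_const(const char *s)
{
    assert(s != NULL);

    size_t len = strlen(s);
    char *stack = malloc(len + 1);   /* +1 avoids requesting zero bytes */
    if (stack == NULL)
        return false;                /* assumption: report OOM as "invalid" */

    size_t top = 0;                  /* number of expected closers on the stack */
    bool ok = true;

    for (const char *iter = s; *iter && ok; iter++) {
        switch (*iter) {
        case '(': stack[top++] = ')'; break;
        case '[': stack[top++] = ']'; break;
        case '{': stack[top++] = '}'; break;
        default:
            /* Any other character must match the closer on top of the stack. */
            if (top == 0 || stack[--top] != *iter)
                ok = false;
        }
    }

    ok = ok && top == 0;             /* leftover openers mean unbalanced input */
    free(stack);
    return ok;
}
```

With this signature, both `is_valid_const("(){[]}")` and a call on a `char test_case[]` array work, because the function only reads from `s`.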