LinuxOS Project1
=
###### tags: `Course - Linux OS`
## Team Members - Group 20
108502532 丁麒源
108502533 廖宥霖
108502530 曹鈞翔
## Supplementary Submission
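### SYSCALL_DEFINEx and asmlinkage
`SYSCALL_DEFINEx` and `asmlinkage` are not system calls themselves; both are used when defining the kernel-side entry point of a system call.
`asmlinkage` marks a function as following the platform's C calling convention. On the x86-32 kernel, for example, it makes the C function take its arguments from the stack instead of from registers.
`SYSCALL_DEFINEx` adds one important feature: on a 64-bit kernel it makes sure that 32-bit values are correctly sign-extended. This prevents the Linux vulnerability CVE-2009-0029 from being exploited.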
## Target

- Text (code) Segment: `start_code` (low), `end_code` (high)
- Data Segment: `start_data` (low), `end_data` (high)
- BSS Segment: above `end_data`
- Heap Segment: `start_brk` (low), `brk` (high)
- Stack Segment: `start_stack` (high)
- mmap Segment (shared libraries, thread stacks, ...): `mmap_base` (high)
## ==Execution Result==

* `?` marks a place where we are not sure whether a segment boundary exists.
```
thread: main (pid 2707)
virtual addresses of variables:
<code()>: '0x555555555249'
<data>: '0x555555558010'
<bss>: '0x555555558018'
<heap>: '0x5555555592a0'
<mmmap>: '0x7ffff7ffa000'
<stack>: '0x7fffffffddb2'
virtual addresses of segment pointers:
Code Segment:
<start_code>: '0x555555555000'
<end_code>: '0x555555555629'
Data Segment:
<start_data>: '0x555555557d78'
<end_data>: '0x555555558015'
Heap Segment:
<start_brk>: '0x555555559000'
<brk>: '0x55555557a000'
Stack Segment:
<start_stack>: '0x7fffffffdf00'
mmap Segment:
<mmap_base>: '0x7ffff7fff000'
===========================================
thread: t1 (pid 2708)
virtual addresses of variables:
<code()>: '0x555555555249'
<data>: '0x555555558010'
<bss>: '0x555555558018'
<heap>: '0x7ffff0000b70'
<mmmap>: '0x7ffff7fbb000'
<stack>: '0x7ffff7d88e42'
virtual addresses of segment pointers:
Code Segment:
<start_code>: '0x555555555000'
<end_code>: '0x555555555629'
Data Segment:
<start_data>: '0x555555557d78'
<end_data>: '0x555555558015'
Heap Segment:
<start_brk>: '0x555555559000'
<brk>: '0x55555557a000'
Stack Segment:
<start_stack>: '0x7fffffffdf00'
mmap Segment:
<mmap_base>: '0x7ffff7fff000'
===========================================
thread: t2 (pid 2709)
virtual addresses of variables:
<code()>: '0x555555555249'
<data>: '0x555555558010'
<bss>: '0x555555558018'
<heap>: '0x7fffe8000b70'
<mmmap>: '0x7ffff7fba000'
<stack>: '0x7ffff7587e42'
virtual addresses of segment pointers:
Code Segment:
<start_code>: '0x555555555000'
<end_code>: '0x555555555629'
Data Segment:
<start_data>: '0x555555557d78'
<end_data>: '0x555555558015'
Heap Segment:
<start_brk>: '0x555555559000'
<brk>: '0x55555557a000'
Stack Segment:
<start_stack>: '0x7fffffffdf00'
mmap Segment:
<mmap_base>: '0x7ffff7fff000'
===========================================
```
### Variables' address
`code` : __identical in all three threads, between `start_code` and `end_code`__
`data` : __identical in all three threads, between `start_data` and `end_data`__
`bss` : __identical in all three threads, above `end_data`__
`heap` : __three different addresses; only the main thread's lies between `start_brk` and `brk`, the others lie below `mmap_base`__
`mmmap`: __three different addresses, all below `mmap_base`__
`stack`: __three different addresses; only the main thread's lies above `mmap_base`, the others lie below `mmap_base`__
### Segment pointers' address
:::info
listed from low address to high address
:::
Code segment:
- `start_code`: __all the same__
- `end_code`: __all the same__
Data segment:
- `start_data`: __all the same__
- `end_data`: __all the same__
Heap segment:
- `start_brk`: __all the same__
- `brk`: __all the same__
mmap segment:
- `mmap_base`: __all the same__
Stack segment:
- `start_stack`: __all the same__
### Explanation & Analysis
:::success
From the results above, we can see that the threads share the code, data, BSS, heap, and mmap segments, but each thread has its own stack.
:::
***1. Why are the results for the variable `stack` and for `start_stack` different?***
- `start_stack` points to the main thread's stack segment, whereas `stack` is the address of a local variable on each thread's own stack. The stacks of t1 and t2 are allocated inside the mmap segment, so their `stack` addresses lie below `mmap_base`.
***2. Addresses are distributed in the following order***
- **increasing** in the order **code -> data -> BSS -> heap -> mmap -> stack**

***3. `task_struct` & `mm_struct`***

- Linux processes are implemented in the kernel as instances of `task_struct`, the process descriptor.
- The `mm` field in `task_struct` points to the memory descriptor, `mm_struct`, which summarizes a program's memory.
- Within the memory descriptor we can find the set of virtual memory areas and the page tables.
- Each segment's start and end virtual addresses can be read from the fields of `mm_struct` (such as `start_code` and `end_code`) or derived from its virtual memory areas.
***4. `mmap()` (user mode function)***
- The dynamic linker calls `mmap()` with NULL as the `addr` argument to load the pages of a shared library. The address the library gets mapped to depends on factors such as:
    - `mmap_base` (below the stack by default on x86-64, with 28 bits of randomness); see `arch/x86/mm/mmap.c`
    - the size of the file and of previous mappings
- Stack space for a new thread is also created by the parent thread with `mmap(MAP_ANONYMOUS|MAP_STACK)`, so thread stacks are in the "memory map segment", as our result shows.
- When a thread other than the main thread calls `malloc()`, glibc obtains the memory through `mmap()` instead of `brk`. This is why the `heap` variable's address in t1 and t2 lies below `mmap_base`.
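***5. `current` (kernel mode macro)***
`current` is a macro that evaluates to a pointer to the `task_struct` of the calling task. The generic definition and the `thread_info` helper it relies on can be found in:
- `/include/asm-generic/current.h`
- `/arch/alpha/include/asm/thread_info.h`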

## Kernel and OS version
:::success
kernel: Linux-5.15.68
OS: Ubuntu-22.04 64bit (WSL)
:::
## Add new system call and Compile kernel
### Download WSL kernel
```bash=
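# wget names the file after the last URL component, so set the name explicitly
wget -O WSL2-Linux-Kernel-rolling-lts-wsl-5.15.68.1.tar.gz https://github.com/microsoft/WSL2-Linux-Kernel/archive/refs/tags/rolling-lts/wsl/5.15.68.1.tar.gz
mkdir -p ~/WSL2-Linux-Kernel
tar -xvf WSL2-Linux-Kernel-rolling-lts-wsl-5.15.68.1.tar.gz -C ~/WSL2-Linux-Kernel --strip-components=1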
```
### Define new system call
```bash=
cd ~/WSL2-Linux-Kernel
mkdir get_segment_info
nano get_segment_info/get_segment_info.c
```
### ==Kernel space code==
#### `get_segment_info.c`
```c=
#include <linux/types.h>
#include <linux/syscalls.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/mm_types.h>
#include <linux/uaccess.h>
#include <asm/current.h>

/* Snapshot of the segment boundaries stored in the caller's mm_struct. */
struct segment_info
{
    unsigned long start_code, end_code;
    unsigned long start_data, end_data;
    unsigned long start_brk, brk;
    unsigned long start_stack;
    unsigned long mmap_base;
};

SYSCALL_DEFINE1(get_segment_info, void __user *, dsi)
{
    /* current->mm is shared by all threads of the calling process. */
    struct mm_struct *mm = current->mm;
    struct segment_info si = {
        .start_code = mm->start_code,
        .end_code = mm->end_code,
        .start_data = mm->start_data,
        .end_data = mm->end_data,
        .start_brk = mm->start_brk,
        .brk = mm->brk,
        .start_stack = mm->start_stack,
        .mmap_base = mm->mmap_base,
    };

    /* copy_to_user() returns the number of bytes it could not copy. */
    if (copy_to_user(dsi, &si, sizeof(si)))
        return -EFAULT;

    /* In the kernel, pid is the per-thread ID, so each thread gets its own. */
    return current->pid;
}
```
### Create Makefile for your system call
```bash=
nano get_segment_info/Makefile
```
Write the following line:
```Makefile=
obj-y := get_segment_info.o
```
### Add the home directory of your system call to the main Makefile of the kernel
```bash=
nano Makefile
```
Search for `core-y`. In the second result, you will see a series of directories.
```
kernel/ certs/ mm/ fs/ ipc/ security/ crypto/ block/
```
Append ` get_segment_info/` to the end of that line:
```Makefile=
kernel/ certs/ mm/ fs/ ipc/ security/ crypto/ block/ get_segment_info/
```
### Add a corresponding function prototype for your system call to the header file of system calls
```bash=
nano include/linux/syscalls.h
```
Navigate to the bottom of it and write the following code just above `#endif`.
```c=
asmlinkage long sys_get_segment_info(void __user *dsi);
```
### Add new system call to kernel's system call table
```bash=
nano arch/x86/entry/syscalls/syscall_64.tbl
```
Add the following line at the end of the 64-bit system calls (use tabs for spacing).
```
449 common get_segment_info sys_get_segment_info
```
### Compile and change WSL kernel
```bash=
sudo apt install build-essential flex bison dwarves libssl-dev libelf-dev
make -j$(nproc) KCONFIG_CONFIG=Microsoft/config-wsl
```
Keep pressing Enter to accept the default config options.
After compiling, move the kernel image to a directory on the Windows host, then add the following to `%USERPROFILE%\.wslconfig` on the host (backslashes in the path must be escaped as `\\`):
```
[wsl2]
kernel=<path_of_vmlinux>
```
Then restart WSL.
```powershell=
wsl --shutdown; Start-Sleep 10; wsl
```
## ==User space code==
### `main.c`
```c=
#include <stdio.h>
#include <stdlib.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <pthread.h>

#define __NR_get_segment_info 449

/* Must match the layout of struct segment_info in the kernel. */
struct segment_info
{
    unsigned long start_code, end_code;
    unsigned long start_data, end_data;
    unsigned long start_brk, brk;
    unsigned long start_stack;
    unsigned long mmap_base;
};

/* Flags that serialize the output of t1 and t2 while keeping both alive. */
atomic_int isT1Done = 0;
atomic_int isT2Done = 0;

char *code() { return "code"; }

/* Thread entry point; arg is the thread name ("main", "t1" or "t2"). */
void *print_segment_info(void *arg)
{
    char *thread_name = arg;

    /* t2 waits until t1 has finished printing. */
    if (thread_name[1] == '2')
    {
        while (!isT1Done)
            ;
    }

    static char data[5] = "data";
    static char bss[4];
    char *heap = malloc(sizeof(char) * 5);
    /* Anonymous mappings take fd = -1. */
    char *mmmap = mmap(NULL, 5 * sizeof(char), PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    char stack[6] = "stack";
    struct segment_info si = {0};

    /* The system call fills si and returns the caller's thread ID. */
    long pid = syscall(__NR_get_segment_info, &si);
    if (pid < 0)
        perror("get_segment_info");

    printf("thread: %s (pid %ld)\n", thread_name, pid);
    printf("virtual addresses of variables:\n");
    printf("<code()>:\t'%p'\n", (void *)code);
    printf("<data>:\t\t'%p'\n", (void *)data);
    printf("<bss>:\t\t'%p'\n", (void *)bss);
    printf("<heap>:\t\t'%p'\n", (void *)heap);
    printf("<mmmap>:\t'%p'\n", (void *)mmmap);
    printf("<stack>:\t'%p'\n", (void *)stack);
    printf("virtual addresses of segment pointers:\n");
    printf("Code Segment:\n");
    printf("<start_code>:\t'%p'\n", (void *)si.start_code);
    printf("<end_code>:\t'%p'\n", (void *)si.end_code);
    printf("Data Segment:\n");
    printf("<start_data>:\t'%p'\n", (void *)si.start_data);
    printf("<end_data>:\t'%p'\n", (void *)si.end_data);
    printf("Heap Segment:\n");
    printf("<start_brk>:\t'%p'\n", (void *)si.start_brk);
    printf("<brk>:\t\t'%p'\n", (void *)si.brk);
    printf("Stack Segment:\n");
    printf("<start_stack>:\t'%p'\n", (void *)si.start_stack);
    printf("mmap Segment:\n");
    printf("<mmap_base>:\t'%p'\n", (void *)si.mmap_base);
    printf("===========================================\n");

    /* t1 stays alive until t2 is done, so their stacks and heaps coexist. */
    if (thread_name[1] == '1')
    {
        isT1Done = 1;
        while (!isT2Done)
            ;
    }
    else if (thread_name[1] == '2')
    {
        isT2Done = 1;
    }
    free(heap);
    return NULL;
}

int main(void)
{
    print_segment_info("main");
    pthread_t t1, t2;
    pthread_create(&t1, NULL, print_segment_info, "t1");
    pthread_create(&t2, NULL, print_segment_info, "t2");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```
## References
Add new system call:
https://dev.to/jasper/adding-a-system-call-to-the-linux-kernel-5-8-1-in-ubuntu-20-04-lts-2ga8
Replacing the WSL Kernel:
https://blog.dan.drown.org/replacing-the-wsl-kernel/
problems and solutions:
https://blog.csdn.net/qq_36393978/article/details/124274364
https://blog.csdn.net/m0_48958478/article/details/121620449
https://blog.csdn.net/bby1987/article/details/104264285
Others:
奔跑吧 (Run Linux Kernel), Ch. 3.1: The Birth of a Process
https://hackmd.io/@PIFOPlfSS3W_CehLxS3hBQ/S14tx4MqP?fbclid=IwAR2yZqy4A92NegOZUj5frnDXKr_8XG_Y8mXJZXq5lboJG2S3OodFTSmACXg
OS Process & Thread (user/kernel) notes
https://medium.com/@yovan/os-process-thread-user-kernel-%E7%AD%86%E8%A8%98-aa6e04d35002
Linux memory management (8): the memory descriptor (mm_struct)
https://blog.csdn.net/weixin_41028621/article/details/104455327
Linux process address space management: mm_struct
https://www.cnblogs.com/rofael/archive/2013/04/13/3019153.html
Do memory mapping segment and heap grow until they meet each other?
https://unix.stackexchange.com/questions/466443/do-memory-mapping-segment-and-heap-grow-until-they-meet-each-other
mmap()
https://unix.stackexchange.com/questions/466443/do-memory-mapping-segment-and-heap-grow-until-they-meet-each-other
https://ithelp.ithome.com.tw/articles/10187260?sc=rss.iron
https://www.quora.com/What-factors-decides-the-load-address-of-shared-libraries-like-libc
setvbuf()
https://stackoverflow.com/questions/5876373/using-setvbuf-with-stdin-stream
How The Kernel Manages Your Memory
https://manybutfinite.com/post/how-the-kernel-manages-your-memory/
malloc() in multi-thread program
https://hackmd.io/@ljP_AG30SzmQE5qO-cjcpQ/HkICAjeJg?type=view#%E8%83%8C%E6%99%AF%E7%9F%A5%E8%AD%98%E8%88%87gblic-Malloc%E5%8E%9F%E7%90%86
Anatomy of a system call (part 1)
https://alittleresearcher.blogspot.com/2015/02/anatomy-of-a-system-call-part-1.html
What is "asmlinkage"?
https://www.jollen.org/blog/2006/10/_asmlinkage.html
Analysis of the Linux CVE-2009-0029 vulnerability
https://blog.csdn.net/hxmhyp/article/details/22619729
Linux notes 3
https://hackmd.io/@combo-tw/Linux-%E8%AE%80%E6%9B%B8%E6%9C%83/%2F%40a29654068%2FHyD4Lu_Dr