# mini-docker
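After running into Docker and then Podman recently, I wanted to understand how containers are actually implemented underneath. The overall steps follow
[https://coolshell.cn/articles/17010.html](https://coolshell.cn/articles/17010.html)
which is quite clear. With Linux namespaces, a short C program is enough to implement the basic idea of a container.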
```c=
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/mount.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>

/* Stack for the process created by clone(); 1 MiB */
#define STACK_SIZE (1024 * 1024)
static char container_stack[STACK_SIZE];

char *const container_args[] = {
    "/bin/bash",
    NULL};

/* Pipe the child blocks on until the parent has written the ID maps */
int pipefd[2];

/* Write one "inside outside length" line to a uid_map or gid_map file */
void set_map(char *file, int inside_id, int outside_id, int len)
{
    FILE *mapfd = fopen(file, "w");
    if (NULL == mapfd)
    {
        perror("open file error");
        return;
    }
    fprintf(mapfd, "%d %d %d", inside_id, outside_id, len);
    fclose(mapfd);
}

void set_uid_map(pid_t pid, int inside_id, int outside_id, int len)
{
    char file[256];
    snprintf(file, sizeof(file), "/proc/%d/uid_map", (int)pid);
    set_map(file, inside_id, outside_id, len);
}

void set_gid_map(pid_t pid, int inside_id, int outside_id, int len)
{
    char file[256];
    snprintf(file, sizeof(file), "/proc/%d/gid_map", (int)pid);
    set_map(file, inside_id, outside_id, len);
}

int container_main(void *arg)
{
    (void)arg;
    printf("Container [%5d] - inside the container!\n", getpid());
    printf("Container: eUID = %ld; eGID = %ld, UID=%ld, GID=%ld\n",
           (long)geteuid(), (long)getegid(), (long)getuid(), (long)getgid());

    /* Wait for the parent's signal before continuing (inter-process synchronization) */
    char ch;
    close(pipefd[1]);
    if (read(pipefd[0], &ch, 1) < 0)
        perror("read");

    printf("Container [%5d] - setup hostname!\n", getpid());
    /* The length excludes the terminating NUL: "container" is 9 bytes */
    sethostname("container", 9);

    /* Make ./rootfs the root directory of the container */
    if (chdir("./rootfs") != 0 || chroot("./") != 0)
    {
        perror("chdir/chroot");
        return 1;
    }

    /* Mount a fresh procfs so tools like ps only see the new PID namespace */
    if (mount("proc", "/proc", "proc", 0, NULL) != 0)
        perror("mount /proc");

    execv(container_args[0], container_args);
    printf("Something's wrong!\n");
    return 1;
}

int main(void)
{
    const int gid = getgid(), uid = getuid();
    printf("Parent: eUID = %ld; eGID = %ld, UID=%ld, GID=%ld\n",
           (long)geteuid(), (long)getegid(), (long)getuid(), (long)getgid());

    if (pipe(pipefd) != 0)
    {
        perror("pipe");
        return 1;
    }

    printf("Parent [%5d] - start a container!\n", getpid());
    /* CLONE_NEWNS enables the mount namespace; the other flags add IPC, UTS, PID and user namespaces */
    pid_t container_pid = clone(container_main, container_stack + STACK_SIZE,
                                CLONE_NEWIPC | CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUSER | SIGCHLD, NULL);
    if (container_pid == -1)
    {
        perror("clone");
        return 1;
    }
    printf("Parent [%5d] - Container [%5d]!\n", getpid(), (int)container_pid);

    /* Map the caller's UID/GID to 0 (root) inside the container */
    set_uid_map(container_pid, 0, uid, 1);
    set_gid_map(container_pid, 0, gid, 1);
    printf("Parent [%5d] - user/group mapping done!\n", getpid());

    /* Notify the child: closing the write end makes its read() return */
    close(pipefd[1]);
    waitpid(container_pid, NULL, 0);
    printf("Parent - container stopped!\n");
    return 0;
}
```
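One detail the program above does not cover: since Linux 3.19, a process that lacks CAP_SETGID in the parent user namespace has to write `deny` to `/proc/<pid>/setgroups` before the kernel accepts a write to `gid_map`. Below is a minimal sketch of that extra step. `write_file` and `deny_setgroups` are hypothetical helper names introduced here, not part of the original program.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: write text to path.
 * Returns 0 on success, -1 on failure. */
int write_file(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = (fputs(text, f) >= 0);
    /* fclose() flushes the buffer, so a write rejected by the kernel shows up here */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: disable setgroups(2) in the user namespace of pid,
 * which is required before an unprivileged write to gid_map. */
int deny_setgroups(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/setgroups", (int)pid);
    return write_file(path, "deny");
}
```

In the parent, `deny_setgroups(container_pid)` would be called right before `set_gid_map(...)`. When the program is run as root this step is probably unnecessary.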
# rootfs
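Put the executables that should be available inside the container, together with the libraries they need, into the corresponding directories under rootfs. For this example: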
```bash
# run inside the rootfs directory
mkdir bin
cp /bin/bash bin/
ldd bin/bash
```
![](https://i.imgur.com/HFoLp3i.png)
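Copy the libraries listed by ldd into the rootfs directory under the same paths (linux-vdso.so.1 is provided by the kernel, so there is no file to copy for it):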
```bash
linux-vdso.so.1 (0x00007ffcaa9cd000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f44a2e5c000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f44a2e55000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f44a2c69000)
/lib64/ld-linux-x86-64.so.2 (0x00007f44a3002000)
```
![](https://i.imgur.com/VjnAHzl.png)
When the program above runs, it uses the rootfs directory as its root and is confined to it.
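Picking the library paths out of the ldd output by hand gets tedious with larger binaries. As a rough sketch, the function below extracts the on-disk path from a single ldd output line, so a small tool could feed each path to a copy step. `ldd_line_path` is a hypothetical name, and the parsing assumes the three line shapes shown above and paths without spaces.

```c
#include <string.h>

/* Extract the library path from one line of ldd output.
 * Returns 1 and fills out on success, 0 when the line has no file path
 * (for example linux-vdso.so.1 or a "not found" entry). */
int ldd_line_path(const char *line, char *out, size_t out_size)
{
    const char *p = strstr(line, "=> ");
    if (p != NULL)
        p += 3;                          /* path follows the arrow */
    else
        p = line + strspn(line, " \t");  /* no arrow: the first token may be the path */

    if (*p != '/')
        return 0;                        /* not an absolute path, nothing to copy */

    size_t len = strcspn(p, " \t\n");    /* path ends at the next whitespace */
    if (len == 0 || len >= out_size)
        return 0;
    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

Each extracted path can then be copied to `rootfs` plus the same path, for example `rootfs/lib/x86_64-linux-gnu/libc.so.6`.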
# cgroup
Next, let's add something: limits on the resources the container may use. References:
https://www.cnblogs.com/sparkdev/p/8296063.html
http://guildwar23.blogspot.com/2013/01/linux-control-group.html
https://www.cntofu.com/book/114/Cgroups/cgroups1.md
Create two directories under /sys/fs/cgroup/cpu:
```bash=
mkdir hight
mkdir low
```
![](https://i.imgur.com/GPvHC3J.png)
Check which CPU cores are currently available:
```bash=
cat /sys/fs/cgroup/cpuset/cpuset.cpus
```
![](https://i.imgur.com/6BsPbXl.png)
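On WSL I set full permissions on all of these directories. Creating the cgroups with mkdir as in the article caused permission errors when writing to their files,
so I used cgcreate to create the cgroups instead: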
```bash=
chmod -R 777 low
chmod -R 777 hight
chmod -R 777 singlecore
cgcreate -g cpu:low
cgcreate -g cpu:hight
cgcreate -g cpuset:singlecore
```
```bash=
chown root:root -R /sys/fs/cgroup/cpu/low
echo 512 > /sys/fs/cgroup/cpu/low/cpu.shares
chown root:root -R /sys/fs/cgroup/cpu/hight
echo 2048 > /sys/fs/cgroup/cpu/hight/cpu.shares
echo "0-1" > /sys/fs/cgroup/cpuset/low/cpuset.cpus
chown root:root -R /sys/fs/cgroup/cpuset/singlecore
echo 0 > /sys/fs/cgroup/cpuset/singlecore/cpuset.mems
echo 0 > /sys/fs/cgroup/cpuset/singlecore/cpuset.cpus
```
# Pin both processes to the same core
# test file
```c=
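#include <stdio.h>

int main(void)
{
    /* volatile keeps the compiler from optimizing the busy loop away */
    volatile int i;
    int end;
    end = 1024 * 1024 * 1024;
    for (i = 0; i < end;)
    {
        /* i++ stays commented out, so the loop never ends and keeps one core busy */
        // i++;
    }
    return 0;
}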
```
With the weights set earlier, 512:2048 = 1:4,
so on the same CPU process 20232 gets about 80% and process 20231 gets about 20%.
```
echo 20231 >/sys/fs/cgroup/cpuset/singlecore/tasks
echo 20232 >/sys/fs/cgroup/cpuset/singlecore/tasks
cgdelete cpu:/low
cgdelete cpu:/hight
```
![](https://i.imgur.com/QpPrwgS.png)
![](https://i.imgur.com/zY426gQ.png)
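The 80/20 split can be checked with a little arithmetic. Under cgroup v1, when all groups compete for the same CPU, each group's expected share is its `cpu.shares` value divided by the sum over the competing groups. A tiny sketch, where `cpu_share_percent` is a hypothetical helper name:

```c
/* Expected CPU percentage of one group when every group is busy on the same core.
 * shares is the group's cpu.shares, total_shares the sum over all competing groups. */
double cpu_share_percent(unsigned long shares, unsigned long total_shares)
{
    if (total_shares == 0)
        return 0.0;
    return 100.0 * (double)shares / (double)total_shares;
}
```

With the values used here, `cpu_share_percent(512, 512 + 2048)` gives 20 and `cpu_share_percent(2048, 512 + 2048)` gives 80. The weights only matter under contention; a group running alone can still use the whole core.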
# Limiting memory
# test2 file
```c=
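#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Allocate 1 MiB per iteration; parenthesized so the macro expands safely */
#define CHUNK_SIZE (1024 * 1024 * 1)

int main(void)
{
    char *p;
    int i;
    for (i = 0; i < 100; i++)
    {
        p = malloc(sizeof(char) * CHUNK_SIZE);
        if (p == NULL)
        {
            printf("fail to malloc!\n");
            return 1;
        }
        sleep(1); // 1s
        // memset() sets the first n bytes of the given memory to a specific value;
        // writing to the pages is what makes them count against the limit
        memset(p, 0, CHUNK_SIZE);
        printf("malloc memory %d MB\n", (i + 1) * 1);
    }
    return 0;
}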
}
```
Set the memory limit to 10 MB:
```
mkdir /sys/fs/cgroup/memory/mymemory
chown -R root:root /sys/fs/cgroup/memory/mymemory
# limit memory to 10 MB
echo 10000000 > /sys/fs/cgroup/memory/mymemory/memory.limit_in_bytes
# disable swap for this cgroup
echo 0 > /sys/fs/cgroup/memory/mymemory/memory.swappiness
echo 20807 > /sys/fs/cgroup/memory/mymemory/tasks
```
![](https://i.imgur.com/0sbgs23.png)
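One level up, at the bash process: any child process it starts ends up in the same cgroup, so the shell and its children can roughly be treated as one cgroup.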
![](https://i.imgur.com/V6z8gnW.png)
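With swap disabled, the process is killed as soon as it exceeds the limit.
At this point, cgroups can be used to control the resources of each container separately.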
![](https://i.imgur.com/CWUqOo7.png)
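To tie this back to the container program, the parent could place the cloned child into a cgroup itself instead of relying on `echo <pid> > .../tasks` from a shell. The following is a minimal sketch assuming the cgroup v1 layout used throughout this note. `cgroup_attach` is a hypothetical helper name.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: move pid into the cgroup v1 directory cgroup_dir
 * by writing it to <cgroup_dir>/tasks.
 * Returns 0 on success, -1 on failure. */
int cgroup_attach(const char *cgroup_dir, pid_t pid)
{
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/tasks", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;               /* path would not fit in the buffer */

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = (fprintf(f, "%d\n", (int)pid) > 0);
    /* fclose() flushes, so a write rejected by the kernel is reported here */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

In the parent, a call such as `cgroup_attach("/sys/fs/cgroup/memory/mymemory", container_pid)` placed after `clone()` and before the pipe is closed would apply the limit before bash starts inside the container.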
# Other references
## Controlling cgroups through systemd
https://www.cnblogs.com/sparkdev/p/9523194.html
```bash=
ps --ppid 20203
apt install cgroup-tools
```
Create a transient cgroup:
```bash=
sudo systemd-run --unit=toptest --slice=test top -b
```
```bash=
ls -l /proc/?/ns
```
You can see that it automatically generates some configuration.
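The entries under `/proc/<pid>/ns` are symbolic links whose text looks like `uts:[4026531838]`. Two processes are in the same namespace when the inode number in that text matches, which is a handy way to confirm that the container really got its own namespaces. A minimal sketch of reading one link from C, where `ns_link` is a hypothetical helper name:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the link /proc/<pid>/ns/<ns_name> into out, for example "uts:[...]".
 * Returns 0 on success, -1 on failure (no such process or namespace,
 * insufficient permission, or a buffer that is too small to use). */
int ns_link(pid_t pid, const char *ns_name, char *out, size_t out_size)
{
    char path[128];
    int n = snprintf(path, sizeof(path), "/proc/%d/ns/%s", (int)pid, ns_name);
    if (n < 0 || (size_t)n >= sizeof(path) || out_size == 0)
        return -1;

    ssize_t len = readlink(path, out, out_size - 1);
    if (len < 0)
        return -1;
    out[len] = '\0';   /* readlink() does not add the terminator */
    return 0;
}
```

Calling it once with the parent's PID and once with the container's PID, then comparing the two strings, shows whether a given namespace (`uts`, `pid`, `mnt`, `ipc`, `user`) is shared.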
https://www.waynerv.com/posts/container-fundamentals-resource-limitation-using-cgroups/
https://www.cntofu.com/book/46/linux_system/pipehe_fifo.md
https://coolshell.cn/articles/17010.html