# /dev/mem research
contributed by < `jhan1998` >
## Environment Settings
Operating Environment Information
```
OS: Ubuntu 20.04.2 LTS
Kernel Version: 5.4.0-72-generic
Memory: 15 G
CPU: Intel® Core™ i7-4770HQ CPU @ 2.20GHz × 8
```
First, we can keep part of physical memory out of the kernel's memory management by changing the boot parameters in `/etc/default/grub`.
The method described in the [/dev/mem](https://hackmd.io/@sysprog/linux-mem-device#Linux-%E6%A0%B8%E5%BF%83%E7%9A%84-devmem-%E8%A3%9D%E7%BD%AE) article is to add `mem=14G` to `GRUB_CMDLINE_LINUX_DEFAULT=""` and then run `sudo update-grub`. The expectation is that after a reboot, 15G - 14G = 1G of memory is left reserved.
**Before Setting**
```bash
$ free
total used free shared buff/cache available
Mem: 16270944 2263540 11458908 1256396 2548496 12436764
Swap: 1999868 0 1999868
$ sudo cat /proc/iomem | grep RAM
[sudo] password for jhan1998:
00001000-00057fff : System RAM
00059000-0009ffff : System RAM
00100000-6650b80f : System RAM
6650b810-6650bcd2 : System RAM
6650bcd3-78d00fff : System RAM
78d49000-78d5cfff : System RAM
78d8f000-78e39fff : System RAM
78e8f000-78ed3fff : System RAM
78eff000-78f84fff : System RAM
78fdf000-78ffffff : System RAM
100000000-47f5fffff : System RAM
47f600000-47fffffff : RAM buffer
```
**After Setting**
```bash
$ free
total used free shared buff/cache available
Mem: 12152416 2019588 8262100 683884 1870728 9161272
Swap: 1999868 0 1999868
$ sudo cat /proc/iomem | grep RAM
[sudo] password for jhan1998:
00001000-00057fff : System RAM
00059000-0009ffff : System RAM
00100000-6650b80f : System RAM
6650b810-6650bcd2 : System RAM
6650bcd3-78d00fff : System RAM
78d49000-78d5cfff : System RAM
78d8f000-78e39fff : System RAM
78e8f000-78ed3fff : System RAM
78eff000-78f84fff : System RAM
78fdf000-78ffffff : System RAM
100000000-37fffffff : System RAM
```
Comparing the two, the memory visible to the kernel has indeed shrunk, because reserved memory is not counted in any kernel statistics. However, far more than 1 GB has disappeared.
The physical address map gives a clue. Without `mem=14G`, the RAM map ends at `0x47fffffff`; with `mem=14G`, System RAM ends at `0x37fffffff`.
After a simple conversion here:
$$
\begin{aligned}
\mathtt{0x47fffffff} &= 18\,\mathrm{G} - 1 \\
\mathtt{0x37fffffff} &= 14\,\mathrm{G} - 1
\end{aligned}
$$
This shows that `mem=` does not set the amount of usable RAM; it sets the highest physical address the kernel will use. On this machine the RAM map extends up to `18G`, because part of the physical address space below 4G (the gap between `0x79000000` and `0x100000000` in the listing) is not System RAM and is used for other mappings such as devices. With `mem=14G`, everything above 14G is cut off, so `15G - (18G - 14G) = 11G` remains, which matches the roughly 11G reported by `free`. To reserve exactly the last 1G, we set `mem=17G` instead.
**After setting `mem=17G`**
```bash
$ free
total used free shared buff/cache available
Mem: 15248992 607292 13567272 405288 1074428 13958960
Swap: 1999868 0 1999868
$ free -g
total used free shared buff/cache available
Mem: 14 0 12 0 1 13
Swap: 1 0 1
$ sudo cat /proc/iomem | grep RAM
[sudo] password for jhan1998:
00001000-00057fff : System RAM
00059000-0009ffff : System RAM
00100000-6650b80f : System RAM
6650b810-6650bcd2 : System RAM
6650bcd3-78d00fff : System RAM
78d49000-78d5cfff : System RAM
78d8f000-78e39fff : System RAM
78e8f000-78ed3fff : System RAM
78eff000-78f84fff : System RAM
78fdf000-78ffffff : System RAM
100000000-43fffffff : System RAM
```
Here `0x43fffffff = 17G - 1`, so the range from `0x440000000` to `0x47fffffff` is our reserved 1G region.
## Use crash to map the memory reserved by the system
Getting crash to run took some time, because it needs the kernel debug symbols. On Ubuntu, follow the `Getting -dbgsym.ddeb packages` section of [Debug symbol package](https://wiki.ubuntu.com/Debug%20Symbol%20Packages) to set up `/etc/apt/sources.list.d/ddebs.list`, then follow [Getting Kernel Symbols/Sources on Ubuntu Linux](https://sysprogs.com/VisualKernel/tutorials/setup/ubuntu/) to download the debug symbols that match the kernel version.
Once they are downloaded, we can start crash as a test:
```bash
sudo crash /home/jhan1998/modules/boot/vmlinux-5.4.0-73-generic /dev/mem
crash 7.2.8
...
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
```
We can test to see the difference between our reserved memory and unreserved memory.
```bash
crash> rd -p 0x43ffffff1
43ffffff1: 5000000030000000...0...P
crash> rd -p 0x440000000
rd: seek error: physical address: 440000000 type: "64-bit PHYSADDR"
```
Memory from `0x440000000` onward is reserved, and the kernel has set up no page-table mapping for it, so crash cannot read it. `0x43ffffff1`, on the other hand, lies in System RAM and can be read.
Next we can use mmap to use the reserved space:
```c
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    unsigned char *addr;
    int fd;

    fd = open("/dev/mem", O_RDWR);
    if (fd < 0) {
        perror("open /dev/mem");
        return 1;
    }
    /* Map one page of the reserved region at physical address 0x440000000 */
    addr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x440000000);
    if (addr == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }
    printf("addr = %p \n", addr);
    *(volatile unsigned int *)(addr + 0x00) = 0x1; /* 0x440000000: set to 1 */
    *(volatile unsigned int *)(addr + 0x04) = 0x9; /* 0x440000004: set to 9 */
    printf("the address is %p, and the value is %d\n", addr + 0x00, *(addr + 0x00));
    printf("the address is %p, and the value is %d\n", addr + 0x04, *(addr + 0x04));
    /* Keep the mapping alive so it can be inspected from crash */
    printf("Press Enter to continue...");
    fflush(stdout);
    getchar();
    munmap(addr, 4096);
    close(fd);
    return 0;
}
```
The output shows the virtual address at which the page was mapped.
```bash
$ sudo ./test
addr = 0x7f9828a97000
the address is 0x7f9828a97000, and the value is 1
the address is 0x7f9828a97004, and the value is 9
Press Enter to continue...
```
Next, we use crash to look up the corresponding physical address.
```
crash> vtop 0x7f9828a97000
VIRTUAL PHYSICAL
7f9828a97000 (not accessible)
```
At first the lookup fails, because crash is not in the context of our process and therefore has no page table for this user address.
We can use the `set` command to switch to the context of the `test` process; after that, `vtop` can resolve the physical address.
```
crash> ps | grep test
21073 21072 2 ffff96885eec8000 IN 0.0 2492 1436 test
crash> set 21073
PID: 21073
COMMAND: "test"
TASK: ffff96885eec8000 [THREAD_INFO: ffff96885eec8000]
CPU: 2
STATE: TASK_INTERRUPTIBLE
crash> vtop 0x7f9828a97000
VIRTUAL PHYSICAL
7f9828a97000 440000000
PGD: 1e2da07f8 => 80000003cb2a4067
PUD: 3cb2a4300 => 27b99e067
PMD: 27b99ea28 => 3bcd98067
PTE: 3bcd984b8 => 8000000440000267
PAGE: 440000000
PTE PHYSICAL FLAGS
8000000440000267 440000000 (PRESENT|RW|USER|ACCESSED|DIRTY|NX)
VMA START END FLAGS FILE
ffff9687bb915450 7f9828a97000 7f9828a98000 d0444bb /dev/mem
```
It can be seen that `0x440000000` is the physical address we reserved.
:::info
:bell: Re-running `rd` on this virtual address to print the contents of the physical page still fails:
```
crash> rd 0x7f9828a97000
rd: seek error: user virtual address: 7f9828a97000 type: "64-bit UVADDR"
```
According to the error message, I speculate that this is because crash is a kernel analysis tool: the reserved memory is mapped only into the user address space of the process, so crash cannot read it through a user virtual address.
:::
## Use crash to observe Five-level page tables
According to [Five-level page tables](https://lwn.net/Articles/717293/), the Linux kernel on x86_64 describes virtual-to-physical translation with up to five page-table levels. The CPU used here only has 48-bit virtual addresses, so the extra level is folded away and the walk observed below goes through four tables: PGD, PUD, PMD, and PTE.
![](https://i.imgur.com/CMz0h48.png)
Note that only 48 bits of the virtual address take part in the translation; the top 16 bits are not used, because 48 bits already cover 256 TB. The translation starts from the `page global directory (PGD)`, whose location is stored in the `mm_struct` of the `task_struct`. The top 9 bits (bits 39-47) of the virtual address index the PGD and give the location of the `page upper directory (PUD)`. Bits 30-38 index the PUD and give the `page middle directory (PMD)`, bits 21-29 index the PMD and give the page table, and bits 12-20 index the page table and select the `page table entry (PTE)`. Finally, the lowest 12 bits are the offset within the page, which completes the translation.
Then we can execute the following program and use crash to observe the mechanism of [Five-level page tables](https://lwn.net/Articles/717293/).
```c
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    unsigned char *addr;
    int fd;

    fd = open("/dev/mem", O_RDWR);
    if (fd < 0) {
        perror("open /dev/mem");
        return 1;
    }
    /* Map one page of the reserved region at physical address 0x440000000 */
    addr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x440000000);
    if (addr == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }
    printf("addr = %p \n", addr);
    *(volatile unsigned int *)(addr + 0x00) = 0x1; /* 0x440000000: set to 1 */
    *(volatile unsigned int *)(addr + 0x04) = 0x9; /* 0x440000004: set to 9 */
    printf("the address is %p, and the value is %d\n", addr + 0x00, *(addr + 0x00));
    printf("the address is %p, and the value is %d\n", addr + 0x04, *(addr + 0x04));
    /* Each level is indexed by 9 bits of the virtual address */
    printf("PGD index = 0x%llx\n", ((unsigned long long int)addr >> 39) & 0x1ff);
    printf("PUD index = 0x%llx\n", ((unsigned long long int)addr >> 30) & 0x1ff);
    printf("PMD index = 0x%llx\n", ((unsigned long long int)addr >> 21) & 0x1ff);
    printf("PTE index = 0x%llx\n", ((unsigned long long int)addr >> 12) & 0x1ff);
    /* Sleep until a signal arrives so the mapping can be inspected from crash */
    pause();
    munmap(addr, 4096);
    close(fd);
    return 0;
}
```
The resulting output is:
```bash
$ sudo ./test
addr = 0x7eff0939a000
the address is 0x7eff0939a000, and the value is 1
the address is 0x7eff0939a004, and the value is 9
PGD index = 0xfd
PUD index = 0x1fc
PMD index = 0x49
PTE index = 0x19a
```
Then use crash to observe.
First, find out where the PGD is:
```bash
crash> ps | grep test
9298 9296 6 ffff889ffe01dd00 IN 0.0 2492 1540 test
crash> set 9298
PID: 9298
COMMAND: "test"
TASK: ffff889ffe01dd00 [THREAD_INFO: ffff889ffe01dd00]
CPU: 6
STATE: TASK_INTERRUPTIBLE
crash> px ((struct task_struct*)0xffff889ffe01dd00)->mm->pgd
$1 = (pgd_t *) 0xffff889e7894a000
```
Since `$1 = (pgd_t *) 0xffff889e7894a000` is a virtual address, we also need `vtop` to convert it to a physical address.
```bash
crash> vtop 0xffff889e7894a000
VIRTUAL PHYSICAL
ffff889e7894a000 27894a000
PGD DIRECTORY: ffffffffb3c0a000
PAGE DIRECTORY: 38b801067
PUD: 38b8013c8 => 278919063
PMD: 278919e20 => 2789ee063
PTE: 2789eea50 => 800000027894a063
PAGE: 27894a000
PTE PHYSICAL FLAGS
800000027894a063 27894a000 (PRESENT|RW|ACCESSED|DIRTY|NX)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffce5b49e25280 27894a000 0 ffff88a02a767b40 1 17ffffc0000000
```
So the PGD starts at physical address `0x27894a000`. Next, run `vtop` on the original virtual address to check whether the indices are correct.
```bash
crash> vtop 0x7eff0939a000
VIRTUAL PHYSICAL
7eff0939a000 440000000
PGD: 27894a7e8 => 8000000321be3067
PUD: 321be3fe0 => 321be5067
PMD: 321be5248 => 3c8c0a067
PTE: 3c8c0acd0 => 8000000440000267
PAGE: 440000000
PTE PHYSICAL FLAGS
8000000440000267 440000000 (PRESENT|RW|USER|ACCESSED|DIRTY|NX)
VMA START END FLAGS FILE
ffff889f1919d2b0 7eff0939a000 7eff0939b000 d0444bb /dev/mem
```
Here the offset into the PGD is not the index I calculated: the offset shown is my index shifted left by three bits, `0x7e8 = 0xfd << 3`.
There are definitions of pgd_index and pgd_offset in [linux/include/linux/pgtable.h](https://github.com/torvalds/linux/blob/master/include/linux/pgtable.h).
```c
#ifndef pgd_index
/* Must be a compile-time constant, so implement it as a macro */
#define pgd_index(a) (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
#endif
static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address)
{
return (pgd + pgd_index(address));
};
/*
* a shortcut to get a pgd_t in a given mm
*/
#ifndef pgd_offset
#define pgd_offset(mm, address) pgd_offset_pgd((mm)->pgd, (address))
#endif
```
[linux/arch/x86/include/asm/pgtable_64.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/pgtable_64.h)
```c
#define PGDIR_SHIFT 39
#define PTRS_PER_PGD 512
```
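These definitions explain the shift: `pgd + pgd_index(address)` is pointer arithmetic on a `pgd_t *`, and each entry is 8 bytes (64 bits) on x86_64, so the byte offset of an entry is `index * 8`, that is, `index << 3`.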
Back in crash, we can read the PGD entry at the address obtained above to get the location of the PUD.
```bash
crash> rd -p 27894a7e8
27894a7e8: 8000000321be3067 g0.!....
```
Ignoring bit 63 (the NX flag), the entry is `0x321be3067`, and its lowest 12 bits are flag bits, so the PUD starts at `0x321be3000`. Adding the shifted index and reading that address gives the start of the PMD.
**`0x1fc << 3 = 0xfe0`**
```bash
crash> rd -p 0x321be3fe0
321be3fe0: 0000000321be5067 gP.!....
```
Repeating the procedure gives the location of the page table.
**`0x49 << 3 = 0x248`**
```bash
crash> rd -p 0x321be5248
321be5248: 00000003c8c0a067 g.......
```
One more step gives the PTE of the page we want.
**`0x19a << 3 = 0xcd0`**
```bash
crash> rd -p 0x3c8c0acd0
3c8c0acd0: 8000000440000267 g..@....
```
Clearing the flag bits of this PTE gives the page address `0x440000000`; adding the 12-bit offset of the virtual address (0 here) gives our physical address `0x440000000`.
**Verify with vtop:**
```bash
crash> vtop 0x7eff0939a000
VIRTUAL PHYSICAL
7eff0939a000 440000000
PGD: 27894a7e8 => 8000000321be3067
PUD: 321be3fe0 => 321be5067
PMD: 321be5248 => 3c8c0a067
PTE: 3c8c0acd0 => 8000000440000267
PAGE: 440000000
PTE PHYSICAL FLAGS
8000000440000267 440000000 (PRESENT|RW|USER|ACCESSED|DIRTY|NX)
VMA START END FLAGS FILE
ffff889f1919d2b0 7eff0939a000 7eff0939b000 d0444bb /dev/mem
```
:::warning
:question: Why does the index have to be shifted left by three bits before it is added to the start of the table? My first guess was Big-Endian versus Little-Endian, but the shift simply reflects the 8-byte size of each page-table entry: entry `i` lives at byte offset `i * 8`.
:::
## Page exchange between Processes
> The requirement:
We do not want process_A and process_B to share any page, which means they can never operate on the same data at the same time.
Occasionally, however, we want process_A and process_B to exchange information without using the inefficient traditional inter-process communication mechanisms.

After understanding the mechanism of [Five-level page tables](https://lwn.net/Articles/717293/), we can use crash to modify the page-table entries that map the reserved memory and thereby exchange pages between processes. Unlike shared memory, where both processes map the same region, here we manually edit the page tables through `/dev/mem` so that the two processes swap pages.
**master.c**
```c
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    int fd;
    unsigned long *addr;

    fd = open("/dev/mem", O_RDWR);
    if (fd < 0) {
        perror("open /dev/mem");
        return 1;
    }
    /* Create page P1, mapped onto the reserved memory */
    addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x440000000);
    if (addr == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }
    /* Modify the contents of P1 */
    *addr = 0x1122334455667788;
    printf("address at: %p content is: 0x%lx\n", addr, addr[0]);
    /* Wait for the pages to be exchanged */
    getchar();
    printf("address at: %p content is: 0x%lx\n", addr, addr[0]);
    munmap(addr, 4096);
    close(fd);
    return 0;
}
```
**slave.c**
```c
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    int fd;
    unsigned long *addr;

    fd = open("/dev/mem", O_RDWR);
    if (fd < 0) {
        perror("open /dev/mem");
        return 1;
    }
    /* Create page P2, mapped onto the reserved memory */
    addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0x440004000);
    if (addr == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }
    /* Modify the contents of P2 */
    *addr = 0x8877665544332211;
    printf("address at: %p content is: 0x%lx\n", addr, addr[0]);
    /* Wait for the pages to be exchanged */
    getchar();
    printf("address at: %p content is: 0x%lx\n", addr, addr[0]);
    munmap(addr, 4096);
    close(fd);
    return 0;
}
```
After execution, you can see the addresses and values of the two processes:
**master**
```bash
$ sudo ./master
address at: 0x7f9822c40000 content is: 0x1122334455667788
```
**slave**
```bash
$ sudo ./slave
address at: 0x7f9002cd1000 content is: 0x8877665544332211
```
To modify memory through `/dev/mem` with crash, the environment has to be prepared first.
> To write to /dev/mem from crash, use systemtap to hook `devmem_is_allowed` so that it always returns 1; after that, memory can be modified directly.

Steps:
1. Install systemtap.
2. Run `stap -g -e 'probe kernel.function("devmem_is_allowed").return { $return = 1 }'`.
3. Start crash again.
**Use crash to modify the page of master**
```bash
crash> ps | grep master
27867 27866 4 ffff889e253945c0 IN 0.0 2492 1332 master
crash> set 27867
PID: 27867
COMMAND: "master"
TASK: ffff889e253945c0 [THREAD_INFO: ffff889e253945c0]
CPU: 4
STATE: TASK_INTERRUPTIBLE
crash> vtop 0x7f9822c40000
VIRTUAL PHYSICAL
7f9822c40000 440000000
PGD: 1ee8e27f8 => 800000020514a067
PUD: 20514a300 => 1f0d80067
PMD: 1f0d808b0 => 391d34067
PTE: 391d34200 => 8000000440000267
PAGE: 440000000
PTE PHYSICAL FLAGS
8000000440000267 440000000 (PRESENT|RW|USER|ACCESSED|DIRTY|NX)
VMA START END FLAGS FILE
ffff889dfa6cc0d0 7f9822c40000 7f9822c41000 d0444bb /dev/mem
crash> wr -64 -p 0x391d34200 0x8000000440004267
```
**Use crash to modify slave's page**
```bash
crash> ps | grep slave
27869 27868 4 ffff889e2ab21740 IN 0.0 2492 1384 slave
crash> set 27869
PID: 27869
COMMAND: "slave"
TASK: ffff889e2ab21740 [THREAD_INFO: ffff889e2ab21740]
CPU: 4
STATE: TASK_INTERRUPTIBLE
crash> vtop 0x7f9002cd1000
VIRTUAL PHYSICAL
7f9002cd1000 440004000
PGD: 1f0aea7f8 => 80000002539fc067
PUD: 2539fc200 => 20a3a8067
PMD: 20a3a80b0 => 205689067
PTE: 205689688 => 8000000440004267
PAGE: 440004000
PTE PHYSICAL FLAGS
8000000440004267 440004000 (PRESENT|RW|USER|ACCESSED|DIRTY|NX)
VMA START END FLAGS FILE
ffff889ecc833d40 7f9002cd1000 7f9002cd2000 d0444bb /dev/mem
crash> wr -64 -p 0x205689688 0x8000000440000267
```
Then let the program continue to execute and you can see the information exchange between the two processes!
**master**
```bash
$ sudo ./master
address at: 0x7f9822c40000 content is: 0x1122334455667788
address at: 0x7f9822c40000 content is: 0x8877665544332211
```
**slave**
```bash
$ sudo ./slave
address at: 0x7f9002cd1000 content is: 0x8877665544332211
address at: 0x7f9002cd1000 content is: 0x1122334455667788
```
> This technique is well suited to designing inter-process communication for a microkernel. Combined with the cache coherence protocol, it can be very efficient.
## Securely tamper with the memory of the process
This time we do not use crash to modify the page through `/dev/mem`; instead, one process safely tampers with the memory of another process.
First, let's create an anonymous mapping:
```c
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>

int main(int argc, char **argv)
{
    unsigned char *addr;

    /* Anonymously map one page of memory */
    addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* Modify the contents */
    strcpy((char *)addr, "浙江溫州皮鞋濕");
    /* This is only an example, so the address is printed directly;
     * in practice the memory location has to be found by hand. */
    printf("address at: %p content is: %s\n", addr, addr);
    getchar();
    printf("address at: %p content is: %s\n", addr, addr);
    munmap(addr, 4096);
    return 0;
}
```
The output at this time is
```bash
$ ./change
address at: 0x7f4a035f1000 content is: 浙江溫州皮鞋濕
```
Then we can use crash to query the physical memory location, and then modify it through other programs.
First use crash to find the process and physical address.
```bash
crash> ps | grep change
crash: current context no longer exists -- restoring "crash" context:
36261 9027 4 ffff889e344d0000 IN 0.0 2496 1392 change
crash> set 36261
PID: 36261
COMMAND: "change"
TASK: ffff889e344d0000 [THREAD_INFO: ffff889e344d0000]
CPU: 4
STATE: TASK_INTERRUPTIBLE
crash> vtop 0x7f4a035f1000
VIRTUAL PHYSICAL
7f4a035f1000 18a6c4000
PGD: 3b084a7f0 => 80000003b1171067
PUD: 3b1171940 => 1ee97f067
PMD: 1ee97f0d0 => 1f0b21067
PTE: 1f0b21f88 => 800000018a6c4867
PAGE: 18a6c4000
PTE PHYSICAL FLAGS
800000018a6c4867 18a6c4000 (PRESENT|RW|USER|ACCESSED|DIRTY|NX)
VMA START END FLAGS FILE
ffff889feeb425b0 7f4a035f1000 7f4a035f2000 80000fb dev/zero
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffce5b4629b100 18a6c4000 ffff889e2aa0c4b8 0 2 17ffffc008001c uptodate,dirty,lru,swapbacked
```
The converted physical address is `0x18a6c4000`.
So we write a program that maps the page at this offset and changes its contents to `下雨進水不會胖`.
**hack.c**
```c
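#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    int fd;
    unsigned char *addr;
    unsigned long long off;

    if (argc < 2)
        return 1;
    /* Page-aligned physical address in hex, e.g. 0x18a6c4000 */
    off = strtoull(argv[1], NULL, 16);
    fd = open("/dev/mem", O_RDWR);
    if (fd < 0) {
        perror("open /dev/mem");
        return 1;
    }
    addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
    if (addr == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }
    /* Overwrite the string stored in the target page */
    strcpy((char *)addr, "下雨進水不會胖");
    munmap(addr, 4096);
    close(fd);
    return 0;
}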
```
Before running it, remember to hook `devmem_is_allowed` first; otherwise the kernel refuses access to this page and the program fails (with a segmentation fault if the result of `mmap` is not checked).
```bash
sudo ./hack 0x18a6c4000
```
Go back and look at the results, and you can find that we have successfully modified the value stored at `0x18a6c4000`.
```bash
$ ./change
address at: 0x7f4a035f1000 content is: 浙江溫州皮鞋濕
address at: 0x7f4a035f1000 content is: 下雨進水不會胖
```
## Change the name of the process by changing /dev/mem
This time we try to avoid crash and rely only on `/dev/mem` to modify the name of a process.
> This is meaningful for real Internet products.
Especially on managed machines, tools such as `crash` and `gdb` are usually not allowed for debugging, in order to prevent information leaks. `systemtap` is restricted by its API and is therefore relatively safe, and kernel modules are generally prohibited as well.
But having `systemtap` and `/dev/mem` is enough!
Take a look at the following program, we will do a simple experiment:
- [ ] **Modify the name of the process being executed**
```c
#include <stdio.h>
int main(int argc, char **argv)
{
getchar();
}
```
```bash
gcc -o pixie pixie.c && ./pixie
```
Now we have to find a way to change the name of the process from pixie to skinshoe.
There is no `crash` and no `gdb`, only a readable and writable `/dev/mem` (assuming we have hooked `devmem_is_allowed`). How do we do it?
We know that every kernel data structure can be found in `/dev/mem`, so we need to locate the `task_struct` of the pixie process and change its `comm` field.
With the `crash` tool this is easy: once the process is found, the address of `comm` inside its `task_struct` can be read off directly.
```bash
crash> set 63972
PID: 63972
COMMAND: "pixie"
TASK: ffff9c3832572e80 [THREAD_INFO: ffff9c3832572e80]
CPU: 4
STATE: TASK_INTERRUPTIBLE
crash> px ((struct task_struct*)0xffff9c3832572e80)->comm
$1 = "pixie\000PoolSingl"
crash> px &(((struct task_struct*)0xffff9c3832572e80)->comm)
$2 = (char (*)[16]) 0xffff9c38325738f8
```
But what if you can't use `crash` or `gdb` now?
We know that `/dev/mem` exposes physical memory, while the operating system works with virtual addresses. The key is how to relate the two.
We notice three facts:
* x86_64 can directly map 64TiB of physical memory, which is enough to map all the physical memory of any common machine one-to-one.
* The Linux kernel creates a one-to-one mapping of all physical memory, with a fixed offset between physical and virtual addresses.
* The kernel's data structures form a network of interlinked structures, so one structure leads to the next.

This means that given the virtual address of any one kernel data structure, we can compute its physical address and then follow the links to the `task_struct` of our pixie process.
On a Linux system, the addresses of kernel data structures can be found in many places:
* `/proc/kallsyms`
* `/boot/System.map`
* the result of `lsof`
The article shows one example, finding `init_task` in `/proc/kallsyms`:
```bash
$ sudo cat /proc/kallsyms | grep init_task
ffffffff953a90c0 T ftrace_graph_init_task
ffffffff95406390 T perf_event_init_task
ffffffff96685cec r __ksymtab_init_task
ffffffff966aaa19 r __kstrtab_init_task
ffffffff96800000 D __start_init_task
ffffffff96804000 D __end_init_task
ffffffff96813780 D init_task
ffffffff96f48c98 b ext4_lazyinit_task
```
Then work out the mapping from `init_task` to physical memory, walk the task list of the whole system starting from `init_task`, find our target `pixie` process, and modify it.
But this is not modified through `/dev/mem`, so another method is provided in the article.
First start a tcpdump process. It does not need to capture any packets; it only serves as a source of clues. Let's start with it:
```bash
$ sudo tcpdump -i lo -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
```
The reason why the tcpdump process is established is because tcpdump will generate a packet socket, and the virtual address of the socket can be found from procfs:
```
$ sudo cat /proc/net/packet
sk RefCnt Type Proto Iface R Rmem User Inode
ffff9c39c0c87000 2 2 0000 0 0 0 0 384885
ffff9c397b431000 3 2 888e 2 1 0 0 384886
ffff9c386ec14000 3 2 890d 2 1 0 0 382437
ffff9c397b4ab800 2 2 0000 0 0 0 0 383853
ffff9c3796d78000 3 2 0800 2 1 0 0 380767
```
**After starting tcpdump**
```bash
$ sudo cat /proc/net/packet
sk RefCnt Type Proto Iface R Rmem User Inode
ffff9c39c0c87000 2 2 0000 0 0 0 0 384885
ffff9c397b431000 3 2 888e 2 1 0 0 384886
ffff9c386ec14000 3 2 890d 2 1 0 0 382437
ffff9c397b4ab800 2 2 0000 0 0 0 0 383853
ffff9c3796d78000 3 2 0800 2 1 0 0 380767
ffff9c3888878000 3 3 0003 1 1 0 0 382711
```
We can see that a `packet socket` has indeed been added.
However, the way the article converts between virtual and physical addresses does not work unchanged on my computer, so we still need crash to do the conversion.
The article starts from the virtual address of the `packet socket`, works out the position of `wait_queue_head_t` inside the structure, follows it to a `task_struct`, and then searches the whole `task_struct` list.
> This requires being very familiar with the data structures of the Linux kernel. If you are not, compute the offsets from the corresponding source code [or use `struct X.y -o` in `crash`].

Because the article is old and the system is not the same, we use the `struct X.y -o` function of `crash` to see how to find the structures we are looking for.
First we start with `struct sock` to find `sk_wq`:
```bash
crash> struct sock
struct sock {
struct sock_common __sk_common;
socket_lock_t sk_lock;
atomic_t sk_drops;
int sk_rcvlowat;
struct sk_buff_head sk_error_queue;
struct sk_buff *sk_rx_skb_cache;
struct sk_buff_head sk_receive_queue;
struct {
atomic_t rmem_alloc;
int len;
struct sk_buff *head;
struct sk_buff *tail;
} sk_backlog;
int sk_forward_alloc;
unsigned int sk_ll_usec;
unsigned int sk_napi_id;
int sk_rcvbuf;
struct sk_filter *sk_filter;
union {
struct socket_wq *sk_wq;
struct socket_wq *sk_wq_raw;
};
struct xfrm_policy *sk_policy[2];
struct dst_entry *sk_rx_dst;
struct dst_entry *sk_dst_cache;
atomic_t sk_omem_alloc;
int sk_sndbuf;
int sk_wmem_queued;
refcount_t sk_wmem_alloc;
unsigned long sk_tsq_flags;
union {
struct sk_buff *sk_send_head;
struct rb_root tcp_rtx_queue;
};
struct sk_buff *sk_tx_skb_cache;
struct sk_buff_head sk_write_queue;
__s32 sk_peek_off;
int sk_write_pending;
__u32 sk_dst_pending_confirm;
u32 sk_pacing_status;
long sk_sndtimeo;
struct timer_list sk_timer;
__u32 sk_priority;
__u32 sk_mark;
unsigned long sk_pacing_rate;
unsigned long sk_max_pacing_rate;
struct page_frag sk_frag;
netdev_features_t sk_route_caps;
netdev_features_t sk_route_nocaps;
netdev_features_t sk_route_forced_caps;
int sk_gso_type;
unsigned int sk_gso_max_size;
gfp_t sk_allocation;
__u32 sk_txhash;
unsigned int __sk_flags_offset[0];
unsigned int sk_padding : 1;
unsigned int sk_kern_sock : 1;
unsigned int sk_no_check_tx : 1;
unsigned int sk_no_check_rx : 1;
unsigned int sk_userlocks : 4;
unsigned int sk_protocol : 8;
unsigned int sk_type : 16;
u16 sk_gso_max_segs;
u8 sk_pacing_shift;
unsigned long sk_lingertime;
struct proto *sk_prot_creator;
rwlock_t sk_callback_lock;
int sk_err;
int sk_err_soft;
u32 sk_ack_backlog;
u32 sk_max_ack_backlog;
kuid_t sk_uid;
struct pid *sk_peer_pid;
const struct cred *sk_peer_cred;
long sk_rcvtimeo;
ktime_t sk_stamp;
u16 sk_tsflags;
u8 sk_shutdown;
u32 sk_tskey;
atomic_t sk_zckey;
u8 sk_clockid;
u8 sk_txtime_deadline_mode : 1;
u8 sk_txtime_report_errors : 1;
u8 sk_txtime_unused : 6;
struct socket *sk_socket;
void *sk_user_data;
void *sk_security;
struct sock_cgroup_data sk_cgrp_data;
struct mem_cgroup *sk_memcg;
void (*sk_state_change)(struct sock *);
void (*sk_data_ready)(struct sock *);
void (*sk_write_space)(struct sock *);
void (*sk_error_report)(struct sock *);
int (*sk_backlog_rcv)(struct sock *, struct sk_buff *);
struct sk_buff *(*sk_validate_xmit_skb)(struct sock *, struct net_device *, struct sk_buff *);
void (*sk_destruct)(struct sock *);
struct sock_reuseport *sk_reuseport_cb;
struct bpf_sk_storage *sk_bpf_storage;
struct callback_head sk_rcu;
}
SIZE: 760
```
It can be seen that the structure size of the entire `sock` is 760.
Knowing that `sk_wq` exists in the structure, we can use `struct X.y` to query its offset.
```bash
crash> struct sock.sk_wq
struct sock {
[280] struct socket_wq *sk_wq;
}
```
Next query the structure of `struct socket_wq`:
```bash
crash> struct socket_wq
struct socket_wq {
wait_queue_head_t wait;
struct fasync_struct *fasync_list;
unsigned long flags;
struct callback_head rcu;
}
SIZE: 64
```
This structure is much smaller than the previous one. We can see that the first item is `wait_queue_head_t`, so the offset is 0, and we can directly observe `wait_queue_head_t`.
```bash
crash> struct wait_queue_head_t
typedef struct wait_queue_head {
spinlock_t lock;
struct list_head head;
} wait_queue_head_t;
SIZE: 24
```
This should be the wait queue of the socket. According to the article, the next step is to go from `wait_queue_t` to `poll_wqueues` and from there to `task_struct`. However, the `wait_queue_head_t` found here is the end of the chain and does not lead to `poll_wqueues`, so another way is needed.
:::warning
I have not yet found a way to get from `struct sock` to `struct task_struct`, so I still use `crash` to change the name of the process.
:::
Next, we use the crash tool to help us modify the process name.
First of all, re-execute pixie
```
$ ./pixie
```
We can use `crash` to query the location of `task_struct` of this process, and then search for the location of `comm` in `task_struct`.
```bash
crash> ps | grep pixie
24127 10259 1 ffff8d7aa975c5c0 IN 0.0 2492 1248 pixie
crash> set 24127
PID: 24127
COMMAND: "pixie"
TASK: ffff8d7aa975c5c0 [THREAD_INFO: ffff8d7aa975c5c0]
CPU: 1
STATE: TASK_INTERRUPTIBLE
crash> px ((struct task_struct *)0xffff8d7aa975c5c0)->comm
$3 = "pixie\000PoolSingl"
crash> px &((struct task_struct *)0xffff8d7aa975c5c0)->comm
$4 = (char (*)[16]) 0xffff8d7aa975d038
```
Then we can convert this virtual address into a physical address for mapping:
```bash
crash> vtop 0xffff8d7aa975d038
VIRTUAL PHYSICAL
ffff8d7aa975d038 22975d038
PGD DIRECTORY: ffffffffa280a000
PAGE DIRECTORY: 3b8801067
PUD: 3b8801f50 => 2191d0063
PMD: 2191d0a58 => 219038063
PTE: 219038ae8 => 800000022975d063
PAGE: 22975d000
PTE PHYSICAL FLAGS
800000022975d063 22975d000 (PRESENT|RW|ACCESSED|DIRTY|NX)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffe56c88a5d740 22975d000 dead000000000400 0 0 17ffffc0000000
```
We can see that `0x22975d038` is the physical address of `task_struct->comm`.
```bash
crash> rd -p 22975d038
22975d038: 656f68736e696b73 pixie
```
However, we cannot pass this address to `mmap` directly: the file offset must be a multiple of the page size. So we clear the low 12 bits, map the page-aligned address, and then add the in-page offset back to the returned pointer.
**hack.c**
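```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
#include <fcntl.h>
#define MAP_LEN 0x2000UL /* two pages, in case comm crosses a page boundary */
int main(int argc, char **argv)
{
    int fd;
    unsigned char *base, *addr;
    unsigned long long off, diff;
    if (argc < 2) {
        fprintf(stderr, "usage: %s <physical address in hex>\n", argv[0]);
        return 1;
    }
    off = strtoull(argv[1], NULL, 16);
    diff = off & 0xfffULL;  /* offset inside the page */
    off &= ~0xfffULL;       /* mmap() needs a page-aligned offset */
    fd = open("/dev/mem", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    base = mmap(NULL, MAP_LEN, PROT_READ|PROT_WRITE, MAP_SHARED, fd, off);
    if (base == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }
    addr = base + diff;
    printf("program name is: %s\n", addr);
    strcpy((char *)addr, "skinshoe"); /* overwrite task_struct->comm */
    munmap(base, MAP_LEN);            /* unmap the aligned base, not addr */
    close(fd);
    return 0;
}
```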
After execution:
```bash
$ sudo ./skinshoe 22975d038
program name is: pixie
```
We can see that the name has changed.
```bash
crash> px ((struct task_struct *)0xffff8d7aa975c5c0)->comm
$8 = "skinshoe\000lSingl"
crash> rd -p 22975d038
22975d038: 656f68736e696b73 skinshoe
```
We can also use pid query:
```bash
$ cat /proc/24127/comm
skinshoe
```
So far we have successfully modified the name of the process through `/dev/mem`.
## Implement vtop
The [/dev/mem](https://hackmd.io/@sysprog/linux-mem-device#Linux-%E6%A0%B8%E5%BF%83%E7%9A%84-devmem-%E8%A3%9D%E7%BD%AE) article mentions:
> Non-reserved physical memory will be mapped one by one to the virtual address starting from `0xffff880000000000`
So to convert a directly mapped virtual address into the physical address used by `/dev/mem`, we only need to subtract `0xffff880000000000`. The author relies on this for many things, but because the kernel versions differ, the mapping on my computer does not start from `0xffff880000000000`.
After many comparisons and corrections, I found that the base value is different on every boot. This time it starts from `0xffff8d7880000000`, so subtracting this value converts the virtual address correctly without going through `crash`. I speculate that the base is chosen randomly when the mapping is set up at boot.
So we can use the virtual address directly to view the name of the process.
Run the pixie program from the previous section again:
```bash
./pixie
```
This time we can query pixie's `task_struct`:
```bash
crash> ps | grep pixie
crash: current context no longer exists -- restoring "crash" context:
27857 10259 0 ffff8d7a99105d00 IN 0.0 2492 1252 pixie
crash> set 27857
PID: 27857
COMMAND: "pixie"
TASK: ffff8d7a99105d00 [THREAD_INFO: ffff8d7a99105d00]
CPU: 0
STATE: TASK_INTERRUPTIBLE
```
So we know the virtual address is `0xffff8d7a99105d00`.
We can also query the offset of `task_struct.comm`:
```bash
crash> struct task_struct.comm
struct task_struct {
[2680] char comm[16];
}
```
So the physical address of `task_struct` plus 2680 is the address of `comm`.
The implementation code is as follows:
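```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
#include <fcntl.h>
#define OFFSET 0xffff8d7880000000ULL /* direct-map base observed for this boot */
#define COMM_OFF 2680                /* offset of comm inside task_struct */
#define MAP_LEN 0x2000UL             /* two pages, in case comm crosses a page */
int main(int argc, char **argv)
{
    int fd;
    unsigned char *base, *addr;
    unsigned long long off, diff;
    if (argc < 2) {
        fprintf(stderr, "usage: %s <task_struct virtual address in hex>\n", argv[0]);
        return 1;
    }
    off = strtoull(argv[1], NULL, 16);
    off -= OFFSET;          /* virtual -> physical */
    off += COMM_OFF;
    diff = off & 0xfffULL;  /* offset inside the page */
    off &= ~0xfffULL;       /* mmap() needs a page-aligned offset */
    fd = open("/dev/mem", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    base = mmap(NULL, MAP_LEN, PROT_READ|PROT_WRITE, MAP_SHARED, fd, off);
    if (base == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }
    addr = base + diff;
    printf("program name is: %s\n", addr);
    munmap(base, MAP_LEN);  /* unmap the aligned base, not addr */
    close(fd);
    return 0;
}
```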
Final execution result:
```bash
$ sudo ./vtop ffff8d7a99105d00
program name is: pixie
```
## Traverse the processes of the system
Continuing the previous example, we can try to walk through the processes that follow pixie.
First we query the offset of `tasks` in `task_struct`. This `struct list_head` is the linked list that connects every `task_struct`.
```bash
crash> struct task_struct.tasks
struct task_struct {
[1984] struct list_head tasks;
}
```
Knowing this offset, we can walk the processes and print their names.
Again we use pixie as the entry point.
```bash
./pixie
```
Then find the virtual address of this process's `task_struct` as before; I won't demonstrate each step again. Note that in this example the direct mapping starts at `0xffff9fa540000000`.
After that, we add and subtract the offsets of `task_struct->tasks` and `task_struct->comm` to reach the subsequent processes and their names. Here I list 12 in total.
**traverse.c**
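```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>
#include <fcntl.h>
#define OFFSET 0xffff9fa540000000ULL /* direct-map base observed for this boot */
#define TASKS_OFF 1984               /* offset of tasks inside task_struct */
#define COMM_OFF 2680                /* offset of comm inside task_struct */
#define MAP_LEN 0x2000UL             /* two pages cover both fields */
int main(int argc, char **argv)
{
    int fd;
    unsigned char *base, *task;
    unsigned long long off, diff;
    if (argc < 2) {
        fprintf(stderr, "usage: %s <task_struct virtual address in hex>\n", argv[0]);
        return 1;
    }
    /* off always holds the virtual address of a task_struct->tasks field */
    off = strtoull(argv[1], NULL, 16) + TASKS_OFF;
    fd = open("/dev/mem", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    for (int i = 0; i < 12; i++) {
        off -= OFFSET;     /* virtual -> physical */
        off -= TASKS_OFF;  /* back to the start of task_struct */
        diff = off & 0xfffULL;
        off &= ~0xfffULL;
        base = mmap(NULL, MAP_LEN, PROT_READ|PROT_WRITE, MAP_SHARED, fd, off);
        if (base == MAP_FAILED) {
            perror("mmap");
            close(fd);
            return 1;
        }
        task = base + diff;
        printf("program name is: %s\n", task + COMM_OFF);
        /* tasks.next points at the tasks field of the next task_struct */
        memcpy(&off, task + TASKS_OFF, sizeof(off));
        munmap(base, MAP_LEN); /* unmap the aligned base */
    }
    close(fd);
    return 0;
}
```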
Output result:
```bash
$ sudo ./traverse ffff9fa791151740
program name is: pixie
program name is: bash
program name is: kworker/u16:1
program name is: kworker/u16:2
program name is: kworker/0:0
program name is: kworker/1:1
program name is: kworker/u16:3
program name is: kworker/3:2
program name is: cpptools-srv
program name is: kworker/u16:0
program name is: sudo
program name is: traverse
```
The names of the processes after pixie are listed successfully, and the `traverse` at the end is our own process, so we have extended the previous example to walk the processes in the system.
## Access NULL address legally
Next, we will try to see if we can legally access the address of NULL.
In fact, the NULL address is fully accessible as long as a page table entry maps it to a physical memory page. We can first look at the description in [mmap(2)](https://man7.org/linux/man-pages/man2/mmap.2.html):
> on Linux, the kernel will pick a nearby page boundary (but always above or equal to the value specified by `/proc/sys/vm/mmap_min_addr`) and attempt to create the mapping there.
So addresses below the value in `/proc/sys/vm/mmap_min_addr` are protected and cannot be mapped; we need to change that value to 0 before we can use the NULL address.
So first we change `/proc/sys/vm/mmap_min_addr`:
```bash
$ cat /proc/sys/vm/mmap_min_addr
65536
$ sudo sh -c "echo 0 > /proc/sys/vm/mmap_min_addr"
$ cat /proc/sys/vm/mmap_min_addr
0
```
Next we can try to map and use the NULL address:
```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
int main(int argc, char **argv)
{
int i;
unsigned char *niladdr = NULL;
unsigned char str[] = "Zhejiang Wenzhou pixie shi,xiayu jinshui buhui pang!";
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_ANONYMOUS|MAP_SHARED, -1, 0);
perror("a");
for (i = 0 ; i < sizeof(str); i++) {
niladdr[i] = str[i];
}
printf("using assignment at NULL: %s\n", niladdr);
for (i = 0 ; i < sizeof(str); i++) {
printf ("%c", *((char*)NULL+i));
}
printf ("\n");
getchar();
munmap(0, 4096);
return 0;
}
```
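Output result (the first `printf` shows `(null)` because glibc special-cases a NULL `%s` argument, not because the memory is unreadable):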
```bash
$ sudo ./access0
a: Success
using assignment at NULL: (null)
Zhejiang Wenzhou pixie shi,xiayu jinshui buhui pang!
```
Observe through `crash`:
```bash
crash> ps | grep access0
crash: current context no longer exists -- restoring "crash" context:
8447 8446 1 ffff9ddb907a2e80 IN 0.0 2492 1452 access0
crash> set 8447
PID: 8447
COMMAND: "access0"
TASK: ffff9ddb907a2e80 [THREAD_INFO: ffff9ddb907a2e80]
CPU: 1
STATE: TASK_INTERRUPTIBLE
crash> vtop 0
VIRTUAL PHYSICAL
0 3d8725000
PGD: 2dbf7e000 => 800000038bfd9067
PUD: 38bfd9000 => 426544067
PMD: 426544000 => 32f357067
PTE: 32f357000 => 80000003d8725867
PAGE: 3d8725000
PTE PHYSICAL FLAGS
80000003d8725867 3d8725000 (PRESENT|RW|USER|ACCESSED|DIRTY|NX)
VMA START END FLAGS FILE
ffff9ddc3a17a8f0 0 1000 80000fb dev/zero
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffefce4f61c940 3d8725000 ffff9ddcbea4d598 0 2 17ffffc008001c uptodate,dirty,lru,swapbacked
```
It can be seen that we have successfully mapped the NULL address to the physical memory, and we can also observe the value of the NULL address:
```bash
crash> rd 0 8
0: 676e61696a65685a 756f687a6e655720 Zhejiang Wenzhou
10: 7320656978697020 75796169782c6968  pixie shi,xiayu
20: 697568736e696a20 7020697568756220 jinshui buhui p
30: 0000000021676e61 0000000000000000 ang!..........
```
Exactly the same as what we put in!
So NULL is inaccessible only by convention: to make invalid addresses easy to tell apart from valid ones, a special address called NULL is deliberately left unmapped. At the MMU (Memory Management Unit) level, NULL is no different from any other address.
## Kernel protection mechanisms
**KPTI**
Since sharing one address space between user space and the kernel may leak kernel data, the Linux kernel introduced KPTI (Kernel Page-Table Isolation) to hide the kernel's mappings from user space.
According to [KAISER: hiding the kernel from user space](https://lwn.net/Articles/738975/), KASLR randomizes the position of the kernel in the virtual address space at boot time so that attackers cannot know where the kernel is, and KPTI protects that secret by giving each process a shadow page table that is used in user mode. The shadow page table maps all of user space but only the small part of the kernel needed for system calls and interrupts to work, which hides the rest of the kernel.
However, it is still possible for the kernel base address to leak during a mode switch.
**ASLR**
ASLR is another memory protection mechanism. It places a process's data at unpredictable, random addresses, which makes it harder for attackers to use a stack overflow to jump to a specific location.
Because of these two mechanisms, some unexpected results appear when we use crash to observe memory, so I turn off KPTI and ASLR and observe the memory again to see whether anything differs.
First we turn off KPTI, following the examples in [HOW TO DISABLE PAGE-TABLE ISOLATION ON UBUNTU FOR BENCHMARKING](https://www.stevenrombauts.be/2018/02/how-to-disable-page-table-isolation-on-ubuntu-for-benchmarking/).
Let's first take a look at the status of KPTI in the system:
``` bash
$ cat /sys/devices/system/cpu/vulnerabilities/*
KVM: Mitigation: Split huge pages
Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
Mitigation: Clear CPU buffers; SMT vulnerable
Mitigation: PTI
Mitigation: Speculative Store Bypass disabled via prctl and seccomp
Mitigation: usercopy/swapgs barriers and __user pointer sanitization
Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling
Mitigation: Microcode
Not affected
```
The output contains `Mitigation: PTI`, which means KPTI is enabled.
To turn it off, edit the `GRUB_CMDLINE_LINUX_DEFAULT` parameter in `/etc/default/grub`, like this:
```bash
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pti=off"
```
KPTI is turned off after executing `update-grub` and rebooting:
```bash
$ cat /sys/devices/system/cpu/vulnerabilities/*
KVM: Mitigation: Split huge pages
Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
Mitigation: Clear CPU buffers; SMT vulnerable
Vulnerable
Mitigation: Speculative Store Bypass disabled via prctl and seccomp
Mitigation: usercopy/swapgs barriers and __user pointer sanitization
Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling
Mitigation: Microcode
Not affected
```
Next we turn off ASLR.
According to [How ASLR protects Linux systems from buffer overflow attacks](https://www.networkworld.com/article/3331199/what-does-aslr-do-for-linux.html), the state of ASLR can be read from `/proc/sys/kernel/randomize_va_space`.
```bash
$ cat /proc/sys/kernel/randomize_va_space
2
$ sysctl -a --pattern randomize
kernel.randomize_va_space = 2
```
Here 2 means Full Randomization.
The article also mentions an interesting way to test ASLR: run `ldd` several times and check whether the listed addresses change.
```bash
$ ldd /bin/bash
linux-vdso.so.1 (0x00007ffcf11fe000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f888ba57000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f888ba51000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f888b85f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f888bbd1000)
$ ldd /bin/bash
linux-vdso.so.1 (0x00007ffcbc7c6000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007fa4001c6000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa4001c0000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa3fffce000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa400340000)
```
Next, we turn off ASLR and observe with `ldd` again:
```bash
$ sudo sysctl -w kernel.randomize_va_space=0
kernel.randomize_va_space = 0
$ ldd /bin/bash
linux-vdso.so.1 (0x00007ffff7fce000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007ffff7e51000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ffff7e4b000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7c59000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffff7fcf000)
$ ldd /bin/bash
linux-vdso.so.1 (0x00007ffff7fce000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007ffff7e51000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ffff7e4b000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffff7c59000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffff7fcf000)
```
We can see that we have successfully disabled ASLR.
Then repeat the previous experiment to see if there is any difference.
Starting with the page table offsets, run `test` again to get the index at each page table level.
```bash
$ sudo ./test
addr = 0x7fe93c5e6000
the address is 0x7fe93c5e6000, and the value is 1
the address is 0x7fe93c5e6004, and the value is 9
PGD index = 0xff
PUD index = 0x1a4
PMD index = 0x1e2
PTE index = 0x1e6
```
We can check whether the offsets are as expected, that is, still the index shifted left by three bits.
```bash
crash> ps | grep test
crash: current context no longer exists -- restoring "crash" context:
4310 4309 2 ffff8e83da5edd00 IN 0.0 2492 1424 test
crash> set 4310
PID: 4310
COMMAND: "test"
TASK: ffff8e83da5edd00 [THREAD_INFO: ffff8e83da5edd00]
CPU: 2
STATE: TASK_INTERRUPTIBLE
crash> vtop 0x7ffff7ffb000
VIRTUAL PHYSICAL
7ffff7ffb000 440000000
PGD: 2e98b67f8 => 2f89cb067
PUD: 2f89cbff8 => 42498a067
PMD: 42498adf8 => 403e32067
PTE: 403e32fd8 => 8000000440000267
PAGE: 440000000
PTE PHYSICAL FLAGS
8000000440000267 440000000 (PRESENT|RW|USER|ACCESSED|DIRTY|NX)
VMA START END FLAGS FILE
ffff8e82f2e57930 7ffff7ffb000 7ffff7ffc000 d0444bb /dev/mem
```
According to the output of crash, the offsets are still computed as before. One thing worth noting is that with ASLR turned off, the virtual address no longer changes no matter how many times the program is executed:
```bash
$ sudo ./test
addr = 0x7ffff7ffb000
the address is 0x7ffff7ffb000, and the value is 1
the address is 0x7ffff7ffb004, and the value is 9
PGD index = 0xff
PUD index = 0x1ff
PMD index = 0x1bf
PTE index = 0x1fb
$ sudo ./test
addr = 0x7ffff7ffb000
the address is 0x7ffff7ffb000, and the value is 1
the address is 0x7ffff7ffb004, and the value is 9
PGD index = 0xff
PUD index = 0x1ff
PMD index = 0x1bf
PTE index = 0x1fb
$ sudo ./test
addr = 0x7ffff7ffb000
the address is 0x7ffff7ffb000, and the value is 1
the address is 0x7ffff7ffb004, and the value is 9
PGD index = 0xff
PUD index = 0x1ff
PMD index = 0x1bf
PTE index = 0x1fb
$ sudo ./test
addr = 0x7ffff7ffb000
the address is 0x7ffff7ffb000, and the value is 1
the address is 0x7ffff7ffb004, and the value is 9
PGD index = 0xff
PUD index = 0x1ff
PMD index = 0x1bf
PTE index = 0x1fb
```
Next we check whether non-reserved memory gets a fixed mapping address.
This time we run the pixie program across reboots and watch how its address changes.
```bash
crash> ps | grep pixie
2598 2357 3 ffff979b7e2e8000 IN 0.0 2492 1232 pixie
crash> set 2598
PID: 2598
COMMAND: "pixie"
TASK: ffff979b7e2e8000 [THREAD_INFO: ffff979b7e2e8000]
CPU: 3
STATE: TASK_INTERRUPTIBLE
crash> px 0xffff979b7e2e8000
$1 = 0xffff979b7e2e8000
crash> vtop 0xffff979b7e2e8000
VIRTUAL PHYSICAL
ffff979b7e2e8000 3fe2e8000
PGD DIRECTORY: ffffffffa040a000
PAGE DIRECTORY: 244401067
PUD: 244401368 => 3ffa28063
PMD: 3ffa28f88 => 3fe38b063
PTE: 3fe38b740 => 80000003fe2e8163
PAGE: 3fe2e8000
PTE PHYSICAL FLAGS
80000003fe2e8163 3fe2e8000 (PRESENT|RW|ACCESSED|DIRTY|GLOBAL|NX)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffe5d44ff8ba00 3fe2e8000 ffff979ba281a840 ffff979b7e2e9740 1 17ffffc0010200 slab,head
crash> px 0xffff979b7e2e8000-0x3fe2e8000
$2 = 0xffff979780000000
```
The base address is `0xffff979780000000` for this first run, so let's reboot and see whether it changes.
Reboot and run pixie:
```bash
crash> ps | grep pixie
2666 2437 2 ffff9633e7492e80 IN 0.0 2492 1232 pixie
crash> set 2666
PID: 2666
COMMAND: "pixie"
TASK: ffff9633e7492e80 [THREAD_INFO: ffff9633e7492e80]
CPU: 2
STATE: TASK_INTERRUPTIBLE
crash> vtop 0xffff9633e7492e80
VIRTUAL PHYSICAL
ffff9633e7492e80 427492e80
PGD DIRECTORY: ffffffff9ce0a000
PAGE DIRECTORY: 103401067
PUD: 103401678 => 42e3aa063
PMD: 42e3aa9d0 => 4273be063
PTE: 4273be490 => 8000000427492163
PAGE: 427492000
PTE PHYSICAL FLAGS
8000000427492163 427492000 (PRESENT|RW|ACCESSED|DIRTY|GLOBAL|NX)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffe865d09d2480 427492000 dead000000000400 0 0 17ffffc0000000
crash> px 0xffff9633e7492e80-0x427492e80
$1 = 0xffff962fc0000000
```
:::warning
:bell: After rebooting, I found that the base address is still different, so the previous assumption was wrong.
:::
Then let's try the `Spectre` mitigations the teacher mentioned. `Spectre` uses branch prediction and speculative execution on modern CPUs to bypass access control and read privileged data without modifying memory.
This time, let's turn off Linux's defense mechanisms against `Spectre`.
According to [Spectre Side Channels](https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/spectre.html), we can turn off the `Spectre` mitigations through the grub configuration file.
```bash
GRUB_CMDLINE_LINUX_DEFAULT="nospectre_v1 nospectre_v2 nopti quiet splash"
```
Then restart the machine and use crash to test it.
```bash
crash> ps | grep pixie
3234 2949 6 ffff948c0457c5c0 IN 0.0 2492 1232 pixie
crash> set 3234
PID: 3234
COMMAND: "pixie"
TASK: ffff948c0457c5c0 [THREAD_INFO: ffff948c0457c5c0]
CPU: 6
STATE: TASK_INTERRUPTIBLE
crash> vtop 0xffff948c0457c5c0
VIRTUAL PHYSICAL
ffff948c0457c5c0 34457c5c0
PGD DIRECTORY: ffffffffa6e0a000
PAGE DIRECTORY: 268201067
PUD: 268201180 => 80000003400001e3
PMD: 340000110 => 0
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff03a0d115f00 34457c000 dead000000000400 0 0 17ffffc0000000
crash> px 0xffff948c0457c5c0-0x34457c5c0
$1 = 0xffff9488c0000000
```
The base address is `0xffff9488c0000000`. Now reboot again:
```bash
crash> ps | grep pixie
3249 3022 6 ffff998ac5bd0000 IN 0.0 2492 1232 pixie
crash> set 3249
PID: 3249
COMMAND: "pixie"
TASK: ffff998ac5bd0000 [THREAD_INFO: ffff998ac5bd0000]
CPU: 6
STATE: TASK_INTERRUPTIBLE
crash> vtop 0xffff998ac5bd0000
VIRTUAL PHYSICAL
ffff998ac5bd0000 305bd0000
PGD DIRECTORY: ffffffffb620a000
PAGE DIRECTORY: 344e01067
PUD: 344e01158 => 80000003000001e3
PMD: 300000168 => 280000000a
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff1ed0c16f400 305bd0000 ffff998bb153e840 ffff998ac5bd1740 1 17ffffc0010200 slab,head
crash> px 0xffff998ac5bd0000-0x305bd0000
$1 = 0xffff9987c0000000
```
:::warning
:bell: The default base address of the two boots is still different, so it may be caused by other defense mechanisms.
:::
Following the teacher's hint, I turn off KASLR to see whether we get the expected result. To do so, add `nokaslr` to `GRUB_CMDLINE_LINUX_DEFAULT` in the grub configuration.
After restarting, let's observe the results:
```bash
crash> ps | grep pixie
3890 3235 1 ffff8883cd758000 IN 0.0 2492 1224 pixie
crash> set 3890
PID: 3890
COMMAND: "pixie"
TASK: ffff8883cd758000 [THREAD_INFO: ffff8883cd758000]
CPU: 1
STATE: TASK_INTERRUPTIBLE
crash> vtop 0xffff8883cd758000
VIRTUAL PHYSICAL
ffff8883cd758000 3cd758000
PGD DIRECTORY: ffffffff8260a000
PAGE DIRECTORY: 3001067
PUD: 3001078 => 3ffdd1063
PMD: 3ffdd1358 => 3cd6f9063
PTE: 3cd6f9ac0 => 80000003cd758163
PAGE: 3cd758000
PTE PHYSICAL FLAGS
80000003cd758163 3cd758000 (PRESENT|RW|ACCESSED|DIRTY|GLOBAL|NX)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea000f35d600 3cd758000 ffff888404f38540 ffff8883cd759740 1 17ffffc0010200 slab,head
crash> px 0xffff8883cd758000-0x3cd758000
$1 = 0xffff888000000000
```
This time the final result is our expected base address `0xffff888000000000`!
:::success
Therefore, we can infer that KASLR is the main mechanism that changes the kernel's direct-mapping base address at each boot.
:::