## Introduction

Common methods for troubleshooting in a Linux environment.

## Categories

### Disk Space Usage

Check the disk space used by each directory, starting from the root directory. `sort -h` is used so that the human-readable sizes printed by `du -h` sort correctly.

```
sudo du -h / | sort -h
```

If logs or the journal take up a large amount of space, they can usually be deleted directly.

If containerd or Docker is responsible, the likely cause is too many images.

### CPU and Memory Usage

Use `htop` to find the abnormal process.

```
htop
```

Then use `pstree` to trace the related processes.

```
pstree -ps <pid>
```

`lsof` lists the files that the process currently has open.

```
lsof -p <pid>
```

`strace` prints the system calls that the process is making.

```
strace -p <pid>
```

If an abnormal signal shows up, for example `SIGILL`, use `gdb` to stop when the signal is delivered and get the backtrace at that point.

```
gdb -p <pid>
# "stop" is already the default setting for this signal
handle <signal> stop
# resume the process until the signal arrives
continue
bt
```

```
Thread 1 "node /code/dist" received signal SIGILL, Illegal instruction.
0x00005653314a9923 in v8::base::OS::Abort() ()
(gdb) handle SIGILL stop
Signal        Stop      Print   Pass to program Description
SIGILL        Yes       Yes     Yes             Illegal instruction
(gdb) bt
#0  0x00005653314a9923 in v8::base::OS::Abort() ()
#1  0x00005653326bff74 in V8_Fatal(char const*, ...) ()
#2  0x000056533199ccbe in v8::internal::Scavenger::Process(v8::internal::OneshotBarrier*) ()
#3  0x00005653319a466a in v8::internal::ScavengingTask::RunInParallel(v8::internal::ItemParallelJob::Task::Runner) ()
#4  0x00005653319292bc in v8::internal::ItemParallelJob::Run() ()
#5  0x00005653319a2100 in v8::internal::ScavengerCollector::CollectGarbage() ()
#6  0x000056533190c9fd in v8::internal::Heap::Scavenge() ()
#7  0x000056533191bb20 in v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) ()
#8  0x000056533191c1ce in v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
#9  0x0000565331920428 in v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) ()
#10 0x00005653318dda07 in v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) ()
#11 0x0000565331ca779f in v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) ()
#12 0x000056533209c759 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit ()
#13 0x0000565332104d80 in Builtins_Load_FastDoubleElements_0 ()
#14 0x0000000000000000 in ?? ()
(gdb)
```

### Disk I/O

### Network I/O

## Cases

### containerd or Docker images causing high disk usage

Images are usually subject to a rotation policy. If the high disk usage turns out to be caused by images, the rotation policy may need to be adjusted.

- containerd: remove unused images

```bash
crictl rmi --prune
```

- Docker: remove unused images (by default only dangling images are removed; add `-a` to also remove images not used by any container)

```bash
docker image prune
```
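
Before pruning, it can help to confirm that image storage really is what fills the disk. The following is a minimal sketch, not part of the original notes: `largest_dirs` is a hypothetical helper name, and it assumes GNU `du` and `sort` are available. It reports sizes in KiB so that a plain numeric sort orders them correctly.

```bash
#!/usr/bin/env bash
# Sketch: list the N largest immediate subdirectories of a path.
# `largest_dirs` is a hypothetical helper, not an existing tool.
largest_dirs() {
  local target="${1:-/}"   # directory to inspect (defaults to /)
  local count="${2:-10}"   # number of lines to print (defaults to 10)
  # -x: stay on one filesystem
  # -k: print sizes in KiB, so `sort -n` compares them correctly
  # --max-depth=1: report only immediate children plus the target's own total
  du -xk --max-depth=1 "$target" 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `largest_dirs /var/lib 5` run from a root shell prints the total for `/var/lib` first, followed by its four largest subdirectories. On a default installation the image data is often under `/var/lib/containerd` or `/var/lib/docker`, but those paths are an assumption and depend on how the runtime is configured.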