# [Debug] Continuous HTTP requests to a web server result in HTTP/1.1 404 Not Found and remotely manage lost
###### tags: `debug`, `http`, `web server`, `file descriptor`, `leak`, `memory leak`, `fd leak`, `PRTG`, `goahead`
[toc]
## Preface
This issue is reported by customers. They use a tool named PRTG to monitor the netwrok. Unfortunately, they cannot control and manage the switch through SSH, telnet and web after a few days. For web, both of HTTP and HTTPS are failed.
First, I tried to reproduce this issue with HTTPS as what customers did. After several hours, I cound not access the login webpage which turned into ***404 Not Found***, but I could login through SSH or telnet. We kept this test after several days, we found it turned into ***a memory leak issue***. According to the serial-port log, it looked like a TLS relating problem at first glance. In order to make it simpler, I tried to make HTTP connection and got only 404 Not Found issue.
Here, I am going to share these two issues together in detail.
## Environment
In this scenario, we have a DUT and a host. The DUT is a L2 switch and has a active web server. And the host is installed a network monitoring tool named [PRTG Network Monitor](http://www.acom-networks.com/32178363353164929702-paessler-prtg.html). We connect a DUT with the host and then execute PRTG on the host. A HTTP sensor is added to our DUT device in PRTG. It will launch http requests such as **HTTP get or post** to DUT periodically, and here is 30 sec.
## Symptoms
### 404 Not Found issue
After a burn-in test (about 9 hours regularly), we cannot access the management web which turned into **404 Not Found**. We observed that
1. No memory leak.
2. Hankshakes between HTTP server and client are normal.<div style="text-align: center"><img src="https://i.imgur.com/0pzpxeY.png"/></div></br>
3. Sometimes the client sent TCP port numbers reused packets to server.<div style="text-align: center"><img src="https://i.imgur.com/hUKI9UJ.png"/></div></br>
4. This failure always occurs after about 9-hour testing.
5. In most of time, it regularly fails after approximately 1000 times of HTTP requests from a client to a server.<div style="text-align: center"><img src="https://i.imgur.com/RE5kb9Y.png"/></div></br>
### Memory leak issue
We put this system to the test for a couple of days (at least 3 days). And we found that there were lots of error messages relating TLS from goahead and memory leak was followed. Finally, the Out of Memory (OOM) killer killed plenty of processes based on the oom-score. The largest one will be killed first. You can check it by
```shell
Tomas$ cat /proc/[pid]/oom_score
10
```
Those outputs look like
:::spoiler
```
Serial-COM6-23:03:40.305 inetd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Serial-COM6-23:03:40.439 [<c002888c>] (dump_stack+0x0/0x14) from [<c005ea94>] (out_of_memory+0x78/0x250)
Serial-COM6-23:03:40.439 [<c005ea1c>] (out_of_memory+0x0/0x250) from [<c0060614>] (__alloc_pages+0x23c/0x2c8)
Serial-COM6-23:03:40.572 [<c00603d8>] (__alloc_pages+0x0/0x2c8) from [<c005bdf4>] (page_cache_read+0x48/0xb0)
Serial-COM6-23:03:40.704 [<c005bdac>] (page_cache_read+0x0/0xb0) from [<c005c338>] (filemap_nopage+0x1c4/0x30c)
Serial-COM6-23:03:40.705 [<c005c174>] (filemap_nopage+0x0/0x30c) from [<c0067220>] (__handle_mm_fault+0x15c/0xc54)
Serial-COM6-23:03:40.838 [<c00670c4>] (__handle_mm_fault+0x0/0xc54) from [<c002a2ac>] (do_page_fault+0xe8/0x23c)
Serial-COM6-23:03:40.971 [<c002a1c4>] (do_page_fault+0x0/0x23c) from [<c002a4a4>] (do_translation_fault+0x18/0x80)
Serial-COM6-23:03:41.105 [<c002a48c>] (do_translation_fault+0x0/0x80) from [<c002319c>] (do_PrefetchAbort+0x18/0x1c)
Serial-COM6-23:03:41.105 r7:400760f4 r6:0000003c r5:0000003c r4:ffffffff
Serial-COM6-23:03:41.238 [<c0023184>] (do_PrefetchAbort+0x0/0x1c) from [<c0023da8>] (ret_from_exception+0x0/0x10)
Serial-COM6-23:03:41.239 Exception stack(0xc3f05fb0 to 0xc3f05ff8)
Serial-COM6-23:03:41.372 5fa0: 496842a2 0000003c 400760f4 ffff51fc
Serial-COM6-23:03:41.372 5fc0: 496842a2 0000003c 0000003c 400760f4 40071e54 00020e6f 40067050 400760f4
Serial-COM6-23:03:41.505 5fe0: 40071ed4 bee2b1f8 4003bd74 0006963c 20000010 ffffffff
Serial-COM6-23:03:41.505 Mem-info:
Serial-COM6-23:03:41.505 DMA per-cpu:
Serial-COM6-23:03:41.639 CPU 0: Hot: hi: 18, btch: 3 usd: 2 Cold: hi: 6, btch: 1 usd: 2
Serial-COM6-23:03:41.639 Active:12737 inactive:701 dirty:0 writeback:0 unstable:0
Serial-COM6-23:03:41.772 free:252 slab:1171 mapped:58 pagetables:93 bounce:0
Serial-COM6-23:03:41.906 DMA free:1008kB min:1016kB low:1268kB high:1524kB active:50948kB inactive:2804kB present:65024kB pages_scanned:83198 all_unreclaimable? yes
Serial-COM6-23:03:41.906 lowmem_reserve[]: 0 0
Serial-COM6-23:03:42.043 DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1008kB
Serial-COM6-23:03:42.043 Free swap: 0kB
Serial-COM6-23:03:42.043 16384 pages of RAM
Serial-COM6-23:03:42.043 359 free pages
Serial-COM6-23:03:42.043 857 reserved pages
Serial-COM6-23:03:42.043 1171 slab pages
Serial-COM6-23:03:42.177 1033 pages shared
Serial-COM6-23:03:42.177 0 pages swap cached
Serial-COM6-23:03:42.178 Out of memory: kill process 308 (blahblahblah) score 22 or a child
Serial-COM6-23:03:42.178 Killed process 308 (blahblahblah)
Serial-COM6-23:03:42.309 cmd-server2 invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Serial-COM6-23:03:42.442 [<c002888c>] (dump_stack+0x0/0x14) from [<c005ea94>] (out_of_memory+0x78/0x250)
Serial-COM6-23:03:42.442 [<c005ea1c>] (out_of_memory+0x0/0x250) from [<c0060614>] (__alloc_pages+0x23c/0x2c8)
Serial-COM6-23:03:42.576 [<c00603d8>] (__alloc_pages+0x0/0x2c8) from [<c005bdf4>] (page_cache_read+0x48/0xb0)
Serial-COM6-23:03:42.710 [<c005bdac>] (page_cache_read+0x0/0xb0) from [<c005c338>] (filemap_nopage+0x1c4/0x30c)
Serial-COM6-23:03:42.710 [<c005c174>] (filemap_nopage+0x0/0x30c) from [<c0067220>] (__handle_mm_fault+0x15c/0xc54)
Serial-COM6-23:03:42.843 [<c00670c4>] (__handle_mm_fault+0x0/0xc54) from [<c002a2ac>] (do_page_fault+0xe8/0x23c)
Serial-COM6-23:03:42.976 [<c002a1c4>] (do_page_fault+0x0/0x23c) from [<c002a4a4>] (do_translation_fault+0x18/0x80)
Serial-COM6-23:03:42.977 [<c002a48c>] (do_translation_fault+0x0/0x80) from [<c002319c>] (do_PrefetchAbort+0x18/0x1c)
Serial-COM6-23:03:43.110 r7:00000003 r6:000216e0 r5:00000003 r4:ffffffff
Serial-COM6-23:03:43.243 [<c0023184>] (do_PrefetchAbort+0x0/0x1c) from [<c0023da8>] (ret_from_exception+0x0/0x10)
Serial-COM6-23:03:43.244 Exception stack(0xc2ee9fb0 to 0xc2ee9ff8)
Serial-COM6-23:03:43.377 9fa0: 00000000 be87fa78 00000000 00000000
Serial-COM6-23:03:43.377 9fc0: 00000000 00000003 000216e0 00000003 00000001 0000e120 000216a0 00000000
Serial-COM6-23:03:43.510 9fe0: 0002154c be87f2b8 0000b2bc 0000b2bc 00000010 ffffffff
Serial-COM6-23:03:43.511 Mem-info:
Serial-COM6-23:03:43.511 DMA per-cpu:
Serial-COM6-23:03:43.643 CPU 0: Hot: hi: 18, btch: 3 usd: 1 Cold: hi: 6, btch: 1 usd: 2
Serial-COM6-23:03:43.643 Active:12737 inactive:714 dirty:0 writeback:0 unstable:0
Serial-COM6-23:03:43.643 free:252 slab:1172 mapped:58 pagetables:93 bounce:0
Serial-COM6-23:03:43.910 DMA free:1008kB min:1016kB low:1268kB high:1524kB active:50948kB inactive:2856kB present:65024kB pages_scanned:83230 all_unreclaimable? yes
Serial-COM6-23:03:43.910 lowmem_reserve[]: 0 0
Serial-COM6-23:03:44.047 DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1008kB
Serial-COM6-23:03:44.047 Free swap: 0kB
Serial-COM6-23:03:44.047 16384 pages of RAM
Serial-COM6-23:03:44.047 358 free pages
Serial-COM6-23:03:44.047 857 reserved pages
Serial-COM6-23:03:44.048 1172 slab pages
Serial-COM6-23:03:44.048 1020 pages shared
Serial-COM6-23:03:44.085 0 pages swap cached
...
...
...
Serial-COM6-00:03:37.487 kthreadd invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
Serial-COM6-00:03:37.654 [<c002888c>] (dump_stack+0x0/0x14) from [<c005ea94>] (out_of_memory+0x78/0x250)
Serial-COM6-00:03:37.753 [<c005ea1c>] (out_of_memory+0x0/0x250) from [<c0060614>] (__alloc_pages+0x23c/0x2c8)
Serial-COM6-00:03:37.753 [<c00603d8>] (__alloc_pages+0x0/0x2c8) from [<c006071c>] (__get_free_pages+0x28/0x5c)
Serial-COM6-00:03:37.887 [<c00606f4>] (__get_free_pages+0x0/0x5c) from [<c0036ffc>] (copy_process+0x80/0xfec)
Serial-COM6-00:03:38.258 [<c0036f7c>] (copy_process+0x0/0xfec) from [<c0038028>] (do_fork+0xc0/0x1c8)
Serial-COM6-00:03:38.259 [<c0037f68>] (do_fork+0x0/0x1c8) from [<c0024fa0>] (kernel_thread+0x70/0x80)
Serial-COM6-00:03:38.259 [<c0024f30>] (kernel_thread+0x0/0x80) from [<c004c5bc>] (kthreadd+0xb4/0x100)
Serial-COM6-00:03:38.260 r8:c029ed70 r7:c0380000 r6:0000069f r5:c37e3f50 r4:c37e3f2c
Serial-COM6-00:03:38.326 [<c004c508>] (kthreadd+0x0/0x100) from [<c003bd88>] (do_exit+0x0/0x854)
Serial-COM6-00:03:38.326 Mem-info:
Serial-COM6-00:03:38.327 DMA per-cpu:
Serial-COM6-00:03:38.460 CPU 0: Hot: hi: 18, btch: 3 usd: 3 Cold: hi: 6, btch: 1 usd: 0
Serial-COM6-00:03:38.460 Active:12637 inactive:917 dirty:0 writeback:0 unstable:0
Serial-COM6-00:03:38.567 free:254 slab:1157 mapped:21 pagetables:63 bounce:0
Serial-COM6-00:03:38.688 DMA free:1016kB min:1016kB low:1268kB high:1524kB active:50548kB inactive:3668kB present:65024kB pages_scanned:81973 all_unreclaimable? yes
Serial-COM6-00:03:38.688 lowmem_reserve[]: 0 0
Serial-COM6-00:03:38.824 DMA: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1016kB
Serial-COM6-00:03:38.824 Free swap: 0kB
Serial-COM6-00:03:38.824 16384 pages of RAM
Serial-COM6-00:03:38.824 320 free pages
Serial-COM6-00:03:38.824 857 reserved pages
Serial-COM6-00:03:38.875 1157 slab pages
Serial-COM6-00:03:38.875 841 pages shared
Serial-COM6-00:03:38.892 0 pages swap cached
Serial-COM6-00:03:40.840 printk: 367 messages suppressed.
Serial-COM6-00:03:40.841 kthreadd invoked oom-killer: gfp_mask=0xd0, order=1, oomkilladj=0
Serial-COM6-00:03:40.972 [<c002888c>] (dump_stack+0x0/0x14) from [<c005ea94>] (out_of_memory+0x78/0x250)
Serial-COM6-00:03:41.106 [<c005ea1c>] (out_of_memory+0x0/0x250) from [<c0060614>] (__alloc_pages+0x23c/0x2c8)
Serial-COM6-00:03:41.106 [<c00603d8>] (__alloc_pages+0x0/0x2c8) from [<c006071c>] (__get_free_pages+0x28/0x5c)
Serial-COM6-00:03:41.240 [<c00606f4>] (__get_free_pages+0x0/0x5c) from [<c0036ffc>] (copy_process+0x80/0xfec)
Serial-COM6-00:03:41.377 [<c0036f7c>] (copy_process+0x0/0xfec) from [<c0038028>] (do_fork+0xc0/0x1c8)
Serial-COM6-00:03:41.383 [<c0037f68>] (do_fork+0x0/0x1c8) from [<c0024fa0>] (kernel_thread+0x70/0x80)
Serial-COM6-00:03:41.527 [<c0024f30>] (kernel_thread+0x0/0x80) from [<c004c5bc>] (kthreadd+0xb4/0x100)
Serial-COM6-00:03:41.528 r8:c029ed70 r7:c0380000 r6:0000069f r5:c37e3f50 r4:c37e3f2c
Serial-COM6-00:03:41.640 [<c004c508>] (kthreadd+0x0/0x100) from [<c003bd88>] (do_exit+0x0/0x854)
Serial-COM6-00:03:41.640 Mem-info:
Serial-COM6-00:03:41.640 DMA per-cpu:
Serial-COM6-00:03:41.773 CPU 0: Hot: hi: 18, btch: 3 usd: 3 Cold: hi: 6, btch: 1 usd: 0
Serial-COM6-00:03:41.773 Active:13403 inactive:151 dirty:0 writeback:0 unstable:0
Serial-COM6-00:03:41.906 free:254 slab:1157 mapped:21 pagetables:63 bounce:0
Serial-COM6-00:03:42.040 DMA free:1016kB min:1016kB low:1268kB high:1524kB active:53612kB inactive:604kB present:65024kB pages_scanned:85749 all_unreclaimable? yes
Serial-COM6-00:03:42.040 lowmem_reserve[]: 0 0
Serial-COM6-00:03:42.176 DMA: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1016kB
Serial-COM6-00:03:42.177 Free swap: 0kB
Serial-COM6-00:03:42.177 16384 pages of RAM
Serial-COM6-00:03:42.177 320 free pages
Serial-COM6-00:03:42.177 857 reserved pages
Serial-COM6-00:03:42.177 1157 slab pages
Serial-COM6-00:03:42.205 841 pages shared
Serial-COM6-00:03:42.220 0 pages swap cached
Serial-COM6-00:04:03.425 goahead: 0: CTR_DRBG - The entropy source failed: error -0x34
Serial-COM6-00:04:03.441 goahead: 0: Cannot upgrade to TLS
Serial-COM6-00:04:59.668 goahead: 0: CTR_DRBG - The entropy source failed: error -0x34
Serial-COM6-00:04:59.734 goahead: 0: Cannot upgrade to TLS
Serial-COM6-00:05:25.578 goahead: 0: SSL - Handshake protocol not within min/max boundaries: error -0x6e80
Serial-COM6-00:05:25.594 goahead: 0: Cannot upgrade to TLS
Serial-COM6-00:05:40.339 goahead: 0: SSL - Processing of the ClientHello handshake message failed: error -0x7900
Serial-COM6-00:05:40.355 goahead: 0: Cannot upgrade to TLS
Serial-COM6-00:07:12.728 goahead: 0: CTR_DRBG - The entropy source failed: error -0x34
Serial-COM6-00:07:15.837 goahead: 0: Cannot upgrade to TLS
Serial-COM6-00:08:51.519 printk: 9 messages suppressed.
Serial-COM6-00:08:51.519 inetd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Serial-COM6-00:08:51.653 [<c002888c>] (dump_stack+0x0/0x14) from [<c005ea94>] (out_of_memory+0x78/0x250)
Serial-COM6-00:08:51.786 [<c005ea1c>] (out_of_memory+0x0/0x250) from [<c0060614>] (__alloc_pages+0x23c/0x2c8)
Serial-COM6-00:08:51.786 [<c00603d8>] (__alloc_pages+0x0/0x2c8) from [<c005bdf4>] (page_cache_read+0x48/0xb0)
Serial-COM6-00:08:51.920 [<c005bdac>] (page_cache_read+0x0/0xb0) from [<c005c338>] (filemap_nopage+0x1c4/0x30c)
Serial-COM6-00:08:52.053 [<c005c174>] (filemap_nopage+0x0/0x30c) from [<c0067220>] (__handle_mm_fault+0x15c/0xc54)
Serial-COM6-00:08:52.054 [<c00670c4>] (__handle_mm_fault+0x0/0xc54) from [<c002a2ac>] (do_page_fault+0xe8/0x23c)
Serial-COM6-00:08:52.187 [<c002a1c4>] (do_page_fault+0x0/0x23c) from [<c00231d8>] (do_DataAbort+0x38/0x98)
Serial-COM6-00:08:52.320 [<c00231a0>] (do_DataAbort+0x0/0x98) from [<c0023da8>] (ret_from_exception+0x0/0x10)
Serial-COM6-00:08:52.320 Exception stack(0xc3f05fb0 to 0xc3f05ff8)
Serial-COM6-00:08:52.454 5fa0: 00086044 0006db29 0001851a 00086043
Serial-COM6-00:08:52.454 5fc0: 0000002c 00000007 0000000d 000733cb 40075516 00000001 00000016 00000007
Serial-COM6-00:08:52.588 5fe0: 0008411c bee2b828 0000d964 400482dc 60000010 ffffffff
Serial-COM6-00:08:52.588 Mem-info:
Serial-COM6-00:08:52.588 DMA per-cpu:
Serial-COM6-00:08:52.720 CPU 0: Hot: hi: 18, btch: 3 usd: 2 Cold: hi: 6, btch: 1 usd: 3
Serial-COM6-00:08:52.721 Active:13063 inactive:499 dirty:0 writeback:0 unstable:0
Serial-COM6-00:08:52.854 free:252 slab:1154 mapped:22 pagetables:60 bounce:0
Serial-COM6-00:08:52.988 DMA free:1008kB min:1016kB low:1268kB high:1524kB active:52252kB inactive:1996kB present:65024kB pages_scanned:80627 all_unreclaimable? yes
Serial-COM6-00:08:52.988 lowmem_reserve[]: 0 0
Serial-COM6-00:08:53.336 DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1008kB
Serial-COM6-00:08:53.336 Free swap: 0kB
Serial-COM6-00:08:53.336 16384 pages of RAM
Serial-COM6-00:08:53.336 316 free pages
Serial-COM6-00:08:53.336 857 reserved pages
Serial-COM6-00:08:53.338 1154 slab pages
Serial-COM6-00:08:53.338 843 pages shared
Serial-COM6-00:08:53.339 0 pages swap cached
Serial-COM6-00:08:53.339 Out of memory: kill process 323 (goahead) score 9 or a child
Serial-COM6-00:08:53.339 Killed process 323 (goahead)
```
:::
</br>
## Root cause analysis 1
### 404 Not Foud issue
We enabled the debugging options and set a loglevel, and found it failed because the login webpage cannot open normally. Why this webpage cannot be opened after being opened about 1000 times?
### Hypothesis
I guess that
1. If the file name of this webpage is correct?
2. If the path of directory for this webpage is correct?
3. **If file descriptor is leaking?**
### Validation
It is easy to check both hypothesis 1 and 2 by adding debugging messages.
In hypothesis 3, we need to enter some commands in shell as shown below.
```shell
# To check out what is the present open file limit in Linux System
Tomas$ ulimit -n
1024
# To check the pid of our target process
Tomas$ ps
PID USER VSZ STAT COMMAND
1 root 1136 S init
. .... .... .. ...
123 root 4567 S /bin/mytest
# To count the number of opened fd in our target process
Tomas$ ls -l /proc/123/fd | wc -l
1023
# To list these fd
Tomas$ ls -l /proc/323/fd
lr-x------ 1 64 Jan 1 00:12 0 -> /dev/null
lrwx------ 1 64 Jan 1 00:12 1 -> /dev/console
l-wx------ 1 64 Jan 1 00:12 2 -> /dev/console
lrwx------ 1 64 Jan 1 00:12 3 -> socket:[873]
lrwx------ 1 64 Jan 1 00:12 4 -> socket:[874]
lrwx------ 1 64 Jan 1 00:12 5 -> socket:[875]
... .. .. ... . .... ...............
```
After I fixed this fd leak issue, I tested it with HTTPS connection. Luckily, it did not fail. But, if I used browser to open the webpage again and again, the used memory of goahead got larger and larger.
```
Tomas $ ps
PID USER VSZ STAT COMMAND
391 root 14228 S goahead --log stdout:4 --route ....
```
## Root cause analysis 2
### Memory leak issue
According aforementioned observation, it was definitely a memory issue. So, how to fix it?
### Solution
After tracing the code according to dumpped error messages, I located this bug. And then we can solve it easily by cloning the newest version of goahead and diff it.
## Summary
From above, at first, we confirm this is a file descriptor leak issue. This is often caused when we opened a file and forgot to closed it again and again. In this case, I found that two times of fopen are called but only one time of fclose are called. Finally, the resources are exhausted.
Secondly, the memory leak issue is diffcult to reproduce. It costs lots of time and effort to monitor amounts of memory used in a process. The sign of memory leak is that the memory consumption increases with time linearly.
## Reference
http://www.acom-networks.com/32178363353164929702-paessler-prtg.html
https://www.paessler.com/prtg?gclid=CjwKCAiAnfjyBRBxEiwA-EECLJihb3gTyf9sD01uqBdnayrMaI3AZDIZa_3-KgX2iwwPIl1vCZGVuhoCWnoQAvD_BwE
https://www.linuxtechi.com/set-ulimit-file-descriptors-limit-linux-servers/
https://blog.toright.com/posts/1203/%E6%B7%BA%E8%AB%87-http-method%EF%BC%9A%E8%A1%A8%E5%96%AE%E4%B8%AD%E7%9A%84-get-%E8%88%87-post-%E6%9C%89%E4%BB%80%E9%BA%BC%E5%B7%AE%E5%88%A5%EF%BC%9F.html