# Docker loading test note
###### tags: `docker`
1024 containers at most in a single docker network
ref: https://success.mirantis.com/article/maximum-containers-per-engine
https://stackoverflow.com/questions/21799382/is-there-a-maximum-number-of-containers-running-on-a-docker-host
https://blog.widodh.nl/2015/10/maximum-amount-of-docker-containers-on-a-single-host/
****
HPC: 100.74.53.2
```
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic
```
Trial cases
1. `docker network create -d macvlan --subnet=100.74.53.0/24 --gateway=100.74.53.1 mlsteam-net`
   Each container gets a different private IP. The master node cannot match the server, and the repository cannot be connected.
2. `docker network create -d macvlan --subnet=172.10.0.0/16 --gateway=172.10.0.1 mlsteam-net`
   The master node matches the server IP and the repository can be connected, but none of the containers can ping 8.8.8.8 (see the docker-py sketch below).

If we want to use swarm mode with macvlan, this page may be a useful reference (not tried yet): https://collabnix.com/docker-17-06-swarm-mode-now-with-macvlan-support/
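For scripting trial 2 with the docker SDK instead of the CLI, a minimal sketch is shown below. The parent NIC name `eth0` and the busybox test image are assumptions, not part of the original trials.
```python
import docker

# minimal docker-py sketch of trial 2; "eth0" must be replaced by the real host NIC
client = docker.from_env()
ipam = docker.types.IPAMConfig(
    pool_configs=[docker.types.IPAMPool(subnet="172.10.0.0/16", gateway="172.10.0.1")]
)
client.networks.create("mlsteam-net", driver="macvlan", ipam=ipam,
                       options={"parent": "eth0"})

# connectivity check: with this subnet the trial above found 8.8.8.8 unreachable
try:
    client.containers.run("busybox", "ping -c 1 8.8.8.8",
                          network="mlsteam-net", remove=True)
    print("8.8.8.8 reachable")
except docker.errors.ContainerError:
    print("8.8.8.8 not reachable from mlsteam-net")
```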
> 2020/11/10 Update
## Docker loading test
### Case 1
1. Create and run at once
2. More than 2000 containers
```python=
import subprocess
import time

num = 2000
for i in range(num):
    cmd = "docker create ...."
    subprocess.Popen(cmd, shell=True)
    cmd = "docker start ...."
    subprocess.Popen(cmd, shell=True)
```
:::danger
Error messages:
1. `docker container state OCI runtime create failed: container with id exists:unknown`
2. `Error response from daemon: failed to create endpoint u9999 on network mlsteam-net: adding interface vethfb3035c to bridge br-16b2b4e4cc93 failed: exchange full`
3. `Oct 27 08:51:50 hpc1 python3[17041]: runtime/cgo: pthread_create failed: Resource temporarily unavailable
Oct 27 08:51:50 hpc1 python3[17041]: SIGABRT: abort`
4. `Error response from daemon: ttrpc: closed: unknown
Error: failed to start containers: n267`
:::
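Error (2), `exchange full`, is the Linux bridge refusing a new port: a kernel bridge holds at most 1024 ports, and every container on the network adds one veth to `br-16b2b4e4cc93`, which matches the "1024 containers in a single docker network" note at the top. A small sketch for checking how close a bridge is to that limit (the bridge name is copied from the error and is host-specific):
```python
import os

# count veth ports attached to the docker bridge; the kernel bridge limit is 1024 ports
bridge = "br-16b2b4e4cc93"
ports = os.listdir("/sys/class/net/{}/brif".format(bridge))
print("{}: {} ports attached (limit 1024)".format(bridge, len(ports)))
```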
### Case 2
1. Pre-create 2000 containers at once
2. After 60 secs, start all containers at once
```python=
import subprocess
import time

num = 2000
for i in range(num):
    cmd = "docker create ...."
    subprocess.Popen(cmd, shell=True)

time.sleep(60)

for i in range(num):
    cmd = "docker start ...."
    subprocess.Popen(cmd, shell=True)
```
:::danger
Error messages:
1. ~~`docker container state OCI runtime create failed: container with id exists:unknown`~~
2. `Error response from daemon: failed to create endpoint u9999 on network mlsteam-net: adding interface vethfb3035c to bridge br-16b2b4e4cc93 failed: exchange full`
3. `Oct 27 08:51:50 hpc1 python3[17041]: runtime/cgo: pthread_create failed: Resource temporarily unavailable
Oct 27 08:51:50 hpc1 python3[17041]: SIGABRT: abort`
4. ~~`Error response from daemon: ttrpc: closed: unknown
Error: failed to start containers: n267`~~
:::
### Case 3
1. Pre-create 1000 containers
2. After 60 secs, start all containers at once
```python=
import subprocess
import time

num = 1000
for i in range(num):
    cmd = "docker create ...."
    subprocess.Popen(cmd, shell=True)

time.sleep(60)

for i in range(num):
    cmd = "docker start ...."
    subprocess.Popen(cmd, shell=True)
```
:::danger
Error messages:
1. ~~`docker container state OCI runtime create failed: container with id exists:unknown`~~
2. ~~`Error response from daemon: failed to create endpoint u9999 on network mlsteam-net: adding interface vethfb3035c to bridge br-16b2b4e4cc93 failed: exchange full`~~
3. `Oct 27 08:51:50 hpc1 python3[17041]: runtime/cgo: pthread_create failed: Resource temporarily unavailable
Oct 27 08:51:50 hpc1 python3[17041]: SIGABRT: abort`
4. ~~`Error response from daemon: ttrpc: closed: unknown
Error: failed to start containers: n267`~~
:::
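With 1000 containers the only error left is (3), `pthread_create failed: Resource temporarily unavailable`, i.e. a process/thread ceiling rather than the network. A minimal sketch for inspecting the limits involved on the host (the per-service `TasksMax` shows up again in the systemctl section at the bottom of this note):
```python
import resource

# per-process limit that `ulimit -u` reports (what pthread_create runs into)
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("RLIMIT_NPROC (ulimit -u): soft={} hard={}".format(soft, hard))

# system-wide thread / pid ceilings
for path in ("/proc/sys/kernel/threads-max", "/proc/sys/kernel/pid_max"):
    with open(path) as f:
        print(path, f.read().strip())
```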
### Case 4
1. Pre-create 1000 containers at once
2. Start the containers in 3 batches of about 330 each
```python=
import subprocess
import time

num = 1000
internal = 3                 # number of start batches
batch = num // internal      # ~333 containers per batch

for i in range(num):
    cmd = "docker create ...."
    subprocess.Popen(cmd, shell=True)

time.sleep(60)

# start in batches: 0-334, 333-667, 666-1000
for b in range(internal):
    for i in range(b * batch, min((b + 1) * batch + 1, num)):
        cmd = "docker start ...."
        subprocess.Popen(cmd, shell=True)
```
:::success
Success !! total 1000 containers
:::
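A possible variation on the same batching idea (a sketch only, not the script that was actually used): instead of a fixed sleep between batches, poll docker until most of the previous batch reports `running`. The container selection below is an assumption; the real script addresses containers by name.
```python
import time
import docker

client = docker.from_env()
num, batches = 1000, 3
batch = num // batches

# assumed: the first `num` containers returned are the pre-created test containers
names = [c.name for c in client.containers.list(all=True)][:num]

for b in range(batches):
    chunk = names[b * batch:(b + 1) * batch + 1]
    for name in chunk:
        client.containers.get(name).start()
    # wait until ~90% of this batch is running before launching the next batch
    deadline = time.time() + 120
    while time.time() < deadline:
        running = {c.name for c in client.containers.list(filters={"status": "running"})}
        if sum(n in running for n in chunk) >= 0.9 * len(chunk):
            break
        time.sleep(5)
```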
### Case 5
1. Pre-create 1000 containers per network at once (2000 total)
2. Start the containers in 3 batches of about 330 each
3. Connect the containers to different docker networks (the default bridge and mlsteam-net)
```python=
import subprocess
import time

num = 1000
internal = 3                 # number of start batches
batch = num // internal      # ~333 containers per batch

for i in range(num):
    cmd = "docker create ...."                        # default bridge network
    subprocess.Popen(cmd, shell=True)
    cmd = "docker create --network=mlsteam-net ...."  # mlsteam-net
    subprocess.Popen(cmd, shell=True)

# start the containers on the default network
time.sleep(60)
# batches: 0-334, 333-667, 666-1000
for b in range(internal):
    for i in range(b * batch, min((b + 1) * batch + 1, num)):
        cmd = "docker start ...."
        subprocess.Popen(cmd, shell=True)

# start the containers on mlsteam-net
time.sleep(60)
# batches: 0-334, 333-667, 666-1000
for b in range(internal):
    for i in range(b * batch, min((b + 1) * batch + 1, num)):
        cmd = "docker start ...."
        subprocess.Popen(cmd, shell=True)
```
:::success
All Success !! total 2000 containers
:::
## MLSteam loading test
2020/11/02
```
Percentage of the requests completed within given times
Type Name # reqs 50% 66% 75% 80% 90% 95% 98% 99% 99.9% 99.99% 100%
------------------------------------------------------------------------------------------------------------------------------------------------------
POST /api/auth/login 466 410 440 560 640 880 1100 1200 1300 1400 1400 1400
POST /api/labs 463 170 220 370 440 680 870 1200 1400 1800 1800 1800
------------------------------------------------------------------------------------------------------------------------------------------------------
None Aggregated 929 380 420 460 560 800 1000 1200 1300 1800 1800 1800
```
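The percentile table above is in locust's report format. Assuming locust was the driver, a rough locustfile hitting the same two endpoints would look like the sketch below; the login payload, lab payload and wait time are placeholders, not the real test data.
```python
from locust import HttpUser, task, between


class MLSteamUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        # POST /api/auth/login (credentials are placeholders)
        self.client.post("/api/auth/login",
                         json={"username": "user", "password": "password"})

    @task
    def create_lab(self):
        # POST /api/labs (payload is a placeholder)
        self.client.post("/api/labs", json={"name": "loadtest-lab"})
```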
### Case 1
:::warning
After 700 containers
Python caught:
`error waiting for container: context canceled"\nError response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \\"rootfs_linux.go:109: jailing process inside rootfs caused \\\\\\"pivot_root invalid argument\\\\\\"\\"": unknown\n`
:::
:::warning
After 800 containers
```
2020-11-02 06:41:07 [DEBUG] b'time="2020-11-02T06:40:58Z" level=error msg="error waiting for container: context canceled"\nError response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \\"rootfs_linux.go:109: jailing process inside rootfs caused \\\\\\"pivot_root invalid argument\\\\\\"\\"": unknown\n'
2020-11-02 06:43:08 [ERROR] [Lab.wait_alive] : lab ue0d23ac response timeout=15.
2020-11-02 06:43:08 [DEBUG] b'time="2020-11-02T06:42:59Z" level=error msg="error waiting for container: context canceled"\nError response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \\"process_linux.go:432: running prestart hook 1 caused \\\\\\"error running hook: exit status 2, stdout: , stderr: runtime: failed to create new OS thread (have 2 already; errno=11)\\\\\\\\nruntime: may need to increase max user processes (ulimit -u)\\\\\\\\nfatal error: newosproc\\\\\\\\n\\\\\\\\nruntime stack:\\\\\\\\nruntime.throw(0x53e895, 0x9)\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/panic.go:1116 +0x72\\\\\\\\nruntime.newosproc(0xc00002e000)\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/os_linux.go:161 +0x1ba\\\\\\\\nruntime.newm1(0xc00002e000)\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/proc.go:1753 +0xdc\\\\\\\\nruntime.newm(0x548970, 0x0)\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/proc.go:1732 +0x8f\\\\\\\\nruntime.main.func1()\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/proc.go:134 +0x36\\\\\\\\nruntime.systemstack(0x45bd74)\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/asm_amd64.s:370 +0x66\\\\\\\\nruntime.mstart()\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/proc.go:1041\\\\\\\\n\\\\\\\\ngoroutine 1 [running]:\\\\\\\\nruntime.systemstack_switch()\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc00002a788 sp=0xc00002a780 pc=0x45be70\\\\\\\\nruntime.main()\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/proc.go:133 +0x70 fp=0xc00002a7e0 sp=0xc00002a788 pc=0x4332d0\\\\\\\\nruntime.goexit()\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00002a7e8 sp=0xc00002a7e0 pc=0x45de01\\\\\\\\n\\\\\\"\\"": unknown\n'
2020-11-02 06:43:08 [DEBUG] [lab.run] : stop lab ue0d23ac, exception, wait alive failed
```
:::
:::warning
It shows a lot of error messages after 900 containers.
:::
:::danger
docker : Error response from daemon: removal of container ### is already in progress

:::
:::danger
After 840 containers
```
File "/usr/lib/python3/dist-packages/requests/models.py", line 935, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: http+docker://localhost/v1.40/containers/68d86854380b0b4111cd5dd5467499d3217a577ecf753272237b3d660eb5601f?v=False&link=False&force=False
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "src/gevent/greenlet.py", line 854, in gevent._gevent_cgreenlet.Greenlet.run
File "/build/mlsteam_agent/mlsteam_agent/lab.py", line 451, in run
self.remove()
File "/build/mlsteam_agent/mlsteam_agent/lab.py", line 133, in remove
raise e
File "/build/mlsteam_agent/mlsteam_agent/lab.py", line 130, in remove
docker.from_env().containers.get(self.uuid).remove()
File "/usr/local/lib/python3.6/dist-packages/docker/models/containers.py", line 351, in remove
return self.client.api.remove_container(self.id, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/docker/utils/decorators.py", line 19, in wrapped
return f(self, resource_id, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/docker/api/container.py", line 1009, in remove_container
self._raise_for_status(res)
File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 261, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "/usr/local/lib/python3.6/dist-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 409 Client Error: Conflict ("You cannot remove a running container 68d86854380b0b4111cd5dd5467499d3217a577ecf753272237b3d660eb5601f. Stop the container before attempting removal or force remove")
2020-11-02T06:47:28Z <Greenlet at 0x7fb0334fdb48: <bound method Lab.run of <mlsteam_agent.lab.Lab object at 0x7fb0334d5320>>> failed with APIError
```
:::
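The 409 here is what the daemon message says: `remove()` is called while the container is still running. A hedged sketch of a stop-then-remove (or force remove) path for `mlsteam_agent/lab.py`; whether the agent should really force-remove at this point is a separate decision.
```python
import docker

# sketch only: stop (or force-remove) before remove, as the 409 message suggests
def remove_container(uuid, timeout=10):
    client = docker.from_env()
    container = client.containers.get(uuid)
    try:
        container.remove()
    except docker.errors.APIError:
        # container is still running: stop it first, then retry the removal
        container.stop(timeout=timeout)
        container.remove()          # or container.remove(force=True)
```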
:::danger
After 900 containers
```
runtime: failed to create new OS thread (have 24 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc
runtime stack:
runtime.throw(0x1af5fbe, 0x9)
/usr/local/go/src/runtime/panic.go:616 +0x81
runtime.newosproc(0xc4204eac00, 0xc420526000)
/usr/local/go/src/runtime/os_linux.go:164 +0x1af
runtime.newm1(0xc4204eac00)
/usr/local/go/src/runtime/proc.go:1879 +0x113
runtime.newm(0x1bc48b0, 0x0)
/usr/local/go/src/runtime/proc.go:1858 +0x9b
runtime.startTheWorldWithSema(0x1, 0x4579f3)
/usr/local/go/src/runtime/proc.go:1155 +0x1d0
runtime.gcMarkTermination.func3()
/usr/local/go/src/runtime/mgc.go:1647 +0x26
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:409 +0x79
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1175
goroutine 36 [running]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:363 fp=0xc4200bb550 sp=0xc4200bb548 pc=0x453e00
runtime.gcMarkTermination(0x3fed900e52bfd36e)
/usr/local/go/src/runtime/mgc.go:1647 +0x407 fp=0xc4200bb720 sp=0xc4200bb550 pc=0x4187d7
runtime.gcMarkDone()
/usr/local/go/src/runtime/mgc.go:1513 +0x22c fp=0xc4200bb748 sp=0xc4200bb720 pc=0x41836c
runtime.gcBgMarkWorker(0xc420053400)
/usr/local/go/src/runtime/mgc.go:1912 +0x2e7 fp=0xc4200bb7d8 sp=0xc4200bb748 pc=0x4192e7
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc4200bb7e0 sp=0xc4200bb7d8 pc=0x456811
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1723 +0x79
goroutine 1 [runnable]:
net/http.(*persistConn).roundTrip(0xc4206da000, 0xc420686e70, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:2033 +0x5a7
net/http.(*Transport).RoundTrip(0xc4201260f0, 0xc420688500, 0xc4201260f0, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:422 +0x8f2
net/http.send(0xc420688500, 0x1c91720, 0xc4201260f0, 0x0, 0x0, 0x0, 0xc42000c870, 0xc42024d710, 0xc4206a36c8, 0x1)
/usr/local/go/src/net/http/client.go:252 +0x185
net/http.(*Client).send(0xc420686d50, 0xc420688500, 0x0, 0x0, 0x0, 0xc42000c870, 0x0, 0x1, 0x417e1a)
/usr/local/go/src/net/http/client.go:176 +0xfa
net/http.(*Client).Do(0xc420686d50, 0xc420688500, 0xc42003c0c0, 0xc420688500, 0xc420023c20)
/usr/local/go/src/net/http/client.go:615 +0x28d
github.com/docker/cli/vendor/golang.org/x/net/context/ctxhttp.Do(0x1cc40c0, 0xc42003c0c0, 0xc420686d50, 0xc420688400, 0x0, 0x27c1be0, 0x2)
/go/src/github.com/docker/cli/vendor/golang.org/x/net/context/ctxhttp/ctxhttp.go:30 +0x6e
github.com/docker/cli/vendor/github.com/docker/docker/client.(*Client).doRequest(0xc42023de80, 0x1cc40c0, 0xc42003c0c0, 0xc420688400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/docker/cli/vendor/github.com/docker/docker/client/request.go:132 +0xbe
github.com/docker/cli/vendor/github.com/docker/docker/client.(*Client).Ping(0xc42023de80, 0x1cc40c0, 0xc42003c0c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/docker/cli/vendor/github.com/docker/docker/client/ping.go:17 +0x17c
github.com/docker/cli/cli/command.(*DockerCli).initializeFromClient(0xc420411b00)
/go/src/github.com/docker/cli/cli/command/cli.go:213 +0x8b
github.com/docker/cli/cli/command.(*DockerCli).Initialize(0xc420411b00, 0xc420481400, 0xc420686930, 0x3)
/go/src/github.com/docker/cli/cli/command/cli.go:197 +0x192
main.newDockerCommand.func2(0xc420672500, 0xc420686930, 0x3, 0x3, 0x0, 0x0)
/go/src/github.com/docker/cli/cmd/docker/docker.go:43 +0x71
github.com/docker/cli/vendor/github.com/spf13/cobra.(*Command).execute(0xc420672500, 0xc420038160, 0x3, 0x3, 0xc420672500, 0xc420038160)
/go/src/github.com/docker/cli/vendor/github.com/spf13/cobra/command.go:741 +0x579
github.com/docker/cli/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc420416c80, 0xc420583fb0, 0x17fc840, 0xc420583fc0)
/go/src/github.com/docker/cli/vendor/github.com/spf13/cobra/command.go:852 +0x30a
github.com/docker/cli/vendor/github.com/spf13/cobra.(*Command).Execute(0xc420416c80, 0xc420416c80, 0x1c91a00)
/go/src/github.com/docker/cli/vendor/github.com/spf13/cobra/command.go:800 +0x2b
main.main()
/go/src/github.com/docker/cli/cmd/docker/docker.go:180 +0xdc
goroutine 5 [syscall]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:139 +0xa6
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.0
/usr/local/go/src/os/signal/signal_unix.go:28 +0x41
goroutine 52 [chan receive]:
github.com/docker/cli/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x27c0f20)
/go/src/github.com/docker/cli/vendor/github.com/golang/glog/glog.go:882 +0x8b
created by github.com/docker/cli/vendor/github.com/golang/glog.init.0
/go/src/github.com/docker/cli/vendor/github.com/golang/glog/glog.go:410 +0x203
goroutine 114 [select]:
net/http.(*persistConn).readLoop(0xc4206da000)
/usr/local/go/src/net/http/transport.go:1717 +0x743
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1237 +0x95a
goroutine 115 [runnable]:
internal/poll.(*FD).writeUnlock(0xc4206e4080)
/usr/local/go/src/internal/poll/fd_mutex.go:246 +0x5a
internal/poll.(*FD).Write(0xc4206e4080, 0xc4206f7000, 0x50, 0x1000, 0x50, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:261 +0x2de
net.(*netFD).Write(0xc4206e4080, 0xc4206f7000, 0x50, 0x1000, 0x6, 0x6, 0xc4204a2f88)
/usr/local/go/src/net/fd_unix.go:220 +0x4f
net.(*conn).Write(0xc420400018, 0xc4206f7000, 0x50, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:188 +0x6a
net/http.persistConnWriter.Write(0xc4206da000, 0xc4206f7000, 0x50, 0x1000, 0x6, 0x1aecc32, 0x3)
/usr/local/go/src/net/http/transport.go:1253 +0x52
bufio.(*Writer).Flush(0xc420356040, 0x1c8f0e0, 0xc420356040)
/usr/local/go/src/bufio/bufio.go:573 +0x7e
net/http.(*persistConn).writeLoop(0xc4206da000)
/usr/local/go/src/net/http/transport.go:1838 +0x38d
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1238 +0x97f
2020-11-02 06:48:41 [INFO ] => (Server) : {'title': 'task_heartbeat_update', 'data': {'uuid': 'uca0661a'}, 'host': 'hpc1'}
Error response from daemon: Container 1b94f2b8ce91f6276c62814e8e
```
:::
:::info
After deleting all projects, almost 200 containers failed and are dead.
:::
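"Almost 200 containers failed" can be quantified from the docker side; a small sketch for counting containers per state after the teardown:
```python
import collections
import docker

# count containers by state after deleting all projects
client = docker.from_env()
states = collections.Counter(c.status for c in client.containers.list(all=True))
print(states)   # e.g. Counter({'exited': ..., 'dead': ..., 'running': ...})
```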
### Case 1
```python=414
# mlsteam_agent/lab.py
if not self.wait_alive():
    raise Exception('wait alive failed')
self.post_run()
self.set_status(RunState.RUN)
### add
while True:
    time.sleep(1)
return
###
while self.docker_process.poll() is None:
    self.heartbeat()
    if self.should_stop.is_set():
        self.set_status(RunState.SAVING)
        self.close(timeout=STOP_TIMEOUT, err=False)
        break
    if len(self.commit_que):
        self.set_status(RunState.SAVING)
        self.commit()
        self.set_status(RunState.RUN)
    time.sleep(1)
```
:::warning
The docker core dump did not happen, but the agent gets stuck after 500 containers and the socketio connection drops.
:::
### Case 2
```python=414
# mlsteam_agent/lab.py
if not self.wait_alive():
    raise Exception('wait alive failed')
self.post_run()
self.set_status(RunState.RUN)
while self.docker_process.poll() is None:
    self.heartbeat()
    if self.should_stop.is_set():
        self.set_status(RunState.SAVING)
        self.close(timeout=STOP_TIMEOUT, err=False)
        break
    ### Remove
    '''
    if len(self.commit_que):
        self.set_status(RunState.SAVING)
        self.commit()
        self.set_status(RunState.RUN)
    '''
    ###
    time.sleep(1)
```
:::warning
It shows a lot of error messages after 900 containers.
:::
:::danger
docker : Error response from daemon: removal of container ### is already in progress

:::
### Case 3
```python=156
# mlsteam_agent/lab.py
with self.commit_que_lock:
    while True:
        try:
            image, tag = self.commit_que.pop()
            logger.debug("[Lab.commit] : lab {} start commit {}:{}".format(self.uuid, image, tag))
            self.sync()
            if image_obj is None:
                self.send_container_size()
                cli.containers.get(self.uuid).commit(image, tag)
                image_obj = cli.images.get(image + ':' + tag)
            else:
                image_obj.tag(image + ':' + tag)
            logger.debug("[Lab.commit] : lab {} image saved: {}:{}".format(
                self.uuid, image, tag))
            if self.repository:
                self.set_status(RunState.PUSH)
                logger.debug("[lab.push] : lab {} pushing...".format(self.uuid))
                from mlsteam_agent.imagepusher import ImagePusher
                username = self.repository.get('username')
                password = self.repository.get('password')
                server = self.repository.get('server')
                uuid = "{}{}".format("p", self.uuid)
                auth = {'username': username, 'password': password}
                data = {
                    'uuid': uuid,
                    'server': server,
                    'image': image,
                    'tag': tag,
                    'auth': auth
                }
                ### Remove
                '''
                imgpush = ImagePusher(agent=self.agent, **data)
                self.agent.runs[imgpush.uuid] = imgpush
                if not imgpush.run():
                    raise Exception('push failed')
                logger.debug("[lab.push] : lab {} pushed".format(self.uuid))
                '''
                ###
```
:::warning
After 700 containers
Python caught:
`error waiting for container: context canceled"\nError response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \\"rootfs_linux.go:109: jailing process inside rootfs caused \\\\\\"pivot_root invalid argument\\\\\\"\\"": unknown\n`
:::
:::warning
After 800 containers
```
2020-11-02 06:41:07 [DEBUG] b'time="2020-11-02T06:40:58Z" level=error msg="error waiting for container: context canceled"\nError response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \\"rootfs_linux.go:109: jailing process inside rootfs caused \\\\\\"pivot_root invalid argument\\\\\\"\\"": unknown\n'
2020-11-02 06:43:08 [ERROR] [Lab.wait_alive] : lab ue0d23ac response timeout=15.
2020-11-02 06:43:08 [DEBUG] b'time="2020-11-02T06:42:59Z" level=error msg="error waiting for container: context canceled"\nError response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \\"process_linux.go:432: running prestart hook 1 caused \\\\\\"error running hook: exit status 2, stdout: , stderr: runtime: failed to create new OS thread (have 2 already; errno=11)\\\\\\\\nruntime: may need to increase max user processes (ulimit -u)\\\\\\\\nfatal error: newosproc\\\\\\\\n\\\\\\\\nruntime stack:\\\\\\\\nruntime.throw(0x53e895, 0x9)\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/panic.go:1116 +0x72\\\\\\\\nruntime.newosproc(0xc00002e000)\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/os_linux.go:161 +0x1ba\\\\\\\\nruntime.newm1(0xc00002e000)\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/proc.go:1753 +0xdc\\\\\\\\nruntime.newm(0x548970, 0x0)\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/proc.go:1732 +0x8f\\\\\\\\nruntime.main.func1()\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/proc.go:134 +0x36\\\\\\\\nruntime.systemstack(0x45bd74)\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/asm_amd64.s:370 +0x66\\\\\\\\nruntime.mstart()\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/proc.go:1041\\\\\\\\n\\\\\\\\ngoroutine 1 [running]:\\\\\\\\nruntime.systemstack_switch()\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc00002a788 sp=0xc00002a780 pc=0x45be70\\\\\\\\nruntime.main()\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/proc.go:133 +0x70 fp=0xc00002a7e0 sp=0xc00002a788 pc=0x4332d0\\\\\\\\nruntime.goexit()\\\\\\\\n\\\\\\\\t/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00002a7e8 sp=0xc00002a7e0 pc=0x45de01\\\\\\\\n\\\\\\"\\"": unknown\n'
2020-11-02 06:43:08 [DEBUG] [lab.run] : stop lab ue0d23ac, exception, wait alive failed
```
:::
:::danger
After 840 containers
```
File "/usr/lib/python3/dist-packages/requests/models.py", line 935, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 409 Client Error: Conflict for url: http+docker://localhost/v1.40/containers/68d86854380b0b4111cd5dd5467499d3217a577ecf753272237b3d660eb5601f?v=False&link=False&force=False
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "src/gevent/greenlet.py", line 854, in gevent._gevent_cgreenlet.Greenlet.run
File "/build/mlsteam_agent/mlsteam_agent/lab.py", line 451, in run
self.remove()
File "/build/mlsteam_agent/mlsteam_agent/lab.py", line 133, in remove
raise e
File "/build/mlsteam_agent/mlsteam_agent/lab.py", line 130, in remove
docker.from_env().containers.get(self.uuid).remove()
File "/usr/local/lib/python3.6/dist-packages/docker/models/containers.py", line 351, in remove
return self.client.api.remove_container(self.id, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/docker/utils/decorators.py", line 19, in wrapped
return f(self, resource_id, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/docker/api/container.py", line 1009, in remove_container
self._raise_for_status(res)
File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 261, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "/usr/local/lib/python3.6/dist-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 409 Client Error: Conflict ("You cannot remove a running container 68d86854380b0b4111cd5dd5467499d3217a577ecf753272237b3d660eb5601f. Stop the container before attempting removal or force remove")
2020-11-02T06:47:28Z <Greenlet at 0x7fb0334fdb48: <bound method Lab.run of <mlsteam_agent.lab.Lab object at 0x7fb0334d5320>>> failed with APIError
```
:::
:::danger
After 900 containers
```
runtime: failed to create new OS thread (have 24 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc
runtime stack:
runtime.throw(0x1af5fbe, 0x9)
/usr/local/go/src/runtime/panic.go:616 +0x81
runtime.newosproc(0xc4204eac00, 0xc420526000)
/usr/local/go/src/runtime/os_linux.go:164 +0x1af
runtime.newm1(0xc4204eac00)
/usr/local/go/src/runtime/proc.go:1879 +0x113
runtime.newm(0x1bc48b0, 0x0)
/usr/local/go/src/runtime/proc.go:1858 +0x9b
runtime.startTheWorldWithSema(0x1, 0x4579f3)
/usr/local/go/src/runtime/proc.go:1155 +0x1d0
runtime.gcMarkTermination.func3()
/usr/local/go/src/runtime/mgc.go:1647 +0x26
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:409 +0x79
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1175
goroutine 36 [running]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:363 fp=0xc4200bb550 sp=0xc4200bb548 pc=0x453e00
runtime.gcMarkTermination(0x3fed900e52bfd36e)
/usr/local/go/src/runtime/mgc.go:1647 +0x407 fp=0xc4200bb720 sp=0xc4200bb550 pc=0x4187d7
runtime.gcMarkDone()
/usr/local/go/src/runtime/mgc.go:1513 +0x22c fp=0xc4200bb748 sp=0xc4200bb720 pc=0x41836c
runtime.gcBgMarkWorker(0xc420053400)
/usr/local/go/src/runtime/mgc.go:1912 +0x2e7 fp=0xc4200bb7d8 sp=0xc4200bb748 pc=0x4192e7
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc4200bb7e0 sp=0xc4200bb7d8 pc=0x456811
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1723 +0x79
goroutine 1 [runnable]:
net/http.(*persistConn).roundTrip(0xc4206da000, 0xc420686e70, 0x0, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:2033 +0x5a7
net/http.(*Transport).RoundTrip(0xc4201260f0, 0xc420688500, 0xc4201260f0, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:422 +0x8f2
net/http.send(0xc420688500, 0x1c91720, 0xc4201260f0, 0x0, 0x0, 0x0, 0xc42000c870, 0xc42024d710, 0xc4206a36c8, 0x1)
/usr/local/go/src/net/http/client.go:252 +0x185
net/http.(*Client).send(0xc420686d50, 0xc420688500, 0x0, 0x0, 0x0, 0xc42000c870, 0x0, 0x1, 0x417e1a)
/usr/local/go/src/net/http/client.go:176 +0xfa
net/http.(*Client).Do(0xc420686d50, 0xc420688500, 0xc42003c0c0, 0xc420688500, 0xc420023c20)
/usr/local/go/src/net/http/client.go:615 +0x28d
github.com/docker/cli/vendor/golang.org/x/net/context/ctxhttp.Do(0x1cc40c0, 0xc42003c0c0, 0xc420686d50, 0xc420688400, 0x0, 0x27c1be0, 0x2)
/go/src/github.com/docker/cli/vendor/golang.org/x/net/context/ctxhttp/ctxhttp.go:30 +0x6e
github.com/docker/cli/vendor/github.com/docker/docker/client.(*Client).doRequest(0xc42023de80, 0x1cc40c0, 0xc42003c0c0, 0xc420688400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/docker/cli/vendor/github.com/docker/docker/client/request.go:132 +0xbe
github.com/docker/cli/vendor/github.com/docker/docker/client.(*Client).Ping(0xc42023de80, 0x1cc40c0, 0xc42003c0c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/docker/cli/vendor/github.com/docker/docker/client/ping.go:17 +0x17c
github.com/docker/cli/cli/command.(*DockerCli).initializeFromClient(0xc420411b00)
/go/src/github.com/docker/cli/cli/command/cli.go:213 +0x8b
github.com/docker/cli/cli/command.(*DockerCli).Initialize(0xc420411b00, 0xc420481400, 0xc420686930, 0x3)
/go/src/github.com/docker/cli/cli/command/cli.go:197 +0x192
main.newDockerCommand.func2(0xc420672500, 0xc420686930, 0x3, 0x3, 0x0, 0x0)
/go/src/github.com/docker/cli/cmd/docker/docker.go:43 +0x71
github.com/docker/cli/vendor/github.com/spf13/cobra.(*Command).execute(0xc420672500, 0xc420038160, 0x3, 0x3, 0xc420672500, 0xc420038160)
/go/src/github.com/docker/cli/vendor/github.com/spf13/cobra/command.go:741 +0x579
github.com/docker/cli/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc420416c80, 0xc420583fb0, 0x17fc840, 0xc420583fc0)
/go/src/github.com/docker/cli/vendor/github.com/spf13/cobra/command.go:852 +0x30a
github.com/docker/cli/vendor/github.com/spf13/cobra.(*Command).Execute(0xc420416c80, 0xc420416c80, 0x1c91a00)
/go/src/github.com/docker/cli/vendor/github.com/spf13/cobra/command.go:800 +0x2b
main.main()
/go/src/github.com/docker/cli/cmd/docker/docker.go:180 +0xdc
goroutine 5 [syscall]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:139 +0xa6
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.0
/usr/local/go/src/os/signal/signal_unix.go:28 +0x41
goroutine 52 [chan receive]:
github.com/docker/cli/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x27c0f20)
/go/src/github.com/docker/cli/vendor/github.com/golang/glog/glog.go:882 +0x8b
created by github.com/docker/cli/vendor/github.com/golang/glog.init.0
/go/src/github.com/docker/cli/vendor/github.com/golang/glog/glog.go:410 +0x203
goroutine 114 [select]:
net/http.(*persistConn).readLoop(0xc4206da000)
/usr/local/go/src/net/http/transport.go:1717 +0x743
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1237 +0x95a
goroutine 115 [runnable]:
internal/poll.(*FD).writeUnlock(0xc4206e4080)
/usr/local/go/src/internal/poll/fd_mutex.go:246 +0x5a
internal/poll.(*FD).Write(0xc4206e4080, 0xc4206f7000, 0x50, 0x1000, 0x50, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:261 +0x2de
net.(*netFD).Write(0xc4206e4080, 0xc4206f7000, 0x50, 0x1000, 0x6, 0x6, 0xc4204a2f88)
/usr/local/go/src/net/fd_unix.go:220 +0x4f
net.(*conn).Write(0xc420400018, 0xc4206f7000, 0x50, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:188 +0x6a
net/http.persistConnWriter.Write(0xc4206da000, 0xc4206f7000, 0x50, 0x1000, 0x6, 0x1aecc32, 0x3)
/usr/local/go/src/net/http/transport.go:1253 +0x52
bufio.(*Writer).Flush(0xc420356040, 0x1c8f0e0, 0xc420356040)
/usr/local/go/src/bufio/bufio.go:573 +0x7e
net/http.(*persistConn).writeLoop(0xc4206da000)
/usr/local/go/src/net/http/transport.go:1838 +0x38d
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1238 +0x97f
2020-11-02 06:48:41 [INFO ] => (Server) : {'title': 'task_heartbeat_update', 'data': {'uuid': 'uca0661a'}, 'host': 'hpc1'}
Error response from daemon: Container 1b94f2b8ce91f6276c62814e8e
```
:::
:::info
After deleting all projects, almost 200 containers failed and are dead.
:::
## systemctl default agent.service
```bash
Loaded: loaded (/etc/systemd/system/mlsteam_agent_100.74.53.2_10004.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-11-03 03:35:08 UTC; 29min ago
Main PID: 5057 (python3)
Tasks: 7356 (limit: 7372)
```
```bash=
Type=simple
Restart=always
NotifyAccess=none
RestartUSec=100ms
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
PermissionsStartOnly=no
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
MainPID=0
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=timeout
UID=[not set]
GID=[not set]
NRestarts=0
ExecMainStartTimestamp=Mon 2020-11-02 08:06:09 UTC
ExecMainStartTimestampMonotonic=425474543069
ExecMainExitTimestamp=Mon 2020-11-02 08:23:26 UTC
ExecMainExitTimestampMonotonic=426511587165
ExecMainPID=44247
ExecMainCode=2
ExecMainStatus=9
ExecStart={ path=/usr/bin/python3 ; argv[]=/usr/bin/python3 -m mlsteam_agent.cli -c /etc/mlsteam_agent/mlsteam_agent_100.74.53.2_10004.ini ; ignore_errors=no ; start_time=[Mon 2020-11-02 08:06:09 UTC] ; stop_t
Slice=system.slice
MemoryCurrent=[not set]
CPUUsageNSec=[not set]
TasksCurrent=[not set]
IPIngressBytes=18446744073709551615
IPIngressPackets=18446744073709551615
IPEgressBytes=18446744073709551615
IPEgressPackets=18446744073709551615
Delegate=no
CPUAccounting=no
CPUWeight=[not set]
StartupCPUWeight=[not set]
CPUShares=[not set]
StartupCPUShares=[not set]
CPUQuotaPerSecUSec=infinity
IOAccounting=no
IOWeight=[not set]
StartupIOWeight=[not set]
BlockIOAccounting=no
BlockIOWeight=[not set]
StartupBlockIOWeight=[not set]
MemoryAccounting=no
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
DevicePolicy=auto
TasksAccounting=yes
TasksMax=7372
IPAccounting=no
Environment=PYTHONUNBUFFERED=1
UMask=0022
LimitCPU=infinity
LimitCPUSoft=infinity
LimitFSIZE=infinity
LimitFSIZESoft=infinity
LimitDATA=infinity
LimitDATASoft=infinity
LimitSTACK=infinity
LimitSTACKSoft=8388608
LimitCORE=infinity
LimitCORESoft=0
LimitRSS=infinity
LimitRSSSoft=infinity
LimitNOFILE=4096
LimitNOFILESoft=1024
LimitAS=infinity
LimitASSoft=infinity
LimitNPROC=1030855
LimitNPROCSoft=1030855
LimitMEMLOCK=16777216
LimitMEMLOCKSoft=16777216
LimitLOCKS=infinity
LimitLOCKSSoft=infinity
LimitSIGPENDING=1030855
LimitSIGPENDINGSoft=1030855
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=infinity
LimitRTTIMESoft=infinity
OOMScoreAdjust=0
Nice=0
IOSchedulingClass=0
IOSchedulingPriority=0
CPUSchedulingPolicy=0
CPUSchedulingPriority=0
TimerSlackNSec=50000
CPUSchedulingResetOnFork=no
NonBlocking=no
StandardInput=null
StandardInputData=
StandardOutput=journal
StandardError=inherit
TTYReset=no
TTYVHangup=no
TTYVTDisallocate=no
SyslogPriority=30
SyslogLevelPrefix=yes
SyslogLevel=6
SyslogFacility=3
LogLevelMax=-1
SecureBits=0
CapabilityBoundingSet=cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin ca
AmbientCapabilities=
DynamicUser=no
RemoveIPC=no
MountFlags=
PrivateTmp=no
PrivateDevices=no
ProtectKernelTunables=no
ProtectKernelModules=no
ProtectControlGroups=no
PrivateNetwork=no
PrivateUsers=no
ProtectHome=no
ProtectSystem=no
SameProcessGroup=no
UtmpMode=init
IgnoreSIGPIPE=yes
NoNewPrivileges=no
SystemCallErrorNumber=0
LockPersonality=no
RuntimeDirectoryPreserve=no
RuntimeDirectoryMode=0755
StateDirectoryMode=0755
CacheDirectoryMode=0755
LogsDirectoryMode=0755
ConfigurationDirectoryMode=0755
MemoryDenyWriteExecute=no
RestrictRealtime=no
RestrictSUIDSGID=no
RestrictNamespaces=no
MountAPIVFS=no
KeyringMode=private
KillMode=control-group
KillSignal=15
SendSIGKILL=yes
SendSIGHUP=no
Id=mlsteam_agent_100.74.53.2_10004.service
Names=mlsteam_agent_100.74.53.2_10004.service
Requires=sysinit.target system.slice
WantedBy=graphical.target
Conflicts=shutdown.target
Before=graphical.target shutdown.target
After=systemd-journald.socket basic.target sysinit.target system.slice
Description=MLSteam worker Agent
LoadState=loaded
ActiveState=failed
SubState=failed
FragmentPath=/etc/systemd/system/mlsteam_agent_100.74.53.2_10004.service
UnitFileState=enabled
UnitFilePreset=enabled
StateChangeTimestamp=Mon 2020-11-02 08:23:26 UTC
StateChangeTimestampMonotonic=426511587322
InactiveExitTimestamp=Mon 2020-11-02 08:06:09 UTC
InactiveExitTimestampMonotonic=425474543147
ActiveEnterTimestamp=Mon 2020-11-02 08:06:09 UTC
ActiveEnterTimestampMonotonic=425474543147
ActiveExitTimestamp=Mon 2020-11-02 08:21:56 UTC
ActiveExitTimestampMonotonic=426421553033
InactiveEnterTimestamp=Mon 2020-11-02 08:23:26 UTC
InactiveEnterTimestampMonotonic=426511587322
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Mon 2020-11-02 08:06:09 UTC
ConditionTimestampMonotonic=425474538816
AssertTimestamp=Mon 2020-11-02 08:06:09 UTC
AssertTimestampMonotonic=425474538816
Transient=no
Perpetual=no
StartLimitIntervalUSec=10s
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=a1e86ac1404c4b50a7ade1ff53093e50
CollectMode=inactive
```
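`Tasks: 7356 (limit: 7372)` in the status above, together with `TasksMax=7372` and `LimitNOFILE=4096`/`LimitNOFILESoft=1024` in the dump, are per-unit ceilings the agent can run into at this container count. A hedged sketch for lifting them via a drop-in for this unit only; the values are assumptions and have not been verified in this note.
```python
import subprocess
from pathlib import Path

# sketch only: write a systemd drop-in raising the per-unit task/fd limits (run as root)
unit = "mlsteam_agent_100.74.53.2_10004.service"
dropin = Path("/etc/systemd/system/{}.d/override.conf".format(unit))
dropin.parent.mkdir(parents=True, exist_ok=True)
dropin.write_text(
    "[Service]\n"
    "TasksMax=infinity\n"
    "LimitNOFILE=1048576\n"
    "LimitNPROC=infinity\n"
)

subprocess.run(["systemctl", "daemon-reload"], check=True)
subprocess.run(["systemctl", "restart", unit], check=True)
```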
