---
title: Weave 22-10-2019
---

```
[kostyrev ~ (k.fasten.com:kube-system) Wed Oct 23 12:55]
> for i in $(kubectl get nodes -l kubernetes.io/role=master -o json | jq -r '.items[] | .status .addresses[] | select(.type=="InternalIP") | .address');do kubectl get pods -n kube-system -owide | grep $i;done | grep weave | cut -d ' ' -f1
```

```
weave-net-hdhv9
weave-net-xp2t6
```

```
[kostyrev ~ (k.fasten.com:kube-system) Wed Oct 23 09:31]
> egrep --color=always '(weave-net-hdhv9|weave-net-xp2t6)' /tmp/verify -B 1
```

```
Group sum55046 has 48 nodes:
weave-net-295v2 weave-net-29n8v weave-net-2zgxb weave-net-4ktrl weave-net-4pw5g weave-net-55x9n weave-net-5szlj weave-net-5vrgm weave-net-6mzs2 weave-net-6w9pz weave-net-79x6j weave-net-982qc weave-net-9xq2d weave-net-bf9q8 weave-net-bzk99 weave-net-c7t8v weave-net-f5dpq weave-net-f99fx weave-net-gg625 weave-net-hdhv9 weave-net-ht7hf weave-net-hvfh6 weave-net-jd2qh weave-net-jgztx weave-net-kq57v weave-net-ks829 weave-net-ktr6v weave-net-m2gwj weave-net-mbnh6 weave-net-mf9p4 weave-net-n67xn weave-net-nxgcn weave-net-ph7z4 weave-net-pv26m weave-net-pwqmp weave-net-q46hb weave-net-q56w8 weave-net-qcrvz weave-net-qdvdv weave-net-vcz64 weave-net-vhg42 weave-net-w9bkd weave-net-x78db weave-net-xkttl weave-net-xp2t6 weave-net-xpw2l weave-net-z9tnf weave-net-zxvvw
```

```
[kostyrev ~/fasten/gitlab/DevOps/kubernetes/tf-k.fasten.com/kubernetes (k.fasten.com:kube-system) (use-r5a-large-for-nodes-monitoring *) Wed Oct 23 12:28]
> k logs weave-net-hdhv9 -c weave --since 1h | grep '10.32.20.0'
```

```
INFO: 2019/10/23 08:56:16.619710 ->[10.128.199.193:48970|b6:53:07:06:90
INFO: 2019/10/23 08:46:32.523401 ->[10.128.233.236:43475|0a:f2:cb:be:ff:8e(ip-10-128-233-236.ec2.internal)]: connection shutting down due to error: Inconsistent entries for 10.32.20.0: owned by c2:86:2f:c7:0d:e4 but incoming message says 66:ac:b8:1a:78:0c
INFO: 2019/10/23 08:47:03.819578 ->[10.128.228.191:56387|66:ac:b8:1a:78:0c(ip-10-128-228-191.ec2.internal)]: connection shutting down due to error: Inconsistent entries for 10.32.20.0: owned by c2:86:2f:c7:0d:e4 but incoming message says 66:ac:b8:1a:78:0c
c2:86:2f:c7:0d:e4(ip-10-128-231-223.ec2.internal) 72721 IPs (06.9% of total)
66:ac:b8:1a:78:0c(ip-10-128-228-191.ec2.internal) 4096 IPs (00.4% of total)
```

```
[kostyrev ~/fasten/gitlab/DevOps/kubernetes/tf-k.fasten.com/kubernetes (k.fasten.com:kube-system) (use-r5a-large-for-nodes-monitoring *) Wed Oct 23 11:51]
> kdno ip-10-128-228-191.ec2.internal | grep weave
```

```
kube-system weave-net-px6qs 20m (1%) 0 (0%) 0 (0%) 0 (0%) 23h
```

```
[kostyrev ~/fasten/gitlab/DevOps/kubernetes/tf-k.fasten.com/kubernetes (k.fasten.com:kube-system) (use-r5a-large-for-nodes-monitoring *) Wed Oct 23 10:27]
> ksysex weave-net-px6qs -c weave -- /home/weave/weave --local report | jq '.IPAM.Entries[]| select(.Token == "10.32.20.0")'
```

```
{
  "Token": "10.32.20.0",
  "Size": 1024,
  "Peer": "66:ac:b8:1a:78:0c",
  "Nickname": "ip-10-128-228-191.ec2.internal",
  "IsKnownPeer": true,
  "Version": 615
}
```

```
[kostyrev ~/fasten/gitlab/DevOps/kubernetes/tf-k.fasten.com (k.fasten.com:kube-system) (use-r5a-large-for-nodes-monitoring *) Wed Oct 23 12:17]
> kdno ip-10-128-231-223.ec2.internal | grep weave
```

```
kube-system weave-net-4ktrl 20m (1%) 0 (0%) 0 (0%) 0 (0%) 18h
```

```
[kostyrev ~/fasten/gitlab/DevOps/kubernetes/tf-k.fasten.com (k.fasten.com:kube-system) (use-r5a-large-for-nodes-monitoring *) Wed Oct 23 12:19]
> ksysex weave-net-4ktrl -c weave -- /home/weave/weave --local report | jq '.IPAM.Entries[]| select(.Token == "10.32.20.0")'
```

```
{
  "Token": "10.32.20.0",
  "Size": 1024,
  "Peer": "c2:86:2f:c7:0d:e4",
  "Nickname": "ip-10-128-231-223.ec2.internal",
  "IsKnownPeer": true,
  "Version": 615
}
```

The verify-weave.sh.txt utility shows that they are in different groups, i.e. they see the state of the network differently

```
[kostyrev ~ (k.fasten.com:kube-system) Wed Oct 23 09:25]
> egrep --color=always '(weave-net-px6qs|weave-net-4ktrl)' /tmp/verify -B 1
```

```
Group sum18463 has 4 nodes:
weave-net-9q75d weave-net-cs5ph weave-net-pwn7j weave-net-px6qs
Group sum55046 has 48 nodes:
weave-net-295v2 weave-net-29n8v weave-net-2zgxb weave-net-4ktrl weave-net-4pw5g weave-net-55x9n weave-net-5szlj weave-net-5vrgm weave-net-6mzs2 weave-net-6w9pz weave-net-79x6j weave-net-982qc weave-net-9xq2d weave-net-bf9q8 weave-net-bzk99 weave-net-c7t8v weave-net-f5dpq weave-net-f99fx weave-net-gg625 weave-net-hdhv9 weave-net-ht7hf weave-net-hvfh6 weave-net-jd2qh weave-net-jgztx weave-net-kq57v weave-net-ks829 weave-net-ktr6v weave-net-m2gwj weave-net-mbnh6 weave-net-mf9p4 weave-net-n67xn weave-net-nxgcn weave-net-ph7z4 weave-net-pv26m weave-net-pwqmp weave-net-q46hb weave-net-q56w8 weave-net-qcrvz weave-net-qdvdv weave-net-vcz64 weave-net-vhg42 weave-net-w9bkd weave-net-x78db weave-net-xkttl weave-net-xp2t6 weave-net-xpw2l weave-net-z9tnf weave-net-zxvvw
```

So it turns out that pods on host ip-10-128-228-191.ec2.internal

```
[kostyrev ~/fasten/gitlab/DevOps/kubernetes/tf-k.fasten.com/kubernetes (k.fasten.com:kube-system) (use-r5a-large-for-nodes-monitoring *) Wed Oct 23 12:40]
> kubectl describe node ip-10-128-228-191.ec2.internal | sed -n -e '/^ ---------/,/Allocated/p' | sed -e '1d' -e '$d' | awk '{ print $1 "\t" $2 }' | column -t
```

```
default      ingress-internal-nginx-ingress-controller-7847b74d6f-mbkhm
geoservice   elasticsearch-client-f545c85df-x5rrm
geoservice   elasticsearch-master-0
kube-system  kube-proxy-ip-10-128-228-191.ec2.internal
kube-system  weave-net-px6qs
logging      logs-shipper-fluentd-5jjhg
monitoring   prometheus-prod-node-exporter-xplmg
monitoring   seye-kube-exporters-prometheus-node-exporter-gfchz
ustaxi       backend-resolver-f946f7d69-jc2pg
ustaxi       cashier-db44bf5bd-r6glm
ustaxi       order-5df95d8687-h7sz2
ustaxi       order-saver-6797c7d5f9-pps6f
ustaxi       partner-6649b6748f-npsfh
ustaxi       web-driver-cabinet-fasten-d4748dd6c-8mdkm
```

cannot connect to pods on host ip-10-128-231-223.ec2.internal

```
[kostyrev ~/fasten/gitlab/DevOps/kubernetes/tf-k.fasten.com/kubernetes (k.fasten.com:kube-system) (use-r5a-large-for-nodes-monitoring *) Wed Oct 23 12:41]
> kubectl describe node ip-10-128-231-223.ec2.internal | sed -n -e '/^ ---------/,/Allocated/p' | sed -e '1d' -e '$d' | awk '{ print $1 "\t" $2 }' | column -t
```

```
default      ingress-nginx-ingress-controller-7c8c9f8f97-smswj
fasten-com   fasten-com-app-4073276648-8p44k
geoservice   corrector-6b557c9d75-hdxjg
kube-system  kube-proxy-ip-10-128-231-223.ec2.internal
kube-system  tiller-deploy-669564d7cd-dgjkr
kube-system  weave-net-4ktrl
logging      logs-shipper-fluentd-mw2s4
monitoring   prometheus-prod-node-exporter-wmk48
monitoring   seye-kube-exporters-prometheus-node-exporter-f949x
ustaxi       acceptance-rate-85bd574944-kks76
ustaxi       order-5df95d8687-gr7bz
ustaxi       rider-api-gtw-668c5fb6cd-z4qrv
ustaxi       web-driver-cabinet-b49887d96-zzlzm
```

which is exactly what we see in the logs

```
[kostyrev ~/fasten/gitlab/DevOps/kubernetes/tf-k.fasten.com (k.fasten.com:kube-system) (use-r5a-large-for-nodes-monitoring *) Wed Oct 23 13:18]
> klo weave-net-px6qs -c weave --since 1h | grep --color=always ip-10-128-231-223 | grep 'connection shutting down due to error'
```

```
INFO: 2019/10/23 09:21:02.419738 ->[10.128.231.223:6783|c2:86:2f:c7:0d:e4(ip-10-128-231-223.ec2.internal)]: connection shutting down due to error: Received update for IP range I own at 10.32.16.0 v811: incoming message says owner c2:86:2f:c7:0d:e4 v815
INFO: 2019/10/23 09:21:02.478179 ->[10.128.231.223:6783|c2:86:2f:c7:0d:e4(ip-10-128-231-223.ec2.internal)]: connection shutting down due to error: Multiple connections to c2:86:2f:c7:0d:e4(ip-10-128-231-223.ec2.internal) added to 66:ac:b8:1a:78:0c(ip-10-128-228-191.ec2.internal)
```

or to anywhere else at all, for that matter

An alert has just come in about these hosts and the order pods on them

```
[kostyrev ~/fasten/gitlab/DevOps/kubernetes/tf-k.fasten.com/kubernetes (k.fasten.com:kube-system) (use-r5a-large-for-nodes-monitoring *) Wed Oct 23 12:47]
> kgpo -owide -n ustaxi order-5df95d8687-gr7bz
NAME                     READY   STATUS    RESTARTS   AGE   IP           NODE                             NOMINATED NODE   READINESS GATES
order-5df95d8687-gr7bz   1/1     Running   0          63m   10.32.16.0   ip-10-128-231-223.ec2.internal   <none>           <none>
[kostyrev ~/fasten/gitlab/DevOps/kubernetes/tf-k.fasten.com/kubernetes (k.fasten.com:kube-system) (use-r5a-large-for-nodes-monitoring *) Wed Oct 23 12:47]
> kgpo -owide -n ustaxi order-5df95d8687-h7sz2
NAME                     READY   STATUS    RESTARTS   AGE   IP           NODE                             NOMINATED NODE   READINESS GATES
order-5df95d8687-h7sz2   1/1     Running   0          59m   10.32.16.0   ip-10-128-228-191.ec2.internal   <none>           <none>
```
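
The last output shows two order pods on different nodes both reporting IP 10.32.16.0. A minimal sketch of how such collisions could be found across the whole cluster, assuming the pod list is supplied as whitespace-separated columns (IP, namespace, name, node, hostNetwork). The helper name `find_duplicate_pod_ips` and the column layout are illustrative assumptions, not something used in the original session:

```shell
# Hypothetical helper, not part of the original session.
# Reads lines of "IP NAMESPACE NAME NODE HOSTNETWORK" on stdin and reports
# every pod IP that is assigned to more than one pod.
find_duplicate_pod_ips() {
  awk '
    # Skip pods without an IP and hostNetwork pods (they share the node IP).
    $1 == "<none>" || $5 == "true" { next }
    {
      count[$1]++
      owners[$1] = owners[$1] "  " $2 "/" $3 " on " $4 "\n"
    }
    END {
      for (ip in count)
        if (count[ip] > 1)
          printf "%s is assigned to %d pods:\n%s", ip, count[ip], owners[ip]
    }
  '
}

# Assumed usage against a live cluster:
#   kubectl get pods --all-namespaces --no-headers \
#     -o custom-columns=IP:.status.podIP,NS:.metadata.namespace,NAME:.metadata.name,NODE:.spec.nodeName,HOSTNET:.spec.hostNetwork \
#     | find_duplicate_pod_ips
```

Pods with `hostNetwork: true` are filtered out because they legitimately share their node's address and would otherwise show up as false positives.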