owned this note
owned this note
Published
Linked with GitHub
# Debugging kernel networking bug
Working notes debugging https://bugzilla.redhat.com/show_bug.cgi?id=1781575
- See also https://bugzilla.redhat.com/show_bug.cgi?id=1832332
Spun up a 4.3.17 (RHCOS 8.1) cluster in libvirt, and ran:
```
$ ./_output/local/bin/linux/amd64/openshift-tests run openshift/conformance/parallel --run 'Services should be rejected'
```
many times - all passed.
Then used `oc debug node/` on a random worker, and downloaded the 8.2 kernel and switched to it:
```
# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ab14709af5c410dd36ebeda30809c791ef08328af2aaa87b23372e39c17af44d
CustomOrigin: Managed by machine-config-operator
Version: 43.81.202004211653.0 (2020-04-21T16:58:52Z)
ReplacedBasePackages: linux-firmware 20190516-94.git711d3297.el8 -> 20191202-97.gite8a0f4c9.el8, kernel-modules kernel-modules-extra kernel kernel-core 4.18.0-147.8.1.el8_1 -> 4.18.0-193.el8
```
I used `oc adm cordon` on the other workers to ensure the test always ran on the node with my updated kernel.
And now running the test consistently fails. And just to re-verify, using `rpm-ostree reset -ol --reboot` and the test starts passing again.
still fails with `4.18.0-152.el8`
`4.18.0-151.el8` failed once but then seems to succeed repeatedly
## using systemtap
Spin up rhel8 dev container (w/coretoolbox) and use yum to install same kernel as target
Follow "Use Case 2: Creating a pre-compiled module" from https://access.redhat.com/articles/17839:
```
$ stap -r 4.18.0-193.el8.x86_64 -m inetpeer -p 4 inetpeer.stp
```
Use `kola http-server` to serve the module via HTTP from host. Get denied accessing from VM and eventually realize that I need to `firewall-cmd --zone libvirt --add-port=8000/tcp` so the libvirt VMs can access host. ([virtiofs](https://virtio-fs.gitlab.io/) can't come soon enough)
Try to just `insmod`, fails because SELinux breaks `spc_t` doing this; `setenforce 0` for now (but we need to figure out how to escape this)
Next, realize we need to get `staprun`.
Use `oc debug --image docker.io/centos:centos8` (because subscriptions are a pain) and `yum -y install systemtap-runtime`
Download the module from the host (or wherever you built it) and run it with `staprun` as below.
#### success!
We've got systemtap going with https://sourceware.org/systemtap/examples/network/netfilter_summary.stp
Now let's try tracing the icmp code...
```
#! /usr/bin/env stap
probe begin {
println("Watching ICMP, Ctrl-C to exit")
}
probe kernel.function("inet_peer_xrlim_allow") {
if ($peer == NULL) {
println("inet_peer_xrlim_allow(NULL peer)")
}
}
probe kernel.function("inet_peer_xrlim_allow").return {
printf("inet_peer_xrlim_allow last=%s tokens=%s ret=%s\n", $peer->rate_last$, $peer->rate_tokens$, $$return)
}
//probe kernel.statement("*@net/ipv4/inetpeer.c:281") {
// printf("icmpv4_xrlim_allow: rate_tokens=%s rate_last=%s\n", $peer->rate_tokens, $peer->rate_last)
//}
probe kernel.function("icmpv4_xrlim_allow").return {
printf("icmpv4_xrlim_allow(type=%d code=%d) %s\n", $type, $code, $$return)
}
probe kernel.function("icmpv4_global_allow").return {
if ($return == 0) {
println("icmpv4_global_allow: denied")
}
}
//probe kernel.function("icmp_send") {
// printf("icmp_send(%s)\n", $$parms$)
//}
probe module("nf_reject_ipv4").function("nf_send_unreach") {
printf("nf_send_unreach: %s\n", $skb_in->dev->name$)
}
```
Running that while executing the test:
```
sh-4.4# curl -LO http://192.168.126.1:8000/inetpeer.ko && staprun inetpeer.ko
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1517k 100 1517k 0 0 493M 0 --:--:-- --:--:-- --:--:-- 1481M
Watching ICMP, Ctrl-C to exit
icmpv4_xrlim_allow(type=5 code=1) return=0x1
icmpv4_xrlim_allow(type=5 code=1) return=0x1
icmpv4_xrlim_allow(type=5 code=1) return=0x1
icmpv4_xrlim_allow(type=5 code=1) return=0x1
icmpv4_xrlim_allow(type=5 code=1) return=0x1
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347872279 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
icmpv4_xrlim_allow(type=5 code=1) return=0x1
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347873280 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
icmpv4_xrlim_allow(type=5 code=1) return=0x1
icmpv4_xrlim_allow(type=5 code=1) return=0x1
icmpv4_xrlim_allow(type=5 code=1) return=0x1
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347877380 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347878400 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347881378 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347882432 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
icmpv4_xrlim_allow(type=5 code=1) return=0x1
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347885378 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347886400 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347889379 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347890432 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347893373 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347894400 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347897390 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347898432 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347901400 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347902464 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347905380 tokens=0 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4347906432 tokens=0 ret=return=0x0
```
And here's a run from a passing test:
```
# staprun inetpeer81.ko
Watching ICMP, Ctrl-C to exit
icmpv4_xrlim_allow(type=5 code=1) return=0x1
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4361979389 tokens=1 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
icmpv4_xrlim_allow(type=5 code=1) return=0x1
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4361980416 tokens=6 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
icmpv4_xrlim_allow(type=5 code=1) return=0x1
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4361984622 tokens=7 ret=return=0x0
icmpv4_xrlim_allow(type=3 code=3) return=0x0
nf_send_unreach: "tun0"
inet_peer_xrlim_allow last=4361984628 tokens=13 ret=return=0x1
icmpv4_xrlim_allow(type=3 code=3) return=0x1
```
It seems like something's not initializing tokens?
---
Hum...so I finally read the comment
> Note that the same inet_peer fields are modified by functions in route.c too
And looking there, I notice we lost an increment of rate_tokens:
```
diff -u kernel-4.18.0-117.el8/linux-4.18.0-117.el8_1_0.x86_64/net/ipv4/route.c kernel-4.18.0-193.el8/linux-4.18.0-193.el8_2_0.x86_64/net/ipv4/route.c
--- kernel-4.18.0-117.el8/linux-4.18.0-117.el8_1_0.x86_64/net/ipv4/route.c 2019-07-16 13:21:04.000000000 +0000
+++ kernel-4.18.0-193.el8/linux-4.18.0-193.el8_2_0.x86_64/net/ipv4/route.c 2020-03-27 13:57:18.000000000 +0000
@@ -908,16 +908,15 @@
if (peer->rate_tokens == 0 ||
time_after(jiffies,
(peer->rate_last +
- (ip_rt_redirect_load << peer->rate_tokens)))) {
+ (ip_rt_redirect_load << peer->n_redirects)))) {
__be32 gw = rt_nexthop(rt, ip_hdr(skb)->daddr);
icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, gw);
peer->rate_last = jiffies;
- ++peer->rate_tokens;
++peer->n_redirects;
```
which seems to have come from https://github.com/torvalds/linux/commit/b406472b5ad79ede8d10077f0c8f05505ace8b6d
Not certain this is it but certainly the token values in the previous kernel are small and possibly were just incremented there.