BSDCam Transport Session
========================
* Agenda Bashing
* Linux NetDev Report from thj@
* Co-located with IETF event
* Not especially useful for FreeBSD people
* Things they are doing:
* tight vendor integration for switch ASICs
* switchdev API, switch configurations
* Mellanox, Barefoot, and Cumulus
* FreeBSD likely to lag behind
* Barefoot: Intellectual Property in compiler
* Would be willing to open source spec for configuring ASIC
* Turning netfilter tools into libraries (firewall rules in JSON)
* Write firewall config tools in higher level languages
* What do sysadmins want: libs, JSON, etc.?
* Demo of netfilter implemented in eBPF
* Have a "Tell developers what you think/want" session at MeetBSD
* Getting more feedback from users and sysadmins
* Have a FreeBSDCon, a devsummit focused on getting users to tell us about their needs/pains/desires
* Making IPv6 Suck Less
* Perform Better
* Missing RFCs
* thj is implementing RFC7112
* Roaming WiFi: ipv4 renegotiates DHCP, but SLAAC doesn't get reset
* jtl's concern: the complexity of headers, and cases where a host may be instructed to do work.
* There are some measurements of what % of traffic gets dropped if it has extension headers. Cisco is apparently doing fresh stats on this.
* jtl would like a sysctl bitmask to ignore specific extension header types
* A bug with v6 fragments: if RSS is enabled, the counter of how many headers have been processed gets reset to 0
* Optimizations that have only been done to v4, may need to be replicated for v6
* v4 may be more strictly compliant; v6 is often less compliant
* v4 would not accept more than 16 fragments
* bz@ would like us to be RFC8200 compliant
* Who wants to actually work on v6: thj, bz, gallatin@, left 1/2 of rrs@, right 1/2 of tuexen@
* Old ipv6 todo page: https://wiki.freebsd.org/IPv6/ToDo
* An equiv to the v4 RFC page: https://wiki.freebsd.org/TransportProtocols/tcp_rfc_compliance
* We need more test cases, both for things that work (so we don't break them), and for things that are broken (so we know when it is fixed)
* OpenBSD has a python based v6 test suite that works on FreeBSD
* tuexen@ has a set of test packages that are ready to be hooked up to CI
* Take Away: status reports on the bi-weekly transport call
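The per-type drop control jtl@ proposed could look roughly like the sketch below: a 64-bit sysctl bitmask selecting which IPv6 extension header types to refuse. All names and the helper are hypothetical, not FreeBSD code; conveniently, the common extension header types (hop-by-hop 0, routing 43, fragment 44, destination options 60) all have values below 64.

```c
#include <stdint.h>

/* Hypothetical sketch of a per-type drop bitmask (the sysctl jtl@
 * proposed): each bit selects one IPv6 extension header type to refuse.
 * Not FreeBSD code; names are illustrative. */
#define EXT6_BIT(type)  (1ULL << (type))

/* IPv6 next-header values for a few extension headers (all < 64). */
enum { EXT6_HOPOPTS = 0, EXT6_ROUTING = 43, EXT6_FRAGMENT = 44, EXT6_DSTOPTS = 60 };

/* Return 0 (drop) if any header type in the packet's chain is masked. */
static int
ext6_chain_allowed(const uint8_t *chain, int n, uint64_t dropmask)
{
    for (int i = 0; i < n; i++)
        if (chain[i] < 64 && (dropmask & EXT6_BIT(chain[i])))
            return 0;
    return 1;
}
```

A mask of 0 preserves today's behavior; setting, say, `EXT6_BIT(EXT6_FRAGMENT)` would shed fragmented traffic without touching other headers.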
* IP[46]/TCP Reassembly Bugs/Stuff
* Researcher found that 'walking linked list is slow, and bad'
* The kernel created long linked lists for out-of-order TCP segments and fragment chains.
* IPv6: used to limit resources very differently from v4; now uses the same vocabulary
* IPv6 fragments were not hashed into buckets, now they are
* Performance suffers too much when the list exceeds 100 entries, so that is the new limit
* Mostly just a workaround, papers over the problem. Needs an algorithmic fix
* If there are more than a trivial number of fragments, a better solution is needed. glebius@ is working on an implementation of the fragment-processing code using a red-black tree. It needs a security review. Is the performance impact acceptable?
* TCP: rrs@ working on coalescing code
* Updated version coming to phabricator soon
* tuexen@ wrote test cases for reassembly
* jhb@ and jtl@ have a todo list
* use queue.h
* v6 code requires changes in many places
* Need a modernization pass, remove #ifdef KAME etc
* Too much noise in the code, harder to read and reason about
* Need a regression suite
* Give it the FreeBSD stink(tm)
* bz@ may have old project in perforce that does some cleanup, likely applies fairly well
* Todo: pf
* brooks@ would prefer a cleanup of the IOCTLs
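The tree-based fix boils down to keying fragments by offset in a balanced search tree instead of walking a list. As a portable stand-in for the kernel RB tree glebius@ is working with, the sketch below uses POSIX `tsearch()` (glibc implements it as a red-black tree); `struct frag` and `frag_insert` are illustrative, not FreeBSD's.

```c
#include <search.h>   /* POSIX tsearch()/tfind() binary search tree */
#include <stddef.h>

/* Sketch: out-of-order fragments kept in a search tree keyed by offset,
 * so each insert costs O(log n) instead of a full list walk.
 * Illustrative only; the kernel uses its own RB tree, not tsearch(). */
struct frag {
    unsigned off;           /* byte offset within the original packet */
    unsigned len;
};

static int
frag_cmp(const void *a, const void *b)
{
    unsigned oa = ((const struct frag *)a)->off;
    unsigned ob = ((const struct frag *)b)->off;
    return (oa > ob) - (oa < ob);
}

/* Insert a fragment; returns the stored key (the existing one on a
 * duplicate offset, which a real implementation must treat specially). */
static struct frag *
frag_insert(void **root, struct frag *f)
{
    struct frag **slot = (struct frag **)tsearch(f, root, frag_cmp);
    return (slot != NULL) ? *slot : NULL;
}
```

Duplicate and overlapping offsets are exactly where the security review matters: the tree makes lookup cheap, but overlap resolution policy is still a separate decision.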
* TFO (TCP Fast Open)
* Who might have patches?
* Known interop problem with Windows
* TCP option alignment
* tuexen@ has test cases for this, need to extract them from him (with pliers)
* Limelight extension with shared secret
* Alternate Stacks
* Infrastructure
* Allow different TCP stacks concurrently (side-by-side)
* Use setsockopt() to assign individual sockets to the alternative stacks
* Requires that when you switch stacks you must update the common tcpcb
* A/B test stacks, route n% of traffic to the new stack, compare stats from the two stacks
* Can be used for different workloads
* Live-patching by loading newer version of stack without rebooting
* Allows much more active development, frees development from usual requirements (work across low cpu/ram count to high cpu/ram count)
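A toy model of the mechanism: FreeBSD's real knob is the `TCP_FUNCTION_BLK` socket option, which points a connection at a named, loaded function block; everything in the sketch below (struct names, the registry, the A/B helper) is illustrative, not kernel code.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Toy model of side-by-side TCP stacks selected per socket.
 * Illustrative only; names do not match the kernel's. */
struct tcp_stack {
    const char *name;
    void (*do_output)(void *tp);    /* one of several stack entry points */
};

static void base_output(void *tp) { (void)tp; /* default stack */ }
static void rack_output(void *tp) { (void)tp; /* alternate stack */ }

static const struct tcp_stack stacks[] = {
    { "freebsd", base_output },
    { "rack",    rack_output },
};

/* What a setsockopt(TCP_FUNCTION_BLK)-style switch boils down to:
 * look the stack up by name and repoint the socket's function block. */
static const struct tcp_stack *
tcp_stack_find(const char *name)
{
    for (size_t i = 0; i < sizeof(stacks) / sizeof(stacks[0]); i++)
        if (strcmp(stacks[i].name, name) == 0)
            return &stacks[i];
    return NULL;
}

/* A/B testing: route n% of connections to the new stack by hashing a
 * connection id, then compare per-stack statistics. */
static const struct tcp_stack *
tcp_stack_ab(uint32_t conn_id, unsigned pct_new)
{
    return tcp_stack_find(conn_id % 100 < pct_new ? "rack" : "freebsd");
}
```

The per-socket indirection is what makes live replacement possible: loading a newer stack module and flipping the pointer, with the shared tcpcb updated at the switch.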
* RACK
* IETF draft: https://tools.ietf.org/html/draft-ietf-tcpm-rack-04
* Our code only supports draft -02.
* Netflix not driven to update at this time
* Recent ACK + Tail loss probe
* use RTT to predict when to try to keep transmitting
* use SACK plus RTT to predict when to retransmit
* PRR Proportional Rate Reduction (https://tools.ietf.org/html/rfc6937), keep sending more data as you get ACKs instead of waiting for 1/2 of window
* Burst mitigation, high-precision timing system
* Much better quality of experience
* Keeps a send map, how many times each segment has been sent, better than old SACK
* robert@ asks about reducing diff between base stack and RACK
* Improved recovery
* In head, higher cost to use
* Most all video traffic at Netflix uses RACK
* Even fill traffic will use RACK eventually
* Head is a bit different than what Netflix is using right now, head is considered far better
* Doing new tests to compare 2017 to 2018 stack
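RACK's core rule (per the draft) can be sketched in a few lines: a sent segment is declared lost once a segment transmitted *after* it has been SACKed and more than RTT plus a small reordering window has elapsed since the segment went out. All names below are illustrative, not from the FreeBSD RACK stack.

```c
#include <stdint.h>

/* Sketch of RACK's time-based loss marking (draft-ietf-tcpm-rack).
 * Illustrative names; times are in arbitrary consistent units. */
struct rack_seg {
    uint64_t xmit_time;     /* when this segment was (last) sent */
};

static int
rack_seg_lost(const struct rack_seg *seg, uint64_t newest_sacked_xmit,
    uint64_t now, uint64_t rtt, uint64_t reo_wnd)
{
    /* A segment sent later must already have been SACKed... */
    if (newest_sacked_xmit <= seg->xmit_time)
        return 0;
    /* ...and RTT + the reordering window must have elapsed. */
    return now >= seg->xmit_time + rtt + reo_wnd;
}
```

The send map mentioned above is what makes this possible: tracking per-segment transmit times gives a far sharper loss signal than classic SACK-counting.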
* BBR
* Experimental congestion control, but actually a different stack
* Builds on RACK
* Even higher cost than RACK
* BBR v1.0 is controversial
* Netflix has enhanced this for their implementation
* In small-router-buffer scenarios it is unfair to NewReno/CUBIC
* BBR v2.0 looks to improve this
* Netflix not necessarily sold on Google's ideas
* Assumes loss is not congestion based
* "Policer detection" to notice when you are being rate limited by a middlebox
* "Blackbox Recorder"
* Volunteer to make ports? :-)
* Came from Netflix
* Logs the state of the tcpcb, the packet, timers, and other data to a ring buffer
* Can be dumped out to userspace
* Tooling exists, needs ports
* Writes out pcapng files
* Traceviewer provides visual interface
* Analysis daemon that runs continuously and runs tests against the data, in the form of assertions
* After panic, can extract the data from the ring buffer
* RACK and BBR development depended upon blackbox
* Extend wireshark to understand the metadata
* Attend SharkFest to present FreeBSD work
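The recorder's central data structure is an overwrite-oldest ring buffer, which is what lets the most recent connection history survive to be dumped after a panic. A minimal sketch (record layout and names are illustrative, not the real blackbox format):

```c
#include <stdint.h>
#include <string.h>

/* Sketch of a blackbox-style event ring: fixed slots, oldest entries
 * overwritten, so recent history is always available for a dump. */
#define BB_SLOTS 4              /* tiny for illustration */

struct bb_rec {
    uint64_t seq;               /* monotonically increasing event number */
    char     what[32];          /* event description */
};

struct bb_ring {
    struct bb_rec rec[BB_SLOTS];
    uint64_t next;              /* total records ever written */
};

static void
bb_log(struct bb_ring *rb, const char *what)
{
    struct bb_rec *r = &rb->rec[rb->next % BB_SLOTS];
    r->seq = rb->next++;
    strncpy(r->what, what, sizeof(r->what) - 1);
    r->what[sizeof(r->what) - 1] = '\0';
}

/* Sequence number of the oldest record still held (dump starts here). */
static uint64_t
bb_oldest(const struct bb_ring *rb)
{
    return rb->next > BB_SLOTS ? rb->next - BB_SLOTS : 0;
}
```

A dumper walks from `bb_oldest()` to `next - 1`, which is effectively what the userspace tooling does when converting the buffer to pcapng.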
* RCU "Locking"
* mmacy@ applied RCU to IP stack
* Requires a mindset change
* read-locks are not always "locks"
* Register your intention to read the data structure
* ConcurrencyKit will not garbage collect the data while you are using it
* In 13 we should shift to using these more
* To date we only have a first pass
* More thought about which data structures require "full" locks
* Make engineering decisions to use the new CK features more
* Avoid "lock chains" that require acquiring many locks in a sequence
* Rethink locking from a more fundamental perspective
* Used to allow add/remove from list, while another process is walking through the list
* Netflix is committed to upstreaming and being good community citizens
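The "register your intention to read" contract can be modeled in a few lines: readers enter and exit a read section, and a writer only reclaims a detached object once readers have drained. The sketch below uses a single shared counter purely for clarity; real ConcurrencyKit epoch reclamation is far more scalable (per-thread epoch records, no shared counter on the read path), and these function names are invented.

```c
#include <stdatomic.h>

/* Toy model of epoch/RCU-style read sections: not a lock, just a
 * registration that defers reclamation.  Illustrative only. */
static atomic_int readers;

static void read_enter(void) { atomic_fetch_add(&readers, 1); }
static void read_exit(void)  { atomic_fetch_sub(&readers, 1); }

/* Writer side: nonzero when detached memory may be reclaimed. */
static int safe_to_free(void) { return atomic_load(&readers) == 0; }
```

This is the mindset change: readers never block writers from unlinking an entry from a list; they only delay when its memory can be freed, which is what allows list traversal concurrent with add/remove.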