HLD Virtual Routing Forwarding (VRF)
=====

###### tags: `SONiC`

# Revision

|Revision| Author| Date |
|----|----|----|
|Initial version|Dung-Ru Tsai |04/06/2021|
|Add test case/neighbor nexthopkey/ config db schema|Dung-Ru Tsai |08/06/2021|
|Add src_mac, ip state, ip option config_db schema|Dung-Ru Tsai |27/07/2021|

# About this manual

A VRF contains a separate address space with unicast and multicast route tables for IPv4 and IPv6, and makes routing decisions independently of any other VRF.

Each switch has a default VRF and a management VRF:
- Management VRF (not discussed in this document)
    - The management VRF is for management purposes only.
    - Only the mgmt 0 interface can be in the management VRF.
    - The mgmt 0 interface cannot be assigned to another VRF.
    - No routing protocols can run in the management VRF (static only).
- Default VRF
    - All Layer 3 interfaces exist in the default VRF until they are assigned to another VRF.
    - Routing protocols run in the default VRF context unless another VRF context is specified.
    - The default VRF uses the default routing context for all show commands.

# Scope

This document only defines the unicast VRF. Management VRF, multicast VRF and VRF route leaking are not considered in this document.

# Definitions/Abbreviations

This section covers the abbreviations, if any, used in this high-level design document and their definitions.

| Abbreviation | Meaning |
|--------------|----------------------------|
| VRF | Virtual Routing and Forwarding |
| RIB | Routing Information Base |
| PBR | Policy based routing |
| FRR | FRRouting is an IP routing protocol suite for Linux and Unix platforms |

# 1. Requirements

1. Add or delete a VRF instance.
2. Bind an L3 interface to a VRF.
    - L3 interfaces include port interfaces, VLAN interfaces, LAG interfaces and loopback interfaces.
3. Static IP route with VRF.
4. Enable VRF-aware BGP in SONiC.
5. VRF scalability: up to 1000 VRFs can currently be supported after fixing a bug in FRR (depending on ASIC capability).
6. Loopback devices with VRF.

# 2. Architecture Design

```mermaid
graph TD
CONFIG_DB[(CONFIG_DB)]
APPL_DB[(APPL_DB)]
vrfmgrd(vrfmgrd)
intfmgrd(intfmgrd)
fpmsyncd(fpmsyncd)
zebra(zebra)
bgpd(bgpd)
vrfcli(VRF CLI)
orchagent(Orchagent)
Netlink[Kernel Netlink]

vrfcli-->CONFIG_DB

subgraph Database
CONFIG_DB
APPL_DB
ASIC_DB
end

subgraph SWSS-container
vrfmgrd
intfmgrd
orchagent
end

CONFIG_DB-->|create/del VRF|vrfmgrd
CONFIG_DB-->|VRF bind/unbind|intfmgrd

subgraph Syncd-container
syncd---SAI-API
SAI-API---ASIC-SDK
end

APPL_DB-->orchagent
orchagent-->ASIC_DB
ASIC_DB-->syncd
vrfmgrd-->|create/del VRF|APPL_DB
intfmgrd-->|VRF bind/unbind|APPL_DB
vrfmgrd-->|create/del VRF|Netlink
intfmgrd-->|VRF bind/unbind|Netlink

subgraph BGP-container
bgpd-->zebra
zebra-->bgpd
zebra-->fpmsyncd
end

Netlink-->|create/del VRF, VRF bind/unbind|zebra
fpmsyncd-->|VRF Route info|APPL_DB
```

## 2.1 SWSS Container

### 2.1.1 Config Manager Daemon

SONiC VRF needs two config manager daemons to handle CONFIG_DB changes. The manager daemons have two main tasks:
- Synchronize new config_db status to APPL_DB.
- Trigger netlink messages to notify the kernel and the FRR zebra daemon.

#### 2.1.1.1 Vrf Manager (vrfmgrd)

- Listens for VRF creation/deletion configuration in config_db `CFG_VRF_TABLE_NAME`. Once a creation is detected:
    1. Set `STATE_VRF_TABLE_NAME` state to ok.
    2. Update the kernel using iproute2 CLIs.
    3. Write VRF information to `APP_VRF_TABLE_NAME`.
- When vrfmgrd receives a VRF delete event, it won't process the event until all the devices belonging to this VRF are unbound from the VRF. Then:
    1. Delete the `APP_VRF_TABLE_NAME` entry by Vrf-name.
    2. Delete the `STATE_VRF_TABLE_NAME` entry by Vrf-name.
    3. Update the kernel using iproute2 CLIs.
    4. vrforch deletes the `STATE_VRF_OBJECT_TABLE_NAME` entry by Vrf-name.
- The vrfmgrd process is placed in the swss docker. In case of a swss docker warm reboot, the VRF device is still retained in the kernel, so when vrfmgrd starts up it recovers the VRF system state from the kernel.
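The vrfmgrd ordering described above (state, kernel, APPL_DB on add; deferred cleanup on delete) can be sketched as a small in-memory model. This is an illustration only, not SONiC code: the class `VrfManagerSketch`, its dictionaries standing in for the Redis tables, and the starting routing-table id `1001` are all assumptions. The `ip link add <name> type vrf table <id>` and `ip link del <name>` strings are standard iproute2 syntax, but they are only recorded here, never executed.

```python
class VrfManagerSketch:
    """Toy model of the vrfmgrd add/delete ordering (hypothetical, not SONiC code)."""

    def __init__(self, first_table_id=1001):  # assumed starting routing-table id
        self.next_table_id = first_table_id
        self.state_db = {}        # stands in for STATE_VRF_TABLE_NAME
        self.app_db = {}          # stands in for APP_VRF_TABLE_NAME
        self.bound = {}           # vrf_name -> set of bound device names
        self.pending_delete = set()
        self.kernel_cmds = []     # iproute2 commands that would be run

    def add_vrf(self, name, attrs=None):
        table_id = self.next_table_id
        self.next_table_id += 1
        self.bound.setdefault(name, set())
        self.state_db[name] = {"state": "ok"}                                     # step 1
        self.kernel_cmds.append(f"ip link add {name} type vrf table {table_id}")  # step 2
        self.app_db[name] = dict(attrs or {})                                     # step 3

    def del_vrf(self, name):
        # Defer the delete while any device is still bound to the VRF.
        if self.bound.get(name):
            self.pending_delete.add(name)
            return False
        self.app_db.pop(name, None)
        self.state_db.pop(name, None)
        self.kernel_cmds.append(f"ip link del {name}")
        self.bound.pop(name, None)
        self.pending_delete.discard(name)
        return True

    def bind(self, name, dev):
        self.bound[name].add(dev)

    def unbind(self, name, dev):
        self.bound[name].discard(dev)
        if name in self.pending_delete:   # retry a delete that was deferred earlier
            self.del_vrf(name)


mgr = VrfManagerSketch()
mgr.add_vrf("Vrf-red")
mgr.bind("Vrf-red", "Ethernet0")
deleted_now = mgr.del_vrf("Vrf-red")   # False: Ethernet0 is still bound
mgr.unbind("Vrf-red", "Ethernet0")     # the deferred delete completes here
```

The point of the sketch is the deferral: a delete request is remembered and only completes once the last bound device is released.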
#### 2.1.1.2 Interface Manager (intfmgrd)

IP address events and **VRF binding events** need to be handled separately. These two events have a sequence dependency.

- Listens for interface-to-VRF binding configuration in config_db.
    - bind to VRF event:
        - bind the kernel device to the master VRF
        - add the interface entry with VRF attribute to `APP_INTF_TABLE_NAME(INTF_TABLE)`
        - set the vrf-binding flag in the STATE_DB `STATE_INTERFACE_TABLE_NAME` table
    - unbind from VRF event:
        - wait until all IP addresses associated with the interface are removed. IP address information can be retrieved from the kernel.
        - move the kernel device back to the global VRF (default VRF)
        - delete the interface entry with VRF attribute from the `APP_INTF_TABLE_NAME(INTF_TABLE)` table
        - remove the vrf-binding flag from STATE_DB `STATE_INTERFACE_TABLE_NAME`
- Listens for interface IP address configuration in config_db.
    - add ip address event:
        - after the interface VRF binding is set, set the IP address on the kernel device
        - add the {interface_name:ip address} entry to `APP_INTF_TABLE_NAME(INTF_TABLE)` and `STATE_INTERFACE_TABLE_NAME`
    - del ip address event:
        - unset the IP address on the kernel device
        - delete the {interface_name:ip address} entry from `APP_INTF_TABLE_NAME` and `STATE_INTERFACE_TABLE_NAME`

#### 2.1.1.3 Neighbor (nbrmgrd)

- Listens to the neighbor configuration table (CFG_NEIGH_TABLE_NAME "NEIGH") in config_db, and adds a neighbor entry to the kernel only after the corresponding intf-bind-vrf event is processed.
- In the current implementation a neighbor may be added to the kernel before the intf-bind-vrf event. After the intf-bind-vrf event the kernel flushes all neighbors associated with this interface, so the neighbor configuration is lost.
- intf-bind-vrf: add the interface entry with VRF attribute.

### 2.1.2 Orchagent

#### 2.1.2.1 vrforch

- Monitors `APP_VRF_TABLE_NAME`, using `sai_create_virtual_router_fn` or `sai_remove_virtual_router_fn` defined in saivirtualrouter.h to track (VR, VRF) creation/deletion and save (vrf_name, vrf-vid) pairs.
- When vrforch receives a vrf-delete event for a given VRF, **the VRF object should be deleted only after the routes and router interfaces related to this VRF are removed. Removal of the neighbor objects related to the VRF is implicitly guaranteed by the removal of the router interface objects related to the VRF.**

#### 2.1.2.2 intfsorch

- Add vrforch as a member of intfsorch.
- intfsorch monitors app-intf-table.
- When APP_INTF_TABLE_NAME changes:
    - bind to vrf event: create the router interface with the vrf attribute and increase the refcnt of vrforch.
    - unbind from vrf event: wait until all IP addresses on the interface are removed, then remove the router interface with the vrf attribute and decrease the refcnt of vrforch.

After the binding, the router interface must be added again (set the interface IP). When the router interface is created, the attribute `SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID` is included in the `SAI_OBJECT_TYPE_ROUTER_INTERFACE` object.

#### 2.1.2.3 routeorch

- Add vrforch as a member of routeorch.
- Once APP_ROUTE_TABLE_NAME has a new update, get the VRF object ID from vrforch by vrf_name.
    - APP_ROUTE_TABLE_NAME is updated by FRR fpmsyncd.
- The nexthop key is changed from `ipaddress` to an `(ipaddress, intf_name)` pair. See `src/sonic-swss/orchagent/nexthopkey.h`.
- The key of a nexthop group is the set of nexthop keys.
- The value of routetable is changed from `ipaddresses` to a set of `(ipaddress, intf_name)` pairs.
- Expand the single routetable into multiple routetables with the vrf ID as the key.
- Update the refcnt of vrforch.

#### 2.1.2.4 neighorch changes

The key of a nexthop is changed from the IP address alone to an (ipaddress, intf_name) pair.

```clike
// Abridged; see nexthopkey.h for the full definition.
struct NextHopKey { IpAddress ip_address; string alias; };
typedef NextHopKey NeighborEntry;
NeighborEntry neighbor_entry = { ip_address, alias };
```

Note: intf_name is the alias name. See `src/sonic-swss/orchagent/nexthopkey.h`.

#### 2.1.2.5 aclorch changes

The key of a redirect nexthop is changed from the IP address alone to the combined form (ipaddress@Vrf-name).
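To make the `ipaddress@Vrf-name` form concrete, here is a minimal parsing sketch. It is not the parser from `nexthopkey.h`: the function name `parse_redirect_nexthops` is hypothetical, and representing the global VRF as an empty string is an assumption made only for this example. It accepts the comma-separated redirect parameter used in ACL rules and checks that VRF names carry the `Vrf` prefix this document requires.

```python
import ipaddress


def parse_redirect_nexthops(param):
    """Split 'IP[@Vrf-name],IP[@Vrf-name],...' into (ip, vrf_name) pairs."""
    pairs = []
    for token in param.split(","):
        token = token.strip()
        ip_text, sep, vrf_name = token.partition("@")
        ipaddress.ip_address(ip_text)  # raises ValueError on a malformed address
        if sep and not vrf_name.startswith("Vrf"):
            raise ValueError(f"VRF name must start with 'Vrf': {vrf_name!r}")
        # An empty vrf_name stands for the global VRF (assumed encoding).
        pairs.append((ip_text, vrf_name))
    return pairs


pairs = parse_redirect_nexthops("20.1.1.93@Vrf-blue, 30.1.1.93")
# pairs == [("20.1.1.93", "Vrf-blue"), ("30.1.1.93", "")]
```

A plain IP without `@Vrf-name` is still accepted, which mirrors the backward-compatible behaviour described for the global VRF.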
For the implementation, see `src/sonic-swss/orchagent/nexthopkey.h`.

## 2.2 BGP Container

### 2.2.1 fpmsyncd

- fpmsyncd adds VRF support. It can use `rtnl_route_get_table` to get the VRF table ID, but with the current FRR implementation this API returns the ifIndex of the master device for this VRF. The VRF name of the prefix can be derived from the ifIndex.
- The key of `APP_ROUTE_TABLE_NAME` is "vrf_name:prefix".
- A route from FRR has nexthop information which contains nexthop_ipaddress and the interface index. **The nexthop interface contains vrf information**, which is available for route-leak scenarios.

## 2.3 Database

### 2.3.1 CONFIG_DB

For VRF, VRF is the main table in CONFIG_DB. Eight tables need to change to support VRF: INTERFACE, LOOPBACK_INTERFACE, PORTCHANNEL_INTERFACE, VLAN_INTERFACE, BGP_NEIGHBOR, BGP_PEER_RANGE, ACL_RULE and STATIC_ROUTE.

#### CFG_VRF_TABLE_NAME "VRF"

Schema:

```
;defines virtual routing forward table
;
;Status: stable
key = VRF|vrf_name ;
fallback = "true"/"false";
v4 = "true"/"false"; Admin V4 state
v6 = "true"/"false"; Admin V6 state
ip_opt_action = "drop"/"forward"; Action for Packets with IP options
src_mac = "MAC Address"; Example format "00:12:34:56:78:9a"
ttl_action = "drop"/"forward"; Action for Packets with TTL 0 or 1
```

:::warning
The fallback feature is not supported yet.
:::

Redis DB dump:

```
127.0.0.1:6379[4]> keys *VRF*
1) "VRF|Vrf-green"
2) "VRF|Vrf-red"
127.0.0.1:6379[4]> hgetall "VRF|Vrf-green"
1) "NULL"
2) "NULL"
```

The following four sections are interface related.

#### CFG_INTF_TABLE_NAME "INTERFACE" changes

```json
"INTERFACE":{
    "Ethernet0":{
        "vrf_name":"Vrf-blue" // vrf_name must start with "Vrf" prefix
    },
    "Ethernet1":{
        "vrf_name":"Vrf-red"
    },
    "Ethernet2":{}, // This interface belongs to the global vrf. The entry is required even if the user does not use vrf.
    "Ethernet0|11.11.11.1/24": {},
    "Ethernet0|12.12.12.1/24": {},
    "Ethernet1|12.12.12.1/24": {},
    "Ethernet2|13.13.13.1/24": {}
},
```

Schema Changes in INTERFACE:

```
;Define INTERFACE table
key = INTERFACE|EthernetID
EthernetID = "Ethernet"VCHAR ; ethernet id with Ethernet prefix
; field = value
vrf_name = string ;VRF name with Vrf prefix
```

#### CFG_LOOPBACK_INTERFACE_TABLE_NAME "LOOPBACK_INTERFACE" changes

```json
"LOOPBACK_INTERFACE":{
    "Loopback0":{
        "vrf_name":"Vrf-yellow"
    },
    "Loopback0|14.14.14.1/32":{}
},
```

Schema Changes in LOOPBACK_INTERFACE:

```
;Define LOOPBACK_INTERFACE table
key = LOOPBACK_INTERFACE|LoopbackID
LoopbackID = "Loopback"VCHAR; loopback id with Loopback prefix
; field = value
vrf_name = string ;VRF name with Vrf prefix
```

#### CFG_LAG_INTF_TABLE_NAME "PORTCHANNEL_INTERFACE" changes

```json
"PORTCHANNEL_INTERFACE":{
    "Portchannel0":{
        "vrf_name":"Vrf-yellow"
    },
    "Portchannel0|16.16.16.1/24":{}
}
```

Schema Changes in PORTCHANNEL_INTERFACE:

```
;Define PORTCHANNEL_INTERFACE table
key = PORTCHANNEL_INTERFACE|PortchannelID
PortchannelID = "Portchannel"VCHAR ; portchannel id with Portchannel prefix
; field = value
vrf_name = string ;VRF name with Vrf prefix
```

#### CFG_VLAN_INTF_TABLE_NAME "VLAN_INTERFACE" changes

```json
"VLAN_INTERFACE": {
    "Vlan100":{
        "vrf_name":"Vrf-blue"
    },
    "Vlan100|15.15.15.1/24": {}
},
```

Schema Changes in VLAN_INTERFACE:

```
;Define VLAN_INTERFACE table
key = VLAN_INTERFACE|VlanID
VlanID = "Vlan"VCHAR ; vlan id with Vlan prefix
; field = value
vrf_name = string ;VRF name with Vrf prefix
```

#### CFG_BGP_NEIGHBOR_TABLE_NAME "BGP_NEIGHBOR" changes

```json
"BGP_NEIGHBOR": {
    "Vrf-blue|10.0.0.49": { // This neighbour belongs to Vrf-blue
        "name": "ARISTA09T0",
        "rrclient": "0",
        "local_addr": "10.0.0.48",
        "asn": "64009",
        "nhopself": "0"
    }
}
```

Schema Changes in BGP_NEIGHBOR:

```
;Define BGP_NEIGHBOR table
key = BGP_NEIGHBOR|Vrf-name|IP-prefix
Vrf-name = string ;VRF name with Vrf prefix
IP-prefix = IPv4Prefix / IPv6prefix
; field = value
; No changes
```

#### CFG_BGP_PEER_RANGE_TABLE_NAME "BGP_PEER_RANGE" changes

```json
"BGP_PEER_RANGE": {
    "BGPSLBPassive": { // This BGP_PEER_Group belongs to Vrf-blue
        "name": "BGPSLBPassive",
        "vrf_name": "Vrf-blue",
        "src_address":"10.1.1.2",
        "ip_range": [
            "192.168.8.0/27"
        ]
    }
}
```

Schema Changes in BGP_PEER_RANGE:

```
;Define BGP_PEER_RANGE table
key = BGP_PEER_RANGE|peer-range-name
peer-range-name = "BGPSLBPassive"/"BGPVac"
; field = value
vrf_name = string ;VRF name with Vrf prefix
```

#### CFG_ACL_RULE_TABLE_NAME "ACL_RULE" changes

The existing acl_rule_table definition is the following.

```json
"table1|rule1": {
    "L4_SRC_PORT": "99",
    "PACKET_ACTION": "REDIRECT:20.1.1.93,30.1.1.93"
},
"table1|rule2": {
    "L4_SRC_PORT": "100",
    "PACKET_ACTION": "REDIRECT:20.1.1.93"
},
```

To support vrf, the nexthop key changes from a single `{IP}` to an `{IP@Vrf-name}` pair. For backward compatibility the nexthop key `{IP}` is still supported, but it only works on the global vrf. The new acl_rule_table should look like the following.

```json
"table1|rule1": {
    "L4_SRC_PORT": "99",
    "PACKET_ACTION": "REDIRECT:20.1.1.93@Vrf-blue,30.1.1.93"
},
"table1|rule2": {
    "L4_SRC_PORT": "100",
    "PACKET_ACTION": "REDIRECT:20.1.1.93@Vrf-blue, 30.1.1.93"
},
```

The REDIRECT Vrf parser is implemented in `src/sonic-swss/orchagent/nexthopkey.h:NextHopKey(const std::string &str)`.

Schema change in ACL_RULE_TABLE

```
key: ACL_RULE_TABLE|table_name|rule_name ; key of the rule entry in the table,
                                         ; seq is the order of the rules
                                         ; when the packet is filtered by the
                                         ; ACL "policy_name".
                                         ; A rule is always associated with a
                                         ; policy.
;field = value
packet_action = "redirect:"redirect_parameter
                ; an action when the fields are matched
                ; we have a parameter in case of packet_action="redirect"
                ; This redirect_parameter defines a destination for redirected packets
                ; it could be:
                : name of physical port. Example: "Ethernet10"
                : name of LAG port Example: "PortChannel5"
                : next-hop ip address with Vrf name. Example: "10.0.0.1@Vrf-name"
                : next-hop group set of addresses Example: "10.0.0.1@Vrf-name,10.0.0.3"
redirect_action = string ; It could be:
                : name of physical port. Example: "Ethernet10"
                : name of LAG port Example: "PortChannel5"
                : next-hop ip address Example: "10.0.0.1@Vrf-name" or "10.0.0.1"
                : next-hop group set of addresses Example: "10.0.0.1,10.0.0.3" or "10.0.0.1@Vrf-name,10.0.0.3"
```

The VRF name is appended to the redirect parameter as `@Vrf-name`.

#### CFG_STATIC_ROUTE_TABLE_NAME "STATIC_ROUTE" changes

CONFIG_DB does not implement this table yet.

```
;Defines IP static route table
;
;Status: stable
key = STATIC_ROUTE|vrf_name|prefix ;
vrf_name = "Vrf"string ;VRF name with Vrf prefix
prefix = IPv4Prefix / IPv6prefix
```

Reference: [static route from bgpcfgd](https://github.com/Azure/SONiC/blob/master/doc/static-route/SONiC_static_route_hdl.md)

Preview of the JSON format:

```json
"STATIC_ROUTE": {
    "Vrf-red|192.168.113.0/24": {
        "blackhole":"False",
        "distance":"2",
        "ifname":"Ethernet4",
        "nexthop":"10.10.10.1"
    }
},
```

### 2.3.2 APPL_DB

For VRF, VRF_TABLE is the main table in APPL_DB. INTF_TABLE and ROUTE_TABLE also need to change.
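As a sketch of how the VRF name shows up in APPL_DB route keys, the helper below splits a key of the `ROUTE_TABLE:vrf-name:ip_prefix` form used in this document. The function name `split_route_key` is hypothetical, and treating a key without a `Vrf`-prefixed component as a default-VRF route (returned with an empty VRF name) is an assumption for illustration. The `Vrf` prefix check matters because IPv6 prefixes contain colons themselves.

```python
def split_route_key(key):
    """Return (vrf_name, prefix) from an APPL_DB ROUTE_TABLE key."""
    table, _, rest = key.partition(":")
    if table != "ROUTE_TABLE":
        raise ValueError(f"not a ROUTE_TABLE key: {key!r}")
    if rest.startswith("Vrf"):
        # VRF names carry the "Vrf" prefix, so the first ':' ends the VRF name.
        vrf_name, _, prefix = rest.partition(":")
        return vrf_name, prefix
    # No VRF component: treated as a default-VRF route (assumed encoding).
    return "", rest


v4 = split_route_key("ROUTE_TABLE:Vrf-red:10.22.22.0/24")
v6 = split_route_key("ROUTE_TABLE:Vrf-red:2001:db8::/64")
```

Splitting only on the first colon after the VRF name keeps IPv6 prefixes intact.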
#### APP_VRF_TABLE_NAME "VRF_TABLE"

Schema:

```
;defines virtual routing forward table
;
;Status: stable
key = VRF_TABLE:vrf_name ;
;field = value
fallback = "true"/"false"
```

Redis DB dump:

```
127.0.0.1:6379> keys *VRF*
1) "VRF_TABLE:Vrf-red"
2) "VRF_TABLE:Vrf-green"
127.0.0.1:6379> hgetall "VRF_TABLE:Vrf-red"
1) "NULL"
2) "NULL"
```

#### APP_INTF_TABLE_NAME "INTF_TABLE" changes

Schema:

```
;defines logical network interfaces, an attachment to a PORT name
;
;Status: stable
key = INTF_TABLE:ifname
;field = value
vrf_name = 1*15VCHAR ; newly added
```

Redis DB dump:

```
127.0.0.1:6379> hgetall "INTF_TABLE:Ethernet0"
1) "vrf_name"
2) "Vrf-red"
3) "mac_addr"
4) "00:00:00:00:00:00"
```

#### APP_ROUTE_TABLE_NAME "ROUTE_TABLE" changes

Schema:

```
;Stores a list of routes
;Status: Mandatory
key = ROUTE_TABLE:vrf-name:ip_prefix ;vrf-name starts with 'Vrf' prefix
```

Redis DB dump:

```
127.0.0.1:6379> keys *ROUTE*
1) "ROUTE_TABLE:Vrf-red:10.22.22.0/24"
127.0.0.1:6379> hgetall "ROUTE_TABLE:Vrf-red:10.22.22.0/24"
1) "nexthop"
2) "10.11.11.2"
3) "ifname"
4) "Ethernet0"
```

### 2.3.3 STATE_DB

For VRF, VRF_TABLE and VRF_OBJECT_TABLE are the main tables in STATE_DB.

#### STATE_VRF_TABLE_NAME "VRF_TABLE"

This table is only updated by vrfmgrd.

Schema:

```
;defines virtual routing forward table state
;
;Status: stable
key = VRF_TABLE|Vrf_name ; Vrf_name must be unique
state = ""/"ok" ; VRF created
```

#### STATE_VRF_OBJECT_TABLE_NAME "VRF_OBJECT_TABLE"

This table is only updated by vrforch.
Schema:

```
;defines virtual routing forward object table state
;
;Status: stable
key = VRF_OBJECT_TABLE|vrf_name ;
state = ""/"ok" ; VRF created
```

Redis DB dump:

```
127.0.0.1:6379[6]> keys *VRF*
1) "VRF_OBJECT_TABLE|Vrf-green"
2) "VRF_OBJECT_TABLE|Vrf-red"
3) "VRF_TABLE|Vrf-green"
4) "VRF_TABLE|Vrf-red"
127.0.0.1:6379[6]> hgetall "VRF_OBJECT_TABLE|Vrf-green"
1) "state"
2) "ok"
127.0.0.1:6379[6]> hgetall "VRF_TABLE|Vrf-green"
1) "state"
2) "ok"
```

### 2.3.4 ASIC_DB

Redis DB dump:

```bash
127.0.0.1:6379[1]> keys *VIRTUAL_ROUTER*
1) "ASIC_STATE:SAI_OBJECT_TYPE_VIRTUAL_ROUTER:oid:0x3000000000022" //This one is default VRF (Global VRF)
2) "ASIC_STATE:SAI_OBJECT_TYPE_VIRTUAL_ROUTER:oid:0x300000000062a"
3) "ASIC_STATE:SAI_OBJECT_TYPE_VIRTUAL_ROUTER:oid:0x300000000062b"

// Get the SAI object Attribute
127.0.0.1:6379[1]> hgetall "ASIC_STATE:SAI_OBJECT_TYPE_VIRTUAL_ROUTER:oid:0x300000000062a"
1) "NULL"
2) "NULL"
```

Find the objects that reference the VRF object ID:

```bash
127.0.0.1:6379[1]> keys *0x300000000062a*
1) "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"10.10.10.0/24\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x300000000062a\"}"
2) "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"10.10.10.12/32\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x300000000062a\"}"
3) "ASIC_STATE:SAI_OBJECT_TYPE_VIRTUAL_ROUTER:oid:0x300000000062a"
```

The output shows that ROUTE_ENTRY objects are using the VRF object. Next, show the attributes of one ROUTE_ENTRY object:
```bash
127.0.0.1:6379[1]> hgetall "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"10.10.10.0/24\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x300000000062a\"}"
1) "SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID"
2) "oid:0x600000000062c"
```

Then you can get the ROUTER_INTERFACE information:

```bash
127.0.0.1:6379[1]> keys *0x600000000062c*
1) "ASIC_STATE:SAI_OBJECT_TYPE_ROUTER_INTERFACE:oid:0x600000000062c"
127.0.0.1:6379[1]> hgetall "ASIC_STATE:SAI_OBJECT_TYPE_ROUTER_INTERFACE:oid:0x600000000062c"
1) "SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID"
2) "oid:0x300000000062a"
3) "SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS"
4) "52:54:00:12:34:56"
5) "SAI_ROUTER_INTERFACE_ATTR_TYPE"
6) "SAI_ROUTER_INTERFACE_TYPE_PORT"
7) "SAI_ROUTER_INTERFACE_ATTR_PORT_ID"
8) "oid:0x1000000000005"
9) "SAI_ROUTER_INTERFACE_ATTR_MTU"
10) "9100"
11) "SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID"
12) "0"
```

# 3. Flows

## 3.1 Part 1: CONFIG_DB to APPL_DB

```mermaid
sequenceDiagram
participant config_db
participant vrfMgrd
participant intfMgrd
participant kernel
participant frr
participant app_db
config_db->>vrfMgrd: add vrf event
vrfMgrd->>kernel: add kernel device
vrfMgrd->>app_db: add app-vrf-table entry
Note over kernel: create vrf master device
kernel->>frr: notify vrf master device status
config_db->>vrfMgrd: del vrf event
Note over vrfMgrd: wait until all the <br/>devices belonging to <br/>this VRF are unbound
vrfMgrd->>kernel: del kernel device
vrfMgrd->>app_db: del app-vrf-table entry
Note over kernel: remove vrf master device
kernel->>frr: notify vrf master device status
config_db->>intfMgrd: bind to vrf event
intfMgrd->>kernel: issue kernel cmd
intfMgrd->>app_db: add app-intf-table entry
Note over kernel: set interface master vrf
kernel->>frr: notify slave interface status
config_db->>intfMgrd: unbind from vrf event
Note over intfMgrd: wait until all the <br/>ip addresses belonging<br/>to the interface are<br/> removed
intfMgrd->>kernel: issue kernel cmd
intfMgrd->>app_db: del app-intf-table entry
Note over kernel: unset interface master vrf
kernel->>frr: notify interface status
config_db->>intfMgrd: add ip address event
Note over intfMgrd: wait until vrf-bind<br/> event done
intfMgrd->>kernel: issue kernel cmd
intfMgrd->>app_db: add app-intf-prefix-table entry
Note over kernel: add interface ip address
kernel->>frr: notify interface address status
config_db->>intfMgrd: del ip address event
intfMgrd->>kernel: issue kernel cmd
intfMgrd->>app_db: del app-intf-prefix-table entry
Note over kernel: del interface ip address
kernel->>frr: notify interface address status
kernel->>app_db: add/del neigh-table entry by neighsyncd
frr->>app_db: add/del route-table entry by fpmsyncd
```

## 3.2 Part 2: APPL_DB to ASIC_DB

```mermaid
sequenceDiagram
participant app_db
participant vrfOrch
participant intfOrch
participant neighOrch
participant routeOrch
participant SAI
app_db->>vrfOrch: add app-vrf-table entry event
vrfOrch->>SAI: call sai_create_virtual_router
app_db->>vrfOrch: del app-vrf-table entry event
Note over vrfOrch: wait until the refcnt<br/> of vrf obj is zero
vrfOrch->>SAI: call sai_remove_virtual_router
app_db->>intfOrch: add app-intf-table entry event
intfOrch->>SAI: call sai_create_router_interface
Note over intfOrch: increase the refcnt of<br/> vrf obj
intfOrch->>SAI: call sai_create_route_entry(ip2me and subnet)
Note over intfOrch: increase the refcnt of<br/> vrf obj
app_db->>intfOrch: del app-intf-table entry event
Note over intfOrch: wait until all ip <br/>addresses on interface<br/> are removed
intfOrch->>SAI: call sai_remove_router_interface
intfOrch->>SAI: call sai_remove_route_entry(ip2me and subnet)
Note over intfOrch: decrease the refcnt of<br/> vrf obj
app_db->>neighOrch: add/del app-neigh-table entry event
neighOrch->>SAI: Call sai_add/remove_neigh_entry
neighOrch->>SAI: Call sai_add/remove_next_hop
app_db->>routeOrch: add app-route-table entry event
Note over routeOrch: wait until vrf obj and<br/> rif obj are created
routeOrch->>SAI: Call sai_add_next_hop_group and sai_add_route_entry
Note over routeOrch: increase the refcnt of<br/> vrf obj
app_db->>routeOrch: del app-route-table entry event
routeOrch->>SAI: Call sai_remove_route_entry and sai_remove_next_hop_group
Note over routeOrch: decrease the refcnt of<br/> vrf obj
```

# 4. SAI API

The SAI header is `saivirtualrouter.h`. The object type is `SAI_OBJECT_TYPE_VIRTUAL_ROUTER`.

## 4.1 Methods

```clike
/**
 * @brief Virtual router methods table retrieved with sai_api_query()
 */
typedef struct _sai_virtual_router_api_t
{
    sai_create_virtual_router_fn create_virtual_router;
    sai_remove_virtual_router_fn remove_virtual_router;
    sai_set_virtual_router_attribute_fn set_virtual_router_attribute;
    sai_get_virtual_router_attribute_fn get_virtual_router_attribute;
} sai_virtual_router_api_t;
```

- create_virtual_router
    - Create the `SAI_OBJECT_TYPE_VIRTUAL_ROUTER` object and get the router id.
- remove_virtual_router
    - Remove the virtual router object.
- set_virtual_router_attribute
    - Update the SAI attribute data.
- get_virtual_router_attribute
    - Get the SAI attribute data.

## 4.2 SAI Attribute

- SAI_VIRTUAL_ROUTER_ATTR_ADMIN_V4_STATE
    - JSON file field in VRF: "v4"
- SAI_VIRTUAL_ROUTER_ATTR_ADMIN_V6_STATE
    - JSON file field in VRF: "v6"
- SAI_VIRTUAL_ROUTER_ATTR_SRC_MAC_ADDRESS
    - JSON file field in VRF: "src_mac"
- SAI_VIRTUAL_ROUTER_ATTR_VIOLATION_TTL1_PACKET_ACTION
    - JSON file field in VRF: "ttl_action"
- SAI_VIRTUAL_ROUTER_ATTR_VIOLATION_IP_OPTIONS_PACKET_ACTION
    - JSON file field in VRF: "ip_opt_action"

:::warning
TODO: Not supported
```
/*
 * @brief if it is global vrf
 *
 * @type bool
 * @flags CREATE_AND_SET
 * @default true
 */
SAI_VIRTUAL_ROUTER_ATTR_GLOBAL

/*
 * @brief continue to do global fib lookup while current vrf fib lookup
 * missed
 *
 * @type bool
 * @flags CREATE_AND_SET
 * @default false
 */
SAI_VIRTUAL_ROUTER_ATTR_FALLBACK
```
:::

# 5. Configuration and Management

## 5.1 Config JSON File

### 5.1.1 VRF_TABLE

CFG_VRF_TABLE_NAME="VRF"

```json
"VRF": {
    "Vrf-green": {
        "src_mac": "11:22:33:44:55:66",
        "ttl_action": "drop",
        "v4": "False",
        "ip_opt_action": "forward"
    },
    "Vrf-red": {
    }
}
```

### 5.1.2 STATIC_ROUTE with VRF

```json
"STATIC_ROUTE": {
    "Vrf-red|192.168.113.0/24": {
        "blackhole":"False",
        "distance":"2",
        "ifname":"Ethernet4",
        "nexthop":"10.10.10.1"
    },
    "Vrf-blue|192.168.114.0/24": {
        "blackhole":"False",
        "distance":"2",
        "ifname":"Ethernet4",
        "nexthop":"10.10.10.1"
    }
},
```

The `config_db.json` changes for the other subsystems with VRF are listed in section [2.3.1 CONFIG_DB](#231-CONFIG_DB).

## 5.2 VRF CLI

### 5.2.1 VRF show commands

`show vrf`

This command displays all VRFs configured on the system along with the interfaces bound to each VRF. If a Vrf-name is given and that VRF exists, only the interfaces bound to that VRF are displayed; if the VRF does not exist, nothing is displayed.

- Usage:
```
show vrf [<vrf_name>]
```
- Example:
```
admin@sonic:~$ show vrf
VRF       Interfaces
-------   ------------
default   Vlan20
Vrf-red   Vlan100
          Loopback11
Vrf-blue  Loopback100
          Loopback102
```

### 5.2.2 config vrf

#### config vrf add

This command creates a VRF in the SONiC system with the provided Vrf-name.

- Usage:
```
config vrf add <Vrf-name>
```

Note: Vrf-name must always start with the keyword "Vrf".

#### config vrf del

This command deletes the VRF named Vrf-name.

- Usage:
```
config vrf del <Vrf-name>
```

#### config vrf add_vrf_vni_map

This command maps a VNI to an existing Vrf-name.

- Usage:
```
config vrf add_vrf_vni_map <Vrf-name> <vni>
```

Make sure the VLAN ID, VRF name and VXLAN name exist before using the command.
```bash
config vlan add 1
config vxlan add Vxlan1
config vxlan map add Vxlan1 1 112
config vrf add_vrf_vni_map Vrf-blue 112
```

Show the resulting configuration with the vxlan CLI command:

```
show vxlan vlanvnimap
+--------+-------+
| VLAN   |   VNI |
+========+=======+
| Vlan1  |   112 |
+--------+-------+
Total count : 1
```

#### config vrf del_vrf_vni_map

This command deletes the VNI mapping of the VRF.

- Usage:
```
config vrf del_vrf_vni_map <Vrf-name>
```

### 5.2.3 Interface bind with VRF

- Usage:
```bash
config interface vrf bind <Interface> <Vrf-name>
config interface vrf unbind <Interface>
```

e.g. `config interface vrf bind Ethernet4 Vrf-blue`

### 5.2.4 config route with VRF

- Usage:
```bash
config route add prefix vrf <Vrf-name> <ip_prefix> nexthop vrf <Vrf-name> <nexthop-ip>
config route del prefix vrf <Vrf-name> <ip_prefix> nexthop vrf <Vrf-name> <nexthop-ip>
```

Add operation:
`config route add prefix vrf Vrf-blue 192.168.113.0/24 nexthop vrf Vrf-blue 10.10.10.1`

Delete operation:
`config route del prefix vrf Vrf-blue 192.168.113.0/24 nexthop vrf Vrf-blue 10.10.10.1`

### 5.2.5 show ip route with vrf

- Usage
```bash
show ip route [vrf <vrf-name>] [<ip_address>]
show ipv6 route [vrf <vrf-name>] [<ip_address>]
```

# 6. Restrictions/Limitations

## 6.1 Limitations for VRFs

- When you make an interface a member of an existing VRF, SONiC removes all Layer 3 configurations. You should configure all Layer 3 parameters after adding an interface to a VRF.
- If you configure an interface for a VRF before the VRF exists, the interface is operationally down until you create the VRF.

## 6.2 Limitations for VRF Route Leaking

- Route leaking is supported between any two non-default VRFs and from the default VRF to a non-default VRF.
- By default, the maximum number of IP prefixes that can be imported from the default VRF into a non-default VRF is 1000 routes.
- There is no limit on the number of routes that can be leaked between two non-default VRFs.

# 7. Test cases

## 7.1 Unit test Stage1

- test_VRFMgr_Comprehensive tests the SAI attributes.
```python
('v4', 'SAI_VIRTUAL_ROUTER_ATTR_ADMIN_V4_STATE', self.boolean_gen),
('v6', 'SAI_VIRTUAL_ROUTER_ATTR_ADMIN_V6_STATE', self.boolean_gen),
('src_mac', 'SAI_VIRTUAL_ROUTER_ATTR_SRC_MAC_ADDRESS', self.mac_addr_gen),
('ttl_action', 'SAI_VIRTUAL_ROUTER_ATTR_VIOLATION_TTL1_PACKET_ACTION', self.packet_action_gen),
('ip_opt_action', 'SAI_VIRTUAL_ROUTER_ATTR_VIOLATION_IP_OPTIONS_PACKET_ACTION', self.packet_action_gen),
('l3_mc_action', 'SAI_VIRTUAL_ROUTER_ATTR_UNKNOWN_L3_MULTICAST_PACKET_ACTION', self.packet_action_gen),
```
- test_VRFMgr
- test_VRFMgr_Update
- test_VRFMgr_Capacity

## 7.2 Unit test Stage2

The Vrf vs test plan is used to verify the function of swss by checking the content of APP_DB, ASIC_DB and the kernel. FRR and SAI functions are covered by the ansible pytest test cases.

No|Test case summary
---------|----------
1|Verify that the vrf entry from config is pushed correctly by vrfmgrd to APP_DB and linux kernel.
2|Verify that the Orchagent is pushing the vrf entry into ASIC_DB by checking the contents in the ASIC_DB.
3|Verify that a random combination of vrf attributes can be successfully configured by checking the contents in the APP_DB and ASIC_DB.
4|Verify that the vrf attribute can be updated successfully after vrf is created by checking the contents in the ASIC_DB.
5|Verify that the vrf entries can be successfully removed from the CONFIG_DB, APP_DB and ASIC_DB.
6|Verify that the maximum number of vrf entries that can be created reaches 1K.
7|Verify that the interface entry from config is pushed correctly by intfmgrd to APP_DB and linux kernel.
8|Verify that the Orchagent is receiving interface creation and deletion from APP_DB.
9|Verify that the Orchagent is pushing the interface entry into ASIC_DB by checking the contents in the ASIC_DB.
10|Verify that the port interface bind IPv4 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
11|Verify that the port interface bind IPv4 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
12|Verify that the different port interface bound to different vrf can configure the same IPv4 Address by checking the APP_DB and ASIC_DB.
13|Verify that the IPv4 address is removed successfully from the port interface by checking the contents in the APP_DB and ASIC_DB.
14|Verify that the port interface bind IPv6 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
15|Verify that the port interface bind IPv6 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
16|Verify that the IPv6 address is removed successfully from the port interface by checking the contents in the APP_DB and ASIC_DB.
17|Verify that the lag interface bind IPv4 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
18|Verify that the lag interface bind IPv4 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
19|Verify that the IPv4 address is removed successfully from the lag interface by checking the contents in the APP_DB and ASIC_DB.
20|Verify that the lag interface bind IPv6 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
21|Verify that the lag interface bind IPv6 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
22|Verify that the IPv6 address is removed successfully from the lag interface by checking the contents in the APP_DB and ASIC_DB.
23|Verify that the vlan interface bind IPv4 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
24|Verify that the vlan interface bind IPv4 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
25|Verify that the IPv4 address is removed successfully from the vlan interface by checking the contents in the APP_DB and ASIC_DB.
26|Verify that the vlan interface bind IPv6 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
27|Verify that the vlan interface bind IPv6 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
28|Verify that the IPv6 address is removed successfully from the vlan interface by checking the contents in the APP_DB and ASIC_DB.
29|Verify that the loopback interface bind IPv4 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
30|Verify that the loopback interface bind IPv4 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
31|Verify that the IPv4 address is removed successfully from the loopback interface by checking the contents in the APP_DB and ASIC_DB.
32|Verify that the loopback interface bind IPv6 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
33|Verify that the IPv6 address is removed successfully from the loopback interface by checking the contents in the APP_DB and ASIC_DB.
34|Verify that the neighsyncd pushed neighbor entries to APP_DB correctly by checking the contents in the APP_DB.
35|Verify that the Orchagent is pushing the neighbor entry into ASIC_DB by checking the contents in the ASIC_DB.
36|Verify that the IPv4 neighbor is created and deleted successfully by checking the contents in the APP_DB and ASIC_DB.
37|Verify that the IPv6 neighbor is created and deleted successfully by checking the contents in the APP_DB and ASIC_DB.
38|Verify that the different interface with different vrf can add the same IPv4 neighbor address by checking the APP_DB and ASIC_DB.
39|Verify that the fpmsyncd pushed route entries to APP_DB correctly by checking the contents in the APP_DB.
40|Verify that the Orchagent is pushing the route entry into ASIC_DB by checking the contents in the ASIC_DB.
41|Verify that an IPv4 route entry is added successfully by checking the contents of ASIC_DB.
42|Verify that an IPv4 route entry is deleted successfully by checking the contents of APP_DB and ASIC_DB.
43|Verify that an IPv6 route entry is added successfully by checking the contents of ASIC_DB.
44|Verify that an IPv6 route entry is deleted successfully by checking the contents of APP_DB and ASIC_DB.
45|Verify that an IPv4 route entry with a VRF is added successfully by checking the contents of ASIC_DB.
46|Verify that an IPv4 route entry with a VRF is deleted successfully by checking the contents of APP_DB and ASIC_DB.
47|Verify that an IPv6 route entry with a VRF is added successfully by checking the contents of ASIC_DB.
48|Verify that an IPv6 route entry with a VRF is deleted successfully by checking the contents of APP_DB and ASIC_DB.
49|Verify that a route entry can point to a nexthop in a different VRF.
50|Verify that, when the ACL packet action is redirect to a nexthop, the ACL entry is added correctly by checking the contents of ASIC_DB.
51|Verify that the kernel VRF config stays the same during a vrfmgrd warm-reboot.
52|Verify that the VIRTUAL_ROUTER/ROUTE_ENTRY/NEIGH_ENTRY objects in ASIC_DB stay the same during a vrfmgrd warm-reboot by monitoring object changes in ASIC_DB.
53|Verify that vrfmgrd works well after a warm-reboot by checking that new config is pushed correctly to APP_DB and that ping works via VRF port interfaces.

## 7.3 System test

![](https://i.imgur.com/dpmh32N.jpg)

| Config | Verify | Step |
| ------ | ------ | -------------------------- |
| V | | Set up the topology. |
| | V | Check that each host can ping every other host. |
| V | | Set up Vrf-red and Vrf-blue and bind the interfaces. |
| V | | Put Host112 and Host114 in Vrf-red. |
| V | | Put Host113 and Host115 in Vrf-blue. |
| | V | Check that Host112 can reach Host114 and Host113 can reach Host115. |
| | V | Check that Host112/Host114 cannot reach Host113/Host115, and vice versa. |

# 8. Open/Action items - if any

1. STATIC_ROUTE needs to be added to the CONFIG_DB.

# 9. Summary

------

**Don't COPY BELOW THIS LINE.**

# 10. Appendix

- [sonic vrf HLD](https://github.com/Azure/SONiC/blob/master/doc/vrf/sonic-vrf-hld.md)
- [Kernel VRF](https://www.kernel.org/doc/Documentation/networking/vrf.txt)
- [l3mdev](https://lwn.net/Articles/658471/)
- [sonic-utilities doc](https://github.com/Azure/sonic-utilities/blob/master/doc/Command-Reference.md#vxlan--vnet)
- [cisco n9k vrf](https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/6-x/unicast/configuration/guide/l3_cli_nxos/l3_virtual.html#86372)

## Requirement (TODO)

6. Loopback devices with VRF, in detail:
    - Add/delete loopback interfaces per VRF.
    - Support adding IPv4 and IPv6 host addresses on these loopback interfaces.
    - Support using these loopback interfaces as the source for control packets of various routing protocols. For instance, in a BGP multihop session the source IP of the BGP packets is the IP of the outgoing interface, which can change with interface status and bring the session down even though an alternate path to the BGP neighbor exists. If a loopback interface is used as the source, the BGP session does not flap.
    - The IP addresses of these loopback interfaces can be used to generate the router-id of the VRF, which routing protocols can then use.
    - Support using these interfaces as tunnel endpoints if required in the future.
    - Support using these interfaces as the source for IP unnumbered interfaces if required in the future.
7. Fallback lookup. (Not supported; needs a new SAI attribute.)
    - The fallback feature defined by RFC 4364 lets users in a given VRF reach the Internet through the global (main) routing table. Some enterprise users still rely on this to access the Internet in VPN environments.
8. VRF route leaking between VRFs. (Defined in another task force.)

> (TODO: need to check the code) An ideal approach is to handle the two events the way the Linux kernel does. For example, if the IP address is configured on an interface first, it is accepted. Later, when the interface is enslaved to a VRF, the IP address is removed from the master FIB and reprogrammed into the VRF table. However, this approach is very complicated to support. For example, the IP address may conflict with one already in the destination VRF, and the current SONiC infrastructure cannot detect or guard against that. So this approach is not supported in this VRF release.

## VRF Test PLAN

https://github.com/Azure/SONiC/blob/master/doc/vrf/vrf-ansible-test-plan.md

![](https://i.imgur.com/1tCXfru.png)
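
As a minimal sketch of the isolation check in the system test (section 7.3), the expected ping matrix can be derived from VRF membership alone: two hosts should reach each other only when they sit in the same VRF. The helper names `vrf_of` and `expected_reachability` are hypothetical and are not part of SONiC or the linked test plan; the host-to-VRF mapping is the one assumed from the system test steps.

```python
from itertools import permutations

# Assumed VRF membership, taken from the system test steps in section 7.3.
VRF_MEMBERS = {
    "Vrf-red": {"Host112", "Host114"},
    "Vrf-blue": {"Host113", "Host115"},
}


def vrf_of(host):
    """Return the name of the VRF that contains the given host."""
    for vrf, hosts in VRF_MEMBERS.items():
        if host in hosts:
            return vrf
    raise KeyError(f"{host} is not assigned to any VRF")


def expected_reachability():
    """Map every ordered (src, dst) host pair to whether ping should succeed."""
    hosts = sorted(h for members in VRF_MEMBERS.values() for h in members)
    return {
        (src, dst): vrf_of(src) == vrf_of(dst)
        for src, dst in permutations(hosts, 2)
    }


if __name__ == "__main__":
    for (src, dst), ok in sorted(expected_reachability().items()):
        print(f"{src} -> {dst}: {'reachable' if ok else 'isolated'}")
```

A test harness could compare this matrix against actual ping results from each host; how the pings are issued depends on the testbed and is left out here.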