HLD Virtual Routing Forwarding (VRF)
=====
###### tags: `SONiC`
# Revision
|Revision| Author| Date |
|----|----|----|
|Initial version|Dung-Ru Tsai |04/06/2021|
|Add test case/neighbor nexthopkey/ config db schema|Dung-Ru Tsai |08/06/2021|
|Add src_mac, ip state, ip option config_db schema|Dung-Ru Tsai |27/07/2021|
# About this manual
A VRF contains a separate address space with unicast and multicast route tables for IPv4 and IPv6, and makes routing decisions independently of any other VRF.
Each switch has a default VRF and a management VRF:
- Management VRF (not discussed in this document)
- The management VRF is for management purposes only.
- Only the mgmt 0 interface can be in the management VRF.
- The mgmt 0 interface cannot be assigned to another VRF.
- No routing protocols can run in the management VRF (static only).
- Default VRF
- All Layer 3 interfaces exist in the default VRF until they are assigned to another VRF.
- Routing protocols run in the default VRF context unless another VRF context is specified.
- The default VRF uses the default routing context for all show commands.
# Scope
This document only defines the unicast VRF. Management VRF, multicast VRF and VRF route leaking are not considered in this document.
# Definitions/Abbreviations
This section covers the abbreviations used in this high-level design document and their definitions.
| Abbreviation | Meaning |
|--------------|----------------------------|
| VRF | Virtual Routing and Forwarding |
| RIB | Routing Information Base |
| PBR | Policy based routing |
| FRR | FRRouting is an IP routing protocol suite for Linux and Unix platforms |
# 1. Requirements
1. Add or Delete VRF instance
2. Bind L3 interface to a VRF.
- L3 interface includes port interface, vlan interface, LAG interface and loopback interface.
3. Static IP route with VRF
4. Enable BGP VRF aware in SONiC
5. VRF scalability: up to 1000 VRFs can currently be supported after fixing a bug in FRR (depending on ASIC capability).
6. Loopback devices with vrf.
# 2. Architecture Design
```mermaid
graph TD
CONFIG_DB[(CONFIG_DB)]
APPL_DB[(APPL_DB)]
vrfmgrd(vrfmgrd)
intfmgrd(intfmgrd)
fpmsyncd(fpmsyncd)
zebra(zebra)
bgpd(bgpd)
vrfcli(VRF CLI)
orchagent(Orchagent)
Netlink[Kernel Netlink]
vrfcli-->CONFIG_DB
subgraph Database
CONFIG_DB
APPL_DB
ASIC_DB
end
subgraph SWSS-container
vrfmgrd
intfmgrd
orchagent
end
CONFIG_DB-->|create/del VRF|vrfmgrd
CONFIG_DB-->|VRF bind/unbind|intfmgrd
subgraph Syncd-container
syncd---SAI-API
SAI-API---ASIC-SDK
end
APPL_DB-->orchagent
orchagent-->ASIC_DB
ASIC_DB-->syncd
vrfmgrd-->|create/del VRF|APPL_DB
intfmgrd-->|VRF bind/unbind|APPL_DB
vrfmgrd-->|create/del VRF|Netlink
intfmgrd-->|VRF bind/unbind|Netlink
subgraph BGP-container
bgpd-->zebra
zebra-->bgpd
zebra-->fpmsyncd
end
Netlink-->|create/del VRF, VRF bind/unbind|zebra
fpmsyncd-->|VRF Route info|APPL_DB
```
## 2.1 SWSS Container
### 2.1.1 Config Manager Daemon
SONiC VRF needs two config manager daemons to handle CONFIG_DB changes. The manager daemons have two main tasks.
- Synchronize new config_db status to APPL_DB.
- Trigger netlink message to notify the kernel and FRR zebra daemon.
#### 2.1.1.1 Vrf Manager (vrfmgrd)
- Listens for VRF creation/deletion configuration in config_db `CFG_VRF_TABLE_NAME`. Once a creation is detected:
1. Set `STATE_VRF_TABLE_NAME` state to ok.
2. Update kernel using iproute2 CLIs.
3. Write VRF information to `APP_VRF_TABLE_NAME`.
- When vrfmgrd receives a VRF delete event, it won't process the event until all the devices belonging to this VRF are unbound from the VRF. Then:
1. Delete the `APP_VRF_TABLE_NAME` entry by Vrf-name.
2. Delete the `STATE_VRF_TABLE_NAME` entry by Vrf-name.
3. Update kernel using iproute2 CLIs.
4. vrforch will delete the `STATE_VRF_OBJECT_TABLE_NAME` entry by Vrf-name.
- The vrfmgrd process is placed in the swss docker. In case of a swss docker warm reboot, the VRF device is still retained in the kernel, so when vrfmgrd starts up it recovers the VRF system state from the kernel.
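
The add and delete handling above can be sketched as follows. This is a minimal illustration, not the vrfmgrd implementation: the class `VrfMgrSketch`, the table-id allocation starting at 1001, and the `bound` bookkeeping are assumptions made for the example. Only the iproute2 command forms (`ip link add ... type vrf table ...`, `ip link set ... up`, `ip link del ...`) are standard.

```python
def vrf_add_commands(vrf_name, table_id):
    """iproute2 commands that create a VRF master device and bring it up."""
    return [
        f"ip link add {vrf_name} type vrf table {table_id}",
        f"ip link set {vrf_name} up",
    ]


def vrf_del_commands(vrf_name):
    """iproute2 command that removes the VRF master device."""
    return [f"ip link del {vrf_name}"]


class VrfMgrSketch:
    """Hypothetical model of the add/delete handling; not the vrfmgrd code."""

    def __init__(self, first_table_id=1001):
        self.next_table_id = first_table_id  # assumed allocation scheme
        self.tables = {}                     # vrf_name -> kernel table id
        self.bound = {}                      # vrf_name -> set of bound devices

    def add(self, vrf_name):
        self.tables[vrf_name] = self.next_table_id
        self.next_table_id += 1
        self.bound[vrf_name] = set()
        return vrf_add_commands(vrf_name, self.tables[vrf_name])

    def delete(self, vrf_name):
        # The delete event is deferred (None) while any device is still bound.
        if self.bound[vrf_name]:
            return None
        del self.tables[vrf_name], self.bound[vrf_name]
        return vrf_del_commands(vrf_name)
```

The functions only build command strings, so the ordering logic can be exercised without touching the kernel.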
#### 2.1.1.2 Interface Manager (intfmgrd)
IP address events and **VRF binding events** need to be handled separately. These two events have a sequence dependency.
- Listening to interface binding to specific VRF configuration in config_db.
- bind to VRF event:
- bind kernel device to master VRF
- add interface entry with VRF attribute to `APP_INTF_TABLE_NAME(INTF_TABLE)`.
- set vrf-binding flag on STATE_DB `STATE_INTERFACE_TABLE_NAME` table.
- unbind from VRF event:
- wait until all IP addresses associated with the interface are removed. IP address information can be retrieved from the kernel.
- unbind kernel device from its VRF, returning it to the global VRF (default VRF)
- del interface entry with VRF attribute from `APP_INTF_TABLE_NAME(INTF_TABLE)` table
- Remove vrf-binding on STATE_DB `STATE_INTERFACE_TABLE_NAME`
- Listening to interface ip address configuration in config_db.
- add ip address event:
- After the interface VRF binding is set, set ip address on kernel device
- Add {interface_name:ip address} entry to `APP_INTF_TABLE_NAME(INTF_TABLE)` and `STATE_INTERFACE_TABLE_NAME`
- del ip address event:
- unset ip address on kernel device
- Delete {interface_name:ip address} entry from `APP_INTF_TABLE_NAME` and `STATE_INTERFACE_TABLE_NAME`
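
The sequence dependency between the two event types can be sketched as below. This is an illustrative model under stated assumptions: `IntfMgrSketch` and its fields are hypothetical names, and an IP address event that arrives before the binding is simply queued and replayed. The actual intfmgrd retry mechanism may differ.

```python
class IntfMgrSketch:
    """Hypothetical model: IP address events wait for the VRF binding event."""

    def __init__(self):
        self.bind_done = set()   # interfaces whose binding entry was processed
        self.pending_ips = {}    # interface -> prefixes waiting for the binding
        self.kernel_cmds = []    # iproute2 commands that would be issued

    def on_bind(self, ifname, vrf_name=None):
        # vrf_name=None models an empty entry: the interface stays in the global VRF.
        if vrf_name:
            self.kernel_cmds.append(f"ip link set dev {ifname} master {vrf_name}")
        self.bind_done.add(ifname)
        # Replay the address events that arrived too early.
        for prefix in self.pending_ips.pop(ifname, []):
            self.on_ip_add(ifname, prefix)

    def on_ip_add(self, ifname, prefix):
        if ifname not in self.bind_done:
            self.pending_ips.setdefault(ifname, []).append(prefix)
            return
        self.kernel_cmds.append(f"ip address add {prefix} dev {ifname}")
```

Queuing matters because binding a device to a VRF master after its addresses are set would place those addresses in the wrong routing context.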
#### 2.1.1.3 Neighbor (nbrmgrd)
- Listens to the neighbor configuration table (CFG_NEIGH_TABLE_NAME "NEIGH") in config_db and adds a neighbor entry to the kernel only after the corresponding intf-bind-vrf event is processed.
- In the current implementation a neighbor may be added to the kernel before the intf-bind-vrf event. After the intf-bind-vrf event the kernel flushes all neighbors associated with this interface, so the neighbor configuration gets lost.
- intf-bind-vrf: add interface entry with VRF attribute.
### 2.1.2 Orchagent
#### 2.1.2.1 vrforch
- Monitors `APP_VRF_TABLE_NAME`, using `sai_create_virtual_router_fn` or
`sai_remove_virtual_router_fn` defined in saivirtualrouter.h to track (VR, VRF) creation/deletion and save (vrf_name, vrf-vid) pairs.
- When vrforch receives a vrf-delete event for a given VRF, **this VRF object should be deleted only after the routes and router interfaces related to this VRF are removed. Removal of the neighbor objects related to the VRF is implicitly guaranteed by removal of the router interface objects related to the VRF.**
#### 2.1.2.2 intfsorch
- add vrforch as a member of intfsorch
- intfsorch monitors app-intf-table
- When APP_INTF_TABLE_NAME changes
- bind to vrf event: create router interface with vrf attribute and increase refcnt of vrforch.
- unbind from vrf event: wait until all ip addresses on the interface are removed, then remove router interface with vrf attribute, decreasing refcnt of vrforch
After the binding, we must add the router interface again (set interface ip). During router interface creation, the attribute `SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID` will be included in the `SAI_OBJECT_TYPE_ROUTER_INTERFACE` object.
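
The reference counting that ties intfsorch (and routeorch) to vrforch can be illustrated with a small sketch. The class and method names here are hypothetical and the real orchagent code is C++. The sketch only shows the deferral rule: a VRF object is removed once no router interface or route still references it.

```python
class VrfOrchSketch:
    """Hypothetical reference-counted VRF objects, as described for vrforch."""

    def __init__(self):
        self.refcnt = {}              # vrf_name -> number of dependent objects
        self.pending_delete = set()   # VRFs whose delete event is waiting

    def create(self, vrf_name):
        self.refcnt[vrf_name] = 0

    def increase(self, vrf_name):
        self.refcnt[vrf_name] += 1

    def decrease(self, vrf_name):
        self.refcnt[vrf_name] -= 1
        if vrf_name in self.pending_delete:
            self.remove(vrf_name)     # retry the deferred delete

    def remove(self, vrf_name):
        """Return True once the VRF object is actually removed."""
        if self.refcnt[vrf_name] > 0:
            self.pending_delete.add(vrf_name)
            return False
        self.pending_delete.discard(vrf_name)
        del self.refcnt[vrf_name]
        return True
```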
#### 2.1.2.3 routeorch
- Add vrforch as a member of routeorch
- Once APP_ROUTE_TABLE_NAME has a new update, get the VRF object ID from vrforch by vrf_name.
- APP_ROUTE_TABLE_NAME is updated by FRR fpmsyncd.
- Nexthop key is changed to `(ipaddress, intf_name)` pair from `ipaddress`. See `src/sonic-swss/orchagent/nexthopkey.h`.
- The key of a nexthop group is the set of nexthop keys.
- The value of routetable is changed to the set of `(ipaddress, intf_name)` pairs from `ipaddresses`
- Expand the single routetable to multiple routetables with vrf ID as the key
- Update refcnt of vrforch
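
As a rough illustration of the data-structure change, the sketch below keeps one route table per VRF and uses a set of `(ipaddress, intf_name)` pairs as the next-hop group. The identifiers `route_tables`, `add_route`, and the VRF ids `"vr-red"` / `"vr-blue"` are placeholders for the example, not names from routeorch.

```python
from collections import defaultdict

# vrf_id -> {prefix -> frozenset of (ip_address, intf_name) next-hop keys}
route_tables = defaultdict(dict)


def add_route(vrf_id, prefix, nexthops):
    """Store a route in the table of its VRF; the next-hop group is a set of keys."""
    route_tables[vrf_id][prefix] = frozenset(nexthops)


# The same prefix and the same next-hop IP can exist in two VRFs without
# clashing, because the VRF selects the table and the interface name
# disambiguates the next hop.
add_route("vr-red", "10.22.22.0/24", [("10.11.11.2", "Ethernet0")])
add_route("vr-blue", "10.22.22.0/24", [("10.11.11.2", "Ethernet4")])
```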
#### 2.1.2.4 neighorch changes
The key of a nexthop is now changed from only ipaddress to a pair of (ipaddress, intf_name).
```clike
struct NextHopKey {};
typedef NextHopKey NeighborEntry;
NeighborEntry neighbor_entry = { ip_address, alias };
```
Note: intf_name is the interface alias.
See `src/sonic-swss/orchagent/nexthopkey.h`.
#### 2.1.2.5 aclorch changes
The key of redirect-nexthop is changed from only ip address to the form ipaddress@Vrf-name.
See `src/sonic-swss/orchagent/nexthopkey.h`.
## 2.2 BGP Container
### 2.2.1 fpmsyncd
- fpmsyncd will add VRF support; it can use `rtnl_route_get_table` to get the VRF table ID.
With the current FRR implementation, however, this API returns the ifIndex of the master device for this VRF. The VRF name of a prefix can be derived from the ifIndex.
- The key of `APP_ROUTE_TABLE_NAME` is "vrf_name:prefix".
- A route from FRR has nexthop information which contains the nexthop IP address and interface index. **The nexthop interface carries the VRF information**, which makes it usable in route-leak scenarios.
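
Because IPv6 prefixes also contain `:`, a consumer of the "vrf_name:prefix" key has to be careful when splitting it. The sketch below shows one possible approach; the helper names are hypothetical, and it assumes that routes in the global VRF carry no VRF name in the key and that every VRF name starts with "Vrf".

```python
def make_route_key(prefix, vrf_name=None):
    """Build the ROUTE_TABLE key suffix for a prefix in an optional VRF."""
    return f"{vrf_name}:{prefix}" if vrf_name else prefix


def split_route_key(key):
    """Return (vrf_name, prefix); vrf_name is None for the global VRF."""
    head, sep, rest = key.partition(":")
    # Only treat the first field as a VRF name if it has the "Vrf" prefix;
    # otherwise the colon belongs to an IPv6 address.
    if sep and head.startswith("Vrf"):
        return head, rest
    return None, key
```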
## 2.3 Database
### 2.3.1 CONFIG_DB
For VRF, VRF is the main table in CONFIG_DB.
Eight tables need to change to support VRF: INTERFACE, LOOPBACK_INTERFACE, PORTCHANNEL_INTERFACE, VLAN_INTERFACE, BGP_NEIGHBOR, BGP_PEER_RANGE, ACL_RULE and STATIC_ROUTE.
#### CFG_VRF_TABLE_NAME "VRF"
Schema:
```
;defines virtual routing forward table
;
;Status: stable
key = VRF_TABLE|vrf_name ;
fallback = "true"/"false";
v4 = "true"/"false"; Admin V4 state
v6 = "true"/"false"; Admin V6 state
ip_opt_action = "drop"/"forward"; Action for Packets with IP options
src_mac = "MAC Address"; Example format "00:12:34:56:78:9a"
ttl_action = "drop"/"forward"; Action for Packets with TTL 0 or 1
```
:::warning
The fallback feature is not supported yet.
:::
Redis DB dump:
```
127.0.0.1:6379[4]> keys *VRF*
1) "VRF|Vrf-green"
2) "VRF|Vrf-red"
127.0.0.1:6379[4]> hgetall "VRF|Vrf-green"
1) "NULL"
2) "NULL"
```
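
A VRF entry can be checked against the schema above before it is written to CONFIG_DB. The following is a minimal sketch, not an existing SONiC validator: the function name and error strings are made up for the example, and boolean and action values are compared case-insensitively, which is an assumption.

```python
import re

# Allowed values per field, taken from the VRF schema.
VRF_FIELD_VALUES = {
    "fallback": {"true", "false"},
    "v4": {"true", "false"},
    "v6": {"true", "false"},
    "ip_opt_action": {"drop", "forward"},
    "ttl_action": {"drop", "forward"},
}
MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def validate_vrf_entry(fields):
    """Return a list of problems found in one VRF table entry (empty if none)."""
    errors = []
    for name, value in fields.items():
        if name == "src_mac":
            if not MAC_RE.match(value):
                errors.append(f"src_mac: bad MAC address {value!r}")
        elif name not in VRF_FIELD_VALUES:
            errors.append(f"unknown field {name!r}")
        elif value.lower() not in VRF_FIELD_VALUES[name]:
            errors.append(f"{name}: unsupported value {value!r}")
    return errors
```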
The following four sections are interface related.
#### CFG_INTF_TABLE_NAME "INTERFACE" changes
```json
"INTERFACE":{
"Ethernet0":{
"vrf_name":"Vrf-blue" // vrf_name must start with "Vrf" prefix
},
"Ethernet1":{
"vrf_name":"Vrf-red"
},
"Ethernet2":{}, // this interface belongs to the global vrf. The entry is necessary even if the user doesn't use vrf.
"Ethernet0|11.11.11.1/24": {},
"Ethernet0|12.12.12.1/24": {},
"Ethernet1|12.12.12.1/24": {},
"Ethernet2|13.13.13.1/24": {}
},
```
Schema Changes in INTERFACE:
```
;Define INTERFACE table
key = INTERFACE|EthernetID
EthernetID = "Ethernet"VCHAR ; ethernet id with Ethernet prefix
; field = value
vrf_name = string ;VRF name with Vrf prefix
```
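
The INTERFACE table mixes two kinds of keys: a bare interface name that carries the VRF binding, and `interface|prefix` keys that carry IP addresses. A consumer can separate them as sketched below; the function name is hypothetical and the input is a plain dictionary shaped like the JSON example above.

```python
def split_interface_table(table):
    """Split INTERFACE entries into VRF bindings and per-interface addresses."""
    vrf_of, addresses = {}, {}
    for key, fields in table.items():
        ifname, sep, prefix = key.partition("|")
        if sep:
            # "Ethernet0|11.11.11.1/24" style key: an IP address entry.
            addresses.setdefault(ifname, []).append(prefix)
        else:
            # Bare interface key: None means the global VRF.
            vrf_of[ifname] = fields.get("vrf_name")
    return vrf_of, addresses
```

Processing the bindings first and the addresses second follows the sequence dependency described for intfmgrd.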
#### CFG_LOOPBACK_INTERFACE_TABLE_NAME "LOOPBACK_INTERFACE" changes
```json
"LOOPBACK_INTERFACE":{
"Loopback0":{
"vrf_name":"Vrf-yellow"
},
"Loopback0|14.14.14.1/32":{}
},
```
Schema Changes in LOOPBACK_INTERFACE:
```
;Define LOOPBACK_INTERFACE table
key = LOOPBACK_INTERFACE|LoopbackID
LoopbackID = "Loopback"VCHAR; loopback id with Loopback prefix
; field = value
vrf_name = string ;VRF name with Vrf prefix
```
#### CFG_LAG_INTF_TABLE_NAME "PORTCHANNEL_INTERFACE" changes
```json
"PORTCHANNEL_INTERFACE":{
"Portchannel0":{
"vrf_name":"Vrf-yellow"
},
"Portchannel0|16.16.16.1/24":{}
}
```
Schema Changes in PORTCHANNEL_INTERFACE:
```
;Define PORTCHANNEL_INTERFACE table
key = PORTCHANNEL_INTERFACE|PortchannelID
PortchannelID = "Portchannel"VCHAR ; portchannel id with Portchannel prefix
; field = value
vrf_name = string ;VRF name with Vrf prefix
```
#### CFG_VLAN_INTF_TABLE_NAME "VLAN_INTERFACE" changes
```json
"VLAN_INTERFACE": {
"Vlan100":{
"vrf_name":"Vrf-blue"
},
"Vlan100|15.15.15.1/24": {}
},
```
Schema Changes in VLAN_INTERFACE:
```
;Define VLAN_INTERFACE table
key = VLAN_INTERFACE|VlanID
VlanID = "Vlan"VCHAR ; vlan id with Vlan prefix
; field = value
vrf_name = string ;VRF name with Vrf prefix
```
#### CFG_BGP_NEIGHBOR_TABLE_NAME "BGP_NEIGHBOR" changes
```json
"BGP_NEIGHBOR": {
"Vrf-blue|10.0.0.49": { // This neighbour belongs to Vrf-blue
"name": "ARISTA09T0",
"rrclient": "0",
"local_addr": "10.0.0.48",
"asn": "64009",
"nhopself": "0"
}
}
```
Schema Changes in BGP_NEIGHBOR:
```
;Define BGP_NEIGHBOR table
key = BGP_NEIGHBOR|Vrf-name|IP-prefix
Vrf-name = string ;VRF name with Vrf prefix
IP-prefix = IPv4Prefix / IPv6prefix
; field = value
; No changes
```
#### CFG_BGP_PEER_RANGE_TABLE_NAME "BGP_PEER_RANGE" changes
```json
"BGP_PEER_RANGE": {
"BGPSLBPassive": { // This BGP peer range belongs to Vrf-blue
"name": "BGPSLBPassive",
"vrf_name": "Vrf-blue",
"src_address":"10.1.1.2",
"ip_range": [
"192.168.8.0/27"
]
}
}
```
Schema Changes in BGP_PEER_RANGE:
```
;Define BGP_PEER_RANGE table
key = BGP_PEER_RANGE|peer-range-name
peer-range-name = "BGPSLBPassive"/"BGPVac"
; field = value
vrf_name = string ;VRF name with Vrf prefix
```
#### CFG_ACL_RULE_TABLE_NAME "ACL_RULE" changes
The existing acl_rule_table definition is the following.
```json
"table1|rule1": {
"L4_SRC_PORT": "99",
"PACKET_ACTION": "REDIRECT:20.1.1.93,30.1.1.93"
},
"table1|rule2": {
"L4_SRC_PORT": "100",
"PACKET_ACTION": "REDIRECT:20.1.1.93"
},
```
To support VRF, the nexthop key should change from a single `{IP}` to the form `{IP@Vrf-name}`. For backward compatibility the nexthop key `{IP}` is also supported; it only works in the global VRF. The new acl_rule_table should look like the following.
```json
"table1|rule1": {
"L4_SRC_PORT": "99",
"PACKET_ACTION": "REDIRECT:20.1.1.93@Vrf-blue,30.1.1.93"
},
"table1|rule2": {
"L4_SRC_PORT": "100",
"PACKET_ACTION": "REDIRECT:20.1.1.93@Vrf-blue, 30.1.1.93"
},
```
The REDIRECT VRF parser is implemented in `src/sonic-swss/orchagent/nexthopkey.h:NextHopKey(const std::string &str)`.
Schema change in ACL_RULE_TABLE:
```
key: ACL_RULE_TABLE|table_name|rule_name ; key of the rule entry in the table,
; seq is the order of the rules
; when the packet is filtered by the
; ACL "policy_name".
; A rule is always associated with a
; policy.
;field = value
packet_action = "redirect:"redirect_parameter
; an action when the fields are matched
; we have a parameter in case of packet_action="redirect"
; This redirect_parameter defines a destination for redirected packets
; it could be:
; name of physical port. Example: "Ethernet10"
; name of LAG port Example: "PortChannel5"
; next-hop ip address with Vrf suffix. Example: "10.0.0.1@Vrf-name"
; next-hop group set of addresses Example: "10.0.0.1@Vrf-name,10.0.0.3"
redirect_action = string ; It could be:
; name of physical port. Example: "Ethernet10"
; name of LAG port Example: "PortChannel5"
; next-hop ip address Example: "10.0.0.1@Vrf-name" or "10.0.0.1"
; next-hop group set of addresses Example: "10.0.0.1,10.0.0.3" or "10.0.0.1@Vrf-name,10.0.0.3"
```
The VRF name is appended to the redirect parameter as `@Vrf-name`.
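
To make the `IP@Vrf-name` convention concrete, here is a small parsing sketch. It is an illustration, not the `NextHopKey` string constructor from `nexthopkey.h`; the function name is hypothetical, and it only handles the next-hop forms of the redirect parameter, not port or LAG names.

```python
def parse_redirect(packet_action):
    """Parse "REDIRECT:ip[@Vrf-name][,ip[@Vrf-name]...]" into (ip, vrf) pairs."""
    action, _, param = packet_action.partition(":")
    if action.upper() != "REDIRECT" or not param:
        raise ValueError(f"not a redirect action: {packet_action!r}")
    nexthops = []
    for item in param.split(","):
        ip, _, vrf = item.strip().partition("@")
        # A next hop without "@Vrf-name" belongs to the global VRF (None).
        nexthops.append((ip, vrf or None))
    return nexthops
```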
#### CFG_STATIC_ROUTE_TABLE_NAME "STATIC_ROUTE" changes
CONFIG_DB does not implement this table yet.
```
;Defines IP static route table
;
;Status: stable
key = STATIC_ROUTE|vrf_name|prefix ;
vrf_name = "Vrf"string ;VRF name with Vrf prefix
prefix = IPv4Prefix / IPv6prefix
```
Reference: [static route from bgpcfgd](https://github.com/Azure/SONiC/blob/master/doc/static-route/SONiC_static_route_hdl.md)
Preview of the JSON format:
```json
"STATIC_ROUTE": {
"Vrf-red|192.168.113.0/24": {
"blackhole":"False",
"distance":"2",
"ifname":"Ethernet4",
"nexthop":"10.10.10.1"
}
},
```
### 2.3.2 APPL_DB
For VRF, VRF_TABLE is the main table in APPL_DB.
INTF_TABLE and ROUTE_TABLE also need to change.
#### APP_VRF_TABLE_NAME "VRF_TABLE"
Schema:
```
;defines virtual routing forward table
;
;Status: stable
key = VRF_TABLE:vrf_name ;
;field = value
fallback = "true"/"false"
```
Redis DB dump:
```
127.0.0.1:6379> keys *VRF*
1) "VRF_TABLE:Vrf-red"
2) "VRF_TABLE:Vrf-green"
127.0.0.1:6379> hgetall "VRF_TABLE:Vrf-red"
1) "NULL"
2) "NULL"
```
#### APP_INTF_TABLE_NAME "INTF_TABLE" changes
Schema:
```
;defines logical network interfaces, an attachment to a PORT name
;
;Status: stable
key = INTF_TABLE:ifname
;field = value
vrf_name = 1*15VCHAR ;newly added
```
Redis DB dump:
```
127.0.0.1:6379> hgetall "INTF_TABLE:Ethernet0"
1) "vrf_name"
2) "Vrf-red"
3) "mac_addr"
4) "00:00:00:00:00:00"
```
#### APP_ROUTE_TABLE_NAME "ROUTE_TABLE" changes
Schema:
```
;Stores a list of routes
;Status: Mandatory
key = ROUTE_TABLE:vrf-name:ip_prefix ;vrf-name starts with 'Vrf' prefix
```
Redis DB dump:
```
127.0.0.1:6379> keys *ROUTE*
1) "ROUTE_TABLE:Vrf-red:10.22.22.0/24"
127.0.0.1:6379> hgetall "ROUTE_TABLE:Vrf-red:10.22.22.0/24"
1) "nexthop"
2) "10.11.11.2"
3) "ifname"
4) "Ethernet0"
```
### 2.3.3 STATE_DB
For VRF, VRF_TABLE and VRF_OBJECT_TABLE are the main tables in STATE_DB.
#### STATE_VRF_TABLE_NAME "VRF_TABLE"
This table is only updated by vrfmgrd.
Schema:
```
;defines virtual routing forward table state
;
;Status: stable
key = VRF_TABLE|Vrf_name ; Vrf_name must be unique
state = ""/"ok" ; VRF created
```
#### STATE_VRF_OBJECT_TABLE_NAME "VRF_OBJECT_TABLE"
This table is only updated by vrforch.
Schema:
```
;defines virtual routing forward object table state
;
;Status: stable
key = VRF_OBJECT_TABLE|vrf_name ;
state = ""/"ok" ; VRF created
```
Redis DB dump:
```
127.0.0.1:6379[6]> keys *VRF*
1) "VRF_OBJECT_TABLE|Vrf-green"
2) "VRF_OBJECT_TABLE|Vrf-red"
3) "VRF_TABLE|Vrf-green"
4) "VRF_TABLE|Vrf-red"
127.0.0.1:6379[6]> hgetall "VRF_OBJECT_TABLE|Vrf-green"
1) "state"
2) "ok"
127.0.0.1:6379[6]> hgetall "VRF_TABLE|Vrf-green"
1) "state"
2) "ok"
```
### 2.3.4 ASIC_DB
Redis DB Dump:
```bash
127.0.0.1:6379[1]> keys *VIRTUAL_ROUTER*
1) "ASIC_STATE:SAI_OBJECT_TYPE_VIRTUAL_ROUTER:oid:0x3000000000022" //This one is default VRF (Global VRF)
2) "ASIC_STATE:SAI_OBJECT_TYPE_VIRTUAL_ROUTER:oid:0x300000000062a"
3) "ASIC_STATE:SAI_OBJECT_TYPE_VIRTUAL_ROUTER:oid:0x300000000062b"
// Get the SAI object Attribute
127.0.0.1:6379[1]> hgetall "ASIC_STATE:SAI_OBJECT_TYPE_VIRTUAL_ROUTER:oid:0x300000000062a"
1) "NULL"
2) "NULL"
```
Find the objects that reference the VRF object ID:
```bash
127.0.0.1:6379[1]> keys *0x300000000062a*
1) "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"10.10.10.0/24\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x300000000062a\"}"
2) "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"10.10.10.12/32\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x300000000062a\"}"
3) "ASIC_STATE:SAI_OBJECT_TYPE_VIRTUAL_ROUTER:oid:0x300000000062a"
```
We can see that ROUTE_ENTRY objects are using it. Next, show the attributes of one ROUTE_ENTRY object:
```bash
127.0.0.1:6379[1]> hgetall "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:{\"dest\":\"10.10.10.0/24\",\"switch_id\":\"oid:0x21000000000000\",\"vr\":\"oid:0x300000000062a\"}"
1) "SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID"
2) "oid:0x600000000062c"
```
Then, you can get the ROUTER_INTERFACE information:
```bash
127.0.0.1:6379[1]> keys *0x600000000062c*
1) "ASIC_STATE:SAI_OBJECT_TYPE_ROUTER_INTERFACE:oid:0x600000000062c"
127.0.0.1:6379[1]> hgetall "ASIC_STATE:SAI_OBJECT_TYPE_ROUTER_INTERFACE:oid:0x600000000062c"
1) "SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID"
2) "oid:0x300000000062a"
3) "SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS"
4) "52:54:00:12:34:56"
5) "SAI_ROUTER_INTERFACE_ATTR_TYPE"
6) "SAI_ROUTER_INTERFACE_TYPE_PORT"
7) "SAI_ROUTER_INTERFACE_ATTR_PORT_ID"
8) "oid:0x1000000000005"
9) "SAI_ROUTER_INTERFACE_ATTR_MTU"
10) "9100"
11) "SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID"
12) "0"
```
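
The manual walk above can be scripted. The sketch below works on a list of ASIC_DB key strings (for example, collected with `keys *` as shown) and relies on the fact that a ROUTE_ENTRY key embeds a JSON object with `dest`, `switch_id` and `vr` members. The function name is hypothetical, and fetching the keys from Redis is left out on purpose.

```python
import json

ROUTE_PREFIX = "ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY:"


def routes_in_vr(keys, vr_oid):
    """Return the destination prefixes of ROUTE_ENTRY keys that use vr_oid."""
    dests = []
    for key in keys:
        if not key.startswith(ROUTE_PREFIX):
            continue
        # The remainder of the key is a JSON object describing the route entry.
        entry = json.loads(key[len(ROUTE_PREFIX):])
        if entry["vr"] == vr_oid:
            dests.append(entry["dest"])
    return dests
```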
# 3. Flows
## 3.1 Part 1: CONFIG_DB to APPL_DB
```mermaid
sequenceDiagram
participant config_db
participant vrfMgrd
participant intfMgrd
participant kernel
participant frr
participant app_db
config_db->>vrfMgrd: add vrf event
vrfMgrd->>kernel: add kernel device
vrfMgrd->>app_db: add app-vrf-table entry
Note over kernel: create vrf master device
kernel->>frr: notify vrf master device status
config_db->>vrfMgrd: del vrf event
Note over vrfMgrd: wait until all the <br/>devices belonging to <br/>this VRF are unbound
vrfMgrd->>kernel: del kernel device
vrfMgrd->>app_db: del app-vrf-table entry
Note over kernel: remove vrf master device
kernel->>frr: notify vrf master device status
config_db->>intfMgrd: bind to vrf event
intfMgrd->>kernel: issue kernel cmd
intfMgrd->>app_db: add app-intf-table entry
Note over kernel: set interface master vrf
kernel->>frr: notify slave interface status
config_db->>intfMgrd: unbind from vrf event
Note over intfMgrd: wait until all the <br/>ip addresses belonging<br/>to the interface are<br/> removed
intfMgrd->>kernel: issue kernel cmd
intfMgrd->>app_db: del app-intf-table entry
Note over kernel: unset interface master vrf
kernel->>frr: notify interface status
config_db->>intfMgrd: add ip address event
Note over intfMgrd: wait until vrf-bind<br/> event done
intfMgrd->>kernel: issue kernel cmd
intfMgrd->>app_db: add app-intf-prefix-table entry
Note over kernel: add interface ip address
kernel->>frr: notify interface address status
config_db->>intfMgrd: del ip address event
intfMgrd->>kernel: issue kernel cmd
intfMgrd->>app_db: del app-intf-prefix-table entry
Note over kernel: del interface ip address
kernel->>frr: notify interface address status
kernel->>app_db: add/del neigh-table entry by neighsyncd
frr->>app_db: add/del route-table entry by fpmsyncd
```
## 3.2 Part 2: APPL_DB to ASIC_DB
```mermaid
sequenceDiagram
participant app_db
participant vrfOrch
participant intfOrch
participant neighOrch
participant routeOrch
participant SAI
app_db->>vrfOrch: add app-vrf-table entry event
vrfOrch->>SAI: call sai_create_virtual_router
app_db->>vrfOrch: del app-vrf-table entry event
Note over vrfOrch: wait until the refcnt<br/> of vrf obj is zero
vrfOrch->>SAI: call sai_remove_virtual_router
app_db->>intfOrch: add app-intf-table entry event
intfOrch->>SAI: call sai_create_router_interface
Note over intfOrch: increase the refcnt of<br/> vrf obj
intfOrch->>SAI: call sai_create_route_entry(ip2me and subnet)
Note over intfOrch: increase the refcnt of<br/> vrf obj
app_db->>intfOrch: del app-intf-table entry event
Note over intfOrch: wait until all ip <br/>addresses on interface<br/> are removed
intfOrch->>SAI: call sai_remove_router_interface
intfOrch->>SAI: call sai_remove_route_entry(ip2me and subnet)
Note over intfOrch: decrease the refcnt of<br/> vrf obj
app_db->>neighOrch: add/del app-neigh-table entry event
neighOrch->>SAI: Call sai_add/remove_neigh_entry
neighOrch->>SAI: Call sai_add/remove_next_hop
app_db->>routeOrch: add app-route-table entry event
Note over routeOrch: wait until vrf obj and<br/> rif obj are created
routeOrch->>SAI: Call sai_add_next_hop_group and sai_add_route_entry
Note over routeOrch: increase the refcnt of<br/> vrf obj
app_db->>routeOrch: del app-route-table entry event
routeOrch->>SAI: Call sai_remove_route_entry and sai_remove_next_hop_group
Note over routeOrch: decrease the refcnt of<br/> vrf obj
```
# 4. SAI API
The SAI header is `saivirtualrouter.h`. The object type is `SAI_OBJECT_TYPE_VIRTUAL_ROUTER`.
## 4.1 Methods
```clike
/**
* @brief Virtual router methods table retrieved with sai_api_query()
*/
typedef struct _sai_virtual_router_api_t
{
sai_create_virtual_router_fn create_virtual_router;
sai_remove_virtual_router_fn remove_virtual_router;
sai_set_virtual_router_attribute_fn set_virtual_router_attribute;
sai_get_virtual_router_attribute_fn get_virtual_router_attribute;
} sai_virtual_router_api_t;
```
- create_virtual_router
- Create the `SAI_OBJECT_TYPE_VIRTUAL_ROUTER` object and get the router id.
- remove_virtual_router
- Remove the virtual router object.
- set_virtual_router_attribute
- Update the sai attribute data.
- get_virtual_router_attribute
- Get the sai attribute data.
## 4.2 SAI Attribute
- SAI_VIRTUAL_ROUTER_ATTR_ADMIN_V4_STATE
- JSON field in the VRF table: "v4"
- SAI_VIRTUAL_ROUTER_ATTR_ADMIN_V6_STATE
- JSON field in the VRF table: "v6"
- SAI_VIRTUAL_ROUTER_ATTR_SRC_MAC_ADDRESS
- JSON field in the VRF table: "src_mac"
- SAI_VIRTUAL_ROUTER_ATTR_VIOLATION_TTL1_PACKET_ACTION
- JSON field in the VRF table: "ttl_action"
- SAI_VIRTUAL_ROUTER_ATTR_VIOLATION_IP_OPTIONS_PACKET_ACTION
- JSON field in the VRF table: "ip_opt_action"
:::warning
TODO: not supported yet.
```
/*
* @brief if it is global vrf
*
* @type bool
* @flags CREATE_AND_SET
* @default true
*/
SAI_VIRTUAL_ROUTER_ATTR_GLOBAL
/*
* @brief continue to do global fib lookup while current vrf fib lookup
* missed
*
* @type bool
* @flags CREATE_AND_SET
* @default false
*/
SAI_VIRTUAL_ROUTER_ATTR_FALLBACK
```
:::
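
The supported field-to-attribute mapping listed in this section can be expressed as a lookup table. The sketch below is illustrative only: `to_sai_attrs` is a hypothetical helper, values are passed through as strings without conversion to SAI types, and fields without a mapping (such as `fallback`) are skipped.

```python
# VRF table field -> SAI virtual router attribute, as listed in section 4.2.
VRF_FIELD_TO_SAI_ATTR = {
    "v4": "SAI_VIRTUAL_ROUTER_ATTR_ADMIN_V4_STATE",
    "v6": "SAI_VIRTUAL_ROUTER_ATTR_ADMIN_V6_STATE",
    "src_mac": "SAI_VIRTUAL_ROUTER_ATTR_SRC_MAC_ADDRESS",
    "ttl_action": "SAI_VIRTUAL_ROUTER_ATTR_VIOLATION_TTL1_PACKET_ACTION",
    "ip_opt_action": "SAI_VIRTUAL_ROUTER_ATTR_VIOLATION_IP_OPTIONS_PACKET_ACTION",
}


def to_sai_attrs(fields):
    """Translate VRF table fields into (SAI attribute name, value) pairs."""
    return [
        (VRF_FIELD_TO_SAI_ATTR[name], value)
        for name, value in fields.items()
        if name in VRF_FIELD_TO_SAI_ATTR   # unsupported fields are ignored
    ]
```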
# 5. Configuration and Management
## 5.1 Config JSON File
### 5.1.1 VRF_TABLE
CFG_VRF_TABLE_NAME="VRF"
```json
"VRF": {
"Vrf-green": {
"src_mac": "11:22:33:44:55:66",
"ttl_action": "drop",
"v4": "False",
"ip_opt_action": "forward"
},
"Vrf-red": {
}
}
```
### 5.1.2 STATIC_ROUTE with VRF
```json
"STATIC_ROUTE": {
"Vrf-red|192.168.113.0/24": {
"blackhole":"False",
"distance":"2",
"ifname":"Ethernet4",
"nexthop":"10.10.10.1"
},
"Vrf-blue|192.168.114.0/24": {
"blackhole":"False",
"distance":"2",
"ifname":"Ethernet4",
"nexthop":"10.10.10.1"
}
},
```
The `config_db.json` changes for the other subsystems with VRF are listed in section [2.3.1 CONFIG_DB](#231-CONFIG_DB).
## 5.2 VRF CLI
### 5.2.1 VRF show commands
`show vrf`
This command displays all VRFs configured on the system along with the interfaces bound to each VRF.
If a Vrf-name is provided as part of the command and the VRF exists, it displays all interfaces bound to that VRF; if the VRF does not exist, nothing is displayed.
- Usage:
```
show vrf [<vrf_name>]
```
- Example:
```
admin@sonic:~$ show vrf
VRF Interfaces
------- ------------
default Vlan20
Vrf-red Vlan100
Loopback11
Vrf-blue Loopback100
Loopback102
```
### 5.2.2 config vrf
#### config vrf add
This command creates a VRF in the SONiC system with the provided Vrf-name.
- Usage:
```
config vrf add <Vrf-name>
```
Note: Vrf-name should always start with keyword "Vrf"
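
A name check along these lines could run before the VRF is written to CONFIG_DB. This is a sketch under assumptions: the function is hypothetical, and the 15-character limit and the visible-ASCII restriction are taken from the `vrf_name = 1*15VCHAR` field in the INTF_TABLE schema rather than from the CLI itself.

```python
MAX_VRF_NAME_LEN = 15  # from the 1*15VCHAR vrf_name field in the INTF_TABLE schema


def is_valid_vrf_name(name):
    """Check the "Vrf" prefix, the length limit, and that all characters are VCHAR."""
    return (
        name.startswith("Vrf")
        and len(name) <= MAX_VRF_NAME_LEN
        and all(0x21 <= ord(ch) <= 0x7E for ch in name)  # visible ASCII only
    )
```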
#### config vrf del
This command deletes the VRF named Vrf-name.
- Usage:
```
config vrf del <Vrf-name>
```
#### config vrf add_vrf_vni_map
This command adds a VNI mapping to the existing VRF Vrf-name.
- Usage:
```
config vrf add_vrf_vni_map <Vrf-name> <vni>
```
Make sure the VLAN, the VRF and the VXLAN are configured before using the command.
```bash
config vlan add 1
config vxlan add Vxlan1
config vxlan map add Vxlan1 1 112
config vrf add_vrf_vni_map Vrf-blue 112
```
Show the configuration result with the vxlan CLI command:
```
show vxlan vlanvnimap
+--------+-------+
| VLAN | VNI |
+========+=======+
| Vlan1 | 112 |
+--------+-------+
Total count : 1
```
#### config vrf del_vrf_vni_map
This command deletes the VNI mapping of the VRF Vrf-name.
- Usage:
```
config vrf del_vrf_vni_map <Vrf-name>
```
### 5.2.3 Interface bind with VRF
- Usage:
```bash
config interface vrf bind <Interface> <Vrf-name>
config interface vrf unbind <Interface>
```
e.g. `config interface vrf bind Ethernet4 Vrf-blue`
### 5.2.4 config route with VRF
- Usage:
```bash
config route add prefix vrf <Vrf-name> <ip_prefix> nexthop vrf <Vrf-name> <nexthop-ip>
config route del prefix vrf <Vrf-name> <ip_prefix> nexthop vrf <Vrf-name> <nexthop-ip>
```
Add operation:
`config route add prefix vrf Vrf-blue 192.168.113.0/24 nexthop vrf Vrf-blue 10.10.10.1`
Delete operation:
`config route del prefix vrf Vrf-blue 192.168.113.0/24 nexthop vrf Vrf-blue 10.10.10.1`
### 5.2.5 show ip route with vrf
- Usage:
```bash
show ip route [vrf <vrf-name>] [<ip_address>]
show ipv6 route [vrf <vrf-name>] [<ip_address>]
```
# 6. Restrictions/Limitations
## 6.1 Limitations for VRFs
- When you make an interface a member of an existing VRF, SONiC removes all Layer 3 configurations. You should configure all Layer 3 parameters after adding an interface to a VRF.
- If you configure an interface for a VRF before the VRF exists, the interface is operationally down until you create the VRF.
## 6.2 Limitations for VRF Route Leaking
- Route leaking is supported between any two non-default VRFs and from the default VRF to a non-default VRF.
- By default, the maximum number of IP prefixes that can be imported from the default VRF into a non-default VRF is 1000 routes.
- There is no limit on the number of routes that can be leaked between two non-default VRFs.
# 7. Test cases
## 7.1 Unit test Stage1
- test_VRFMgr_Comprehensive
Tests the SAI attributes.
```python
('v4', 'SAI_VIRTUAL_ROUTER_ATTR_ADMIN_V4_STATE', self.boolean_gen),
('v6', 'SAI_VIRTUAL_ROUTER_ATTR_ADMIN_V6_STATE', self.boolean_gen),
('src_mac', 'SAI_VIRTUAL_ROUTER_ATTR_SRC_MAC_ADDRESS', self.mac_addr_gen),
('ttl_action', 'SAI_VIRTUAL_ROUTER_ATTR_VIOLATION_TTL1_PACKET_ACTION', self.packet_action_gen),
('ip_opt_action', 'SAI_VIRTUAL_ROUTER_ATTR_VIOLATION_IP_OPTIONS_PACKET_ACTION', self.packet_action_gen),
('l3_mc_action', 'SAI_VIRTUAL_ROUTER_ATTR_UNKNOWN_L3_MULTICAST_PACKET_ACTION', self.packet_action_gen),
```
- test_VRFMgr
- test_VRFMgr_Update
- test_VRFMgr_Capacity
## 7.2 Unit test Stage2
The VRF vs test plan is used to verify the functions of swss by checking the contents of APP_DB/ASIC_DB/kernel. FRR and SAI functions will be covered by ansible pytest test cases.
No|Test case summary
---------|----------
1|Verify that the vrf entry from config is pushed correctly by vrfmgrd to APP_DB and linux kernel.
2|Verify that the Orchagent is pushing the vrf entry into ASIC_DB by checking the contents in the ASIC_DB.
3|Verify that a random combination of vrf attributes can be successfully configured by checking the contents in the APP_DB and ASIC_DB.
4|Verify that the vrf attribute can be updated successfully after vrf is created by checking the contents in the ASIC_DB
5|Verify that the vrf entries can be successfully removed from the CONFIG_DB, APP_DB and ASIC_DB.
6|Verify that the maximum number of vrf entries that can be created reaches 1K.
7|Verify that the interface entry from config is pushed correctly by intfmgrd to APP_DB and linux kernel.
8|Verify that the Orchagent is receiving interface creation and deletion from APP_DB.
9|Verify that the Orchagent is pushing the interface entry into ASIC_DB by checking the contents in the ASIC_DB.
10|Verify that the port interface bind IPv4 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
11|Verify that the port interface bind IPv4 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
12|Verify that the different port interface bound to different vrf can configure the same IPv4 Address by checking the APP_DB and ASIC_DB.
13|Verify that the IPv4 address is removed successfully from the port interface by checking the contents in the APP_DB and ASIC_DB
14|Verify that the port interface bind IPv6 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
15|Verify that the port interface bind IPv6 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
16|Verify that the IPv6 address is removed successfully from the port interface by checking the contents in the APP_DB and ASIC_DB
17|Verify that the lag interface bind IPv4 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
18|Verify that the lag interface bind IPv4 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
19|Verify that the IPv4 address is removed successfully from the lag interface by checking the contents in the APP_DB and ASIC_DB
20|Verify that the lag interface bind IPv6 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
21|Verify that the lag interface bind IPv6 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
22|Verify that the IPv6 address is removed successfully from the lag interface by checking the contents in the APP_DB and ASIC_DB
23|Verify that the vlan interface bind IPv4 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
24|Verify that the vlan interface bind IPv4 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
25|Verify that the IPv4 address is removed successfully from the vlan interface by checking the contents in the APP_DB and ASIC_DB
26|Verify that the vlan interface bind IPv6 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
27|Verify that the vlan interface bind IPv6 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
28|Verify that the IPv6 address is removed successfully from the vlan interface by checking the contents in the APP_DB and ASIC_DB
29|Verify that the loopback interface bind IPv4 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
30|Verify that the loopback interface bind IPv4 address with vrf correctly by checking the contents in the APP_DB and ASIC_DB.
31|Verify that the IPv4 address is removed successfully from the loopback interface by checking the contents in the APP_DB and ASIC_DB.
32|Verify that the loopback interface bind IPv6 address without vrf correctly by checking the contents in the APP_DB and ASIC_DB.
33|Verify that the IPv6 address remove successfully from the loopback interface by checking the contents in the APP_DB and ASIC_DB.
34|Verify that neighsyncd pushes neighbor entries to APP_DB correctly by checking the contents of APP_DB.
35|Verify that orchagent pushes the neighbor entry into ASIC_DB by checking the contents of ASIC_DB.
36|Verify that an IPv4 neighbor is created and deleted successfully by checking the contents of APP_DB and ASIC_DB.
37|Verify that an IPv6 neighbor is created and deleted successfully by checking the contents of APP_DB and ASIC_DB.
38|Verify that different interfaces in different VRFs can add the same IPv4 neighbor address by checking the contents of APP_DB and ASIC_DB.
39|Verify that fpmsyncd pushes route entries to APP_DB correctly by checking the contents of APP_DB.
40|Verify that orchagent pushes the route entry into ASIC_DB by checking the contents of ASIC_DB.
41|Verify that an IPv4 route entry is added successfully by checking the contents of ASIC_DB.
42|Verify that an IPv4 route entry is deleted successfully by checking the contents of APP_DB and ASIC_DB.
43|Verify that an IPv6 route entry is added successfully by checking the contents of ASIC_DB.
44|Verify that an IPv6 route entry is deleted successfully by checking the contents of APP_DB and ASIC_DB.
45|Verify that an IPv4 route entry with a VRF is added successfully by checking the contents of ASIC_DB.
46|Verify that an IPv4 route entry with a VRF is deleted successfully by checking the contents of APP_DB and ASIC_DB.
47|Verify that an IPv6 route entry with a VRF is added successfully by checking the contents of ASIC_DB.
48|Verify that an IPv6 route entry with a VRF is deleted successfully by checking the contents of APP_DB and ASIC_DB.
49|Verify that a route entry can point to a nexthop in a different VRF.
50|Verify that when the ACL packet action is redirect to a nexthop, the ACL entry is added correctly by checking the contents of ASIC_DB.
51|Verify that the kernel VRF configuration remains unchanged during a vrfmgrd warm reboot.
52|Verify that the VIRTUAL_ROUTER/ROUTE_ENTRY/NEIGH_ENTRY objects in ASIC_DB remain unchanged during a vrfmgrd warm reboot by monitoring object changes in ASIC_DB.
53|Verify that vrfmgrd works correctly after a warm reboot by checking that new configuration is pushed correctly to APP_DB and that ping works via the VRF port interfaces.
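
Most of the cases above reduce to "does the expected entry exist in the database, with the expected VRF". The following is a minimal sketch of that check, not the actual test code. A plain Python dict stands in for APP_DB, and the key layout (`INTF_TABLE:<ifname>` carrying a `vrf_name` field, plus `INTF_TABLE:<ifname>:<prefix>` for the address) as well as the helper names `intf_keys` and `is_bound` are assumptions made for illustration.

```python
# Sketch only: a dict stands in for APP_DB; key and field names are assumptions.

def intf_keys(ifname, ip_prefix):
    """Return the (interface key, address key) pair assumed for one interface address."""
    return f"INTF_TABLE:{ifname}", f"INTF_TABLE:{ifname}:{ip_prefix}"


def is_bound(app_db, ifname, ip_prefix, vrf=None):
    """True if both entries exist and the interface entry names the expected VRF."""
    intf_key, addr_key = intf_keys(ifname, ip_prefix)
    if intf_key not in app_db or addr_key not in app_db:
        return False
    # An interface without a VRF is assumed to carry no vrf_name field.
    return app_db[intf_key].get("vrf_name") == vrf


# Hypothetical snapshot after binding 10.0.0.1/24 to PortChannel1 in Vrf-red.
app_db = {
    "INTF_TABLE:PortChannel1": {"vrf_name": "Vrf-red"},
    "INTF_TABLE:PortChannel1:10.0.0.1/24": {"scope": "global", "family": "IPv4"},
}
```

A removal case would run the same check after the address is deleted and expect `is_bound` to return `False`.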
## 7.3 System test

| Config | Verify | Step |
| ------ | ------ | -------------------------- |
|V| | Set up the topology.|
| |V| Check that every host can ping every other host.|
|V| | Set up Vrf-red and Vrf-blue and bind the interfaces.|
|V| | Place Host112 and Host114 in Vrf-red.|
|V| | Place Host113 and Host115 in Vrf-blue.|
| |V| Check that host112 can reach host114 and host113 can reach host115.|
| |V| Check that host112/host114 cannot reach host113/host115, and vice versa.|
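
The verification steps amount to a reachability matrix: two hosts should reach each other only when they sit in the same VRF. A minimal sketch of the expected result follows, using the host-to-VRF assignment from the table. The names `VRF_OF`, `expected_reachable`, and `reachability_matrix` are illustrative, and the sketch only computes expectations; it does not send any traffic.

```python
from itertools import permutations

# Host-to-VRF assignment taken from the configuration steps above.
VRF_OF = {
    "host112": "Vrf-red",
    "host114": "Vrf-red",
    "host113": "Vrf-blue",
    "host115": "Vrf-blue",
}


def expected_reachable(src, dst, vrf_of=VRF_OF):
    """Two hosts are expected to reach each other only within the same VRF."""
    return vrf_of[src] == vrf_of[dst]


def reachability_matrix(vrf_of=VRF_OF):
    """Map every ordered (src, dst) host pair to the expected ping result."""
    return {(s, d): expected_reachable(s, d, vrf_of) for s, d in permutations(vrf_of, 2)}
```

A test harness could iterate over `reachability_matrix()` and compare each expected value with the outcome of an actual ping between the two hosts.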
# 8. Open/Action items - if any
1. STATIC_ROUTE needs to be added to the CONFIG_DB.
# 9. Summary
------
**Don't COPY BELOW THIS LINE.**
# 10. Appendix
- [sonic vrf HLD](https://github.com/Azure/SONiC/blob/master/doc/vrf/sonic-vrf-hld.md)
- [Kernel VRF](https://www.kernel.org/doc/Documentation/networking/vrf.txt)
- [l3mdev](https://lwn.net/Articles/658471/)
- [sonic-utilities doc](https://github.com/Azure/sonic-utilities/blob/master/doc/Command-Reference.md#vxlan--vnet)
- [cisco n9k vrf](https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/6-x/unicast/configuration/guide/l3_cli_nxos/l3_virtual.html#86372)
## Requirement (TODO)
6. Loopback devices with VRF, in detail:
- Add/delete loopback interfaces per VRF.
- Support adding IPv4 and IPv6 host addresses on these loopback interfaces.
- Support using these loopback interfaces as the source for control packets of various routing protocols. For instance, in BGP multihop sessions the source IP of a BGP packet is the outgoing interface IP, which can change with interface status and bring down the BGP session even though an alternate path to the BGP neighbor exists. If a loopback interface is used as the source, the BGP session does not flap.
- The IP addresses of these loopback interfaces can be used to generate the router ID for the VRF, which routing protocols can then use.
- Support using these interfaces as tunnel endpoints if required in the future.
- Support using these interfaces as the source for IP unnumbered interfaces if required in the future.
7. Fallback lookup. (Not supported; needs a new SAI attribute.)
- The fallback feature defined by RFC 4364 lets users of a specific VRF reach the internet through the global/main route. Some enterprise users still rely on it for internet access in VPN environments.
8. VRF route leaking between VRFs. (Defined by another task force.)
> (TODO: need to check the code)
An ideal approach is to handle the two events the way the Linux kernel does. For example, if an IP address is configured on an interface first, it is accepted. When the interface is later enslaved to a VRF, the IP address is removed from the master FIB and reprogrammed into the VRF table. However, this approach is very complicated to support: for example, the address may conflict with one already in the destination VRF, and the current SONiC infrastructure cannot detect or prevent that. This approach is therefore not supported in this VRF release.
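
To illustrate the conflict mentioned above, here is a minimal sketch of the kind of check that would be needed before moving an interface's addresses into a destination VRF. It is not part of SONiC. The function name `find_conflicts` is hypothetical, and treating "overlapping subnets of the same IP version" as a conflict is an assumption; the real rule could be stricter or looser.

```python
import ipaddress


def find_conflicts(moving, dest_vrf_addrs):
    """Return the addresses in `moving` whose subnet overlaps one already in the destination VRF.

    Both arguments are lists of strings such as "10.0.0.1/24".
    Assumption: overlapping subnets of the same IP version count as a conflict.
    """
    conflicts = []
    for addr in moving:
        new_if = ipaddress.ip_interface(addr)
        for existing in dest_vrf_addrs:
            old_if = ipaddress.ip_interface(existing)
            # Only compare networks of the same IP version.
            if new_if.version == old_if.version and new_if.network.overlaps(old_if.network):
                conflicts.append(addr)
                break
    return conflicts
```

An empty result would mean the addresses could be reprogrammed into the VRF table without clashing with existing ones under this assumed rule.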
## VRF Test PLAN
https://github.com/Azure/SONiC/blob/master/doc/vrf/vrf-ansible-test-plan.md
