BGP roles in BIRD

BGP roles for route leak prevention

Not long ago (May 2022, according to the document) a new RFC on BGP was published. This one, numbered 9234, is about route leak detection and prevention. We all know what to do to prevent route leaks, and MANRS is doing a great job here, spreading the word about filtering, RPKI and other related practices. However, this does not stop all the disturbances, and leaks in particular, from happening all over the Internet. Qrator, for example, documents them on their blog every quarter and writes special posts about the biggest (and most hilarious) ones.
So what does this new RFC bring to the table? The idea is rather simple. We all know that on the Internet some kind of relationship exists between any pair of BGP neighbors. One AS is a customer of another, which can be called its provider. Two other ASes are peers (neither customers nor providers) to one another. RFC 9234 uses this observation and postulates that we can automatically prevent leaks by telling the software (a network operating system, a routing daemon, or whatever implements BGP in a system) about those roles. So if you configure your sessions with a "customer" role (your local role), that means you do not want to re-advertise routes between those sessions. This is essentially what makes your AS a customer AS (or a stub). Re-advertising routes between such sessions turns your AS into a transit AS, counts as a route leak, and is a disaster.

BIRD BGP roles

So why write a blog post if one can just go and read the RFC? It is short and uncomplicated, by the way. Because, of course, there is already a working implementation which we can look at! Since version 2.0.11, released in November 2022, BIRD supports BGP roles. So we can spin up a simple lab and see how this works.

Note of caution

While new mechanisms making the Internet more stable and secure are warmly welcome, that doesn't mean we should abandon any prior art. Not right now, at least. So be kind and follow MANRS, filter your routes thoroughly, and wash your hands before configuring BGP!

Setting up a lab

There are a lot of ways to set up a lab and you probably have your preferred method, so you can just skip this section and go to the next one. In case you are interested in replicating my setup, here it is. Please note that this was done on an up-to-date Arch Linux as of 14.01.2023. Here is the topology:

     ------------            ------------
     |archbird1 |            |archbird3 |
     |  isp1    |            |  isp2    |
     |  65100   |            |  65300   |
     | ::100/64 |            | ::300/64 |
     |          |            |          |
     ------------            ------------
          |                        |
          |                        |
          |       ----------       |
          |       |        |       |
          |-------| bridge |-------|
                  |        |
                  ----------
                       |
                       |
                       |
                  ------------
                  |archbird2 |
                  |  cust1   |
                  |  65200   |
                  | ::200/64 |
                  |          |
                  ------------

While in the real world you will probably not be connected to your ISPs through a bridge in a single broadcast domain, it doesn't matter much for this test. To produce this topology I've created three containers using systemd-nspawn. To build a container with Arch Linux, BIRD and some useful tools, we need to follow these steps:

sudo mkdir /var/lib/machines/archbird1 # creating a directory for the container
sudo pacstrap -K -c /var/lib/machines/archbird1 base vim tcpdump traceroute openbsd-netcat bird # bootstrap the container
sudo systemd-nspawn -D /var/lib/machines/archbird1 # chroot into container
passwd # set root password (we are inside the container tree now)
echo -e 'pts/0\npts/1' >> /etc/securetty # fix login issue, see https://wiki.archlinux.org/title/Systemd-nspawn#Root_login_fails
sed -i '/securetty/ d' /usr/lib/tmpfiles.d/arch.conf # fix login issue, again see link above
logout # exit chroot

Next we need a bit of network configuration. Create a directory /etc/systemd/nspawn if you don’t have one and add a file /etc/systemd/nspawn/archbird1.nspawn with this inside:

[Network]
Private=yes
VirtualEthernet=yes
Zone=bird

This will add a virtual Ethernet (veth) pair linking the host (your machine, a.k.a. the hypervisor) and the container. The "Zone" part will automatically create a bridge interface on the host and connect the host side of the veth pair to this bridge.
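Once the container is up (we will boot it in a moment), the zone bridge (systemd names it vz-<zone>, so vz-bird in our case) and the host side of the veth pair (vb-archbird1) should be visible on the host; a quick sanity check:

ip -br link show vz-bird        # the zone bridge created by systemd-nspawn
ip -br link show vb-archbird1   # host side of the container's veth pair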
So now we have one container ready to boot. While I'm sure you could just copy the container directory to get the two other containers, I'm not sure that would work flawlessly. At the very least it would produce containers with identical hostnames, and while that is of course fixable, there could be other issues. Since we need just two more, I'd rather repeat the steps for archbird2 and archbird3, as sketched below.
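A minimal sketch of that repetition; the interactive steps (setting the root password and applying the login fix) still need to be done inside each container via systemd-nspawn -D:

for n in 2 3; do
    sudo mkdir /var/lib/machines/archbird$n
    sudo pacstrap -K -c /var/lib/machines/archbird$n base vim tcpdump traceroute openbsd-netcat bird
    # each container also needs its own .nspawn file with the same [Network] section
    sudo cp /etc/systemd/nspawn/archbird1.nspawn /etc/systemd/nspawn/archbird$n.nspawn
done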
When all 3 are ready, let’s boot them up:

sudo machinectl start archbird1
sudo machinectl start archbird2
sudo machinectl start archbird3

Now we can log in to each one (in different terminals, of course):

sudo machinectl login archbird1
sudo machinectl login archbird2
sudo machinectl login archbird3

To authenticate, use the login 'root' and the password you set during container setup. Now let's deal with the networking part inside the containers. Create a file /etc/systemd/network/00-host0.network with these lines inside:

[Match]
Virtualization=container
Name=host0

[Network]
LinkLocalAddressing=ipv6
Address=fdff:856f:5f10::100/64
LLDP=yes
EmitLLDP=nearest-bridge

This assumes that the interface inside the container is called "host0". As you can see, I've opted for a ULA address, because I think it's the most appropriate choice here. The only line that should change between containers is "Address" (::100 for archbird1, ::200 for archbird2 and ::300 for archbird3, matching the topology above). Enable and start the systemd-networkd daemon:

systemctl enable --now systemd-networkd.service

This will configure the interface appropriately. After completing this on all three machines, test the connectivity with a simple ping. Did it work? Not for me, unfortunately. Checking my host system revealed that for some reason the veth interfaces on the host were not added to the bridge. Is this a bug? Maybe. I've fixed it manually:

14/01/23 23:32:07 yman@zaand ~ > ip -d link show dev vb-archbird1
20: vb-archbird1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 7e:c6:f0:67:53:19 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0  allmulti 0 minmtu 68 maxmtu 65535 
    veth addrgenmode eui64 numtxqueues 32 numrxqueues 32 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 
# as you can see the veth interface isn't part of any bridge
# so let's fix that
14/01/23 23:32:11 yman@zaand ~ > sudo ip link set vb-archbird1 master vz-bird
14/01/23 23:32:47 yman@zaand ~ > ip -d link show dev vb-archbird1
20: vb-archbird1@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vz-bird state UP mode DEFAULT group default qlen 1000
    link/ether 7e:c6:f0:67:53:19 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 1  allmulti 1 minmtu 68 maxmtu 65535 
    veth 
    bridge_slave state forwarding priority 32 cost 2 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.1a:82:54:58:b5:19 designated_root 8000.1a:82:54:58:b5:19 hold_timer    0.00 message_age_timer    0.00 forward_delay_timer   13.06 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on bcast_flood on mcast_to_unicast off neigh_suppress off group_fwd_mask 0 group_fwd_mask_str 0x0 vlan_tunnel off isolated off locked off addrgenmode eui64 numtxqueues 32 numrxqueues 32 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536 
# now it's in the bridge

After this, ping started to work fine. So let's turn to BIRD. Configuration for the ISP peers (in /etc/bird.conf):

log syslog all;
router id 0.0.0.1;

protocol device {
}

protocol kernel {
    ipv4 { export all; };
}

protocol kernel {
    ipv6 { export all; };
}

protocol bgp cust1 {
    local fdff:856f:5f10::100 as 65100;
    neighbor fdff:856f:5f10::200 as 65200;
    ipv6 {
        import all;
        export all;
    };
}

For the other ISP change the 'router id' and 'local' statements; otherwise the configs are the same. Configuration on the customer is not very different, we just have two peers:

log syslog all;
router id 0.0.0.2;

protocol device {
}

protocol kernel {
    ipv4 { export all; };
}

protocol kernel {
    ipv6 { export all; };
}

protocol bgp isp1 {
    local fdff:856f:5f10::200 as 65200;
    neighbor fdff:856f:5f10::100 as 65100;
    ipv6 {
        import all;
        export all;
    };
}

protocol bgp isp2 {
    local fdff:856f:5f10::200 as 65200;
    neighbor fdff:856f:5f10::300 as 65300;
    ipv6 {
        import all;
        export all;
    };
}

And now we can start BIRD with:

systemctl enable --now bird.service

After doing this on all three machines, birdc show proto should list your BGP sessions in the "Established" state.
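On cust1, the relevant part of the listing should look roughly like this (a sketch based on BIRD's session listing format, not verbatim output; timestamps will differ):

Name       Proto      Table      State  Since         Info
isp1       BGP        ---        up     10:09:45.000  Established
isp2       BGP        ---        up     10:09:46.000  Established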

Tests

As you could see in the previous section, I configured the BGP sessions with 'import all;' and 'export all;'. In a real-world setup this is wrong and dangerous! Don't ever do it! Configure your policies according to your session role and what you are trying to achieve. But for a test this is exactly what we want, and it is exactly what the new RFC wants to help us with: if you make a mistake configuring your policies/filters, BGP roles come to help you prevent a disaster. So let's first let a disaster happen; it shouldn't be too big in a lab! Actually, it won't even look like a disaster. Let's advertise something from the isp2 side. Add a file /etc/systemd/network/01-dummy0.netdev with these lines:

[Match]
Virtualization=container

[NetDev]
Name=dummy0
Kind=dummy

And a file /etc/systemd/network/02-dummy0.network with the following inside:

[Match]
Virtualization=container
Name=dummy0

[Network]
LinkLocalAddressing=ipv6
Address=fdff:856f:5f10:4000::1/50

Then reconfigure the network with networkctl reload. This will produce an interface "dummy0" with a /50 prefix on it. Maybe not the easiest way to add a static route, but that's the first thing that came to my mind. To get it into BIRD we also need to add one more protocol to /etc/bird.conf:

protocol direct {
    ipv6;
    interface "dummy*";
}
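As an aside, an arguably simpler way to originate the prefix would be BIRD's static protocol; a sketch (I have not used it in this lab and will stick with the dummy interface below):

protocol static {
    ipv6;
    # originate the test prefix without any dummy interface
    route fdff:856f:5f10:4000::/50 blackhole;
}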

Do a systemctl reload bird.service afterwards and check your cust1 container (a.k.a. archbird2): birdc show route all should show you the /50 arriving from isp2. Do we have it on isp1? Of course! So now we have a route leak: the customer AS has been turned into a transit AS! Let's use our new capabilities to stop it! Adjust the BGP session with isp2 on the cust1 container by adding these lines:

local role customer;
require roles yes;
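These two lines go inside the protocol bgp block, so the isp2 session on cust1 now reads:

protocol bgp isp2 {
    local fdff:856f:5f10::200 as 65200;
    neighbor fdff:856f:5f10::300 as 65300;
    local role customer;    # our side of the relationship
    require roles yes;      # refuse sessions where the peer announces no role
    ipv6 {
        import all;
        export all;
    };
}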

After reloading bird (systemctl reload bird.service) the BGP session goes down. Whoops! This is because of 'require roles yes;', of course: we now demand a role from the peer, and isp2 does not announce one yet. We could omit it, but let's instead update the isp2 side to match:

local role provider;
require roles yes;

After a bird reload you will see the session come back up. You will still see the /50 on the customer and on isp1. Why? Because our session with isp1 still has no role. Let's fix that and add a single line to the isp1 BGP session (on the cust1 side!):

local role customer;

Again, reload bird. Now the session is still up, because we didn't mandate roles, but isp1 is no longer receiving the /50 (you can verify with birdc show route all on isp1). This time we have two sessions with 'local role customer;', which means we do not want to re-advertise routes between them. That's simple! How does this work internally? Let's tear down the session between cust1 and isp2, set up a tcpdump, and bring it up again.

tcpdump -envvvi host0 tcp port 179 # on cust1/archbird2
birdc disable cust1 # on isp2/archbird3
birdc enable cust1 # on isp2/archbird3

tcpdump will produce quite a lot of output, but let's look at the most interesting parts. Interesting packet 1:

10:09:46.993538 1e:a8:a7:1d:59:fb > ea:8e:05:b6:96:6b, ethertype IPv6 (0x86dd), length 142: (class 0xc0, flowlabel 0xc4b3f, hlim 1, next-header TCP (6) payload length: 88) fdff:856f:5f10::200.41627 > fdff:856f:5f10::300.179: Flags [P.], cksum 0xca5d (incorrect -> 0x1d06), seq 1:57, ack 1, win 507, options [nop,nop,TS val 871092027 ecr 2684418360], length 56: BGP
    Open Message (1), length: 56
      Version 4, my AS 65200, Holdtime 240s, ID 0.0.0.2
      Optional parameters, length: 27
        Option Capabilities Advertisement (2), length: 25
          Multiprotocol Extensions (1), length: 4
            AFI IPv6 (2), SAFI Unicast (1)
            0x0000:  0002 0001
          Route Refresh (2), length: 0
          Unknown (9), length: 1
            no decoder for Capability 9
            0x0000:  03
          Graceful Restart (64), length: 2
            Restart Flags: [none], Restart Time 120s
            0x0000:  0078
          32-Bit AS Number (65), length: 4
             4 Byte AS 65200
            0x0000:  0000 feb0
          Enhanced Route Refresh (70), length: 0
            no decoder for Capability 70
          Long-lived Graceful Restart (71), length: 0

As you can see, this is an Open message, where cust1 is establishing the session with isp2. There is a router ID, an AS number, and multiple capabilities like LLGR and route refresh. What we are interested in is this part:

Unknown (9), length: 1
  no decoder for Capability 9
  0x0000:  03

This is a capability unknown to tcpdump, with a value of 03. If we look into section 4.1 of the RFC, this is exactly why we are here: this is the BGP Role capability (code 9) with the value "customer" (3). The RFC defines five values: provider (0), RS (1), RS-client (2), customer (3) and peer (4). The packet from isp2 will be similar, with the role set to provider (0). Interesting packet 2:

10:09:49.996722 ea:8e:05:b6:96:6b > 1e:a8:a7:1d:59:fb, ethertype IPv6 (0x86dd), length 178: (class 0xc0, flowlabel 0x9efdf, hlim 1, next-header TCP (6) payload length: 124) fdff:856f:5f10::300.179 > fdff:856f:5f10::200.41627: Flags [P.], cksum 0xca81 (incorrect -> 0x9235), seq 76:168, ack 76, win 502, options [nop,nop,TS val 2684421364 ecr 871092069], length 92: BGP
    Update Message (2), length: 92
      Multi-Protocol Reach NLRI (14), length: 45, Flags [OE]: 
        AFI: IPv6 (2), SAFI: Unicast (1)
        nexthop: fdff:856f:5f10::300, fe80::e88e:5ff:feb6:966b, nh-length: 32, no SNPA
          fdff:856f:5f10:4000::/50
        0x0000:  0002 0120 fdff 856f 5f10 0000 0000 0000
        0x0010:  0000 0300 fe80 0000 0000 0000 e88e 05ff
        0x0020:  feb6 966b 0032 fdff 856f 5f10 40
      Origin (1), length: 1, Flags [T]: IGP
        0x0000:  00
      AS Path (2), length: 6, Flags [T]: 65300 
        0x0000:  0201 0000 ff14
      Unknown Attribute (35), length: 4, Flags [OT]: 
        no Attribute 35 decoder
        0x0000:  0000 ff14

This is an Update message, where isp2 advertises our /50 to cust1. The most interesting part is at the bottom: again, tcpdump is unaware of such new stuff as attribute 35 with value ff14. Section 5 of the RFC tells us that path attribute 35 is called "Only to Customer" (OTC) and that its value depends on the roles. In our case it's ff14, which is hex for 65300, the ASN of isp2.
So how exactly does this prevent a leak? Section 5 of the RFC mandates that

   2.  If a route already contains the OTC Attribute, it MUST NOT be
   propagated to Providers, Peers, or RSes.

So, our cust1 receives a route with OTC set to the isp2 ASN and will not advertise it to any other provider. That's why we don't even need to configure isp1 with an appropriate role! Actually, in such a setup we don't even need roles on isp2, because having roles configured on cust1 forces it to set OTC on prefixes received from its providers and thus prevents the leak (again, see section 5 of the RFC). Of course this is a very simple setup; in anything more complicated you will probably want to mandate roles.
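BIRD also exposes the OTC attribute to filters as bgp_otc, so you could express the egress half of this rule by hand. A hedged sketch (assuming bgp_otc and defined() behave as in recent BIRD 2.x releases) of what the roles mechanism automates on provider-facing sessions:

filter no_otc_leak {
    # a route carrying OTC was learned from a provider, a peer or an RS;
    # per RFC 9234 it must not be propagated to providers, peers or RSes
    if defined(bgp_otc) then reject;
    accept;
}

You would attach it with 'export filter no_otc_leak;' in the ipv6 channel of a provider-facing session; with roles configured, BIRD effectively does this for you.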
For the sake of completeness, let's also check what BIRD thinks about all this. Let's look at our isp2 session from cust1 (birdc show protocol all isp2):

isp2       BGP        ---        up     10:09:46.993  Established   
  BGP state:          Established
    Neighbor address: fdff:856f:5f10::300
    Neighbor AS:      65300
    Local AS:         65200
    Neighbor ID:      0.0.0.3
    Local capabilities
      Multiprotocol
        AF announced: ipv6
      Route refresh
      Graceful restart
      4-octet AS numbers
      Enhanced refresh
      Long-lived graceful restart
      Role: customer
    Neighbor capabilities
      Multiprotocol
        AF announced: ipv6
      Route refresh
      Graceful restart
      4-octet AS numbers
      Enhanced refresh
      Long-lived graceful restart
      Role: provider

I've trimmed the output a bit, but you can see the roles alongside the other capabilities there. And now the route (birdc show route all):

Table master6:
fdff:856f:5f10:4000::/50 unicast [cust1 23:57:21.217 from fdff:856f:5f10::200] * (100) [AS65300i]
    via fdff:856f:5f10::300 on host0
    Type: BGP univ
    BGP.origin: IGP
    BGP.as_path: 65200 65300
    BGP.next_hop: fdff:856f:5f10::300 fe80::e88e:5ff:feb6:966b
    BGP.local_pref: 100
    BGP.otc: 65300

You can see the ‘BGP.otc’ value here.

Conclusion

This is quite an interesting and simple mechanism which can save you the headache caused by "fat fingering" or a simple mistake. I would like to see more adoption of this among vendors, but it's very cool that a working implementation is already there. Still, I believe this should be the last barrier on the leak path, and you should not stop configuring your filters appropriately.
After writing all this I've realised that FRRouting 8.4 (released in November 2022) added support for roles too! So this could have been an interop post, but it is not…
And note that real-world relations are way more complicated than this, so check out the RFC, as it covers more roles and their behaviour.
