NixOS 21.11 switched to the nf_tables backend for iptables. Let’s see what this means, and what new things we can and cannot do.

Refresher on netfilter, iptables, and nftables

Skip this section if you're familiar with tables, chains, and rules.

To quote the netfilter homepage:

The netfilter project enables packet filtering, network address [and port] translation (NA[P]T), packet logging, userspace packet queueing and other packet mangling.

The netfilter hooks are a framework inside the Linux kernel that allows kernel modules to register callback functions at different locations of the Linux network stack. The registered callback function is then called back for every packet that traverses the respective hook within the Linux network stack.

So, netfilter is the Linux framework for manipulating network packets. It can filter and transform packets at predefined points in the kernel.

On top of netfilter sit the firewalls: the venerable iptables, and the new nftables. A regular user can use these to configure rules like “only allow incoming packets on ports 22, 80, and 443”.

Both iptables and nftables use the same concepts, but with a few differences:

  • Everything is organized into “tables”. These are essentially namespaces. In iptables, there are a fixed number of them: filter, nat, mangle, raw, and security. In nftables, there are no predefined tables, and we can instead create as few or as many as we want. We still frequently find the above five tables in nftables-based systems because that’s what most people and current tools expect, but we might also find other ones.

  • Tables contain “chains” of “rules”. The rules in each chain get evaluated sequentially, either modifying the packets, jumping to a different chain to continue evaluation, or setting a decision for the packets, and ending evaluation. In iptables there are a few predefined chains like INPUT, FORWARD, and OUTPUT where evaluation starts. We can define our own chains, and jump to them from the predefined ones. In nftables, there are no predefined chains, and we can instead connect our chains to predefined “hooks”. This is almost the same thing as in iptables, except that we can have multiple chains connected to each hook, and evaluation will happen in a priority order. So, these chains are a bit like nested match or case expressions, except that control-flow returns to higher levels if the lower levels don’t break or return, and control-flow can generally jump around between the different branches.

  • The “rules” can change most fields in packet headers, and make a decision on how the packets should be handled. For instance, rules can change the source and destination addresses on packets. This is used for NAT, and also by Kubernetes to do routing to services. The decision for each packet is something like “accept it into the system”, or “drop the packet without acknowledging it”, but it can also be something more complicated like “queue the packet for consumption by a userspace process”.
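A minimal, hypothetical sketch ties these concepts together in nftables syntax (the demo table and chain names are ours, purely for illustration): one table, one chain attached to the input hook, and one rule.

```
# One table ("demo"), containing one chain ("my-input").
table inet demo {
    chain my-input {
        # Attach this chain to the input hook; packets that match no
        # rule fall through to the accept policy.
        type filter hook input priority 0; policy accept;
        # One rule: drop incoming telnet traffic.
        tcp dport 23 drop
    }
}
```

Loading a file like this with nft -f creates the whole hierarchy at once; we’ll see this pattern again below.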

For NixOS 21.11, the nftables backend is enabled. Practically, this means we can continue using the iptables command, or we can start using the new nft command, and gain access to all the new features.
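One quick way to check which backend a given iptables binary uses is its version string: the nf_tables-backed build reports it explicitly (the exact version number will vary by system):

```
root@nixos:// > iptables --version
iptables v1.8.7 (nf_tables)
```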

Example

An example will make these concepts clearer, so let’s see what the rules look like for a simple web server.

We spin up a NixOS VM with nixos-shell. Our configuration starts with the default install, adds the iptables and nftables packages, enables the firewall, and opens up the web ports. We’ll reuse this VM when we explore some of the new nftables features later.

{ pkgs, ... }: {
  boot.kernelPackages = pkgs.linuxPackages_latest;
  services.openssh.enable = true;
  environment.systemPackages = with pkgs; [
    nftables
    iptables
    tmux
  ];
  networking.firewall.enable = true;
  networking.firewall.allowedTCPPorts = [
    80   # http
    443  # https
  ];
}
simple-server.nix

We run this with nixos-shell simple-server.nix, and we can inspect the firewall with the iptables command:

root@nixos:// > iptables -L -v
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 nixos-fw   all  --  any    any     anywhere             anywhere

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain nixos-fw (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 nixos-fw-accept  all  --  lo     any     anywhere             anywhere
    0     0 nixos-fw-accept  all  --  any    any     anywhere             anywhere             ctstate RELATED,ESTABLISHED
    0     0 nixos-fw-accept  tcp  --  any    any     anywhere             anywhere             tcp dpt:ssh
    0     0 nixos-fw-accept  tcp  --  any    any     anywhere             anywhere             tcp dpt:http
    0     0 nixos-fw-accept  tcp  --  any    any     anywhere             anywhere             tcp dpt:https
    0     0 nixos-fw-accept  icmp --  any    any     anywhere             anywhere             icmp echo-request
    0     0 nixos-fw-log-refuse  all  --  any    any     anywhere             anywhere

Chain nixos-fw-accept (6 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  any    any     anywhere             anywhere

Chain nixos-fw-log-refuse (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 LOG        tcp  --  any    any     anywhere             anywhere             tcp flags:FIN,SYN,RST,ACK/SYN LOG level info prefix "refused connection: "
    0     0 nixos-fw-refuse  all  --  any    any     anywhere             anywhere             PKTTYPE != unicast
    0     0 nixos-fw-refuse  all  --  any    any     anywhere             anywhere

Chain nixos-fw-refuse (2 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 DROP       all  --  any    any     anywhere             anywhere
Firewall rules, as shown by iptables

Without trying to understand everything, we recognize some of the words. The -L option lists the filter table by default. This table contains multiple chains, including the INPUT predefined chain, and the nixos-fw chain created by NixOS. An incoming packet first hits the INPUT chain. There are no conditions on the first rule, so evaluation then jumps to the nixos-fw chain. The packet is checked against several conditions: whether it arrived on the lo interface, whether it’s part of an established connection, and whether its destination port is ssh, http, or https. If any of these match, evaluation jumps to the nixos-fw-accept chain, which just sets the ACCEPT decision on the packet. If the packet doesn’t match any of the conditions, evaluation jumps to nixos-fw-log-refuse, which outputs a log message, and then jumps to nixos-fw-refuse, which DROPs the packet.

I personally don’t like the iptables command very much. I find its output hard to read, and it’s full of gotchas. For instance, if we hadn’t used -v in the command above, the in and out columns in the tables would have been missing. This would make the “accept everything arriving on lo” rule look like nixos-fw-accept all -- anywhere anywhere, which is very confusing. Also, the command outputs the filter table by default, and nothing in the output indicates that there are more tables. When I started using iptables casually, it was years before I realized there were more tables.

Let’s see if nft looks any better:

root@nixos:// > nft --stateless list table filter
table ip filter {
        chain nixos-fw-accept {
                counter accept
        }

        chain nixos-fw-refuse {
                counter drop
        }

        chain nixos-fw-log-refuse {
                meta l4proto tcp tcp flags & (fin|syn|rst|ack) == syn counter log prefix "refused connection: " level info
                pkttype != unicast counter jump nixos-fw-refuse
                counter jump nixos-fw-refuse
        }

        chain nixos-fw {
                iifname "lo" counter jump nixos-fw-accept
                ct state related,established counter jump nixos-fw-accept
                meta l4proto tcp tcp dport 22 counter jump nixos-fw-accept
                meta l4proto tcp tcp dport 80 counter jump nixos-fw-accept
                meta l4proto tcp tcp dport 443 counter jump nixos-fw-accept
                meta l4proto icmp icmp type echo-request counter jump nixos-fw-accept
                counter jump nixos-fw-log-refuse
        }

        chain INPUT {
                type filter hook input priority filter; policy accept;
                counter jump nixos-fw
        }
}
Firewall rules, as shown by nft

We only list the filter table here to compare the output with that of iptables. If we had run nft list ruleset, we would’ve gotten all the tables. The structure of the output is the same, but a lot of the information implied before is now explicit. For instance, the INPUT chain isn’t intrinsically special anymore. Evaluation starts there only because the chain was attached to the input hook with type filter hook input priority filter. The lo condition in the nixos-fw chain is plainly visible instead of being hidden by default. On the other hand, the counter information makes every row look noisy. If we hadn’t used --stateless, the rules would have all looked like counter packets 597365 bytes 253572991 jump nixos-fw. That said, if we ignore the word “counter” on every line, the output is easier to read.
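Incidentally, nft also fixes the discoverability problem mentioned earlier: nft list tables prints every table in every protocol family, so the non-filter tables are no longer hidden. On our VM we’d expect to see at least the filter tables for both IP families (the exact list depends on the system’s configuration):

```
root@nixos:// > nft list tables
table ip filter
table ip6 filter
```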

Good: Hierarchical rule syntax and transactional updates

The listing above hints at the first new feature: hierarchical rule syntax. With iptables, we’d write rules like this to allow packet forwarding for the wg0 interface, and for the 10.32.0.0/16 subnet:

# iptables -P FORWARD DROP
# iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
# iptables -A FORWARD -i wg0 -j ACCEPT
# iptables -A FORWARD -s 10.32.0.0/16 -j ACCEPT
# iptables -A FORWARD -d 10.32.0.0/16 -j ACCEPT
Imperative rule setting with iptables

With nft, we can write either the equivalent commands:

# nft add chain ip filter FORWARD '{ type filter hook forward priority filter; policy drop; }'
# nft add rule ip filter FORWARD ct state related,established accept
# nft add rule ip filter FORWARD meta iifname wg0 counter accept
# nft add rule ip filter FORWARD ip daddr 10.32.0.0/16 counter accept
# nft add rule ip filter FORWARD ip saddr 10.32.0.0/16 counter accept
Imperative rule setting with nft

Or, better yet, we can write the rules hierarchically (and transactionally):

# nft -f - <<EOF
add chain ip filter FORWARD;
flush chain ip filter FORWARD;
table ip filter {
    chain FORWARD {
        type filter hook forward priority filter; policy drop;
        ct state related,established accept
        meta iifname wg0 counter accept
        ip daddr 10.32.0.0/16 counter accept
        ip saddr 10.32.0.0/16 counter accept
    }
}
EOF
Declarative rule setting with nft

There are a few things going on here:

  • Instead of repeating ip filter FORWARD on every line, we specify it just once. This makes writing scripts much easier.

  • We’re using nft -f, which applies the entire file atomically. If we get the syntax for any of the rules wrong, we don’t end up with a half-configured firewall.

  • We also specify our FORWARD chain kind-of-declaratively. The add chain and flush chain commands at the beginning ensure the chain exists and is empty. The table and chain commands then configure the chain. After the entire command completes, the chain will look exactly as described, which is a step-up from running a bunch of add rule commands, and hoping that the end result is what we wanted. Mind you, this is only kind-of-declarative because it doesn’t stop us, or other programs, from further modifying the firewall.

In NixOS, this also opens up the possibility of configuring the firewall in just one place based on system options. For instance, my forwarding rules used to be configured by iptables invocations littered across multiple modules. Now, they’re configured in one place based on whether the machine has a Wireguard VPN enabled, and whether it’s part of the Kubernetes cluster:

networking.firewall.extraCommands = with pkgs.lib; ''
  ${pkgs.nftables}/bin/nft -f - <<EOF
  table inet ab-forward;
  flush table inet ab-forward;
  table inet ab-forward {
        chain FORWARD {
              type filter hook forward priority filter; policy drop;
              ct state related,established accept
              ${optionalString cfg.vpn
                "meta iifname ${wgCfg.wgInterface} counter accept;"}
              ${optionalString kubeCfg.master.enable
                 "ip daddr ${kubeCfg.clusterCidrIp4} counter accept;\n
                  ip saddr ${kubeCfg.clusterCidrIp4} counter accept;"}
        }
  }
  EOF
'';
Declarative firewall configuration with nft

Personally, the lack of transactional configuration was the aspect I hated most about iptables. That’s fixed now, so hooray! Unfortunately, we’re not quite there yet in practical terms, as we’ll see later in the “Legacy tools” section.

Good: The inet protocol family

The iptables and nftables rules are scoped by protocol families. So, the hierarchy is actually Protocol Family → Table → Chain → Rule. This also means that the iptables listing in the previous section was incomplete because it neglected IPv6. It should’ve been:

# iptables -P FORWARD DROP
# iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
# iptables -A FORWARD -i wg0 -j ACCEPT
# iptables -A FORWARD -s 10.32.0.0/16 -j ACCEPT
# iptables -A FORWARD -d 10.32.0.0/16 -j ACCEPT
# ip6tables -P FORWARD DROP
# ip6tables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
# ip6tables -A FORWARD -i wg0 -j ACCEPT
# ip6tables -A FORWARD -s 10.32.0.0/16 -j ACCEPT
# ip6tables -A FORWARD -d 10.32.0.0/16 -j ACCEPT
Writing everything twice with iptables

Needless to say, this sort of duplication is annoying to write, makes it hard to check that the two families are configured the same, and makes it very easy to forget about IPv6.

With nftables, we can use the inet family to configure both IPv4 and IPv6 at the same time. So, instead of writing nft add rule ip and nft add rule ip6, we can write nft add rule inet (or, better yet, use the hierarchical syntax with inet instead of duplicating the ip and ip6 tables).
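As a sketch (the table name is ours), the duplicated FORWARD setup from above collapses into a single inet table. One caveat: rules that match on the ip header, like the 10.32.0.0/16 ones, still only apply to IPv4 packets, while the interface and conntrack rules apply to both families.

```
table inet my-forward {
    chain FORWARD {
        type filter hook forward priority filter; policy drop;
        # These rules apply to IPv4 and IPv6 alike.
        ct state related,established accept
        iifname "wg0" counter accept
        # Matching on the ip header restricts these to IPv4; an IPv6
        # subnet would need corresponding ip6 daddr/saddr rules.
        ip daddr 10.32.0.0/16 counter accept
        ip saddr 10.32.0.0/16 counter accept
    }
}
```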

Good: Debugging with nft monitor

Next up, have you ever wondered what changes some program is making to the firewall? For instance, what does kube-proxy do when a new service is added? We can answer that very easily with nft monitor.

We start nft monitor in one terminal, create a new Kubernetes service on port 12345, and check the first terminal for output:

apiVersion: v1
kind: Service
metadata:
  name: test-service
spec:
  selector:
    app: scvalex-net  # selects the two pods running this website
  ports:
  - port: 12345
    name: web
A test Kubernetes service
# nft monitor
... many rule deletions ...
... many rule additions ...
add rule ip nat KUBE-SVC-IMA7E3MHIXVXIG46 meta l4proto tcp ip saddr != 10.32.0.0/16 ip daddr 10.33.0.190  tcp dport 12345 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
add rule ip nat KUBE-SERVICES meta l4proto tcp ip daddr 10.33.0.190  tcp dport 12345 counter packets 0 bytes 0 jump KUBE-SVC-IMA7E3MHIXVXIG46
add rule ip nat KUBE-SVC-IMA7E3MHIXVXIG46   counter packets 0 bytes 0 jump KUBE-SEP-O7LSKZVJZBWKHEDW
add rule ip nat KUBE-SVC-IMA7E3MHIXVXIG46  counter packets 0 bytes 0 jump KUBE-SEP-NNVVCRJZTKE7RKSU
add rule ip nat KUBE-SEP-O7LSKZVJZBWKHEDW ip saddr 10.32.2.146  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
add rule ip nat KUBE-SEP-O7LSKZVJZBWKHEDW meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.32.2.146:12345
add rule ip nat KUBE-SEP-NNVVCRJZTKE7RKSU ip saddr 10.32.4.130  counter packets 0 bytes 0 jump KUBE-MARK-MASQ
add rule ip nat KUBE-SEP-NNVVCRJZTKE7RKSU meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.32.4.130:12345
add rule ip nat KUBE-SVC-J2WIMEXQLDNVVHW7 meta l4proto tcp ip saddr != 10.32.0.0/16 ip daddr 10.33.0.205  tcp dport 12345 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
... many more rule additions ...
Using nft monitor to spy on kube-proxy

From the sheer number of changes, we can tell that kube-proxy smashes all the rules in place every time. Searching for port 12345, we see that the most interesting rules are in the nat table. Looking back, having something like this would’ve made writing the post about Kubernetes networking much easier.

Speaking of Kubernetes, another question I’ve had for a while is if this log message from flannel is correct:

... flannel[18293]: ... main.go:313] Changing default FORWARD chain policy to ACCEPT

On the face of it, that sounds like a pretty dodgy statement. It’s like seeing “Opening all INPUT ports” in nginx logs. However, if we check with nft monitor, we see that flannel doesn’t actually make any changes to the firewall, so the statement is wrong.

Good: Debugging with nftrace

Another nice addition is tooling to debug nftables rules more easily. The way this works is that a rule may toggle the nftrace meta mark on a packet, and then we can watch the packet traverse the chains with nft monitor trace.

As an example, let’s block access to Facebook from our test VM. For demonstration purposes, we create three chains connected to the output hook, spread across three tables (two new ones, plus the system’s existing filter table).

  • The fb-trace table has the lowest numerical priority, so the packet goes through it first. This is where we set the nftrace meta mark. We set it on the Facebook subnet, and on 8.8.8.8 which we use to check that sending packets to other addresses still works.

  • The packet then goes through the fb-block table where we drop any packets going to 157.240.240.35 or 157.240.240.36.

  • Finally, we ensure an OUTPUT chain exists in the system’s filter table which allows all packets through. Our Facebook packet should not reach this point. Note that we switch to using the ip protocol family here because that’s where NixOS creates the filter table. In a real system, we would add another chain like this for ip6.

# nft -f - <<EOF
add table inet fb-trace;
add chain inet fb-trace OUTPUT;
flush chain inet fb-trace OUTPUT;
delete chain inet fb-trace OUTPUT;
table inet fb-trace {
    chain OUTPUT {
        type filter hook output priority -10; policy accept;
        ip daddr {157.240.240.0/24, 8.8.8.8} nftrace set 1;
    }
}

add table inet fb-block;
add chain inet fb-block OUTPUT;
flush chain inet fb-block OUTPUT;
delete chain inet fb-block OUTPUT;
table inet fb-block {
    chain OUTPUT {
        type filter hook output priority -5; policy accept;
        ip daddr {157.240.240.35, 157.240.240.36} counter drop;
    }
}

table ip filter {
    chain OUTPUT {
        type filter hook output priority 0; policy accept;
    }
}
EOF

We start nft monitor trace in one terminal, and try out a couple of pings in another:

root@nixos:// > ping -W5 -c1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=255 time=31.2 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 31.195/31.195/31.195/0.000 ms

root@nixos:// > ping -W5 -c1 157.240.240.35
PING 157.240.240.35 (157.240.240.35) 56(84) bytes of data.
--- 157.240.240.35 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

As expected, pinging 8.8.8.8 works, but pinging 157.240.240.35 times out. Checking the tracing terminal, we see:

root@nixos:// > nft monitor trace
trace id a4ac8c93 inet fb-trace OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 8.8.8.8 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 19836 ip protocol icmp ip length 84 icmp type echo-request icmp code net-unreachable icmp id 8 icmp sequence 1 @th,64,96 0x2960cf6100000000ef1e0d00
trace id a4ac8c93 inet fb-trace OUTPUT rule ip daddr { 8.8.8.8, 157.240.240.0/24 } meta nftrace set 1 (verdict continue)
trace id a4ac8c93 inet fb-trace OUTPUT verdict continue
trace id a4ac8c93 inet fb-trace OUTPUT policy accept
trace id a4ac8c93 inet fb-block OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 8.8.8.8 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 19836 ip protocol icmp ip length 84 icmp type echo-request icmp code net-unreachable icmp id 8 icmp sequence 1 @th,64,96 0x2960cf6100000000ef1e0d00
trace id a4ac8c93 inet fb-block OUTPUT verdict continue
trace id a4ac8c93 inet fb-block OUTPUT policy accept
trace id a4ac8c93 ip filter OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 8.8.8.8 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 19836 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 8 icmp sequence 1 @th,64,96 0x2960cf6100000000ef1e0d00
trace id a4ac8c93 ip filter OUTPUT verdict continue
trace id a4ac8c93 ip filter OUTPUT policy accept

trace id 4683d399 inet fb-trace OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 157.240.240.35 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 48251 ip protocol icmp ip length 84 icmp type echo-request icmp code net-unreachable icmp id 9 icmp sequence 1 @th,64,96 0x3160cf61000000009bd00200
trace id 4683d399 inet fb-trace OUTPUT rule ip daddr { 8.8.8.8, 157.240.240.0/24 } meta nftrace set 1 (verdict continue)
trace id 4683d399 inet fb-trace OUTPUT verdict continue
trace id 4683d399 inet fb-trace OUTPUT policy accept
trace id 4683d399 inet fb-block OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 157.240.240.35 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 48251 ip protocol icmp ip length 84 icmp type echo-request icmp code net-unreachable icmp id 9 icmp sequence 1 @th,64,96 0x3160cf61000000009bd00200
trace id 4683d399 inet fb-block OUTPUT rule ip daddr { 157.240.240.35, 157.240.240.36 } counter packets 1 bytes 84 drop (verdict drop)
Tracing two packets through the firewall

The 8.8.8.8 packet hit the fb-trace, fb-block, and filter tables. It had nftrace attached to it in the first table, and then fell through the accept policy of each table. On the other hand, the 157.240.240.35 packet went through fb-trace, and then got dropped in fb-block. As expected, it never made it to filter.

Once we’re satisfied with our firewall rules, we can just delete the fb-trace table to clean up:

# nft delete table inet fb-trace

Good: Maps and sets

Another new feature is the addition of sets and maps. To copy the example from regit’s blog, these let you avoid having to copy-paste the same rule multiple times with just one field changed:

# ip6tables -A INPUT -p tcp -m multiport --dports 23,80,443 -j ACCEPT
# ip6tables -A INPUT -p icmpv6 --icmpv6-type neighbor-solicitation -j ACCEPT
# ip6tables -A INPUT -p icmpv6 --icmpv6-type echo-request -j ACCEPT
# ip6tables -A INPUT -p icmpv6 --icmpv6-type router-advertisement -j ACCEPT
# ip6tables -A INPUT -p icmpv6 --icmpv6-type neighbor-advertisement -j ACCEPT
Copy-pasted rules with iptables
# nft add rule ip6 filter input tcp dport {telnet, http, https} accept
# nft add rule ip6 filter input icmpv6 type { nd-neighbor-solicit, echo-request, nd-router-advert, nd-neighbor-advert } accept
De-duplicated rules with nft sets

The latter should also be faster because checking for set membership is more efficient than sequential rule evaluation.
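Maps go one step further than sets: they let a single rule pick different outcomes based on a lookup. As a sketch (assuming an inet filter table with an input chain exists; the names are ours), a verdict map can replace several per-port rules with one lookup:

```
# One rule, one lookup: accept ssh and http, drop telnet.
nft add rule inet filter input tcp dport vmap { 22 : accept, 80 : accept, 23 : drop }
```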

Another cool thing you can do is port knocking at the firewall level.

Bad: Legacy tools

We’ve seen that nft introduces several improvements over iptables, but there’s a catch: trying to use the new features to their fullest is awkward because it conflicts with existing tooling.

Specifically, the nft -f command lets us atomically configure our firewall, which is a marked improvement over having a script that incrementally adds rules, and can fail partway through leaving the firewall in an undefined state. However, many existing tools attempt to change the firewall at runtime. For instance, the usual way people set up Wireguard VPNs is by having the wg-quick tool add rules to iptables after the network interface is created. As another example, the way NixOS configures NAT is by appending to the long script of firewall setup commands.

So, you can write all of your firewall configuration in one place, and ensure that either it’s all applied, or none of it is. But then some other program is going to come in, and make other changes to the firewall in the live system.

This will get fixed eventually, but at the moment, we can’t have fully declarative configurations for our firewall. The best we can aim for right now is making chunks of the configuration declarative.

Bad: Multiple chains on the same hook

Another issue is that, due to the way the accept and drop verdicts are interpreted, it’s very hard to combine rules spread across multiple tables with a base policy of drop. This is a bit hard to explain without an example, so let’s consider how we’d restrict output from a box.

We’d like our system to only allow outgoing connections to certain addresses (e.g. 8.8.8.8). To achieve this, we’ll try connecting chains to the output hook in three tables:

  • In the test-trace table, we enable nftrace so that we can later debug what’s going on.

  • In the allow-dns table, we specify that we want to allow packets to 8.8.8.8.

  • In the filter table, we define the base case that no outgoing packets are allowed (except for already established connections).

# nft -f - <<EOF
add table inet test-trace;
add chain inet test-trace OUTPUT;
flush chain inet test-trace OUTPUT;
delete chain inet test-trace OUTPUT;
table inet test-trace {
    chain OUTPUT {
        type filter hook output priority -10; policy accept;
        ip daddr 8.8.8.8 nftrace set 1;
    }
}

add table inet allow-dns;
add chain inet allow-dns OUTPUT;
flush chain inet allow-dns OUTPUT;
delete chain inet allow-dns OUTPUT;
table inet allow-dns {
    chain OUTPUT {
        type filter hook output priority -5; policy drop;
        ip daddr 8.8.8.8 accept;
    }
}

table ip filter {
    chain OUTPUT {
        type filter hook output priority 0; policy drop;
        ct state related,established accept
    }
}
EOF
Blocking all outgoing connections, except for 8.8.8.8. This does NOT work.

We want separate tables like this to make the rules easier to manage. In an ideal world where the entire firewall configuration is generated in one place, this isn’t necessary, but in the current world where the filter table is going to be full of rules added by random programs and modules, having our configuration in separate tables is helpful.

This setup doesn’t work. We see pings timing out:

root@nixos:// > ping -c1 -W5 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
root@nixos:// > nft monitor trace
trace id 5c12872a inet test-trace OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 8.8.8.8 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 15754 ip protocol icmp ip length 84 icmp type echo-request icmp code net-unreachable icmp id 13 icmp sequence 1 @th,64,96 0x3f6dcf6100000000cf150f00
trace id 5c12872a inet test-trace OUTPUT rule ip daddr 8.8.8.8 meta nftrace set 1 (verdict continue)
trace id 5c12872a inet test-trace OUTPUT verdict continue
trace id 5c12872a inet test-trace OUTPUT policy accept
trace id 5c12872a inet allow-dns OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 8.8.8.8 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 15754 ip protocol icmp ip length 84 icmp type echo-request icmp code net-unreachable icmp id 13 icmp sequence 1 @th,64,96 0x3f6dcf6100000000cf150f00
trace id 5c12872a inet allow-dns OUTPUT rule ip daddr 8.8.8.8 accept (verdict accept)
trace id 5c12872a ip filter OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 8.8.8.8 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 15754 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 13 icmp sequence 1 @th,64,96 0x3f6dcf6100000000cf150f00
trace id 5c12872a ip filter OUTPUT verdict continue
trace id 5c12872a ip filter OUTPUT policy drop

Looking at the trace, it’s clear what’s going on: the packet goes through the test-trace table without issue, then gets accepted in the allow-dns table, then gets dropped by the policy in the filter table. This is because accept doesn’t end evaluation, and just passes the packet to the next chain attached to the hook. This is different from drop which does end evaluation. Practically, this means that, if our base policy is drop, then we must have all the accept rules in the same chain. This emergent restriction makes tables less useful, and the configs noisier.
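Concretely, to make the example above work, the accept rule has to live in the same chain as the drop policy. A sketch of the working, single-table variant, matching the filter table from the listing above:

```
table ip filter {
    chain OUTPUT {
        type filter hook output priority 0; policy drop;
        ct state related,established accept
        # The accept verdict now settles this chain before its drop
        # policy applies, and no other chain on the hook drops.
        ip daddr 8.8.8.8 accept
    }
}
```

This works, but it’s exactly the consolidation into one chain that makes the separate-tables approach unattractive.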

Conclusion

All in all, nft is a big improvement over iptables, especially in terms of usability. I had noticed all the hubbub about nft in the summer of 2021, and now that I’ve had a chance to see for myself, I’m completely on board this hype train. I still don’t see where the pixelated apes come in, but that might be in some info manual I haven’t read.