NixOS 21.11 switched to the nf_tables backend for iptables. Let’s see what this means, and what new things we can and cannot do.
Refresher on netfilter, iptables, and nftables
Skip this section if you're familiar with tables, chains, and rules.
To quote the netfilter homepage:
The netfilter project enables packet filtering, network address [and port] translation (NA[P]T), packet logging, userspace packet queueing and other packet mangling.
The netfilter hooks are a framework inside the Linux kernel that allows kernel modules to register callback functions at different locations of the Linux network stack. The registered callback function is then called back for every packet that traverses the respective hook within the Linux network stack.
So, netfilter is the Linux framework for manipulating network packets. It can filter and transform packets at predefined points in the kernel.
On top of netfilter sit the firewalls: the venerable iptables, and the new nftables. A regular user can use these to configure rules like “only allow incoming packets on ports 22, 80, and 443”.
Both iptables and nftables use the same concepts, but with a few differences:
- Everything is organized into “tables”. These are essentially namespaces. In iptables, there is a fixed set of them: filter, nat, mangle, raw, and security. In nftables, there are no predefined tables; we can instead create as few or as many as we want. We still frequently find the above five tables in nftables-based systems because that’s what most people and current tools expect, but we might also find other ones.
- Tables contain “chains” of “rules”. The rules in each chain get evaluated sequentially, either modifying the packets, jumping to a different chain to continue evaluation, or setting a decision for the packets and ending evaluation. In iptables, there are a few predefined chains like INPUT, FORWARD, and OUTPUT where evaluation starts. We can define our own chains, and jump to them from the predefined ones. In nftables, there are no predefined chains; instead, we connect our chains to predefined “hooks”. This is almost the same thing as in iptables, except that we can have multiple chains connected to each hook, and evaluation happens in priority order. So, these chains are a bit like nested match or case expressions, except that control flow returns to higher levels if the lower levels don’t break or return, and control flow can generally jump around between the different branches.
- The “rules” can change most fields in packet headers, and make a decision on how the packets should be handled. For instance, rules can change the source and destination addresses on packets. This is used for NAT, and also by Kubernetes to do routing to services. The decision for each packet is something like “accept it into the system” or “drop the packet without acknowledging it”, but it can also be something more complicated like “queue the packet for consumption by a userspace process”.
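To make the hierarchy concrete, here is a minimal sketch of these concepts expressed as nft commands; the demo table and incoming chain names are made up for illustration, and running this requires root on a kernel with nftables support:

```shell
# Create a table (a namespace for chains) in the inet family.
nft add table inet demo
# Attach a chain to the input hook; packets arriving at the host traverse it.
nft add chain inet demo incoming '{ type filter hook input priority 0; policy accept; }'
# Add a rule to the chain: count and accept packets destined for port 22.
nft add rule inet demo incoming tcp dport 22 counter accept
```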
For NixOS 21.11, the nftables backend is enabled. Practically, this means we can continue using the iptables command, or we can start using the new nft command and gain access to all the new features.
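As an aside, we can check which backend a given iptables binary uses by asking for its version: the nftables-backed build reports nf_tables, while the classic one reports legacy. The version number below is just illustrative:

```shell
iptables --version
# e.g. "iptables v1.8.7 (nf_tables)" with the nftables backend,
# or "iptables v1.8.7 (legacy)" with the classic backend
```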
Example
An example will make these concepts clearer, so let’s see what the rules look like for a simple web server.
We spin up a NixOS VM with nixos-shell. Our configuration starts with the default install, adds the iptables and nftables packages, enables the firewall, and opens up the web ports. We’ll reuse this VM when we explore some of the new nftables features later.
{ pkgs, ... }: {
boot.kernelPackages = pkgs.linuxPackages_latest;
services.openssh.enable = true;
environment.systemPackages = with pkgs; [
nftables
iptables
tmux
];
networking.firewall.enable = true;
networking.firewall.allowedTCPPorts = [
80 # http
443 # https
];
}
simple-server.nix
We run this with nixos-shell simple-server.nix, and we can inspect the firewall with the iptables command:
root@nixos:// > iptables -L -v
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 nixos-fw all -- any any anywhere anywhere
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain nixos-fw (1 references)
pkts bytes target prot opt in out source destination
0 0 nixos-fw-accept all -- lo any anywhere anywhere
0 0 nixos-fw-accept all -- any any anywhere anywhere ctstate RELATED,ESTABLISHED
0 0 nixos-fw-accept tcp -- any any anywhere anywhere tcp dpt:ssh
0 0 nixos-fw-accept tcp -- any any anywhere anywhere tcp dpt:http
0 0 nixos-fw-accept tcp -- any any anywhere anywhere tcp dpt:https
0 0 nixos-fw-accept icmp -- any any anywhere anywhere icmp echo-request
0 0 nixos-fw-log-refuse all -- any any anywhere anywhere
Chain nixos-fw-accept (6 references)
pkts bytes target prot opt in out source destination
0 0 ACCEPT all -- any any anywhere anywhere
Chain nixos-fw-log-refuse (1 references)
pkts bytes target prot opt in out source destination
0 0 LOG tcp -- any any anywhere anywhere tcp flags:FIN,SYN,RST,ACK/SYN LOG level info prefix "refused connection: "
0 0 nixos-fw-refuse all -- any any anywhere anywhere PKTTYPE != unicast
0 0 nixos-fw-refuse all -- any any anywhere anywhere
Chain nixos-fw-refuse (2 references)
pkts bytes target prot opt in out source destination
0 0 DROP all -- any any anywhere anywhere
iptables
Without trying to understand everything, we recognize some of the words. The -L option lists the filter table by default. This table contains multiple chains, including the INPUT predefined chain and the nixos-fw chain created by NixOS. An incoming packet first hits the INPUT chain. There are no conditions on the first rule, so evaluation jumps to the nixos-fw chain. The packet is checked against several conditions, like the input interface, whether it’s part of an established connection, and whether its target port is ssh, http, or https. If so, evaluation jumps to the nixos-fw-accept chain, which just sets the ACCEPT decision on the packet. If the packet doesn’t match any of the conditions, evaluation jumps to nixos-fw-log-refuse, which outputs a log message, and then jumps to nixos-fw-refuse, which DROPs the packet.
I personally don’t like the iptables command very much. I find its output hard to read, and it’s full of gotchas. For instance, if we hadn’t used -v in the command above, the in and out columns in the tables would have been missing. This would make the “accept everything arriving on lo” rule look like nixos-fw-accept all -- anywhere anywhere, which is very confusing. Also, the command outputs the filter table by default, and nothing in the output indicates that there are more tables. When I started using iptables casually, it was years until I realized there were more tables.
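For reference, the other tables have to be requested explicitly with -t, which is easy to miss if you don’t know they exist:

```shell
# List the nat table; without -t, iptables silently defaults to filter.
iptables -t nat -L -v
```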
Let’s see if nft looks any better:
root@nixos:// > nft --stateless list table filter
table ip filter {
chain nixos-fw-accept {
counter accept
}
chain nixos-fw-refuse {
counter drop
}
chain nixos-fw-log-refuse {
meta l4proto tcp tcp flags & (fin|syn|rst|ack) == syn counter log prefix "refused connection: " level info
pkttype != unicast counter jump nixos-fw-refuse
counter jump nixos-fw-refuse
}
chain nixos-fw {
iifname "lo" counter jump nixos-fw-accept
ct state related,established counter jump nixos-fw-accept
meta l4proto tcp tcp dport 22 counter jump nixos-fw-accept
meta l4proto tcp tcp dport 80 counter jump nixos-fw-accept
meta l4proto tcp tcp dport 443 counter jump nixos-fw-accept
meta l4proto icmp icmp type echo-request counter jump nixos-fw-accept
counter jump nixos-fw-log-refuse
}
chain INPUT {
type filter hook input priority filter; policy accept;
counter jump nixos-fw
}
}
nft
We only list the filter table here to compare the output with that of iptables. If we had run nft list ruleset, we would’ve gotten all the tables. The structure of the output is the same, but a lot of the information that was implicit before is now explicit. For instance, the INPUT chain isn’t intrinsically special anymore. Evaluation starts there only because the chain was attached to the input hook with type filter hook input priority filter. The lo condition in the nixos-fw chain is plainly visible instead of being hidden by default. On the other hand, the counter information makes every row look noisy. If we hadn’t used --stateless, the rules would have all looked like counter packets 597365 bytes 253572991 jump nixos-fw. That said, if we ignore the word “counter” on every line, the output is easier to read.
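Another small quality-of-life improvement: nft makes it easy to discover which tables exist in the first place, instead of silently defaulting to one of them:

```shell
# Enumerate every table in every protocol family.
nft list tables
# Then list a specific one by family and name.
nft list table ip filter
```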
Good: Hierarchical rule syntax and transactional updates
The listing above hints at the first new feature: hierarchical rule syntax. With iptables, we’d write rules like this to allow packet forwarding for the wg0 interface, and for the 10.32.0.0/16 subnet:
# iptables -P FORWARD DROP
# iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
# iptables -A FORWARD -i wg0 -j ACCEPT
# iptables -A FORWARD -s 10.32.0.0/16 -j ACCEPT
# iptables -A FORWARD -d 10.32.0.0/16 -j ACCEPT
iptables
With nft, we can either write the equivalent commands:
# nft add chain ip filter FORWARD '{ type filter hook forward priority filter; policy drop; }'
# nft add rule ip filter FORWARD ct state related,established accept
# nft add rule ip filter FORWARD meta iifname wg0 counter accept
# nft add rule ip filter FORWARD ip daddr 10.32.0.0/16 counter accept
# nft add rule ip filter FORWARD ip saddr 10.32.0.0/16 counter accept
nft
Or, better yet, we can write the rules hierarchically (and transactionally):
# nft -f - <<EOF
add chain ip filter FORWARD;
flush chain ip filter FORWARD;
table ip filter {
chain FORWARD {
type filter hook forward priority filter; policy drop;
ct state related,established accept
meta iifname wg0 counter accept
ip daddr 10.32.0.0/16 counter accept
ip saddr 10.32.0.0/16 counter accept
}
}
EOF
nft
There are a few things going on here:
- Instead of repeating ip filter FORWARD on every line, we specify it just once. This makes writing scripts much easier.
- We’re using nft -f, which applies the entire file atomically. If we get the syntax of any of the rules wrong, we don’t end up with a half-configured firewall.
- We also specify our FORWARD chain kind-of-declaratively. The add chain and flush chain commands at the beginning ensure the chain exists and is empty. The table and chain commands then configure the chain. After the entire command completes, the chain will look exactly as described, which is a step up from running a bunch of add rule commands and hoping that the end result is what we wanted. Mind you, this is only kind-of-declarative because it doesn’t stop us, or other programs, from further modifying the firewall.
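A related convenience worth knowing about: nft can validate a rules file without applying it, which pairs nicely with the atomic nft -f workflow. The firewall.nft filename here is just a placeholder:

```shell
# --check (-c) parses and validates the file but makes no changes;
# only apply it if the dry run succeeds.
nft -c -f firewall.nft && nft -f firewall.nft
```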
In NixOS, this also opens up the possibility of configuring the firewall in just one place, based on system options. For instance, my forwarding rules used to be configured by iptables invocations littered across multiple modules. Now, they’re configured in one place, based on whether the machine has a WireGuard VPN enabled and whether it’s part of the Kubernetes cluster:
networking.firewall.extraCommands = with pkgs.lib; ''
/bin/nft -f - <<EOF
add table inet ab-forward;
flush table inet ab-forward;
table inet ab-forward {
chain FORWARD {
type filter hook forward priority filter; policy drop;
ct state related,established accept
}
}
EOF
'';
nft
Personally, the lack of transactional configuration was the aspect I hated most about iptables. That’s fixed now, so hooray! Unfortunately, we’re not quite there yet in practical terms, as we’ll see later in the “Legacy tools” section.
Good: The inet protocol family
The iptables and nftables rules are scoped by protocol family. So, the hierarchy is actually Protocol Family → Table → Chain → Rule. This also means that the iptables listing in the previous section was incomplete because it neglected IPv6. It should’ve been:
# iptables -P FORWARD DROP
# iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
# iptables -A FORWARD -i wg0 -j ACCEPT
# iptables -A FORWARD -s 10.32.0.0/16 -j ACCEPT
# iptables -A FORWARD -d 10.32.0.0/16 -j ACCEPT
# ip6tables -P FORWARD DROP
# ip6tables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
# ip6tables -A FORWARD -i wg0 -j ACCEPT
# ip6tables -A FORWARD -s 10.32.0.0/16 -j ACCEPT
# ip6tables -A FORWARD -d 10.32.0.0/16 -j ACCEPT
iptables
Needless to say, this sort of duplication is annoying to write, it’s hard to check that the two families are configured the same, and this pattern makes it very easy to forget about IPv6.
With nftables, we can use the inet family to configure both IPv4 and IPv6 at the same time. So, instead of writing nft add rule ip and nft add rule ip6, we can write nft add rule inet (or, better yet, use the hierarchical syntax with inet instead of duplicating the ip and ip6 tables).
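As a sketch of what this buys us, the duplicated forwarding rules from above collapse into a single inet table. The ip saddr/daddr matches only apply to IPv4 packets, so an IPv6 subnet would still need its own rules, but the interface and conntrack rules cover both families at once. The my-forward table name is made up:

```shell
nft -f - <<EOF
table inet my-forward {
  chain FORWARD {
    type filter hook forward priority filter; policy drop;
    ct state related,established accept
    meta iifname wg0 counter accept
    ip saddr 10.32.0.0/16 counter accept
    ip daddr 10.32.0.0/16 counter accept
  }
}
EOF
```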
Good: Debugging with nft monitor
Next up, have you ever wondered what changes some program is making to the firewall? For instance, what does kube-proxy do when a new service is added? We can answer that very easily with nft monitor.
We start nft monitor in one terminal, create a new Kubernetes service on port 12345, and check the first terminal for output:
apiVersion: v1
kind: Service
metadata:
  name: test-service
spec:
  selector:
    app: scvalex-net # selects the two pods running this website
  ports:
    - port: 12345
      name: web
# nft monitor
... many rule deletions ...
... many rule additions ...
add rule ip nat KUBE-SVC-IMA7E3MHIXVXIG46 meta l4proto tcp ip saddr != 10.32.0.0/16 ip daddr 10.33.0.190 tcp dport 12345 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
add rule ip nat KUBE-SERVICES meta l4proto tcp ip daddr 10.33.0.190 tcp dport 12345 counter packets 0 bytes 0 jump KUBE-SVC-IMA7E3MHIXVXIG46
add rule ip nat KUBE-SVC-IMA7E3MHIXVXIG46 counter packets 0 bytes 0 jump KUBE-SEP-O7LSKZVJZBWKHEDW
add rule ip nat KUBE-SVC-IMA7E3MHIXVXIG46 counter packets 0 bytes 0 jump KUBE-SEP-NNVVCRJZTKE7RKSU
add rule ip nat KUBE-SEP-O7LSKZVJZBWKHEDW ip saddr 10.32.2.146 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
add rule ip nat KUBE-SEP-O7LSKZVJZBWKHEDW meta l4proto tcp counter packets 0 bytes 0 dnat to 10.32.2.146:12345
add rule ip nat KUBE-SEP-NNVVCRJZTKE7RKSU ip saddr 10.32.4.130 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
add rule ip nat KUBE-SEP-NNVVCRJZTKE7RKSU meta l4proto tcp counter packets 0 bytes 0 dnat to 10.32.4.130:12345
add rule ip nat KUBE-SVC-J2WIMEXQLDNVVHW7 meta l4proto tcp ip saddr != 10.32.0.0/16 ip daddr 10.33.0.205 tcp dport 12345 counter packets 0 bytes 0 jump KUBE-MARK-MASQ
... many more rule additions ...
nft monitor to spy on kube-proxy
From the sheer number of changes, we can tell that kube-proxy smashes all the rules in place every time. Searching for port 12345, we see that the most interesting rules are in the nat table. Looking back, having something like this would’ve made writing the post about Kubernetes networking much easier.
Speaking of Kubernetes, another question I’ve had for a while is whether this log message from flannel is correct:
... flannel[18293]: ... main.go:313] Changing default FORWARD chain policy to ACCEPT
On the face of it, that sounds like a pretty dodgy statement. It’s like seeing “Opening all INPUT ports” in nginx logs. However, if we check with nft monitor, we see that flannel doesn’t actually make any changes to the firewall, so the statement is wrong.
Good: Debugging with nftrace
Another nice addition is tooling to debug nftables rules more easily. The way this works is that a rule may set the nftrace meta mark on a packet, and then we can watch the packet as it goes through the chains with nft monitor trace.
As an example, let’s block access to Facebook from our test VM. For demonstration purposes, we create three chains connected to the output hook, spread across three tables.
- The fb-trace table has the lowest numerical priority, so the packet goes through it first. This is where we set the nftrace meta mark. We set it on the Facebook subnet, and on 8.8.8.8, which we use to check that sending packets to other addresses still works.
- The packet then goes through the fb-block table, where we drop any packets going to 157.240.240.35 or 157.240.240.36.
- Finally, we ensure an OUTPUT chain exists in the system’s filter table which allows all packets through. Our Facebook packet should not reach this point. Note that we switch to using the ip protocol family here because that’s where NixOS creates the filter table. In a real system, we would add another chain like this for ip6.
# nft -f - <<EOF
add table inet fb-trace;
add chain inet fb-trace OUTPUT;
flush chain inet fb-trace OUTPUT;
delete chain inet fb-trace OUTPUT;
table inet fb-trace {
chain OUTPUT {
type filter hook output priority -10; policy accept;
ip daddr {157.240.240.0/24, 8.8.8.8} nftrace set 1;
}
}
add table inet fb-block;
add chain inet fb-block OUTPUT;
flush chain inet fb-block OUTPUT;
delete chain inet fb-block OUTPUT;
table inet fb-block {
chain OUTPUT {
type filter hook output priority -5; policy accept;
ip daddr {157.240.240.35, 157.240.240.36} counter drop;
}
}
table ip filter {
chain OUTPUT {
type filter hook output priority 0; policy accept;
}
}
EOF
We start nft monitor trace in one terminal, and try out a couple of pings in another:
root@nixos:// > ping -W5 -c1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=255 time=31.2 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 31.195/31.195/31.195/0.000 ms
root@nixos:// > ping -W5 -c1 157.240.240.35
PING 157.240.240.35 (157.240.240.35) 56(84) bytes of data.
--- 157.240.240.35 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
As expected, pinging 8.8.8.8 works, but pinging 157.240.240.35 times out. Checking the tracing terminal, we see:
root@nixos:// > nft monitor trace
trace id a4ac8c93 inet fb-trace OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 8.8.8.8 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 19836 ip protocol icmp ip length 84 icmp type echo-request icmp code net-unreachable icmp id 8 icmp sequence 1 @th,64,96 0x2960cf6100000000ef1e0d00
trace id a4ac8c93 inet fb-trace OUTPUT rule ip daddr { 8.8.8.8, 157.240.240.0/24 } meta nftrace set 1 (verdict continue)
trace id a4ac8c93 inet fb-trace OUTPUT verdict continue
trace id a4ac8c93 inet fb-trace OUTPUT policy accept
trace id a4ac8c93 inet fb-block OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 8.8.8.8 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 19836 ip protocol icmp ip length 84 icmp type echo-request icmp code net-unreachable icmp id 8 icmp sequence 1 @th,64,96 0x2960cf6100000000ef1e0d00
trace id a4ac8c93 inet fb-block OUTPUT verdict continue
trace id a4ac8c93 inet fb-block OUTPUT policy accept
trace id a4ac8c93 ip filter OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 8.8.8.8 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 19836 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 8 icmp sequence 1 @th,64,96 0x2960cf6100000000ef1e0d00
trace id a4ac8c93 ip filter OUTPUT verdict continue
trace id a4ac8c93 ip filter OUTPUT policy accept
trace id 4683d399 inet fb-trace OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 157.240.240.35 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 48251 ip protocol icmp ip length 84 icmp type echo-request icmp code net-unreachable icmp id 9 icmp sequence 1 @th,64,96 0x3160cf61000000009bd00200
trace id 4683d399 inet fb-trace OUTPUT rule ip daddr { 8.8.8.8, 157.240.240.0/24 } meta nftrace set 1 (verdict continue)
trace id 4683d399 inet fb-trace OUTPUT verdict continue
trace id 4683d399 inet fb-trace OUTPUT policy accept
trace id 4683d399 inet fb-block OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 157.240.240.35 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 48251 ip protocol icmp ip length 84 icmp type echo-request icmp code net-unreachable icmp id 9 icmp sequence 1 @th,64,96 0x3160cf61000000009bd00200
trace id 4683d399 inet fb-block OUTPUT rule ip daddr { 157.240.240.35, 157.240.240.36 } counter packets 1 bytes 84 drop (verdict drop)
The 8.8.8.8 packet hit the fb-trace, fb-block, and filter tables. It had nftrace attached to it in the first table, and then fell through the accept policy of each table. On the other hand, the 157.240.240.35 packet went through fb-trace, and then got dropped in fb-block. As expected, it never made it to filter.
Once we’re satisfied with our firewall rules, we can just delete our fb-trace table to clean up:
# nft delete table inet fb-trace
Good: Maps and sets
Another new feature is the addition of sets and maps. To copy the example from regit’s blog, these let you avoid having to copy-paste the same rule multiple times with just one field changed:
# ip6tables -A INPUT -p tcp -m multiport --dports 23,80,443 -j ACCEPT
# ip6tables -A INPUT -p icmpv6 --icmpv6-type neighbor-solicitation -j ACCEPT
# ip6tables -A INPUT -p icmpv6 --icmpv6-type echo-request -j ACCEPT
# ip6tables -A INPUT -p icmpv6 --icmpv6-type router-advertisement -j ACCEPT
# ip6tables -A INPUT -p icmpv6 --icmpv6-type neighbor-advertisement -j ACCEPT
iptables
# nft add rule ip6 filter input tcp dport {telnet, http, https} accept
# nft add rule ip6 filter input icmpv6 type { nd-neighbor-solicit, echo-request, nd-router-advert, nd-neighbor-advert } accept
nft sets
The latter should also be faster because checking for set membership is more efficient than sequential rule evaluation.
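The sets above are anonymous, written inline in the rule. nftables also supports named sets, which rules reference with @, and which can be updated at runtime without touching the rules themselves. This sketch assumes an ip6 filter table with an input chain already exists, as in the example above; the allowed_ports set name is made up:

```shell
# Declare a named set of service ports, then populate it.
nft add set ip6 filter allowed_ports '{ type inet_service; }'
nft add element ip6 filter allowed_ports '{ 80, 443 }'
# The rule references the set; adding elements later changes what it matches.
nft add rule ip6 filter input tcp dport @allowed_ports accept
```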
Another cool thing you can do is port knocking at the firewall level.
Bad: Legacy tools
We’ve seen that nft introduces several improvements over iptables, but there’s a catch: trying to use the new features to their fullest is awkward because it conflicts with existing tooling.
Specifically, the nft -f command lets us atomically configure our firewall, which is a marked improvement over having a script that incrementally adds rules and can fail partway through, leaving the firewall in an undefined state. However, many existing tools attempt to change the firewall at runtime. For instance, the usual way people set up WireGuard VPNs is by having the wg-quick tool add rules to iptables after the network interface is created. As another example, NixOS configures NAT by appending to the long script of firewall setup commands.
So, you can write all of your firewall configuration in one place, and ensure that either it’s all applied, or none of it is. But then some other program is going to come in, and make other changes to the firewall in the live system.
This will get fixed eventually, but at the moment, we can’t have fully declarative configurations for our firewall. The best we can aim for right now is making chunks of the configuration declarative.
Bad: Multiple chains on the same hook
Another issue is that, due to the way the accept and drop policies are interpreted, it’s very hard to have both rules spread across multiple tables and a base policy of drop. This is a bit hard to explain without an example, so let’s consider how we’d restrict output from a box.
We’d like our system to only allow outgoing connections to certain addresses (e.g. 8.8.8.8). To achieve this, we’ll try connecting chains to the output hook in three tables:
- In the test-trace table, we enable nftrace so that we can later debug what’s going on.
- In the allow-dns table, we specify that we want to allow packets to 8.8.8.8.
- In the filter table, we define the base case that no outgoing packets are allowed (except for already established connections).
# nft -f - <<EOF
add table inet test-trace;
add chain inet test-trace OUTPUT;
flush chain inet test-trace OUTPUT;
delete chain inet test-trace OUTPUT;
table inet test-trace {
chain OUTPUT {
type filter hook output priority -10; policy accept;
ip daddr 8.8.8.8 nftrace set 1;
}
}
add table inet allow-dns;
add chain inet allow-dns OUTPUT;
flush chain inet allow-dns OUTPUT;
delete chain inet allow-dns OUTPUT;
table inet allow-dns {
chain OUTPUT {
type filter hook output priority -5; policy drop;
ip daddr 8.8.8.8 accept;
}
}
table ip filter {
chain OUTPUT {
type filter hook output priority 0; policy drop;
ct state related,established accept
}
}
EOF
Allowing output only to 8.8.8.8. This does NOT work.
We want separate tables like this to make the rules easier to manage. In an ideal world where the entire firewall configuration is generated in one place, this isn’t necessary, but in the current world, where the filter table is going to be full of rules added by random programs and modules, keeping our configuration in separate tables is helpful.
This setup doesn’t work. We see pings timing out:
root@nixos:// > ping -c1 -W5 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
root@nixos:// > nft monitor trace
trace id 5c12872a inet test-trace OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 8.8.8.8 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 15754 ip protocol icmp ip length 84 icmp type echo-request icmp code net-unreachable icmp id 13 icmp sequence 1 @th,64,96 0x3f6dcf6100000000cf150f00
trace id 5c12872a inet test-trace OUTPUT rule ip daddr { 8.8.8.8, 157.240.240.0/24 } meta nftrace set 1 (verdict continue)
trace id 5c12872a inet test-trace OUTPUT verdict continue
trace id 5c12872a inet test-trace OUTPUT policy accept
trace id 5c12872a inet allow-dns OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 8.8.8.8 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 15754 ip protocol icmp ip length 84 icmp type echo-request icmp code net-unreachable icmp id 13 icmp sequence 1 @th,64,96 0x3f6dcf6100000000cf150f00
trace id 5c12872a inet allow-dns OUTPUT rule ip daddr 8.8.8.8 accept (verdict accept)
trace id 5c12872a ip filter OUTPUT packet: oif "eth0" ip saddr 10.0.2.15 ip daddr 8.8.8.8 ip dscp cs0 ip ecn not-ect ip ttl 64 ip id 15754 ip length 84 icmp type echo-request icmp code net-unreachable icmp id 13 icmp sequence 1 @th,64,96 0x3f6dcf6100000000cf150f00
trace id 5c12872a ip filter OUTPUT verdict continue
trace id 5c12872a ip filter OUTPUT policy drop
Looking at the trace, it’s clear what’s going on: the packet goes through the test-trace table without issue, then gets accepted in the allow-dns table, and then gets dropped by the policy in the filter table. This is because accept doesn’t end evaluation; it just passes the packet on to the next chain attached to the hook. This is different from drop, which does end evaluation. Practically, this means that, if our base policy is drop, then we must have all the accept rules in the same chain. This emergent restriction makes tables less useful, and the configs noisier.
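Concretely, the only reliable fix seems to be collapsing the allow rules into the same chain that carries the drop policy, giving up on the separate allow-dns table. A single-table sketch of the example:

```shell
# The accept rules live in the same chain as the drop policy, so an
# accepted packet never falls through to a later drop-policy chain.
nft -f - <<EOF
table ip filter {
  chain OUTPUT {
    type filter hook output priority 0; policy drop;
    ct state related,established accept
    ip daddr 8.8.8.8 accept
  }
}
EOF
```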
Conclusion
All in all, nft is a big improvement over iptables, especially in terms of usability. I had noticed all the hubbub about nft in the summer of 2021, and now that I’ve had a chance to see for myself, I’m completely on board this hype train. I still don’t see where the pixelated apes come in, but that might be in some info manual I haven’t read.