A thing I’ve been missing is the ability to easily share files with insecure devices. I’ve tried a bunch of solutions over the years, including Syncthing and Nextcloud, but they were neither nice to use nor nice to run. Recently, a friend found a much better trade-off in terms of setup complexity and ease-of-use. So, let’s build such a file-sharing service with rclone, Nginx, and Kubernetes.
The full code for this post is available here.
The problem
Imagine that we’re travelling and we need to print a document from an email. The printshop’s computer has Internet access, but we really don’t want to log in to our email from there. Instead, we want to download the document to our phones, put it somewhere, then download it onto the public computer.
What we need is some sort of private server to which we can upload files very easily from our phones, tablets, or laptops. We also need this to work over restricted Internet connections in case we have to use hotel WiFi, which often allows little more than web traffic. In other words, we have to use HTTP.
Natively, HTTP doesn’t support file upload and editing—we need something more than a plain webserver to implement the file handling logic. We don’t want this server to be too complicated to set up and run—something like Nextcloud is very featureful, but far too heavy. Luckily, there’s an extension to plain HTTP called WebDAV that does what we want. Unluckily, there isn’t great support for it on the server side. For example, the Nginx DAV module works at a superficial level, but behaves erratically if you pay close attention.
The server I’ve found to work best is rclone. We basically run rclone serve webdav on a directory and it just works. The problem is that it doesn’t have support for multiple endpoints with different permissions, so we still need Nginx for this.
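To get a feel for what rclone gives us, here’s a minimal local sketch (the directory, port, and file name are just examples): serve a scratch directory over WebDAV, then upload and fetch a file with curl.

rclone serve webdav --addr 127.0.0.1:8081 /tmp/webdav-demo &
curl -T document.pdf http://127.0.0.1:8081/document.pdf   # WebDAV upload is a plain HTTP PUT
curl -o copy.pdf http://127.0.0.1:8081/document.pdf       # download it back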
Now that we know roughly what we want, let’s formalize it.
Architecture
Practically, we have a /data directory on the server that we want to provide authenticated read/write access to over WebDAV. We also need to provide unauthenticated public access to /data/pub. We’ll run rclone for the private files and serve only on localhost. We’ll then make Nginx our entry point, have it serve the public files itself, and have it authenticate and proxy access to the rclone server.
We’ll put both rclone and Nginx in a single container and run this as a Kubernetes pod. The /data volume will be a Persistent Volume not tied to any particular machine. This will let our pod migrate to other nodes during maintenance and minimize service interruption. We’ll also let Kubernetes terminate HTTPS connections since it’s good at that and saves us from having to bundle certificate management in our container.
Let’s build this inside-out, starting with the container.
Building the container with Nix
We’re going to use Nix to build the container since it keeps all of our configuration in one place and ensures repeatable builds.
The full flake for this is available here. The core of it is the container package. It just creates a few directories that Nginx needs at runtime, includes the fakeNss package so that Nginx doesn’t complain about missing users, and then calls the start script. Nix sees our dependencies, so we don’t need to include nginx or rclone or anything else manually.
packages.container = pkgs.dockerTools.buildLayeredImage {
  name = "rclone-webdav";
  tag = "flake";
  created = "now";
  contents = [
    pkgs.fakeNss
  ];
  extraCommands = ''
    # Nginx needs these dirs
    mkdir -p srv/client-temp
    mkdir -p var/log/nginx/
    mkdir -p tmp/nginx_client_body
  '';
  config = {
    ExposedPorts = { "80/tcp" = { }; };
    Entrypoint = [ "${startScript}/bin/startScript" ];
    Cmd = [ ];
  };
};
flake.nix
The start script starts rclone and nginx as two Bash jobs, then waits for rclone to terminate. We could’ve done something more complicated with a dedicated “pid 1” process, but we wouldn’t gain much and it would complicate our container.
startScript = pkgs.writeShellScriptBin "startScript" ''
  set -m
  ${pkgs.rclone}/bin/rclone serve webdav --log-level INFO --baseurl priv --addr 127.0.0.1:8081 /data/ &
  ${pkgs.nginx}/bin/nginx -c ${nginxConfig} &
  fg %1
'';
flake.nix
Next, we need the config file for Nginx. We set up the /pub endpoint to serve files from /data/pub. We set up the /priv endpoint as a proxy to the rclone process after authenticating. A slightly weird thing about this setup is that we access /data/pub/file1 through the /pub/file1 URL, but we write to it through the /priv/pub/file1 URL.
A very important setting is client_max_body_size 200M. Without it, we wouldn’t be able to upload files larger than Nginx’s default limit of 1 MB. There is a corresponding setting needed in the Ingress config later.
nginxConfig = pkgs.writeText "nginx.conf" ''
  daemon off;
  user nobody nobody;
  error_log /dev/stdout info;
  pid /dev/null;
  events {}
  http {
    server {
      listen 80;
      client_max_body_size 200M;
      location /pub {
        root /data;
        autoindex on;
      }
      location /priv {
        proxy_pass "http://127.0.0.1:8081";
        auth_basic "Files";
        auth_basic_user_file ${authConfig};
      }
      location = / {
        return 301 /pub/;
      }
      location / {
        deny all;
      }
    }
  }
'';
flake.nix
Finally, we need the authentication config for Nginx. This is where things get tricky. We need to include the username and password for /priv/ access in the container somewhere, but we don’t want them to appear in the repo—that’d just be bad security practice (and I certainly can’t include my password in the repo because I’m making it public). The standard solution is to pull these in as environment variables from the CI or build host, but we can’t do that here because Nix flakes are hermetic. That is, they cannot use anything at all from outside the repo. So, we cheat. We commit a users.env file with dummy environment variables to the repo and we overwrite it in the CI with the right values.
authConfig = pkgs.runCommand "auth.htpasswd" { } ''
  source ${./users.env}
  ${pkgs.apacheHttpd}/bin/htpasswd -nb "$WEB_USER" "$WEB_PASSWORD" > $out
'';
flake.nix
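For reference, the committed users.env is just a pair of shell variable assignments with placeholder values, something like this (the values below are made up):

WEB_USER=user
WEB_PASSWORD=changeme
users.env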
The .gitlab-ci.yml is essentially unchanged from my previous post. The only new addition is the echo "$PROD_CREDENTIALS" > users.env line, which dumps the username and password from a Gitlab variable into the users.env file for the build to use.
build-container:
  image: "nixos/nix:2.12.0"
  stage: build
  needs: []
  only:
    - main
    - tags
  variables:
    CACHIX_CACHE_NAME: scvalex-rclone-webdav
    IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
  before_script:
    - nix-env --install --attr nixpkgs.cachix
    - nix-env --install --attr nixpkgs.skopeo
    - cachix use "$CACHIX_CACHE_NAME"
  script:
    - mkdir -p "$HOME/.config/nix"
    - echo 'experimental-features = nix-command flakes' > "$HOME/.config/nix/nix.conf"
    - mkdir -p "/etc/containers/"
    - echo '{"default":[{"type":"insecureAcceptAnything"}]}' > /etc/containers/policy.json
    # 👇 👇 👇 NEW LINE 👇 👇 👇
    - echo "$PROD_CREDENTIALS" > users.env
    - cachix watch-exec "$CACHIX_CACHE_NAME" nix build .#container
    - skopeo login --username "$CI_REGISTRY_USER" --password "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - ls -lh ./result
    - 'skopeo inspect docker-archive://$(readlink -f ./result)'
    - 'skopeo copy docker-archive://$(readlink -f ./result) docker://$IMAGE_TAG'
With all this in place, we can build the container locally with nix build .#container and have Gitlab build it for us in CI. See the Justfile for other useful local recipes.
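For a quick local smoke test, one option (assuming Docker is available on the machine) is to load the freshly built image and run it with a bind-mounted data directory; the port mapping and directory below are arbitrary.

nix build .#container
docker load < ./result                 # loads rclone-webdav:flake
mkdir -p data/pub
docker run --rm -p 8080:80 -v "$PWD/data:/data" rclone-webdav:flake
curl http://127.0.0.1:8080/pub/        # should return the autoindex listing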
Now, let’s operationalize our container and run it in Kubernetes.
Deploying on Kubernetes
The whole config is available here. It uses kubectl’s built-in Kustomize to force everything into a dedicated namespace.
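Applying the manifests is the usual Kustomize invocation (the directory name below is just an example; rclone-webdav is the namespace everything gets forced into):

kubectl apply -k ./manifests/
kubectl -n rclone-webdav get statefulset,pods,svc,pvc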
The setup is fairly simple: the container is run in a single pod by a StatefulSet, it is fronted by a Service, and it is backed by a Longhorn PersistentVolume. It’s important to use a StatefulSet instead of a Deployment here to make sure the volume is properly detached before the pod is restarted or replaced.
apiVersion: v1
kind: Service
metadata:
  name: rclone-webdav
spec:
  selector:
    app: rclone-webdav
  type: NodePort
  ports:
    - port: 80
      protocol: TCP
      name: http
      targetPort: http
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rclone-webdav
  labels:
    app: rclone-webdav
spec:
  replicas: 1
  serviceName: rclone-webdav
  selector:
    matchLabels:
      app: rclone-webdav
  template:
    metadata:
      labels:
        app: rclone-webdav
    spec:
      containers:
        - name: rclone-webdav
          image: registry.gitlab.com/abstract-binary/rclone-webdav-container:main
          imagePullPolicy: Always
          ports:
            - containerPort: 80
              name: http
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  storageClassName: longhorn
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 2G
Finally, we expose our service to the outside world with an Ingress. In my setup, I use ingress-nginx as the actual ingress implementation and cert-manager to manage TLS certificates.
Importantly, we have to configure the maximum upload size again with the nginx.ingress.kubernetes.io/proxy-body-size: 200m annotation. Without this, file uploads would be limited to 1 MB.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rclone-webdav
  namespace: rclone-webdav
  annotations:
    cert-manager.io/cluster-issuer: 'letsencrypt'
    nginx.ingress.kubernetes.io/proxy-body-size: 200m
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - files.abstractbinary.org
      secretName: files-certs
  rules:
    - host: files.abstractbinary.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rclone-webdav
                port:
                  number: 80
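Once DNS and the certificate are sorted out, we can exercise both endpoints with curl. A sketch, using a hypothetical holiday.pdf and the credentials that were baked in from users.env:

curl -u "$WEB_USER:$WEB_PASSWORD" -T holiday.pdf https://files.abstractbinary.org/priv/pub/holiday.pdf
curl -O https://files.abstractbinary.org/pub/holiday.pdf

Note how the upload goes through /priv/pub/ but the public download comes from /pub/, matching the path quirk described earlier.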
Conclusion
…And that’s all there is to it. Like with most container and Kubernetes things, there’s seemingly a lot of code, but most of it is boilerplate or just common patterns.