A thing I’ve been missing is the ability to easily share files with insecure devices. I’ve tried a bunch of solutions over the years, including Syncthing and Nextcloud, but they were neither nice to use nor nice to run. Recently, a friend found a much better trade-off in terms of setup complexity and ease-of-use. So, let’s build such a file-sharing service with rclone, Nginx, and Kubernetes.

The full code for this post is available here.

The problem

Imagine that we’re travelling and we need to print a document from email. The print shop’s computer has Internet access, but we really don’t want to log in to our email from there. Instead, we want to download the document to our phones, put it somewhere, then download it onto the public computer.

What we need is some sort of private server to which we can upload files very easily from our phones, tablets, or laptops. We also need this to work over restricted Internet connections in case we have to use hotel WiFi. In other words, we have to use HTTP.

Natively, HTTP doesn’t support file upload and editing—we need something more than a plain webserver to implement the file handling logic. We don’t want this server to be too complicated to set up and run—something like Nextcloud is very featureful, but far too heavy. Luckily, there’s an extension to plain HTTP called WebDAV that does what we want. Unluckily, there isn’t great support for it on the server side. For example, the Nginx DAV module works at a superficial level, but behaves erratically if you pay close attention.

The server I’ve found to work best is rclone. We basically run rclone serve webdav on a directory and it just works. The problem is that it doesn’t support multiple endpoints with different permissions, so we still need Nginx for this.
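To get a feel for how little setup that is, here’s roughly what a local test run looks like (a sketch, assuming rclone is installed and /data is the directory you want to share; the flags are the same ones we’ll use in the container later):

# Serve /data over WebDAV, read/write, on localhost only
rclone serve webdav --log-level INFO --addr 127.0.0.1:8081 /data/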

Now that we know roughly what we want, let’s formalize it.

Architecture

Practically, we have a /data directory on the server that we want to provide authenticated read/write access to over WebDAV. We also need to provide unauthenticated public access to /data/pub. We’ll run rclone for the private files and serve only on localhost. We’ll then make Nginx our entry point, have it serve the public files itself, and have it authenticate and proxy access to the rclone server.

We’ll put both rclone and Nginx in a single container and run this as a Kubernetes pod. The /data volume will be a Persistent Volume not tied to any particular machine. This will let our pod migrate to other nodes during maintenance and minimize service interruption. We’ll also let Kubernetes terminate HTTPS connections since it’s good at that and saves us from having to bundle certificate management in our container.

Architecture of the whole setup

Let’s build this inside-out, starting with the container.

Building the container with Nix

We’re going to use Nix to build the container since it keeps all of our configuration in one place and ensures repeatable builds.

The full flake for this is available here. The core of it is the container package. It just creates a few directories that Nginx needs at runtime, includes the fakeNss package so that Nginx doesn’t complain about missing users, and then calls the start script. Because the start script references rclone and nginx by their Nix store paths, Nix pulls them into the image automatically, so we don’t need to list them manually.

packages.container = pkgs.dockerTools.buildLayeredImage {
  name = "rclone-webdav";
  tag = "flake";
  created = "now";
  contents = [
    pkgs.fakeNss
  ];
  extraCommands = ''
    # Nginx needs these dirs
    mkdir -p srv/client-temp
    mkdir -p var/log/nginx/
    mkdir -p tmp/nginx_client_body
  '';
  config = {
    ExposedPorts = { "80/tcp" = { }; };
    Entrypoint = [ "${startScript}/bin/startScript" ];
    Cmd = [ ];
  };
};
Container spec from flake.nix

The start script starts rclone and nginx as two Bash jobs, then waits for rclone to terminate. We could’ve done something more complicated with a dedicated “PID 1” init process, but we wouldn’t gain much and it would complicate our container.

startScript = pkgs.writeShellScriptBin "startScript" ''
  set -m
  ${pkgs.rclone}/bin/rclone serve webdav --log-level INFO --baseurl priv --addr 127.0.0.1:8081 /data/ &
  ${pkgs.nginx}/bin/nginx -c ${nginxConfig} &
  fg %1
'';
Container start script from flake.nix

Next, we need the config file for Nginx. We set up the /pub endpoint to serve files from /data/pub, and the /priv endpoint as a proxy to the rclone process after authenticating. A slightly weird consequence of this setup is that we read /data/pub/file1 through the /pub/file1 URL, but we write to it through the /priv/pub/file1 URL (the curl example after the config shows this in action).

A very important setting is client_max_body_size 200M. Without it, Nginx’s default limit would stop us from uploading files larger than 1 MB. There’s a corresponding setting we’ll need in the Ingress config later.

nginxConfig = pkgs.writeText "nginx.conf" ''
  daemon off;
  user nobody nobody;
  error_log /dev/stdout info;
  pid /dev/null;
  events {}
  http {
    server {
      listen 80;
      client_max_body_size 200M;
      location /pub {
        root /data;
        autoindex on;
      }
      location /priv {
        proxy_pass "http://127.0.0.1:8081";
        auth_basic "Files";
        auth_basic_user_file ${authConfig};
      }
      location = / {
        return 301 /pub/;
      }
      location / {
        deny all;
      }
    }
  }
'';
Nginx config from flake.nix
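To make the read/write asymmetry concrete, here’s roughly what it looks like from a client once the container is running; assume it’s reachable on localhost port 80, and that user, pass, and document.pdf are placeholders:

# Upload through the authenticated WebDAV endpoint (lands in /data/pub)
curl -u user:pass -T document.pdf http://localhost/priv/pub/document.pdf
# Anyone can then fetch it from the public endpoint, no credentials needed
curl -O http://localhost/pub/document.pdf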

Finally, we need the authentication config for Nginx. This is where things get tricky. We need to include the username and password for /priv/ access in the container somewhere, but we don’t want them to appear in the repo—that’d just be bad security practice (and I certainly can’t include my password in the repo because I’m making it public). The standard solution is to pull these in as environment variables from the CI or build host, but we can’t do that here because Nix flakes are hermetic. That is, they cannot use anything at all from outside the repo. So, we cheat. We commit a users.env file with dummy environment variables to the repo and we overwrite it in the CI with the right values.
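The committed file only needs to define the two variables that the htpasswd step below reads, with throwaway values:

# users.env: dummy credentials, overwritten by the CI before the real build
WEB_USER=user
WEB_PASSWORD=changeme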

authConfig = pkgs.runCommand "auth.htpasswd" { } ''
  source ${./users.env}
  ${pkgs.apacheHttpd}/bin/htpasswd -nb "$WEB_USER" "$WEB_PASSWORD" > $out
'';
Authentication config from flake.nix

The .gitlab-ci.yml is essentially unchanged from my previous post. The only new addition is the echo "$PROD_CREDENTIALS" > users.env line, which dumps the username and password from a GitLab CI variable into the users.env file for the build to use.

build-container:
  image: "nixos/nix:2.12.0"
  stage: build
  needs: []
  only:
    - main
    - tags
  variables:
    CACHIX_CACHE_NAME: scvalex-rclone-webdav
    IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
  before_script:
    - nix-env --install --attr nixpkgs.cachix
    - nix-env --install --attr nixpkgs.skopeo
    - cachix use "$CACHIX_CACHE_NAME"
  script:
    - mkdir -p "$HOME/.config/nix"
    - echo 'experimental-features = nix-command flakes' > "$HOME/.config/nix/nix.conf"
    - mkdir -p "/etc/containers/"
    - echo '{"default":[{"type":"insecureAcceptAnything"}]}' > /etc/containers/policy.json
    # 👇 👇 👇 NEW LINE 👇 👇 👇
    - echo "$PROD_CREDENTIALS" > users.env
    - cachix watch-exec "$CACHIX_CACHE_NAME" nix build .#container
    - skopeo login --username "$CI_REGISTRY_USER" --password "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - ls -lh ./result
    - 'skopeo inspect docker-archive://$(readlink -f ./result)'
    - 'skopeo copy docker-archive://$(readlink -f ./result) docker://$IMAGE_TAG'

With all this in place, we can build the container locally with nix build .#container and have GitLab build it for us in CI. See the Justfile for other useful local recipes.
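If you want to poke at the image before pushing it anywhere, the nix build result is a docker-archive tarball that Docker can load directly. A quick local smoke test might look like this (the rclone-webdav:flake tag comes from the buildLayeredImage arguments above; the bind-mounted directory and host port are just examples):

nix build .#container
docker load < ./result
docker run --rm -p 8080:80 -v "$PWD/test-data:/data" rclone-webdav:flake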

Now, let’s operationalize our container and run it in Kubernetes.

Deploying on Kubernetes

The whole config is available here. It uses kubectl’s built-in Kustomize to force everything into a dedicated namespace.
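As a rough sketch (the file names here are illustrative, not the actual repo layout), the kustomization.yaml boils down to setting the namespace and listing the manifests:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: rclone-webdav
resources:
  - namespace.yaml
  - statefulset.yaml
  - ingress.yaml

Applying it with kubectl apply -k . then puts everything into the rclone-webdav namespace.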

The setup is fairly simple: the container runs in a single pod managed by a StatefulSet, it is fronted by a Service, and it is backed by a Longhorn PersistentVolume. It’s important to use a StatefulSet instead of a Deployment here: a StatefulSet only creates the replacement pod after the old one is gone, so the ReadWriteOnce volume gets properly detached before the new pod tries to mount it.

apiVersion: v1
kind: Service
metadata:
  name: rclone-webdav
spec:
  selector:
    app: rclone-webdav
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
    name: http
    targetPort: http
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rclone-webdav
  labels:
    app: rclone-webdav
spec:
  replicas: 1
  serviceName: rclone-webdav
  selector:
    matchLabels:
      app: rclone-webdav
  template:
    metadata:
      labels:
        app: rclone-webdav
    spec:
      containers:
      - name: rclone-webdav
        image: registry.gitlab.com/abstract-binary/rclone-webdav-container:main
        imagePullPolicy: Always
        ports:
        - containerPort: 80
          name: http
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  storageClassName: longhorn
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 2G

Finally, we expose our service to the outside world with an Ingress. In my setup, I use ingress-nginx as the actual ingress implementation and cert-manager to manage TLS certificates.

Importantly, we have to configure the maximum upload size again with the nginx.ingress.kubernetes.io/proxy-body-size: 200m annotation. Without this, file uploads would be limited to 1 MB.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rclone-webdav
  namespace: rclone-webdav
  annotations:
    cert-manager.io/cluster-issuer: 'letsencrypt'
    nginx.ingress.kubernetes.io/proxy-body-size: 200m
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - files.abstractbinary.org
    secretName: files-certs
  rules:
  - host: files.abstractbinary.org
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: rclone-webdav
            port:
              number: 80

Conclusion

…And that’s all there is to it. As with most container and Kubernetes things, there’s seemingly a lot of code, but most of it is boilerplate or common patterns.