Let’s write an app that watches some files for changes, runs them through the Liquid templating engine (the Rust liquid crate), and then compiles the output with LaTeX. It’ll take about 200 lines of code. This scheme is very useful when you need to produce some kind of document from data available to a Rust program, and when you expect to iterate on the templates a lot.

The full code for this post is available here.

The goal

We’re going to write a little app that generates party invitations. The example is a bit contrived, but demonstrates the basic scheme:

  • The user writes the app, and runs it with cargo run -- --watch (an example invocation follows this list). This app is where the data comes from. Instead of trying to write some general app that munges arbitrary data, the user is supposed to adapt the code to their own needs. In our case, it reads a CSV file, but it could just as easily query a database or scrape the web. Since gathering the data is done only once per run, it’s fine if it’s slow.

  • The app watches the templates/ directory for changes. On every change, the files in templates/ are run through the Liquid templating engine. This might sound slow, but it’s actually fast enough on small amounts of data. Mind you, “small” on a 2018 mid-range Thinkpad is still larger than most use cases. For instance, it takes my laptop under a second to prepare the source for a 400-page PDF, and two seconds to process the 400 files that make up this website, although that also involves copying files, translating markdown, and shelling out to external commands.

  • The output is saved to the out/ directory, and xelatex is run on one of the files to generate the final PDF.
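
Concretely, a run of the finished app looks something like this. The file name and guest data are made up, but the CSV columns match the Guest struct we’ll define later (name and address):

$ cat guests.csv
name,address
Ada Lovelace,12 Example Street
Charles Babbage,34 Sample Road
$ cargo run -- --watch --guest-list guests.csv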

Screenshots of a terminal running the app, an editor changing the templates, and a PDF viewer showing the output

💭 I mention speed here because generating this website used to take 30 seconds with a Node.js app. Rust cut the time down to 2 seconds. To be fair, the Node app was more complicated, used React to compose the site, and Astro to partially hydrate and generate static pages, but that’s par for the course in that ecosystem, and the output of the Node.js and Rust apps was almost byte-for-byte identical.

💭 The example in this post is for a PDF document generator, but the scheme can easily be adapted to make other things. As mentioned above, this website is generated like this, the differences being that it also translates markdown to HTML before applying templates, and that it calls html-tidy on the output instead of xelatex.

Hashing files

Let’s build our app from the bottom up. We need to check if file contents have changed, so let’s write some code to compute hashes for a given list of files.

fn hash_files(files: Vec<PathBuf>) -> OrError<HashMap<PathBuf, u64>> {
    let mut hashes = HashMap::new();
    for f in files {
        let hash = hash_file(&f)?;
        hashes.insert(f, hash);
    }
    Ok(hashes)
}

fn hash_file<P: AsRef<Path>>(path: P) -> OrError<u64> {
    use seahash::SeaHasher;
    use std::{hash::Hasher, io::Read};
    let mut file = File::open(path.as_ref())?;
    let mut buf = [0; 4096];
    let mut hasher = SeaHasher::new();
    loop {
        match file.read(&mut buf)? {
            0 => return Ok(hasher.finish()),
            n => hasher.write(&buf[..n]),
        }
    }
}

The code is self-explanatory. The only interesting bit is the choice of hash function: we use seahash, which describes itself as “blazingly fast”. The main concern here is speed, because we rehash all the files on every change. Even if we get a hash collision and some change goes undetected (it has never happened to me in years of using seahash, but it could), that would be fine for our use case: the user would just notice the missing update and make some other change to the input files. So we don’t need a strong cryptographic hash; we just need a very fast one.

Scanning directories

Next, we need to get the list of files to hash. We want to scan the templates/ directory and its subdirectories, and also ignore temporary files. Instead of trying to write this logic from scratch, we use ignore:

fn scan_dir(dir: &str) -> OrError<Vec<PathBuf>> {
    info!("Scanning {dir}");
    use ignore::Walk;
    let mut res = vec![];
    for file in Walk::new(dir) {
        let file = file?;
        if !file.metadata()?.is_file() {
            continue;
        }
        res.push(file.into_path());
    }
    Ok(res)
}

As its name implies, ignore respects .gitignore and other similar files, which saves us from having to come up with our own scheme for ignoring files.
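
For example, if your editor litters templates/ with swap or backup files, a small ignore file keeps them out of the scan. The patterns below are just an illustration; any .gitignore-style rules work, and the crate also honours the repository’s .gitignore:

# templates/.ignore (illustrative)
*.swp
*~
.#*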

Watching for changes

Next, we need to watch templates/ for changes. We run notify-debouncer-mini on a separate thread. Whenever the debouncer notifies us of a change via its mpsc::channel, we rescan the directory, recompute all the file hashes, and, if any of them changed, send the entire list of files to the output channel.

fn run_watcher(dir: &str) -> OrError<mpsc::Receiver<Vec<PathBuf>>> {
    use notify::RecursiveMode;
    use std::time::Duration;
    let (notify_tx, notify_rx) = mpsc::channel();
    let (watcher_tx, watcher_rx) = mpsc::channel();
    let watcher_loop = {
        let dir = dir.to_string();
        move || -> OrError<()> {
            let mut debouncer =
                notify_debouncer_mini::new_debouncer(Duration::from_millis(250), None, notify_tx)?;
            let mut hashes = hash_files(scan_dir(&dir)?)?;
            debouncer
                .watcher()
                .watch(Path::new(&dir), RecursiveMode::Recursive)?;
            loop {
                match notify_rx.recv()? {
                    Err(errs) => error!("Notify errors: {errs:?}"),
                    Ok(events) if events.is_empty() => {}
                    Ok(_) => {
                        let paths = scan_dir(&dir)?;
                        let new_hashes = hash_files(paths.clone())?;
                        if hashes != new_hashes {
                            hashes = new_hashes;
                            watcher_tx.send(paths)?;
                        }
                    }
                }
            }
        }
    };
    std::thread::spawn(move || match watcher_loop() {
        Ok(()) => error!("Watcher loop ended without error"),
        Err(err) => error!("Watcher loop ended with error: {err}"),
    });
    Ok(watcher_rx)
}

Recomputing all the file hashes on every change seems like it would be slow, and it certainly is wasteful. However, this code is part of a human interaction loop, so it doesn’t need to be the fastest; it just needs to be fast enough for a human not to mind. On my laptop, it takes one second to scan and hash a directory of 9000 files of 32 KB each. That’s plenty fast. If we needed to, we could use the lower-level notify crate, react only to file modification events, and rehash only the modified files. That said, our efforts would be better spent on optimizing the template rendering from the next section, which is about 11x slower.
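
For the curious, here’s a rough sketch of that incremental version (not used in the app). It assumes we can pull the affected paths out of the debouncer’s events, and it only touches the map entries for those paths:

// Sketch of the incremental alternative: given the paths reported by the
// watcher, rehash only those files and report whether anything actually
// changed.  Files that no longer exist are dropped from the map.
fn update_hashes(
    hashes: &mut HashMap<PathBuf, u64>,
    changed: impl IntoIterator<Item = PathBuf>,
) -> OrError<bool> {
    let mut any_change = false;
    for path in changed {
        if path.is_file() {
            let new_hash = hash_file(&path)?;
            if hashes.insert(path, new_hash) != Some(new_hash) {
                any_change = true;
            }
        } else if hashes.remove(&path).is_some() {
            any_change = true;
        }
    }
    Ok(any_change)
}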

Rendering templates

Next, we need to run the files through the Liquid templating engine. There’s a bit of pomp and ceremony because we have to handle “partial” templates separately. Then, we render the full templates in parallel, while also passing data to them.

// Render the given templates.  Files that end with ".liquid" are partial
// templates that can be `include`d in other templates.  All other
// files are run through the template engine, and rendered to
// `out_dir`.
fn render_templates(
    in_dir: &str,
    files: Vec<PathBuf>,
    out_dir: &str,
    data: &HashMap<String, liquid::model::Value>,
) -> OrError<()> {
    use liquid::{
        partials::{EagerCompiler, InMemorySource},
        ParserBuilder, ValueView,
    };
    use rayon::prelude::*;
    info!("Rendering templates: {files:?}");
    let mut templates = InMemorySource::new();
    for f in &files {
        let path = f.strip_prefix(in_dir)?.to_str().unwrap();
        if path.ends_with(".liquid") {
            templates.add(path.to_string(), std::fs::read_to_string(f)?);
        }
    }
    let parser = ParserBuilder::new()
        .stdlib()
        .partials(EagerCompiler::new(templates))
        .build()?;
    files
        .par_iter()
        .map(|f| {
            let mut globals: HashMap<String, &dyn ValueView> = HashMap::new();
            globals.insert("data".to_string(), data as &dyn ValueView);
            let path = f.strip_prefix(in_dir)?.to_str().unwrap();
            if !path.ends_with(".liquid") {
                let out_file = PathBuf::new().join(out_dir).join(path);
                let template = parser.parse(&std::fs::read_to_string(f)?)?;
                info!("Rendering to {}", out_file.display());
                template.render_to(&mut File::create(out_file)?, &globals)?;
            }
            Ok(())
        })
        .collect::<Vec<OrError<()>>>()
        .into_iter()
        .collect::<OrError<Vec<()>>>()?;
    info!("Done rendering templates");
    Ok(())
}

We chose Liquid because it’s good enough: the syntax is fairly clean, it’s used at a large company (Shopify), the Rust library is mature, and it’s extensible. I confess I haven’t put much thought into this choice, so there may well be better templating engines overall.

For instance, the main template for our party invitation example is this. Liquid lets us apply the invite.liquid partial once for each guest.

\documentclass{article}
\pagestyle{empty}
\usepackage{geometry,fontspec,tikz}
\geometry{a6paper,landscape,hmargin={1cm,1cm},vmargin={1cm,1cm}}
\setlength\parindent{0pt}
\begin{document}

\obeylines

{% for guest in data.guests %}
  {% include "invite.liquid" guest: guest %}
{% endfor %}

\end{document}

“Partial” templates in this context mean templates that can be included into other templates. These are our reusable components. The one ugly thing about them is that variables are dynamically scoped, so partial templates implicitly have access to all the variables in the templates that include them. Under templates/, partials have the .liquid extension. They are read and added to an InMemorySource in the Rust code, and that in turn is passed to the full template Parser.
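
For illustration, invite.liquid might look something like the following (the real partial is in the linked code). Thanks to the dynamic scoping, it can refer directly to the guest variable passed by the include in the main template:

{% comment %} An illustrative invite.liquid {% endcomment %}
Dear {{ guest.name }},

You are warmly invited to the party!

{{ guest.address }}
\newpage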

The parallelism in this function is neatly encapsulated in the call to Rayon’s par_iter(). That takes care of spawning as many threads as we have cores, and running the rendering code on them.

The data we want to pass to the templates must first be converted to the dynamic types of Liquid: Value and &dyn ValueView. There’s not much to say here other than that we’ll see some mapping and wrapping action later.

Rendering the templates is as easy as calling parse and render_to. For simplicity, we just re-parse and re-render all the templates on every file change. It takes my laptop 11 seconds to do the 9000 templates from the previous section, so the speed is about 800 templates/second. As with hashing, this could be optimized by only considering the templates that changed in each loop, but that’s not really necessary unless we have many thousands of files.

The main program

Finally, we tie all these functions together in the main program. We use clap for command line argument parsing, and csv to load the guest list. The code is straightforward, but it gets messy around the aforementioned conversions to Liquid’s dynamic types.

type OrError<T> = Result<T, anyhow::Error>;
const TEMPLATES_DIR: &str = "templates";
const OUT_DIR: &str = "out";

#[derive(Parser)]
struct Opts {
    #[clap(long)]
    watch: bool,

    #[clap(long)]
    guest_list: String,
}

#[derive(Deserialize, Serialize)]
struct Guest {
    name: String,
    address: String,
}

fn main() -> OrError<()> {
    setup_log()?;
    let Opts { watch, guest_list } = Opts::parse();
    let mut data: HashMap<String, Value> = HashMap::new();
    let guests: Vec<Value> = csv::Reader::from_reader(File::open(guest_list)?)
        .deserialize()
        .map(|r: Result<Guest, _>| Value::Object(liquid::to_object(&r.unwrap()).unwrap()))
        .collect();
    data.insert("guests".to_string(), Value::array(guests));
    render_templates_and_compile_latex(scan_dir(TEMPLATES_DIR)?, &data)?;
    if watch {
        let updates = run_watcher(TEMPLATES_DIR)?;
        loop {
            render_templates_and_compile_latex(updates.recv()?, &data)?;
        }
    }
    Ok(())
}

We do all the templating work in render_templates_and_compile_latex() once at the start, and every time the file watcher reports a change. We’re careful to only run LaTeX compilation if template rendering succeeded, and we run it twice to account for internal references.

fn render_templates_and_compile_latex(
    template_files: Vec<PathBuf>,
    data: &HashMap<String, Value>,
) -> OrError<()> {
    match render_templates(TEMPLATES_DIR, template_files, OUT_DIR, data) {
        Ok(()) => {
            for _ in 0..2 {
                if let Err(err) =
                    compile_latex(PathBuf::new().join(OUT_DIR).join("invites.tex"), OUT_DIR)
                {
                    error!("Failed to compile latex: {err}")
                }
            }
        }
        Err(err) => {
            error!("Failed to render templates: {err}")
        }
    }
    Ok(())
}
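
compile_latex() isn’t shown in the snippets above; it shells out to xelatex, as mentioned at the start. Here’s a minimal sketch, assuming xelatex is on the PATH and that its exit status is enough to detect failures (the real version is in the linked code):

fn compile_latex<P: AsRef<Path>>(tex_file: P, out_dir: &str) -> OrError<()> {
    use std::process::Command;
    info!("Compiling {}", tex_file.as_ref().display());
    // -interaction=nonstopmode keeps xelatex from stopping at the first error
    // and waiting for input; -output-directory keeps the by-products in out/.
    let status = Command::new("xelatex")
        .arg("-interaction=nonstopmode")
        .arg("-output-directory")
        .arg(out_dir)
        .arg(tex_file.as_ref())
        .status()?;
    if status.success() {
        Ok(())
    } else {
        Err(anyhow::anyhow!("xelatex exited with {status}"))
    }
}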

The full 237 lines of code are available here.

Conclusion

That’s all it takes to write a file-watching, templating document generator in Rust. When I wrote this the first time, I was surprised by how easy it was. Given how many static site generators are out there, I guess I shouldn’t have been, but ~200 lines is really short. And doing it in Rust makes it fast, even though the code is simple and unoptimized. I’ve used this scheme for this website and for my yearly tax report, and I imagine I’ll be using it for other things in the future.