Josiah Nunemaker

founding engineer / frontend

Building interfaces that feel inviting to use, even when the work underneath is intricate. Rich interactions, careful detail, small things that compound.

Philosophy

The best interfaces are like well-tended gardens. They appear effortless, but they're the result of careful cultivation and constant refinement.

My work tends toward the places where interface complexity meets technical complexity. Virtualized lists, real-time sync, custom toolchains. The experience is only as good as the engineering underneath it.

Current Focus

01

Rich Interactions

Interfaces that feel good to use, not just functional

02

Polish & Detail

Microinteractions, transitions, the small touches that make an interface feel alive

03

Dev Experience

Tooling and AI helpers that make the work itself feel good

Recent Writing

Fixing Coolify websockets when running behind Caddy

I run Coolify on my homelab for some self-hosted projects, but I’ve opted not to run their reverse proxy, since I have a Caddy container that routes my other traffic. I’ve had a persistent error with the websockets, though, and Coolify’s firewall docs didn’t quite clear things up.

The console showed the following error:

Coolify could not connect to its real-time service. This will cause unusual problems on the UI if not fixed! Please check the related documentation (https://coolify.io/docs/knowledge-base/cloudflare/tunnels/overview) or get help on Discord (https://coollabs.io/discord).

I experimented with adding the additional ports Coolify mentions in their docs as reverse_proxy rules in the Caddy handler, but that didn’t help. The key detail I was missing was that the websocket traffic needs to be routed by path, not just by port. Eventually, I stumbled upon this solution buried behind the “Show N previous replies” button on a GitHub discussion.

In a nutshell, the solution is to add path matchers to your route handler so that the terminal websockets and the realtime UI websockets get routed to the correct ports on the Coolify server. The final Caddyfile looks like this (as mentioned in the GitHub discussion; surfacing it here so it’s easier to discover):

# Terminal websockets
@terminal {
        path /terminal/ws /terminal/ws/*
}

# Realtime UI websockets
@app {
        path /app/*
}

handle @terminal {
        reverse_proxy YOUR_COOLIFY_IP:6002
}

handle @app {
        reverse_proxy YOUR_COOLIFY_IP:6001
}

When Coolify is running the reverse proxy itself, it automatically adds these rules behind the scenes, but when you are managing the Caddy instance, you have to add them yourself.
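
For context, here is a rough sketch of how those rules might sit inside a complete site block. The hostname coolify.example.com is a placeholder, and the fallback port 8000 is an assumption based on Coolify's default dashboard port; neither comes from the GitHub discussion, so adjust both to match your setup.

```caddyfile
# Hypothetical hostname; replace with your own domain.
coolify.example.com {
        # Terminal websockets
        @terminal {
                path /terminal/ws /terminal/ws/*
        }

        # Realtime UI websockets
        @app {
                path /app/*
        }

        handle @terminal {
                reverse_proxy YOUR_COOLIFY_IP:6002
        }

        handle @app {
                reverse_proxy YOUR_COOLIFY_IP:6001
        }

        # Everything else goes to the Coolify dashboard.
        # Port 8000 is an assumption (Coolify's default); change it if yours differs.
        handle {
                reverse_proxy YOUR_COOLIFY_IP:8000
        }
}
```

Because handle blocks are mutually exclusive, a request that matches one of the websocket paths never reaches the catch-all handle at the bottom.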


Auto-distillation for tasks by large models

We’re in the era where large models that run in the cloud can handle most straightforward tasks fairly quickly and accurately. The downside is that they are large, expensive models that run in the cloud. There are many tasks that could be done quickly and effectively with smaller, specifically trained local models on my own machine with my own compute, without a dependency on Anthropic, OpenAI, OpenRouter, or whatever the cloud provider du jour is. What if we made building these smaller distilled models a core part of our workflows?

I’ve not sussed out a great interface for it yet, but the general workflow is that a larger model (or its harness) identifies when you are repeating a similar task multiple times. If it deems the task “simple”, it creates its own dataset of examples to train a much smaller model on. Some good cases for this could be categorization (e.g., organize these downloads), simple data extraction (e.g., get the first names out of this JSON), summarization, image conversion (e.g., does this image that the user just downloaded need to be converted to a JPEG?), and similar tasks.

The advantage is that a smaller model tuned for one specific task can be much more efficient at the tasks it is capable of doing. It loads into memory faster, so we can essentially run these as “on-demand” models. Their cost-effectiveness and speed would let us keep a handful (or more) of very specific models that each do one thing and do it well. That unlocks cost-effective automations that work online and offline, and that are speedy, private, and essentially free because they run on our own hardware.

The dream would be some level of OS (or browser?) integration so that it can proactively identify and automate tasks that the user does frequently, so that the user doesn’t have to think about and identify tasks that could be effectively automated or accelerated by a model.

Challenges

  • How does a model determine repetition, and “simple tasks”? That’s a hard problem to solve. Maybe v1 has the user manually specify these models, then publish them to some kind of marketplace? Then you have the discovery issue, and the fact that the user (or agent) still needs to figure out which tasks are worth pulling a smaller model to automate.
  • Models struggle with correctness. How do you verify that this small, well-trained model is actually working properly?
  • As we create more and more models for specialized tasks, how do we keep track of them? Or update them? It could quickly turn into a model wrangling problem.