A month and a half after launching MultiTool, we’ve hit our first major milestone: support for Cloudflare Workers! But this isn’t an announcement post.
Instead, I want to dig into one of the most technically interesting changes we made while adding support for a second cloud provider: how we replaced a brittle, relational config schema with a content-addressable store that eliminated 500 lines of boilerplate and made it easier to scale across cloud platforms.
Where we started: highly normalized yet non-relational
When we first launched MultiTool, we only supported AWS Lambda with API Gateway. Even then, we knew we’d be adding more providers soon, so we aimed for a flexible backend design.
Application configuration was a particularly thorny area. When users create a new AWS application in the MultiTool app, they provide several configurables so MultiTool knows how to deploy their application to the API Gateway. Things like the gateway name, stage name, and AWS region are saved in the backend so users don’t have to specify them again for every deployment. However, Cloudflare Workers doesn’t share any of these fields. Instead, we need to store the name of the Worker and the user’s account ID. In Rust, it made perfect sense to represent this as a sum type:
enum ApplicationConfig {
    CloudflareWorker {
        worker_name: String,
        account_id: String,
    },
    AwsApiGateway {
        gateway_name: String,
        stage_name: String,
        method: String,
        path: String,
    },
}
But modeling this in a Postgres schema is awkward. We created an enum to track the provider, then set up separate tables per provider with foreign keys to the owning application:
CREATE TYPE cloud_provider AS ENUM (
    'AWS',
    'CLOUDFLARE'
);

CREATE TABLE applications (
    id SERIAL PRIMARY KEY,
    provider cloud_provider NOT NULL
);

CREATE TABLE cloudflare_workers (
    app_id INT NOT NULL REFERENCES applications (id),
    worker_name TEXT NOT NULL,
    account_id TEXT NOT NULL
);

CREATE TABLE aws_api_gateway (
    app_id INT NOT NULL REFERENCES applications (id),
    gateway_name TEXT NOT NULL,
    stage_name TEXT NOT NULL,
    method TEXT NOT NULL,
    path TEXT NOT NULL
);
Now, when we wanted to load an application’s configuration, we would first query for the application itself. Then, we could inspect the provider column to see whether it was a Cloudflare Worker or an AWS API Gateway. Finally, we could run a second query to load the configuration from the related table, filtering on the application’s ID.
use sqlx::{query_as, PgConnection};

#[derive(sqlx::Type)]
#[sqlx(type_name = "cloud_provider")]
enum CloudProvider {
    #[sqlx(rename = "AWS")]
    Aws,
    #[sqlx(rename = "CLOUDFLARE")]
    Cloudflare,
}

// Pseudo-code adapted from our production backend.
async fn load_app_config(conn: &mut PgConnection, id: ApplicationId) -> Result {
    // Get which cloud provider this application uses.
    let app_row = query_as!(
        ApplicationModel,
        "SELECT provider FROM applications WHERE id = $1;",
        id
    )
    .fetch_one(&mut *conn)
    .await
    .map_err(DatabaseError::from)?;
    // Now, load the config data specific to this cloud provider,
    // delegating to a separate function to perform this behavior.
    // We could use dynamic dispatch here to simplify the control flow, but that
    // doesn't address the underlying complexity.
    Ok(match app_row.provider {
        CloudProvider::Cloudflare => load_cloudflare_config(conn, id).await?.into(),
        CloudProvider::Aws => load_aws_config(conn, id).await?.into(),
    })
}

async fn load_aws_config(conn: &mut PgConnection, id: ApplicationId) -> Result {
    query_as!(
        AwsApiGatewayConfig,
        "SELECT * FROM aws_api_gateway WHERE app_id = $1",
        id
    )
    .fetch_one(conn)
    .await
    .map_err(DatabaseError::from)
}

async fn load_cloudflare_config(conn: &mut PgConnection, id: ApplicationId) -> Result {
    query_as!(
        CloudflareWorkerConfig,
        "SELECT * FROM cloudflare_workers WHERE app_id = $1",
        id
    )
    .fetch_one(conn)
    .await
    .map_err(DatabaseError::from)
}
This worked, but it wasn’t as ergonomic as it could be, especially in Rust. Every new provider required a second query and a distinct data model, which meant a growing amount of branching logic and boilerplate to load configuration. As we added more platforms, we expected the surface area to balloon quickly.
We certainly could have joined the two tables together and loaded the data in one query instead of two. But doing so would have introduced a different kind of complexity: we’d need to marshal heterogeneous rows into a single data type, then figure out which one we actually had. That would mean trying to deserialize the result as an AWS config first, then falling back to Cloudflare if that failed.
Our SQL library didn’t support that kind of messy fallback logic cleanly. And even if we could make it work, it would quickly become a code smell, especially if we scaled to 10+ platforms.
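To make that concrete, here’s a rough, hypothetical sketch of what that single-query alternative could have looked like (this is not code from our backend): a LEFT JOIN across both provider tables, decoded with sqlx’s unchecked runtime API, trying the AWS columns first and falling back to Cloudflare.
use sqlx::{PgConnection, Row};

// Hypothetical sketch of the single-query alternative we decided against.
// Table and column names follow the schema above; the ID is simplified to an i32.
async fn load_app_config_joined(
    conn: &mut PgConnection,
    id: i32,
) -> Result<ApplicationConfig, sqlx::Error> {
    let row = sqlx::query(
        "SELECT a.provider,
                cw.worker_name, cw.account_id,
                ag.gateway_name, ag.stage_name, ag.method, ag.path
         FROM applications a
         LEFT JOIN cloudflare_workers cw ON cw.app_id = a.id
         LEFT JOIN aws_api_gateway ag ON ag.app_id = a.id
         WHERE a.id = $1",
    )
    .bind(id)
    .fetch_one(conn)
    .await?;

    // Try to decode the AWS columns first; if they come back NULL, the decode
    // fails and we fall back to treating the row as a Cloudflare Worker.
    if let (Ok(gateway_name), Ok(stage_name), Ok(method), Ok(path)) = (
        row.try_get("gateway_name"),
        row.try_get("stage_name"),
        row.try_get("method"),
        row.try_get("path"),
    ) {
        return Ok(ApplicationConfig::AwsApiGateway {
            gateway_name,
            stage_name,
            method,
            path,
        });
    }

    Ok(ApplicationConfig::CloudflareWorker {
        worker_name: row.try_get("worker_name")?,
        account_id: row.try_get("account_id")?,
    })
}
Even as a sketch, this gives up the compile-time checking of the query_as! macros and leans on decode failures for control flow, which is exactly the kind of mess we wanted to avoid.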
At this point, our key takeaways were:
- Our data was well-suited to a sum type, but Postgres didn’t provide a clean way to store it due to its non-relational shape.
- The two-query workaround introduced some inefficiency, but the performance impact was likely small compared to the cost in complexity.
- The bigger issue was scale: each new platform would multiply our branching and boilerplate, and we were already maintaining 500+ lines of config-specific query logic for AWS alone.
Learning from Cloudflare’s API
While implementing Worker support, we encountered Cloudflare’s file upload API. It’s clever, though a bit unconventional. To upload a worker’s static assets, you send Cloudflare a manifest of files along with a content hash for each one. Cloudflare uses those hashes to determine which files it already has in its cache, and asks you to upload only the ones it doesn’t.
Here’s what that upload flow looks like in practice, adapted from Cloudflare’s docs:
import * as fs from "fs";
import * as path from "path";
import * as crypto from "crypto";
import { FormData, fetch } from "undici";
import "node:process";
// Represents a single static file to be uploaded to Cloudflare:
// its content address, expressed as a hexadecimal hash plus a size.
interface FileMetadata {
hash: string;
size: number;
}
// This is what you send when you want to start an upload session.
interface UploadSessionData {
uploadToken: string;
buckets: string[][];
// The key is the file's name and the value is the file "address", a hash unique to
// the content within the file.
fileMetadata: Record<string, FileMetadata>;
}
// What you get back from Cloudflare after sending it all of the content hashes.
interface UploadResponse {
result: {
jwt: string;
buckets: string[][];
};
success: boolean;
errors: any;
messages: any;
}
// Function to calculate the SHA-256 hash of a file and truncate to 32 characters
function calculateFileHash(filePath: string): {
fileHash: string;
fileSize: number;
} {
const hash = crypto.createHash("sha256");
const fileBuffer = fs.readFileSync(filePath);
hash.update(fileBuffer);
const fileHash = hash.digest("hex").slice(0, 32); // Grab the first 32 characters
const fileSize = fileBuffer.length;
return { fileHash, fileSize };
}
// Function to gather file metadata for all files in the directory.
// Call this function to loop through all of the files in the directory and calculate
// their content hash.
function gatherFileMetadata(directory: string): Record<string, FileMetadata> {
const files = fs.readdirSync(directory);
const fileMetadata: Record<string, FileMetadata> = {};
files.forEach((file) => {
const filePath = path.join(directory, file);
const { fileHash, fileSize } = calculateFileHash(filePath);
fileMetadata["/" + file] = {
hash: fileHash,
size: fileSize,
};
});
return fileMetadata;
}
What stood out to me about Cloudflare’s API was that it doesn’t require a specific hash algorithm, just a consistent one. That flexibility allows users to choose something fast and modern like xxhash, rather than older options like SHA-1 or MD5. If the hash changes, Cloudflare simply treats it as a cache miss. If there’s a cache hit, you don’t have to reupload the file, saving the end user some bandwidth and time. This matters more when you’re doing frequent deployments and when node_modules is packed full of dependencies.
I'd seen similar patterns in tools like Docker and PNPM, which use content hashes to deduplicate layers and packages. But Cloudflare’s implementation showed us a new angle: content-addressability at the API level. That inspired me to see if we could adapt this idea to solve our database schema issue. Instead of trying to model config in a relational schema, we realized we could treat it like a content blob: hash it, store it once, and reference it wherever it’s used.
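In Rust terms, that only asks two things of the config type: it has to be hashable, so we can derive a content address from it, and serializable, so the whole sum type can live in a single JSONB column. Here’s a minimal sketch of that idea; the exact derives and serde tag layout are illustrative rather than our production annotations.
use std::hash::{Hash, Hasher};

use serde::{Deserialize, Serialize};
use twox_hash::XxHash3_64;

// Deriving Hash gives the config a stable content address; Serialize/Deserialize
// lets the whole sum type round-trip through a single JSONB column.
#[derive(Clone, Hash, Serialize, Deserialize)]
#[serde(tag = "provider", rename_all = "snake_case")]
enum ApplicationConfig {
    CloudflareWorker {
        worker_name: String,
        account_id: String,
    },
    AwsApiGateway {
        gateway_name: String,
        stage_name: String,
        method: String,
        path: String,
    },
}

// The content address: a 64-bit xxhash digest of the entire config.
fn content_address(config: &ApplicationConfig, seed: u64) -> u64 {
    let mut hasher = XxHash3_64::with_seed(seed);
    config.hash(&mut hasher);
    hasher.finish()
}
With a tagged representation like this, the provider name travels inside the JSON blob itself, so the database no longer needs a provider column, let alone a table per platform.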
We arrived at the schema migration fairly quickly. We built a new table in Postgres with the content hash as part of the primary key, and for added safety, we included the application ID too.
Reframing the problem: store configs by hash
This new design brought a number of benefits. If a user changes their configuration, they get a new hash and a new row. If the config is reused, we just reference the existing row, reducing duplication.
This gives us immutability, deduplication, and a drastically simplified data model:
-- Create the application_configurations table as a content-addressable store
CREATE TABLE application_configurations (
    id BIGSERIAL NOT NULL,
    application_id INT NOT NULL REFERENCES applications (id),
    -- 64-bit unsigned integer digest, stored as NUMERIC because Postgres has no unsigned BIGINT
    digest NUMERIC (20) NOT NULL,
    -- 64-bit unsigned integer size, auto-derived from the JSONB payload
    size NUMERIC (20) GENERATED ALWAYS AS (length (data::TEXT)) STORED,
    -- JSONB of the platform, ingress, and monitor configuration
    data JSONB NOT NULL,
    -- No "updated at" timestamp because this row is immutable.
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    -- Create a PK on the combo of application_id, digest, and size.
    PRIMARY KEY (application_id, digest, size)
);
Like Cloudflare, we chose to use a natural key triple for defense in depth. If any of the application ID, digest, or size changes, we create a new row. We don’t mind the disk overhead. We’re primarily concerned with hash collisions across organizations leading to accidental overwrites, though the risk there is minimal as long as the API remains immutable.
We initially implemented our INSERT query like this:
INSERT INTO
application_configurations (application_id, digest, data)
VALUES ($1, $2, $3)
ON CONFLICT DO NOTHING
RETURNING
id;
This worked well when a new row was created, but if a conflict occurred, the query returned no rows at all, so we had no way to get back the ID of the config that already existed.
We wanted something that would always give us back the row ID, regardless of whether the config was inserted fresh or reused. With ON CONFLICT DO NOTHING, the RETURNING clause only produces rows that were actually inserted, so a conflict suppresses the result tuple entirely.
To solve that, we reached for a short-circuiting CTE that coalesces to either the freshly inserted ID or the existing one:
WITH
  new_config AS (
    INSERT INTO
      application_configurations (application_id, digest, data)
    VALUES
      ($1, $2, $3)
    ON CONFLICT DO NOTHING
    RETURNING
      id
  )
SELECT
  COALESCE(
    (
      SELECT
        id
      FROM
        new_config
    ),
    (
      SELECT
        id
      FROM
        application_configurations
      WHERE
        application_id = $1
        AND digest = $2
        AND size = length($3::TEXT)
    )
  );
This gave us a stable, idempotent insert-or-reuse mechanism, almost like an UPSERT query. As a result, we could swap out hundreds of lines of branching config-loading logic for a single insert call and a straightforward hash computation.
use std::hash::{Hash, Hasher};

use bigdecimal::BigDecimal;
use sqlx::{query_file, PgConnection};
use twox_hash::XxHash3_64;

async fn create_application(
    tx: &mut PgConnection,
    application_id: ApplicationId,
    config: ApplicationConfig,
) -> Result {
    // Hash the config to calculate the content address.
    let mut hasher = XxHash3_64::with_seed(DEFAULT_SEED);
    config.hash(&mut hasher);
    let digest = BigDecimal::from(hasher.finish());
    // Serialize the config into a JSON value to insert into the database.
    let config_json = serde_json::to_value(config.clone())
        .map_err(|err| sqlx::Error::Encode(Box::new(err)))?;
    // Create the rollout and insert the config it uses at the same time.
    // NOTE: the 'size' column is auto-calculated by the database.
    query_file!(
        "queries/application/create.sql",
        application_id,
        digest,
        config_json,
    )
    .fetch_one(&mut *tx)
    .await
    .map_err(DatabaseError::from)
}
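For symmetry, here’s what the read path can look like under this scheme. This is an illustrative sketch rather than our production loader (the “latest config wins” ORDER BY and the use of sqlx’s runtime query_as are assumptions), but it shows the payoff: one query, one deserialization, and no provider-specific branching anywhere.
use serde_json::Value;
use sqlx::PgConnection;

// Illustrative sketch: fetch the most recently created config for an application
// and deserialize the JSONB blob straight back into the ApplicationConfig sum type.
async fn load_latest_app_config(
    conn: &mut PgConnection,
    application_id: i32,
) -> Result<ApplicationConfig, sqlx::Error> {
    let (data,): (Value,) = sqlx::query_as(
        "SELECT data FROM application_configurations
         WHERE application_id = $1
         ORDER BY created_at DESC
         LIMIT 1",
    )
    .bind(application_id)
    .fetch_one(conn)
    .await?;

    // serde picks the right enum variant from the tagged JSON, so there's no
    // match on the provider anywhere in the load path.
    serde_json::from_value(data).map_err(|err| sqlx::Error::Decode(Box::new(err)))
}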
The result: simpler, safer, more scalable
This new approach brought several benefits all at once:
- Immutability: Each config is stored as a new row, so nothing ever gets overwritten.
- Idempotency: Our app doesn’t check whether a config already exists; it just issues an UPSERT and lets the database decide whether to reuse or insert.
- Deduplication: Config rows can be shared across multiple rollouts, but each unique config is only stored once on disk.
- Simplicity: One table, one query. No more back-and-forth between database queries and app logic.
- Scalability: Adding a new provider now just means adding a new enum variant, not a new table or query.
Borrowing the content-addressable store pattern helped us solve a tricky schema design problem. Postgres doesn’t have a natural way to store sum types, so rather than fighting the database, we stepped back and rethought how we store config altogether. Hashing the full config and storing it once turned out to be simpler, more scalable, and a lot easier to work with. We cut 500 lines of boilerplate and unlocked a clean path to supporting more platforms in the future.
That shift made it possible for us to ship Cloudflare Worker support just six weeks after our launch and gave us confidence in the direction we’re taking: practical, scalable, and easy to build on.