Dec 06, 2024

Designing a typed batch storage API

Rebecca Mark

In the 20.1.0 release of the Unison Cloud client, there's a new Storage-related ability, Batch, which performs bulk reads from the database and returns the values in a single round trip. At a high level, Cloud users can expect faster database queries, since some reads internal to OrderedTable can now happen in bulk. For the curious, designing an ability that interacts with arbitrarily typed Storage raises some interesting considerations worth learning about.

It's tempting to write Batch.read : Table k v -> [k] -> [v] and be done with it. But to keep database round trips to a minimum, we needed a bulk read API that can span multiple tables storing arbitrary Unison types. A batched read API that doesn't support heterogeneous types would have limited value to us.

How, then, can we model a "batch" of mixed key and value types while staying true to the Unison promise of typed schemas? Modeling a batch as read : [Any] -> [Any] doesn't cut it, because it erases the types and leaves callers to match each Any response back to its request by hand. Instead, you can think of a batch as something you build with a fork/await pattern.

ability Batch

Build a batch by issuing multiple Batch.forkRead requests, each of which returns a handle typed by the value it will produce. Then call Batch.tryAwaitRead on the requests in the batch to extract each value with its original Unison type.

This API is type-safe at the level of individual tables, yet flexible enough to perform a single bulk database request across tables with different key and value types:

Storage.doc.example do
  use Storage write
  use Table Table
  db = Database (Database.Id.Id "id") "db"
  -- Two tables with different key and value types
  table1 = Table "table1"
  table2 = Table "table2"
  write db table1 1 "🌹"
  write db table2 "abc" false
  batchRead db do
    -- Fork both reads first so they share one batch
    read1 : Read Text
    read1 = forkRead table1 1
    read2 : Read Boolean
    read2 = forkRead table2 "abc"
    -- Awaiting each handle yields a value of its original type
    (awaitRead read1, awaitRead read2)
Either.Right ("🌹", false)
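Because each forkRead keeps its own value type, the single-table Table k v -> [k] -> [v] read from the start of this post falls out as a special case. The sketch below assumes forkRead, awaitRead, and batchRead behave as in the example above; readAll is a hypothetical helper name, not part of the Cloud client, and the sketch assumes every requested key is present in the table.

```unison
-- Hypothetical helper (not part of the Unison Cloud client):
-- the single-table bulk read Table k v -> [k] -> [v], built from fork/await.
readAll table keys =
  -- Fork every read before awaiting any of them, so they all join one batch.
  reads = List.map (k -> forkRead table k) keys
  -- The first awaitRead flushes the pending batch; each await then
  -- extracts its own value.
  List.map awaitRead reads
```

Inside a batchRead block, a call such as readAll table1 [1] would then fetch every requested value in one round trip. How awaitRead reports a missing key isn't covered here; the API docs are the authority on that.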

Batch fork/await drawbacks:

Arguably, the main drawback of this API is that it's harder to see which requests are executed in which batch. A batch is only flushed at the first call to tryAwaitRead after it has been built, so batch boundaries follow from where the awaits fall in your code.
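To make the flush rule concrete, here is a sketch contrasting two batchRead blocks that read the same two values from tables shaped like table1 and table2 in the example above. The function names oneTrip and twoTrips are hypothetical, and the sketch assumes awaitRead flushes a pending batch the same way tryAwaitRead does.

```unison
-- Hypothetical illustration of batch boundaries; the names are made up.

-- Both reads are forked before the first await, so they share one batch:
-- the first awaitRead flushes it in a single round trip.
oneTrip db table1 table2 =
  batchRead db do
    read1 = forkRead table1 1
    read2 = forkRead table2 "abc"
    (awaitRead read1, awaitRead read2)

-- Here the first await flushes a batch containing only read1;
-- read2 is forked afterwards, so it starts a second batch.
twoTrips db table1 table2 =
  batchRead db do
    read1 = forkRead table1 1
    value1 = awaitRead read1
    read2 = forkRead table2 "abc"
    (value1, awaitRead read2)
```

Under that rule, both functions return the same pair and differ only in the number of round trips, which is exactly the difference that is hard to spot from the shape of the code.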

You can read more about those semantics in the API docs.

We felt this tradeoff was ultimately worth it. First, the type safety this API provides is more in keeping with the spirit of Cloud storage. (Imagine trying to pair [Any] reads with [Any] responses!) Second, the performance gained by running bulk database requests across tables in a single operation was too significant to overlook. With this approach, applications using typed Storage can handle large read workloads more efficiently.