Forget about lenses, let's all implement array-backed anonymous records for fun

July 9, 2021
Written by Artyom Kazak @ Monadfix

Intro

For a while, reinventing monad transformers was a fun weekend activity. Then the community has switched to van Laarhoven lenses and raytracers (or were raytracers always there? I forgot). Anyway, I propose a new weekend project: array-backed anonymous records.

I am not going to walk you through the entire implementation, just enough to get you going.

"What are anonymous records?"

I want to have record types without data. Something like this:

profile :: { id :: Int, name :: Text, ... }
profile = { id = 13, name = "Alexandra Botez", ... }

So, something like tuples, but with field names.

"What are array-backed anonymous records?"

I want to have O(1) field access — so, the field values should be in an array rather than a list. Otherwise it's going to be too easy (apologies if I'm jumping ahead and all the Symbol stuff is new to you). Something like this:

data KV (key :: Symbol) val = KV val

data Rec xs where
  RNil :: Rec '[]
  RCons :: val -> Rec xs -> Rec (KV key val ': xs)
  
profile :: Rec '[KV "id" Int, KV "name" Text]
profile = RCons 13 (RCons "Alexandra Botez" RNil)

We have O(1) cons, but O(n) access. Usually access is much more common, so I'd rather have O(1) access and O(n) cons.

Also, array-backed records are more complex and therefore more fun. A good mix of type-level stuff and "hey you actually have to mutate an array".

"What array type should I use?"

Easy level: don't use an array at all, use a hashmap. You lose all the fun of working with mutable arrays and indices, but all operations on the value-level become ridiculously easy.
Medium level: Vector. Access should be several times faster than with a hashmap, even for tiny records.
Slightly harder: a primitive Array does not support O(1) slices and therefore uses less memory than a Vector. Its cousin SmallArray uses less memory still. Both are more obscure and low-level than Vector so there's extra fun involved.
Slightly harder 2: SmallArray becomes more expensive to garbage collect when you hit the 128 element limit, so you'll need to figure out how to transparently upgrade it to an Array once you hit that limit. (I didn't do this.)

My experience is primarily with superrecord, which I've had to modify for a project at Juspay. Since it uses SmallArray, I'll go with that too:

data Rec (lts :: [*]) = MkRec { _unRec :: SmallArray# Any }

Any is something like Dynamic but without carrying the type information alongside. You can cast any lifted type in and out of Any with unsafeCoerce, but there are no guardrails if you accidentally get the type wrong.

"How should I write the record types?"

The easiest way is the KV approach we've seen above. Just tag values with a Symbol.

However, it's better to have some type-level guarantees that you've listed the values in the right order when constructing the record. Ideally, we want field names both in the record type and in the record body.

Something like this:

data FldProxy (t :: Symbol) = FldProxy

data label := value = FldProxy label := value

And then we can write FldProxy @"id" := 13 on the value level, which is ugly for sure unless you know about overloaded labels. Then we can define an IsLabel instance:

instance l ~ l' => IsLabel (l :: Symbol) (FldProxy l') where
  fromLabel = FldProxy

And now suddenly our world is filled with sunshine. We can write key–value pairs like this: #id := 13. Sweet.

On the type-level, we still have the same key–value pairs, but with := instead of KV and it's just nicer: Rec '["id" := Int, "name" := Text].

"How should I construct records?"

There are a few options.

The easiest is to provide two functions — one for creating an empty record and another for consing a new field to it. This means O(n²) construction, though, because we'll have to reallocate the array at every step.

A slightly trickier option is to define a separate type for key–value list records, and a function that would convert a list-based record into an array-backed record. Internally it would count the elements and then allocate the array of exactly the right size. This is the approach of vinyl's ARec.

The option I went with is — falling back on the tuple syntax. It is great to be able to write something like this:

mkRec (#id := 13, #name := "Alexandra Botez")

Unfortunately, it means defining instances for all possible tuple sizes (currently up to 62). But hey, you can write a script or a Template Haskell bit to generate those.

A pattern synonym makes this syntax even more nice:

pattern Rec :: RecTuple tuple fields => tuple -> Rec fields
pattern Rec a <- (toTuple -> a)
  where Rec = fromTuple

Rec (#id := 13, #name := "Alexandra Botez")

One more option is to implement a Template Haskell based construction function that would use haskell-src-exts to parse a fake record and convert it into an anonymous record. Something like this: [rec|X {id = 13, name = "Alexandra Botez"}|]. Mark Karpov's Template Haskell tutorial should get you going, and if you get stuck with haskell-src-exts, Well-Typed's large-records library should serve as inspiration. (Finally — yes of course you shouldn't be using haskell-src-exts, but GHC's native quasiquote parser doesn't like references to undefined field names.)

"How should I access record fields?"

This is going to be the easiest operation, and I'll show how to implement it. All other operations are on you, though.

First, write a type family that would figure out a field's index in the record's fieldlist (lts):

type family RecIndex (i :: Nat) (l :: Symbol) (lts :: [*]) :: Nat where
  RecIndex idx l (l := t ': lts) = idx
  RecIndex idx m (l := t ': lts) = RecIndex (1 + idx) m lts
  -- Optional but nice
  RecIndex idx m '[] = TypeError ('Text "Could not find label " ':<>: 'Text m)

The last case is a custom type error. When hitting it, GHC's error message will say Could not find label name rather than Could not match RecIndex ... or whatever.

Then just grab a value at that index. Since we are using a raw SmallArray#, we'll need a lot of #-es. If you are using Vector, you won't need them.

unsafeGet :: forall lts. Int -> Rec lts -> Any
unsafeGet (I# index#) (MkRec vec#) =
  let size# = sizeofSmallArray# vec#
  -- NB: I decided to physically store the values in the opposite order,
  -- but forgot why
  in case indexSmallArray# vec# (size# -# index# -# 1#) of
       (# a# #) -> a#
{-# INLINE unsafeGet #-}

get :: forall l v lts. Has l lts v => FldProxy l -> Rec lts -> v
get _ r =
  let !index = fromIntegral (natVal' (proxy# :: Proxy# (RecIndex 0 l lts)))
  in unsafeCoerce $ unsafeGet index r
{-# INLINE get #-}

The Has type family says that a field with a certain name and value type has to be contained in a fieldlist:

type Has l lts v =
  ( RecTy l lts ~ v,
  , KnownNat (RecIndex 0 l lts) )
  
-- | Get value type if you have the name
type family RecTy (l :: Symbol) (lts :: [*]) :: k where
  RecTy l (l := t ': lts) = t
  RecTy q (l := t ': lts) = RecTy q lts

Is the fun creeping in already?

"What other operations should I implement?"

Easy level: set, modify, instance Eq, instance Ord.
Medium level: insert, delete, union, intersection. You'll have to make decisions about whether you want to allow duplicates or not, and how you want to handle them.
Harder: JSON encoding/decoding.
Hard: conversions from/to native Haskell records, a Generic instance (which would give you JSON encoding/decoding and a bunch of other things for free),

Look at the jrec library if you get stuck.

"How should I handle records with same fields in a different order?"

Easy level: ignore this entirely, just provide an operation to shuffle the record from one key order into the other key order. This is what I did.
Medium/hard level: keep the keys always sorted. You'll have to implement a type-level sort (either insertion or mergesort) for this. You will also have to be careful with type inference, and after all that it's still going to be slow-ish. My opinion is that it isn't worth it, but look at superrecord if you want to go via this route.

Bonus challenges

Ditch arrays and use type-level red–black trees like red-black-record does. Awful compilation times, but fun. Probably fun.
Implement variants (anonymous unions, something like a multi-way Either) using the same type-level machinery you've had to painstakingly implement for records.

Enjoy!