Catalogue of My Stupidity: My Haskell 'GenericStruct' Nonsense

2015-06-23

(A note: I took these notes during my time at Urbanalta, intending them to be a private reference to myself on how to learn from some mistakes. I’ve tried to scrub the proprietary bits out and leave the general things behind. I do reference some other notes that probably will still stay private.)

Background

Some background on this: This is some notes on a small Haskell module I did at Urbanalta which I called GenericStruct. Most of this post is very Haskell-heavy and perhaps more suited to HaskellEmbedded as it’s a very niche usage even within Haskell.

I talk about this much more extensively in my handwritten work notes, circa 2015-05-05 to 2015-05-20, and in a source file GenericStruct.hs. Neither of those are online (and trust me that you don’t want to try to understand my scratch notes anyway), but a cleaner summary is in the Appendix.

The short version is that I needed a way to express the format a packed data structure, similar to a C struct in some ways, but without any padding between fields, and more explicit about the exact size in bits of fields. I wanted this format to also be able to carry some documentation with it because it was meant to be able to express data formats for Bluetooth Low Energy, and so I had a need to present this format in a human-readable way and possibly a more general machine-readable way (such as JSON). This was a similar design goal to my Python creation, AnnotatedStruct, from nearly 2 years ago, but here I wanted the benefits of static typing for when accessing these data structures.

What complicated matters somewhat is that these data structures, rather than being used directly in Haskell to store things, were to be used with Ivory to model the proper C code for reading and writing.

What I eventually came up with used Haskell records with specially-crafted data types inside them, and then GHC.Generics to iterate over these data types and inject into them some context information (a field accessor in Haskell by itself cannot have any information about ‘where’ in the record it is, whether in absolute terms or relative to any other field). Context information here meant things like a bit offset, a size, and a type representation.

This was a little complicated to implement, but overall, not particularly daunting. The GHC.Generics examples included generic serialization which is a very similar problem in many ways, and I followed from this example and a JSON example.

(Another note from some prior work: Do not attempt to do record access with Base.Data.Data. It can get some meta-information, like the constructor itself, but only in a sufficiently generic way that you may call nothing specific to a record on it.)

Problems

The problem that I ran into fairly quickly (but not quick enough) is that what I had created had no good way to let me nest data formats inside each other. For instance, I had a 16-bit value which I used in several places, and that 16-bit value was treated in many places as 16 individual bitfields with each bit representing a specific Boolean parameter unto itself. In other places, treating it as simply a single 16-bit integer was more meaningful - and this was related to it being used in several places, such that operations like copying from one place to another became meaningful.

I had no good way to express this. I could not define that format in one place, and then put it inside each struct that used it. I thought initially that implementing this would be a matter of just making certain structures recursive, and I was partly right, but I ran into such complication in the type system that I felt like it was not worth it to proceed further.

What I wrote yesterday when I ran into these serious snags was:

  • At this stage of complexity, I sort of wish I’d opted for Template Haskell instead. It would have absorbed the change much better. GHC.Generics required me to sort of bend the type system. The problem there is that it had only so far to bend, while with Template Haskell the whole Haskell language (more or less) would be at my disposal, not just some slightly-pliable parts of its type system. (Perhaps this is why Ivory does what it does.)
  • Idris may have handled this better too by virtue of its dependent types, and for similar reasons.
  • johnw (Freenode IRC #haskell denizen & Galois Inc. employee) mentioned a possibly-viable approach based around an applicative expression of data formats and not requiring things like Template Haskell or possibly Generics. (See IRC logs from 2015-06-04; this was via PM.)
  • Another person mentioned that this sounded like a job for Lenses, particularly, their Iso (isomorphism) type which had different ‘projections’ of data.

What I did right

I properly implemented a nice structure with GHC.Generics over top of Haskell records, and kept it fairly compact and strictly-typed. I started making use of it right away, and this meant errors would readily show up (generally as type errors at compile time) as I made changes.

I kept the code clean and well-documented, and this helped me out substantially with writing the code, understanding it, and then understanding that much of it shouldn’t have been written.

I think that overall that it was a good idea for me to treat the data format as a specification that could be turned into C code, into a JSON description, and into (eventually) a human-readable description.

What I did wrong, and should have done instead

Overall: I tried very hard to solve the very unique, very specific problem. This blocked my view of the real, more general problem. Despite active attempts to discern that general problem, I was fixated on specifics. Despite my preachy guideline elsewhere in my notes that, “Your problem is not a unique snowflake - someone else has studied it,” I assumed my problem was a relatively unique snowflake.

When I expressed the problem to other people, a number of them told me that this sounded like a job for the Lens library in Haskell. On top of this, I had used Lenses before. While I had not used them enough to know for certain that Lens was the best solution here, I had used them enough to know that they were a likely first place to look. But, I ignored this experience, and I ignored what other people told me.

Why I ignored this, I suspect, is because I was focusing too much on the specifics of the issue. This led me to believe that this problem was sufficiently unique and different that Lenses were not an approach I should even look at.

Lenses might not be the proper approach, but I am almost certain that examining them would have helped me.

I missed something very crucial: That I would need to nest data formats and share definitions between them. This should have been obvious to me: this is a functional language, and composition (which is what this is) is essential to abstraction and reuse.

I assumed that my solution would have to be tied to Haskell records. This was not a given. Further, I knew of three methods which created similar structures but did not rely on records: Lenses, Ivory structs, and Ivory bitdata. Records were an irrelevancy (even to the specifics I was fixated on) and I tightly coupled my solution to them. Records are not meant to compose, while some other structures are.

Short, general summary

(i.e. the part where I get really preachy about vague things)

  • Foresee what else your problem may need to encompass. Perhaps it only looks like a unique problem because you put too much weight on the specifics, and you’ve missed the ways it resembles existing, well-studied problems - perhaps even ones you are familiar with.

  • Perhaps you haven’t missed anything notable. Still, knowing what else it may need to encompass makes for better solutions, and may prevent you from making design decisions early on which fundamentally limit it in ways that matter later.

  • Unsurprisingly, ekmett probably solved your problem already. (Or perhaps acowley did with Vinyl, or perhaps compdata solves it…)

Appendix

My aim was to solve a few problems:

  • Outputting a concrete representation of an entire type (for the sake of inserting into JSON specs, for instance),
  • Creating a correspondence between native Haskell types and Ivory’s specific types (which ended up not being so necessary),
  • Packing and unpacking a struct value to and from memory automatically (via Ivory),
  • Unpacking and packing individual fields of a struct (also via Ivory),
  • Doing the above with the benefit of strict, static typing (i.e. not relying on strings to access a field),
  • Handling all of this with (in Ivory) an in-memory representation with no padding or alignment concerns,
  • Having a single specification of a type, including human-readable descriptions.

What I saw as the largest problem is that accessors for Haskell records have no accessible information on which field they access, or where that field is relative to anything else in the data structure. Thus, if I access a field of a record, I can have no information there about ‘where’ in the record it is unless I put that information there somehow. The pieces of information that I seemed to need in the field were the field’s overall index in the record, and the field’s overall memory offset.

A simpler form of generics, Data.Data, allowed me to solve the first problem easily and produce a list of something like (TypeRep, name, size, position). However, I ran into problems when trying to find a way to insert context into the record somehow. The central issue is that Data.Data provides no way to do anything other than generic operations on a field, and those generic operations are fairly limited. I could find no way to make something like a typeclass and use typeclass methods to update those fields.

GHC.Generics, on the other hand, made this fairly trivial. I could solve the first problem (albeit in a more complicated way), and what I eventually turned the other problems into was the need to take the generic record type itself (in this case, in the form of a Proxy to it), and given certain constraints on it, to create a generic constructor for this type.

This proved to be fairly easy. Most of GHC.Generics will as readily traverse a Proxy of a type as a value of the type, given some changes which mostly amount to a lot of use of fmap. The ‘to’ function in GHC.Generics (after I had cleared up some conceptual confusion) simply took the representation and removed the Proxy (or pushed it elsewhere), until it hit a certain innermost point in which one created an abstract representation of the constructor call itself, but this time with the proper data.

Most of the rest was just modifying the above to allow me to propagate context information such as index and memory offset, and dealing with the confusion of type families (which I ended up needing much less of than I initially thought).

stupidityTechnobabble

Post at HaskellEmbedded - Introducing Ion

Hello, World (from Wordpress, then Jekyll, then Hakyll, then Hugo)