No tengo la menor idea

2010-01-28

Record Selector Punning with Type-Level Strings

Filed under: haskell — robgreayer @ 23:16

Haskell users occasionally complain about the language’s lack of a powerful record system. Programmers want exstensibilty, a different approach to the scope of field names, and a better update syntax than Haskell’s algebraic data types, enhanced with field labels, provide.

There are some proposals, such as this one, for new Haskell record systems. There are alternative record system implementations, though none in Haskell’s most popular compiler. There are elaborate exploitations of Haskell’s powerful type system to create record systems (such as HList) or even object oriented libraries (such as OO Haskell). And there are hacks and kludges to work around one or more limitations of the existing record system. This post falls into this last category.

The referenced complaint points out that the scoping of Haskell record selectors can be annoying. If these were real records (based on expectations of records from other languages), something like this would work:

data User = User {
        name :: String,
        host :: Host
    }

data Host = Host {
        name :: String,
        address :: String
    }

but it doesn’t because of scoping problems: each of the record selectors is a function, and I now have two name functions of different types. One solution (valid Haskell98) is to stick these definitions in separate modules, and import them qualified (e.g. import qualified Host.Host as H). Every reference to Host’s name would become (something like) H.name, and every reference to User’s name would become U.name. This solution is unpalatable enough to some that there exists a Haskell language proposal for type directed name resolution which would solve specifically this problem, introducing some new syntax along the way. Another (much maligned) workaround for the problem is to use Haskell type classes to achieve record selector overloading. In the example above, the simplest approach would look something like:

data User = User String Host
data Host = Host String String

class HasName a where
    name :: a -> String

class HasHost a where
    host :: a -> Host

-- etc.
instance HasName User where
    name (User v _) = v
instance HasName Host where
    name (Host v _ ) = v

To remove the monomorphic-ness of the record selector result type, I could use Haskell’s bright, shiny, relatively new associated types extension, or its hoary-but-still-not-standard multi-parameter type classes with functional dependencies extension. Here’s how I might enhance the above with functional dependencies:

class HasName a b | a -> b where
    name :: a -> b

instance HasName User String where
    name (User v _) = v

But still, for every record selector name, I need a type class. Further, even though I’d like to use records from different modules and packages together, seamlessly, I won’t be able to; I’ll end up with multiple versions of equivalent type classes — each author that wants to name a selector name will have to define a HasName type class, and the selector defined in that type class won’t be useful for pulling name values out of records defined by another author in another package. If this convention were followed universally, there’d be thousands of pitiful little type classes in different namespaces scattered across Haskell’s package space. Unpleasant.

So I’d like to contribute my (as far as I know) own variation on the type-class- driven approach, which, be it hacky, kludgy, or otherwise strange, at least doesn’t have the problem of creating thousands of incompatible type classes. Indeed, central to the solution is just one type class:

class Selection a b where
    type SelectionType a b
    select :: b -> a -> SelectionType a b

This class definition requires two exstensions to Haskell: MultiParamTypeClasses and TypeFamilies. Also central to the solution is the creation of a type level alphabet of characters. To represent a few letters I might define:

data Ca = Ca
data Cb = Cb
data Cc = Cc
-- etc.

To represent all the letters of the alphabet (that can actually be used in Haskell identifiers), I can do something like this:

let f i = let nm = mkName $ ("C" ++ show (fromEnum i)) in dataD (return []) nm [] [normalC nm []] []
      in sequence $ map f $ '\'':[ c | c <- [minBound .. maxBound], isIDStart c || isIDContinue c ]

The above expression requires TemplateHaskell extension (supported only by GHC) and at least GHC 6.12.x. The above snippet also uses the unicode-properties package from hackage, and generates on the order of 90,000 new types. Ideally (for some values of ‘ideal’) I would generate a type for every possible character, and not filter by isIDStart and isIDContinue, but doing so generates a few million types, which seemed to make ghci unhappy; consequently, I limited my alphabet to characters relevant to mimicking haskell identifiers. (Note also, ‘a’ becomes C97 rather than Ca).

A type level string is some a product of type level characters. The type-level string for "hello" would be C150 :* C101 :* C108 :* C108 :* C111 :* (), where :* is a right-associative infix type constructor (requiring the TypeOperators extension) and () is used to mean an empty product/list.

Now I can define record selectors using type level strings and my single Selection type class. Defining one ‘by hand’ for the "name" selector would look like:

instance Selection User (C110 :* C97 :* C109 :* C101 :* ()) where
    type SelectionType User (C110 :* C97 :* C109 :* C101 :* ()) = String
    select _ (User v _ _) = v

Of course, I wouldn’t want to define the above instance by hand; I’d use Template Haskell once again to create the instance. Template Haskell is capable, as of GHC 6.12.1, of generating a class instance with an associated type. If I go back to my original definition of User:

data User = User {
    name :: String,
    host :: Host
    }

I can write a Template Haskell function that takes the name of my type, determines the names of the desired selectors, and generates the necessary instances. I won’t present the (somewhat tedious) TH code, but a function can be written such as:

mkSelectionInstances :: Name -> -- the name of the record type
                        Q [Dec] -- the Haskell declarations

that will generate the required instances. It makes use of a selToType function which creates a selector name type given a selector name string:

selToType :: String -> Q Type
selToType cs = foldl appT (conT ''()) $ map f cs
    where f c = (conT ''(:*)) `appT` (conT $ mkName $ "C" ++ show (fromEnum c))

I’m still saddled with a few problems, perhaps the least of which is that I still cannot define two records in the same scope with the same selector names. But, because I am discarding (or at least not planning to use) the ‘real’ selectors, I can adopt a convention for naming selectors that the mkSelectionInstances function will understand, and thus generate the selectors I really want.

For example, the convention could be "when creating type-level selectors, ignore everything in the name from the last tick (‘) mark to the end of the identifier". I would then define my records like this:

data User = User {
        name'u :: String,
        host'u :: Host
    }

data Host = Host {
        name'h :: String,
        address'h :: String
    }

mkSelectionInstances ''User
mkSelectionInstances ''Host

and mkSelectionInstances will create name and host selector instances for User, and name and address selector instances for Host. I could now define (‘by hand’) aliases for the selectors:

name = select (C110 :* C97 :* C109 :* C101 :* ())
host = select (C104 :* C111 :* C115 :* C116 :* ())
address = select (C97 :* C100 :* C100 :* C114 :* C101 :* C115 :* C115 :* ())

name can now be used to select the name component of a User value or a Host value, or any other value of a type t with a Selection t C110 :* C97 :* C109 :* C101 :* ()) instance, no matter where it has been defined:

cory = Host "cory.EECS.Berkeley.EDU" 128.32.48.187
joe = User "Joe" cory

joesName = name joe
joesHostName = name $ host $ joe

Still, there is a problem: where should the the name, host, and address selectors be defined? I certainly don’t want to have to actually define them by hand, but if I generate them in the module where I generate the selector instances, I run into problems.

Assume I define name/host/address as above and export them, from module A. Another module, B defines similar selectors (name and address, say, and some other unrelated selectors). A third module C wants to use the records defined in modules A and B. It doesn’t matter whether it imports name and address from A or from B, but it must only import one definition. This restriction places an annoying burden on the author of C.

The approach is, nevertheless, one possible answer to the question of where to define identifiers representing my selectors. I’ve come up with two other answers, which I’ll explore below.

Alternative 1: ‘Import’ identifiers using a TH splice

Instead of defining the selector identifiers in the module that defines the record type(s) and the Selector instances for that type, use a TH function to ‘import’ the selectors in the module where they are used. For example, if I have a module C that imports records from modules A and B, I’d do something like:

module C where

-- the module that contains my selection class, etc.
import Data.Record.Selection
import A(User(User),Host(Host))
import B(Student(Student))

importSelectors (record::User) (record::Host) (record::Student)

I have one function, importSelectors, that, given a few record types, will generate the necessary selector function definitions, taking care not to generate a selector definition more than once if more than one of the specified records has a particular named selector. Even though User and Host, and perhaps Student each have a ‘name’ selector, importSelectors must generate only one name function (name = select (C110 :* C97 :* C109 :* C101 :* ())).

For this magic to work, I need a couple more type classes:

class HasSelectors a where
    record :: a
    selectors :: a -> [String]
    record = undefined

class Importer a where
    importGen :: [String] -> a

HasSelectors has an instance for each record type, and allows my ‘importer’ to get a list of selector names for each record. The mkSelectionInstances function would also generate the instance for this class, for a particular record. The record member of HasSelectors exists just to make the importSelectors call at the top of a module look nicer… I could just as easily write:

importSelectors (undefined::User) (undefined::Host) (undefined::Student)

The Importer has two instances (requiring OverlappingInstances):

instance Importer (Q [Dec]) where
    importGen ss = liftM2 (++) (mapM gent ss') (mapM gen ss') where
        ss' = nub ss
        gent s = sigD nm [t| Selection a $t => a -> SelectionType a $t |] where
            nm = mkName s
            t = return $ selToType s
        gen s = valD (varP nm) (normalB [e|select (undefined :: $t)|]) [] where
            nm = mkName s
            t = return $ selToType s

instance (HasSelectors a, Importer b) => Importer (a -> b) where
    importGen ss = (\ h -> importGen (selectors h ++ ss))

I’m not sure if there’s a name for this particular type class trick, enabling the definition of a function that takes an variable number of arguments, but I lifted it from the HaXR package. The type classes allow, finally, the definition of importSelectors:

importSelectors :: Importer a => a
importSelectors = importGen []

Some minor issues remain — how, for example, should you define the selector functions if you want to use them in the same module where you define your records? Because of Template Haskell stage restrictions, I wouldn’t be able to make use of the generated HasSelectors instance in another Template Haskell splice. I could have mkSelectionInstances generate them, but not export them, but I could run into a problem if I also wanted to import other records, with overlapping selector names, into a module where I’m also defining records. Nevertheless, with a few tweaks for corner cases like these, this approach has potential.

Alernative 2: Don’t create identifiers at all!

Instead, use a ‘literal’ representing a selector. The Haskell QuasiQuotes extension allows us to create custom literals, and I can create a Quasi Quoter to represent my record selectors:

selToExp :: String -> Q Exp
selToExp cs = return (VarE 'select `AppE`
   (foldr AppE (ConE '()) $ map f cs))
   where f c = (ConE '(:*)) `AppE` (ConE $ mkName $ "C" ++ show (fromEnum c))

selToPat :: String -> Q Pat
selToPat _ = fail "sorry, unimplemented"

π = QuasiQuoter selToExp selToPat

I no longer have to worry about creating identifiers for my selectors. I have a whole family of literals available to me, wherever I import my Quasi Quoter:

cory = Host "cory.EECS.Berkeley.EDU" 128.32.48.187
joe = User "Joe" cory

joesName = [$π|name|] joe
joesHostName = [$π|name|] $ [$π|host|] $ joe

Now, I realize that [$π|name|] is a bit more unwieldy than name. I have (in the case of name) four character of meaningful identifier and six characters of overhead, which is not convenient. But this overhead is purely syntactic, and suitable desugarer could (in the event that this idea becomes wildly popular) be created to make type level literals palatable. Perhaps ‘raw’ type level strings could look like «name», and my literal selectors could look like ℓname (a script ‘l’ symbol prefix indicating that some sort of label is being introduced), such that the following properties always hold (forall strings s):

ℓs == select «s»
«s» == $(selToExp s) :: $(selToType s)

Given this (unlikely to be implemented!) syntax sugar, my example would look like:

cory = Host "cory.EECS.Berkeley.EDU" 128.32.48.187
joe = User "Joe" cory

joesName = ℓname joe
joesHostName = ℓname $ ℓhost $ joe

By defining records this way, I’ve thrown away some of the utility of Haskell record selectors. As implemented above, the selectors are only useful as record destructors, whereas normal Haskell record selectors are useful also as (part of) record construction as well as record update. That’s not a particular problem for this approach: it could easily be combined with, say, first-class labels. mkSelectionInstances would need to be redefined a bit to produce instances that provide the right SelectionType (e.g. User :-> String instead of String for the name selector, where :-> is a first-class-label type constructor).

Pushing the idea of type level strings as labels a little further, it would be relatively easy to reimplement some HList features — specifically, the labels of its extensible record system — using type level strings. HList depends on type level naturals and a namespace type to define labels; type level strings could be used as easily. Algorithms written in terms of labels as defined above could (I think) be agnostic as to whether they operate on a record whose underlying representation is simple Haskell ADT or a (suitably reimplemted) HList.

About these ads

3 Comments »

  1. You could change the TH importSelectors to expect a Name, and then just use:

    importSelectors ”User ”Host ”Student

    That might help with verbosity.

    Comment by Edward Kmett — 2010-01-31 @ 11:16

  2. Yes, that would work — one reason I chose not to do this is that the modules defining User, Host, and Student would *have* to export the User, Host, and Student data constructors, otherwise they wouldn’t be fully reifiable in TH. You’d lose the option of hiding the data constructors and providing ‘smart’ constructors in their stead. The verbose approach, with the HasSelectors class, allows hiding the data constructors.

    The other reason I chose not to do it is that I thought hiding the actual ‘raw’ record selectors, which you definitely want to be able to do, would hide them from reify… which it doesn’t, I discovered! (I wonder if it ought to, though… seems like being able to discover selector names with reify that are otherwise hidden by the module system could break some sort abstraction that the module author wants to achieve.)

    Comment by robgreayer — 2010-01-31 @ 12:20

  3. [...] there, such as records, has, and the well known hlist. Wreckage’s novelty is in its use of type level strings. It is implemented with generous support from type families (to implement things like type/value [...]

    Pingback by Wreckage 0.0.0 « No tengo la menor idea — 2010-05-10 @ 16:52


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Shocking Blue Green Theme Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: