No tengo la menor idea

2010-05-10

Wreckage 0.0.0

Filed under: haskell — robgreayer @ 16:00

Haskell 98 (and Haskell 2010) has a ‘sort of’ record system, algegraic data type with named fields. The (unreleased, but githubbed) Wreckage library is yet another attempt to implement a better record system within Haskell, without sacrificing some of the nice features of the named fields system. There are a few other ’embedded’ record systems out there, such as records, has, and the well known hlist. Wreckage’s novelty is in its use of type level strings. It is implemented with generous support from type families (to implement things like type/value heterogeneous list sorting), Template Haskell, Quasi Quotation, and other non-standard features too numerous to mention. Indeed it is a veritable train-wreck of Haskell extensions, hence the name (‘wreck’ being homonymous with ‘rec’, shorthand for ‘record’. Hilarious, I know).

There are three key features of ADTs with named fields that would be a shame to sacrifice.

Construction With Field Names

Given an ADT with named fields, e.g.

    data Person = Person {
        name :: String,
        age :: !Int
    }

we can build a Person value using the field names, instead of relying on their positions within the record. We can supply the fields in any order, and can leave out non-strict fields:

    jane = Person { age = 21, name = "Jane Doe" }
    john = Person { age = 30 }

Access and Update with Field Names

Given the definitions above, we can access components of a Person value with a label:

    janesAge = age jane

and, with a special syntax, can update the fields of a person value:

jim = john { name = "Jim Beam" }
olderJane = jane { age = age jane + 5 }

Although the above syntax for update is convenient, it doesn’t compose well for ‘deep’ update (which is a problem Wreckage tries to resolve, in the style of fclabels).

Pattern Matching with Field Names

Field names work nicely with Haskell pattern matching:

    nameAndAge (Person { name = n, age = i }) = n ++ ": " + show i

Wreckage 0.0.0

Wreckage attempts to preserve these features, in spirit, while still implementing basic records ‘under the covers’ as simple algebraic data types. An equivalent Wreckage record would be defined as follows:

    wreck [d|
        data Person {
            name :: String,
            age :: !Int }
        |]

The declaration above creates a ‘Person’ data type with two fields, along with a constructor ‘mkPerson’, and many (many) instances of various type classes that will make record construction and destruction ‘work’.

Before showing examples of how to use Wreckage records, I’ll expand the example to illustrate one key feature of wreckage:

    wreck [d|
        data Pet {
            name :: String,
            age :: !Int
        }|]

    wreck [d|
        data Person {
            name :: String,
            age :: !Int,
            pet :: Maybe Pet
        }|]

As plain data declarations, these would be illegal in Haskell, because of the reuse of the names ‘name’ and ‘age’ between the two records. Wreckage lifts this restriction. Field names can be reused among records, regardless of what scope the wreckage record is declared in.

Now we can construct values of type Person and Pet:

    fido = mkPet ( ℓname =: "Fido" :* ℓage =: 7 :* ())
    jane = mkPerson ( ℓage =: 21 :* ℓname = "Jane Doe" :* ())

The syntax for construction is slightly different, but has the same essential features: we need only specify strict fields at the time of construction (not specifying a strict field would be a type error), and can specify the fields in any order. The ℓ-notation (which works by leveraging a GHC compiler feature, rather than a Haskell language extension) will be explained fully later, but it introduces a type-level string, which is used instead of a normal identifier to represent a field name.

Access and update of Wreckage records is straightforward if slightly different than with ADTs with named fields:

    janesAge = get ℓage jane
    janesPetsAge = get (ℓpet:-ℓage) jane

    -- change jane's pet's name
    jane' = set (ℓpet:-ℓage) jane 8

Fields can be composed, or chained, together for both access and update, as shown above. Multiple fields can be accessed simultaneously:

    (nm :* age) = get (ℓname :* ℓage) jane

Without much effort upon the part of the Wreckage library, we can pattern match on Wreckage records using the View Patterns extension, implemented in GHC:

    nameAndAge (get (ℓname:*ℓage) -> (n:*i)) = nm ++ ": " ++ show i

Note that the ‘nameAndAge’ function above works both on fido and on jane (i.e. on the Person type and on the Pet type). Or, indeed, on any other wreckage record type, declared in any module in any package, that happens to also have a name field and and age field. With other ’embedded’ record systems, an ‘age’ field defined in one module will clash with an ‘age’ field defined in another module, however the fields are defined.

Person and Pet are (under the covers) just algebraic data types (without field labels), so access and update ought to be about as efficient as with standard field labels.

Wreckage and extensability

Wreckage is intended to have a good story regarding exstensibility, implementing something like scoped labels. Right now, the implementation is rudimentary, with the addition of just two operations: extend and restrict (under the covers extension works right now much like HList, but I intend future versions to work a bit differently). A Wreckage record can be extended with addtional fields:

    janeV2 = jane `extend` (ℓheight =: 1.7)

    janesHeight = get ℓheight janeV2

A field can be removed from a record:

    janeV3 = janeV2 `restrict` ℓage

    s = nameAndAge janeV3 -- type error!

Records are ‘scoped’ in that you can extend a record with a field name it already contains. The new field simply hides the existing field (until the field is removed via ‘restrict’).

The egregious hack

The ℓ notation used above ‘works’ in GHC and GHCI through the use of the Haskell preprocessor feature of GHC. A program can be passed to GHC or GHCI, which is used to transform each .hs file just prior to compilation (and after CPP has possibly been run on the source):

    ghci -F -cmdF prewreck

The prewreck program (included with wreckage) transforms and input file by changing any ℓ sequence into a quasi-quotation. The quasi-quotation could be used directly, but the ℓ-notation is more readable. One day something almost as nice might work out of the box with GHC.

Caveats

Wreckage may have exposed a performance issue with the GHC compiler (6.12.1 or 6.12.2) in that there’s an exponential blowup of memory used by the compiler when trying to optimize (-O1) a wreckage client (no problem with the library itself). Turning off optimization (-O0) and clients compile quickly.

Wreckage doesn’t try to deal with polymorphic records, yet. There’s no technical difficulty with them, but the first implementation of the wreck function was made simpler by ignoring them, for now.

2010-01-28

Record Selector Punning with Type-Level Strings

Filed under: haskell — robgreayer @ 23:16

Haskell users occasionally complain about the language’s lack of a powerful record system. Programmers want exstensibilty, a different approach to the scope of field names, and a better update syntax than Haskell’s algebraic data types, enhanced with field labels, provide.

There are some proposals, such as this one, for new Haskell record systems. There are alternative record system implementations, though none in Haskell’s most popular compiler. There are elaborate exploitations of Haskell’s powerful type system to create record systems (such as HList) or even object oriented libraries (such as OO Haskell). And there are hacks and kludges to work around one or more limitations of the existing record system. This post falls into this last category.

The referenced complaint points out that the scoping of Haskell record selectors can be annoying. If these were real records (based on expectations of records from other languages), something like this would work:

data User = User {
        name :: String,
        host :: Host
    }

data Host = Host {
        name :: String,
        address :: String
    }

but it doesn’t because of scoping problems: each of the record selectors is a function, and I now have two name functions of different types. One solution (valid Haskell98) is to stick these definitions in separate modules, and import them qualified (e.g. import qualified Host.Host as H). Every reference to Host’s name would become (something like) H.name, and every reference to User’s name would become U.name. This solution is unpalatable enough to some that there exists a Haskell language proposal for type directed name resolution which would solve specifically this problem, introducing some new syntax along the way. Another (much maligned) workaround for the problem is to use Haskell type classes to achieve record selector overloading. In the example above, the simplest approach would look something like:

data User = User String Host
data Host = Host String String

class HasName a where
    name :: a -> String

class HasHost a where
    host :: a -> Host

-- etc.
instance HasName User where
    name (User v _) = v
instance HasName Host where
    name (Host v _ ) = v

To remove the monomorphic-ness of the record selector result type, I could use Haskell’s bright, shiny, relatively new associated types extension, or its hoary-but-still-not-standard multi-parameter type classes with functional dependencies extension. Here’s how I might enhance the above with functional dependencies:

class HasName a b | a -> b where
    name :: a -> b

instance HasName User String where
    name (User v _) = v

But still, for every record selector name, I need a type class. Further, even though I’d like to use records from different modules and packages together, seamlessly, I won’t be able to; I’ll end up with multiple versions of equivalent type classes — each author that wants to name a selector name will have to define a HasName type class, and the selector defined in that type class won’t be useful for pulling name values out of records defined by another author in another package. If this convention were followed universally, there’d be thousands of pitiful little type classes in different namespaces scattered across Haskell’s package space. Unpleasant.

So I’d like to contribute my (as far as I know) own variation on the type-class- driven approach, which, be it hacky, kludgy, or otherwise strange, at least doesn’t have the problem of creating thousands of incompatible type classes. Indeed, central to the solution is just one type class:

class Selection a b where
    type SelectionType a b
    select :: b -> a -> SelectionType a b

This class definition requires two exstensions to Haskell: MultiParamTypeClasses and TypeFamilies. Also central to the solution is the creation of a type level alphabet of characters. To represent a few letters I might define:

data Ca = Ca
data Cb = Cb
data Cc = Cc
-- etc.

To represent all the letters of the alphabet (that can actually be used in Haskell identifiers), I can do something like this:

let f i = let nm = mkName $ ("C" ++ show (fromEnum i)) in dataD (return []) nm [] [normalC nm []] []
      in sequence $ map f $ '\'':[ c | c <- [minBound .. maxBound], isIDStart c || isIDContinue c ]

The above expression requires TemplateHaskell extension (supported only by GHC) and at least GHC 6.12.x. The above snippet also uses the unicode-properties package from hackage, and generates on the order of 90,000 new types. Ideally (for some values of ‘ideal’) I would generate a type for every possible character, and not filter by isIDStart and isIDContinue, but doing so generates a few million types, which seemed to make ghci unhappy; consequently, I limited my alphabet to characters relevant to mimicking haskell identifiers. (Note also, ‘a’ becomes C97 rather than Ca).

A type level string is some a product of type level characters. The type-level string for "hello" would be C150 :* C101 :* C108 :* C108 :* C111 :* (), where :* is a right-associative infix type constructor (requiring the TypeOperators extension) and () is used to mean an empty product/list.

Now I can define record selectors using type level strings and my single Selection type class. Defining one ‘by hand’ for the "name" selector would look like:

instance Selection User (C110 :* C97 :* C109 :* C101 :* ()) where
    type SelectionType User (C110 :* C97 :* C109 :* C101 :* ()) = String
    select _ (User v _ _) = v

Of course, I wouldn’t want to define the above instance by hand; I’d use Template Haskell once again to create the instance. Template Haskell is capable, as of GHC 6.12.1, of generating a class instance with an associated type. If I go back to my original definition of User:

data User = User {
    name :: String,
    host :: Host
    }

I can write a Template Haskell function that takes the name of my type, determines the names of the desired selectors, and generates the necessary instances. I won’t present the (somewhat tedious) TH code, but a function can be written such as:

mkSelectionInstances :: Name -> -- the name of the record type
                        Q [Dec] -- the Haskell declarations

that will generate the required instances. It makes use of a selToType function which creates a selector name type given a selector name string:

selToType :: String -> Q Type
selToType cs = foldl appT (conT ''()) $ map f cs
    where f c = (conT ''(:*)) `appT` (conT $ mkName $ "C" ++ show (fromEnum c))

I’m still saddled with a few problems, perhaps the least of which is that I still cannot define two records in the same scope with the same selector names. But, because I am discarding (or at least not planning to use) the ‘real’ selectors, I can adopt a convention for naming selectors that the mkSelectionInstances function will understand, and thus generate the selectors I really want.

For example, the convention could be "when creating type-level selectors, ignore everything in the name from the last tick (‘) mark to the end of the identifier". I would then define my records like this:

data User = User {
        name'u :: String,
        host'u :: Host
    }

data Host = Host {
        name'h :: String,
        address'h :: String
    }

mkSelectionInstances ''User
mkSelectionInstances ''Host

and mkSelectionInstances will create name and host selector instances for User, and name and address selector instances for Host. I could now define (‘by hand’) aliases for the selectors:

name = select (C110 :* C97 :* C109 :* C101 :* ())
host = select (C104 :* C111 :* C115 :* C116 :* ())
address = select (C97 :* C100 :* C100 :* C114 :* C101 :* C115 :* C115 :* ())

name can now be used to select the name component of a User value or a Host value, or any other value of a type t with a Selection t C110 :* C97 :* C109 :* C101 :* ()) instance, no matter where it has been defined:

cory = Host "cory.EECS.Berkeley.EDU" 128.32.48.187
joe = User "Joe" cory

joesName = name joe
joesHostName = name $ host $ joe

Still, there is a problem: where should the the name, host, and address selectors be defined? I certainly don’t want to have to actually define them by hand, but if I generate them in the module where I generate the selector instances, I run into problems.

Assume I define name/host/address as above and export them, from module A. Another module, B defines similar selectors (name and address, say, and some other unrelated selectors). A third module C wants to use the records defined in modules A and B. It doesn’t matter whether it imports name and address from A or from B, but it must only import one definition. This restriction places an annoying burden on the author of C.

The approach is, nevertheless, one possible answer to the question of where to define identifiers representing my selectors. I’ve come up with two other answers, which I’ll explore below.

Alternative 1: ‘Import’ identifiers using a TH splice

Instead of defining the selector identifiers in the module that defines the record type(s) and the Selector instances for that type, use a TH function to ‘import’ the selectors in the module where they are used. For example, if I have a module C that imports records from modules A and B, I’d do something like:

module C where

-- the module that contains my selection class, etc.
import Data.Record.Selection
import A(User(User),Host(Host))
import B(Student(Student))

importSelectors (record::User) (record::Host) (record::Student)

I have one function, importSelectors, that, given a few record types, will generate the necessary selector function definitions, taking care not to generate a selector definition more than once if more than one of the specified records has a particular named selector. Even though User and Host, and perhaps Student each have a ‘name’ selector, importSelectors must generate only one name function (name = select (C110 :* C97 :* C109 :* C101 :* ())).

For this magic to work, I need a couple more type classes:

class HasSelectors a where
    record :: a
    selectors :: a -> [String]
    record = undefined

class Importer a where
    importGen :: [String] -> a

HasSelectors has an instance for each record type, and allows my ‘importer’ to get a list of selector names for each record. The mkSelectionInstances function would also generate the instance for this class, for a particular record. The record member of HasSelectors exists just to make the importSelectors call at the top of a module look nicer… I could just as easily write:

importSelectors (undefined::User) (undefined::Host) (undefined::Student)

The Importer has two instances (requiring OverlappingInstances):

instance Importer (Q [Dec]) where
    importGen ss = liftM2 (++) (mapM gent ss') (mapM gen ss') where
        ss' = nub ss
        gent s = sigD nm [t| Selection a $t => a -> SelectionType a $t |] where
            nm = mkName s
            t = return $ selToType s
        gen s = valD (varP nm) (normalB [e|select (undefined :: $t)|]) [] where
            nm = mkName s
            t = return $ selToType s

instance (HasSelectors a, Importer b) => Importer (a -> b) where
    importGen ss = (\ h -> importGen (selectors h ++ ss))

I’m not sure if there’s a name for this particular type class trick, enabling the definition of a function that takes an variable number of arguments, but I lifted it from the HaXR package. The type classes allow, finally, the definition of importSelectors:

importSelectors :: Importer a => a
importSelectors = importGen []

Some minor issues remain — how, for example, should you define the selector functions if you want to use them in the same module where you define your records? Because of Template Haskell stage restrictions, I wouldn’t be able to make use of the generated HasSelectors instance in another Template Haskell splice. I could have mkSelectionInstances generate them, but not export them, but I could run into a problem if I also wanted to import other records, with overlapping selector names, into a module where I’m also defining records. Nevertheless, with a few tweaks for corner cases like these, this approach has potential.

Alernative 2: Don’t create identifiers at all!

Instead, use a ‘literal’ representing a selector. The Haskell QuasiQuotes extension allows us to create custom literals, and I can create a Quasi Quoter to represent my record selectors:

selToExp :: String -> Q Exp
selToExp cs = return (VarE 'select `AppE`
   (foldr AppE (ConE '()) $ map f cs))
   where f c = (ConE '(:*)) `AppE` (ConE $ mkName $ "C" ++ show (fromEnum c))

selToPat :: String -> Q Pat
selToPat _ = fail "sorry, unimplemented"

π = QuasiQuoter selToExp selToPat

I no longer have to worry about creating identifiers for my selectors. I have a whole family of literals available to me, wherever I import my Quasi Quoter:

cory = Host "cory.EECS.Berkeley.EDU" 128.32.48.187
joe = User "Joe" cory

joesName = [$π|name|] joe
joesHostName = [$π|name|] $ [$π|host|] $ joe

Now, I realize that [$π|name|] is a bit more unwieldy than name. I have (in the case of name) four character of meaningful identifier and six characters of overhead, which is not convenient. But this overhead is purely syntactic, and suitable desugarer could (in the event that this idea becomes wildly popular) be created to make type level literals palatable. Perhaps ‘raw’ type level strings could look like «name», and my literal selectors could look like ℓname (a script ‘l’ symbol prefix indicating that some sort of label is being introduced), such that the following properties always hold (forall strings s):

ℓs == select «s»
«s» == $(selToExp s) :: $(selToType s)

Given this (unlikely to be implemented!) syntax sugar, my example would look like:

cory = Host "cory.EECS.Berkeley.EDU" 128.32.48.187
joe = User "Joe" cory

joesName = ℓname joe
joesHostName = ℓname $ ℓhost $ joe

By defining records this way, I’ve thrown away some of the utility of Haskell record selectors. As implemented above, the selectors are only useful as record destructors, whereas normal Haskell record selectors are useful also as (part of) record construction as well as record update. That’s not a particular problem for this approach: it could easily be combined with, say, first-class labels. mkSelectionInstances would need to be redefined a bit to produce instances that provide the right SelectionType (e.g. User :-> String instead of String for the name selector, where :-> is a first-class-label type constructor).

Pushing the idea of type level strings as labels a little further, it would be relatively easy to reimplement some HList features — specifically, the labels of its extensible record system — using type level strings. HList depends on type level naturals and a namespace type to define labels; type level strings could be used as easily. Algorithms written in terms of labels as defined above could (I think) be agnostic as to whether they operate on a record whose underlying representation is simple Haskell ADT or a (suitably reimplemted) HList.

2009-11-15

Sorting Type-Heterogeneous Values with Type Families

Filed under: haskell — robgreayer @ 20:55

In an ongoing project, I’ve needed to do a bit of Haskell type-hackery with type functions implemented with type families. One thing I’ve needed is a type-heterogeneous sort — more than a type level sort, but a function that takes a value of one product type, and produces a value of a different, sorted product type. In doing so it needs to both sort the parameter type into a (potentially) different result type, as well as sort the parameter value into a result value (which of course needs to be of the result type). This works out to be a bit more complex than a pure type level sort (such as the quicksort implemented here, using funcitonal dependencies, or the merge sort implmenented here, again using functional dependencies), as Haskell type classes have to be used to select the right value-level functions to use for each step of the sort process. It was an interesting puzzle to solve — at least for me, as type level programming isn’t my strong point! (To make the problem a bit easier, I implemented a simpler, if less efficient, sorting algorithm, but quick sort or merge sort should also work at the type/value level).

Given a type for tagged values, e.g.

    -- n is a phantom, type-level natural
    newtype Tagged n t = Tagged t

And a product type, e.g.

    data a :* b = a :* b 
    infixr 6 :*

I want to define tagged values such as:

    tv0 :: Tagged N0 String
    tv0 = Tagged "hello"
    tv1 :: Tagged N1 Int
    tv1 = Tagged 2
    tv2 :: Tagged N2 Bool
    tv2 = Tagged False
    -- etc.

I want to be able to apply a function sort to an arbitrary product of these types:

    x = sort ( tv2 :* tv0 :* tv1 )
    y = sort ( tv1 :* tv2 :* tv0 )

And the result should be the sorted product, i.e. x == y && x == Tagged "hello" :* Tagged 2 :* Tagged False. Note that in the first application of sort, sort needs a type like:

    sort :: (Tagged N2 Bool :* Tagged N0 String :* Tagged N1 Int) ->
        (Tagged N0 String :* Tagged N1 Int :* Tagged N2 Bool)

and in the second, it needs:

    sort :: (Tagged N1 Int :* Tagged N2 Bool :* Tagged N0 String) ->
        (Tagged N0 String :* Tagged N1 Int :* Tagged N2 Bool)

It needs to work for arbitrarily long products of any Tagged type (with the constraint that the index n must be a type-level natural). My solution to the problem follows.

> {-# LANGUAGE EmptyDataDecls, TypeFamilies, UndecidableInstances,
>              ScopedTypeVariables, OverlappingInstances, TypeOperators,
>              FlexibleInstances, NoMonomorphismRestriction #-}
> module TSort where

I started with the usual definition for type level naturals (with some convenient aliases for a few of them):

> data Zero
> data Succ n
> type N0 = Zero
> type N1 = Succ Zero
> type N2 = Succ N1
> type N3 = Succ N2
> type N4 = Succ N3
> type N5 = Succ N4
> type N6 = Succ N5
> type N7 = Succ N6
> type N8 = Succ N7
> type N9 = Succ N8
> type N10 = Succ N9

I also use type level booleans:

> data TRUE
> data FALSE

I need one inequality operator on naturals:

> type family LessThan m n
> type instance LessThan Zero Zero = FALSE
> type instance LessThan (Succ n) Zero = FALSE
> type instance LessThan Zero (Succ n) = TRUE
> type instance LessThan (Succ m) (Succ n) = LessThan m n

I need a type level if statement:

> type family Cond c t f
> type instance Cond TRUE t f = t
> type instance Cond FALSE t f = f

I need my tagged and product types:

> newtype Tagged n a = Tagged a deriving (Show,Eq)
> data a :* b = a :* b deriving (Show,Eq)
> infixr 6 :*

The general type of my sort is going to be something like (but not exactly):

    sort :: (Sortable a) => a -> Sorted a

where Sortable is a type class and Sorted is a associated type of that type class. It didn’t work out to be quite that simple, but this is how I started thinking about the problem.

I envisioned the Sortable class as:

    class Sortable a where
        type Sorted a
        sort :: a -> Sorted a

Trying to create a simple instance of this type, for a pair of tagged values got me a bit bogged down:

    instance Sortable (Tagged m a :* Tagged n b) where
        type Sorted (Tagged m a :* Tagged n b) = ???
        sort = ???

The type of Sorted for this pair is conditional based on the value of the indices m and n, so I can write the type function like this:

    type Sorted (Tagged m a :* Tagged n b) =
        Cond (LessThan n m) -- if n < m
            (Tagged n b :* Tagged m a) -- then swap
            (Tagged m a :* Tagged n b) -- else don't

but how to write the sort function? It can’t test the type and decide whether to swap the values. The key insight for solving this problem was that I needed some additional types to represent the operations I needed. For this simple reordering of a pair, I need two operations – Swap, to swap the elements of the pair if they are out of order, and Id (identity), to leave them alone if they are already in order:

    data Swap x y = Swap x y
    data Id a = Id a

Each of these can be seen as implementing a single step in a sorting algorithm). Each will be an instance of a Sorter class:

> class Sorter a where
>     type Sorted a
>     type Unsorted a

mkSort ‘makes’ a sort function from (Unsorted a -> Sorted a) based on the type of its first argument, which is a proxy argument (its value is never examined).

>     mkSort :: a -> Unsorted a -> Sorted a

The Swap instance of Sorter just swaps the values in a pair:

> instance Sorter (Swap x y) where
>     type Sorted (Swap x y) = y :* x
>     type Unsorted (Swap x y) = x :* y
>     mkSort _ (x :* y) = y :* x

The Id instance of Sorter leaves its value alone:

> instance Sorter (Id a) where
>     type Sorted (Id a) = a
>     type Unsorted (Id a) = a
>     mkSort _ v = v

Now my Sortable class, instead of having a sort function, needs instead a function to produce a sorter value:

> class Sortable a where
>     type SorterType a
>     sorter :: a -> SorterType a

And the sort can be written in terms of functions from the two classes:

> sort v = mkSort (sorter v) v

But still, how to implement the sorter function for my pair of tagged elements?

> instance Sortable (Tagged m a :* Tagged n b) where

The type function for the SorterType is now:

>     type SorterType (Tagged m a :* Tagged n b) =
>         Cond (LessThan n m)
>             (Swap (Tagged m a) (Tagged n b))
>             (Id (Tagged m a :* Tagged n b))
>     sorter = undefined -- ???

As it turns out, the above implementation is sufficient. Since the mkSort function doesn’t examine its first argument — it’s merely a proxy — the sorter function doesn’t need to produce a value. So sorter can be left undefined — all its ‘work’ is done at compile time, in producing the right type such that the instance of ‘mkSort’ can be inferred (I have to admit, I discovered this accidentally while writing the sorting function. I left sorter undefined because I couldn’t think of how to define it, then wrote sort in terms of it, and accidentally ran it. It worked, my mind boggled, and then I figured out what what going on). Because no concrete value of either the Id or Swap types are ever needed, the actual definitions for these can be:

> data Id a
> data Swap a b

To sort something more than a pair, I need to be able to do more than just swap two types and values. A simple algorithm for sorting the product is to sort the ‘tail’ of the product, then insert the head of the product into the sorted tail (this is insertion sort — not the most efficient, but simpler than merge sort or quick sort). For this I need two more operations, a sort-then-insert operation, and an insert operation (the sort-then-insert operation sorts its ‘tail’ and then (if necessary) uses the insert operation to insert its head into the sorted tail). I need these operation types:

> data SortInsert h t
> data Insert x y

I need a new class similar to Sortable, which has a InserterType and a function to produce a InserterType from the instance type:

> class Insertable a where
>     type InserterType a
>     inserter :: a -> InserterType a

A insertable assumes that its tail is sorted. For a product of tagged values (more than a pair), the Insertable instance looks like this:

> instance Insertable (Tagged m a :* Tagged n b :* c) where

The InserterType depends on whether the head element is less than the first element of the sorted tail. If it is, then the InserterType is the identity, otherwise it is an Insert x y:

>     type InserterType (Tagged m a :* Tagged n b :* c) =
>              Cond (LessThan n m)
>                  (Insert (Tagged m a) (Tagged n b :* c))
>                  (Id (Tagged m a :* Tagged n b :* c))

As in the sorter function of a Sortable, we don’t need the inserter to do anything more than create a proxy argument:

>     inserter = undefined

Inserting two elements is the same as sorting them:

> instance Insertable (Tagged m a :* Tagged n b) where
>     type InserterType (Tagged m a :* Tagged n b) = 
>         Cond (LessThan n m) (Swap (Tagged m a) (Tagged n b))
>             (Id (Tagged m a :* Tagged n b))
>     inserter = undefined

A Inserter is similar to a sorter: it takes an value of an Uninserted type and produces a value of an Inserted type:

> class Inserter a where
>     type Inserted a
>     type Uninserted a

mkInsert ‘makes’ a function (Uninserted a -> Inserted a) from the instance type. The first argument again is just a proxy, it’s value never inspected:

>     mkInsert :: a -> Uninserted a -> Inserted a

The identity instance is straightforward:

> instance Inserter (Id a) where
>     type Inserted (Id a) = a
>     type Uninserted (Id a) = a
>     mkInsert _ v = v

The Swap instance of Inserter is essentially the same as the Swap instance of Sorter:

> instance Inserter (Swap a b) where
>    type Inserted (Swap a b) = b :* a
>    type Uninserted (Swap a b) = a :* b
>    mkInsert _ (a :* b) = b :* a

The Insert instance for a longer product has a rather convoluted context:

> instance (Insertable (a :* c)
>          , Inserter (InserterType (a :* c)) 
>          ,(a :* c) ~ Uninserted (InserterType (a :* c))
>          ) => Inserter (Insert a (b :* c)) where

The Inserted type is the first element of the tail followed by a inserted into the tail (via sort):

>     type Inserted (Insert a (b :* c)) = b :* Inserted (InserterType (a :* c))

The uninserted type is simply the original (uninserted) product:

>     type Uninserted (Insert a (b :* c)) = a :* b :* c

And the insert function effectively swaps the first two elements of the product and inserts the (new) tail:

>     mkInsert _ (x :* y :* z) = y :* mkInsert (inserter $ x :* z) (x :* z)

The Sorter instance for a SortInsert that we need again has a convoluted context:

> instance (Sortable (b :* c)
>           ,Unsorted (SorterType (b :* c)) ~ (b :* c)
>           ,Uninserted (InserterType (a :* Sorted (SorterType (b :* c)))) ~
>                (a :* Sorted (SorterType (b :* c)))
>           ,Sorter (SorterType (b :* c))
>           ,Insertable (a :* Sorted (SorterType (b :* c)))
>           ,Inserter (InserterType (a :* Sorted (SorterType (b :* c))))
>           ) => 
>         Sorter (SortInsert a (b :* c)) where

The type of the output of the sort is the type of the output of the head of the product inserted with the sorted tail:

>     type Sorted (SortInsert a (b :* c)) =
>         Inserted (InserterType (a :* (Sorted (SorterType (b :* c)))))

The type of the input of the sort is the original (unsorted, uninserted) product:

>     type Unsorted (SortInsert a (b :* c)) = a :* b :* c

And the sort function is just the insert of the head into the tail after the tail is sorted:

>     mkSort _ (a :* b :* c) = mkInsert (inserter preinsert) preinsert
>         where preinsert = a :* sort (b :* c)

Finally, the Sortable instance for an arbitrary product is:

> instance (Sortable (b :* c), Sorter (SortInsert a (b :* c))) => 
>     Sortable (a :* b :* c) where
>     type SorterType (a :* b :* c) = SortInsert a (b :* c) 
>     sorter = undefined

Given some tagged values, I can try out the sort function:

> t0 :: Tagged N0 String
> t0 = Tagged "hello"
> t1 :: Tagged N1 Bool
> t1 = Tagged True
> t2 :: Tagged N2 Int
> t2 = Tagged 5
> t3 :: Tagged N3 String
> t3 = Tagged "goodbye"
   *TSort1> sort (t2 :* t0 :* t1 :* t3)
   Tagged "hello" :* (Tagged True :* (Tagged 5 :* Tagged "goodbye"))
   *TSort1> sort (t3 :* t2 :* t1 :* t0)
   Tagged "hello" :* (Tagged True :* (Tagged 5 :* Tagged "goodbye"))
   *TSort1>

Beyond being an interesting puzzle to solve, there’s at least some use for such a function. For example, in a DSL that has operations that support named parameter association:

> rawConnect :: (Tagged N0 String :* Tagged N1 Int :* Tagged N2 String :*
>      Tagged N3 String) -> IO ()
> rawConnect (Tagged host :* Tagged port :* Tagged user :* Tagged pass) =
>     putStrLn $ "connecting to " ++ host ++ " on port " ++ show port ++ 
>        " with username " ++ user ++ " and password " ++ pass
> 
> connect = rawConnect . sort
> 
> (=:) = ($)
> host :: String -> Tagged N0 String
> host = Tagged
> port :: Int -> Tagged N1 Int
> port = Tagged
> user :: String -> Tagged N2 String
> user = Tagged
> password :: String -> Tagged N3 String
> password = Tagged

Now you can invoke the ‘connect’ operation with parameters in no particular order:

> doit = do
>    putStrLn "trying to connect!"
>    connect ( user =: "admin" :*
>              password =: "pa55w0rd" :*
>              port =: 22 :*
>              host =: "example.com" )
>    putStrLn "connected!"

… which is at least mildly cool, for certain definitions of cool.

Brought to you by BlogLiterately with generous support from pandoc, hscolour and viewers like you.

2009-11-03

BlogLiterately v0.2

Filed under: haskell — robgreayer @ 10:04

I’ve created a new version of BlogLiterately, now on hackage. The new version is quite similar to what I blogged about here, with some enhancements:

  • it now supports categories.
  • it now supports highlighting-kate, if you’ve installed Pandoc with highlighting support.
  • you can highlight Haskell with hscolour using CSS classes or inline styles.
  • you can highlight Haskell and non-Haskell with highlighting-kate (via Pandoc). (But only via CSS classes and a separate or embedded stylesheet, not via inline styles. So it does me personally no good!).

I contemplate a future upgrade will allow highlighting-kate marked up code to have the CSS class-based styling ‘baked’ into the HTML in a way similar to how I handle hscolour.

Anyone who enjoys this tool, thank the authors of Pandoc, hscolour, HaXml, HaXR and, of course, GHC and the Haskell Platform, as this program is just a bit of glue that binds these much bigger pieces together for one simple application.

2009-10-26

Blogging Literately in Haskell

Filed under: haskell — robgreayer @ 22:05

I’ve made one blog post about Haskell, and found the processs to be too tedious to be tolerated. The wysiwyg tinyMCE editor that is avalaible for WordPress is useless for formatting code segments and editing directly in the raw HTML editor WordPress provides is agonizing. Figuring there must be a better way, I looked around for an offline tool that would just work for me. I turned up a few editors, free and non-free, some which looked just horrid, a few almost reasonable, but none that would make the mix of Haskell and narrative as easy as it seems it ought to be. What I really want is an easy way to take Literate Haskell (you know, those seemingly innocuous text messages that are suddenly interrupted with ominous passages like:

> {-# LANGUAGE DeriveDataTypeable #-}
> module BlogLiterately where

for no immediately apparent reason) file, perhaps with simple markdown encoded text commentary, and just magically upload it to my blog, with good HTML formatting of the markdown and Haskell sections.

Haskell, theoretically, has all sorts of libraries for doing all sorts of wonderful things. Perhaps it wouldn’t be to hard to mash something together from Haskell libraries that did precisely what I wanted.

Somehow I’d osmotically absorbed from the ether the knowlege that there was a Haskell-based tool around called Pandoc, by John MacFarlane, which deals with markdown, HTML and other formats. It is, like all good Haskelly things, available on hackage, both as a program and library. So it seemed like a good place to start.

> import Text.Pandoc

Looking over the documentation for Pandoc, I observed that although was it able to do wonderful things with loads of formats, it didn’t automatically do what I wanted with the actual Haskell source code. What I want is to make the source look like what Haskell code looks like as part of haddock documentation, especially the Haddock documentation available on Hackage. Haddock uses hscolour, a tool by Malcolm Wallace, to format source listings.

> import Language.Haskell.HsColour(hscolour,Output(..))
> import Language.Haskell.HsColour.Colourise(defaultColourPrefs)

Assuming Pandoc and hscolour do what I want, I’d still need some way of actually publishing the blog post. With a bit of googliness I learned that my blog software supports something called the MetaWeblog API, which is an XML-RPC-based protocol for interacting with blogs.

There’s a Haskell XML-RPC library, HaXR, by Bjorn Bringert, (on hackage) which seems like it should be appropriate.

> import Network.XmlRpc.Client
> import Network.XmlRpc.Internals

And it works that out I’ll need some miscellaneous other stuff. Since I’m writing a command line tool, I’ll need to process the command line arguments, and Neil Mitchell’s CmdArgs library ought to work for that:

> import System.Console.CmdArgs

I’m going to end up needing to parse and manipulate XHTML, so I’ll use Malcolm Wallace’s HaXml XML combinators:

> import Text.XML.HaXml
> import Text.XML.HaXml.Verbatim

I’ll need to do some text IO, which I’ll use the UTF8 encoding for, leading to Eric Mertens’ ubiquitous (until GHC 6.12!) utf8-string library:

> import qualified System.IO.UTF8 as U

And I’ll need a couple other bits and pieces:

> import Control.Monad(liftM,unless)
> import Text.XHtml.Transitional(showHtml)
> import Text.ParserCombinators.Parsec

The program I envision will read in a literate Haskell file, use Pandoc to parse it as markdown, then somehow find the code blocks in the (parsed) input, and use hscolour to transform those. Pandoc turns its input into a structure of type:

data Pandoc = Pandoc Meta [Block]

where a Block (the interesting bit, for my purposes) looks like:

-- | Block element.
data Block  
    = Plain [Inline]        -- ^ Plain text, not a paragraph
    | Para [Inline]         -- ^ Paragraph
    | CodeBlock Attr String -- ^ Code block (literal) with attributes 
    | RawHtml String        -- ^ Raw HTML block (literal)
    | BlockQuote [Block]    -- ^ Block quote (list of blocks)
    | OrderedList ListAttributes [[Block]] -- ^ Ordered list (attributes
                            -- and a list of items, each a list of blocks)
    | BulletList [[Block]]  -- ^ Bullet list (list of items, each
                            -- a list of blocks)
    | DefinitionList [([Inline],[Block])]  -- ^ Definition list 
                            -- (list of items, each a pair of an inline list,
                            -- the term, and a block list)
    | Header Int [Inline]   -- ^ Header - level (integer) and text (inlines) 
    | HorizontalRule        -- ^ Horizontal rule
    | Table [Inline] [Alignment] [Double] [[Block]] [[[Block]]]  -- ^ Table,
                            -- with caption, column alignments,
                            -- relative column widths, column headers
                            -- (each a list of blocks), and rows
                            -- (each a list of lists of blocks)
    | Null                  -- ^ Nothing
    deriving (Eq, Read, Show, Typeable, Data)

The literate Haskell that Pandoc finds in a file ends up in various CodeBlock elements of the Pandoc document. Other code can also wind up in CodeBlock elements — normal markdown formatted code. The literate Haskell code seems to be differentiated from other by the Attr component, which has the form:

type Attr = (String, [String], [(String, String)])

Experimentation reveals that CodeBlock elements that have an Attr of the form (_,["sourceCode","haskell"],_) are literate Haskell code blocks, and other CodeBlock elements are the markdown code blocks. I want to syntax-highlight both kinds of code blocks, but when both get rendered to the output HTML, I want to preserve the literate Haskell as literate (it needs the prepended > character). Actually, I want to do just a little bit more…

When writing a Literate Haskell File, I might want to include non-Haskell but nevertheless ‘code’ examples in the text. Samples of how one might express the same thing in ML, or examples of how to run a program, etc. So not all code blocks should be colourised by hscolour, just specific Haskelly ones. Markdown doesn’t provide a way to express what kind of code a code block might be, but since I’m munging all the code blocks anyway, I can adopt a simple convention. If a code example looks like:


    [Haskell]
    foo :: String -> String

it is a Haskell block. If it looks like something else, e.g.


    [C++]
    cout << "Hello World!";

or


    [other]
    foo bar baz

etc., it is something else.

ed: It turns out that although Markdown doesn't provide a way to do this, Pandoc's extensions to Markdown do allow it, in a way fairly similar to what I've proposed. If I revise this further, I'll use the Pandoc convention. Thanks to John MacFarlane for pointing this out in the comments.

I can strip off the code type indicator from the beginning of each block as I examine them for code to colourise. I use Parsec to recognize the opening tag:

> unTag :: String -> (String, String)
> unTag s = either (const ("",s)) id $ parse tag "" s
>    where tag = do
>              tg <- between (char '[') (char ']') $ many $ noneOf "[]"
>              skipMany $ oneOf " t"
>              (string "rn" <|> string "n")
>              txt <- many $ anyToken
>              eof
>              return (tg,txt)

To highlight the syntax using hscolour (which produces HTML), I'm going to need to transform the String from a CodeBlock element to a String suitable for the RawHtml element (because the hscolour library transforms Haskell text to HTML). Pandoc strips off the prepended > characters from the literate Haskell, so I need to put them back, and also tell hscolour whether the source it is colouring is literate or not. The hscolour function looks like:

hscolour :: Output      -- ^ Output format.
         -> ColourPrefs -- ^ Colour preferences (for formats that support them).
         -> Bool        -- ^ Whether to include anchors.
         -> Bool        -- ^ Whether output document is partial or complete.
         -> String	-- ^ Title for output.
         -> Bool        -- ^ Whether input document is literate haskell or not
         -> String      -- ^ Haskell source code.
         -> String      -- ^ Coloured Haskell source code.

Hscolour supports a few different HTML-based Output formats: HTML, CSS, and ICSS. HTML is HTML with font tags, which are inherently evil and so I'll won't bother exploring this option. CSS is formatting using HTML class tags, which allows you the flexibility of using a separate stylesheet to control how hscolour-annotated code is style. ICSS is 'inline-CSS' a.k.a. HTML 'style' tags, which is really what I need for my WordPress blog, as I refuse to the pay $15 a year WordPress.com charge for the privilege of storing a stylesheet on their site. Unfortunately, hscolour's inline styling options are quite limited (very few colours, little control of fonts), but I've come up with a slightly more convoluted way of turning CSS coloured source into inline-css HTML. So I will start off by using hscolour in CSS mode to transform my source:

> colourIt literate srcTxt = 
>     hscolour CSS defaultColourPrefs False True "" literate src'
>     where src' | literate = prepend srcTxt
>                | otherwise = srcTxt

Prepending the literate Haskell markers on the source is trivial:

> prepend s = unlines $ map p $ lines s where p s = '>':' ':s

Hscolour uses HTML span elements and CSS classes like 'hs-keyword' or hs-keyglyph to markup Haskell code. What I want to do is take each marked span element and replace the class attribute with an inline style element that has the markup I want for that kind of source. I can capture the style posibilities with a type:

> data StylePrefs = StylePrefs {
>     keyword  :: String,
>     keyglyph :: String,
>     layout   :: String,
>     comment  :: String,
>     conid    :: String,
>     varid    :: String,
>     conop    :: String,
>     varop    :: String,
>     str      :: String,
>     chr      :: String,
>     number   :: String,
>     cpp      :: String,
>     selection :: String,
>     variantselection :: String,
>     definition :: String 
>     } deriving (Read,Show)

Each field in the above type will contain the desired style for the class hscolour assigns to a span of code. A default style that produces something like what the source listings on Hackage look like is:

> defaultStylePrefs = StylePrefs {
>     keyword  = "color: blue; font-weight: bold;"
>   , keyglyph = "color: red;"
>   , layout   = "color: red;"
>   , comment  = "color: green;"
>   , conid    = ""
>   , varid    = ""
>   , conop    = ""
>   , varop    = ""
>   , str      = "color: teal;"
>   , chr      = "color: teal;"
>   , number   = ""
>   , cpp      = ""
>   , selection = ""
>   , variantselection = ""
>   , definition = ""
>   }

I can read these preferences in from a file using the Read instance for StylePrefs. I could handle errors better, but this should work:

> getStylePrefs "" = return defaultStylePrefs
> getStylePrefs fname = liftM read (U.readFile fname)

Hscolour produces a String of HTML. To transform it, we need to parse it, manipulate it and then re-render it as a String. I'll use HaXml to do all of this:

> xformXml :: StylePrefs -> String -> String
> xformXml prefs s =  verbatim $ filtDoc (xmlParse "input" s) where
>     -- filter the document (an Hscoloured fragment of Haskell source)
>     filtDoc (Document p s e m) =  c where
>         [c] = filts (CElem e)
>     -- the filter is a fold of individual filters for each CSS class
>     filts = foldXml $ foldl o keep [
>             filt "keyword" keyword,
>             filt "keyglyph" keyglyph,
>             filt "layout" layout,
>             filt "comment" comment,
>             filt "conid" conid,
>             filt "varid" varid,
>             filt "conop" conop,
>             filt "varop" varop,
>             filt "str" str,
>             filt "chr" chr,
>             filt "num" number,
>             filt "cpp" cpp,
>             filt "sel" selection,
>             filt "variantselection" variantselection,
>             filt "definition" definition
>         ]
>     -- an individual filter replaces the attributes of a tag with
>     -- a style attribute when it has a specific 'class' attribute.
>     filt lbl f =
>         replaceAttrs [("style",f prefs)] `when`
>             (attrval $ ("class",AttValue [Left ("hs-" ++ lbl)]))

To completely colourise a CodeBlock we now can create a function that transforms a CodeBlock into a RawHtml block, where the content contains marked up Haskell (possibly with literate markers):

> colouriseCodeBlock prefs (CodeBlock attr@(_,inf,_) s) =
>     if tag == "Haskell" || lit
>         then RawHtml $ xformXml prefs $ colourIt lit s'
>         else CodeBlock attr s'
>     where (tag,s') = unTag s
>           lit = "sourceCode" `elem` inf && "haskell" `elem` inf
> colouriseCodeBlock _ b = b

And colourising a Pandoc document is simply:

> colourisePandoc prefs (Pandoc m blocks) = 
>     Pandoc m $ map (colouriseCodeBlock prefs) blocks

Transforming a complete input document string to an HTML output string:

> xformDoc :: StylePrefs -> String -> String
> xformDoc prefs s = 
>     showHtml 
>     $ writeHtml writeOpts -- from Pandoc
>     $ colourisePandoc prefs
>     $ readMarkdown parseOpts -- from Pandoc
>     $ fixLineEndings s
>     where writeOpts = defaultWriterOptions {
>               writerLiterateHaskell = True,
>               writerReferenceLinks = True }
>           parseOpts = defaultParserState { 
>               stateLiterateHaskell = True }
>           -- readMarkdown is picky about line endings
>           fixLineEndings [] = []
>           fixLineEndings ('r':'n':cs) = 'n':fixLineEndings cs
>           fixLineEndings (c:cs) = c:fixLineEndings cs

Now that I can transform a document, I need to be able to post the document to my blog. The metaWeblog API defines a newPost and editPost procedures that look like:

metaWeblog.newPost (blogid, username, password, struct, publish) returns string
metaWeblog.editPost (postid, username, password, struct, publish) returns true

For my blog (a WordPress blog), the blogid is just default. The user name and password are simply strings, and publish is a flag indicating whether to load the post as a draft, or to make it public immediately. The postid is an identifier string which is assigned when you initially create a post. The interesting bit is the struct field, which is an XML-RPC structure defining the post along with some meta-data, like the title. All I need is to be able to provide the post text and a title, so I can create the right struct like so:

> mkPost title text = 
>     [("title",title),("description",text)]

The HaXR library exports a function for invoking XML-RPC procedures:

remote :: Remote a => 
    String -- ^ Server URL. May contain username and password on
           --   the format username:password@ before the hostname.
       -> String -- ^ Remote method name.
       -> a      -- ^ Any function 
     -- @(XmlRpcType t1, ..., XmlRpcType tn, XmlRpcType r) => 
                 -- t1 -> ... -> tn -> IO r@

The funtion requires an URL and a method name, and returns a function of type Remote a => a. Based on the instances defined for Remote, any function with zero or more parameters in the class XmlRpcType and a return type of XmlRpcType r => IO r will work, which means you can simply 'feed' remote additional arguments as required by the remote procedure, and as long as you make the call in an IO context, it will typecheck. So to call the metaWeblog.newPost procedure, I can do something like:

> postIt :: String -> String -> String -> String -> String -> String -> Bool -> IO String
> postIt url blogId user password title text publish =
>     remote url "metaWeblog.newPost" blogId user password (mkPost title text) publish

To update (replace) a post, the function would be:

> updateIt :: String -> String -> String -> String -> String -> String -> Bool -> IO Bool
> updateIt url postId user password title text publish =
>     remote url "metaWeblog.editPost" postId user password (mkPost title text) publish

I've got most of the pieces in place -- I just need to turn it into a command line program. I can capture the command line controls in a type:

> data BlogLiterately = BlogLiterately {
>        style :: String,    -- name of a style file
>        publish :: Bool,    -- an indication of whether the post should be
>                            -- published, or loaded as a draft
>        blogid :: String,   -- blog-specific identifier (e.g. for blogging
>                            -- software handling multiple blogs)
>        blog :: String,     -- blog xmlrpc URL
>        user :: String,     -- blog user name
>        password :: String, -- blog password
>        title :: String,    -- post title
>        file :: String,     -- file to post
>        postid :: String    -- id of a post to updated
>     } deriving (Show,Data,Typeable)

And using CmdArgs, this bit of impure evil defines how the command line arguments work:

> bl = mode $ BlogLiterately {
>     style = "" &= text "Style Specification" & typFile,
>     publish = def &= text "Publish post",
>     blogid = "default" &= text "Blog specific identifier",
>     blog = def &= argPos 0 & typ "URL" 
>         & text "URL of blog's xmlrpc address (e.g. http://example.com/blog/xmlrpc.php)",
>     user = def &= argPos 1 & typ "USER" & text "blog author's user name" ,
>     password = def &= argPos 2 & typ "PASSWORD" & text "blog author's password",
>     title = def &= argPos 3 & typ "TITLE",
>     file = def &=  argPos 4 & typ "FILE" & text "literate haskell file",
>     postid = "" &= text "Post to replace (if any)"}

The main blogging function uses the information captured in the BlogLiterately type to read the style preferences, read the input file and transform it, and post it to the blog:

> blogLiterately (BlogLiterately style pub blogid url user pw title file postid) = do
>     prefs <- getStylePrefs style
>     html <- liftM (xformDoc prefs) $ U.readFile file
>     if null postid 
>         then do
>             postid <- postIt url blogid user pw title html pub
>             putStrLn $ "post Id: " ++ postid
>         else do
>             result <- updateIt url postid user pw title html pub
>             unless result $ putStrLn "update failed!"

And the main program is simply:

> main = cmdArgs "Blog Literately v0.1, (C) Robert Greayer 2009" [bl] >>= blogLiterately

I can run it to get some help:

$ ./BlogLiterately --help
Blog Literately v0.1, (C) Robert Greayer 2009

blogliterately [FLAG] URL USER PASSWORD TITLE FILE

  -? --help[=FORMAT]  Show usage information (optional format)
  -V --version        Show version information
  -v --verbose        Higher verbosity
  -q --quiet          Lower verbosity
  -s --style=FILE     Style Specification
     --publish        Publish post
  -b --blogid=VALUE   Blog specific identifier (default=default)
     --postid=VALUE   Post to replace (if any)

Which tells me I can actually upload a post something like:

$ ./BlogLiterately https://greayer.wordpress.com/xmlrpc.php myuser mypass 
    "Blogging Literately in Haskell" BlogLiterately.lhs

This is a great start for what I want. Handling of exceptions is non-existent; I simply cross my fingers and hope that the default error message will be self explanatory. I ought to also allow the author and categories for the document to be specified. But it works as is, and it's a tool I'd actuallly use (and did, to post this).

2009-10-14

Messing with First Class Labels

Filed under: haskell — robgreayer @ 14:48

In my sandbox Haskell project, lslplus, I’ve got massive amounts of complex state which I manage using a few levels of State and Error monad transformers.  The biggest stateful piece of the application is the simulator, which is implemented as a stack of monads:  an ErrorT transformer wrapping StateT transformer that captures the statefulness of the simulation of the simulated ‘world’, which in turn wraps another ErrorT/StateT monad stack for the LSL interpreter.

I’m unsatisfied with the amount of boilerplate involved in creating functions to manipulate the state in the simulator.  The state is complex, with a top-level ‘World’ data type composed of many other records, and collections of records, with ad hoc functions created to support ‘deep’ modifications of this structure.  In some other places in the application I’ve generated state accessor functions using  Template Haskell, but this area of the application was written earlier, and I’ve wanted to try out one of the libraries available on Hackage.  The ones I looked at were:

My intent was to try more than one, but my impression of fclabels was that it really subsumes the functionality of both the data-accessor and lenses packages, so picked that one to try out ‘in anger’.

A first class label identifies a component of  a structure, allowing both access and update. Given a label lbl, a value of type a :-> b (where :-> is an infix type constructor), access and update look something like:

get lbl rec -- get the value of a component of rec, identified by lbl
set lbl newVal rec -- update a component of rec, identified by lbl

The advantage is that first class labels themselves compose in a way that normal field-labels do not. Haskell field labels compose well for deep access to a record structure, but not for deep update of a record structure. E.g. given:

data Person = Person { name :: Name, age :: Double }
data Name = Name { forename :: String, surname :: String }

you can access the surname of a person p with the expression surname . name $ p, but update requires something much more convoluted, e.g. given a surname nm you’d need to do something like: p { name = (name p) { surname = nm }}. Field labels don’t compose for update. With field labels converted to first class labels, though, the expressions become get (surname.name) and set (surname.name) nm p respectively. Nice.  (The composition operator here is from the base module Control.Category.  It’s defined for instances of class Category, including (->) as well as (:->)).

With a slight change to my records —
data Person = Person { _name :: Name, _age :: Double }
data Name = Name { _forename :: String, _surname :: String }

the fclabels package will derive first class labels for me:

$(mkLabels [''Name,''Person])

The following labels are derived:

name :: Person :-> Name
age :: Person :-> Double
surname :: Name :-> String
forename :: Name :-> String

With this virtual dymo label maker, you can build your own labels:

-- labels for each element of a pair
fstl :: (a,b) :-> a
fstl = label fst (\ x (_,y) -> (x,y))

sndl :: (a,b) :-> b
sndl = label snd (\ y (x,_) -> (x,y))

which can come in handy.  But in looking at my own data types, I also run across things that look something like:

data Item = Item { _cost :: Int, _weight :: Double }
data Person = Person { _name :: String, _age :: Int, _possessions :: Map String Item }
data World = World { _day :: Int, _people :: Map String Person }

My deep data structures often contain collections (in this simplified example, a couple of Maps). I do really want to be able to easily compose labels to allow access to particular components of collections in the same (or nearly the same) way as I can access other components of my structures. To compose my way from ‘World’ to, say, a persons age, I need to go from ‘World’ to a collection (Map String Person) to an element of a collection (Person) to a component of a person (age). The missing piece is the label for the element of a collection. The first approach that sprang to mind was:

lm :: Ord k => (k,Map k v) :-> Maybe v

I discarded this because composing this properly would eventually lead to something like:

foo :: (String,World) :-> Maybe Person

which doesn’t work well with fclabel’s ‘getM’ and ‘setM’ functions — these expect the left hand side of the label arrow to be your state value, which in my case, going to be of type ‘World’, rather than (String,World). So I next tried something like:

lm :: Ord k => k -> Map k v :-> Maybe v

The immediate question is what does an expression like set (lm “foo”) Nothing coll, where coll is some Map String v, mean? An interpretation of this label might be:

lm k = label (lookup k) (\ v m -> maybe (delete k m) (\ v' -> insert k v' m) v)

With this definition, set (lm “foo”) Nothing coll actually deletes element “foo” from collection coll. This makes sense, but, once composed further, say with a label for the ‘age’ component of the ‘Person’ type, you’d get a label:

lbl :: String -> Map String Person :-> Maybe Int

With the above definition of lm, set (lbl “foo”) Nothing coll would also delete the underlying Person from the collection (setting the ‘age’ of the person to ‘Nothing’ deletes the person). This doesn’t seem like a desirable semantics. Another issue I found is that while you can create a label with this type ( a :-> Maybe b), you can’t extend this to an arbitrary monad — you can’t construct a label of (say) type Monad m => String :-> m Int because you can’t create a function of type:

forall m . Monad m => m Int -> String -> String

that depends on the value of its first parameter in any way[1]. Much more useful are labels of type Monad m => m a :-> m b, which work ‘forwards and backwards’ for arbitrary monads, and solve the issue of the semantics of my Map label. The Map label becomes something like:

lm :: (MonadPlus m, Ord k) => String -> m (Map k v) :-> m v
lm k = label getter setter where

    getter = lookup k `liftM` coll >>= maybe mzero return
    setter mv mc = mv >>= \ v -> mc >>= \ c -> return $ insert k v c

Or, better, for my purposes (since I’m using an error monad):

lm :: (MonadError e m, Error e, Show k, Ord k) => k -> m (Map k v) :-> m v
lm k = label getter setter where
    getter mm = mm >>= (maybe (throwError $ err k) return) . lookup k
    setter mv mm = mv >>= \ v -> insert k v `liftM` mm
    err k = strMsg $ "key " ++ show k ++ " not found"

Setting with mzero / an error would have no effect on the collection, rather than delete an element. Of course, to compose with other labels, we need to lift the other labels into the monad:

liftML :: Monad m => a :-> b -> m a :-> m b
liftML l = label (liftM $ get l) (liftM2 $ set l)

With this definition of lm, along with liftML I’m should be able to compose my way from ‘World’ all the way to the _weight of an item. With explicit lifts, it looks something like:

worldWeight k1 k2 = liftML weight.lm k2.liftML possesions.lm k1.liftML people

To actually use the lifted labels, the fclabels-provided the ‘getM’, ‘setM/=:’ and ‘modM’ functions don’t work as expected. For example, with a label l of type Monad m => m World :-> m Int, the type of getM l is (Monad m, MonadState (m World) m1) => m1 (m Int). Some alternate versions are needed:

import qualified Control.Monad.State as SM
...


getM :: MonadState s m => m s :-> m b -> m b
getM l = get l SM.get
setM :: MonadState s m => m s :-> m b -> b -> m ()
setM l v = set l (return v) SM.get >>= SM.put
modM :: MonadState s m => m s :-> m b -> (b -> b) -> m ()
modM l f = mod l (liftM f) SM.get >>= SM.put

infixr 7 =:
(=:) = setM

We can get rid of the explicit ‘liftML’s by augmenting the fclabel package’s mkLabels Template Haskell function:

mkLabelsPlus :: [Name] -> Q [Dec]
mkLabelsPlus names = do
    decs <- mkLabels names
    decs2 <- mapM liftDec decs
    return (decs ++ decs2)
where liftDec     (FunD nm _) = funD (mkName (nameBase nm ++ "M"))
            [clause [] (normalB $ (appE (varE 'liftML) (varE nm))) []]

This function will generate the unlifted labels (e.g. ‘person’, ‘weight’) as well as lifted version of those labels, named with the ever-so-creative ‘M’ suffix, (e.g. ‘personM’, ‘weightM’). Since I will mainly use my labels in a monadic context, the extra M’s are annoying, so an alternate TH function renames the ‘unlifted’ labels (with a ‘U’ suffix), and uses the simple names for the monadic labels:

mkLabelsAlt :: [Name] -> Q [Dec]
mkLabelsAlt names = do
    decs <- mkLabels names
    decs2 <- mapM liftDec decs
    return (map change decs ++ decs2)
    where liftDec (FunD nm _) = funD (mkName (nameBase nm))
            [clause [] (normalB $ (appE (varE 'liftML) (varE (mkName $ nameBase nm ++ "U")))) []]
          change (FunD nm x) = FunD (mkName (nameBase nm ++ "U")) x

Now I can easily compose deep access to my state data:


-- set someones age to the price of their hat...
f1 k = do
    hatCost <- getM (cost.lm "hat".possessions.lm k.people)
    (age.lm k.people) =: floor hatCost

-- update or insert a person
f2 k = (lm k.people) =: Person { _name = "Fred", _age = 32, _possessions = M.empty }
-- set k's age to 32
f3 k = (age.lm k.people) =: 32

I can run these in an ErrorT / State monad with a function like:

run :: ErrorT String (SM.State World) a -> World -> (Either String a, World)
run f = (runState . runErrorT) f

a quick test:

*FCLTest> world
World {_timestamp = 0, _people = fromList [(“person1”,Person {_name = “Joe”, _age = 17, _possessions = fromList [(“camera”,Item {_cost = 599.0, _weight = 250.0}),(“ipod”,Item {_cost = 195.0, _weight = 36.4})]}),(“person2”,Person {_name = “Zeke”, _age = 88, _possessions = fromList [(“hat”,Item {_cost = 65.0, _weight = 200.0})]})]}
*FCLTest> run (f1 “person2”) world
(Right (),World {_timestamp = 0, _people = fromList [(“person1”,Person {_name = “Joe”, _age = 17, _possessions = fromList [(“camera”,Item {_cost = 599.0, _weight = 250.0}),(“ipod”,Item {_cost = 195.0, _weight = 36.4})]}),(“person2”,Person {_name = “Zeke”, _age = 65, _possessions = fromList [(“hat”,Item {_cost = 65.0, _weight = 200.0})]})]})

With fclabels, some augmentation, and a small amount of Template Haskell hackery, it looks like I’ll be able to get rid an immense amount of boilerplate.  I’ll post some code-reduction stats once available.

2009-07-08

Added a new blog

Filed under: Uncategorized — robgreayer @ 09:06

I’m involved in another blog, the Harvard Farmers Market blog, dedicated to the farmers market in Harvard and to local food in New England in general.

2009-05-22

About this blog

Filed under: Uncategorized — Tags: — robgreayer @ 21:52

I have content scattered here and there on the web (a blog about x, a website I help out with about y, some project I contribute to about z, and so on), but no central place to organize it.  This blog is intended to be that place.

Blog at WordPress.com.