No tengo la menor idea

2009-11-15

Sorting Type-Heterogeneous Values with Type Families

Filed under: haskell — robgreayer @ 20:55

In an ongoing project, I’ve needed to do a bit of Haskell type-hackery with type functions implemented with type families. One thing I’ve needed is a type-heterogeneous sort — more than a type level sort, but a function that takes a value of one product type, and produces a value of a different, sorted product type. In doing so it needs to both sort the parameter type into a (potentially) different result type, as well as sort the parameter value into a result value (which of course needs to be of the result type). This works out to be a bit more complex than a pure type level sort (such as the quicksort implemented here, using funcitonal dependencies, or the merge sort implmenented here, again using functional dependencies), as Haskell type classes have to be used to select the right value-level functions to use for each step of the sort process. It was an interesting puzzle to solve — at least for me, as type level programming isn’t my strong point! (To make the problem a bit easier, I implemented a simpler, if less efficient, sorting algorithm, but quick sort or merge sort should also work at the type/value level).

Given a type for tagged values, e.g.

    -- n is a phantom, type-level natural
    newtype Tagged n t = Tagged t

And a product type, e.g.

    data a :* b = a :* b
    infixr 6 :*

I want to define tagged values such as:

    tv0 :: Tagged N0 String
    tv0 = Tagged "hello"
    tv1 :: Tagged N1 Int
    tv1 = Tagged 2
    tv2 :: Tagged N2 Bool
    tv2 = Tagged False
    -- etc.

I want to be able to apply a function sort to an arbitrary product of these types:

    x = sort ( tv2 :* tv0 :* tv1 )
    y = sort ( tv1 :* tv2 :* tv0 )

And the result should be the sorted product, i.e. x == y && x == Tagged "hello" :* Tagged 2 :* Tagged False. Note that in the first application of sort, sort needs a type like:

    sort :: (Tagged N2 Bool :* Tagged N0 String :* Tagged N1 Int) ->
        (Tagged N0 String :* Tagged N1 Int :* Tagged N2 Bool)

and in the second, it needs:

    sort :: (Tagged N1 Int :* Tagged N2 Bool :* Tagged N0 String) ->
        (Tagged N0 String :* Tagged N1 Int :* Tagged N2 Bool)

It needs to work for arbitrarily long products of any Tagged type (with the constraint that the index n must be a type-level natural). My solution to the problem follows.

> {-# LANGUAGE EmptyDataDecls, TypeFamilies, UndecidableInstances,
>              ScopedTypeVariables, OverlappingInstances, TypeOperators,
>              FlexibleInstances, NoMonomorphismRestriction #-}
> module TSort where

I started with the usual definition for type level naturals (with some convenient aliases for a few of them):

> data Zero
> data Succ n
> type N0 = Zero
> type N1 = Succ Zero
> type N2 = Succ N1
> type N3 = Succ N2
> type N4 = Succ N3
> type N5 = Succ N4
> type N6 = Succ N5
> type N7 = Succ N6
> type N8 = Succ N7
> type N9 = Succ N8
> type N10 = Succ N9

I also use type level booleans:

> data TRUE
> data FALSE

I need one inequality operator on naturals:

> type family LessThan m n
> type instance LessThan Zero Zero = FALSE
> type instance LessThan (Succ n) Zero = FALSE
> type instance LessThan Zero (Succ n) = TRUE
> type instance LessThan (Succ m) (Succ n) = LessThan m n

I need a type level if statement:

> type family Cond c t f
> type instance Cond TRUE t f = t
> type instance Cond FALSE t f = f

I need my tagged and product types:

> newtype Tagged n a = Tagged a deriving (Show,Eq)
> data a :* b = a :* b deriving (Show,Eq)
> infixr 6 :*

The general type of my sort is going to be something like (but not exactly):

    sort :: (Sortable a) => a -> Sorted a

where Sortable is a type class and Sorted is a associated type of that type class. It didn’t work out to be quite that simple, but this is how I started thinking about the problem.

I envisioned the Sortable class as:

    class Sortable a where
        type Sorted a
        sort :: a -> Sorted a

Trying to create a simple instance of this type, for a pair of tagged values got me a bit bogged down:

    instance Sortable (Tagged m a :* Tagged n b) where
        type Sorted (Tagged m a :* Tagged n b) = ???
        sort = ???

The type of Sorted for this pair is conditional based on the value of the indices m and n, so I can write the type function like this:

    type Sorted (Tagged m a :* Tagged n b) =
        Cond (LessThan n m) -- if n < m
            (Tagged n b :* Tagged m a) -- then swap
            (Tagged m a :* Tagged n b) -- else don't

but how to write the sort function? It can’t test the type and decide whether to swap the values. The key insight for solving this problem was that I needed some additional types to represent the operations I needed. For this simple reordering of a pair, I need two operations – Swap, to swap the elements of the pair if they are out of order, and Id (identity), to leave them alone if they are already in order:

    data Swap x y = Swap x y
    data Id a = Id a

Each of these can be seen as implementing a single step in a sorting algorithm). Each will be an instance of a Sorter class:

> class Sorter a where
>     type Sorted a
>     type Unsorted a

mkSort ‘makes’ a sort function from (Unsorted a -> Sorted a) based on the type of its first argument, which is a proxy argument (its value is never examined).

>     mkSort :: a -> Unsorted a -> Sorted a

The Swap instance of Sorter just swaps the values in a pair:

> instance Sorter (Swap x y) where
>     type Sorted (Swap x y) = y :* x
>     type Unsorted (Swap x y) = x :* y
>     mkSort _ (x :* y) = y :* x

The Id instance of Sorter leaves its value alone:

> instance Sorter (Id a) where
>     type Sorted (Id a) = a
>     type Unsorted (Id a) = a
>     mkSort _ v = v

Now my Sortable class, instead of having a sort function, needs instead a function to produce a sorter value:

> class Sortable a where
>     type SorterType a
>     sorter :: a -> SorterType a

And the sort can be written in terms of functions from the two classes:

> sort v = mkSort (sorter v) v

But still, how to implement the sorter function for my pair of tagged elements?

> instance Sortable (Tagged m a :* Tagged n b) where

The type function for the SorterType is now:

>     type SorterType (Tagged m a :* Tagged n b) =
>         Cond (LessThan n m)
>             (Swap (Tagged m a) (Tagged n b))
>             (Id (Tagged m a :* Tagged n b))
>     sorter = undefined -- ???

As it turns out, the above implementation is sufficient. Since the mkSort function doesn’t examine its first argument — it’s merely a proxy — the sorter function doesn’t need to produce a value. So sorter can be left undefined — all its ‘work’ is done at compile time, in producing the right type such that the instance of ‘mkSort’ can be inferred (I have to admit, I discovered this accidentally while writing the sorting function. I left sorter undefined because I couldn’t think of how to define it, then wrote sort in terms of it, and accidentally ran it. It worked, my mind boggled, and then I figured out what what going on). Because no concrete value of either the Id or Swap types are ever needed, the actual definitions for these can be:

> data Id a
> data Swap a b

To sort something more than a pair, I need to be able to do more than just swap two types and values. A simple algorithm for sorting the product is to sort the ‘tail’ of the product, then insert the head of the product into the sorted tail (this is insertion sort — not the most efficient, but simpler than merge sort or quick sort). For this I need two more operations, a sort-then-insert operation, and an insert operation (the sort-then-insert operation sorts its ‘tail’ and then (if necessary) uses the insert operation to insert its head into the sorted tail). I need these operation types:

> data SortInsert h t
> data Insert x y

I need a new class similar to Sortable, which has a InserterType and a function to produce a InserterType from the instance type:

> class Insertable a where
>     type InserterType a
>     inserter :: a -> InserterType a

A insertable assumes that its tail is sorted. For a product of tagged values (more than a pair), the Insertable instance looks like this:

> instance Insertable (Tagged m a :* Tagged n b :* c) where

The InserterType depends on whether the head element is less than the first element of the sorted tail. If it is, then the InserterType is the identity, otherwise it is an Insert x y:

>     type InserterType (Tagged m a :* Tagged n b :* c) =
>              Cond (LessThan n m)
>                  (Insert (Tagged m a) (Tagged n b :* c))
>                  (Id (Tagged m a :* Tagged n b :* c))

As in the sorter function of a Sortable, we don’t need the inserter to do anything more than create a proxy argument:

>     inserter = undefined

Inserting two elements is the same as sorting them:

> instance Insertable (Tagged m a :* Tagged n b) where
>     type InserterType (Tagged m a :* Tagged n b) =
>         Cond (LessThan n m) (Swap (Tagged m a) (Tagged n b))
>             (Id (Tagged m a :* Tagged n b))
>     inserter = undefined

A Inserter is similar to a sorter: it takes an value of an Uninserted type and produces a value of an Inserted type:

> class Inserter a where
>     type Inserted a
>     type Uninserted a

mkInsert ‘makes’ a function (Uninserted a -> Inserted a) from the instance type. The first argument again is just a proxy, it’s value never inspected:

>     mkInsert :: a -> Uninserted a -> Inserted a

The identity instance is straightforward:

> instance Inserter (Id a) where
>     type Inserted (Id a) = a
>     type Uninserted (Id a) = a
>     mkInsert _ v = v

The Swap instance of Inserter is essentially the same as the Swap instance of Sorter:

> instance Inserter (Swap a b) where
>    type Inserted (Swap a b) = b :* a
>    type Uninserted (Swap a b) = a :* b
>    mkInsert _ (a :* b) = b :* a

The Insert instance for a longer product has a rather convoluted context:

> instance (Insertable (a :* c)
>          , Inserter (InserterType (a :* c))
>          ,(a :* c) ~ Uninserted (InserterType (a :* c))
>          ) => Inserter (Insert a (b :* c)) where

The Inserted type is the first element of the tail followed by a inserted into the tail (via sort):

>     type Inserted (Insert a (b :* c)) = b :* Inserted (InserterType (a :* c))

The uninserted type is simply the original (uninserted) product:

>     type Uninserted (Insert a (b :* c)) = a :* b :* c

And the insert function effectively swaps the first two elements of the product and inserts the (new) tail:

>     mkInsert _ (x :* y :* z) = y :* mkInsert (inserter $ x :* z) (x :* z)

The Sorter instance for a SortInsert that we need again has a convoluted context:

> instance (Sortable (b :* c)
>           ,Unsorted (SorterType (b :* c)) ~ (b :* c)
>           ,Uninserted (InserterType (a :* Sorted (SorterType (b :* c)))) ~
>                (a :* Sorted (SorterType (b :* c)))
>           ,Sorter (SorterType (b :* c))
>           ,Insertable (a :* Sorted (SorterType (b :* c)))
>           ,Inserter (InserterType (a :* Sorted (SorterType (b :* c))))
>           ) =>
>         Sorter (SortInsert a (b :* c)) where

The type of the output of the sort is the type of the output of the head of the product inserted with the sorted tail:

>     type Sorted (SortInsert a (b :* c)) =
>         Inserted (InserterType (a :* (Sorted (SorterType (b :* c)))))

The type of the input of the sort is the original (unsorted, uninserted) product:

>     type Unsorted (SortInsert a (b :* c)) = a :* b :* c

And the sort function is just the insert of the head into the tail after the tail is sorted:

>     mkSort _ (a :* b :* c) = mkInsert (inserter preinsert) preinsert
>         where preinsert = a :* sort (b :* c)

Finally, the Sortable instance for an arbitrary product is:

> instance (Sortable (b :* c), Sorter (SortInsert a (b :* c))) =>
>     Sortable (a :* b :* c) where
>     type SorterType (a :* b :* c) = SortInsert a (b :* c)
>     sorter = undefined

Given some tagged values, I can try out the sort function:

> t0 :: Tagged N0 String
> t0 = Tagged "hello"
> t1 :: Tagged N1 Bool
> t1 = Tagged True
> t2 :: Tagged N2 Int
> t2 = Tagged 5
> t3 :: Tagged N3 String
> t3 = Tagged "goodbye"
   *TSort1> sort (t2 :* t0 :* t1 :* t3)
   Tagged "hello" :* (Tagged True :* (Tagged 5 :* Tagged "goodbye"))
   *TSort1> sort (t3 :* t2 :* t1 :* t0)
   Tagged "hello" :* (Tagged True :* (Tagged 5 :* Tagged "goodbye"))
   *TSort1>

Beyond being an interesting puzzle to solve, there’s at least some use for such a function. For example, in a DSL that has operations that support named parameter association:

> rawConnect :: (Tagged N0 String :* Tagged N1 Int :* Tagged N2 String :*
>      Tagged N3 String) -> IO ()
> rawConnect (Tagged host :* Tagged port :* Tagged user :* Tagged pass) =
>     putStrLn $ "connecting to " ++ host ++ " on port " ++ show port ++
>        " with username " ++ user ++ " and password " ++ pass
>
> connect = rawConnect . sort
>
> (=:) = ($)
> host :: String -> Tagged N0 String
> host = Tagged
> port :: Int -> Tagged N1 Int
> port = Tagged
> user :: String -> Tagged N2 String
> user = Tagged
> password :: String -> Tagged N3 String
> password = Tagged

Now you can invoke the ‘connect’ operation with parameters in no particular order:

> doit = do
>    putStrLn "trying to connect!"
>    connect ( user =: "admin" :*
>              password =: "pa55w0rd" :*
>              port =: 22 :*
>              host =: "example.com" )
>    putStrLn "connected!"

… which is at least mildly cool, for certain definitions of cool.

Brought to you by BlogLiterately with generous support from pandoc, hscolour and viewers like you.

2009-11-03

BlogLiterately v0.2

Filed under: haskell — robgreayer @ 10:04

I’ve created a new version of BlogLiterately, now on hackage. The new version is quite similar to what I blogged about here, with some enhancements:

  • it now supports categories.
  • it now supports highlighting-kate, if you’ve installed Pandoc with highlighting support.
  • you can highlight Haskell with hscolour using CSS classes or inline styles.
  • you can highlight Haskell and non-Haskell with highlighting-kate (via Pandoc). (But only via CSS classes and a separate or embedded stylesheet, not via inline styles. So it does me personally no good!).

I contemplate a future upgrade will allow highlighting-kate marked up code to have the CSS class-based styling ‘baked’ into the HTML in a way similar to how I handle hscolour.

Anyone who enjoys this tool, thank the authors of Pandoc, hscolour, HaXml, HaXR and, of course, GHC and the Haskell Platform, as this program is just a bit of glue that binds these much bigger pieces together for one simple application.

2009-10-26

Blogging Literately in Haskell

Filed under: haskell — robgreayer @ 22:05

I’ve made one blog post about Haskell, and found the processs to be too tedious to be tolerated. The wysiwyg tinyMCE editor that is avalaible for WordPress is useless for formatting code segments and editing directly in the raw HTML editor WordPress provides is agonizing. Figuring there must be a better way, I looked around for an offline tool that would just work for me. I turned up a few editors, free and non-free, some which looked just horrid, a few almost reasonable, but none that would make the mix of Haskell and narrative as easy as it seems it ought to be. What I really want is an easy way to take Literate Haskell (you know, those seemingly innocuous text messages that are suddenly interrupted with ominous passages like:

> {-# LANGUAGE DeriveDataTypeable #-}
> module BlogLiterately where

for no immediately apparent reason) file, perhaps with simple markdown encoded text commentary, and just magically upload it to my blog, with good HTML formatting of the markdown and Haskell sections.

Haskell, theoretically, has all sorts of libraries for doing all sorts of wonderful things. Perhaps it wouldn’t be to hard to mash something together from Haskell libraries that did precisely what I wanted.

Somehow I’d osmotically absorbed from the ether the knowlege that there was a Haskell-based tool around called Pandoc, by John MacFarlane, which deals with markdown, HTML and other formats. It is, like all good Haskelly things, available on hackage, both as a program and library. So it seemed like a good place to start.

> import Text.Pandoc

Looking over the documentation for Pandoc, I observed that although was it able to do wonderful things with loads of formats, it didn’t automatically do what I wanted with the actual Haskell source code. What I want is to make the source look like what Haskell code looks like as part of haddock documentation, especially the Haddock documentation available on Hackage. Haddock uses hscolour, a tool by Malcolm Wallace, to format source listings.

> import Language.Haskell.HsColour(hscolour,Output(..))
> import Language.Haskell.HsColour.Colourise(defaultColourPrefs)

Assuming Pandoc and hscolour do what I want, I’d still need some way of actually publishing the blog post. With a bit of googliness I learned that my blog software supports something called the MetaWeblog API, which is an XML-RPC-based protocol for interacting with blogs.

There’s a Haskell XML-RPC library, HaXR, by Bjorn Bringert, (on hackage) which seems like it should be appropriate.

> import Network.XmlRpc.Client
> import Network.XmlRpc.Internals

And it works that out I’ll need some miscellaneous other stuff. Since I’m writing a command line tool, I’ll need to process the command line arguments, and Neil Mitchell’s CmdArgs library ought to work for that:

> import System.Console.CmdArgs

I’m going to end up needing to parse and manipulate XHTML, so I’ll use Malcolm Wallace’s HaXml XML combinators:

> import Text.XML.HaXml
> import Text.XML.HaXml.Verbatim

I’ll need to do some text IO, which I’ll use the UTF8 encoding for, leading to Eric Mertens’ ubiquitous (until GHC 6.12!) utf8-string library:

> import qualified System.IO.UTF8 as U

And I’ll need a couple other bits and pieces:

> import Control.Monad(liftM,unless)
> import Text.XHtml.Transitional(showHtml)
> import Text.ParserCombinators.Parsec

The program I envision will read in a literate Haskell file, use Pandoc to parse it as markdown, then somehow find the code blocks in the (parsed) input, and use hscolour to transform those. Pandoc turns its input into a structure of type:

data Pandoc = Pandoc Meta [Block]

where a Block (the interesting bit, for my purposes) looks like:

-- | Block element.
data Block
    = Plain [Inline]        -- ^ Plain text, not a paragraph
    | Para [Inline]         -- ^ Paragraph
    | CodeBlock Attr String -- ^ Code block (literal) with attributes 
    | RawHtml String        -- ^ Raw HTML block (literal)
    | BlockQuote [Block]    -- ^ Block quote (list of blocks)
    | OrderedList ListAttributes [[Block]] -- ^ Ordered list (attributes
                            -- and a list of items, each a list of blocks)
    | BulletList [[Block]]  -- ^ Bullet list (list of items, each
                            -- a list of blocks)
    | DefinitionList [([Inline],[Block])]  -- ^ Definition list 
                            -- (list of items, each a pair of an inline list,
                            -- the term, and a block list)
    | Header Int [Inline]   -- ^ Header - level (integer) and text (inlines) 
    | HorizontalRule        -- ^ Horizontal rule
    | Table [Inline] [Alignment] [Double] [[Block]] [[[Block]]]  -- ^ Table,
                            -- with caption, column alignments,
                            -- relative column widths, column headers
                            -- (each a list of blocks), and rows
                            -- (each a list of lists of blocks)
    | Null                  -- ^ Nothing
    deriving (Eq, Read, Show, Typeable, Data)

The literate Haskell that Pandoc finds in a file ends up in various CodeBlock elements of the Pandoc document. Other code can also wind up in CodeBlock elements — normal markdown formatted code. The literate Haskell code seems to be differentiated from other by the Attr component, which has the form:

type Attr = (String, [String], [(String, String)])

Experimentation reveals that CodeBlock elements that have an Attr of the form (_,["sourceCode","haskell"],_) are literate Haskell code blocks, and other CodeBlock elements are the markdown code blocks. I want to syntax-highlight both kinds of code blocks, but when both get rendered to the output HTML, I want to preserve the literate Haskell as literate (it needs the prepended > character). Actually, I want to do just a little bit more…

When writing a Literate Haskell File, I might want to include non-Haskell but nevertheless ‘code’ examples in the text. Samples of how one might express the same thing in ML, or examples of how to run a program, etc. So not all code blocks should be colourised by hscolour, just specific Haskelly ones. Markdown doesn’t provide a way to express what kind of code a code block might be, but since I’m munging all the code blocks anyway, I can adopt a simple convention. If a code example looks like:


    [Haskell]
    foo :: String -> String

it is a Haskell block. If it looks like something else, e.g.


    [C++]
    cout << "Hello World!";

or


    [other]
    foo bar baz

etc., it is something else.

ed: It turns out that although Markdown doesn't provide a way to do this, Pandoc's extensions to Markdown do allow it, in a way fairly similar to what I've proposed. If I revise this further, I'll use the Pandoc convention. Thanks to John MacFarlane for pointing this out in the comments.

I can strip off the code type indicator from the beginning of each block as I examine them for code to colourise. I use Parsec to recognize the opening tag:

> unTag :: String -> (String, String)
> unTag s = either (const ("",s)) id $ parse tag "" s
>    where tag = do
>              tg <- between (char '[') (char ']') $ many $ noneOf "[]"
>              skipMany $ oneOf " t"
>              (string "rn" <|> string "n")
>              txt <- many $ anyToken
>              eof
>              return (tg,txt)

To highlight the syntax using hscolour (which produces HTML), I'm going to need to transform the String from a CodeBlock element to a String suitable for the RawHtml element (because the hscolour library transforms Haskell text to HTML). Pandoc strips off the prepended > characters from the literate Haskell, so I need to put them back, and also tell hscolour whether the source it is colouring is literate or not. The hscolour function looks like:

hscolour :: Output      -- ^ Output format.
         -> ColourPrefs -- ^ Colour preferences (for formats that support them).
         -> Bool        -- ^ Whether to include anchors.
         -> Bool        -- ^ Whether output document is partial or complete.
         -> String	-- ^ Title for output.
         -> Bool        -- ^ Whether input document is literate haskell or not
         -> String      -- ^ Haskell source code.
         -> String      -- ^ Coloured Haskell source code.

Hscolour supports a few different HTML-based Output formats: HTML, CSS, and ICSS. HTML is HTML with font tags, which are inherently evil and so I'll won't bother exploring this option. CSS is formatting using HTML class tags, which allows you the flexibility of using a separate stylesheet to control how hscolour-annotated code is style. ICSS is 'inline-CSS' a.k.a. HTML 'style' tags, which is really what I need for my WordPress blog, as I refuse to the pay $15 a year WordPress.com charge for the privilege of storing a stylesheet on their site. Unfortunately, hscolour's inline styling options are quite limited (very few colours, little control of fonts), but I've come up with a slightly more convoluted way of turning CSS coloured source into inline-css HTML. So I will start off by using hscolour in CSS mode to transform my source:

> colourIt literate srcTxt =
>     hscolour CSS defaultColourPrefs False True "" literate src'
>     where src' | literate = prepend srcTxt
>                | otherwise = srcTxt

Prepending the literate Haskell markers on the source is trivial:

> prepend s = unlines $ map p $ lines s where p s = '>':' ':s

Hscolour uses HTML span elements and CSS classes like 'hs-keyword' or hs-keyglyph to markup Haskell code. What I want to do is take each marked span element and replace the class attribute with an inline style element that has the markup I want for that kind of source. I can capture the style posibilities with a type:

> data StylePrefs = StylePrefs {
>     keyword  :: String,
>     keyglyph :: String,
>     layout   :: String,
>     comment  :: String,
>     conid    :: String,
>     varid    :: String,
>     conop    :: String,
>     varop    :: String,
>     str      :: String,
>     chr      :: String,
>     number   :: String,
>     cpp      :: String,
>     selection :: String,
>     variantselection :: String,
>     definition :: String
>     } deriving (Read,Show)

Each field in the above type will contain the desired style for the class hscolour assigns to a span of code. A default style that produces something like what the source listings on Hackage look like is:

> defaultStylePrefs = StylePrefs {
>     keyword  = "color: blue; font-weight: bold;"
>   , keyglyph = "color: red;"
>   , layout   = "color: red;"
>   , comment  = "color: green;"
>   , conid    = ""
>   , varid    = ""
>   , conop    = ""
>   , varop    = ""
>   , str      = "color: teal;"
>   , chr      = "color: teal;"
>   , number   = ""
>   , cpp      = ""
>   , selection = ""
>   , variantselection = ""
>   , definition = ""
>   }

I can read these preferences in from a file using the Read instance for StylePrefs. I could handle errors better, but this should work:

> getStylePrefs "" = return defaultStylePrefs
> getStylePrefs fname = liftM read (U.readFile fname)

Hscolour produces a String of HTML. To transform it, we need to parse it, manipulate it and then re-render it as a String. I'll use HaXml to do all of this:

> xformXml :: StylePrefs -> String -> String
> xformXml prefs s =  verbatim $ filtDoc (xmlParse "input" s) where
>     -- filter the document (an Hscoloured fragment of Haskell source)
>     filtDoc (Document p s e m) =  c where
>         [c] = filts (CElem e)
>     -- the filter is a fold of individual filters for each CSS class
>     filts = foldXml $ foldl o keep [
>             filt "keyword" keyword,
>             filt "keyglyph" keyglyph,
>             filt "layout" layout,
>             filt "comment" comment,
>             filt "conid" conid,
>             filt "varid" varid,
>             filt "conop" conop,
>             filt "varop" varop,
>             filt "str" str,
>             filt "chr" chr,
>             filt "num" number,
>             filt "cpp" cpp,
>             filt "sel" selection,
>             filt "variantselection" variantselection,
>             filt "definition" definition
>         ]
>     -- an individual filter replaces the attributes of a tag with
>     -- a style attribute when it has a specific 'class' attribute.
>     filt lbl f =
>         replaceAttrs [("style",f prefs)] `when`
>             (attrval $ ("class",AttValue [Left ("hs-" ++ lbl)]))

To completely colourise a CodeBlock we now can create a function that transforms a CodeBlock into a RawHtml block, where the content contains marked up Haskell (possibly with literate markers):

> colouriseCodeBlock prefs (CodeBlock attr@(_,inf,_) s) =
>     if tag == "Haskell" || lit
>         then RawHtml $ xformXml prefs $ colourIt lit s'
>         else CodeBlock attr s'
>     where (tag,s') = unTag s
>           lit = "sourceCode" `elem` inf && "haskell" `elem` inf
> colouriseCodeBlock _ b = b

And colourising a Pandoc document is simply:

> colourisePandoc prefs (Pandoc m blocks) =
>     Pandoc m $ map (colouriseCodeBlock prefs) blocks

Transforming a complete input document string to an HTML output string:

> xformDoc :: StylePrefs -> String -> String
> xformDoc prefs s =
>     showHtml
>     $ writeHtml writeOpts -- from Pandoc
>     $ colourisePandoc prefs
>     $ readMarkdown parseOpts -- from Pandoc
>     $ fixLineEndings s
>     where writeOpts = defaultWriterOptions {
>               writerLiterateHaskell = True,
>               writerReferenceLinks = True }
>           parseOpts = defaultParserState {
>               stateLiterateHaskell = True }
>           -- readMarkdown is picky about line endings
>           fixLineEndings [] = []
>           fixLineEndings ('r':'n':cs) = 'n':fixLineEndings cs
>           fixLineEndings (c:cs) = c:fixLineEndings cs

Now that I can transform a document, I need to be able to post the document to my blog. The metaWeblog API defines a newPost and editPost procedures that look like:

metaWeblog.newPost (blogid, username, password, struct, publish) returns string
metaWeblog.editPost (postid, username, password, struct, publish) returns true

For my blog (a WordPress blog), the blogid is just default. The user name and password are simply strings, and publish is a flag indicating whether to load the post as a draft, or to make it public immediately. The postid is an identifier string which is assigned when you initially create a post. The interesting bit is the struct field, which is an XML-RPC structure defining the post along with some meta-data, like the title. All I need is to be able to provide the post text and a title, so I can create the right struct like so:

> mkPost title text =
>     [("title",title),("description",text)]

The HaXR library exports a function for invoking XML-RPC procedures:

remote :: Remote a =>
    String -- ^ Server URL. May contain username and password on
           --   the format username:password@ before the hostname.
       -> String -- ^ Remote method name.
       -> a      -- ^ Any function 
     -- @(XmlRpcType t1, ..., XmlRpcType tn, XmlRpcType r) => 
                 -- t1 -> ... -> tn -> IO r@

The funtion requires an URL and a method name, and returns a function of type Remote a => a. Based on the instances defined for Remote, any function with zero or more parameters in the class XmlRpcType and a return type of XmlRpcType r => IO r will work, which means you can simply 'feed' remote additional arguments as required by the remote procedure, and as long as you make the call in an IO context, it will typecheck. So to call the metaWeblog.newPost procedure, I can do something like:

> postIt :: String -> String -> String -> String -> String -> String -> Bool -> IO String
> postIt url blogId user password title text publish =
>     remote url "metaWeblog.newPost" blogId user password (mkPost title text) publish

To update (replace) a post, the function would be:

> updateIt :: String -> String -> String -> String -> String -> String -> Bool -> IO Bool
> updateIt url postId user password title text publish =
>     remote url "metaWeblog.editPost" postId user password (mkPost title text) publish

I've got most of the pieces in place -- I just need to turn it into a command line program. I can capture the command line controls in a type:

> data BlogLiterately = BlogLiterately {
>        style :: String,    -- name of a style file
>        publish :: Bool,    -- an indication of whether the post should be
>                            -- published, or loaded as a draft
>        blogid :: String,   -- blog-specific identifier (e.g. for blogging
>                            -- software handling multiple blogs)
>        blog :: String,     -- blog xmlrpc URL
>        user :: String,     -- blog user name
>        password :: String, -- blog password
>        title :: String,    -- post title
>        file :: String,     -- file to post
>        postid :: String    -- id of a post to updated
>     } deriving (Show,Data,Typeable)

And using CmdArgs, this bit of impure evil defines how the command line arguments work:

> bl = mode $ BlogLiterately {
>     style = "" &= text "Style Specification" & typFile,
>     publish = def &= text "Publish post",
>     blogid = "default" &= text "Blog specific identifier",
>     blog = def &= argPos 0 & typ "URL"
>         & text "URL of blog's xmlrpc address (e.g. http://example.com/blog/xmlrpc.php)",
>     user = def &= argPos 1 & typ "USER" & text "blog author's user name" ,
>     password = def &= argPos 2 & typ "PASSWORD" & text "blog author's password",
>     title = def &= argPos 3 & typ "TITLE",
>     file = def &=  argPos 4 & typ "FILE" & text "literate haskell file",
>     postid = "" &= text "Post to replace (if any)"}

The main blogging function uses the information captured in the BlogLiterately type to read the style preferences, read the input file and transform it, and post it to the blog:

> blogLiterately (BlogLiterately style pub blogid url user pw title file postid) = do
>     prefs <- getStylePrefs style
>     html <- liftM (xformDoc prefs) $ U.readFile file
>     if null postid
>         then do
>             postid <- postIt url blogid user pw title html pub
>             putStrLn $ "post Id: " ++ postid
>         else do
>             result <- updateIt url postid user pw title html pub
>             unless result $ putStrLn "update failed!"

And the main program is simply:

> main = cmdArgs "Blog Literately v0.1, (C) Robert Greayer 2009" [bl] >>= blogLiterately

I can run it to get some help:

$ ./BlogLiterately --help
Blog Literately v0.1, (C) Robert Greayer 2009

blogliterately [FLAG] URL USER PASSWORD TITLE FILE

  -? --help[=FORMAT]  Show usage information (optional format)
  -V --version        Show version information
  -v --verbose        Higher verbosity
  -q --quiet          Lower verbosity
  -s --style=FILE     Style Specification
     --publish        Publish post
  -b --blogid=VALUE   Blog specific identifier (default=default)
     --postid=VALUE   Post to replace (if any)

Which tells me I can actually upload a post something like:

$ ./BlogLiterately http://greayer.wordpress.com/xmlrpc.php myuser mypass
    "Blogging Literately in Haskell" BlogLiterately.lhs

This is a great start for what I want. Handling of exceptions is non-existent; I simply cross my fingers and hope that the default error message will be self explanatory. I ought to also allow the author and categories for the document to be specified. But it works as is, and it's a tool I'd actuallly use (and did, to post this).

2009-10-14

Messing with First Class Labels

Filed under: haskell — robgreayer @ 14:48

In my sandbox Haskell project, lslplus, I’ve got massive amounts of complex state which I manage using a few levels of State and Error monad transformers.  The biggest stateful piece of the application is the simulator, which is implemented as a stack of monads:  an ErrorT transformer wrapping StateT transformer that captures the statefulness of the simulation of the simulated ‘world’, which in turn wraps another ErrorT/StateT monad stack for the LSL interpreter.

I’m unsatisfied with the amount of boilerplate involved in creating functions to manipulate the state in the simulator.  The state is complex, with a top-level ‘World’ data type composed of many other records, and collections of records, with ad hoc functions created to support ‘deep’ modifications of this structure.  In some other places in the application I’ve generated state accessor functions using  Template Haskell, but this area of the application was written earlier, and I’ve wanted to try out one of the libraries available on Hackage.  The ones I looked at were:

My intent was to try more than one, but my impression of fclabels was that it really subsumes the functionality of both the data-accessor and lenses packages, so picked that one to try out ‘in anger’.

A first class label identifies a component of  a structure, allowing both access and update. Given a label lbl, a value of type a :-> b (where :-> is an infix type constructor), access and update look something like:

get lbl rec -- get the value of a component of rec, identified by lbl
set lbl newVal rec -- update a component of rec, identified by lbl

The advantage is that first class labels themselves compose in a way that normal field-labels do not. Haskell field labels compose well for deep access to a record structure, but not for deep update of a record structure. E.g. given:

data Person = Person { name :: Name, age :: Double }
data Name = Name { forename :: String, surname :: String }

you can access the surname of a person p with the expression surname . name $ p, but update requires something much more convoluted, e.g. given a surname nm you’d need to do something like: p { name = (name p) { surname = nm }}. Field labels don’t compose for update. With field labels converted to first class labels, though, the expressions become get (surname.name) and set (surname.name) nm p respectively. Nice.  (The composition operator here is from the base module Control.Category.  It’s defined for instances of class Category, including (->) as well as (:->)).

With a slight change to my records —
data Person = Person { _name :: Name, _age :: Double }
data Name = Name { _forename :: String, _surname :: String }

the fclabels package will derive first class labels for me:

$(mkLabels [''Name,''Person])

The following labels are derived:

name :: Person :-> Name
age :: Person :-> Double
surname :: Name :-> String
forename :: Name :-> String

With this virtual dymo label maker, you can build your own labels:

-- labels for each element of a pair
fstl :: (a,b) :-> a
fstl = label fst (\ x (_,y) -> (x,y))

sndl :: (a,b) :-> b
sndl = label snd (\ y (x,_) -> (x,y))

which can come in handy.  But in looking at my own data types, I also run across things that look something like:

data Item = Item { _cost :: Int, _weight :: Double }
data Person = Person { _name :: String, _age :: Int, _possessions :: Map String Item }
data World = World { _day :: Int, _people :: Map String Person }

My deep data structures often contain collections (in this simplified example, a couple of Maps). I do really want to be able to easily compose labels to allow access to particular components of collections in the same (or nearly the same) way as I can access other components of my structures. To compose my way from ‘World’ to, say, a persons age, I need to go from ‘World’ to a collection (Map String Person) to an element of a collection (Person) to a component of a person (age). The missing piece is the label for the element of a collection. The first approach that sprang to mind was:

lm :: Ord k => (k,Map k v) :-> Maybe v

I discarded this because composing this properly would eventually lead to something like:

foo :: (String,World) :-> Maybe Person

which doesn’t work well with fclabel’s ‘getM’ and ’setM’ functions — these expect the left hand side of the label arrow to be your state value, which in my case, going to be of type ‘World’, rather than (String,World). So I next tried something like:

lm :: Ord k => k -> Map k v :-> Maybe v

The immediate question is what does an expression like set (lm “foo”) Nothing coll, where coll is some Map String v, mean? An interpretation of this label might be:

lm k = label (lookup k) (\ v m -> maybe (delete k m) (\ v' -> insert k v' m) v)

With this definition, set (lm “foo”) Nothing coll actually deletes element “foo” from collection coll. This makes sense, but, once composed further, say with a label for the ‘age’ component of the ‘Person’ type, you’d get a label:

lbl :: String -> Map String Person :-> Maybe Int

With the above definition of lm, set (lbl “foo”) Nothing coll would also delete the underlying Person from the collection (setting the ‘age’ of the person to ‘Nothing’ deletes the person). This doesn’t seem like a desirable semantics. Another issue I found is that while you can create a label with this type ( a :-> Maybe b), you can’t extend this to an arbitrary monad — you can’t construct a label of (say) type Monad m => String :-> m Int because you can’t create a function of type:

forall m . Monad m => m Int -> String -> String

that depends on the value of its first parameter in any way[1]. Much more useful are labels of type Monad m => m a :-> m b, which work ‘forwards and backwards’ for arbitrary monads, and solve the issue of the semantics of my Map label. The Map label becomes something like:

lm :: (MonadPlus m, Ord k) => String -> m (Map k v) :-> m v
lm k = label getter setter where

    getter = lookup k `liftM` coll >>= maybe mzero return
    setter mv mc = mv >>= \ v -> mc >>= \ c -> return $ insert k v c

Or, better, for my purposes (since I’m using an error monad):

lm :: (MonadError e m, Error e, Show k, Ord k) => k -> m (Map k v) :-> m v
lm k = label getter setter where
    getter mm = mm >>= (maybe (throwError $ err k) return) . lookup k
    setter mv mm = mv >>= \ v -> insert k v `liftM` mm
    err k = strMsg $ "key " ++ show k ++ " not found"

Setting with mzero / an error would have no effect on the collection, rather than delete an element. Of course, to compose with other labels, we need to lift the other labels into the monad:

liftML :: Monad m => a :-> b -> m a :-> m b
liftML l = label (liftM $ get l) (liftM2 $ set l)

With this definition of lm, along with liftML I’m should be able to compose my way from ‘World’ all the way to the _weight of an item. With explicit lifts, it looks something like:

worldWeight k1 k2 = liftML weight.lm k2.liftML possesions.lm k1.liftML people

To actually use the lifted labels, the fclabels-provided the ‘getM’, ’setM/=:’ and ‘modM’ functions don’t work as expected. For example, with a label l of type Monad m => m World :-> m Int, the type of getM l is (Monad m, MonadState (m World) m1) => m1 (m Int). Some alternate versions are needed:

import qualified Control.Monad.State as SM
...


getM :: MonadState s m => m s :-> m b -> m b
getM l = get l SM.get
setM :: MonadState s m => m s :-> m b -> b -> m ()
setM l v = set l (return v) SM.get >>= SM.put
modM :: MonadState s m => m s :-> m b -> (b -> b) -> m ()
modM l f = mod l (liftM f) SM.get >>= SM.put

infixr 7 =:
(=:) = setM

We can get rid of the explicit ‘liftML’s by augmenting the fclabel package’s mkLabels Template Haskell function:

mkLabelsPlus :: [Name] -> Q [Dec]
mkLabelsPlus names = do
    decs <- mkLabels names
    decs2 <- mapM liftDec decs
    return (decs ++ decs2)
where liftDec     (FunD nm _) = funD (mkName (nameBase nm ++ "M"))
            [clause [] (normalB $ (appE (varE 'liftML) (varE nm))) []]

This function will generate the unlifted labels (e.g. ‘person’, ‘weight’) as well as lifted version of those labels, named with the ever-so-creative ‘M’ suffix, (e.g. ‘personM’, ‘weightM’). Since I will mainly use my labels in a monadic context, the extra M’s are annoying, so an alternate TH function renames the ‘unlifted’ labels (with a ‘U’ suffix), and uses the simple names for the monadic labels:

mkLabelsAlt :: [Name] -> Q [Dec]
mkLabelsAlt names = do
    decs <- mkLabels names
    decs2 <- mapM liftDec decs
    return (map change decs ++ decs2)
    where liftDec (FunD nm _) = funD (mkName (nameBase nm))
            [clause [] (normalB $ (appE (varE 'liftML) (varE (mkName $ nameBase nm ++ "U")))) []]
          change (FunD nm x) = FunD (mkName (nameBase nm ++ "U")) x

Now I can easily compose deep access to my state data:


-- set someones age to the price of their hat...
f1 k = do
    hatCost <- getM (cost.lm "hat".possessions.lm k.people)
    (age.lm k.people) =: floor hatCost

-- update or insert a person
f2 k = (lm k.people) =: Person { _name = "Fred", _age = 32, _possessions = M.empty }
-- set k's age to 32
f3 k = (age.lm k.people) =: 32

I can run these in an ErrorT / State monad with a function like:

run :: ErrorT String (SM.State World) a -> World -> (Either String a, World)
run f = (runState . runErrorT) f

a quick test:

*FCLTest> world
World {_timestamp = 0, _people = fromList [("person1",Person {_name = "Joe", _age = 17, _possessions = fromList [("camera",Item {_cost = 599.0, _weight = 250.0}),("ipod",Item {_cost = 195.0, _weight = 36.4})]}),(“person2″,Person {_name = “Zeke”, _age = 88, _possessions = fromList [("hat",Item {_cost = 65.0, _weight = 200.0})]})]}
*FCLTest> run (f1 “person2″) world
(Right (),World {_timestamp = 0, _people = fromList [("person1",Person {_name = "Joe", _age = 17, _possessions = fromList [("camera",Item {_cost = 599.0, _weight = 250.0}),("ipod",Item {_cost = 195.0, _weight = 36.4})]}),(“person2″,Person {_name = “Zeke”, _age = 65, _possessions = fromList [("hat",Item {_cost = 65.0, _weight = 200.0})]})]})

With fclabels, some augmentation, and a small amount of Template Haskell hackery, it looks like I’ll be able to get rid an immense amount of boilerplate.  I’ll post some code-reduction stats once available.

2009-07-08

Added a new blog

Filed under: Uncategorized — robgreayer @ 09:06

I’m involved in another blog, the Harvard Farmers Market blog, dedicated to the farmers market in Harvard and to local food in New England in general.

2009-05-22

About this blog

Filed under: Uncategorized — Tags: — robgreayer @ 21:52

I have content scattered here and there on the web (a blog about x, a website I help out with about y, some project I contribute to about z, and so on), but no central place to organize it.  This blog is intended to be that place.

Blog at WordPress.com.