
# Abstract Nonsense

Ramblings by Benjamin Kovach


• Posting to Twitter via HTTP in Haskell

Published by Ben Kovach on August 9, 2014

When working on rapcandy, I had a bit of trouble getting the Twitter connector working. I’ll detail here what worked – if you need to connect to Twitter from one of your Haskell applications, this should help you out.

### Setting up a Twitter Application

In order to interact with Twitter programmatically, you’ll first need to set up an application via dev.twitter.com. You can create a new app there and generate new API keys for it. Once you do this, you’ll be able to read from Twitter. If you want to write to Twitter (like I did, with @_rapcandy), you can go to the API Keys tab, click “modify app permissions” and give the app write access. You can then generate new API keys which will permit writing.

Copy down your API key, API secret, Access token, and Access token secret into a JSON file called config.json that looks like this:

{
  "apiKey": "<api key>",
  "apiSecret": "<api secret>",
  "userKey": "<user key>",
  "userSecret": "<user secret>"
}

Now everything’s in place to be able to start interacting with Twitter via Haskell.

### Posting to Twitter

We’ll be using aeson, HTTP, and authenticate-oauth (Twitter uses OAuth to authenticate its users) to handle the transaction. A bit of boilerplate:

{-# LANGUAGE DeriveGeneric             #-}
{-# LANGUAGE NoMonomorphismRestriction #-}
{-# LANGUAGE OverloadedStrings         #-}
{-# LANGUAGE RecordWildCards           #-}

import           Data.Aeson
import           GHC.Generics
import           Data.ByteString
import qualified Data.ByteString.Char8   as B
import qualified Data.ByteString.Lazy    as BL
import qualified Network.HTTP.Base       as HTTP
import           Network.HTTP.Client
import           Network.HTTP.Client.TLS
import           Network.HTTP.Types
import           Web.Authenticate.OAuth

First things first: we need to be able to pull in our config file in order to access the keys for our application. We “magically” do this using Aeson, deriving Generic and getting automatic instances of {To, From}JSON for our user-defined Config type.

data Config = Config {
  apiKey     :: String,
  apiSecret  :: String,
  userKey    :: String,
  userSecret :: String
} deriving (Show, Generic)

instance FromJSON Config
instance ToJSON Config

We can pull in a Config from a file with a basic function configFromFile:

configFromFile :: FilePath -> IO (Either String Config)
configFromFile path = do
  contents <- BL.readFile path
  return $ eitherDecode contents

Now calling configFromFile "config.json" should return something like Right (Config{...}). Now we can start authenticating requests to the Twitter API. The following function is adapted from the Yesod source code to be less specific to Yesod:

oauthTwitter :: ByteString -> ByteString -> OAuth
oauthTwitter key secret = newOAuth
  { oauthServerName      = "twitter"
  , oauthRequestUri      = "https://api.twitter.com/oauth/request_token"
  , oauthAccessTokenUri  = "https://api.twitter.com/oauth/access_token"
  , oauthAuthorizeUri    = "https://api.twitter.com/oauth/authorize"
  , oauthSignatureMethod = HMACSHA1
  , oauthConsumerKey     = key
  , oauthConsumerSecret  = secret
  , oauthVersion         = OAuth10a
  }

Here we pass in our OAuth consumer key and secret to build an OAuth; these correspond to the Twitter API key/secret, and are one half of what we need to fully authenticate Twitter requests. The other half is a Credential, which we can build with newCredential using our user key and secret. We can fully sign an arbitrary request using a Config:

signWithConfig :: Config -> Request -> IO Request
signWithConfig Config{..} = signOAuth
  (oauthTwitter (B.pack apiKey) (B.pack apiSecret))
  (newCredential (B.pack userKey) (B.pack userSecret))

Now all we have to do is actually send a request (post a status!), which is simple but took me a while to finagle into place. There are three things to keep in mind here:

1. We must urlEncode the status we want to send.
2. We must POST, not GET.
3. We must use tlsManagerSettings to enable TLS for our request (otherwise, the request won’t go through).

tweet :: Config -> String -> IO (Response BL.ByteString)
tweet config status = do
  url <- parseUrl $ "https://api.twitter.com/1.1/statuses/update.json?status="
                 ++ HTTP.urlEncode status
  req <- signWithConfig config url { method = "POST" }
  manager <- newManager tlsManagerSettings
  httpLbs req manager

Using this code and a sufficiently permissive Twitter application, you should be able to adapt this code to send requests to any of the Twitter REST API endpoints from Haskell.

Ben

• Modeling and Simulating Markov Chain Evolution

Published by Ben Kovach on August 4, 2014

In this post, I will describe and implement a small interface for modeling Markov chains and simulating their evolution in Haskell.

### What is a Markov Chain?

(Figure: a simple two-state Markov chain. Image by Joxemai4.)

A discrete-time Markov chain (DTMC) is a mathematical system that probabilistically transitions between states using only its current state. A Markov chain can be thought of as a directed graph with probabilities for edges and states for vertices. The Markov chain in the figure has two states, E and A. The diagram states that a Markov chain in state E will transition back to state E with probability 0.3, and to state A with probability 0.7, and similarly for A’s transitions. Markov chains can be used for a wide variety of applications in statistical modeling.

They can also be used to generate sentences similar to arbitrary blocks of text. We will explore this application towards the end of the post.
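Before reaching for any library machinery, the heart of a single simulation step can be sketched in plain Haskell. Everything below (the Chain type, eaChain, step) uses my own illustrative names, not code from the rest of the post; the idea is just to scale a uniform sample by the total edge weight and walk the cumulative sum:

```haskell
import qualified Data.Map as M

-- Outgoing edges of each state, as (target, weight) pairs.
type Chain s = M.Map s [(s, Double)]

-- The two-state chain from the diagram above.
eaChain :: Chain Char
eaChain = M.fromList
  [ ('E', [('E', 0.3), ('A', 0.7)])
  , ('A', [('E', 0.4), ('A', 0.6)]) ]

-- One step: scale a uniform sample u from [0,1) by the total weight,
-- then walk the cumulative sum until it is exhausted.
step :: Ord s => Chain s -> Double -> s -> Maybe s
step ch u s = do
  edges <- M.lookup s ch
  let total = sum (map snd edges)
      go _ []      = Nothing
      go x ((t, w):rest)
        | x < w || null rest = Just t
        | otherwise          = go (x - w) rest
  go (u * total) edges
```

For instance, step eaChain 0.5 'E' lands on 'A', since 0.5 falls past E’s 0.3 self-loop weight. The library code below does essentially this, with the bookkeeping handled by MonadRandom.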

### Aside: Weighted Random Generation

The first thing that comes to mind when I think of random generation is still the Prob data type described in LYAH. The MonadRandom library defines a data type Rand which works, in many ways, in the same way as the Prob data type does (with a bit of extension to produce a transformer, etc.). I won’t go into the full details of how this works, but the basic idea is that, given a list of outcomes with weights, e.g.

[("Heads", 1), ("Tails", 1)]

we can – by wrapping it in Rand and giving it a random generator – produce a weighted random outcome, “Heads” or “Tails”, with the desired weighting. For a concrete example, here’s a full program written using the MonadRandom library.

import Control.Monad.Random

main :: IO ()
main = newStdGen >>= print . evalRand coinFlip
  where coinFlip = uniform ["Heads", "Tails"]

uniform constructs a Rand from a list with a uniform distribution, i.e. with each member having the same weight. evalRand takes a Rand and a random generator (we’re using StdGen here with newStdGen) and spits out a weighted random object. Running this will print “Heads” about 50% of the time and “Tails” about 50% of the time.

Also of note is the fromList combinator, which takes a list of objects and their weights and constructs a Rand object. For example, replacing coinFlip with fromList [("Heads", 1), ("Tails", 1)] yields the same program as above.

### Markov Chains: An Intermediate Representation

In order to model Markov chains, we essentially want to build a graph with weighted edges. We can model edge-weighted graphs using a HashMap from vertices to lists of edges, represented as (vertex, weight) pairs (the vertex the edge points to and its weight).

import qualified Data.HashMap.Lazy as M
import           Data.Hashable (Hashable)
import           Data.List (foldl')
import           Data.Ratio

type MarkovI a = M.HashMap a (Maybe [(a, Rational)])

The MarkovI (I for “intermediate”) data type is a synonym for a lazy HashMap from a to a list of vertex-weight pairs. The only twist is that we wrap the list in Maybe: Nothing signifies an “end” state in the chain with no outgoing transitions. We could remove this wrapper and use an empty list to signify the same thing, but this representation works better with MonadRandom, since Rands can’t be empty, making the translation straightforward.

You might also be wondering why we need an intermediate representation for Markov chains in the first place. The reason for this is that we can’t arbitrarily insert extra objects/weights into Rands, and we’ll want to build up the mappings piecemeal. We need some intermediate structure to handle this functionality.
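For a concrete picture, here is what an intermediate value for the two-state chain might look like if written out by hand (a sketch only: I’m using Data.Map in place of the post’s HashMap so the snippet stands alone, and the terminal state "X" is made up for illustration):

```haskell
import qualified Data.Map as M

type MarkovI a = M.Map a (Maybe [(a, Rational)])

-- The E/A chain, plus a made-up terminal state "X" with no
-- outgoing transitions (hence Nothing).
exampleI :: MarkovI String
exampleI = M.fromList
  [ ("E", Just [("E", 3), ("A", 7)])
  , ("A", Just [("E", 4), ("A", 6)])
  , ("X", Nothing) ]
```

In practice such values are built up incrementally with insertion functions rather than written literally, which is exactly what the next section does.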

We can define functions to build up MarkovIs via insertion of objects:

insertMkvI :: (Hashable a, Eq a) => Rational -> a -> a -> MarkovI a -> MarkovI a
insertMkvI r k v mkv = M.insert k (Just $ case M.lookup k mkv of
  Nothing -> [(v, r)]
  Just xs -> case xs of
    Nothing -> [(v, r)]
    Just ys -> (v, r):ys) mkv

insertEnd :: (Hashable a, Eq a) => a -> MarkovI a -> MarkovI a
insertEnd k = M.insert k Nothing

insertMkvI inserts an edge into a MarkovI. Its first argument is the weight for the edge being inserted. Its next two arguments are the state objects to add a transition from and to, respectively, and the fourth is the MarkovI to insert into. insertEnd inserts a state with no outbound transitions into a Markov chain.

It is worth noting that a Rand object constructed from a list like [(True, 1), (True, 1), (False, 1)] does weight True twice as heavily as False. This will become important later, when talking about a sentence generator.

### Markov Chains: Final Representation

The final representation of Markov chains simply turns those [(a, Rational)]s in MarkovI into true Rands.

import qualified Control.Monad.Random as R

newtype Markov g a = Markov { getMarkov :: M.HashMap a (Maybe (R.Rand g a)) }

We can define a simple conversion function to construct Markovs from the intermediate representation by converting their distribution lists to Rands via fromList. This is straightforward because empty lists are represented as Nothing, so we don’t have to explicitly deal with that edge case when calling R.fromList, which would normally fail on an empty list.

fromMarkovI :: R.RandomGen g => MarkovI a -> Markov g a
fromMarkovI = Markov . M.map (R.fromList <$>)

The first goal is to be able to – given a state and a random generator – transition to a new state probabilistically. The second goal is to be able to repeat this n times and track the states we pass through.

runMarkov1 accomplishes the first goal:

type Err = String

data Outcome g a =
    Error Err
  | Val a g
  | End
  deriving (Show, Eq)

runMarkov1 :: (R.RandomGen g, Hashable a, Eq a) => Markov g a -> g -> a -> Outcome g a
runMarkov1 mkv gen x = case M.lookup x (getMarkov mkv) of
  Nothing -> Error "Internal error; cannot find value"
  Just rs -> case flip R.runRand gen <$> rs of
    Nothing     -> End
    Just (a, g) -> Val a g

First, if the state we’re looking for doesn’t exist, it is impossible to transition out of it, so the computation fails with an internal error. If it does exist, we draw a probabilistic value from the transition mappings of the state in question. If there aren’t any, we just End – we cannot transition, but don’t really want to throw an error. If there are values to choose from, we return one along with a new random generator, wrapped in Val.

Extending this to run n times isn’t too tough. It mostly consists of finagling data types into the representation we want.

runMarkov :: (R.RandomGen g, Hashable a, Eq a) => Integer -> Markov g a -> g -> a -> Either Err [a]
runMarkov n mkv gen x = go n
  where go m
          | m <= 0    = Right []
          | otherwise = (x:) <$> case runMarkov1 mkv gen x of
              Val a g   -> runMarkov (n-1) mkv g a
              End       -> Right []
              Error err -> Left err

If we hit an End, the simulation terminates because it can’t progress any further. If we get an error along the way, we wrap it in Left and return it. Otherwise, we run runMarkov1 repeatedly n times, starting from the previously computed state each time, and collecting the results into a list. If no errors occur, the result will be a list of states passed through while the simulation runs.

We can now define a fromList function, which builds up a Markov chain from mappings represented in list form.

fromList :: (Hashable a, Eq a, R.RandomGen g) => [(a, [(a, Rational)])] -> Markov g a
fromList = Markov . foldl' (flip $ uncurry ins) M.empty
  where ins a b m = case b of
          [] -> M.insert a Nothing m
          _  -> M.insert a (Just $ R.fromList b) m

With this at our disposal, it’s easy to model and run the example Markov chain I mentioned earlier.

example :: Markov PureMT String
example = fromList [("E", [("E", 3), ("A", 7)]), ("A", [("E", 4), ("A", 6)])]

λ> :m + System.Random.Mersenne.Pure64
λ> gen <- newPureMT
λ> runMarkov 15 example gen "E"
Right ["E","A","A","A","E","E","A","A","A","A","A","A","E","E","A"]

The Markov chain passes through “A” a bit more often than “E”, which is to be expected from its definition.

### Towards a Sentence Generator

The process of sentence generation using Markov chains is pretty simple: for each word in a “seed text,” find the probability of each other word following it. Build a Markov chain out of these probabilities, using words as states, and run it a desired number of times. In order to do this, we’ll first need a utility function which takes pairs of elements (representing words along with the word following them in a “seed text”) and produces a Markov chain out of them.

insertMkvPairsInto :: (Hashable a, Eq a) => MarkovI a -> [(a, a)] -> MarkovI a
insertMkvPairsInto mkv [] = mkv
insertMkvPairsInto mkv ps = insertEnd lst $ foldl' (flip (uncurry (insertMkvI 1))) mkv ps
  where lst = snd $ last ps

For each pair (x, y), we insert a transition x -> y with weight 1 into the Markov chain, and insert the final value in as an End. The reason this works is because of something I mentioned earlier: Rand handles distributions like [(True, 1), (True, 1), (False, 1)] properly. We build lists very similar to this one when processing a block of text for sentence generation, and when finally converting to Markov, all of that plumbing gets handled automatically. As a final note, we’ll use that End construct to mark the end of a sentence.
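To make the duplicate-weight point concrete, here is a throwaway helper (not part of the post’s code) that counts how often each adjacent pair occurs in a word list; repeated pairs are exactly what give some transitions more weight:

```haskell
import qualified Data.Map as M

-- Count occurrences of each adjacent pair of words.
pairCounts :: Ord a => [a] -> M.Map (a, a) Int
pairCounts ws = M.fromListWith (+) [ (p, 1) | p <- zip ws (tail ws) ]
```

In "I am what I am", the pair ("I", "am") occurs twice, so the edge "I" -> "am" ends up with weight 2 while every other edge gets weight 1.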

The next thing is actually building a MarkovI from a sentence – this can be done by zipping the list of its words with the tail of it and using the aforementioned function, like so:

import qualified Data.Text as T

wordPairs :: T.Text -> [(T.Text, T.Text)]
wordPairs = (zip <*> tail) . T.words

insertSentence :: MarkovI T.Text -> T.Text -> MarkovI T.Text
insertSentence mkv = insertMkvPairsInto mkv . wordPairs

wordPairs could be written more simply in a pointful style, but I think the point{free, less} version is cool. :)
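For comparison, the pointful version looks like this (written over String rather than Data.Text so the snippet stands alone; the shape is identical):

```haskell
-- Zip a word list with its own tail to pair each word with its successor.
wordPairsPointful :: String -> [(String, String)]
wordPairsPointful s = zip ws (tail ws)
  where ws = words s
```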

Now, to build a Markov chain from a bunch of sentences (be it a paragraph, a book), we can just fold into an empty MarkovI and convert it from the intermediate representation:

fromSentences :: R.RandomGen g => [T.Text] -> Markov g T.Text
fromSentences = fromMarkovI . foldl' insertSentence M.empty

The rest is mostly plumbing:

{-# LANGUAGE OverloadedStrings #-}
import System.Random.Mersenne.Pure64

runFromSentences :: Int -> [T.Text] -> IO (Either Err T.Text)
runFromSentences n sentences = do
  g <- newPureMT
  let hds = map (head . T.words) sentences
  seed <- R.uniform hds
  return $ T.unwords <$> runMarkov n (fromSentences sentences) g seed

test :: [T.Text]
test = [
    "I am a monster.",
    "I am a rock star.",
    "I want to go to Hawaii.",
    "I want to eat a hamburger.",
    "I have a really big headache.",
    "Haskell is a fun language!",
    "Go eat a big hamburger!",
    "Markov chains are fun to use!"
  ]

We get a new PureMT to use for a generator, and grab a random word (from the beginning of a sentence) to use as the starting state. We then run a Markov simulation, collecting the words we pass through, and finally call T.unwords on the result to build a sentence from the words in sequence. Running this yields some interesting statements (and a lot of nonsensical ones), for example:

λ> runFromSentences 10 test
Right "Haskell is a hamburger."
Right "Go eat a really big headache."
Right "I am a fun to go to go to eat"

### Application: Rap Candy

As you might imagine, this type of thing gets more interesting when you’re working with a larger set of sentences. I thought it would be fun(ny) to take lines from Eminem’s music as “sentences,” make tweet-sized snippets from them, and automate a Twitter bot to post one every day. Most of the tweets are pretty nonsensical (and very vulgar).

rapcandy is open source. Its Markov chain mechanism differs slightly from what was presented here, but the ideas are the same. It also includes a simple example of how to connect to Twitter using Haskell (which I’ll be covering separately in a short blog post soon), as well as a web scraper written in Node that I used to download Eminem’s lyrics programmatically. Feel free to browse the code and follow @_rapcandy on Twitter.

I’ve also boxed up (most of) the code from this blog post into a small cabal package that you can use if you’d like to play around with your own Markov chain based applications. You can download markov-sim and browse its source here.

• An attempt at zipper-based type inference in Molecule

Published by Ben Kovach on July 21, 2014

### Prelude

I recently designed a very small programming language, Molecule. Molecule is a slight extension of the simply typed lambda calculus (STLC) that supports type inference. It is mostly a contained experiment on how to implement type inference in an STLC-like language.

In this blog post, I want to talk about my experience with Molecule and its static type checking and inference mechanism, which works for most valid programs. I do not know if this approach can be modified to work for all valid programs (and indeed this is a problem), but I think it’s worth talking about regardless.

Molecule and its REPL, mi, can be downloaded from GitHub, if you’d like to play with it.

### Syntax

Molecule’s syntax tries to emulate the mathematical notation used in the lambda calculus, without explicit type annotations (which aren’t even allowed syntactically). A valid program can be one of the following expressions (informally):

• Int literals
• Boolean literals (t and f)
• Addition (1 + 2)
• Boolean Or (f | t)
• Lambda abstractions (\x. x + x), and
• Function application ((\x.x) 80)

Expressions can have type Int, Bool, or any function type between types, e.g. Int -> Bool. Expressions are monomorphic, meaning they cannot take on more than one type. For a quick example, the polymorphic identity function \x.x has no type in Molecule – only when applied to another expression can it take on a type. (\x.x) 10, for example, has type Int.

The type inference mechanism that I will be describing never produces the wrong type for a Molecule expression, but it fails to realize that some expressions can be typed at all. The following expression, for example:

((\x.x) (\x.x)) f

should have type Bool; however, Molecule’s type checker cannot unify its type.

### Type Inference in Molecule

When designing a slightly more ambitious PL, I was writing a naïve type inferencer and hit the point where I was pattern matching on non-leaf expression constructors in order to determine the types of the leaves. At a basic level, there was an Expr data type akin to:

data Expr =
    Var String
  | PInt Int
  | Expr :+: Expr
  -- ... and so on

And I was pattern matching against, for example (Var "x" :+: a) as well as (a :+: Var "x") in order to determine that x was an Int. As you can imagine, this got tedious quickly. I wanted to be able to have some function infer :: Expr -> Type such that infer (Var x) would spit out the type for x in the context of its expression. After narrowing down the information that I actually needed to typecheck a Var, I realized all that was needed in order to infer the type of an expression was information about where the Var came from one level higher in the AST. For example, in the expression (Var "x" :+: a), just knowing that Var "x" came from a :+: expression is enough to determine that x : Int, and similarly for other types of expressions.

Let’s dive into the actual definitions of Molecule’s data types in order to talk about how this works concretely in practice.

Types are simple, as described above. TLam represents function types (Int -> Bool === TLam TInt TBool).

data MoleculeType =
    TBool
  | TInt
  | TLam MoleculeType MoleculeType
  deriving Eq

Values can be Bools, Ints, or function values with an expression environment (mappings from variables in scope to expressions), the name of the argument it takes, and an expression body.

type Name = String
type Env = Map Name MoleculeValue

data MoleculeValue =
    VBool Bool
  | VInt Int
  | VLam Env Name MoleculeExpr

Expressions are represented as you might expect. EAbs represents a lambda abstraction, EApp function application, and :+: and :|: correspond to addition and boolean or, respectively.

data MoleculeExpr =
    EVar Name
  | ETrue
  | EFalse
  | EInt Int
  | EAbs Name MoleculeExpr
  | EApp MoleculeExpr MoleculeExpr
  | MoleculeExpr :+: MoleculeExpr
  | MoleculeExpr :|: MoleculeExpr
  deriving Eq

Here’s the key data type we need in order to keep track of the expression we’re coming from when we hit Var values in the type inferencer. Each of these Crumbs tags the expression one level higher and carries along all the information in it that isn’t already present in the expression currently being inspected. This is one of the pieces of a zipper for the Molecule AST.

data MoleculeCrumb =
    CPlus MoleculeExpr   -- Came from +
  | COr MoleculeExpr     -- Came from |
  | CAbs Name            -- Came from a lambda abstraction
  | CApp1 MoleculeExpr   -- Came from first arg of application
  | CApp2 MoleculeExpr   -- Came from second arg of application
  deriving (Show, Eq)

We use a basic monad transformer stack to represent the type of the type checker.

type TypeEnv     = M.Map Name MoleculeType
type Scope       = S.Set Name
type Typechecker = ExceptT MoleculeError (ReaderT (Maybe MoleculeCrumb, Scope) (StateT TypeEnv Identity))

TypeEnvs are maps from variable names to types, and a Scope is a list of variable names in scope. The typechecker has access to 3 basic (monadic) effects:

1. Error production, via ExceptT MoleculeError
2. Threading variable scope and a crumb through the computation, via ReaderT (Maybe MoleculeCrumb, Scope), and
3. Access to a mutable type environment, via StateT TypeEnv Identity.

Note that we don’t need a “full” zipper here, since we only care about the single expression that the variable came from, not the entire path from the AST root. This is all we need to implement a relatively robust type inference/checking mechanism for Molecule. The inference part of type checking takes place in the following branch of the check function, which accepts an EVar (Note: this is modified for brevity; in practice, more specific type errors are thrown for failing expressions).

check :: MoleculeExpr -> Typechecker MoleculeType
check (EVar name) = do
  let addBinding name typ = modify (M.insert name typ) >> return typ
      typeError = throwError . TypeError
  (crumb, scope) <- ask
  if S.notMember name scope
    then typeError $ "variable not in scope: " ++ name
    else do
      env <- get
      case M.lookup name env of
        Nothing -> case crumb of
          Just x -> case x of
            CPlus _ -> addBinding name TInt
            COr _   -> addBinding name TBool
            CAbs nm -> case M.lookup nm env of
              Just t -> addBinding name t
            CApp2 e -> do
              t <- check e
              case t of
                TLam v _ -> addBinding name v
        Just t -> case crumb of
          Nothing -> return t
          Just cb -> case cb of
            CPlus _ | t == TInt  -> return TInt
            COr _   | t == TBool -> return TBool
            CAbs _  -> return t
            CApp1 _ -> case t of
              TLam _ _ -> return t
            CApp2 e -> do
              t' <- check e
              case t' of
                TLam typ _ | typ == t -> return t

That’s a bit of a mouthful, but it can be broken up into sections. First off, if we hit this block of code, we’ve hit a variable in a program and need to unify its type. We first get the crumb and scope that have been accumulating throughout traversal of the AST. We then check if the variable in question is in scope; if not, we throw a TypeError. We next get the type environment env. If the variable in question is not bound in the type environment, we bind it to the appropriate type and return it using addBinding. If we came from a lambda abstraction, the (sub)expression must be \x.x, so we return the type of x in the environment, if it already exists. If we came from the second value of an application (i.e. the value being applied to a function), we check the type of the function it is being applied to – if it isn’t a lambda abstraction, the typechecker fails.

It’s worth noting that this last CApp2 rule is exactly why the expression I noted earlier – ((\x.x) (\x.x)) f – fails. The subexpression ((\x.x) (\x.x)) doesn’t typecheck to a TLam; it fails to typecheck because the inner \x.x isn’t unifiable. But I digress – the typechecker works reasonably well and I think the schema is simple enough to be interesting, even if not exactly practical/suitable for real-world usage in its current state.
If there exists a binding in the type environment for the variable in question, we just make sure that type matches what we expect and return it. The rest of the check function consists of branches in the same vein as this one (the :+: branch, again omitting error-handling noise):

withCrumb :: Typechecker a -> MoleculeCrumb -> Typechecker a
withCrumb action crumb = local (_1 .~ Just crumb) action

check (e1 :+: e2) = do
  e1' <- check e1 `withCrumb` CPlus e2
  e2' <- check e2 `withCrumb` CPlus e1
  case (e1', e2') of
    (TInt, TInt) -> return TInt

where we set the crumb in the expression with a helper function withCrumb (which makes use of local and a common lens operator) and propagate typechecking through the rest of the AST. We can run the typechecker using the final function typecheck (which looks complex but just runs a typechecking operation with initially empty environments and no crumb):

typecheck :: MoleculeExpr -> Either MoleculeError MoleculeType
typecheck = runTypecheck Nothing S.empty M.empty
  where runTypecheck crumb scope te expr =
          runIdentity $ evalStateT (runReaderT (runExceptT (check expr)) (crumb, scope)) te

Now we have a static typechecker for Molecule expressions, which means, most importantly, that we can run well-typed expressions safely. This removes a lot of ambiguity from the actual evaluator and allows for faster evaluation, since expressions need not be typechecked at runtime.

### Evaluating Molecule Expressions

Let’s get right to it – the code for evaluation in Molecule looks like this:

type Evaluator = ExceptT MoleculeError (Reader (M.Map Name MoleculeValue))

eval :: MoleculeExpr -> Evaluator MoleculeValue
eval e = do
  env <- ask
  case e of
    ETrue        -> return $ VBool True
    EFalse       -> return $ VBool False
    EInt x       -> return $ VInt x
    EAbs name e1 -> return $ VLam env name e1
    EVar nm      -> return $ fromJust (M.lookup nm env)
    e1 :+: e2    -> do
      [a, b] <- mapM eval [e1, e2]
      case (a, b) of
        (VInt a', VInt b') -> return . VInt $ a' + b'
    EApp e1 e2   -> do
      [e1', e2'] <- mapM eval [e1, e2]
      case e1' of
        VLam env' name body -> local (const $ M.insert name e2' env') $ eval body

I’ve removed the :|: rule for brevity (hint: it looks just like :+:). Most of this is pretty straightforward because we don’t have to deal with typechecking at runtime. The most complex evaluation rule is the one for EApp, which applies the function e1 to the argument e2. This rule says to evaluate e1 and e2, then take the resulting lambda abstraction, bind the argument name to e2’s evaluated value, and evaluate the lambda abstraction’s body with the modified environment.
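The environment-capture behaviour of the EApp rule can be modelled in isolation. The following stripped-down evaluator (entirely my own toy, with made-up constructor names and no monad stack) shows the same closure discipline: a lambda captures its defining environment, and application extends that captured environment rather than the caller’s:

```haskell
import qualified Data.Map as M

data E   = I Int | V String | Lam String E | App E E | Plus E E
data Val = VI Int | VClo (M.Map String Val) String E

ev :: M.Map String Val -> E -> Val
ev _   (I n)      = VI n
ev env (V x)      = env M.! x
ev env (Lam x b)  = VClo env x b                     -- capture the environment
ev env (App f a)  = case ev env f of
  VClo cap x b -> ev (M.insert x (ev env a) cap) b   -- extend the captured env
  _            -> error "applied a non-function"
ev env (Plus a b) = case (ev env a, ev env b) of
  (VI x, VI y) -> VI (x + y)
  _            -> error "added a non-int"
```

So ev M.empty (App (Lam "x" (Plus (V "x") (I 1))) (I 2)) yields VI 3, mirroring what eval does for EApp with its Reader environment.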

Again, we can run the evaluator with a simple wrapper function evaluate:

evaluate :: MoleculeExpr -> Either MoleculeError MoleculeValue
evaluate = runEval M.empty
  where runEval env e = runReader (runExceptT (eval e)) env

…and that’s basically all there is to Molecule! The mi REPL is built with haskeline and supports type checking via :t (a la ghci) and evaluation by simply typing (no pun intended) expressions.

I’ve still got a long way to go in the programming languages world, but I’m proud of Molecule even if its type inference is a little flawed. My next project will either have a type system closer to Hindley Milner (so I can type infer with something closer to Algorithm W), or just make type annotations explicit (for a different set of challenges).

Again, Molecule’s full source code is available on GitHub.

Ben

• Hylomorphisms and treesort

Published by Ben Kovach on April 30, 2014

Consider the following data structure, representing a binary search tree:

data BST a =
    Tree (BST a) a (BST a)
  | Empty
  deriving (Eq)

As it turns out, this data structure provides a nice way to introduce the concepts of different types of morphisms used all over the place in Haskell - the fold, or “catamorphism”, the unfold, or “anamorphism”, and compositions of the two, the “hylomorphism” and “metamorphism.”

The bracket notations that I’ll use below come from Meijer, Fokkinga and Paterson’s excellent paper Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire. If you enjoy this article, I’d suggest giving it a look!

I’m writing this because I recall encountering these names when learning Haskell early on and being very confused, particularly by the latter two types. Folds (and to a lesser extent, unfolds) are commonplace in Haskell. Hylo- and metamorphisms are also pretty common, but they’re not as easy to spot. From Wikipedia:

In computer science, and in particular functional programming, a hylomorphism is a recursive function, corresponding to the composition of an anamorphism (which first builds a set of results; also known as ‘unfolding’) and a catamorphism (which then folds these results into a final return value)

The canonical example of a hylomorphism is the factorial function, which (usually) implicitly composes functions. The goal of this post is to lay out an explicit example of a hylo- (and meta-) morphism in a natural way. We’ll start with a couple of functions:

insert :: Ord a => a -> BST a -> BST a
insert x Empty = Tree Empty x Empty
insert x (Tree left a right)
  | x < a     = Tree (insert x left) a right
  | otherwise = Tree left a (insert x right)

fromList :: Ord a => [a] -> BST a
fromList xs = foldr insert Empty xs

We have an insertion function and our first example of a catamorphism, fromList! We’re folding all values from a list into a new structure (a BST) and destroying the list in the process. This function can be written in so-called “banana brackets,” like so: $$fromList = (\!\left|insert\right|\!)$$.

fromList can also be considered an anamorphism. Catamorphisms destroy structures to build final values, whereas anamorphisms take an initial seed value and build a new structure from it. In fromList, xs can be considered a “seed” value to build a BST a from, making fromList a perfectly valid anamorphism as well. As such, fromList can also be written in “lens brackets”: $$fromList = [\!(insert)\!]$$.

We can also define a new pair of cata/anamorphisms by folding the tree into a list:

foldTree :: (a -> b -> b) -> b -> BST a -> b
foldTree f b Empty = b
foldTree f b (Tree left a right) = foldTree f ( f a (foldTree f b right) ) left

toList :: BST a -> [a]
toList t = foldTree (:) [] t

foldTree is analogous to foldr (and would be a fine definition for foldr in a Foldable instance), and toList destructs (folds) a BST a into an [a]. Thinking this way, toList again defines a catamorphism, this time from BST a -> [a], denoted $$toList = (\!\left| : \right|\!)$$. But we can also think of toList as unfolding a BST a into an [a], so we can analogously define an anamorphism $$toList = [\!( : )\!]$$.

There’s something interesting about toList: foldTree traverses a BST in order, so it actually produces a sorted list (given that elements are inserted rather than randomly placed!). Now we have a way to construct a binary search tree from a list of elements, and destruct a binary search tree into a sorted list of elements. This gives rise to a simple definition of a sorting algorithm, namely:

treesort :: Ord a => [a] -> [a]
treesort = toList . fromList

Because toList and fromList are both cata- and anamorphisms, treesort actually defines a hylomorphism and a metamorphism.
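For completeness, here is the whole pipeline gathered into one self-contained module (the same definitions as above), which compiles and sorts as expected:

```haskell
data BST a = Tree (BST a) a (BST a) | Empty
  deriving Eq

insert :: Ord a => a -> BST a -> BST a
insert x Empty = Tree Empty x Empty
insert x (Tree left a right)
  | x < a     = Tree (insert x left) a right
  | otherwise = Tree left a (insert x right)

-- Unfold: grow a tree from a list seed.
fromList :: Ord a => [a] -> BST a
fromList = foldr insert Empty

-- Fold: collapse a tree with an in-order traversal.
foldTree :: (a -> b -> b) -> b -> BST a -> b
foldTree _ b Empty = b
foldTree f b (Tree left a right) = foldTree f (f a (foldTree f b right)) left

toList :: BST a -> [a]
toList = foldTree (:) []

treesort :: Ord a => [a] -> [a]
treesort = toList . fromList
```

For example, treesort [3,1,2] evaluates to [1,2,3] and treesort "banana" to "aaabnn".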

As we noted before, a hylomorphism is defined as the composition of an anamorphism (unfold) with a catamorphism (fold). If we think of fromList as an anamorphism and toList as a catamorphism, we have constructed a hylomorphism directly. Namely, the function $$treesort = [\![([], (:)),(insert, null)]\!]$$ (the brackets here are commonly called “envelopes”). null isn’t explicit in the definition of treesort (instead, it’s implicit in foldr), but it describes a terminating condition for fromList. Similarly, [] is just the container to fold values into.

We can once again think of this function in the opposite manner by thinking of fromList as a catamorphism and toList as an anamorphism, giving rise to a metamorphism, defined by composition in the opposite direction. Metamorphisms (as far as I know) have no bracket notation in literature, but I want to mention that we do have a name for such things. My guess is that any metamorphism can actually be thought of as a hylomorphism, since the objects being operated on must be both foldable and unfoldable, but I don’t know for sure.

Finally, note that we can also create another function:

what :: Ord a => BST a -> BST a
what = fromList . toList

which is also a hylo- and metamorphism. However, this isn’t very useful (in fact, one might consider it counterproductive), but I’ll leave it as an exercise to the reader to figure out why.