Abstract Nonsense

Abstract Nonsense

Ramblings by Benjamin Kovach

Theme adapted from minimal by orderedlist.

Demand More of Your Automata

Published by Ben Kovach on August 22, 2014

Tags: haskell, automata, monad reader

Notes on Demand More of Your Automata from the Monad Reader 16.

Intro: We want to match on more than just strings, and in a way that actually makes sense instead of stuff like

(?P<area>\d{3})[- .](?P<n1>\d{3})[-.](?P<n2>\d{4})

Regexes might “match” a string.

Three operations:

  • Glue two regexes together
  • Match \(r_1\) OR \(r_2\) for regexes \(r_i\)
  • Match a regex zero or more times (Kleene closure)
data RegularExpression
    = Empty
    | Singleton Char
    | Kleene RegularExpression
    | Catenation RegularExpression RegularExpression
    | Alternation RegularExpression RegularExpression

for (ab)*c|d, we have syntax mapping:

  1. Singleton 'a' \(\mapsto\) a
  2. Kleene (...) \(\mapsto\) (...)*
  3. Catenation (Singleton 'a') (Singleton 'b') \(\mapsto\) ab
  4. Alteration (Singleton 'a') (Singleton 'b') \(\mapsto\) a|b

True regexes guarantee that everything can happen in one pass of the input. Regex matchers usually implemented in NDAs or DFAs (only powerful enough to match regular languages).

Overview: DFA, NFA in text.

Haskell representations:

-- Assume "type State = Int" or something like that.
-- transition returns Nothing if in "error state" (w/ no transitions)
data DFA
    = DFA { transition :: State -> Char -> Maybe State
          , start      :: State
          , accepting  :: State -> Bool
          }
data NFA
    = NFA { transition :: State -> Maybe Char -> Set State
          , start      :: State
          , accepting  :: State -> Bool
          }

In NFAs, \(\epsilon\) transitions are represented by Nothing.

Why just [Char]? We can do better!

data RegularExpression a
    = Empty
    | Singleton a
    | Kleene (RegularExpression a)
    | Catenation (RegularExpression a) (RegularExpression a)
    | Alternation (RegularExpression a) (RegularExpression a)
    deriving (Show, Functor)

A way to make NFAs directly:

type StateLabel = Int
type MapNFATransitionMap a
    = Map.Map (StateLabel, Maybe a) (Set.Set StateLabel)
data MapNFA a = MapNFA { transitionMap :: MapNFATransitionMap a
                       , startKey :: StateLabel
                       , acceptKeys :: Set.Set StateLabel
                       } deriving (Show
empty :: Ord a => MapNFA a
singleton :: Ord a => a -> MapNFA a
catenate :: Ord a => MapNFA a -> MapNFA a -> MapNFA a
alternate :: Ord a => MapNFA a -> MapNFA a -> MapNFA a
loop :: Ord a => MapNFA a -> MapNFA a

We implement loop here instead of kleene, which can be represented as kleene = optional . loop whereoptional = alternate empty`.

The rest of the article is about HOFs for abstracting away regex design, visualizing with GraphViz, compiling to LLVM.

I skimmed through the rest of it, because it’s the Friday of my first week back at school and I’m tired. It looks like the final implementation of the Automata library isn’t on hackage or github, and the link is broken. :(

Revisit this one later, it’s pretty cool. One of the non-string matching regexes was about matching lists of user actions (clicking, typing, following a link). Neat.

comments powered by Disqus