# Parsec: A convenient DSL for writing Parsers
### By: Kendrick, Kang Liang, Chee Yuan
---
# Overview
1. Introduction, Monad Transformers, Input Streams
1. Parsec Basics (no combinators)
1. Parsec Combinators, Streams, ParsecT
---
# Difference from lecture content
- Offering a different perspective
- More complete look at Parsec
- Peek into source of actual Parsec library
---
# Introduction
Parsec is a **Domain-Specific Language (DSL)** for writing Parsers
- A DSL is a customised language targeted to a particular kind of problem
- HTML, Regular Expressions, Parsec
- Parsec is an *internal* DSL, which uses the host language to give itself the feel of another language
- Regex is an *external* DSL, independent of its programming language.
- Parsec deals with the problem of *Parser Combinators* - defining basic character parsers and ways to combine them into complex parsers
---
# Introduction
Hackage:
"Parsec is a *monad transformer* that can be stacked on arbitrary monads, and is parametric in the *input stream* type."
- Monad Transformer?
- Input Stream?
---
# Monad Transformer
*Monad Transformers are functions that accept a (different) monad, and wrap around it to make a bigger, better monad.*
- E.g Add **Maybe** powers to **IO** computations
- Exit an IO operation is Nothing is encountered
- Naming convention: If I add Maybe powers, call the Monad Transformer MaybeT
```digit :: Stream s m Char => ParsecT s u m Char```
- ParsecT with stream type `s`, user state type `u`, underlying monad `m`, return type `a`
- "Giving parsec powers" to underlying monad `m`
---
# Input Stream
```digit :: Stream s m Char => ParsecT s u m Char```
- Will be explained in-depth later
- Think of it as a data structure with a list of tokens
- The parser (over `m`) can remove a single token at a time
---
# Parsec's Libraries

- Text.ParserCombinators is a legacy library, highlighted = covered in lecture
---
# Parsec: Basics
Some initial setup:
```haskell
import Text.Parsec.String (Parser)
import Text.Parsec
regularParse :: Stream s Identity t =>
Parsec s () a -> s -> Either ParseError a
regularParse p = parse p ""
```
---
```haskell
satisfy :: Stream s m Char => (Char -> Bool) -> ParsecT s u m Char
```
Satisfy is one of the primitive functions in Parsec. It looks at the next Char from the current input, and pops it if (Char -> Bool) returns true.
```haskell
regularParse (satisfy (=='a')) "a"
>> Right 'a'
regularParse (satisfy (=='b')) "a"
>> Left (line 1, column 1):
>> unexpected "a"
>> expecting "b"
regularParse (satisfy (`elem` "abc")) "a"
>> Right 'a'
```
---
We can use Satisfy to build our own rules, which is done in the original source code
```haskell
-- | Parses a digit. Returns the parsed character.
digit :: (Stream s m Char) => ParsecT s u m Char
digit = (satisfy isDigit) <?> "digit"
-- | Parses a white space character (any character which satisfies 'isSpace')
-- Returns the parsed character.
space :: (Stream s m Char) => ParsecT s u m Char
space = (satisfy isSpace) <?> "space" = satisfy isSpace <?> "space"
```
- Source code for http://hackage.haskell.org/package/parsec-3.1.3/docs/src/Text-Parsec-Char.html
---
## Parse multiple characters
```haskell
num :: Stream s m Char => ParsecT s u m [Char]
num = many1 digit
regularParse num "1"
>> Right "1"
regularParse num "123456a"
>> Right "123456"
```
- Other than parsing individual characters, we also need to parse multiple characters
- We do so using combinators, which would be further elaborated later on
- `many1`: Parses 1 or more of the given parser
---
## Whitespace
Whitespaces are generally ignored in most languages. Thus, we need a parser to consume and discard whitespaces.
```haskell
whitespace = many $ oneOf " \n\t"
regularParse whitespace " "
>> Right " "
num :: Stream s m Char => ParsecT s u m [Char]
num = do
whitespace
e <- many1 digit
whitespace
return e
regularParse num " 123 "
>> Right "123"
```
---
## Parentheses
To parse an Integer within a pair of parentheses, we would do something like this:
```haskell
parens :: Stream s m Char => ParsecT s u m [Char]
parens = do
whitespace
char '('
whitespace
e <- many1 digit
whitespace
char ')'
whitespace
return e
regularParse parens " ( 123 ) "
>> Right "123"
```
---
## Lexeme
```haskell
lexeme :: Stream s m Char => ParsecT s u m b -> ParsecT s u m b
lexeme p = do
x <- p
whitespace
return x
```
- 'lexeme' parsing
- Every token parser should consume and ignore any trailing whitespace.
- Allows skipping whitespace exactly once wherever it needs to be skipped
- More elegant than spamming all the parsing code with many calls to whitespace
---
## Comparison
```haskell
parens :: Stream s m Char => ParsecT s u m [Char]
parens = do
whitespace
char '('
whitespace
e <- many1 digit
whitespace
char ')'
whitespace
return e
regularParse parens " ( 123 ) "
>> Right "123"
```
```haskell
parensL :: Stream s m Char => ParsecT s u m [Char]
parensL = do
whitespace
lexeme $ char '('
e <- lexeme $ many1 digit
lexeme $ char ')'
return e
regularParse parensL " ( 123 ) "
>> Right "123"
```
---
# Parsec: Combinators
- Combinators make use of simple parser building blocks
- Allow us to build up more complex parsers by combining simpler ones
---
# Stream
```haskell
class Monad m => Stream s m t | s -> t where
uncons :: s -> m (Maybe (t, s))
```
- An instance of Stream has stream type s, underlying monad m and token type t determined by the stream
- Can be thought as a data structure holding a stream of tokens t
- Has the instance `Monad m => Stream [tok] m tok`
- Has the instance `Monad m => Stream ByteString m Char`
- Responsible for maintaining the "position within the stream" in the stream state s which is trivial in many cases
---
# ParsecT
- `ParsecT s u m a` is a parser with stream type `s`, user state type `u`, underlying monad `m` and return type `a`
- Has Functor, Applicative, Monad instance
- For our demonstration, we can think of the stream type as `String`, user state as `()`, underlying monad as `Identity`, and a return type of `Char`
```haskell
type Parsec s u = ParsecT s u Identity
```
---
# Parsing
```haskell
regularParse :: Stream s Identity t =>
Parsec s () a -> s -> Either ParseError a
regularParse p = parse p ""
parseA :: Stream s m Char => ParsecT s u m Char
parseA = char 'a'
parseB :: Stream s m Char => ParsecT s u m Char
parseB = char 'b'
```
---
# Combinators
```haskell
many1 :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m [a]
many1 p = do { x <- p; xs <- many p; return (x:xs) }
parseManyA = many1 parseA
regularParse parseManyA "aaaabaaaa"
>> Right "aaaa"
```
```haskell
choice :: (Stream s m t) => [ParsecT s u m a] -> ParsecT s u m a
choice ps = foldr (<|>) mzero ps
parseAOrB = choice [parseA, parseB]
regularParse parseAOrB "a"
>> Right 'a'
regularParse parseAOrB "b"
>> Right 'b'
regularParse parseAOrB "c"
>> unexpected "c"
expecting "a" or "b"
```
```haskell
chainl1 :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a
chainl1 p op = do{ x <- p; rest x }
where
rest x = do { f <- op;
y <- p;
rest (f x y)}
<|> return x
```
---
# The End
---
<style>
.reveal {
font-size: 24px;
}
</style>
{"metaMigratedAt":"2023-06-15T15:28:36.340Z","metaMigratedFrom":"Content","title":"Parsec: A convenient DSL for writing Parsers","breaks":true,"contributors":"[{\"id\":\"1b3703e1-3202-4d4b-a33d-45b2acabfb72\",\"add\":4666,\"del\":1883},{\"id\":\"73cbcfa9-3aef-4b30-8277-049991af2e88\",\"add\":2282,\"del\":244},{\"id\":\"3f0da83e-7d5e-4c4a-b160-9f47bdce02eb\",\"add\":4314,\"del\":1495}]"}