LR-grep documentation session
=============================
`cmon` is missing from the README instructions:
```
lrgrep$ dune build src/main.exe ocaml/interpreter.exe ocaml/frontend.bc
File "src/dune", line 5, characters 16-20:
5 |             fix cmon utils menhirSdk lrijkstra))
                    ^^^^
Error: Library "cmon" not found.
-> required by _build/default/src/main.exe
File "lib/utils/dune", line 3, characters 12-16:
3 |  (libraries cmon fix))
                ^^^^
Error: Library "cmon" not found.
-> required by library "utils" in _build/default/lib/utils
-> required by executable interpreter in ocaml/dune:8
-> required by _build/default/ocaml/interpreter.exe
```
Looking at examples from StackOverflow:
1. https://stackoverflow.com/questions/71847555/ocaml-syntax-error-but-i-dont-know-where-can-somoene-help-me
   Already handled! LATER: Could improve the message (link to the manual?)
2. https://stackoverflow.com/questions/52323697/why-is-there-a-ocaml-syntax-error-on-if-statement
   This one is interesting: there is a semicolon after the `if ... then ...`, which terminates the expression for the OCaml parser, so parsing fails on the `else`.
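
To make the second case concrete, here is a minimal sketch of the failing shape and one way to repair it. The function `describe` is made up for illustration and is not taken from the linked question.

```ocaml
(* Rejected by the parser: the semicolon ends the one-armed
   if-then expression, so the else that follows has nothing
   to attach to.

   let describe n =
     if n < 0 then
       print_endline "negative";
     else
       print_endline "non-negative"
*)

(* Accepted: no semicolon between the then branch and else. *)
let describe n =
  if n < 0 then "negative"
  else "non-negative"

let () = print_endline (describe (-1))
```
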
The important files for specifying the grammar and the error messages are:
- the parser `ocaml/parser_raw.mly` and the lexer `ocaml/lexer_raw.mll`, which are taken from the OCaml frontend with the built-in error handling removed
- `ocaml/parse_errors.mlyl`, in which we define the error matching rules
(TODO: mention that the user should run the shell script to set up OCAMLPARAM)
Devising a new rule
-------------------
Consider the following example from StackOverflow: https://stackoverflow.com/questions/70691144/match-inside-match-ocaml-raises-syntax-error
This code snippet has multiple problems, so let's stick to the first part.
The last type declaration in this part switches from a colon `:` to an equal sign `=`
in the middle of a record type definition:
```ocaml=
type 'a grid = 'a Array.t Array.t
type problem = { initial_grid : int option grid }
type available = { loc : int * int; possible : int list }
type state = { problem : problem; current_grid : int option grid; available = available list }
```
Without any error patterns, `ocamlc` just reports a syntax error:
```
$ ocamlc demo/stackoverflow3.ml
File "demo/stackoverflow3.ml", line 9, characters 76-77:
9 | type state = { problem : problem; current_grid : int option grid; available = available list }
                                                                                ^
Error: Syntax error
```
We now run the interpreter:
```
$ dune exec ocaml/interpreter.exe demo/stackoverflow3.ml
File "demo/stackoverflow3.ml", line 9, characters 76-77, parser stack (most recent first):
- line 9:66-75 LIDENT
label_declaration_semi ::= mutable_flag LIDENT . COLON possibly_poly(core_type_no_attr) list(attribute) SEMI list(attribute)
label_declaration ::= mutable_flag LIDENT . COLON possibly_poly(core_type_no_attr) list(attribute)
- line 9:65-65 mutable_flag
- line 9:33-65 label_declaration_semi
- line 9:14-33 label_declaration_semi
- line 9:13-14 LBRACE
- line 9:11-12 EQUAL
- ...
```
This output reveals that the parser is midway through either a `label_declaration_semi` or a `label_declaration` rule.
The `.` marks the current position of the parser within a rule, for example in the line
`label_declaration ::= mutable_flag LIDENT . COLON possibly_poly(core_type_no_attr) list(attribute)`
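
As a rough illustration of how to read the dot, the sketch below models such an item as a production plus an index into its right-hand side. The `item` type and its helper functions are hypothetical and exist only for this example; they are not part of lrgrep or Menhir.

```ocaml
(* Hypothetical representation of an LR item: a production plus the
   index of the dot within its right-hand side. *)
type item = { lhs : string; rhs : string list; dot : int }

(* Symbols already recognised (left of the dot). *)
let recognised it = List.filteri (fun i _ -> i < it.dot) it.rhs

(* Symbol the parser expects next, if any (right of the dot). *)
let expected it = List.nth_opt it.rhs it.dot

(* Render the item in the same notation as the interpreter output. *)
let to_string it =
  let before = recognised it in
  let after = List.filteri (fun i _ -> i >= it.dot) it.rhs in
  String.concat " " ((it.lhs :: "::=" :: before) @ ("." :: after))

let label_declaration = {
  lhs = "label_declaration";
  rhs = [ "mutable_flag"; "LIDENT"; "COLON";
          "possibly_poly(core_type_no_attr)"; "list(attribute)" ];
  dot = 2;
}

let () = print_endline (to_string label_declaration)
```

Here `expected label_declaration` is `Some "COLON"`: the parser has consumed `mutable_flag LIDENT` and can only continue with a colon, which is why the `=` in the input is rejected.
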
We can now capture the problematic situation with a pattern that matches exactly this item:
```
| [label_declaration: mutable_flag LIDENT . COLON]
  { ... }
```
We want to express that we are looking ahead at an equal sign in the token stream.
The lookahead `token` is passed as a parameter to the semantic action, so we can pattern match on it directly.
This also means that our rule is not total: it should only apply when the token is `Parser_raw.EQUAL`. This is expressed by adding the `partial` keyword before the semantic action. A `partial` action must evaluate to an option: returning `Some _` means that the rule applied, while returning `None` makes lrgrep resume matching with the next rules. With this in mind, we formulate the following pattern and add it to `ocaml/parse_errors.mlyl`:
```
| [label_declaration: mutable_flag LIDENT . COLON]
  partial {
    match token with
    | Parser_raw.EQUAL ->
      Some "Expecting ':' to declare the type of a record field, not '='"
    | _ -> None
  }
```
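
The `Some _` / `None` convention of `partial` actions can be modelled in plain OCaml as follows. This is only a sketch of the behaviour described above, not of how lrgrep implements it: the `token` type, the `action` type, `fallback`, and `first_match` are all hypothetical, and the sketch ignores the stack pattern that a real rule must also match.

```ocaml
(* Hypothetical stand-in for the tokens of Parser_raw. *)
type token = EQUAL | COLON | SEMI

(* A partial action: Some message when it applies, None otherwise. *)
type action = token -> string option

let record_field_equal : action = function
  | EQUAL -> Some "Expecting ':' to declare the type of a record field, not '='"
  | _ -> None

(* A total action always produces a message. *)
let fallback : action = fun _ -> Some "Syntax error"

(* Try the actions in order and keep the first result that is not None. *)
let first_match (actions : action list) (tok : token) : string option =
  List.find_map (fun act -> act tok) actions

let () =
  match first_match [ record_field_equal; fallback ] EQUAL with
  | Some msg -> print_endline msg
  | None -> print_endline "no rule applied"
```

With `SEMI` as the lookahead, `record_field_equal` returns `None` and the search moves on to `fallback`.
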
Next, we recompile the OCaml frontend with `make` so that it picks up the newly added error pattern.
Rerunning `ocamlc` now gives a nice error message:
```
$ ocamlc demo/stackoverflow3.ml
File "demo/stackoverflow3.ml", line 9, characters 76-77:
9 | type state = { problem : problem; current_grid : int option grid; available = available list }
                                                                                ^
Error: Expecting ':' to declare the type of a record field, not '='
```
A more complex example
----------------------
- Run the interpreter to dump the state of the parser when it fails:
```
$ dune exec ../ocaml/interpreter.exe stackoverflow2.ml
Entering directory '/home/jmi/software/lrgrep'
File "stackoverflow2.ml", line 4, characters 2-6, parser stack (most recent first):
- line 3:22-23 SEMI
seq_expr ::= expr SEMI . PERCENT attr_id seq_expr
seq_expr ::= expr SEMI . seq_expr
seq_expr ::= expr SEMI .
- from 2:2 to 3:22 expr
↱ seq_expr
strict_binding ::= EQUAL seq_expr .
- line 1:30-31 EQUAL
↱ strict_binding
fun_binding ::= strict_binding .
↱ fun_binding
strict_binding ::= labeled_simple_pattern fun_binding .
- line 1:26-29 labeled_simple_pattern
```
- The output tells us that the `if ... then ...` has already been reduced to an `expr` by the one-armed `if` rule in the OCaml grammar:
```
| IF ext_attributes seq_expr THEN expr
    { Pexp_ifthenelse($3, $5, None), $2 }
```
- Add a (more complex) rule:
```
| expr as e; SEMI
  partial {
    match token with
    | Parser_raw.ELSE -> (
        match e with
        | None -> assert false
        | Some (Parser_raw.MenhirInterpreter.Element (state, expr, startp, _endp)) ->
          match Parser_raw.MenhirInterpreter.incoming_symbol state with
          | N N_expr -> (
              match expr.pexp_desc with
              | Pexp_ifthenelse(_, _, None) ->
                Some ("The semicolon line "
                      ^ string_of_int startp.pos_lnum
                      ^ ", character "
                      ^ string_of_int (startp.pos_cnum - startp.pos_bol)
                      ^ " terminates the if _ then _ expression. \
                         Remove it to add an else branch."
                     )
              | _ -> None
            )
          | _ -> None
      )
    | _ -> None
  }
```
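
The position arithmetic in this rule can be tried in isolation with the standard `Lexing.position` record. The sketch below assumes that `startp` is such a record, as the field names in the rule suggest. The helper names `column` and `semicolon_message` and the sample position are made up for illustration.

```ocaml
(* Column of a position: offset from the beginning of its line. *)
let column (p : Lexing.position) = p.pos_cnum - p.pos_bol

(* Build the same message as the rule, from a given position. *)
let semicolon_message (p : Lexing.position) =
  "The semicolon line "
  ^ string_of_int p.pos_lnum
  ^ ", character "
  ^ string_of_int (column p)
  ^ " terminates the if _ then _ expression. \
     Remove it to add an else branch."

(* Made-up position: line 3, two characters after the line start. *)
let example =
  { Lexing.pos_fname = "stackoverflow2.ml";
    pos_lnum = 3;
    pos_bol = 40;
    pos_cnum = 42 }

let () = print_endline (semicolon_message example)
```

Note that `startp` in the rule is the position where the captured `expr` begins. In the dump above that expression spans 2:2 to 3:22, while the `SEMI` itself sits at 3:22-23.
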
Misc notes
----------
A few things noticed along the way:
- missing `cmon` dependency
- `realpath` to allow the shell script to run outside `demo`
- Jan noticed a slight difference between `::=` from interpreter and `:` in mlyl
- LATER: typing of captured constructs in the generated code (avoid wrapping with options and manually matching Menhir's GADT when it is safe)
- LATER: in the coverage output, factor common reductions
- TODO: Rearrange sections in increasing order of difficulty
- LATER: Document other error rules