LR-grep documentation session
=============================

`cmon` is missing from the README instructions:

```
lrgrep$ dune build src/main.exe ocaml/interpreter.exe ocaml/frontend.bc
File "src/dune", line 5, characters 16-20:
5 |             fix cmon utils menhirSdk lrijkstra))
                    ^^^^
Error: Library "cmon" not found.
-> required by _build/default/src/main.exe
File "lib/utils/dune", line 3, characters 12-16:
3 |  (libraries cmon fix))
                ^^^^
Error: Library "cmon" not found.
-> required by library "utils" in _build/default/lib/utils
-> required by executable interpreter in ocaml/dune:8
-> required by _build/default/ocaml/interpreter.exe
```

Looking at examples from StackOverflow:

1. https://stackoverflow.com/questions/71847555/ocaml-syntax-error-but-i-dont-know-where-can-somoene-help-me
   Already handled! LATER: Could improve the message (link to the manual?)
2. https://stackoverflow.com/questions/52323697/why-is-there-a-ocaml-syntax-error-on-if-statement

Example 2 is interesting: there is a semicolon after the `if ... then ...`, which terminates this expression for the OCaml parser, so parsing fails on the `else`.

The important files that specify the grammar and the error messages are:

- the parser `ocaml/parser_raw.mly` and the lexer `ocaml/lexer_raw.mll`, which are taken from the OCaml frontend with the builtin error handling removed;
- `ocaml/parse_errors.mlyl`, in which we define the error matching rules (TODO: mention that the user should run the shellscript to set up OCAMLPARAM).

Devising a new rule
-------------------

Consider the following example from StackOverflow: https://stackoverflow.com/questions/70691144/match-inside-match-ocaml-raises-syntax-error

This code snippet has multiple problems, so let's stick to the first part.
The type declarations in this part switch from a colon `:` to an equal sign `=` in the middle of a record type definition:

```ocaml=
type 'a grid = 'a Array.t Array.t
type problem = { initial_grid : int option grid }
type available = { loc : int * int; possible : int list }
type state = { problem : problem; current_grid : int option grid; available = available list }
```

Without any patterns, `ocamlc` just yields a syntax error:

```
$ ocamlc demo/stackoverflow3.ml
File "demo/stackoverflow3.ml", line 9, characters 76-77:
9 | type state = { problem : problem; current_grid : int option grid; available = available list }
                                                                                ^
Error: Syntax error
```

We now run the interpreter:

```
$ dune exec ocaml/interpreter.exe demo/stackoverflow3.ml
File "demo/stackoverflow3.ml", line 9, characters 76-77, parser stack (most recent first):
- line 9:66-75 LIDENT
  label_declaration_semi ::= mutable_flag LIDENT . COLON possibly_poly(core_type_no_attr) list(attribute) SEMI list(attribute)
  label_declaration ::= mutable_flag LIDENT . COLON possibly_poly(core_type_no_attr) list(attribute)
- line 9:65-65 mutable_flag
- line 9:33-65 label_declaration_semi
- line 9:14-33 label_declaration_semi
- line 9:13-14 LBRACE
- line 9:11-12 EQUAL
- ...
```

This output reveals that the parser is midway through either a `label_declaration_semi` or a `label_declaration` rule. In each of these items, the `.` marks how far the parser has progressed: in `label_declaration ::= mutable_flag LIDENT . COLON possibly_poly(core_type_no_attr) list(attribute)`, it has read `mutable_flag LIDENT` and now expects a `COLON`.

We can now capture the problematic situation with a pattern that recognizes precisely this parser state:

```
| [label_declaration: mutable_flag LIDENT . COLON]
  { ... }
```

We also want to express that the lookahead, i.e. the next token in the stream, is an equal sign. The lookahead `token` is passed as a parameter to the semantic action, so we can pattern match on it directly. This also means that our rule is not total: it should only apply when the token is `Parser_raw.EQUAL`.
This is expressed by adding the `partial` keyword before the semantic action. When an action is `partial`, it must evaluate to an option: returning `Some _` means that the rule applied, while returning `None` makes lrgrep resume matching with the next rules. With this in mind, we formulate the following pattern and add it to `ocaml/parse_errors.mlyl`:

```
| [label_declaration: mutable_flag LIDENT . COLON]
  partial {
    match token with
    | Parser_raw.EQUAL ->
      Some "Expecting ':' to declare the type of a record field, not '='"
    | _ -> None
  }
```

Next, we recompile the OCaml frontend with `make` so that it picks up the newly added error pattern. Rerunning `ocamlc` now gives a nice error message:

```
$ ocamlc demo/stackoverflow3.ml
File "demo/stackoverflow3.ml", line 9, characters 76-77:
9 | type state = { problem : problem; current_grid : int option grid; available = available list }
                                                                                ^
Error: Expecting ':' to declare the type of a record field, not '='
```

A more complex example
----------------------

- Run the interpreter to dump the state of the parser when it fails:

```
$ dune exec ../ocaml/interpreter.exe stackoverflow2.ml
Entering directory '/home/jmi/software/lrgrep'
File "stackoverflow2.ml", line 4, characters 2-6, parser stack (most recent first):
- line 3:22-23 SEMI
  seq_expr ::= expr SEMI . PERCENT attr_id seq_expr
  seq_expr ::= expr SEMI . seq_expr
  seq_expr ::= expr SEMI .
- from 2:2 to 3:22 expr
  ↱ seq_expr
  strict_binding ::= EQUAL seq_expr .
- line 1:30-31 EQUAL
  ↱ strict_binding
  fun_binding ::= strict_binding .
  ↱ fun_binding
  strict_binding ::= labeled_simple_pattern fun_binding .
- line 1:26-29 labeled_simple_pattern
```

- The output tells us that the `expr` preceding the `SEMI` (from 2:2 to 3:22) has already been reduced, using the one-armed `if` rule of the OCaml grammar:

```
| IF ext_attributes seq_expr THEN expr
    { Pexp_ifthenelse($3, $5, None), $2 }
```

- Add a (more complex) rule that fires when the lookahead is `else` and the expression before the `SEMI` is a one-armed `if`:

```
| expr as e; SEMI
  partial {
    match token with
    | Parser_raw.ELSE -> (
      match e with
      | None -> assert false
      | Some (Parser_raw.MenhirInterpreter.Element (state, expr, _startp, endp)) ->
        (* [endp] is the end of the one-armed [if]; the offending
           semicolon follows it. [_startp] would point at the [if] itself. *)
        match Parser_raw.MenhirInterpreter.incoming_symbol state with
        | N N_expr -> (
          match expr.pexp_desc with
          | Pexp_ifthenelse(_, _, None) ->
            Some ("The semicolon at line " ^ string_of_int endp.pos_lnum ^
                  ", character " ^ string_of_int (endp.pos_cnum - endp.pos_bol) ^
                  " terminates the if _ then _ expression. \
                   Remove it to add an else branch.")
          | _ -> None
        )
        | _ -> None
    )
    | _ -> None
  }
```

Misc notes
----------

A few things noticed along the way:

- missing `cmon` dependency
- `realpath` to allow the shellscript to run outside `demo`
- Jan noticed a slight difference between `::=` in the interpreter output and `:` in the mlyl
- LATER: typing of captured constructs in the generated code (avoid wrapping with options and manually matching Menhir's GADT when it is safe)
- LATER: in the coverage output, factor common reductions
- TODO: Rearrange sections in increasing order of difficulty
- LATER: Document other error rules
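A side note on the `partial` semantics that both rules above rely on: a `partial` action returns `Some message` when it applies and `None` to let lrgrep resume matching with the next rules. The OCaml sketch below models only that first-match behaviour. It is an illustration, not lrgrep's generated code: the `token` type, the `rule` type, `first_match`, and the two rule functions are hypothetical simplifications, and each function stands for a rule whose stack pattern has already matched, so it only inspects the lookahead.

```ocaml
(* Toy model of first-match semantics for [partial] actions.
   Hypothetical simplification: every rule below stands for a rule whose
   stack pattern already matched, so it only looks at the lookahead. *)

(* Hypothetical subset of the parser's tokens. *)
type token = EQUAL | COLON | ELSE | SEMI

(* A partial action: [Some msg] if the rule applies, [None] otherwise. *)
type rule = token -> string option

(* Mirrors the record-field rule: applies only when the lookahead is EQUAL. *)
let record_field_rule : rule = function
  | EQUAL -> Some "Expecting ':' to declare the type of a record field, not '='"
  | _ -> None

(* Mirrors only the lookahead test of the semicolon rule; the real rule also
   checks that the expression before SEMI is a one-armed [if]. *)
let semicolon_else_rule : rule = function
  | ELSE -> Some "A semicolon terminates the if _ then _ expression."
  | _ -> None

(* Try the rules in order and return the first message produced.
   [None] means that no rule applied. *)
let rec first_match (rules : rule list) (tok : token) : string option =
  match rules with
  | [] -> None
  | rule :: rest ->
    (match rule tok with
     | Some _ as msg -> msg
     | None -> first_match rest tok)

let () =
  let rules = [ record_field_rule; semicolon_else_rule ] in
  (* In this model we fall back to a generic message when no rule applies. *)
  let show = function Some msg -> msg | None -> "Syntax error" in
  print_endline (show (first_match rules EQUAL));
  (* prints: Expecting ':' to declare the type of a record field, not '=' *)
  print_endline (show (first_match rules ELSE));
  (* prints: A semicolon terminates the if _ then _ expression. *)
  print_endline (show (first_match rules COLON))
  (* prints: Syntax error *)
```

In this model the order of the list matters: an earlier rule that returns `Some _` shadows every later one, which is why a `partial` rule should return `None` for the lookaheads it does not handle.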