Analysing grammatical errors in L1-L2 UD treebanks: first steps towards a multilingual AWE tool (abstract for something)

---
title: "Analysing grammatical errors in L1-L2 UD treebanks: first steps towards a multilingual AWE tool"
author: Arianna Masciolini
---

L1-L2 parallel dependency treebanks are learner corpora of L2 sentences paired with correction hypotheses. Both the original sentences and the corresponding corrections are morphosyntactically annotated, usually following the cross-lingual Universal Dependencies (UD) standard.

This format was proposed in an attempt to overcome the interoperability issues arising from the coexistence of several markup styles and error tagsets currently used for annotating learner text. In addition to being language-agnostic, L1-L2 treebanks do not rely on any predefined error taxonomy: error classes can be defined dynamically as treebank queries, expressed as tree patterns.

We argue that, if correction hypotheses are obtained automatically through Grammatical Error Correction software and morphosyntactic annotation is performed by a UD parser, L1-L2 treebanks also provide interesting opportunities for developing multilingual Automatic Writing Evaluation (AWE) tools. In this setting, errors are identified by looking for discrepancies between L1 and L2 trees, i.e. between UD representations of learner sentences and their correction hypotheses. The discrepancies themselves can be described by the same tree patterns used for queries, which can in turn serve as a basis for generating learner-friendly feedback.

While small-scale L1-L2 treebanks exist for English, Chinese and Italian, this format has not yet become widespread. One obstacle is the lack of a sufficiently powerful query engine. In the original proposal, queries were intended to be pairs of dependency tree patterns. However, users of what appears to be the only publicly available L1-L2 query engine, developed for the Treebank of Learner English, are only allowed to specify custom tree patterns for L1 sentences.
When it comes to grammatical errors, only a set of pre-defined patterns, presented as a list of error labels, is made available. Moreover, the pattern matching language itself is limiting because it treats UD sentences as sequences of tokens rather than trees.

For these reasons, our first contribution is a new query engine that makes it possible to look for discrepancies at an arbitrary level of granularity and that is applicable to L1-L2 treebanks as well as to any other parallel UD corpus. We use a pre-existing pattern matching language for UD trees as a starting point. A basic version of the engine, under development at the time of writing, is based on the above-mentioned idea of expressing L1-L2 queries as pairs of patterns. First, learner sentences are aligned to the corresponding correction hypotheses at the phrase and word level, resulting in a set of correspondences between L1 and L2 subtrees. After that, the L1 pattern is used to retrieve subtrees from the L1 treebank. Finally, the resulting L1 subtrees are looked up among the L1-L2 alignments to select those where the L2 subtree matches the L2 pattern. Given the verbosity of some simple queries in this preliminary version of the engine, we plan to extend the pattern matching language in at least two ways: by introducing variables and by providing syntactic sugar that allows pairs of patterns to be written as single expressions.

We plan to test our query engine on the handcrafted L2 Italian VALICO corpus, which has been converted to UD after having originally been built as an error-tagged learner corpus. The fact that the conversion to UD has been performed manually ensures that errors in learner sentences are annotated consistently, based on guidelines following the general principle of _literal reading_, i.e. adhering as much as possible to the observed word forms and usages.
Our aims are to show the reliability of our engine by establishing mappings between tree patterns and some of the original error labels, and to demonstrate the query language's expressive power by providing examples of queries that could not be performed by simply searching for specific error tags.

We also intend to experiment with L2 Swedish essay sentences from the SweLL corpus, in this case annotating the text automatically with a standard UD parser. Both the precision and the recall of our error retrieval may be affected by parse errors, since L2 text is especially challenging for mainstream parsers, whose annotations of ungrammatical segments are inconsistent at best and misleading at worst. Our hope is that investigating the limitations of existing tools in this respect will give us insights that later help us design a parsing strategy specifically meant for our use case, where the parsing of an L2 sentence could be informed by the annotation of the corresponding correction hypothesis.

Finally, we argue that our pattern matching syntax is also a viable format for describing the errors found in learner text. Our second contribution will therefore be a program that, rather than finding example sentences given a specific error pattern, extracts error descriptions from L1-L2 treebanks. As mentioned above, error descriptions are syntactically identical to queries and therefore, by definition, machine-readable. Given this, the generation of user-friendly metalinguistic feedback can be seen as a data-to-text conversion task.
If time allows, we intend to address this problem by implementing a Controlled Natural Language in Grammatical Framework, a well-established programming language for multilingual grammar engineering. This would allow us to provide multilingual natural language feedback adjustable to the learner's proficiency level and metalinguistic awareness, a technology that could be integrated into any language learning environment that processes free user input.
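
The three-step query procedure described earlier (alignment, L1 matching, L2 filtering) can be illustrated with a minimal Python sketch. It is not the engine under development, and it does not use the pre-existing pattern matching language. The `Node` and `Pattern` classes, the `matches` and `query` functions, and the toy Italian alignments are all hypothetical and made up for illustration. The alignment step is assumed to have already produced pairs of L1 and L2 subtrees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """A UD subtree: the head token's annotation plus its dependents."""
    form: str
    upos: str
    deprel: str
    feats: frozenset = frozenset()
    children: tuple = ()


@dataclass(frozen=True)
class Pattern:
    """Hypothetical tree pattern; None means 'any value'."""
    upos: Optional[str] = None
    deprel: Optional[str] = None
    feats: frozenset = frozenset()
    children: tuple = ()


def matches(node, pattern):
    """True if the subtree rooted at `node` satisfies `pattern`."""
    if pattern.upos is not None and node.upos != pattern.upos:
        return False
    if pattern.deprel is not None and node.deprel != pattern.deprel:
        return False
    if not pattern.feats <= node.feats:
        return False
    # Every child pattern must be satisfied by at least one dependent.
    return all(any(matches(child, p) for child in node.children)
               for p in pattern.children)


def query(alignments, l1_pattern, l2_pattern):
    """Keep aligned pairs whose L1 side matches the L1 pattern and
    whose L2 side matches the L2 pattern."""
    l1_hits = [(l1, l2) for l1, l2 in alignments if matches(l1, l1_pattern)]
    return [(l1, l2) for l1, l2 in l1_hits if matches(l2, l2_pattern)]


def noun_phrase(noun, deprel, det, det_number):
    """Build a toy noun subtree with a single determiner dependent."""
    det_node = Node(det, "DET", "det", frozenset({f"Number={det_number}"}))
    return Node(noun, "NOUN", deprel, frozenset({"Number=Sing"}), (det_node,))


# Made-up alignments: (L1 correction subtree, L2 learner subtree).
alignments = [
    (noun_phrase("casa", "obj", "la", "Sing"),
     noun_phrase("casa", "obj", "le", "Plur")),
    (noun_phrase("gatto", "nsubj", "il", "Sing"),
     noun_phrase("gatto", "nsubj", "il", "Sing")),
]

# Query: a singular determiner in L1 realised as a plural one in L2.
l1_pattern = Pattern(upos="NOUN", children=(
    Pattern(upos="DET", feats=frozenset({"Number=Sing"})),))
l2_pattern = Pattern(upos="NOUN", children=(
    Pattern(upos="DET", feats=frozenset({"Number=Plur"})),))

hits = query(alignments, l1_pattern, l2_pattern)
print([l2.children[0].form for _, l2 in hits])
```

Expressing the error class as a pair of patterns, rather than as a fixed label, is what lets the same mechanism retrieve discrepancies at any level of granularity. The sketch also shows the verbosity that variables and single-expression syntactic sugar are meant to reduce, since the two patterns differ in only one feature value.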