---
tags: programming languages, ars, rewriting
---
(part of a course on programming languages)
# String Rewriting Exercises
*Term Rewriting* refers to rewriting abstract syntax trees (typically without binders). *String Rewriting* is the special case where we only rewrite strings (as opposed to trees). Normal forms, confluence, termination, invariants can all be studied in this simpler setting. Everything we will learn will also transfer to the generalisations of string rewriting. (And, btw, string rewriting is already Turing complete (Why?).)
## Introduction
In all of the following exercises the task is to analyse a so-called *abstract reduction system* (ARS). An ARS $(A,\to)$ consists of a set $A$ of words (finite lists, strings) and $\to$ is a relation on $A$. When we specify a rewrite rule
$$w\to v$$
we understand that it can be applied inside any longer word. For example, the rewrite rule
$$ba\to ab$$
can be applied to rewrite the word $cbad$ to $cabd$.
*Footnote:* The math behind the exercises requires the material on equivalence relations, abstract reduction systems, termination and invariants, but it may be good to first try the exercises and then learn the math.
The exercises in this section come in form of puzzles. **I give you algorithms** (in form of ARSs), but do not tell you what they are meant to compute. This is for you to find out ... in each case, the task is to explain in plain English (without mentioning the rules at all) **which specification** the algorithm/ARS implements. [^puzzles]
[^puzzles]: While this makes the puzzles more fun, the main reason to let you reverse-engineer the specification is that not knowing in advance the specification will make it necessary to pay close attention to the rules of the ARS as well as the mathematical definitions of normal form, confluence, etc. Thus, the puzzles give you both an opportunity to practice 'rule-following' (which is such an important skill for debugging code) and an opportunity to test your understanding of the mathematical definitions.
Note that the specification must be independent of the implementation. In particular, **the specification is not allowed to mention the rules** of the ARS.
### The Task
Roughly speaking, the task is as follows:
- **Show that the ARS is an algorithm** (that is, the ARS is terminating and every input computes to a unique result (=the ARS has unique normal forms)).
- **Find the specification** the ARS implements (this usually requires finding a (complete) invariant).
### The Roadmap
(skip this at first reading)
Here is a **roadmap** that you may find useful. [^meaning] But it may be better to first try the exercises and then come back to the roadmap.
[^meaning]: If we are able to successfully carry out such an analysis, we have three ways to think about the *meaning* of a word: as its equivalence class, as its normal form, and through its invariant.
- The basic question is how many equivalence classes are there and how we can recognise in which equivalence class a given word is.
- A good idea with any problem solving task is to collect some basic facts and obervations and then to see what patterns emerge. Are there normal forms? What are they? Do all elements reduce to a normal form? Are normal forms unique?
- Does the system terminate? [^termination]
- Can we characterise equivalence classes by unique normal forms? [^UNF]
- Can we characterise equivalence classes by invariants? [^invariants]
- Derive a specification from the invariant.
[^termination]: Recall what we learned about measure functions. There was also this trick which assigned numbers to words.
[^UNF]: Recall the theorem that confluence and termination implies the existence of UNFs. Thus, one way to establish the existence of UNFs is to establish confluence.
[^invariants]: Recall the lecture about invariants. One trick that is often helpful is to code letters as numbers and use modular arithmetic. Btw, invariants can be an alternative to using confluence to prove uniqueness of NFs.
## Exercises
- (Exse 1) The rewrite rule is
ba -> ab
- Why does the ARS terminate?
- What is the result of a computation (the *normal form*)?
- Show that the result is unique (the ARS is *confluent*).
- What specification does this algorithm implement?
- (Exse 2) <font color=red>**Homework for the report (HW 6)**. Rewrite rules are
aa -> a
bb -> a
ab -> b
ba -> b
- Why does the ARS terminate?
- What are the normal forms?
- Is there a string `s` that reduces to both `a` and `b`?
- Show that the ARS is confluent.
- The next questions have all essentially the same answer:
- Replacing `->` by `=`, which words become equal? [^equal]
- What is the nicest way you can find to characterise which words are equal by one English sentence?
- Can you repeat the last item using modular arithmetic?
- Which specification does the algorithm implement?
</font>
[^equal]: The relation defined by `=` is called the equivalence relation induced by `->`. We called it an equivalence and not an equality, because, even though we have, eg, `aa=a` the string `aa` and `a` are different as strings.
- (Exse 3) Rewrite rules are
aa -> a
bb -> b
ba -> ab
ab -> ba
- Why does the ARS not terminate?
- What are the normal forms?
- Modify the ARS so that it is terminating, has unique normal forms (and still the same equivalence relation).
- Describe the specification implemented by the ARS. (Hint: Use an invariant for this.)
- (Exse 4) Rewrite rules are
ab -> ba
ba -> ab
Same questions as above.
- (Exse 5) <font color=red>**Homework for the report (HW7, Option 1)**. Consider the rewrite rules
ab -> ba
ba -> ab
aa ->
b ->
- Reduce some example strings such as `abba` and `bababa`.
- Why is the ARS not terminating?
- How many equivalence classes does $\stackrel{\ast}{\longleftrightarrow}$ have? Can you describe them in a nice way? What are the normal forms?
[Hint: It may be easier to first answer the next question.]
- Can you change the rules so that the ARS becomes terminating without changing its equivalence classes? <!--Which measure function proves termination of your modified system?-->
- Write down a question or two about strings[^input] that can be answered using the ARS. Think about whether this amounts to giving a semantics to the ARS.
[Hint: The best answers are likely to involve a complete invariant.]
</font>
[^input]: Think of the string as the input, the ARS as the algorithm and the question as the specification of the algorithm.
- for 2024: change `aa ->` to `aa -> a`. Add a remark about "interesting" (a characterisation of the equivalence classes that mentions the reduction relation is not interesting).
- (Exse 6) Consider the rewrite rules
ba -> bbaa
aa ->
ba -> ab
ab -> ba
- Can one reduce `ab` to `aabb`?
- Can one reduce `ba` to `abbaababbab`?
- Can one reduce `ba` to `abbaababbaba`?
- Can you find a nice way of stating which words are in the equivalence class of the empty word `[]` and in the equivalence class of `a`?
- Can you list some properties of words that remain invariant under application of rules?[^invariant]
- Can you describe all equivalence classes? (That requires some work.)
- Can you change the rules so that the ARS becomes terminating without changing its equivalence classes? (This also requires more work. It helps if you answered the previous question first.) Which measure function proves termination of your modified system?
- I wrote out a [guided tour](https://hackmd.io/@alexhkurz/BJB3ORnDP) for the last two items, but try first yourself.
- (Exse 7) Consider the rewrite rules
ab -> a
bb -> b
aa -> b
plus rules saying that the order of letters does not matter.
- Think of `a` and `b` as colours and an [urn](https://en.wikipedia.org/wiki/Urn_problem) that has balls of colour `a` ("white") and `b` ("black"). Interpret the rewrite rules as rules about drawing balls from the urn.
- If you start with 198 black balls and 99 white balls, what is the colour of the last ball remaining?
- Answer the question above with the help of a suitable invariant.
- Use the invariant to show that the system has unique normal forms.
- If you start with $n$ black balls and $m$ white balls, what is the colour of the last ball remaining?
- (Exse 8) **<font color=red>Homework for the report (HW7, Option 2)</font>**. Consider rewrite rules
ab -> cc
ac -> bb
bc -> aa
plus rules saying that the order of letters does not matter. Starting from 15 `a`, 14 `b` and 13 `c`, is it possible to reach a configuration in which there are only `a`s or only `b`s or only `c`s?
(Once you found a helpful invariant, the problem is not difficult anymore.)
## More Exercises
**Exercise:**
If the last letter is R you may add a K at the end.
A string of the form Ox may be rewritten to Oxx
You may replace any occurrence of RRR by K
You may erase any occurrence of KK
Above, `x` is a variable that maybe replaced by any string.
- Describe an ARS $(A,\to)$ that is given by the 4 rules above.
- Give some sample reductions. Can you reduce `OK` to `OR`?
- Is it possible to reduce `OR` to `OK`?
As it often happens with this kind of exercises, it can be quite tricky until you suddenly see the solution. But trying to understand what is going on pays off in any case.
**Exercise:** Show that the following process always terminates. There is a box full of black and white balls. Each step consists of removing an arbitrary ball from the box. If it happens to be a black ball, one also adds an arbitrary (but finite) number of white balls to the box.
[^invariant]: A function $P:A\to B$ is an ***invariant*** for an ARS $(A,\to)$ if
$$ a\to b \ \Longrightarrow \ P(a)=P(b)$$ for all $a,b\in A$.
[^deterministic]: An ARS $(A,\to)$ is ***deterministic*** if for all $a\in A$ there is at most one $b\in A$ such that $a\to b$.
[^alpha]: We take lambda-terms here up to $\alpha$ equivalence, so you may rename bound variables at any point in your computation without explicitely invoking a rule.
[^signature]: STOP reading if you do not want to see hints at the solution.
- In case of string rewriting, the question is what are the operations that we use to form words such as `aba` from letters `a` and `b`? There are at least three possibilities.
1. Empty word, binary concatenation and constants for the letters. This needs some equations.
2. Unary operations for each letter. This does not need any equations.
3. One $n$-ary operation for each natural number $n$ to construct a lists lenght $n$.
- In case of the sorting example, the signature is given by constants `0`, `[]`, unary operation symbols `s`, `sort`, and binary operation symbols `[-|-]`, `min`, `max`, and `insert`. We can also refine this to a "many-sorted signature" by introducing types `nat` and `natlist` and say that operation symbols are typed as follows (in, hopefully, self-explanatory notation)
0 : nat
s : nat -> nat
max : nat,nat -> nat
min : nat,nat -> nat
[] : list
[-|-] : nat, natlist -> natlist
insert : nat, natlist -> nat
sort : natlist -> natlist
[^hintCAM]: Show that `A(CAM)(CAM)=(CAM)(CAM)` where `C` and `M` are defined as above.