# Machine Trouble Write-up
> From: Pearl CTF
> Tags: dfa, misc, medium
> Problem statement:
> I don't think even smart people like you can get the flag out of such a machine with so little memory. Prove me wrong. If you find it, enclose the flag in pearl{}.
>
> NOTE: Flag does not have any upper case character and number.
>
> nc dyn.ctf.pearlctf.in 30018
>
> Solved by: Jackylkk2003
## Netcat Contents
```!
ooooooooo. oooooooooooo .o. ooooooooo. ooooo
`888 `Y88. `888' `8 .888. `888 `Y88. `888'
888 .d88' 888 .8"888. 888 .d88' 888
888ooo88P' 888oooo8 .8' `888. 888ooo88P' 888
888 888 " .88ooo8888. 888`88b. 888
888 888 o .8' `888. 888 `88b. 888 o
o888o o888ooooood8 o88o o8888o o888o o888o o888ooooood8
Welcome to The Finite State Machine:
=======================RULES===========================
The flag is set as the input string, and the alphabets of the language are set to a-z, {, }, _.
Here, you can define your own states and transitions.
If there is no defined transition for a particular letter, then the machine gets trapped.
It must be a DFA, not an NFA.
An output of 1 means that the string is present in the language; 0 means otherwise.
'@' takes the machine from one state to another by consuming any one letter.
'~l' takes the machine from one state to another by consuming one letter unless the letter is 'l'.
Example: '5 @ 6' takes the machine from state 5 to state 6 for all letters.
Example: '6 ~b 7' takes the machine from state 6 to state 7 for all letters except 'b'.
=================================================
Enter number of states:
```
## DFA?
If you have learnt what a [DFA](https://en.wikipedia.org/wiki/Deterministic_finite_automaton) is, then you can skip most of this part, since what the challenge is trying to do will be quite obvious to you. You may want to skip directly to [the challenge](#The-Challenge).
If you unluckily don't know what is a DFA is. You can follow this brief introduction or go search online. Google is your good friend.
As suggested in the challenge, a DFA is a finite state machine. It contains some states and can perform some transitions from one state to another.
But for better understanding, the following section introduces some terminologies related to DFAs since some of them are used in this challenge.
## Terminologies
* Finite state machine (FSM): It is a mathematical model of computation. It contains a finite number of states and different operations can be carried out in different states. There can be transitions that brings the machine from one state to another. It is usually represented using a state diagram. An example of a FSM is a vending machine. A simplified state diagram of a vending machine is: 
* Transition: The change of states in a FSM. For example, the transitions in a vending machine above are "Choice selected" and "Payment received"
* Alphabet: The set of all characters in this context. For example, the alphabet of a binary string is {0, 1}, the alphabet of English characters is {a, b, c, ..., x, y, z}, the alphabet of decimal numbers is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
* Language: A subset of strings. For example, the set of all binary strings without consecutive 0s is a language, the set of all strings with exactly 3 'a' is a language, the set {"Hello", "Hi"} is a language, the empty set is also a language. All strings in the language must come from a specific alphabet.
* Deterministic Finite Automaton (DFA): A mathematical model of a computational machine on strings, and is itself a FSM. What it does is to determine if the input string belongs to certain language. As some facts unrelated to this challenge, all languages recognized (i.e. given any input string, the machine can decide if it is in the language) by DFA is called a regular language.
* Non-deterministic Finite Automaton (NFA): Also a mathematical model of a computational machine on strings, and is itself a FSM, but is non-deterministic. More details about deterministic/non-deterministic is given below.
## How a DFA works
A DFA contains some states, exactly 1 starting state, some accepting states, and some transitions. If the machine stops in an accepting state, then the machine decides that the string is in the language. If the machine stops in a non-accepting state, then the machine decides that the string is not in the language.
In a DFA, each transition consumes exactly 1 character, and the character that is consumed will be the first unconsumed character, or in other words, the next character in the string.
We usually represent a DFA using a graph ~~(or mathematical notations, but you wouldn't want to use it for complicated DFAs)~~. In the example below, node 1 is a starting state (you can see the thing on it as an annotation) and node 3 is an accepting state (you can see the double circle).

In this example, the alphabet is {a, b}, and our language is whether the string starts with ab.
If we want to check if "abaa" is in the language, we follow the steps below.
1. We are at state 1, and the (remaining) string is "abaa". The first character of this string is 'a', so we follow the 'a' edge and reach state 2.
2. We are at state 2, and the remaining string is "baa". The first character of this string is 'b', so we follow the 'b' edge and reach state 3.
3. We are at state 3, and the remaining string is "aa". The first character of this string is 'a', so we follow the 'a' edge and reach state 3.
4. We are at state 3, and the remaining string is "a". The first character of this string is 'a', so we follow the 'a' edge and reach state 3.
5. We are at state 3, and there are no more character left in the string. Since state 3 is an accepting state, the machine returns true.
Here is another example, guess what this language is:

It is a machine that recognizes the language "All strings that does not have aa as its substring". Again, the alphabet is {a, b}. Try to convince yourself that the language is indeed what I have suggested (Left as an exercise :P).
## NFA
An NFA is similar to DFA, except 4 changes:
1. There can be transitions that consumes no characters.
2. Each node can have multiple edges going out but with the same characters (for example, having 2 'a' edges going out of the same state).
3. An NFA accepts a string as long as there is at least 1 path that makes the machine stops at the ending state.
4. If there is no valid path from initial state to any accepting state by traversing the edges according to the rules, the string is not accepted.
The second change is the main reason why it is called non-deterministic.
In other words, the "deterministic" in DFA means that there is always one and only one possible path to traverse through the graph, no ambiguity at all. (Although the challenge allows 0 to also be possible)
But since NFA is not our main target today, we are not discussing a lot about it.
## The Challenge
After all these long background, let's come back to the challenge. And now we know a lot more about FSM and DFA, let's reread the rules.
```!
The flag is set as the input string, and the alphabets of the language are set to a-z, {, }, _.
```
So the input to the machine is the flag, and we are considering only the letters a-z, {, }, _ as the alphabet. In other words, there are no numbers, uppercase letters, other strange symbols and so on in this context (the flag).
```!
Here, you can define your own states and transitions.
```
So we need to construct our own FSM/DFA.
```!
If there is no defined transition for a particular letter, then the machine gets trapped.
```
For a DFA, it is supposed to have exactly 1 transition for all the characters in the alphabet for all states. But this challenge allows there to be none, and the machine will be trapped in this case. (So we have 3 possible outputs for the DFA, yes/no/trapped)
```!
It must be a DFA, not an NFA.
```
There is at most 1 transition for each character in each state.
```!
An output of 1 means that the string is present in the language; 0 means otherwise.
```
The machine output for yes/no. ~~(With some experimentation you will know that when the machine is trapped, it outputs "Machine in trapped state.")~~
```!
'@' takes the machine from one state to another by consuming any one letter.
```
Syntax sugar only. We can write less things.
```!
'~l' takes the machine from one state to another by consuming one letter unless the letter is 'l'.
```
Syntax sugar 2.
```!
Example: '5 @ 6' takes the machine from state 5 to state 6 for all letters.
```
Example 1.
```!
Example: '6 ~b 7' takes the machine from state 6 to state 7 for all letters except 'b'.
```
Example 2.
**TL;DR** We construct a DFA, and the machine is run on the flag (or at least the things inside pearl{...}) as the input. We will get 0/1/Machine in trapped state. as the output.
## Solution
Since the solution to this challenge is relatively easy :P, I will just provide the solution instead of any intermediate solutions or observations. (I don't have any of these when solving the challenge anyways)
Remember we have an example on a DFA checking whether the input starts with "ab" [here](#How-a-DFA-works)? We can simply use this idea to check what the flag starts with using brute force.
* Does the flag start with a?
* Does the flag start with b?
* ...
* Does the flag start with _?
After we know that the flag starts with d, we can continue the process:
* Does the flag start with da?
* Does the flag start with db?
* ...
* Does the flag start with d_?
Keep repeating until we have the whole string.
So the idea is simple, but how to do it?
We can do it with a DFA like this.

For the ?, we put everything except our target character.
Note that if the character to test does not match the next character in the string, the machine will get accepted. But if it is in the string, it will be trapped (output 1). We are doing this kind of in another way round since this helps prevent infite loop in the program (because of my laziness).
The next thing we need to do is to write a script that automates this brute force.
```python=
from pwn import *
def sl(s):
r.sendline(s.encode())
prefix = "pearl{"
sigma = "abcdefghijklmnopqrstuvwxyz{}_"
flag = False
ans = ""
while not flag:
n = len(ans)
flag = True
for c in sigma:
r = remote("dyn.ctf.pearlctf.in", 30018)
sl(str(n + 2)) # Num of nodes
sl(str(n + 1)) # Accepting state is the n-1 state
sl(str(n + 2)) # Edges
for i, x in enumerate(ans): # Eliminating known characters
sl(f"{i} {x} {i+1}")
sl(f"{n} ~{c} {n+1}") # All except our target character
sl(f"{n+1} @ {n+1}") # Accepting state loop
s = r.recvall().decode()
if "Machine in trapped state." in s: # Target character found
ans += c # Update answer
flag = False
break
print(prefix + ans + "}") # Output: pearl{dfa_hacked}
if flag: # The whole string is known
break
```
We got the flag `pearl{dfa_hacked}`.
## The End?
Although this script works, there is a slight flaw...
`No DoS, DDoS, automated scans or generating any large amount of traffic by any other means on any challenges and other contest infrastructure. It is not permitted and is never intended in any challenge.`
Does this script count as "automated scans" or "generating large amount of traffic"? Well we established 101 (if I calculated correctly) connections. Can we make it better?
In fact, yes. We can use a binary-search like approach to reduce the number of queries.
We split the set of possible next character into 3 groups G1, G2, G3, and construct the DFA as this:

We can know that if the character is in G1, the machine will output 0. If it is in G2, the machine will output 1. If it is in G3, the machine will be trapped. Then in each query, we can reduce the solution space by 2/3.
Again, if I have calculated correctly, we can establish only 40 connections to the server.
Since I am too lazy to write a script for that, and we have already obtained the flag anyway, making 101 connections are ok.
~~What brute force? What automated scans? What large traffic? I don't understand what you mean.~~