A Report on the Script System in the Wvs Project and OdinMS
===
(See [3ad5fc5](https://github.com/tnsc4502/mnwvs196/tree/3ad5fc5494b65aa4ec997da3a78ecfae604a9987) for the version with the old script system and [8793fce](https://github.com/tnsc4502/mnwvs196/tree/8793fceed8440dd936881e901234b02118f98f21) for the latest version with the updated script system.)
An experiment with std::thread for executing scripts asynchronously showed that it can cause frequent disconnections (D.C.).
To begin with, I will briefly describe the script execution pipeline. This report considers only two ways of starting a script: the User_OnSelectNpc packet and FieldSet.
The "User_OnSelectNpc" packet is received when a player clicks an NPC. The server then invokes the corresponding script by calling pScript->Run(), which calls lua_pcall to execute the Lua script file.
```graphviz
digraph User_OnSelectNpc_pipeline {
rankdir=LR;
node [shape=record];
{ "User::OnPacket"->"Script::Run"->"lua_pcall"; }
}
```
Invoking a Lua script with lua_pcall is similar to calling a function: the caller keeps waiting until the whole script has finished. Moreover, the conversation between the player and the NPC continues only when the player responds. For example, consider the following simple Lua code (self represents the script system itself, askNumber shows a dialog with a text box that requires the player to input an integer, and onSay shows a simple dialog with an OK button):
``` lua=
inputNum = self.askNumber("Input an arbitrary number (0 ~ 1000):", 0, 0, 1000) -- waits for the player's answer
self.onSay("You input: " .. inputNum) -- ".." is Lua's string concatenation operator
```
Obviously, without the player's response (inputNum), the script SHOULD NOT execute line 2, since inputNum is still undefined. Let's see what happens inside askNumber (and the other functions used to drive the conversation):
```graphviz
digraph askNumber_example {
rankdir=LR;
node [shape=record];
{ "self.askNumber"
->"Script::SelfAskNumber"
->"User::SendPacket(MessageInfo)"
->"Script::Wait"
->"Return user's input"; }
}
```
Here Script::Wait enters something like an infinite loop (e.g., while(1) in C++) and is only woken up by the User_OnScriptMessageAnswer packet. For the sake of CPU utilization, we don't actually spin in while(1) to wait for the response; we use a condition variable (std::condition_variable) instead.
To wake up the suspended script, User::OnScriptMessageAnswer sets the user input according to the packet contents (we skip the details here) and calls m_pScript->Notify(); Script::SelfAskNumber is then able to return a valid value.
```graphviz
digraph wait_and_notify {
{
rankdir=LR;
node [shape=record style=filled];
a [label="1. Script::Wait"]
b [fillcolor=yellow label="2. m_cndVariable.wait"]
c [label="3. Script::Notify"]
d [label="4. m_cndVariable.notify_one"]
e [label="5. lua_pushinteger(return user's input)"]
}
a->b->e
c->d
}
```
The original OdinMS (OD) handles conversations with a state machine. Every NPC script has a function action(mode, type, selection), which is used to indicate the current state and the user's response (except for askText and askNumber). Each time a User_OnScriptMessageAnswer packet is received, the server invokes the function again with different parameters so that the script can tell the states apart. The main reason the OD developers adopted this approach is, I guess, simplicity, especially for conversations with prev/next buttons. Clicking the prev button requires the script system to roll back to the previous line and perform the instruction again; however, the built-in script engine in Java doesn't support such a "rewind" operation, and neither does Lua.
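As a rough illustration of this callback style, here is a minimal sketch of a three-page conversation. The `cm` object below is only a stub that records what would be sent, and the mode values (1 for next, 0 for prev, -1 for closing the dialog) are an assumption based on how OD scripts are commonly written, not something taken from this project.

```javascript
// Stub standing in for OD's conversation manager; it only records messages.
const sent = [];
const cm = {
  sendNext: (text) => sent.push("next: " + text),
  sendNextPrev: (text) => sent.push("next/prev: " + text),
  sendOk: (text) => sent.push("ok: " + text),
  dispose: () => sent.push("dispose"),
};

let status = -1; // the current state of the conversation

function start() {
  status = -1;
  action(1, 0, 0);
}

// The server calls this again for every answer packet.
function action(mode, type, selection) {
  if (mode === -1) { // assumed: the player closed the dialog
    cm.dispose();
    return;
  }
  status += mode === 1 ? 1 : -1; // next moves forward, prev moves back
  if (status === 0) cm.sendNext("Page 1");
  else if (status === 1) cm.sendNextPrev("Page 2");
  else if (status === 2) cm.sendOk("Page 3");
  else cm.dispose();
}

// Simulated player: open the dialog, then next, prev, next, next, OK.
start();
action(1, 0, 0);
action(0, 0, 0);
action(1, 0, 0);
action(1, 0, 0);
action(1, 0, 0);
```

Nothing is ever rewound here: the script is simply re-entered from the top each time, and the counter decides which branch runs.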
OD simply solves this problem (technically, it isn't a problem) by increasing or decreasing the state. In [3ad5fc5](https://github.com/tnsc4502/mnwvs196/tree/3ad5fc5494b65aa4ec997da3a78ecfae604a9987), we solve it with a for-loop whose page indicator is controlled by the user's response (for details, see Script::SelfSayNextGroup); in [8793fce](https://github.com/tnsc4502/mnwvs196/tree/8793fceed8440dd936881e901234b02118f98f21), we use a message-based mechanism instead (see ScriptNPCConversation::SelfSayNext). However, the OD approach makes scripts inconvenient to write. Take, for example, the following nested askMenu in OD style:
``` javascript=
if (state == 0) {
    cm.sendSimple("Select an option:#L1#Buy something#l\r\n#L2#Job Selection(For beginner)#l");
} else if (state == 1) {
    if (selection == 1) {
        cm.sendSimple("Which item you want to buy?....");
        nextState = 2;
    } else if (selection == 2) {
        cm.sendSimple("Which job you want to be?...");
        nextState = 3;
    }
} else if (state == 2) {
    cm.sendYesNo("You really want this item? It costs...");
    nextState = 4;
} else if (state == 3) {
    cm.sendYesNo("You really want to be this job? You can't change the job once you've decided.");
    nextState = 5;
} else if (state == 4) {
    if (selection == 1)
        cm.gainItem(...);
    else
        cm.sendOk("Don't waste my time!");
} else if (state == 5) {
    if (selection == 1)
        cm.changeJob(...);
    else
        cm.sendOk("Take some time to consider carefully...");
}
```
Compared with the version above, the reworked one below, which uses named states, reads better:
``` javascript=
if (state == BEGIN) {
    cm.sendSimple("Select an option:#L1#Buy something#l\r\n#L2#Job Selection(For beginner)#l");
} else if (state == MENU_SELECTION_1) {
    if (selection == 1) {
        cm.sendSimple("Which item you want to buy?....");
        nextState = ITEM_SELECTION;
    } else if (selection == 2) {
        cm.sendSimple("Which job you want to be?...");
        nextState = JOB_SELECTION;
    }
} else if (state == ITEM_SELECTION) {
    cm.sendYesNo("You really want this item? It costs...");
    nextState = ITEM_SELECTION_CONFIRM;
} else if (state == JOB_SELECTION) {
    cm.sendYesNo("You really want to be this job? You can't change the job once you've decided.");
    nextState = JOB_SELECTION_CONFIRM;
} else if (state == ITEM_SELECTION_CONFIRM) {
    ...
} else if (state == JOB_SELECTION_CONFIRM) {
    ...
}
```
The states now at least have descriptive names instead of bare numbers, but it still takes a lot of effort to trace and read the whole script; the more complex the script is, the more pain you will feel. An elegant way to implement this script in our project may look like this (I've re-implemented some OD-based servers, and they are now able to execute scripts in this way too):
``` javascript=
selection = self.askMenu("Select an option:#L1#Buy something#l\r\n#L2#Job Selection(For beginner)#l");
if (selection == 1) {
    selection = self.askMenu("Which item you want to buy?....");
    if (self.askYesNo("You really want this item? It costs..."))
        self.exchange(0, selection, 1);
    else
        self.onSay("Don't waste my time!");
} else if (selection == 2) {
    selection = self.askMenu("Which job you want to be?...");
    if (self.askYesNo("You really want to be this job? You can't change the job once you've decided."))
        self.setJob(selection);
    else
        self.onSay("Take some time to consider carefully...");
}
```
For me (and for NEXON), this simply resembles writing C/C++ code.
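The prev/next case mentioned earlier fits this linear style as well. Below is a minimal sketch of a loop with a page indicator that moves according to the player's response. The helper name `sayNextGroup`, the `showPage` callback and the answer strings are hypothetical; the sketch is only loosely modeled on the idea behind Script::SelfSayNextGroup and is not the project's actual implementation.

```javascript
// `showPage` stands in for "send this page and wait for the player's answer".
// It is assumed to return "next", "prev", or "end".
function sayNextGroup(pages, showPage) {
  let page = 0; // page indicator controlled by the player's response
  while (page < pages.length) {
    const answer = showPage(pages[page], page);
    if (answer === "end") return false; // the player closed the dialog
    if (answer === "prev") page = Math.max(0, page - 1);
    else page += 1; // "next", or OK on the last page
  }
  return true; // every page was read
}

// Simulated player: next, prev, next, next, next.
const answers = ["next", "prev", "next", "next", "next"];
const shown = [];
const finished = sayNextGroup(["Page 1", "Page 2", "Page 3"], (text) => {
  shown.push(text);
  return answers.shift();
});
```

From the script writer's point of view the whole group of pages is a single call, and the rewinding happens inside the helper rather than in the script.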
That ends the story of OD; let's get back to Script::Wait. Imagine that a player clicks an NPC and then suddenly leaves their seat. As mentioned before, the server invokes the script via lua_pcall and keeps waiting for the response to the conversation, so the server ends up blocked. The whole server, at least for that player (if every socket has its own thread, that thread is blocked rather than the whole server), can no longer receive or send any packets because it is stuck in Script::Wait. That is a disaster, because the server should still be able to receive and send packets while the player is talking to an NPC (e.g., party or buddy notifications).
To prevent the server from being blocked by scripts, we first used std::thread to run pScript->Run() concurrently. Yet we quickly discovered that, with high probability (3/5), the client disconnects from the server and returns to the login screen whenever the player clicks an NPC.
Honestly, I haven't figured out the real cause. But a D.C. is usually caused by the IV getting out of sync between client and server (the encryption code was not designed with thread safety in mind), and I suspect the conditional waiting left something unsynchronized, so the clue is probably somewhere around there.
Fortunately, Lua provides another lightweight mechanism, the coroutine, for doing things "concurrently"; we don't discuss the differences between coroutines and other concurrency models here. Two functions are important for our script system: lua_resume, which resumes running the script, and lua_yield, which lets us pause its execution.
The new pipeline of the script system looks like this:
```graphviz
digraph User_OnSelectNpc_pipeline {
rankdir=LR;
node [shape=record];
{ "User::OnPacket"->"Script::Run"->"lua_resume"; }
}
```
The difference between lua_resume and lua_pcall is that the former runs the script only until it yields or finishes and then returns a status right away, whereas the latter waits until the whole script has run through. If the returned status is anything other than LUA_YIELD (that is, LUA_OK or an error code), we abort the script.
Now we no longer need std::thread or condition variables: the new Script::Wait simply calls lua_yield to pause the execution, and Script::Notify just calls Script::Run to resume it. For implementation details, please refer to [8793fce](https://github.com/tnsc4502/mnwvs196/tree/8793fceed8440dd936881e901234b02118f98f21), where the script system is divided into core, conversation-related, and other parts.
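The control flow can be sketched with a JavaScript generator, which pauses and resumes in much the same way as a Lua coroutine. This is only an analogy and not the project's code: the `Script` class and its members are hypothetical, the "OK" and "YIELD" strings merely imitate LUA_OK and LUA_YIELD, and, unlike Lua (where lua_yield is called inside the C++ helper), the script here has to write `yield*` explicitly.

```javascript
// Hypothetical stand-in for the script system; names mirror the report only.
class Script {
  constructor(body) {
    this.outbox = [];            // "packets" that would be sent to the client
    this.finished = false;
    this.coroutine = body(this); // created, but nothing runs yet
  }
  // Like Script::Run: resume until the next yield or the end, then return.
  run(answer) {
    const step = this.coroutine.next(answer);
    this.finished = step.done;
    return step.done ? "OK" : "YIELD";
  }
  // Like Script::Notify: called by the packet handler with the player's answer.
  notify(answer) {
    return this.run(answer);
  }
  // Conversation helpers: queue the message, then pause (like Script::Wait).
  *askNumber(text) {
    this.outbox.push(text);
    return yield; // resumes with the value passed to notify()
  }
  *onSay(text) {
    this.outbox.push(text);
    yield;
  }
}

function* npcScript(self) {
  const inputNum = yield* self.askNumber("Input an arbitrary number (0 ~ 1000):");
  yield* self.onSay("You input: " + inputNum);
}

const script = new Script(npcScript);
const first = script.run();       // runs up to askNumber and returns at once
const second = script.notify(42); // the player answered 42
const third = script.notify();    // the player pressed OK, the script ends
```

Between `run` and `notify` the caller is free to handle other packets, which is exactly what the blocking version could not do.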
Using coroutines doesn't cause any D.C. problems, and the overall performance seems better than with std::thread and condition variables.
In the other scenario, the FieldSet may invoke an arbitrary function in a script (like events in OD). For example, when the time limit is reached, we may want to call onTimedOut in the script to do some work. To call a Lua function from the C++ side, we first call lua_getglobal(lua state obj, "func name") and then lua_resume. However, luaL_loadfile only loads the script without running it, so the functions defined in the script are not yet available as globals; we therefore need to call lua_pcall once after initializing the script.
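The same load-then-run ordering can be shown with a small JavaScript analogy; it does not use the Lua C API, and the `globals` table, the `callByName` helper and the script source are all made up for illustration. Compiling the source plays the role of luaL_loadfile, running it once plays the role of the extra lua_pcall, and the lookup by name plays the role of lua_getglobal followed by lua_resume.

```javascript
// Source of a hypothetical field-set script that defines its handler.
const source = `
  globals.onTimedOut = function () { return "time is up"; };
`;

// Step 1: compile only. Nothing in the script has run yet.
const chunk = new Function("globals", source);
const globals = {};
const beforeRun = typeof globals.onTimedOut; // the handler is not defined yet

// Step 2: run the chunk once so that its definitions are registered.
chunk(globals);

// Step 3: look the handler up by name and call it.
function callByName(name) {
  const fn = globals[name];
  if (typeof fn !== "function") return null; // no such handler in this script
  return fn();
}
const result = callByName("onTimedOut");
const missing = callByName("onUserEnter"); // not defined by this script
```

Skipping step 2 is the failure described above: the lookup finds nothing because the definitions only take effect once the chunk has actually been executed.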
To conclude, the script system in the wvs project is adaptable to any scenario in MapleStory and is stable enough, though most helper functions like transerFieldRequest haven't been implemented yet LEL.