In the past few weeks we have talked about the chat not being there yet and we have had a progress chat. In this blog post we want to explain why the chat is not here yet — after all, just sending some messages to and fro doesn’t sound all that difficult.
The thing is, we do not want to “just send some messages.” We want to make sure that the list of rooms, and who has joined which room, is accurate and reflects the actual situation, and does not go “out of sync” with what the RPGpad server thinks is happening.
For example, if the RPGpad server thinks that player Bob has joined the Lobby but Alia’s computer does not, they are out of sync. This would mean that if Alia also joined the Lobby, she would not know that her messages are also going to Bob.
To keep everyone up to date with the newest changes, there are actually a lot of things the different computers need to message each other about: the creation, change, and removal of rooms, when a player joins a room, when they change their mask, when they leave again, the actual chat messages the player sends, etc.
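To give a feel for how many kinds of messages that is, here is a minimal sketch of how the updates listed above could be modelled. The names `UpdateKind` and `Update` and their fields are made up for this illustration; they are not the actual RPGpad message format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UpdateKind(Enum):
    """Hypothetical names for the kinds of updates listed above."""
    ROOM_CREATED = "room_created"
    ROOM_CHANGED = "room_changed"
    ROOM_REMOVED = "room_removed"
    PLAYER_JOINED = "player_joined"
    MASK_CHANGED = "mask_changed"
    PLAYER_LEFT = "player_left"
    CHAT_MESSAGE = "chat_message"


@dataclass(frozen=True)
class Update:
    """One update sent between computers (illustrative fields only)."""
    kind: UpdateKind
    room: str
    player: Optional[str] = None  # not every update concerns a player
    text: Optional[str] = None    # only used for chat messages


alia_joined = Update(UpdateKind.PLAYER_JOINED, room="Lobby", player="Alia")
```

Every one of these kinds changes what a computer believes about a room, so every one of them can cause trouble when it arrives at the wrong moment.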
Complexity of Communicating Computers
Because there are multiple computers talking to each other and their messages are sometimes delayed a little, it is very easy for them to receive an update that they do not yet understand.
To start off, we will explain with a diagram what a simple and correct exchange looks like. We have left out most of the technical data from the messages so we can focus on the core problem and not drown everyone in complex notation and details that don’t matter:
In the above diagram, there are three computers: Alia’s Computer, the RPGpad server, and Bob’s computer. Each of these has a vertical “track” of what they believe to be the current situation.
At the start, each computer believes there is a single empty room called “Lobby”. When Alia wants to join, her computer sends the “Join Lobby plz?” request to the RPGpad server. In return, the RPGpad server sends everyone an “Alia joined Lobby!” update. Note how the messages take a moment to arrive, and the computers update their belief about the situation a little later than the RPGpad server.
After a while, Alia closes her browser screen and is disconnected from the chat. The RPGpad server detects this, updates what it believes, and sends the “Alia left Lobby!” update to Bob’s computer as well. Bob’s computer receives the message and updates what it believes too.
In the end, the RPGpad server and Bob’s computer believe the same thing, and that thing is the actual situation. All is well!
Now we’ll walk through a scenario in which something goes wrong. It is a little more complex because it involves the creation of a new room:
We have the same three computers: Alia’s computer, the RPGpad server, and Bob’s computer. At first, they all believe the same thing: there is a single room called “Lobby”, and no one has joined it.
Alia creates a new room, and her computer sends a “Create Inn plz?” request. The RPGpad server creates the room, updates its beliefs, and sends out the “New room: Inn!” update. From here, things start getting wonky.
Bob’s computer receives the update, updates its beliefs, and (perhaps because Bob is eager to play his new character) immediately requests to join the new room with “Join Inn plz?”.
Meanwhile, Alia’s computer has not yet seen the new room update (because it was delayed somewhere).
When the RPGpad server receives the request from Bob’s computer, it puts Bob in the room and sends out the “Bob joined Inn!” update. Bob’s computer receives it and updates its beliefs. Alia’s computer receives it too, but does not understand the update: as far as it knows there is no room “Inn”, so what is going on here?!
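The scenario above can be reproduced in a few lines. This is a simplified sketch, not RPGpad code: `NaiveClient` and the tuple-shaped updates are invented for the example, and the "client" simply applies every update the moment it arrives.

```python
class NaiveClient:
    """Applies every update immediately, in the order it arrives."""

    def __init__(self):
        # room name -> set of players this computer believes are in the room
        self.rooms = {"Lobby": set()}

    def apply(self, update):
        kind, room, player = update
        if kind == "new_room":
            self.rooms[room] = set()
        elif kind == "joined":
            # Raises KeyError when this computer has never heard of the room.
            self.rooms[room].add(player)
        elif kind == "left":
            self.rooms[room].discard(player)


new_room = ("new_room", "Inn", None)
bob_joined = ("joined", "Inn", "Bob")

# Bob's computer receives the updates in the order the server sent them.
bob = NaiveClient()
bob.apply(new_room)
bob.apply(bob_joined)

# Alia's computer: "New room: Inn!" is delayed, so "Bob joined Inn!" comes first.
alia = NaiveClient()
try:
    alia.apply(bob_joined)
    alia_confused = False
except KeyError:
    alia_confused = True
```

Bob’s computer ends up with Bob in the Inn, while Alia’s computer trips over a room it has never heard of.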
There are many other situations where messages that get switched around — or even dropped and never delivered — create situations where a computer cannot understand the situation, or changes its beliefs in a way that makes them incorrect:
- If Alia just opened the chat, Alia’s computer can receive updates for the “Lobby” room before her computer has received the list of current rooms which will include information for “Lobby”.
- If Alia removes a room, Bob’s computer can receive updates for people leaving the room after Bob’s computer received the update removing the room.
- If Alia joins a room on her phone, and at the same time leaves that room with her laptop, Bob’s computer can receive the messages in the wrong order, which makes it think that Alia is no longer in the room.
And there are many, many more.
Our solution is to add an update number to each room. We increment the update number of a room by 1 every time something about the room changes, and we send the number to all connected computers with each update message. This allows us to resolve the problem:
This is the same situation as the one above. The three computers are, again, Alia’s computer, the RPGpad server, and Bob’s computer. The same sequence of events happens, and the “New room: Inn!” message to Alia’s computer is delayed again.
However! Because the “Bob joined Inn!” message now comes with a nice update number of 2, Alia’s computer sees that there is something off about it, and buffers it until the update with number 1 has arrived. Then, it changes its beliefs by looking at the updates in the correct order!
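The buffering idea can be sketched roughly as follows. This is an illustration under a few assumptions of our own, not the finished implementation: the name `BufferingClient` is made up, the update that creates a room is assumed to carry number 1, and room removal is left out entirely.

```python
class BufferingClient:
    """Holds back updates that arrive early and applies them in number order."""

    def __init__(self):
        self.rooms = {}         # room name -> set of players
        self.last_applied = {}  # room name -> number of the last applied update
        self.pending = {}       # room name -> {update number: (kind, player)}

    def receive(self, number, kind, room, player=None):
        if number <= self.last_applied.get(room, 0):
            return  # already applied: a duplicate or stale update
        self.pending.setdefault(room, {})[number] = (kind, player)
        self._drain(room)

    def _drain(self, room):
        # Apply buffered updates for as long as the next expected number is there.
        waiting = self.pending[room]
        next_number = self.last_applied.get(room, 0) + 1
        while next_number in waiting:
            kind, player = waiting.pop(next_number)
            self._apply(kind, room, player)
            self.last_applied[room] = next_number
            next_number += 1

    def _apply(self, kind, room, player):
        if kind == "new_room":
            self.rooms[room] = set()
        elif kind == "joined":
            self.rooms[room].add(player)
        elif kind == "left":
            self.rooms[room].discard(player)


alia = BufferingClient()
alia.receive(2, "joined", "Inn", "Bob")  # arrives early, so it is buffered
rooms_before = dict(alia.rooms)          # still empty: nothing applied yet
alia.receive(1, "new_room", "Inn")       # the delayed update finally arrives
```

After the second call, Alia’s computer has applied update 1 and then update 2, so it believes that the Inn exists and that Bob is in it. A sketch like this does not yet deal with updates that are dropped and never delivered; in that case the buffered updates would wait forever, which is one of the situations that still needs care.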
We are currently working on implementing this solution, and making sure that it works in all situations. We want to handle all room creations, changes, and removals in the correct way, and we want to be especially sure that what you see on your screen is the correct situation. After all, nothing creates problems as fast as not knowing who is in the same chat room!