The Problem
As you're probably aware, I have 4 Discord bots: Kernel, Spectral, PokéVenture and Economy. Originally, these were all a single bot, but I decided to split them up to make each one more focused. However, I didn't want to have to use separate prefixes for each one. I wanted all of my bots to work together so that using them feels like a seamless experience. Some commands would be specific to just one bot, but others (like the help command) would be present on all bots. Then, when the user types a command, the bots have to decide between themselves who should respond.
Key Requirements
Now this might sound easy, but there are some requirements that the solution must satisfy:
- It must be fast. We can't have the bots having a little conversation with each other going "after you", "no no, you first" (the bots are British...). This system must not add any noticeable delay to command executions.
- It must be flexible. If you change a prefix on one bot, the other bots need to pick up on it immediately. If a bot is removed, the other bots shouldn't keep waiting for it to respond!
- It must be able to adapt when a bot goes down. The other bots need to be ready to take over should one of them be unresponsive.
- It must be resilient against lag. If there's a short delay, there should never be two bots responding to one command. Equally, multiple executions should be able to happen simultaneously without the bots getting confused about which message they are responding to.
- It must be simple. It shouldn't have to scan through message content to guess which command it responded to. It should have the same implementation for all commands, making it easy to implement for all existing commands and any new commands.
- It must be reliable. If something goes wrong, then key features will be unavailable. Therefore, there need to be built-in fallbacks that ensure at least one bot will actually respond.
- It must be invisible. Users shouldn't know what is happening. It is not feasible to add anything visible to the message content.
This feature is crucial because it underpins many of the most important commands in the bots. If any of these requirements is not satisfied, then the bots will deliver a subpar experience in comparison to either having it all as one bot or having separate prefixes.
Finding a Solution
As you can probably see, I did come up with a solution in the end. However, I don't want to simply jump to the final solution. Instead, I want to take you through my thought process, as I came up with various ideas that went nowhere, before the final solution began to emerge.
An Easy Solution
An obvious solution is to simply allow whichever bot reads the message first to respond, then all the others only respond if there isn't already a response. This would even improve response times, as the user always gets the fastest of several bots.
However, it's easy to see that this has a key problem. Namely, if the bots see the message at a similar time, then there could easily be multiple bots responding to the same message. Surprisingly though, it satisfies all of the other requirements! Unfortunately, this is a crucial requirement so this solution just won't do.
Who Should Respond?
It's clear that we need a mechanism to determine which bot should respond first. The bots could send a message to claim the command, look at the send times to determine who was first and then process it... but this is not invisible, nor is it fast. We could store a document in the database and update it to claim messages... no, this is not fast. We could make the bots wait a random period of time before answering... no, that's even worse. The best idea that came to mind was simply to arrange the bots in a fixed hierarchy, so that each bot knows exactly who will be responding even before the command is typed. This is invisible, fast and simple.
When Things Go Wrong
While this solution is great, there's a key problem it doesn't solve: what happens if a bot goes down? What happens if a bot doesn't respond quickly enough? No-one wants to find that PokéVenture breaks just because Kernel is down. Clearly, we need a way for the bots to determine whether a given bot has responded or not, so that it can step in if necessary.
Let's handle the second problem first: when should bots step in to help? We need a threshold for how much time has to pass before we give up on one bot and move on to the next bot in the hierarchy. I chose 1 second, as it's large enough that it should rarely be reached, but isn't so large that it leads to a significant delay (a bot is still usable with 1 second of ping, even if that's undesirable).
However, we still have one key issue to resolve. Namely, if a bot has ping greater than 1 second, then it will respond after the other bot has stepped in, leading to two responses. This has a simple fix: each bot checks how much time has passed and doesn't respond if it's no longer its turn. It's important that the final bot in the hierarchy responds no matter how much time has passed, as no-one else is going to!
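The timing rule above can be sketched in a few lines. This is only an illustration, not the bots' actual code: the function names are made up, and it assumes that each step down the hierarchy waits one more threshold (the post only states the 1 second wait for the second bot).

```python
STEP_SECONDS = 1.0  # the threshold chosen in the post


def wait_before_responding(position: int, step: float = STEP_SECONDS) -> float:
    """Seconds a bot waits for the bots above it (position 0 is the first bot).

    Assumption: every extra place in the hierarchy adds one more threshold.
    """
    return position * step


def still_my_turn(position: int, total: int, elapsed: float,
                  step: float = STEP_SECONDS) -> bool:
    """Check whether a bot that is ready `elapsed` seconds after the command
    may still respond, or whether the next bot has already taken over."""
    if position == total - 1:
        return True  # the last bot always responds, however late it is
    return elapsed < (position + 1) * step
```

With four bots, the first bot stops responding once a second has passed, while the fourth bot never stops.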
Now we can move onto the first problem - how do we determine when a bot has responded to a request? This might seem simple - after all, we could just check if the bot sends a message after the command was sent, and if so, assume the bot was responding to that message. However, this breaks down when lots of commands are being sent at once, or if any bots are lagging. One response could be assumed to respond to multiple messages (leading to no response being sent), or a response could be attributed to the wrong message (leading to multiple responses). We need a way of identifying which command each response belongs to. A simple way. An invisible way. There aren't any hidden attributes in the message itself that can be used. Accessing the database wouldn't be fast. Adding visible content to the message wouldn't be invisible. Now, this just leaves my solution: identifiers.
Identifiers
An identifier tells the other bots which message a response is answering. It does this in a way that does not forfeit speed at all, and is entirely contained within the message. Well, how do I do this?
Invisible characters! Go on, try using a command like .help! Can you find the hidden identifier? All of the responses to these commands have 10 invisible characters at the beginning that let other bots know which message they are responding to. Depending on the message, this can be found in:
- The content (if there is no embed)
- The embed title (usually)
- The embed description (if there is no title)
When a bot receives a message from another bot, it will scrape these locations to find the identifier and determine which message it is responding to. Then, it knows exactly when it needs to step in, and when another bot has got it covered.
But how is it generated? Thought you'd never ask...
- First of all, I have compressed the message ID down to only 8 bits. This is done by simply taking the ID modulo 256. This keeps the identifier short while leaving only a 1 in 256 chance of it being interpreted wrongly.
- Then, as you may have guessed, it's converted to binary. There aren't many zero-width characters to work with, so only having to use two is great.
- Finally, the binary string is converted into invisible characters. It begins and ends with a zero-width space to make it clear that it is an identifier, and to prevent any characters from interfering with the rest of the message. Then, a zero-width joiner is used for a 1 and a zero-width non-joiner is used for a 0.
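The three steps above can be sketched as follows. The function and constant names are hypothetical, and this is a reconstruction from the description rather than the code the bots run.

```python
ZWSP = "\u200b"  # zero-width space: marks both ends of the identifier
ZWNJ = "\u200c"  # zero-width non-joiner: stands for a binary 0
ZWJ = "\u200d"   # zero-width joiner: stands for a binary 1


def encode_identifier(message_id: int) -> str:
    """Turn a message ID into a 10-character invisible identifier."""
    bits = format(message_id % 256, "08b")  # compress to 8 bits, as binary text
    body = "".join(ZWJ if bit == "1" else ZWNJ for bit in bits)
    return ZWSP + body + ZWSP
```

The result is always 10 characters long: 8 for the bits plus the 2 zero-width spaces. A response would then start with `encode_identifier(command_id)` followed by the visible text.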
And we're done! We now have a fast, reliable way of bots determining which message they were responding to. Now, they can simply wait to see if any bot responds to the message they're interested in (and check message history), and if not, they can step in with the knowledge that they're not being annoying and giving the same response again.
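For the receiving side, a waiting bot has to read the identifier back out of another bot's response. Here is a rough sketch, assuming the identifier sits at the very start of the text and that the content, embed title and embed description are checked in that order. The function names and the plain-string arguments are illustrative only, standing in for whatever message object the Discord library provides.

```python
import re
from typing import Optional

# A zero-width space, eight joiner/non-joiner characters, then a zero-width space.
IDENTIFIER = re.compile("^\u200b([\u200c\u200d]{8})\u200b")


def decode_identifier(text: Optional[str]) -> Optional[int]:
    """Return the 8-bit value hidden at the start of `text`, or None."""
    if not text:
        return None
    match = IDENTIFIER.match(text)
    if match is None:
        return None
    bits = "".join("1" if char == "\u200d" else "0" for char in match.group(1))
    return int(bits, 2)


def find_identifier(content: Optional[str], embed_title: Optional[str],
                    embed_description: Optional[str]) -> Optional[int]:
    """Check each place an identifier can live and return the first one found."""
    for text in (content, embed_title, embed_description):
        value = decode_identifier(text)
        if value is not None:
            return value
    return None


def is_response_to(command_id: int, content: Optional[str],
                   embed_title: Optional[str],
                   embed_description: Optional[str]) -> bool:
    """True if a bot's message carries the identifier for `command_id`."""
    found = find_identifier(content, embed_title, embed_description)
    return found == command_id % 256
```

A bot waiting for its turn can run `is_response_to` on every bot message it sees in the channel, and stand down as soon as one matches the command it is holding.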
The Final Solution
In summary, the final solution uses a fixed hierarchy of bots. They are ordered Kernel, Spectral, PokéVenture then Economy (although this is very easy to change). When receiving a message, each bot will determine its position in the hierarchy and the relevant delay. If it is first, it will respond as soon as it can. If it is second, it will spend a second waiting for a response and, if one doesn't come, it will step in. If the first bot receives a message but realises it was too slow, it will no longer respond. This continues through the hierarchy until the last bot, which will respond no matter how late it is. Each response carries an identifier made of zero-width characters to indicate which message the bot is responding to, so the system stays reliable even with lag.
- This is fast, as it doesn't add anything expensive to the first bot in the hierarchy
- This is flexible - when you change a prefix or kick a bot, the hierarchy is updated straight away
- This can adapt when a bot goes down, by allowing other bots to step in if one is unresponsive
- This is resilient against lag, by using identifiers to track which message is being responded to
- This is simple, as it has the same implementation for all commands and only needs to scan the message in predictable places
- This is reliable, as if any bot is working, then it will always respond
- This is invisible, as the user can't see any of what's happening
Overall, this approach is very successful at ensuring that exactly one bot responds to each query when they are using the same prefix.
Limitations of this Approach
That said, there are still a few edge cases, such as:
- If the first bot is slow and the second bot is completely down, then no bot will respond.
- If the first bot is down, then all commands will have a 1 second lag.
The first point is difficult to address (the first bot would have to check if there has already been a response, and if not then respond but ensure that the second bot also doesn't respond...), and is also a very rare scenario so shouldn't be much of an issue. The second point would require the bots to learn whether another has gone down through some common algorithm, and to notice when it is back up. However, this would also require the first bot to know if the other bots have decided it is down, and a way of communicating that it's back without responding twice. This would be complicated to implement, and is infrequent enough to not be a significant issue.
Fortunately these scenarios are rare and haven't been an issue in the past, so it is not a priority to fix them.
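Both edge cases can be reproduced with a small simulation. This is only a sketch under simplifying assumptions that are mine rather than the post's: every response is visible to the other bots instantly, each position waits one more second than the one above it, and the first edge case is shown with just two bots sharing the command. The `simulate` function is hypothetical.

```python
from typing import Optional, Sequence, Tuple

STEP_SECONDS = 1.0


def simulate(ready_at: Sequence[Optional[float]],
             step: float = STEP_SECONDS) -> Optional[Tuple[int, float]]:
    """Return (position, time) of the bot that answers, or None if none does.

    ready_at[i] is how many seconds after the command bot i is able to act;
    None means the bot is down. Position 0 is the top of the hierarchy.
    """
    total = len(ready_at)
    attempts = []
    for position, ready in enumerate(ready_at):
        if ready is None:
            continue  # a bot that is down never responds
        act_time = max(ready, position * step)  # wait for its turn to start
        is_last = position == total - 1
        if is_last or act_time < (position + 1) * step:
            attempts.append((act_time, position))
    if not attempts:
        return None
    # Responses are assumed to be seen instantly, so the earliest one wins
    # and every later bot stands down.
    act_time, position = min(attempts)
    return position, act_time
```

Here `simulate([1.5, None])` gives `None` (the first bot is too slow and the second is down), while `simulate([None, 0.1])` gives `(1, 1.0)`: the second bot answers, but only after waiting out the full second.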
Conclusion
Overall, this system has been very successful at solving the problem. While more complex than simply using different prefixes, this solution undeniably delivers a better user experience and improves the feeling of using the bots. This is why I chose to create and solve a new problem to ensure all of my bots work well together (when in reality, I could've just done what everyone else did and it would've been way easier...).