WebSockets with Elixir - How to Sync Multiple Clients
Eli Fatsi, Former Development Director
Back in November, we hunkered down in rooms across our 3 offices and went to work for a Pointless Weekend. The Boulder and Falls Church teams joined forces to bring Jambells into the world. If you haven’t seen it yet, Jambells is a fun little web app that lets everyone live out their dream to be in a Handbell Choir, with their friends, on their mobile devices. This app had all sorts of technical challenges, and to tackle WebSocket management and client syncing, we turned to Elixir.
Elixir & Phoenix
If you haven’t heard of Elixir, it’s a fairly new language that combines the fault-tolerant and concurrent strengths of Erlang with the syntactic joys of Ruby. Phoenix is an Elixir Web Framework which makes getting Elixir up on the web incredibly easy.
Given that concurrency is in Elixir’s wheelhouse, it seems only natural that Phoenix is great at managing WebSocket connections. Phoenix gives you a handful of web-related conventions to make use of, and if you’re coming from a Rails background like myself, you’ve heard of most of these: Router, Controller, View. One you might not have heard of is a Channel. From the documentation:
You can think of channels as controllers, with two differences: they are bidirectional and the connection stays alive after a reply.
Controllers for WebSockets pretty much. Here’s some code to demonstrate the basics:
defmodule App.MyChannel do
  use Phoenix.Channel

  # Called when a client opens the WebSocket connection and subscribes to
  # the "chat_room" topic. The client's name is stashed on the socket.
  def join(socket, "chat_room", message) do
    socket = assign(socket, :name, message["name"])
    {:ok, socket}
  end

  # Called when a client sends a "new:message" event. The saved name is
  # pulled back off the socket and the message is broadcast to everyone.
  def event(socket, "new:message", message) do
    user_name = get_assign(socket, :name)

    broadcast socket, "message:new", %{
      name: user_name,
      content: message["content"]
    }

    socket
  end

  # Called when a client disconnects from the topic.
  def leave(socket, _message) do
    socket
  end
end
With this code, we have the structure for a chat application. We’re defining three functions in our Channel here:
join - this is called whenever a client initializes the WebSocket connection. In this chat app of ours, the client passes a "name" parameter on joining, and we can save that to the socket with assign. Once a client has joined, they can send and receive messages scoped to that topic ("chat_room" in this instance). Must return {:ok, socket} or {:error, socket, "reason"} to allow or deny pubsub access.

event - this is called whenever a client sends a WebSocket message to the server. In this case, this function would handle a new chat message, as the client sends a "new:message" event with a "content" parameter. The Channel can grab the user name (which was saved to the socket connection when the user joined) with get_assign, and broadcast the new message to all connected clients. Must return socket.

leave - the opposite of join. This removes a client from the set of subscribers to the "chat_room" topic and tears down the connection. Must return socket.
Now that we have the basics of joining, handling/broadcasting events, and leaving, let’s get to the interesting bit.
Syncing Multiple Clients
The idea of Jambells is technically quite similar to the Chrome Racer experiment - multiple users join a “room” using a shared code, and a synchronized game kicks off across everyone’s device. In Jambells, notes descend Guitar Hero style down the screen and you shake your phone (like a handbell) when one of your notes reaches the play area.
With multiple people playing a song together, it’s vital all the devices are in sync. Client side clocks keep the game in sync after the start, so the challenge at hand was having all devices “start the game” at the same time.
The first attempt at this had the game leader send a "game:start" event to the server, which would in turn broadcast "game:started" to all the clients.
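In Channel terms, that first pass is about as small as it gets. Here’s a rough sketch in the same style as the chat example above - the App.GameChannel module and the "game" topic are made-up names for illustration, not the actual Jambells code:

defmodule App.GameChannel do
  use Phoenix.Channel

  def join(socket, "game", _message) do
    {:ok, socket}
  end

  # First attempt: relay the leader's "game:start" straight back out
  # to every connected client as "game:started".
  def event(socket, "game:start", _message) do
    broadcast socket, "game:started", %{}
    socket
  end
end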
We crossed our fingers and hoped this just worked … but it didn’t. Different devices seemed to receive the "game:started" message later than others, and in a very inconsistent manner. It’s incredible how something like half a second can turn a holiday song everyone knows into some completely unrecognizable sound. Improvements were essential.
We guessed that since different phones had different levels of latency, we could just measure that latency and send the "game:started" event along with some calculated delays.
So when the leader sends the "game:start" event, the server broadcasts a "game:ping", and all the clients respond immediately with "game:pong". Once all the delays are calculated, "game:started" is broadcast and each client waits its designated delay time before starting.
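Sketched against the same made-up GameChannel, the server side of that scheme might look roughly like this. The :latency_store Agent, the now_ms/0 helper, expected_player_count/0, and the "name" assign are all assumptions for illustration (the real Jambells code isn’t shown here); the shape is: record when the ping goes out, compute each client’s round trip as its pong comes back, and broadcast per-client delays once everyone has answered.

defmodule App.GameChannel do
  use Phoenix.Channel

  # Assumes an Agent was started elsewhere, e.g. when the room was created:
  #   Agent.start_link(fn -> %{} end, name: :latency_store)

  def join(socket, "game", message) do
    {:ok, assign(socket, :name, message["name"])}
  end

  def event(socket, "game:start", _message) do
    # Remember when the ping went out, then ping every client.
    Agent.update(:latency_store, fn _ -> %{sent_at: now_ms(), round_trips: %{}} end)
    broadcast socket, "game:ping", %{}
    socket
  end

  def event(socket, "game:pong", _message) do
    name = get_assign(socket, :name)
    round_trip = now_ms() - Agent.get(:latency_store, & &1.sent_at)

    round_trips =
      Agent.get_and_update(:latency_store, fn state ->
        updated = Map.put(state.round_trips, name, round_trip)
        {updated, %{state | round_trips: updated}}
      end)

    # Once every player has ponged, tell each client how long to wait
    # so the slowest connection "catches up".
    if map_size(round_trips) == expected_player_count() do
      slowest = round_trips |> Map.values() |> Enum.max()
      delays = Map.new(round_trips, fn {n, rt} -> {n, slowest - rt} end)
      broadcast socket, "game:started", %{delays: delays}
    end

    socket
  end

  defp now_ms, do: System.monotonic_time(:millisecond)
  defp expected_player_count, do: 4  # placeholder - the real app knows its room size
end

Each client would then look up its own entry in delays and wait that long before starting.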
We felt smart, crossed our fingers, and tested things out again. Unfortunately, this approach was just as messed up as before. With some logging and video analysis, we discovered that any client who was given a delay would be behind by the amount of time we told it to wait.
Strange, still messed up, but not all bad news. We just removed the delays and everything worked! So it’s not that some devices have different latency issues, but instead that WebSocket communication just needs to be warmed up before being consistently speedy across devices.
So this is what we landed on. We no longer clock response times to determine latency, but instead just wait for all clients to pong. Once all the connections have been primed, we broadcast "game:started" and everyone starts together! We’ll admit it’s still not perfect, but without any crazy logic or optimized use of multiple app servers spread across the globe (how Chrome Racer tackles this problem), we’re seeing accurate synchronization within about 1/8th of a second.
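As a final hedged sketch (again, the :pong_counter Agent and expected_player_count/0 are placeholders rather than the real code), the ping now exists purely to warm up each socket, and the pong handler shrinks to a simple count:

defmodule App.GameChannel do
  use Phoenix.Channel

  # Assumes Agent.start_link(fn -> 0 end, name: :pong_counter) was run
  # elsewhere, e.g. when the room was created.

  def join(socket, "game", _message) do
    {:ok, socket}
  end

  def event(socket, "game:start", _message) do
    Agent.update(:pong_counter, fn _ -> 0 end)
    broadcast socket, "game:ping", %{}
    socket
  end

  def event(socket, "game:pong", _message) do
    # The pong itself is the point - it primes the connection.
    pongs = Agent.get_and_update(:pong_counter, fn n -> {n + 1, n + 1} end)

    if pongs == expected_player_count() do
      broadcast socket, "game:started", %{}
    end

    socket
  end

  defp expected_player_count, do: 4  # placeholder
end

The counter lives outside the Channel because every client talks to the server through its own socket; some shared process (an Agent here) has to see all the pongs to know when the room is ready.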
Recap
This approach of warming up a WebSocket connection and then relying on it to be speedy enough seems pretty solid. It could of course be improved on - ideally you’d have your server repeatedly broadcast the state so clients can continuously home in on synchronization, and possibly calculate the actual latency once the connection has been “warmed up” to help even more. Given the project constraints though (holiday season release pressure, small server-side team - me), we happily achieved satisfactory synchronization without too much hassle.