Elegant TCP with Elixir - Part 1 - TCP as Messages

Apr 08, 2020

Over the next few posts, I want to talk about some features of Elixir and how they can be leveraged to build simple (or complex) TCP socket code. I say "features of Elixir" but, as I've pointed out before, the beauty of the language is the synergy it has with its runtime and standard library. So, technically these aren't language features, but let's not get pedantic.

While there's a newer socket module which better aligns with traditional socket API ergonomics, I'm going to stick with the gen_tcp module.

The first feature is about receiving data from a socket. Normally, given a socket, you'd call the recv function to get data from it. Two common patterns are to issue a recv after a send as part of a request->reply, or to block on recv, process the incoming message, and then loop back to wait for the next message - handling messages as they come in.

In Elixir, we have another option: letting the runtime read from the socket and deliver the data as messages to a process. This is called "active mode". Before we jump into code, note that there are possible negative implications to this feature (which we'll get to).
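To make that concrete, here's a minimal sketch of active mode over a loopback connection. The module name ActiveDemo is made up for illustration, and it assumes the five bytes arrive as a single chunk (which is what you'd expect on loopback, but TCP doesn't promise it):

```elixir
defmodule ActiveDemo do
  # Hypothetical demo module: opens a loopback listener on an OS-assigned port,
  # connects to it in active mode, and returns the first chunk delivered as a message.
  def run do
    {:ok, listener} = :gen_tcp.listen(0, [:binary, active: false])
    {:ok, port} = :inet.port(listener)

    # This process creates the client socket, so it is its controlling process.
    {:ok, client} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: true])
    {:ok, server_side} = :gen_tcp.accept(listener)

    :ok = :gen_tcp.send(server_side, "hello")

    # No recv call here: the runtime reads the socket and the data shows up in our mailbox.
    result =
      receive do
        {:tcp, ^client, data} -> data
      after
        1_000 -> :timeout
      end

    Enum.each([client, server_side, listener], &:gen_tcp.close/1)
    result
  end
end
```

Calling ActiveDemo.run() from iex should hand back the bytes that were sent, without the receiving side ever calling recv.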

By default, the process that creates the socket is considered the socket's "controlling process". For outbound connections, that's probably fine. You can spawn a process and connect to your destination from within that process. However, for incoming connections, you'll most likely create the socket in a listening process and spawn a handler. Something like:

defp accept(socket) do
  case :gen_tcp.accept(socket) do
    {:ok, client_socket} ->
      GenServer.start(Client, socket: client_socket)
    {:error, reason} ->
      # log this error; IO.puts/2 stands in for a real logger
      IO.puts(:stderr, "accept failed: #{inspect(reason)}")
  end
  accept(socket)  # go back to listening for more connections
end

What we need to do is change the "controlling process" from the process that accepted the connection to the newly spawned GenServer:

{:ok, client_socket} ->
   {:ok, pid} = GenServer.start(Client, socket: client_socket)
   :gen_tcp.controlling_process(client_socket, pid)

That's it. Thankfully, any messages received before the controlling process is changed will get moved to the new process' mailbox (so you won't lose any messages, but this wasn't always the case!).
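Here's a small sketch of that handoff using a plain spawned process instead of a GenServer. HandoffDemo and the :handler_got and :stop messages are names I made up for the example. Note that only the current controlling process can give a socket away; anyone else gets {:error, :not_owner}.

```elixir
defmodule HandoffDemo do
  # Hypothetical demo: accept a connection in one process, then hand the
  # socket to another process that receives the {:tcp, ...} messages.
  def run do
    {:ok, listener} = :gen_tcp.listen(0, [:binary, active: true])
    {:ok, port} = :inet.port(listener)
    {:ok, peer} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: false])

    # The accepting process (us) starts out as the controlling process.
    {:ok, client_socket} = :gen_tcp.accept(listener)

    parent = self()

    handler =
      spawn(fn ->
        receive do
          {:tcp, _socket, data} -> send(parent, {:handler_got, data})
        end

        # Stay alive until told to stop, so the handoff can't race with our exit.
        receive do
          :stop -> :ok
        end
      end)

    # Whether this data lands before or after the handoff, the handler ends up with it.
    :ok = :gen_tcp.send(peer, "ping")
    :ok = :gen_tcp.controlling_process(client_socket, handler)

    result =
      receive do
        {:handler_got, data} -> data
      after
        1_000 -> :timeout
      end

    send(handler, :stop)
    Enum.each([peer, listener], &:gen_tcp.close/1)
    result
  end
end
```

The handler deliberately stays alive until it's told to stop: handing a socket to a process that has already exited fails, which is one more reason a long-lived GenServer is a natural fit here.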

Inside of the controlling process, there are three messages you'll want to handle. These are normal Erlang messages. I'm showing how to handle them in a GenServer:

def handle_info({:tcp, socket, data}, state) do
...
end

def handle_info({:tcp_closed, socket}, state) do
...
end

def handle_info({:tcp_error, socket, reason}, state) do
...
end
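As an illustration only, here's one way those three clauses could be filled in. EchoClient is a hypothetical module that echoes data back to the peer, and the stop reasons are just one reasonable choice, not a rule:

```elixir
defmodule EchoClient do
  use GenServer

  @impl true
  def init(opts) do
    # The socket is handed to us by whoever accepted the connection.
    {:ok, %{socket: Keyword.fetch!(opts, :socket)}}
  end

  # Data the VM read on our behalf; this sketch simply echoes it back.
  @impl true
  def handle_info({:tcp, socket, data}, state) do
    :gen_tcp.send(socket, data)
    {:noreply, state}
  end

  # The peer closed the connection: nothing left to do, so stop cleanly.
  def handle_info({:tcp_closed, _socket}, state) do
    {:stop, :normal, state}
  end

  # The socket failed: stop with the reason so a supervisor or monitor can see it.
  def handle_info({:tcp_error, _socket, reason}, state) do
    {:stop, {:tcp_error, reason}, state}
  end
end
```

It would be started the same way as the Client above, followed by the controlling_process call so the messages land in its mailbox.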

How do you put a socket in active mode? There are three ways. First, if you're opening an outbound connection, you can pass active: true to connect/3 or connect/4. Secondly, you can also pass this option to listen/2 which will then put any accepted sockets in active mode. Finally, an existing socket can be put in or out of active mode by calling :inet.setopts(socket, active: true | false).
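A quick sketch of the three approaches side by side, using :inet.getopts/2 to check the mode each socket ends up in. ActiveModes is a made-up module name and everything runs over loopback:

```elixir
defmodule ActiveModes do
  # Hypothetical demo: returns the active flag observed after each approach.
  def run do
    # Via listen/2: sockets accepted from this listener start in active mode.
    {:ok, listener} = :gen_tcp.listen(0, [:binary, active: true])
    {:ok, port} = :inet.port(listener)

    # Via connect/3: the outbound socket starts in active mode.
    {:ok, outbound} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: true])
    {:ok, accepted} = :gen_tcp.accept(listener)

    {:ok, [active: from_connect]} = :inet.getopts(outbound, [:active])
    {:ok, [active: from_listen]} = :inet.getopts(accepted, [:active])

    # Via setopts/2: flip an existing socket, here from active to passive.
    :ok = :inet.setopts(outbound, active: false)
    {:ok, [active: after_setopts]} = :inet.getopts(outbound, [:active])

    Enum.each([accepted, outbound, listener], &:gen_tcp.close/1)
    {from_connect, from_listen, after_setopts}
  end
end
```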

That means you can switch a socket from active to passive (and vice versa) whenever you want. This is particularly useful given the other possible values you can give the active flag. Specifically, you can specify :once. When active: :once is used, you'll receive one {:tcp, socket, data} message and then the socket will automatically switch into passive mode, at which point you can process the data (possibly calling :gen_tcp.recv(socket, 0) manually to get more data; a length of 0 returns whatever bytes are available). Typically in these cases, once you're done processing a message, you'll call :inet.setopts(socket, active: :once) and repeat the flow:

def handle_info({:tcp, socket, data}, state) do
  # assume the socket was initially in active: :once
  # do things with data, maybe read more bytes from socket
  :inet.setopts(socket, active: :once)
  {:noreply, state}
end
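To see the "one message, then passive" behaviour in isolation, here's a small loopback sketch. OnceDemo is a hypothetical name and the 200ms wait is an arbitrary value I picked to show that nothing arrives while the socket is passive:

```elixir
defmodule OnceDemo do
  # Hypothetical demo of active: :once over a loopback connection.
  def run do
    {:ok, listener} = :gen_tcp.listen(0, [:binary, active: false])
    {:ok, port} = :inet.port(listener)
    {:ok, client} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: :once])
    {:ok, server_side} = :gen_tcp.accept(listener)

    # The first chunk is delivered as a message, then the socket goes passive.
    :ok = :gen_tcp.send(server_side, "first")
    first = next_message(client, 1_000)
    {:ok, [active: mode_after]} = :inet.getopts(client, [:active])

    # While passive, new data waits in the socket instead of our mailbox.
    :ok = :gen_tcp.send(server_side, "second")
    while_passive = next_message(client, 200)

    # Re-arming the socket delivers the waiting data.
    :ok = :inet.setopts(client, active: :once)
    second = next_message(client, 1_000)

    Enum.each([client, server_side, listener], &:gen_tcp.close/1)
    {first, mode_after, while_passive, second}
  end

  defp next_message(socket, timeout) do
    receive do
      {:tcp, ^socket, data} -> data
    after
      timeout -> :nothing
    end
  end
end
```

The "second" chunk isn't lost while the socket is passive; it just sits there until the process says it's ready for more.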

Why does :once exist? It's to prevent overburdening your process (especially in terms of memory usage). Remember, Elixir's process mailboxes are unbounded. When a socket is in active mode, the messages will be delivered as fast as they can be - which may be faster than you can process them. By using active: :once you more truthfully represent your capacity to process data to the VM and the operating system.

(You can also specify an integer, active: N, where N is in the range -32768..32767, but that's less common and meaningless until we cover message boundaries in part 2.)
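For the curious, here's a rough sketch of the integer form. The value is added to the socket's count of data messages still to be delivered; once that count reaches zero, the socket goes passive and the controlling process gets a {:tcp_passive, socket} message. ActiveNDemo is a made-up name, and the chunks are sent one at a time so they can't be merged into a single message:

```elixir
defmodule ActiveNDemo do
  # Hypothetical demo of active: N over a loopback connection.
  def run do
    {:ok, listener} = :gen_tcp.listen(0, [:binary, active: false])
    {:ok, port} = :inet.port(listener)
    {:ok, client} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: false])
    {:ok, server_side} = :gen_tcp.accept(listener)

    # Ask for exactly two data messages.
    :ok = :inet.setopts(client, active: 2)

    chunks =
      for chunk <- ["a", "b"] do
        :ok = :gen_tcp.send(server_side, chunk)

        receive do
          {:tcp, ^client, data} -> data
        after
          1_000 -> :timeout
        end
      end

    # After the second message the count is zero and the socket is passive again.
    notice =
      receive do
        {:tcp_passive, ^client} -> :passive
      after
        1_000 -> :timeout
      end

    Enum.each([client, server_side, listener], &:gen_tcp.close/1)
    {chunks, notice}
  end
end
```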

Having said that, if all your peers are well-behaved, keeping the socket in active mode has two big advantages. First, it makes your code consistent. Switching between active and passive means having to deal with two different APIs and the fact that sometimes errors (like a closed socket) will be returned synchronously and sometimes asynchronously. Secondly, active is faster. It lets the VM transfer data from the OS into your process in the most efficient manner and avoids constant calls to :inet.setopts/2. Intuitively, it makes a lot of sense that passive or :once can be a lot slower: all that time you're spending processing the message is time that the OS and VM could be passing more bytes into your process (which active mode does).
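The "two different APIs" point is easy to see with a closed connection. In this sketch (CloseDemo and its helper functions are hypothetical names), the same event shows up as a return value on a passive socket and as a message on an active one:

```elixir
defmodule CloseDemo do
  # Hypothetical demo: how a peer closing the connection is reported
  # in passive mode versus active mode.
  def run do
    {:ok, listener} = :gen_tcp.listen(0, [:binary, active: false])
    {:ok, port} = :inet.port(listener)

    passive = observe_close(listener, port, false)
    active = observe_close(listener, port, true)

    :gen_tcp.close(listener)
    {passive, active}
  end

  defp observe_close(listener, port, mode) do
    {:ok, client} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: mode])
    {:ok, server_side} = :gen_tcp.accept(listener)

    # The peer goes away without sending anything.
    :ok = :gen_tcp.close(server_side)

    result = wait_for_close(client, mode)
    :gen_tcp.close(client)
    result
  end

  # Passive: the close is returned synchronously from recv.
  defp wait_for_close(client, false), do: :gen_tcp.recv(client, 0, 1_000)

  # Active: the close arrives asynchronously as a message.
  defp wait_for_close(client, true) do
    receive do
      {:tcp_closed, ^client} -> :tcp_closed_message
    after
      1_000 -> :timeout
    end
  end
end
```

Code that flips between modes has to handle both shapes; code that stays in active mode only needs the handle_info clauses.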