Nomad, an alternative to Kubernetes

- devops nomad kubernetes grpc go

If you are familiar with this blog you know I really appreciate Kubernetes: as a former ops person I strongly believe Kubernetes is one way to bundle “the sum of 20 years of ops good practices”.

But there are other solutions; one of them is Nomad.
It’s made by Hashicorp, the creators of Vault, Consul and Terraform.
In general Hashicorp is synonymous with quality.

Nomad is a task scheduler; a task could be executing a command, running a Docker container or running a QEMU VM…
The project is open source, with some features reserved for the enterprise version.

I had the opportunity to deploy the same project on Kubernetes and Nomad.
Here are some key points:

Self Deployment

If you are self hosting, before using Kubernetes or Nomad you first need to deploy them.
Nomad is one binary, but the truth is Nomad is almost useless without Consul (also a single binary).
First deploy a Consul cluster, then deploy Nomad; the two are so well integrated that Nomad will automatically create and join a cluster on top of an existing Consul instance running on the same host.
Deploying a working Nomad cluster is easy compared to the pain of a Kubernetes install, but the comparison is unfair since some parts are missing from Nomad, see below.
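
To give an idea of how little is needed, here is roughly what a combined server/client agent configuration looks like (a sketch; with a local Consul agent running, Nomad discovers its peers through Consul and forms the cluster on its own):

# /etc/nomad.d/nomad.hcl (sketch)
datacenter = "dc1"
data_dir   = "/var/lib/nomad"

server {
  enabled          = true
  bootstrap_expect = 3
}

client {
  enabled = true
}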

Migration

This project consists of several gRPC microservices, plus Redis, PostgreSQL, Traefik and NATS.
I was afraid the rewrite needed to target something other than Kubernetes would be time consuming.
But with containers and good practices, no matter how or where you deploy, it’s almost the same.
I find the Nomad deployment files (HCL) a lot easier to read and understand than the Kubernetes YAML I’ve been reading for 4 years !@#

Here is one of my Nomad deployment files as an example; I find every line to be self-explanatory.
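
It looked roughly like the following sketch (job name, image and service name are illustrative, in the 0.x syntax used throughout this post):

job "mapserver" {
  datacenters = ["dc1"]

  group "mapserver" {
    count = 2

    task "mapserver" {
      driver = "docker"

      config {
        image = "registry.mydomain/mapserver:1.0.0"
      }

      resources {
        cpu    = 200 # MHz
        memory = 128 # MB

        network {
          # dynamic port, registered in Consul via the service stanza below
          port "grpc" {}
        }
      }

      service {
        name = "mapserver-grpc"
        port = "grpc"

        check {
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}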

Service Discovery

One big difference with Kubernetes: services are not load balanced behind a virtual IP address; by default Docker instances run on the node’s existing network stack, using different ports to avoid conflicts.
So for a container to reach another service, you either have to know the address in advance (not always possible), hardcode a listening port (also not always possible), or use Consul service resolution: via templates, via DNS or via the Consul API.

Not all apps know about Consul, especially containers you don’t own, like Redis.
For those, Nomad provides a template language (built around consul-template) which lets you generate files or environment variables from Consul data:

      template {
        data = <<EOH
NATSBROKERS = "{{range $index, $service := service "nats-client" }}{{if ne $index 0}},{{end}}{{$service.Address}}:{{$service.Port}}{{end}}"
REDISADDR = "{{range $index, $service := service "redis-cache" }}{{if ne $index 0}},{{end}}{{$service.Address}}:{{$service.Port}}{{end}}"
EOH

        # render the file into the task's secrets/ directory
        destination = "secrets/file.env"
        # and load it as environment variables for the task
        env = true
      }

This will create a file at secrets/file.env (local/ and secrets/ are directories automatically mounted into your containers), containing all the addresses & ports registered in Consul under the nats-client and redis-cache services, so on every start your container will know about the other services.

Finding the endpoints of a dependency service when your program starts is one thing; keeping this list up to date over time is something else.

Nomad offers a way to notify your program of template changes via a signal, but it’s not always optimal.
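
The signal approach is just two extra template parameters; a minimal sketch, assuming your process knows how to reload its configuration on SIGHUP:

template {
  data        = "..." # same kind of consul-template content as above
  destination = "local/endpoints.conf"

  # re-render on Consul changes and signal the task instead of restarting it
  change_mode   = "signal"
  change_signal = "SIGHUP"
}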

When possible, you should probably go with client-side load balancing instead: I’ve already blogged about client load balancing inside Kubernetes with gRPC, and the same applies to Nomad using a Consul gRPC resolver.
The code modification needed for this to work with Consul was just one line: importing the package.
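
A minimal sketch of the client side, assuming a community resolver such as github.com/mbobakov/grpc-consul-resolver (not necessarily the exact package; the idea is that the blank import registers a consul:// scheme with gRPC):

package main

import (
    "log"

    "google.golang.org/grpc"

    // the blank import registers the "consul://" resolver scheme with gRPC
    _ "github.com/mbobakov/grpc-consul-resolver"
)

func main() {
    // "mapserver-grpc" is a hypothetical service name registered in Consul
    conn, err := grpc.Dial(
        "consul://127.0.0.1:8500/mapserver-grpc",
        grpc.WithInsecure(),
        // round_robin spreads the calls across every healthy instance
        grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy": "round_robin"}`),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()
    // use conn with your generated gRPC client as usual
}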

DNS

Most Kubernetes installations include CoreDNS, which talks to the Kube API and updates DNS entries accordingly.
With Nomad you’ll use Consul.
Consul is a key-value store (the equivalent of etcd in the k8s world) where you can register special keys: services.
This service can then be queried via the Consul API or via an embedded DNS server.
There are many ways to make this DNS visible to your programs; they are documented here, and they require some manual steps for your hosts to resolve the .consul top level domain.
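
For example, once set up you can check that a service resolves by querying Consul’s embedded DNS server directly (it listens on port 8600 by default):

# ask Consul for the addresses & ports of the redis-cache service
dig @127.0.0.1 -p 8600 redis-cache.service.consul SRV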

I’m using a simple dnsmasq configuration:

# Enable forward lookup of the 'consul' domain:
server=/consul/127.0.0.1#8600

# Accept DNS queries only from hosts whose address is on a local subnet.
local-service

If the systemd resolver is running you’ll hit an issue, since its stub listener is already binding port 53 (on 127.0.0.53).
Add this to /etc/systemd/resolved.conf:

DNSStubListener=no
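
Then restart systemd-resolved (systemctl restart systemd-resolved) so dnsmasq can take over port 53.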

Then use the dns_servers directive of the Docker driver to point your containers to dnsmasq.
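
In the Docker driver config this looks like the following sketch; the attr interpolation resolves to the host’s IP address, where dnsmasq is listening:

task "cache" {
  driver = "docker"

  config {
    image = "redis:5"

    # send the container's DNS queries to dnsmasq running on the host
    dns_servers = ["${attr.unique.network.ip-address}"]
  }
}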

Network

Since base Nomad has no automatic level 4 load balancing and no network manager à la Flannel, you are on your own for the network management of your cluster. That is probably a good thing: dealing with the sometimes unnecessary network complexity of k8s is no fun.

Beginning with Nomad 0.10.x there is a mesh solution called Connect, using a sidecar Envoy, very similar to Istio.
Again, I’m not convinced by the whole “mesh is the solution to everything” idea, especially using Envoy, which still suffers from big issues with gRPC.

Load Balancing

Load balancing (level 7) is no different from Kubernetes; I’m used to Traefik and the move was easy (note that only Traefik 1.x supports this Consul catalog integration so far).

With this configuration section, Traefik queries the Consul API and finds any service carrying the service tag:

[consulCatalog]
endpoint = "127.0.0.1:8500"
domain = "consul.localhost"
prefix = "traefik"
constraints = ["tag==service"]

Then tag your Nomad service to automatically create an endpoint load balanced across all of the Nomad allocations:

        tags = [
          "traefik.tags=service",
          "traefik.frontend.rule=Host:map.dev.mydomain",
        ]
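
For context, these tags live in the job’s service stanza, next to the health check (same illustrative names as the job file sketched above):

service {
  name = "mapserver-grpc"
  port = "grpc"

  tags = [
    "traefik.tags=service",
    "traefik.frontend.rule=Host:map.dev.mydomain",
  ]
}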

Resource Consumption

When deployed on Kubernetes this project was running on GCP/GKE but also on an ARM64 Kubernetes cluster.
My ARM64 cluster, made of $50 servers, was really unstable, which is a good way to test for failures; since moving to Nomad I’m suffering far fewer reboots.
I was really surprised that, for the same workload, the overall Nomad cluster consumed far fewer resources; surely Kubernetes is capable of more, but we are comparing the same project deployment here.
Even on x86 GCP with 3 medium instances you’ll hit resource issues very fast with Kubernetes; for some reason Nomad is resource friendly.

Documentation & Help

I found the Hashicorp documentation very well written, but sometimes it assumes you have been in the Hashicorp ecosystem for a long time.
For instance, there is no good documentation for service resolution inside Nomad, since it’s Consul based…

Kubernetes is the de facto orchestrator, so finding answers is easier; don’t go with Nomad if you are new to all this.

Overall

As always, there is no such thing as a tool to rule them all.
Kubernetes is getting all the visibility for good reasons, but it’s probably not suitable for small to medium companies.
You don’t need to deploy Google infrastructure when you are not Google.
Nomad is a clear alternative and a fresh challenger with great ideas.
I’ll definitely keep Nomad for this project; I’m not going back to k8s.
There are giant Consul deployments out there; I’m not sure the same is true of Nomad yet, but I will follow the project actively.