11 Mar 2019, 00:32

gRPC Load Balancing inside Kubernetes

Context

I’ve wanted to blog about this for years: how do you connect to a Kubernetes load-balanced service?
How do you deal with disconnections, re-connections and maintenance? What about gRPC specifically?
The answer is heavily tied to the network stack used by Kubernetes, but with the “Mesh Network” revolution, it’s not always clear how it works anymore and what the options are.

How it works

First, I recommend watching this great yet simple video: Container Networking From Scratch, then reading the Services clusterIP documentation.

To keep it simple: when you create a Service in Kubernetes, it creates a layer 4 proxy that load balances connections to your pods using iptables. The service endpoint is a single IP and port hiding your real pods.
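
For example, a plain ClusterIP Service might look like this (a minimal, hypothetical myapp definition, the port and selector are made up):

apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  ports:
    - port: 9200
      targetPort: 9200
      protocol: TCP
  selector:
    app: myapp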

The Problem

A simple TCP load balancer is good enough for a lot of things, especially for HTTP/1.1: since connections are mostly short-lived and clients reconnect often, they won’t stay attached to an old running pod.

But with gRPC over HTTP/2, a TCP connection is kept open, which can lead to issues like staying connected to a dying pod or unbalancing the cluster because clients end up on the older pods.

One solution is to use a more advanced proxy that knows about the higher layers.

Envoy, HAProxy and Traefik are layer 7 reverse proxy load balancers: they know about HTTP/2 (even about gRPC) and can disconnect a backend pod without the clients noticing.

Edge

On the edge of your Kubernetes cluster, you need a public IP, provided by your cloud provider; via the Ingress resource it will expose your internal service.

To further control your request routing you need an Ingress Controller: a reverse proxy that knows about the Kubernetes cluster and can direct requests to the right place. Envoy, HAProxy and Traefik can act as Ingress Controllers.

Internal Services & Service Mesh

In a micro-services environment, most if not all of your micro-services will also be clients of other micro-services.

Istio, a “Mesh Network” solution, uses Envoy as a sidecar. This sidecar is configured from a central place (the control plane) and makes the micro-services talk to each other through Envoy.

This way the client does not need to know about all the topology.

That’s great, but in a controlled environment (yours), where you control all the clients, sending all the traffic through a proxy is not always necessary.

Client Load Balancing

In Kubernetes you can create a headless service: there is no single load-balanced endpoint anymore, the service’s pods are exposed directly and the Kubernetes DNS will return all of them.
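
For a hypothetical geoipd service, the definition is the same as a regular Service plus clusterIP: None (the port and selector are assumptions):

apiVersion: v1
kind: Service
metadata:
  name: geoipd
spec:
  clusterIP: None
  ports:
    - port: 9200
      targetPort: 9200
      protocol: TCP
  selector:
    app: geoipd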

Here is an example service called geoipd scaled to 3; a DNS lookup from inside the cluster returns all three pods.

Name:      geoipd
Address 1: 172.17.0.18 172-17-0-18.geoipd.default.svc.cluster.local
Address 2: 172.17.0.21 172-17-0-21.geoipd.default.svc.cluster.local
Address 3: 172.17.0.9 172-17-0-9.geoipd.default.svc.cluster.local

It’s up to your client to connect them all and load balance the connections.

On the Go gRPC client side, a simple dns:/// notation will fetch the entries for you, then the roundrobin package will handle load balancing.

// dns:/// uses the gRPC DNS resolver to fetch all the records for the service
conn, err := grpc.Dial(
    "dns:///geoipd:9200",
    grpc.WithBalancerName(roundrobin.Name),
)

This may sound like a good solution, but it is not: the default DNS refresh frequency is 30 minutes, meaning that whenever you add new pods, it can take up to 30 minutes for them to start receiving traffic! You can mitigate this by tweaking MaxConnectionAge on the gRPC server: once a connection reaches its maximum age the server closes it gracefully, forcing the client to reconnect and re-resolve the DNS name:

gsrv := grpc.NewServer(
    // MaxConnectionAge limits how long a connection lives, to facilitate load balancing
    // MaxConnectionAgeGrace then forcibly tears down lingering streams, it defaults to infinity
    grpc.KeepaliveParams(keepalive.ServerParameters{MaxConnectionAge: 2 * time.Minute}),
)

Even if you could refresh the list more often, you wouldn’t learn about pod evictions fast enough and you’d lose some traffic.

There is a nicer solution: implementing a gRPC client resolver for Kubernetes that talks to the Kubernetes API to get the endpoints and watch them constantly. This is exactly what Kuberesolver does.

// Register kuberesolver to grpc
kuberesolver.RegisterInCluster()

conn, err := grpc.Dial(
    "kubernetes:///geoipd:9200",
    grpc.WithBalancerName(roundrobin.Name),
)

By using the kubernetes:/// scheme you tell kuberesolver to fetch and watch the endpoints of the geoipd service.

For this to work, the pod must have get and watch access to the endpoints, granted via a role:

kubectl create role pod-reader-role --verb=get --verb=watch --resource=endpoints,services 
kubectl create sa pod-reader-sa 
kubectl create rolebinding pod-reader-rb --role=pod-reader-role --serviceaccount=default:pod-reader-sa 

Redeploy your app (the client) with the service account:

spec:
  serviceAccountName: pod-reader-sa

Deploy, scale up, scale down, kill your pods: your client keeps sending traffic to a living pod!

I’m surprised it’s not mentioned more often: client-side load balancing has done the job for years, and the same applies inside a Kubernetes environment.
It is fine for small to medium projects and can handle a lot of traffic, so it will do for many of you, unless you are Netflix-sized…

Conclusion

Load-balancing proxies are great tools, especially useful on the edge of your platform. “Mesh Network” solutions are nice additions to our tool set, but the cost of operating and debugging a full mesh network can be high and overkill in some situations, while a client-side load balancing solution is simple and easy to grasp.

Thanks to Prune who helped me with this post, and to Robteix & diligiant for reviewing.

04 Mar 2019, 00:32

Traefik gRPC Load Balancing and Traces Propagation

Following my recent blog post on setting up a dev environment in Kubernetes, here are some tips to use Traefik as a gRPC load balancer.

Traefik can be used on the edge to route incoming HTTP traffic to your Kubernetes cluster, and it also supports gRPC.



gRPC Load Balancing with Traefik

Here I have a gRPC service I want to expose on the edge.

apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    name: "myapp"
    type: "grpc"
spec:
  ports:
    - port: 9200
      name: "grpc"
      targetPort: grpc
      protocol: TCP
  selector:
    app: "myapp"
  clusterIP: None

Note the clusterIP: None, it’s a headless service.

It creates a non-load-balanced service: the pods behind it can be accessed directly.

myapp.default.svc.cluster.local.    2       IN      A       172.17.0.19
myapp.default.svc.cluster.local.    2       IN      A       172.17.0.10
myapp.default.svc.cluster.local.    2       IN      A       172.17.0.16

Here is the ingress for Traefik.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myapp-ingress
  namespace: default
  labels:
    name: "myapp"
  annotations:
    ingress.kubernetes.io/protocol: h2c
spec:
  rules:
  - host: myapp-lb.minikube
    http:
      paths:
      - path: /
        backend:
          serviceName: myapp
          servicePort: 9200

Note the h2c protocol annotation, telling Traefik to speak HTTP/2 without TLS (h2c) to your backend!
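
On the client side, here is a minimal Go sketch to reach the service through the edge; it assumes your Traefik entrypoint also accepts plain HTTP/2 (h2c) on port 80, otherwise you’d need transport credentials on the frontend:

package main

import (
	"log"

	"google.golang.org/grpc"
)

func main() {
	// h2c: plain HTTP/2, no TLS between the client and the Traefik entrypoint
	conn, err := grpc.Dial("myapp-lb.minikube:80", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// use conn with your generated gRPC client stubs
}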

Traefik

Tracing

Traefik can be configured to emit traces.

I’m using ocgrpc, the OpenCensus gRPC plugin, for gRPC metrics & traces.
Plugged in as a gRPC StatsHandler, it automatically records several gRPC metrics and emits traces.
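
Here is a minimal sketch of the wiring on both sides (the target address is made up):

package main

import (
	"log"

	"go.opencensus.io/plugin/ocgrpc"
	"go.opencensus.io/stats/view"
	"google.golang.org/grpc"
)

func main() {
	// export the default ocgrpc server views (latency, message counts, bytes...)
	if err := view.Register(ocgrpc.DefaultServerViews...); err != nil {
		log.Fatal(err)
	}

	// server side: the StatsHandler records metrics & traces for every incoming RPC
	gsrv := grpc.NewServer(grpc.StatsHandler(&ocgrpc.ServerHandler{}))
	_ = gsrv

	// client side: the StatsHandler records metrics & propagates the trace context
	conn, err := grpc.Dial("myapp:9200",
		grpc.WithInsecure(),
		grpc.WithStatsHandler(&ocgrpc.ClientHandler{}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}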

Unfortunately ocgrpc does not yet propagate Jaeger traces, so I’ve temporarily forked it to support Jaeger.

As you can see, you can follow a request from Traefik down to your services.

Jaeger

Happy tracing!

21 Feb 2019, 00:19

Kubernetes Quick Setup with Prometheus, Grafana & Jaeger

Introduction

When starting on a new project or prototyping a new idea, I find myself doing the same tasks again and again.
Thanks to Kubernetes, it’s possible to set up a new environment from scratch really fast.

Here is a quick setup (mostly notes) to create a dev environment using Minikube and the workflow I’m using with it.

Not knowing in advance where this future project will be hosted, I try to stay platform agnostic.
OpenCensus or OpenTracing can abstract the target platform, letting you choose what tooling you want for your dev.

I consider some tools to be mandatory these days:

  • Jaeger for tracing
  • Prometheus for instrumentation/metrics
  • Grafana to display these metrics
  • A logging solution: this is already taken care of by Kubernetes and depends on your cloud provider (StackDriver on GCP…); otherwise use another tool like the ELK stack.
    On your dev, plain structured logs to stdout with Kubernetes dashboard or Stern should be fine.
  • A messaging system: for example NATS but out of the scope of this post.

Tools Installation

I find it easier to let Minikube open the reserved ports, though it’s not mandatory:

minikube start --extra-config=apiserver.service-node-port-range=80-30000

I’ll use Traefik as the Ingress controller, to simplify access to the several admin UIs later.

I’m not a big fan of Helm, so here is a little trick: I only use helm to create my deployment templates, using the charts repo as a source, so I can commit or modify the resulting generated files (thanks Prune for this tip).

git clone git@github.com:helm/charts.git 

helm template charts/stable/traefik --name traefik --set metrics.prometheus.enabled=true --set rbac.enabled=true \
--set service.nodePorts.http=80 --set dashboard.enabled=true --set dashboard.domain=traefik-ui.minikube > traefik.yaml

helm template charts/stable/grafana --name grafana --set ingress.enabled=true \
--set ingress.hosts\[0\]=grafana.minikube --set persistence.enabled=true --set persistence.size=100Mi > grafana.yaml

helm template charts/stable/prometheus --name prometheus --set server.ingress.enabled=true \
--set server.ingress.hosts\[0\]=prometheus.minikube --set alertmanager.enabled=false \
--set kubeStateMetrics.enabled=false --set nodeExporter.enabled=false --set server.persistentVolume.enabled=true \
--set server.persistentVolume.size=1Gi --set pushgateway.enabled=false > prometheus.yaml

A lot more templates are available: NATS, PostgreSQL…

For Jaeger, a development-ready solution exists; here is mine, slightly tweaked to use an ingress:

curl -o jaeger.yaml https://gist.githubusercontent.com/akhenakh/615686891340f5306dcbed82dd1d9d67/raw/41049afecafb05bc29de3b0d25208c784f963695/jaeger.yaml

Deploy to Minikube (ensure your current context is minikube…). If you need to work on several projects at the same time, remember you can use Kubernetes namespaces (beware, some helm templates override it).

kubectl create -f traefik.yaml
kubectl create -f jaeger.yaml
kubectl create -f grafana.yaml
kubectl create -f prometheus.yaml

Again, to ease my workflow, I want Traefik to bind port 80: edit the service and change it to nodePort 80.

kubectl edit service traefik

Add some URLs to your /etc/hosts:

echo "$(minikube ip) prometheus.minikube" | sudo tee -a /etc/hosts 
echo "$(minikube ip) grafana.minikube" | sudo tee -a /etc/hosts 
echo "$(minikube ip) traefik-ui.minikube" | sudo tee -a /etc/hosts
echo "$(minikube ip) jaeger.minikube" | sudo tee -a /etc/hosts

Point your browser to any of these addresses and you are good to go!

Deploy your own apps

First, remember to always use the Docker daemon inside Minikube:

eval `minikube docker-env` 

My applications read their parameters from the environment. In Go, I’m using namsral/flag, so the flag -httpPort is also set by the environment variable HTTPPORT.
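
A minimal sketch (the flag name and default value are just for illustration):

package main

import (
	"fmt"

	"github.com/namsral/flag" // drop-in replacement for the standard flag package
)

func main() {
	// set with -httpPort on the command line, or HTTPPORT in the environment
	httpPort := flag.Int("httpPort", 9200, "port to listen on")
	flag.Parse()
	fmt.Println("listening on", *httpPort)
}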

I then use templates, where I set all my environment variables, to create my YAML deployments.

https://gist.github.com/akhenakh/8c06186b7d524f6a60d40764b307a5d5
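
The template is a regular manifest with environment placeholders; here is a trimmed-down, hypothetical sketch (see the gist above for the real files):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${NAME}
  labels:
    appgroup: ${PROJECT}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${NAME}
  template:
    metadata:
      labels:
        app: ${NAME}
        appgroup: ${PROJECT}
    spec:
      containers:
        - name: ${NAME}
          image: ${NAME}:${VERSION}
          imagePullPolicy: IfNotPresent
          env:
            - name: HTTPPORT
              value: "9200"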

So typical Makefile targets would fill the template with envsubst:

.EXPORT_ALL_VARIABLES:
VERSION := $(shell git describe --always --tags)
DATE := $(shell date -u +%Y%m%d.%H%M%S)
LDFLAGS := -ldflags "-X=main.version=$(VERSION)-$(DATE)"
CGO_ENABLED=0
GOOS=linux
GOARCH=amd64
PROJECT = mysuperproject
...

helloworld: 
	cd helloworld/cmd/helloworld && go build $(LDFLAGS)

helloworld-image: helloworld
	cd helloworld/cmd/helloworld && docker build -t helloworld:$(VERSION) .

helloworld-deploy: NAME=helloworld
helloworld-deploy: helloworld-image 
	cat deployment/project-apps.yaml | envsubst | kubectl apply -f - 
	cat deployment/project-services.yaml | envsubst | kubectl apply -f - 

project-undeploy:
	kubectl delete --ignore-not-found=true deployments,services,replicasets,pods --selector=appgroup=$(PROJECT)

The Dockerfile contains nothing but FROM gcr.io/distroless/base and a copy of the helloworld binary.
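
Something like this minimal sketch (the command itself is left to the deployment spec here):

FROM gcr.io/distroless/base
COPY helloworld /helloworld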

Note that all this setup is only good for your dev; production deployment is another story.

With make helloworld-deploy, compilation, image creation and deployment take less than 2s over here!

Shell tool

Another useful trick when working with new tools is to have a shell available inside Kubernetes to experiment with.

Create an image for this purpose, and copy your tools and clients.

FROM alpine:3.9
RUN apk add --no-cache curl busybox-extras tcpdump

WORKDIR /root/
COPY helloworldcli .
ENTRYPOINT ["tail", "-f", "/dev/null"]

And the relevant Makefile targets:

debugtool-image: helloworldcli
	cp helloworld/cmd/helloworldcli/helloworldcli debugtool/helloworldcli
	cd debugtool && docker build  -t debugtool:latest .
	rm -f debugtool/helloworldcli

debugtool-deploy: debugtool-image
	kubectl delete --ignore-not-found=true pod debugtool
	sleep 2
	kubectl run --restart=Never --image=debugtool:latest --image-pull-policy=IfNotPresent debugtool

debugtool-shell:
	kubectl exec -it debugtool -- /bin/sh 

By calling make debugtool-shell, you’ll be dropped into a shell inside Kubernetes.

Conclusion

Hope this helps some of you; I would love to hear your own tricks and setups for developing on Minikube!