SMI traffic split on Linkerd2

July 16, 2019
Kubernetes SMI Linkerd2

This year I had the opportunity to speak at KubeCon Barcelona. It was my first KubeCon and my first big talk abroad. I'll leave the YouTube video here:

But this post isn't about that, sorry for the personal marketing. Back on track: at this KubeCon, the Service Mesh Interface (SMI) was announced, "A standard interface for service meshes on Kubernetes", as its slogan says. Service mesh itself was a massive topic at this KubeCon edition. It's hyped (in the good or bad sense of the word, I'll let you decide) and it's almost impossible to ignore.

SMI aims to create a standard interface for service meshes on Kubernetes. Of course, it's a very new project; expectations are high, and what's actually implemented depends on the service mesh provider. It's worth checking out the keynote session with Gabe Monroy announcing the project and his post on Microsoft's cloud blog.

Linkerd 2.4.0

On July 11th, 2019, Linkerd announced version 2.4.0, including SMI traffic split support. Recently @RodrigoVMonte and I gave a quick presentation at a Berlin DevOps meetup, and we created a repo with a simple tutorial on how to do traffic splitting with Ambassador.

Now, with the SMI integration in Linkerd2, I want to recreate this tutorial with a simple but useful example for applications communicating internally in a Kubernetes cluster.

Creating an environment

I’m going to use a minikube cluster for that, but any Kubernetes cluster would work. You can download the resources I’m using here.

Once we have a Kubernetes cluster running and our manifests, let’s create our first deployment:

This first file is straightforward: it creates a deployment of one service called 'simple-service' and its respective Kubernetes Service. Nothing special.
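Here's a minimal sketch of what simple-service-v1.yaml looks like (the image name and labels are assumptions for illustration; the exact manifest is in the repo):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-service-v1
  namespace: simple-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-service
      version: v1
  template:
    metadata:
      labels:
        app: simple-service
        version: v1
    spec:
      containers:
      - name: simple-service
        image: simple-service:v1 # hypothetical image tag
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: simple-service-v1
  namespace: simple-service
spec:
  selector:
    app: simple-service
    version: v1
  ports:
  - port: 80
    targetPort: 80

Let's apply it: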

$ kubectl create namespace simple-service
$ kubectl apply -f simple-service-v1.yaml

We're also going to create a deployment in the default namespace to help with the tests; it runs an Ubuntu image waiting on a long sleep.
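A minimal sketch of what debug-deployment.yaml might look like (the sleep command and labels are assumptions; the exact manifest is in the repo):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: debug
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: debug
  template:
    metadata:
      labels:
        app: debug
    spec:
      containers:
      - name: debug
        image: ubuntu
        # keep the container alive so we can kubectl exec into it
        command: ["sleep", "infinity"]

Apply it: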

$ kubectl apply -f debug-deployment.yaml

It seems strange, and it is. It's supposed to be just a debug pod, so don't get attached to it. In a real environment, this deployment would be another application running on your cluster: think of simple-service's clients. Let's run kubectl exec on the pod created by our debug deployment to simulate these clients making requests to simple-service.

$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
debug-5f7d54889f-vcb2j   1/1     Running   0          3m47s
$ kubectl exec -it debug-5f7d54889f-vcb2j bash
root@debug-5f7d54889f-vcb2j:/$ apt update
root@debug-5f7d54889f-vcb2j:/$ apt install curl
root@debug-5f7d54889f-vcb2j:/$ curl simple-service-v1.simple-service.svc.cluster.local.
I'm service v1

It works… but it's a little bit strange that clients call simple-service-v1, version included. Imagine releasing a new version and having to update simple-service's address for every client. Let's create a new service to simplify this, which will also be used as the root service by the SMI provider (Linkerd, in this case). Take a look at the traffic split spec to better understand the root service concept: "The root service that clients use to connect to the destination application."
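A sketch of what simple-service-root.yaml looks like; the crucial detail is that its selector matches only the app label, without the version (labels assumed to match the earlier manifests):

apiVersion: v1
kind: Service
metadata:
  name: simple-service
  namespace: simple-service
spec:
  selector:
    # no version label here: this service matches pods of every version
    app: simple-service
  ports:
  - port: 80
    targetPort: 80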

Create the service now:

$ kubectl apply -f simple-service-root.yaml

Now, on the debug pod, let’s request with the new address:

root@debug-5f7d54889f-vcb2j:/$ curl simple-service.simple-service.svc.cluster.local.
I'm service v1

It works :)!

Simple Service v2

Imagine that the simple service now has a new version called v2. Let's deploy it.
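simple-service-v2.yaml is analogous to the v1 manifest, with v2 in the image tag and labels (again, the image name is an assumption for illustration):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-service-v2
  namespace: simple-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-service
      version: v2
  template:
    metadata:
      labels:
        app: simple-service
        version: v2
    spec:
      containers:
      - name: simple-service
        image: simple-service:v2 # hypothetical image tag
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: simple-service-v2
  namespace: simple-service
spec:
  selector:
    app: simple-service
    version: v2
  ports:
  - port: 80
    targetPort: 80

Apply it: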

$ kubectl apply -f simple-service-v2.yaml

Pretty simple: it creates a new deployment with the new application's version (the Docker image tag) and its respective service. On the debug pod, you can check that it's running:

root@debug-5f7d54889f-vcb2j:/$ curl simple-service-v2.simple-service.svc.cluster.local.
I'm new!! Service v2 :)

Perfect. Now let's try it with the root service, multiple times:

root@debug-5f7d54889f-vcb2j:/$ for i in {1..10};do curl simple-service.simple-service.svc.cluster.local.; echo; done
I'm service v1
I'm service v1
I'm new!! Service v2 :)
I'm service v1
I'm service v1
I'm service v1
I'm service v1
I'm service v1
I'm new!! Service v2 :)
I'm new!! Service v2 :)

It worked, and the traffic is "split" between v1 and v2, but not in a controlled way. This behavior is expected, since the simple-service Service (the root service) we created doesn't use the version label in its selector (recheck the service spec: kubectl get svc/simple-service -n simple-service -o yaml), so it load-balances across the pods of both deployments. That's not exactly what we're looking for: a traffic split lets us define weights for each version.
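You can confirm this by listing the endpoints behind the root service; the pods of both versions show up:

$ kubectl get endpoints simple-service -n simple-service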

Now, let’s start with Linkerd and the SMI spec.

Linkerd

You’ll have to download Linkerd2:

$ curl https://run.linkerd.io/install | sh
$ export PATH=$PATH:$HOME/.linkerd2/bin

It installs the CLI and temporarily adds the ~/.linkerd2 directory to your PATH. You'll have to adjust your shell profile (.bashrc, .zshrc, etc.) if you want the command available in new shell sessions.
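For example, to make it permanent in bash:

$ echo 'export PATH=$PATH:$HOME/.linkerd2/bin' >> ~/.bashrc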

With Linkerd2 installed on your local machine, it’s time to deploy it to the cluster:

$ linkerd install | kubectl apply -f -

The linkerd install command generates Kubernetes resources and just prints regular YAML to stdout. You can save it to a file to verify it, or apply any other operations. In this case, I'm piping it directly to kubectl.

Check that the pods are running in the linkerd namespace; it takes a while (less than 3 minutes for me). You can also use linkerd check to verify the installation.
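For example, to watch the control plane come up and validate the installation:

$ kubectl get pods -n linkerd
$ linkerd check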

Once it's running, it's time to inject Linkerd's proxies into our service's pods. I won't try to explain what Linkerd, or a service mesh, is, since that's a whole subject for another post (maybe a book?!). Take a look at Linkerd's architecture reference and this Buoyant blog post if you want to better understand how Linkerd works. The vital part of this step is that we're "injecting" Linkerd into our deployments, which means modifying our Kubernetes resources to include a new container (the proxy sidecar) in our pods.

$ kubectl get deploy -o yaml -n simple-service | linkerd inject -

This command outputs both deployments (v1 and v2) for simple-service with Linkerd's configuration added. Again, we could save it to a file and apply it, but, once again, I choose to pipe it to kubectl:

$ kubectl get deploy -o yaml -n simple-service | linkerd inject - | kubectl apply -f -

Now, simple-service is "meshed". This means simple-service automatically gains new features such as telemetry, automatic mTLS with other meshed pods, and more. However, for this post, I want to focus on the canary release. With Linkerd deployed and our pods meshed, let's create a TrafficSplit resource.
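Here's a sketch of what traffic-split.yaml contains at this point, sending all traffic to v1 (the weights are reconstructed from the description below; the exact file is in the repo):

apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: simple-service
  namespace: simple-service
spec:
  service: simple-service
  backends:
  - service: simple-service-v1
    weight: 1000m
  - service: simple-service-v2
    weight: 0m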

$ kubectl apply -f traffic-split.yaml

With this traffic split, we declare the following: all requests to the simple-service Service (our root service) should go to simple-service-v1. TrafficSplit weights use the same notation as Kubernetes resource quantities (1 == 1000m); check the traffic split spec for more details. This first traffic split is a good approach when we're creating a v2 of our application: even though the root service's selector matches the v2 pods, our service mesh provider won't send traffic to them. Let's test it on the debug pod:

root@debug-5f7d54889f-vcb2j:/$ for i in {1..10};do curl simple-service.simple-service.svc.cluster.local.; echo; done
I'm new!! Service v2 :)
I'm new!! Service v2 :)
I'm service v1
I'm service v1
I'm service v1
I'm new!! Service v2 :)
I'm service v1
I'm service v1
I'm new!! Service v2 :)
I'm new!! Service v2 :)

It didn't work. That's expected, since we didn't mesh our debug pod. In the traffic split spec we have the following:

It will be used by clients such as ingress controllers or service mesh sidecars to split the outgoing traffic to different destinations.

So, it makes sense: the outgoing traffic from our debug pod isn't being controlled by Linkerd. Let's mesh it:

$ kubectl get deploy -o yaml | linkerd inject - | kubectl apply -f -

The pod where we were running our debug bash session will terminate. Wait for the new pod to start, and let's try it again:

$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
debug-5df9f65b74-jxl2f   2/2     Running   0          65s
$ kubectl exec -it debug-5df9f65b74-jxl2f bash
root@debug-5df9f65b74-jxl2f:/$ apt update
root@debug-5df9f65b74-jxl2f:/$ apt install curl # Sorry for that!
root@debug-5df9f65b74-jxl2f:/$ for i in {1..10};do curl simple-service.simple-service.svc.cluster.local.; echo; done
I'm service v1
I'm service v1
I'm service v1
I'm service v1
I'm service v1
I'm service v1
I'm service v1
I'm service v1
I'm service v1
I'm service v1

Now it works! With our meshed debug pod, Linkerd can apply the traffic split rules.

Try changing the weights in the traffic-split.yaml file and applying it again to see how it works. For example, let's suppose that v2 is ready to receive around 75% of our traffic. Edit the file to adjust the weights and apply it:

apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: simple-service
  namespace: simple-service
spec:
  service: simple-service
  backends:
  - service: simple-service-v1
    weight: 250m
  - service: simple-service-v2
    weight: 750m

$ kubectl apply -f traffic-split.yaml

Now, test it again:

root@debug-5df9f65b74-jxl2f:/$ for i in {1..10};do curl simple-service.simple-service.svc.cluster.local.; echo; done
I'm new!! Service v2 :)
I'm new!! Service v2 :)
I'm new!! Service v2 :)
I'm new!! Service v2 :)
I'm service v1
I'm service v1
I'm new!! Service v2 :)
I'm new!! Service v2 :)
I'm new!! Service v2 :)
I'm service v1

TrafficSplit is a regular Kubernetes resource; you can view/edit/delete/create it with kubectl:

$ kubectl get trafficsplit -o yaml -n simple-service

What’s next?

SMI is pretty new, so a lot can be expected from it. Istio and Consul Connect are implementing it as well. I'm expecting great things to come.

As a next step, I'll try some automated canary releases with Flagger. The project is already using SMI traffic splits with Linkerd; see, for example, https://docs.flagger.app/usage/linkerd-progressive-delivery.
