Kubernetes on Windows – Are we crazy?
Are we crazy? Possibly. At G-Research we’re experimenting with migrating our applications into containers, with Kubernetes as our choice of orchestrator. A big goal of ours is to support Windows applications within our clusters, as we have a large number of .NET-based apps which would benefit from being containerised without having to be migrated to .NET Core or rewritten in another language entirely. We were aware that, although embryonic, Windows support in Kubernetes was on the roadmap, and that pushed us towards Kubernetes as our choice of platform.
We have learned a fair amount about Kubernetes recently, having set up clusters from scratch on our premises in an air-gapped environment. Having no internet connection in that environment forced us to get good at configuring Kubernetes, especially on the networking side, with no cloud-provider magic to help us along.
The existing documentation on the Kubernetes website for setup on Windows appeared to be out of date, and after speaking to folks on the sig-windows Kubernetes Slack channel we soon cast it aside and started working from a README on an outstanding PR. We broke up one of our existing QA clusters to use a few servers as the base of our mixed-mode cluster. At a very high level, we knew that we needed to be running the latest Windows Server 1709 build, to get kubelet and kube-proxy running there, and to get the network correctly configured to integrate with the rest of the cluster. We didn’t know much more than that, or really what ‘good’ configuration looked like for a Kubernetes Windows node. Although we already had really neat Ansible scripts to build and configure our cluster on Linux, we knew it would be a waste of time to try to replicate that setup on Windows initially, when we were still at the stage of just needing to hack something together to discover how it was supposed to work – we were definitely in the ‘spike’ phase before worrying about any kind of stabilisation.
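For context, the moving parts on a Windows node boil down to running kubelet.exe and kube-proxy.exe against the cluster and wiring them into a CNI network. The snippet below is purely an illustrative sketch rather than our actual configuration: the node name, paths, pause image and addresses are placeholders, and the available flags have shifted between Kubernetes versions.

```powershell
# Illustrative sketch only - node name, paths, image and addresses are placeholders.
# kubelet registers the Windows machine as a node and runs pods on it.
.\kubelet.exe --hostname-override=win-node-01 `
    --kubeconfig=C:\k\config `
    --pod-infra-container-image=kubeletwin/pause `
    --network-plugin=cni `
    --cni-bin-dir=C:\k\cni `
    --cni-conf-dir=C:\k\cni\config `
    --cluster-dns=10.96.0.10 `
    --cluster-domain=cluster.local

# kube-proxy watches services and endpoints and programs the node so that
# service virtual IPs actually route to pods.
.\kube-proxy.exe --hostname-override=win-node-01 `
    --kubeconfig=C:\k\config
```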
At the time of starting out, we were working with Kubernetes v1.9.0-alpha-0. Being an alpha build, we were ready for some instability. Some of the folks on Slack had also pointed out that there was an outstanding PR required to allow the networking to behave correctly. With the alpha binaries we quickly ran into an issue where the kubelet wouldn’t even start on our 1709 servers. It soon transpired that a recent PR had introduced a hidden dependency on some OpenGL libraries, and as the 1709 server build has no GUI at all, this caused a fatal exception on startup. Luckily for us someone had already created a new PR to fix this issue, although it had not yet been merged. It was starting to feel a bit too bleeding-edge! At that point, the best thing to do was to manually compile an updated kubelet that included the outstanding PRs and continue. It took us a few further days of experimentation and head-scratching through the guide to work out how to get our Windows nodes connected, but the first time a ‘kubectl get nodes’ command returned a list showing our Windows nodes with status ‘Ready’ was a pretty special moment. At this point we could also see lots of (good) errors in the Windows kubelet logs as it tried to fire up pods for the cluster’s existing daemonsets – these are of course Linux containers – as we hadn’t yet used a taint or node selector to keep them off these nodes.
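Keeping the existing Linux workloads off the Windows nodes is standard Kubernetes scheduling: taint the Windows nodes and only give Windows pods the matching toleration (or steer pods with a node selector). A hedged example of the sort of taint we mean, with a placeholder node name and key:

```shell
# Placeholder node name and key - any pod without a matching toleration
# will no longer be scheduled onto this node.
kubectl taint nodes win-node-01 os=windows:NoSchedule
```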
Once tainted and cleaned up, we had a couple of happy Windows nodes ready for deployment. To test our cluster, we created a very simple hello-world style application, written in C#. This was cross-compiled targeting both .NET Core and .NET Framework, giving us Docker images for a ‘hello’ app with versions that could run on both Linux and Windows (the latter running on .NET Framework just to prove the point). The app simply presented an endpoint under /hello which would output some details of where it was running, including the operating system.
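The app itself is nothing clever. As a rough sketch of the sort of thing we mean (the controller name and response fields are illustrative, not our exact code), an ASP.NET Core controller that can be multi-targeted at .NET Core and the full .NET Framework might look like this:

```csharp
// Illustrative sketch of the /hello endpoint - names and fields are placeholders.
using System;
using System.Runtime.InteropServices;
using Microsoft.AspNetCore.Mvc;

[Route("hello")]
public class HelloController : Controller
{
    [HttpGet]
    public IActionResult Get()
    {
        // Report enough detail to tell whether the Linux or Windows variety answered.
        return Ok(new
        {
            host = Environment.MachineName,          // inside a pod this is the pod name
            os = RuntimeInformation.OSDescription,
            framework = RuntimeInformation.FrameworkDescription
        });
    }
}
```

Because a container’s hostname is the pod name, the response makes it obvious which pod, node and operating system served each request.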
We created a couple of app manifests including a deployment, service and ingress for each of the Linux and Windows hello varieties. Deploying these helped us to further debug our network setup, which turned out to be not quite right. The simplest way to debug this was to work from the inside out: calling our /hello endpoint on the pods directly on the machines they were running on, then working outwards to call them via the services, and finally from outside the cluster through an ingress. Eventually we were left with our app running across the whole cluster.
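For the Windows variety, the detail that matters in the manifests is steering the pods at the Windows nodes and tolerating whatever taint is on them. Below is a cut-down sketch of that kind of deployment; the image name, labels and toleration key are placeholders, and the OS node label has been renamed across Kubernetes versions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-windows
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-windows
  template:
    metadata:
      labels:
        app: hello-windows
    spec:
      nodeSelector:
        beta.kubernetes.io/os: windows      # kubernetes.io/os on newer clusters
      tolerations:
      - key: "os"                           # matches the illustrative taint above
        operator: "Equal"
        value: "windows"
        effect: "NoSchedule"
      containers:
      - name: hello
        image: registry.example.com/hello-windows:latest   # placeholder image
        ports:
        - containerPort: 80
```

Working from the inside out then simply means hitting /hello on a pod IP from its node, then on the service’s cluster IP, and finally through the ingress from outside the cluster.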
The next steps for us now are to fully automate our Windows node provisioning, once the Windows support is in the mainline and a bit more stable. At that point we will be able to add Windows nodes to our other test and QA clusters, and start to move some applications onto them. There is more work to be done to integrate with external domain resources such as Active Directory, but our experience so far has been very encouraging. It may not be so crazy to run Windows workloads in Kubernetes after all, with Microsoft throwing their weight behind containerisation and already being key contributors to the Kubernetes project.
Jamie Poole, Software Engineering