Kubernetes Today

The world of technology, and the distributed systems layer in particular, has seen quite a bit of change over the past 25 years (which is not a long time considering software engineering itself is only about 65 years old).

Kubernetes is the latest evolution of the orchestration layer for distributed systems.

However, much like any other technology, it’s not perfect. In this blog post, you’ll get a brief overview of Kubernetes, what Kubernetes got right, and how to streamline the parts that still aren’t efficient, particularly around scaling.

Kubernetes Primer

In computing, the timeline has gone something like this:

Mainframes → Servers → Virtualization → Cloud → Containerization/Kubernetes

<aside> 💡

You could put Serverless in the “cloud phase”.

</aside>

When containerization, and then Kubernetes, came out, the true era of “cloud-native” began. Containers were the smallest deployment unit engineers had ever seen, and applications still performed as expected (or better) without huge servers running the stack.

The overall goal with Kubernetes is the ability to orchestrate containers for you. That means everything, including:

  • Self-healing (if an app goes down, it automatically comes back up via a Reconciliation Loop, run by Controllers)
  • Scaling clusters and Pods (where containers live)
  • Having the ability to export metrics and look at logs for your workloads

And perhaps most importantly, the ability to manage your infrastructure programmatically. Kubernetes is made up of several APIs that run as a package on servers. There are still servers underneath the hood, but you interact with them through a set of APIs.

With the ability to split application stacks (frontend, middleware, backend, databases, etc.) into containers, each piece can be managed separately. This is a big deal for scale and overall Day Two Operations: stacks no longer have to be deployed in a monolithic fashion, with everything (frontend, middleware, backend, and sometimes even the database) sitting on “one server”.

In short, Kubernetes gave us the ability to orchestrate the smallest unit of computing in a programmatic way.

As with all technology, however, it’s not perfect. In the next section, we’ll talk about what Kubernetes got right, followed by a few potential opportunities for improvement that you may see.

What Kubernetes Got Right

When Kubernetes came out, the whole idea was to give engineers the ability to:

  • Decouple applications and make workloads less monolithic.
  • Have an API to interact with to programmatically implement infrastructure.
  • Deploy workloads in a declarative fashion.

And Kubernetes got all of these right.

As an example, let’s discuss declarative workloads. When you think about declarative vs imperative, declarative means “tell me what to do, not how to do it” and imperative means “tell me what to do and how to do it”. For example, if you write a Bash script, that’s imperative because you spell out every step and it runs line by line. A Kubernetes Manifest, however, is declarative. You specify things like port numbers, labels, container names, volumes, and any other part of the environment you want, but that’s it - you simply “declare it”. You don’t tell Kubernetes how to implement it or run it; it knows how to do that. That’s the declarative model.
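
To make that concrete, here’s a minimal sketch of a declarative Manifest. The names, image, port, and volume are placeholders, not from any specific environment:

```yaml
# A minimal Deployment Manifest: you declare the desired state
# (labels, container name, image, port, volume) and let Kubernetes
# figure out how to make it happen.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend              # placeholder name
  labels:
    app: frontend
spec:
  replicas: 3                 # desired count; the Deployment Controller keeps it at 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: nginx:1.27   # placeholder image
          ports:
            - containerPort: 80
          volumeMounts:
            - name: config
              mountPath: /etc/app
      volumes:
        - name: config
          emptyDir: {}
```

Nowhere in that file do you say how to schedule the Pods, pull the image, or restart a failed container. The Controllers handle the “how”, which is also where the self-healing mentioned earlier comes from.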

Out of everything you’ve learned throughout this blog post, there are a substantial number of amazing things that Kubernetes brought to the table. One thing, however, is still a concern.

Scaling.

Scaling Concerns

When you think about scaling in the realm of containerization and Kubernetes, it comes down to two pieces:

  • Scaling the cluster.
  • Scaling Pods.

When you’re scaling the cluster from a resources perspective, you’re primarily taking the Worker Nodes into consideration. Worker Nodes are where the application stacks running inside of Pods live, so CPU, memory, and GPU are the key factors. If there isn’t enough CPU or memory for a new Pod, that part of the application stack won’t run (the Pod sits in a Pending state), which could ultimately result in an outage.

<aside> 💡

Control Planes are, of course, a scaling consideration as well, but not when you’re running a managed Kubernetes service (AKS, EKS, GKE, etc.). If you’re running a self-managed Kubernetes cluster, scale also comes into consideration for etcd (the Kubernetes datastore).

</aside>

For scaling a Kubernetes cluster, the “out-of-the-box” method is the Cluster Autoscaler, which is part of the Kubernetes project. It scales nodes up and down based on load. The major problem with Cluster Autoscaler has always been the speed at which clusters scale down (not up). For example, if you have a major increase in requests to Pods and therefore more Worker Nodes are needed, what happens when the requests go back down? Well, of course, the Worker Nodes should scale down, and they do. The problem is that Cluster Autoscaler is a bit slow to scale nodes down.
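
For reference, that scale-down behavior is tunable through flags on the cluster-autoscaler Deployment itself. Here’s a rough sketch of the relevant container args; the flag names come from the Cluster Autoscaler project, but the image version and values shown are illustrative, not recommendations:

```yaml
# Excerpt of a cluster-autoscaler Deployment spec. These flags control
# how aggressively unneeded Worker Nodes are removed.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0  # placeholder version
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws                    # depends on your environment
      - --scale-down-enabled=true
      - --scale-down-delay-after-add=10m        # wait this long after a scale-up before considering scale-down
      - --scale-down-unneeded-time=10m          # a node must be unneeded this long before removal
      - --scale-down-utilization-threshold=0.5  # below this utilization, a node counts as unneeded
```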

Because of that, the team at AWS created Karpenter, which allows fine-grained control over just about everything in the scaling process, from the maximum node count to the instance types used for the Node Pools it provisions (the groups of Worker Nodes Karpenter manages in EKS). Karpenter has since been extended to work on Azure Kubernetes Service (AKS), which is great, but those are the only two environments it works on. If you’re on GKE or another managed Kubernetes service, you can’t use it, which means it’s only impactful in some environments.
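
To make that concrete, here’s a rough sketch of a Karpenter NodePool. The schema follows the karpenter.sh/v1 API (it differs slightly between Karpenter versions), and the instance types, CPU cap, and EC2NodeClass name are placeholders:

```yaml
# A Karpenter NodePool constrains what Karpenter is allowed to provision:
# which capacity type, which instance types, and a hard cap on total CPU.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge"]  # placeholder instance types
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                        # placeholder EC2NodeClass
  limits:
    cpu: "100"                               # cap on total CPU across nodes this NodePool provisions
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m                     # how quickly underutilized nodes are consolidated
```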

The next piece is the Pods themselves, which are a crucial part to think about when scaling. The reason is that Pods are what run applications, which means they need CPU, memory, and potentially a GPU. This scaling is governed by limits, quotas, and requests.

<aside> 💡

Quotas are set at the Namespace level and declare that “this Namespace is allowed X amount of resources”. Requests are similar, but set per container within a Pod: they tell the scheduler the minimum amount of CPU and memory the container is guaranteed. Limits are a hard cap the container cannot go above. In short, with requests you have a minimum resource guarantee, and with limits you have a maximum resource cap (see the examples just below this callout).

</aside>
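
As a rough illustration (the Namespace name and the numbers are placeholders), a quota at the Namespace level looks like this:

```yaml
# A ResourceQuota caps what an entire Namespace can request and use.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a       # placeholder Namespace
spec:
  hard:
    requests.cpu: "10"    # total CPU all Pods in the Namespace may request
    requests.memory: 20Gi
    limits.cpu: "20"      # total CPU limits across the Namespace
    limits.memory: 40Gi
    pods: "50"            # max number of Pods in the Namespace
```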

The major problem with setting resource optimization at the Pod level is that it’s all manual. You have to define the resource constraints within the Kubernetes Manifest that deploys the Pod(s). Imagine you have 20+ Manifests and your application needs to scale: you have to go in, manually figure out the right values, and then change all of those resource constraints. It’s a hassle.
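
To picture what has to change in every one of those Manifests, this is the kind of block you end up editing by hand (the container, image, and numbers are placeholders):

```yaml
# Per-container requests and limits, set inside each Deployment Manifest.
# Every time real usage changes, these numbers have to be revisited
# manually, Manifest by Manifest.
containers:
  - name: frontend        # placeholder container
    image: nginx:1.27     # placeholder image
    resources:
      requests:
        cpu: 250m         # minimum guaranteed CPU (used for scheduling decisions)
        memory: 256Mi     # minimum guaranteed memory
      limits:
        cpu: 500m         # hard CPU cap (the container is throttled above this)
        memory: 512Mi     # hard memory cap (the container is OOM-killed above this)
```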

Let’s now talk about a more effective approach.

Implementing A More Effective Scaling Protocol

Up until this point within the realm of cloud-native distributed systems, scaling has been more or less “manual”. Yes, there are ways to scale and right-size from an automation perspective, but an engineer still has to think about it.

The typical process is:

  1. Get an idea of the load your application stack needs.
  2. Deploy the right amount of Worker Nodes with enough CPU, memory, and GPU to effectively run said application stack with performance in mind.
  4. Test it (and the only way to truly test this is by getting it into the hands of users to see real load).
  4. Adjust the size of the Worker Nodes and the resources they need as time goes on.

When thinking about the overall steps, number 4 is the most manual. For example, let’s say the application stack you’re managing is for an eCommerce site. That means you have to plan for specific days that will be busier than others (e.g., Cyber Monday, the holiday season, etc.), and even if you plan properly, those numbers won’t be the same year over year. The goal is for the company to keep growing, which means the resources needed will increase, but you won’t actually know by how much until you’re in the thick of it.

With the power of AI and the ability to forecast based on Models that are trained on the specific usage within your environment, the majority of these problems vanish.