Load Testing Solium Infernum with Docker, Kubernetes and Enemy AI

Load testing a video game is a critical part of the development process if you have any intention of building online systems into your game. As we’ve seen many times before, it’s important to plan for both critical success and critical failure when it comes to online multiplayer systems. Being over- or under-prepared can have devastating effects on your players, or on your bank account.

For Solium Infernum, we had a 4-6 player, turn-based, asynchronous multiplayer game. That last part is important because it means that a game of Solium Infernum can last for hours, weeks, even months. Even with the fastest game configurations in place, a single game of Solium Infernum can last an hour or two, so it was critical for us to have a way to load test the game that didn’t consume hours and hours of human time for every change.

What Options Do We Have?

Load Testing Services

When it comes to load testing, one option that often comes to mind is beta testing or third-party load testing services. This is where either you or a third party organises hundreds or thousands of players around the world to load test your game. This is pretty much the closest you can get to real-world scenario testing. The main issue here is that people are expensive, and at a certain scale it is no longer feasible to get enough humans involved to properly load test.

Automated testing

Another option for load testing is, of course, automation. Automation has the potential to save us huge amounts of time in testing scenarios, and you are able to run tests at a much higher volume and frequency. The trade-off here is that, generally, it’s a less ‘exhaustive’ solution than manual testing.

I’d like to clarify that I’m not talking about Unit Testing here, although I do think that Unit Testing is an incredibly useful tool for development. Specifically, in this instance, I’m talking about load testing, which means creating automation tools that can attempt to saturate live backend systems, preferably in some kind of ‘non-live’ load testing environment. If I were to give this kind of testing a label, it would probably fit more broadly into the ‘integration test’ bucket.

The Challenges

Load testing systems like this is about more than just finding the maximum throughput of a given endpoint. It’s also about finding hot and cold paths in your system and working out what kind of reasonable ‘soft limits’ can be set to protect you from cloud overspend. In most cloud environments, there is a level of fine-grained control over metered endpoints and databases, so we want to imitate player behavior fairly reliably, so that we can confidently configure our backend to scale nicely with player growth.

The other challenge we faced is that video games are resource intensive to run. With more traditional software, you may be able to more easily run a few hundred or thousand instances of your application on not very much hardware, but video games have a minimum spec, and unless you happen to have access to a huge number of GPUs that you’re not already using to build LLMs and mine bitcoin, you’re going to need to find another way to scale up your game for automation.

The Goal

So, back to Solium Infernum. We need a test solution that is lightweight enough to scale meaningfully for load testing, and we also need to find a way to simulate the actions of real players sending multiplayer traffic to our online systems without using actual people. One thing that came to mind when looking into this problem is that there is another key area of game development that aims to simulate player behavior very closely: the enemy AI.

Now, I want to be super clear here that when I’m talking about ‘enemy AI’ in this article, I’m not talking about LLMs or Generative AI. Though I imagine you could achieve similar outcomes using those tools, what I’m talking about in our case is more traditional game AI.

Building The Load Testers

The Core Game Package

To understand how we achieved the next step, you will need to understand our core packaging systems. The core architecture of Solium Infernum was designed in a way that allowed us to package the core systems of the game in a library that was not dependent on any specific engine libraries. What this meant is that at build time, we were able to create a separate NuGet package that contained the core game libraries for that version of the game and ship that package to the upstream server project to be consumed server-side for turn processing.

The most important part of the above is that NuGet package. Using this package, we were able to solve both of our biggest challenges. We can use this package to build a lightweight load testing application that has access to the core game systems like turn processing, validation, and most importantly for us, the enemy AI.

The Dummy Client

The next stage of this process is to take that core game package and use it to build a more lightweight or ‘dummy’ version of our game client. This ‘dummy client’ will have no graphics, no UI, and no user interaction at all. The purpose of this client is simply to use the enemy AI systems to simulate player turns and then submit them to our backend infrastructure, effectively masquerading as real players for the purpose of testing our backend systems.

The specifics of how the dummy client is implemented are going to depend on the project and also personal preference for a load testing scenario. For Solium Infernum, the dummy client was set up as a .NET Core console application which simulated an entire multiplayer game start to finish, including several AI players running from a single dummy client.

It would also be possible to set up each client as an individual player, but that would require a bit more cross-client orchestration when it comes to hosting and joining lobbies, and we wanted to keep things as simple as possible. So, for us, each dummy client represents a single ‘match’ of between 4 and 6 players.

The core loop of the dummy client was as follows:
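As an illustration only, a loop of this general shape can be sketched as below. This is not the actual Solium Infernum client, which was a .NET console application built on the core game package. The sketch uses Python for brevity, and every name in it (`FakeBackend`, `choose_orders`, `run_match`) is hypothetical. The backend is an in-memory stand-in so the example runs on its own, and the ‘AI’ simply picks random orders.

```python
import random


class FakeBackend:
    """Hypothetical in-memory stand-in for the real online services."""

    def __init__(self, max_turns):
        self.max_turns = max_turns
        self.turn = 0
        self.players = []
        self.pending = {}

    def create_game(self, player_ids):
        self.players = list(player_ids)
        return "game-1"

    def submit_turn(self, game_id, player_id, orders):
        self.pending[player_id] = orders
        # Once every player has submitted, the "server" processes the turn.
        if len(self.pending) == len(self.players):
            self.pending.clear()
            self.turn += 1

    def is_finished(self, game_id):
        return self.turn >= self.max_turns


def choose_orders(player_id, turn, rng):
    """Placeholder for the enemy AI choosing a player's orders for a turn."""
    return [rng.choice(["move", "bid", "pass"]) for _ in range(2)]


def run_match(backend, player_count=4, seed=0):
    """Play one whole match with AI players; return the number of submissions."""
    rng = random.Random(seed)
    players = [f"ai-{i}" for i in range(player_count)]
    game_id = backend.create_game(players)
    submitted = 0
    while not backend.is_finished(game_id):
        current_turn = backend.turn
        for player_id in players:
            orders = choose_orders(player_id, current_turn, rng)
            backend.submit_turn(game_id, player_id, orders)
            submitted += 1
    return submitted
```

In a real client, `FakeBackend` would be replaced by calls to the live lobby and turn-submission endpoints, and `choose_orders` by the game’s own AI.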

The end result here is a console application that can be run to simulate an entire game of Solium Infernum using real online infrastructure. The next part of this puzzle is being able to scale this application up to run as many concurrent games as we have the hardware to support.

Packaging the Dummy Client

One of the simplest and most effective ways to package and distribute an application today is with a Docker image.
From docker.com:

A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings

Bundling our dummy client up into a Docker image allows us to deploy it onto any kind of hardware and be confident that the client still has everything it needs to run, and that running multiple clients together won’t create concurrency issues between them.

All you will need to containerise your Dummy Client is a copy of Docker Desktop and a Dockerfile in the root of your project. For us, something like this was enough to bundle up our Dummy Client console app for use with Docker.

FROM mcr.microsoft.com/dotnet/runtime:6.0 AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src
COPY ["nuget.config", "DummyClient/"]
COPY ["DummyClient.csproj", "DummyClient/"]
RUN dotnet restore "DummyClient/DummyClient.csproj"
COPY . .
WORKDIR "/src/DummyClient"
RUN dotnet build "DummyClient.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "DummyClient.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish/ ./
ENTRYPOINT ["dotnet", "DummyClient.dll"]

Once you have your Dockerfile defined, you can build your image with:

docker build -t <image-name> .

Then the image can be run locally with:

docker run -it --rm <image-name>

Now we have a packaged dummy client image that we can deploy just about anywhere. The next step is deploying lots of them.

Scale it up!

Now there are a few options when it comes to scaling up a number of container deployments. How frequently or easily you want to run your load tests will determine what solution might work for you.

Docker compose

Probably the most straightforward way to quickly scale up a lot of containers for an image is with docker-compose. With a relatively simple docker-compose.yml file you can define the environment variables and resource limitations you want for the dummy clients.

# Version of docker-compose
version: "3"

services:
  loadtester.gamerunner:
    image: dummy-client:dev
    build:
      context: .
      dockerfile: DummyClient/Dockerfile
    deploy:
      replicas: <number-of-replicas>
      resources:
        limits:
          cpus: "3"
          memory: 1000M
        reservations:
          cpus: "1.50"
          memory: 800M
    environment:
      OTHER_CONFIG: "…"
      SOME_CONNECTION_STRING: "URL=…;KEY=…."

You can configure the number of replicas you want to run by setting the deploy:replicas value. Then you can run your load test with:

docker-compose up -d

The main thing to note here is that you are limited to the resources on the machine you are running on, so you can only run as many clients as your machine can handle. What happens if you want more? How do we spread our dummy clients across multiple machines?
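To put a rough number on that single-machine limit, the reservations from the compose file above (1.5 CPUs and 800M of memory per client) can be divided into the host’s resources. The sketch below is back-of-the-envelope arithmetic in Python, not a description of how Docker schedules containers. The 16-core, 32 GB host is an assumed example, and the calculation ignores whatever the operating system and Docker themselves need.

```python
import math


def max_replicas(host_cpus, host_memory_mb,
                 cpu_reservation=1.5, memory_reservation_mb=800):
    """Estimate how many clients fit if each one gets its full reservation."""
    by_cpu = math.floor(host_cpus / cpu_reservation)
    by_memory = math.floor(host_memory_mb / memory_reservation_mb)
    # Whichever resource runs out first sets the ceiling.
    return min(by_cpu, by_memory)


# Assumed example host: 16 cores and 32 GB (32768 MB) of memory.
print(max_replicas(16, 32768))  # CPU-bound: floor(16 / 1.5) = 10
```

On this assumed host, CPU is the constraint long before memory is, which is typical when each client runs several AI players.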

Kubernetes

Our next option for scaling up our load testing instances is Kubernetes. From kubernetes.io:

Kubernetes is a portable, extensible, open source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation.

Essentially, Kubernetes can be deployed across a series of machines and used to orchestrate application deployments and lifecycles. To learn more about setting up a Kubernetes cluster, you can find more information here: Kubernetes Getting Started

Deployments

Very briefly, a deployment is a resource that tells Kubernetes how a containerised application should be run, and how many instances should be maintained at any given time. The example below shows a deployment file for our dummy client, where we define what Docker image we will be using, any environment variables that are needed and importantly how many replicas to deploy across the cluster.

A possible dummy-client-deployment.yaml file might be:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy-client
  labels:
    app: dummy-client
spec:
  replicas: <number-of-replicas>
  selector:
    matchLabels:
      app: dummy-client
  template:
    metadata:
      labels:
        app: dummy-client
    spec:
      containers:
        - name: dummy-client
          image: dummy-client:dev
          imagePullPolicy: Always
          env:
            - name: OTHER_CONFIG
              value: "…"
            - name: SOME_CONNECTION_STRING
              value: "URL=…;KEY=…."

With this deployment file, Kubernetes will handle the provisioning and resource management for us.
You can apply it to your cluster manually with:

kubectl apply -f ./dummy-client-deployment.yaml

Once you have your deployment in place, you can manually change the number of instances with kubectl scale and use that to spin tests down and up as needed. For example:

# Spin down the load testers
kubectl scale deployment/dummy-client --replicas=0

# Spin up the load testers
kubectl scale deployment/dummy-client --replicas=<num-replicas>

With this method, you can have your load test application deployed across any number of machines. If your Kubernetes cluster is cloud-based, like GKE or AKS, your only limitation on instances is how much you can afford to spend on nodes.

Gitlab Kubernetes Executor

I’m adding this in for completeness, since this option will depend very heavily on your CI/CD setup, but in our case, we had available to us an on-prem Kubernetes cluster that was already connected to our GitLab CI system. I can talk about the GitLab and Kubernetes system in more detail in another post, but because of this setup we were able to trigger our load tests from our CI system.

We hooked up the dummy clients to the CI system in a way that allowed us to run a pipeline from GitLab that would trigger X jobs for the run, and each job was an instance of the load tester that ran in our cluster.

If you want to learn more about the GitLab Kubernetes Executor you can do so here: GitLab Kubernetes Executor

Reporting

The last, but most important, step in this whole process is reporting. If I can encourage you to do anything from this article when performing any kind of load testing, it would be to measure twice, measure twice again and then twice more for good luck. In order to properly assess your load test results you need to make sure that you have as much good information available to you as you can manage.

There are a lot of different kinds of monitoring and metrics tools available, and in our case, we built our own bespoke tools that catered specifically to our needs. We started with a simple console monitoring app, and soon moved on to a full web-based front end that allowed us to interact with the load testing systems directly from the UI.

With the dummy client we made sure that every step of the way we were sending out metrics on what we were doing. We took records of load test timings, endpoint saturation, as well as stats on individual AI games and exception reports. Since we were running real games with AI, we ended up with a testing system that went beyond just load testing and into bug finding and stability testing.
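As a rough idea of what recording timings and exceptions around each backend call can look like, here is a minimal sketch in Python. It is not the bespoke tooling described above. The `MetricsRecorder` class and the `submit_turn` metric name are hypothetical, and a real setup would ship these numbers to a monitoring service rather than keep them in memory.

```python
import statistics
import time
from collections import defaultdict


class MetricsRecorder:
    """Hypothetical in-memory recorder for call timings and error counts."""

    def __init__(self):
        self.timings_ms = defaultdict(list)
        self.errors = defaultdict(int)

    def timed(self, name, operation):
        """Run operation(), recording its duration and whether it raised."""
        start = time.perf_counter()
        try:
            return operation()
        except Exception:
            self.errors[name] += 1
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.timings_ms[name].append(elapsed_ms)

    def summary(self):
        """Summarise call count, mean and max latency, and errors per metric."""
        return {
            name: {
                "count": len(samples),
                "mean_ms": statistics.fmean(samples),
                "max_ms": max(samples),
                "errors": self.errors[name],
            }
            for name, samples in self.timings_ms.items()
        }
```

Wrapping a call looks like `recorder.timed("submit_turn", lambda: client.submit(orders))`, where `client.submit` stands in for whatever call sends the turn to the backend. Failed calls are still timed, so slow failures show up in the latency figures as well as in the error count.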

Finishing thoughts

Using our game AI to test our online infrastructure ended up giving us so much more than just a backend load tester. We definitely succeeded in our initial goal of testing our load limits in the backend, but we also had the ability to simulate hundreds of games every day, and by using the AI to generate fake user actions we were able to get huge amounts of coverage on internal game systems. This ultimately led us to find bugs that we weren’t even aware of at the time.

Load testing is absolutely critical in understanding the capabilities of the systems you are building, and sometimes simply saturating the network with bad traffic won’t tell you enough about the real bottlenecks in your system.

Being able to package and reuse systems like AI and core gameplay loops allows you to be creative with your testing and get close to real life simulations on your backend. You get a double win out of doing this because you can test your server load, whilst also finding genuine game bugs because your bots are running real games against each other.

Tools like Docker and Kubernetes are still finding their way into the game development space, but there is some powerful tech here that can be leveraged in creative ways to see some incredible results.