From ground to production: Deploying Workload Identities on AKS

Introduction

By default, people don’t like to think too much about security while developing a software project or testing a cool new infrastructure tool.

There’s nothing wrong with that while you are developing that personal project you think will change your future and humanity’s, but when it’s time to put it into production and face the real world, that has to change.

In my current job, our team always tries to focus on the “what could go wrong” approach and add as much security as we can. One of the things we have been using for a good while is access control with Service Principals (the Client ID/Client Secret pair), and this could be considered good enough for the vast majority of cases, right?

In our case, it started to become a nightmare, since:

We manage everything as code (thanks, HashiCorp, for the almighty Terraform)

Our secrets are rotated every 90 days, and

Some applications rely on the client secret at runtime to authenticate to other managed services (e.g. DocumentDBs, etc.)

So, every time we need to rotate these credentials, we have to coordinate the action with the engineering teams, to ensure the applications receive the fresh secret and continue to work as intended.

How can we solve this, you may ask? Well, some approaches we considered:

Instrument the applications with a /refresh endpoint, so that when a secret gets rotated this endpoint can be called:

Or configure a “cron” which calls this endpoint from time to time, to ensure we always have the current version of the secret available. The problem with this is that it would introduce overhead in the application and, besides that, if we use a client secret to authenticate against the Key Vault, when it gets rotated those permissions vanish too.

Use Workload Identities: this brings us two advantages in comparison with client_id/client_secret pairs:

First, we don’t need to keep the current version of the client secret around anymore (yes, we still use client secrets in other parts of our infra, but those are not the topic here).
Second, we don’t need to store a sensitive value as a secret on the cluster anymore. When the application starts, it requests a token and uses it to authenticate against all the services it needs to work; if the user identity has permissions on those services, all good, access granted, application up and running.

In a nutshell, it works like this:


Source of image: https://azure.github.io/AKS-DevSecOps-Workshop/assets/images/module1/diagram.png

Or in a different view, like this:


Source of image: https://learn.microsoft.com/en-us/azure/aks/media/workload-identity-overview/aks-workload-identity-model.png#lightbox

Or like this, in a more detailed view “inside” the cluster:


Source of image: https://azure.github.io/azure-workload-identity/docs/images/flow-diagram.png

While looking for documentation, I started to (naively) think “Wow, this will be a walk in the park”, since I found a guide which looks very complete and also a lab guide, part of an AKS DevSecOps workshop, both provided by Microsoft.

I also found this post from an Azure MVP and started to think “how difficult could it be, since it’s so well documented?” and, oh boy, I couldn’t be more wrong. Buckle up and follow me!

What they don’t tell you

I had a hard time making this work while ensuring I did not break my existing cluster in the process (yes, it’s a QA cluster, but I care about it anyway). So, I decided to compile a list of things the guides do not mention as prerequisites, so you don’t run into the same issues I had while testing this thing. Here it goes:

Basic information:

I’m using Terraform to deploy everything I used in this article
The cluster is running Kubernetes v1.29.4
I’m using service principals

Other pre-requisites (these are mentioned in the guides linked above):

Your AKS cluster must have OIDC and Workload Identities enabled

If your cluster was created without those, it can be updated to have these capabilities: issue az aks update --resource-group <YOUR RG NAME> --name <YOUR AKS CLUSTER NAME> --enable-oidc-issuer --enable-workload-identity and you should be good to go after it completes.
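
Since we manage everything with Terraform, the equivalent on the cluster resource is just two attributes. A minimal sketch, assuming a reasonably recent azurerm provider and a cluster resource named myqacluster (an illustrative name, not from our actual code):

resource "azurerm_kubernetes_cluster" "myqacluster" {
  # ... the rest of your existing cluster configuration ...

  # Both are required for workload identity federation
  oidc_issuer_enabled       = true
  workload_identity_enabled = true
}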

What is not mentioned, and what I discovered the hard way during my implementation:

The user identities which will be federated to your service principal/app registration must be created in the same resource group as your AKS cluster (at least, this was the only way I managed to make this thing work, after days of research and tests).
The user identities will be the ones holding the permissions on the target resources (e.g. a Key Vault or a database) from now on, instead of your Service Principal.
You MUST install this helm chart in order to use the Workload Identities.

And here I need to make a personal note: remember I mentioned here that we are trying to do things in the most secure way possible? Well, we are running Kyverno on our clusters, and one of the policies we use prevents a service account from being created if it doesn’t have the automountServiceAccountToken parameter set to false. With this policy in place, every time you need to use a service account in your workloads and consume its credentials (token/certificates) mounted on the pods, you explicitly need to set this very same automountServiceAccountToken parameter to true inside your deployments.
The official helm chart linked above does not cover this scenario, so I pushed a PR to extend the helm chart to have it in place, let’s see if the good folks from MS review and approve it 🙂

How did we deploy it?

In this part, I will only show the TF code I added on top of our existing code, so I’m assuming you already have your infrastructure up and running (at least at a basic level).

Here goes the TF code I used to deploy this thing on our infrastructure.
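
One note before we start: the snippets below reference a few data sources for resources that already exist in our environment (the resource group, the cluster, the Key Vault and the subscription). If you don’t have them declared yet, a minimal sketch would look like this (all names are illustrative):

# Pre-existing resources referenced throughout the code below
data "azurerm_resource_group" "myqacluster" {
  name = "my-qa-cluster-rg"
}

data "azurerm_kubernetes_cluster" "myqacluster" {
  name                = "my-qa-cluster"
  resource_group_name = data.azurerm_resource_group.myqacluster.name
}

data "azurerm_key_vault" "mykeyvault" {
  name                = "my-key-vault"
  resource_group_name = data.azurerm_resource_group.myqacluster.name
}

data "azurerm_subscription" "mysubscription" {}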

Workload Identity Webhook

# Install the mutating webhook for the Azure Workload Identity
resource "helm_release" "aad_workload_identity_webhook" {
  name             = "workload-identity-webhook"
  chart            = "./helm_charts/azure-workload-identity-webhook"
  namespace        = "azure-workload-identity-system"
  create_namespace = true

  set {
    name  = "azureTenantID"
    value = var.tenant_id
  }
}

Remember, in my case I had to make changes to the helm chart in order to install it while complying with our admission control policies; if you want to use the default helm chart, your code would be something like this:

# Install the mutating webhook for the Azure Workload Identity
resource "helm_release" "aad_workload_identity_webhook" {
  name             = "workload-identity-webhook"
  repository       = "https://azure.github.io/azure-workload-identity/charts"
  chart            = "workload-identity-webhook"
  namespace        = "azure-workload-identity-system"
  create_namespace = true

  set {
    name  = "azureTenantID"
    value = var.tenant_id
  }
}

User assigned identity

resource "azurerm_user_assigned_identity" "user_identity_qa_cluster" {
  location            = data.azurerm_resource_group.myqacluster.location
  name                = "qa-cluster-aad-user-identity-for-applications"
  resource_group_name = data.azurerm_resource_group.myqacluster.name
}
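
Optionally, you can expose the identity’s client ID as an output to double-check the value that ends up annotated on the service account in the next step:

# Handy for verifying the value used in the service account annotation
output "user_identity_qa_cluster_client_id" {
  value = azurerm_user_assigned_identity.user_identity_qa_cluster.client_id
}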

Service account

resource "kubernetes_service_account" "service_account_qa_cluster" {
  metadata {
    name      = "qa-cluster-app1-sa"
    namespace = "test-namespace"
    annotations = {
      "azure.workload.identity/client-id" = azurerm_user_assigned_identity.user_identity_qa_cluster.client_id # Here we set the client ID of the User Assigned Identity created above
    }
    labels = {
      "azure.workload.identity/use" = "true"
    }
  }

  automount_service_account_token = false # Do not mount the token in the pods unless explicitly set

  depends_on = [
    azurerm_user_assigned_identity.user_identity_qa_cluster
  ]
}

Federated identity credential

resource "azurerm_federated_identity_credential" "federated_credential_app1_qa_cluster" {
  name                = "qa-cluster-app1-federated-identity"
  resource_group_name = data.azurerm_resource_group.myqacluster.name
  audience            = ["api://AzureADTokenExchange"]
  issuer              = data.azurerm_kubernetes_cluster.myqacluster.oidc_issuer_url
  parent_id           = azurerm_user_assigned_identity.user_identity_qa_cluster.id
  subject             = "system:serviceaccount:test-namespace:qa-cluster-app1-sa"

  depends_on = [
    azurerm_user_assigned_identity.user_identity_qa_cluster,
    kubernetes_service_account.service_account_qa_cluster
  ]
}
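
Note that the subject must exactly match the pattern system:serviceaccount:<namespace>:<service account name> of the service account created earlier. If you prefer not to hardcode it, the subject argument above could instead be derived from the service account resource itself. A sketch:

  # Derive the subject from the service account resource so the two never drift apart
  subject = "system:serviceaccount:${kubernetes_service_account.service_account_qa_cluster.metadata[0].namespace}:${kubernetes_service_account.service_account_qa_cluster.metadata[0].name}"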

This covers the basics; however, we still need to grant permissions on the resources the user identity will connect to. In this example, let’s give it permission to fetch secrets from an Azure Key Vault:

# Permissions for the User Assigned Identity to access the KeyVault
resource "azurerm_key_vault_access_policy" "aks_permissions_app1_qa_cluster" {
  key_vault_id = data.azurerm_key_vault.mykeyvault.id
  tenant_id    = data.azurerm_subscription.mysubscription.tenant_id
  object_id    = azurerm_user_assigned_identity.user_identity_qa_cluster.principal_id

  secret_permissions = [
    "Get",
    "List"
  ]

  depends_on = [
    azurerm_user_assigned_identity.user_identity_qa_cluster
  ]
}
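
This assumes your Key Vault uses the classic access policy permission model. If your vault uses Azure RBAC instead, a role assignment with the built-in “Key Vault Secrets User” role would be the rough equivalent. A sketch:

# Alternative for Key Vaults using the Azure RBAC permission model
resource "azurerm_role_assignment" "kv_secrets_user_app1_qa_cluster" {
  scope                = data.azurerm_key_vault.mykeyvault.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_user_assigned_identity.user_identity_qa_cluster.principal_id
}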

Testing it

To test it, you can deploy an example application, as shown in the Lab guide linked at the beginning of this article, something like:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: quick-start-test-workload-identities
  namespace: test-namespace
  labels:
    azure.workload.identity/use: "true"
spec:
  serviceAccountName: qa-cluster-app1-sa
  automountServiceAccountToken: true
  containers:
    - image: ghcr.io/azure/azure-workload-identity/msal-net
      name: oidc
      env:
        - name: KEYVAULT_URL
          value: https://my-keyvault-address.vault.azure.net/
        - name: SECRET_NAME
          value: testsecret
EOF

When the pod starts, check its logs: you should see the application successfully fetching the secret from the Key Vault, which confirms the federation is working.

Conclusion

Hopefully these aggregated steps will save you a day (or two) when deploying this thing!
