How I Contributed to the CNCF Kubernetes Conformance Program

My First Major CNCF Open Source Contribution

Updated
8 min read

Submitting Kubernetes v1.34 conformance results for Kubespray.

Every Kubernetes distribution needs to prove it behaves like Kubernetes. That proof comes through the CNCF Kubernetes Conformance Program, a standardized set of 424 end-to-end tests that verify a distribution implements the Kubernetes API correctly. If it passes, it earns the "Certified Kubernetes" badge.

This is the story of how I picked up a GitHub issue, ran into DNS nightmares, broke my cluster twice, learned a ton about Linux internals, and ultimately got my PR merged into the official CNCF conformance repository.

What is Kubespray?

Kubespray is an Ansible-based tool for deploying production-ready Kubernetes clusters. It's part of the official kubernetes-sigs organization and supports deployment across AWS, GCE, Azure, OpenStack, vSphere, Equinix Metal, Oracle Cloud Infrastructure (experimental), and bare metal.

Think of it as "Infrastructure as Code for Kubernetes": you define your cluster configuration, and Kubespray handles the 50+ steps needed to get a production-grade Kubernetes cluster running. For a deeper dive, check the official documentation.

What is Conformance Testing?

The CNCF runs the Kubernetes Software Conformance Certification program. Every time a new Kubernetes version is released, distributions like Kubespray need to submit proof that they can deploy a conformant cluster.

The submission requires four files:

| File | Purpose |
| --- | --- |
| PRODUCT.yaml | Product metadata (vendor, name, version, URLs) |
| README.md | Step-by-step instructions to reproduce the test |
| e2e.log | Raw test output log |
| junit_01.xml | Machine-readable test results |
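For illustration, here is a PRODUCT.yaml sketch. The field names follow the pattern used by existing submissions in the k8s-conformance repo, but the values below are placeholders; check the repo's instructions for the authoritative template:

```yaml
vendor: Kubernetes SIGs          # organization behind the product
name: Kubespray                  # product name as it appears on the certified list
version: v2.30.0                 # product version under test
website_url: https://github.com/kubernetes-sigs/kubespray
documentation_url: https://kubespray.io
type: installer                  # installer, distribution, or hosted platform
description: Deploy a production ready Kubernetes cluster with Ansible
```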

The testing tool is Sonobuoy, which runs the 424 end-to-end tests inside your cluster in certified-conformance mode.

The Issue: K8s Conformance Test for v1.34

I found issue #12932 on the Kubespray repo: they needed someone to run the conformance tests for Kubernetes v1.34 using Kubespray v2.30.0 and submit the results to the CNCF.

The task seemed straightforward:

  1. Spin up a 3-node Kubernetes cluster using Kubespray

  2. Run Sonobuoy conformance tests

  3. Collect results and submit a PR

How hard could it be? Spoiler: very.

My Setup

| Component | Details |
| --- | --- |
| Host machine | MacBook Pro M1 Pro (Apple Silicon / ARM64) |
| Hypervisor | VMware Fusion |
| VM provisioning | Vagrant |
| Guest OS | Ubuntu 24.04 LTS ARM64 |
| Kubernetes | v1.34.3 (via Kubespray v2.30.0) |
| CNI | Calico (Flannel doesn't support ARM64) |
| Cluster | 3 nodes (2 control-plane, 1 worker) |

Why Vagrant?

Vagrant is a tool for building and managing virtual machine environments. It reads a Vagrantfile and provisions VMs automatically. Kubespray includes a Vagrantfile that integrates with its Ansible playbooks, so vagrant up creates the VMs, configures networking, and runs Kubespray to deploy Kubernetes, all in one command.

For conformance testing, this is ideal because it ensures reproducibility: anyone can follow the same steps to recreate the exact environment.

Why Calico Instead of Flannel?

This is an ARM64-specific gotcha. According to the Kubespray architecture docs, Flannel does not support ARM64. Calico is the recommended CNI for ARM64 clusters.

The Journey: What Actually Happened

Phase 1: Getting the Cluster Up

Setting up the Vagrant configuration was straightforward:

$instance_name_prefix = "kube"
$vm_cpus = 2
$vm_memory = 4096
$num_instances = 3
$os = "ubuntu2404"
$subnet = "10.2.20"
$inventory = "inventory/k8s-conformance"
$network_plugin = "calico"

Running vagrant up kicked off the Ansible playbook, and after about 20 minutes, I had a 3-node Kubernetes cluster. So far, so good.

$ kubectl get nodes
NAME     STATUS   ROLES           AGE   VERSION
kube-1   Ready    control-plane   10m   v1.34.3
kube-2   Ready    control-plane   10m   v1.34.3
kube-3   Ready    <none>          10m   v1.34.3

Phase 2: The First Sonobuoy Run — DNS Failures

I downloaded Sonobuoy v0.57.2 (ARM64 binary) and started the conformance tests:

./sonobuoy run --mode=certified-conformance \
    --sonobuoy-image=sonobuoy/sonobuoy:v0.57 \
    --systemd-logs-image=sonobuoy/systemd-logs-arm64:v0.4 \
    --wait

~2 hours later, the results came back:

Passed: 0, Failed: 0, Remaining: 424

Wait, 0 passed AND 0 failed? That means the e2e plugin crashed before running any tests. Digging into the logs, I found the culprit:

[sig-network] DNS should provide DNS for the cluster
FAILED [602.517 seconds]

Unable to read from pod dns-test-...: the server could not find the requested resource

The test pods were being created but immediately disappearing. The DNS test couldn't find its own pods.

Phase 3: The Rabbit Hole — systemd-resolved

Here's where it got interesting. Ubuntu 24.04 uses systemd-resolved for DNS resolution. This service creates a stub resolver at 127.0.0.53 and manages /etc/resolv.conf as a symlink to /run/systemd/resolve/stub-resolv.conf.

The problem: CoreDNS inside the cluster was trying to use 127.0.0.53 as an upstream DNS server, but that address is only valid inside the host, not inside containers. Every container DNS lookup was timing out.
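The trap is easy to confirm mechanically. A sketch of the check I could have run first; the file path is a function argument here so the demo runs against a sample file rather than a real node's config:

```shell
# Flag a resolv.conf whose upstream is the systemd-resolved stub.
# 127.0.0.53 is reachable on the host only, so containers that inherit
# it as an upstream will time out on every lookup.
check_stub() {
  if grep -q '^nameserver 127\.0\.0\.53' "$1"; then
    echo "stub resolver: containers cannot use this upstream"
  else
    echo "ok: real upstream nameservers"
  fi
}

# Demo against a sample file (stand-in for a node's /etc/resolv.conf)
printf 'nameserver 127.0.0.53\n' > /tmp/demo-resolv.conf
check_stub /tmp/demo-resolv.conf   # prints the stub warning
```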

The fix:

# Disable systemd-resolved
sudo systemctl stop systemd-resolved
sudo systemctl disable systemd-resolved

# Replace with direct DNS servers
sudo rm -f /etc/resolv.conf
echo -e 'nameserver 8.8.8.8\nnameserver 8.8.4.4' | sudo tee /etc/resolv.conf

I applied this on all three nodes. A quick DNS test now looked good:

$ kubectl run dnstest --image=busybox:1.28 --rm -it --restart=Never -- nslookup google.com
Name:      google.com
Address 1: 142.250.192.206
✅

Phase 4: The Second Sonobuoy Run — Still Failing

Ran Sonobuoy again. Another 2+ hours. Same result:

Passed: 0, Failed: 0, Remaining: 424

The cluster was showing signs of instability: controller restarts, unhealthy probes, context deadline exceeded errors. Something deeper was wrong.

Phase 5: The Reboot That Broke Everything

I rebooted all three nodes to clear the instability. After rebooting, the cluster was completely dead:

The connection to the server 127.0.0.1:6443 was refused

The API server wasn't starting. No containers were running. Checking the kubelet logs revealed the smoking gun:

E0203 18:15:08.841453 dns.go:285] "Could not open resolv conf file."
err="open /run/systemd/resolve/resolv.conf: no such file or directory"

E0203 18:15:08.841474 kuberuntime_sandbox.go:44]
"Failed to generate sandbox config for pod"
err="open /run/systemd/resolve/resolv.conf: no such file or directory"

My DNS fix had broken the kubelet!

When I disabled systemd-resolved and deleted its files, I also removed /run/systemd/resolve/resolv.conf. But the kubelet was configured to read DNS settings from that exact file path. Without it, kubelet couldn't create any pod sandboxes, which meant no static pods, no API server, and a dead cluster.

The solution was elegant: create the directory and symlink it back to our static resolv.conf:

# Create the directory kubelet expects
sudo mkdir -p /run/systemd/resolve

# Symlink to our static resolv.conf with real DNS servers
sudo ln -sf /etc/resolv.conf /run/systemd/resolve/resolv.conf

# Restart kubelet
sudo systemctl restart kubelet

Applied this to all three nodes. Within 30 seconds:

$ kubectl get nodes
NAME     STATUS   ROLES           AGE   VERSION
kube-1   Ready    control-plane   26h   v1.34.3
kube-2   Ready    control-plane   26h   v1.34.3
kube-3   Ready    <none>          25h   v1.34.3

The cluster was back!
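The whole break-and-fix cycle can be replayed in miniature, with no sudo and no kubelet involved. A sandbox sketch under a throwaway root directory:

```shell
# Replay the failure mode and the fix under a temporary root.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/etc"

# Static resolv.conf with real upstreams (what Phase 3 wrote to /etc)
printf 'nameserver 8.8.8.8\nnameserver 8.8.4.4\n' > "$ROOT/etc/resolv.conf"

# State after disabling systemd-resolved: the runtime file is simply gone,
# and this is the exact path the kubelet reads for pod DNS config.
test -e "$ROOT/run/systemd/resolve/resolv.conf" || echo "kubelet's path is missing"

# The fix: recreate the directory and symlink the path to the static file
mkdir -p "$ROOT/run/systemd/resolve"
ln -sf "$ROOT/etc/resolv.conf" "$ROOT/run/systemd/resolve/resolv.conf"

# The path kubelet expects now resolves to two real nameservers
grep -c '^nameserver' "$ROOT/run/systemd/resolve/resolv.conf"
```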

Phase 6: The Final Run — 424/424 ✅

Third time's the charm. With the DNS properly configured and the kubelet symlink in place, I ran Sonobuoy one last time:

./sonobuoy run --mode=certified-conformance \
    --sonobuoy-image=sonobuoy/sonobuoy:v0.57 \
    --systemd-logs-image=sonobuoy/systemd-logs-arm64:v0.4 \
    --wait

~1 hour 42 minutes later:

Ran 424 of 7137 Specs in 6112.853 seconds
SUCCESS! -- 424 Passed | 0 Failed | 0 Pending | 6713 Skipped
PASS

ALL 424 TESTS PASSED. 🎉
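Of the four submission files, junit_01.xml is the machine-readable one that the conformance repo's automation consumes. It's standard JUnit XML, one testcase element per spec, with failures recorded as child elements. A toy fragment shows the shape (synthetic data; the real file covers all the specs):

```shell
# Write a synthetic JUnit fragment in the shape of Sonobuoy's junit_01.xml
cat > /tmp/junit-demo.xml <<'EOF'
<testsuite name="e2e" tests="3" failures="0">
  <testcase name="[sig-network] DNS should provide DNS for the cluster"/>
  <testcase name="[sig-api-machinery] Watch should observe an object deletion"/>
  <testcase name="[sig-apps] Deployment should run the lifecycle of a Deployment"/>
</testsuite>
EOF

# Count the recorded test cases; a clean run has no <failure> children
grep -c '<testcase' /tmp/junit-demo.xml
```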

The PR Submission

The Commit Squashing Lesson

The CNCF conformance repo has strict CI checks. One of them: exactly one commit per submission. My branch had accumulated multiple commits during the debugging process. The conformance bot flagged it:

[FAIL] it appears that there is not exactly one commit.
Please rebase and squash with git rebase -i HEAD

The fix:

# Reset to the base commit (keeping all changes staged)
git reset --soft <base-commit-hash>

# Create a single commit with DCO sign-off
git commit -s -m "Conformance results for v1.34/kubespray"

# Force push to update the PR
git push --force-with-lease origin conformance/v1.34-kubespray
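The reset-and-squash dance can be rehearsed in a throwaway repo before touching the real PR branch. A sketch, with empty commits standing in for real changes:

```shell
# Rehearse squashing multiple WIP commits into one signed commit.
repo=$(mktemp -d) && cd "$repo" && git init -q
gc() { git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty "$@"; }

gc -m "base"                 # stand-in for the upstream base commit
base=$(git rev-parse HEAD)
gc -m "wip: first attempt"   # the debugging-era commits to squash away
gc -m "wip: fix DNS"

git reset --soft "$base"     # back to base; real changes would stay staged
gc -s -m "Conformance results for v1.34/kubespray"   # -s adds the DCO sign-off

git rev-list --count HEAD                      # 2 commits: base + the squashed one
git log -1 --format=%B | grep Signed-off-by    # the DCO trailer is present
```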

DCO Sign-Off

The CNCF requires a Developer Certificate of Origin (DCO) sign-off on every commit. This is done with the -s flag in git commit, which adds:

Signed-off-by: Your Name <your-email@example.com>

The Result

After squashing into a single signed commit, the bot confirmed:

All requirements (15) have passed for the submission!

On February 10, 2026, Taylor Waggoner (CNCF staff) merged the PR with:

"You are now Certified Kubernetes 🎉"

PR #4078 — Merged.

What I Learned

Technical Skills

  • Linux DNS internals: How systemd-resolved works, the difference between stub resolvers (127.0.0.53) and direct DNS, and the resolution chain

  • Kubernetes node architecture: How kubelet starts static pods (API server, scheduler, controller-manager) using manifest files in /etc/kubernetes/manifests/

  • Container networking: How CoreDNS resolves cluster DNS and how host DNS configuration affects pod DNS resolution

  • Sonobuoy & e2e testing: Running conformance tests, interpreting results, understanding the difference between test failures and infrastructure failures

  • ARM64 considerations: Not all CNI plugins support ARM64, Sonobuoy needs architecture-specific images, and some configurations behave differently on ARM

Open Source Best Practices

  • Read existing submissions first: Looking at previous conformance PRs saved me hours

  • Follow CI requirements precisely: single commit, DCO sign-off, correct directory structure; automation enforces these strictly

  • Document everything: The README should contain reproducible steps so anyone can verify your results

  • Be patient with debugging: What seemed like a DNS issue turned out to be a kubelet configuration dependency. Layer by layer, the picture became clear

The Debugging Mindset

The biggest lesson was systematic troubleshooting. When the tests failed:

  1. Read the actual logs, not just the summary

  2. Check each component independently (kubelet → containerd → API server → pods)

  3. Understand the dependency chain (kubelet needs DNS config → DNS config path is hardcoded → removing systemd-resolved removed that path)

  4. Test incrementally (verify DNS works before running full test suite)

The Complete DNS Fix

For anyone running Kubespray conformance tests on Ubuntu 24.04 (or any OS with systemd-resolved), here's the complete fix that works:

# On each node (kube-1, kube-2, kube-3):

# 1. Disable systemd-resolved
sudo systemctl stop systemd-resolved
sudo systemctl disable systemd-resolved

# 2. Set up direct DNS
sudo rm -f /etc/resolv.conf
echo -e 'nameserver 8.8.8.8\nnameserver 8.8.4.4' | sudo tee /etc/resolv.conf

# 3. Create the symlink kubelet expects (THIS IS THE KEY STEP)
sudo mkdir -p /run/systemd/resolve
sudo ln -sf /etc/resolv.conf /run/systemd/resolve/resolv.conf

# 4. Restart services
sudo systemctl restart containerd kubelet

Wait 2-3 minutes for the cluster to stabilize before running Sonobuoy.

Resources

If you're working on a conformance submission and hit DNS issues on Ubuntu/Fedora, the symlink fix above might save you a few hours. Happy testing!

Open Source

Part 1 of 8

In this Open-Source series, I'll share posts about Git, GitHub, open-source contribution, and the main topics around both. In one line: this series is dedicated to open source.
