
How to Dynamically Adjust Resource Allocations for Suspended Kubernetes Jobs (v1.36 Beta)

2026-05-01 05:41:36

Introduction

Kubernetes v1.36 introduces an enhancement for batch and machine learning workloads: you can now modify container resource requests and limits in the pod template of a suspended Job. The feature was introduced as alpha in v1.35 and is beta in v1.36. It lets queue controllers and administrators adjust CPU, memory, and extended resources such as GPUs on a Job while it is suspended, before it starts or resumes running. You can therefore adapt resource allocations without deleting and recreating the Job, which preserves its metadata and status.


In this step-by-step guide, you'll create a suspended Job, change its resource requests and limits, resume it, and confirm that the resulting Pods use the updated values.

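What You Need

You need a cluster running Kubernetes v1.36 (or v1.35 with the feature gate enabled, as described in Step 1), kubectl configured to access it, and permission to create and patch Jobs in the target namespace. The example also assumes nodes that expose the example-hardware-vendor.com/gpu extended resource. Substitute the resource name that your device plugin advertises.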

Step-by-Step Guide

Step 1: Verify the Feature is Enabled

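In Kubernetes v1.36 this feature is beta, and beta features are enabled by default. Check the control plane version with:

kubectl version

Note that kubectl api-versions | grep batch/v1 only confirms that the batch/v1 API is served. It does not show whether a feature gate is enabled.

On v1.35 the feature is alpha, so it is disabled by default. You must enable the JobMutablePodTemplate feature gate on the control plane. In v1.36 no manual action is required unless an administrator has explicitly disabled the gate.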
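The version rule above can be expressed as a small shell function. This is a sketch under stated assumptions:

- The name feature_status is hypothetical.
- It only inspects the server's gitVersion string.
- It reports the gate's default state, not whether an administrator has overridden it.

```shell
# Hypothetical helper: classify a server gitVersion string such as "v1.36.0".
# Assumption taken from this guide: alpha in v1.35 (gate off by default),
# beta from v1.36 (gate on by default). Administrator overrides are not detected.
feature_status() (
  version=${1#v}             # "v1.36.0" -> "1.36.0"
  major=${version%%.*}       # -> "1"
  rest=${version#*.}         # -> "36.0"
  minor=${rest%%.*}          # -> "36"
  minor=${minor%%[!0-9]*}    # defensive: drop any non-digit suffix
  if [ "$major" -gt 1 ] || [ "$minor" -ge 36 ]; then
    echo "on by default"
  elif [ "$minor" -eq 35 ]; then
    echo "off by default: enable the feature gate"
  else
    echo "not available"
  fi
)
```

On a live cluster you might call it as feature_status "$(kubectl version -o json | jq -r '.serverVersion.gitVersion')". That call assumes jq is installed.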

Step 2: Create a Suspended Job

Define a Job manifest with spec.suspend: true. The Job is created in a suspended state, so the Job controller launches no Pods, and you can modify its resources before any Pods start. The example below is a machine learning training Job that requests 4 GPUs:

apiVersion: batch/v1
kind: Job
metadata:
  name: ml-training-suspended
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:latest
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

Apply it with kubectl apply -f job-suspended.yaml.
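Before editing resources, it can help to confirm that the Job is still suspended, because this procedure relies on the change being made while it is suspended. The helper below is a hypothetical sketch. It reads .spec.suspend through kubectl's JSONPath output.

```shell
# Hypothetical helper: succeed only if the named Job has spec.suspend set to true.
# kubectl prints "true" for a suspended Job and "false" or nothing otherwise.
job_is_suspended() {
  [ "$(kubectl get job "$1" -o jsonpath='{.spec.suspend}')" = "true" ]
}
```

For example: job_is_suspended ml-training-suspended && echo "safe to edit resources"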

Step 3: Modify Resource Requests/Limits While Suspended

Once the Job exists in a suspended state, you can update the resources in its pod template with kubectl edit or kubectl patch. For example, the following JSON Patch reduces the GPU count from 4 to 2 and halves the CPU and memory:

kubectl patch job ml-training-suspended --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/example-hardware-vendor.com~1gpu", "value": "2"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/example-hardware-vendor.com~1gpu", "value": "2"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "4"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "16Gi"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "4"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "16Gi"}
]'

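Note: In JSON Patch paths (JSON Pointer, RFC 6901), ~1 is the escape sequence for /, so example-hardware-vendor.com~1gpu refers to the resource name example-hardware-vendor.com/gpu. Change requests and limits together: for extended resources such as GPUs, Kubernetes requires the request to equal the limit when both are set. New values must also be valid, non-negative quantities. If no node can allocate the new requests, the Pods stay Pending after you resume the Job.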
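The ~1 escaping follows JSON Pointer (RFC 6901): ~ becomes ~0 and / becomes ~1. The ~ must be replaced first, so that the ~1 produced for a slash is not escaped again.

Below is a minimal sketch of building such a patch programmatically. It makes two assumptions:

- The target is container index 0.
- The resource is already present in both requests and limits. A JSON Patch replace fails when the target path does not exist, whereas add would create it.

The helper names json_pointer_escape and resource_patch are hypothetical.

```shell
# Escape one JSON Pointer reference token (RFC 6901).
# "~" is escaped before "/"; otherwise the "~1" produced for "/" would become "~01".
json_pointer_escape() {
  printf '%s' "$1" | sed -e 's/~/~0/g' -e 's|/|~1|g'
}

# Hypothetical helper: emit two JSON Patch "replace" operations that set the same
# value for the requests and limits of container 0. The value is inserted verbatim,
# so it must not contain double quotes (quantities such as 2, 4 or 16Gi are fine).
resource_patch() (
  name=$(json_pointer_escape "$1")
  value=$2
  base=/spec/template/spec/containers/0/resources
  printf '{"op":"replace","path":"%s/requests/%s","value":"%s"},' "$base" "$name" "$value"
  printf '{"op":"replace","path":"%s/limits/%s","value":"%s"}' "$base" "$name" "$value"
)
```

Assembling the Step 3 patch from these pieces would look like this: kubectl patch job ml-training-suspended --type=json -p "[$(resource_patch example-hardware-vendor.com/gpu 2),$(resource_patch cpu 4),$(resource_patch memory 16Gi)]". Writing the same value to requests and limits mirrors the example, and it is required for extended resources.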

Step 4: Resume the Job

After adjusting resources, unsuspend the Job by setting spec.suspend to false:

kubectl patch job ml-training-suspended -p '{"spec":{"suspend":false}}'

The Job controller then creates Pods with the updated resource specifications. You can watch them start with kubectl get pods -l job-name=ml-training-suspended -w.
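Resuming and then waiting for the Job to finish can be wrapped in a hypothetical helper. It uses only kubectl patch and kubectl wait, and the 30m default timeout is an arbitrary assumption. A Job that fails never reaches the Complete condition, so in that case the wait lasts until the timeout expires.

```shell
# Hypothetical helper: unsuspend the Job, then block until it reports the
# Complete condition or the timeout (default 30m, an assumption) expires.
resume_and_wait() {
  kubectl patch job "$1" --type=merge -p '{"spec":{"suspend":false}}' &&
    kubectl wait --for=condition=complete --timeout="${2:-30m}" "job/$1"
}
```

For example: resume_and_wait ml-training-suspended 2h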

Step 5: Verify Resource Allocation

Check that the Pods reflect the new resources, replacing ml-training-suspended-xxxxx with the name of one of the Job's Pods:

kubectl get pod ml-training-suspended-xxxxx -o jsonpath='{.spec.containers[0].resources}'

You should see the adjusted requests and limits. If a queue controller is managing the Job, it can also perform these updates automatically.
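To check every Pod rather than one, you could list the GPU request of each Pod created by the Job and compare it with the expected value. The sketch below makes two assumptions:

- The Pods carry the job-name label that the Job controller adds.
- The trainer is container 0.

The helper name check_gpu_requests is hypothetical. In kubectl JSONPath, the dot inside the resource name is escaped as \. so that it is not treated as a field separator.

```shell
# Hypothetical helper: fail unless every Pod of the Job requests the expected
# number of example-hardware-vendor.com/gpu units, and at least one Pod exists.
check_gpu_requests() (
  job=$1
  expected=$2
  kubectl get pods -l "job-name=$job" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].resources.requests.example-hardware-vendor\.com/gpu}{"\n"}{end}' |
  {
    found=0
    status=0
    while read -r pod gpus; do
      found=$((found + 1))
      if [ "$gpus" != "$expected" ]; then
        echo "$pod requests ${gpus:-no} GPUs, expected $expected" >&2
        status=1
      fi
    done
    # An empty Pod list counts as a failure: nothing was verified.
    [ "$found" -gt 0 ] && exit "$status"
    echo "no Pods found for Job $job" >&2
    exit 1
  }
)
```

After resuming the Job with the Step 3 values, check_gpu_requests ml-training-suspended 2 should succeed.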

Tips and Best Practices

This feature lets batch and ML workloads adapt to current cluster capacity. You can create a Job suspended, resize it while it waits, and resume it with the new requests and limits, all without deleting and recreating it.
