GOMAXPROCS in Containers: Addressing CPU Limits in Go 1.25

The Surprising Impact of GOMAXPROCS on Kubernetes


Hello, this is Wesley. Today’s article is about GOMAXPROCS in containers. Without further ado, let’s get started. 💪

Problem: Default GOMAXPROCS in Containers

When running Go applications within containerized environments like Kubernetes, a significant performance bottleneck can arise from the default behavior of the GOMAXPROCS setting. Since Go 1.5, GOMAXPROCS defaults to the number of available CPU cores as seen by the Go runtime, which typically reflects the total number of CPU cores on the underlying node, rather than the CPU limit specifically allocated to the container (Pod).

Consider a scenario where a Go application is deployed in a Kubernetes Pod with a CPU limit of 1 core on a node with 32 cores. The Go runtime will see 32 available cores and set GOMAXPROCS to 32. This mismatch causes the Go runtime to attempt to run up to 32 operating system threads to execute Go code, while Kubernetes, through Linux Cgroups, strictly limits the Pod to the equivalent of 1 CPU’s worth of compute time.
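
A quick way to observe the mismatch is to print what the runtime sees from inside the Pod. The following minimal sketch assumes a pre-1.25 runtime with no GOMAXPROCS environment variable set; on a 32-core node with a 1-CPU limit it will typically report 32 for both values:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // NumCPU reports the CPUs visible to the process (usually the node's cores).
        fmt.Println("NumCPU:    ", runtime.NumCPU())
        // GOMAXPROCS(0) queries the current setting without changing it.
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
    }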

This discrepancy leads to several detrimental performance impacts, as highlighted in the widely discussed blog post “Golang Performance Penalty in Kubernetes”:

  • Increased latency (up to 65%+): Applications experience significant delays in processing requests.
  • Reduced throughput (nearly 20%): The number of requests the application can handle per second decreases substantially.

Existing Solutions (Workarounds)

Before the official proposal, developers had to rely on workarounds to mitigate this issue. Two common approaches were:

  • Manually Setting the GOMAXPROCS Environment Variable: This involves explicitly setting the GOMAXPROCS environment variable in the container specification to match the Pod’s CPU limit. In Kubernetes, this can be done using the resourceFieldRef in the Deployment YAML:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-go-app
    spec:
      # ...
      template:
        spec:
          containers:
            - name: my-container
              image: my-go-image:latest
              env:
                - name: GOMAXPROCS
                  valueFrom:
                    resourceFieldRef:
                      resource: limits.cpu
                      divisor: "1"
              resources:
                limits:
                  cpu: "2" # Example CPU limit

    In this example, Kubernetes automatically sets the GOMAXPROCS environment variable in the container to the value of limits.cpu, which is 2.

  • Using Third-Party Libraries: Libraries like uber-go/automaxprocs can be imported into the Go application. These libraries automatically detect the CPU limits imposed by Cgroups at application startup and set runtime.GOMAXPROCS() accordingly. You can find more about this library here: https://github.com/uber-go/automaxprocs.
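
Typical usage is a single blank import; the library’s init function reads the Cgroup CPU quota at startup and adjusts GOMAXPROCS before main runs (a minimal sketch):

    package main

    import (
        "fmt"
        "runtime"

        // The blank import runs the library's init function, which detects the
        // Cgroup CPU quota and calls runtime.GOMAXPROCS accordingly.
        _ "go.uber.org/automaxprocs"
    )

    func main() {
        // Should now reflect the container's CPU limit (rounded down by default).
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
    }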

While these solutions address the problem, they require developers to be aware of the issue and implement the fix, adding configuration burden and potential for oversight.

Official Proposal: CPU Limit Aware GOMAXPROCS (Go 1.25)

In a significant step towards improving the developer experience in cloud-native environments, the Go core team, with a proposal (#73193) from Michael Pratt of the Go Runtime team, aims to address this issue directly within the Go runtime itself. This proposal, targeted for Go 1.25, introduces Cgroup CPU limit awareness into the default behavior of GOMAXPROCS, promising out-of-the-box performance optimization for Go applications running in containers.

You can find the full proposal details here: https://go.dev/issue/73193.

The core mechanisms of the proposed solution include:

  • Automatic Detection of CPU Limit: Upon program startup (on Linux, if GOMAXPROCS is not set via environment variable), the Go runtime will actively detect:

    • (a) Total Machine CPU Cores: Obtained through the underlying mechanism of runtime.NumCPU().
    • (b) CPU Affinity Limits: Retrieved using the sched_getaffinity(2) system call, indicating the set of CPU cores the process is allowed to run on.
    • (c) Cgroup CPU Quota Limits: The runtime will traverse the Cgroup hierarchy of the process (supporting both v1 and v2) and read cpu.cfs_quota_us and cpu.cfs_period_us (v1) or cpu.max (v2) files at each level. It will calculate the CPU limit (equivalent cores = quota / period) for each level and take the minimum value across the entire hierarchy as the “effective CPU limit”.
  • Calculation of the New Default GOMAXPROCS: The new default GOMAXPROCS value will be the minimum of the values calculated in (a), (b), and a modified version of (c). The Cgroup limit (c) will be adjusted using the formula: adjusted_cgroup_limit = max(2, ceil(effective_cpu_limit)). This means the effective CPU limit is first rounded up to the nearest integer using ceil, and then the result is compared with 2, with the larger value being used. (A sketch of this calculation follows the list below.)

  • Automatic Updates: To accommodate dynamic changes in CPU limits or affinity (e.g., Kubernetes’ “in-place vertical scaling”), the Go runtime will introduce a background mechanism (likely within the runtime’s sysmon monitor thread) to periodically re-check these values at a low frequency (e.g., every 30 seconds to 1 minute). If a change results in a different calculated default GOMAXPROCS, the runtime will automatically update the setting internally.

  • New API: The proposal also introduces a new public API: runtime.SetDefaultGOMAXPROCS(). Calling this function will immediately trigger the calculation and setting of the default GOMAXPROCS value, overriding any value set via the GOMAXPROCS environment variable. This allows for restoring the automatic detection behavior or forcing an update when external changes are known.

  • Compatibility Control: This change, which could alter existing program behavior, will be controlled by a GODEBUG flag: cgroupgomaxprocs=1. For Go projects with a go.mod file specifying a Go language version lower than 1.25, this flag will default to 0 (disabling the new behavior). Only when the project’s Go version is upgraded to 1.25 or later will the default become 1 (enabling the new behavior). Developers can still explicitly disable the new behavior by setting GODEBUG=cgroupgomaxprocs=0.
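
Putting (a), (b), and (c) together, the new default can be approximated with the sketch below. This is an illustrative approximation of the proposed logic, not the runtime’s actual implementation: it only reads the leaf cgroup v2 cpu.max file instead of walking the whole hierarchy, relies on golang.org/x/sys/unix for sched_getaffinity, and the helper names are made up for the example:

    // Illustrative sketch only; assumes Linux with cgroup v2 mounted at /sys/fs/cgroup.
    package main

    import (
        "fmt"
        "math"
        "os"
        "runtime"
        "strconv"
        "strings"

        "golang.org/x/sys/unix"
    )

    // cgroupCPULimit reads cpu.max ("<quota> <period>" or "max <period>") and returns
    // the limit in cores, or ok=false if no limit is set. The real proposal walks every
    // level of the cgroup hierarchy (v1 and v2) and takes the minimum across levels.
    func cgroupCPULimit() (cores float64, ok bool) {
        data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
        if err != nil {
            return 0, false
        }
        fields := strings.Fields(string(data))
        if len(fields) != 2 || fields[0] == "max" {
            return 0, false
        }
        quota, _ := strconv.ParseFloat(fields[0], 64)
        period, _ := strconv.ParseFloat(fields[1], 64)
        return quota / period, true
    }

    // affinityCount returns how many CPUs the process may run on, via sched_getaffinity(2).
    func affinityCount() int {
        var set unix.CPUSet
        if err := unix.SchedGetaffinity(0, &set); err != nil {
            return runtime.NumCPU()
        }
        return set.Count()
    }

    func main() {
        procs := runtime.NumCPU() // (a) total machine cores
        if aff := affinityCount(); aff < procs {
            procs = aff // (b) CPU affinity
        }
        if limit, ok := cgroupCPULimit(); ok {
            adjusted := int(math.Max(2, math.Ceil(limit))) // (c) max(2, ceil(effective limit))
            if adjusted < procs {
                procs = adjusted
            }
        }
        fmt.Println("default GOMAXPROCS would be:", procs)
    }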

Design Considerations & Details

The proposal also addresses several design considerations and details:

  • Why Limit and Not Shares/Request? Cgroup cpu.shares (v1) or cpu.weight (v2) (corresponding to Kubernetes CPU Requests) define relative priorities during resource contention, not hard limits on CPU usage. When the system is not heavily loaded, containers with only Requests might use significantly more CPU than requested. Therefore, CPU Quota (Limit) is a more suitable metric for controlling parallelism via GOMAXPROCS, a conclusion also reached by the Java and .NET runtimes.

  • Handling Fractional Limits (Rounding): Cgroup Quota can be fractional (e.g., 1.5 cores). Since GOMAXPROCS must be an integer, the proposal opts for rounding up (ceil). For example, a limit of 1.5 will result in a GOMAXPROCS of 2. This is intended to allow the application to utilize the burst capacity provided by Cgroups and might better indicate CPU starvation to monitoring systems. However, this differs from uber-go/automaxprocs, which defaults to rounding down, assuming fractional quotas might be reserved for sidecar processes. This remains an open point for discussion. (A short comparison of the two rounding choices follows this list.)

  • Minimum Value of 2: The proposal suggests a minimum adjusted Cgroup limit of 2. Even if the calculated effective CPU limit is less than 1 (e.g., 0.5), the adjusted value will be at least 2. This is because setting GOMAXPROCS to 1 entirely disables the Go scheduler’s parallelism, potentially leading to unexpected performance issues and behavior, such as GC workers temporarily halting user goroutines. A minimum of 2 preserves basic parallelism and allows better utilization of Cgroup burst capabilities. If the physical core count or CPU affinity is 1, GOMAXPROCS will still be 1.

  • Logging: Unlike automaxprocs, the built-in implementation in the proposal will not print logs about automatic GOMAXPROCS adjustments by default, aiming to keep runtime output cleaner.
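
To make the rounding difference concrete, here is a tiny comparison for a fractional limit, assuming the proposed max(2, ceil(limit)) rule on one side and automaxprocs’ default of rounding down with a minimum of 1 on the other:

    package main

    import (
        "fmt"
        "math"
    )

    func main() {
        limit := 1.5 // example effective CPU limit in cores

        proposed := int(math.Max(2, math.Ceil(limit))) // Go 1.25 proposal: round up, floor of 2
        automax := int(math.Floor(limit))              // automaxprocs default: round down...
        if automax < 1 {
            automax = 1 // ...with a minimum of 1
        }

        fmt.Printf("limit=%.1f -> proposal=%d, automaxprocs=%d\n", limit, proposed, automax)
        // Output: limit=1.5 -> proposal=2, automaxprocs=1
    }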

Summary of Official Proposal Benefits

The successful implementation of this proposal in Go 1.25 promises significant benefits for Go applications running in containerized environments:

  • Out-of-the-Box Performance Optimization: By automatically aligning GOMAXPROCS with the Cgroup CPU Limit, the proposal eliminates a common source of performance bottlenecks like high latency and low throughput caused by misconfiguration.
  • Simplified Operations: Developers will no longer need to manually set GOMAXPROCS or rely on third-party libraries like automaxprocs, greatly simplifying deployment configurations and reducing the risk of misconfiguration.
  • Automatic Adaptation to Dynamic Resources: The automatic update mechanism ensures that Go applications can better adapt to dynamic resource adjustments in platforms like Kubernetes, maximizing resource utilization.

GOMAXPROCS and Containerization

The fundamental issue arises because the default behavior of GOMAXPROCS is ill-suited for the resource-constrained nature of containerized environments. As demonstrated by benchmarks, when GOMAXPROCS is set to the high number of node CPUs while the container is limited to fewer CPUs, the following performance penalties occur:

  • Excessive Context Switching: A large number of Go threads compete for a limited amount of CPU time, forcing the operating system kernel to perform frequent and inefficient context switches. Benchmarks have shown context switch counts increasing by nearly 4 times when GOMAXPROCS is misconfigured.

  • CPU Throttling and Scheduling Latency: Concurrent threads quickly exhaust the CPU time quota allocated by Cgroups. Once the quota is depleted, the kernel forcibly suspends all threads in the container until the next scheduling period, leading to significant spikes in request processing latency. Waiting times for CPU have been observed to reach peaks of 34 seconds under incorrect configurations, compared to mere milliseconds when correctly set.

  • Significant Application Performance Degradation: The combined effect of excessive context switching and frequent CPU throttling results in a substantial decrease in end-to-end application performance. Benchmarks showed average request latency increasing by 65%, maximum latency by 82%, and overall requests per second decreasing by nearly 20% when GOMAXPROCS was left at the node’s core count instead of being set to the container’s limit.

  • GC Amplification: Go’s concurrent garbage collector (GC) scales its work based on GOMAXPROCS. An excessively high GOMAXPROCS can cause the GC to initiate far more concurrent marking work than the available CPU resources can handle, exacerbating CPU throttling even when the application itself is not heavily loaded. In extreme cases, many GC worker goroutines might run simultaneously, briefly freezing user goroutine execution due to kernel scheduling.

  • Runtime Scalability Costs: Running with a high GOMAXPROCS incurs additional runtime overhead, such as increased memory usage due to per-P local caches (like mcache) and synchronization costs for work stealing and GC coordination between Ps. When GOMAXPROCS significantly exceeds the available CPU, these costs provide no corresponding benefit in terms of parallel processing.

Limitations of the Proposal

It’s important to note that this proposal primarily addresses scenarios where CPU Limits are explicitly set for containers. For the common configuration where Kubernetes users set CPU Requests but not Limits, this change will not have a direct impact. In such cases, GOMAXPROCS will still be based on the node’s CPU count or affinity settings. Optimizing resource utilization for Pods with only CPU Requests remains a future area for exploration.

GOMAXPROCS Fundamentals

The GOMAXPROCS environment variable and the runtime.GOMAXPROCS() function control the maximum number of operating system threads that can simultaneously execute user-level Go code. It’s crucial to understand that Go uses goroutines, which are lightweight, user-level threads, but these goroutines need to be scheduled onto actual operating system threads managed by the kernel to run on CPU cores. GOMAXPROCS essentially limits the number of these OS threads that the Go runtime can use concurrently for executing goroutines.
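
In code, the same knob is exposed through the runtime.GOMAXPROCS function, which both queries and sets the value:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // An argument below 1 queries the current setting without changing it.
        fmt.Println("current GOMAXPROCS:", runtime.GOMAXPROCS(0))

        // An argument of n >= 1 sets the limit and returns the previous value.
        prev := runtime.GOMAXPROCS(4)
        fmt.Println("previous value:", prev)
    }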

CPU Limits vs. Requests (Kubernetes)

In Kubernetes, it’s essential to distinguish between CPU Limits and CPU Requests:

  • CPU Limit: This defines the maximum amount of CPU time a container is allowed to use. Kubernetes enforces this limit using Linux Cgroups, throttling the container if it tries to exceed the specified amount. A CPU limit of 1 means the container will get at most the equivalent of one full CPU core’s worth of compute time, even if the underlying node has many more cores.

  • CPU Request: This represents the minimum amount of CPU the container is guaranteed. The Kubernetes scheduler uses these requests to decide which node is suitable to run the Pod, ensuring that the node has enough capacity to meet the requests of all its running Pods. However, a CPU request does not impose a hard limit on how much CPU the container can use if there are idle resources available on the node.

The proposal primarily focuses on aligning GOMAXPROCS with the hard upper bound defined by CPU Limits.

Context Switching

Context switching is the process by which the operating system’s kernel switches the CPU from one thread to another. This involves saving the state of the currently running thread and restoring the state of the next thread to be executed. While necessary for multitasking, excessive context switching introduces overhead and reduces the overall efficiency of the system.

When GOMAXPROCS is set too high in a container with a low CPU limit, the Go runtime creates many more OS threads than can effectively run in parallel. These threads constantly compete for the limited CPU time, leading to a significant increase in the number of context switches performed by the kernel. As demonstrated by benchmarks, a mismatch between GOMAXPROCS and the CPU limit can cause context switch counts to skyrocket, wasting valuable CPU cycles on managing threads instead of executing application code.
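
One rough way to observe this on Linux is to sample the kernel’s per-thread context-switch counters before and after a load test and compare the deltas under different GOMAXPROCS settings. The sketch below sums the counters found in the standard /proc/self/task/*/status files across all OS threads of the process:

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
        "strconv"
        "strings"
    )

    // totalCtxtSwitches sums voluntary and non-voluntary context switches
    // across all OS threads of the current process.
    func totalCtxtSwitches() (voluntary, nonvoluntary uint64) {
        paths, _ := filepath.Glob("/proc/self/task/*/status")
        for _, p := range paths {
            data, err := os.ReadFile(p)
            if err != nil {
                continue
            }
            for _, line := range strings.Split(string(data), "\n") {
                fields := strings.Fields(line)
                if len(fields) != 2 {
                    continue
                }
                n, _ := strconv.ParseUint(fields[1], 10, 64)
                switch fields[0] {
                case "voluntary_ctxt_switches:":
                    voluntary += n
                case "nonvoluntary_ctxt_switches:":
                    nonvoluntary += n
                }
            }
        }
        return voluntary, nonvoluntary
    }

    func main() {
        v, nv := totalCtxtSwitches()
        fmt.Printf("voluntary=%d nonvoluntary=%d\n", v, nv)
        // Run the workload, sample again, and compare the deltas for GOMAXPROCS
        // equal to the node's core count vs. the container's CPU limit.
    }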

  • Cgroups (Control Groups): A Linux kernel feature that allows for limiting and isolating resource usage (CPU, memory, I/O, etc.) for groups of processes. Container runtimes like Docker and Kubernetes heavily rely on Cgroups to enforce resource limits on containers.

  • CPU Affinity: A kernel feature that allows restricting a process (or its threads) to run only on a specific set of CPU cores. The Go runtime currently considers CPU affinity when determining the default GOMAXPROCS. The new proposal will also take CPU affinity changes into account for automatic updates.

  • Go Scheduler: The component within the Go runtime responsible for scheduling goroutines onto available OS threads for execution. An incorrectly configured GOMAXPROCS can negatively impact the efficiency of the Go scheduler, leading to suboptimal goroutine execution.

The upcoming changes in Go 1.25 represent a significant step forward in making Go a more performant and developer-friendly language for building cloud-native applications. By baking in Cgroup awareness for GOMAXPROCS, the Go team is directly addressing a long-standing pain point and paving the way for more efficient resource utilization in containerized environments.

More Series Articles about You Should Know In Golang:

https://wesley-wei.medium.com/list/you-should-know-in-golang-e9491363cd9a

And I’m Wesley, delighted to share knowledge from the world of programming. 

Don’t forget to follow me for more informative content, or feel free to share this with others who may also find it beneficial. It would be a great help to me.

Give me some free claps, highlights, or replies; I pay attention to those reactions, and they determine whether I continue to post this type of article.

See you in the next article. 👋

Chinese version: https://programmerscareer.com/zh-cn/go-25-procs/
Author: Medium, LinkedIn, Twitter
Note: Originally written at https://programmerscareer.com/go-25-procs/ at 2025-04-24 01:28.
Copyright: BY-NC-ND 3.0
