The Surprising Impact of GOMAXPROCS on Kubernetes
Hello, this is Wesley. Today's article is about GOMAXPROCS in containers. Without further ado, let's get started. 💪
Problem: Default GOMAXPROCS in Containers
When running Go applications within containerized environments like Kubernetes, a significant performance bottleneck can arise from the default behavior of the GOMAXPROCS setting. Since Go 1.5, GOMAXPROCS defaults to the number of available CPU cores as seen by the Go runtime, which typically reflects the total number of CPU cores on the underlying node rather than the CPU limit allocated to the container (Pod).
Consider a scenario where a Go application is deployed in a Kubernetes Pod with a CPU limit of 1 core on a node with 32 cores. The Go runtime will see 32 available cores and set GOMAXPROCS to 32. This mismatch causes the Go runtime to attempt to run up to 32 operating system threads to execute Go code, while Kubernetes, through Linux Cgroups, strictly limits the Pod to the equivalent of 1 CPU's worth of compute time.
This discrepancy leads to several detrimental performance impacts, as highlighted in the widely discussed blog post “Golang Performance Penalty in Kubernetes”:
- Increased latency (up to 65%+): Applications experience significant delays in processing requests.
- Reduced throughput (nearly 20%): The number of requests the application can handle per second decreases substantially.
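To see the mismatch for yourself, a small probe like the following can be run inside a Pod. This is a sketch, not code from the benchmark above: it prints what the Go runtime sees versus what the cgroup allows, and the cgroup path assumes cgroup v2.

```go
// Probe the CPU picture a Go process sees inside a container.
package main

import (
	"fmt"
	"os"
	"runtime"
)

func main() {
	// NumCPU reflects the node's cores (possibly restricted by CPU affinity),
	// not the Pod's CPU limit.
	fmt.Println("NumCPU (cores seen by Go):", runtime.NumCPU())
	// GOMAXPROCS(0) reads the current setting without changing it.
	fmt.Println("GOMAXPROCS (current):     ", runtime.GOMAXPROCS(0))

	// On cgroup v2 the container's CPU limit lives in cpu.max
	// ("<quota> <period>", or "max <period>" when unlimited).
	// The path may differ on your system.
	if data, err := os.ReadFile("/sys/fs/cgroup/cpu.max"); err == nil {
		fmt.Println("cgroup v2 cpu.max:        ", string(data))
	}
}
```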
Existing Solutions (Workarounds)
Before the official proposal, developers had to rely on workarounds to mitigate this issue. Two common approaches were:
- Manually Setting the GOMAXPROCS Environment Variable: This involves explicitly setting the GOMAXPROCS environment variable in the container specification to match the Pod's CPU limit. In Kubernetes, this can be done using the resourceFieldRef in the Deployment YAML:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-go-app
spec:
  # ...
  template:
    spec:
      containers:
        - name: my-container
          image: my-go-image:latest
          env:
            - name: GOMAXPROCS
              valueFrom:
                resourceFieldRef:
                  resource: limits.cpu
                  divisor: "1"
          resources:
            limits:
              cpu: "2" # Example CPU limit
```

In this example, Kubernetes will automatically set the GOMAXPROCS environment variable in the container to the value of limits.cpu, which is 2.

- Using Third-Party Libraries: Libraries like uber-go/automaxprocs can be imported into the Go application. These libraries automatically detect the CPU limits imposed by Cgroups at application startup and set runtime.GOMAXPROCS() accordingly. You can find more about this library here: https://github.com/uber-go/automaxprocs.
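A minimal sketch of how the library is typically wired in: a blank import is enough, since the package adjusts GOMAXPROCS from an init function at startup based on the cgroup quota it detects.

```go
// Minimal use of uber-go/automaxprocs: the blank import does the work.
package main

import (
	"fmt"
	"runtime"

	_ "go.uber.org/automaxprocs" // adjusts GOMAXPROCS to the cgroup CPU quota at startup
)

func main() {
	// Inside a container with a CPU limit of 2, this prints 2
	// even on a 32-core node.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```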
While these solutions address the problem, they require developers to be aware of the issue and implement the fix, adding configuration burden and potential for oversight.
Official Proposal: CPU Limit Aware GOMAXPROCS (Go 1.25)
In a significant step towards improving the developer experience in cloud-native environments, the Go core team, through a proposal (#73193) from Michael Pratt of the Go runtime team, aims to address this issue directly within the Go runtime itself. The proposal, targeted for Go 1.25, introduces Cgroup CPU limit awareness into the default behavior of GOMAXPROCS, promising out-of-the-box performance optimization for Go applications running in containers.
You can find the full proposal details here: https://go.dev/issue/73193.
The core mechanisms of the proposed solution include:
- Automatic Detection of CPU Limit: Upon program startup (on Linux, if GOMAXPROCS is not set via environment variable), the Go runtime will actively detect:
  - (a) Total Machine CPU Cores: obtained through the underlying mechanism of runtime.NumCPU().
  - (b) CPU Affinity Limits: retrieved using the sched_getaffinity(2) system call, which reports the set of CPU cores the process is allowed to run on.
  - (c) Cgroup CPU Quota Limits: the runtime will traverse the Cgroup hierarchy of the process (supporting both v1 and v2) and read cpu.cfs_quota_us and cpu.cfs_period_us (v1) or cpu.max (v2) at each level. It calculates the CPU limit at each level (equivalent cores = quota / period) and takes the minimum across the entire hierarchy as the "effective CPU limit".
- Calculation of the New Default GOMAXPROCS: The new default GOMAXPROCS value will be the minimum of (a), (b), and a modified version of (c). The Cgroup limit (c) is adjusted using the formula adjusted_cgroup_limit = max(2, ceil(effective_cpu_limit)): the effective CPU limit is first rounded up to the nearest integer, and the result is then raised to at least 2. A simplified sketch of this calculation follows this list.
- Automatic Updates: To accommodate dynamic changes in CPU limits or affinity (e.g., Kubernetes' "in-place vertical scaling"), the Go runtime will introduce a background mechanism (likely within the runtime's sysmon monitor) to periodically re-check these values at a low frequency (e.g., every 30 seconds to 1 minute). If a change results in a different calculated default GOMAXPROCS, the runtime will automatically update the setting internally.
- New API: The proposal also introduces a new public API, runtime.SetDefaultGOMAXPROCS(). Calling this function immediately triggers the calculation and setting of the default GOMAXPROCS value, overriding any value previously set via the GOMAXPROCS environment variable. This allows restoring the automatic detection behavior or forcing an update when external changes are known.
- Compatibility Control: This change, which could alter existing program behavior, will be controlled by a GODEBUG flag: cgroupgomaxprocs=1. For Go projects whose go.mod file specifies a language version lower than 1.25, the flag defaults to 0 (disabling the new behavior); only when the project's Go version is upgraded to 1.25 or later does the default become 1 (enabling it). Developers can still explicitly disable the new behavior by setting GODEBUG=cgroupgomaxprocs=0.
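To make the formula concrete, here is a rough, simplified sketch of the calculation described above. It only handles cgroup v2, reads a single cpu.max file instead of walking the hierarchy, and ignores CPU affinity, so it illustrates the proposal's arithmetic rather than reproducing the runtime's actual implementation.

```go
// Simplified sketch of the proposed default-GOMAXPROCS calculation
// (cgroup v2 only, no hierarchy traversal, affinity ignored).
package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// effectiveCPULimit reads cpu.max ("<quota> <period>" or "max <period>")
// and returns quota/period in cores, or ok=false if there is no limit.
func effectiveCPULimit() (limit float64, ok bool) {
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		return 0, false
	}
	fields := strings.Fields(string(data))
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false // no quota configured
	}
	quota, err1 := strconv.ParseFloat(fields[0], 64)
	period, err2 := strconv.ParseFloat(fields[1], 64)
	if err1 != nil || err2 != nil || period == 0 {
		return 0, false
	}
	return quota / period, true
}

func proposedDefault() int {
	procs := runtime.NumCPU() // (a) machine cores; (b) affinity is ignored in this sketch
	if limit, ok := effectiveCPULimit(); ok {
		// (c) adjusted cgroup limit: max(2, ceil(effective_cpu_limit))
		adjusted := int(math.Ceil(limit))
		if adjusted < 2 {
			adjusted = 2
		}
		if adjusted < procs {
			procs = adjusted
		}
	}
	return procs
}

func main() {
	// e.g. a 0.5-core limit yields 2; a 1.5-core limit yields 2;
	// a 4-core limit on a 32-core node yields 4.
	fmt.Println("proposed default GOMAXPROCS:", proposedDefault())
}
```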
Design Considerations & Details
The proposal also addresses several design considerations and details:
- Why Limit and Not Shares/Request? Cgroup cpu.shares (v1) or cpu.weight (v2), corresponding to Kubernetes CPU Requests, define relative priorities during resource contention, not hard limits on CPU usage. When the system is not heavily loaded, containers with only Requests may use significantly more CPU than requested. The CPU quota (Limit) is therefore a more suitable metric for controlling parallelism via GOMAXPROCS, a conclusion also reached by the Java and .NET runtimes.
- Handling Fractional Limits (Rounding): A Cgroup quota can be fractional (e.g., 1.5 cores). Since GOMAXPROCS must be an integer, the proposal opts for rounding up (ceil). For example, a limit of 1.5 results in a GOMAXPROCS of 2. This is intended to let the application use the burst capacity provided by Cgroups and may better surface CPU starvation to monitoring systems. However, it differs from uber-go/automaxprocs, which defaults to rounding down on the assumption that fractional quotas may be reserved for sidecar processes. This remains an open point of discussion.
- Minimum Value of 2: The proposal suggests a minimum adjusted Cgroup limit of 2. Even if the calculated effective CPU limit is less than 1 (e.g., 0.5), the adjusted value will be at least 2. This is because setting GOMAXPROCS to 1 entirely disables the Go scheduler's parallelism, potentially leading to unexpected performance issues and behavior, such as GC workers temporarily halting user goroutines. A minimum of 2 preserves basic parallelism and allows better utilization of Cgroup burst capability. If the physical core count or CPU affinity is 1, GOMAXPROCS will still be 1.
- Logging: Unlike automaxprocs, the built-in implementation will not print logs about automatic GOMAXPROCS adjustments by default, keeping runtime output cleaner.
Summary of Official Proposal Benefits
The successful implementation of this proposal in Go 1.25 promises significant benefits for Go applications running in containerized environments:
- Out-of-the-Box Performance Optimization: By automatically aligning GOMAXPROCS with the Cgroup CPU limit, the proposal eliminates a common source of performance bottlenecks, such as the high latency and low throughput caused by misconfiguration.
- Simplified Operations: Developers will no longer need to manually set GOMAXPROCS or rely on third-party libraries like automaxprocs, greatly simplifying deployment configuration and reducing the risk of misconfiguration.
- Automatic Adaptation to Dynamic Resources: The automatic update mechanism ensures that Go applications can better adapt to dynamic resource adjustments on platforms like Kubernetes, maximizing resource utilization.
GOMAXPROCS and Containerization
The fundamental issue arises because the default behavior of GOMAXPROCS is ill-suited to the resource-constrained nature of containerized environments. As demonstrated by benchmarks, when GOMAXPROCS is set to the node's high core count while the container is limited to fewer CPUs, the following performance penalties occur:
- Excessive Context Switching: A large number of Go threads compete for a limited amount of CPU time, forcing the operating system kernel to perform frequent and inefficient context switches. Benchmarks have shown context-switch counts increasing by nearly 4 times when GOMAXPROCS is misconfigured.
- CPU Throttling and Scheduling Latency: Concurrent threads quickly exhaust the CPU time quota allocated by Cgroups. Once the quota is depleted, the kernel forcibly suspends all threads in the container until the next scheduling period, leading to significant spikes in request-processing latency. Waiting times for CPU have been observed to peak at 34 seconds under incorrect configurations, compared to mere milliseconds when correctly set.
- Significant Application Performance Degradation: The combined effect of excessive context switching and frequent CPU throttling is a substantial decrease in end-to-end application performance. Benchmarks showed average request latency increasing by 65%, maximum latency by 82%, and overall requests per second decreasing by nearly 20% when GOMAXPROCS was left at the node's core count instead of being set to the container's limit.
- GC Amplification: Go's concurrent garbage collector (GC) scales its work based on GOMAXPROCS. An excessively high GOMAXPROCS can cause the GC to initiate far more concurrent marking work than the available CPU resources can handle, exacerbating CPU throttling even when the application itself is not heavily loaded. In extreme cases, many GC worker goroutines may run simultaneously, briefly freezing user goroutine execution due to kernel scheduling.
- Runtime Scalability Costs: Running with a high GOMAXPROCS incurs additional runtime overhead, such as increased memory usage from per-P local caches (like mcache) and synchronization costs for work stealing and GC coordination between Ps. When GOMAXPROCS significantly exceeds the available CPU, these costs provide no corresponding benefit in parallel processing.
Limitations of the Proposal
It's important to note that this proposal primarily addresses scenarios where CPU Limits are explicitly set for containers. For the common configuration where Kubernetes users set CPU Requests but not Limits, this change will have no direct impact: GOMAXPROCS will still be based on the node's CPU count or affinity settings. Optimizing resource utilization for Pods with only CPU Requests remains a future area for exploration.
GOMAXPROCS Fundamentals
The GOMAXPROCS environment variable and the runtime.GOMAXPROCS() function control the maximum number of operating system threads that can simultaneously execute user-level Go code. It's crucial to understand that Go uses goroutines, which are lightweight, user-level threads, but these goroutines must be scheduled onto actual operating system threads managed by the kernel in order to run on CPU cores. GOMAXPROCS essentially limits the number of these OS threads that the Go runtime can use concurrently for executing goroutines.
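A quick illustration of the API: runtime.GOMAXPROCS doubles as a getter and a setter, and always returns the previous value.

```go
// Reading and setting GOMAXPROCS at runtime.
package main

import (
	"fmt"
	"runtime"
)

func main() {
	current := runtime.GOMAXPROCS(0) // an argument < 1 reads without modifying
	fmt.Println("current GOMAXPROCS:", current)

	previous := runtime.GOMAXPROCS(2) // cap parallel execution at 2 OS threads
	fmt.Println("previous GOMAXPROCS:", previous)
	fmt.Println("new GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```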
CPU Limits vs. Requests (Kubernetes)
In Kubernetes, it’s essential to distinguish between CPU Limits and CPU Requests:
- CPU Limit: This defines the maximum amount of CPU time a container is allowed to use. Kubernetes enforces this limit using Linux Cgroups, throttling the container if it tries to exceed the specified amount. A CPU limit of 1 means the container gets at most the equivalent of one full CPU core's worth of compute time, even if the underlying node has many more cores.
- CPU Request: This represents the minimum amount of CPU the container is guaranteed. The Kubernetes scheduler uses requests to decide which node is suitable to run the Pod, ensuring that the node has enough capacity to meet the requests of all its running Pods. However, a CPU request does not impose a hard limit on how much CPU the container can use if idle resources are available on the node.
The proposal primarily focuses on aligning GOMAXPROCS with the hard upper bound defined by CPU Limits.
Context Switching
Context switching is the process by which the operating system’s kernel switches the CPU from one thread to another. This involves saving the state of the currently running thread and restoring the state of the next thread to be executed. While necessary for multitasking, excessive context switching introduces overhead and reduces the overall efficiency of the system.
When GOMAXPROCS is set too high in a container with a low CPU limit, the Go runtime creates many more OS threads than can effectively run in parallel. These threads constantly compete for the limited CPU time, leading to a significant increase in the number of context switches performed by the kernel. As demonstrated by benchmarks, a mismatch between GOMAXPROCS and the CPU limit can cause context-switch counts to skyrocket, wasting valuable CPU cycles on managing threads instead of executing application code.
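One rough way to observe this on Linux is to read the per-process context-switch counters from /proc before and after a CPU-bound workload, once with GOMAXPROCS left at the node's core count and once with it set to the container's limit. The sketch below is illustrative and Linux-only; the counters in /proc/self/status cover a single task, so treat them as a rough signal (the same fields exist per thread under /proc/self/task/<tid>/status for a fuller picture).

```go
// Print the process's voluntary/nonvoluntary context-switch counters.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func printCtxtSwitches() {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "voluntary_ctxt_switches") ||
			strings.HasPrefix(line, "nonvoluntary_ctxt_switches") {
			fmt.Println(line)
		}
	}
}

func main() {
	printCtxtSwitches()
	// ... run a CPU-bound workload here, then call printCtxtSwitches() again
	// and compare the deltas under different GOMAXPROCS settings ...
}
```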
Related Concepts
- Cgroups (Control Groups): A Linux kernel feature that allows limiting and isolating resource usage (CPU, memory, I/O, etc.) for groups of processes. Container runtimes like Docker and Kubernetes rely heavily on Cgroups to enforce resource limits on containers.
- CPU Affinity: A kernel feature that restricts a process (or its threads) to run only on a specific set of CPU cores. The Go runtime already considers CPU affinity when determining the default GOMAXPROCS, and the new proposal will also take CPU affinity changes into account for automatic updates.
- Go Scheduler: The component within the Go runtime responsible for scheduling goroutines onto available OS threads for execution. An incorrectly configured GOMAXPROCS can negatively impact the efficiency of the Go scheduler, leading to suboptimal goroutine execution.
The upcoming changes in Go 1.25 represent a significant step forward in making Go a more performant and developer-friendly language for building cloud-native applications. By baking Cgroup awareness into the default GOMAXPROCS, the Go team is directly addressing a long-standing pain point and paving the way for more efficient resource utilization in containerized environments.
References
- [1]: Golang Performance Penalty in Kubernetes: https://blog.esc.sh/golang-performance-penalty-in-kubernetes/
- [2]: uber-go/automaxprocs: https://github.com/uber-go/automaxprocs
- [3]: Go proposal #73193: https://go.dev/issue/73193
More
Recent Articles:
- Go Sum You Should Know in Golang
- Common Optimizations You Should Know in Golang
More Series Articles about You Should Know In Golang:
https://wesley-wei.medium.com/list/you-should-know-in-golang-e9491363cd9a
And I’m Wesley, delighted to share knowledge from the world of programming.
Don’t forget to follow me for more informative content, or feel free to share this with others who may also find it beneficial. It would be a great help to me.
Give me some free claps, highlights, or replies, and I'll pay attention to those reactions, which will help me decide whether to keep posting this type of article.
See you in the next article. 👋
中文文章: https://programmerscareer.com/zh-cn/go-25-procs/
Author: Medium, LinkedIn, Twitter
Note: Originally written at https://programmerscareer.com/go-25-procs/ on 2025-04-24 01:28.
Copyright: BY-NC-ND 3.0