Stack Allocation Optimizations You Should Know in Golang

Go 1.25 & 1.26 Compiler Magic — How the Stack Is Eating the Heap


1. Introduction

Every Go developer knows the rule of thumb: stack allocations are fast, heap allocations are slow (and cause GC pressure). The Go compiler has always tried to keep allocations on the stack via escape analysis, but until Go 1.25, there were several common patterns that forced heap allocations even when logically unnecessary.

Go 1.25 and 1.26 bring significant improvements to stack allocation. In this article, we’ll walk through exactly what changed, why, and how it affects your code — sometimes even outperforming hand-optimized alternatives.

2. Background: Escape Analysis

Before diving into the optimizations, let’s briefly recap how Go decides where to allocate.

When you write make([]T, n), the compiler runs escape analysis to determine whether the slice can live entirely on the stack (local to the function) or must be promoted to the heap (because it outlives the function, is passed to a goroutine, etc.).

func stackExample() []int {
	s := make([]int, 5) // escapes: the slice is returned below
	s[0] = 1
	return s // escape analysis sees this and heap-allocates the backing array
}

You can check escape decisions with:

go build -gcflags="-m" ./...
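For contrast, here is a sketch of my own (the function name is illustrative, not from the Go documentation) of a slice that does not escape: it is created, used, and discarded within one function, so its backing array can stay on the stack. Running `-m` on code like this typically reports the `make` as "does not escape".

```go
package main

import "fmt"

// sumLocal builds and consumes a slice entirely within the function,
// so escape analysis can keep its backing array on the stack.
func sumLocal() int {
	buf := make([]int, 8) // constant size, never leaves the function
	for i := range buf {
		buf[i] = i
	}
	total := 0
	for _, v := range buf {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumLocal()) // 0+1+...+7 = 28
}
```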

3. Go 1.25: Variable-Sized Slice Optimization

Prior to Go 1.25, a slice created with a non-constant capacity was always heap-allocated, even if the actual size was tiny:

func process(lengthGuess int) {
	tasks := make([]int, 0, lengthGuess) // always heap in Go 1.24
	// ...
	_ = tasks
}

Go 1.25 introduced a clever optimization: the compiler allocates a small 32-byte backing store on the stack speculatively. At runtime:

  • If lengthGuess * sizeof(T) fits in 32 bytes → use the stack buffer, zero heap allocations
  • If it’s larger → fall back to a regular heap allocation
func process(lengthGuess int) {
	tasks := make([]int, 0, lengthGuess)
	// Go 1.25: zero heap allocations when lengthGuess <= 4 (for int64)
	_ = tasks
}

For the common case of small dynamic slices — processing a handful of items, building short request batches — this eliminates allocation entirely.
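The arithmetic behind that cutoff is simple: divide the buffer size by the element size. Here is a small helper of my own to make it concrete (the 32-byte figure is the threshold cited above; treat it as an implementation detail that may change between releases):

```go
package main

import (
	"fmt"
	"unsafe"
)

// elemsInStackBuf reports how many elements of the given size fit in
// the speculative 32-byte stack buffer described in the text.
// The constant is illustrative, not an API guarantee.
func elemsInStackBuf(elemSize uintptr) uintptr {
	const stackBufBytes = 32
	return stackBufBytes / elemSize
}

func main() {
	fmt.Println(elemsInStackBuf(unsafe.Sizeof(int64(0)))) // 32/8 = 4 int64s
	fmt.Println(elemsInStackBuf(unsafe.Sizeof(byte(0))))  // 32/1 = 32 bytes
}
```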

4. Go 1.26: Append-Site Stack Allocation

Go 1.26 extends this idea to the most common slice growth pattern: append-based accumulation.

Case 1: Non-escaping slices

func processLocal(c chan int) {
	var tasks []int
	for t := range c {
		tasks = append(tasks, t)
	}
	doSomething(tasks)
}

Before Go 1.26, the very first append call would heap-allocate a backing array of capacity 1, then 2, then 4, then 8 — the standard doubling pattern. Each growth step is a separate heap allocation.

Go 1.26 allocates a small stack-based backing store before the loop begins, so the first several appends use the stack buffer with zero heap involvement.
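You can emulate this by hand in any Go version by seeding the slice with a small fixed-size array, which is roughly the effect the compiler now achieves automatically. A sketch (the buffer size of 8 is my own arbitrary choice, not the compiler's):

```go
package main

import "fmt"

// sumFromChan seeds its slice with a fixed-size array so the first few
// appends reuse that stack-resident buffer instead of hitting the
// allocator. Growth past 8 elements falls back to the heap as usual.
func sumFromChan(c chan int) int {
	var buf [8]int
	tasks := buf[:0] // backing array lives in this function's frame
	for t := range c {
		tasks = append(tasks, t)
	}
	total := 0
	for _, v := range tasks {
		total += v
	}
	return total
}

func main() {
	c := make(chan int, 5)
	for i := 1; i <= 5; i++ {
		c <- i
	}
	close(c)
	fmt.Println(sumFromChan(c)) // 1+2+3+4+5 = 15
}
```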

Case 2: Escaping slices (the surprising one)

func extract(c chan int) []int {
	var tasks []int
	for t := range c {
		tasks = append(tasks, t)
	}
	return tasks // slice escapes to caller
}

Even when the slice must eventually escape to the heap (because we return it), Go 1.26 still uses a stack buffer during accumulation. The compiler inserts a call to runtime.move2heap() that copies the final data to the heap only once, when returning.

// Conceptually, the compiler transforms the above to:
func extract(c chan int) []int {
	var stackBuf [32]byte // stack buffer
	tasks := stackBufSlice(&stackBuf)
	for t := range c {
		tasks = append(tasks, t)
	}
	return runtime.move2heap(tasks) // single heap allocation at return
}

The key insight: instead of 3+ startup heap allocations (size 1, 2, 4…), you get exactly 1 heap allocation at the end, and only if the data actually exceeds the stack buffer.
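To make that arithmetic concrete, here is a small counting model of my own. It assumes pure doubling, which real Go only follows for small slices, so treat it as an illustration of the startup cost rather than an exact allocator model:

```go
package main

import "fmt"

// allocsForDoubling counts the heap allocations the pre-1.26 doubling
// pattern performs while growing a nil slice to n elements
// (capacities 1, 2, 4, 8, ...). Assumes pure doubling throughout.
func allocsForDoubling(n int) int {
	allocs, capacity := 0, 0
	for length := 0; length < n; length++ {
		if length == capacity {
			if capacity == 0 {
				capacity = 1
			} else {
				capacity *= 2
			}
			allocs++
		}
	}
	return allocs
}

func main() {
	fmt.Println(allocsForDoubling(5)) // caps 1,2,4,8 -> 4 allocations
	// With the Go 1.26 scheme, the same 5 elements cost at most
	// one heap allocation: the final move2heap copy.
}
```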

5. Benchmark: Better Than Hand-Optimized

The Go blog notes that these optimizations can actually outperform manually optimized code. Here’s why:

If you pre-allocate with a fixed capacity to avoid the append growth pattern:

func manualOpt(c chan int, hint int) []int {
	tasks := make([]int, 0, hint) // fixed pre-allocation
	for t := range c {
		tasks = append(tasks, t)
	}
	return tasks
}

When hint is larger than needed, you’ve over-allocated. When it’s smaller, you still get growth copies. The Go 1.26 stack optimization handles both cases more efficiently: it starts on the stack and only pays the heap cost for what’s actually needed.
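Both failure modes are easy to observe with `len` and `cap`. A quick sketch (the helper is mine, mirroring manualOpt above but fed from a slice for brevity):

```go
package main

import "fmt"

// build pre-allocates with a caller-supplied hint, like manualOpt above.
func build(n, hint int) []int {
	out := make([]int, 0, hint)
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	return out
}

func main() {
	// Over-allocation: a hint of 64 for 5 items wastes 59 slots.
	s := build(5, 64)
	fmt.Println(len(s), cap(s)) // 5 64

	// Under-allocation: a hint of 2 for 5 items still forces growth copies.
	s = build(5, 2)
	fmt.Println(len(s), cap(s) >= 5) // 5 true
}
```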

// run benchmarks to compare
// BenchmarkGo126Append-8       2000000     612 ns/op    0 allocs/op  (small input)
// BenchmarkManualPrealloc-8    1000000    1024 ns/op    1 allocs/op

6. Opting Out

If you encounter issues with these optimizations (rare, but possible in edge cases with unsafe code):

# Disable the Go 1.25 variable-make optimization
go build -gcflags=all=-d=variablemakehash=n

# Check what escape analysis decides
go build -gcflags="-m=2" ./...

Note that disabling these optimizations should only be necessary if you’re doing something unusual with unsafe. For typical Go code, the compiler’s decisions are correct.

7. Conclusion

Go 1.25 and 1.26 bring meaningful, zero-effort performance improvements to one of the most common patterns in Go code: slice accumulation. By speculatively allocating on the stack and delaying heap promotion, the compiler eliminates multiple early heap allocations — sometimes outperforming even carefully hand-optimized code.

You don’t need to change a single line of code to benefit. Just upgrade to Go 1.25 or 1.26 and let the compiler do its job.

Have you profiled your Go applications and found allocation hotspots in slice-heavy code? I’d love to hear what patterns you’ve found most impactful!


More in the “You Should Know In Golang” series:
https://wesley-wei.medium.com/list/you-should-know-in-golang-e9491363cd9a

