Go 1.24 Map Improvements: Swiss Tables & Performance Gains

Revolutionizing Maps: Swiss Tables in Go 1.24


Note: The core content was generated by an LLM, with human fact-checking and structural refinement.

This article provides a comprehensive overview of the improvements made to Go maps in Go 1.24, focusing on the adoption of Swiss Tables, the challenges faced, their impact on performance and memory, the historical context of hash tables, and future work for Go maps. It also details a significant memory regression discovered in Go 1.24 that was addressed in collaboration with the Go community.

Go 1.24 includes a completely new implementation of the built-in map type, based on the Swiss Table design. This update promises reduced CPU and memory overhead, especially for large maps.

Swiss Tables Implementation

Swiss Tables are a form of open-addressed hash table. Unlike traditional chaining methods, open-addressed hash tables store all items in a single backing array, using a probe sequence to find an empty slot if the initial slot is occupied.

Here’s how Swiss Tables improve upon traditional open-addressed hash tables:

  • Structure: The backing array is divided into logical groups, typically of 8 slots each. Each group has a 64-bit control word, where each byte corresponds to a slot. This control word indicates whether a slot is empty, deleted, or in use, and if in use, it stores the lower 7 bits of the key’s hash (called h2).

    • Old Go Map Structure (Go 1.23 and before): Maps were based on traditional buckets and chaining. Each bucket had 8 slots. When full, keys overflowed into linked overflow buckets.
    // Conceptual representation of the Go 1.23 map's bucket-based structure
    type hmap struct {
        // ... other fields
        buckets    unsafe.Pointer // array of buckets
        oldbuckets unsafe.Pointer // previous bucket array, kept live during incremental growth
        nevacuate  uintptr        // progress counter for evacuation
        // ...
    }

    type bmap struct { // a single bucket
        tophash  [8]uint8       // top 8 bits of the hash for each of the bucket's 8 slots
        keys     unsafe.Pointer // the 8 keys
        values   unsafe.Pointer // the 8 values
        overflow unsafe.Pointer // link to an overflow bucket when this one is full
    }
  • Efficient Lookup and Insertion:

    1. The hash(key) is split into h1 (upper 57 bits) and h2 (lower 7 bits).
    2. h1 selects the initial group to consider.
    3. Within a group, the control word allows for parallel comparison of the target h2 with all 8 slots’ h2 values simultaneously. This is often supported by SIMD hardware, but can also be implemented using standard arithmetic and bitwise operations, effectively performing 8 probe steps at once (a conceptual sketch follows after this list).
    4. If multiple h2 values match, the full key comparison is performed on those candidate slots (there’s a 1/128 probability of h2 collision).
    5. Collision Resolution: If a group is full, the probe sequence continues to the next group until an empty slot is found. This eliminates the need for overflow buckets entirely, a key difference from Go 1.23’s approach.
  • Load Factor: Swiss Tables can safely operate with a higher maximum load factor (87.5% or 7/8) due to faster probing, compared to Go 1.23’s 81.25% (13/16). This means more elements can be stored in fewer slots, saving memory.
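
To make the control-word probe concrete, below is a conceptual sketch (not the Go runtime's actual code) of a group and of an h2 match done with plain bitwise arithmetic (SWAR) instead of SIMD. The type and function names (group, slot, splitHash, matchH2) and the string-keyed slot are illustrative assumptions.

// Conceptual sketch of a group and of an h2 probe done with plain bitwise
// arithmetic (SWAR) instead of SIMD; not the Go runtime's actual code.
type group struct {
    ctrl  uint64  // one control byte per slot: empty, deleted, or the slot's h2
    slots [8]slot // the 8 key/value slots of this group
}

type slot struct {
    key   string
    value int
}

// splitHash splits a 64-bit hash into h1 (upper 57 bits, selects a group)
// and h2 (lower 7 bits, stored in the control word).
func splitHash(hash uint64) (h1 uint64, h2 uint8) {
    return hash >> 7, uint8(hash & 0x7f)
}

// matchH2 compares h2 against all 8 control bytes at once and returns a mask
// whose per-byte high bit is set where a control byte equals h2. The trick can
// set a stray bit next to a real match, which is harmless: every candidate slot
// is confirmed with a full key comparison anyway.
func matchH2(ctrl uint64, h2 uint8) uint64 {
    const lsb = 0x0101010101010101
    const msb = 0x8080808080808080
    diff := ctrl ^ (lsb * uint64(h2)) // a byte of diff is zero iff that control byte == h2
    return (diff - lsb) &^ diff & msb
}

On a lookup, h1 picks the starting group, matchH2 narrows the 8 slots down to candidates in a single step, and only those candidates are compared against the full key.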

Go-Specific Challenges and Solutions

Go’s built-in map type has unique properties that required specific adaptations for the Swiss Table design.

  • Incremental Growth: Traditional hash tables typically grow by doubling the backing array and copying all entries at once, which can cause arbitrarily large latency spikes for very large maps. Go is frequently used for latency-sensitive servers and aims to bound the latency impact of a single map insertion.
    • Go’s Solution: Go 1.24 addresses this by introducing another layer of indirection: each map consists of multiple independent Swiss Tables. This is a form of extendible hashing, where a variable number of upper bits of the hash select which table a key belongs to. Each individual Swiss Table is capped at a maximum of 128 groups (1024 entries). When a table needs to grow, it does so all at once, but only affects that specific sub-table, capping the maximum work per insertion to copying 1024 entries. This also means old arrays are not kept in memory during migration, which was a memory overhead in Go 1.23. A conceptual sketch of this layout follows after this list.
  • Modification During Iteration: The Go language specification explicitly allows map modifications during iteration with specific semantics (deleted entries not produced, updated values produced, new entries may or may not be produced). This is challenging for many hash table designs, including Abseil’s Swiss Tables, which forbid it.
    • Go’s Solution: To handle this, the iterator maintains a reference to the table being iterated over. If that table grows, the iterator continues using the old version for order, but consults the grown table before returning an entry to ensure it still exists and to retrieve the latest value. This complex logic covers the core semantics, though iteration remains the most intricate part of Go’s map implementation.
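
Below is a conceptual sketch of this two-level layout (again illustrative, not the runtime's actual code), reusing the group type from the earlier sketch; the field and type names are assumptions.

// Conceptual sketch of the two-level layout: a directory indexed by the upper
// bits of the hash selects one of several independent tables.
type swissMap struct {
    globalDepth uint8    // number of upper hash bits used to index the directory
    directory   []*table // len(directory) == 1 << globalDepth
}

type table struct {
    localDepth uint8   // upper hash bits this table actually distinguishes
    groups     []group // at most 128 groups, i.e. at most 1024 slots
}

// tableFor picks the sub-table responsible for a given hash.
func (m *swissMap) tableFor(hash uint64) *table {
    idx := hash >> (64 - m.globalDepth) // the top globalDepth bits of the hash
    return m.directory[idx]
}

When a sub-table hits its 1024-slot cap, only that table is split and rehashed, so the worst-case work for any single insertion stays bounded.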

Impact & Benefits

The Go 1.24 map improvements have led to significant performance gains and memory savings.

  • Performance: In microbenchmarks, map operations are up to 60% faster than in Go 1.23. Overall, in full application benchmarks, a geometric mean CPU time improvement of around 1.5% was observed.
  • Memory Footprint Reduction: Swiss Tables contribute to lower memory usage due to their higher load factor and the elimination of overflow buckets. The efficient growth mechanism in Go 1.24, by splitting tables independently, also significantly reduces memory overhead during expansion compared to Go 1.23’s incremental migration that kept old buckets in memory.

Datadog Case Study: Datadog experienced a surprising memory usage drop in high-traffic environments after upgrading to Go 1.24. This was primarily due to the shardRoutingCache map.

// The key is the routing key derived from the data payload.
shardRoutingCache map[string]Response

type ShardType int // enum representing the shard type (hot, warm, cold)

type Response struct {
    ShardID      int32
    ShardType    ShardType
    RoutingKey   string     // (we'll tackle later why the routing key is stored both as key and value!)
    LastModified *time.Time // timestamp at which the entry was last modified
}
  • Go 1.23 Memory Usage: For approximately 3,500,000 elements, Go 1.23’s bucket arrays (including old buckets during incremental growth) and pre-allocated overflow buckets resulted in an estimated 726.4 MiB of memory for the map structure itself.
  • Go 1.24 Memory Usage: For the same 3,500,000 elements, Go 1.24’s Swiss Table implementation (requiring ~500,000 groups distributed across ~3900 independent tables) resulted in an estimated 217 MiB.
  • Quantified Savings: This translates to a saving of roughly 500 MiB of live heap usage for the shardRoutingCache map, or about 1 GiB of physical memory (RSS) when factoring in Go’s Garbage Collector (GOGC=100). This represents a ~70% reduction in map memory usage for large maps. A rough back-of-envelope check of these figures follows below.
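
The following sketch is a rough back-of-envelope check of these estimates, assuming a 64-bit platform, a 16-byte string header per key, and the 40-byte Response value (roughly 56 bytes per key/value pair, as computed later in this article). The bucket and group counts are derived from the element count and load factors; they are approximations, not Datadog's exact accounting.

// Rough back-of-envelope check of the estimates above. Assumptions: 64-bit platform,
// 16-byte string header per key, 40-byte Response value (~56 bytes per key/value pair).
package main

import "fmt"

func main() {
    const pairSize = 56 // bytes per key/value pair (16 B string key + 40 B Response)

    // Go 1.23: an 8-slot bucket = 8 tophash bytes + 8 pairs + an overflow pointer.
    const bucketSize = 8 + 8*pairSize + 8 // 464 B
    // ~3.5M elements need roughly 2^20 buckets, and the 2^19 old buckets stay live
    // during incremental growth; pre-allocated overflow buckets come on top of this.
    fmt.Println("Go 1.23 MiB:", ((1<<20)+(1<<19))*bucketSize>>20) // ~696 MiB before overflow buckets

    // Go 1.24: an 8-slot group = an 8-byte control word + 8 pairs, no overflow pointers.
    const groupSize = 8 + 8*pairSize // 456 B
    fmt.Println("Go 1.24 MiB:", 500_000*groupSize>>20) // ~217 MiB for ~500,000 groups
}

Pre-allocated overflow buckets likely account for most of the remaining gap between the ~696 MiB computed here and the ~726 MiB figure above.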

Why Lower-Traffic Environments Didn’t See the Same Gains: In environments with fewer elements (e.g., 550,000 elements), the memory savings from Swiss Tables (approx. 28 MiB) were significantly smaller. This was not enough to offset a separate ~200-300 MiB RSS increase caused by a mallocgc memory regression in Go 1.24 (discussed below). This highlights that the benefits of Swiss Tables are maximized when handling large maps.

Application-Level Optimizations by Datadog: Beyond the runtime improvements, Datadog further optimized memory by refining their Response struct.

  • They discovered that RoutingKey and LastModified fields in the Response struct were never populated for this specific use case, despite being present.

  • The ShardType field, an int (8 bytes), was changed to a uint8 (1 byte) as it only had three possible enum values.

  • A new cachedResponse type was introduced, containing only ShardID and ShardType:

    type cachedResponse struct {
        ShardID   int32
        ShardType uint8
    }

These changes reduced the size of a single key-value pair from 56 bytes to 24 bytes (including padding), resulting in an additional ~250 MiB RSS decrease per pod in high-traffic environments. This demonstrates the importance of scrutinizing application-level data structures alongside runtime optimizations.
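
A quick, illustrative way to sanity-check those per-pair sizes (assuming the type definitions shown above and a 64-bit platform):

// Illustrative size check for the structs above (sizes assume a 64-bit platform).
package main

import (
    "fmt"
    "time"
    "unsafe"
)

type ShardType int

type Response struct {
    ShardID      int32
    ShardType    ShardType
    RoutingKey   string
    LastModified *time.Time
}

type cachedResponse struct {
    ShardID   int32
    ShardType uint8
}

func main() {
    keyHeader := unsafe.Sizeof("") // 16-byte string header for the map key
    fmt.Println(keyHeader + unsafe.Sizeof(Response{}))       // 16 + 40 = 56 bytes per old pair
    fmt.Println(keyHeader + unsafe.Sizeof(cachedResponse{})) // 16 + 8  = 24 bytes per new pair
}

At roughly 3.5 million entries, 32 fewer bytes per pair is on the order of 100 MiB of live heap, which, amplified by the GC headroom under GOGC=100, is consistent with the ~250 MiB RSS drop described above.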

Cost Reductions: The memory savings translate into two main options for cost reduction:

  1. Lowering Kubernetes memory limits: Allows other applications to use the freed memory.
  2. Trading memory for CPU with GOMEMLIMIT: For CPU-bound workloads, saved memory can be traded for CPU, potentially allowing for a reduction in the number of pods (a minimal sketch follows below).
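
As a minimal sketch of the second option (the 2 GiB value is purely illustrative), the soft memory limit can be set without code changes via the GOMEMLIMIT environment variable (e.g. GOMEMLIMIT=2GiB) or programmatically:

// Minimal sketch of capping memory at runtime; the 2 GiB value is illustrative only.
package main

import "runtime/debug"

func main() {
    // With a soft memory limit in place, the GC runs more aggressively as the limit
    // is approached, trading CPU for a bounded heap: the memory-for-CPU trade-off
    // described above.
    debug.SetMemoryLimit(2 << 30) // ~2 GiB
}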

Overall Takeaways: The Datadog investigation provided valuable insights:

  • Go 1.24’s Swiss Tables are a significant win for large map-heavy applications, offering substantial memory and performance gains.
  • Vigilant observation and detailed runtime metrics (like RSS) and heap profiling are crucial for identifying and diagnosing subtle issues and optimizations across Go versions.
  • Runtime-level improvements and application-level optimizations are complementary; small changes in data structure definitions can have a major impact at scale.
  • The experience reaffirmed the power of community collaboration in resolving complex engineering challenges.

Historical Context of Hash Tables

The concept of hash tables dates back to 1953, when Hans Peter Luhn described placing items into “buckets” and using linked lists for overflow, a method now known as chaining. In 1954, Gene M. Amdahl, Elaine M. McGraw, and Arthur L. Samuel first used an open addressing scheme with linear probing, which was formalized and published by W. Wesley Peterson in 1957. Despite their long history, hash table data structures, like sorting algorithms (e.g., Go 1.19 switching to pattern-defeating quicksort), continue to see advancements. The Swiss Table design itself was presented by Sam Benzaquen, Alkis Evlogimenos, Matt Kulukundis, and Roman Perepelitsa at Google in 2017 and open-sourced in 2018 in the Abseil C++ library.

Future Work for Go Maps

The Go team plans to investigate further map improvements for future releases:

  • Increased locality of operations: Aiming to reduce CPU cache misses.
  • Improved control word comparisons: Extending SIMD support to other architectures beyond amd64 and potentially increasing the group size to 16 slots to perform 16 hash comparisons in parallel instead of 8. This would further decrease the average number of probes for lookups.

The Go 1.24 built-in map implementation is heavily based on the work of Peter Mattis, who combined initial ideas from YunHao Zhang, PJ Malloy, and @andy-wm-arthur to build github.com/cockroachdb/swiss, a Go-spec compliant Swiss Table implementation.

Go 1.24 Memory Regression (Part 1)

Datadog’s story began with tracking down an unexpected memory usage increase (a ~20% rise in RSS) after deploying Go 1.24 to data-processing services, despite Go 1.24 being advertised to reduce memory usage.

  • Initial Investigation: They ruled out Swiss Tables (by using GOEXPERIMENT=noswissmap) and the spin bit mutex change (GOEXPERIMENT=nospinbitmutex) as direct causes, as disabling them showed no improvement.
  • Discrepancy in Metrics: A puzzling contradiction emerged: system metrics (RSS) showed increased memory usage, while Go’s internal runtime metrics remained stable. RSS measures physical memory, while Go’s runtime primarily tracks virtual memory.
  • Theory: The theory was that Go 1.24 was causing previously uncommitted virtual memory to be committed to physical RAM, explaining the RSS increase without changes in Go’s internal accounting.
  • smaps Analysis: Using Linux’s /proc/[pid]/smaps, Datadog confirmed that the Go heap was the only memory region impacted by the RSS increase.
  • Root Cause: In collaboration with the Go community, a git bisect pointed to a significant refactoring of the mallocgc function in the Go runtime. The refactoring inadvertently removed an optimization where Go avoided re-zeroing freshly obtained memory from the OS for large objects (>32 KiB) containing pointers. This resulted in unconditional zeroing, causing more virtual memory pages to be committed to physical RAM, thus increasing RSS. Datadog’s service used large channel buffers with pointer-containing structs, which aligned perfectly with this finding. A hypothetical example of this kind of allocation follows after this list.
  • Fix: A fix was developed and will be included in Go 1.25.
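
Purely as an illustration of the allocation class involved (hypothetical code, not Datadog's): a large buffered channel whose element type contains pointers is backed by a single array larger than 32 KiB, which is exactly the kind of allocation affected by the change.

// Hypothetical illustration of the allocation pattern described above; not Datadog's code.
package main

type event struct {
    payload []byte            // contains a pointer
    attrs   map[string]string // contains a pointer
}

func main() {
    // The channel's backing buffer is a single large object: 4096 elements * 32 B ≈ 128 KiB,
    // well above the 32 KiB large-object threshold, and the element type contains pointers.
    ch := make(chan event, 4096)
    _ = ch
}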

Despite this regression in lower-traffic environments, the significant gains from Swiss Tables in high-traffic environments (where the shardRoutingCache map dominated memory) ultimately made Go 1.24 a net win for Datadog’s services.



More Series Articles about You Should Know In Golang:

https://wesley-wei.medium.com/list/you-should-know-in-golang-e9491363cd9a

And I’m Wesley, delighted to share knowledge from the world of programming. 

Don’t forget to follow me for more informative content, or feel free to share this with others who may also find it beneficial. It would be a great help to me.

Give me some free claps, highlights, or replies; I pay attention to those reactions, and they determine whether I continue to post this type of article.

See you in the next article. 👋

中文文章: https://programmerscareer.com/zh-cn/go-24-swiss/
Author: Medium, LinkedIn, Twitter
Note: Originally written at https://programmerscareer.com/go-24-swiss/ at 2025-07-31 22:52.
Copyright: BY-NC-ND 3.0
