Common Optimizations You Should Know in Golang

Golang Performance Optimization: Tips and Tricks to Improve Your Code.

(Gopher illustration from github.com/MariaLetta/free-gophers-pack)

Hello, this is Wesley. Today’s article is about common performance optimizations in Go. Without further ado, let’s get started.💪

What is Performance Optimization?

Performance optimization is the process of making code and systems run faster and more efficiently. It’s not just about raw speed; it also involves balancing different resources (such as CPU and memory) so that programs run well in a variety of situations.

In the process of performance optimization, several aspects are typically involved:

  1. CPU Utilization: Ensure that programs can effectively utilize CPU resources.
  2. Memory Usage: Optimize memory allocation and usage, reducing unnecessary memory consumption.
  3. Input/Output Efficiency: Improve I/O operation efficiency, such as file reading/writing, network communication, etc.
  4. Concurrent Processing: Use effective concurrent processing mechanisms to improve program throughput and response speed.

When multiple goroutines collaborate, it’s inevitable that they will need to read and write the same block of memory. In such cases we usually introduce locks. However, in concurrent programming locks often become a performance bottleneck, so we need to carefully examine every place a lock is used and minimize or eliminate the overhead it introduces.

CPU-Oriented Optimization

When writing high-performance Go programs, understanding and optimizing CPU usage is crucial. CPU-oriented optimization primarily involves the following aspects:

  1. Cache Friendliness

Cache friendliness refers to how code accesses memory in a way that matches the behavior of CPU caches, thereby improving performance. Here are some methods for achieving cache friendliness:

  • Contiguous Memory Allocation: Use contiguous memory blocks as much as possible, which can effectively utilize CPU caches.
  • Reduce Cache Misses: Avoid random access to large arrays, as this will cause frequent cache misses.

Example: If you have an array that needs to be traversed extensively, it’s best to design it as a contiguous memory block rather than scattered nodes (see the sketch after this list).

  2. Reducing Lock Contention

In concurrent programming, locks are unavoidable, but frequent contention can lead to performance degradation. Here are some methods for reducing lock contention:

  • Fine-Grained Locks: Break a large lock into several smaller, fine-grained locks to reduce contention on any single lock.
  • Lock-Free Data Structures: Use lock-free data structures; for example, the sync/atomic package provides atomic operations that enable lock-free programming.

  3. Avoiding Frequent Memory Allocation

Frequent memory allocation and deallocation puts significant pressure on the garbage collector. Here are some methods for reducing memory allocation:

  • Object Reuse: Use object pools to reuse already allocated objects instead of frequently creating new ones.
  • Pre-Allocate Memory: Pre-allocate sufficient memory when storing large amounts of data, which can reduce runtime memory allocation overhead.
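
To make the cache-friendliness point in item 1 concrete, here is a minimal sketch; the Node type and the sumList/sumSlice functions are illustrative names, not part of any library. Summing a contiguous slice touches memory sequentially, while chasing pointers through a linked list tends to defeat the CPU cache:

package main

import "fmt"

// Node is a heap-allocated linked-list element; following Next pointers
// jumps around memory and tends to cause cache misses.
type Node struct {
    Value int
    Next  *Node
}

// sumList traverses pointer-linked nodes (cache-unfriendly access pattern).
func sumList(head *Node) int {
    sum := 0
    for n := head; n != nil; n = n.Next {
        sum += n.Value
    }
    return sum
}

// sumSlice walks one contiguous block of memory (cache-friendly access pattern).
func sumSlice(values []int) int {
    sum := 0
    for _, v := range values {
        sum += v
    }
    return sum
}

func main() {
    const n = 1_000_000
    values := make([]int, n)
    var head *Node
    for i := 0; i < n; i++ {
        values[i] = i
        head = &Node{Value: i, Next: head}
    }
    // Both return the same sum; benchmark them to see the cache effect.
    fmt.Println(sumSlice(values), sumList(head))
}

In a quick benchmark, the slice version is typically several times faster for large inputs, precisely because its memory accesses are sequential and cache-friendly.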

Reducing Lock Contention

Coarse-Grained Locks

Lock contention is a common performance bottleneck that occurs when multiple goroutines compete for the same lock at the same time. Our goal is to make that contention as unlikely as possible.

For example, let’s consider implementing a website visitor counter with two variables: uv (unique visitors) and pv (page views). The uv variable rarely changes, while the pv variable frequently changes. We can implement the following code:

type Website struct {
    mu sync.Mutex
    uv int // unique visitors, rarely change
    pv int // page views, frequently change
}

func (w *Website) Add(uv, pv bool) {
    w.mu.Lock()
    if uv {
        w.uv++
    }
    if pv {
        w.pv++
    }
    w.mu.Unlock()
}

However, this code uses the same lock to control both variables, which increases the probability of lock contention, especially for the rarely changing uv variable. To mitigate this issue, we can use separate locks for each variable:

type WebsiteSplit struct {
    muv sync.Mutex
    uv  int // unique visitors, rarely change
    mpv sync.Mutex
    pv  int // page views, frequently change
}

func (w *WebsiteSplit) AddUV() {
    w.muv.Lock()
    w.uv++
    w.muv.Unlock()
}

func (w *WebsiteSplit) AddPV() {
    w.mpv.Lock()
    w.pv++
    w.mpv.Unlock()
}

Read-Write Locks

If a variable or a set of variables needs to be read and written simultaneously, with the probability of reads being much higher than writes, then a read-write lock is suitable. Read-write locks allow concurrent reads while ensuring that writes are mutually exclusive, further reducing lock contention.

type WebsiteRW struct {
    mu sync.RWMutex
    uv int
}

func (w *WebsiteRW) AddUV() {
    w.mu.Lock()
    w.uv++
    w.mu.Unlock()
}

func (w *WebsiteRW) UV() (uv int) {
    w.mu.RLock()
    uv = w.uv
    w.mu.RUnlock()
    return uv
}

Note that for integer variables, atomic operations are often more suitable. The above code is only intended as a demonstration scenario.
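
As a hedged illustration of that note, the same counter can be kept with sync/atomic instead of a mutex; the WebsiteAtomic type below is just an illustrative sketch:

import "sync/atomic"

// WebsiteAtomic tracks a counter without any mutex.
type WebsiteAtomic struct {
    uv int64 // accessed only through the sync/atomic functions below
}

func (w *WebsiteAtomic) AddUV() {
    atomic.AddInt64(&w.uv, 1)
}

func (w *WebsiteAtomic) UV() int64 {
    return atomic.LoadInt64(&w.uv)
}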

Reducing Lock Hold Time

The probability of lock contention is directly related to how long a lock is held, so in addition to reducing lock granularity, we can also shorten the critical section by moving work that doesn’t need the lock to after the lock is released:

type Buffer struct {
    mu    sync.Mutex
    queue []string
}

func (w *Buffer) Flush() {
    w.mu.Lock()
    tmp := w.queue
    w.queue = nil
    w.mu.Unlock() // unlock before handler
    batchHandler(tmp)
}

Sharding Locks

If concurrent access to the same variable is likely to cause contention, we can also spread accesses across different variables to reduce conflicts. Sharding locks are one implementation of this idea.
A sharding lock requires that the data can be partitioned into shards; a map is a typical example:

func NewMap() *Map {
    n := runtime.GOMAXPROCS(0)
    m := &Map{
        locks: make([]sync.Mutex, n),
        mm:    make([]map[string]string, n),
    }
    // each shard needs its own initialized map
    for i := range m.mm {
        m.mm[i] = make(map[string]string)
    }
    return m
}

type Map struct {
    locks []sync.Mutex
    mm    []map[string]string
}

func (o *Map) Set(key, val string) {
    shard := hash(key) % len(o.locks)
    o.locks[shard].Lock()
    o.mm[shard][key] = val
    o.locks[shard].Unlock()
}

Here, we set the number of shards to the number of Ps reported by runtime.GOMAXPROCS(0). Each time Set is called, the key is hashed to determine which shard it belongs to, and only that shard’s map is modified. In theory, this reduces the probability of lock contention to roughly 1/N of the original, where N is the number of shards.
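
The hash function isn’t shown in the snippet above; one possible implementation, sketched here as an assumption, uses the standard library’s FNV-1a hash:

import "hash/fnv"

// hash maps a key to a non-negative int suitable for modulo sharding.
func hash(key string) int {
    h := fnv.New32a()
    _, _ = h.Write([]byte(key))        // Write on an fnv hash never returns an error
    return int(h.Sum32() & 0x7fffffff) // mask keeps the value non-negative on 32-bit platforms
}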

Lock-Free Programming

Other languages often provide a ThreadLocal concept, which Go does not expose explicitly. However, we can still reach some of the runtime’s internal per-P operations by using the //go:linkname directive:

import _ "unsafe" // required for //go:linkname

//go:linkname procPin runtime.procPin
func procPin() int

//go:linkname procUnpin runtime.procUnpin
func procUnpin()

procPin pins the current goroutine to its P (temporarily disabling preemption) and returns that P’s ID, so nothing else can touch the per-P data while the pin is held. This allows us to implement a local object pool for each P:

type Cache struct {
    pool [][]*Object // one free list per P
}

func NewCache() *Cache {
    return &Cache{pool: make([][]*Object, runtime.GOMAXPROCS(0))}
}

func (c *Cache) Get() (*Object, error) {
    pid := procPin()
    p := c.pool[pid]
    if len(p) == 0 {
        procUnpin()
        return &Object{}, nil
    }
    o := p[len(p)-1]
    c.pool[pid] = p[:len(p)-1]
    procUnpin()
    return o, nil
}

func (c *Cache) Put(o *Object) {
    pid := procPin()
    c.pool[pid] = append(c.pool[pid], o)
    procUnpin()
}

This pattern is commonly used in caching scenarios where any pooled object will do and an object does not have to be returned to the same P it was taken from.

Note that directly using the //go:linkname directive to expose internal runtime functions is an unsafe and unsupported practice. This method may cause unpredictable behavior or compatibility issues across different Go versions.

For scenarios requiring object pool functionality, the Go standard library provides sync.Pool, which is a high-performance object reuse pool. sync.Pool maintains a local pool for each P, reducing lock contention between different Ps and improving performance.
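
For comparison with the hand-rolled per-P cache above, here is a minimal sync.Pool sketch; the bufPool variable and render function are illustrative, and the pool’s New field ensures Get never returns nil:

import (
    "bytes"
    "sync"
)

// bufPool hands out reusable *bytes.Buffer values; New runs when the pool is empty.
var bufPool = sync.Pool{
    New: func() any { return new(bytes.Buffer) },
}

func render(s string) string {
    buf := bufPool.Get().(*bytes.Buffer)
    defer func() {
        buf.Reset() // always reset state before returning an object to the pool
        bufPool.Put(buf)
    }()
    buf.WriteString(s)
    return buf.String()
}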

Avoiding Frequent Memory Allocation

Go’s built-in garbage collector saves developers from manually managing object lifetimes, but it also sacrifices some performance. For everyday business development, this trade-off may be acceptable, but if we need to write low-level code that requires extreme performance, this language-level trade-off becomes a bottleneck.

Pre-allocating Objects

In Go, slices and maps grow dynamically, and each resize can carry a noticeable performance cost. A resize operation typically involves the following steps:

  1. Memory reallocation: When the current capacity is insufficient to hold new elements, the system needs to allocate a larger memory block.
  2. Data copying: Copying original data from the old memory region to the newly allocated memory region.
  3. Increased garbage collection pressure: The old memory region needs to be reclaimed by the garbage collector after it’s no longer used, which can increase the garbage collection burden.

These operations consume additional CPU and memory resources, affecting program performance. Therefore, when possible, it’s recommended to pre-allocate suitable capacity at initialization time to avoid frequent resize operations.
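
A minimal sketch of that advice, assuming the final element count (or a reasonable estimate) is known up front; the buildIndex function is purely illustrative:

import "strconv"

// buildIndex pre-allocates both the slice and the map so that appending and
// inserting n elements never triggers a reallocation or rehash.
func buildIndex(n int) ([]int, map[string]int) {
    ids := make([]int, 0, n)         // length 0, capacity n
    index := make(map[string]int, n) // size hint for the map
    for i := 0; i < n; i++ {
        ids = append(ids, i) // stays within the pre-allocated capacity
        index[strconv.Itoa(i)] = i
    }
    return ids, index
}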

Reusing Objects

A running program will continuously create many new objects, each requiring a block of memory to be allocated and later scanned for garbage collection.

Let’s assume that a program creates a new object every 1 ms, with each object being used for 10 ms. This means the program will create 1000 objects per second, while at the same time, only 10 objects are in use within the system.

In this example, it is easy to see that we can simply create 10 objects and reuse them throughout the program, significantly reducing the number of created objects and thus the cost of garbage collection.

However, when considering concurrent object allocation between multiple goroutines, we should use sync.Pool. In most cases, sync.Pool meets the requirements for object reuse and is safer and more straightforward to use.

var objectPool sync.Pool

func NewObject() *Object {
    obj := objectPool.Get()
    if obj == nil {
        return &Object{}
    }
    return obj.(*Object)
}

type Object struct {
    Name string
}

func (o *Object) Recycle() {
    o.Name = "" // reset object
    objectPool.Put(o)
}

func main() {
    obj := NewObject()
    obj.Recycle()
}

For objects that we want to reuse, we only need to implement a Recycle() method. When the program determines that an object is no longer needed, it can be recycled and released, leaving it available for use again later.

Note that the prerequisite for reusing objects is that we can precisely control their lifetime. If we prematurely release an object that is still being used elsewhere, it may lead to unpredictable errors.

Summary

Performance optimization is just one extra tool in our toolbox. Simple code is always the best code.

We only need to use advanced optimization techniques when there are specific scenarios that require them.

More

More Series Articles about You Should Know In Golang:

https://wesley-wei.medium.com/list/you-should-know-in-golang-e9491363cd9a

And I’m Wesley, delighted to share knowledge from the world of programming. 

Don’t forget to follow me for more informative content, or feel free to share this with others who may also find it beneficial. It would be a great help to me.

Feel free to leave some claps, highlights, or replies; I pay attention to those reactions, and they help me decide whether to keep writing this type of article.

See you in the next article. 👋

中文文章: https://programmerscareer.com/zh-cn/golang-advanced-02/
Author: Medium, LinkedIn, Twitter
Note: Originally written at https://programmerscareer.com/golang-advanced-02/ at 2025-01-05 19:39.
Copyright: BY-NC-ND 3.0
