WaitGroup You Should Know in Golang

Jul 8 2024 golang 15 minutes read (About 2192 words)

Peep Go’s WaitGroup implementation and application from source code

Does wisdom perhaps appear on the earth as a raven which is inspired by the smell of carrion?
— Friedrich Nietzsche

Medium Link: WaitGroup You Should Know in Golang | by Wesley Wei | Jul, 2024 | Programmer’s Career
Author：Wesley Wei – Medium

Hello, here is Wesley, Today’s article is about WaitGroup in Go. Without further ado, let’s get started.💪

(from github.com/MariaLetta/free-gophers-pack)

1. sync.WaitGroup Overview:

Introduce the functionality and usage of sync.WaitGroup.

Go, as a representative of cloud-native development, is known for its ease of use in concurrent programming. In most cases, people will use WaitGroup when working with concurrency. I often wonder how it works, so this article focuses on my understanding of WaitGroup.

In the Go language, sync.WaitGroup allows the main program or other goroutines to wait until multiple goroutines have completed execution before continuing.

It is mainly used in the following scenarios:

Waiting for a group of goroutines to complete: When we have multiple concurrent tasks that need to be executed and we want to continue executing subsequent operations after all these tasks are completed.
Ensuring resource release: In concurrent operations, it is necessary to ensure that all goroutines have completed execution before releasing resources, in order to avoid resource competition and data inconsistencies.

For example:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var counter int64
    var mu sync.Mutex
    var wg sync.WaitGroup

    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            for j := 0; j < 1000; j++ {
                mu.Lock()
                counter++
                mu.Unlock()
            }
            wg.Done()
        }()
    }

    wg.Wait()
    fmt.Println("Final Counter:", counter)
}

Run in Go 1.22

2 sync.WaitGroup in Go 1.17:

Dive into the structure and internal mechanisms of sync.WaitGroup.

I understand that the structure before Go 1.20 has some clever places, so this article will focus on explaining it with Go 1.17 as an example.

cs.opensource.google/go/go/+/refs/tags/go1.17:src/sync/waitgroup.go

type WaitGroup struct {
	noCopy noCopy

	// 64-bit value: high 32 bits are counter, low 32 bits are waiter count.
	// 64-bit atomic operations require 64-bit alignment, but 32-bit
	// compilers do not ensure it. So we allocate 12 bytes and then use
	// the aligned 8 bytes in them as state, and the other 4 as storage
	// for the sema.
	state1 [3]uint32
}

nocopy is a protection mechanism to prevent the structure from being copied, will be described later.
state1 mainly stores the count state and semaphore, which we will focus on next.

2.1 Comment Analysis

To understand the comments, first, you need to understand memory alignment and how state1 is used in Add() and Wait().

Memory alignment requires that data addresses must be multiples of a certain value, which can improve CPU reading memory data efficiency.For example:

32-bit alignment: The starting address of the data must be a multiple of 4.
64-bit alignment: The starting address of the data must be a multiple of 8.

In Add() and Wait(), the counter and waiter are combined into a single 64-bit integer for use.

statep, semap := wg.state()
…
state := atomic.AddUint64(statep, uint64(delta)<<32)
v := int32(state >> 32)
w := uint32(state)

When the counter and waiter values need to be changed, the 64-bit integer is operated on atomically via atomic. But atomic in you have some need to pay attention to the point of use, golang official document sync/atomic/PKG - note - bugs has this:

On ARM, 386, and 32-bit MIPS, it is the caller’s responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically via the primitive atomic functions (types Int64 and Uint64 are automatically aligned). The first word in an allocated struct, array, or slice; in a global variable; or in a local variable (because the subject of all atomic operations will escape to the heap) can be relied upon to be 64-bit aligned.

As you can see, based on this premise, in a 32 as system, we need to guarantee the 64-bit alignment of ‘count+waiter’ by ourselves. So the question is, if it’s you, how do you make it happen?

2.2 state()

Let’s learn about the official implementation:

state1 [3]uint32

// state returns pointers to the state and sema fields stored within wg.state1.
func (wg *WaitGroup) state()(statep *uint64, semap *uint32) {
    if uintptr(unsafe.Pointer(&wg.state1)) % 8 == 0 {
        return (*uint64)(unsafe.Pointer(&wg.state1)), &wg.state1[2]
    } else {
        return (*uint64)(unsafe.Pointer(&wg.state1[1])), &wg.state1[0]
    }
}

As shown:

On 64-bit systems, all meet the 8-byte alignment requirement. And on 32-bit systems, it could also be.
On 32-bit systems that do not meet the 8-byte alignment requirement, the sema field is moved forward by 4 bytes to ensure that the state field meets the 8-byte alignment requirement.

By simply rearranging the position of the sema field, we can guarantee that counter+waiter will always be aligned to 64-bit boundaries, which is indeed very clever.

PS: We may introduce an article about memory alignment later.

2.3 Simplified Implementation Process

Now, let’s consider the original structure as follows, ignoring memory alignment and concurrency safety factors for simplicity:

type WaitGroup struct {
    counter int32
    waiter uint32
    sema uint32
}

The fields are used approximately as follows:

counter represents the number of tasks that have not yet completed. WaitGroup.Add(n) will cause counter += n, while WaitGroup.Done() will cause counter--.
waiter represents the number of goroutines that have called WaitGroup.Wait.
sema corresponds to the internal semaphore implementation in the Go runtime. In WaitGroup, we use two related functions: runtime_Semacquire and runtime_Semrelease. runtime_Semacquire increases a semaphore and suspends the current goroutine. runtime_Semrelease decreases a semaphore and wakes up one of the goroutines waiting on it.

Note that this is a simplified implementation process, and actual code may be more complex.

3. Add(), Done(), Wait()

cs.opensource.google/go/go/+/refs/tags/go1.17:src/sync/waitgroup.go

Open the link, which is less than 150 lines, and read it first to help understand.

Combining our common usage scenarios, the key process is as follows:

When calling WaitGroup.Add(n), the counter will increment by n: counter += n

1	state := atomic.AddUint64(statep, uint64(delta)<<32)

When calling WaitGroup.Wait(), it will increment waiter++ and call runtime_Semacquire(semap) to increase the semaphore and suspend the current goroutine.

if atomic.CompareAndSwapUint64(statep, state, state+1) {
    …
    runtime_Semacquire(semap)
    …

When calling WaitGroup.Done(), it will decrement counter--. If the decremented counter equals 0, it means that the waiting process of WaitGroup has ended, and we need to call runtime_Semrelease to release the semaphore and wake up the goroutines waiting on WaitGroup.Wait.

1
2
3

for ; w != 0; w-- {
    runtime_Semrelease(semap, false, 0)
}

4. WaitGroup in Go 1.20

cs.opensource.google/go/go/+/refs/tags/go1.20:src/sync/waitgroup.go

I believe someone has noticed a problem, that counter and waiter need to ensure concurrent safety when changing. Why not use atomic.Uint64 directly?

That’s because atomic.Uint64 only supported in later versions.

In Go 1.20, we can notice that the memory alignment logic is replaced by atomic.Uint64, although it is not mentioned in the release notes of Go 1.20, there are many valuable things to learn from it.

Reference: [go] sync: use atomic.Uint64 for WaitGroup state, using atomic.Uint64 does not have additional performance overhead.

5. noCopy

In the waitGroup structure, we saw noCopy. Why do we need noCopy? Let’s look at an example:

package main
import "fmt"

// Define a struct type
type Person struct {
    Name string
    Age int
}

func main() {
    // Create a struct instance
    person := Person{Name: "Alice", Age: 30}

    // Create a pointer to the struct
    p := &person

    // Access and modify the struct's fields through the pointer
    fmt.Println(p.Name) // Output: Alice
    fmt.Println((*p).Name) // Output: Alice
    
    p1 := p
    p.Age = 32
    fmt.Println(p.Age) // Output: 32
    fmt.Println(p1.Age) // Output: 32
}

In Go, pointer copying is a shallow copy, which means that only the top-level structure is copied. This can lead to unexpected behavior if the original structure and its copy both point to the same underlying data. If one structure’s data is modified, it may affect another structure.

Using noCopy fields can help with static compilation checks. With go vet, you can check whether an object or a field within that object has been copied.

More about Pointers: Pointers You Should Know in Golang | by Wesley Wei | Jul, 2024 | Programmer’s Career

6. Notes on WaitGroup:

Explore some limitations and potential pitfalls when using WaitGroup, and learn how to avoid these issues.

If you’ve looked at the Go source code, you may have noticed the following classic notes that summarize the key points:

The Add() operation must occur before the Wait() operation.
The number of times Done() is called must match the counter value set by Add().
If the counter value is less than 0, it will panic.
Add() and Wait() cannot be called concurrently; for example, calling them in two different goroutines will panic.
You must wait until Wait() has completed before making a subsequent call to WaitGroup.

7. Semaphores

In the previous section, we mentioned semaphores, which are a mechanism for protecting shared resources and preventing multiple threads from accessing the same resource simultaneously. Let’s take a look at how semaphores work in Unix/Linux systems:

A semaphore contains a non-negative integer variable with two atomic operations: wait (down) and signal (up). The wait operation can also be referred to as P or down, which decrements the value by 1; while the signal operation is also known as V or up, which increments the value by 1. Semaphores use atomic operations to implement control over concurrent resources.

Here are the two operations:

wait (P, down) operation: If the semaphore’s non-negative integer variable S > 0, wait will decrement it; if S = 0, wait will block the thread.
signal (V, up) operation: After incrementing, if the pre-increment value is negative (indicating that there are processes waiting for resources), the blocked process will be moved from the semaphore’s wait queue to the ready queue; if no threads are blocked on the semaphore, signal will simply add 1 to S.

This corresponds to common scenarios in Go where you use a WaitGroup:

First execute Wait() with runtime_Semacquire(semap), which sets semap to 0 and should increase the semaphore and suspend the current goroutine.
When all goroutines have completed their Done() executions, execute runtime_Semrelease to release the semaphore and wake up any goroutines waiting on WaitGroup.Wait.

The corresponding source code is in sema.go:

//go:linkname sync_runtime_Semacquire sync.runtime_Semacquire
func sync_runtime_Semacquire(addr *uint32) {
    semacquire1(addr, false, semaBlockProfile, 0, waitReasonSemacquire)
}

//go:linkname sync_runtime_Semrelease sync.runtime_Semrelease
func sync_runtime_Semrelease(addr *uint32, handoff bool, skipframes int) {
    semrelease1(addr, handoff, skipframes)
}

For example, let’s take a look at semacquire1 (wait, P, down):

Try to acquire the semaphore:

if cansemacquire(addr) {
    return
}

func cansemacquire(addr *uint32) bool {
    for {
        v := atomic.Load(addr)
        if v == 0 {
            return false
        }
        if atomic.Cas(addr, v, v-1) {
            return true
        }
    }
}

Block and wait:

for {
    …
    if cansemacquire(addr) {
        root.nwait.Add(-1)
        unlock(&root.lock)
        break
    }
    root.queue(addr, s, lifo)
    goparkunlock(&root.lock, reason, traceBlockSync, 4+skipframes)
    if s.ticket != 0 || cansemacquire(addr) {
        break
    }
    …
}

The specific logic is quite complex and may be of interest to those who want to learn more. For more information on semaphores, refer to: Semaphore (programming) - Wikipedia.

More Series Articles about You Should Know In Golang:

https://wesley-wei.medium.com/list/you-should-know-in-golang-e9491363cd9a

And I’m Wesley, delighted to share knowledge from the world of programming.

Don’t forget to follow me for more informative content, or feel free to share this with others who may also find it beneficial. it would be a great help to me.

Give me some free applauds, highlights, or replies, and I’ll pay attention to those reactions, which will determine whether or not I continue to post this type of article.

See you in the next article. 👋

中文文章: https://programmerscareer.com/zh-cn/golang-waitgroup/
Author: Wesley Wei – Twitter Wesley Wei – Medium
Note: Originally written at https://programmerscareer.com/golang-waitgroup/ at 2024-07-08 22:37. If you choose to repost or use this article, please cite the original source.