1. Advanced Concurrency & Runtime

Push the runtime: scheduler behavior, atomics vs mutexes, leak prevention, bounded concurrency, and preemption.

Question: Can you describe the Go scheduler's G-M-P model and how it handles system calls?

Answer: The Go scheduler uses a G-M-P model: goroutines (G) are the lightweight units of work managed by the Go runtime, machine threads (M) are the OS threads that actually execute code, and processors (P) are scheduling contexts, each holding a run queue of goroutines. An M must hold a P to run Go code, and each P runs one G at a time. When a goroutine makes a blocking system call, its M detaches and hands the P (with its run queue) to another available M, so the blocked OS thread does not stall other Go code.

Explanation: This model is key to Go's high concurrency. By multiplexing many Gs onto a small number of Ms, it avoids the high overhead of OS threads. The network poller is a critical component that integrates with the scheduler, allowing it to efficiently handle non-blocking I/O by parking goroutines that are waiting on network activity and waking them only when ready. GOMAXPROCS controls the number of Ps, effectively setting the limit of OS threads that can execute Go code simultaneously.
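
To make the handoff concrete, here is a toy, Unix-only sketch (the device path, counts, and durations are arbitrary): goroutines blocked in file-read syscalls release their P, so a CPU-bound goroutine keeps making progress even with GOMAXPROCS(1).

package main

import (
    "fmt"
    "os"
    "runtime"
    "sync/atomic"
    "time"
)

func main() {
    runtime.GOMAXPROCS(1) // a single P: only one goroutine runs Go code at a time
    var ticks atomic.Int64
    go func() {
        for {
            ticks.Add(1) // pure-CPU goroutine competing for the lone P
        }
    }()
    for i := 0; i < 4; i++ {
        go func() {
            f, err := os.Open("/dev/zero")
            if err != nil {
                return
            }
            defer f.Close()
            buf := make([]byte, 1<<20)
            for {
                f.Read(buf) // blocking syscall: the M detaches, freeing the P
            }
        }()
    }
    time.Sleep(500 * time.Millisecond)
    fmt.Println("ticks:", ticks.Load()) // nonzero: the syscalls did not stall Go code
}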

Question: You suspect lock contention is causing performance degradation. How would you diagnose and mitigate it?

Answer: To diagnose, I would use pprof to capture and analyze a mutex profile, which reveals where goroutines are spending the most time waiting for locks. For mitigation, I would consider reducing the critical section, using more granular locks (like sharding), or switching to atomic operations for simple counters.

Explanation: The goal is to minimize both the time a lock is held and the number of goroutines competing for it.

  1. Diagnose: Enable collection with runtime.SetMutexProfileFraction (the mutex profile is off by default), then run go tool pprof -http=:8080 http://localhost:6060/debug/pprof/mutex.

  2. Mitigate:

    • Reduce Critical Section: Ensure only the absolutely necessary code is inside the lock.

    • Shard Locks: Instead of one global lock for a map, split the map into N smaller maps, each with its own lock (see the sketch after this list).

    • Atomics: For simple numeric operations (counters, flags), sync/atomic provides hardware-level atomic operations that are much faster than mutexes.
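
A minimal sketch of the lock-sharding approach, assuming string keys and an FNV hash to pick a shard (the shard count and type names are illustrative):

import (
    "hash/fnv"
    "sync"
)

const shardCount = 16 // illustrative; tune for your workload

type shard struct {
    mu sync.Mutex
    m  map[string]int
}

type ShardedMap struct{ shards [shardCount]*shard }

func NewShardedMap() *ShardedMap {
    s := &ShardedMap{}
    for i := range s.shards {
        s.shards[i] = &shard{m: make(map[string]int)}
    }
    return s
}

func (s *ShardedMap) shardFor(key string) *shard {
    h := fnv.New32a()
    h.Write([]byte(key))
    return s.shards[h.Sum32()%shardCount]
}

func (s *ShardedMap) Inc(key string) {
    sh := s.shardFor(key)
    sh.mu.Lock()
    sh.m[key]++ // contention is now spread across 16 locks instead of one
    sh.mu.Unlock()
}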

Question: How do you implement bounded concurrency to prevent a service from being overwhelmed?

Answer: The two primary patterns are using a buffered channel as a semaphore or using the golang.org/x/sync/semaphore package. A channel-based semaphore works by filling a buffered channel with "tokens"; a goroutine must acquire a token before starting work and release it upon completion.

Explanation: Unbounded concurrency can lead to resource exhaustion (memory, file descriptors) or overwhelming downstream services. Bounding concurrency provides backpressure. The semaphore package is often a cleaner and more explicit choice, especially when you need weighted semaphores.

import (
    "context"
    "runtime"

    "golang.org/x/sync/semaphore"
)

// Example using x/sync/semaphore
var sem = semaphore.NewWeighted(int64(runtime.GOMAXPROCS(0)))

func processRequest(ctx context.Context, job Job) {
    if err := sem.Acquire(ctx, 1); err != nil {
        // Handle error, maybe service is shutting down or context canceled
        return
    }
    defer sem.Release(1)

    // Do the actual work
    process(job)
}
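
For comparison, a sketch of the channel-based semaphore mentioned above; the capacity of 64 is illustrative:

var tokens = make(chan struct{}, 64) // capacity = max concurrent workers

func handleRequest(ctx context.Context, job Job) error {
    select {
    case tokens <- struct{}{}: // acquire a slot
    case <-ctx.Done():
        return ctx.Err()
    }
    defer func() { <-tokens }() // release the slot when done
    process(job)
    return nil
}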

Question: When do you prefer x/sync/semaphore over a channel semaphore?

Answer: Prefer semaphore.Weighted for complex acquire/release patterns and weighted permits; use channel semaphores for simple 1:1 slots.

Explanation: Weighted supports acquiring multiple units atomically with context cancellation; channels are minimal but less expressive.
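
For instance, a weighted acquire reserves several units in one atomic, cancelable step, which a plain channel semaphore cannot express; sem is from the earlier example, and the job.SizeMB field is hypothetical:

// Reserve capacity proportional to the job's size in a single atomic step.
if err := sem.Acquire(ctx, job.SizeMB); err != nil { // SizeMB is hypothetical
    return err
}
defer sem.Release(job.SizeMB)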

Question: How do you expose and sample mutex/block profiles safely in prod?

Answer: Enable sampling with runtime.SetMutexProfileFraction and runtime.SetBlockProfileRate, expose on an internal admin port, and collect briefly.

Explanation: Sampling reduces overhead; never expose pprof publicly.
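
A sketch of that setup; the sampling rates and loopback port are illustrative:

import (
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
    "runtime"
)

func init() {
    runtime.SetMutexProfileFraction(100) // report ~1/100 of contended mutex events
    runtime.SetBlockProfileRate(10_000)  // sample ~1 blocking event per 10µs blocked (rate in ns)
}

func startAdmin() {
    // Bind to loopback (or an internal-only interface) so pprof is never public.
    go http.ListenAndServe("127.0.0.1:6060", nil)
}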

Question: What is the Go memory model's stance on data races, and what does "happens-before" mean?

Answer: The Go memory model states that a data race occurs when two or more goroutines access the same memory location concurrently without synchronization, and at least one access is a write. Programs with data races have undefined behavior. "Happens-before" is the partial order the model defines over memory operations: if a write happens before a read, the read is guaranteed to observe that write.

Explanation: In Go, "happens-before" is established by synchronization primitives. For example, a send on a channel happens before the corresponding receive from that channel completes. Similarly, unlocking a mutex happens before any subsequent goroutine can lock that same mutex. If there is no explicit happens-before relationship between two accesses to a shared variable, the compiler and CPU are free to reorder them, leading to unpredictable outcomes. Always use the -race detector during testing.
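
A minimal sketch of the channel guarantee: the write happens before the close, and the close happens before the receive returns, so the read below is race-free.

package main

import "fmt"

var msg string

func main() {
    done := make(chan struct{})
    go func() {
        msg = "hello" // write...
        close(done)   // ...then signal: the close happens before the receive returns
    }()
    <-done           // after this receive, the write to msg is guaranteed visible
    fmt.Println(msg) // prints "hello"; no data race
}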

Question: What are best practices for context.Context usage and cancellation?

Answer: Pass context as the first parameter; do not store it in structs. Always set deadlines/timeouts at boundaries, propagate cancellation, and select on ctx.Done() in long-running work.

Explanation: Context ties request lifecycles to work and resources. Avoid leaking goroutines by checking ctx.Done() in loops and background workers. Use errgroup.WithContext to coordinate goroutines and propagate the first error.

func worker(ctx context.Context, jobs <-chan Job) error {
    for {
        select {
        case <-ctx.Done():
            return ctx.Err()
        case j, ok := <-jobs:
            if !ok { return nil }
            if err := process(ctx, j); err != nil { return err }
        }
    }
}

Question: When should you use typed atomics vs. mutexes?

Answer: Use typed atomics (atomic.Int64, atomic.Pointer[T]) for independent fields like counters and flags. Prefer mutexes when updating related fields, enforcing invariants, or when operations are not trivially atomic.

Explanation: Atomics avoid lock contention but make complex state hard to reason about. Keep atomic updates minimal and localized; use mutexes for compound state transitions.

import "sync/atomic"

var requests atomic.Int64
requests.Add(1)

type Config struct{ Enabled bool }
var cfgPtr atomic.Pointer[Config]
cfgPtr.Store(&Config{Enabled: true})
current := cfgPtr.Load()

Question: How do you prevent goroutine leaks in long-running services?

Answer: Tie goroutines to a context, use errgroup.WithContext for coordinated cancellation, and always select on ctx.Done() in loops. Ensure producers exit on cancellation and channels are closed by owners.

Explanation: Leaked goroutines retain memory, timers, and descriptors. Coordinated cancellation ensures fast, safe shutdowns.

g, ctx := errgroup.WithContext(parent)
g.Go(func() error { return worker(ctx, jobs) })
g.Go(func() error { return watcher(ctx) })
if err := g.Wait(); err != nil { /* handle */ }

Question: How does preemption work in Go, and when should you use runtime.Gosched?

Answer: Since Go 1.14, the runtime supports asynchronous preemption: goroutines can be preempted at safe points without cooperation. You rarely need runtime.Gosched; prefer using timeouts, contexts, and bounded queues. GOMAXPROCS controls parallelism, not goroutine count.

Explanation: The scheduler employs work stealing between Ps and a background sysmon thread to preempt long-running goroutines (tight loops, heavy CPU). Cooperative yields (runtime.Gosched) are mostly unnecessary today. If a loop performs expensive pure CPU work with no blocking calls and long spans between safe points, consider introducing small blocking points (e.g., channel ops) or chunk the work so cancellation via ctx.Done() is observed promptly.
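
A sketch of chunking CPU-bound work so cancellation is observed promptly; the chunk size is arbitrary, and Item and compute are hypothetical:

func crunch(ctx context.Context, items []Item) error {
    const chunk = 1024 // arbitrary: check for cancellation every 1024 items
    for i, it := range items {
        if i%chunk == 0 {
            select {
            case <-ctx.Done():
                return ctx.Err()
            default:
            }
        }
        compute(it) // hypothetical pure-CPU work with no blocking calls
    }
    return nil
}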

Question: When should you use sync.Map instead of a regular map with a mutex?

Answer: Use sync.Map for patterns with many goroutines performing concurrent, mostly-read operations with keys appearing once (write-once, read-many), or when keys are dynamic and you want built-in atomic LoadOrStore/LoadAndDelete semantics.

Explanation: For stable key sets with frequent updates, a plain map protected by sync.RWMutex is typically faster and simpler. sync.Map shines for caches of ephemeral keys and for avoiding double work with LoadOrStore.

var m sync.Map
// Caveat: compute() runs even when the key is already present; for expensive
// values, guard it with singleflight or store a lazily-filled placeholder.
v, loaded := m.LoadOrStore(key, compute())
if loaded {
    // another goroutine stored the value first; v holds its entry
}

Question: Show a production-ready worker pool with backpressure and cancellation.

Answer: Bound both the queue length and the worker count; stop workers on context cancellation and give callers a way to wait for in-flight work to finish.

Explanation: Bounding prevents unbounded memory growth and gives upstream backpressure signals (enqueue blocks when full).

type Job struct{ ID int }

// StartPool bounds both queue depth and worker count. submit blocks when the
// queue is full (backpressure) and fails fast once ctx is canceled; wait
// blocks until all workers have exited.
func StartPool(ctx context.Context, workers, queue int, handle func(context.Context, Job)) (submit func(Job) error, wait func()) {
    jobs := make(chan Job, queue)
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for {
                select {
                case <-ctx.Done():
                    return // stop on cancellation; jobs still queued are dropped
                case j := <-jobs:
                    handle(ctx, j)
                }
            }
        }()
    }
    // Note: jobs is deliberately never closed. Closing it while a submitter may
    // be selecting on `jobs <- j` risks a send-on-closed-channel panic; workers
    // exit via ctx.Done() instead.
    submit = func(j Job) error {
        select {
        case <-ctx.Done():
            return ctx.Err()
        case jobs <- j:
            return nil
        }
    }
    return submit, wg.Wait
}
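
Usage, assuming a hypothetical handleJob(ctx, job) handler:

ctx, cancel := context.WithCancel(context.Background())
submit, wait := StartPool(ctx, 8, 64, handleJob)
_ = submit(Job{ID: 1})
// ... later, during shutdown:
cancel()
wait() // block until all workers have exited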

Question: How do you avoid duplicate work across concurrent requests?

Answer: Use singleflight to collapse duplicate in-flight calls for the same key.

Explanation: This prevents cache stampedes and reduces load on dependencies.

import "golang.org/x/sync/singleflight"

var group singleflight.Group

func Fetch(ctx context.Context, key string) ([]byte, error) {
    v, err, _ := group.Do(key, func() (any, error) {
        return expensiveCall(ctx, key)
    })
    if err != nil { return nil, err }
    return v.([]byte), nil
}

Question: What are common timer/ticker pitfalls?

Answer: Reuse a single time.Timer in loops instead of allocating one per iteration; on Go versions before 1.23, Stop and drain the channel before Reset (since Go 1.23 the runtime guarantees no stale value, so draining is unnecessary). For periodic work use time.Ticker and always defer ticker.Stop() to avoid leaks. Prefer time.Since(start) for measuring durations.

Explanation: Creating a timer per iteration in a tight loop churns allocations, and before Go 1.23 an unstopped timer was not collected until it fired; resetting without stopping and draining could leave a stale value in the channel and cause a spurious early fire, while draining unconditionally could block forever.
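
A sketch of the pre-Go 1.23 stop-and-drain reset pattern; ch, timeout, and the handlers are illustrative:

t := time.NewTimer(timeout)
defer t.Stop()
for {
    select {
    case <-t.C:
        handleTimeout() // hypothetical
    case v := <-ch:
        handle(v) // hypothetical
    }
    // Pre-1.23: Stop, then non-blockingly drain, so Reset cannot see a stale fire.
    if !t.Stop() {
        select {
        case <-t.C:
        default:
        }
    }
    t.Reset(timeout)
}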

Question: When do you need runtime.KeepAlive?

Answer: When using unsafe/cgo or finalizers, call runtime.KeepAlive(x) after the last use to ensure x is not garbage-collected early.

Explanation: The compiler may treat a value as dead immediately after its last use, even before the end of its scope; runtime.KeepAlive extends its liveness to the point of the call, so a finalizer cannot run while the underlying resource is still in use.
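
A sketch adapted from the pattern in the runtime docs; the File type and its finalizer are illustrative:

import (
    "runtime"
    "syscall"
)

type File struct{ fd int } // illustrative: a finalizer closes fd when File is collected

func (f *File) Read(buf []byte) (int, error) {
    n, err := syscall.Read(f.fd, buf)
    // Without KeepAlive, the GC may consider f dead once f.fd was read above,
    // run the finalizer, and close fd while the syscall is still using it.
    runtime.KeepAlive(f)
    return n, err
}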

Question: How should you set GOMAXPROCS in containers?

Answer: Match GOMAXPROCS to the CPU quota/limit, not host cores.

Explanation: Use the go.uber.org/automaxprocs library, which reads the cgroup CPU quota at startup, or set GOMAXPROCS explicitly; otherwise Go defaults to the host core count, causing overscheduling and CFS throttling under a smaller quota.
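
A minimal sketch; the blank import adjusts GOMAXPROCS from the container's CPU quota at init time:

import (
    _ "go.uber.org/automaxprocs" // sets GOMAXPROCS to the container CPU quota on startup
)

// Or set it explicitly from a known quota (the value here is illustrative):
// runtime.GOMAXPROCS(2)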

Question: How do you detect goroutine leaks in production?

Answer: Capture goroutine profiles (/debug/pprof/goroutine), look for blocked stacks, and correlate with code paths; add cancellation checks and owner-closed channels.

Explanation: Leak signatures include goroutines stuck on channel sends/receives or in timer and context waits with no exit path.
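
A sketch of dumping the goroutine profile in-process; debug=1 groups identical stacks, which makes leak signatures easy to spot:

import (
    "os"
    "runtime/pprof"
)

func dumpGoroutines() {
    // debug=1 prints goroutine counts grouped by identical stacks; a large,
    // growing group stuck on "chan send" or "chan receive" suggests a leak.
    pprof.Lookup("goroutine").WriteTo(os.Stderr, 1)
}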