2. Performance Engineering

Profile first; target CPU, memory, blocking; reduce allocations; validate gains with benchmarks and traces.

Question: Your application is slow. How would you profile it to find the bottleneck?

Answer: I would start by collecting a CPU profile using pprof. This will show which functions are consuming the most CPU time. If the issue is memory-related, a heap profile will identify where memory is being allocated. For I/O or lock contention issues, block and mutex profiles are essential.

Explanation: Profiling should always be the first step before optimizing.

  1. CPU Profile: go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

  2. Heap Profile: go tool pprof http://localhost:6060/debug/pprof/heap

  3. Trace: For latency issues, go tool trace gives a detailed view of goroutine execution, GC pauses, and syscalls, which is invaluable for finding scheduling delays. Fetch the trace first, then open it: curl -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=5' && go tool trace trace.out

Note: mutex and block profiles are empty by default; enable them, and serve the pprof endpoints, to collect meaningful data:

import (
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
    "runtime"
)

func init() {
    runtime.SetMutexProfileFraction(5)      // report ~1 in 5 mutex contention events
    runtime.SetBlockProfileRate(10_000_000) // sample ~1 blocking event per 10ms blocked
    go func() { _ = http.ListenAndServe("localhost:6060", nil) }()
}

Question: What is escape analysis, and how can you use it to improve performance?

Answer: Escape analysis is a compile-time process that determines whether a value created in a function can be safely allocated on the goroutine's stack or if it must "escape" to the heap. Allocating on the stack is much cheaper and avoids GC overhead.

Explanation: You can see the compiler's decisions by building with -gcflags='-m'. A variable escapes if the compiler cannot prove its lifetime at compile time, for example, if you return a pointer to it or store it in a slice that outlives the function. To improve performance, you should structure code to minimize unnecessary heap allocations by preferring value types over pointers for small structs and being mindful of how values are shared across goroutine boundaries.
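A minimal sketch of both outcomes (the names are illustrative); build with go build -gcflags='-m' to see the compiler's decisions:

package main

type point struct{ x, y int }

// sum's argument stays on the stack: nothing outlives the call.
func sum(p point) int { return p.x + p.y }

// newPoint's result escapes to the heap: the returned pointer outlives the call.
func newPoint() *point { return &point{1, 2} }

func main() {
    _ = sum(point{1, 2})
    _ = newPoint()
}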

Question: How can you optimize I/O-heavy operations in Go?

Answer: Key strategies include reusing buffers to reduce allocations (sync.Pool), using interfaces like io.ReaderFrom and io.WriterTo for zero-copy transfers where possible, and batching smaller I/O operations into larger ones using bufio.

Explanation: I/O operations often involve system calls, which have significant overhead. bufio.Reader and bufio.Writer wrap io.Reader/io.Writer to minimize syscalls by buffering reads and writes in memory. For network clients and servers, properly configuring timeouts and reusing connections (e.g., with http.Transport) is also critical for performance and resilience.
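As a sketch, wrapping a file in a bufio.Writer turns many small writes into a few large syscalls (writeLines and the 64 KiB buffer size are illustrative choices):

import (
    "bufio"
    "os"
)

func writeLines(path string, lines []string) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()

    w := bufio.NewWriterSize(f, 64<<10) // buffer in memory; flush in large chunks
    for _, line := range lines {
        if _, err := w.WriteString(line); err != nil {
            return err
        }
        if err := w.WriteByte('\n'); err != nil {
            return err
        }
    }
    return w.Flush() // don't lose whatever is still buffered
}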

Question: How do inlining and bounds-check elimination affect performance?

Answer: Inlining removes call overhead and can unlock further optimizations. Bounds-check elimination (BCE) removes slice/index checks when the compiler can prove safety. Inspect with -gcflags=all=-m.

Explanation: The compiler elides checks in common patterns, such as for i := 0; i < len(s); i++ { _ = s[i] }. Prefer copy(dst, src) and append over manual index loops, and hint the compiler with an up-front bounds check where needed (see the sketch below). Use //go:noinline only in benchmarks, where it keeps the compiler from optimizing away the work being measured.
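One common BCE idiom, used throughout the standard library's encoding/binary, is a single up-front index check that lets the compiler drop the checks on the accesses that follow; a minimal sketch:

// readUint32 decodes a big-endian uint32 from the first four bytes of b.
func readUint32(b []byte) uint32 {
    _ = b[3] // one bounds check here; the four indexes below need none
    return uint32(b[3]) | uint32(b[2])<<8 | uint32(b[1])<<16 | uint32(b[0])<<24
}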

Question: How do you build strings and bytes efficiently?

Answer: Prefer strings.Builder for strings, bytes.Buffer or pre-sized []byte for bytes. Reuse buffers via sync.Pool when safe.

Explanation: strings.Builder avoids intermediate allocations but is not safe for concurrent use. Reserve capacity with Grow when the final size is roughly known:

func join(parts []string) string {
    var b strings.Builder
    b.Grow(256) // pre-reserve to avoid repeated growth
    for _, s := range parts { b.WriteString(s) }
    return b.String()
}
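When the same buffers are rebuilt repeatedly (e.g. per request), sync.Pool can amortize the allocations; a sketch assuming every buffer is returned after use (render is a hypothetical helper):

import (
    "bytes"
    "sync"
)

var bufPool = sync.Pool{
    New: func() any { return new(bytes.Buffer) },
}

func render(parts []string) string {
    buf := bufPool.Get().(*bytes.Buffer)
    defer func() {
        buf.Reset() // clear before returning so the next user starts empty
        bufPool.Put(buf)
    }()
    for _, s := range parts {
        buf.WriteString(s)
    }
    return buf.String() // String copies the bytes out, so pooling the buffer is safe
}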

Question: What are common benchmarking pitfalls, and how do you avoid them?

Answer: Exclude setup and I/O from the timed region, benchmark with realistic inputs, run with -benchmem to see allocations, and make sure the compiler cannot eliminate the measured work as dead code.

Explanation: Use b.ResetTimer()/b.StopTimer() around setup, consume results (e.g. assign to a package-level Sink variable) so dead-code elimination cannot remove the work, and compare against production-like inputs; see the sketch below.
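A sketch of the pattern (BenchmarkSum and Sink are illustrative names): setup is excluded with b.ResetTimer, allocations are reported, and the result is written to a package-level sink so the compiler cannot discard the loop:

import "testing"

var Sink int // package-level sink defeats dead-code elimination

func BenchmarkSum(b *testing.B) {
    data := make([]int, 1<<10) // setup: excluded from the measurement
    for i := range data {
        data[i] = i
    }
    b.ReportAllocs() // same output as running with -benchmem
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        s := 0
        for _, v := range data {
            s += v
        }
        Sink = s // consume the result
    }
}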

Question: How do you speed up JSON handling in hot paths?

Answer: Reuse json.Encoder/Decoder on persistent buffers, avoid MarshalIndent, preallocate structs/slices, and consider code-generated or SIMD-optimized libraries for the hottest paths if allowed. Validate with benchmarks.

Explanation: Minimize allocations and copies; for large payloads, stream with Encoder/Decoder instead of materializing the whole document in memory. Use DisallowUnknownFields where strict input matters, which also rejects malformed payloads early instead of wasting work on them. When writing JSON by hand, keep field names in preallocated []byte variables rather than re-converting strings in the hot path.
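A sketch of streaming decode over a long-lived reader (the Event type and consume function are illustrative):

import (
    "encoding/json"
    "io"
)

type Event struct {
    ID   int64  `json:"id"`
    Name string `json:"name"`
}

// consume decodes a stream of JSON objects without buffering the whole payload.
func consume(r io.Reader) error {
    dec := json.NewDecoder(r)
    dec.DisallowUnknownFields() // fail fast on unexpected fields
    for {
        var ev Event
        if err := dec.Decode(&ev); err == io.EOF {
            return nil // clean end of stream
        } else if err != nil {
            return err
        }
        // process ev here
        _ = ev
    }
}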