4. Networking at Scale

Harden HTTP/gRPC at production scale: timeouts, pools, keepalives, TLS, load shedding, and graceful shutdown.

Q1 How would you design a robust HTTP server in Go that can handle production traffic?

Answer: A robust server must have explicitly configured timeouts (Read, ReadHeader, Write, Idle), use a dedicated http.ServeMux rather than http.DefaultServeMux (which imported packages such as net/http/pprof silently register debug handlers on), implement backpressure (rate limiting), and include observability. Graceful shutdown is mandatory.

Explanation: The default http.ListenAndServe has no timeouts, making it vulnerable to slowloris-type attacks where a client slowly sends data, holding a connection open indefinitely. Timeouts and load shedding (e.g., token bucket) prevent cascading failures. Always implement graceful shutdown to drain in-flight requests on SIGTERM.

// Load shedding middleware with a token bucket
// (rate is golang.org/x/time/rate)
var tb = rate.NewLimiter(rate.Limit(200), 400) // 200 req/s, burst 400

func RateLimit(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !tb.Allow() {
            http.Error(w, "too many requests", http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}

Q2 How do you tune an http.Transport client for production?

Answer: Increase idle connection pools, set strict timeouts, and reuse a shared http.Client.

Explanation: http.DefaultTransport permits only 2 idle connections per host (DefaultMaxIdleConnsPerHost), which causes heavy connection churn against a single backend; raising the pool limits fixes that. Strict timeouts bound tail latencies, and a single shared http.Client lets all goroutines reuse the same pool.

tr := &http.Transport{
    MaxIdleConns:          200,
    MaxIdleConnsPerHost:   100,
    IdleConnTimeout:       90 * time.Second,
    TLSHandshakeTimeout:   5 * time.Second,
    ExpectContinueTimeout: 1 * time.Second,
}
client := &http.Client{Transport: tr, Timeout: 15 * time.Second}

Q3 How do you implement zero-downtime HTTP deploys behind a load balancer?

Answer: Use health checks and connection draining: fail the readiness probe first so the LB stops routing new traffic, wait for it to remove the instance, then stop accepting connections and call srv.Shutdown to drain in-flight requests before the process exits.

Explanation: Readiness gates traffic; graceful shutdown drains in-flight requests.

// Example http.Server with timeouts and graceful shutdown
srv := &http.Server{
    Addr:              ":8080",
    Handler:           mux,
    ReadTimeout:       5 * time.Second,
    ReadHeaderTimeout: 2 * time.Second,
    WriteTimeout:      10 * time.Second,
    IdleTimeout:       60 * time.Second,
}

// On SIGTERM/SIGINT:
ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
defer cancel()
_ = srv.Shutdown(ctx) // allow in-flight requests to complete

Q4 What are the most important considerations when working with gRPC in a distributed system?

Answer: The most critical considerations are deadlines, idempotency, and observability. Every gRPC call must have a deadline set via the context to prevent requests from waiting forever. Retries must be implemented with care, ensuring the operations are idempotent. Interceptors should be used to inject cross-cutting concerns like tracing, metrics, and authentication.

Explanation: In a distributed system, failures are inevitable. Deadlines prevent a failure in one service from cascading and consuming resources in upstream services. gRPC-Go interceptors are the standard mechanism for adding middleware-like functionality to both the client and server, making it the perfect place to handle logging, metrics (e.g., RED), and propagating trace contexts.

Q5 How do you configure gRPC clients for reliability (keepalive, limits)?

Answer: Set per-RPC deadlines, configure keepalive pings, cap message sizes, and use backoff.

Explanation: These improve failure detection and prevent memory blowups from oversized messages.

import (
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/keepalive"
)

conn, err := grpc.Dial(addr,
    grpc.WithTransportCredentials(creds), // creds: e.g. from credentials.NewTLS
    grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(8<<20)), // cap responses at 8 MiB
    grpc.WithKeepaliveParams(keepalive.ClientParameters{
        Time:                30 * time.Second, // ping after 30s of inactivity
        Timeout:             10 * time.Second, // fail if no ack within 10s
        PermitWithoutStream: true,             // ping even with no active RPCs
    }),
)
_ = err

Q6 How do you cap concurrent requests server-side without head-of-line blocking?

Answer: Use a token bucket or weighted semaphore per-endpoint; return 429 when saturated.

Explanation: Apply limits early in middleware to shed load fairly and protect dependencies.

Q7 How do you harden HTTP servers and clients with TLS and HTTP/2?

Answer: Enforce TLS 1.2+, prefer modern ciphers, enable HTTP/2, set MaxHeaderBytes, limit request body size, and drain/close response bodies correctly.

Explanation: Tight limits prevent resource exhaustion and header attacks; HTTP/2 improves multiplexing but requires sane flow-control and timeouts.

srv := &http.Server{
    TLSConfig: &tls.Config{
        MinVersion:       tls.VersionTLS12,
        CurvePreferences: []tls.CurveID{tls.X25519, tls.CurveP256},
        // PreferServerCipherSuites is ignored since Go 1.17;
        // crypto/tls now orders cipher suites automatically.
    },
    MaxHeaderBytes: 1 << 20, // 1 MiB
}

// Client dialer timeouts
d := &net.Dialer{Timeout: 3 * time.Second, KeepAlive: 30 * time.Second}
tr := &http.Transport{DialContext: d.DialContext, ForceAttemptHTTP2: true}
client := &http.Client{Transport: tr, Timeout: 15 * time.Second}

Q8 How do you prevent request body abuse and ensure proper cleanup?

Answer: On the server, wrap request bodies with http.MaxBytesReader (which enforces a hard cap the handler can surface as 413) and check r.ContentLength for an early reject; on the client, always drain (io.Copy(io.Discard, resp.Body)) and close response bodies.

Explanation: Client-side, draining and closing the response body returns the underlying connection to the Transport's pool; failing to do so leaks connections and forces new dials. Server-side, MaxBytesReader bounds memory and arranges for the connection to be closed once a client exceeds the limit.