7. Robustness & Error Handling

Fail fast with context, keep exceptions specific, and design retries to be safe and bounded.

Question: What are your principles for robust exception handling?

Answer: Fail fast, be specific in except clauses, and add context to exceptions. Never swallow exceptions silently. A good strategy is to wrap low-level exceptions in custom, domain-specific exceptions to create a clear boundary and prevent implementation details from leaking.

Explanation: Writing raise NewException(...) from old_exception stores the original exception in __cause__, so the traceback prints both the low-level error and the domain-level wrapper as an explicit causal chain. This makes debugging much easier. Logging actionable information is also essential in production systems: include context such as relevant IDs or parameters, not just the exception name.

class DomainError(Exception):
    pass

def process_item(item_id):
    try:
        risky_call(item_id)
    except ExternalError as e:
        # Add context and preserve original cause
        raise DomainError(f"Failed to process item {item_id}") from e
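
To make the logging advice concrete, here is a minimal sketch of catching the domain exception at a boundary and logging it with context. ExternalError, risky_call, the handle function, and the logger name "orders" are hypothetical stand-ins, not part of any real library.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


class ExternalError(Exception):
    """Stand-in for an error raised by a third-party client (hypothetical)."""


class DomainError(Exception):
    """Domain-level failure exposed to callers."""


def risky_call(item_id):
    # Hypothetical low-level call that always fails, for demonstration.
    raise ExternalError("connection reset")


def process_item(item_id):
    try:
        risky_call(item_id)
    except ExternalError as e:
        # Add context and preserve the original cause in __cause__
        raise DomainError(f"Failed to process item {item_id}") from e


def handle(item_id):
    """Boundary handler: log once, with the IDs needed to act on the error."""
    try:
        process_item(item_id)
        return True
    except DomainError as err:
        logger.error(
            "process_item failed: item_id=%s cause=%r",
            item_id,
            err.__cause__,
            exc_info=True,  # include the full chained traceback
        )
        return False
```

Logging at a single boundary, rather than at every layer, keeps one failure from producing several near-identical log entries.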

Question: What is an ExceptionGroup and how do you handle it?

Answer: ExceptionGroup (Python 3.11+) bundles several unrelated exceptions into a single exception object, which is common when parallel or async tasks fail together. Use except* to handle the members by type.

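Explanation: This avoids losing errors from sibling tasks. Each except* clause receives a subgroup containing only the exceptions that match its type, several clauses can run for the same group, and any exceptions that no clause matches are re-raised in a new group.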

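try:
    raise ExceptionGroup("many", [ValueError(), KeyError()])
except* ValueError as eg:
    # eg is a subgroup holding only the ValueError instances
    handle_value_errors(eg.exceptions)
except* KeyError as eg:
    # Without this clause, the KeyError would propagate in a new group
    handle_key_errors(eg.exceptions)  # placeholder handler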

Question: How do you implement resilient retries?

Answer: Use bounded retries with exponential backoff and jitter; respect idempotency and timeouts.

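Explanation: Jitter spreads out retries from many clients so they do not all hit a recovering service at the same moment (the thundering herd problem), and bounding the number of attempts keeps one failure from cascading into others. Retry only operations that are idempotent, since a retried call may have partly succeeded the first time.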

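import random
import time

# TransientError is assumed to be defined by the application.
def retry(op, attempts=5, base=0.1):
    for n in range(attempts):
        try:
            return op()
        except TransientError:
            if n == attempts - 1:
                # Out of attempts: re-raise while the exception is still active
                raise
            # Exponential backoff plus a small random jitter
            delay = base * (2 ** n) + random.uniform(0, base)
            time.sleep(delay)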
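
One possible variation, sketched under stated assumptions: cap the delay and use "full jitter" (a random delay between zero and the capped exponential bound) so that a long outage does not produce very long sleeps. The cap, sleep, and rng parameters are additions for this example, and TransientError is a hypothetical marker class. Injecting sleep and rng keeps the function testable without real waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(op, attempts=5, base=0.1, cap=5.0, sleep=time.sleep, rng=random.random):
    """Call op() up to `attempts` times with capped, full-jitter backoff."""
    for n in range(attempts):
        try:
            return op()
        except TransientError:
            if n == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: scale the capped exponential bound by rng(),
            # which is in [0, 1) for the default random.random.
            sleep(rng() * min(cap, base * 2 ** n))
```

Only TransientError is retried here; any other exception propagates immediately, which keeps non-retryable failures such as validation errors from being repeated.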