5. Files & I/O

Use pathlib for cross-platform paths, stream large files, and handle encodings/compression correctly.

Q1 What are some best practices for working with files in Python?

Answer: Always use pathlib for path manipulation, explicitly specify file encoding (e.g., encoding="utf-8"), and use context managers (with open(...)) so files are closed even if an exception is raised.

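Explanation: pathlib provides a modern, object-oriented API for filesystem paths that is cleaner and less error-prone than string-based os.path. Omitting encoding makes open() fall back to the locale's preferred encoding (often cp1252 on Windows, UTF-8 on most Linux and macOS systems), so the same code can read or write different bytes on different machines. For large files, iterate over the file object line by line or read fixed-size chunks instead of calling read(), which loads the entire file into memory.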

from pathlib import Path

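# Read the whole file into a str; read_text opens and closes the file for you
data = Path("data.json").read_text(encoding="utf-8")
# Write a str, creating out.txt or truncating it if it already exists
Path("out.txt").write_text("hello", encoding="utf-8")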
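read_text() loads everything at once, so for large files the streaming approach from the explanation is safer. A minimal sketch of both styles follows, each using a context manager; the helper names count_lines and copy_in_chunks and the 64 KiB chunk size are illustrative assumptions, and the := loop needs Python 3.8+.

```python
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for your workload


def count_lines(path: Path) -> int:
    # Iterating a text file yields one line at a time, so memory use stays flat
    with path.open(encoding="utf-8") as f:
        return sum(1 for _ in f)


def copy_in_chunks(src: Path, dst: Path) -> int:
    # Binary mode plus fixed-size reads suits files with no line structure
    copied = 0
    with src.open("rb") as fin, dst.open("wb") as fout:
        while chunk := fin.read(CHUNK_SIZE):
            fout.write(chunk)
            copied += len(chunk)
    return copied
```

For plain copies, shutil.copyfileobj(fsrc, fdst) implements the same chunked loop.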

Q2 How do you perform atomic file writes to avoid partial files?

Answer: Write to a temporary file in the same directory and os.replace it into place.

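Explanation: On POSIX systems os.replace is atomic when source and destination are on the same filesystem, which is why the temporary file is created in the target's directory; readers see either the old file or the new one, never a half-written one. For crash safety, also flush and os.fsync the temporary file before renaming it, and remove the temporary file if anything fails.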

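import os, tempfile
from pathlib import Path

def atomic_write(path: Path, data: str) -> None:
    # Create the temp file beside the target so os.replace stays on one filesystem
    tmp = tempfile.NamedTemporaryFile("w", delete=False, dir=path.parent, encoding="utf-8")
    tmp_path = Path(tmp.name)
    try:
        with tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)  # readers see the old file or the new one, never a partial one
    except BaseException:
        tmp_path.unlink(missing_ok=True)  # don't leave stray temp files behind
        raise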

Q3 How do you read/write compressed files?

Answer: Use the standard-library gzip, bz2, or lzma modules. Their open() functions accept text modes ("rt"/"wt") plus an encoding when you want str instead of bytes.

import gzip

# "rt" decompresses and decodes to str using the given encoding
with gzip.open("data.json.gz", "rt", encoding="utf-8") as f:
    text = f.read()
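Writing works the same way with "wt", and because bz2.open and lzma.open share gzip.open's mode and encoding parameters, the format can be chosen per file. The sketch below picks an opener from the file suffix; OPENERS, open_compressed, write_lines, and read_lines are hypothetical names for illustration.

```python
import bz2
import gzip
import lzma
from pathlib import Path
from typing import Iterable, Iterator

# Hypothetical suffix -> opener table; each open() accepts "rt"/"wt" and encoding
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_compressed(path: Path, mode: str = "rt"):
    opener = OPENERS.get(path.suffix, open)  # fall back to the built-in open()
    if "b" in mode:
        return opener(path, mode)  # binary mode takes no encoding
    return opener(path, mode, encoding="utf-8")


def write_lines(path: Path, lines: Iterable[str]) -> None:
    with open_compressed(path, "wt") as f:
        for line in lines:
            f.write(line + "\n")


def read_lines(path: Path) -> Iterator[str]:
    # Iterating decompresses incrementally instead of loading the whole file
    with open_compressed(path, "rt") as f:
        for line in f:
            yield line.rstrip("\n")
```

In text mode these modules wrap the decompressed byte stream in io.TextIOWrapper, so encoding and newline handling behave as they do for the built-in open().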

Q4 How do you recursively find files with patterns?

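Answer: Use Path.rglob("pattern"), which matches the pattern in the directory and all of its subdirectories (equivalent to Path.glob("**/pattern")).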

from pathlib import Path
for p in Path("logs").rglob("*.log"):
    print(p)
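rglob yields matches in filesystem order and can also yield directories whose names match, so sorting and an is_file() check are common refinements. A minimal sketch collecting files with any of several suffixes; find_files and its suffix list are illustrative assumptions, and the list[Path] annotation needs Python 3.9+.

```python
from pathlib import Path
from typing import Iterable


def find_files(root: Path, suffixes: Iterable[str]) -> list[Path]:
    wanted = {s.lower() for s in suffixes}
    # rglob("*") walks every subdirectory; keep regular files with a wanted suffix
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )
```

Sorting makes the result deterministic across runs and platforms.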

Q5 When is `mmap` useful?

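Answer: When you need random access to parts of a large file without reading it all into memory. mmap maps the file into the process's address space, so it can be sliced and searched like a bytes object.

Explanation: It works well for scanning or indexing binary formats. The OS pages data in on demand and caches it, so only the regions you actually touch are read from disk.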
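A minimal sketch of a read-only mapping, assuming the file is non-empty (mapping a zero-length file raises ValueError). find_all is a hypothetical helper; mmap.find searches the mapped bytes without first copying the file into a bytes object.

```python
import mmap
from pathlib import Path


def find_all(path: Path, needle: bytes) -> list[int]:
    offsets = []
    with path.open("rb") as f:
        # Length 0 maps the whole file; ACCESS_READ works on POSIX and Windows
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                offsets.append(pos)
                pos = mm.find(needle, pos + 1)  # continue just past the last hit
    return offsets
```

Slicing the map (mm[start:end]) gives random access to any byte range, touching only the pages that range covers.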