5. Files & I/O
Use pathlib for cross-platform paths, stream large files, and handle encodings/compression correctly.
Q1 What are some best practices for working with files in Python?
Answer: Always use pathlib for path manipulation, explicitly specify file encoding (e.g., encoding="utf-8"), and use context managers (with open(...)) to ensure files are closed properly.
Explanation: pathlib provides a modern, object-oriented API for filesystem paths that is cleaner and less error-prone than string-based os.path. Failing to specify an encoding can lead to bugs, as the default encoding is system-dependent. For large files, read them line-by-line or in chunks to avoid consuming too much memory.
from pathlib import Path
# Reading
data = Path("data.json").read_text(encoding="utf-8")
# Writing
Path("out.txt").write_text("hello", encoding="utf-8")
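The advice above about large files can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names count_lines and sha256_of and the 64 KiB default chunk size are assumptions chosen for the example.

```python
import hashlib
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count lines by iterating the file object, which reads one line at a time."""
    with path.open("r", encoding="utf-8") as f:
        return sum(1 for _ in f)


def sha256_of(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Hash a file in fixed-size binary chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Both helpers hold at most one line or one chunk in memory at a time, whereas read_text() loads the whole file at once.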
Q2 How do you perform atomic file writes to avoid partial files?
Answer: Write to a temporary file in the same directory and os.replace it into place.
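Explanation: os.replace is atomic when the source and destination are on the same filesystem, so readers see either the old file or the complete new one, never a half-written file. Creating the temporary file in the target's directory keeps both paths on one filesystem.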
import os, tempfile
from pathlib import Path
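def atomic_write(path: Path, data: str) -> None:
    # Create the temp file next to the target so os.replace stays on one filesystem.
    with tempfile.NamedTemporaryFile("w", delete=False, dir=path.parent, encoding="utf-8") as tmp:
        tmp.write(data)
        tmp.flush()
        os.fsync(tmp.fileno())  # push the data to disk before the rename
        tmp_path = Path(tmp.name)
    # Rename only after the temp file is closed.
    os.replace(tmp_path, path)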
Q3 How do you read/write compressed files?
Answer: Use the standard-library gzip, bz2, or lzma modules; open in text mode ("rt" or "wt") with an explicit encoding to work with strings rather than bytes.
import gzip
from pathlib import Path
with gzip.open("data.json.gz", "rt", encoding="utf-8") as f:
    text = f.read()
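Writing works the same way with mode "wt". The sketch below round-trips text through gzip and reads it back line by line; the helper names write_gzip_text and read_gzip_lines are hypothetical, introduced only for illustration.

```python
import gzip
from pathlib import Path
from typing import Iterator


def write_gzip_text(path: Path, text: str) -> None:
    """Compress text to a .gz file; "wt" makes gzip.open handle the encoding."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(text)


def read_gzip_lines(path: Path) -> Iterator[str]:
    """Yield decoded lines one at a time, decompressing incrementally."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")
```

bz2.open and lzma.open accept the same mode and encoding arguments, so switching formats should only require changing the module.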
Q4 How do you recursively find files with patterns?
Answer: Use Path.rglob("pattern").
from pathlib import Path
for p in Path("logs").rglob("*.log"):
    print(p)
Q5 When is mmap useful?
Answer: For memory-mapped I/O enabling random access to large files without reading them entirely into memory.
Explanation: Well suited to scanning binary formats. The OS pages data in on demand, so only the regions you actually touch are loaded into memory.
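A minimal sketch of read-only memory mapping, assuming a non-empty binary file. The helper names find_marker and read_slice are hypothetical and exist only for this example.

```python
import mmap
from pathlib import Path


def find_marker(path: Path, marker: bytes) -> int:
    """Return the byte offset of the first occurrence of marker, or -1 if absent."""
    with path.open("rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(marker)


def read_slice(path: Path, offset: int, length: int) -> bytes:
    """Copy out `length` bytes starting at `offset` without reading the rest."""
    with path.open("rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing a mapping returns a bytes copy of just that range.
            return mm[offset:offset + length]
```

Mapping an empty file raises ValueError, so callers may want to check the file size first.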