5. Files & I/O
Use pathlib
for cross-platform paths, stream large files, and handle encodings/compression correctly.
Question: What are some best practices for working with files in Python?
Answer: Always use pathlib
for path manipulation, explicitly specify file encoding (e.g., encoding="utf-8"
), and use context managers (with open(...)
) to ensure files are closed properly.
Explanation: pathlib
provides a modern, object-oriented API for filesystem paths that is cleaner and less error-prone than string-based os.path
. Failing to specify an encoding can lead to bugs, as the default encoding is system-dependent. For large files, read them line-by-line or in chunks to avoid consuming too much memory.
from pathlib import Path
# Reading
data = Path("data.json").read_text(encoding="utf-8")
# Writing
Path("out.txt").write_text("hello", encoding="utf-8")
Question: How do you perform atomic file writes to avoid partial files?
Answer: Write to a temporary file in the same directory and os.replace
it into place.
Explanation: replace
is atomic on the same filesystem; readers never observe a half-written file.
import os, tempfile
from pathlib import Path
def atomic_write(path: Path, data: str):
with tempfile.NamedTemporaryFile("w", delete=False, dir=path.parent, encoding="utf-8") as tmp:
tmp.write(data)
tmp_path = Path(tmp.name)
os.replace(tmp_path, path)
Question: How do you read/write compressed files?
Answer: Use modules like gzip
/bz2
/lzma
; wrap in text mode for strings.
import gzip
from pathlib import Path
with gzip.open("data.json.gz", "rt", encoding="utf-8") as f:
text = f.read()
Question: How do you recursively find files with patterns?
Answer: Use Path.rglob("pattern")
.
from pathlib import Path
for p in Path("logs").rglob("*.log"):
print(p)
Question: When is
mmap
useful?
Answer: For memory-mapped I/O enabling random access to large files without reading them entirely into memory.
Explanation: Great for scanning binary formats; the OS handles paging efficiently.