1. Python Internals & Data Model
Master the runtime: execution model, object layout, hashing, attribute lookup, and the implications for correctness and performance.
Question: Can you describe the CPython execution model and how it manages memory?
Answer: CPython first compiles Python source code into bytecode, which is then executed by a virtual machine. For memory management, it primarily uses reference counting, supplemented by a cyclic garbage collector to deallocate objects with circular references.
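Explanation: A critical component of this model is the Global Interpreter Lock (GIL), a mutex that ensures only one thread executes Python bytecode at a time. This simplifies memory management by preventing race conditions on object reference counts, but it is a known bottleneck for CPU-bound multi-threaded applications. I/O-bound threads are less affected because the GIL is released while a thread waits on blocking I/O. CPython 3.13 also introduced an optional, experimental free-threaded build that runs without the GIL.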
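As a minimal sketch of the two ideas above, assuming a standard CPython build: the dis module exposes the bytecode the compiler produced for a function, and sys.getrefcount reports an object's current reference count. The add function and the payload and alias names are illustrative only, and exact opcode names differ between CPython versions.

```python
import dis
import sys


def add(a, b):
    return a + b


# Collect the opcode names the compiler produced for `add`.
# Exact opcodes vary between CPython versions, so none are listed here.
opnames = [ins.opname for ins in dis.get_instructions(add)]

# sys.getrefcount counts its own argument as one extra reference,
# so compare two readings instead of relying on the absolute value.
payload = object()
before = sys.getrefcount(payload)
alias = payload                      # bind a second name to the same object
after = sys.getrefcount(payload)     # higher than `before`
```

Calling dis.dis(add) prints the same instructions in a readable listing, which is a quick way to see what the virtual machine actually executes.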
Question: What is the difference between is and ==? What is the relationship between an object's equality and its hash?
Answer: The is operator compares object identity (i.e., whether two variables refer to the exact same object in memory), while == compares equality by calling the __eq__ method. The hash/equality contract states that if two objects are considered equal (a == b), then their hash values must also be equal (hash(a) == hash(b)).
Explanation: The reverse is not true: two objects with the same hash are not necessarily equal, which is known as a hash collision. This contract is essential for objects to work correctly as keys in dictionaries or as elements in sets.
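A minimal sketch of the contract, using a hypothetical Point class that is not part of the original text. The key idea is that __hash__ is computed from exactly the fields that __eq__ compares.

```python
class Point:
    """Hypothetical value type used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self.x, self.y))


a = Point(1, 2)
b = Point(1, 2)

same_object = a is b          # False: two distinct objects
equal = a == b                # True: __eq__ compares the fields
lookup = {a: "found"}[b]      # works because hash(a) == hash(b) and a == b
```

Note that a class which defines __eq__ without defining __hash__ has its __hash__ set to None, so its instances become unhashable and cannot be used as dictionary keys.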
Question: Explain the difference between __getattr__ and __getattribute__.
Answer: __getattribute__ is called for every attribute access on an object, regardless of whether the attribute exists. __getattr__ is a fallback that is only called if the requested attribute is not found through normal mechanisms.
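Explanation: You must be extremely careful when implementing __getattribute__: accessing any attribute through self inside it re-enters the method and recurses infinitely. Avoid this by delegating to the base class's __getattribute__ method, for example object.__getattribute__(self, name). __getattr__ is safer and is commonly used to implement proxy objects or to compute attributes on the fly.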
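The following sketch shows both hooks on one hypothetical class (Traced, with a made-up log list). It is one way to illustrate the call order, not a recommended pattern for production code.

```python
class Traced:
    """Hypothetical class that records every attribute access."""

    def __init__(self):
        self.real = 1

    def __getattribute__(self, name):
        # Go through object.__getattribute__ to avoid infinite recursion;
        # writing self.log here would re-enter this method.
        log = object.__getattribute__(self, "__dict__").setdefault("log", [])
        log.append(name)
        return object.__getattribute__(self, name)

    def __getattr__(self, name):
        # Reached only after normal lookup raised AttributeError.
        return f"computed:{name}"


t = Traced()
found = t.real        # normal lookup succeeds; __getattr__ is not called
fallback = t.missing  # normal lookup fails; __getattr__ supplies a value
accessed = list(object.__getattribute__(t, "log"))  # ['real', 'missing']
```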
Question: What is the purpose of __slots__ and when is it appropriate to use it?
Answer: __slots__ is a class variable that pre-declares instance attributes. By defining __slots__, you prevent the creation of a __dict__ and __weakref__ for each instance, leading to significant memory savings and slightly faster attribute access.
Explanation: This is an optimization best used when you expect to create a very large number of small objects where memory is a concern. The main trade-off is inflexibility: you cannot add new attributes to instances that are not declared in __slots__.
class Money:
    __slots__ = ("amount", "currency")

    def __init__(self, amount: int, currency: str) -> None:
        self.amount = amount
        self.currency = currency
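To make the trade-off concrete, here is a small sketch with a hypothetical SlottedPoint class: instances have no per-instance __dict__, and assigning an undeclared attribute fails.

```python
class SlottedPoint:
    """Hypothetical class used only to demonstrate __slots__ behaviour."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = SlottedPoint(1, 2)
has_dict = hasattr(p, "__dict__")   # False: no per-instance dict is created

try:
    p.z = 3                         # "z" is not declared in __slots__
except AttributeError:
    rejected = True
else:
    rejected = False
```

If instances need to be weakly referenceable, "__weakref__" can be listed in __slots__ explicitly.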
Question: How does a Python dict or set work internally to achieve O(1) lookups on average?
Answer: Python's dictionaries and sets are implemented using a hash table. When an object is added, its hash is used to determine which "bucket" to place it in. This allows for average O(1) time complexity for lookups, insertions, and deletions.
Explanation: The hash table is a sparse array. To find an element, Python re-computes the hash of the key to immediately find the correct bucket. If multiple keys hash to the same bucket (a collision), Python uses a technique called open addressing to probe for the next available slot. The table is automatically resized as it grows to maintain sparsity, which is why the O(1) complexity is an amortized average.
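The mechanism can be sketched with a toy table. This is a teaching simplification, not CPython's implementation: it uses plain linear probing and a single array, whereas CPython uses a perturbation-based probe sequence and a more compact layout. The ToyHashTable name, the 2/3 load limit, and the doubling policy are assumptions made for the example, and deletion is omitted.

```python
class ToyHashTable:
    """Teaching sketch of open addressing; no deletion support."""

    def __init__(self, capacity=8):
        self._slots = [None] * capacity
        self._used = 0

    def _probe(self, key):
        # Start at hash(key) mod capacity and step forward until we hit
        # either the key itself or an empty slot.
        n = len(self._slots)
        i = hash(key) % n
        while self._slots[i] is not None and self._slots[i][0] != key:
            i = (i + 1) % n
        return i

    def __setitem__(self, key, value):
        # Grow first if this insert could push the table past 2/3 full,
        # which keeps the table sparse and probe sequences short.
        if (self._used + 1) * 3 > len(self._slots) * 2:
            self._resize()
        i = self._probe(key)
        if self._slots[i] is None:
            self._used += 1
        self._slots[i] = (key, value)

    def __getitem__(self, key):
        entry = self._slots[self._probe(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def _resize(self):
        # Double the capacity and re-insert every existing entry.
        old = [entry for entry in self._slots if entry is not None]
        self._slots = [None] * (len(self._slots) * 2)
        self._used = 0
        for key, value in old:
            self[key] = value


table = ToyHashTable()
for n in range(20):
    table[n] = n * n
table[8] = "overwritten"   # same key, so the existing slot is reused
```

Each individual resize costs O(n), but because the capacity doubles, that cost is spread over many cheap inserts, which is where the amortized O(1) figure comes from.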
Question: What is the Method Resolution Order (MRO) and how does super() work?
Answer: MRO is the order in which Python looks up attributes on a class and its bases. super() uses the class's MRO to delegate to the next method in the resolution chain, enabling cooperative multiple inheritance.
Explanation: In the diamond pattern, all classes must call super() to ensure each base is initialized exactly once. The C3 linearization algorithm defines the MRO.
class A:
    def __init__(self):
        self.trace = ["A"]

class B(A):
    def __init__(self):
        super().__init__()
        self.trace.append("B")

class C(A):
    def __init__(self):
        super().__init__()
        self.trace.append("C")

class D(B, C):
    def __init__(self):
        # MRO is D, B, C, A: the calls chain B -> C -> A, and each
        # __init__ appends its name after its super() call returns.
        super().__init__()
        self.trace.append("D")

# D().trace == ['A', 'C', 'B', 'D']
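The linearization itself can be inspected through the __mro__ attribute. The sketch below uses a hypothetical who method on a separate diamond so that the call order is visible directly in the return value.

```python
class A:
    def who(self):
        return ["A"]


class B(A):
    def who(self):
        return ["B"] + super().who()


class C(A):
    def who(self):
        return ["C"] + super().who()


class D(B, C):
    def who(self):
        return ["D"] + super().who()


# C3 linearization places both B and C before their shared base A.
mro_names = [cls.__name__ for cls in D.__mro__]  # ['D', 'B', 'C', 'A', 'object']

# Inside B.who, super() resolves to C (the next class in D's MRO), not to A.
call_order = D().who()                           # ['D', 'B', 'C', 'A']
```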
Question: What is the difference between __new__ and __init__?
Answer: __new__ creates and returns a new instance (it's a static method on the class), while __init__ initializes the already created instance.
Explanation: Override __new__ to control instance creation (e.g., for immutables like tuple, singletons, or caching). Use __init__ for normal post-construction initialization.
class Singleton:
    _inst = None

    def __new__(cls, *a, **kw):
        if cls._inst is None:
            cls._inst = super().__new__(cls)
        return cls._inst
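For the immutable case, here is a minimal sketch with a hypothetical Celsius subclass of float. Because a float's value is fixed when the object is created, validation has to happen in __new__ rather than __init__.

```python
class Celsius(float):
    """Hypothetical immutable value type built on float."""

    def __new__(cls, degrees):
        # float is immutable, so the value must be settled here;
        # by the time __init__ runs it can no longer be changed.
        if degrees < -273.15:
            raise ValueError("temperature below absolute zero")
        return super().__new__(cls, degrees)


boiling = Celsius(100)   # behaves like the float 100.0
```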
Question: What is the buffer protocol and why use memoryview?
Answer: The buffer protocol exposes raw memory of objects (like bytes, bytearray, NumPy arrays) without copying. memoryview lets you slice and manipulate large binary data efficiently.
Explanation: It avoids allocations and copies, which is critical in high-throughput I/O and binary processing.
data = bytearray(b"\x00" * 10)
mv = memoryview(data)
mv[2:5] = b"abc" # in-place, no copy
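The difference between a view and a copy can be seen in a short sketch. The variable names are illustrative. Slicing a memoryview yields another view over the same memory, while slicing the bytearray itself produces an independent copy.

```python
payload = bytearray(range(16))
view = memoryview(payload)

header = view[:4]              # a sub-view over the same memory, not a copy
header[0] = 0xFF               # writes through to the underlying bytearray

copied = bytes(payload[:4])    # slicing the bytearray itself copies the data
payload[1] = 0xEE              # visible through the view, but not in the copy

view_bytes = header.tobytes()  # snapshot of what the view currently sees

# Release the views so the bytearray can be resized again later.
header.release()
view.release()
```

While any view is still active, resizing the underlying bytearray raises BufferError, which is why the views are released explicitly here.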
Question: When and how would you use weak references?
Answer: Use weakref to reference objects without increasing their reference count, allowing garbage collection when no strong refs remain.
Explanation: Useful for caches or cross-references that shouldn’t keep objects alive.
import weakref

class SomeHeavyObject:  # hypothetical placeholder so the snippet runs
    id = 42

cache = weakref.WeakValueDictionary()
obj = SomeHeavyObject()
cache[obj.id] = obj  # removed automatically once obj is garbage-collected
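The lifetime behaviour can be observed with a plain weakref.ref. In this sketch the Resource class is a hypothetical stand-in. The explicit gc.collect() call is not needed on CPython for this case, where reference counting frees the object immediately, but it keeps the example meaningful on other implementations.

```python
import gc
import weakref


class Resource:
    """Hypothetical stand-in for an expensive object."""


res = Resource()
ref = weakref.ref(res)

alive = ref() is res   # True: calling the ref returns the live object
del res                # drop the only strong reference
gc.collect()           # ensure collection on non-refcounting implementations
dead = ref() is None   # True: the referent has been reclaimed
```

Not every object supports weak references: instances of built-in types such as list, dict, and int cannot be weakly referenced directly, although subclasses of list and dict can.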