One of my favorite shows is "How It's Made"; my enjoyment mostly stems from learning how stuff is made, but the narrator's cheeky puns and jokes certainly add to it. But something I enjoy even more than knowing how stuff is put together is knowing how things work. I don't know what it is, but I have this childlike fascination with opening things up and learning how they fit together and what each part does. That was one of my favorite things about my brief stint (a whopping six months!) in the automotive service industry: understanding, a little better, how cars work. It certainly opened my eyes to all the work that goes into even simple automotive repairs.
Sadly, I no longer work on or with cars. I do still fiddle with mine, though, and if anyone has a good link explaining how a transmission (manual or automatic) actually works, I'd be thrilled! But moving on left a hole in my life, one I've recently begun to fill by learning how Python operates under the hood, so to speak. While my skills with C (which basically amount to printf and for loops) leave me woefully unprepared to examine much of the source, I can examine the surface parts.
To use a car analogy: if reading the C source for Python is like repairing a damaged block or transmission, then examining how Python works from the Python side is more like replacing motor mounts and broken belts (something I'm regretfully too familiar with on my CR-V), while reading someone else's Python is like doing your own fluid changes. Flawed analogies aside, I'd like to examine more fully how Python objects work and what it really means to call foo.bar().
Fair warning: this knowledge is great for understanding what's happening, but it isn't crucial for working with classes and objects in the everyday sense. Everything I discuss here describes how Python 3 handles things; Python 2 is slightly different.
Building a Class
To talk about Python's data model and how it relates to classes and objects, we should first write a class. It's so basic you might wonder why we're bothering. The point is, rather than examine some fictional class or object, why not have one of our own to open up and poke at?
```python
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)
```
That's an extremely basic object: the initializer takes a single argument, and a method prints it out. Of course, we need to instantiate it before we can get any use out of it.
```python
foo = Baz(1)
```
Already, there are some mechanisms at work for us. I don't want to get too deep into instance creation, but the short takeaway is that the __new__ method every class implicitly inherits from object handles creating the instance, and __init__ simply sets its initial state. Delving into __new__ leads into metaclasses, which is a topic for another time. What I want to focus on today is what happens when we call foo.bar().
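For an ordinary class like Baz, calling Baz(1) roughly amounts to two steps: allocate a blank instance with the __new__ inherited from object, then hand that instance to __init__. The sketch below performs those two steps by hand. It's a simplification, assuming no custom __new__ or metaclass; the real orchestration happens inside type.__call__, in C.

```python
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


# Step 1: allocate a bare instance. Baz doesn't define __new__,
# so this is the one it inherits from object.
foo = Baz.__new__(Baz)
print(type(foo).__name__, foo.__dict__)  # Baz {}

# Step 2: set the initial state on the already-created instance.
Baz.__init__(foo, 1)
print(foo.__dict__)  # {'thing': 1}

foo.bar()  # 1
```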
Classes and Objects
You'll often hear that objects and classes in Python are nothing more than a pile of dictionaries with dotted access. This opaque phrasing confused me for a long time, and it wasn't until I began asking, "How the heck does self actually get passed?" that I began to understand. That question sent me down a rabbit hole that led me to descriptors, __getattribute__, and what they do.
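All classes in Python have an underlying __dict__, and nearly every instance does as well (instances of built-ins like int, and of classes that define __slots__, are the exceptions). The first step to understanding foo.bar() is realizing that methods live at the class level.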
```python
print('bar' in Baz.__dict__)  # True
print('bar' in foo.__dict__)  # False
```
Methods are entries in the class's underlying __dict__ but not in the instance's. Because of this, most Python objects can remain relatively small: they store only their own state rather than all of their available methods as well. What does this method look like in the dictionary?
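Two quick experiments back this up. First, a method attached to the class after an instance already exists is immediately usable from that instance, because nothing was ever copied into it (shout is a made-up method for illustration). Second, the "nearly" in "nearly every instance": a class that declares __slots__ (the hypothetical SlottedBaz here) gives its instances fixed attribute slots instead of a per-instance __dict__.

```python
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


a = Baz(1)

# Attach a new (hypothetical) method to the class after the instance exists.
Baz.shout = lambda self: print(str(self.thing) + '!')

a.shout()                  # 1!  (found on the class; nothing was copied into a)
print('shout' in vars(a))  # False: the instance __dict__ still only holds state
print(vars(a))             # {'thing': 1}


# A class with __slots__: its instances carry no __dict__ at all.
class SlottedBaz:
    __slots__ = ('thing',)

    def __init__(self, thing):
        self.thing = thing


s = SlottedBaz(3)
print(hasattr(s, '__dict__'))  # False
print(s.thing)                 # 3
```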
```python
from inspect import isfunction, ismethod

print(isfunction(Baz.__dict__['bar']))
print(ismethod(Baz.__dict__['bar']))
print(Baz.__dict__['bar'])
```
```
True
False
<function Baz.bar at 0x7f1d05a87ea0>
```
We can see that in the class's dictionary, methods are stored as plain functions, not as methods. It's reasonable to infer that methods are really functions that operate on class instances. From here, we can imagine that behind the scenes, Python finds that function and passes our instance to it as the first argument, self.
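One way to convince yourself the stored function isn't tied to Baz at all: call it directly with a stand-in object. Here impostor is a made-up SimpleNamespace, not part of the original example; the function happily treats it as self, because all it needs is a .thing attribute.

```python
from inspect import isfunction
from types import SimpleNamespace


class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


bar_function = Baz.__dict__['bar']  # the plain function stored on the class
print(isfunction(bar_function))     # True

# Nothing ties this function to Baz: any object with a .thing attribute will do as self.
impostor = SimpleNamespace(thing='not a Baz')
bar_function(impostor)  # not a Baz
```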
The next piece of the puzzle is how Python handles attribute access. If you're not familiar with how Python attribute lookup happens, in short, it looks like this:
- Call __getattribute__
- Is the attribute in the object's __dict__?
- No? Is the attribute in the class's __dict__?
- No? Is the attribute in any of the parent classes' __dict__?
- No? Call __getattr__ if present.
- Else, raise an AttributeError
Python starts by calling __getattribute__. This is what actually makes dotted access work: you can think of the . in foo.bar as an implicit call to this method, which translates dotted access into dictionary lookups and runs the rest of the chain. Since we already know that methods live in the class's __dict__, and that methods are functions that act on the instance, we'll fast-forward to that step and extrapolate.
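The simplified chain from the list can be modeled in a few lines of plain Python. This is a sketch only: simple_lookup, Parent, and Child are hypothetical names, and the model deliberately ignores descriptors (the real lookup also gives "data descriptors" defined on the class priority over the instance __dict__).

```python
def simple_lookup(obj, name):
    """Hypothetical model of the simplified lookup list (no descriptors)."""
    # 1. Is the attribute in the object's own __dict__?
    instance_dict = getattr(obj, '__dict__', {})
    if name in instance_dict:
        return instance_dict[name]
    # 2-3. The class's __dict__, then each parent's, in method resolution order.
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    # 4. Fall back to __getattr__ if some class in the MRO defines it.
    for klass in type(obj).__mro__:
        if '__getattr__' in vars(klass):
            return vars(klass)['__getattr__'](obj, name)
    # 5. Give up.
    raise AttributeError(name)


class Parent:
    greeting = 'hello'


class Child(Parent):
    def __getattr__(self, name):
        return 'no ' + name + ' here'


child = Child()
child.x = 1

print(simple_lookup(child, 'x'))         # 1 (instance __dict__)
print(simple_lookup(child, 'greeting'))  # hello (parent class __dict__)
print(simple_lookup(child, 'missing'))   # no missing here (__getattr__)
print(child.missing)                     # no missing here (the real lookup agrees)
```

The model's blind spot shows up quickly: simple_lookup(child, '__getattr__') returns a plain function, while child.__getattr__ gives back a bound method. Something beyond these five steps is doing that transformation.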
Since methods are functions that live in the class's dictionary and act on instances, and __getattribute__ turns dotted access into dictionary lookups, we can infer that behind the scenes, foo.bar() looks something like Baz.__dict__['bar'](foo).
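That naive expansion does produce the same output as the real call. A quick check, using a small hypothetical helper, captured, that records what a call prints via contextlib.redirect_stdout:

```python
import contextlib
import io


class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


def captured(call):
    """Run call() and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        call()
    return buffer.getvalue()


foo = Baz(1)
dotted = captured(lambda: foo.bar())
# The guess: fetch the function from the class dict and pass foo ourselves.
naive = captured(lambda: Baz.__dict__['bar'](foo))

print(dotted == naive)  # True
print(repr(naive))      # '1\n'
```

Matching output doesn't mean matching mechanics, though: foo.bar on its own turns out not to be that plain function.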
Methods vs Functions
So far, so good. All this is pretty easy to grasp. But there's still the burning question of how the heck self (or rather foo) is being passed to our methods. If we print both Baz.bar and foo.bar, we can see there's a transformation going on somewhere.
```
<function Baz.bar at 0x7f1d05a87ea0>
<bound method Baz.bar of <__main__.Baz object at 0x7f1d05a88208>>
```
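The bound method object keeps track of both halves explicitly. Its __func__ and __self__ attributes (standard on Python 3 bound methods) show exactly where self comes from:

```python
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


foo = Baz(1)
bound = foo.bar

# A bound method is a small object pairing the function with the instance.
print(bound.__func__ is Baz.__dict__['bar'])  # True: the same function from the class dict
print(bound.__self__ is foo)                  # True: the instance that becomes self

# Calling it just calls the function with __self__ prepended to the arguments.
bound()                         # 1
bound.__func__(bound.__self__)  # 1

# A fresh bound method is built on every attribute access...
print(foo.bar is foo.bar)  # False
# ...but they compare equal, since they wrap the same function and instance.
print(foo.bar == foo.bar)  # True
```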
Python is somehow transforming the function that lives in Baz's dictionary into a method tied to our instance foo. The answer lies in the descriptor protocol. I've written about it elsewhere, and it's probably time to revisit that post with my recent understanding. But essentially, descriptors add another rule to our attribute lookup: if the attribute found in a class's __dict__ is a descriptor (an object whose type defines __get__), call its __get__ method and return that result instead. This happens as part of normal lookup, so __getattr__ never comes into play when a descriptor is found.
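To see that rule in action, here is a minimal descriptor. Loud, Widget, and calls are made-up names for illustration. Python hands __get__ the instance (or None when the lookup goes through the class itself) plus the owning class, and whatever __get__ returns becomes the value of the attribute access.

```python
calls = []  # records every (instance, owner) pair __get__ receives


class Loud:
    """Hypothetical descriptor that reports every __get__ call."""

    def __get__(self, instance, owner):
        calls.append((instance, owner))
        print('__get__ called with instance =', instance, 'and owner =', owner.__name__)
        return 42


class Widget:
    answer = Loud()

    def __repr__(self):
        return '<Widget>'


w = Widget()
print(w.answer)       # __get__ called with instance = <Widget> and owner = Widget, then 42
print(Widget.answer)  # __get__ called with instance = None and owner = Widget, then 42
```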
This is our missing link. When a function is defined in a class body, it's placed in the class's dictionary as a plain function, but function objects are themselves descriptors. More precisely, they're non-data descriptors, because the function type defines __get__ but not __set__ or __delete__. Descriptors work by intercepting lookup of the attributes they're stored under.
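Both halves of that claim can be checked from Python. The sketch below inspects the function type (types.FunctionType) for the descriptor methods, then shows a practical consequence of being a non-data descriptor: an instance attribute with the same name shadows the method, because the instance __dict__ wins over non-data descriptors.

```python
from types import FunctionType


class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


print(type(Baz.__dict__['bar']) is FunctionType)  # True

# The function type defines __get__ only, which makes it a non-data descriptor.
print(hasattr(FunctionType, '__get__'))     # True
print(hasattr(FunctionType, '__set__'))     # False
print(hasattr(FunctionType, '__delete__'))  # False

# Non-data descriptors lose to the instance __dict__, so this shadows the method.
foo = Baz(1)
foo.bar = lambda: print('shadowed')
foo.bar()  # shadowed

# Remove the instance attribute and the method is visible again.
del foo.bar
foo.bar()  # 1
```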
The function type's __get__ (implemented in C, of course) likely bears a passing resemblance to this:
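```python
from types import MethodType


class MethodDescriptor:
    def __init__(self, method):
        self.method = method  # the plain function being wrapped

    def __get__(self, instance, cls):
        # Looked up on the class itself (e.g. Baz.bar): hand back the bare function.
        if instance is None:
            return self.method
        # Looked up on an instance: bind the function to it so self is filled in.
        return MethodType(self.method, instance)
```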
So, our initial thought of what foo.bar() looks like under the covers was wrong. It more accurately resembles:
```python
Baz.__dict__['bar'].__get__(foo, Baz)()

# if we inspect it, we see the truth
print(Baz.__dict__['bar'].__get__(foo, Baz))
```
```
1
<bound method Baz.bar of <__main__.Baz object at 0x7f1d05a88208>>
```
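The dot is where __getattribute__ ties this together: foo.bar is, roughly, type(foo).__getattribute__(foo, 'bar'), and for an ordinary class that is object's version, which spots the function's __get__ and calls it. The comparison below checks that all three spellings agree. Traced is a hypothetical subclass that overrides __getattribute__ to show that every dotted access, including self.thing inside the method, passes through it.

```python
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


foo = Baz(1)

via_dot = foo.bar
via_getattribute = type(foo).__getattribute__(foo, 'bar')
via_descriptor = Baz.__dict__['bar'].__get__(foo, Baz)

# All three are equivalent bound methods wrapping the same function and instance.
print(via_dot == via_getattribute == via_descriptor)  # True
print('__getattribute__' in vars(Baz))  # False: Baz inherits object's version


class Traced(Baz):
    def __getattribute__(self, name):
        print('looking up', name)
        return super().__getattribute__(name)


# Prints "looking up bar", then "looking up thing" (from self.thing inside bar), then 2.
Traced(2).bar()
```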
And in fact, if we put our imitation method descriptor into action, it works similarly to how object methods do.
```python
def monty(self, x):
    print(x)


class Spam:
    eggs = MethodDescriptor(monty)

    # of course, it's also usable as a decorator
    @MethodDescriptor
    def bar(self):
        return 4


ham = Spam()  # a lie if I ever saw one
print(Spam.eggs)
print(ham.eggs)
ham.eggs(1)
print(ham.bar())
```
```
<function monty at 0x7f1d045cef28>
<bound method Spam.monty of <__main__.Spam object at 0x7f1d05a780b8>>
1
4
```
We see a plain function when we access a method (like eggs) through the class because the descriptor still runs, but with instance set to None, so it decides to simply return the function itself.
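The real function descriptor makes the same choice as the imitation. Asking the function in Baz's dictionary for its __get__ with no instance hands back the function unchanged. This is Python 3 behavior; Python 2 returned an "unbound method" object here instead.

```python
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


func = Baz.__dict__['bar']

# Looking bar up on the class still goes through __get__, with instance=None...
print(func.__get__(None, Baz) is func)  # True: no instance, so the function comes back as-is
print(Baz.bar is func)                  # True: which is exactly what Baz.bar gives us

# ...so calling through the class means supplying self yourself.
foo = Baz(1)
Baz.bar(foo)  # 1
```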
I've spilled my brains; now spill some of yours.