If you've used Flask at all, you've probably at least heard about Flask-SQLAlchemy. Despite the title, it's actually a great adapter from SQLAlchemy to Flask. It handles things like tying the current database session to the lifespan of the Flask request cycle and it'll centralize a lot of the ORM stuff under one namespace instead of having to remember where to import a bunch of stuff from.
Buuuuuut, there's several major issues I have with it.
SQLAlchemy has a first() selector that either returns the first row that matches the query or None. On one hand, I understand why it acts this way but it creates a weird disjoint between it and one which will:
- Return the single matching row
- Throw a NoResultFound if there's no matching results
- Throw a MultipleResultsFound if there's more than one matching row
Personally, it'd be good design for first() to follow this pattern, the same with get(), and I'm sure the SQLAlchemy team has their reasons that are perfectly valid, but that's not what I'm here to talk about.
Flask-SQLAlchemy adds first_or_404 as well as get_or_404. These either get the desired object or calls flask.abort(404). Think about that. Your database call is either going to find the right result or make routing call into your application.
If you used these, how would you test that a non-existent user causes a 404?
from unittest import mock from myapp.users.views import user_profile def test_non_existent_user_causes_404(): with mock.patch('flask_sqlalchemy.abort') as aborter: # red flag!!! user_profile('idontexist') assert aborter.called
That's a huge red flag. I need to patch something that's essentially a private call in a third party library. Whereas if I use just regular ol' first() I can check if I called abort, which isn't great but at least I'm patching against my logic not somebody else's.
Now, I get the utility of this. When it was written, it was probably with the same mindset I have: first should throw an exception when it doesn't find what it was looking for. But instead of causing a routing call, it could simply raise NoResultFound. Or provide a first_or_raise that allows me to provide my own exception, because maybe I want to handle a UserNotFound differently than an OrderNotFound. Currently, I think best practice is to not use the x_or_404 utilities and simply check for None. Better yet, provide your own utility method that does this (more on this later).
Conflation of the Model and the Repository
SQLAlchemy does a really good job of separating models from their repositories. There's a very clear separatation between what User represents and what session.query(User) represents. The first is structure and the second is access
However, Flask-SQLAlchemy blurs this line to the point where it doesn't exist any more. Instead of requiring data access to go through a session object, Flask-SQLAlchemy places a query object on the models and allows us to do stuff like:
User.query.filter(User.permission_level >= PermissionLevels.MODERATOR.value).all()
This also encourages adding complex queries onto the model itself, further blurring the line. For example, the query above might be used often and so we create a query method on the User object:
class User: #... @classmethod def find_by_permission_level(cls, permission): return cls.query.filter(cls.permission_level >= permission.value).all()
That's pretty handy. But does it make sense? I don't think it does. Even more so when you consider that the query property and these methods are available on actual instances of User.
me = User.query.get(1) me.find_by_permission_level(PermissionLevel.MODERATOR) me.query.filter(User.username.startswith('just')).first()
And even scarier is that often model instances are passed directly to templates. It isn't common, but if you allowed users to provide their own templates, what's to stop them from running queries in the template? That's bad news.
I don't have a good solution for this problem, the best I've come up with is to wrap data access into a repository layer itself. While it might seem really silly to wrap SQLAlchemy's data access in an abstraction (abstraction-inception!), to me this makes the most amount of sense. There are pros-and-cons. The biggest pro is that these common queries now have an actual home, but the biggest con is that a repository tends to grow accessor methods at an exponential rate (find_by_id, find_by_name, find_by_permission_level, find_by_this, find_by_that) outside of implementing some sort of criteria matching, I'm not sure how to address this.
This alone doesn't fix the issue of returning database models to the template, but that can be fixed by simply scrubbing your models before returning them off to the templates. Marshmallow is good for this, but will return dictionaries instead of objects. However, that's easily fixed by providing a post_dump method that transforms the result into something with dotted access:
from marshmallow import Schema class dottedict(dict): def __getattr__(self, key): return self.__getitem__(key) def __setattr__(self, key, value): return self.__setitem__(key, value) class UserSchema(Schema): # fields here @post_dump # note this is Marshmallow >= 2.0.0a1 def dottify(self, data, many=False): if many: return [dotteddict(**datum) for datum in data] return dotteddict(**data)
And this is probably a good practice everyone should get into. This allows completely forbidding access to implementation details or private attributes altogether. We simply don't serialize things like password or surrogate primary keys. The biggest issue for this is that Marshmallow doesn't handle multiple post_dump methods deterministically. So if you need to do multiple things to the data after dumping it, you're better off defining the post dump behavior on each schema independently, rather than allowing it propagate through inheritance (a mixin might be a good fix though).
As for unblurring the line, you could set up your repository to convert between plain Python objects and SQLAlchemy models depending on if you're accessing or persisting data, but this adds a certain amount of overhead.
Using this approach also solves the issue of first_or_404, we can simply define our own first() method on the repository that throws an exception:
class UserRepository: def first(self, **kwargs): user = self.session.query(self.model).filter_by(**kwargs).first() if not user: raise UserNotFound() return UserDTO(user)
SQLAlchemy allows configuring just about every aspect of it. The type of session you use, the query class you use, on and on and on. However, Flask-SQLAlchemy locks you into using the ones it provides. There are several issues open.
And until these are resolved, the default advice is to subclass the SQLAlchemy object and override what you need. But the actual change for specifying custom query classes involves patching three internal helpers so they get the correct query class as well. Ugh.
So going back to implementing our own first_or_raise utility, we need to:
- Define our own query class with the first_or_raise method
- Subclass the SQLAlchemy class to accept a query class
- Patch _set_default_query_class, _include_sqlalchemy and _wrap_with_default_query_class to accept a query class.
We could just override the query_class attribute on the individual models if having the default query class on relationships and dynamic queries isn't an issue.
Alternatively, we could do some horrible stuff to flask_sqlalchemy.BaseQuery directly:
from flask_sqlalchemy import BaseQuery def first_or_raise(self, exception): rv = self.first() if not rv: raise exception return rv def get_or_raise(self, ident, exception): rv = self.get(ident) if not rv: raise exception return rv def patch_BaseQuery(): BaseQuery.first_or_raise = first_or_raise BaseQuery.get_or_raise = get_or_raise
It's important to not unconditionally love our tools, to turn a critical eye towards their shortcomings and to submit patches and pull requests to fix these issues. Flask-SQLAlchemy is a pretty good utility to have, if only for tying together the request-response cycle with the database session.
Its drawbacks aren't necessarily crippling, but they can leak into your code and cause a mess. Sometimes working around these drawbacks feels like shimming (providing our own query class), but other times, it's architecting a solution (finding a home for common queries).
I'm interested in hearing how you've approached these issues, or if I'm just bikeshedding.
I spilled my brains, spill some of yours.