How do I make SQLAlchemy handy

It's no secret that SQLAlchemy is the most popular ORM in Python. It allows you to write much more advanced things than most fellows Active Record. But the charge for this is more complex code, and in simple tasks like CRUD it is annoying.

How I have made Alchemy convenient, using the experience of the best Active Record ORM, read under the cut.

Introduction

I started using SQLAlchemy, having previously worked with Active Record in Ruby on Rails, with Eloquent ORM in PHP and with Django ORM and Peewee in Python. All these Active Record ORM have a concise code, and I really like them.

Alchemy is very good: it can build quite complex queries, and the Data Mapper pattern also rules. But, alas, after Active Record, the Alchemy code seems too complicated, and this is a price to pay for flexibility.

During the year I worked with Alchemy on a complex project, and realized that for complete happiness I want to have:

Convenient CRUD : for a simple create in Alchemy, you need to create an object, add it to the session , and make it flush
Dynamic construction of filters / sorting based on magic strings as in Django : Post.objects.filter(user__name__startswith='John')
Attached eager load , when you need to immediately upload a post with a comment, and to the post of its user (well, it is , but not very convenient)
Informative __repr__ : print(post) gives something like <myapp.models.Post at 0x04287A50> , it is not suitable for debugging

I sadly recalled the Active Record ORM, where all this was (of course, in each ORM there were not all features, but most). As time went on, the project became more complicated, and in the end could not stand it and decided the tasks described.

The solution is packaged in a well-tested and documented package . Works on a combat project and feels very good.

There are ready-made solutions, you say!

Yes, there is, but they are either hard to implement, or sharpened for a specific framework, or poorly documented.
I wanted to have a universal, easily connectable solution, for example, to write

 from _ import ActiveRecordMixin class User(Base, ActiveRecordMixin): pass

and have ready Active Record.
The options "initialize Alchemy only through me" and additions to flask-sqlalchemy are no good.

What is not satisfied with the specific packages, see here .

More details about each task - below in the article.

About examples in the article

I will give examples for a simple blog with typical entities User , Post , Comment .

Database schema and ORM code

 class User(Base): __tablename__ = 'user' id = Column(Integer, primary_key=True) name = Column(String) posts = relationship('Post') class Post(Base): __tablename__ = 'post' id = Column(Integer, primary_key=True) body = Column(String) user_id = Column(Integer, ForeignKey('user.id')) user = relationship('User') comments = relationship('Comment') class Comment(Base): __tablename__ = 'comment' id = Column(Integer, primary_key=True) body = Column(String) user_id = Column(Integer, ForeignKey('user.id')) post_id = Column(Integer, ForeignKey('post.id')) rating = Column(Integer) user = relationship('User') post = relationship('Post')

You can play around with the code here .
There are ORM classes, initialization of pure Alchemy (without my package) and filling with initial data.

How to start

Install Alchemy:

 pip install sqlalchemy

Then save the test application to a file and run:

 python .py

Active record

After Active Record ORM, I did not understand why I should write 3 lines to create an object

 bob = User(name='Bobby', age=1) session.add(bob) session.flush()

instead of one?

  bob = User.create(name='Bobby', age=1)

I understand that a manual flush session is needed so that requests to the database go in one bundle, and indeed the unit of work pattern gives many advantages in terms of performance.

But in real web applications, most of the tasks are trivial CRUD , and because not 3 requests will be made in the database, but one, the gain is not great. In any case, it is not worth the complexity of the code. And in general, not for nothing that the creators of Django, Ruby on Rails, Laravel, Yii chose Active Record ORM.

Well, nothing prevents implement Active Record over Data Mapper ! For this, all that’s necessary is that when the application is initialized, the session is passed to the model

 BaseModel.set_session(session) #     #       BaseModel.session

Now ORM has access to the session, and you can implement the methods save, create, update, delete , etc.

 bob = User.create(name='Bobby', age=1) bob.update(name='Bob', age=21) bob.delete()

About update

In fact, on the lower layer of SQLAlchemy there is an update of fields from the array, but I want to have it at the ORM level.

Well, I also want to quickly create a request for a model

 User.query #  session.query(User)

and quickly get the first or all records

 User.first() #  session.query(User).first() User.all() #  session.query(User).all()

or find the record by id, collapsing with an error if necessary

 User.find(1) #  session.query(User).get(1) User.find_or_fail(123987) #  ,

As a result, we get a full-fledged Active Record as in my favorite Django, Laravel and Ruby on Rails, but under the hood we have a powerful Data Mapper. Thus, we have the best of both worlds .

I am not against the standard approach of Alchemy, but for simple tasks I want to have a simpler code (see the spoiler at the end of the article )

A detailed description and examples, see here .

Eager load

To solve the problem of N + 1 requests, each ORM has its own solutions.

Suppose we display on page 10 users and all posts of each user. To avoid 11 queries (1 for users and 10 for posts), in SQLAlchemy these posts can be added

 session.query(User).options(joinedload('posts'))

or download a separate request

 session.query(User).options(subqueryload('posts'))

Well, great! Only now if you need to display comments with posts, and with the comments of their authors? Alchemy allows it , but in practice it turned out to be inconvenient.

I would like to declaratively define the relationships that we want to load:

 User.with_({ 'posts': { 'comments': { 'user': JOINED } } }).all()

it is possible and without magic lines:

 User.with_({ User.posts: { Post.comments: { Comment.user: JOINED } } }).all()

In addition, you can set a different download strategy: joinedload or subqueryload .
The following code will lead to 2 requests: for posts (with posts by the authors authors) and for comments (with posts by the authors authors comments):

 Post.with_({ 'user': JOINED, #   'comments': (SUBQUERY, { #      'user': JOINED #           }) }).all()

Well, for simple cases, when you need a joinload or subqueryload of a couple of relations, there is a simple syntax (the with_subquery method with_subquery out just like in Eloquent ):

 Comment.with_joined('user', 'post', 'post.comments').all() #      : # session.query(Comment).options(joinedload('user'), joinedload('post'), # joinedload('post.comments')).all() User.with_subquery('posts', 'posts.comments').all() #      : # session.query(User).options(subqueryload('posts'), subqueryload('posts.comments')).all()

Note that you can load nested relationships like posts.comments . This is not my feature, but Alchemy (unfortunately, I could not find the dock on this feature).

A detailed description with examples can be found here .

PS Special thanks to my colleagues for the code.

Magic operators and join relations, as in Django

The first thing that struck me when studying Django was the magic operators in the filters :

 Entry.objects.filter(headline__startswith="What")

and absolutely struck by the filtering links :

 Entry.objects.filter(blog__name='Beatles Blog')

it is simpler than a more "correct" decision in Alchemy:

 session.query(Entry).join(Entry.blog).filter(Blog.name=='Beatles Blog')

Although...

Of course, this is less flexible, complex queries will not work here.
magic lines and can potentially give a bug in Runtime, if you make a typo, for example, instead of blog__name write blogg__name . Such strings, unlike class properties like Entry.blog , IDE will not inspect.

In addition to aesthetics, magic lines allow you to build queries dynamically (for example, passing filters with UI):

 filters = {'entry__headline__contains': 'Lennon', 'entry__pub_date__year': 2008} #     Blog.objects.filter(**filters)

This is especially useful in applications where the user can build arbitrary filters.

Alas, in Alchemy there is no possibility to build queries so dynamically. The maximum that it allows is a simple filtering of the type "column = value":

 session.query(MyClass).filter_by(name = 'some name')

Taking a ready-made solution as a model (which was still not enough ), I made my own analogue of Jange and now I can filter declaratively:

 Post.where(rating__in=[2, 3, 4], user___name__like='%Bi%').all()

How is it done

The string user___name__like parsed and we understand that you need to posttify the Post.user relationship and apply the filter User.name.like('...') .
I.e

 Post.where(user___name__like='%Bi%').all()

turns into

 session.query(Post).join(Post.user).filter(User.name.like('%Bi%')).all()

And in fact it's harder

In fact, it may happen that in the query some table will appear 2 times.
Suppose I want to get users who commented on posts by Vasya

 User.where(posts___comments___user___name='Vasya').all()

It turns out that there is a user whom I request, and there is a comment author.
The problem is solved through alias , i. The final query will contain 2 tables: user and user_1 .

Of course, we cannot know in advance whether the tables will be repeated, so we make each relation that the join has its own alias:

 post_alias = User.posts.property.argument() #       relationship session.query(User).outerjoin(post_alias) #  ..

Here is a simplified analogue of the real code :

 from sqlalchemy.orm import aliased from sqlalchemy.sql import operators #    {'posts___comments___user___name__like': 'Vasya'}. : relations = ['posts', 'comments', 'user'] # 1. ,    ___ attr_name = 'name' # 2. ,     ___ op_name = 'like' # 3. ,    __ #      op_name. #      OPERATORS = {'like': operators.like_op}, #       OPERATORS[op_name] operator = operators.like_op value = 'Vasya' cls = User #           cls query = session.query(cls) #    #      last_alias = cls for relation in relations: relation = getattr(last_alias, relation) #  relation,  User.posts next_alias = aliased(relation.property.argument()) #    (Post  User.posts) #     alias query = query.outerjoin(next_alias) #  / last_alias = next_alias #   SQL-   /   (   User) attr = getattr(last_alias, attr_name) #    User.name query = query.filter(operator(attr, value)) #  SQL-,    User.name   print(query.all())

Here is the ready-to-run code .

and sort:

 Post.sort('-rating', 'user___name').all() # sort by rating DESC, user name ASC

Automatic eager load

Moreover, since we automatically join join relations, it is logical to indicate to SQLAlchemy that the specified relations are already fuzzy with the help of contains_eager . Now, if the Post.user relationship Post.user used in Post.user / sorting, then we can get the user without an additional query :

 post = Post.sort('user___name').first() print(post.user) #      , ..

A detailed description with examples, see here .

If you do not like magic

Here is an example of magic filtering and sorting in a real project .
Question to all, especially to those who dislike magic lines:

How many lines are needed for such filters and sorting on pure Alchemy?
To what extent will the purely alchemical correct code be clear and maintained?
How easy will it be to reuse?
Overall, which is better: a bit of tested, documented magic or a custom bike on pure Alchemy with 100 lines each time complex filters are needed?

All in a bunch: filters, sorting, eager load

In the real world, you have to simultaneously filter, sort, and even eager to load connections.

Suppose we filter and sort posts by the same Post.user relation. It may be that filtering and sorting 2 times pridoyunyat the same attitude, which will affect the speed.

Doesn't Alchemy itself figure it out?

If you just write

 session.query(Post).join(Post.user).join(Post.user)

then, indeed, Alchemy will make only one join.
The thing is that we make our own alias for each relationship (see the spoiler " how it's done " above), and therefore Alchemy does not know that 2 alias on Post.user is essentially the same thing, and follow up on your own.

Therefore, filtering, sorting and eager load (yes, it too) had to be done in one function in order to have information about all required joins (or rather, to have a single list of alias, see the spoiler " how it's done ") and make them only one time:

  Comment.smart_query( filters={ 'post___public': True, 'user__isnull': False }, sort_attrs=['user___name', '-created_at'], schema={ 'post': { 'user': JOINED } }).all()

A detailed description with examples can be found here .

Easy debugging with convenient `repr`

As a developer, it's very important for me to play with the application in the console. And it was very unpleasant after the beauty in Laravel to see this:

 >>> session.query(Post).all() [<myapp.models.Post object at 0x04287A50>, <myapp.models.Post object at 0x04287A90>]

This is terribly uninformative. ~~and very furious~~ . Therefore, I decided to display at least ID

 >>> session.query(Post).all() [<Post #11>, <Post #12>]

How to universally get an ID

 from sqlalchemy import inspect ids = inspect(post).identity #     primary

In general, each model has 1-2 attributes that reflect its essence. For example, the user is the name, and the post is the user and the body of the post. Therefore, I made it possible to set these attributes declaratively:

 class User(BaseModel): __repr_attrs__ = ['name'] # ... class Post(BaseModel): __repr_attrs__ = ['user', 'body'] # body is just column, user is relationship # ...

Now we have a very convenient __repr__ :

 >>> session.query(Post).all() [<Post #11 user:<User #1 'Bill'> body:'post 11'>, <Post #12 user:<User #2 'Bob'> body:'post 12'>]

Detailed description with examples here .

Results

Implemented features, without which I personally worked very hard with Alchemy.
Now life has become easier, because:

With Active Record, we can quickly do CRUD without bothering with the flush session.
With powerful filtering / sorting in the Django style, we can make very complex, dynamic joins in one line, which can even be transferred from the UI .
We can set the eager load declaratively.
Debugging has become more convenient with informative __repr__ .

On a combat project, this helped greatly simplify support and improve readability of the code.
All features, again, implemented here .

If it seems to you that: (1) all this is evil; (2) I am against pure Alchemy; (3) I did not understand, climbed to improve it

I have been working very closely with Alchemy for a year on a complex combat project. All the tasks that I have brought are not a whim, but a real necessity.
I love Alchemy and its standard approach. I like its flexibility, control over SQL, explicitness, etc. I consider Alchemy the best ORM that I have met. No Active Record can compare with it.
But the flexibility fee is a longer code. And in simple tasks like CRUD, the code of Alchemy is much more complex than that in Active Record ORM.
That's why I wrote my package, which can sometimes be used as an addition to pure Alchemy.
My package is just an add- on for Alchemy. I still love Alchemy for what it is and do not propose to abandon its original use.
About Active Record:
I understand the power of the unit of work pattern to optimize application performance. But in simple tasks, when you need to simply create an object in the controller, writing the same 3 lines for this (create an object, add to a session, flush sessions) does not inspire. That's why I did Active Record over, rather than instead of a Data Mapper for simple tasks.
About filtering / sorting in Django style:
- I know that magic strings like user___name__like are implicit and fraught with runtime errors. But sometimes it can be convenient.
- I also prefer to write explicitly : session.query(Post).join(Post.user).filter(User.name.like(...)) , but it happens that you don’t know in advance which filter field to use, or there are filters so complex that it is obvious to write - there will be a lot of code and figs in it then you will understand. Try to write here such a filter and sorting on pure Alchemy. You write something, but the code will be a lot.
  And most importantly, you have to write a bunch of complex code every time. And so you can declaratively set the necessary filters and sorting
- I understand that it is impossible to start up any filters from the front end without validation:
  - in the combat project, we set a list of valid filters of the type ['product___price', 'product___user___name'] and cut off everything else
  - if the magic filter is incorrect, the packet will produce a readable error
  - There is an idea to make a convenient magic string validator.

Thanks for attention!

Source: https://habr.com/ru/post/324876/

All Articles