
During my visit to PyConRu 2014, I was surprised to learn that a large enough audience of python developers does not use SQLAlchemy as the main tool for working with the database. After discussing this topic after Light Talks, it was decided with colleagues to write an article about what can be done with all the power of SQLAlchemy.
Usually in writing sites do not need something like that from the regular ORM. And if it is required, then there are enough replacements for abnormal ones or reading the main part of the documentation. And, as a rule, it is not necessary to break the head over complex queries. Quite a lot of different ORMs offer the classic One-2-Many, One-2-One, Many-2-Many, etc. schemes. For ordinary queries and links this is quite enough. Unfortunately, in large projects it does not do without particular cases and programmers with complex queries write either raw sql or rely on what the basic ORM functionality offers them. It does not look very nice or creates a rather large load on the database.
It is clear that in pursuit of the speed of execution of scenarios, you can sacrifice the beauty of the code, but what if speed can be neglected, but not cross-platform? And I don’t want to see in python code anything other than python code. And what if you want to use your favorite ORM to its fullest (for my SQLAlchemy) and not write raw sql queries?
It is assumed that you have already configured access to the database and you know how to map classes, create a session and make simple requests (this is described in the SQLAlchemy documentation at
www.pythoncentral.io/series/python-sqlalchemy-database-tutorial ).
')
Below is a set of classes that we will use in our examples. Of course, he will not cover all those many cases that may arise, but I hope that you will have a low start for writing your own complex queries and help you to get rid of manually writing complex SQL queries.
Short note: I will not prescribe all imports for each example.
Here are some features you might need:
from sqlalchemy import func, and_, or_, not_, aliased
For the rest, refer to the documentation.
I want to separately note the function func. This function will allow you to generate almost any expression for your database.
from sqlalchemy import Column, Integer, String from sqlalchemy.schema import ForeignKey class UserStatus(Base): __tablename__ = 'user_statuses' STATUS_INITIAL = 1 id = Column(Integer(), primary_key=True) name = Column(String(), unique=True) class User(Base): __tablename__ = 'users' id = Column(Integer(), primary_key=True) username = Column(String(), unique=True) password = Column(String(), nullable=False) status_id = Column( Integer(), ForeignKey('user_statuses.id'), nullable=False, default=UserStatuses.STATUS_INITIAL ) class Role(Base): __tablename__ = 'roles' id = Column(Integer(), primary_key=True) name = Column(String(), unique=True) class UserRole(Base): __tablename__ = 'users_roles' user_id = Column(Integer(), ForeignKey('users.id')) role_id = Column(Integer(), ForeignKey('roles.id')) class Product(Base): __tablename__ = 'products' id = Column(Integer(), primary_key=True) name = Column(String(), unique=True) class Order(Base): __tablename__ = 'orders' id = Column(Integer(), primary_key=True) product_id = Column(Integer(), ForeignKey('products.id')) user_id = Column(Integer(), ForeignKey('users.id'))
Yes, there are no relationships in the structure. I did not begin to write them for one simple reason, we will not need them. This is, of course, cool when all the relations and backrefs are registered, but they can only select dependent data. We will try to use all the tables at once in different variations.
Simple JOIN
For example, we need to take all users with their roles, products, orders and statuses.
with SessionContext() as session: query = session.query(User, Role, Product, Order, UserStatus) records = query.all() for user, role, product, order, user_status in records:
In this case, SQLAlchemy will generate an INNER JOIN. This method is good when you have all the indexes in the database (believe me, very often they are not). SQLAlchemy itself generates a query based on the data of the class (after all, we have connections).
And what if everything is not so smooth, and there is no possibility to specify the Foreign Key in the database (for various reasons)? For this, SQLAlchemy allows you to explicitly specify for which columns we will link the tables.
with SessionContext() as session: query = session.query(User, Role, Product, Order, UserStatus) query = query.join(UserRole, UserRole.user_id == User.id) query = query.join(Role, Role.id == UserRole.role_id) query = query.join(Order, Order.user_id == User.id) query = query.join(Product, Product.id == Order.product_id) query = query.join(UserStatus, UserStatus.id == User.status_id) records = query.all() for user, role, product, order, user_status in records:
In this case, even if there is unrelated data in the database, we can select all the necessary records.
Simple LEFT JOIN
Suppose we need to take ALL users and even those who have no orders. Those. if the user has orders, then show them, and if not, then show the user without orders.
with SessionContext() as session: query = session.query(User, Role, Product, Order, UserStatus) query = query.join(UserRole, UserRole.user_id == User.id) query = query.join(Role, Role.id == UserRole.role_id) query = query.join(UserStatus, UserStatus.id == User.status_id) query = query.outerjoin(Order, Order.user_id == User.id) query = query.outerjoin(Product, Product.id == Order.product_id) records = query.all() for user, role, product, order, user_status in records:
Here we used the outerjoin function, which for PostgreSQL will generate a LEFT OUTER JOIN Query.
Complex queries
Sometimes there is a situation when we need to do a lot of sorts and add a lot of conditions. Complicates everything, often the base itself.
For example. Suppose users can order the same product several times. You need to select a record that indicates whether the user bought the product for each user and each product. In this case, if the user bought the product more than once, then it does not matter which purchase information will be selected. If the user did not buy this product, then the desired behavior is similar to left join. To group the results you can use GROUP BY, if we were not important data on the purchase. Otherwise, it is necessary to specify all required fields (which are specified in select) for GROUP BY, which is highly undesirable, since it will create additional load on the database. To select data, it is better to use DISTINCT ON, which simply cuts off duplicate records by product ID and user ID. The problem is that PostgreSQL requires that those columns specified in DISTINCT ON are present in ORDER BY. And I want to sort the output by user name (for example). This is where the fun begins. Fortunately, the database allows you to "wrap" one query into another.
SQLAlchemy has the cte () * Common Table Expression * function. This function creates a subquery from your query.
Example
with SessionContext() as session: query = session.query(User, Role, Product, Order, UserStatus) query = query.distinct(Product.id, User.id) query = query.join(UserRole, UserRole.user_id == User.id) query = query.join(Role, Role.id == UserRole.role_id) query = query.join(UserStatus, UserStatus.id == User.status_id) query = query.outerjoin(Order, Order.user_id == User.id) query = query.outerjoin(Product, Product.id == Order.product_id) query = query.order_by(Product.id, User.id) found_records = query.cte() main_query = session.query(found_records).order_by(found_records.c.user.username) records = main_query.all() for user, role, product, order, user_status in records:
In this example, we “isolated” the main query and sorted the result as we needed. You can also do with complex samples with group_by.
There is another useful tool in SQLAlchemy called subquery. This function allows you to use the generated statement in large queries or in a JOIN.
with SessionContext() as session: stmt = session.query(Order.user_id, sqlalchemy.func.count('id').label('users_found')).subquery() main_query = session.query(User, stmt.c.users_found).outerjoin(stmt, User.id==stmt.c.user_id).order_by(User.id) records = main_query.all() for user, orders in records:
This example makes a subquery within a query via LEFT OUTER JOIN and returns the number of orders for each user found.
Also, there are situations when it is necessary to link one and the same table in one query. For this there is a function aliased (). To be honest, I have never used it, so I will take an example from the documentation
with SessionContext() as session: adalias1 = aliased(Address) adalias2 = aliased(Address) for username, email1, email2 in \ session.query(User.name, adalias1.email_address, adalias2.email_address).\ join(adalias1, User.addresses).\ join(adalias2, User.addresses).\ filter(adalias1.email_address=='jack@google.com').\ filter(adalias2.email_address=='j25@yahoo.com'): print username, email1, email2
Well, in the end, an example that is used in my code and works on PostgreSQL. The over () function is not in the postgresql dialect section, so it will most likely work everywhere. In this case, I want to show how to work with the func function.
from sqlalchemy import over with SessionContext() as session: query = session.query( User, over( func.row_number(), partition_by=User.id ) ) for user, row_number in query.all(): print "{0}# {1}".format(row_number, user.username)
This example will display the numbered user names in the order in which they were found in the database.
At the end of the article I repeat. Perhaps this code will not be the fastest, but at least you will be able to maintain many databases that are supported by SQLAlchemy.
For example, for development and testing, you can not deploy a full-fledged relational database on a local machine, but use SQLite.
The advantages of this approach include:
- Code uniformity. Only python will be in python files.
- Screening possible incoming data using SQLAlchemy
- Ability to use different databases without changing the query syntax
- Ability not to know SQL at all. There is no direct use of SQL.
- Do not write kilometer requests yourself
Minuses:
- Memory consumption for function calls, for storage in the memory of generators. The string does take up less memory space.
- It is impossible to see immediately what generated SQLALchemy (only at the time of execution)
- You need to know SQLAlchemy, its functions and capabilities in order to use it correctly. Still, the library is not small
Read more about SQLAlchemy ORM here:
sqlalchemy.org