The original article is called "10 SQL errors that Java developers make", but, by and large, the principles it contains apply to almost any language.
Java programmers mix object-oriented and imperative thinking, depending on their level of:
- skill (anyone can program imperatively)
- dogma (applying the pattern of applying patterns everywhere and naming them)
- mood (a truly object-oriented approach is a bit harder to write than an imperative one)
But everything changes when Java developers write SQL code.
SQL is a declarative language that has nothing to do with object-oriented or imperative thinking. It is very easy to express a query in SQL, but rather difficult to express it correctly and optimally. Developers not only need to rethink their programming paradigm, they also need to think in terms of set theory.
The following are common errors that Java developers make when using SQL through JDBC or jOOQ (in no particular order). For the other 10 errors, see this article.
1. Forgetting about NULL
Misunderstanding NULL is probably the biggest mistake a Java developer can make when writing SQL. Perhaps this is because NULL is also called UNKNOWN. If it were simply called UNKNOWN, it would be easier to understand. Another reason is that, when fetching data or binding variables, JDBC maps SQL NULL to Java null. This can lead to the belief that NULL = NULL (SQL) behaves the same way as null == null (Java).
One of the most striking examples of a NULL-related error is using NULL predicates with row value expressions. Another, more subtle problem arises from not understanding how NULL values behave in NOT IN anti-joins.
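A minimal sketch of both pitfalls, using a hypothetical author/book schema (the table and column names are assumptions for illustration):

```sql
-- Assumed schema: author(id, name), book(author_id, ...)

-- Pitfall 1: NULL = NULL evaluates to UNKNOWN, not TRUE, so this returns no rows
SELECT * FROM author WHERE name = NULL;
-- Correct: use the dedicated NULL predicate
SELECT * FROM author WHERE name IS NULL;

-- Pitfall 2: if any book.author_id is NULL, the whole NOT IN predicate
-- becomes UNKNOWN and the query returns no rows at all
SELECT * FROM author
WHERE id NOT IN (SELECT author_id FROM book);
-- Safer: exclude NULLs explicitly (or rewrite with NOT EXISTS)
SELECT * FROM author
WHERE id NOT IN (SELECT author_id FROM book WHERE author_id IS NOT NULL);
```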
The cure: Train yourself. Nothing complicated: while writing SQL, always think about NULL:
- Is this predicate correct with respect to null?
- Does null affect the result of this function?
2. Processing data in Java memory
Not many Java programmers know SQL very well. The occasional JOIN, the odd UNION, and that's about it. Window functions? Grouping sets? Many Java developers load SQL data into memory, transform it into some suitable collection, and perform the necessary calculations on those collections with verbose loops (at least until the collection improvements in Java 8).
But some SQL databases support additional (SQL standard!) OLAP functions that are better suited for this and are easier to write. One (non-standard) example is Oracle's excellent MODEL clause. Just let the database do the processing and pull only the results into Java memory. Someone clever has already optimized these expensive operations. So, by using OLAP in the database, you gain two things:
- Simplicity. Most likely, it is easier to write correctly in SQL than in Java.
- Performance. The database is likely to be faster than your algorithm. And more importantly, you won't have to pull millions of records over the wire.
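As a small illustration, a running total per customer, which many developers would compute in a Java loop, can be expressed directly in the database with a standard window function (the bank_transaction table and its columns are assumptions):

```sql
-- Running balance per customer, computed by the database instead of a Java loop
SELECT customer_id,
       booked_at,
       amount,
       SUM(amount) OVER (PARTITION BY customer_id
                         ORDER BY booked_at) AS running_balance
FROM   bank_transaction
ORDER BY customer_id, booked_at;
```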
The cure: Every time you write a data-centric algorithm in Java, ask yourself: "Can this work be pushed down into the database?"
3. Using UNION instead of UNION ALL
It is a shame that UNION ALL needs an extra keyword compared to UNION. It would be much better if the SQL standard had been defined to support:
- UNION (allows duplication)
- UNION DISTINCT (removes duplication)
Removing duplicates is not only needed less often, it is also rather slow on large result sets, since the two subselects must be sorted and each tuple compared with its subsequent tuple.
Remember that even though the SQL standard defines INTERSECT ALL and EXCEPT ALL, not every database implements these rarely used set operations.
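A quick sketch of the difference (the customer and employee tables are assumed):

```sql
-- UNION removes duplicates, which forces sorting/comparing the combined rows
SELECT first_name, last_name FROM customer
UNION
SELECT first_name, last_name FROM employee;

-- UNION ALL simply concatenates both results: no sorting, no comparison
SELECT first_name, last_name FROM customer
UNION ALL
SELECT first_name, last_name FROM employee;
```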
The cure: Every time you write UNION, think about whether you actually meant UNION ALL.
4. Using JDBC pagination for large result sets
Most databases support some means of paginating result sets with LIMIT ... OFFSET, TOP ... START AT, or OFFSET ... FETCH clauses. Even in the absence of these clauses, you can still filter on ROWNUM (Oracle) or ROW_NUMBER() OVER() (DB2, SQL Server 2008, and others), which is much faster than paginating in memory. This is especially true for large offsets!
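A sketch of keeping pagination in the database rather than skipping rows in Java; the standard OFFSET ... FETCH form and the classic ROWNUM workaround for older Oracle versions are shown (the book table is assumed):

```sql
-- SQL standard syntax (PostgreSQL, SQL Server 2012+, Oracle 12c+, and others)
SELECT id, title
FROM   book
ORDER BY id
OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;

-- Older Oracle versions: filter on ROWNUM over an ordered subquery
SELECT *
FROM (
  SELECT b.*, ROWNUM AS rn
  FROM (SELECT id, title FROM book ORDER BY id) b
  WHERE ROWNUM <= 30
)
WHERE rn > 20;
```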
The cure: Just use those clauses, or a tool (such as jOOQ) that can emulate them for you.
5. Joining data in Java memory
Since the early days of SQL, some Java programmers still write JOINs with a heavy heart. They carry an old fear that JOINs are slow. This can be the case if a cost-based optimizer chooses to perform a nested loop, loading complete tables into memory before producing the joined result. But that rarely happens. With appropriate predicates, constraints, and indexes, MERGE JOIN and HASH JOIN operations are extremely fast; it all depends on correct metadata (Tom Kyte has written well about this). Nevertheless, there are probably still a few Java developers who load two tables with two separate queries and join them in Java memory one way or another.
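Instead of loading authors and books with two queries and matching them in Java, a single join lets the database do the work (same assumed schema as above):

```sql
-- One round trip, one joined result: no matching of rows in Java memory
SELECT a.id, a.name, b.title
FROM   author a
JOIN   book   b ON b.author_id = a.id;
```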
The cure: If you select from several tables in several steps, think again: perhaps you can express your query in a single statement.
6. Using DISTINCT or UNION to remove duplicates from an accidental Cartesian product
With complex joins, any developer can lose track of all the relationships that play a role in an SQL query. In particular, when joining on composite foreign keys, it is easy to forget to add the relevant predicates to the JOIN ... ON clause. This can lead to duplicate rows, either always or only in exceptional situations. Some developers then add DISTINCT to suppress the duplicates. This is wrong for three reasons:
- It may cure the symptoms, but not the cause. And it may not even cure the symptoms in edge cases.
- It is slow for large result sets, since DISTINCT performs an ORDER BY operation to remove duplicates.
- It is slow for large Cartesian products, which still have to be loaded into memory.
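A sketch of how the problem typically arises with a composite foreign key (the orders and order_items tables are assumptions):

```sql
-- Accidental partial Cartesian product: only half of the composite key is joined,
-- so rows are duplicated, and DISTINCT merely hides the symptom
SELECT DISTINCT o.order_no, i.product_id
FROM   orders      o
JOIN   order_items i ON i.order_year = o.order_year;  -- order_no forgotten!

-- The real fix: join on the complete composite key; DISTINCT is no longer needed
SELECT o.order_no, i.product_id
FROM   orders      o
JOIN   order_items i ON i.order_year = o.order_year
                    AND i.order_no   = o.order_no;
```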
The cure: As a rule of thumb, if you get unwanted duplicates, review your JOIN predicates. There is probably a subtle Cartesian product hiding somewhere.
7. Avoiding the MERGE statement
Strictly speaking, this is not really a mistake, but rather a lack of knowledge of, or fear of, the powerful MERGE statement. Some databases know other forms of UPSERT, for example MySQL's ON DUPLICATE KEY UPDATE. MERGE is really powerful, especially in databases that greatly extend the SQL standard, such as SQL Server.
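A sketch of an UPSERT expressed as a single MERGE statement (standard form; the customer and staging_customer tables are assumptions, and vendor syntax differs in details):

```sql
-- Insert new customers and update existing ones in a single statement
MERGE INTO customer c
USING staging_customer s
  ON (c.customer_id = s.customer_id)
WHEN MATCHED THEN
  UPDATE SET c.name = s.name, c.email = s.email
WHEN NOT MATCHED THEN
  INSERT (customer_id, name, email)
  VALUES (s.customer_id, s.name, s.email);
```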
The cure: If you are doing an UPSERT by chaining INSERT and UPDATE, or SELECT ... FOR UPDATE followed by INSERT or UPDATE, think again. Apart from risking race conditions, you can often write a simpler MERGE statement.
8. Using aggregate functions instead of window functions
Before the emergence of window functions, the only way to aggregate data in SQL was to use GROUP BY along with aggregate functions in the projection. This works well in most cases, and if the aggregated data needs to be enriched with regular data, the grouped query can be pushed down into a joined subquery.
But SQL:2003 defines window functions, which are implemented by many database vendors. Window functions can aggregate data over result sets that are not grouped. In fact, each window function supports its own, independent PARTITION BY clause, which is an excellent tool for reporting.
Using window functions will:
- Lead to more readable SQL (fewer accidental GROUP BY clauses in subqueries)
- Improve performance, as an RDBMS is likely to optimize window functions more easily
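For example, comparing each employee's salary with the average of their department no longer needs a grouped subquery joined back to the table (the employee table is an assumption):

```sql
-- GROUP BY approach: aggregate in a subquery, then join the result back
SELECT e.name, e.salary, d.avg_salary
FROM   employee e
JOIN  (SELECT department_id, AVG(salary) AS avg_salary
       FROM   employee
       GROUP BY department_id) d
  ON   d.department_id = e.department_id;

-- Window function approach: no grouping, no extra join
SELECT name,
       salary,
       AVG(salary) OVER (PARTITION BY department_id) AS avg_salary
FROM   employee;
```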
The cure: When you write a GROUP BY clause in a query, consider whether it could be expressed with a window function instead.
9. Sorting in Java memory using indirect sort criteria
The ORDER BY clause supports many types of expressions, including CASE, which can be very useful for indirect sort criteria. You should probably never sort data in Java memory because you think that:
- SQL sorting is too slow
- SQL sorting cannot do it
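A small sketch of an indirect sort criterion expressed with CASE inside ORDER BY rather than in Java (the ticket table and status values are assumptions):

```sql
-- Sort tickets by a custom status priority, then by creation date
SELECT id, status, created_at
FROM   ticket
ORDER BY CASE status
           WHEN 'OPEN'        THEN 1
           WHEN 'IN_PROGRESS' THEN 2
           WHEN 'CLOSED'      THEN 3
           ELSE 4
         END,
         created_at;
```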
The cure: If you sort any SQL data in Java memory, consider whether the sorting can be pushed down into the database. This goes well with pagination in the database.
10. Inserting many records one by one
JDBC knows about batching, and you should use it. Do not INSERT thousands of records one by one, creating a new PreparedStatement every time. If all of your records go into the same table, create a batched INSERT with a single SQL statement and multiple bind value sets. Depending on your database and its configuration, you may need to commit after a certain number of inserted records in order to keep the UNDO log small.
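The batching itself happens through the JDBC API, but the same idea can be sketched at the SQL level with a multi-row INSERT, which many databases support, instead of thousands of single-row statements (the book table and values are assumptions):

```sql
-- One statement carrying many value sets instead of one INSERT per record
INSERT INTO book (id, title, author_id)
VALUES (1, 'SQL Antipatterns', 10),
       (2, 'SQL Performance Explained', 11),
       (3, 'Effective SQL', 12);
```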
The cure: Always use batching when inserting large data sets.
Several interesting books on the topic:
- SQL Antipatterns by Bill Karwin
- SQL Performance Explained by Markus Winand