Hey. I am a former developer who has become a database administrator, and I wrote below that I would like to hear myself in due time.
7. The performance of scalar UDF is poor
Good developers like to reuse code, putting it in functions and calling these functions from different places. This works fine at the application level, but at the database level it can lead to huge performance problems.
Look at this post
about the forced use of concurrency - in particular, a list of what leads to the generation of a "single-threaded" query execution plan. Most likely, the use of scalar UDF (
note translator: and for servers younger than 2008 R2 and not only scalar ) will result in your request being executed in one thread (
* sadly sighs * ).
If you want your code to be reused, think about stored procedures and views. (In fact, they can bring their own performance problems, but I just want to get you on the right track as quickly as possible, and UDF, alas, is not).
')
6. "WITH (NOLOCK)" does not mean that there will be no locks at all
At one of the stages of your development career, you can start using WITH (NOLOCK) hint everywhere, because with it your queries are executed faster. This is not always bad, but it can be accompanied by unexpected side effects, about which Kendra Little told
here in this video . I will focus only on one of them.
When your query accesses any table, even with the NOLOCK hint, you impose a schema stabilization lock (schema stability lock, Sch-S). No one can change this table or its indexes until your query is completed. This does not seem to be a serious problem until you need to delete the index, but you cannot do this because people are constantly working with this table, being completely sure that they do not create any problems, because they use the WITH (NOLOCK) hint .
There is no silver bullet here, but start reading about the
isolation levels of SQL Server — I believe that the READ COMMITTED SNAPSHOT isolation level is the best choice for your application. You will receive complete data with fewer problems with locks.
5. Use three connection strings in your application.
I know that you now have only one SQL Server, but believe me, it is worth it. Create three connection strings, which will now only refer to one server, but then when you think about scaling, you will have the opportunity to use different servers “for maintenance” of each of these strings.
- The connection string for writing and reading "in real time" is the connection string that you use now and think that all data should come from here. You can leave all your code as it is now, but when you add something or change the current one, consider changing the connection string to one of the following lines in the requests.
- Connection string for getting “relatively fresh” data, age 5-15 minutes - for data that may be slightly outdated, but still today.
- Connection string for yesterday's data - for reports and trend building. For example, in an online store, with this connection line, you can pull user reviews to products, and users themselves warn that their reviews will be published the next day.
The first connection string “to scale” is quite difficult, in SQL Server there are not very many options for “scaling write operations” (there are such options, but it is very difficult to use and manage them). The second and third lines of the connection "scale" much easier and cheaper. To get more information about using different connection strings, you can read
this post here .
4. Use intermediate database
You probably use the database to perform some minor tasks - calculations, sorting, loading, etc. If suddenly this data disappears, you are unlikely to be very upset, but the structure of the tables is, of course, another matter. Now you are doing everything in the "main database" of your application.
Create a separate database, name it MyAppTemp, and do everything in it! Give it a simple recovery model and just back it up once a day. Do not bother with high availability or disaster recovery of this database.
The use of such a technique has a lot of advantages. It minimizes the number of changes in the main database, which means that backup copies of the transaction log and differential backups will be made faster. If you use log shipping, the really important data will be copied faster. You can even store this database separately from other databases, for example, on an inexpensive, but bright, SSD disk, leaving the main storage system for critical data in production.
3. "Yesterday's" articles and books may cease to be relevant today.
SQL Server was released more than ten years ago, and over the years there have been many changes. Unfortunately, old materials are not always updated to describe "today's" changes. Even fresh materials from trusted sources may be wrong - for example,
criticism of Microsoft's methodology for improving the performance of SQL Server . Microsoft Certified Master Jonathan Kehayias has found many really bad tips in a Microsoft document.
When you hear something that sounds like good advice, I suggest you use the opposite strategy of Dr. Phil. Dr. Phil says you need to get into any idea for 15 minutes. Instead, try to hate it - try to disprove what you read before you use it in production. Even if the advice is damn good, it may not be very useful on your system. (Yes, this also applies to my advice).
2. Avoid using ORDER BY; sort the data in the application
SQL Server spends CPU time sorting the results of your query. SQL Server Enterprise Edition costs about $ 7,000 for one core - not for the processor, but for the core itself. A two-socket, six-core server will cost about $ 84,000 - and this is only the price of licenses, not counting hardware. You can buy a hell of a lot of application servers (even with 256 GB of RAM on each) for $ 84k.
As soon as possible, give the results of the queries to your application and sort them. Probably, your application server is designed in such a way that it can distribute the CPU load across different nodes, while your database server does not.
UPD. I received many comments that the application needs, for example, only ten lines, instead of ten million lines returned by the query. Yes, of course, if you write TOP 10, you need sorting, but how about rewriting the query so that it does not return a bunch of unnecessary data? If there is so much data that the application server has to spend too many resources on sorting, then SQL Server also does the same work. We will talk about how to find such requests in the webinar, a link to which is at the end of the post. Also, remember that I said “Avoid using ORDER BY” and not “Never use ORDER BY”. I use this instruction in the same way - but if I can avoid it at a very expensive database level, I try to do it. That's what it means to avoid.(And this is the part where MySQL and PostgreSQL fans talk about how to reduce the cost of licenses using an open source database.) (And in this part, you expect me to answer them wittily, but I will not do that. If you are developing a new application and are thinking about choosing a database, read
my answer to StackOverflow about which database will handle the greatest load.)
1. SQL Server has built-in tools for finding bottlenecks without affecting performance.
SQL Server Dynamic Administrative Views (DMV) can show you all the places that have a detrimental effect on performance, that is:
- what requests generate the most load on your server
- which indexes just take up space and slow down insert / delete / update operations
- What are the bottlenecks on your server (CPU, disk, network, locks, etc.)?
All you need to know is where to look at it all - and on Tuesday, March 5th, we will show it to you. We are doing a 30-minute webcast for developer training and you can register for it
here (
upd webinar has been successfully completed, you can watch the recording
here ).
Translator's note: any suggestions and remarks on translation and style, as usual, are welcome.