📜 ⬆️ ⬇️

"For the same functionality that SQL Server provides, Oracle asks 10 times more," - Konstantin Taranov about SQL Server

We talked with Konstantin Taranov , the developer and database administrator at Gazeconomika, a division of Gazprom. Konstantin's professional experience is more than 10 years, during which he has worked with all key relational DBMS, as a result of which he chose MS SQL Server. Konstantin explained why SQL Server is so attractive for the tasks solved in the company Gazeconomika, and spoke about the most comprehensive set of tools and useful materials for working with SQL Server in runet.



PG Day: Konstantin, tell us a little about yourself. How long have you been working in the profession, what professional problems have you been able to solve on your way of life?

KT : Briefly about myself: I graduated from the Moscow Engineering and Physics Institute, by education I am an engineer-physicist, specialization is reactor materials science. Already in the fourth year, I realized that this was not exactly my vocation, so I decided to retrain as an IT specialist. Then there was a very strong trend towards Oracle, the country was relatively on the rise, many vacancies in IT and banking. I managed to get settled in RDTech without experience and background. There I worked in one of the largest projects at that time - building a data warehouse for VTB24 Bank. The next five years were spent working closely with Oracle. In 2012, I came to Gas Economics. The main task of our center is to forecast the demand for gas, coal and other fuel resources not only in Russia, but also in the world.
')
We work with various data sources, which are often not consistent with each other, because of this, we have to regularly check and compare data. As it is now fashionable to say, data science.

In Gas Economics, Microsoft SQL Server has historically been used. At that time it was 2008, but literally with my arrival, we updated to 2012. We are now planning to move to 2017. CTP 2.1 is finally released, Microsoft promises a release for final use on May 31. We also have small information systems for internal use, which revolve on PostgreSQL. Accordingly, in 10 years in IT I became acquainted with many relational databases, because, in addition to PostgreSQL, Oracle and SQL Server, I had to work actively with MySQL.

PG Day: You have been working with relational databases for many years. Why do you think relational databases do not lose their relevance?

KT : Now a lot of NoSQL and big data. This is an interesting range of tasks, but I think this is a narrow niche. We have purchased a lot of sources from various news agencies. All of these agencies are very expensive, subscriptions reach hundreds of thousands of dollars. That is, it would seem, the information is ready, take it and use it. But in fact, it turns out that it is very “dirty” and often does not correspond to reality. You can’t just take this data and use it, write machine learning algorithms, build models. We spend 70% of the time verifying this data and comparing it with each other. Even the directory "Countries" and the concept of "country" in different information agencies is different. A vivid example is the United Kingdom. Some news agencies include only Scotland, England, Wales (3 countries), and Northern Ireland and various subordinate island states do not include, some vice versa. Accordingly, the total statistics for these data diverges.

What is the power of relational databases? This is a technology that has been tested for decades. It was created a long time ago and it is based on complex mathematics. Spending time creating the correct database structure, having done a very detailed primary analysis, processing, verification, you get ready data at the output, which you can already transfer to any model, any algorithm and get good results. If you have bad data, you can, of course, train a neural network or build an even more complex algorithm, but they yield poor results at the output. This is repeatedly tested on our example. We use the language of R. to solve the so-called transport optimization problems. We have 3 mathematicians, and R immediately liked them, because it is as close as possible to their structure of thinking.

We also had no performance issues. Everything works very well and quickly. Over the past three years, we have left C #, the business logic has been completely transferred to R and has never regretted it.

PG Day: So that you can advise those who plan to deal with databases, where to start, in which direction to “dig”?

CT : I highly recommend taking Stanford's course on relational databases . He spends a lot of time on the basics; in his time, this course opened up a lot to me. This course shows that the basis of relational databases is serious mathematics , relational algebra, behind which there are years of research. No NoSQL is also now, because, after all, large amounts of information, gigabytes of data, are sometimes better processed by other solutions. Do not try to shove them into a relational database, if you don’t understand the structure of the relationship between the data, or if you don’t have it as such - a black hole that needs to be explored for a long time. Then, when you know what is inside, it makes sense to transfer structured data to a relational structure.

PG Day: Suppose you have a need to choose a data warehouse for a new project, what would you stop at? Oracle, SQL Server, PostgreSQL?

KT : If I were choosing the technology stack now, I would love to choose SQL Server , but only for serious large projects. If I had to do my own project, then I would stop at PostgreSQL precisely because it is free. I am not so familiar with PostgreSQL to advise it to be used immediately in a serious commercial system. Minus PostgreSQL is the lack of support. In Microsoft, it is also not perfect, but it is.

Why not Oracle? Oracle is very expensive. It is inexpedient to use it, I will show it at the training. For the same functionality that SQL Server provides, Oracle asks, on average, 10 times more. Why pay more? This is another question. If you have servers purchased from Oracle or you have a lot of money, of course, you can choose Oracle. But I would recommend SQL Server. Over the past 5 years he has made a huge leap. I remember this database back in 2008, when we had SQL Server 2005 “sold out” and it made an oppressive impression both in terms of performance and in terms of the client of SQL Management Studio . Now SQL Management Studio is one of the best tools for working with SQL Server. His only drawback is that he works so far only with SQL Server. But this is probably the best free environment, which has a very wide range of opportunities for developers.

If you really want to enjoy the development of relational databases, then SQL Server + free SQL Management Studio - this is now the thing. Another important "bun" - SQL Server Developer license for the developer is free. Previously, it cost a little money (about 3000 rubles). I bought for my needs, it is not expensive - investing money in my training always pays off. Now it all became free. I believe that with the advent of SQL Server on Linux, it is worth spending a few hours and try this database. You might like it very much.

PG Day: You are familiar with the development and administration of SQL Server. How difficult is it to combine two different roles? How common is it in the SQL Server world?

KT : In the world of SQL Server, these are two completely separate roles. But SQL Server has two huge benefits. First, as mentioned earlier, SQL Management Studio . This is a great GUI that can be used for administration, and it covers 95% of all administrator tasks; this is a teacher and advisor all in one. SQL Management Studio allows you to generate a Transact-SQL script on the fly. By selecting any ticks in the GUI, having made the preliminary settings, you can immediately get a working script that you will use directly in the future. Or to fasten job which according to the schedule will launch this script. Do anything with him. Give it to your developers so that they finalize it.

The second trump card is very good support for PowerShell . If you have a large number of servers that need to perform routine, repetitive operations, then PowerShell is an excellent choice. It is fully integrated with SQL Server (the SqlServer module of the same name, which is constantly evolving and improving), is able to do absolutely everything the same as Transact-SQL, even better. Plus, the so-called DBA Tools finally appeared in open source. This is a set of functions and PowerShell scripts for working with SQL Server. At the moment, it covers almost everything, from backup to recovery - the entire SQL Server maintenance cycle. It is enough to install the PowerShell module, study the main commands and functions, their parameters. This is done literally 2 pm.

And an additional advantage of PowerShell - you get good support for C #. Almost all the code that runs on C # can be embedded in PowerShell. If you have ready-made C # solutions, you can easily apply them to work with a large number of servers and databases using PowerShell. If you have to do backup, backup setup, validation and similar tasks, there are tools and stored procedures that allow you to do this without spending money, you do not need to buy any third-party solution. It makes sense to buy it if you really need a user-friendly interface and various buns. But all this can be done absolutely free .

PG Day: I want to go to your upcoming performance at our conference, which will be held in the summer in St. Petersburg. You are going to give a very big master class. What made you take part in our event? What profile specialists do you expect to see on your presentation?

KT : First of all, I have a vested interest in participating in the conference. One of the tasks for the current year that we set for ourselves is the ability to transfer our solution to PostgreSQL, in case our new customers do not want to buy a license for SQL Server. You need to assess how difficult it will be to switch to PostgreSQL, how to take the first steps in this direction. We already have a prototype, about 20% of the current solution works with PostgreSQL in test mode. It is interesting for me to participate in such a large conference, to listen to the speeches of other people, to learn first-hand what tools they use, what resources exist for training, what amounts of data are being processed.

The idea to make a full-fledged training has been hatched for over five years About four years ago, I was terribly displeased at the courses of a specialist in TSQL, as the lecturer read the material, as if on a piece of paper. There were no real “cases” of using SQL Server. He just took the program Microsoft, memorized and read it. And we have accumulated a lot of experience in five years, we have passed all versions from SQL Server 2008 to SQL Server 2016. With each version, the product acquired opportunities, became faster, more reliable. There was support for executing R code in SQL Server, which we like very much, we already use it, however, there are problems with switching to SQL Server 2016 at the customer. Everything is good on the test environments, but in order to migrate, you have to show the customer all the advantages of this update (specifically for him, this is an increase in the speed of the system and possible savings on hardware). We work actively in this direction.

Accordingly, a lot of practices have accumulated, internal documents, scripts, free solutions, modified for our needs. All this is in the repository on Github , which we support and try to update almost every day: we update it with new articles, links to useful resources, scripts, including for PowerShell. At the moment, our website sqlserver-kit.org is, without false modesty, the most detailed resource on SQL Server , which contains comprehensive information.

My training is a continuation of the development of this resource, only in the form of specific checklists, practices that I will try to convey to the listener. I really want the audience level to be at the level between entry-level and mid-level developers who can contribute to the answer, indicate where to do it differently, in order to get the best possible result from SQL Server.

If you want to help, find mistakes, inaccuracies, add your own scripts - always welcome, we will be happy. You can start an issue or pull request, to which we will respond very quickly. The materials are distributed under the MIT license, any person can copy them and use as he pleases, including for commercial purposes.

PG Day: Promoting your own tools in open source is a very positive step. Running SQL Server on Linux, a free open source environment, is one of the reasons that pushed us to the idea of ​​holding this master class. The opportunity arose very recently, one of the questions that will interest the listeners - what are the difficulties and pitfalls associated with the operation of SQL Server on Linux? Are there any plans for debriefing in your master class?

KT: Yes, this is one of the main trump cards of Microsoft in the eternal dispute, what to choose: Oracle, PostgreSQL or MS SQL. Now a lot of developers are adapting SQL Server, it already works almost the same as under Windows. Accordingly, I will try my best to adapt my presentation accordingly. Naturally, this will be a virtual machine, SQL Server 2017, which is scheduled for release on May 31. If it is suddenly transferred, and this is quite possible, then it will be version CTP 2.1.

Accordingly, as a distribution of Linux, I consider CentOS. Now I check all the scripts that I plan to show on this stack. If everything goes as planned, the demonstration and the entire training will be built on a Linux + SQL Server stack adapted for Linux. If suddenly something goes wrong, then there will be Windows Server 2016 + SQL Server 2017.

PG Day: Microsoft products are a whole ecosystem, not just SQL Server “in a vacuum”. Can you identify the features of the entire ecosystem as a whole, which seem to you convenient and useful from the point of view of the database expert that you would like to see on other platforms?

KT : Ideally, I would like Management Studio to work with all databases, a very handy thing. What else is convenient in Windows? Perhaps this is PowerShell, I really loved it, it greatly simplifies life, it has everything you need. It is possible to build graphs in it, you can display grids, which are filtered very easily. You can make queries to the same SQL Server, Postgres, get the data in the right form, export to csv, do the import. In powershell, each variable is an object. Fill it, convert, do with it, whatever. Plus, PowerShell has become open source - now it's an open source, cross-platform scripting language.

The only disadvantage of PowerShell is the lack of a GUI, however strange it may sound. Still, some things are handy to do in the GUI: watch query execution plans, for example. Visibility and availability of graphs play an important role. PowerShell can build nice graphics. if you have nothing at hand, you can always install it and use it as a graph generator.

PG Day: I, as a post-sage player with experience, got used to the fact that any adequate actions with the database can only be done in the console, in psql, and there are no really good “guts” tools. This is true for many people working with open-source products: no one is used to doing everything in the graphical interface. Do you plan to support the Management Studio tool, which you speak very well about, under Linux? How suitable will it be for people who want to use SQL Server under Linux, or will they still have to do something the same way, in the console?

KT : Unfortunately, the release dates for Management Studio for Linux have not yet been announced. I don't know if they will postpone the release. Therefore, a Linux user will be deprived of such a rich interface for some time. There are no specific deadlines for the migration, but Visual Studio has already been released (SSMS is actually a fork of Visual Studio adapted for SQL Server development) for Mac, so most likely at the end of this year or next year SSMS will become cross-platform. As for working with the command line, there are already 2 utilities working under Linux: mssql-scripter and DBFC . To assess how many useful tools for working with SQL Server exist, it is enough to familiarize yourself with my list , which currently lists more than 200 utilities and programs. I think that the adaptation for Linux will be implemented with the release of the final version. For now, Linux users will have to wait, wait for the release of Management Studio. And I am sure that most developers will like it, especially those who have come across Visual Studio.

PG Day: You have planned to devote a very large query optimization section in your master class. What aspects of optimization are you going to dwell on? What tasks can students solve with you?

KT : Yes, this part of the material is simply huge and you need to organize a separate training for it. But within the time available to me, I will show all the beauty of query optimization in SQL Server Management Studio. SSMS is actively developing, and it is simply created to optimize requests. At the same time every two months there is a new version with support for new features of SQL Server. “Killer feature”, which appeared just six months ago, and now saves a lot of time, is viewing the query execution plan in real time.

In the optimization process, you always have to wait for the completion of the query to understand what statistics were used, what assumptions the optimizer made to select a particular operation, what estimates did about the number of rows in the tables. All this can be viewed using the evaluation plan for the execution of the request, but often it can be very different from the real one. Therefore, if you really need to optimize a critical request, you need a real execution plan.

Starting with version 16.4 , Management Studio is able to generate a request execution plan in real time. , , , , , : , join. “”, , , , . Management Studio , . , , “-”. , Oracle, , 2-3 .

, Management Studio . , XML Management Studio . GUI : , . , - , – . , , .

. SQL Server, : , . , 2012 SQL Server , , . .

PG Day: . – .

: , . - . - , .

PG Day: - , , ? “”, MS SQL Server, ?

: , , : Stack Overflow, github, .. SQL Server, , , .

, , SQL Server – . , , , , . , .

. #slack, twitter ( ). Github , SQL Server , SQL Server – .

- #sqlhelp , , .

sqlcom.ru : telegram-, SQL. , SQL Server, : , #, .. , , , .



, MS SQL Server, , . PG Day'17 Russia - , SQL Server: , Linux Windows, , !

Source: https://habr.com/ru/post/329842/


All Articles