Over the years, performance has been one of the main concerns for applications on .NET. One of the very first articles on the topic dates back to 2001. The subject has not lost relevance in more than 10 years: in 2011 people were still asking questions in search of the best profiling tool.
To find out what all this means for modern .NET development, and which tools the world's largest developer community uses to squeeze out maximum performance, we talked with Stack Overflow performance engineer Marco Cecconi.

Marco Cecconi is a Stack Overflow engineer based in London. He writes a lot about software development, coding, architecture, and teamwork.
- You work at Stack Overflow. Can you name the main "pain points" of your project in terms of performance?
- There are two of them: on the one hand, we need to be very, very careful about instantiating objects and working with garbage collection; on the other, we need to pay just as much attention to how we use SQL Server, write SQL queries, design tables, and so on. At the moment these are the two main aspects we pay the most attention to, and the ones that most affect performance.
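As a hypothetical illustration of what "being careful about instantiating objects" can mean in practice (the type and method names below are invented for the example, assuming a modern .NET runtime with span-based Encoding APIs), here is a minimal sketch that rents a pooled buffer instead of allocating a new array on every call, which keeps GC pressure down:

```csharp
using System;
using System.Buffers;
using System.Text;

static class ResponseWriter
{
    // Renting from the shared pool avoids allocating a fresh byte[] per call,
    // which reduces Gen 0 garbage and collection frequency under load.
    public static void WriteChunk(ReadOnlySpan<char> text, Action<byte[], int> send)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(Encoding.UTF8.GetMaxByteCount(text.Length));
        try
        {
            int written = Encoding.UTF8.GetBytes(text, buffer);
            send(buffer, written);
        }
        finally
        {
            // Always return the buffer so the pool can reuse it for the next caller.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```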
- Is your solution built entirely in C#, or are there parts in other languages, such as C++, Java, Python, or others?
- I would say 99% of our code is C#. We do, of course, have a bit of C++ or C, but in terms of lines of code it is very little. Naturally, we have TypeScript and JavaScript; JavaScript on our servers is used to compile bundles and minify code. We also use SQL, which is its own language. That's all.
- Can you lift the curtain a little: why did you decide to build the project in C# rather than in other languages and technologies?
- Can I come at this question from a slightly different angle? Stack Overflow has been around for 8 years. During that time we have grown from, I don't know, somewhere around a hundred thousand views per day to billions of views per month. And we still use C#. So why haven't we switched to something else? The answer is not some unhealthy devotion to Microsoft; it's that the runtime is very good and meets all our needs. We simply don't see any reason to waste time switching to something else. Right now it all works more than well for us.
- So at the moment you have headroom to cope with a sharp influx of visitors, if one happens?
- At the moment we are running at about 5% of maximum capacity. We could withstand roughly 20 times more load. It would be hard, of course, and we wouldn't want to run at 100% all the time, but right now we're somewhere between 5 and 10%.
- Should we even continue the interview? :) It sounds like you have no performance problems.
- Oh no, we have plenty of performance problems. But you know, performance optimization is not something you do only when everything is already dying and CPU load hasn't dropped below 80% for days. It is precisely because we optimize constantly that we run at 5% load; optimization is a constant, ongoing process.
Margin of safety
- OK, can you recall any especially serious problems? Maybe some events in the market, or in your history, when you really experienced delays and outages in your product?
- No, we have never had problems on that scale. The reason is our huge margin of safety. We keep working even while we are being DDoSed. Of course, one should never be overconfident, but my impression is that in a truly massive DDoS attack our internet channel would be the first thing to saturate, and there is hardly anything we could do about that. There are certain events we can see in the logs. When Pokémon Go was released, we saw a very big surge for several days. The presidential election, on the other hand, produced a noticeable dip: after the election the number of visitors dropped significantly. We can track such events, but these swings do not exceed 100% of the norm.
- Let's move on to tooling. What is your favorite tool for finding bottlenecks in code?
- We have our own tool called MiniProfiler. It is open source; you can find it on GitHub. It works on .NET and Ruby. Under .NET, it is built around the using statement: we wrap the block of code we want to measure in a using. If you want to measure the time of a call, you can wrap it in a using and thereby build an execution profile of the request. For example, we have a timer that starts when we begin processing a request, and the same timer stops when request processing ends. There are timers for each database, each database query, and each call to the ElasticSearch engine.
Timers are set in different blocks of each page, so we can compare, say, the rendering speed of the question list with the rendering speed of the header or footer on the main page. The tool inserts the timing results into the response headers. The developer sees a small box with data in the top-right corner of the page, showing the page render time, and clicking on it reveals detailed information about each step on the page.
The tool is designed to pinpoint specific problems: if a page is slow to load, I can see exactly why. We also take a subset of the data we collect and store it together with the logs, which lets us run SQL queries over it and display performance statistics for every page.
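For context, this is roughly what profiling a block with MiniProfiler looks like in .NET; a minimal sketch, assuming the standard StackExchange.Profiling package (the step names and the code inside them are illustrative):

```csharp
using StackExchange.Profiling;

public class QuestionsPage
{
    public void Render()
    {
        // MiniProfiler.Current is the profiler attached to the current request
        // (it may be null when profiling is disabled, hence the ?. calls).
        var profiler = MiniProfiler.Current;

        using (profiler?.Step("Load question list"))
        {
            // e.g. run the SQL query that fetches the questions
        }

        using (profiler?.Step("Render header"))
        {
            // e.g. build the page header
        }

        // When the request finishes, the recorded timings travel with the response,
        // and the widget in the corner of the page shows the per-step breakdown.
    }
}
```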
Another of our tools is called Bosun. It is a time-series alerting system: it monitors the parameters we specify, such as memory and allocated resources, and raises an alert when those parameters change significantly.
We have another system that monitors every SQL query; in other words, it collects statistics on how the database is operating. Thanks to it, we know exactly which queries take more time, whether we have done something wrong, and what is going on in general.
Yet another system constantly monitors server parameters such as memory and CPU. We can track every exceptional situation that occurs on any of our servers, across all our projects.
- Is it something like Telegraf?
- Our tool is called Opserver. We built it ourselves specifically to solve this problem, and it is open source.
- What is your approach to performance optimization? What do you do with "slow" code?
- Usually what "smells bad" is allocations, and they are not always easy to find. The only way is to take an IIS memory dump and look directly at what is happening there and why so much memory is being used, because an allocation can happen anywhere: in our own code as well as in a called library, in third-party libraries as well as in the .NET library itself. For example, at one point StringBuilder started adding an allocation in its constructor, and that very seriously affected the operation of our system.
Sometimes allocations come from using LINQ. LINQ is terrible for allocations. Sometimes it gets optimized and works fine, and sometimes it doesn't, and it is always very hard to say in advance whether everything will work without problems. This is the main thing we watch for and track.
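As a hypothetical illustration of the kind of LINQ allocations he is describing (the types and values below are made up), the LINQ version allocates a closure, delegate instances, and an iterator object on every call, while the hand-written loop produces the same result with no hidden allocations:

```csharp
using System.Collections.Generic;
using System.Linq;

static class Scoring
{
    // LINQ version: allocates a closure capturing 'minScore', delegate instances,
    // and an enumerator/iterator object on every call.
    public static int CountHighScoresLinq(List<int> scores, int minScore)
        => scores.Where(s => s >= minScore).Count();

    // Loop version: same result, no hidden allocations on the hot path.
    public static int CountHighScoresLoop(List<int> scores, int minScore)
    {
        int count = 0;
        for (int i = 0; i < scores.Count; i++)
        {
            if (scores[i] >= minScore)
                count++;
        }
        return count;
    }
}
```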
- You mentioned a custom tool you use at Stack Overflow. Can you tell us how you approach benchmarking?
- In fact, we don't do benchmarking all that often. What matters most to us is how the code behaves once it reaches production. Production is our benchmark. We just release the code and see whether it works or not. To understand why this matters, just think about SQL queries. Suppose you write a poorly optimized SQL query. You cannot immediately identify that particular query as the source of problems, because it may block something while everything else keeps working normally. Once the code is in production, the links between the code and its consequences are not always obvious. In general, it is always difficult to create an environment close enough to reality to safely test code outside of production.
Optimized code often looks prettier
- Most often, high-performance code is very ugly and full of hacks. Do you find it difficult to work with optimized code?
- Yes, sometimes optimization means hacks or ugly things like loop unrolling, but in most cases optimization is not something that degrades the code. In fact, in most cases fast code is much prettier.
What I'm about to say may sound silly, but I assure you it is true: if you write less code, it works faster. You literally write less code, so it ends up compact. In the same class you can find everything from HTML, unfortunately, to SQL. And it is very compact, everything sits right next to everything else. All this makes our code more readable, because everything is at your fingertips; you don't constantly have to jump to other parts of the code.
- Isn't more compact code actually less readable and maintainable?
- More compact code does not necessarily become more understandable and easier to maintain, but code that is self-sufficient, without external dependencies, really is more understandable and maintainable. To my mind, the main thing about our code is that it depends only on itself. Suppose you are writing a mock project, a very, very small thing, just to test a case, something compact that takes only a few hundred lines of code. Maybe you fire up LINQPad and load your piece there: it's very compact, and everything is right there.
Such code should be very clear to you. There are a hundred lines; you can read them all, keep them in your head, and picture how they work. That is what we strive for: each feature is very, very compact and autonomous. Of course, it doesn't always work out and the code ends up more complicated, but that is inevitable. It seems to me that in 90% of cases, when I work, I don't write complex code. I write very, very simple things with a minimum of moving parts, and it really is simple. As for our talk about performance, I will simply show examples of how we achieved it. You will see that the code is as simple and compact as possible, even though it is very efficient.
Performance problems arise not only in your own code
- Performance problems arise not only in your own code: there are also the environment, the hardware, and third-party libraries. Do you have to use them "as is", or are there ways to optimize those as well?
- We know this very well. All our hardware is specified by ourselves. Naturally, we don't design motherboards, but all the requirements are worked out by a dedicated SRE division: how much RAM, which manufacturers, which models, even which power strips we want to use. Everything is specified by us, which is why we don't use cloud services: we have our own hosting and we control everything. Building machines tailored specifically to our tasks and requirements is one of the guarantees of our performance.
As for third-party libraries, yes, we are familiar with this problem. In most cases our requirements are very different from the standard ones, so many of the tools we use are libraries we wrote and published ourselves. We are constantly rewriting libraries for our own needs. We have our own set of libraries: for example, the Redis client, our own protobuf solution, a JSON serializer, and so on.
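That Redis client was released as the open-source StackExchange.Redis library; a minimal usage sketch (the connection string and key are illustrative) looks roughly like this:

```csharp
using StackExchange.Redis;

class CacheExample
{
    static void Main()
    {
        // One multiplexer is shared for the whole application; it is thread-safe
        // and designed to be reused rather than created per call.
        ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("localhost:6379");
        IDatabase db = redis.GetDatabase();

        db.StringSet("question:42:views", 1000);
        long views = (long)db.StringGet("question:42:views");

        System.Console.WriteLine($"views = {views}");
    }
}
```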
- By the way, do you have your own version of C#?
- We have our own version of the Roslyn-based compiler, mainly for compiling Razor templates, but the only thing it is used for is localization. We change only what we need; we do not extend the language. We use vanilla C# as it is. The main reason we don't modify the language is compatibility: changes break Visual Studio and everything else at once, and we don't want that.
- Sounds cool. Considering everything we have discussed about optimization, are there specific signals that tell a developer when to start optimizing the code, or not to start at all?
- Optimize the code constantly. Performance is a feature; insufficient performance is a bug.
- A philosophical question. I believe IT is about money. If someone asks how much time you spend on optimization, can you give a concrete answer, for example, "40 hours"? In those 40 hours you could implement a user-facing feature, or you could spend them on optimization whose need is not critical. It's a kind of trade-off.
- I don't agree with you. If you have a bug in a release, don't you fix it?
- Of course we fix it.
- Exactly. This is the same thing. Lack of performance is a bug that needs to be fixed. It's that simple.
- Then how do you determine that performance has become a "bug"?
- You constantly monitor performance metrics. Notice that performance has dropped? Find the problem and fix it. Naturally, no one optimizes just for the sake of the process. We do it like this: track the metrics, see when something is broken or misbehaving, fix the problem. In our particular case everything is simple. The repair process is not easy, but the decision-making process is simple: when something works inefficiently, for example a page loads in 2 seconds instead of 20 milliseconds, you need to fix it. You cannot make the user wait 2 seconds. Maybe some companies can afford that. We cannot.
- Do you use any hardware-level optimizations? For example, multithreading, hyper-threading, GPU computing?
- We do. In one case we used CUDA for demanding parallel computations. We use parallelism a lot in general, but mostly for tasks such as builds: when we want to process many files, we use parallel computing and multithreading wherever possible. In terms of code, we try to use async to avoid blocking on external waits, such as database waits, but I'm not sure that can be called true multithreading. Let me think…
In most cases, the best strategy for a web server is maximum speed per request; there is no need to spread a single request across multiple cores, because we are constantly contending with a certain number of requests from other users. The best use of the CPU in that case is probably to leave the distribution of requests across cores to IIS. For libraries or backend applications it makes more sense, since there are fewer requests and it pays to use more cores per request. In fact, that is exactly what CUDA was created for. We noticed that increasing the number of threads increased performance, so I said, "Let's try this." But that is a separate conversation, one I will go into specifically in my talk.
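A hedged sketch of what "using async to avoid blocking on database waits" typically looks like in C# web code (the table, query, and method names here are invented for the example): while the query is in flight, the thread returns to the pool and can serve other requests.

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;

public class QuestionStore
{
    private readonly string _connectionString;

    public QuestionStore(string connectionString) => _connectionString = connectionString;

    // While the query is running on the database server, no thread is blocked here;
    // the web server (IIS/ASP.NET) decides which core runs the continuation.
    public async Task<string> GetTitleAsync(int questionId)
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            "SELECT Title FROM Questions WHERE Id = @id", connection))
        {
            command.Parameters.AddWithValue("@id", questionId);
            await connection.OpenAsync();
            return (string)await command.ExecuteScalarAsync();
        }
    }
}
```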
- I've heard something about a "NOT ROBOT" badge. Can you tell us about it?
- Stack Overflow is a community for developers, and we reward developers by granting them points, called "reputation", and also by awarding badges. With these we encourage people to do things we consider useful for the community. For example, if you ask a good question, you get a badge. If you give a good answer to a good question, you also get a badge. And we reward people who come and talk with our speakers, or attend conferences, with the "NOT ROBOT" badge.
Recently we have been giving a lot of talks and attending a lot of conferences. We also noticed that people are embarrassed to ask questions and don't approach us: maybe they are introverts, maybe they consider us unapproachable, or maybe they think Stack Overflow people are aliens from outer space. That's why we created the badge, to motivate people. When we meet in real life, you get a special code to activate the badge. The badge is very rare, and the only way to get it is to meet us. We want to hand them out in Helsinki and Moscow.
- What would you advise all .NET and C# developers?
- Do what you love and what inspires you. It is very important that development is not just a job. Development is also a creative challenge, because unlike, say, a car mechanic, you encounter something new every day. That means constantly doing something new and constantly learning something new. It is important to do your job not only professionally but also with passion. Cultivate your passion for the work; it constantly pushes you toward new achievements, rather than just working "nine to five". Think about what will make you happy, and do it.
By the way, you will be able to listen to Marco and get the "NOT ROBOT" badge on December 9 at the DotNext conference.
The following talks will also be devoted to performance:
⬝ WinDbg Superpowers for .NET Developers
⬝ Doesn't the server exceed 100,000 requests/sec?
⬝ End-to-end JIT
⬝ .NET code modification at runtime