As many know, he is the founder of the Openwall project and the author of free software, including the popular password security auditing tool John the Ripper. Solar Designer was a technical reviewer of Michal Zalewski's book on computer security, Silence on the Wire, and wrote a foreword for it.
Alexander has given talks on computer security at many international conferences: HAL2001, NordU, FOSDEM, CanSecWest, PHDays. At YaC 2012 he spoke about "How to protect millions of passwords." Before the talk, we had a short interview with him. Solar Designer told us how he became a computer security specialist and shared his view of the field's current state.
Frankly, when I was deciding which of the YaC speakers to interview in addition, I went to many people at Yandex and asked them this question. Some did not know you, while others, seeing your name among the speakers, immediately said: "Call him! How can you even wonder whether to talk to him!" It seems that not everyone knows what you do, and only those who do understand how interesting it is. To start with, tell us what you do and why you are speaking on this topic at all.
We can start from far back. I took up computer security as a hobby in the 90s, and later it became my professional activity. I developed free software in this area. One subtopic was information security tools; another was general-purpose software written with increased security in mind, in particular a POP3 server that is included in OpenBSD and used by some ISPs. Later, within professional work and the corresponding needs of clients, the question of protecting their users arose, in particular their passwords, because password authentication was used in the 90s and in the 2000s and is still used now. Multi-factor authentication and tokens have been added to it, but passwords remain relevant, and not only for authentication but also for protecting confidential information: file encryption, encrypted file systems.
In my talk at YaC I will discuss passwords for authentication using the example of companies whose user counts are in the millions. On the one hand, this is a very difficult task. On the other hand, by limiting myself to such a sub-area, I can cover in 30-40 minutes what I think will be useful for the audience. The area of password protection as a whole is too broad to cover fully in one talk.
I understand that it is impossible to retell a forty-minute talk in a couple of minutes, but could you at least summarize it briefly? Perhaps some of those listening to us do not quite understand what lies behind this topic.
I will talk about research from recent years, starting in 2009. That is when Colin Percival, who until recently was the security officer of the FreeBSD project, proposed using so-called memory-hard functions for password hashing, that is, cryptographic hash functions that deliberately use large amounts of RAM. The purpose is to raise the cost of the hardware needed for password cracking. When we use only the central processor of an ordinary computer, only a small fraction of the chip area is actually working on the cryptographic task. So with specialized developments for password cracking, up to custom chips, the attacker gets a very strong advantage. In the scrypt scheme Colin developed, government agencies were considered as the attackers. In the examples from my talk, in the context of password authentication, we are speaking rather of non-governmental threats: people for whom cracking is a hobby, and accidental password leaks. Taking the worldwide average, every week some large company leaks hundreds of thousands, and sometimes millions, of passwords.
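To make the memory-hardness point concrete, here is a minimal sketch of scrypt-based password hashing using Python's standard-library hashlib.scrypt (backed by OpenSSL). The parameter values are illustrative assumptions, not recommendations from the talk:

```python
import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(16)          # unique random salt per user

# Memory cost is roughly 128 * n * r bytes.
# With n=2**14 and r=8 this is about 16 MiB of RAM per hash computation,
# which is what makes specialized cracking hardware expensive.
derived = hashlib.scrypt(
    password,
    salt=salt,
    n=2**14,    # CPU/memory cost factor (power of two)
    r=8,        # block size
    p=1,        # parallelism
    maxmem=64 * 1024 * 1024,
    dklen=32,
)
print(derived.hex())
```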
And why are the functions that are used now bad for password hashing?
They lack this property. That is one of their shortcomings, but there are others. They have too little parallelism, so they cannot fully use modern processors, which are capable of performing many operations at once. Their algorithms and code were developed in the 90s, and sometimes earlier. Colin Percival's design avoids this shortcoming: one of its three parameters specifies the desired degree of parallelism. In my talk, alas, I will also speak about the shortcomings of his approach. It turns out to be quite difficult to deploy in companies with user bases of the size we are discussing. The reason is the rather strict time requirements for a single authentication. Obviously, this is primarily a matter of how long the user is willing to wait. Colin's research suggested 100 milliseconds. Another limitation is server resources. If a large share of the users have free accounts, the company will not be prepared to invest a lot of money in authenticating them; the cost has to be brought down. For a company of Yandex's scale, even if we limit ourselves not to 100 milliseconds but to 10 or to one, we would still need to deploy a cluster of about 10 servers somewhere. In that situation, using scrypt becomes quite complicated. There are also difficulties in filling a decent amount of RAM in one millisecond. Simply filling it is easy, but the function must not become simplifiable: the amount of data that has to reside in fast memory must not shrink. To protect against guessing that is faster than our own computation, we have to use relatively slow RAM and perform non-trivial operations on it, not just sequential writes and reads. And fitting all of that into one millisecond is quite difficult.
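The 100 ms versus 1 ms trade-off can be explored empirically. Below is a rough sketch of picking the scrypt cost parameter to fit a latency budget; the budget values and the simple doubling search are illustrative assumptions, not a procedure from the talk:

```python
import hashlib
import os
import time

def time_scrypt(n, r=8, p=1, rounds=5):
    """Average wall-clock time of one scrypt computation with the given cost."""
    password, salt = b"benchmark", os.urandom(16)
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.scrypt(password, salt=salt, n=n, r=r, p=p,
                       maxmem=512 * 1024 * 1024, dklen=32)
    return (time.perf_counter() - start) / rounds

def pick_n(budget_seconds):
    """Largest power-of-two n whose runtime stays within the budget."""
    n = 2
    while time_scrypt(n * 2) <= budget_seconds:
        n *= 2
    return n

for budget in (0.100, 0.010, 0.001):   # 100 ms, 10 ms, 1 ms per authentication
    n = pick_n(budget)
    print(f"budget {budget * 1000:.0f} ms -> n = {n}, ~{128 * n * 8 // 1024} KiB of RAM")
```

The point of the exercise is that at a 1 ms budget the affordable memory area shrinks dramatically, which is exactly the difficulty described above.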
It seems this is important first of all for large companies that work with especially valuable data. But what about companies that are even more mass-market, say, like Yandex, whose users would be better off if those resources went into some other development? Are there alternatives for them?
That is exactly what I talk about in my report. The one-millisecond level is itself already a compromise: we would like a hundred, but we are prepared to go down to just one.
Is it still possible to use scrypt?
At the same time there are difficulties with scrypt at that level: modifications of it have to be found. I propose two. In addition, I am considering installing additional hardware that prevents password cracking even against a completely stolen database, unless the secret key, the so-called local parameter, stored in that additional hardware is stolen as well.
Does it look like salt or is it something else?
Yes, it is similar to a salt. Some call it a second salt, others call it pepper. But it is stored in the hardware device. That device can even be an ordinary server. It simply means that we have one row of authentication servers and, behind it, a second row of password hashing servers; that is, authentication and password hashing are separated. As a result, the password hashing servers have far fewer possible attack vectors, because they expose a very simple interface: they are given a password and they return a hash, in the simplest case. This makes them harder to break into, and they provide an additional layer of protection because they are not directly accessible from the rest of the company's services, whereas the authentication servers are accessible from services such as webmail.
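Here is a minimal sketch of the "local parameter" idea described above: a secret key known only to the dedicated hashing servers is mixed into the computation, so a stolen password database alone is not enough for offline cracking. The exact construction (HMAC of the password with the pepper before scrypt) is an illustrative assumption, not necessarily the scheme Solar Designer proposes:

```python
import hashlib
import hmac
import os

# In practice this secret would live only on the hashing servers (or in an HSM),
# never in the password database.
LOCAL_PARAMETER = os.urandom(32)

def hash_password(password: bytes, salt: bytes) -> bytes:
    # Mix in the secret local parameter first ...
    keyed = hmac.new(LOCAL_PARAMETER, password, hashlib.sha256).digest()
    # ... then apply the memory-hard function with the per-user salt.
    return hashlib.scrypt(keyed, salt=salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)

# The hashing servers expose only this narrow interface:
# given a password (and the user's salt), return the hash.
salt = os.urandom(16)
stored_hash = hash_password(b"user password", salt)
ok = hmac.compare_digest(stored_hash, hash_password(b"user password", salt))
print(ok)
```

The narrow "password in, hash out" interface is what makes the second row of servers easier to defend than the authentication servers in front of it.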
So they should be isolated?
Yes, that inaccessibility should be ensured either physically, with dedicated cables, or by a cryptographic protocol if the connection goes through a tunnel inside the company's network.
What security issues seem important to you, but underestimated?
In general, little in this area has changed since the 2000s, since the 90s, and perhaps since even earlier. This comes down to general trade-offs in various areas: in some cases security is weighed against user convenience, in others against companies' commercial advantage. Because of that, much in security ends up undervalued.
In my opinion, in many areas it is possible to find better compromises than those the industry has now settled on. The complexity of devices grows, the complexity of programs grows, and accordingly the number of vulnerabilities grows. True, not as fast, because the techniques for finding and eliminating them are improving at the same time. Large companies such as Microsoft have already begun to build security into the development process; corresponding terms have even appeared, S-SDLC for example. The complexity of devices could be limited to what is actually in demand. But this is partly connected with the fact that as the number of computer and Internet users grows, new people arrive who are far removed from computers, and that is a good thing. The trouble is that they do not know what to expect, what to demand from manufacturers, and what to vote for with the ruble. They buy devices that may be prettier and more complex, and do not consider simpler alternatives that may be safer. Accordingly, manufacturers have no motivation to make simple and safe devices. Such a problem exists, but I cannot say how to solve it, because it is quite fundamental and tied to what is happening in society. And on the whole, these are good processes.
On the one hand, these are good processes. On the other hand, it is clear that at some point they will lead to serious consequences that may turn them in the opposite direction.
Yes. For example, very few people know about the risks posed by mobile devices and how to account for them, yet everyone uses them. So we are all vulnerable in one way or another. Little can be done about this, and in the coming years the situation is likely only to get worse.
But that is at the user level; what about a slightly higher one? For example, awareness of problems within the industry. What important things, in your opinion, are happening or failing to happen there? Has nothing changed there since the 90s either?
Vulnerability-finding technology has improved greatly for program code, both source code and binaries. Veracode has built impressive binary code analysis mechanisms, a very impressive result. The quality of their binary analysis can be compared with how competitors analyze source code to find vulnerabilities for their customers. So customers who are not yet ready to hand their source code to a third-party company for an audit were given the option of providing a binary instead. It is compiled in a special way: optimization is turned off and debugging information is turned on. And not only for processors; Java bytecode can be submitted as well. Veracode has been very successful in this respect, and these are achievements of the last few years that can be credited to them.
In which cases is it more useful to study the compiled code rather than the source, when you do have the possibility of studying the source, even without any special optimization?
When both are available, both options are relevant. When analyzing only the source code, we are forced to use either compilers that are, in effect, analogous to the ones we normally use to build a program for distribution and for its intended use, or a static analyzer that has its own high-level language parser and other components of its own and therefore differs from the compiler; in particular, it introduces some distortion.
But compilers with optimization disabled will also differ.
Yes, that flaw exists. Why does optimization need to be disabled? Because of the computing resources required to analyze the binary code; the algorithm there is rather complicated. I spoke with Veracode staff about this. It turns out that when optimization is turned on, the compiler uses the processor registers so aggressively, moving the same variable between registers within a single function, that the complexity of the analysis grows exponentially. In principle, you can still do the analysis at a high cost, and if the project is small you can compile with optimization and analyze it just the same. The debug information is needed so that the report contains references to the source code: line numbers, identifiers. Convenience matters a great deal here too, because the reports are large, and if they are inconvenient they become useless.