
- OK, now show us your static code analyzer.
- Meet Peter.
“Nice to meet you, but...”
- Peter is our static code analyzer.
When you work with payment data, you have to guarantee a certain level of security. That level is defined in the PCI DSS standard, developed by Visa, MasterCard and other payment systems. It matters because it applies to every participant in the handling of cardholder data, and service providers face additional requirements on top of that.
The standard has 12 sections, ranging from requirements that the security team track access changes and revoke badges when employees leave, to how and where every kind of log must be written.
I'll tell you how we certified our cloud platform and how many nerves it cost us.
The problem in brief
Initially, each payment system - Visa, MasterCard, American Express, JCB and Discover - had its own program with a minimum set of security requirements for working with cardholder data, and these requirements overlapped. Then the Payment Card Industry Security Standards Council (PCI SSC) was formed, bringing together the payment systems mentioned above. In 2004, the first version of the standard, PCI DSS 1.0, was released. It described the minimum requirements for everyone involved in storing, processing and transmitting cardholder data.
Cloud computing took off somewhere in the second half of the 2000s, and it turned out that the specifics of the cloud, as a relatively new kind of hosting, were not reflected in the then-current version of PCI DSS. For example, an architecture in which the set of components and their number change constantly, along with some of the technical solutions behind it, simply was not accounted for. Some requirements may not be applicable at all because the corresponding processes are missing - for example, the Section 4 requirements on encrypting payment data transmitted over public networks: in our case that task falls entirely on the client. For the most part, though, the same requirements apply both to the hosting platform and to the client's virtual infrastructure.
On the whole, every point is sensible and, at first glance, clear - just go and do it. But once you actually start, the complexity and the questions of interpretation begin. For example, the standard requires static code analysis. Most of our code is written in Python, and at the time there was no static analyzer that suited us - there is Bandit, for instance, but it did not fit our case. So the security review of the code was done by Peter, and Peter is what we presented as our static analyzer: a human who happened to satisfy the requirements for one. Several other points were resolved in roughly the same spirit.
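For a sense of what such an analyzer looks for, here is a purely illustrative Python snippet with the kind of issues a tool like Bandit flags; none of this is taken from our platform.

    # Illustrative code a static analyzer such as Bandit would complain about.
    # The function and the hardcoded value are hypothetical examples.
    import subprocess

    def restart_service(name):
        # B602: shell=True with interpolated input allows command injection
        subprocess.call("systemctl restart %s" % name, shell=True)

    DB_PASSWORD = "s3cr3t"  # B105: hardcoded password string

    # Bandit itself is typically run recursively over a source tree:
    #   bandit -r <project dir>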
Why certify the cloud
Our customers use the cloud to solve their own problems. Say a bank decides to add some new service, and six months later it turns out the service is eating too many resources in production and new hardware has to be purchased. Hardware takes a long time to arrive, it is expensive, and the approvals are agonizing. The cloud is obviously far more convenient - everyone has long been used to paying for resources as they are consumed and to scaling quickly.
But you can't just up and move into the cloud: the cloud provider's site (in this case, ours), where customers store, process and transmit Visa and MasterCard cardholder data, must itself comply with PCI DSS - otherwise the client cannot work with it. If the client does not work with such data, no certification is needed.
A customer can get certified on any infrastructure, but in practice that is extremely difficult and sometimes simply unfeasible. Certification requires active participation from the cloud provider, because the provider has to build processes and implement technical solutions that satisfy PCI DSS at the level of the cloud infrastructure itself, since that is where the card data will be processed. It is much easier - and often far more economical - to find someone who has already done their part of the certification. So that is what we did.
Our certification lets customers migrate and pass their own audits faster. The certificate covers all the requirements for the data center (in our case, data centers) and the cloud infrastructure, including all the technical and organizational processes. In practice it cuts the time customers spend on their own audit and on the supporting documentation, part of which is already prepared on our side.
How the audit went
We spent more than a year preparing for the audit.
We built the entire cloud platform from scratch, and initially it did not take the PCI DSS requirements into account. Yes, it is based on KVM - that is Red Hat territory - but that is far from the only component of the cloud. Our developers had to write a ton of code and file down existing technologies so that everything worked the way it should.
The standard has a whole section on how to develop software that falls within the PCI DSS scope; it refers to custom, self-written software for processing cardholder data - applications, web interfaces and so on. In our case this covered the development cycle of the entire platform. Three main groups fell within the audit scope: data center operations, cloud operations and cloud development. Plus integration with certain corporate policies and processes of the company.
For all the difficult moments the story is the same: there are requirements. It does not matter how the system works internally, the main thing is that the requirements are met. If it is unclear how to interpret a requirement and map it onto your infrastructure, you can ask the auditor to assess whether a solution is suitable or not. Quite often we had to argue our case, demonstrating the specifics of the platform and the correctness of the chosen solution, which is not always obvious. A mandatory part of certification is an annual pentest. For it we chose several attacker models: an external attacker, a compromised (or simply malicious) employee, and a client.
One of the striking problems for the cloud is the need to have a functioning DMZ. In our design it is very hard to define what exactly the DMZ is, because in its classic form it simply does not fit the platform. We ended up building several micro-DMZs, one on each server reachable from the Internet or sitting on the border with external local networks.
We went point by point.
Here, for example, you have to keep a documented inventory of all system components. That means everything related to the platform: network equipment, monitoring tools, all infrastructure servers, security tools and so on. For each component you specify its location, software version and IP address. With 50 such components this is no problem, but what if there are 500? On top of that, the number of components changes constantly - not just the number of servers, but also the roles on each server. The whole cloud is built on roles: a server with a role can be added, a role can be added to a server. And all this variety keeps moving, usually upward. Clearly you cannot manage this by hand - by the time you finish compiling an up-to-date list, it has already changed, possibly several times. We had to build our own automation, a system that collects data from the cloud and can produce a detailed report on all components at any moment.

The same problem came up with network diagrams. The standard requires showing all components at layers L2-L3 of the OSI model. Again there are more than 500 of them, they are hard to fit into one diagram, and, most importantly, such a diagram would be completely unreadable. We looked for ways to group things so the diagram could be taken in at a glance. A diagram has to be made, but it is hard (and at times seems almost unachievable). We found a way out there too.
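Returning to the component inventory: a minimal sketch of what that kind of collection automation might look like. The data source, field names and report format here are assumptions for illustration, not our actual system.

    # Minimal sketch of automated component inventory reporting.
    # The input (a JSON export of hosts with roles, versions and locations)
    # and the field names are assumptions, not our real data model.
    import csv
    import json

    def build_inventory_report(source_path, report_path):
        with open(source_path) as f:
            hosts = json.load(f)  # [{"hostname": ..., "ip": ..., "roles": [...], "versions": {...}}, ...]
        with open(report_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["hostname", "ip", "role", "software", "version", "location"])
            for host in hosts:
                for role in host["roles"]:
                    for software, version in host.get("versions", {}).items():
                        writer.writerow([host["hostname"], host["ip"], role,
                                         software, version, host.get("location", "")])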
Then there are the monitoring requirements: extended event logging and integrity monitoring on all components.
According to the standard, you have to log the actions of all users, collect the logs of all critical components on an external server and analyze them. The first part does not look scary even with a large number of components, since we already know how to run an infrastructure this large. The analysis is harder: logs arrive in huge volumes, and reviewing them manually would take weeks, maybe months. To solve this we built a dedicated mechanism for collecting and analyzing logs, which lets us watch them in real time and raise alarms when triggers fire. Naturally, tuning those triggers took some sweat...
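A toy sketch of the trigger idea: watch a log stream and raise an alarm when a pattern fires. The patterns and the alert channel are made up for illustration; the real rule set is far larger.

    # Toy log trigger: scan a stream of syslog lines and alert on suspicious
    # patterns. The patterns and the alert transport are illustrative only.
    import re
    import sys

    TRIGGERS = [
        ("failed-auth", re.compile(r"Failed password for")),
        ("sudo-root",   re.compile(r"sudo: .* USER=root")),
    ]

    def send_alert(name, line):
        # In production this would go to the monitoring/alerting system.
        print(f"ALERT [{name}]: {line.rstrip()}", file=sys.stderr)

    def watch(stream):
        for line in stream:
            for name, pattern in TRIGGERS:
                if pattern.search(line):
                    send_alert(name, line)

    if __name__ == "__main__":
        watch(sys.stdin)  # e.g. pipe a syslog feed into it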
The second problem is the integrity control mechanism. First we had to define the list of critical configurations and logs that need to be monitored, create templates for them and distribute the templates by role. In principle the task is fairly trivial, except for one thing: because of the sheer volume, a trigger can fire on a perfectly normal automated process. At first we received tons of mail from the monitoring system firing on every component. It took a lot of work to tune everything.
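A simplified sketch of the integrity check itself: hash the critical files and compare them against a stored baseline. The file list and baseline format are assumptions; in reality the templates are distributed by role.

    # Simplified file-integrity check: compare current hashes of critical
    # files against a stored baseline. File list and baseline format are
    # illustrative assumptions, not our production templates.
    import hashlib
    import json

    CRITICAL_FILES = ["/etc/ssh/sshd_config", "/etc/sudoers"]

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def check(baseline_path):
        with open(baseline_path) as f:
            baseline = json.load(f)  # {"/etc/ssh/sshd_config": "<sha256>", ...}
        for path in CRITICAL_FILES:
            if baseline.get(path) != sha256(path):
                print(f"INTEGRITY VIOLATION: {path}")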
There is also a requirement to run internal vulnerability scans and penetration tests. These activities are always carried out on the production infrastructure, not in a test environment, otherwise the results can diverge seriously from reality. Accordingly, there is a real risk of knocking something over or degrading performance, so it has to be treated with the utmost care.
Before running the internal scan for the first time, you have to configure the scan profiles very carefully and run them on the test environment many times to catch any deviations from the normal operation of the services. That assessment stage turned out to be quite interesting. The same goes for penetration testing: all the details have to be agreed before anyone starts breaking production. During the test we had to keep a close eye on the monitoring systems to spot any deviations.
One of the scariest tasks for us was introducing an "internal" firewall. Because of the architecture of our platform, we had to filter all connections on every server with an "everything not explicitly allowed is forbidden" policy. We needed a centralized management mechanism that could configure the firewall depending on the role of the component - and each server can carry several roles. Naturally, we had to develop our own solution. How much time and nerves went into writing the policies for each role is beyond words. Rolling it out was doubly scary - anything could fall over at any moment - so we deployed it under strict supervision, in stages, role by role.
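A rough sketch of the idea behind role-based rule generation with a default-deny policy. The role names, ports and management network are invented for illustration, not our actual configuration.

    # Sketch of role-based firewall rule generation with default deny.
    # Roles, ports and the management network are illustrative assumptions.
    ROLE_PORTS = {
        "api":        [443],
        "monitoring": [9100],
        "logging":    [514],
    }

    def rules_for(roles, mgmt_net="10.0.0.0/24"):
        lines = [
            "iptables -P INPUT DROP",  # everything not allowed is forbidden
            "iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT",
            f"iptables -A INPUT -s {mgmt_net} -p tcp --dport 22 -j ACCEPT",  # management SSH
        ]
        for role in roles:
            for port in ROLE_PORTS.get(role, []):
                lines.append(f"iptables -A INPUT -p tcp --dport {port} -j ACCEPT")
        return lines

    # A server can carry several roles, so rules are merged per host:
    print("\n".join(rules_for(["api", "monitoring"])))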
And so on. I repeat: all the requirements are quite understandable and obvious on their own. The difficulties appear when you map them onto real infrastructure, where the pitfalls are not immediately visible. The first thing to do is clearly divide the areas of responsibility between the cloud provider and the client, because the list of requirements is derived from that division - and it is not as simple or obvious as it sounds. Only after that can you start implementing. As practice shows, most requirements apply to both the provider's platform and the client. For example, the firewall requirements cover both the protection of the cloud itself and the client's virtual infrastructure - and those are completely different implementations and areas of responsibility.
The bottom line
In the end the project produced 478 Jira tickets and took a year. We constantly pestered the auditor with questions; he answered, interpreted and shared worldwide experience. The result is a certified cloud platform based on KVM - and, right away, several customers who want to process their payment data there.
As with other audits, the company as a whole benefits from this - provided it is done with full commitment - because it touches the organization's processes and brings in the best international security practices. Yes, several teams got extra work: the PCI DSS status has to be maintained and all the processes followed year after year. But overall, things got better.
Links