
Interview with Grigory Kornilov (Kaspersky Lab)


We interview Grigory Kornilov, senior service manager at Kaspersky Lab. Grigory provides users with computing infrastructure built on the resources of cloud IaaS providers and is responsible for ensuring that this service meets strict corporate SLAs.

Grigory will talk about two years of experience using IaaS and how the company came to the decision to use external cloud resources.

Grigory, how long ago and under what circumstances did Kaspersky Lab turn its attention to cloud services?

Our use of "cloud" services for infrastructure began five years ago with a small project with a company that built a private "cloud" for us. However, that company acted less as a service provider than as a company that hosted our equipment and supported it, with no obligations regarding the functioning of the virtualization environment. In effect, we bought the equipment, placed it with a third-party company, and that company became the operator of our equipment. The lack of formal legal responsibility for the virtualization environment affected its attitude towards us and our requests. Its motivation and the speed with which it solved problems did not suit us.

This prompted us to look towards service providers, i.e. companies that understand the extent of their area of responsibility when working with us. At that time we expected a fairly large expansion, and we did not want to increase the number of our administrators, buy our own hardware, or rent more rack space. We made some estimates and realized that a service provider would make a better or comparable offer, and that our labor costs for infrastructure development would be minimal. Also, from experience, I can say that services usually become cheaper, while in-house employees, on the contrary, become more expensive. So we decided to use a service provider's cloud infrastructure, clearly setting out what we are responsible for and what the service provider is responsible for.

Did you consider only Russian cloud providers, or Western players as well?

First of all, we of course considered Russian providers, since we needed high-quality network connectivity with our own infrastructure. After studying the market, the companies' certifications, and their track record in providing cloud services, we found that this segment of the IT services market is most developed in Moscow and St. Petersburg.
We also considered several European service providers but did not find significant advantages for us. In addition, hosting abroad raises questions of network connectivity and speed, as well as differences in mentality and time zones, for example when the third line of support is in another time zone. All these factors could adversely affect the level of support. Therefore, we preferred a Russian service provider. As for cost, I would not say that services in the West are much cheaper.

Grigory, why is the cost of in-house employees rising?

Keeping employees on staff who could support such a solution on their own is difficult. These are very high-level specialists. A critical task for our company is supporting the business process of releasing updates, which must work 24/7. So we would have to invest in a round-the-clock support service staffed with top-class specialists, and the cost of such specialists in Moscow only grows.

Grigory, did you consider renting dedicated physical servers? There is still a lot of debate online comparing cloud rental with dedicated servers.

We did not consider dedicated servers, because our main expense is system administrators who work not with servers but with virtualization software. These are the most expensive specialists. So if we use a third-party service provider at all, the service must include administration of the virtual environment.

We have plenty of experience renting rack space, purchasing equipment and operating it. We know how much a support line costs. Keeping the most expensive part in-house is the same as not taking a step towards a service at all. Really serious problems usually do not occur with hardware, which is duplicated at the level of both units and components. Therefore, handing only hosting over to a service provider is inefficient.

Grigory, what criteria did you pay attention to when evaluating cloud providers?

We formed our own vision of the service we wanted to receive. We immediately screened out providers that did not agree to our SLA (for example, the penalties we requested) or to the distribution of areas of responsibility that we expected. We also screened out service providers that did not agree with us on technical issues. Our users are accustomed to working with certain tools within our own infrastructure, so we required the provider to offer identical tools for working with the cloud. These are the main points.

How many service providers refused to consider your SLA?

Of the eight, one refused; it was not satisfied with the tools we requested and the SLA we required.

What tools are we talking about?

We need access to manage resource pools and virtual machines directly in VMware vCenter. We agreed that the service provider is responsible for the virtualization environment as a whole, while we are given limited administrative rights on specific resource pools so that our administrators can create new machines and switch them on and off on their own. The provider that dropped out insisted on using VMware vCloud Director, in which management differs significantly from the vCenter we already use.

Do I understand correctly that you considered only service providers that also use VMware?

Yes. Our administrators who work with virtual machines (they do not support the virtualization layer itself, they work with the virtual machines) are used to vCenter. Retraining them for another interface, or forcing them to work in several different environments, is an additional and unnecessary cost.

Grigory, how deeply did you study the providers? Did you check whether they were actually capable of meeting the declared SLA?

We started by checking the providers' vendor certifications and what partner status they hold. We looked at the quality of the data centers on which the service would be built and studied the service provider's experience with VMware solutions.

A very important point for us was the service provider's competence in storage; in our opinion this is the most complex part of the infrastructure. A server failure leads to a short downtime for some of the virtual machines, because the virtualization environment restarts them on standby servers. A storage failure, however, can bring down almost all virtual servers at once. There are cases where incorrectly installed software makes the entire storage system unavailable.

Usually a single storage system is used, fully redundant and fault-tolerant in the manufacturer's configuration. In this case, the competence of the employees who manage and maintain the storage system is extremely important. A partner status that merely confirms large sales volumes of the storage system is not enough; the storage administrators themselves need to be certified.

Did you pay attention to the storage class, its configuration, and disk types?

Yes, we required a storage system of at least mid-range class, so that any maintenance work, such as storage updates, could be carried out without downtime. This was a prerequisite for us.

Based on our own experience with storage systems, we knew in advance what disk subsystem configuration we needed, and we set this as a requirement too. Service providers, for their part, offered additional benefits (caches, solid-state drives).

What workloads have you moved to the cloud?

Our company has a business process for releasing updates to the modules and anti-virus databases of our anti-virus products. This process is critical: it is a pipeline that runs 24x7, ensuring that each update is tested on all supported platforms and delivered to our users on time.

Does the entire team of specialists that produces these updates work around the clock?

Yes, round-the-clock teams of system administrators, developers and virus analysts are involved in the process. If this process stopped, our products would provide lower-quality anti-virus protection, and that must not be allowed.

What does the resource request process look like for your internal customers?

For the internal user it makes no fundamental difference where the resources come from. The user can send a request to our internal IT support service to expand a resource pool or create a new one. The user knows that the amount our company pays for resource pools will then be broken down by user and billed internally.

I should also note that internal users are warned that expanding resources beyond a certain level takes a certain amount of time. Only an approved contact person can ask the external service provider for additional resources.

How strict are the parameters in the service provider's SLA?

We proceeded from the principle that the SLA with an external service provider should be stricter than our own SLA towards our internal users.

Two hours is the maximum downtime we allow the service provider, given that this is not our only environment. We have our own computing capacity, which can offset the risks associated with downtime.

If downtime exceeds two hours, penalties apply. The penalty reaches 100% of the monthly fee when the service is down for more than two days.
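
To make the two thresholds concrete, here is a minimal Python sketch. It assumes downtime is accumulated over a 30-day month; the interview does not say whether the two-hour limit applies per incident or per month, and it does not give the penalty scale between the two thresholds, so the sketch only labels that range as "partial". The function names are illustrative.

```python
from datetime import timedelta

# Thresholds quoted in the interview; everything else here is an assumption.
MAX_ALLOWED_DOWNTIME = timedelta(hours=2)
FULL_PENALTY_DOWNTIME = timedelta(days=2)


def penalty_tier(downtime: timedelta) -> str:
    """Classify downtime against the two thresholds from the interview."""
    if downtime <= MAX_ALLOWED_DOWNTIME:
        return "none"
    if downtime > FULL_PENALTY_DOWNTIME:
        return "full"  # 100% of the monthly fee
    return "partial"  # the exact amount is not stated in the interview


def availability_percent(downtime: timedelta, days_in_month: int = 30) -> float:
    """Availability implied by the given downtime over one month (assumed length)."""
    month = timedelta(days=days_in_month)
    return 100.0 * (1 - downtime / month)
```

Under the 30-day assumption, two hours of downtime corresponds to roughly 99.72% availability.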

Have you summed up the interim results of running your infrastructure in the cloud?

Of course we have. We collected statistics on service availability and discussed them with our internal business customer. The quality of service was rated a solid five, the top mark. The only wish was to set up the same service with a second provider at the same quality level, in order to reduce dependence on a single service provider.

How do you measure service availability? Do you use any monitoring tools?

Of course. Service availability is tracked by our monitoring systems, and we use them to report to the internal customer. Problems can arise not only on the provider's side but also at the junction between the two infrastructures or on our side. It is worth noting that our internal SLA is broader than the service provider's SLA.
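
Since an outage can originate with the provider, at the junction of the infrastructures, or internally, availability reports are easier to reconcile with the provider's SLA if downtime is attributed to a responsible party. The Python sketch below is only an illustration: the record layout and the party labels ("provider", "interconnect", "internal") are assumptions, not a description of Kaspersky Lab's monitoring, and it assumes outage records do not overlap.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def downtime_by_party(outages):
    """Sum downtime per responsible party.

    `outages` is an iterable of (party, start, end) tuples; the layout and
    the party labels are hypothetical. Overlapping records are not merged.
    """
    totals = defaultdict(timedelta)
    for party, start, end in outages:
        totals[party] += end - start
    return dict(totals)


# Example data, invented for illustration only.
outages = [
    ("provider", datetime(2014, 8, 1, 10, 0), datetime(2014, 8, 1, 10, 30)),
    ("internal", datetime(2014, 8, 3, 2, 0), datetime(2014, 8, 3, 3, 0)),
    ("provider", datetime(2014, 8, 9, 14, 0), datetime(2014, 8, 9, 14, 15)),
]
report = downtime_by_party(outages)
```

Only the "provider" total would count against the provider's SLA, while the sum across all parties is what the internal customer experiences.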

Do you plan to transfer other services to the cloud?

We have no such plans at the moment. I should note that our own infrastructure will most likely stay with us. We need in-house expertise and a minimum level of independence.

Did you set any additional requirements for the service provider that we haven't talked about yet?

Yes, for example, integration of service desk systems. We were not satisfied with an arrangement in which someone has to go to a separate system and do something manually to submit a request. We required the IT service desk systems to be integrated through the exchange of e-mail messages in an agreed format.

The user contacts our company's IT service desk. In our system, a support specialist assigns the ticket to a specific group. This automatically sends a request to an agreed mailbox at the service provider. The service provider replies in a specific format, so that the answer is linked to the original ticket in our IT service desk system and is immediately visible to our user. This speeds up interaction between the service provider and the user and reduces the load on the first line of our support service.
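
The e-mail exchange described above can be sketched roughly as follows. The actual message format agreed between the two companies is not given in the interview, so the `[TICKET#<id>]` subject tag, the function names and the addresses below are hypothetical; the sketch uses only Python's standard `email` and `re` modules.

```python
import re
from email.message import EmailMessage
from typing import Optional

# Hypothetical subject convention: "[TICKET#12345] short summary".
TICKET_TAG = re.compile(r"\[TICKET#(\d+)\]")


def build_request(ticket_id: int, summary: str, body: str,
                  sender: str, provider_mailbox: str) -> EmailMessage:
    """Build the message sent to the provider's agreed mailbox."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = provider_mailbox
    msg["Subject"] = f"[TICKET#{ticket_id}] {summary}"
    msg.set_content(body)
    return msg


def ticket_id_from_reply(reply: EmailMessage) -> Optional[int]:
    """Extract the ticket number from a reply so it can be linked to the ticket."""
    match = TICKET_TAG.search(reply.get("Subject", ""))
    return int(match.group(1)) if match else None


# Usage with made-up addresses: the provider replies, keeping the tag intact.
request = build_request(12345, "Expand resource pool", "Please add capacity.",
                        "servicedesk@example.com", "support@provider.example")
reply = EmailMessage()
reply["Subject"] = "Re: " + request["Subject"]
linked_ticket = ticket_id_from_reply(reply)
```

Carrying the identifier in the subject line is one common way to keep replies attached to the original ticket; a custom header or a marker in the body would work on the same principle.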

Grigory, what advice can you give to colleagues from other companies who are now thinking about using the cloud?

First of all, you need to agree with your security service on what data you intend to place with the service provider. The security service will determine what may and may not be handed over. We carried out such a risk assessment and introduced certain restrictions on the use of this service.

Thank you very much.

Goodbye.

The interview was conducted by Sergey Chukanov, development director at IT-GRAD.

Source: https://habr.com/ru/post/234229/

