
Practical tips for choosing a cloud provider

Choosing a cloud provider is a difficult task. In this post I will explain how to approach it, what to look at first, where the catches may be hidden, and how to build communication with the provider in general. The focus below is on the most complex scenario: moving the entire IT infrastructure to the cloud. We will look at moving to the “cloud” a critical part of the IT infrastructure whose unavailability for even a few hours can cause significant damage to the company's business.

Memo


How to weed out hosting providers
  1. Is server virtualization used in principle?
  2. Is data storage virtualization or network virtualization used? These are optional requirements, but they indicate the technological level of the cloud provider.
  3. How are services managed? Is there a self-service portal? Can I launch new servers myself and adjust the performance of those already running? Can I add disks, configure internal addressing and manage routing? Can I set up a backup schedule and run data recovery tasks myself? And so on.
  4. How are resources metered? Is billing automated (per second or per hour), or is everything counted by hand?


Site
  1. Where is the data center located: abroad or in the Russian Federation? How far is it from your office and from your second data center, if you have one? What is the latency?
  2. Who owns the data center? Can I come and see it?
  3. Is it certified? What incidents have occurred at this site before?
  4. Which telecom providers are present at the site?
  5. How can I connect to the "cloud"?

Cloud Services
  1. What is a vCPU (virtual core)? Does it correspond to a whole physical processor core or, for example, a quarter of one?
  2. What disk resources are used? Local or connected via SAN?
  3. How are the Internet uplinks organized?
  4. What to do if the standard functionality of the "cloud" is not enough? Is it possible, for example, to connect specialized network equipment or non-x64 machines to the “cloud”?
  5. Is hybrid mode available? How is the integration done in this case?
  6. Is there a backup service?
  7. Which information security tools are included in the base service, and which must be ordered separately?
  8. If you need to build HA (high availability) or DR (disaster recovery) solutions, is it possible to split parts of the hosted IT service between two data centers? Does the provider have a second cloud to build such solutions?


Support
  1. Does 24/7 support respond quickly and to the point, rather than with “we will look into it later”?
  2. Languages: Russian and English?
  3. How far is the provider willing to depart from its standard SLA if you really need it? (As a rule, Western providers will not budge.)
  4. Do I need to contact support to monitor resources and the account balance, or is all the data available through the self-service portal?
  5. Is there a demo mode? How exactly does it differ from production?


Remark: many of these questions will be of little practical use if you are only moving a website to the "cloud"; the scale is too small. Although websites, of course, differ.

1. Site selection



The choice of a cloud provider begins with the physical site of the data center. If the provider's data center goes down, the “cloud” goes down with it, and all IT systems running in it become unavailable. Many customers ask the provider about the internal mechanisms that keep the cloud platform available. This is right, but it is not enough.

Find out where the data center is located
In Russia or abroad? This matters because, by law, some types of data may not be moved outside the Russian Federation. The reverse is also true: some companies prefer to host in a Western data center in order to at least partially protect themselves from inspections. In addition, if you need to move the entire IT infrastructure to the cloud, latency becomes an issue. Say you decide to move some of your systems to a public “cloud” somewhere in Ireland. Are you sure the latency on the communication channels will let you work comfortably with those systems? This point stops many companies, and they choose local data centers instead.
Our example: all three of CROC's own data centers are located in Moscow and are linked by a single optical ring.
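
To put a rough number on the latency question, a lower bound on round-trip time can be estimated from distance alone. The sketch below assumes that a signal travels through optical fiber at about 200,000 km/s (roughly two thirds of the speed of light in vacuum) and ignores routing detours, switching and queuing, so real delays will be higher. The function name and the sample distances are illustrative assumptions, not measurements.

```python
# Assumed propagation speed in optical fiber: about 2/3 of the speed of light.
FIBER_SPEED_KM_PER_S = 200_000


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a fiber path of the given length."""
    one_way_s = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Illustrative distances: same city, neighboring region, another country.
    for km in (30, 700, 2800):
        print(f"{km:>5} km -> at least {min_rtt_ms(km):.1f} ms round trip")
```

An application that makes hundreds of sequential requests per user action multiplies this figure accordingly, which is why it is worth testing chatty systems over the real channel before moving them.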


Who owns the data center?
You should also check whether the provider owns the data center or leases it from a partner. There is nothing wrong with a leased site. The partner may well employ highly qualified specialists who act quickly and in a coordinated way in any situation. Still, with a leased site, the path of your request from the moment you submit it to the moment an engineer of the data center owner starts working on it is somewhat longer. The cloud provider acts as an intermediary, and the process is less flexible and less responsive.

When choosing a site, also ask whether you can visit the data center. If the provider owns the data center and has nothing to hide, it will gladly invite you on a tour. If a tour is impossible, that raises doubts. A refusal may mean that the site is leased and the cloud provider could not arrange a tour with the owner, or that something is wrong at the site. There may be other reasons, of course, but doubts about the reliability of the data center remain.
All CROC data centers are owned by the company. Tours of the data centers are held regularly, both for groups and for individual visitors.


Is the site certified?
The leader of the data center certification market is the Uptime Institute. It maintains an up-to-date list of reliability requirements for data center components. Its requirements and recommendations are based on practical experience of operating data centers around the world and take real data center failures into account.

Uptime certification consists of two stages: certification of the design on paper and certification of the constructed site. Site certification takes up to three weeks. Uptime experts come in person and verify that every technical solution on site matches the design on paper. A certificate of conformity is issued based on the results. At the time of writing there are 5 certified data centers in Russia, and one of them belongs to us.

There is an alternative to Uptime Institute certification: an assessment of compliance with the TIA-942 standard. This is an American standard containing recommendations for building data centers. Its drawback is that it has not been updated for a long time and lags behind Uptime on a number of requirements. Another major drawback is that the standard is advisory and nobody audits data centers for compliance with it, at least in Russia. You have to take your cloud provider's word for it.

In general, data center certification is a source of endless argument. Many people say certification is useless and that they trust the certification body (the Uptime Institute) no more than they trust the data center provider. Many others, on the contrary, trust only external auditors.

Looking at the issue soberly and constructively, a certified data center deserves more trust, all other things being equal. If an accident happens at such a facility, the reputation of the external auditor suffers along with the provider's, and Uptime values its reputation highly. An uncertified site has many nuances that are very hard to verify unless you are a specialist. A certified site has been checked by specialized external experts and contains far fewer questionable technical solutions.
Here is an example of a seemingly small detail that can distinguish a certified data center from an uncertified one. It is the kind of thing a customer would never think to check. A data center has an air conditioning system. It consists of indoor units (fan coils) located in the data center and external cooling units installed on the roof (chiller and cooling tower). The air conditioning system is redundant under an N+1 scheme, so the failure of any single air conditioning unit does not stop the data center. The problem is that to replace a failed unit you need to shut off the coolant supply. If there is only one supply line and it has no redundancy, all the air conditioners will turn off, which means the data center will stop. And the “cloud” will go down along with your systems.

Here is another example. Several years ago there was a hurricane in Moscow. A sheet of metal was torn off the roof of a neighboring building and thrown onto the roof of a data center. The sheet severed the coolant supply line to the outdoor air conditioning unit on the roof. Who could have imagined such a scenario? Who would have thought that something could happen to a cooling system that sits on the roof behind a fence? As a result, the coolant leaked out, the data center stopped, and all customer systems went down.

If the site had been certified and built in accordance with the Uptime Institute Tier III standard, it would have been possible to switch to the backup coolant supply and isolate the damaged section of the pipeline. So if you are choosing a cloud provider for serious workloads, you have to pay attention to many aspects, right down to the data center level. Cloud services are a “matryoshka”, and the data center is its core.

Someone may object that it does not matter which data center the provider uses: we have a clear SLA and we work within it. We do not want to go and inspect the site, we just need a “cloud”. But few people consider that if the data center goes down, and all your systems with it, collecting penalties will be the last thing on your management's mind. First there will be a scandal because the company's work has stopped and something has to be done about it urgently.
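
When reading an SLA, it helps to translate the availability percentage into hours, since a few hours of downtime is exactly what is critical here. Below is a minimal sketch of that arithmetic. The availability figures are illustrative and are not taken from any particular provider's contract.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime that still fits within the stated availability for the period."""
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Illustrative SLA levels, not quotes from a real contract.
    for sla in (99.0, 99.5, 99.9, 99.95):
        hours = allowed_downtime_hours(sla)
        print(f"{sla}% per year allows about {hours:.1f} h of downtime")
```

Check which period the SLA is measured over: the same percentage measured per year allows one long outage that a per-month measurement would not.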

CROC currently owns three sites:
• Volochaevskaya 1 (70 racks + cloud 1) TIER 3 TIA-942
• Volochaevskaya 2 (110 racks) TIER 3 TIA-942
• Compressor (800 racks + cloud 2) TIER 3 Uptime institute

Read more about the principles of certification here. You can check a site's Uptime Institute certification here.

Communication with the outside world
Always ask the cloud provider how you can connect from outside to the data center and to the “cloud” in particular. What Internet access options are provided by default? And is it possible to connect to the "cloud" over point-to-point channels?

If you can connect to the “cloud” over your own channels, ask which telecom providers are present in the data center, because at some sites communication services are monopolized. For example, only provider X is present and that is all: you cannot bring in your own providers.
CROC's data center network currently hosts 13 telecom providers, and we are ready to bring in any provider that is convenient for you.


2. Cloud platform



CPU resources - what do we pay money for?
We have covered what to look for when choosing a physical site. Now let's move on to the questions to ask when choosing the cloud platform itself. Let's start with how computing resources, namely processor capacity, are allocated and sold. There are several ways to sell processor capacity on the market:

The first option usually comes in an attractive marketing wrapper. You choose a lower and an upper limit for computing resources. The lower limit is guaranteed, while resources between the lower and upper limits are allocated on demand and paid for on use. The approach looks attractive, but there is a catch. When you urgently need all the resources available to your virtual server, there is no guarantee that your neighbors on the physical server are not already using them and that they can actually be allocated.
The second way of providing resources is less flexible in terms of payment, but more stable in terms of resource allocation and the operation of your systems.
Each approach has advantages and disadvantages that can be put to good use in a particular situation. Both have a right to exist.
CROC, in particular, allocates resources on a guaranteed basis.
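
The catch with the lower/upper limit scheme can be shown with a toy model. This is a sketch under my own simplifying assumptions and not any provider's actual scheduler: CPU is counted in abstract units, the lower limit is always honored, and the burst above it is limited by whatever the neighbors on the same physical host have left free. All names and numbers are hypothetical.

```python
def granted_cpu(lower: float, upper: float, demand: float, host_free: float) -> float:
    """CPU a virtual server actually receives under a lower/upper limit scheme.

    lower     - guaranteed share, always available
    upper     - ceiling the server is allowed to burst to
    demand    - what the server wants right now
    host_free - host capacity left unused by neighbors, on top of the
                guaranteed share of this server
    """
    wanted = min(demand, upper)
    if wanted <= lower:
        return wanted
    # Anything above the guaranteed share depends on the neighbors.
    burst = min(wanted - lower, max(host_free, 0.0))
    return lower + burst


if __name__ == "__main__":
    # Quiet neighbors: the full burst up to the upper limit is available.
    print(granted_cpu(lower=2, upper=8, demand=8, host_free=6))
    # Busy neighbors: the same request gets little more than the guarantee.
    print(granted_cpu(lower=2, upper=8, demand=8, host_free=1))
```

With guaranteed allocation the upper and lower limits coincide, so the result does not depend on the neighbors at all.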


What is vCPU?
Cloud providers measure the processing power of their servers in vCPUs. Let's see what that means. Here are some of the ways of calculating vCPU power that I have come across:

You can sort this out to some extent by asking the provider for its calculation methodology. But there are other pitfalls. Since 2007, the power of processor cores has grown about 3.5 times. This can be seen from the types of virtual servers available on Amazon. When Amazon began to provide cloud services, it purchased equipment for the purpose, measured the performance of the processors of that time and took it as a reference. The benchmark was called the ECU (EC2 Compute Unit); Amazon described one ECU as equivalent to the CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. Now you can order virtual servers from Amazon whose physical cores are rated at 3.5 ECU each. From this we can conclude that the power of processor cores has grown 3.5 times over the past 6-7 years.

Now consider that a cloud provider may mean by vCPU not a whole physical core but only a fraction of it, and may also be running old hardware. This means that the power of a vCPU may differ by a factor of 20-30 between providers. Always ask what a vCPU is, how it relates to physical cores, and which processors are actually used.
CROC, in particular, follows Amazon's methodology for measuring processor power. Our vCPU is rated at 3.23 ECU and corresponds to one physical core of an Intel Xeon X5650 processor (2.66 GHz).
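
Once the provider has told you what its vCPU means, offers can be put on a common scale. The sketch below assumes each provider discloses the rating of a physical core in ECU and the fraction of a core behind one vCPU. The 3.23 ECU figure is the one quoted above; the second provider and both prices are invented purely for illustration.

```python
def vcpu_power_ecu(core_power_ecu: float, core_fraction: float) -> float:
    """Power of one vCPU, given the core rating and the share of a core it maps to."""
    return core_power_ecu * core_fraction


def price_per_ecu(price_per_vcpu: float, core_power_ecu: float,
                  core_fraction: float) -> float:
    """Price normalized to one ECU, so that different vCPU definitions compare fairly."""
    return price_per_vcpu / vcpu_power_ecu(core_power_ecu, core_fraction)


if __name__ == "__main__":
    # Provider A: vCPU = a whole core rated at 3.23 ECU, hypothetical price 10 per vCPU.
    # Provider B (invented): vCPU = a quarter of an old 1.0 ECU core, price 2 per vCPU.
    a = price_per_ecu(10.0, core_power_ecu=3.23, core_fraction=1.0)
    b = price_per_ecu(2.0, core_power_ecu=1.0, core_fraction=0.25)
    print(f"A: {a:.2f} per ECU, B: {b:.2f} per ECU")
    ratio = vcpu_power_ecu(3.23, 1.0) / vcpu_power_ecu(1.0, 0.25)
    print(f"One vCPU at A is about {ratio:.1f} times more powerful than at B")
```

In this made-up comparison the vCPU that looks five times cheaper turns out to be more expensive per unit of actual processing power.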


Disk resources
When choosing a cloud provider, pay attention to the disk resources provided to virtual machines. First of all, ask what the data storage looks like physically:

The first and second options carry the risk of data loss or long unavailability when a server fails. Amazon, in particular, gives customers disk space on local disks as a bonus with their virtual machines. A server failure may mean the loss of all that data. For an additional fee, you can rent disk space on an external storage system (EBS).

When designing the architecture of its cloud platform, CROC decided against using local server disks to store virtual machine data. All virtual server disks are stored on SAN-connected storage systems. If a physical server fails, its virtual servers are automatically restarted on the surviving part of the “cloud”.


The second key point about disk resources is a guaranteed SLA for IOPS and for read and write speed. Does your provider guarantee a certain level of storage performance? Disks on the CROC cloud platform are provided without a guaranteed SLA. The company makes every effort to scale existing storage systems and add new ones in time, and it monitors performance continuously. If a customer needs guaranteed IOPS next to the “cloud”, a physical storage system that meets the requirements can always be placed in the data center. Our large customers do this in practice. Fortunately, we have 3 sites of our own that are ready to host physical equipment.

How to connect to the "cloud"?
By default, cloud providers offer Internet connectivity for virtual servers. And, naturally, they do it in different ways.
First of all, ask whether this connection is redundant. Are different providers used? How does failover work if one of the communication channels fails?
For access from the Internet, CROC provides two 1 Gbit/s communication channels from different providers, operating in active-passive mode with automatic switching between them at the autonomous system level.


The second important question is whether the provider can shape the bandwidth of the Internet channels, that is, guarantee you a certain bandwidth.
CROC does not provide this service, but it constantly monitors channel utilization and expands channel capacity promptly.


All our large customers work with the telecom providers that are convenient for them and set up point-to-point connections. This is a more secure way to connect than over the Internet.

What to do if the standard functionality of the "cloud" is not enough?
The cloud platform is not a panacea. It cannot solve absolutely every IT problem. Here are examples where the cloud platform will not help you:

The list of examples could go on. The fact remains that someday you will outgrow the built-in functionality of the cloud platform, if you have not already. What will you do when you hit the ceiling? Ask the provider whether additional physical equipment can be placed in the data center and connected to the “cloud”.
CROC provides such services. Moreover, most of our customers rent physical equipment from us and consume it as a service, just like the cloud. The most common case is renting network routers to terminate dedicated communication channels.


Integration with the "cloud". Hybrid mode
Most likely, once you decide to move to the “cloud”, you will not do it all at once. There will be a long transition period when you live both on your local site and in the “cloud”. In some companies this process can drag on for a year, and it may never end.
In that case it is important to ask the provider about the mechanisms for integrating your local IT infrastructure with the "cloud".
In a public "cloud", there is by default almost no way to manage network settings on the cloud side:

Network management inside the “cloud” is far from the flexibility and convenience you have on your own site. You can forget about network integration tools. At best you will be able to build a VPN tunnel between the sites.
Fortunately, there are technical solutions that provide the same convenient way of working with networks in the "cloud" as on the local site.

CROC uses network virtualization software, which was discussed in a previous post. Through the self-service portal, this software allows the customer to:
  • Manage the internal addressing of cloud networks
  • Create the required number of additional networks
  • Manage access between them with a firewall
  • Configure static or dynamic routing
  • Download configurations for local physical network equipment to set up a VPN

This functionality lets you configure internal addressing in the “cloud” in whatever way suits you best, up to building L2 networks stretched between sites. You can configure the VPN and define routes yourself. In effect, you manage network settings in the cloud just as you would on your own site. The “cloud” becomes a logical extension of your local infrastructure. IT systems can work with each other completely transparently, even though they are located in physically different places.
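
If you are free to choose the addressing inside the "cloud", the first thing to check is that the cloud networks do not collide with the ones you already use locally, otherwise routing over the VPN will not work. A minimal sketch of such a check with Python's standard ipaddress module is below. All address ranges are made up for the example.

```python
import ipaddress


def find_overlaps(on_prem: list[str], cloud: list[str]) -> list[tuple[str, str]]:
    """Return every (local, cloud) pair of networks whose address ranges intersect."""
    conflicts = []
    for local_net in on_prem:
        for cloud_net in cloud:
            a = ipaddress.ip_network(local_net)
            b = ipaddress.ip_network(cloud_net)
            if a.overlaps(b):
                conflicts.append((local_net, cloud_net))
    return conflicts


if __name__ == "__main__":
    # Hypothetical plan: office networks and networks planned for the cloud.
    office = ["10.0.0.0/16", "192.168.1.0/24"]
    planned_cloud = ["10.0.5.0/24", "172.16.0.0/24"]
    for local_net, cloud_net in find_overlaps(office, planned_cloud):
        print(f"Conflict: {cloud_net} overlaps local network {local_net}")
```

If the provider dictates the addressing instead, the same check tells you in advance which local networks would have to be renumbered or hidden behind NAT.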

If you intend to tightly integrate the systems you move to the “cloud” with your local systems, be sure to ask the provider about network management capabilities.

Data backup
Data backup is one of the basic services of a cloud platform, but many people forget about it. First ask the provider whether such a service exists at all. If it does, ask follow-up questions. Is it built on a commercial product or on open source software? If free software is used, factor in the risk that the solution is supported by the provider itself rather than by a vendor.

This leads to a second risk. A vendor ships its backup software with all the agents needed for consistent backup of application software. You are unlikely to be able to back up SAP with an open source backup solution. Always ask the provider for the compatibility list of its backup solution.

It is important to clarify with the provider where the copies are stored: on disk or on tape. Tape is a cheaper medium, but the risk of failing to restore is much higher. Working with tape is also slower than with disk. If the copies are stored on disk and you need them periodically written to removable media as an additional service, ask the provider whether this is possible.

Finally, an important point is how the service is managed. Is there a self-service portal? Can you manage the backup schedule, back up and restore data yourself, without involving the provider?
CROC has built its backup solution on the EMC Avamar hardware and software appliance. Data is stored on disk in deduplicated form. The service is managed through a fully functional self-service portal.
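
For readers unfamiliar with deduplicated storage, here is a toy illustration of the idea: data is cut into chunks, each unique chunk is stored once, and every backup is kept as a list of chunk hashes. This is a deliberately simplified sketch with fixed-size chunks and an in-memory dictionary. It is not a description of how Avamar or any other product works, and all names are hypothetical.

```python
import hashlib


def dedup_store(blobs: list[bytes], chunk_size: int = 4096):
    """Store several backups, keeping each unique chunk only once."""
    store: dict[str, bytes] = {}      # chunk hash -> chunk data
    recipes: list[list[str]] = []     # per backup: ordered list of chunk hashes
    for blob in blobs:
        recipe = []
        for i in range(0, len(blob), chunk_size):
            chunk = blob[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # a repeated chunk is not stored again
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def restore(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Reassemble one backup from its list of chunk hashes."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    # Two "daily backups" that differ only in their last chunk.
    day1 = b"A" * 8192 + b"B" * 4096
    day2 = b"A" * 8192 + b"C" * 4096
    store, recipes = dedup_store([day1, day2])
    raw = len(day1) + len(day2)
    kept = sum(len(chunk) for chunk in store.values())
    print(f"raw: {raw} bytes, stored after deduplication: {kept} bytes")
    assert restore(store, recipes[1]) == day2
```

The practical consequence for the customer is that repeated full backups of slowly changing systems take far less disk space than their nominal size suggests.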


Information Security
This topic is the most frequently discussed and the most dreaded one for cloud providers. So what should you pay attention to? In addition to questions about availability (redundancy of equipment and communication channels, separated data storage, RAID, backups), ask about the built-in access control tools and about the list of additional information security services.

As for built-in tools, ask how Windows machines are protected with passwords and how Linux machines are protected with keys. You have created a new machine: how is access to it protected? Does it immediately become reachable from the entire Internet at an external IP address with a default password?
It is also worth asking about network access management. Can you manage the built-in firewall, if there is one at all?

As for additional services, ask the provider whether you can purchase (D)DoS protection and intrusion detection and prevention systems (IDS, IPS), and whether you can rent antivirus and antispam.

Build disaster recovery or high availability solutions
There is a chance that your workloads will become so critical that you need to make them redundant across data centers, build clusters and set up replication between storage systems. Is your provider ready for this turn of events? Or will you have to bring in a second provider?

Bringing in a second provider means diluted responsibility and dealing with two technical support services, possibly under different SLAs. This can adversely affect the operation of your IT infrastructure.
CROC currently has 2 cloud platforms of its own located in different data centers (Volochaevskaya-1 and Compressor). Both platforms are managed through a single self-service portal. This lets you build highly available distributed solutions or, at the very least, store backups at a site remote from the main data.

Moreover, there is no need to sign an additional contract. For the customer, the process of working with two cloud platforms is completely transparent.
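
The benefit of spreading a service across two sites can be estimated with simple probability. The sketch below assumes that the sites fail independently and that the service stays up as long as at least one site is up. Both are idealizations: real failures are often correlated, and failover is never instant. The availability figures are illustrative.

```python
def combined_availability(site_availabilities: list[float]) -> float:
    """Availability of a service that is up while at least one site is up.

    Assumes independent site failures, which is an idealization.
    """
    p_all_down = 1.0
    for availability in site_availabilities:
        p_all_down *= (1.0 - availability)
    return 1.0 - p_all_down


if __name__ == "__main__":
    single = 0.99  # illustrative availability of one site
    both = combined_availability([single, single])
    print(f"one site: {single:.2%}, two independent sites: {both:.4%}")
```

The estimate only holds if the service can really run from either site alone, which is why replication and clustering between the data centers matter as much as the second site itself.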

Conclusion


Even after asking a candidate provider the whole list of questions and getting the answers, you should not stop there. Ask for demo access to verify everything you have heard. Finally, test the performance of servers, storage and network on your own workloads. Testing is the clearest indicator of whether a particular provider suits you. Does the provider stand by its words, and do its promises match reality?

But that is not all. Unfortunately, many aspects of working with a cloud platform cannot be experienced at the testing stage. Some surprises will inevitably surface during long-term operation of the service. They may turn out to be minor, or they may be serious enough to call further work with the provider into question.

At the end of your discussions with the provider, always raise the question of moving to another service provider. You need to know what options there are for exporting your data, converting virtual machines to the required format, and so on.

Source: https://habr.com/ru/post/176803/

