Hello!
Today we want to share our experience of using Microsoft Azure to make CloudStats.me, our server and website monitoring system, scalable and resilient.
It should be noted that, before we joined the 6th intake of the FRIA accelerator and received a grant for Microsoft Azure resources, our platform, like most, ran on regular dedicated servers split into OpenVZ virtual machines managed through the SolusVM panel.
Initially, we used several servers from Online.net, Redstation and OVH, each with 2 x Intel Xeon E5 2620v3, 128 GB RAM and 2 x 500 GB SSD in hardware RAID1. In our tests this disk system delivered up to 15,000 IOPS. For any statistics collection platform, disk performance is particularly important, because a large stream of incoming data has to be processed and written to the database.
As we wrote earlier, we used the free HAProxy load balancer with Apache Tomcat and a MariaDB (MySQL) cluster to distribute the load across our servers. On the one hand, the dedicated servers provided the necessary performance. On the other hand, they required separate resource monitoring and offered no fault tolerance: there was no separate balancer on the data center side, so the whole system could fail if our own load balancer went down.
Working with Microsoft experts as part of the accelerator and the BizSpark program, we tested Azure's capabilities and split our data storage into two types: SQL (MariaDB Cluster) and NoSQL (DocumentDB).
So what was done?
First of all, we tested the virtual machine disk system in Azure. According to the specifications, standard virtual machines of the A0-A7 series have rather low disk performance, around 500 IOPS per attached disk. This is because the virtual machines use remote network storage, unlike conventional VPSs, whose disks sit locally on the dedicated host server.
Nevertheless, there are several ways to increase a virtual machine's IOPS using RAID and additional tuning, which can be found here. With RAID0 the number of IOPS really does increase: we were able to achieve about 2000-3000 IOPS with 4 disks combined in RAID0.
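As a rough illustration of why striping helps, here is a minimal sketch that estimates the upper bound for a RAID0 set. It assumes roughly 500 IOPS per standard data disk and ideal linear scaling. Both are simplifying assumptions, and the function name is hypothetical. Real results depend on VM size, stripe settings and workload, which is why measuring (as we did) matters more than the estimate.

```python
def raid0_iops_estimate(disks: int, iops_per_disk: int = 500) -> int:
    """Idealized upper bound: RAID0 spreads I/O evenly across all disks.

    iops_per_disk=500 is an assumed per-disk limit, not a measured value.
    """
    if disks < 1:
        raise ValueError("at least one disk is required")
    return disks * iops_per_disk


# Estimated ceilings for a few stripe widths.
for n in (1, 2, 4):
    print(f"{n} disk(s): ~{raid0_iops_estimate(n)} IOPS")
```

For 4 disks the estimate is 2000 IOPS, which is in line with the lower end of what we measured.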

If this is not enough, you can use “DS”-series virtual machines, which have local SSD storage and also let you attach Premium (SSD) disks, which offer much higher performance (link).
Availability Sets / Auto-Scaling / Load Balancing
Despite the peculiarities of the disk system, Azure provides quite flexible options for ensuring application resiliency (Availability Sets) and scalability (Auto-Scaling).
We were warned that all system components need to be duplicated and added to Availability Sets to avoid downtime during maintenance and updates of Microsoft Azure systems, which happen fairly often. Maintenance work can cause virtual machines to reboot unexpectedly and be unavailable for some time.
Note that copies of virtual machines added to an Availability Set and Auto-Scaling can be kept switched off to save money, because machines in the Stopped (Deallocated) state are not billed (except for storage).
In addition, Azure has a load balancer that is easy to use with front-end virtual machines, for example to distribute the load across Nginx / Apache. However, it is not suitable for balancing the load on a database such as MariaDB: in a MySQL cluster you need to monitor the state of each database server, which the Azure load balancer cannot do yet.
HAProxy or MaxScale is better suited for MySQL (MaxScale is supplied by MariaDB, but does not have a graphical interface for status tracking).
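To show what "monitoring the state of the database servers" means in practice, here is a minimal sketch of the kind of check a database-aware balancer needs. It assumes the cluster is Galera-based and that each node's `wsrep_*` status variables have already been read (for example with `SHOW GLOBAL STATUS LIKE 'wsrep_%'`). The function names and node names are hypothetical, and this is not how HAProxy or MaxScale are implemented internally.

```python
from typing import Dict, List, Mapping


def is_node_usable(status: Mapping[str, str]) -> bool:
    """A node should receive traffic only if it is ready, part of the
    primary component, and fully synced with the cluster."""
    return (
        status.get("wsrep_ready", "").upper() == "ON"
        and status.get("wsrep_cluster_status", "").lower() == "primary"
        and status.get("wsrep_local_state_comment", "").lower() == "synced"
    )


def usable_backends(nodes: Dict[str, Mapping[str, str]]) -> List[str]:
    """Return the names of nodes that are safe to route queries to."""
    return sorted(name for name, status in nodes.items() if is_node_usable(status))


# Hypothetical snapshot of three nodes; db2 is acting as a state-transfer donor.
snapshot = {
    "db1": {"wsrep_ready": "ON", "wsrep_cluster_status": "Primary",
            "wsrep_local_state_comment": "Synced"},
    "db2": {"wsrep_ready": "ON", "wsrep_cluster_status": "Primary",
            "wsrep_local_state_comment": "Donor/Desynced"},
    "db3": {"wsrep_ready": "OFF", "wsrep_cluster_status": "non-Primary",
            "wsrep_local_state_comment": "Initialized"},
}
print(usable_backends(snapshot))
```

A plain TCP probe would consider all three nodes alive as long as the port answers, which is exactly the limitation described above.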
DocumentDB vs MongoDB
While working in Microsoft's stress-test lab, we were tasked with ensuring the uninterrupted operation of our application's database. At that time we used only a MariaDB (MySQL) cluster behind the HAProxy load balancer, which did not provide the required degree of fault tolerance: the cluster could fall out of sync, and new servers had to be added to it manually.
Since this setup was not optimal, we decided to split data storage into two types: SQL (MariaDB) and NoSQL. We kept MySQL for user data, such as user accounts, payment data, etc., and moved the statistical and historical server-monitoring data into NoSQL storage, thereby offloading the main database.
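The split described above can be sketched as a simple routing rule. The record kinds and the function name below are hypothetical labels chosen for illustration, not identifiers from our code base.

```python
# Hypothetical record kinds, grouped by where they are stored.
SQL_KINDS = {"user_account", "payment"}           # relational user data
NOSQL_KINDS = {"server_stats", "history"}         # high-volume monitoring data


def storage_for(kind: str) -> str:
    """Return which store a record of the given kind belongs to."""
    if kind in SQL_KINDS:
        return "sql"
    if kind in NOSQL_KINDS:
        return "nosql"
    raise ValueError(f"unknown record kind: {kind!r}")


print(storage_for("payment"))
print(storage_for("server_stats"))
```

Keeping the rule in one place makes it easier to swap the NoSQL backend later without touching the code that writes user data.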
We tested Azure DocumentDB and MongoDB as NoSQL storage options. At the moment DocumentDB has no built-in support for Ruby applications, so we used the Java library with a basic ODM wrapper (we can share it on request). Since DocumentDB is a DBaaS solution, its advantage is that you don't need to spend time and effort on server maintenance, unlike MongoDB, which has to be configured manually on the virtual machine itself. On the other hand, MongoDB makes it easy to migrate the service's infrastructure if you need to change data centers.
For now we have settled on DocumentDB, although we continue to test CouchDB (which has a convenient control panel) and MongoDB.
Things to remember when working with the Microsoft Azure platform:
- Azure VMs have fairly tight IOPS limits, which you can nonetheless raise with RAID0 and additional system tuning.
- Virtual machines must be added to Availability Sets to avoid downtime and unexpected restarts during maintenance work in Microsoft data centers.
- Auto-Scaling lets you save money by running virtual machines only when they are needed.
- Starting a virtual machine can take up to several minutes, which must be taken into account when configuring Auto-Scaling parameters.
- Each Azure subscription has access to only 5 reserved public IP addresses.
- It is advisable to optimize the application code to reduce the cost of cloud resources.
- The Azure load balancer is easy to use for front-end Nginx / Apache virtual machines, but is not suitable for load balancing the database.
- When using Azure, it is advisable to purchase a support package separately, as none is included by default.
- ICMP (ping) is blocked in Azure and you cannot ping your virtual machines. Use telnet or psping to determine whether a port is open or closed.
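The last point in the list can also be checked from a script. Below is a minimal sketch of a TCP port probe using only the Python standard library. The function name and default timeout are our own choices for illustration, and it reports only whether a TCP connection can be opened, nothing about the service behind the port.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling `is_port_open(host, 22)` with the address of a virtual machine then gives roughly the same answer as trying to open a telnet session to that port.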
What did we get in the end?
We were able to provide application resiliency using front-end virtual machines with Nginx / Tomcat, the load balancer built into Azure, and Availability Sets with Auto-Scaling. We split data storage into SQL and NoSQL to distribute the load on the database. We hope that in the long term DocumentDB will let us reduce the cost of setting up, monitoring and supporting servers that a self-hosted MongoDB would involve.
Also, thanks to Azure resources, we were able to relax the limits on free accounts and now allow monitoring of 100 servers, 100 websites and 100 IP addresses completely free of charge. You can try it here.
In the near future, we will update the CloudStats platform and add application monitoring along the lines of New Relic, but more on that in the next article.
Architecture of the CloudStats.me service as of July 2015