
DevOps Zoo, or how 500px serves more than 500TB of images

From the translator: I chose this article to translate as a vivid example of a growing Western startup with the traits typical of the breed: lots of new technologies, heavy use of third-party services, and experiments with architecture. The article touches on some particularly interesting topics: building a platform out of microservices, DevOps, and a phenomenon barely covered on Habré called ChatOps. Enjoy!


About 500px


500px is an online community built around photography. Millions of users from all over the world browse, share, sell and buy the most beautiful photos. We value design, simplicity of code, and responsibility.
I'm a DevOps engineer. At 500px I work on the platform: the backend, monitoring, configuration management, automation and, of course, deployment.


Development


At 500px, the development team is divided into four groups: Web, Mobile, Quality Assurance, and of course our Platform team, which is responsible for the platform and architecture: API and backend design, as well as infrastructure management in general.
Our teams are quite cross-functional and the boundaries between them are blurred, so developers move freely from one team to another to work with whatever technologies interest them. This makes it easy to spread knowledge around the company and prevents stagnation. There is also a very close relationship between the development and design teams, which keeps us flexible, honest and focused on the things that really matter.



Architecture


The 500px architecture can be described as a huge Ruby on Rails monolith surrounded by a whole constellation of microservices. Rails is responsible for the website and the API, which in turn serves the mobile applications and all of our API clients.

The microservices provide various pieces of functionality to the main monolith application and also handle some API requests directly.

The Rails side is a fairly standard monolith: the application and the API serve requests with Unicorn, fronted by Nginx. We have whole clusters of these Rails servers sitting behind HAProxy or LVS load balancers. The main data stores behind the Rails application are MySQL, MongoDB, Redis and Memcached. We also have a bunch of Sidekiq servers for heavy background jobs. At the moment, all of this is hosted on our own hardware in a data center.

The microservices are more interesting. At the moment we have about ten of them, each focused on a separate, independent piece of the platform's business logic; several of them are described below.


The microservices run both on Amazon EC2 and on our own hardware in the data center. They are mostly written in Go, with a few exceptions in NodeJS or Sinatra. Regardless of the language, we try to build our microservices as good 12-factor apps, which keeps deployment and configuration management simple. All of these services sit behind either HAProxy or AWS Elastic Load Balancers.
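
To make "12-factor" concrete, here is a minimal Go sketch of the config-in-the-environment idea; the variable names and endpoint are illustrative, not 500px's actual configuration:

package main

import (
	"log"
	"net/http"
	"os"
)

// envOr reads a setting from the environment with a fallback, so the
// same binary runs unchanged in development, staging and production.
func envOr(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	addr := ":" + envOr("PORT", "8080")
	dbURL := envOr("DATABASE_URL", "mysql://localhost:3306/app") // hypothetical default

	log.Printf("listening on %s, db=%s", addr, dbURL)
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(addr, nil))
}

Because all state lives in the environment and in backing services, deploying or moving such a service is mostly a matter of pointing a load balancer at it.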



Using microservices helps a lot, because moving logic out of the main application keeps complexity down. All the front-end teams using these services need to know is the API, and if something has to change inside one of the components, it is easy to do. For example, it is very simple to use the search microservice without knowing anything about Elasticsearch. This flexibility has proven its worth as we grow the platform, because it lets us try new technologies in a safe, isolated way. If you are interested in microservices, former 500px developer Paul Osman spoke at QConSF last year about our experience of migrating from a large monolith to microservices (translator's note: very interesting, I recommend watching it).

Image processing


Perhaps the most interesting of the microservices we run at 500px are the ones that process and serve images. Every month we ingest millions of high-quality photos from our community and push terabytes of image traffic through our main CDN, Edgecast. Last month we served about 569TB of traffic, with a 95th-percentile bandwidth of about 2308Mbps. People really do like looking at beautiful pictures!
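
(A quick sanity check on those numbers: 569TB over a 30-day month is 569 × 10^12 bytes × 8 bits / 2,592,000 seconds ≈ 1.76Gbps of average bandwidth, so a 95th percentile of roughly 2.3Gbps sits plausibly above that average.)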

500px's hometown, Toronto

To store and serve this graphic content, there are three groups of microservices in EC2, all built around S3, where we keep all our pictures. They are all written in Go, and we have really come to like Go for this kind of work: it lets us write small but very fast concurrent services, which means we can run them on fewer machines and keep the hosting bill under control.

Visitors meet the first microservice when they upload photos; we call it the Media Service. The Media Service is quite simple: it accepts the upload, saves it to S3, and then adds a task to a RabbitMQ queue for further processing.
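
A minimal Go sketch of what such an upload service could look like; the bucket, queue and field names are illustrative assumptions (the article does not show 500px's code), and we assume the AWS SDK for Go and the streadway/amqp client:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/streadway/amqp"
)

const (
	bucket = "photos-originals" // hypothetical bucket name
	queue  = "photo.process"    // hypothetical queue name, assumed declared elsewhere
)

func main() {
	s3c := s3.New(session.Must(session.NewSession()))

	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/upload", func(w http.ResponseWriter, r *http.Request) {
		file, hdr, err := r.FormFile("photo")
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		defer file.Close()

		buf, err := ioutil.ReadAll(file)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		key := fmt.Sprintf("originals/%s", hdr.Filename)

		// 1. Save the original to S3.
		if _, err := s3c.PutObject(&s3.PutObjectInput{
			Bucket: aws.String(bucket),
			Key:    aws.String(key),
			Body:   bytes.NewReader(buf),
		}); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		// 2. Enqueue a processing job for the converter service.
		job, _ := json.Marshal(map[string]string{"key": key})
		if err := ch.Publish("", queue, false, false, amqp.Publishing{
			ContentType: "application/json",
			Body:        job,
		}); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusAccepted)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}

Returning 202 Accepted as soon as the original is safely in S3 and the job is queued keeps uploads fast; all the heavy work happens asynchronously.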

Next, the second microservice, called the Converter, picks tasks up from RabbitMQ. The Converter downloads the original image from S3, runs it through a series of processing steps to generate thumbnails of various sizes, and saves them back to S3. We use these thumbnails all over the place: on the site and in the mobile applications.
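
The consuming side could look like the following sketch, under the same assumptions as above; the thumbnail widths and key layout are likewise illustrative:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"image"
	"image/jpeg"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/streadway/amqp"
	xdraw "golang.org/x/image/draw"
)

var thumbWidths = []int{256, 1024} // hypothetical thumbnail sizes

func main() {
	s3c := s3.New(session.Must(session.NewSession()))

	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}
	jobs, err := ch.Consume("photo.process", "", false, false, false, false, nil)
	if err != nil {
		log.Fatal(err)
	}

	for d := range jobs {
		var job struct {
			Key string `json:"key"`
		}
		if err := json.Unmarshal(d.Body, &job); err != nil {
			d.Nack(false, false) // drop malformed jobs
			continue
		}

		// Fetch and decode the original.
		obj, err := s3c.GetObject(&s3.GetObjectInput{
			Bucket: aws.String("photos-originals"),
			Key:    aws.String(job.Key),
		})
		if err != nil {
			d.Nack(false, true) // requeue on transient failure
			continue
		}
		src, _, err := image.Decode(obj.Body)
		obj.Body.Close()
		if err != nil {
			d.Nack(false, false)
			continue
		}

		// Generate and upload one thumbnail per configured width,
		// preserving the aspect ratio. Error handling is elided.
		for _, width := range thumbWidths {
			h := src.Bounds().Dy() * width / src.Bounds().Dx()
			dst := image.NewRGBA(image.Rect(0, 0, width, h))
			xdraw.CatmullRom.Scale(dst, dst.Bounds(), src, src.Bounds(), xdraw.Over, nil)

			var buf bytes.Buffer
			jpeg.Encode(&buf, dst, &jpeg.Options{Quality: 85})
			s3c.PutObject(&s3.PutObjectInput{
				Bucket: aws.String("photos-thumbs"),
				Key:    aws.String(fmt.Sprintf("%d/%s", width, job.Key)),
				Body:   bytes.NewReader(buf.Bytes()),
			})
		}
		d.Ack(false)
	}
}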

None of this is probably surprising for a photo-sharing service, and for a while these two microservices covered all our needs: we simply made the S3 bucket of thumbnails the origin for our CDN. As the whole system grew, however, this approach turned out to be not only expensive and inefficient in its use of disk space, but also not flexible enough whenever a new product needed different image sizes.

To solve this problem we recently built the Image Generation Service (yes, we tend to pick literal names for these things). The new service sits behind the CDN and dynamically generates an image of any size or format from the S3 original, on the fly. It can also apply watermarks or the photographer's branding, which our community particularly appreciates.

The Image Generation Service is under quite a load: the cluster handles about 1000 requests per second at peak hours. Dynamic regeneration and watermarking are resource-intensive, and keeping response times reasonable under that load is not easy. We worked hard on the problem, and at peak traffic we manage to keep the 95th-percentile response time below 180ms. We got there thanks to VIPS, a cool and very fast image-processing library, aggressive caching, and some frankly crazy optimization. Outside rush hour, typical image response times are below 150ms.
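
For a feel of the approach, here is a rough sketch of a dynamic-resize endpoint using bimg, a Go binding for libvips; the article only names VIPS, so the binding, the URL scheme and the cache policy are our assumptions:

package main

import (
	"io/ioutil"
	"log"
	"net/http"
	"strconv"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/h2non/bimg"
)

func main() {
	s3c := s3.New(session.Must(session.NewSession()))

	// GET /img/{key}?w=800 -> resized JPEG, cacheable by the CDN.
	http.HandleFunc("/img/", func(w http.ResponseWriter, r *http.Request) {
		key := r.URL.Path[len("/img/"):]
		width, err := strconv.Atoi(r.URL.Query().Get("w"))
		if err != nil || width <= 0 || width > 4096 {
			http.Error(w, "bad width", http.StatusBadRequest)
			return
		}

		obj, err := s3c.GetObject(&s3.GetObjectInput{
			Bucket: aws.String("photos-originals"), // hypothetical bucket
			Key:    aws.String(key),
		})
		if err != nil {
			http.NotFound(w, r)
			return
		}
		orig, err := ioutil.ReadAll(obj.Body)
		obj.Body.Close()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		// libvips does the heavy lifting; with only Width set, bimg
		// preserves the aspect ratio.
		out, err := bimg.NewImage(orig).Process(bimg.Options{
			Width: width,
			Type:  bimg.JPEG,
		})
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		// A long Cache-Control lets the CDN absorb repeat requests, so
		// the service only pays the resize cost roughly once per size.
		w.Header().Set("Cache-Control", "public, max-age=31536000")
		w.Header().Set("Content-Type", "image/jpeg")
		w.Write(out)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}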

And we are not resting on our laurels! There are almost certainly many more optimizations to be found, and we hope to keep driving image response times down.

Workflow


We use GitHub and practice continuous integration ( CI ) for all of our main repositories.

For the Rails monolith we use Semaphore and Code Climate, a standard rspec setup for unit testing, and a small number of Capybara/Selenium tests for integration testing. Our 500px colleague and all-around cool guy Devon Noel de Tilly has already described in detail how we use these tools, so I won't dwell on them here.

For our Go microservices we use Travis CI to run tests and build Debian packages. After a build, Travis uploads the packages to a staging S3 repository, from which another microservice downloads, signs and imports them into our own apt repository. We use FPM to build the packages and Aptly to manage the repositories. I recently tried packagecloud.io for these tasks and really liked it, so we may well switch to it in the near future.

For deployment we use a whole family of tools. At the lowest level there are Ansible and Capistrano, with Chef for configuration management. At a higher level, we at 500px have really taken to the practice of ChatOps, so we wrapped all of these tools in scripts for our trusty Hubot bot, which we named BMO.


Anyone at 500px can deploy the site or a microservice with a single chat message:

bmo deploy <microservice name> 

BMO comes along, deploys whatever was asked for, and posts the log back into the chat! This simple, convenient mechanism has worked wonders for the visibility of the process and has removed a lot of the friction around application deployment. Since we talk to BMO in Slack, if you need to find a particular log or have forgotten a command, you just search the chat history. Magic!

Other important tools


We monitor everything with New Relic, Datadog, ELK (Elasticsearch, Logstash, Kibana) and good old Nagios. We send all our mail with Mandrill and Mailchimp, and all payments are processed by Stripe and PayPal. Amazon Elastic MapReduce, Amazon Redshift and Periscope.io help us make decisions (translator's note: big data and analytics). We use Slack, Asana, Everhour and Google Apps to communicate and keep the teams in sync. And when something goes wrong, we have PagerDuty and Statuspage.io to keep our users informed.

What's next?


Right now I am experimenting with running our microservice constellation in Docker containers as a personal development environment (docker-compose up), with an eye to using them in production down the road. We have a CI pipeline set up with Travis and Docker Hub, and I really like the potential of cloud container services such as Joyent Triton and Amazon ECS. As we build more and more microservices and expand our architecture, we are also looking at distributed-systems tools like Consul and Apache Mesos, all of which should let us grow better and faster!

Source: https://habr.com/ru/post/258751/

