Anyone interested in high-load systems has read about the architecture of Twitter, Facebook and the like. But so far nothing has been published about systems of the same class as Diadoc. Unlike Twitter, this system is not free and open to everyone, and it contains a sizable layer of business logic that solves problems from a specific domain.
First, a few words about what the system is for. To picture it right away, imagine a web-based mail interface, although it is not really mail, or rather, not mail at all. The system is designed for exchanging documents, mainly invoices and waybills. These electronic documents are legally binding and have the same force as paper documents with seals and signatures.
The exchange of electronic documents in Russia is only beginning to develop, and in the not too distant future all invoices are likely to be transmitted electronically. Every year, 12 billion invoices are created in Russia. That is about 380 documents per second on average, and thousands of documents per second at peak load. Any project that aims to provide electronic document exchange services should plan for such volumes and build an appropriate architecture.
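The per-second figure follows directly from the yearly total. A quick check, assuming a 365-day year and a perfectly even load (the constant names are illustrative only):

```python
# Back-of-the-envelope check of the average load quoted in the text.
INVOICES_PER_YEAR = 12_000_000_000      # figure from the article
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumes a non-leap year

average_per_second = INVOICES_PER_YEAR / SECONDS_PER_YEAR
print(f"{average_per_second:.1f} documents per second")  # prints "380.5 documents per second"
```

Real traffic is concentrated in working hours and reporting periods, which is why the peak is several times higher than this average.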
For more about Diadoc from the business and accounting point of view, see the Diadoc website; the rest of this article covers the technical details.
Platform
OS: Windows, Linux
Language: C#, .NET 4.0
Message Queue: RabbitMQ
Data stores: Cassandra, MySQL, Berkeley DB, Kanso (own development)
Protocols: Thrift, Protocol Buffers
In-memory caching: Redis
MVC: ASP.NET MVC, Razor (only for admin)
Load balancing: Nginx
Architecture
The system is service-oriented (SOA). The main data format for communication between services is Google's Protocol Buffers, which lets services exchange data efficiently. The transport protocol is HTTP. The services are not hosted in IIS; they are published through a custom implementation of an HTTP handler. IIS is used only for the system's web interface.
The deployment scheme lists all the executables that exist in the system, and when deploying to the production environment it is decided which services run on which server. When a component needs to connect to a service, it picks one of the service's running replicas at random and connects to it.
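Random replica selection can be illustrated with a minimal sketch. The service names, addresses and the `pick_replica` helper below are hypothetical and are not taken from Diadoc's code; the sketch only shows the idea of choosing uniformly among the running replicas.

```python
import random

# Hypothetical deployment scheme: service name -> addresses of running replicas.
DEPLOYMENT = {
    "document-storage": ["10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"],
    "notifications": ["10.0.0.21:8080"],
}


def pick_replica(service, deployment=DEPLOYMENT, rng=random):
    """Return the address of a randomly chosen running replica of `service`."""
    replicas = deployment.get(service)
    if not replicas:
        raise LookupError(f"no running replicas for service {service!r}")
    return rng.choice(replicas)


print(pick_replica("document-storage"))
```

Uniform random choice needs no shared state between clients and spreads the load roughly evenly once there are many requests.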
Cassandra
Cassandra is used mainly for logging because of its high write speed, but recently it has also been used for other purposes, for example when a key-value pair must be stored persistently. It is not an ideal key-value store, but we have learned to work with it. We talk to Cassandra over Thrift. Thrift is an analogue of Protocol Buffers that was originally developed at Facebook and is now maintained under the Apache Software Foundation.
Kanso
Kanso is our in-house fault-tolerant, distributed data store. Functionally it resembles a file system, but with one hard restriction: a file can only be appended to. What has already been written cannot be changed. This restriction increases the volume of stored data, but guarantees that no data is lost.
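The append-only restriction can be pictured with a small in-memory model. This is only an illustration of the idea, not Kanso's actual interface; the class and method names are invented for the example.

```python
class AppendOnlyFile:
    """In-memory stand-in for a file that can only grow at the end."""

    def __init__(self):
        self._data = bytearray()

    def append(self, chunk: bytes) -> int:
        """Add `chunk` at the end and return the offset where it starts."""
        offset = len(self._data)
        self._data.extend(chunk)
        return offset

    def read(self, offset: int, length: int) -> bytes:
        """Read previously written bytes; there is no method to modify them."""
        if offset < 0 or length < 0 or offset + length > len(self._data):
            raise ValueError("read outside of written data")
        return bytes(self._data[offset:offset + length])


# An "update" is a new record appended after the old one.
f = AppendOnlyFile()
v1 = f.append(b"status=draft")
v2 = f.append(b"status=signed")
print(f.read(v1, 12), f.read(v2, 13))
```

Both versions of the record remain readable, which shows the trade-off described above: the data volume grows, but nothing that was written is ever overwritten.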
MySQL
MySQL is used only for data that does not change often. MySQL is not sharded: all changes go through a single server, and several replicas serve reads.
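A minimal sketch of this kind of routing is shown below. The host names and the `MySqlRouter` class are hypothetical, and treating every statement that starts with `SELECT` as a read is a simplification made for the example; replication lag is ignored.

```python
import random


class MySqlRouter:
    """Sends writes to the single write server and reads to a random replica."""

    def __init__(self, write_server, replicas):
        self.write_server = write_server
        self.replicas = list(replicas)

    def server_for(self, sql: str) -> str:
        """Pick the server that should execute the given SQL statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            return random.choice(self.replicas)
        # Writes, and reads when no replica is configured, go to the write server.
        return self.write_server


router = MySqlRouter("db-write", ["db-replica-1", "db-replica-2"])
print(router.server_for("UPDATE organizations SET name = 'x' WHERE id = 1"))
print(router.server_for("SELECT name FROM organizations WHERE id = 1"))
```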
RabbitMQ
This message broker has performed well and is used for asynchronous event processing. Messages have a limited lifetime and are removed from the queue after a few days. As with the HTTP services, the messages carry structures based on Protocol Buffers.
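The limited message lifetime can be modelled with a small in-memory queue. This is not RabbitMQ's API, only a sketch of the behaviour; the class name, the injectable clock and the TTL value are assumptions made for the example.

```python
import time
from collections import deque


class ExpiringQueue:
    """In-memory queue that drops messages older than `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._items = deque()  # (enqueue_time, payload), oldest first

    def publish(self, payload: bytes) -> None:
        """Store the payload together with the time it was enqueued."""
        self._items.append((self._clock(), payload))

    def consume(self):
        """Return the oldest unexpired payload, or None if nothing is left."""
        now = self._clock()
        while self._items:
            enqueued_at, payload = self._items.popleft()
            if now - enqueued_at <= self.ttl_seconds:
                return payload
            # The message outlived its lifetime: discard it and keep looking.
        return None


three_days = 3 * 24 * 60 * 60  # assumed value; the text only says "a few days"
queue = ExpiringQueue(ttl_seconds=three_days)
queue.publish(b"serialized event")
print(queue.consume())
```

A consumer that has been down for longer than the lifetime will therefore miss events, so the lifetime has to be chosen with the longest tolerable outage in mind.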
Data caching
Data caching and fast lookups are handled by Redis, and also by a whole group of .NET services that, on startup, read data from Kanso and write it to their local Berkeley DB database.
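The warm-up step of such a caching service can be sketched as follows. A plain dictionary stands in for the local Berkeley DB database, the record format is hypothetical, and the rule that a later record overrides an earlier one for the same key is an assumption of the example.

```python
def warm_up(records):
    """Build a local key-value index from an ordered stream of (key, value) records.

    Later records for the same key override earlier ones.
    """
    local_db = {}  # stand-in for the local Berkeley DB database
    for key, value in records:
        local_db[key] = value
    return local_db


# Hypothetical records as they might be read from the append-only storage.
log = [("doc-1", "draft"), ("doc-2", "sent"), ("doc-1", "signed")]
cache = warm_up(log)
print(cache["doc-1"])
```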
Integration
Protocol Buffers is also used for the public API, and it is also possible to integrate through OLE Automation. Many large companies run into problems when automating integration, and the Diadoc developers help them integrate Diadoc with their other systems. Very often data cannot be exported from external systems as XML or another machine-readable format, and we have to convert data from printed forms (PDF) into our format.
For more information about integration, see:
https://diadoc.kontur.ru/sdk/IntegrationOptions.html
https://diadoc.kontur.ru/sdk/
Principles of development
- Pair programming is used very often.
- Code review is mandatory.
- Iterations are planned in two-week cycles.
- Daily stand-up meeting of the whole team on the current state of affairs.
- Transparency of information about the state of the project, both in terms of marketing and in terms of development.
Development tools
- Visual Studio 2012
- ReSharper
- TeamCity for Continuous Integration
- YouTrack as an issue tracker
Statistics
Number of programmers: 24
Number of servers: ~40
Average document delivery time: 7 seconds
Registered organizations: ~160,000