
Developing Shrimp: controlling parallel requests, logging via spdlog, and more...



Last week, we talked about our small demo project Shrimp, which shows how the C++ libraries RESTinio and SObjectizer can be used under more or less realistic conditions. Shrimp is a small C++17 application that accepts HTTP requests for image scaling via RESTinio and serves those requests in multi-threaded mode using SObjectizer and ImageMagick++.

The project turned out to be more than useful for us. Our backlog of functionality ideas for RESTinio and SObjectizer has grown significantly, and some of them have already made it into the fresh RESTinio-0.4.7 release. So we decided not to stop at the very first, most trivial version of Shrimp, but to do another iteration or two around this project. If you are interested in what we have done during this time, read on.
As a spoiler: this article is about how we got rid of parallel processing of identical requests, how we added logging to Shrimp using the excellent spdlog library, and how we implemented a command for forcibly clearing the cache of transformed images.

v0.3: control of parallel processing of identical requests


The very first version of Shrimp, described in the previous article, contained a serious simplification: there was no control over whether an identical request was already being processed.
Imagine that Shrimp receives a request of the form "/demo.jpg?op=resize&max=1024" for the first time. There is no such image in the transformed-image cache, so the request goes into processing. Processing can take considerable time, say, a few hundred milliseconds.

Processing of that request has not yet completed when Shrimp receives the same request "/demo.jpg?op=resize&max=1024" again, but from another client. The transformation result is not yet in the cache, so this request will be processed too.

Neither the first nor the second request has completed yet, but Shrimp may receive the same request "/demo.jpg?op=resize&max=1024" once more. And this request will also be processed. As a result, the same image is scaled to the same size several times in parallel.

This is not good, so the first thing we did in Shrimp was get rid of this serious flaw. We did it with two tricky containers in the transform_manager agent. The first container is the queue of requests waiting for free transformers; it is named m_pending_requests. The second container stores requests that are already being processed (i.e., for which specific transformers have been allocated); it is named m_inprogress_requests.

When transform_manager receives another request, it checks whether the finished image is present in the transformed-image cache. If the converted image is not there, the m_inprogress_requests and m_pending_requests containers are checked. Only if neither container holds a request with the same parameters is an attempt made to put the request into the m_pending_requests queue. It looks something like this:

void a_transform_manager_t::handle_not_transformed_image(
   transform::resize_request_key_t request_key,
   sobj_shptr_t<resize_request_t> cmd )
{
   const auto store_to = [&]( auto & queue ) {
      queue.insert( std::move(request_key), std::move(cmd) );
   };

   if( m_inprogress_requests.has_key( request_key ) )
   {
      // The same request is already being processed.
      // Store this one next to it; it will be answered with the same result.
      store_to( m_inprogress_requests );
   }
   else if( m_pending_requests.has_key( request_key ) )
   {
      // The same request is already waiting in the queue.
      store_to( m_pending_requests );
   }
   else if( m_pending_requests.unique_keys() < max_pending_requests )
   {
      // There is still room for a new request in the queue.
      store_to( m_pending_requests );
      // Try to assign a free transformer to pending requests.
      try_initiate_pending_requests_processing();
   }
   else
   {
      // The queue is full, the request has to be rejected.
      do_503_response( std::move(cmd->m_http_req) );
   }
}

It was said above that m_inprogress_requests and m_pending_requests are tricky containers. But what is the trick?

The trick is that these containers combine the properties of a regular FIFO queue (which preserves the chronological order in which elements were added) and a multimap, i.e., an associative container in which several values can be associated with a single key.

Preserving chronological order is important because the oldest elements of m_pending_requests must be periodically checked, and requests whose maximum waiting time has been exceeded must be withdrawn from the queue. Efficient access by key is needed both to check for identical requests in the queues and to be able to remove all duplicate requests from a queue at once.

In Shrimp we rolled our own small container for this purpose. Had Shrimp used Boost, Boost.MultiIndex could have been used instead. And if, over time, m_pending_requests needs efficient lookup by some other criteria, Boost.MultiIndex will probably find its way into Shrimp after all.
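For illustration, here is a minimal self-contained sketch of such a container, built from std::list and std::multimap. This is not Shrimp's actual implementation; the name keyed_queue_t and its interface are our invention for this example, shaped after the has_key()/unique_keys()/insert() calls seen above:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <map>
#include <string>
#include <utility>

// Combines FIFO order (std::list, oldest element at the front) with
// multimap-style key lookup (std::multimap from key to list position).
template <typename Key, typename Value>
class keyed_queue_t {
   using fifo_t = std::list<std::pair<Key, Value>>;
   fifo_t m_fifo;
   std::multimap<Key, typename fifo_t::iterator> m_index;

public:
   void insert(Key key, Value value) {
      m_fifo.emplace_back(key, std::move(value));
      m_index.emplace(std::move(key), std::prev(m_fifo.end()));
   }

   bool has_key(const Key & key) const {
      return m_index.count(key) != 0u;
   }

   std::size_t unique_keys() const {
      std::size_t n = 0;
      // Jump from one distinct key to the next.
      for (auto it = m_index.begin(); it != m_index.end();
           it = m_index.upper_bound(it->first))
         ++n;
      return n;
   }

   // Removes every element with the given key (all duplicates at once).
   std::size_t erase_all(const Key & key) {
      const auto range = m_index.equal_range(key);
      std::size_t removed = 0;
      for (auto it = range.first; it != range.second; ++it, ++removed)
         m_fifo.erase(it->second);  // list iterators stay valid for others
      m_index.erase(range.first, range.second);
      return removed;
   }

   const Key & oldest_key() const { return m_fifo.front().first; }
   bool empty() const { return m_fifo.empty(); }
};
```

A production version would also need to walk the oldest elements for the timeout sweep described above; oldest_key() hints at where that walk would start.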

v0.4: spdlog logging


We aimed to keep the first version of Shrimp as simple and compact as possible, which is why the first version had no logging. At all.

On the one hand, this kept the code of the first version concise, containing nothing but the essential Shrimp business logic. On the other hand, the lack of logging complicates both the development of Shrimp and its operation. So as soon as we got around to it, we brought into Shrimp the excellent modern C++ logging library spdlog. Life immediately became easier, although the code of some methods grew in volume.

For example, with logging added, the handle_not_transformed_image() method shown above starts to look something like this:

void a_transform_manager_t::handle_not_transformed_image(
   transform::resize_request_key_t request_key,
   sobj_shptr_t<resize_request_t> cmd )
{
   const auto store_to = [&]( auto & queue ) {
      queue.insert( std::move(request_key), std::move(cmd) );
   };

   if( m_inprogress_requests.has_key( request_key ) )
   {
      // The same request is already being processed.
      m_logger->debug( "same request is already in progress; "
            "request_key={}", request_key );
      // Store this one next to it; it will be answered with the same result.
      store_to( m_inprogress_requests );
   }
   else if( m_pending_requests.has_key( request_key ) )
   {
      // The same request is already waiting in the queue.
      m_logger->debug( "same request is already pending; "
            "request_key={}", request_key );
      store_to( m_pending_requests );
   }
   else if( m_pending_requests.unique_keys() < max_pending_requests )
   {
      // There is still room for a new request in the queue.
      m_logger->debug( "store request to pending requests queue; "
            "request_key={}", request_key );
      store_to( m_pending_requests );
      // Try to assign a free transformer to pending requests.
      try_initiate_pending_requests_processing();
   }
   else
   {
      // The queue is full, the request has to be rejected.
      m_logger->warn( "request is rejected because of overloading; "
            "request_key={}", request_key );
      do_503_response( std::move(cmd->m_http_req) );
   }
}

Configuring spdlog loggers


Shrimp logs to the console (i.e., to standard output). In principle, we could have taken a very simple route and created a single spdlog logger in Shrimp, i.e., called stdout_color_mt (or stdout_logger_mt) and passed that logger to all Shrimp entities. But we went a slightly more complicated way: we manually created a so-called sink (the channel where spdlog sends the generated messages), and separate loggers bound to this sink are created for the Shrimp entities.

// Create the sink shared by all loggers.
[[nodiscard]] spdlog::sink_ptr
make_logger_sink()
{
   auto sink = std::make_shared< spdlog::sinks::ansicolor_stdout_sink_mt >();
   return sink;
}

[[nodiscard]] std::shared_ptr<spdlog::logger>
make_logger(
   const std::string & name,
   spdlog::sink_ptr sink,
   spdlog::level::level_enum level = spdlog::level::trace )
{
   auto logger = std::make_shared< spdlog::logger >( name, std::move(sink) );
   logger->set_level( level );
   logger->flush_on( level );
   return logger;
}

// This is how individual loggers bound to the common sink are created:
auto manager = coop.make_agent_with_binder< a_transform_manager_t >(
      create_one_thread_disp( "manager" )->binder(),
      make_logger( "manager", logger_sink ) );
...
const auto worker_name = fmt::format( "worker_{}", worker );
auto transformer = coop.make_agent_with_binder< a_transformer_t >(
      create_one_thread_disp( worker_name )->binder(),
      make_logger( worker_name, logger_sink ),
      app_params.m_storage );

There is a subtle point in configuring loggers in spdlog: by default, a logger ignores messages with trace and debug severity levels, and those are exactly the most useful ones when debugging. Therefore, in make_logger we activate logging for all levels by default, including trace/debug.

Since every entity in Shrimp has its own logger with its own name, the log shows which entity does what.



Tracing SObjectizer via spdlog


From time to time, the logging performed as part of the main business logic of a SObjectizer application is not enough to debug it. Sometimes it is unclear why some action is initiated in one agent but never actually performed in another. In such cases, the msg_tracing mechanism built into SObjectizer (which we described in a separate article) helps a lot. But among the standard msg_tracing implementations for SObjectizer there is none that uses spdlog. For Shrimp, we make this implementation ourselves:

class spdlog_sobj_tracer_t : public so_5::msg_tracing::tracer_t
{
   std::shared_ptr<spdlog::logger> m_logger;

public:
   spdlog_sobj_tracer_t(
      std::shared_ptr<spdlog::logger> logger )
      : m_logger{ std::move(logger) }
   {}

   virtual void trace( const std::string & what ) noexcept override
   {
      m_logger->trace( what );
   }

   [[nodiscard]] static so_5::msg_tracing::tracer_unique_ptr_t
   make( spdlog::sink_ptr sink )
   {
      return std::make_unique<spdlog_sobj_tracer_t>(
            make_logger( "sobjectizer", std::move(sink) ) );
   }
};

Here we see an implementation of the special SObjectizer interface tracer_t, whose main part is the virtual trace() method. It is this method that traces the SObjectizer internals via spdlog.

Then this implementation is installed as the tracer when SObjectizer is started:

so_5::wrapped_env_t sobj{
   [&]( so_5::environment_t & env ) {...},
   [&]( so_5::environment_params_t & params ) {
      if( sobj_tracing_t::on == sobj_tracing )
         params.message_delivery_tracer(
               spdlog_sobj_tracer_t::make( logger_sink ) );
   }
};

RESTinio tracing via spdlog


In addition to tracing what happens inside SObjectizer, it is at times very useful to trace what happens inside RESTinio. Such tracing has also been added to the updated version of Shrimp.

This tracing is implemented by defining a special class that performs logging for RESTinio:

class http_server_logger_t
{
public:
   http_server_logger_t( std::shared_ptr<spdlog::logger> logger )
      : m_logger{ std::move( logger ) }
   {}

   template< typename Builder >
   void trace( Builder && msg_builder )
   {
      log_if_enabled( spdlog::level::trace, std::forward<Builder>(msg_builder) );
   }

   template< typename Builder >
   void info( Builder && msg_builder )
   {
      log_if_enabled( spdlog::level::info, std::forward<Builder>(msg_builder) );
   }

   template< typename Builder >
   void warn( Builder && msg_builder )
   {
      log_if_enabled( spdlog::level::warn, std::forward<Builder>(msg_builder) );
   }

   template< typename Builder >
   void error( Builder && msg_builder )
   {
      log_if_enabled( spdlog::level::err, std::forward<Builder>(msg_builder) );
   }

private:
   template< typename Builder >
   void log_if_enabled( spdlog::level::level_enum lv, Builder && msg_builder )
   {
      if( m_logger->should_log(lv) )
      {
         m_logger->log( lv, msg_builder() );
      }
   }

   std::shared_ptr<spdlog::logger> m_logger;
};

This class does not inherit from anything, because the logging mechanism in RESTinio is based on generic programming rather than the traditional object-oriented approach. This makes it possible to completely avoid any overhead when logging is not needed at all (we covered this topic in more detail when we talked about the use of templates in RESTinio).
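To make the duck-typing point concrete, here is a small self-contained sketch. The names tiny_server_t, null_logger_t, and collecting_logger_t are illustrative, not RESTinio's; the point is that any type with the right method set fits the template parameter, and a no-op logger means the message-builder lambdas are never even invoked:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A logger whose methods do nothing: the msg_builder callable is never
// invoked, so message formatting costs nothing when logging is disabled.
struct null_logger_t {
   template <typename Builder> void trace(Builder &&) {}
   template <typename Builder> void error(Builder &&) {}
};

// A logger that records formatted messages, handy for tests.
struct collecting_logger_t {
   std::vector<std::string> messages;
   template <typename Builder> void trace(Builder && msg_builder) {
      messages.push_back(msg_builder());
   }
   template <typename Builder> void error(Builder && msg_builder) {
      messages.push_back(msg_builder());
   }
};

// A toy "server": the logger type is a template parameter, duck-typed,
// with no common base class required.
template <typename Logger>
class tiny_server_t {
   Logger & m_logger;
public:
   explicit tiny_server_t(Logger & logger) : m_logger(logger) {}
   void handle(const std::string & path) {
      // The string concatenation happens only if the logger calls the lambda.
      m_logger.trace([&] { return "handling " + path; });
   }
};
```

With null_logger_t the compiler can inline the empty trace() and drop the lambda entirely, which is exactly the zero-overhead property the paragraph above describes.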

Next, we need to tell the HTTP server to use the http_server_logger_t class shown above as its logger. This is done by refining the HTTP server's traits:

struct http_server_traits_t : public restinio::default_traits_t
{
   using logger_t = http_server_logger_t;
   using request_handler_t = http_req_router_t;
};

Well, then all that remains is to create a specific spdlog logger instance and hand it to the HTTP server being created:

auto restinio_logger = make_logger(
      "restinio",
      logger_sink,
      restinio_tracing_t::off == restinio_tracing ?
            spdlog::level::off : log_level );

restinio::run(
      asio_io_ctx,
      shrimp::make_http_server_settings(
            thread_count.m_io_threads,
            params,
            std::move(restinio_logger),
            manager_mbox_promise.get_future().get() ) );

v0.5: forced reset of the transformed images cache


While debugging Shrimp, one small but annoying thing came to light: in order to clear the contents of the transformed-image cache, you had to restart Shrimp entirely. A trifle, it would seem, but an unpleasant one.

Since it is unpleasant, the shortcoming should be removed. Fortunately, that is not difficult.

First, we define another URL in Shrimp that accepts HTTP DELETE requests: "/cache". Accordingly, we hang a handler on this URL:

std::unique_ptr< http_req_router_t >
make_router(
   const app_params_t & params,
   so_5::mbox_t req_handler_mbox )
{
   auto router = std::make_unique< http_req_router_t >();

   add_transform_op_handler( params, *router, req_handler_mbox );
   add_delete_cache_handler( *router, req_handler_mbox );

   return router;
}

where the add_delete_cache_handler () function looks like this:

void
add_delete_cache_handler(
   http_req_router_t & router,
   so_5::mbox_t req_handler_mbox )
{
   router.http_delete(
         "/cache",
         [req_handler_mbox]( auto req, auto /*params*/ )
         {
            const auto qp = restinio::parse_query( req->header().query() );
            auto token = qp.get_param( "token"sv );
            if( !token )
            {
               return do_403_response( req, "No token provided\r\n" );
            }

            // Delegate request processing to transform_manager.
            so_5::send<
                  so_5::mutable_msg<
                        a_transform_manager_t::delete_cache_request_t> >(
               req_handler_mbox,
               req,
               restinio::cast_to<std::string>(*token) );

            return restinio::request_accepted();
         } );
}

A bit wordy, but nothing complicated. The request's query string must contain the token parameter, holding a special administrative token value. The cache can be cleared only if the token value from the request matches the one specified when Shrimp was started. If the token parameter is absent, the request is not accepted for processing. If the token is present, a special command message is sent to the transform_manager agent, which owns the cache; the transform_manager agent itself will then respond to the HTTP request.
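As an aside, the extraction of the token parameter can be illustrated with a hand-rolled sketch. Shrimp itself relies on restinio::parse_query; extract_token below is a hypothetical helper written only for this example, without percent-decoding:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Scans a raw query string like "token=secret&x=1" for the `token`
// parameter. Returns nullopt when the parameter is absent.
std::optional<std::string> extract_token(const std::string & query) {
   std::size_t pos = 0;
   while (pos < query.size()) {
      auto amp = query.find('&', pos);            // end of this key=value pair
      if (amp == std::string::npos) amp = query.size();
      const auto eq = query.find('=', pos);       // split into key and value
      if (eq != std::string::npos && eq < amp) {
         if (query.substr(pos, eq - pos) == "token")
            return query.substr(eq + 1, amp - eq - 1);
      }
      pos = amp + 1;
   }
   return std::nullopt;
}
```

The "no token provided" branch of the handler corresponds to the nullopt case here.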

Secondly, we implement the handler for the new delete_cache_request_t message in the transform_manager_t agent:

void
a_transform_manager_t::on_delete_cache_request(
   mutable_mhood_t<delete_cache_request_t> cmd )
{
   m_logger->warn( "delete cache request received; "
         "connection_id={}, token={}",
         cmd->m_http_req->connection_id(),
         cmd->m_token );

   const auto delay_response = [&]( std::string response_text ) {
      so_5::send_delayed< so_5::mutable_msg<negative_delete_cache_response_t> >(
            *this,
            std::chrono::seconds{7},
            std::move(cmd->m_http_req),
            std::move(response_text) );
   };

   if( const char * env_token = std::getenv( "SHRIMP_ADMIN_TOKEN" );
         // Token must be present and must not be empty.
         env_token && *env_token )
   {
      if( cmd->m_token == env_token )
      {
         m_transformed_cache.clear();
         m_logger->info( "cache deleted" );

         do_200_plaintext_response(
               std::move(cmd->m_http_req),
               "Cache deleted\r\n" );
      }
      else
      {
         m_logger->error( "invalid token value for delete cache request; "
               "token={}",
               cmd->m_token );

         delay_response( "Token value mismatch\r\n" );
      }
   }
   else
   {
      m_logger->warn( "delete cache can't be performed because there is no "
            "admin token defined" );

      // Operation can't be performed because admin token is not available.
      delay_response( "No admin token defined\r\n" );
   }
}

There are two points to clarify.

The first point in the implementation of on_delete_cache_request() is the verification of the token value. The administrative token is set via the SHRIMP_ADMIN_TOKEN environment variable. If this variable is set and its value matches the token parameter of the HTTP DELETE request, the cache is cleared and a positive response is generated immediately.
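This check can be sketched in isolation. check_admin_token and check_result_t are hypothetical names introduced for this example; Shrimp performs the same logic inline inside on_delete_cache_request():

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

enum class check_result_t { ok, mismatch, no_admin_token };

// Reads the admin token from the named environment variable and compares
// it with the value supplied in the request.
check_result_t
check_admin_token(const char * env_name, const std::string & supplied) {
   const char * env_token = std::getenv(env_name);
   // The token must be present and must not be empty.
   if (!(env_token && *env_token))
      return check_result_t::no_admin_token;
   return supplied == env_token ? check_result_t::ok
                                : check_result_t::mismatch;
}
```

Note that an unset variable and an empty one are treated the same way: with no admin token defined, the cache simply cannot be cleared remotely.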

And the second point in the implementation of on_delete_cache_request() is the deliberately delayed negative response to the HTTP DELETE request. If a wrong administrative token value arrives, the response should be delayed in order to discourage brute-forcing the token value. But how do we implement this delay? After all, calling std::this_thread::sleep_for() is not an option.

This is where SObjectizer's delayed messages come to the rescue. Instead of immediately forming a negative response inside on_delete_cache_request(), the transform_manager agent simply sends itself a delayed negative_delete_cache_response_t message. The SObjectizer timer counts down the allotted time and delivers this message to the agent after the specified pause. In the negative_delete_cache_response_t handler, the response to the HTTP DELETE request can then be formed right away:

void
a_transform_manager_t::on_negative_delete_cache_response(
   mutable_mhood_t<negative_delete_cache_response_t> cmd )
{
   m_logger->debug( "send negative response to delete cache request; "
         "connection_id={}",
         cmd->m_http_req->connection_id() );

   do_403_response(
         std::move(cmd->m_http_req),
         std::move(cmd->m_response_text) );
}
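The delayed-delivery idea itself can be sketched without SObjectizer as a queue of messages stamped with a due time. delayed_queue_t is a deterministic toy model written for this example (time is an abstract tick counter), not SObjectizer's actual timer mechanism:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Messages wait in the queue until their due tick; deliver_due() hands
// out everything whose delay has elapsed, just as a timer thread would
// deliver a delayed message back to the agent.
class delayed_queue_t {
   struct item_t { long due_tick; std::string msg; };
   std::vector<item_t> m_pending;

public:
   void send_delayed(long now_tick, long delay_ticks, std::string msg) {
      m_pending.push_back({ now_tick + delay_ticks, std::move(msg) });
   }

   std::vector<std::string> deliver_due(long now_tick) {
      std::vector<std::string> out;
      std::vector<item_t> keep;
      for (auto & it : m_pending) {
         if (it.due_tick <= now_tick)
            out.push_back(std::move(it.msg));   // delay elapsed: deliver
         else
            keep.push_back(std::move(it));      // still waiting
      }
      m_pending = std::move(keep);
      return out;
   }
};
```

The handler thread never blocks: it only enqueues the response and returns, and the "Token value mismatch" answer surfaces seven (logical) seconds later, matching the std::chrono::seconds{7} delay in the real code.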

That is, the following scenario is obtained: a request with a wrong token triggers a delayed message to the agent itself, and only after the pause does the client receive the 403 response.


The end of the second part


At the end of the second part it is quite natural to ask the question: “What is next?”

Next, there will probably be another iteration and another update of our demo project. We want to add something like converting an image from one format to another: say, the picture is stored on the server as jpg, and after transformation it is sent to the client as webp.

It would also be interesting to attach a separate "page" displaying current statistics on Shrimp's operation. First of all, it is simply curious. But in principle such a page could be adapted for monitoring Shrimp's health.

If anyone has other wishes about what they would like to see in Shrimp, or in articles around Shrimp, we would be happy to hear any constructive suggestions.

Separately, we would like to note one aspect of Shrimp's implementation that somewhat surprised us: the active use of mutable messages when agents communicate with each other and with the HTTP server. In our practice it is usually the other way around; data is more often exchanged via immutable messages. Not so here. This suggests that we did the right thing when, in due time, we listened to users' wishes and added mutable messages to SObjectizer. So if you would like to see something in RESTinio or SObjectizer, feel free to share your ideas. We always listen to good ones.

Finally, we want to thank everyone who took the time to comment on the first version of Shrimp, both here on Habr and through other channels. Thank you!

To be continued...

Source: https://habr.com/ru/post/417527/

