
How to write a good solution for Highload Cup, but not good enough to get to the top

Last week the HighLoad Cup competition came to an end; the idea was to implement an HTTP server for a travelers' website. Read on, under the cut, to see how a Go solution written in 5 days earned 52nd place out of 295 in the absolute standings.


Task Description


User Afinogen has already described the competition rules in his article, but for convenience I will briefly repeat them. The server must implement the API of a travelers' website and support three kinds of entities: the traveler (User), the landmark (Location) and the visit (Visit). The API must make it possible to add new entities of any kind to the database, fetch them and update them, and also perform two operations on them: computing the average rating of a landmark (avg) and listing the places visited by a traveler (visits). Each of these operations comes with a set of filters that must be taken into account when building the answer. For example, the API allows you to get the average rating of a landmark among men aged 18 to 25, counting only visits starting from April 1, 2010. This adds extra difficulty to the implementation.


Just in case, here is a brief formal description of the API:
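
Roughly, the routes look like this (reconstructed from the description above; the exact paths and filter names follow the contest statement and should be treated as assumptions rather than a verbatim copy):

GET  /users/<id>, /locations/<id>, /visits/<id>  - fetch an entity by id
POST /users/<id>, /locations/<id>, /visits/<id>  - update an existing entity
POST /users/new, /locations/new, /visits/new     - create a new entity
GET  /users/<id>/visits   - places visited by a traveler
                            (filters: fromDate, toDate, country, toDistance)
GET  /locations/<id>/avg  - average rating of a landmark
                            (filters: fromDate, toDate, fromAge, toAge, gender)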



How to store data


The very first question that arises for anyone familiar with the problem statement is how to store the data. Many (myself included) first tried to use some kind of database, expecting a volume of data that would not fit in memory and not wanting to pile up workarounds. This approach, however, did not lead to a high place in the ranking. The thing is, the organizers were quite democratic about the data size, so even without any optimization of the storage layout all of the data easily fit into memory. In total, the rating run involved about 1 million travelers, 100 thousand landmarks and 10 million visits, and the identifiers of each entity were numbered sequentially from 1. At first glance the volume may seem large, but if you look at the sizes of the structures that can be used to store the data, and at the lengths of the strings inside them, you can see that on average they are not that big. The structures and the sizes of their fields are listed below:


type Visit struct { // overall 40 bytes
	Id        int // 8 bytes
	Location  int // 8 bytes
	User      int // 8 bytes
	VisitedAt int // 8 bytes
	Mark      int // 8 bytes
}

type User struct { // overall 133 bytes
	Id        int    // 8 bytes
	Email     string // 22 bytes + 16 bytes
	FirstName string // 12 bytes + 16 bytes
	LastName  string // 18 bytes + 16 bytes
	Gender    string // 1 byte + 16 bytes
	Birthdate int    // 8 bytes
}

type Location struct { // overall 105 bytes
	Id       int    // 8 bytes
	Place    string // 11 bytes + 16 bytes
	Country  string // 14 bytes + 16 bytes
	City     string // 16 bytes + 16 bytes
	Distance int    // 8 bytes
}

String sizes in the structures are given in the form "average string length" + "size of the string header". Multiplying the average size of each structure by the number of objects, we need only about 518 MB to store the data. That is not much at all, given that we have a whole 4 GB at our disposal.
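
As a quick sanity check of that figure, using the average sizes listed above:

10,000,000 visits    x  40 B ~ 400.0 MB
 1,000,000 users     x 133 B ~ 133.0 MB
   100,000 locations x 105 B ~  10.5 MB
                       total ~ 543.5 MB, i.e. about 518 MiB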


The largest memory consumption, strange as it may seem at first glance, happens not during the actual load testing but at the data loading stage. The thing is, initially all the data is packed into a .zip archive, and in the first 10 minutes the server has to load this data from the archive to work with it afterwards. The packed archive weighs 200 MB, and the files weigh 1.5 GB after unpacking. Without careful handling of the data and without more aggressive garbage collector settings it was impossible to load everything.
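
As an illustration, here is a minimal sketch of the loading stage, assuming the contest's usual data layout (files named users_*.json inside data.zip, each holding an object like {"users": [...]}); the function and path names are mine, not taken from the actual solution:

package main

import (
	"archive/zip"
	"encoding/json"
	"log"
	"runtime/debug"
	"strings"
)

// User is trimmed to the fields needed for the sketch; the full structures
// are shown earlier in the article.
type User struct {
	Id        int    `json:"id"`
	Email     string `json:"email"`
	FirstName string `json:"first_name"`
	LastName  string `json:"last_name"`
	Gender    string `json:"gender"`
	Birthdate int    `json:"birth_date"`
}

// loadUsers streams the user files straight out of the archive instead of
// unpacking 1.5 GB of JSON to disk first.
func loadUsers(archivePath string) ([]*User, error) {
	// Collect decode garbage aggressively while loading; the collector can be
	// relaxed (or disabled) again once the data is in place.
	old := debug.SetGCPercent(20)
	defer debug.SetGCPercent(old)

	r, err := zip.OpenReader(archivePath)
	if err != nil {
		return nil, err
	}
	defer r.Close()

	var users []*User
	for _, f := range r.File {
		if !strings.Contains(f.Name, "users_") {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		var chunk struct {
			Users []*User `json:"users"`
		}
		err = json.NewDecoder(rc).Decode(&chunk)
		rc.Close()
		if err != nil {
			return nil, err
		}
		users = append(users, chunk.Users...)
	}
	return users, nil
}

func main() {
	users, err := loadUsers("/tmp/data/data.zip")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("loaded %d users", len(users))
}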


The second point, which was very important but which not everyone noticed right away, is that the load testing was organized so that race conditions could not occur in principle. The server was tested in 3 stages: in the first stage GET requests arrived that fetched objects and called the avg (average rating) and visits (user's visits) methods; in the second stage the data was updated (only POST requests for creating and updating data); and in the last stage GET requests were sent again, this time against the new data and with a higher load. Since GET and POST requests were strictly separated, there was no need for any thread synchronization primitives.


So, taking these two points into account, and remembering that the ids of each entity's objects were numbered sequentially from 1, all the data can be stored as follows:


type Database struct {
	usersArray     []*User
	locationsArray []*Location
	visitsArray    []*Visit

	usersMap     map[int]*User
	locationsMap map[int]*Location
	visitsMap    map[int]*Visit
}

Every object whose id fits into the array is placed into the array of the corresponding type. If an object's id were larger than the array size, it would go into the corresponding map. Now, knowing that the data did not change at all in the final, I understand that the maps were superfluous, but back then there was no guarantee of that.
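
To make the lookup rule concrete, here is a minimal sketch of the write and read paths over such a hybrid store (the method names AddUser and GetUser are illustrative, not necessarily those of the actual solution; Database and User are the types shown above):

// AddUser puts the object into the preallocated slice when its id fits,
// and falls back to the map otherwise.
func (db *Database) AddUser(u *User) {
	if u.Id >= 0 && u.Id < len(db.usersArray) {
		db.usersArray[u.Id] = u
		return
	}
	db.usersMap[u.Id] = u
}

// GetUser mirrors the same rule on the read path; nil means "not found".
func (db *Database) GetUser(id int) *User {
	if id >= 0 && id < len(db.usersArray) {
		return db.usersArray[id]
	}
	return db.usersMap[id]
}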


Indices


Pretty soon it became clear that to make the avg and visits methods fast, it is necessary to store not only the User and Location structures themselves, but also, alongside them, the visits of that user and of that landmark respectively. So I added a Visits field, an ordinary slice, to these two structures, which made it possible to quickly find all Visit structures associated with a given user / landmark.
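
A sketch of what that looks like; the field is called Visits as in the text, while the helper below and GetLocation (assumed to mirror GetUser from the sketch above) are my own illustrative names:

// Each entity carries its own visit index.
type User struct {
	// ...the fields listed earlier...
	Visits []*Visit // all visits made by this traveler
}

type Location struct {
	// ...the fields listed earlier...
	Visits []*Visit // all visits of this landmark
}

// When a Visit is added, both per-entity indexes are updated, so avg and
// visits queries never have to scan the full set of 10 million visits.
func (db *Database) indexVisit(v *Visit) {
	u := db.GetUser(v.User)
	l := db.GetLocation(v.Location)
	u.Visits = append(u.Visits, v)
	l.Visits = append(l.Visits, v)
}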


While experimenting I also thought about using container/list from the standard library, but knowing how this container works inside told me that it would always lose, both in element access speed and in memory. Its only advantage, fast insertion and removal at an arbitrary position, is not very important for this task: since the ratio of visits to users is about 10 to 1, we can assume that the Visits containers inside the Location and User structures will hold roughly 10 elements each. Removing an element from the beginning of a 10-element array is not too expensive in the overall picture, and not a frequent operation either. As for memory, its consumption is illustrated by the following code:


package main

import (
	"container/list"
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	debug.SetGCPercent(-1)
	runtime.GC()

	m := &runtime.MemStats{}
	runtime.ReadMemStats(m)
	before := m.Alloc
	for i := 0; i < 1000; i++ {
		s := make([]int, 0)
		for j := 0; j < 10; j++ {
			s = append(s, 0)
		}
	}
	runtime.ReadMemStats(m)
	fmt.Printf("Alloced for slices %0.2f KB\n", float64(m.Alloc-before)/1024.0)

	runtime.GC()
	runtime.ReadMemStats(m)
	before = m.Alloc
	for i := 0; i < 1000; i++ {
		s := list.New()
		for j := 0; j < 10; j++ {
			s.PushBack(1)
		}
	}
	runtime.ReadMemStats(m)
	fmt.Printf("Alloced for lists %0.2f KB\n", float64(m.Alloc-before)/1024.0)
}

This code gives the following output:


Alloced for slices 117.19 KB
Alloced for lists 343.75 KB

This code creates 1000 slices and 1000 lists and fills each with 10 elements, since that is the average number of visits. The number 10 is unlucky for a slice: as elements are appended the capacity keeps doubling, so a slice of 10 elements ends up with a capacity of 16 and more memory is spent than strictly necessary. Even so, the results show that the slice-based variant used about 3 times less memory, which agrees with theory, since every list element stores pointers to the next element, the previous element and the list itself.


Other participants who reached the final built several more indexes, such as an index by country for the visits method. These indexes would most likely have sped up the solution, but not as much as storing a user's visits and a landmark's visits together with the object itself.


Libraries


The standard library's http server is fairly easy to use and widely adopted, but when it comes to speed everyone recommends fasthttp. So the first thing I did after implementing the logic was to throw out the standard http server and replace it with fasthttp. The replacement was completely painless, even though the two libraries have different APIs.
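
For reference, a minimal fasthttp skeleton of the kind described; the /ping route here is just a placeholder, not part of the contest API:

package main

import (
	"log"

	"github.com/valyala/fasthttp"
)

// requestHandler plays the role of http.HandlerFunc; fasthttp hands the whole
// request/response pair over as a single reusable RequestCtx.
func requestHandler(ctx *fasthttp.RequestCtx) {
	switch string(ctx.Path()) {
	case "/ping":
		ctx.SetStatusCode(fasthttp.StatusOK)
		ctx.SetBodyString("pong")
	default:
		ctx.SetStatusCode(fasthttp.StatusNotFound)
	}
}

func main() {
	if err := fasthttp.ListenAndServe(":80", requestHandler); err != nil {
		log.Fatal(err)
	}
}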


Next to be replaced was the standard JSON encoding/decoding library. I chose easyjson, since it showed excellent speed and memory numbers and had an API similar to encoding/json. easyjson generates its own parser for every data structure, which is what lets it show such impressive results. Its only drawback is the small number of settings: for example, the task contained requests in which one of the fields was missing, which should result in an error, but easyjson silently skipped such fields, so I had to dig into the source code of the generated parsers.
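
A minimal sketch of the easyjson setup, assuming the code generator has been run over the model file (for example `easyjson -all model.go`, which writes model_easyjson.go with the generated methods):

package model

// The //easyjson:json marker tells the generator to emit MarshalJSON /
// UnmarshalJSON (and the faster MarshalEasyJSON / UnmarshalEasyJSON) for
// this structure.
//easyjson:json
type Visit struct {
	Id        int `json:"id"`
	Location  int `json:"location"`
	User      int `json:"user"`
	VisitedAt int `json:"visited_at"`
	Mark      int `json:"mark"`
}

After generation, a handler calls v.UnmarshalJSON(body) instead of json.Unmarshal(body, &v), so the call sites barely change while the reflection overhead disappears.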


Other optimizations


Since all API methods except the POST methods were implemented without allocating extra memory, I decided to disable the garbage collector: if there is enough memory anyway, why keep running it?
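
In Go that is a one-liner, placed once the data has been loaded (a sketch; where exactly the call lives in the real solution is an assumption):

package main

import "runtime/debug"

func main() {
	// Disable the collector entirely: the GET handlers allocate next to
	// nothing, so memory stays essentially flat during the rating run.
	debug.SetGCPercent(-1)
	// ...load the data and start the server here...
}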


Request routing was also rewritten. Each request was identified by a single character, which saved on string comparisons. A schoolboy optimization.
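
A sketch of that trick, assuming fasthttp from the previous section; which character position distinguishes the routes, and the handler names, are my assumptions:

package main

import "github.com/valyala/fasthttp"

// Stub handlers; the real ones dispatch to the get/update/avg/visits logic.
func handleUsers(ctx *fasthttp.RequestCtx)     { ctx.SetStatusCode(fasthttp.StatusOK) }
func handleLocations(ctx *fasthttp.RequestCtx) { ctx.SetStatusCode(fasthttp.StatusOK) }
func handleVisits(ctx *fasthttp.RequestCtx)    { ctx.SetStatusCode(fasthttp.StatusOK) }

// route looks at a single byte of the path instead of comparing whole strings:
// /users/..., /locations/... and /visits/... already differ in their second
// character.
func route(ctx *fasthttp.RequestCtx) {
	path := ctx.Path()
	if len(path) < 2 {
		ctx.SetStatusCode(fasthttp.StatusNotFound)
		return
	}
	switch path[1] {
	case 'u':
		handleUsers(ctx)
	case 'l':
		handleLocations(ctx)
	case 'v':
		handleVisits(ctx)
	default:
		ctx.SetStatusCode(fasthttp.StatusNotFound)
	}
}

func main() {
	_ = fasthttp.ListenAndServe(":80", route)
}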


Result


In the last 5 days of the contest, without any prior experience in competitions like this, I wrote a Go solution that brought me 52nd place out of 295 participants. Unfortunately, I fell just a little short of making the final, but on the other hand, strong rivals gathered there, and since neither the data nor the task conditions changed, it is unlikely I would have climbed much higher.


Github with solution



Source: https://habr.com/ru/post/337868/

