
Chronicles of LinguaLeo: how we built “Dialogues in English” with Node.js and DynamoDB



LinguaLeo users start learning English in the Jungle - a repository of thousands of materials of varying difficulty, format and subject matter - where, step by step, they learn to hear and understand native speakers and to expand their vocabulary. Those who need grammar can head to Courses. Vocabulary grows not only from the Jungle, by adding unfamiliar words to the Personal Dictionary, but also through ready-made Word Sets available for individual study. In the Communication section you can hold Dialogues in English to practice the language with other LinguaLeo users in real time, choosing topics for conversation. Communication is in English only!

We used Node.js and DynamoDB (all on AWS) to build Dialogues in English. Now we will share our experience.

Why Node.js?




To provide our users with all these communication features, we chose Node.js. Why? Node.js is well suited to working with streams, and a huge number of modules exists for it. Besides, a competent JavaScript programmer working on Node can implement the entire task, not just its server part (unlike, say, an Erlang specialist).

As the “transport hub” between server and client we used WebSockets, the fastest way to deliver information.

To solve the cross-platform problem we chose the SockJS library: first, because it complies with the W3C standard for WebSockets, and second, because SockJS suits our needs well.



So, let's go. We run the Node.js daemon on Amazon EC2 cloud instances and deploy the application via git. The forever module is responsible for logging and restarts. By the way, Amazon has a very useful service, Amazon CloudWatch. It monitors the main parameters of the system, and its main advantage is customizable notifications, which let you watch only what really matters. For detailed information about the state of the application we use nodetime.



As the driver for DynamoDB we use the dynode module. It has proved itself well, although its imperfect documentation was a fly in the ointment. Amazon's official Node.js SDK came out later and is still in Developer Preview. So it came about quite naturally that we settled on Amazon products, since they meet all the technical requirements.



Subtleties of Node.js


We paid special attention to user authorization on the dialog server. Cookies are responsible for authorization there. This is a rather “expensive” solution in terms of speed, but it wins on stability and security, and it keeps the service-oriented architecture independent.

How does it work? The user's cookies are delivered over WebSockets to the dialog server → the dialog server checks the cookies' validity by sending a request with them to our API, which returns the user's data. If the data arrives without errors, the dialog server authorizes the user and sends the data back.
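The validation step can be sketched as follows. This is a minimal sketch: the API host, endpoint path and the function name are assumptions for illustration, not the actual LinguaLeo API.

```javascript
// Builds the options object for the HTTP request that validates a user's
// cookies against the main API. Host and path are hypothetical placeholders.
function buildAuthRequest(cookieHeader) {
  return {
    host: 'api.example.com',   // assumed API host
    path: '/user/me',          // assumed endpoint returning user data
    method: 'GET',
    headers: { 'Cookie': cookieHeader }
  };
}

// On a new connection the dialog server would do, roughly:
//   http.request(buildAuthRequest(conn.headers.cookie), function (res) {
//     // a 200 response with a valid user payload => authorize the connection
//   });
```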

We know that many people use LinguaLeo not only at home but also at work. It often happens that corporate firewalls are configured so strictly that all ports except the standard ones are blocked. With this in mind, we use port 443 for WebSocket connections (without binding it to HTTPS). This solution avoids potential network problems.

DynamoDB Features


Just over a year ago Amazon released a distributed NoSQL database, Amazon DynamoDB. At the architecture design stage we had a choice between two products: MongoDB and DynamoDB. We rejected the first because of the complexity of administration, and the choice fell on Amazon, since its product requires no maintenance on our side. And, of course, it was interesting to try out the technology for future use.

It turned out that working with DynamoDB is very different from everything we were used to. Being a SaaS product, it taught us to account for the time of the HTTP request. Within the Amazon data center the average request takes about 20 milliseconds, which led us to a conclusion: it is highly desirable to do all reads only via indexes (this is faster and cheaper), and to use Scan queries exclusively for analytics or migrations.
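To illustrate the difference, here are the two request shapes side by side: a Query is keyed and reads only the matching items, while a Scan walks the whole table. The table and attribute names are assumptions, and the payload shape follows the original 2012-era low-level DynamoDB API rather than any later SDK:

```javascript
// Query: a keyed read against the table's hash key - fast and cheap,
// so this is the shape almost all production reads should take.
function buildDialogsQuery(userId) {
  return {
    TableName: 'UserDialogs',
    HashKeyValue: { S: String(userId) }  // 2012-era low-level API shape
  };
}

// Scan: a full-table walk that bills for every item read - reserved
// strictly for analytics and migrations.
function buildDialogsScan() {
  return {
    TableName: 'UserDialogs'
  };
}
```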

Data structure


Dialogs - dialog metadata, with the last message cached.



UserDialogs - the list of a user's dialogs, with cached data plus an unread-message counter.



Messages - all user messages.



User2User - the participants of a dialog.
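The four tables can be sketched as key schemas. The hash and range keys below are our guesses reconstructed from the access patterns described in this article, not the actual production schema:

```javascript
// Assumed key layout for the four tables (names from the article,
// key attributes hypothetical).
var TABLES = {
  Dialogs: {                // dialog metadata + cached last message
    HashKey: 'dialogId'
  },
  UserDialogs: {            // a user's dialog list + unread counter
    HashKey: 'userId',
    RangeKey: 'dialogId'
  },
  Messages: {               // every message, ordered by time within a dialog
    HashKey: 'dialogId',
    RangeKey: 'createdAt'
  },
  User2User: {              // participants of a dialog
    HashKey: 'dialogId',
    RangeKey: 'userId'
  }
};
```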



Example 1

The process of fetching all of a user's conversations for the “My Conversations in English” page is implemented in a non-standard way. On the page the user sees all his dialogs, including the total number of unread messages, as well as the last message of each dialog.

The data is selected in two requests. The first selects all dialogIds from the UserDialogs table using Query. The second fetches the dialogs via BatchGetItem. There is a small nuance: BatchGetItem fetches a maximum of 100 items at a time. So if a user has 242 dialogs, we need to make 3 requests, which takes about 70-100 milliseconds. Keeping this in mind, and striving for better performance, we cache the dialog data in ElastiCache, updating it with each new message.
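The 100-item limit means the BatchGetItem payloads have to be built in chunks. A minimal sketch of that chunking, with assumed table and key names (the key shape follows the 2012-era low-level API):

```javascript
// Splits a list of dialog ids into BatchGetItem payloads of at most
// 100 keys each, which is the per-request limit of the operation.
var BATCH_LIMIT = 100;

function buildBatchGetRequests(tableName, dialogIds) {
  var requests = [];
  for (var i = 0; i < dialogIds.length; i += BATCH_LIMIT) {
    var keys = dialogIds.slice(i, i + BATCH_LIMIT).map(function (id) {
      return { HashKeyElement: { S: String(id) } }; // 2012-era key shape
    });
    var params = { RequestItems: {} };
    params.RequestItems[tableName] = { Keys: keys };
    requests.push(params);
  }
  return requests;
}
```

For the 242 dialogs mentioned above this yields exactly three requests of 100, 100 and 42 keys.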



Example 2

Adding a new message to a dialog is also non-trivial. Write speed is not very important to us, since we do not wait for the DynamoDB write to complete. But we have to perform a large number of writes, and that matters for the speed of subsequent reads. First we write the new message to the Messages table (PutItem), then we cache it in the Dialogs table (UpdateItem). Next we have to update the messageCounter in the UserDialogs table for each participant (UpdateItem); after all, one of the users may have deleted the dialog and reset our message counter. In total that is 4 requests, adding up to about 70-100 milliseconds. Unfortunately, DynamoDB does not support such transactions, which imposes tangible restrictions on processes where data integrity is critical.
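The write path can be sketched as the list of requests it produces. This is a simplified sketch with assumed attribute names; the real code fires these requests without waiting for the results:

```javascript
// Builds the 4 DynamoDB requests issued for one new message in a
// two-person dialog: 1 PutItem + 1 UpdateItem + 1 UpdateItem per participant.
function buildMessageWrites(dialogId, authorId, participants, text, now) {
  var writes = [];

  // 1. Store the message itself in the Messages table.
  writes.push({ op: 'PutItem', TableName: 'Messages',
                Item: { dialogId: dialogId, createdAt: now,
                        authorId: authorId, text: text } });

  // 2. Cache the last message on the dialog row.
  writes.push({ op: 'UpdateItem', TableName: 'Dialogs',
                Key: dialogId,
                AttributeUpdates: { lastMessage: text } });

  // 3. Bump the unread counter for every participant; a user may have
  //    deleted the dialog and reset the counter, so each row is touched.
  participants.forEach(function (userId) {
    writes.push({ op: 'UpdateItem', TableName: 'UserDialogs',
                  Key: { userId: userId, dialogId: dialogId },
                  AttributeUpdates: { messageCounter: { increment: 1 } } });
  });

  return writes;
}
```

Since there is no transaction spanning these four requests, a crash between them can leave, say, a cached last message without a bumped counter, which is exactly the integrity restriction noted above.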

In general, changing product requirements is quite common, and sometimes it entails transforming the data structure. In relational databases this is solved with ALTER TABLE, but DynamoDB simply has nothing of the kind. Changing the schema here is very expensive: you either re-create the tables or use Elastic MapReduce, and you pay for both options. A lot of data = a lot of money. To cope with this somehow, we had to select all the data with Scan in 1-megabyte pages and then write it to a new table. It takes a lot of time, but that is the price of having no DBA.
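The migration loop amounts to paginated Scan calls chained through LastEvaluatedKey (DynamoDB returns at most 1 MB per Scan and hands back a key to resume from until the table is exhausted). A sketch with injected functions so the pagination logic is visible; the names are assumptions:

```javascript
// Walks a table page by page and feeds every item to writeItem,
// e.g. a function that PutItems the record into the new table.
// scanPage(startKey) stands in for a DynamoDB Scan call and must return
// { Items: [...], LastEvaluatedKey: <key or null> }.
function migrateTable(scanPage, writeItem) {
  var migrated = 0;
  var startKey = null;
  do {
    var page = scanPage(startKey);       // one <=1 MB Scan page
    page.Items.forEach(function (item) {
      writeItem(item);                   // copy into the new table
    });
    migrated += page.Items.length;
    startKey = page.LastEvaluatedKey || null;
  } while (startKey);                    // keep going until no resume key
  return migrated;
}
```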

Impressions from DynamoDB


Our experiment with DynamoDB was a success. It is an amazing database that is easy to scale: work with it and forget about administration. But remember: in return it demands careful handling at the architecture design stage; otherwise, all sorts of unexpected turns and unpleasant surprises await. We recommend DynamoDB if you have a lot of non-critical data and need to work with it often. We use it and we like it :)

***
Communicate easily and naturally in English Dialogues - practice English in pleasant company and join our team!

Follow the news on Facebook, Vkontakte and Twitter, share your impressions and have fun. Freedom of communication is great!

Team LinguaLeo

Source: https://habr.com/ru/post/176859/
