
You make money on information (why you need an API and how to design it correctly)

Hello, my name is Alexander Zelenin and I am a web developer.
Information is the basis of any application or service.



More than 10 years ago, I spoke with the owner of a poker room, and he showed me a page that brought in about $10,000 a day. It was a completely ordinary page, with neither styles nor graphics: solid text broken up by headings, sections and links. I just could not get my head around it: how could this bring in that kind of money?

The secret was that this page was one of the first comprehensive guides to playing online poker. It had a PageRank of 10/10 (or 9, it doesn't matter), and it was the first thing that came up in search results.

The purpose of your application, whatever it is, is to convey some information to the user (or to receive and process it).
Online store: product information, methods of purchase and delivery.
Even if it is a terrible, ugly and inconvenient site, users will still find the product they are looking for, especially if you are selling something unique enough (at least in your area). On top of that, search engines will help you by directing the user to the right product.

Of course, the conversion may be lower, or the user may not be very pleased with the experience of using the site, but if the product itself is exactly what he was looking for, everything else will be insignificant.

I do not consider stores selling "on emotion" and purchases that the user may then regret.

Multiplayer online game: information about the player, friends and the world around him
Examples vary with the genre and other parameters, but in general the user is interested in things like the history of the world, correspondence with allies, current events, and information about his character, village, ship or whatever else he owns.

Very often, the way to access this information goes beyond the game client itself. Using a mobile application, you can check if anyone is attacking you, or put some items on the in-game auction, even without entering the game itself.

Music streaming service - meta-information + music files
The user wants to find the music that interests him. All the wrappers, smart queues, licensing and other trimmings are of little interest to anyone.

Of course, it is good to use licensed content, but if the user cannot find what he was looking for, he will leave and find it elsewhere. On the Internet, people do not remember the information as such; they remember the place where they found it. Therefore, if your site has no songs by group X but does have a link to the page where group X sells its albums, your service still wins, because the user remembers where he got information about group X and will come back to you to look for information about group Y.

I have worked on several music projects, and very often everything came down to whether the needed tracks were available, even with dozens of terabytes of data.

Video service - video recordings
At some point, YouTube accumulated a critical mass of videos and became the market leader. It did not have the most convenient site or the best terms. Much was wrong in general, but the abundance of content attracted visitors, and as a result the amount of content only grew.


I think you have already caught the idea. Examples can be given endlessly. Here is another one: people go to Wikipedia for the information, not for the site itself; moreover, some of the information from Wikipedia is shown directly in search results, without the site even being opened. If you think that this is not applicable in your case, write in the comments (or by mail / private message), and I will explain why it is still applicable.

So: whatever you do, information will always be primary. If your information is of good quality, users will surely find you and come to you.

I will tell you how to organize work with information so that it is:
1. Scalable - replication, sharding, etc. are configured without touching the application.
2. Convenient for users - easy to document and easy to understand how to use.
3. Convenient for your developers - rapid prototyping, with the ability to optimize only what is necessary.

This approach does not make sense for you if you have a small project with a small number of components and developers.



Table of contents:
  1. Information Consumers
  2. How to work with information (API)
  3. API inner layer
  4. Outer layer API
  5. Heavy query optimization
  6. Users
  7. Scaling
  8. Caching
  9. Versioning
  10. Total


Assumptions / lack of information
Throughout the article I make a number of assumptions: for example, that you already have certain things in place or implemented.

Virtually every one of these topics could be expanded indefinitely; a detailed analysis would fill a whole series of articles like this one. If some information is not given, I decided that it is not essential for understanding the concept.

If you think that something is still missing, please let me know and I will extend the article accordingly.



Information Consumers


Information consumers can be divided into two categories - internal and external.

Internal - these are your own products and services. The difference is that for "its own" products the API can provide much broader functionality with fewer restrictions. For example, Google Maps on its own domain could use WebGL and, as a result, run much faster, while the embedded version could not (the situation may have changed since then; I have not checked).

External - end users or products not owned by your company. For example, for Google Maps, you are an external user. Usually, access to information from outside is severely limited, and special access keys are often required.


How to work with information?


To work with information, we will provide a web API. We will implement 2 layers (external and internal). Each layer implies at least a separate application.

Why do you need an API?

The API allows you to provide data in a platform-independent form. It is not always known how and where data will be used, and API development is a good way to state “we have information - contact us”.

This makes it possible to use the data regardless of how the final product is implemented (including offline applications, provided they get network access at least once). All code examples show just one of many possible implementations.

The first step is to describe the models and data collections. If the application is implemented in JavaScript (Node.js on the server), the same models can be used on both the client and the server.

A model is a description of an entity (for example, a music track): its fields, their properties, ways to access and provide information. The model can duplicate the database schema, but can also expand it with additional information. Moreover, a model can contain information from several tables / collections and represent it as one entity. On the server, the model should be extended with descriptions for working with tables, access to servers, and so on. On the client, the model is extended by data access addresses.

When data is accessed, the model may also carry additional meta-information about the request itself (execution time, position of the record in the database, links) and virtual fields. For example, if the database stores path, a relative path to the file, you can add a virtual url field that is calculated "on the fly".
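A minimal sketch of such a virtual field, assuming a hypothetical `withVirtualFields` helper and a made-up `BASE_URL`; neither is part of the model library used in this article.

```javascript
// Hypothetical base address for stored files (an assumption for this sketch).
const BASE_URL = 'https://cdn.example.com/';

// Virtual fields are described as functions of the stored record.
const trackVirtuals = {
  url: (record) => BASE_URL + record.path,
};

// Returns a copy of the record with every virtual field computed on the fly.
// The stored record itself is left untouched.
function withVirtualFields(record, virtuals) {
  const result = { ...record };
  for (const [name, compute] of Object.entries(virtuals)) {
    result[name] = compute(record);
  }
  return result;
}

const stored = { id: 856, path: 'tracks/856.mp3' };
const track = withVirtualFields(stored, trackVirtuals);
console.log(track.url); // https://cdn.example.com/tracks/856.mp3
```

The database keeps only `path`; `url` exists only in the object handed to the consumer, so the file host can change without touching stored data.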

As an example, I will give a code describing a certain music service.

The examples will be in JavaScript, but everything described applies to any language. I have built things like this in PHP, Python, and C++. The details need to vary depending on the size of the project.


    Model.extend('Track', {
        // model attributes and their types
        attributes: {
            id: 'integer',
            title: 'string',
            url: 'string',
            duration: 'integer',
            album: 'app.model.Album.model',   // reference to the album model
            artist: 'app.model.Artist.model'  // reference to the artist model
        }
    })


Data validation
So as not to clutter the code, I will omit the detailed descriptions of the checks in the examples. If desired, you can specify any number of criteria, validation error texts, etc. Validation is applicable on both the client and the server.

One example:
    Model.extend('Track', {
        attributes: {
            ...
            title: {
                type: { value: 'string', errorText: 'Invalid type for the «Title» field' },
                required: { value: true, errorText: 'This field is required' },
                length: { min: 5, max: 32, errorText: 'Length must be between 5 and 32 characters' }
            }
            ...
        }
    })
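How such rules might be applied can be sketched as follows. The `validate` function is hypothetical and the error texts are illustrative; the article does not show how the model abstraction actually runs its checks.

```javascript
// Rules in the same shape as the `title` attribute; the texts are illustrative.
const titleRules = {
  type: { value: 'string', errorText: 'Title must be a string' },
  required: { value: true, errorText: 'Title is required' },
  length: { min: 5, max: 32, errorText: 'Title must be 5 to 32 characters long' },
};

// Hypothetical checker: returns a list of error texts (empty when valid).
function validate(value, rules) {
  const errors = [];
  const missing = value === undefined || value === null || value === '';
  if (missing) {
    if (rules.required && rules.required.value) errors.push(rules.required.errorText);
    return errors; // nothing else to check on an absent value
  }
  if (rules.type && typeof value !== rules.type.value) {
    errors.push(rules.type.errorText);
    return errors; // a length check is meaningless for a wrong type
  }
  if (rules.length) {
    const { min, max, errorText } = rules.length;
    if (value.length < min || value.length > max) errors.push(errorText);
  }
  return errors;
}

console.log(validate('Hey', titleRules));       // one length error
console.log(validate('Yesterday', titleRules)); // no errors
```

Because the rules are plain data, the same description can be shipped to the client and reused on the server.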



A collection is a set of entities (usually of the same nature, for example, music tracks). The set may also carry additional data related to the set itself. The number of selected tracks, the number of remaining (not selected) tracks, the page number and the number of pages can be represented as meta-information. A virtual field can be the total duration of all tracks.

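    Model.List.extend('Track.List', {
        // list attributes
        attributes: {
            duration: 'virtual' // computed field, not stored in the database
        }
    }, {
        duration: function(tracks) {
            // sum the durations of all tracks; the initial value 0 keeps
            // the first track from being used as the accumulator
            return _.reduce(tracks, function(totalDuration, track) {
                return (track.duration || 0) + totalDuration;
            }, 0)
        }
    })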
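The meta-information of a collection can be derived from the request and the selected records. Below is a rough sketch; `buildListMeta` and its field names are assumptions made for illustration.

```javascript
// Builds the meta-information for one page of a collection.
function buildListMeta({ offset, limit, totalCount, items }) {
  const count = items.length;
  return {
    count,                                               // records selected
    remaining: Math.max(totalCount - offset - count, 0), // records not yet selected
    page: Math.floor(offset / limit) + 1,                // 1-based page number
    pages: Math.ceil(totalCount / limit),                // total number of pages
    duration: items.reduce((sum, t) => sum + (t.duration || 0), 0), // virtual field
  };
}

const meta = buildListMeta({
  offset: 20,
  limit: 10,
  totalCount: 23,
  items: [{ duration: 216 }, { duration: 180 }, { duration: 204 }],
});
console.log(meta);
// { count: 3, remaining: 0, page: 3, pages: 3, duration: 600 }
```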



API inner layer


This layer will be available only to our products.

Since our models already contain a lot of information, we can provide access to the data using the minimum amount of code.

Extending models
We extend the model on the server and client, describing the name and path. General implementations of the findOne, update, destroy, and create methods are described in the model abstraction and do not require a separate implementation if they are not fundamentally different.
    Model.extend('Track', {
        findOne: 'GET /track/{id}',       // fetch one track by id
        update: 'PUT /track/{id}',        // update a track
        destroy: 'DELETE /track/{id}',    // delete a track
        create: 'POST /track'             // create a track
    })
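A route string such as 'GET /track/{id}' carries enough information to build a request. How the model abstraction does this is not shown in the article; the following is only a sketch with a hypothetical `buildRequest` helper.

```javascript
// Turns a spec such as 'GET /track/{id}' plus parameters into a request
// description. Parameters used in the path are removed from the rest.
function buildRequest(spec, params = {}) {
  const [method, template] = spec.split(' ');
  const rest = { ...params };
  const path = template.replace(/\{(\w+)\}/g, (match, name) => {
    if (!(name in rest)) throw new Error('Missing path parameter: ' + name);
    const value = encodeURIComponent(rest[name]);
    delete rest[name];
    return value;
  });
  return { method, path, params: rest };
}

const request = buildRequest('PUT /track/{id}', { id: 856, title: 'New title' });
console.log(request);
// { method: 'PUT', path: '/track/856', params: { title: 'New title' } }
```

With a helper like this, adding a method to a model on the client stays a one-line declaration.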


We extend the model only on the server:
    Model.extend('Track', {
        'GET /track/top/today': function() {
            // select today's top track
            var track = ...;
            ...
            return track;
        }
    })


We extend the model only on the internal client:
    Model.extend('Track', {
        findTodayTop: 'GET /track/top/today'
    })

    Model.extend('Track.List', {
        findByArtistId: 'GET /track/byArtistId/{artist_id}' // all tracks of the given artist
    })



On this layer, we have maximum query flexibility.

Sample request and response
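    app.model.Track.List.findByArtistId({
        format: 'json',
        artist_id: 20974,
        fields: [                  // fields to include in the response
            'track',               // all fields of the track
            'track.artist',        // all fields of the artist
            'track.album.name'     // only the album name; the rest of the album is omitted
        ],
        offset: 5,                 // skip the first 5 records
        limit: 10,                 // return at most 10 records
        sort: [
            { 'track.title': 'ASC' }   // sort by title, ascending
        ],
        cache: 1800                // cache the result for 1800 seconds (half an hour)
    })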


In response, we get something like:

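    {
        "result": {
            "tracks": [
                ...,
                {
                    "id": 856,              // track fields
                    "title": "...",
                    "url": "...",
                    "duration": 216,
                    "artist": {
                        "id": 20974,
                        "title": "...",
                        ...                 // remaining artist fields
                    },
                    "album": {
                        "name": "..."       // only the requested field
                    }
                },
                ...
            ]
        },
        "offset": 5,         // offset
        "count": 7,          // number of records returned
        "totalCount": 12     // total number of records
    }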


It is not necessary to nest the data this way. If you are worried about duplicates (although, in terms of traffic, gzip copes with them perfectly), you can collect the related entities in separate fields of the result.
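One possible way to collect nested entities into a separate field is sketched below. The `splitArtists` function and the `artist_id` reference field are assumptions, not the format the service actually returned.

```javascript
// Moves nested artists out of the tracks into a separate field keyed by id,
// so an artist shared by several tracks appears only once.
function splitArtists(tracks) {
  const artists = {};
  const flatTracks = tracks.map(({ artist, ...rest }) => {
    if (!artist) return rest;
    artists[artist.id] = artist;
    return { ...rest, artist_id: artist.id };
  });
  return { tracks: flatTracks, artists };
}

const nested = [
  { id: 856, title: 'First', artist: { id: 20974, title: 'Some artist' } },
  { id: 857, title: 'Second', artist: { id: 20974, title: 'Some artist' } },
];
const flat = splitArtists(nested);
console.log(JSON.stringify(flat, null, 2));
```

The client can then look up `flat.artists[track.artist_id]` instead of reading a copy of the artist from every track.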



Outer layer API


The outer layer is directly accessible to end customers. It can be a website or just an API for third-party developers.

On the outer layer, we do NOT provide the same flexibility as on the inner layer. We only give access to the basic features: the main query parameter, offset, limit, etc. And all of this comes with restrictions.

In part, it is a proxy to the internal API with a number of important additions.

Just an example:
    app.get('/api/track/:id', ..., function(req, res) {
        return app.model.Track.findOne({id: req.params.id});
    })


In place of "..." we perform the necessary permission checks, modify the request and determine the format. Data is returned in the same format and over the same channel as it was requested.
That is, a request made over HTTP for JSON gets JSON back over HTTP. A request made over a socket for XML gets its answer through the socket, in XML.

In this way, we can fully control what is available from the outside, how, under what conditions, and so on.
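The restrictions on the outer layer can be as simple as a whitelist with clamped values. This is a sketch only; `restrictQuery`, `MAX_LIMIT` and `DEFAULT_LIMIT` are made-up names and values.

```javascript
// Hypothetical limits for external clients (assumptions for this sketch).
const MAX_LIMIT = 50;
const DEFAULT_LIMIT = 10;

// Keeps only the basic parameters and clamps them to safe ranges.
// Everything else (fields, sort, cache, ...) is silently dropped.
function restrictQuery(query) {
  const offset = Number.parseInt(query.offset, 10);
  const limit = Number.parseInt(query.limit, 10);
  return {
    offset: Number.isInteger(offset) && offset > 0 ? offset : 0,
    limit: Number.isInteger(limit) && limit > 0 ? Math.min(limit, MAX_LIMIT) : DEFAULT_LIMIT,
  };
}

console.log(restrictQuery({ offset: '5', limit: '1000', fields: 'track.artist', cache: '0' }));
// { offset: 5, limit: 50 }
```

The result is what gets passed on to the inner layer, so an external client can never request more than the outer layer allows.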


Heavy query optimization


So far we have described work with the database as abstractly as possible, and, of course, such queries will run much slower than optimized ones. The first step is to discover (using a profiler or any other convenient method) that one of the queries is slow.

Suppose we noticed that the query that selects a track along with information about its album and artist is slow. To optimize it, we need to create a new internal method:
    Model.extend('Track', {
        'GET /track/{id}/withAdditionalData': function() {
            var track = ...;
            // an optimized query that fetches the track
            // together with its album and artist
            return track;
        }
    })

and, on the outer layer, replace the call to the inner layer with a call to this new method. That's all. For the end client nothing has changed: the cache, the paths and the data received are the same, but everything works faster.


Users


The main task when working with users is to check their rights.

As soon as the user is authorized (the method is not important: a cookie, a key, another option), we make one request to the inner layer, verifying identity and obtaining permissions. All further checks are done on the outer layer.
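The idea of one request followed by local checks can be sketched like this. The `innerLayer` stub, the token and the permission names are all made up, and the lookup is synchronous only to keep the sketch short; in practice it would be an asynchronous request.

```javascript
// Stand-in for the inner layer; users and permission names are made up.
const innerLayer = {
  calls: 0,
  known: new Map([
    ['token-1', { userId: 42, permissions: ['track.read', 'track.update'] }],
  ]),
  getIdentity(token) {
    this.calls += 1; // counts how often the inner layer is asked
    return this.known.get(token) || null;
  },
};

// One request to the inner layer right after authorization.
function openSession(token) {
  const identity = innerLayer.getIdentity(token);
  if (!identity) return null;
  return { userId: identity.userId, permissions: new Set(identity.permissions) };
}

// Every later check is answered on the outer layer from the stored set.
function can(session, permission) {
  return session !== null && session.permissions.has(permission);
}

const session = openSession('token-1');
console.log(can(session, 'track.update'));  // true
console.log(can(session, 'track.destroy')); // false
console.log(innerLayer.calls);              // 1
```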


Scaling


We get great advantages at the scaling stage.

We can run both the outer and inner API layers in any number of instances, distributing the load with balancers. Thanks to this separation, we can run multiple instances of the outer layer as close as possible to the end client, effectively getting our own CDN with data caching.

Databases are scaled in the classic ways, depending on the task.


Caching


For the inner layer, we cache the results of database queries, and on the outer layer, the results obtained from the inner layer. The end client can also have caching.

One of the previous examples contained the line “cache: 1800”. It can enable caching both on the outer layer, keeping the result on the server for half an hour, and on the client, storing the result, for example, in localStorage (or any other client-side storage).
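A time-limited cache of this kind fits in a few lines. The `cached` wrapper below is a hypothetical in-memory sketch; a real deployment would more likely use a shared store on the server and localStorage on the client.

```javascript
// Wraps a loader with an in-memory cache. ttlSeconds plays the role of
// `cache: 1800`; `now` is injectable so the expiry can be demonstrated.
function cached(loader, ttlSeconds, now = Date.now) {
  const entries = new Map();
  return function (key) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;
    const value = loader(key);
    entries.set(key, { value, expiresAt: now() + ttlSeconds * 1000 });
    return value;
  };
}

// Usage with a fake clock so the expiry is easy to follow.
let clock = 0;
let loads = 0;
const findTopTrack = cached((day) => {
  loads += 1; // counts real loads from the inner layer
  return { day, id: 856 };
}, 1800, () => clock);

findTopTrack('today');  // loads the value
findTopTrack('today');  // served from the cache
clock += 1800 * 1000;   // half an hour later
findTopTrack('today');  // expired, loaded again
console.log(loads);     // 2
```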


Versioning


As your project develops, new methods will appear and some old ones will go away. To designate versions I definitely recommend Semantic Versioning. We are particularly interested in changes of the major version of the API, the ones that break backward compatibility. API access paths can be split simply: /api/{version}

You can organize files and support for different versions in several ways, for example:
1. Make folders v1, v2 and place in them all the code relating to them. When modifying one of the API versions, the other is not affected.
2. Different repositories function as different applications.
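Routing by major version under /api/{version} might look like the sketch below. The `handlers` table, the `dispatch` function and the renamed field are illustrative assumptions.

```javascript
// Handlers grouped by major API version (contents are hypothetical).
const handlers = {
  v1: { '/track/top': () => ({ name: 'top track' }) },
  v2: { '/track/top': () => ({ title: 'top track' }) }, // breaking change: field renamed
};

// Picks the handler set from the version segment of the path.
function dispatch(path) {
  const match = /^\/api\/(v\d+)(\/.*)$/.exec(path);
  if (!match) return { status: 404 };
  const [, version, rest] = match;
  const handler = handlers[version] && handlers[version][rest];
  return handler ? { status: 200, body: handler() } : { status: 404 };
}

console.log(dispatch('/api/v1/track/top')); // { status: 200, body: { name: 'top track' } }
console.log(dispatch('/api/v2/track/top')); // { status: 200, body: { title: 'top track' } }
console.log(dispatch('/api/v3/track/top')); // { status: 404 }
```

Old clients keep calling v1 and see no change, while new clients opt into v2 explicitly.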


Total





  1. We have full control over the movement of data at all stages.
  2. Developers are happy: they don't have to wait for the API team to implement a super-tricky method to get data in a specific format.
  3. Developers are happy: they can optimize only what is actually slow.
  4. Clients are satisfied: the API is more stable, and paths do not change just because some query was slow.
  5. Clients are satisfied: access is faster because the server is located as close to them as possible.


I would be happy to add sections to the article at your request.
Also, if you would like the code clarified somewhere, just ask and I will write it and attach it.
Perhaps this is the beginning of a large series of articles on various topics (or a start has already been made). I have accumulated a lot of material, but it is not yet systematized.

Offtopic question about creating courses for those who want to develop web projects
I want to create a full-fledged course on the development of web projects. Straight from scratch to full stack.

According to the plan, it will include: video lectures + text lessons, homework, independent projects, work with a mentor, various intensives, a lot of code (that can be run and edited online), and so on. As a distinctive feature, I want to make the format a “network” rather than a sequence. That is, after completing certain topics, others are unlocked, and the student can choose what interests him at the moment. According to preliminary estimates, the training will take about six months of full-fledged classes for a couple of hours a day.

It is clear that such a project cannot be implemented quickly. Therefore, I want to develop it together with a partner on whose platform it can be hosted and sold, because if I take on all the accompanying issues myself, the time to completion will be astronomical. Besides, different portals have different tools that I want to take into account.

I have already written to a number of projects like Coursera describing my idea, but have not received any answers. And, which is a pity, none at all, not even refusals.

I have studied the market and am confident that what I am building is more than competitive.

I would welcome any suggestions on partnership or advice to whom to contact.

Source: https://habr.com/ru/post/277161/

