ABBYYOnline - Architect's View

Today we will talk about distributed, heterogeneous and sometimes quite seriously and diversely loaded Internet applications using the example of the portal http://www.abbyyonline.com . I will try not to go into technical details related to a specific platform, and although the main part of the described application is implemented on ASP.Net MVC, it is possible that this material will be of interest to everyone related to web development, regardless of the toolkit used.

Every self-respecting developer, who loves his profession and gets real pleasure from it, when he embarks on another project, he strives to write the perfect application — optimal, with beautiful extensible architecture and beautiful code ... And there are always two fundamental problems that hinder this:

Imperfection of the world around us
Business requirements

In the project AbbyyOnline also had to deal with them at every turn. In this case, the imperfection of the world lies in the fact that AbbyyOnline (or AOL by its internal name) is a Web application, and in our imperfect world such applications are burdened with a lack of state (that is, they are as stateless as possible), limited by the network’s capabilities ( latency, bandwidth), are forced to adapt to the browser's serpentary and are doomed to be implemented through a whole zoo of technologies, from client scripts and, save the byte, flash to the server code and the next dialy SQL-th ect.

Business is no less cruel ... As planned, AOL marketing should be the face of the entire company on the Internet, so that all the services that ABBYY provides online are presented on this portal.
')
ABBYY has a fairly clear positioning - these are languages, recognition and everything connected with it, that is, word processing in one form or another. Naturally, I wanted the online services to be rendered as a united front. Is it reasonable? Of course, but what does this mean in practice? At a minimum, a single user base is needed, Single SignOn authentication should be implemented, that is, if a user logged in on one service, then the others should recognize him in person without asking questions, there should be uniform design and logic elements, and all that ...

Everything would be fine, but every online service of the company is a separate and rather serious application. And not only that, these applications are located on different servers (and some applications are not physically located on the same server and require quite a serious infrastructure for themselves), so these servers, for various reasons, are located on different platforms. Not to mention that, historically, these servers are equipped with different operating systems (there is a Windows server and FreeBSD, if you are interested in details). In general, a truly heterogeneous and distributed system - and against this background, such a trifle as different domain names for different portal sites can already be neglected.

General structure

When creating such portals, quite a lot of data and pages are common and it would be logical to bring all this to a central site, which is a link. In our case, the task of the central site includes authentication, work with shared pages and work with shared resources. This central site is equipped with the WebService API, through which other sites, whose tasks it is to provide services, can work with the general infrastructure of the portal, for example, receive complete information about the current user or place goods in the general basket of the online store. Service sites do not know anything about each other, they suspect only the existence of the main portal site and know how to work with the above-mentioned API.

Single SignOn and Cross Domain Authentication

The first problem is a single user base and Single SignOn authentication - what is the difficulty here? As we already found out, one of the features of web applications is the lack of state. In practice, this means that each request to the server from the browser does not depend on the previous one, and each time the server should guess “who came to us?”. Traditionally, this problem is solved by writing a special authentication cookie, which the browser sends back to the server on each request until the cookie expires. But the fact is that, for security reasons, the server part of the site can read only the cookie that is recorded on the same second-level domain, for example, the website deployed at http://finereader.abbyyonline.com can read the cookie that the website has written deployed on http://www.abbyyonline.com , but the site http://finereaderonline.com will not read this cookie under any circumstances. However, under the terms of our task, different parts of the portal can be located on different domains, so we need some kind of cross-domain authentication mechanism.

In principle, there are two main ways to solve this problem - on redirects (approximately as it was in .Net Passport, and then in OpenID, LiveID, etc.) and javascript, as Google or Yandex does in their services. The redirection scheme works approximately as follows:

When accessing any page of any portal site, you need to find out whether the portal of this user knows or not. Why to any page, and not just to a protected one? Because the nickname, the link to the profile and part of other personal information must be shown on all pages so that the user knows that we remember him and are happy for him :)

To achieve this goal, the following sequence of actions is performed.

The site is trying to read the authentication cookie of its own production. If this succeeds, then the authentication process ends here, because if there is such a cookie, it means that the user is not the first time and we have successfully identified it on this site.
The site is trying to read a special cookie stating that the authentication process was carried out, but did not end in success. If you succeed in this, then you can also end here - we know that we do not know this user.
If the previous points did not succeed, then ...
1. The site creates a special unique token
2. A record is created in the database, on a nashosh tablet, which, among other things, includes a token and the url of the page that requested authentication.
3. The site writes a special cookie that says that while we do not know this user, but the authentication request has already been sent. The usual authentication cookie with the sign that the user is anonymous may well play the role of this cookie - in this case there is no need for clause number 2.
4. The site redirects the request to a special url of the central server, passing the same token in the query string.
Having received such a request, the central server ...
1. Attempts to read his authentication cookie to the user.
2. Finds in the database, in the same nameplate for the token transferred in the query string, the desired entry and writes there the result of the operation from the previous paragraph.
3. The central server redirects back to the site requesting authentication, to a special url, all with the same token in the query string
The site, having received a request for this special url, which should be at each portal site ...
1. He tries to read a cookie about the authentication process (we wrote it in paragraph 3c.) And if it failed, then the client does not support writing cookies, and we don’t know how to work without cookies, so you need to send the user to a page that is intelligibly will bring to his consciousness this regrettable fact.
2. It climbs into the database, all into the same table and reads the authentication result by the central server using a token.
3. If the authentication by the central server was successful, then it writes its authentication cookie so that the next time the user does not drive in vain.
4. Redirects back to the page that initiated the authentication process.

When implementing authentication in AOL, we went as described above, and the result was a pretty good and compact library for cross-domain authentication, which has proven itself for quite a long period of operation. At the same time, if the second-level domain of the sites is common, as is happening now, then no unnecessary redirects are made and all authentication is transparent. Those sites that run on FreeBSD are forced to implement the client-side independently, but this is a rather trivial task. Here are some more comments:

Shared storage for sessions does not have to be a full-fledged database, you can use the standard Session Store, and even your own WCF service with a set of corresponding interfaces that keeps everything in memory. In our case, the portal sites exchange data with the central service through the aforementioned WebService API, and the database is used as internal storage, as it already exists, and it makes no sense to produce unnecessary entities, supporting them, but in general the storage is replaced with a flick of the wrist. ..
If, as in our case, all portal sites are known in advance, you can apply the following optimization (to make authentication not lazy, but greedy, using the language of developers =)): instead of redirecting authentication to the central server every time the user came to the new site, you can immediately after the user has entered the correct username / password pair, open all the portal sites in the IFrame so that redirects occur once to all the sites at once (in this case, you can do without redirects Av in the request token and taking the necessary information from the database). But in the general case it is less reliable and if something goes wrong, one or several user sites will not be seen, while the rest will be fine, and how to resolve this situation with minimal damage is not very clear.
This scheme is bad because it is not very search engine friendly. These guys hate redirects, and if any anonymous crawler said that it supports cookies and does not store them, then it turns out to be completely sad. For this reason, the next version of our library will most likely work by default with a javascript, it is more friendly to such scripts. Although in this case, users without javascript will be the affected party.

Creating a common design.

Another problem that arises when creating a larger, beautiful, distributed and heterogeneous portal is common design elements and functionality for their rendering. Each part of the portal (a separate site), has its own development team and, as already mentioned, these sites are deployed on different servers, the servers are physically located in different places and even the operating systems are different there. At the same time, one should try to maintain the general style, moreover, in such a way that, if necessary, it was easy enough to change it and this replacement would not require much effort on the part of all teams, and ideally, it would do without their participation. Thus, there is a need to create a certain library of controls responsible for the UI, which all project participants could use, without undue effort on their part.

Standard ASP.Net controls and MVC templates (aka Partial View) do not very well cope with this task, firstly, it is inconvenient to share them between different teams in different projects, and secondly, they are completely useless for the guys on FreeBSD. After some thought, the following scheme was built ...

The core of each common control is the xml / xsl bundle. The localized xml of such a control comes from the central server using the same API, which in some cases allows building this xml dynamically based on the data available to the central server, for example, for drawing various kinds of menus. Xsl to get the final html-i is located in the resources of the special assembly related to shared libraries, and in the same assembly is the code for extracting xml-i from the central server, processing its xsl-th and correct caching of the result. Almost always, to use the control on the child site, it is enough to call the method with the appropriate parameters and the control will be drawn, in rare cases it may be necessary to wedge into the drawing process and correct it a little for some specific task, which is also not a problem.

In projects on FreeBSD and other alternative systems, using this sort of contola is also quite convenient. Of course, the logic for obtaining xml-i, applying the corresponding xsl-i and other boiler-plate to it, has to do squats here independently, but most of the problems of drawing common elements are solved all the same. In all cases, Xml and xsl are the same, and if necessary centrally changing the logic of drawing or filling the controls is not very burdensome.

Online store.

Another aspect of the portal’s activities, which makes sense to talk about, is the Internet shop. In the future, each portal site will have something to sell, and even not only websites may be able to provide some online service for a small denyushku or sell some distribution kit. Implementing each store its own store - the idea is unproductive. Of course, it also has its advantages, but there are still more minuses. However, there are also obstacles to the implementation of a centralized store, and the main thing is that it is not known in advance what will have to be sold, and the goods can be very diverse - from services to boxes. And, as already mentioned, to sell products through the general store, in the future, it may be necessary not only for sites that are clearly part of the portal structure.

The successful solution turned out to be the separation of the storefronts of the store and the actual payment process of the order. Obviously, it doesn’t make much sense to generalize the showcase - the product can be the most diverse and versatile user-friendly design you can’t think of, especially since most of the intended product is not known at the time of creating the portal, and all participants in the race may have the most controversial wishes. Therefore, the "shop window" of the store, that is, the catalog of goods with the price and other description, every service who wants to sell, let them implement independently. But the process of placing the goods in the basket and further payment of this basket has been made common for everyone - the logic is rather muddy, but universal, so it can be implemented once, and any improvement, for example, connecting a new payment method, will automatically work for everyone services.

When a user on the site presses the “place to cart” button, the site, having recourse to using the API mentioned here many times, informs AbbyyOnline that such a user (as we remember, we have a perfectly working cross-domain authentication system) wants to put A cart is such a product, at such a price, and the general part of the store remembers all this data for further order formation.
Next, going to the basket, the user already gets to the central site, where the payment process and other interaction with the selected payment system takes place. After the order has been successfully paid, the central site sends to all services whose goods were in the basket, a notification that the user has already paid for such a product.

Although we refused at the very beginning from the idea of a centralized store, a common showcase where the most popular products of all services would be presented is still needed. This is a terribly useful thing, both in terms of marketing and sales - cross-branding and all that, and in terms of design and usability - you always need a reference to “just a store”, and not to any particular section of it. Therefore, each service that has a desire to present its products on a showcase should implement a simple RESTful method that provides relevant information for this most shared storefront.

Conclusion

In conclusion, I can say that here only a part of the portal architecture is described in general terms, which is related to an attempt to make the entire ensemble of various online services peacefully coexist with each other for the benefit of the user. :) Each individual service is unique and interesting in its own way ... :)

Ivan Bodyagin
OCR

Source: https://habr.com/ru/post/106132/

All Articles