
Web application evolution

Everyone enjoys declaring that "everything new is bad," and for the last couple of years we have been enthusiastically discussing and trying NoSQL/NewSQL on the server and Angular/Knockout/Ember on the client. But these trends, it seems, are already settling down — a great moment to sit back and think about what comes next. As Marc Andreessen put it, "software is eating the world." At the same time, mobile and web apps are eating regular apps. So it is especially interesting to ask: where is the world of mobile and web applications heading? After all, it is apparently eating everything else. I believe the next Big Topic will be data synchronization, and here is why.

And what actually happens on the client?



In the browser, developers are already building quite full-fledged "normal" MVC applications. The MVC architecture itself is not new (it dates from the 1970s), but it came to the web only about 10 years ago, with Google Mail. Interestingly, GMail was at first perceived as a toy for geeks, but, as Larry Page said at the time, give it ten years. Well, it worked, and now everyone uses GMail.

With the advent of HTML5, the browser has its own data storage (two of them, in fact), its own business logic in JavaScript, and a persistent connection to the server. Developers are now trying to combine the best of both worlds: the availability and instant response of local applications with the always-online mode of web applications.
The main source of discomfort is server requests, and this is felt especially on mobile devices. The irony is that wireless Internet fails exactly when it is needed most: on the road (or in the subway), at public events, and out in the field. At home or at a desk at work it runs smoothly, thank you very much, but there it is not needed nearly as much. And even then, in the Yandex office WiFi is not perfect — apparently because of the concentration of geeks per square meter. At conferences like FOSDEM, WiFi never worked at all.

You might think LTE will make the problem disappear. Hardly. Bandwidth, storage capacity, and chip density are growing exponentially, while RTT (round-trip time to the server) and CPU clock frequency have improved very little over the past 10 years — because of physics. For mobile networks, that physics means millions of tons of concrete, rebar, and rock, plus the properties of the radio spectrum, none of which are going anywhere. What saves the day is data caching and background synchronization, and again GMail is ahead of everyone here. Dropbox also pleases in this regard; Evernote not so much — the web is full of complaints about synchronization failures and data loss.


Think about it: more and more data and logic is moving to the client. WebStorage, CoreData, IndexedDB. There is more data, there is more room for it on the client, and round trips to the server are not getting any cheaper. Downloading data is easy, caching it is even easier, but with synchronization the rocket science begins. A flaky mobile app "saves" empty data to the server — oops, the data is lost and the user is unhappy. The data is changed on several devices at once — oops, a conflict, and the user grinds their teeth. And there will be many more such oopses. Conditions on the client are already starting to resemble those in the "big" systems: lots of hardware, lots of data replicas, something is always breaking somewhere, and that is normal. It seems the client will soon need tools from the arsenal of the "big boys."
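To make the multi-device conflict concrete, here is a toy sketch (the record and field names are invented for illustration) of how naive whole-record sync silently loses a concurrent edit:

```javascript
// Both devices start from the same server snapshot of a record.
const server = { title: "Trip notes", body: "Day 1: arrival" };
const phone = { ...server };
const laptop = { ...server };

// Concurrent offline edits on different devices, to different fields.
phone.body = "Day 1: arrival\nDay 2: hiking"; // phone extends the body
laptop.title = "Trip notes (final)";          // laptop renames the note

// Naive sync: each device uploads its whole snapshot, last upload wins.
let naive = { ...phone };  // phone syncs first
naive = { ...laptop };     // laptop syncs second and silently overwrites

// naive.body is back to "Day 1: arrival" -- the phone's edit is gone.

// A per-field merge happens to keep both changes here, but only because
// the edits did not touch the same field; real sync needs a real strategy.
const merged = { title: laptop.title, body: phone.body };
```

This is exactly the "user grinds their teeth" scenario above: neither device did anything wrong, yet data disappeared.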

And what actually happens on the server?



On the server, everything once started out simply and logically — nothing foreshadowed trouble. There was one source of truth, the database (say, MySQL); there was one server with the logic (say, PHP); the client received a flat View and invoked the server-side logic through HTTP GET requests. Scaling the logic was easy at first, by multiplying stateless servers. The database gradually began to require replication (master-slave), then it had to be shielded with a cache (memcached), and a pusher was added to deliver events to the client in real time. Then things got more complicated. Somewhere Hadoop was bolted on, somewhere NoSQL, somewhere a graph database — storage systems multiplied. Everything also became overgrown with specialized services: analytics, search, mailings, and so on and so forth.


All this rapid multiplication of specialized systems raised the problem of data synchronization all over again. As the most successful solution for such a "zoo," many point to Apache Kafka. In that architecture, different slices of the data live in different systems, which exchange events over a common "bus." Indeed, when integrating N systems, you can write either O(N²) adapters or one shared event queue. With N = 2–3 the difference is barely noticeable, but at LinkedIn, apparently, N is large.
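The adapter-count argument can be seen in a minimal in-memory event bus sketch (this is illustrative, not the Kafka API): each of the N systems subscribes to topics once, instead of every pair of systems integrating directly.

```javascript
// A toy publish/subscribe bus: topic -> list of handlers.
class EventBus {
  constructor() { this.subscribers = new Map(); }
  subscribe(topic, handler) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, []);
    this.subscribers.get(topic).push(handler);
  }
  publish(topic, event) {
    for (const handler of this.subscribers.get(topic) || []) handler(event);
  }
}

const bus = new EventBus();

// Two independent consumers of the same event stream -- say, a search
// indexer and an analytics counter -- with no knowledge of each other.
const searchIndex = [];
const analytics = { signups: 0 };
bus.subscribe("user.created", (e) => searchIndex.push(e.name));
bus.subscribe("user.created", () => { analytics.signups += 1; });

bus.publish("user.created", { name: "alice" });
bus.publish("user.created", { name: "bob" });
```

Adding a third consumer is one `subscribe` call, not two new point-to-point adapters — that is the O(N²) vs O(N) difference in miniature.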

And now put the two pictures together. Imagine the number of replicas of the same data in the system, on both the server and the client side — say, MongoDB + Redis + Hadoop + WebStorage + CoreData. How do you synchronize all of that correctly?

And where does all this lead?



So what are the next trends? Nobody can say for sure, but we can figure out where to look, since genuinely new ideas appear extremely slowly and extremely rarely. For example, MongoDB uses master-slave replication via the oplog, which differs from MySQL's binlog replication only in incremental improvements and rests on Leslie Lamport's 1979–1984 work on state machine replication.
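State machine replication fits in a few lines (a simplified sketch — not the actual oplog or binlog format): if every replica starts from the same state and applies the same ordered log of deterministic operations, all replicas converge.

```javascript
// Apply one deterministic operation to a state, returning the new state.
function applyOp(state, op) {
  switch (op.type) {
    case "set":    return { ...state, [op.key]: op.value };
    case "delete": { const s = { ...state }; delete s[op.key]; return s; }
    default:       return state;
  }
}

// The shared, totally ordered operation log ("oplog"/"binlog" in spirit).
const oplog = [
  { type: "set", key: "title", value: "draft" },
  { type: "set", key: "title", value: "final" },
  { type: "delete", key: "tmp" },
];

// Master and slave replay the same log from the same initial state
// and necessarily end up identical.
const master = oplog.reduce(applyOp, { tmp: 1 });
const slave = oplog.reduce(applyOp, { tmp: 1 });
```

The hard part, as the rest of the article argues, is that a single total order of operations is exactly what a distributed, partly offline system cannot cheaply provide.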

NoSQL systems — Riak, Voldemort, Cassandra, CouchDB — are trying to take a noticeable step forward and squeeze something out of eventual consistency, causal ordering, and logical and vector clocks; these approaches go back to the same 1980s work of Lamport and C. Fidge.
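A vector clock, the workhorse of causal ordering, is small enough to sketch here (a minimal version; real systems also prune and serialize these): each replica counts its own events, and comparing two clocks tells us whether one event causally precedes the other or whether they are concurrent.

```javascript
// Record one local event at `node`.
function tick(clock, node) {
  return { ...clock, [node]: (clock[node] || 0) + 1 };
}

// Compare two clocks: "before", "after", "equal", or "concurrent".
function compare(a, b) {
  const nodes = new Set([...Object.keys(a), ...Object.keys(b)]);
  let aLess = false, bLess = false;
  for (const n of nodes) {
    if ((a[n] || 0) < (b[n] || 0)) aLess = true;
    if ((b[n] || 0) < (a[n] || 0)) bLess = true;
  }
  if (aLess && !bLess) return "before";
  if (bLess && !aLess) return "after";
  if (!aLess && !bLess) return "equal";
  return "concurrent"; // neither happened-before the other
}

const a = tick({}, "A");  // A's event:            {A:1}
const b = tick(a, "B");   // B, having seen a:     {A:1, B:1}
const c = tick(a, "C");   // C, also building on a: {A:1, C:1}
```

`b` and `c` both descend from `a` but not from each other — they are concurrent, which is precisely where conflicts (and CRDTs, below) enter the picture.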

The super-duper simultaneous-editing technology of Google Wave and Google Docs, Operational Transformation, was first described in 1989 by C. A. Ellis and S. J. Gibbs and is hardly new anymore.
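The core trick of Operational Transformation fits in a toy example covering only concurrent inserts (real OT also handles deletes and needs a transformation control algorithm; the `site` tiebreaker here is a common convention, not the Ellis–Gibbs formulation verbatim): each site applies its own operation immediately, then transforms the remote operation so it still means the same thing.

```javascript
// Transform insert `a` against a concurrent insert `b` that was applied first.
function transformInsert(a, b) {
  // If b inserted at or before a's position, a's position shifts right.
  if (b.pos < a.pos || (b.pos === a.pos && b.site < a.site)) {
    return { ...a, pos: a.pos + b.text.length };
  }
  return a;
}

function applyInsert(doc, op) {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

const doc = "abc";
const opA = { pos: 1, text: "X", site: 1 }; // site 1: insert X at index 1
const opB = { pos: 2, text: "Y", site: 2 }; // site 2, concurrently: Y at 2

// Site 1: own op first, then the transformed remote op.
const atSite1 = applyInsert(applyInsert(doc, opA), transformInsert(opB, opA));
// Site 2: the mirror image.
const atSite2 = applyInsert(applyInsert(doc, opB), transformInsert(opA, opB));
// Both sites converge to "aXbYc".
```

Without the transform, site 1 would apply `opB` at the stale index 2 and the documents would diverge.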

Genuinely fresh material is rare, but it does come along. NoSQL vendors' attempts to test the waters with CRDT data structures rest on theory from the late 2000s — an unusually recent foundation.

So let's think about what will sprout in this new environment. New trends will inevitably grow out of something already "widely known in narrow circles." I can offer three main suspects.

The "shared bus" and event orientation

The first and most promising trend is event-driven and reactive architecture, together with event sourcing. In these approaches, the system works not so much with the current state itself as with the operations that change it, and those operations are first-class objects: they are processed, transmitted, and stored separately from the state. Kafka and the shared event bus I have already mentioned. And if you think about it, client-side logic is already event-driven. Add event-driven business logic, and an interesting puzzle begins to come together.
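Event sourcing in miniature (an illustrative sketch, not a real framework): the event log is the source of truth, the current state is just a fold over it, and any earlier state can be rebuilt by replaying a prefix of the log.

```javascript
// Fold one event into the derived state (here, a shopping cart).
function applyEvent(cart, event) {
  switch (event.type) {
    case "added":   return [...cart, event.item];
    case "removed": return cart.filter((i) => i !== event.item);
    default:        return cart;
  }
}

// The append-only event log: operations are first-class, stored records.
const log = [
  { type: "added", item: "book" },
  { type: "added", item: "pen" },
  { type: "removed", item: "book" },
];

const current = log.reduce(applyEvent, []);               // ["pen"]
const afterTwoEvents = log.slice(0, 2).reduce(applyEvent, []); // ["book", "pen"]
```

Note what falls out for free: an audit trail, time travel, and the ability to derive new views later by re-reducing the same log differently.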

The event-driven approach, in particular, reduces the dependence on ACID transactions. The main argument for transactions is "what if we are counting money?" The irony is that the financial industry itself, where really big money is counted, managed without transactions for thousands of years before they were invented. Accounting with postings and balances, in the form that took shape in the Middle Ages, is a classic example of event sourcing: operations are recorded, reduced, and folded into the state. The same can be seen in the ancient Hawala system of payments. No ACID — pure eventual consistency. Books have been written, for example, about the sophisticated financial system of the Spanish Empire, which stayed in debt for a hundred years at a stretch. Letters from the capital to the outlying provinces took a year to make the round trip. And yet goods were delivered, the army fought, balances were settled. Meanwhile, my business partner recently went to the tax office three times because their "computers were stuck."
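The bookkeeping analogy can be made literal (a toy journal with made-up account names): postings are recorded and never mutated, and a balance is simply a reduction over the journal — no transaction isolation required to read it.

```javascript
// The journal: an append-only log of postings.
const journal = [
  { from: "bank",  to: "alice", amount: 100 },
  { from: "alice", to: "bob",   amount: 30 },
  { from: "bob",   to: "alice", amount: 5 },
];

// A balance is a derived value: fold the journal for one account.
function balance(account, postings) {
  return postings.reduce((sum, p) => {
    if (p.to === account) return sum + p.amount;
    if (p.from === account) return sum - p.amount;
    return sum;
  }, 0);
}

// balance("alice", journal) -> 75, balance("bob", journal) -> 25
```

Two readers reducing the journal at different moments may see different balances — eventual consistency — but no posting is ever lost or overwritten.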

Information-centric architectures

The second trend that may take off is info-centric architecture. When data is stored everywhere and flows freely, it makes sense to build applications not around the logic that happens to own the storage, but to always start from the data itself. Historical examples of info-centric networks are USENET and BitTorrent. In the first, messages are identified by group name and their own id, and flow freely between servers. In the second, data is identified by its hash, and participants exchange it with one another to mutual benefit. The general principle: identify the data reliably, then build the participants and the network topology around it. This matters especially when participants are unreliable and constantly come and go — which is exactly the situation with mobile clients.

About five years ago there was an interesting exchange of views on the info-centricity of the Internet. TCP pioneer Van Jacobson defended the position that existing architectures have reached the limits of their scaling and that an info-centric Internet should be designed from scratch. Another group of participants, of whom I remember Ion Stoica, generally held that HTTP is certainly dated, but by adapting the CDN infrastructure built for HTTP we can treat a URL as simply an identifier of a piece of information rather than a server-and-file path. The HTTP + CDN pair can thus be info-centric to whatever extent is needed to solve all the pressing problems of distributing static content over HTTP. And perhaps those guys were basically right.

An interesting question is whether, by analogy, info-centricity can be bolted onto the data in web applications without redesigning the architecture from scratch.

CRDT data types

And the third interesting trend, the one I already mentioned as fresh, is CRDTs (Conflict-free Replicated Data Types). They are already being worked into highly loaded NoSQL systems (I know for sure about Cassandra and Riak, and there seem to be others). The main idea of CRDTs is to give up linearizing all operations in the system (the approach MySQL uses for replication) and settle for a partial order with good causal ordering. Accordingly, you can only use data structures and algorithms that do not break under a mild reordering of operations.
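The simplest CRDT, a grow-only counter (G-Counter), shows why reordering is harmless — this is a minimal state-based sketch, not any particular database's implementation. Each replica increments only its own slot, and merge takes the per-replica maximum, so merging is commutative, associative, and idempotent:

```javascript
// Each replica increments only its own entry.
function increment(counter, replica) {
  return { ...counter, [replica]: (counter[replica] || 0) + 1 };
}

// Merge takes the per-replica maximum: order of merges does not matter,
// and merging the same state twice changes nothing.
function merge(a, b) {
  const out = { ...a };
  for (const [r, n] of Object.entries(b)) out[r] = Math.max(out[r] || 0, n);
  return out;
}

// The counter's value is the sum over all replicas.
function value(counter) {
  return Object.values(counter).reduce((s, n) => s + n, 0);
}

let a = increment(increment({}, "A"), "A"); // replica A counted 2 offline
let b = increment({}, "B");                 // replica B counted 1 offline
// merge(a, b) and merge(b, a) are identical, with value 3.
```

No coordination, no total order, and yet every replica that eventually sees every state arrives at the same answer.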

CRDTs allow optimistic synchronization of multiple replicas even when complete linearization of all operations is impossible in principle. The literature discusses how to implement the basic data structures as CRDTs: Set, Map, counters, text.
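As a taste of the Set case, here is a stripped-down observed-remove set (OR-Set) — a simplified state-based sketch; real implementations tag adds with replica-unique ids rather than a shared counter. Each add gets a unique tag, and a remove covers only the tags it has observed, so a concurrent re-add survives the merge:

```javascript
let nextTag = 0; // simplification: real tags are (replica, counter) pairs

function add(set, elem) {
  return { ...set, adds: [...set.adds, { elem, tag: nextTag++ }] };
}

// Remove tombstones only the add-tags this replica has actually seen.
function remove(set, elem) {
  const observed = set.adds.filter((a) => a.elem === elem).map((a) => a.tag);
  return { ...set, removes: [...set.removes, ...observed] };
}

function merge(a, b) {
  const dedup = (xs) => [...new Map(xs.map((x) => [x.tag, x])).values()];
  return {
    adds: dedup([...a.adds, ...b.adds]),
    removes: [...new Set([...a.removes, ...b.removes])],
  };
}

function members(set) {
  const live = set.adds.filter((a) => !set.removes.includes(a.tag));
  return [...new Set(live.map((a) => a.elem))];
}

const empty = { adds: [], removes: [] };
const r1 = add(empty, "x");                   // replica 1 adds x (tag 0)
const r2 = remove(merge(empty, r1), "x");     // replica 2 sees it, removes it
const r3 = add(r1, "x");                      // replica 1 concurrently re-adds x (tag 1)
const mergedSet = merge(r2, r3);
// x is still a member: the remove covered tag 0 but not tag 1.
```

This "add wins over a concurrent remove" behavior is a design decision baked into the data type, which is exactly how CRDTs replace ad-hoc conflict resolution.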

This is the most promising approach: it lets us step a little beyond primitive last-writer-wins, which is dangerous because of potential data loss under concurrent writes. The main difficulty, of course, is that developers must understand what CRDTs can and cannot do in order to work with them — which, in fairness, is true of any technology. Also, not all data and algorithms can be decomposed into a CRDT basis. The main benefit is the ability to resolve the synchronization of the whole zoo of replicas and data stores under concurrent write access.

A CRDT engine in a mobile application would allow it to work with data fully even without a permanent Internet connection, and to synchronize as soon as a connection appears — without conflicts and without data loss. Ideally, this could be completely transparent to the user: a sort of Dropbox/git for objects. Yes, yes — Firebase, Meteor, Derby, the Google Drive Realtime API — things are moving in this direction. But there has been no bomb yet, nothing on the level of MySQL or jQuery, which means something is still not ready.

If, as a thought experiment, we cross CRDTs, info-centricity, and event orientation, we get a hypothetical CRDT bus which, unlike CORBA/DCOM/RMI, would provide not so much access to remote objects as work with their local copies. These local copies would not be a flat "cache" but full-fledged live replicas with write access, synchronized automatically.

Why not an architecture for the web applications of the future?

Source: https://habr.com/ru/post/218215/

