
Krovi: Big Data as a dream. Episode 4. The brain revolution

In previous episodes: Big Data is not just a lot of data. Big Data is a process with positive feedback. "The Obama button" as the embodiment of rtBD&A.

There are many great books in the world that have survived centuries and even millennia. The knowledge laid down in these books is universal. The Chinese military strategists, the Bible, the Indian Mahabharata contain, among other things, templates and canons that apply to human relations in the I, the XI, the XXI, or the XXXI century alike. But the industrial revolution of the XIX-XXI centuries (locomotives, space, computers, the Internet) needed a philosophy of its own.

For more than 100 years we have been using the laws of dialectical materialism (the brilliant trinity of Marx, Engels, and Lenin not only discussed the overthrow of monarchies but were also among the greatest thinkers of the end of the second millennium). The law of the negation of the negation, the unity and struggle of opposites, the transition of quantity into quality, the cyclical nature of development: all of this is about Big Data too.

At the turn of the millennium (pompous, isn't it? More simply: at the end of the '90s) search engines were simple: one server. If there was money to burn, then two servers. Every six months or a year, as the Internet grew (back then it was decided to write it with a capital letter), the search engine was moved onto a newer machine (from 64 MB of memory and 128 GB of disks to 128/256). Aport and Yandex fit into a few rack units on Krasnokazarmennaya and Smolenka, while the coolest global search engine, AltaVista, was a real monster: two servers from DEC, whose hardware the search engine was, in effect, advertising.
A few years later a technological crisis set in: the data no longer fit on those one or two servers. The law of the transition of quantity into quality kicked in and (simplifying to the point of primitivism) voila! The old paradigm of "we need a new, cooler server (preferably from DEC or Sun)" was replaced by Google's idea of "lots of cheap hardware".

This paradigm is alive and well, subsystems grow into systems, but there is more and more data! The law of the transition of quantity into quality, having chewed through the "iron" (hardware), grew new fangs and sank its teeth into the "soft" (software). Fashionable OSs and languages appeared, but neither Google's new operating systems nor Yandex's rewritten FreeBSD solved the new Big Data processing tasks. Another revolutionary situation was at hand, created by the baby elephant Hadoop: lots of cheap "iron" supplemented with "brains" distributed across every piece of that hardware.

A technocrat's dream: maximum decentralization! More data? There will simply be more "nodes" in the lattice. Even more data? "Add more iron with brains." A different task over different data? Just load new "thoughts" into the iron brains. Since each node of the lattice solves simple tasks, it is quick and easy to assemble new "thoughts" from standard, neuron-like elements.
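To make those "standard elements" concrete, here is a minimal sketch of the map/reduce principle behind Hadoop, in Python. It is not Hadoop code; the shards, event names, and the counting task are invented for illustration. Each "node" counts events in its own shard, and only the small partial results are merged at the end.

```python
from collections import defaultdict
from functools import reduce

# Toy "lattice": each node holds its own shard of the event log.
shards = [
    ["error", "ok", "ok"],        # node 1
    ["ok", "error", "timeout"],   # node 2
    ["ok", "ok", "ok"],           # node 3
]

def map_phase(shard):
    """Each node independently counts the events in its own shard."""
    counts = defaultdict(int)
    for event in shard:
        counts[event] += 1
    return counts

def reduce_phase(total, partial):
    """Merge the small partial results into one answer."""
    for key, value in partial.items():
        total[key] = total.get(key, 0) + value
    return total

# On a real cluster the map phase runs in parallel on every node.
partials = [map_phase(shard) for shard in shards]
result = reduce(reduce_phase, partials, {})
print(result)  # {'error': 2, 'ok': 6, 'timeout': 1}
```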

I am sure you have already continued the chain of the dialectical laws of the universe on your own. But a series has to address all readers, not only the Sherlock Holmeses, so let's state it plainly: matter is the unity of space and time, there is even a term for it, space-time. And so that humanity does not get bored, there is a limitation: the speed of light. The more data in the Hadoop lattice, the more nodes it is spread across, and the sharper those teeth get: coordinating the nodes and moving data between them takes time. The laws of diamat.

The most humorous dialectical law is the law of the negation of the negation. No sooner has the next young scientific generation defeated its old foe and grown out its beards than the grandchildren arrive and smash their fathers, and (this is where the humor lies) they do it under the slogans of their grandfathers!

Hadoop the decentralizer cannot cope with the time dimension of space-time matter in rtBD&A tasks (real-time Big Data & Analytics), where data acquires a new kind of value (and "vileness"): time value. The latest data matter far more than the data that came before.
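One simple way to express "the latest data matter more" is recency weighting, for example with exponential decay. The sketch below is only an illustration of the idea, not part of any real rtBD&A platform; the one-hour half-life and the sample events are arbitrary assumptions.

```python
import time

def recency_weight(event_ts: float, now: float, half_life_s: float = 3600.0) -> float:
    """Weight of an observation; it halves every `half_life_s` seconds."""
    age = max(0.0, now - event_ts)
    return 0.5 ** (age / half_life_s)

now = time.time()
# (timestamp, observed value): ten seconds old, half an hour old, a day old.
events = [(now - 10, 5.0), (now - 1800, 3.0), (now - 86400, 1.0)]

weights = [recency_weight(ts, now) for ts, _ in events]
weighted_avg = sum(w * v for w, (_, v) in zip(weights, events)) / sum(weights)
print(round(weighted_avg, 2))  # about 4.17: the day-old reading barely counts
```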

Following the cycle of development comes a centralized solution: IMC (In-Memory Computing) technology. One expensive computer that consists, in essence, of nothing but fast memory: disk drives (the slowest links in the chain of data streams) are formally present, but only in thirtieth-rate roles. All the latest (most important) data sits in fast memory, and the analytical brains work with it "at the speed of light".
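Stripped down to a toy, the IMC idea is: the hot data lives in RAM and is queried there, while older data is demoted to slow storage. A minimal sketch under that assumption (the window size and the archive file are invented; a real IMC system such as HANA does far more):

```python
import json
from collections import deque

HOT_WINDOW = 1000               # how many of the latest readings stay in RAM (arbitrary)
ARCHIVE_PATH = "archive.jsonl"  # hypothetical cold storage on disk

hot = deque()  # the freshest data, queried without touching the disk

def ingest(reading: dict) -> None:
    """Keep the newest readings in memory; demote the oldest to the archive."""
    hot.append(reading)
    if len(hot) > HOT_WINDOW:
        cold = hot.popleft()
        with open(ARCHIVE_PATH, "a") as f:
            f.write(json.dumps(cold) + "\n")

def hot_average(field: str) -> float:
    """Analytics over the in-memory window only: no disk I/O on the hot path."""
    values = [r[field] for r in hot if field in r]
    return sum(values) / len(values) if values else 0.0

for i in range(1500):
    ingest({"meter": i % 7, "kwh": 0.2 + (i % 10) * 0.01})
print(round(hot_average("kwh"), 3))  # average consumption over the hot window
```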

As an example of a genuinely useful IMC deployment, take one built on SAP HANA for a topic popular in recent years: smart electric power grids. The main task is to optimize generation and consumption and, as a result, to reduce energy costs, along with operational monitoring and forecasting. Each house is equipped with a smart meter. Readings are taken every few minutes and processed by a Big Data analytical system integrated with a GIS. In the system you can see the overall picture of energy consumption and drill down to every district and house: how consumption varies with the weather, the season, and the time of day. And based on these real and accurate data, you can plan the power supply of one of the busiest and most energy-hungry districts.
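To give a feel for the kind of question such a system answers, here is a back-of-the-envelope aggregation of meter readings into a per-district, per-hour consumption profile. All readings, district names, and field layouts below are invented.

```python
from collections import defaultdict
from datetime import datetime

# Invented readings: (district, house, timestamp, kWh consumed in the interval)
readings = [
    ("Center", 12, "2015-02-10T07:05", 0.42),
    ("Center", 12, "2015-02-10T07:20", 0.39),
    ("Center", 17, "2015-02-10T07:10", 0.55),
    ("Port",    3, "2015-02-10T07:15", 1.10),
    ("Port",    3, "2015-02-10T08:05", 0.95),
]

# Consumption profile: total kWh per (district, hour of day).
profile = defaultdict(float)
for district, _house, ts, kwh in readings:
    hour = datetime.fromisoformat(ts).hour
    profile[(district, hour)] += kwh

for (district, hour), total in sorted(profile.items()):
    print(f"{district:8s} {hour:02d}:00  {total:.2f} kWh")
```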

You need a calculator with plenty of zeros to add up the benefits of projects on the scale of Manhattan or Brasilia. But the current cost of IMC solutions (hundreds of thousands of dollars) cuts off 99% of those who would like one, so for now this is not a mass-market solution, and the search continues.

Where do we go next? Will it be a Hadoop-IMC "mix", or dynamic "hybrid clouds" with composable "nodes", or a transition to molecular chemical computers (it is not for nothing that nature chose that approach)? Time will tell.

Here is how the development of our rtBD Platform unfolded:
1. The first 3-4 months (spring-summer 2012): the cloud, where we picked the best "cores plus memory" configurations. Storing data in the cloud was very expensive back then (our first terabytes), and money was, as always and for everyone, scarce.
2. The next year (2013): a one-time purchase of variously sized servers (HP) for the main subsystems, based on the results of the cloud experiments. We skimped on disks: a few fast ones, but the main arrays were slow SATA (10 TB).
3. In 2014 we accelerated and scaled, buying cheap (compared to HP) servers with fast disks. Together with our partners we ran a branch on SAP HANA in parallel with the main one: the speed gain was up to 5x, but our customers were satisfied with our SaaS or with clouds cheaper than HANA.
4. 2014-15: a hybrid distributed scheme, including the client-side "one system, one server" setup within a distributed network of data streams.
5. The negation of the negation (back to item 1): dozens of terabytes of archival data are now stored in super-cheap clouds :-)

In the following episodes we will talk about things more vital for today, but in continuation of what has been said: NoSQL versus column-oriented DBMSs, where the Blue Giant (IBM) is drifting, and where the rumors that "the data is running out" grow their ears from.

UPD for Habr: this "series" is published on Megamind, and a group of "sympathizers" has formed around it; now I want to gauge once more how interesting this topic is to Habr readers. Perhaps someone is following the work in rtBD, or has ideas, solutions, or ready-made modules: we have open positions, we love partnerships, and we do not like to reinvent the wheel.

Big Data as a dream. Episode 1
Episode 2: Big Data, negative or positive?
Episode 3: "The Obama button"

Source: https://habr.com/ru/post/255069/

