
Big Data Conversations

-Hello!

-Hi. How are you? Still alive?

-Hanging in there. You could even say alive and kicking. Well, shall we order? What are you in the mood for today - grilled bastard or Beef Finger Mit?

-I don't even know. The second, probably. And how are things on the IT-solutions sales front? Are you managing to get the hardware to the warehouse in time? Or is there already a shortage, and you have to ration it to two units per customer?

-Pretty much. Soon we'll be working two shifts - selling in the morning, shipping in the evening (laughs). Did you go to the “Big Data 2013” forum that Open Systems held?



-No, I couldn't make it. Was there anything interesting, or was it the usual talk about the bright future that awaits us in the coming decade if everyone starts using Big Data?

-Unfortunately, I wasn't there either. And why such skepticism about Big Data?

-First of all, because behind the veil of marketing it is now hard to see the actual grain from which Big Data should grow into something really useful. Every issue of every IT magazine or blog is sure to carry some article or note about Big Data, and that article may well discuss things that have nothing to do with Big Data or are related to it only indirectly.

-And you don't allow for the thought that Big Data really is a new and, I'm not afraid to say, innovative direction in IT?

-Let's try to break down what Big Data consists of, in broad strokes. First, there is data storage software with mechanisms for writing and reading data, that is, a CRUD generator plus metadata for keeping track of what is written where. All this software needs from the hardware is computing power and memory. And the memory can take the form of both RAM and hard drives. Do you think the fundamental principles by which processors and memory work have changed since the 60s?

-No, the principles are the same, but computing power and memory capacity have grown by orders of magnitude. That's why everyone says the era of Big Data has arrived - a terabyte of data is no longer considered serious.

-Exactly. The fast, high-capacity hard drives now coming to market can hardly be called innovative, since their operating principles were laid down back when the first computers appeared. Of course, the engineering is being modernized - manufacturing processes shrink, new interfaces appear. The same goes for processors - they get smaller and faster, but not “more innovative”. In other words, all the hardware used for Big Data is essentially the same good old disks and processors, only now there are versions with Flash drives and multiple cores for the discerning customer.

-What about software?

-This is where there is a bit of innovation - take Hadoop, or rather one of its halves, HDFS. It stores data as a distributed file system and organizes access to it. This really did not exist before, although Teradata's representatives might argue with that. With the advent of Hadoop it became possible to store hundreds of terabytes of information and even get something out of it - and that, by the way, is the second point.
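
As an aside, the idea of storing data as a distributed file system can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the tiny block size, and the round-robin placement rule are hypothetical and are not how HDFS itself places replicas (real HDFS uses far larger blocks and its own placement policy). The sketch only shows the principle: cut a file into blocks, copy each block to several nodes, and keep metadata about where everything went.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication):
    """Toy placement: replica r of block b goes to node (b + r) mod len(nodes)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def store(name, data, nodes, block_size=4, replication=2):
    """Return the metadata a coordinator would keep: file -> block -> nodes."""
    blocks = split_into_blocks(data, block_size)
    return {name: place_blocks(len(blocks), nodes, replication)}


if __name__ == "__main__":
    # Hypothetical file and node names, chosen only for the demonstration.
    print(store("log.txt", b"abcdefghij", ["n1", "n2", "n3", "n4"]))
```

Because every block lives on more than one node, losing a single node does not lose the file, which is the property that makes storage on cheap disks practical.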

-What do you mean?

-It's not enough to store and read data; you also have to extract what you need from it. This is where the second half of Hadoop comes in - MapReduce, which is exactly what extracts and processes the required data from HDFS. There is little innovation here - parallel computing appeared along with supercomputers, and the idea is not new. Doug Cutting did not invent something new; he developed a framework for distributed computing. And if you look at massively parallel architectures, Teradata, again, got there back in the 80s. So, to sum up, there is very little innovation in Big Data.
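
For readers who have not seen the MapReduce pattern, here is a minimal single-process sketch of it, using the classic word-count task. It is not Hadoop's API: the function names are made up for illustration, and the "nodes" are just loop iterations. It only shows the three steps the pattern relies on: map each input to key-value pairs, group the pairs by key, and reduce each group to one result.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each document (or block) would be mapped on a
    # different machine; here everything runs sequentially in one process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is data"]))
```

The map and reduce functions never share state, which is why the work can be spread over many machines without changing the logic.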

-Then why did people start talking about it only now?

-The technical characteristics of modern hardware have made it possible to store and process hundreds of terabytes - something that used to be hard even to imagine. Cheap drives plus multi-core processors.

-So what, does all of Big Data come down to Hadoop?

-Many try to claim that they are Big Data too - all the large vendors say their solutions work with Big Data and are designed to process huge volumes of data: Oracle, EMC, HP, IBM, Teradata and so on. But by and large Big Data is Hadoop, in whatever form, be it Cloudera, Hortonworks, or perhaps MapR. There is no consensus, though, and many also place solutions such as Exadata, Greenplum, Netezza, ParAccel, Vertica and Teradata in the Big Data category.

-On what grounds?

-They can process relatively large amounts of data, several tens or even hundreds of terabytes, in a reasonable time.

-Then what is all the marketing for? Have the vendors really been so taken with the idea of Hadoop that they are trying to tell everyone about it?

-Of course not. Some are trying to recoup the money they invested in this business: EMC bought Greenplum, HP bought Vertica, IBM bought Netezza, SAP bought Sybase - all of that cost a lot of money, and now those costs have to be recovered and turned into profit. That is why all the vendors hold conferences, write magazine articles and hire evangelists - it is all part of a marketing plan to convince potential customers that the era of Big Data has arrived and it is time to buy. As for why they need it, customers have to figure that out themselves; there are no ready-made solutions on the market.

-And Big Analytics? Using Big Data to analyze large volumes of information?

-How is it different from small analytics? All the mathematical and statistical methods used in this kind of analytics were devised and described long ago - nothing fundamentally new is appearing in this area. The mathematical apparatus developed in the 20th century is still what is used today. The analytical methods have not changed just because the volume of information has become huge. Show me this Big Analytics ...

-Hmm ... Have you heard anything about Data Science?

-Of course. It's hard not to, when everyone around is talking about Data Science. It's the same marketing, though at least it has something to do with science. Let's picture the process of analyzing data using, say, the Data -> Information -> Knowledge -> Wisdom model. First we get the data; in our case it is just a sequence of bytes. The values hidden in that sequence of bytes are information, and we get at it using metadata: for example, names are stored in this column and dates of birth in that one. To move to the next level, knowledge, we apply mathematical and statistical methods to the information and extract some knowledge from it; for example, we can find out how many people over 60 there are in Russia. That fact gives us some insight into the current or past situation. We can work with it and use it in the future. But to go further, to the next level, wisdom, we need other methods - neural and semantic networks, machine learning, fuzzy logic, that is, all the methods associated with artificial intelligence. Wisdom gives us a full understanding of the current situation; that is, we can answer the question “Why are there so many people over 60 in Russia?” or any other question whose answer is hidden in the data. We can also make forecasts and projections. Data Science is a combination of methods and approaches from different fields - machine learning, Data Mining, artificial intelligence, ordinary mathematics, expert systems, genetic algorithms and so on - that gives us an understanding of how to process data in order to obtain wisdom from it. It is actually a very interesting thing that emerged quite recently and is only at the start of its path. I follow the latest developments in this area with interest.
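
The first three levels of that model can be sketched in a few lines. This is only an illustration under stated assumptions: the sample records, the column layout in `SCHEMA`, and the function names are all invented here, and the "wisdom" level is deliberately left out because it does not reduce to a short example. Raw bytes are the data, the schema plays the role of metadata that turns bytes into information, and a simple count over the parsed records yields a piece of knowledge.

```python
import csv
import io
from datetime import date

# Data: just a sequence of bytes (invented sample records).
RAW = b"Ivanov,1950-03-14\nPetrova,1985-07-02\nSidorov,1948-11-30\n"

# Metadata: what each column means and how to parse it (assumed layout).
SCHEMA = [("name", str), ("birth_date", date.fromisoformat)]


def to_information(raw, schema):
    """Data -> Information: interpret the bytes using the metadata."""
    rows = csv.reader(io.StringIO(raw.decode("utf-8")))
    return [
        {column: parse(value) for (column, parse), value in zip(schema, row)}
        for row in rows
    ]


def age_on(birth_date, today):
    """Full years between birth_date and today."""
    years = today.year - birth_date.year
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def count_older_than(people, threshold, today):
    """Information -> Knowledge: a simple statistic over the records."""
    return sum(1 for p in people if age_on(p["birth_date"], today) > threshold)


if __name__ == "__main__":
    people = to_information(RAW, SCHEMA)
    print(count_older_than(people, 60, date(2013, 3, 22)))
```

Without the schema the same bytes would be unreadable noise, which is the point of the metadata step in the model.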

-And has anything fundamentally new appeared with Data Science?

-Not yet. People who call themselves data scientists try applying various methods from a toolkit long known to mankind and look at the result. Sometimes rather amusing things come out. By the way, one of the reasons infographics have made a comeback is that it is now possible to process a large amount of data and present facts and phenomena in a given field as colorful charts and graphs. Data Science deserves some of the credit for that. I would also note that this field is developing quite fast. There has been definite progress in processing text, images, video and audio. Recognizing the meaning contained in a text, intelligent search, speech recognition - all of this now sits at the junction of machine learning and Data Science. That same Alpha Dog already runs through the forest without getting stuck at every tree.

-Why did Data Science become popular right now?

-Again, the technical capabilities have arrived. Every home now has what would have been a supercomputer by 70s standards. Eight cores in a smartphone surprise no one. And even a student can afford to buy a couple of dozen terabytes of disk.

-OK, Data Science is clear. Still, if we set aside the marketing hype created around Big Data, does it have any practical application?

-Of course. The amount of data generated every day is huge. Nowadays even a light bulb has an IP address, or rather a MAC address, and if you really want to, you can read some metrics from it. You can set up Big Data even at home: choose the illuminance you want, say 300 lux, read the level of sunlight from an external sensor, and adjust the bulb's power so that the illuminance always stays at 300 lux. At the same time, measure the current drawn: if you accumulate such measurements for a year, you can work out how much electricity you will need next year, as well as when the bulb ran at full power and when it was not used at all. All that remains is to draw a pretty infographic - and that's it, you're in Big Data (laughs). Seriously, though, the main commercial applications of Big Data are currently seen in utilities, real estate, transport and logistics, warehousing, medicine, government, finance and insurance. In these areas the flow of generated data is very large, and you can try to put it to use. For scientific or research purposes Big Data can be used anywhere; there are no restrictions. The difficulty is that there is no complete, finished commercial solution for Big Data; you have to build one yourself. IBM has gone furthest in this respect with its projects TheSmarterCity and SmarterPlanet - those are, of course, powerful things.
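
The home light-bulb example can be sketched roughly as follows. The 300 lux target comes from the text; everything else is an assumption for illustration: the bulb is taken to add illuminance in proportion to its power, to provide 300 lux on its own at full power, and to be rated at 10 W. Energy is estimated from the power setting instead of from a measured current, which is a simplification of what the text describes.

```python
TARGET_LUX = 300.0       # target illuminance, taken from the text
BULB_MAX_LUX = 300.0     # assumption: the bulb alone gives 300 lux at full power
BULB_RATED_WATTS = 10.0  # assumption: rated power of the bulb


def bulb_power_fraction(sun_lux, target_lux=TARGET_LUX, bulb_max_lux=BULB_MAX_LUX):
    """Fraction of full power (0.0 to 1.0) needed to top sunlight up to the target."""
    deficit = target_lux - sun_lux
    return min(1.0, max(0.0, deficit / bulb_max_lux))


def summarize(sun_readings, hours_per_reading=1.0):
    """Accumulate energy use and count full-power and unused intervals."""
    energy_wh = 0.0
    full_power = 0
    unused = 0
    for sun_lux in sun_readings:
        fraction = bulb_power_fraction(sun_lux)
        energy_wh += fraction * BULB_RATED_WATTS * hours_per_reading
        if fraction >= 1.0:
            full_power += 1
        elif fraction <= 0.0:
            unused += 1
    return {"energy_wh": energy_wh, "full_power": full_power, "unused": unused}


if __name__ == "__main__":
    # Four hourly sunlight readings in lux (invented values).
    print(summarize([0.0, 150.0, 300.0, 500.0]))
```

Fed with a year of sensor readings instead of four invented ones, the same summary gives the yearly consumption and the usage pattern mentioned in the dialogue.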

-It sounds exciting, of course, but are there already projects where this actually works? And how are things with Big Data in Russia in general?

-You know, it is starting to appear - all the big vendors have brought their hardware here and are talking about it: Oracle Big Data Appliance, EMC Greenplum HD and EMC Greenplum MR, IBM BigInsights, Teradata AsterData.

-And are there people who already know how to work with them? Java developers who understand data analysis?

-There are, but not many. By the way, you don't have to know Java to work with Big Data. Cloudera Impala, for one, has already been released in beta, so soon it will be possible to work with Big Data using ordinary SQL, and in real time at that. R will still have to be learned, though ...

-And are there many such people?

-There are far more people who say they work in Big Data without really understanding what it is. Those who actually know the subject can be counted on one's fingers - I mean here in Russia.

-And who implements all this? Do integrators even know about Big Data?

-For now, companies are implementing Big Data on their own. But, surprisingly, the integrators have caught on too: some of them have opened departments in this area, and people have even appeared in them - IBS, Nvision, Technoserv, FORS. You'll laugh, but even Ayteko has opened a Big Data practice, although you would think, why would government agencies need that? So far the press is not writing about any high-profile implementations or projects, but things are slowly moving. Sberbank, for instance, has opened a Big Data track in its R&D center and is studying something there. It is quite possible that in a couple of years they will be using Big Data for their own purposes.

Source: https://habr.com/ru/post/173757/

