Across industries, Big Data is seen as yet another Holy Grail that will solve everything, save and protect! In real life it is exactly the opposite: Big Data means entirely new tasks, shutting down stagnant projects and letting go of specialists who have not retrained. We offer a series of articles on the real-world use of unstructured Big Data in various industries and on the emergence of new professions whose names are still being invented: Big Data analyst and sociologist, HiLoad linguist, trend journalist (from the word trEnd, not trYNde). We also hope for fruitful discussions about where this new high road should lead.
Rosy dreams about BD (Big Data), like ideas about it, are different for everyone: hardware vendors dream of selling a lot of hardware, software companies a lot of new software, telecoms the cloud, and customers a magic wand: “I pressed a button, and it did everything for me!” Nothing disappoints more than an unfulfilled dream. Meanwhile, vendors, software companies, telecoms and the rest will fulfill their own dreams and fly off to gather pollen from the new dreams of customers disillusioned with BD. Knowledge is power; it is time to apply that power and take a sober look at BD through the eyes and expectations of customers and industries.
For several years we have been working with the "tastiest" kind of BD: unstructured rtBD & A (real-time Big Data & Analytics). In the rtBD & A segment, new industries are being created and existing ones are growing explosively, and they need the “right” specialists, and many of them: Gartner estimates the demand for BD analysts in the US alone at 190 thousand people by 2018. As practitioners who have already faced the new challenges, we understand that we owe it to others to tell, explain and help. Otherwise it will turn out as usual: the dream will change from a Grail-like “pink elephant” into a “big pig”, with all that follows.
The term Big Data, a concept with barely five years of history, is actively spreading into all kinds of fields and industries: video, RTB, sociology, medicine, space, finance, and so on. Wherever you go, there are people proudly telling how they courageously battle terabytes and trillions of records to improve the CURRENT work of specific industries.
Unfortunately, this approach may be the biggest mistake in how clients understand Big Data as a dream of a bright future. Let's try to figure out what the problem is. Below we set out our vision, shaped by 20 years of experience building various Internet projects "in the field of Big Data" (it went by other names before), with an emphasis on rtBD & A.
Our vision may differ in some respects, even significantly, from the usual technological 3V template (volume, variety, velocity) for Big Data, because:
1) From the client's side, only the result should be visible (the whale, the periodic table), not the ocean of data itself;
2) Variety applies not only to the data, but also to the sources and to the attitudes toward the sources themselves;
3) Such “complex” systems as a person, groups of people, or entire nations, each with its own perception of the world, history, relationships, phraseology and vocabulary, can act as sources, or “sensors”, for BD;
4) Life is always wider than any templates.
So, first, let's forget that "BD is a lot of data." Analysts (researchers, inventors and other "scientists", as well as clients) need just enough data to set off an "explosion" that blows up the OLD structure of an industry. A remarkable example: we do not know how much data Mendeleev had, but it was enough to produce the “Periodic table of chemical elements” with fewer than 100 cells. No further comment is needed: everyone now studies chemistry at school.
Secondly, two kinds of data must be kept apart:
A) personalized data, with many records per object;
B) the information field of data across an industry and around its objects.
An example of type A: RTB data used to show a specific “targeted” ad in a specific browser on a specific device. Are you still being chased by unwanted ads for high-interest bank deposits because your other half clicked on a pretty handbag ad? That is a type A system at work: the “travels” of your laptop's browser are stored in petabytes to remind you of all the sins of your youth, even if you have since changed gender.
Examples of type B: which problems caused iPhone sales to fall in Russia? Will Le Pen be able to overtake Sarkozy in the regional elections?
Type A is often called the “Dossier” type: there is a specific known object (for example, a person, a wallet account or a phone), and with every “twitch” of that object another entry is added to its Dossier. For type B, no specific object matters (say, one big fish in the ocean); the data are analyzed for the whole ocean, with all its fish, algae and plankton.
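The difference between the two types can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the event list, the identifiers such as `browser-1` and the function names are invented here and do not describe any real RTB system.

```python
from collections import Counter, defaultdict

# Hypothetical event stream: (object_id, action) pairs, invented for illustration.
events = [
    ("browser-1", "viewed_handbag_ad"),
    ("browser-2", "viewed_deposit_ad"),
    ("browser-1", "clicked_handbag_ad"),
    ("browser-3", "viewed_handbag_ad"),
]


def build_dossiers(events):
    """Type A: every event is appended to the dossier of one known object."""
    dossiers = defaultdict(list)
    for object_id, action in events:
        dossiers[object_id].append(action)
    return dict(dossiers)


def build_field_summary(events):
    """Type B: object identity is dropped; only totals over the whole field remain."""
    return Counter(action for _, action in events)


dossiers = build_dossiers(events)
summary = build_field_summary(events)
print(dossiers["browser-1"])   # the history of one object
print(summary.most_common(1))  # the dominant action across the whole "ocean"
```

The same events feed both views; what changes is the question being asked: "what did this object do?" versus "what is happening in the field as a whole?"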
“Winwood Reade is good upon the subject,” said Holmes. “He remarks that, while the individual man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant.” (Arthur Conan Doyle, “The Sign of the Four”)
Thirdly, it is necessary to distinguish between structured data (for example, a store receipt for a purchase) and unstructured data (for example, this very article on Mega Brain). Of course, there will always be someone who finds the text of an article “structured”, at least as a set of 33 letters of the alphabet, 10 digits and a few punctuation marks. Such nitpickers can be sent back to school to study that same chemistry (why a molecule of water, liquid or ice, is formed from atoms of two volatile gases, flammable “H” and combustion-supporting “O”).
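A rough Python sketch of this distinction, with invented sample data: the `Receipt` class, its fields and the sample sentence are assumptions made for illustration, and the crude word split merely stands in for real linguistic processing.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Receipt:
    """Structured data: every field has a fixed name and type."""
    store: str
    item: str
    price: float


def total_spent(receipts):
    # Structured records can be queried directly by field name.
    return sum(r.price for r in receipts)


def word_frequencies(text):
    # Unstructured text has no fields, so some structure must be extracted first.
    # A lowercase word split is the simplest possible stand-in for that step.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


receipts = [Receipt("store-1", "handbag", 120.0), Receipt("store-1", "milk", 1.5)]
article = "Big Data is not a lot of data. Big Data is enough data."
freq = word_frequencies(article)
print(total_spent(receipts))
print(freq.most_common(3))
```

With the receipt, the question ("how much was spent?") maps straight onto a field. With the article, someone first has to decide what to extract, and that decision is where the analyst's work begins.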
Fourthly, and closer to the technical side, BD can be divided into real-time and ... non-real-time. Again, no fanaticism, please. Two years ago, when we showed colleagues from Cloudera some examples of rtBD & A, one of their specialists said plaintively that Hadoop is, of course, cool, and for processing brain tomography scans in a day or two it is just the thing, but real time requires completely different solutions. But that is another story.
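The contrast can be shown with a toy Python sketch. It is our own illustration, not a description of Hadoop or of any real streaming engine; the messages and the `SlidingWindowCounter` class are hypothetical.

```python
from collections import Counter, deque


def batch_top_word(all_messages):
    """Batch style: wait until the full data set is collected, then scan it once."""
    counts = Counter(word for msg in all_messages for word in msg.split())
    return counts.most_common(1)[0][0]


class SlidingWindowCounter:
    """Streaming style: update counts per message, keeping only the last `size` messages."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.counts = Counter()

    def add(self, msg):
        words = msg.split()
        self.window.append(words)
        self.counts.update(words)
        if len(self.window) > self.size:
            # Forget the oldest message so the answer reflects "now", not all history.
            self.counts.subtract(self.window.popleft())
            # Unary plus drops words whose count fell to zero.
            self.counts = +self.counts

    def top_word(self):
        return self.counts.most_common(1)[0][0]


# Hypothetical message stream, invented for illustration.
messages = ["iphone iphone iphone", "iphone sales", "samsung", "samsung samsung"]

stream = SlidingWindowCounter(size=2)
for msg in messages:
    stream.add(msg)

print(batch_top_word(messages))  # answer over the whole history
print(stream.top_word())         # answer over the two most recent messages
```

The batch function needs the complete data set before it can answer, while the windowed counter has an answer after every message and deliberately forgets old data. Here the two disagree: the full history is dominated by one word, the recent window by another.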
Summary of the 1st episode: Big Data is the amount of data needed for a revolution, not an evolution. The data can be per-object or span the entire information field; it can be structured or unstructured; some tasks require processing in near real time.
In the next episodes: Who are they, the Big Data analysts? Why is IBM ready to train 10,000 employees to analyze Twitter data? Some unique case studies of analytics on unstructured BD. Which industries are already going "under the chandelier"? What technologies are required to process Big Data? Why did such successful companies as Motorola, Nokia and HTC “die”, and will Samsung survive the fight against Apple? Where are ideas born now, and who comes up with them? ..
But, as often happens in rtBigData & A, all of these plans may be set aside, and the next episode will instead be devoted to the questions and tasks raised in the comments on this introductory piece :-)
2nd episode: Big Data, negative or positive?