
The many varieties of data



The term “big data” has long been familiar, and many people even understand what it really is and how to use it. Meanwhile, data analysts have come up with many other gradations of the information being collected, depending on its size, freshness, relevance, and so on. Surprisingly, data can be “fast”, “hot”, “long”, “slow”, and even “dirty”. Yet this whole analytical zoo did not help the many analysts who failed to correctly predict the British decision to leave the EU or Trump's victory.

Big data is not just very large amounts of information; it is a set of approaches, methods, and tools for processing data of colossal volume.
Big data is not just information; it is a socio-economic phenomenon that owes its appearance to the need to analyze vast amounts of information on a global scale.

Big Data rests on three Vs: volume, variety, and velocity. Volume is self-explanatory. Variety depends on the breadth of the range of sources feeding the database. And velocity is the defining measure of the modern world, which does not stop for a second.
Can polls, even those covering thousands of people, be considered “big data”? The amount of information that can be obtained from all kinds of surveys is fairly large, but still not large enough, so it is better described as “medium data”. If an election analyst could cover millions of respondents, that would indeed be “big data”. Big data can also be assembled from bricks of “small data”.

One of today's trends is “fast data”. In the modern world everything happens at lightning speed. In apps and social networks, information that is one or two hours old is no longer relevant; every second counts. Fast data matters for banking applications, for social networks, and especially for instant messengers. Every second, users receive new notifications on which they base decisions that matter to them.

Accumulating “slow data”, by contrast, takes a long time. Unlike fast data, which can be obtained through instant polling, slow data is gathered literally grain by grain. For example, suppose you are surveying the participants of a design conference. Each participant is polled before, during, and after the event, and only then is all the information carefully processed and summarized.

And when the accumulation period starts to be measured in centuries, slow data turns into “long data”. Since the Big Data era began relatively recently, long data today should be sought not on the Internet but in books, manuscripts, on the walls of architectural monuments, and at archaeological excavations. For some studies, this historical perspective can be very important.

Although data are not pastries, they can still be “hot” and “cold”. The freshness principle applies here: fresher, “hot” data is more valuable. For an ordinary user, the long-awaited messenger comment that is 10 seconds old matters more than an already “cold” comment written 2 hours ago. Of course, cold data can still be useful, for example to check a fact from the conversation: recalling the title of a book or film a friend recommended, confirming the time of a meeting, and so on. Access to hot data should be constant; cold data is needed far less often, so permanent access to it is by no means a first necessity.
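At heart, the hot/cold split is a tiering decision based on freshness. Below is a minimal sketch in Python; the one-hour cutoff and the tier names are hypothetical, chosen only to illustrate the routing idea, not taken from any particular storage product.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoff: anything older than one hour is treated as "cold".
HOT_WINDOW = timedelta(hours=1)

def pick_tier(created_at, now=None):
    """Route a record to fast ("hot") or cheap ("cold") storage by its age."""
    now = now or datetime.now(timezone.utc)
    return "hot" if now - created_at <= HOT_WINDOW else "cold"

now = datetime.now(timezone.utc)
print(pick_tier(now - timedelta(seconds=10), now))  # a fresh comment -> "hot"
print(pick_tier(now - timedelta(hours=2), now))     # a 2-hour-old comment -> "cold"
```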

Besides size, speed, or temperature, data can also be classified by purity. “Dirty” data is data that is erroneous, incomplete, or inconsistent, and it is usually practically useless. Dirty data makes up most of the information accumulated in many companies. At the same time, real informational treasures, valuable long-term insights, can be buried in it. But dirty data also causes plenty of trouble: according to GovTechWorks, such unstructured and irrelevant information costs US companies $6 billion annually!
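What counts as “dirty” usually comes down to a few simple checks: missing values, malformed fields, impossible values, duplicates. Here is a small, purely illustrative sketch in Python with pandas; the columns, records, and validity rules are invented for the example.

```python
import pandas as pd

# Toy customer records with typical "dirt": missing fields, a malformed e-mail,
# an impossible age, and a duplicate row (all values are made up).
df = pd.DataFrame({
    "name":  ["Alice", "Bob", None, "Alice", "Eve"],
    "email": ["a@x.com", None, "c@x.com", "a@x.com", "eve@x"],
    "age":   [34, 29, 41, 34, -5],
})

dirty = (
    df["name"].isna()                                 # incomplete: missing name
    | df["email"].isna()                              # incomplete: missing e-mail
    | ~df["email"].str.contains(r"@.+\.", na=False)   # inconsistent: malformed address
    | ~df["age"].between(0, 120)                      # erroneous: impossible age
    | df.duplicated()                                 # redundant: exact duplicate row
)

print(f"{dirty.mean():.0%} of rows are dirty")        # rough share of unusable records
clean = df[~dirty].reset_index(drop=True)             # keep only the usable part
print(clean)
```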



The term “responsible data” describes a situation in which only reliable information is accumulated, taken from trustworthy sources and stored and transmitted under strict security measures.

“Thick data” is the next step after we have played enough with big data: besides quantitative characteristics, qualitative ones are taken into account. Dry numbers alone, however huge the volume, are not enough for a deep understanding of trends and processes; a complete analysis also has to account for things such as human emotions.

Big data rules the world


With such a variety of definitions, the question arises: what is this data, really? First of all, it is big, gigantic. Big Data gathers near us, around us, and even about each of us, slowly and steadily forming out of tiny grains of sand.

The popular phrase “Big Brother is watching you” immediately comes to mind. From scraps of information collected everywhere, databases are compiled and then used for all kinds of research and for manipulating public opinion. The collected information is analyzed, and a kind of fortune-telling about the outcome of important events takes place. This fortune-telling produces all sorts of predictions about election victories, shifts in a country's political situation, or swings in the popularity of some band among young people.



The title of Big Data champion belongs to three whales: Google, Facebook, and Amazon. These corporations record every last mouse click of every user of their portals, all for the sake of global information gathering. High hopes are pinned on big data: researchers predict its enormous influence on every branch of human life and activity. Medicine and science have not escaped this fate either.

How can Big Data be useful in medicine? The point here is not even the sheer amount of accumulated information, but the methods of processing and analyzing it. The volume of medical data in a number of fields long ago reached a size that is problematic not only to process but even to store. The most striking example is the sequencing of the human genome, which consists of more than 3 billion characters. Under the auspices of the US National Institutes of Health, this work took 13 years (from 1990 to 2003). By 2017, thanks to the growth of computing power and the development of theoretical and software tools, the same task takes weeks or even days.
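A quick back-of-the-envelope calculation puts “more than 3 billion characters” into perspective; the figures below are plain arithmetic plus one openly hypothetical assumption about raw sequencing data per person, not measurements from any real pipeline.

```python
BASES = 3_000_000_000                 # approximate length of the human genome

ascii_bytes  = BASES                  # one byte per character as plain text
packed_bytes = BASES * 2 // 8         # 4 possible bases -> 2 bits per base

GIB = 1024 ** 3
print(f"plain text:   ~{ascii_bytes / GIB:.1f} GiB")   # ~2.8 GiB
print(f"2-bit packed: ~{packed_bytes / GIB:.2f} GiB")  # ~0.70 GiB

# A single finished genome is modest by today's standards; the storage problem
# appears at scale. Assuming (hypothetically) ~100 GiB of raw sequencing reads
# per person, a biobank covering a million people needs roughly:
raw_per_person_gib = 100
people = 1_000_000
print(f"hypothetical biobank: ~{raw_per_person_gib * people / 1024**2:.0f} PiB")
```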

The main task of big data in medicine is to create the most complete and convenient registers of medical information, with the possibility of mutual exchange, which would make it possible to introduce everywhere complete electronic patient records containing the entire medical history from the moment of birth. This would significantly streamline the work of healthcare institutions.

But let us return to the recent sensational event that literally turned the Internet upside down: Donald Trump's victory in the elections. Although his win came as a surprise to many, including analysts and political strategists, it is probably, to a large extent, the natural result of the intelligent use of big data.

The Swiss magazine Das Magazin claims that this victory was delivered by a couple of scientists, Big Data, and modern technology. A certain Michal Kosinski developed a unique system that makes it possible to learn a great deal about a person from nothing more than their likes on social networks, the basis of so-called “micro-targeting”. Kosinski's work later began, against his will, to be used in big political games. The same system was then employed in the election campaign of an American businessman. No one suspected the politician's connection to the analytics company; there is not even a computer on Donald's desk. But the current US president gave himself away: he wrote on his Twitter account that he would soon be called Mr. Brexit.

In her election campaign, Hillary Clinton acted traditionally, appealing to different groups in the country and making separate appeals to the black population and to women. Cambridge Analytica acted differently. Having bought databases of adult US residents, they profiled each person using the OCEAN model (the “Big Five” personality traits), taking personal preferences and interests into account. Depending on character and mindset, each person in the database was sent messages urging them to vote for Cambridge Analytica's client, and the argument was tailored to the recipient's previously constructed individual profile. Some of the messages were even built on the principle of contradiction and suggested voting for Hillary.
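To make the “one message per profile” idea concrete, here is a deliberately simplified, hypothetical sketch: it assumes trait scores have somehow already been estimated (for example from likes) and merely picks a message template by the dominant OCEAN trait. It is an illustration of the targeting logic described above, not Cambridge Analytica's actual system.

```python
# OCEAN = Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism.
# All scores and message templates below are invented purely for illustration.
TEMPLATES = {
    "openness":          "Here is the bold new idea our candidate wants to try...",
    "conscientiousness": "Here is the candidate's detailed, step-by-step plan...",
    "extraversion":      "Join thousands of your neighbours at this weekend's rally...",
    "agreeableness":     "Here is how the candidate will look after your community...",
    "neuroticism":       "Here is the threat the candidate will keep your family safe from...",
}

def pick_message(profile):
    """Choose the template that matches the person's strongest estimated trait."""
    dominant = max(profile, key=profile.get)
    return TEMPLATES[dominant]

# A voter whose estimated profile leans toward neuroticism gets the fear-framed text.
voter = {"openness": 0.2, "conscientiousness": 0.4, "extraversion": 0.3,
         "agreeableness": 0.5, "neuroticism": 0.8}
print(pick_message(voter))
```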

Kosinski, the scientist who invented the micro-targeting system, can only watch this use of his work from the sidelines. According to him, it is not his fault that the invention became a bomb in someone else's hands. It should be emphasized that the Swiss magazine's publication has been criticized by numerous European media outlets, which claim that the information it presents is unproven.

While the question of whether big data really influenced the US elections is still being debated, that data continues to be collected and systematized. Be careful on social networks: who knows whom you will end up voting for, or what you will run out to buy, once you have felt the impact of big data?

Source: https://habr.com/ru/post/402345/

