
Big data and big questions

Big Data grows more popular every year. Analyst reports show a rising share of companies that actively use “big data” in their business processes.

Today we will talk about how this fashionable term is often misleading and keeps us from fully appreciating the benefits of what it actually refers to.


/ photo by Philip Kromer CC
The technology itself is, in a sense, already used almost everywhere. The volumes of data held by companies that provide search services or social networks stagger even the wildest imagination. It would be strange to leave that data sitting idle: businesses try to derive additional benefit from the knowledge that can be obtained by analyzing existing data about users and their preferences.

Here you can draw parallels with any field, from medicine to road traffic. The point is the analysis of data that must meet certain requirements; only then can it be classed as "big data".

What is the problem


What analysis provides is the ability to understand patterns and, based on them, to predict how events will develop in the near future. But like any new tool, Big Data requires careful handling and close attention to the quality of the research.

Sometimes researchers end up constructing dependencies that do not hold up to basic logic. One example is checking a text against certain “quality” requirements. Everything depends on how adequate the preset parameters are and how well they correspond to the actual quality of the material.

In some cases of text evaluation, the algorithm ends up flagging long sentences as meaningful or "quality", but we all know that this approach will not get you far when evaluating any literary work. Such evaluation algorithms are fairly easy to game once you understand the logic behind them.
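To make the point concrete, here is a minimal sketch of such a scorer. It is purely illustrative: the function name naive_quality_score and both sample texts are our own assumptions and do not come from any real evaluation system. The sketch rates a text by its average sentence length, which is exactly the kind of preset parameter that has little to do with actual quality.

```python
import re


def naive_quality_score(text: str) -> float:
    """Hypothetical scorer: the average number of words per sentence.

    A longer average is treated as "higher quality", which is the
    flawed assumption this sketch is meant to illustrate.
    """
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(sentences)


# Two made-up samples: a short, clear text and a padded one that says less.
clear = "The cat sat. It slept."
padded = (
    "The cat, which was a cat and indeed very much a cat, "
    "sat and sat and then sat some more"
)

print("clear: ", naive_quality_score(clear))
print("padded:", naive_quality_score(padded))
```

The padded sample scores higher even though it carries no extra meaning. Anyone who knows the rule can inflate the score simply by stringing words together, so the parameter measures wordiness rather than quality.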

Another example is the Flu Trends project launched by Google. It was meant to predict outbreaks of disease, but it could not outperform the official services that deal with these issues professionally.

What to do in practice


The main reason certain kinds of “big data” do not work is the basic absence of even a minimally organized system for collecting that data. Any such undertaking requires significant preparatory work, which means additional costs for planning and design.

In addition to understanding how data is collected and organized, it is worth assessing whether the IT infrastructure that serves these processes needs to be expanded. Today every IT company faces these issues in one way or another: the amount of data to be processed keeps growing, and with it the importance of investing in Big Data technologies.

Today it is not enough just to collect a lot of data. To reach even intermediate conclusions, you need to be able to correctly formulate the hypotheses on which the analysis will be based. This requires bringing in specialists who work directly on data analysis.

P.S. We try not only to share our own experience running the 1cloud virtual infrastructure service, but also to cover related areas of knowledge in our blog on Habr. Do not forget to subscribe to updates, friends!

Source: https://habr.com/ru/post/261487/

