
Big Data: Silver Bullet or Just Another Tool?



The term “Big Data” appeared not so long ago: it was first used in the journal Nature in 2008. That issue (September 3) proposed using the term for a set of special methods and tools for processing huge volumes of information and presenting it in a form the user can understand.

Very soon, researchers in the newly minted field concluded that big data is not only suitable for analysis but can be useful in many areas: from predicting influenza outbreaks by analyzing Google search queries to finding the best price for plane tickets from a huge array of aviation data.
Advocates of this approach even claim that the combination of powerful modern technology and the “powerful” volumes of information available in the digital age promises to become a formidable tool for solving virtually any problem: crime investigation, healthcare, education, the automotive industry, and so on. “We just need to collect and analyze the data.”

Who works with big data


As big data has grown in popularity, more and more companies are using it in one way or another (or seriously considering it). The publication CNews surveyed organizations on whether they use (or plan to use) big data: 40 of the 108 respondents (about 37%) answered yes. This makes sense: when big data is used properly, the business sees a real return. Executives point to revenue growth, more accurate product positioning, and more effective marketing campaigns.

Here are a few such success stories. The first is Airbnb, the well-known online platform for listing, finding, and renting private housing short-term around the world. On the company's website you can not only find information about the owner of the place you want to rent, but also check whether they are a friend of someone you know on Facebook.

Netflix also analyzes user data. The service's staff developed an algorithm that produces high-quality film recommendations. Moreover, the company used the accumulated information to create its own original content, which became a worthy competitor to the best of cable TV.

The content in question is the political drama “House of Cards”. Data expert Sebastian Wernicke says that to succeed, you need to break the data down into its components and analyze it, and only then, using your own head, decide what to do next.

Netflix specialists examined the data the company already had (ratings on the Netflix platform, viewing history, and so on) and used it to identify the small details of TV shows that viewers like. The result is a series with an IMDb rating of 9.0 (at the time of writing).

Netflix's work with big data is not limited to creating “House of Cards”. For example, the company uses data analysis to build its own catalog of genres and classify films and TV series “in its own way”: instead of the usual thrillers and romantic comedies, among the more than 90 thousand (!) Netflix genres you can find “cult horror films with evil children”, “dark sci-fi suspense”, and even “Indian romantic crime dramas”.

At the same time, an algorithm that analyzes films from all over the world can not only determine a film's genre but can also potentially predict new trends in cinema (we described this in detail here).

Although most companies do not have such capabilities, this does not mean that only a select few businesses can use data. As Tom Davenport writes in “The Rise of Analytics 3.0: How to Compete in the Data Economy”: “The most important feature of the Analytics 3.0 era is that not only online companies but literally any firm in any field can take part in the data economy.”

UPS, for example, uses digital map data and telemetry systems to plan the best route for each of its drivers, of whom there are more than 55,000. Progressive Insurance takes its customers' credit ratings into account and compares them with its own data to predict the likelihood of insurance claims.

On the one hand, all of these are fairly standard data mining scenarios (unlike the Netflix recommender system); on the other, they are gradually moving out of the category of “top-level analytics in search of insights” and becoming quite routine business tasks.

How things stand in Russia


The Russian big data market is still relatively small: in 2014 it was estimated at just $340 million, compared with $33.3 billion globally. However, it is growing very quickly: while the worldwide market grows by 17% a year on average, the Russian market grows by 40% a year.
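To get a feel for what these growth rates mean, here is a minimal sketch that compounds the 2014 figures forward. It assumes, unrealistically, that both rates stay constant for years; the function name `project` and the chosen horizons are illustrative and not taken from any market report.

```python
def project(size_musd, annual_growth, years):
    """Compound a market size (millions of USD) at a fixed annual growth rate."""
    return size_musd * (1 + annual_growth) ** years

russia_2014 = 340.0      # $340 million, figure from the text
world_2014 = 33_300.0    # $33.3 billion, figure from the text

# Assumption: 40% and 17% yearly growth hold unchanged over the whole horizon.
for years in (0, 5, 10):
    ru = project(russia_2014, 0.40, years)
    world = project(world_2014, 0.17, years)
    print(f"+{years:>2} years: Russia ${ru:,.0f}M, "
          f"world ${world:,.0f}M, share {ru / world:.1%}")
```

Even under this generous assumption, the Russian share of the global market starts at about 1% and rises only gradually, which shows how large the initial gap is.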

The main companies interested in big data analysis in Russia are telecom operators, banks, and large retailers. This is not surprising: they really do collect a lot of data, and the relevant tasks (primarily cluster analysis of consumers) are pressing for them. However, there are other examples.

Among the most notable are Yandex Data Factory, the international division of Yandex, and Mail.ru Group. While at Mail.ru the analysis of large data sets primarily serves the development of the company's own services, Yandex works as a B2B data miner (it has data analysis projects for companies ranging from Statoil to Wargaming).

Big data - a big voyage?


Big data can indeed be applied in many areas, but it is important to understand the strengths and weaknesses of this tool and to have a clear idea of what it can and cannot do.

Working with big data is not always about huge volumes of data (or, more precisely, not only about them), although that is what most people think of when they hear the term. Far more important is the ability to evaluate the data, examine the relationships within it, and tie them together into a coherent picture.

But the correlations found in the data cannot always be trusted. For example, the number of murders in the USA fell as Internet Explorer's share of the browser market fell, yet a causal link between the two is absurd and has no practical use (other than as a joke).
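The effect is easy to reproduce. The sketch below computes the Pearson correlation coefficient for two series that simply both decline over time. The numbers are made up for illustration and are not real browser or crime statistics.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical yearly values, invented for this example only.
browser_share = [68, 65, 60, 55, 47, 40]        # market share, %
crime_rate = [5.8, 5.7, 5.4, 5.0, 4.8, 4.7]     # incidents per 100,000 people

r = pearson(browser_share, crime_rate)
print(f"Pearson r = {r:.2f}")
```

The coefficient comes out close to 1, even though the only thing the two series share is a downward trend. A high correlation by itself says nothing about whether one quantity influences the other.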



In addition, many tools based on big data can be fooled. For example, essay-grading programs use metrics such as sentence length and the complexity of the words used, and also look for similarities with previously written essays that received high marks.

As a result, the algorithm tries to reduce the quality of creative work to a relatively narrow set of quantitative features. There is some sense in this, of course, but with such an approach to grading, writing an essay can easily turn into a mechanical selection of the “right words”.
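A toy version of such a grader shows how easily it is gamed. This is a deliberately naive sketch, not the algorithm of any real grading product: the function `naive_essay_score`, its two features, and the weight of 2 are all assumptions made for the example.

```python
import re

def naive_essay_score(text):
    """Toy scorer: rewards long sentences and long words, ignores meaning."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    avg_sentence_len = len(words) / len(sentences)            # words per sentence
    avg_word_len = sum(len(w) for w in words) / len(words)    # letters per word
    # Arbitrary illustrative weighting: word length counts double.
    return avg_sentence_len + 2 * avg_word_len

clear = "Data can mislead. Check it twice. Then decide."
padded = ("Notwithstanding multitudinous considerations, heterogeneous "
          "methodological paradigms incontrovertibly necessitate "
          "circumspection.")

print(f"clear:  {naive_essay_score(clear):.1f}")
print(f"padded: {naive_essay_score(padded):.1f}")
```

The padded sentence says almost nothing, yet it scores far higher than the clear one, because the scorer only sees length. Real systems use more features, but the same weakness applies whenever the features can be inflated without improving the content.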

Even IT giants and big data advocates like Google are not immune to mistakes. The company has not managed to defeat the “search bomb” phenomenon, and the Google Flu Trends project, which according to its developers could predict disease outbreaks, was wrong more often than the US Centers for Disease Control and Prevention.

Netflix has had its difficulties too. The genre classification system discussed above works exactly as its developers intended, with the exception of the so-called Perry Mason mystery, which neither outside analysts nor Netflix employees themselves can explain.

According to CA Technologies, 92% of companies working with big data have difficulty developing their Big Data projects. The most serious obstacles are underdeveloped existing infrastructure and the organizational difficulty of introducing new approaches to data collection.

The problem may also lie in the notorious “human factor”: not every analyst can work effectively in this area. Ricardo Vladimiro of Miniclip believes that to really dive into the study of data, a person must be well versed in statistics and probability theory, and must also be able to run experiments, test hypotheses, and visualize data.

But even this is not enough: data science is a mixture of statistics, mathematics, programming, and, importantly, domain knowledge, whether in retail, banking, or any other industry. Too many organizations hire brilliant mathematicians and programmers who lack this last component.

It is simply impossible to work with big data without a deep understanding of the market a particular business operates in and of the specifics of the particular company. This is one reason why Gartner recommends training data specialists within the organization rather than hiring them from outside (not to mention that all the skills listed above, from statistics to domain knowledge, are merging into a separate profession: working with data).

Another problem arises primarily for “lone analysts” (researchers working “for themselves”, for example as part of research at a university) and for small companies that decide to use big data: a lack of funds for the infrastructure needed to process it.

At the same time, the question of where to find data is not so acute in this case: businesses collect a lot of customer data (and, as noted above, staggering volume is neither the only nor a mandatory feature of big data), while researchers can use data sets that large IT companies distribute freely.

A telling example is Yahoo: the company has publicly released an impressive data set for research purposes. However, as Quora users rightly pointed out in the relevant discussion, a researcher without a team and resources may not have the capacity to analyze it.

Cloud services can be a solution here: for example, at 1cloud we let both companies and individuals use data center infrastructure. On the one hand, this is easier and cheaper than running your own hardware; on the other, this way of working lets you avoid “betting everything on big data” and reduces the risk if its use turns out not to be justified.

The popularity of Big Data has led to the technology being perceived as a universal “silver bullet” with magical powers to solve any problem. In reality, it is just another tool with its own advantages and disadvantages.

For big data to bring real benefits, it is not enough to invest in implementation projects: you also need to use new technologies (for example, cloud computing), refine business processes, and change management approaches.

P.S. Additional materials on the development of the virtual infrastructure provider 1cloud:

Source: https://habr.com/ru/post/282560/

