What is big data? The answer depends on who is asked and when. Take an ordinary user: fifteen years ago, a home computer held a few gigabytes of data on average; now it holds hundreds or even thousands of gigabytes. A more serious example: the sensors installed on a Boeing jet generate approximately 10 TB of data per engine in just 30 minutes. That means a plane flying from Moscow to, say, Novosibirsk in 4 hours gives us about 160 TB of data (assuming two engines). And that is from just one flight. For dessert, try to estimate how much data the Olympics in Sochi left to humanity: hundreds of athletes and the data about them, thousands of hours of competition video, security camera footage, and so on.
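The flight estimate is easy to check with a few lines of Python. This is only a sketch of the arithmetic: the per-engine rate and flight time come from the text, while the engine count of two is an assumption inferred from the 160 TB total, and the function name is made up for illustration.

```python
# Figures from the text; the engine count is an assumption.
TB_PER_ENGINE_PER_HALF_HOUR = 10
FLIGHT_HOURS = 4
ENGINES = 2  # assumed: the 160 TB total implies a twin-engine aircraft


def flight_data_tb(hours, engines, tb_per_half_hour=TB_PER_ENGINE_PER_HALF_HOUR):
    """Total sensor data produced during one flight, in terabytes."""
    half_hours = hours * 2
    return half_hours * tb_per_half_hour * engines


print(flight_data_tb(FLIGHT_HOURS, ENGINES))  # 160
```

With a single engine the same function gives 80 TB, so the total scales directly with the number of engines.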
Big data means both big problems and big opportunities. Let us consider a few typical problems associated with big data.
Volume. As we have just seen, there is a lot of data, and its volume is constantly growing. This calls for fundamentally new devices and algorithms for storing information.
Speed. Data is almost useless unless it is processed, and processed quickly. Speed is a very relative concept, though: what is very fast for some data is prohibitively slow for other data.
Heterogeneity. Data can differ greatly: in importance, in how often it is updated or appended to, and so on. All of this calls for different storage formats.
Security. Data must not be lost, and unauthorized access to it is just as undesirable.
This list could go on, but every problem is the flip side of an opportunity. Amazon, best known for its online store, earned about $4 billion from its cloud services in 2013 alone. For 2014, various estimates put that figure at $6 to $10 billion.
How to store big data. Basic approaches
There are three ways to store digital data:
Traditional: “somewhere at home” - on disks, tapes, local storage, etc.;
In public "clouds": from giants such as Amazon, Microsoft and Google, or from smaller companies;
In private "clouds": an option more typical of the corporate segment; the storage is part of the company's infrastructure and is available only to its employees.
Let us examine some of the pros and cons of these approaches.
▍Storing "at home"
This is the approach most of us know best. Information is written to local storage: disks, RAID arrays, tapes, etc.
Pros
Familiarity. The data is always close at hand, which is reassuring.
Access speed. As a rule, you can connect to local media quickly and easily.
Price. Although this can also turn out to be a drawback.
Cons
Unreliability. Disks and servers fail as a result of physical wear and tear. And no matter how reliable the server is, it will not protect the data from natural disasters or from ordinary theft.
Access to data. Remote access is either unavailable, inconvenient, or, at the very least, not always secure.
Scaling. The room for it is usually limited: you have to buy new media and find somewhere to put them. And what if you need 10 TB today, only 5 tomorrow, and all 50 the day after?
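The scaling problem can be put into numbers. The sketch below takes the three demand figures from the example above and assumes, for illustration only, that local storage has to be bought for the largest of them; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def peak_provisioning(demand_tb: Sequence[float]) -> Tuple[float, float]:
    """Return (capacity to buy, average utilization) when storage is sized for the peak."""
    capacity = max(demand_tb)
    # Share of the purchased capacity that is actually in use, averaged over all periods.
    utilization = sum(demand_tb) / (len(demand_tb) * capacity)
    return capacity, utilization


demand = [10, 5, 50]  # TB needed today, tomorrow, the day after tomorrow
capacity, utilization = peak_provisioning(demand)
print(f"Buy {capacity} TB; average utilization {utilization:.0%}")
```

Under that assumption you pay for 50 TB while using well under half of it on average.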
▍Public clouds
These let you store data in the cloud for a fee that depends on the amount of data and on related services.
Pros
Convenience. Providers simplify the basic usage scenarios as much as possible.
Relatively safe. Most vendors protect data not only with a user password, but also with their own encryption algorithms.
Quite cheap. Prices in the large public "clouds" hover around 5-10 cents per gigabyte per month, and the trend is clearly downward: just recall the recent change in Google Drive's pricing policy.
Relatively reliable. Geographic replication of data offers protection even against natural disasters.
New horizons in the future. For example, fast and secure data sharing.
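To get a feel for the price range quoted above, here is a rough monthly cost estimate. It is a sketch only: the two per-gigabyte prices are the bounds mentioned in the text, decimal terabytes (1 TB = 1000 GB) are an assumption, and real providers add charges for traffic and requests that are ignored here.

```python
PRICE_LOW_USD_PER_GB = 0.05   # lower bound quoted in the text
PRICE_HIGH_USD_PER_GB = 0.10  # upper bound quoted in the text
GB_PER_TB = 1000              # assumed: decimal terabytes


def monthly_cost_range_usd(size_tb):
    """Return (low, high) monthly storage cost in US dollars for size_tb terabytes."""
    size_gb = size_tb * GB_PER_TB
    return size_gb * PRICE_LOW_USD_PER_GB, size_gb * PRICE_HIGH_USD_PER_GB


for size_tb in (1, 10):
    low, high = monthly_cost_range_usd(size_tb)
    print(f"{size_tb} TB: ${low:,.0f} to ${high:,.0f} per month")
```

At these prices a terabyte costs roughly $50 to $100 a month, and the bill grows or shrinks with the amount actually stored.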
Cons
Psychological factor. Your data is far away from you: what if someone else gains access to it?
Price. Cloud storage may seem more expensive than local storage. Then again, as the saying goes, the miser pays twice.
Access speed. Even in advanced countries, Internet access speed is still measured, on average, in megabytes per second (at least ten times slower than access to local storage).
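What that difference in access speed means in practice can be sketched as a transfer-time estimate. Both rates below are assumptions chosen purely for illustration (10 MB/s over the Internet and ten times that for local storage, matching the ratio mentioned above); actual figures vary widely.

```python
NETWORK_MB_PER_S = 10   # assumed Internet throughput
LOCAL_MB_PER_S = 100    # assumed local throughput, ten times faster
MB_PER_GB = 1000        # assumed: decimal units


def transfer_time_hours(size_gb, rate_mb_per_s):
    """Hours needed to move size_gb gigabytes at a sustained rate_mb_per_s."""
    seconds = size_gb * MB_PER_GB / rate_mb_per_s
    return seconds / 3600


size_gb = 1000  # one terabyte
print(f"Over the network: {transfer_time_hours(size_gb, NETWORK_MB_PER_S):.1f} h")
print(f"Locally:          {transfer_time_hours(size_gb, LOCAL_MB_PER_S):.1f} h")
```

With these assumed rates, copying a terabyte takes more than a day over the network and under three hours locally.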
Private “clouds” are in many ways similar to public ones and, when used in a corporate environment, can give a sense of greater control over data security.
That's all for now. Next time we will talk about various practical ways to use the "clouds".