In the first part we looked at data itself and at how metadata and value can be extracted from it.
The second part explained the term Big Data itself and showed how it grew into an industry, largely under the influence of economics. This third part is meant to be the logical continuation of the first two, and its message is sad, sometimes ironic, and sometimes frightening. Technological, business, and even social contracts of the future have already been redefined by big data in ways we are only now beginning to understand. And perhaps they will never be brought under control.
Whether the analysis is performed by a supercomputer or by a table compiled by hand in 1665 from lists of the dead, some aspects of big data have been around far longer than we imagine.
The dark side of big data. Historically, the role of big data has not always been crystal clear. The idea of crunching numbers to produce a quantitative rationale for something we already wanted to do has existed for as long as we have had spare money.
Remember the corporate raiders of the 1980s and their newfangled weapon, the spreadsheet? The spreadsheet, a rudimentary database, allowed a 27-year-old bachelor with a PC and three scraps of dubious data to talk his superiors into raiding a company's pension fund and buying out a controlling stake with borrowed money. Without personal computers and spreadsheets there would have been no Michael Milkens or Ivan Boeskys. It was simply the 1980s version of big data. Pedants will say this was not big data, but culturally it had the same effect as the industry we call big data today. For its time, it was big data.
Remember Reaganomics? Economist Arthur Laffer argued that the state could raise its revenues by cutting taxes on the rich. Some people still believe this, but they are wrong.
Program trading, the buying and selling of stocks by computer algorithm, wrecked Wall Street in 1987 because firms adopted it without fully understanding how to use it. No individual computer understood that other computers were running at the same time, perhaps in response to the same rules, turning what was supposed to be an orderly retreat into a panic sell-off.
Long-Term Capital Management in the 1990s took derivatives positions of a size never seen before, only to fail spectacularly, because no one really understood what derivatives were. Had the government not intervened in time, Wall Street would have collapsed again.
Enron used giant computers in the early 2000s to play the energy markets, or so it thought, until the company collapsed. Enron's story, remember, was sold as computing giants making the company smarter, when in reality the computers were used to mask deception and manipulate the market. "Pay no attention to the man behind the curtain!"
The global banking crisis of 2007 was partly triggered by big computers being used to create supposedly perfect financial products that the market ultimately could not absorb. But wasn't all of this caused by deregulation (the reduction of state influence on the economy)? Deregulation may tempt financiers into recklessness, but Moore's law played the bigger role: the cost of computation fell to a level where outcomputing the regulators became technically feasible. Technology created the temptation to chase big game.
All of these are variations on the dark side of big data: schemes that swelled quickly and then collapsed. These data-driven manias rest, as a rule, not on reality but on a fantasy dressed up to pass for reality.
And such failures are usually founded on a lie, or undermined by one. How else do you build highly rated bonds backed by nothing but junk mortgages? You lie.
Drawing conclusions from a flawed method or flawed data is a big problem. Some experts would argue that this is easy to fix with more, bigger data. Maybe so, but the track record of such attempts is poor.
There is an irony here: when erroneous big data is somehow tied to money or some manifestation of power, we tend to believe it, yet we have the same tendency to disbelieve sound big data when it touches politics or religion. That is why big scientific evidence often struggles to win recognition from skeptics: those who deny climate change or push for teaching creationism.
Big data and insurance. Big data has already changed our world in many ways. Take medical insurance. There was a time when insurance company actuaries studied morbidity and mortality statistics in order to set premiums. They worked with metadata, data about the data, so the actuaries basically could not dig past broad groups of policyholders down to individuals. In such a system an insurer's profit grows linearly with the number of customers. The profit per customer is small, so health insurers needed to grow the number of insured: the more the better.
Then in the 1990s something happened: the cost of computing fell to a level where it became economically viable to calculate likely health outcomes on an individual basis. This shifted the health insurance business from setting rates to denying coverage. In the US, the business model moved from insuring the maximum number of people to insuring the minimum, selling policies only to healthy people, the ones who do not need healthcare.
Insurance company profits soared, but millions of families were left uninsured.
Since society needs healthy people, selling insurance only to the healthy obviously could not last. A new kind of economic bubble emerged, waiting to burst; that is why Obamacare (the Patient Protection and Affordable Care Act) appeared. You could write a whole book on the subject, but the point is that something had to change in the insurance system for it to serve a social goal. Trusting clever electronic systems to somehow find a way to cover a larger share of the population while continuing to push profit margins was irresponsible.
Big scary Google. The deception that too often underlies big data runs through the entire economy and touches the very people we have decided to regard as the embodiment of big data, or even its creators. Google, for example, wants us to believe it knows what it is doing. I am not saying Google's creations are not fantastic or important, but they are built on an inscrutable algorithm that the company will not explain, much as Bernie Madoff would not explain his investment methods. Big scary Google has attained universal wisdom.
Perhaps, but who knows for sure?
The truth is, they make all the money!
The truth about advertising. Google bills advertisers, and its entire income comes from those advertisers. While this may seem trivial, advertisers often do not really want to know how well their campaigns are performing. If it were transparent just how ridiculously low the return on most advertising is, advertising agencies would go out of business. And since agencies not only produce advertising but also place it for their clients, from the industry's point of view it is sometimes better not to know.
As a result, the power structure inside advertising has been turned upside down. Senior agency employees earn most of the money and belong to the creative layer that actually makes the ads. The newcomers, who earn almost nothing and have little authority, are the ones who place print, broadcast, and even online advertising. The industry values making the ads more highly than any payback they deliver for clients. That is crazy, and for such a scheme to work, ignorance must triumph.
It turns out the Internet is corrupt too. The news aggregator Huffington Post told its authors to work search engine optimization terms into their posts to boost readership. Does it work? It is not very clear, although studies say search rankings go up if you stuff posts with gibberish that makes no sense at all.
As a result, we give in and accept a lower standard of performance. Will Match.com or eHarmony help you find the best partner? No. But it is fun to think they can, so what the hell...
Again, it is strange that we treat data cynically when it concerns science (climate change, creationism, and so on), but almost without cynicism when it concerns business.
Now consider an interesting fact. Google's data, raw and unanalyzed, is largely available to other companies, so why does Google have no effective competitor in search? Microsoft's Bing certainly has access to much the same data as Google, yet it has one sixth of the users. It all comes down to market perception: Bing is simply not perceived as a suitable alternative to Google, even though that is exactly what it is.
That is the big data game.
The other, unlit side of this story. Apple's data center in North Carolina cost an estimated $1 billion and was built before Steve Jobs's death. I spent a day parked in front of the facility's gate, counting the cars that entered and left (I counted one). I went on to work out what server capacity would be required if Apple kept several copies of all the data on Earth in this building: it came to about eight percent of the space currently available to them. The building can hold two million servers.
Later I met a sales agent who had sold Apple every server in that building, all 20,000 of them, as he put it.
Twenty thousand servers is a lot for iTunes, but it occupies one percent of the building's total area. So what is going on in there? A big-data bluff: by spending $1 billion on construction, Apple presents itself to Wall Street (and to Apple's competitors) as a player in Google's game.
This is not to say big data is unreal; it is quite real. Amazon.com and every other huge retailer, including Walmart, have real big data, because these companies need real data to succeed on razor-thin margins. Walmart's success has always rested on information technology. In e-commerce, where real things are bought and sold, the customer is still the customer.
For Google and Facebook, the customer is the product. Google and Facebook are selling us.
All this while, Moore's law keeps working, conjuring ever cheaper and more powerful computation. As we said in the first part, computing power at the same price grows roughly a hundredfold every decade, thanks to Moore's law alone. The computer transaction needed to sell an airline ticket through the SABRE system in 1955 costs a billion times less today. What was a reasonable $10 expense per ticket in 1955 is today a tiny fraction of a cent, not worth accounting for. By SABRE's value system, computation today is effectively free. That completely changes what we can do with computers.
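As a rough sketch of that compounding, here is the arithmetic in a few lines of Python. The 100x-per-decade rate and the $10-per-transaction figure are the article's own numbers, not official SABRE statistics, and note that the article's "a billion times" and the 100x-per-decade rate do not compound to exactly the same figure; the sketch just shows the mechanics of the decline.

```python
# Sketch: cost of a fixed computation when performance per dollar
# multiplies by ~100x every decade (the article's assumption).

def cost_after(initial_cost: float, years: int, growth_per_decade: float = 100.0) -> float:
    """Cost of the same computation after `years`, given the assumed
    per-decade improvement in performance per dollar."""
    return initial_cost / (growth_per_decade ** (years / 10))

ticket_1955 = 10.0  # dollars per SABRE transaction, per the article
for year in (1985, 2015):
    print(year, cost_after(ticket_1955, year - 1955))
```

By 2015 the per-transaction cost under these assumptions is on the order of a hundred-billionth of a dollar, which is the article's point: at that scale the cost is not worth metering at all.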
Your personal intelligence service. Computation has become so cheap, and personal data has spread so widely, that some cloud applications now turn your smartphone into something like the data mining machinery of J. Edgar Hoover's FBI or today's NSA. One such tool was called Refresh, shown in the picture. Refresh was later swallowed by LinkedIn, which was in turn swallowed by Microsoft, but the pattern still holds. Enter someone's name on your phone and hundreds of computers, literally hundreds, will comb social media and the web, compiling an operational dossier on the person you are about to meet for business or just want to share a drink with. You see not just everything about this person, their life, work, family, and education; the system can trace how your lives have intersected and predict the questions you might ask or the topics of conversation you might want to pursue. All within one second. And for free.
Granted, a digital crib sheet is hardly the pinnacle of human-centered computing, but it shows how far we have come and suggests how much further we can go as computation gets even cheaper. And it will get cheaper, since Moore's law is not slowing down; if anything, it is accelerating.
The failure of artificial intelligence. Back in the 1980s a field called artificial intelligence was in fashion. Its core idea was to learn how experts do what they do, reduce those tasks to a set of rules, program computers with those rules, and effectively replace the experts. The goal was to teach computers to diagnose disease, translate languages, even figure out what we want but cannot articulate ourselves.
It did not work.
Artificial intelligence, or AI as it was called, burned through hundreds of millions of venture dollars in Silicon Valley before being declared a failure. Though the problem was not clearly understood at the time, it was simply that we did not have enough computing power at the right time and price to achieve those ambitious goals. But thanks to MapReduce and cloud infrastructure, today we have more than enough computing power to build artificial intelligence.
A speed bump. Paradoxically, a key idea of artificial intelligence was to give computers language, but in reality much of Google's success came from effectively removing language, human language, from computers. Google does not use the XML and SQL data standards that underlie almost all web content, because it realized that data structures adapted for human reading make no sense for computers communicating with each other. Once humans were no longer required for computer-to-computer communication, machine learning made significant progress. This is very important; please read it again.
You see, in the modern version of artificial intelligence we do not need to teach computers to perform human tasks: they teach themselves.
Google Translate, for example, can be used online by anyone, for free, to translate text in various combinations between more than 70 languages. This statistical translator uses billions of word sequences that appear in two or more languages. This in English means that in French. No parts of speech, no subject or verb, no grammar at all. The system just figures it out. Which means no theory is needed. It works well, but we cannot say exactly how, because the whole process is driven by data. Over time Google Translate keeps improving, translating by means of correlation algorithms: rules that never leave the machine and are too complex for people even to understand.
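A toy sketch can make the "no grammar at all" idea concrete. The following is emphatically not Google's system (which is vastly more sophisticated); it is a minimal illustration of the underlying principle: choose a translation purely by how often target phrases co-occur with source phrases in aligned examples, with no linguistic theory anywhere. The tiny corpus is invented for the example.

```python
# Toy statistical "translator": pick the target phrase that most often
# accompanies the source phrase in aligned examples. No parts of speech,
# no grammar, just counting. Illustrative only; not Google's algorithm.
from collections import Counter, defaultdict

# Tiny invented "parallel corpus" of aligned English/French phrase pairs.
aligned_pairs = [
    ("good morning", "bonjour"),
    ("good morning", "bonjour"),
    ("good morning", "bon matin"),
    ("thank you", "merci"),
    ("thank you", "merci"),
]

# Count how often each target phrase appears for each source phrase.
counts = defaultdict(Counter)
for src, tgt in aligned_pairs:
    counts[src][tgt] += 1

def translate(phrase: str) -> str:
    """Return the most frequently aligned translation seen in the data."""
    if phrase not in counts:
        return phrase  # unseen phrase: pass it through unchanged
    return counts[phrase].most_common(1)[0][0]

print(translate("good morning"))  # -> bonjour (majority of the examples)
print(translate("thank you"))     # -> merci
```

Scale the corpus from five pairs to billions and the same counting principle starts producing fluent-looking output, which is the article's point: the rules live in the statistics, not in any theory a human wrote down.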
Google Brain. Google has a system called Google Vision; at last count it had 16,000 microprocessors, the equivalent of roughly one tenth of the human visual cortex. It specializes in computer vision and trains the same way Google Translate does, on a huge number of samples, in this case a billion still images taken from YouTube videos. Google Vision looked at images for 72 hours and, in effect, trained itself to recognize twice as much as any other computer on Earth. Give it a picture and it will find another one like it. Tell it there is a cat in the image and it will learn to recognize cats. Remember, that took three days. How long does it take a newborn baby to learn to recognize a cat?
In the same way, IBM's Watson won at Jeopardy! simply by processing questions from past episodes: there was no underlying theory.
Let's go a few steps further. A data-driven study has been conducted on magnetic resonance imaging (MRI) scans of the living brains of convicted criminals. The system is no different from the Google Vision example, except that it examines a different question: recidivism, the likelihood that a criminal will break the law again and return to prison after release. Again without any underlying theory, the system appears able to distinguish the MRI scans of criminals likely to reoffend from those of criminals who are not. Its score for predicting reoffense, based solely on a single brain scan, is above 90 percent. Should MRI scans become a tool for deciding which prisoners get parole? It sounds a bit like the movie Minority Report with Tom Cruise. The scheme holds enormous potential economic benefit for society, but it carries the terrifying aspect of having no underlying theory: it works because it works.
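The structure of such a theory-free classifier can be sketched in a few lines. Everything below is hypothetical: invented feature vectors standing in for MRI-derived measurements, and a deliberately simple nearest-centroid rule rather than whatever model the actual study used. The point is that the code separates the two groups without containing any theory of why they differ.

```python
# Toy "it works because it works" classifier: label a new feature vector
# by whichever labeled group's centroid it is closer to. The data is
# invented; real studies use far richer features and models.
from math import dist  # Euclidean distance (Python 3.8+)

# Hypothetical 2-dimensional feature vectors for two labeled groups.
group_a = [(0.9, 0.8), (1.0, 0.7), (0.8, 0.9)]  # e.g. "reoffended"
group_b = [(0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]  # e.g. "did not"

def centroid(points):
    """Mean point of a list of equal-length vectors."""
    return tuple(sum(c) / len(points) for c in zip(*points))

c_a, c_b = centroid(group_a), centroid(group_b)

def predict(vector):
    """Assign the label of the nearer centroid; no theory, just geometry."""
    return "a" if dist(vector, c_a) < dist(vector, c_b) else "b"

print(predict((0.85, 0.75)))  # -> a
print(predict((0.15, 0.25)))  # -> b
```

Nothing in `predict` explains what the features mean; it only exploits the statistical separation in the training data, which is exactly the property the article finds both powerful and unsettling.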
After that, Google scientists looked at the MRIs of ordinary people while they watched billions of YouTube frames. After processing a large enough data set of these frames and the resulting MRIs, a computer can predict what the subject is looking at.
This is called mind reading... and, again, we do not know how it works.
We are advancing science by eliminating the scientists.
What do scientists do? They theorize. Big data in some cases makes theory unnecessary or simply impossible. In 2013 the Nobel Prize in Chemistry was awarded to three researchers who built their work on conclusions drawn by computer algorithms to explain the chemistry of enzymes. No enzyme was harmed in the winning of this award.
Algorithms today are improving at twice the rate of Moore's law.
What is changing is the emergence of a new information technology workflow, moving from the traditional pipeline to a next-generation one.
What is the point? The new implementation style goes beyond what has always been required for a major technological leap: a new computing platform. What comes after mobile, people ask? This is what comes after mobile. What will it look like? Nobody knows, and maybe that will not even matter.
In 10 years Moore's law will increase processor power 128-fold. By throwing more processor cores at problems and exploiting the rapid improvement of algorithms, we should multiply that by another 128: 16,384 in total. Remember, today Google Vision is the equivalent of 0.1 of a visual cortex. Multiply that by 16,384 and you get 1,638 visual-cortex equivalents. That is where this leads.
In ten years, computer vision will see things we do not understand, just as dogs can smell cancer.
We have hit the wall of our ability to generate relevant theories, while finding hacks in big data that keep improving results by any means available. The only problem is that we no longer understand how anything works. How long before we lose control entirely?
By about 2029, according to Ray Kurzweil, we will reach the technological singularity.
In that year, says the famous futurist (and Googler), $1,000 will buy computing power equal to 10,000 human intellects. For the price of a PC, Ray says, we will command more computing power than we can understand or even explain. A real supercomputer in every garage.
Combined with equally fast networks, this could mean that your computer, or whatever device you carry, can search every word ever written, in real time, to answer literally any question asked. Leaving no stone unturned.
There will be no hiding. Apply this to a world where every electrical device is a sensor feeding the network, and we will not only have incredibly effective fire alarms, we will most likely lose any privacy at all.
Those who predict the future tend to overestimate change in the short term and underestimate it in the long term. The 1957 film Desk Set, with Katharine Hepburn and Spencer Tracy, foresaw mainframe automation displacing the people who ran the research department of a television network. To some extent this came true, though it took another 50 years and people remained part of the process. But the biggest technological threat hung not over the research department but over the television network itself. Will there be television networks in 2029? Will there be television at all?
Nobody knows.
If you have read this entire series and happen to work at Google, you may feel attacked, because much of what I describe could threaten your current way of life, and the name "Google" appears often in the text. But that is not the point. More precisely, it is not quite so. Google is a convenient target, but companies like Amazon, Facebook, and Microsoft are doing the same work right now, along with a hundred or more startups. Google is not alone. And regulating Google's activities (as the Europeans are trying to do) or driving it out of business would likely change nothing. The future comes regardless. Five of those hundred startups will be magically successful, and that will be enough to change the world forever.
And so we come to the self-driving car. Companies like Google and its competitors thrive on faster and cheaper computation because it positions them as likely suppliers of the data-driven products and services of the future. This is the future of the industry.
Today, if you price out the parts of a modern car, the wiring harness that connects all the electronics and controls the whole mechanism costs more than the engine and gearbox! That shows where our priorities lie: command and communication, not movement. But those costs are falling dramatically while the functionality grows just as fast. The $10,000 that Google budgets per self-driving car will drop toward zero within a decade of all new cars becoming self-driving.
Make all new cars self-driving and the nature of car culture changes completely. Cars will be everywhere, traveling at the maximum permitted speed with only a meter between them. That would increase road capacity tenfold.
The same effect could reach air travel. Self-piloting aircraft could bring a proliferation of small planes, flying like flocks of birds straight to their destinations.
Or maybe we will stop traveling altogether. Growing computing power and faster networks already make telepresence possible: full-scale videoconferencing wherever the consumer needs it.
Perhaps our only real contact with people outside our own village will be the moments when we physically touch them.
All this and more is likely. Bioinformatics, the application of massive computing power to medicine, combined with correlation algorithms and machine learning, will seek answers to questions we have not asked and never will ask.
It is possible we will conquer both disease and aging, which would mean we die only at the hands of criminals, by suicide, or in tragic accidents.
Big data companies are rushing headlong to capture the key supplier positions of the future. Moore's law has long since passed the point where all of this became inevitable.
(Translation by Natalia Bass)
Source: https://habr.com/ru/post/311460/