The CIA: Big Tasks and Big Data. Towards a Global Information Cap
Ira "Gus" Hunt, chief technology officer of the CIA, talks about his vision of Big Data in the service of the CIA, as well as the agency's tasks and its methods for tackling them. The presentation was given at the GigaOM Structure: Data 2013 conference, held March 20 in New York. According to eyewitnesses, it was one of the most interesting and memorable talks.
HOST 00:00
If you do not applaud our next speaker, he will certainly pull up your file and put a mark in it. Mr. Gus Hunt is the Chief Technology Officer of the Central Intelligence Agency. He is going to talk about the big challenges of handling Big Data at the CIA. Please welcome Mr. Hunt to the stage.
[applause]
IRA GUS HUNT 00:22
Since I am the only thing standing between you and dinner, I am not so sure I want to be here, but we will see whether I can hold your attention. My name is Gus Hunt, I am the Chief Technology Officer at the CIA, and I would like to talk with you about the things you have been hearing about here all day. I will try to tell you how what is happening in the world looks from our point of view and why it matters to us, and then what we think needs to change so that we, and I suppose the whole private sector, can take advantage of Big Data.
Think back to the world as it once was: it was all about the Cloud. That was three years ago. Now we are at the point where it is all about Big Data, so for the whole of the past year we have been reading breathless articles about it. And more than that: glossy covers. I fully expect Big Data to become Time's Man of the Year. This year we are seriously talking about how to get value out of what we already have, and I have heard many conversations and opinions about that here ...
In case you do not know how we earn our living, the CIA has three main lines of “business”.
We collect information about the plans and intentions of our adversaries. We do all-source analysis, in which we combine the freshly collected information with what we already hold, so that we can tell the President, the Secretary of State, the policymakers, and everyone else what it all means. And the third thing we do, and we are the only agency authorized by law to do it, at the direction of the President of the United States, is covert operations. Those are the three areas we are responsible for.
About four years ago, when I was appointed Chief Technology Officer, we sat down and asked ourselves, “What must we have in order to be confident about our future?” and we arrived at what I call our four big tasks.
Big task number one, as of four years ago, was Big Data: our ability to take advantage of the massive information flows emerging across the planet. We need that so that we can understand what is happening in them and protect national security. That is what we do.
Number two, and this was before all the talk about sequestration and the rest, is that we have a responsibility to you, the taxpayers, and you need to be sure that we spend every dollar as effectively as possible. But when we think about efficiency, it is not a matter of lowest cost. It is a matter of best value, and for us “value” is defined as results divided by cost and time. Better results in less time for less money mean more value.
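The definition of value given above can be written out as a small calculation. This is only a sketch: the talk does not say exactly how cost and time are combined, so the reading results / (cost * time) is an assumption, and the function name and sample figures are hypothetical.

```python
def value(results: float, cost: float, time: float) -> float:
    """Value as described in the talk: results divided by cost and time.

    Assumption: "divided by cost and time" is read as results / (cost * time).
    Units are arbitrary; only comparisons between options are meaningful.
    """
    if cost <= 0 or time <= 0:
        raise ValueError("cost and time must be positive")
    return results / (cost * time)


# Hypothetical options delivering the same results at different cost and time.
baseline = value(results=100.0, cost=50.0, time=4.0)
faster_and_cheaper = value(results=100.0, cost=40.0, time=2.0)

print(baseline, faster_and_cheaper)
```

Under this reading, the second option scores higher because it delivers the same results for less money in less time, which matches the point made in the talk.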
Number three, and we sometimes deliberately stress this, is that we must work together as a community, despite what you may read about how nothing works properly, how we do not share information, and so on. In fact that is not the case. We do our work well. And like any organization, including those in private business, we look at problems from different points of view and angles, and sometimes this gives rise to small discussions about the right way to solve the issues we face.
Number four is people. If we do not have the talent, people with the abilities we need, we will not be able to carry out the tasks we have set ourselves.
We then said that to achieve these goals we would need a solid foundation to invest in. We have identified six key technologies for solving our problems, and we are going to invest in them for the long term.
All of this is necessary to give us confidence that we are a viable, competitive, forward-looking organization.
These are fairly simple things and you know them well, but secure mobility is a topic of great importance for us. Mobile technology is not secure. Repeat after me: mobile technology is not secure. And it really is not. How are we going to make it secure in our environment so that we can benefit from it? That is a serious task.
The second item here is what we call advanced analytics. In fact, we view analytics as a service. By this we mean everything we need to do with Big Data in order to do the work necessary to keep our nation secure.
The third item is widgets and services. We have approached this through something called the Ozone Framework. Ozone is a framework that the intelligence community developed on the basis of a Google framework. The main reason we use it is the same reason you use your smartphones, iPads, and other devices: you can personalize them and load them with all the different things you need for your business or personal life. We need to create an environment where our analysts, operators, and other employees can deploy the functionality they need and personalize their world. We can call it a WebTop, or a device top, or whatever you like.
The fourth item, which by the way is number three on the slide, and I do not really want to explain that strange numbering system because it is a long story. So, the fourth: security as a service. We do not want you to have to rebuild a security system from top to bottom every time you deliver or create a new system for us. We need a set of services, along with the best practices from the old world of service-oriented architecture. Does everyone remember that world? I certainly do. Widgets and analytics sit at the top level; they talk to security services in the middle, which interact with the infrastructure for computing and other low-level things. The security services have a lot in common with one another, and we want to be sure that they are applied consistently, end to end, for any person who has access to any data element through any analytical system. Those controls, too, should be provided through one of the security services.
The fifth is data, and I am going to talk about this in more detail. It immediately brings to mind: "It's the data, stupid." We have the concept of data as a service, and a concept we have called the 'data bay'. The data bay is not exactly a clearly defined place, but we plan to assemble powerful computation engines there, similar to those you saw in the exhibition hall. We have found, or at least we believe, that above a certain level all analytics tend to rely on common sets of large, high-performance computing infrastructure hidden underneath.
We want to create an environment in which all our data and this massive computing infrastructure sit together, so that it is easy for us to develop new ideas or new capabilities at the top level by driving what we have underneath. To do all these things you need a lot of computing power, and that funny little thing is called the Cloud.
Have you ever asked yourself how big 'big' is? We do it all the time. I want to run quickly through how big the 'big' in Big Data is. You all know Google. Google is a great provider of all sorts of interesting things. Google stopped reporting its size, as far as we could find out, about four years ago, in its 2009 or 2010 SEC filings.
At the time, they said they had about 100 petabytes of data and more than a trillion indexed URLs. That is quite a lot.
Facebook. As you know, Facebook passed a billion users sometime around August of last year, so by now there are more than a billion. One interesting thing I found: the latest figures suggest that approximately 35% of all the world's photos are posted on Facebook.
YouTube. We believe YouTube is the only repository of exabyte size or larger on the planet, at least among the public ones. According to the latest filings we could find, YouTube's size was about 768 petabytes. If you roughly estimate how much data is added to YouTube, and given that those figures are three or four years old, you find that YouTube is now larger than an exabyte.
World population. Back around April, the population passed the seven billion mark.
Everyone talks about Twitter and how great Twitter is. About 124 billion tweets a year are sent on Twitter, roughly 4,500 per second.
But Twitter is a trifle compared with SMS, the global short message system, which carries about 193,000 messages per second. Of which 190,000 are typed by my daughter [laughter]. I have the bills from the operator; I can confirm it.
But even that is not much compared with the number of calls made on cell phones in the USA. In the US alone, 2.2 trillion minutes of calls take place per year, which is 19 minutes per person per day. I find that incredibly small if, of course, I again use my daughter as the benchmark: about two orders of magnitude less than it should be. Converted into ordinary data volumes, though, that is another YouTube every year.
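The per-person figure quoted above can be checked with a few lines of arithmetic. This is a rough sketch: the 2.2 trillion minutes comes from the talk, while the US population of about 315 million is an assumption added here for illustration and is not stated by the speaker.

```python
MINUTES_PER_YEAR_US = 2.2e12  # figure quoted in the talk
US_POPULATION = 315e6         # assumption: approximate US population in 2013
DAYS_PER_YEAR = 365

# Spread the yearly total over every person and every day of the year.
minutes_per_person_per_day = MINUTES_PER_YEAR_US / US_POPULATION / DAYS_PER_YEAR

print(round(minutes_per_person_per_day, 1))  # prints 19.1
```

With that population assumption, the result agrees with the "19 minutes per person per day" given in the talk.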
What makes all this happen? I think you know all this. There are three fundamental drivers from the past few years, which together make up one curious little thing: the Social Mobile Cloud. That is what has brought us most of Big Data. In the social world, things spread very quickly, like viruses, and so they need an information space that scales elastically, far beyond the limits originally envisaged when the Cloud was just getting started. Everyone wants to be social and share information. All of this, taken as a whole, creates what we are talking about: Big Data.
The pace of innovation has increased significantly. Ask any of you who have startups: have you ever gone to your investors, except on some special occasion, and told them you were going to buy a lot of hardware, hire a crowd of admins to run it, and only then start work? Has anyone done that? Hardly ... What do you usually do? You take out your credit card, buy services from Amazon or Rackspace or someone like that, get your capacity, and start doing your job. You launch a project quickly and very cheaply, and you can concentrate on your task without thinking about the underlying infrastructure.
For our world, this means that the Social Mobile Cloud has significantly accelerated social communication in ways we did not expect, ways that I believe did not exist at all until these technologies arrived in real life. The classic example is the Arab Spring. The ability of the groups of citizens who took part in the Arab Spring to stay in touch, despite the totalitarian governments that tried to hinder them in every way, allowed the movements and protests of the Arab Spring to develop, with outcomes that we will only see after some time. We are still trying to understand what it all means.
What matters most in our world is that the Social Mobile Cloud has completely changed the flow of information across the entire planet. When I started at the CIA many years ago as an analyst, the world was quite simple. In terms of information flows, it was a few-to-many world. There were NBC and CNN, the Soviet TASS, the American Times, and the Washington Post. It was the classic model, in which a few information producers told everyone else what to think and how, and things spread that way. The Social Mobile Cloud has turned this model upside down and moved us to a complex many-to-many model, and I have to say, of course, that we actually liked the few-to-many model better [laughter]. Getting an advantage in that model was quite simple. What is interesting, though, is that when everyone is talking and exchanging information, then despite the high level of noise there is a useful signal in there that we need to find. And that, I believe, is one of the big problems of Big Data in the world: how to find the signal in ever-growing oceans of noise.
If you think that is hard, and the gentleman who does healthcare at Aetna and others touched on this here a little earlier, there are three more forces emerging: nano, bio, and sensors.
You are already a walking sensor platform, and I hope you know it. Your mobile devices, your smartphone, your iPad, whatever else you carry: everyone has a lot of these gizmos. There is a long list of what is built into these devices and what they pick up. As you walk around the neighborhood like a mobile sensor platform, and remember, I told you that your devices are not secure, you should be aware that someone can know where you are at all times, because you carry a mobile device. Even if your mobile device is switched off. I hope you know that. Yes? No? If not, you should [laughter]. Because it really matters.
Suppose you were once a Star Trek fan, as I was as a child, and now imagine that your mobile platform, your smartphone, turns into your Communicator, becomes your Tricorder, and finally becomes your Transporter. How do you get on a plane today? Do you go with a piece of paper, as I do, because mobile devices are not exactly encouraged where I work? Or do you walk up to a small code, wave your hand in front of it, and let this magic thing take you wherever you want?
It can also be your mobile platform that monitors your health. Right now you can buy add-on devices for your pacemaker that will monitor your blood sugar, control your insulin, and do other health-related things. The healthcare industry itself is looking very hard for ways to monitor your health remotely, so that it can always work out what is happening to you and your body and then correct your problems remotely. You are thinking: Gus talks very fast. Well, I am very concerned that someone is going to hack my remote settings and speed up my little pacemaker so that I talk to you even faster. And this is exactly what we have to worry about: cyber attacks, as they emerge, are not directed only against your business. In the end they can be directed against you and your health. And if you do not take precautions, you will face serious risks.
In fact, speaking of your sensor platform, there is a cool little program called Activity Tracker. It is a small Android app. Do you know it? To collect data, it typically uses the three-axis accelerometer in your phone. Or, really, take Fitbit. You know about Fitbit, right? It is just a simple three-axis accelerometer. We love these things because they don't have ... well, I will not go into the specifics here [laughter]. What usually happens is that they collect data, and from that data, which can be collected with high accuracy, you can determine your gender and whether you are tall or short, heavy or light. But what is more surprising is that all of this can be determined from your gait, the way you move when you walk.
But in general this can be a really good thing. Imagine it as a security application. If you go somewhere and need access to your bank account, it may become a little easier, because the bank will know with absolute certainty that you are you, having established it from your gait, and will then let you carry out your transactions. On the other hand, if you do not want to be found, or you want to protect yourself, you do not want anyone to know what your gait looks like, so that no one can work out where you have been all this time.
What is curious is that as you begin to bring all these things together, the inanimate becomes sentient. We already see this happening. IBM talks about its Smarter Planet project. Google has a car that drives itself. You already have appliances that know what you need; you could see them at the last CES. Did you read the article about the refrigerator that scans the groceries? It does this as you put them in or take them out, and then sends a message to your smartphone: “Buy milk.” I picture a somewhat gloomy future for myself: it is Friday night, I am very tired, I have worked late, I get into my self-driving car and say “take me home”, and where does it take me? Safely, steering around every obstacle, it takes me to get the damn milk [laughter]. Why? Because it knows better that I will eventually need milk! [laughter] So, of course, there are a number of good things, but some things may not be so great.
Still, when you put everything together, it generally works out well, because if you think about it, the potential of these things is incredible. And you know that too. Dramatic improvements in traffic management, such as the ability to reroute dynamically so that you can optimize your time and save gas, are great. We have already talked about social engagement; it also helps us to be green (automated traffic management), and we have already talked about how great it all is.
Crime prevention. Probably everyone saw the recent article about a study the British conducted in London, which is considered the city with the most cameras on the planet, where the argument for installing the cameras was fighting crime. Do you know how many crimes they managed to prevent thanks to the cameras alone? Does anyone here know the exact answer?
One!
So some of these things raise questions.
Here is the problem we face. Remember the world of big data from the Social Mobile Cloud that I talked about? Add the world of sensors to it and, of course, it becomes a place of really interesting problems, especially for us, because sensors know no limits. They are just small pieces of silicon that we would like to place anywhere, that can move anywhere, and that are simple enough to make. Sensors are indiscriminate: they will never turn away a signal because it was not intended for them. They draw no distinctions; they process any signal they receive.
And when we apply this to the Internet, full of the entities we talked about earlier, everything becomes connected and everything is equipped with sensors, so everything communicates with everything else, and the volume of that conversation only grows. What people can generate pales in comparison with what can arise in a sensor-connected world. And that is a very big challenge for our future.
You may ask yourself: why should we care about this? We care because within all this information there are signals that are important to us and that help to ensure national security. We care because we need to understand what is happening, or about to happen, in the outside world, so that we can inform our policymakers before trends take shape and before problems arise.
We need this because we want to stop the next terrorist who is about to carry a bomb onto a plane in his underwear, before his pants catch fire.
We care, and I have to be careful how I say this here, because it may be better for you that you and your friends always know where you are, which in my particular case may not be such a good thing. But above all, we care about the direction in which this world is developing.
And we also care because the information that exists now is significantly different from what existed in a world where intelligence work was done entirely by humans. Here is a nice chart: a greenish bubble and a purple bubble. The green one is the world according to the library decimal classification system, which when I was at school was called the Dewey Decimal Classification (DDC), if I remember correctly. The other is the world of information according to Wikipedia. Which one should I trust? Which way of organizing information do you trust? I know which world I trust: I trust Wikipedia.
What impact does Big Data have on us? Basically, it helps us understand what is happening in the world and know what we know, and understand where our gaps are so that we can do our work better. Filling those gaps takes us a lot of time and requires some very expensive assets, and we really do not need to use them to collect information that we can find and gather through other mechanisms, such as social media. This leads to some important implications, and over the next six minutes I am going to talk about what I call the four big rules of Big Data.
Number one: "It's the data, stupid!" Remember James Carville: "It's the economy, stupid!" Two: power to the people. Three: we will talk about how latency breeds contempt. And four: in the world of the future, everything is in context, and everything is in your context.
Number one, "It's the data, stupid." A small history lesson from our world, which may sound a little trivial to you, but which we learned in battle, the hard way: no matter how sophisticated your tools are, if they do not work with my data they are completely useless. Our users, as a rule, will choose a mediocre tool that works with their data over the best available tool that can only show what a wonderful thing you could create with it.
And this is necessary in order to understand what is happening in the world of information: we must bring everything together, we must understand the plans of our adversaries, and we need to connect all the dots.
The problem with big data is this: the base of useless information is 500 million gigabytes, while the base of useful information is only 5K.
Our problem is to determine what goes into that 5K. Over our long history we have learned that information has time value, just as money has time value, and the value of any piece of information only becomes known when you can connect it with something else that arrives at some point in the future. If, in our world, a piece of information is carelessly thrown away because you thought it had no value, or you decided not to collect it because you thought it did not meet the needs of the moment, then as new events and new information appear in the world, you will be missing a link in the overall picture. Since we cannot connect links that we do not have, this drives us to try to collect everything and hang on to it forever. Although "forever", of course, should be in quotes.
Some characteristics of Big Data are fairly simple, such as 'more is always better'. The signal-to-noise ratio in this world only gets worse, but the reason 'more is better' is that it lets you measure what is happening directly in your data instead of doing expensive modeling. Does anyone remember George E. P. Box's famous phrase about modeling? "All models are wrong, but some are useful." The problem with modeling is that it forces you to make assumptions, all of which are in some way distorted by your view of current events. We want to get away from a distorted perspective and have a clear understanding of what is happening in the world.
On the other hand, users are not data scientists or engineers. They are not versed in the details of the material. And we need to be sure that, whatever happens in our world, our information, the actual data sets, carries enough intelligence that the user has to do nothing more than ask a question and get a meaningful answer from the right data set. If users have to go through thousands of data sets by hand, trying to work out which of them contain information relevant to their subject, that is a losing proposition all round.
Next is power to the people. I will tell you that today's analytics and tools are hard to use. To extract valuable information from data we need specialists; we call them data scientists, and we are trying to raise the prestige of this discipline, because the information, skills, and knowledge it requires are very complex and take considerable time to master. The problem is that it requires a lot of manual work, and much of what goes on is not built into our business environment.
The scientific world is developing these fields, which we have already talked about a lot here: data scientists, data engineers, and so on.
A data scientist, according to Wikipedia, must have fundamental training in all of these areas. And how many people on the planet have these skills? Not many. Of course, with grants in hand, many universities around the world have started programs in these new disciplines, which is good news, but so far [indiscernible] the state of affairs is still far from ideal.
We believe that the democratization of Big Data will win. Our goal is to bring closer the moment when we can put the power of Big Data and analytics into the hands of the average user. The only way real value can be realized, and by the way this is just as true in the commercial sector and for individual companies, is when everyone has access to the tools and the data and can do their work without worrying about how it all works.
We want elegant, easy-to-use tools to appear tomorrow. Let the machines do the hard work; we need simple things, such as search. Search in the modern world, which we talk about constantly and which is already approaching petabyte scale, is still not smart.
We understand all these things, and we can name seven universal constructs around which we want to do analytics. We care about people, places, and organizations; we care about time, events, things, and concepts. What we want for analysts is for everything to be as simple as using functions in Excel. You go into Excel and write your little formulas, sums, standard deviations: open a bracket, select a list of values, close the bracket, and you immediately get an answer. And you can see whether it is correct or not. We want a similar tool for, say, analyzing a group of people. I need to see the connections between them, and it would be great if I could open a bracket, enter a list of names, and close the bracket. And what do I want to get back? A beautiful network graph that shows how the people are connected in whatever different ways I need.
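The "open a bracket, enter a list of names, close the bracket" idea described above might look something like the following. This is a minimal sketch, not any tool mentioned in the talk: the function name `network`, the shape of the `records` input, and all sample data are hypothetical and invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def network(names, records):
    """Return {person: {other_person: [shared_items]}} for the given names.

    `records` maps a hypothetical shared item (a meeting, a phone number,
    an address) to the set of people associated with it.
    """
    wanted = set(names)
    graph = defaultdict(lambda: defaultdict(list))
    for item, people in records.items():
        # Link every pair of requested people who share this item.
        for a, b in combinations(sorted(wanted & set(people)), 2):
            graph[a][b].append(item)
            graph[b][a].append(item)
    return {person: dict(links) for person, links in graph.items()}


# Entirely made-up sample data.
records = {
    "meeting-1": {"Alice", "Bob"},
    "phone-555": {"Bob", "Carol", "Dave"},
    "address-9": {"Alice", "Carol"},
}

graph = network(["Alice", "Bob", "Carol"], records)
for person, links in graph.items():
    print(person, links)
```

The caller supplies only a list of names, in the spirit of an Excel function; everything about where the links come from stays hidden behind the call. Drawing the result as a picture is left out of the sketch.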
I believe all this will be quite simple for those who use it. And we want people to use all of these things, including in unexpected ways, and to be able to recombine them so that they get more and more complex results from relatively simple building blocks.
This is where I would like to come back to the participants in the Arab Spring: here I would like to run a sentiment analysis over time and put it on a map as a heat map. And I would like all the user has to do to be simply drawing a diagram, as in Visio, and seeing what comes out the other end. We want it to be as simple as possible for them.
Latency breeds contempt. This is about speed. Speed is the only thing that matters in our world, and I think it is also the only thing that matters from a commercial point of view.
Simply because we want everything to be fast and do not want to wait. What drives my users crazy more than anything else is waiting for something to happen. So I think we are gradually moving into a world where this is already becoming possible. We have MapReduce jobs that run in near real time: we are getting rid of the MapReduce that is flexible, powerful, and slow, and we want a MapReduce that is flexible, powerful, and very fast.
We actually want to move all of this onto what we call petabyte-scale in-memory architectures, so that we can do distributed analytics and other things of that kind. This kind of thing leads to the technological change that you constantly read about.
And we think these processes will lead to the development of new competing architectures, radically changing the way things are done.
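For readers unfamiliar with the MapReduce model mentioned above, the following is a minimal single-process sketch of its three steps (map, shuffle, reduce) using a word count. It illustrates the programming model only; it is not a distributed implementation and says nothing about the systems the CIA actually runs. All function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key between the two phases."""
    groups = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def map_reduce(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["big data big tasks", "big clouds"])
print(counts)
```

In a real cluster the map and reduce calls run in parallel on many machines and the shuffle moves data between them, which is where much of the latency the speaker complains about tends to come from.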
Finally, everything is in context. In your context, and this is important, because that is the world as we believe we have built it and as we perceive it.
It will be within your frame of reference, because everything else will be within someone else's. So the purpose of all the widgets is to let you build your WebTop, or whatever it ends up being called, using the tools and capabilities you need to do your job properly. What is the point of all the material that has emerged in the Big Data world around 'schema on read'? It is data without context, from which you need to extract value. I want, as I said earlier, analytics assembled by the user in the context of the problems and questions being asked, and then all of it computed in the context of the requirements of the work being done. That, roughly, is the problem of elastic computing in our world.
A few thoughts in conclusion. I believe that it is high noon in our information age, when the sun is high above your head, and I say this for a reason.
We are already very close to being able to process all the information generated by humankind. Do you know what is good about people, compared with sensors? There is only so much you can do in 24 hours. You sit here, take your notes, take pictures, or just listen: you do one thing. You cannot do many other things at once. You generate only so much data. That is where things stand now, and if you do not believe me, go back to my Facebook example, which holds one seventh of the planet's population and 35% of all digital photos taken. Now think about what they [sensors] can do.
The inanimate becomes sentient. When it becomes sentient, I find it somewhat unsettling. A third wave of computing has arrived with the appearance of cognitive machines. Watson is a prime example. Interestingly, Watson is to cognitive machines what the 8088-based IBM PC is to modern machines. Gradually these machines will radically change our world and will be at work in everything: in medicine, in trading on the stock exchange, and in helping us with intelligence analysis abroad.
It is a fact that the world moves faster than government and legislation can. I would bet that it is moving faster than you can keep up with. You may ask: what are your rights, and who owns your data? I bet you will raise that question. As I said earlier, this is driving social change at a pace and in ways we cannot even anticipate, and it all makes for a very interesting world. I will not talk about cyber threats here, because we have run out of time. Thank you very much.
[Applause]
HOST 28:47
Thank you very much. That was amazing. I think we are now ready for dinner, and I think everyone can find you somewhere around here. Thank you again, Mr. Gus Hunt, CTO of the CIA. I do not know about you, gentlemen, but I am going to throw my phone into the river after dinner.