
Why virtually everything reported about the “hacking” of Facebook is untrue

If you follow the news, you may have noticed that a company called Cambridge Analytica keeps showing up in the headlines. The media tell the following story:

A shady British data-analytics company, with the help of a 24-year-old genius, developed innovative technology to hack Facebook and steal 50 million user profiles. It then used this data to help the Trump and Brexit campaigns psychologically manipulate voters through targeted advertising. As a result, Britain voted in a referendum to leave the European Union, and Trump was elected president of the United States.

Unfortunately, almost every claim in that story is misleading or simply wrong.


First, there was no hacking.
The collected data was taken from user profiles after users granted a third-party application permission to access it. Remember those small confirmation dialogs that pop up when you want to play Candy Crush, or when you log in via Facebook to avoid creating yet another password for some random site? Yes, those very ones:



Aleksandr Kogan, a Cambridge scientist not affiliated with Cambridge Analytica, built an application called “Test Your Personality”, promoted it by paying people $1 per installation through the crowdsourcing site Amazon Mechanical Turk, and used the granted permissions to collect profile data. 270,000 people installed the application, so you might expect it to have collected information from 270,000 profiles; in fact, it harvested 50 million.

50 million profiles?!

Yes. Back in the carefree days of 2014, Facebook offered a feature called “friends permission”, which gave an application access not only to the profile of the person who installed it, but also to the profiles of all of that person's friends. To prevent this, you had to enable a specific setting in the privacy section, which few people knew about (here is a blog post from 2012 explaining how to do it). It was this “friends permission” that let Kogan multiply 270,000 consents into 50 million profiles.
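The arithmetic behind that multiplication is easy to sketch. A back-of-the-envelope estimate in Python, where the average friend count and the overlap between installers' friend lists are assumptions for illustration (the article itself gives only the 270,000 and 50 million figures):

```python
# Back-of-the-envelope: how "friends permission" turns 270k installs
# into tens of millions of profiles. The average friend count and the
# overlap fraction below are assumptions, not figures from the article.
installs = 270_000
avg_friends = 190        # assumed rough average Facebook friend count
overlap = 0.03           # assumed fraction of friends shared between installers

reachable = installs + installs * avg_friends * (1 - overlap)
print(f"{reachable / 1e6:.1f} million profiles")  # on the order of 50 million
```

With these assumed inputs the estimate lands right around the reported 50 million, which is why the feature was so popular with developers: one consent dialog fanned out to a couple of hundred profiles.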


The fact that Facebook users' data was handed out through their friends, without notification or consent, was a serious concern that privacy advocates were already raising back then. So in 2015, under growing criticism and pressure, Facebook removed the feature, explaining that it wanted to give users “more control” over their data. The decision dismayed application developers, since access to friends' profiles was extremely popular (see the comments under the 2014 announcement of the feature's removal). Sandy Parakilas, a former Facebook manager, told Bloomberg that before the feature was switched off, “tens or maybe even hundreds of thousands of developers” were using it.



Let's summarize. At this point, we have two key points:

  1. None of this involved “hacking” Facebook or exploiting any bugs. It was about using a feature Facebook offered to all developers, and at least tens of thousands of them used it.
  2. The collected data was not Facebook's internal data. It was data that developers gathered from the profiles of people who installed their applications (and of their friends). Facebook holds far more data on its users than what is publicly visible on profiles, and it holds it for every user of the platform; only Facebook has access to that. The journalists writing about this story apparently failed to grasp this point: they kept equating “Facebook's internal data” with “data collected from user profiles by a third-party application”. There is a big difference between the two.

The importance of the second point becomes obvious when you read passages like this:
Simon Milner, Facebook's policy director for the UK, when asked whether Cambridge Analytica (CA) had Facebook data, replied: “No. They have a lot of data, but it is not Facebook user data. It may be data about people who are on Facebook that they gathered themselves, but it is not data that we provided.”

This passage was offered as evidence that Facebook lied to politicians about its relationship with CA. But once you understand the difference between Facebook's internal data and data collected on Facebook by third-party developers, it becomes clear that what the policy director said is most likely true.

So how does CA fit into this whole story?

They paid Kogan for the 50 million collected profiles. Whose idea it was originally is now hard to establish: Kogan says CA approached him with the proposal, and CA says Kogan approached them. Either way, the violation here was exactly this: not a breach of Facebook's internal data, but of its rules on data distribution. Developers were allowed to collect any user data their applications needed, but they were not allowed (even in 2014) to collect that data in order to sell it to third parties.

And yet, whatever the official rules said, Facebook does not seem to have tried very hard to monitor what its developers did with the collected data. Perhaps that is why, when Facebook first discovered in 2015 that Kogan had sold the data to CA, it settled for written confirmation from both parties that the collected data had been deleted.



The fact that at least tens of thousands of developers had access to this kind of information meant that data collected on Facebook would inevitably be sold, or otherwise end up with third parties. A former Facebook manager, unhappy with the situation, confirmed as much:
Asked whether Facebook monitored the data obtained by external developers, he replied: “No. Absolutely not. Once the data left Facebook's servers, there was no control and no insight into what happened to it next.” Parakilas said he “always assumed there was a black market” for data collected from Facebook and passed to third-party developers.
Given how widespread this data-collection practice was, and that many developers reached far more than 270,000 users, why was it CA that ended up in the headlines?

It comes down to how journalists, above all Carole Cadwalladr of the Observer, framed the story. Most coverage pushed two claims. First, that a CA whistleblower had revealed a “massive leak” of Facebook data; we have already dealt with that one. Second, that this “leak” was linked to the success of Trump's presidential campaign.


Christopher Wylie, the outstanding mind who “hacked” Facebook

The second claim is as dubious as the first, resting mostly on the grandiose statements of Christopher Wylie, a pink-haired former CA employee. Carole Cadwalladr, who worked on this story for years, has said in various interviews that she approached it not as an investigative journalist but as a feature writer. That meant paying more attention to the “human side of the story”, which in practice meant Chris Wylie. The approach has its pros and cons, but its biggest drawback is how heavily her articles ended up depending on Wylie's own account, in which he cast himself as a young talent at the center of world-spanning political conspiracies.

Cadwalladr fully buys into Wylie's self-presentation, obsequiously describing him as “clever, funny, daring, wise, hungry for knowledge, intriguing, impossibly young.” “The trajectory of his career, like most aspects of his life, was outstanding, incongruous, improbable.” “Wylie lives for ideas. He talks nonstop for hours on end.” “When Wylie turns his full attention to something, his strategic brain, his attention to detail, his ability to plan 12 steps ahead are scary to watch.” “His set of outstanding talents includes political skills of such a high order that, next to them, House of Cards looks like a cooking show.”

Wow. What a guy.

Cadwalladr's personality-focused approach makes the articles more readable, but it also buries essential technical details beneath sensational quotes and personal stories from the lives of Wylie, his friends, and colleagues. Such material can be food for thought if approached critically, but that rarely happens. Instead, Cadwalladr simply believed the story Wylie told her: “By the time we first met in person, I had been talking to him for several hours every day.”

So let's correct that oversight and examine Wylie's claims a bit more critically:


The last point is the most important, and it is the one with the least evidence behind it.

There may be a temptation to point to Trump's unexpected victory, but that question is full of confounding factors. Trump did win. But he beat the most unpopular Democratic candidate in modern history, one who was trying to win the Democratic Party a third consecutive term (something that had not happened since the 1940s). Moreover, he won by a very narrow margin and lost the popular vote.


Alexander Nix, CEO of CA, in front of an array of impressive charts

Could all this be evidence of the accuracy of CA's psychological targeting? Perhaps, but then we risk working with an unfalsifiable hypothesis. It would be better to look at CA's ratio of wins to losses. Unfortunately, we have no access to its client list, but we know it first gained fame working on Ted Cruz's presidential campaign. Yes, that Ted Cruz, the Republican senator whom Trump crushed in the Republican primaries despite all the “power” of CA on Cruz's side. I am not the first to notice this obvious contradiction; Martin Robbins made the same point in an article last year:
The story of the Republican primaries is that CA's fancy data lost to a dude with a thousand-dollar website. Turning that into a thrilling saga of invincible scientific voodoo that inexorably dragged Trump to victory takes a serious stretch. Did they even work for anyone else? Without a client list, it is all too easy to cherry-pick the winners.
The point of the technologies CA uses is to build algorithms, trained on social-network data, that can accurately predict how effectively a message will influence a person given their personality and psychological profile. That is what articles about using psychographics to micro-target voters have in mind. But most claims about the effectiveness of such technologies are wildly exaggerated. Kogan, the Cambridge scientist at the center of the discussion, wrote much the same. He claimed he had been made a scapegoat, and that the personality profiles he collected were not actually useful for micro-targeting predictions:
“As we dug further into this topic,” he wrote, “we found that the predictions we gave SCL were six times more likely to get all five of a person's personality traits wrong than to get them all right. In short, even if this data had been used for micro-targeting, in reality it could only have hurt that goal.”
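Kogan's “six times” figure can be translated into an implied per-trait accuracy. A minimal sketch, assuming (purely for illustration) that the model makes an independent binary high/low call on each of the Big Five traits:

```python
# If P(all five wrong) = 6 * P(all five right) with independent
# binary calls per trait, then (1 - a)^5 = 6 * a^5, which solves to
# (1 - a)/a = 6^(1/5). The independence assumption is ours, for
# illustration; it is not stated in the article.
def per_trait_accuracy(ratio_wrong_to_right: float, traits: int = 5) -> float:
    r = ratio_wrong_to_right ** (1 / traits)
    return 1 / (1 + r)

a = per_trait_accuracy(6.0)
print(f"implied per-trait accuracy: {a:.0%}")  # about 41%, below a coin flip
```

Under this toy model, the predictions would be worse than guessing each trait with a coin, which is consistent with Kogan's claim that the profiles could only have hurt a micro-targeting effort.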
Kogan can hardly be called an impartial source, but his statements square with various studies that have shown unimpressive results in attempts to manipulate people through social networks. Take, for example, Facebook's controversial “emotion manipulation” study that several journalists recently cited. None of those references mentioned what a failure it turned out to be.

Facebook ran an experiment on 689,000 users, tweaking the news-feed algorithm to show them slightly more or slightly fewer of their friends' status updates containing positively or negatively charged words. As any researcher knows, with a sample that large you are guaranteed to find statistically significant differences between groups; the more important measure is the size of the detected effect. In the Facebook study, the difference was truly terrifying: people who saw fewer negative updates used 0.05 more positive words per hundred in their own status updates, while those who saw fewer positive updates used about 0.1 fewer positive words per hundred. That's right: Facebook could manipulate people into using roughly one fewer positive word per thousand. This does not mean Facebook is powerless (heavier intervention might produce stronger effects), but it is important to keep things in perspective.
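This is the classic gap between statistical significance and effect size. A minimal sketch of a two-proportion z-test, where the baseline positive-word rate, the per-user word count, and the group split are assumptions for illustration (only the 689,000-user total comes from the article):

```python
# With samples this large, a one-word-per-thousand shift is hugely
# "significant" even though it is practically negligible.
import math

users_per_group = 344_500   # assumed: roughly half of the 689,000 users
words_per_user = 100        # assumed average words posted during the study
n = users_per_group * words_per_user

p_control = 0.050           # assumed baseline positive-word rate
p_treated = 0.049           # 0.1 fewer positive words per hundred

# Two-proportion z-test, treating each word as an independent draw
p_pool = (p_control + p_treated) / 2
se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
z = (p_control - p_treated) / se
print(f"z = {z:.0f}")       # far beyond the 1.96 cutoff for p < 0.05,
                            # for an effect of one word per thousand
```

The z-score comes out enormous while the underlying behavioral change stays tiny, which is exactly the trap the journalists covering the study fell into.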


Note that the y-axis does not start at 0

It turns out that the real story is not that Kogan, Wylie, and CA developed some incredibly high-tech Facebook “hack”. It is that, apart from Kogan's sale of the data, they used ordinary methods that Facebook allowed until 2015. Ever since the story went public, CA has been branded as fearsomely capable and unethical, which is, at least, how it advertises itself to potential clients. But most of what the media repeats is just an uncritical echo of what CA and Chris Wylie say about themselves. The problem is that there is little evidence the company can do what it claims, and plenty of evidence that it is not as effective as it likes to pretend; remember, for example, that Ted Cruz did not make it to the presidency.

No one is completely immune to marketing or political persuasion, but there is little evidence that CA is any better at it than any other PR firm, political consultancy, or voter-targeting outfit. Political targeting and disinformation campaigns, including the advertising bought from Russia [the United States accused Russia of interfering in the recent presidential election; Russia officially denied the allegations / translator's note], surely influenced the outcome of the last election, but were they a decisive factor? Were they more influential than the statement by Comey [the former FBI director / translator's note] about reopening the investigation into Hillary Clinton's emails a week before the election? Or the Brexit supporters' claim that the European Union took £250 million a week that could have gone to the health fund [the claim was actually £350 million / translator's note]? I remain somewhat skeptical.

To be clear, I am not claiming that CA and Kogan are innocent. At the very least, they clearly did things that violated Facebook's rules on data distribution. Likewise, Facebook clearly gave its developers far too much access to private data. What I am arguing is that CA is not the evil puppeteer it is being made out to be. It is more like Trump: it makes wildly exaggerated claims about its own capabilities, and that wins it outsized attention.

Source: https://habr.com/ru/post/371387/

