About a very spy authentication method

So, we send super-secret agents Alice and Bob to an enemy country under cover. In the course of the mission, they will have to contact and work together, share information, regular espionage cases. Of course, all this must be done in compliance with all possible safety rules and techniques.

After all, the last thing we want is their disclosure: both the mission itself and the agents themselves, as well as the entire national security, are at risk. Therefore, it is in our interest to give the spies the minimum necessary information. In particular, the less they know about each other and communication techniques, the better.

But how then can they identify their comrade in headquarters?

KDPV

TL; DR - inventing a user authentication mechanism using steganography for an imaginary three-character agency of a non-existent country.

About wolves and sheep skins

A cover is a cover, therefore neither Alice nor Bob should raise suspicions with any actions. Proper planning implies paranoia about constantly spying on them at all possible levels. This post will not consider the task of directly exchanging information (it deserves its own separate series), but only a way to make sure that it is passed on to those who need it, to those who need it.

Most likely, both spies will have a history in the format of ordinary citizens, moreover, they are not connected with each other. Therefore, it is immediately necessary to impose a veto on the use of classical cryptographic tools and secure channels - every counterintelligence agent knows that there is nothing to hide for honest people who do not have a close relationship.

What to do?

Of course, such a task is not new, it happily existed and was solved long before the appearance of these of your Internet. And not only was it solved, so some decisions were entrenched in culture and are still found in books, films and games.

We will see at least such a scene: two people in long coats converge in a public place and exchange very strange phrases. If the initiating phrase and the answer are correct, then the authentication is successful, and people exchange folders with the note "Top Secret" and diverge in unknown directions.

The lack of such a scheme is immediately obvious - phrases must be kept secret and often changed , which is not very easy in enemy territory. At the same time, in order not to be uttered by chance and not lead to a case with KDPV, they are made quite distinguished and random, which means that they can issue agents who pronounce them.

We, in the digital age, do not like this method. Especially if you remember that almost all communication channels are controlled by someone and are heard from both good and bad motives. And what they would not have assured us about, people's lives, well, you shouldn’t trust the privacy policy of some Facebook.

Steganography (again?)

No brainer that the ability to hide one in the other in such a situation looks more attractive than ever. And in fact, even the described method is its subtype - code phrases can be viewed as containers with only one bit of information inside.

The same mammal of the Insectivorous Squad understands that this is not about simply throwing stegcontainers to each other. Such an exchange will cause almost more suspicion than any common PGP encryption, so we are not interested.

What about then?

Unlike cryptograms, stegocontainers have an obvious advantage - the context of application. Any text, image, audio file, etc., in addition to the obvious content, also carries the possibility of its natural discussion and can be sent for a reason not just a dialogue from the bay.

Armed with just such ideas, we can already make a simple protocol for steganographic authentication based on a common key:

A -> B: a nice message requesting a steganographic container of certain parameters;
B: selects a C container that matches the requested context and parameters;
B: similarly creates message M ;
B -> A: C '= Embed (C, M, K) ;
A: Checks C ' for compliance with the set parameters;
A -> B: M '= Extract (C', K) ;
B: Checks if M and M ' match.

Such a protocol has obvious flaws — Alice and Bob must have both a shared key and embedding and retrieval functions. Compromising them can lead to the possibility of a detailed analysis of the authentication method of the enemy and endanger both other users and the headquarters itself. Need to fix something.

Artist is not a bug

If the reader was in school after the appearance of computer classes, then he should remember learning the basics of algorithmization with the help of the performer turtles, ant and the like. Their idea was to demonstrate the possibilities of optimizing a large number of manual single actions by creating simple programs. To solve our problem, we need to go in the opposite direction.

Well, is it not lovely?

Since we can simplify the writing of the final algorithm from the sequence of steps to their procedural description according to the given parameters, we can also perform the reverse process. If we imagine the container as an array of some of its components, then the message embedding by key can be written as an ordered sequence of operations on the elements of the container at certain indices with various constant parameters.

Nedomathematics begins here, therefore, if I’m afraid of shy people, I’ll just flip through difficult-looking paragraphs to the operating section or even a little further away. Nothing bad will happen, I promise.

To embed data, we need a sequence of forms: (f1, S1, i, D1), (f2, S2, j, D2) ... , where:

Di - some of the embedded data;
i, j - indices of container elements;
fi: (State, Element, D) -> (State, Element) - embedding function;
Si is a state, operation context, (El ', S [i + 1]) = fi (Si, El, Di) .

To extract, you do not need to store parts of the data (K.O.), therefore triples are enough: (g1, S1, I1), (g2, S2, I2) ... with similar values, only gi: (State, Element) -> (State, D) .

All this can be represented by a symmetric diagram below. If for some reason I did not succeed in reaching clarity, then it is not terrible, just read further.

Well, or no more

It is seen that the embedding function has a greater number of degrees of freedom. Unlike her sister, she modifies a container, while doing so based on two independent elements — the embedded data and the element. Thanks to, or, more precisely, because of this, there are two possible global approaches to the implementation of the steganography algorithm by such a system:

Select the most optimal in accordance with the function of embedding the indices of elements to change (the least noticeable or not requiring it at all) and transfer the resulting sequence tied to a specific container. With this approach, it is necessary to isolate them from each other until the need arises for embedding using classical methods such as encryption and other protected media;
Find such a method of splitting the container into elements and the function of embedding, that any expected change by it will be equivalent to imperceptible. In this case, the sequence does not depend on the container and can be created at least completely random generator. The downside is less flexibility and the lack of control of the worst case. On the other hand, such an approach is simpler and more convenient when applied in the field, so below I will use it.

If the algorithm does not require a state, then all the above remains true, simply without a single letter and block in the diagram. Without it, even easier, in fact.

And why do we need it?

Now, if you know in advance which containers with which messages and keys will be used, instead of fully disclosing parts of the algorithm, you can generate and give agents to use only similar sequences and an interpreter for them. Well, okay, not just give, of course, but more on that later.

Add asymmetry

Even a turtle artist can draw a square in hundreds of different ways, simply changing the order of operations and adding new ones. So, no one bothers us and do the same with the described sequences for fixed input data.

That is, we can take the sequence of embedding, add new operations, mix it all, so much so that the result will remain the same. Unless, in the presence of a state, it will be necessary to monitor it and add the necessary changes separately to the sequence. That is why it's easier without him, yes.

Anyway, after such mixing and noise, even the builder himself will not be able to understand what he actually embeds, such that: any sequence of N operations will represent N! potentially embedded messages — one for each permutation of embedded parts. At the same time, N itself is a big question. Therefore, it is possible to call such sequences open - they do not give any information about the embedded message, or about the algorithm and key used.

When extracting information, the order is very important to us (to restore the right message from all possible) and the number of extracted parts, so the extracting sequences remain unchanged from the moment they are born. Since they implicitly contain information about the key used, the generator and the algorithm, they, like animals from the red book, should be stored and protected. And keep secret.

What does it have to do with asymmetry? The fact is that now each extracting sequence is matched by an infinite number of embeddings. And the restoration of one by the other is generally an unsolvable task.

We exploit

We forget about any near-mathematics and return to the original problem - how can we send Alice and Bob to enemy territory in order to:

in each other's face did not know
had no secret algorithms on hand
but could make sure each other when communicating over an open channel?

Well, the first point is clear, we just do not give them any explicit information about each other, no shared keys. For the second, you need to remember the description of the protocol above. Now we can exclude from the use of directly the Embed and Extract algorithms representing the potential state secret and all that. And, with this in mind, for the third one, we can draw up the following two-step protocol.

Creation of authentication information prior to the start of the mission with headquarters in the role of Trent’s trusted side:

T: selects the secret algorithm and the secret key K, creates with their help:
- extract sequence Ex ;
- suitable for authentication (below) context Ctx ;
T -> A: Ctx, Ex ;
T: using Ex and the created context, generates:
- Not used previously for selected agents message M from among the parts | Ex | ;
- a one-time sequence of Em , making it open as described above;
T -> B: Em, h (M) , if desired, creates additional sets.

Thus, Alice has only one sequence for all occasions and the context of the future contact, and Bob becomes the happy owner of a set of one-time sequences and hashes of messages that they embed.

The authentication protocol already during the mission looks like this:

A -> B: initiating IM message based on Ctx context with container description;
B: selects the appropriate C ~ IM ;
B -> A: C '= Em (C) ;
A: checks for compliance with C '~ IM (since the changes are invisible, must be preserved);
A -> B: M '= Ex (C') , marks M 'as used;
B: checks h (M ') == h (M) , destroys Em, h (M) .

The attentive reader will notice that before the protocol is held, Alice, that Bob possess just a set of information, which in itself means nothing either for them or for a potential adversary, and only during the game “plays with colors”.

Each open Bob set is used only once, which is controlled by Alice’s penultimate step. When she encounters a previously used M (and, therefore, an invisible Em ) by another person, she realizes that one of her "comrades" is a fake.

Reuse by the same person tells her that she is not aware of the details of the protocol and is certainly not the one with whom she needed to come into contact. Well, better late than never.

Okay, this is how it all looks too complicated and incomprehensible. Has anyone got here?

Let's better demonstrate in practice, because even the spies themselves don’t need to know the details of the protocol for use, what can we say about poor readers? Only first a little about how it was all implemented.

High tech

So, it remains only to write everything you need for the protocol. Well, not to do everything with your hands (although it is possible so). And today the victim of my code will be ... turns the wheel of fortune ... Java? Well, okay, at the same time everything will be in the STL, there’s no need to search for anything.

Let's start with the necessary API. For work, it is only necessary to define the class of the array of elements of the container with the ability to receive and modify elements by index:

class MyContainer implements StegoContainer<MyElement> { public MyElement get(int i) { //  i-  } public void set(int i, MyElement v) { //  i-  } public int size() { //    } }

Further use is reduced to the creation of a wrapper for a steganographic automaton above the desired container and to feed into its input the functions of embedding and extracting:

 StegoMachine<MyState, MyElement> myMachine = new StegoMachine( initialState, new MyContainer<MyElement>(/*   */) ); final StegoEmbed myEmbed = (st, el, dp) -> { //    dp   el   st }; final StegoExtract myExtract = (st, el) -> { //     el   st return dp; }; //    MyDataPart part = /*    */; myMachine.exec(1337, myEmbed, part); //... //    /,   //      State currState = myMachine.getState(); //... //    part = myMachine.exec(80085, myExtract); //... //        MyContainer container = (MyContainer) myMachine.getContainer();

Similarly, classes with the Stateless suffix are used, in case the implementation of the algorithm does not require the maintenance of an internal state.

Sequence generators can work as they please and do not have a common API. In general, anything can be part of the data, from single bits to rock art in a separate encoding.

Implementation example

About method

As an example of implementation, using the interfaces created, I implemented a simple algorithm from the LSB family for bitmap lossless compression. Their elements are pixels that do not have neighbors for the least significant bit of all RGB components. The embedding function works with single bits of the source data and simply changes the low-order bit of the value of one of the components (which index will point to).

Quite simply, but great for implementing the protocol, since changing any element is equally invisible by their choice, so generate indices of variable elements using a random generator. In the case of Java, using SecureRandom , but if you wish, it can easily be changed to its source of entropy.

However, this is a very simple method, I do not advise it to use real spies.

About hashes

Since the text has a tendency to be distorted depending on the simulated personality of the agent (some do not put in capital letters, others like to put emoticons, etc., some are illiterate at all), I suggest using sha256 to calculate the hash, but only from printed words translated into lower case:

 h("Hello world?...") == h("hello, world!11")

About the interface

The program complex consists of two parts - one for generating sequences and other hashes for Trent, the other for embedding and checking received messages for compliance.

Work with both comes from the command line through its arguments and input-output streams, other interfaces are not delivered (fear and horror). Still, to be that employee of the headquarters, that a spy - means to have any qualifications. Well, if not, I’ll show you by example.

Here is the help

Screenshot usage

What should they all do?

For starters, Trent at headquarters needs to work out authentication information. In particular, consider in advance the situation in which the agents will work.

For example, let Bob be a graphics freelancer and Alice his customer. Authentication will occur under the guise of the order to create graphics / design / something else.

We share this useful information with both and return to the protocol itself. Let's prepare in advance a suitable M.txt embedded message, minimizing the number of characters in it: "it is great for me where to transfer money." We generate Em and Ex with Trent utility:

 Trent@HQWorkstation:~$ java -jar HQUtil.jar -ex $(stat -c%s "M.txt") 4096 > Ex.txt Trent@HQWorkstation:~$ cat Ex.txt | java -jar HQUtil.jar -em "$(cat M.txt)" 0.25 4096 > Em.txt Trent@HQWorkstation:~$ cat M.txt | java -jar HQUtil.jar -h > hash.bin

Here $(stat -c%s "M.txt") returns the message size in bytes, and 4096 the limit for the range of generated indices (to allow the use of smaller containers). Similarly, $(cat M.txt) used to pass the message itself to the command line parameter. In principle, you can do without a bash, using your own manual labor, but to whom it is more convenient.

Ex.txt is passed to Alice, Em.txt and hash.bin to Bob. Now imagine that the agents are successfully implemented and want to communicate with each other - go to the protocol. Bob places his resume or job offer on some stock exchange, and Alice starts talking:

 : ,     %_% :        ,   .  ? : ,

Bob is looking for an image of an umbrella, maybe even draws himself, if the soul is creative, slightly squeezes / imposes a watermark (or whatever freelancers are doing there now) and performs:

 Bob@PC:~$ cat Em.txt | java -jar SpyUtil.jar -e umbrella.png

After waiting a while, pretending to work, if in reality he did not do it, he sends the received container to Alice, naturally keeping to the context:

 : ,  ,

Sends an umbrella with a message, 670kb

That, in turn, retrieves the message stored inside:

 Alice@PC:~$ cat Ex.txt | java -jar SpyUtil.jar -e umbrella.png

Turns a set of words into a normal sentence and sends it to Bob:

 : ,  ,   ?

He checks the accuracy of the message:

 Bob@PC:~$ java -jar SpyUtil.jar -c hash.bin ",  ,   ?" ,  ,   ? - Correct

And continues the easy communication, if everything is okay. The whole dialogue on the part of the observer looks something like this:

500 kilobyte gif

It is clear that the counter-intelligence will not find anything suspicious in all this intercepted. In fact, even steganalysis methods in this case will not always be used - well, someone ordered an umbrella image for 5 bucks, they found something to surprise the Internet. Computing resources and people are not infinite to check every such situation. Authentication was successful, curtain.

-> GitHub

Source: https://habr.com/ru/post/456670/

All Articles