
Simple data scrambling algorithms

Sometimes you need to encrypt something, but bringing in serious encryption algorithms would be out of place, like firing a cannon at sparrows. For example, you need simple protection of traffic from users or trojans with sniffers, but the data is not valuable enough to spend much effort on its encryption and decryption, nor on the implementation itself. Or you need to keep some stored data reasonably secret from ordinary users. It is clear that such algorithms will not stand up to a targeted attack by professionals, although we will try to complicate their work, even though such a goal is usually not set. This is usually called scrambling.

Under the cut, I lay out ideas for such algorithms and promise that they will be more interesting than a plain XOR with a fixed key. Just in case, let me stress that these algorithms do not pretend to be cryptographically strong, but I am sure you will be able to find a use for them.

Prerequisites

It is assumed that a potential attacker either does not have access to the code that performs the encryption, or lacks the qualifications for reverse engineering, or the data is not valuable enough to justify heavier attack methods.

Probably everyone knows that in such cases the most common approach is to cyclically apply a fixed-length key with the XOR operation. And everyone knows just as well that such protection does not withstand even a novice "hacker" or an advanced user. I would like something more complicated, yet still simple to implement.
What if we generate the key?

The first thing that comes to mind is to generate a key long enough to at least make it difficult to find the key length. For example, use some pseudo-random number generator with input data known to both the sender and the receiver. One of the most frequently used generators is the linear congruential PRNG. Of course, we suspect this is bad, but what exactly is bad about this approach? The problem is that it is quite difficult to generate good parameters for the generator programmatically: choosing parameters for a linear congruential PRNG so that the sequence is long and non-degenerate is hard. You can read about this in section 3.2.1 of D. Knuth's "The Art of Computer Programming". Therefore these parameters are often inserted into the code as constants and, as a result, a potential attacker receives many messages encrypted with the same key, which greatly simplifies his work.
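To make the problem concrete, here is a minimal sketch of that naive approach in Python (my illustration, not code from the article): a keystream from a linear congruential generator is XORed over the data. The constants are commonly cited LCG values, hard-coded exactly the way the text warns against.

def lcg_keystream(seed, length, a=1664525, c=1013904223, m=2**32):
    # Linear congruential generator; the low byte of each state is the key byte.
    x = seed
    for _ in range(length):
        x = (a * x + c) % m
        yield x & 0xFF

def xor_with_lcg(data: bytes, seed: int) -> bytes:
    # XOR is its own inverse, so the same call both encrypts and decrypts.
    return bytes(b ^ k for b, k in zip(data, lcg_keystream(seed, len(data))))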

And what if the data itself is used to generate this pseudo-random sequence?

This idea struck me about 20 years ago, when I "helped" a friend of an acquaintance write his thesis. At first glance it seems impossible, because we need a generator that produces the same sequence when encrypting and when decrypting. Oddly enough, it is precisely this "fatal" requirement that points the way to building such a generator. Yes, we need an algorithm that changes the values of its internal variables in the same way whether we feed it the source byte (or whatever our atomic coding unit is) or the encrypted byte. How do we achieve this? Everything ingenious is simple: to compute the next key value, use a commutative operation over the pair of source and encrypted bytes. Since the result of the operation does not depend on the order of the operands in the pair, such an algorithm will obviously update its variables during decryption exactly as it did during encryption, while the key sequence for different input data will be different.

An example of a key generator that depends on the input data

To make it clearer, consider a simple example of such an algorithm.
Let x[n] be the next code in the source data, k[n] the current key, k[n+1] the next key value, and y[n] the encrypted code for x[n].
Q(a, b) is some commutative function, i.e. one for which Q(a, b) == Q(b, a).
F(a, b, c) is some integer function.
Then one (de)coding iteration can be written as:
y[n] := x[n] xor k[n];
k[n+1] := F(k[n], Q(x[n], y[n]), n);
While the implementation of F() is, in general, limited only by our imagination and common sense, for Q() you probably want details: namely, what conditions it must satisfy to be commutative. The easiest way is to use the arguments only pairwise inside commutative operations such as xor, addition and multiplication. Examples:
Q(a, b) = a xor b. (Corrected: I got ahead of myself here, because XORing a source code with its encrypted code simply yields the key, which is undesirable. I personally use somewhat more complex functions.)
Q(a, b) = ((a xor b) or 1) * ((a + b) xor 1).
As you can see, it is not hard to invent your own super-duper Q() function. Whether it is worth complicating is another matter; I see no particular point in over-engineering it.
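To tie the pieces together, here is a minimal sketch of the whole scheme in Python, assuming byte-sized codes. The particular F(), the byte masking and the initial key are my illustrative choices, not the author's exact functions; the important property is that Q() is commutative, so the key sequence evolves identically on both sides.

def Q(a: int, b: int) -> int:
    # Commutative: Q(a, b) == Q(b, a); masked to stay byte-sized (my choice).
    return (((a ^ b) | 1) * ((a + b) ^ 1)) & 0xFF

def F(k: int, q: int, n: int) -> int:
    # Any integer mixing function will do; this one is purely illustrative.
    return (k * 13 + q + n) & 0xFF

def scramble(data: bytes, k0: int) -> bytes:
    out, k = bytearray(), k0
    for n, x in enumerate(data):
        y = x ^ k
        out.append(y)
        k = F(k, Q(x, y), n)      # next key depends on the data itself
    return bytes(out)

def descramble(data: bytes, k0: int) -> bytes:
    out, k = bytearray(), k0
    for n, y in enumerate(data):
        x = y ^ k
        out.append(x)
        k = F(k, Q(x, y), n)      # same (x, y) pair, so the same key sequence
    return bytes(out)

A quick check: descramble(scramble(msg, 0x5A), 0x5A) == msg for any byte string and any initial key.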

Well, what is still bad?

Yes, without knowing the code of the encoding function it will be very difficult to read anything. But what clues remain? If the input parameters of the scrambler are the same, then:
  1. If two messages start with the same data, then the beginning of the encrypted data will be exactly the same;
  2. The key for the first code is the same;
  3. The length of the encrypted message exactly matches the length of the original message.

How do we deal with this without devoting our whole young life to it? Of course, there are many possible countermeasures, but I want something quick and dirty; is there such a thing? I have a few ideas on this.
To overcome the first two points there is a very simple but effective technique: when encrypting, insert random data before each message. Even one byte (code) is enough. Since the next key depends on the data, we get different key sequences even for identical messages. The recipient simply drops this prefix after decrypting the message.
To deal with the third point, you can add random data before and/or after the message.
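For illustration, a hedged sketch of the prefix trick, built on the scramble()/descramble() sketch above; the fixed PREFIX_LEN and the use of os.urandom are my assumptions, not details from the article.

import os

PREFIX_LEN = 4   # per the text, even 1 byte is enough; 4 is an arbitrary choice

def wrap(message: bytes, k0: int) -> bytes:
    # The random prefix makes the key sequence diverge even for identical messages.
    return scramble(os.urandom(PREFIX_LEN) + message, k0)

def unwrap(blob: bytes, k0: int) -> bytes:
    # The recipient simply drops the prefix after descrambling.
    return descramble(blob, k0)[PREFIX_LEN:]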

Any more ideas?

You bet! I always have ideas!
Suppose you transfer the data in compressed form. Or the data is already partially encrypted. Or each message/block is fairly long and consists of binary data with an approximately uniform distribution of codes. In this case, any tampering with the order of the codes in the message can significantly complicate the life of a potential attacker. I am sure you could come up with some primitive byte mixer for a data block yourself, but I did promise interesting and beautiful ideas.

Data shuffler

Usually this problem is solved by using some PRNG to produce pairs of code indices in the data block that are then swapped. The trouble with this method is that it is hard to guarantee that some of the data will not stay in place. It is also not entirely clear how many swaps are needed to achieve an acceptable result, although, to be safe, you can simply walk through all the message codes and exchange each one with a random one. And there is one more problem: if the generator has a poor distribution over the square (and the linear congruential generator suffers from this incurably), then at certain block sizes the output values can fall into a loop. It took me a long time to arrive at the idea of a fast pseudo-random data shuffler, and I think it deserves your attention not only as a scrambling building block.

A bit of theory. In section 3.2.1.2 of D. Knuth's "The Art of Computer Programming" you can find criteria for choosing the multiplier of a linear congruential generator so that the length of the generated sequence equals the modulus. What does this mean? It means that a generator with modulus m will produce each value from 0 to m-1 exactly once. Why do we need this? Recall that we want our shuffler to guarantee that every byte (code) of the message changes its place. So, if the length of the data block equals this very m, it is enough to simply write the next input byte (code) of the message into the output buffer at the index equal to the next value from the generator. The simplicity of this algorithm is so seductive that I could not pass it by.

But, as always happens with anything seductive, it was not without problems. First, not all m are equally good. From the same chapter of the same book we see that if m is a product of distinct primes (each to the first power), then a full traversal of the elements cannot be achieved at all (apart from iterating in order, which is of course of no interest to us). It turns out that we cannot obtain a generator with an arbitrary given sequence length, and therefore, if our messages have arbitrary lengths, we cannot always find a suitable generator. A dead end? Do we really need generators of arbitrary length? Recall that knowing the length of a message is very useful for a potential attacker. And we already know the trick we successfully used for the data-dependent generator: that's right, throw in random garbage, preferably before the useful data. True, there is the problem that each block must somehow indicate how much useful information it carries. If in your case the length of all messages/data blocks is fixed, then you can also fix m: choose the first value convenient for you that is greater than the length of the input block and satisfies the criterion of Theorem A in section 3.2.1.3 of the book.

Now, about the criteria for the parameters of the generator x[n+1] = (a * x[n] + c) mod m:
  1. c and m are coprime;
  2. a - 1 is a multiple of p for every prime divisor p of m;
  3. a - 1 must be a multiple of 4 if m is a multiple of 4.

How can this be achieved with little bloodshed? I suggest this option:
m = 2^n, where n > 3;
c = p, where p is a prime number and p > 2;
a = 4 * k + 1.
As you can easily see, all three conditions are satisfied, and such values are quite easy to pick.
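Here is a minimal sketch of the full-period shuffler using exactly this parameter recipe (m = 2^n, c an odd prime, a = 4k + 1). The particular a, c and seed values, and the requirement to pre-pad the block to a power of two, are my illustrative assumptions.

def shuffle_block(block: bytes, seed: int = 12345) -> bytes:
    m = len(block)                 # must be a power of two greater than 8
    assert m > 8 and m & (m - 1) == 0, "pad the block to a power of two first"
    a, c = 4 * 987 + 1, 101        # a = 4k + 1; c is an odd prime, so gcd(c, m) = 1
    out = bytearray(m)
    x = seed % m
    for b in block:
        x = (a * x + c) % m        # full-period LCG: visits every index exactly once
        out[x] = b                 # write the next code at the generated index
    return bytes(out)

def unshuffle_block(block: bytes, seed: int = 12345) -> bytes:
    m = len(block)
    a, c = 4 * 987 + 1, 101
    out = bytearray(m)
    x = seed % m
    for i in range(m):
        x = (a * x + c) % m
        out[i] = block[x]          # read the codes back in the same index order
    return bytes(out)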

Any more ideas?

The idea of combining the shuffler with the data-dependent generator is fairly obvious. First we feed the generator the necessary amount of garbage to pad the message up to the shuffler's block size, and then run the data of the message itself through it. All output codes are written at the indices we get from the shuffler.
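A sketch of that combination, reusing the scramble()/descramble() and shuffle_block()/unshuffle_block() sketches above. Padding up to the next power of two with random bytes in front, and passing the payload length to the receiver explicitly, are my assumptions; the text only notes that the amount of useful data must be indicated somehow.

import os

def scramble_and_shuffle(message: bytes, k0: int, seed: int = 12345) -> bytes:
    m = 16
    while m < len(message) + 1:    # leave room for at least one byte of garbage
        m *= 2
    padded = os.urandom(m - len(message)) + message   # random garbage first, data after
    return shuffle_block(scramble(padded, k0), seed)

def unshuffle_and_descramble(blob: bytes, payload_len: int, k0: int, seed: int = 12345) -> bytes:
    # The receiver must know the payload length (e.g. from a header) to drop the garbage.
    return descramble(unshuffle_block(blob, seed), k0)[-payload_len:]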

I think that's enough for today.

Corrections: fixed the typos that were pointed out and struck out a bad example.

Source: https://habr.com/ru/post/116835/

