In one run

Among programming tasks, one often encounters the following: a sequence of elements of the same type is given (usually numbers), it is required to find some characteristic in one pass (standard deviation, number of minimum elements, continuous section with the largest amount ...) Additional restriction - the sequence can be very long and will not fit in the memory. Other restrictions on the elements of the sequence, usually not imposed.
With these tasks, everything is more or less clear: you need to find what is called the “inductive extension” of the desired function at the Moscow State University and implement its calculation. If it could not be found (the required amount of memory is too large), then the problem is not solved.
But there are also other tasks. They have additional restrictions on the elements of the sequence in the aggregate, and these restrictions have to be significantly used to solve (and they should not be checked). The simplest such task looks like this:

Task 1. The sequence contains integers from 1 to N in an arbitrary order, but one of the numbers is omitted (the others are encountered exactly once). N is unknown in advance. Determine Missing Number

The solution is obvious: we look through the numbers, we find their number K and the sum S. According to the condition, N = K + 1, then the sum of the numbers from 1 to N will be equal to (K + 1) * (K + 2) / 2, and the missing number equals (K + 1) * (K + 2) / 2-S. If for some reason you are afraid of overflows, then work with unsigned numbers (overflows are not terrible there - but be careful when calculating (K + 1) * (K + 2) / 2 :)), or instead of the amount, look for the XOR of all numbers.

Task 2. The sequence contains integers. One of the numbers occurs exactly once, the rest - twice. Find the number that occurs once.
')
Here, too, everything is simple: find the XOR of all numbers - it will be the answer. In fact, if a bit in the desired number is zero, then in the whole sequence it will be equal to 1 in an even number of elements, and its value in XOR is zero. Otherwise, similarly, its value in XOR is 1. Or, to put it simply, identical elements when summing up mutually destroy.

Slightly complicate the task:
Task 3. The sequence contains integers. The number X occurs once or twice, the other numbers three times. Find the number X. For simplicity, we assume that the numbers are non-negative.

Hidden text

We proceed in the same way as the previous problem: translate each of the numbers into a ternary system: b = b [0] + 3 * b [1] +3 ² * b [2] + ... For each digit, we find the sum of its values modulo 3 (we denote the sum s [0], s [1], s [2], ...). In addition, count the numbers themselves.
If the numbers in the sequence were 3 * k + 1, then X met once, and its value is s [0] + 3 * s [1] +3 ² * s [2] + ... If the numbers were 3 * k + 2, in the set s [i], the units will have to be replaced by twos and vice versa: x [i] = (3-s [i])% 3, and X = x [0] + 3 * x [1] +3 ² * x [2] + ...

And if you take one more step?
Task 4. The sequence contains integers. The number X occurs 1.2 or 3 times, the remaining numbers - 4 times. Find the number X.

Hidden text

The previous approach will not work here: if we take the number system with base 4, and we find the sum of digits, then for the cases when X was met once or three times, everything will be fine. But if X met twice, we will not be able to find out if the next digit was equal to 0 or 2 - the value of the sum si for this discharge will be zero in both cases. What to do?
In fact, the last time I lied to you. There is absolutely no need to mess around with the ternary system - it was enough to count the sum of bits in each binary digit, and if it was divided by 3, then in X the corresponding bit was zero. If not, then one.
In this problem we do the same, but we check the divisibility by 4. For example, these problems can be solved like this:

static int FindNotThree(IEnumerable<int> seq) { int a=0,b=0; foreach(int c in seq) { a^=~b&c; b^=~a&c; } return a|b; } static int FindNotFour(IEnumerable<int> seq) { int a=0,b=0; foreach(int c in seq) { a^=b&c; b^=c; } return a|b; }

Task 5. People are in a long line. For each of them, except the last one, we recorded his name and the name of the one behind him. The resulting records are mixed and recorded in the file. Required for a single file view to determine the names of the first and last person. It is known that these names are different (otherwise the task is unsolvable), but, in general, the names can be repeated. Each person’s name consists of sixteen 8-bit characters.

Hidden text

We will treat each name as a bit line of 128 elements. In each record we have two such lines - b [i] and c [i].
First, let's see what happens if for each i we find the sum s [i] of the differences b [i] -c [i] for all the records.
Since all names except the first and last occur in the lines b and c, the same number of times, when summed, they mutually destroy each other, and the sum of the first and last names will remain in the sum. The value of s [i], therefore, can be -1, 0, or 1.
If s [i] = - 1, then the value of b [i] for the first name is 0, and for the second one, if s [i] = 1, then the values will be 1 and 0, respectively. But if s [i] = 0, then we can only say that the values of this bit in the first and last name are the same. How would we find them?
Suppose we know that for some k, our s [k] is nonzero. What happens if we find the XOR values (b [i] & b [k]) ^ (c [i] & c [k])?
For all n names except the first and last, the expression n [i] & n [k] will enter the sum twice (once as b, second time as c) and will give zero contribution. If f is the first name and p is the last, then the sum will remain (f [i] & f [k]) ^ (p [i] & p [k]). We are interested only in those bits for which f [i] = p [i] (we have already found the values of the others). Therefore, (f [i] & f [k]) ^ (p [i] & p [k]) = f [i] & (f [k] ^ p [k]), and since s [k]! = 0 , then f [k] ^ p [k] = 1, and the total sum is equal to f [i].
Unfortunately, we cannot say in advance in which bit the names will differ. Therefore, just in case, we will consider the sums
(b [i] & b [k]) ^ (c [i] & c [k]) for all pairs i, k. In total, we need 128 * 127/2 = 8128 one-bit counters and 128 two-bit counters (to calculate s [i]).
For example, you can write processing as follows (we assume that both names in the record are transmitted in one byte array, written in a row):

  static byte[] FindDiffNames(IEnumerable<byte[]> seq) { const int LName=16; byte[,] pairs=new byte[LName*8,LName]; byte[] res=new byte[2*LName]; foreach(byte[] name in seq) { for(int i=0;i<LName;i++) { res[i+LName]^=(byte)(name[i]&res[i]); res[i]^=(byte)(name[i]^name[i+LName]); res[i+LName]^=(byte)(name[i+LName]&res[i]); for(int k=0;k<LName*8;k++) { byte mask=(byte)(1<<(k&7)); if((name[k>>3]&mask)!=0) pairs[k,i]^=name[i]; if((name[LName+(k>>3)]&mask)!=0) pairs[k,i]^=name[i+LName]; } } } for(int i=0;i<LName;i++) { int b0=res[i],b1=res[i+LName],s=0; for(int j=0;j<LName*8;j++) s|=pairs[j,i]; s&=~b0; res[i]=(byte)((b0&~b1)|s); res[i+LName]=(byte)((b0&b1)|s); } return res; }

With this technique, you can also find the difference of sets, one of which is obtained from the other by adding two or even three elements (or adding two and removing one). If the differences are stronger, you have to store the sum of conjunctions not only pairs, but also triples of bits. And the XOR is not enough there - you have to count at least three-bit alternating sums.

UPD: In the discussion of this problem in the comments SeptiM proposed a simpler solution. We consider names as 128-bit integers (xi, yi), and count the sums S1 = sum (xi-yi), S2 = sum (xi ^ 2-yi ^ 2) (the first sum should be signed 129-bit, the second - sign 257-bit. Overflow is ignored, we work modulo 2 ^ 129 and 2 ^ 257, respectively). It is clear that their values are S1 = x1-xn, S2 = x1 ^ 2-xn ^ 2, where x1 is the first name, xn is the last. From here we easily find x1 = (S1 + S2 / S1) / 2, xn = x1-S1.

Task 6. The sequence contains integers, more than half of which are equal to the same number X. In one view of the sequence, find this number.

Hidden text

Note that if we delete two different numbers from the sequence, the condition of the problem will remain true. Therefore, we can cross out pairs of different numbers until all elements are equal to the same number. This number will be X.
To implement this method, we will get a cell in which some element of the sequence will be stored, and a counter — how many copies of this element we have viewed and have not yet crossed out.
When we read the next item, we have three options:
- The counter is zero. Put the read item in the cell, increase the counter by 1.
- The element is equal to the cell value. Increase the counter by 1.
- Element is not equal to the cell value. Decrease the counter by 1.
After we look through the entire sequence, the desired number will appear in the cell.

Unfortunately, it is not possible to generalize this solution to the case when the number X occurs more than in 1 / k cases (k is known). We can also start a k-1 cell with a counter, delete k different elements at a time, we’ll get a candidate for role X at the end of k-1, but we won’t be able to identify it - even the counter will not have the largest value. But if we are allowed to make the second pass, we can count how many times each of the candidates met in a sequence, and issue the most frequent guaranteed.

The original problem has another solution. For each bit, we count how many times it equals 0, and how many - 1, and give a more frequent value. Perhaps it will be possible to generalize it to the case when X occurs more than in 1/3 cases - let's count the statistics for each pair of bits ... what if it helps?

The following two very similar tasks in one pass can hardly be solved. But for them there is an interesting solution for the log (M) passes.
Task 7. In the sequence are written non-negative integers less than M, and it is known that each number occurs no more than once. Find the smallest number that does not occur in this sequence.
Problem 8. The sequence contains M + 1 integer non-negative number, all numbers are less than M. Find some number that occurs at least twice.

Hidden text

The solutions are almost the same. We divide the range 0..M-1 into two or more parts. For each part, we calculate how many numbers it has. In the first task, we leave the earliest subrange, which has fewer numbers than its length, the second, any of the subranges, which has more numbers than its length. The process is repeated until there is a range of one number. It will be the answer.

There is also a task that interests me for a long time, but whose solutions I do not know.
Problem 9. The sequence contains numbers from 1 to N in some order. Each number occurs once. N is known in advance. It is required for one viewing of the sequence to determine the parity of the permutation written in it. What is the minimum amount of memory required for this?
The paradox is that at any pre-selected moment we just have to remember 1 bit of information. But after that it will be necessary to have N + 1 bits - in order to remember which elements go in the sequence after this moment.

Source: https://habr.com/ru/post/243819/

All Articles

In one run

More articles: