
White noise draws a black square

Every analyst, at the start of their work, goes through the hated stage of identifying distribution parameters. Later, with experience, agreement of the residual scatter comes to mean that one stage of the Big Data analysis has been passed and you can move on. There is no longer any need to check hundreds of models against different regression equations, to look for segments with transients, or to compose models, tormenting yourself with doubts: "Maybe there is some other model that fits better?"



I thought: “What if we go from the opposite direction and see what white noise can do? Can white noise create something that our attention will match with a meaningful object from our experience?”





Fig. White noise (file taken from the web, size 448x235).


I reasoned about this question as follows:



  1. What is the probability that horizontal and vertical lines of noticeable length will appear?
  2. If they can appear, what is the probability that their starting points will coincide along one of the coordinates, so that they form a rectangular figure?


Later in the text I will explain how these problems relate to the analysis of Big Data.



In G. Székely's book “Paradoxes in Probability Theory and Mathematical Statistics” (p. 43) I found a reference to the Erdős–Rényi theorem, which reads:

When a coin is tossed n times, a run of heads of length $\log_2 n$ is observed with probability tending to 1 as n tends to infinity.
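The statement can be checked empirically. Below is a minimal sketch, assuming a fair coin simulated with Python's standard `random` module. The function names `longest_run` and `fraction_with_long_run` are my own and do not come from the book. The theorem is asymptotic, so for moderate n the estimated share may be noticeably below 1.

```python
import math
import random


def longest_run(flips, value=1):
    """Length of the longest block of consecutive entries equal to `value`."""
    best = current = 0
    for flip in flips:
        current = current + 1 if flip == value else 0
        best = max(best, current)
    return best


def fraction_with_long_run(n, trials=200, seed=0):
    """Share of simulated n-flip sequences whose longest run of heads
    is at least floor(log2(n)). Heads are coded as 1 (an assumption)."""
    rng = random.Random(seed)
    target = math.floor(math.log2(n))
    hits = 0
    for _ in range(trials):
        flips = [rng.randint(0, 1) for _ in range(n)]
        if longest_run(flips) >= target:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (235, 448, 4096):
        print(n, math.floor(math.log2(n)), fraction_with_long_run(n))
```

Each printed line shows n, the whole part of log2(n), and the estimated share of sequences containing a run of at least that length.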



For our picture, this means that in each of the 235 rows (448 dots long) there is, with probability tending to 1, a run of length:

$$\log_2 448 \approx 8.81,$$

that is, dropping the fractional part, 8 black dots in a row horizontally.



And in each of the 448 columns (235 dots high) there is, with probability tending to 1, a run of length:

$$\log_2 235 \approx 7.88,$$

that is, dropping the fractional part, 7 black dots in a row vertically.



From this we get the probability that a black rectangle of 8x7 pixels will form in the “white noise” of this picture:







Here 1 stands for the first run of black dots in a row, which may lie anywhere in the two-dimensional space.



I do not dispute that this probability is very small, but it is not zero.



Moving on, we can join all the rows into one and get a line 448 x 235 = 105,280 characters long. Then, according to the Erdős–Rényi theorem, with probability tending to 1 there is a run of length:

$$\log_2 105280 \approx 16.68,$$

that is, 16 in a row.
And for a chain of 1 million records:

$$\log_2 10^6 \approx 19.93,$$

that is, 19 in a row.
As we can see, the connection between the Erdős–Rényi theorem and Big Data is clear.
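The run lengths used in this reasoning can be reproduced with a few lines of arithmetic. This is only a sketch. The helper name `run_length` is my own, and the picture size is taken from the figure caption.

```python
import math


def run_length(n):
    """Whole part of log2(n): the run length taken for a sequence of n trials."""
    return math.floor(math.log2(n))


width, height = 448, 235  # picture size from the figure caption

print(run_length(width))           # 8: black dots in a row horizontally
print(run_length(height))          # 7: black dots in a row vertically
print(run_length(width * height))  # 16: all rows joined into one line
print(run_length(1_000_000))       # 19: a chain of 1 million records
```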



Note. What follows is my own analysis of what I found, because I could not locate this theorem and its proof in the form in which it is presented in G. Székely's book.



We find that the Erdős–Rényi theorem can be used as a test of data homogeneity.



It is applicable to distributions that have an expectation (the first-order moment, MX).

It can only be applied to single-channel sequential random processes.



How to apply it



Any distribution that has an expectation can be pictured as deviations from the center: left or right, up or down. That is, as a coin falling heads or tails.



Accordingly, by this theorem there is an interval in which $m = \log_2 N$ consecutive values all lie above, or all lie below, MX(Y(x_i)).
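One possible way to apply this to a data series is sketched below. It assumes the sample mean stands in for MX, and that a value exactly equal to the mean breaks a run. Both are my assumptions, and the function names are illustrative only.

```python
import math
from statistics import fmean


def longest_one_sided_run(values, center=None):
    """Longest block of consecutive values strictly on one side of `center`.
    If `center` is omitted, the sample mean is used (an assumption)."""
    if center is None:
        center = fmean(values)
    best = current = 0
    previous_side = 0
    for v in values:
        side = (v > center) - (v < center)  # +1 above, -1 below, 0 on center
        if side != 0 and side == previous_side:
            current += 1
        else:
            current = 1 if side != 0 else 0
        previous_side = side
        best = max(best, current)
    return best


def homogeneity_check(values):
    """Return (longest observed one-sided run, m = floor(log2(N)))."""
    m = math.floor(math.log2(len(values)))
    return longest_one_sided_run(values), m


if __name__ == "__main__":
    sample = [1, 2, 3, 10, 11, 12, 13, 1]
    print(homogeneity_check(sample))  # (4, 3)
```

Comparing the two returned numbers is left to the reader: the article treats a longest run close to m as a sign of homogeneous data.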



Note. This is where I wanted to see the proof of the theorem, in order to understand whether there is only one such run (only above or only below) or two (one above and one below). In my view, the symmetry of these events should give rise to two runs. On the other hand, judging by the proof of a similar process (these mathematicians worked on graphs), I assumed that they built the proof on finding a maximum. That admits the existence of a proof based on minimizing the objective function. Questions also remain about what the Erdős–Rényi theorem looks like for asymmetric probabilities and for more than two outcomes.



The practical consequence: finding only one such run in the data set under study lets us assume that all the data presented are homogeneous.

Second. If, processing the data according to the Erdős–Rényi theorem, we find a run containing more values than it should, then the situation shown in the figure is likely.





For the purposes of the example, the series shown in the figure is built as a composition of two functions.



Third conclusion. If, processing the data (1 million records) according to the Erdős–Rényi theorem, we find not a single run of 19 values but, for example, three runs of 17 values, it can be assumed that the data as a whole are a composition of three functions, and the positions of these runs can be used to determine the intervals in which transients may occur.
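To make the second and third conclusions concrete, here is a minimal sketch that lists the positions of long one-sided runs. The series in the example is artificial (two segments with different levels, built by me for illustration), the sample mean is again assumed to stand in for MX, and `one_sided_runs` is a hypothetical helper name.

```python
import math
from statistics import fmean


def one_sided_runs(values, min_length):
    """Return (start, length, side) for every maximal run of at least
    `min_length` consecutive values above (+1) or below (-1) the sample mean."""
    center = fmean(values)
    runs = []
    start, side = 0, 0
    # A sentinel equal to the mean is appended so the last run gets closed.
    for i, v in enumerate(list(values) + [center]):
        s = (v > center) - (v < center)
        if s != side:
            if side != 0 and i - start >= min_length:
                runs.append((start, i - start, side))
            start, side = i, s
    return runs


if __name__ == "__main__":
    # Artificial series: the first half oscillates around 0, the second around 10.
    series = [-1, 1] * 8 + [9, 11] * 8
    m = math.floor(math.log2(len(series)))  # 5 for N = 32
    print(one_sided_runs(series, m))        # [(0, 16, -1), (16, 16, 1)]
```

Both runs are far longer than m, and the boundary between them (index 16) marks the place where one function is replaced by the other.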



While working on this material, I noticed the following. All the data analysis methods developed so far were designed for settings in which the parameters of a much larger population must be determined from a small number of field observations: for example, from 100 observations, to determine the properties of a general population of 1 million or more. For modern tasks, where a huge database has to be decomposed, the tools developed by statistics are very laborious.

Source: https://habr.com/ru/post/460473/


