Explanation of SNARKs. From calculations to polynomials, the Pinocchio protocol and the pairing of elliptic curves (translation)

Hi, Habr! Here is a translation of the ZCash blog articles that describe how zk-SNARKs work: the zero-knowledge proof system used in the ZCash cryptocurrency (and elsewhere).

Source: https://z.cash/blog/snark-explain5.html

Previous articles:
Part 1: Explaining SNARKs. Homomorphic hiding and blind evaluation of polynomials (translation)
Part 2: Explaining SNARKs. The knowledge of coefficient test and verifiable blind evaluation of polynomials (translation)

Introduction from the translator

As I begin the final part of the translation, I want to say that we live in a truly amazing time: a time when higher mathematics can be put to work in software development almost immediately, and we can watch the results of the work of mathematicians at technical institutes "in action" in advanced technologies built on blockchains and data exchange.

Well, I won't hold your attention any longer. Let's move on to the most interesting part...

From calculations to polynomials

In the previous articles we developed a specific mechanism for working with polynomials. In this part, we will learn how to translate the statements we want to prove and verify into the language of polynomials. The idea of using polynomials in this way goes back to the pioneering 1991 work of Lund, Fortnow, Karloff and Nisan (Carsten Lund, Lance Fortnow, Howard Karloff of the University of Chicago and Noam Nisan of the Hebrew University).

In 2013, another breakthrough work appeared, by Gennaro, Gentry, Parno and Raykova (Rosario Gennaro, Craig Gentry, Bryan Parno, Mariana Raykova). It defined an extremely convenient translation of computations into polynomials called the Quadratic Arithmetic Program (QAP). QAPs became the basis for modern zk-SNARK constructions, in particular those used in the ZCash cryptocurrency.

In the first part of this article, the translation of a computation into a QAP is explained by example. Even with a small example rather than a general definition, it takes some time to digest at first. So be prepared for a certain mental effort :)

Suppose Alice wants to prove to Bob that she knows $c_1, c_2, c_3 \in F_p$ such that $(c_1 \cdot c_2) \cdot (c_1 + c_3) = 7$. The first step is to present the expression computed from $c_1, c_2, c_3$ as an arithmetic circuit.

Arithmetic circuits

An arithmetic circuit consists of gates that compute arithmetic operations such as addition and multiplication, with wires connecting the gates. In our case, the circuit looks like this:

The bottom wires are the inputs, and the top wire is the output of the circuit for those inputs.

As the figure shows, the wires and gates of the circuit are labeled in a particular way. These labeling rules are needed for the next step, the translation of the circuit into a QAP:

When the same outgoing wire goes into more than one gate, it is still treated as a single wire, such as $w_1$ in the example.
Multiplication gates are assumed to have exactly two inputs, called the left wire and the right wire.
Wires going from an addition gate to a multiplication gate are not labeled, and neither are the addition gates themselves. The inputs of an addition gate are treated as going directly into the multiplication gate. In the example, $w_1$ and $w_3$ are therefore both treated as right inputs of $g_2$.

A valid assignment set for the circuit is an assignment of values to the labeled wires such that the output value of each multiplication gate is the product of its corresponding inputs.

So, for our circuit, a valid assignment set has the form $(c_1, \ldots, c_5)$, where $c_4 = c_1 \cdot c_2$ and $c_5 = c_4 \cdot (c_1 + c_3)$.

In this terminology, Alice wants to prove that she knows a valid assignment set $(c_1, \ldots, c_5)$ such that $c_5 = 7$. The next step is to translate this statement into one about polynomials using a QAP.
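As a quick sanity check of the definition, here is a minimal Python sketch that tests whether a tuple $(c_1, \ldots, c_5)$ is a valid assignment set for the example circuit. The prime 101 and the concrete values are illustrative assumptions that do not come from the article; a real field would be far larger.

```python
P_MOD = 101  # toy prime for F_p (an assumption for illustration only)


def is_valid_assignment(c, p=P_MOD):
    """Check both multiplication gates of the example circuit over F_p."""
    c1, c2, c3, c4, c5 = (x % p for x in c)
    gate1_ok = (c1 * c2) % p == c4          # g_1: c_4 = c_1 * c_2
    gate2_ok = (c4 * (c1 + c3)) % p == c5   # g_2: c_5 = c_4 * (c_1 + c_3)
    return gate1_ok and gate2_ok


# In F_101: 2 * 3 = 6 and 6 * (2 + 16) = 108 = 7 (mod 101).
assignment = (2, 3, 16, 6, 7)
print(is_valid_assignment(assignment))
print(is_valid_assignment((2, 3, 16, 6, 8)))
```

The first tuple satisfies both gates and ends in $c_5 = 7$, so it is the kind of witness Alice claims to know; the second one breaks the second gate.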

QAP

We associate each multiplication gate with a field element: $g_1$ is associated with $1 \in F_p$ and $g_2$ with $2 \in F_p$. We call the points $\{1, 2\}$ our target points. Now we need to define a set of "left wire polynomials" $L_1, \ldots, L_5$, "right wire polynomials" $R_1, \ldots, R_5$ and "output wire polynomials" $O_1, \ldots, O_5$.

The idea behind the definition is that each of these polynomials is zero at every target point except the target point of the multiplication gate in which its wire takes part.

Concretely, since $w_1, w_2, w_4$ are the left, right and output wires of $g_1$, respectively, we define $L_1 = R_2 = O_4 = 2 - X$, because the polynomial $2 - X$ equals one at the point 1 corresponding to $g_1$ and zero at the point 2 corresponding to $g_2$.

Notice that $w_1$ and $w_3$ are both right inputs of $g_2$. Therefore, similarly, we define $L_4 = R_1 = R_3 = O_5 = X - 1$, because $X - 1$ equals one at the target point 2 corresponding to $g_2$ and zero at the other target point.

We set all the remaining polynomials to be the zero polynomial.

Given fixed values $(c_1, \ldots, c_5)$, we use them as coefficients to define the left, right and output "sum" polynomials. That is, we define

$L := \sum_{i=1}^{5} c_i \cdot L_i, \quad R := \sum_{i=1}^{5} c_i \cdot R_i, \quad O := \sum_{i=1}^{5} c_i \cdot O_i$

Then we define the polynomial $P := L \cdot R - O$.

After all these definitions, we can state the central claim: $(c_1, \ldots, c_5)$ is a valid assignment set for the circuit if and only if $P$ is zero at all the target points.

Let's check this on our example. Suppose $L, R, O, P$ are defined as above for some $c_1, \ldots, c_5$. Let's evaluate all these polynomials at the target point 1.

Of all the $L_i$, only $L_1$ is non-zero at the point 1. So $L(1) = c_1 \cdot L_1(1) = c_1$. Similarly, we get $R(1) = c_2$ and $O(1) = c_4$.

Consequently, $P(1) = c_1 \cdot c_2 - c_4$. A similar calculation gives $P(2) = c_4 \cdot (c_1 + c_3) - c_5$.

In other words, $P$ vanishes at the target points if and only if $(c_1, \ldots, c_5)$ is a valid assignment set.
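The evaluation at the target points can be reproduced with a short Python sketch. It builds the wire polynomials of the example as coefficient lists and evaluates $P = L \cdot R - O$ at 1 and 2. The prime 101, the helper names and the sample assignments are assumptions made for illustration only.

```python
P_MOD = 101  # toy prime for F_p (illustrative assumption)

# Polynomials are coefficient lists, lowest degree first.
ONE_AT_1 = [2, -1]   # 2 - X: equals 1 at X = 1 and 0 at X = 2
ONE_AT_2 = [-1, 1]   # X - 1: equals 0 at X = 1 and 1 at X = 2
ZERO = [0, 0]

# Index i - 1 holds the polynomial that belongs to wire w_i.
LEFT = [ONE_AT_1, ZERO, ZERO, ONE_AT_2, ZERO]
RIGHT = [ONE_AT_2, ONE_AT_1, ONE_AT_2, ZERO, ZERO]
OUT = [ZERO, ZERO, ZERO, ONE_AT_1, ONE_AT_2]


def poly_eval(poly, x, p=P_MOD):
    """Evaluate a polynomial at x with Horner's rule, modulo p."""
    acc = 0
    for coeff in reversed(poly):
        acc = (acc * x + coeff) % p
    return acc


def combine(c, polys, p=P_MOD):
    """Return sum_i c_i * polys[i] as a coefficient list modulo p."""
    out = [0] * max(len(q) for q in polys)
    for c_i, q in zip(c, polys):
        for k, a in enumerate(q):
            out[k] = (out[k] + c_i * a) % p
    return out


def p_at(c, x, p=P_MOD):
    """Evaluate P = L * R - O at the point x."""
    l, r, o = (poly_eval(combine(c, qs, p), x, p) for qs in (LEFT, RIGHT, OUT))
    return (l * r - o) % p


valid = (2, 3, 16, 6, 7)     # satisfies both gates in F_101
invalid = (2, 3, 16, 6, 8)   # breaks the second gate
print([p_at(valid, t) for t in (1, 2)])
print([p_at(invalid, t) for t in (1, 2)])
```

For the valid tuple both values are zero, while the invalid tuple leaves a non-zero value at the target point 2, the point that belongs to the gate it violates.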

Now we use the following algebraic fact: for a polynomial $P$ and a point $a \in F_p$, we have $P(a) = 0$ if and only if the polynomial $X - a$ divides $P$ without remainder, i.e. $P = (X - a) \cdot H$ for some polynomial $H$.

Defining the target polynomial as $T(X) := (X - 1) \cdot (X - 2)$, we get that $T$ divides $P$ if and only if $(c_1, \ldots, c_5)$ is a valid assignment set.
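The divisibility criterion can be tried out with polynomial long division over $F_p$. The sketch below is a hedged illustration: the prime 101, the helper functions and the toy assignment $(2, 3, 16, 6, 7)$ are assumptions. For that assignment the sums work out to $L = 4X - 2$, $R = 15X - 12$ and $O = X + 5$.

```python
P_MOD = 101  # toy prime for F_p (illustrative assumption)


def poly_mul(a, b, p=P_MOD):
    """Multiply two coefficient lists (lowest degree first) modulo p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out


def poly_sub(a, b, p=P_MOD):
    """Subtract b from a coefficient-wise modulo p."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [(x - y) % p for x, y in zip(a, b)]


def poly_divmod(num, den, p=P_MOD):
    """Long division over F_p; den needs a non-zero leading coefficient."""
    rem = [x % p for x in num]
    inv_lead = pow(den[-1], -1, p)  # modular inverse, Python 3.8+
    quot = [0] * max(len(rem) - len(den) + 1, 1)
    for i in range(len(rem) - len(den), -1, -1):
        q = (rem[i + len(den) - 1] * inv_lead) % p
        quot[i] = q
        for j, d in enumerate(den):
            rem[i + j] = (rem[i + j] - q * d) % p
    return quot, rem


T = [2, -3, 1]  # (X - 1)(X - 2) = X^2 - 3X + 2

# L, R, O for the toy assignment (2, 3, 16, 6, 7).
L, R, O = [-2, 4], [-12, 15], [5, 1]
P = poly_sub(poly_mul(L, R), O)
H, remainder = poly_divmod(P, T)
print(H, remainder)

# Changing c_5 from 7 to 8 turns O into 2X + 4 and breaks divisibility.
P_bad = poly_sub(poly_mul(L, R), [4, 2])
H_bad, remainder_bad = poly_divmod(P_bad, T)
print(H_bad, remainder_bad)
```

A remainder of all zeros means $T$ divides $P$, and the quotient is the polynomial $H$ used later in the protocol; a non-zero remainder signals an invalid assignment set.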

Based on the above, we define a QAP as follows:

A quadratic arithmetic program $Q$ of degree $d$ and size $m$ consists of polynomials $L_1, \ldots, L_m, R_1, \ldots, R_m, O_1, \ldots, O_m$ and a target polynomial $T$ of degree $d$.

An assignment set $(c_1, \ldots, c_m)$ satisfies $Q$ if, defining

$L := \sum_{i=1}^{m} c_i \cdot L_i, \quad R := \sum_{i=1}^{m} c_i \cdot R_i, \quad O := \sum_{i=1}^{m} c_i \cdot O_i$

and $P := L \cdot R - O$, the target polynomial $T$ divides $P$ without remainder.

In this terminology, Alice wants to prove that she knows an assignment set $(c_1, \ldots, c_5)$ that satisfies the QAP defined above and has $c_5 = 7$.

Let's summarize this part. We saw that a statement such as "I know $c_1, c_2, c_3$ such that $(c_1 \cdot c_2) \cdot (c_1 + c_3) = 7$" can be translated into an equivalent statement about polynomials using a QAP. Next, we will look at an efficient protocol for proving knowledge of a valid assignment set for a QAP.

Above, we tried to give a reasonably short example of the translation into a QAP. For more detail on converting a program into a QAP, we also recommend the excellent post by Vitalik Buterin.

Pinocchio Protocol

Earlier, we showed that the statement Alice wants to prove to Bob can be translated into an equivalent form in the "language of polynomials", called a Quadratic Arithmetic Program (QAP).

In this section, we describe how Alice can send Bob a fairly short proof that she has a valid assignment set for a QAP. We will use the Pinocchio Protocol developed by Parno, Howell, Gentry and Raykova (Bryan Parno, Jon Howell, Craig Gentry, Mariana Raykova).

As noted above, Alice usually wants to prove that she has a valid assignment set with some additional constraints, for example $c_m = 7$. We ignore this here and, for simplicity, show only how to prove knowledge of some valid assignment set.

If Alice has a valid assignment set, then, defining $L, R, O, P$ as described above, there exists a polynomial $H$ such that $P = H \cdot T$. In particular, for any $s \in F_p$ we have $P(s) = H(s) \cdot T(s)$.

Suppose now that Alice does not have a valid assignment set, but she still defines $L, R, O, P$ from some invalid set $(c_1, \ldots, c_m)$. Then we can be sure that $T$ does not divide $P$. This means that for any polynomial $H$ of degree at most $d$, $P$ and $H \cdot T$ are different polynomials. Note that $P$ and $H \cdot T$ are both of degree at most $2d$.

Now we use the well-known Schwartz-Zippel lemma, which states that two different polynomials of degree at most $2d$ can agree on at most $2d$ points $s \in F_p$. Thus, if $p$ is much larger than $2d$, the probability that $P(s) = H(s) \cdot T(s)$ for a randomly chosen $s \in F_p$ is negligible.
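The bound is easy to observe on a small field. The following Python sketch counts, by brute force, the points where two distinct polynomials of degree at most 4 agree in $F_{101}$. The field size, the degree bound and the fixed random seed are assumptions chosen to keep the experiment tiny and reproducible.

```python
import random

P_MOD = 101   # toy prime for F_p (illustrative assumption)
DEGREE = 4    # plays the role of 2d, with d = 2 as in the example


def poly_eval(poly, x, p=P_MOD):
    """Evaluate a coefficient list (lowest degree first) at x, modulo p."""
    acc = 0
    for coeff in reversed(poly):
        acc = (acc * x + coeff) % p
    return acc


def count_agreements(f, g, p=P_MOD):
    """Count the points of F_p where f and g take the same value."""
    return sum(poly_eval(f, x, p) == poly_eval(g, x, p) for x in range(p))


rng = random.Random(0)  # fixed seed so the run is reproducible
f = [rng.randrange(P_MOD) for _ in range(DEGREE + 1)]
g = [rng.randrange(P_MOD) for _ in range(DEGREE + 1)]
if f == g:
    g[0] = (g[0] + 1) % P_MOD  # force the two polynomials to differ

hits = count_agreements(f, g)
print(hits, "agreements out of", P_MOD, "points; the bound is", DEGREE)
```

With 101 points and at most 4 agreements, a random $s$ catches the difference with probability at least $97/101$; with a cryptographically large $p$ the failure probability becomes negligible.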

Based on this, we can sketch the following protocol for checking whether Alice has a valid assignment set:

Alice chooses polynomials $L, R, O, H$ of degree at most $d$.
Bob chooses a random point $s \in F_p$ and computes the hiding $E(T(s))$.
Alice sends Bob the hidings of all these polynomials evaluated at $s$, namely $E(L(s)), E(R(s)), E(O(s)), E(H(s))$.
Bob checks whether the desired equation holds at $s$. That is, he checks whether $E(L(s) \cdot R(s) - O(s)) = E(T(s) \cdot H(s))$.

Again, if Alice does not have a valid assignment set, she will end up using polynomials for which the desired equation fails for most randomly chosen $s$. Therefore, Bob will reject her answer with high probability over his choice of $s$.
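To see the soundness argument at work, the sketch below runs the four steps "in the clear", that is, with the hiding $E$ and the blind evaluation deliberately left out, so Alice simply evaluates her polynomials at $s$. This simplification, the prime 101 and the hard-coded polynomials (taken from the toy assignment $(2, 3, 16, 6, 7)$ and a broken variant of it) are all assumptions for illustration.

```python
import random

P_MOD = 101  # toy prime for F_p (illustrative assumption)


def poly_eval(poly, x, p=P_MOD):
    """Evaluate a coefficient list (lowest degree first) at x, modulo p."""
    acc = 0
    for coeff in reversed(poly):
        acc = (acc * x + coeff) % p
    return acc


T = [2, -3, 1]  # target polynomial (X - 1)(X - 2)

# Honest Alice: polynomials derived from a valid assignment set.
HONEST = {"L": [-2, 4], "R": [-12, 15], "O": [5, 1], "H": [60]}
# Cheating Alice: O comes from an invalid set, H is her best guess.
CHEAT = {"L": [-2, 4], "R": [-12, 15], "O": [4, 2], "H": [60]}


def alice_respond(polys, s):
    """Alice's answer: each polynomial evaluated at s (no hiding here)."""
    return {name: poly_eval(q, s) for name, q in polys.items()}


def bob_accepts(resp, s):
    """Bob's check L(s) * R(s) - O(s) == T(s) * H(s) in F_p."""
    lhs = (resp["L"] * resp["R"] - resp["O"]) % P_MOD
    rhs = (poly_eval(T, s) * resp["H"]) % P_MOD
    return lhs == rhs


s = random.randrange(P_MOD)
print(bob_accepts(alice_respond(HONEST, s), s))

fooled = [x for x in range(P_MOD) if bob_accepts(alice_respond(CHEAT, x), x)]
print(len(fooled), "of", P_MOD, "possible challenges fool Bob")
```

The honest answer passes for every challenge, while the cheating answer passes only for the few values of $s$ at which the two sides happen to agree, which is why it matters that Alice does not know $s$ in advance.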

Do we have the tools to implement this sketch? A crucial point is that Alice must choose the polynomials she will use without knowing $s$. But this is exactly the problem we solved with the verifiable blind evaluation of polynomials described in the previous article.

Given this, four main points still need to be addressed to turn this sketch into a zk-SNARK. We deal with the first two in this article and the other two in the final article.

Making sure Alice chooses her polynomials according to an assignment set

An important point: if Alice does not have a valid assignment set, it does not mean she cannot find any polynomials $L, R, O, H$ of degree at most $d$ for which $L \cdot R - O = T \cdot H$. It only means she cannot find such polynomials where $L$, $R$ and $O$ were "derived from an assignment set", namely

$L := \sum_{i=1}^{m} c_i \cdot L_i, \quad R := \sum_{i=1}^{m} c_i \cdot R_i, \quad O := \sum_{i=1}^{m} c_i \cdot O_i$

for the same set $(c_1, \ldots, c_m)$.

The protocol above only guarantees that she used some polynomials $L, R, O$ of the right degree, but not that they were produced from an assignment set. The formal proof is somewhat subtle, so we only sketch the solution.

Let's combine the polynomials $L, R, O$ into a single polynomial $F$ as follows:

$F = L + X^{d+1} \cdot R + X^{2(d+1)} \cdot O$

The point of multiplying $R$ by $X^{d+1}$ and $O$ by $X^{2(d+1)}$ is that the coefficients of $L, R, O$ "do not mix" in $F$. The coefficients of $1, X, \ldots, X^d$ in $F$ are exactly the coefficients of $L$; the next $d + 1$ coefficients, those of $X^{d+1}, \ldots, X^{2d+1}$, are exactly the coefficients of $R$; and the last $d + 1$ coefficients are those of $O$.

We combine the polynomials in the definition of the QAP in a similar way, defining for each $i \in \{1, \ldots, m\}$ a polynomial $F_i$ whose first $d + 1$ coefficients are the coefficients of $L_i$, followed by the coefficients of $R_i$ and then those of $O_i$. That is, for each $i \in \{1, \ldots, m\}$ we define the polynomial

$F_i = L_i + X^{d+1} \cdot R_i + X^{2(d+1)} \cdot O_i$

Note that when we sum two of the $F_i$, the $L_i$, $R_i$ and $O_i$ are "summed separately". For example,

$F_1 + F_2 = (L_1 + L_2) + X^{d+1} \cdot (R_1 + R_2) + X^{2(d+1)} \cdot (O_1 + O_2)$

More generally, suppose we have $F = \sum_{i=1}^{m} c_i \cdot F_i$ for some set $(c_1, \ldots, c_m)$. Then we also get

$L = \sum_{i=1}^{m} c_i \cdot L_i, \quad R = \sum_{i=1}^{m} c_i \cdot R_i, \quad O = \sum_{i=1}^{m} c_i \cdot O_i$

for the same coefficients $(c_1, \ldots, c_m)$. In other words, if $F$ is a linear combination of the $F_i$, then $L, R, O$ were indeed produced from an assignment set.
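The "no mixing" property can be checked directly on the example QAP, where $d = 2$ and each block therefore holds $d + 1 = 3$ coefficients. The following Python sketch is an illustration under assumed values (prime 101, assignment $(2, 3, 16, 6, 7)$, helper names of my own choosing).

```python
P_MOD = 101  # toy prime for F_p (illustrative assumption)
D = 2        # degree bound d of the example QAP
WIDTH = D + 1

ONE_AT_1 = [2, -1]   # 2 - X
ONE_AT_2 = [-1, 1]   # X - 1
ZERO = [0]

LEFT = [ONE_AT_1, ZERO, ZERO, ONE_AT_2, ZERO]
RIGHT = [ONE_AT_2, ONE_AT_1, ONE_AT_2, ZERO, ZERO]
OUT = [ZERO, ZERO, ZERO, ONE_AT_1, ONE_AT_2]


def pad(poly, width=WIDTH):
    """Pad a coefficient list with zeros up to the given width."""
    return list(poly) + [0] * (width - len(poly))


def pack(l, r, o):
    """Coefficients of l + X^(d+1) * r + X^(2(d+1)) * o."""
    return pad(l) + pad(r) + pad(o)


def combine(c, polys, p=P_MOD):
    """Return sum_i c_i * polys[i] as a coefficient list modulo p."""
    out = [0] * max(len(q) for q in polys)
    for c_i, q in zip(c, polys):
        for k, a in enumerate(q):
            out[k] = (out[k] + c_i * a) % p
    return out


FS = [pack(l, r, o) for l, r, o in zip(LEFT, RIGHT, OUT)]
c = (2, 3, 16, 6, 7)

F = combine(c, FS)
L, R, O = (pad(combine(c, qs)) for qs in (LEFT, RIGHT, OUT))

# Each block of d + 1 coefficients of F is one of L, R, O.
print(F[:WIDTH] == L, F[WIDTH:2 * WIDTH] == R, F[2 * WIDTH:] == O)
```

Because the three blocks never overlap, a single linear combination of the $F_i$ pins down $L$, $R$ and $O$ with the same coefficients $c_i$.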

Therefore, Bob will ask Alice to prove to him that $F$ is a linear combination of the $F_i$. This is done in the same way as in the protocol for verifiable evaluation:

Bob chooses a random $\beta \in F_p^*$ and sends Alice the hidings $E(\beta \cdot F_1(s)), \ldots, E(\beta \cdot F_m(s))$. He then asks Alice to send him the hiding $E(\beta \cdot F(s))$. If she succeeds, the extended version of the Knowledge of Coefficient Assumption implies that she knows how to write $F$ as a linear combination of the $F_i$.
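A toy version of this check can be written with the hiding $E(x) = g^x$ in a small multiplicative group. Everything concrete in the sketch is an assumption: the group (modulus 607, generator 64 of order 101) is far too small to be secure, and the details that Bob also sends the plain hidings $E(F_i(s))$ and verifies by raising Alice's first value to the power $\beta$ are modeled on the knowledge of coefficient test from the earlier part rather than spelled out in this article.

```python
import random

P_MOD = 101   # toy field size p (illustrative assumption)
Q_MOD = 607   # prime with Q_MOD - 1 = 6 * P_MOD
G = 64        # 2**6 mod 607, an element of order 101 (insecure toy group)
WIDTH = 3     # d + 1 for the example QAP with d = 2


def hide(x):
    """Toy homomorphic hiding E(x) = G^x mod Q_MOD."""
    return pow(G, x % P_MOD, Q_MOD)


def poly_eval(poly, x, p=P_MOD):
    """Evaluate a coefficient list (lowest degree first) at x, modulo p."""
    acc = 0
    for coeff in reversed(poly):
        acc = (acc * x + coeff) % p
    return acc


def pack(l, r, o):
    """Coefficients of l + X^(d+1) * r + X^(2(d+1)) * o."""
    return [*l, *[0] * (WIDTH - len(l)), *r, *[0] * (WIDTH - len(r)), *o]


A1, A2, Z = [2, -1], [-1, 1], [0]  # 2 - X, X - 1 and the zero polynomial
FS = [pack(A1, A2, Z), pack(Z, A1, Z), pack(Z, A2, Z),
      pack(A2, Z, A1), pack(Z, Z, A2)]


def bob_setup(s, beta):
    """Bob hides F_i(s) and beta * F_i(s) for every i."""
    plain = [hide(poly_eval(f, s)) for f in FS]
    shifted = [hide(beta * poly_eval(f, s)) for f in FS]
    return plain, shifted


def alice_respond(c, plain, shifted):
    """Alice combines the hidings with her coefficients c_i."""
    a = b = 1
    for c_i, e_i, eb_i in zip(c, plain, shifted):
        a = a * pow(e_i, c_i, Q_MOD) % Q_MOD    # E(F(s))
        b = b * pow(eb_i, c_i, Q_MOD) % Q_MOD   # E(beta * F(s))
    return a, b


def bob_check(a, b, beta):
    """Accept if b is the hiding of beta times the value hidden in a."""
    return pow(a, beta, Q_MOD) == b


s = random.randrange(P_MOD)
beta = random.randrange(1, P_MOD)
plain, shifted = bob_setup(s, beta)
a, b = alice_respond((2, 3, 16, 6, 7), plain, shifted)
print(bob_check(a, b, beta))
```

Alice never sees $s$ or $\beta$ here; she only combines the hidings she received, which is what ties her answer to a linear combination of the $F_i$.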

Adding the "zero-knowledge" part: concealing the assignment set

In a zk-SNARK, Alice wants to conceal all information about her assignment set. However, the hidings $E(L(s)), E(R(s)), E(O(s)), E(H(s))$ do reveal some information about the assignment set.

For example, given some other satisfying assignment set $(c'_1, \ldots, c'_m)$, Bob can compute the corresponding $L', R', O', H'$ and the hidings $E(L'(s)), E(R'(s)), E(O'(s)), E(H'(s))$. If they differ from Alice's answer, he learns that $(c'_1, \ldots, c'_m)$ is not Alice's assignment set.

To avoid leaking such information about her assignment set, Alice conceals it by adding a "random $T$-shift" to each polynomial. That is, she chooses random $\delta_1, \delta_2, \delta_3 \in F_p^*$ and defines

$L_z := L + \delta_1 \cdot T, \quad R_z := R + \delta_2 \cdot T, \quad O_z := O + \delta_3 \cdot T$

Suppose that $L, R, O$ were obtained from a satisfying assignment set. Then $L \cdot R - O = T \cdot H$ for some polynomial $H$. Since we only added a multiple of $T$ everywhere, $T$ also divides $L_z \cdot R_z - O_z$. Let's verify this:

$L_z \cdot R_z - O_z = (L + \delta_1 \cdot T)(R + \delta_2 \cdot T) - (O + \delta_3 \cdot T)$

$= (L \cdot R - O) + L \cdot \delta_2 \cdot T + \delta_1 \cdot T \cdot R + \delta_1 \delta_2 \cdot T^2 - \delta_3 \cdot T$

$= T \cdot (H + L \cdot \delta_2 + \delta_1 \cdot R + \delta_1 \delta_2 \cdot T - \delta_3)$

Thus, defining $H_z := H + L \cdot \delta_2 + \delta_1 \cdot R + \delta_1 \delta_2 \cdot T - \delta_3$, we get $L_z \cdot R_z - O_z = T \cdot H_z$. Therefore, if Alice uses the polynomials $L_z, R_z, O_z, H_z$ instead of $L, R, O, H$, Bob will always accept her answer.

On the other hand, these polynomials evaluated at a point $s \in F_p$ with $T(s) \neq 0$ (which holds for all but at most $d$ values of $s$) reveal no information about the assignment set. For example, since $T(s)$ is non-zero and $\delta_1$ is random, $\delta_1 \cdot T(s)$ is a random value, so $L_z(s) = L(s) + \delta_1 \cdot T(s)$ reveals no information about $L(s)$, because it is "masked" by this random value.
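The identity $L_z \cdot R_z - O_z = T \cdot H_z$ can also be confirmed numerically. The sketch below applies random shifts to the polynomials of the toy assignment $(2, 3, 16, 6, 7)$ over $F_{101}$ and compares both sides coefficient by coefficient. The field, the polynomials and the helper names are assumptions for illustration.

```python
import random

P_MOD = 101  # toy prime for F_p (illustrative assumption)


def poly_add(a, b, p=P_MOD):
    """Add two coefficient lists (lowest degree first) modulo p."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [(x + y) % p for x, y in zip(a, b)]


def poly_scale(a, k, p=P_MOD):
    """Multiply every coefficient by the scalar k modulo p."""
    return [(k * x) % p for x in a]


def poly_mul(a, b, p=P_MOD):
    """Multiply two coefficient lists modulo p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out


def trim(a, p=P_MOD):
    """Reduce modulo p and drop trailing zero coefficients."""
    a = [x % p for x in a]
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a


T = [2, -3, 1]                                   # (X - 1)(X - 2)
L, R, O, H = [-2, 4], [-12, 15], [5, 1], [60]    # satisfy L*R - O = T*H


def shift(d1, d2, d3):
    """Return the shifted polynomials L_z, R_z, O_z, H_z."""
    lz = poly_add(L, poly_scale(T, d1))
    rz = poly_add(R, poly_scale(T, d2))
    oz = poly_add(O, poly_scale(T, d3))
    hz = poly_add(H, poly_scale(L, d2))
    hz = poly_add(hz, poly_scale(R, d1))
    hz = poly_add(hz, poly_scale(T, d1 * d2))
    hz = poly_add(hz, [-d3])
    return lz, rz, oz, hz


def identity_holds(lz, rz, oz, hz):
    """Check L_z * R_z - O_z == T * H_z as polynomials over F_p."""
    lhs = poly_add(poly_mul(lz, rz), poly_scale(oz, -1))
    return trim(lhs) == trim(poly_mul(T, hz))


deltas = [random.randrange(1, P_MOD) for _ in range(3)]
print(identity_holds(*shift(*deltas)))
```

Whatever non-zero shifts are drawn, the check succeeds, so Bob's test still passes while the values Alice reveals at $s$ are masked by the random multiples of $T(s)$.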

What is left for the final part?

We sketched the Pinocchio protocol, in which Alice can convince Bob that she has a valid assignment set for a QAP without revealing information about that set. But two important issues still need to be addressed to obtain a zk-SNARK:

In the sketch, Bob needs a homomorphic hiding that "supports multiplication". For example, he needs to compute the hiding $E(H(s) \cdot T(s))$ from $E(H(s))$ and $E(T(s))$. However, so far we have not seen an example of a homomorphic hiding that allows this. We have only seen a homomorphic hiding that supports addition and linear combinations.
So far we have also discussed only interactive protocols between Alice and Bob. However, our ultimate goal is to let Alice send a non-interactive proof in a single message. Such a proof is publicly verifiable: anyone who sees this single-message proof can be convinced of its validity, not just Bob (who previously interacted with Alice).

Both of these issues can be solved using pairings of elliptic curves, which we will discuss in the final part.

Next article: Explaining SNARKs. Pairings of elliptic curves (translation)

Source: https://habr.com/ru/post/343954/
