2 min

A Trick for Making Voting Anonymous

Published on January 11, 2021

This post provides a short, very elementary teaser on Secure Multiparty Computation. If you are not already familiar with this area of work, I hope this whets your appetite for a family of algebraic techniques and application possibilities that arise. For example, you may not appreciate that it is possible to determine the statistical value of exogenous data (say by computing Granger causality) without learning anything about it. You can also use that data in a privacy preserving manner. 

The Scenario

As my mind tries in vein to digest recent events, I struggle to imagine a sizable fraction of Congress, including senior members of the administration, under siege. There are so many angles to this that will be explored in years to come, I am sure. One tiny corner of that analysis might include group decision making subject to certain procedures. This gives rise to a problem we can cleanly formulate - one that might be useful elsewhere. And if nothing good comes out of this, at least we can learn another application of algebra. 

Let us suppose the unfortunate participants have a simple but urgent task before them. They must compute whether the "sum of their opinions" exceeds a threshold. If it does, we imagine immediate collective action might follow. It not, none may. For simplicity in this dire thought experiment - if only it were that - we will assume that each individual's internal compass has moved into a binary state. Like a quantum wave function collapsing after an observation has been made, each person is a zero or a one. The group must aggregate these. 

The dilemma arises, we further assume, because many of the participants would prefer that their number never be revealed. They may have a shared interest in determining the sum, if that can be arranged, but would prefer that no other information is ever revealed to anyone. A private ballot administered with scrunched up pieces of paper is suggested, but the notion leaves several members uncomfortable. What if the hand-written notes were to be made public or fall into the hands of those rampaging outside? Hand recognition might be used at a later date to reveal who voted which way. 

There is another issue arising due to the relationships between these people - relationships we shall describe as "complex". Some may wish to see the sum exceed the threshold, but may also see opportunity in appearing as a zero, should the aggregation of votes fall short.

Furthermore, the task before them - while simple in principle - is complicated by the fact that there is no single person amongst the community who is viewed by every other person as entirely trusted to oversee a vote and keep votes anonymous thereafter.

There is a final challenge. Even if there were such a trusted person in their midst, it is necessary to perform this important vote in as transparent a manner as possible. That sounds, of course, like a conflicting objective. But we shall proceed. 

An Approach

To formalize the problem, we shall let \(x_i \in \{0,1\}\) denote the position taken by the \(i\) member of congress and we wish to determine \(s = \sum_{i=1}^n x_i\).

We choose a prime number \(p>n\) and in what follows we will assume members of congress are capable of generating random numbers modulo \(p\) on demand. They use this ability, and their ability to subtract, to arrive at two numbers, called secrets, which are uniformly chosen. The \(k\)'th congressperson will scratch out these two numbers on two separate pieces of paper. We will refer to them as \(r_{k,1}\) and \(r_{k,2}\) and, by construction, they satisfy: \begin{equation} x_k = r_{k,1} + r_{k,2} \mod p \end{equation}

We view this as a splitting of the private information \(x_k\) into secrets, any one of which is considered mostly harmless in isolation. For instance, if \(r_{k,1}\) were to be revealed to the press, this would say nothing at all about the value of \(x_k\). Statisticians might say that the Bayesian posterior of \(x_k\) conditioned on the information \(r_{kj}\) is uniform, for any choice of \(j\).

Next, the huddled leaders will arrange themselves in a circle. They will then pass one of their secrets to the person on their left. This person will be forever in possession of this piece of paper and know the source, but as noted, that in and of itself conveys no information.

Then, each member will add the secret number they received from the person on their right to the secret they retained. They will announce this sum to all present, allowing any and all who wish to compute the sum of all announced numbers modulo \(p\) - which we will call \(s'\). It remains to notice that this announced sum is precisely the vote we need, since: \begin{equation} s' = \sum_{k} \overbrace{ r_{k,1} + r_{k+1,2} }^{announced\ by\ k'th\ member} \mod p = \sum_j \overbrace{ r_{j1} + r_{j,2} }^{k'th\ vote} \mod p = s \end{equation} by rearrangement - which is to say that nobody lost any pieces of paper. Notice that by construction, the sum of the secrets is the number \(s\) we want at the outset, and all secrets are accounted for.   

Limitations and Extensions

The approach above might be the best one can hope for under extreme time pressure. However, there are some problems with it. It will be noted that if the person to your left and the person to your right were to conspire, they will be able to determine the value of your secret number. If we assume indexing in a clockwise direction, the \(k\)'th person in the circle receives \(r_{k+1,1}\) without loss of generality, and gives away \(r_{k,1}\). The secret \(r_{k,2}\) is held back, but since \(r_{k,2}+r_{k+1,1}\) is announced, \(r_{k,2}\) is, in effect, also given away. Another way to express this leakage is via: \begin{equation} r_{k,1} + \overbrace{r_{k,2}+r_{k+1,1}}^{announced} - r_{k+1,1}  = r_{k,1} + r_{k,2}  = x_k \mod p \end{equation} where it is clear that all terms on the left hand side are either public information, or known to the conspiracy.

Perhaps it is apparent to the reader that the scheme can be improved by breaking down each person's vote into a larger number of secrets - as many as there are people present. That would require more work, of course, and it might be argued that some additional security has been added to the voting while at the same time, the process is transparent. 

There may be a downside as well, related to a second shortcoming in the scheme - the system might be manipulated. There is no guaranteeing that any given member will choose secrets that reveal their true intent, and they might choose the opposite vote in order to try to manipulate the result. 

The field of Secure Multiparty Computation is made interesting, and non-trivial, by the need to overcome shortcomings such as this one. And yet we are already on our way towards some potentially transformational tricks. The core idea is that data can be masked by random noise that is mostly "white" yet somehow entangled with other noise. A series of rounds of private computation, followed by public communication, can be arranged to effect a multiparty computation. 

Splitting information into multiple pieces can also be facilitated by algebra over fields - just in case you thought that was abstract nonsense with no application. For example, you may recall that if \((x_1,y_1),\dots,(x_m,y_m)\) are arbitrary points in the plane, with the \(x\) distinct, then there exists a unique polynomial \(f\) of degree \(m-1\) satisfying \(f(x)=y\). This is Lagrange's Interpolation Theorem. 

Of course there is a desire to compute functions of privately held data that are much more complicated than addition modulo \(p\). But as you can read in the Secure Multiparty Computation textbooks, it isn't too hard to arrange this. One starts by reducing multiplication down to a slightly more general version of the protocol I have already described. The basic idea is that if two numbers, \(x_1 = \sum_{j=1}^m r_{1,j}\) and \(x_2 = \sum_{j}^m r_{2,j}\) have been split into secrets, and those secrets distributed, it may be the case that every term in their expansion is known to one or more people. \begin{equation} x_1 x_2 = \left( \sum_{j=1}^m r_{1,j}\right)\left( \sum_{k}^m r_{2,k} \right) = \sum_{j,k} \overbrace{ r_{1,j} r_{2,k} }^{announced} \end{equation} Thus as with the addition example, the group may be able to compute the product. Depending on the choice of \(m\) and sharing procedure, possible attacks need to be considered, of course.

Multiplication is important because as a special case (multiplying binary numbers) this gives rise to secure match-making. For instance, a bid and an offer might be determined to constitute a trade without, in the absence of a match, either side revealing any intent. More generally, matrix computations are possible and with that, Machine Learning algorithms including gradient descent. The possibilities are quite astonishing, so reach out if you are interested in applications to trading, or wish to contribute to open source work down these lines.

Further Reading

I used notation from Secure Multiparty Computation, by Cramer, Damgard and Nielsen (Cambridge University Press) so that you can seamlessly progress to more complicated examples there, should you wish to do so.