Anti-Harassment Tools/SecurePoll Improvements/Test Results/Precision

Comments

Background

There are various calculations that get done when tallying an STV election. All of these could be affected by arithmetic precision bugs.

Due to the limited precision of any computer arithmetic, we cannot guarantee our results will always be precise. I have been trying to test the sort of precision errors we can possibly expect, and whether they are acceptable.

Caveat: I am not a numerical analyst, so I am not an expert in doing this sort of analysis.

Calculations/possible sources of numerical error (not exhaustive):

Droop quota
Keep factors for each candidate
Vote distribution
Surplus
Total votes

Vote distribution

See "Issue 3" in T289347#7301407.

In STV, each vote is divided up and given to candidates based on a formula. For example, if a vote lists preferences as

Candidate A
Candidate B
Candidate C

if Candidate A has reached the quota and been elected then they would only receive a proportion x of the vote and Candidate B would receive the remainder (1 - x). If Candidate B also reached the quota they would receive a proportion of the remainder ((1 - x) * y) and Candidate C would receive the remainder ((1 - x) * (1 - y)).

Forward error analysis

I compared the vote distribution computed by SecurePoll with the "expected" vote distribution (calculated using bc, an arbitrary precision calculator).

There were at least two factors to consider:

the number of candidates a single vote is divided between
the number of proportional votes a candidate receives (particularly if those proportions are small)

a few elections were generated to cover some of these factors.

	Absolute difference between actual and expected
Case	Earned that round	Total votes
Candidate receives 1 vote divided between 9 candidates (blt, results)	2.51938604067E-22	2.51938604067E-22
Candidate receives a small proportion (< 0.01) of 8 votes (blt, results)	2.9283E-14	1.0680E-14
Candidate receives 31 votes, each divided between 9 candidates (blt, results)	2.4389485E-16	7.275280E-14

Internal consistency

In each round we calculate each candidates previous rounds votes, votes earned in that round and total votes.

The precision of votes earned that round and total votes is not always the same. For example, even though in this round the votes earned is greater than 0, the difference between the previous rounds votes and total votes is 0.

By subtracting the previous rounds votes from the total votes and comparing to the votes earned that round, this give us a particular measure of precision (exactly the significance of this measure, I don't know).

Based on the large amounts of test data I had locally, the biggest difference I found was 1.35173650051E-10 (this result from this election).

Factors effecting precision

Order of operands

A candidate's vote distribution is calculated like: 11 + 5 + 3.980676212560387 + ...

The order the operands appear in the calculation can affect the precision of the answer.

For example, to 16 decimal places in PHP:

0.980676212560387 + 0.980676212560387 + 100 = 101.9613524251207792
100 + 0.980676212560387 + 0.980676212560387 = 101.9613524251207650

The order of operands can vary from candidate to candidate.

For example, in this election, candidates 1 and 3 have slightly different "earned" votes in round 3 (see here), even though this number should be a the same (both candidates have exactly the same number of votes, including 2nd preferences). I suspect (although I cannot know for sure) that this is due to different orders of operands. The difference does not impact the candidates "total" votes, nor the election outcome.

I don't know if we can predict the effect, nor algorithmically decide which order of operands will give the "best" results.

Misc

The total votes are also calculated at the same time, which could affect the droop quota.