Anti-Harassment Tools/SecurePoll Improvements/Test Results/Precision

Comments

edit

Background

edit

There are various calculations that get done when tallying an STV election. All of these could be affected by arithmetic precision bugs.

Due to the limited precision of any computer arithmetic, we cannot guarantee our results will always be precise. I have been trying to test the sort of precision errors we can possibly expect, and whether they are acceptable.

Caveat: I am not a numerical analyst, so I am not an expert in doing this sort of analysis.

Calculations/possible sources of numerical error (not exhaustive):

  • Droop quota
  • Keep factors for each candidate
  • Vote distribution
  • Surplus
  • Total votes

Vote distribution

edit

See "Issue 3" in T289347#7301407.

In STV, each vote is divided up and given to candidates based on a formula. For example, if a vote lists preferences as

  1. Candidate A
  2. Candidate B
  3. Candidate C

if Candidate A has reached the quota and been elected then they would only receive a proportion x of the vote and Candidate B would receive the remainder (1 - x). If Candidate B also reached the quota they would receive a proportion of the remainder ((1 - x) * y) and Candidate C would receive the remainder ((1 - x) * (1 - y)).

Forward error analysis

edit

I compared the vote distribution computed by SecurePoll with the "expected" vote distribution (calculated using bc, an arbitrary precision calculator).

There were at least two factors to consider:

  1. the number of candidates a single vote is divided between
  2. the number of proportional votes a candidate receives (particularly if those proportions are small)

a few elections were generated to cover some of these factors.

Absolute difference between actual and expected
Case Earned that round Total votes
Candidate receives 1 vote divided between 9 candidates (blt, results) 2.51938604067E-22 2.51938604067E-22
Candidate receives a small proportion (< 0.01) of 8 votes (blt, results) 2.9283E-14 1.0680E-14
Candidate receives 31 votes, each divided between 9 candidates (blt, results) 2.4389485E-16 7.275280E-14

Internal consistency

edit

In each round we calculate each candidates previous rounds votes, votes earned in that round and total votes.

The precision of votes earned that round and total votes is not always the same. For example, even though in this round the votes earned is greater than 0, the difference between the previous rounds votes and total votes is 0.

By subtracting the previous rounds votes from the total votes and comparing to the votes earned that round, this give us a particular measure of precision (exactly the significance of this measure, I don't know).

Based on the large amounts of test data I had locally, the biggest difference I found was 1.35173650051E-10 (this result from this election).

Factors effecting precision

edit

Order of operands

edit

A candidate's vote distribution is calculated like: 11 + 5 + 3.980676212560387 + ...

The order the operands appear in the calculation can affect the precision of the answer.

For example, to 16 decimal places in PHP:

0.980676212560387 + 0.980676212560387 + 100 = 101.9613524251207792
100 + 0.980676212560387 + 0.980676212560387 = 101.9613524251207650

The order of operands can vary from candidate to candidate.

For example, in this election, candidates 1 and 3 have slightly different "earned" votes in round 3 (see here), even though this number should be a the same (both candidates have exactly the same number of votes, including 2nd preferences). I suspect (although I cannot know for sure) that this is due to different orders of operands. The difference does not impact the candidates "total" votes, nor the election outcome.

I don't know if we can predict the effect, nor algorithmically decide which order of operands will give the "best" results.

Misc

edit

The total votes are also calculated at the same time, which could affect the droop quota.