Leaderboard Manifesto

This is my vision of how an accurate leaderboard works, and of what I’ve been trying to build with my RTA and IL Leaderboard Trackers for SMS.

1 | Remit

What should the leaderboard track?

1a | Ontology

Let’s figure out what we’re tryna make in the first place – what is a leaderboard and what is a run?

Speedruns are achievements, i.e. notable events that happened. We can classify each speedrun based on whether it fits a definition of a category we’re tracking – gameplay requirements, hardware requirements, etc.

Aside: Native Timing

If a run doesn’t fit a requirement for a technical reason but still represents the same achievement, we may decide it is fairer to include it, but adjust its timing so that it doesn’t gain an advantage over the clear majority of runs that fit all requirements. That’s the idea behind pessimistically retiming a run from non-native timing to native timing, which might be done e.g. for an IL lacking an in-game timer because it was performed in a full-game run, or for runs on hardware that unintentionally gave them an advantage.

We have a ranking criterion, some form of the time a speedrun took, which gives an ordering on all speedruns that fit our criteria. For every unique person in the world who has performed a speedrun that fits, we want to find their best-ranking one, and the leaderboard then is a ranked list of all of those.
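The definition above is simple enough to state as code. What follows is a minimal Python sketch, assuming the runs are already known and each carries a runner identifier, a time (lower is better) and whatever fields the category’s requirements inspect. The `Run` fields, the `hardware` requirement and the `leaderboard` function are hypothetical illustrations, not part of any existing tracker.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Run:
    runner: str    # identifies the person who performed the run (hypothetical field)
    time: float    # ranking criterion, e.g. seconds; lower ranks higher
    hardware: str  # example of a requirement a category might impose (hypothetical)


def leaderboard(runs: Iterable[Run], fits: Callable[[Run], bool]) -> list[Run]:
    """Rank each runner's best run among the runs that fit the category."""
    best: dict[str, Run] = {}
    for run in runs:
        if not fits(run):
            continue  # outside the category's definition
        current = best.get(run.runner)
        if current is None or run.time < current.time:
            best[run.runner] = run
    # One entry per unique person, ordered by the ranking criterion.
    return sorted(best.values(), key=lambda r: r.time)


runs = [
    Run("A", 100.0, "console"),
    Run("A", 95.0, "console"),
    Run("B", 90.0, "emulator"),
    Run("B", 97.0, "console"),
    Run("C", 99.0, "console"),
]
board = leaderboard(runs, fits=lambda r: r.hardware == "console")
print([(r.runner, r.time) for r in board])  # [('A', 95.0), ('B', 97.0), ('C', 99.0)]
```

B’s faster emulator run is excluded by the category definition itself, not by any judgement about B. Each person appears once, with eir best fitting run.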

That’s really it; I think most people, if they listen to their intuitions, are looking for this definition. And if they go to a leaderboard, they expect to find out this information, without having to understand leaderboard politics. That’s why I find the current norm of deliberately removing valid top runs so unacceptable. The intuitive expectations of most users render this action disinformation.

1b | Objectivity

Postmodern Leaderboards

Counterargument: what counts as a valid run is what the community considers acceptable, and that takes into account a runner’s behaviour.

Besides failing to give most users (who don’t know the runner well) what they’re looking for from the leaderboard, I find this POV rather dystopian. It’s decided via informal processes of cancelling and lynching, which are usually popularity contests. Both the circumstances under which behaviour is judged and the ethical beliefs surrounding it are slippery and change as time passes and history unfolds. Don’t forget that the leaderboard now also serves as an archive for future generations of players and researchers. These kinds of subjective judgements add noise, not least by polluting the definition of a valid speedrun with things unrelated to the run itself.

Victim Protection

Beyond the above, what if the leaderboard mods recognise a case of abuse and want to erase data pertaining to the abuser to protect the victim? Firstly, it’s important to recognise that inaccurate leaderboards can co-exist with accurate ones, and this manifesto doesn’t concern those. An accurate leaderboard is purely data, so has no realistic notion of “platforming” or “celebrating” individuals. Where a leaderboard takes on extra personal significance to someone owing to its prominence (e.g. speedrun.com), an ethical balance has to be struck between what it is reasonable to do to help a victim and the trust users place in the leaderboard to give correct information. My general view is that in the long run, redacting abusers is counterproductive to getting over victimhood, yet incurs a large cost in disinformation. Anonymisation of names is a reasonable short-term compromise. However, I think an accurate leaderboard has a perennial role to play in the development of history, and so should always be available un-anonymised (I’ll revisit this in section 3, on data protection).

Official Leaderboards

It’s also important to realise that a leaderboard is a public good, and so has no special relationship to its maintainers or to whatever is considered the “community” surrounding it. A public archive like this can be maintained independently by several people, and no single instance is special, other than that it might have a head start by being tied to a market-leading platform such as speedrun.com. In practice, leaderboards compete on how accurately they can answer the questions people want answered. I set out above what I believe those questions are, and time will tell whether leaderboards built around them become more useful to current and future players and historians alike.

1c | Verifiability

Having specified the truth that we want to know, we have to confront the fact that we will never be fully certain of it. Most obviously, speedruns that we want to track can happen without ever being observed by leaderboard moderators. I’ll pick this topic up in section 2, but first, a concept central to the remit of the leaderboard is that of a verifiable run.

Leaderboards often have verification criteria that are used as blunt tools to assess whether a claimed run is valid, or whether it happened at all. A standard assumption is that a run without video didn’t happen. That assumption may not always hold, and so might make the leaderboard less accurate, but it reduces the quandary of dealing with hearsay to something practical for moderators.

However, what happens when a run is verifiable but isn’t verified by a leaderboard? This can result from policy, for example banning a runner, or not dealing in runs that runners don’t submit themselves. Under the remit described in section 1a, such a run should be tracked. It may never even be verified (to the standard of a verified run), but the fact that it is believed to be verifiable makes it the business of the leaderboard. This avoids situations where mods don’t engage with a run for reasons unrelated to its validity.

Prior Verification
The historical practice of tracking speedruns (or anything else) relies on primary and secondary sources. One important secondary source is other leaderboards, especially past leaderboards that function well as archives, but also contemporary leaderboards. We can take a run being verified by another leaderboard as good evidence of its validity, and different leaderboards can mutually benefit each other in this way.

2 | Practice

How should the leaderboard track this?

2a | Principle of Best Knowledge

When confronted with the fact that our knowledge will always be incomplete, and there will always be valid runs we don’t know about, and ones we think are valid but actually aren’t, people often have the instinct to give up and treat a leaderboard as meaningless. But this is out-of-step with most human endeavours, from science to history, where the goal is really to try your best, and the result of that is worthwhile in itself.

The principle here is that the leaderboard is a sum of decisions made on the validity of all of the speedruns that may fit the criteria of its definition, and each decision is made with the best knowledge available at the time. Whatever effort is available from contributors for research and investigations of runs, we don’t question it but rather make do with what we have and accept it will be reflected in the quality of our leaderboard.

2b | Awareness

If we become aware of a new fact that affects the leaderboard, say we discover an eligible run, then that change should ideally be made, and that’s what drives the progress of the leaderboard. We’re again limited by contributor effort, and have to take care to avoid bias from caring more about some people than others. But ideally, there are contributors who monitor available resources for new verifiable PBs and re-examine previously accepted PBs, making those changes as they arise. My leaderboard trackers are so called because they automatically derive updates by monitoring the most popular leaderboards, which most runners actively submit to.
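As an illustration of that kind of automatic update step, here is a minimal Python sketch, not the trackers’ actual code. It assumes each monitored leaderboard can be reduced to a mapping from runner to best verified time, and it treats verification by any of them as sufficient evidence (the prior-verification idea from section 1c). `merge_sources` and `update` are hypothetical names.

```python
from typing import Mapping

# Hypothetical model: each monitored leaderboard reduced to {runner: best verified time}.
Snapshot = Mapping[str, float]


def merge_sources(sources: list[Snapshot]) -> dict[str, float]:
    """Best verified time per runner across all monitored leaderboards."""
    merged: dict[str, float] = {}
    for source in sources:
        for runner, time in source.items():
            if runner not in merged or time < merged[runner]:
                merged[runner] = time
    return merged


def update(tracked: dict[str, float],
           sources: list[Snapshot]) -> tuple[dict[str, float], list[str]]:
    """Apply improvements automatically; flag anything that would remove or worsen a time."""
    merged = merge_sources(sources)
    board = dict(tracked)  # start from what is already known
    notes: list[str] = []
    for runner, time in sorted(merged.items()):
        old = board.get(runner)
        if old is None:
            notes.append(f"new: {runner} {time}")
            board[runner] = time
        elif time < old:
            notes.append(f"improved: {runner} {old} -> {time}")
            board[runner] = time
        elif time > old:
            # A slower listed time may mean the old PB was rejected elsewhere: a human decides.
            notes.append(f"review: {runner} listed at {time}, tracked at {old}")
    for runner in sorted(tracked.keys() - merged.keys()):
        # Disappearing from a source is not, by itself, evidence about the run.
        notes.append(f"review: {runner} no longer listed by any source")
    return board, notes


tracked = {"A": 95.0, "B": 97.0}
sources = [{"A": 93.0, "C": 99.0}, {"A": 94.0}]
board, notes = update(tracked, sources)
print(board)  # {'A': 93.0, 'B': 97.0, 'C': 99.0}
print(notes)  # ['improved: A 95.0 -> 93.0', 'new: C 99.0', 'review: B no longer listed by any source']
```

The design choice worth noting is that removals and regressions are never applied automatically. They only produce a note for a moderator, which keeps the tracked board a record of evidence about runs rather than a mirror of other leaderboards’ policies.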

2c | Submission

This recasts a player’s submission from a necessary step in eir run’s publication to an optional, structured way of making the leaderboard’s maintainers aware that the run happened. A player still has control over whether eir run appears on the leaderboard, but that control becomes the same as control over any public information, for example hiding or never sharing a video. In this way, assessing whether the run is valid becomes a democratised and public process, and claiming an unverifiable run, or removing documentation of a world record, is usually condemned socially.

2d | Awareness/Effort Tradeoff

Beware of ego: a player’s refusal to submit, or violation of social standards, does not affect the objective evidence available around a run, which is all that determines how the leaderboard handles it. This applies even when it creates unnecessary work for the moderators. For example, it’s not valid to refuse to list a run because the runner refused to submit a video and the moderator would have to archive a video emself out of a huge past broadcast, unless the moderator is unwilling to do this work regardless of the player in question. As a test, consider how you would treat the same run by a prominent player who proudly refuses to submit a video after declaring emself “above the rules”, vs an unknown player who hasn’t submitted video and whom you can’t get in contact with.

For a mod, this also creates a conflict between acting on everything one becomes aware of and the effort one wants to put into the leaderboard. As an obvious extreme, a mod is unlikely to want to track every IL obtained in practice for a full-game run by a player who doesn’t submit ILs at all. This is ameliorated by having a larger and better-functioning mod team. In any case, it’s worth agreeing standards of leaderboard accuracy vs reasonable work, and strengthening them as more labour becomes available, while taking care to avoid bias towards specific players. The better the community functions as a collective, the more willing players will be to uphold the standard of the leaderboard from their end, whether thru a more collectivist mentality or peer pressure.

3 | Data Protection

Does this conflict with personal rights to personal data?

3a | Archival Ethics

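There are two broad ethical principles in society that conflict with each other. Leaderboards (and human history in general) are affected by this conflict, so they have to take their cues from social norms (like laws) on how to reconcile the two. They are:
– archiving, i.e. the collective good of keeping an accurate historical record; and
– data protection, i.e. an individual’s right to control eir personal data, including the right to be forgotten.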

While it would be strange for someone to request having eir name etched out of the Wimbledon Championships’ engraved champion history (well, more comparably, the Wikipedia article), such requests are unfortunately commonplace in speedrunning and will continue to be until the sport matures.

It is entirely arbitrary whether society favours archival or data protection, but the current legal standards are (fortunately for a leaderboard) largely on the side of archiving. I agree with this stance, in that I value the collective good of accurate history over an individual’s right to be forgotten.

I will cite some principles from UK and EU law as a framework for this discussion. Note that speedrunning is niche enuff to be effectively outside the law, so this should be considered more a comparison to general social practice than a requirement of a leaderboard. The gist of the archiving/data-protection tradeoff can be summarised as follows:

Under data protection law individuals have the right to have personal data erased. This is also known as the ‘right to be forgotten’. The right is not absolute and only applies in certain circumstances. Importantly, the right to erasure does not apply if processing is necessary for archiving purposes in the public interest, where erasure is likely to render impossible or seriously impair the achievement of that processing*. Decisions as to whether the exemption applies should be taken on a case-by-case basis, but given that the purpose of an archive service is to ensure the integrity and authenticity of archived records and future analysis would be affected by the removal of data, erasure may often be likely to seriously impair the processing.

— UK National Archives, Guide to archiving personal data, guideline #29 (first half)

And to expand the citation marked *:

Paragraphs 1 and 2 [“The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her…”] shall not apply to the extent that processing is necessary:
(d) for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) [safeguarding; more on this later] in so far as the right referred to in paragraph 1 is likely to render impossible or seriously impair the achievement of the objectives of that processing

– EU Law, GDPR, article 17(3)(d)

It is clear that this favours archiving, but it also places great responsibility on the data processors (the leaderboard moderators) to justify the grounds of the processing. That takes us into three nuances: notability, safeguarding and copyright.

3c | Notability

The usefulness of an archive is measured against what information its users want to learn from it.

Counterargument: Speedrunning is not notable/serious, *some reference to children’s game*.

The notability of the sport is established by the time that participants and viewers put into engaging with it. The basis for archiving speedrunning results is clear to me.

Counterargument: It’s not notable if it isn’t a top time.

This is somewhat true; if you take a long-run view of any sport, then the achievements of the average person doing it for fun aren’t tracked by anyone. This data is less notable, but also less reliable, so an accurate leaderboard naturally develops a cutoff where only times better than it are tracked.

Aside: Rank Aggregation

Care has to be taken with systems based on rank aggregation, like the current SMS IL points system, which assigns points based on the number of runs a run beats, and sums them across 118 categories. Erasing bad runs from such a system takes points away from everyone who beat them. If it were reformed to assign points based on the number of runs a run is beaten by, it would still be unstable with respect to erasing bad runs. Those runs would still contribute points to good players who haven’t submitted in that category, because such players are automatically billed the maximum (= the number of runs in that category). If this system had a cutoff (e.g. maximum 100 points per category), then it would automatically establish notability for precisely the top 100.
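To make the instability concrete, here is a small Python sketch of both schemes. It assumes a simplified model where each category maps runners to times (lower is better). It is not the real SMS IL points formula, and `points_beats` and `points_beaten_by` are hypothetical names. Runner Z stands in for the bad runs whose erasure is in question.

```python
# Simplified, hypothetical model: category -> {runner: time}; lower time is better.
Boards = dict[str, dict[str, float]]


def points_beats(boards: Boards) -> dict[str, int]:
    """Points = number of runs each run beats, summed over categories (higher is better)."""
    totals: dict[str, int] = {}
    for board in boards.values():
        for runner, time in board.items():
            beaten = sum(1 for other in board.values() if other > time)
            totals[runner] = totals.get(runner, 0) + beaten
    return totals


def points_beaten_by(boards: Boards) -> dict[str, int]:
    """Points = number of runs that beat each run (lower is better).

    A runner with no run in a category is billed the maximum,
    i.e. the number of runs in that category."""
    runners = sorted({r for board in boards.values() for r in board})
    totals = dict.fromkeys(runners, 0)
    for board in boards.values():
        for runner in runners:
            if runner in board:
                totals[runner] += sum(1 for other in board.values() if other < board[runner])
            else:
                totals[runner] += len(board)  # missing category: billed the maximum
    return totals


full = {
    "cat1": {"A": 10.0, "B": 12.0, "Z": 30.0},
    "cat2": {"B": 20.0, "Z": 40.0},
}
# The same data with Z's (bad) runs erased.
erased = {cat: {r: t for r, t in board.items() if r != "Z"} for cat, board in full.items()}

print(points_beats(full), points_beats(erased))          # {'A': 2, 'B': 2, 'Z': 0} {'A': 1, 'B': 0}
print(points_beaten_by(full), points_beaten_by(erased))  # {'A': 2, 'B': 1, 'Z': 3} {'A': 1, 'B': 1}
```

Under the first scheme, erasing Z breaks the tie between A and B in A’s favour. Under the second, B’s total is unchanged, but A’s bill for the missing category drops from 2 to 1, turning B’s lead into a tie.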

Having given that elaborate example, I would still argue that the costs to a person of retaining eir speedrun data are so small that all of it can be considered notable, and a naive rank aggregation system like the above would be considered “seriously impaired” by not retaining it. In the long run, as I said earlier, accuracy is compromised by full retention and such a system has to be reformed to obviate the need for this data.

3d | Safeguarding

Revisiting the guideline:

Archive services may still wish to consider requests to have personal data removed from public view on ethical grounds under takedown and reclosure policies if data subjects have expressed their concern and the wider balance with freedom of expression and information is not significantly affected. If a data subject complains of distress, archive services should consider if the following is appropriate to make processing fair, especially if the data is available to search engines and the data is inaccurate:
– Reclosure / takedown (removal from public access);
– Adding a supplementary statement to the record;
– Amending or adding metadata or catalogue description.

— UK National Archives, Guide to archiving personal data, guideline #29 (second half)

What is meant by “substantial damage or distress”?
The [Data Protection] Act does not define this. However, in most cases:
– substantial damage would be financial loss or physical harm; and
– substantial distress would be a level of upset, or emotional or mental pain, that goes beyond annoyance or irritation, strong dislike, or a feeling that the processing is morally abhorrent

— UK National Archives, Guide to archiving personal data, guideline #24 postscript

It’s important to show goodwill towards data protection concerns and consider compromises, since compromise is the fundamental principle behind how people get along with each other. In what ways do safeguarding concerns apply to speedrun records? (Note that this still relates to one’s own data, rather than the data of others, which I discussed in section 1b.)

Primarily, they apply to personal identifiability. In most instances, people are identified via handles unrelated to their real names, and their speedrun records contain information about the achievement itself plus some metadata. A data controller has to assess whether maintaining this information poses a material risk to the person (damage or distress, as above). That is unlikely but possible, for example if the subject is being stalked. Anonymisation may be considered, and a fully anonymised record (i.e. just the time of the speedrun) should never pose a safeguarding risk. However, anonymisation requires special record-keeping. Without it, anonymisation often leads in practice to the destruction of the identifying data itself, which is a problem for historical accuracy. I consider this a substantial step that needs a good case, justified by case-by-case analysis.
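One way to do that record-keeping is pseudonymisation. The public view shows a stable placeholder in place of the handle, while moderators privately keep the mapping so the identifying data isn’t destroyed. The Python sketch below assumes a board is a simple list of (handle, time) pairs. `Registry`, `public_view` and the “Anonymous n” naming are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Private record kept by moderators and never published:
    maps each anonymised handle to its stable pseudonym."""
    by_handle: dict[str, str] = field(default_factory=dict)

    def pseudonym(self, handle: str) -> str:
        # The same handle always gets the same pseudonym.
        if handle not in self.by_handle:
            self.by_handle[handle] = f"Anonymous {len(self.by_handle) + 1}"
        return self.by_handle[handle]

    def restore(self, pseudonym: str) -> str:
        """Recover the original handle, e.g. for later historical research."""
        for handle, alias in self.by_handle.items():
            if alias == pseudonym:
                return handle
        raise KeyError(pseudonym)


def public_view(board: list[tuple[str, float]], anonymised: set[str],
                registry: Registry) -> list[tuple[str, float]]:
    """Publish the board with requested handles replaced; times and order are untouched."""
    return [(registry.pseudonym(h) if h in anonymised else h, t) for h, t in board]


registry = Registry()
board = [("A", 95.0), ("B", 97.0), ("C", 99.0)]
print(public_view(board, {"B"}, registry))  # [('A', 95.0), ('Anonymous 1', 97.0), ('C', 99.0)]
print(registry.restore("Anonymous 1"))      # B
```

Strictly speaking, this is pseudonymisation rather than full anonymisation, since the mapping still exists. That retained mapping is exactly what keeps the record historically accurate.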

Things are a bit different with real names, since there is an expectation of anonymity surrounding the concept of doxing. I personally consider this concept misplaced in general gaming communities, in that it absolves people of personal responsibility for their conduct and makes the online world more alienated and crueler, but many do rely on this for their own wellbeing, so it’s inevitably reflected in leaderboard ethics. I think there is rarely a reason to consider retaining a real-name-related handle on a speedrun time.

3e | Copyright

For any speedrun leaderboard, video records of entries are considered essential, both for historical interest and for ongoing vigilance over run legitimacy. The latter gives a leaderboard a legitimate interest in retaining videos, even if the leaderboard itself removes them from public access. However, these videos are almost always recorded by the data subject emself, which poses a copyright problem. I didn’t care to research how other sports navigate copyright in archives, or even whether player copyright applies to sanitised gameplay videos at all. Still, I would proceed with caution and unpublish videos whose subjects request erasure. I think retaining the videos privately regardless of the subject’s wishes is in step with the decentralised nature of leaderboard archiving. It also fits practical expectations around private copies of data that was once publicly available but no longer is.

3f | The Right To Be Forgotten

I will end with a more abstract look at this. I’ve always personally considered the concept of a right to be forgotten to be impractical and unnatural. I think leaderboards should, where possible, reflect collective memory. And memory cannot be forcibly changed, nor is it generally right to do that. A person who wants to erase eir speedrun records is typically someone who was known for those records in the past, and not reflecting that is a failing of a leaderboard.

The purpose of this right in general is to give people a means of escaping stigma for their pasts once those pasts are no longer relevant. In our context, though, no substantial claim of stigmatisation can be made for the act of having achieved a speedrun. If anything, the right is most relevant when someone has a history of abuse that is deemed worthy of being forgotten; if a leaderboard engages in spreading information about that abuse, then it should cease to do so as part of the person’s right to be forgotten. But I don’t see any relevance beyond that.

Information about speedruns exists everywhere, in private records, uncensored chat logs etc. Any ethical olive branch towards helping a person be forgotten, like removing records from a leaderboard, has to be weighed against the impact of disinformation and historical revisionism. And if the data is at all notable, in my opinion this olive branch will always lose.