2022 update: principles #3 and #6 have changed. frames are natively stored with “presentation” timestamps, so logging these is ideal; avidemux is the recommended software because it seems able to extract these, tho ffmpeg frame dump is still the standard for perfect accuracy. this article discusses this from a software perspective.

what follows is the original article (from feb 2021).

video retime manifesto

writing down some principles on how to organise retimes from video in a spreadsheet leaderboard structure (e.g. an il sheet or a retime log).

1. The video is the absolute source of truth.

Retiming methods should always yield a consistent result, and ideally consistent intermediate results, given the same video as input. However, videos are fallible. Judgement may be necessary when identified frame numbers disagree with expectation and dropped or duplicated frames are seen or suspected. In rare cases, a video may contain a systematic timing problem, e.g. because it underwent an incorrect frame-rate conversion.
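As a rough aid for spotting the dropped frames or bad frame-rate conversions mentioned above, one could screen a video's per-frame presentation timestamps for gaps that deviate from the nominal frame duration. Below is a minimal Python sketch. It assumes you have already extracted those timestamps in seconds by some other means, for example with a tool like ffprobe. The name `flag_irregular_intervals` and the 25% tolerance are hypothetical choices and not part of this manifesto. The sketch only catches timestamp irregularities. A duplicated picture that carries regular timestamps will not be flagged, so judgement is still needed.

```python
from fractions import Fraction


def flag_irregular_intervals(pts, nominal, tolerance=Fraction(1, 4)):
    """Return indices i where the gap pts[i] - pts[i-1] differs from the
    nominal frame duration by more than tolerance * nominal.

    pts: presentation timestamps (seconds) in display order (assumed input).
    nominal: expected frame duration, e.g. Fraction(1, 30) for 30 fps.
    tolerance: hypothetical threshold, as a fraction of the nominal duration.
    """
    flagged = []
    for i in range(1, len(pts)):
        gap = pts[i] - pts[i - 1]
        if abs(gap - nominal) > tolerance * nominal:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    step = Fraction(1, 30)
    # The frame at 3/30 s is missing, so the gap before index 3 is doubled.
    pts = [0 * step, 1 * step, 2 * step, 4 * step, 5 * step]
    print(flag_irregular_intervals(pts, step))  # [3]
```

Exact fractions are used so that a regular 1/30 s spacing never looks irregular because of floating-point rounding.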

2. Every retime should be verifiable.

That is, every retime must be backed by a single video. This is particularly important because the standard timing method we’re trying to conform to is based on a timer with poorly understood random error. Retiming methods change as new theory emerges and can be hard to calculate, so the raw data must be retained, either to check a calculation or to redo it. Players should be required to submit a single cropped video: every verification requires downloading the same video, so it should be as convenient as possible for the verifier (e.g. a maximum length of 4 minutes).

3. Log absolute frame numbers.

The data that should be logged is the absolute frame number of each reference frame that the theory tells us to look for, and a spreadsheet should be used for derived values like the final times. This is to optimise:

4. All reference frames should be the first frame on which something happens.

This is for two technical mathematical reasons:

Discretisation

Videos represent a continuous concept (a duration of time) as a discrete one (a finite number of frames). Every frame is a picture that represents the interval between the previous frame and itself. The difference between the frame numbers of the first frames on which two events were detected is an unbiased (average-case) estimate of the time that passed between them. For example:

If the first frame that’s part of a run (e.g. the intro cutscene) is frame 12 and the first frame after the end (e.g. the shine-get cutscene) is frame 58, then the start happened at some point between frames 11 and 12, and likewise between frames 57 and 58 for the end. So the duration 58 – 12 = 46 is an average estimate of a quantity between 57 – 12 = 45 and 58 – 11 = 47.
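The estimate and its bounds can be written out directly. Below is a minimal Python sketch. It assumes the two reference frames are the first frame of the run and the first frame after its end, as in the example above. The function names are hypothetical. The second function checks the "unbiased" claim by averaging the true duration over evenly spread start and end instants. This rests on an added assumption that each event is equally likely to happen anywhere within its one-frame interval.

```python
from fractions import Fraction


def duration_estimate(first_start, first_after_end):
    """Return (estimate, lower, upper) in frames.

    The start happened between frames first_start - 1 and first_start;
    the end happened between frames first_after_end - 1 and first_after_end.
    """
    estimate = first_after_end - first_start
    lower = (first_after_end - 1) - first_start  # latest start, earliest end
    upper = first_after_end - (first_start - 1)  # earliest start, latest end
    return estimate, lower, upper


def mean_true_duration(first_start, first_after_end, samples=20):
    """Average the true duration over evenly spaced start/end instants
    (midpoints of `samples` equal slices of each one-frame interval).
    Exact fractions keep the average free of rounding error."""
    offsets = [Fraction(2 * k + 1, 2 * samples) for k in range(samples)]
    total = Fraction(0)
    for s in offsets:
        for e in offsets:
            start = first_start - 1 + s
            end = first_after_end - 1 + e
            total += end - start
    return total / (samples * samples)


if __name__ == "__main__":
    print(duration_estimate(12, 58))   # (46, 45, 47)
    print(mean_true_duration(12, 58))  # 46
```

To convert any of these frame counts to seconds, divide by the video's frame rate. This is best done as a derived value in the spreadsheet, not logged.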

Arithmetic Consistency

We can partition a large event into smaller ones by grouping adjacent frames, such that every frame of the whole is in exactly one group. For example, a red coin level can be partitioned by the distinct values shown by the red coin counter. From the discretisation point, it follows that the duration of each section is accurately represented by the difference between the first frame on which it was active and the first frame on which it was inactive (equivalently, the first frame of the next section). This is consistent under addition: if we combine two adjacent sections, the duration of the combination equals the sum of their durations.

E.g. say the first frame on which the red coin counter shows 1 is frame 23, the first frame it shows 2 is 55, and the first frame it shows 3 is 80. Then frames 23, 24, …, 54 are the period when the counter showed 1, and frames 55, 56, …, 79 are the period when it showed 2. The count of each set of frames equals the difference between first frames, i.e. count(23, …, 54) = 55 – 23 = 32 and count(55, …, 79) = 80 – 55 = 25, and per the discretisation point, these accurately estimate how long the two periods took in the real world. Furthermore, the frame count of the combined period is the sum of the two counts, and likewise 80 – 23 = (80 – 55) + (55 – 23) = 25 + 32 = 57 remains an accurate estimate of the combined duration in the real world.
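The bookkeeping above can be sketched in Python. You log only the first frame of each section, plus the first frame after the last one, and derive every duration by subtraction, the way a spreadsheet would. This is a minimal sketch under that assumption. `section_durations` is a hypothetical helper, and the numbers come from the red coin example above.

```python
def section_durations(first_frames):
    """Given (label, first_frame) pairs in order, where the last pair marks
    the first frame after the final section, return {label: frame count}."""
    durations = {}
    for (label, start), (_, next_start) in zip(first_frames, first_frames[1:]):
        if next_start < start:
            raise ValueError(f"first frames out of order at {label!r}")
        durations[label] = next_start - start
    return durations


if __name__ == "__main__":
    # First frame on which the red coin counter shows each value.
    log = [("1 coin", 23), ("2 coins", 55), ("3 coins", 80)]
    d = section_durations(log)
    print(d)  # {'1 coin': 32, '2 coins': 25}

    # Additivity: the combined period (frames 23..79) is the sum of its parts.
    combined = log[-1][1] - log[0][1]
    assert combined == sum(d.values()) == 57
    # The difference also matches literally counting the frames 23..54.
    assert d["1 coin"] == len(range(23, 55))
```

The design logs only raw first frames and computes every duration from them. Because of that, a sum of section durations and the duration of the whole can never drift apart.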


5. Retimes should be verified by mods.

This should work exactly as it does for full-game runs: mods are expected to understand retime principles and agree on the results, while players aren’t, since they’re often misinformed about methodology. However, it’s beneficial to provide resources that help players understand, to promote knowledge and transparency.

6. Retimes should be done using a single program, namely VirtualDub 1.10.4.

Using the same program means absolute frame numbers will be consistent, so everyone can verify everyone else’s work. This program is also preferred to other frame-servers like AviUtl, for reasons given in [upcoming report]. It’s preferred to video editors like Premiere Pro because frame-servers tend to have user interfaces suited to instant seeking and reading absolute frame numbers, and they take much less time to start up.