Retiming (General Guide)

See also the current SM64 Retiming Guide, which has similar guidelines to these and some screenshots.

Downloading

We follow a specific procedure when downloading videos to ensure that different people get the exact same video, and hence consistent retime results. We use yt-dlp with FFmpeg. You can install both on Windows via the command winget install yt-dlp.yt-dlp.

The script file (called _dl.cmd in the downloadable template above, and video.cmd in the SM64 guide) should have the following contents:

yt-dlp -a _list.txt --fragment-retries infinite --download-archive _index.txt
cmd /k

You should make the _list.txt file in the same folder as the .cmd file. This file is the input list of videos to download. yt-dlp creates the _index.txt download archive file as it downloads videos, listing the IDs (e.g. YouTube keys) of each source video; this file is scanned when _dl.cmd is run to prevent yt-dlp from re-downloading videos. Thus, you can run yt-dlp without worrying about duplicates in the input list, which is useful when backing up current leaderboard videos or when you have to run it several times when downloads fail.
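To make the role of the download archive concrete, here is a minimal Python sketch of the duplicate check. It assumes each line of _index.txt has the form "<extractor> <id>" (e.g. "youtube" followed by the video key); the function names and sample IDs are hypothetical, and yt-dlp's own logic is more involved than this.

```python
def parse_archive(text: str) -> set:
    """Parse download-archive text into a set of (extractor, video_id) pairs.

    Assumption: one "<extractor> <id>" entry per line.
    """
    entries = set()
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:  # skip blank or malformed lines
            entries.add((parts[0], parts[1].strip()))
    return entries


def needs_download(extractor: str, video_id: str, archive: set) -> bool:
    """A video is only fetched if its ID is not already in the archive."""
    return (extractor, video_id) not in archive


# Hypothetical archive contents after two successful downloads.
sample_archive = "youtube aaaaaaaaaaa\nyoutube bbbbbbbbbbb\n"
archive = parse_archive(sample_archive)

print(needs_download("youtube", "aaaaaaaaaaa", archive))  # False: already archived
print(needs_download("youtube", "ccccccccccc", archive))  # True: not seen yet
```

This is why a duplicated line in _list.txt, or a second run after a failed download, does not fetch the same video twice.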

The file _dl-sim.cmd has the added parameters -s --force-write-archive, which tell yt-dlp to run and build a download archive without actually downloading any videos. This is simulation mode.

The console output of the downloading process can have useful information for investigating problems, e.g. the format downloaded from YouTube. Usually, YouTube downloads will end with an FFmpeg merge; a lack of this may indicate FFmpeg is missing, which you can test by typing ffmpeg into the console and hitting enter. You may optionally save the console output to a _dl-log.txt file.

Underscores at the start of non-video file names just help them to appear at the top of the folder with name sorting.

Command documentation

-a _list.txt specifies the input list.
--fragment-retries infinite forces yt-dlp to download the entire video or fail if chunks are unavailable (default behaviour is outputting a video file with chunks missing).
--download-archive _index.txt specifies the download archive.
-s enables simulation mode.
--force-write-archive enables the download archive in simulation mode.
cmd /k keeps the console open after the yt-dlp command finishes.

See the yt-dlp GitHub readme for full documentation.

Retiming | Comparison of Methods

These two methods are believed to be accurate and stable (meaning they give the exact same results on the same input file).

FFmpeg (frame dump) is the most transparent and accurate method. Its accuracy comes from outputting raw timestamps directly from the original video (at least, that’s the intent of the command). It also produces a montage of reference frames from the video that can be used to verify retime results and catch human error. It’s a bit unintuitive to use compared with other programs and requires a downloaded video, so it is recommended for permanent whole retimes of leaderboard runs.
avidemux also derives timestamps directly from the original (downloaded) video, but does some processing of the video start offset that can introduce a rounding error of at most 0.002s. For practical purposes it’s as accurate as FFmpeg, and while it doesn’t produce documentation like FFmpeg’s frame montages, it does allow for multi-segment spreadsheet retimes and simpler undocumented whole-run retimes.

All other methods I’ve seen don’t derive original timestamps, so shouldn’t be used in accurate contexts.

ytd (YouTube debug info) requires no software or video download, but only approximates timestamps, and so is unstable (variation of 0.02s is possible between runs on the same frame of the same video).

VirtualDub is no longer recommended because it makes constant frame-rate assumptions that can cause instability.

Retiming | avidemux

This method of retiming should be used in casual situations. It requires videos to be downloaded and avidemux 2.6+ to be installed. The SM64 guide shows how to do it for retiming entire runs (start to finish).

For spreadsheet-based retimes, you should instead copy the timestamps from the bottom-left textbox (next to “Time”). Take care to highlight the whole timestamp (3dp) each time.

Spreadsheet cells can just have times pasted into them, but the cells must be formatted to display time. Common time formats (note that negative durations show up as 59.999 etc.):

h:mm:ss.000 for full absolute timestamps (with leading zeros);
[<0.00069444]s.000;[>=0.00069444]m:ss.000 for sub-hour times (no leading zeros if the time is under a minute);
s.000 for sub-minute times, usually durations.
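The 0.00069444 threshold in the second format is one minute expressed as a fraction of a day, on the assumption that the spreadsheet stores times as day fractions (60 / 86400 ≈ 0.000694444). A small Python sketch of what that conditional format displays, for non-negative times only; the function name is hypothetical and this only imitates the display, not the spreadsheet engine:

```python
SECONDS_PER_DAY = 86_400
ONE_MINUTE_IN_DAYS = 60 / SECONDS_PER_DAY  # ~0.000694444, the format threshold


def render_sub_hour(seconds: float) -> str:
    """Imitate [<1 min]s.000;[>=1 min]m:ss.000 for a non-negative time."""
    total_ms = round(seconds * 1000)
    minutes, rem = divmod(total_ms, 60_000)
    secs, ms = divmod(rem, 1000)
    if minutes == 0:
        return f"{secs}.{ms:03d}"            # s.000 (no leading zeros)
    return f"{minutes}:{secs:02d}.{ms:03d}"  # m:ss.000


print(render_sub_hour(59.5))   # 59.500
print(render_sub_hour(75.25))  # 1:15.250
```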

Retiming | FFmpeg

→ Demo + Video Tutorial

This method should be used for permanent retimes of entire runs. It requires videos to be downloaded and FFmpeg 7.1.1+ to be installed. That version isn’t out yet, but if it’s 2025 or later, it almost certainly will be (I had to fix a stupid bug in its code to get it to work properly, hence the delay). A simple way to install it on Windows (once it’s out), if you haven’t already done so while installing yt-dlp, is via the command winget install Gyan.FFmpeg.Essentials.

The retime lab template contains a set of scripts for this method, in the retime folder. Start by placing the subject video in the retime folder.

Instructions:

Look through the video to identify roughly where the start and end frames are.
Run _dump.ps1 (right click > Run with PowerShell), and input a start and end timestamp to specify 1.5s windows (starting at the given times) that catch the start and end frames; these windows will be dumped into images of frames in the “frames” folder.
Sift through the images (with the File Explorer preview pane), and delete all but the start/end reference frames and the frames directly before those two.
Run _montage.ps1, which will tile those 4 images into a “manifest” file. The output filename has the format <vID>.<startTimestamp>.<endTimestamp>.<hash>.png, where vID is the ID on the video hosting platform (used by the accurate leaderboard as a primary key for retime records), the timestamps are of the chosen frames in μs, and hash is the MD5 hash of the combined AV tracks in the video file (logged so that anyone can check they’ve downloaded and retimed the exact same video as you).
Check the manifest file:
1. Check the left frames don’t match the retiming reference frames and the right frames do;
2. Check the integers in the printed text, to the left of the |, increase by 1 from left to right – these are the sequence numbers of those frames in their respective dumps, so this confirms no frame is missing.
Upload and link the manifest in your spreadsheet. On the accurate RTA leaderboard, the pastable formulas automatically extract the vID, timestamps and hash, and link in player/date/timing metadata via the vID. The leaderboard itself looks up retime data by vID and auto-updates all entries upon clicking “Sync”.
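As an illustration of what the manifest filename encodes, here is a hedged Python sketch that splits it back into its four fields and takes the difference of the two logged timestamps. The class and function names and the sample filename are hypothetical; whether that difference is the final retime depends on the leaderboard's own conventions and formulas.

```python
from pathlib import PurePath
from typing import NamedTuple


class Manifest(NamedTuple):
    video_id: str
    start_us: int  # start frame timestamp in microseconds
    end_us: int    # end frame timestamp in microseconds
    md5: str


def parse_manifest_name(filename: str) -> Manifest:
    """Split <vID>.<startTimestamp>.<endTimestamp>.<hash>.png into fields."""
    stem = PurePath(filename).stem  # drop the ".png" extension
    # Split from the right so the ID is whatever precedes the last 3 fields.
    video_id, start, end, md5 = stem.rsplit(".", 3)
    return Manifest(video_id, int(start), int(end), md5)


def elapsed_seconds(manifest: Manifest) -> float:
    """Difference between the two logged timestamps, in seconds."""
    return (manifest.end_us - manifest.start_us) / 1_000_000


# Hypothetical manifest filename (made-up ID, timestamps and hash).
name = "abc123XYZ_-.00012345678.00098765432.0123456789abcdef0123456789abcdef.png"
m = parse_manifest_name(name)
print(m.video_id, m.start_us, m.end_us)
print(elapsed_seconds(m))
```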

Scripts + documentation:

Script (_dump.ps1)

Write-Host "FFMPEG Frame Dump + Timestamp" -ForegroundColor Black -BackgroundColor Green
$video = Get-ChildItem . | where {$_.extension.ToLower() -in (".mp4",".mkv",".webm",".mov")}
If (!$video -or $video[1]) {
    Write-Host "Error | There must be exactly 1 video file in this folder." -ForegroundColor Red
    Read-Host -Prompt "(press enter to exit)"; Return
}
Write-Host ("Video file: $($video.Name)") -ForegroundColor Yellow
Write-Host "Will dump 1.5s segments starting from the following timestamps:" -ForegroundColor Yellow
$ss1 = Read-Host -Prompt "Input start timestamp"
$ss2 = Read-Host -Prompt "Input end timestamp"
Write-Host "Dumping frames...`n" -ForegroundColor Yellow
New-Item -ItemType Directory -Force "frames" | Out-Null
$drawTextParams = "fontfile=_font.ttf: fontcolor=yellow: fontsize=36: x=3: y=3: text='%{n} | %{pts}'"
ffmpeg  -ss $ss1  -t 1.5  -i $video.Name  -vf drawtext=$drawTextParams -copyts  -fps_mode passthrough  -enc_time_base 0.000001  -frame_pts 1  frames/%011d.png
ffmpeg  -ss $ss2  -t 1.5  -i $video.Name  -vf drawtext=$drawTextParams -copyts  -fps_mode passthrough  -enc_time_base 0.000001  -frame_pts 1  frames/%011d.png
Write-Host "Frames dumped.`n" -ForegroundColor Green
Read-Host -Prompt "(press enter to exit)"

Script (_montage.ps1)

Write-Host "Hash + Montage" -ForegroundColor Black -BackgroundColor Green
$imgs = Get-ChildItem frames | where {$_.extension.ToLower() -in (".png",".jpg",".jpeg")}
If (!$imgs -or !$imgs[3] -or $imgs[4]) {
    Write-Host "Error | There must be exactly 4 PNG/JPG files in the 'frames' subfolder." -ForegroundColor Red
    Read-Host -Prompt "(press enter to exit)"; Return
}
$video = Get-ChildItem . | where {$_.extension.ToLower() -in (".mp4",".mkv",".webm",".mov")}
If (!$video -or $video[1]) {
    Write-Host "Error | There must be exactly 1 video file in this folder." -ForegroundColor Red
    Read-Host -Prompt "(press enter to exit)"; Return
}
Write-Host ("Video file: $($video.Name)") -ForegroundColor Yellow
$vID = [regex]::match($video.Name, '\[([^\[]*)\][^\]]*$').Groups[1].Value
If (!$vID) {
    Write-Host "Error | ID missing from video filename (parsing text from last ] to preceding [)." -ForegroundColor Red
    Read-Host -Prompt "(press enter to exit)"; Return
}
$vID = $vID -replace '^v(\d+)$','$1' # remove preceding "v" from twitch decimal video id
Write-Host "Calculating hash...`n" -ForegroundColor Yellow
$hash = ffmpeg  -i $video.Name  -map 0  -c copy  -f md5  -v error  -
$outName = "$($vID).$($imgs[1].BaseName).$($imgs[3].BaseName).$($hash.split('=')[1]).png"
cd frames
ffmpeg -i $imgs[0].Name -i $imgs[1].Name -i $imgs[2].Name -i $imgs[3].Name  -lavfi "xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0"  -update 1  "../$($outName)"
Write-Host "`nCreated montage of 4 frames by filename order." -ForegroundColor Green
Write-Host "Output file: $($outName)" -ForegroundColor Green
Read-Host -Prompt "(press enter to exit)"
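The ID extraction in _montage.ps1 relies on the video filename ending in a bracketed ID (e.g. "Title [ID].ext", which is assumed here to be how the downloaded files are named). A Python sketch of the same two regular expressions, to show what they match; the function name and sample filenames are hypothetical:

```python
import re
from typing import Optional

# Same pattern as the script: the last "[...]" group in the filename.
ID_PATTERN = re.compile(r"\[([^\[]*)\][^\]]*$")


def extract_video_id(filename: str) -> Optional[str]:
    """Return the text between the last ']' and its preceding '[', or None."""
    match = ID_PATTERN.search(filename)
    if not match or not match.group(1):
        return None
    # Strip the leading "v" from an all-digit Twitch video ID.
    return re.sub(r"^v(\d+)$", r"\1", match.group(1))


print(extract_video_id("Some run [abc123XYZ_-].mp4"))            # abc123XYZ_-
print(extract_video_id("Title [draft] final [v1234567890].mkv"))  # 1234567890
print(extract_video_id("no id here.mp4"))                         # None
```

Earlier bracketed text in the title is ignored because the pattern only accepts a match with no further "]" after it.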

Command documentation (dump)

-ss before -i: seeks to timestamp of input file;
-t: duration of video to process;
-i: input file;
-vf drawtext: process video with text addition filter (parameters listed under $drawTextParams, but it basically imprints sequence numbers and timestamps);
-copyts: copies timestamps verbatim from original video;
-fps_mode passthrough: each frame is processed, along with its timestamp;
-enc_time_base 0.000001: timestamps are rescaled by 1/0.000001 = 1000000 (i.e. to microseconds);
-frame_pts 1: timestamps are printed in the filename;
%011d.png: selects image encoder and filename (11-digit integer including leading zeros).

Derived from here. See also the ffmpeg documentation for more details.

Command documentation (hash)

-i: input file;
-map 0: selects all streams in the file (i.e. doesn’t drop any streams);
-c copy: prevents decoding the video (slow);
-f md5: output format: md5 hash;
-v error: suppresses all but output and errors;
-: outputs result to command window.

Accuracy theory

The FFmpeg method extracts original timestamps from the input video. These are stored, in modern containers, per-frame as a pts field, the timestamp at which the frame is meant to be presented to the video viewer. These are integer coefficients of a rational timebase (tb, in seconds, constant for a given video), which is the reciprocal of an integer timescale (denoted tbn by FFmpeg, in Hz), the frequency of the grid that the video snaps frames to, typically something like 1000 or 90000. The timestamps we want are thus pts_time = pts × tb = pts / tbn.
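A worked example of pts_time = pts / tbn, as a minimal Python sketch using exact rational arithmetic. The sample pts and tbn values are hypothetical and the function name is not part of FFmpeg:

```python
from fractions import Fraction


def pts_time(pts: int, tbn: int) -> Fraction:
    """Exact presentation time in seconds: pts * tb, where tb = 1 / tbn."""
    return Fraction(pts, tbn)


# Hypothetical frame: pts = 3003 on a 90000 Hz grid.
t = pts_time(3003, 90000)
print(t)         # 1001/30000 (exact, in seconds)
print(float(t))  # roughly 0.0333667

# On a 1000 Hz grid every timestamp is a whole number of milliseconds.
print(pts_time(16683, 1000))  # 16683/1000
```

Keeping the value as a fraction avoids introducing floating-point error before any rounding is applied.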

While passing all frames (per -fps_mode passthrough) from source video to target images, FFmpeg rescales the (integer) pts and (rational) tb of each frame as follows:

tbTarget = enc_time_base
ptsTarget = round_half_up(ptsSource * tbSource / tbTarget) - round_half_up(start_time * (...))

Here, the start_time variable is set to ts_offset, which is set to 0 by -copyts.

These frames are then passed to the image encoder, which forwards their (integer) pts values (remember, pts = pts_time / tb = timestamp * 1000000) directly into the filename generator (per -frame_pts 1), which itself prints them (snprintf with 64-bit integer format).
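The rescaling and filename steps can be sketched in Python as follows. This mirrors the formula in the text rather than FFmpeg's source: it assumes start_time is 0 (per -copyts) and non-negative pts, and the function names and sample values are hypothetical.

```python
import math
from fractions import Fraction


def rescale_half_up(pts_source: int, tb_source: Fraction, tb_target: Fraction) -> int:
    """round_half_up(ptsSource * tbSource / tbTarget), computed exactly."""
    exact = Fraction(pts_source) * tb_source / tb_target
    return math.floor(exact + Fraction(1, 2))


def frame_filename(pts_target: int) -> str:
    """Filename per %011d.png: 11-digit integer with leading zeros."""
    return f"{pts_target:011d}.png"


tb_source = Fraction(1, 90000)      # hypothetical source timebase
tb_target = Fraction(1, 1_000_000)  # -enc_time_base 0.000001 (microseconds)

pts_target = rescale_half_up(3003, tb_source, tb_target)
print(pts_target)                  # 33367 (33366.67 microseconds, rounded)
print(frame_filename(pts_target))  # 00000033367.png
```

The only loss of precision in this path is the rounding to the nearest microsecond.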

Retiming | ytd

This method (YouTube Debug Data) should only be used for rough retimes where accuracy is not critical. It doesn’t require videos to be downloaded, but only works for YouTube videos.

Navigate to the start and end reference frames (via frame advance – the , and . keys), then right-click and select Copy Debug Info. Select the cmt field (around 5th in the list), and delete the rest of the data. The 3dp cmt number is the timestamp of that frame. This method is unstable, with variation of up to 0.05s typically between different attempts at retiming the same frame in the same video.

Retiming | Technique Notes

There are a few practices I recommend to deal with ambiguous situations. We are normally trying to identify exact reference frames, but may need to count from other frames if the reference frame is not visible or is duplicated for any reason. So the notes below talk about matching, and counting forwards and backwards. Target frames are the frames we log: they equal the reference frame for in-transitions, and are the frame after the reference frame for out-transitions.

Whenever a reference frame is dropped in a capture, make a note over the relevant cell explaining the decision-making behind the time you chose. In very rare instances, it might be necessary to deviate from the following guidelines.

59.94/60 fps: to avoid bias, the convention is to treat the first of the two copies of each frame as the only matching frame. So always log the first one, and always count forwards/backwards from the first one. Always count unique frames, so the first of each pair.
Duplicated frames: if a frame is duplicated for reasons other than double frame-rate, then the first copy is still matched, and the first copy is counted backwards from, but the last copy is counted forwards from. That’s because the closer copy is a more accurate estimate of what the frame was at that (closer) timestamp.
Invisible frames: if a reference frame is not visible in (almost) every instance of that type of reference frame (e.g. fadein/fadeout, circle-out with very dark background), then the closest identifiable frame is matched and counted forwards/backwards from.
Dropped frames: if a reference frame is dropped, but is otherwise visible in (almost) every instance of that type of reference frame (e.g. pinhole, sliver), then the earliest frame matching or following the target frame is always used. In practice, this means the first visible frame of an in-transition, and the first blackout/whiteout of an out-transition, is always used. This upholds the principle of always identifying the first frame something happens, again to avoid bias.
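The 59.94/60 fps convention (log and count only the first copy of each frame) can be sketched in Python as below. It assumes you already have a list of (timestamp, image) pairs in presentation order; the labels and millisecond timestamps are hypothetical stand-ins for real frames. It does not cover the duplicated-frame rule of counting forwards from the last copy.

```python
from itertools import groupby


def first_copies(frames):
    """Keep the first (timestamp, image) of each run of identical images."""
    return [next(run) for _, run in groupby(frames, key=lambda frame: frame[1])]


# Hypothetical capture where each game frame appears twice, except the last.
frames = [(0, "A"), (17, "A"), (33, "B"), (50, "B"), (67, "C")]
unique = first_copies(frames)
print(unique)  # [(0, 'A'), (33, 'B'), (67, 'C')]

# Counting 2 unique frames forwards from the first copy of "A" lands on "C".
start_index = [image for _, image in unique].index("A")
print(unique[start_index + 2])  # (67, 'C')
```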