Shine-Get Autosplitter

This is a report on how I made the autosplitter targeting the standard shine-get split frame, which I always use for full-game real-time runs. If you’re looking for a tutorial on setting this up for yourself, start with the last section.

Contents

The Big Picture

Autosplitters are programs that automatically signal a timer when to split, based on external stimuli signifying milestones in a run. Speedrun autosplitters fall into two types: visual autosplitters, which watch the video feed for a visual cue, and memory autosplitters, which read data directly from the game’s memory.

Memory autosplitters are more accurate, since not only is reading data more reliable than recognising visual features, but in many cases the time itself can be calculated from that data. Memory autosplitters are often used to remove (random and inconsistent) loads from the timing.

The Slippi Method
SMS will one day be timed with memory autosplitters, using something like a Project Slippi-esque setup. In that example (Super Smash Bros. Melee), the game is modified via injected scripts that pass data out in real time: first a cheat-code interpreter saves the data to a virtual memory card, then a thread in the OS redirects the memory-card data to a TCP connection. It will take some years for such modding initiatives to materialise in SMS, and then a bit longer for the rulesets to adapt to increasingly impure (due to mods) speedruns.

The Savebox Autosplitter
However, for now, we do have visual autosplitters, and those are still vastly preferred to manual splitting: they obviate the need to time splits every few minutes, letting the player just play the game, and they are much more accurate, devoid of human error. In SMS, the usual (visual) autosplitter detects the savebox text that appears after every level, which is a very clear visual. But it differs from the standardised cue, “shine get”, used for both manual splits and retiming – that is, manually determining exact splits from a recorded video, not in real time, which is the standard of truth the real-time methods aim to approximate. The shine-get cue is a much less clear visual (to a computer).

Solving The Shine-Get Autosplitter
What solutions might we have, then?

  1. Retroactive splitting. Given that the savebox autosplitter already exists, and its cue is offset from shine get by an exact, consistent amount, the simplest solution might seem to be modding the timer software (LiveSplit) to accept signals to split at the current time minus a fixed amount. I chose not to take this route because I found LiveSplit’s code difficult to work with, but in hindsight it may have been easier to persevere there. I will say I consider this an important missing feature.

  2. Machine learning. The standard industry solution to this problem is to use convolutional neural networks to detect images. These are programs trained on a large sample of pictures of what is and isn’t the desired visual cue, and they have the advantage of being completely agnostic of context: you don’t have to care about which features distinguish the visual cue or how simple or complicated they may be, because that’s the computer’s job, and the way it recognises patterns is entirely different to how a human does (and intractable for a human to understand). I didn’t do this simply because I have no experience with machine learning; however, it is the method that leads to robust, reliable, one-size-fits-all software that anyone can use, such as AutoSplit64, the standard autosplitter for Super Mario 64 speedruns.

  3. Masking. The savebox autosplitter is based on a host program, AutoSplit, whose job is to compare the live video feed with a reference image using a simple, explicit algorithm – explicit in the sense that a human can intuitively understand each comparison. The algorithm, “L2 norm”, subtracts the RGB colour values of every pixel in a region of the feed (known as the “mask”) from their corresponding values in the reference image, and then combines all the differences (specifically, it sums their squares and takes the square root of the result, the same way the physical distance between two points in 3D is calculated from their x, y, z differences). The thing we can control, then, is the mask. For savebox, the pixels will be identical to the reference image over most of the screen, so masking that region gives excellent results. Finding a good mask for shine-get is much harder…

But finding that mask is the route I chose to take, and the following sections will explain how I did it. Honestly, what about…

  4. Standardise savebox. This entire endeavour is really an XY problem: instead of seeking a shine-get autosplitter, we could just switch the split timing standard to savebox, give players a little program that applies a standard mask to a screenshot of their game feed’s savebox, and let them run AutoSplit with that.

Yes… that is what should’ve happened. I made a frame-perfect shine-get autosplitter – so far I’m the only person to have done so – and I will write down how I did it for everyone’s interest, and everyone will go “ain’t no way I’m doing that LMAO”. But at least the sunk cost paid off for me, letting me split according to the current standard accurately and without having to do anything during the run myself, and the process by which I did it is interesting and worth documenting.

Creating The Autosplitter

Goal

To summarise the introduction, we’re creating a visual autosplitter hosted by AutoSplit, a program that uses a simple algorithm (“L2 norm”) to compare every frame from the live video feed with a single reference image, and output a match rating (a number between 0 and 1). We have control over the reference image (whose opaque pixels form the mask) and the threshold above which a match triggers the split.

The goal is to create a visual autosplitter that reliably identifies the shine-get frame, accepting it and rejecting any other frame from the game. This requirement can be weakened, mercifully, in two ways: the target needn’t be the shine-get frame itself, since any frame at a consistent offset before it can be compensated for with a fixed split delay in AutoSplit; and the mask only needs to reject the frames preceding the target, since the split fires on the first match.

So, we’re trying to identify a mask that does this, i.e. a set of pixels that we think will, in conjunction with each other, match the target frame as strongly as possible while clashing as strongly as possible with the (visually similar) preceding frames.
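To make the match rating concrete, here is a toy, hypothetical illustration of the calculation as described above (the two-pixel mask and the difference values are made up; the faithful version over real images is the comparison script later in this post):

import numpy as np

# Toy example: a mask covering just 2 pixels, where the live feed differs
# from the reference image by (10, 0, 0) and (0, 20, 0) in its colour channels.
diff = np.array([[10, 0, 0], [0, 20, 0]], dtype=np.float64)

error = np.sqrt((diff ** 2).sum())         # L2 norm of the differences, ≈ 22.4
worst = np.sqrt(3 * len(diff) * 255 ** 2)  # worst possible error (every channel off by 255), ≈ 624.6
print(1 - error / worst)                   # match rating, ≈ 0.964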

Method (Sketch)

This is a sketch of the method needed to create a reliable frame-perfect autosplitter via masking and AutoSplit.

  1. Standardise the video feed capture method, including resolution (1:1 comparison without rescaling means you know what you’re getting with every pixel you include in the mask).

  2. Create a mod of AutoSplit that frame-dumps various shine get animations (including the target frame and all similar-looking preceding frames), and then create a large sample of shine get frame dumps (identify the target frame in each).

  3. Create a script that runs AutoSplit’s image comparison algorithm on two files (to compare one reference image to the entire sample of shine get frame dumps, yielding a table of match ratings – for every frame within a dump, for every dump).

  4. Pick out features in the shine get animation, and hence create a mask to uniformly accept the target frame and reject all preceding frames (trial and error testing via the script in step #3), and set the relevant threshold.

Method (Details)

Nothing here is prescriptive; I’ll just commentate what I did. I’ll give more info on the target frame, visual features to target with the mask etc. in the next section; this section is purely the practicalities of the process.

Standardised Video Feed
AutoSplit still requires some form of screen capture for its ingest feed, so I use an OBS windowed projector of the capture card feed, cropped with a filter to the game’s native resolution (660×448 for most, but I use the Swiss loader to force 640×448). These windows are not capturable with AutoSplit’s capture method in OBS version 28 or higher, so I use 27. OBS cannot resize a windowed projector to source resolution, so I use the app Sizer to resize it every time I open it. Creating the preset in Sizer required trial and error with screenshots in Paint, since Sizer’s numbers are offset from the real dimensions.

Frame-Dump Mod
I ran AutoSplit’s source code via a Python interpreter (after installing Python 3.9 or later and following these instructions, specifically “install all dependencies” then “run the app directly”), and added code directly to the Python file AutoSplit.py:

Frame-Dump Mod Code

To def __init__(, add:

        # capture cache
        self.cacheSize = 90
        self.cache = [None for i in range(self.cacheSize)]
        self.cacheIndex = 0
        self.dumpIndex = 0

        # hacked hotkey
        keyboard.add_hotkey("/", self.startAutoSplitter)

To def reset(, add:

        folder = 'dump' + str(self.dumpIndex)
        self.dumpIndex += 1
        if not os.path.exists(folder): os.makedirs(folder)
        for i in range(self.cacheSize):
            if self.cache[(self.cacheIndex + i) % self.cacheSize] is not None:
                image = cv2.cvtColor(self.cache[(self.cacheIndex + i) % self.cacheSize], cv2.COLOR_BGRA2BGR)
                cv2.imwrite(folder + '/' + ('%02d'%(i%self.cacheSize)) + '.png', image)

To def getCaptureForComparison(, add:

        if self.startautosplitterButton.text() != 'Start Auto Splitter':
            self.cache[self.cacheIndex % self.cacheSize] = capture
            self.cacheIndex += 1

The third part makes AutoSplit keep references to the most recent 90 frames (3s) it read from the video feed, and the second makes the reset key dump those frames into a dump0, dump1, etc. subfolder within the folder AutoSplit was executed from. The first part adds an explicit "/" keybind to start AutoSplit without splitting.

Being able to create precise, reliable autosplitters is an important use case for AutoSplit, so I will say that this functionality, and the image comparison mod below, need to be added to the program at some point for it to be a serious tool.

Create Sample
My full sample was made by doing an Any% run and frame-dumping every shine. I ran AutoSplit with the video feed set up as earlier, pressed “/” to start it after starting the run, then for every shine get, timed a reset hotkey press about 2s before the target frame to try to capture it and the frames leading up to it in the 3s dump window. Then I pressed “/” to restart it, and continued.

Sort Images
I went through all 43 of the shine-get dumps from the Any% run and labelled each folder with an identifier plus the number of the frame within the folder corresponding to the shine-get frame, placing the latter after a ~ (e.g. A05b5~58 for a folder where 58.png is the shine-get frame). If the shine-get frame is missing from a sample, or the sample is unusable for any other reason, it can be discarded.
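As an optional sanity check on the labels, here is a tiny sketch (it assumes the labelled folders live in a sample/ directory, as the comparison script below also does) that parses the number after the ~ and confirms that frame actually exists in each folder:

import os

# Print each sample folder's labelled shine-get frame and whether that frame file exists.
for folder in sorted(os.listdir('sample')):
    if '~' not in folder or folder.endswith("'"):   # skip unlabelled or discarded (') folders
        continue
    shineFrame = int(folder[folder.rindex('~') + 1:])
    path = 'sample/' + folder + '/' + ('%02d' % shineFrame) + '.png'
    print(folder, shineFrame, 'ok' if os.path.exists(path) else 'MISSING')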

Image Comparison Mod
The code below, in def match(, is condensed from the def compare_l2_norm_masked( method in compare.py and def getCaptureForComparison( in AutoSplit.py, with the aim of being completely faithful to how AutoSplit actually calculates match ratings. def run( runs this function on every frame in the sample and generates a CSV file of results, shifting each sample’s column according to the number after the ~ in its folder name so that the frames line up across samples.

Image Comparison Mod Code
import os
import numpy as np
import cv2

UNIW, UNIH = 320, 240    # resolution every image is resized to before comparison
N = 90                   # number of frames per dump
TARGET = 'D'             # reference image to test: targets/D.png

def match(targetFile, captureFile):
    # get target, resize, generate mask
    target = cv2.imread(targetFile, cv2.IMREAD_UNCHANGED)
    if len(target[0][0]) != 4: raise Exception("Target image should have 4 channels.")
    target = cv2.resize(target, (UNIW, UNIH), interpolation=cv2.INTER_NEAREST)
    [lower, upper] = [np.array(bounds, dtype="uint8") for bounds in [[0, 0, 0, 1], [255, 255, 255, 255]]]
    mask = cv2.inRange(target, lower, upper)
    target = cv2.cvtColor(target, cv2.COLOR_BGRA2BGR)

    # get capture, resize
    capture = cv2.imread(captureFile, cv2.IMREAD_COLOR)
    capture = cv2.resize(capture, (UNIW, UNIH), interpolation=cv2.INTER_NEAREST)

    # compare
    error = cv2.norm(target, capture, cv2.NORM_L2, mask)    # component-wise norm
    normaliser = (3 * np.count_nonzero(mask) * 255 * 255) ** 0.5
    return 1 - (error / normaliser)

def run(targetFile):
    samples = [folder for folder in os.listdir('sample') if folder[-1] != "'"]
    offsets = [int(sample[sample.rindex('~')+1:]) for sample in samples]
    output = ','+','.join(samples)+'\n' # csv header
    for i in range(-max(offsets), -min(offsets)+N):
        output += '%02d,'%i
        for k, sample in enumerate(samples):
            j = i + offsets[k]
            if j >= 0 and j < N:
                x = match('targets/'+targetFile+'.png', 'sample/'+sample+'/'+('%02d'%j)+'.png')
            else:
                x = ''
            output += str(x) + ','
        output += '\n'
        print('.' if i % 10 != 9 else ':', end='', flush=True)
    with open(targetFile+'.csv', 'w', encoding='utf-8') as outFile:
        outFile.write(output)
        print('!')

run(TARGET)

Create Mask
I created the mask as part of creating the reference image itself: taking one target frame from the sample and editing it in Paint 3D (for example), setting a transparent canvas and then selecting and deleting every part of the image I wanted to exclude from the mask. This rectangular select-and-delete is quite laborious, and other image editors may fare better. In particular, to copy a mask from one image to another, I used GIMP, IIRC.
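If you’d rather script the mask-copying step, here is a rough OpenCV sketch of one way to do it (the file names are placeholders, and it assumes both images are the same resolution): it takes the alpha channel – i.e. the opaque-vs-transparent pattern that forms the mask – from an existing reference image and attaches it to a new frame dump.

import cv2

# Placeholder file names: old_reference.png carries the mask (alpha channel) to reuse;
# new_frame.png is a fresh frame dump from your own setup.
old = cv2.imread('old_reference.png', cv2.IMREAD_UNCHANGED)   # BGRA
new = cv2.imread('new_frame.png', cv2.IMREAD_COLOR)           # BGR

if old.shape[:2] != new.shape[:2]:
    raise Exception("Images must be the same resolution.")

# Keep the new frame's colours, but inherit the old image's opaque/transparent pattern.
out = cv2.cvtColor(new, cv2.COLOR_BGR2BGRA)
out[:, :, 3] = old[:, :, 3]
cv2.imwrite('new_reference.png', out)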

Findings

With this method, I investigated the sample of shine-get animation frame-dumps, to then pick and test masks targeting specific frames.

Shine Get Gallery
Firstly, I compared the shine-get animation between different samples. I found that it split the sample into 4 groups, within each of which the animation (text, Mario and shine orientations) was completely consistent. The theory says this is due to quarterframes (QFs): the shine-get animation is rendered every QF and may start on any QF cycle. Every fourth rendered QF is displayed, but the offset between the starting QF cycle and the video-render QF cycle means one of four possible sequences of frames gets displayed. Hence, I took four samples, one representing each QF cycle, and interleaved 69 frames of each to generate a 120Hz shine-get animation:

→ Shine Get Gallery (Italian)

(See also: → Shine Get Gallery (English)).

(Flick through the whole folder using the ←/→ keys while previewing an image, in order to load all previews and be able to smoothly frame-advance the animation in either direction.) The first QF of the shine-get visual (= the shine being “held up”) is numbered 200 by convention, meaning 200–203 represent the shine-get visual across all QF cycles.

QF Visual Identification
Looking at the 120Hz animation, it can be seen that the “Splendido” (or “Shine”, “Shine Get”) text moves every QF, the stars around the text move every 2 QFs, and the shine orientation changes twice per frame in an uneven 1:3 QF pattern. This gives rise to a set of consistent visual cues that identify the QF cycle (and so can theoretically be used to identify the QF an IL ended on).

The QF cycles are numbered 1–4 according to the remainder of the filename numbering when divided by 4 (identifying 0 with 4: cycle 4 is 64, 68, 72, …; cycle 1 is 65, 69, 73, …; and so on). The visual cues are then as follows (for Italian and English; this probably works for Japanese too):

These three cues entirely determine the QF cycle the shine-get cutscene is playing on. Summary:

I found that the patterns of match ratings for each frame dump were largely consistent between samples from the same QF cycle, so I grouped them as such in how I named their folders – e.g. C25s3~59 means QF cycle 3 (A–D = 1–4), sample #25 (Sirena 3 in Any%), where the shine-get frame was #59 in the sample.

Feature Selection
To recap the goal: we’re trying to create a mask that targets a specific frame in the shine-get animation, at or before the shine-get frame itself, and that rejects every frame before that. The addition of QF inconsistency means we have to uniformly target a set “A” of 4 successive QFs, and reject the entire set “B” of all QFs before those, all with the same mask. Specifically, we want to find as large an ε>0 as possible such that the minimum match rating across A (“α”) is ε greater than the maximum match rating across B (“β”), whence the threshold in AutoSplit will be set halfway between α and β. The greater the ε, the more reliable the autosplitter is in principle.

The visual that I found worked best to target was the first two letters of the shine-get text, after they had settled into place and the stars on them had both expanded enough to mostly not cover them. The mask is designed to capture as much of the blocks of colour within the letters and their black borders as possible, while avoiding the stars and the discoloration their glow brings to surrounding pixels. This targets QFs 191–194, which are 9 QFs ahead of shine-get (200–203): 2f on QF cycles 1, 2 and 4, and 3f on QF cycle 3, necessitating a delay in AutoSplit of average(2,2,2,3) = 2.25f (75ms). This averaged delay is janky and imprecise, but the text and shine-orientation animations are out of phase with each other, so it’s hard to do better.

Result
This is the reference image my real shine-get autosplitter is based on:

→ Results Table

The results table is generated by the image comparison script; I also grouped the samples into their 4 QF cycles. Initially, the numbers highlighted in aqua were aligned with the row marked “0” – the shine-get frame. I then shifted them to line up the numbers highlighted in pink – the target frame the autosplitter aims for, 2f or 3f earlier depending on the QF cycle (some individual samples also had to be shifted owing to frame drops/dupes). With this alignment, α, β and ε can easily be calculated. This reference frame has an ε of 0.05, which means it strongly distinguishes the target frame from the preceding ones, and I’m really happy with it. I use an AutoSplit threshold of 0.96, as this implies.
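If you’d rather compute α, β and ε than eyeball them, here is a rough sketch that reads the CSV produced by the comparison script. It assumes the columns have already been shifted so that the target frame sits on one row across all samples (as in my results table; in the raw CSV the target row differs by QF cycle), and the index of that row, targetRow, is a placeholder you supply yourself:

import csv

def alpha_beta_epsilon(csvFile, targetRow):
    # Parse the match-rating table: skip the header row, drop the row labels and any blank cells.
    with open(csvFile, encoding='utf-8') as f:
        reader = csv.reader(f)
        next(reader)
        rows = [[float(x) for x in row[1:] if x != ''] for row in reader]
    alpha = min(rows[targetRow])                              # worst match rating on the target frame
    beta = max(x for row in rows[:targetRow] for x in row)    # best match rating on any preceding frame
    return alpha, beta, alpha - beta

# Hypothetical usage: target frame on row 50 of D.csv; the threshold goes halfway between α and β.
a, b, e = alpha_beta_epsilon('D.csv', 50)
print(a, b, e, (a + b) / 2)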

Review

So that’s the setup. But how well does it work? With several seasons of Any% runs spanning 3 years, I can share both theoretical critique and real-world evidence.

Reliability

In my most recent Any% season, as of writing this, the autosplitter has correctly split 2186/2187 times, so 99.95% reliability. The one failed split was caused by Mario touching the Ricco 2 shine on the frame it spawned, which caused the IGT to get stuck on screen and block the visual the autosplitter looks for. So a little sandbagging here wouldn’t go amiss 💀.

The only other instances of missed splits are caused by AutoSplit failing to start up properly; it is very buggy software that can, for no apparent reason, fail to detect either the video feed or the reference image. But it’s easy to verify it has started up correctly by using the GUI to peep these two things, and it doesn’t break afterwards.

Thus, the autosplitter does not miss; however, per the results table above, it is theoretically possible (only if the target frame gets dropped by the video capture) for it to split late, as it would then hit the next matching frame, which is 4f later on QF cycles 1/2 (thankfully only 1f later on cycles 3/4). On my setup, looking at my sample, the chance of this seems to be under 3%.

Colour Shift

I had to resample an Any% run and regenerate the reference frame in Dec 2020 after the colours in my video capture shifted, which remains unexplained. It has kept working ever since, thankfully, without modification. I’d advise writing down all of your capture card’s video settings; but the fact that the autosplitter broke entirely shows how specific the reference image is to one’s setup, and why you need to generate it yourself.

Colour Bleed

Certain shine-get animations that take place over red backgrounds were less reliable in my autosplitter because the red bleeds into the black letter borders for some reason. I trimmed the amount of border included in the mask to fix this.

Jitter

AutoSplit and LiveSplit both espouse what I consider to be a fundamentally flawed timing model (for SMS at least), in that they send and detect signals in real time, fully decoupled from the generation of a video of the run, despite the latter being the standard of truth for timing. I wrote more about this in this essay. AutoSplit takes it one step further in that it polls an uncompressed feed of the capture card. It would be more consistent with the video if it instead parsed the video as it was being created, which needn’t be done in real time, since the timing lives in the video timestamps themselves. The only downside is this would introduce noise to the video feed and reference image, because of video encoding.

In practice, AutoSplit does have some jitter, from polling a video feed that could lag, from using an OS clock and from communicating with LiveSplit via piping; this all can cause timing deviations of ±3f vs the video.

Manual Start/End Splits

Automatic splitting at the start and end of an SMS Any% run would require entirely new image setups that don’t generalise to a large number of situations the way shine-get does, so I deemed them not worth the effort to make. It’s important to be mindful of the systematic offsets this causes in those (manual) split times vs the automated shine-get ones. A shine-get split signal triggers 2f after the visual appears on screen (with a typical USB3 capture card, since 2f is the latency of its preview). The start signal triggers 3f before its visual appears on screen (since 3f is the total input lag from the A-press that starts the run, which coincides with the start key input). The end signal triggers simultaneously with the screen (the end key is pressed in anticipation of the visual cue). Thus, Airstrip segments with this autosplitter are typically 5f too long, and Corona segments 2f too short; all others are unbiased.

“Instructions”

If you’ve come here interested in replicating my setup… you’ll probably give up, but let me try to summarise what it’d entail :p. Ensure you have AutoSplit installed (I use v1.6.1 with the LiveSplit component provided in the readme).

  1. Here’s the comparison image I use. You can use it for the mask (the pattern of opaque-vs-transparent pixels) if you are using Italian, but the actual colours have to come from your setup. Beware that I run SMS at 640×448 resolution whereas most use 660×448 – you can try adjusting the mask’s pixels manually or rely on AutoSplit’s automatic rescaling and hope it works.

  2. Set up the standardised video feed.

  3. Set up the frame-dump mod for AutoSplit, then follow the create sample instructions just below it to create at least one frame-dump of a shine-get animation from your setup, and apply the mask from my comparison image to it as per create mask. This should yield a comparison image that you can try out right away (I use threshold 0.96).

  4. To create a reliable comparison image, you would test candidates by following the rest of the instructions in that section – creating a full sample of 43 frame-dumps by doing an Any% run, then sorting through and identifying the shine-get frame in each one, then testing candidates against the sample via the script provided. To better understand what you’re doing (the patterns in the numbers, and how to fiddle with the mask itself), read the Findings section as well.