Elektrobank/Piku (EkPk)

This is a blueprint for a program that I’ve wanted to make since I first sketched the idea in March 2021 under the name Veep. Since then, I’ve spent over 3 years crippled by social abuse, so I’ve not been able to motivate myself to make it. I wouldn’t have been using it to speedrun myself, both because I’d be being hated for it and because it’d be buying too much time and so distracting me from plans to end my life. As such, it was also too hard for me to start without a big commitment, and would’ve probably been the most complex and best program I’ve ever made as a result. Still, I really think it should exist, so here’s step one, a blueprint.

Outline

The Veep sketch is a good introduction to the idea and technical side. Basically, the goal is to make a program that annotates a video with timings – splits, as they are known in speedrunning. These would primarily be sidecar files with metadata tied to video files, but in practice most videos would be disposed of after being processed, leaving just the timings, which accumulate in a set of historic data, akin to LiveSplit’s current splits files.

These timings are generated by recognising patterns in frames of the video, pulling out the timestamp of the matching frame and annotating it with what was detected. Thus, there is no wall clock as in LiveSplit, the data perfectly matches the video, and it can be generated either by passing in a completed video (retroactive retiming) or in chunks in real time (live timing). This fits the model, used for games like Super Mario 64 and Sunshine, where the video is the source of truth.

The program is thus divided into two halves.

Well, actually, first, the infrastructure needs to be built to serve the frames of an encoded video as an input stream of lossless images, which ideally supports seeking as well. Call that part 0.
Part 1 is the computer-vision-based state machine that processes a stream of frames and emits signals when it matches them. Different signals for different things, like say first black/white frame, first non-black/white frame, or a savebox. Call this Elektrobank.
Part 2 is the graph-based algorithm that parses this sequence of signals and semantically interprets it into a full run with a route and segments. Call this Piku. Piku’s job is to handle different routes and merge compatible segments together for statistics.

Take a list of splits output by Elektrobank and interpret it via Piku and you recover a traditional splits file. You know, except that:

It retimes to itself.
Each run has a different set of optional segments in it.
Splits can be triggered retroactively, even in live context.

These are just three of the many large flaws of LiveSplit-AutoSplit, which is a paradigm dating back 10 years and ought to be replaced. Other such flaws that would be fixed by this approach are:

The inability to edit splits efficiently (e.g. change the split frame);
Storing all data in RAM and losing it on crash;
The manual, setup-dependent and untestable construction of AutoSplit graphs (which are also linear-only).

The names of the programs come from the album Dig Your Own Hole; the word “Bank” reminds me of watching The Weakest Link and the process of saving down the progress made in a speedrun so far by emitting a split signal, and “Elektro” is for the futuristic automated luxury space communist nature of it all. And Piku is just the song after that.

Sounds cool? Let’s get stuck innn

Base Infrastructure

This section is less interesting than the next two; feel free to skip.

One of the things that stopped this project from getting off the ground is how painful it is to deal with video. AutoSplit worked best off a paradigm where OBS would fork the signal from a video capture device and send it both into the video and into a preview that AutoSplit would resample with screen-capture and time off a wall clock. Yikes. This is about the best that could be done with the amount of exclusive mode shart happening with those capture devices and DirectShow, all legacy stuff.

The input for EkPk is an already-encoded video, which both simplifies and complexifies things. Firstly, we want the final file to be the source. This is fine for retiming, but for live timing we run into the read/write contention problem and find ourselves asking for a data streaming interface. Things get fucked up quite fast:

Mp4/Mov containers don’t really support streaming (they get corrupted if OBS crashes…), we could use the fragmented variants I guess but it’s not as snappy as MPEG-TS (and you would want live retiming to be delayed by 2s at most vs real time). Idk the pros and cons of MPEG-TS but people aren’t used to storing files in that format. Maybe Matroska is the best of both worlds; idk.
There’s no way to pipe data out of OBS replay buffer so we may have to require recording everything to storage at first. An idea I had there is to reimplement replay buffer by recording to a file on a ramdisk, which could be a shout for those who value their storage’s endurance. Ofc, with splits that consist of original timestamps, a split file could be used to automatically crop the final video 👍. And the video could be copied at the end, from say an MPEG-TS file on a ramdisk to an MP4 on storage. Or a fragmented MP4 saved every minute in case of crash?
We cannot use OpenCV (our computer vision library) to directly ingest the video, because its interface is too simple and so doesn’t expose original timestamps from the video. If we instead use FFmpeg from the command line to expose frames as images for OpenCV to read, it doesn’t sound like the image2pipe muxer even distinguishes different frames/files, and I can’t imagine, in a live context, having to re-call FFmpeg on new incoming data with a pipe without duplicating the data, overrunning the pipe’s buffer, etc…
Maybe with a ramdisk, FFmpeg can spit out a feed of saved images, be re-called every 1s or so, and those images can be read in then deleted by OpenCV?

I think my best solution is to write a library in C that calls FFmpeg API functions and exposes a frame-streaming interface that way. Then EkPk can be written in a language of one’s choice (I’d use Go) with C bindings. There’s nothing really wrong with this, particularly if the C program is very narrowly-scoped, but it’s just hard to get right. Programming with contexts, malloc etc… would take a bit.

The interface we want is, I think, akin to file I/O. Our library would wrap a video into a stream of data where one atom (like a character in plain-text or a byte in binary) would be a frame, and it has just one read pointer that starts from 0 and ends at end-of-file (which shifts along as the video exposes more frames), and it can seek as well. Then when a frame is requested (by timestamp or by next()), our library would seek using FFmpeg av_seek_frame() in the former case, stay put in the latter, and then run av_read_frame() till it finds a packet in the selected video stream and decode that into a frame, that kind of thing.

The return value should be a frame as an image, as well as its timestamp (either pts_time, or pts together with the stream’s time base, since pts_time = pts × time_base).
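To make the file-I/O analogy a bit more concrete, here is a rough sketch of that interface, with a plain array standing in for the decoder. FrameStream, next(), seek(), append() and the frame fields are all made-up names for illustration; the real thing would sit on top of the C library and FFmpeg, and seeking there lands on keyframes rather than being this tidy. The conversion to seconds is the standard pts × time_base.

```javascript
// Hypothetical interface sketch: an array of decoded frames stands in for the FFmpeg-backed library.
class FrameStream {
  // frames: [{ pts, image }] sorted by pts; timeBase: { num, den }, a rational like FFmpeg's.
  constructor(frames, timeBase) {
    this.frames = frames;
    this.timeBase = timeBase;
    this.pos = 0; // the single read pointer
  }

  // Convert an integer pts to seconds: pts * num / den.
  seconds(pts) {
    return (pts * this.timeBase.num) / this.timeBase.den;
  }

  // Return the next frame with its timestamp, or null at the (current) end-of-file.
  next() {
    if (this.pos >= this.frames.length) return null;
    const frame = this.frames[this.pos++];
    return { image: frame.image, pts: frame.pts, seconds: this.seconds(frame.pts) };
  }

  // Move the read pointer to the first frame whose pts is >= the target pts.
  seek(pts) {
    const i = this.frames.findIndex(f => f.pts >= pts);
    this.pos = i === -1 ? this.frames.length : i;
  }

  // In a live context, newly decoded frames push the end-of-file along.
  append(frame) {
    this.frames.push(frame);
  }
}

const stream = new FrameStream(
  [{ pts: 0, image: "f0" }, { pts: 1500, image: "f1" }, { pts: 3000, image: "f2" }],
  { num: 1, den: 90000 } // 90 kHz, the MPEG-TS clock
);
stream.seek(1000);
const hit = stream.next(); // the frame at pts 1500
```

Keeping the integer pts next to the derived seconds means a splits file can store the exact value and only convert for display.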

Elektrobank

Aite then, here we go.

Elektrobank is pretty much a signal-emitter that runs a set of filter plugins against a stream of frames.

By filter, I mean in the sense of a boolean-valued pure function that takes a single frame and outputs whether it matches a pattern or not. IMO, these filters should be quite simple, like matching all black, or all white, or a single graphic (in the style of one AutoSplit image). Then Elektrobank, in order to emit a signal when the filter switches from off to on, only needs to track its state on the one previous frame. Well, it should also track cooldowns in the plugins, to disable them from spamming signals cos of noisy detections.
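A minimal sketch of that loop, assuming a made-up plugin shape of { name, cooldown, match } and frames reduced to a flat array of channel values. None of these names are settled; it only shows the off→on edge detection and the cooldown.

```javascript
// Hypothetical shapes: plugins are { name, cooldown, match(frame) }, frames are { pts, pixels }.
function runFilters(frames, plugins) {
  const signals = [];
  // Per-plugin state: whether it matched on the previous frame, and when it last emitted.
  const state = new Map(plugins.map(p => [p.name, { on: false, lastEmit: -Infinity }]));
  for (const frame of frames) {
    for (const p of plugins) {
      const s = state.get(p.name);
      const on = p.match(frame);
      // Emit only on an off -> on switch, and only once the cooldown (in pts units) has elapsed.
      if (on && !s.on && frame.pts - s.lastEmit >= p.cooldown) {
        signals.push({ type: p.name, pts: frame.pts });
        s.lastEmit = frame.pts;
      }
      s.on = on;
    }
  }
  return signals;
}

// A pure filter: true when every channel value is at or below a black-level threshold.
const allBlack = threshold => frame => frame.pixels.every(v => v <= threshold);

const frames = [
  { pts: 0, pixels: [200, 180, 190] },
  { pts: 1, pixels: [3, 2, 4] },       // goes black: signal
  { pts: 2, pixels: [90, 80, 85] },    // noisy flicker
  { pts: 3, pixels: [2, 2, 2] },       // black again, but inside the cooldown: suppressed
  { pts: 4, pixels: [200, 200, 200] },
  { pts: 9, pixels: [1, 0, 2] },       // black after the cooldown: signal
];
const signals = runFilters(frames, [{ name: "all-black", cooldown: 5, match: allBlack(16) }]);
```

Here signals ends up with two all-black entries, at pts 1 and pts 9. The filter itself stays stateless; all the memory lives in the loop around it.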

It should be Piku’s job to look back at the entire sequence of signals so far and interpret what they mean. For example, if we make filters that look for a bit of text that identifies which star has been collected (a picture for every star running on every frame in parallel threads… we can probably think of some performance optimisations later), then Piku’s algorithm would look at the sequence of signals and process it along the lines of:

// On a savebox, name the segment after the most recent text signal seen before it.
if (signals[i].type === "savebox") {
    segmentName = signals.slice(0, i).filter(x => x.type === "text").at(-1)?.level
}

This also works live because all such algorithms may only use the array of signals so far, tho they may also edit results retroactively.

Plugins need to be modular, whether they get compiled into little DLL files or are just JavaScript scripts that different people can make. They also need to be able to be calibrated, because different black levels will work for different people. In fact, the best way to calibrate a plugin is to let it run default settings on a video made on a given video capture setup (first specify the crop to the game feed), then go back and manually adjust the resulting splits file. Then, the plugin can learn (somehow) what to look for. So that raises two questions.

Firstly, for manual setting of splits (which needs to always be easy to do), we need some user interface. The splits file will contain run objects, which contain a list/dict of tuples of (timestamp, signal) (I’m not defining the format closely yet). Point is, Elektrobank should be able to read in a set of splits as well as the video they were generated from, and expose a ←/→-scrollable list of pairs of images of the sequential frames that triggered a signal. Then, the user can edit those by scrolling thru nearby frames or typing in a manual timestamp (a more complex interface would turn into an avidemux-style frame server with full seeking capability).

More than that, if the video gets changed (say, crop, or upload-then-download from YouTube), then Elektrobank should be able to estimate the new timestamps using an offset/rounding, and use FFmpeg to try to match these up to the closest ones in the new video, and so generate a new split file for it automatically.
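The offset-then-snap idea could look something like this. It assumes we already have the list of frame timestamps in the new video (however FFmpeg ends up handing them over) and a known offset between old and new; nearest() and remapSplits() are hypothetical names.

```javascript
// Find the value in a sorted array closest to target (binary search).
function nearest(sorted, target) {
  let lo = 0, hi = sorted.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  // lo is now the first element >= target (or the last element); check its left neighbour too.
  if (lo > 0 && target - sorted[lo - 1] <= sorted[lo] - target) return sorted[lo - 1];
  return sorted[lo];
}

// Shift every split by the offset, then snap it to a timestamp that exists in the new video.
function remapSplits(splits, offset, newFramePts) {
  return splits.map(s => ({ ...s, pts: nearest(newFramePts, s.pts + offset) }));
}

// Example: the new video was cropped to start 2000 units later and its frame grid shifted slightly.
const oldSplits = [
  { signal: "all-white on", pts: 2344 },
  { signal: "all-white off", pts: 2364 },
];
const newFramePts = [340, 345, 350, 355, 360, 365];
const newSplits = remapSplits(oldSplits, -2000, newFramePts);
```

In this example the two splits land on 345 and 365. A real version would probably want to re-run the filters around each estimate to confirm the match, rather than trusting the arithmetic alone.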

Right so, in this way, this becomes known, correct data. To calibrate a black level from this, we can probably just average its RGB values and add some overhead. For more complicated applications, this becomes training data, and lends itself well to becoming a machine-learning model. Without experience of all OpenCV has to offer, I can’t comment further, but that’s the idea.
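For the black level case, a minimal sketch could be as follows. It assumes a frame is just an array of [r, g, b] pixels, and it compares the frame's mean level to the threshold rather than checking every pixel, which is a simplification. All function names are made up.

```javascript
// Assumption: a frame is an array of [r, g, b] pixels with 0-255 channel values.
function meanLevel(pixels) {
  let sum = 0;
  for (const [r, g, b] of pixels) sum += r + g + b;
  return sum / (pixels.length * 3);
}

// Average the level over frames the user confirmed as black, then add some overhead.
function calibrateBlackLevel(knownBlackFrames, overhead) {
  const total = knownBlackFrames.reduce((acc, pixels) => acc + meanLevel(pixels), 0);
  return total / knownBlackFrames.length + overhead;
}

// Build a filter from the calibrated threshold.
const isAllBlack = threshold => pixels => meanLevel(pixels) <= threshold;

// Two tiny "frames" taken from the manually corrected splits.
const confirmedBlack = [
  [[10, 12, 14], [12, 12, 12]],
  [[8, 10, 12], [14, 12, 10]],
];
const threshold = calibrateBlackLevel(confirmedBlack, 8);
const matchBlack = isAllBlack(threshold);
```

With these numbers the two frames average 12 and 11, so the threshold comes out at 19.5.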

To recap: minimum viable Elektrobank is thus:

black/white detection plugins with rgb thresholds
streams a sequence of frames and creates a sequence of the signal changes the plugins detect
parses signal files and seeks to the signals in the video, allowing them to be manually edited

That last point is a retiming fundamental – if you have auto-generated timestamps, you need to be able to efficiently inspect the frames they were generated from.

A run would then be a sequence like:

signal: pts of off→on
...
all-white on: 2344
all-white off: 2364
savebox: 2370
all-black on: 2390
all-black off: 2431
...

Later, we could add text detection plugins and so on. We should also be able to enable/disable plugins when running Elektrobank, thus allowing it to, say, add a new kind of signal to an existing splits file, adding detail to the timings.

Piku

Piku interprets a list of signals like the one in the example right above. This is very game-dependent, so let’s look at some simple examples.

Altho the savebox signal is not so useful as an end product in a run that has all black/white signals marked (i.e. every load), it is defo useful for interpreting splits because it appears exactly once per level in Super Mario Sunshine. This means that, during a run, we have a list of savebox signals so far demarcating levels.

Piku is given some metadata on the run (also in plugin form). For example, for SMS Any% (no ACE), it should expect the shines to go in mostly fixed orders, like Airstrip, then Bianco 2–5. It can fork for Bianco 6 and 7, else proceed regularly to Gelato. Then there are different world orders in the second half. This lends itself to a fairly simple flowchart. Piku mostly knows which level is next by counting saveboxes, so it can match smaller segments and loads within each level easily.

Of course, if Piku can figure out the exact level purely from signals, say some text (more applicable to SM64), then the constraints on level order can be relaxed to allow more goofy shit like say optional 8th shines in each world.

If there is ambiguity, say from using a simple algorithm that only interprets savebox signals, then Piku needs to expose an interface to allow a runner to select in real time the current level. This would be, default to a standard one, and have a button toggle alternatives. Then Piku’s algorithm serves up a next default one, say with the constraint of every level being done only once, and we’re on our way. This is particularly applicable to something like SM64 70 Star.
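Going back to the SMS flowchart, here is a toy version of the savebox-counting approach with a runner-selectable fork. The route object only encodes the bit of route described above and is not a real route file; labelLevels() and the choices parameter (standing in for the toggle button) are hypothetical.

```javascript
// Toy route graph (illustrative only): each level lists its possible successors, default first.
const route = {
  "airstrip": ["bianco-2"],
  "bianco-2": ["bianco-3"],
  "bianco-3": ["bianco-4"],
  "bianco-4": ["bianco-5"],
  "bianco-5": ["gelato", "bianco-6"], // the fork: carry on to Gelato, or do Bianco 6 and 7 first
  "bianco-6": ["bianco-7"],
  "bianco-7": ["gelato"],
  "gelato": [],
};

// Each savebox closes the current level; choices maps a level to the index of the successor picked.
function labelLevels(signals, route, start, choices = {}) {
  const levels = [];
  let current = start;
  for (const s of signals) {
    if (s.type !== "savebox" || current === undefined) continue;
    levels.push({ level: current, endPts: s.pts });
    const successors = route[current];
    current = successors[choices[current] ?? 0]; // undefined once the graph runs out
  }
  return levels;
}

// Seven saveboxes, interleaved with other signals that this simple interpreter ignores.
const signals = [];
for (let i = 1; i <= 7; i++) {
  signals.push({ type: "all-black on", pts: i * 1000 - 20 });
  signals.push({ type: "savebox", pts: i * 1000 });
}

const byDefault = labelLevels(signals, route, "airstrip");
const withFork = labelLevels(signals, route, "airstrip", { "bianco-5": 1 });
```

Since the function only ever reads the signals so far, flipping a choice and re-running it relabels everything after the fork, which is the retroactive editing mentioned earlier.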

The format Piku stores timings in is a bit more specific than Elektrobank. In the latter, we pretty much just have signal: timestamp. In the former, we would want to switch that to segments rather than splits. So it would be keyed by the triplet:

(start-event, end-event, intermediate-events)

The value would now be a duration, either a difference of pts integers or converted to seconds (it doesn’t make much difference since we’d keep them full-precision, i.e. microseconds). This item of data is saying the run went from start-event to end-event in a duration, and intermediate-events happened in between. This triplet identifies identical segments, basically. So, for example, a Ricco 1 segment is different if it started in Pinna 7 vs Bianco 7. It is also different if a blue coin was picked up in between. Piku for 120 Shines would be responsible for adding together all the blue coins in these segments and making sure they were all collected once, and runners would likely tab thru current segments if they deviated from default blue coin orders as they went.

The triplet format allows you to easily figure out routes and segment timings. Say,

(p2r,p3,otgna): 1:45.567
(p3,p4,0): 1:00.394
(p4,p5,f): 1:30.521

Where every blue coin is assigned a specific letter. Runners know not to compare, say, (p2r,p3,otgna) with (p2r,p3,a), cos the latter will be faster but those blues then have to be collected in (p4,p5,fotgn). But the sum of the two segments is comparable. You get me?
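As a sketch of how those triplets might be built, assume Piku has already turned signals into an ordered list of named events, each flagged as either a segment boundary or an intermediate event (a blue coin letter here). The event shape and toSegments() are made up, the pts values are in milliseconds for readability, and intermediate events are kept in order of occurrence rather than treated as a set.

```javascript
// events: ordered [{ name, pts, boundary }]; boundary events start/end segments, the rest are intermediate.
function toSegments(events) {
  const segments = [];
  let start = null;
  let between = [];
  for (const e of events) {
    if (!e.boundary) {
      if (start) between.push(e.name); // ignore anything before the first boundary
      continue;
    }
    if (start) {
      // "0" marks a segment with no intermediate events, as in the example above.
      const key = `(${start.name},${e.name},${between.join("") || "0"})`;
      segments.push({ key, duration: e.pts - start.pts });
    }
    start = e;
    between = [];
  }
  return segments;
}

const coin = (name, pts) => ({ name, pts, boundary: false });
const events = [
  { name: "p2r", pts: 0, boundary: true },
  coin("o", 20000), coin("t", 35000), coin("g", 50000), coin("n", 70000), coin("a", 90000),
  { name: "p3", pts: 105567, boundary: true },
  { name: "p4", pts: 165961, boundary: true },
  coin("f", 200000),
  { name: "p5", pts: 256482, boundary: true },
];
const segments = toSegments(events);
```

This reproduces the three example segments: 105567 ms is 1:45.567, 60394 ms is 1:00.394 and 90521 ms is 1:30.521.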

Piku would be responsible for merging down segments with identical starts (e.g. it makes no difference which shine was previous among shines from the same world). Sum of best would be calculated like that, across potential routes. Piku could even hint at the optimal route using its defaults.
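A rough sketch of the sum-of-best side of this, assuming segments are stored as { key, duration } with the key written as a string and durations in milliseconds. The normalise hook is where merging of equivalent starts would go; it defaults to doing nothing because that logic is game-specific. All names and all durations below are made up.

```javascript
// history: segments from many runs. normalise(key) can map equivalent keys onto one merged key.
function bestByKey(history, normalise = key => key) {
  const best = new Map();
  for (const { key, duration } of history) {
    const k = normalise(key);
    if (!best.has(k) || duration < best.get(k)) best.set(k, duration);
  }
  return best;
}

// Sum of best for one route (a list of segment keys); null if some segment has never been run.
function sumOfBest(best, routeKeys) {
  let total = 0;
  for (const key of routeKeys) {
    if (!best.has(key)) return null;
    total += best.get(key);
  }
  return total;
}

// Invented history covering two ways of splitting the same blue coins across segments.
const history = [
  { key: "(p2r,p3,otgna)", duration: 105567 },
  { key: "(p2r,p3,otgna)", duration: 104900 },
  { key: "(p3,p4,0)", duration: 60394 },
  { key: "(p3,p4,0)", duration: 61000 },
  { key: "(p4,p5,f)", duration: 90521 },
  { key: "(p2r,p3,a)", duration: 80000 },
  { key: "(p4,p5,fotgn)", duration: 118000 },
];
const best = bestByKey(history);
const routeA = sumOfBest(best, ["(p2r,p3,otgna)", "(p3,p4,0)", "(p4,p5,f)"]);
const routeB = sumOfBest(best, ["(p2r,p3,a)", "(p3,p4,0)", "(p4,p5,fotgn)"]);
```

With these invented numbers route A sums to 255815 and route B to 258394, so the individual segments aren't comparable but the totals are, and Piku's default would lean towards A.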

It’s not easy to program these algos! But they are at least deterministic, rather than working with noisy signal detection as in Elektrobank. You would start small, just with LiveSplit-style linear splits, say savebox-only like current SMS autosplitters, and then add in more sophisticated interpreting. I cannot imagine what any of this would look like without sitting down to write it and iterate on it, so this blueprint and your imagination are the cutting-edge for now.

I would say, a final splits file should have this kind of format for a run:

a base timestamp
a sequence of segments (triplet → duration)
+ any metadata

And Piku would be able to convert these back to original (split) timestamps for Elektrobank to use to seek thru the original video. The conversion is basically, well, the splits file should probably order segments, and then their durations can just be summed with the base timestamp. As such, if more granular signals are used (for say every load) then the segments should be partitioned to be unique rather than repeated. The events would then have names like b3e-start, b3e-end, b3s-start, b3s-stop, b3s-savebox, b3s-end. The useful data can be figured out by the user with a set of provided scripts to merge segments, export data… This, and indeed the styling of the live splits view, are all secondary concerns more easily extended from the solid basis described here.
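The conversion back to split timestamps is just a running sum, sketched here under the assumption that a run is stored as { base, segments } with segments in order and durations in the same units as the base timestamp. The field names and toSplits() are placeholders, since the format isn't pinned down yet.

```javascript
// run: { base, segments: [{ key, duration }] }; returns the timestamp at which each segment ends.
function toSplits(run) {
  let t = run.base;
  return run.segments.map(({ key, duration }) => {
    t += duration;
    return { key, pts: t };
  });
}

const run = {
  base: 1000, // timestamp of the run's first event in the original video
  segments: [
    { key: "(p2r,p3,otgna)", duration: 105567 },
    { key: "(p3,p4,0)", duration: 60394 },
    { key: "(p4,p5,f)", duration: 90521 },
  ],
};
const splits = toSplits(run);
```

Here the splits come out at 106567, 166961 and 257482. Since durations are stored as exact integer differences, the round trip gives back the original timestamps with no drift.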

Closing Remarks

When I first started a draft of Veep, the precursor to this blueprint, I installed, like, Rust, WASM and React. I just got lost in it. If I made EkPk, I would stick to the basics and have it be a console program. It would be at least necessary to render frames to tab between, so it might turn into a GUI later. But at its heart, it’s just a bot, and it can emit live timing info to a console in different colours and that’s good enuff really. It’s thru such an approach that I’ve made reliable and well-abstracted programs in the past, rather than going all-in on constructing a monolith from scratch.

And I would make this if I wanted to speedrun RTA again, at least getting it up to parity with the current savebox-based SMS autosplitters. Because I have this conviction that speedruns should just be automatically timed without any effort from the user, and that the process for doing this should be runnable live or retroactively with identical results. Basically, all the info at your fingertips, with none of it lost if you made the wrong decision in the moment. Plus, Super Mario Sunshine is in dire need of loadless timing, and Super Mario 64 of star splits that don’t have to be skipped for backup star orders. Speedrun timing’s just a problem that hasn’t been abstracted correctly and the existing solutions have worn on me.