rePlayer internals: Subtitles

Notice

The Japanese version of this article is available here.

The main feature of rePlayer, my video player for Resonite, is the ability to load subtitles from YouTube. The current version of rePlayer can even handle the advanced subtitles that are especially useful for music videos. In this post, I’d like to talk a bit about the history of this feature and how each version worked.

Version 1: The Beginning

The first version of the subtitle feature I made during development was mostly restricted to simple subtitles. When I started working on it, I remembered there was a useful component called DynamicSubtitleProvider that could load regular subtitle files.

To summarize, it worked like this:

  1. User picks a subtitle from the list
  2. URL with video ID and requested language gets filled into the component’s AssetUrl
  3. Server downloads subtitle file from YouTube in the SRT format and transforms the text formatting tags (for bold, italics and color) to make them parseable by Resonite
  4. Server sends the resulting file to the game
  5. Resonite internally parses the SRT file and turns it into an animation with one string track
  6. Animator component drives the text on the subtitle slot, which displays it to the user
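The tag transformation in step 3 can be sketched roughly like this in Rust (the v1 server was actually TypeScript at this point, and the exact tag set rePlayer handles is my assumption; here, `<b>` and `<i>` pass through unchanged, while SRT-style `<font color="...">` is mapped to the Unity-style `<color=...>` tag that Resonite's text renderer understands):

```rust
// Sketch: normalize SRT-style formatting tags into Unity-style rich text
// tags. Only the <font color="..."> -> <color=...> mapping is handled;
// <b> and <i> are already in the form Resonite expects.
fn normalize_tags(cue: &str) -> String {
    let mut out = String::with_capacity(cue.len());
    let mut rest = cue;
    while let Some(start) = rest.find("<font color=\"") {
        out.push_str(&rest[..start]);
        let after = &rest[start + "<font color=\"".len()..];
        if let Some(end) = after.find("\">") {
            out.push_str("<color=");
            out.push_str(&after[..end]);
            out.push('>');
            rest = &after[end + 2..];
        } else {
            // Malformed tag with no closing quote: keep it as-is.
            rest = &rest[start..];
            break;
        }
    }
    out.push_str(rest);
    out.replace("</font>", "</color>")
}
```

A real implementation would also need to cope with the broken files mentioned below, so treat this as the happy path only.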

However, I quickly ran into problems with this approach: Because this is Resonite and nothing is sacred, it simply failed to parse the subtitles sometimes. This was made worse by the fact that the format of YouTube’s SRT files was inconsistent and sometimes outright broken, but even when they weren’t broken, Resonite still sometimes failed to parse them.

I started to look for a different solution… Also, not much code was written yet, and TypeScript really started to piss me off, so I took the opportunity to rewrite it in Rust. I know rewriting things in Rust is kind of a meme, but I genuinely just don’t like TypeScript at all. :(

Version 2: Weird Territory

During this whole ordeal, I was talking to Vlams, and I suddenly had an idea: The DynamicSubtitleProvider turns subtitles into an Animation anyway, so maybe I could skip the in-game parsing step and serve an Animation directly?

I remembered AnimX, the custom binary format that Resonite uses for animations. After all, a StaticAnimationProvider also just loads assets from a URL. What if my server could serve AnimX directly over a URL?

I told Vlams about this idea, and they offered to write a library that could write (and parse!) AnimX for me. At that point, Vlams had never even touched Resonite, but even so, their first attempt at writing AnimX from the spec was already almost correct. After I did some more testing for them, Vlams got it working and published it on GitHub: https://github.com/vlams1/resonite-core

With the AnimX writing solved quickly by Vlams, I implemented the rest of what was needed to make it work. The full procedure was similar to the last one, except I used WebVTT instead of SRT, and I sent AnimX to the game instead of a regular subtitle file.

Now, it worked like this:

  1. User picks a subtitle from the list
  2. URL with video ID and requested language gets filled into the StaticAnimationProvider’s AssetUrl
  3. Server downloads subtitle file from YouTube in the WebVTT format and transforms the text formatting tags (for bold, italics and color) to make them parseable by Resonite
  4. (New!) Server turns subtitle cues into AnimX keyframes, sometimes inserting empty cues for timestamps with no text
  5. Server sends the resulting AnimX file to the game
  6. Animator component drives the text on the subtitle slot, which displays it to the user
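Step 4 above (cues to keyframes, with empty cues filling the gaps) can be sketched like this. Note that the `Keyframe` type is a stand-in for illustration, not the actual API of vlams’ resonite-core:

```rust
// Sketch: turn parsed subtitle cues into keyframes for one string track.
// A cue is (start_seconds, end_seconds, text).
#[derive(Debug, PartialEq)]
struct Keyframe {
    time: f32,    // seconds from the start of the video
    text: String, // text shown from this time until the next keyframe
}

fn cues_to_keyframes(cues: &[(f32, f32, &str)]) -> Vec<Keyframe> {
    let mut frames = Vec::new();
    let mut last_end = 0.0f32;
    for &(start, end, text) in cues {
        // Insert an empty keyframe for any gap with no subtitle text,
        // so the previous cue doesn't linger on screen.
        if start > last_end {
            frames.push(Keyframe { time: last_end, text: String::new() });
        }
        frames.push(Keyframe { time: start, text: text.to_string() });
        last_end = end;
    }
    // Clear the text after the final cue ends.
    frames.push(Keyframe { time: last_end, text: String::new() });
    frames
}
```

Since a sampled string track only changes value at keyframes, the empty keyframes are what make cues actually disappear instead of sticking around until the next one starts.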

This had its own problems, but this time the issues were caused only by YouTube itself. Like with SRT, the WebVTT files served by YouTube were sometimes broken, and I had to work around that as best I could. It worked most of the time, but I wasn’t satisfied with it. Also, videos that displayed multiple subtitles on the screen at the same time didn’t work properly. The thing that bothered me the most, though, was the lack of support for YouTube’s special features, like text position, font changes, and so much more…

That’s why I came up with something else, again.

Version 3: The Secret Ingredient is Crime

Since YouTube’s attempts to convert subtitles to conventional formats always caused trouble, I decided to process YouTube’s own subtitle format directly. YouTube has two internal subtitle formats, and I had to pick one of them: SRV3 and JSON3 (guess what this one is based on). Both store the same data under different names (JSON3 uses sane names, while SRV3 names everything with two-letter tags). I decided to use JSON3. This was the easiest choice of my life, since SRV3 is XML-based, and I vowed to never touch XML again unless I have no choice.

Parsing the JSON3 files was extremely easy, since it’s just JSON. The interesting part was trying to transform it into something I could use in Resonite. It didn’t take me long to figure out how to do that. I ended up using AnimX again, but this time, I used multiple animation tracks.
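For reference, a JSON3 payload looks roughly like this (a minimal, hand-written example from memory; real files carry more fields, such as window positions and styling, which is exactly the data the tracks below are built from):

```json
{
  "events": [
    {
      "tStartMs": 1200,
      "dDurationMs": 2500,
      "segs": [{ "utf8": "Hello " }, { "utf8": "world" }]
    }
  ]
}
```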

My nefarious scheme worked something like this: One track for text, one for vertical position, one for horizontal position, one for shadow color, one for outline color, one for the decoration mode (none, outline, shadows), one for the background color… and a few more. There was one more track which served as metadata. This animation track stored the maximum count of subtitle tracks that are displayed at the same time. The final piece of the puzzle was the ProtoFlux needed to make it all work.

AnimX allows animation tracks to have some metadata. They can belong to a Node and describe a Property. ProtoFlux can use the name of the animation node and the name of the property to find the index of the animation track using FindAnimationTrackIndex. Using the index, it’s possible to get a value from the animation track.

First, I use this to read the property “trackCount” of the node “Metadata”. I use the value to duplicate a subtitle track template Slot. All of the subtitle track Slots are children of a NestedCanvas, and each of them displays exactly one subtitle track with all of its information. Every subtitle track has ProtoFlux which reads the needed information from the generated animation. Every subtitle track has its own node in the AnimX file, and each node has properties which contain all the information needed to display the subtitles. For example, the first track’s node is called Subtitle0. The ProtoFlux uses the subtitle track Slot’s index to determine which node to read the subtitle data from, so in the case of the third Slot, it would use Subtitle2 as the node. The property names are always the same, so once it has the node name, it can easily get the rest of the data from the animation. This data is sometimes processed a little with ProtoFlux to better mimic the look of the subtitles on YouTube.

Here’s a full list of the properties used for each animation node:

The rest is pretty simple: the subtitle track Slot’s ProtoFlux samples all of these properties according to the video’s current playback time.
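The node/property addressing scheme on the server side can be sketched like this; the property names here are illustrative stand-ins based on the tracks described above (the post only names a subset of them), not the exact identifiers rePlayer uses:

```rust
// Sketch: each subtitle track i gets its own AnimX node "Subtitle{i}",
// plus one shared "Metadata" node whose "trackCount" property tells the
// in-game ProtoFlux how many template Slots to duplicate.
fn track_metadata(track_count: usize) -> Vec<(String, &'static str)> {
    let mut tracks = vec![("Metadata".to_string(), "trackCount")];
    let properties = [
        "text", "verticalPosition", "horizontalPosition",
        "shadowColor", "outlineColor", "decorationMode", "backgroundColor",
    ];
    for i in 0..track_count {
        let node = format!("Subtitle{i}");
        for prop in properties {
            // One animation track per (node, property) pair.
            tracks.push((node.clone(), prop));
        }
    }
    tracks
}
```

Because the property names are identical across nodes, the in-game ProtoFlux only needs the Slot index to build the node name and can then look up every track with FindAnimationTrackIndex.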

  1. User picks a subtitle from the list
  2. URL with video ID and requested language gets filled into the StaticAnimationProvider’s AssetUrl
  3. (New!) Server downloads subtitle file from YouTube in the JSON3 format and transforms the text formatting tags (for bold, italics and color) to make them parseable by Resonite
  4. (New!) Server checks how many simultaneous subtitle tracks will be needed to display the subtitles correctly
  5. (New!) Server turns subtitle cues into AnimX keyframes, sometimes inserting empty cues for timestamps with no text, and creates some more subtitle tracks for the other properties
  6. Server sends the resulting AnimX file to the game
  7. (New!) ProtoFlux reads the count of needed subtitle tracks (multiple tracks are only needed if more than one subtitle is displayed at the same time), then duplicates the subtitle track template Slot if needed
  8. (New!) ProtoFlux attached to each subtitle track Slot drives the text and other values, which control what’s being displayed
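Step 4, figuring out how many simultaneous subtitle tracks are needed, boils down to finding the maximum number of overlapping cue intervals. A simple way to do that (a sketch, not necessarily how rePlayer implements it) is an event sweep:

```rust
// Sketch: count the maximum number of cues on screen at once.
// A cue is a half-open interval (start_ms, end_ms).
fn max_simultaneous_cues(cues: &[(u64, u64)]) -> usize {
    let mut events: Vec<(u64, i32)> = Vec::new();
    for &(start, end) in cues {
        events.push((start, 1)); // cue appears
        events.push((end, -1));  // cue disappears
    }
    // Sorting by (time, delta) processes disappearances before
    // appearances at the same timestamp, so back-to-back cues
    // don't count as overlapping.
    events.sort_by_key(|&(time, delta)| (time, delta));
    let (mut current, mut peak) = (0i32, 0i32);
    for (_, delta) in events {
        current += delta;
        peak = peak.max(current);
    }
    peak as usize
}
```

The result is what ends up in the “trackCount” metadata property, so the in-game side knows how many template Slots to duplicate before any cue needs them.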

With this method, we are now able to display YouTube subtitles, even if they use advanced features, and even if there are multiple subtitles on the screen at the same time! Though it’s more complicated than the other methods, it’s more accurate and much more reliable. This took a lot of time and effort, but I think it was definitely worth it.

The End

Thank you for reading! I hope you enjoyed this little post about the rePlayer internals. If there’s anything else that would be interesting to cover, please let me know! And, don’t forget to try rePlayer yourself! :)