This is a set of notes about the Media Kit's current limitations and possible improvements, attempting to extract the core ideas that led Barrett to attempt writing a Media2 and a Codec Kit. The goal is to see how much of this can be implemented while preserving the current APIs, and what needs to be moved to R2 or a new kit.
Media nodes are a pain to work with: there is a lot of boilerplate code to write just to get one up and running. An easier API could be considered, perhaps as a wrapper around the existing infrastructure.
The code for realtime streams and the code for media file encoding/decoding are not clearly separated, although the implementations are in fact largely independent. Without going to the extreme of two separate kits, at least moving the source files to different subdirectories may clarify this a little. Whether this separation is the right one is questionable, however: for example, DirectShow (https://en.wikipedia.org/wiki/DirectShow) implements file reading (the counterpart of BMediaFile), splitting (the counterpart of BMediaTrack) and decoding as separate "filters".
Non audio/video use
The two main use cases here are subtitles (not very complicated, just a new kind of stream that goes in the same direction as audio and video data), and DVD/Blu-ray menus. Blu-ray is quite complicated (essentially, Blu-ray discs can run arbitrary code in a Java virtual machine), so it's questionable whether the Media Kit is the appropriate place to support this. DVD might be simple enough to be workable, but requires getting mouse and keyboard events to run the menu, and these go in the reverse direction from the usual media stream.
Another possible one is MIDI events, which are currently covered by a separate kit but are also a realtime stream of arbitrary data.
BMediaFile (as the name implies) was not really designed with streaming in mind. Streaming has been implemented anyway, but there are limitations. In particular, sniffing the file is currently done synchronously in the constructor and blocks until data has been downloaded, and there is no way to manage the download buffer to get proper seeking behavior. It looks like neither of these needs an API change, however.
The current API was designed in the days of AVI and QuickTime and revolves around 4CCs and other limited ways to identify content types. MIME types might be used instead, but they may not be appropriate for stream encoding formats (they are designed more for file formats). In any case, a review of how we specify the types is probably needed.
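For readers unfamiliar with 4CCs, here is a minimal sketch of how such a code is usually built: four ASCII characters packed into one 32-bit integer. The function names are hypothetical and not part of the Media Kit, and the byte order (first character in the most significant byte) is an assumption, since it differs between container formats.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: pack four ASCII characters into a 32-bit code.
// Assumption: the first character goes in the most significant byte.
constexpr std::uint32_t MakeFourCC(char a, char b, char c, char d)
{
	return (static_cast<std::uint32_t>(static_cast<std::uint8_t>(a)) << 24)
		| (static_cast<std::uint32_t>(static_cast<std::uint8_t>(b)) << 16)
		| (static_cast<std::uint32_t>(static_cast<std::uint8_t>(c)) << 8)
		| static_cast<std::uint32_t>(static_cast<std::uint8_t>(d));
}

// Hypothetical helper: turn a packed code back into its four characters.
std::string FourCCToString(std::uint32_t code)
{
	std::string text(4, ' ');
	text[0] = static_cast<char>((code >> 24) & 0xFF);
	text[1] = static_cast<char>((code >> 16) & 0xFF);
	text[2] = static_cast<char>((code >> 8) & 0xFF);
	text[3] = static_cast<char>(code & 0xFF);
	return text;
}
```

The whole identifier is those four characters: there is no room left for anything else, such as a variant or profile of the format, which is one way of seeing why this kind of identification is limited.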
It would be great to make sure we are not restricted to ffmpeg. This may include unusual use cases (for example, an 8-bit computer emulator wanting to encode or decode tape data from a tape image, no matter whether it was recorded as a WAV file, played from the computer's line-in, or loaded from a more compact, tape-specific image format).
This creates a problem (probably already causing some bugs for us) because ffmpeg is not designed for its file multiplexing and encoding/decoding parts to be used independently. The ffmpeg API makes it somewhat possible, but we have to forward a lot of ffmpeg-specific data around. For example, sometimes the timestamps for video frames are known at the file level, and sometimes at the track level.
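The timestamp case can be illustrated with a small sketch. Everything in it is hypothetical: FrameTiming and ResolveTimestamp exist neither in the Media Kit nor in ffmpeg, the unit (microseconds) is an assumption, and so is the choice to prefer the file-level value when both are present. It only shows the kind of data that has to travel between the two layers when either one may be the one that knows the answer.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical type: timing information that travels with one video frame.
// Either layer may or may not know the timestamp, hence the optionals.
struct FrameTiming {
	// Microseconds, filled in when the file (container) level knows it.
	std::optional<std::int64_t> fileLevelTime;
	// Microseconds, filled in when the track (stream) level knows it.
	std::optional<std::int64_t> trackLevelTime;
};

// Hypothetical policy: use the file-level timestamp if there is one,
// otherwise fall back to the track-level one. The result is empty when
// neither layer knows the timestamp.
std::optional<std::int64_t> ResolveTimestamp(const FrameTiming& timing)
{
	if (timing.fileLevelTime.has_value())
		return timing.fileLevelTime;
	return timing.trackLevelTime;
}
```

A code path that only forwards one of the two fields silently loses timestamps for the files where the other layer was the one that knew them.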