Media Contexts

Media streams change. You might be processing an inbound transport stream (or RTMP connection, WebRTC stream, etc.) whose content changes over time. One relatively common case is that audio streams come and go. For example, one program might have both English and Spanish audio, but the next program in the schedule might only contain English audio (or it might contain audio with no language tag, or it might even contain no audio at all). It’s the Norsk Context message that tells you about these changes so that your code can respond.

Let’s take a quick look at how a Context Message is defined in Norsk’s media.proto file.

/**
 * The Context message is sent from Norsk in response to a
 * change to either the Media Node's inbound or output set of streams.
 * Once received, your code *must* acknowledge the context change
 * with a call to <ref>Media.UnblockCall</ref>,
 * passing in the <code>blockingCallRef</code>; note that if using the
 * JavaScript SDK then this is automatically handled.
 */
message Context {
  repeated StreamMetadata streams = 1; // The set of streams
  string blocking_call_ref = 2; // The reference for acknowledging the context change
}
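
As a quick sketch of what this looks like from client code, the TypeScript below models the two fields of Context and acknowledges the change. The Context type and the client.unblock method are illustrative stand-ins invented for this example (unblock standing in for Media.UnblockCall); they are not the actual SDK API, and as noted above the JavaScript SDK handles the acknowledgement for you.

// Illustrative shape of a Context as seen by client code (not the SDK's own type).
interface Context {
  streams: unknown[];      // the current set of streams; StreamMetadata is described next
  blockingCallRef: string; // the reference used to acknowledge this context change
}

// A hypothetical low-level client exposing something akin to Media.UnblockCall.
interface MediaClient {
  unblock(blockingCallRef: string): Promise<void>;
}

// Every context change must be acknowledged; the JavaScript SDK does this for you.
async function handleContext(client: MediaClient, context: Context): Promise<void> {
  console.log(`Context changed: ${context.streams.length} stream(s) now present`);
  await client.unblock(context.blockingCallRef);
}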

Let’s focus on the repeated StreamMetadata part of the message. Below is a short version of its definition.

message StreamMetadata {
  StreamKey stream_key = 1; // The StreamKey
  oneof message {
    AudioMetadata audio = 2;       // Audio metadata
    VideoMetadata video = 3;       // Video metadata
    // other types of metadata (subtitles etc)...
  }
}

So our Context carries the current set of streams, each identified by a StreamKey, and each stream key has supplemental information associated with it depending on its type (audio stream, video stream, subtitles, etc.).
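
One way to picture the oneof in TypeScript terms is as a discriminated union, sketched below. The AudioMetadata and VideoMetadata fields shown are placeholders rather than the real message contents, and StreamKey (covered in the next section) is treated as opaque here.

// Placeholder detail types; the real AudioMetadata/VideoMetadata messages carry far more.
interface AudioMetadata { sampleRate?: number }
interface VideoMetadata { width?: number; height?: number }
type StreamKey = unknown; // described in the next section

// Each stream carries exactly one kind of metadata alongside its StreamKey.
type StreamMetadata =
  | { streamKey: StreamKey; audio: AudioMetadata }
  | { streamKey: StreamKey; video: VideoMetadata };

// For example, pick out just the audio streams from a context's stream list.
function audioStreams(streams: StreamMetadata[]): StreamMetadata[] {
  return streams.filter((s) => "audio" in s);
}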

Stream Keys

Let’s keep digging into the detail. Here’s how the StreamKey Message is defined in Norsk’s media.proto file.

message SourceName { string source_name = 1; }
message ProgramNumber { int32 program_number = 1; }
message StreamId { int32 stream_id = 1; }
message RenditionName { string rendition_name = 1; }

message StreamKey {
  SourceName source_name = 1;
  ProgramNumber program_number = 2;
  StreamId stream_id = 3;
  RenditionName rendition_name = 4;
}

A Stream Key consists of 4 parts:

  • SourceName - A string provided by your code. For example, "camera1", "Studio RTMP Input", "Bob’s SRT Encoder", or "UDP from the satellite receiver".

  • ProgramNumber - A number identifying the program within the source. This exists to support multiprogram-capable sources, such as a multiprogram transport stream, in which case it is the source that allocates the ProgramNumber. For single-program sources such as RTMP, the ProgramNumber is always 1.

  • StreamId - A number identifying the stream within the program. This is allocated by the source.

  • RenditionName - Any media may exist in multiple renditions; a good example is an ABR ladder for a video stream, where there are several streams containing the same content but encoded at different qualities. Your code may well choose to name the renditions "low", "medium", and "high". For streams received from a source Media Node where no additional configuration has been provided, the RenditionName is "default".
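
Modelled in TypeScript, a stream key is just these four fields together; the sketch below (an illustration for this guide, not an SDK type) also shows one way to print a key in the "source / program / stream / rendition" form used in the example that follows.

// A minimal sketch of a StreamKey; field names follow the proto definition above.
interface StreamKey {
  sourceName: string;    // provided by your code, e.g. "camera1"
  programNumber: number; // always 1 for single-program sources such as RTMP
  streamId: number;      // allocated by the source
  renditionName: string; // "default" unless renditions are configured, e.g. "low", "high"
}

// Render a key in the "source / program / stream / rendition" form used below.
function showStreamKey(k: StreamKey): string {
  return `"${k.sourceName}" / ${k.programNumber} / ${k.streamId} / "${k.renditionName}"`;
}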

For those of you used to working with Transport Streams, this will probably seem rather familiar. For those of you less so, here’s an example.

Stream Key Example

Let’s say we are receiving a UDP stream from a satellite receiver tuned to a particular frequency. The received transport stream can contain multiple programs (think BBC1, BBC2, etc., all multiplexed together…), and each program can contain multiple streams of data.

Typically that’s one video stream and one audio stream, but a program can contain zero or more video / audio / subtitle / data / … streams. Multiple audio streams are certainly reasonably common (separate English and Spanish audio, commentary, stadium sound…).

So the StreamKeys of the content from the "UDP from the satellite receiver" stream might well be:

  • "UDP from the satellite receiver" / 1 / 256 / "default" - program 1 video

  • "UDP from the satellite receiver" / 1 / 257 / "default" - program 1 English audio

  • "UDP from the satellite receiver" / 1 / 258 / "default" - program 1 Spanish audio

  • "UDP from the satellite receiver" / 2 / 256 / "default" - program 2 video

  • "UDP from the satellite receiver" / 2 / 257 / "default" - program 2 audio

How things like language or content type are communicated varies from input node to input node. For example, a transport stream input provides (optional) callbacks to the user with the data from the PAT and PMT for the transport stream. These mechanisms are discussed elsewhere, on a per-input basis.

The good news is, you now know how Norsk labels all streams. TS is as complex as it gets, and we use the same naming strategy for all streams regardless of whether they are from TS, WebRTC, SRT, MP4, fMP4, etc.