Synchronizing WebVTT Captions

In this topic, you will learn how to configure WebVTT captions for HLS videos to synchronize the audio and video with the subtitles.

Overview

A Web Video Text Tracks (WebVTT) file is a plain text file that associates captions, subtitles, descriptions, and similar text with time segments in your video. For example:

WEBVTT

00:00:03.500 --> 00:00:05.000 align:middle line:84%
In this video, you'll learn
about how Video Cloud Studio is

For details about adding a WebVTT file, see the Add Captions to Videos document.
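To make the cue timing format concrete, here is a minimal Python sketch that parses a cue timing line like the one in the example above. It assumes the timestamp form defined by the WebVTT specification (optional hours, then minutes, seconds, and exactly three millisecond digits) and treats cue settings as simple name:value pairs. The function name parse_cue_timing is hypothetical and not part of any Brightcove API.

```python
import re

# WebVTT timestamp: optional hours (2+ digits), then minutes, seconds, and
# exactly three millisecond digits, e.g. "00:00:03.500" or "00:03.500".
_TIMESTAMP = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
_TIMING_LINE = re.compile(
    rf"^{_TIMESTAMP}[ \t]+-->[ \t]+{_TIMESTAMP}(?:[ \t]+(.*))?$"
)


def _to_seconds(hours, minutes, seconds, millis):
    """Convert the captured timestamp fields to seconds (hours may be None)."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cue_timing(line):
    """Parse a cue timing line into (start_seconds, end_seconds, settings).

    Raises ValueError if the line is not a valid cue timing line.
    """
    match = _TIMING_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid WebVTT cue timing line: {line!r}")
    groups = match.groups()
    start = _to_seconds(*groups[0:4])
    end = _to_seconds(*groups[4:8])
    # Cue settings such as "align:middle line:84%" become {"align": "middle", ...}.
    settings = dict(item.split(":", 1) for item in (groups[8] or "").split())
    return start, end, settings


print(parse_cue_timing("00:00:03.500 --> 00:00:05.000 align:middle line:84%"))
```

A timestamp with only two millisecond digits, such as 00:00:03.50, does not match this pattern and raises ValueError.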

Add a metadata header

As part of the HLS specification, you need to add an X-TIMESTAMP-MAP metadata header to each WebVTT file in order to synchronize the subtitle timestamps with the timestamps of the audio and video.

If this header is missing or the MPEGTS value is incorrect, your subtitles may be out of sync with the video. This is because when the X-TIMESTAMP-MAP header is missing, the client assumes a default timestamp offset of 0. MPEG-2 timestamps are counted in ticks of a 90 kHz clock, so the difference between a value of 900000 and 0 is 900000 / 90000 = 10 seconds, which is enough to put your captions 10 seconds out of sync.
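The arithmetic behind the 10-second figure can be sketched in a few lines of Python. MPEG-2 transport streams count presentation timestamps in ticks of a 90 kHz clock; the helper names below are hypothetical, chosen only for illustration.

```python
# MPEG-2 transport streams express PTS/DTS values in ticks of a 90 kHz clock.
MPEG_TS_CLOCK_HZ = 90_000


def mpegts_to_seconds(mpegts):
    """Convert an MPEGTS value (90 kHz ticks) to seconds."""
    return mpegts / MPEG_TS_CLOCK_HZ


def seconds_to_mpegts(seconds):
    """Convert seconds to an MPEGTS value, rounded to the nearest tick."""
    return round(seconds * MPEG_TS_CLOCK_HZ)


# The error introduced when the client assumes 0 instead of 900000:
drift = mpegts_to_seconds(900_000) - mpegts_to_seconds(0)
print(f"Captions would be off by {drift:g} seconds")  # Captions would be off by 10 seconds
```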

X-TIMESTAMP-MAP format:

X-TIMESTAMP-MAP=MPEGTS:<MPEG-2 time>,LOCAL:<cue time>
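If you generate caption files programmatically, a small helper can build this header line. The Python sketch below assumes the MPEGTS value is a non-negative integer in 90 kHz ticks and formats LOCAL as a WebVTT timestamp (hh:mm:ss.ttt). The names timestamp_map_header and format_local are hypothetical.

```python
def format_local(seconds):
    """Format a cue time in seconds as a WebVTT timestamp hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def timestamp_map_header(mpegts, local_seconds=0.0):
    """Build an X-TIMESTAMP-MAP header line.

    mpegts: MPEG-2 presentation timestamp in 90 kHz ticks.
    local_seconds: the WebVTT cue time that corresponds to mpegts.
    """
    if not isinstance(mpegts, int) or mpegts < 0:
        raise ValueError("MPEGTS must be a non-negative integer")
    if local_seconds < 0:
        raise ValueError("LOCAL must be non-negative")
    return f"X-TIMESTAMP-MAP=MPEGTS:{mpegts},LOCAL:{format_local(local_seconds)}"


print(timestamp_map_header(900000))
# X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000
```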

The X-TIMESTAMP-MAP header must appear on line 2, directly after the WEBVTT line. You may experience unexpected results if the timestamp header is not placed on line 2.

Here is a sample WebVTT file:

WEBVTT
X-TIMESTAMP-MAP=MPEGTS:900000,LOCAL:00:00:00.000

1
00:00:03.500 --> 00:00:05.000 align:middle line:84%
In this video, you'll learn
about how Video Cloud Studio is
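To add the header to existing caption files, you can rewrite each file so that X-TIMESTAMP-MAP lands on line 2. This Python sketch assumes the header block ends at the first blank line, as in the sample above, and drops any X-TIMESTAMP-MAP line already present so the file ends up with exactly one. add_timestamp_map is a hypothetical helper, not a Brightcove tool.

```python
def add_timestamp_map(vtt_text, mpegts, local="00:00:00.000"):
    """Return vtt_text with an X-TIMESTAMP-MAP header placed on line 2.

    Any X-TIMESTAMP-MAP line already in the header block is dropped, so the
    result has exactly one, directly after the WEBVTT line.
    """
    lines = vtt_text.splitlines()
    # A WebVTT file may begin with a UTF-8 byte order mark before "WEBVTT".
    if not lines or not lines[0].lstrip("\ufeff").startswith("WEBVTT"):
        raise ValueError("a WebVTT file must start with a WEBVTT line")
    # The header block runs from line 2 up to the first blank line.
    header_end = next((i for i, line in enumerate(lines) if not line.strip()), len(lines))
    other_header_lines = [
        line for line in lines[1:header_end] if not line.startswith("X-TIMESTAMP-MAP=")
    ]
    timestamp_map = f"X-TIMESTAMP-MAP=MPEGTS:{mpegts},LOCAL:{local}"
    result = [lines[0], timestamp_map, *other_header_lines, *lines[header_end:]]
    return "\n".join(result) + "\n"


sample = """WEBVTT

1
00:00:03.500 --> 00:00:05.000 align:middle line:84%
In this video, you'll learn
about how Video Cloud Studio is
"""
print(add_timestamp_map(sample, 900000))
```

To update a file on disk, read it with pathlib.Path.read_text(encoding="utf-8"), pass the text through add_timestamp_map, and write the result back with write_text.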

For details, see the Subtitle segments section of Apple's HLS Protocol document.

Determine the offset value

If you are using Brightcove's Dynamic Ingest or Zencoder to transcode your content, use an offset value of MPEGTS:900000.

If you are using an encoding system other than Dynamic Ingest or Zencoder, it is best to determine the value from your encoded content. Apple recommends that you set the offset to match your encoded video.

The MPEGTS value corresponds to the presentation timestamp (PTS) value of the MPEG frame at the given LOCAL time. If you are using Brightcove's legacy ingest system, you may find that you can use a value of MPEGTS:0.
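One way to picture this mapping: X-TIMESTAMP-MAP declares that the cue time LOCAL lines up with the media PTS MPEGTS, and every other cue is shifted by the same amount. The Python sketch below is a simplified model of that arithmetic (it ignores the 33-bit PTS rollover); cue_time_to_pts is a hypothetical name, not a player API.

```python
MPEG_TS_CLOCK_HZ = 90_000  # MPEG-2 timestamps tick at 90 kHz


def cue_time_to_pts(cue_seconds, mpegts, local_seconds=0.0):
    """Map a WebVTT cue time to the media PTS (90 kHz ticks) it lines up with.

    X-TIMESTAMP-MAP says that LOCAL (cue time) corresponds to MPEGTS
    (media time); every other cue time is shifted by the same offset.
    """
    return mpegts + round((cue_seconds - local_seconds) * MPEG_TS_CLOCK_HZ)


# The sample cue at 00:00:03.500 with MPEGTS:900000, LOCAL:00:00:00.000
start_pts = cue_time_to_pts(3.5, 900_000)
print(start_pts, start_pts / MPEG_TS_CLOCK_HZ)  # 1215000 13.5
```

If the video's first frame has a PTS of 900000 but the offset is treated as 0, the same cue maps to PTS 315000 instead, 900000 ticks (10 seconds) earlier relative to the video than intended.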

If your account is enabled for Dynamic Delivery and your captions are hosted by us (non-remote captions), we handle the offset automatically: the PTS needs to be zero, and we make sure that it is.

If your account is enabled for Dynamic Delivery and you are using remote captions, you will need to set the PTS value to zero yourself (MPEGTS:0).

For example, to get the offset value, you can do the following:

Request 1:

In the terminal, download a segment of an HLS video and save it to a local file. In this case, we name it seg.ts.

curl -o seg.ts "http://brightcove.vo.llnwd.net/v1/unsecured/media/4360108595001/201507/1154/4360341622001/4360108595001_4360341622001_s-1.ts?pubId=4360108595001&videoId=4360283683001"

Request 2:

Then, use the ffprobe command to get the offset value. ffprobe is a multimedia stream analyzer that is part of the FFmpeg project. You will need to download and install FFmpeg on your computer.

ffprobe -show_frames seg.ts

Response:

Your response should look similar to this:

pkt_pts=900000
pkt_pts_time=10.000000
pkt_dts=900000
pkt_dts_time=10.000000
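If you check offsets for many files, you can automate this step. The Python sketch below runs the same ffprobe -show_frames command through subprocess and returns the first PTS value it reports, mirroring the manual step of reading the first pkt_pts line. It assumes ffprobe is on your PATH. Because newer FFmpeg releases may label the field pts rather than pkt_pts, both keys are accepted. The helper names first_pts and probe_first_pts are hypothetical.

```python
import subprocess


def first_pts(ffprobe_output):
    """Return the first PTS value found in `ffprobe -show_frames` output.

    Accepts either the pkt_pts key (older FFmpeg) or pts (newer FFmpeg),
    and skips frames whose PTS is reported as N/A.
    """
    for line in ffprobe_output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key in ("pkt_pts", "pts") and value != "N/A":
            return int(value)
    raise ValueError("no PTS value found in ffprobe output")


def probe_first_pts(path):
    """Run ffprobe on a local .ts segment and return its first PTS (90 kHz ticks)."""
    result = subprocess.run(
        ["ffprobe", "-show_frames", path],
        capture_output=True, text=True, check=True,
    )
    return first_pts(result.stdout)
```

For a segment like the one in the example response, probe_first_pts("seg.ts") should return 900000, which you would then use as MPEGTS:900000.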

Best practices

The following guidelines should help when developing your app with captions.

Caption duration

It is recommended that the caption duration not exceed the video duration. Otherwise, captions may be displayed, or an unseekable area may appear in the progress bar, after video playback has completed.
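To catch this before publishing, you can compare the latest cue end time with the video duration. This Python sketch assumes you already know the video duration in seconds (for example, from your video's metadata); last_cue_end and captions_overrun are hypothetical helper names.

```python
import re

# Matches the end time of a cue timing line, e.g. "--> 00:00:05.000".
_END_TIME = re.compile(r"-->\s*(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")


def last_cue_end(vtt_text):
    """Return the latest cue end time in a WebVTT file, in seconds."""
    ends = [
        int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000
        for h, m, s, ms in _END_TIME.findall(vtt_text)
    ]
    if not ends:
        raise ValueError("no cues found")
    return max(ends)


def captions_overrun(vtt_text, video_duration_s):
    """Return how many seconds the captions run past the end of the video (0.0 if none)."""
    return max(0.0, last_cue_end(vtt_text) - video_duration_s)
```

A non-zero result means at least one cue should be shortened or removed.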