Synchronizing with External Video Cameras

Sometimes it is necessary to synchronize recordings made by external video cameras with the timestamps of, e.g., audio and video data that was recorded to .tide files. This document describes one way to achieve this synchronization using a cross-correlation calculation on the different recorded audio streams.

All data (video, audio, or other types) recorded using RSBag carries timestamps that represent, with microsecond precision, the absolute time at which it was produced or recorded. Assuming that the clocks of all hosts communicating in a given RSB setup are synchronized, e.g. using NTP, this makes it easy to correlate events recorded on different RSB scopes. When you add external video cameras that are not triggered or synchronized in any programmatic way to this mix, things get more complicated. Synchronizing them exactly to the other recorded data is necessary, e.g., when you want to annotate that data using information gathered from the recorded videos.

The approach described here relies on comparing audio data that was recorded with timestamps to the audio recorded by the external video cameras. The peak of the cross-correlation between those two streams determines an offset between them, which is in turn used to cut the video from the external camera so that it starts at a specific known timestamp. Thanks to Lars Schillingmann, who originally had the idea for this approach and provided the Praat script used for it.


Apart from installed versions of the RSBag tools, this approach depends on current versions of the following software packages:

  • Praat (for the cross-correlation computation; tested with version 5.3.08)
  • FFmpeg (for cutting and converting the video files and for extracting audio from video files; tested with a recent git version, namely git-2012-04-08-069cf86)
  • SoX (for some audio processing tasks; tested with version 14.3.0)


This approach only works if you have a) recorded timestamped audio data in one of your .tide files and b) recorded audio in the video files of your external cameras. Additionally, both audio streams need to contain approximately the same sounds, so they should not, for example, have been recorded in separate rooms.

Calculating Offsets

This section describes how to get the offset between the video recorded from an external camera and a known timestamped stream.

As an example, this and the following sections assume that we are working with the following files:

  • camera.mts: Video and audio from an external camera (can of course also be in a different format, as long as it is readable by FFmpeg).
  • audio.tide: RSBag recording containing at least one stream of audio, recorded directly through rsb-gstreamer or from some other RSB-using device, like the Nao robot.

The end result will be a relative offset (in seconds) between the recording in the first and in the second file.

TODO: maybe mention how to convert the offsets between/for multiple external cameras.

Extracting the Relevant Audio Channels

Since the offset calculation later on can only work (or at least works best) on mono audio data in a specific format, we have to extract and convert single audio channels first.

To extract all of the audio channels from the video of your external camera, you can use FFmpeg like this (the result will probably have between two and six audio channels, depending on your camera):

$ ffmpeg -i camera.mts camera-audio.wav

To pick the first audio channel from this file (in most cases it won't really matter which one you pick) and convert it to the right sample format, you can use SoX like this (the output filename camera-audio-mono.wav is just a suggestion):

$ sox camera-audio.wav -c 1 -e signed-integer -b 16 -L -r 16000 camera-audio-mono.wav remix 1

TODO: describe how to get a .wav from the .tide. see this.

To pick the first audio channel from this file as well, we again use SoX (writing, e.g., reference-audio-mono.wav):

$ sox reference-audio.wav -c 1 -e signed-integer -b 16 -L -r 16000 reference-audio-mono.wav remix 1

Calculating the Offset Between Two Audio Channels

Now we can calculate the offset between the two mono wave-files produced in the previous section.

TODO: describe how to run the praat script


TODO: warn that peaks can be off if there is not much information in the audio streams to correlate. also mention script parameter.

form Enter fileA and fileB
    sentence fileA
    sentence fileB
endform
Open long sound file... 'fileA$'
objectA$ = selected$ ("LongSound")
Open long sound file... 'fileB$'
objectB$ = selected$ ("LongSound")
select LongSound 'objectA$'
plus LongSound 'objectB$'
Extract part... 0 480 yes
Cross-correlate... "peak 0.99" zero
Get time of maximum... 0 0 Parabolic

Cutting the Videos

This section describes how to use the offset calculated in the previous section to cut the video from the external camera to start at a point in time whose timestamp is known.

TODO: cam-*.mts -> cut .mp4s (needs offsets to the begin-timestamp)...
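A minimal sketch of how this cutting step could look, assuming a hypothetical offset of 12.345 s obtained from the cross-correlation step (the filenames and the offset value are placeholders; FFmpeg's -ss option, given before -i, seeks to that time so the output starts there):

```python
def ffmpeg_cut_command(video, offset_seconds, output):
    """Build an ffmpeg command line that cuts `video` so the output
    starts `offset_seconds` into the original recording."""
    return ["ffmpeg", "-ss", "%.3f" % offset_seconds, "-i", video, output]

# Hypothetical offset of 12.345 s from the previous section.
print(" ".join(ffmpeg_cut_command("camera.mts", 12.345, "camera-cut.mp4")))
# → ffmpeg -ss 12.345 -i camera.mts camera-cut.mp4
```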


TODO ...