In an ideal world, all video would play back at a uniform round-number frame rate, for example 30 frames per second. Associated with each frame of video would be a unique frame count, starting with zero. Each frame count could be translated into a unique timecode format of hours, minutes, seconds, and frames. Each frame count could likewise be translated into a (possibly fractional) time in seconds. All of these descriptions of a frame's temporal location would be mutually interconvertible and their meanings unambiguous and intuitively simple. For example, at 30 frames per second:
108,000 frames = 1 hour + 0 minutes + 0 seconds + 0 frames = 3600 seconds.
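In this ideal case every conversion is plain integer arithmetic. The short Python sketch below assumes exactly 30 frames per second and reproduces the example above; the function names are our own, chosen for illustration.

```python
# Ideal-world conversion at exactly 30 frames/s (no drop-frame involved).
FPS = 30

def frames_to_timecode_30(frame_count: int) -> tuple[int, int, int, int]:
    """Split a 30 frames/s frame count into (hours, minutes, seconds, frames)."""
    total_seconds, frames = divmod(frame_count, FPS)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return hours, minutes, seconds, frames

def frames_to_seconds_30(frame_count: int) -> float:
    """Time in seconds at which the given frame starts."""
    return frame_count / FPS

print(frames_to_timecode_30(108_000))  # (1, 0, 0, 0)
print(frames_to_seconds_30(108_000))   # 3600.0
```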
When black & white TV was first introduced in the US, it ran at 30 frames per second. The AC power available at every US wall outlet alternates at 60 cycles per second, providing an easily available sync signal. In Europe, AC power alternates at 50 cycles per second, hence the European adoption of 25 frames per second. Converting between times based on 25 and 30 frames per second is relatively simple; if that were the only timecode problem facing video producers, this document would be unnecessary. With the development of color TV, however, the situation became more complicated. The color component added to the broadcast TV signal could sometimes interfere with the existing audio, and changing the audio's format would have made all existing black & white TVs incompatible. The solution was to nudge the picture rate down from 30 to 29.97 frames per second. Today, all TVs in the US, Canada, Mexico, and Japan play at this rate. This color TV format is named after the committee that defined it: the National Television System Committee, or NTSC.
A frame-counting scheme called drop-frame lets users ignore the distinction between 29.97 and 30 frames per second when interpreting timecodes. The details are discussed below, but the central idea is important:
Drop-frame timecodes are defined so as to look like 30 frames per second times, and to reflect accurately the actual time elapsed. In particular, time durations found by subtracting drop-frame timecodes will be accurate to within one or two frames over intervals of arbitrary length.
We write frame rates in frames per second (or frames/s) to emphasize the connection between the word "per" and the operation of division. Similarly, we described AC power above as alternating at 60 cycles per second. The generic unit for "things [of any sort] per second" is the hertz, abbreviated Hz and named after the German physicist Heinrich Hertz. Hertz was the first to generate and detect radio waves and to demonstrate that they are the same sort of thing as light, oscillating at lower frequencies. Many technical discussions use Hz to describe frame rates.
Let us imagine a program of video in which each frame is simply an image of its own frame number: the first frame shows the number zero, the next shows the number one, and so on. This program is played back at 30 frames/s and at 29.97 frames/s (see Fig. 1).
The most obvious way to count timecodes is to increment the frame count as each frame of video goes by. When the frame count reaches 29, the next frame resets it to zero and adds one to the seconds count (seconds and minutes roll over at 60 in the same way), like so:
00:00:00.00
00:00:00.01
00:00:00.02
...
00:00:00.28
00:00:00.29
00:00:01.00
00:00:01.01
(etc.)
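The counting procedure just described can be sketched in Python as a simple counter with carries. This is an illustrative sketch, not code from any particular product; count_non_drop is a hypothetical name, and hours simply keep counting with no 24-hour wrap.

```python
def count_non_drop(n_frames: int):
    """Yield non-drop timecodes for n_frames consecutive frames, starting at 00:00:00.00."""
    h = m = s = f = 0
    for _ in range(n_frames):
        # '.' separates seconds from frames, matching the listing above
        yield f"{h:02d}:{m:02d}:{s:02d}.{f:02d}"
        f += 1
        if f == 30:          # after frame 29, wrap to 0 and carry into seconds
            f = 0
            s += 1
            if s == 60:      # seconds roll over into minutes
                s = 0
                m += 1
                if m == 60:  # minutes roll over into hours
                    m = 0
                    h += 1

codes = list(count_non_drop(32))
print(codes[28:32])  # ['00:00:00.28', '00:00:00.29', '00:00:01.00', '00:00:01.01']
```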
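This approach works just fine for 30 frames/s material (of which there is none in the video world!) but causes problems for 29.97 frames/s video: the count assumes 30 frames per second while only 29.97 actually play, so the timecode falls further behind the true elapsed time the longer the program runs, as illustrated in Fig. 3.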
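To put a number on the drift, here is a small Python calculation that, like the rest of this document, treats 29.97 as the exact rate. It labels the frame on screen after exactly one real hour using the naive count, with the rate held as an exact fraction to avoid floating-point noise.

```python
from fractions import Fraction

RATE = Fraction(2997, 100)   # 29.97 frames/s as an exact fraction

frames_played = int(RATE * 3600)   # frames actually shown in one real hour
# Label that frame the naive way: 30 frames per timecode second.
secs_total, frames = divmod(frames_played, 30)
mins_total, secs = divmod(secs_total, 60)
hours, mins = divmod(mins_total, 60)

print(frames_played)                                      # 107892
print(f"{hours:02d}:{mins:02d}:{secs:02d}.{frames:02d}")  # 00:59:56.12
print(108_000 - frames_played)                            # 108 frames short of 01:00:00.00
```

After one real hour the naive timecode reads only 00:59:56.12, and the shortfall keeps growing at the same rate.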
As you may know, leap years do not always come every fourth year: century years are leap years only if they are divisible by 400, so 1900 had no leap day while 2000 did. Similarly, the actual solution for 29.97 timecode, called drop-frame counting, pairs a basic rule with an exception, but it achieves the same result: drop-frame timecodes accurately describe the true time elapsed (Fig. 4).
Drop-frame counting works as illustrated by the following sequence of timecodes:
00:00:00,00
00:00:00,01
...
00:00:59,28
00:00:59,29
00:01:00,02
00:01:00,03
(etc.)
Every minute, two frame numbers (00 and 01) are skipped in the count; no actual video frames are discarded, only their labels. At this rate we would skip 2 × 60 = 120 frame numbers every hour. But the true discrepancy after an hour is 108,000 - 29.97 × 3600 = 108,000 - 107,892 = 108 frames. So when the minute count is divisible by 10 (0, 10, 20, ..., 50), the two frame numbers are not skipped. That keeps 6 × 2 = 12 frame numbers per hour, and 120 - 12 = 108. This scheme keeps the timecode from drifting away from the true time.
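A quick Python check of this arithmetic, counting skipped labels minute by minute; skipped_labels is a hypothetical helper name used only for illustration.

```python
def skipped_labels(n_minutes: int) -> int:
    """Count drop-frame labels skipped over minutes 0 .. n_minutes-1."""
    # Two labels (00 and 01) are skipped at the start of every minute
    # that is not divisible by 10.
    return sum(2 for minute in range(n_minutes) if minute % 10 != 0)

print(skipped_labels(60))             # 108 per hour
print(108_000 - skipped_labels(60))   # 107892, which equals 29.97 * 3600
print(skipped_labels(10))             # 18 per ten-minute block
```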
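To convert a drop-frame timecode (hours, minutes, seconds, frames) to a zero-based frame number:
totalMinutes = 60 * hours + minutes
frameNumber = 108000 * hours + 1800 * minutes + 30 * seconds + frames - 2 * (totalMinutes - totalMinutes div 10)
where div means integer division, discarding the remainder. The first four terms count frames as if at 30 frames/s; the last subtracts the two skipped frame numbers for every minute not divisible by 10.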
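The same formula as a Python function. This is a sketch assuming the timecode is already split into integer fields and is a label that actually occurs in drop-frame counting; the name drop_frame_to_frame_number is our own.

```python
def drop_frame_to_frame_number(hours: int, minutes: int, seconds: int, frames: int) -> int:
    """Convert a valid drop-frame timecode to a zero-based frame number.

    Assumes the input is a label that actually occurs: frames 00 and 01
    do not exist at second 00 of minutes not divisible by 10.
    """
    total_minutes = 60 * hours + minutes
    nominal = 108_000 * hours + 1800 * minutes + 30 * seconds + frames  # count at 30 frames/s
    skipped = 2 * (total_minutes - total_minutes // 10)  # 2 labels per non-tenth minute
    return nominal - skipped

print(drop_frame_to_frame_number(0, 1, 0, 2))   # 1800
print(drop_frame_to_frame_number(1, 0, 0, 0))   # 107892
```

Subtracting the frame numbers for 00:00:00,00 and 01:00:00,00 gives 107,892 frames, exactly 29.97 × 3600, which is the accuracy property drop-frame is designed to provide.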
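To go the other way, first add the skipped frame numbers back in. Each ten-minute block contains 18,000 - 18 = 17,982 frames; its first minute keeps all 1,800 frame numbers, and each of the other nine minutes has 1,798:
D = frameNumber div 17982
M = frameNumber mod 17982
frameNumber += 18*D + 2*((M - 2) div 1798)
(If -2 div 1798 doesn't return 0 in your language, for example because its integer division rounds toward negative infinity, you'll have to special-case M = 0 and M = 1, which need no adjustment beyond 18*D.)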
Then split the adjusted frame number into fields exactly as at 30 frames/s:
frames = frameNumber mod 30
seconds = (frameNumber div 30) mod 60
minutes = ((frameNumber div 30) div 60) mod 60
hours = (((frameNumber div 30) div 60) div 60) mod 24
where mod means the remainder after integer division.
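Putting the two steps together gives a Python sketch of frame number to drop-frame timecode. Because Python's // operator rounds toward negative infinity, (-2) // 1798 is -1, so this version takes the special-case route for M = 0 and 1. The function name and the comma separator (borrowed from the drop-frame listing) are illustrative choices.

```python
def frame_number_to_drop_frame(frame_number: int) -> str:
    """Convert a zero-based frame number to a drop-frame timecode string."""
    # 17,982 frames per ten-minute block (18,000 minus 18 skipped labels)
    d, m = divmod(frame_number, 17982)
    adjusted = frame_number + 18 * d
    if m > 1:
        # Python's // floors, so (m - 2) // 1798 would be -1 for m = 0 or 1;
        # those two frames need no per-minute adjustment, hence the guard.
        adjusted += 2 * ((m - 2) // 1798)
    frames = adjusted % 30
    seconds = (adjusted // 30) % 60
    minutes = (adjusted // 30 // 60) % 60
    hours = (adjusted // 30 // 60 // 60) % 24
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{frames:02d}"

for n in (1799, 1800, 17982, 107892):
    print(n, frame_number_to_drop_frame(n))
```

Frame 1799 comes out as 00:00:59,29 and frame 1800 as 00:01:00,02, matching the drop-frame sequence shown earlier.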
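To convert between a frame number and elapsed time in seconds:
timeInSec = frameNumber / 29.97
frameNumber = floor(timeInSec * 29.97)
where floor rounds down to the nearest integer, giving the frame on screen at that time.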
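In code, the floating-point value 29.97 is not exact, so timeInSec * 29.97 may land a hair below a whole number and floor to the previous frame. The Python sketch below sidesteps this with exact rational arithmetic from the standard fractions module, still treating 29.97 as the rate; the helper names are our own.

```python
import math
from fractions import Fraction

RATE = Fraction(2997, 100)  # 29.97 frames/s, kept exact to avoid float rounding

def frame_to_seconds(frame_number: int) -> Fraction:
    """Time at which the given frame begins."""
    return frame_number / RATE

def seconds_to_frame(time_in_sec) -> int:
    """Number of the frame on screen at the given time (floor, not round)."""
    return math.floor(Fraction(time_in_sec) * RATE)

print(frame_to_seconds(107892))   # 3600
print(seconds_to_frame(3600))     # 107892
print(seconds_to_frame(0.5))      # 14
```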
Note that the first frame is number zero in this scheme. Similarly, the time of frame zero is zero. To find the duration of a program that starts at frame zero, you have to add 1 to the number of its last frame (to get the frame count) before dividing by 29.97. Don't let off-by-one errors drive you crazy!
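A Python sketch of the interval rule, generalized (our own extension, not from the text above) to intervals that start at any frame. Both end frames are counted, hence the +1; interval_seconds is a hypothetical name.

```python
from fractions import Fraction

RATE = Fraction(2997, 100)  # 29.97 frames/s

def interval_seconds(first_frame: int, last_frame: int) -> Fraction:
    """Duration of an inclusive run of frames, first_frame through last_frame."""
    frame_count = last_frame - first_frame + 1   # +1: both end frames are included
    return frame_count / RATE

print(float(interval_seconds(0, 29)))   # 30 frames: slightly more than one second
print(interval_seconds(0, 107891))      # 3600: one full hour
```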