Video transcoding, part 2: tools

Posted by Jon
on Monday, May 07

(This is Part 2 in a multi-part series on video transcoding and the web. For the rest of the series, take a look at the first post.)

There are a lot of tools that can be used to transcode video and audio. These fall into four main categories.

First are desktop applications that can “Export” to another format. iMovie, Final Cut, Premiere, Windows Movie Maker, Flash, and a host of other applications fall into this category. These tools are great for personal, low-volume use, but they require a person to manually load up each job, so they aren’t suitable for integration with another application.

Second are enterprisey apps that primarily do video transcoding, with a GUI frontend to make things easy. These applications watch an input folder, apply a recipe (like “AVI/MPEG-4, 2000kbps”), and send the file to an output folder. Telestream FlipFactory is an example. I haven’t actually used FlipFactory, though I’ve talked with people who say that it offers everything you’ve come to expect in enterprise software. (Ahem.)

Apple Compressor is a lightweight version of this, with a low price, poor documentation, a non-existent user community, slow transcoding speeds, and good quality output. Compressor has some nice built-in workflow and distribution tools. From what I understand, Compressor is kind of like a GUI on the Quicktime transcoding libraries.
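
The watch-folder pattern these tools share (watch an input folder, apply a recipe, write to an output folder) can be sketched in a few lines of Ruby. This is only an illustration of the idea, not how FlipFactory or Compressor actually work; the method name plan_jobs, the recipe hash, and the folder names are all made up for the example.

```ruby
# Pair every file in the input folder with the path its transcoded copy
# would get in the output folder. Nothing is encoded here; a real system
# would hand each job to an encoder and then move or delete the input.
def plan_jobs(input_dir, output_dir, recipe)
  names = Dir.children(input_dir).sort.select do |name|
    File.file?(File.join(input_dir, name))
  end

  names.map do |name|
    output_name = File.basename(name, ".*") + recipe[:extension]
    {
      input: File.join(input_dir, name),
      output: File.join(output_dir, output_name),
      recipe: recipe[:name]
    }
  end
end

# Hypothetical recipe, modeled on the "AVI/MPEG-4, 2000kbps" example.
recipe = { name: "AVI/MPEG-4, 2000kbps", extension: ".avi" }
plan_jobs(".", "out", recipe).each do |job|
  puts "#{job[:input]} -> #{job[:output]}"
end
```

Calling plan_jobs on a timer (or from a filesystem notification) is what turns this into a "watched" folder.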

Third are asset management tools that do video transcoding as one component among many. These are kind of like FlipFactory (and several actually make use of FlipFactory), but they have a lot of other things tacked on as well. If you need an asset management tool, this might be a good option. But if you want a website that transcodes and publishes user-submitted videos, these tools are like buying a PC for its calculator app.

Fourth and finally are command-line tools that do video/audio encoding, and nothing else. Examples include ffmpeg, mencoder, and On2 Flix Engine. These tools take an input file and a lot of other options and create an output file. Most are open-source. Most (all?) run on Unix/Linux.

It is probably clear that I’m most interested in the fourth category of tools (though Apple Compressor is at least moderately interesting). The first category isn’t suitable for building a robust system, and the second and third categories are expensive and heavyweight. ffmpeg and mencoder are free, and On2 is relatively cheap, and all three do a good job. So let’s explore them in a bit more detail.

Overview of tools

ffmpeg and mencoder

Notice that I’ve lumped these together. That is because ffmpeg and mencoder are kind of like C++ and Objective-C; they do the same thing, in similar ways, with the same foundation, though they aren’t identical. Both ffmpeg and mencoder use the libavcodec and libavformat libraries for the bulk of their codecs and formats. (libavcodec and libavformat are part of the ffmpeg project, and are LGPL’d.)

I haven’t used ffmpeg and mencoder enough to recommend one over the other, so don’t take me for an expert on either. They have their differences; mencoder, for example, supports more powerful filters and advanced options. But I’ve used both successfully, and they are ultimately quite similar. (If you have extensive experience with both of these, and want to weigh in on the matter, leave a comment below.)

On the whole, these are two of the most impressive open source projects I’ve ever seen. They are powerful and reliable, with good speed, good quality, and a huge range of supported formats.

ffmpeg and mencoder have one downside: support. The easy things are easy, but the hard things are painful, and you’re basically on your own when it comes to figuring things out. Don’t look for a good manual, or a corporate support contract, or even a few local experts you can hire. The experts are young hackers from around the globe who hang out at http://forum.doom9.org or on the various mailing lists. Also, as good as they are with the complexities of ffmpeg and mencoder, their primary use case is ripping and archiving DVDs, not building robust video transcoding systems.

This doesn’t just mean that you won’t squeeze every last drop of power out of ffmpeg. It also means that you will run into strange errors, and your only source of help will be mailing lists and deep Google searching.

Here are a few code examples that illustrate the range of complexity behind these tools. The following command transcodes an input file to MPEG-4 AVC:


ffmpeg -i matrix.mov -vcodec h264 -ab 128 -s 720x304 -r 23.98 matrix-h264.mp4

The following command also transcodes an input file to MPEG-4 AVC, but does a better job:


ffmpeg -y -i matrix.mov -v 1 -threads 1 -vcodec h264 -b 500 -bt 175 -refs 2 -loop 1 -deblockalpha 0 -deblockbeta 0 -parti4x4 1 -partp8x8 1 -partb8x8 1 -me full -subq 6 -brdo 1 -me_range 21 -chroma 1 -slice 2 -max_b_frames 0 -level 13 -g 300 -keyint_min 30 -sc_threshold 40 -rc_eq 'blurCplx^(1-qComp)' -qcomp 0.7 -qmax 35 -max_qdiff 4 -i_quant_factor 0.71428572 -b_quant_factor 0.76923078 -rc_max_rate 768 -rc_buffer_size 244 -cmp 1 -s 720x304 -acodec aac -ab 64 -ar 44100 -ac 1 -f mp4 -pass 1 matrix-h264.mp4

ffmpeg -y -i matrix.mov -v 1 -threads 1 -vcodec h264 -b 500 -bt 175 -refs 2 -loop 1 -deblockalpha 0 -deblockbeta 0 -parti4x4 1 -partp8x8 1 -partb8x8 1 -me full -subq 6 -brdo 1 -me_range 21 -chroma 1 -slice 2 -max_b_frames 0 -level 13 -g 300 -keyint_min 30 -sc_threshold 40 -rc_eq 'blurCplx^(1-qComp)' -qcomp 0.7 -qmax 35 -max_qdiff 4 -i_quant_factor 0.71428572 -b_quant_factor 0.76923078 -rc_max_rate 768 -rc_buffer_size 244 -cmp 1 -s 720x304 -acodec aac -ab 64 -ar 44100 -ac 1 -f mp4 -pass 2 matrix-h264.mp4

Note that the command appears twice, and the two copies differ only in the -pass value (1, then 2): this is a two-pass job. Two-pass encoding can create more efficient files, since the second pass uses statistics gathered during the first pass.
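
Because the two passes share every option except -pass, a script can build both from a single list. Here is a minimal sketch in Ruby that reuses a handful of the flags from the commands above. The method name two_pass_commands is made up for illustration, and nothing is actually executed.

```ruby
# Build the argument lists for a two-pass encode. The options are
# shared; only the value after -pass changes between the two runs.
def two_pass_commands(input, output, options)
  [1, 2].map do |pass|
    ["ffmpeg", "-y", "-i", input, *options, "-pass", pass.to_s, output]
  end
end

# A shortened option list taken from the full commands above.
options = %w[-vcodec h264 -b 500 -s 720x304 -acodec aac -ab 64 -f mp4]
first, second = two_pass_commands("matrix.mov", "matrix-h264.mp4", options)
puts first.join(" ")
puts second.join(" ")
```

Each array could then be handed to system(*first) and system(*second), which passes the arguments directly to the program instead of going through a shell.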

Supporting tools

ffmpeg and mencoder are great, but they don’t stand on their own – just like Linux is great, but it doesn’t do much without GNU tools. ffmpeg and mencoder handle dozens of codecs and formats, but several common (and important) codecs are handled by separate libraries. Want H.264? Use x264. What about MP3? LAME. AAC? faac and faad. These programs can run on their own, or support can be compiled directly into mencoder and ffmpeg. The latter option is usually preferable to keep things simple, but of course, this doesn’t always work. You may need to transcode in multiple steps, because ffmpeg doesn’t support your desired codec, or because you can get better quality by doing things separately. For this reason, most of these programs can output to a pipe and take input from a pipe. Otherwise, you can just create temporary files for the intermediate steps.
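
Chaining two programs through a pipe can be sketched with Ruby’s standard Open3 library. The two stages below are stand-ins (tiny Ruby one-liners), not real encoder invocations, because the options that make each tool read from or write to a pipe differ from tool to tool. The method name run_pipeline is also just for illustration.

```ruby
require "open3"
require "rbconfig"

# Run the given commands as a pipeline (stdout of each stage feeds stdin
# of the next) and return whatever the last stage writes to stdout.
def run_pipeline(*stages)
  output = nil
  Open3.pipeline_r(*stages) do |last_stdout, _wait_threads|
    output = last_stdout.read
  end
  output
end

ruby = RbConfig.ruby
# Stand-in for a decoder that writes raw data to stdout.
decode = [ruby, "-e", "print 'raw frames'"]
# Stand-in for an encoder that reads stdin and writes its result to stdout.
encode = [ruby, "-e", "print $stdin.read.upcase"]

puts run_pipeline(decode, encode)
```

The temporary-file approach is the same idea with the first stage writing to a file and the second stage reading it back, at the cost of extra disk space and cleanup.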

There are also many tools that do small, specialized tasks, and which are not integrated with mencoder and ffmpeg. Want to export your video as raw YUV frames? Want to repackage a mp4 file as mov? Want to add metadata to a file? Mencoder may be up to the task, but if it isn’t, there are dozens of small tools that can be used to do these things.

On2 Flix Engine

On2 Flix Engine is a commercial video transcoder that outputs VP6 files. This is good: VP6 is comparable to H.264 in terms of quality and efficiency. It works similarly to ffmpeg and mencoder – install it, run a binary with options (input file, output file, quality, resolution, etc.), and that’s it.

I haven’t used On2 Flix Engine extensively, but I want to. Why? Because it outputs high quality FLV files. This means that On2 offers extremely high quality and extremely high compatibility, unlike FLV/H.263 (high compatibility, low quality) or H.264 (high quality, lower compatibility). So it makes a lot of sense for distributing files over the web.

Since I haven’t used On2 much, I asked a friend of mine who has used it extensively, Matt Bauer, for his experience. He posted his thoughts in this article on On2. He’s also working on Ruby bindings for On2, which is very cool.

What about Ruby?

This series is, at least in part, a discussion of how to create a video transcoding system using Ruby. So does Ruby have a place in low-level video transcoding? Short answer: no. Video transcoding is time consuming and processor intensive. Ruby is slow. Bad combination. That doesn’t mean that Ruby isn’t suitable for a high level video transcoding system, as we’ll see in the next post; Ruby is a near-perfect language for gluing together a transcoding system. But for actually decoding a file and reencoding it in another format: stick to C. All of the tools I’ve discussed were written in C, with maybe some rogue C++ or Objective-C here and there. And actually, there is an x264 parallel encoder known as x264-farm that is written in OCaml, which is pretty sweet. But these are all really fast languages, and Ruby is not.

So let’s leave Ruby for our controller code. Stay tuned.

Video Transcoding, part 1: formats and codecs

Posted by Jon
on Thursday, April 26

(This is Part 1 in a multi-part series on video transcoding and the web. For the rest of the series, take a look at the first post. It was edited on April 27 to include the Ogg format and codecs.)

A typical video file is made up of one or more video streams and one or more audio streams. Each video or audio stream has a codec, and the file itself has a container format.
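
As a rough mental model, that structure can be written down as a small data type. The Ruby sketch below is only an illustration; the names Stream and MediaFile are invented here and don’t correspond to any particular library.

```ruby
# One audio or video stream inside a file, tagged with its codec.
Stream = Struct.new(:kind, :codec)

# A media file: one container format wrapping any number of streams.
MediaFile = Struct.new(:container, :streams) do
  def video_streams
    streams.select { |stream| stream.kind == :video }
  end

  def audio_streams
    streams.select { |stream| stream.kind == :audio }
  end
end

movie = MediaFile.new(
  "MOV",
  [Stream.new(:video, "H.264"), Stream.new(:audio, "AAC")]
)
puts "container: #{movie.container}"
puts "video:     #{movie.video_streams.map(&:codec).join(', ')}"
puts "audio:     #{movie.audio_streams.map(&:codec).join(', ')}"
```

Transcoding, in these terms, means producing a new MediaFile in which the container, the codecs, or both have changed.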

A codec (compressor/decompressor) is a software encoding that allows a stream to be compressed for storage or transmission, and decoded to a raw or readable format. Most, but not all, codecs are lossy. Examples of familiar audio codecs include MP3 (technically, MPEG-1 audio layer 3) and AAC (technically, MPEG-4 part 3). Familiar video codecs include DivX and Xvid (both implementations of MPEG-4 ASP).

A container format (sometimes called a wrapper format) is a file format that can contain video and audio streams (along with other types of streams, like text). Common container formats include MOV (Quicktime), and AVI.

In this post, I will highlight a few interesting codecs and formats. The discussion will be far from comprehensive – there are other good codecs and formats. My goal is to discuss the best ones, but there may be others that I’m not aware of. (Leave a comment if you want to suggest one!)

Formats and Codecs

Video codecs

  • MPEG-2 is commonly used for DVDs and ATSC broadcasts. MPEG-2 doesn’t have particularly efficient compression, so unless you’re creating a playable DVD or broadcasting over the air, you probably won’t create MPEG-2 video files.
  • MPEG-4 ASP is a much more interesting format, and it commonly goes by the names XviD or DivX. These codecs provide reasonably good compression, wide compatibility, and a reasonable CPU load for encoding/decoding. DivX is owned by DivX, Inc., while XviD is released under the GNU GPL.
  • H.264 is also known as MPEG-4 AVC. H.264 is a more advanced video codec than ASP/XviD (hence the AVC acronym), and provides better compression. How much better? Don’t quote me on this, but IIRC it provides about 35-50% better quality for the same file size. Unfortunately, it has two disadvantages when compared to MPEG-4 ASP. First, H.264 decoders are a little less common than XviD decoders, so compatibility isn’t quite as widespread. Second, encoding and decoding H.264 is more processor intensive than XviD. That said, H.264 is probably the best quality video codec on the market today, or at least is tied for this honor.
  • H.263 is, unfortunately, not just an incremental step down from H.264. It is about 8 years older, and provides far worse quality. This is unfortunate, because H.263 is used in at least two prominent places. First, most Flash video is encoded with H.263, including just about everything on YouTube and MySpace. Second, H.263 is often used by cell phones to play or capture video. Despite the low quality, H.263 makes perfect sense if you want to put video on the web and don’t want to use the codec that will be described next – VP6.
  • VP6 is a proprietary codec developed by On2 Technologies. VP6 has two major advantages. First, it offers great compression, comparable to H.264. Second, VP6 is used by Flash 8, so it is a great candidate for video that will be played back on the web. The downside is that VP6 can’t be encoded for free; it requires commercial software, like Flash 8, or the On2 Flix Engine. But this may actually not be a downside, as I’ll discuss in a future article on licensing and royalties.
  • WMV describes several Microsoft codecs. Confusingly, WMV 7 is also known as WMV1, WMV 8 is also known as WMV2, and WMV 9 is also known as WMV3. (The latter name refers to the codec’s FourCC code, and I think the former name refers to a corresponding version of Windows Media Player.) These codecs are not bad, and with Flip4Mac, they are no longer restricted to Windows machines.
  • Theora is a truly open-source video codec based on On2’s VP3 codec, as a part of the Ogg project. It provides comparable quality to MPEG-4 ASP (e.g. XviD), and it is BSD-licensed. Did I mention that it is open-source (unlike any of the MPEG, Microsoft, or On2 codecs)? I’ll discuss this in more detail in a future post.

Audio codecs

  • MP3 is probably the most famous codec of all time. It is a part of the MPEG-1 standard (MPEG-1 audio layer 3). This codec provides adequate quality at a bitrate of about 128kbps. However, distribution of MP3-encoded content is not free, as will be discussed in a later post.
  • For this reason, AAC is a better audio codec than MP3 for most uses. AAC-encoded content is free to distribute, which is one of the reasons why iTunes chose AAC for its content. AAC also provides better compression than MP3 – 96kbps AAC is generally considered equal to 128kbps MP3.
  • Vorbis is an Ogg audio codec that is similar in quality to MP3, or perhaps slightly better. Like Theora, it is truly free (unlike AAC or MP3).
  • FLAC and Apple Lossless are two lossless codecs that provide about 50% size savings compared to uncompressed audio. FLAC is part of the Ogg project, and I’m not sure who developed Apple Lossless.
  • AC-3 is the Dolby Digital audio codec that can store 5.1 channel audio.
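
To put the MP3 and AAC bitrates above in perspective: the size of an encoded stream is roughly its bitrate multiplied by its duration. A quick sketch in Ruby follows. The method name is made up, a constant bitrate is assumed, and container overhead is ignored.

```ruby
# Approximate size in bytes of a stream encoded at a constant bitrate.
# kbps here means 1000 bits per second; 8 bits make one byte.
def stream_size_bytes(kbps, seconds)
  kbps * 1000 * seconds / 8
end

four_minutes = 4 * 60
mp3 = stream_size_bytes(128, four_minutes)
aac = stream_size_bytes(96, four_minutes)

puts "128kbps MP3: #{mp3} bytes"
puts " 96kbps AAC: #{aac} bytes"
```

For a four-minute track that works out to 3,840,000 bytes for the MP3 and 2,880,000 bytes for the AAC file, a 25% saving if the two really do sound equivalent.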

Formats

  • MOV is the Quicktime container format. It is a reasonably good format, with pretty widespread compatibility (every computer with iTunes installed also has Quicktime) and decent codec support. One major advantage of Quicktime is that MOV files can be played back in many browsers (though not all).
  • MP4 (MPEG-4 part 14) is based on Quicktime and is very similar, but it supports some codecs and encoding options not supported by MOV (like advanced H.264 profiles). Most media players can play MP4, though not in a web browser.
  • AVI is an old format (1992!) that is somewhat outdated today. It has pretty good codec support, though it isn’t able to handle some modern codecs very efficiently, including XviD and H.264. Ask yourself why you want to use AVI instead of a more modern format.
  • ASF is a proprietary Microsoft format. If you’re primarily using a Microsoft codec and Microsoft players, ASF may be a good option.
  • OGG is a free, open-source format released under the BSD license that typically is used with Vorbis, Theora, FLAC, and other Ogg codecs. However, the Ogg container format can wrap other formats (like MP3 and various MPEG-4 codecs).
  • FLV is the Flash Video format, and offers by far the best web browser compatibility – likely in the 99% range for folks with up-to-date computers. Of course, there are several versions of Flash Player, and so if you’re on the bleeding edge, the number may be lower. Unfortunately, FLV only supports two codecs: H.263 (bad) and On2 VP6 (great but commercial). That said, if the web is your medium, FLV is the obvious choice.

So what should I use?

That depends on what you want to do, of course. If you want to put video on the web, there is one easy option and two good options.

  • Easy web video: FLV/H.263. Low quality, high compatibility, easy to produce. A fine option when your input quality is low, or quality doesn’t matter, but otherwise not recommended.
  • Good web video (1): MOV/H.264. Great quality, decent compatibility, somewhat complex to produce. H.264 lends itself to a really high degree of optimization, and MOV is picky about its H.264. I guess Apple wants people to create their H.264 videos using Quicktime Pro, so other H.264 encoders have trouble with certain settings. Quicktime Pro creates beautiful H.264 videos, but this is a Mac-only solution, and integrating with the Quicktime API is more difficult than integrating with ffmpeg for server-side solutions.
  • Good web video (2): FLV/VP6. Great quality, high compatibility, not free. On2 Flix Engine is fairly affordable for businesses, but cost-prohibitive for individuals. This is an attractive option for many businesses, with no downside that I can see except for the up-front cost.

If you aren’t producing video for the web, there are many more options. Ogg/Theora is a great option if you want good quality and royalty-free distribution. MP4/H.264 and MP4/XviD are also good options, though not truly free. WMV 9 (WMV3) could be valid for Windows applications. There are more options and fewer constraints for non-web video, and so it is hard to make generic recommendations.

In my next post, I’ll discuss tools used in video transcoding. Stay tuned!