(This is Part 2 in a multi-part series on video transcoding and the web. For the rest of the series, take a look at the first post.)
There are a lot of tools that can be used to transcode video and audio. These fall into four main categories.
First are desktop applications that can “Export” to another format. iMovie, Final Cut, Premiere, Windows Movie Maker, Flash, and a host of other applications fall into this category. These tools are great for personal, low-volume use, but they require a person to manually load up each job, so they aren’t suitable for integration with another application.
Second are enterprisey apps that primarily do video transcoding, with a GUI frontend to make things easy. These applications watch an input folder, apply a recipe (like “AVI/MPEG-4, 2000kbps”), and send the file to an output folder. Telestream FlipFactory is an example. I haven’t actually used FlipFactory, though I’ve talked with people who say that it offers everything you’ve come to expect in enterprise software. (Ahem.)
Apple Compressor is a lightweight version of this, with a low price, poor documentation, a non-existent user community, slow transcoding speeds, and good quality output. Compressor has some nice built-in workflow and distribution tools. From what I understand, Compressor is essentially a GUI on top of the QuickTime transcoding libraries.
Third are asset management tools that do video transcoding as one component among many. These are kind of like FlipFactory (and several actually make use of FlipFactory), but they have a lot of other things tacked on as well. If you need an asset management tool, this might be a good option. But if you want a website that transcodes and publishes user-submitted videos, these tools are like buying a PC for its calculator app.
Fourth and finally are command-line tools that do video/audio encoding, and nothing else. Examples include ffmpeg, mencoder, and On2 Flix Engine. These tools take an input file and a lot of other options and create an output file. Most are open-source. Most (all?) run on Unix/Linux.
It is probably clear that I’m most interested in the fourth category of tools (though Apple Compressor is at least moderately interesting). The first category isn’t suitable for building a robust system, and the second and third categories are expensive and heavyweight. ffmpeg and mencoder are free, and On2 is relatively cheap, and all three do a good job. So let’s explore them in a bit more detail.
Overview of tools
ffmpeg and mencoder
Notice that I’ve lumped these together. That is because ffmpeg and mencoder are kind of like C++ and Objective-C; they do the same thing, in similar ways, with the same foundation, though they aren’t identical. Both ffmpeg and mencoder use the libavcodec and libavformat libraries for the bulk of their codecs and formats. (libavcodec and libavformat are part of the ffmpeg project, and are LGPL’d.)
I’m not an expert on these tools, and I haven’t used ffmpeg and mencoder enough to recommend one over the other. They have their differences; mencoder, for example, supports more powerful filters and advanced options. But I’ve used both successfully, and they are ultimately quite similar. (If you have extensive experience with both of these, and want to weigh in on the matter, leave a comment below.)
On the whole, these are two of the most impressive open source projects I’ve ever seen. They are powerful and reliable, with good speed, good quality, and a huge range of supported formats.
ffmpeg and mencoder have one downside: support. The easy things are easy, but the hard things are painful, and you’re basically on your own when it comes to figuring things out. Don’t look for a good manual, or a corporate support contract, or even a few local experts you can hire. The experts are young hackers from around the globe who hang out at http://forum.doom9.org or on the various mailing lists. Also, as good as they are with the complexities of ffmpeg and mencoder, their primary use case is ripping and archiving DVDs, not building robust video transcoding systems.
This doesn’t just mean that you won’t squeeze every last drop of power out of ffmpeg. It also means that you will run into strange errors, and your only source of help will be mailing lists and deep Google searching.
Here are a few code examples that illustrate the range of complexity behind these tools. The following command transcodes an input file to MPEG-4 AVC:
ffmpeg -i matrix.mov -vcodec h264 -ab 128 -s 720x304 -r 23.98 matrix-h264.mp4
The following command also transcodes an input file to MPEG-4 AVC, but does a better job:
ffmpeg -y -i matrix.mov -v 1 -threads 1 -vcodec h264 -b 500 -bt 175 -refs 2 -loop 1 -deblockalpha 0 -deblockbeta 0 -parti4x4 1 -partp8x8 1 -partb8x8 1 -me full -subq 6 -brdo 1 -me_range 21 -chroma 1 -slice 2 -max_b_frames 0 -level 13 -g 300 -keyint_min 30 -sc_threshold 40 -rc_eq 'blurCplx^(1-qComp)' -qcomp 0.7 -qmax 35 -max_qdiff 4 -i_quant_factor 0.71428572 -b_quant_factor 0.76923078 -rc_max_rate 768 -rc_buffer_size 244 -cmp 1 -s 720x304 -acodec aac -ab 64 -ar 44100 -ac 1 -f mp4 -pass 1 matrix-h264.mp4
ffmpeg -y -i matrix.mov -v 1 -threads 1 -vcodec h264 -b 500 -bt 175 -refs 2 -loop 1 -deblockalpha 0 -deblockbeta 0 -parti4x4 1 -partp8x8 1 -partb8x8 1 -me full -subq 6 -brdo 1 -me_range 21 -chroma 1 -slice 2 -max_b_frames 0 -level 13 -g 300 -keyint_min 30 -sc_threshold 40 -rc_eq 'blurCplx^(1-qComp)' -qcomp 0.7 -qmax 35 -max_qdiff 4 -i_quant_factor 0.71428572 -b_quant_factor 0.76923078 -rc_max_rate 768 -rc_buffer_size 244 -cmp 1 -s 720x304 -acodec aac -ab 64 -ar 44100 -ac 1 -f mp4 -pass 2 matrix-h264.mp4
Note that the command appears twice, identical except for the final -pass 1 and -pass 2; this is a two-pass job. Two-pass encoding can create more efficient files, since the second pass uses the statistics gathered during the first pass to decide where to spend its bits.
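Since the two passes share every option except the pass number, the repetition is easy to script. Here is a minimal shell sketch of that idea, not a definitive recipe. The names two_pass and DRY_RUN are my own inventions for illustration, and the options in the usage comment are copied from the commands above, so they may not match what your ffmpeg build accepts.

```shell
# two_pass INPUT OUTPUT [ffmpeg options...]
# Hypothetical helper: runs ffmpeg twice with the same options,
# changing only the value given to -pass.
# Set DRY_RUN=1 to print the commands instead of running them.
two_pass() {
    input=$1
    output=$2
    shift 2
    for pass in 1 2; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Printed form is for inspection only; quoting is not preserved.
            printf '%s\n' "ffmpeg -y -i $input $* -pass $pass $output"
        else
            # Stop right away if a pass fails; pass 2 needs the stats from pass 1.
            ffmpeg -y -i "$input" "$@" -pass "$pass" "$output" || return 1
        fi
    done
}

# Usage, with a shortened version of the options shown earlier:
#   two_pass matrix.mov matrix-h264.mp4 -vcodec h264 -b 500 -s 720x304 -f mp4
```

Like the commands above, this writes both passes to the same output file and relies on -y to overwrite it on the second pass.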
ffmpeg and mencoder are great, but they don’t stand on their own – just like Linux is great, but it doesn’t do much without GNU tools. ffmpeg and mencoder handle dozens of codecs and libraries, but several common (and important) libraries are handled elsewhere. Want H.264? Use x264. What about MP3? LAME mp3. AAC? faac and faad. These programs can run on their own, or support can be compiled directly into mencoder and ffmpeg. The latter option is usually preferable to keep things simple, but of course, this doesn’t always work. You may need to transcode in multiple steps, because ffmpeg doesn’t support your desired codec, or because you can get better quality by doing things separately. For this reason, most of these programs can output to a pipe and take input from a pipe. Otherwise, you can just create temporary files for the intermediate steps.
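To make the pipe-versus-temporary-file choice concrete, here is a rough sketch that pulls the audio out of a video with ffmpeg and hands it to LAME, once through a pipe and once through an intermediate file. The function names are hypothetical. It assumes ffmpeg and lame are both on your PATH, and that LAME picks up the WAV format from the file header (which I believe it does, but check your version).

```shell
# Hypothetical helpers: decode audio with ffmpeg, encode it with LAME.

# Pipe version: no intermediate file is written.
audio_via_pipe() {
    # -vn drops the video stream; "-f wav -" writes WAV data to stdout.
    # LAME reads from stdin when its input file is "-".
    ffmpeg -i "$1" -vn -f wav - | lame - "$2"
}

# Temp-file version: same two steps, with a file in between.
audio_via_tempfile() {
    tmp_wav=$(mktemp "${TMPDIR:-/tmp}/audio.XXXXXX") || return 1
    # -y lets ffmpeg overwrite the empty file that mktemp just created.
    ffmpeg -y -i "$1" -vn -f wav "$tmp_wav" && lame "$tmp_wav" "$2"
    rc=$?
    rm -f "$tmp_wav"   # clean up whether or not the encode succeeded
    return $rc
}

# Usage:
#   audio_via_pipe matrix.mov matrix.mp3
#   audio_via_tempfile matrix.mov matrix.mp3
```

The pipe version saves disk space and a cleanup step. The temp-file version is easier to debug, since you can keep the intermediate file around and inspect it when something goes wrong.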
There are also many tools that do small, specialized tasks, and which are not integrated with mencoder and ffmpeg. Want to export your video as raw YUV frames? Want to repackage a mp4 file as mov? Want to add metadata to a file? Mencoder may be up to the task, but if it isn’t, there are dozens of small tools that can be used to do these things.
On2 Flix Engine
On2 Flix Engine is a commercial video transcoder that outputs VP6 files. This is good: VP6 is comparable to H.264 in terms of quality and efficiency. It works similarly to ffmpeg and mencoder – install it, run a binary with options (input file, output file, quality, resolution, etc.), and that’s it.
I haven’t used On2 Flix Engine extensively, but I want to. Why? Because it outputs high-quality FLV files. This means that On2 offers extremely high quality and extremely high compatibility, unlike FLV/H.263 (high compatibility, low quality) or H.264 (high quality, lower compatibility). So it makes a lot of sense for distributing files over the web.
Since I haven’t used On2 much, I asked a friend of mine who has used it extensively, Matt Bauer, for his experience. He posted his thoughts in this article on On2. He’s also working on Ruby bindings for On2, which is very cool.
What about Ruby?
This series is, at least in part, a discussion of how to create a video transcoding system using Ruby. So does Ruby have a place in low-level video transcoding? Short answer: no. Video transcoding is time consuming and processor intensive. Ruby is slow. Bad combination. That doesn’t mean that Ruby isn’t suitable for a high level video transcoding system, as we’ll see in the next post; Ruby is a near-perfect language for gluing together a transcoding system. But for actually decoding a file and reencoding it in another format: stick to C. All of the tools I’ve discussed were written in C, with maybe some rogue C++ or Objective-C here and there. And actually, there is an x264 parallel encoder known as x264-farm that is written in OCaml, which is pretty sweet. But these are all really fast languages, and Ruby is not.
So let’s leave Ruby for our controller code. Stay tuned.