Zencoder Sneak Peek, Part 3: Scaling and Reliability

Posted by Jon
on Monday, February 04

It’s been 2 months since I’ve posted about Zencoder. You may be glad to know that the project is still alive and is nearing completion. It has moved a little slower than we initially expected (we had hoped to finish in Nov/Dec), but the end is in sight. So if you are interested in a complete video transcoding solution for your web application, request more info at Zencoder.tv. And if you’ve inquired in the past, and you are interested in a demo, let us know: we’re starting our online demos this week.

But on to the sneak peek.

Scaling

Zencoder scales both horizontally and vertically to a nearly arbitrary limit. You can scale vertically (obviously) by increasing the speed of your machines: the faster the CPUs, the faster the transcoding, and any Linux or OS X machine should work. And you can scale horizontally by increasing your number of servers, as high as your licensing allows.

You can scale the same way on EC2, by moving from small instances to large instances, or by increasing your number of instances.

What’s important is that Zencoder works the same whether you run it with 2 servers or with 200 (or more). The system will happily distribute your jobs across however many servers you have. (Note that it will do this by distributing each job to a different server, not one job across many servers – the latter approach can finish each individual job more quickly, but the Zencoder approach should have a somewhat higher overall throughput, since it is a bit more efficient. And it is significantly simpler, of course.)

This means that Zencoder can grow with your application. If you underestimate your traffic, or your site grows exponentially, just add another server or fire up a new EC2 instance.

Finally, you don’t even need to restart Zencoder to add additional servers, so expansion is almost completely seamless. If you’re running on dedicated hardware, just configure Zencoder on a new server and start it up: it will begin working immediately. On EC2, things are even easier: click the big green “Launch new worker” button, and you’ll have a new server up and running in about 3 minutes.

Reliability

On the reliability front, Zencoder is built with three goals in mind.

First, errors should be few and far between. This goal is obvious, and is shared with most software (except for some software coming out of Redmond, WA). And Zencoder is a reliable system in this respect. The code is reasonably compact, and it is well tested (both by unit tests and by humans).

Second, if any (or all) pieces become unavailable, your website should continue to work. In other words, if a user uploads a video at your site, and Zencoder is unavailable for some reason, your website should continue functioning as-is. Similarly, if your site is down, and Zencoder finishes a job, your website should receive the job results as soon as it comes back up. As a result, we have designed the system such that you can pull the plug on any piece – or multiple pieces – and as soon as everything is back online, the system picks up right where it left off.

Third, no job should ever get lost in the system. This is accomplished through integrity checks – between the queue and each transcoding node, and between the Zencoder system and your application. You can even wipe the Zencoder queue, and if your application is still waiting on any jobs, they will be automatically recreated.
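On the application side, that integrity check can be as simple as a set difference. The Ruby sketch below is hypothetical, not Zencoder's actual code: it assumes your application can list the job ids it is still waiting on and the queue can list the ids it holds, and it yields each missing id to a block that recreates the job.

```ruby
require "set"

# Hypothetical integrity check (not Zencoder's implementation): find the job
# ids the application is still waiting on that the queue no longer holds,
# and yield each one to a block that recreates the job.
def reconcile(app_pending_ids, queue_ids)
  known = queue_ids.to_set # snapshot, so recreating jobs doesn't affect the scan
  missing = app_pending_ids.reject { |id| known.include?(id) }
  missing.each { |id| yield id } if block_given?
  missing
end
```

Run periodically (the schedule is up to you), a pass like this is enough to refill a wiped queue with the jobs your application still expects.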

The result is that Zencoder is almost a self-healing system. It isn’t perfect (yet), but it is robust, reliable, and scalable.

Zencoder Sneak Peek, Part 2: Licensing and Codecs

Posted by Jon
on Thursday, November 15

So how does Zencoder do its transcoding? What codecs and formats does it support? And what about the critical issue of codec licensing?

Licensing

Let’s start with the question of licensing. Codecs are licensed in three main ways.

  1. Commercial software (On2, Nellymoser)
  2. Patents and standards requiring licenses (MPEG-2, AAC, MP3)
  3. Free or unenforced formats and codecs (Ogg, Matroska, AVI)

If you’re going to transcode and distribute video, category 3 is easy. Category 1 is straightforward, and these commercial codecs work just fine with Zencoder.

Category 2 poses the real problem. In my estimation, three of the most important video codecs – MPEG-4, H.264, and MPEG-2 – and three of the most important audio codecs – AAC, MP3, and AMR – all require licensing directly from their patent holders and/or licensing authorities. Those six invaluable codecs are represented by four separate licensing bodies (representing hundreds of patent holders), each of which has separate terms, prices, minimum fees, etc. And when you move to the second tier of codecs, things get even more complicated.

Zencoder takes care of this for you. With Zencoder, you receive licenses to these and other critical codecs. We’re still working out the specifics, but when Zencoder launches, we plan on including licenses to 10-20 patented codecs and formats, covering all of the major ones and several in the second or third tier.

Compatibility

Zencoder can handle every major video codec, audio codec, and container format. And we do this legally, taking care of most licensing issues for you.

Here is a partial list of formats and codecs that we support.

  • H.264
  • FLV
  • MP4
  • 3GP
  • VP6 (with commercial license)
  • MP3
  • AAC
  • MPEG-4 Video, XviD
  • MPEG-2
  • AVI
  • Ogg, Theora, Vorbis
  • Nellymoser (with commercial license)
  • And 100+ more

The goal is to decode everything that you can throw at Zencoder, and to encode everything that you have a good reason to encode.

Zencoder Sneak Peek, Part 1: Integration

Posted by Jon
on Wednesday, October 31

Slantwise has been working on a video processing product called Zencoder for the last several months. Zencoder is a distributed multimedia processing system, written in Ruby, that is powerful, flexible, and scalable. It also integrates easily with any application that needs video transcoding. The product itself is in its final stages of testing, and will launch soon, so head over to http://zencoder.tv and fill out the contact form if you want more info.

In this first “sneak peek” post, we’ll look at Zencoder from a high level. Where does it run? How does it integrate with your application?

Two Hosting Options

Zencoder is a server, or collection of servers, that you host. The first server is the “mind” of Zencoder, which queues jobs and handles all communication with your application. The other servers are “workers” that do the actual video transcoding.

You can host Zencoder on your dedicated hardware or on Amazon EC2. Both work equally well, but we think that the EC2 approach is especially interesting, and Zencoder does some cool things with EC2. (More on that in a few weeks.) But you can mix-and-match EC2 instances and dedicated servers, so you don’t have to choose. You could use dedicated hardware for most of the transcoding work, and use EC2 for overflow work. Or use EC2 for most of your work, but one ultra-fast dedicated machine for high-priority jobs.

Easy Integration

Zencoder integrates via REST web services, so it is completely technology-agnostic. Your application can be written in Java, Ruby, PHP, .NET, Python, Erlang, Ada, or Assembly, as long as it can communicate via HTTP. It can be a web application, server software, or anything else that can send and receive HTTP requests.

To submit a job to Zencoder, you just need to send a POST request to your Zencoder server with the basic details of the job: input file, job id, and task/recipe. Zencoder will then do its work and send a request back to your application with the results of the job: status, output file attributes, output file location.
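Here is a minimal sketch of that request in Ruby, using only the standard library's Net::HTTP. The host, the /jobs path, and the field names (input_file, job_id, recipe) are assumptions for illustration, not Zencoder's documented API.

```ruby
require "net/http"
require "uri"

# Hypothetical endpoint: the host, path, and field names are assumptions.
ZENCODER_JOBS_URL = URI("http://zencoder.example.com/jobs")

# Build a form-encoded POST carrying the basic job details:
# input file, job id, and the task/recipe to apply.
def build_job_request(input_file:, job_id:, recipe:, url: ZENCODER_JOBS_URL)
  request = Net::HTTP::Post.new(url)
  request.set_form_data(
    "input_file" => input_file,
    "job_id"     => job_id.to_s,
    "recipe"     => recipe
  )
  request
end

# Open a connection to the Zencoder server and send the job.
# Returns the Net::HTTPResponse; the results arrive later via a callback.
def submit_job(url: ZENCODER_JOBS_URL, **details)
  request = build_job_request(url: url, **details)
  Net::HTTP.start(url.host, url.port) { |http| http.request(request) }
end
```

Keeping request construction separate from sending makes the first half easy to test without a running Zencoder server.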

So your application only needs to do two things to integrate with Zencoder:

  • Send an HTTP request to your Zencoder server when you want to trigger a new transcoding job
  • Handle an HTTP request from Zencoder with the results of a job
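The second half, handling the callback, can be sketched as a plain Ruby method that you'd call from whatever controller or handler receives the request. It assumes a form-encoded body with hypothetical job_id, status, and output_url fields, and keeps jobs in a Hash so the example is self-contained.

```ruby
require "uri"

# Outcome of a job as reported by the (hypothetical) callback.
JobResult = Struct.new(:job_id, :status, :output_url)

# Parse a form-encoded callback body and record the result.
# `jobs` maps the ids your application is waiting on to their results (nil = pending).
# Returns an [http_status, body] pair for your web framework to send back.
def handle_zencoder_callback(body, jobs)
  params = URI.decode_www_form(body).to_h
  job_id = params["job_id"]
  status = params["status"]
  return [400, "missing job_id or status"] if job_id.nil? || status.nil?
  return [404, "unknown job"] unless jobs.key?(job_id)

  jobs[job_id] = JobResult.new(job_id, status, params["output_url"])
  [200, "ok"]
end
```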

We also provide a Ruby on Rails plugin that handles both of these steps, so if you’re running Rails, the integration work is done for you. You just have to install the plugin and configure a few things, and your application is ready to go. Eventually, we plan on providing similar integration classes for Java and PHP. But even if you aren’t using one of these technologies, the integration work is pretty simple on your end, and we’ll help you through it.

RVideo 0.9 is now available

Posted by Jon
on Tuesday, October 02

RVideo is now available as a Ruby gem. Install with:

sudo gem install rvideo

(RVideo depends on other tools for transcoding, like ffmpeg, so you’ll probably need to install a few other things as well. See the Documentation for a little more detail.)

I’ve tagged this release as 0.9.0. It is still beta-quality code, so test thoroughly. If you run into problems, let me know – I’ll be deploying RVideo to a live app soon, so I want to squash any bugs as much as you do. :)

What is it?

RVideo is a Ruby library for video/audio transcoding. It provides a clean Ruby interface to transcoding tools like ffmpeg, and can easily be extended to support more tools. At this point, only ffmpeg and flvtool2 are supported, but more will follow.

transcoder.execute(recipe, {:input_file => "/path/to/input.mp4",
      :output_file => "/path/to/output.flv", :resolution => "640x360"})

Details

To inspect a file, initialize an RVideo::Inspector object. See the documentation for details.

A few examples:

  file = RVideo::Inspector.new(:file => "#{APP_ROOT}/files/input.mp4")

  file = RVideo::Inspector.new(:raw_response => ffmpeg_inspection_response)

  file = RVideo::Inspector.new(:file => "#{APP_ROOT}/files/input.mp4",
                                :ffmpeg_binary => "#{APP_ROOT}/bin/ffmpeg")

  file.fps        # => "29.97"
  file.duration   # => "00:05:23.4"

To transcode a video, initialize a Transcoder object.


  transcoder = RVideo::Transcoder.new

Then pass a recipe and valid options to the execute method:

  recipe = "ffmpeg -i $input_file$ -ar 22050 -ab 64 -f flv -r 29.97 -s"
  recipe += " $resolution$ -y $output_file$"
  recipe += "\nflvtool2 -U $output_file$"
  begin
    transcoder.execute(recipe, {:input_file => "/path/to/input.mp4",
      :output_file => "/path/to/output.flv", :resolution => "640x360"})
  rescue TranscoderError => e
    puts "Unable to transcode file: #{e.class} - #{e.message}"
  end
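The recipe uses $name$ placeholders that get filled in from the options hash before each line runs as a command. A minimal sketch of that substitution step follows; it illustrates the technique and is not RVideo's own code, and the RecipeInterpolationError class is hypothetical. It also rejects a stray `$`, which catches typos like `$output_file` with no closing `$`.

```ruby
# Hypothetical error class for this sketch (not part of RVideo).
class RecipeInterpolationError < StandardError; end

PLACEHOLDER = /\$(\w+)\$/

# Replace each $name$ in a recipe with options[:name].
def interpolate_recipe(recipe, options)
  # A "$" left over after removing well-formed placeholders means a typo
  # such as "$output_file" (missing the closing "$").
  if recipe.gsub(PLACEHOLDER, "").include?("$")
    raise RecipeInterpolationError, "unterminated placeholder in recipe"
  end

  recipe.gsub(PLACEHOLDER) do
    key = Regexp.last_match(1).to_sym
    unless options.key?(key)
      raise RecipeInterpolationError, "no value given for $#{key}$"
    end
    options[key].to_s
  end
end
```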

If the job succeeds, you can access the metadata of the input and output files with:

  transcoder.original     # RVideo::Inspector object
  transcoder.processed    # RVideo::Inspector object

Even if the transcoding succeeds, the output file may still have problems. RVideo will populate an errors array if the duration of the processed video differs from the duration of the original video, or if the processed file is unreadable.
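Here is a hedged sketch of that kind of check. It converts HH:MM:SS.s durations, like the one the inspector reports, into seconds and compares them; the method names and the one-second tolerance are assumptions, since the exact threshold RVideo uses isn't stated here.

```ruby
# Convert an "HH:MM:SS.s" duration string into seconds.
def duration_in_seconds(duration)
  hours, minutes, seconds = duration.split(":")
  hours.to_i * 3600 + minutes.to_i * 60 + seconds.to_f
end

# Hypothetical check in the spirit of RVideo's: flag an unreadable output
# (no duration available) or a duration that drifts by more than `tolerance`
# seconds. The 1-second default is an assumption.
def output_errors(original_duration, processed_duration, tolerance: 1.0)
  return ["processed file is unreadable"] if processed_duration.nil?

  drift = (duration_in_seconds(original_duration) - duration_in_seconds(processed_duration)).abs
  drift > tolerance ? ["duration mismatch: #{original_duration} vs #{processed_duration}"] : []
end
```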

RVideo supports any transcoding tool with a command-line interface; adding a new tool just means writing a class for the tool that subclasses RVideo::AbstractTool. It also means that you need to use common sense to avoid attacks. For example: don’t run RVideo as a privileged user. Control your input recipes, and don’t accept user-submitted recipes. (RVideo is pretty well protected from these problems; you can’t execute a command that isn’t identified by a transcoder tool class, so `rm -rf *` won’t work. But it pays to be cautious.)
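The whitelisting idea can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not RVideo's implementation: ALLOWED_TOOLS, UnknownToolError, and recipe_commands are hypothetical names, and the recipe is assumed to be interpolated already. Each line is split into an argument array, the first word must be a known tool, and the arrays are meant to be run with system(*argv), which bypasses the shell so characters like `;` and `|` reach the tool as literal arguments.

```ruby
require "shellwords"

# Hypothetical whitelist; RVideo's real list comes from its tool classes.
ALLOWED_TOOLS = %w[ffmpeg flvtool2].freeze

class UnknownToolError < StandardError; end

# Split an (already interpolated) recipe into argument arrays, rejecting any
# line whose command isn't whitelisted. Run each array with system(*argv),
# which execs the tool directly instead of going through /bin/sh.
def recipe_commands(recipe)
  recipe.each_line.map(&:strip).reject(&:empty?).map do |line|
    argv = Shellwords.split(line)
    unless ALLOWED_TOOLS.include?(argv.first)
      raise UnknownToolError, "#{argv.first.inspect} is not a supported tool"
    end
    argv
  end
end
```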

More info

See the RVideo Google Code site for more info, including links to Documentation and a Google discussion group. Use these to file tickets, discuss, etc. (The SVN repository is currently at Rubyforge, but I may move it to Google Code.)

Contribute

I would love help on this project. If you want to help out, there are a few things you can do.

  • Use, test, and submit bugs/patches
  • We need an RVideo::Tools::Mencoder class to add mencoder support. (Someone has started on this, so let me know if you’re interested in helping and I’ll put you in touch.)
  • Other tool classes would be great – On2, mp4box, Quicktime (?), etc.
  • Eventually, it would be great to (optionally) use the processing feedback provided by ffmpeg etc. to get real-time progress updates (e.g. 20% complete, 40% complete, 90% complete). (More info)
  • Submit other fixes, features, optimizations, and refactorings
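For the progress-update idea, the raw material is the status line ffmpeg prints to stderr while it works, which includes a time= field. Below is a sketch of turning that into a percentage, assuming you already know the input's total duration in seconds. The method name is hypothetical, and because the time= format has varied between ffmpeg versions, it accepts both plain seconds and HH:MM:SS.ss.

```ruby
# Matches "time=00:00:04.10" (captures 1-3) or "time=4.10" (capture 4).
TIME_FIELD = /time=\s*(\d+):(\d+):(\d+(?:\.\d+)?)|time=\s*(\d+(?:\.\d+)?)/

# Hypothetical helper: percentage complete for one ffmpeg status line,
# or nil if the line carries no time= field. Capped at 100.
def progress_percent(progress_line, total_seconds)
  match = progress_line.match(TIME_FIELD)
  return nil unless match && total_seconds.positive?

  elapsed =
    if match[1]
      match[1].to_i * 3600 + match[2].to_i * 60 + match[3].to_f
    else
      match[4].to_f
    end
  [(elapsed / total_seconds * 100).round, 100].min
end
```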

RVideo is alive - really

Posted by Jon
on Thursday, September 13

It’s been a while since I posted about RVideo. Consider this a pre-release announcement.

RVideo is a Ruby gem that makes video and audio transcoding a bit easier. The gem wraps various video transcoding tools – ffmpeg and flvtool2 upon release. But it is extensible, so that other tools (like mencoder) can be added as needed. Transcoder instructions are specified by passing a recipe to RVideo, along with custom values.

For example:

recipe = "ffmpeg -i $input_file$ -r 29.97 -vcodec xvid -s $resolution$ $output_file$"

transcoder = RVideo::Transcoder.new
transcoder.execute(recipe, {:input_file => "original.mov", 
                            :output_file => "processed.mp4", 
                            :resolution => "640x360"})

The resulting transcoder object will include information about the job, including metadata for the output file. The execute method raises a variety of exceptions when it doesn’t work (e.g. input file not found, unsupported formats, could not save the output file), so you’ll want to wrap #execute in a begin/rescue/end block and handle each exception as needed.

I’ll go into more detail in a later post, so look for more updates soon. Expect the gem to launch sometime in October.

Flash Player adds H.264 video support

Posted by Jon
on Tuesday, August 21

Starting today, with Flash Player 9 Update 3 Beta 2, Flash Player will support H.264 video and AAC audio. This is great news for sites hosting online video. It is the equivalent of Microsoft announcing that IE7 would now render web pages exactly like Firefox. In other words, once this becomes reality, online video sites will really only need to worry about one format.

UPDATE: One of the engineers on Flash Player has posted a detailed account of the new Flash Player. The implementation looks well conceived. As expected (below), you will still need to worry about MPEG licensing issues with H.264.

H.264 is an MPEG-4 video codec that provides the best video compression widely available today. That means that H.264 allows better quality video than other codecs, when comparing files of the same size. AAC is an MPEG-4 audio codec (not an Apple codec!) that is a bit better than MP3 and has better licensing terms. (You have to pay royalties when distributing MP3-encoded content, but you don’t with AAC.) See my earlier post on formats and codecs for more info.

This move is great for online video, and bad for On2. Until now, On2’s VP6 codec was by far the best codec available in the Flash Player. VP6 and H.264 are both good codecs, though H.264 has the edge in my experience. The bigger issue is cost. On2’s Flix Engine software is commercial and isn’t cheap, while free H.264 encoders are available (x264). Expect to see less VP6 content over the next year.

A few caveats, though.

First, H.264 encoders may be free, but H.264 is not (strictly speaking). If you make money from H.264 in one way or another, you’ll need to pay royalties. This is true whether you sell H.264-encoded content, sell an H.264 encoder or decoder, or make money through other means (subscriptions or advertising). Fortunately, there are minimums – for instance, if you have fewer than 100,000 subscribers, you don’t need to pay royalties. See the MPEG-LA FAQ for details. This alone may make On2’s one-time cost an attractive option for some businesses.

There is an outside chance that Adobe may have a licensing arrangement that takes care of this, which would be great for content creators, but don’t count on it.

Second, this Flash 9 update will take time to proliferate. It took 9-12 months for Flash 9 to reach 90% market penetration. So unless you’re willing to force your users to upgrade, don’t drop VP6 or H.263 support today.

Third, H.264-encoded video is more compressed than other video, and so it takes more processor power to watch. Most computers these days are plenty fast for H.264, but some users may see their CPUs spike while watching H.264 video. The good news is that early reports say that the new Flash Player will make use of multiple cores on multi-core processors.

Fourth, H.264 has five levels plus sub-levels. Each level allows for better quality and better compression. What level will Flash Player support? The lower levels (1-1.3) are still better than most competitors, but they don’t make use of the codec’s full potential. The Quicktime format, for instance, only supports level 1.3 (if I remember correctly) – not too shabby, but not as good as the MP4 format. It would be great to see Flash Player support at least level 2 H.264. (More info on levels)

Caveats aside, this sounds like great news for video content creators. Keep watching the wires as more details unfold. And watch for updates to RVideo and Spinoza (our forthcoming video transcoder system), which will support Flash/H.264.

RailsConf slides (video transcoding)

Posted by Jon
on Friday, May 18

Here are the slides from my RailsConf presentation today. I’ve taken out the video; if you’re really interested in seeing it, send me an email. I’ve also created a version without any images at all in order to save space.

Video transcoding slides, with images (4.5M)

Video transcoding slides, no images (207K)

I’d also love your feedback from the talk if you were there.

Video Transcoding, part 3: Asynchronous Processing (overview)

Posted by Jon
on Thursday, May 17

(This is part 3 in a series on video transcoding. Parts 1 and 2 covered video formats and codecs, and transcoding tools, respectively.)

Ruby on Rails is a great technology that gives you a lot for free. You get ORM, MVC, templating, HTML helpers, URL rewriting, database schema management, a development web server, prototype, script.aculo.us, deployment automation, a unit test framework, etc. etc. etc. But what you get falls into two basic categories: tools to handle an HTTP request, and tools to make the application easier to manage. In other words, Rails helps you out when something is either triggered directly by an HTTP request, or when it is triggered directly by a developer. This is because the Rails environment fires up and tears down with every HTTP request.

Unfortunately, a video transcoding system can’t be run in either of these ways.

First, video transcoding can’t be done within the space of an HTTP request. Transcoding jobs can take several minutes (or hours), and you can’t expect your users to wait that long. If a request takes longer than a few seconds, a significant number of users will cancel the request. Beyond this, since Rails isn’t thread-safe, every video transcoding job would cause a Mongrel instance to block.

Second, video transcoding could theoretically be triggered manually by an operator, but this solution isn’t responsive or scalable. What happens when your application takes off, and you have dozens or hundreds of files per day?

In my next few posts, I will examine three approaches to asynchronous processing, and will discuss three ways to host your asynchronous system. The use case will be video transcoding, but the approaches themselves could be used for just about any time-consuming action: expensive calculations, large PDF creation, data processing, etc.

1. Database polling with a daemon – This is the simplest approach, and it isn’t a bad one. All you need is one or more transcoding servers with access to your main Rails application database, and maybe some shared disk space. The transcoding servers query the database for unassigned jobs, and the first server to select the job takes it. Then, when the work is done, the transcoding servers update the database with the new state.

2. Message queue polling – Instead of polling a database, this solution uses a message queue (like Amazon SQS). The base application passes a message to the message queue announcing the new job, and the transcoders poll the queue looking for jobs. This approach is a little more complex than the first approach, but is a little more secure, robust, and scalable.

3. Reactor-pattern system – This solution doesn’t involve polling at all. Instead, a controller accepts jobs and assigns them directly to individual workers. This approach is the most complex of the three, but will satisfy the purist’s desire to avoid polling.
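To make the differences concrete, here are the three worker loops side by side as an in-process Ruby sketch. Nothing here talks to a real database or queue service: JobsTable with a Mutex stands in for the database and its atomic update, Thread::Queue stands in for a queue like SQS, and every class and method name is hypothetical. The part worth noticing in approach 1 is the claim step: selecting a job and then updating it in two separate steps would let two workers grab the same row, so the claim has to be a single atomic operation.

```ruby
# 1. Database polling. JobsTable stands in for the jobs table; claim_next
#    mirrors a conditional update such as
#      UPDATE jobs SET worker = ? WHERE id = ? AND worker IS NULL
#    so two workers can never take the same job.
class JobsTable
  Job = Struct.new(:id, :worker, :state)

  def initialize(ids)
    @jobs = ids.map { |id| Job.new(id, nil, "pending") }
    @lock = Mutex.new # plays the role of the database's atomic update
  end

  # Atomically find an unassigned job and assign it; nil if none are left.
  def claim_next(worker)
    @lock.synchronize do
      job = @jobs.find { |j| j.worker.nil? }
      if job
        job.worker = worker
        job.state = "processing"
      end
      job
    end
  end

  def finish(job)
    @lock.synchronize { job.state = "done" }
  end

  def snapshot
    @lock.synchronize { @jobs.map(&:dup) }
  end
end

# Stops when nothing is pending; a real daemon would sleep and poll again.
def db_polling_worker(table, name)
  while (job = table.claim_next(name))
    table.finish(job) # transcode(job) would run before this
  end
end

# 2. Message queue polling. Stops after a few consecutive empty polls.
def queue_polling_worker(queue, interval: 0.01, max_empty_polls: 3)
  processed = []
  empty_polls = 0
  while empty_polls < max_empty_polls
    begin
      message = queue.pop(true) # non-blocking; raises ThreadError when empty
      empty_polls = 0
      processed << message[:job_id] # transcode(message) would go here
    rescue ThreadError
      empty_polls += 1
      sleep interval
    end
  end
  processed
end

# 3. Reactor pattern. No polling: work moves only on "job submitted" and
#    "worker finished" events, and the controller assigns jobs directly.
class Dispatcher
  attr_reader :assignments

  def initialize(workers)
    @idle = workers.dup
    @backlog = []
    @assignments = [] # [worker, job] pairs in dispatch order
  end

  def job_submitted(job)
    @backlog << job
    dispatch
  end

  def worker_finished(worker)
    @idle << worker
    dispatch
  end

  private

  # Pair idle workers with waiting jobs, oldest job first. A real controller
  # would now send each job to its worker, e.g. over HTTP.
  def dispatch
    until @idle.empty? || @backlog.empty?
      @assignments << [@idle.shift, @backlog.shift]
    end
  end
end
```

In a real deployment, the polling loops would sleep and retry indefinitely, and the dispatcher would sit inside an event loop and talk to its workers over the network.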

I’ll discuss each of these approaches in their own articles in detail. There certainly are other approaches; if you have a favorite that you don’t see here, post a comment.

Finally, I’ll look at three ways to host these approaches: local to your main application, in its own dedicated hosting environment, and using Amazon Web Services.

Stay tuned for the next post. I’m at RailsConf right now, but I’ll try to get something posted within the next few days.

Video transcoding, part 2: tools

Posted by Jon
on Monday, May 07

(This is Part 2 in a multi-part series on video transcoding and the web. For the rest of the series, take a look at the first post.)

There are a lot of tools that can be used to transcode video and audio. These fall into four main categories.

First are desktop applications that can “Export” to another format. iMovie, Final Cut, Premiere, Windows Movie Maker, Flash, and a host of other applications fall into this category. These tools are great for personal, low-volume use, but they require a person to manually load up each job, so they aren’t suitable for integration with another application.

Second are enterprisey apps that primarily do video transcoding, with a GUI frontend to make things easy. These applications watch an input folder, apply a recipe (like “AVI/MPEG-4, 2000kbps”), and send the file to an output folder. Telestream FlipFactory is an example. I haven’t actually used FlipFactory, though I’ve talked with people who say that it offers everything you’ve come to expect in enterprise software. (Ahem.)

Apple Compressor is a lightweight version of this, with a low price, poor documentation, a non-existent user community, slow transcoding speeds, and good quality output. Compressor has some nice built-in workflow and distribution tools. From what I understand, Compressor is kind of like a GUI on the Quicktime transcoding libraries.

Third are asset management tools that do video transcoding as one component among many. These are kind of like FlipFactory (and several actually make use of FlipFactory), but they have a lot of other things tacked on as well. If you need an asset management tool, this might be a good option. But if you want a website that transcodes and publishes user-submitted videos, these tools are like buying a PC for its calculator app.

Fourth and finally are command-line tools that do video/audio encoding, and nothing else. Examples include ffmpeg, mencoder, and On2 Flix Engine. These tools take an input file and a lot of other options and create an output file. Most are open-source. Most (all?) run on Unix/Linux.

It is probably clear that I’m most interested in the fourth category of tools (though Apple Compressor is at least moderately interesting). The first category isn’t suitable for building a robust system, and the second and third categories are expensive and heavyweight. ffmpeg and mencoder are free, and On2 is relatively cheap, and all three do a good job. So let’s explore them in a bit more detail.

Overview of tools

ffmpeg and mencoder

Notice that I’ve lumped these together. That is because ffmpeg and mencoder are kind of like C++ and Objective-C; they do the same thing, in similar ways, with the same foundation, though they aren’t identical. Both ffmpeg and mencoder use the libavcodec and libavformat libraries for the bulk of their codecs and formats. (libavcodec and libavformat are part of the ffmpeg project, and are LGPL’d.)

I’m not an expert on these tools, and I haven’t used ffmpeg and mencoder enough to recommend one over the other. They have their differences (mencoder supports more powerful filters and advanced options), but I’ve used both successfully, and they are ultimately quite similar. (If you have extensive experience with both of these, and want to weigh in on the matter, leave a comment below.)

On the whole, these are two of the most impressive open source projects I’ve ever seen. They are powerful and reliable, with good speed, good quality, and a huge range of supported formats.

ffmpeg and mencoder have one downside: support. The easy things are easy, but the hard things are painful, and you’re basically on your own when it comes to figuring things out. Don’t look for a good manual, or a corporate support contract, or even a few local experts you can hire. The experts are young hackers from around the globe who hang out at http://forum.doom9.org or on the various mailing lists. Also, as good as they are with the complexities of ffmpeg and mencoder, their primary use case is ripping and archiving DVDs, not building robust video transcoding systems.

This doesn’t just mean that you won’t squeeze every last drop of power out of ffmpeg. It also means that you will run into strange errors, and your only source of help will be mailing lists and deep Google searching.

Here are a few code examples that illustrate the range of complexity behind these tools. The following command transcodes an input file to MPEG-4 AVC:


ffmpeg -i matrix.mov -vcodec h264 -ab 128 -s 720x304 -r 23.98 matrix-h264.mp4

The following command also transcodes an input file to MPEG-4 AVC, but does a better job:


ffmpeg -y -i matrix.mov -v 1 -threads 1 -vcodec h264 -b 500 -bt 175 -refs 2 -loop 1 -deblockalpha 0 -deblockbeta 0 -parti4x4 1 -partp8x8 1 -partb8x8 1 -me full -subq 6 -brdo 1 -me_range 21 -chroma 1 -slice 2 -max_b_frames 0 -level 13 -g 300 -keyint_min 30 -sc_threshold 40 -rc_eq 'blurCplx^(1-qComp)' -qcomp 0.7 -qmax 35 -max_qdiff 4 -i_quant_factor 0.71428572 -b_quant_factor 0.76923078 -rc_max_rate 768 -rc_buffer_size 244 -cmp 1 -s 720x304 -acodec aac -ab 64 -ar 44100 -ac 1 -f mp4 -pass 1 matrix-h264.mp4

ffmpeg -y -i matrix.mov -v 1 -threads 1 -vcodec h264 -b 500 -bt 175 -refs 2 -loop 1 -deblockalpha 0 -deblockbeta 0 -parti4x4 1 -partp8x8 1 -partb8x8 1 -me full -subq 6 -brdo 1 -me_range 21 -chroma 1 -slice 2 -max_b_frames 0 -level 13 -g 300 -keyint_min 30 -sc_threshold 40 -rc_eq 'blurCplx^(1-qComp)' -qcomp 0.7 -qmax 35 -max_qdiff 4 -i_quant_factor 0.71428572 -b_quant_factor 0.76923078 -rc_max_rate 768 -rc_buffer_size 244 -cmp 1 -s 720x304 -acodec aac -ab 64 -ar 44100 -ac 1 -f mp4 -pass 2 matrix-h264.mp4

Note that the command occurs twice, almost identically; this is a two-pass job, so the same command is executed twice. Two-pass encoding can create more efficient files, since the second pass learns from the first pass.
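Because the two passes differ only in the -pass value, it's worth generating both commands from one argument list so they can't drift apart. A small Ruby sketch follows; the method names are hypothetical, and the flags are the ones used above for the ffmpeg builds of the time (newer builds spell some options differently).

```ruby
# Build both passes of a two-pass ffmpeg encode from one shared argument list.
def two_pass_commands(input, output, shared_args)
  [1, 2].map do |pass|
    ["ffmpeg", "-y", "-i", input, *shared_args, "-pass", pass.to_s, output]
  end
end

# Run the passes in order (pass 2 reads the statistics pass 1 wrote),
# executing ffmpeg directly rather than through a shell.
def run_two_pass(input, output, shared_args)
  two_pass_commands(input, output, shared_args).each_with_index do |argv, i|
    system(*argv) or raise "ffmpeg pass #{i + 1} failed"
  end
end
```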

Supporting tools

ffmpeg and mencoder are great, but they don’t stand on their own – just like Linux is great, but it doesn’t do much without GNU tools. ffmpeg and mencoder handle dozens of codecs and libraries, but several common (and important) libraries are handled elsewhere. Want H.264? Use x264. What about MP3? LAME mp3. AAC? faac and faad. These programs can run on their own, or support can be compiled directly into mencoder and ffmpeg. The latter option is usually preferable to keep things simple, but of course, this doesn’t always work. You may need to transcode in multiple steps, because ffmpeg doesn’t support your desired codec, or because you can get better quality by doing things separately. For this reason, most of these programs can output to a pipe and take input from a pipe. Otherwise, you can just create temporary files for the intermediate steps.
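Here's what a piped two-step job can look like from Ruby, using Open3.pipeline from the standard library so no shell is involved. ffmpeg decodes the audio to WAV on stdout and LAME encodes it to MP3 from stdin; both tools accept `-` for those streams. The helper names are hypothetical, and you'd need both binaries on your PATH.

```ruby
require "open3"

# Step 1 decodes the audio to WAV on stdout; step 2 reads WAV on stdin and
# writes an MP3. In both tools, "-" means the standard stream.
def mp3_pipeline_commands(input, output)
  [
    ["ffmpeg", "-i", input, "-vn", "-f", "wav", "-"],
    ["lame", "-", output]
  ]
end

# Run the commands connected by pipes (no shell involved) and raise if any
# step exits unsuccessfully. Returns the Process::Status of each step.
def run_pipeline(commands)
  statuses = Open3.pipeline(*commands)
  failed = statuses.each_with_index.reject { |status, _| status.success? }.map { |_, i| i + 1 }
  raise "pipeline step(s) #{failed.join(', ')} failed" unless failed.empty?
  statuses
end
```

Usage would be `run_pipeline(mp3_pipeline_commands("input.mov", "output.mp3"))`; swapping in temporary files instead is just a matter of running the two commands one after the other.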

There are also many tools that do small, specialized tasks, and which are not integrated with mencoder and ffmpeg. Want to export your video as raw YUV frames? Want to repackage an MP4 file as MOV? Want to add metadata to a file? Mencoder may be up to the task, but if it isn’t, there are dozens of small tools that can be used to do these things.

On2 Flix Engine

On2 Flix Engine is a commercial video transcoder that outputs VP6 files. This is good: VP6 is comparable to H.264 in terms of quality and efficiency. It works similarly to ffmpeg and mencoder – install it, run a binary with options (input file, output file, quality, resolution, etc.), and that’s it.

I haven’t used On2 Flix Engine extensively, but I want to. Why? Because it outputs high quality FLV files. This means that On2 offers extremely high quality and extremely high compatibility, unlike FLV/H.263 (high compatibility, low quality) or H.264 (high quality, lower compatibility). So it makes a lot of sense for distributing files over the web.

Since I haven’t used On2 much, I asked a friend of mine who has used it extensively, Matt Bauer, for his experience. He posted his thoughts in this article on On2. He’s also working on Ruby bindings for On2, which is very cool.

What about Ruby?

This series is, at least in part, a discussion of how to create a video transcoding system using Ruby. So does Ruby have a place in low-level video transcoding? Short answer: no. Video transcoding is time consuming and processor intensive. Ruby is slow. Bad combination. That doesn’t mean that Ruby isn’t suitable for a high level video transcoding system, as we’ll see in the next post; Ruby is a near-perfect language for gluing together a transcoding system. But for actually decoding a file and reencoding it in another format: stick to C. All of the tools I’ve discussed were written in C, with maybe some rogue C++ or Objective-C here and there. And actually, there is an x264 parallel encoder known as x264-farm that is written in OCaml, which is pretty sweet. But these are all really fast languages, and Ruby is not.

So let’s leave Ruby for our controller code. Stay tuned.

Video Transcoding, part 1: formats and codecs

Posted by Jon
on Thursday, April 26

(This is Part 1 in a multi-part series on video transcoding and the web. For the rest of the series, take a look at the first post. It was edited on April 27 to include the Ogg format and codecs.)

A typical video file is made up of one or more video streams and one or more audio streams. Each video or audio stream has a codec, and the file itself has a container format.

A codec (compressor/decompressor) is an encoding, implemented in software, that allows a stream to be compressed for storage or transmission, and decoded to a raw or readable format. Most, but not all, codecs are lossy. Examples of familiar audio codecs include MP3 (technically, MPEG-1 Audio Layer III) and AAC (technically, MPEG-4 Part 3). Familiar video codecs include DivX and Xvid (both implementations of MPEG-4 ASP).

A container format (sometimes called a wrapper format) is a file format that can contain video and audio streams (along with other types of streams, like text). Common container formats include MOV (QuickTime) and AVI.

In this post, I will highlight a few interesting codecs and formats. The discussion will be far from comprehensive – there are other good codecs and formats. My goal is to discuss the best ones, but there may be others that I’m not aware of. (Leave a comment if you want to suggest one!)

Formats and Codecs

Video codecs

  • MPEG-2 is commonly used for DVDs and ATSC broadcasts. MPEG-2 doesn’t have particularly efficient compression, so unless you’re creating a playable DVD or broadcasting over the air, you probably won’t create MPEG-2 video files.
  • MPEG-4 ASP is a much more interesting format, and it commonly goes by the names XviD or DivX. These codecs provide reasonably good compression, wide compatibility, and a reasonable CPU load for encoding/decoding. DivX is owned by DivX, Inc., while XviD is GNU licensed.
  • H.264 is also known as MPEG-4 AVC. H.264 is a more advanced video codec than ASP/XviD (AVC stands for Advanced Video Coding), and provides better compression. How much better? Don’t quote me on this, but IIRC it provides about 35-50% better quality for the same file size. Unfortunately, it has two disadvantages compared to MPEG-4 ASP. First, H.264 decoders are a little less common than XviD decoders, so compatibility isn’t quite as widespread. Second, encoding and decoding H.264 is more processor-intensive than XviD. That said, H.264 is probably the best-quality video codec on the market today, or at least is tied for this honor.
  • H.263 is, unfortunately, not just an incremental step down from H.264. It is about 8 years older, and provides far worse quality. This is unfortunate, because H.263 is used in at least two prominent places. First, most Flash video is encoded with H.263, including just about everything on YouTube and MySpace. Second, H.263 is often used by cell phones to play or capture video. Despite the low quality, H.263 makes perfect sense if you want to put video on the web and don’t want to use the codec that will be described next – VP6.
  • VP6 is a proprietary codec developed by On2 Technologies. VP6 has two major advantages. First, it offers great compression, comparable to H.264. Second, VP6 is used by Flash 8, so it is a great candidate for video that will be played back on the web. The downside is that VP6 can’t be encoded for free; it requires commercial software, like Flash 8 or the On2 Flix Engine. But this may not actually be a downside, as I’ll discuss in a future article on licensing and royalties.
  • WMV describes several Microsoft codecs. Confusingly, WMV 7 is also known as WMV1, WMV 8 is also known as WMV2, and WMV 9 is also known as WMV3. (The latter name refers to the codec’s FourCC code, and I think the former name refers to a corresponding version of Windows Media Player.) These codecs are not bad, and with Flip4Mac they are no longer restricted to Windows machines.
  • Theora is a truly open-source video codec developed as part of the Ogg project and based on On2’s VP3 codec. It provides quality comparable to MPEG-4 ASP (e.g. XviD), and it is BSD-licensed. Did I mention that it is open-source (unlike any of the MPEG, Microsoft, or On2 codecs)? I’ll discuss this in more detail in a future post.
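As the list shows, one codec often travels under several names (XviD and DivX for MPEG-4 ASP, MPEG-4 AVC for H.264, WMV1–WMV3 for WMV 7–9). A transcoding system that inspects uploaded files has to cope with this, so a minimal sketch of name normalization follows. The table is hypothetical and contains only the aliases mentioned above; a real system would need many more.

```ruby
# Hypothetical alias table built only from the names mentioned above.
# Keys are lowercased so lookups are case-insensitive.
CODEC_ALIASES = {
  "xvid"       => "MPEG-4 ASP",
  "divx"       => "MPEG-4 ASP",
  "mpeg-4 asp" => "MPEG-4 ASP",
  "h.264"      => "H.264",
  "mpeg-4 avc" => "H.264",
  "h.263"      => "H.263",
  "vp6"        => "VP6",
  "wmv1"       => "WMV 7",   # FourCC code -> Windows Media version
  "wmv2"       => "WMV 8",
  "wmv3"       => "WMV 9",
  "theora"     => "Theora"
}.freeze

# Return the canonical codec name, or the input unchanged if it's unknown.
def canonical_codec(name)
  CODEC_ALIASES.fetch(name.strip.downcase, name)
end
```

Returning unknown names unchanged (rather than raising) keeps the lookup safe to call on whatever a file probe reports; deciding whether an unknown codec is acceptable is left to the caller.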

Audio codecs

  • MP3 is probably the most famous codec of all time. It is a part of the MPEG-1 standard (MPEG-1 audio layer 3). This codec provides adequate quality at a bitrate of about 128kbps. However, distribution of MP3-encoded content is not free, as will be discussed in a later post.
  • For this reason, AAC is a better audio codec than MP3 for most uses. AAC-encoded content is free to distribute, which is one of the reasons why iTunes chose AAC for its content. AAC also provides better compression than MP3 – 96kbps AAC is generally considered equal to 128kbps MP3.
  • Vorbis is an Ogg audio codec that is similar in quality to MP3, or perhaps slightly better. Like Theora, it is truly free (unlike AAC or MP3).
  • FLAC and Apple Lossless are two lossless codecs that provide about 50% size savings compared to uncompressed audio. FLAC is part of the Ogg project, and Apple Lossless was developed by Apple.
  • AC-3 is the Dolby Digital audio codec that can store 5.1 channel audio.
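To put these bitrates in perspective: a constant-bitrate stream takes roughly bitrate × duration ÷ 8 bytes. The sketch below works this through for a 4-minute track, assuming CD-quality source audio (44.1 kHz, 16-bit, stereo) for the uncompressed case and taking the "about 50%" FLAC figure at face value; container overhead is ignored.

```ruby
# Approximate size in bytes of a constant-bitrate stream.
# bitrate is in bits per second (1 kbps = 1,000 bps), duration in seconds.
def stream_bytes(bitrate_bps, seconds)
  bitrate_bps * seconds / 8
end

# Uncompressed PCM bitrate: samples/sec * bits/sample * channels.
def pcm_bitrate(sample_rate, bits_per_sample, channels)
  sample_rate * bits_per_sample * channels
end

duration = 4 * 60                                         # a 4-minute track
mp3  = stream_bytes(128_000, duration)                    # => 3_840_000
aac  = stream_bytes(96_000, duration)                     # => 2_880_000
savings = 100 - aac * 100 / mp3                           # => 25 (percent)
pcm  = stream_bytes(pcm_bitrate(44_100, 16, 2), duration) # => 42_336_000
flac = pcm / 2                                            # => 21_168_000
```

So if 96kbps AAC really matches 128kbps MP3, it saves a quarter of the space, while either lossy codec is roughly an order of magnitude smaller than even the losslessly compressed FLAC version.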

Formats

  • MOV is the QuickTime container format. It is a reasonably good format, with pretty widespread compatibility (every computer with iTunes installed also has QuickTime) and decent codec support. One major advantage of QuickTime is that MOV files can be played back in many browsers (though not all).
  • MP4 (MPEG-4 Part 14) is based on QuickTime and is very similar, but it supports some codecs and encoding options not supported by MOV (like advanced H.264 profiles). Most media players can play MP4, though not inside a web browser.
  • AVI is an old format (1992!) that is somewhat outdated today. It has pretty good codec support, though it isn’t able to handle some modern codecs very efficiently, including XviD and H.264. Ask yourself why you want to use AVI instead of a more modern format.
  • ASF is a proprietary Microsoft format. If you’re primarily using a Microsoft codec and Microsoft players, ASF may be a good option.
  • Ogg is a free, open-source format released under a BSD license that is typically used with Vorbis, Theora, FLAC, and other Ogg codecs. However, the Ogg container format can wrap other codecs too (like MP3 and various MPEG-4 codecs).
  • FLV is the Flash Video format, and offers by far the best web browser compatibility – likely in the 99% range for folks with up-to-date computers. Of course, there are several versions of Flash Player, so if your video requires the newest one, the number may be lower. Unfortunately, FLV only supports two video codecs: H.263 (bad) and On2 VP6 (great but commercial). That said, if the web is your medium, FLV is the obvious choice.
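Because each container only accepts certain codecs, a transcoding system should reject an impossible output request before it starts a long encoding job. Here is a minimal sketch, assuming a hypothetical lookup table that is deliberately incomplete: it lists only the video pairings this post mentions (for example, FLV with H.263 or VP6 only, as described above).

```ruby
# Hypothetical, deliberately incomplete table of video codecs per
# container, containing only the pairings mentioned in this post.
VIDEO_CODECS_BY_CONTAINER = {
  "FLV" => ["H.263", "VP6"],             # the only two FLV video codecs
  "MOV" => ["H.264"],
  "MP4" => ["H.264", "MPEG-4 ASP"],
  "Ogg" => ["Theora"],
  "ASF" => ["WMV 7", "WMV 8", "WMV 9"]
}.freeze

# True only if the container/codec pair appears in the table above;
# unknown containers are treated as supporting nothing.
def valid_output?(container, video_codec)
  VIDEO_CODECS_BY_CONTAINER.fetch(container, []).include?(video_codec)
end
```

Failing closed on unknown containers is a conservative design choice: it is cheaper to reject a job and extend the table than to discover a bad pairing after minutes of encoding.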

So what should I use?

That depends on what you want to do, of course. If you want to put video on the web, there is one easy option and two good options.

  • Easy web video: FLV/H.263. Low quality, high compatibility, easy to produce. A fine option when your input quality is low, or quality doesn’t matter, but otherwise not recommended.
  • Good web video (1): MOV/H.264. Great quality, decent compatibility, somewhat complex to produce. H.264 lends itself to a really high degree of optimization, and MOV is picky about its H.264. I guess Apple wants people to create their H.264 videos using QuickTime Pro, so other H.264 encoders have trouble with certain settings. QuickTime Pro creates beautiful H.264 videos, but this is a Mac-only solution, and integrating with the QuickTime API is more difficult than integrating with ffmpeg for server-side solutions.
  • Good web video (2): FLV/VP6. Great quality, high compatibility, not free. On2 Flix Engine is fairly affordable for businesses, but cost prohibitive for individuals. This is an attractive option for many businesses, with no downside that I can see except for the up-front cost.
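The trade-offs above can be condensed into a tiny decision function. This is one possible reading of the recommendations, not a rule from the post; the keyword names `quality_matters` and `can_pay_for_encoder` are hypothetical labels for the two questions that separate the three options.

```ruby
# One reading of the three web options above, as a decision function.
def web_video_option(quality_matters:, can_pay_for_encoder:)
  # Easy option: low quality, high compatibility, easy to produce.
  return "FLV/H.263" unless quality_matters
  # Great quality and high compatibility, at the cost of a commercial encoder.
  return "FLV/VP6" if can_pay_for_encoder
  # Great quality for free, but harder to produce well.
  "MOV/H.264"
end
```

For example, a business that cares about quality and can afford the Flix Engine gets `"FLV/VP6"`, while an individual with the same quality bar falls through to `"MOV/H.264"`.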

If you aren’t producing video for the web, there are many more options. Ogg/Theora is a great option if you want good quality and royalty-free distribution. MP4/H.264 and MP4/XviD are also good options, though not truly free. WMV 9 (WMV3) could be valid for Windows applications. There are more options and fewer constraints for non-web video, and so it is hard to make generic recommendations.

In my next post, I’ll discuss tools used in video transcoding. Stay tuned!

Video Transcoding: part 0 in an N-part series

Posted by Jon
on Sunday, April 22

(Update: I gave a talk on video transcoding at MinneBar, and the slides are now available online.)

Video on the web is a hot topic these days. Hundreds of people want to be the next YouTube, and thousands more are making use of user-submitted video in some other way.

Unfortunately, putting video on the web is a total mystery to some developers, and may seem deceptively simple to others. The former don’t know where to start, and couldn’t tell a codec from a container, or ffmpeg from mencoder. The latter know the fundamentals, which don’t look too tough, but would have trouble putting together a scalable, robust, production-worthy system.

I’m not an expert, but I have been a part of several projects which do video transcoding. So in my next several posts, I will outline video transcoding for the web from a variety of angles.

  • Part 2: Tools – what are the free (e.g. ffmpeg) and commercial (e.g. On2 Flix) tools that can be used to build a transcoding system?
  • Web application integration – once I’ve settled on codecs and tools, how do I put everything together into a working system?
  • Legal and licensing issues – some prominent codecs and formats are commercial; some are completely free; and some of the most popular occupy an ugly middle ground that requires royalties.
  • RVideo – Slantwise is working on a Ruby-based video transcoding library called RVideo. I’ll outline its capabilities and the various design decisions we made (and are making) along the way.

A few parts of this series will have a Ruby on Rails focus, but most of the information is generic. So if Ruby isn’t your thing, you may still want to stick around.