Understanding map and reduce

Posted by Jon
on Monday, August 11

Map and reduce are two of the most important internal iterators in functional programming. But in my experience as a Ruby developer, while map is frequently used, it should be used a bit more; and reduce (== inject) is underused and often misunderstood.

So how do you know when to use map or reduce on a collection? Simple. When iterating through an array, if you don’t want a return value from the operations, use each; and if you’re looking for a return value, use the iterator method that delivers the type of value you want returned. So if you want to take a collection and return a subset of that collection based on some criteria, use select. (See an earlier article for more.) If you want to return a transformed version of each element, use map. And if you want to return any value whatsoever, or a value that doesn’t match another iterator method, use reduce.

As an aside, do reduce and map have anything to do with the MapReduce architecture for distributed processing? Not surprisingly, the answer is “yes,” and I’ll talk more about that later this week.

inject, reduce, fold

One function, three names. If you’re a Ruby user and have access to Ruby 1.8.7, I suggest you forget the name inject altogether; I find it confusing, personally, and moving forward, inject has another name: reduce. This is much better, and I’ll discuss terminology in a minute (along with a third common name for this function: fold).

reduce takes in an array and reduces it to a single value. It does this by iterating through a list, keeping and transforming a running total along the way. This running total can be a single value (0, 3.7, “abcdefg”), a collection ([], {}), or anything else, really. Each iteration starts with the return value of the previous iteration and does something with it.

Formally, reduce takes three arguments: a collection, an initial value (which is used on the first pass), and a function to apply at each pass through the collection. Here is a Ruby example that uses reduce to sum a series of numbers:

1
2
3
(5..10).reduce(0) do |sum, value|
  sum + value
end

Let’s walk through this example in more detail. Here are the three arguments passed to reduce in this example:

  • Collection: 5, 6, 7, 8, 9, 10
  • Initial Value: 0
  • Function: add current value to running total
Pass # Collection Value Running Total Return Value
1 5 0 (initializer value) 5
2 6 5 11
3 7 11 18
4 8 18 26
5 9 26 35
6 10 35 45

The return value from this function will be 45. At each pass, the function takes two values: the current element in the array, and the return value from the previous return value (or the initializer value for the first pass). (This is the |sum, value| part of the Ruby example.)

What would this example look like using each instead of reduce?

1
2
3
4
5
6

sum = 0
(5..10).each do |value|
  sum += value
end
sum

Any time you see this (anti)pattern – initializing a variable, looping to change the variable, and returning the variable – you know you need a new collection function. In this case, reduce does the trick.

Digression on blocks

Personally, while Ruby’s block syntax makes code beautifully readable, I sometimes have trouble keeping track of how this syntax relates to a straightforward functional syntax. After all, I described reduce as taking three arguments: a collection, a starter value, and a function. But in the Ruby example above, I’m only passing one argument (0) to reduce. So if it helps, here is a another way to think about reduce, in pseudo-scheme.


(reduce + 0 (range 5 10))

Here we’re explicitly passing three arguments to reduce: + (the addition operator), 0 (the seed value), and the range of numbers from 5 to 10 (our collection). Remember that (5..10).reduce(0) {|sum, value| sum + value } does exactly the same thing, just rearranged a bit.

Back on track

Let’s look at a slightly more complicated case. reduce can be used to implement just about any other collection function, from map to sort to select. Here is a way to emulate select using reduce.

1
2
3
4
5
(1..10).reduce([]) do |result, value| 
  result << value if value > 5
  result
end
# [6, 7, 8, 9, 10]

You can also emulate map with reduce, like this:

1
2
3
4
5
(1..10).reduce([]) do |result, value| 
  result << value * value
  result
end
# [1, 4, 9, 25, 36, 49, 64, 81, 100]

Of course, you wouldn’t want to do this. Whenever possible, you’re generally better off using a more specific function, like map in this case. If you want to sum numbers, use a sum function instead of reduce. If you want a hash, try build_hash. (I say “generally”, because there are also diminishing returns – creating a new reduce-style iterator for every possible use of reduce is overkill. Use your judgment.)

But this shows you the power of reduce; reduce can be used to implement any other internal iterator. Any time you want to take a collection return something else – a value, another collection, etc. – reduce is capable.

Why 3+ names?

This function has three names: “inject”, “reduce”, and “fold”. All make sense from one perspective.

  • fold is used by Haskell, Scheme, and OCaml. This name highlights the fact that this function “folds” the return value of one pass into the next pass. Actually, this function is really divided into fold-left and fold-right, referring to the direction of the reduction. Do you start at the left of the list, moving right, or go from right to left? For associative operations (like addition), it doesn’t make a difference. 1 + 2 + 3 == 3 + 2 + 1. But for non-associative operations, like division, exponents, and string concatenation, order matters: 12^ != 21^, and “a” + “b” != “b” + “a”.
  • reduce, used by Common Lisp, Python, Javascript, and now Ruby, describes the ultimate goal of the function: reduce a collection to a single return value. But keep in mind that the single return value can be a collection. So reduction has nothing to do with size – a reduce function called on a 10 element array could return a 100 element array, or it could return a single integer, or a hash, or something else.
  • inject, the Smalltalk name for this function (and the dominant Ruby name until recently), is my least favorite. I think it refers to “injecting” the return value of the previous function call into the next function call, but I could be wrong.
  • If it helps, you can even think of this function as accumulate, which is what C++ calls it. This name is generally appropriate; accumulate some return value through iterating through a collection. Just remember that there isn’t actually a “global” accumulated variable that is carried over through each function call, and returned at the end; each pass just folds its return value into the next pass. That’s it.

So that’s reduce. If you’re having trouble getting your mind around it, I recommend reading up a bit more, because it is an important concept. It is also important to understanding MapReduce.

map

map takes an array, applies a function to each element, and returns a new array with the results. Here is its equivalent using each.

1
2
3
4
5
email_addresses = []
users.each do |user|
  email_addresses << user.email
end
email_addresses

We can improve upon this using map.

1
2
3
users.map do |user|
  user.email
end

This is quite a bit simpler than reduce, and I’m not going to spend much time on it. If you’re an experienced Ruby programmer, you’ve probably used map hundreds of times. If it’s new to you, just remember that map takes an array and returns an array of exactly the same size. And think of some practical uses of map:

  • Convert [1,2,3,4,5] to [“one”, “two”, “three”, “four”, “five”]
  • Convert [“Jon Dahl”, “Luke Francl”, “Eric Chapweske”] to [“Jon”, “Luke”, “Eric”]
  • Convert [“72%”, “1%”, “50%”] to [.72, .01, .5]
  • Convert Tag.find(:all) to [”<span class=’small-tag’>Ruby</span>”,”<span class=’large-tag’>Merb</span>”,”<span class=’small-tag’>Perl</span>”]

Other functions

These aren’t the only important iterator functions, by any means. But map, reduce, and select are among the most important. Get them solidly under your belt, and you’ll write better code. They’ll also help you from a conceptual standpoint; MapReduce isn’t exactly map + reduce; it can even be implemented in languages that don’t have map or reduce capabilities. But it forms the conceptual foundation of MapReduce, and MapReduce works because of specific properties of map and reduce. More on that later this week.

Comments

Leave a response

  1. jacob swannerAugust 11, 2008 @ 02:13 PM

    I described reduce as taking three arguments: a collection, a starter value, and a function. But in the Ruby example above, I’m only passing one argument (0) to reduce.

    you could pass 2 arguments if you make a proc first:
    
    func = proc { |sum, val| sum + val }
    (5..10).inject(0, &func)
    
  2. Adam ByrtekAugust 11, 2008 @ 03:05 PM

    Reduce is as you say “underused” because it actually is not that useful in a language like Ruby. Most of the times it can be replaced with a more concrete method call (like sum), which makes it easier to understand. This is why Python guys decided to remove reduce from Python 3000. Actually even in your post I can’t see any use of reduce that is original and cannot be substituted with something more readable.

  3. Jon DahlAugust 11, 2008 @ 04:10 PM

    Adam: this is somewhere where Guido and I part ways. Here is a quote from Guido.

    I’m not killing reduce() because I hate functional programming; I’m killing it because almost all code using reduce() is less readable than the same thing written out using a for loop and an accumulator variable.

    I agree with you, Guido, and others, that more specific method calls (e.g. sum) are preferable to reduce. But I disagree on two levels. First, I don’t think that a for(each) loop + external accumulator is more readable than reduce. I like to have my accumulator internal to the iterator, so as to eliminate an unnecessary side effect (i.e. operating on an external variable). Second, I rarely use reduce in Ruby, because another function is generally a better abstract fit; but reduce is powerful and fundamental to computing, and I like to have it around every now and then!

    I think this comes down to a philosophical preference: one right way to do things vs. multiple ways to do things.

  4. Liam MorleyAugust 15, 2008 @ 01:49 AM

    From what I can tell (and I’m not a Ruby guru by any means), there is no Enumerable.sum or Array.sum in Ruby; Rails, however, tacks this on, and in fact uses reduce in their implementation.

  5. Jon DahlAugust 15, 2008 @ 10:46 AM

    Liam: you’re correct. Rails adds this in via ActiveSupport. If Ruby were ever to go the way of Python and get rid of reduce/inject, it would have to add a few native methods, including a flexible sum (that knows how to combine numbers, strings, arrays, etc.). Actually, I’d love to see a native sum method alongside reduce – maybe 1.9 or 2.0 will provide this.

  6. Rick DeNataleAugust 15, 2008 @ 10:55 AM

    inject, the Smalltalk name for this function (and the dominant Ruby name until recently), is my least favorite. I think it refers to “injecting” the return value of the previous function call into the next function call, but I could be wrong.

    In Smalltalk, the full message name is inject:into, and it’s called like this:

    collection inject: 0 into: [:sum :element | sum + element]

    So, in Smalltalk, I think that it’s pretty clear that the inject refers to injecting the initial value into the computation. Ruby took the name, kind of, but it gets muddled by the loss of the parameter name for the block(function argument) and the fact that the initial (injected) argument is optional.

  7. ljscAugust 15, 2008 @ 03:14 PM

    but reduce is powerful and fundamental to computing, and I like to have it around every now and then!

    Exactly. Well, actually it’s fundamental to list processing. Arrays aren’t inductively defined, but this post is about functional programming and when you use Enumerable you can essentially pretend they are lists. Lists are created using a cons operation which creates a new list from an element and another list. Then it has an empty list as the base case.

    So basically what inject/fold/reduce do is replace cons with some arbitrary binary operation and the empty list with a base value of the proper type (which often turns out to be the identity for your operation).

    
        1 : 2 : 3 :4 : 5 : []
        --->
        1 + 2 + 3 + 4 + 5 + 0
        1 * 2 * 3 * 4 * 5 * 1
    

    Sure, in higher level code you might end up using a sum method, but why wouldn’t you use inject to implement it?