Functional programming and looping

Posted by Jon
on Tuesday, July 29

If you’re a programmer, you’ve probably worked through one or more books teaching you the syntax of a new language. I’ve had this experience with half a dozen languages, like C, Javascript, and Perl. These books are typically introduce loops midway through the syntax discussion, after datatypes and control flow, but before I/O and advanced features.

Loops are almost always presented according to this formula.

  • Inane intro text: “what if you want to do an operation more than once”?
  • Introduce while loop, with difference between do while and while do.
  • Introduce for loop, the while loop’s crazy cousin.
  • (Bonus) Introduce foreach loop if language is sufficiently high-level. And that’s it – you know how to loop through code; time to move on.

Not so fast. If you’re lucky enough to use a language that draws from functional programming, you shouldn’t loop like this.

The point

From now on, I’m going to use Ruby for examples, but this article isn’t about Ruby. It is about transitioning from primitive loops to iterating through collections, and from generic collection functions (like each) to more specific functions (like map).

From loops to array traversal

For the last several months, I’ve been working on Tumblon, a medium-sized Rails application. I’ve worked on 15-20 Ruby applications over the last three years, probably totaling 50,000 lines of Ruby code.

I’ve only used a primitive loop once.

That primitive loop was a loop {} loop, forever polling a task list looking for jobs. In other words, a loop with no exit condition beyond ^C or a server crash. As far as I know, Ruby doesn’t have a for loop at all, which would explain why I haven’t used it. It has a foreach loop (for item in arr), but that’s syntactic sugar for arr.each {}.

So the first reason why I’ve only used a simple loop in one case: the each concept usually a better option. Its Ruby implementation will be familiar to anyone who’s seen Ruby code before:

1
2
3
["horse", "pig", "cow"].each do |animal|
  puts "Old MacDonald has a #{animal}"
end

(Yes, I have a small child.)

This is far cleaner than its for or while loop alternatives. And it is a better abstract representation of what we’re doing: we aren’t looping with an exit condition, we are iterating through an array. But what if you want to do something a fixed number of times? Even that can be understood as traversing a list, like [1,2,3,4,5,6,7,8,9,10].each {}. Of course, Ruby provides a cleaner version: 10.times {}.

So if your loop is working through a list of some sort, each is a better abstraction of the problem. And in my experience building Ruby applications, every loop but one has been traversing a list. Parsing XML? Traversing a collection. Summing numbers? Traversing a collection. Reading in a textfile? Listening to STDIN? Working with rows in a database? Traversing a collection. That’s what each loops do well.

Beyond arr.each

But each isn’t the final word. It is a step up from a primitive for or while loop when working with a collection of values, but many each loops should be replaced with other array methods, like map, inject, and select.

When is each useful? Simple: when you want to create side-effects, like saving to the database, printing a result, or sending a web service call. In these cases, you’re not concerned with the return value; you want to change state on the screen, the disk, the database, or something else. Take a look at this code.

1
2
3
User.find(:all).each do |user|
  Notification.deliver_email_newsletter(user)
end

You don’t need a return value from this – you need emails to be delivered.

But don’t use each if you want to extract some new value from an array. That’s not what it’s for. Instead, take a look at three other powerful functions: map, inject, or select. To see why, let’s take a look at select. Here is code that takes in an array, and creates a new array from elements that match a certain condition, using each.

1
2
3
4
5
active_users = []
users.each do |user|
  active_users << user if user.active?
end
active_users

Man, the first and last lines are ugly. Why do you have to initialize and return active_users? Answer: because this is a misuse of each. You are much better off using select (or its equivalent, find_all):

1
2
3
users.select do |user|
  user.active?
end

Using select is shorter, easier to understand, and less bug-prone. And more importantly, it clearly encapsulates one common use of each (and looping in general).

Two other key functions – map and inject (or reduce) – complement select and follow a similar pattern. And not surprisingly, they form the foundation of the mapreduce approach to distributed processing. I’ve written more about map and reduce in another article, and here is shorthand for knowing which of these functions to use:

Desired Return Value Function
New array with same number of values map
New array composed of part of the old array select
Single value (though this value can be an array) inject
none each

The point, redux

Use each for changing state. Otherwise, avoid side-effects and use “functional” array methods that return a value. Simple. Your code will be cleaner and less bug prone.

And remember the dead giveaway:

  1. Initialize an empty value, or array, or whatever (new_arr = [])
  2. arr.each, changing the initialized value
  3. Return the value (return new_arr)

Whenever you see this pattern, you know you’ve got an each loop that needs swapping out.

(Edit: I’ve posted a follow-up article with more about map and reduce.)

Comments

Leave a response

  1. Pedro PimentelJuly 29, 2008 @ 11:43 AM

    Hmm, very useful your explanation about looping/iteration. I’m going to use it to show Ruby features to convince my friends about ruby !

  2. Tim ConnorJuly 29, 2008 @ 11:57 AM

    Ruby has a barebones for loop, and it’s faster than enumerating, so you might encounter it in the case of things that have to be heavily optimized. Outside of that, no, there is no reason to use it, generally.

    Hell, I use range enumeration* for looping between just a series of numbers, even.

    *(1..5).each

  3. Stuart HenryJuly 29, 2008 @ 02:33 PM

    Great post, very well explained. I can think of a handful of mis-used each loops that I will refactor to selects or map asap.

    Looking forward to your posts regarding map and inject functions.

  4. GravisJuly 29, 2008 @ 03:03 PM

    Thanks, really, for this simple but great article. I have learned something today !

  5. Jim DevilleJuly 29, 2008 @ 03:51 PM

    RE Tim Connor’s comment: From the semantics, I wonder if Ruby’s for is the same as Ruby’s foreach.

    I don’t have a source, but I believe that if you benchmark, you’ll see that foreach is not the same as .each. One is faster than the other. Semantically, your statement is correct. They are the same.

  6. Pete FordeJuly 29, 2008 @ 09:08 PM

    This is a great post, Jon.

    I’m going to print out your little chart and mount it above my bed.

  7. RemiJuly 30, 2008 @ 09:34 AM

    Ruby actually has a better endless loop instead of this: while true{}. It is simply: loop {}

  8. Jon DahlJuly 30, 2008 @ 10:01 AM

    Remi – you’re right, and that’s actually what I used. I’ll edit the article.

    Tim – is Jim correct (that the only Ruby for loop is for i in arr), or is there also a straight for loop in Ruby that I’m not familiar with?

    Tim, Jim, etc. – you’re absolutely right that about optimization. Just like denormalizing a database, there may be times when you should move from a more abstract function to a faster function.

  9. Trey JacksonJuly 30, 2008 @ 02:37 PM

    Excellent points. If I had my way , for loops would be banished. We should force up the level of abstraction.

  10. xavier brinonJuly 31, 2008 @ 09:41 AM

    Maybe you should look at functional programing language techniques, out of curiosity, there is lot’s more than iter and map. I think you can also consider recursion algorithm… I’d like to see how those abstraction are internaly represented in Ruby, maybe with for loops ?

  11. Jon DahlJuly 31, 2008 @ 10:36 AM

    Xavier – sounds like you’ve got a blog post waiting to happen. :)

    Recursion could be used in place of a bare loop (loop {}), and of course, there is a lot more to functional programming than two functions. As for how map etc. are represented internally, that’s a bit beyond my expertise; could be a C loop, but it could also be iterative recursion or something else.

  12. Old MacdonaldJuly 31, 2008 @ 03:44 PM

    Man, the first and last lines are ugly. Why do you have to initialize and return active_users? Answer: because this is a misuse of each.

    Dang. I wrote code like that just yesterday. I’ve got some cleaning up to do.

  13. Luke RandallAugust 01, 2008 @ 07:23 AM

    Thanks for the article. I’ve tried to remember to use enumerable methods properly, but went through my code and found a couple of loops I could fix. However, it also prompted me to try fix what I had thought had a bad code smell, but hadn’t been motivated to find a better way of doing:

    items = {}
    collection.each {|item| items[item] => item.method }
    items
    

    I had a couple of these scattered around in my code, and after a quick search, found build_hash which cleaned up my code a lot!

    Just thought I’d mention it since it’s along the same lines as what you spoke about, and my finding it was definitely a result of reading your post.

  14. Jon DahlAugust 01, 2008 @ 01:35 PM

    Hi Luke – I’d never seen build_hash before, but it looks appropriate. If you weren’t familiar with that function, you could also just use inject.

    
    
    [1,2,3].inject({}) { |result, item| result.merge(item => item.method) }

    That’s true for just about everything – you can implement map, select, etc. in terms of inject – but whenever possible, you’re better off using a more abstract method (like build_hash instead of inject).

    (I also fixed the link in your comment.)

  15. Mariusz NowakAugust 02, 2008 @ 09:21 AM

    What would you use if you’d like to check if there’s any element within array that matches given condition? In such case probably most efficient is ‘for’ loop so you can break after you encounter first element that matches (??)

  16. Luke FranclAugust 02, 2008 @ 12:42 PM

    Mariusz,

    In Ruby, you would want to use Enumerable#find or Enumerable#any? to do that. find will return the first object that returns true for a block, while any? simply returns true if any of the elements returns true for a block.

  17. JimAugust 04, 2008 @ 12:47 PM

    Good stuff, but don’t completely write off the for.

    Consider (1) the case where the index itself is instrumental to the operation mutating the collection elements; and (2) the case where the operation mutating the collection element depends on the preceding or succeeding element.

    I realize that both of these cases can be handled without a ‘for’, but there are times when a for can be the clearer abstraction.

  18. LennonAugust 04, 2008 @ 12:53 PM

    @Jon: Your example for building a hash using #inject has one problem: you’re creating and then throwing away an anonymous hash for each iteration through the inner loop, which could consume a lot of extra memory and CPU (for garbage collection) if your initial collection is large.

    I actually tend to use #inject to construct hashes quite often, but without the “magic” method call parameter capture of a new anonymous hash:

    my_hash = @array_val.inject({}) {|h, val| h[val] = val.some_method; h }