If you’re a programmer, you’ve probably worked through one or more books teaching you the syntax of a new language. I’ve had this experience with half a dozen languages, like C, JavaScript, and Perl. These books typically introduce loops midway through the syntax discussion, after datatypes and control flow, but before I/O and advanced features.
Loops are almost always presented according to this formula.
- Inane intro text: “what if you want to do an operation more than once?”
- Introduce the while loop, with the difference between do while and while do.
- Introduce the for loop, the while loop’s crazy cousin.
- (Bonus) Introduce the foreach loop if the language is sufficiently high-level.
And that’s it – you know how to loop through code; time to move on.
Not so fast. If you’re lucky enough to use a language that draws from functional programming, you shouldn’t loop like this.
The point
From now on, I’m going to use Ruby for examples, but this article isn’t about Ruby. It is about transitioning from primitive loops to iterating through collections, and from generic collection functions (like each) to more specific functions (like map).
From loops to array traversal
For the last several months, I’ve been working on Tumblon, a medium-sized Rails application. I’ve worked on 15-20 Ruby applications over the last three years, probably totaling 50,000 lines of Ruby code.
I’ve only used a primitive loop once.
That primitive loop was a loop {} loop, forever polling a task list looking for jobs. In other words, a loop with no exit condition beyond ^C or a server crash. As far as I know, Ruby doesn’t have a C-style for loop at all, which would explain why I haven’t used one. It has a foreach loop (for item in arr), but that’s essentially syntactic sugar for arr.each {}.
So the first reason why I’ve only used a simple loop in one case: the each concept is usually a better option. Its Ruby implementation will be familiar to anyone who’s seen Ruby code before:
    ["horse", "pig", "cow"].each do |animal|
      puts "Old MacDonald has a #{animal}"
    end
(Yes, I have a small child.)
This is far cleaner than its for or while loop alternatives. And it is a better abstract representation of what we’re doing: we aren’t looping with an exit condition, we are iterating through an array. But what if you want to do something a fixed number of times? Even that can be understood as traversing a list, like [1,2,3,4,5,6,7,8,9,10].each {}. Of course, Ruby provides a cleaner version: 10.times {}.
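To make the comparison concrete, here is a minimal sketch of the same traversal written both ways. The method names (sing_with_while, sing_with_each) and the optional out parameter are my own inventions for illustration, not part of any library.

```ruby
# Hypothetical helpers for illustration only.

# Primitive loop: we manage the index and the exit condition by hand.
def sing_with_while(animals, out = $stdout)
  i = 0
  while i < animals.length
    out.puts "Old MacDonald has a #{animals[i]}"
    i += 1
  end
end

# Iteration: the collection hands us each element in turn.
def sing_with_each(animals, out = $stdout)
  animals.each { |animal| out.puts "Old MacDonald has a #{animal}" }
end

animals = ["horse", "pig", "cow"]
sing_with_while(animals)
sing_with_each(animals) # prints the same three lines again

# Repeating an action a fixed number of times needs no array at all.
3.times { |n| puts "Verse #{n + 1}" }
```

The while version has two extra pieces of bookkeeping (the counter and the bounds check), and both are places where an off-by-one bug can hide.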
So if your loop is working through a list of some sort, each is a better abstraction of the problem. And in my experience building Ruby applications, every loop but one has been traversing a list. Parsing XML? Traversing a collection. Summing numbers? Traversing a collection. Reading in a textfile? Listening to STDIN? Working with rows in a database? Traversing a collection. That’s what each loops do well.
Beyond arr.each
But each isn’t the final word. It is a step up from a primitive for or while loop when working with a collection of values, but many each loops should be replaced with other array methods, like map, inject, and select.
When is each useful? Simple: when you want to create side-effects, like saving to the database, printing a result, or sending a web service call. In these cases, you’re not concerned with the return value; you want to change state on the screen, the disk, the database, or something else. Take a look at this code.
    User.find(:all).each do |user|
      Notification.deliver_email_newsletter(user)
    end
You don’t need a return value from this – you need emails to be delivered.
But don’t use each if you want to extract some new value from an array. That’s not what it’s for. Instead, take a look at three other powerful functions: map, inject, and select. To see why, let’s start with select. Here is code that takes in an array and creates a new array from the elements that match a certain condition, using each.
    active_users = []
    users.each do |user|
      active_users << user if user.active?
    end
    active_users
Man, the first and last lines are ugly. Why do you have to initialize and return active_users? Answer: because this is a misuse of each. You are much better off using select (or its equivalent, find_all):
    users.select do |user|
      user.active?
    end
Using select is shorter, easier to understand, and less bug-prone. And more importantly, it clearly encapsulates one common use of each (and looping in general).
Two other key functions – map and inject (or reduce) – complement select and follow a similar pattern. And not surprisingly, they form the foundation of the MapReduce approach to distributed processing. I’ve written more about map and reduce in another article, and here is shorthand for knowing which of these functions to use:
| Desired Return Value | Function |
|---|---|
| New array with same number of values | map |
| New array composed of part of the old array | select |
| Single value (though this value can be an array) | inject |
| none | each |
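
As a rough illustration of the table, here are all four methods applied to the same array. The prices array and the variable names are made up for this example.

```ruby
# Made-up sample data for illustration.
prices = [5, 12, 8, 20]

# map: new array with the same number of values
doubled = prices.map { |price| price * 2 }            # [10, 24, 16, 40]

# select: new array composed of part of the old array
expensive = prices.select { |price| price > 10 }      # [12, 20]

# inject: a single value, built up from a starting value of 0
total = prices.inject(0) { |sum, price| sum + price } # 45

# each: used only for its side effect (printing); the return value is ignored
prices.each { |price| puts "Price: #{price}" }
```

None of the first three calls modifies prices; each one returns a new value and leaves the original array alone.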
The point, redux
Use each for changing state. Otherwise, avoid side-effects and use “functional” array methods that return a value. Simple. Your code will be cleaner and less bug-prone.
And remember the dead giveaway:
- Initialize an empty value, or array, or whatever (new_arr = [])
- arr.each, changing the initialized value
- Return the value (return new_arr)
Whenever you see this pattern, you know you’ve got an each loop that needs swapping out.
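Here is a sketch of what that swap might look like, once for an accumulated array and once for an accumulated single value. All four method names are hypothetical and exist only to put the two styles side by side.

```ruby
# Hypothetical helpers for illustration only.

# The giveaway pattern: initialize, mutate inside each, return.
def name_lengths_with_each(names)
  lengths = []
  names.each { |name| lengths << name.length }
  lengths
end

# Same result with map: no temporary array, no explicit return.
def name_lengths_with_map(names)
  names.map { |name| name.length }
end

# The same pattern with a single accumulated value...
def total_length_with_each(names)
  total = 0
  names.each { |name| total += name.length }
  total
end

# ...collapses into inject, starting from 0.
def total_length_with_inject(names)
  names.inject(0) { |sum, name| sum + name.length }
end

animals = ["horse", "pig", "cow"]
name_lengths_with_map(animals)    # [5, 3, 3]
total_length_with_inject(animals) # 11
```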
(Edit: I’ve posted a follow-up article with more about map and reduce.)



