Zipping Arrays

August 23, 2010

When programming in any language, you are sure to be in a situation at some point where you have two or more arrays that match up by index. For example, say you have this:

cities = %w[Baltimore Washington Pittsburgh]
teams = %w[Ravens Redskins Steelers]

So in this case, the name of the team in the nth city is the nth team. In languages like Java and JavaScript, a common way to walk through the two arrays together would be to use a for loop and pull each value out of the arrays using the index:

for i in (0...cities.size)
  puts "%s %s" % [cities[i], teams[i]]
end

As you can see, this works in Ruby as well. The output of that will be:

Baltimore Ravens
Washington Redskins
Pittsburgh Steelers

But Ruby's Enumerable module has a built-in method for handling this that you might not know about. On any Enumerable, you can call zip and pass in another array, and it will return an array of arrays with the values paired up by index:

p cities.zip(teams) 
# => [["Baltimore", "Ravens"], ["Washington", "Redskins"], ["Pittsburgh", "Steelers"]]

Conveniently, when each yields a sub-array to a block, Ruby lets you assign each value of the sub-array to its own block variable. So we can perform the for loop from above like this:

cities.zip(teams).each do |city, team|
  puts "%s %s" % [city, team]
end

Which outputs:

Baltimore Ravens
Washington Redskins
Pittsburgh Steelers

No indexes to keep track of. Also, the zip method can take multiple arrays, so you can zip up more than one array and iterate through them in a similar fashion:

qbs = %w[Flacco McNabb Roethlisberger]

cities.zip(teams, qbs).each do |city, team, qb|
  puts "%s %s %s" % [city, team, qb]
end

Which outputs:

Baltimore Ravens Flacco
Washington Redskins McNabb
Pittsburgh Steelers Roethlisberger
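
One thing worth knowing before relying on zip: the result always has as many entries as the array you call it on. Here's a quick sketch of what happens when the lengths don't match (the short_teams and long_teams arrays are made up for illustration):

```ruby
cities = %w[Baltimore Washington Pittsburgh]

# Hypothetical arrays, just to show the mismatched-length behavior.
short_teams = %w[Ravens Redskins]
long_teams  = %w[Ravens Redskins Steelers Eagles]

# Argument shorter than the receiver: missing slots are filled with nil.
padded = cities.zip(short_teams)
p padded
# => [["Baltimore", "Ravens"], ["Washington", "Redskins"], ["Pittsburgh", nil]]

# Argument longer than the receiver: the extra values are dropped.
truncated = cities.zip(long_teams)
p truncated
# => [["Baltimore", "Ravens"], ["Washington", "Redskins"], ["Pittsburgh", "Steelers"]]

# The pairs also make a handy lookup table.
team_for = Hash[cities.zip(long_teams)]
puts team_for["Pittsburgh"]
# => Steelers
```

So if the arrays might get out of sync, check their sizes first, or be ready to handle a nil in the block.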

Posted in Technology | Tags Ruby | 6 Comments

Fibers in Ruby 1.9

April 1, 2010

One of the new features in Ruby 1.9 is Fibers. In order to understand how Fibers work, we need to first understand how threads work.

A thread is an execution context. When a Ruby program starts, there is a main thread, which you can access by calling Thread.current. You can create new threads within your program. Here's an example of creating a thread:

Thread.new do
  puts "start"
  sleep 5
  puts "finish"
end
Thread.list.each do |t|
  p t
end

The first thing this program does is create a new thread. What the thread should run is passed in via a block. Remember that in Ruby, the block is a Proc which does not execute when it is created, only when it is called. Next we call Thread.list to iterate over each of the threads that exist in our program. Unlike Thread.new, the each method does call its block immediately, so we see each thread printed out. What you actually see when you run this program is hard to say. When running in 1.9, you might see this:

#<Thread:0x000001008648e0 run>
#<Thread:0x00000101031010 run>

What we can see is that we have a couple of threads and they are both ready to be run. We don't see the puts "start" from the thread we created because in this case, the program exited before our thread got a chance to run. You might see this:

#<Thread:0x000001008648e0 run>
start#<Thread:0x00000101003a08 sleep>

In this case, we can see that while we were iterating over the thread list, after we printed the main thread and before we printed the second thread, the second thread started executing. Notice also that the status of the second thread is sleep. This doesn't only mean the thread called sleep; a thread can also have a status of sleep if it's waiting on IO.
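
If you want the thread's output to show up every time, one option is to have the main thread wait for it with join. Here's a minimal sketch of that, with the sleep shortened so it runs quickly:

```ruby
t = Thread.new do
  puts "start"
  sleep 0.1
  puts "finish"
  :done          # the block's last value becomes the thread's value
end

# join blocks the main thread until t has finished,
# so the program can't exit before "finish" is printed.
t.join

p t.status  # => false (the thread terminated normally)
p t.value   # => :done
```

This only controls when the main thread waits. While both threads are alive, the scheduler still decides how they interleave.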

What is happening in this code is that we have multiple threads and a thread scheduler is deciding when each thread should execute. In Ruby 1.8 MRI, the thread scheduler is part of the Ruby interpreter process, and in Ruby 1.9 YARV, the thread scheduling is handled by the operating system, but in either case what is happening is conceptually similar.

The thread scheduler allows each thread to run for a short period of time, like 10ms. Once that time runs out or when the thread's status changes to sleep, the thread scheduler finds the next thread that isn't sleeping and lets that thread run for 10ms. This continues throughout the life of the program. It's actually more complicated than this, but at the heart of it, this is what happens.

Every time the thread scheduler switches from one thread to the next, it has to switch the execution context to allow the next thread to run. There is some overhead with this that can add up if your program has to do a lot of context switching. More importantly, there is no way of knowing ahead of time when the context switching will occur. So not only is this inefficient, it's also dangerous, because the outcome of your program can change based on circumstances out of your control. In order to achieve concurrency in your program though, Ruby has to switch from one thread to the next at some point, so the thread scheduler has to just guess. But what if you could indicate in your code exactly when you want a context switch to occur?

Enter fibers in Ruby 1.9. The easiest way to understand fibers is to think about them as being very similar to threads. When your program starts, there is a current fiber. You can create more fibers as your program runs. Each fiber defines some code to run. Here's our example from above, rewritten to use fibers:

require 'fiber' # Fiber.current lives in the fiber extension in Ruby 1.9
fibers = [Fiber.current]
fibers << Fiber.new do
  puts "start"
  sleep 3
  puts "finish"
end
fibers.each do |f|
  p f
end

In this case, the output of our program is more deterministic. It will be something like this (the only thing that varies is what the ids of the objects will be):

#<Fiber:0x0000010109cdc8>
#<Fiber:0x0000010109cd58>

Unlike threads, fibers don't have a state that can be runnable or sleeping. This is because with fibers, only one fiber in the process can be running at once. This is true of threads as well: there can only be one thread running at once within one Ruby process. The difference is that a fiber gets to decide how long it wants to run for, unlike a thread, which gets preempted by the thread scheduler.

In our example above, our second fiber never executed because the main fiber never started it. In this case, the main fiber ran until the end of the program. If we want to run the fiber, we have to call resume on it:

require 'fiber'
fibers = [Fiber.current]
fibers << Fiber.new do
  puts "start"
  sleep 3
  puts "finish"
end
fibers.each do |f|
  p f
end
fibers.last.resume

Now we will see the fibers printed out as before, but since we call resume on our second fiber, it will then execute, print start, and after 3 seconds, print finish:

#<Fiber:0x0000010101f030>
#<Fiber:0x0000010101efc0>
start
finish

Where things actually get interesting with fibers is that once a fiber is started, it can yield back to the fiber that started it. Then, you can call resume on the fiber again and it will pick up executing where it left off. Take a look at this example:

require 'fiber'
you = Fiber.new do
  Fiber.yield "potato"
  Fiber.yield "tomato"
end
puts "I say potato"
puts "You say #{you.resume}"
puts "I say tomato"
puts "You say #{you.resume}"

The output of this will be:

I say potato
You say potato
I say tomato
You say tomato

What happens here is that when the second puts is called, it calls you.resume. This means start executing you, which is a fiber. The return value of the call to resume will be the argument to Fiber.yield. A good mental model for thinking about fibers is a stack. When you call resume on a fiber, that fiber gets pushed onto the stack and starts executing. It executes until it's finished or until it calls Fiber.yield. Fiber.yield means pop the current fiber off the stack, keep track of where that fiber was, and resume executing the fiber that's at the top of the stack now. This is why in our example above, when we call resume on you the second time, Fiber.yield "potato" doesn't happen again: the fiber is already past that point, so Fiber.yield "tomato" is executed.
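
To make the pause-and-pick-up behavior a bit more concrete, here's a small sketch. The counter and echo fibers are made-up examples, not part of any library. The first shows a fiber keeping its local state between resumes, and the second shows that values can also be passed into a fiber through resume:

```ruby
# A hypothetical generator: each resume runs until the next Fiber.yield,
# which hands the value back and pauses the fiber with n intact.
counter = Fiber.new do
  n = 0
  loop do
    Fiber.yield n
    n += 1
  end
end

first_three = [counter.resume, counter.resume, counter.resume]
p first_three
# => [0, 1, 2]

# Values travel the other way too: the argument to resume becomes
# the return value of the Fiber.yield that paused the fiber.
echo = Fiber.new do |msg|
  loop { msg = Fiber.yield "you said #{msg}" }
end

first_reply  = echo.resume("potato")
second_reply = echo.resume("tomato")
p first_reply   # => "you said potato"
p second_reply  # => "you said tomato"
```

Both fibers loop forever, so they can be resumed as many times as you like. A fiber whose block has returned raises a FiberError if you try to resume it again.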

Fibers have some powerful uses in the context of code that does asynchronous IO. Mike Perham gave a talk at Austin on Rails which covers using Fibers with EventMachine, which I highly recommend. For more detail on threads and thread scheduling, I recommend the "Scaling Ruby" envycast, which is available at PeepCode. Also check out this post on Ruby Inside, which has a list of 8 other articles on Fibers.

Posted in Technology | Tags Ruby | 6 Comments

How to spy on a Hash in Ruby

February 24, 2010

Let's say you're dealing with a large Rails codebase and you've got a Hash stored in a global variable or a constant and you want to know who is changing that Hash. Here's a contrived example:

IMPORTANT_STUFF = {
  :password => "too many secrets"
}

def change_password(h)
  h[:password] = "FAIL"
end

def print_password
  puts IMPORTANT_STUFF[:password]
end

print_password
change_password(IMPORTANT_STUFF)
print_password

Here it's pretty obvious where the Hash gets changed, but as I said, imagine you are trying to figure this out in a much larger codebase. Something is changing the value of IMPORTANT_STUFF and you don't know what. So how do you figure out what is? Easy, you do what Lester Freeman would do!

Lester Freeman from The Wire

We set up a sting! We put a wire tap on IMPORTANT_STUFF and monitor all communication with IMPORTANT_STUFF. So how do we do that? Let's create a class that proxies all communication with a Hash:

class HashSpy

  def initialize(hash={})
    @hash = hash
  end

  def method_missing(method_name, *args, &block)
    puts "***** hash access"
    puts "  before: #{@hash.inspect}"
    r = @hash.send(method_name, *args, &block)
    puts "  after: #{@hash.inspect}"
    puts "  backtrace:\n    #{caller.join("\n    ")}"
    r
  end

end

This uses a couple of interesting Ruby techniques. First, we just pass the actual Hash to the constructor. Then, we use method_missing so that any method that is called on the HashSpy will then be called on the Hash, and the return value of that method call will be returned to the caller. Note that in Ruby 1.8, this isn't a transparent proxy because if you called class on the HashSpy, you would get HashSpy, not Hash. In Ruby 1.9, you can have your object inherit from BasicObject, which won't have those methods, making it easier to be a transparent proxy. In Ruby 1.8, you can use Jim Weirich's Blank Slate pattern.

In HashSpy's method_missing, we use caller to get a backtrace of the current call stack, which will tell us who the perpetrator is.
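
For the Ruby 1.9 route, here's a rough sketch of what a BasicObject-based spy might look like. TransparentHashSpy and its log array are names made up for this example, and it records each access in an array instead of printing it, which makes it easier to inspect afterwards:

```ruby
# A sketch of the same spy built on BasicObject (Ruby 1.9+).
# TransparentHashSpy and the log array are hypothetical names.
class TransparentHashSpy < BasicObject

  def initialize(hash = {}, log = [])
    @hash = hash
    @log  = log   # records accesses instead of printing them
  end

  def method_missing(method_name, *args, &block)
    before = @hash.inspect
    r = @hash.__send__(method_name, *args, &block)
    # Kernel methods such as caller are not mixed into BasicObject,
    # so they have to be reached through ::Kernel explicitly.
    @log << [method_name, before, @hash.inspect, ::Kernel.caller]
    r
  end

end

log = []
spy = TransparentHashSpy.new({ :password => "too many secrets" }, log)
spy[:password] = "FAIL"

p spy.class       # => Hash
p log.first[0]    # => :[]=
```

It still isn't perfectly transparent: the handful of methods BasicObject does define, such as == and equal?, are answered by the spy itself and never reach the Hash.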

So, we just change IMPORTANT_STUFF to be created like this:

IMPORTANT_STUFF = HashSpy.new(
  :password => "too many secrets"
)

Now when we run the program, we'll get output something like this:

***** hash access
  before: {:password=>"too many secrets"}
  after: {:password=>"too many secrets"}
  backtrace:
    hash_spy.rb:27:in `print_password'
    hash_spy.rb:30
too many secrets
***** hash access
  before: {:password=>"too many secrets"}
  after: {:password=>"FAIL"}
  backtrace:
    hash_spy.rb:23:in `change_password'
    hash_spy.rb:31
***** hash access
  before: {:password=>"FAIL"}
  after: {:password=>"FAIL"}
  backtrace:
    hash_spy.rb:27:in `print_password'
    hash_spy.rb:32
FAIL

And by reading through the output, we can see that the second time the hash is accessed is when the value is changed, so the perpetrator is on line 23 of hash_spy.rb in the change_password method. Here's the entire script in one gist for reference.

Posted in Technology | Tags Ruby, Rails | 4 Comments

Node.js Presentation

February 8, 2010

Last week I presented on node.js at the Baltimore/DC JavaScript Users Group. Here's the video:

Paul Barry on Node.js at February Baltimore JavaScript Meetup from Shea Frederick on Vimeo.

To make it easier to follow along, here are the slides and the code.

Posted in Technology | Tags Javascript, node.js | 2 Comments

Sharing a git repo on the network

February 6, 2010

If you find yourself on a network with other developers you'd like to share a git repo with, here's a simple way to do that. First, on your machine, the one with the git repo, you run this:

$ git daemon --base-path=/path/to/dir/with/repo --export-all

So if you have a git repo in your home directory called foo, you would make the base path your home directory. Then, assuming your IP is 192.168.1.42, others can clone the repo using:

$ git clone git://192.168.1.42/foo

Posted in Technology | Tags Git | 2 Comments
