Fibers in Ruby 1.9

April 1, 2010

One of the new features in Ruby 1.9 is Fibers. In order to understand how Fibers work, we need to first understand how threads work.

A thread is an execution context. When a Ruby program starts, there is a main thread, which you can access by calling Thread.current (or Thread.main from any thread). You can create new threads within your program. Here's an example of creating a thread:

Thread.new do
  puts "start"
  sleep 5
  puts "finish"
end
Thread.list.each do |t|
  p t
end

The first thing this program does is create a new thread. The code the thread should run is passed in via a block. Remember that in Ruby, the block is a Proc which does not execute when it is created, only when it is called. Next we call Thread.list and iterate over each of the threads that exist in our program. Unlike Thread.new, the each method does call its block immediately, so we see each thread printed out. What you actually see when you run this program is hard to say. When running in 1.9, you might see this:

#<Thread:0x000001008648e0 run>
#<Thread:0x00000101031010 run>

Here we have two threads, and both are ready to run. We don't see the puts "start" from the thread we created because in this case, the program exited before our thread got a chance to run. Or you might see this:

#<Thread:0x000001008648e0 run>
start#<Thread:0x00000101003a08 sleep>

In this case, we can see that while we were iterating over the thread list, after we printed the main thread and before we printed the second thread, the second thread started executing. Notice also that the status of the second thread is sleep. This doesn't only mean the thread called sleep; a thread can also have a status of sleep if it's waiting on IO.
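Whether the second thread gets to run before the program exits is up to the scheduler. If you want the main thread to wait for it, one option is Thread#join. Here is a minimal sketch; the events array and the worker variable are names made up for this illustration, and the sleep is shortened:

```ruby
events = []   # illustrative: records what the thread did

worker = Thread.new do
  events << "start"
  sleep 0.1
  events << "finish"
end

# Block the main thread until worker has finished.
worker.join

p events          # both entries are present once join returns
p worker.alive?   # the thread has terminated by now
```

With the join in place, the program can no longer exit before the thread has done its work.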

What is happening in this code is that we have multiple threads and a thread scheduler is deciding when each thread should execute. In Ruby 1.8 MRI, the thread scheduler is part of the Ruby interpreter process, and in Ruby 1.9 YARV, thread scheduling is handled by the operating system, but in either case what is happening is conceptually similar.

The thread scheduler allows each thread to run for a short period of time, like 10ms. Once that time runs out or when the thread's status changes to sleep, the thread scheduler finds the next thread that isn't sleeping and lets that thread run for 10ms. This continues throughout the life of the program. It's actually more complicated than this, but at the heart of it, this is what happens.

Every time the thread scheduler switches from one thread to the next, it has to switch the execution context to allow the next thread to run. There is some overhead with this that can add up if your program has to do a lot of context switching. More importantly, there is no way of knowing ahead of time when the context switching will occur. So not only is this inefficient, it's also dangerous, because the outcome of your program can change based on circumstances out of your control. In order to achieve concurrency in your program though, Ruby has to switch from one thread to the next at some point, so the thread scheduler has to just guess. But what if you could indicate in your code exactly when you want a context switch to occur?
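To make the danger concrete, here is a small sketch of a lost update. It is a contrived example, not something from a real program: the sleep between the read and the write is there only to invite a context switch at the worst moment, and the names counter and threads are made up for illustration.

```ruby
counter = 0

threads = 5.times.map do
  Thread.new do
    current = counter       # read the shared value
    sleep 0.01              # status becomes sleep, so another thread gets to run
    counter = current + 1   # write back a value that may be stale
  end
end

threads.each(&:join)

# Five increments ran, but the final value depends on how the threads
# were interleaved, so it is often less than 5.
puts counter
```

Nothing in the code says where the switches happen, which is exactly the problem.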

Enter fibers in Ruby 1.9. The easiest way to understand fibers is to think about them as being very similar to threads. When your program starts, there is a current fiber. You can create more fibers as your program runs. Each fiber defines some code to run. Here's our example from above, rewritten to use fibers:

require 'fiber' # Fiber.current comes from the fiber library in Ruby 1.9
fibers = [Fiber.current]
fibers << Fiber.new do
  puts "start"
  sleep 3
  puts "finish"
end
fibers.each do |f|
  p f
end

In this case, the output of our program is more deterministic. It will be something like this (the only thing that varies is the ids of the objects):

#<Fiber:0x0000010109cdc8>
#<Fiber:0x0000010109cd58>

Unlike threads, fibers don't have a status that can be runnable or sleeping. This is because with fibers, only one fiber in the process can be running at once. The same is true of threads in MRI: only one thread can be running at once within one Ruby process. The difference is that a fiber gets to decide how long it wants to run for, unlike a thread, which gets preempted by the thread scheduler.

In our example above, our second fiber never executed because the main fiber never started it. In this case, the main fiber ran until the end of the program. If we want to run the fiber, we have to call resume on it:

require 'fiber'
fibers = [Fiber.current]
fibers << Fiber.new do
  puts "start"
  sleep 3
  puts "finish"
end
fibers.each do |f|
  p f
end
fibers.last.resume

Now we will see the fibers printed out as before, but since we call resume on our second fiber, it will then execute: it prints start and, 3 seconds later, prints finish:

#<Fiber:0x0000010101f030>
#<Fiber:0x0000010101efc0>
start
finish
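Once a fiber's block has run to the end, the fiber is finished and cannot be resumed again. The sketch below shows what that looks like; the variable names result and error are just for illustration, and the exact text of the error message varies between Ruby versions, so it is printed rather than assumed:

```ruby
require 'fiber' # needed for Fiber#alive? in Ruby 1.9

fiber = Fiber.new do
  :done # the block's last value becomes the return value of resume
end

result = fiber.resume
p result         # the value the block returned
p fiber.alive?   # false once the block has finished

error = nil
begin
  fiber.resume   # resuming a finished fiber raises FiberError
rescue FiberError => e
  error = e
  puts "FiberError: #{e.message}"
end
```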

Where things actually get interesting with fibers is that once a fiber is started, it can yield back to the fiber that started it. Then you can call resume on the fiber again and it will pick up executing where it left off. Take a look at this example:

require 'fiber'
you = Fiber.new do
  Fiber.yield "potato"
  Fiber.yield "tomato"
end
puts "I say potato"
puts "You say #{you.resume}"
puts "I say tomato"
puts "You say #{you.resume}"

The output of this will be:

I say potato
You say potato
I say tomato
You say tomato

What happens here is that when the second puts is called, it calls you.resume. This means start executing you, which is a fiber. The return value of the call to resume will be the argument to Fiber.yield. A good mental model for thinking about fibers is a stack. When you call resume on a fiber, that fiber gets pushed onto the stack and starts executing. It executes until it's finished or until it calls Fiber.yield. Fiber.yield means pop the current fiber off the stack, keep track of where that fiber was, and resume executing the fiber that's now at the top of the stack. This is why in our example above, when we call resume on you the second time, Fiber.yield "potato" doesn't happen again: the fiber is already past that point, so Fiber.yield "tomato" is executed.
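The stack model is easier to see with one fiber resuming another. This is a small sketch with made-up names (outer, inner, and a log array that records the order of events):

```ruby
require 'fiber'

log = []

inner = Fiber.new do
  log << "inner: started"
  Fiber.yield                 # pop inner, control returns to outer
  log << "inner: finished"
end

outer = Fiber.new do
  log << "outer: started"
  inner.resume                # push inner on top of outer
  log << "outer: inner yielded"
  Fiber.yield                 # pop outer, control returns to the main fiber
  inner.resume                # inner picks up after its Fiber.yield
  log << "outer: finished"
end

outer.resume                  # push outer on top of the main fiber
log << "main: outer yielded"
outer.resume                  # outer picks up after its Fiber.yield

puts log
```

Each Fiber.yield hands control back to whichever fiber called resume, never to some other fiber picked by a scheduler.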

Fibers have some powerful uses in the context of code that does asynchronous IO. Mike Perham gave a talk at Austin on Rails which covers using Fibers with EventMachine, which I highly recommend. For more detail on threads and thread scheduling, I recommend the "Scaling Ruby" envycast, which is available at PeepCode. Also check out this post on Ruby Inside, which has a list of 8 other articles on Fibers.
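A library like EventMachine is more than a short snippet can cover, but the underlying idea of switching only at points you choose can be sketched with plain fibers. The round-robin loop below is an illustration I made up for this purpose, not how EventMachine or any real scheduler is implemented; workers and output are hypothetical names.

```ruby
require 'fiber' # needed for Fiber#alive? in Ruby 1.9

output = []

# Two workers that each do three steps, yielding after every step.
workers = %w[a b].map do |name|
  Fiber.new do
    3.times do |i|
      output << "#{name}#{i}"
      Fiber.yield             # the only place a switch can happen
    end
  end
end

# Resume each worker in turn until all of them have finished.
until workers.empty?
  workers.each(&:resume)
  workers = workers.select(&:alive?)
end

p output
```

Because every switch happens at a Fiber.yield, the interleaving of the two workers is the same on every run.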


Comments

1.

Good overview, Paul. Thanks for the shoutout to my talk.

# Posted By Mike Perham on Thursday, April 1 2010 at 12:02 PM

2.

Good post. I found that digging into Aman Gupta's "Poor Man's Fibers" for ruby 1.8 was also helpful and demystifying: http://github.com/tmm1/fiber18/blob/a37a4c336ac94bd494f7e2d5213cfe2db48d7a6d/lib/fiber18.rb

# Posted By Nicholas A. Evans on Thursday, April 1 2010 at 5:59 PM

3.

Very nice article!
I think it's good also to mention Thread#join in this context http://ruby-doc.org/core/classes/Thread.html#M000462

# Posted By khelll on Wednesday, April 7 2010 at 5:57 AM
