Paul Barry

In Search Of Sharper Tools

July 21, 2009

After reading the comments on my last post, I realized that I neglected to define what I mean by metaprogramming. Rubyists probably already know, but people coming from other programming languages might have different ideas about what the term means.

I’ve actually mentioned this in a few talks I’ve given about Ruby. I don’t really like the word metaprogramming; it’s a little bit nebulous. I think a better term is dynamic code generation. I like that term because I think most programmers will have a pretty good mental model of what I am talking about when I say it. Several features of Ruby, when combined, let you bend the language to your will in almost any way. To understand how to do that, I recommend reading the book Metaprogramming Ruby, which is in beta right now.

I’ll give a short, real-world example of what I’m talking about. I’m working on a Rails app that uses a database that has bit field columns. I want to treat the boolean flags in the bit field as though they are regular boolean attributes throughout my code. So I wrote a Ruby gem, has-bit-field, which generates all the methods necessary to work with the bit field columns. You define what is in the bit field in the clearest, simplest way possible: state which column holds the bit field, then name the attribute that should be generated for each bit:

class Person < ActiveRecord::Base
  has_bit_field :bit_field, :likes_ice_cream, :plays_golf, :watches_tv, :reads_books
end

This is the kind of abstraction that the metaprogramming capabilities of Ruby afford you. I threw this together, with tests, in an hour or so. Can you imagine the amount of nonsense you would have to go through in Java to create an abstraction equivalent to this?
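To make “dynamic code generation” concrete, here is a minimal sketch of how a class macro like has_bit_field could generate its methods with define_method. This is not the source of the has-bit-field gem: the BitFieldSketch module, the bit ordering (the first flag is the lowest bit), and the plain Ruby Person class standing in for an ActiveRecord model are all assumptions made for illustration.

```ruby
# Hypothetical sketch only; the real has-bit-field gem may work differently.
module BitFieldSketch
  # Class macro: for each flag name, generate a reader, a predicate and a
  # writer that operate on one bit of the integer stored in `column`.
  def has_bit_field(column, *flags)
    flags.each_with_index do |flag, index|
      mask = 1 << index # assumed ordering: first flag is bit 0, second is bit 1, ...

      # person.plays_golf? is true when the flag's bit is set
      define_method("#{flag}?") do
        (send(column).to_i & mask) != 0
      end
      alias_method flag, "#{flag}?"

      # person.plays_golf = true sets the bit, false clears it
      define_method("#{flag}=") do |value|
        current = send(column).to_i
        send("#{column}=", value ? current | mask : current & ~mask)
      end
    end
  end
end

# Plain Ruby stand-in for an ActiveRecord model with a bit_field column.
class Person
  extend BitFieldSketch
  attr_accessor :bit_field

  has_bit_field :bit_field, :likes_ice_cream, :plays_golf, :watches_tv, :reads_books
end

person = Person.new
person.plays_golf = true
person.reads_books = true
puts person.plays_golf?      # the generated predicate
puts person.likes_ice_cream  # the generated reader
```

The point of the sketch is that none of the eight accessor methods are written by hand; the single has_bit_field line in the class body creates them when the class is loaded.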

This type of abstraction is what I think makes Ruby a great language, but I realize you are definitely walking a fine line with this kind of thing. It’s possible for this sort of thing to go wrong quickly. The first way these things go wrong is when the abstraction is not intuitive and not documented. A good abstraction should be almost intuitive, so that other programmers can guess what it does. This is commonly referred to as the Principle of Least Surprise. That doesn’t excuse you from providing documentation that explains how it works, especially for more complex abstractions.

The reason why it’s important that the abstraction is clear is that most of the time the code that defines the abstraction is dense at best, and downright ugly at worst. This isn’t a problem specific to Ruby, as anyone who has worked with Lisp macros can attest to. But in the end I’d rather have a small chunk of code that is tested and documented that I don’t really need to look at that enables me to make the code where the business logic is defined as clear as possible. If other programmers are constantly having to dive into the guts of the definition of these abstractions just to understand how the code works, you have officially created a mess. And this is no ordinary mess, this is meta-spaghetti, and is a mess on an order of magnitude not possible in statically typed languages.

So does this mean you shouldn’t use Ruby? Not at all, and I think Glenn Vanderburg sums it up best:

Weak developers will move heaven and earth to do the wrong thing. You can’t limit the damage they do by locking up the sharp tools. They’ll just swing the blunt tools harder.

I think developers often associate “blunt tools” with static typing, because really they associate static typing with Java. I’m not sure that static typing is in fact a blunt tool. If static typing means I can’t create these kinds of abstractions, then yes, it’s a blunt tool. But can you do this kind of thing with Scala Compiler Plugins? How about with Template Haskell? What about with MetaOCaml? If you can, are those tools then sharper than Ruby? Or is there a way to define abstractions like these without metaprogramming at all?

Types and Metaprogramming: Can We Have Both?

July 21, 2009

Michael Feathers has posted an excellent article called Ending the Era of Patronizing Language Design. You should go read the whole article, but the crux of the argument boils down to:

The fact of the matter is this: it is possible to create a mess in every language. Language designers can’t prevent it. All they can do is determine which types of messes are possible and how hard they will be to create. But, at the moment that they make those decisions, they are far removed from the specifics of an application. Programmers aren’t. They can be responsible. Ultimately, no one else can.

I agree with Michael, but I see it from a slightly different viewpoint. The most common argument for static typing is that it will prevent you from shooting yourself in the foot. If your program has syntax errors or type errors, the compiler will catch them for you. This supposedly means that you can write fewer tests than you would need for the same program written in a dynamically typed language.

This argument I completely disagree with. Having a spell checker doesn’t mean you can proofread less. No amount of spell checking and grammar checking is going to prevent your writing from having errors. If you don’t test your program, it will have errors.

I like looking at static typing in a more “glass is half full” sort of way. The bigger benefit of static typing is that the type system can be used as a mechanism to make the code more self-descriptive. Here’s an analogy from Real World Haskell:

A helpful analogy to understand the value of static typing is to look at it as putting pieces into a jigsaw puzzle. In Haskell, if a piece has the wrong shape, it simply won’t fit. In a dynamically typed language, all the pieces are 1x1 squares and always fit, so you have to constantly examine the resulting picture and check (through testing) whether it’s correct.

I think the jigsaw puzzle analogy is even better if you think about it a different way. Think of a simple puzzle that is made up of only nine pieces. If each piece is a 1x1 square, you have to look at the part of the picture on each piece to determine where it goes. If you had a puzzle with nine uniquely shaped pieces, you could solve the puzzle without looking at the picture on each piece at all. You could assemble the puzzle simply by connecting the pieces that fit together.

When analyzing a program, “looking at the picture on the piece” means reading the body of a method. Let’s take a look at some Ruby code without looking at the body of the method. Take for example, these two method declarations:

def select(*args)
def select(table_name=self.class.table_name, options={})

The first method gives us no hint as to what this method is, whereas in the second example, we can start to see what’s going on with this method. What if the method signature looked like this:

SQLQueryResult execute(SQLQuery query)

It should be obvious what this does. So it should be clear that having types in our method signatures is not the language’s way of making sure we don’t make errors, but instead is a means for expressing the intent of how the code works and how it should be used.

The problem is that this is not the only way of making code more self-descriptive or easier to reason about. The authors of Structure and Interpretation of Computer Programs state:

A powerful programming language is more than just a means for instructing a computer to perform tasks. The language also serves as a framework within which we organize our ideas about processes. Thus, when we describe a language, we should pay particular attention to the means that the language provides for combining simple ideas to form more complex ideas. Every powerful language has three mechanisms for accomplishing this:

  • primitive expressions, which represent the simplest entities the language is concerned with,
  • means of combination, by which compound elements are built from simpler ones, and
  • means of abstraction, by which compound elements can be named and manipulated as units.

The meta-programming capabilities of Ruby, which are on par with those of any other language, including Lisp, provide a powerful means of abstraction. ActiveRecord is a clear example of that:

class Order < ActiveRecord::Base
  belongs_to :customer
  has_many :line_items
end

Can statically typed languages provide a means of abstraction on par with that of dynamically typed languages? Steve Yegge doesn’t think so. I’m not sure, and I am in constant search of that answer. The combination of meta-programming, object-orientation and functional programming in Ruby provides a powerful means of abstraction. I would love to have all of that as well as a type system, so I would hate to see the programming community completely give up on static typing and just declare that dynamic typing has won.

A rake task for tracking your time with git

July 7, 2009

Are you using Ruby on Rails? Are you using Git? Do you have a need to track how long you spend on things? Then I have just the thing for you.

I threw together a quick rake task that gets all of your commits in a git repo and parses out the times and commit message from them. Then it formats them with the time and also the time interval between them. You can get the rake task to track your time from this gist.

The output will look something like this:

Fri, Jul 07 10:55AM  20m 49s  Added toolbar for controllers using temp...
Fri, Jul 07 10:34AM  21h 52m  Added support for using page templates i...
Thu, Jul 07 12:42PM  37m 57s  LH#77, fixed issue with tests failing on...
Thu, Jul 07 12:04PM  12m 18s  LH#67, added a limit option to the rende...
Thu, Jul 07 11:52AM  17m 30s  Removed debug statement                 ...
Thu, Jul 07 11:34AM  19h 52m  LH#66, added :path option to render menu...
Wed, Jul 07 03:41PM           Added DSL for modifying portlet behavior...
Tue, Jun 06 02:05PM  18h 44m  LH#119, multiple HTML fields on one bloc...
Mon, Jun 06 07:20PM   6h 21m  Converted docs to textile               ...
Mon, Jun 06 12:58PM           Fix for LH#118, create directories in ge...
Sat, Jun 06 10:22PM           Added support for other template handler...
Fri, Jun 06 04:49PM   0m 58s  bump build                              ...
Fri, Jun 06 04:48PM  23m 11s  Fix LH#106: Section not correctly loadin...
Fri, Jun 06 04:25PM  34m 25s  Fix for LH#107, images were not showing ...
Fri, Jun 06 03:51PM   9m 48s  Fix for LH#110, can't view usages of a p...
Fri, Jun 06 03:41PM  11m 12s  Fix for LH#113, check to see if there is...
Fri, Jun 06 03:30PM   2m 52s  Fixed LH#114, documentation typo        ...
Fri, Jun 06 03:27PM   0m 38s  bump build number                       ...
Fri, Jun 06 03:26PM   5h 38m  Fix for LH#98, tags not getting updated ...
Fri, Jun 06 09:48AM  33m 14s  Fixed LH#105, deleted portlets showing u...

It doesn’t actually truncate the commit messages; I just did that here to make each one fit on a line. If the time interval is over 24 hours, it doesn’t bother printing the interval, because you probably didn’t actually work on that one commit for 37 hours straight. If you really want to track time this way, then each time you sit down to start hacking on a project, you could make a minor change to the .gitignore or something and commit it with a message like “started hacking on foo”. Then when you commit your first chunk of actual work, you will know how long you spent on it.
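If you are curious how the interval logic could work, here is a rough sketch in plain Ruby. It is not the code from the gist. It assumes the commit data arrives as text in the shape produced by git log --pretty=format:"%at|%s" (author timestamp in epoch seconds, a pipe, then the subject line), newest commit first. The names format_interval and time_report are made up for this example.

```ruby
# Hypothetical sketch; the actual rake task in the gist may differ.
# Expected input: one "epoch_seconds|subject" line per commit, newest first,
# e.g. the output of: git log --pretty=format:"%at|%s"

DAY = 24 * 60 * 60

# Format a number of seconds as "21h 52m" (an hour or more) or "20m 49s".
def format_interval(seconds)
  if seconds >= 3600
    format("%2dh %2dm", seconds / 3600, (seconds % 3600) / 60)
  else
    format("%2dm %2ds", seconds / 60, seconds % 60)
  end
end

# Returns one report line per commit: time, gap since the previous commit,
# and the commit message.
def time_report(log_text)
  commits = log_text.lines.map do |line|
    timestamp, message = line.chomp.split("|", 2)
    [Time.at(timestamp.to_i).utc, message]
  end

  commits.each_with_index.map do |(time, message), i|
    older = commits[i + 1] # the commit made just before this one, if any
    gap = older ? (time - older[0]).to_i : nil
    # Leave the column blank for the oldest commit and for gaps over 24 hours.
    interval = gap && gap <= DAY ? format_interval(gap) : ""
    format("%s  %-7s  %s", time.strftime("%a, %b %d %I:%M%p"), interval, message)
  end
end

# Made-up sample data: a 1000 second gap, then a gap longer than a day.
sample = "200000|Fixed typo\n199000|Added feature\n100000|Initial commit"
puts time_report(sample)
```

Inside a rake task you would presumably feed time_report the output of the git log command instead of the sample string.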

Posted in Technology | Topics Git, Ruby, Rake | 4 Comments

Zip Code Proximity Search with Rails

June 27, 2009

So you’re building the next big social networking website using Rails and like all the other hip kids you are going to need to allow your users to search for other users near them. The fancy term for this is “Proximity Search”. For our search, we just want to be able to find other people that are generally within some radius, like 5, 10 or 25 miles. For this, there is no need to geocode the address for each user in our database; we’ll just use their zip code. So effectively, in our system, every user’s location is just the center point of their zip code.

For starters we want to create a zip code model:

script/generate model zip code:string city:string state:string lat:decimal lon:decimal

That will create a model and a migration. You need to alter the migration to specify the precision and scale for the lat and the lon.

t.decimal :lat, :precision => 15, :scale => 10
t.decimal :lon, :precision => 15, :scale => 10

So to populate this database, luckily the good people over at the US Census Bureau have the data readily available for us. I’ve created a rake task to download and load that data into your zips table. Simply put the load.rake file from this gist into the lib/tasks directory of your Rails app.

So now when you run rake load:zip_codes you should see something like:

== Loaded 29470 zip codes in ( 1m 40s) ========================================

Next we need a table for our users. So let’s generate a model and a migration:

script/generate model user

I’ll save you the hassle of typing out all the fields at the command-line and just give them to you here. Paste this into the create_users migration that was generated:

t.string   :username
t.string   :email
t.string   :password
t.string   :password_confirmation
t.string   :first_name
t.string   :last_name
t.string   :address
t.string   :city
t.string   :state
t.integer  :zip_id    

Next you need to hook up the relationship between the zip and the user. This is basic stuff, the zip has many users and the user belongs to a zip.

Now we need some users to play with. A great tool for this is Mike Subelsky’s Random Data gem. I’ve already created a rake task that uses this gem to create some test user accounts. You call it like this:

rake load:random_users[10000]

The 10000 is the number of users we want the rake task to generate for us. Did you know you can pass command-line arguments to a rake task like that? Pretty spiffy. 10000 is a pretty good number because it gives us a fairly large dataset to work with and is still able to load in a reasonable amount of time. 10000 users finished in about 6 minutes and 30 seconds for me.

Next we need to set up our methods to do the querying. For this I basically used Josh Huckabee’s Simple Zip Code Perimeter Search method, but re-worked it a little so we can use named scopes with it. You can grab the code for both zip.rb and user.rb from the gist.
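The code in the gist does this work in SQL, and I won’t reproduce it here. To show the idea behind a radius search, here is a plain Ruby sketch that measures the distance between two zip code center points with the haversine formula and keeps the ones inside the radius. The haversine formula is my choice for illustration and is not necessarily what the gist uses. ZipPoint, miles_between, the standalone within_miles function and the sample coordinates are all hypothetical.

```ruby
# Hypothetical sketch of a radius search; not the SQL-based code from the gist.
EARTH_RADIUS_MILES = 3958.8 # mean radius of the Earth, an approximation

# Great-circle distance in miles between two lat/lon points (in degrees),
# computed with the haversine formula.
def miles_between(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a))
end

# Stand-in for a zip record: a label plus the center point of the zip code.
ZipPoint = Struct.new(:code, :lat, :lon)

# Returns [zip, distance] pairs within `radius` miles of `origin`, nearest first.
def within_miles(origin, candidates, radius)
  candidates
    .map { |zip| [zip, miles_between(origin.lat, origin.lon, zip.lat, zip.lon)] }
    .select { |_, distance| distance <= radius }
    .sort_by { |_, distance| distance }
end

# Made-up points: "near" is 0.1 degrees of latitude away, "far" is 1 degree away.
origin = ZipPoint.new("origin", 40.0, -75.0)
near   = ZipPoint.new("near", 40.1, -75.0)
far    = ZipPoint.new("far", 41.0, -75.0)

within_miles(origin, [far, near, origin], 25).each do |zip, distance|
  puts "%.2f %s" % [distance, zip.code]
end
```

Doing this in Ruby means loading every zip code into memory, which is why a real implementation would push the calculation into the database query.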

There are a couple of things we get here. First is a named scope to easily find zip codes. Looking at the output of the loading of the random users, the last one for me was Mr. Steven Moore of Koloa, HI, 96756. So let’s see how many other people are in that zip code. Start up script/console and run this:

>> Zip.code(96756).users.count
=> 1

Hmm…I guess it’s lonely in Hawaii. Let’s find the zip code that randomly ended up with the most inhabitants:

>> Zip.count_by_sql "select zip_id, count(*) as count 
from users group by zip_id order by count desc limit 1"
=> 18177

Ok, so that’s the id of the zip record, not to be confused with the actual zip code. So let’s find the first person in this zip code:

>> user = Zip.find(18177).users.first
=> #<User id: 1267, username: "cabel1266", ...>

I got Ms. Cheryl Abel of Bloomville, NY. So now for the big moment. What we really want to do is find everyone within 25 miles of Cheryl.

>> user.within_miles(25).count(:all)
=> 49

Looks like Cheryl has 49 people nearby. Let’s see who they are:

>> user.within_miles(25).all.each{|u|
?> puts "%.2f %20s, %2s, %5s" %
?> [u.distance, u.city, u.state, u.zip.code]}
0.00           Bloomville, NY, 13739
0.00           Bloomville, NY, 13739
0.00           Bloomville, NY, 13739
0.00           Bloomville, NY, 13739
0.00           Bloomville, NY, 13739
7.04            Worcester, NY, 12197
7.04            Worcester, NY, 12197
7.43             Maryland, NY, 12116
8.09             Meredith, NY, 13753
8.54            De Lancey, NY, 13752
8.71     Livingston Manor, NY, 12758
9.11             Roseboom, NY, 13450
9.88          Jordanville, NY, 13361
...

So there you have it! I’m still trying to work out some kinks with this and get it to work with count and will_paginate, so if you have any suggestions, fork the gist, hack away and leave a comment. I’ll update this post when I get count and pagination working.

Posted in Technology | Topics Ruby, Rails | 2 Comments

Ruby Nation

June 13, 2009

Thanks to everyone who attended my talk yesterday at Ruby Nation about BrowserCMS. I’ve posted my slides online here. Unfortunately the demo part of the talk wasn’t recorded, so I’ll try to record a screencast of the demo. Look for that in the next few days.

If you did attend my talk, I would appreciate it if you take a minute to rate it.

Posted in Technology | Topics Ruby