Eymiha - Developing gem

5 April 2007

Dave Anderson • @eymiha

Developing gem_raker

Sometimes I lament that I didn’t get in on the ground floor of Ruby and Rails. Sooner than some perhaps, but I spent many years writing too much code while I could have been writing Ruby. If I had known, if I had made the change sooner, I’d have had much more fun and saved myself thousands of hours of sweat. Alas, I didn’t and I sometimes feel like I’m always catching up. But then again, there are lots of unsolved problems and much work left to do. So I stop lamenting and cheer up again.

One of my favorite ways to cheer up is to dive into my old code and make it better. With a few Ruby and Rails projects under my belt, I continue to pull up my existing Java, C++ and C code and move it forward. This is always something I enjoy. I’ve written a lot of stuff over the years; rewriting it gives me the chance to do what I should have done in the first place - using the experience of working with the old code, the knowledge I’ve gained since I originally wrote it, and the requests from others who have used my code in their projects. A language-change-driven rewrite is an opportunity for doing the Grand Refactoring. I just love it.

While recently moving some of my code to Ruby, the question of packaging came up. In C, C++ or Java, the unit of distribution for reusable functionality is a library or jar file. Ruby has libraries but with a little different flavor - they’re features, and implemented through a versioned packaging facility called Rubygems. Building code with Rubygems in mind lets me cleanly segment my functionality - but I do tend to end up with a lot of gems!

Managing a large set of interdependent Rubygems created a problem. I needed development facilities to test and install lots of Rubygems at once, simply and cleanly. I needed an assembly line to mechanize my Rubygem management. Happily, I was able to build this mechanism in Ruby using the same type of facilities I was creating.

It was lots of fun to write too. But first, some background.

Rubygems

The Ruby community is quite a generous group, sharing their ingenuity with each other since the very beginning. One of the best ways they share is through the Rubygems facility.

Rubygems is a system for packaging and sharing code, with versioning built in from the outset. A Rubygem is a chunk of Ruby code that, once downloaded and installed from a Rubygem repository, can be required into your code. Rubygems quickly gained popularity in the community after its introduction. Shortly thereafter, formal support for the feature was added to the Ruby language core in version 1.8.

In standard Ruby, you bring other code into yours with the require statement.

require 'foo'

Foo.each do |foo|
  foo.bar
end

This code effectively looks for the foo feature (usually the file foo.rb, or less typically a platform dependent variant) on the load path and loads it if it hasn’t been already. The require method is defined in the Kernel module, which is included by the Object class making its methods available to every Ruby object.

But what if foo isn’t part of standard Ruby? What if the honknbar feature (which contains foo.rb) was written by your friend Doug in Newark? Wouldn’t it be nice if there were a standard way for Doug to package his code and put it somewhere so that you could grab it, install it and pull it into yours? Rubygems is what you use to do this. Using the gem command, you can locate Ruby features on the Internet and install the code onto your machine.

$ gem install honknbar

This command, after looking in the current directory for honknbar.gem to install on your machine - but finding it missing - goes to the standard Rubygem repository on the Internet (where Doug put the Rubygem) to download and install honknbar.gem. Once the code is correctly installed (and tested - most Rubygems install their unit tests which you may run to verify all is well) you can use it in your Ruby programs through the same require statement that you normally would.

Since Ruby is aware of Rubygems, when your code does a require it really uses a custom require method. This new require first tries the original Kernel require method. If that fails to find the file, it looks for it in the installed Rubygems on your system, and if found (within any versioning constraints that have been set) adds it to the load path and retries the original require - that should now succeed.
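The try, extend the load path, retry sequence can be sketched in a few lines. This is only an illustration of the idea and not the Rubygems implementation: require_with_fallback, the extra_dirs list and the honknbar_demo feature are all hypothetical names, with a throwaway directory standing in for the installed Rubygems.

```ruby
require 'tmpdir'

# Hypothetical helper: try a plain require first; if that fails, look for the
# feature in some extra directories, add the first match to the load path,
# and retry.
def require_with_fallback(feature, extra_dirs)
  require feature
rescue LoadError
  dir = extra_dirs.find { |d| File.exist?(File.join(d, "#{feature}.rb")) }
  raise unless dir            # nothing found anywhere: re-raise the LoadError
  $LOAD_PATH.unshift(dir)
  require feature             # the ordinary require can now succeed
end

# Demonstration with a temporary directory playing the installed gem.
Dir.mktmpdir do |gem_lib|
  File.write(File.join(gem_lib, 'honknbar_demo.rb'),
             "module HonknbarDemo; def self.honk; 'honk'; end; end\n")
  require_with_fallback('honknbar_demo', [gem_lib])
  puts HonknbarDemo.honk
end
```

The real mechanism also has to choose among installed versions, which this sketch leaves out entirely.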

This programmatic sleight of hand literally opens up a whole world of extension. There is now a huge treasure chest full of Rubygems written by hundreds of authors - just run

$ gem list --remote

to see the current list.

Getting that Rubygems Religion

You might ask: “Why do I need to write Rubygems? I just write code on projects. Agile has taught me to write only the code I need when I need it. Why should I bother about packaging and libraries instead of just getting the current project working and out the door?”

For several reasons…

DRY happens at all levels - It isn’t enough to write your code with an eye to DRY (Don’t Repeat Yourself is a mantra of the Pragmatic Programmer to which we all should aspire) on a project-by-project basis - common code that spans projects should be refactored and maintained in one place for all projects that use it. Anything less and you’re just copying code and DRY goes out the window. The standard for project code sharing in Ruby is Rubygems.

Organization by feature is good - Agile teaches us to write small, coherent chunks of code that are simple in themselves, easy to test, and get the job done. By putting each chunk of self-consistent code (that’s what a feature is) into its own Rubygem, all of these goals may be met in a nice little requireable unit. What’s more, the code can be versioned so that the chunks may grow and mature while still maintaining compatibility with older usage.

Despite Agile, high-level architecture and design really do exist - Though it may appear that many agile projects are done without it, underlying goals are always present to direct the choices that are made. Even with an involved client and test driven development, there must be a high-level plan for guidance. Sometimes it is collective, sometimes it is in one person’s head, sometimes it changes as work gets done - but it still must be present. Capturing your architectural and design decisions in a Rubygem not only allows you to make that small chunk of functionality part of the ongoing design of your project, but once you distribute it, you also let developers on other projects who can use your work share in the power of your decisions.

Go pro - How are you going to get to be an acclaimed Ruby professional if you don’t do what they do? Professional developers reuse the code they write and that others have developed. When a professional creates a chunk of functionality that they can reuse, they don’t copy it into each project - they tuck it into a library and reference it. They don’t have to - it’s just the best way to do it. Ruby professionals don’t write Rubygems for fun - they write them because they’re the right way to do the job!

Rake

Jim Weirich is one of the heroes of the Ruby Revolution. Besides being part of the group that develops the Rubygems infrastructure, among other things he created rake, a Ruby version of the Unix make command. He tells the story of its creation - several “wouldn’t it be cool if you could do this, but that’d take way too long to write” moments - and then writing that chunk of code in twenty or thirty minutes. This is a testament both to Ruby’s power and clarity as well as to Jim’s abilities. A code hacker from the days when hacking was a labor of love rather than infamy, he created the first rake in a few hundred lines of Ruby code. It’s grown larger as it’s matured, but remains a flexible approach to dependency-based process automation that is unequaled in simplicity and elegance.

Rakefiles (the files containing the statements that define the automation of a process) consist of tasks and rules expressed in a domain specific language. A task declares the processing that occurs when it is called, once its dependencies on other tasks have been satisfied. A rule defines a template process that is used to marshal objects with appropriate characteristics through the same process steps. There’s certainly a lot more that the rake command is capable of, but that’s better discussed elsewhere. Here I’ll just cover the parts I needed to use.
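To make tasks and rules concrete, here is a minimal sketch that uses rake as a library from an ordinary script. The task names (compile, package) and the .in/.out file extensions are made up for illustration and are not part of gem_raker.

```ruby
require 'rake'
require 'tmpdir'

log = []  # records the order in which the tasks run

# A task with no prerequisites.
task :compile do
  log << :compile
end

# A task that depends on :compile; rake runs prerequisites first.
task :package => [:compile] do
  log << :package
end

# A rule: a template for building any .out file from the .in file of the
# same name. Here the "build" just upcases the source text.
rule '.out' => '.in' do |t|
  File.write(t.name, File.read(t.source).upcase)
end

Rake::Task[:package].invoke

built = nil
Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    File.write('notes.in', "hello\n")
    Rake::Task['notes.out'].invoke   # no explicit task; the rule supplies one
    built = File.read('notes.out')
  end
end

puts log.inspect
puts built
```

In a real rakefile the same task and rule declarations would appear on their own, and the rake command would do the invoking.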

Using rake as the foundation for my work was a simple choice. Each Rubygem already had a rakefile. What I needed was a base rakefile that would do my management tasks. I started by abstracting a common rakefile from the existing Rubygems I’d written. I whittled it down to six common actions I could use in all my local Rubygem development:

rake test	Run the Rubygem’s unit tests
rake rdoc	Generate the Rubygem’s documentation
rake install	Install the Rubygem in the local Rubygem repository
rake uninstall	Remove the Rubygem from the local Rubygem repository
rake clean	Remove temporary and extraneous files in the Rubygem
rake clobber	Clean on steroids with impunity

Of course, most of this work had already been done by others. I just needed to tease it all together into one base rakefile.rb that would put all of the rake tooling for my Rubygem automation in one place.

Raking a Rubygem

Each Rubygem typically resides in its own directory. I decided that the code needed to carry out each task should be extracted into a common base that could be directly required into the rakefile.rb in that directory. Only the code that specializes the rakefile should reside in a particular Rubygem’s directory.

#rakefile.rb  (for an automated gem)

require 'gem_package'  # specializations
require 'gem_raker'    # abstractions

Needing only two require statements in a rakefile is quite nice. However, the specializations are loaded before the abstractions - which may seem a little counterintuitive. This is because the abstractions control the action in the rakefile and need to use the specializations immediately to dynamically produce tasks. If the specializations were specified after the abstractions, then a third step would be required to drive the creation of the dynamic rakefile. This is part of the blessing and curse of Ruby: since it is interpreted, time-dependencies such as when something is defined make a difference. Things may seem out of whack sometimes, but they’re really not.
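The point about definition time can be seen in a tiny sketch that has nothing to do with rake. The class name DemoSettings is hypothetical; the only thing being shown is that top-level code runs as it is read, so whatever it uses must already have been defined.

```ruby
early_error = nil

begin
  DemoSettings.new            # runs immediately, before the class exists
rescue NameError => e
  early_error = e
end

class DemoSettings
  def name
    'demo'
  end
end

puts "too early: #{early_error.class}"
puts DemoSettings.new.name    # works now that the definition has been read
```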

The `gem_raker.rb` Abstraction

For my tooling, a Rubygem’s rakefile.rb should require the gem_raker.rb abstraction as its backbone. Each part of what’s in the abstraction is described below.

Testing

The first part of the gem_raker.rb runs any unit test tasks that may be present in the Rubygem. Most Rubygems will have them, but some may not. It all depends on who’s doing the writing, what’s being written, and the situation at the time.

# lib/gem_raker.rb

 1 | require 'rake'
 2 |
 3 | require 'rake/testtask'
 4 |
 5 | task :default => :test
 6 |
 7 | Rake::TestTask.new(:test) do |t|
 8 |   t.test_files = FileList['test/tc_*.rb']
 9 |   t.warning = true
10 |   t.verbose = true
11 | end

At line 1, we pull in the rake Rubygem (yes, rake itself is distributed as a Rubygem!). At line 3 we pull in rake’s testtask framework, which defines how testing proceeds. Line 5 declares test to be our default task, so a call to rake in our Rubygem’s directory without an additional command will execute the test task. Then lines 7-11 define the test task itself, running all of the Ruby files containing test cases that reside in the test subdirectory, loudly and with warnings.
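The file pattern on line 8 decides which files count as test cases. A small sketch, assuming a throwaway directory and made-up file names, shows what that pattern picks up and what it leaves alone:

```ruby
require 'rake'
require 'tmpdir'

matched = nil
Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    Dir.mkdir('test')
    # Hypothetical file names: two test-case files and one helper.
    %w[tc_parser.rb tc_writer.rb helper.rb].each do |name|
      File.write(File.join('test', name), "# placeholder\n")
    end
    # The same pattern the test task uses; only the tc_* files match.
    matched = Rake::FileList['test/tc_*.rb'].to_a.sort
  end
end

puts matched.inspect
```

Anything in the test subdirectory that does not follow the tc_ naming convention, such as a shared helper, is simply not run as a test case.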

Packaging the Gem

The class we’ll specialize is called GemPackage, and what we’ll fill in later will be the specifics that pertain to the Rubygem we’re building. But GemPackage has some common parts that we define and use at the abstract level.

# lib/gem_raker.rb  (continued)

14 | require 'rake/gempackagetask'
15 |
16 | class GemPackage
17 |   attr_accessor :package_dir
18 |
19 |   def method_missing(method, *args)
20 |     if (method == :fill_spec || method == :name || method == :version)
21 |       $stderr.puts(error = "GemPackage must define the #{method} method")
22 |       raise Exception.new(error)
23 |     end
24 |   end
25 | end
26 |
27 | gem_package = GemPackage.new
28 | gem_package.package_dir ||= 'pkg'
29 |
30 | spec = Gem::Specification.new do |s|
31 |   gem_package.fill_spec(s)
32 | end
33 |
34 | Rake::GemPackageTask.new(spec) do |p|
35 |   p.package_dir = gem_package.package_dir
36 |   gem_package.fill_package(p)
37 | end

Line 14 brings in rake’s gempackagetask, which is defined on top of rake’s more generic packagetask. The common functionality for the GemPackage class is declared between lines 16 and 25. An accessor to the package_dir instance variable is declared at line 17 to hold the name of the subdirectory where the Rubygem that is being constructed will reside. The method_missing method at line 19 is simply a nagger: it complains if the fill_spec, name or version methods are missing - the abstraction needs them to be specialized; otherwise it does nothing.
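The nagging pattern can be tried on its own. This is a minimal sketch with a hypothetical NaggingPackage class standing in for GemPackage; it raises NotImplementedError rather than Exception, which is a choice made for this sketch and not what the listing does.

```ruby
# Hypothetical stand-in for GemPackage, used only to illustrate the pattern.
class NaggingPackage
  REQUIRED = [:fill_spec, :name, :version].freeze

  def method_missing(method, *args)
    if REQUIRED.include?(method)
      raise NotImplementedError,
            "NaggingPackage must define the #{method} method"
    end
    nil  # any other missing method is treated as an optional hook
  end
end

pkg = NaggingPackage.new

# An optional hook that was never defined quietly returns nil.
puts pkg.fill_package(:anything).inspect

# A required method that was never defined produces the complaint.
begin
  pkg.name
rescue NotImplementedError => e
  puts e.message
end
```

The quiet nil for everything else is what lets the abstraction call optional hooks without checking whether a specialization defined them. The cost is that a misspelled method name is swallowed just as quietly.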

A small bit of pain is reflected in line 21. Since rake uses the Ruby standard FileUtils package in verbose mode, the shell actions are output - but to $stderr. This is to make sure the output buffering gets flushed while the command runs. However, this also implies that $stderr is used consistently throughout, and unfortunately, it is not. The best we can do is pick up the pieces and flush everything to $stderr whenever we get the chance.

On line 27, an instance of GemPackage is constructed for subsequent use in the rakefile. The default package subdirectory pkg is set to be the package_dir if something else has not been set during the instance’s initialization. While pkg is the default used by rake’s packagetask, it is only exposed when the package is being constructed. We’ll need it for other things as well.

After instantiation, the GemPackage is used to create a Gem::Specification at line 30. The fill_spec method in the GemPackage is called (which is complained about if missing from the specialization) to set the non-abstract values for the Rubygem we’re raking.

Finally, with the GemPackage and Gem::Specification in hand, we declare the GemPackageTask at line 34, which declares the package and repackage tasks. We set the package_dir explicitly, and let the specialized GemPackage adjust anything else through its fill_package method if it’s been defined.

Installing and Uninstalling

Working with Rubygems when you’re the one doing the development means that you may need to go through some cycles of local installation and uninstallation. By putting the install and uninstall tasks in the rakefile it’s quick and easy to do that, and the result is exactly what you would get by using the gem command directly - the same way that others using your Rubygem will do it.

# lib/gem_raker.rb  (continued)

39 | task :install => [:repackage] do |t|
40 |   Dir.chdir(gem_package.package_dir)
41 |   Gem::GemRunner.new.run ["install",gem_package.name, "-v", gem_package.version]
42 | end
43 |
44 | task :uninstall do |t|
45 |   Gem::GemRunner.new.run ["uninstall", gem_package.name, "-v", gem_package.version]
46 | end

Line 39 declares the install task, repackaging the gem first for good measure. We install by changing into the directory that holds the package we built through the repackage task and do a gem install of the feature from inside rake just as if we had gone through the gem command directly.

Line 44 uninstalls the Rubygem, but there is some nastiness here. In an effort to do good, the authors of the gem command did not include an option to uninstall verbatim - even when you specify precisely what Rubygem and version you want to uninstall, if there are other Rubygems with names that are close to what you specified, gem assumes you might not really be sure and gives you back a list of selections to choose from. This makes automating uninstalls slightly painful, but without modifying gem (which is not out of the question) this is the behavior of the system we must live with. Such is life.

Documenting

When a Rubygem is released, the RDoc system is typically used to document it. It is intended to document the software-oriented contents of files, classes and methods - the typical manifest of a Rubygem. The documentation is an invaluable resource for other developers.

# lib/gem_raker.rb  (continued)

48 | require 'rake/rdoctask.rb'
49 |
50 | Rake::RDocTask.new(:rdoc) do |rd|
51 |   rd.main = gem_package.rdoc_main
52 |   rd.rdoc_files.include(rd.main) if (rd.main != nil)
53 |   rd.rdoc_files.include("lib/**/*.rb")
54 | end

Line 48 pulls in rake’s rdoctask code, and line 50 declares the rdoc task. If a main document exists, it is added to the set of files that will be in the documentation, and is followed by the set of files that are contained in the Rubygem’s lib directory.

Cleaning and Clobbering

The final step is cleaning up after ourselves.

# lib/gem_raker.rb  (continued)

56 | require 'rake/clean'
57 |
58 | task :clobber => [:clobber_rdoc]

Rake already does most of this setup for us. Just pull in its clean code and add any other dependencies. Line 56 gives us the clean and clobber tasks, and line 58 makes clobber remove our generated documentation as well.

The `gem_package.rb` Specialization

Once I started getting into this, I decided that there’s no reason gem_raker shouldn’t be a Rubygem itself! Of course! Put the Rubygem raking framework into a Rubygem! Marvelous! There’s only one small gotcha - I have to bootstrap. You can’t use the gem_raker Rubygem from the installed Rubygem repository to make itself to put into the Rubygem repository. Circularity. Chickens and eggs, you know. Fortunately, that can be handled by just slightly altering the two-line rakefile in this special case:

# rakefile.rb  (for an automated gem, modified)

require 'gem_package'      # specializations
require 'lib/gem_raker'    # abstractions

Instead of using an installed Rubygem, it uses itself - what would be installed in the local Rubygem repository. I suppose this makes it a meta-gem, but pragmatically speaking, everything’s still nice and DRY.

The specialization consists of some accessor declarations (line 2), the initialization of the GemPackage (lines 4-8), and a Gem::Specification filler (lines 10-23).

# gem_package.rb

01 | class GemPackage
02 |   attr_reader :name, :version, :files, :rdoc_options
03 |
04 |   def initialize
05 |     @name = 'gem_raker'
06 |     @version = '0.1.0'
07 |     @files = FileList[ '*.rb', 'lib/*', 'test/*', 'html/**/*' ]
08 |   end
09 |
10 |   def fill_spec(s)
11 |     s.name = name
12 |     s.version = version
13 |     s.summary = "Eymiha standard rake-based gem assembly line"
14 |     s.files = files.to_a
15 |     s.require_path = 'lib'
16 |     s.autorequire = name
17 |     s.has_rdoc = true
18 |     s.rdoc_options << "--all"
19 |     s.author = "Dave Anderson"
20 |     s.email = "dave@eymiha.com"
21 |     s.homepage = "http://www.eymiha.com"
22 |     s.rubyforge_project = "cori"
23 |   end
24 | end

Note that the code in gem_raker.rb and gem_package.rb has no nice RDoc commenting in place. Unfortunately, RDoc documents Ruby classes and methods quite well, but does not handle domain specific languages (like rake). The gem_multi_raker.rb file about to appear is similarly devoid of RDoc-style documentation.

Now that we have the machinery to mechanize the management of a single Rubygem, we can move on to fashioning the Rubygem conveyor belt.

Raking Multiple Rubygems

We now have the potential for creating lots of Rubygems and managing them all through the same interface. But what should the interface to that manager look like? What sort of things do we want the manager to do?

It’s an interesting question. If we want to do something to an individual Rubygem, we just go do it to that Rubygem directly - all the management logic is already in its rakefile. Since our Rubygems could be interdependent in many different ways, we don’t necessarily want to re-expose all the connectivity that’s explicit in our Rubygem corpus - in fact, even if we tried, some of it may be dynamic and that would be impossible to get right. In this case we would want to keep related Rubygems together in their own assembly line and everything would be fine.

An assembly line should be a tool that lets us deal with a set of related Rubygems uniformly: test them all, install them all, clobber them all… Hmmm… It sounds like the assembly line has the same interface as the rakefile for an individual Rubygem! This makes it very easy to build, and we can use rake yet again.

But there’s one more thing - since we made gem_raker a Rubygem, why not put the assembly line in the Rubygem too? That way, specifying an assembly line can be done with a quick require of a Rubygem and a little bit of specialization data. We’ll call the feature gem_multi_raker.

# lib/gem_multi_raker.rb

01 | require 'rake'
02 |
03 | task :default => :test
04 |
05 | task :test do
06 |   each('testing') { `rake --silent test` }
07 | end
08 |
09 | task :rdoc do
10 |   each('rdocing') { `rake --silent rdoc` }
11 | end
12 |
13 | task :install do
14 |   each('installing') { `rake --silent install` }
15 | end
16 |
17 | task :uninstall do
18 |   each('uninstalling') { `rake --silent uninstall` }
19 | end
20 |
21 | task :clean do
22 |   each('cleaning') { `rake --silent --verbose clean` }
23 | end
24 |
25 | task :clobber do
26 |   each('clobbering') { `rake --silent --verbose clobber` }
27 | end
28 |
29 | def each(announcer)
30 |   File.open("gem_list.txt").each { |line|
31 |     $stderr.puts "\n***********  #{announcer} #{line}\n"
32 |     gem_dir = line.chomp
33 |     cd(gem_dir, :verbose => false) {
34 |       op = yield.chomp
35 |       $stderr.puts op if op != ""
36 |     }
37 |   }
38 | end

The task definitions correspond directly to our actions before - but at this level we call the rake task for each Rubygem on our assembly line. The each method is in lines 29-38. It opens the file gem_list.txt in the assembly line directory, and treats each line in the file as a subdirectory holding a Rubygem. It changes to that directory and yields to the block that was passed into the method - in our case this is the specification of a rake process to execute.

Note that we put any non-empty output we capture to $stderr. Again, we need to remain consistent with what’s going on inside rake. Note also that we make judicious use of rake’s silent and verbose options to keep stray output from appearing at inopportune moments.
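The directory-walking part of the helper can be exercised without rake at all. The sketch below is a hypothetical variant, named each_gem_dir here, that reads the list with File.foreach and skips blank lines; the directory names and the temporary layout are made up for the demonstration.

```ruby
require 'tmpdir'

# Hypothetical variant of the helper: yields once inside each directory
# named in list_file, skipping blank lines.
def each_gem_dir(list_file)
  File.foreach(list_file) do |line|
    gem_dir = line.strip
    next if gem_dir.empty?
    Dir.chdir(gem_dir) { yield gem_dir }  # the block form restores the old directory
  end
end

visited = []
Dir.mktmpdir do |root|
  Dir.chdir(root) do
    %w[foobar mumble].each { |d| Dir.mkdir(d) }
    File.write('gem_list.txt', "foobar\n\nmumble\n")
    each_gem_dir('gem_list.txt') do |d|
      visited << [d, File.basename(Dir.pwd)]
    end
  end
end

puts visited.inspect
```

Each pair records the directory name read from the list and the directory the block actually ran in, so the two should agree for every entry.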

To set up an assembly line, we just add this rakefile to the directory that holds the Rubygems we want to manage:

# rakefile.rb  (for a rubygem assembly line)

require 'gem_multi_raker'

This is even better than before - the entire assembly line is in a single require statement. The only specialized data is the list of gem directories being managed by the assembly line:

# gem_list.txt  (for a rubygem assembly line)

foobar
mumble
bast

To set up our special meta-gem assembly line, we have to bootstrap again, but this is still a single line:

# rakefile.rb  (for the gem_multi_raker rubygem assembly line)

require 'gem_raker/lib/gem_multi_raker'

The list of gem directories is just the gem_raker Rubygem:

# gem_list.txt  (for the gem_multi_raker rubygem assembly line)

gem_raker

That’s all there is to it. We can handle the management of arbitrary sets of Rubygems as well as manage our meta-gem simply and easily - and all with the same code.

And so on…

It isn’t necessary to stop at one level. Since the vocabulary is the same at all levels, we can set up assembly lines of assembly lines (of assembly lines, etc.) if we’d like. We just use the name of the assembly line directory instead of the Rubygem directory in the gem_list.txt file. Since the rakefile in that directory responds to the same commands as everything else, we can go shallow or deep with the same apparatus for raking our Rubygems.
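The nesting can be pictured with a small sketch that walks such a tree directly. This is not how the assembly line itself recurses (that happens through each directory’s rakefile); leaf_gems is a hypothetical helper and the directory layout is made up. It assumes that any directory containing a gem_list.txt is an assembly line and anything else is a Rubygem.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: a directory containing gem_list.txt is treated as an
# assembly line and expanded recursively; any other directory is a gem.
def leaf_gems(dir)
  list = File.join(dir, 'gem_list.txt')
  return [dir] unless File.exist?(list)
  File.readlines(list).map(&:strip).reject(&:empty?).flat_map do |entry|
    leaf_gems(File.join(dir, entry))
  end
end

leaves = nil
Dir.mktmpdir do |root|
  # Made-up layout: the top line holds the "core" line and the gem "gamma";
  # the "core" line holds the gems "alpha" and "beta".
  %w[core/alpha core/beta gamma].each do |d|
    FileUtils.mkdir_p(File.join(root, d))
  end
  File.write(File.join(root, 'gem_list.txt'), "core\ngamma\n")
  File.write(File.join(root, 'core', 'gem_list.txt'), "alpha\nbeta\n")
  leaves = leaf_gems(root).map { |path| path.sub("#{root}/", '') }
end

puts leaves.inspect
```

A command issued at the top of such a tree would reach the same leaves in the same order, one level of rake at a time.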