Eymiha - Creating Alternate Ruby Objects

13 June 2008

Dave Anderson • @eymiha


One of the things I’ve found slightly frustrating about Ruby is the new method. Nearly twenty years ago I was writing code in Objective-C on NeXT, which had explicit allocation and initialization:

Object *foo = [[Foo alloc] init];

Using this invocation, foo could be any sort of Object, because init was free to return an object other than the one alloc had just created. This two-part object creation could do something out-of-the-box Ruby couldn’t.

It’s easy to create a new object in Ruby. An instance of a class Foo is created by calling its new method.

foo = Foo.new

When new is called on Foo, the call goes to the new instance method of Class, since Foo is itself an instance of Class. That method is written in C: the rb_class_new_instance function implements it, and the rb_define_method function maps it into Ruby as Class’ new method.

/* src/ruby-1.8.6/object.c  (in the Ruby Interpreter) */

  VALUE
  rb_class_new_instance(argc, argv, klass)
    int argc;
    VALUE *argv;
    VALUE klass;
  {
    VALUE obj;

    obj = rb_obj_alloc(klass);
    rb_obj_call_init(obj, argc, argv);

    return obj;
  }

  /* later, in Init_Object(): */
  rb_define_method(rb_cClass, "new", rb_class_new_instance, -1);

In the context of the Foo instantiation above, this code allocates a Foo instance, initializes it with whatever arguments were passed to new, and finally returns the new object.
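For readers more at home in Ruby than in C, here is a rough pure-Ruby equivalent of rb_class_new_instance. The helper name build and the Point class are hypothetical, used only for illustration; the point is that the object handed back is always the one allocate produced, whatever initialize happens to return.

```ruby
# Rough Ruby equivalent of rb_class_new_instance (`build` is a hypothetical helper).
def build(klass, *args, &block)
  obj = klass.allocate                   # like rb_obj_alloc(klass)
  obj.send(:initialize, *args, &block)   # like rb_obj_call_init; result discarded
  obj                                    # always the allocated instance
end

class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
    "this return value is ignored"       # new never looks at it
  end
end

a = build(Point, 1, 2)
b = Point.new(3, 4)
puts a.class   # prints Point
puts b.y       # prints 4
```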

Something Different

Sometimes, though, a new object is not what is really wanted. An object that already exists may be the right one to return, or an object of another type may need to be created and returned instead. Whatever the reason, this is not possible with the standard Ruby mechanism, because Class’ new always returns the instance it allocated. To return an object different from the one allocated, an enhanced mechanism is needed.

In a class that uses Class’ new method to construct instances, there are only two calls that may be intercepted to enhance its functionality - memory allocation and initialization. During memory allocation the space needed to hold an instance is allocated by Class’ allocate method, which returns an uninitialized instance of the class. During initialization the instance’s initialize method is called using any arguments that were passed to new to set up the initial state of the instance. It’s the initialize method’s responsibility to call its superclass’ initialize method - which is responsible for calling its superclass’ initialize and so on up the chain to Object, from which all instances derive.
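A small sketch of the two steps and the super chain, using hypothetical Base and Derived classes: allocate hands back an instance whose initialize has not run, while new runs Derived#initialize, which passes control up to Base#initialize through super.

```ruby
# Hypothetical classes showing allocation vs. initialization.
class Base
  attr_reader :trail

  def initialize(*args)
    @trail = [:base]        # runs first, reached through super
  end
end

class Derived < Base
  def initialize(label)
    super                   # bare super forwards `label` to Base#initialize
    @trail << :derived
  end
end

raw = Derived.allocate      # memory only: no initialize has run
p raw.trail                 # prints nil
p Derived.new("x").trail    # prints [:base, :derived]
```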

The only other constraint on any mechanism affecting object creation is that it should live above the surface of the Ruby interpreter: object creation should be extended in Ruby code, not replaced by changing the interpreter.

Returning an Alternate Object

Initialization is the more reasonable place to control the enhanced functionality since allocation is generally logic-free. This also allows information in a partially instantiated instance to be used to make decisions about its possible replacement. When initialize determines that another object should be returned by new, some sort of replacement mechanism must be used.

Looking back at the C code, it is allocation, not initialization, that determines the object returned by new. Since the value returned from initialize is ignored and the Ruby interpreter cannot be changed, different semantics are required to replace an object. A remap_new_object method, called from initialize to declare an object replacement, works well.

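# class.rb  (Alternative Object Creation)

class Class
  # Keep the original Class#new reachable under another name.
  alias old_new new

  # Create the object as usual, then return its registered replacement
  # (removing the entry) or, if none was registered, the object itself.
  def new(*args, &block)
    obj = old_new(*args, &block)
    @@new_remap.delete(obj) || obj
  end

  # Maps freshly created objects to the objects new should return instead.
  @@new_remap = {}

  # Called from initialize to declare that new should return r instead of o.
  def self.remap_new_object(o, r)
    @@new_remap[o] = r
  end
end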

After moving the original new method aside with an alias, a new new method is defined that first calls the original new method and then either returns the alternate object that was reassigned, or the object itself. If an alternate object is present, it’s deleted upon retrieval to keep the remapping Hash small.

This would work fine if no class redefined the new method. Some classes do, however, and their versions may never call Class’ new at all. Objects created that way bypass the remapping code entirely, so an alternate object registered from their initialize would never be returned. The mechanism must be adjusted to handle this.
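Both behaviors are easy to reproduce. This is a minimal sketch assuming Ruby 2.7 or later (for `...` argument forwarding). It installs the global version with two illustrative substitutions: the alias is named original_new, and an identity-keyed Hash constant, NEW_REMAP, stands in for the @@new_remap class variable so that lookups never call a user-defined hash method. It then compares a class that relies on Class’ new with a hypothetical class, Custom, that supplies its own new. Because it patches Class#new for every class in the process, treat it as an experiment only.

```ruby
class Class
  alias_method :original_new, :new      # hypothetical alias name

  # Identity-keyed table (assumption): keys are compared by object identity.
  NEW_REMAP = {}.compare_by_identity

  def new(...)
    obj = original_new(...)
    NEW_REMAP.delete(obj) || obj        # replacement if registered, else obj
  end

  def self.remap_new_object(o, r)
    NEW_REMAP[o] = r
  end
end

# Relies on Class#new, so remapping works.
class Plain
  def initialize(swap)
    Class.remap_new_object(self, "replacement") if swap
  end
end

# Hypothetical class with its own new that never calls Class#new.
class Custom
  def self.new(swap)
    obj = allocate
    obj.send(:initialize, swap)
    obj
  end

  def initialize(swap)
    Class.remap_new_object(self, "replacement") if swap
  end
end

p Plain.new(true)           # prints "replacement"
p Plain.new(false).class    # prints Plain
p Custom.new(true).class    # prints Custom: the replacement is never consulted
p Class::NEW_REMAP.size     # prints 1: Custom's entry is left behind
```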

Elective Alternate Remapping

Part of the code is still good: the chunk that creates the Hash and adds alternate objects to it.

# class.rb  (Alternative Object Creation)

class Class
  @@new_remap = {}

  def self.remap_new_object(o,r)
    @@new_remap[o] = r
  end
end

What’s still left is the retrieval of an alternate object.

Thinking more about the whole idea of returning alternate objects from new, this relatively small change is philosophically a radical break from standard Ruby. It probably makes sense to require classes that need object replacement during creation to explicitly include this capability. Only when a class elects to have the capability should the mechanism to alias the new method and look up remappings be put into place, and then for just that class. Besides positively asserting the class’ intention to break from standard Ruby, this explicitness helps safeguard against cavalier use of such non-standard behavior.

The code to do this is virtually the same as in the Class version:

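# class.rb  (Alternative Object Creation)

class Class
  # Gives the generated new methods access to the remapping Hash.
  def self.create_remap
    @@new_remap
  end

  # Call this in a class body to give that class a new method
  # that honors remap_new_object.
  def allows_object_remapping_in_new
    self.class_eval <<EOS
class<<self
  alias old_new new

  def new(*args,&block)
    obj = old_new(*args,&block)
    Class.create_remap.delete(obj) || obj
  end
end
EOS
  end
end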

To use this, each class that needs remapping capability calls allows_object_remapping_in_new in its class body. Making the call runs the metacode, which adds the retrieval mechanism to that class.

Note that the metacode is wrapped in a class<<self ... end construct. class_eval evaluates its string as if inside the class body, where alias and def would act on the methods of the class’ instances; class<<self opens the class’ singleton class instead, so the alias and the new definition apply to the class’ own new method. Ruby is intertwingled at this point: the instance methods of Class are the class methods of Class’ instances (the classes themselves). This is what allows Ruby to be entirely object-oriented: it folds back on itself in a strange loop.
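To see why the wrapper matters, this sketch evaluates two definitions of the same method through class_eval, one directly in the string and one inside class << self. Gadget and describe are hypothetical names used only for illustration.

```ruby
class Gadget; end

Gadget.class_eval <<EOS
  # Evaluated as if inside the class body: defines an instance method.
  def describe
    "instance method"
  end

  # class << self opens Gadget's singleton class: defines a class method.
  class << self
    def describe
      "class method"
    end
  end
EOS

puts Gadget.new.describe   # prints instance method
puts Gadget.describe       # prints class method
```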

An Example

Consider a simple class Foo that either creates an instance of itself or an instance of another class Bar when new is called. (The criterion for deciding which to create is deliberately trivial here, but it demonstrates the idea; much more complicated logic can be used as the situation dictates.)

class Foo

  allows_object_remapping_in_new

  def initialize(n)
    Class.remap_new_object(self,Bar.new) if n > 0
  end

end

class Bar
end

puts Foo.new(0).class.name
puts Foo.new(1).class.name

When this code runs, it produces:

Foo
Bar

Simple and straightforward. Cascades work too: if Bar also called allows_object_remapping_in_new and conditionally created an instance of some other class, say Mumble, Bar’s initialize method would remap the allocated Bar instance to the Mumble instance, so Bar.new would return the Mumble. Foo’s initialize would then remap the Foo instance to that Mumble instance, and it would ultimately be what was returned from Foo’s new. Cascaded remappings like this require no additional effort or special knowledge outside the initialize methods; multiple levels of alternative remapping happen naturally.
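A self-contained sketch of the cascade just described. It repeats the elective mechanism and, as an assumption, defines Class.create_remap as a plain accessor for @@new_remap; Mumble and the numeric thresholds are placeholders.

```ruby
class Class
  @@new_remap = {}

  def self.remap_new_object(o, r)
    @@new_remap[o] = r
  end

  # Assumed accessor used by the generated new methods.
  def self.create_remap
    @@new_remap
  end

  def allows_object_remapping_in_new
    class_eval <<EOS
class << self
  alias old_new new

  def new(*args, &block)
    obj = old_new(*args, &block)
    Class.create_remap.delete(obj) || obj
  end
end
EOS
  end
end

class Mumble
end

class Bar
  allows_object_remapping_in_new

  def initialize(n)
    Class.remap_new_object(self, Mumble.new) if n > 1   # second level
  end
end

class Foo
  allows_object_remapping_in_new

  def initialize(n)
    Class.remap_new_object(self, Bar.new(n)) if n > 0   # first level
  end
end

puts Foo.new(0).class.name   # prints Foo
puts Foo.new(1).class.name   # prints Bar
puts Foo.new(2).class.name   # prints Mumble
```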

Last Words

So now there’s a generic Ruby way for object creation to return a different object than the one allocated.

However, care must be taken in the use of this mechanism, lest havoc ensue. Generally, if the object returned from new is remapped, the returned object should either be of the same type, or have duck-like aspects of the original where needed. If what is returned is not compatible, any manner of problems may take place. This is likely the reason that the semantics of standard Ruby’s new method are what they are. Careful (and perhaps exhaustive) testing of any code that uses this mechanism should definitely take place!

The mechanism has been added to the eymiha rubygem, available at rubygems.org.