Perl does late binding of method calls. This means that the particular subroutine that is invoked by a method call isn't definitely known until the method is invoked at run time.
In both cases, the underlying issue is whether certain decisions are made earlier or later: at compile time or run time. Languages like C++ require the programmer to make decisions early; Perl allows them to be made later.
Perl is occasionally criticized for these features. Some argue that weak typing impedes robust software design. And late binding does impose some run time overhead.
On the other hand, these features allow extraordinary flexibility in the design and code of Perl programs. Fundamentally, this is because decisions that are made early must be expressed in the text of the program, while decisions that are made later can be computed by the execution of the program, and execution is more powerful than text.
In this column, we discuss some ways to exploit this flexibility in the design and implementation of Perl modules.
    +-----------+   +-----------+   +-----------+
    |   Text    |   |  Spread   |   |    CAD    |
    |  Editor   |   |   Sheet   |   |  Package  |
    +----+------+   +----+------+   +----+------+
         |               |               |
         +---------------+---------------+
                         |
                    +----+------+
                    |  Printer  |
                    |  Driver   |
                    +-----------+

In this example, three different applications are using the same printer driver.
A related, but different, benefit of modules is reimplementation. Once an application has been coded to a module interface, it can use any module that implements that interface, even modules that are written after the application. Schematically, reimplementation looks like this:

                    +-----------+
                    |    CAD    |
                    |  Package  |
                    +----+------+
                         |
         +---------------+---------------+
         |               |               |
    +----+------+   +----+------+   +----+------+
    |  Printer  |   |  Plotter  |   |    CRT    |
    |  Driver   |   |  Driver   |   |  Driver   |
    +-----------+   +-----------+   +-----------+

In this example, a single application is using three different device drivers.
Reuse doesn't depend upon late binding. If a module already exists, then a new application can use it, and it doesn't matter when it binds to it.
Pretty much by definition, reimplementation requires late binding. If an application has been written and bound (early) to a particular module, then it can't later use a different module—not without rebinding. Depending upon the language, rebinding may require relinking, recompiling, or even rewriting the application.
Languages that do early binding typically have special facilities to support reimplementation. In C++, you use abstract base classes with virtual functions. In Java, you use interfaces. In Perl, you just do it.
Consider, for example, the HTML::Stream module. This module generates HTML. Nominally, it sends output to a filehandle, supplied by the caller:

    $fh = new IO::File ">$file.html";
    $stream = new HTML::Stream $fh;

Internally, HTML::Stream writes to $fh by calling its print method:

    $fh->print(...);

However, $fh needn't be an IO::File object. Because Perl does late binding, the only requirement on $fh is that it be blessed into a package that has a print method. This means that you can easily create and use alternate implementations of print. The HTML::Stream POD provides this example.

    package StringHandle;

    sub new
    {
        my $class = shift;
        my $sh = '';
        bless \$sh, $class;
    }

    sub print
    {
        my $sh = shift;
        $$sh .= join('', @_);
    }

    package main;
    use HTML::Stream;

    my $sh = new StringHandle;
    my $stream = new HTML::Stream $sh;

A StringHandle object is a blessed reference to a single string, and the StringHandle::print method simply appends its arguments to that string. HTML::Stream then outputs to this string, rather than to a file.
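The same substitution works for any code that only calls print on its handle. Here is a minimal sketch that does not depend on HTML::Stream. The routine emit_heading is hypothetical: it stands in for a module that writes to a caller-supplied handle. StringHandle is repeated from the POD example so that the block runs on its own.

```perl
use strict;
use warnings;
use IO::Handle;   # so that the STDOUT handle responds to method calls

package StringHandle;

sub new
{
    my $class = shift;
    my $sh = '';
    bless \$sh, $class;
}

sub print
{
    my $sh = shift;
    $$sh .= join('', @_);
}

package main;

# Hypothetical routine: the only thing it requires of $fh
# is that it respond to a print method.
sub emit_heading
{
    my($fh, $text) = @_;
    $fh->print("<h1>", $text, "</h1>\n");
}

my $sh = StringHandle->new;
emit_heading($sh, 'Late binding');        # appended to the string
my $html = $$sh;

emit_heading(\*STDOUT, 'Late binding');   # same code, written to STDOUT
```

Nothing in emit_heading names a class. Which print runs is decided on each call, by whatever $fh happens to be.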
In the context of Perl, all this may seem unremarkable. In other languages, however, it can be difficult or impossible.
    my $circle = new Circle $x, $y, $radius;
    $circle->draw($color, $weight);

If you look inside the Circle package, you'll find that new and draw are both subroutines. But they seem to be used differently: draw is called with the arrow operator, while new isn't; draw is called on a $circle object, while new is called on the Circle package.
There are actually two separate distinctions wound up together here: the distinction between direct and indirect syntax, and the distinction between class and instance methods. These distinctions combine to make four different ways to call a method. All four are in use, so it is worth sorting them out.
The direct syntax uses the arrow operator. On the left is either a package name, or a reference that has been blessed into a package. On the right is the name of a subroutine within that package. Arguments are placed in parentheses following the method name.

    Circle->new ($x, $y, $radius)
    $circle->draw($color, $weight)

The indirect syntax is modeled after the syntax of Perl's own print statement. First comes the method name, followed by either a package name or a blessed reference. Arguments follow the package name. There is no comma between the package name and the first argument.

    new  Circle  $x, $y, $radius
    draw $circle $color, $weight

The difference between the direct and indirect syntax is just that: syntax. The semantics are exactly the same.
For example, new is a class method. When we create a new circle, we write

    new Circle $x, $y, $radius

to apply the new method to the Circle class. We can't call new on a $circle object (although see Method Overloading, below), because we don't have one yet: that's why we're calling new.
Conversely, draw is an instance method. When we draw a circle, we write

    draw $circle $color, $weight

to apply the draw method to a $circle object. It wouldn't make sense to call draw on the Circle class: as a class, Circle represents all circles in the abstract, not any particular circle that could actually be drawn.
The difference between class and instance methods is a fundamental semantic distinction. You have to get it right, or your program won't work.
| | Indirect syntax | Direct syntax |
|---|---|---|
| Class semantics | new Circle $x, $y, $radius | Circle->new($x, $y, $radius) |
| Instance semantics | draw $circle $color, $weight | $circle->draw($color, $weight) |
It is very common in Perl code to call class methods (especially constructors) with indirect syntax and to call instance methods with direct syntax: to only use the upper left and lower right entries in the table. However, there is no requirement to do this. We can use either syntax with either semantics.
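To see all four combinations side by side, here is a small sketch. It assumes a toy Circle package written for this illustration: its draw method returns a description instead of rendering anything. It also assumes that Perl's indirect object syntax is enabled, as it is by default.

```perl
use strict;
use warnings;

package Circle;

sub new
{
    my($package, $x, $y, $radius) = @_;
    bless { x => $x, y => $y, r => $radius }, $package
}

# A stand-in for real drawing: returns a description of the circle
sub draw
{
    my($circle, $color, $weight) = @_;
    "circle r=$circle->{r} in $color, weight $weight"
}

package main;

my($x, $y, $radius) = (0, 0, 5);

my $c1 = new Circle $x, $y, $radius;      # class method,    indirect syntax
my $c2 = Circle->new($x, $y, $radius);    # class method,    direct syntax

my $d1 = draw $c1 'red', 2;               # instance method, indirect syntax
my $d2 = $c2->draw('red', 2);             # instance method, direct syntax
```

Both constructor calls return Circle objects, and both draw calls return the same string; only the spelling differs.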
In many cases, the indirect syntax is more readable. One reason is that it mimics English word order. For example,

    new Circle

reads like adjective-noun, and

    draw $circle

reads like verb-object.
If we have Set::IntSpan objects, then

    union $a $b

is perhaps more natural than

    $a->union($b)

which obscures the symmetry of the underlying operation.
On the other hand, the direct syntax is more powerful than the indirect syntax. With the direct syntax, you can chain method calls

    Circle->new($x, $y, $r)->draw($c, $w);
    $a->union($b)->intersect($c)

and you can do a computed method call by placing the method name in a scalar variable

    $method = 'draw';
    $circle->$method;
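Here is a runnable sketch of both techniques. The Point package and its right and up methods are hypothetical, invented for this illustration. Each mover returns the object itself, which is what makes the chain possible.

```perl
use strict;
use warnings;

package Point;

sub new
{
    my($package, $x, $y) = @_;
    bless { x => $x, y => $y }, $package
}

# Each mover returns the object, so that calls can be chained
sub right { my($point, $d) = @_; $point->{x} += $d; $point }
sub up    { my($point, $d) = @_; $point->{y} += $d; $point }

package main;

# Chained calls: each method is applied to the result of the previous one
my $p = Point->new(0, 0)->right(3)->up(4);

# Computed calls: the method name is chosen at run time
for my $method (qw(right up right))
{
    $p->$method(1);
}
```

After the loop, the point is at (5, 5). The list of method names could equally have come from a file or from user input.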
An expression of the form

    a b

can be parsed as either a method call

    b->a()

or as a subroutine call

    a(b)

The parser chooses the first if b is known to be a package name at the point of the call, and the second if it is not. Whether b is known to be a package name, in turn, can depend upon the load order of modules in the program.
Perl always gets it right, but humans cope poorly with this sort of distant ambiguity. Tom Christiansen gives examples of subtle bugs that can result, and the Perl docs suggest that the indirect syntax be avoided entirely.
However, this seems extreme. Variety of expression is one of the things that makes Perl such a lucid language. If you are uncertain how the parser will interpret an indirect call, then by all means use the direct syntax. But if you know what it does and it does what you want, then consider using the indirect syntax where it aids readability.
The first argument to a method ($_[0]) is either the name of the package on which the call was made (for class methods) or a reference to the object on which the call was made (for instance methods). For all but the most trivial methods, you will want to assign this argument to a lexical (my) variable inside the method body.

    my $x = shift;

The question is what to name the variable.
For class methods, $class and $package are natural choices. In the common case where the class method is a constructor, you can then bless the object into $package.

    sub new
    {
        my $package = shift;
        my $object = { };
        bless $object, $package
    }
For instance methods, $self and $this are frequently seen. However, these may not be the best choices.
The names $self and $this function as pronouns. They don't name the object themselves; rather, they mean "whatever object this method was called on". It is up to the reader to remember what object that is. It seems a small point, but reading code is hard: we need all the help we can get.
As an alternative, consider naming the self variable after the package. Then we can write things like

    sub Circle::move
    {
        my($circle, $dx, $dy) = @_;
        $circle->{x} += $dx;
        $circle->{y} += $dy;
    }

and know, immediately, on every line, that the self variable refers to a Circle object. In effect, we save ourselves the trouble of doing an indirection in our head.
Here's a common example:

    sub Circle::new
    {
        my($self, $x, $y, $r) = @_;
        my $package = ref $self || $self;
        my $circle = { x => $x, y => $y, r => $r };
        bless $circle, $package
    }

With this definition, we can call new on either the package name or an existing object.

    my $circle1 = Circle  ->new($x, $y, $radius);
    my $circle2 = $circle1->new($x, $y, $radius);
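One consequence of computing the package from the invocant is that the constructor keeps working under inheritance. In the sketch below, Disk is a hypothetical subclass, invented for this illustration, that inherits new unchanged.

```perl
use strict;
use warnings;

package Circle;

sub new
{
    my($self, $x, $y, $r) = @_;
    my $package = ref $self || $self;   # an object yields its class; a name yields itself
    my $circle = { x => $x, y => $y, r => $r };
    bless $circle, $package
}

# Hypothetical subclass: it adds nothing and inherits new from Circle
package Disk;
our @ISA = ('Circle');

package main;

my $disk1 = Disk->new(0, 0, 1);      # called on a package name
my $disk2 = $disk1->new(5, 5, 2);    # called on an existing object
```

Both objects are blessed into Disk, not Circle, because the package is taken from the invocant rather than written into the constructor.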
Here's a method that loads data from a file, a file handle, a string or an array:

    sub Doc::load
    {
        my($doc, $source) = @_;
        my $ref = ref $source;

        # Call UNIVERSAL::isa with explicit parentheses, so that the
        # parser cannot mistake the test for an indirect method call
        not $ref                            and $doc->load_file  ($source);
        UNIVERSAL::isa($source, 'IO::File') and $doc->load_fh    ($source);
        $ref eq 'SCALAR'                    and $doc->load_string($source);
        $ref eq 'ARRAY'                     and $doc->load_list  ($source);
    }
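To watch the dispatch happen, here is a self-contained sketch of the same idea. It is a variant, not the method above: the four loaders are hypothetical stubs that only record which kind of source they were given, the tests are written as an if/elsif chain, and Scalar::Util::blessed guards the class test.

```perl
use strict;
use warnings;
use IO::File;

package Doc;
use Scalar::Util qw(blessed);

sub new { my $package = shift; bless { loaded => [] }, $package }

# Hypothetical loaders: each one only records the kind of source it saw
sub load_file   { my $doc = shift; push @{ $doc->{loaded} }, 'file'   }
sub load_fh     { my $doc = shift; push @{ $doc->{loaded} }, 'fh'     }
sub load_string { my $doc = shift; push @{ $doc->{loaded} }, 'string' }
sub load_list   { my $doc = shift; push @{ $doc->{loaded} }, 'list'   }

sub load
{
    my($doc, $source) = @_;
    my $ref = ref $source;

    if    (not $ref)                                     { $doc->load_file  ($source) }
    elsif (blessed($source) && $source->isa('IO::File')) { $doc->load_fh    ($source) }
    elsif ($ref eq 'SCALAR')                             { $doc->load_string($source) }
    elsif ($ref eq 'ARRAY')                              { $doc->load_list  ($source) }
    else  { die "Doc::load: unsupported source\n" }
}

package main;

my $doc  = Doc->new;
my $text = "some text";

$doc->load('notes.txt');               # plain string: taken as a file name
$doc->load(IO::File->new);             # IO::File object (unopened here)
$doc->load(\$text);                    # reference to a string
$doc->load([ 'line 1', 'line 2' ]);    # reference to an array
```

The caller uses one method name for all four kinds of source; the choice of loader is made at run time, from the argument itself.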
You can also do a kind of overloading based on return type. The wantarray operator returns true if the subroutine was called in list context, and false if it was called in scalar context.
One simple application of this is to return an array in list context, and an array reference in scalar context

    sub MakeList
    {
        my @list = ...
        wantarray ? @list : \@list
    }

Applications can then write

    $list = MakeList;

for efficiency, and

    @list = MakeList;

for simplicity.
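Here is a complete version that can be run as written. The contents of the list are placeholders, chosen only so that there is something to return.

```perl
use strict;
use warnings;

# The list contents are placeholders for illustration
sub MakeList
{
    my @list = map { $_ * $_ } 1 .. 5;
    wantarray ? @list : \@list
}

my @list = MakeList;       # list context: the elements are copied out
my $list = MakeList;       # scalar context: a reference, with no copy

my $count = @list;         # 5
my $third = $list->[2];    # 9
```

The caller chooses between the two forms simply by the context in which it calls MakeList.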
$this: this is the corresponding keyword in C++.