Do not meddle in the affairs of wizards, for you are crunchy and good with ketchup.
This article is in five parts
November | Introduction | motivation, definitions, examples |
December | Architecture | the Perl interpreter, calling conventions, data representation |
January | Tools | h2xs, xsubpp, DynaLoader |
February | Modules | Math::Ackermann , Set::Bit |
March | Align::NW |
Needleman-Wunsch global optimal sequence alignment |
From the top, the problem is how to call a C subroutine from Perl. In order to do this, two things must happen:
We'll refer to these as control flow and data flow, respectively. In order to understand control and data flow, we must first understand
Finally, underlying all this is the fact that ultimately connects Perl to C: the Perl interpreter is a C program. We'll look at these ideas in more detail below.
Data type | Format |
---|---|
character | a single byte |
integer | 32-bit 2s complement |
double | 80 bits, allocated to sign, exponent and mantissa |
function pointer | address of the function entry point |
Complex data types are represented as aggregates of simple data types. For example, the C type
int x[2];
is represented by 8 bytes in memory: 4 bytes for x[0]
, followed immediately by 4 bytes for x[1]
. Similarly, the C type
struct S { int a; char b[4]; }
is represented by 8 bytes in memory: 4 bytes for a
, followed immediately by 4 bytes for the 4 elements of b
.
This sort of data representation has advantages and disadvantages.
Also, the data is stored in the formats that the CPU uses. This makes C code run fast, because the CPU can operate directly on the C representation, without doing any conversion or translation.
Other advantages of C data representation include simplicity and transparency.
For example, given 8 bytes in memory, there is no way to tell—from the bytes themselves—whether they contain an instance of int x[2]
, or struct S
, or some other data type, or no meaningful data at all.
All the information about data is implicit in the executable code of the program. The thing that makes 8 bytes of memory a struct S
is the fact that the program treats it as such: the fact that the program puts an int
in the first 4 bytes, and 1 character in each of the second 4 bytes.
Two specific disadvantages of C data representation are
Every data object in a Perl program is represented internally by a C struct. (Remember, the Perl interpreter is a C program.) For example, a scalar looks something like this
typedef enum { IOK = 0x01, /* has valid integer value */ POK = 0x02, /* has valid string value */ } Flags; struct Scalar { int refcnt; /* how many references to us */ Flags flags; /* what we are */ char *pv; /* pointer to malloc'd string */ int cur; /* length of pv as a C string */ int len; /* allocated size of pv */ int iv; /* integer value */ };
The Scalar
struct allows the interpreter to manage the type information for each scalar, so that programmers don't have to. For example, suppose we write
my $x = 42;
When the interpreter executes this statement, it
Scalar
structrefcnt
field to 1iv
field to 42IOK
flagPOK
flagIf we later write
print "$x";
the interpreter
pv
sprintf(pv, "%d", iv)
to convert iv to a stringPOK
flag
refcnt
starts out at 1 because $x
refers to the scalar. Whenever a reference to the scalar is created, the interpreter increments refcnt
; whenever a reference to the scalar goes away, the interpreter decrements refcnt
. When the last reference to the scalar (including $x
) goes away, refcnt
goes to zero and the interpreter frees the Scalar
struct.
Arrays are represented by a struct (declared in av.h
). Hashes are represented by a struct (declared in hv.h
). Code refs are represented by a struct (declared in cv.h
). Every data object in Perl is represented by a struct of one sort or another.
It frees the programmer from data typing. The interpreter knows the type of every data object; it converts types as necessary.
It also frees the programmer from storage allocation. The interpreter knows the size and location of every data object. It allocates, resizes and frees them as necessary. No memory leaks, no wild or dangling pointers. No C programmer's disease.
.h
files in /usr/local/lib/perl/
version/
architecture/CORE/
,
and are documented in perlguts. They constitute Perl's C language Application Programming Interface (API).
If you plan to write XS code, you should read perlguts to get an idea of how the API is organized, and what kinds of facilities it provides. When you actually write XS, you need to have perlguts at hand as a reference.
A dedicated CPU register, called the program counter (PC), holds the address of the next instruction to execute. In normal control flow, the CPU increments the PC as each instruction is executed. This causes the CPU to execute instructions in the order that they appear in memory.
C control structures, such as for
, while
, and if () then; else;
, require the CPU to execute jumps: from the bottom of a loop back to the top; around then
clauses and else
clauses that aren't taken. A jump instruction contains the address of the next instruction to execute; this is called the target of the jump. When the CPU executes a jump, it loads the target address directly into the PC, and execution continues there.
call
instructionreturn
instruction
The first instruction of a subroutine is called its entry point. A call
instruction contains the address of the entry point.
To execute a call
instruction, the CPU first increments the PC to obtain the address of the next instruction. This is called the return address. However, it does not execute the next instruction. Instead, it pushes the return address onto the processor stack. Then, it loads the entry point from the call
instruction into the PC and begins executing the subroutine.
At the end of the subroutine is a return
instruction. When the CPU executes a return
instruction, it pops the return address off of the stack, loads it into the PC, and resumes execution with the instruction that follows the original call
instruction.
The CPU keeps return addresses on a stack so that it can handle nested—and especially, recursive—subroutine calls.
Parameters are passed on the same stack as the return address. The compiler generates code that evaluates each parameter and pushes the parameter value onto the stack. After all the parameters are on the stack, the call
instruction is issued.
The subroutine can't pop its parameters off the stack: if it did, it would lose the return address. However, it knows the location of each parameter as an offset from the stack pointer, so it can access its parameters directly on the stack.
A C subroutine can return only one value. For simplicity and efficiency, the subroutine may pass the return value from the subroutine back to the caller in a CPU register. Alternatively, the caller can leave space for the return value on the stack. This is necessary if the return value is too big to fit into a register.
The execution of a Perl program divides into two similar phases
However, the division between the two phases is not so clean
BEGIN
blocks execute code during the translation phaseeval
and s///e
translate code during the execution phase.The ability to cross the line between translation and execution is a powerful feature of Perl.
cos
, open
, and sort
cmp
, +
, and <<
The children of a node represent its operands: the condition of an if
statement; numbers to be added; arguments to a subroutine.
The interpreter walks the tree in postfix order. This means that it walks all the children of a node before it walks the node itself. Since the children of a node may have children of their own, this is a recursive process.
Walking a node typically yields a value; for this reason, we also speak of evaluating a node. The interpreter keeps these values on a stack. We'll call this the Perl stack, to distinguish it from the processor stack mentioned earlier.
The things on the Perl stack are not the C structs that represent Perl data, but pointers to those structs. This is important both for efficiency, and for obtaining the correct semantics for Perl programs. However, we'll still speak of the things on the Perl stack as values.
To evaluate a node, the interpreter pops the operands for the node off of the stack, carries out whatever operation is specified by the node, and then pushes the resulting value back onto the stack. In this way, the values of child nodes become operands for their parent. Programs that manage data on a stack in this way are sometimes called stack machines.
$x = $y + 3;
This is parsed into a syntax tree like this
= / \ $x + / \ $y 3
This tree has 5 nodes. The table below shows their order of evaluation, and the contents of the stack after each node is evaluated.
Step | 1 | 2 | 3 | 4 | 5 |
Node | $x | $y | 3 | + | = |
Stack | \$x | 42 \$x |
3 42 \$x |
45 \$x |
Here is what happens in each step.
=
node will assign to $x
, the
interpreter pushes a reference to $x
, not its value.$y
turns out to be 42.3
evaluates to 3.+
node pops the values 3 and 42, adds them, and pushes
the sum onto the stack.=
pops the value 45 and the reference to
$x
, and carries out the assignment. Now the stack is empty.my $coderef = sub { ... }
or
sub foo { ... } my $coderef = \&foo;
in Perl. Internally, a code reference is represented by (what else?) a C struct. Among other things, the code reference has a pointer to the root of the syntax tree for the subroutine. We'll call this the root pointer.
A subroutine call in the program source is represented in the syntax tree by a fragment that looks like this
entersub | +----+--...---+ | | | arg1 arg2 ... argN
The entersub
opcode transfers control to the called subroutine. Its children evaluate the arguments of the call.
To execute a subroutine call, the interpreter first walks each child node, and pushes the result onto the Perl stack. When all the arguments are on the stack, the interpreter walks the entersub
node.
An entersub
opcode holds a pointer to a code reference. When the interpreter walks an entersub
, it follows this pointer to the code reference, and then follows the root pointer in the code reference to the syntax tree for the subroutine. Then it executes the subroutine.
As mentioned above, the things on the Perl stack are not the C structs that represent the arguments of the subroutine, but rather, pointers to those structs. This is the meaning of the statement in perlsub that
The array @_ is a local array, but its elements are aliases for the actual scalar parameters.
In other words, Perl passes parameters by reference, like FORTRAN, and unlike C.
If the subroutine returns any values, it pushes pointers to them onto the stack, in the same locations where its parameters were. After the subroutine returns, the caller retrieves the return values from the stack.
When the interpreter executes an entersub
opcode, it first checks the xsub pointer in the code reference. If the xsub pointer is null, it follows the root pointer to the syntax tree for the subroutine and walks it, as described above.
If the xsub pointer is not null, then the interpreter ignores the root pointer. Instead, it gets the address of the xsub from the xsub pointer, and calls the xsub. This is an eXternal Subroutine call. This how control passes from Perl to C. This is XS.
In order for a C subroutine to become an xsub, three things must happen
newXS(char *name, void (*fp)())
Given the name of a Perl subroutine in name
, and the address of the entry point of a C subroutine in fp
, newXS
will install fp
as the xsub pointer in the code reference for name
. Once this happens, any Perl code that calls name()
will invoke the C subroutine.
name
can be in any package. To install a subroutine called new()
in the package Align::NW
, we pass the string "Align::NW::new"
for name
. If the Align::NW
package implements a class, this has the effect of making the C subroutine a method of that class.
C subroutines begin as source code in source files. The compiler translates source files into object files, and the linker combines object files to produce a link library. A link library contains
Static linking requires us to rebuild the Perl interpreter. We compile the Perl source files into object files, and then link the Perl object files with our library to create a new version of the Perl executable.
With static linking, our C subroutines become a permanent part of the Perl interpreter. They are loaded into memory as part of the executable image whenever the interpreter runs. The interpreter knows where the entry points are because the linker resolves their addresses when it builds the executable.
Dynamic linking allows us to use the Perl interpreter unmodified. If a Perl program needs an xsub, it loads the library at run time and looks up the subroutine entry point in the symbol table. Different programs can install different xsubs.
We usually prefer dynamic linking, precisely because it allows us to write new XS modules without having to rebuild the Perl interpreter. However, XS modules can be statically linked. This is sometimes done for efficiency. A few modules (such as the DynaLoader) require static linking. Prior to Perl5, all language extensions required static linking.
In practice, dynamic linking is complex. One reason is that the internal format of link libraries is system dependent. Most systems provide facilities to load libraries and look up entry points. However, these facilities are also—you guessed it—system dependent. They all do more or less the same thing, but the names of the system calls and the precise semantics vary.
Even the names of the libraries are system dependent. In Unix systems,
they are called shared objects, and are identified by a
.so
extension. In Windows, they are called Dynamic
Link Libraries and are identified by a .dll
extension. On Macintosh, they are called shared libraries,
and are (usually) identified by a finder type of shlb
.
Another issue is that many systems treat dynamic link libraries specially. For example, they may use virtual memory to map them into shared memory pages. This makes it essential that we use the proper facilities on each system for dynamic linking.
Finally, the system calls that do dynamic linking, as well as the newXS()
call, are C language calls. This means that we have to call them from C code, not Perl code. We need an xsub to install an xsub.
DynaLoader::bootstrap
method. bootstrap
locates our link libraries, loads them, finds our entry points, and makes the appropriate newXS()
calls.
DynaLoader handles system dependencies by brute force. If you look in the Perl sources, you'll find about 10 different versions of the Dynaloader XS code. The Configure
script chooses the right version for each system when the Perl interpreter is built.
Tracing the data paths through the Dynaloader to understand how it installs xsubs is left as an exercise for the reader.
The problem is that the Perl interpreter puts parameters on the Perl stack, but a C subroutine expects to find parameters on the processor stack. Furthermore, the Perl parameters are pointers to C structs that represent Perl data, while C code expects parameters to be values in native hardware formats.
To solve this problem, something has to convert between Perl and C data representations. The Perl interpreter doesn't, so the xsub has to. Typically, the xsub uses facilities in the Perl C API to get parameters from the Perl stack and convert them to C data values. To return a value, the xsub creates a Perl data object and leaves a pointer to it on the Perl stack.
Coding directly to the Perl C API is hard. Furthermore, we often want to call existing C subroutines from Perl: subroutines that don't know about the Perl C API, and that do expect to find their parameters on the processor stack. We'll refer to an existing C subroutine as a target routine.
Rather than drag the Perl C API into all our C code, we usually write glue routines. The Perl interpreter calls the glue routine as an xsub. The glue routine converts the Perl parameters to C data values, and then calls the target routine, passing it the C data values as parameters on the processor stack. When the target routine returns, the glue routine creates a Perl data object to represent its return value, and pushes a pointer to that object onto the Perl stack. Finally, the glue routine returns control to the Perl interpreter.
Glue routines provide some structure for the data flow problem, but they are still hard to write. So we don't. Instead, we write XS code. XS is, more or less, a macro language. It allows us to declare target routines, and specify the correspondence between Perl and C data types. A program called xsubpp
reads XS code and generates glue routines in straight C.
Much of the work, and much of the wizardry, of XS has to do with
figuring out how to write XS code so that xsubpp
will do
the right thing. We'll talk about this in great detail—next
month.
/usr/local/lib/perl/
version/
architecture/CORE/sv.h
\$x
$x=\$x
, prevent reference counts from going to zero. If the programmer does not break the cycle explicitly, the memory is not freed until the program terminates.entersub
node can't hold the root pointer itself, because someone might write
sub One { print "One\n" } One(); eval 'sub One { print "Two\n" }'; One();The interpreter can find the code reference through the symbol table and reassign the root pointer during the execution phase. In contrast, the interpreter has no way of finding an
entersub
node after the translation phase ends.