Do not meddle in the affairs of wizards, for you are crunchy and good with ketchup.
This article is in five parts
November | Introduction | motivation, definitions, examples |
December | Architecture | the Perl interpreter, calling conventions, data representation |
January | Tools | h2xs, xsubpp, DynaLoader |
February | Modules | Math::Ackermann , Set::Bit |
March | Align::NW | Needleman-Wunsch global optimal sequence alignment |
Two months ago, we presented a problem that could benefit from an XS implementation. Last month, we discussed the architecture of XS. This month, we discuss the tools that are used to write XS: h2xs and xsubpp. Like any tools, these are easier to use if you understand how they are intended to be used. Chisels cut with the grain, not across it. To understand how the XS tools are intended to be used, we need some historical background.
Perl is often used for tasks that were formerly done with shell scripts, C programs, and assorted Unix tools, such as find(1), awk(1), sed(1), and sort(1). To help programmers port existing software to Perl, the Perl distribution includes some translation utilities, such as find2perl, a2p (awk to Perl), and s2p (sed to Perl). The output of these utilities may require some editing, but they generate reasonably complete and correct translations.
There is no c2p. Such a program would be difficult to write; besides, a direct translation from C to Perl is rarely desirable. More commonly, we want to call existing C code from new Perl programs.
h2xs
C code declares its interface in a .h file. To generate interfaces to C code, Perl provides h2xs. h2xs is a utility that reads a .h file and generates an outline for an XS interface to the C code. This includes
- a Makefile.PL file
- a .xs file
- a .pm file
However, the output of h2xs is not a complete, or even a nearly complete, XS interface. It is merely a beginning. It is a valuable beginning: it includes some boilerplate that is difficult to generate by hand. But it is only a beginning.
Neither is the output of h2xs necessarily correct. Interfacing Perl to C is a hard problem. h2xs makes guesses about how to do it; sometimes it guesses wrong.
If you run h2xs assuming that the results will be complete and correct, and that you will find structure and coherence in its output, then you are going to be very confused and very frustrated. To move forward from the outline that h2xs generates, you must accept it as strictly provisional.
Similar issues surround the inputs to h2xs. h2xs takes many command line options. However, these do not constitute a complete and coherent system for making h2xs do what you need. Rather, they have accumulated over time, each one added to meet a particular need, in a particular context. Many of these options are useful, but there is not necessarily any combination of them that will make h2xs do the Right Thing for you. You have to take what you can get and go forward from there.
xsubpp
xsubpp is the program that translates XS code to C code. XS is sometimes referred to as a language, but it is better thought of as a collection of macros; xsubpp is the macro expander. Again, the XS macros do not constitute a complete and coherent language for interfacing Perl to C. They have accumulated over time, each one added to meet a particular need.
Writing XS doesn't require an understanding of the deep structure of the macros: there isn't any. Rather, it requires searching through perlxs to find a macro that does what you need, and then using that macro.
h2xs
We run h2xs from the command line.
Developing a module with h2xs is a process of successive refinement. You should create a development directory for this purpose. In the examples below, we'll refer to the development directory as
    .../development/
When you run h2xs, it creates a new directory within the development directory to hold the module sources; we'll call this the module directory. The module directory is created on a path that maps the module name. For example, the module directory for Align::NW is
    .../development/Align/NW/
h2xs was originally written to generate XS interfaces for existing C libraries. At its simplest, you specify the header file for a library, and it creates and populates a module directory. If the header file is /usr/include/rpcsvc/rusers.h, we can do
    .../development>h2xs rpcsvc/rusers
    Writing Rusers/Rusers.pm
    Writing Rusers/Rusers.xs
    Writing Rusers/Makefile.PL
    Writing Rusers/test.pl
    Writing Rusers/Changes
    Writing Rusers/MANIFEST
h2xs searches for the header file in the current directory and on the standard include paths, and complains if it doesn't find it.
    .../development>h2xs foo
    Can't find foo.h
h2xs names the module and the module directory after the header file. It upcases the first letter of the name, in accordance with the Perl convention that module names have leading capitals.
If you don't like the module name that h2xs generates, you can specify a different one with the -n flag.
    .../development>h2xs -n RPC::Rusers rpcsvc/rusers
    Writing RPC/Rusers/Rusers.pm
    Writing RPC/Rusers/Rusers.xs
    Writing RPC/Rusers/Makefile.PL
    Writing RPC/Rusers/test.pl
    Writing RPC/Rusers/Changes
    Writing RPC/Rusers/MANIFEST
The -n flag controls both the name of the module directory and the name of the Perl module; in this case, the Perl module will be RPC::Rusers. The -n flag doesn't affect the search for the header file: h2xs still finds the header in /usr/include/rpcsvc/rusers.h.
Now consider the Align::NW module. Align::NW isn't an interface to an existing library, and its headers aren't in /usr/include/. It is a new Perl module that is partly implemented in C.
Because the C code for Align::NW is part of the module, it ought to live in the module directory. The Perl code will be in Align/NW/NW.pm, and it is tempting to name the C sources to match
    .../development/Align/NW/NW.pm
    .../development/Align/NW/NW.c
    .../development/Align/NW/NW.h
However, this won't work. The problem is that h2xs is going to create
    .../development/Align/NW/NW.xs
and xsubpp will translate NW.xs into
    .../development/Align/NW/NW.c
which collides with our NW.c file.
Another possibility is to integrate the C code into the .xs file
    .../development/Align/NW/NW.pm
    .../development/Align/NW/NW.xs    # contains our C code
    .../development/Align/NW/NW.h
This works, because anything in a .xs file that isn't an XS macro is passed through unchanged by xsubpp to the .c file. Some XS modules implement large amounts of C code directly in the .xs file; ultimately, the distinction between XS code and C code becomes arbitrary.
However, I prefer to keep the bulk of my C code in .c files, and reserve the .xs file for glue routines. Reasons for this include
- I can link the .c files together with a main.c and test them in a stand-alone C program.
- I want to keep as little code in the .xs file as possible.
The C code in Align::NW implements a single Perl method, named score. We'll name our C sources score.c and score.h. Then we can create and populate the module directory like this
    .../development>ls
    score.c score.h
    .../development>h2xs -n Align::NW score
    Writing Align/NW/NW.pm
    Writing Align/NW/NW.xs
    Writing Align/NW/Makefile.PL
    Writing Align/NW/test.pl
    Writing Align/NW/Changes
    Writing Align/NW/MANIFEST
    .../development>cp score.c score.h Align/NW/
Header files often #define constants that appear in their interfaces. h2xs parses these constants and makes them available to the Perl module as methods. For example, if score.h contained the lines
    #define FOO 17
    #define BAR 42
then the values 17 and 42 would be available to Perl code as the return values of Align::NW::FOO() and Align::NW::BAR(), respectively.
h2xs doesn't do this by creating FOO() and BAR() methods. Instead, it creates Align::NW::AUTOLOAD() in Align/NW/NW.pm, and a C routine named constant() in Align/NW/NW.xs.
Calls to FOO() and BAR() are handled by Align::NW::AUTOLOAD(). AUTOLOAD() calls constant(), and constant() returns the value #define'd in the .h file.
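To make the lookup concrete, here is a minimal sketch in plain C of what a constant() routine might look like for the two constants above. It is not the code that h2xs emits: the real routine is generated from the header file and has a different signature. The error convention used here (errno set to ENOENT for an unknown name) is an assumption made for this illustration.

```c
#include <errno.h>
#include <string.h>

/* Stand-ins for the #define lines in score.h (values from the text). */
#define FOO 17
#define BAR 42

/*
 * Simplified sketch of a constant() lookup routine.
 * Returns the value of the named constant and leaves errno at 0.
 * For an unknown name it sets errno to ENOENT and returns 0, so
 * that the caller can report that there is no such constant.
 */
double constant(const char *name)
{
    errno = 0;
    if (strcmp(name, "FOO") == 0)
        return FOO;
    if (strcmp(name, "BAR") == 0)
        return BAR;
    errno = ENOENT;     /* assumed error convention for this sketch */
    return 0;
}
```

An AUTOLOAD-style caller would pass the name of the missing method to constant(), check errno, and either return the value or complain that the constant is not defined.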
Align::NW::AUTOLOAD() enforces a Perl function prototype on constant methods. To satisfy this prototype, you have to predeclare any constant methods that you use, like this
    sub FOO ();
    sub BAR ();
If your header file doesn't define any constants that you need, you can run h2xs with the -c switch. This suppresses the AUTOLOAD routine from the .pm file and the constant routine from the .xs file.
If you don't need the AutoLoader for anything else, you can run h2xs with the -A switch. -A implies -c, and additionally suppresses inheritance from AutoLoader.
h2xs can parse function prototypes and generate glue routines based on them. It doesn't always guess right about how to convert parameters, so we may have to edit the glue by hand. Even so, this can save us some typing. To automatically generate glue routines, do
    .../development>h2xs -n Align::NW -A -O -x -F '-I ../..' score.h
The -n and -A flags are as before. If you've previously run h2xs, you'll need the -O flag to force it to overwrite the existing Align/NW/* files. The -x flag tells h2xs to generate glue routines based on the function prototypes in score.h.
The -x flag uses the C::Scan module to locate header files. You'll need to have this module installed on your system in order to use -x.
We run h2xs from the development directory, but C::Scan runs in the module directory when it searches for header files. The -F flag specifies additional switches for C::Scan to pass to the C preprocessor. We pass a -I ../.. switch to tell the preprocessor to search for headers two levels up, in the development directory. This allows it to find score.h.
Now that we know how to run h2xs, we're going to look inside an .xs file to see what is there.
We write XS routines in an .xs file. xsubpp translates XS routines to xsubs for us. To see how this works, we'll look at the code that xsubpp emits when it generates an xsub.
The target routine is in hypotenuse.c
    #include <math.h>

    double hypotenuse(double x, double y)
    {
        return sqrt(x*x + y*y);
    }
and its prototype in hypotenuse.h
    double hypotenuse(double x, double y);
The Perl routine is Geometry::hypotenuse(). We want calls to Geometry::hypotenuse() to invoke the target routine.
h2xs
We run h2xs
    .../development>h2xs -n Geometry -A hypotenuse.h
and it generates Geometry/Geometry.xs as
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    #include <hypotenuse.h>

    MODULE = Geometry		PACKAGE = Geometry
I've omitted the -x flag, because this is an instance where h2xs guesses wrong about parameter conversion. It is easy enough to fix, but you have to understand the typemap, which we haven't discussed yet.
Geometry.xs
Let's look at Geometry.xs in detail.
The first three #includes give our XS code access to the Perl C API. Through them you can find all the entry points and data types mentioned in perlguts.
hypotenuse.h
The last #include gives our XS code access to our own C header file: hypotenuse.h. h2xs searches for header files in the current working directory and on the standard include path; however, the #include directive that it emits uses angle brackets instead of quotes. Angle brackets instruct the C compiler to only search for header files on the standard include path. We're going to put hypotenuse.h in the module directory, which is not on the standard include path, so we need to edit the #include directive to use quotes.
    #include "hypotenuse.h"
Then the C compiler will find hypotenuse.h in the module directory.
MODULE and PACKAGE
MODULE and PACKAGE are XS directives. They specify the module and package for our xsubs. This is easy to understand if we remember the underlying definitions: a module is a file, and a package is a namespace.
We write xsubs in XS; xsubpp translates the XS code to straight C; the C compiler compiles the C code into link libraries; the makefile installs those libraries, and the DynaLoader loads those libraries at run time. In order to load a library, the DynaLoader needs to know two things:
- the file that contains the library
- the namespace in which to install the xsubs
The MODULE directive tells it the file, and the PACKAGE directive tells it the namespace.
The PACKAGE directive names a Perl package, like Geometry or Align::NW. xsubpp then generates code to install xsubs in that package.
The MODULE directive doesn't name an actual file: it names a Perl package, just like the PACKAGE directive. xsubpp maps that package name into a file name, like Geometry.so or Align/NW.so. The makefile installs that file on an appropriate path in the Perl library, and the DynaLoader finds it there.
An .xs file may contain multiple MODULE and PACKAGE directives. MODULE and PACKAGE directives should always appear together, as shown above. All MODULE directives in an .xs file should name the same module. PACKAGE directives can name different packages as necessary to place different xsubs into different Perl packages; this is quite analogous to the use of repeated package statements in ordinary Perl code.
Now we'll start adding things to Geometry.xs.
PROTOTYPES
The PROTOTYPES directive tells xsubpp whether or not to install our xsubs with prototypes. Write
    PROTOTYPES: ENABLE
or
    PROTOTYPES: DISABLE
to enable or disable prototypes.
The PROTOTYPES directive goes below the MODULE directive. If you put it above the MODULE directive, it will be passed through to the C compiler, and cause compilation errors.
h2xs predates prototypes in Perl, and does not emit a PROTOTYPES directive for you. xsubpp complains if you forget to add one. I generally enable prototypes, unless I have some reason not to.
After the PROTOTYPES directive come XS routines.
An XS routine can contain nearly arbitrary code. However, in simple cases, all it needs to do is describe the signature of the target routine. To do this, it specifies
- the return type of the target routine
- the name of the target routine
- the names of its parameters
- the types of its parameters
Here is an XS routine that describes our target routine
    double
    hypotenuse(x, y)
            double  x
            double  y
The newlines are significant; the indentation is not. However, this is the style that h2xs uses, and I usually follow it.
The name of the XS routine is hypotenuse. xsubpp derives the name of the Perl routine from the name of the XS routine. In this example, xsubpp also determines the name of the target routine from the name of the XS routine. Later on, we'll see examples where the target routine has a different name than the XS routine.
Edit Makefile.PL and add the name/value pair
    'OBJECT' => 'Geometry.o hypotenuse.o'
to the arguments of WriteMakefile. Then do
    .../development/Geometry>cp ../hypotenuse.c .
    .../development/Geometry>cp ../hypotenuse.h .
    .../development/Geometry>perl Makefile.PL
    .../development/Geometry>make
Makefile.PL writes a makefile. The makefile runs
- xsubpp to translate Geometry.xs to Geometry.c
- the C compiler to compile Geometry.c to Geometry.o
- the linker to link Geometry.o into a link library
Geometry.c
Here is Geometry.c, edited a bit for clarity.
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"
    #include "hypotenuse.h"

    XS(XS_Geometry_hypotenuse)
    {
        dXSARGS;
        if (items != 2)
            croak("Usage: Geometry::hypotenuse(x, y)");
        {
            double  x = (double)SvNV(ST(0));
            double  y = (double)SvNV(ST(1));
            double  RETVAL;

            RETVAL = hypotenuse(x, y);
            ST(0) = sv_newmortal();
            sv_setnv(ST(0), (double)RETVAL);
        }
        XSRETURN(1);
    }

    XS(boot_Geometry)
    {
        dXSARGS;
        char* file = __FILE__;

        XS_VERSION_BOOTCHECK ;

        newXSproto("Geometry::hypotenuse", XS_Geometry_hypotenuse, file, "$$");
        XSRETURN_YES;
    }
Geometry.c is an ordinary C source file, suitable for compilation. It looks strange because it is written with XS macros. Let's decode the macros and see how it works.
The #includes are passed through unchanged from the .xs file. The C compiler will need them.
XS_Geometry_hypotenuse is the actual xsub that is generated by xsubpp. The xsub name is pasted together from
- the prefix XS
- the package name given in the PACKAGE directive
- the name of the XS routine
The XS() macro declares XS_Geometry_hypotenuse with the return type and parameters that Perl expects an xsub to have. These are not the parameters to hypotenuse(); we will get those from the Perl stack.
dXSARGS is another XS macro; it declares some local variables that the xsub needs.
One of the locals declared by dXSARGS is items; this gives the number of arguments that were passed to the xsub on the Perl stack. As declared, hypotenuse() requires 2 arguments; the xsub emits a usage message if hypotenuse() is called from Perl with the wrong number of arguments.
Next comes the code that extracts arguments from the Perl stack
    double  x = (double)SvNV(ST(0));
    double  y = (double)SvNV(ST(1));
ST() is an XS macro that accesses an argument on the Perl stack: ST(0) is the first argument, ST(1) is the second, and so on.
Perl passes parameters by reference, so the things on the stack are pointers to the underlying scalars. SvNV is an entry point in the Perl C API. It takes a pointer to a scalar and returns the value of that scalar as a number. xsubpp adds a (double) typecast to quiet the C compiler, and assigns that value to a local variable: x for ST(0) and y for ST(1).
xsubpp also declares a local variable to hold the return value of the subroutine.
    double  RETVAL;
This variable is always named RETVAL, but it is declared with whatever type the subroutine returns.
With x, y, and RETVAL set up, xsubpp can generate a call to the target routine. xsubpp emits the name of the XS routine as the name of the target routine.
    RETVAL = hypotenuse(x, y);
There is no magic here. This is a perfectly ordinary C subroutine call. Don't get used to it.
The next two lines return the value to Perl.
    ST(0) = sv_newmortal();
    sv_setnv(ST(0), (double)RETVAL);
Return values go on the Perl stack, starting at ST(0). sv_newmortal and sv_setnv are entry points in the Perl C API. sv_newmortal creates a new scalar value. Like any scalar, it has an initial value of undef. sv_setnv sets the value of the scalar to the value that was returned from hypotenuse.
Finally, the XSRETURN(1) macro tells the interpreter how many values we are returning on the Perl stack: in this case, one.
boot_Geometry is the subroutine that DynaLoader calls to install the xsubs in the Geometry module. The subroutine name is pasted together from
- the prefix boot
- the module name given in the MODULE directive
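The name pasting for both the xsub and the boot routine can be sketched in plain C. This is an illustration only, not the code inside xsubpp (which is a Perl program); the helper names mangle_package, xsub_name, and boot_name are hypothetical. The sketch assumes that each colon in a package name becomes an underscore, so that a routine named score in Align::NW would yield XS_Align__NW_score and boot_Align__NW.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy a Perl package name into out, replacing each ':' with '_',
 * so that "Align::NW" becomes "Align__NW". Truncates to fit size.
 */
void mangle_package(const char *package, char *out, size_t size)
{
    size_t i;

    if (size == 0)
        return;
    for (i = 0; package[i] != '\0' && i + 1 < size; i++)
        out[i] = (package[i] == ':') ? '_' : package[i];
    out[i] = '\0';
}

/* xsub name: "XS_" + mangled PACKAGE + "_" + name of the XS routine. */
void xsub_name(const char *package, const char *routine,
               char *out, size_t size)
{
    char id[128];

    mangle_package(package, id, sizeof id);
    snprintf(out, size, "XS_%s_%s", id, routine);
}

/* boot routine name: "boot_" + mangled MODULE. */
void boot_name(const char *module, char *out, size_t size)
{
    char id[128];

    mangle_package(module, id, sizeof id);
    snprintf(out, size, "boot_%s", id);
}
```

Calling xsub_name("Geometry", "hypotenuse", buf, sizeof buf) fills buf with XS_Geometry_hypotenuse, which is the name that appears in Geometry.c.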
To install an xsub, boot_Geometry calls
    newXSproto("Geometry::hypotenuse", XS_Geometry_hypotenuse, file, "$$");
newXSproto is an entry point in the Perl C API. Its arguments are
- the name of the Perl routine
- the address of the xsub
- the name of the source file
- the prototype of the Perl routine
newXSproto installs the C subroutine XS_Geometry_hypotenuse as an xsub for the Perl routine Geometry::hypotenuse. It supplies a prototype, because we specified PROTOTYPES: ENABLE in the .xs file. The source file name is provided so that Perl can report it in error messages.
The name of the Perl routine is constructed from
- the package name given in the PACKAGE directive
- the name of the XS routine
xsubpp only generates one boot routine per module. The boot routine makes one call to newXSproto for each xsub in the module.
Edit Geometry/test.pl and add the line
    print Geometry::hypotenuse(3, 4), "\n";
at the end. Then do
    .../development/Geometry>make test
The output should be
    1..1
    ok 1
    5
r2p
hypotenuse() has a simple signature; given that signature, xsubpp can generate code to call it. In more complex cases, we have to write some of the code ourselves. XS provides directives that allow us to supply C code directly, instead of relying on xsubpp. In the examples below, we'll use these to take over progressively more control from xsubpp.
Here is another target routine, in a file called r2p.c
    #include <math.h>

    double r2p(double x, double y, double *theta)
    {
        *theta = atan2(y, x);
        return sqrt(x*x + y*y);
    }
and its prototype in r2p.h
    double r2p(double x, double y, double *theta);
r2p converts rectangular to polar coordinates, so it has to return 2 values: a magnitude and an angle. The magnitude is the return value of the subroutine; the angle is returned in a third parameter, passed by address. If we write the XS routine as
    double
    r2p(x, y, theta)
            double  x
            double  y
            double  theta
then xsubpp will treat theta as an input parameter. It will initialize it from the Perl stack, and won't return a value in it. Instead, we write the XS routine as
    double
    r2p(x, y, theta)
            double  x
            double  y
            double  theta = NO_INIT
        CODE:
            RETVAL = r2p(x, y, &theta);
        OUTPUT:
            RETVAL
            theta
The NO_INIT directive suppresses initialization from the Perl stack.
The CODE directive tells xsubpp that we will supply C code to call the target routine. xsubpp still declares RETVAL for us, but we have to assign the return value to it. The call to r2p is
    RETVAL = r2p(x, y, &theta);
This is not an XS directive; it is a C statement, and will be passed through to the C compiler. Therefore, it ends with a semicolon.
The OUTPUT directive lists values that are to be copied back to Perl scalars. The order in which we list them doesn't matter; xsubpp knows where each value goes. We need to return both RETVAL and theta.
Here is the xsub that xsubpp generates for this XS routine.
    XS(XS_Geometry_r2p)
    {
        dXSARGS;
        if (items != 3)
            croak("Usage: Geometry::r2p(x, y, theta)");
        {
            double  x = (double)SvNV(ST(0));
            double  y = (double)SvNV(ST(1));
            double  theta;
            double  RETVAL;

            RETVAL = r2p(x, y, &theta);
            sv_setnv(ST(2), (double)theta);
            SvSETMAGIC(ST(2));
            ST(0) = sv_newmortal();
            sv_setnv(ST(0), (double)RETVAL);
        }
        XSRETURN(1);
    }
It looks very much like the xsub for hypotenuse. xsubpp declares theta for us, so that we can pass its address to r2p. It also generates these lines to return theta to Perl
    sv_setnv(ST(2), (double)theta);
    SvSETMAGIC(ST(2));
It knows to assign theta to ST(2), because we declared theta as the 3rd parameter to r2p. SvSETMAGIC ensures that the scalar at ST(2) will be created, if necessary. It must be created, for example, if it is a non-existent array or hash value.
Now we add r2p to the Geometry module. Copy r2p.c and r2p.h into the module directory and add r2p.o to the OBJECT list in Makefile.PL. Add an
    #include "r2p.h"
line and the XS code shown above to Geometry.xs. Add
    my $theta;
    my $r = Geometry::r2p(3, 4, $theta);
    print "$r, $theta\n";
to test.pl. Now do
    .../development/Geometry>perl Makefile.PL
    .../development/Geometry>make
    .../development/Geometry>make test
The output should be
    1..1
    ok 1
    5
    5, 0.927295218001612
r2p_list
A Perl programmer would more naturally call r2p as
    ($r, $theta) = r2p_list($x, $y);
We can obtain this calling sequence with this XS routine
    void
    r2p_list(x, y)
            double  x
            double  y
        PREINIT:
            double  r;
            double  theta;
        PPCODE:
            r = r2p(x, y, &theta);
            EXTEND(SP, 2);
            PUSHs(sv_2mortal(newSVnv(r)));
            PUSHs(sv_2mortal(newSVnv(theta)));
There are a few differences between this XS routine and the one that we wrote above for r2p.
The name of the XS routine doesn't match the name of the target routine. xsubpp doesn't need the name of the target routine, because we are supplying the code to call the target routine. xsubpp still uses the name of the XS routine to derive the name of the Perl routine.
The return type of r2p_list is void. This doesn't mean that r2p_list doesn't return anything. Rather, it tells xsubpp that we will supply the code to return values to Perl. Therefore, xsubpp doesn't declare RETVAL for us.
The PREINIT directive gives us a place to declare C variables. Without it, xsubpp might emit executable C code before our variable declarations, which is a syntax error in C. We declare two C variables: r and theta.
The PPCODE directive is similar to the CODE directive. It tells xsubpp that we will supply both the C code to call r2p and the PP code to return values to Perl. PP is generally read as push/pop: PP code manipulates the Perl stack directly, in the same style as the interpreter's own internal routines.
The C code to call r2p is
    r = r2p(x, y, &theta);
and the PP code to return values to Perl is
    EXTEND(SP, 2);
    PUSHs(sv_2mortal(newSVnv(r)));
    PUSHs(sv_2mortal(newSVnv(theta)));
The EXTEND macro allocates space on the stack for 2 scalars, and the PUSHs macros push the scalars onto the stack. The PP macros are passed through to the C compiler, so they end with semicolons, like any other line of C code.
The xsub that xsubpp generates is
    XS(XS_Geometry_r2p_list)
    {
        dXSARGS;
        if (items != 2)
            croak("Usage: Geometry::r2p_list(x, y)");
        SP -= items;
        {
            double  x = (double)SvNV(ST(0));
            double  y = (double)SvNV(ST(1));
            double  r;
            double  theta;

            r = r2p(x, y, &theta);
            EXTEND(SP, 2);
            PUSHs(sv_2mortal(newSVnv(r)));
            PUSHs(sv_2mortal(newSVnv(theta)));
            PUTBACK;
            return;
        }
    }
xsubpp emits code to extract our arguments from the Perl stack, as before. It passes our C variable declarations and our subroutine call through unchanged. It also passes our PP code through.
The biggest difference between XS_Geometry_r2p and XS_Geometry_r2p_list is the stack management. XS_Geometry_r2p uses an XSRETURN(1) macro call to return one value on the stack. XS_Geometry_r2p_list lowers SP by the number of input parameters, and then issues a PUTBACK macro before returning.
I don't actually understand what any of the stack macros do. I wrote the glue routines shown above by following the examples in perlxs. The macros are defined in /usr/local/lib/perl5/version/architecture/CORE/*.h, but when I tried reading them, I quickly got lost in a maze of #defines, #ifdefs, typedefs, and internal Perl data structures.
Lacking a principled understanding of Perl stack management, you can't actually write PP code: all you can do is follow working examples, as I have. The examples in perlxs appear to be adequate for most xsubs.
r2p_open
With a CODE or a PPCODE directive in our XS code, we can put any C code in the XS routine.
In r2p_open, we dispense with the r2p routine, and compute r and theta in open code.
    void
    r2p_open(x, y)
            double  x
            double  y
        PREINIT:
            double  r;
            double  theta;
        PPCODE:
            r     = sqrt(x*x + y*y);
            theta = atan2(y, x);
            EXTEND(SP, 2);
            PUSHs(sv_2mortal(newSVnv(r)));
            PUSHs(sv_2mortal(newSVnv(theta)));
Here is the xsub that xsubpp emits. It looks just like the xsub for r2p_list, except for the lines that compute r and theta.
    XS(XS_Geometry_r2p_open)
    {
        dXSARGS;
        if (items != 2)
            croak("Usage: Geometry::r2p_open(x, y)");
        SP -= items;
        {
            double  x = (double)SvNV(ST(0));
            double  y = (double)SvNV(ST(1));
            double  r;
            double  theta;

            r     = sqrt(x*x + y*y);
            theta = atan2(y, x);
            EXTEND(SP, 2);
            PUSHs(sv_2mortal(newSVnv(r)));
            PUSHs(sv_2mortal(newSVnv(theta)));
            PUTBACK;
            return;
        }
    }
Add these lines to Geometry/test.pl to test our new xsubs.
    ($r, $theta) = Geometry::r2p_list(3, 4);
    print "$r, $theta\n";
    ($r, $theta) = Geometry::r2p_open(3, 4);
    print "$r, $theta\n";
When we run
    .../development/Geometry>make test
we get
    1..1
    ok 1
    5
    5, 0.927295218001612
    5, 0.927295218001612
    5, 0.927295218001612
So far, we have looked at the code that xsubpp emits. Now, we're going to look at how xsubpp converts data between Perl and C representations.
Here's the problem. When the Perl interpreter calls a subroutine, it pushes a list of scalars onto the Perl stack. On input, an xsub has to get those scalars off the stack and convert them to C data. On output, the xsub has to convert C data to Perl scalars and put the scalars back on the stack. xsubpp must emit the C code to do these conversions.
Conversion between Perl and C data types is handled with macros and routines in the Perl C API, but the necessary operations vary, depending on the C data types and the direction of the conversion. Consider:
C data type | input | output |
---|---|---|
int n | n = (int)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
double x | x = (double)SvNV(ST(0)) | sv_setnv(ST(0), (double)x) |
char *psz | psz = (char *)SvPV(ST(0),na) | sv_setpv((SV*)ST(0), psz) |
We could imagine a big switch statement inside xsubpp to select the right code fragment for each C data type, but this would be clumsy and inflexible. It would be better to put the code fragments in a table, like the one shown above.
If we start writing such a table, we quickly discover that the mapping between Perl and C datatypes is not one-to-one. As a strongly typed language, C distinguishes more data types than Perl does. For example, these seven C integer types are all converted with essentially the same code fragment, the only variation being the typecast used to quiet the C compiler.
C data type | input | output |
---|---|---|
int n | n = (int)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
unsigned n | n = (unsigned)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
unsigned int n | n = (unsigned int)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
long n | n = (long)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
unsigned long n | n = (unsigned long)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
short n | n = (short)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
unsigned short n | n = (unsigned short)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
In view of this, xsubpp uses a two-level mapping. First, it maps C data types to XS types, like this
C data type | XS type |
---|---|
int | T_IV |
unsigned | T_IV |
char | T_CHAR |
char * | T_PV |
Then it maps the XS types to code fragments, in two tables: one for input
XS type | input code fragment |
---|---|
T_IV | $var = ($ntype)SvIV($arg) |
T_CHAR | $var = (char)*SvPV($arg,na) |
T_PV | $var = ($ntype)SvPV($arg,na) |
and one for output
XS type | output code fragment |
---|---|
T_IV | sv_setiv($arg, (IV)$var); |
T_CHAR | sv_setpvn($arg, (char *)&$var, 1); |
T_PV | sv_setpv((SV*)$arg, $var); |
These tables constitute the typemap.
The XS types are meaningful only to xsubpp, and appear only in the typemap. They do not appear in Perl code, XS code, or C code.
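$var, $ntype, and $arg
The code fragments in the typemap contain three Perl variables:
- $var is the name of the C variable
- $ntype is the C type of $var
- $arg is the expression for the corresponding entry on the Perl stack
xsubpp is a Perl program. When it needs to convert an argument from Perl to C, it sets $var, $ntype, and $arg, obtains the appropriate code fragment from the typemap, and evals the fragment to replace the Perl variables with their values.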
For example, consider this XS routine
    int
    max(a, b)
            int     a
            int     b
To generate code to convert the first parameter from Perl to C, xsubpp sets the Perl variables like this
variable | value |
---|---|
$var | a |
$ntype | int |
$arg | ST(0) |
Then, it evals the fragment
    $var = ($ntype)SvIV($arg)
to yield the C code
    a = (int)SvIV(ST(0))
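The two-level lookup and the substitution step can be mimicked in a few lines of plain C. This is only a sketch of the idea: xsubpp itself is a Perl program and uses eval rather than string scanning. The table contents are copied from the tables above; the names c_to_xs, xs_input, expand, and input_code are hypothetical.

```c
#include <string.h>

/* First level: C data type -> XS type (entries from the text). */
struct type_entry { const char *ctype; const char *xstype; };
static const struct type_entry c_to_xs[] = {
    { "int",      "T_IV"   },
    { "unsigned", "T_IV"   },
    { "char",     "T_CHAR" },
    { "char *",   "T_PV"   },
};

/* Second level: XS type -> input code fragment (entries from the text). */
struct frag_entry { const char *xstype; const char *fragment; };
static const struct frag_entry xs_input[] = {
    { "T_IV",   "$var = ($ntype)SvIV($arg)"    },
    { "T_CHAR", "$var = (char)*SvPV($arg,na)"  },
    { "T_PV",   "$var = ($ntype)SvPV($arg,na)" },
};

/* Append s to out (current length *len); -1 if it would not fit. */
static int append(char *out, size_t size, size_t *len, const char *s)
{
    size_t n = strlen(s);

    if (*len + n + 1 > size)
        return -1;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
    return 0;
}

/* Replace $var, $ntype and $arg in tmpl; 0 on success, -1 on overflow. */
int expand(const char *tmpl, const char *var, const char *ntype,
           const char *arg, char *out, size_t size)
{
    size_t len = 0;

    if (size == 0)
        return -1;
    out[0] = '\0';
    while (*tmpl != '\0') {
        const char *sub = NULL;
        size_t skip = 1;
        char one[2];

        if (strncmp(tmpl, "$ntype", 6) == 0)    { sub = ntype; skip = 6; }
        else if (strncmp(tmpl, "$var", 4) == 0) { sub = var;   skip = 4; }
        else if (strncmp(tmpl, "$arg", 4) == 0) { sub = arg;   skip = 4; }
        else { one[0] = *tmpl; one[1] = '\0'; sub = one; }

        if (append(out, size, &len, sub) != 0)
            return -1;
        tmpl += skip;
    }
    return 0;
}

/* Generate the input conversion for a parameter; -1 if the type is unmapped. */
int input_code(const char *ctype, const char *var, const char *arg,
               char *out, size_t size)
{
    size_t i, j;

    for (i = 0; i < sizeof c_to_xs / sizeof c_to_xs[0]; i++) {
        if (strcmp(c_to_xs[i].ctype, ctype) != 0)
            continue;
        for (j = 0; j < sizeof xs_input / sizeof xs_input[0]; j++)
            if (strcmp(xs_input[j].xstype, c_to_xs[i].xstype) == 0)
                return expand(xs_input[j].fragment, var, ctype, arg, out, size);
    }
    return -1;
}
```

Calling input_code("int", "a", "ST(0)", buf, sizeof buf) leaves the text a = (int)SvIV(ST(0)) in buf, matching the conversion shown above. Because int and unsigned share the T_IV entry, only the typecast differs between them.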
It is important to understand how these variables work, because sometimes you have to arrange for them to have the right values in order to make xsubpp do what you want. We'll see an example of this next month when we write the XS code for Align::NW.
    # A typemap file
    TYPEMAP
    int     T_IV
    SV *    T_SV

    INPUT
    T_SV
            $var = $arg
    T_IV
            $var = ($ntype)SvIV($arg)

    OUTPUT
    T_SV
            $arg = $var;
    T_IV
            sv_setiv($arg, (IV)$var);
The first TYPEMAP header may be omitted.
Files containing typemaps are conventionally named typemap. xsubpp can read and aggregate multiple typemap files to construct the typemap; entries in later files override entries in earlier files.
Perl supplies a default typemap in
    /usr/local/lib/perl5/version/ExtUtils/typemap
XS modules may provide a local typemap file in the module directory. If the module declares structs or other C data types, it can map them to XS types in a TYPEMAP section. Local typemaps rarely need INPUT or OUTPUT sections; the default typemap almost always contains appropriate code fragments.
Next month, we'll use these tools to complete the XS implementation of Align::NW.
The mapping is similar, but not identical, to that used in the installation directory. NW.pm is developed in
    .../development/Align/NW/NW.pm
but installed in
    /usr/local/lib/perl5/site_perl/version/Align/NW.pm
The extra /NW/ in the development area is necessary so that we can have, for example,
    .../development/Align/NW/Makefile.PL
    .../development/Align/SW/Makefile.PL
without conflict.
The h2xs POD consistently uses the term extension for module.
After running h2xs, you may still find a require AutoLoader; statement in your .pm file. You can delete it if you like.
sv_newmortal
t/*.t files. See Module Mechanics for details.
SvSETMAGIC
typename keyword in The C++ Programming Language.