Do not meddle in the affairs of wizards, for you are crunchy and good with ketchup.
This article is in five parts
| November | Introduction | motivation, definitions, examples |
| December | Architecture | the Perl interpreter, calling conventions, data representation |
| January | Tools | h2xs, xsubpp, DynaLoader |
| February | Modules | Math::Ackermann, Set::Bit |
| March | Align::NW | Needleman-Wunsch global optimal sequence alignment |
Two months ago, we presented a problem that could benefit from an XS implementation. Last month, we discussed the architecture of XS. This month, we discuss the tools that are used to write XS.
The two tools that we use to write XS are h2xs and xsubpp. Like any tools, these are easier to use if you understand how they are intended to be used. Chisels cut with the grain, not across it. To understand how the XS tools are intended to be used, we need some historical background.
Perl is often used for tasks that were formerly done with shell scripts, C programs, and assorted Unix tools, such as find(1), awk(1), sed(1), and sort(1). To help programmers port existing software to Perl, the Perl distribution includes some translation utilities, such as find2perl, a2p (awk to Perl), and s2p (sed to Perl). The output of these utilities may require some editing, but they generate reasonably complete and correct translations.
There is no c2p. Such a program would be difficult to write; besides, a direct translation from C to Perl is rarely desirable. More commonly, we want to call existing C code from new Perl programs.
To generate interfaces to C code, Perl provides h2xs. h2xs is a utility that reads a .h file and generates an outline for an XS interface to the C code. This includes a Makefile.PL file, a .xs file, and a .pm file.
However, the output of h2xs is not a complete, or even a nearly complete, XS interface. It is merely a beginning. It is a valuable beginning: it includes some boilerplate that is difficult to generate by hand. But it is only a beginning.
Neither is the output of h2xs necessarily correct. Interfacing Perl to C is a hard problem. h2xs makes guesses about how to do it; sometimes it guesses wrong.
If you run h2xs assuming that the results will be complete and correct—assuming that you will find structure and coherence in its output—then you are going to be very confused, and very frustrated. To move forward from the outline that h2xs generates, you must accept it as strictly provisional.
Similar issues surround the inputs to h2xs. h2xs takes many command line options. However, these do not constitute a complete and coherent system for making h2xs do what you need. Rather, they have accumulated over time, each one added to meet a particular need, in a particular context. Many of these options are useful, but there is not necessarily any combination of them that will make h2xs do the Right Thing for you. You have to take what you can get and go forward from there.
xsubpp is the program that translates XS code to C code. XS is sometimes referred to as a language, but it is better thought of as a collection of macros; xsubpp is the macro expander. Again, the XS macros do not constitute a complete and coherent language for interfacing Perl to C. They have accumulated over time, each one added to meet a particular need.
Writing XS doesn't require an understanding of the deep structure of the macros—there isn't any. Rather, it requires searching through perlxs to find a macro that does what you need, and then using that macro.
We run h2xs from the command line.
Working with h2xs is a process of successive refinement. You should create a development directory for this purpose. In the examples below, we'll refer to the development directory as
.../development/
When you run h2xs, it creates a new directory within the development directory to hold the module sources; we'll call this the module directory. The module directory is created on a path that maps the module name. For example, the module directory for Align::NW is
.../development/Align/NW/
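The mapping from module name to module directory can be sketched in a few lines of C. This is only an illustration of the naming rule described above, not code taken from h2xs; the function name module_dir and its buffer interface are assumptions made for the example.

```c
#include <string.h>

/* Hypothetical helper (not part of h2xs): copy a Perl module name into
 * `out`, replacing each "::" with "/" and appending a trailing "/".
 * Returns 0 on success, or -1 if the result does not fit in `outsize`. */
int module_dir(const char *module, char *out, size_t outsize)
{
    size_t n = 0;

    while (*module != '\0') {
        char c = *module;

        if (c == ':' && module[1] == ':') {
            c = '/';            /* "::" becomes a path separator */
            module += 2;
        } else {
            module += 1;
        }
        if (n + 2 >= outsize)   /* keep room for the final '/' and NUL */
            return -1;
        out[n++] = c;
    }
    if (n + 2 > outsize)
        return -1;
    out[n++] = '/';
    out[n] = '\0';
    return 0;
}
```

Given "Align::NW", this sketch produces "Align/NW/", the path shown above relative to the development directory.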
h2xs was originally written to generate XS interfaces for existing C libraries. At its simplest, you specify the header file for a library, and it creates and populates a module directory. If the header file is /usr/include/rpcsvc/rusers.h, we can do
.../development>h2xs rpcsvc/rusers
Writing Rusers/Rusers.pm
Writing Rusers/Rusers.xs
Writing Rusers/Makefile.PL
Writing Rusers/test.pl
Writing Rusers/Changes
Writing Rusers/MANIFEST
h2xs searches for the header file in the current directory and on the standard include paths, and complains if it doesn't find it.
.../development>h2xs foo
Can't find foo.h
h2xs names the module and the module directory after the header file. It upcases the first letter of the name, in accordance with the Perl convention that module names have leading capitals.
If you don't like the module name that h2xs generates, you can specify a different one with the -n flag.
.../development>h2xs -n RPC::Rusers rpcsvc/rusers
Writing RPC/Rusers/Rusers.pm
Writing RPC/Rusers/Rusers.xs
Writing RPC/Rusers/Makefile.PL
Writing RPC/Rusers/test.pl
Writing RPC/Rusers/Changes
Writing RPC/Rusers/MANIFEST
The -n flag controls both the name of the module directory and the name of the Perl module; in this case, the Perl module will be RPC::Rusers. The -n flag doesn't affect the search for the header file: h2xs still finds the header in /usr/include/rpcsvc/rusers.h.
Now consider the Align::NW module. Align::NW isn't an interface to an existing library, and its headers aren't in /usr/include/. It is a new Perl module that is partly implemented in C.
Because the C code for Align::NW is part of the module, it ought to live in the module directory. The Perl code will be in Align/NW/NW.pm, and it is tempting to name the C sources to match
.../development/Align/NW/NW.pm
.../development/Align/NW/NW.c
.../development/Align/NW/NW.h
However, this won't work. The problem is that h2xs is going to create
.../development/Align/NW/NW.xs
and xsubpp will translate NW.xs into
.../development/Align/NW/NW.c
which collides with our NW.c file.
Another possibility is to integrate the C code into the .xs file
.../development/Align/NW/NW.pm
.../development/Align/NW/NW.xs    # contains our C code
.../development/Align/NW/NW.h
This works, because anything in a .xs file that isn't an XS macro is passed through unchanged by xsubpp to the .c file. Some XS modules implement large amounts of C code directly in the .xs file; ultimately, the distinction between XS code and C code becomes arbitrary.
However, I prefer to keep the bulk of my C code in .c files, and reserve the .xs file for glue routines. Reasons for this include
I can link the .c files together with a main.c and test them in a stand-alone C program.
I want to keep as little code in the .xs file as possible.
The C code in Align::NW implements a single Perl method, named score. We'll name our C sources score.c and score.h. Then we can create and populate the module directory like this
.../development>ls
score.c score.h
.../development>h2xs -n Align::NW score
Writing Align/NW/NW.pm
Writing Align/NW/NW.xs
Writing Align/NW/Makefile.PL
Writing Align/NW/test.pl
Writing Align/NW/Changes
Writing Align/NW/MANIFEST
.../development>cp score.c score.h Align/NW/
C libraries often have #define constants that appear in their interfaces. h2xs parses these constants and makes them available to the Perl module as methods. For example, if score.h contained the lines
#define FOO 17
#define BAR 42
then the values 17 and 42 would be available to Perl code as the return values of Align::NW::FOO() and Align::NW::BAR(), respectively.
h2xs doesn't do this by creating FOO() and BAR() methods. Instead, it creates Align::NW::AUTOLOAD() in Align/NW/NW.pm, and a C routine named constant() in Align/NW/NW.xs.
Calls to FOO() and BAR() are handled by Align::NW::AUTOLOAD(). AUTOLOAD() calls constant(), and constant() returns the value #define'd in the .h file.
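To make the division of labor concrete, here is a minimal C sketch of the kind of name-to-value lookup that a constant() routine performs. It is an illustration only: the name lookup_constant, its signature, and the use of errno to signal an unknown name are assumptions for this example, not a copy of what h2xs emits.

```c
#include <errno.h>
#include <string.h>

#define FOO 17
#define BAR 42

/* Illustrative stand-in for a generated constant() routine: map the name
 * of a constant to its #define'd value. For an unknown name, set errno to
 * EINVAL and return 0 so that the caller can report the failure. */
double lookup_constant(const char *name)
{
    errno = 0;
    if (strcmp(name, "FOO") == 0)
        return FOO;
    if (strcmp(name, "BAR") == 0)
        return BAR;
    errno = EINVAL;
    return 0;
}
```

In this picture, AUTOLOAD() is the Perl-side caller: it passes the name of the missing method to the lookup and returns the number it gets back.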
Align::NW::AUTOLOAD() enforces a Perl function prototype on constant methods. To satisfy this prototype, you have to predeclare any constant methods that you use, like this
sub FOO ();
sub BAR ();
If you don't need constant methods, you can run h2xs with the -c switch. This suppresses the AUTOLOAD routine from the .pm file and the constant routine from the .xs file.
If you don't need the AutoLoader for anything else, you can run h2xs with the -A switch. -A implies -c, and additionally suppresses inheritance from AutoLoader.
h2xs can parse function prototypes and generate glue routines based on them. It doesn't always guess right about how to convert parameters, so we may have to edit the glue by hand. Even so, this can save us some typing. To automatically generate glue routines, do
.../development>h2xs -n Align::NW -A -O -x -F '-I ../..' score.h
The -n and -A flags are as before. If you've previously run h2xs, you'll need the -O flag to force it to overwrite the existing Align/NW/* files. The -x flag tells h2xs to generate glue routines based on the function prototypes in score.h.
The -x flag uses the C::Scan module to locate header files. You'll need to have this module installed on your system in order to use -x.
We run h2xs from the development directory, but C::Scan is cd'd to the module directory when it searches for header files. The -F flag specifies additional switches for C::Scan to pass to the C preprocessor. We pass a -I ../.. switch to tell the preprocessor to search for headers two levels up, in the development directory. This allows it to find score.h.
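Before we do more with h2xs, we're going to look inside an .xs file to see what is there.
We write XS routines in the .xs file. xsubpp translates XS routines to xsubs for us. To see how this works, we'll also look at the C code that xsubpp emits when it generates an xsub.
Here is a target routine, in a file called hypotenuse.c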
#include <math.h>

double hypotenuse(double x, double y)
{
    return sqrt(x*x + y*y);
}
and its prototype in hypotenuse.h
double hypotenuse(double x, double y);
The Perl routine is Geometry::hypotenuse(). We want calls to Geometry::hypotenuse() to invoke the target routine.
We run h2xs
.../development>h2xs -n Geometry -A hypotenuse.h
and it generates Geometry/Geometry.xs as
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

#include <hypotenuse.h>

MODULE = Geometry    PACKAGE = Geometry
I've omitted the -x flag. This is an instance where it guesses wrong about parameter conversion. It is easy enough to fix, but you have to understand the typemap, which we haven't discussed yet.
Let's look at Geometry.xs in detail.
The first three #includes give our XS code access to the Perl C API. Through them you can find all the entry points and data types mentioned in perlguts.
The last #include gives our XS code access to our own C header file: hypotenuse.h. h2xs searches for header files in the current working directory and on the standard include path; however, the #include directive that it emits uses angle brackets instead of quotes. Angle brackets instruct the C compiler to only search for header files on the standard include path. We're going to put hypotenuse.h in the module directory, which is not on the standard include path, so we need to edit the #include directive to use quotes.
#include "hypotenuse.h"
Then the C compiler will find hypotenuse.h in the module directory.
MODULE and PACKAGE are XS directives. They specify the module and package for our xsubs. This is easy to understand if we remember the underlying definitions: a module is a file, and a package is a namespace.
We write xsubs in XS; xsubpp translates the XS code to straight C; the C compiler compiles the C code into link libraries; the makefile installs those libraries, and the DynaLoader loads those libraries at run time. In order to load a library, the DynaLoader needs to know two things: which file contains the library, and which namespace to install its xsubs in.
The MODULE directive tells it the file, and the PACKAGE directive tells it the namespace.
The PACKAGE directive names a Perl package, like Geometry or Align::NW. xsubpp then generates code to install xsubs in that package.
The MODULE directive doesn't name an actual file: it names a Perl package, just like the PACKAGE directive. xsubpp maps that package name into a file name, like Geometry.so or Align/NW.so. The makefile installs that file on an appropriate path in the Perl library, and the DynaLoader finds it there.
An .xs file may contain multiple MODULE and PACKAGE directives. MODULE and PACKAGE directives should always appear together, as shown above. All MODULE directives in an .xs file should name the same module. PACKAGE directives can name different packages as necessary to place different xsubs into different Perl packages; this is quite analogous to the use of repeated package statements in ordinary Perl code.
Now we'll start adding things to Geometry.xs.
The PROTOTYPES directive tells xsubpp whether or not to install our xsubs with prototypes. Write
PROTOTYPES: ENABLE
or
PROTOTYPES: DISABLE
to enable or disable prototypes.
The PROTOTYPES directive goes below the MODULE directive. If you put it above the MODULE directive, it will be passed through to the C compiler, and cause compilation errors.
h2xs predates prototypes in Perl, and does not emit a PROTOTYPES directive for you. xsubpp complains if you forget to add one. I generally enable prototypes, unless I have some reason not to.
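After the PROTOTYPES directive come XS routines.
An XS routine can contain nearly arbitrary code. However, in simple cases, all it needs to do is describe the signature of the target routine. To do this, it specifies the return type, the name of the routine, the names of the parameters, and the type of each parameter.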
Here is an XS routine that describes our target routine
double
hypotenuse(x, y)
        double  x
        double  y
The newlines are significant; the indentation is not. However, this is the style that h2xs uses, and I usually follow it.
The name of the XS routine is hypotenuse. xsubpp derives the name of the Perl routine from the name of the XS routine. In this example, xsubpp also determines the name of the target routine from the name of the XS routine. Later on, we'll see examples where the target routine has a different name than the XS routine.
To build the module, edit Makefile.PL and add the name/value pair
'OBJECT' => 'Geometry.o hypotenuse.o'
to the arguments of WriteMakefile. Then do
.../development/Geometry>cp ../hypotenuse.c .
.../development/Geometry>cp ../hypotenuse.h .
.../development/Geometry>perl Makefile.PL
.../development/Geometry>make
Makefile.PL writes a makefile. The makefile runs xsubpp to translate Geometry.xs to Geometry.c, the C compiler to compile Geometry.c to Geometry.o, and the linker to link Geometry.o into a link library.
Here is Geometry.c, edited a bit for clarity.
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "hypotenuse.h"
XS(XS_Geometry_hypotenuse)
{
    dXSARGS;
    if (items != 2)
        croak("Usage: Geometry::hypotenuse(x, y)");
    {
        double x = (double)SvNV(ST(0));
        double y = (double)SvNV(ST(1));
        double RETVAL;

        RETVAL = hypotenuse(x, y);
        ST(0) = sv_newmortal();
        sv_setnv(ST(0), (double)RETVAL);
    }
    XSRETURN(1);
}

XS(boot_Geometry)
{
    dXSARGS;
    char* file = __FILE__;

    XS_VERSION_BOOTCHECK ;
    newXSproto("Geometry::hypotenuse", XS_Geometry_hypotenuse, file, "$$");
    XSRETURN_YES;
}
Geometry.c is an ordinary C source file, suitable for compilation. It looks strange because it is written with XS macros. Let's decode the macros and see how it works.
#includes are passed through unchanged from the .xs file. The C compiler will need them.
XS_Geometry_hypotenuse is the actual xsub that is generated by xsubpp. The xsub name is pasted together from the prefix XS, the package name given in the PACKAGE directive, and the name of the XS routine.
The XS() macro declares XS_Geometry_hypotenuse with the return type and parameters that Perl expects an xsub to have. These are not the parameters to hypotenuse(); we will get those from the Perl stack.
dXSARGS is another XS macro; it declares some local variables that the xsub needs.
One of the locals declared by dXSARGS is items; this gives the number of arguments that were passed to the xsub on the Perl stack. As declared, hypotenuse() requires 2 arguments; the xsub emits a usage message if hypotenuse() is called from Perl with the wrong number of arguments.
Next comes the code that extracts arguments from the Perl stack
double x = (double)SvNV(ST(0));
double y = (double)SvNV(ST(1));
ST() is an XS macro that accesses an argument on the Perl stack: ST(0) is the first argument, ST(1) is the second, and so on.
Perl passes parameters by reference, so the things on the stack are pointers to the underlying scalars. SvNV is an entry point in the Perl C API. It takes a pointer to a scalar and returns the value of that scalar as a number. xsubpp adds a (double) typecast to quiet the C compiler, and assigns that value to a local variable: x for ST(0) and y for ST(1).
xsubpp also declares a local variable to hold the return value of the subroutine.
double RETVAL;
This variable is always named RETVAL, but it is declared with whatever type the subroutine returns.
With x, y, and RETVAL set up, xsubpp can generate a call to the target routine. xsubpp emits the name of the XS routine as the name of the target routine.
RETVAL = hypotenuse(x, y);
There is no magic here. This is a perfectly ordinary C subroutine call. Don't get used to it.
The next two lines return the value to Perl.
ST(0) = sv_newmortal();
sv_setnv(ST(0), (double)RETVAL);
Return values go on the Perl stack, starting at ST(0). sv_newmortal and sv_setnv are entry points in the Perl C API. sv_newmortal creates a new scalar value. Like any scalar, it has an initial value of undef. sv_setnv sets the value of the scalar to the value that was returned from hypotenuse.
Finally, the XSRETURN(1) macro tells the interpreter how many values we are returning on the Perl stack: in this case, one.
boot_Geometry is the subroutine that DynaLoader calls to install the xsubs in the Geometry module. The subroutine name is pasted together from the prefix boot and the module name given in the MODULE directive.
To install an xsub, boot_Geometry calls
newXSproto("Geometry::hypotenuse", XS_Geometry_hypotenuse, file, "$$");
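newXSproto is an entry point in the Perl C API. Its arguments are the name of the Perl routine, the C subroutine that implements it, the name of the source file, and the prototype.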
newXSproto installs the C subroutine XS_Geometry_hypotenuse as an xsub for the Perl routine Geometry::hypotenuse. It supplies a prototype, because we specified PROTOTYPES: ENABLE in the .xs file. The source file name is provided so that Perl can report it in error messages.
The name of the Perl routine is constructed from the package name given in the PACKAGE directive and the name of the XS routine.
xsubpp only generates one boot routine per module. The boot routine makes one call to newXSproto for each xsub in the module.
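The boot routine is essentially a list of registrations. The toy model below shows the idea in plain C; it is not how Perl stores xsubs internally. All of the toy_ names are invented for this sketch, and Geometry::example is a hypothetical second xsub added only to show that the boot routine makes one call per xsub.

```c
#include <stddef.h>
#include <string.h>

typedef int (*toy_xsub)(void);      /* stand-in for an xsub's C signature */

typedef struct {
    const char *name;               /* fully qualified Perl name */
    toy_xsub    code;               /* C routine to call */
    const char *proto;              /* prototype string, such as "$$" */
} toy_entry;

#define TOY_MAX 16
static toy_entry toy_table[TOY_MAX];
static size_t    toy_count = 0;

/* Plays the role of newXSproto: record one name-to-code binding. */
int toy_install(const char *name, toy_xsub code, const char *proto)
{
    if (toy_count == TOY_MAX)
        return -1;
    toy_table[toy_count].name  = name;
    toy_table[toy_count].code  = code;
    toy_table[toy_count].proto = proto;
    toy_count++;
    return 0;
}

static int toy_hypotenuse(void) { return 1; }
static int toy_example(void)    { return 2; }   /* hypothetical second xsub */

/* Plays the role of boot_Geometry: one install call per xsub. */
int toy_boot_geometry(void)
{
    if (toy_install("Geometry::hypotenuse", toy_hypotenuse, "$$") != 0)
        return -1;
    if (toy_install("Geometry::example", toy_example, "$") != 0)
        return -1;
    return 0;
}

/* Look up an installed routine by its Perl name; NULL if it is absent. */
const toy_entry *toy_find(const char *name)
{
    size_t i;

    for (i = 0; i < toy_count; i++)
        if (strcmp(toy_table[i].name, name) == 0)
            return &toy_table[i];
    return NULL;
}
```

Until the boot routine has run, a lookup finds nothing; afterwards, a call by name can be dispatched to the C routine.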
To test the module, edit Geometry/test.pl and add the line
print Geometry::hypotenuse(3, 4), "\n";
at the end. Then do
.../development/Geometry>make test
The output should be
1..1
ok 1
5
hypotenuse() has a simple signature; given that signature, xsubpp can generate code to call it. In more complex cases, we have to write some of the code ourselves. XS provides directives that allow us to supply C code directly, instead of relying on xsubpp. In the examples below, we'll use these to take over progressively more control from xsubpp.
Here is another target routine, in a file called r2p.c
#include <math.h>

double r2p(double x, double y, double *theta)
{
    *theta = atan2(y, x);
    return sqrt(x*x + y*y);
}
and its prototype in r2p.h
double r2p(double x, double y, double *theta);
r2p converts rectangular to polar coordinates, so it has to return 2 values: a magnitude and an angle. The magnitude is the return value of the subroutine; the angle is returned in a third parameter, passed by address. If we write the XS routine as
double
r2p(x, y, theta)
        double  x
        double  y
        double  theta
then xsubpp will treat theta as an input parameter. It will initialize it from the Perl stack, and won't return a value in it. Instead, we write the XS routine as
double
r2p(x, y, theta)
        double  x
        double  y
        double  theta = NO_INIT
    CODE:
        RETVAL = r2p(x, y, &theta);
    OUTPUT:
        RETVAL
        theta
The NO_INIT directive suppresses initialization from the Perl stack.
The CODE directive tells xsubpp that we will supply C code to call the target routine. xsubpp still declares RETVAL for us, but we have to assign the return value to it. The call to r2p is
RETVAL = r2p(x, y, &theta);
This is not an XS directive; it is a C statement, and will be passed through to the C compiler. Therefore, it ends with a semicolon.
The OUTPUT directive lists values that are to be copied back to Perl scalars. The order in which we list them doesn't matter; xsubpp knows where each value goes. We need to return both RETVAL and theta.
Here is the xsub that xsubpp generates for this XS routine.
XS(XS_Geometry_r2p)
{
    dXSARGS;
    if (items != 3)
        croak("Usage: Geometry::r2p(x, y, theta)");
    {
        double x = (double)SvNV(ST(0));
        double y = (double)SvNV(ST(1));
        double theta;
        double RETVAL;

        RETVAL = r2p(x, y, &theta);
        sv_setnv(ST(2), (double)theta);
        SvSETMAGIC(ST(2));
        ST(0) = sv_newmortal();
        sv_setnv(ST(0), (double)RETVAL);
    }
    XSRETURN(1);
}
It looks very much like the xsub for hypotenuse. xsubpp declares theta for us, so that we can pass its address to r2p. It also generates these lines to return theta to Perl
sv_setnv(ST(2), (double)theta);
SvSETMAGIC(ST(2));
It knows to assign theta to ST(2), because we declared theta as the 3rd parameter to r2p. SvSETMAGIC ensures that the scalar at ST(2) will be created, if necessary. It must be created, for example, if it is a non-existent array or hash value.
Let's add r2p to the Geometry module. Copy r2p.c and r2p.h into the module directory and add r2p.o to the OBJECT list in Makefile.PL. Add an
#include "r2p.h"
line and the XS code shown above to Geometry.xs. Add
my $theta;
my $r = Geometry::r2p(3, 4, $theta);
print "$r, $theta\n";
to test.pl. Now do
.../development/Geometry>perl Makefile.PL
.../development/Geometry>make
.../development/Geometry>make test
The output should be
1..1
ok 1
5
5, 0.927295218001612
In Perl, it is more natural to return multiple values as a list. We would like to call r2p as
($r, $theta) = r2p_list($x, $y);
We can obtain this calling sequence with this XS routine
void
r2p_list(x, y)
        double  x
        double  y
    PREINIT:
        double  r;
        double  theta;
    PPCODE:
        r = r2p(x, y, &theta);
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSVnv(r)));
        PUSHs(sv_2mortal(newSVnv(theta)));
There are a few differences between this XS routine and the one that we wrote above for r2p.
The name of the XS routine doesn't match the name of the target routine. xsubpp doesn't need the name of the target routine, because we are supplying the code to call the target routine. xsubpp still uses the name of the XS routine to derive the name of the Perl routine.
The return type of r2p_list is void. This doesn't mean that r2p_list doesn't return anything. Rather, it tells xsubpp that we will supply the code to return values to Perl. Therefore, xsubpp doesn't declare RETVAL for us.
The PREINIT directive gives us a place to declare C variables. Without it, xsubpp might emit executable C code before our variable declarations, which is a syntax error in C. We declare two C variables: r and theta.
The PPCODE directive is similar to the CODE directive. It tells xsubpp that we will supply both the C code to call r2p and the PP code to return values to Perl. PP code is Perl Pseudocode; it is the internal language that the Perl interpreter executes.
The C code to call r2p is
r = r2p(x, y, &theta);
and the PP code to return values to Perl is
EXTEND(SP, 2);
PUSHs(sv_2mortal(newSVnv(r)));
PUSHs(sv_2mortal(newSVnv(theta)));
The EXTEND macro allocates space on the stack for 2 scalars, and the PUSHs macros push the scalars onto the stack. The PP macros are passed through to the C compiler, so they end with semicolons, like any other line of C code.
The xsub that xsubpp generates is
XS(XS_Geometry_r2p_list)
{
    dXSARGS;
    if (items != 2)
        croak("Usage: Geometry::r2p_list(x, y)");
    SP -= items;
    {
        double x = (double)SvNV(ST(0));
        double y = (double)SvNV(ST(1));
        double r;
        double theta;

        r = r2p(x, y, &theta);
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSVnv(r)));
        PUSHs(sv_2mortal(newSVnv(theta)));
        PUTBACK;
        return;
    }
}
xsubpp emits code to extract our arguments from the Perl stack, as before. It passes our C variable declarations and our subroutine call through unchanged. It also passes our PP code through.
The biggest difference between XS_Geometry_r2p and XS_Geometry_r2p_list is the stack management. XS_Geometry_r2p uses an XSRETURN(1) macro call to return one value on the stack. XS_Geometry_r2p_list lowers SP by the number of input parameters, and then issues a PUTBACK macro before returning.
I don't actually understand what any of the stack macros do. I wrote the glue routines shown above by following the examples in perlxs. The macros are defined in /usr/local/lib/perl5/version/architecture/CORE/*.h, but when I tried reading them, I quickly got lost in a maze of #defines, #ifdefs, typedefs, and internal Perl data structures.
Lacking a principled understanding of Perl stack management, you can't actually write PP code: all you can do is follow working examples, as I have. The examples in perlxs appear to be adequate for most xsubs.
Once we have a CODE or a PPCODE directive in our XS code, we can put any C code in the XS routine.
In r2p_open, we dispense with the r2p routine, and compute r and theta in open code.
void
r2p_open(x, y)
        double  x
        double  y
    PREINIT:
        double  r;
        double  theta;
    PPCODE:
        r     = sqrt(x*x + y*y);
        theta = atan2(y, x);
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSVnv(r)));
        PUSHs(sv_2mortal(newSVnv(theta)));
Here is the xsub that xsubpp emits. It looks just like the xsub for r2p_list, except for the lines that compute r and theta.
XS(XS_Geometry_r2p_open)
{
    dXSARGS;
    if (items != 2)
        croak("Usage: Geometry::r2p_open(x, y)");
    SP -= items;
    {
        double x = (double)SvNV(ST(0));
        double y = (double)SvNV(ST(1));
        double r;
        double theta;

        r     = sqrt(x*x + y*y);
        theta = atan2(y, x);
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSVnv(r)));
        PUSHs(sv_2mortal(newSVnv(theta)));
        PUTBACK;
        return;
    }
}
Add these lines to Geometry/test.pl to test our new xsubs.
($r, $theta) = Geometry::r2p_list(3, 4);
print "$r, $theta\n";
($r, $theta) = Geometry::r2p_open(3, 4);
print "$r, $theta\n";
When we run
.../development/Geometry>make test
we get
1..1
ok 1
5
5, 0.927295218001612
5, 0.927295218001612
5, 0.927295218001612
So far, we have looked at the C code that xsubpp emits. Now, we're going to look at how xsubpp converts data between Perl and C representations.
Here's the problem. When the Perl interpreter calls a subroutine, it pushes a list of scalars onto the Perl stack. On input, an xsub has to get those scalars off the stack and convert them to C data. On output, the xsub has to convert C data to Perl scalars and put the scalars back on the stack. xsubpp must emit the C code to do these conversions.
Conversion between Perl and C data types is handled with macros and routines in the Perl C API, but the necessary operations vary, depending on the C data types and the direction of the conversion. Consider:
| C data type | input | output |
|---|---|---|
| int n | n = (int) SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
| double x | x = (double) SvNV(ST(0)) | sv_setnv(ST(0), (double)x) |
| char *psz | psz = (char *) SvPV(ST(0),na) | sv_setpv((SV*)ST(0), psz) |
We could imagine a big switch statement inside xsubpp to select the right code fragment for each C data type, but this would be clumsy and inflexible. It would be better to put the code fragments in a table, like the one shown above.
If we start writing such a table, we quickly discover that the mapping between Perl and C datatypes is not one-to-one. As a strongly typed language, C distinguishes more data types than Perl does. For example, these seven C integer types are all converted with essentially the same code fragment, the only variation being the typecast used to quiet the C compiler.
| C data type | input | output |
|---|---|---|
| int n | n = (int)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
| unsigned n | n = (unsigned)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
| unsigned int n | n = (unsigned int)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
| long n | n = (long)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
| unsigned long n | n = (unsigned long)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
| short n | n = (short)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
| unsigned short n | n = (unsigned short)SvIV(ST(0)) | sv_setiv(ST(0), (IV)n) |
In view of this, xsubpp uses a two-level mapping. First, it maps C data types to XS types, like this
| C data type | XS type |
|---|---|
| int | T_IV |
| unsigned | T_IV |
| char | T_CHAR |
| char * | T_PV |
Then it maps the XS types to code fragments, in two tables: one for input
| XS type | input code fragment |
|---|---|
| T_IV | $var = ($ntype)SvIV($arg) |
| T_CHAR | $var = (char)*SvPV($arg,na) |
| T_PV | $var = ($ntype)SvPV($arg,na) |
and one for output
| XS type | output code fragment |
|---|---|
| T_IV | sv_setiv($arg, (IV)$var); |
| T_CHAR | sv_setpvn($arg, (char *)&$var, 1); |
| T_PV | sv_setpv((SV*)$arg, $var); |
These tables constitute the typemap.
The XS types are meaningful only to xsubpp, and appear only in the typemap. They do not appear in Perl code, XS code, or C code.
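The code fragments in the typemap refer to three Perl variables: $var, $ntype, and $arg. $var is the name of the C variable, $ntype is the C type of $var, and $arg is the stack entry that holds the corresponding Perl scalar.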
xsubpp is a Perl program. When it needs to convert an argument from Perl to C, it sets $var, $ntype, and $arg, obtains the appropriate code fragment from the typemap, and evals the fragment to replace the Perl variables with their values.
For example, consider this XS routine
int
max(a, b)
        int a
        int b
To generate code to convert the first parameter from Perl to C, xsubpp sets the Perl variables like this
| variable | value |
|---|---|
| $var | a |
| $ntype | int |
| $arg | ST(0) |
Then, it evals the fragment
$var = ($ntype)SvIV($arg)
to yield the C code
a = (int)SvIV(ST(0))
It is important to understand how these variables work, because sometimes you have to arrange for them to have the right values in order to make xsubpp do what you want. We'll see an example of this next month when we write the XS code for Align::NW.
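The two-level lookup and the variable substitution can be sketched as a small C program. This is an illustration under stated assumptions, not a description of xsubpp's implementation: xsubpp is a Perl program and uses eval, whereas this sketch does plain text substitution. The table contents are copied from the examples above; the names type_row, input_row, and expand_input are invented for the example.

```c
#include <string.h>

typedef struct { const char *ctype;  const char *xstype;   } type_row;
typedef struct { const char *xstype; const char *fragment; } input_row;

/* First level: C data type to XS type. */
static const type_row TYPEMAP[] = {
    { "int",      "T_IV" },
    { "unsigned", "T_IV" },
    { "char *",   "T_PV" },
};

/* Second level: XS type to input code fragment. */
static const input_row INPUT[] = {
    { "T_IV", "$var = ($ntype)SvIV($arg)"    },
    { "T_PV", "$var = ($ntype)SvPV($arg,na)" },
};

/* Append n bytes of src to the string in out; -1 if it does not fit. */
static int append(char *out, size_t outsize, const char *src, size_t n)
{
    size_t used = strlen(out);

    if (used + n + 1 > outsize)
        return -1;
    memcpy(out + used, src, n);
    out[used + n] = '\0';
    return 0;
}

/* Look up the input fragment for ctype and replace $var, $ntype, and $arg
 * with the given values. Returns 0 on success, -1 on an unknown type or
 * if the result does not fit in out. */
int expand_input(const char *ctype, const char *var, const char *arg,
                 char *out, size_t outsize)
{
    const char *xstype = NULL;
    const char *frag = NULL;
    size_t i;

    for (i = 0; i < sizeof TYPEMAP / sizeof TYPEMAP[0]; i++)
        if (strcmp(TYPEMAP[i].ctype, ctype) == 0)
            xstype = TYPEMAP[i].xstype;
    if (xstype == NULL)
        return -1;

    for (i = 0; i < sizeof INPUT / sizeof INPUT[0]; i++)
        if (strcmp(INPUT[i].xstype, xstype) == 0)
            frag = INPUT[i].fragment;
    if (frag == NULL || outsize == 0)
        return -1;

    out[0] = '\0';
    while (*frag != '\0') {
        const char *val = NULL;
        size_t skip = 1;

        if (strncmp(frag, "$ntype", 6) == 0)    { val = ctype; skip = 6; }
        else if (strncmp(frag, "$var", 4) == 0) { val = var;   skip = 4; }
        else if (strncmp(frag, "$arg", 4) == 0) { val = arg;   skip = 4; }

        if (val != NULL) {
            if (append(out, outsize, val, strlen(val)) != 0)
                return -1;
        } else if (append(out, outsize, frag, 1) != 0) {
            return -1;
        }
        frag += skip;
    }
    return 0;
}
```

Calling expand_input("int", "a", "ST(0)", buf, sizeof buf) fills buf with the same C code shown above for the first parameter of max. Note that unsigned and int share one fragment; only the substituted $ntype differs.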
# A typemap file
TYPEMAP
int     T_IV
SV *    T_SV

INPUT
T_SV
        $var = $arg
T_IV
        $var = ($ntype)SvIV($arg)

OUTPUT
T_SV
        $arg = $var;
T_IV
        sv_setiv($arg, (IV)$var);
The first TYPEMAP header may be omitted.
Files containing typemaps are conventionally named typemap. xsubpp can read and aggregate multiple typemap files to construct the typemap; entries in later files override entries in earlier files.
Perl supplies a default typemap in
/usr/local/lib/perl5/version/ExtUtils/typemap
XS modules may provide a local typemap file in the module directory. If the module declares structs or other C data types, it can map them to XS types in a TYPEMAP section. Local typemaps rarely need INPUT or OUTPUT sections; the default typemap almost always contains appropriate code fragments.
Next month, we'll use these tools to complete the XS implementation of Align::NW.
The mapping is similar, but not identical, to that used in the installation directory. NW.pm is developed in
.../development/Align/NW/NW.pm
but installed in
/usr/local/lib/perl5/site_perl/version/Align/NW.pm
The extra /NW/ in the development area is necessary so that we can have, for example,
.../development/Align/NW/Makefile.PL .../development/Align/SW/Makefile.PL
without conflict.
The h2xs POD consistently uses the term extension for module.
Depending on your version of h2xs, you may still find a require AutoLoader; statement in your .pm file. You can delete it if you like.
sv_newmortalt/*.t files. See Module Mechanics for details.SvSETMAGICtypename keyword in The C++ Programming Language.