IO::Socket
Last month, we used the LWP::UserAgent module to download web pages. LWP::UserAgent provides a high-level interface to the network: we give it a URL, and it returns a web page. This interface hides several low-level operations.
This month, we'll get underneath the application protocols and see how to do our own network I/O. This will give us the tools we need to write our own network servers and implement our own protocols.
There are many different computer networks and protocols. This article limits itself almost exclusively to the TCP/IP protocols that run on the internet.
Sending a single packet is useful in certain circumstances; however, an application protocol like HTTP needs more than that. HTTP involves two hosts in a conversation: the client sends a request, the server returns a response, and so on. To carry out this conversation, each host must send an ongoing stream of bytes to the other.
A little thought shows that these things require state. Establishing the necessary state creates a TCP connection between two hosts. Once a connection is established, the two hosts may send arbitrary data to each other over it. When they are done, the state is discarded; the connection is then said to be closed. Hosts that are communicating over TCP are sometimes called peers.
206.112.62.102
This is called dotted decimal notation.
Numbers like this are convenient for computers and routers, but not for humans. For human consumption, hosts may also be given domain names, such as

    www.perlmonth.com
Technically, the domain name is the name of the computer, while the IP address is the address of a network interface. Since a single computer may have several network interfaces, there is a one-to-many correspondence between domain names and IP addresses. A large distributed database called the domain name system (DNS) maps between the two forms.
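Dotted decimal is only a notation for people; the socket calls work with a packed, four-byte form of the address. As a minimal sketch, the Socket module's inet_aton and inet_ntoa convert between the two. The round_trip helper is a name invented here for illustration.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Convert a dotted decimal string to the packed 4-byte form used by
# the socket calls, and back again. No DNS lookup is needed when the
# argument is already a dotted decimal address.
# round_trip() is a hypothetical helper, for illustration only.
sub round_trip {
    my ($dotted) = @_;
    my $packed = inet_aton($dotted);
    return unless defined $packed;
    return (length($packed), inet_ntoa($packed));
}

my ($bytes, $dotted) = round_trip('206.112.62.102');
print "$bytes bytes, $dotted\n";
```

When inet_aton is given a domain name instead, it has to consult DNS, so that call can fail and should be checked.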
Every packet on the internet carries its own source and destination addresses. The source address is necessary, but not sufficient, to associate a packet with a TCP stream. It is not sufficient because there may be more than one simultaneous TCP connection between a pair of hosts. For example, each window that a web browser opens to a web server typically creates a separate TCP connection from that client to that server.
To accommodate this, TCP has a concept of ports. A TCP port is a 16-bit number. A TCP connection is made from a particular port at one IP address to a particular port at another IP address. Thus, every TCP connection that does, or could, exist on the internet is uniquely identified by a 4-tuple
address1, port1, address2, port2
Packets sent over a TCP connection carry their source and destination TCP ports, in addition to their source and destination IP addresses. Multiple TCP connections between the same pair of hosts can then be distinguished by port numbers. For example
    192.74.137.5, 3423, 206.112.62.102, 80
    192.74.137.5, 1834, 206.112.62.102, 80
are two different TCP connections, because the source ports are different.
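To make the role of the 4-tuple concrete, here is a toy sketch of a connection table keyed by it. The table layout and the conn_key helper are hypothetical; a real protocol handler keeps this state inside the operating system.

```perl
use strict;
use warnings;

# Build a hash key from the 4-tuple that identifies a TCP connection.
# conn_key() is a hypothetical helper, for illustration only.
sub conn_key {
    my ($addr1, $port1, $addr2, $port2) = @_;
    return join ',', $addr1, $port1, $addr2, $port2;
}

# Two browser windows talking to the same web server: the addresses
# and the destination port match, but the source ports differ.
my %connections;
$connections{ conn_key('192.74.137.5', 3423, '206.112.62.102', 80) } = 'window 1';
$connections{ conn_key('192.74.137.5', 1834, '206.112.62.102', 80) } = 'window 2';

printf "%d distinct connections\n", scalar keys %connections;
```

Because the keys differ in the source port, the two entries never collide, which is how packets for the two connections can be told apart.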
The combination of an IP address and a port number is an endpoint. Endpoints are conventionally written by giving the IP address and port number, separated by a colon. The domain name is sometimes written in place of the IP address
    192.74.137.5:3423
    theworld.com:3423
Well-known port numbers are listed in /etc/services; here are a few

    echo      7/tcp
    ftp      21/tcp
    telnet   23/tcp
    http     80/tcp

The first column gives the name of the service; the second gives the TCP port number on which that service is provided. Clients discover the port number for a desired service by looking it up in /etc/services.

This scheme isn't as powerful as it sounds, because every host maintains its own copy of /etc/services. Therefore, it only works if the /etc/services files on the client and server list the same port numbers for the same services. This is what well-known means: everyone knows it, and puts it in their /etc/services file.
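The lookup itself is simple. The sketch below parses text in the /etc/services format into a hash, using the sample entries above rather than the real file so that it doesn't depend on any particular machine. parse_services is a hypothetical helper; production code would normally call getservbyname instead.

```perl
use strict;
use warnings;

# Parse text in the /etc/services format into a hash that maps
# "name/protocol" to a port number. parse_services() is a hypothetical
# helper; real programs would normally call getservbyname instead.
sub parse_services {
    my ($text) = @_;
    my %port_for;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;                      # strip comments
        my ($name, $spec) = split ' ', $line;  # later fields are aliases
        next unless defined $spec;
        my ($port, $proto) = split m{/}, $spec;
        next unless defined $proto;
        $port_for{"$name/$proto"} = $port;
    }
    return %port_for;
}

my %port_for = parse_services(<<'END');
echo      7/tcp
ftp      21/tcp
telnet   23/tcp
http     80/tcp    www     # aliases may follow the port
END

print "http is on port $port_for{'http/tcp'}\n";
```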
    print SOCKET $request;
    $response = <SOCKET>;

The first line sends $request to the remote peer; the second line receives one line from the remote peer and stores it in $response.
However, creating a socket is more complicated. Consider the open function

    open(FILE, "file.txt") or die "Can't open file.txt: $!\n";

open actually does three distinct things.

Conceptually, we could create sockets in the same way

    open(SOCKET, theworld.com, 3423, www.perlmonth.com, 80) or die "open: $!\n";
In practice, it doesn't work that way. A TCP connection requires two sockets: one on each peer. Each socket maintains state; activity on the two peers must be coordinated in order to establish that state. Furthermore, the roles of the peers in a client/server protocol are asymmetrical.
To manage all this, the conceptually unified process of creating a TCP connection is carried out in stages by five separate functions

    socket
    bind
    connect
    listen
    accept

Client and server play different roles in setting up a connection. The client calls

    socket   to create a socket
    bind     to associate the socket with a local endpoint
    connect  to create a connection to the server

The server calls

    socket   to create a socket
    bind     to associate the socket with a local endpoint
    listen   to create a queue of incoming connections
    accept   to extract a connection from the queue

We'll discuss these calls in detail below.
socket
socket creates a socket. It doesn't create a TCP connection, or even name a TCP connection. Still, we have to tell it some things about the connection that we are going to create

    $domain = PF_INET;
    $type   = SOCK_STREAM;
    $proto  = getprotobyname('tcp');
    socket(CLIENT, $domain, $type, $proto) or die "socket: $!\n";
CLIENT is the name of the socket that we are creating. Syntactically, it works the same as a file handle.

$domain specifies the address space of the endpoints that CLIENT uses. PF stands for Protocol Family; PF_INET specifies the internet protocol family. This means that CLIENT will work with IP addresses and port numbers.

$type may be either SOCK_STREAM or SOCK_DGRAM. SOCK_STREAM specifies a byte stream, suitable for TCP connections. DGRAM is short for datagram, and datagram is another name for packet. SOCK_DGRAM specifies a socket that sends and receives individual packets, rather than a byte stream.

$proto is a number; it specifies the protocol that CLIENT will use. Available protocols are listed in /etc/protocols. Protocol numbers are system-dependent, so we use getprotobyname to look up the one we want by name.

On many systems, TCP is the only protocol of type SOCK_STREAM in the PF_INET domain. However, the socket interface was designed to allow applications to select a protocol at run time, so we have to specify one.

Given $domain, $type, and $proto, socket creates CLIENT and binds it to the appropriate protocol handler.

The constants PF_INET and SOCK_STREAM are defined in Socket.pm; be sure to do a

    use Socket;

to import them.
bind
bind associates the socket with a local endpoint. We'd like to write something simple, like

    bind(CLIENT, $IP_address, $TCP_port) or die "bind: $!\n";

but we can't do that, because IP addresses and TCP ports are specific to the PF_INET domain, and the sockets interface is not.

Instead, we have to pack the address and port into an opaque data structure called a socket address. bind passes the socket address intact to the underlying protocol handler, which knows how to interpret it.

    $port     = 0;
    $sockaddr = sockaddr_in($port, INADDR_ANY);
    bind(CLIENT, $sockaddr) or die "bind: $!\n";
sockaddr_in is defined in Socket.pm; it builds a socket address from an IP address and a TCP port number.
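The following sketch shows sockaddr_in working in both directions: in scalar context it packs a port and an address into a socket address, and in list context it unpacks one. The address and port are just sample values.

```perl
use strict;
use warnings;
use Socket qw(sockaddr_in inet_aton inet_ntoa);

# Scalar context: pack a port and a packed IP address into a socket address.
my $sockaddr = sockaddr_in(80, inet_aton('206.112.62.102'));

# List context: unpack a socket address into its port and IP address.
my ($port, $ip_addr) = sockaddr_in($sockaddr);

printf "%s:%d\n", inet_ntoa($ip_addr), $port;
```

The packed value is opaque to our program; we only ever build it with sockaddr_in and take it apart the same way.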
bind specifies the name of the local endpoint, so we have to give the IP address of a network interface on our own machine. If we don't care which interface we use, we can pass the constant INADDR_ANY, and the protocol handler will choose one for us.
Zero is not a valid TCP port number. If we don't care which port we use, we can specify zero, and the protocol handler will choose an available port for us.
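To see which port the protocol handler picked, we can ask the socket for its own address with getsockname. This is a minimal sketch; it binds to the loopback address rather than INADDR_ANY, and falls back to the IPPROTO_TCP constant if /etc/protocols is missing. Both are assumptions made so that the example runs on a machine with no network configuration.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT IPPROTO_TCP);

# Fall back to the Socket constant if /etc/protocols is unavailable.
my $proto = getprotobyname('tcp') || IPPROTO_TCP;

# Bind to port zero on the loopback interface and ask which port the
# protocol handler chose.
socket(my $sock, PF_INET, SOCK_STREAM, $proto)
    or die "socket: $!\n";
bind($sock, sockaddr_in(0, INADDR_LOOPBACK))
    or die "bind: $!\n";

my ($chosen_port, $ip_addr) = sockaddr_in(getsockname($sock));
print "bound to ", inet_ntoa($ip_addr), ":$chosen_port\n";
close $sock;
```

The port printed will differ from run to run, since the choice is left to the system.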
connect
connect specifies the name of the remote endpoint, and establishes the TCP connection.

    $host     = 'www.perlmonth.com';
    $port     = getservbyname('http', 'tcp');
    $ip_addr  = inet_aton($host) or die "inet_aton: $!\n";
    $sockaddr = sockaddr_in($port, $ip_addr);
    connect(CLIENT, $sockaddr) or die "connect: $!\n";
In this example, the remote host is www.perlmonth.com. In production code, host names are typically provided as user input.

We call getservbyname to look up the well-known TCP port for the Hypertext Transfer Protocol (HTTP). As it happens, this is port 80. Servers don't always use the well-known port; production code should allow the user to specify a different one. For example, web browsers accept a port number after the host name in URLs

    http://my.obscure.server:8080/index.html
aton is short for ASCII to Numeric; inet_aton translates an ASCII domain name to a numerical IP address. It uses DNS to do this, and is subject to failure if DNS is unavailable or if the domain name is not valid.

As with bind, we have to pack the IP address and TCP port number into a socket address before we pass them to connect.
connect takes $sockaddr, opens a TCP connection to the remote endpoint, and associates that connection with CLIENT. We can then send and receive data over the connection simply by writing to and reading from CLIENT, as if it were an ordinary file handle.

    # the blank line after the request line ends the HTTP request
    print CLIENT "GET /index.html HTTP/1.0\r\n\r\n";
    $status = <CLIENT>;
    socket(...);
    bind  (...);
    listen(...);

    for (;;) {
        accept(...);
        # handle the client
    }
First, the server calls socket, bind, and listen to set up a socket. Then it calls accept in an infinite loop. accept blocks until a client connects to the server. When accept returns, the server handles the client. Then it goes around the loop and calls accept again to wait for another client.
socket and bind
The server creates a socket with socket, and binds it to a local endpoint with bind, just as the client does.

    my $proto = getprotobyname('tcp');
    socket(SERVER, PF_INET, SOCK_STREAM, $proto) or die "socket: $!\n";

    my $port     = getservbyname('http', 'tcp');
    my $sockaddr = sockaddr_in($port, INADDR_ANY);
    bind(SERVER, $sockaddr) or die "bind: $!\n";
The semantics of INADDR_ANY are a bit different for clients and servers. For a client, INADDR_ANY tells the protocol handler to select an arbitrary network interface and bind the socket to it. For a server, it tells the protocol handler to allow the socket to accept incoming TCP connections on any network interface. If we specified a particular IP address in place of INADDR_ANY, then the socket would only accept TCP connections through that interface.
listen
The listen function can be confusing. listen causes the underlying protocol handler to begin listening for incoming TCP connections. However, listen doesn't wait for a connection to arrive, or return a connection to the application. Instead, it creates a queue for incoming connections and returns immediately. The protocol handler puts incoming connections into the queue.

    listen(SERVER, SOMAXCONN) or die "listen: $!\n";

The second argument is the length of the queue. Longer queues consume more memory; shorter queues increase the risk that the queue will fill up. Clients cannot connect to the server while the queue is full. SOMAXCONN is the largest queue supported by the system; on my system, it is equal to 1000.
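The queue is easy to observe in a single process. In the sketch below, a client connects after listen has returned but before the server has called accept; the connect still succeeds, because the protocol handler holds the connection in the queue. The loopback address, port zero, and the IPPROTO_TCP fallback are assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT IPPROTO_TCP);

# Fall back to the Socket constant if /etc/protocols is unavailable.
my $proto = getprotobyname('tcp') || IPPROTO_TCP;

# Server side: socket, bind, listen. listen returns immediately.
socket(my $server, PF_INET, SOCK_STREAM, $proto) or die "socket: $!\n";
bind($server, sockaddr_in(0, INADDR_LOOPBACK))   or die "bind: $!\n";
listen($server, SOMAXCONN)                       or die "listen: $!\n";
my ($port) = sockaddr_in(getsockname($server));

# Client side: connect succeeds even though the server has not yet
# called accept, because the protocol handler queues the connection.
socket(my $client, PF_INET, SOCK_STREAM, $proto) or die "socket: $!\n";
connect($client, sockaddr_in($port, INADDR_LOOPBACK))
    or die "connect: $!\n";

# Only now does the server take the connection off the queue.
my $peer = accept(my $handler, $server) or die "accept: $!\n";
my ($peer_port, $peer_addr) = sockaddr_in($peer);
print "accepted a queued connection from ",
    inet_ntoa($peer_addr), ":$peer_port\n";

close $_ for $handler, $client, $server;
```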
accept
accept extracts a connection from the queue created by listen. If there are no connections in the queue, accept blocks until a client connects to the server.

    $sockaddr = accept(HANDLER, SERVER) or die "accept: $!\n";
    ($port, $ip_addr) = sockaddr_in($sockaddr);
    $host = gethostbyaddr($ip_addr, AF_INET);
The arguments to accept are both socket handles. SERVER is the socket created by the server to accept incoming connections. HANDLER is the socket created by accept to handle the connection that it extracts from the queue. accept creates a new socket so that the server can continue to accept connections on the original socket.
The return value of accept is the socket address of the client. We call sockaddr_in in list context to unpack the IP address and TCP port from the socket address, and gethostbyaddr to translate the IP address back to a host name. AF stands for Address Family; AF_INET tells gethostbyaddr to interpret its first argument as an IP address. Again, this is done so that the interface can support multiple protocols and addressing schemes.
After accept returns, the server can receive and send data over the connection by reading and writing on HANDLER as if it were an ordinary filehandle.

    $command = <HANDLER>;
    print HANDLER "HTTP/1.0 200 OK\n\n";
The server listens on port 7800 and accepts connections in an infinite loop. For each connection, it reads lines to EOF, reverses each line, and returns it to the client. To exit the server, hit ^C.
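The per-line work is small. As a sketch of what the server does to each line it reads, the helper below reverses the characters while keeping the trailing newline in place. reverse_line is a hypothetical name, and this is not the original listing.

```perl
use strict;
use warnings;

# Reverse the characters of one line, keeping the newline at the end.
# reverse_line() is a hypothetical helper, not the original listing.
sub reverse_line {
    my ($line) = @_;
    my $had_newline = chomp $line;
    my $reversed = reverse $line;     # scalar context reverses the string
    return $had_newline ? "$reversed\n" : $reversed;
}

print reverse_line("Help! I'm trapped in a PDP-11.\n");
```

In a server, this would be called on each line read from the handler socket, with the result printed back to the same socket.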
7800 is a well-known port, in the sense that the client has to know it, but it isn't listed in /etc/services. For security reasons, /etc/services is writable only by root. As a result, custom servers tend to have port numbers hard-coded in them.
The client takes a host name and a port number on the command line, and connects to the server. It sends lines from STDIN to the server, and prints whatever comes back on STDOUT.
Both client and server print messages to show their progress. Here is a pair of sample transcripts.
    world:~>./reverseS
    socket
    bind
    listen
    accept theworld.com:9252
    close HANDLER
    ^C
    world:~>

    world:~>./reverseC theworld.com 7800
    socket
    bind
    connect
    Help! I'm trapped in a PDP-11.
    .11-PDP a ni deppart m'I !pleH
    ^D
    close
    world:~>
IO::Socket
The IO::Socket module provides an object-oriented interface to the socket functions. Like the sockets interface, IO::Socket is designed to handle multiple protocols. IO::Socket is an abstract base class; interfaces to particular protocols are implemented in derived classes. The IO::Socket::INET class manages TCP connections.
IO::Socket::INET clients
To create a client with IO::Socket::INET, we write

    $client = IO::Socket::INET->new(Proto    => 'tcp',
                                    PeerAddr => $host,
                                    PeerPort => $port);
    $client or die "Can't connect to $host:$port\n";
The constructor provides many defaults; in the simplest case, all we have to specify is the protocol and the remote endpoint. The constructor creates the underlying socket, binds it to a local endpoint, and connects to the server.
If the constructor succeeds, then we can use $client just like a file handle.

    print $client "42\n";
    my $factors = <$client>;
    print $factors;
IO::Socket::INET servers

    $server = IO::Socket::INET->new(Proto     => 'tcp',
                                    LocalPort => 7801,
                                    Listen    => SOMAXCONN);
    $server or die "Can't create server\n";

The constructor creates the socket, binds it to the port that we specify in LocalPort, and does a listen. It doesn't do a connect.
If the constructor succeeds, then we can make accept calls on the IO::Socket object to accept incoming connections.

    for (;;) {
        my $handler = $server->accept;
        $handler or die "accept: $!\n";
        # handle the client
    }
accept returns a new IO::Socket object to handle each incoming connection. We can use this object like a file handle to read and write to the client

    $n = <$handler>;
    @factors = Factor($n);
    print $handler "@factors\n";
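We can also make method calls on $handler to discover the network address of the client

    use Net::hostent;   # provides the object-returning gethostbyaddr

    $peeraddr = $handler->peeraddr;
    $hostinfo = gethostbyaddr($peeraddr);
    # $hostinfo is undefined if the lookup fails, so test it before calling name
    printf "accept %s\n", $hostinfo ? $hostinfo->name : $handler->peerhost;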
$peeraddr is the IP address of the client; gethostbyaddr translates it to a domain name. If the translation fails, we fall back to a $handler->peerhost call, which simply returns the IP address of the client in dotted decimal notation.
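The pieces above can be tried out together in one process. The following is a minimal sketch, assuming the loopback address and a system-chosen port (LocalPort => 0) so that it doesn't collide with anything else on the machine; the messages exchanged are made up for the example.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Server: LocalPort => 0 asks for any free port, and LocalAddr limits
# the server to the loopback interface. Both are assumptions made so
# that the sketch runs on a single machine.
my $server = IO::Socket::INET->new(
    Proto     => 'tcp',
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 5,
) or die "Can't create server: $@\n";

# Client: connect to whichever port the server was given.
my $client = IO::Socket::INET->new(
    Proto    => 'tcp',
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport,
) or die "Can't connect: $@\n";

my $handler = $server->accept or die "accept: $!\n";

# The explicit flush calls are a precaution in case autoflush is off.
print $client "42\n";
$client->flush;
my $request = <$handler>;

print $handler "got $request";
$handler->flush;
my $reply = <$client>;

print "client ", $handler->peerhost, " received: $reply";
```

A real server would call accept in a loop and would usually run in a separate process from its clients; the single-process form here only works because each message is short enough to sit in the socket buffers.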
IO::Socket::INET
.
The server listens on port 7801. When a client connects to it, it reads an integer from the client, factors it, and sends the factors back to the client.
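The factoring routine is independent of the networking. As a sketch, a Factor function could use simple trial division; this version is a hypothetical stand-in written for illustration, and is only practical for small integers.

```perl
use strict;
use warnings;

# Factor a positive integer by trial division. This Factor() is a
# hypothetical stand-in; the article's own routine is not shown here.
sub Factor {
    my ($n) = @_;
    my @factors;
    for (my $d = 2; $d * $d <= $n; $d++) {
        while ($n % $d == 0) {
            push @factors, $d;
            $n /= $d;
        }
    }
    push @factors, $n if $n > 1;   # whatever remains is prime
    return @factors;
}

my @factors = Factor(42);
print "@factors\n";
```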
Here is a pair of sample transcripts.
    world:~>./factorS
    Listening on 7801
    accept theworld.com
    ^C
    world:~>

    world:~>./factorC theworld.com 7801
    42
    2 3 7
    world:~>
/etc/services on my machine lists over 100, including FTP, gopher, HTTP, HTTPS, SMTP, SSH, Telnet, and time. If one of these does what you need, then by all means use it. You will find widespread support for it on servers, and existing client interfaces that you can use in your applications.

On the other hand, if you can't find what you need already out there, then the Perl sockets interface and the IO::Socket modules give you the tools you need to implement your own services and protocols.
Server socket, socket bind;
Bind to service and accept.

Client socket, socket bind;
Bind to zero and connect.

Accept return, return and fork;
Fork and loop while children serve.

Client-server rule the net
T.C.P.