Usenet is a distributed system. There is no single location where articles are stored. Rather, articles are stored on many different computers, called servers. The servers are all connected to the Internet, and they constantly exchange articles with each other.
Similarly, there is no single computer through which users submit or retrieve articles. Users can submit and retrieve articles through any computer that is connected to a server; these computers are called clients.
An article submitted from a particular client to a particular server will usually propagate to the bulk of the servers on the Internet within 24 hours. As it reaches each server, it becomes available to all the users who have clients that connect to that server. Servers typically retain articles for a few days or weeks, and then discard them.
Clients and servers send articles to each other using the Network News Transfer Protocol (NNTP).
All this software exists so that users can read usenet without having to understand or worry about the mechanics of running a large distributed database.
Details vary between newsreaders. Some are text-based, others have GUIs; some sort articles by date, others by thread; some are integrated with text editors, or web browsers. However, the underlying user model is largely the same, and has been since the inception of usenet.
Like the rest of the Internet, usenet has undergone explosive growth in the last few years. There are now over 10,000 newsgroups, carrying among them millions of articles. A single newsgroup may have thousands of available articles, and receive hundreds of new ones each day.
At the same time, much of the traffic on usenet is mislabeled, off-topic, inappropriate, or repetitive. Some of this is due to deliberate abuse, such as trolling or spam, some is inherent in the nature of usenet, and some is simply due to the fact that with an exponentially growing user base, most users are inexperienced.
In addition, user requirements have become more complex. Some newsgroups carry ordinary discussions; some are moderated; some carry binary files that require special encoding. Users may archive newsgroups, or gateway them to mailing lists, or collect statistics on the traffic.
Newsreaders still work, but they don't provide all the functions that users want. And the functions that they do provide may no longer be useful. For example, most newsreaders will list available newsgroups, but a list of 10,000 newsgroups may be unmanageable.
If newsreaders don't meet your needs, you may consider writing your own applications to manage usenet.
On the other hand, it isn't trivial. Writing a usenet application will potentially involve you in the details of NNTP (see RFC 977), and the format of news articles (see RFC 1036). It may require you to navigate the article database on a news server, or make network connections to one. You may have to read and write .newsrc files.
Fortunately, you don't have to do all this yourself. Much of the infrastructure necessary to write a usenet application has been packaged in modules and made available on CPAN.
This article surveys eight modules. Six of them encapsulate basic functionality needed to write a usenet application:
File::Find
News::NNTPClient
Net::NNTP
News::Article
News::Newsrc
Two others provide more specialized functions:
News::Gateway
News::Scan
File::Find
A news server traditionally stores its articles in a directory tree, commonly rooted at /var/spool/news/.
The File::Find module is useful for navigating directory trees. See "Finding your files with File::Find" for details and examples.
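As a minimal sketch of that kind of traversal, the following builds a small throwaway tree that mimics a traditional spool layout (one directory per newsgroup component, one numbered file per article) and walks it with File::Find. The temporary directory, the group names, and the count_articles helper are illustrative assumptions, not any particular server's configuration.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Count the article files under a spool directory, grouped by newsgroup.
# Assumes the traditional layout: comp.lang.perl.misc lives in
# comp/lang/perl/misc/, and each article is a file named by its number.
sub count_articles {
    my ($spool) = @_;
    my %count;
    find(sub {
        return unless -f $_ && /^\d+$/;      # numbered files only
        my $dir = $File::Find::dir;
        $dir =~ s{^\Q$spool\E/?}{};          # path relative to the spool
        (my $group = $dir) =~ s{/}{.}g;      # comp/lang -> comp.lang
        $count{$group}++;
    }, $spool);
    return %count;
}

# Build a tiny fake spool to walk (hypothetical groups and article numbers).
my $spool = tempdir(CLEANUP => 1);
my %fake  = (
    "comp/lang/perl/misc"    => [1, 2, 3],
    "comp/lang/perl/modules" => [7],
);
for my $dir (keys %fake) {
    make_path("$spool/$dir");
    for my $n (@{ $fake{$dir} }) {
        open my $fh, '>', "$spool/$dir/$n" or die "$spool/$dir/$n: $!";
        print $fh "Subject: test\n\nbody\n";
        close $fh;
    }
}

my %count = count_articles($spool);
print "$_: $count{$_}\n" for sort keys %count;
```

On a real server you would pass the actual spool root to count_articles instead of the temporary directory.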
News::NNTPClient and Net::NNTP
Two modules implement the client side of NNTP: News::NNTPClient and Net::NNTP. News::NNTPClient is a free-standing module, while Net::NNTP is part of the larger libnet package.
To retrieve articles using News::NNTPClient, you do something like

    $server = "news.isp.com";
    $client = News::NNTPClient->new($server);
    $group  = "comp.lang.perl.modules";
    ($first, $last) = $client->group($group);
    for ($n = $first; $n <= $last; $n++) {
        @lines = $client->article($n);
    }
To post an article, do

    @header = ("Newsgroups: test", "Subject: test", "From: tester");
    @body   = ("This is the body of the article");
    $client->post(@header, "", @body);
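The interface to Net::NNTP is similar:

    $client = Net::NNTP->new($server);
    ($n, $first, $last) = $client->group($group);
    print "$group contains $n articles\n";
    $lines = $client->article($first);    # reference to an array of lines
    # Net::NNTP sends the data as given, so terminate each line explicitly
    $client->post(map { "$_\n" } @header, "", @body);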
News::Article
A news article consists of a set of headers:

    Newsgroups: test
    Subject: test
    From: tester

and a body, which can contain arbitrary ASCII text:

    This is the body of the article
The body is separated from the headers by a single blank line.
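To make the layout concrete, here is a minimal sketch that splits an article, given as a list of lines, into headers and body by hand. The split_article helper is a hypothetical name, and the sketch simplifies: a repeated header overwrites the earlier value. It only illustrates the headers, blank line, body structure described above.

```perl
use strict;
use warnings;

# Split an article (a list of lines) into a header hash and a body array.
# Simplification: a repeated header overwrites the earlier value.
sub split_article {
    my @lines = @_;
    chomp @lines;
    my (%header, $last);
    while (defined(my $line = shift @lines)) {
        last if $line eq "";                         # blank line ends the headers
        if ($line =~ /^\s+(.*)/ && defined $last) {  # continuation of previous header
            $header{$last} .= " $1";
        }
        elsif ($line =~ /^([^:\s]+):\s*(.*)/) {
            $last = $1;
            $header{$last} = $2;
        }
    }
    my @body = @lines;                               # everything after the blank line
    return (\%header, \@body);
}

my @article = (
    "Newsgroups: test",
    "Subject: test",
    "From: tester",
    "",
    "This is the body of the article",
);

my ($header, $body) = split_article(@article);
print "Subject: $header->{Subject}\n";
print "Body has ", scalar(@$body), " line(s)\n";
```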
The News::NNTPClient and Net::NNTP article methods return articles as an array (or a reference to an array) of lines. You could go groveling through the article, parsing headers and locating the body; it wouldn't even be that hard: Perl is excellent at this sort of text processing. But you don't have to. Instead, you can use News::Article.
News::Article takes the list of lines that constitute an article and creates an object to manage that article. It provides methods for getting and setting headers and the body. It can also post the article back through a Net::NNTP object.

    $article    = News::Article->new($lines);
    @newsgroups = $article->header("Newsgroups");
    $subject    = $article->header("Subject");
    $body       = $article->body;
    @quoted     = map { "> $_" } @$body;

    $followup = News::Article->new;
    # single quotes, so that @isp is not interpolated as an array
    $followup->set_headers(From       => 'clueful@isp.com',
                           Newsgroups => [ @newsgroups ],
                           Subject    => $subject);
    $followup->set_body(@quoted, @incisive_commentary);
    $followup->post($client);
News::Newsrc
Many usenet applications keep lists of articles that have been read or otherwise processed. Listing millions of article numbers would be infeasible; instead, they use a compressed format, like this:

    1-1013,1015,1020-1030

Each newsgroup has its own article list. Article lists are typically stored in a .newsrc file:

    comp.lang.perl.announce: 1-1186
    comp.lang.perl.misc: 1-233883,234000-234018
    comp.lang.perl.moderated: 1-3406,3478
    comp.lang.perl.modules: 1-25308,25450,25452,25494
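The article-list format is simple enough to decode by hand. The sketch below, which uses only core Perl, parses one .newsrc line into a list of ranges and tests whether a given article number is marked. The parse_newsrc_line and is_marked helpers are hypothetical names chosen for illustration, and the sketch assumes well-formed input.

```perl
use strict;
use warnings;

# Parse a line such as "comp.lang.perl.moderated: 1-3406,3478" into the
# group name and a list of [low, high] pairs.
sub parse_newsrc_line {
    my ($line) = @_;
    my ($group, $list) = $line =~ /^([^:!\s]+)[:!]\s*(.*?)\s*$/
        or die "unrecognized .newsrc line: $line";
    my @ranges;
    for my $item (split /,/, $list) {
        my ($low, $high) = $item =~ /^(\d+)(?:-(\d+))?$/
            or die "bad article range: $item";
        push @ranges, [ $low, defined $high ? $high : $low ];
    }
    return ($group, \@ranges);
}

# True if article $n falls inside any of the ranges.
sub is_marked {
    my ($ranges, $n) = @_;
    return scalar grep { $_->[0] <= $n && $n <= $_->[1] } @$ranges;
}

my ($group, $ranges) =
    parse_newsrc_line("comp.lang.perl.modules: 1-25308,25450,25452,25494");

for my $n (25308, 25309, 25452) {
    printf "%s %d: %s\n", $group, $n,
        is_marked($ranges, $n) ? "marked" : "not marked";
}
```

Storing each entry as a [low, high] pair keeps the list as compact in memory as it is on disk, which matters when a single range covers hundreds of thousands of articles.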
Parsing a .newsrc file isn't too difficult, and you can use Set::IntSpan to manipulate the article lists. But, again, you don't have to. News::Newsrc will take care of the whole thing for you.

    $newsrc = News::Newsrc->new;
    $newsrc->load("$ENV{HOME}/.newsrc");
    $group  = "comp.lang.perl.modules";
    $number = 42;
    if (not $newsrc->marked($group, $number)) {
        # process the article
        $newsrc->mark($group, $number);
    }
    $newsrc->save;
News::Gateway
News::Gateway provides infrastructure and architecture for a common usenet application: news/mail gateways.
Email messages and usenet articles are very similar, both in structure and function. They have some headers and a body, and they are transported over the network from a sender to a receiver. News/mail gateways allow articles that originate on usenet to be read as email, and allow email messages to be posted to usenet. This is useful in several contexts.
News::Gateway defines a 3-layer architecture for gateways.
News::Gateway provides the infrastructure, and it defines a framework for collecting and organizing implementations, which may be provided by third parties. Policy is implemented separately by each application. The goals are to handle the details common to all gateway applications, and reduce the amount of code that must be written in each application.
As always, programmers should consider using existing modules in order to reduce the amount of code that they have to write. However, there is a special reason to use News::Gateway. Programs that handle mail and news can have subtle bugs. They may make assumptions that are valid for most users, systems, and networks, and then fail in rare instances where those assumptions don't hold. In the worst case, they can create infinite mail loops and flood servers. These problems can be intermittent and difficult to reproduce; they are typically detected by users separated by time and distance from the original programmers, and they can be very difficult to track down.
If you are writing any kind of mail/news gateway, consider using News::Gateway.
News::Scan
News::Scan reads articles from a newsgroup and computes statistics about the traffic in the group. These include the total number of
It also collects information about
News::Scan is not a general purpose module. It was written for one single purpose: collecting traffic statistics. Typically, these are collected at intervals and then posted on the newsgroup, so that people who read that newsgroup can have some idea what kind of traffic it is carrying.
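To give a feel for what collecting traffic statistics involves, here is a small core-Perl sketch that tallies articles and body lines per poster. It does not use News::Scan or reproduce its interface; the poster_stats helper, the sample articles, and the choice of statistics are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Tally articles and body lines per poster from a list of articles,
# each given as a reference to an array of lines.
sub poster_stats {
    my @articles = @_;
    my %stats;
    for my $lines (@articles) {
        my ($from, $in_body, $body_lines) = ("(unknown)", 0, 0);
        for my $line (@$lines) {
            if    ($in_body)                     { $body_lines++ }
            elsif ($line =~ /^\s*$/)             { $in_body = 1 }   # end of headers
            elsif ($line =~ /^From:\s*(.*\S)/i)  { $from = $1 }
        }
        $stats{$from}{articles}++;
        $stats{$from}{lines} += $body_lines;
    }
    return \%stats;
}

# Hypothetical sample traffic.
my @articles = (
    [ 'From: alice@example.com', 'Subject: a',     '', 'line 1', 'line 2' ],
    [ 'From: bob@example.com',   'Subject: b',     '', 'line 1' ],
    [ 'From: alice@example.com', 'Subject: Re: b', '', 'line 1' ],
);

my $stats = poster_stats(@articles);

# Report the most active posters first, breaking ties by name.
for my $from (sort { $stats->{$b}{articles} <=> $stats->{$a}{articles}
                     or $a cmp $b } keys %$stats) {
    printf "%-20s %d article(s), %d line(s)\n",
        $from, $stats->{$from}{articles}, $stats->{$from}{lines};
}
```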
With just a little more code (and a little less documentation), News::Scan could have been made into an application. But then it would run from the command line; it would scan newsgroups in just one way; it would provide output in just one format. Anyone who wanted to do anything different would be faced with either writing their own program from scratch or trying to hack News::Scan to meet their needs.
Because News::Scan is a module, programmers can easily embed it in larger programs, take input from whatever sources they have, and generate output in whatever format they need. If you want to collect traffic statistics, News::Scan is the module for you.