Big fleas have little fleas
On their backs to bite 'em,
And little fleas have littler still
And so, ad infinitum.
File::Find
    % ls lib
    Bar  Foo.pm  SCCS
    %

To see all the files in a directory, and all its subdirectories, and all their subdirectories, all the way down, you can use a command like
    % ls -R lib
    Bar  Foo.pm  SCCS

    lib/Bar:
    Baz.pm  SCCS

    lib/Bar/SCCS:
    Baz.pm,v

    lib/SCCS:
    Foo.pm,v
    %

This is called a recursive subdirectory search, and it is a very powerful way to operate on directory hierarchies.
find(1)
find(1) has many options that allow you to include or exclude files from your search. However, it has several drawbacks:
find(1)
run it for each file.
find(1)
at all. After all, Perl supports recursion. How hard could it be?
    sub Find {
        my $dir = shift;
        opendir(DIR, $dir);
        my @files = readdir(DIR);
        for my $file (@files) {
            -d $file and Find("$dir/$file");
            -f $file and

STOP! Don't write another line. It's already been done. It's called File::Find, it comes with the standard Perl distribution, AND when you use File::Find, you get these two BONUS LINES at NO EXTRA CHARGE:

            $file eq '.'  and next;
            $file eq '..' and next;
File::Find
To use File::Find, write

    use File::Find;
    find(\&Wanted, $dir);
find() does a recursive subdirectory search of $dir. It calls Wanted() once for each file and directory in $dir, including $dir itself. You can actually specify a list of directories, and find() will search all of them:

    find(\&Wanted, @dirs);

You get to write Wanted(). It is an arbitrary subroutine, and can do whatever you need. When Wanted() is called,
- $_ is set to the current file name
- $File::Find::dir is set to the current directory
- $File::Find::name is set to "$File::Find::dir/$_"
- you are chdir()'d to $File::Find::dir
find() relies on $_, so if you change it, you must restore it before returning from Wanted(). If Wanted() sets $File::Find::prune on a directory, then find() will not descend into that directory.
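As a minimal sketch of what Wanted() sees on each call, the script below builds a small throwaway tree and records the three values described above. The temporary directory, the file names, and the @seen array are assumptions made for illustration; they are not part of File::Find.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Build a throwaway tree: $root/lib/Foo.pm and $root/lib/Bar/Baz.pm
# (hypothetical names, chosen to resemble the listing above).
my $root = tempdir(CLEANUP => 1);
make_path("$root/lib/Bar");
for my $path ("$root/lib/Foo.pm", "$root/lib/Bar/Baz.pm") {
    open(my $fh, '>', $path) or die "can't create $path: $!";
    close($fh);
}

my @seen;    # one [ $_, $File::Find::dir, $File::Find::name ] per call

sub Wanted {
    # Record the values without modifying $_
    push @seen, [ $_, $File::Find::dir, $File::Find::name ];
}

find(\&Wanted, "$root/lib");

# Print the records sorted by full name, so the output order
# does not depend on the order in which the directory was read.
for my $rec (sort { $a->[2] cmp $b->[2] } @seen) {
    printf "%-8s dir=%s name=%s\n", @$rec;
}
```

Note that for the starting directory itself, $_ is '.' while $File::Find::name is the directory as passed to find().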
Wanted()
Wanted() begins by deciding whether it wants to operate on the current file. Regular expression matches on $_ do this concisely:

    sub Wanted {
        # only operate on Perl modules
        /\.pm$/ or return;
        ...
    }

    sub Wanted {
        # Don't descend into SCCS directories
        /SCCS/ and $File::Find::prune = 1;
        ...
    }
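Putting both tests into a single Wanted() might look like the following sketch, which collects the Perl modules under a tree while skipping SCCS directories. The temporary tree and the @modules array are hypothetical. The SCCS pattern is anchored here (/^SCCS$/ rather than /SCCS/) so that a name that merely contains SCCS is not pruned; that is a choice made for this example, not something the text above requires.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical tree resembling the listing at the top of the article,
# plus a stray .pm file hidden inside an SCCS directory.
my $root = tempdir(CLEANUP => 1);
make_path("$root/lib/SCCS", "$root/lib/Bar/SCCS");
my @files = (
    'lib/Foo.pm',     'lib/SCCS/Foo.pm,v',     'lib/SCCS/Old.pm',
    'lib/Bar/Baz.pm', 'lib/Bar/SCCS/Baz.pm,v',
);
for my $file (@files) {
    open(my $fh, '>', "$root/$file") or die "can't create $file: $!";
    close($fh);
}

my @modules;

sub Wanted {
    # Don't descend into SCCS directories
    if (-d $_ and /^SCCS$/) {
        $File::Find::prune = 1;
        return;
    }
    # Only operate on Perl modules
    -f $_ and /\.pm$/ or return;
    push @modules, $File::Find::name;
}

find(\&Wanted, "$root/lib");

print "$_\n" for sort @modules;
```

Because the SCCS directories are pruned, the stray Old.pm inside one of them is never passed to Wanted() at all.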
finddepth()
File::Find has an alternate entry point called finddepth(). find() and finddepth() both traverse the directory hierarchy depth-first. The difference is that find() calls Wanted() on subdirectories on the way down, and finddepth() calls Wanted() on subdirectories on the way back up.
This is easier to understand with an example. If we run

    find(sub { print "$_\n" }, 'lib')

on the directory hierarchy shown at the beginning of this article, the output is

    .
    SCCS
    Foo.pm,v
    Bar
    Baz.pm
    SCCS
    Baz.pm,v
    Foo.pm

Note that find() calls the sub on SCCS before Foo.pm,v. On the other hand, if we run

    finddepth(sub { print "$_\n" }, '.')

on the same hierarchy, the output is

    Foo.pm,v
    SCCS
    Baz.pm
    Baz.pm,v
    SCCS
    Bar
    Foo.pm
    .

and we see that finddepth() calls the sub on SCCS after Foo.pm,v. $File::Find::prune doesn't work in finddepth(), because finddepth() has already descended into the subdirectory before Wanted() has a chance to set it.
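The way-back-up order matters when an action on a directory only makes sense after its contents have been handled. Removing empty directories is one such case, sketched below under assumed names: the temporary tree, the directory names, the RemoveIfEmpty() sub, and the @removed array are all hypothetical.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical tree: empty/deeper is empty, so empty/ itself becomes
# empty once deeper/ is removed; keep/ holds a file and must survive.
my $root = tempdir(CLEANUP => 1);
make_path("$root/top/empty/deeper", "$root/top/keep");
open(my $fh, '>', "$root/top/keep/Foo.pm") or die "can't create file: $!";
close($fh);

my @removed;

sub RemoveIfEmpty {
    -d $_ or return;
    $_ eq '.' and return;    # leave the starting directory alone
    # rmdir() only succeeds on an empty directory
    rmdir($_) and push @removed, $File::Find::name;
}

# finddepth() visits deeper/ before empty/, so both can be removed
# in a single pass.
finddepth(\&RemoveIfEmpty, "$root/top");

print "$_\n" for @removed;
```

With find() instead of finddepth(), the sub would be called on empty/ while it still contained deeper/, so the rmdir() on empty/ would fail and only deeper/ would be removed.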
Many people access Usenet through a newsreader. Newsreaders are good if they do what you want; they can be slow and clumsy if they do not. If you can't find a newsreader that does what you want, you can use File::Find to scan your news spool directly. Here's an example:

    #!/usr/local/bin/perl
    use strict;
    use File::Find;

    my($Group, $Text) = @ARGV;
    my $Spool = "/var/spool/news";   # or wherever your newsspool lives
    $| = 1;                          # so we can see it run

    find(\&Kibo, "$Spool/$Group");

    sub Kibo {
        -d and print "$_\n";         # show each directory as we enter it
        -f and /^\d+$/ or return;    # articles are files with numeric names
        print "$_\r";                # show progress on a single line
        open(ARTICLE, $_) or return;
        my @lines = <ARTICLE>;
        close(ARTICLE);
        for my $line (@lines) {
            $line =~ /$Text/o and print $line;
        }
    }

This program takes two command line arguments: a newsgroup and a string, which is used as a regular expression. It reads all the articles in the newsgroup, and all its subgroups, and prints any lines that match the string. It also prints the newsgroup names and article numbers as it goes; this is mainly so the user can tell that something is happening. Depending on the size of the newsgroup, the program can take a long time to run.
The filename filtering in Kibo() is typical. The -d and -f tests sort out files and directories. The articles in a newsspool have numeric filenames; the /^\d+$/ test skips any extraneous files that may be lying around.
If you actually want to write or use a program like this to scan Usenet, check the scripts/news/ directory on CPAN; it contains several working examples. Remember, Laziness is one of the principal virtues of a programmer.