Set::IntSpan - Manages sets of integers
# BEGIN { $Set::IntSpan::integer = 1 } use Set::IntSpan qw(grep_set map_set grep_spans map_spans); # $Set::IntSpan::Empty_String = '-'; # or ''; $set = new Set::IntSpan $set_spec; $set = new Set::IntSpan @set_specs; $valid = valid Set::IntSpan $run_list; $set = copy $set $set_spec; $run_list = run_list $set; @elements = elements $set; @sets = sets $set; @spans = spans $set; $u_set = union $set $set_spec; $i_set = intersect $set $set_spec; $x_set = xor $set $set_spec; $d_set = diff $set $set_spec; $c_set = complement $set; $set->U($set_spec); # Union $set->I($set_spec); # Intersect $set->X($set_spec); # Xor $set->D($set_spec); # Diff $set->C; # Complement equal $set $set_spec equivalent $set $set_spec superset $set $set_spec subset $set $set_spec $n = cardinality $set; $n = size $set; empty $set finite $set neg_inf $set pos_inf $set infinite $set universal $set member $set $n; insert $set $n; remove $set $n; $min = min $set; $max = max $set; $holes = holes $set; $cover = cover $set; $inset = inset $set $n; $smaller = trim $set $n; $bigger = pad $set $n; $subset = grep_set { ... } $set; $mapset = map_set { ... } $set; $subset = grep_spans { ... } $set; $mapset = map_spans { ... } $set; for ($element=$set->first; defined $element; $element=$set->next) { ... } for ($element=$set->last ; defined $element; $element=$set->prev) { ... } $element = $set->start($n); $element = $set->current; $n = $set->at($i); $slice = $set->slice($from, $to); $i = $set->ord($n); $i = $set->span_ord($n);
$u_set = $set + $set_spec; # union $i_set = $set * $set_spec; # intersect $x_set = $set ^ $set_spec; # xor $d_set = $set - $set_spec; # diff $c_set = ~$set; # complement $set += $set_spec; # union $set *= $set_spec; # intersect $set ^= $set_spec; # xor $set -= $set_spec; # diff $set eq $set_spec # equal $set ne $set_spec # not equal $set le $set_spec # subset $set lt $set_spec # proper subset $set ge $set_spec # superset $set gt $set_spec # proper superset # compare sets by cardinality $set1 == $set2 $set1 != $set2 $set1 <= $set2 $set1 < $set2 $set1 >= $set2 $set1 > $set2 $set1 <=> $set2 # compare cardinality of set to an integer $set1 == $n $set1 != $n $set1 <= $n $set1 < $n $set1 >= $n $set1 > $n $set1 <=> $n @sorted = sort @sets; # sort sets by cardinality if ($set) { ... } # true if $set is not empty print "$set\n"; # stringizes to the run list
@EXPORT
Nothing
@EXPORT_OK
grep_set
, map_set
, grep_spans
, map_spans
Set::IntSpan
manages sets of integers.
It is optimized for sets that have long runs of consecutive integers.
These arise, for example, in .newsrc files, which maintain lists of articles:
alt.foo: 1-21,28,31 alt.bar: 1-14192,14194,14196-14221
A run of consecutive integers is also called a span.
Sets are stored internally in a run-length coded form. This provides for both compact storage and efficient computation. In particular, set operations can be performed directly on the encoded representation.
Set::IntSpan
is designed to manage finite sets.
However, it can also represent some simple infinite sets, such as { x | x>n }.
This allows operations involving complements to be carried out consistently,
without having to worry about the actual value of INT_MAX on your machine.
A span is a run of consecutive integers. A span may be represented by an array reference, in any of 5 forms:
Span Set [ $n, $n ] { n } [ $a, $b ] { x | a<=x && x<=b}
Span Set [ undef, $b ] { x | x<=b } [ $a , undef ] { x | x>=a } [ undef, undef ] The set of all integers
Some methods operate directly on spans.
Many of the methods take a set specification. There are four kinds of set specifications.
If a set specification is omitted, then the empty set is assumed. Thus,
$set = new Set::IntSpan;
creates a new, empty set. Similarly,
copy $set;
removes all elements from $set.
If an object reference is given, it is taken to be a Set::IntSpan
object.
If a string is given, it is taken to be a run list. A run list specifies a set using a syntax similar to that in newsrc files.
A run list is a comma-separated list of runs. Each run specifies a set of consecutive integers. The set is the union of all the runs.
Runs may be written in any of 5 forms.
The empty set is consistently written as '' (the null string). It is also denoted by the special form '-' (a single dash).
The runs in a run list must be disjoint, and must be listed in increasing order.
Valid characters in a run list are 0-9, '(', ')', '-' and ','. White space and underscore (_) are ignored. Other characters are not allowed.
Run list Set "-" { } "1" { 1 } "1-2" { 1, 2 } "-5--1" { -5, -4, -3, -2, -1 } "(-)" the integers "(--1" the negative integers "1-3, 4, 18-21" { 1, 2, 3, 4, 18, 19, 20, 21 }
If an array reference is given, then the elements of the array specify the elements of the set. The array may contain
The set is the union of all the integers and spans in the array. The integers and spans need not be disjoint. The integers and spans may be in any order.
Array ref Set [ ] { } [ 1, 1 ] { 1 } [ 1, 3, 2 ] { 1, 2, 3 } [ 1, [ 5, 8 ], 5, [ 7, 9 ], 2 ] { 1, 2, 5, 6, 7, 8, 9 } [ undef, undef ] the integers [ undef, -1 ] the negative integers
Each set has a single iterator,
which is shared by all calls to
first
, last
, start
, next
, prev
, and current
.
At all times,
the iterator is either an element of the set,
or undef
.
first
, last
, and start
set the iterator;
next
, and prev
move it;
and current
returns it.
Calls to these methods may be freely intermixed.
Using next
and prev
,
a single loop can move both forwards and backwards through a set.
Using start
, a loop can iterate over portions of an infinite set.
new
Set::IntSpan
$set_spec
new
Set::IntSpan
@set_specs
Creates and returns a Set::IntSpan
object.
The initial contents of the set are given by $set_spec, or by the union of all the @set_specs.
valid
Set::IntSpan
$run_list
copy
$set $set_spec
copy
returns $set.
run_list
$set
Returns a run list that represents $set. The run list will not contain white space. $set is not affected.
By default, the empty set is formatted as '-';
a different string may be specified in $Set::IntSpan::Empty_String
.
elements
$set
sets
$set
Set::IntSpan
objects.
The sets in the list are in order.
spans
$set
Returns the runs in $set, as a list of the form
([$a1, $b1], [$a2, $b2], ... [$aN, $bN])
If a run contains only a single integer, then the upper and lower bounds of the corresponding span will be equal.
If the set has no lower bound, then $a1 will be undef
.
Similarly,
if the set has no upper bound, then $bN will be undef
.
The runs in the list are in order.
For these operations,
a new Set::IntSpan
object is created and returned.
The operands are not affected.
union
$set $set_spec
intersect
$set $set_spec
xor
$set $set_spec
diff
$set $set_spec
complement
$set
By popular demand, Set::IntSpan
now has mutating forms of the binary set operations.
These methods alter the object on which they are called.
U
($set_spec)
I
($set_spec)
X
($set_spec)
D
($set_spec)
C
equal
$set $set_spec
equivalent
$set $set_spec
superset
$set $set_spec
subset
$set $set_spec
cardinality
$set
size
$set
size
is provided as an alias for cardinality
.
empty
$set
finite
$set
neg_inf
$set
pos_inf
$set
infinite
$set
universal
$set
member
$set $n
insert
$set $n
remove
$set $n
min
$set
undef
if there is none.
max
$set
undef
if there is none.
holes
$set
Returns a set containing all the holes in $set, that is, all the integers that are in-between spans of $set.
holes
is always a finite set.
cover
$set
Returns a set consisting of a single span from $set->min
to
$set->max
. This is the same as
union $set $set->holes
inset
$set $n
trim
$set $n
pad
$set $n
inset
returns a set constructed by removing $n integers from
each end of each span of $set. If $n is negative, then -$n
integers are added to each end of each span.
In the first case, spans may vanish from the set; in the second case, holes may vanish.
trim
is provided as a synonym for inset
.
pad
$set $n is the same as inset
$set -$n.
first
undef
.
Returns the iterator.
last
undef
.
Returns the iterator.
start
($n)
undef
.
Returns the iterator.
next
Sets the iterator for $set to the next element of $set.
If there is no next element,
sets the iterator to undef
.
Returns the iterator.
next
will return undef
only once;
the next call to next
will reset the iterator to
the smallest element of $set.
prev
Sets the iterator for $set to the previous element of $set.
If there is no previous element,
sets the iterator to undef
.
Returns the iterator.
prev
will return undef
only once;
the next call to prev
will reset the iterator to
the largest element of $set.
current
The elements of a set are kept in numerical order. These methods index into the set based on this ordering.
at
($i)
Returns the $ith element of $set,
or undef
if there is no $ith element.
Negative indices count backwards from the end of the set.
Dies if
neg_inf
pos_inf
slice
($from, $to)
Returns a Set::IntSpan
object containing the elements of $set
at indices $from..$to.
Negative indices count backwards from the end of the set.
Dies if
neg_inf
pos_inf
ord
($n)
The inverse of at
.
Returns the index $i of the integer $n in $set,
or undef
if $n if not an element of $set.
Dies if $set is neg_inf
.
span_ord
($n)
Returns the index $i of the span containing the integer $n,
or undef
if $n if not an element of $set.
To recover the span containing $n, write
($set->spans)[$i]
For convenience, some operators are overloaded on Set::IntSpan
objects.
One operand must be a Set::IntSpan
object.
The other operand may be a Set::IntSpan
object or a set specification.
$u_set = $set + $set_spec; # union $i_set = $set * $set_spec; # intersect $x_set = $set ^ $set_spec; # xor $d_set = $set - $set_spec; # diff $c_set = ~$set; # complement $set += $set_spec; # union $set *= $set_spec; # intersect $set ^= $set_spec; # xor $set -= $set_spec; # diff
The string comparison operations are overloaded to compare sets for equality and containment.
One operand must be a Set::IntSpan
object.
The other operand may be a Set::IntSpan
object or a set specification.
$set eq $set_spec # equal $set ne $set_spec # not equal $set le $set_spec # subset $set lt $set_spec # proper subset $set ge $set_spec # superset $set gt $set_spec # proper superset
The numerical comparison operations are overloaded to compare sets by cardinality.
One operand must be a Set::IntSpan
object.
The other operand may be a Set::IntSpan
object or an integer.
$set1 == $set2 $set1 != $set2 $set1 <= $set2 $set1 < $set2 $set1 >= $set2 $set1 > $set2 $set1 <=> $set2 $set1 cmp $set2 $set1 == $n $set1 != $n $set1 <= $n $set1 < $n $set1 >= $n $set1 > $n $set1 <=> $n $set1 cmp $n
N.B. The cmp
operator is overloaded to compare sets by cardinality, not containment.
This is done so that
sort @sets
will sort a list of sets by cardinality.
In boolean context, a Set::IntSpan
object evaluates to true if it is not empty.
A Set::IntSpan
object stringizes to its run list.
grep_set
{ ... } $set
Evaluates the BLOCK for each integer in $set
(locally setting $_
to each integer)
and returns a Set::IntSpan
object containing those integers
for which the BLOCK returns TRUE.
Returns undef
if $set is infinite.
map_set
{ ... } $set
Evaluates the BLOCK for each integer in $set
(locally setting $_
to each integer)
and returns a Set::IntSpan
object containing
all the integers returned as results of all those evaluations.
Evaluates the BLOCK in list context, so each element of $set may produce zero, one, or more elements in the returned set. The elements may be returned in any order, and need not be disjoint.
Returns undef
if $set is infinite.
grep_spans
{ ... } $set
Evaluates the BLOCK for each span in $set
and returns a Set::IntSpan
object containing those spans
for which the BLOCK returns TRUE.
Within BLOCK, $_
is locally set to an array ref of the form
[ $lower, $upper ]
where $lower and $upper are the bounds of the span.
If the span contains only one integer, then $lower and $upper will be equal.
If the span is unbounded, then the corresponding element(s) of the array will be undef
.
map_spans
{ ... } $set
Evaluates the BLOCK for each span in $set,
and returns a Set::IntSpan
object consisting of the union of
all the spans returned as results of all those evaluations.
Within BLOCK, $_
is locally set to an array ref of the form
[ $lower, $upper ]
as described above for grep_spans
.
Each evaluation of BLOCK must return a list of spans.
Each returned list may contain zero, one, or more spans.
Spans may be returned in any order, and need not be disjoint.
However, for each bounded span, the constraint
$lower <= $upper
must hold.
$Set::IntSpan::Empty_String
$Set::IntSpan::Empty_String
contains the string that is returned when
run_list
is called on the empty set.
$Empty_String
is initially '-';
alternatively, it may be set to ''.
Other values should be avoided,
to ensure that run_list
always returns a valid run list.
run_list
accesses $Empty_String
through a reference
stored in $set->{empty_string
}.
Subclasses that wish to override the value of $Empty_String
can
reassign this reference.
$Set::IntSpan::integer
Up until version 1.16, Set::IntSpan
specified use integer
,
because they were sets of...you know...integers.
As of 2012, users are reporting newsgroups with article numbers above
0x7fffffff, which break Set::IntSpan
on 32-bit processors.
Version 1.17 removes use integer
by default.
This extends the usable range of Set::IntSpan
to the number of bits
in the mantissa of your floating point representation.
For IEEE 754 doubles, this is 53 bits, or around 9e15.
I benchmarked Set::IntSpan
on a Pentium 4,
and it looks like use integer
provides a 2% to 4% speedup,
depending on the application.
If you want use integer
back, either for performance,
or because you are somehow dependent on its semantics, write
BEGIN { $Set::IntSpan::integer = 1 } use Set::IntSpan;
Any method (except valid
) will die
if it is passed an invalid run list.
Set::IntSpan::_copy_run_list: Bad syntax:
$runList
Set::IntSpan::_copy_run_list: Bad order:
$runList
Set::IntSpan::elements: infinite set
elements
.
Set::IntSpan::at: negative infinite set
at
was called with a non-negative index on a negative infinite set.
Set::IntSpan::at: positive infinite set
at
was called with a negative index on a positive infinite set.
Set::IntSpan::slice: negative infinite set
slice
was called with $from non-negative on a negative infinite set.
Set::IntSpan::slice: positive infinite set
slice
was called with $from negative on a positive infinite set.
Set::IntSpan::ord: negative infinite set
ord
was called on a negative infinite set.
elements
$set can generate an "Out of memory!"
message on sufficiently large finite sets.
Beware of forms like
union $set [1..5];
This passes an element of @set to union, which is probably not what you want. To force interpretation of $set and [1..5] as separate arguments, use forms like
union $set +[1..5];
or
$set->union([1..5]);
grep_set
and map_set
make it easy to construct
sets for which the internal representation used by Set::IntSpan
is not small. Consider:
$billion = new Set::IntSpan '0-1_000_000_000'; # OK $odd = grep_set { $_ & 1 } $billion; # trouble $even = map_set { $_ * 2 } $billion; # double trouble
There are two common approaches to error handling:
exceptions and return codes.
There seems to be some religion on the topic,
so Set::IntSpan
provides support for both.
To catch exceptions, protect method calls with an eval:
$run_list = <STDIN>; eval { $set = new Set::IntSpan $run_list }; $@ and print "$@: try again\n";
To check return codes, use an appropriate method call to validate arguments:
$run_list = <STDIN>; if (valid Set::IntSpan $run_list) { $set = new Set::IntSpan $run_list } else { print "$@ try again\n" }
Similarly, use finite
to protect calls to elements
:
finite $set and @elements = elements $set;
Calling elements
on a large, finite set can generate an "Out of
memory!" message, which cannot (easily) be trapped.
Applications that must retain control after an error can use intersect
to
protect calls to elements
:
@elements = elements { intersect $set "-1_000_000 - 1_000_000" };
or check the size of $set first:
finite $set and cardinality $set < 2_000_000 and @elements = elements $set;
Although Set::IntSpan
can represent some infinite sets,
it does not perform infinite-precision arithmetic.
Therefore, finite elements are restricted to the range of integers on your machine.
Users report that you can construct Set::IntSpan objects on anything that behaves like an integer. For example:
$x = new Math::BigInt ...; $set = new Set::Intspan [ [ $x, $x+5 ] ];
I'm not documenting this as supported behavior, because I don't have the resources to test it, but I'll try not to break it. If anyone finds problems with it, let me know.
The sets implemented here are based on a Macintosh data structure called a region. See Inside Macintosh for more information.
Set::IntSpan
was originally written to manage run lists for the News::Newsrc
module.
Steven McDougall <swmcd@theworld.com>
Copyright (c) 1996-2013 by Steven McDougall. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.