NAME

Align::NW - Needleman-Wunsch algorithm for optimal global sequence alignment

SYNOPSIS

    use Align::NW;
  
    $payoff = { match      => $match,
		mismatch   => $mismatch,
		gap_open   => $gap_open,
		gap_extend => $gap_extend };
  
    $nw = new Align::NW $a, $b, $payoff, %options
    $nw->score;
    $nw->align;
  
    $score = $nw->get_score;
    $align = $nw->get_align;

    $nw->print_align;
    $nw->dump_score;

DESCRIPTION

Align::NW finds the optimal global alignment of the sequences $a and $b, subject to the $payoff matrix.

Algorithm

Align::NW uses the Needleman-Wunsch dynamic programming algorithm. This algorithm runs in O(a*b*(a+b)), where a and b are the lengths of the two sequences to be aligned.

Alignments

An alignment of two sequences is represented by three lines. The first line shows the first sequence, and the third line shows the second sequence.

The second line has a row of symbols. The symbol is a vertical bar where ever characters in the two sequences match, and a space where ever they do not.

Dots may be inserted in either sequence to represent gaps.

For example, the two sequences

    abcdefghajklm
    abbdhijk

could be aligned like this

    abcdefghajklm
    || |   | || 
    abbd...hijk

As shown, there are 6 matches, 2 mismatches, and one gap of length 3.

Align::NW retuns an alignment as a hash

    $align = { a => $a,
	       s => $s,
	       b => $b };

$a and $b are the two sequences. $s is the line of symbols.

The Payoff Matrix

The alignment is scored according to a payoff matrix

    $payoff = { match      => $match,
		mismatch   => $mismatch,
		gap_open   => $gap_open,
		gap_extend => $gap_extend };

The entries in the matrix are the number of points added to the score

for each match
for each mismatch
when a gap is opened in either sequence
for each position that a gap is extended (including the first)

For correct operation, match must be positive, and the other entries must be negative.

Example

Given the payoff matrix

   $payoff = { match      =>  4,
	       mismatch   => -3,
	       gap_open   => -2,
	       gap_extend => -1 };

The sequences

    abcdefghajklm
    abbdhijk

are aligned and scored like this

                a b c d e f g h a j k l m
                | |   |       |   | | 
                a b b d . . . h i j k

    match       4 4   4       4   4 4  
    mismatch       -3          -3
    gap_open           -2
    gap_extend         -1-1-1

for a total score of 24-6-2-3 = 15. The algorithm guarantees that no other alignment of these two sequences has a higher score under this payoff matrix.

METHODS

$nw = new Align::NW $a, $b, $payoff, %options: Creates and returns a new Align::NW object. $a and $b are the sequences to be aligned. $payoff is the payoff matrix, described above. Additional options maybe passed in the %options hash; see /OPTIONS for details.
$nw->score: Fills in the score matrix for $nw. This is the O(a*b*(a+b)) operation.
$nw->align: Backtracks through the score matrix and generates an alignment for the two sequences. score must be called before align.
$score = $nw->get_score: Returns the score of the alignment. score must be called before get_score.
$align = $nw->get_align: Returns the alignment of the two sequences, as described above in /Alignments. align must be called before get_align.
$nw->print_align: Pretty prints the alignment to STDOUT. align must be called before print_align.
$nw->dump_score: Dumps the score matrix to STDOUT. This is useful mainly for debugging. The matrix is not pretty printed; line wrapping makes large matrices difficult to read.