Align::NW - Needleman-Wunsch algorithm for optimal global sequence alignment
    use Align::NW;

    $payoff = { match      => $match,
                mismatch   => $mismatch,
                gap_open   => $gap_open,
                gap_extend => $gap_extend };

    $nw = new Align::NW $a, $b, $payoff, %options;
    $nw->score;
    $nw->align;

    $score = $nw->get_score;
    $align = $nw->get_align;

    $nw->print_align;
    $nw->dump_score;
Align::NW finds the optimal global alignment of the sequences $a and $b, subject to the $payoff matrix.
Align::NW uses the Needleman-Wunsch dynamic programming algorithm.
This algorithm runs in O(a*b*(a+b)), where a and b are the
lengths of the two sequences to be aligned.
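A running time of this form arises when, for every cell of the score matrix, every possible gap length ending at that cell is considered. The following is a minimal sketch of such a recurrence, not the module's own code. The names nw_score and gap_cost are hypothetical; a gap of length k is assumed to cost gap_open + k*gap_extend, and gaps at the ends of the sequences are assumed to be penalized like any other gap, which Align::NW itself may or may not do.

```perl
use strict;
use warnings;

# Assumed gap model: one opening penalty plus one extension per character.
sub gap_cost {
    my ($payoff, $k) = @_;
    return $payoff->{gap_open} + $k * $payoff->{gap_extend};
}

# Hypothetical helper: optimal global alignment score of $x and $y.
sub nw_score {
    my ($x, $y, $payoff) = @_;
    my @x = split //, $x;
    my @y = split //, $y;
    my ($n, $m) = (scalar @x, scalar @y);

    # $S[$i][$j] is the best score for the first $i characters of $x
    # aligned against the first $j characters of $y.
    my @S;
    $S[0][0] = 0;
    $S[$_][0] = gap_cost($payoff, $_) for 1 .. $n;
    $S[0][$_] = gap_cost($payoff, $_) for 1 .. $m;

    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            # Align $x[$i-1] with $y[$j-1].
            my $pair = $x[$i-1] eq $y[$j-1] ? $payoff->{match}
                                            : $payoff->{mismatch};
            my $best = $S[$i-1][$j-1] + $pair;

            # End with a gap of length $k in $y (consumes $k characters of $x).
            for my $k (1 .. $i) {
                my $s = $S[$i-$k][$j] + gap_cost($payoff, $k);
                $best = $s if $s > $best;
            }
            # End with a gap of length $k in $x (consumes $k characters of $y).
            for my $k (1 .. $j) {
                my $s = $S[$i][$j-$k] + gap_cost($payoff, $k);
                $best = $s if $s > $best;
            }
            $S[$i][$j] = $best;
        }
    }
    return $S[$n][$m];
}

my $payoff = { match => 4, mismatch => -3, gap_open => -2, gap_extend => -1 };
print nw_score('abcd', 'abd', $payoff), "\n";    # prints 9
```

The two inner loops over $k are what turn the a*b cells into a*b*(a+b) steps.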
An alignment of two sequences is represented by three lines. The first line shows the first sequence, and the third line shows the second sequence.
The second line has a row of symbols. The symbol is a vertical bar wherever characters in the two sequences match, and a space wherever they do not.
Dots may be inserted in either sequence to represent gaps.
For example, the two sequences

    abcdefghajklm
    abbdhijk

could be aligned like this

    abcdefghajklm
    || |   | ||
    abbd...hijk
As shown, there are 6 matches, 2 mismatches, and one gap of length 3.
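As an illustration of the three-line form, the symbol line can be derived from the two gapped sequences. This is only a sketch: symbol_line is a hypothetical helper, not part of Align::NW, and it assumes that positions past the end of the shorter line are simply not compared.

```perl
use strict;
use warnings;

# Hypothetical helper: build the symbol line for two gapped sequences.
# A bar marks a match; mismatches and gap positions get a space.
sub symbol_line {
    my ($x, $y) = @_;
    my $n = length($x) < length($y) ? length($x) : length($y);
    my $s = '';
    for my $i (0 .. $n - 1) {
        my $cx = substr($x, $i, 1);
        my $cy = substr($y, $i, 1);
        $s .= ($cx eq $cy && $cx ne '.') ? '|' : ' ';
    }
    return $s;
}

my $top    = 'abcdefghajklm';
my $bottom = 'abbd...hijk';
my $sym    = symbol_line($top, $bottom);

print "$top\n$sym\n$bottom\n";

# Count the bars to get the number of matches.
my $matches = ($sym =~ tr/|//);
print "matches: $matches\n";    # prints "matches: 6"
```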
Align::NW returns an alignment as a hash

    $align = { a => $a, s => $s, b => $b };
$a and $b are the two sequences. $s is the line of symbols.
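A hash of this shape can be turned back into the three-line display with a few lines of Perl. The sketch below uses a hypothetical helper, format_align, and a hand-built hash; the output of the module's own print_align may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical helper: render an alignment hash as three lines of text.
sub format_align {
    my ($align) = @_;
    return join("\n", $align->{a}, $align->{s}, $align->{b}) . "\n";
}

# Hand-built example hash with the keys a, s and b described above.
my $example = {
    a => 'abcdefghajklm',
    s => '|| |   | ||',
    b => 'abbd...hijk',
};

print format_align($example);
```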
The alignment is scored according to a payoff matrix

    $payoff = { match      => $match,
                mismatch   => $mismatch,
                gap_open   => $gap_open,
                gap_extend => $gap_extend };

The entries in the matrix are the number of points added to the score for each match, for each mismatch, each time a gap is opened, and for each character in a gap, respectively.
For correct operation, match must be positive, and the other entries must be negative.
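A caller can check these sign constraints before constructing an aligner. The helper below, payoff_ok, is a hypothetical sketch and not part of Align::NW; whether the module performs any such check itself is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper: true if match is positive and the
# other three entries are negative.
sub payoff_ok {
    my ($payoff) = @_;
    return 0 unless defined $payoff->{match} && $payoff->{match} > 0;
    for my $key (qw(mismatch gap_open gap_extend)) {
        return 0 unless defined $payoff->{$key} && $payoff->{$key} < 0;
    }
    return 1;
}

my $good = { match => 4, mismatch => -3, gap_open => -2, gap_extend => -1 };
my $bad  = { match => 4, mismatch =>  3, gap_open => -2, gap_extend => -1 };

print payoff_ok($good) ? "ok\n" : "rejected\n";    # prints "ok"
print payoff_ok($bad)  ? "ok\n" : "rejected\n";    # prints "rejected"
```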
Given the payoff matrix

    $payoff = { match => 4, mismatch => -3, gap_open => -2, gap_extend => -1 };

the sequences

    abcdefghajklm
    abbdhijk

are aligned and scored like this

                a b c d e f g h a j k l m
                | |   |       |   | |
                a b b d . . . h i j k

    match       4 4   4       4   4 4
    mismatch       -3          -3
    gap_open           -2
    gap_extend         -1-1-1
for a total score of 24-6-2-3 = 13. The algorithm guarantees that no other alignment of these two sequences has a higher score under this payoff matrix.
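The column-by-column scoring can be reproduced with a short script. This is a sketch under stated assumptions: score_alignment is a hypothetical helper, a gap of length k is taken to cost gap_open + k*gap_extend as in the table, and characters past the end of the shorter line (the trailing "lm" here) are not scored.

```perl
use strict;
use warnings;

# Hypothetical helper: score two gapped sequences column by column.
sub score_alignment {
    my ($x, $y, $payoff) = @_;
    my $n = length($x) < length($y) ? length($x) : length($y);
    my $score = 0;
    my $prev  = '';    # which sequence the previous column's gap was in
    for my $i (0 .. $n - 1) {
        my $cx = substr($x, $i, 1);
        my $cy = substr($y, $i, 1);
        my $side = $cx eq '.' ? 'x' : $cy eq '.' ? 'y' : '';
        if ($side) {
            # A new gap pays gap_open once; every gap column pays gap_extend.
            $score += $payoff->{gap_open} if $side ne $prev;
            $score += $payoff->{gap_extend};
        }
        else {
            $score += $cx eq $cy ? $payoff->{match} : $payoff->{mismatch};
        }
        $prev = $side;
    }
    return $score;
}

my $payoff = { match => 4, mismatch => -3, gap_open => -2, gap_extend => -1 };

# 6 matches, 2 mismatches, 1 gap opening, 3 gap extensions.
print score_alignment('abcdefghajklm', 'abbd...hijk', $payoff), "\n";    # prints 13
```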
new Align::NW $a, $b, $payoff, %options
    Creates and returns a new Align::NW object.
    $a and $b are the sequences to be aligned.
    $payoff is the payoff matrix, described above.
    Additional options may be passed in the %options hash;
    see OPTIONS for details.
score

align
    score must be called before align.

get_score
    score must be called before get_score.

get_align
    align must be called before get_align.

print_align
    align must be called before print_align.

dump_score
Options may be passed to new in the %options hash.
The following options are defined.
There are usually some tutorials on Needleman-Wunsch and Smith-Waterman alignment floating around on the web. I used to provide links to some, but they kept going 404. If you Google around a bit you can probably find a current one.
Steven W. McDougall <swmcd@theworld.com>
Copyright 1999-2003 by Steven W. McDougall. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.